Modern data analysis presents a variety of challenges, including the size, dimensionality, complexity, and multimodality of the data. In an attempt to keep pace with these growing challenges, data scientists combine tools drawn from mathematics, computer science, and statistics. This TRIPODS Summer Bootcamp will provide attendees with a hands-on introduction to emerging techniques that use topology together with machine learning for data analysis.
Topological and machine learning techniques potentially play complementary roles in analyzing data. In topological data analysis, one leverages the fact that the shape of the data often reflects important and interpretable patterns within it, although topological techniques alone typically cannot match the predictive power of machine learning. By contrast, machine learning algorithms provide state-of-the-art accuracy on predictive tasks, but the manner in which they arrive at a prediction is often difficult to interpret. Machine learning would benefit if one could use mathematics to provide more interpretability, even in exchange for reduced predictive power. There are by now a variety of ways to combine topology with machine learning, and the diversity of such approaches is growing. The goal of the TRIPODS Summer Bootcamp is to expose attendees to current tools combining topology and machine learning. The bootcamp will focus not only on the successes of such algorithms but also on their inherent challenges, in order to inspire the development of novel approaches.
The bootcamp will consist of a hands-on tutorial during days 1-3, and a research conference during days 4-5.
Days 1-3: Introductory tutorial on applied topology and machine learning
The first three days of the bootcamp will include an introductory tutorial on applied topology, on machine learning, and on the marriage between the two. The featured topic from applied topology will be persistent homology, and the featured topics from machine learning will be classical algorithms such as clustering, support vector machines (SVMs), and random forests. Featured topics for combining persistent homology with machine learning will include the bottleneck and Wasserstein distances, persistence landscapes, and persistence images. The tutorial will emphasize hands-on coding exercises with real data. Participants will compare the performance and interpretability of standard algorithms on a variety of machine learning tasks, and they will also create and test variants of their own invention.
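To give a flavor of the topics named above, here is a minimal Python sketch. It is not taken from the bootcamp materials. It computes 0-dimensional persistent homology of a small point cloud (connected components of the Vietoris-Rips filtration, tracked with a union-find structure) and then evaluates a persistence landscape on the resulting diagram. The function names `h0_persistence` and `landscape` and the toy point cloud are assumptions made for illustration only; in practice one would likely use a dedicated library for higher-dimensional homology.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional persistent homology.

    Every point is its own component at scale 0. A component dies at the
    length of the edge that merges it into another component. The single
    component that never dies is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j          # merge two components
            diagram.append((0.0, length))    # one component dies here
    return diagram


def landscape(diagram, k, t):
    """Value at t of the k-th persistence landscape function (k = 1, 2, ...).

    Each pair (b, d) contributes a tent function max(0, min(t - b, d - t));
    the k-th landscape is the k-th largest tent value at t.
    """
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


# Hypothetical toy data: two well-separated clusters in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
diagram = h0_persistence(points)
print(sorted(death for _, death in diagram))  # [1.0, 1.0, 1.0, 9.0]
print(landscape(diagram, 1, 4.5))             # 4.5
```

The one long-lived pair (death at 9.0) reflects the gap between the two clusters, which is the kind of interpretable shape information the tutorial refers to. Sampling `landscape` on a fixed grid of `t` values yields a fixed-length vector that could be passed to a classical classifier such as an SVM or a random forest.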
Computational exercises will accompany the bootcamp. Please see our tutorial at https://github.com/ICERM-TRIPODS-Top-ML/Top-ML/wiki and our code at https://github.com/ICERM-TRIPODS-Top-ML/Top-ML.
Days 4-5: Research conference on topology and machine learning
The final two days of the bootcamp will feature a research conference on current trends in topology and machine learning. The conference is aimed at a more expert audience, including researchers who may not have attended the preparatory tutorial during the first three days.