Geometric and Topological Methods in Data Science
Institute for Computational and Experimental Research in Mathematics (ICERM)
December 16 - 17, 2021
Thursday, December 16, 2021

8:45 - 9:00 am EST | Welcome | 11th Floor Lecture Hall
 Jeffrey Brock, Yale University
 Bjorn Sandstede, Brown University

9:00 - 9:45 am EST | Geometry of Molecular Conformations in Cryo-EM | 11th Floor Lecture Hall
 Speaker
 Roy Lederman, Yale University
 Session Chair
 Jeffrey Brock, Yale University
Abstract
Cryo-electron microscopy (cryo-EM) is an imaging technology that is revolutionizing structural biology. Cryo-electron microscopes produce many very noisy two-dimensional projection images of individual frozen molecules; unlike related methods, such as computed tomography (CT), the viewing direction of each particle image is unknown. The unknown directions and extreme noise make the determination of the structure of molecules challenging. While other methods for structure determination, such as x-ray crystallography and NMR, measure ensembles of molecules, cryo-electron microscopes produce images of individual particles. Therefore, cryo-EM could potentially be used to study mixtures of conformations of molecules. We will discuss a range of recent methods for analyzing the geometry of molecular conformations using cryo-EM data.

10:00 - 10:30 am EST | Coffee Break | 11th Floor Collaborative Space

11:00 - 11:45 am EST | Geometric and Topological Approaches to Representation Learning in Biomedical Data | 11th Floor Lecture Hall
 Speaker
 Smita Krishnaswamy, Yale University
 Session Chair
 Jeffrey Brock, Yale University
Abstract
High-throughput, high-dimensional data has become ubiquitous in the biomedical sciences as a result of breakthroughs in measurement technologies and data collection. While these large datasets containing millions of observations of cells, people, or brain voxels hold great potential for understanding the generative state space of the data, as well as drivers of differentiation, disease, and progression, they also pose new challenges in terms of noise, missing data, measurement artifacts, and the so-called "curse of dimensionality." In this talk, I will cover geometric and topological approaches to understanding the shape and structure of the data. First, we show how diffusion geometry and deep learning can be used to obtain useful representations of the data that enable denoising and dimensionality reduction. Next, we show how to combine diffusion geometry with topology to extract multi-granular features from the data to assist in differential and predictive analysis. On the flip side, we also create a manifold geometry from topological descriptors and show its applications to neuroscience. Finally, we will show how to learn dynamics from static snapshot data by using a manifold-regularized, neural-ODE-based optimal transport. Together, these form a complete framework for exploratory and unsupervised analysis of big biomedical data.
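
As a toy illustration of the diffusion-geometry idea (not the speaker's actual pipeline), a minimal diffusion-map embedding can be sketched in a few lines; the kernel bandwidth `eps`, the diffusion time `t`, and the circle data are arbitrary choices for demonstration:

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_components=2, t=1):
    """Minimal diffusion map: Gaussian kernel, row-normalized Markov
    matrix, then a spectral decomposition of the diffusion operator."""
    # Pairwise squared distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)                 # Gaussian affinity
    P = K / K.sum(axis=1, keepdims=True)  # row-stochastic diffusion operator
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)        # sort eigenvalues descending
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Skip the trivial first eigenvector (constant, eigenvalue 1)
    return (vals[1:n_components + 1] ** t) * vecs[:, 1:n_components + 1]

# Example: noisy points sampled along a circle
rng = np.random.default_rng(0)
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((100, 2))
emb = diffusion_map(X, eps=0.5)
print(emb.shape)  # (100, 2)
```

For large datasets one would use a sparse nearest-neighbor kernel and an iterative eigensolver; this dense version only shows the structure of the computation.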

12:00 - 12:10 pm EST | Group Photo | 11th Floor Lecture Hall

12:10 - 1:30 pm EST | Lunch/Free Time

1:30 - 2:15 pm EST | Metric Repair | 11th Floor Lecture Hall
 Speaker
 Anna Gilbert, Yale University
 Session Chair
 Bjorn Sandstede, Brown University
Abstract
Metric embeddings are key algorithmic and mathematical techniques in applied mathematics and approximation algorithms, and their adaptations are ubiquitous in machine learning. They are used to embed one metric space into another with the hope of revealing hidden structure or reducing the dimension of a data set. Examples include the random projection of a set of points in high dimensions to a lower dimension and the embedding of a graph into a tree-like structure. The fundamental limitation with the application of metric embeddings to machine learning is that their use in data analysis is predicated upon the input data coming from a metric space. Real data, however, do not necessarily conform to a metric; they are messy. The fundamental problem in our research program is metric repair: given a set of input distances, adjust them so that they conform to a metric.
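
To make the problem concrete, here is a minimal sketch of one standard variant, decrease-only metric repair, where distances may only be lowered: replacing each entry with the shortest-path distance yields the largest metric lying pointwise below the input. (This is one simple variant, not the full scope of the research program described above.)

```python
import itertools

def decrease_only_repair(d):
    """Decrease-only metric repair for a symmetric, nonnegative
    dissimilarity matrix: replace each entry with the shortest-path
    distance via Floyd-Warshall, O(n^3)."""
    n = len(d)
    m = [row[:] for row in d]  # work on a copy
    # itertools.product iterates with k outermost, as Floyd-Warshall requires
    for k, i, j in itertools.product(range(n), repeat=3):
        if m[i][k] + m[k][j] < m[i][j]:
            m[i][j] = m[i][k] + m[k][j]
    return m

# Triangle inequality violated: d(0,2) = 10 > d(0,1) + d(1,2) = 3
d = [[0, 1, 10],
     [1, 0, 2],
     [10, 2, 0]]
print(decrease_only_repair(d))  # [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
```

Increase-only and general (mixed) repair are harder combinatorial problems; the shortest-path trick above only works when distances are allowed to shrink.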

2:30 - 3:00 pm EST | Coffee Break | 11th Floor Collaborative Space

3:00 - 3:45 pm EST | From Questionnaires to PDEs: Dynamics and Emergent Models from Disorganized Data | 11th Floor Lecture Hall
 Virtual Speaker
 Yannis Kevrekidis, Johns Hopkins University
 Session Chair
 Bjorn Sandstede, Brown University
Abstract
Starting with sets of disorganized observations of spatiotemporally evolving systems obtained at different (also disorganized) sets of parameters, we demonstrate the data-driven derivation of generative, parameter-dependent, evolutionary partial differential equation models of the data. We know which observations were made at the same physical location, the same time, or the same set of parameter values, knowing neither where the physical location is, nor when the temporal moment is, nor what the parameter values are; this tensor type of data is reminiscent of shuffled (multi-)puzzle tiles.
The independent variables for the evolution equations (their "space" and "time") as well as their effective parameters are all "emergent", i.e., determined in a data-driven way from our disorganized observations of behavior in them.
We use a diffusion-map-based "questionnaire" approach to build a parametrization of our emergent space for the data. This approach iteratively processes the data by successively observing them along the "space", "time", and "parameter" axes of a tensor. Once the data are organized, we use neural-network-based learning to approximate the operators governing the evolution equations in this emergent space. Our illustrative example is based on a previously developed vertex-plus-signaling model of Drosophila embryonic development. This allows us to discuss features of the process such as symmetry breaking, translational invariance of the emergent PDE model, and interpretability.

4:00 - 4:45 pm EST | Topological data analysis of zebrafish patterns | 11th Floor Lecture Hall
 Virtual Speaker
 Alexandria Volkening, Purdue University
 Session Chair
 Bjorn Sandstede, Brown University
Abstract
Self-organization is present at many scales in biology, and here I will focus specifically on elucidating how brightly colored cells interact to form skin patterns in zebrafish. Wild-type zebrafish are named for their dark and light stripes, but mutant zebrafish feature variable skin patterns, including spots and labyrinth curves. All of these patterns form as the fish grow due to the interactions of tens of thousands of pigment cells, making agent-based modeling a natural approach for describing pattern formation. By identifying cell interactions that may change to create mutant patterns, my long-term goal is to help link genes, cell behavior, and visible animal characteristics in fish. However, agent-based models are stochastic and have many parameters, so comparing simulated patterns and fish images is often a qualitative process. Developing analytically tractable continuum models from agent-based systems is one means of addressing these challenges and better understanding the roles of different parameters in pattern formation. Alternatively, methods from topological data analysis can be applied to cell-based systems directly. In this talk, I will overview our models and present quantitative comparisons of in silico and in vivo cell-based patterns using our topological methods.

5:00 - 6:00 pm EST | Reception | 11th Floor Collaborative Space

Friday, December 17, 2021

9:00 - 9:45 am EST | Robust and Scalable Learning of Gaussian Mixture Models | 11th Floor Lecture Hall
 Speaker
 Kisung You, Yale University
 Session Chair
 Ian Adelstein, Yale University
Abstract
A Gaussian mixture model (GMM) is one of the most widely used methods in both the machine learning and statistics communities for probabilistic clustering and density estimation. Estimation of the model is usually carried out by the expectation-maximization (EM) algorithm or its variants. When the sample size is large, however, the EM algorithm may not be a convenient option due to the rapid growth in computational costs. In this talk, I present a divide-and-conquer approach with minimal communication that resolves this problem by working with a Hilbertian structure on GMMs induced by kernel embedding of Gaussian measures. Multiple models are estimated on independent subsets of the data and aggregated into a single GMM by taking their geometric median in the Hilbert space, which guarantees robustness of the estimate under mild conditions. Once the estimate is obtained, it may contain overly redundant components, so that the resulting clustering is not meaningful and the individual components are hard to interpret. Motivated by this observation, two post-processing strategies for model reduction and clustering characterization are proposed.
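
The aggregation step hinges on the robustness of the geometric median. As a self-contained illustration in plain Euclidean space (the talk works in a kernel-induced Hilbert space; this sketch only shows why the median resists outlying subset estimates), Weiszfeld's algorithm computes it by iterative re-weighting:

```python
import numpy as np

def geometric_median(points, tol=1e-7, max_iter=500):
    """Weiszfeld's algorithm: re-weight points by inverse distance to the
    current estimate. The geometric median minimizes the sum of Euclidean
    distances and is robust to outliers, unlike the mean."""
    y = points.mean(axis=0)  # start from the (non-robust) mean
    for _ in range(max_iter):
        dist = np.linalg.norm(points - y, axis=1)
        if np.any(dist < 1e-12):   # estimate landed on a data point
            return y
        w = 1.0 / dist
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# Five clustered "subset estimates" plus one wild outlier: the median
# stays near the cluster, while the mean is dragged toward the outlier.
pts = np.array([[0.9, 1.0], [1.0, 1.1], [1.1, 0.9],
                [1.0, 1.0], [0.95, 1.05], [100.0, 100.0]])
print(geometric_median(pts))  # close to (1, 1)
print(pts.mean(axis=0))       # pulled far toward (100, 100)
```

In the divide-and-conquer setting, each row of `pts` would be replaced by the kernel embedding of a subset's fitted GMM, with the same median computation performed in the Hilbert space.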

10:00 - 10:30 am EST | Coffee Break | 11th Floor Collaborative Space

11:00 - 11:45 am EST | Characterizing Transitions in Developmental Biology using Topological Machine Learning | 11th Floor Lecture Hall
 Speaker
 Dhananjay Bhaskar, Yale University
 Session Chair
 Ian Adelstein, Yale University
Abstract
I will present ongoing work applying topological data analysis (TDA) and machine learning to identify transitions in cell organization and cell state within the context of developmental biology. First, using cell positions obtained from agent-based simulations of cell sorting and skin pigmentation, the complex relationship between cell-cell interactions and emergent patterns is automatically discovered via unsupervised classification of persistence images. This approach is used to analyze phase transitions in proliferating, heterogeneous populations and found to be empirically robust to random perturbations and finite-size effects. Next, I will discuss challenges associated with TDA of high-dimensional single-cell sequencing datasets. In particular, lack of suitable techniques for intrinsic dimension and curvature estimation is limiting the use of multiparameter filtration as a tool for understanding these data. I will briefly outline a novel approach for tackling this problem, using graph diffusion probabilities to predict curvature on toy data consisting of points sampled from quadric surfaces.
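
For intuition on the persistence computations mentioned above, the 0-dimensional Vietoris-Rips persistence of a point cloud can be computed exactly from a minimum spanning tree. This toy sketch (an illustration, not the speaker's code) returns the death scales at which connected components merge:

```python
import itertools
import math

def h0_persistence(points):
    """Deaths of 0-dimensional Vietoris-Rips features (all born at scale 0):
    exactly the edge lengths of a minimum spanning tree, found here with
    Kruskal's algorithm and a union-find structure."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # two components merge at scale w
            parent[ri] = rj
            deaths.append(w)
    return deaths               # n - 1 deaths for n points

# Two well-separated pairs: small merge scales within each pair,
# and one large death marking the gap between the clusters.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(h0_persistence(pts))  # [1.0, 1.0, 10.0]
```

The long-lived feature (death 10.0) flags the two-cluster structure; higher-dimensional persistence, persistence images, and multiparameter filtrations require dedicated libraries.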

12:00 - 1:15 pm EST | Lunch/Free Time

1:15 - 2:00 pm EST | Geometry of Neural Representations Shapes Multi-Task Function in Neural Networks and Humans | 11th Floor Lecture Hall
 Speaker
 John Murray, Yale University
 Session Chair
 Smita Krishnaswamy, Yale University
Abstract
Flexible cognitive behavior requires the ability to learn and perform a diversity of tasks without detrimental interference. What are the geometric properties of neural representations that support multitask learning and function? In this talk I will present recent and ongoing studies integrating computational modeling and empirical data to link the representational geometry of neural networks to cognitive function.

2:15 - 2:45 pm EST | Coffee Break | 11th Floor Collaborative Space

2:45 - 3:30 pm EST | Connecting molecules to individual cell behavior to emergent collective behavior | 11th Floor Lecture Hall
 Speaker
 Thierry Emonet, Yale University
 Session Chair
 Smita Krishnaswamy, Yale University
Abstract
Cells live in communities where they interact with each other and their environment. By coordinating individuals, such interactions often result in collective behavior and function that emerge on scales larger than the individuals and are beneficial to the population. At the same time, populations of individuals, even isogenic ones, display phenotypic heterogeneity, which diversifies individual behavior and enhances the resilience of the population in unexpected situations. This raises a dilemma: although individuality provides advantages, it also tends to reduce coordination. I will discuss our experimental and theoretical efforts that use bacterial chemotaxis as a model system to understand the origin of individual cellular behavior and performance, and how populations of cells reconcile individuality with group behavior to robustly operate in multiple environments. Bacterial chemotaxis is one of the best understood model systems in all of biology. As such, it enables us to examine both experimentally and theoretically how dynamical interactions at one scale give rise to structure and function at the next (larger) scale. Thus, it is a great testbed for novel mathematical methods to study data.

3:45 - 4:30 pm EST | Geometric Scattering and Applications | 11th Floor Lecture Hall
 Speaker
 Michael Perlmutter, University of California, Los Angeles
 Session Chair
 Smita Krishnaswamy, Yale University
Abstract
The scattering transform is a mathematical model of convolutional neural networks (CNNs) introduced for functions defined on Euclidean space by Stéphane Mallat. It differs from traditional CNNs by using pre-designed wavelet filters rather than filters learned from training data. This leads to a network which provably has stability and invariance guarantees. Moreover, in situations where the wavelets can be designed in correspondence to underlying physics, it can produce very good numerical results. The rise of geometric deep learning motivated the introduction of geometric scattering transforms for data sets modeled as graphs or manifolds. These networks use wavelets constructed using the spectral decompositions of an appropriate Laplacian operator or via polynomials of a diffusion operator. In my talk, I will discuss applications of these networks to a variety of geometric deep learning tasks and show that they have stability and invariance guarantees analogous to their Euclidean predecessor. I will then talk about modifications of the graph scattering transform which can increase numerical performance, and also about work using the graph scattering transform as the front end of an encoder-decoder network for the purposes of molecule generation.
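
One common construction of diffusion wavelets on a graph, and a first- and second-order scattering cascade built from them, can be sketched as follows; the lazy random-walk operator and the use of the mean as the low-pass average are illustrative choices, not necessarily those of the talk:

```python
import numpy as np

def diffusion_wavelets(A, J):
    """Diffusion wavelets Psi_j = T^(2^(j-1)) - T^(2^j), j = 1..J, built
    from the lazy random-walk operator T = (I + A D^-1) / 2 on a graph
    with adjacency matrix A (one standard construction; details vary)."""
    n = A.shape[0]
    T = 0.5 * (np.eye(n) + A / A.sum(axis=0, keepdims=True))
    powers = [np.eye(n)]
    for _ in range(2 ** J):          # precompute T^0 .. T^(2^J)
        powers.append(powers[-1] @ T)
    return [powers[2 ** (j - 1)] - powers[2 ** j] for j in range(1, J + 1)]

def scattering(A, x, J=3):
    """First- and second-order scattering coefficients: cascade wavelet
    filtering with modulus nonlinearities, then average each channel."""
    wavelets = diffusion_wavelets(A, J)
    first = [np.abs(W @ x) for W in wavelets]
    coeffs = [x.mean()] + [u.mean() for u in first]
    coeffs += [np.abs(W @ u).mean() for u in first for W in wavelets]
    return np.array(coeffs)          # 1 + J + J^2 coefficients

# Tiny example: a 4-cycle graph and a spike signal on one node
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)
x = np.array([1.0, 0.0, 0.0, 0.0])
S = scattering(A, x, J=2)
print(S.shape)  # (7,)
```

Because averaging discards node ordering, the resulting coefficient vector is invariant to permutations of the graph's vertices, which is the discrete analogue of the translation invariance of the Euclidean scattering transform.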
All event times are listed in ICERM local time in Providence, RI (Eastern Standard Time / UTC5).