Organizing Committee
- Shuchin Aeron, Tufts University
- Markos Katsoulakis, University of Massachusetts Amherst
- James Murphy, Tufts University
- Luc Rey-Bellet, UMass Amherst
- Bjorn Sandstede, Brown University
Abstract
This workshop will focus on the intersection of mathematics, statistics, machine learning, and computation, viewed through the lens of optimal transport (OT). Mathematical topics will include low-dimensional models for OT, linearizations of OT, and the geometry of OT, including gradient flows and gradient descent in the space of measures. Relevant statistical topics will include reliable and efficient estimation of OT plans in high dimensions and the role of regularization in computing OT distances and plans, with applications to robust statistics, uncertainty quantification, and overparameterized machine learning. Computation will be a recurring theme of the workshop, with emphasis on the development of fast algorithms and applications to computational biology, high energy physics, materials science, spatio-temporal modeling, natural language processing, and image processing.
Confirmed Speakers & Participants
Talks will be presented virtually or in-person as indicated in the schedule below.
- Shuchin Aeron, Tufts University
- Kang An, Rice University
- Ricardo Baptista, California Institute of Technology
- Panagiota Birmpa, Heriot-Watt University
- Jose Blanchet, Stanford
- Tobias Blickhan, Max Planck Institute for Plasma Physics
- Tamara Broderick, Massachusetts Institute of Technology
- Dongwei Chen, Clemson University
- Ziyu Chen, University of Massachusetts Amherst
- Yongxin Chen, Georgia Institute of Technology
- Yu-Chen Cheng, Dana-Farber Cancer Institute
- Jannatul Ferdous Chhoa, University of Houston
- Frank Cole, University of Massachusetts Amherst
- Emil Constantinescu, Argonne National Laboratory
- Keisha Cook, Tulane University
- Yuqing Dai, Duke University
- Steve Damelin, Mathematical Scientist, Ann Arbor MI
- Fred Daum, Raytheon
- Rocio Diaz Martin, Vanderbilt University
- Christopher Eads, University of Texas at Arlington
- Gabriel Earle, University of Massachusetts Amherst
- Ranthony Edmonds, The Ohio State University
- Marcia Fampa, Federal University of Rio de Janeiro
- Weifu Fang, Wright State University
- Guosheng Fu, University of Notre Dame
- Wilfrid Gangbo, UCLA
- Tryphon Georgiou, University of California, Irvine
- Hyemin Gu, University of Massachusetts Amherst
- Nestor Guillen, Texas State University
- Florian Gunsilius, University of Michigan
- Minh Ha Quang, RIKEN
- Keaton Hamm, University of Texas at Arlington
- Alexander Hsu, University of Washington
- Yinan Hu, New York University
- Stefanie Jegelka, MIT
- Sixian Jin, Worcester Polytechnic Institute
- Yijie Jin, Georgia Institute of Technology
- Justin Kakeu, University of Prince Edward Island
- Markos Katsoulakis, University of Massachusetts Amherst
- Varun Khurana, University of California, San Diego
- Jun Kitagawa, Michigan State University
- Vladimir Kobzar, Columbia University
- Marie Jose Kuffner, Johns Hopkins University
- Christian Kümmerle, University of North Carolina at Charlotte
- Dohyun Kwon, University of Seoul
- Ivan Lau, Rice University
- Shiying Li, University of North Carolina at Chapel Hill
- Wuchen Li, University of South Carolina
- Jiaming Liang, Yale University
- Wei Liu, Rensselaer Polytechnic Institute
- Jun Liu, Southern Illinois University Edwardsville
- Yulong Lu, University of Massachusetts Amherst
- Miranda Lynch, Hauptman-Woodward Medical Research Institute
- Ildebrando Magnani, University of Michigan
- Charalambos Makridakis, Foundation for Research and Technology-Hellas (FORTH)
- Brendan Mallery, Tufts University
- Youssef Marzouk, Massachusetts Institute of Technology
- Shoaib Bin Masud, Tufts University
- Tyler Maunu, Brandeis University
- Henok Mawi, Howard University
- Ian Oliver McPherson, Johns Hopkins University
- Kun Meng, Brown University
- Toshio Mikami, Tsuda University
- Martin Molina Fructuoso, Brandeis University
- Caroline Moosmüller, University of North Carolina at Chapel Hill
- Rebecca Morrison, University of Colorado Boulder
- Chenchen Mou, City University of Hong Kong
- Lidia Mrad, Mount Holyoke College
- James Murphy, Tufts University
- Evangelos Nastas, SUNY
- Elisa Negrini, Institute for Pure and Applied Mathematics, University of California Los Angeles
- T. H. Molena Nguyen, North Carolina State University
- Djordje Nikolic, University of California, Santa Barbara
- Nikolas Nüsken, King’s College London
- Marcel Nutz, Columbia University
- Daniel Packer, The Ohio State University
- Ali Pakniyat, University of Alabama
- Soumik Pal, University of Washington, Seattle
- Ioannis (Yannis) Pantazis, Foundation for Research and Technology - Hellas
- Farhad Pourkamali Anaraki, University of Colorado Denver
- Yusuf Qaddura, The Ohio State University
- Michael Rawson, Pacific Northwest National Lab
- Luc Rey-Bellet, UMass Amherst
- Bety Rostandy, Brightseed Bio
- Bjorn Sandstede, Brown University
- Geoffrey Schiebinger, University of British Columbia
- Bodhisattva Sen, Columbia University
- Zhaiming Shen, University of Georgia
- Yunpeng Shi, Princeton University
- Nimita Shinde, ICERM, Brown University
- Reed Spitzer, Brown University
- Stephan Sturm, WPI
- Shashank Sule, University of Maryland, College Park
- Sui Tang, University of California Santa Barbara
- Allen Tannenbaum, Stony Brook University
- Sumati Thareja, Vanderbilt University
- Mohammad Taha Toghani, Rice University
- Cesar Uribe, Rice University
- Brian Van Koten, University of Massachusetts, Amherst
- Dootika Vats, Indian Institute of Technology Kanpur
- Hong Wang, University of South Carolina
- Yi Wang, Johns Hopkins University
- Jeremy Wang, Brown University
- Matthew Werenski, Tufts University
- Yangyang Xu, Rensselaer Polytechnic Institute
- Tinghui Xu, University of Wisconsin-Madison
- Sheng Xu, Princeton University
- Xingchi Yan, Harvard University
- Ruiyi Yang, Princeton University
- Bowen Yang, Caltech
- Shixuan Zhang, Brown University
- Stephen Zhang, University of Melbourne
- Benjamin Zhang, University of Massachusetts Amherst
- Jiwei Zhao, University of Wisconsin-Madison
- Bohan Zhou, Dartmouth College
- Xiang Zhou, City University of Hong Kong
- Wei Zhu, University of Massachusetts Amherst
Workshop Schedule
Monday, May 8, 2023
- 8:50 - 9:00 am EDT: Welcome (11th Floor Lecture Hall)
- 9:00 - 9:45 am EDT: Network Analysis of High Dimensional Data (11th Floor Lecture Hall)
- Speaker
- Allen Tannenbaum, Stony Brook University
- Session Chair
- James Murphy, Tufts University
Abstract
A major problem in data science is representation of data so that the variables driving key functions can be uncovered and explored. Correlation analysis is widely used to simplify networks of feature variables by reducing redundancies, but makes limited use of the network topology, relying on comparison of direct neighbor variables. The proposed method incorporates relational or functional profiles of neighboring variables along multiple common neighbors, which are fitted with Gaussian mixture models and compared using a data metric based on a version of optimal mass transport tailored to Gaussian mixtures. Hierarchical interactive visualization of the result leads to effective unbiased hypothesis generation. We will discuss several applications to medical imaging and cancer networks.
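To make the mixture-tailored transport metric concrete, here is a minimal sketch (not the speaker's code) of one standard construction for OT between Gaussian mixtures: component-to-component costs are given by the closed-form W2 distance between Gaussians, and the components are coupled by a discrete OT problem on the mixture weights. It assumes the POT library (`ot.emd`); the toy mixtures are invented for illustration.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)
from scipy.linalg import sqrtm

def w2_gaussian_sq(m1, S1, m2, S2):
    """Closed-form squared W2 distance between two Gaussians."""
    rS2 = sqrtm(S2)
    cross = sqrtm(rS2 @ S1 @ rS2)
    return np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross))

def gmm_ot(w1, means1, covs1, w2, means2, covs2):
    """Discrete OT between mixture components; returns cost and plan."""
    C = np.array([[w2_gaussian_sq(m1, S1, m2, S2)
                   for m2, S2 in zip(means2, covs2)]
                  for m1, S1 in zip(means1, covs1)])
    plan = ot.emd(w1, w2, C)          # optimal coupling of component weights
    return np.sum(plan * C), plan

# Toy example: two 2-component mixtures in R^2.
w1 = np.array([0.5, 0.5]); w2 = np.array([0.3, 0.7])
m1 = [np.zeros(2), np.ones(2)];       m2 = [np.ones(2), 2 * np.ones(2)]
c1 = [np.eye(2), 0.5 * np.eye(2)];    c2 = [np.eye(2), np.eye(2)]
cost, plan = gmm_ot(w1, m1, c1, w2, m2, c2)
print(cost, plan)
```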
- 10:00 - 10:30 am EDT: Coffee Break (11th Floor Collaborative Space)
- 10:30 - 11:15 am EDT: Signed Cumulative Distribution Transform for Machine Learning (11th Floor Lecture Hall)
- Speaker
- Sumati Thareja, Vanderbilt University
- Session Chair
- James Murphy, Tufts University
Abstract
Classification and estimation problems are at the core of machine learning. In this talk we will see a new mathematical signal transform that renders data easy to classify or estimate, based on a very old theory of transportation that was started by Monge. We will learn about the existing Cumulative Distribution Transform and then extend to a more general measure theoretic framework, to define the new transform (Signed Cumulative Distribution Transform). We will look at both forward (analysis) and inverse (synthesis) formulas for the transform, and describe several of its properties including translation, scaling, convexity and isometry. Finally, we will demonstrate two applications of the transform in classifying (detecting) signals under random displacements and estimation of signal parameters under such displacements.
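A hedged numerical sketch of the (unsigned) Cumulative Distribution Transform may help fix ideas; conventions vary (normalization, choice of reference), and the Signed CDT of the talk applies the same construction separately to the positive and negative parts of a signal. Here T = F_s^{-1} ∘ F_r is computed on a grid; the translation example mirrors the transform's translation property.

```python
import numpy as np

def cdt(x, s, r):
    """Numerical CDT of density s w.r.t. reference r on grid x:
    T(x) = F_s^{-1}(F_r(x))."""
    Fs = np.cumsum(s); Fs = Fs / Fs[-1]   # discrete CDFs (rectangle rule)
    Fr = np.cumsum(r); Fr = Fr / Fr[-1]
    # invert F_s by monotone interpolation, then compose with F_r
    return np.interp(Fr, Fs, x)

x = np.linspace(-5, 5, 1001)
ref = np.exp(-x**2 / 2)                   # Gaussian reference (unnormalized ok)
sig = np.exp(-(x - 1.5)**2 / 2)           # the same density translated by 1.5
T = cdt(x, sig, ref)
mid = (x > -3) & (x < 3)                  # away from poorly resolved tails
print(np.allclose(T[mid] - x[mid], 1.5, atol=0.05))  # translation -> shift by 1.5
```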
- 11:30 am - 12:15 pm EDT: Towards a Mathematical Theory of Development (11th Floor Lecture Hall)
- Virtual Speaker
- Geoffrey Schiebinger, University of British Columbia
- Session Chair
- James Murphy, Tufts University
Abstract
This talk introduces a mathematical theory of developmental biology, based on optimal transport. While, in principle, organisms are made of molecules whose motions are described by the Schrödinger equation, there are simply too many molecules for this to be useful. Optimal transport, a fascinating topic in its own right at the intersection of probability, statistics and optimization, provides a set of equations that describe development at the level of cells. Biology has entered a new era of precision measurement and massive datasets. Techniques like single-cell RNA sequencing (scRNA-seq) and single-cell ATAC-seq have emerged as powerful tools to profile cell states at unprecedented molecular resolution. One of the most exciting prospects associated with this new trove of data is the possibility of studying temporal processes, such as differentiation and development. If we could understand the genetic forces that control embryonic development, then we would have a better idea of how cell types are stabilized throughout adult life and how they destabilize with age or in diseases like cancer. This would be within reach if we could analyze the dynamic changes in gene expression, as populations develop and subpopulations differentiate. However, this is not directly possible with current measurement technologies because they are destructive (e.g. cells must be lysed to measure expression profiles). Therefore, we cannot directly observe the waves of transcriptional patterns that dictate changes in cell type. This talk introduces a rigorous framework for understanding the developmental trajectories of cells in a dynamically changing, heterogeneous population based on static snapshots along a time-course. The framework is based on a simple hypothesis: over short time-scales cells can only change their expression profile by small amounts. We formulate this in precise mathematical terms using a classical tool called optimal transport (OT), and we propose that this optimal transport hypothesis is a fundamental mathematical principle of developmental biology.
- 12:30 - 2:00 pm EDT: Lunch - Optimal transport: Juniors and seniors (Working Lunch)
- 2:00 - 2:45 pm EDT: Multivariate Distribution-free testing using Optimal Transport (11th Floor Lecture Hall)
- Speaker
- Bodhisattva Sen, Columbia University
- Session Chair
- James Murphy, Tufts University
Abstract
We propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of optimal transport (see e.g., Villani (2003)). We demonstrate the applicability of this approach by constructing exactly distribution-free tests for two classical nonparametric problems: (i) testing for the equality of two multivariate distributions, and (ii) testing for mutual independence between two random vectors. In particular, we propose (multivariate) rank versions of Hotelling T^2 and kernel two-sample tests (e.g., Gretton et al. (2012), Szekely and Rizzo (2013)), and kernel tests for independence (e.g., Gretton et al. (2007), Szekely et al. (2007)) for scenarios (i) and (ii) respectively. We investigate the consistency and asymptotic distributions of these tests, both under the null and local contiguous alternatives. We also study the local power and asymptotic (Pitman) efficiency of these multivariate tests (based on optimal transport), and show that a subclass of these tests achieve attractive efficiency lower bounds that mimic the remarkable efficiency results of Hodges and Lehmann (1956) and Chernoff and Savage (1958) (for the Wilcoxon-rank sum test). To the best of our knowledge, these are the first collection of multivariate, nonparametric, exactly distribution-free tests that provably achieve such attractive efficiency lower bounds. We also study the rates of convergence of the rank maps (aka optimal transport maps).
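As a rough illustration of the multivariate ranks underlying these tests, the empirical rank map can be computed as an optimal assignment between the data and a fixed set of "uniform" reference points (the choice of reference set below, a regular grid on the unit square, is one common option; low-discrepancy sequences are another). This is a sketch under that reading, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n_side = 10
n = n_side ** 2
# reference: regular grid in the unit square
g = (np.arange(n_side) + 0.5) / n_side
ref = np.array([(a, b) for a in g for b in g])
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n)  # data

cost = cdist(X, ref, metric="sqeuclidean")
rows, cols = linear_sum_assignment(cost)   # optimal (Monge) assignment
ranks = np.empty_like(ref)
ranks[rows] = ref[cols]                    # rank of X[i] is its grid partner
print(ranks[:5])
```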
- 3:00 - 3:30 pm EDT: Coffee Break (11th Floor Collaborative Space)
- 3:30 - 4:15 pm EDT: Optimal transport for estimating generalization properties of machine learning models (11th Floor Lecture Hall)
- Speaker
- Stefanie Jegelka, MIT
- Session Chair
- James Murphy, Tufts University
Abstract
One important challenge in practical machine learning is to estimate the generalization properties of a trained model, i.e., judging how well it will perform on unseen data. In this talk, we will discuss two examples of how optimal transport can help with this challenge. First, we address the problem of estimating generalization of deep neural networks. A critical factor is the geometry of the data in the latent embedding space. We analyze this data arrangement via an optimal-transport-based generalization of variance, and show its theoretical and empirical relevance via generalization bounds that are also empirically predictive. Second, we study the stability of neural networks for graph inputs, i.e., graph neural networks (GNNs), under shifts of the data distribution. In particular, to derive stability bounds, we need a suitable metric in the input space of graphs. We derive such a (pseudo)metric targeted to GNNs via a recursive optimal transport based distance between sets of trees. Our metric correlates better than state of the art with the behavior of GNNs under data distribution shifts.
- 4:30 - 6:30 pm EDT: Reception (11th Floor Collaborative Space)
Tuesday, May 9, 2023
- 9:00 - 9:45 am EDT: On the Convergence Rate of Sinkhorn’s Algorithm (11th Floor Lecture Hall)
- Speaker
- Marcel Nutz, Columbia University
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
We study Sinkhorn's algorithm for solving the entropically regularized optimal transport problem. Its iterate π_t is shown to satisfy H(π_t|π∗)+H(π∗|π_t)=O(1/t) where H denotes relative entropy and π∗ the optimal coupling. This holds for a large class of cost functions and marginals, including quadratic cost with subgaussian marginals. We also obtain the rate O(1/t) for the dual suboptimality and O(1/t^2) for the marginal entropies. More precisely, we derive non-asymptotic bounds, and in contrast to previous results on linear convergence that are limited to bounded costs, our estimates do not deteriorate exponentially with the regularization parameter. We also obtain a stability result for π∗ as a function of the marginals, quantified in relative entropy.
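For readers unfamiliar with the algorithm whose iterates π_t the talk analyzes, here is a minimal Sinkhorn implementation for discrete marginals (a sketch for illustration, not the paper's code; the toy data are invented):

```python
import numpy as np

def sinkhorn(a, b, C, eps, iters=500):
    """Entropically regularized OT between discrete marginals a and b."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                 # scale to match second marginal
        u = a / (K @ v)                   # scale to match first marginal
    return u[:, None] * K * v[None, :]    # the coupling pi_t

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2)); y = rng.normal(loc=1.0, size=(60, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # quadratic cost
a = np.full(50, 1 / 50); b = np.full(60, 1 / 60)
pi = sinkhorn(a, b, C, eps=0.5)
print(pi.sum(axis=1)[:3], pi.sum(axis=0)[:3])        # ~ marginals a and b
```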
- 10:00 - 10:30 am EDT: Coffee Break (11th Floor Collaborative Space)
- 10:30 - 11:15 am EDT: Applications of No-Collision Transportation Maps in Manifold Learning (11th Floor Lecture Hall)
- Speaker
- Elisa Negrini, Institute for Pure and Applied Mathematics, University of California Los Angeles
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
In this work, we investigate applications of no-collision transportation maps introduced in [Nurbekyan et al., 2020] in manifold learning for image data. Recently, there has been a surge in applying transportation-based distances and features for data representing motion-like or deformation-like phenomena. Indeed, comparing intensities at fixed locations often does not reveal the data structure. No-collision maps and distances developed in [Nurbekyan et al., 2020] are sensitive to geometric features similar to optimal transportation (OT) maps but much cheaper to compute due to the absence of optimization. In this work, we prove that no-collision distances provide an isometry between translations (respectively dilations) of a single probability measure and the translation (respectively dilation) vectors equipped with a Euclidean distance. Furthermore, we prove that no-collision transportation maps, as well as OT and linearized OT maps, do not in general provide an isometry for rotations. The numerical experiments confirm our theoretical findings and show that no-collision distances achieve similar or better performance on several manifold learning tasks compared to other OT and Euclidean-based methods at a fraction of the computational cost.
- 11:30 am - 12:15 pm EDT: Graphical Optimal Transport and its applications (11th Floor Lecture Hall)
- Speaker
- Yongxin Chen, Georgia Institute of Technology
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
Multi-marginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for MOT by leveraging the well-developed algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games.
- 12:30 - 2:00 pm EDT: Lunch/Free Time
- 2:00 - 2:45 pm EDT: Advances in Distributionally Robust Optimization (DRO): Unifications, Extensions, and Applications (11th Floor Lecture Hall)
- Speaker
- Jose Blanchet, Stanford
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
We will discuss recent developments in distributionally robust optimization, including a tractable class of problems that simultaneously unifies and extends most of the formulations studied in DRO (including phi-divergence, inverse-phi-divergence, Wasserstein, and Sinkhorn). This unification is based on optimal transport theory with martingale constraints. We discuss various benefits of having the flexibility offered by these formulations in connection with, for example, the theory of epi-convergence and statistical robustness. We apply some of these new developments to optimal portfolio selection. Our implementations are motivated by intriguing experiments which show an unexpected out-of-sample performance of non-robust policies in real data. This talk is partly based on joint work with Daniel Kuhn, Jiajin Li, Yiping Lu, and Bahar Taskesen.
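One tractable member of the phi-divergence family mentioned above can be made concrete. With a KL-divergence ambiguity set, the worst-case expected loss sup over {Q : KL(Q|P) ≤ ρ} of E_Q[loss] admits the standard one-dimensional dual inf over λ > 0 of λρ + λ log E_P[exp(loss/λ)] (a known duality result, not specific to this talk). A sketch with synthetic losses:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
loss = rng.normal(1.0, 0.5, size=10_000)   # per-sample losses under P
rho = 0.1                                  # radius of the KL ball

def dual(lam):
    # numerically stable log-mean-exp of loss/lam
    z = loss / lam
    lme = np.log(np.mean(np.exp(z - z.max()))) + z.max()
    return lam * rho + lam * lme

res = minimize_scalar(dual, bounds=(1e-3, 100.0), method="bounded")
print("empirical mean:", loss.mean(), "worst-case mean:", res.fun)
```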
- 3:00 - 5:00 pm EDT
Wednesday, May 10, 2023
- 9:00 - 9:45 am EDT: Mirror gradient flows: Euclidean and Wasserstein (11th Floor Lecture Hall)
- Speaker
- Soumik Pal, University of Washington, Seattle
- Session Chair
- Markos Katsoulakis, University of Massachusetts Amherst
Abstract
We will talk about a new family of Wasserstein gradient flows that is inspired by Euclidean mirror gradient flows. These flows can often display faster convergence rates than the usual gradient flows. They have rich geometrical structures and give rise to a wide generalization of the Langevin diffusions and the Fokker-Planck PDEs. Immediate applications come from considering limits of Sinkhorn iterations.
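The Euclidean picture that inspires these flows can be sketched directly: mirror gradient descent replaces the identity "mirror map" of plain gradient descent with the gradient of a convex potential. A minimal sketch on the probability simplex with the entropy mirror map, which yields the classical exponentiated-gradient update (an illustration of the Euclidean side only, not the Wasserstein construction of the talk):

```python
import numpy as np

def mirror_descent(grad_f, x0, step=0.1, iters=200):
    x = x0.copy()
    for _ in range(iters):
        # entropy mirror map grad(phi)(x) = 1 + log x: step in the dual,
        # then map back; normalization is the Bregman projection onto the simplex
        x = x * np.exp(-step * grad_f(x))
        x /= x.sum()
    return x

Q = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda x: Q @ x                 # f(x) = x^T Q x / 2 on the simplex
print(mirror_descent(grad, np.array([0.5, 0.5])))   # ~ (0.25, 0.75)
```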
- 10:00 - 10:30 am EDT: Coffee Break (11th Floor Collaborative Space)
- 10:30 - 11:15 am EDT: Many Processors, Little Time: MCMC for Partitions via Optimal Transport Couplings (11th Floor Lecture Hall)
- Speaker
- Tamara Broderick, Massachusetts Institute of Technology
- Session Chair
- Luc Rey-Bellet, UMass Amherst
Abstract
Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the “label-switching problem”: semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions’ (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime.
- 11:30 am - 12:15 pm EDT: On Hamilton-Jacobi (HJ) equations on the Wasserstein space on graphs (11th Floor Lecture Hall)
- Speaker
- Wilfrid Gangbo, UCLA
- Session Chair
- Markos Katsoulakis, University of Massachusetts Amherst
- 12:25 - 12:30 pm EDT: Group Photo, immediately after talk (11th Floor Lecture Hall)
- 12:30 - 2:00 pm EDT: Lunch/Free Time
- 2:00 - 2:45 pm EDT: Certifiable low-dimensional structure in transport and inference (11th Floor Lecture Hall)
- Speaker
- Youssef Marzouk, Massachusetts Institute of Technology
- Session Chair
- Markos Katsoulakis, University of Massachusetts Amherst
Abstract
I will discuss two notions of low-dimensional structure in probability measures, and their interplay with transport-driven methods for sampling and approximate inference. The first seeks to approximate a high-dimensional target measure as a low-dimensional update of a dominating reference measure. The second is low-rank conditional structure, where the goal is to replace conditioning variables with low-dimensional projections or summaries. In both cases, under appropriate assumptions on the reference or target measures, we can derive gradient-based upper bounds on the associated approximation error and minimize these bounds to identify good subspaces for approximation. The associated subspaces then dictate specific structural ansatzes for transport maps that represent the target of interest as the pushforward or pullback of a suitable reference measure. I will show several algorithmic instantiations of this idea: a greedy algorithm that builds deep compositions of maps, where low-dimensional projections of the parameters are iteratively transformed to match the target; and a simulation-based inference algorithm that uses low-rank conditional structure to efficiently solve Bayesian inverse problems. Based on joint work with Ricardo Baptista, Michael Brennan, and Olivier Zahm.
- 3:00 - 3:30 pm EDT: Coffee Break (11th Floor Collaborative Space)
- 3:30 - 4:15 pm EDT: Optimal Mass Transport meets Stochastic Thermodynamics: Dissipation & Power in Physics and Biology (11th Floor Lecture Hall)
- Speaker
- Tryphon Georgiou, University of California, Irvine
- Session Chair
- Markos Katsoulakis, University of Massachusetts Amherst
Abstract
The discovery in 1998 of a link between the Wasserstein-2 metric, entropy, and the heat equation, by Jordan, Kinderlehrer, and Otto, precipitated the increasing relevance of optimal mass transport in the evolving theory of finite-time thermodynamics, aka stochastic energetics. Specifically, dissipation in finite-time thermodynamic transitions for Langevin models of colloidal particles can be measured in terms of the Wasserstein length of trajectories. This new insight has led to quantifying the power and efficiency of thermodynamic cycles that supersede classical quasi-static Carnot engine concepts, which alternate their contact between heat baths of different temperatures. Indeed, naturally occurring processes often harvest energy from temperature or chemical gradients, where the enabling mechanism responsible for the transduction of energy relies on non-equilibrium steady states and finite-time cycling. Optimal mass transport provides the geometric structure of the manifold of thermodynamic states for studying energy harvesting mechanisms. In this framework, dissipation and work output can be expressed as path and area integrals, and fundamental limitations on power and efficiency can be cast in geometric terms, leading to isoperimetric problems. The analysis presented provides guiding principles for building autonomous engines that extract work from thermal or chemical anisotropy in the environment.
Thursday, May 11, 2023
- 9:00 - 9:45 am EDT: Wasserstein Isometric Mapping and Image Manifold Learning (11th Floor Lecture Hall)
- Speaker
- Keaton Hamm, University of Texas at Arlington
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
We will discuss an algorithm called Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. We will discuss computational speedups to the algorithm such as use of linearized optimal transport or the Nyström method. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
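A hedged sketch of the pipeline the abstract describes, on point-cloud "images": compute pairwise W2 distances (here by exact assignment between equal-size uniform clouds) and feed the squared distance matrix to classical multidimensional scaling. For translates of a single template, W2 equals the distance between shift vectors, so the embedding recovers the translation parameters up to rigid motion, consistent with the exact-recovery claim above. This is an illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def w2_pointclouds(X, Y):
    """Exact W2 between uniform measures on two equal-size point clouds."""
    C = cdist(X, Y, metric="sqeuclidean")
    r, c = linear_sum_assignment(C)
    return np.sqrt(C[r, c].mean())

def classical_mds(D2, dim=2):
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                     # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

rng = np.random.default_rng(0)
template = rng.normal(size=(60, 2))           # a fixed generating point cloud
shifts = rng.uniform(-2, 2, size=(15, 2))     # manifold of translates
clouds = [template + s for s in shifts]
D2 = np.array([[w2_pointclouds(a, b) ** 2 for b in clouds] for a in clouds])
emb = classical_mds(D2)
print(emb.shape)   # (15, 2): the shift parameters up to a rigid motion
```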
- 10:00 - 10:30 am EDT: Coffee Break (11th Floor Collaborative Space)
- 10:30 - 11:15 am EDT: Function-space regularized divergences for machine learning applications (11th Floor Lecture Hall)
- Speaker
- Ioannis (Yannis) Pantazis, Foundation for Research and Technology - Hellas
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
Divergences such as Kullback-Leibler, Rényi and f-divergences play an increasingly important role in probabilistic machine learning, offering a notion of distance between probability distributions. In the recent past, divergence estimation has been developed on the premise of variational formulas and function parametrization via neural networks. Despite the successes, the statistical estimation of a divergence is still considered a very challenging problem, mainly due to the high variance of neural-based estimators. Particularly hard cases include high-dimensional data, large divergence values, and Rényi divergence when its order is larger than one. Our recent work focuses on reducing the variance by regularizing the function space of the variational formulas. We will present novel families of function-space regularized divergences that enjoy enhanced statistical estimation, and we will discuss their properties. These divergences have been tested on a series of ML applications including generative adversarial networks, mutual information estimation and rare sub-population detection.
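For context, the neural variational estimators the abstract refers to look roughly like the following sketch of the Donsker-Varadhan formula KL(P||Q) = sup_f E_P[f] - log E_Q[e^f], optimized over a small network (PyTorch is assumed). The function-space regularization that is the talk's subject would constrain f, for example via a Lipschitz bound; that step is omitted here.

```python
import torch

torch.manual_seed(0)
P = torch.randn(4096, 1) + 1.0              # samples from N(1,1)
Q = torch.randn(4096, 1)                    # samples from N(0,1)
f = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 1))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
logN = torch.log(torch.tensor(float(len(Q))))
for _ in range(2000):
    # maximize E_P[f] - log mean_Q e^f  (Donsker-Varadhan lower bound)
    loss = -(f(P).mean() - (torch.logsumexp(f(Q), dim=0).squeeze() - logN))
    opt.zero_grad(); loss.backward(); opt.step()
print("DV estimate:", -loss.item(), "true KL:", 0.5)  # KL(N(1,1)||N(0,1)) = 1/2
```

The estimator's variance is visible in practice by rerunning with different seeds, which is exactly the issue the regularized families above aim to reduce.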
- 11:30 am - 12:15 pm EDT: Controlling regularized conservation laws: Entropy-entropy flux pairs (11th Floor Lecture Hall)
- Speaker
- Wuchen Li, University of South Carolina
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
In this talk, we study variational problems for regularized conservation laws with Lax's entropy-entropy flux pairs. We first introduce a modified optimal transport space based on conservation laws with diffusion. Using this space, we demonstrate that conservation laws with diffusion are flux-gradient flows. We next construct variational problems for these flows, for which we derive dual PDE systems for regularized conservation laws. Several examples, including traffic flow and Burgers' equation, are presented. We successfully compute the control of conservation laws by incorporating both primal-dual algorithms and monotone schemes. This is based on joint work with Siting Liu and Stanley Osher.
- 12:30 - 2:00 pm EDT: Lunch - Outlook and future directions for OT in Data Science and Machine Learning (Working Lunch)
- 2:00 - 2:45 pm EDT: Triangular transport for learning probabilistic graphical models (11th Floor Lecture Hall)
- Speaker
- Rebecca Morrison, University of Colorado Boulder
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
Probabilistic graphical models encode the conditional independence properties satisfied by a joint probability distribution. If the distribution is Gaussian, the edges of an undirected graphical model correspond to non-zero entries of the precision matrix. Generalizing this result to continuous non-Gaussian distributions, one can show that an edge exists if and only if an entry of the Hessian of the log density is non-zero (everywhere). But evaluation of the log density requires density estimation: for this, we propose the graph-learning algorithm SING (Sparsity Identification in Non-Gaussian distributions), which uses triangular transport for the density estimation step; this choice is advantageous as triangular maps inherit sparsity from conditional independence in the target distribution. Loosely speaking, the more non-Gaussian the distribution, the more difficult the transport problem. For a broad class of non-Gaussian distributions, however, estimating the Hessian of the log density is much easier than estimating the density itself. For the transport community, this result serves as a sort of goal-oriented transport framework, in which the particular goal of graph learning greatly simplifies the transport problem.
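The Gaussian special case stated at the start of the abstract is easy to see numerically: edges of the undirected graphical model are exactly the non-zero off-diagonal entries of the precision matrix. A sketch of that base case (SING generalizes it to non-Gaussian targets via Hessians of the log density estimated through triangular transport, which this sketch does not attempt):

```python
import numpy as np

rng = np.random.default_rng(0)
# chain graph 0-1-2-3 encoded by a tridiagonal precision matrix
Theta = np.diag(np.full(4, 2.0)) + np.diag(np.full(3, -0.9), 1) \
        + np.diag(np.full(3, -0.9), -1)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(Theta), size=20_000)
Theta_hat = np.linalg.inv(np.cov(X.T))      # estimated precision matrix
edges = np.abs(Theta_hat) > 0.2             # threshold near-zero entries
print(edges.astype(int))                    # recovers the chain's sparsity pattern
```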
- 3:00 - 3:30 pm EDT: Coffee Break (11th Floor Collaborative Space)
- 3:30 - 4:15 pm EDT: Stein transport for Bayesian inference (11th Floor Lecture Hall)
- Speaker
- Nikolas Nüsken, King’s College London
- Session Chair
- Shuchin Aeron, Tufts University
Abstract
This talk is about Stein transport, a novel methodology for Bayesian inference that pushes an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can equivalently be obtained from either a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. I will discuss the geometric underpinnings of Stein transport and SVGD, and - time permitting - connections to MCMC and the theory of large deviations.
Friday, May 12, 2023
- 9:00 - 9:45 am EDT: Lipschitz regularized gradient flows and latent generative particles (11th Floor Lecture Hall)
- Speaker
- Panagiota Birmpa, Heriot-Watt University
- Session Chair
- Markos Katsoulakis, University of Massachusetts Amherst
Abstract
Lipschitz regularized f-divergences interpolate between the Wasserstein metric and f-divergences and provide a flexible family of loss functions for non-absolutely continuous distributions (e.g. empirical measures), possibly with heavy tails. We construct gradient flows based on those divergences taking advantage of neural network spectral normalization (a closely related form of Lipschitz regularization). The Lipschitz regularized gradient flows induce a transport/discriminator particle algorithm where generative particles are moved along a vector field given by the gradient of the discriminator, the latter computed as in generative adversarial networks (GANs). The particle system generates approximate samples from typically high-dimensional distributions known only from data. Examples of such gradient flows are Lipschitz-regularized Fokker-Planck and porous medium equations for Kullback-Leibler and alpha-divergences respectively. Such PDE perspectives allow the analysis of the algorithm’s stability and convergence, for instance through an empirical, Lipschitz regularized, version of Fisher information which tracks the convergence of the algorithms.
- 10:00 - 10:30 am EDT: Coffee Break (11th Floor Collaborative Space)
- 10:30 - 11:15 am EDT: Approximations and learning in the Wasserstein space (11th Floor Lecture Hall)
- Speaker
- Caroline Moosmüller, University of North Carolina at Chapel Hill
- Session Chair
- Luc Rey-Bellet, UMass Amherst
Abstract
Detecting differences and building classifiers between distributions, given only finite samples, are important tasks in a number of scientific fields. Optimal transport and the Wasserstein distance have evolved as the most natural concept to deal with such tasks, but have some computational drawbacks. In this talk, we describe an approximation framework through local linearizations that significantly reduces both the computational effort and the required training data in supervised learning settings. We also introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. We provide guarantees on the embedding quality, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. The proposed algorithms are demonstrated in pattern recognition tasks in imaging and medical applications.
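A hedged sketch of the linearized OT (LOT) idea behind such frameworks: fix a reference measure, compute the optimal map from it to each target, and compare targets through those maps in the (Euclidean) tangent space at the reference. For translates of a single cloud the linearization is exact, so Euclidean distances between embeddings match the W2 distances. This illustrates the general technique, not the talk's specific algorithms.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def lot_embedding(ref, target):
    """Monge map values T(ref_i) for uniform equal-size point clouds."""
    C = cdist(ref, target, metric="sqeuclidean")
    r, c = linear_sum_assignment(C)
    T = np.empty_like(ref); T[r] = target[c]
    return T.ravel()                        # embed the measure as a flat vector

rng = np.random.default_rng(0)
ref = rng.normal(size=(80, 2))              # reference point cloud
targets = [ref + s for s in rng.uniform(-1, 1, size=(10, 2))]
E = np.array([lot_embedding(ref, t) for t in targets])
# Euclidean distance between embeddings ~ W2 between the measures
print(np.linalg.norm(E[0] - E[1]) / np.sqrt(80))
```

The payoff is that, after one OT solve per measure, pairwise comparisons become cheap Euclidean operations, which is what enables the reduced computational effort described above.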
- 11:30 am - 12:15 pm EDT: Optimal transport problems with interaction effects (11th Floor Lecture Hall)
- Speaker
- Nestor Guillen, Texas State University
- Session Chair
- Luc Rey-Bellet, UMass Amherst
Abstract
We consider two variations on the optimal transportation problem where the particles/agents being transported interact with each other. For instance, imagine the problem of moving a collection of boxes from one configuration to another, where all the boxes move in unison and must avoid each other. As we shall show, these problems can be posed as quadratic optimization problems in the space of probability measures over the space of paths. Although the resulting optimization problem is not always convex, one can show existence and even uniqueness for some types of interactions. Moreover, we show these problems admit a fluid mechanics formulation in the style of Benamou and Brenier. This talk is based on works in collaboration with René Cabrera (UT Austin) and Jacob Homerosky (Texas State).
- 12:30 - 2:00 pm EDT: Lunch/Free Time
- 2:00 - 2:45 pm EDT: On geometric properties of sliced optimal transport metrics (11th Floor Lecture Hall)
- Speaker
- Jun Kitagawa, Michigan State University
- Session Chair
- Luc Rey-Bellet, UMass Amherst
Abstract
The sliced and max sliced Wasserstein metrics were originally proposed as a way to use 1D transport to speed up computation of the usual optimal transport metrics defined on spaces of probability measures. Some basic results are known about their metric structure, but not much is available in the way of a systematic study. In this talk, I will first discuss some further properties of these sliced metrics. Then, I will introduce a larger family of metric spaces into which these metrics can be embedded, which seem to have more desirable geometric properties. This talk is based on joint work with Asuka Takatsu.
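For reference, the sliced Wasserstein distance the talk starts from can be sketched in a few lines: average the 1D W2 costs of projections onto random directions, where each 1D cost is computed by sorting, since the 1D optimal map is monotone. A minimal sketch with Monte Carlo directions and equal sample sizes (the toy data are invented):

```python
import numpy as np

def sliced_w2(X, Y, n_proj=200, rng=None):
    """Monte Carlo sliced W2 between equal-size empirical measures."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)            # uniform direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)          # 1D quadratic OT via sorting
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)); Y = rng.normal(loc=0.5, size=(500, 3))
print(sliced_w2(X, Y))
```

The max-sliced variant replaces the average over directions with a supremum, which is part of what changes the metric geometry studied in the talk.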
- 3:00 - 3:30 pm EDT: Coffee Break (11th Floor Collaborative Space)
- 3:30 - 4:15 pm EDT: Matching for causal effects via multimarginal unbalanced optimal transport (11th Floor Lecture Hall)
- Speaker
- Florian Gunsilius, University of Michigan
- Session Chair
- Luc Rey-Bellet, UMass Amherst
Abstract
Matching on covariates is a well-established framework for estimating causal effects in observational studies. A major challenge is that established methods like matching via nearest neighbors possess poor statistical properties when the dimension of the continuous covariates is high. This article introduces an alternative matching approach based on unbalanced optimal transport that possesses better statistical properties in high-dimensional settings. In particular, we prove that the proposed method dominates classical nearest neighbor matching in mean squared error in finite samples when the dimension of the continuous covariates is high enough. Notably, this advantage is already present in low dimensions, as we demonstrate in simulations. It follows from two properties of the new estimator. First, for any positive “matching radius”, the optimal matching obtained converges at the parametric rate in any dimension to the optimal population matching. This stands in contrast to classical nearest neighbor matching, which suffers from a curse of dimensionality in the continuous covariates. Second, as the matching radius converges to zero, the method is unbiased in the population for the average treatment effect on the overlapping region. The approach also possesses several other desirable properties: it is flexible in allowing for many different ways to define the matching radius and the cost of matching, can be bootstrapped for inference, provides interpretable weights based on the cost of matching individuals, can be efficiently implemented via Sinkhorn iterations, and can match several treatment arms simultaneously. Importantly, it only selects good matches from any treatment arm, thus providing unbiased estimates of average treatment effects in the region of overlapping supports.
All event times are listed in ICERM local time in Providence, RI (Eastern Daylight Time / UTC-4).