Abstract

This workshop will focus on the intersection of mathematics, statistics, machine learning, and computation, viewed through the lens of optimal transport (OT). Mathematical topics will include low-dimensional models for OT, linearizations of OT, and the geometry of OT, including gradient flows and gradient descent in the space of measures. Relevant statistical topics will include reliable and efficient estimation of OT plans in high dimensions and the role of regularization in computing OT distances and plans, with applications to robust statistics, uncertainty quantification, and overparameterized machine learning. Computation will be a recurring theme of the workshop, with emphasis on the development of fast algorithms and applications to computational biology, high energy physics, materials science, spatio-temporal modeling, natural language processing, and image processing.

Image for "Optimal Transport in Data Science"

Confirmed Speakers & Participants

Talks will be presented virtually or in-person as indicated in the schedule below.

  • Shuchin Aeron
    Tufts University
  • Kang An
    Rice University
  • Ricardo Baptista
    California Institute of Technology
  • Panagiota Birmpa
    Heriot-Watt University
  • Jose Blanchet
    Stanford
  • Tobias Blickhan
    Max Planck Institute for Plasma Physics
  • Tamara Broderick
    Massachusetts Institute of Technology
  • Dongwei Chen
    Clemson University
  • Ziyu Chen
    University of Massachusetts Amherst
  • Yongxin Chen
    Georgia Institute of Technology
  • Yu-Chen Cheng
    Dana-Farber Cancer Institute
  • Jannatul Ferdous Chhoa
    University of Houston
  • Frank Cole
    University of Massachusetts Amherst
  • Emil Constantinescu
    Argonne National Laboratory
  • Keisha Cook
    Tulane University
  • Yuqing Dai
    Duke University
  • Steve Damelin
    Mathematical Scientist, Ann Arbor MI
  • Fred Daum
    Raytheon
  • Rocio Diaz Martin
    Vanderbilt University
  • Christopher Eads
    University of Texas at Arlington
  • Gabriel Earle
    University of Massachusetts Amherst
  • Ranthony Edmonds
    The Ohio State University
  • Marcia Fampa
    Federal University of Rio de Janeiro
  • Weifu Fang
    Wright State University
  • Guosheng Fu
    University of Notre Dame
  • Wilfrid Gangbo
    UCLA
  • Tryphon Georgiou
    University of California, Irvine
  • Hyemin Gu
    University of Massachusetts Amherst
  • Nestor Guillen
    Texas State University
  • Florian Gunsilius
    University of Michigan
  • Minh Ha Quang
    RIKEN
  • Keaton Hamm
    University of Texas at Arlington
  • Alexander Hsu
    University of Washington
  • Yinan Hu
    New York University
  • Stefanie Jegelka
    MIT
  • Sixian Jin
    Worcester Polytechnic Institute
  • Yijie Jin
    Georgia Institute of Technology
  • Justin Kakeu
    University of Prince Edward Island
  • Markos Katsoulakis
    University of Massachusetts Amherst
  • Varun Khurana
    University of California, San Diego
  • Jun Kitagawa
    Michigan State University
  • Vladimir Kobzar
    Columbia University
  • Marie Jose Kuffner
    Johns Hopkins University
  • Christian Kümmerle
    University of North Carolina at Charlotte
  • Dohyun Kwon
    University of Seoul
  • Ivan Lau
    Rice University
  • Shiying Li
    University of North Carolina at Chapel Hill
  • Wuchen Li
    University of South Carolina
  • Jiaming Liang
    Yale University
  • Wei Liu
    Rensselaer Polytechnic Institute
  • Jun Liu
    Southern Illinois University Edwardsville
  • Yulong Lu
    University of Massachusetts Amherst
  • Miranda Lynch
    Hauptman-Woodward Medical Research Institute
  • Ildebrando Magnani
    University of Michigan
  • Charalambos Makridakis
    Foundation for Research and Technology-Hellas (FORTH)
  • Brendan Mallery
    Tufts University
  • Youssef Marzouk
    Massachusetts Institute of Technology
  • Shoaib Bin Masud
    Tufts University
  • Tyler Maunu
    Brandeis University
  • Henok Mawi
    Howard University
  • Ian Oliver McPherson
    Johns Hopkins University
  • Kun Meng
    Brown University
  • Toshio Mikami
    Tsuda University
  • Martin Molina Fructuoso
    Brandeis University
  • Caroline Moosmüller
    University of North Carolina at Chapel Hill
  • Rebecca Morrison
    University of Colorado Boulder
  • Chenchen Mou
    City University of Hong Kong
  • Lidia Mrad
    Mount Holyoke College
  • James Murphy
    Tufts University
  • Evangelos Nastas
    SUNY
  • Elisa Negrini
    Institute for Pure and Applied Mathematics, University of California Los Angeles
  • T. H. Molena Nguyen
    North Carolina State University
  • Djordje Nikolic
    University of California, Santa Barbara
  • Nikolas Nüsken
    King’s College London
  • Marcel Nutz
    Columbia University
  • Daniel Packer
    The Ohio State University
  • Ali Pakniyat
    University of Alabama
  • Soumik Pal
    University of Washington, Seattle
  • Ioannis (Yannis) Pantazis
    Foundation for Research and Technology - Hellas
  • Farhad Pourkamali Anaraki
    University of Colorado Denver
  • Yusuf Qaddura
    The Ohio State University
  • Michael Rawson
    Pacific Northwest National Lab
  • Luc Rey-Bellet
    UMass Amherst
  • Bety Rostandy
    Brightseed Bio
  • Bjorn Sandstede
    Brown University
  • Geoffrey Schiebinger
    University of British Columbia
  • Bodhisattva Sen
    Columbia University
  • Zhaiming Shen
    University of Georgia
  • Yunpeng Shi
    Princeton University
  • Nimita Shinde
    ICERM, Brown University
  • Reed Spitzer
    Brown University
  • Stephan Sturm
    WPI
  • Shashank Sule
    University of Maryland, College Park
  • Sui Tang
    University of California Santa Barbara
  • Allen Tannenbaum
    Stony Brook University
  • Sumati Thareja
    Vanderbilt University
  • Mohammad Taha Toghani
    Rice University
  • Cesar Uribe
    Rice University
  • Brian Van Koten
    University of Massachusetts, Amherst
  • Dootika Vats
    Indian Institute of Technology Kanpur
  • Hong Wang
    University of South Carolina
  • Yi Wang
    Johns Hopkins University
  • Jeremy Wang
    Brown University
  • Matthew Werenski
    Tufts University
  • Yangyang Xu
    Rensselaer Polytechnic Institute
  • Tinghui Xu
    University of Wisconsin-Madison
  • Sheng Xu
    Princeton University
  • Xingchi Yan
    Harvard University
  • Ruiyi Yang
    Princeton University
  • Bowen Yang
    Caltech
  • Shixuan Zhang
    Brown University
  • Stephen Zhang
    University of Melbourne
  • Benjamin Zhang
    University of Massachusetts Amherst
  • Jiwei Zhao
    University of Wisconsin-Madison
  • Bohan Zhou
    Dartmouth College
  • Xiang Zhou
    City University of Hong Kong
  • Wei Zhu
    University of Massachusetts Amherst

Workshop Schedule

Monday, May 8, 2023
  • 8:50 - 9:00 am EDT
    Welcome
    11th Floor Lecture Hall
  • 9:00 - 9:45 am EDT
    Network Analysis of High Dimensional Data
    11th Floor Lecture Hall
    • Speaker
    • Allen Tannenbaum, Stony Brook University
    • Session Chair
    • James Murphy, Tufts University
    Abstract
    A major problem in data science is representation of data so that the variables driving key functions can be uncovered and explored. Correlation analysis is widely used to simplify networks of feature variables by reducing redundancies, but makes limited use of the network topology, relying on comparison of direct neighbor variables. The proposed method incorporates relational or functional profiles of neighboring variables along multiple common neighbors, which are fitted with Gaussian mixture models and compared using a data metric based on a version of optimal mass transport tailored to Gaussian mixtures. Hierarchical interactive visualization of the result leads to effective unbiased hypothesis generation. We will discuss several applications to medical imaging and cancer networks.
  • 10:00 - 10:30 am EDT
    Coffee Break
    11th Floor Collaborative Space
  • 10:30 - 11:15 am EDT
    Signed Cumulative Distribution Transform for Machine Learning
    11th Floor Lecture Hall
    • Speaker
    • Sumati Thareja, Vanderbilt University
    • Session Chair
    • James Murphy, Tufts University
    Abstract
    Classification and estimation problems are at the core of machine learning. In this talk we will see a new mathematical signal transform that renders data easy to classify or estimate, based on a very old theory of transportation that was started by Monge. We will learn about the existing Cumulative Distribution Transform and then extend to a more general measure theoretic framework, to define the new transform (Signed Cumulative Distribution Transform). We will look at both forward (analysis) and inverse (synthesis) formulas for the transform, and describe several of its properties including translation, scaling, convexity and isometry. Finally, we will demonstrate two applications of the transform in classifying (detecting) signals under random displacements and estimation of signal parameters under such displacements.
  • 11:30 am - 12:15 pm EDT
    Towards a Mathematical Theory of Development
    11th Floor Lecture Hall
    • Virtual Speaker
    • Geoffrey Schiebinger, University of British Columbia
    • Session Chair
    • James Murphy, Tufts University
    Abstract
    This talk introduces a mathematical theory of developmental biology, based on optimal transport. While, in principle, organisms are made of molecules whose motions are described by the Schrödinger equation, there are simply too many molecules for this to be useful. Optimal transport—a fascinating topic in its own right, at the intersection of probability, statistics and optimization—provides a set of equations that describe development at the level of cells. Biology has entered a new era of precision measurement and massive datasets. Techniques like single-cell RNA sequencing (scRNA-seq) and single-cell ATAC-seq have emerged as powerful tools to profile cell states at unprecedented molecular resolution. One of the most exciting prospects associated with this new trove of data is the possibility of studying temporal processes, such as differentiation and development. If we could understand the genetic forces that control embryonic development, then we would have a better idea of how cell types are stabilized throughout adult life and how they destabilize with age or in diseases like cancer. This would be within reach if we could analyze the dynamic changes in gene expression, as populations develop and subpopulations differentiate. However, this is not directly possible with current measurement technologies because they are destructive (e.g. cells must be lysed to measure expression profiles). Therefore, we cannot directly observe the waves of transcriptional patterns that dictate changes in cell type. This talk introduces a rigorous framework for understanding the developmental trajectories of cells in a dynamically changing, heterogeneous population based on static snapshots along a time-course. The framework is based on a simple hypothesis: over short time-scales cells can only change their expression profile by small amounts. We formulate this in precise mathematical terms using a classical tool called optimal transport (OT), and we propose that this optimal transport hypothesis is a fundamental mathematical principle of developmental biology.
  • 12:30 - 2:00 pm EDT
    Lunch - Optimal transport: Junior and seniors
    Working Lunch
  • 2:00 - 2:45 pm EDT
    Multivariate Distribution-free testing using Optimal Transport
    11th Floor Lecture Hall
    • Speaker
    • Bodhisattva Sen, Columbia University
    • Session Chair
    • James Murphy, Tufts University
    Abstract
    We propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of optimal transport (see e.g., Villani (2003)). We demonstrate the applicability of this approach by constructing exactly distribution-free tests for two classical nonparametric problems: (i) testing for the equality of two multivariate distributions, and (ii) testing for mutual independence between two random vectors. In particular, we propose (multivariate) rank versions of Hotelling T^2 and kernel two-sample tests (e.g., Gretton et al. (2012), Szekely and Rizzo (2013)), and kernel tests for independence (e.g., Gretton et al. (2007), Szekely et al. (2007)) for scenarios (i) and (ii) respectively. We investigate the consistency and asymptotic distributions of these tests, both under the null and local contiguous alternatives. We also study the local power and asymptotic (Pitman) efficiency of these multivariate tests (based on optimal transport), and show that a subclass of these tests achieves attractive efficiency lower bounds that mimic the remarkable efficiency results of Hodges and Lehmann (1956) and Chernoff and Savage (1958) (for the Wilcoxon rank-sum test). To the best of our knowledge, this is the first collection of multivariate, nonparametric, exactly distribution-free tests that provably achieve such attractive efficiency lower bounds. We also study the rates of convergence of the rank maps (aka optimal transport maps).
  • 3:00 - 3:30 pm EDT
    Coffee Break
    11th Floor Collaborative Space
  • 3:30 - 4:15 pm EDT
    Optimal transport for estimating generalization properties of machine learning models
    11th Floor Lecture Hall
    • Speaker
    • Stefanie Jegelka, MIT
    • Session Chair
    • James Murphy, Tufts University
    Abstract
    One important challenge in practical machine learning is to estimate the generalization properties of a trained model, i.e., judging how well it will perform on unseen data. In this talk, we will discuss two examples of how optimal transport can help with this challenge. First, we address the problem of estimating generalization of deep neural networks. A critical factor is the geometry of the data in the latent embedding space. We analyze this data arrangement via an optimal-transport-based generalization of variance, and show its theoretical and empirical relevance via generalization bounds that are also empirically predictive. Second, we study the stability of neural networks for graph inputs, i.e., graph neural networks (GNNs), under shifts of the data distribution. In particular, to derive stability bounds, we need a suitable metric in the input space of graphs. We derive such a (pseudo)metric targeted to GNNs via a recursive optimal transport based distance between sets of trees. Our metric correlates better than state of the art with the behavior of GNNs under data distribution shifts.
  • 4:30 - 6:30 pm EDT
    Reception
    11th Floor Collaborative Space
Tuesday, May 9, 2023
  • 9:00 - 9:45 am EDT
    On the Convergence Rate of Sinkhorn’s Algorithm
    11th Floor Lecture Hall
    • Speaker
    • Marcel Nutz, Columbia University
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    We study Sinkhorn's algorithm for solving the entropically regularized optimal transport problem. Its iterate π_t is shown to satisfy H(π_t|π∗)+H(π∗|π_t)=O(1/t) where H denotes relative entropy and π∗ the optimal coupling. This holds for a large class of cost functions and marginals, including quadratic cost with subgaussian marginals. We also obtain the rate O(1/t) for the dual suboptimality and O(1/t^2) for the marginal entropies. More precisely, we derive non-asymptotic bounds, and in contrast to previous results on linear convergence that are limited to bounded costs, our estimates do not deteriorate exponentially with the regularization parameter. We also obtain a stability result for π∗ as a function of the marginals, quantified in relative entropy.
  • 10:00 - 10:30 am EDT
    Coffee Break
    11th Floor Collaborative Space
  • 10:30 - 11:15 am EDT
    Applications of No-Collision Transportation Maps in Manifold Learning
    11th Floor Lecture Hall
    • Speaker
    • Elisa Negrini, Institute for Pure and Applied Mathematics, University of California Los Angeles
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    In this work, we investigate applications of no-collision transportation maps introduced in [Nurbekyan et al., 2020] in manifold learning for image data. Recently, there has been a surge in applying transportation-based distances and features for data representing motion-like or deformation-like phenomena. Indeed, comparing intensities at fixed locations often does not reveal the data structure. No-collision maps and distances developed in [Nurbekyan et al., 2020] are sensitive to geometric features similar to optimal transportation (OT) maps but much cheaper to compute due to the absence of optimization. In this work, we prove that no-collision distances provide an isometry between translations (respectively dilations) of a single probability measure and the translation (respectively dilation) vectors equipped with a Euclidean distance. Furthermore, we prove that no-collision transportation maps, as well as OT and linearized OT maps, do not in general provide an isometry for rotations. The numerical experiments confirm our theoretical findings and show that no-collision distances achieve similar or better performance on several manifold learning tasks compared to other OT and Euclidean-based methods at a fraction of the computational cost.
  • 11:30 am - 12:15 pm EDT
    Graphical Optimal Transport and its applications
    11th Floor Lecture Hall
    • Speaker
    • Yongxin Chen, Georgia Institute of Technology
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    Multi-marginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for MOT by leveraging the well-developed algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games.
  • 12:30 - 2:00 pm EDT
    Lunch/Free Time
  • 2:00 - 2:45 pm EDT
    Advances in Distributionally Robust Optimization (DRO): Unifications, Extensions, and Applications
    11th Floor Lecture Hall
    • Speaker
    • Jose Blanchet, Stanford
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    We will discuss recent developments in distributionally robust optimization, including a tractable class of problems that simultaneously unifies and extends most of the formulations studied in DRO (including phi-divergence, inverse-phi-divergence, Wasserstein, and Sinkhorn). This unification is based on optimal transport theory with martingale constraints. We discuss various benefits of having the flexibility offered by these formulations in connection with, for example, the theory of epi-convergence and statistical robustness. We apply some of these new developments to optimal portfolio selection. Our implementations are motivated by intriguing experiments which show an unexpected out-of-sample performance of non-robust policies in real data. This talk is partly based on joint work with Daniel Kuhn, Jiajin Li, Yiping Lu, and Bahar Taskesen.
  • 3:00 - 5:00 pm EDT
    Coffee Break & Poster Session
    Poster Session - 10th Floor Collaborative Space
Wednesday, May 10, 2023
  • 9:00 - 9:45 am EDT
    Mirror gradient flows: Euclidean and Wasserstein
    11th Floor Lecture Hall
    • Speaker
    • Soumik Pal, University of Washington, Seattle
    • Session Chair
    • Markos Katsoulakis, University of Massachusetts Amherst
    Abstract
    We will talk about a new family of Wasserstein gradient flows that is inspired by Euclidean mirror gradient flows. These flows can often display faster convergence rates than the usual gradient flows. They have rich geometrical structures and give rise to a wide generalization of the Langevin diffusions and the Fokker-Planck PDEs. Immediate applications come from considering limits of Sinkhorn iterations.
  • 10:00 - 10:30 am EDT
    Coffee Break
    11th Floor Collaborative Space
  • 10:30 - 11:15 am EDT
    Many Processors, Little Time: MCMC for Partitions via Optimal Transport Couplings
    11th Floor Lecture Hall
    • Speaker
    • Tamara Broderick, Massachusetts Institute of Technology
    • Session Chair
    • Luc Rey-Bellet, UMass Amherst
    Abstract
    Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the “label-switching problem”: semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions’ (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime.
  • 11:30 am - 12:15 pm EDT
    On Hamilton-Jacobi (HJ) equations on the Wasserstein space on graphs.
    11th Floor Lecture Hall
    • Speaker
    • Wilfrid Gangbo, UCLA
    • Session Chair
    • Markos Katsoulakis, University of Massachusetts Amherst
  • 12:25 - 12:30 pm EDT
    Group Photo (Immediately After Talk)
    11th Floor Lecture Hall
  • 12:30 - 2:00 pm EDT
    Lunch/Free Time
  • 2:00 - 2:45 pm EDT
    Certifiable low-dimensional structure in transport and inference
    11th Floor Lecture Hall
    • Speaker
    • Youssef Marzouk, Massachusetts Institute of Technology
    • Session Chair
    • Markos Katsoulakis, University of Massachusetts Amherst
    Abstract
    I will discuss two notions of low-dimensional structure in probability measures, and their interplay with transport-driven methods for sampling and approximate inference. The first seeks to approximate a high-dimensional target measure as a low-dimensional update of a dominating reference measure. The second is low-rank conditional structure, where the goal is to replace conditioning variables with low-dimensional projections or summaries. In both cases, under appropriate assumptions on the reference or target measures, we can derive gradient-based upper bounds on the associated approximation error and minimize these bounds to identify good subspaces for approximation. The associated subspaces then dictate specific structural ansatzes for transport maps that represent the target of interest as the pushforward or pullback of a suitable reference measure. I will show several algorithmic instantiations of this idea: a greedy algorithm that builds deep compositions of maps, where low-dimensional projections of the parameters are iteratively transformed to match the target; and a simulation-based inference algorithm that uses low-rank conditional structure to efficiently solve Bayesian inverse problems. Based on joint work with Ricardo Baptista, Michael Brennan, and Olivier Zahm.
  • 3:00 - 3:30 pm EDT
    Coffee Break
    11th Floor Collaborative Space
  • 3:30 - 4:15 pm EDT
    Optimal Mass Transport meets Stochastic Thermodynamics: Dissipation & Power in Physics and Biology
    11th Floor Lecture Hall
    • Speaker
    • Tryphon Georgiou, University of California, Irvine
    • Session Chair
    • Markos Katsoulakis, University of Massachusetts Amherst
    Abstract
    The discovery in 1998 of a link between the Wasserstein-2 metric, entropy, and the heat equation, by Jordan, Kinderlehrer, and Otto, precipitated the increasing relevance of optimal mass transport in the evolving theory of finite-time thermodynamics, aka stochastic energetics. Specifically, dissipation in finite-time thermodynamic transitions for Langevin models of colloidal particles can be measured in terms of the Wasserstein length of trajectories. This enabling new insight has led to quantifying power and efficiency of thermodynamic cycles that supersede classical quasi-static Carnot engine concepts that alternate their contact between heat baths of different temperatures. Indeed, naturally occurring processes often harvest energy from temperature or chemical gradients, where the enabling mechanism responsible for transduction of energy relies on non-equilibrium steady states and finite-time cycling. Optimal mass transport provides the geometric structure of the manifold of thermodynamic states for studying energy harvesting mechanisms. In this setting, dissipation and work output can be expressed as path and area integrals, and fundamental limitations on power and efficiency can be expressed in geometric terms, leading to isoperimetric problems. The analysis presented provides guiding principles for building autonomous engines that extract work from thermal or chemical anisotropy in the environment.
Thursday, May 11, 2023
  • 9:00 - 9:45 am EDT
    Wasserstein Isometric Mapping and Image Manifold Learning
    11th Floor Lecture Hall
    • Speaker
    • Keaton Hamm, University of Texas at Arlington
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    We will discuss an algorithm called Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. We will discuss computational speedups to the algorithm such as use of linearized optimal transport or the Nyström method. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
  • 10:00 - 10:30 am EDT
    Coffee Break
    11th Floor Collaborative Space
  • 10:30 - 11:15 am EDT
    Function-space regularized divergences for machine learning applications
    11th Floor Lecture Hall
    • Speaker
    • Ioannis (Yannis) Pantazis, Foundation for Research and Technology - Hellas
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    Divergences such as Kullback-Leibler, Rényi and f-divergence play an increasingly important role in probabilistic machine learning, offering a notion of distance between probability distributions. In the recent past, divergence estimation has been developed on the premise of variational formulas and function parametrization via neural networks. Despite the successes, the statistical estimation of a divergence is still considered a very challenging problem, mainly due to the high variance of the neural-based estimators. Particularly hard cases include high dimensional data, large divergence values and Rényi divergence when its order is larger than one. Our recent work focuses on reducing the variance by regularizing the function space of the variational formulas. We will present novel families of divergences that enjoy enhanced statistical properties, and discuss their properties. Those function-space regularized divergences have been tested against a series of ML applications including generative adversarial networks, mutual information estimation and rare sub-population detection.
  • 11:30 am - 12:15 pm EDT
    Controlling regularized conservation laws: Entropy-entropy flux pairs
    11th Floor Lecture Hall
    • Speaker
    • Wuchen Li, University of South Carolina
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    In this talk, we study variational problems for regularized conservation laws with Lax's entropy-entropy flux pairs. We first introduce a modified optimal transport space based on conservation laws with diffusion. Using this space, we demonstrate that conservation laws with diffusion are flux-gradient flows. We next construct variational problems for these flows, for which we derive dual PDE systems for regularized conservation laws. Several examples, including traffic flow and Burgers' equation, are presented. We successfully compute the control of conservation laws by incorporating both primal-dual algorithms and monotone schemes. This is based on joint work with Siting Liu and Stanley Osher.
  • 12:30 - 2:00 pm EDT
    Lunch - Outlook and future directions for OT in Data Science and Machine Learning
    Working Lunch
  • 2:00 - 2:45 pm EDT
    Triangular transport for learning probabilistic graphical models
    11th Floor Lecture Hall
    • Speaker
    • Rebecca Morrison, University of Colorado Boulder
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    Probabilistic graphical models encode the conditional independence properties satisfied by a joint probability distribution. If the distribution is Gaussian, the edges of an undirected graphical model correspond to non-zero entries of the precision matrix. Generalizing this result to continuous non-Gaussian distributions, one can show that an edge exists if and only if an entry of the Hessian of the log density is non-zero (everywhere). But evaluation of the log density requires density estimation: for this, we propose the graph-learning algorithm SING (Sparsity Identification in Non-Gaussian distributions), which uses triangular transport for the density estimation step; this choice is advantageous as triangular maps inherit sparsity from conditional independence in the target distribution. Loosely speaking, the more non-Gaussian the distribution, the more difficult the transport problem. For a broad class of non-Gaussian distributions, however, estimating the Hessian of the log density is much easier than estimating the density itself. For the transport community, this result serves as a sort of goal-oriented transport framework, in which the particular goal of graph learning greatly simplifies the transport problem.
  • 3:00 - 3:30 pm EDT
    Coffee Break
    11th Floor Collaborative Space
  • 3:30 - 4:15 pm EDT
    Stein transport for Bayesian inference
    11th Floor Lecture Hall
    • Speaker
    • Nikolas Nüsken, King’s College London
    • Session Chair
    • Shuchin Aeron, Tufts University
    Abstract
    This talk is about Stein transport, a novel methodology for Bayesian inference that pushes an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can equivalently be obtained from either a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. I will discuss the geometric underpinnings of Stein transport and SVGD, and - time permitting - connections to MCMC and the theory of large deviations.
Friday, May 12, 2023
  • 9:00 - 9:45 am EDT
    Lipschitz regularized gradient flows and latent generative particles
    11th Floor Lecture Hall
    • Speaker
    • Panagiota Birmpa, Heriot-Watt University
    • Session Chair
    • Markos Katsoulakis, University of Massachusetts Amherst
    Abstract
    Lipschitz regularized f-divergences interpolate between the Wasserstein metric and f-divergences and provide a flexible family of loss functions for non-absolutely continuous distributions (i.e. empirical), possibly with heavy tails. We construct gradient flows based on those divergences taking advantage of neural network spectral normalization (a closely related form of Lipschitz regularization). The Lipschitz regularized gradient flows induce a transport/discriminator particle algorithm where generative particles are moved along a vector field given by the gradient of the discriminator, the latter computed as in generative adversarial networks (GANs). The particle system generates approximate samples from typically high-dimensional distributions known only from data. Examples of such gradient flows are Lipschitz-regularized Fokker-Planck and porous medium equations for Kullback-Leibler and alpha-divergences respectively. Such PDE perspectives allow the analysis of the algorithm’s stability and convergence, for instance through an empirical, Lipschitz regularized, version of Fisher information which tracks the convergence of the algorithms.
  • 10:00 - 10:30 am EDT
    Coffee Break
    11th Floor Collaborative Space
  • 10:30 - 11:15 am EDT
    Approximations and learning in the Wasserstein space
    11th Floor Lecture Hall
    • Speaker
    • Caroline Moosmüller, University of North Carolina at Chapel Hill
    • Session Chair
    • Luc Rey-Bellet, UMass Amherst
    Abstract
    Detecting differences and building classifiers between distributions, given only finite samples, are important tasks in a number of scientific fields. Optimal transport and the Wasserstein distance have evolved as the most natural concept to deal with such tasks, but have some computational drawbacks. In this talk, we describe an approximation framework through local linearizations that significantly reduces both the computational effort and the required training data in supervised learning settings. We also introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. We provide guarantees on the embedding quality, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. The proposed algorithms are demonstrated in pattern recognition tasks in imaging and medical applications.
  • 11:30 am - 12:15 pm EDT
    Optimal transport problems with interaction effects
    11th Floor Lecture Hall
    • Speaker
    • Nestor Guillen, Texas State University
    • Session Chair
    • Luc Rey-Bellet, UMass Amherst
    Abstract
    We consider two variations on the optimal transportation problem where the particles/agents being transported interact with each other. For instance, imagine the problem of moving a collection of boxes from one configuration to another, where all the boxes move in unison and must avoid each other. As we shall show, these problems can be posed as quadratic optimization problems in the space of probability measures over the space of paths. Although the resulting optimization problem is not always convex, one can show existence and even uniqueness for some types of interactions. Moreover, we show these problems admit a fluid mechanics formulation in the style of Benamou and Brenier. This talk is based on works in collaboration with René Cabrera (UT Austin) and Jacob Homerosky (Texas State).
  • 12:30 - 2:00 pm EDT
    Lunch/Free Time
  • 2:00 - 2:45 pm EDT
    On geometric properties of sliced optimal transport metrics
    11th Floor Lecture Hall
    • Speaker
    • Jun Kitagawa, Michigan State University
    • Session Chair
    • Luc Rey-Bellet, UMass Amherst
    Abstract
    The sliced and max sliced Wasserstein metrics were originally proposed as a way to use 1D transport to speed up computation of the usual optimal transport metrics defined on spaces of probability measures. Some basic results are known about their metric structure, but not much is available in the way of a systematic study. In this talk, I will first discuss some further properties of these sliced metrics. Then, I will introduce a larger family of metric spaces into which these metrics can be embedded, which seem to have more desirable geometric properties. This talk is based on joint work with Asuka Takatsu.
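As a rough illustration of how 1D transport speeds up computation, the sliced Wasserstein distance between two equal-size samples can be estimated by projecting onto random directions and sorting. The sketch below is a generic Monte Carlo estimator, not taken from the talk; the function name `sliced_wasserstein` and its parameters are assumptions made for illustration, and it assumes both samples contain the same number of points.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) holding two samples of the same size n.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Draw directions uniformly on the unit sphere by normalizing Gaussians.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction and sort each projection.
    # In 1D, the optimal coupling of equal-size samples matches sorted points.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)

    # p-th power of the 1D Wasserstein distance, one value per direction.
    wp_p = np.mean(np.abs(px - py) ** p, axis=0)

    # Average over directions, then take the p-th root.
    return float(np.mean(wp_p) ** (1.0 / p))


rng = np.random.default_rng(1)
sample = rng.normal(size=(300, 2))
shifted = sample + np.array([3.0, 4.0])
print(sliced_wasserstein(sample, shifted, n_projections=2000))
```

Each projection costs one sort, so the estimator runs in O(L n log n) time for L directions. Replacing the average over directions with a maximum would give a crude estimate of the max sliced variant.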
  • 3:00 - 3:30 pm EDT
    Coffee Break
    11th Floor Collaborative Space
  • 3:30 - 4:15 pm EDT
    Matching for causal effects via multimarginal unbalanced optimal transport
    11th Floor Lecture Hall
    • Speaker
    • Florian Gunsilius, University of Michigan
    • Session Chair
    • Luc Rey-Bellet, UMass Amherst
    Abstract
    Matching on covariates is a well-established framework for estimating causal effects in observational studies. A major challenge is that established methods like matching via nearest neighbors possess poor statistical properties when the dimension of the continuous covariates is high. This article introduces an alternative matching approach based on unbalanced optimal transport that possesses better statistical properties in high-dimensional settings. In particular, we prove that the proposed method dominates classical nearest neighbor matching in mean squared error in finite samples when the dimension of the continuous covariates is high enough. This notable result is already present in low dimensions, as we demonstrate in simulations. It follows from two properties of the new estimator. First, for any positive “matching radius”, the optimal matching obtained converges at the parametric rate in any dimension to the optimal population matching. This stands in contrast to the classical nearest neighbor matching, which suffers from a curse of dimensionality in the continuous covariates. Second, as the matching radius converges to zero, the method is unbiased in the population for the average treatment effect on the overlapping region. The approach also possesses several other desirable properties: it is flexible in allowing for many different ways to define the matching radius and the cost of matching, can be bootstrapped for inference, provides interpretable weights based on the cost of matching individuals, can be efficiently implemented via Sinkhorn iterations, and can match several treatment arms simultaneously. Importantly, it only selects good matches from any treatment arm, thus providing unbiased estimates of average treatment effects in the region of overlapping supports.
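For readers unfamiliar with Sinkhorn iterations, the following is a minimal sketch of the standard balanced, two-marginal, entropy-regularized version. It is deliberately simpler than the unbalanced multimarginal method described in the abstract and is not the author's implementation; the function name `sinkhorn`, the regularization value, and the toy covariates are illustrative assumptions.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between weight vectors a and b.

    a: (n,) and b: (m,) non-negative weights with equal total mass.
    cost: (n, m) matrix of matching costs.
    eps: regularization strength (smaller values give sharper matchings).
    """
    kernel = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)      # rescale rows to match a
        v = b / (kernel.T @ u)    # rescale columns to match b
    # Plan entries act as soft matching weights between the two groups.
    return u[:, None] * kernel * v[None, :]


# Toy example: match 5 "treated" units to 4 "control" units on one covariate.
treated = np.linspace(0.0, 1.0, 5)
control = np.linspace(0.0, 1.0, 4)
cost = (treated[:, None] - control[None, :]) ** 2
a = np.full(5, 1.0 / 5)
b = np.full(4, 1.0 / 4)
plan = sinkhorn(a, b, cost)
print(plan.round(3))
```

An unbalanced variant would relax the hard marginal constraints so that poorly matched units can receive little or no weight, which is the property the abstract relies on for selecting only good matches.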

All event times are listed in ICERM local time in Providence, RI (Eastern Daylight Time / UTC-4).
