Organizing Committee
 Shuchin Aeron, Tufts University
 Markos Katsoulakis, University of Massachusetts Amherst
 James Murphy, Tufts University
 Luc Rey-Bellet, UMass Amherst
 Bjorn Sandstede, Brown University
Abstract
This workshop will focus on the intersection of mathematics, statistics, machine learning, and computation, when viewed through the lens of optimal transport (OT). Mathematical topics will include low-dimensional models for OT, linearizations of OT, and the geometry of OT including gradient flows and gradient descent in the space of measures. Relevant statistical topics will include reliable and efficient estimation of OT plans in high dimensions, the role of regularization in computing OT distances and plans, with applications to robust statistics, uncertainty quantification, and overparameterized machine learning. Computation will be a recurring theme of the workshop, with emphasis on the development of fast algorithms and applications to computational biology, high energy physics, material science, spatiotemporal modeling, natural language processing, and image processing.
Confirmed Speakers & Participants
Talks will be presented virtually or in person as indicated in the schedule below.

Shuchin Aeron
Tufts University

Kang An
Rice University

Ricardo Baptista
California Institute of Technology

Panagiota Birmpa
Heriot-Watt University

Jose Blanchet
Stanford

Tobias Blickhan
Max Planck Institute for Plasma Physics

Tamara Broderick
Massachusetts Institute of Technology

Dongwei Chen
Clemson University

Ziyu Chen
University of Massachusetts Amherst

Yongxin Chen
Georgia Institute of Technology

Yu-Chen Cheng
Dana-Farber Cancer Institute

Jannatul Ferdous Chhoa
University of Houston

Frank Cole
University of Massachusetts Amherst

Emil Constantinescu
Argonne National Laboratory

Keisha Cook
Tulane University

Yuqing Dai
Duke University

Steve Damelin
University of Michigan

Fred Daum
Raytheon

Rocio Diaz Martin
Vanderbilt University

Christopher Eads
University of Texas at Arlington

Gabriel Earle
University of Massachusetts Amherst

Ranthony Edmonds
The Ohio State University

Marcia Fampa
Federal University of Rio de Janeiro

Weifu Fang
Wright State University

Guosheng Fu
University of Notre Dame

Wilfrid Gangbo
UCLA

Tryphon Georgiou
University of California, Irvine

Hyemin Gu
University of Massachusetts Amherst

Nestor Guillen
Texas State University

Florian Gunsilius
University of Michigan

Minh Ha Quang
RIKEN

Keaton Hamm
University of Texas at Arlington

Alexander Hsu
University of Washington

Yinan Hu
New York University

Stefanie Jegelka
MIT

Sixian Jin
Worcester Polytechnic Institute

Yijie Jin
Georgia Institute of Technology

Justin Kakeu
University of Prince Edward Island

Markos Katsoulakis
University of Massachusetts Amherst

Varun Khurana
University of California, San Diego

Jun Kitagawa
Michigan State University

Vladimir Kobzar
Columbia University

Marie Jose Kuffner
Johns Hopkins University

Christian Kümmerle
University of North Carolina at Charlotte

Dohyun Kwon
University of Seoul

Ivan Lau
Rice University

Shiying Li
University of North Carolina at Chapel Hill

Wuchen Li
University of South Carolina

Jiaming Liang
Yale University

Wei Liu
Rensselaer Polytechnic Institute

Jun Liu
Southern Illinois University Edwardsville

Yulong Lu
University of Massachusetts Amherst

Miranda Lynch
Hauptman-Woodward Medical Research Institute

Ildebrando Magnani
University of Michigan

Charalambos Makridakis
Foundation for Research and Technology - Hellas (FORTH)

Brendan Mallery
Tufts University

Youssef Marzouk
Massachusetts Institute of Technology

Shoaib Bin Masud
Tufts University

Tyler Maunu
Brandeis University

Henok Mawi
Howard University

Ian Oliver McPherson
Johns Hopkins University

Kun Meng
Brown University

Toshio Mikami
Tsuda University

T. H. Molena
North Carolina State University

Martin Molina Fructuoso
Brandeis University

Caroline Moosmüller
University of North Carolina at Chapel Hill

Rebecca Morrison
University of Colorado Boulder

Chenchen Mou
City University of Hong Kong

Lidia Mrad
Mount Holyoke College

James Murphy
Tufts University

Evangelos Nastas
SUNY

Elisa Negrini
Institute for Pure and Applied Mathematics, University of California Los Angeles

Djordje Nikolic
University of California, Santa Barbara

Nikolas Nüsken
King’s College London

Marcel Nutz
Columbia University

Daniel Packer
The Ohio State University

Ali Pakniyat
University of Alabama

Soumik Pal
University of Washington, Seattle

Ioannis (Yannis) Pantazis
Foundation for Research and Technology - Hellas

Farhad Pourkamali Anaraki
University of Colorado Denver

Yusuf Qaddura
The Ohio State University

Michael Rawson
Pacific Northwest National Lab

Luc Rey-Bellet
UMass Amherst

Bety Rostandy
Brightseed Bio

Bjorn Sandstede
Brown University

Geoffrey Schiebinger
University of British Columbia

Bodhisattva Sen
Columbia University

Zhaiming Shen
University of Georgia

Yunpeng Shi
Princeton University

Nimita Shinde
ICERM, Brown University

Reed Spitzer
Brown University

Stephan Sturm
WPI

Shashank Sule
University of Maryland, College Park

Sui Tang
University of California Santa Barbara

Allen Tannenbaum
Stony Brook University

Sumati Thareja
Vanderbilt University

Mohammad Taha Toghani
Rice University

Cesar Uribe
Rice University

Brian Van Koten
University of Massachusetts, Amherst

Dootika Vats
Indian Institute of Technology Kanpur

Hong Wang
University of South Carolina

Yi Wang
Johns Hopkins University

Jeremy Wang
Brown University

Matthew Werenski
Tufts University

Yangyang Xu
Rensselaer Polytechnic Institute

Tinghui Xu
University of Wisconsin-Madison

Sheng Xu
Princeton University

Xingchi Yan
Harvard University

Ruiyi Yang
Princeton University

Bowen Yang
Caltech

Shixuan Zhang
Brown University

Stephen Zhang
University of Melbourne

Benjamin Zhang
University of Massachusetts Amherst

Jiwei Zhao
University of Wisconsin-Madison

Bohan Zhou
Dartmouth College

Xiang Zhou
City University of Hong Kong

Wei Zhu
University of Massachusetts Amherst
Workshop Schedule
Monday, May 8, 2023

8:50 - 9:00 am EDT | Welcome | 11th Floor Lecture Hall

9:00 - 9:45 am EDT | Network Analysis of High Dimensional Data | 11th Floor Lecture Hall
 Speaker
 Allen Tannenbaum, Stony Brook University
 Session Chair
 James Murphy, Tufts University
Abstract
A major problem in data science is representation of data so that the variables driving key functions can be uncovered and explored. Correlation analysis is widely used to simplify networks of feature variables by reducing redundancies, but makes limited use of the network topology, relying on comparison of direct neighbor variables. The proposed method incorporates relational or functional profiles of neighboring variables along multiple common neighbors, which are fitted with Gaussian mixture models and compared using a data metric based on a version of optimal mass transport tailored to Gaussian mixtures. Hierarchical interactive visualization of the result leads to effective unbiased hypothesis generation. We will discuss several applications to medical imaging and cancer networks.

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Signed Cumulative Distribution Transform for Machine Learning | 11th Floor Lecture Hall
 Speaker
 Sumati Thareja, Vanderbilt University
 Session Chair
 James Murphy, Tufts University
Abstract
Classification and estimation problems are at the core of machine learning. In this talk we will see a new mathematical signal transform that renders data easy to classify or estimate, based on a very old theory of transportation that was started by Monge. We will learn about the existing Cumulative Distribution Transform and then extend to a more general measure theoretic framework, to define the new transform (Signed Cumulative Distribution Transform). We will look at both forward (analysis) and inverse (synthesis) formulas for the transform, and describe several of its properties including translation, scaling, convexity and isometry. Finally, we will demonstrate two applications of the transform in classifying (detecting) signals under random displacements and estimation of signal parameters under such displacements.

11:30 am - 12:15 pm EDT | Towards a Mathematical Theory of Development | 11th Floor Lecture Hall
 Virtual Speaker
 Geoffrey Schiebinger, University of British Columbia
 Session Chair
 James Murphy, Tufts University
Abstract
This talk introduces a mathematical theory of developmental biology, based on optimal transport. While, in principle, organisms are made of molecules whose motions are described by the Schrödinger equation, there are simply too many molecules for this to be useful. Optimal transport, a fascinating topic in its own right at the intersection of probability, statistics and optimization, provides a set of equations that describe development at the level of cells. Biology has entered a new era of precision measurement and massive datasets. Techniques like single-cell RNA sequencing (scRNA-seq) and single-cell ATAC-seq have emerged as powerful tools to profile cell states at unprecedented molecular resolution. One of the most exciting prospects associated with this new trove of data is the possibility of studying temporal processes, such as differentiation and development. If we could understand the genetic forces that control embryonic development, then we would have a better idea of how cell types are stabilized throughout adult life and how they destabilize with age or in diseases like cancer. This would be within reach if we could analyze the dynamic changes in gene expression, as populations develop and subpopulations differentiate. However, this is not directly possible with current measurement technologies because they are destructive (e.g. cells must be lysed to measure expression profiles). Therefore, we cannot directly observe the waves of transcriptional patterns that dictate changes in cell type. This talk introduces a rigorous framework for understanding the developmental trajectories of cells in a dynamically changing, heterogeneous population based on static snapshots along a time course. The framework is based on a simple hypothesis: over short timescales cells can only change their expression profile by small amounts.
We formulate this in precise mathematical terms using a classical tool called optimal transport (OT), and we propose that this optimal transport hypothesis is a fundamental mathematical principle of developmental biology.

12:30 - 2:00 pm EDT | Lunch - Optimal transport: Junior and seniors | Working Lunch

2:00 - 2:45 pm EDT | Multivariate Distribution-free testing using Optimal Transport | 11th Floor Lecture Hall
 Speaker
 Bodhisattva Sen, Columbia University
 Session Chair
 James Murphy, Tufts University
Abstract
We propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of optimal transport (see e.g., Villani (2003)). We demonstrate the applicability of this approach by constructing exactly distribution-free tests for two classical nonparametric problems: (i) testing for the equality of two multivariate distributions, and (ii) testing for mutual independence between two random vectors. In particular, we propose (multivariate) rank versions of Hotelling T^2 and kernel two-sample tests (e.g., Gretton et al. (2012), Szekely and Rizzo (2013)), and kernel tests for independence (e.g., Gretton et al. (2007), Szekely et al. (2007)) for scenarios (i) and (ii) respectively. We investigate the consistency and asymptotic distributions of these tests, both under the null and local contiguous alternatives. We also study the local power and asymptotic (Pitman) efficiency of these multivariate tests (based on optimal transport), and show that a subclass of these tests achieves attractive efficiency lower bounds that mimic the remarkable efficiency results of Hodges and Lehmann (1956) and Chernoff and Savage (1958) (for the Wilcoxon rank-sum test). To the best of our knowledge, this is the first collection of multivariate, nonparametric, exactly distribution-free tests that provably achieve such attractive efficiency lower bounds. We also study the rates of convergence of the rank maps (aka optimal transport maps).

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Optimal transport for estimating generalization properties of machine learning models | 11th Floor Lecture Hall
 Speaker
 Stefanie Jegelka, MIT
 Session Chair
 James Murphy, Tufts University
Abstract
One important challenge in practical machine learning is to estimate the generalization properties of a trained model, i.e., judging how well it will perform on unseen data. In this talk, we will discuss two examples of how optimal transport can help with this challenge. First, we address the problem of estimating generalization of deep neural networks. A critical factor is the geometry of the data in the latent embedding space. We analyze this data arrangement via an optimal-transport-based generalization of variance, and show its theoretical and empirical relevance via generalization bounds that are also empirically predictive. Second, we study the stability of neural networks for graph inputs, i.e., graph neural networks (GNNs), under shifts of the data distribution. In particular, to derive stability bounds, we need a suitable metric in the input space of graphs. We derive such a (pseudo)metric targeted to GNNs via a recursive optimal-transport-based distance between sets of trees. Our metric correlates better than state of the art with the behavior of GNNs under data distribution shifts.

4:30 - 6:30 pm EDT | Reception | 11th Floor Collaborative Space
Tuesday, May 9, 2023

9:00 - 9:45 am EDT | On the Convergence Rate of Sinkhorn’s Algorithm | 11th Floor Lecture Hall
 Speaker
 Marcel Nutz, Columbia University
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
We study Sinkhorn's algorithm for solving the entropically regularized optimal transport problem. Its iterate π_t is shown to satisfy H(π_t | π∗) + H(π∗ | π_t) = O(1/t), where H denotes relative entropy and π∗ the optimal coupling. This holds for a large class of cost functions and marginals, including quadratic cost with subgaussian marginals. We also obtain the rate O(1/t) for the dual suboptimality and O(1/t^2) for the marginal entropies. More precisely, we derive non-asymptotic bounds, and in contrast to previous results on linear convergence that are limited to bounded costs, our estimates do not deteriorate exponentially with the regularization parameter. We also obtain a stability result for π∗ as a function of the marginals, quantified in relative entropy.
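For readers unfamiliar with the iteration the abstract refers to, the following is a minimal textbook sketch of Sinkhorn's alternating scaling for entropically regularized OT between two discrete histograms. It is not taken from the talk; the function name `sinkhorn`, the toy grids, the regularization value `eps=0.5`, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropic OT plan between histograms a and b (illustrative sketch)."""
    K = np.exp(-cost / eps)      # Gibbs kernel built from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match the first marginal
        v = b / (K.T @ u)        # rescale columns to match the second marginal
    return u[:, None] * K * v[None, :]

# Toy problem (assumed for illustration): uniform weights on two 1-D grids,
# quadratic cost.
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
cost = (x[:, None] - y[None, :]) ** 2
a = np.full(5, 1.0 / 5)
b = np.full(4, 1.0 / 4)
plan = sinkhorn(a, b, cost)
```

The returned `plan` is the iterate π_t in the abstract's notation; its row and column sums approach the prescribed marginals `a` and `b` as the number of iterations grows.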

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Applications of No-Collision Transportation Maps in Manifold Learning | 11th Floor Lecture Hall
 Speaker
 Elisa Negrini, Institute for Pure and Applied Mathematics, University of California Los Angeles
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
In this work, we investigate applications of no-collision transportation maps introduced in [Nurbekyan et al., 2020] in manifold learning for image data. Recently, there has been a surge in applying transportation-based distances and features for data representing motion-like or deformation-like phenomena. Indeed, comparing intensities at fixed locations often does not reveal the data structure. No-collision maps and distances developed in [Nurbekyan et al., 2020] are sensitive to geometric features similar to optimal transportation (OT) maps but much cheaper to compute due to the absence of optimization. In this work, we prove that no-collision distances provide an isometry between translations (respectively dilations) of a single probability measure and the translation (respectively dilation) vectors equipped with a Euclidean distance. Furthermore, we prove that no-collision transportation maps, as well as OT and linearized OT maps, do not in general provide an isometry for rotations. The numerical experiments confirm our theoretical findings and show that no-collision distances achieve similar or better performance on several manifold learning tasks compared to other OT and Euclidean-based methods at a fraction of the computational cost.

11:30 am - 12:15 pm EDT | Graphical Optimal Transport and its applications | 11th Floor Lecture Hall
 Speaker
 Yongxin Chen, Georgia Institute of Technology
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
Multimarginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for MOT by leveraging the well-developed algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games.

12:30 - 2:00 pm EDT | Lunch/Free Time

2:00 - 2:45 pm EDT | Advances in Distributionally Robust Optimization (DRO): Unifications, Extensions, and Applications | 11th Floor Lecture Hall
 Speaker
 Jose Blanchet, Stanford
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
We will discuss recent developments in distributionally robust optimization, including a tractable class of problems that simultaneously unifies and extends most of the formulations studied in DRO (including phi-divergence, inverse-phi-divergence, Wasserstein, and Sinkhorn). This unification is based on optimal transport theory with martingale constraints. We discuss various benefits of having the flexibility offered by these formulations in connection with, for example, the theory of epi-convergence and statistical robustness. We apply some of these new developments to optimal portfolio selection. Our implementations are motivated by intriguing experiments which show an unexpected out-of-sample performance of non-robust policies in real data. This talk is partly based on joint work with Daniel Kuhn, Jiajin Li, Yiping Lu, and Bahar Taskesen.

3:00 - 5:00 pm EDT
Wednesday, May 10, 2023

9:00 - 9:45 am EDT | Mirror gradient flows: Euclidean and Wasserstein | 11th Floor Lecture Hall
 Speaker
 Soumik Pal, University of Washington, Seattle
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
We will talk about a new family of Wasserstein gradient flows that is inspired by Euclidean mirror gradient flows. These flows can often display faster convergence rates than the usual gradient flows. They have rich geometrical structures and give rise to a wide generalization of the Langevin diffusions and the Fokker-Planck PDEs. Immediate applications come from considering limits of Sinkhorn iterations.

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Many Processors, Little Time: MCMC for Partitions via Optimal Transport Couplings | 11th Floor Lecture Hall
 Speaker
 Tamara Broderick, Massachusetts Institute of Technology
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the “label-switching problem”: semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions’ (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime.

11:30 am - 12:15 pm EDT | On Hamilton-Jacobi (HJ) equations on the Wasserstein space on graphs | 11th Floor Lecture Hall
 Speaker
 Wilfrid Gangbo, UCLA
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst

12:25 - 12:30 pm EDT | Group Photo (Immediately After Talk) | 11th Floor Lecture Hall

12:30 - 2:00 pm EDT | Lunch/Free Time

2:00 - 2:45 pm EDT | Certifiable low-dimensional structure in transport and inference | 11th Floor Lecture Hall
 Speaker
 Youssef Marzouk, Massachusetts Institute of Technology
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
I will discuss two notions of low-dimensional structure in probability measures, and their interplay with transport-driven methods for sampling and approximate inference. The first seeks to approximate a high-dimensional target measure as a low-dimensional update of a dominating reference measure. The second is low-rank conditional structure, where the goal is to replace conditioning variables with low-dimensional projections or summaries. In both cases, under appropriate assumptions on the reference or target measures, we can derive gradient-based upper bounds on the associated approximation error and minimize these bounds to identify good subspaces for approximation. The associated subspaces then dictate specific structural ansatzes for transport maps that represent the target of interest as the pushforward or pullback of a suitable reference measure. I will show several algorithmic instantiations of this idea: a greedy algorithm that builds deep compositions of maps, where low-dimensional projections of the parameters are iteratively transformed to match the target; and a simulation-based inference algorithm that uses low-rank conditional structure to efficiently solve Bayesian inverse problems. Based on joint work with Ricardo Baptista, Michael Brennan, and Olivier Zahm.

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Optimal Mass Transport meets Stochastic Thermodynamics: Dissipation & Power in Physics and Biology | 11th Floor Lecture Hall
 Speaker
 Tryphon Georgiou, University of California, Irvine
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
The discovery in 1998 of a link between the Wasserstein-2 metric, entropy, and the heat equation, by Jordan, Kinderlehrer, and Otto, precipitated the increasing relevance of optimal mass transport in the evolving theory of finite-time thermodynamics, aka stochastic energetics. Specifically, dissipation in finite-time thermodynamic transitions for Langevin models of colloidal particles can be measured in terms of the Wasserstein length of trajectories. This enabling new insight has led to quantifying power and efficiency of thermodynamic cycles that supersede classical quasistatic Carnot engine concepts that alternate their contact between heat baths of different temperatures. Indeed, naturally occurring processes often harvest energy from temperature or chemical gradients, where the enabling mechanism responsible for transduction of energy relies on nonequilibrium steady states and finite-time cycling. Optimal mass transport provides the geometric structure of the manifold of thermodynamic states for studying energy harvesting mechanisms. In this setting, dissipation and work output can be expressed as path and area integrals, and fundamental limitations on power and efficiency can be stated in geometric terms, leading to isoperimetric problems. The analysis presented provides guiding principles for building autonomous engines that extract work from thermal or chemical anisotropy in the environment.
Thursday, May 11, 2023

9:00 - 9:45 am EDT | Wasserstein Isometric Mapping and Image Manifold Learning | 11th Floor Lecture Hall
 Speaker
 Keaton Hamm, University of Texas at Arlington
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
We will discuss an algorithm called Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. We will discuss computational speedups to the algorithm such as use of linearized optimal transport or the Nyström method. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
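The pipeline described above (pairwise Wasserstein distances followed by an isometric embedding) can be illustrated on a toy case. The sketch below is not the speaker's implementation: it assumes 1-D empirical measures, where the 2-Wasserstein distance reduces to matching sorted samples, and it assumes classical multidimensional scaling as the embedding step. The shift values and the helper name `w2_1d` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=200)                 # samples of a fixed generating measure
shifts = np.array([0.0, 1.0, 2.5, 4.0])     # hypothetical translation parameters
samples = [base + s for s in shifts]        # translated copies of the measure

def w2_1d(xs, ys):
    """2-Wasserstein distance between equal-size 1-D empirical measures."""
    return np.sqrt(np.mean((np.sort(xs) - np.sort(ys)) ** 2))

n = len(samples)
D = np.array([[w2_1d(samples[i], samples[j]) for j in range(n)]
              for i in range(n)])

# Classical MDS (an assumed choice of embedding): double-center the squared
# distances and keep the leading eigenvector.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
embedding = eigvecs[:, -1] * np.sqrt(eigvals[-1])
```

In this toy setting the pairwise distances between the entries of `embedding` match the distances between the entries of `shifts`, so the translation parameters are recovered up to a global shift and sign, consistent with the exact-recovery statement in the abstract.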

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Function-space regularized divergences for machine learning applications | 11th Floor Lecture Hall
 Speaker
 Ioannis (Yannis) Pantazis, Foundation for Research and Technology - Hellas
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
Divergences such as Kullback-Leibler, Rényi and f-divergence play an increasingly important role in probabilistic machine learning, offering a notion of distance between probability distributions. In the recent past, divergence estimation has been developed on the premise of variational formulas and function parametrization via neural networks. Despite the successes, the statistical estimation of a divergence is still considered a very challenging problem, mainly due to the high variance of the neural-based estimators. Particularly hard cases include high dimensional data, large divergence values and Rényi divergence when its order is larger than one. Our recent work focuses on reducing the variance by regularizing the function space of the variational formulas. We will present novel families of divergences with enhanced statistical properties and discuss those properties. These function-space regularized divergences have been tested on a series of ML applications including generative adversarial networks, mutual information estimation and rare subpopulation detection.

11:30 am - 12:15 pm EDT | Controlling regularized conservation laws: Entropy-entropy flux pairs | 11th Floor Lecture Hall
 Speaker
 Wuchen Li, University of South Carolina
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
In this talk, we study variational problems for regularized conservation laws with Lax's entropy-entropy flux pairs. We first introduce a modified optimal transport space based on conservation laws with diffusion. Using this space, we demonstrate that conservation laws with diffusion are flux-gradient flows. We next construct variational problems for these flows, for which we derive dual PDE systems for regularized conservation laws. Several examples, including traffic flow and Burgers' equation, are presented. We successfully compute the control of conservation laws by incorporating both primal-dual algorithms and monotone schemes. This is based on joint work with Siting Liu and Stanley Osher.

12:30 - 2:00 pm EDT | Lunch - Outlook and future directions for OT in Data Science and Machine Learning | Working Lunch

2:00 - 2:45 pm EDT | Triangular transport for learning probabilistic graphical models | 11th Floor Lecture Hall
 Speaker
 Rebecca Morrison, University of Colorado Boulder
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
Probabilistic graphical models encode the conditional independence properties satisfied by a joint probability distribution. If the distribution is Gaussian, the edges of an undirected graphical model correspond to nonzero entries of the precision matrix. Generalizing this result to continuous non-Gaussian distributions, one can show that an edge exists if and only if an entry of the Hessian of the log density is nonzero (everywhere). But evaluation of the log density requires density estimation: for this, we propose the graph-learning algorithm SING (Sparsity Identification in Non-Gaussian distributions), which uses triangular transport for the density estimation step; this choice is advantageous as triangular maps inherit sparsity from conditional independence in the target distribution. Loosely speaking, the more non-Gaussian the distribution, the more difficult the transport problem. For a broad class of non-Gaussian distributions, however, estimating the Hessian of the log density is much easier than estimating the density itself. For the transport community, this result serves as a sort of goal-oriented transport framework, in which the particular goal of graph learning greatly simplifies the transport problem.
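The Gaussian statement at the start of the abstract can be checked numerically. The sketch below illustrates only that Gaussian case, not the SING algorithm or triangular transport; the 4-variable chain, its precision matrix, and the threshold 1e-8 are assumptions chosen for illustration.

```python
import numpy as np

# Precision matrix of a hypothetical Gaussian chain X0 - X1 - X2 - X3:
# only neighbouring variables are conditionally dependent.
precision = np.array([
    [ 2.0, -1.0,  0.0,  0.0],
    [-1.0,  2.0, -1.0,  0.0],
    [ 0.0, -1.0,  2.0, -1.0],
    [ 0.0,  0.0, -1.0,  2.0],
])

# The covariance is dense, so marginal correlations alone do not reveal the graph.
covariance = np.linalg.inv(precision)

# Inverting the covariance recovers the sparse precision matrix; its nonzero
# off-diagonal entries are the edges of the undirected graphical model.
recovered = np.linalg.inv(covariance)
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)
         if abs(recovered[i, j]) > 1e-8]
```

Here `edges` contains exactly the three neighbouring pairs of the chain, even though every entry of `covariance` is nonzero.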

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Stein transport for Bayesian inference | 11th Floor Lecture Hall
 Speaker
 Nikolas Nüsken, King’s College London
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
This talk is about Stein transport, a novel methodology for Bayesian inference that pushes an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can equivalently be obtained from either a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. I will discuss the geometric underpinnings of Stein transport and SVGD, and, time permitting, connections to MCMC and the theory of large deviations.
Friday, May 12, 2023

9:00 - 9:45 am EDT | Lipschitz regularized gradient flows and latent generative particles | 11th Floor Lecture Hall
 Speaker
 Panagiota Birmpa, Heriot-Watt University
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
Lipschitz regularized f-divergences interpolate between the Wasserstein metric and f-divergences and provide a flexible family of loss functions for non-absolutely continuous distributions (i.e. empirical), possibly with heavy tails. We construct gradient flows based on those divergences taking advantage of neural network spectral normalization (a closely related form of Lipschitz regularization). The Lipschitz regularized gradient flows induce a transport/discriminator particle algorithm where generative particles are moved along a vector field given by the gradient of the discriminator, the latter computed as in generative adversarial networks (GANs). The particle system generates approximate samples from typically high-dimensional distributions known only from data. Examples of such gradient flows are Lipschitz-regularized Fokker-Planck and porous medium equations for Kullback-Leibler and alpha-divergences respectively. Such PDE perspectives allow the analysis of the algorithm’s stability and convergence, for instance through an empirical, Lipschitz regularized, version of Fisher information which tracks the convergence of the algorithms.

10:00 - 10:30 am EDT
Coffee Break
11th Floor Collaborative Space

10:30 - 11:15 am EDT
Approximations and learning in the Wasserstein space
11th Floor Lecture Hall
 Speaker
 Caroline Moosmüller, University of North Carolina at Chapel Hill
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
Detecting differences and building classifiers between distributions, given only finite samples, are important tasks in a number of scientific fields. Optimal transport and the Wasserstein distance have evolved as the most natural concept to deal with such tasks, but have some computational drawbacks. In this talk, we describe an approximation framework through local linearizations that significantly reduces both the computational effort and the required training data in supervised learning settings. We also introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. We provide guarantees on the embedding quality, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. The proposed algorithms are demonstrated in pattern recognition tasks in imaging and medical applications.

11:30 am - 12:15 pm EDT
Optimal transport problems with interaction effects
11th Floor Lecture Hall
 Speaker
 Nestor Guillen, Texas State University
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
We consider two variations on the optimal transportation problem where the particles/agents being transported interact with each other. For instance, imagine the problem of moving a collection of boxes from one configuration to another, where all the boxes move in unison and must avoid each other. As we shall show, these problems can be posed as quadratic optimization problems in the space of probability measures over the space of paths. Although the resulting optimization problem is not always convex, one can show existence and even uniqueness for some types of interactions. Moreover, we show these problems admit a fluid mechanics formulation in the style of Benamou and Brenier. This talk is based on works in collaboration with René Cabrera (UT Austin) and Jacob Homerosky (Texas State).

12:30 - 2:00 pm EDT
Lunch/Free Time

2:00 - 2:45 pm EDT
On geometric properties of sliced optimal transport metrics
11th Floor Lecture Hall
 Speaker
 Jun Kitagawa, Michigan State University
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
The sliced and max sliced Wasserstein metrics were originally proposed as a way to use 1D transport to speed up computation of the usual optimal transport metrics defined on spaces of probability measures. Some basic results are known about their metric structure, but not much is available in the way of a systematic study. In this talk, I will first discuss some further properties of these sliced metrics. Then, I will introduce a larger family of metric spaces into which these metrics can be embedded, which seem to have more desirable geometric properties. This talk is based on joint work with Asuka Takatsu.

3:00 - 3:30 pm EDT
Coffee Break
11th Floor Collaborative Space

3:30 - 4:15 pm EDT
Matching for causal effects via multi-marginal unbalanced optimal transport
11th Floor Lecture Hall
 Speaker
 Florian Gunsilius, University of Michigan
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
Matching on covariates is a well-established framework for estimating causal effects in observational studies. A major challenge is that established methods like matching via nearest neighbors possess poor statistical properties when the dimension of the continuous covariates is high. This article introduces an alternative matching approach based on unbalanced optimal transport that possesses better statistical properties in high-dimensional settings. In particular, we prove that the proposed method dominates classical nearest neighbor matching in mean squared error in finite samples when the dimension of the continuous covariates is high enough. This notable result is already present in low dimensions, as we demonstrate in simulations. It follows from two properties of the new estimator. First, for any positive “matching radius”, the optimal matching obtained converges at the parametric rate in any dimension to the optimal population matching. This stands in contrast to the classical nearest neighbor matching, which suffers from a curse of dimensionality in the continuous covariates. Second, as the matching radius converges to zero, the method is unbiased in the population for the average treatment effect on the overlapping region. The approach also possesses several other desirable properties: it is flexible in allowing for many different ways to define the matching radius and the cost of matching, can be bootstrapped for inference, provides interpretable weights based on the cost of matching individuals, can be efficiently implemented via Sinkhorn iterations, and can match several treatment arms simultaneously. Importantly, it only selects good matches from any treatment arm, thus providing unbiased estimates of average treatment effects in the region of overlapping supports.
All event times are listed in ICERM local time in Providence, RI (Eastern Daylight Time / UTC-4).
Request Reimbursement
This section is for general purposes only and does not indicate that all attendees receive funding. Please refer to your personalized invitation to review your offer.
 ORCID iD
 As this program is funded by the National Science Foundation (NSF), ICERM is required to collect your ORCID iD if you are receiving funding to attend this program. Be sure to add your ORCID iD to your Cube profile as soon as possible to avoid delaying your reimbursement.
 Acceptable Costs

 1 round trip between your home institute and ICERM
 Flights on U.S. or E.U. airlines – economy class to either Providence airport (PVD) or Boston airport (BOS)
 Ground Transportation to and from airports and ICERM.
 Unacceptable Costs

 Flights on non-U.S. or non-E.U. airlines
 Flights on U.K. airlines
 Seats in economy plus, business class, or first class
 Change ticket fees of any kind
 Multi-use bus passes
 Meals or incidentals
 Advance Approval Required

 Personal car travel to ICERM from outside New England
 Multiple-destination plane ticket; does not include layovers to reach ICERM
 Arriving or departing from ICERM more than a day before or day after the program
 Multiple trips to ICERM
 Rental car to/from ICERM
 Flights on Swiss, Japanese, or Australian airlines
 Arriving or departing from airport other than PVD/BOS or home institution's local airport
 2 one-way plane tickets to create a round trip (often purchased from Expedia, Orbitz, etc.)
 Travel Maximum Contributions

 New England: $250
 Other contiguous US: $750
 Asia & Oceania: $2,000
 All other locations: $1,500
 Note these rates were updated in Spring 2022 and supersede any prior invitation rates. Any invitations without travel support will still not receive travel support.
 Reimbursement Requests

Request Reimbursement with Cube
Refer to the back of your ID badge for more information. Checklists are available at the front desk and in the Reimbursement section of Cube.
 Reimbursement Tips

 Scanned original receipts are required for all expenses
 Airfare receipt must show full itinerary and payment
 ICERM does not offer per diem or meal reimbursement
 Allowable mileage is reimbursed at the prevailing IRS business rate; document the trip with a PDF of the Google Maps result
 Keep all documentation until you receive your reimbursement!
 Reimbursement Timing

6 - 8 weeks after all documentation is sent to ICERM. All reimbursement requests are reviewed by numerous central offices at Brown, which may request additional documentation.
 Reimbursement Deadline

Submissions must be received within 30 days of ICERM departure to avoid applicable taxes. Submissions after thirty days will incur applicable taxes. No submissions are accepted more than six months after the program end.