Optimal Transport in Data Science
Institute for Computational and Experimental Research in Mathematics (ICERM)
May 8, 2023 - May 12, 2023
Monday, May 8, 2023

8:50 - 9:00 am EDT | Welcome | 11th Floor Lecture Hall

9:00 - 9:45 am EDT | Network Analysis of High Dimensional Data | 11th Floor Lecture Hall
 Speaker
 Allen Tannenbaum, Stony Brook University
 Session Chair
 James Murphy, Tufts University
Abstract
A major problem in data science is representation of data so that the variables driving key functions can be uncovered and explored. Correlation analysis is widely used to simplify networks of feature variables by reducing redundancies, but makes limited use of the network topology, relying on comparison of direct neighbor variables. The proposed method incorporates relational or functional profiles of neighboring variables along multiple common neighbors, which are fitted with Gaussian mixture models and compared using a data metric based on a version of optimal mass transport tailored to Gaussian mixtures. Hierarchical interactive visualization of the result leads to effective unbiased hypothesis generation. We will discuss several applications to medical imaging and cancer networks.

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Signed Cumulative Distribution Transform for Machine Learning | 11th Floor Lecture Hall
 Speaker
 Sumati Thareja, Vanderbilt University
 Session Chair
 James Murphy, Tufts University
Abstract
Classification and estimation problems are at the core of machine learning. In this talk we will see a new mathematical signal transform that renders data easy to classify or estimate, based on a very old theory of transportation that was started by Monge. We will learn about the existing Cumulative Distribution Transform and then extend to a more general measure theoretic framework, to define the new transform (Signed Cumulative Distribution Transform). We will look at both forward (analysis) and inverse (synthesis) formulas for the transform, and describe several of its properties including translation, scaling, convexity and isometry. Finally, we will demonstrate two applications of the transform in classifying (detecting) signals under random displacements and estimation of signal parameters under such displacements.

11:30 am - 12:15 pm EDT | Towards a Mathematical Theory of Development | 11th Floor Lecture Hall
 Virtual Speaker
 Geoffrey Schiebinger, University of British Columbia
 Session Chair
 James Murphy, Tufts University
Abstract
This talk introduces a mathematical theory of developmental biology, based on optimal transport. While, in principle, organisms are made of molecules whose motions are described by the Schrödinger equation, there are simply too many molecules for this to be useful. Optimal transport (a fascinating topic in its own right, at the intersection of probability, statistics and optimization) provides a set of equations that describe development at the level of cells. Biology has entered a new era of precision measurement and massive datasets. Techniques like single-cell RNA sequencing (scRNA-seq) and single-cell ATAC-seq have emerged as powerful tools to profile cell states at unprecedented molecular resolution. One of the most exciting prospects associated with this new trove of data is the possibility of studying temporal processes, such as differentiation and development. If we could understand the genetic forces that control embryonic development, then we would have a better idea of how cell types are stabilized throughout adult life and how they destabilize with age or in diseases like cancer. This would be within reach if we could analyze the dynamic changes in gene expression, as populations develop and subpopulations differentiate. However, this is not directly possible with current measurement technologies because they are destructive (e.g. cells must be lysed to measure expression profiles). Therefore, we cannot directly observe the waves of transcriptional patterns that dictate changes in cell type. This talk introduces a rigorous framework for understanding the developmental trajectories of cells in a dynamically changing, heterogeneous population based on static snapshots along a time course. The framework is based on a simple hypothesis: over short timescales cells can only change their expression profile by small amounts.
We formulate this in precise mathematical terms using a classical tool called optimal transport (OT), and we propose that this optimal transport hypothesis is a fundamental mathematical principle of developmental biology.

12:30 - 2:00 pm EDT | Lunch - Optimal transport: Junior and seniors | Working Lunch

2:00 - 2:45 pm EDT | Multivariate Distribution-free testing using Optimal Transport | 11th Floor Lecture Hall
 Speaker
 Bodhisattva Sen, Columbia University
 Session Chair
 James Murphy, Tufts University
Abstract
We propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of optimal transport (see e.g., Villani (2003)). We demonstrate the applicability of this approach by constructing exactly distribution-free tests for two classical nonparametric problems: (i) testing for the equality of two multivariate distributions, and (ii) testing for mutual independence between two random vectors. In particular, we propose (multivariate) rank versions of Hotelling T^2 and kernel two-sample tests (e.g., Gretton et al. (2012), Szekely and Rizzo (2013)), and kernel tests for independence (e.g., Gretton et al. (2007), Szekely et al. (2007)) for scenarios (i) and (ii) respectively. We investigate the consistency and asymptotic distributions of these tests, both under the null and local contiguous alternatives. We also study the local power and asymptotic (Pitman) efficiency of these multivariate tests (based on optimal transport), and show that a subclass of these tests achieve attractive efficiency lower bounds that mimic the remarkable efficiency results of Hodges and Lehmann (1956) and Chernoff and Savage (1958) (for the Wilcoxon rank-sum test). To the best of our knowledge, this is the first collection of multivariate, nonparametric, exactly distribution-free tests that provably achieve such attractive efficiency lower bounds. We also study the rates of convergence of the rank maps (aka optimal transport maps).
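The idea of multivariate ranks defined through optimal transport can be illustrated on a toy scale. The sketch below is not the speaker's construction: it assumes a tiny hypothetical sample, a hypothetical fixed reference grid in the unit square, squared Euclidean cost, and a brute-force search over assignments that is only feasible for a handful of points. The function name `ot_ranks` is made up for illustration.

```python
import itertools

def ot_ranks(data, reference):
    """Match each data point to a distinct reference point, minimizing total squared distance."""
    n = len(data)

    def total_cost(perm):
        return sum(
            (data[i][0] - reference[perm[i]][0]) ** 2
            + (data[i][1] - reference[perm[i]][1]) ** 2
            for i in range(n)
        )

    # Brute force over all n! assignments; only sensible for tiny n.
    best = min(itertools.permutations(range(n)), key=total_cost)
    return [reference[j] for j in best]

# Hypothetical sample and a fixed reference grid in the unit square.
data = [(2.1, -0.3), (0.4, 1.7), (-1.2, 0.8), (3.0, 2.2)]
reference = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
ranks = ot_ranks(data, reference)
```

Whatever the data, the resulting ranks are a rearrangement of the same fixed reference points, which gives some intuition for why statistics built from such ranks can be distribution-free.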

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Optimal transport for estimating generalization properties of machine learning models | 11th Floor Lecture Hall
 Speaker
 Stefanie Jegelka, MIT
 Session Chair
 James Murphy, Tufts University
Abstract
One important challenge in practical machine learning is to estimate the generalization properties of a trained model, i.e., judging how well it will perform on unseen data. In this talk, we will discuss two examples of how optimal transport can help with this challenge. First, we address the problem of estimating generalization of deep neural networks. A critical factor is the geometry of the data in the latent embedding space. We analyze this data arrangement via an optimal-transport-based generalization of variance, and show its theoretical and empirical relevance via generalization bounds that are also empirically predictive. Second, we study the stability of neural networks for graph inputs, i.e., graph neural networks (GNNs), under shifts of the data distribution. In particular, to derive stability bounds, we need a suitable metric in the input space of graphs. We derive such a (pseudo)metric targeted to GNNs via a recursive optimal-transport-based distance between sets of trees. Our metric correlates better than the state of the art with the behavior of GNNs under data distribution shifts.

4:30 - 6:30 pm EDT | Reception | 11th Floor Collaborative Space
Tuesday, May 9, 2023

9:00 - 9:45 am EDT | On the Convergence Rate of Sinkhorn’s Algorithm | 11th Floor Lecture Hall
 Speaker
 Marcel Nutz, Columbia University
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
We study Sinkhorn's algorithm for solving the entropically regularized optimal transport problem. Its iterate π_t is shown to satisfy H(π_t | π∗) + H(π∗ | π_t) = O(1/t), where H denotes relative entropy and π∗ the optimal coupling. This holds for a large class of cost functions and marginals, including quadratic cost with subgaussian marginals. We also obtain the rate O(1/t) for the dual suboptimality and O(1/t^2) for the marginal entropies. More precisely, we derive non-asymptotic bounds, and in contrast to previous results on linear convergence that are limited to bounded costs, our estimates do not deteriorate exponentially with the regularization parameter. We also obtain a stability result for π∗ as a function of the marginals, quantified in relative entropy.
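For readers unfamiliar with the iteration being analyzed, here is a minimal sketch of Sinkhorn's algorithm on a small discrete problem. It assumes finite marginals, a quadratic cost, and a hypothetical regularization value `eps=0.5`; none of these choices come from the talk, and the sketch says nothing about the convergence rates discussed above.

```python
import math

def sinkhorn(mu, nu, cost, eps=0.5, iters=500):
    """Entropic OT between discrete marginals mu and nu via alternating scaling."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so the first marginal matches mu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the second marginal matches nu.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy problem: three source points, two target points on the line.
x = [0.0, 1.0, 2.0]
y = [0.5, 1.5]
mu = [0.2, 0.5, 0.3]
nu = [0.6, 0.4]
cost = [[(a - b) ** 2 for b in y] for a in x]

plan = sinkhorn(mu, nu, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(x))) for j in range(len(y))]
```

Each pass enforces one marginal exactly and the other only approximately; the iterate π_t in the abstract corresponds to the coupling `plan` after t such passes.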

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Applications of No-Collision Transportation Maps in Manifold Learning | 11th Floor Lecture Hall
 Speaker
 Elisa Negrini, Institute for Pure and Applied Mathematics, University of California Los Angeles
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
In this work, we investigate applications of no-collision transportation maps introduced in [Nurbekyan et al., 2020] in manifold learning for image data. Recently, there has been a surge in applying transportation-based distances and features for data representing motion-like or deformation-like phenomena. Indeed, comparing intensities at fixed locations often does not reveal the data structure. No-collision maps and distances developed in [Nurbekyan et al., 2020] are sensitive to geometric features similar to optimal transportation (OT) maps but much cheaper to compute due to the absence of optimization. In this work, we prove that no-collision distances provide an isometry between translations (respectively dilations) of a single probability measure and the translation (respectively dilation) vectors equipped with a Euclidean distance. Furthermore, we prove that no-collision transportation maps, as well as OT and linearized OT maps, do not in general provide an isometry for rotations. The numerical experiments confirm our theoretical findings and show that no-collision distances achieve similar or better performance on several manifold learning tasks compared to other OT and Euclidean-based methods at a fraction of the computational cost.

11:30 am - 12:15 pm EDT | Graphical Optimal Transport and its applications | 11th Floor Lecture Hall
 Speaker
 Yongxin Chen, Georgia Institute of Technology
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
Multi-marginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy-regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for MOT by leveraging the well-developed algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games.

12:30 - 2:00 pm EDT | Lunch/Free Time

2:00 - 2:45 pm EDT | Advances in Distributionally Robust Optimization (DRO): Unifications, Extensions, and Applications | 11th Floor Lecture Hall
 Speaker
 Jose Blanchet, Stanford
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
We will discuss recent developments in distributionally robust optimization, including a tractable class of problems that simultaneously unifies and extends most of the formulations studied in DRO (including phi-divergence, inverse-phi-divergence, Wasserstein, and Sinkhorn). This unification is based on optimal transport theory with martingale constraints. We discuss various benefits of having the flexibility offered by these formulations in connection with, for example, the theory of epi-convergence and statistical robustness. We apply some of these new developments to optimal portfolio selection. Our implementations are motivated by intriguing experiments which show an unexpected out-of-sample performance of non-robust policies in real data. This talk is partly based on joint work with Daniel Kuhn, Jiajin Li, Yiping Lu, and Bahar Taskesen.

3:00 - 5:00 pm EDT
Wednesday, May 10, 2023

9:00 - 9:45 am EDT | Mirror gradient flows: Euclidean and Wasserstein | 11th Floor Lecture Hall
 Speaker
 Soumik Pal, University of Washington, Seattle
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
We will talk about a new family of Wasserstein gradient flows that is inspired by Euclidean mirror gradient flows. These flows can often display faster convergence rates than the usual gradient flows. They have rich geometrical structures and give rise to a wide generalization of the Langevin diffusions and the Fokker-Planck PDEs. Immediate applications come from considering limits of Sinkhorn iterations.

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Many Processors, Little Time: MCMC for Partitions via Optimal Transport Couplings | 11th Floor Lecture Hall
 Speaker
 Tamara Broderick, Massachusetts Institute of Technology
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the “label-switching problem”: semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions’ (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime.
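The distinction between labelings and partitions can be made concrete with a few lines of code. This sketch only illustrates the label-switching issue; it is not the coupling algorithm from the talk, and the helper name `labeling_to_partition` is hypothetical.

```python
def labeling_to_partition(labels):
    """Map a cluster labeling to the partition it induces, discarding label names."""
    blocks = {}
    for index, label in enumerate(labels):
        blocks.setdefault(label, set()).add(index)
    return frozenset(frozenset(block) for block in blocks.values())

# Two labelings of five items that describe the same clustering with permuted labels.
a = [0, 0, 1, 2, 1]
b = [2, 2, 0, 1, 0]
# A genuinely different clustering.
c = [0, 1, 1, 2, 1]

same_partition = labeling_to_partition(a) == labeling_to_partition(b)
different_partition = labeling_to_partition(a) != labeling_to_partition(c)
```

Two chains sitting at `a` and `b` have not met as labelings, yet they already agree as partitions, which is the motivation for working on partition space.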

11:30 am - 12:15 pm EDT | On Hamilton-Jacobi (HJ) equations on the Wasserstein space on graphs | 11th Floor Lecture Hall
 Speaker
 Wilfrid Gangbo, UCLA
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst

12:25 - 12:30 pm EDT | Group Photo (Immediately After Talk) | 11th Floor Lecture Hall

12:30 - 2:00 pm EDT | Lunch/Free Time

2:00 - 2:45 pm EDT | Certifiable low-dimensional structure in transport and inference | 11th Floor Lecture Hall
 Speaker
 Youssef Marzouk, Massachusetts Institute of Technology
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
I will discuss two notions of low-dimensional structure in probability measures, and their interplay with transport-driven methods for sampling and approximate inference. The first seeks to approximate a high-dimensional target measure as a low-dimensional update of a dominating reference measure. The second is low-rank conditional structure, where the goal is to replace conditioning variables with low-dimensional projections or summaries. In both cases, under appropriate assumptions on the reference or target measures, we can derive gradient-based upper bounds on the associated approximation error and minimize these bounds to identify good subspaces for approximation. The associated subspaces then dictate specific structural ansatzes for transport maps that represent the target of interest as the pushforward or pullback of a suitable reference measure. I will show several algorithmic instantiations of this idea: a greedy algorithm that builds deep compositions of maps, where low-dimensional projections of the parameters are iteratively transformed to match the target; and a simulation-based inference algorithm that uses low-rank conditional structure to efficiently solve Bayesian inverse problems. Based on joint work with Ricardo Baptista, Michael Brennan, and Olivier Zahm.

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Optimal Mass Transport meets Stochastic Thermodynamics: Dissipation & Power in Physics and Biology | 11th Floor Lecture Hall
 Speaker
 Tryphon Georgiou, University of California, Irvine
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
The discovery in 1998 of a link between the Wasserstein-2 metric, entropy, and the heat equation, by Jordan, Kinderlehrer, and Otto, precipitated the increasing relevance of optimal mass transport in the evolving theory of finite-time thermodynamics, aka stochastic energetics. Specifically, dissipation in finite-time thermodynamic transitions for Langevin models of colloidal particles can be measured in terms of the Wasserstein length of trajectories. This enabling new insight has led to quantifying power and efficiency of thermodynamic cycles that supersede classical quasi-static Carnot engine concepts that alternate their contact between heat baths of different temperatures. Indeed, naturally occurring processes often harvest energy from temperature or chemical gradients, where the enabling mechanism responsible for transduction of energy relies on nonequilibrium steady states and finite-time cycling. Optimal mass transport provides the geometric structure of the manifold of thermodynamic states for studying energy harvesting mechanisms. In this setting, dissipation and work output can be expressed as path and area integrals, and fundamental limitations on power and efficiency can be expressed in geometric terms, leading to isoperimetric problems. The analysis presented provides guiding principles for building autonomous engines that extract work from thermal or chemical anisotropy in the environment.
Thursday, May 11, 2023

9:00 - 9:45 am EDT | Wasserstein Isometric Mapping and Image Manifold Learning | 11th Floor Lecture Hall
 Speaker
 Keaton Hamm, University of Texas at Arlington
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
We will discuss an algorithm called Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. We will discuss computational speedups to the algorithm such as use of linearized optimal transport or the Nyström method. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
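The recovery of translation parameters can be checked by hand in one dimension, where the 2-Wasserstein distance between equal-size samples reduces to comparing sorted values. The sketch below is a simplified 1D stand-in, not Wassmap itself: it assumes a hypothetical base sample and shift values, and it stops at the pairwise distance matrix rather than performing the embedding step.

```python
import math

def w2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1D samples with uniform weights."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))

# Hypothetical generating sample and translation parameters.
base = [-1.0, -0.2, 0.1, 0.7, 1.5]
shifts = [0.0, 0.5, 2.0, 3.5]
translates = [[x + t for x in base] for t in shifts]

# Pairwise Wasserstein distances between the translated samples.
dist = [[w2_1d(p, q) for q in translates] for p in translates]
```

Here `dist[i][j]` agrees with `abs(shifts[i] - shifts[j])` up to rounding, so the distance matrix already encodes the translation parameters isometrically.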

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Function-space regularized divergences for machine learning applications | 11th Floor Lecture Hall
 Speaker
 Ioannis (Yannis) Pantazis, Foundation for Research and Technology - Hellas
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
Divergences such as Kullback-Leibler, Rényi and f-divergence play an increasingly important role in probabilistic machine learning, offering a notion of distance between probability distributions. In the recent past, divergence estimation has been developed on the premise of variational formulas and function parametrization via neural networks. Despite the successes, the statistical estimation of a divergence is still considered a very challenging problem, mainly due to the high variance of the neural-based estimators. In particular, hard cases include high dimensional data, large divergence values and Rényi divergence when its order is larger than one. Our recent work focuses on reducing the variance by regularizing the function space of the variational formulas. We will present novel families of divergences that enjoy enhanced statistical properties, and discuss those properties. These function-space regularized divergences have been tested on a series of ML applications including generative adversarial networks, mutual information estimation and rare sub-population detection.

11:30 am - 12:15 pm EDT | Controlling regularized conservation laws: Entropy-entropy flux pairs | 11th Floor Lecture Hall
 Speaker
 Wuchen Li, University of South Carolina
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
In this talk, we study variational problems for regularized conservation laws with Lax's entropy-entropy flux pairs. We first introduce a modified optimal transport space based on conservation laws with diffusion. Using this space, we demonstrate that conservation laws with diffusion are flux-gradient flows. We next construct variational problems for these flows, for which we derive dual PDE systems for regularized conservation laws. Several examples, including traffic flow and Burgers' equation, are presented. We successfully compute the control of conservation laws by incorporating both primal-dual algorithms and monotone schemes. This is based on joint work with Siting Liu and Stanley Osher.

12:30 - 2:00 pm EDT | Lunch - Outlook and future directions for OT in Data Science and Machine Learning | Working Lunch

2:00 - 2:45 pm EDT | Triangular transport for learning probabilistic graphical models | 11th Floor Lecture Hall
 Speaker
 Rebecca Morrison, University of Colorado Boulder
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
Probabilistic graphical models encode the conditional independence properties satisfied by a joint probability distribution. If the distribution is Gaussian, the edges of an undirected graphical model correspond to non-zero entries of the precision matrix. Generalizing this result to continuous non-Gaussian distributions, one can show that an edge exists if and only if an entry of the Hessian of the log density is non-zero (everywhere). But evaluation of the log density requires density estimation: for this, we propose the graph-learning algorithm SING (Sparsity Identification in Non-Gaussian distributions), which uses triangular transport for the density estimation step; this choice is advantageous as triangular maps inherit sparsity from conditional independence in the target distribution. Loosely speaking, the more non-Gaussian the distribution, the more difficult the transport problem. For a broad class of non-Gaussian distributions, however, estimating the Hessian of the log density is much easier than estimating the density itself. For the transport community, this result serves as a sort of goal-oriented transport framework, in which the particular goal of graph learning greatly simplifies the transport problem.

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Stein transport for Bayesian inference | 11th Floor Lecture Hall
 Speaker
 Nikolas Nüsken, King’s College London
 Session Chair
 Shuchin Aeron, Tufts University
Abstract
This talk is about Stein transport, a novel methodology for Bayesian inference that pushes an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can equivalently be obtained from either a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. I will discuss the geometric underpinnings of Stein transport and SVGD, and, time permitting, connections to MCMC and the theory of large deviations.
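As a point of comparison, the standard SVGD update that the abstract refers to can be sketched in one dimension. This is plain SVGD, not Stein transport: it uses a fixed score function and unweighted particles, and the target (a standard normal), kernel bandwidth, step size and initial particles are all illustrative assumptions.

```python
import math

def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One standard SVGD update for 1D particles with a Gaussian (RBF) kernel."""
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2.0 * h2))
            # Attraction: kernel-weighted score at xj.
            # Repulsion: derivative of k(xj, xi) with respect to xj.
            phi += k * score(xj) + k * (xi - xj) / h2
        updated.append(xi + step * phi / n)
    return updated

# Illustrative target: standard normal, whose score function is -x.
particles = [3.0 + 0.1 * i for i in range(10)]
for _ in range(1000):
    particles = svgd_step(particles, lambda x: -x)

mean = sum(particles) / len(particles)
```

According to the abstract, Stein transport differs by replacing the fixed `score` with a time-varying one along the tempered curve and by attaching weights to the particles; neither modification is attempted here.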
Friday, May 12, 2023

9:00 - 9:45 am EDT | Lipschitz regularized gradient flows and latent generative particles | 11th Floor Lecture Hall
 Speaker
 Panagiota Birmpa, Heriot-Watt University
 Session Chair
 Markos Katsoulakis, University of Massachusetts Amherst
Abstract
Lipschitz regularized f-divergences interpolate between the Wasserstein metric and f-divergences and provide a flexible family of loss functions for non-absolutely continuous distributions (i.e. empirical), possibly with heavy tails. We construct gradient flows based on those divergences taking advantage of neural network spectral normalization (a closely related form of Lipschitz regularization). The Lipschitz regularized gradient flows induce a transport/discriminator particle algorithm where generative particles are moved along a vector field given by the gradient of the discriminator, the latter computed as in generative adversarial networks (GANs). The particle system generates approximate samples from typically high-dimensional distributions known only from data. Examples of such gradient flows are Lipschitz-regularized Fokker-Planck and porous medium equations for Kullback-Leibler and alpha-divergences respectively. Such PDE perspectives allow the analysis of the algorithm’s stability and convergence, for instance through an empirical, Lipschitz regularized, version of Fisher information which tracks the convergence of the algorithms.

10:00 - 10:30 am EDT | Coffee Break | 11th Floor Collaborative Space

10:30 - 11:15 am EDT | Approximations and learning in the Wasserstein space | 11th Floor Lecture Hall
 Speaker
 Caroline Moosmüller, University of North Carolina at Chapel Hill
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
Detecting differences and building classifiers between distributions, given only finite samples, are important tasks in a number of scientific fields. Optimal transport and the Wasserstein distance have evolved as the most natural concept to deal with such tasks, but have some computational drawbacks. In this talk, we describe an approximation framework through local linearizations that significantly reduces both the computational effort and the required training data in supervised learning settings. We also introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. We provide guarantees on the embedding quality, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. The proposed algorithms are demonstrated in pattern recognition tasks in imaging and medical applications.

11:30 am - 12:15 pm EDT | Optimal transport problems with interaction effects | 11th Floor Lecture Hall
 Speaker
 Nestor Guillen, Texas State University
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
We consider two variations on the optimal transportation problem where the particles/agents being transported interact with each other. For instance, imagine the problem of moving a collection of boxes from one configuration to another, where all the boxes move in unison and must avoid each other. As we shall show, these problems can be posed as quadratic optimization problems in the space of probability measures over the space of paths. Although the resulting optimization problem is not always convex, one can show existence and even uniqueness for some types of interactions. Moreover, we show these problems admit a fluid mechanics formulation in the style of Benamou and Brenier. This talk is based on works in collaboration with René Cabrera (UT Austin) and Jacob Homerosky (Texas State).

12:30 - 2:00 pm EDT | Lunch/Free Time

2:00 - 2:45 pm EDT | On geometric properties of sliced optimal transport metrics | 11th Floor Lecture Hall
 Speaker
 Jun Kitagawa, Michigan State University
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
The sliced and max sliced Wasserstein metrics were originally proposed as a way to use 1D transport to speed up computation of the usual optimal transport metrics defined on spaces of probability measures. Some basic results are known about their metric structure, but not much is available in the way of a systematic study. In this talk, I will first discuss some further properties of these sliced metrics. Then, I will introduce a larger family of metric spaces into which these metrics can be embedded, which seem to have more desirable geometric properties. This talk is based on joint work with Asuka Takatsu.
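The use of 1D transport in the sliced and max sliced metrics can be sketched for small 2D point clouds. This is a Monte Carlo approximation under several assumptions not taken from the talk: equal-size clouds with uniform weights, the exponent p = 2, random directions drawn from a seeded generator, and a maximum over the sampled directions standing in for the true supremum. The function names are hypothetical.

```python
import math
import random

def w2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1D samples with uniform weights."""
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))

def projected_distances(points_a, points_b, n_dirs=200, seed=0):
    """1D Wasserstein distances of the two clouds projected onto random directions."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_dirs):
        # Directions on the half circle suffice: d and -d give the same distance.
        theta = rng.uniform(0.0, math.pi)
        dx, dy = math.cos(theta), math.sin(theta)
        proj_a = [x * dx + y * dy for x, y in points_a]
        proj_b = [x * dx + y * dy for x, y in points_b]
        out.append(w2_1d(proj_a, proj_b))
    return out

def sliced_w2(points_a, points_b):
    """Root mean square of the projected distances (sliced metric, approximated)."""
    d = projected_distances(points_a, points_b)
    return math.sqrt(sum(v * v for v in d) / len(d))

def max_sliced_w2(points_a, points_b):
    """Largest projected distance over the sampled directions (max sliced, approximated)."""
    return max(projected_distances(points_a, points_b))

# Hypothetical cloud and a translated copy of it.
cloud = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0), (2.0, -1.0)]
shift = (3.0, 4.0)
moved = [(x + shift[0], y + shift[1]) for x, y in cloud]

sw = sliced_w2(cloud, moved)
msw = max_sliced_w2(cloud, moved)
```

For a pure translation, each projected distance equals the absolute inner product of the shift with the direction, so both quantities are bounded by the length of the shift (5 in this example).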

3:00 - 3:30 pm EDT | Coffee Break | 11th Floor Collaborative Space

3:30 - 4:15 pm EDT | Matching for causal effects via multi-marginal unbalanced optimal transport | 11th Floor Lecture Hall
 Speaker
 Florian Gunsilius, University of Michigan
 Session Chair
 Luc Rey-Bellet, UMass Amherst
Abstract
Matching on covariates is a well-established framework for estimating causal effects in observational studies. A major challenge is that established methods like matching via nearest neighbors possess poor statistical properties when the dimension of the continuous covariates is high. This article introduces an alternative matching approach based on unbalanced optimal transport that possesses better statistical properties in high-dimensional settings. In particular, we prove that the proposed method dominates classical nearest neighbor matching in mean squared error in finite samples when the dimension of the continuous covariates is high enough. This notable result is already present in low dimensions, as we demonstrate in simulations. It follows from two properties of the new estimator. First, for any positive “matching radius”, the optimal matching obtained converges at the parametric rate in any dimension to the optimal population matching. This stands in contrast to the classical nearest neighbor matching, which suffers from a curse of dimensionality in the continuous covariates. Second, as the matching radius converges to zero, the method is unbiased in the population for the average treatment effect on the overlapping region. The approach also possesses several other desirable properties: it is flexible in allowing for many different ways to define the matching radius and the cost of matching, can be bootstrapped for inference, provides interpretable weights based on the cost of matching individuals, can be efficiently implemented via Sinkhorn iterations, and can match several treatment arms simultaneously. Importantly, it only selects good matches from any treatment arm, thus providing unbiased estimates of average treatment effects in the region of overlapping supports.
All event times are listed in ICERM local time in Providence, RI (Eastern Standard Time / UTC-5).