Organizing Committee
- Marta D'Elia, Pasteur Labs. and Stanford University
- George Karniadakis, Brown University
- Siddhartha Mishra, ETH Zurich
- Themistoklis Sapsis, MIT
- Jinchao Xu, Pennsylvania State University
- Zhongqiang Zhang, Worcester Polytechnic Institute
Abstract
MSML2023 is the fourth edition of a newly established conference, with emphasis on promoting the study of mathematical theory and algorithms of machine learning, as well as applications of machine learning in scientific computing and engineering disciplines. This conference aims to bring together the communities of machine learning, applied mathematics, and computational science and engineering, to exchange ideas and progress in the fast-growing field of scientific machine learning (SciML). The objectives of this annual conference series are to:
- Promote the study of the theory and algorithms of machine learning.
- Promote applications in scientific and engineering disciplines such as physics, chemistry, material sciences, fluid and solid mechanics, etc.
- Provide hands-on tutorials for students and new researchers in the field.
Previous MSML Conferences:
First MSML: https://www.pacm.princeton.edu/news/msml2020-mathematical-and-scientific-machine-learning-conference
Second MSML: https://msml21.github.io/
Third MSML: https://msml22.github.io/
Submission for contributed posters/talks is now open and will stay open until February 10, 2023.
Please submit this form to be considered.
This workshop is partially funded by AFOSR award FA9550-23-1-0193 and DOE award DE-SC0023818.
Confirmed Speakers & Participants
Talks will be presented virtually or in person as indicated in the schedule below.
-
Jonas Actor
Sandia National Laboratories
-
Brad Aimone
Sandia National Laboratories
-
Rima Alaifari
ETH Zurich, D-MATH (SAM)
-
Bang An
King Abdullah University of Science and Technology
-
Shivam Barwey
Argonne National Laboratory
-
Peter Battaglia
DeepMind
-
Sara Bicego
Imperial College London
-
Andrea Bonfanti
BMW
-
Nicolas Boulle
University of Cambridge
-
Michael Bronstein
University of Oxford
-
Edoardo Centofanti
University of Pavia
-
Biswadeep Chakraborty
Georgia Institute of Technology
-
Peng Chen
Georgia Institute of Technology
-
Jingyi Chen
The University of Tulsa
-
Yifan Chen
Caltech
-
Paula Chen
Brown University
-
Xiaoli Chen
National University of Singapore
-
Zheng “Leslie” Chen
University of Massachusetts Dartmouth
-
Siu Wun Cheung
Lawrence Livermore National Laboratory
-
Frank Cole
University of Massachusetts Amherst
-
Marta D'Elia
Pasteur Labs. and Stanford University
-
Tharindu De Alwis
Worcester Polytechnic Institute
-
David Del Rey Fernández
University of Waterloo
-
Suchuan Dong
Purdue University
-
Priya Donti
MIT
-
Fariba Fahroo
AFOSR
-
Tiffany Fan
Stanford University
-
Muhammad Faryad
Pennsylvania State University
-
Daniel Floryan
University of Houston
-
Khaled Furati
King Fahd University of Petroleum and Minerals
-
Zhenyuan Gao
Dassault Systemes Simulia Corp
-
Anna Gilbert
Yale University
-
Charles Godfrey
Pacific Northwest National Laboratory
-
Somdatta Goswami
Brown University
-
Jonathan Gryak
Queens College, CUNY
-
Kanan Gupta
Texas A&M University
-
Jihun Han
Dartmouth College
-
Ashlin Harris
Brown University
-
Huan He
Harvard University
-
Alex Hernandez-Garcia
Mila
-
Francisco Holguin
Johns Hopkins Applied Physics Lab
-
Youngjoon Hong
Sungkyunkwan University
-
Intekhab Hossain
Harvard University
-
Amanda Howard
Pacific Northwest National Laboratory
-
Juntao Huang
Texas Tech University
-
Nadeesha Jayaweera
Worcester Polytechnic Institute
-
Stefanie Jegelka
MIT
-
Reese Jones
Sandia National Laboratories
-
Adar Kahana
Brown University
-
Vasileios Kalantzis
IBM Research
-
Amin Karbasi
Yale University
-
George Karniadakis
Brown University
-
Yannis Kevrekidis
Johns Hopkins University
-
Taufiquar Khan
UNC Charlotte
-
Dohyun Kim
Brown University
-
Tyler Kroells
Iowa State University
-
Dinesh Kumar
University of Bristol
-
Seulip Lee
University of Georgia
-
Youngkyu Lee
KAIST
-
Chunyan Li
University of South Carolina
-
Ying Liang
Purdue University
-
Lizuo Liu
Southern Methodist University
-
Yun Lu
Kutztown University
-
Lu Lu
University of Pennsylvania
-
Sibusiso Mabuza
JMP Statistical Discovery
-
Georg Maierhofer
Sorbonne Université
-
Carlo Marcati
University of Pavia
-
Romit Maulik
Argonne National Laboratory
-
Hrushikesh Mhaskar
Claremont Graduate University (Claremont, CA, US)
-
Katarzyna Michalowska
Brown University
-
Siddhartha Mishra
ETH Zurich
-
Katherine Moore
Amherst College
-
Kateryna Morozovska
KTH Royal Institute of Technology
-
Indranil Nayak
The Ohio State University
-
Suman Neupane
University of North Carolina at Charlotte
-
Ebenezer Oluwasakin
Middle Tennessee State University
-
Houman Owhadi
California Institute of Technology
-
Stefano Pagani
Politecnico di Milano
-
Priya Panda
Yale University
-
Luca Pavarino
Università degli Studi di Pavia
-
Luca Pegolotti
Stanford University
-
Adrienne Propp
Stanford University
-
Akshay Rangamani
Massachusetts Institute of Technology
-
Signe Riemer-Sørensen
SINTEF Digital
-
Ritwick Roy
Simulia Corp.
-
Mohsen Sadr
Massachusetts Institute of Technology
-
Themistoklis Sapsis
MIT
-
Daniele Schiavazzi
University of Notre Dame
-
Catherine Schuman
University of Tennessee
-
Christoph Schwab
ETH Zürich
-
Jacob Seidman
University of Pennsylvania
-
Peter Sentz
Brown University
-
Panos Stinis
Pacific Northwest National Laboratory
-
Maria Luisa Taccari
University of Leeds
-
Daniel Tartakovsky
Stanford University
-
Nathaniel Trask
Sandia National Laboratories
-
Elise Walker
Sandia National Laboratories
-
Hong Wang
University of South Carolina
-
Steffen W. R. Werner
Courant Institute, New York University
-
Jinchao Xu
Pennsylvania State University
-
Mengjia Xu
New Jersey Institute of Technology
-
Jue Yan
Iowa State University
-
Qian Yang
University of Connecticut
-
Yue Yu
Lehigh University
-
Zecheng Zhang
Carnegie Mellon University
-
Zhongqiang Zhang
Worcester Polytechnic Institute
-
Hongli Zhao
University of Chicago
-
Xueyu Zhu
University of Iowa
-
Qiao Zhuang
Worcester Polytechnic Institute
Workshop Schedule
Monday, June 5, 2023
-
8:30 - 8:50 am EDT
Check In
11th Floor Collaborative Space
-
8:50 - 9:00 am EDT
Welcome
11th Floor Lecture Hall
- Brendan Hassett, ICERM/Brown University
-
9:00 - 9:45 am EDT
A graph exterior calculus for structure-preserving ML: data-driven modeling, graph analytics, and causal learning
11th Floor Lecture Hall
- Speaker
- Nathaniel Trask, Sandia National Laboratories
- Session Chair
- George Karniadakis, Brown University
Abstract
We introduce a graph exterior calculus which offers a rigorous mathematical framework for developing machine learning models on graphs. The calculus exactly mimics traditional vector calculus, providing analogous theoretical tools to variational methods for PDEs and providing a simple mathematical framework for stability analysis and preservation of physical and mathematical structure. In this talk we briefly introduce the fundamentals of the framework before showing how it may be used for a broad range of scientific machine learning tasks: fitting of network models to data, graph discovery associated with control volume analysis, reversible/irreversible bracket dynamics discovery, and causal DAG discovery.
-
10:00 - 10:30 am EDT
Coffee Break
11th Floor Collaborative Space
-
10:30 - 11:15 am EDT
Learning simulation: graphs, physics, and weather
11th Floor Lecture Hall
- Virtual Speaker
- Peter Battaglia, DeepMind
- Session Chair
- George Karniadakis, Brown University
Abstract
Simulation is one of the most important tools in science and engineering. However, accurate simulation faces two challenges: (1) heavy compute requirements, and (2) sophisticated underlying equations which require deep expertise to formulate. Recent advances in machine learning-based simulation are now addressing both challenges by (1) allowing dynamics to be modeled with cheaper representations and computations, and (2) learning dynamics models directly from data. This talk will survey advances in graph-based learned simulation from the past few years, then deep dive into recent advances in machine learning-based weather prediction that have resulted in learned simulators that outperform the top operational forecasting systems in the world.
-
11:30 am - 12:15 pm EDT
Machine learning for science or machine learning + science?
11th Floor Lecture Hall
- Speaker
- Anna Gilbert, Yale University
- Session Chair
- George Karniadakis, Brown University
Abstract
There has been considerable attention paid to machine learning tools for science. There is a belief that such tools can "reveal profound insights hiding in large and growing datasets." Indeed, people hope that models "can be automatically derived from that data" and that such models can be used to identify features, reduce complexity, and control experiments. This talk aims to caution the community against irrational exuberance and to investigate very carefully such claims. I will discuss two situations in which the blind usage of machine learning techniques fails to live up to the hype.
-
12:30 - 2:30 pm EDT
Lunch/Free Time
-
2:30 - 3:15 pm EDT
Computational Hypergraph Discovery
11th Floor Lecture Hall
- Speaker
- Houman Owhadi, California Institute of Technology
- Session Chair
- Jinchao Xu, Pennsylvania State University
Abstract
Most problems in Computational Sciences and Engineering can be formulated as that of discovering and/or completing, from data, a computational hypergraph representing (possibly partially known) functional dependencies (represented as hyperedges) between groups of variables (represented as nodes). When the structure of the hypergraph is known then the problem reduces to that of approximating (from data) unknown functions and variables and it is solved, in the Computational Graph Completion (CGC) framework, by replacing unknown functions by Gaussian Processes and computing their MAP or Empirical Bayes estimators given available data. In this talk we will focus on the problem of discovering the structure (connectivity) of the hypergraph itself from data.
-
3:30 - 4:00 pm EDT
Coffee Break
11th Floor Collaborative Space
-
4:00 - 4:20 pm EDT
Neural architecture search for scientific machine learning
11th Floor Lecture Hall
- Speaker
- Romit Maulik, Argonne National Laboratory
- Session Chair
- Jinchao Xu, Pennsylvania State University
Abstract
The construction of high-performing neural network architectures is central to their impressive performance in several scientific machine learning (SciML) tasks. In this talk, we will introduce a search framework for discovering high-performing neural networks on distributed computing resources. We will also demonstrate how our search framework may be used for multiobjective optimization as well as ensemble-based uncertainty quantification. Our search will be used to discover accurate and efficient neural networks for various SciML tasks, such as geophysical forecasting and flow reconstruction from sparse observations with quantified uncertainty.
-
4:30 - 4:50 pm EDT
Cell-average based neural network method for time dependent problems
11th Floor Lecture Hall
- Speaker
- Jue Yan, Iowa State University
- Session Chair
- Jinchao Xu, Pennsylvania State University
Abstract
In this talk, we present the recently developed cell-average based neural network (CANN) method. The method is motivated by finite volume schemes and is based on the integral or weak formulation of the PDEs. A simple feedforward network is trained to learn the difference in solution averages between two neighboring time steps. The well-trained network parameter set is identified as the scheme coefficients of an explicit one-step finite volume type method. The CANN method is implemented as a regular finite volume scheme and is thus mesh dependent. Unlike conventional numerical methods, the CANN method is relieved of the CFL restriction of explicit schemes and can therefore adopt large time steps when evolving the solution forward in time. The CANN method can sharply evolve contact discontinuities with almost zero numerical diffusion. Shock and rarefaction waves are well captured for nonlinear hyperbolic conservation laws.
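The update described in this abstract can be written schematically as follows. This is only a sketch under assumptions: a one-dimensional uniform mesh with cell width $\Delta x$, a stencil reaching $p$ cells to the left and $q$ cells to the right, and the symbols $\mathcal{N}$ (the network) and $\Theta$ (its parameters) are illustrative notation that does not appear in the abstract.

```latex
\bar{u}_j^{\,n} = \frac{1}{\Delta x}\int_{x_{j-1/2}}^{x_{j+1/2}} u(x, t_n)\,dx,
\qquad
\bar{u}_j^{\,n+1} = \bar{u}_j^{\,n}
  + \mathcal{N}\big(\bar{u}_{j-p}^{\,n}, \dots, \bar{u}_{j+q}^{\,n};\ \Theta\big).
```

Read this way, the network output plays the role of the cell-average difference between two neighboring time steps, and a trained $\Theta$ acts like the coefficients of an explicit one-step scheme.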
-
5:00 - 6:30 pm EDT
Reception
11th Floor Collaborative Space
Tuesday, June 6, 2023
-
9:00 - 9:45 am EDT
Physics-inspired learning on graphs
11th Floor Lecture Hall
- Virtual Speaker
- Michael Bronstein, University of Oxford
- Session Chair
- Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
The message-passing paradigm has been the “battle horse” of deep learning on graphs for several years, making graph neural networks a big success in a wide range of applications, from particle physics to protein design. From a theoretical viewpoint, it established the link to the Weisfeiler-Lehman hierarchy, allowing the expressive power of GNNs to be analysed. We argue that the very “node-and-edge”-centric mindset of current graph deep learning schemes may hinder future progress in the field. As an alternative, we propose physics-inspired “continuous” learning models that open up a new trove of tools from the fields of differential geometry, algebraic topology, and differential equations so far largely unexplored in graph ML.
-
10:00 - 10:30 am EDT
Coffee Break
11th Floor Collaborative Space
-
10:30 - 11:15 am EDT
On Learning Operators
11th Floor Lecture Hall
- Virtual Speaker
- Siddhartha Mishra, ETH Zurich
- Session Chair
- Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
Learning operators from data is emerging as a dominant paradigm in the application of machine learning to PDEs. Yet, exactly what operator learning entails still eludes rigorous characterization. We argue that it is not enough for a model to process functions as inputs and outputs to be characterized as an operator learning framework. Instead, it is essential to impose some form of continuous-discrete equivalence to enable the genuine learning of operators, rather than just discrete representations of them. To this end, we adapt tools from harmonic analysis in the form of frame theory to define representation equivalent neural operators (ReNOs), which are constructed to enforce a suitable form of continuous-discrete equivalence. We investigate whether several existing learning frameworks are ReNOs or not. Then, we will present a novel operator learning paradigm, that of convolutional neural operators (CNOs), which are designed to be ReNOs. CNOs are shown to approximate operators, stemming from a large class of PDEs, to desired accuracy. Moreover, we compare CNOs to existing operator learning algorithms in numerical experiments to demonstrate that CNOs are competitive in performance for a variety of PDEs.
-
11:30 - 11:50 am EDT
Transfer learning for surrogate models of PDEs
11th Floor Lecture Hall
- Speaker
- Adrienne Propp, Stanford University
- Session Chair
- Nathaniel Trask, Sandia National Laboratories
Abstract
The development of efficient surrogates for PDEs is a critical step towards scalable modeling of complex, multiscale systems-of-systems. We use transfer learning with multilevel data to train a deep convolutional NN (CNN)-based surrogate model, which significantly reduces the cost of data generation relative to a conventional approach. We show that transfer learning on a mixture of high- and low-fidelity training data—obtained with a two-dimensional PDE and its one-dimensional approximation, respectively—reduces the cost of data generation without reducing performance of the surrogate.
-
12:00 - 12:20 pm EDT
Learning Reduced-Order Models for Cardiovascular Simulations using Graph Neural Networks
11th Floor Lecture Hall
- Speaker
- Luca Pegolotti, Stanford University
- Session Chair
- Nathaniel Trask, Sandia National Laboratories
Abstract
We present a novel approach for simulating blood flow dynamics in cardiovascular modeling using a modified version of MeshGraphNet, a graph neural network architecture originally developed for meshed data. Our method involves developing one-dimensional reduced-order models that predict the pressure and flow rate at vessel centerline nodes. The graph-neural network acts as an iterative solver, taking the state of the system at a particular timestep as input and providing an update that allows us to evolve the system to the next timestep. The approach is accurate and generalizable, achieving errors below 2% and 3% for pressure and flow rate, respectively, in a variety of anatomies and boundary conditions. Our modifications to MeshGraphNet enable its application to our specific problem domain, and our findings demonstrate the effectiveness of our approach for simulating blood flow dynamics in complex cardiovascular systems.
-
12:30 - 2:30 pm EDT
Poster Session Lunch
Poster Session - 10th Floor Collaborative Space
-
2:30 - 2:50 pm EDT
GFlowNets to accelerate scientific discovery with machine learning
11th Floor Lecture Hall
- Speaker
- Alex Hernandez-Garcia, Mila
- Session Chair
- Nathaniel Trask, Sandia National Laboratories
Abstract
Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discoveries. The last few decades have seen the consolidation of data-driven scientific discoveries. However, in order to leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline. A key current challenge for machine learning methods in this context is the efficient exploration of very large search spaces, which requires techniques for estimating uncertainty and generating sets of diverse candidates. This motivated a new machine learning probabilistic framework called GFlowNets, which can be applied both for modelling and for the experimental design components of the active learning theory-experiment-analysis loop. GFlowNets learn to sample proportionally to a reward function, which enables sampling diverse, high-reward candidates. Equipped with the capabilities of deep learning, GFlowNets can also be used to perform efficient and amortized probabilistic inference, consistent with the knowledge captured in the world model, trained from acquired experimental data. This talk briefly introduces GFlowNets and their relevance for scientific discovery, in particular when used together with active learning.
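To make "sample proportionally to a reward function" concrete, the sketch below computes the target distribution $p(x) \propto R(x)$ for a tiny, made-up candidate set and draws from it directly. It is not a GFlowNet: a GFlowNet learns a sequential sampler that approximates this distribution when the space is too large to enumerate. The reward values and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical rewards for four candidate objects (illustrative values only).
rewards = np.array([1.0, 2.0, 4.0, 1.0])

# Target distribution a reward-proportional sampler aims for: p(x) = R(x) / sum R.
target = rewards / rewards.sum()

# Draw directly from the target; feasible here only because the set is tiny.
rng = np.random.default_rng(0)
samples = rng.choice(len(rewards), size=20000, p=target)
freq = np.bincount(samples, minlength=len(rewards)) / samples.size

# A greedy maximizer would return only this single candidate.
best = int(np.argmax(rewards))

print("target probabilities:", target)
print("empirical frequencies:", freq)
print("greedy choice:", best)
```

Unlike the greedy choice, reward-proportional sampling keeps visiting lower-reward candidates with nonzero probability, which is the source of the diversity mentioned in the abstract.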
-
3:00 - 3:20 pm EDT
Sparse Cholesky Factorization for Solving PDEs with Gaussian Processes
11th Floor Lecture Hall
- Speaker
- Yifan Chen, Caltech
- Session Chair
- Nathaniel Trask, Sandia National Laboratories
Abstract
Gaussian processes (GPs) and kernel methods are promising automatic approaches for solving PDEs, as they combine the theoretical rigor of traditional numerical algorithms with the flexible design of machine learning solvers. The complexity bottleneck of GP-based methods lies in computing with dense kernel matrices. In the case of PDE problems, these matrices may also involve partial derivatives of the kernels, and fast algorithms for such matrices are less developed compared to the derivative-free case. In this talk, we will discuss a rigorous sparse Cholesky factorization algorithm to make GP-based PDE solvers scalable. The algorithm relies on the near-sparsity of the Cholesky factor under a multiscale ordering of the pointwise and derivative-type entries of the matrices. It enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N \log^d (N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. As a result, this leads to a near-linear space/time complexity method for solving general PDEs with GPs.
-
3:30 - 4:00 pm EDT
Coffee Break
11th Floor Collaborative Space
-
4:00 - 4:20 pm EDT
Machine-Learned Finite Element Exterior Calculus for Linear and Nonlinear Problems
11th Floor Lecture Hall
- Speaker
- Jonas Actor, Sandia National Laboratories
- Session Chair
- Nathaniel Trask, Sandia National Laboratories
Abstract
For many applications, scientific machine learning techniques are still limited in their ability to guarantee structure preservation inherent to various physical systems, and in their ability to achieve theoretic convergence rates. To address this limitation, we introduce a scientific machine learning framework based upon a partition of unity architecture; this architecture identifies physically-relevant control volumes, encoding generalized fluxes between subdomains via Whitney forms. Subsequently, this architecture admits a data-driven finite element exterior calculus allowing discovery of mixed finite element spaces with closed form quadrature rules. The resulting differentiable parameterization of geometry may be trained in an end-to-end fashion to extract reduced models from full field data while exactly preserving physics and while matching expected convergence rates. The framework is developed for manifolds in arbitrary dimension, with examples provided for H(div) problems in two dimensions for both linear and nonlinear physical systems. These examples highlight the convergence rates, structure preservation properties, and model reduction capabilities of the learned finite element exterior calculus architecture. In particular, we consider a lithium-ion battery problem where we discover a reduced finite element space encoding transport pathways from high-fidelity microstructure resolved simulations; our approach reduces the 5.89M finite element simulation to 136 elements yet still reproduces pressure with under 0.1% error and preserves conservation.
-
4:30 - 4:50 pm EDT
Machine learning constitutive models of inelastic materials with microstructure
11th Floor Lecture Hall
- Speaker
- Reese Jones, Sandia National Laboratories
- Session Chair
- Nathaniel Trask, Sandia National Laboratories
Abstract
Traditional simulations of complex physical processes, such as material deformation, are both crucial technologically and expensive computationally. Furthermore, the development of physical models via traditional methods is particularly time-consuming in human terms. Developing comparably accurate models directly from data can enable rapid development of accurate models as well as more robust design, uncertainty quantification, and exhaustive structure-property exploration. We have been developing neural network models that are guided by traditional constitutive theory, such as tensor function representation theorems to embed symmetries, and also exploiting deep learning to infer intrinsic microstructural features. Neural networks are flexible since sub-components of their graph-like structure can be arranged to suit particular tasks, such as image processing and time integration, and represent the mechanistic flow of information. Furthermore, graphs facilitate the treatment of the multiscale aspects of materials with microstructure. This talk will describe the architectures and demonstrate the efficacy of neural networks designed to model the response of complex history-dependent materials with pores, inclusions or grains based solely on observable data.
Wednesday, June 7, 2023
-
9:00 - 9:45 am EDT
Representation equivalent Neural Operators
11th Floor Lecture Hall
- Speaker
- Rima Alaifari, ETH Zurich, D-MATH (SAM)
- Session Chair
- Zhongqiang Zhang, Worcester Polytechnic Institute
Abstract
In operator learning, it has been observed that proposed models may not behave as operators when implemented on a computer, questioning the very essence of what operator learning should be. We contend that some form of continuous-discrete equivalence is necessary for an architecture to genuinely learn the underlying operator, rather than just discretizations of it. Employing frames, we introduce the framework of Representation equivalent Neural Operator (ReNO) to ensure operations at the continuous and discrete level are equivalent.
Joint work with Francesca Bartolucci (TU Delft), Emmanuel de Bezenac (ETH Zurich), Bogdan Raonic (ETH Zurich), Roberto Molinaro (ETH Zurich), Siddhartha Mishra (ETH Zurich).
-
10:00 - 10:30 am EDT
Coffee Break
11th Floor Collaborative Space
-
10:30 - 11:15 am EDT
Deep Operator Network Approximation Rates
11th Floor Lecture Hall
- Virtual Speaker
- Christoph Schwab, ETH Zürich
- Session Chair
- Zhongqiang Zhang, Worcester Polytechnic Institute
Abstract
We establish expression rate bounds for neural Deep Operator Networks (DON) emulating maps G between (subsets of) separable Hilbert spaces X and Y. The DON architecture considered uses linear encoders E and decoders D via biorthogonal Riesz bases of X, Y, and an approximator network of an infinite-dimensional, parametric coordinate map that is either holomorphic ("smooth case") or merely Lipschitz continuous ("rough case") on the sequence space.
-
11:30 - 11:50 am EDT
Deep neural operators with reliable extrapolation for multiphysics, multiscale & multifidelity problems
11th Floor Lecture Hall
- Speaker
- Lu Lu, University of Pennsylvania
- Session Chair
- Panos Stinis, Pacific Northwest National Laboratory
Abstract
It is widely known that neural networks (NNs) are universal approximators of functions. However, a less known but powerful result is that a NN can accurately approximate any nonlinear operator. This universal approximation theorem of operators is suggestive of the potential of deep neural networks (DNNs) in learning operators of complex systems. In this talk, I will present the deep operator network (DeepONet) to learn various operators that represent deterministic and stochastic differential equations. I will also present several extensions of DeepONet, such as DeepM&Mnet for multiphysics problems, DeepONet with proper orthogonal decomposition (POD-DeepONet), (Fourier-)MIONet for multiple-input operators, and multifidelity DeepONet. I will demonstrate the effectiveness of DeepONet and its extensions to diverse multiphysics and multiscale problems, such as nanoscale heat transport, bubble growth dynamics, high-speed boundary layers, electroconvection, hypersonics, and geological carbon sequestration. Deep learning models are usually limited to interpolation scenarios, and I will quantify the extrapolation complexity and develop a complete workflow to address the challenge of extrapolation for deep neural operators.
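As a rough illustration of the DeepONet structure mentioned in this abstract, the sketch below implements an untrained forward pass in NumPy: a branch network encodes the input function sampled at fixed sensor points, a trunk network encodes the query location, and the prediction is their inner product. The layer sizes, sensor count, random weights, and helper names (`init_mlp`, `mlp_forward`, `deeponet`) are assumptions chosen for illustration and are not taken from the talk; no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights and zero biases for a small fully connected net."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

n_sensors, p = 50, 16                  # sensor count and latent width (assumed)
branch = init_mlp([n_sensors, 32, p])  # encodes u sampled at the sensors
trunk = init_mlp([1, 32, p])           # encodes a query location y

def deeponet(u_sensors, y):
    """G(u)(y) is approximated by sum_k branch_k(u) * trunk_k(y)."""
    b = mlp_forward(branch, u_sensors)  # shape (n_functions, p)
    t = mlp_forward(trunk, y)           # shape (n_queries, p)
    return b @ t.T                      # shape (n_functions, n_queries)

xs = np.linspace(0.0, 1.0, n_sensors)
u = np.stack([np.sin(2 * np.pi * xs), np.cos(2 * np.pi * xs)])  # two input functions
y = np.linspace(0.0, 1.0, 7)[:, None]                           # seven query points
out = deeponet(u, y)
print(out.shape)
```

Because the query location enters only through the trunk network, the same encoded input function can be evaluated at any set of output points without re-running the branch network.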
-
12:00 - 12:20 pm EDT
Transfer learning in deep operator networks
11th Floor Lecture Hall
- Speaker
- Somdatta Goswami, Brown University
- Session Chair
- Panos Stinis, Pacific Northwest National Laboratory
Abstract
Transfer learning allows knowledge gained while learning to execute one task (source) to be transferred to a related but distinct task (target), thereby addressing the cost of data collection and labeling, potential computational power restrictions, and dataset distribution mismatches. Based on the deep operator network (DeepONet), we propose a new transfer learning framework for task-specific learning (functional regression in partial differential equations) under conditional shift. Task-specific operator learning is achieved by fine-tuning task-specific layers of the target DeepONet with a hybrid loss function that allows for the matching of individual target samples while simultaneously preserving the global features of the target data's conditional distribution. By embedding conditional distributions onto a reproducing kernel Hilbert space, we minimize the statistical distance between labelled target data and the surrogate prediction on unlabelled target data, as inspired by conditional embedding operator theory. We demonstrate the benefits of our approach for a variety of transfer learning scenarios involving nonlinear partial differential equations under varying conditions caused by geometric domain shifts and model dynamics. Despite significant discrepancies between the source and target domains, our transfer learning architecture enables fast and effective learning of heterogeneous tasks.
-
12:20 - 12:25 pm EDT
Group Photo (Immediately After Talk)
11th Floor Lecture Hall
-
12:30 - 2:30 pm EDT
Poster Session Lunch
Poster Session - 10th Floor Collaborative Space
-
2:30 - 2:50 pm EDT
Local approximation of operators
11th Floor Lecture Hall
- Speaker
- Hrushikesh Mhaskar, Claremont Graduate University (Claremont, CA, US)
- Session Chair
- Lu Lu, University of Pennsylvania
Abstract
Many applications, such as system identification, classification of time series, direct and inverse problems in partial differential equations, and uncertainty quantification lead to the question of approximation of a non-linear operator between metric spaces $\mathfrak{X}$ and $\mathfrak{Y}$. We study the problem of determining the degree of approximation of such operators on a compact subset $K_\mathfrak{X}\subset \mathfrak{X}$ using a finite amount of information. If $\mathcal{F}: K_\mathfrak{X}\to K_\mathfrak{Y}$, a well established strategy to approximate $\mathcal{F}(F)$ for some $F\in K_\mathfrak{X}$ is to encode $F$ (respectively, $\mathcal{F}(F)$) in terms of a finite number $d$ (respectively, $m$) of real numbers. Together with appropriate reconstruction algorithms (decoders), the problem reduces to the approximation of $m$ functions on a compact subset of a high dimensional Euclidean space $\mathbb{R}^d$, equivalently, the unit sphere $\mathbb{S}^d$ embedded in $\mathbb{R}^{d+1}$. The problem is challenging because $d$, $m$, as well as the complexity of the approximation on $\mathbb{S}^d$ are all large, and it is necessary to estimate the accuracy keeping track of the inter-dependence of all the approximations involved. In this paper, we establish constructive methods to do this efficiently; i.e., with the constants involved in the estimates on the approximation on $\mathbb{S}^d$ being $\mathcal{O}(d^{1/6})$. We study different smoothness classes for the operators, and also propose a method for approximation of $\mathcal{F}(F)$ using only information in a small neighborhood of $F$, resulting in an effective reduction in the number of parameters involved. To further mitigate the problem of a large number of parameters, we propose prefabricated networks, resulting in a substantially smaller number of effective parameters. The problem is studied in both deterministic and probabilistic settings.
-
3:00 - 3:20 pm EDT
Multifidelity Deep Operator Networks
11th Floor Lecture Hall
- Speaker
- Amanda Howard, Pacific Northwest National Laboratory
- Session Chair
- Lu Lu, University of Pennsylvania
Abstract
Operator learning for complex nonlinear systems is increasingly common in modeling multi-physics and multi-scale systems such as climate modeling. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In many cases, we may not have access to sufficient high-fidelity data to train; however, we may have a large amount of low-fidelity data that is less accurate and has greater uncertainty associated with it. The question is how to combine the low-fidelity and high-fidelity data to create a model that is capable of training more accurately than using the low- or high-fidelity data alone. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fidelity to accurately learn complex operators when sufficient high-fidelity data is not available. Additionally, we demonstrate that the presence of low-fidelity data can improve the predictions of physics-informed learning with DeepONets. We demonstrate the new multi-fidelity training in diverse examples, including modeling of the ice-sheet dynamics of the Humboldt glacier, Greenland, using two different fidelity models and also using the same physical model at two different resolutions. We will discuss extensions of the multifidelity framework, such as how multifidelity learning can contribute to more accurate training even in the absence of data, with only physics used to train.
-
3:30 - 4:00 pm EDT - Coffee Break - 11th Floor Collaborative Space
-
4:00 - 4:20 pm EDT - Neural Fields: A Unifying Framework for Operator Learning - 11th Floor Lecture Hall
- Speaker
- Jacob Seidman, University of Pennsylvania
- Session Chair
- Panos Stinis, Pacific Northwest National Laboratory
Abstract
Operator learning is an emerging area of machine learning which aims to learn mappings between infinite dimensional function spaces. There have been many successful architectures proposed for this problem, including the Fourier neural operator, DeepONet, and their extensions. Simultaneously, the field of computer vision has developed architectures, known as neural fields, for modeling quantities defined over spatial domains, such as signed distance functions and radiance fields. These neural fields can then be conditioned globally (e.g. with an object class) or locally (e.g. with information within a patch of the image) to modify their output without the need for retraining. In this talk, we demonstrate that the architectures used in operator learning are in fact examples of these neural fields whose outputs are modified based on global and local information of input functions. In doing so, we give a unified framework for succinctly explaining differences between popular operator learning architectures. Additionally, this framework creates a bridge for adapting well-developed tools for computer vision for use in operator learning problems.
-
4:30 - 4:50 pm EDT - Operator Learning For Solving PDE-related Problems - 11th Floor Lecture Hall
- Speaker
- Zecheng Zhang, Carnegie Mellon University
- Session Chair
- Panos Stinis, Pacific Northwest National Laboratory
Abstract
The data-driven approach has become an excellent option for some scientific computing problems. There are various data-driven treatments for PDE-related problems. Many of them can be implemented in the operator learning framework, since the underlying mathematical problems define an operator. I will focus on operator learning. In particular, I will discuss some theoretical extensions of the classical structure and introduce a new framework: basis enhanced learning (Bel). Bel does not require a specific discretization of the input and output functions and achieves great prediction accuracy. Universal approximation theory and some applications, including some newly proposed engineering applications, will be discussed.
Thursday, June 8, 2023
-
9:00 - 9:45 am EDT - Exploring Efficient AI with Spiking Neural Networks - 11th Floor Lecture Hall
- Speaker
- Priya Panda, Yale University
- Session Chair
- Themistoklis Sapsis, MIT
Abstract
Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning due to their huge energy efficiency benefits on neuromorphic hardware. In this presentation, I will talk about important techniques for training SNNs which bring a huge benefit in terms of latency, accuracy and even robustness. We will first delve into a recently proposed method, Batch Normalization Through Time (BNTT), that allows us to train SNNs from scratch with very low latency and enables us to target interesting applications like video segmentation and human activity recognition, as well as scenarios beyond traditional learning, like federated training. Then, I will discuss novel architectures with temporal feedback connections, discovered using neural architecture search, that further lower latency and improve energy efficiency, and point to interesting temporal effects. Finally, I will delve into the hardware perspective of SNNs when implemented on standard CMOS and compute-in-memory accelerators with our recently proposed SATA and SpikeSim tools. It turns out that the multiple timestep computation in SNNs can lead to extra memory overhead and repeated DRAM access that annuls all the compute-sparsity related advantages. I will highlight some algorithmic techniques, such as membrane-potential sharing and early time-step exit, that use the temporal dimension in SNNs to reduce the overhead.
-
10:00 - 10:30 am EDT - Coffee Break - 11th Floor Collaborative Space
-
10:30 - 11:15 am EDT - Dynamics and symmetries in neural network learning - 11th Floor Lecture Hall
- Speaker
- Stefanie Jegelka, MIT
- Session Chair
- Themistoklis Sapsis, MIT
Abstract
This talk will encompass two topics in the area of scientific machine learning: learning dynamics and symmetries. First, we look at training dynamics: recent results show that the training of neural networks does not always converge to a fixed point in parameter space. We investigate generalization in such settings. By taking a dynamical systems perspective and defining a more general notion of algorithmic stability, we draw connections between training behavior, stability and generalization. Second, in many applications, data and tasks have symmetries and thereby imply desired invariances. We will look at invariances of eigenvectors, which are important, for instance, when learning with graphs. We derive appropriate neural network architectures and show empirical and theoretical benefits of encoding such invariances.
-
11:30 am - 12:15 pm EDT - A Probabilistic Future for Neuromorphic Computing - 11th Floor Lecture Hall
- Speaker
- Brad Aimone, Sandia National Laboratories
- Session Chair
- Priya Panda, Yale University
Abstract
Neuromorphic (aka brain-inspired) computing is an exciting new paradigm that can make computing dramatically more energy-efficient while potentially offering a path to realize the brain’s elusive artificial intelligence capabilities. As today’s neuromorphic hardware is already delivering an energy-efficient alternative to conventional approaches, the challenge has shifted to the math and computing communities – can we come up with strategies and techniques to take advantage of this potential for scientific computing applications? In our work in neuromorphic algorithms, we have realized that neuromorphic computing may impact a much broader range of applications than widely appreciated, but to do so we must be willing to move beyond the proven recipes that work for conventional computing. In this talk, I will present several vignettes of how we can create new applications for neuromorphic hardware today and going forward. I will first describe the value proposition of neuromorphic systems relative to modern alternatives such as GPUs and CPUs. To illustrate this, I will show how spiking neuromorphic hardware, such as the Intel Loihi platform or the ARM-based SpiNNaker platform, can be used to implement Monte Carlo random walk processes with a wide range of numerical computing applications. Finally, I will introduce a new approach to neuromorphic computing that emphasizes the stochastic nature of the brain that we call COINFLIPS.
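To make the Monte Carlo random walk idea in this abstract concrete, here is a minimal sketch in ordinary Python; it is not neuromorphic code and is not taken from the talk. It estimates the probability that a symmetric 1D random walk reaches the right boundary before the left one, which is also the solution of a discrete Laplace equation with boundary values 0 and 1. The function name `hit_probability` and all parameter values are illustrative assumptions.

```python
import random


def hit_probability(start, n_sites, n_walks=20000, seed=0):
    """Estimate P(walk reaches n_sites before 0) for a symmetric 1D random walk."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_walks):
        pos = start
        # Step left or right with equal probability until a boundary is hit.
        while 0 < pos < n_sites:
            pos += 1 if rng.random() < 0.5 else -1
        hits += pos == n_sites
    return hits / n_walks


estimate = hit_probability(start=3, n_sites=10)
exact = 3 / 10  # known closed form: u(k) = k / n_sites
print(estimate, exact)
```

Each walker is independent of the others, which is the property that makes this kind of computation a candidate for massively parallel, stochastic hardware.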
-
12:30 - 2:30 pm EDT - Poster Session Lunch - Poster Session, 10th Floor Collaborative Space
-
2:30 - 2:50 pm EDT - Evolving Spiking Neural Networks for Scientific Applications - 11th Floor Lecture Hall
- Speaker
- Catherine Schuman, University of Tennessee
- Session Chair
- Priya Panda, Yale University
Abstract
Effectively leveraging the characteristics and capabilities of spiking neural networks for neuromorphic computing systems for real-world applications is an important challenge, especially as we target applications that have severe energy constraints. In this talk, I will overview the use of evolutionary optimization to design spiking neural networks for neuromorphic deployment. I will specifically highlight several real-world applications, such as radiation detection, internal combustion engine control, and autonomous race car control. I will discuss the advantages of using evolutionary optimization to design spiking neural networks for neuromorphic systems, including the ability to perform multi-objective optimization for energy efficiency and resiliency.
-
3:00 - 3:20 pm EDT - Using Spiking Neural Networks for Scientific Computations - 11th Floor Lecture Hall
- Speaker
- Adar Kahana, Brown University
- Session Chair
- Priya Panda, Yale University
Abstract
The field of machine learning is advancing rapidly in the scientific community, attracting many researchers to develop innovative methods for supervised and unsupervised learning machines. Two drawbacks of such techniques are the large computational effort it takes to train the models and the need for a large volume of training data. Researchers are investigating more efficient learning machines, leading to the proposal of spiking neural networks, a biologically plausible learning framework. The inspiration for these networks is the human brain, which is considered a very (arguably the most) efficient learning machine. In this talk we introduce spiking neural networks and propose a method for using them for function regression. We also show how DeepONets can be used for the same task, but with spiking input data, with even better performance. We also propose a method for long time integration in this spiking framework. Last, we analyze the advantages of using spiking neural networks for scientific computing and discuss how they can be used for other scientific computing tasks (with or without machine learning) from both the algorithm and hardware perspectives.
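For readers new to spiking models, the following is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron, a textbook building block of spiking neural networks. It is not the regression method proposed in the talk; the function name `simulate_lif` and the parameter values (time constant, threshold, input currents) are assumptions chosen only for illustration.

```python
def simulate_lif(currents, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron with forward Euler steps.

    Returns a list with 1 at time steps where the neuron spikes, else 0.
    """
    v = v_reset
    spikes = []
    for i_in in currents:
        # Membrane potential leaks toward 0 and is driven by the input current.
        v += dt / tau * (-v + i_in)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


weak = simulate_lif([0.5] * 100)    # stays below threshold, so no spikes
strong = simulate_lif([2.0] * 100)  # crosses threshold repeatedly
print(sum(weak), sum(strong))
```

The spike count grows with the input strength, which is the basic rate-coding idea: a real-valued quantity is represented by how often a neuron fires.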
-
3:30 - 4:00 pm EDT - Coffee Break - 11th Floor Collaborative Space
-
4:00 - 4:20 pm EDT - Learning to Predict using Network of Spiking Neurons - 11th Floor Lecture Hall
- Speaker
- Biswadeep Chakraborty, Georgia Institute of Technology
- Session Chair
- Brad Aimone, Sandia National Laboratories
Abstract
The emergence of computing technologies based on the brain is offering innovative energy-efficient information processing methods. Spiking Neural Networks, regarded as the third wave of Artificial Intelligence, are based on the learning principles in the brain, making them a biologically plausible model of neural processing. Spike-Time-Dependent Plasticity (STDP) is an efficient continual learning model of synaptic plasticity based on the same principles that underlie synaptic plasticity in the brain. We present our work on a heterogeneous recurrent spiking neural network which consists of heterogeneous neurons with varying firing/relaxation dynamics. The model learns using a heterogeneous STDP model with varying learning dynamics for each synapse. The heterogeneity in neuronal and synaptic dynamics reduces the spiking activity of a Recurrent Spiking Neural Network while improving prediction performance, enabling spike-efficient learning.
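As background for the STDP rule mentioned in this abstract, here is a minimal sketch of the standard pair-based STDP weight update with exponential windows. It is the textbook form, not the heterogeneous model from the talk; the function name `stdp_delta_w` and the amplitudes and time constants are assumed values. Giving each synapse its own `tau_plus` and `tau_minus` would be one simple way to picture per-synapse learning dynamics.

```python
import math


def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for dt = t_post - t_pre (milliseconds)."""
    if dt > 0:
        # Pre-synaptic spike precedes post-synaptic spike: potentiation.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Post-synaptic spike precedes pre-synaptic spike: depression.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


for dt in (-50.0, -5.0, 5.0, 50.0):
    print(dt, stdp_delta_w(dt))
```

The magnitude of the change decays as the two spikes move further apart in time, so only nearly coincident spike pairs alter the weight appreciably.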
-
4:30 - 4:50 pm EDT - Exact Gradient Computation for Spiking Neural Networks via Forward Propagation - 11th Floor Lecture Hall
- Speaker
- Amin Karbasi, Yale University
- Session Chair
- Brad Aimone, Sandia National Laboratories
Abstract
Spiking neural networks (SNNs) have recently emerged as alternatives to traditional neural networks, owing to their energy efficiency benefits and capacity to capture biological neuronal mechanisms. However, the classic backpropagation algorithm for training traditional networks has been notoriously difficult to apply to SNNs due to the hard-thresholding and discontinuities at spike times. Therefore, a large majority of prior work believes exact gradients for SNNs w.r.t. their weights do not exist and has focused on approximation methods to produce surrogate gradients. In this paper, (1) by applying the implicit function theorem to SNNs at the discrete spike times, we prove that, albeit being non-differentiable in time, SNNs have well-defined gradients w.r.t. their weights, and (2) we propose a novel training algorithm, called forward propagation (FP), that computes exact gradients for SNNs. FP exploits the causality structure between the spikes and allows us to parallelize computation forward in time. It can be used with other algorithms that simulate the forward pass, and it also provides insights on why other related algorithms such as Hebbian learning and also recently-proposed surrogate gradient methods may perform well.
Friday, June 9, 2023
-
9:00 - 9:45 am EDT - Optimization-in-the-loop ML for energy and climate - 11th Floor Lecture Hall
- Virtual Speaker
- Priya Donti, MIT
- Session Chair
- Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
Addressing climate change will require concerted action across society, including the development of innovative technologies. While methods from machine learning (ML) have the potential to play an important role, these methods often struggle to contend with the physics, hard constraints, and complex decision-making processes that are inherent to many climate and energy problems. To address these limitations, I present the framework of “optimization-in-the-loop ML,” and show how it can enable the design of ML models that explicitly capture relevant constraints and decision-making processes. For instance, this framework can be used to design learning-based controllers that provably enforce the stability criteria or operational constraints associated with the systems in which they operate. It can also enable the design of task-based learning procedures that are cognizant of the downstream decision-making processes for which a model’s outputs will be used. By significantly improving performance and preventing critical failures, such techniques can unlock the potential of ML for operating low-carbon power grids, improving energy efficiency in buildings, and addressing other high-impact problems of relevance to climate action.
-
10:00 - 10:30 am EDT - Coffee Break - 11th Floor Collaborative Space
-
10:30 - 11:15 am EDT - Some old and some new thoughts on data-driven modeling of complex systems - 11th Floor Lecture Hall
- Speaker
- Yannis Kevrekidis, Johns Hopkins University
- Session Chair
- Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
I will talk about old and new results in the data-driven modeling of complex dynamics: from learning neural ODEs and PDEs in the 1990s, to emergent spaces, optimal algorithm discovery and data-driven well-posedness today, touching a little on when to learn and when not to, and (just a little) on causality.
-
11:30 am - 12:15 pm EDT - Learning Neural Operators for Complex Physical System Modeling - 11th Floor Lecture Hall
- Speaker
- Yue Yu, Lehigh University
- Session Chair
- Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
For many decades, physics-based PDEs have been commonly employed for modeling complex system responses, with traditional numerical methods then used to solve the PDEs and provide predictions. However, when governing laws are unknown or when high degrees of heterogeneity are present, these classical models may become inaccurate. In this talk we propose to use data-driven modeling, which directly utilizes high-fidelity simulation and experimental measurements to learn the hidden physics and provide further predictions. In particular, we develop PDE-inspired neural operator architectures to learn the mapping between loading conditions and the corresponding system responses. By parameterizing the increment between layers as an integral operator, our neural operator can be seen as the analog of a time-dependent nonlocal equation, which captures the long-range dependencies in the feature space and is guaranteed to be resolution-independent. Moreover, when applied to (hidden) PDE solving tasks, our neural operator provides a universal approximator to a fixed point iterative procedure, and partial physical knowledge can be incorporated to further improve the model’s generalizability and transferability. As a real-world application, we learn the material models directly from digital image correlation (DIC) displacement tracking measurements on a porcine tricuspid valve leaflet tissue, and show that the learnt model substantially outperforms conventional constitutive models.
-
12:30 - 2:30 pm EDT - Lunch/Free Time
-
2:30 - 3:15 pm EDT - Dynamics in Deep Classifiers trained with the Square Loss: normalization, low rank, and generalization bounds - 11th Floor Lecture Hall
- Speaker
- Mengjia Xu, New Jersey Institute of Technology
- Session Chair
- George Karniadakis, Brown University
Abstract
We overview several properties, old and new, of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow under the square loss in deep homogeneous rectified linear unit networks. We study the convergence to a solution with the absolute minimum ρ, which is the product of the Frobenius norms of each layer weight matrix, when normalization by Lagrange multipliers is used together with weight decay under different forms of gradient descent. A main property of the minimizers that bounds their expected error for a specific network architecture is ρ. In particular, we derive novel norm-based generalization bounds for convolutional layers that are orders of magnitude better than classical bounds for dense networks. Next, we prove that quasi-interpolating solutions obtained by stochastic gradient descent (SGD) in the presence of weight decay have a bias toward low-rank weight matrices, which should improve generalization. We also predict the existence of an intrinsic SGD noise in the weight matrices and in the margins. Specifically, we prove that the asymptotic quasi-interpolating solutions obtained by SGD in the presence of regularization show fluctuations that are larger for the weight matrices in layers closer to the input layer. We show that these fluctuations are due to a chaotic-like SGD dynamics arising from the competition between minimizing the error and minimizing the rank. Under the square loss, mini-batch SGD as well as weight decay (WD) are necessary for chaos; under exponential loss functions chaos occurs also for the case without WD.
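The quantity ρ defined in this abstract, the product of the Frobenius norms of the layer weight matrices, is simple to compute. The sketch below does so for a toy two-layer network stored as nested lists; the helper names `frobenius_norm` and `rho` and the example weights are assumptions for illustration only. It also checks one elementary consequence of the definition: rescaling one layer by a factor and another by its inverse leaves ρ unchanged.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a matrix (list of rows)."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def rho(layers):
    """Product of the Frobenius norms of all layer weight matrices."""
    return math.prod(frobenius_norm(w) for w in layers)


w1 = [[3.0, 4.0]]              # Frobenius norm 5
w2 = [[1.0, 0.0], [0.0, 1.0]]  # Frobenius norm sqrt(2)
base = rho([w1, w2])

# Rescale layer 1 by 2 and layer 2 by 1/2: the product of norms is unchanged.
w1_scaled = [[2.0 * x for x in row] for row in w1]
w2_scaled = [[0.5 * x for x in row] for row in w2]
rescaled = rho([w1_scaled, w2_scaled])
print(base, rescaled)
```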
-
3:30 - 4:00 pm EDT - Coffee Break - 11th Floor Collaborative Space
-
4:00 - 4:20 pm EDT - A Method for Computing Inverse Parametric PDE Problems with Randomized Neural Networks - 11th Floor Lecture Hall
- Speaker
- Suchuan Dong, Purdue University
- Session Chair
- George Karniadakis, Brown University
Abstract
We present a method for computing the inverse parameters and the solution field to inverse parametric partial differential equations (PDE) based on randomized neural networks. This extends the local extreme learning machine technique originally developed for forward PDEs to inverse problems. We develop three algorithms for training the neural network to solve the inverse PDE problem. The first algorithm (termed NLLSQ) determines the inverse parameters and the trainable network parameters all together by the nonlinear least squares method with perturbations (NLLSQ-perturb). The second algorithm (termed VarPro-F1) eliminates the inverse parameters from the overall problem by variable projection to attain a reduced problem about the trainable network parameters only. It solves the reduced problem first by the NLLSQ-perturb algorithm for the trainable network parameters, and then computes the inverse parameters by the linear least squares method. The third algorithm (termed VarPro-F2) eliminates the trainable network parameters from the overall problem by variable projection to attain a reduced problem about the inverse parameters only. It solves the reduced problem for the inverse parameters first, and then computes the trainable network parameters afterwards. VarPro-F1 and VarPro-F2 are reciprocal to each other in some sense. The presented method produces accurate results for inverse PDE problems. For noise-free data, the errors of the inverse parameters and the solution field decrease exponentially as the number of collocation points or the number of trainable network parameters increases, and can reach a level close to the machine accuracy. For noisy data, the accuracy degrades compared with the case of noise-free data, but the method remains quite accurate. Several numerical examples will be presented to demonstrate the characteristics and accuracy of the current method. It will be compared with the state-of-the-art neural network-based method for inverse PDEs.
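The variable projection idea behind the VarPro algorithms can be illustrated on a toy separable least squares problem; this is a sketch of the general technique, not the authors' implementation. The model is y = c * exp(-a * x), where c enters linearly (loosely playing the role of the linearly entering trainable network coefficients) and a enters nonlinearly (loosely playing the role of an inverse parameter). For each candidate a, the best c is found in closed form, leaving a reduced problem in a alone. The names, the synthetic data, and the use of a plain grid search in place of a nonlinear least squares solver are all assumptions.

```python
import math


def reduced_residual(a, xs, ys):
    """Eliminate the linear coefficient c for a fixed nonlinear parameter a.

    Returns (squared residual, optimal c) for the model y = c * exp(-a * x).
    """
    phi = [math.exp(-a * x) for x in xs]
    # Closed-form linear least squares solution for the single coefficient c.
    c = sum(p * y for p, y in zip(phi, ys)) / sum(p * p for p in phi)
    residual = sum((y - c * p) ** 2 for p, y in zip(phi, ys))
    return residual, c


# Noise-free synthetic data generated with c = 2.0 and a = 1.5.
xs = [0.1 * i for i in range(21)]
ys = [2.0 * math.exp(-1.5 * x) for x in xs]

# Solve the reduced problem over a only (grid search as a simple stand-in).
candidates = [i / 1000 for i in range(5001)]
a_hat = min(candidates, key=lambda a: reduced_residual(a, xs, ys)[0])
c_hat = reduced_residual(a_hat, xs, ys)[1]
print(a_hat, c_hat)
```

Eliminating the linear unknowns first shrinks the search to the nonlinear parameters, which is the main appeal of variable projection when the linear part is large.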
-
4:30 - 4:50 pm EDT - Leveraging Multi-time Hamilton Jacobi PDEs for Certain Scientific Machine Learning Problems - 11th Floor Lecture Hall
- Speaker
- Paula Chen, Brown University
- Session Chair
- George Karniadakis, Brown University
Abstract
Multi-time Hamilton-Jacobi partial differential equations (HJ PDEs) have deep connections with a wide range of fields, including optimal control, differential games, and imaging sciences. In this poster, we establish a novel theoretical connection between the multi-time Hopf formula, which corresponds to a representation of the solution to certain multi-time HJ PDEs, and certain learning problems. Through this novel connection, we increase the interpretability of the training process of certain machine learning applications, by showing that when we solve these learning problems, we also solve a multi-time HJ PDE and, by extension, its corresponding optimal control problem. As a first exploration of this connection, we establish the connection between the Linear Quadratic Regulator (LQR) and the regularized linear regression problem. We then leverage our theoretical connection to adapt standard LQR solvers (namely, those based on the Riccati ODEs) to design new approaches to training methods in learning. Finally, we provide some numerical examples demonstrating the computational advantages of our Riccati-based approach in the context of continual learning, transfer learning, and sparse dynamics identification. This is joint work with Jerome Darbon (Brown University), George Karniadakis (Brown University), Tingwei Meng (UCLA), and Zongren Zou (Brown University).
All event times are listed in ICERM local time in Providence, RI (Eastern Daylight Time / UTC-4).