Organizing Committee
Marta D'Elia, Pasteur Labs. and Stanford University
George Karniadakis, Brown University
Siddhartha Mishra, ETH Zurich
Themistoklis Sapsis, MIT
Jinchao Xu, Pennsylvania State University
Zhongqiang Zhang, Worcester Polytechnic Institute
Abstract
MSML2023 is the fourth edition of a newly established conference, with emphasis on promoting the study of mathematical theory and algorithms of machine learning, as well as applications of machine learning in scientific computing and engineering disciplines. This conference aims to bring together the communities of machine learning, applied mathematics, and computational science and engineering, to exchange ideas and progress in the fast-growing field of scientific machine learning (SciML). The objectives of this annual conference series are:
To promote the study of theory and algorithms of machine learning.
To promote applications in scientific and engineering disciplines such as physics, chemistry, materials science, fluid and solid mechanics, etc.
To provide hands-on tutorials for students and new researchers in the field.
Previous MSML Conferences:
First MSML: https://www.pacm.princeton.edu/news/msml2020mathematicalandscientificmachinelearningconference
Second MSML: https://msml21.github.io/
Third MSML: https://msml22.github.io/
Submission for contributed posters/talks is now open and will stay open until February 10, 2023.
Please submit this form to be considered.
This workshop is partially funded by AFOSR award FA95502310193.
Confirmed Speakers & Participants
Talks will be presented virtually or in person, as indicated in the schedule below.
 Speaker
 Poster Presenter
 Attendee
 Virtual Attendee

Jonas Actor
Sandia National Laboratories

Brad Aimone
Sandia National Laboratories

Rima Alaifari
ETH Zurich, DMATH (SAM)

Bang An
King Abdullah University of Science and Technology

Shivam Barwey
Argonne National Laboratory

Peter Battaglia
DeepMind

Sara Bicego
Imperial College London

Andrea Bonfanti
BMW

Nicolas Boulle
University of Cambridge

Michael Bronstein
University of Oxford

Edoardo Centofanti
University of Pavia

Biswadeep Chakraborty
Georgia Institute of Technology

Peng Chen
Georgia Institute of Technology

Jingyi Chen
The University of Tulsa

Yifan Chen
Caltech

Paula Chen
Brown University

Xiaoli Chen
National University of Singapore

Zheng “Leslie” Chen
University of Massachusetts Dartmouth

Siu Wun Cheung
Lawrence Livermore National Laboratory

Frank Cole
University of Massachusetts Amherst

Marta D'Elia
Pasteur Labs. and Stanford University

Tharindu De Alwis
Worcester Polytechnic Institute

David Del Rey Fernández
University of Waterloo

Suchuan Dong
Purdue University

Priya Donti
MIT

Fariba Fahroo
AFOSR

Tiffany Fan
Stanford University

Muhammad Faryad
Pennsylvania State University

Daniel Floryan
University of Houston

Khaled Furati
King Fahd University of Petroleum and Minerals

Zhenyuan Gao
Dassault Systemes Simulia Corp

Anna Gilbert
Yale University

Charles Godfrey
Pacific Northwest National Laboratory

Somdatta Goswami
Brown University

Jonathan Gryak
Queens College, CUNY

Kanan Gupta
Texas A&M University

Jihun Han
Dartmouth College

Ashlin Harris
Brown University

Huan He
Harvard University

Alex Hernandez-Garcia
Mila

Francisco Holguin
Johns Hopkins Applied Physics Lab

Youngjoon Hong
Sungkyunkwan University

Intekhab Hossain
Harvard University

Amanda Howard
Pacific Northwest National Laboratory

Juntao Huang
Texas Tech University

Nadeesha Jayaweera
Worcester Polytechnic Institute

Stefanie Jegelka
MIT

Reese Jones
Sandia National Laboratories

Adar Kahana
Brown University

Vasileios Kalantzis
IBM Research

Amin Karbasi
Yale University

George Karniadakis
Brown University

Yannis Kevrekidis
Johns Hopkins University

Taufiquar Khan
UNC Charlotte

Dohyun Kim
Brown University

Tyler Kroells
Iowa State University

Dinesh Kumar
University of Bristol

Seulip Lee
University of Georgia

Youngkyu Lee
KAIST

Chunyan Li
University of South Carolina

Ying Liang
Purdue University

Lizuo Liu
Southern Methodist University

Yun Lu
Kutztown University

Lu Lu
University of Pennsylvania

Sibusiso Mabuza
JMP Statistical Discovery

Georg Maierhofer
Sorbonne Université

Carlo Marcati
University of Pavia

Romit Maulik
Argonne National Laboratory

Hrushikesh Mhaskar
Claremont Graduate University (Claremont, CA, US)

Katarzyna Michalowska
Brown University

Siddhartha Mishra
ETH Zurich

Katherine Moore
Amherst College

Kateryna Morozovska
KTH Royal Institute of Technology

Indranil Nayak
The Ohio State University

Suman Neupane
University of North Carolina at Charlotte

Ebenezer Oluwasakin
Middle Tennessee State University

Houman Owhadi
California Institute of Technology

Stefano Pagani
Politecnico di Milano

Priya Panda
Yale University

Luca Pavarino
Università degli Studi di Pavia

Luca Pegolotti
Stanford University

Adrienne Propp
Stanford University

Akshay Rangamani
Massachusetts Institute of Technology

Signe Riemer-Sørensen
SINTEF Digital

Ritwick Roy
Simulia Corp.

Mohsen Sadr
Massachusetts Institute of Technology

Themistoklis Sapsis
MIT

Daniele Schiavazzi
University of Notre Dame

Catherine Schuman
University of Tennessee

Christoph Schwab
ETH Zürich

Jacob Seidman
University of Pennsylvania

Peter Sentz
Brown University

Panos Stinis
Pacific Northwest National Laboratory

Maria Luisa Taccari
University of Leeds

Daniel Tartakovsky
Stanford University

Nathaniel Trask
Sandia National Laboratory

Elise Walker
Sandia National Laboratories

Hong Wang
University of South Carolina

Steffen W. R. Werner
Courant Institute, New York University

Jinchao Xu
Pennsylvania State University

Mengjia Xu
New Jersey Institute of Technology

Jue Yan
Iowa State University

Qian Yang
University of Connecticut

Yue Yu
Lehigh University

Zecheng Zhang
Carnegie Mellon University

Zhongqiang Zhang
Worcester Polytechnic Institute

Hongli Zhao
University of Chicago

Xueyu Zhu
University of Iowa

Qiao Zhuang
Worcester Polytechnic Institute
Workshop Schedule
Monday, June 5, 2023

8:30-8:50 am EDT
Check In
11th Floor Collaborative Space

8:50-9:00 am EDT
Welcome
11th Floor Lecture Hall
 Brendan Hassett, ICERM/Brown University

9:00-9:45 am EDT
A graph exterior calculus for structure-preserving ML: data-driven modeling, graph analytics, and causal learning
11th Floor Lecture Hall
 Speaker
 Nathaniel Trask, Sandia National Laboratory
 Session Chair
 George Karniadakis, Brown University
Abstract
We introduce a graph exterior calculus which offers a rigorous mathematical framework for developing machine learning models on graphs. The calculus exactly mimics traditional vector calculus, providing analogous theoretical tools to variational methods for PDEs and providing a simple mathematical framework for stability analysis and preservation of physical and mathematical structure. In this talk we briefly introduce the fundamentals of the framework before showing how it may be used for a broad range of scientific machine learning tasks: fitting of network models to data, graph discovery associated with control volume analysis, reversible/irreversible bracket dynamics discovery, and causal DAG discovery.
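To give a flavour of the structure such a calculus preserves, here is a minimal sketch assuming the textbook construction on an oriented graph (it is not necessarily the framework from the talk): the gradient maps node values to edge differences, and the divergence is defined as the negative adjoint of the gradient, so a discrete integration-by-parts identity holds exactly. The example graph and field values are arbitrary.

```python
# Minimal sketch: graph gradient/divergence from oriented edges (tail, head).
# Assumption: textbook signed-incidence construction, not the talk's exact calculus.

def grad(u, edges):
    """Map node values u to edge values: (grad u)_e = u[head] - u[tail]."""
    return [u[h] - u[t] for (t, h) in edges]

def div(v, edges, n_nodes):
    """Negative adjoint of grad: maps edge values back to node values."""
    out = [0.0] * n_nodes
    for (t, h), ve in zip(edges, v):
        out[t] += ve   # flux leaving the tail node
        out[h] -= ve   # flux entering the head node
    return out

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
u = [1.0, -2.0, 0.5, 3.0]   # node field
v = [0.3, -1.2, 2.0, 0.7]   # edge field

# Discrete integration by parts: <grad u, v>_edges == -<u, div v>_nodes
lhs = inner(grad(u, edges), v)
rhs = -inner(u, div(v, edges, 4))
print(lhs, rhs)
```

Because the identity holds by construction rather than approximately, models built from these operators inherit it regardless of how their parameters are trained.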

10:00-10:30 am EDT
Coffee Break
11th Floor Collaborative Space

10:30-11:15 am EDT
Learning simulation: graphs, physics, and weather
11th Floor Lecture Hall
 Virtual Speaker
 Peter Battaglia, DeepMind
 Session Chair
 George Karniadakis, Brown University
Abstract
Simulation is one of the most important tools in science and engineering. However, accurate simulation faces two challenges: (1) heavy compute requirements, and (2) sophisticated underlying equations which require deep expertise to formulate. Recent advances in machine learning-based simulation are now addressing both challenges by (1) allowing dynamics to be modeled with cheaper representations and computations, and (2) learning dynamics models directly from data. This talk will survey advances in graph-based learned simulation from the past few years, then take a deep dive into recent advances in machine learning-based weather prediction that have resulted in learned simulators that outperform the top operational forecasting systems in the world.

11:30 am-12:15 pm EDT
Machine learning for science or machine learning + science?
11th Floor Lecture Hall
 Speaker
 Anna Gilbert, Yale University
 Session Chair
 George Karniadakis, Brown University
Abstract
There has been considerable attention paid to machine learning tools for science. There is a belief that such tools can "reveal profound insights hiding in large and growing datasets." Indeed, people hope that models "can be automatically derived from that data" and that such models can be used to identify features, reduce complexity, and control experiments. This talk aims to caution the community against irrational exuberance and to investigate such claims very carefully. I will discuss two situations in which the blind use of machine learning techniques fails to live up to the hype.

12:30-2:30 pm EDT
Lunch/Free Time

2:30-3:15 pm EDT
Computational Hypergraph Discovery
11th Floor Lecture Hall
 Speaker
 Houman Owhadi, California Institute of Technology
 Session Chair
 Jinchao Xu, Pennsylvania State University
Abstract
Most problems in Computational Sciences and Engineering can be formulated as that of discovering and/or completing, from data, a computational hypergraph representing (possibly partially known) functional dependencies (represented as hyperedges) between groups of variables (represented as nodes). When the structure of the hypergraph is known, the problem reduces to approximating (from data) unknown functions and variables, and it is solved, in the Computational Graph Completion (CGC) framework, by replacing unknown functions with Gaussian Processes and computing their MAP or Empirical Bayes estimators given available data. In this talk we will focus on the problem of discovering the structure (connectivity) of the hypergraph itself from data.

3:30-4:00 pm EDT
Coffee Break
11th Floor Collaborative Space

4:00-4:20 pm EDT
Neural architecture search for scientific machine learning
11th Floor Lecture Hall
 Speaker
 Romit Maulik, Argonne National Laboratory
 Session Chair
 Jinchao Xu, Pennsylvania State University
Abstract
The choice of neural network architecture is central to the impressive performance of neural networks in several scientific machine learning (SciML) tasks. In this talk, we will introduce a search framework for discovering high-performing neural networks on distributed computing resources. Moreover, we will demonstrate how our search framework may be used for multi-objective optimization as well as ensemble-based uncertainty quantification. We will use the search to discover accurate and efficient neural networks for various SciML tasks, such as geophysical forecasting and flow reconstruction from sparse observations with quantified uncertainty.

4:30-4:50 pm EDT
Cell-average based neural network method for time-dependent problems
11th Floor Lecture Hall
 Speaker
 Jue Yan, Iowa State University
 Session Chair
 Jinchao Xu, Pennsylvania State University
Abstract
In this talk, we present the recently developed cell-average based neural network (CANN) method. The method is motivated by finite volume schemes and is based on the integral or weak formulation of the PDEs. A simple feed-forward network is trained to learn the difference between the solution averages at two neighboring time steps. The well-trained network parameters are identified as the scheme coefficients of an explicit one-step finite volume type method. The CANN method is implemented as a regular finite volume scheme and is thus mesh-dependent. Unlike conventional numerical methods, the CANN method is relieved from the CFL restriction of explicit schemes and can therefore use large time steps to evolve the solution forward in time. The CANN method can sharply evolve contact discontinuities with almost zero numerical diffusion. Shock and rarefaction waves are well captured for nonlinear hyperbolic conservation laws.
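The idea that trained parameters become the coefficients of an explicit one-step scheme can be illustrated with a deliberately simplified stand-in: a linear model (in place of the feed-forward network) fitted by least squares to map a five-cell stencil of cell averages to the change of a cell average over one time step. The training data come from linear advection with a time step equal to two cell-crossing times, which a classical three-point explicit scheme could not take under its CFL limit. The stencil width, data generator and solver are assumptions for illustration only.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def make_pairs(n_cells=50, shift=2, n_samples=20, seed=0):
    """Stencils (u_{j-2}, ..., u_{j+2}) and targets u_j^{n+1} - u_j^n.

    Assumed setup: linear advection on a periodic grid with dt = shift * dx / a,
    so the exact cell-average update is u_j^{n+1} = u_{j-shift}^n.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        u = [rng.uniform(-1, 1) for _ in range(n_cells)]
        for j in range(n_cells):
            X.append([u[(j + o) % n_cells] for o in (-2, -1, 0, 1, 2)])
            y.append(u[(j - shift) % n_cells] - u[j])
    return X, y

def fit_linear_scheme(X, y):
    """Least squares via normal equations; returns the one-step update coefficients."""
    d = len(X[0])
    A = [[sum(x[i] * x[k] for x in X) for k in range(d)] for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(A, b)

X, y = make_pairs()
coef = fit_linear_scheme(X, y)
print([round(c, 6) for c in coef])  # recovers the update u_{j-2} - u_j
```

Since the fitted coefficients define an explicit update rule, they can be applied cell by cell like any finite volume scheme. An actual CANN uses a nonlinear network and is trained on problems where no exact shift exists.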

5:00-6:30 pm EDT
Reception
11th Floor Collaborative Space
Tuesday, June 6, 2023

9:00-9:45 am EDT
Physics-inspired learning on graphs
11th Floor Lecture Hall
 Virtual Speaker
 Michael Bronstein, University of Oxford
 Session Chair
 Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
The message-passing paradigm has been the “battle horse” of deep learning on graphs for several years, making graph neural networks a big success in a wide range of applications, from particle physics to protein design. From a theoretical viewpoint, it established the link to the Weisfeiler-Lehman hierarchy, allowing one to analyse the expressive power of GNNs. We argue that the very “node-and-edge”-centric mindset of current graph deep learning schemes may hinder future progress in the field. As an alternative, we propose physics-inspired “continuous” learning models that open up a new trove of tools from the fields of differential geometry, algebraic topology, and differential equations so far largely unexplored in graph ML.
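As a minimal illustration of the "continuous" viewpoint, assuming the simplest such model rather than the specific architectures of the talk, consider linear heat diffusion on a graph, dx/dt = -L x, discretized with explicit Euler. Each Euler step is itself a message-passing update in which every node aggregates differences from its neighbours. The step size `tau` and the example graph are arbitrary choices.

```python
def diffusion_step(x, neighbors, tau):
    """One explicit Euler step of dx/dt = -L x on an undirected graph.

    Written as message passing: node i sums messages (x_j - x_i)
    from its neighbours j and moves by tau times that aggregate.
    """
    return [
        xi + tau * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]

# Undirected path graph 0-1-2-3 stored as adjacency lists.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [4.0, 0.0, 0.0, 0.0]
for _ in range(200):
    x = diffusion_step(x, neighbors, tau=0.25)
print([round(v, 3) for v in x])  # approaches the mean value 1.0 at every node
```

Viewing the layer as a time step brings in continuous-model tools: the explicit step is stable here because `tau` is below 2 divided by the largest Laplacian eigenvalue, and the total "mass" sum(x) is conserved.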

10:00-10:30 am EDT
Coffee Break
11th Floor Collaborative Space

10:30-11:15 am EDT
On Learning Operators
11th Floor Lecture Hall
 Virtual Speaker
 Siddhartha Mishra, ETH Zurich
 Session Chair
 Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
Learning operators from data is emerging as a dominant paradigm in the application of machine learning to PDEs. Yet, what exactly operator learning entails still eludes a rigorous characterization. We argue that it is not enough for a model to process functions as inputs and outputs to be characterized as an operator learning framework. Instead, it is essential to impose some form of continuous-discrete equivalence to enable the genuine learning of operators, rather than just discrete representations of them. To this end, we adapt tools from harmonic analysis in the form of frame theory to define representation equivalent neural operators (ReNOs), which are constructed to enforce a suitable form of continuous-discrete equivalence. We investigate whether several existing learning frameworks are ReNOs or not. Then, we will present a novel operator learning paradigm, that of convolutional neural operators (CNOs), which are designed to be ReNOs. CNOs are shown to approximate operators stemming from a large class of PDEs to desired accuracy. Moreover, we compare CNOs to existing operator learning algorithms in numerical experiments to demonstrate that CNOs are competitive in performance for a variety of PDEs.

11:30-11:50 am EDT
Transfer learning for surrogate models of PDEs
11th Floor Lecture Hall
 Speaker
 Adrienne Propp, Stanford University
 Session Chair
 Nathaniel Trask, Sandia National Laboratory
Abstract
The development of efficient surrogates for PDEs is a critical step towards scalable modeling of complex, multiscale systems-of-systems. We use transfer learning with multilevel data to train a surrogate model based on a deep convolutional neural network (CNN), which significantly reduces the cost of data generation relative to a conventional approach. We show that transfer learning on a mixture of high- and low-fidelity training data, obtained with a two-dimensional PDE and its one-dimensional approximation, respectively, reduces the cost of data generation without reducing the performance of the surrogate.

12:00-12:20 pm EDT
Learning Reduced-Order Models for Cardiovascular Simulations using Graph Neural Networks
11th Floor Lecture Hall
 Speaker
 Luca Pegolotti, Stanford University
 Session Chair
 Nathaniel Trask, Sandia National Laboratory
Abstract
We present a novel approach for simulating blood flow dynamics in cardiovascular modeling using a modified version of MeshGraphNet, a graph neural network architecture originally developed for meshed data. Our method involves developing one-dimensional reduced-order models that predict the pressure and flow rate at vessel centerline nodes. The graph neural network acts as an iterative solver, taking the state of the system at a particular time step as input and providing an update that allows us to evolve the system to the next time step. The approach is accurate and generalizable, achieving errors below 2% and 3% for pressure and flow rate, respectively, in a variety of anatomies and boundary conditions. Our modifications to MeshGraphNet enable its application to our specific problem domain, and our findings demonstrate the effectiveness of our approach for simulating blood flow dynamics in complex cardiovascular systems.

12:30-2:30 pm EDT
Poster Session Lunch
Poster Session, 10th Floor Collaborative Space

2:30-2:50 pm EDT
GFlowNets to accelerate scientific discovery with machine learning
11th Floor Lecture Hall
 Speaker
 Alex Hernandez-Garcia, Mila
 Session Chair
 Nathaniel Trask, Sandia National Laboratory
Abstract
Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discoveries. The last few decades have seen the consolidation of data-driven scientific discoveries. However, in order to leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline. A key current challenge for machine learning methods in this context is the efficient exploration of very large search spaces, which requires techniques for estimating uncertainty and generating sets of diverse candidates. This motivated a new probabilistic machine learning framework called GFlowNets, which can be applied both for modelling and for the experimental design components of the active learning theory-experiment-analysis loop. GFlowNets learn to sample proportionally to a reward function, which enables sampling diverse, high-reward candidates. Equipped with the capabilities of deep learning, GFlowNets can also be used to perform efficient and amortized probabilistic inference, consistent with the knowledge captured in the world model, trained from acquired experimental data. This talk briefly introduces GFlowNets and their relevance for scientific discovery, in particular when used together with active learning.
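The defining property, sampling terminal objects with probability proportional to their reward, can be checked on a toy problem where exact flows are computable by brute force. The sketch below assumes a tree-shaped state space (binary strings built one bit at a time) and a hypothetical reward table; it sets each state's flow to the total reward of the terminal strings reachable from it and uses the forward policy P(s'|s) = F(s')/F(s). A trained GFlowNet would instead learn such flows or policies with a neural network, for example through a flow-matching or trajectory-balance objective.

```python
import itertools
import random

LENGTH = 3
# Hypothetical positive reward for each terminal binary string (here 1..8).
REWARD = {"".join(bits): 1.0 + int("".join(bits), 2)
          for bits in itertools.product("01", repeat=LENGTH)}

def flow(state):
    """Exact flow: total reward of the terminal strings that extend `state`."""
    return sum(r for s, r in REWARD.items() if s.startswith(state))

def forward_policy(state):
    """P(child | state) = F(child) / F(state) on the tree-shaped state space."""
    total = flow(state)
    return {state + b: flow(state + b) / total for b in "01"}

def terminal_probability(x):
    """Probability that the forward policy builds string x bit by bit."""
    p, state = 1.0, ""
    for bit in x:
        p *= forward_policy(state)[state + bit]
        state += bit
    return p

def sample(rng):
    """Draw one terminal string by following the forward policy."""
    state = ""
    while len(state) < LENGTH:
        probs = forward_policy(state)
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return state

Z = sum(REWARD.values())
for x in sorted(REWARD):
    print(x, round(terminal_probability(x), 4), round(REWARD[x] / Z, 4))
print(sample(random.Random(0)))
```

The products of policy ratios telescope to F(x)/F(root) = R(x)/Z, which is why the two printed columns agree.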

3:00-3:20 pm EDT
Sparse Cholesky Factorization for Solving PDEs with Gaussian Processes
11th Floor Lecture Hall
 Speaker
 Yifan Chen, Caltech
 Session Chair
 Nathaniel Trask, Sandia National Laboratory
Abstract
Gaussian processes (GPs) and kernel methods are promising automatic approaches for solving PDEs, as they combine the theoretical rigor of traditional numerical algorithms with the flexible design of machine learning solvers. The complexity bottleneck of GP-based methods lies in computing with dense kernel matrices. In the case of PDE problems, these matrices may also involve partial derivatives of the kernels, and fast algorithms for such matrices are less developed compared to the derivative-free case. In this talk, we will discuss a rigorous sparse Cholesky factorization algorithm to make GP-based PDE solvers scalable. The algorithm relies on the near-sparsity of the Cholesky factor under a multiscale ordering of the pointwise and derivative-type entries of the matrices. It enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N \log^d (N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. As a result, this leads to a near-linear space/time complexity method for solving general PDEs with GPs.
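For context on the bottleneck described above, the sketch below shows the dense baseline rather than the talk's sparse algorithm: GP regression (kernel interpolation) solved through a full Cholesky factorization of the kernel matrix, which costs O(N^3) time and O(N^2) memory. The Gaussian kernel, length scale, nugget and target function are assumptions, and derivative-type kernel entries, which PDE constraints would add, are omitted.

```python
import math

def kernel(x, y, ell=0.15):
    """Gaussian (squared-exponential) kernel; the length scale is an assumption."""
    return math.exp(-((x - y) ** 2) / (2 * ell ** 2))

def cholesky(A):
    """Dense Cholesky factor L with A = L L^T: O(N^3) time, O(N^2) memory."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def cho_solve(L, b):
    """Solve L L^T x = b by forward, then backward, substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Training data from a hypothetical target function on [0, 1].
xs = [i / 9 for i in range(10)]
ys = [math.sin(2 * math.pi * x) for x in xs]

nugget = 1e-10  # tiny diagonal shift for numerical stability
K = [[kernel(a, b) + (nugget if i == j else 0.0) for j, b in enumerate(xs)]
     for i, a in enumerate(xs)]
alpha = cho_solve(cholesky(K), ys)

def posterior_mean(x):
    """GP posterior mean k(x, X) K^{-1} y."""
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, xs))

print(posterior_mean(0.25), math.sin(2 * math.pi * 0.25))
```

The triple loop inside `cholesky` is the cubic cost that becomes prohibitive as N grows; the talk's method instead exploits near-sparsity of the factor under a multiscale ordering.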

3:30-4:00 pm EDT
Coffee Break
11th Floor Collaborative Space

4:00-4:20 pm EDT
Machine-Learned Finite Element Exterior Calculus for Linear and Nonlinear Problems
11th Floor Lecture Hall
 Speaker
 Jonas Actor, Sandia National Laboratories
 Session Chair
 Nathaniel Trask, Sandia National Laboratory
Abstract
For many applications, scientific machine learning techniques are still limited in their ability to guarantee the structure preservation inherent to various physical systems, and in their ability to achieve theoretical convergence rates. To address this limitation, we introduce a scientific machine learning framework based upon a partition of unity architecture; this architecture identifies physically relevant control volumes, encoding generalized fluxes between subdomains via Whitney forms. Subsequently, this architecture admits a data-driven finite element exterior calculus allowing discovery of mixed finite element spaces with closed-form quadrature rules. The resulting differentiable parameterization of geometry may be trained in an end-to-end fashion to extract reduced models from full field data while exactly preserving physics and while matching expected convergence rates. The framework is developed for manifolds in arbitrary dimension, with examples provided for H(div) problems in two dimensions for both linear and nonlinear physical systems. These examples highlight the convergence rates, structure preservation properties, and model reduction capabilities of the learned finite element exterior calculus architecture. In particular, we consider a lithium-ion battery problem where we discover a reduced finite element space encoding transport pathways from high-fidelity microstructure-resolved simulations; our approach reduces the 5.89M finite element simulation to 136 elements yet still reproduces pressure with under 0.1% error and preserves conservation.
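The partition-of-unity ingredient can be sketched with a softmax over per-subdomain logits: whatever the logits are, the resulting functions are nonnegative and sum to one at every point, so they always form a valid partition of unity that training can reshape into control volumes. The affine logits and parameter values below are hypothetical choices for illustration, not the architecture presented in the talk.

```python
import math

def partition_of_unity(x, weights, biases):
    """Softmax over affine logits w_i * x + b_i: nonnegative, sums to 1 at each x."""
    logits = [w * x + b for w, b in zip(weights, biases)]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical subdomains on [0, 1]; steeper slopes give sharper cells.
weights = [-20.0, 0.0, 20.0]
biases = [5.0, 0.0, -15.0]

for x in (0.0, 0.5, 1.0):
    phi = partition_of_unity(x, weights, biases)
    print(x, [round(p, 3) for p in phi], round(sum(phi), 12))
```

Here the first, second and third functions dominate near x = 0, 0.5 and 1 respectively; adjusting the weights and biases moves and sharpens these cells without ever breaking the sum-to-one property.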

4:30-4:50 pm EDT
Machine learning constitutive models of inelastic materials with microstructure
11th Floor Lecture Hall
 Speaker
 Reese Jones, Sandia National Laboratories
 Session Chair
 Nathaniel Trask, Sandia National Laboratory
Abstract
Traditional simulations of complex physical processes, such as material deformation, are both technologically crucial and computationally expensive. Furthermore, the development of physical models via traditional methods is particularly time-consuming in human terms. Developing comparably accurate models directly from data can enable rapid development of accurate models as well as more robust design, uncertainty quantification, and exhaustive structure-property exploration. We have been developing neural network models that are guided by traditional constitutive theory, such as tensor function representation theorems to embed symmetries, and that also exploit deep learning to infer intrinsic microstructural features. Neural networks are flexible since subcomponents of their graph-like structure can be arranged to suit particular tasks, such as image processing and time integration, and represent the mechanistic flow of information. Furthermore, graphs facilitate the treatment of the multiscale aspects of materials with microstructure. This talk will describe the architectures and demonstrate the efficacy of neural networks designed to model the response of complex history-dependent materials with pores, inclusions or grains based solely on observable data.
Wednesday, June 7, 2023

9:00-9:45 am EDT
Representation equivalent Neural Operators
11th Floor Lecture Hall
 Speaker
 Rima Alaifari, ETH Zurich, DMATH (SAM)
 Session Chair
 Zhongqiang Zhang, Worcester Polytechnic Institute
Abstract
In operator learning, it has been observed that proposed models may not behave as operators when implemented on a computer, questioning the very essence of what operator learning should be. We contend that some form of continuous-discrete equivalence is necessary for an architecture to genuinely learn the underlying operator, rather than just discretizations of it. Employing frames, we introduce the framework of Representation equivalent Neural Operators (ReNO) to ensure that operations at the continuous and discrete levels are equivalent.
Joint work with Francesca Bartolucci (TU Delft), Emmanuel de Bezenac (ETH Zurich), Bogdan Raonic (ETH Zurich), Roberto Molinaro (ETH Zurich), Siddhartha Mishra (ETH Zurich). 
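A minimal illustration of how discrete and continuous operations can disagree, assuming a periodic signal sampled on a uniform grid: squaring sin(3x) creates a cos(6x) component. On 16 points the discrete Fourier transform places that energy at frequency 6, as in the continuum, but on 8 points frequency 6 exceeds the Nyquist limit and aliases to frequency 2, even though the input sin(3x) itself is resolved on both grids. A pointwise nonlinearity applied to samples can therefore give grid-dependent results, which is the kind of mismatch a continuous-discrete equivalence is meant to rule out.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform (O(N^2)), adequate for tiny N."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))
            for k in range(n)]

def dominant_nonzero_frequency(samples):
    """Largest-magnitude non-constant mode, folded into 0..N/2."""
    coeffs = dft(samples)
    n = len(samples)
    k = max(range(1, n), key=lambda k: abs(coeffs[k]))
    return min(k, n - k)

def squared_sine_samples(n, freq=3):
    """Pointwise square of sin(freq * x) sampled at n uniform points of [0, 2*pi)."""
    return [math.sin(freq * 2 * math.pi * t / n) ** 2 for t in range(n)]

print(dominant_nonzero_frequency(squared_sine_samples(16)))  # 6: resolved
print(dominant_nonzero_frequency(squared_sine_samples(8)))   # 2: aliased
```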

10:00-10:30 am EDT
Coffee Break
11th Floor Collaborative Space

10:30-11:15 am EDT
Deep Operator Network Approximation Rates
11th Floor Lecture Hall
 Virtual Speaker
 Christoph Schwab, ETH Zürich
 Session Chair
 Zhongqiang Zhang, Worcester Polytechnic Institute
Abstract
We establish expression rate bounds for neural Deep Operator Networks (DON) emulating maps G between (subsets of) separable Hilbert spaces X and Y. The DON architecture considered uses linear encoders E and decoders D via biorthogonal Riesz bases of X and Y, and an approximator network of an infinite-dimensional, parametric coordinate map that is either holomorphic ("smooth case") or merely Lipschitz continuous ("rough case") on the sequence space.

11:30-11:50 am EDT
Deep neural operators with reliable extrapolation for multiphysics, multiscale & multifidelity problems
11th Floor Lecture Hall
 Speaker
 Lu Lu, University of Pennsylvania
 Session Chair
 Panos Stinis, Pacific Northwest National Laboratory
Abstract
It is widely known that neural networks (NNs) are universal approximators of functions. However, a less well-known but powerful result is that a NN can accurately approximate any continuous nonlinear operator. This universal approximation theorem of operators is suggestive of the potential of deep neural networks (DNNs) in learning operators of complex systems. In this talk, I will present the deep operator network (DeepONet) to learn various operators that represent deterministic and stochastic differential equations. I will also present several extensions of DeepONet, such as DeepM&Mnet for multiphysics problems, DeepONet with proper orthogonal decomposition (POD-DeepONet), (Fourier-)MIONet for multiple-input operators, and multifidelity DeepONet. I will demonstrate the effectiveness of DeepONet and its extensions on diverse multiphysics and multiscale problems, such as nanoscale heat transport, bubble growth dynamics, high-speed boundary layers, electroconvection, hypersonics, and geological carbon sequestration. Deep learning models are usually limited to interpolation scenarios; I will quantify the extrapolation complexity and develop a complete workflow to address the challenge of extrapolation for deep neural operators.
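The basic DeepONet structure combines a branch network, which encodes the input function u from its values at fixed sensor points, with a trunk network, which encodes the query location y, through a dot product G(u)(y) ≈ Σ_k b_k(u) t_k(y) plus a bias. The forward pass below is a sketch under stated assumptions: the layer sizes, tanh activation, single hidden layer, and random untrained weights are illustrative choices, and training is omitted.

```python
import math
import random

def dense(x, W, b, act=math.tanh):
    """Fully connected layer: act(W x + b)."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def init(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    W = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

class TinyDeepONet:
    """Unstacked DeepONet forward pass with one hidden layer per subnetwork."""

    def __init__(self, n_sensors, p=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.branch = [init(n_sensors, hidden, rng), init(hidden, p, rng)]
        self.trunk = [init(1, hidden, rng), init(hidden, p, rng)]
        self.bias = 0.0

    @staticmethod
    def _mlp(layers, x):
        (W1, b1), (W2, b2) = layers
        return dense(dense(x, W1, b1), W2, b2, act=lambda z: z)  # linear output layer

    def __call__(self, u_sensors, y):
        b = self._mlp(self.branch, u_sensors)  # p coefficients depending on u
        t = self._mlp(self.trunk, [y])         # p basis values depending on y
        return sum(bk * tk for bk, tk in zip(b, t)) + self.bias

sensors = [i / 9 for i in range(10)]
u = [math.sin(math.pi * s) for s in sensors]  # input function sampled at the sensors
model = TinyDeepONet(n_sensors=len(sensors))
print([round(model(u, y), 4) for y in (0.0, 0.5, 1.0)])
```

Because the branch output depends only on u, it can be computed once per input function and reused for any number of query points y.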

12:00-12:20 pm EDT
Transfer learning in deep operator networks
11th Floor Lecture Hall
 Speaker
 Somdatta Goswami, Brown University
 Session Chair
 Panos Stinis, Pacific Northwest National Laboratory
Abstract
Transfer learning allows knowledge gained while learning to execute one task (source) to be transferred to a related but distinct task (target), thereby addressing the cost of data collection and labeling, potential computational power restrictions, and dataset distribution mismatches. Based on the deep operator network (DeepONet), we propose a new transfer learning framework for task-specific learning (functional regression in partial differential equations) under conditional shift. Task-specific operator learning is achieved by fine-tuning task-specific layers of the target DeepONet with a hybrid loss function that allows for the matching of individual target samples while simultaneously preserving the global features of the target data's conditional distribution. By embedding conditional distributions onto a reproducing kernel Hilbert space, we minimize the statistical distance between labeled target data and the surrogate prediction on unlabeled target data, as inspired by conditional embedding operator theory. We demonstrate the benefits of our approach for a variety of transfer learning scenarios involving nonlinear partial differential equations under varying conditions caused by geometric domain shifts and model dynamics. Despite significant discrepancies between the source and target domains, our transfer learning architecture enables fast and effective learning of heterogeneous tasks.
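The idea of measuring a statistical distance between distributions embedded in a reproducing kernel Hilbert space can be illustrated with the biased squared maximum mean discrepancy, MMD^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y). This is a simplified, unconditional version for intuition only; the talk's loss uses conditional embeddings, and the Gaussian kernel, bandwidth and synthetic samples below are assumptions.

```python
import math
import random

def gaussian_kernel(a, b, sigma=1.0):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_k(p, q):
        return sum(gaussian_kernel(a, b, sigma) for a in p for b in q) / (len(p) * len(q))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
similar = [rng.gauss(0.0, 1.0) for _ in range(200)]   # same distribution
shifted = [rng.gauss(1.5, 1.0) for _ in range(200)]   # mean-shifted distribution

print(round(mmd2(source, similar), 4), round(mmd2(source, shifted), 4))
```

The estimate is near zero for samples from the same distribution and clearly positive under a shift, which is what makes a kernel distance of this kind usable as a differentiable loss term.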

12:20-12:25 pm EDT
Group Photo (Immediately After Talk)
11th Floor Lecture Hall

12:30-2:30 pm EDT
Poster Session Lunch
Poster Session, 10th Floor Collaborative Space

2:30-2:50 pm EDT
Local approximation of operators
11th Floor Lecture Hall
 Speaker
 Hrushikesh Mhaskar, Claremont Graduate University (Claremont, CA, US)
 Session Chair
 Lu Lu, University of Pennsylvania
Abstract
Many applications, such as system identification, classification of time series, direct and inverse problems in partial differential equations, and uncertainty quantification, lead to the question of approximating a nonlinear operator between metric spaces $\mathfrak{X}$ and $\mathfrak{Y}$. We study the problem of determining the degree of approximation of such operators on a compact subset $K_\mathfrak{X}\subset \mathfrak{X}$ using a finite amount of information. If $\mathcal{F}: K_\mathfrak{X}\to K_\mathfrak{Y}$, a well-established strategy to approximate $\mathcal{F}(F)$ for some $F\in K_\mathfrak{X}$ is to encode $F$ (respectively, $\mathcal{F}(F)$) in terms of a finite number $d$ (respectively, $m$) of real numbers. Together with appropriate reconstruction algorithms (decoders), the problem reduces to the approximation of $m$ functions on a compact subset of a high-dimensional Euclidean space $\mathbb{R}^d$, equivalently, the unit sphere $\mathbb{S}^d$ embedded in $\mathbb{R}^{d+1}$. The problem is challenging because $d$, $m$, as well as the complexity of the approximation on $\mathbb{S}^d$ are all large, and it is necessary to estimate the accuracy while keeping track of the interdependence of all the approximations involved. In this work, we establish constructive methods to do this efficiently, i.e., with the constants involved in the estimates on the approximation on $\mathbb{S}^d$ being $\mathcal{O}(d^{1/6})$. We study different smoothness classes for the operators, and also propose a method for approximation of $\mathcal{F}(F)$ using only information in a small neighborhood of $F$, resulting in an effective reduction in the number of parameters involved. To further mitigate the problem of a large number of parameters, we propose prefabricated networks, resulting in a substantially smaller number of effective parameters. The problem is studied in both deterministic and probabilistic settings.

3:00-3:20 pm EDT
Multifidelity Deep Operator Networks
11th Floor Lecture Hall
 Speaker
 Amanda Howard, Pacific Northwest National Laboratory
 Session Chair
 Lu Lu, University of Pennsylvania
Abstract
Operator learning for complex nonlinear systems is increasingly common in modeling multiphysics and multiscale systems such as climate models. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In many cases, we may not have access to sufficient high-fidelity data to train, but we may have a large amount of low-fidelity data that is less accurate and has greater uncertainty associated with it. The question is how to combine the low-fidelity and high-fidelity data to create a model that trains more accurately than one using the low- or high-fidelity data alone. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fidelity to accurately learn complex operators when sufficient high-fidelity data is not available. Additionally, we demonstrate that the presence of low-fidelity data can improve the predictions of physics-informed learning with DeepONets. We demonstrate the new multifidelity training in diverse examples, including modeling of the ice-sheet dynamics of the Humboldt glacier, Greenland, using two different fidelity models and also using the same physical model at two different resolutions. We will discuss extensions of the multifidelity framework, such as how multifidelity learning can contribute to more accurate training even in the absence of data, with only physics used to train.
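A classical linear multifidelity correction, used here as a simplified stand-in for the composite DeepONet, shows how a cheap low-fidelity model can make a handful of high-fidelity samples go further: model f_H(x) ≈ ρ f_L(x) + a + b x and fit (ρ, a, b) by least squares on the few high-fidelity points. Both test functions and the linear form of the correction are hypothetical; in this toy case the high-fidelity function lies exactly in the model class, so the fit recovers it.

```python
import math

def f_low(x):
    """Hypothetical cheap low-fidelity model (assumed available everywhere)."""
    return math.sin(8 * x)

def f_high(x):
    """Hypothetical expensive high-fidelity model (only a few samples are used)."""
    return 2.0 * math.sin(8 * x) + 0.5 * x - 0.2

def lstsq3(rows, targets):
    """Least squares with three unknowns: normal equations solved by Cramer's rule."""
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]

    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

    D = det(A)
    return [det([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / D
            for k in range(3)]

x_hf = [0.1, 0.35, 0.6, 0.9]                 # only four high-fidelity samples
rows = [[f_low(x), 1.0, x] for x in x_hf]    # features: f_L(x), 1, x
rho, a, b = lstsq3(rows, [f_high(x) for x in x_hf])

def f_multi(x):
    """Multifidelity prediction: scaled low-fidelity model plus linear correction."""
    return rho * f_low(x) + a + b * x

print(round(rho, 6), round(a, 6), round(b, 6))
print(abs(f_multi(0.75) - f_high(0.75)) < 1e-9)
```

With four high-fidelity points alone, one could not recover the oscillatory function; the low-fidelity model supplies that structure and the few expensive samples only calibrate the correction.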

3:30-4:00 pm EDT
Coffee Break
11th Floor Collaborative Space

4:00-4:20 pm EDT
Neural Fields: A Unifying Framework for Operator Learning
11th Floor Lecture Hall
 Speaker
 Jacob Seidman, University of Pennsylvania
 Session Chair
 Panos Stinis, Pacific Northwest National Laboratory
Abstract
Operator learning is an emerging area of machine learning which aims to learn mappings between infinite-dimensional function spaces. Many successful architectures have been proposed for this problem, including the Fourier neural operator, DeepONet, and their extensions. Simultaneously, the field of computer vision has developed architectures, known as neural fields, for modeling quantities defined over spatial domains, such as signed distance functions and radiance fields. These neural fields can then be conditioned globally (e.g., with an object class) or locally (e.g., with information within a patch of the image) to modify their output without the need for retraining. In this talk, we demonstrate that the architectures used in operator learning are in fact examples of these neural fields whose outputs are modified based on global and local information of input functions. In doing so, we give a unified framework for succinctly explaining differences between popular operator learning architectures. Additionally, this framework creates a bridge for adapting well-developed tools from computer vision for use in operator learning problems.

4:30-4:50 pm EDT: Operator Learning for Solving PDE-related Problems (11th Floor Lecture Hall)
 Speaker
 Zecheng Zhang, Carnegie Mellon University
 Session Chair
 Panos Stinis, Pacific Northwest National Laboratory
Abstract
The data-driven approach has become an excellent option for some scientific computing problems, and there are various data-driven treatments for PDE-related problems. Many of them can be implemented in the operator learning framework, since the underlying mathematical problems define an operator. I will focus on operator learning. In particular, I will discuss some theoretical extensions of the classical structure and introduce a new framework: basis enhanced learning (Bel). Bel does not require a specific discretization of the input and output functions and achieves high prediction accuracy. Universal approximation theory and some applications, including some newly proposed engineering applications, will be discussed.
Thursday, June 8, 2023

9:00-9:45 am EDT: Exploring Efficient AI with Spiking Neural Networks (11th Floor Lecture Hall)
 Speaker
 Priya Panda, Yale University
 Session Chair
 Themistoklis Sapsis, MIT
Abstract
Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning due to their large energy-efficiency benefits on neuromorphic hardware. In this presentation, I will talk about important techniques for training SNNs that bring large benefits in latency, accuracy, and even robustness. We will first delve into a recently proposed method, Batch Normalization Through Time (BNTT), that allows us to train SNNs from scratch with very low latency and enables us to target interesting applications like video segmentation and human activity recognition, as well as settings beyond traditional learning, like federated training. Then, I will discuss novel architectures with temporal feedback connections, discovered for SNNs through neural architecture search, that further lower latency, improve energy efficiency, and point to interesting temporal effects. Finally, I will delve into the hardware perspective of SNNs implemented on standard CMOS and compute-in-memory accelerators, using our recently proposed SATA and SpikeSim tools. It turns out that the multi-timestep computation in SNNs can lead to extra memory overhead and repeated DRAM accesses that annul all the compute-sparsity-related advantages. I will highlight some algorithmic techniques, such as membrane-potential sharing and early timestep exit, that use the temporal dimension in SNNs to reduce this overhead.
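A minimal sketch of the normalization step in BNTT, assuming the usual reading that batch-normalization statistics and scale parameters are kept separate for every timestep instead of being shared across time. The input has shape [timesteps][batch]; the per-timestep parameters `gamma` and `beta` are illustrative assumptions, and training and the spiking neurons themselves are omitted.

```python
import math

def bntt(x, gamma, beta, eps=1e-5):
    """Normalize x[t][b] (timestep t, batch sample b) using the batch mean and
    variance of that timestep only, then apply that timestep's own affine parameters."""
    out = []
    for t, batch in enumerate(x):
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
        scale = gamma[t] / math.sqrt(var + eps)
        out.append([scale * (v - mean) + beta[t] for v in batch])
    return out

x = [[1.0, 2.0, 3.0, 4.0],    # t = 0: pre-activations for a batch of 4 samples
     [0.0, 0.0, 1.0, 1.0],    # t = 1
     [5.0, 7.0, 9.0, 11.0]]   # t = 2
gamma = [1.0, 0.5, 2.0]       # hypothetical per-timestep scales
beta = [0.0, 0.0, 0.0]        # hypothetical per-timestep shifts
y = bntt(x, gamma, beta)
print(y)
```

Because each timestep has its own statistics, a timestep whose inputs are weak (like t = 1 above) is rescaled independently of the others.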

10:00-10:30 am EDT: Coffee Break (11th Floor Collaborative Space)

10:30-11:15 am EDT: Dynamics and symmetries in neural network learning (11th Floor Lecture Hall)
 Speaker
 Stefanie Jegelka, MIT
 Session Chair
 Themistoklis Sapsis, MIT
Abstract
This talk will encompass two topics in the area of scientific machine learning: learning dynamics and symmetries. First, we look at training dynamics: recent results show that the training of neural networks does not always converge to a fixed point in parameter space. We investigate generalization in such settings. By taking a dynamical systems perspective and defining a more general notion of algorithmic stability, we draw connections between training behavior, stability and generalization. Second, in many applications, data and tasks have symmetries and thereby imply desired invariances. We will look at invariances of eigenvectors, which are important, for instance, when learning with graphs. We derive appropriate neural network architectures and show empirical and theoretical benefits of encoding such invariances.
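The simplest eigenvector invariance is a sign ambiguity: if v is an eigenvector, so is -v, so a model fed with eigenvectors should not depend on which sign a solver happens to return. One simple way to obtain sign invariance is to symmetrize a feature map, f(v) = phi(v) + phi(-v). The sketch below checks this on an eigenvector of a small path-graph Laplacian; `phi` is a hypothetical feature map, and this is not claimed to be the exact architecture from the talk.

```python
import math

def phi(v):
    # Hypothetical feature map applied to an eigenvector; not sign invariant by itself.
    return [math.tanh(2.0 * a + 0.5) for a in v]

def sign_invariant(v):
    # phi(v) + phi(-v) is unchanged when v is replaced by -v.
    neg = [-a for a in v]
    return [p + q for p, q in zip(phi(v), phi(neg))]

# Laplacian L = D - A of the path graph 0 - 1 - 2.
L = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
v = [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)]        # eigenvector with eigenvalue 1
Lv = [sum(L[i][j] * v[j] for j in range(3)) for i in range(3)]
minus_v = [-a for a in v]                             # an equally valid eigenvector

print(phi(v), phi(minus_v))                      # differ
print(sign_invariant(v), sign_invariant(minus_v))  # identical
```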

11:30 am-12:15 pm EDT: A Probabilistic Future for Neuromorphic Computing (11th Floor Lecture Hall)
 Speaker
 Brad Aimone, Sandia National Laboratories
 Session Chair
 Priya Panda, Yale University
Abstract
Neuromorphic (aka brain-inspired) computing is an exciting new paradigm that can make computing dramatically more energy-efficient while potentially offering a path to realize the brain’s elusive artificial intelligence capabilities. As today’s neuromorphic hardware is already delivering an energy-efficient alternative to conventional approaches, the challenge has shifted to the math and computing communities: can we come up with strategies and techniques to take advantage of this potential for scientific computing applications? In our work on neuromorphic algorithms, we have realized that neuromorphic computing may impact a much broader range of applications than is widely appreciated, but to do so we must be willing to move beyond the proven recipes that work for conventional computing. In this talk, I will present several vignettes of how we can create new applications for neuromorphic hardware today and going forward. I will first describe the value proposition of neuromorphic systems relative to modern alternatives such as GPUs and CPUs. To illustrate this, I will show how spiking neuromorphic hardware, such as the Intel Loihi platform or the ARM-based SpiNNaker platform, can be used to implement Monte Carlo random walk processes with a wide range of numerical computing applications. Finally, I will introduce a new approach to neuromorphic computing, which we call COINFLIPS, that emphasizes the stochastic nature of the brain.
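The Monte Carlo random-walk idea can be illustrated on conventional hardware. This sketch estimates the solution of a discrete Laplace boundary-value problem by counting how often symmetric random walkers exit through the right boundary; the exact answer is start / length, which makes the estimate easy to check. The setup and parameters are hypothetical, and the sketch does not model how walkers would be mapped onto spiking neurons on Loihi or SpiNNaker.

```python
import random

def hit_right_probability(start, length, walkers, seed=0):
    """Estimate u(start), the probability that a symmetric random walk on
    {0, ..., length} started at `start` reaches `length` before 0.
    u solves the discrete Laplace equation u(i) = (u(i-1) + u(i+1)) / 2
    with u(0) = 0 and u(length) = 1, whose exact solution is u(i) = i / length."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(walkers):
        pos = start
        while 0 < pos < length:           # walk until absorbed at a boundary
            pos += 1 if rng.random() < 0.5 else -1
        hits += pos == length
    return hits / walkers

est = hit_right_probability(start=3, length=10, walkers=20000)
print(est)  # close to the exact value 3 / 10
```

Each walker is independent, which is what makes this kind of computation a natural fit for massively parallel, event-driven hardware.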

12:30-2:30 pm EDT: Poster Session and Lunch (10th Floor Collaborative Space)

2:30-2:50 pm EDT: Evolving Spiking Neural Networks for Scientific Applications (11th Floor Lecture Hall)
 Speaker
 Catherine Schuman, University of Tennessee
 Session Chair
 Priya Panda, Yale University
Abstract
Effectively leveraging the characteristics and capabilities of spiking neural networks on neuromorphic computing systems for real-world applications is an important challenge, especially for applications with severe energy constraints. In this talk, I will give an overview of the use of evolutionary optimization to design spiking neural networks for neuromorphic deployment. I will specifically highlight several real-world applications, such as radiation detection, internal combustion engine control, and autonomous race car control. I will discuss the advantages of using evolutionary optimization to design spiking neural networks for neuromorphic systems, including the ability to perform multi-objective optimization for energy efficiency and resiliency.

3:00-3:20 pm EDT: Using Spiking Neural Networks for Scientific Computations (11th Floor Lecture Hall)
 Speaker
 Adar Kahana, Brown University
 Session Chair
 Priya Panda, Yale University
Abstract
The field of machine learning is advancing rapidly in the scientific community, attracting many researchers to develop innovative methods for supervised and unsupervised learning. Two drawbacks of such techniques are the long computational effort required to train the models and the need for a large volume of training data. Researchers are investigating more efficient learning machines, which has led to the proposal of spiking neural networks, a biologically plausible learning framework. These networks are inspired by the human brain, which is considered a very (arguably the most) efficient learning machine. In this talk, we introduce spiking neural networks and propose a method for using them for function regression. We also show how DeepONets can be used for the same task, but with spiking input data, with even better performance. We also propose a method for long-time integration in this spiking framework. Last, we analyze the advantages of using spiking neural networks for scientific computing and discuss how they can be used for other scientific computing tasks (with or without machine learning), from both the algorithm and hardware perspectives.
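One common way to feed real-valued data to a spiking network is rate coding with a leaky integrate-and-fire (LIF) neuron: a stronger input current makes the neuron reach threshold sooner and fire more often. The sketch below assumes forward-Euler integration and hypothetical parameters (time constant, threshold, time step). It is not the spiking regression or spiking DeepONet method from the talk, only an illustration of how a scalar input becomes a spike train.

```python
def lif_spike_count(current, steps=1000, dt=1e-3, tau=0.02, v_th=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron, dv/dt = (-v + current) / tau,
    with forward Euler; emit a spike and reset whenever v reaches v_th."""
    v = v_reset
    spikes = 0
    for _ in range(steps):
        v += dt * (-v + current) / tau
        if v >= v_th:
            spikes += 1
            v = v_reset
    return spikes

# Encode three scalar inputs as spike counts over a 1 s window (rate coding).
rates = [lif_spike_count(x) for x in (0.5, 1.5, 3.0)]
print(rates)  # [0, 45, 125]
```

An input below threshold (0.5 here) never fires, because the membrane potential settles at the input value; above threshold, the count grows with the input.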

3:30-4:00 pm EDT: Coffee Break (11th Floor Collaborative Space)

4:00-4:20 pm EDT: Learning to Predict using Network of Spiking Neurons (11th Floor Lecture Hall)
 Speaker
 Biswadeep Chakraborty, Georgia Institute of Technology
 Session Chair
 Brad Aimone, Sandia National Laboratories
Abstract
The emergence of brain-inspired computing technologies is offering innovative, energy-efficient information processing methods. Spiking Neural Networks, regarded as the third wave of Artificial Intelligence, are based on the learning principles of the brain, making them a biologically plausible model of neural processing. Spike-Timing-Dependent Plasticity (STDP) is an efficient continual learning model of synaptic plasticity based on the same principles that underlie synaptic plasticity in the brain. We present our work on a heterogeneous recurrent spiking neural network consisting of heterogeneous neurons with varying firing/relaxation dynamics. The model learns using a heterogeneous STDP model with varying learning dynamics for each synapse. The heterogeneity in neuronal and synaptic dynamics reduces the spiking activity of a Recurrent Spiking Neural Network while improving prediction performance, enabling spike-efficient learning.
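A minimal sketch of the classic pair-based STDP rule, assuming exponential learning windows: a presynaptic spike shortly before a postsynaptic spike strengthens the synapse, and the reverse order weakens it. Giving each synapse its own time constants (passed as arguments here) is one simple way to model heterogeneous learning dynamics; the amplitudes and time constants are hypothetical, and the heterogeneous STDP model from the talk may differ.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fired before post (causal): potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:   # post fired before pre (anti-causal): depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

# Two hypothetical synapses with different learning windows see the same spike pair.
fast = stdp_dw(10.0, 20.0, tau_plus=5.0)
slow = stdp_dw(10.0, 20.0, tau_plus=40.0)
print(fast, slow)
```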

4:30-4:50 pm EDT: Exact Gradient Computation for Spiking Neural Networks via Forward Propagation (11th Floor Lecture Hall)
 Speaker
 Amin Karbasi, Yale University
 Session Chair
 Brad Aimone, Sandia National Laboratories
Abstract
Spiking neural networks (SNNs) have recently emerged as alternatives to traditional neural networks, owing to their energy-efficiency benefits and their capacity to capture biological neuronal mechanisms. However, the classic backpropagation algorithm for training traditional networks has been notoriously difficult to apply to SNNs due to the hard thresholding and discontinuities at spike times. Therefore, a large majority of prior work has assumed that exact gradients of SNNs with respect to their weights do not exist and has focused on approximation methods that produce surrogate gradients. In this work, (1) by applying the implicit function theorem to SNNs at the discrete spike times, we prove that, despite being non-differentiable in time, SNNs have well-defined gradients with respect to their weights, and (2) we propose a novel training algorithm, called forward propagation (FP), that computes exact gradients for SNNs. FP exploits the causality structure between spikes and allows us to parallelize computation forward in time. It can be used with other algorithms that simulate the forward pass, and it also provides insights into why other related algorithms, such as Hebbian learning and recently proposed surrogate gradient methods, may perform well.
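The implicit-function-theorem step can be illustrated on a single leaky integrate-and-fire neuron driven by a constant input w*I from rest. The spike time t* is defined implicitly by V(t*, w) = theta, so dt*/dw = -(dV/dw) / (dV/dt) evaluated at t*, even though the spike itself is a discontinuous event. The sketch below, with hypothetical parameters, compares this gradient with a finite-difference estimate; it is not the FP algorithm, which handles networks and multiple spikes.

```python
import math

def spike_time(w, current=1.0, tau=1.0, theta=1.0):
    """First threshold crossing of V(t) = w*I*tau*(1 - exp(-t/tau)), the LIF
    membrane potential under constant input w*I starting from rest."""
    drive = w * current * tau
    if drive <= theta:
        return math.inf  # the neuron never reaches threshold
    return -tau * math.log(1.0 - theta / drive)

def spike_time_grad_ift(w, current=1.0, tau=1.0, theta=1.0):
    """dt*/dw from the implicit function theorem applied to V(t*, w) = theta."""
    t = spike_time(w, current, tau, theta)
    dV_dw = current * tau * (1.0 - math.exp(-t / tau))
    dV_dt = w * current * math.exp(-t / tau)
    return -dV_dw / dV_dt

w = 2.0
g = spike_time_grad_ift(w)
h = 1e-6
fd = (spike_time(w + h) - spike_time(w - h)) / (2 * h)  # central finite difference
print(g, fd)  # both close to -0.5
```

With these parameters t* = ln 2, and the exact derivative works out to -0.5; a larger weight makes the neuron fire earlier, hence the negative sign.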
Friday, June 9, 2023

9:00-9:45 am EDT: Optimization-in-the-loop ML for energy and climate (11th Floor Lecture Hall)
 Virtual Speaker
 Priya Donti, MIT
 Session Chair
 Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
Addressing climate change will require concerted action across society, including the development of innovative technologies. While methods from machine learning (ML) have the potential to play an important role, these methods often struggle to contend with the physics, hard constraints, and complex decision-making processes that are inherent to many climate and energy problems. To address these limitations, I present the framework of “optimization-in-the-loop ML,” and show how it can enable the design of ML models that explicitly capture relevant constraints and decision-making processes. For instance, this framework can be used to design learning-based controllers that provably enforce the stability criteria or operational constraints associated with the systems in which they operate. It can also enable the design of task-based learning procedures that are cognizant of the downstream decision-making processes for which a model’s outputs will be used. By significantly improving performance and preventing critical failures, such techniques can unlock the potential of ML for operating low-carbon power grids, improving energy efficiency in buildings, and addressing other high-impact problems of relevance to climate action.
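One generic way to make a model's output satisfy a hard constraint exactly is to append a differentiable projection onto the feasible set. The sketch below assumes a single linear equality constraint (hypothetical generator set points must sum to a given demand), for which the Euclidean projection has a closed form. It illustrates the general idea only; the controllers and task-based procedures in the talk may enforce constraints differently, and inequality constraints such as generator limits would need a different projection or an optimization layer.

```python
def project_to_demand(setpoints, demand):
    """Euclidean projection of predicted set points onto the hyperplane
    {p : sum(p) = demand}. Every entry is shifted by the same amount, and the
    map is differentiable, so it can sit inside a training loop."""
    shift = (sum(setpoints) - demand) / len(setpoints)
    return [p - shift for p in setpoints]

raw = [40.0, 25.0, 30.0]   # hypothetical model output in MW (sums to 95)
fixed = project_to_demand(raw, demand=100.0)
print(fixed, sum(fixed))   # the corrected set points sum to the demand
```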

10:00-10:30 am EDT: Coffee Break (11th Floor Collaborative Space)

10:30-11:15 am EDT: Some old and some new thoughts on data-driven modeling of complex systems (11th Floor Lecture Hall)
 Speaker
 Yannis Kevrekidis, Johns Hopkins University
 Session Chair
 Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
I will talk about old and new results in the data-driven modeling of complex dynamics: from learning neural ODEs and PDEs in the 1990s, to emergent spaces, optimal algorithm discovery, and data-driven well-posedness today, touching a little on when to learn and when not to, and (just a little) on causality.

11:30 am-12:15 pm EDT: Learning Neural Operators for Complex Physical System Modeling (11th Floor Lecture Hall)
 Speaker
 Yue Yu, Lehigh University
 Session Chair
 Marta D'Elia, Pasteur Labs. and Stanford University
Abstract
For many decades, physics-based PDEs have been commonly employed for modeling complex system responses, with traditional numerical methods used to solve the PDEs and provide predictions. However, when governing laws are unknown or when high degrees of heterogeneity are present, these classical models may become inaccurate. In this talk, we propose to use data-driven modeling, which directly utilizes high-fidelity simulation and experimental measurements to learn the hidden physics and provide further predictions. In particular, we develop PDE-inspired neural operator architectures to learn the mapping between loading conditions and the corresponding system responses. By parameterizing the increment between layers as an integral operator, our neural operator can be seen as the analog of a time-dependent nonlocal equation, which captures long-range dependencies in the feature space and is guaranteed to be resolution-independent. Moreover, when applied to (hidden) PDE-solving tasks, our neural operator provides a universal approximator to a fixed-point iterative procedure, and partial physical knowledge can be incorporated to further improve the model’s generalizability and transferability. As a real-world application, we learn material models directly from digital image correlation (DIC) displacement tracking measurements on porcine tricuspid valve leaflet tissue, and show that the learned model substantially outperforms conventional constitutive models.
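A minimal sketch of a residual layer whose increment is an integral operator, h_next(x) = h(x) + dt * tanh(integral of K(x, y) h(y) dy over [0, 1]), which reads like one explicit time step of a nonlocal evolution equation. The Gaussian kernel, step size `dt`, and tanh activation are hypothetical choices; in a neural operator the kernel would be learned. Because the layer is defined through an integral, evaluating it with different quadrature resolutions gives nearly the same result, which is the sense in which such layers are resolution-independent.

```python
import math

def kernel(x, y, ell=0.1):
    # Hypothetical Gaussian kernel K(x, y); in a neural operator it would be learned.
    return math.exp(-((x - y) ** 2) / (2 * ell * ell))

def layer(h, x, n_quad, dt=0.5):
    """One residual layer evaluated at the point x, with the integral
    approximated by an n_quad-point midpoint rule on [0, 1]."""
    ys = [(i + 0.5) / n_quad for i in range(n_quad)]
    integral = sum(kernel(x, y) * h(y) for y in ys) / n_quad
    return h(x) + dt * math.tanh(integral)

def h0(y):
    # Input function to the layer.
    return math.sin(math.pi * y)

coarse = layer(h0, 0.5, n_quad=100)
fine = layer(h0, 0.5, n_quad=400)
print(coarse, fine)  # nearly equal: the layer is defined on functions, not on a grid
```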

12:30-2:30 pm EDT: Lunch/Free Time

2:30-3:15 pm EDT: Dynamics in Deep Classifiers trained with the Square Loss: normalization, low rank, and generalization bounds (11th Floor Lecture Hall)
 Speaker
 Mengjia Xu, New Jersey Institute of Technology
 Session Chair
 George Karniadakis, Brown University
Abstract
We overview several properties, old and new, of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow under the square loss in deep homogeneous rectified linear unit (ReLU) networks. We study convergence to a solution with the absolute minimum ρ, the product of the Frobenius norms of the weight matrices of all layers, when normalization by Lagrange multipliers is used together with weight decay under different forms of gradient descent. For a specific network architecture, ρ is the main property of the minimizers that bounds their expected error. In particular, we derive novel norm-based generalization bounds for convolutional layers that are orders of magnitude better than classical bounds for dense networks. Next, we prove that quasi-interpolating solutions obtained by stochastic gradient descent (SGD) in the presence of weight decay have a bias toward low-rank weight matrices, which should improve generalization. We also predict the existence of an intrinsic SGD noise in the weight matrices and in the margins. Specifically, we prove that the asymptotic quasi-interpolating solutions obtained by SGD in the presence of regularization show fluctuations that are larger for the weight matrices in layers closer to the input layer. We show that these fluctuations are due to chaotic-like SGD dynamics arising from the competition between minimizing the error and minimizing the rank. Under the square loss, mini-batch SGD and weight decay (WD) are both necessary for chaos; under exponential loss functions, chaos also occurs without WD.
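For a bias-free ReLU network, the quantity ρ (the product of the layers' Frobenius norms) factors out of the network: dividing each weight matrix W_k by its Frobenius norm gives normalized matrices V_k with f_W(x) = ρ f_V(x), because relu(c z) = c relu(z) for c > 0. The sketch below checks this identity numerically on hypothetical weights.

```python
import math

def relu(z):
    return [max(0.0, a) for a in z]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def frob(W):
    # Frobenius norm of a matrix stored as a list of rows.
    return math.sqrt(sum(w * w for row in W for w in row))

def forward(layers, x):
    # Bias-free ReLU network; no ReLU after the last layer.
    for W in layers[:-1]:
        x = relu(matvec(W, x))
    return matvec(layers[-1], x)

layers = [
    [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.3]],   # 3x2 (hypothetical weights)
    [[2.0, 0.0, -1.0], [0.4, 1.0, 0.2]],      # 2x3
    [[1.0, -0.5]],                            # 1x2
]
rho = math.prod(frob(W) for W in layers)
normalized = [[[w / frob(W) for w in row] for row in W] for W in layers]

x = [0.7, -0.2]
out = forward(layers, x)[0]
out_normalized = forward(normalized, x)[0]
print(out, rho * out_normalized)  # equal up to rounding
```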

3:30-4:00 pm EDT: Coffee Break (11th Floor Collaborative Space)

4:00-4:20 pm EDT: A Method for Computing Inverse Parametric PDE Problems with Randomized Neural Networks (11th Floor Lecture Hall)
 Speaker
 Suchuan Dong, Purdue University
 Session Chair
 George Karniadakis, Brown University
Abstract
We present a method based on randomized neural networks for computing the inverse parameters and the solution field of inverse parametric partial differential equations (PDEs). It extends the local extreme learning machine technique, originally developed for forward PDEs, to inverse problems. We develop three algorithms for training the neural network to solve the inverse PDE problem. The first algorithm (termed NLLSQ) determines the inverse parameters and the trainable network parameters together by the nonlinear least squares method with perturbations (NLLSQ-perturb). The second algorithm (termed VarPro-F1) eliminates the inverse parameters from the overall problem by variable projection, yielding a reduced problem in the trainable network parameters only. It solves the reduced problem first by the NLLSQ-perturb algorithm for the trainable network parameters, and then computes the inverse parameters by the linear least squares method. The third algorithm (termed VarPro-F2) eliminates the trainable network parameters from the overall problem by variable projection, yielding a reduced problem in the inverse parameters only. It solves the reduced problem for the inverse parameters first, and then computes the trainable network parameters. VarPro-F1 and VarPro-F2 are, in a sense, reciprocal to each other. The presented method produces accurate results for inverse PDE problems. For noise-free data, the errors of the inverse parameters and the solution field decrease exponentially as the number of collocation points or the number of trainable network parameters increases, and can reach a level close to machine accuracy. For noisy data, the accuracy degrades compared with the noise-free case, but the method remains quite accurate. Several numerical examples will be presented to demonstrate the characteristics and accuracy of the method, and it will be compared with the state-of-the-art neural network-based method for inverse PDEs.
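A small sketch in the spirit of the third algorithm (VarPro-F2), under several simplifying assumptions: a single global random-feature network instead of the local extreme learning machine, the toy inverse problem u'' + alpha u = 0 on [0, 1] with synthetic data from u(x) = sin(2x) (so the true alpha is 4), and a grid search instead of a nonlinear least-squares solver for alpha. For fixed alpha the output-layer coefficients enter linearly, so they are eliminated by linear least squares (computed here by Gram-Schmidt projection), leaving a reduced residual that depends on alpha alone. Feature counts, ranges, and point sets are hypothetical.

```python
import math
import random

rng = random.Random(1)
M = 20                                           # hypothetical number of hidden units
W = [rng.uniform(-4.0, 4.0) for _ in range(M)]   # random hidden weights, fixed (not trained)
B = [rng.uniform(-4.0, 4.0) for _ in range(M)]   # random hidden biases, fixed (not trained)

def phi(x):
    return [math.tanh(w * x + b) for w, b in zip(W, B)]

def phi_xx(x):
    # d^2/dx^2 tanh(w x + b) = -2 w^2 t (1 - t^2), with t = tanh(w x + b)
    out = []
    for w, b in zip(W, B):
        t = math.tanh(w * x + b)
        out.append(-2.0 * w * w * t * (1.0 - t * t))
    return out

X_PDE = [i / 39 for i in range(40)]              # collocation points on [0, 1]
X_DATA = [(i + 0.5) / 10 for i in range(10)]     # measurement points
Y_DATA = [math.sin(2.0 * x) for x in X_DATA]     # synthetic data (true alpha = 4)

def orthogonalize(v, basis):
    # Remove the components of v along an orthonormal basis (two passes for stability).
    for _ in range(2):
        for q in basis:
            d = sum(a * c for a, c in zip(q, v))
            v = [a - d * c for a, c in zip(v, q)]
    return v

def reduced_residual(alpha):
    """For fixed alpha, eliminate the linear output weights by least squares and
    return the norm of what is left (the variable-projection reduced objective)."""
    rows = [[d + alpha * p for d, p in zip(phi_xx(x), phi(x))] for x in X_PDE]  # u'' + alpha u = 0
    rows += [phi(x) for x in X_DATA]                                            # u(x_k) = data
    rhs = [0.0] * len(X_PDE) + Y_DATA
    basis = []
    for j in range(M):
        col = [r[j] for r in rows]
        v = orthogonalize(col, basis)
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-8 * math.sqrt(sum(a * a for a in col)):  # drop nearly dependent columns
            basis.append([a / n for a in v])
    r = orthogonalize(rhs, basis)
    return math.sqrt(sum(a * a for a in r))

# Coarse scan over alpha, then a finer scan around the best coarse value.
coarse = [0.5 + 0.25 * k for k in range(31)]
a0 = min(coarse, key=reduced_residual)
fine = [a0 - 0.25 + 0.01 * k for k in range(51)]
alpha_hat = min(fine, key=reduced_residual)
print(alpha_hat)  # expected to land near 4
```

Once alpha is found, the output-layer coefficients would be recovered by one more linear least-squares solve at that alpha, matching the "inverse parameters first, network parameters afterwards" order described above.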

4:30-4:50 pm EDT: Leveraging Multi-time Hamilton-Jacobi PDEs for Certain Scientific Machine Learning Problems (11th Floor Lecture Hall)
 Speaker
 Paula Chen, Brown University
 Session Chair
 George Karniadakis, Brown University
Abstract
Multi-time Hamilton-Jacobi partial differential equations (HJ PDEs) have deep connections with a wide range of fields, including optimal control, differential games, and imaging sciences. In this poster, we establish a novel theoretical connection between the multi-time Hopf formula, which gives a representation of the solution to certain multi-time HJ PDEs, and certain learning problems. Through this connection, we increase the interpretability of the training process of certain machine learning applications by showing that when we solve these learning problems, we also solve a multi-time HJ PDE and, by extension, its corresponding optimal control problem. As a first exploration of this connection, we establish the connection between the Linear Quadratic Regulator (LQR) and the regularized linear regression problem. We then leverage this theoretical connection to adapt standard LQR solvers (namely, those based on the Riccati ODEs) to design new training methods in learning. Finally, we provide some numerical examples demonstrating the computational advantages of our Riccati-based approach in the context of continual learning, transfer learning, and sparse dynamics identification. This is joint work with Jerome Darbon (Brown University), George Karniadakis (Brown University), Tingwei Meng (UCLA), and Zongren Zou (Brown University).
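The LQR/regression connection has a familiar discrete analogue that can be sketched quickly: ridge regression updated one sample at a time (recursive least squares) propagates the matrix P = (X^T X + lambda I)^(-1) through a Riccati-type update, and after all samples it reproduces the batch ridge solution exactly. This is not the multi-time Hopf formula or the Riccati-ODE solvers from the poster, only an illustration of why Riccati machinery suits continual learning; the data and `lam` are hypothetical.

```python
def rls_update(theta, P, x, y):
    """One recursive least-squares step. The update of P,
    P <- P - P x x^T P / (1 + x^T P x), is a discrete Riccati recursion;
    theta is corrected along the gain P x / (1 + x^T P x)."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    err = y - sum(theta[i] * x[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * err for i in range(n)]
    new_P = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P

def ridge_2d(xs, ys, lam):
    # Batch ridge regression (X^T X + lam I)^(-1) X^T y for 2 features, via Cramer's rule.
    a = sum(x[0] * x[0] for x in xs) + lam
    b = sum(x[0] * x[1] for x in xs)
    d = sum(x[1] * x[1] for x in xs) + lam
    r0 = sum(x[0] * y for x, y in zip(xs, ys))
    r1 = sum(x[1] * y for x, y in zip(xs, ys))
    det = a * d - b * b
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]

lam = 0.1
xs = [(1.0, 0.5), (0.3, -1.0), (2.0, 1.0), (-1.0, 0.7)]
ys = [1.0, -0.5, 2.2, 0.1]
theta, P = [0.0, 0.0], [[1.0 / lam, 0.0], [0.0, 1.0 / lam]]  # P0 = I / lam
for x, y in zip(xs, ys):   # samples arrive one at a time (continual learning)
    theta, P = rls_update(theta, P, x, y)
batch = ridge_2d(xs, ys, lam)
print(theta, batch)        # the two estimates agree
```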
All event times are listed in ICERM local time in Providence, RI (Eastern Standard Time / UTC-5).
Request Reimbursement
This section is for general purposes only and does not indicate that all attendees receive funding. Please refer to your personalized invitation to review your offer.
 ORCID iD
 As this program is funded by the National Science Foundation (NSF), ICERM is required to collect your ORCID iD if you are receiving funding to attend this program. Be sure to add your ORCID iD to your Cube profile as soon as possible to avoid delaying your reimbursement.
 Acceptable Costs

 One round trip between your home institution and ICERM
 Flights on U.S. or E.U. airlines – economy class to either Providence airport (PVD) or Boston airport (BOS)
 Ground Transportation to and from airports and ICERM.
 Unacceptable Costs

 Flights on non-U.S. or non-E.U. airlines
 Flights on U.K. airlines
 Seats in economy plus, business class, or first class
 Change ticket fees of any kind
 Multi-use bus passes
 Meals or incidentals
 Advance Approval Required

 Personal car travel to ICERM from outside New England
 Multiple-destination plane tickets (this does not include layovers to reach ICERM)
 Arriving or departing from ICERM more than a day before or day after the program
 Multiple trips to ICERM
 Rental car to/from ICERM
 Flights on Swiss, Japanese, or Australian airlines
 Arriving or departing from airport other than PVD/BOS or home institution's local airport
 Two one-way plane tickets to create a round trip (often purchased from Expedia, Orbitz, etc.)
 Travel Maximum Contributions

 New England: $350
 Other contiguous US: $850
 Asia & Oceania: $2,000
 All other locations: $1,500
 Note: these rates were updated in Spring 2023 and supersede any rates in prior invitations. Invitations that did not include travel support still do not include travel support.
 Reimbursement Requests

Request Reimbursement with Cube
Refer to the back of your ID badge for more information. Checklists are available at the front desk and in the Reimbursement section of Cube.
 Reimbursement Tips

 Scanned original receipts are required for all expenses
 Airfare receipt must show full itinerary and payment
 ICERM does not offer per diem or meal reimbursement
 Allowable mileage is reimbursed at the prevailing IRS business rate; document the trip with a PDF of the Google Maps result
 Keep all documentation until you receive your reimbursement!
 Reimbursement Timing

6-8 weeks after all documentation is sent to ICERM. All reimbursement requests are reviewed by numerous central offices at Brown, which may request additional documentation.
 Reimbursement Deadline

Submissions must be received within 30 days of ICERM departure to avoid applicable taxes. Submissions after thirty days will incur applicable taxes. No submissions are accepted more than six months after the program end.