Organizing Committee

Rough path theory emerged as a branch of stochastic analysis to give an improved approach to dealing with the interactions of complex random systems. In that context, it continues to resolve important questions, but its broader theoretical footprint has been substantial. Most notable is its contribution to Hairer’s Fields-Medal-winning work on regularity structures. At the core of rough path theory is the so-called signature transform which, while simple to define, has rich mathematical properties bringing in aspects of analysis, geometry, and algebra. Hambly and Lyons (Annals of Math, 2010) built upon earlier work of Chen, showing how the signature represents the path uniquely up to generalized reparameterizations. This turns out to have practical implications, allowing one to summarise the space of functions on unparameterized paths and data streams in a very economical way.
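Concretely, the signature of a d-dimensional path is the sequence of its iterated integrals. The following minimal numpy sketch (illustrative only, not workshop material) computes the depth-2 terms of a piecewise-linear path, building them up segment by segment via Chen's identity:

```python
import numpy as np

def sig_level2(path):
    """Depth-1 and depth-2 signature terms of a piecewise-linear path.

    path: array of shape (n_points, d).
    Returns (S1, S2) where S1[i] is the total increment in channel i and
    S2[i, j] is the iterated integral of dX^i dX^j.
    """
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        # Chen's identity: concatenating a linear segment updates S2 by
        # S1 (x) dx plus the segment's own level-2 term dx (x) dx / 2.
        S2 += np.outer(S1, dx) + np.outer(dx, dx) / 2.0
        S1 += dx
    return S1, S2

# Example: the L-shaped path (0,0) -> (1,0) -> (1,1).
S1, S2 = sig_level2(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]))
# The shuffle identity S2 + S2^T = S1 (x) S1 holds for every path; the
# antisymmetric part of S2 is the Levy area enclosed against the chord.
assert np.allclose(S2 + S2.T, np.outer(S1, S1))
```

The same recursion extends to any truncation depth, which is how the dedicated packages used later in the week compute signatures efficiently.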

Over the past five years, a significant strand of applied work has been undertaken to exploit the mathematical richness of this object in diverse data science challenges, from healthcare to computer vision to gesture recognition. The log signature is becoming a powerful way to summarise the fine structure of a data stream in a neural net. The emergence of neural differential equations as an important tool in data science further deepens the connections with rough paths.
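At depth 2, the log signature of a d-dimensional stream consists of the d increments together with the d(d-1)/2 Lévy areas: a strictly smaller summary than the d + d² terms of the full depth-2 signature. A minimal numpy sketch (illustrative only) of this compressed feature:

```python
import numpy as np

def logsig_level2(path):
    """Depth-2 log-signature of a piecewise-linear path: the d increments
    followed by the d*(d-1)/2 Levy areas (the antisymmetric part of the
    depth-2 signature), packed into a single feature vector.
    """
    d = path.shape[1]
    S1, area = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        # antisymmetric part of the Chen update: the Levy-area increment
        area += (np.outer(S1, dx) - np.outer(dx, S1)) / 2.0
        S1 += dx
    iu = np.triu_indices(d, k=1)          # strictly upper-triangular areas
    return np.concatenate([S1, area[iu]])

ls = logsig_level2(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]))
# Inserting a redundant sample point on a segment changes nothing: the
# feature depends on the path, not on how it was sampled.
ls_resampled = logsig_level2(
    np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 1.0]]))
assert np.allclose(ls, ls_resampled)
```

This sampling invariance is exactly the reparameterization invariance of the signature mentioned above, and is one reason log-signature features work well for irregularly sampled streams.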

This four-day workshop will bring together key expertise across disciplines to advance understanding of some of the most pressing and exciting challenges. The week will start with structured, tutorial-style lectures on the foundational aspects of signatures, their use in data science, and topics of broad appeal. These will include:

  • Mathematical foundations
  • Neural rough differential equations
  • Signature-based kernel methods
  • Expected signatures
  • Applications of signatures to action recognition and healthcare
There will also be an extended interactive practical session on computing with signatures. This session will be directed by leaders with participants working in groups with others of a similar level of previous experience. The aim of this session will be for participants to develop skills using the latest packages to implement data-focused tasks involving signatures.
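As a flavour of the kind of data-focused task such a session might cover (an illustrative sketch in plain numpy, not the actual session material), a single depth-2 log-signature term, the Lévy area, already separates clockwise from counterclockwise noisy loops, regardless of sampling rate or time parameterization:

```python
import numpy as np

def levy_area(path):
    """Levy area of a 2D path: the depth-2 log-signature term, equal to the
    signed area swept out relative to the starting point."""
    x = path[:, 0] - path[0, 0]
    y = path[:, 1] - path[0, 1]
    # left-endpoint sums are exact for this antisymmetric integral
    return 0.5 * np.sum(x[:-1] * np.diff(y) - y[:-1] * np.diff(x))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 50)

def noisy_loop(sign):
    """A noisy unit circle traversed counterclockwise (+1) or clockwise (-1)."""
    pts = np.stack([np.cos(sign * t), np.sin(sign * t)], axis=1)
    return pts + 0.05 * rng.standard_normal(pts.shape)

# Counterclockwise loops give Levy area near +pi, clockwise near -pi:
# one signature feature already separates the two classes.
areas_ccw = [levy_area(noisy_loop(+1)) for _ in range(20)]
areas_cw = [levy_area(noisy_loop(-1)) for _ in range(20)]
```

In the session itself, participants will use dedicated signature packages rather than hand-rolled numpy, but the underlying features are the same.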

The rest of the workshop will consist of technical talks and contributed talks from participants. There will be both organized discussions as well as opportunities for informal interaction.

To be considered for the practical portions of the workshop please apply by June 20, 2021.

VIRTUAL ONLY: Applications of Rough Paths: Computational Signatures and Data Science

Confirmed Speakers & Participants

Talks will be presented virtually or in-person as indicated in the schedule below.

  • Elie Alhajjar
    US Military Academy
  • Sergio Almada Monter
    Morgan Stanley
  • Nicholas B
  • Fabrice Baudoin
    University of Connecticut
  • Daniel Bedau
    Western Digital
  • Ryne Beeson
    CU Aerospace LLC
  • Jose Blanchet
    Columbia University and Stanford University
  • Linus Bleistein
    Inria Paris
  • Robert Boissy
    SeqStream PBC
  • Michael Brenner
    Harvard University
  • Luis Carvalho
    Boston University
  • Thomas Cass
    Imperial College London
  • Ricky Tian Qi Chen
    University of Toronto
  • Danilo de Barros
  • Joscha Diehl
    University of Greifswald
  • Jing Dong
    Columbia Business School
  • Bruce Driver
    University of California, San Diego
  • Weinan E
    Princeton University
  • Kurusch Ebrahimi-Fard
Norwegian University of Science and Technology (NTNU)
  • Adeline Fermanian
  • Emilio Ferrucci
Imperial College London
  • James Foster
    University of Oxford
  • Peter Foster
    Alan Turing Institute
  • Mazyar Ghani Varzaneh
    TU Berlin
  • Jonathan H
  • Wesley Hamilton
    University of Utah
  • Boumediene Hamzi
  • Darryl Holm
    Imperial College London
  • Blanka Horvath
    King's College London
  • Songyan Hou
  • Kevin Hu
    Brown University
  • Xinru Hua
  • Tomoyuki Ichiba
    University of California Santa Barbara
  • Drago Indjic
  • Arman Khaledian
    Bank of America
  • Patrick Kidger
    University of Oxford
  • Tom L
    Alan Turing Institute
  • Jeroen Lamb
    Imperial College London
  • Adrien Laurent
    University of Geneva
  • Darrick Lee
    University of Pennsylvania
  • Maud Lemercier
    University of Warwick
  • Jose Henry Leon-Janampa
    JHL Quantitative Analysis Ltd.
  • Terry Lyons
    University of Oxford
  • Firdous Mala
    GDC Sopore, Bla
  • Thomas Mellan
    Imperial College London
  • Remy Messadene
    Imperial College London
  • Thibault Meyers
    ENSTA Paris
  • Ming Min
    University of California, Santa Barbara
  • Swapnil Mishra
    Imperial College London
  • Eduardo Mojica-Nava
    Universidad Nacional de Colombia
  • Sam Morley
    University of Oxford
  • James Morrill
    University of Oxford
  • Hao Ni
    University College London and Alan Turing Institute
  • Harald Oberhauser
    University of Oxford
  • Yanni Papandreou
    Imperial College London
  • Anastasia Papavasileiou
University of Warwick
  • Zhijun (George) Qiao
    University of Texas Rio Grande Valley
  • Jeremy Reizenstein
  • Marwan Sakran
University of Greifswald
  • Cristopher Salvi
    University of Oxford
  • Guillermo Sapiro
    Duke University
  • Kevin Schlegel
    The Alan Turing Institute
  • Leonard Schmitz
    University of Greifswald
  • Anna Seigal
    University of Oxford
  • Yeonjong Shin
    Brown University
  • Farrokh Shirjian
    Tarbiat Modares University
  • Nikolas Tapia
    Weierstrass Institute
  • Csaba Toth
    University of Oxford
  • Trang Tran
  • William Turner
    Imperial College London
  • Mihaela van der Schaar
    University of Cambridge
  • Roberto Velho
    Federal University of Rio Grande do Sul
  • Bo Wang
    Massachusetts General Hospital
  • Niklas Weber
    LMU Munich
  • Stephan Wojtowytsch
    Princeton University
  • Yue Wu
    University of Oxford
  • George Wynne
    Imperial College London
  • Xinyi Xiang
    Google Developers Group
  • Wei Xiong
    University of Oxford
  • Masanao Yajima
Boston University
  • Weixin Yang
    The Alan Turing Institute
  • Haiyan Yu
    Penn State University
  • Vasilis Zafiris
    University of Houston-Downtown
  • Xin (Cindy) Zhang
    South China University of Technology

Workshop Schedule

Tuesday, July 6, 2021
  • 8:50 - 9:00 am EDT
    Welcome
    • Kavita Ramanan, Brown University
  • 9:00 - 10:00 am EDT
    Tutorial: Mathematical foundations of the signature
    • Terry Lyons, University of Oxford
    • Harald Oberhauser, University of Oxford
  • 10:05 - 10:30 am EDT
    Tutorial: Signatures in the Wild
    • Peter Foster, Alan Turing Institute
  • 10:35 - 11:00 am EDT
    ML Without Unnecessary Harm: Blind Pareto Fairness and Subgroup Robustness
    • Guillermo Sapiro, Duke University
    With the wide adoption of machine learning algorithms across various application domains, there is a growing interest in the fairness properties of such algorithms. The vast majority of the activity in the field of group fairness addresses disparities between predefined groups based on protected features such as gender, age, and race, which need to be available at train, and often also at test, time. These approaches are static and retrospective, since algorithms designed to protect groups identified a priori cannot anticipate and protect the needs of different at-risk groups in the future. In this work we analyze the space of solutions for worst-case fairness beyond demographics, and propose Blind Pareto Fairness (BPF), a method that leverages no-regret dynamics to recover a fair minimax classifier that reduces worst-case risk of any potential subgroup of sufficient size, and guarantees that the remaining population receives the best possible level of service. BPF addresses fairness beyond demographics, that is, it does not rely on predefined notions of at-risk groups, neither at train nor at test time. Our experimental results show that the proposed framework improves worst-case risk in multiple standard datasets, while simultaneously providing better levels of service for the remaining population, in comparison to competing methods.
  • 11:05 - 11:30 am EDT
    Coffee Break
  • 11:30 am - 12:00 pm EDT
    Tutorial: Signatures in the Wild
    • James Morrill, University of Oxford
  • 12:00 - 12:30 pm EDT
    Coffee Break
  • 12:30 - 1:00 pm EDT
    Infancy Longitudinal Structural MRI Data Analysis with Path Signature Features for the Cognitive Scores Prediction
    • Xin (Cindy) Zhang, South China University of Technology
The path signature has unique advantages in extracting high-order differential features of sequential data. Our team has been studying path signature theory and actively applying it in various applications, including infant cognitive score prediction, human motion recognition, handwritten character recognition, handwritten text line recognition, and writer identification. In this talk, I will share our most recent work on infant cognitive score prediction using learnable path signature features and simple deep learning models. The cognitive score can reveal an individual's intelligence, motor, and language abilities. Recent research has discovered that cognitive ability is closely related to an individual's cortical structure and its development. We have proposed two frameworks to predict the cognitive score with different path signature features. In the first framework, we construct the temporal path signature along the age dimension and extract signature features from longitudinal structural MRI data. By incorporating the cortical temporal path signature into a multi-stream deep learning model, the individual cognitive score can be predicted, even in the presence of missing data. In the second framework, we propose a learnable path signature algorithm to compute developmental features and obtain a brain region-wise development graph for the first two years of infancy; we then employ a graph convolutional network for score prediction. These two frameworks have been tested on two in-house cognitive data sets and reach state-of-the-art results.
  • 1:00 - 2:30 pm EDT
    Lunch/Free Time
  • 2:30 - 3:30 pm EDT
Practical Session 1: Computing some examples
    • Peter Foster, Alan Turing Institute
    • Sam Morley, University of Oxford
Wednesday, July 7, 2021
  • 9:00 - 9:30 am EDT
    Tutorial: Log-signatures and Neural Rough Differential Equations
    • James Foster, University of Oxford
  • 9:35 - 10:05 am EDT
    Neural Stochastic Differential Equations
    • Patrick Kidger, University of Oxford
  • 10:10 - 10:40 am EDT
    Tutorial: Kernels and Signatures
    • Cristopher Salvi, University of Oxford
  • 10:40 - 11:10 am EDT
    Coffee Break
  • 11:10 - 11:35 am EDT
    Distribution Regression for Sequential Data
    • Maud Lemercier, University of Warwick
Distribution regression on sequential data describes the task of learning a function from a group of time series to a single scalar target. I will present a generic framework, based on the expected signature, which makes it possible to compactly summarise a cloud of time series and make decisions based on it. I will then demonstrate empirically how this framework achieves state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
  • 11:40 am - 3:30 pm EDT
    Lunch/Free Time
  • 3:30 - 5:00 pm EDT
    Practical Session: Computing some examples
    • Peter Foster, Alan Turing Institute
    • Sam Morley, University of Oxford
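Maud Lemercier's distribution-regression talk above is built on the expected signature: the mean of the path signatures over a cloud of time series. As a toy illustration (my own sketch with a depth-2 truncation and synthetic Brownian-type clouds, not the speaker's method), the expected signature gives a fixed-length vector that distinguishes two groups of paths with identical mean increments but different fine structure:

```python
import numpy as np

def sig2_features(path):
    """Flattened depth-2 signature of one path, built via Chen's identity."""
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + np.outer(dx, dx) / 2.0
        S1 += dx
    return np.concatenate([S1, S2.ravel()])

def expected_sig(group):
    """Expected signature: mean of signatures over a cloud of paths --
    a single fixed-length summary of the whole group."""
    return np.mean([sig2_features(p) for p in group], axis=0)

rng = np.random.default_rng(1)

def brownian_cloud(sigma, n_paths=200, n_steps=100):
    """Random walks with volatility sigma, started at the origin."""
    steps = sigma * rng.standard_normal((n_paths, n_steps, 2)) / np.sqrt(n_steps)
    paths = np.cumsum(steps, axis=1)
    return np.concatenate([np.zeros((n_paths, 1, 2)), paths], axis=1)

# Both clouds have mean increment ~0, so level 1 cannot tell them apart,
# but the diagonal level-2 terms grow like sigma^2 / 2 and separate them.
phi_low = expected_sig(brownian_cloud(0.5))
phi_high = expected_sig(brownian_cloud(1.5))
```

A linear (or kernel) regression on such expected-signature vectors is the shape of the framework described in the talk.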
Thursday, July 8, 2021
  • 9:00 - 9:30 am EDT
    Tutorial: Generative models and Signature-Based Machine Learning Models
    • Hao Ni, University College London and Alan Turing Institute
  • 9:35 - 10:05 am EDT
    Tutorial: Path recovery from signature feature representation
    • Weixin Yang, The Alan Turing Institute
Recovering the underlying paths from a given signature representation can not only give confidence in using signatures as features in machine learning tasks but also be useful for data augmentation or key-frame extraction. The "Signatory" Python package allows the signature transform to act as a layer in a trainable neural network. Inspired by this, we recover different types of underlying paths from a given signature-based representation. By visualizing the results, we discuss the effects of different hyperparameters on our signature feature set.
  • 10:10 - 10:40 am EDT
    Tutorial: Gaussian Processes
    • Csaba Toth, University of Oxford
  • 10:45 - 11:15 am EDT
    Coffee Break
  • 11:15 - 11:45 am EDT
    Framing RNN as a kernel method: A neural ODE approach
    • Adeline Fermanian, LPSM
Building on the interpretation of a recurrent neural network (RNN) as a continuous-time neural differential equation, we show, under appropriate conditions, that the solution of an RNN can be viewed as a linear function of a specific feature set of the input sequence, known as the signature. This connection allows us to frame an RNN as a kernel method in a suitable reproducing kernel Hilbert space. As a consequence, we obtain theoretical guarantees on generalization and stability for a large class of recurrent networks. Our results are illustrated on simulated datasets.
  • 11:45 am - 12:10 pm EDT
    Signature kernels and expected signatures
    • Thomas Cass, Imperial College London
  • 12:10 - 1:30 pm EDT
    Lunch/Free Time
  • 1:40 - 2:05 pm EDT
    Machine Learning for PDEs
    • Michael Brenner, Harvard University
I will discuss methods for using machine learning to speed up solutions of nonlinear partial differential equations, focusing on learning discretizations for coarse graining the numerical solutions of PDEs. I will start with examples in 1D, and then move on to the Navier-Stokes equation.
  • 2:05 - 2:30 pm EDT
Data-Driven Market Simulators: Some Simple Applications of Signature Kernel Methods in Mathematical Finance
    • Blanka Horvath, King's College London
Techniques that address sequential data have been a central theme in machine learning research in recent years. More recently, such considerations have entered the field of finance-related ML applications in several areas where we face inherently path-dependent problems: from (deep) pricing and hedging (of path-dependent options) to generative modeling of synthetic market data, which we refer to as market generation.
We revisit Deep Hedging from the perspective of the role of the data streams used for training and highlight how this perspective motivates the use of highly accurate generative models for synthetic data generation. From this, we draw conclusions regarding the implications for risk management and model governance of these applications, in contrast to risk management in classical quantitative finance approaches.
Indeed, financial ML applications and their risk management heavily rely on a solid means of measuring and efficiently computing (similarity-)metrics between datasets consisting of sample paths of stochastic processes. Stochastic processes are at their core random variables with values on path space. However, while the distance between two (finite-dimensional) distributions was historically well understood, the extension of this notion to the level of stochastic processes remained a challenge until recently. We discuss the effect of different choices of such metrics while revisiting some topics that are central to ML-augmented quantitative finance applications (such as the synthetic generation and the evaluation of similarity of data streams) from a regulatory (and model governance) perspective. Finally, we discuss the effect of considering refined metrics which respect and preserve the information structure (the filtration) of the market, and the implications and relevance of such metrics for financial results.
  • 3:35 - 4:15 pm EDT
    Coffee Break
  • 4:15 - 4:45 pm EDT
    Tutorial: Action recognition from landmark data
    • Kevin Schlegel, The Alan Turing Institute
  • 4:50 - 5:20 pm EDT
    Tutorial: Two transforms for signature features
    • Yue Wu, University of Oxford
In this tutorial, I will introduce two transforms that work with signature transforms: one designed to handle missing data, and the other designed to embed the effect of the absolute position of the data stream into signature features in a unified and efficient way.
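The specific transforms of the tutorial are not reproduced here, but two standard augmentations from the signature literature address the same pair of issues and give the flavour: time augmentation with forward filling exposes irregular or missing observations to the signature, and a basepoint augmentation restores the absolute position that signatures, being translation invariant, would otherwise discard. An illustrative numpy sketch:

```python
import numpy as np

def time_augment(times, values):
    """Forward-fill missing observations (NaNs) and adjoin the observation
    times as an extra channel, so the signature sees the sampling pattern."""
    v = np.array(values, dtype=float)
    for j in range(v.shape[1]):
        for i in range(1, v.shape[0]):
            if np.isnan(v[i, j]):
                v[i, j] = v[i - 1, j]      # carry the last seen value forward
    return np.column_stack([times, v])

def basepoint_augment(path):
    """Prepend the origin: signatures are translation invariant, so this
    embeds the stream's absolute starting position into the features."""
    return np.vstack([np.zeros((1, path.shape[1])), path])

# A 1D stream observed at irregular times with a missing value:
stream = time_augment([0.0, 1.0, 3.0], [[1.0], [np.nan], [2.0]])
# stream == [[0., 1.], [1., 1.], [3., 2.]]; applying basepoint_augment next
# makes two streams differing by a constant shift distinguishable.
```

Either augmented stream can then be fed to any of the signature computations from earlier in the week.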
Friday, July 9, 2021
  • 9:00 - 9:25 am EDT
    Signatures, tensor decompositions, and nonlinear algebra
    • Anna Seigal, University of Oxford
    I will begin by discussing tensor rank, and the equations that arise when decomposing tensors into rank one terms. I will then consider decompositions of signature tensors, and their systems of equations. Along the way, I will mention joint work with Max Pfeffer and Bernd Sturmfels, and with Terry Lyons and Cris Salvi.
  • 9:30 - 9:55 am EDT
Epsilon-Strong Simulation of Stochastic Differential Equations Driven by Lévy Processes
    • Jing Dong, Columbia Business School
Consider a stochastic differential equation dY(t) = f(X(t)) dX(t), where X(t) is a pure-jump Lévy process with finite p-variation, 1 ≤ p < 2, and f is alpha-Lipschitz for some alpha > p. Following the geometric solution construction of Lévy-driven stochastic differential equations, we develop a class of epsilon-strong simulation algorithms that allows us to construct a probability space, supporting both Y and a fully simulatable process Y_epsilon, such that Y_epsilon is within epsilon distance from Y under the Skorokhod J1 topology on compact time intervals with probability 1. Moreover, the user can adaptively refine the accuracy levels. This tolerance-enforcement feature allows us to easily combine our algorithm with the multilevel Monte Carlo method for efficient estimation of expectations, with a straightforward analysis of the rate of convergence as an added benefit.
  • 10:05 - 10:30 am EDT
    Stochastic gradient descent for noise with ML-type scaling
    • Stephan Wojtowytsch, Princeton University
There are two types of convergence results for stochastic gradient descent: (1) SGD finds minimizers of convex objective functions and (2) SGD finds critical points of smooth objective functions. We show that, if the objective landscape and noise possess certain properties which are reminiscent of deep learning problems, then we can obtain global convergence guarantees of the first type under assumptions of the second type for a fixed (small, but positive) learning rate. The convergence is exponential, but with a large random coefficient. If the learning rate exceeds a certain threshold, we discuss minimum selection by studying the invariant distribution of a continuous-time SGD model. We show that at a critical threshold, SGD prefers minimizers where the objective function is 'flat' in a precise sense.
  • 10:35 am - 12:30 pm EDT
    Lunch/Free Time
  • 12:30 - 1:15 pm EDT
    Discussion of Challenges
  • 1:15 - 1:25 pm EDT
    Coffee Break
  • 1:25 - 1:40 pm EDT
    Open Session
  • 1:40 - 2:35 pm EDT
    Coffee Break
  • 2:35 - 3:00 pm EDT
    Controlled Rough Paths Revisited
    • Bruce Driver, University of California, San Diego
In this talk, I will discuss some of the details needed for the theory of controlled rough paths in the infinite-dimensional Banach space setting. A key point will be that one may use truncated signatures of smooth curves as "generating functions" of the algebraic identities needed to make the theory work. This is a report on work in progress.
  • 3:00 - 3:25 pm EDT
    Exact Sampling of Stochastic Differential Equations
    • Jose Blanchet, Columbia University and Stanford University
We consider the problem of generating exact samples, at finitely many locations, from the solution of a generic multidimensional stochastic differential equation (SDE) driven by Brownian motion. If the SDE can be transformed into one with a constant diffusion coefficient and gradient drift, exact samples can be obtained by sequential acceptance/rejection. In this talk, we will explain how to use the theory of rough paths to obtain such exact samples in the general case. This is the first generic algorithm for exact sampling of generic multivariate diffusions.
  • 3:30 - 4:00 pm EDT
    Closing Remarks
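A small, checkable fact behind Anna Seigal's talk on signatures and tensor decompositions (an illustrative sketch of my own, not the speaker's material): for a straight-line path, every signature level is a symmetric rank-one tensor, v^(x)k / k! for increment v, so low-rank structure in signature tensors reflects geometric simplicity of the path:

```python
import numpy as np

v = np.array([2.0, -1.0, 3.0])          # increment of a one-segment path
# Depth-2 signature of the straight line t -> t * v on [0, 1]:
# S2[i, j] = integral of (t * v_i) v_j dt = v_i * v_j / 2, a rank-one matrix.
S2_line = np.outer(v, v) / 2.0
assert np.linalg.matrix_rank(S2_line) == 1

# Add a genuine corner and rank one is lost: by Chen's identity, the
# two-segment path 0 -> a -> a + b has depth-2 tensor
#   a (x) a / 2  +  a (x) b  +  b (x) b / 2,
# whose antisymmetric (Levy-area) part raises the rank.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
S2_corner = np.outer(a, a) / 2 + np.outer(a, b) + np.outer(b, b) / 2
assert np.linalg.matrix_rank(S2_corner) == 2
```

Equations characterizing which tensors arise as signature tensors of paths of a given complexity are exactly the systems of equations discussed in the talk.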

All event times are listed in ICERM local time in Providence, RI (Eastern Daylight Time / UTC-4).
