Abstract

WiSDM 2019 is a research collaboration workshop targeted toward women working in data science and mathematics. This program will bring together women at all stages of their careers, from graduate students to senior researchers, to collaborate on problems in data science.

Data science is typically characterized as work at the intersection of mathematics, computer science, statistics, and an application domain. The scientific focus will be on cutting-edge problems in network analysis for gene detection, group dynamics, graph clustering, novel statistical and topological learning algorithms, tensor product decompositions, reconciliation of assurance of anonymity and privacy with utility measures for data transfer and analytics, as well as efficient and accurate completion, inference and fusion methods for large data and correlations.

Applications are now open. Applicants should rank their top 3 choices of projects in their personal statement. Project descriptions can be found below.

Application deadline extended to March 31, 2019.

Image Credits: Ma, Needell et al.
Group Leads
  • Andrea Bertozzi
    UCLA
  • Carlotta Domeniconi
    George Mason University
  • Giseon Heo
    University of Alberta
  • Misha Kilmer
    Tufts University
  • Deanna Needell
    UCLA
  • Umut Özbek
    Icahn School of Medicine at Mount Sinai
  • Emina Soljanin
    Rutgers University


Application Information

ICERM welcomes applications from faculty, postdocs, graduate students, industry scientists, and other researchers who wish to participate. Some funding may be available for travel and lodging. Graduate students who apply must have their advisor submit a statement of support in order to be considered.

Applications are not currently open. Please check back at a later date.

Your Visit to ICERM

ICERM Facilities
ICERM is located on the 10th & 11th floors of 121 South Main Street in Providence, Rhode Island. ICERM's business hours are 8:00am - 4:00pm during this event. See our facilities page for more info about ICERM and Brown's available facilities.
Traveling to ICERM
ICERM is located at Brown University in Providence, Rhode Island. Providence's T.F. Green Airport (15 minutes south) and Boston's Logan Airport (1 hour north) are the closest airports. Providence is also on Amtrak's Northeast Corridor. In-depth directions and transportation information are available on our travel page.
Lodging
To secure ICERM's preferred hotel rate at the Hampton Inn & Suites Providence Downtown, use this link. ICERM regularly works with two additional area hotels for short visits. The Providence Biltmore and Hilton Garden Inn both have discounted rates available. Contact housing@icerm.brown.edu before booking outside of the preferred rate or if you would like to book alternate accommodations.
The only way ICERM participants should book a room is through the hotel reservation links located on this page or through links emailed to them from an ICERM email address (first_last@icerm.brown.edu). ICERM never works with any conference booking vendors and never collects credit card information.
Childcare/Schools
Those traveling with family who are interested in information about childcare and/or schools should contact housing@icerm.brown.edu.
Technology Resources
Wireless internet access ("Brown-Guest") and wireless printing are available for all ICERM visitors. Eduroam is available for members of participating institutions. Thin clients in all offices and common areas provide open access to a web browser, SSH terminal, and printing capability. See our Technology Resources page for setup instructions and to learn about all available technology.
Discrimination and Harassment Policy
ICERM is committed to creating a safe, professional, and welcoming environment that benefits from the diversity and experiences of all its participants. The Brown University "Discrimination and Workplace Harassment Policy" applies to all ICERM participants and staff. Participants with concerns or requests for assistance on a discrimination or harassment issue should contact the ICERM Director, who is the responsible employee at ICERM under this policy.
Exploring Providence
Providence's world-renowned culinary scene provides ample options for lunch and dinner. Neighborhoods near campus, including College Hill Historic District, have many local attractions. Check out the map on our Explore Providence page to see what's near ICERM.

Visa Information

Contact visa@icerm.brown.edu for assistance.

Reimbursable
B-1 or Visa Waiver Business (WB)
Not Reimbursable
B-2 or Visa Waiver Tourist (WT)
Already in the US?

F-1 and J-1 not sponsored by ICERM: need to obtain a letter approving reimbursement from the International Office of your home institution PRIOR to travel.

H-1B holders do not need letter of approval.

All other visas: alert ICERM staff immediately about your situation.

Financial Support

Acceptable Costs
  • 1 roundtrip between your home institution and ICERM
  • Flights on U.S. or E.U. airlines – economy class to either Providence airport (PVD) or Boston airport (BOS)
  • Ground Transportation to and from airports and ICERM.
Unacceptable Costs
  • Flights on non-U.S. or non-E.U. airlines
  • Flights on U.K. airlines
  • Seats in economy plus, business class, or first class
  • Change ticket fees of any kind
  • Multi-use bus passes
  • Meals or incidentals
Advance Approval Required
  • Personal car travel to ICERM from outside New England
  • Multiple-destination plane ticket; does not include layovers to reach ICERM
  • Arriving at ICERM more than a day before the program or departing more than a day after it
  • Multiple trips to ICERM
  • Rental car to/from ICERM
  • Flights on Swiss, Japanese, or Australian airlines
  • Arriving at or departing from an airport other than PVD/BOS or home institution's local airport
  • 2 one-way plane tickets to create a roundtrip (often purchased from Expedia, Orbitz, etc.)
Reimbursement Request Form

https://icerm.brown.edu/money/

Refer to the back of your ID badge for more information. Checklists are available at the front desk.

Reimbursement Tips
  • Scanned original receipts are required for all expenses
  • Airfare receipt must show full itinerary and payment
  • ICERM does not offer per diem or meal reimbursement
  • Allowable mileage is reimbursed at the prevailing IRS Business Rate; document the trip with a PDF of the Google Maps result
  • Keep all documentation until you receive your reimbursement!
Reimbursement Timing

Reimbursement takes 6 - 8 weeks after all documentation is sent to ICERM. All reimbursement requests are reviewed by several central offices at Brown, which may request additional documentation.

Reimbursement Deadline

Submissions must be received within 30 days of ICERM departure to avoid applicable taxes; later submissions will incur them. No submissions are accepted more than six months after the program ends.

Project Descriptions

Project 1: Graph regularization of high dimensional data

Leadership: Andrea Bertozzi (UCLA), Yifei Lou (UT Dallas)

Many mathematical models exist for processing signals and images defined on a regular domain. For irregular or unsorted data, graph modeling often provides a flexible representation that captures the underlying structure. However, some key notions in image processing, such as translation, convolution, and dilation, are not straightforward on graphs. This project aims to develop a graph-regularized framework for data analysis, to address key theoretical and computational challenges in graph representation, and to demonstrate its capacity in various applications. More specifically, given a graph, the graph Fourier transform is defined in terms of the eigenvectors of the graph Laplacian. As a result, the aforementioned image processing operators can be carried out in the graph frequency domain. This approach offers a possible way to process data on the graph, but computational efficiency remains an open question. The project will be supplemented with prototypical applications in data science, such as social networks, electric power grids, and hyperspectral imaging.
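To make the graph Fourier transform concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's method. It assumes an unweighted path graph, chosen because its Laplacian eigenvectors are known in closed form (the DCT-II basis), so no numerical eigensolver is needed; for a general graph the eigenvectors would be computed numerically. All function names and the sample signal are hypothetical.

```python
import math

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph on n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L

def path_eigenpairs(n):
    """Closed-form orthonormal eigenpairs of the path Laplacian (DCT-II basis).

    Assumption: this shortcut is specific to the path graph.
    """
    eigvals, eigvecs = [], []
    for k in range(n):
        vec = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        eigvecs.append([x / norm for x in vec])
        eigvals.append(2.0 - 2.0 * math.cos(math.pi * k / n))
    return eigvals, eigvecs

def gft(signal, eigvecs):
    """Graph Fourier transform: coefficient k is the inner product <u_k, signal>."""
    return [sum(u_i * s_i for u_i, s_i in zip(u, signal)) for u in eigvecs]

def inverse_gft(coeffs, eigvecs):
    """Rebuild the node-domain signal as a weighted sum of eigenvectors."""
    n = len(eigvecs[0])
    return [sum(c * u[i] for c, u in zip(coeffs, eigvecs)) for i in range(n)]

def low_pass(signal, eigvecs, keep):
    """Keep only the `keep` lowest graph frequencies (smallest eigenvalues)."""
    coeffs = gft(signal, eigvecs)
    coeffs = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return inverse_gft(coeffs, eigvecs)

n = 8
eigvals, eigvecs = path_eigenpairs(n)
noisy = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]  # hypothetical node signal
smooth = low_pass(noisy, eigvecs, keep=3)
print([round(x, 3) for x in smooth])
```

Filtering here is a pointwise operation on the transform coefficients, which is how convolution-like operators carry over to the graph frequency domain. The cost of obtaining the eigenvectors of a large, general graph is what makes computational efficiency a concern.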

Project 2: Tensor Tools for Multiway Data Analysis

Leadership: Misha Kilmer (Tufts University)

Many problems in scientific settings involve operators or data that are inherently multidimensional: consider the storage of digital video data referenced by frame number, color band, and spatial dimensions; data on gene responses to different chemical combinations; and discrete PDE snapshot data indexed by three spatial dimensions and a time dimension. It makes sense to treat multiway (aka tensor) data in its natural format. However, it is in this regime that traditional results from linear algebra break down, necessitating the development of new constructs and algorithms to treat multiway data.

Recent work has shown that with the right tensor tools, processing data in tensor format rather than matrix format can provide additional structural information that allows for better compression and analysis of the data, for example in facial recognition. However, different data sets involve different structural characteristics, and some tensor decompositions are better suited than others to reveal the corresponding latent features. In addition, for large datasets, the computational time for the decompositions must also be a consideration.

In this project, participants will learn about some of the state-of-the-art tensor techniques, from both a mathematical and a computational viewpoint, for compressing and mining multiway data. Particular attention will be given to one such approach using tensor-tensor products, whose associated algebraic framework permits a computationally efficient extension of linear algebraic and data analytic concepts such as PCA, dictionary learning, clustering, and neural nets. In addition to investigating the use of some of these tensor algorithms on real data, we will also consider open questions in the theoretical understanding of the data analysis tools built on this new mathematical framework.
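As an illustration of what a tensor-tensor product can look like, the sketch below implements one common definition, the t-product, in which matrix multiplication is extended to third-order tensors by replacing scalar multiplication with circular convolution of tube fibers. The description above does not say which product the project uses, so this choice is an assumption, and the function names are hypothetical. The direct loops are written for clarity; practical implementations typically apply an FFT along the third mode instead.

```python
def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3), given as nested lists.

    C[i][l] is the sum over j of the circular convolution of tube A[i][j]
    with tube B[j][l].
    """
    n1, n2, n3 = len(A), len(A[0]), len(A[0][0])
    if len(B) != n2 or len(B[0][0]) != n3:
        raise ValueError("inner dimension and tube length must match")
    n4 = len(B[0])
    C = [[[0.0] * n3 for _ in range(n4)] for _ in range(n1)]
    for i in range(n1):
        for l in range(n4):
            for j in range(n2):
                for k in range(n3):
                    # Circular convolution: index (k - t) wraps around modulo n3.
                    C[i][l][k] += sum(
                        A[i][j][t] * B[j][l][(k - t) % n3] for t in range(n3)
                    )
    return C

def identity_tensor(n, n3):
    """Identity for the t-product: first frontal slice is I, the rest are zero."""
    return [
        [[1.0 if (i == j and k == 0) else 0.0 for k in range(n3)] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 2 x 2 x 2 example tensor.
A = [[[1.0, 2.0], [0.0, 1.0]],
     [[3.0, 0.0], [1.0, 1.0]]]
I = identity_tensor(2, 2)
print(t_product(A, I) == A)
```

When the third dimension has length 1, the convolution collapses to ordinary multiplication and the t-product reduces to the usual matrix product, which is one reason familiar matrix concepts extend naturally to this setting.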

Project 3: Inferences on Incomplete and Multi-Modal Data with Applications to Medical Data

Leadership: Deanna Needell (UCLA)

Recent technological and scientific advances have allowed the acquisition of vast amounts of various types of data, including medical and medically related survey data. Such an abundance of information should lead to new scientific understanding of the mechanisms of disease, diagnosis, and treatment. However, the large-scale nature of this data requires novel mathematical techniques in order to effectively extract and analyze the information. This project will address three main existing challenges in analyzing this type of data. Our goals focus on (i) analyzing large-scale but highly incomplete data, (ii) developing computationally efficient methods that still provide very accurate inferential results, and (iii) designing data fusion techniques for analyzing a wide array of data types in one cohesive framework. We will use recently acquired Lyme disease data as a motivating example in the design and testing of our methods.

Project 4: Modeling Spatial and Temporal Dynamics in Networks

Leadership: Carlotta Domeniconi (George Mason University), Sibel Tari (Middle East Technical University)

Humans are social beings who organize and form groups, or communities. A group is defined as a set of nodes that are densely connected relative to the rest of the network. Some groups are short-lived and survive for only a fraction of their members' lives, while others exist over many lifetimes. A group brings together a set of nodes, but these nodes may serve different roles within the group. As an example, the figure shows a subgraph from a Facebook snapshot network. Nodes are colored by their primary role and sized according to their in-degree. Edges are sized according to the number of interactions they represent.

The purpose of this project is to improve our understanding of group dynamics while avoiding having to model the behavior of individuals. Instead, we will consider groups as first-class entities in a network and identify useful features, some of which may be derived from group members, which indicate the current status and predict the future outcome of a group.

Potential approaches to be explored include nonlinear dynamic systems modeling, linear algebra techniques, and embedding techniques via deep learning. Initial studies indicate that some roles may fit a predator-prey temporal dynamic model. We will also consider modeling both temporal dynamics and spatial configurations (e.g. via reaction-diffusion models). Hypergraphs that capture multi-way interactions among individuals may also be considered. We anticipate using networks of collaborators (e.g. Scratch users), social networks (e.g. Facebook), and possibly financial networks for the detection of anomalous trends.

Project 5: Development of a Statistical Topological Learning Algorithm

Leadership: Giseon Heo (University of Alberta), Xu Wang (Laurier University)

The analysis and interpretation of high-dimensional data has become increasingly challenging and requires sophisticated analytic techniques. Thus, it may no longer be effective to independently apply data analysis methods from specific scientific disciplines such as statistics, mathematics, or computing science to solve a complex problem. We aim to develop a novel statistical and topological learning (STL) algorithm for analyzing high-dimensional data, based upon persistent homology from computational topology and geometry, neural networks from deep learning, and classical and advanced methods in statistics and machine learning. The STL algorithm will be applied to chronic diseases in order to help clinicians create the optimal treatment plan for each patient.

Project 6: User Anonymity and Data Privacy

Leadership: Emina Soljanin (Rutgers University)

Simultaneous knowledge extraction by multiple institutions from large volumes of data has to honor the demand for privacy and anonymity from individuals. This project will explore reconciling assurance of anonymity and privacy with the various utility measures in data transfer and data analytics. We have done some preliminary work on anonymity mixes which are, in some form, a building block of many practical anonymity systems.

An anonymity (threshold) Mix is a sophisticated message router that receives and holds packets from message sources and forwards them in a batch to their respective destinations only when it accumulates messages from some prescribed number of sources. Because of such simultaneous transmissions of messages, the identities of communicating pairs remain hidden from possible adversaries that seek to link message sources and destinations. The price of achieving anonymity in this way is delay, because messages are held at the Mix until a batch of a certain size is formed. This talk will describe two promising ideas about how to compute the delay, and present some preliminary results. One idea is to model batch mixes as generalized assembly-like queues and develop an approximate queuing analysis of these objects. The other idea is to model the source/destination channels as urns and messages as balls, and to compute the channels' queue occupancy, and the time it takes to accumulate enough messages to have a departure, as a variant of the coupon collection problem.
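The coupon-collection view of the delay can be illustrated with a small simulation. The sketch below is a deliberately simplified toy model, not the analysis described above: it assumes that exactly one message arrives per time slot, that each arrival comes from a source chosen uniformly at random, and that the Mix fires as soon as it holds messages from a given number of distinct sources. Under those assumptions the expected delay is the partial coupon collector sum n/n + n/(n-1) + ... + n/(n-k+1). All names and parameter values are hypothetical.

```python
import random

def expected_delay(n_sources, threshold):
    """Expected number of arrivals until `threshold` distinct sources are seen.

    Partial coupon collector formula under uniform, independent arrivals.
    """
    if not 1 <= threshold <= n_sources:
        raise ValueError("threshold must be between 1 and n_sources")
    return sum(n_sources / (n_sources - i) for i in range(threshold))

def simulate_delay(n_sources, threshold, rng):
    """Count arrivals until the Mix has heard from `threshold` distinct sources."""
    seen = set()
    arrivals = 0
    while len(seen) < threshold:
        seen.add(rng.randrange(n_sources))  # one message from a random source
        arrivals += 1
    return arrivals

def mean_simulated_delay(n_sources, threshold, trials=20000, seed=0):
    """Average simulated delay over many independent batches."""
    rng = random.Random(seed)
    total = sum(simulate_delay(n_sources, threshold, rng) for _ in range(trials))
    return total / trials

print(expected_delay(10, 5))
print(mean_simulated_delay(10, 5))
```

Raising the threshold toward the total number of sources makes the last few distinct sources increasingly slow to appear, which shows the trade-off between a larger anonymity set and a longer delay.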

Project 7: Comparing/Combining Clustering Techniques for Omics Data Integration

Leadership: Umut Özbek (Icahn School of Medicine at Mount Sinai)

Network analysis, module detection, and pathway enrichment have been widely used to identify candidate genes that can serve as targets for drug development and outcome prediction. Using the comprehensive and multi-dimensional data generated by The Cancer Genome Atlas (TCGA), which is a collaboration between the National Cancer Institute and the National Human Genome Research Institute, participants will be encouraged to build networks using a conditional graphical model and investigate different clustering techniques to select genes that are potentially associated with the disease. Through these exercises, participants will learn a novel statistical technique to construct a network, visualize their network using online tools and statistical software, apply clustering algorithms to create modules, and interpret the results statistically, clinically, and biologically.