In recent years, the availability of huge amounts of data across virtually all fields has ushered in an entirely new way of thinking about and using data. The scientific method --- and classical statistics --- involves formulating a hypothesis, and then testing that (pre-specified) hypothesis on some data. However, as datasets have continued to grow in size, the goal of data generation has increasingly moved away from using data to test a pre-specified hypothesis. Instead, people use data to generate new hypotheses and then test those hypotheses on the same data. Unfortunately, classical statistical methods do not apply when the same data are used for hypothesis generation and hypothesis testing. In this talk, I'll show what can go wrong when people engage in this sort of "double-dipping". I will also present some solutions, using the new statistical framework of selective inference.
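To make the "double-dipping" problem concrete, here is a minimal simulation sketch; it is an illustration, not material from the talk. It assumes a toy setting in which every feature is pure noise, so any rejected hypothesis is a false positive. The hypothesis "this feature has a nonzero mean" is generated by picking the feature with the largest absolute sample mean, and it is then tested either on the same data or on a held-out half. Sample splitting is used here only as a simple stand-in remedy; it is not the selective inference framework the abstract refers to. The function names and parameter values are hypothetical.

```python
import math
import numpy as np


def z_test_pvalue(x):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = math.sqrt(len(x)) * float(np.mean(x))
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_trials=2000, n=100, p=20, alpha=0.05, seed=0):
    """Estimate false-positive rates with and without double-dipping."""
    rng = np.random.default_rng(seed)
    naive_rejections = 0
    split_rejections = 0
    for _ in range(n_trials):
        # Pure noise: every feature has true mean 0, so every null is true.
        data = rng.standard_normal((n, p))

        # Double-dipping: choose the most extreme feature, then test it
        # on the very same observations that made it look extreme.
        j = int(np.argmax(np.abs(data.mean(axis=0))))
        naive_rejections += z_test_pvalue(data[:, j]) < alpha

        # Sample splitting: choose the feature on the first half,
        # then test it on the untouched second half.
        first, second = data[: n // 2], data[n // 2:]
        k = int(np.argmax(np.abs(first.mean(axis=0))))
        split_rejections += z_test_pvalue(second[:, k]) < alpha

    return naive_rejections / n_trials, split_rejections / n_trials


naive_rate, split_rate = simulate()
print(f"double-dipping false-positive rate: {naive_rate:.3f}")
print(f"sample-splitting false-positive rate: {split_rate:.3f}")
```

Because the 20 noise features are independent, the double-dipped test should in theory reject in about 1 - 0.95^20 ≈ 64% of trials even though nothing is there to find, while the split-sample test should stay near the nominal 5% level.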

[Image: A Virtual ICERM Public Lecture: More data, more problems - Double-dipping in statistics]

About the Speaker

Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning.

Daniela is the recipient of an NIH Director's Early Independence Award, a Sloan Research Fellowship, an NSF CAREER Award, a Simons Investigator Award in Mathematical Modeling of Living Systems, a David Byar Award, a Gertrude Cox Scholarship, and an NDSEG Research Fellowship. She is also the recipient of the Spiegelman Award from the American Public Health Association for a statistician under age 40 who has made outstanding contributions to statistics for public health, as well as the Leo Breiman Award for contributions to the field of statistical machine learning. She is a Fellow of the American Statistical Association, and an Elected Member of the International Statistical Institute.

Daniela’s work has been featured in the popular media, including Forbes Magazine (three times), Elle Magazine, KUOW radio (Seattle's local NPR affiliate station), and a NOVA documentary. She has also been featured as a PopTech Science Fellow.

Daniela is a co-author (with Gareth James, Trevor Hastie, and Rob Tibshirani) of the very popular textbook "An Introduction to Statistical Learning". She was a member of the National Academy of Medicine (formerly the Institute of Medicine) committee that released the report "Evolution of Translational Omics".

Daniela completed a BS in Math and Biology with Honors and Distinction at Stanford University in 2005, and a PhD in Statistics at Stanford University in 2010.

Daniela Witten, University of Washington