Towards a new toolbox of optimal statistical primitives
Mathematics of Data & Decisions
Speaker: | Jasper Lee, UC Davis |
Location: | 1025 PDSB |
Start time: | Tue, Oct 1 2024, 3:10PM |
Given society's increasing reliance on data, collecting it and processing it into useful information is a technical problem of growing focus and, perhaps paradoxically, a critical bottleneck in many data science and machine learning applications. My research focuses on designing algorithms that push the limits of both statistical efficiency and computational efficiency. In particular, my work tackles the divide between the theory and practice of data science, which exists even for the most basic statistical problems, including mean and (co)variance estimation. Conventional methods such as the sample mean, while supported by theoretical results under strong assumptions, are often brittle in the presence of extreme data points. To counter such deficiencies, practitioners often use ad hoc and unprincipled "outlier removal" heuristics, revealing a marked gap between theory and practice even for these fundamental problems. In this talk, I will describe my work towards building a new toolbox of optimal statistical primitives, bridging the theory-practice divide. I will specifically highlight three works: A) a statistically optimal and computationally efficient 1-dimensional mean estimator, whose estimation error is optimal even in the leading multiplicative constant, under bare minimum distributional assumptions; B) a rather different but optimal mean estimator for the "very high-dimensional" regime; and C) a recent result showing that the estimator from A) is robust even in the presence of adversarial data corruption.