Statistics and Computation in the Age of Massive Data
Special Events
Speaker: | Michael I. Jordan, University of California, Berkeley |
Location: | 1147 MSB |
Start time: | Wed, Apr 4 2012, 4:10PM |
There are many issues remaining to be addressed, or even formulated, at the interface of statistics and computation. One way to capture the current state of affairs is the following: If we view data as a resource, how can it be that in many practical problems of interest we find ourselves uncomfortable at being given too much data? The issue is both statistical and computational: on a fixed computational budget, we are unable to guarantee that the statistical risk decreases as the number of data points grows (without bound). A general theory not yet being available, I present two initial forays into the problem domain. The first is an exploration of the bootstrap in the regime of very large data sets, where it is computationally infeasible to obtain bootstrap resamples. I present a new procedure, the "bag of little bootstraps," which inherits the favorable theoretical properties of the bootstrap but is also scalable. The second is an exploration of divide-and-conquer strategies for matrix completion. Here the theoretical support is provided by concentration theorems for random matrices, and I present a new approach to this problem based on Stein's method. [Joint work with Ariel Kleiner, Lester Mackey, Purna Sarkar, Ameet Talwalkar, Richard Chen, Brendan Farrell and Joel Tropp.]
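For readers unfamiliar with the bag of little bootstraps, the following is a minimal sketch of the general idea, not the speaker's implementation. It assumes the commonly described form of the procedure: draw a few small subsets of size b = n^gamma without replacement, simulate full-size resamples of each subset through multinomial counts (so only b distinct points are ever touched), and average the per-subset error estimates. The function names, the weighted-mean estimator, and the default values of s, r, and gamma are illustrative assumptions.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, weights):
    """Example estimator on a weighted sample; weights are resample counts."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def blb_standard_error(data, estimator=weighted_mean, s=5, r=30, gamma=0.6, seed=0):
    """Sketch of a bag-of-little-bootstraps standard-error estimate.

    s, r, gamma, and seed are illustrative defaults (assumptions), not
    values taken from the talk.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # small subset size, b = n^gamma
    per_subset = []
    for _ in range(s):
        # Draw a small subset of b points without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(r):
            # Simulate a size-n resample of the subset: n draws over the
            # b points, stored compactly as one count per point.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, weights))
        # Spread of the estimator across resamples of this subset.
        per_subset.append(statistics.stdev(estimates))
    # Average the per-subset assessments.
    return statistics.mean(per_subset)


if __name__ == "__main__":
    demo_rng = random.Random(1)
    sample = [demo_rng.gauss(0.0, 1.0) for _ in range(5000)]
    print(blb_standard_error(sample))
```

Each resample has nominal size n but is represented by b counts, so the estimator's cost scales with the subset size rather than the full data set. The subsets are independent of one another, which is what makes the approach easy to distribute.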