Statistics and Computation in the Age of Massive Data
Special Events
| Speaker: | Michael I. Jordan, University of California, Berkeley |
| Location: | 1147 MSB |
| Start time: | Wed, Apr 4 2012, 4:10PM |
Description
There are many issues remaining to be addressed, or even formulated,
at the interface of statistics and computation. One way to capture
the current state of affairs is the following: If we view data as a
resource, how can it be that in many practical problems of interest
we find ourselves uncomfortable when given too much data? The issue
is both statistical and computational: on a fixed computational budget,
we are unable to guarantee that the statistical risk decreases as the
number of data points grows without bound. A general theory not
yet being available, I present two initial forays into the problem
domain. The first is an exploration of the bootstrap in the regime of
very large data sets, where it is computationally infeasible to obtain
bootstrap resamples. I present a new procedure, the "bag of little
bootstraps," which inherits the favorable theoretical properties of the
bootstrap but is also scalable. The second is an exploration of
divide-and-conquer strategies for matrix completion. Here the
theoretical support is provided by concentration theorems for random
matrices, and I present a new approach to this problem based on
Stein's method. [Joint
work with Ariel Kleiner, Lester Mackey, Purna Sarkar, Ameet Talwalkar,
Richard Chen, Brendan Farrell and Joel Tropp].
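
For readers unfamiliar with the first procedure, the following is a minimal sketch of a bag-of-little-bootstraps style computation, not the speaker's implementation. It assumes the estimator is the sample mean and the quality measure is its standard error. The function name `blb_standard_error` and the parameters `n_subsets`, `n_resamples`, and `gamma` are hypothetical names chosen for illustration. The idea shown: draw several small subsets of size b = n^gamma without replacement, resample each subset up to the full size n through multinomial counts (so no resample of n distinct points is ever stored), and average the per-subset error estimates.

```python
import numpy as np


def blb_standard_error(data, n_subsets=10, n_resamples=50, gamma=0.7, seed=0):
    """Estimate the standard error of the sample mean, BLB style (illustrative)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    b = max(2, int(n ** gamma))  # subset size, much smaller than n

    per_subset_se = []
    for _ in range(n_subsets):
        # Small subset drawn without replacement from the full data set.
        subset = rng.choice(data, size=b, replace=False)
        means = np.empty(n_resamples)
        for k in range(n_resamples):
            # Multinomial counts stand in for a size-n resample of the b points.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            means[k] = counts @ subset / n  # weighted sample mean
        # Spread of the resampled estimates is this subset's error estimate.
        per_subset_se.append(means.std(ddof=1))

    # Average the per-subset assessments to get the final answer.
    return float(np.mean(per_subset_se))
```

Each inner step touches only b points plus a length-b count vector, which is why the cost per resample scales with b rather than n. The choice gamma = 0.7 here is an assumption for the sketch, not a recommendation taken from the talk.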
