Return to Colloquia & Seminar listing
Sampling algorithms and coresets for Lp regression and applications
Applied Math| Speaker: | Michael Mahoney, Yahoo! Inc. |
| Location: | 1147 MSB |
| Start time: | Fri, Apr 6 2007, 12:10PM |
Description
L2 regression, also known as the least squares approximation
problem, and more generally Lp regression, are fundamental problems that
have numerous applications in mathematics and statistical data analysis.
Recall that the over-constrained Lp regression problem takes as input an n x
d matrix A, where n > d, an n-vector b, and a number p, and it returns as
output a d-vector x such that ||Ax-b||_p is minimized. Several randomized
algorithms for this problem, each of which provides a relative-error
approximation to the exact solution, are described. The first algorithm
applies novel ``subspace sampling'' probabilities to randomly sample
constraints and thus construct a coreset for the L2 regression problem. The
second algorithm (of T. Sarlos) applies a random projection and uses a
``fast'' version of the Johnson-Lindenstrauss lemma to obtain an efficient
approximation to the L2 regression problem. The third algorithm generalizes
the concept of a ``well-conditioned'' basis and of ``subspace-preserving
sampling'' to obtain efficiently a coreset for Lp regression, for all
$p\in[1,\infty)$. Applications of these random projection and random
sampling algorithms to internet data analysis will be briefly discussed.
