Return to Colloquia & Seminar listing
Topic Modeling: A Provable Algorithm
Probability| Speaker: | Ravi Kannan, Microsoft Research |
| Location: | 2112 MSB |
| Start time: | Wed, Mar 4 2015, 2:10PM |
Description
Topic Modeling is widely used. The general model inference problem
is hard. Known provable algorithms need some strong assumptions on the
model. After a from-first-principles introduction, the talk will formulate a
model with two natural assumptions: (i) each document has a dominant
topic and (ii) each topic has dominant words. We provably solve the model
fitting problem. The algorithm is based on three natural components: a
thresholding step, Singular Value Decomposition and clustering. The proof
crucially draws upon recent results on the convergence of another widely
used tool: Lloyd’s k-means algorithm. We test both the performance of the
algorithm and the validity of the assumptions on real document corpora with
success. The simple device of thresholding has other uses - we will see an
exponential advantage for certain “planted” problems in terms of the signalto-noise
ratio.
Joint with T. Bansal and C. Bhattacharyya, Indian Institute of Science.
