Seminars :: math.ucdavis.edu

Topic Modeling: A Provable Algorithm

Probability

Speaker:	Ravi Kannan, Microsoft Research
Location:	2112 MSB
Start time:	Wed, Mar 4 2015, 2:10PM

Description

Topic Modeling is widely used. The general model inference problem is hard. Known provable algorithms need some strong assumptions on the model. After a from-first-principles introduction, the talk will formulate a model with two natural assumptions: (i) each document has a dominant topic and (ii) each topic has dominant words. We provably solve the model fitting problem. The algorithm is based on three natural components: a thresholding step, Singular Value Decomposition and clustering. The proof crucially draws upon recent results on the convergence of another widely used tool: Lloyd’s k-means algorithm. We test both the performance of the algorithm and the validity of the assumptions on real document corpora with success. The simple device of thresholding has other uses - we will see an exponential advantage for certain “planted” problems in terms of the signalto-noise ratio. Joint with T. Bansal and C. Bhattacharyya, Indian Institute of Science.

Quick Links+

Mathematics Colloquia and Seminars

Topic Modeling: A Provable Algorithm

Description