Mathematics Colloquia and Seminars

Return to Colloquia & Seminar listing

Hierarchical Clustering for Everyone

Special Events

Speaker: Grigory Yaroslavtsev, Indiana University, Bloomington
Location: 1147 MSB
Start time: Wed, Jan 29 2020, 4:10PM

Compared to the highly successful flat clustering methods (e.g. k-means), despite its important role in data analysis, hierarchical clustering has been lacking in rigorous algorithmic studies until late due to absence of rigorous objectives. Since 2016, a sequence of works has emerged and gave novel algorithms for this problem. This was enabled by a breakthrough by Dasgupta, who introduced a formal optimization objective into the study of hierarchical clustering.

In this talk I will give an overview of our recent progress on models and scalable algorithms for hierarchical clustering applicable to both arbitrary and high-dimensional vector data, including embedding vectors arising from deep learning. I will first discuss various linkage-based algorithms (single-linkage, average-linkage) and their formal properties with respect to various objectives. I will then introduce 1) a new projection-based approximation algorithm for vector data, 2) a new partitioning-based algorithm for arbitrary data. I will also discuss scalable implementations using projected gradient descent and experimental results on large-scale vector embedding datasets from deep learning and other methods. The talk will be self-contained and doesn’t assume prior knowledge of clustering methods.



Meet the speaker at a reception starting 3:45pm in front of 1147 MSB. Refreshments will be served.