Stochastic Algorithms for Large-Scale Machine Learning Problems
Special Events
Speaker: Shiqian Ma, The Chinese Univ. of Hong Kong
Location: 2112 MSB
Start time: Mon, Jan 30 2017, 5:10PM
The stochastic gradient descent (SGD) method and its variants are the main approaches for solving machine learning problems that involve large-scale training datasets. This talk addresses two issues in SGD.

(i) A major issue in SGD is how to choose the step size while running the algorithm. Since the traditional line search technique does not apply to stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size or to tune a fixed step size by hand, which can be time consuming in practice. We propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant, the stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. Numerical results on standard machine learning problems are reported to demonstrate the advantages of our methods.

(ii) Another issue is how to incorporate second-order information into SGD. We propose a stochastic quasi-Newton method for solving nonconvex learning problems; note that existing stochastic quasi-Newton methods can only handle convex problems. Convergence and complexity results of our method are established. Numerical results on classification problems using SVMs and neural networks are reported.
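To illustrate the idea behind a BB-type step size in a stochastic setting, here is a minimal Python sketch (not from the talk). It assumes the BB rule eta = ||s||^2 / (m |s^T y|) is applied once per epoch, with s and y formed from epoch-start iterates and averaged stochastic gradients; the names sgd_bb_sketch and grad_fn, and all default parameters, are hypothetical and may differ from the speaker's actual SGD-BB/SVRG-BB algorithms.

```python
import numpy as np

def sgd_bb_sketch(grad_fn, x0, n_samples, n_epochs=20, m=None, eta0=0.1, seed=None):
    """Toy SGD with a Barzilai-Borwein (BB) step size recomputed once per epoch.

    grad_fn(x, i) should return the stochastic gradient of the i-th sample at x.
    This is an illustrative sketch, not the algorithm presented in the talk.
    """
    rng = np.random.default_rng(seed)
    m = m or n_samples                  # inner-loop length (one pass by default)
    x = x0.copy()
    x_prev, g_prev = None, None
    eta = eta0                          # initial step size, used only for epoch 0

    for epoch in range(n_epochs):
        x_start = x.copy()
        g_avg = np.zeros_like(x)        # running average of stochastic gradients
        for _ in range(m):
            i = rng.integers(n_samples)
            g = grad_fn(x, i)
            g_avg += g / m
            x = x - eta * g             # plain SGD step with the current eta
        if x_prev is not None:
            # BB-type step size: ||s||^2 / (m |s^T y|), built from successive
            # epoch-start iterates and averaged stochastic gradients.
            s = x_start - x_prev
            y = g_avg - g_prev
            denom = abs(s @ y)
            if denom > 1e-12:
                eta = (s @ s) / (m * denom)
        x_prev, g_prev = x_start, g_avg
    return x
```

The point of recomputing the step size from the iterates themselves, as in this sketch, is exactly the motivation stated in the abstract: it avoids both a hand-tuned fixed step size and a prescribed diminishing schedule.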