Learning the learning rate in gradient descent
Special Events
Speaker: Rachel Ward, University of Texas, Austin
Related Webpage: https://www.ma.utexas.edu/users/rachel/
Location: 1147 MSB
Start time: Tue, May 8, 2018, 4:10 PM
Finding a proper learning rate in stochastic optimization is an important problem. A learning rate that is too small leads to painfully slow convergence, while one that is too large can cause the loss to fluctuate around the minimum or even diverge. In practice, the learning rate is often tuned by hand for each problem. Several methods have been proposed recently for adjusting the learning rate automatically, based on the gradient data received along the way. We review these methods and propose a simple one, inspired by a reparametrization of the loss function in polar coordinates. We prove that the proposed method achieves optimal oracle convergence rates in both batch and stochastic settings, without requiring advance knowledge of certain parameters of the loss function.
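To make the idea of adapting the learning rate from gradient data concrete, here is a minimal sketch of one well-known scheme of this kind, an AdaGrad-Norm-style update in which the step size shrinks as squared gradient norms accumulate. This is an illustrative example of the general approach the abstract describes, not the speaker's proposed method; the function names and parameter values are hypothetical.

```python
import numpy as np

def adaptive_gd(grad, x0, eta=1.0, b0=1e-2, n_steps=500):
    """Gradient descent with a step size learned from observed gradients.

    AdaGrad-Norm-style sketch: the denominator b grows with the
    accumulated squared gradient norms, so the effective learning
    rate eta / b decreases automatically, with no hand tuning.
    """
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2                      # running sum of squared gradient norms
    for _ in range(n_steps):
        g = grad(x)
        b2 += np.dot(g, g)            # accumulate gradient "energy"
        x = x - (eta / np.sqrt(b2)) * g
    return x

# Example: minimize the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x_min = adaptive_gd(lambda x: x, x0=np.array([5.0, -3.0]))
print(x_min)  # approaches the minimizer at the origin
```

The appeal of such schemes is that a single default value of eta works across problems with very different gradient scales, since the accumulated gradient norms calibrate the step size on the fly.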
This is our 10th joint Math-Stat Colloquium. A reception with refreshments will begin at 3:30 PM in 1147 MSB.