Neural Networks Learning: The Power of Initialization
PDE and Applied Math Seminar
Speaker: Amit Daniely, Google
Location: 1147 MSB
Start time: Fri, May 13 2016, 4:10PM
Given real numbers w_1,...,w_d, called weights, an (artificial) neuron computes the function g(x_1,...,x_d) = s(w_1*x_1+...+w_d*x_d), where s:R->R is some fixed (usually non-linear) function. A neural network is obtained by connecting many neurons; given weights for each of them, it computes a function f:R^n->R.
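For concreteness, here is a minimal Python sketch of a single neuron and of a small two-layer network of the kind described above. It assumes numpy and uses tanh as a stand-in for the unspecified nonlinearity s; these choices are illustrative, not part of the abstract.

    import numpy as np

    def neuron(w, x, s=np.tanh):
        # g(x_1,...,x_d) = s(w_1*x_1 + ... + w_d*x_d) for a fixed nonlinearity s
        return s(np.dot(w, x))

    def two_layer_network(W, v, x, s=np.tanh):
        # A small network: each row of W is one hidden neuron's weight vector,
        # and an output neuron with weights v combines the hidden outputs.
        hidden = s(W @ x)
        return np.dot(v, hidden)   # f: R^n -> R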
Neural networks are useful for supervised learning, where the goal is to approximate (learn) a function f*:R^n->R from a sample (x_1,f*(x_1)),...,(x_m,f*(x_m)). To this end, neural-network algorithms fix a network, initialize its weights at random, and then locally optimize the weights to fit the sample. Although this procedure optimizes highly non-convex objectives, neural networks have recently enjoyed exceptional success.
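The sketch below illustrates this generic procedure on the toy two-layer network above: all weights are initialized at random and then adjusted by plain gradient descent on a squared-loss objective. The target f*, the learning rate, and the network width are arbitrary illustrative choices, not the specific setup studied in the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, m = 5, 50, 200                      # input dim, hidden width, sample size

    # A sample (x_i, f*(x_i)) from some unknown target f* (a stand-in here)
    X = rng.normal(size=(m, n))
    y = np.sin(X.sum(axis=1))

    # Random initialization of all weights
    W = rng.normal(size=(k, n)) / np.sqrt(n)
    v = rng.normal(size=k) / np.sqrt(k)

    lr = 0.1
    for _ in range(500):                      # local optimization of a non-convex objective
        H = np.tanh(X @ W.T)                  # hidden activations, shape (m, k)
        err = H @ v - y                       # squared-loss residuals
        grad_v = H.T @ err / m
        grad_W = ((err[:, None] * v) * (1 - H**2)).T @ X / m
        v -= lr * grad_v
        W -= lr * grad_W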
We develop a general connection between neural networks and reproducing kernel Hilbert spaces. Concretely, we show that with high probability over the initial choice of the weights, all functions in the corresponding kernel space can be approximated by a simple and convex change of the network's weights. Hence, even though the training objective is non-convex, the initial random network often forms a good starting point for optimization.
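One way to see the flavor of this statement is the random-features view: if the randomly initialized hidden weights are kept fixed, then changing only the output weights is a convex least-squares problem over features induced by the initialization. The sketch below illustrates that intuition under the same toy choices as above; it is not the construction from the talk.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k, m = 5, 500, 200
    X = rng.normal(size=(m, n))
    y = np.sin(X.sum(axis=1))

    # Randomly initialized hidden layer, kept fixed after initialization
    W = rng.normal(size=(k, n)) / np.sqrt(n)
    H = np.tanh(X @ W.T)                      # random features induced by the initialization

    # Adjusting only the output weights is a convex (least-squares) problem
    v, *_ = np.linalg.lstsq(H, y, rcond=None)
    print("training error:", np.mean((H @ v - y) ** 2))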
Joint work with Roy Frostig and Yoram Singer.