CS Machine Learning Seminar: The Neural Covariance SDE
Thursday, September 22, 2022
The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization
Daniel M. Roy and Mufan (Bill) Li
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers.
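As a rough illustration of the first claim, the NumPy sketch below builds a random fully connected network, computes the covariance (Gram) matrix of its penultimate-layer features for two inputs, and compares it with the empirical covariance of logits drawn from fresh Gaussian output weights. The abstract does not state the shaping or the parameterization, so the following are assumptions made only for this sketch: a leaky-ReLU-style activation with slopes 1 + c/sqrt(n), i.i.d. standard normal weights scaled by 1/sqrt(fan-in), and the helper names `shaped_relu`, `penultimate_features`, `covariance_matrix`, and `sample_logits`.

```python
import numpy as np

def shaped_relu(x, n, c_plus=0.0, c_minus=-1.0):
    # Assumed shaping (not taken from the abstract): slopes 1 + c / sqrt(n),
    # so the activation approaches the identity as the width n grows.
    s_plus = 1.0 + c_plus / np.sqrt(n)
    s_minus = 1.0 + c_minus / np.sqrt(n)
    # Rescale so that E[phi(g)^2] = 1 for a standard normal g.
    scale = np.sqrt(2.0 / (s_plus**2 + s_minus**2))
    return scale * (s_plus * np.maximum(x, 0.0) + s_minus * np.minimum(x, 0.0))

def penultimate_features(X, width, depth, rng):
    # X has shape (n_in, m): one column per input.
    n_in = X.shape[0]
    h = rng.standard_normal((width, n_in)) @ X / np.sqrt(n_in)
    for _ in range(depth):
        W = rng.standard_normal((width, width))
        h = W @ shaped_relu(h, width) / np.sqrt(width)
    return shaped_relu(h, width)  # shape (width, m)

def covariance_matrix(phi):
    # m x m Gram matrix of the penultimate features, normalized by width.
    return phi.T @ phi / phi.shape[0]

def sample_logits(phi, n_out, rng):
    # Each row holds the logits of one output unit across the m inputs.
    # Given phi, the rows are i.i.d. Gaussian with covariance covariance_matrix(phi).
    width = phi.shape[0]
    W_out = rng.standard_normal((n_out, width))
    return W_out @ phi / np.sqrt(width)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
phi = penultimate_features(X, width=64, depth=16, rng=rng)
V = covariance_matrix(phi)
Z = sample_logits(phi, n_out=50_000, rng=rng)
V_hat = Z.T @ Z / Z.shape[0]  # empirical logit covariance over output units
print(V)
print(V_hat)
```

Rerunning with a different seed changes `V` itself, which is the randomness the talk is concerned with; for a fixed `V`, the empirical logit covariance `V_hat` should agree with it up to Monte Carlo error.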
To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
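The accumulation of layer-to-layer fluctuations can be probed with a small Monte Carlo experiment. The sketch below is not the paper's simulation and does not implement the Neural Covariance SDE; it only measures, across random initializations, the spread of the log of a single diagonal entry of the covariance matrix. The shaping (negative slope 1 - 1/sqrt(n)), the weight scaling, and the helper name `log_norm_spread` are assumptions made for illustration.

```python
import numpy as np

def log_norm_spread(width, depth, trials=200, seed=0):
    """Std. dev. over random initializations of log(|phi|^2 / width)."""
    rng = np.random.default_rng(seed)
    # Assumed shaping: positive slope 1, negative slope 1 - 1/sqrt(width).
    s_plus, s_minus = 1.0, 1.0 - 1.0 / np.sqrt(width)
    scale = np.sqrt(2.0 / (s_plus**2 + s_minus**2))

    def phi(x):
        return scale * (s_plus * np.maximum(x, 0.0) + s_minus * np.minimum(x, 0.0))

    logs = []
    for _ in range(trials):
        # First-layer pre-activations for a unit-scale input.
        h = rng.standard_normal(width)
        for _ in range(depth):
            W = rng.standard_normal((width, width))
            h = W @ phi(h) / np.sqrt(width)
        f = phi(h)
        logs.append(np.log(f @ f / width))
    return float(np.std(logs))

shallow = log_norm_spread(width=32, depth=1)
deep = log_norm_spread(width=32, depth=32)
print(f"depth 1: {shallow:.3f}, depth 32: {deep:.3f}")
```

In this toy setup, the spread at depth equal to width is expected to be noticeably larger than at depth 1, which is consistent with the abstract's point that per-layer fluctuations accumulate when depth grows along with width.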
Joint work with Mihai Nica, available as a preprint at https://arxiv.org/abs/2206.02768.
Daniel Roy is a Canada CIFAR AI Chair at the Vector Institute and an Associate Professor in the Department of Statistical Sciences at the University of Toronto, with cross appointments in Computer Science and Electrical and Computer Engineering. Roy's research spans machine learning, statistics, mathematical logic, applied probability, and computer science. Prior to joining Toronto, Roy was a Research Fellow of Emmanuel College and Newton International Fellow of the Royal Society and Royal Academy of Engineering. http://danroy.org
Mufan (Bill) Li is a PhD candidate in the Department of Statistical Sciences at the University of Toronto, supervised by Daniel Roy and Murat Erdogdu. Mufan’s research is primarily focused on deep learning theory and, in particular, the study of infinite-depth-and-width limits, as well as on sampling algorithms based on Langevin diffusion. https://mufan-li.github.io/