The Puzzling Success of Overparameterization: Lottery Tickets or Escape Dimensions?

Lotteries and tickets are often used as a didactic analogy to explain the success of overparameterized neural networks: “larger networks succeed because they are more likely to contain a well-initialized subnetwork that can learn the task in isolation, much like buying more tickets increases the chances of winning a lottery.”

This explanation is intuitive but misleading: it suggests that subnetworks can be treated in isolation from the rest of the network. Followed to its conclusion, this reasoning casts learning in wide networks as a multi-start optimization process in which gradient descent simply conducts a parallel search over subnetworks. We argue that this view is flawed: among other reasons, winning tickets can be made to fail by perturbing the rest of the network.


