How do neural networks work? It’s a question that can confuse novices and experts alike. A team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) says that understanding the internal representations these networks form, as well as how those representations shape the way networks learn from data, is crucial for improving the interpretability, efficiency, and generalizability of deep learning models.
With that in mind, the CSAIL researchers have developed a new framework for understanding how representations form in neural networks. Their Canonical Representation Hypothesis (CRH) posits that, during training, neural networks inherently align their latent representations, weights, and neuron gradients within each layer. This alignment implies that neural networks naturally learn compact representations, with their form characterized by the degree and mode of any deviation from the CRH.
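To make the alignment claim more concrete, here is a minimal sketch, not the authors' code and not the paper's exact metric, of how one might probe this kind of layer-wise alignment in PyTorch. It compares the second-moment matrices of one layer's latent representations, weights, and neuron gradients using a simple Frobenius cosine-similarity score; the toy network, data, and alignment measure are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-layer network on random regression data (purely illustrative).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
layer = model[0]                        # the layer whose alignment we probe
x = torch.randn(256, 32)
y = torch.randn(256, 1)

# Manual forward pass so we can keep the pre-activations z and their gradients.
z = x @ layer.weight.T + layer.bias     # pre-activations (256 x 64)
z.retain_grad()                         # keep the "neuron gradients" dL/dz
h = torch.relu(z)                       # latent representation of this layer
loss = nn.functional.mse_loss(model[2](h), y)
loss.backward()

def normed(m: torch.Tensor) -> torch.Tensor:
    """Scale a matrix to unit Frobenius norm so different scales are comparable."""
    return m / m.norm()

def alignment(a: torch.Tensor, b: torch.Tensor) -> float:
    """Frobenius cosine similarity of two unit-norm matrices (1.0 = fully aligned)."""
    return float((a * b).sum())

H = normed(h.detach().T @ h.detach())                        # representation second moment (64 x 64)
W = normed(layer.weight.detach() @ layer.weight.detach().T)  # weight second moment W W^T  (64 x 64)
G = normed(z.grad.T @ z.grad)                                # neuron-gradient second moment (64 x 64)

print("representation-weight alignment:  ", alignment(H, W))
print("representation-gradient alignment:", alignment(H, G))
print("weight-gradient alignment:        ", alignment(W, G))
```

Tracking scores like these over the course of training is one hypothetical way to observe whether a layer drifts toward, or away from, the alignment the CRH describes.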
Senior author Tomaso Poggio says that, by understanding and leveraging this alignment, engineers can potentially design networks that are more efficient and easier to understand. The research is posted to the arXiv preprint server.