The current conventional wisdom on deep neural networks (DNNs) is that, in most cases, simply scaling up a model’s parameters and adopting computationally intensive architectures will result in large performance improvements. Although this scaling strategy has proven successful in research labs, real-world industrial deployments introduce a number of complications, as developers often need to repeatedly train a DNN, transmit it to different devices, and ensure it can perform under various hardware constraints with minimal accuracy loss.
The research community has thus become increasingly interested in reducing such models’ on-device storage footprint while also improving their runtime. Explorations in this area have tended to follow one of two avenues: shrinking model size via compression techniques, or reducing the computational burden via model pruning.
In the new paper LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification, a team from the University of Maryland and Google Research proposes a way to “bridge the gap” between the two approaches with LilNetX, an end-to-end trainable technique for neural networks that jointly optimizes model parameters for accuracy, model size on disk, and computation on any given task.
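To make “jointly optimizes” concrete, the sketch below shows one generic way such a composite objective can be set up in PyTorch: a standard task loss plus an L1 penalty as a rough proxy for compressed size on disk and a group-sparsity penalty as a proxy for computation. The toy model, the penalty choices, and the trade-off coefficients (`lambda_size`, `lambda_compute`) are illustrative assumptions for exposition only, not the paper’s actual formulation.

```python
# Illustrative sketch (not LilNetX's actual method): a single training objective
# that penalizes task error, a proxy for on-disk model size, and a proxy for
# computation, so all three are optimized together end to end.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical trade-off weights balancing accuracy vs. size vs. computation.
lambda_size, lambda_compute = 1e-4, 1e-4

def joint_loss(outputs, targets):
    task_loss = criterion(outputs, targets)
    # L1 penalty pushes individual weights toward zero, which tends to shrink
    # the compressed (entropy-coded) size of the model on disk.
    size_penalty = sum(p.abs().sum() for p in model.parameters())
    # Group penalty (L2 norm per output row) encourages entire neurons to
    # vanish, which is what actually reduces run-time computation.
    compute_penalty = sum(
        p.norm(dim=1).sum() for p in model.parameters() if p.dim() == 2
    )
    return task_loss + lambda_size * size_penalty + lambda_compute * compute_penalty

# One toy optimization step on random data.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = joint_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```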