2BP
2-Stage Backpropagation.
As Deep Neural Networks (DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators.