Jan 18, 2024 Asynchronous Local-SGD Training for Language Modeling