May 3, 2024. Paper page: A Careful Examination of Large Language Model Performance on Grade School Arithmetic. How overfit are popular LLMs on public benchmarks?