Sep 25, 2025

Measuring the performance of our models on real-world tasks

We’re introducing a new evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations.