Hold off on your panic — until AI passes this test

While DeepSeek makes AI cheaper, seemingly without cutting corners on quality, one group is trying to write tests that are hard enough to challenge AI models. It’s called ‘Humanity’s Last Exam.’

If you’re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that AI systems can’t pass.

For years, AI systems were measured by giving new models a variety of standardized benchmark tests. Many of these tests consisted of challenging, SAT-calibre problems in areas like math, science and logic. Comparing the models’ scores over time served as a rough measure of AI progress.
