
AI Agent Benchmark for Real-World Professional Workflows

To solve this “utility problem,” researchers have introduced a rigorous new testing ground called Agents’ Last Exam (ALE). The name carries a dual meaning: it acts as a final graduation exam to prove an AI agent is actually ready for corporate deployment, and it represents the absolute frontier of what today’s technology can handle.

The creators of ALE don’t intend for it to be a static, one-time leaderboard. Designed as a “living benchmark,” its pool of tests will continuously grow as new industries and workflows evolve. Ultimately, the goal of Agents’ Last Exam is to shift the AI industry’s focus away from winning abstract academic trophies and toward creating digital assistants capable of driving genuine, measurable economic growth.


Challenge and measure AI agents on economically valuable and real-world tasks.

Agents’ Last Exam is building the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI and 300+ industry experts, it now spans all 55 targeted sub-industries, covering most major fields of professional work performed on a computer, with 1,500+ tasks collected toward a 5,000-task target. The aim is to keep scores objective, comparable, and meaningful across domains.

AI is incapable of telling the truth

We worry that AI will spread misinformation, but the real problem runs deeper: AI is incapable of telling the truth at all. Philosophers Bun-Sun Kim and Hongjoon Jo draw on Foucault and Heidegger to argue that humans speak truthfully because our finite, mortal existence is at stake in every word we say. AI, lacking a body, anxiety, or a conscience, risks nothing — it just recombines the internet’s idle talk into statistically plausible text, with no self to reveal. Outsourcing our communication to AI doesn’t just degrade information; it traps us in an endless loop of crowd-sourced mimicry, and threatens our capacity for genuine thought.

ChatGPT can answer complex questions and even seem to hold conversations. But can it tell the truth?

In an era where AI can answer virtually any human question, we must examine whether AI language can truly contain truth. Since the Dartmouth Conference of 1956, we’ve witnessed dramatic technological evolution—from the AI Winter of the 1970s and 80s to today’s sophisticated language models like ChatGPT that generate remarkably human-like text. As we increasingly delegate communication to artificial, rather than human, entities, a fundamental question emerges: Can AI’s artificial language capture the essence of truth conveyed by human discourse?

Kyocera develops breakthrough multilayer ceramic core substrate for advanced AI semiconductors

I still think ceramics could be very useful for reducing the need for global mining operations that rely heavily on rare materials, if the same chips can be made from ceramics.


To be shown at ECTC 2026, May 26–29 in Orlando, USA, the new substrate technology delivers superior rigidity and circuit miniaturization for next-gen data centers, AI, and ASIC packaging.

Future AI chips could be built on glass

The idea is to use glass as the substrate, or layer, on which multiple silicon chips are connected. This form of “packaging” is an increasingly popular way to build computing hardware, because it lets engineers combine specialized chips designed for specific functions into a single system. But it presents challenges, including the fact that hardworking chips can run so hot they physically warp the substrate they’re built on. This can lead to misaligned components and may reduce how efficiently the chips can be cooled, leading to damage or premature failure.

“As AI workloads surge and package sizes expand, the industry is confronting very real mechanical constraints that impact the trajectory of high-performance computing,” says Deepak Kulkarni, a senior fellow at the chip design company Advanced Micro Devices (AMD). “One of the most fundamental is warpage.”

That’s where glass comes in. It can handle the added heat better than existing substrates, and it will let engineers keep shrinking chip packages—which will make them faster and more energy efficient. It “unlocks the ability to keep scaling package footprints without hitting a mechanical wall,” says Kulkarni.

Taking Longer Steps in Numerical Simulations

A dynamical system’s constituents often move orders of magnitude more quickly than the collective motion that interests researchers. That disparity in scale frustrates modelers: so many computationally intensive time steps are needed to reach the final state that the computation becomes infeasible. Now Filippo Bigi of the Swiss Federal Institute of Technology in Lausanne (EPFL) and his colleagues have extended and tested an approach that uses a machine-learning model to lengthen the time steps in an atomic-scale simulation by an order of magnitude or more while obeying physical constraints [1]. Their method is general and could be applied to planetary systems, molecular machines, and other dynamical systems.

The EPFL researchers’ starting point was a formulation of classical mechanics that describes the evolution of a system in terms of the positions and momenta of its constituents and an energy term, the Hamiltonian. In general, these and other equations of classical mechanics satisfy fundamental geometric constraints. What’s more, approximate solutions of those equations can be made to satisfy the same constraints. Bigi and his colleagues realized that machine learning could leapfrog over many time steps while also respecting those same geometric constraints.
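To make the idea of a constraint-respecting approximate solution concrete, here is a minimal sketch in Python. It is not the EPFL team's machine-learning method. It assumes the geometric constraint in question is symplecticity (the source does not name it), and it uses a textbook example, a unit-mass harmonic oscillator, with hypothetical function names chosen for illustration. The leapfrog scheme is a classical symplectic integrator, while explicit Euler is not, and the difference shows up as energy drift over long runs.

```python
def hamiltonian(q, p, m=1.0, k=1.0):
    """Energy of a harmonic oscillator: kinetic plus potential."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def leapfrog_step(q, p, dt, m=1.0, k=1.0):
    """One leapfrog (velocity Verlet) step; this update is symplectic."""
    p_half = p - 0.5 * dt * k * q          # half kick from the force -k*q
    q_new = q + dt * p_half / m            # full drift with the updated momentum
    p_new = p_half - 0.5 * dt * k * q_new  # second half kick at the new position
    return q_new, p_new


def euler_step(q, p, dt, m=1.0, k=1.0):
    """One explicit Euler step; this update is not symplectic."""
    return q + dt * p / m, p - dt * k * q


def max_energy_drift(step, dt, n_steps, q0=1.0, p0=0.0):
    """Largest relative energy error seen over n_steps of the given scheme."""
    q, p = q0, p0
    e0 = hamiltonian(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        worst = max(worst, abs(hamiltonian(q, p) - e0) / e0)
    return worst


if __name__ == "__main__":
    dt, n = 0.1, 10_000
    print("leapfrog drift:", max_energy_drift(leapfrog_step, dt, n))
    print("euler drift:   ", max_energy_drift(euler_step, dt, n))
```

With the same step size, the leapfrog energy error stays small and bounded no matter how long the run, whereas the Euler energy grows by a factor of (1 + dt²) on every step. That bounded behavior is the kind of property a learned long-step integrator would need to preserve.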

The researchers tested their approach on several systems, including the three-body problem of celestial dynamics and the transition of germanium telluride to a glassy state. Their simulations reproduced trusted benchmarks but with time steps ten or so times longer. Currently, enforcing the physical constraints undoes most of the computational advantage of the longer time steps. However, the team is optimistic that it can find more computationally efficient implementations.

To discover new physics, AI may need to ‘unlearn’ the old one

A study in the Journal of Cosmology and Astroparticle Physics explores how a machine-learning strategy known as transfer learning could dramatically reduce the computational cost of searching for new physics beyond the standard cosmological model—while also revealing an unexpected risk: Sometimes AI systems can become too reliant on what they already know.

Artificial intelligence is widely used in cosmology to analyze the universe. But testing theories beyond the standard cosmological model, known as ΛCDM, remains extremely computationally demanding.

Although ΛCDM successfully describes many properties of the universe—from its expansion to the distribution of galaxies—physicists know it is probably incomplete. Recent observations hint that phenomena such as massive neutrinos, modified gravity or evolving dark energy could point toward new physics beyond the current model.

Transcending the Brain? AI, Radical Brain Enhancement and the Nature of Consciousness

Human Rights, Ethics, and Artificial Intelligence: Challenges for the next 70 Years of the Universal Declaration.

Susan Schneider, University of Connecticut, Department of Philosophy.

The views expressed in this video are those of the speaker(s) at the time of recording and do not necessarily reflect those of the Carr-Ryan Center for Human Rights or Harvard Kennedy School. These perspectives have been presented to encourage debate on important public policy challenges.

Claude Fable 5 and Claude Mythos 5

While Mythos 5 remains largely unconstrained for restricted government and trusted enterprise partners, Fable 5 is wrapped in a sophisticated safety perimeter. If Fable 5 detects a prompt drifting toward high-risk vectors—like cyberwarfare exploits, advanced biology, or chemical synthesis—it doesn’t just give a generic “I can’t answer that” error. Instead, the query seamlessly falls back to Claude Opus 4.8 (Anthropic’s next-most capable model) to handle the response safely.


Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use.

Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
