Japanese researchers took inspiration from our eyes, creating an artificial retina that could give today’s machine vision systems an upgrade.
Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: low-complexity tasks where standard models surprisingly outperform LRMs, medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths, limitations, and ultimately raising crucial questions about their true reasoning capabilities.
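The abstract does not name the puzzles, but a classic example of an environment whose compositional complexity is controlled by a single parameter is Tower of Hanoi, where the rules stay fixed while the number of disks grows. A minimal sketch of such an environment, in which the full move trace can be checked rather than just the final answer, might look like this (the function names are ours, not the paper's):

```python
# Illustrative sketch of a controllable puzzle environment in the spirit
# the abstract describes. Tower of Hanoi is used as an example: a single
# parameter, n_disks, scales complexity while the logical structure of
# the task stays fixed, and the entire move trace can be validated,
# not just the final answer.

def optimal_moves(n_disks: int) -> int:
    """Minimum number of moves for Tower of Hanoi with n disks."""
    return 2 ** n_disks - 1

def validate_trace(n_disks: int, moves: list[tuple[int, int]]) -> bool:
    """Replay a model's proposed move sequence, checking legality of every
    move plus goal completion. Each move is (source_peg, target_peg)."""
    pegs = [list(range(n_disks, 0, -1)), [], []]  # peg 0 holds all disks
    for src, dst in moves:
        if not pegs[src]:
            return False                      # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                      # larger disk onto smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))  # all disks on peg 2

# Example: the optimal 3-disk solution passes validation.
solution = [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]
assert validate_trace(3, solution)
assert len(solution) == optimal_moves(3)
```

Because the optimal solution length grows as 2^n - 1, the same environment spans trivially easy to extremely hard instances without changing its logical structure, which is what makes the accuracy-collapse measurements possible.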
In this episode, host Hannah Fry is joined by Max Jaderberg and Rebecca Paul of Isomorphic Labs to explore the future of drug discovery in the age of AI. They discuss how new technology, particularly AlphaFold 3, is revolutionizing the field by predicting the structure of life’s molecules, paving the way for faster and more efficient drug discovery.
They dig into the immense complexities of designing new drugs: How do you find the right molecular key for the right biological lock? How can AI help scientists understand disease better and overcome challenges like drug toxicity? And what about the diseases that are currently considered “undruggable”? Finally, they explore the ultimate impact of this technology, from the future of personalized medicine to the ambitious goal of being able to eventually design treatments for all diseases.
Further reading:
AlphaFold 3: https://www.nature.com/articles/s41586-024-07487-w
AlphaFold Server: https://alphafoldserver.com/
Isomorphic Labs: https://www.isomorphiclabs.com/
AlphaFold 3 code and weights: https://github.com/google-deepmind/alphafold3
Timecodes:
00:00 Intro.
02:11 AI & Disease.
05:30 AI in Biology.
06:51 Molecules and Proteins.
12:05 AlphaFold 3.
14:40 Demo.
16:20 Human-AI collaboration.
24:30 Drug Design Challenges.
39:00 Beyond Animal Models.
44:35 AI Drug Future.
46:30 Outro.
IN A NUTSHELL
🔬 Scientists have created a swarm of tiny robots that function as a dynamic, living material.
🧬 Inspired by embryonic development, these robots can change shape, self-heal, and adapt like smart materials.
💡 Equipped with light sensors and magnets, the robots coordinate their movements to transition between solid and liquid states.
Since their discovery, the Dead Sea Scrolls, of immense historical and biblical importance, have transformed our understanding of Jewish and Christian origins. However, while the scrolls as a group date from the third century BCE to the second century CE, individual manuscripts could not, until now, be securely dated.
Now, by combining radiocarbon dating, paleography, and artificial intelligence, an international team of researchers led by the University of Groningen has developed a date-prediction model, called Enoch, that provides much more accurate date estimates for individual manuscripts on empirical grounds.
Using this model, the researchers demonstrate that many Dead Sea Scrolls are older than previously thought. And for the first time, they establish that two biblical scroll fragments come from the time of their presumed biblical authors. They present their results in the journal PLOS ONE.
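The article does not detail Enoch's internals, but the general recipe it describes (regressing calendar dates on handwriting features, anchored by radiocarbon-dated manuscripts) can be sketched as follows. Everything here, from the feature extraction to the random-forest regressor and the synthetic data, is an illustrative assumption rather than the Groningen team's actual pipeline.

```python
# Hypothetical sketch of the general idea behind a date-prediction model
# like Enoch as the article describes it: learn a mapping from
# paleographic (handwriting) features to calendar dates, trained on the
# manuscripts that have radiocarbon anchors. Synthetic stand-in data is
# used so the sketch runs end to end.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_dated, n_features = 60, 12

# Stand-ins: feature vectors extracted from digitized script (e.g.,
# stroke-shape statistics) and radiocarbon-based dates in years
# (negative = BCE, positive = CE).
X_train = rng.normal(size=(n_dated, n_features))
y_train = rng.uniform(-250, 150, size=n_dated)

model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_train, y_train)

# Predict dates for manuscripts that lack radiocarbon anchors.
X_undated = rng.normal(size=(5, n_features))
predicted_dates = model.predict(X_undated)
print(predicted_dates.round())
```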
A trio of AI researchers at KAIST AI, in Korea, has developed what they call a Chain-of-Zoom framework that allows the generation of extreme super-resolution imagery using existing super-resolution models without the need for retraining.
In their study, published on the arXiv preprint server, Bryan Sangwoo Kim, Jeongsol Kim, and Jong Chul Ye broke the zooming process into a series of steps and applied an existing super-resolution model at each step, refining the image incrementally.
The team began by noting that existing frameworks for improving image resolution tend to rely on interpolation or regression when zooming, resulting in blurry imagery. To overcome this, they took a new approach: a stepwise zooming process in which each step improves on the one before.
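To make the idea concrete, here is a minimal sketch of such a crop-then-super-resolve loop, assuming a generic pretrained x4 super-resolution network. The `sr_model` callable and the helper names are hypothetical stand-ins; this is not the authors' Chain-of-Zoom code.

```python
# Minimal sketch of the stepwise idea the article describes: instead of
# asking a single super-resolution pass for an extreme zoom, chain
# several moderate passes, cropping the region of interest and
# re-applying an off-the-shelf SR model each time.
from PIL import Image

def center_crop(img: Image.Image, factor: float) -> Image.Image:
    """Keep the central 1/factor-by-1/factor region of the image."""
    w, h = img.size
    cw, ch = int(w / factor), int(h / factor)
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch))

def chain_of_zoom(img, sr_model, steps, zoom_per_step=4.0):
    """Apply `steps` rounds of crop-then-super-resolve. Each round zooms
    into the center by `zoom_per_step` and restores the original pixel
    count with the SR model, so detail is refined incrementally rather
    than demanded in one extreme jump."""
    for _ in range(steps):
        img = center_crop(img, zoom_per_step)  # zoom into the region of interest
        img = sr_model(img)                    # assumed to upscale back by x4
    return img

# Usage with a trivial bicubic stand-in in place of a real SR model:
def bicubic_x4(im):
    return im.resize((im.width * 4, im.height * 4), Image.Resampling.BICUBIC)

result = chain_of_zoom(Image.new("RGB", (256, 256)), bicubic_x4, steps=3)  # 64x net zoom
```

With a real SR network substituted for the bicubic stand-in, three passes yield a 64x effective zoom while each individual pass only ever works at a modest 4x, the regime such models are typically trained for.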