
Training social robots to interact effectively no longer depends solely on human participants, thanks to a new study from the University of Surrey and the University of Hamburg.

The study, which will be presented at this year’s IEEE International Conference on Robotics and Automation (ICRA), introduces a new simulation method that lets researchers test their social robots without needing human participants, making research faster and more scalable.

Working with a humanoid robot, the research team developed a dynamic scanpath prediction model that anticipates where a person will look in a social setting.
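The article doesn’t include the model’s details, but a dynamic scanpath predictor is often framed this way: given a saliency map of the scene and the person’s recent fixations, score every location and pick the next gaze point. The minimal Python sketch below illustrates only that framing; the function predict_next_fixation, the inhibition-of-return term, and all parameter values are illustrative assumptions, not the Surrey–Hamburg model.

import numpy as np

def predict_next_fixation(saliency, fixation_history, sigma=15.0, ior_strength=0.8):
    """Predict the next gaze point on an H x W saliency map.

    Recent fixations suppress nearby locations (inhibition of return),
    so predicted gaze moves on to new salient regions.
    """
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    weighted = saliency.astype(float).copy()
    for fy, fx in fixation_history:
        dist2 = (ys - fy) ** 2 + (xs - fx) ** 2
        weighted *= 1.0 - ior_strength * np.exp(-dist2 / (2 * sigma**2))
    return np.unravel_index(np.argmax(weighted), weighted.shape)

# Toy usage: a random saliency map and two previous fixations.
rng = np.random.default_rng(0)
saliency = rng.random((120, 160))
print(predict_next_fixation(saliency, [(60, 80), (30, 40)]))

Real models replace the hand-built saliency map with learned features, but the interface is the same: fixation history in, next fixation out.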

As artificial intelligence (AI) tools shake up the scientific workflow, Sam Rodriques dreams of a more systemic transformation. His start-up company, FutureHouse in San Francisco, California, aims to build an ‘AI scientist’ that can command the entire research pipeline, from hypothesis generation to paper production.

Today, his team took a step in that direction, releasing what it calls the first true ‘reasoning model’ specifically designed for scientific tasks. The model, called ether0, is a large language model (LLM) that’s purpose-built for chemistry, which it learnt simply by taking a test of around 500,000 questions. Following instructions in plain English, ether0 can spit out formulae for drug-like molecules that satisfy a range of criteria.
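The article doesn’t show how that “test” is scored, but training of this style typically relies on a programmatic grader: the model proposes a molecule, and a verifier checks whether it satisfies the question’s constraints and returns a reward. Below is a minimal sketch of such a grader using the open-source RDKit toolkit, with a molecular-weight constraint chosen purely for illustration; the reward function is an assumption, not ether0’s actual grader.

from rdkit import Chem
from rdkit.Chem import Descriptors

def reward(smiles, max_weight=500.0):
    """Score a model-proposed molecule: 1.0 if the SMILES string parses
    and meets the prompt's constraint, 0.0 otherwise."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:               # unparsable proposal earns nothing
        return 0.0
    return 1.0 if Descriptors.MolWt(mol) <= max_weight else 0.0

print(reward("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin, ~180 Da -> 1.0
print(reward("not-a-molecule"))          # invalid -> 0.0

Scaled across hundreds of thousands of graded questions, rewards like this are what make “learning by taking a test” possible without human markers.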

A University of Nebraska–Lincoln engineering team is another step closer to developing soft robotics and wearable systems that mimic the ability of human and plant skin to detect and self-heal injuries.

Husker engineer Eric Markvicka, along with graduate students Ethan Krings and Patrick McManigal, recently presented a paper at the prestigious IEEE International Conference on Robotics and Automation in Atlanta, Georgia, that sets forth a systems-level approach for a soft robotics technology that can identify damage from a puncture or extreme pressure, pinpoint its location and autonomously initiate self-repair.
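The paper’s own mechanism isn’t described in this summary, but one common way such systems localize damage is to route a grid of conductive traces through the skin and watch for resistance jumps; the damaged cell is where an affected row and column intersect. The sketch below is a generic illustration of that idea, not the Husker team’s method, and every name and threshold in it is an assumption.

def locate_damage(baseline, reading, threshold=1.5):
    """Given baseline and current resistance readings (ohms) for the row
    and column traces of a sensing grid, return the (row, col) cell whose
    traces both jumped, or None if the skin is intact."""
    base_rows, base_cols = baseline
    rows, cols = reading
    hit_rows = [i for i, (b, r) in enumerate(zip(base_rows, rows)) if r > threshold * b]
    hit_cols = [j for j, (b, c) in enumerate(zip(base_cols, cols)) if c > threshold * b]
    if hit_rows and hit_cols:
        return hit_rows[0], hit_cols[0]   # puncture at the intersection
    return None

# Toy usage: a puncture at cell (1, 2) spikes row 1 and column 2.
baseline = ([100.0, 100.0, 100.0], [100.0, 100.0, 100.0])
reading = ([100.0, 310.0, 100.0], [100.0, 100.0, 305.0])
print(locate_damage(baseline, reading))  # -> (1, 2)

Once a cell is flagged, the system can trigger its repair routine for that location, which is the “pinpoint and autonomously initiate self-repair” step described above.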

The paper was among the 39 of 1,606 submissions selected as ICRA 2025 Best Paper Award finalists. It was also a finalist for the Best Student Paper Award and in the mechanisms and design category.


Researchers report that rentosertib, an AI-discovered TNIK inhibitor, showed promising safety and potential to improve lung function in patients with idiopathic pulmonary fibrosis in a 12-week phase 2a clinical trial. The highest dose group demonstrated a trend toward increased forced vital capacity, especially among patients not receiving standard antifibrotic therapy, supporting further clinical investigation.

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: low-complexity tasks where standard models surprisingly outperform LRMs, medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths, limitations, and ultimately raising crucial questions about their true reasoning capabilities.
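To make the setup concrete: Tower of Hanoi is a classic example of such a controllable environment, since a single knob (the disk count) scales compositional complexity exponentially while the logical structure stays fixed, and any proposed move sequence can be checked exactly. A minimal sketch under that assumption (the paper’s own environments and harness may differ):

def solve_hanoi(n, src=0, aux=1, dst=2, moves=None):
    """Optimal Tower of Hanoi solution: 2**n - 1 moves.
    Complexity is controlled by one knob, the disk count n."""
    if moves is None:
        moves = []
    if n > 0:
        solve_hanoi(n - 1, src, dst, aux, moves)
        moves.append((src, dst))
        solve_hanoi(n - 1, aux, src, dst, moves)
    return moves

def check_solution(n, moves):
    """Verify a model-proposed move list exactly: every move must be
    legal and the final state must have all disks on the last peg."""
    pegs = [list(range(n, 0, -1)), [], []]   # disks n..1 on peg 0
    for src, dst in moves:
        if not pegs[src] or (pegs[dst] and pegs[dst][-1] < pegs[src][-1]):
            return False                      # illegal move
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))

for n in range(1, 6):                         # sweep the complexity dial
    moves = solve_hanoi(n)
    print(n, len(moves), check_solution(n, moves))  # n, 2**n - 1, True

Because the checker scores moves rather than just a final answer, a harness like this can grade the full reasoning trace, which is exactly the property the abstract says standard benchmarks lack.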
