
Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

Even the best-trained robots struggle when they leave the lab. They face “distribution shifts”: situations they didn’t see in training, like a brand of cereal with a new box design or a human suddenly stepping into their personal space. A static training dataset, with its fixed set of tasks and instructions, simply can’t prepare a robot for every “what if” scenario.

To make sense of all this messy real-world data, the researchers introduced two key technical innovations to the robot’s “Vision-Language-Action” (VLA) brain.


Imagine bringing home a single robot to be your all-in-one kitchen assistant—you want it to brew your morning Gongfu tea, make fresh juice in the afternoon, and mix the perfect cocktail at night. While it might have been trained extensively in a lab, in your house, the counter is slightly higher, the fruit is shaped differently, and your cocktail shaker is transparent. Pre-trained Vision-Language-Action (VLA) models provide an incredible starting point, yet real-world deployment is never a fixed test distribution. This leaves a critical, unsolved challenge: how do we take the heterogeneous experience generated across a fleet of robots and use it to post-train a single, generalist model across a wide range of tasks simultaneously?

We present Learning While Deploying (LWD), a fleet-scale offline-to-online RL framework for continual post-training of generalist VLA policies. Instead of treating deployment as the finish line where a policy is merely evaluated, LWD turns it into a training loop through which the policy improves. A pre-trained policy is deployed across a robot fleet, and both autonomous rollouts and human interventions are aggregated into a shared replay buffer for offline and online updates. The updated policy is then redeployed, enabling continuous improvement by leveraging interaction data from the entire fleet.
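The deploy-aggregate-update loop described above can be sketched in code. This is a minimal illustration only, assuming a simple round-based scheme; every name here (`FleetReplayBuffer`, `deployment_loop`, the `Stub*` placeholders) is hypothetical and not taken from the LWD codebase.

```python
import random
from collections import deque

class FleetReplayBuffer:
    """Shared buffer aggregating trajectories from every robot in the fleet."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory, source):
        # Tag each trajectory by origin: autonomous rollout or human intervention.
        self.buffer.append({"trajectory": trajectory, "source": source})

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

class StubPolicy:
    """Placeholder for a pre-trained VLA policy; update() stands in for
    the offline/online RL step, which is far more involved in practice."""
    def __init__(self, version=0):
        self.version = version

    def update(self, batch):
        return StubPolicy(self.version + 1)

class StubRobot:
    """Placeholder robot: returns a trajectory and whether a human intervened."""
    def rollout(self, policy):
        trajectory = [("observation", "action", random.random())]
        intervened = random.random() < 0.2
        return trajectory, intervened

def deployment_loop(policy, fleet, buffer, n_rounds=3):
    for _ in range(n_rounds):
        # 1. Deploy the current policy across the whole fleet.
        for robot in fleet:
            trajectory, intervened = robot.rollout(policy)
            buffer.add(trajectory, "human" if intervened else "autonomous")
        # 2. Update on experience aggregated from all robots.
        batch = buffer.sample(batch_size=64)
        policy = policy.update(batch)
        # 3. The updated policy is redeployed on the next round.
    return policy
```

The key structural point is that a single shared buffer receives data from every robot, so each policy update benefits from the whole fleet’s experience rather than one robot’s.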

A Generalist Learns Beyond Demonstrations

Some robot learning systems have explored data flywheels: deploying a policy, collecting new robot data, extracting high-quality behaviors, and training the next policy to imitate them. While this supports scalable improvement, it still treats deployment mainly as a source of expert demonstrations. Prior post-training systems mainly focus on specialist policies, leaving fleet-scale post-training of a single generalist policy across diverse tasks unresolved.

Fascinating new research suggests artificial neurodivergence could help solve the AI alignment problem

A new study suggests the key to safe AI isn’t perfect obedience, but cognitive diversity. Researchers propose that creating “neurodivergent” AI ecosystems, where systems check and balance each other, offers a pragmatic solution to the alignment problem.

Brain-inspired chip could reduce AI energy use by 70%

Replicating the brain’s capabilities, likely an impossible task in practice, could theoretically require thousands of H100s, among NVIDIA’s most powerful GPUs. At 700 watts per chip, that puts power consumption in the megawatt range; the brain runs on about 20 watts. Scientists have taken inspiration from this remarkable organ to create chips that could cut energy use by 70% compared with conventional hardware.

Researchers at the University of Cambridge have developed a new brain-inspired nanoscale device that they say could dramatically reduce the enormous energy demands of artificial intelligence hardware. The team created an ultra-low-power “memristor”: a device that can both store and process information in the same location, much like synapses in the human brain.

In conventional computing architectures, memory and processing units are physically separated, requiring data to shuttle back and forth between these units for every task. This seemingly simple process consumes enormous amounts of electricity and is a significant contributor to AI’s exploding power demands.
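As a concrete illustration of the in-memory computing idea (a toy numerical model, not the Cambridge team’s device), a memristor crossbar computes a matrix-vector product in place: conductances stored in the array act as weights, applied voltages as inputs, and the currents summed along each row give the result, with no data shuttled to a separate processor. All values below are illustrative.

```python
def crossbar_mvm(G, V):
    """Model of a memristor crossbar: G[i][j] is the programmed conductance
    at row i, column j; V[j] is the voltage applied to column j. Ohm's law
    gives per-device currents G[i][j] * V[j], and Kirchhoff's current law
    sums them along each row, yielding I = G @ V where the data is stored."""
    return [sum(g * v for g, v in zip(row, V)) for row in G]

G = [[0.5, 1.0],   # conductances programmed into the memristor array
     [2.0, 0.0]]
V = [1.0, 2.0]     # input voltages applied to the columns
print(crossbar_mvm(G, V))  # → [2.5, 2.0]
```

Because matrix-vector multiplication dominates neural-network inference, performing it where the weights physically reside is exactly the memory-processor shuttling the paragraph above describes being eliminated.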

High trust in AI leaves individuals vulnerable to “cognitive surrender,” study finds

People are increasingly outsourcing their thinking to artificial intelligence, bypassing critical reflection entirely. New research reveals that this “cognitive surrender” inflates confidence and causes users to blindly adopt algorithm-generated answers, even when the software is wrong.

Rethinking robotics with physical intelligence

Today’s advances in robotics are often driven by breakthroughs in artificial intelligence, machine learning, and perception. But in complex and constrained environments, the limiting factor is often hardware, not software. Systems that rely on constant data processing, high-bandwidth communication, and centralized compute can face delays, power constraints, and vulnerabilities that limit performance or prevent mission success altogether.

DARPA is looking to tackle these challenges by embedding intelligence directly into the physical materials of robotic systems. A new Request for Information (RFI) calls on the research community to help define a new class of materials capable of intermixed sensing, adapting, and acting in real time, without relying on continuous external computation or communication links.

While the RFI itself is exploratory, it is a first step toward a more immediate opportunity: an invite-only, in-person workshop planned for summer 2026. Selected participants will have the chance to present their ideas, engage with DARPA, and inform future program directions.

New Linux ‘Copy Fail’ flaw gives hackers root on major distros

An exploit has been published for a local privilege escalation vulnerability dubbed “Copy Fail” that impacts Linux kernels released since 2017, allowing an unprivileged local attacker to gain root permissions.

The vulnerability is tracked as CVE-2026-31431 and was discovered by the offensive security company Theori, using its AI-driven pentesting platform Xint Code, after scanning the Linux crypto/ subsystem for about an hour.

Theori reported the finding to the Linux kernel security team on March 23, and patches became available within a week. Technical details and a proof-of-concept exploit for the flaw emerged publicly yesterday.

The MIT-IBM Computing Research Lab Launches to Shape the Future of AI and Quantum Computing

The new lab expands the scope of the MIT-IBM research collaboration to include quantum computing alongside foundational artificial intelligence research, with the goal of unlocking new computational approaches that go beyond the limits of today’s classical systems.
