
Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

Even the best-trained robots struggle when they leave the lab. They face “distribution shifts”: situations they didn’t see in training, such as a cereal box with a new design or a person suddenly walking into their personal space. Static datasets, collected once before deployment, simply can’t prepare a robot for every “what if” scenario.

To make sense of all this messy real-world data, the researchers introduced two key technical innovations to the robot’s “Vision-Language-Action” (VLA) brain.


Imagine bringing home a single robot to be your all-in-one kitchen assistant—you want it to brew your morning Gongfu tea, make fresh juice in the afternoon, and mix the perfect cocktail at night. While it might have been trained extensively in a lab, in your house, the counter is slightly higher, the fruit is shaped differently, and your cocktail shaker is transparent. Pre-trained Vision-Language-Action (VLA) models provide an incredible starting point, yet real-world deployment is never a fixed test distribution. This leaves a critical, unsolved challenge: how do we take the heterogeneous experience generated across a fleet of robots and use it to post-train a single, generalist model across a wide range of tasks simultaneously?

We present Learning While Deploying (LWD), a fleet-scale offline-to-online RL framework for continual post-training of generalist VLA policies. Instead of treating deployment as the finish line where a policy is merely evaluated, LWD turns it into a training loop through which the policy improves. A pre-trained policy is deployed across a robot fleet, and both autonomous rollouts and human interventions are aggregated into a shared replay buffer for offline and online updates. The updated policy is then redeployed, enabling continuous improvement by leveraging interaction data from the entire fleet.
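The deploy, aggregate, update, redeploy cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LWD’s actual implementation: the `Transition` fields, the `SharedReplayBuffer` class, the fixed `online_ratio` for mixing offline and fleet data, and the `update_fn` placeholder standing in for an RL update are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    robot_id: int          # fleet robot that produced this step (-1 = offline data)
    task: str              # e.g. "brew tea", "make juice"
    action: List[float]
    reward: float
    intervened: bool       # True if a human operator supplied this action


class SharedReplayBuffer:
    """Hypothetical fleet-wide buffer: a fixed offline dataset plus a growing
    pool of online transitions streamed in from every deployed robot."""

    def __init__(self, offline: List[Transition], online_ratio: float = 0.5, seed: int = 0):
        self.offline = list(offline)
        self.online: List[Transition] = []
        self.online_ratio = online_ratio   # assumed fixed mixing ratio
        self.rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        # Autonomous rollouts and human interventions land in the same pool.
        self.online.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Up to online_ratio of the batch comes from fleet data; the rest is offline.
        n_online = min(len(self.online), round(batch_size * self.online_ratio))
        if not self.offline:
            n_online = min(len(self.online), batch_size)
        batch = self.rng.sample(self.online, n_online)       # without replacement
        n_offline = batch_size - n_online
        if n_offline and self.offline:
            batch += self.rng.choices(self.offline, k=n_offline)  # with replacement
        return batch


def deployment_round(
    buffer: SharedReplayBuffer,
    fleet_rollouts: List[List[Transition]],
    update_fn: Callable[[List[Transition]], None],
    num_updates: int,
    batch_size: int,
) -> None:
    """One deploy -> aggregate -> update cycle; the caller then redeploys."""
    for rollout in fleet_rollouts:          # one rollout list per robot
        for t in rollout:
            buffer.add(t)
    for _ in range(num_updates):
        update_fn(buffer.sample(batch_size))  # placeholder for a real RL update
```

The single shared pool is the point of the sketch: every robot’s successes, failures, and human corrections are visible to the same update, so one generalist policy learns from the whole fleet rather than one specialist per task.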

A Generalist Learns Beyond Demonstrations

Some robot learning systems have explored data flywheels: deploying a policy, collecting new robot data, extracting high-quality behaviors, and training the next policy to imitate them. While this supports scalable improvement, it still treats deployment mainly as a source of expert demonstrations. Prior post-training systems mainly focus on specialist policies, leaving fleet-scale post-training of a single generalist policy across diverse tasks unresolved.
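To make the contrast concrete, here is a minimal sketch of the filtering step in a data flywheel, assuming (hypothetically) that each deployment episode carries a scalar return and a step count. Only episodes above a quality threshold are kept for imitation; the failed attempts, which an RL learner could still learn from, are discarded. The `Episode` type and both function names are illustrative, not taken from any of the systems discussed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    task: str
    ret: float     # episode return, e.g. 1.0 for success, 0.0 for failure
    steps: int     # number of robot steps recorded


def flywheel_filter(episodes: List[Episode], min_return: float) -> List[Episode]:
    """Data-flywheel style: keep only high-quality episodes to imitate next round."""
    return [ep for ep in episodes if ep.ret >= min_return]


def retained_fraction(kept: List[Episode], all_eps: List[Episode]) -> float:
    """Fraction of collected robot steps that actually reach the next training run."""
    total = sum(ep.steps for ep in all_eps)
    return sum(ep.steps for ep in kept) / total if total else 0.0
```

For example, with two successful episodes of 100 and 120 steps and two failed ones of 80 and 200 steps, the filter keeps 220 of 500 steps (44%); the other 56% of the fleet’s experience never reaches the next policy.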

You have no free will at all | Stanford professor Robert Sapolsky

How your biology and environment make your decisions for you, according to Dr. Robert Sapolsky.


Robert Sapolsky, PhD is an author, researcher, and professor of biology, neurology, and neurosurgery at Stanford University. In this interview with Big Think’s Editor-in-Chief, Robert Chapman Smith, Sapolsky discusses the content of his most recent book, “Determined: The Science of Life Without Free Will.”

Being held as a child, growing up in a collectivist culture, or experiencing any sort of brain trauma – among hundreds of other things – can shape your internal biases and ultimately influence the decisions you make. This, explains Sapolsky, means that free will is not – and never has been – real. Even physiological factors like hunger can subtly influence decision making, as a study showed when it found that judges were more likely to grant parole after they had eaten.

This insight is key to interpreting human behavior, and it can help not only scientists but also those who aim to reform education systems, advance mental health research, and even shape policy.


Firehorse superstition helps uncover why women’s education may not drive Japan’s fertility decline

The rapidly declining marriage and fertility rates across developed East Asian societies strain pension and health care systems, threaten economic growth, and reshape entire societies. To tackle this issue, governments in Japan and across East Asia have invested heavily in pronatalist measures, but often with limited success. For instance, Japan’s government has repeatedly expanded childcare subsidies and parental leave provisions, yet the total fertility rate hit a record low of 1.20 in 2024.

A common narrative in media commentary, policy circles, and even within families is that women are “too educated” or “too career-focused” to marry and have children. However, the exact causal relationship between women’s education level and family formation is not well understood.

To fill this knowledge gap, a team of researchers from Japan and Singapore, led by Associate Professor Rong Fu from the Faculty of Commerce, Waseda University, Japan, used a novel quasi-experimental approach to understand the relationship between education, fertility, and marriage in Japan.

Probabilistic projections of global wind and solar power growth based on historical national experience

PROLONG, a data-driven probabilistic model of technology growth, projects wind and solar expansion consistent with 2 °C pathways and faster than current policy scenarios. The 1.5 °C pathway lies beyond the 95th percentile of projections, and meeting this target would require major effort.
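What “beyond the 95th percentile of projections” means can be illustrated with a generic Monte Carlo sketch. This is explicitly not PROLONG: it assumes a simple compounding model in which each simulated year draws an annual growth rate from a pool of hypothetical historical national growth rates, then checks whether a target capacity exceeds the 95th percentile of the simulated outcomes. A target above that line is reached in fewer than about 5% of sampled trajectories.

```python
import random
import statistics
from typing import List


def project(start: float, hist_growth: List[float], years: int,
            n_samples: int = 2000, seed: int = 0) -> List[float]:
    """Monte Carlo projection: each sample compounds annual growth rates
    resampled from a pool of (hypothetical) historical national growth rates."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_samples):
        cap = start
        for _ in range(years):
            cap *= 1 + rng.choice(hist_growth)  # bootstrap one year of growth
        finals.append(cap)
    return finals


def beyond_p95(finals: List[float], target: float) -> bool:
    """True if a target pathway lies above the 95th percentile of projections."""
    p95 = statistics.quantiles(finals, n=20)[-1]  # 19 cut points; the last is the 95th
    return target > p95
```

A real model such as PROLONG would replace the naive resampling with growth dynamics fitted to national histories, but the percentile test applied to its output works the same way.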

Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul

Google this week announced a new set of Play policy updates to strengthen user privacy and protect businesses against fraud, even as it revealed it blocked or removed over 8.3 billion ads globally and suspended 24.9 million accounts in 2025.

The new policy updates relate to contact and location permissions in Android, allowing third-party apps to access a user’s contact list and location in a more privacy-friendly manner. This includes a new Contact Picker, which offers a standardized, secure, and searchable interface for contact selection.

“This feature allows users to grant apps access only to the specific contacts they choose, aligning with Android’s commitment to data transparency and minimized permission footprints,” Google said.

EarthSpace 2026

Register now for 2026! A discussion of Earth and space on Earth Day, with Frank White, me, and other great guests!


EarthSpace 2026 brings together leaders, thinkers, and builders to explore one core idea: the future of Earth and the future of space are not separate conversations.

From climate solutions to space infrastructure, from policy to culture, the choices we make today will define how humanity lives on this planet—and beyond it.

This is not a passive webinar. It’s a focused, high-signal conversation with people actively shaping the frontier.

Toward a policy for machine-learning tools in kernel development

The first topic of discussion at the 2025 Maintainers Summit has been in the air for a while: what role — if any — should machine-learning-based tools have in the kernel development process? While there has been a fair amount of controversy around these tools, and concerns remain, it seems that the kernel community, or at least its high-level maintainership, is comfortable with these tools becoming a significant part of the development process.

Sasha Levin began the discussion by pointing to a summary he had sent to the mailing lists a few days before. There is some consensus, he said, that human accountability for patches is critical, and that use of a large language model in the creation of a patch does not change that. Purely machine-generated patches, without human involvement, are not welcome. Maintainers must retain the authority to accept or reject machine-generated contributions as they see fit. And, he said, there is agreement that the use of tools should be disclosed in some manner.

But, he asked the group: is there agreement in general that these tools are, in the end, just more tools? Steve Rostedt said that LLM-generated code may bring legal concerns that other tools do not raise, but Greg Kroah-Hartman answered that the current Developer’s Certificate of Origin (the “Signed-off-by” process) should cover the legal side of things. Rostedt agreed that the submitter is ultimately on the hook for the code they contribute, but he wondered about the possibility of a court ruling, years after the kernel had accepted code generated by a given model, that the model violates copyright. That would create the need for a significant cleanup effort.

The global burden of childhood and adolescent cancer (age 0–19 years) from 1990 to 2023: a systematic analysis for the Global Burden of Disease Study 2023

Acute lymphoid leukemia and brain and central nervous system cancers were estimated to be the greatest contributors to new childhood cancer cases in 0–19-year-olds in 2023.

A new comprehensive study published in The Lancet by researchers at IHME and St. Jude Children’s Research Hospital examined the burden of childhood and adolescent cancer from 1990 to 2023, aiming to inform effective cancer policy planning around the globe.



Childhood cancer was the eighth-leading cause of childhood deaths and the ninth-leading cause of DALYs among all cancers in 2023. Globally, in 2023, there were an estimated 377 000 incident childhood cancer cases, 144 000 deaths, and 11·7 million DALYs due to childhood cancer.

Perspectives on an Emerging 18th SDG Articulation: An SRI Side Event at the COPUOS Legal Subcommittee

(SRI) will organize a high-level side event during the COPUOS Legal Subcommittee on 16 April 2026 at UNOOSA (Vienna), proposed and convened by Dr. Gülin Dede, titled “Operationalising Space as a Cross-Cutting Enabler of Sustainable Development: Perspectives on an Emerging 18th SDG Articulation.”

The session will bring together legal, policy, industry, and Global South perspectives to examine how outer space is evolving from a sectoral domain into a critical enabling infrastructure for the 2030 Agenda, while simultaneously requiring stewardship as an environment in its own right.

Positioned as an early contribution to shaping how space sustainability is framed within the broader UN system, the event will also be broadcast by the United Nations, extending its reach beyond the room to a global audience.
