
Reinforcement learning (RL) has become central to advancing Large Language Models (LLMs), empowering them with improved reasoning capabilities necessary for complex tasks. However, the research community faces considerable challenges in reproducing state-of-the-art RL techniques due to incomplete disclosure of key training details by major industry players. This opacity has limited the progress of broader scientific efforts and collaborative research.

Researchers from ByteDance, Tsinghua University, and the University of Hong Kong recently introduced DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), an open-source large-scale reinforcement learning system designed to enhance the reasoning abilities of Large Language Models. The DAPO system seeks to close the reproducibility gap by openly sharing all algorithmic details, training procedures, and datasets. Built on the verl framework, DAPO includes training code and a carefully curated dataset called DAPO-Math-17K, designed specifically for mathematical reasoning tasks.

DAPO’s technical foundation includes four core innovations aimed at resolving key challenges in reinforcement learning. The first, “Clip-Higher,” addresses the issue of entropy collapse, a situation where models prematurely settle into limited exploration patterns. By carefully managing the clipping ratio in policy updates, this technique encourages greater diversity in model outputs. “Dynamic Sampling” counters inefficiencies in training by dynamically filtering samples based on their usefulness, thus ensuring a more consistent gradient signal. The “Token-level Policy Gradient Loss” offers a refined loss calculation method, emphasizing token-level rather than sample-level adjustments to better accommodate varying lengths of reasoning sequences. Lastly, “Overlong Reward Shaping” introduces a controlled penalty for excessively long responses, gently guiding models toward concise and efficient reasoning.
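Taken together, these pieces amount to a small change to the standard PPO surrogate objective. The numpy sketch below illustrates two of them, the token-level loss and the decoupled "Clip-Higher" range; the function name, the particular epsilon values, and the array shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def token_level_clipped_loss(logp_new, logp_old, advantages, mask,
                             eps_low=0.2, eps_high=0.28):
    """PPO-style surrogate loss with two of DAPO's ideas (sketch).

    - Clip-Higher: the upper clip bound (1 + eps_high) is decoupled from
      the lower one, leaving more head-room to raise the probability of
      rare tokens and so push back against entropy collapse.
    - Token-level aggregation: one mean over every valid token in the
      batch, so long reasoning chains are not down-weighted the way a
      per-sample mean would down-weight them.

    All arrays are (batch, seq_len); mask is 1.0 on real tokens, 0.0 on padding.
    """
    ratio = np.exp(logp_new - logp_old)                       # pi_new / pi_old per token
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)   # asymmetric clip window
    per_token = np.minimum(ratio * advantages, clipped * advantages)
    # Token-level aggregation: one global mean over valid tokens.
    return -(per_token * mask).sum() / mask.sum()
```

With identical policies (ratio 1 everywhere) this reduces to minus the mean advantage, while a token whose probability ratio doubles is held to the 1.28 ceiling rather than 1.2, which is the asymmetry Clip-Higher introduces.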

A team of medical researchers and engineers at Google Research has developed a way to use the front-facing camera on a smartphone to monitor a patient’s heart rate. The team has published a paper on the technology on the arXiv preprint server.

Tracking a patient’s heart rate over time can reveal clues about their cardiovascular health. The most important measurement is resting heart rate (RHR)—people with an above-normal rate are at a higher risk of heart disease and/or stroke. Persistently high rates, the researchers note, can signal a serious problem.

Over the past several years, personal health device makers have developed wearable external heart monitors, such as necklaces or smartwatches. But these devices are expensive. The researchers have found a cheaper alternative—a deep-learning system that analyzes video from the front-facing camera of a smartphone. The system is called PHRM.

In today’s AI news, all eyes will be on Nvidia’s GPU Technology Conference this week, where the company is expected to unveil its next AI chips. During the company’s fourth-quarter earnings call, Nvidia CEO Jensen Huang said he would share more about the upcoming Blackwell Ultra AI chip, the Vera Rubin platform, and plans for other upcoming products at the annual conference, known as GTC.

In other advancements, after decades of relying on Google’s ten blue links to find everything, consumers are quickly adapting to a completely new format: AI chatbots that do the searching for them. Adobe analyzed “more than 1 trillion visits to U.S. retail sites” through its analytics platform, and conducted a survey of “more than 5,000 U.S. respondents” to better understand how people are using AI.

Meanwhile, Barry Eggers, Co-Founder and Managing Partner at Lightspeed Venture Partners, is a luminary in the venture capital industry. As the AI landscape continues to evolve, Barry discusses the challenge of building defensible AI startups. Beyond just access to models, AI startups need differentiated data, network effects, and unique applications to maintain a competitive edge.

If you’re thinking of starting a new business and need advice on what to do, your first move should be turning to an AI chatbot tool.

Can’t answer who won the Oscars last year? IBM Fellow Martin Keen explains how RAG (Retrieval-Augmented Generation) and CAG (Cache-Augmented Generation) address knowledge gaps in AI. Discover their strengths in real-time retrieval, scalability, and efficient workflows for smarter AI systems.

Is Gemini 2.0 about to revolutionize image generation and editing? In this video, Tim dives deep into Google’s Gemini 2.0.

We close out with Anthropic researchers Ethan Perez, Joe Benton, and Akbir Khan discussing AI control—an approach to managing the risks of advanced AI systems. They discuss real-world evaluations showing how humans struggle to detect deceptive AI, the three major threat models researchers are working to mitigate, and the overall idea of controlling highly capable AI systems whose goals may differ from our own.
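For readers unfamiliar with the retrieval half of that RAG/CAG comparison, here is a toy sketch of the pattern: fetch the most relevant document, then prepend it to the prompt so the model can answer questions outside its training data. The example documents, the word-overlap scoring, and the prompt format are made up for illustration; production systems use vector embeddings and pass the prompt to an actual LLM.

```python
# Toy sketch of the retrieval step that RAG adds before generation.
def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, docs):
    """Prepend the best-matching context so the model can ground its answer."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Oppenheimer won Best Picture at the 2024 Oscars.",
    "The G1 is a humanoid robot made by Unitree.",
]
prompt = build_prompt("Who won Best Picture at the Oscars?", docs)
```

CAG, by contrast, skips per-query retrieval and preloads the relevant documents into the model's context (and KV cache) ahead of time, trading retrieval latency for a larger cached prompt.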

That’s all for today, but AI is moving fast — subscribe and follow for more Neural News.



Since the general AI agent Manus was launched last week, it has spread online like wildfire. And not just in China, where it was developed by the Wuhan-based startup Butterfly Effect. It’s made its way into the global conversation, with influential voices in tech, including Twitter cofounder Jack Dorsey and Hugging Face product lead Victor Mustar, praising its performance. Some have even dubbed it “the second DeepSeek,” comparing it to the earlier AI model that took the industry by surprise for its unexpected capabilities as well as its origin.

Manus is billed as the world’s first general AI agent, using multiple AI models (such as Anthropic’s Claude).


The new general AI agent from China had some system crashes and server overload—but it’s highly intuitive and shows real promise for the future of AI helpers.

A new video demonstrates the Unitree G1 Humanoid Robot using the HoST framework to stand up from seemingly impossible positions. Whether lying flat on its back, slumped against a wall, reclining in a chair, or sprawled out on a sofa, the robot methodically adjusts itself before rising with unsettling precision. It’s very reminiscent of someone rising from the dead, a comparison I’m not really that excited to make when it comes to robots.


New theoretical physics research introduces a machine-learning-based effective Hamiltonian simulation method for super-large-scale atomic structures. This effective Hamiltonian method can simulate much larger structures than methods based on quantum mechanics or classical mechanics.

The findings are published in npj Computational Materials under the title, “Active learning of effective Hamiltonian for super-large-scale atomic structures.” The paper was authored by an international team of physicists from the University of Arkansas, Nanjing University, and the University of Luxembourg.

Ferroelectrics and dielectrics contain a class of mesoscopic structures that typically comprise more than a million atoms.
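The active-learning idea behind such work can be sketched in miniature: fit an ensemble of cheap surrogate models, run the expensive reference calculation only on the configuration where the ensemble disagrees most, and repeat. Everything below is an illustrative assumption, not the paper's method: the 1-D double-well "energy" stands in for a first-principles calculation, and label-noise ensembles stand in for a proper uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_energy(x):
    """Stand-in for an expensive first-principles calculation:
    a 1-D double-well potential, loosely evoking a ferroelectric mode."""
    return x**4 - x**2

def fit_ensemble(X, y, n_models=5, deg=4, noise=1e-3):
    """Fit an ensemble of polynomial surrogates; small label
    perturbations give a cheap proxy for model uncertainty."""
    return [np.polyfit(X, y + rng.normal(0.0, noise, len(y)), deg)
            for _ in range(n_models)]

def acquire(models, candidates):
    """Pick the candidate point where the ensemble disagrees most."""
    preds = np.array([np.polyval(m, candidates) for m in models])
    return candidates[np.argmax(preds.std(axis=0))]

# Active-learning loop: start with a few labeled configurations,
# then label only where the surrogate is least certain.
X = np.linspace(-0.5, 0.5, 8)
y = true_energy(X)
candidates = np.linspace(-1.5, 1.5, 61)
for _ in range(10):
    models = fit_ensemble(X, y)
    x_new = acquire(models, candidates)
    X = np.append(X, x_new)              # "run the expensive calculation"
    y = np.append(y, true_energy(x_new))
```

Because surrogate disagreement is largest where the model extrapolates, the loop spends its expensive evaluations on the least-explored configurations, which is what lets effective-Hamiltonian methods scale to structures first-principles codes cannot touch.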

The Black Basta ransomware operation created an automated brute-forcing framework dubbed ‘BRUTED’ to breach edge networking devices like firewalls and VPNs.

The framework has enabled Black Basta to streamline initial network access and scale ransomware attacks on vulnerable internet-exposed endpoints.

The discovery of BRUTED comes from EclecticIQ researcher Arda Büyükkaya following an in-depth examination of the ransomware gang’s leaked internal chat logs.

A Cornell-led research team has developed an artificial intelligence-powered ring equipped with micro-sonar technology that can continuously—and in real time—track fingerspelling in American Sign Language (ASL).

In its current form, SpellRing could be used to enter text into computers or smartphones via fingerspelling, which is used in ASL to spell out words without corresponding signs, such as proper nouns, names and technical terms. With further development, the device—believed to be the first of its kind—could revolutionize ASL translation by continuously tracking entire signed words and sentences.

The research is published on the arXiv preprint server.

We move thanks to coordination among many skeletal muscle fibers, all twitching and pulling in sync. While some muscles align in one direction, others form intricate patterns, helping parts of the body move in multiple ways.

In recent years, scientists and engineers have looked to muscles as potential actuators for “biohybrid” robots—machines powered by soft, artificially grown muscle tissue. Such bio-bots could squirm and wiggle through spaces where traditional machines cannot. For the most part, however, researchers have only been able to fabricate artificial muscle that pulls in one direction, limiting any robot’s range of motion.

Now MIT engineers have developed a method to grow artificial muscle tissue that twitches and flexes in multiple coordinated directions. As a demonstration, they grew an artificial, muscle-powered structure that pulls both concentrically and radially, much like how the iris in the human eye acts to dilate and constrict the pupil.

International Iberian Nanotechnology Laboratory (INL) researchers have developed a neuromorphic photonic semiconductor neuron capable of processing optical information through self-sustained oscillations. Exploring the use of light to control negative differential resistance (NDR) in a micropillar quantum resonant tunneling diode (RTD), the research indicates that this approach could lead to highly efficient light-driven neuromorphic computing systems.

Neuromorphic computing seeks to replicate the information-processing capabilities of biological neural networks. Biological neurons rely on rhythmic burst firing for sensory encoding and network synchronization, functions that depend on oscillatory activity for signal transmission and processing.

Existing neuromorphic approaches replicate these processes using electrical, mechanical, or thermal stimuli, but optical-based systems offer advantages in speed, energy efficiency, and miniaturization. While previous research has demonstrated photonic synapses and artificial afferent nerves, these implementations require additional circuits that increase power consumption and complexity.
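The self-sustained oscillations at the heart of this work can be illustrated with the FitzHugh–Nagumo model, a textbook two-variable neuron whose N-shaped nullcline plays a role loosely analogous to the RTD's negative-differential-resistance region. This is a generic stand-in, not the INL team's device model: the parameters are standard textbook values, and the external current term loosely stands in for optical drive.

```python
import numpy as np

def fitzhugh_nagumo(I_ext, t_max=200.0, dt=0.01, a=0.7, b=0.8, tau=12.5):
    """Euler-integrate the FitzHugh-Nagumo equations.

    Below a threshold drive the neuron sits quietly at a fixed point;
    above it, the system enters self-sustained oscillations, the same
    qualitative behavior reported for the optically driven RTD neuron.
    """
    v, w = -1.0, 1.0          # membrane-like and recovery variables
    trace = []
    for _ in range(int(t_max / dt)):
        dv = v - v**3 / 3.0 - w + I_ext
        dw = (v + a - b * w) / tau
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return np.array(trace)

# Weak drive: the neuron settles. Stronger drive: it oscillates.
quiet = fitzhugh_nagumo(I_ext=0.0)
firing = fitzhugh_nagumo(I_ext=0.5)
```

Looking at the tail of each trace, `quiet` flattens onto its fixed point while `firing` keeps swinging between the two branches of the cubic nullcline, the oscillatory regime a spiking device exploits for encoding.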