Abstract: Modern Vision-Language Models (VLMs) can solve a wide range of tasks requiring visual reasoning. In real-world scenarios, desirable properties for VLMs include fast inference and controllable generation (e.g., constraining outputs to adhere to a desired format). However, existing autoregressive (AR) VLMs like LLaVA struggle in these aspects. Discrete diffusion models (DMs) offer a promising alternative, enabling parallel decoding for faster inference and bidirectional context for controllable generation through text-infilling. While effective in language-only settings, DMs’ potential for multimodal tasks is underexplored. We introduce LaViDa, a family of VLMs built on DMs. We build LaViDa by equipping DMs with a vision encoder and jointly fine-tuning the combined parts for multimodal instruction following.
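To make the decoding claim concrete, here is a minimal sketch of parallel masked-diffusion decoding with infilling: positions fixed in a template act as format constraints, and the remaining masked positions are filled in over a few confidence-ranked steps. The `model` callable, the mask id, and the unmasking schedule are illustrative assumptions, not LaViDa’s actual interface.

```python
# Minimal sketch of parallel masked-diffusion decoding with text infilling.
# `model(tokens)` is a hypothetical callable returning per-position logits;
# MASK_ID and the schedule are illustrative assumptions, not LaViDa's API.
import torch

MASK_ID = 0  # assumed id of the [MASK] token

def diffusion_decode(model, template, steps=8):
    """Iteratively unmask the most confident positions.

    `template` is a LongTensor of token ids where MASK_ID marks positions
    to generate; fixed ids act as constraints (e.g., a required output
    format), which is what bidirectional infilling makes possible.
    """
    tokens = template.clone()
    n_masked = int((tokens == MASK_ID).sum())
    per_step = max(1, -(-n_masked // steps))        # ceil(n_masked / steps)
    for _ in range(steps):
        masked = tokens == MASK_ID
        if not masked.any():
            break
        logits = model(tokens)                      # (seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)     # per-position confidence
        conf[~masked] = -1.0                        # never overwrite fixed tokens
        k = min(per_step, int(masked.sum()))
        idx = conf.topk(k).indices                  # most confident masked slots
        tokens[idx] = pred[idx]
    return tokens
```

Because every position is predicted in each pass, the loop runs for a fixed small number of steps regardless of sequence length, which is where the speed advantage over token-by-token AR decoding comes from.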
Whenever I used to think about brain-computer interfaces (BCI), I typically imagined a world where the Internet was served up directly to my mind through cyborg-style neural implants—or basically how it’s portrayed in Ghost in the Shell. In that world, you can read, write, and speak to others without needing to lift a finger or open your mouth. It sounds fantastical, but the more I learn about BCI, the more I’ve come to realize that this wish list of functions is really only the tip of the iceberg. And when AR and VR converge with the consumer-ready BCI of the future, the world will be much stranger than fiction.
Be it Elon Musk’s latest company Neuralink, which is creating “minimally invasive” neural implants to suit a wide range of potential future applications, or Facebook directly funding research on decoding speech from the human brain, BCI seems to be taking an important step forward in its maturity. And while regulatory hoops governing implants and their relative safety mean these well-funded companies can only push the technology forward as a medical device today, eventually the technology will get to a point where it’s both safe and cheap enough to land in the brainpans of neurotypical consumers.
Although there’s really no telling when you or I will be able to pop into an office for an outpatient implant procedure (much like how corrective laser eye surgery is done today), we know at least that this particular future will undoubtedly come alongside significant advances in augmented and virtual reality. But before we consider where that future might lead us, let’s take a look at where things are today.
At an event in Hangzhou, China, Chinese and French guests simply donned 49-gram Rokid AR glasses and spoke in their own languages while instant subtitles appeared on the lenses, enabling seamless dialogue.
Meta has laid off employees in the company’s Reality Labs division, which is tasked with developing virtual reality, augmented reality, and wearable devices.
Human cyborgs are individuals who integrate advanced technology into their bodies, enhancing their physical or cognitive abilities. This fusion of man and machine blurs the line between science fiction and reality, raising questions about the future of humanity, ethics, and the limits of human potential. From bionic limbs to brain-computer interfaces, cyborg technology is rapidly evolving, pushing us closer to a world where humans and machines become one.
Modern brain–computer interfaces (BCI), utilizing electroencephalograms for bidirectional human–machine communication, face significant limitations from movement-vulnerable rigid sensors, inconsistent skin–electrode impedance, and bulky electronics, diminishing the system’s continuous use and portability. Here, we introduce motion artifact–controlled micro–brain sensors between hair strands, enabling ultralow impedance density on skin contact for long-term usable, persistent BCI with augmented reality (AR). An array of low-profile microstructured electrodes with a highly conductive polymer is seamlessly inserted into the space between hair follicles, offering high-fidelity neural signal capture for up to 12 h while maintaining the lowest contact impedance density (0.03 kΩ·cm⁻²) among reported articles. Implemented wireless BCI, detecting steady-state visually evoked potentials, offers 96.4% accuracy in signal classification with a train-free algorithm even during the subject’s excessive motions, including standing, walking, and running. A demonstration captures this system’s capability, showing AR-based video calling with hands-free controls using brain signals, transforming digital communication. Collectively, this research highlights the pivotal role of integrated sensors and flexible electronics technology in advancing BCI’s applications for interactive digital environments.
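The abstract doesn’t say which train-free algorithm the wireless BCI uses. One widely used train-free approach to SSVEP classification is canonical correlation analysis (CCA) against sinusoidal reference signals, sketched below as an illustration of how hands-free selection among flickering AR targets could work; the function names and parameters are assumptions, not the authors’ implementation.

```python
# Hedged sketch: CCA against sine/cosine references is a common *train-free*
# way to classify SSVEP targets. Illustrative only; the paper's algorithm
# is not specified in the abstract.
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_classify(eeg, stim_freqs, fs=250, n_harmonics=2):
    """Return the index of the stimulus frequency best matching `eeg`.

    eeg: array of shape (n_samples, n_channels)
    stim_freqs: candidate flicker frequencies in Hz (e.g., one per AR menu item)
    """
    t = np.arange(eeg.shape[0]) / fs
    scores = []
    for f in stim_freqs:
        # Sine/cosine references at the fundamental and its harmonics.
        ref = np.column_stack(
            [fn(2 * np.pi * h * f * t)
             for h in range(1, n_harmonics + 1)
             for fn in (np.sin, np.cos)]
        )
        cca = CCA(n_components=1)
        x_c, y_c = cca.fit_transform(eeg, ref)
        scores.append(abs(np.corrcoef(x_c[:, 0], y_c[:, 0])[0, 1]))
    return int(np.argmax(scores))
```

Because the references are generated analytically, nothing is learned per user, which is what makes this style of classifier “train-free.”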
Researchers from Georgia Institute of Technology (Georgia Tech) have developed a microscopic brain sensor which is so tiny that it can be placed in the small gap between your hair follicles on the scalp, slightly under the skin. The sensor is discreet enough not to be noticed and minuscule enough to be worn comfortably all day.
Brain sensors offer high-fidelity signals, allowing your brain to communicate directly with devices like computers, augmented reality (AR) glasses, or robotic limbs. This is part of what’s known as a Brain-Computer Interface (BCI).
A new approach to streaming technology may significantly improve how users experience virtual reality and augmented reality environments, according to a study from NYU Tandon School of Engineering.
The research—presented in a paper at the 16th ACM Multimedia Systems Conference (ACM MMSys 2025) on April 1, 2025—describes a method for directly predicting visible content in immersive 3D environments, potentially reducing bandwidth requirements by up to 7-fold while maintaining visual quality.
The technology is being applied in an ongoing NYU Tandon project to bring point cloud video to dance education, making 3D dance instruction streamable on standard devices with lower bandwidth requirements.
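The paper’s key idea, as described, is predicting visible content directly rather than first predicting the viewer’s pose and then computing visibility. The toy sketch below illustrates that framing by scoring each spatial cell’s likelihood of being visible from its own recent visibility history and requesting only likely-visible cells; the class, decay weights, and threshold are illustrative assumptions, not the MMSys 2025 method.

```python
# Toy sketch of "directly predicting visible content" for point-cloud
# streaming: score each cell from its recent visibility history and fetch
# only cells likely to be visible next frame. The exponential-decay
# predictor and threshold are assumptions, not the paper's model.
from collections import defaultdict, deque

class CellVisibilityPredictor:
    def __init__(self, history=10, threshold=0.4):
        self.threshold = threshold
        self.logs = defaultdict(lambda: deque(maxlen=history))

    def observe(self, visible_cells, all_cells):
        """Record which cells (hashable ids, e.g. octree indices) were visible."""
        for cell in all_cells:
            self.logs[cell].append(1.0 if cell in visible_cells else 0.0)

    def predict(self, all_cells):
        """Return the cells to request for the next frame."""
        requested = []
        for cell in all_cells:
            log = self.logs[cell]
            if not log:
                requested.append(cell)          # no history yet: fetch to be safe
                continue
            # Recency-weighted average of past visibility (newest weighted most).
            weights = [0.8 ** i for i in range(len(log))]
            score = sum(w * v for w, v in zip(weights, reversed(log))) / sum(weights)
            if score >= self.threshold:
                requested.append(cell)
        return requested
```

Skipping cells that the predictor scores below the threshold is what saves bandwidth; a stricter threshold trades visual completeness for a smaller request set.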
Without the ability to control infrared light waves, autonomous vehicles wouldn’t be able to quickly map their environment and keep “eyes” on the cars and pedestrians around them; augmented reality couldn’t render realistic 3D displays; doctors would lose an important tool for early cancer detection. Dynamic light control allows for upgrades to many existing systems, but complexities associated with fabricating programmable thermal devices hinder availability.
A new active metasurface, the electrically programmable graphene field-effect transistor (Gr-FET), from the labs of Sheng Shen and Xu Zhang in Carnegie Mellon University’s College of Engineering, enables the control of mid-infrared states across a wide range of wavelengths, directions, and polarizations. This enhanced control enables advancements in applications ranging from infrared camouflage to personalized health monitoring.
“For the first time, our active metasurface devices exhibited the monolithic integration of the rapidly modulated temperature, addressable pixelated imaging, and resonant infrared spectrum,” said Xiu Liu, postdoctoral associate in mechanical engineering and lead author of the paper published in Nature Communications. “This breakthrough will be of great interest to a wide range of infrared photonics, materials science, biophysics, and thermal engineering audiences.”
For over a century, galvanic vestibular stimulation (GVS) has been used as a way to stimulate the inner ear nerves by passing a small amount of current.
We use GVS in a two-player, escape-the-room-style VR game set in a dark virtual world. The VR player is remote controlled like a robot by a non-VR player, who uses GVS to alter the VR player’s walking trajectory. We also use GVS to induce the physical sensations of virtual motion and mitigate motion sickness in VR.
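As a rough illustration of how such remote steering could be wired up: in bilateral GVS, sway tends to follow the anode side, so a signed current maps naturally onto a left/right turn command. The sketch below shows that mapping; the `GVSDevice` driver and the current cap are hypothetical, and real stimulation hardware requires proper safety engineering and review.

```python
# Hedged sketch of mapping a steering command to a bilateral GVS current:
# positive current (anode toward the right mastoid) nudges sway to the right,
# negative to the left. GVSDevice and the cap are illustrative stand-ins.
MAX_CURRENT_MA = 1.5   # conservative illustrative cap, not a clinical spec

def steering_to_current(turn, max_ma=MAX_CURRENT_MA):
    """Map a normalized turn command in [-1, 1] (left..right) to milliamps."""
    turn = max(-1.0, min(1.0, turn))
    return turn * max_ma

class GVSDevice:
    """Hypothetical stand-in for a stimulator driver."""
    def set_current_ma(self, ma):
        print(f"stimulating at {ma:+.2f} mA")

def remote_steer(device, operator_turn):
    # The non-VR "operator" supplies the turn command; the VR player wears the electrodes.
    device.set_current_ma(steering_to_current(operator_turn))
```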
Brain hacking has been a futurist fascination for decades. Turns out, we may be able to make it a reality as research explores the impact of GVS on everything from tactile sensation to memory.
Misha graduated in June 2018 from the MIT Media Lab where she worked in the Fluid Interfaces group with Prof Pattie Maes. Misha works in the area of human-computer interaction (HCI), specifically related to virtual, augmented and mixed reality. The goal of her work is to create systems that use the entire body for input and output and automatically adapt to each user’s unique state and context. Misha calls her concept perceptual engineering, i.e., immersive systems that alter the user’s perception (or more specifically the input signals to their perception) and influence or manipulate it in subtle ways. For example, they modify a user’s sense of balance or orientation, manipulate their visual attention and more, all without the user’s explicit awareness, and in order to assist or guide their interactive experience in an effortless way.
The systems Misha builds use the entire body for input and output, i.e., they can use movement, like walking, or a physiological signal, like breathing, as input, and can output signals that actuate the user’s vestibular system with electrical pulses, causing the individual to move or turn involuntarily. HCI up to now has relied upon deliberate, intentional usage, both for input (e.g., touch, voice, typing) and for output (interpreting what the system tells you, shows you, etc.). In contrast, Misha develops techniques and builds systems that do not require this deliberate, intentional user interface but are able to use the body as the interface for more implicit and natural interactions.
Misha’s perceptual engineering approach has been shown to increase the user’s sense of presence in VR/MR, provide novel ways to communicate between the user and the digital system using proprioception and other sensory modalities, and serve as a platform to question the boundaries of our sense of agency and trust.