Every year, more than 69 million people around the world suffer traumatic brain injury, which leaves many of them unable to communicate through speech, typing, or gestures. These people’s lives could dramatically improve if researchers developed a technology to decode language directly from noninvasive brain recordings. Today, we’re sharing research that takes a step toward this goal. We’ve developed an AI model that can decode speech from noninvasive recordings of brain activity.
Our results show that, from three seconds of brain activity, our model can decode the corresponding speech segments with up to 73 percent top-10 accuracy from a vocabulary of 793 words, i.e., a large portion of the words we typically use on a day-to-day basis.
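Top-10 accuracy here means the correct speech segment appears among the model's ten highest-ranked candidates. As a rough illustration of how such a metric can be computed (this is a minimal sketch, not our evaluation code, and the score matrix and names are hypothetical), consider:

```python
import numpy as np

def top_k_accuracy(similarity: np.ndarray, targets: np.ndarray, k: int = 10) -> float:
    """Fraction of samples whose true segment is among the k highest-scoring candidates.

    similarity: (n_samples, n_candidates) score matrix, e.g. brain-to-speech
        similarities produced by a decoding model (hypothetical inputs).
    targets: (n_samples,) index of the true speech segment for each sample.
    """
    # Indices of the k best-scoring candidates for each sample.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A hit if the true segment index appears among those k candidates.
    hits = (top_k == targets[:, None]).any(axis=1)
    return float(hits.mean())

# Toy usage: 5 samples scored against a vocabulary of 793 candidate segments.
rng = np.random.default_rng(0)
scores = rng.standard_normal((5, 793))
true_ids = rng.integers(0, 793, size=5)
print(top_k_accuracy(scores, true_ids, k=10))
```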
Decoding speech from brain activity has been a long-standing goal of neuroscientists and clinicians, but most of the progress has relied on invasive brain-recording techniques, such as stereotactic electroencephalography and electrocorticography. These devices provide clearer signals than noninvasive methods but require neurosurgical interventions. While results from that work suggest that decoding speech from recordings of brain activity is feasible, decoding speech with noninvasive approaches would provide a safer, more scalable solution that could ultimately benefit many more people. This is very challenging, however, since noninvasive recordings are notoriously noisy and can vary greatly across recording sessions and individuals, owing in part to differences in each person's brain and in where the sensors are placed.