Tuochao Chen, a University of Washington doctoral student, recently toured a museum in Mexico. Chen doesn’t speak Spanish, so he ran a translation app on his phone and pointed the microphone at the tour guide. But even in a museum’s relative quiet, the surrounding noise was too much. The resulting text was useless.
Various technologies promising fluent translation have emerged in recent years, but none has solved the problem Chen faced in public spaces. Meta's new glasses, for instance, function only with an isolated speaker; they play an automated voice translation after the speaker finishes.
Now, Chen and a team of UW researchers have designed a headphone system that translates several speakers at once while preserving the direction and qualities of each person's voice. The team built the system, called Spatial Speech Translation, with off-the-shelf noise-canceling headphones fitted with microphones. Its algorithms separate the different speakers in a space, follow them as they move, translate their speech, and play it back with a 2–4 second delay.
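The article does not publish the team's code, but the pipeline it describes (separate the speakers, translate each one, render the translation from the speaker's original direction, and play it back after a delay) can be sketched at a high level. Below is a minimal, hypothetical Python illustration with stubbed models and invented function names; it is not the researchers' implementation, only a sketch of the stages the article names.

```python
import numpy as np

SAMPLE_RATE = 16_000      # assumed audio sample rate for this sketch
PLAYBACK_DELAY_S = 3.0    # within the article's reported 2-4 second window

def separate_speakers(mixture: np.ndarray, n_speakers: int) -> list[np.ndarray]:
    """Placeholder for source separation. A real system would run a neural
    separation model on the multi-microphone input; here each 'speaker'
    is just an equal split of the mixture, purely for illustration."""
    return [mixture / n_speakers for _ in range(n_speakers)]

def translate(speech: np.ndarray) -> np.ndarray:
    """Placeholder for speech-to-speech translation that preserves the
    speaker's voice qualities. Returns the input unchanged here."""
    return speech

def spatialize(mono: np.ndarray, azimuth_deg: float) -> np.ndarray:
    """Crude stereo panning so the translated speech appears to come from
    the original speaker's direction. A real system would use binaural
    cues estimated from the headset's microphones."""
    pan = (azimuth_deg + 90) / 180           # map [-90, 90] degrees to [0, 1]
    left = mono * np.sqrt(1 - pan)
    right = mono * np.sqrt(pan)
    return np.stack([left, right], axis=1)   # shape: (samples, 2)

def pipeline(mixture: np.ndarray, directions_deg: list[float]) -> np.ndarray:
    """Separate -> translate -> spatialize each speaker, then mix the
    results and prepend silence to model the playback delay."""
    sources = separate_speakers(mixture, len(directions_deg))
    mixed = sum(spatialize(translate(s), az)
                for s, az in zip(sources, directions_deg))
    delay = np.zeros((int(PLAYBACK_DELAY_S * SAMPLE_RATE), 2))
    return np.concatenate([delay, mixed])

if __name__ == "__main__":
    mic_input = np.random.randn(SAMPLE_RATE * 2)        # 2 s of fake audio
    stereo = pipeline(mic_input, directions_deg=[-45.0, 30.0])
    print(stereo.shape)   # (80000, 2): 3 s of delay plus 2 s of audio
```

The key design point the article highlights is that every stage runs per speaker, so each translated voice can be placed back at its source's direction rather than collapsed into a single synthetic narrator.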