Using an AI model that transcribes audio into text, the researchers behind the study could map the brain activity that takes place during conversation more accurately than they could with traditional models that encode specific features of language structure, such as phonemes (the simple sounds that make up words) and parts of speech (nouns, verbs and adjectives).
The model used in the study, called Whisper, instead takes audio files and their text transcripts as training data and learns a statistical mapping from the audio to the text. It then uses the statistics of that mapping to “learn” to predict text from new audio files that it hasn’t previously heard.
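For readers curious what that transcription step looks like in practice, here is a minimal sketch using the openly released openai-whisper Python package. The model checkpoint name and audio file name are placeholders chosen for illustration, and the snippet shows only Whisper’s audio-to-text prediction, not the brain-mapping analysis from the study.

```python
# Minimal sketch: transcribing speech with the open-source Whisper model.
# Assumes the openai-whisper package is installed (pip install openai-whisper).
import whisper

# Load a pretrained checkpoint; "base" is a small model that trades
# accuracy for speed. Larger checkpoints ("medium", "large") exist too.
model = whisper.load_model("base")

# "conversation.wav" is a placeholder file name, not data from the study.
# Whisper applies the statistical audio-to-text mapping it learned during
# training to predict a transcript for audio it has never heard before.
result = model.transcribe("conversation.wav")
print(result["text"])
```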