The AI revolution that has begun to transform our lives over the past three years rests on a fundamental linguistic principle underlying large language models such as ChatGPT. Words in a natural language are not strung together at random; they follow a statistical structure that allows a model to predict the next word from the words that came before. Yet these models overlook a crucial dimension of human communication: content that is not conveyed by words.
In a new study published in the Proceedings of the National Academy of Sciences, researchers from Prof. Elisha Moses’s lab at the Weizmann Institute of Science reveal that the melody of spontaneous spoken English functions as a distinct language, with a “vocabulary” of hundreds of basic melodies and even rules of syntax that make it possible to predict the next melody in a sequence. The study lays the foundation for artificial intelligence that will understand language beyond words.
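To make the prediction claim concrete, here is a minimal sketch of the idea in Python: if spoken melodies really form a finite “vocabulary,” then, just as a language model predicts the next word, a simple statistical model over symbolic prosodic units can predict which melody is likely to come next. The unit labels (rise, fall, plateau), the toy utterances and the bigram approach below are illustrative assumptions, not the actual inventory or model used in the study.

```python
from collections import Counter, defaultdict

# Toy illustration: treat each utterance as a sequence of symbolic "prosodic
# units". The labels and sequences here are invented for illustration; they
# are not the melody inventory identified in the PNAS study.
utterances = [
    ["rise", "plateau", "fall"],
    ["rise", "plateau", "fall", "fall"],
    ["plateau", "rise", "fall"],
    ["rise", "plateau", "fall"],
]

# Count how often each unit follows each other unit (a bigram model),
# the simplest way to capture "what tends to come next".
bigram_counts = defaultdict(Counter)
for units in utterances:
    for prev, nxt in zip(units, units[1:]):
        bigram_counts[prev][nxt] += 1

def predict_next(prev_unit):
    """Return the most frequent next prosodic unit given the previous one."""
    followers = bigram_counts.get(prev_unit)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

print(predict_next("rise"))     # "plateau" in this toy corpus
print(predict_next("plateau"))  # "fall"
```

The same logic, scaled up to a vocabulary of hundreds of melodic units and richer context, is what “predicting the next melody in a sequence” amounts to in statistical terms.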
The melody, or music, of speech, known by the linguistic term “prosody,” encompasses variations in pitch (intonation), loudness (used, for example, for emphasis), tempo and voice quality (such as a whisper or creaky voice). This form of expression predates words in evolution: recent studies reveal that both chimpanzees and whales incorporate complex prosodic structures into their communication.