How does the brain turn sound into meaning? Study sheds light

A new study has investigated which neurons respond to different vocal pitches, distinguish between different voices, and react to emphasis. The findings help us to understand how the brain derives meaning from the sound of speech.

Whether it is discerning between a question and a statement, detecting the phenomenon of "upspeak," or simply figuring out when a person is angry, our brains are constantly at work distinguishing innumerable variations in sounds and gaining meaning from them.

This is all the more impressive when we consider that people have different voices, each with its own pitch, and that while discerning these minor differences within a person's voice, the human brain also breaks down the sound of speech into consonants, vowels, and word units. All of this is done, of course, at remarkable speed.

New research carried out by scientists at the University of California, San Francisco (UCSF) examines how the brain processes the subtle changes in vocal pitch, or intonation, during speech. These patterns of sound - called prosody by scientists and poets - are crucial to our ability to gain meaning from sound.

The findings were published in the journal Science.

As the authors explain, previous research in primates has located areas in the brain that respond to pitch and intonation, but these studies did not go into further depth to identify how neurons in these areas pick up prosody and help the brain to process it into meaning.

The new research - led by study co-author Claire Tang, a fourth-year graduate student in the laboratory of senior study author Dr. Edward Chang, a professor of neurological surgery at the UCSF Weill Institute for Neurosciences - aimed to do just that.

Studying neurons in brain's auditory cortex

Tang and colleagues recruited 10 participants and asked them to listen to four sentences. These sentences were recorded by three different synthesized voices.

Each of the sentences was spoken under four different intonation conditions: neutral, emphasis 1 (emphasizing the first word in the sentence), emphasis 3 (emphasizing the third word), and question.

For instance, one sentence was "Movies demand minimal energy." It was first said as a neutral statement, then with emphasis on the first word ("Movies"), thirdly with emphasis on the third word ("minimal"), and finally as a question: "Movies demand minimal energy?"

Using high-density electrocorticography - in which the participants had tiny electrodes placed at a high density over the surface of their brains - Tang and team monitored the neuronal activity of a brain area called the superior temporal gyrus (STG).

The STG is known to play a key role in the recognition of prosody and spoken words, as it houses the primary auditory cortex of the human brain.

To assess how neurons in this area react to different variables, the team designed a set of conditions wherein these sentences were spoken varying the intonation contour, the phonetic content - that is, a sentence that starts with the word "Movies" is different in sound from one that starts with the word "Reindeer" - or the speaker's identity.
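The design described above can be sketched as a simple enumeration. This is only an illustration, and it assumes that every sentence was paired with every voice and every intonation condition, which the article implies but does not state outright. The sentence and voice labels are placeholders, not the study's own names.

```python
from itertools import product

# Placeholder labels (assumptions): the article quotes only one full
# sentence, so sentences and voices are identified by index here.
SENTENCES = ["sentence_1", "sentence_2", "sentence_3", "sentence_4"]
VOICES = ["voice_1", "voice_2", "voice_3"]

# The four intonation conditions named in the article.
INTONATIONS = ["neutral", "emphasis_1", "emphasis_3", "question"]


def build_stimuli():
    """Return every sentence x voice x intonation combination."""
    return [
        {"sentence": s, "voice": v, "intonation": i}
        for s, v, i in product(SENTENCES, VOICES, INTONATIONS)
    ]


stimuli = build_stimuli()
print(len(stimuli))  # 4 sentences * 3 voices * 4 intonations = 48
```

Crossing the three variables in this way is what would let a response be attributed to one of them: a neuron that tracks intonation should respond similarly to the same condition across all sentences and voices.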

How STG neurons react to speech sound

The researchers identified not only neurons in the STG that could "tell" the difference between the three synthesized voices but also neurons that could discern between the four sentences, regardless of the voice that was uttering them.

More specifically, the scientists saw increased activity in certain neurons when participants heard the different sets of sounds that made up the sentences. These neurons were working to recognize the sentences based on the sounds - or phonemes - irrespective of the voice.

A last group of neurons could "tell" the difference between the four intonation contours. These brain cells had higher or lower activity depending on the emphasis in a sentence, regardless of which sentence it was or the voice that was uttering it.

To verify their findings, Tang and team designed an algorithm to predict which neurons would react to various sentences uttered by different speakers, and how they would react.

They found that the class of neurons responsible for distinguishing different voices "focused" on the so-called absolute pitch, while the brain cells that reacted to intonation focused on the so-called relative pitch.

Absolute pitch refers to the overall pitch of an individual speaker's voice, which differs from one speaker to another, while relative pitch refers to how pitch rises and falls within the same voice.
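The distinction between absolute and relative pitch can be illustrated with a small numerical sketch. The article does not say how the researchers computed relative pitch, so the normalization used here (semitones relative to each speaker's own average log-pitch) is an assumption, and the pitch values are made-up examples rather than data from the study.

```python
import math


def relative_pitch(contour_hz):
    """Convert a pitch contour in Hz to semitones relative to the
    speaker's own mean log-pitch (an assumed normalization)."""
    log_f0 = [math.log2(f) for f in contour_hz]
    baseline = sum(log_f0) / len(log_f0)
    return [12 * (x - baseline) for x in log_f0]


# Hypothetical contours: two voices producing the same rising pattern,
# one exactly an octave above the other.
low_voice = [100.0, 110.0, 130.0]
high_voice = [200.0, 220.0, 260.0]

rel_low = relative_pitch(low_voice)
rel_high = relative_pitch(high_voice)

# The absolute pitches differ, but the relative contours match
# (up to floating-point rounding).
same_contour = all(abs(a - b) < 1e-9 for a, b in zip(rel_low, rel_high))
print(same_contour)
```

In this toy case, the absolute values would distinguish the two speakers, while the normalized contour captures the shared rise in pitch regardless of who is speaking.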

"One of [our] missions is to understand how the brain converts sounds into meaning," says Tang. "What we're seeing here is that there are neurons in the brain's neocortex that are processing not just what words are being said, but how those words are said."

"We were able to show not just where prosody is encoded in the brain, but also how, by explaining the activity in terms of specific changes in vocal pitch."

Claire Tang

"Now, a major unanswered question is how the brain controls our vocal tracts to make these intonational speech sounds," adds Dr. Chang. "We hope we can solve this mystery soon."