While cutting and reassembling recordings of speech, scientists from Duke University in Durham, NC, and Massachusetts Institute of Technology in Cambridge have located an area of the brain that is sensitive to the timing of speech.

The superior temporal sulcus showed larger responses to speech quilts made up of longer segments than shorter speech quilts.

Speech timing is a critical element of spoken language and a core part of the structure of human speech. To understand speech, the brain must integrate the rapidly evolving information of phonemes, syllables and words.

Phonemes are the shortest, most basic units of speech and last around 30-60 milliseconds. By comparison, syllables take a longer 200-300 milliseconds, and whole words are longer still.

To cope with the flood of information, it is likely that the human auditory system takes shortcuts to process the sounds effectively.

Study co-author Tobias Overath, an assistant research professor of psychology and neuroscience at Duke, reports in the journal Nature Neuroscience that one such shortcut may be the auditory system sampling chunks of information similar in length to an average consonant or syllable.

Overath and collaborators, including corresponding author Josh McDermott from Massachusetts Institute of Technology (MIT), used an innovative algorithm to create new sounds from cut and reassembled 30-960-millisecond chunks of foreign speech recordings. The authors refer to these new sounds as "speech quilts."

The shorter the segments in a speech quilt, the greater the disruption to the original speech structure.
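The cut-and-reorder idea behind the quilts can be illustrated with a minimal sketch. This is not the authors' published algorithm, which also smooths segment boundaries and matches acoustic features across joins; the function name and parameters here are hypothetical, chosen only to show how fixed-length chunks of a signal can be reassembled in a new order:

```python
import numpy as np

def make_speech_quilt(signal, sample_rate, segment_ms, seed=0):
    """Cut a 1-D audio signal into fixed-length segments and
    reassemble them in random order (a simplified 'quilt').

    Note: the published algorithm additionally smooths the joins
    between segments; this sketch shows only the cut-and-reorder step.
    """
    seg_len = int(sample_rate * segment_ms / 1000)   # samples per segment
    n_segs = len(signal) // seg_len                  # drop any remainder
    segments = signal[:n_segs * seg_len].reshape(n_segs, seg_len)
    order = np.random.default_rng(seed).permutation(n_segs)
    return segments[order].ravel()

# Example: one second of noise at 16 kHz, quilted with 30 ms segments.
audio = np.random.default_rng(1).standard_normal(16000)
quilt = make_speech_quilt(audio, sample_rate=16000, segment_ms=30)
```

With 30 ms segments the signal is scrambled into many small pieces; with 960 ms segments far fewer cuts are made, so much more of the original temporal structure survives.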

The scientists measured the activity of neurons by playing speech quilts to study participants while scanning their brains in a functional magnetic resonance imaging (fMRI) machine.

[Audio: cut-up recordings of German speech, reassembled and played to participants in the brain-scanning machine to study how the brain processes the sounds of speech. Credit: Tobias Overath, Duke University]

The team theorized that brain areas involved in sound processing would show larger responses to speech quilts made up of longer segments.

The theory proved partly correct: a region of the brain called the superior temporal sulcus (STS) became highly active during the 480- and 960-millisecond quilts compared with the 30-millisecond quilts. In contrast, other areas of the brain involved in processing sound did not change their response to the different quilts.

"That was pretty exciting. We knew we were onto something," comments Overath, who is also a member of the Duke Institute for Brain Sciences.

The STS is known to integrate auditory and other sensory information. However, no previous study had shown the STS to be sensitive to time structures in speech.

STS unresponsive to quilting manipulation applied to control sounds

To test whether the STS activation was driven by the temporal structure of speech, and to rule out other explanations, the researchers created and tested several control sounds that mimicked speech.

Of the three synthetic sounds they created and tested, the first shared the frequency range of speech but lacked its rhythms; the second had the pitch removed from the speech; and the third used environmental sounds.

Each of the control sounds was chopped into either 30- or 960-millisecond pieces and stitched back together as a quilt before being played to participants.

In contrast to the response to the speech quilts, the STS was not responsive to the quilting manipulation when it was applied to the control sounds.

Overath says:

"We really went to great lengths to be certain that the effect we were seeing in STS was due to speech-specific processing and not due to some other explanation, for example, pitch in the sound or it being a natural sound as opposed to some computer-generated sound."

Future research for the group will examine whether the STS responds similarly to foreign speech that is phonetically dissimilar to English, such as Mandarin, and to quilts of familiar speech that is intelligible and carries meaning.

The team expects that familiar speech may lead to stronger activation on the left side of the brain, which is thought to be dominant in processing language.