The resonant dynamics of speech perception: Interword integration and duration-dependent backward effects

Stephen Grossberg, Christopher W. Myers
2000 Psychological Review
How do listeners integrate temporally distributed phonemic information into coherent representations of syllables and words? During fluent speech perception, variations in the durations of speech sounds and silent pauses can produce different perceived groupings. For example, increasing the silence interval between the words "gray chip" may result in the percept "great chip", whereas increasing the duration of fricative noise in "chip" may alter the percept to "great ship" (Repp et al., 1978).
The ARTWORD neural model quantitatively simulates such context-sensitive speech data. In ARTWORD, sequential activation and storage of phonemic items in working memory provides bottom-up input to unitized representations, or list chunks, that group together sequences of items of variable length. The list chunks compete with each other as they dynamically integrate this bottom-up information. The winning groupings feed back to provide top-down support to their phonemic items. Feedback establishes a resonance which temporarily boosts the activation levels of selected items and chunks, thereby creating an emergent conscious percept. Because the resonance evolves more slowly than working memory activation, it can be influenced by information presented after relatively long intervening silence intervals. The same phonemic input can hereby yield different groupings depending on its arrival time. Processes of resonant transfer and competitive teaming help determine which groupings win the competition. Habituating levels of neurotransmitter along the pathways that sustain the resonant feedback lead to a resonant collapse that permits the formation of subsequent resonances.

[...] /tʃ/ ("ch"). Remarkably, without changing the amount of silence separating the words, a variation in the initial segment of the second word can alter perception of the first word. The boundary between regions 2 and 3 reveals, moreover, a trading relation between silence and noise durations. At longer silence durations, longer noise durations are required in order to cue a switch [...]

[...] /tʃ/ in /tʃa/ can "switch" to the fricative /ʃ/ when the following vowel /a/ is shortened (Kluender & Walsh, 1988). These durational contrast phenomena illustrate how changing the relative duration of the working memory inputs (for example, how /b/ is processed relative to a short or long /a/) can change the hypotheses selected by the grouping network (/ba/ or /wa/). Recently, Boardman et al. (1999) developed a working memory model, called PHONET, that was used to quantitatively simulate how the /ba/-/wa/ distinction depends on the subsequent vowel
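
The resonance-then-collapse idea described in the abstract can be illustrated with a toy simulation. The sketch below is not the ARTWORD model and does not use its published equations: it assumes a single item node, a single chunk node, a generic shunting activation law, and a generic habituative transmitter gate on the top-down pathway. All names and parameter values (`simulate`, `chunk_at`, `feedback_gain`, `bottom_up`, `threshold`, `recover`, `deplete`, `input_off`) are hypothetical choices made only for illustration.

```python
def simulate(feedback_gain=5.0, t_end=80.0, dt=0.01):
    """Euler-integrate a toy item<->chunk loop with a habituating feedback gate.

    Illustrative assumptions only; this is not the ARTWORD equation set.
    Returns a list of (time, item, chunk, transmitter) tuples.
    """
    bottom_up = 4.0                 # item -> chunk weight (assumed)
    threshold = 0.2                 # chunk sends feedback only above this level
    recover, deplete = 0.002, 0.2   # slow transmitter recovery / depletion rates
    input_off = 3.0                 # external input is on only while t < input_off

    x, y, z = 0.0, 0.0, 1.0         # item activity, chunk activity, transmitter
    trace = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        inp = 1.0 if t < input_off else 0.0
        signal = max(y - threshold, 0.0)            # thresholded chunk output
        # Shunting dynamics: passive decay plus bounded excitation.
        dx = -x + (1.0 - x) * (inp + feedback_gain * signal * z)
        dy = -y + (1.0 - y) * bottom_up * x
        # Transmitter recovers slowly toward 1 and is depleted by feedback use.
        dz = recover * (1.0 - z) - deplete * signal * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        trace.append((t + dt, x, y, z))
    return trace


def chunk_at(trace, t, dt=0.01):
    """Chunk activity at (approximately) time t; dt must match simulate()."""
    return trace[int(round(t / dt)) - 1][2]


if __name__ == "__main__":
    resonant = simulate()
    control = simulate(feedback_gain=0.0)   # same input, no top-down feedback
    for t in (2.0, 10.0, 80.0):
        print(t, round(chunk_at(resonant, t), 3), round(chunk_at(control, t), 3))
```

In this sketch the chunk stays active well after the input ends only when feedback is present, and it eventually shuts off once the transmitter gate has depleted enough that the loop can no longer sustain itself. The slow transmitter rates relative to the activation rates are what let the loop outlast its input before collapsing.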
doi:10.1037//0033-295x.107.4.735 fatcat:6rlao2pu4nf25g7fn6buyar35a