Temporal scaling of neural responses to compressed and dilated natural speech

Y. Lerner, C. J. Honey, M. Katkov, U. Hasson
2014 Journal of Neurophysiology  
Different brain areas integrate information over different timescales, and this capacity to accumulate information increases from early sensory areas to higher order perceptual and cognitive areas. It is currently unknown whether the timescale capacity of each brain area is fixed or whether it adaptively rescales depending on the rate at which information arrives from the
... Here, using functional MRI, we measured brain responses to an auditory narrative presented at different rates. We asked whether neural responses to slowed (speeded) versions of the narrative could be compressed (stretched) to match neural responses to the original narrative. Temporal rescaling was observed in early auditory regions (which accumulate information over short timescales) as well as linguistic and extra-linguistic brain areas (which can accumulate information over long timescales). The temporal rescaling phenomenon started to break down for stimuli presented at double speed, and intelligibility was also impaired for these stimuli. These data suggest that 1) the rate of neural information processing can be rescaled according to the rate of incoming information, both in early sensory regions as well as in higher order cortexes, and 2) the rescaling of neural dynamics is confined to a range of rates that match the range of behavioral performance.
Keywords: fMRI; real-life auditory stimuli; speed of information processing; slow and fast rates of speech
Real-life events, such as listening to a speech or watching a movie, unfold over many minutes. During such events, our brains absorb information continuously for the duration of the experience. The information gathered at each particular moment, however, only becomes meaningful in the context of previous events. For example, the meaning of a word depends on its context within a sentence, and each sentence only achieves full meaning in the context of the larger narrative. Recently, we suggested that the brain handles this nested temporal complexity by increasing its processing timescale from low level sensory areas to high level frontal and parietal areas (Hasson et al. 2008; Lerner et al. 2011).
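The rescaling question posed in the abstract (can a response to a slowed or speeded narrative be compressed or stretched to match the response to the original?) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names `rescale_to_length` and `rescaling_similarity`, the use of linear interpolation, and the synthetic sinusoidal "responses" are all assumptions made for illustration only.

```python
import numpy as np


def rescale_to_length(response, target_len):
    """Linearly interpolate a 1-D time course onto target_len evenly spaced samples.

    Hypothetical helper: compresses a longer response or stretches a shorter one.
    """
    response = np.asarray(response, dtype=float)
    src = np.linspace(0.0, 1.0, num=response.size)  # normalized time of the input samples
    dst = np.linspace(0.0, 1.0, num=target_len)     # normalized time of the output samples
    return np.interp(dst, src, response)


def rescaling_similarity(original, altered):
    """Pearson correlation between the original response and the altered-rate
    response after rescaling it to the original's length (hypothetical helper)."""
    original = np.asarray(original, dtype=float)
    rescaled = rescale_to_length(altered, original.size)
    return float(np.corrcoef(original, rescaled)[0, 1])


# Synthetic response to the original-rate stimulus (200 samples).
t = np.linspace(0.0, 1.0, 200)
original = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

# Stimulus slowed to half speed: the recording is twice as long (400 samples).
t_slow = np.linspace(0.0, 1.0, 400)

# Assumed "rescaling" region: the same response pattern, stretched in time.
stretched = np.sin(2 * np.pi * 3 * t_slow) + 0.5 * np.sin(2 * np.pi * 7 * t_slow)

# Assumed "fixed timescale" region: dynamics keep their original rate,
# so the longer recording contains twice as many cycles.
fixed = np.sin(2 * np.pi * 6 * t_slow) + 0.5 * np.sin(2 * np.pi * 14 * t_slow)

r_rescaling = rescaling_similarity(original, stretched)
r_fixed = rescaling_similarity(original, fixed)
print(f"rescaling region: r = {r_rescaling:.3f}")
print(f"fixed-timescale region: r = {r_fixed:.3f}")
```

In this toy setting, a high correlation after compression is what a region with temporally rescaled dynamics would produce, whereas a region whose dynamics do not follow the stimulus rate would show a correlation near zero. Real fMRI time courses would additionally require handling of noise, hemodynamic lag, and comparisons across subjects, none of which is modeled here.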
By analogy with the notion of a spatial receptive field (SRF), the temporal receptive window (TRW) of a neural circuit can be defined as the length of time before a response during which sensory information may affect that response. TRWs are short in sensory areas and become gradually longer toward higher order areas (see MATERIALS AND METHODS and Fig. 1).
The problem of processing temporally nested information is complicated by the fact that real-life information arrives at varying rates. For example, the fastest American English speakers articulate a sentence about twice as fast as the slowest speakers (Smith 2002). Because human listeners can comprehend spoken language with remarkable robustness to speech rate, their brains must be capable of integrating patterns of information over multiple timescales (e.g., combining sequences of words within and across sentences), while maintaining a functional invariance to the absolute arrival time of each item.
Numerous brain mechanisms can produce functional invariance to absolute speech timing. For example, rate invariance can be achieved in a system in which the processing timescale is absolutely fixed, so that if speech signals arrive at different rates then a different region is needed to process the same information. In this system, temporal rescaling of the responses would not be observed within any area; instead, the various aspects of speech processing would switch between brain regions as the speech rate is varied. A second model would achieve functional invariance via a memory buffer system (e.g., in the early auditory cortex), which accepts and accumulates information at a variable rate but then transmits its output at a constant (perhaps optimal) rate to higher order brain areas. In such a system we would observe rescaling of the neural responses only up to the buffer level of the neural hierarchy but not in higher order regions.
Finally, in a model suggested by Gütig and Sompolinsky (2009), functional invariance can be generically achieved by adaptive scaling of the firing rate of neurons according to the incoming information rate. The firing rate, and its modulation, are accelerated (decelerated) when the information rate is high (low). Under this model we would expect response rescaling in all regions.
Partial support for the rescaling hypothesis comes from electrophysiological and neuroimaging studies that reported, for early auditory areas, rescaling of neural responses with speech rate (Ahissar et al. 2001; Poldrack et al. 2001; Nourski et al. 2009; Mukamel et al. 2011; Vagharchakian et al. 2012; Peelle et al. 2013). However, rescaling of responses in these areas is unsurprising, given that the responses in early auditory areas, with short TRWs, are locked to the momentary low level properties of the audio envelope, which scales with the speech rate. A crucial test of the rescaling hypothesis concerns the responses in higher order areas, which have longer TRWs. Response rescaling is more challenging in higher order areas, because processing in these areas must be sensitive to the temporal relations over seconds of time (e.g., word sequences
doi:10.1152/jn.00497.2013 pmid:24647432 pmcid:PMC4044438 fatcat:zmhsalt475djlbhxm5qpdzvaii