A composite model of the auditory periphery for the processing of speech based on the filter response functions of single auditory‐nerve fibers

Rick L. Jenison, Steven Greenberg, Keith R. Kluender, William S. Rhode
1991 Journal of the Acoustical Society of America  
A composite model of the auditory periphery, based upon a unique analysis technique for deriving filter response characteristics from cat auditory-nerve fibers, is presented. The model is distinctive in its ability to capture a significant broadening of auditory-nerve fiber frequency selectivity as a function of increasing sound-pressure level within a computationally tractable time-invariant structure. The output of the model shows the tonotopic distribution of synchrony activity of single
fibers in response to the steady-state vowel [ε] presented over a 40-dB range of sound-pressure levels and is compared with the population-response data of Young and Sachs (1979). The model, while limited by its time invariance, accurately captures most of the place-synchrony response patterns reported by the Johns Hopkins group. In both the physiology and the model, auditory-nerve fibers spanning a broad tonotopic range synchronize to the first formant (F1), with the proportion of units phase-locked to F1 increasing appreciably at moderate to high sound-pressure levels. A smaller proportion of fibers maintain phase locking to the second and third formants across the same intensity range. At sound-pressure levels of 60 dB and above, the vast majority of fibers with characteristic frequencies greater than 3 kHz synchronize to F1 (512 Hz), rather than to frequencies in the most sensitive portion of their response range. On the basis of these response patterns it is suggested that neural synchrony is the dominant auditory-nerve representation of formant information under "normal" listening conditions in which speech signals occur across a wide range of intensities and against a background of unpredictable and frequently intense acoustic interference.

Experimental studies by Rhode (1971) and others (e.g., Johnstone et al., 1986; Robles et al., 1986) have demonstrated that the motion of the basilar membrane (BM) is highly nonlinear. The input-output function for basilar membrane motion becomes highly compressive at moderate-to-high sound-pressure levels (SPLs) typical of conversational speech (60-70 dB SPL). One consequence of such compression is a broadening of the membrane's frequency selectivity, which is reflected in the filtering characteristics of auditory-nerve (AN) fibers, particularly in those most sensitive to frequencies above 3 kHz.
For high SPLs, the frequency response of both the BM and high-CF AN fibers approximates a low-pass filter. Such broadening in frequency selectivity is observed in the "tail" component of the frequency threshold curve of high-characteristic-frequency (CF) AN fibers and in the basalward spread of activity evoked by low-frequency signals at high SPLs. For example, at 70 dB SPL, a 1-kHz sinusoid will produce a response across a very large proportion of AN fibers whose CFs range between 0.5 and 8 kHz (Pfeiffer and Kim, 1975; Kim et al., 1980). We believe that such upward spread of activity is especially important for coding certain features of the speech signal, such as the first formant. Because most models of the auditory periphery assume
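
The abstract describes fibers as synchronizing (phase-locking) to the first formant at 512 Hz. As a rough illustration of how such synchrony can be quantified, the sketch below computes a vector-strength style synchronization index for a spike train at a chosen frequency. This is a generic measure and is not claimed to be the analysis used in the paper; the function name `synchronization_index`, the synthetic spike train, the 0.1 ms jitter, and the 1792 Hz comparison frequency are all assumptions made for illustration.

```python
import cmath
import math
import random


def synchronization_index(spike_times_s, freq_hz):
    """Vector strength of spike times (in seconds) relative to freq_hz.

    Returns a value near 1 when spikes cluster at one phase of the
    sinusoid and near 0 when spike phases are spread evenly over the cycle.
    """
    if not spike_times_s:
        return 0.0
    # Map each spike to a unit vector at its phase, then average the vectors.
    total = sum(cmath.exp(1j * 2.0 * math.pi * freq_hz * t) for t in spike_times_s)
    return abs(total) / len(spike_times_s)


# Hypothetical spike train (not data from the paper): one spike per cycle
# of a 512 Hz component, with Gaussian timing jitter of 0.1 ms.
random.seed(0)
F1_HZ = 512.0
spikes = [k / F1_HZ + random.gauss(0.0, 1e-4) for k in range(200)]

si_f1 = synchronization_index(spikes, F1_HZ)      # frequency the spikes follow
si_other = synchronization_index(spikes, 1792.0)  # arbitrary comparison frequency

print(f"synchrony at {F1_HZ:.0f} Hz: {si_f1:.2f}")
print(f"synchrony at 1792 Hz: {si_other:.2f}")
```

Under these assumptions the index should come out high at 512 Hz and low at the comparison frequency, which is the sense in which a fiber is said to be phase-locked to one spectral component rather than another.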
doi:10.1121/1.401947 pmid:1939884 fatcat:yvcgdx4ik5apfggnib74zgen4y