Modeling auditory processing of amplitude modulation

Torsten Dau
1997 Journal of the Acoustical Society of America  
Preface

A detailed knowledge of the processes involved in hearing is an essential prerequisite for numerous medical and technical applications, such as the diagnosis and treatment of hearing disorders, the construction and fitting of digital hearing aids, public address systems in theaters and other auditoria, and speech processing in telecommunication and man-machine interaction. Although much is known about the physiology and psychology of hearing as well as the "effective" signal processing in the auditory system, many unsolved problems remain, and even more fascinating properties of the human ear have yet to be characterized. This is one of the primary goals of the interdisciplinary graduate college "Psychoacoustics" at the University of Oldenburg, where physicists, psychologists, computer scientists, and physicians (specialized in audiology) pursue an interdisciplinary approach towards a better understanding of hearing and its various applications. Within this graduate college, approximately 25 Ph.D. students carry out their Ph.D. work and training program in an interdisciplinary context. The current issue is based on the doctoral dissertation by Torsten Dau and is one of the most outstanding "outputs" of this graduate college.

Torsten Dau's work is focussed on the quantitative modeling of the auditory system's performance in psychoacoustical experiments. Rather than trying to model each physiological detail of auditory processing, his approach is to focus on the "effective" signal processing in the auditory system, using as few physiological assumptions and physical parameters as necessary while trying to predict as many psychoacoustical aspects and effects as possible. While his previous work has focussed on temporal effects of auditory processing, Torsten Dau's dissertation focuses on the perception and processing of amplitude modulations. This topic is of particular importance because most natural signals (including speech) are characterized by amplitude modulations and, in addition, physiological data provide evidence of specialized amplitude modulation processing systems in the brain. Thus, an adequate modeling of modulation perception should be a key to the quantitative understanding of the functioning of our ear.
The current work now presents a new quantitative signal processing model and validates this model using "critical" experiments from the literature as well as data from the author's own experiments. The main chapters of the current work (chapters 2-4) are self-contained papers that have already been submitted in a modified version to scientific journals. The first of these main parts (chapter 2) develops the structure of the processing model by developing a kind of "artificial" listener, i.e., a computer model which is fed with the same signals as in the psychoacoustical experiments performed with human listeners and is constructed to predict the responses on a trial-by-trial basis. The special feature of this model is the modulation filterbank, which forms an essential improvement over previous versions of the model. The current modeling approach reflects the close cooperation between the research groups at the "Drittes Physikalisches Institut" in Göttingen, the IPO in Eindhoven, and the University of Oldenburg, and is based on many years of experience in psychoacoustic research. With this modulation filterbank, several effects of modulation detection and modulation masking can be explained in a very exact and intriguing way. In addition, analytical calculations are presented that deal with the modulation spectra of bandpass-filtered signals. Also, an extensive comparison is made between the author's own measurements, the model predictions, and results from the literature. Thus, a large body of data and several compelling arguments are collected that favour the model structure developed here.

Chapter 3 extends the model, which was originally designed to deal with narrow-band signals, to the important case of broad-band signals and to a larger temporal range.
The intriguing "trick" used by Torsten Dau is to simultaneously evaluate several auditory channels with a combined "optimum" detector so that an equivalence exists between the evaluation of several narrow-band signals and a single broad-band signal. Since previous models of modulation processing from the literature assume such a broad-band analysis, this approach bridges the gap between these previous models and the model developed here. A similar principle is used for the temporal domain, where the temporal extension of the signal yields a better detectability of amplitude modulations. This increase in detectability can be described in an intriguing way by an appropriate choice of the optimum detector. This concept thus yields a mathematical formulation of the "multiple-look strategy" often referred to in the literature. As in the previous chapter, Torsten Dau can predict both his own experimental data and the data from the literature.

The fourth chapter finally deals with the special case of amplitude modulation of sinusoidal carriers at very high frequencies, where the coding of information in the central nervous system does not allow for a unique temporal representation of acoustical signals. Because of this effect, previous studies from the literature could not describe the results of modulation perception experiments in a satisfactory way. Torsten Dau can now show in a very impressive way that his model structure is also capable of explaining these experimental data. Although the agreement between his predictions and the data is not as "perfect" as in the previous chapters, the possible causes for these discrepancies are explained in detail.

Taken together, the current work can be considered an important milestone in the quantitative description of the effective signal processing in the auditory system. Based on the modeling approach introduced here, the science of psychoacoustics can be put on a quantitative, numerical foundation.
Thus, it might eventually be possible to distinguish between "processing" factors and "psychological" factors contributing to the hearing process. These "processing" factors can be incorporated in a "computer ear" which might be the basis for future applications such as digital hearing aids, speech coders, and speech recognition systems. Thus, the current work should be of interest both to fundamental scientists (who seek to understand the functioning of the highly nonlinear and complex human auditory system) and to applied scientists (who seek to use auditory principles for the improvement of technical systems in hearing and speech technology). I hope that the reader will enjoy reading this work as much as I enjoyed working with Torsten on his dissertation, and that the reader might get some impression of the truly interdisciplinary spirit of the graduate college in Oldenburg.

Oldenburg, summer 1996

Abstract

In this thesis a new modeling approach is developed which is able to predict human performance in a variety of experimental conditions related to modulation detection and modulation masking. Envelope fluctuations are analyzed with a modulation filterbank. The parameters of the filterbank were adjusted to allow the model to account for modulation detection and modulation masking data with narrowband carriers at a high center frequency. In the range 0-10 Hz, the modulation filters have a constant bandwidth of 5 Hz. Between 10 and 1000 Hz a logarithmic scaling with a constant Q-value of 2 is assumed. This leads to the following predictions: For conditions in which the modulation frequency (f_mod) is smaller than half the bandwidth of the carrier (Δf), the model predicts an increase in modulation thresholds with increasing modulation frequency. This prediction agrees with the lowpass characteristic of the temporal modulation transfer function (TMTF) reported in the literature.
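The bandwidth rule stated above (5 Hz up to 10 Hz, constant Q of 2 from 10 to 1000 Hz) can be written down as a minimal Python sketch. The bandwidth function follows the abstract directly. The placement of the center frequencies is not given in this text, so the list below is an assumption: filters at 0, 5, and 10 Hz, followed by a geometric spacing with ratio 5/3, which is the ratio at which neighbouring Q = 2 filters meet at their band edges. All function and constant names are hypothetical.

```python
Q_VALUE = 2.0          # constant Q above the break frequency (from the abstract)
LOW_BANDWIDTH_HZ = 5.0  # constant bandwidth in the 0-10 Hz range (from the abstract)
BREAK_HZ = 10.0
MAX_HZ = 1000.0


def modulation_filter_bandwidth(center_hz):
    """Bandwidth in Hz of the modulation filter centered at center_hz."""
    if not 0.0 <= center_hz <= MAX_HZ:
        raise ValueError("center frequency must lie in 0-1000 Hz")
    if center_hz <= BREAK_HZ:
        return LOW_BANDWIDTH_HZ
    return center_hz / Q_VALUE  # Q = center frequency / bandwidth


def assumed_center_frequencies(step_ratio=5.0 / 3.0):
    """ASSUMPTION: center-frequency layout, not specified in the abstract.

    With Q = 2, a ratio of 5/3 makes the upper band edge of one filter
    (1.25 * fc) coincide with the lower band edge of the next (0.75 * fc_next).
    """
    centers = [0.0, 5.0, 10.0]
    while centers[-1] * step_ratio <= MAX_HZ:
        centers.append(centers[-1] * step_ratio)
    return centers


if __name__ == "__main__":
    for fc in assumed_center_frequencies():
        print(f"fc = {fc:7.2f} Hz, bandwidth = {modulation_filter_bandwidth(fc):6.2f} Hz")
```

Note that the two branches of the rule agree at 10 Hz, since 10 Hz / 2 = 5 Hz, so the bandwidth is continuous across the break frequency.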
Within the model this lowpass characteristic is caused by the logarithmic scaling of the modulation filter bandwidth. In conditions with f_mod > Δf/2, i.e., with the modulation frequency above half the carrier bandwidth Δf, the model can account for the highpass characteristic in the threshold function, reflecting the auditory system's frequency selectivity for modulation. In modulation detection conditions with carrier bandwidths larger than a critical band, the modulation analysis is performed in parallel within each excited peripheral channel. In the detection stage of the model, the outputs of all modulation filters from all excited peripheral channels are combined linearly and with optimal weights. The model accounts for the findings that (i) the "time constants" associated with the temporal modulation transfer functions (TMTFs) for bandlimited noise carriers do not vary with carrier center frequency and that (ii) the time constants associated with the TMTFs decrease monotonically with increasing carrier bandwidth. The model also accounts for data of modulation masking with broadband noise carriers. The predicted masking pattern produced by a narrowband noise along the modulation frequency scale is in very good agreement with results from the literature. To integrate information across time, a "multiple-look" strategy is realized within the detection stage. This strategy allows the model to account for long time constants derived from the data on modulation integration without introducing true long-term integration. Instead, the long "effective" time constants result from the combination of information from different "looks" via multiple sampling and probability summation. In modulation detection experiments with deterministic carriers (such as sinusoids), the limiting factor for detecting modulation within the model is the internal noise that is added as independent noise to the output of all modulation filters in all peripheral filters.
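The two combination ideas mentioned above, optimally weighted linear combination across channels and probability summation across looks, can be illustrated with textbook signal-detection formulas. This is a simplified sketch and not the detector used in the thesis: it assumes each channel or look is summarized by a sensitivity index d' with independent, unit-variance Gaussian noise, and all function names are hypothetical.

```python
import math


def weighted_dprime(weights, dprimes):
    """d' of a weighted sum of independent, unit-variance observations."""
    signal = sum(w * d for w, d in zip(weights, dprimes))
    noise_sd = math.sqrt(sum(w * w for w in weights))
    return signal / noise_sd


def optimal_dprime(dprimes):
    """Best achievable d' for a linear combination.

    Choosing weights proportional to each d' gives sqrt(sum of d'^2).
    """
    return math.sqrt(sum(d * d for d in dprimes))


def probability_summation(detection_probabilities):
    """Probability that at least one of several independent looks detects."""
    miss = 1.0
    for p in detection_probabilities:
        miss *= 1.0 - p
    return 1.0 - miss


if __name__ == "__main__":
    channels = [3.0, 4.0]
    print(optimal_dprime(channels))                # optimal weights
    print(weighted_dprime([1.0, 1.0], channels))   # equal weights do worse
    print(probability_summation([0.5, 0.5]))
```

Under these assumptions, n equally informative looks raise d' by a factor of sqrt(n), which is one way an increase in detectability with signal duration can arise without any long-term integrator.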
In addition, the shape of the peripheral filters plays a major role in stimulus conditions where the detection is based on the "audibility" of the spectral sidebands of the modulation. The model can account for the observed flat modulation detection thresholds up to a modulation rate of about 100 Hz and also for the frequency-dependent roll-off in the threshold function observed in the data for a set of carrier frequencies in the range from 2 to 9 kHz. The model might also be used in applications such as psychoacoustical experiments with hearing-impaired listeners and predictions of speech intelligibility and speech quality.
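The spectral sidebands mentioned above follow from a standard trigonometric identity. Assuming sinusoidal amplitude modulation of a sinusoidal carrier (the abstract does not spell out the modulator shape), the signal (1 + m cos(2π f_mod t)) cos(2π f_c t) consists of the carrier at f_c plus two sidebands at f_c - f_mod and f_c + f_mod, each with amplitude m/2. The short Python sketch below lists these components; the function names are hypothetical.

```python
import math


def sam_tone_components(carrier_hz, mod_hz, mod_depth):
    """Spectral components of (1 + m cos(2*pi*fm*t)) * cos(2*pi*fc*t).

    Returns (frequency in Hz, amplitude relative to the carrier) tuples.
    """
    if not 0.0 <= mod_depth <= 1.0:
        raise ValueError("modulation depth m must lie in [0, 1]")
    if not 0.0 < mod_hz < carrier_hz:
        raise ValueError("modulation frequency must lie between 0 and the carrier")
    sideband = mod_depth / 2.0
    return [
        (carrier_hz - mod_hz, sideband),
        (carrier_hz, 1.0),
        (carrier_hz + mod_hz, sideband),
    ]


def sideband_level_db(mod_depth):
    """Level of each sideband relative to the carrier, in dB."""
    if not 0.0 < mod_depth <= 1.0:
        raise ValueError("modulation depth m must lie in (0, 1]")
    return 20.0 * math.log10(mod_depth / 2.0)


if __name__ == "__main__":
    print(sam_tone_components(5000.0, 100.0, 1.0))
    print(sideband_level_db(1.0))
```

As the modulation rate grows, the sidebands move further from the carrier in frequency, which is why the shape of the peripheral filters matters for whether they can be heard out.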
doi:10.1121/1.418727 fatcat:jd4j4swdvzhnbh25ptrwccgtem