Perceiving vowels in the presence of another sound: Constraints on formant perception

C. J. Darwin
1984 Journal of the Acoustical Society of America  
Speech is normally heard against a background of other sounds, yet our ability to isolate perceptually the speech of a particular talker is poorly understood. The experiments reported here illustrate two different ways in which a listener may decide whether a tone at a harmonic of a vowel's fundamental forms part of the vowel. First, a tone that starts or stops at a different time from a vowel is less likely to be heard as part of that vowel than if it is simultaneous with it; moreover, this effect occurs regardless of whether the tone has been added to a normal vowel, or to a vowel that has already been reduced in energy at the tone's frequency. Second, energy added simultaneously with a vowel, at a harmonic frequency near to the vowel's first formant, may or may not be fully incorporated into the vowel percept, depending on its relation to the first formant: When the additional tone is just below the vowel's first formant frequency, it is less likely to be incorporated than energy that is added at a frequency just above the first formant. Both experiments show that formants may only be estimated after properties of the sound wave have been grouped into different apparent sound sources. The first result illustrates a general auditory mechanism for performing perceptual grouping, while the second result illustrates a mechanism that may use a more specific constraint on vocal-tract transfer functions.
PACS numbers: 43.70.Dn, 43.66.Jh
Scheffers, 1983). With random noise, the main perceptual problem is the detection of structure, whereas with an additional formant or voice, the main problem is to group evident structure appropriately (cf. Bregman, 1978; McAdams, 1980; McAdams and Bregman, 1979). In performing appropriate grouping of formants, it is clear that a common harmonic spacing is influential. Sentences synthesized on a different fundamental frequency from an interfering passage of speech are more intelligible than those of the same fundamental (Brokx and Nooteboom, 1982). The same is true for pairs of simultaneous isolated vowel sounds (Scheffers, 1983); the intelligibility of the vowels is slightly higher when they are synthesized on different fundamentals (80%) than when they are synthesized on the same fundamental (68%).
Similarly, when four formants may be grouped in two alternative ways to give both a three-formant syllable and a separate single formant, listeners tend to group together the three formants that share a common fundamental (Darwin, 1981, experiment IV). The size of the grouping effect by a common fundamental is not large in some experiments, and clearly can have no effect in voiceless speech, or for formants in which the individual harmonics are too close together, relative to the critical bandwidth, to be resolved (but see Bregman et al., 1983). It is likely that other factors also play a role. In Scheffers'
doi:10.1121/1.391610 pmid:6520301 fatcat:ycb2aqmysve6blepjzqg4ywr3e