Acoustic characteristics of American English vowels

James Hillenbrand, Laura A. Getty, Michael J. Clark, Kimberlee Wheeler
1995 Journal of the Acoustical Society of America  
The purpose of this study was to replicate and extend the classic study of vowel acoustics by Peterson and Barney (PB) [J. Acoust. Soc. Am. 24, 175-184 (1952)]. Recordings were made of 45 men, 48 women, and 46 children producing the vowels /i, ɪ, e, ɛ, æ, ɑ, ɔ, o, ʊ, u, ʌ, ɝ/ in h-V-d syllables. Formant contours for F1-F4 were measured from LPC spectra using a custom interactive editing tool. For comparison with the PB data, formant patterns were sampled at a time that was judged by visual inspection to be maximally steady. Analysis of the formant data shows numerous differences between the present data and those of PB, both in terms of average frequencies of F1 and F2, and the degree of overlap among adjacent vowels. As with the original study, listening tests showed that the signals were nearly always identified as the vowel intended by the talker. Discriminant analysis showed that the vowels were more poorly separated than the PB data based on a static sample of the formant pattern. However, the vowels can be separated with a high degree of accuracy if duration and spectral change information is included.

PACS numbers: 43.70.Fq, 43.71.Es, 43.72.Ar

INTRODUCTION

The most widely cited experiment on the acoustics and perception of vowels is a surprisingly simple study conducted at Bell Telephone Laboratories by Peterson and Barney (1952) shortly after the introduction of the sound spectrograph. Peterson and Barney (PB) recorded two repetitions of ten vowels in /hVd/ context spoken by 33 men, 28 women, and 15 children. Acoustic measurements from narrow-band spectra consisted of formant frequencies (F1-F3), formant amplitudes, and fundamental frequency (F0). The measurements were taken at a single time slice that was judged to be "steady state." The /hVd/ signals were also presented to listeners for identification. The results of the measurement study showed a strong relationship between the intended vowel and the formant frequency pattern. However, there was considerable formant frequency variability from one speaker to the next, and there was a substantial degree of overlap in the formant frequency patterns among adjacent vowels. The listening study showed that the vowels were highly identifiable: The overall error rate was 5.6%, and nearly all of the errors involved confusions between adjacent vowels. The PB measurements have played a central role in the development and testing of theories of vowel recognition.
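The contrast drawn in the abstract, between a static sample of the formant pattern and a representation that adds duration and spectral change, can be illustrated with a minimal sketch of the two feature vectors. Everything named here is an assumption made for illustration and not the authors' procedure: the `VowelToken` fields, the helper functions, and in particular the 20% and 80% sampling points used to capture spectral change.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VowelToken:
    """One /hVd/ token. All field names are hypothetical."""
    duration_ms: float
    f1_hz: Sequence[float]  # F1 contour, one value per analysis frame
    f2_hz: Sequence[float]  # F2 contour
    f3_hz: Sequence[float]  # F3 contour


def _sample(contour: Sequence[float], fraction: float) -> float:
    """Return the contour value nearest to a fractional position (0..1)."""
    index = round(fraction * (len(contour) - 1))
    return contour[index]


def static_features(token: VowelToken, steady_fraction: float) -> List[float]:
    """F1-F3 at a single time slice, as in a 'steady state' measurement."""
    contours = (token.f1_hz, token.f2_hz, token.f3_hz)
    return [_sample(c, steady_fraction) for c in contours]


def dynamic_features(token: VowelToken,
                     early: float = 0.2,
                     late: float = 0.8) -> List[float]:
    """Duration plus F1-F3 at two time slices, so formant movement is kept.

    The default sampling points are assumed values, not taken from the text.
    """
    contours = (token.f1_hz, token.f2_hz, token.f3_hz)
    features = [token.duration_ms]
    for fraction in (early, late):
        features.extend(_sample(c, fraction) for c in contours)
    return features
```

Either vector could then be passed to a discriminant classifier. The static vector has three dimensions and the dynamic one has seven, which is the extra information the abstract credits with improving vowel separation.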
Acoustic measurements for the signals recorded by PB have been widely distributed to speech research laboratories (e.g., Watrous, 1991) and have been used in numerous studies to evaluate alternative models of vowel recognition (e.g., Nearey, 1978; Nearey et al., 1979; Syrdal, 1985; Syrdal and Gopal, 1986; Nearey, 1992; Lippmann, 1989; Miller, 1989; Hillenbrand and Gayvert, 1993a). Despite the widespread use of the PB measurements, there are several well recognized limitations to the database. Perhaps the most important limitation is that the database consists exclusively of acoustic measurements taken at a single time slice. Duration measurements were not made, and no information is available about the pattern of spectral change over time. There is now a solid body of evidence indicating that dynamic properties such as duration and spectral change play an important role in vowel perception (e.g. 1953; Whalen, 1989). Other limitations of the PB database include: (1) there is no indication that subjects were screened for dialect, and very little is known about the dialect of either the speakers or the listeners; (2) listening results were not reported separately for men, women, and child talkers; (3) no information is given about the age or gender of the child talkers; (4) measures were made from a relatively small group of children; (5) there is no way to determine the identifiability of individual tokens; (6) measurement reliability was not reported; and (7) since the original signals are no longer available, the database cannot be used to evaluate signal representations other than F0 and formant frequencies. The present study represents an attempt to address these limitations. Recordings were made of /hVd/ utterances spoken by a large group of men, women, and children. Measurements were made of vowel duration, F0 contours, and formant frequency contours. The signals were also presented to a panel of listeners for identification.
Finally, discriminant analysis was used to classify the signals using various combinations of the acoustic measurements.

I. ACOUSTIC ANALYSIS

A. Methods

1. Talkers

Talkers consisted of 45 men, 48 women, and 46 children aged 10 to 12 years (27 boys, 19 girls). The majority of the speakers (87%) were raised in Michigan's lower peninsula, primarily the southeastern and southwestern parts of the state. The remainder were primarily from other areas of the upper midwest, such as Illinois, Wisconsin, Minnesota, northern Ohio, and northern Indiana. An extensive screening procedure was used to select these 139 subjects from a larger group. The most important part of the screening procedure was a careful dialect assessment, focusing especially on subjects' production of the /ɑ/-/ɔ/ distinction. The /ɑ/-/ɔ/ distinction is not maintained by many speakers of American English, a fact which we believed (incorrectly, as it turned out) might account for the relatively high confusability reported by PB for this pair of vowels. The screening procedure began with a 5- to 7-min informal conversation with one of the experimenters. This conversation was tape recorded for later review by an experienced phonetician. Subjects next read a 128-word passage that contained several instances of words with /ɑ/ and /ɔ/. Subjects were eliminated if the phonetician noted any systematic departure from general American English, or if the speaker failed to maintain the /ɑ/-/ɔ/ distinction either in spontaneous speech or in the 128-word passage. Subjects were also required to pass a brief task which tested their ability to discriminate /ɑ/-/ɔ/ minimal pairs. In addition to the dialect assessment, subjects were eliminated if they: (1) were non-native speakers of English; (2) showed any evidence of a speech, language, or voice disorder; (3) showed any

J. Acoust. Soc. Am. 97 (5), Pt. 1, May 1995, 3099; 0001-4966/95/97(5)/3099/13/$6.00; © 1995 Acoustical Society of America
doi:10.1121/1.411872 pmid:7759650 fatcat:tymrvmfegvgh3clmnrs63sdd4u