Dynamic Processing of Feature Series for Phoneme Discrimination

Kazuyo Tanaka
1975 Acoustical Science and Technology  
The phoneme discrimination process for automatic speech recognition requires four stages: extraction of acoustical features, adaptation to the speaker, processing of dynamics, and utilization of linguistic information. This paper proposes a procedure for processing dynamics, such as normalization of coarticulation and segmentation of continuous speech. The formant frequencies $F_i$ ($i = 1, 2, 3$) and the corresponding vocal tract length $l$ are taken as the series of input primitive features. These parameters are
obtained by estimating vocal tract area functions and vocal tract lengths from the speech wave 1),2). Let the parameters normalized for adaptation to the speaker be denoted by $\lambda_i$, defined as

$$\lambda_i = \frac{F_i - F_{0i}}{F_{0i}}, \qquad i = 1, 2, 3 \tag{1}$$

where $F_{0i}$ is the $i$-th resonance frequency of a uniform tube of length $l$. The parameters $\lambda_i$ have been calculated for isolated Japanese vowels to confirm that this normalization is suitable for removing speaker variation 3).

In general, continuous speech can be treated as a trajectory in the three-dimensional space $\Lambda(\lambda_1, \lambda_2, \lambda_3)$. Let this trajectory be denoted by $\boldsymbol{r}(t)$, a vector from the center of $\Lambda$. The procedure proposed in this paper has the multi-stage structure described below. The trajectory $\boldsymbol{r}(t)$ is modified gradually as it passes through each stage; that is, the phonemic categories represented by $\boldsymbol{r}(t)$ gradually become distinct. One stage of the multi-stage process consists of three computational steps:

$$\tilde{\boldsymbol{r}} = \boldsymbol{r}^{(N)} + k_v \frac{d\boldsymbol{r}^{(N)}}{dt} + k_m \frac{d^2\boldsymbol{r}^{(N)}}{dt^2} \tag{2}$$

$$\hat{\boldsymbol{r}} = \tilde{\boldsymbol{r}} - k_\phi \, Q(t) \, \operatorname{grad} \phi \tag{3}$$

$$\boldsymbol{r}^{(N+1)}(t) = \frac{\int a(t+\tau)\, p(\tau)\, \delta(t,\tau)\, \hat{\boldsymbol{r}}(t+\tau)\, d\tau}{\int a(t+\tau)\, p(\tau)\, \delta(t,\tau)\, d\tau} \tag{4}$$

where
  $N$: stage number;
  $k_v, k_m, k_\phi$: constants ($k_m < 0$, $|k_v| < |k_m|$);
  $\phi(\lambda_1, \lambda_2, \lambda_3)$: potential function in $\Lambda$;
  $a(t)$: factor of sound pressure, a power of the intensity;
  $p(\tau)$: temporal decay factor, $\exp(-k_p \tau^2)$;
  $\delta(t,\tau)$: spatial decay factor, $\exp\left(-k_\delta \|\hat{\boldsymbol{r}}(t+\tau) - \hat{\boldsymbol{r}}(t)\|^2\right)$;
  $Q(t) = \int p(\tau)\, \delta(t,\tau)\, d\tau$.

Let the discrete-time representations of $\boldsymbol{r}(t)$ and $a(t)$ be denoted by $\{\boldsymbol{r}_i\}$ and $\{a_i\}$, $i = 1, 2, 3, \ldots$, respectively. The discrete-time version of Eq. (2) is then

$$\tilde{\boldsymbol{r}}_i = \boldsymbol{r}_i + \cdots \tag{5}$$

where $\Delta T$ is the time interval of $\{\boldsymbol{r}_i\}$ and $k_r$ is a constant. The subscript $j$ depends on $k_p$ such that $j \, \Delta T \propto 1/k_p$.

Eq. (2) represents the contrast effect, and Eq. (3) represents the phonemic potential of perception. Eq. (4) formulates the assimilation effect in the temporal and spatial domains.
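Eq. (1) can be sketched in a few lines of Python. The abstract defines $F_{0i}$ only as the $i$-th resonance of a uniform tube of length $l$. This sketch assumes the tube is closed at the glottis and open at the lips (a quarter-wave resonator, $F_{0i} = (2i-1)c/4l$) and takes $c = 35\,000$ cm/s. Both the boundary condition and the value of $c$ are assumptions, not values from the paper, and the function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: speed of sound in the vocal tract, in cm/s (not stated in the source).
SPEED_OF_SOUND_CM_S = 35000.0


def uniform_tube_resonances(length_cm, n=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances F_0i of a uniform tube of length l, ASSUMING it is closed at the
    glottis and open at the lips (quarter-wave resonator): F_0i = (2i - 1) c / (4 l).
    length_cm may be a scalar or an array of per-frame lengths."""
    i = np.arange(1, n + 1)                             # resonance indices 1..n
    l = np.asarray(length_cm, dtype=float)[..., None]   # trailing axis for broadcasting
    return (2 * i - 1) * c / (4.0 * l)


def normalize_formants(formants_hz, length_cm):
    """Eq. (1): lambda_i = (F_i - F_0i) / F_0i for i = 1..3, frame by frame."""
    f = np.asarray(formants_hz, dtype=float)
    f0 = uniform_tube_resonances(length_cm, n=f.shape[-1])
    return (f - f0) / f0
```

With $l = 17.5$ cm the reference resonances are 500, 1500 and 2500 Hz, so formants (750, 1200, 2500) Hz map to $\lambda = (0.5, -0.2, 0)$. A neutral, uniform-tube vowel maps to the center of $\Lambda$.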
At the initial stage of the computational procedure, the region of temporal integration is set narrow and that of spatial integration wide (that is, $k_p$ large and $k_\delta$ small). As the processing passes through the stages, $k_p$ is made gradually smaller and $k_\delta$ gradually larger. At the later stages, where $j \cdot \Delta T$ is …
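The multi-stage procedure, Eqs. (2) to (4) repeated with a widening temporal window and a narrowing spatial window, can be sketched in Python. Parts of the scanned equations are illegible, so this sketch rests on several assumptions.

* It works on frames $\{\boldsymbol{r}_i\}$ spaced $\Delta T$ apart.
* It replaces the lost discrete Eq. (5) with central differences from `np.gradient`.
* It evaluates $\delta(t,\tau)$ on $\tilde{\boldsymbol{r}}$ when forming $Q(t)$, so that Eq. (3) is not circular.
* It uses a hypothetical potential $\phi$ made of Gaussian wells at given phoneme prototypes.

The prototype positions, the well width `sigma`, the step constant `k_phi` and the stage-to-stage factors for $k_p$ and $k_\delta$ (`k_delta`) are illustrative choices, not values from the paper.

```python
import numpy as np


def potential_grad(x, prototypes, sigma=0.2):
    """Gradient of a HYPOTHETICAL potential phi(x) = -sum_c exp(-|x - mu_c|^2 / (2 sigma^2)),
    which has one well per phoneme prototype mu_c (the source does not give phi)."""
    diff = x[:, None, :] - prototypes[None, :, :]               # (T, C, 3)
    g = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))  # (T, C)
    return np.sum(g[..., None] * diff, axis=1) / sigma ** 2     # (T, 3)


def kernel(traj, dt, k_p, k_delta):
    """p(tau) * delta(t, tau) for all frame pairs (t, s), with tau = (s - t) * dt."""
    idx = np.arange(traj.shape[0])
    tau = (idx[None, :] - idx[:, None]) * dt                             # (T, T)
    dist2 = np.sum((traj[None, :, :] - traj[:, None, :]) ** 2, axis=-1)  # |r(s) - r(t)|^2
    return np.exp(-k_p * tau ** 2) * np.exp(-k_delta * dist2)


def one_stage(r, a, dt, k_v, k_m, k_phi, k_p, k_delta, prototypes):
    """One stage: contrast (Eq. 2), potential (Eq. 3), assimilation (Eq. 4)."""
    # Eq. (2): central differences stand in for the unrecoverable discrete Eq. (5).
    dr = np.gradient(r, dt, axis=0)
    d2r = np.gradient(dr, dt, axis=0)
    r_tilde = r + k_v * dr + k_m * d2r
    # Eq. (3): Q(t) as a Riemann sum; delta is evaluated on r_tilde (an assumption).
    q = kernel(r_tilde, dt, k_p, k_delta).sum(axis=1) * dt
    r_hat = r_tilde - k_phi * q[:, None] * potential_grad(r_tilde, prototypes)
    # Eq. (4): normalized weighted average over time; a(t) is floored to stay positive.
    w = np.maximum(np.asarray(a, dtype=float), 1e-12)[None, :] * kernel(r_hat, dt, k_p, k_delta)
    return (w @ r_hat) / w.sum(axis=1, keepdims=True)


def multi_stage(r, a, dt, prototypes, n_stages=3, k_v=0.0, k_m=-0.2, k_phi=0.01,
                k_p=0.5, k_delta=10.0, p_factor=0.7, delta_factor=1.5):
    """Repeat the stage while k_p shrinks (wider temporal window) and k_delta grows
    (narrower spatial window). Defaults satisfy k_m < 0 and |k_v| < |k_m|; the
    numeric values and factors are hypothetical."""
    r = np.asarray(r, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    for _ in range(n_stages):
        r = one_stage(r, a, dt, k_v, k_m, k_phi, k_p, k_delta, prototypes)
        k_p *= p_factor
        k_delta *= delta_factor
    return r
```

For example, a trajectory that jumps between two prototypes keeps each half on its own side of $\Lambda$. The contrast step pushes boundary frames apart, the potential step pulls frames toward the wells, and the spatial factor $\delta$ keeps assimilation from averaging across the jump.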
doi:10.20697/jasj.31.7_449 fatcat:4r2mk6vnnvheppzmdv3e4p6ufu