The Ideal Interaural Parameter Mask: A bound on binaural separation systems

Michael I. Mandel, Daniel P. W. Ellis
2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
• Task: binaural, underdetermined source separation in reverberation
• We introduce the Ideal Interaural Parameter Mask (IIPM) to upper-bound mask-based source separation algorithms that use only the differences between two channels
• We also make the following improvements to our Model-based EM Source Separation and Localization (MESSL) system:
  - a garbage source model that absorbs reverberant energy
  - a prior on interaural level difference (ILD) to force it closer to its anechoic value
  - an oracle reliability measure to weight spectrogram regions
• MESSL comes within 0.9 dB SNRI of the IIPM bound

Ideal Interaural Parameter Mask (IIPM)

Figure: Example kernel density estimates (KDEs), in dB, from the 4125 Hz band, with the target at 0° and the interferer at 90°. Panels: target energy, interferer energy, ratio.

• Similar to the ideal binary mask, but based solely on interaural parameters: at a given frequency, all cells with the same parameters have the same fate
• The IIPM has oracle knowledge of the binaural transfer functions
• Models the target and interferer sources' energy non-parametrically as a function of interaural level and phase differences (ILD and IPD)
  - separate model at each frequency
  - interaural parameters from the mixture, energy weights from the oracle
• Classification of observed interaural parameters
  - evaluate both KDEs at the observed values and assign the cell to the more energetic source
  - independent classification at every time-frequency point
• "Training IIPM" also has oracle knowledge of the pre-mixed signals
  - equivalent to the ideal binary mask when kernel bandwidths are 0
• "Testing IIPM" passes different signals through the same binaural transfer functions

Figure: Example ideal masks. Panels: ideal binary mask, training IIPM, testing IIPM.

References
S. Harding, J. Barker, and G. J. Brown. Mask estimation for missing data speech recognition based on statistics of binaural interaction. IEEE Trans. Audio, Speech, and Language Processing, 14(1):58-67, 2006.
M. I. Mandel and D. P. W. Ellis. EM localization and separation using interaural level and phase cues. In Proc. WASPAA, pages 275-278, 2007.
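A minimal sketch of the IIPM classification step within a single frequency band, assuming Gaussian kernels with the IPD difference wrapped to (−π, π] and made-up bandwidths; the poster does not specify the kernel shape or bandwidths, and the names `energy_kdes`, `iipm_mask` and the `Cell` layout are hypothetical. Following the bullets above, every mixture cell contributes its observed (ILD, IPD) as a kernel location, weighted once by the oracle target energy and once by the oracle interferer energy in that cell; each query cell is then assigned independently to whichever energy-weighted KDE is larger at its parameters.

```python
import math
from typing import List, Sequence, Tuple

# One time-frequency cell in a single frequency band (hypothetical layout):
# (ild_db, ipd_rad, target_energy, interferer_energy).
# ILD/IPD come from the mixture; the two energies come from the oracle.
Cell = Tuple[float, float, float, float]


def energy_kdes(cells: Sequence[Cell], ild: float, ipd: float,
                bw_ild: float, bw_ipd: float) -> Tuple[float, float]:
    """Evaluate the target and interferer energy-weighted KDEs at (ild, ipd)."""
    target = interferer = 0.0
    for c_ild, c_ipd, e_target, e_interf in cells:
        d_ild = (ild - c_ild) / bw_ild
        # Wrap the phase difference so IPDs near +pi and -pi count as close.
        d_ipd = math.atan2(math.sin(ipd - c_ipd), math.cos(ipd - c_ipd)) / bw_ipd
        k = math.exp(-0.5 * (d_ild ** 2 + d_ipd ** 2))  # unnormalized Gaussian kernel
        target += e_target * k
        interferer += e_interf * k
    return target, interferer


def iipm_mask(cells: Sequence[Cell], queries: Sequence[Tuple[float, float]],
              bw_ild: float = 1.0, bw_ipd: float = 0.2) -> List[bool]:
    """Classify each (ILD, IPD) query independently: True means target dominates."""
    mask = []
    for ild, ipd in queries:
        target, interferer = energy_kdes(cells, ild, ipd, bw_ild, bw_ipd)
        mask.append(target > interferer)
    return mask


# Toy band: two target-dominated cells near ILD 0 dB, IPD 0 rad,
# and two interferer-dominated cells near ILD 8 dB, IPD 1.1 rad.
cells = [
    (0.0, 0.0, 1.0, 0.1),
    (0.5, 0.1, 2.0, 0.2),
    (8.0, 1.2, 0.1, 1.5),
    (7.5, 1.0, 0.2, 1.0),
]
print(iipm_mask(cells, [(c[0], c[1]) for c in cells]))  # [True, True, False, False]
print(iipm_mask(cells, [(0.2, 0.05), (7.8, 1.1)]))      # [True, False]
```

Querying with the training cells' own parameters mirrors the "training IIPM", while querying with parameters observed in a different mixture through the same transfer functions mirrors the "testing IIPM". The KDE normalization constants are omitted because only the comparison between the two densities matters.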
Overview of MESSL algorithm
• MESSL models each source's interaural level and time (phase) differences
• Clusters time-frequency points into sources
• EM algorithm alternates between
  - estimating the time-frequency points dominated by each source
  - maximizing the likelihood of each source's interaural parameter model
• Garbage source
  - same parametrization as the compact sources
  - but initialized with flat ITD and IPD models, and an ILD of 0 dB
• ILD prior
  - captures the gross dependence of ILD on ITD and frequency
  - set from the ITD initialization
• Oracle reliability
  - meant to reduce noise in parameter estimates
  - proxy for estimates of, e.g., the direct-to-reverberant ratio (DRR)
  - mask of 0.99 where DRR > 0 dB and 0.01 where DRR < 0 dB

Figure: Example masks from original MESSL, MESSL with a garbage source, and MESSL with a garbage source and ILD prior. Panels: MESSL, +Garbage src, +ILD prior.

The ILD prior captures the gross dependence of ILD on ITD and frequency in anechoic head-related transfer functions (HRTFs).

Figure: ILD as a function of frequency and ITD. Panels: actual ILD-ITD relationship, ILD prior. Axes: frequency (kHz), ITD (ms); color: ILD (dB).

• Captures general properties of the HRTFs, rather than the room
  - trained on HRTFs from a different person

Experiment
• Two simultaneous utterances, both from the same speaker
  - targets are strings of five digits
  - interferers are TIMIT sentences
• Convolved with measured binaural impulse responses
  - reverberant, T60 = 565 ms
• Evaluated using signal-to-noise ratio improvement (SNRI)

Results
Figure: Signal-to-noise ratio improvement vs. separation angle for the MESSL variants, the ideal binary mask, and the ideal interaural parameter masks. Error bars show 1 standard error. Axes: angle (deg), SNRI (dB). Curves: IBM, IIPM training, IIPM testing, MESSL+Garb+ILD prior+Rel, MESSL+Garb+ILD prior, MESSL+Garb, MESSL.

Overall separation results, averaged across all angles.

Algorithm                   SNRI ± 95% CI (dB)   Improvement
Ideal binary mask           9.55 ± 0.25          2.55 dB
IIPM training               7.68 ± 0.18
IIPM testing                6.77 ± 0.14          0.91 dB
MESSL+Garb+ILD prior+Rel    5.97 ± 0.20
MESSL+Garb+ILD prior        5.86 ± 0.18          1.45 dB
MESSL+Garb                  5.39 ± 0.17
MESSL                       4.41 ± 0.15

Regression analysis of results
• Linear regression of SNRI on different model and mixture parameters
  - "Original" coefficients apply to predictors in their original units
  - "Standardized" coefficients apply to unit-variance predictors

Effect of model and mixture parameters on separation SNRI: estimates and 95% confidence intervals.

Predictor      Unit     Original        Standardized
Initial SNR    dB       −0.24 ± 0.02    −0.57 ± 0.05
Garbage src    binary    0.87 ± 0.09     0.44 ± 0.05
ILD prior      binary    0.48 ± 0.09     0.24 ± 0.05
Reliability    binary    0.33 ± 0.09     0.16 ± 0.05
cos(Angle)     -        −0.33 ± 0.12    −0.14 ± 0.05
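The two coefficient columns of the regression table are related by each predictor's standard deviation: when only the predictors (not SNRI) are rescaled to unit variance, a coefficient b becomes b·sd(x). A quick arithmetic check, assuming each binary model feature was enabled in exactly half of the runs (sd = 0.5; a hypothetical design assumption the poster does not state): 0.87 × 0.5 ≈ 0.44 and 0.48 × 0.5 = 0.24, consistent with the standardized column. The same relation suggests the initial SNR varied with a standard deviation of roughly 0.57 / 0.24 ≈ 2.4 dB. The helper name `standardized_coef` is hypothetical.

```python
import statistics
from typing import Sequence


def standardized_coef(coef: float, predictor: Sequence[float]) -> float:
    """Rescale a regression coefficient to a unit-variance predictor: b * sd(x)."""
    return coef * statistics.pstdev(predictor)


# Hypothetical balanced on/off design: each model feature enabled in half the runs.
on_off = [0, 1] * 50

print(round(standardized_coef(0.87, on_off), 3))  # garbage source: 0.435
print(round(standardized_coef(0.48, on_off), 3))  # ILD prior: 0.24

# Implied spread of the initial-SNR predictor from the table's two columns.
print(round(-0.57 / -0.24, 1))  # 2.4 (dB)
```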
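The oracle reliability measure from the MESSL overview can be sketched as follows, assuming per-cell direct and reverberant energies are available from the oracle (the poster does not say how DRR is computed). Cells with DRR above 0 dB get weight 0.99 and the rest 0.01; a cell at exactly 0 dB is not covered by the poster and is assigned 0.01 here. The weighted-mean helper only illustrates how such weights could down-weight reverberant cells when re-estimating a source's mean ILD; it is an assumption, not MESSL's actual M-step, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def drr_db(direct: float, reverb: float, eps: float = 1e-12) -> float:
    """Direct-to-reverberant ratio of one time-frequency cell, in dB."""
    return 10.0 * math.log10((direct + eps) / (reverb + eps))


def oracle_reliability(direct: Sequence[float], reverb: Sequence[float]) -> List[float]:
    """0.99 where DRR > 0 dB, 0.01 otherwise (DRR == 0 dB treated as unreliable)."""
    return [0.99 if drr_db(d, r) > 0.0 else 0.01 for d, r in zip(direct, reverb)]


def weighted_mean_ild(ild: Sequence[float], posterior: Sequence[float],
                      reliability: Sequence[float]) -> float:
    """Mean ILD of one source, each cell weighted by posterior * reliability."""
    weights = [p * w for p, w in zip(posterior, reliability)]
    return sum(x * w for x, w in zip(ild, weights)) / sum(weights)


# Four cells of a flattened spectrogram: direct and reverberant energy per cell.
direct = [1.0, 0.1, 4.0, 1.0]
reverb = [0.5, 1.0, 1.0, 2.0]
rel = oracle_reliability(direct, reverb)
print(rel)  # [0.99, 0.01, 0.99, 0.01]

# Reverberant cells 2 and 4 carry outlying ILDs; unweighted, the mean would be 7.5 dB.
ild = [2.0, 12.0, 4.0, 12.0]
posterior = [1.0, 1.0, 1.0, 1.0]
print(round(weighted_mean_ild(ild, posterior, rel), 2))  # 3.09
```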
doi:10.1109/aspaa.2009.5346506