On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis [chapter]

Christopher Hummersone, Toby Stokes, Tim Brookes
2014 Signals and Communication Technology  
The ideal binary mask (IBM) is widely considered to be the benchmark for time-frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previously defined soft mask, the ideal ratio mask (IRM), is found to have similar properties to the IBM, may correspond more closely to auditory processes, and offers additional computational advantages. Consequently, the IRM is proposed as the goal of CASA. To further support this position, a number of studies are reviewed that show soft masks to provide superior performance to the IBM in applications such as automatic speech recognition and speech intelligibility. A brief empirical study provides additional evidence demonstrating the objective and perceptual superiority of the IRM over the IBM.
doi:10.1007/978-3-642-55016-4_12 fatcat:c2gn3xterfdztl62p5puzsrwte
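
The contrast between the two masks named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: the IBM is 1 wherever the target energy in a time-frequency unit exceeds the interferer energy by a threshold (0 dB here), and the IRM is the ratio of target energy to total energy. The function names, the `threshold_db` and `eps` parameters, and the toy energy values are hypothetical and chosen only for illustration; the chapter may define or parameterize the masks differently.

```python
def ideal_binary_mask(target_energy, interferer_energy, threshold_db=0.0):
    """Hard mask: 1.0 where target exceeds interferer by threshold_db, else 0.0.

    Assumed definition for illustration; inputs are nested lists of
    per-unit energies (rows: frequency channels, columns: time frames).
    """
    ratio = 10 ** (threshold_db / 10.0)  # dB threshold as a linear energy ratio
    return [
        [1.0 if t > ratio * n else 0.0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, interferer_energy)
    ]


def ideal_ratio_mask(target_energy, interferer_energy, eps=1e-12):
    """Soft mask: target energy divided by total energy, in [0, 1].

    Assumed definition for illustration; eps avoids division by zero
    in units where both sources are silent.
    """
    return [
        [t / (t + n + eps) for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, interferer_energy)
    ]


# Toy 2x3 grid of energies (hypothetical values, not data from the chapter).
target = [[4.0, 1.0, 0.0], [9.0, 1.0, 3.0]]
interferer = [[1.0, 1.0, 2.0], [1.0, 4.0, 1.0]]

ibm = ideal_binary_mask(target, interferer)
irm = ideal_ratio_mask(target, interferer)

for row_b, row_r in zip(ibm, irm):
    print(row_b, [round(v, 2) for v in row_r])
```

In this sketch the binary mask switches abruptly between 0 and 1 from one unit to the next, while the ratio mask varies smoothly with the relative energies. That abrupt switching is the kind of behavior the abstract associates with musical noise.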