Dereverberation method with reverberation time estimation using floored ratio of spectral subtraction

Yuuki Tachioka, Toshiyuki Hanazawa, Tomohiro Iwasaki
2013 Acoustical Science and Technology  
Introduction

In reverberant environments, the reverberant components of speech degrade the performance of automatic speech recognition (ASR). Many dereverberation methods have been proposed to improve this performance. Some researchers have proposed dereverberation methods with a low computational load based on a statistical model of reverberation [1, 2]. Lebart et al. proposed a dereverberation method [3] using Polack's statistical model [2], whose parameter is the reverberation time (RT). This method is effective and its computational load is relatively low; however, its performance is unstable because it estimates RT only from the end of an utterance. Gomez et al. proposed an effective method for suppressing late reverberation, but it requires the room impulse response to have been measured in advance [4]. Löllmann et al. also used a statistical model, whose RT is estimated by a maximum likelihood approach [5]. This method requires more parameters and a higher computational load than Lebart's method. The key to using statistical models for dereverberation is to limit the number of parameters and to estimate them robustly.

In this paper, we propose a dereverberation method based on spectral subtraction (SS) [6]. We also use Polack's statistical model and propose a method of estimating RT. In [3], RT is estimated only from the end of utterances, which is inappropriate for speech recognition because RT must be estimated in a short time. It is also difficult to detect the end of an utterance robustly when utterances overlap. On the other hand, because speech is sparse in the time-frequency domain [7], we can utilize the decay characteristics not only of the end of utterances but also of whole utterances in each frequency bin. Concretely, the proposed method estimates RT from floored ratios after SS. The floored ratio is the ratio of the number of points floored by SS to the total number of points on the time-frequency plane. This algorithm is more robust and effective than that of [3]. Additionally, the proposed algorithm requires neither training data nor the reverberation characteristics of the room, unlike [4], and is much simpler than [5]. First, we clarify the relationship
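
As a rough illustration of the floored ratio defined above, the following Python sketch counts how many time-frequency points are floored when a late-reverberation estimate is subtracted from a power spectrogram. It is not the authors' implementation. The function name `floored_ratio`, the parameters `frame_shift`, `delay`, and `beta`, and the single-delay late-reverberation estimate are all assumptions made for this example; only the exponential decay of Polack's model (60 dB of power decay over one RT) and the definition of the floored ratio come from the text.

```python
import math


def floored_ratio(power, rt, frame_shift=0.01, delay=5, beta=0.05):
    """Fraction of time-frequency points floored by spectral subtraction.

    power       : list of frames, each a list of per-bin power values
    rt          : candidate reverberation time in seconds
    frame_shift : frame shift in seconds (assumed value)
    delay       : frames separating direct sound from late reverberation (assumed)
    beta        : flooring coefficient of the subtraction (assumed)
    """
    # Polack's model: the amplitude envelope decays as exp(-decay * t),
    # with decay = 3 ln(10) / RT, so power drops by 60 dB after RT seconds.
    decay = 3.0 * math.log(10.0) / rt
    gain = math.exp(-2.0 * decay * delay * frame_shift)

    floored = 0
    total = 0
    for t in range(delay, len(power)):
        for f, p in enumerate(power[t]):
            # Assumed late-reverberation estimate: attenuated past frame.
            late = gain * power[t - delay][f]
            # A point counts as floored when the subtraction result
            # falls below the floor beta * p.
            if p - late < beta * p:
                floored += 1
            total += 1
    return floored / total if total else 0.0
```

In this sketch, assuming a longer RT makes the late-reverberation estimate larger, so more points are floored. Evaluating the ratio over a grid of candidate RT values is one conceivable way to relate it to the true RT; the relationship actually used in the paper is not reproduced here.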
doi:10.1250/ast.34.212 fatcat:3d3txlbkfndxxow42boo4pmnfu