Cross-relation based blind identification of acoustic SIMO systems and applications

Mathieu Hu, Patrick A. Naylor, Mike Brookes, European Union
<span title="2017-10-26">2017</span>
Speech signals captured by microphones placed at a distance from the speaker are corrupted by reverberation, i.e. sound waves reflected off hard surfaces such as walls and objects. The spectral distortion caused by reverberation drastically decreases the performance of automatic speech recognition systems and may degrade the intelligibility and the quality of speech for human listeners. The increased use of devices controlled by distant speech therefore induces the need for dereverberation.
A possible approach to dereverberation is that of system equalization, which consists of the blind estimation of the room impulse responses from noisy reverberant signals followed by an inversion of these impulse responses. This thesis investigates the first part of this two-stage approach. The cross-relation method is adopted and exploited in two different ways. The first way follows the adaptive filter framework, which was first introduced in the context of blind identification of room impulse responses in the Multi-Channel Least Mean Square algorithm. By considering a block update of this stochastic gradient algorithm, a noise-robust algorithm is developed. The convergence rate of the resulting algorithm is then increased by using a locally optimal adaptive step-size. The cross-relation, expressed in the frequency domain, is then shown to contain the transfer function relating any of the microphones to a reference microphone. This relative transfer function can be used to reduce the number of variables to be estimated. However, the performance of the previous methods severely degrades when realistically long room impulse responses are considered. An alternative interpretation of the cross-relation, from an annihilation filter perspective, is therefore explored. The resulting algorithm is shown to be able to estimate room impulse responses of thousands of taps. From a more practical perspective, the use of room impulse responses estimated with poor accuracy is investigated for the problem of speaker diarization. The spatial information c [...]
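The cross-relation mentioned in the abstract can be illustrated with a small noise-free sketch. It is not one of the thesis's algorithms (those are adaptive and noise-robust); it is the plain batch form of the idea, assuming two microphones, a known filter length `L`, and channels with no common zeros. Because each microphone signal is the source convolved with its own channel, `x1 * h2` equals `x2 * h1`, so the stacked channel vector lies in the null space of a matrix built from the observations alone. The function names `conv_matrix` and `cross_relation_estimate` are hypothetical and chosen for this example.

```python
import numpy as np

def conv_matrix(x, L):
    """Convolution matrix whose row n (n = L-1 .. len(x)-1) holds x[n-k], k = 0..L-1."""
    N = len(x)
    return np.column_stack([x[L - 1 - k : N - k] for k in range(L)])

def cross_relation_estimate(x1, x2, L):
    """Estimate two L-tap channels, up to a common scale, from x1 * h2 = x2 * h1."""
    X1 = conv_matrix(x1, L)
    X2 = conv_matrix(x2, L)
    # A @ [h1; h2] = X2 @ h1 - X1 @ h2, which is zero for the true channels.
    A = np.hstack([X2, -X1])
    # The right singular vector of the smallest singular value spans the null space.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    h = Vt[-1]
    return h[:L], h[L:]

# Synthetic data (assumption: short random channels, white source, no noise).
rng = np.random.default_rng(0)
L = 8
s = rng.standard_normal(400)
h1_true = rng.standard_normal(L)
h2_true = rng.standard_normal(L)
x1 = np.convolve(s, h1_true)
x2 = np.convolve(s, h2_true)

h1_est, h2_est = cross_relation_estimate(x1, x2, L)

# The estimate is only defined up to a scale factor, so compare directions.
h_true = np.concatenate([h1_true, h2_true])
h_est = np.concatenate([h1_est, h2_est])
similarity = abs(h_true @ h_est) / (np.linalg.norm(h_true) * np.linalg.norm(h_est))
```

In this idealized setting `similarity` should be very close to 1. With noise, or with realistically long room impulse responses, this direct solution becomes ill-conditioned and expensive, which is the difficulty the abstract says the thesis addresses.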
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.25560/52430">doi:10.25560/52430</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ie3nbr3lqnbltjhfgeu5arku6y">fatcat:ie3nbr3lqnbltjhfgeu5arku6y</a> </span>