Sequential estimation techniques and application to multiple speaker tracking and language modeling [article]

Youssef Oualil, Universität des Saarlandes
For many real-world applications, the data of interest arrives as a time sequence that becomes available in an orderly fashion, where the order carries important information about the entities of interest. The work presented in this thesis deals with two such cases by introducing new sequential estimation solutions. More precisely, we introduce:

I. A sequential Bayesian estimation framework to solve the multiple speaker localization, detection and tracking problem. This framework is a complete pipeline that includes 1) new observation estimators, which extract a fixed number of potential locations per time frame; 2) new unsupervised Bayesian detectors, which classify these estimates into noise/speaker classes; and 3) new Bayesian filters, which use the speaker class estimates to track multiple speakers. This framework was developed to tackle the low overlap detection rate of multiple speakers and to reduce the number of constraints generally imposed in standard solutions.

II. A sequential neural estimation framework for language modeling, which overcomes some of the shortcomings of standard approaches by merging different models in a hybrid architecture. That is, we introduce two solutions that tightly merge particular models and then show how a generalization can be achieved through a new mixture model. In order to speed up the training of large vocabulary language models, we introduce a new extension of the noise contrastive estimation approach to batch training.

During my long journey to this thesis, I had the pleasure to work, collaborate, discuss and meet with many great people, who made this experience a most pleasant one. Outside the office, the warm love and immense support of my wife Melanie, my son Yassin and my daughter Mariam turned my evenings and weekends into a source of energy to continue on this long road. After many years at the Spoken Language Systems (LSV) group of Saarland University, it became urgent to express an overdue note of gratitude and recognition to many colleagues who became friends, first and foremost to my advisor Dietrich Klakow, whose immense help and support brought the light needed to turn this experience into a success. Many thanks also go to my office mate, colleague and friend Rahil Mahdian Toroghi for the long, fruitful discussions and collaboration and for making the office a pleasant place to work.
On the same note, I extend my gratitude to Friedrich Faubel for putting me on the right track to complete this journey. The office was a pleasant place to be and work due to the amazing colleagues,
doi:10.22028/d291-27228 fatcat:oo7sbvq6nrbtjc54tzwwi74rwa