D2.3 Software and demonstration of human-like content description generation

Doukhan, Guo, Harrando, Kurimo, Laaksonen, Lindgren, Lindh-Knuutila, Lisena, Pehlivan Tort, Reboud, Rouhe, Troncy (+1 others)
2020 Zenodo  
Finally, the abstracts of academic theses together with full texts of scientific publications appear at the end of the report.  ...  This deliverable describes the last development iteration of the joint collection of libraries and tools for multimodal content analysis and description from AALTO, EURECOM, INA, Lingsoft, LLS and Limecraft  ...  is based on open source contributions. 2 Variational Bayes resegmentation model was based on  ... 
doi:10.5281/zenodo.4964391 fatcat:ertkzz2wbjajjlavw4iljlbmaq

A Review of Speaker Diarization: Recent Advances with Deep Learning [article]

Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan
2021 arXiv   pre-print
Furthermore, we discuss how speaker diarization systems have been integrated with speech recognition applications and how the recent surge of deep learning is leading the way of jointly modeling these  ...  In the early years, speaker diarization algorithms were developed for speech recognition on multispeaker audio recordings to enable speaker adaptive processing.  ...  A system based on statistical models on spectrum [100] , Gaussian mixture models (GMMs) [101] and on Hidden Markov Models (HMMs) [102, 103] has been traditionally used.  ... 
arXiv:2101.09624v4 fatcat:kvjhbg5axnc2rhhmt4bridt23q

The artificial intelligence renaissance: deep learning and the road to human-Level machine intelligence

Kar-Han Tan, Boon Pang Lim
2018 APSIPA Transactions on Signal and Information Processing  
and will be more valuable than ever in guiding the design of novel neural network architectures.  ...  Although deep learning appears to be reducing the algorithmic problem solving to a matter of data collection and labeling, we believe that many insights learned from 'pre-Deep Learning' works still apply  ...  ACKNOWLEDGEMENTS The first author would like to thank Irwin Sobel for pointers on the pioneering work at MIT, and Xiaonan Zhou for her work on many of the deep neural network results shown.  ... 
doi:10.1017/atsip.2018.6 fatcat:6iftrepekjdmjffcb5ouz42jke

Structured Deep Neural Networks for Speech Recognition

Chunyang Wu, Apollo-University Of Cambridge Repository, Mark Gales
For regularisation, parameters can be separately regularised based on their functions.  ...  Though a sensible performance can be achieved, the lack of interpretations to network structures and parameters causes better regularisation and adaptation on DNN models challenging.  ...  Acknowledgements First of all, I would love to express my sincere and utmost gratitude to my supervisor, Prof. Mark Gales, for his mentorship and support over the past four years.  ... 
doi:10.17863/cam.23363 fatcat:56qw5pl4mnfk7du5hrnxnfgfdm

ICDT 2011 Committee ICDT Steering Committee ICDT 2011 Technical Program Committee

Saied Abedi, Bilal Al Momani, Gerard Damm, Alcatel-Lucent, France Javier, Del Lorente, Michael Grottke, Constantin Paleologu, Jyrki Penttinen, Reda Reda, Dan Romascanu, Israel Avaya (+54 others)
ICDT 2011 The Sixth International Conference on Digital Telecommunications, Foreword   unpublished
systems, voice over packet networks, video, conferencing, telephony, as well as image producing, sending, and mining, speech producing and processing, IP/Mobile TV, Multicast/Broadcast Triple-Quadruple-play  ...  Furthermore, our ability to model and reason about the architectural properties of a system built from existing components is of great concern to modern system developers.  ...  ACKNOWLEDGEMENT We greatly appreciate the support and guidance of Dr. Herve Taddei, Dr. Christophe Beaugeant and Dr. Imre Varga. We would also like to thank Dr. Anisse Taleb and Mr.  ... 


Riadh Zaier
It is designed to be accessible and practical, with an emphasis on useful information to those working in the fields of robotics, cognitive science, artificial intelligence, computational methods and other  ...  The editor of the book has extensive research and development experience, and he has patents and publications in the area of humanoid robotics, and his experience is reflected in editing the content of  ...  The voice conversion is based on Eigenvoice Gaussian mixture models (EGMMs) (12) .  ...