EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments [article]

Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra
2021 arXiv pre-print
noisy environment.  ...  In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an  ...  We would also like to thank everyone who gave valuable feedback on the dataset. Lastly, we extend our thanks to all participants involved in the data collection.  ... 
arXiv:2107.04174v2 fatcat:owdguaovsnd67n57vm6l253jn4

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments [article]

Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii
2022 arXiv pre-print
This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments.  ...  A major approach that has actively been studied in simulated environments is to sequentially perform speech enhancement and automatic speech recognition (ASR) based on deep neural networks (DNNs) trained  ...  Acknowledgments: This work was supported in part by JSPS KAKENHI Nos. 19H04137, 20K19833, and 20K21813.  ...
arXiv:2207.07273v1 fatcat:sxdfeolb65g6tawuxi5vv2e53q

Ego4D: Around the World in 3,000 Hours of Egocentric Video [article]

Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan (+73 others)
2022 arXiv pre-print
Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community.  ...  Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation  ...  Thank you to the Common Visual Data Foundation (CVDF) for hosting the Ego4D dataset.  ... 
arXiv:2110.07058v3 fatcat:lgh27km63nhcdcpkvbr2qarsru