
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [article]

Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed
2021 arXiv   pre-print
Starting with a simple k-means teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the state-of-the-art wav2vec 2.0 performance on the Librispeech  ...  approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training  ...  Ablation: Impact of Hyperparameters. Figure 3 and Table VII study how hyperparameters affect HuBERT pre-training.  ... 
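The k-means teacher described in the snippet can be illustrated with a minimal sketch (assuming scikit-learn and random stand-in features; this is not the authors' implementation): frame-level features are clustered into 100 units, and each frame's cluster id becomes the pseudo-label the model predicts at masked positions.

```python
# Illustrative sketch of HuBERT-style pseudo-label generation.
# Assumptions: scikit-learn KMeans as the clustering teacher, and random
# stand-in features in place of MFCCs (iteration 1) or intermediate-layer
# model features (iteration 2).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 500 frames of 39-dim features (a typical MFCC+deltas dimensionality).
features = rng.normal(size=(500, 39))

# The paper's simple teacher: k-means with 100 clusters.
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(features)

# One discrete "hidden unit" per frame -- the masked-prediction target.
pseudo_labels = kmeans.predict(features)
print(pseudo_labels.shape)
```

In the second clustering iteration, the same procedure would be rerun on features extracted from an intermediate layer of the first-iteration model, refining the targets.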
arXiv:2106.07447v1

Generative Spoken Language Modeling from Raw Audio [article]

Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux
2021 arXiv   pre-print
Across 3 speech encoders (CPC, wav2vec 2.0, HuBERT), we find that the number of discrete units (50, 100, or 200) matters in a task-dependent and encoder-dependent way, and that some combinations approach  ...  We set up baseline systems consisting of a discrete speech encoder (returning pseudo-text units), a generative language model (trained on pseudo-text), and a speech decoder (generating a waveform from  ...  Wei-Ning Hsu, Yao-Hung Hubert Tsai, Benjamin Bolte, Ruslan Salakhutdinov, and Abdelrahman Mohamed. 2021. HuBERT: How much can a bad teacher benefit ASR pre-training?  ... 
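The pseudo-text step of the pipeline in the snippet can be sketched as follows (hypothetical function name, not the GSLM codebase): the discrete encoder emits one unit id per frame, and runs of identical ids are collapsed into the unit sequence the generative language model is trained on.

```python
# Illustrative sketch of turning frame-level discrete units into
# "pseudo-text" for language-model training. The function name is an
# assumption for this example, not part of the released GSLM code.
def frames_to_pseudo_text(frame_units):
    """Collapse consecutive duplicate unit ids, e.g. [7,7,7,12,12,7] -> [7,12,7]."""
    out = []
    for u in frame_units:
        if not out or out[-1] != u:
            out.append(u)
    return out

units = frames_to_pseudo_text([7, 7, 7, 12, 12, 7, 3, 3])
print(units)  # [7, 12, 7, 3]
```

A language model trained on such sequences can then generate new unit strings, which the speech decoder renders back into a waveform.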
arXiv:2102.01192v2