
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting

Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni
2016 Interspeech 2016  
The loss function modifications consist of a combination of multi-task training and weighted cross-entropy.  ...  We show that weighted cross-entropy results in additional accuracy improvements.  ...  Summary: In this paper we have proposed and studied a combination of multi-task training and weighted cross-entropy DNN training loss functions for more accurate keyword (KW) spotting.  ... 
doi:10.21437/interspeech.2016-1485 dblp:conf/interspeech/PanchapagesanSK16 fatcat:gc5zgi2jynatbm2r2nuyxk4nyi
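The snippet above describes weighted cross-entropy only at a high level; as a hedged illustration (the function name, tensor shapes, and per-class weighting scheme below are assumptions, not the authors' implementation), a frame-wise weighted cross-entropy can be sketched as:

```python
import numpy as np

def weighted_cross_entropy(logits, targets, class_weights):
    """Frame-wise cross-entropy where each frame's loss is scaled by a
    weight attached to its target class (e.g. keyword states > filler)."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each frame's target class.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Scale each frame by its target-class weight, then normalize.
    weights = class_weights[targets]
    return float((weights * nll).sum() / weights.sum())
```

Up-weighting keyword classes penalizes missed keyword frames more heavily than filler errors, which is the usual motivation for this family of losses.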

State Sequence Pooling Training of Acoustic Models for Keyword Spotting

Kuba Łopatka, Tobias Bocklet
2020 Interspeech 2020  
It is equivalent to max/attention pooling but is based on prior acoustic knowledge. We also employ a multi-task learning setup by predicting both LVCSR and keyword posteriors.  ...  We compare our model to a baseline trained on frame-wise cross entropy, with and without per-class weighting. We employ a low-footprint TDNN for acoustic modeling.  ...  The baseline model is trained with frame-wise cross entropy loss as in Fig. 1. For comparison, we also train two models based on weighted cross entropy, as in [5, 11].  ... 
doi:10.21437/interspeech.2020-2722 dblp:conf/interspeech/LopatkaB20 fatcat:ynzfxgeqongnhdlxlgyagar5x4
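State sequence pooling itself is defined in the paper above; as a loosely related sketch of the max-pooling family it is compared against (the window length, state indexing, and geometric-mean combination are illustrative assumptions), a classic max-pooled keyword confidence looks like:

```python
import numpy as np

def kws_confidence(posteriors, keyword_states, window=30):
    """Max-pool each keyword state's posterior over the last `window`
    frames, then combine the pooled values with a geometric mean."""
    recent = posteriors[-window:]
    pooled = [recent[:, s].max() for s in keyword_states]
    return float(np.prod(pooled) ** (1.0 / len(pooled)))
```

Each keyword state only needs to fire strongly somewhere in the window, which is what makes pooling-style scores robust to alignment jitter.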

Domain Aware Training for Far-field Small-footprint Keyword Spotting [article]

Haiwei Wu, Yan Jia, Yuanfei Nie, Ming Li
2020 arXiv   pre-print
In this paper, we focus on the task of small-footprint keyword spotting under the far-field scenario.  ...  To cope with the distortions, we develop three domain aware training systems, including the domain embedding system, the deep CORAL system, and the multi-task learning system.  ...  We also employ multi-task learning to predict keywords and domains simultaneously.  ... 
arXiv:2005.03633v3 fatcat:af7cezrqlreg5ltl2ciwcaqjju

Model Compression Applied to Small-Footprint Keyword Spotting

George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni
2016 Interspeech 2016  
Motivated by this, we investigated two ways to improve deep neural network (DNN) acoustic models for keyword spotting without increasing CPU usage.  ...  Accurate on-device keyword spotting within a tight CPU budget is crucial for such devices.  ...  However, they have largely been replaced with better performing deep neural network (DNN) based keyword spotting systems (KWS).  ... 
doi:10.21437/interspeech.2016-1393 dblp:conf/interspeech/TuckerWSPFV16 fatcat:6jzsy6li2zfzpi66kbz44wdxfm

Domain Aware Training for Far-Field Small-Footprint Keyword Spotting

Haiwei Wu, Yan Jia, Yuanfei Nie, Ming Li
2020 Interspeech 2020  
In this paper, we focus on the task of small-footprint keyword spotting under the far-field scenario.  ...  To cope with the distortions, we develop three domain aware training systems, including the domain embedding system, the deep CORAL system, and the multi-task learning system.  ...  We also employ multi-task learning to predict keywords and domains simultaneously.  ... 
doi:10.21437/interspeech.2020-1412 dblp:conf/interspeech/WuJNL20 fatcat:aghbj3lqjjefbpphkmt5q3np6u

Investigation of DNN-Based Keyword Spotting in Low Resource Environments

Kaixiang Shen, Meng Cai, Wei-Qiang Zhang, Yao Tian, Jia Liu
2016 International Journal of Future Computer and Communication  
Keyword spotting is a challenging task aiming at detecting predefined keywords in utterances.  ...  In addition, we investigate several techniques including transfer learning, multilingual bottleneck features, balancing keyword and filler data, and data augmentation to address the low resource problem and  ...  ACKNOWLEDGMENT: We would like to thank Tingting Cheng for her devotion and help with the writing of this paper.  ... 
doi:10.18178/ijfcc.2016.5.2.458 fatcat:e75coyvfe5ajvhmjhrmvjbas4u

AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data [article]

Menglong Xu, Shengqiang Li, Chengdong Liang, Xiao-Lei Zhang
2021 arXiv   pre-print
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS).  ...  The proposed method not only maximizes the classification accuracy of keywords on the closed training set, but also maximizes the AUC score for optimizing the performance of non-keyword segment detection  ...  For the cross entropy loss, we use a mini-batch size of 128 and L2 weight decay of 10^-5.  ... 
arXiv:2107.05859v1 fatcat:5emegtzglndinj7zhnzvzbaiyu
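The paper defines its own AUC-based objective; as a generic sketch only (the hinge surrogate and margin below are assumptions for illustration), empirical AUC and a differentiable pairwise surrogate for maximizing it can be written as:

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Empirical AUC: the fraction of (keyword, non-keyword) score pairs
    that are ranked correctly, counting ties as half."""
    diffs = pos_scores[:, None] - neg_scores[None, :]
    return float((diffs > 0).mean() + 0.5 * (diffs == 0).mean())

def pairwise_auc_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge surrogate over all pos/neg pairs; driving it to zero pushes
    every keyword score at least `margin` above every non-keyword score."""
    diffs = pos_scores[:, None] - neg_scores[None, :]
    return float(np.maximum(0.0, margin - diffs).mean())
```

Because the surrogate depends only on score differences across the two classes, minimizing it directly targets the ranking behind the AUC rather than per-frame accuracy.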

Data Augmentation for Robust Keyword Spotting under Playback Interference [article]

Anirudh Raju, Sankaran Panchapagesan, Xing Liu, Arindam Mandal, Nikko Strom
2018 arXiv   pre-print
Accurate on-device keyword spotting (KWS) with low false accept and false reject rates is crucial to customer experience for far-field voice control of conversational agents.  ...  In this paper, we propose a data augmentation strategy to improve keyword spotting performance under these challenging conditions.  ...  A weighted cross-entropy objective function is used, where the loss stream corresponding to the speech recognition targets is weighted at 0.1 and the loss stream corresponding to the keyword targets is  ... 
arXiv:1808.00563v1 fatcat:yyax5ivafzejndtcvko7ay467m
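The snippet above gives the ASR-stream weight (0.1) but is cut off before the keyword-stream weight, so the keyword weight below is a placeholder assumption. A minimal sketch of combining the two cross-entropy streams:

```python
def multitask_loss(asr_loss, kws_loss, asr_weight=0.1, kws_weight=1.0):
    """Weighted sum of the ASR and keyword loss streams; the ASR stream
    is down-weighted so shared layers are driven mainly by the KWS task
    while the ASR targets act as a regularizer."""
    return asr_weight * asr_loss + kws_weight * kws_loss
```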

On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification

Rajath Kumar, Vaishnavi Yeruva, Sriram Ganapathy
2018 Interspeech 2018  
For the TDSV task, the MTL model can be well regularized using the CLSTM training examples for the personalized wake-up task.  ...  In this paper, we show that TDSV and keyword spotting (KWS) can be jointly modeled using the convolutional long short-term memory (CLSTM) model architecture, where an initial convolutional feature map  ...  The authors would also like to thank Dhanush Bekal and Harish Haresamudram for their help with i-vectors.  ... 
doi:10.21437/interspeech.2018-1759 dblp:conf/interspeech/KumarYG18 fatcat:7b4ycebkufeypmapiowad4vfj4

Hello Edge: Keyword Spotting on Microcontrollers [article]

Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra
2018 arXiv   pre-print
Keyword spotting (KWS) is a critical component for enabling speech based user interactions on smart devices. It requires real-time response and high accuracy for good user experience.  ...  We train various neural network architectures for keyword spotting published in literature to compare their accuracy and memory/compute requirements.  ...  Acknowledgements We would like to thank Matt Mattina from Arm Research and Ian Bratt from Arm ML technology group for their help and support.  ... 
arXiv:1711.07128v3 fatcat:swrltzaqc5hvjay7ofrx3r4lwy

Accurate Detection of Wake Word Start and End Using a CNN

Christin Jose, Yuriy Mishchenko, Thibaud Sénéchal, Anish Shah, Alex Escott, Shiv Naga Prasad Vitaladevuni
2020 Interspeech 2020  
Small footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency for enabling voice assistants.  ...  Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS.  ...  WL CNN (Section 2) and Multi-aligned CNN were trained using cross-entropy loss and the Adam optimizer in TensorFlow on that data for 2M steps, using random initialization, a mini-batch size of 4k, and learning  ... 
doi:10.21437/interspeech.2020-1491 dblp:conf/interspeech/JoseMSSEV20 fatcat:frphlebuurd5phx4mffm3tyb2a

Accurate Detection of Wake Word Start and End Using a CNN [article]

Christin Jose, Yuriy Mishchenko, Thibaud Senechal, Anish Shah, Alex Escott, Shiv Vitaladevuni
2020 arXiv   pre-print
Small footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency for enabling voice assistants.  ...  Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS.  ...  WL CNN (Section 2) and Multi-aligned CNN were trained using cross-entropy loss and the Adam optimizer in TensorFlow on that data for 2M steps, using random initialization, a mini-batch size of 4k, and learning  ... 
arXiv:2008.03790v1 fatcat:kyoarxbqmnhhbck4qgaprkqb6e

Multi-task Learning with Cross Attention for Keyword Spotting [article]

Takuya Higuchi, Anmol Gupta, Chandra Dhir
2021 arXiv   pre-print
In this paper, we introduce a cross attention decoder in the multi-task learning framework.  ...  Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data.  ...  INTRODUCTION: Keyword spotting (KWS) is the task of detecting a keyword phrase in audio.  ... 
arXiv:2107.07634v2 fatcat:dizcux4etjepldw7dkimy5ywy4
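The paper's cross attention decoder is specified there; as a generic sketch of the underlying mechanism only (the single-head, unprojected form and the shapes are simplifying assumptions), scaled dot-product cross attention is:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross attention: queries from one
    branch attend over keys/values produced by the other branch."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values
```

Each output row is a convex combination of the value rows, so the decoder can pull in whichever encoder frames best match its current query.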

Simultaneous Detection and Localization of a Wake-Up Word Using Multi-Task Learning of the Duration and Endpoint

Takashi Maekaku, Yusuke Kida, Akihiko Sugiyama
2019 Interspeech 2019  
This paper proposes a novel method for simultaneous detection and localization of a wake-up word using multi-task learning of the duration and endpoint.  ...  Experimental results with real-environment data show that relative improvements in accuracy of 41% for onset estimation and 38% for endpoint estimation are achieved compared to a baseline method.  ...  Weight parameters of all methods were trained with the cross-entropy criterion [18]. Baseline Methods: Two baseline methods based on Deep KWS [5] with a DNN and an LSTM were included.  ... 
doi:10.21437/interspeech.2019-1180 dblp:conf/interspeech/MaekakuKS19 fatcat:f4ag62v4kbfazhkoawifvknc4e

Score normalization and system combination for improved keyword spotting

Damianos Karakos, Richard Schwartz, Stavros Tsakalidis, Le Zhang, Shivesh Ranjan, Tim Ng, Roger Hsiao, Guruprasad Saikumar, Ivan Bulyko, Long Nguyen, John Makhoul, Frantisek Grezl (+6 others)
2013 IEEE Workshop on Automatic Speech Recognition and Understanding  
We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords  ...  merged together, and their scores are interpolated with weights which are optimized using MTWV as the maximization criterion.  ...  This is followed by realignment, and a second iteration of cross-entropy training.  ... 
doi:10.1109/asru.2013.6707731 dblp:conf/asru/KarakosSTZRNHSBNMGHKSVLL13 fatcat:ypfmkvvxafhzngsg2v333r5ffm
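The paper's exact normalization is defined there; as a common sum-to-one variant often used for ATWV-style scoring (the dict layout and this particular scheme are assumptions, not the authors' method), per-keyword score normalization can be sketched as:

```python
def normalize_scores(hits):
    """Divide each detection score by the total score mass of its
    keyword, making a single threshold comparable across keywords."""
    return {kw: [s / sum(scores) for s in scores]
            for kw, scores in hits.items()}
```

Rare keywords with few, low-magnitude hits get rescaled onto the same footing as frequent ones, which is why normalization of this kind helps term-weighted metrics.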
Showing results 1 — 15 out of 228 results