325 Hits in 7.4 sec

Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting

Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
2016 2016 IEEE Spoken Language Technology Workshop (SLT)  
We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements.  ...  The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network.  ...  Max-pooling: We propose to train the LSTM for keyword spotting using a max-pooling based loss function.  ...
doi:10.1109/slt.2016.7846306 dblp:conf/slt/SunRTPFMMSV16 fatcat:iut6eoolfzcrxa77ka5ncujzcy
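The snippet above describes the core idea plainly enough to sketch. The following is a minimal, illustrative numpy version of a max-pooling loss, not the paper's actual implementation: for a keyword utterance, cross-entropy is applied only at the frame where the keyword posterior peaks; for background audio, every frame is penalized. Function and argument names are invented for this sketch.

```python
import numpy as np

def max_pooling_loss(frame_posteriors, is_keyword, keyword_idx=1, eps=1e-12):
    """Toy sketch of a max-pooling loss for keyword spotting.

    frame_posteriors: (T, C) array of per-frame class posteriors from an LSTM.
    For a keyword utterance, take cross-entropy only at the frame where the
    keyword posterior peaks; for background, average over all frames.
    """
    if is_keyword:
        t = int(np.argmax(frame_posteriors[:, keyword_idx]))  # peak frame
        return -np.log(frame_posteriors[t, keyword_idx] + eps)
    # non-keyword utterance: reward the background class (index 0) everywhere
    return -np.log(frame_posteriors[:, 0] + eps).mean()
```

In training, the gradient for keyword utterances then flows through a single frame per utterance, which is what distinguishes this from the usual frame-level cross-entropy.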

Hello Edge: Keyword Spotting on Microcontrollers [article]

Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra
2018 arXiv   pre-print
We train various neural network architectures for keyword spotting published in the literature to compare their accuracy and memory/compute requirements.  ...  Keyword spotting (KWS) is a critical component for enabling speech based user interactions on smart devices. It requires real-time response and high accuracy for a good user experience.  ...  We would also like to thank Pete Warden from Google's TensorFlow team for his valuable inputs and feedback on this project.  ...
arXiv:1711.07128v3 fatcat:swrltzaqc5hvjay7ofrx3r4lwy

Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks [article]

Raphael Tang, Gefei Yang, Hong Wei, Yajie Mao, Ferhan Ture, Jimmy Lin
2018 arXiv   pre-print
than state-of-the-art CNNs for KWS, yet can be easily trained and deployed with limited resources.  ...  ASR systems require significant computational resources in training and for inference, not to mention copious amounts of annotated speech data.  ...  Finally, it feeds the long-term context into a deep neural network (DNN) classifier for our N + 1 voice query labels.  ...
arXiv:1812.07754v1 fatcat:aorwtniflvenjfnty2fu6yx2vq

Encoder-Decoder Neural Architecture Optimization for Keyword Spotting [article]

Tong Mo, Bang Liu
2021 arXiv   pre-print
In this paper, we utilize neural architecture search to design convolutional neural network models that can boost the performance of keyword spotting while maintaining an acceptable memory footprint.  ...  Keyword spotting aims to identify specific keyword audio utterances. In recent years, deep convolutional neural networks have been widely utilized in keyword spotting systems.  ...  There are also other efforts to boost performance of CNN models for KWS by combining other deep learning models, such as recurrent neural network (RNN) [9] , bidirectional long short-term memory (BiLSTM  ... 
arXiv:2106.02738v1 fatcat:ekl3ju5g4bfxdcry62kam3v6kq

An Ultra-low Power RNN Classifier for Always-On Voice Wake-Up Detection Robust to Real-World Scenarios [article]

Emmanuel Hardy, Franck Badets
2021 arXiv   pre-print
We demonstrate the superiority of RNNs on this task compared to the other tested approaches, with an estimated power consumption of 45 nW for the RNN itself in 65nm CMOS and a minimal memory footprint  ...  The purpose of our sensor is to reduce by at least a factor of 100 the power consumption in background noise of always-on speech processing algorithms such as Automatic Speech Recognition, Keyword Spotting  ...  A state-of-the-art small-footprint keyword spotting system achieves a 2.85% False Rejection Rate at 1 False Accept per hour at 5 dB Signal-to-Noise Ratio (SNR) [1].  ...
arXiv:2103.04792v1 fatcat:astg4vruq5fdfo7ak4wmtsof2m

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting [article]

Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou
2021 arXiv   pre-print
For Keyword Spotting (KWS), achieving a trade-off between small footprint and high accuracy remains challenging.  ...  Furthermore, a new type of model (LG-Net) is carefully designed to promote long- and short-term acoustic feature modeling based on 1D-CNN and self-attention.  ...  Conclusion: In this paper, we propose a text anchor based metric learning method and design a neural network, LG-Net, for small-footprint keyword spotting (KWS).  ...
arXiv:2108.05516v1 fatcat:oo2h65va4bgqxe2fpn3bz7omcm

Small-Footprint Wake Up Word Recognition in Noisy Environments Employing Competing-Words-Based Feature

Ki-mu Yoon, Wooil Kim
2020 Electronics  
This paper proposes a small-footprint wake-up-word (WUW) recognition system for real noisy environments by employing the competing-words-based feature.  ...  To obtain sufficient data for training, data augmentation is performed by using a room impulse response filter and adding sound signals of various television shows as background noise, which simulates  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/electronics9122202 fatcat:qcozwqf3gjcrzc6lq5vbjcdur4

Deep Spoken Keyword Spotting: An Overview

Ivan Lopez-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen
2021 IEEE Access  
INDEX TERMS Keyword spotting, deep learning, acoustic model, small footprint, robustness.  ...
doi:10.1109/access.2021.3139508 fatcat:i4pfpfxcpretlkbefp7owtxcti

Exploring Filterbank Learning for Keyword Spotting [article]

Iván López-Espejo and Zheng-Hua Tan and Jesper Jensen
2020 arXiv   pre-print
possibilities in the field of small-footprint KWS.  ...  In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS).  ...  In [6] , Sainath et al. train a raw time convolutional layer (i.e., filterbank), initialized with a gammatone filterbank, jointly with a convolutional, long short-term memory deep neural network (DNN)  ... 
arXiv:2006.00217v1 fatcat:hciy4jfkjzadlgt65jmm6h4x5q
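The filterbank-learning idea the snippet describes, a trainable layer replacing fixed mel or gammatone filters, can be illustrated with a short sketch. This is not the paper's code: the triangular filters here stand in for the mel/gammatone initialization, and in a real system the matrix `fb` would be a trainable weight matrix updated jointly with the rest of the network.

```python
import numpy as np

def triangular_filterbank(n_filters=8, n_fft_bins=64):
    """Triangular filters on a linear frequency axis -- a stand-in for the
    mel/gammatone initialization a learnable filterbank layer might use."""
    centers = np.linspace(0, n_fft_bins - 1, n_filters + 2)
    fb = np.zeros((n_filters, n_fft_bins))
    bins = np.arange(n_fft_bins)
    for i in range(n_filters):
        left, center, right = centers[i], centers[i + 1], centers[i + 2]
        up = (bins - left) / (center - left)        # rising slope
        down = (right - bins) / (right - center)    # falling slope
        fb[i] = np.clip(np.minimum(up, down), 0.0, None)
    return fb

# A "learnable" filterbank is just this matrix applied to the power spectrum;
# training would adjust fb's entries instead of keeping them fixed.
power_spectrum = np.abs(np.random.randn(64)) ** 2
fb = triangular_filterbank()
log_energies = np.log(fb @ power_spectrum + 1e-8)  # (8,) features per frame
```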

A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting [article]

Yue Gu, Zhihao Du, Hui Zhang, Xueliang Zhang
2019 arXiv   pre-print
Robustness against noise is critical for keyword spotting (KWS) in real-world environments. To improve the robustness, a speech enhancement front-end is employed.  ...  To fit small-footprint devices, a novel convolutional recurrent network is proposed, which needs fewer parameters and less computation and does not degrade performance.  ...  However, its enhancement model is based on bidirectional long short-term memory (BiLSTM), which needs too many parameters and too much computation to fit a small-footprint device.  ...
arXiv:1906.08415v1 fatcat:frromri5djfu7btz77l4jgjs6e

A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification [article]

Chieh-Chi Kao, Ming Sun, Weiran Wang, Chao Wang
2020 arXiv   pre-print
As long short-term memory (LSTM) leads to state-of-the-art results in various speech related tasks, it is employed as a popular solution for AEC as well.  ...  We find max pooling on the prediction level to perform the best among the nine pooling approaches in terms of classification accuracy and insensitivity to event position within an utterance.  ...  As long short-term memory (LSTM) [10] leads to state-of-the-art results in various speech related tasks, e.g. automatic speech recognition [4], keyword spotting [11], speaker identification [12]  ...
arXiv:2002.06279v1 fatcat:hzy2lknghzde5abcfg4ieprysu

A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification

Chieh-Chi Kao, Ming Sun, Weiran Wang, Chao Wang
2020 ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
As long short-term memory (LSTM) leads to state-of-the-art results in various speech related tasks, it is employed as a popular solution for AEC as well.  ...  We find max pooling on the prediction level to perform the best among the nine pooling approaches in terms of classification accuracy and insensitivity to event position within an utterance.  ...  As long short-term memory (LSTM) [10] leads to state-of-the-art results in various speech related tasks, e.g. automatic speech recognition [4], keyword spotting [11], speaker identification [12]  ...
doi:10.1109/icassp40776.2020.9053150 dblp:conf/icassp/KaoSWW20 fatcat:qirotninifdulmajnn6fhjm7ea
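Three of the pooling strategies compared in this line of work can be sketched directly from the abstract: pooling per-frame scores into one utterance-level score. This is an illustrative fragment, not the authors' code; only three of the nine approaches they compare are shown, and the function name is invented.

```python
import numpy as np

def pool_predictions(frame_scores, method="max"):
    """Pool per-frame event scores (shape (T,)) into one utterance score."""
    if method == "max":    # peak frame decides -- insensitive to event position
        return frame_scores.max()
    if method == "mean":   # average over all frames
        return frame_scores.mean()
    if method == "last":   # classic LSTM readout: final frame only
        return frame_scores[-1]
    raise ValueError(f"unknown pooling method: {method}")

scores = np.array([0.1, 0.2, 0.9, 0.3])  # event occurs mid-utterance
```

With the event mid-utterance, max pooling keeps the 0.9 peak while last-frame readout sees only 0.3, which is the position-insensitivity advantage the abstract reports.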

Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks [article]

Théodore Bluche, Maël Primet, Thibault Gisselbrecht
2020 arXiv   pre-print
The model, based on a quantized long short-term memory (LSTM) neural network trained with connectionist temporal classification (CTC), weighs less than 500 KB.  ...  We describe the different design choices leading to a fast and small-footprint system, able to run on tiny devices, for any arbitrary set of user-defined keywords, without training data specific to those  ...  In this paper, we explore a method similar to [23], based on a CTC-trained neural network made of long short-term memory (LSTM) layers.  ...
arXiv:2002.10851v1 fatcat:czidimdbvrbiplyerctlk55ffi

A depthwise separable convolutional neural network for keyword spotting on an embedded system

Peter Mølgaard Sørensen, Bastian Epp, Tobias May
2020 EURASIP Journal on Audio, Speech, and Music Processing  
Availability of data and materials: The data that support the findings of this study are available from [34]. Competing interests: The authors declare that they have no competing interests.  ...  Acknowledgements: This research was supported by the Centre for Applied Hearing Research (CAHR).  ...  It was also found that the weights of the network could all be quantized to 4 bits with no substantial loss of accuracy, which can significantly reduce the memory footprint and possibly reduce the processing  ...
doi:10.1186/s13636-020-00176-2 fatcat:xteuziyw2fbn5mcfe5y7uptipm
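The 4-bit weight quantization mentioned in the snippet can be illustrated with a generic sketch. This is symmetric uniform post-training quantization in numpy, not the scheme from the paper itself; with 4 signed bits, codes range over [-7, 7] and one floating-point scale per tensor recovers the approximate weights.

```python
import numpy as np

def quantize_weights(w, n_bits=4):
    """Symmetric uniform quantization of a weight tensor to n_bits:
    store small integer codes plus a single per-tensor scale factor."""
    levels = 2 ** (n_bits - 1) - 1          # 7 for 4-bit signed codes
    scale = np.abs(w).max() / levels
    if scale == 0:
        scale = 1.0                          # all-zero tensor edge case
    q = np.clip(np.round(w / scale), -levels, levels).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_weights(w)
w_hat = q * scale                            # dequantized for inference
```

The memory saving comes from storing `q` in 4 bits per weight (packed two-per-byte in practice) instead of 32-bit floats, while the reconstruction error per weight is bounded by half the scale step.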

Optimality Assessment of Memory-Bounded ConvNets Deployed on Resource-Constrained RISC Cores

Matteo Grimaldi, Valentino Peluso, Andrea Calimera
2019 IEEE Access  
Results are collected from three realistic IoT tasks (Image Classification on CIFAR-10, Keyword Spotting on the Speech Commands Dataset, Facial Expression Recognition on Fer2013) run on RISC cores (Cortex-M  ...  A cost-effective implementation of Convolutional Neural Nets on the mobile edge of the Internet-of-Things (IoT) requires smart optimizations to fit large models into memory-constrained cores.  ...  It consists of three convolutional layers interleaved with max-pooling and one fully-connected layer. Keyword Spotting (KWS): a well-known application in the field of speech recognition, which is hard  ...
doi:10.1109/access.2019.2948577 fatcat:6ckfv72yxjd2ppbct7zwe4ahwm
Showing results 1 — 15 out of 325 results