Small-footprint Keyword Spotting Using Deep Neural Network and Connectionist Temporal Classifier [article]

Zhiming Wang, Xiaolong Li, Jun Zhou
2017 arXiv   pre-print
Mainly for the sake of solving the lack of keyword-specific data, we propose one Keyword Spotting (KWS) system using Deep Neural Network (DNN) and Connectionist Temporal Classifier (CTC) on power-constrained  ...  small-footprint mobile devices, taking full advantage of general corpus from continuous speech recognition which is of great amount.  ...  In this paper, we propose one keyword spotting system using Deep Neural Network and Connectionist Temporal Classifier (CTC) [6] , which makes full use of general LVCSR corpus.  ... 
arXiv:1709.03665v1 fatcat:fmtevd64brcwrfbhpz66g7zpnm

Deep Spoken Keyword Spotting: An Overview

Ivan Lopez-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen
2021 IEEE Access  
Sun, “Robust small-footprint keyword spotting using sequence-to-sequence model with connectionist  ...  tional neural network for keyword spotting on an embedded system,”  ...  Parada, “Convolutional neural networks for small-footprint keyword spotting,” in Proceedings  ...  ting using time-domain features in a temporal convolutional network,”  ... 
doi:10.1109/access.2021.3139508 fatcat:i4pfpfxcpretlkbefp7owtxcti

Deep Template Matching for Small-Footprint and Configurable Keyword Spotting

Peng Zhang, Xueliang Zhang
2020 Interspeech 2020  
Keyword spotting (KWS) is a very important technique for human-machine interaction to detect a trigger phrase and voice commands.  ...  In this paper, we propose a novel template matching approach for KWS based on end-to-end deep learning method, which utilizes an attention mechanism to match the input voice to the keyword templates in  ...  Some previously conducted studies replaced the HMM by recurrent neural network models trained with a connectionist temporal classification criterion [2] or by an attention-based model [3] .  ... 
doi:10.21437/interspeech.2020-1761 dblp:conf/interspeech/ZhangZ20 fatcat:5q2fk57e3vbvrjqydakdietle4

QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer [article]

Jinmiao Huang, Waseem Gharbieh, Qianhui Wan, Han Suk Shim, Chul Lee
2022 arXiv   pre-print
Current keyword spotting systems are typically trained with a large amount of pre-defined keywords.  ...  Towards this goal, we propose a pure MLP-based neural network that is based on MLPMixer - an MLP model architecture that effectively replaces the attention mechanism in Vision Transformers.  ...  Over the years, different neural networks have been proposed for the fixed keyword spotting (KWS) task: for example, using Deep Neural Networks (DNNs) [1, 2] , Time Delay Neural Networks [3] , Convolutional  ... 
arXiv:2206.13231v1 fatcat:drwjgpz4ibcdblkjh365oslj6m

A neural attention model for speech command recognition [article]

Douglas Coimbra de Andrade, Sabato Leo, Martin Loesener Da Silva Viana, Christoph Bernkopf
2018 arXiv   pre-print
The proposed model establishes a new state-of-the-art accuracy of 94.1% on Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-commands recognition task), while still keeping a small footprint  ...  by the network when outputting a given category.  ...  Command recognition using deep residual networks has been investigated in Tang and Lin (2017) , Arik et al. (2017) and Sainath and Parada (2015) .  ... 
arXiv:1808.08929v1 fatcat:lvctjdpay5fjfcn6lsvx4ezztm

Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation

Zhiying Huang, Shiliang Zhang, Ming Lei
2019 Interspeech 2019  
Convolutional neural network (CNN) is the most popular choice among a wide variety of model structures, and it's successfully applied to audio events prediction task.  ...  Meanwhile, an audio-to-audio ratio (AAR) based data augmentation method is proposed to further improve the classifier performance.  ...  Thereby, it is promoted to many other tasks, such as text to speech (TTS) [26] and smaller footprint keyword spotting (KWS) [27] . In this paper, we propose to use cFSMN in audio tagging.  ... 
doi:10.21437/interspeech.2019-1302 dblp:conf/interspeech/HuangZL19 fatcat:5oyxjqlqcbazthsfguwn2q4soq

Generalisation Gap of Keyword Spotters in a Cross-Speaker Low-Resource Scenario

Łukasz Lepak, Kacper Radzikowski, Robert Nowak, Karol J. Piczak
2021 Sensors  
As our primary approach, we evaluate Siamese and prototypical neural networks trained on several datasets of English and Polish recordings.  ...  Models for keyword spotting in continuous recordings can significantly improve the experience of navigating vast libraries of audio recordings.  ...  Voice Datasets Various voice datasets suitable for speech recognition and keyword spotting tasks are used for training neural network models.  ... 
doi:10.3390/s21248313 pmid:34960407 pmcid:PMC8704929 fatcat:2nlurls7hbehhcgbu32emkz5wq

An Optimized Recurrent Unit for Ultra-Low-Power Keyword Spotting [article]

Justice Amoh, Kofi Odame
2019 arXiv   pre-print
Our new architecture, the Embedded Gated Recurrent Unit (eGRU) is demonstrated to be highly efficient and suitable for short-duration AED and keyword spotting tasks.  ...  There is growing interest in being able to run neural networks on sensors, wearables and internet-of-things (IoT) devices.  ...  Another idea for tackling for lengthy sequences is to use eGRU in a sequence-to-sequence network architecture [36] with a sequential loss function such as the connectionist temporal classifier (CTC)  ... 
arXiv:1902.05026v1 fatcat:xth36fjcyrgupac2nbadj6iwyi

Brain-Inspired Learning on Neuromorphic Substrates

Friedemann Zenke, Emre O. Neftci
2021 Proceedings of the IEEE  
KEYWORDS | Artificial neural networks; biological neural networks; learning systems; machine learning; neural network hardware; neuromorphic engineering; recurrent neural networks (RNNs).  ...  | Neuromorphic hardware strives to emulate brain-like neural networks and thus holds the promise for scalable, low-power information processing on temporal data streams.  ...  RNNs have proved highly effective for sequential processing, such as keyword spotting, object recognition, and time-series forecasting [7] , [43] .  ... 
doi:10.1109/jproc.2020.3045625 fatcat:pelkbpbg5jg7pjyvkvtpgrt2su

Brain-Inspired Learning on Neuromorphic Substrates [article]

Friedemann Zenke, Emre O. Neftci
2020 arXiv   pre-print
Neuromorphic hardware strives to emulate brain-like neural networks and thus holds the promise for scalable, low-power information processing on temporal data streams.  ...  learning rules for training Spiking Neural Networks (SNNs).  ...  These dynamics are different from the majority of deep neural networks, which are often strictly feedforward, and lack the fine temporal dynamics of brains.  ... 
arXiv:2010.11931v1 fatcat:e7bwgrmynvgmfkuordiqb3zusq

A first look into the carbon footprint of federated learning [article]

Xinchi Qiu, Titouan Parcollet, Javier Fernandez-Marques, Pedro Porto Buarque de Gusmao, Yan Gao, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane
2022 arXiv   pre-print
We performed extensive experiments across different types of datasets, settings and various deep learning models with FL.  ...  Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers.  ...  Acknowledgments This work was supported by the UK's Engineering and Physical Sciences Research Council (EPSRC) with grants EP/M50659X/1 and EP/S001530/1 and the European Research Council via the REDIAL  ... 
arXiv:2102.07627v5 fatcat:zzeqxhkwtzeqbgaoyrp2lbj4y4

Developing Far-Field Speaker System Via Teacher-Student Learning [article]

Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong
2018 arXiv   pre-print
In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.  ...  We also use T/S learning to compress a large-size KWS model into a small-size one to fit the device computational cost.  ...  Given the recent success of end-to-end modeling, we used the connectionist temporal classification (CTC) approach [25] [26] for KWS [27] .  ... 
arXiv:1804.05166v1 fatcat:ct66kb5l7bgsnjuvha6c4vrwgq

A review of machine learning in processing remote sensing data for mineral exploration [article]

Hojat Shirmard, Ehsan Farahbakhsh, Dietmar Muller, Rohitash Chandra
2021 arXiv   pre-print
Moreover, these methods are robust in processing spectral and ground truth measurements against noise and uncertainties.  ...  This paper reviews the implementation and adaptation of some popular and recently established machine learning methods for remote sensing data processing and investigates their applications for exploring  ...  on this topic using different keywords.  ...  can accurately and efficiently classify remotely sensed imagery  ... 
arXiv:2103.07678v1 fatcat:srlddi22grhofiwxw2qpetee7q

Tools and resources for Romanian text-to-speech and speech-to-text applications [article]

Tiberiu Boros, Stefan Daniel Dumitrescu, Vasile Pais
2018 arXiv   pre-print
While the tools are general purpose and can be used for any language (we successfully trained our system for more than 50 languages and participated in the Universal Dependencies Shared Task), the resources  ...  In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.  ...  As we are currently working on a neural-based speech recognition and keyword spotting tool for Romanian, providing pre-trained models on the entire speech corpus will not be a problem and will mitigate  ... 
arXiv:1802.05583v1 fatcat:4ubjkvjr4bcvhkdcwstx2shsji

Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook

Mike Davies, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya, Gabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, Sumedh R. Risbud
2021 Proceedings of the IEEE  
KEYWORDS | Computer architecture; neural network hardware; neuromorphics. I.  ...  While conventional feedforward deep neural networks show modest if any benefit on Loihi, more brain-inspired networks using recurrence, precise spike-timing relationships, synaptic plasticity, stochasticity  ...  and direct neural probes, drone-based structural health monitoring, and numerous audio applications, such as low-power keyword spotting, speech recognition, speaker identification, denoising, and sound  ... 
doi:10.1109/jproc.2021.3067593 fatcat:krqdmy3u6jdvfl7btjglek5ag4