576 Hits in 3.1 sec

Dual-input Control Interface for Deep Neural Network Based on Image/Speech Recognition

Neng-Sheng Pai, Yi-Hsun Chen, Chin-Pao Hung, Pi-Yun Chen, Ying-Che Kuo, Jun-Yu Chen
2019 Sensors and Materials  
/audio recognition consisting of two input interface systems, hand posture and speech recognition, with the use of specific hand postures or voice commands for control without the need for wearable devices  ...  The speech feature parameters were then input to the LSTM neural network to make predictions and achieve speech recognition.  ...  A CNN and LSTM were used to achieve hand posture and voice recognition. This made control by a specific posture or voice command possible without the need for a wearable device.  ... 
doi:10.18494/sam.2019.2481 fatcat:jeqdiylx7jfg7d7r4hugcfx42e

A 2D Convolutional Gating Mechanism for Mandarin Streaming Speech Recognition

Xintong Wang, Chuangang Zhao
2021 Information  
Recent research shows recurrent neural network-Transducer (RNN-T) architecture has become a mainstream approach for streaming speech recognition.  ...  In this work, we investigate the VGG2 network as the input layer to the RNN-T in streaming speech recognition.  ...  We combine convolutional networks with LSTM as the Encoder of Transducer to build a low-latency streaming speech recognition model.  ... 
doi:10.3390/info12040165 fatcat:fdjg3ttx6bf7zkez3gefsxgid4
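The 2D convolutional gating mentioned in the entry above follows the gated-linear-unit (GLU) pattern: one convolution branch is modulated by the sigmoid of a parallel branch. A minimal numpy sketch of just the gating step (the function name `glu_gate` and the toy feature maps are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu_gate(features_a, features_b):
    """Gated linear unit: one branch's features, element-wise modulated
    by the sigmoid of the parallel branch's features."""
    return features_a * sigmoid(features_b)

# Toy stand-ins for two parallel 2D-convolution outputs over (time, freq)
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8))
b = rng.standard_normal((4, 8))
gated = glu_gate(a, b)
```

Because the sigmoid lies strictly between 0 and 1, the gate can only attenuate features from the first branch, which is what makes it act as a learned, data-dependent filter.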

Effective Combination of DenseNet and BiLSTM for Keyword Spotting

Mengjun Zeng, Nanfeng Xiao
2019 IEEE Access  
DenseNet-BiLSTM is able to achieve an accuracy of 96.6% on the 20-command recognition task with 223K trainable parameters.  ...  INDEX TERMS Keyword spotting, speech recognition, DenseNet, long short-term memory, attention mechanism.  ...  Therefore, for KWS, it is very important to obtain good accuracy in speech command recognition.  ... 
doi:10.1109/access.2019.2891838 fatcat:7cpgawushfe37pmosyrhxnhz6e

Dynamic Hand Gesture Recognition for Wearable Devices with Low Complexity Recurrent Neural Networks [article]

Sungho Shin, Wonyong Sung
2016 arXiv   pre-print
Gesture recognition is an essential technology for many wearable devices.  ...  One is based on video signals and employs a combined structure of a convolutional neural network (CNN) and an RNN. The other uses accelerometer data and only requires an RNN.  ...  Although speech recognition can be more versatile, gesture recognition can also be conveniently used for issuing simple commands.  ... 
arXiv:1608.04080v1 fatcat:5qgoufornje5tlw4lg2tkrz5yi

An Innovative Approach Utilizing Binary-View Transformer for Speech Recognition Task

Muhammad Babar Kamal, Arfat Ahmad Khan, Faizan Ahmed Khan, Malik Muhammad Ali Shahid, Chitapong Wechtaisong, Muhammad Daud Kamal, Muhammad Junaid Ali, Peerapong Uthansakul
2022 Computers Materials & Continua  
This paper presents a novel end-to-end binary-view transformer-based architecture for speech recognition to cope with the frequency resolution problem.  ...  The proposed system has generated robust results on Google's speech command dataset, with an accuracy of 95.16% and minimal loss.  ...  ., Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) allow the machine to process sequential data for tasks such as speech recognition.  ... 
doi:10.32604/cmc.2022.024590 fatcat:ztetxu3wxfh7ljjtl7t3ubep3u

Hello Edge: Keyword Spotting on Microcontrollers [article]

Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra
2018 arXiv   pre-print
Keyword spotting (KWS) is a critical component for enabling speech based user interactions on smart devices. It requires real-time response and high accuracy for good user experience.  ...  Recently, neural networks have become an attractive choice for KWS architecture because of their superior accuracy compared to traditional speech processing algorithms.  ...  We would also like to thank Pete Warden from Google's TensorFlow team for his valuable inputs and feedback on this project.  ... 
arXiv:1711.07128v3 fatcat:swrltzaqc5hvjay7ofrx3r4lwy
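On-microcontroller KWS, as in the entry above, is dominated by the parameter and memory budget, which is why depthwise-separable convolutions (one architecture the Hello Edge paper evaluates) are attractive. A small back-of-the-envelope sketch of the weight-count saving; the function names and the 64-channel, 3x3 example are illustrative assumptions:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ds_conv_params(c_in, c_out, k):
    """Depthwise separable: one k x k depthwise filter per input channel,
    followed by a 1 x 1 pointwise convolution mixing channels."""
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)      # 64*64*9  = 36864 weights
separable = ds_conv_params(64, 64, 3)  # 64*9 + 64*64 = 4672 weights
```

For this layer shape the separable variant needs roughly an eighth of the weights, which is the kind of reduction that makes a KWS model fit in tens of kilobytes of flash.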

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Martin Wöllmer, Felix Weninger, Jürgen Geiger, Björn Schuller, Gerhard Rigoll
2013 Computer Speech and Language  
This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR).  ...  We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization.  ... 
doi:10.1016/j.csl.2012.05.002 fatcat:g2n3fjgcsvdj5n35vgn5tp2ocm
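The NMF source separation used in the entry above factors a non-negative (magnitude-spectrogram-like) matrix V into non-negative bases W and activations H. A minimal sketch of plain (non-convolutive) NMF with Lee-Seung multiplicative updates on random data; the function name `nmf` and all sizes are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Approximate V ~ W @ H with non-negative factors, using
    Lee-Seung multiplicative updates for the Euclidean objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(n_iter):
        # Each update is a non-negative rescaling, so W, H stay >= 0
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((20, 30)))
W, H = nmf(V, rank=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a speech-enhancement setting the columns of W would be learned spectral bases for speech and noise, and separation amounts to keeping only the speech bases' contribution to the reconstruction.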

Deep Learning for Intelligent Exploration of Image Details

Okanti Apoorva, Y.Mohan Sainath, G.Mallikarjuna Rao
2017 International Journal of Computer Applications Technology and Research  
Most of the approaches involve the use of very large convolutional neural networks (CNNs) for object detection in the photographs, and then a recurrent neural network (RNN) like an LSTM (Long Short-Term Memory)  ...  Once you can detect objects in photographs and generate labels for those objects, you can see that the next step is to turn those labels into a coherent sentence description.  ...  Acknowledgement: We are thankful to our beloved director P.S Raju, Principal J.N Murthy for their kind support.  ... 
doi:10.7753/ijcatr0607.1012 fatcat:sxw5g23eafdmvg6jwdicpddvri

A review of on-device fully neural end-to-end automatic speech recognition algorithms [article]

Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han
2021 arXiv   pre-print
In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications.  ...  To obtain sufficiently high speech recognition accuracy with such conventional speech recognition systems, a very large language model (up to 100 GB) is usually needed.  ...  For on-device command recognition, shallow fusion with a WFST is also useful for specific domains, since a WFST contains a list of words, not just subword units [41].  ... 
arXiv:2012.07974v3 fatcat:uxpxqcgcvvg7dfrkl2rxekkmse
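Shallow fusion, as mentioned in the snippet above, is a decode-time log-linear combination: the end-to-end model's score plus a weighted external language-model score. A tiny sketch with made-up probabilities and a hypothetical function name (`shallow_fusion_score`); the 0.3 weight is an illustrative assumption:

```python
import math

def shallow_fusion_score(e2e_logprob, lm_logprob, lm_weight=0.3):
    """Combine the end-to-end model score with an external LM score
    at decode time: log p = log p_e2e + lambda * log p_lm."""
    return e2e_logprob + lm_weight * lm_logprob

# Two competing hypotheses: the acoustically slightly better one is
# out-of-domain, and the LM tips the decision toward the in-domain one.
hyp_a = shallow_fusion_score(math.log(0.40), math.log(0.30))  # "turn on the light"
hyp_b = shallow_fusion_score(math.log(0.45), math.log(0.02))  # "turn on the lite"
best = max((hyp_a, "turn on the light"), (hyp_b, "turn on the lite"))
```

The WFST variant the snippet describes plays the role of the external LM here, restricted to a domain-specific word list.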

EdgeRNN: A Compact Speech Recognition Network with Spatio-temporal Features for Edge Computing

Shunzhi Yang, Zheng Gong, Kai Ye, Yungen Wei, Zhenhua Huang, Zheng Huang
2020 IEEE Access  
Speech keyword recognition uses Google's Speech Commands Dataset V1, with a weighted average recall (WAR) of 96.82%.  ...  In this paper, we propose a compact speech recognition network with spatio-temporal features for edge computing, named EdgeRNN.  ...  [35] take advantage of a CNN and Long Short-Term Memory (LSTM) for speech emotion recognition. Wang [36] combined a CNN and a Gated Recurrent Unit (GRU) for hate speech recognition.  ... 
doi:10.1109/access.2020.2990974 fatcat:quhnet2xkbbd7nqfrtgsujr3na

Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?

Felix Weninger, Martin Wöllmer, Jürgen Geiger, Björn Schuller, Jort F. Gemmeke, Antti Hurmalainen, Tuomas Virtanen, Gerhard Rigoll
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
matrix factorization (NMF) for speech enhancement.  ...  This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods in order to improve automatic speech recognition in highly noisy and reverberant  ...  Every memory block consists of self-connected memory cells and three multiplicative gate units (input, output, and forget gates). Further details on the LSTM principle can be found in [10].  ... 
doi:10.1109/icassp.2012.6288963 dblp:conf/icassp/WeningerWGSGHVR12 fatcat:glett5cf5rf7pcgom43k6ys4by
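The snippet above describes the LSTM memory block: a self-connected cell guarded by input, forget, and output gates. One forward time step can be sketched in numpy as follows; the function name `lstm_step`, the stacked-weight layout, and the sizes (13 MFCC-like inputs, 8 hidden units) are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with input, forget, and output gates
    modulating a self-connected memory cell."""
    z = W @ x + U @ h_prev + b       # stacked pre-activations, shape (4n,)
    n = h_prev.shape[0]
    i = sigmoid(z[0:n])              # input gate
    f = sigmoid(z[n:2 * n])          # forget gate
    o = sigmoid(z[2 * n:3 * n])      # output gate
    g = np.tanh(z[3 * n:4 * n])      # candidate cell update
    c = f * c_prev + i * g           # self-connected memory cell
    h = o * np.tanh(c)               # gated hidden output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 13, 8                  # e.g. 13 cepstral features per frame
W = 0.1 * rng.standard_normal((4 * n_hid, n_in))
U = 0.1 * rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for frame in rng.standard_normal((20, n_in)):  # 20 feature frames
    h, c = lstm_step(frame, h, c, W, U, b)
```

The forget gate's multiplicative path `f * c_prev` is what lets gradients flow over long spans, which is the property these ASR papers rely on for temporal context modeling.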

Attention Incorporate Network: A network can adapt various data size [article]

Liangbo He, Hao Sun
2018 arXiv   pre-print
Sequence models (RNN, LSTM, etc.) can accept inputs of different sizes, like text and audio.  ...  One disadvantage of sequence models, however, is that earlier information becomes more fragmentary as it is transferred across time steps, which makes the network hard to train, especially for long sequential  ...  Audio recognition: Speech Commands. The Speech Commands dataset consists of 65,000 wave audio files of people saying 30 different words.  ... 
arXiv:1806.03961v1 fatcat:v63zszkeabc2pchr7gxx5z42km

Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition

Tessfu Geteye Fantaye, Junqing Yu, Tulu Tilahun Hailu
2020 Computers  
Of these networks, convolutional neural network (CNN) is an effective network for representing the local properties of the speech formants.  ...  Deep neural networks (DNNs) have shown a great achievement in acoustic modeling for speech recognition task.  ...  [27] investigated the GCNN model for speech command recognition and small-scale speech recognition tasks, respectively.  ... 
doi:10.3390/computers9020036 fatcat:k54s5pj7grggffsbibjpm5jk2q

A Review of Deep Learning Research

2019 KSII Transactions on Internet and Information Systems  
recognition and online advertising, and so on.  ...  of big data, deep learning technology has become an important research direction in the field of machine learning, which has been widely applied in image processing, natural language processing, speech  ...  Acknowledgements We thank the anonymous referees for their helpful comments and suggestions on the initial version of this paper.  ... 
doi:10.3837/tiis.2019.04.001 fatcat:tefkvk3fvvanbkzwmjn44eoxsu

Low Latency Based Convolutional Recurrent Neural Network Model for Speech Command Recognition

Chhayarani Ram Kinkar, Yogendra Kumar Jain
2021 Information Technology and Control  
The presented paper proposes a new speech command recognition model for novel engineering applications with limited resources.  ...  The recognition accuracy of the proposed model is 96% on Google's speech command dataset, and on laboratory recording, its recognition accuracy is 89%.  ...  Figure 2: Basic CRNN for speech command recognition (FC is an abbreviation of fully connected layer).  ...  Speech Command Recognition Using CRNN: The convolutional and recurrent layers of the CRNN process the input  ... 
doi:10.5755/j01.itc.50.4.27352 fatcat:xlu72fxblzfrdpria5rlicoymm
Showing results 1 — 15 out of 576 results