
Modeling human activities as speech

Chia-Chih Chen, J. K. Aggarwal
2011 CVPR 2011  
While the essence of the speech signal is the variation of air pressure over time, our method models activities as the likelihood time series of action-associated local interest patterns.  ...  Human activity recognition and speech recognition appear to be two loosely related research areas.  ...  We are motivated to model human activities as speech due to the analogies between their production mechanisms.  ... 
doi:10.1109/cvpr.2011.5995555 dblp:conf/cvpr/ChenA11 fatcat:cfakbtxj4rbrjdn2gbw633urhi
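
As a rough illustration of the idea in this entry, the sketch below (hypothetical, not the authors' code) turns per-frame counts of local interest-pattern types into a per-action log-likelihood time series, the "speech-like" signal the abstract describes; the pattern distributions, counts, and window length are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, A = 8, 100, 3                      # pattern types, frames, actions

counts = rng.poisson(1.0, size=(T, K))   # stand-in for detected patterns per frame
# Per-action pattern distributions (rows sum to 1); assumed learned from training data.
theta = rng.dirichlet(np.ones(K), size=A)

def likelihood_series(counts, theta, win=10):
    """Log-likelihood of each action over a sliding window (multinomial model)."""
    T = counts.shape[0]
    ll = np.full((T, theta.shape[0]), -np.inf)
    logtheta = np.log(theta)
    for t in range(win, T + 1):
        window = counts[t - win:t].sum(axis=0)   # pooled pattern counts
        ll[t - 1] = window @ logtheta.T          # log P(window | action), up to a constant
    return ll

ll = likelihood_series(counts, theta)
print(ll[-1])   # per-action log-likelihoods at the final frame
```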

Communication culture and speech etiquette

Z Uteshova
2022 Ренессанс в парадигме новаций образования и технологий в XXI веке  
In turn, Yakubinsky described human speech activity as a diverse phenomenon, determined by a complex variety of factors and functions [8, 17-58].  ...  Therefore, the socialization of the personality takes place, during which the child's thinking and models of behavior are formed, and hence the social function of language as a means of communication  ... 
doi:10.47689/innovations-in-edu-vol-iss1-pp39-40 fatcat:mjcdlaqzzna6vo7b22kjtwt3a4

Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants

Ifat Yasin, Vit Drga, Fangqi Liu, Andreas Demosthenous, Ray Meddis
2020 IEEE Access  
In this study, an auditory model with efferent-inspired processing provided the front-end to an automatic-speech-recognition system (ASR), used as a tool to evaluate speech recognition with changes in  ...  The model improves our understanding of the complex interactions involved in speech recognition in noise, and could be used to simulate the difficulties of speech perception in noise as a consequence of  ...  In general, the speech recognition accuracy obtained is lower than that observable for a human listener (the human-machine speech gap), as seen in a study of human listeners' performance on the same speech  ... 
doi:10.1109/access.2020.2981885 fatcat:vly5mjpde5exjejlp2xerldxgy
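
The front-end/back-end split this entry describes can be sketched as follows; a plain log-mel filterbank stands in for the paper's far richer efferent-inspired auditory model (which adds dynamic gain feedback with time constants), and all parameters are illustrative.

```python
import numpy as np

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def logmel_frontend(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame, window, and take the magnitude spectrum.
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    # Triangular mel filterbank.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, spec.shape[1]))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l: fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c: fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return np.log(spec @ fb.T + 1e-8)    # (frames, n_mels) features for the ASR back end

features = logmel_frontend(np.random.randn(16000))  # 1 s of noise as a placeholder
print(features.shape)
```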

Visualizing Phoneme Category Adaptation in Deep Neural Networks

Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson, Najim Dehak
2018 Interspeech 2018  
ability to serve as a model of human perceptual learning.  ...  The aim of this paper is two-fold: to investigate whether a deep neural network-based (DNN) ASR system can adapt to only a few examples of ambiguous speech, as humans have been found to do, and to investigate a DNN's  ...  and show that DNNs can be used as a way to investigate human speech processing.  ... 
doi:10.21437/interspeech.2018-1707 dblp:conf/interspeech/ScharenborgTHD18 fatcat:mr47gspfjvat7i7m7y62g2ugx4
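
A minimal sketch of the adaptation setup, under the assumption that retuning only the output layer on a handful of ambiguous tokens labeled by lexical context is a reasonable stand-in for the paper's procedure; the network, sizes, and data below are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_feats, n_phones = 40, 10

model = nn.Sequential(nn.Linear(n_feats, 64), nn.ReLU(), nn.Linear(64, n_phones))

# Freeze the hidden layer so only the output layer adapts, one simple design choice.
for p in model[0].parameters():
    p.requires_grad = False

ambiguous_x = torch.randn(8, n_feats)            # a few ambiguous tokens
lexical_y = torch.randint(0, n_phones, (8,))     # labels implied by the carrier words

opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for step in range(20):                           # brief exposure, as in retuning studies
    opt.zero_grad()
    loss = loss_fn(model(ambiguous_x), lexical_y)
    loss.backward()
    opt.step()
print(float(loss))
```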

Bridging automatic speech recognition and psycholinguistics: Extending Shortlist to an end-to-end model of human speech recognition (L)

Odette Scharenborg, Louis ten Bosch, Lou Boves, Dennis Norris
2003 Journal of the Acoustical Society of America  
Experiments based on "real-life" speech highlight critical limitations posed by some of the simplifying assumptions made in models of human speech recognition.  ...  This letter evaluates potential benefits of combining human speech recognition (HSR) and automatic speech recognition by building a joint model of an automatic phone recognizer (APR) and a computational  ...  referred to as the "joint model") that can be regarded as an end-to-end model of human speech recognition.  ... 
doi:10.1121/1.1624065 pmid:14714783 fatcat:qkohvlql3jdpxku23u64bxjjiu

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies [article]

Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi
2018 arXiv   pre-print
Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization.  ...  The labels in the dataset annotate three different speech activity conditions: clean speech, speech co-occurring with music, and speech co-occurring with noise, which enable analysis of model performance  ...  instant and keep the max score as the model prediction for Speech-Active.  ... 
arXiv:1808.00606v2 fatcat:ipttxwsxrnchjkbmv3conxc2hq
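
The fusion rule quoted in the last snippet is simple enough to state in a few lines; in the sketch below, the per-condition detector scores are random placeholders and the decision threshold is an arbitrary free parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500                                   # frames
# Rows: detectors for clean speech, speech+music, speech+noise.
scores = rng.random((3, T))

speech_active = scores.max(axis=0)        # keep the max score at each instant
decision = speech_active > 0.5            # threshold is a tuning choice
print(decision.mean())                    # fraction of frames flagged as speech
```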

Deep Residual Local Feature Learning for Speech Emotion Recognition [chapter]

Sattaya Singkul, Thakorn Chatchaisathaporn, Boontawee Suntisrivaraporn, Kuntpong Woraratpanya
2020 Lecture Notes in Computer Science  
Speech Emotion Recognition (SER) is coming to play a key role in global business today, improving service efficiency in areas such as call-center services. Recent SER systems have been based on deep learning approaches.  ...  detail in deeper layers using residual learning to address vanishing gradients and reduce overfitting; an MLP is adopted to learn the relationships among features and output probabilities for the predicted speech  ...  Here, we briefly describe three important components of speech signals: glottal flow, prosody, and human hearing. Glottal flow can be viewed as a source of speech signals [25].  ... 
doi:10.1007/978-3-030-63830-6_21 fatcat:26tbej4bmfe5zer5kuav2c7oky
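
A compact sketch of the ingredients named here (not the paper's exact network): a 1-D convolutional residual block whose skip connection preserves local feature detail and eases gradient flow, followed by an MLP head that outputs emotion probabilities. All layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch),
        )
    def forward(self, x):
        return torch.relu(self.body(x) + x)   # residual skip connection

n_mels, n_emotions = 40, 4
net = nn.Sequential(
    nn.Conv1d(n_mels, 64, 3, padding=1),
    ResBlock1d(64), ResBlock1d(64),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, n_emotions),                # MLP head; softmax gives probabilities
)
x = torch.randn(2, n_mels, 300)               # batch of 2 utterances, 300 frames each
print(torch.softmax(net(x), dim=1))
```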

From Birdsong to Human Speech Recognition: Bayesian Inference on a Hierarchy of Nonlinear Dynamical Systems

Izzet B. Yildiz, Katharina von Kriegstein, Stefan J. Kiebel, Viktor K. Jirsa
2013 PLoS Computational Biology  
level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech.  ...  We show that the resulting Bayesian model, with a hierarchy of nonlinear dynamical systems, can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions.  ...  Conceptual overview: a generative model of human speech. As a model, we employ a novel Bayesian recognition method for dynamical sensory input such as birdsong and speech.  ... 
doi:10.1371/journal.pcbi.1003219 pmid:24068902 pmcid:PMC3772045 fatcat:wriv3xx3rverrjjzrcgxvt6mcy
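
The recognition principle can be caricatured without the Bayesian machinery: fit one simple dynamical model per word and recognize a new utterance as the word whose dynamics predict it best. The sketch below does this with linear least squares on toy 2-D trajectories; it is an analogy to the idea, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_dynamics(X):
    """Least-squares A with X[1:] ~= X[:-1] @ A for a (T, d) trajectory."""
    A, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return A

def prediction_error(X, A):
    return float(np.mean((X[:-1] @ A - X[1:]) ** 2))

# Two toy "words" generated by different underlying dynamics, plus noise.
rot = lambda a: np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
def trajectory(A, T=80):
    x, out = rng.standard_normal(2), []
    for _ in range(T):
        x = A @ x + 0.05 * rng.standard_normal(2)
        out.append(x.copy())
    return np.array(out)

models = {w: fit_dynamics(trajectory(rot(a))) for w, a in [("ba", 0.1), ("da", 0.7)]}
test = trajectory(rot(0.7))                    # an unlabeled "da"-like utterance
print(min(models, key=lambda w: prediction_error(test, models[w])))  # -> "da"
```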

EARSHOT: A Minimal Neural Network Model of Incremental Human Speech Recognition

James S Magnuson, Heejo You, Sahil Luthra, Monica Li, Hosung Nam, Monty Escabí, Kevin Brown, Paul D Allopenna, Rachel M Theodore, Nicholas Monto, Jay G Rueckl
2020 Cognitive Science  
Most models of human speech recognition (HSR) have side-stepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech.  ...  This allows the model to learn to map real speech from multiple talkers to semantic targets with high accuracy, with a human-like time course of lexical access and phonological competition.  ...  We thank Eddie Chang and Nima Mesgarani for supplying us with data from Mesgarani et al. (2014) used to compare EARSHOT and human STG responses.  ... 
doi:10.1111/cogs.12823 pmid:32274861 fatcat:fwhc7ud7xza5xdmwbw55nj7akq
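
A rough EARSHOT-style sketch with invented sizes and random placeholder data: a recurrent network maps spectral frames to a sparse pseudo-semantic vector that is active throughout the word, so lexical activation can be read out frame by frame as the input unfolds.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_spec, n_sem, T = 64, 300, 50

class Earshotish(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(n_spec, 512, batch_first=True)
        self.out = nn.Linear(512, n_sem)
    def forward(self, x):                     # x: (batch, T, n_spec)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.out(h))     # per-frame semantic activations

model = Earshotish()
spec = torch.randn(4, T, n_spec)              # placeholder spectrogram batch
target = (torch.rand(4, n_sem) < 0.05).float()          # sparse semantic codes
target = target.unsqueeze(1).expand(-1, T, -1)          # active at every frame

loss = nn.BCELoss()(model(spec), target)
loss.backward()                               # one illustrative training step
print(float(loss))
```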

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi
2018 Interspeech 2018  
Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization.  ...  The labels in the dataset annotate three different speech activity conditions: clean speech, speech co-occurring with music, and speech co-occurring with noise, which enable analysis of model performance  ...  instant and keep the max score as the model prediction for Speech-Active.  ... 
doi:10.21437/interspeech.2018-2028 dblp:conf/interspeech/ChaudhuriREGKMP18 fatcat:tkldyaebb5cj5peipc5s4xxwqy

Exploring the Dependencies between Behavioral and Neuro-physiological Time-series Extracted from Conversations between Humans and Artificial Agents

Hmamouche Youssef, Ochs Magalie, Prévot Laurent, Chaminade Thierry
2020 Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods  
The second step consists of applying machine learning models to predict brain activity on the basis of various aspects of behavior, given knowledge about the functional role of the areas under scrutiny.  ...  Here, we use a unique corpus including fMRI and behavior recorded while participants conversed with a human or a conversational robot.  ...  These recordings include speech produced by the two interlocutors, as well as eye-tracking signals of the participant while viewing videos of the human or artificial interlocutor.  ... 
doi:10.5220/0008989503530360 dblp:conf/icpram/HmamoucheO0C20 fatcat:ykrea3wqgffc5mji63f7lkae5y
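
The second step described here, predicting a region's activity from behavioral signals, can be sketched schematically with ridge regression on lagged features; every signal below is synthetic, and the lag and penalty choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_feat, lags = 300, 4, 3

behavior = rng.standard_normal((T, n_feat))        # e.g., speech/gaze features
bold = behavior[:, 0] * 0.8 + 0.3 * rng.standard_normal(T)   # toy target region

# Build a lagged design matrix so past behavior can explain current activity.
X = np.hstack([np.roll(behavior, k, axis=0) for k in range(lags)])[lags:]
y = bold[lags:]

lam = 1.0                                          # ridge penalty (tuning choice)
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w
r = np.corrcoef(pred, y)[0, 1]
print(f"prediction correlation: {r:.2f}")
```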

A neural theory of speech acquisition and production

Frank H. Guenther, Tony Vladusich
2012 Journal of Neurolinguistics  
The DIVA model thus provides a well-defined framework for guiding the interpretation of experimental results related to the putative human speech mirror system.  ...  As the DIVA model is defined both computationally and anatomically, it is ideal for generating precise predictions concerning speech-related brain activation patterns observed during functional imaging  ...  Simulated fMRI activations from the DIVA model when performing the same speech task as the subjects in the fMRI experiment.  ... 
doi:10.1016/j.jneuroling.2009.08.006 pmid:22711978 pmcid:PMC3375605 fatcat:axonezm2n5bytkfr6kmf4g2qsm

Modeling human word recognition with sequences of artificial neurons [chapter]

P. Wittenburg, D. Kuijk, T. Dijkstra
1996 Lecture Notes in Computer Science  
A new psycholinguistically motivated and neural network-based model of human word recognition is presented. In contrast to earlier models, it uses real speech as input.  ...  In experiments with a small lexicon which includes groups of very similar word forms, the model meets high standards with respect to word recognition and simulates a number of well-known psycholinguistic  ...  Therefore, the RAW-model (Real-speech model for Auditory Word recognition) was designed to serve as a starting point for a simulation lab which combines the use of real speech and the implementation of  ... 
doi:10.1007/3-540-61510-5_61 fatcat:ym4gtt4w3rf6tfwymzap75ain4

Repetition enhancement to voice identities in the dog brain

Marianna Boros, Anna Gábor, Dóra Szabó, Anett Bozsik, Márta Gácsi, Ferenc Szalay, Tamás Faragó, Attila Andics
2020 Scientific Reports  
In the human speech signal, cues of speech sounds and voice identities are conflated, but they are processed separately in the human brain.  ...  The processing of speech sounds and voice identities is typically performed by non-primary auditory regions in humans and non-human primates.  ...  The dog auditory cortex is therefore not as tuned to human vocalizations as the human auditory cortex is.  ... 
doi:10.1038/s41598-020-60395-7 pmid:32132562 pmcid:PMC7055288 fatcat:3vs6yyogujeqfkkf6qxya2wtzm

Open challenges in understanding development and evolution of speech forms: The roles of embodied self-organization, motivation and active exploration

Pierre-Yves Oudeyer
2015 Journal of Phonetics  
In particular, we emphasize the importance of embodied self-organization, as well as the role of mechanisms of motivation and active curiosity-driven exploration in speech formation.  ...  Based on the analysis of mathematical models of the origins of speech forms, with a focus on their assumptions, we study the fundamental question of how speech can be formed out of non-speech, at both  ...  Thus, the model relies on a pre-existing set of linguistic abilities and abstracts away from many non-linguistic processes, such as sensorimotor development outside speech and non-linguistic activities  ... 
doi:10.1016/j.wocn.2015.09.001 fatcat:zwjgd3tbnvfz7fodancabckfru