
Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases [article]

Subhashini Venugopalan, Joel Shor, Manoj Plakal, Jimmy Tobin, Katrin Tomanek, Jordan R. Green, Michael P. Brenner
2021 arXiv   pre-print
Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases.  ...  Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment.  ...  We thank Katie Seaver for assessing and labeling a portion of the speech samples, Aren Jansen for advice on the CNN-ResNetish model, Shanqing Cai and Dick Lyon for reviews on a draft of this work, and  ... 
arXiv:2107.03985v1 fatcat:kad3cznlj5b75lulhhmzamvrze

Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition [article]

Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
2022 arXiv   pre-print
The current study explores the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.  ...  Compared to using Fbank features, XLSR-based features reduced WERs by 6.8%, 22.0%, and 7.0% for the UASpeech, PC-GITA, and EasyCall corpora, respectively.  ...  Speech representation learning models such as Wav2Vec2.0 [2] and HuBERT [3] have shown that learned representations produce state-of-the-art results on a variety of speech tasks: speaker and language  ... 
arXiv:2204.01670v1 fatcat:prnagcdntvcwnmvsvvdskgve4i
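The XLSR results above are reported against a filterbank (Fbank) baseline. As a minimal sketch of what such a baseline computes, here is a generic log-mel filterbank extractor in NumPy; the frame length, hop, FFT size, and mel count are illustrative defaults, not the paper's exact configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def fbank(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=40):
    # Frame the signal, window it, take power spectra, apply mel filters, log-compress.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    return np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
```

One second of 16 kHz audio with these defaults yields a (98, 40) feature matrix; a self-supervised model like XLSR replaces this fixed transform with learned frame-level representations.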

A Robust Isolated Automatic Speech Recognition System using Machine Learning Techniques

The basic stages of a speech recognition system are pre-processing, feature extraction, feature selection, and classification.  ...  Examples include speech recognition, speaker verification, and speaker recognition.  ...  There are four basic ways a machine can learn to respond correctly: supervised learning, unsupervised learning, semi-supervised learning, and active learning.  ... 
doi:10.35940/ijitee.j8765.0881019 fatcat:n7vfdkeehfavzf2utjlpuggiui

An Extensive Review of Feature Extraction Techniques, Challenges and Trends in Automatic Speech Recognition

Vidyashree Kanabur, Sunil S Harakannanavar, Dattaprasad Torse
2019 International Journal of Image Graphics and Signal Processing  
In order to recognize the areas of further research in ASR, one must be aware of the current approaches, the challenges each faces, and the issues that need to be addressed.  ...  Therefore, in this paper the human speech production mechanism is discussed. The various speech recognition techniques and models are addressed in detail.  ...  In a binary SVM, features are classified into two classes, one each for the recognized and unrecognized speaker. Supervised learning method; simple operation.  ... 
doi:10.5815/ijigsp.2019.05.01 fatcat:3uidt4wvofffvmuqlnanaegzjq

Simulating dysarthric speech for training data augmentation in clinical speech applications [article]

Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss
2018 arXiv   pre-print
Training machine learning algorithms for speech applications requires large, labeled training data sets.  ...  We evaluate the efficacy of our approach using both objective and subjective criteria.  ...  Due to a lack of data, machine learning models used in the study of pathological speech are typically limited to simple unsupervised metrics [13], or flat supervised models [14], [15].  ... 
arXiv:1804.10325v1 fatcat:txbs6yfjpfegpddbl6qo7udiym

Efficient Collection and Representation of Preverbal Data in Typical and Atypical Development

Florian B. Pokorny, Katrin D. Bartl-Pokorny, Dajie Zhang, Peter B. Marschik, Dagmar Schuller, Björn W. Schuller
2020 Journal of nonverbal behavior  
In this paper, we give a methodological overview of current strategies for collecting and acoustically representing preverbal data for intelligent audio analysis paradigms.  ...  Efficiency in the context of data collection and data representation is discussed.  ...  In contrast, dynamic modeling was investigated by applying a neural network classifier on the basis of the LLDs of the ComParE set.  ... 
doi:10.1007/s10919-020-00332-4 pmid:33088008 pmcid:PMC7561537 fatcat:goinklywmnaljpgnaqd5rbs3tq

2020 Index IEEE Transactions on Affective Computing Vol. 11

2021 IEEE Transactions on Affective Computing  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  Departments and other items may also be covered if they have been judged to have archival value. The Author Index contains the primary entry for each item, listed under the first author's name.  ...  T-AFFC, July: Emotion Recognition on Twitter: Comparative Study and Training a Unison Model.  ... 
doi:10.1109/taffc.2021.3055662 fatcat:het65admgnbbvn4fdzdgmftuqu

A Review of Automated Speech and Language Features for Assessment of Cognition and Thought Disorders

Rohit Nihar Uttam Voleti, Julie Liss, Visar Berisha
2019 IEEE Journal on Selected Topics in Signal Processing  
This work relies on extracting a set of features from recorded and transcribed speech for objective assessments of speech and language, early diagnosis of neurological disease, and tracking of disease  ...  With an emphasis on cognitive and thought disorders, in this paper we provide a review of existing speech and language features used in this domain, discuss their clinical application, and highlight their  ...  For classifying clinical (patients with schizophrenia and bipolar I disorder) subjects and healthy control subjects, the selected feature subset achieved receiver operating characteristic (ROC) area under  ... 
doi:10.1109/jstsp.2019.2952087 pmid:33907590 pmcid:PMC8074691 fatcat:a6t24cpp6jbdxbxq5wzd3uz6jq
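The snippet above reports classifier performance as ROC area under the curve. As a small illustration of the metric itself (not the paper's pipeline), AUC can be computed directly from scores via its rank interpretation: the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as one half:

```python
import numpy as np

def roc_auc(scores, labels):
    # AUC via the Mann-Whitney U statistic: fraction of (positive, negative)
    # pairs where the positive scores higher, counting ties as 0.5.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```

A perfect ranking gives 1.0, a fully inverted one 0.0, and chance-level scoring 0.5; this pairwise form is O(n_pos * n_neg) and is meant only to make the definition concrete.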

Us and them: identifying cyber hate on Twitter across multiple protected characteristics

Pete Burnap, Matthew L Williams
2016 EPJ Data Science  
To support the automatic detection of cyber hate online, specifically on Twitter, we build multiple individual models to classify cyber hate for a range of protected characteristics including race, disability  ...  for different types of cyber hate beyond the use of a Bag of Words and known hateful terms.  ...  supervised machine classifiers, and is based on human agreement on which class a piece of text belongs to.  ... 
doi:10.1140/epjds/s13688-016-0072-6 pmid:32355598 pmcid:PMC7175598 fatcat:r55bks5wazhw3ligikn5d2ptze

Classification of Speech Dysfluencies Using Speech Parameterization Techniques and Multiclass SVM [chapter]

P. Mahesha, D. S. Vinod
2013 Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering  
Stuttering is a fluency disorder characterized by the occurrence of dysfluencies in the normal flow of speech, such as repetitions, prolongations, and interjections.  ...  It is one of the serious problems in speech pathology.  ...  SVM is a supervised learning technique that uses a labeled data set for training and tries to find a decision function that best classifies the training data.  ... 
doi:10.1007/978-3-642-37949-9_26 fatcat:v2mpro4r3nbkdpe2oir4cu2vyy
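The snippet above describes the SVM idea of learning a decision function from labeled data, extended to multiple dysfluency classes. As a hedged sketch of that idea (not the chapter's actual system), here is a linear SVM trained by subgradient descent on the hinge loss, with a one-vs-rest wrapper for the multiclass case:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.1):
    # Binary linear SVM, labels in {-1, +1}.
    # Minimizes lam/2 * ||w||^2 + mean hinge loss by subgradient descent.
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # samples violating the margin
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def predict_ovr(models, X):
    # One-vs-rest: pick the class whose binary SVM gives the largest margin.
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return scores.argmax(axis=1)
```

In practice one would use a tuned solver (and, as in the chapter, cepstral features as input); this toy version only shows the decision-function view of SVM training.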

Fast screening for children's developmental language disorders via comprehensive speech ability evaluation—using a novel deep learning framework

Xing Zhang, Feng Qin, Zelin Chen, Leyan Gao, Guoxin Qiu, Shuo Lu
2020 Annals of Translational Medicine  
Developmental language disorders (DLDs) are the most common developmental disorders in children. For screening DLDs, speech ability (SA) is one of the most important indicators.  ...  In this paper, we propose a solution for the fast screening of children's DLDs based on a comprehensive SA evaluation and a deep framework of machine learning.  ...  a rough SA supervision for our deep model.  ... 
doi:10.21037/atm-19-3097 pmid:32617327 pmcid:PMC7327328 fatcat:55yavenqejee5dciq5tiz3d3zi

An Information Retrieval Approach to Building Datasets for Hate Speech Detection [article]

Md Mustafizur Rahman, Dinesh Balakrishnan, Dhiraj Murthy, Mucahid Kutlu, Matthew Lease
2021 arXiv   pre-print
To intelligently and efficiently select which tweets to annotate, we apply standard IR techniques of pooling and active learning.  ...  Annotator rationales we collect not only justify labeling decisions but also enable future work opportunities for dual-supervision and/or explanation generation in modeling.  ...  We thank the many talented Amazon Mechanical Turk workers who contributed to our study and made it possible.  ... 
arXiv:2106.09775v3 fatcat:56cg2t7nwbfe3lwdqw7z2eqjoy
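The snippet above mentions active learning as a way to choose which tweets to annotate. A common concrete instance, offered here only as an illustration of the general idea, is least-confidence sampling: route to annotators the unlabeled items whose top predicted class probability is lowest.

```python
import numpy as np

def least_confidence_batch(probs, k):
    # probs: (n, n_classes) predicted class probabilities for unlabeled items.
    # Return indices of the k items the model is least confident about.
    return np.argsort(probs.max(axis=1))[:k]

# Toy round: a 3-class model's probabilities over 5 unlabeled tweets.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident
    [0.40, 0.35, 0.25],   # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # most uncertain
    [0.90, 0.05, 0.05],
])
picked = least_confidence_batch(probs, 2)  # indices 3 and 1 go to annotators
```

Variants such as margin sampling (gap between the top two classes) or the pooling strategies the paper borrows from IR evaluation follow the same select-annotate-retrain loop.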

Survey on Deep Neural Networks in Speech and Vision Systems [article]

Mahbubul Alam, Manar D. Samad, Lasitha Vidyaratne, Alexander Glandon, Khan M. Iftekharuddin
2019 arXiv   pre-print
This survey begins by providing background and evolution of some of the most successful deep learning models for intelligent vision and speech systems to date.  ...  To our knowledge, this paper provides one of the most comprehensive surveys on the latest developments in intelligent vision and speech applications from the perspectives of both software and hardware  ...  ACKNOWLEDGMENT The authors would like to acknowledge partial funding of this work by the National Science Foundation (NSF) through a grant (Award# ECCS 1310353) and the National Institute of Health (NIH  ... 
arXiv:1908.07656v2 fatcat:7acubicqzzac3dqemkiccoogm4

The Use of Machine Learning Algorithms in the Classification of Sound

2022 International Journal of Service Science Management Engineering and Technology  
With regard to Ecoacoustics, studies on extreme events such as tornadoes and earthquakes for early detection and warning systems were lacking.  ...  This study is a systematic review of the literature on the classification of sounds in three domains: Bioacoustics, Biomedical acoustics, and Ecoacoustics.  ...  Additionally, a semi-supervised learning technique called active learning was used to minimize the demand for human descriptions in training sound classification models.  ... 
doi:10.4018/ijssmet.298667 fatcat:gygkfeoblrfihjzxlm46a7labe

Modeling the Progression of Speech Deficits in Cerebellar Ataxia using a Mixture Mixed-effect Machine Learning Framework

Bipasha Kashyap, Pubudu N. Pathirana, Malcolm Horne, Laura Power, David J Szmulewicz
2021 IEEE Access  
ACKNOWLEDGMENT The authors would like to thank the Royal Victorian Eye and Ear Hospital (RVEEH), the Florey Institute of Neuroscience and Mental Health, Melbourne, Australia and CSIRO Data61 for their  ...  : hereafter 'RT')). 2) Speech Task 2: Utter the phrase British Constitution (BC: a classical phrase for eliciting the features of ataxic speech) thrice.  ...  exploitation of the selected features' change over two timepoints. 3) Classify the subjects into two groups based on the mixture extensions of the multivariate model.  ... 
doi:10.1109/access.2021.3114328 fatcat:vs2iawvdwvbg3gf7xsfg6gbfly