
Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

June-Woo Kim, Hyekyung Yoon, Ho-Young Jung
2021 IEEE Access  
We extract speech-based linguistic information using K-means clustering and combine the linguistic information with the corresponding feature frames from the speech data.  ...  To this end, we introduced linguistic-coupled information through an unsupervised phonology clustering method and proposed age-to-age voice translation using the linguistic-coupled information to  ...
doi:10.1109/access.2021.3115608 fatcat:ublcbcyz2fhrtfamj7x5li7viy
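
The clustering step this abstract describes can be pictured with a minimal sketch: quantize per-frame speech features with K-means so that each frame gets a pseudo-phonological class id. The MFCC features, cluster count, and function name below are illustrative assumptions, not details from the paper.

```python
# Sketch: unsupervised "phonology" clustering of speech frames with K-means.
# Assumes librosa and scikit-learn; the cluster count (64) is arbitrary.
import librosa
import numpy as np
from sklearn.cluster import KMeans

def frame_cluster_ids(wav_path, n_clusters=64, sr=16000):
    """Return one cluster id per feature frame, a crude stand-in for
    frame-level linguistic (phonological) classes."""
    audio, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T  # (frames, 13)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(mfcc)
    return km.labels_  # shape: (frames,)
```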

Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning

Kengo Ohta, Ryota Nishimura, Norihide Kitaoka
2021 Speech Communication  
Nishimura and N. Kitaoka, Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning. Speech Communication (2021), doi: https://doi.  ...  As a response type selector, we propose an LSTM-based encoder-decoder framework utilizing acoustic and linguistic features extracted from input utterances.  ...  Acknowledgments This work was supported in part by the Strategic Information and Com-  ... 
doi:10.1016/j.specom.2021.07.003 fatcat:owbm3rkqqfbifcfhyclbq2azzi
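
A rough sketch of an LSTM-based selector that fuses acoustic and linguistic features and trains with an auxiliary head for multi-task learning, in PyTorch. The feature dimensions, class counts, and auxiliary task are assumptions for illustration; the paper's encoder-decoder is more elaborate.

```python
# Sketch: LSTM response-type selector over fused acoustic + linguistic
# features, with an auxiliary head for multi-task learning.
import torch
import torch.nn as nn

class ResponseTypeSelector(nn.Module):
    def __init__(self, acoustic_dim=40, linguistic_dim=300,
                 hidden=128, n_types=4, n_aux=2):
        super().__init__()
        self.lstm = nn.LSTM(acoustic_dim + linguistic_dim, hidden,
                            batch_first=True)
        self.type_head = nn.Linear(hidden, n_types)  # main task
        self.aux_head = nn.Linear(hidden, n_aux)     # auxiliary task

    def forward(self, acoustic, linguistic):
        # acoustic: (B, T, 40), linguistic: (B, T, 300)
        x = torch.cat([acoustic, linguistic], dim=-1)
        _, (h, _) = self.lstm(x)  # h: (num_layers, B, hidden)
        h = h[-1]                 # final hidden state of last layer
        return self.type_head(h), self.aux_head(h)
```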

INFORMATION EXTRACTION FROM SPEECH

John Makhoul
2006 2006 IEEE Spoken Language Technology Workshop  
now commonly used. For names and non-nested entity mentions, tagging models can be used, for example: Person-Name-Start, Organization-Name-Continue, Not-A-Name. Coreference requires combining non-local  ...  efficient Machine Translation. Approach: discriminative methods (a neural network for acoustic features, a perceptron for linguistic features), sentence-length constraints, statistical duration  ...  Levels of Linguistic Analysis  ...
doi:10.1109/slt.2006.326780 dblp:conf/slt/Makhoul06 fatcat:aqj3ovbcufg3bpeu5bakfzqkxa
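
The Start/Continue tagging scheme in this snippet is easy to make concrete: a small decoder that turns such tags into entity spans. The tag names follow the example above; the decoding rules are a plain illustration, not the talk's implementation.

```python
# Sketch: decoding Start/Continue/Not-A-Name tags into entity spans.
def tags_to_spans(tags):
    """tags: e.g. ['Person-Name-Start', 'Person-Name-Continue', 'Not-A-Name']
    Returns a list of (entity_type, start_index, end_index_inclusive)."""
    spans, current = [], None
    for i, tag in enumerate(tags):
        if tag.endswith('-Start'):
            if current:
                spans.append(current)
            current = (tag[:-len('-Start')], i, i)  # open a new mention
        elif tag.endswith('-Continue') and current:
            current = (current[0], current[1], i)   # extend the open mention
        else:  # Not-A-Name, or an ill-formed Continue
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans

print(tags_to_spans(['Person-Name-Start', 'Person-Name-Continue', 'Not-A-Name']))
# [('Person-Name', 0, 1)]
```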

Text To Speech Synthesis for Afaan Oromoo Language Using Deep Learning Approach

2022 New Media and Mass Communication  
The Festival toolkit is used to normalize texts and extract linguistic features, aligning phoneme labels with the speech corpus for training and testing.  ...  The normalized text was used to extract linguistic features with the Festival toolkit for the Afaan Oromoo TTS.  ...  Speech-parameter-based synthesis for Afaan Oromoo on a Hidden Markov Model was used for training, with 10 sentences for testing; intelligibility of the speech is rated 4.3 and naturalness 4.1.  ...
doi:10.7176/nmmc/101-02 fatcat:jswi5bxwgffnnn2qpxurhr7i5e

Emotional Voice Conversion With Cycle-consistent Adversarial Network [article]

Songxiang Liu, Yuewen Cao, Helen Meng
2020 arXiv   pre-print
Emotional Voice Conversion, or emotional VC, is a technique for converting speech from one emotional state into another while keeping the basic linguistic information and speaker identity.  ...  Recently, cycle-consistent generative adversarial networks (CycleGAN) have been used successfully for non-parallel VC. This paper investigates the efficacy of using CycleGAN for emotional VC tasks.  ...  INTRODUCTION: Human speech is a complex signal that contains rich information, including linguistic, para-linguistic, and non-linguistic information.  ...
arXiv:2004.03781v1 fatcat:onctwylabjbkhk5ux7rdz5nwpy
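
The core of CycleGAN-based VC is the cycle-consistency loss, which lets the model preserve linguistic content without parallel data. A minimal PyTorch sketch, with placeholder linear generators standing in for the real ones and an illustrative loss weight:

```python
# Sketch: the cycle-consistency loss used by CycleGAN-style VC.
# G maps neutral -> emotional features, F maps back; both are placeholders
# here, and lambda_cyc is an illustrative weight.
import torch
import torch.nn as nn

G = nn.Linear(80, 80)  # neutral -> emotional (placeholder generator)
F = nn.Linear(80, 80)  # emotional -> neutral (placeholder generator)
l1 = nn.L1Loss()
lambda_cyc = 10.0

x_neutral = torch.randn(8, 80)    # batch of neutral-speech feature frames
x_emotional = torch.randn(8, 80)  # batch of emotional-speech feature frames

# Forward and backward cycles: F(G(x)) ~ x and G(F(y)) ~ y, which is what
# preserves linguistic content in the absence of parallel data.
cycle_loss = lambda_cyc * (l1(F(G(x_neutral)), x_neutral) +
                           l1(G(F(x_emotional)), x_emotional))
cycle_loss.backward()
```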

References [chapter]

2010 The Handbook of Computational Linguistics and Natural Language Processing  
Sumita (2002), Using language and translation models to select the best among outputs from multiple MT systems.  ...  Fourth Workshop on the Semantics and Pragmatics of Dialogue, 43-50. Bos, Johan, & Tetsushi Oka (2002), An inference-based approach to dialogue system design. Proceedings of the 19th COLING, 113-19.  ...  Bos, Johan, Ewan Klein, Oliver Lemon, & Tetsushi Oka (2003), DIPPER: description and formalisation of an information-state update dialogue system architecture.  ...
doi:10.1002/9781444324044.refs fatcat:udzjhccz6vg4hnsy5h744y67ai

Neural speech synthesis for resource-scarce languages

Johannes A. Louw
2019 South African Forum for Artificial Intelligence Research  
We compare traditional hidden Markov model (HMM) based acoustic modelling for speech synthesis with the proposed architecture using the WORLD and LPCNet vocoders, giving both objective and MUSHRA-based  ...  in resource-scarce language environments, with corpora of less than 1 hour in size, to build text-to-speech systems of high perceived naturalness.  ...  Introduction: The advent of neural network-based text-to-speech (TTS) systems has brought on dramatic improvements in the naturalness and intelligibility of synthesized speech.  ...
dblp:conf/fair2/Louw19 fatcat:2i5sdf5ppbaexedzjgsgerosem
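
The truncated sentence mentions objective evaluation alongside MUSHRA. One common objective metric in such TTS comparisons is mel-cepstral distortion (MCD); whether this paper used exactly MCD is an assumption, but the standard computation is short:

```python
# Sketch: mel-cepstral distortion (MCD) in dB, a standard objective metric
# for comparing synthesized speech against a natural reference. The constant
# 10*sqrt(2)/ln(10) is the conventional scaling.
import numpy as np

def mcd_db(ref_mcep, syn_mcep):
    """ref_mcep, syn_mcep: (frames, order) mel-cepstra, already time-aligned.
    The 0th (energy) coefficient is conventionally excluded."""
    diff = ref_mcep[:, 1:] - syn_mcep[:, 1:]
    per_frame = np.sqrt((diff ** 2).sum(axis=1))
    return (10.0 * np.sqrt(2.0) / np.log(10.0)) * per_frame.mean()
```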

Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech

Shammur A. Chowdhury, Younes Samih, Mohamed Eldesouki, Ahmed Ali
2020 Interspeech 2020  
In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at the token level, minding both linguistic and acoustic cues for dialectal Arabic.  ...  For detailed analysis, we study Arabic automatic speech recognition (ASR), Arabic dialect identification (ADI), and natural language processing (NLP) modules for the DCS corpus.  ...  We opt for non-linguist transcribers to avoid any bias toward standard Arabic.  ...
doi:10.21437/interspeech.2020-2271 dblp:conf/interspeech/ChowdhurySEA20 fatcat:gvze2uojdzasjafwtmfzovmyla

Speech Emotion Recognition: A Survey

Swarna Kuchibhotla, Associate Professor, Department of CSE, Koneru Lakshmaiah Education Foundation, Guntur
2019 International Journal of Multimedia and Ubiquitous Engineering  
Systems are trained in such a way as to detect the emotions in spoken utterances.  ...  Emotions can be recognized more effectively using speech processing, artificial intelligence techniques, and linguistic semantics.  ...  Speech samples are fed into the system to extract speech coefficients, and a model file is used to classify the speech emotion [19].  ...
doi:10.21742/ijmue.2019.14.2.03 fatcat:pwkfph3v7nfm3i2667k2rszxpu
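
The extract-coefficients-then-classify pipeline the survey sketches can be illustrated in a few lines: utterance-level MFCC statistics fed to an SVM. The feature set and classifier here are illustrative choices, not the survey's prescription.

```python
# Sketch: utterance-level MFCC statistics + SVM for speech emotion
# recognition. Assumes librosa and scikit-learn.
import librosa
import numpy as np
from sklearn.svm import SVC

def utterance_features(wav_path, sr=16000):
    audio, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    # Mean and std of each coefficient over time -> one fixed-size vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (26,)

# train_paths / train_labels would come from an emotion-labelled corpus:
# X = np.stack([utterance_features(p) for p in train_paths])
# clf = SVC(kernel='rbf').fit(X, train_labels)
# predicted = clf.predict(utterance_features(test_path)[None, :])
```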

Advances in Chinese Natural Language Processing and Language Resources

Jianhua Tao, Fang Zheng, Aijun Li, Ya Li
2009 2009 Oriental COCOSDA International Conference on Speech Database and Assessments  
The Chinese language resources, including linguistic data, speech data, evaluation data, and language toolkits elaborately constructed for CNLP-related fields, and some language resource consortiums  ...  Aiming to promote the development of corpus-based technologies, many resource consortiums commit themselves to collecting, creating, and distributing many kinds of resources.  ...  Linguistic resources for text classification, information retrieval, automatic summarization, etc. are also numerous.  ...
doi:10.1109/icsda.2009.5278384 fatcat:mch4odyn2fedxompkxfvhtmaqe

RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion

Wern-Jun Wang, Yuan-Fu Liao, Sin-Horng Chen
2002 Speech Communication  
In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed.  ...  Character accuracy rates of 73.6%, 74.6% and 74.7% were obtained for the systems using the baseline scheme, Schemes 1 and 2, respectively.  ...  Acknowledgements The database was provided by Chunghwa Telecommunication Laboratories and the basic lexicon was supported by Academia Sinica of Taiwan.  ... 
doi:10.1016/s0167-6393(01)00006-1 fatcat:ig7ehvd5xfhs5oecglcn25qq2q
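
The character accuracy rates quoted above (73.6% to 74.7%) are conventionally computed as one minus the character-level edit distance divided by the reference length; a self-contained sketch (the paper's exact scoring protocol is an assumption):

```python
# Sketch: character accuracy = 1 - edit_distance / reference_length.
# Assumes a non-empty reference string.
def char_accuracy(reference, hypothesis):
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1.0 - d[m][n] / m
```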

Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions

Kulvinder Panesar
2020 Journal of Computer-Assisted Linguistic Research  
To demonstrate this, a deep linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof of concept of a non-statistical conversational AI solution.  ...  This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation and enable the mechanisms needed to plan, decide what to do next, and manage the  ...  Figure 7 illustrates the use of BERT for extracting the embedding for each tweet and using the embeddings to train a text classification model for hate speech (Bhashkar 2019).  ...
doi:10.4995/jclr.2020.12932 fatcat:oogpuyd6zvhixi22k33xawe3dm
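
The BERT-embedding setup cited from Figure 7 can be sketched with Hugging Face transformers: take the [CLS] vector per tweet and train any downstream classifier on it. The checkpoint name and pooling choice are common defaults, not details from the cited work.

```python
# Sketch: BERT feature extraction for text classification.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained('bert-base-uncased')
bert = AutoModel.from_pretrained('bert-base-uncased').eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        out = bert(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] vector per text

# The embeddings can then train any downstream classifier, e.g.:
# from sklearn.linear_model import LogisticRegression
# clf = LogisticRegression(max_iter=1000).fit(embed(train_texts), labels)
```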

An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech [article]

Wei Liu, Jingyu Li, Tan Lee
2022 arXiv   pre-print
Different acoustic feature conversion approaches, both deep neural network based and signal processing based, are investigated and compared under a fair experimental setting, in which converted acoustic  ...  A disentanglement-based auto-encoder (DAE) conversion framework is found to be useful, and the approach of F0 normalization achieves the best performance.  ...  The linguistic factor refers to the speech content, and the para-linguistic factor covers all content-irrelevant information, including speaker identity, emotion, prosody, and speaking style.  ...
arXiv:2205.12477v1 fatcat:bql2mjmmcjgbjjksijeazjzcj4
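
F0 normalization, the best-performing signal-processing approach per the abstract, is typically done as log-F0 mean-variance normalization from source (child) to target (adult) statistics; the exact recipe below is an assumption:

```python
# Sketch: log-F0 mean-variance normalization, mapping the source speaker's
# F0 onto the target speaker's statistics. Unvoiced frames (F0 == 0) pass
# through unchanged.
import numpy as np

def normalize_f0(f0_src, src_stats, tgt_stats):
    """f0_src: array of F0 values in Hz (0 for unvoiced frames).
    src_stats / tgt_stats: (mean, std) of log-F0 for each speaker."""
    f0_out = f0_src.astype(float).copy()
    voiced = f0_src > 0
    lf0 = np.log(f0_src[voiced])
    lf0 = (lf0 - src_stats[0]) / src_stats[1] * tgt_stats[1] + tgt_stats[0]
    f0_out[voiced] = np.exp(lf0)
    return f0_out
```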

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation [article]

Rongjie Huang, Zhou Zhao, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He
2022 arXiv   pre-print
To alleviate the acoustic multimodal problem, we propose bilateral perturbation, which consists of style normalization and information enhancement stages, to learn only the linguistic information from  ...  Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation learning, where a sequence of discrete representations (units) derived in a self-supervised manner  ...  As illustrated in Figure 1(a), the unit-based textless S2ST system consists of a speech-to-unit translation (S2UT) model followed by a unit-based vocoder that converts discrete units to speech, leading  ...
arXiv:2205.12523v1 fatcat:akzsdagrubce7b7ie56yfmm6dm
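
The discrete units such textless S2ST systems operate on are usually obtained by quantizing self-supervised features with a k-means codebook and collapsing consecutive repeats; a sketch of that step (the feature extractor and codebook size are assumptions):

```python
# Sketch: turning self-supervised speech features into deduplicated
# discrete units for a textless S2ST pipeline.
import itertools
from sklearn.cluster import KMeans

def features_to_units(features, km: KMeans):
    """features: (frames, dim) self-supervised representations
    (e.g. from a HuBERT-style encoder); km: a KMeans codebook already
    fitted on such features."""
    frame_units = km.predict(features)                      # one unit per frame
    return [u for u, _ in itertools.groupby(frame_units)]   # collapse repeats
```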

Improving Voice Separation by Incorporating End-to-end Speech Recognition [article]

Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji
2020 arXiv   pre-print
In this work, we propose to explicitly incorporate the phonetic and linguistic nature of speech by taking a transfer learning approach using an end-to-end automatic speech recognition (E2EASR) system.  ...  Experimental results on speech separation and enhancement tasks on the AVSpeech dataset show that the proposed method significantly improves the signal-to-distortion ratio over the baseline model and even  ...  END-TO-END ASR FEATURE: To capture phonetic and linguistic information, it is important to model the long-term dependencies of an utterance.  ...
arXiv:1911.12928v2 fatcat:7ye3xijyg5gfvlconzwoiwjtc4
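
One way to inject an E2E ASR model's phonetic and linguistic knowledge into separation training is a deep-feature loss on frozen ASR-encoder embeddings; the sketch below uses a placeholder GRU encoder where the paper would use a pretrained ASR encoder.

```python
# Sketch: ASR-derived deep-feature loss for training a separation model.
import torch
import torch.nn as nn

asr_encoder = nn.GRU(80, 256, batch_first=True)  # stand-in for an E2E ASR encoder
for p in asr_encoder.parameters():
    p.requires_grad = False  # keep the ASR model fixed

def asr_feature_loss(separated_feats, clean_feats):
    """Both inputs: (B, T, 80) log-mel features. Penalizes mismatch in the
    ASR encoder's representation rather than in the signal itself, so the
    gradient still flows back into the separation model."""
    emb_sep, _ = asr_encoder(separated_feats)
    with torch.no_grad():
        emb_clean, _ = asr_encoder(clean_feats)
    return nn.functional.mse_loss(emb_sep, emb_clean)
```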
Showing results 1 — 15 out of 11,127 results