Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments
2021
IEEE Access
We extract speech-based linguistic information using K-means clustering to combine linguistic information for corresponding feature frames from speech data. ...
To this end, we introduced linguistic-coupled information through the unsupervised phonology clustering method and proposed the age-to-age voice translation using the linguistic-coupled information to ...
doi:10.1109/access.2021.3115608
fatcat:ublcbcyz2fhrtfamj7x5li7viy
Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning
2021
Speech Communication
Nishimura and N. Kitaoka, Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning. Speech Communication (2021), doi: https://doi. ...
As a response type selector, we propose an LSTM-based encoder-decoder framework utilizing acoustic and linguistic features extracted from input utterances. ...
Acknowledgments This work was supported in part by the Strategic Information and Com- ...
doi:10.1016/j.specom.2021.07.003
fatcat:owbm3rkqqfbifcfhyclbq2azzi
INFORMATION EXTRACTION FROM SPEECH
2006
2006 IEEE Spoken Language Technology Workshop
now commonly used. For names and non-nested entity mentions, tagging models can be used, for example: Person-Name-Start, Organization-Name-Continue, Not-A-Name. Coreference requires combining non-local ...
efficient ... Machine Translation. Approach: discriminative methods; neural network for acoustic features; perceptron for linguistic features; sentence length constraints; statistical duration ...
Levels of Linguistic Analysis ...
doi:10.1109/slt.2006.326780
dblp:conf/slt/Makhoul06
fatcat:aqj3ovbcufg3bpeu5bakfzqkxa
Text To Speech Synthesis for Afaan Oromoo Language Using Deep Learning Approach
2022
New Media and Mass Communication
The Festival toolkit is used to extract linguistic features from the normalized text and to align phoneme labels with the speech corpus for training and testing. ...
Linguistic features were extracted from the normalized text using the Festival toolkit for Afaan Oromoo TTS. ...
Intelligibility is 4.3 and naturalness 4.1 ... Afaan Oromoo speech synthesis ... Hidden Markov Model ...
doi:10.7176/nmmc/101-02
fatcat:jswi5bxwgffnnn2qpxurhr7i5e
Emotional Voice Conversion With Cycle-consistent Adversarial Network
[article]
2020
arXiv
pre-print
Emotional Voice Conversion, or emotional VC, is a technique of converting speech from one emotion state into another one, keeping the basic linguistic information and speaker identity. ...
Recently, cycle-consistent generative adversarial networks (CycleGAN) have been used successfully for non-parallel VC. This paper investigates the efficacy of using CycleGAN for emotional VC tasks. ...
INTRODUCTION Human speech is a complex signal that contains rich information, which includes linguistic, para-linguistic, and non-linguistic information. ...
arXiv:2004.03781v1
fatcat:onctwylabjbkhk5ux7rdz5nwpy
References
[chapter]
2010
The Handbook of Computational Linguistics and Natural Language Processing
Sumita (2002), Using language and translation models to select the best among outputs from multiple MT systems. ...
Fourth Workshop on the Semantics and Pragmatics of Dialogue, 43-50. Bos, Johan, & Tetsushi Oka (2002), An inference-based approach to dialogue system design. Proceedings of the 19th COLING, 113-19. ...
Bos, Johan, Ewan Klein, Oliver Lemon, & Tetsushi Oka (2003), DIPPER: description and formalisation of an information-state update dialogue system architecture. ...
doi:10.1002/9781444324044.refs
fatcat:udzjhccz6vg4hnsy5h744y67ai
Neural speech synthesis for resource-scarce languages
2019
South African Forum for Artificial Intelligence Research
We compare traditional hidden Markov model (HMM)-based acoustic modelling for speech synthesis with the proposed architecture using the World and LPCNet vocoders, giving both objective and MUSHRA based ...
in resource-scarce language environments, with corpora less than 1 hour in size, to build text-to-speech systems of high perceived naturalness. ...
Introduction The advent of neural network-based text-to-speech (TTS) systems has brought on dramatic improvements in the naturalness and intelligibility of synthesized speech. ...
dblp:conf/fair2/Louw19
fatcat:2i5sdf5ppbaexedzjgsgerosem
Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech
2020
Interspeech 2020
In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at the token-level minding both linguistic and acoustic cues for dialectal Arabic. ...
For detailed analysis, we study Arabic automatic speech recognition (ASR), Arabic dialect identification (ADI), and natural language processing (NLP) modules for the DCS corpus. ...
We opt for the non-linguist transcriber to avoid any bias to standard Arabic. ...
doi:10.21437/interspeech.2020-2271
dblp:conf/interspeech/ChowdhurySEA20
fatcat:gvze2uojdzasjafwtmfzovmyla
Speech Emotion Recognition: A Survey
2019
International Journal of Multimedia and Ubiquitous Engineering
Systems are trained to detect emotions from spoken utterances. ...
Emotions can be better recognized using speech processing, artificial intelligence techniques, and linguistic semantics. ...
Speech samples are sent into the system, which extracts speech coefficients and uses a model file to classify the speech emotion [19]. ...
doi:10.21742/ijmue.2019.14.2.03
fatcat:pwkfph3v7nfm3i2667k2rszxpu
Advances in Chinese Natural Language Processing and Language resources
2009
2009 Oriental COCOSDA International Conference on Speech Database and Assessments
The Chinese Language resources, including linguistic data, speech data, evaluation data and language toolkits which are elaborately constructed for CNLP related fields and some language resource consortiums ...
Aiming to promote the development of corpus-based technologies, many resource consortiums commit themselves to collecting, creating, and distributing many kinds of resources. ...
Linguistic resources for text classification, information retrieval, automatic summarization, etc. are also numerous. ...
doi:10.1109/icsda.2009.5278384
fatcat:mch4odyn2fedxompkxfvhtmaqe
RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion
2002
Speech Communication
In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. ...
Character accuracy rates of 73.6%, 74.6% and 74.7% were obtained for the systems using the baseline scheme, Schemes 1 and 2, respectively. ...
Acknowledgements The database was provided by Chunghwa Telecommunication Laboratories and the basic lexicon was supported by Academia Sinica of Taiwan. ...
doi:10.1016/s0167-6393(01)00006-1
fatcat:ig7ehvd5xfhs5oecglcn25qq2q
Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions
2020
Journal of Computer-Assisted Linguistic Research
To demonstrate this, a deep linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof-of-concept of a non-statistical conversational AI solution. ...
This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation, enable the mechanisms needed to plan and to decide what to do next, and manage the ...
Figure 7 illustrates the use of BERT for extracting the embedding for each tweet and using the embedding to train a text classification model for hate speech (Bhashkar 2019). ...
doi:10.4995/jclr.2020.12932
fatcat:oogpuyd6zvhixi22k33xawe3dm
An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech
[article]
2022
arXiv
pre-print
Different acoustic feature conversion approaches, including deep neural network based and signal processing based, are investigated and compared under a fair experimental setting, in which converted acoustic ...
A disentanglement-based auto-encoder (DAE) conversion framework is found to be useful and the approach of F0 normalization achieves the best performance. ...
The linguistic factor refers to the speech content and the para-linguistic factor covers all content-irrelevant information, including speaker identity, emotion, prosody, and speaking style. ...
arXiv:2205.12477v1
fatcat:bql2mjmmcjgbjjksijeazjzcj4
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
[article]
2022
arXiv
pre-print
To alleviate the acoustic multimodal problem, we propose bilateral perturbation, which consists of the style normalization and information enhancement stages, to learn only the linguistic information from ...
Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation learning, where a sequence of discrete representations (units) derived in a self-supervised manner, ...
As illustrated in Figure 1(a), the unit-based textless S2ST system consists of a speech-to-unit translation (S2UT) model followed by a unit-based vocoder that converts discrete units to speech, leading ...
arXiv:2205.12523v1
fatcat:akzsdagrubce7b7ie56yfmm6dm
Improving Voice Separation by Incorporating End-to-end Speech Recognition
[article]
2020
arXiv
pre-print
In this work, we propose to explicitly incorporate the phonetic and linguistic nature of speech by taking a transfer learning approach using an end-to-end automatic speech recognition (E2EASR) system. ...
Experimental results on speech separation and enhancement task on the AVSpeech dataset show that the proposed method significantly improves the signal-to-distortion ratio over the baseline model and even ...
END-TO-END ASR FEATURE To capture phonetic and linguistic information, it is important to model the long-term dependences of an utterance. ...
arXiv:1911.12928v2
fatcat:7ye3xijyg5gfvlconzwoiwjtc4