8,806 Hits in 4.2 sec

Morphological Analysis of the Corpus of Spontaneous Japanese

K. Uchimoto, K. Takaoka, C. Nobata, A. Yamada, S. Sekine, H. Isahara
2004 IEEE Transactions on Speech and Audio Processing  
This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and a method for accurately tagging a large spontaneous speech  ...  In this paper, we show that by using semi-automatic analysis we can expect a precision of over 99% for detecting and tagging short words and 97% for long words; the two types of words comprising the corpus  ...  Introduction: The "Spontaneous Speech: Corpus and Processing Technology" project is sponsoring the construction of a large spontaneous Japanese speech corpus, the Corpus of Spontaneous Japanese (CSJ) [1].  ...
doi:10.1109/tsa.2004.828700 fatcat:vycubdew4vd7pnqzldkv5xldry

Morphological analysis of a large spontaneous speech corpus in Japanese

Kiyotaka Uchimoto, Chikashi Nobata, Atsushi Yamada, Satoshi Sekine, Hitoshi Isahara
2003 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - ACL '03  
This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately  ...  the corpus.  ...  Introduction: The "Spontaneous Speech: Corpus and Processing Technology" project is sponsoring the construction of a large spontaneous Japanese speech corpus, the Corpus of Spontaneous Japanese (CSJ) (Maekawa  ...
doi:10.3115/1075096.1075157 dblp:conf/acl/UchimotoNYSI03 fatcat:sqgwohghwnaopbsdnjqvuwifx4

Morphological analysis of the spontaneous speech corpus

Kiyotaka Uchimoto, Chikashi Nobata, Atsushi Yamada, Hitoshi Isahara, Satoshi Sekine
2002 Proceedings of the 19th international conference on Computational linguistics -   unpublished
We use a morphological analysis system based on a maximum entropy model, which is independent of the domain of corpora.  ...  In this paper we show the tagging accuracy achieved by using the model and discuss problems in tagging the spontaneous speech corpus.  ...  Tagging the spontaneous speech corpus with morphological information such as word segmentation and parts-of-speech is one of the goals of the project.  ... 
doi:10.3115/1071884.1071903 fatcat:apxcquur4zfzhnfn4xmhadlbvu

Grammatically Coded Corpus Of Spoken Lithuanian: Methodology And Development

L. Kamandulytė-Merfeldienė
2017 Zenodo  
The development of the Corpus of Spoken Lithuanian has led to a constant increase in studies of spontaneous communication, and various papers have dealt with the distribution of parts of speech, use of  ...  The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, the development of which began in 2006.  ...  Thus, when creating a balanced corpus, it was decided to collect data of spontaneous private speech and prepared public speech, since the analysis of such data is informative and revealing not only  ...
doi:10.5281/zenodo.1129916 fatcat:6422uxpygvbbpemypvu2fdwtcy

Corpora of spoken Lithuanian

Ineta Dabašinskienė, Laura Kamandulytė
2009 Eesti Rakenduslingvistika Ühingu Aastaraamat  
The data are transcribed and coded according to the requirements of CHILDES. The second part of the paper presents a corpus-based analysis and provides preliminary results.  ...  Adult-directed speech, child-directed speech and child speech data are analysed to reveal the frequency distribution of parts of speech.  ...  However, until the end of 2006 there was no corpus of Lithuanian adult speech to support the analysis of spontaneous adult speech.  ...
doi:10.5128/erya5.05 fatcat:zi66kgtrtfalbo6om4zzqgoeui

Syntactically Coded Corpus of Spoken Lithuanian: Developmental Issues and Pilot Studies

Laura Kamandulytė Merfeldienė, Ingrida Balčiūnienė
2016 Studies About Languages  
First, we consider the methodology for developing the Corpus as well as the principles of transcribing and coding Lithuanian speech data.  ...  Generally, we believe that future systematic corpus-based research on spontaneous spoken language will offer more opportunities to identify, evaluate, and elaborate the development of the Lithuanian language  ...  Later on, the Corpus was supplemented with new spontaneous speech data and expanded with prepared speech data, and was thus renamed the Corpus of Spoken Lithuanian.  ...
doi:10.5755/j01.sal.0.28.15131 fatcat:h6wswk35tjejtpaxx5x2v2pyr4

Linguistic and Logical Tools for an Advanced Interactive Speech System in Spanish [chapter]

Jordi Álvarez, Victoria Arranz, Núria Castell, Montserrat Civit
2001 Lecture Notes in Computer Science  
The research presented here describes work on the development of a restricted-domain spontaneous speech dialogue system in Spanish.  ...  Following the morphological, syntactic and semantic analysis, the module generates a structured representation of the content of the user's turn.  ...  The morphological analysis is carried out by means of MACO+ (Morphological Analyzer Corpus Oriented [11]), which has been adapted for the task domain.  ...
doi:10.1007/3-540-45517-5_58 fatcat:h3tfqtj375dvpb5hbtify5xp7y

Morphology-based investigation of differences between spoken and written isiZulu

Laurette Marais (CSIR), Ilana Wilken (CSIR)
2021 Journal of the Digital Humanities Association of Southern Africa (DHASA)  
In this paper, we present a quantitative investigation into such differences by considering the morphology of tokens in a transcribed spoken isiZulu corpus and a written isiZulu corpus.  ...  This analysis presents information that could inform the development of voice-enabled computer applications for isiZulu.  ...  considered spontaneous speech.  ...
doi:10.55492/dhasa.v3i01.3860 fatcat:znp34ybwqna7fbzfje7dlgwjpy

Recent Results in Speech Recognition for the Tatar Language [chapter]

Aidar Khusainov
2017 Lecture Notes in Computer Science  
In this paper we describe an approach to the creation of automatic speech recognition systems for the Tatar language.  ...  We developed a speech analysis platform to work with under-resourced languages and used this tool to create a baseline speech recognition system.  ...  Speech corpus and acoustic models: The creation of the multi-speaker speech corpus for the Tatar language is currently in progress.  ...
doi:10.1007/978-3-319-64206-2_21 fatcat:zgdfv3add5b33pggrah4uxjiqy

Experiments on Detection of Voiced Hesitations in Russian Spontaneous Speech

Vasilisa Verkhodanova, Vladimir Shapranov
2016 Journal of Electrical and Computer Engineering  
Experimental results on the mixed, quality-diverse corpus of spontaneous Russian speech indicate the efficiency of the techniques for the task in question, with SVM outperforming the other methods.  ...  The development and popularity of voice-user interfaces have made spontaneous speech processing an important research field.  ...  The third part is the corpus of scientific reports from a seminar devoted to the analysis of conversational speech, held at SPIIRAS in 2011.  ...
doi:10.1155/2016/2013658 fatcat:o6ar2z7kbfhltnxmkh7e5m7btq

Affixation effects on word-final coda deletion in spontaneous Seoul Korean speech

Jungsun Kim
2016 Phonetics and Speech Sciences  
The Korean Corpus of Spontaneous Speech (Yun et al., 2016) showed a high percentage of labeling consistency for the analysis in the present study.  ...  For more details on the Korean Corpus of Spontaneous Speech, please see the corpus manual (Yun et al., 2015).  ...
doi:10.13064/ksss.2016.8.4.009 fatcat:oo4fvtuk3nghffje3dnxnmw6ae

Gossip is More than Just Story Telling: Topic Modelling and Quantitative Analysis on a Spontaneous Speech Corpus

Boróka Pápay, Bálint Kubik, Júlia Galántai
2018 European Conference on Information Retrieval  
In this paper, we describe a quantitative approach, based on LDA topic modeling and quantitative analysis, to identifying gossip in a large corpus of spontaneous talk.  ...  We also analyze the topics to distinguish gossip from storytelling by dividing our large spontaneous speech corpora into gossip and non-gossip texts.  ...  For our analysis we used a unique Hungarian corpus consisting of approximately 550 hours of spontaneous speech.  ...
dblp:conf/ecir/PapayKG18 fatcat:5jnlcmymdrhllfalzununwfhae

Spoken Tunisian Arabic Corpus "STAC": Transcription and Annotation

Inès Zribi, Mariem Ellouze, Lamia Hadrich Belguith, Philippe Blache
2015 Research in Computing Science  
This paper presents the "STAC" corpus (Spoken Tunisian Arabic Corpus) of spontaneous Tunisian Arabic speech. We present the method used to collect and transcribe this corpus.  ...  Then, we detail the different stages carried out to enrich the corpus with the linguistic and speech annotations that make it more useful for many NLP applications.  ...  The iterative procedure starts by dividing our corpus into 10 folders according to the number of sentences. We begin with a morphological analysis of the first folder of the corpus.  ...
doi:10.13053/rcs-90-1-9 fatcat:vhrhp7aobna7fnb2bgqumlhysu

The present status, progress, and usage of speech databases in Japan

Hisao Kuwabara, Shuichi Itahashi, Mikio Yamamoto, Satoshi Nakamura, Toshiyuki Takezawa, Kazuya Takeda
2005 Acoustical Science and Technology  
of spontaneous speech data is available.  ...  The present status, progress and usage of Japanese speech databases have been described. The database project in Japan started in the early 1980s.  ...
doi:10.1250/ast.26.62 fatcat:f4rtbn2oovboxognl52ccktfde

The Corpus of Lithuanian Children Language: Development and application for modern studies in language acquisition

Ingrida Balčiūnienė, Laura Kamandulytė-Merfeldienė
2019 Kalbotyra  
The longitudinal data (conversations between the target children and their caretakers), compiled according to the requirements of natural observation, include transcribed and morphologically annotated speech  ...  First, the data collection procedure for the Corpus is discussed.  ...  Table 3. The structure and size of the dialogue sub-corpus. Table 6. Results of the comparative analysis among the cohorts  ...
doi:10.15388/kalbotyra.2018.1 fatcat:quof4nej65ef7lkn7hiyk5s46a
Showing results 1–15 out of 8,806 results