Automatic extraction of phonetically rich sentences from large text corpus of indian languages

Karunesh Arora, Sunita Arora, Kapil Verma, Shyam Sunder Agrawal
2004 Interspeech 2004   unpublished
This paper describes a simple process of automatically extracting such set of sentences from a large text corpus of a given Indian Language and also presents an algorithm for the process.  ...  Selecting such a set from a large text corpus without modifying the characteristics of the corpus is still a difficult task.  ...  Vijay Gugnani for manual editing of phonetically rich sentences and getting the recording done. We are grateful to Mr. C K Joseph, Controller (Administration) and Mr.  ... 
doi:10.21437/interspeech.2004-743 fatcat:tun77sjxsjf2vkie7yzdzllyci

Structural Analysis of Hindi Phonetics and A Method for Extraction of Phonetically Rich Sentences from a Very Large Hindi Text Corpus [article]

Shrikant Malviya, Rohit Mishra, Uma Shanker Tiwary
2017 arXiv   pre-print
Further a two stage algorithm is proposed to extract phonetically rich sentences with a high variety of triphones from the EMILLE Hindi corpus.  ...  The results show that the approach efficiently build uniformly distributed phonetically-rich corpus with optimum number of sentences.  ...  In [2] , phonetically-rich sentences are extracted from the large corpus of Hindi language.  ... 
arXiv:1701.08655v2 fatcat:3yvyviikw5a7vawkk7mrfvtcga

Development of Phoneme Dominated Database for Limited Domain T-T-S in Hindi

Archana Balyan
2017 International Journal of Artificial Intelligence & Applications  
The primary goal of this paper is to review the speech corpus created by various institutes and organizations so that the scientists and language technologists can recognize the crucial role of corpus  ...  The result shows that a medium size database consisting of 630 utterances with 12,614 words, 11572 tokens of phonemes covering 38 phonemes are generated in our database and it cover maximum possible phonetic  ...  Automatic segmentation of speech corpus at phonetic level TEXT CORPUS: SELECTION OF SENTENCES The domain specific text corpora was so collected that these sentences not only covers the most frequent  ... 
doi:10.5121/ijaia.2017.8301 fatcat:7y5qamht4jaoff65zstlyvhpvy

Festival and Festvox Framework Tools for Marathi Text-to-Speech Synthesis

Sangramsing Nathusing
2015 International Journal of Computer Applications  
This Marathi G2P has been used for phonetising large text corpora which in turn is used in designing an inventory of phonetically rich sentences.  ...  The sentences ensured a good coverage of the phonetically valid di-phones using only 1.3% of the complete text corpora.  ...  Hence, we have proposed an algorithm [9] to automatically extract compound words and identify its constituent parts from a large text corpus.  ... 
doi:10.5120/ijca2015907413 fatcat:z6dxukw5xrfl5mgaykq3g3x46u

A method for the extraction of phonetically-rich triphone sentences

Gustavo Mendonca, Sara Candeias, Fernando Perdigao, Christopher Shulby, Rean Toniazzo, Aldebaro Klautau, Sandra Aluisio
2014 2014 International Telecommunications Symposium (ITS)  
Such a corpus is of interest for a wide range of contexts, from automatic speech recognition to speech therapy.  ...  A method is proposed for compiling a corpus of phonetically-rich triphone sentences; i.e., sentences with a high variety of triphones, distributed in a uniform fashion.  ...  [15] considered syllables as the basic unit to extract, in an automatic way, phonetically-rich sentences from a large text corpus from Indian languages, justifying their choice because a syllable is  ... 
doi:10.1109/its.2014.6947957 fatcat:dcsxx4odu5aq7egy4zlticohmu

Development of Multi-lingual Spoken Corpora of Indian Languages [chapter]

K. Samudravijaya
2006 Lecture Notes in Computer Science  
The completed preparatory work include the design of phonetically rich sentences, data acquisition setup for recording speech data over telephone channel, a Wizard of Oz setup for acquiring speech data  ...  This paper describes a recently initiated effort for collection and transcription of read as well as spontaneous speech data in four Indian languages.  ...  The process of accumulation of text corpus and design of phonetically rich sentence corpus is described in Section 4.  ... 
doi:10.1007/11939993_79 fatcat:4bhyoimb7nd7lcxord75373jkq

Natural Language Chhattisgarhi: A Literature Survey

Rijuka Pathak, Somesh Dewangan
2014 International Journal of Engineering Trends and Technology  
Chhattisgarhi is an official language in the Indian state of Chhattisgarh, spoken by 17.5 million people.  ...  POS tagger is one of the important tools that are used to develop language translator and information extraction so that computer based be compatible for natural language processing.  ...  This paper describes the design, structure and phonetic analysis of text corpus for Hindi. An analysis of the phonetic richness of sentences designed by this method is provided. 17 .  ... 
doi:10.14445/22315381/ijett-v12p220 fatcat:teaehvmhavetjb4odoaq7ywwv4

Issues in developing LVCSR System for Dravidian Languages: An Exhaustive Case Study for Tamil

Bharadwaja Kumar G, Melvin Jose Johnson Premkumar
2013 International Journal of Computer Applications  
Research in the area of Large Vocabulary Continuous Speech Recognition (LVCSR) for Indian languages has not seen the level of advancement as in English since there is a dearth of large scale speech and  ...  Tamil is one among the four major Dravidian languages spoken in southern India. One of the characteristics of Tamil is that it is morphologically very rich.  ...  The training data consists of phonetically rich sentences from newspapers and Thirukkural. Thirukkural is a Tamil classic, consisting of 1330 couplets or kurals.  ... 
doi:10.5120/12172-8180 fatcat:vs7ojmx3wnd4tijawevbdnk6ie

Resources for Development of Hindi Speech Synthesis System: An Overview

Archana Balyan
2017 Open Journal of Applied Sciences  
Most of the information in digital world is accessible to few who can read or understand a particular language. The speech corpus acquisition is an essential part of all spoken technology systems.  ...  The quality and the volume of speech data in corpus directly affect the accuracy of the system.  ...  At KIIT, Gurgaon, a text corpus of 2 million words of natural messages in 12 different domains in Hindi and Indian English and a speech corpus of 100 speakers, each speaking 630 phonetically rich sentences  ... 
doi:10.4236/ojapps.2017.76020 fatcat:74qrju5ex5gu7aikudvoj237lq

Automated Transcription System for Malayalam Language

Cini Kurian, Kannan Balakrishnan
2011 International Journal of Computer Applications  
Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers.  ...  The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%, when tested with a set of continuous speech data.  ...  The corpus balancing tool, CorpusCrt [18] is used to extract a set of phonetically rich sentences, from the text materials. Accordingly, 20 phonetically rich sentences are selected for training.  ... 
doi:10.5120/2360-3091 fatcat:ug4ap6vosreshghkbsqjh2j3lq

Design and development of phonetically rich Urdu speech corpus

Agha Ali Raza, Sarmad Hussain, Huda Sarfraz, Inam Ullah, Zahid Sarfraz
2009 2009 Oriental COCOSDA International Conference on Speech Database and Assessments  
This paper presents details of designing and developing an optimal context based phonetically rich speech corpus for Urdu that will serve as a baseline model for training a Large Vocabulary Continuous  ...  The significance of such resources becomes crucial in the development of Automatic Speech Recognition systems and Text to Speech systems.  ...  Acknowledgement The work has been funded through a research grant by Higher Education Commission, Govt. of Pakistan.  ... 
doi:10.1109/icsda.2009.5278380 fatcat:ivzbxgoauzevvo6f6ali4ivc54

Review of Development of Speech corpora and speech recognition research in Hindi

Dr.Harshalata Petkar
2017 International Journal of Engineering Research and Applications  
This speech corpora is utilized for the development of acoustic and language models which can be used for training automatic speech recognition, synthesis or translation system applicability of automatic  ...  Hindi is one of the most widely spoken languages in the world and is major language in India.  ...  time phrases, application words, phonetically rich word, phonetically rich sentence, person name etc.  ... 
doi:10.9790/9622-0707031219 fatcat:mgfm3p2g2vetlojgvdwlo5kibi

Web Recognition of Spoken Hindi

Kamlesh Sharma, Suryakanthi Tangirala
2017 Indian Journal of Science and Technology  
Technology has evolved and computers have but still Indian communities are far from the use of computers, only 37% [13] user of Indian society like persons from academics, health, engineering and research  ...  Although there is a revolution in development of operating systems in the past two years, there are no operating systems that support Indian languages like Hindi, they only support English language and  ...  This necessitates creation of sets of phonetically rich sentences that provide a good coverage of pairs of phones of the language 6 .  ... 
doi:10.17485/ijst/2017/v10i35/118956 fatcat:ro2o2r6noza6fjl4d35rcml5wi

Computational intelligence in processing of speech acoustics: a survey

Amitoj Singh, Navkiran Kaur, Vinay Kukreja, Virender Kadyan, Munish Kumar
2022 Complex & Intelligent Systems  
When compared with non-Indian languages, the research on speech recognition of Indian languages (except Hindi) has not achieved the expected milestone yet.  ...  However, a limited number of automatic speech recognition systems are available for commercial use.  ...  [174] developed Prosody and phonetically Rich Transcribed speech corpus for Bengali and Oriya languages.  ... 
doi:10.1007/s40747-022-00665-1 fatcat:6pu2xccbq5as7bn2y2tav2fdwa

A comprehensive survey on Indian regional language processing

B. S. Harish, R. Kasturi Rangan
2020 SN Applied Sciences  
Processing of these natural languages for various language processing tasks is challenging. The Indian regional languages are considered to be low resourced when compared to other languages.  ...  The future scope and essential requirements to enhance the processing of Indian regional languages for various language processing tasks are discussed.  ...  Recently, in language processing due to the availability of large corpus, researchers have developed pretrained neural models which are trained on these large benchmark corpus/dataset.  ... 
doi:10.1007/s42452-020-2983-x fatcat:e3u5r5qo7ngapj5mbiwit7qlwi