
Empirical Study of Utilizing Morph-Syntactic Information in SMT [chapter]

Young-Sook Hwang, Taro Watanabe, Yutaka Sasaki
2005 Lecture Notes in Computer Science  
Moreover, the use of a class-based n-gram language model improves performance by alleviating the data sparseness problem in a word-based language model.  ...  And we integrate the models into a log-linear model.  ...  technology based on a large corpus".  ... 
doi:10.1007/11562214_42 fatcat:tkcbvagphfehpcrz5t2ymguooi

Unsupervised learning of agglutinated morphology using nested Pitman-Yor process based morpheme induction algorithm

Arun Kumar, Lluis Padro, Antoni Oliver
2015 2015 International Conference on Asian Language Processing (IALP)  
In this paper we describe a method to morphologically segment highly agglutinating and inflectional languages from the Dravidian family.  ...  We use a nested Pitman-Yor process to segment long agglutinated words into their basic components, and use a corpus-based morpheme induction algorithm to perform morpheme segmentation.  ...  ., 2007) based on the Minimum Description Length principle is the reference model for highly inflecting languages, such as Finnish.  ... 
doi:10.1109/ialp.2015.7451528 dblp:conf/ialp/KumarPO15 fatcat:feva73iv25c53nz2jhipjzkx7i

Izafet vs non-Izafet genitive patterns in non-related languages

Nailya Mingazova, Vitaly Subich, Charles Carlson
2018 XLinguae  
The degree of izafet/non-izafet characteristics of the languages under study is revealed on the basis of genitive phrase (GP) models.  ...  The phenomenon of izafet is considered typical of Iranian (Persian), Afroasiatic (Arabic), Turkic (Tatar), and Uralic (Udmurt) languages, i.e. of languages with more or less agglutinating morphology.  ...  inflectional languages) and head-last languages (agglutinative Turkic languages, etc.).  ... 
doi:10.18355/xl.2018.11.02.04 fatcat:3i3cd3skc5h7pmi5vv7w2ombvi

Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages [chapter]

Ebru Arsoy, Mikko Kurimo, Murat Saraçlar, Teemu Hirsimäki, Janne Pylkkönen, Tanel Alumäe, Haim Sak
2008 Speech Recognition  
For other agglutinative languages like Finnish and Estonian, OOV rates are around 15% for a 69K lexicon (Hirsimäki et al., 2006) and 10% for a 60K lexicon, respectively, and 8.27% for Czech, a highly inflectional  ...  Highly inflectional and agglutinative languages suffer from a high number of OOV words with similar-size vocabularies. In our Turkish BN transcription system, the OOV rate is 9.3% for a 50K lexicon.  ...  The authors would like to thank Sabancı and ODTÜ universities for the Turkish text data and AT&T Labs-Research for the software.  ... 
doi:10.5772/6380 fatcat:nurewjnz6jatlh4ojid5xrvhy4

An empirical test of the Agglutination Hypothesis [chapter]

Martin Haspelmath
2009 Zenodo  
I report on a study of the nominal and verbal inflectional morphology of a reasonably balanced world-wide sample of 30 languages, applying a variety of measures for the agglutination parameters and determining  ...  (ii) Second prediction: If a language is agglutinating/fusional with respect to one of the three agglutination parameters (a-c) (and perhaps others), it shows the same type with respect to the other two  ...  In other words, a highly developed paradigmatic system of tonal oppositions appears not to be very compatible with a highly developed syntagmatic system of agglutinative morphology" But at least since  ... 
doi:10.5281/zenodo.3888247 fatcat:um6zeztf5bburhmh2decprg7pm

Quantifying Synthesis and Fusion and their Impact on Machine Translation [article]

Arturo Oncevay, Duygu Ataman, Niels van Berkel, Barry Haddow, Alexandra Birch, Johannes Bjerva
2022 arXiv   pre-print
However, literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.  ...  For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as  ...  in Science and Technology (COST) under the programme CA18231 - Multi3Generation: Multi-task, Multilingual, Multimodal Language Generation.  ... 
arXiv:2205.03369v1 fatcat:rifjp2oxnvg6vmcztwjkkr4gwa

A Rule based Kannada Morphological Analyzer and Generator using Finite State Transducer

Ramasamy Veerappan, Antony P J, S Saravanan, Soman K P
2011 International Journal of Computer Applications  
Developing full-fledged morphological analyzer and generator (MAG) tools for a highly agglutinative language like Kannada is a challenging task.  ...  This project has been developed as part of the development of a machine translation system from English to Kannada.  ...  In 2000, Agirve introduced a word-grammar-based morphological analyzer using two-level and unification-based formalisms for a highly agglutinative language called Basque [5]. [14].  ... 
doi:10.5120/3333-4583 fatcat:cw3gmeqbkjdifexqjzdyys24ca

A Survey on Various Approach used in Named Entity Recognition for Indian Languages

Dikshan N., Harshad Bhadka
2017 International Journal of Computer Applications  
They found that there is a lack of annotated data and that it is a highly agglutinating and inflected language.  ...  No capitalization, redundant named entities available in the dictionary with other specific meanings, a highly inflectional language resulting in large complex word forms, free word order language which  ... 
doi:10.5120/ijca2017913878 fatcat:rda4faeyyrhrnngxiyvvpn44o4

Reading development in agglutinative languages: Evidence from beginning, intermediate, and adult Basque readers

Joana Acha, Itziar Laka, Manuel Perea
2010 Journal of Experimental Child Psychology  
Do typological properties of language, such as agglutination (i.e., the morphological process of adding affixes to the lexeme of a word), have an impact on the development of visual word recognition?  ...  To each stem, four inflections of different lengths were attached (-a, -ari, -aren, and -arentzat, i.e., inflectional sequences).  ...  non-native counterparts in third grade.  ... 
doi:10.1016/j.jecp.2009.10.008 pmid:20003988 fatcat:fkvi6hzqsbefxlc4ykkhk4hqpu

Morphological Analysis of the Dravidian Language Family

Arun Kumar, Ryan Cotterell, Lluís Padró, Antoni Oliver
2017 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers  
To remedy this, we create DravMorph, a corpus annotated for morphological segmentation and part-of-speech.  ...  The Dravidian family is one of the most widely spoken sets of languages in the world, yet there are very few annotated resources available to NLP researchers.  ...  Acknowledgments: The second author was supported by a DAAD Long-Term Research Grant and an NDSEG fellowship.  ... 
doi:10.18653/v1/e17-2035 dblp:conf/eacl/KumarPCO17 fatcat:jyogzw4h7zcy3h2glzlxg3qzum

Automated Learning of Hungarian Morphology for Inflection Generation and Morphological Analysis

Gabor Szabo, Laszlo Kovacs
2020 Indonesian Journal of Electrical Engineering and Informatics (IJEEI)  
The goal of our research is to create a novel morphology model that can learn the morphology of highly agglutinative languages in an automated way, and then generate inflected word forms from a lemma and  ...  The automated learning of morphological features of highly agglutinative languages is an important research area for both machine learning and computational linguistics.  ...  CONCLUSION In this paper we presented a novel multi-affix morphology model that can learn the morphology of highly agglutinative languages like Hungarian, and solve the inflection generation and morphological  ... 
doi:10.52549/ijeei.v8i4.2545 fatcat:nbqvno7uhrbwbptdnclnssqm3m

Speech Recognition for Agglutinative Languages [chapter]

R. Thangarajan
2012 Modern Speech Recognition Approaches with Case Studies  
This bi-gram measure is sufficient for modeling strings of words in a language where inflectional morphology is low.  ...  Justification for using the prosodic syllable as a speech unit: Thangarajan et al. (2008a) have proposed a syllable-based language model for combating the agglutinative nature of the Tamil language.  ... 
doi:10.5772/50140 fatcat:uda5p267ajapvk5qojorl6j2q4

Sheffield Systems for the Finnish-English WMT Translation Task

David Steele, Karin Sim Smith, Lucia Specia
2015 Proceedings of the Tenth Workshop on Statistical Machine Translation  
Finnish is a morphologically rich language with elements such as nouns and verbs carrying a large number of inflectional types.  ...  This paper provides an overview of the Sheffield University submission to the WMT15 Translation Task for the Finnish-English language pair.  ...  Morphological stemming Our main improvement to the system was based on the idea that there is a need to deal with the highly inflectional nature of Finnish, as the source language.  ... 
doi:10.18653/v1/w15-3020 dblp:conf/wmt/SteeleSS15 fatcat:5hbc3sppv5ce7jdmzzrp7co6qq

Influence of Highly Inflected Word Forms and Acoustic Background on the Robustness of Automatic Speech Recognition for Human–Computer Interaction

Andrej Zgank
2022 Mathematics  
Thus, a novel type of analysis is proposed, where a dedicated speech database comprised solely of highly inflected word forms is constructed and used for tests.  ...  The impact of highly inflected word forms on speech recognition accuracy was reduced with the increased levels of acoustic background and was, in these cases, similar to the non-highly inflected test sets  ...  Different languages can present a challenging task for a speech recognition system, due to their properties. Examples of such languages are tonal, highly inflected, or agglutinative languages.  ... 
doi:10.3390/math10050711 fatcat:hsuajqjegjeqtboq4vt2l26v3i

UniMorph 4.0: Universal Morphology [article]

Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke (+83 others)
2022 arXiv   pre-print
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages.  ...  The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.  ...  Turkic Turkish is part of the Oghuz branch, and it is highly agglutinative, like the other languages of this family. This release vastly expanded the pre-existing UniMorph inflection tables.  ... 
arXiv:2205.03608v2 fatcat:twdio7zbm5ehhoo2f5abmviune