Empirical Study of Utilizing Morph-Syntactic Information in SMT
[chapter]
2005
Lecture Notes in Computer Science
Moreover, the use of a class-based n-gram language model improves performance by alleviating the data sparseness problem in a word-based language model. ...
And we integrate the models into a log-linear model. ...
technology based on a large corpus". ...
doi:10.1007/11562214_42
fatcat:tkcbvagphfehpcrz5t2ymguooi
Unsupervised learning of agglutinated morphology using nested Pitman-Yor process based morpheme induction algorithm
2015
2015 International Conference on Asian Language Processing (IALP)
In this paper we describe a method to morphologically segment highly agglutinating and inflectional languages from Dravidian family. ...
We use nested Pitman-Yor process to segment long agglutinated words into their basic components, and use a corpus based morpheme induction algorithm to perform morpheme segmentation. ...
., 2007) based on Minimum Description Length principle is the reference model for highly inflecting languages, such as Finnish. ...
doi:10.1109/ialp.2015.7451528
dblp:conf/ialp/KumarPO15
fatcat:feva73iv25c53nz2jhipjzkx7i
Izafet vs non-Izafet genitive patterns in non-related languages
2018
XLinguae
The degree of izafet/non-izafet characteristics of the languages under study is revealed on the basis of the genitive phrase (GP) models. ...
The phenomenon of izafet is considered typical of Iranian (Persian), Afroasiatic (Arabic), Turkic (Tatar), and Uralic (Udmurt) languages, i.e. of languages with more or less agglutinating morphology. ...
inflectional languages) and head-last languages (agglutinative Turkic languages, etc.). ...
doi:10.18355/xl.2018.11.02.04
fatcat:3i3cd3skc5h7pmi5vv7w2ombvi
Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages
[chapter]
2008
Speech Recognition
For other agglutinative languages like Finnish and Estonian, OOV rates are around 15% for a 69K lexicon (Hirsimäki et al., 2006) and 10% for a 60K lexicon respectively and 8.27% for Czech, a highly inflectional ...
Highly inflectional and agglutinative languages suffer from a high number of OOV words with similar size vocabularies. In our Turkish BN transcription system, the OOV rate is 9.3% for a 50K lexicon. ...
The authors would like to thank Sabancı and ODTÜ universities for the Turkish text data and AT&T Labs - Research for the software. ...
doi:10.5772/6380
fatcat:nurewjnz6jatlh4ojid5xrvhy4
An empirical test of the Agglutination Hypothesis
[chapter]
2009
Zenodo
I report on a study of the nominal and verbal inflectional morphology of a reasonably balanced world-wide sample of 30 languages, applying a variety of measures for the agglutination parameters and determining ...
(ii) Second prediction: If a language is agglutinating/fusional with respect to one of the three agglutination parameters (a-c) (and perhaps others), it shows the same type with respect to the other two ...
In other words, a highly developed paradigmatic system of tonal oppositions appears not to be very compatible with a highly developed syntagmatic system of agglutinative morphology" But at least since ...
doi:10.5281/zenodo.3888247
fatcat:um6zeztf5bburhmh2decprg7pm
Quantifying Synthesis and Fusion and their Impact on Machine Translation
[article]
2022
arXiv
pre-print
However, literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative. ...
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as ...
in Science and Technology (COST) under the programme CA18231 - Multi3Generation: Multi-task, Multilingual, Multimodal Language Generation. ...
arXiv:2205.03369v1
fatcat:rifjp2oxnvg6vmcztwjkkr4gwa
A Rule based Kannada Morphological Analyzer and Generator using Finite State Transducer
2011
International Journal of Computer Applications
Developing a full-fledged morphological analyzer and generator (MAG) tool for a highly agglutinative language like Kannada is a challenging task. ...
This project has been developed as part of the development of a machine translation system for English to Kannada language. ...
In 2000, Agirve introduced a word-grammar based morphological analyzer using the two-level and a unification-based formalism for a highly agglutinative language called Basque [5]. [14]. ...
doi:10.5120/3333-4583
fatcat:cw3gmeqbkjdifexqjzdyys24ca
A Survey on Various Approach used in Named Entity Recognition for Indian Languages
2017
International Journal of Computer Applications
They found that there is a lack of annotated data and that it is a highly agglutinating and inflected language. ...
No capitalization, redundant named entities available in dictionary with other specific meaning, highly inflectional language resulting in large complex word forms, free word order language which ...
doi:10.5120/ijca2017913878
fatcat:rda4faeyyrhrnngxiyvvpn44o4
Reading development in agglutinative languages: Evidence from beginning, intermediate, and adult Basque readers
2010
Journal of Experimental Child Psychology
Do typological properties of language, such as agglutination (i.e., the morphological process of adding affixes to the lexeme of a word), have an impact on the development of visual word recognition? ...
To each stem, four inflections of different lengths were attached (-a, -ari, -aren, and -arentzat, i.e., inflectional sequences). ...
non-native counterparts in third grade. ...
doi:10.1016/j.jecp.2009.10.008
pmid:20003988
fatcat:fkvi6hzqsbefxlc4ykkhk4hqpu
Morphological Analysis of the Dravidian Language Family
2017
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
To remedy this, we create DravMorph, a corpus annotated for morphological segmentation and part-of-speech. ...
The Dravidian family is one of the most widely spoken set of languages in the world, yet there are very few annotated resources available to NLP researchers. ...
Acknowledgments: The second author was supported by a DAAD Long-Term Research Grant and an NDSEG fellowship. ...
doi:10.18653/v1/e17-2035
dblp:conf/eacl/KumarPCO17
fatcat:jyogzw4h7zcy3h2glzlxg3qzum
Automated Learning of Hungarian Morphology for Inflection Generation and Morphological Analysis
2020
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
The goal of our research is to create a novel morphology model that can learn the morphology of highly agglutinative languages in an automated way, and then generate inflected word forms from a lemma and ...
The automated learning of morphological features of highly agglutinative languages is an important research area for both machine learning and computational linguistics. ...
CONCLUSION: In this paper we presented a novel multi-affix morphology model that can learn the morphology of highly agglutinative languages like Hungarian, and solve the inflection generation and morphological ...
doi:10.52549/ijeei.v8i4.2545
fatcat:nbqvno7uhrbwbptdnclnssqm3m
Speech Recognition for Agglutinative Languages
[chapter]
2012
Modern Speech Recognition Approaches with Case Studies
This bi-gram measure is sufficient for modeling strings of words in a language where inflectional morphology is low. ...
Justification for using prosodic syllable as a speech unit: Thangarajan et al. (2008a) have proposed a syllable-based language model for combating the agglutinative nature of the Tamil language. ...
doi:10.5772/50140
fatcat:uda5p267ajapvk5qojorl6j2q4
Sheffield Systems for the Finnish-English WMT Translation Task
2015
Proceedings of the Tenth Workshop on Statistical Machine Translation
Finnish is a morphologically rich language with elements such as nouns and verbs carrying a large number of inflectional types. ...
This paper provides an overview of the Sheffield University submission to the WMT15 Translation Task for the Finnish-English language pair. ...
Morphological stemming: Our main improvement to the system was based on the idea that there is a need to deal with the highly inflectional nature of Finnish, as the source language. ...
doi:10.18653/v1/w15-3020
dblp:conf/wmt/SteeleSS15
fatcat:5hbc3sppv5ce7jdmzzrp7co6qq
Influence of Highly Inflected Word Forms and Acoustic Background on the Robustness of Automatic Speech Recognition for Human–Computer Interaction
2022
Mathematics
Thus, a novel type of analysis is proposed, where a dedicated speech database comprised solely of highly inflected word forms is constructed and used for tests. ...
The impact of highly inflected word forms on speech recognition accuracy was reduced with the increased levels of acoustic background and was, in these cases, similar to the non-highly inflected test sets ...
Different languages can present a challenging task for a speech recognition system, due to their properties. Examples of such languages are tonal, highly inflected, or agglutinative languages. ...
doi:10.3390/math10050711
fatcat:hsuajqjegjeqtboq4vt2l26v3i
UniMorph 4.0: Universal Morphology
[article]
2022
arXiv
pre-print
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. ...
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. ...
Turkic: Turkish is part of the Oghuz branch, and it is highly agglutinative, like the other languages of this family. This release vastly expanded the pre-existing UniMorph inflection tables. ...
arXiv:2205.03608v2
fatcat:twdio7zbm5ehhoo2f5abmviune