164 Hits in 1.8 sec

Neural Polysynthetic Language Modelling [article]

Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk (+9 others)
2020 arXiv   pre-print
We examine the current state-of-the-art in language modelling, machine translation, and text prediction for four polysynthetic languages: Guaraní, St.  ...  When we consider polysynthetic languages (those at the extreme of morphological complexity), approaches like stemming, lemmatization, or subword modelling may not suffice.  ...  Acknowledgements We had access to parallel data for two Yupik languages: St. Lawrence Island Yupik (ess) and Central Alaskan Yup'ik (esu).  ... 
arXiv:2005.05477v2 fatcat:nzw5w2ueznhpbfocqvlmbalkyi

Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages [article]

Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze
2018 arXiv   pre-print
Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even  ...  obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings.  ...  We may thus conclude that neural models are indeed applicable to segmentation of polysynthetic languages in a low-resource setting.  ... 
arXiv:1804.06024v1 fatcat:hjmbmiyo3rdtxmbc4chtzlkrta

Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages

Katharina Kann, Jesus Manuel Mager Hois, Ivan Vladimir Meza Ruiz, Hinrich Schütze
2018 Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)  
Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even  ...  obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings.  ...  We may thus conclude that neural models are indeed applicable to segmentation of polysynthetic languages in a low-resource setting.  ... 
doi:10.18653/v1/n18-1005 dblp:conf/naacl/KannMRS18 fatcat:wgz7plttyzbdnlsfxoosvyuhpm

Bootstrapping Techniques for Polysynthetic Morphological Analysis [article]

William Lane, Steven Bird
2020 arXiv   pre-print
To address this challenge, we offer linguistically-informed approaches for bootstrapping a neural morphological analyzer, and demonstrate its application to Kunwinjku, a polysynthetic Australian language  ...  Polysynthetic languages have exceptionally large and sparse vocabularies, thanks to the number of morpheme slots and combinations in a word.  ...  covered by a research permit from the Northern Land Council, and was sponsored by the Australian government through a PhD scholarship, and grants from the Australian Research Council and the Indigenous Language  ... 
arXiv:2005.00956v1 fatcat:p7ajqtwcbvhmznu6qyuhkxpiry

Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages [article]

Manuel Mager and Elisabeth Mager and Alfonso Medina-Urrea and Ivan Meza and Katharina Kann
2018 arXiv   pre-print
To shed light on the phenomena which hamper automatic translation to and from polysynthetic languages, we study translations from three low-resource, polysynthetic languages (Nahuatl, Wixarika and Yorem  ...  Machine translation from polysynthetic to fusional languages is a challenging task, which gets further complicated by the limited amount of parallel text available.  ...  Furthermore, with the rise of neural MT (NMT), the common assumption that machine learning approaches for MT were language independent routed the efforts into the direction of general model improvements  ... 
arXiv:1807.00286v1 fatcat:bz7bpik5ujbovfd5q6l3tfysdm

Central Yup'ik and Machine Translation of Low-Resource Polysynthetic Languages [article]

Christopher Liu, Laura Dominé, Kevin Chavez, Richard Socher
2020 arXiv   pre-print
We trained a seq2seq neural machine translation model with attention to translate Yup'ik input into English.  ...  Machine translation tools do not yet exist for the Yup'ik language, a polysynthetic language spoken by around 8,000 people who live primarily in Southwest Alaska.  ...  Polysynthetic languages in particular suffer from this issue. Our primary goal was to train a neural machine translation (NMT) model to reliably translate words from Yup'ik to English.  ... 
arXiv:2009.04087v1 fatcat:nrls4oyeqffvreippn4rhct4se

A Resource for Computational Experiments on Mapudungun [article]

Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W Black
2020 arXiv   pre-print
We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.  ...  We anticipate, though, that one could significantly improve ASR quality over our dataset, by using in-domain language models, or by training end-to-end neural recognizers leveraging languages with similar  ...  As with other polysynthetic languages, Mapudungun has Noun Incorporation; however, it is unique insofar as the Noun appears to the right of the Verb, instead of to the left, as in most polysynthetic languages  ... 
arXiv:1912.01772v2 fatcat:4gauokpyobdh7pmxreriv34u5q

Improving Low-Resource Morphological Learning with Intermediate Forms from Finite State Transducers

Sarah Moeller, Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden
2019 Proceedings of the Workshop on Computational Methods for Endangered Languages  
Neural encoder-decoder models are usually applied to morphology learning as an end-to-end process without considering the underlying phonological representations that linguists posit as abstract forms  ...  This paper shows that training a bidirectional two-step encoder-decoder model of Arapaho verbs to learn two separate mappings, between tags and abstract morphemes and between morphemes and surface allomorphs, improves  ...  In polysynthetic languages such as Arapaho, inflected verbal forms are often semantically equivalent to whole sentences in morphologically simpler languages.  ... 
doi:10.33011/computel.v1i.427 fatcat:z3e3w52z35g2jdshig4kbewl7u

Tackling the Low-resource Challenge for Canonical Segmentation [article]

Manuel Mager, Özlem Çetinoğlu, Katharina Kann
2020 arXiv   pre-print
We compare model performance in a simulated low-resource setting for the high-resource languages German, English, and Indonesian to experiments on new datasets for the truly low-resource languages Popoluca  ...  However, while accuracy in emulated low-resource scenarios is over 50% for all languages, for the truly low-resource languages Popoluca and Tepehua, our best model only obtains 37.4% and 28.4% accuracy  ...  We additionally experiment with two polysynthetic low-resource languages: Tepehua and Popoluca (cf. Section 2).  ... 
arXiv:2010.02804v1 fatcat:27ljwcaqujb4dnck4udcvmecre

Subword-Level Language Identification for Intra-Word Code-Switching

Manuel Mager, Özlem Çetinoğlu, Katharina Kann
2019 Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
We further propose a model for this task, which is based on a segmental recurrent neural network.  ...  Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language  ...  Kong et al. (2016) later proposed the SegRNN model that segments and labels jointly, with successful applications on automatic glossing of polysynthetic languages (Micher, 2017, 2018).  ... 
doi:10.18653/v1/n19-1201 dblp:conf/naacl/MagerCK19 fatcat:dfknw2z7mjdwzmhh5hfofjtrsu

Subword-Level Language Identification for Intra-Word Code-Switching [article]

Manuel Mager, Özlem Çetinoğlu, Katharina Kann
2019 arXiv   pre-print
We further propose a model for this task, which is based on a segmental recurrent neural network.  ...  Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language  ...  Kong et al. (2016) later proposed the SegRNN model that segments and labels jointly, with successful applications on automatic glossing of polysynthetic languages (Micher, 2017, 2018).  ... 
arXiv:1904.01989v1 fatcat:2y3mc6ktdvg25fhdvfssoyt6na

Titelei [Front Matter]

2008 Glottotheory  
Measuring and Modeling the Complexity of Polysynthetic Language Learning: A ... 104  ...  HORAL Stanislav, KALINOVÁ Michaela: Morphological Analyser of Slovak Language ... 104  ...  ..., pragmatics, etc. on all levels of linguistic analysis; applications of methods, models or findings from quantitative linguistics concerning problems of natural language processing, language teaching  ... 
doi:10.1515/glot-2008-frontmatter1 fatcat:ca7dp6tt7rfttn4itkwosxchem

Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer

Lane Schwartz, Emily Chen, Benjamin Hunt, Sylvia LR Schreiner
2019 Proceedings of the Workshop on Computational Methods for Endangered Languages  
Morphological analysis is a critical enabling technology for polysynthetic languages. We present a neural morphological analyzer for case-inflected nouns in St.  ...  Lawrence Island Yupik, an endangered polysynthetic language in the Inuit-Yupik language family, treating morphological analysis as a recurrent neural sequence-to-sequence task.  ...  Conclusion Morphological analysis is a critical enabling technology for polysynthetic languages such as St. Lawrence Island Yupik.  ... 
doi:10.33011/computel.v1i.4277 fatcat:awpkbsdnrbgfbnbhlopa6degji

Comparing morphological complexity of Spanish, Otomi and Nahuatl [article]

Ximena Gutierrez-Vasques, Victor Mijangos
2018 arXiv   pre-print
These are languages that belong to different linguistic families; the latter two are low-resourced.  ...  We show that a language can be complex in terms of how many different morphological word forms it can produce, yet less complex in terms of the predictability of the internal structure of its words  ...  These models need a corpus as training data; they are usually based on n-grams and, more recently, on neural representations of words.  ... 
arXiv:1808.04314v1 fatcat:hrcgs25qrjdcdk7oni5ith7lnm

Page 390 of Linguistics and Language Behavior Abstracts: LLBA Vol. 29, Issue 1 [page]

1995 Linguistics and Language Behavior Abstracts: LLBA  
J., writings bibliography; 9501658 Gaelic language-music relationship, Cape Breton Island (Nova Scotia); 9501212 Greek Kalamatianos folk songs, information theory models basis, structural characteristics  ...  plot development; age advancement; plot unit graph generator computer program; children from kindergarten to 5th grade; longitudinal study; 9500703 Nasalization automatic consonant classification, neural  ... 
Showing results 1 – 15 out of 164 results