
A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger

GÉRARD HUET
2005 Journal of functional programming  
We present the Zen toolkit for morphological and phonological processing of natural languages.  ...  A coroutine interpreter is given, and its correctness and completeness are formally proved. An application to the segmentation of Sanskrit by sandhi analysis is demonstrated.  ...  We shall describe here the Zen toolkit for lexical, morphological and phonological processing, as the first layer in a generic computational linguistics platform in ML.  ... 
doi:10.1017/s0956796804005416 fatcat:3ry3bjk3jbhtjf2ew5by6xudeq

Zen and the Art of Symbolic Computing: Light and Fast Applicative Algorithms for Computational Linguistics [chapter]

Gérard Huet
2002 Lecture Notes in Computer Science  
The talk reports on experiments in using declarative programming for the processing of the Sanskrit language, in its phonological and morphological aspects.  ...  A lexicon-based morphological tagger has been designed, using an original algorithm for the analysis of euphony (the so-called sandhi process, which glues together the words of a sentence in a continuous  ... 
doi:10.1007/3-540-36388-2_3 fatcat:b3e2pe3krbfadcqpreytu7zdse

Shallow syntax analysis in Sanskrit guided by semantic nets constraints

Gérard Huet
2007 Proceedings of the 2006 international workshop on Research issues in digital libraries - IWRIDL '06  
It relies on the Zen toolkit for finite state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite state machines, in a functional  ...  The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database.  ...  Concrete applications (notably to text processing tools such as spell and syntax correction, to tools for search in textual databases such as Web search engines, and in a longer term perspective to inter-lingual  ... 
doi:10.1145/1364742.1364750 dblp:conf/iwridl/Huet06 fatcat:3lmv3v5mhfc5pdvjrwiqwjvw74

Urdu Morphology, Orthography and Lexicon Extraction [article]

Muhammad Humayoun and Harald Hammarström and Aarne Ranta
2022 arXiv   pre-print
The morphology is implemented in a toolkit called Functional Morphology (Forsberg & Ranta, 2004), which is based on the idea of dealing grammars as software libraries.  ...  Therefore this implementation could be reused in applications such as intelligent search of keywords, language training and infrastructure for syntax.  ...  Functional Morphology Toolkit FM is a toolkit for morphology development in Haskell (Forsberg & Ranta, 2004).  ... 
arXiv:2204.03071v1 fatcat:vpjab6wma5d5rkpp57y5l3dzhy

The Reactive Engine for Modular Transducers [chapter]

Gérard Huet, Benoît Razet
2006 Lecture Notes in Computer Science  
Such additional choice points require fitting some additional control to the reactive engine. Further parameters are required for some functionalities.  ...  It is an abstraction from a modular version of the Sanskrit segmenter presented in [9].  ...  For the Sanskrit platform built by the first author, this allows to build a tagger composing machines which invert phonology (sandhi analysis) and morphology, with separate machines for distinct lexical  ... 
doi:10.1007/11780274_19 fatcat:wlumqjkdxngsvlr3nctsbanfey

Linguistic Resources for Bhojpuri, Magahi and Maithili: Statistics about them, their Similarity Estimates, and Baselines for Three Applications [article]

Rajesh Kumar Mundotiya, Manish Kumar Singh, Rahul Kapur, Swasti Mishra, Anil Kumar Singh
2021 arXiv   pre-print
Corpus preparation for low-resource languages and for development of human language technology to analyze or computationally process them is a laborious task, primarily due to the unavailability of expert  ...  The basic statistical measures were both absolute and relative and were expected to be indicative of linguistic properties such as morphological, lexical, phonological, and syntactic complexities (or richness  ...  The same also applies to phonological complexities for these languages. In fact, it is difficult to separate phonological complexity from morphological complexity solely based on these statistics.  ... 
arXiv:2004.13945v2 fatcat:gjtvhkukunb7xcybh3akvfkvhm

An Information-Extraction System for Urdu---A Resource-Poor Language

Smruthi Mukund, Rohini Srihari, Erik Peterson
2010 ACM Transactions on Asian Language Information Processing  
NLP systems begin with modules such as word segmentation, part-of-speech tagging, and morphological analysis and progress to modules such as shallow parsing and named entity tagging.  ...  Each of the new Urdu text processing modules has been integrated into a general text-mining platform.  ...  to Hindi as it shares its phonological, morphological, and syntactic structure with Hindi.  ... 
doi:10.1145/1838751.1838754 fatcat:ibmmwalmtfbfdpjufxccwolzgq

Bangla Natural Language Processing: A Comprehensive Review of Classical, Machine Learning, and Deep Learning Based Methods [article]

Ovishake Sen, Mohtasim Fuad, MD. Nazrul Islam, Jakaria Rabbi, MD. Kamrul Hasan, Mohammed Baz, Mehedi Masud, Md. Abdul Awal, Awal Ahmed Fime, Md. Tahmid Hasan Fuad, Delowar Sikder, MD. Akil Raihan Iftee
2021 arXiv   pre-print
To bridge the gap between limited support and increasing demand, researchers conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials.  ...  There are some review papers to understand the past, previous, and future Bangla Natural Language Processing (BNLP) trends.  ...  Acknowledgment: The authors would like to thank for the support from Taif University Researchers Supporting Project number (TURSP-2020/239), Taif University, Taif, Saudi Arabia.  ... 
arXiv:2105.14875v2 fatcat:kvqmgxpthvh2fj7jza64n6kaiq

Bangla Natural Language Processing: A Comprehensive Analysis of Classical, Machine Learning, and Deep Learning Based Methods

Ovishake Sen, Mohtasim Fuad, Md. Nazrul Islam, Jakaria Rabbi, Mehedi Masud, Md. Kamrul Hasan, Md. Abdul Awal, Awal Ahmed Fime, Md. Tahmid Hasan Fuad, Delowar Sikder, Md. Akil Raihan Iftee
2022 IEEE Access  
To bridge the gap between limited support and increasing demand, researchers conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials.  ...  There are some review papers to understand the past, previous, and future Bangla Natural Language Processing (BNLP) trends.  ...  Moreover, Bangla's lexis consists of native Bangla words and borrowings from Sanskrit along with other neighbouring languages. Bangla is a morphologically rich language.  ... 
doi:10.1109/access.2022.3165563 fatcat:rmersduz6vbyjjczvobrebskmi

Reduplicated MWE (RMWE) Helps In Improving The CRF Based Manipuri POS Tagger

Kishorjit Nongmeikapam
2012 International Journal of Advanced Information Technology  
With the identification of RMWE and considering it as a feature makes an improvement to a Recall of 80.20%, Precision of 74.31% and F-measure of 77.14%.  ...  The new CRF system shows a Recall of 78.22%, Precision of 73.15% and F-measure of 75.60%.  ...  Complete Reduplication MWEs In the complete reduplication MWEs the single word or clause is repeated once forming a single unit regardless of phonological or morphological variations.  ... 
doi:10.5121/ijitcs.2012.2106 fatcat:g6gza7n7hneihmadgm7dxot2nu

English-Bhojpuri SMT System: Insights from the Karaka Model [article]

Atul Kr. Ojha
2019 arXiv   pre-print
It also presents a brief idea of the implementation of these models in the SMT system for English-Bhojpuri language pair.  ...  This thesis has been divided into six chapters namely: Introduction, Karaka Model and its impacts on Dependency Parsing, LT Resources for Bhojpuri, English-Bhojpuri SMT System: Experiment, Evaluation of  ...  ACKNOWLEDGEMENTS This thesis is a fruit of love and labour possible with the contributions made by many people, directly and indirectly. I would like to express my gratitude to all of them.  ... 
arXiv:1905.02239v1 fatcat:yqkyuzss3bfg3ddpswpfec3qg4

Statistical Parsing by Machine Learning from a Classical Arabic Treebank [article]

Kais Dukes
2015 arXiv   pre-print
Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order.  ...  A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic.  ...  The question that divides us is whether it is crazy enough to have a chance of being correct. -Niels Bohr  ... 
arXiv:1510.07193v1 fatcat:mkx5gtgehrgfjjy5xmwowxgasq

Role of Morphology Injection in SMT

Sreelekha S, Pushpak Bhattacharyya
2017 ACM Transactions on Asian and Low-Resource Language Information Processing  
SMT approaches face the problem of data sparsity while translating into a morphologically rich language. It is very unlikely for a parallel corpus to contain all morphological forms of words.  ...  We propose a solution to generate these unseen morphological forms and inject them into original training corpora.  ...  ACKNOWLEDGMENTS The authors would like to thank Department of Science and Technology, Govt. of India for the funding under Women Scientist Scheme-WOS-A with the project code-SR/WOS-A/ET-1075/2014.  ... 
doi:10.1145/3129208 fatcat:yrwrlxqjbrcnzluhws2hgyz7eq

An Experiment with the CRF++ Parts of Speech (POS) Tagger for Odia (Language in India, www.languageinindia)

Pitambar Behera, M Phil
2017 unpublished
This research work presents a probability-based CRF++ parts of speech (POS) tagger for Odia language.  ...  Indian languages have always been quite challenging for both linguistics and NLP owing to the fact that they are diverse and multiple in nature and morphologically richer; including some other unique features  ...  POS Tagger for Sanskrit Chandrashekhar 1 (2002-2007) has developed a POS tagger for Sanskrit language with the application of a rule-based method as part of his doctoral research.  ... 
fatcat:luurpce3pngxngfxgyrz2lap3m

Plagiarism Detection for Indonesian Texts

Lucia D. Krisnawati, Klaus U. Schulz
2013 Proceedings of International Conference on Information Integration and Web-based Applications & Services - IIWAS '13  
Hardware used for experiments was provided by Center for Information and Language Processing, Ludwig-Maximilian University, Munich, Germany.  ...  Titien Saraswati for her generosity for consenting her articles to be included in our evaluation corpus. I also thank my mother, brothers and sister for their moral and spiritual support.  ...  Unlike the other systems, a PD system proposed by Adam and Suharjito tries to incorporate shallow NLP techniques by using POS tagger of Stanford NLP toolkits [2].  ... 
doi:10.1145/2539150.2539213 dblp:conf/iiwas/KrisnawatiS13 fatcat:r6p2h4oiq5fi3mhlazokatknrq