Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language
2019
Informatics
In this paper, semantic similarity is explored in Bangla, a less resourced language. ...
Semantic similarity is a long-standing problem in natural language processing (NLP). ...
The funding agency had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. ...
doi:10.3390/informatics6020019
fatcat:pp42ofxrlfaztaqli7zcqicura
Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
[article]
2019
arXiv
pre-print
Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular, for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel ...
Our work can be seen as an important step in the process of: (i) resolving the OOV words problem arising in MT tasks, (ii) creating effective parallel corpora for resource-constrained languages, and (iii ...
In the case of Low Resource Languages (LRL), which lack linguistic resources such as parallel corpora, the problem promptly comes within sight, with most words being OOV words. ...
arXiv:1811.08816v2
fatcat:h2ixdn7lyngqzfwtqipywwj2dy
Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
2019
Journal of Language Modelling
For example, some other sub-languages like Rajasthani, Maithili and Magahi are also often included in the Hindi spectrum. ...
However, the usual meaning of the word 'Hindi' in literature refers to standard Hindi, whose base is Khari Boli and which is an official language of India. ...
Replacing the translation of OOV words with that of their transductions leads to an improvement of 6.3 points in the BLEU score, which is substantial considering that we are translating to a low-resource ...
doi:10.15398/jlm.v7i2.214
fatcat:dztyzz3iizf3vo4zktx7yj33s4
Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study
[article]
2021
arXiv
pre-print
However, incorporating a new language in an LM still remains a challenge, particularly for languages with limited corpora and in unseen scripts. ...
This holds promise for low web-resource languages (LRL) as multilingual models can enable transfer of supervision from high resource languages to LRLs. ...
this study. ...
arXiv:2106.03958v2
fatcat:rkc22nqzhfdp7p4j3kdpiulsdy
DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language
[article]
2021
arXiv
pre-print
In this paper, we propose an explainable approach for hate speech detection from the under-resourced Bengali language, which we called DeepHateExplainer. ...
However, some languages are under-resourced, e.g., South Asian languages like Bengali, that lack computational resources for accurate natural language processing (NLP). ...
XLM-RoBERTa not only outperformed other transformer models on cross-lingual benchmarks but also performed better on various NLP tasks in a low-resourced language setting. ...
arXiv:2012.14353v4
fatcat:xpwnvfh2bnh2xbiewqzrrys2cu
Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT
2022
ACM Transactions on Asian and Low-Resource Language Information Processing
problem and release six variations of pre-trained language models for use in any Bangla NLP downstream task. ...
Despite significant advancements in other languages such as English, Spanish, and Chinese, Bangla lacks comprehensive research in this field due to its complex linguistic features and sentence structure ...
An in-depth analysis of the errors can help reveal more about what the models learn in terms of understanding the structure and semantics of a language which remains a scope for future study. ...
doi:10.1145/3530691
fatcat:dpvjpaiurzcudcvovifmwwyh7a
Handwriting Recognition in Low-resource Scripts using Adversarial Learning
[article]
2019
arXiv
pre-print
low-resource scripts. ...
We record results for varying training data sizes, and observe that our enhanced network generalizes much better in the low-data regime; the overall word-error rates and mAP scores are observed to improve ...
[3] proposed a cross-lingual framework for Indic scripts where training is performed using a script that is abundantly available and testing is done on the low-resource script using character-mapping ...
arXiv:1811.01396v5
fatcat:xp3emb4whrh7jasluh3wv3ffce
Linguistic Resources for Bhojpuri, Magahi and Maithili: Statistics about them, their Similarity Estimates, and Baselines for Three Applications
[article]
2021
arXiv
pre-print
They are closely related to Hindi, which is a relatively high-resource language, which is why we compare with Hindi. ...
Bhojpuri, Magahi, and Maithili, languages of the Purvanchal region of India (in the north-eastern parts), are low-resource languages belonging to the Indo-Aryan (or Indic) family. ...
For example, when the language model is trained as well as tested on Bhojpuri, it gives a cross-lingual similarity of 1. ...
arXiv:2004.13945v2
fatcat:gjtvhkukunb7xcybh3akvfkvhm
IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation
[article]
2021
arXiv
pre-print
Unfortunately, the lack of publicly available NLG benchmarks for low-resource languages poses a challenging barrier for building NLG systems that work well for languages with limited amounts of data. ...
Here we introduce IndoNLG, the first benchmark to measure natural language generation (NLG) progress in three low-resource -- yet widely spoken -- languages of Indonesia: Indonesian, Javanese, and Sundanese ...
The model is first pretrained with denoising in 25 languages using a masked language modelling framework, and then fine-tuned on another 25 languages covering low and medium-resource languages, including ...
arXiv:2104.08200v3
fatcat:txxm4lltvvhp3dwhxmejj7yjaq
Bangla Text Classification using Transformers
[article]
2020
arXiv
pre-print
Models designed with this type of network and its variants recently showed their success in many downstream natural language processing tasks, especially for resource-rich languages, e.g., English. ...
In this work, we fine-tune multilingual transformer models for Bangla text classification tasks in different domains, including sentiment analysis, emotion detection, news categorization, and authorship ...
Word counts are also sampled in a similar manner so that low resource languages have sufficient words in the vocabulary. b) XLM-RoBERTa: RoBERTa [14] improves upon BERT by training on larger datasets ...
arXiv:2011.04446v1
fatcat:2l7qbtqntvcd3mbzo3njds2gde
CogNet: A Large-Scale Cognate Database
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
This paper introduces CogNet, a new, large-scale lexical database that provides cognates (words of common origin and meaning) across languages. ...
Finally, statistics and early insights about the cognate data are presented, hinting at a possible future exploitation of the resource by various fields of linguistics. ...
Acknowledgments: This paper was partly supported by the InteropEHRate project, co-funded by the European Union (EU) Horizon 2020 programme under grant number 826106. ...
doi:10.18653/v1/p19-1302
dblp:conf/acl/BatsurenBG19
fatcat:7wudx56nt5dk5kqxqufod7cvz4
A comprehensive survey on cross-language information retrieval system
2019
Indonesian Journal of Electrical Engineering and Computer Science
Cross language information retrieval (CLIR) is a retrieval process in which the user fires queries in one language to retrieve information from another (different) language. ...
This study is aimed at building an experimental CLIR system between one of the under-resourced languages (i.e., Odia) and one of the most commonly used online languages (i.e., English) in the future. ...
Cross-lingual information retrieval: Cross-Language Information Retrieval is quickly becoming a mature area in the information retrieval world. ...
doi:10.11591/ijeecs.v14.i1.pp127-134
fatcat:bg3kk7o5sbcrbbxklsjbfw7aue
Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure
2014
International Journal of Computer Applications
In this paper, we have proposed a hierarchically organized semantic lexicon in Bangla and also a graph-based edge-weighting approach to measure semantic similarity between two Bangla words. ...
As we have earlier discussed, this lexicon can be used in various applications like categorization, semantic web, and natural language processing applications like document clustering, word sense disambiguation ...
Therefore, it will be a useful resource and tool to other psycholinguistic and NLP studies in Bangla. ...
doi:10.5120/16588-6297
fatcat:hyvxlze4vzdlbmnhcstyepyw2i
Deep learning based question answering system in Bengali
2020
Journal of Information and Telecommunication
Recent advances in the field of natural language processing have improved state-of-the-art performance on many tasks, including question answering for languages like English. ...
Finally, we compare our models with human children to set up a benchmark score using survey experiments. ...
Rahman is working as a Professor in Electrical and Computer Engineering Department, North South University, Dhaka, Bangladesh. ...
doi:10.1080/24751839.2020.1833136
fatcat:ltwrsufie5hrrezjtv2tu56fjy
BANNER: A Cost-Sensitive Contextualized Model For Bangla Named Entity Recognition
2020
IEEE Access
Many architectures have produced good results on high-resource languages like English and Chinese. However, the NER task has not yet achieved much progress for Bangla, a low-resource language. ...
Named Entity Recognition (NER) is a task in Natural Language Processing (NLP) that aims to classify words into a predetermined list of Named Entities (NE). ...
For low resource languages like Bangla, for which there is a dearth of large annotated datasets, this naive approach is limited in its applicability. ...
doi:10.1109/access.2020.2982427
fatcat:ujdbt3urh5gzrkmo4yc66oputu