Linguistic Resources for Bhojpuri, Magahi and Maithili: Statistics about them, their Similarity Estimates, and Baselines for Three Applications [article]

Rajesh Kumar Mundotiya, Manish Kumar Singh, Rahul Kapur, Swasti Mishra, Anil Kumar Singh
2021 arXiv   pre-print
The POS tagged data sizes are 16067, 14669 and 12310 sentences, respectively, for Bhojpuri, Magahi and Maithili.  ...  The sizes for chunking are 9695 and 1954 sentences for Bhojpuri and Maithili, respectively.  ...  highest for Maithili among the three Purvanchal languages and lower for Bhojpuri and Magahi.  ... 
arXiv:2004.13945v2 fatcat:gjtvhkukunb7xcybh3akvfkvhm

Development of a Dataset and a Deep Learning Baseline Named Entity Recognizer for Three Low Resource Languages: Bhojpuri, Maithili and Magahi [article]

Rajesh Kumar Mundotiya, Shantanu Kumar, Ajeet Kumar, Umesh Chandra Chaudhary, Supriya Chauhan, Swasti Mishra, Praveen Gatla, Anil Kumar Singh
2020 arXiv   pre-print
Bhojpuri, Maithili and Magahi are low resource languages, usually known as Purvanchal languages.  ...  The lower baseline F1-scores from the NER tool obtained by using Conditional Random Fields models are 96.73 for Bhojpuri, 93.33 for Maithili and 95.04 for Magahi.  ...  Acknowledgements: We would like to thank our NER annotators Ajeet Kumar and Umesh Kumar  ... 
arXiv:2009.06451v1 fatcat:uc7jrltc7ja6dnknqafztjrkhi

Building Machine Translation Systems for the Next Thousand Languages [article]

Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod (+12 others)
2022 arXiv   pre-print
We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven  ...  and monolingual datasets for an additional 1000+ languages; and (iii) Studying the limitations of evaluation metrics for these languages and conducting qualitative analysis of the outputs from our MT  ...  Table 2 shows statistics for these three datasets: the full low-resource dataset ("LRL-full"), the portion of the full low-resource dataset used for model training ("LRL-train"), and the full training  ... 
arXiv:2205.03983v3 fatcat:65fva7qvpbaapemrrmovgkpac4

No Language Left Behind: Scaling Human-Centered Machine Translation [article]

NLLB team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun (+27 others)
2022 arXiv   pre-print
More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource  ...  Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages.  ...  NLLB-MD, Toxicity-200, performing our human evaluations, and teaching us about their native languages.  ... 
arXiv:2207.04672v1 fatcat:qa3wmryp4ndo5bpiuwq22zeoqa

LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow

M Thirumalai, B Mallikarjun, Sam, B A Sharada, A R Fatihi, Lakhan, Marie Jennifer, S M Bayer, G Ravichandran, L Baskaran, Ramamoorthy, Swarna Assistant (+1 others)
2013 unpublished
We can argue that this is because of the diversity of the tribes in the time of Islam's advent and the dialectical differences between them. And also translating onomatopoeia is hard work.  ...  Qur'an is important for Muslims, because as per Islam it is the holy book of Islam religion and Allah's words revealed to prophet Muhammad (PBUH) through the Angel Gabriel (Jibril).  ...  For my supportive and inspiring wife, Amandy M.  ... 

Politics of mass literacy in India : A case study of two North Indian villages under the 'Total Literacy Campaign', 1988-95

Ajay Kumar
It seeks to study the cultural and linguistic bases of mass literacy and the democratic i.e. participatory and interactive/discursive methods of literacy promotion.  ...  To counter its past failures, it has launched a 'total campaign' approach in adult literacy programme along with 'Education For All' (EFA) goal in general towards elementary education.  ...  The masses on the other hand used other popular speech varieties such as the Maithili, Magahi, Angika, Bhojpuri, Santhali or other tribal languages.  ... 
doi:10.25501/soas.00028640 fatcat:ldzspw2lbnhbvimv2d2zq5as6i