1,484 Hits in 5.8 sec

Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification

Kelsey Ball, Dan Garrette
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
Experiments on Hindi-English part-of-speech tagging demonstrate that our approach outperforms standard models when training on monolingual text without transliteration, and testing on code-switched text  ...  Code-switching, the use of more than one language within a single utterance, is ubiquitous in much of the world, but remains a challenge for NLP largely due to the lack of representative data for training  ...  Acknowledgements This work was supported by a Fulbright-Nehru grant from the United States-India Educational Foundation (USIEF) for the first author.  ... 
doi:10.18653/v1/d18-1347 dblp:conf/emnlp/BallG18 fatcat:3qfkrhis6zdkzl2d2abkyyfbr4

Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data [article]

Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma
2017 arXiv   pre-print
Besides, we also present a data set of 450 Hindi and English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.  ...  In this paper, we propose efficient and less resource-intensive strategies for parsing of code-mixed data.  ...  For an intrinsic evaluation of our parsing models on code-mixed texts, we manually annotated a data set of Hindi-English code-mixed tweets with dependency structures.  ... 
arXiv:1703.10772v1 fatcat:t27nwvoxbfeodbjc5pnh77rhmy

Machine Learning Techniques for Sentiment Analysis of Code-Mixed and Switched Indian Social Media Text Corpus - A Comprehensive Review

Gazi Imtiyaz Ahmad, Jimmy Singla, Anis Ali, Aijaz Ahmad Reshi, Anas A. Salameh
2022 International Journal of Advanced Computer Science and Applications  
In multilingual countries, people express their views using English as well as their native languages. Several reasons can be attributed to code-mixing.  ...  A comprehensive review of sentiment analysis for code-mixed and switched text corpus of Indian social media using machine learning (ML) approaches, based on recent research studies has been presented in  ...  An approach for three code-mixed Indian language texts in language pairs (Hindi-English, Hindi-Bengali and Hindi-Telugu) POS tagging was presented by [32] .  ... 
doi:10.14569/ijacsa.2022.0130254 fatcat:43ub7ku5xjeqvcjkpxfutpqgqi

Mixed Script Identification Using Automated DNN Hyperparameter Optimization

Muhammad Yasir, Li Chen, Amna Khatoon, Muhammad Amir Malik, Fazeel Abid, Ahmed Mostafa Khalil
2021 Computational Intelligence and Neuroscience  
This study tackles the challenge of mixed script identification for a mixed-code dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English.  ...  Mixed script identification is a hindrance for automated natural language processing systems.  ...  The work on language identification in Turkish [19], Maltese-English [20], Romanized Arabic Moroccan (Darija), French-English [21]  ...  the code-mixed text using CFE is a novel approach for word-  ...  current  ... 
doi:10.1155/2021/8415333 pmid:34925496 pmcid:PMC8683192 fatcat:6ilsi3lsmjclzjjbgby7wuioui

Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

Irshad Bhat, Riyaz A. Bhat, Manish Shrivastava, Dipti Sharma
2017 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers  
Besides, we also present a data set of 450 Hindi and English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.  ...  In this paper, we propose efficient and less resource-intensive strategies for parsing of code-mixed data.  ...  For an intrinsic evaluation of our parsing models on code-mixed texts, we manually annotated a data set of Hindi-English code-mixed tweets with dependency structures.  ... 
doi:10.18653/v1/e17-2052 dblp:conf/eacl/BhatSBS17 fatcat:ozvhhxtfefhrlfip3xtlpuwtdi

Language Lexicons for Hindi-English Multilingual Text Processing [article]

Mohd Zeeshan Ansari, Tanvir Ahmad, Noaima Bari
2021 arXiv   pre-print
Due to the unavailability of large standard corpora for Hindi-English mixed lingual language processing tasks we propose the language lexicons, a novel kind of lexical database that supports several multilingual  ...  The present Language Identification techniques presume that a document contains text in one of the fixed set of languages, however, this presumption is incorrect when dealing with multilingual document  ...  Many researchers have looked at the detection of code-mixing in text [23] .  ... 
arXiv:2106.15105v1 fatcat:6xqpid3wajgptlkowayhdeorza

Language lexicons for Hindi-English multilingual text processing

Mohd Zeeshan Ansari, Tanvir Ahmad, Mirza Mohd Sufyan Beg, Noaima Bari
2022 IAES International Journal of Artificial Intelligence (IJ-AI)  
Due to the unavailability of standard corpora for Hindi-English mixed lingual language processing tasks, we propose the language lexicons, a novel kind of lexical database that augments several bilingual  ...  The present language identification techniques presume that a document contains text in one of the fixed set of languages.  ...  Many researchers have looked at the detection of code-mixing in text [22] .  ... 
doi:10.11591/ijai.v11.i2.pp641-648 fatcat:qfze4yu6q5e67fch5p3tintxby

Code Mixed Entity Extraction in Indian Languages using Neural Networks

Irshad Ahmad Bhat, Manish Shrivastava, Riyaz Ahmad Bhat
2016 Forum for Information Retrieval Evaluation  
We describe a Neural Network system for Entity Extraction in Hindi-English Code Mixed text.  ...  In this paper we present our submission for FIRE 2016 Shared Task on Code Mixed Entity Extraction in Indian Languages.  ...  DATA The Entity Extraction in the Code-Mixed (CM) data in Indian Languages shared task is meant for NER in 2 language pairs namely, Hindi-English (H-E) and Telugu-English (T-E).  ... 
dblp:conf/fire/BhatSB16 fatcat:ibkn7rxjhrgjdigxiawf3qbb5u

Character Embedding for Language Identification in Hindi-English Code-mixed Social Media Text

P. V. Veena, M. Anand Kumar, K. P. Soman
2018 Journal of Computacion y Sistemas  
In code-mixed data, one language is often written in another language's script. To process such code-mixed text, identifying the language of each word is important.  ...  The language used by the users in social media earlier was purely English. Code-mixed text, i.e., mixing of two or more languages, is commonly seen now.  ...  The language identification for code-mixed text proposed in this paper is implemented using word embedding models.  ... 
doi:10.13053/cys-22-1-2775 fatcat:zeod5ue6onef5fbxqzy7f62rn4

Deep Learning-Based Language Identification in English-Hindi-Bengali Code-Mixed Social Media Corpora

Anupam Jamatia, Amitava Das, Björn Gambäck
2018 Journal of Intelligent Systems  
This article addresses language identification at the word level in Indian social media corpora taken from Facebook, Twitter and WhatsApp posts that exhibit code-mixing between English-Hindi, English-Bengali  ...  The coarse nature of code-mixed social media text makes language identification challenging.  ...  , the first Indian code-mixing social media text corpus (Bengali-Hindi-English) was reported by Das and Gambäck [10] in the context of language identification at the word level.  ... 
doi:10.1515/jisys-2017-0440 fatcat:o6yxge7rzne3lcrgtm7hedrsyi

Universal Dependency Parsing for Hindi-English Code-switching [article]

Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma
2018 arXiv   pre-print
tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.  ...  In particular, we study dependency parsing of code-switching data of Hindi and English multilingual speakers from Twitter.  ...  Figure 5: Code-switching tweet showing grammatical fragments from Hindi and English.  ... 
arXiv:1804.05868v3 fatcat:xsmebj2icjfevmp2gdqtos6yqe

IRLab@IITV@Dravidian-CodeMix-FIRE2020: Sentiment Analysis on Multilingual Code Mixing Text Using BERT-BASE

Anita Saroj, Sukomal Pal
2020 Forum for Information Retrieval Evaluation  
We used the BERT_BASE model for sentiment classification of Dravidian-CodeMix data, and for the HASOC task our team submitted systems for both sub-tasks in three languages: Hindi, English, and German  ...  This paper discusses our participation in the "Sentiment Analysis in Dravidian-CodeMix" (Dravidian-CodeMix) and "Hate Speech and Offensive Content Identification in Indo-European Languages" (HASOC) tasks at FIRE 2020  ...  Writing in a mixed language like Hindi-English, English-Tamil, English-Spanish, English-Malayalam, English-Chinese, etc. is also quite common.  ... 
dblp:conf/fire/SarojP20 fatcat:wx5nyd7d55d3heyun3kkolfxu4

CEN@Amrita FIRE 2016: Context based Character Embeddings for Entity Extraction in Code-Mixed Text

Srinidhi Skanda V, Shivkaran Singh, Remmiya Devi G, Veena P. V, M. Anand Kumar, Soman K. P
2016 Forum for Information Retrieval Evaluation  
The code-mixed tweets are written in English mixed with Hindi or Tamil. In this work, an entity extraction system is implemented for both Hindi-English and Tamil-English code-mixed tweets.  ...  This paper presents the working methodology and results on the Code Mix Entity Extraction in Indian Languages (CMEE-IL) shared task of FIRE-2016.  ...  ACKNOWLEDGMENT We would like to thank the task organizer, the Forum for Information Retrieval Evaluation. We also thank the organizers of the CMEE-IL task.  ... 
dblp:conf/fire/VSGVMP16 fatcat:gdjynp5l5bf3hbl7jrtm2bsua4

niksss at HinglishEval: Language-agnostic BERT-based Contextual Embeddings with Catboost for Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text [article]

Nikhil Singh
2022 arXiv   pre-print
The goal of this task was to investigate the factors influencing the quality of the code-mixed text generation system.  ...  We attempted to solve these tasks using sentence-level embeddings, which are obtained from mean pooling the contextualized word embeddings for all input tokens in our text.  ...  2020) and language identification (Molina et al., 2016) are covered for Code-Mixed textual data.  ... 
arXiv:2206.08910v1 fatcat:fxh7ty6furbq7elenla2jzor5m

A Simple and Efficient Probabilistic Language model for Code-Mixed Text [article]

M Zeeshan Ansari, Tanvir Ahmad, M M Sufyan Beg, Asma Ikram
2021 arXiv   pre-print
We present a simple probabilistic approach for building efficient word embeddings for code-mixed text and exemplify it on language identification of Hindi-English short text messages scraped from  ...  The problem is often more challenging in code-mixed documents wherein foreign-language words are drawn into the base language while framing the text.  ...  We perform an empirical evaluation of PMI based embeddings derived from code-mixed text.  ... 
arXiv:2106.15102v1 fatcat:gdfmqaqim5eb7iaeb3mblcrhye