11,206 Hits in 7.5 sec

Named Entity Recognition on Code-Mixed Cross-Script Social Media Content

Somnath Banerjee, Sudip Kumar Naskar, Paolo Rosso, Sivaji Bandyopadhyay
2018 Journal of Computación y Sistemas
Focusing on the current multilingual scenario in social media, this paper reports automatic extraction of named entities (NE) from code-mixed cross-script social media data.  ...  This paper also introduces a Bengali-English (Bn-En) code-mixed cross-script dataset for NE research and proposes domain-specific taxonomies for NE.  ...  Named entity recognition from scratch on social media. Proceedings of 6th International Workshop on Mining Ubiquitous and Social Environments (MUSE), co-located with the ECML PKDD.  ... 
doi:10.13053/cys-21-4-2850 fatcat:kgqlbblju5gehhg56hpns3eure

Cross Script Hindi English NER Corpus from Wikipedia [article]

Mohd Zeeshan Ansari, Tanvir Ahmad, Md Arshad Ali
2018 arXiv pre-print
The development of mixed-lingual Indian Named Entity Recognition (NER) systems is facing obstacles due to the unavailability of standard evaluation corpora.  ...  The text generated on social media platforms is essentially mixed-lingual text. The mixing of languages in any form produces a considerable amount of difficulty in language processing systems.  ...  Except for the Bengali-English Code Mixed Cross Script Named Entity corpus, which is extracted from social media content [10], to the best of our knowledge, no Hindi-English cross script automatically built  ... 
arXiv:1810.03430v1 fatcat:zvq6trwx3vdc5iarfhw54m5gzm

Modeling Classifier for Code Mixed Cross Script Questions

Rupal Bhargava, Shubham Khandelwal, Akshit Bhatia, Yashvardhan Sharma
2016 Forum for Information Retrieval Evaluation  
Focusing on this current multilingual scenario, code-mixed cross-script (i.e., non-native script) data gives rise to a new problem and presents serious challenges to automatic Question Answering (QA) and  ...  With the boom in the internet, social media text has been increasing day by day, and user-generated content (such as tweets and blogs) in Indian languages is written using Roman script due to various  ...  RELATED WORK Today social media platforms are flooded with millions of posts every day on various topics, resulting in code mixing in multilingual countries like India.  ... 
dblp:conf/fire/BhargavaKBS16 fatcat:tm56fwro3jaybmas35d2i3iafe

Language Identification and Named Entity Recognition in Hinglish Code Mixed Tweets

Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
2018 Proceedings of ACL 2018, Student Research Workshop  
While growing code-mixed content on Online Social Networks (OSNs) provides a fertile ground for studying various aspects of code-mixing, the lack of automated text analysis tools renders such studies challenging  ...  Named Entity Recognition (NER) is an important text analysis task which is not only informative by itself, but is also needed for downstream NLP tasks such as semantic role labeling.  ...  We particularly focus on efforts to build NERs for social media content, and NERs for Indian languages and code-mixed corpora.  ... 
doi:10.18653/v1/p18-3008 dblp:conf/acl/SinghSK18 fatcat:4priedub3zcu3kjuielspvtfwi

AMRITA_CEN@FIRE 2016: Code-Mix Entity Extraction for Hindi-English and Tamil-English Tweets

Remmiya Devi G, Veena P. V, M. Anand Kumar, Soman K. P
2016 Forum for Information Retrieval Evaluation  
The work is submitted as a part of the Shared Task on Code Mix Entity Extraction for Indian Languages (CMEE-IL) at the Forum for Information Retrieval Evaluation (FIRE) 2016.  ...  Social media text holds information regarding various important aspects.  ...  Extracting such informative content from an unorganized text format is the most challenging task. In our task we deal with social media text, specifically a code-mixed Twitter dataset.  ... 
dblp:conf/fire/GVMP16 fatcat:gccds65b3jc5hfswpbhp4pj7ru

Code mixed cross script factoid question classification - A deep learning approach

Somnath Banerjee, Sudip Naskar, Paolo Rosso, Sivaji Bandyopadhyay, David Pinto, Vivek Kumar Singh, Aline Villavicencio, Philipp Mayr-Schlegel, Efstathios Stamatatos
2018 Journal of Intelligent & Fuzzy Systems  
Recent trends in social media usage have led to a proliferation of studies on social media content.  ...  Multilingual social media users often write native language content in non-native script (cross-script).  ...  In this work, we deal with Bengali-English code-mixed cross-script content.  ... 
doi:10.3233/jifs-169481 fatcat:c4nvgunbqvefhjqtufzh6stpja

Named Entity Recognition for Hindi-English Code-Mixed Social Media Text

Vinay Singh, Deepanshu Vijay, Syed Sarfaraz Akhtar, Manish Shrivastava
2018 Proceedings of the Seventh Named Entities Workshop  
Named Entity Recognition (NER) is a major task in the field of Natural Language Processing (NLP), and is also a subtask of Information Extraction.  ...  In this paper, we present a corpus for NER in Hindi-English code-mixed text along with extensive experiments on our machine learning models, which achieved the best F1-score of 0.95 with both CRF and LSTM.  ...  In Section 2, we review related research in the area of Named Entity Extraction on code-mixed social media texts. In Section 3, we describe the corpus creation and annotation scheme.  ... 
doi:10.18653/v1/w18-2405 dblp:conf/aclnews/SinghVAS18 fatcat:k3zbazbzwnb5fntq3st2dbmoju

Hierarchical classification for Multilingual Language Identification and Named Entity Recognition

Saatvik Shah, Vaibhav Jain, Sarthak Jain, Anshul Mittal, Jatin Verma, Shubham Tripathi, Rajesh Kumar
2015 Forum for Information Retrieval Evaluation  
The subtask involved multilingual language identification (including mixed words and anomalous foreign words), named entity recognition (NER) and subclassification.  ...  This paper describes the approach for Subtask-1 of the FIRE-2015 Shared Task on Mixed Script Information Retrieval.  ...  This research paper addresses language identification (LI) and Named Entity Recognition (NER) for text in social media.  ... 
dblp:conf/fire/ShahJJMVTK15 fatcat:ea2rq65ggfcmzdcy3fcpgim3qm

A Survey of Code-switched Speech and Language Processing [article]

Sunayana Sitaram, Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Alan W Black
2020 arXiv pre-print
As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for.  ...  This survey reviews computational approaches for code-switched Speech and Natural Language Processing.  ...  To this end, they have col[...]  ...  Named Entity Recognition. Another sequence labeling task of interest is Named Entity Recognition (NER).  ... 
arXiv:1904.00784v3 fatcat:r5tsg4kdnfbtnndae523c32pta

Machine Learning Techniques for Sentiment Analysis of Code-Mixed and Switched Indian Social Media Text Corpus - A Comprehensive Review

Gazi Imtiyaz Ahmad, Jimmy Singla, Anis Ali, Aijaz Ahmad Reshi, Anas A. Salameh
2022 International Journal of Advanced Computer Science and Applications  
Sentiment analysis of monolingual social media content has been carried out for the last two decades.  ...  Code-mixing and switching are linguistic behaviors exhibited by bilingual/multilingual populations, primarily in spoken but also in written communication, especially on social media.  ...  Various tools for POS tagging, language identification, and named entity recognition (NER) have been developed for the analysis of code-mixed data in recent years.  ... 
doi:10.14569/ijacsa.2022.0130254 fatcat:43ub7ku5xjeqvcjkpxfutpqgqi

Language Identification of Hindi-English tweets using code-mixed BERT [article]

Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim
2021 arXiv pre-print
Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly code-mixed in non-English-speaking states.  ...  Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper.  ...  A collaborative effort was formed to extract entities from code-mixed Tamil-English and Hindi-English social media content [8].  ... 
arXiv:2107.01202v1 fatcat:kf5hpdin2vbcdltgkadaprxrha

MSIR@FIRE: A Comprehensive Report from 2013 to 2016

Somnath Banerjee, Monojit Choudhury, Kunal Chakma, Sudip Kumar Naskar, Amitava Das, Sivaji Bandyopadhyay, Paolo Rosso
2020 SN Computer Science  
Keywords Information retrieval • Indian languages • Social media • Transliterated search • Code-mixed QA This article is part of the topical collection "Forum for Information Retrieval Evaluation" guest  ...  MSIR track was first introduced in 2013 at FIRE and the aim of MSIR was to systematically formalize several research problems that one must solve to tackle the code mixing in Web search for users of many  ...  Acknowledgements Somnath Banerjee and Sudip Kumar Naskar are supported by Media Lab Asia, MeitY, Government of India, under the Visvesvaraya PhD Scheme for Electronics & IT.  ... 
doi:10.1007/s42979-019-0058-0 fatcat:z5ojljqkkfatph46hzjj6epnny

Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text

Sunayana Sitaram, Sai Krishna Rallabandi, Shruti Rijhwani, Alan W. Black
2016 9th ISCA Speech Synthesis Workshop  
However, due to the rise in conversational data available from social media, phenomena such as code-mixing, in which multiple languages are used together in the same conversation or sentence are now seen  ...  From our subjective experiments we find that listeners have a strong preference for cross-lingual systems with Hindi as the target language for code-mixed Hindi and English text.  ...  word which language it belonged to, or whether it was mixed, ambiguous or a named entity.  ... 
doi:10.21437/ssw.2016-13 dblp:conf/ssw/SitaramRRB16 fatcat:ztr4tzn3f5bcxnbowv5q3ru7yq

L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models [article]

Ravindra Nayak, Raviraj Joshi
2022 arXiv pre-print
Code-switching occurs when more than one language is mixed in a given sentence or a conversation. This phenomenon is more prominent on social media platforms and its adoption is increasing over time.  ...  We present L3Cube-HingCorpus, the first large-scale real Hindi-English code-mixed dataset in Roman script. It consists of 52.93M sentences and 1.04B tokens, scraped from Twitter.  ...  In this internet era, we see code-mixed data used prevalently on social media and chat platforms [13].  ... 
arXiv:2204.08398v1 fatcat:b3ltly4s6bbprofijwhhhbx3xi

CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing [article]

Sai Muralidhar Jayanthi, Kavya Nerella, Khyathi Raghavi Chandu, Alan W Black
2021 arXiv pre-print
These successes, in conjunction with the proliferating mixed-language interactions on social media, have boosted interest in modeling code-mixed texts.  ...  We believe this work has the potential to foster a distributed yet collaborative and sustainable ecosystem in an otherwise dispersed space of code-mixing research.  ...  Like a curse in disguise, though code-mixing is widely prevalent and available on social media, it is accompanied by non-standard spellings, mixed scripts and ill-formed sentences, which are common in code-mixing  ... 
arXiv:2106.06004v1 fatcat:cf2x5u4gebapvbhtem72k7tz24
Showing results 1–15 out of 11,206 results