
A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition

Sujan Kumar Saha, Pabitra Mitra, Sudeshna Sarkar
2012 Knowledge-Based Systems  
Dimensionality reduction leads to performance enhancement in such situations. There are a number of approaches for dimensionality reduction based on feature selection and feature extraction.  ...  Features used for named entity recognition (NER) are often high dimensional in nature. These cause overfitting when training data is not sufficient.  ...  [33] we proposed word selection and word clustering based feature reduction techniques for the Hindi NER task.  ... 
doi:10.1016/j.knosys.2011.09.015 fatcat:wss3offx2ffnlf5iad6jhfbcse

A composite kernel for named entity recognition

Sujan Kumar Saha, Shashi Narayan, Sudeshna Sarkar, Pabitra Mitra
2010 Pattern Recognition Letters  
The features used in machine learning algorithms for NER are mostly string based features. The proposed kernel is based on calculating a novel distance function between the string based features.  ...  The kernel function is applied to the Hindi and biomedical NER tasks and the results are quite promising.  ...  Now we have trained a MaxEnt classifier along with the word selection and word clustering based feature reduction approaches proposed by Saha et al. (2009).  ... 
doi:10.1016/j.patrec.2010.05.004 fatcat:lcct5thwkvgx5polrka4jfpvii

Named Entity Recognition in Hindi using Maximum Entropy and Transliteration

Sujan Kumar Saha, Partha Sarathi Ghosh, Sudeshna Sarkar, Pabitra Mitra
2008 POLIBITS Research Journal on Computer Science and Computer Engineering With Applications  
Proper transliteration makes the English lists useful in the NER tasks for such languages. In this paper, we have described a Maximum Entropy based NER system for Hindi.  ...  We have explored different features applicable for the Hindi NER task. We have incorporated some gazetteer lists in the system to increase the performance of the system.  ...  The system has used word selection and word clustering based feature reduction techniques to achieve this result.  ... 
doi:10.17562/pb-38-4 fatcat:ewq4jxuf6vfitmgrpvzmsmbtwy

A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition

Rakesh Patra, Sujan Kumar Saha
2017 Journal of Intelligent Systems  
For example, the Brown clustering algorithm is based on bigram statistics of the words.  ...  To test the effectiveness of the approaches, we use two different NER data sets, namely, Hindi and BioCreative II Gene Mention Recognition.  ...  Saha et al. used the MaxEnt classifier to develop a Hindi NER system [23, 24]. They explored the applicability of different NER features in Hindi language.  ... 
doi:10.1515/jisys-2016-0074 fatcat:lyk5zwprz5b7vcohkb2cdmzzu4

An Information-Extraction System for Urdu---A Resource-Poor Language

Smruthi Mukund, Rohini Srihari, Erik Peterson
2010 ACM Transactions on Asian Language Information Processing  
NLP systems begin with modules such as word segmentation, part-of-speech tagging, and morphological analysis and progress to modules such as shallow parsing and named entity tagging.  ...  Techniques such as bootstrap learning and resource sharing from a syntactically similar language, Hindi, are explored to augment the available annotated Urdu data.  ...  Ekbal and Bandyopadhyay [2010] have shown that using SVM based approach for Hindi NER also produces reasonably good results. There is limited work on NER for Urdu.  ... 
doi:10.1145/1838751.1838754 fatcat:ibmmwalmtfbfdpjufxccwolzgq

Bangla Natural Language Processing: A Comprehensive Review of Classical, Machine Learning, and Deep Learning Based Methods [article]

Ovishake Sen, Mohtasim Fuad, MD. Nazrul Islam, Jakaria Rabbi, MD. Kamrul Hasan, Mohammed Baz, Mehedi Masud, Md. Abdul Awal, Awal Ahmed Fime, Md. Tahmid Hasan Fuad, Delowar Sikder, MD. Akil Raihan Iftee
2021 arXiv   pre-print
However, English is the predominant language for online resources and technical knowledge, journals, and documentation.  ...  , Parts of Speech Tagging, Question Answering System, Sentiment Analysis, Spam and Fake Detection, Text Summarization, Word Sense Disambiguation, and Speech Processing and Recognition.  ...  Acknowledgment: The authors would like to thank for the support from Taif University Researchers Supporting Project number (TURSP-2020/239), Taif University, Taif, Saudi Arabia.  ... 
arXiv:2105.14875v2 fatcat:kvqmgxpthvh2fj7jza64n6kaiq

Textual Similarity Measurement Approaches: A Survey (1)

Amira Abo-Elghit, Aya Al-Zoghby, Taher Hamza
2020 The Egyptian Journal of Language Engineering  
However, many approaches for measuring textual similarity have been presented for Arabic text reviewed and compared in this paper.  ...  Finding the similarity between terms is the essential portion of textual similarity, then used as a major phase for sentence-level, paragraph-level, and script-level similarities.  ...  Hybrid-based M. Al-Samdi et al. [38] 2017 Latent Semantic Indexing and feature-based: N-grams, POS overlap features Word Alignment and NER features.  ... 
doi:10.21608/ejle.2020.42018.1012 fatcat:a2fhtkub7nazlkgzqewqbb7koi

Bangla Natural Language Processing: A Comprehensive Analysis of Classical, Machine Learning, and Deep Learning Based Methods

Ovishake Sen, Mohtasim Fuad, Md. Nazrul Islam, Jakaria Rabbi, Mehedi Masud, Md. Kamrul Hasan, Md. Abdul Awal, Awal Ahmed Fime, Md. Tahmid Hasan Fuad, Delowar Sikder, Md. Akil Raihan Iftee
2022 IEEE Access  
However, English is the predominant language for online resources and technical knowledge, journals, and documentation.  ...  To bridge the gap between limited support and increasing demand, researchers conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials.  ...  a stemming cluster-based morphological parsing technique of Bangla words.  ... 
doi:10.1109/access.2022.3165563 fatcat:rmersduz6vbyjjczvobrebskmi

Information Extraction from Multifaceted Unstructured Big Data

2019 International journal of recent technology and engineering  
These fields are shifting to smart and advanced technologies such as smart manufacturing industries, data-aware medical sciences, and other smart applications.  ...  According to IDC, by 2020, over 40 zettabytes of data will be generated and reproduced.  ...  Learning based methods can be supervised such as Hidden Markov Model (HMM), Maximum Entropy Model (MaxEnt), Support Vector Machine (SVM) and Conditional Random Fields (CRF), unsupervised such as clustering  ... 
doi:10.35940/ijrte.b1074.0882s819 fatcat:iwpaamsftrgztduhgfkbmgo3du

Over a Decade of Social Opinion Mining [article]

Keith Cortis, Brian Davis
2020 arXiv   pre-print
Social media popularity and importance is on the increase, due to people using it for various types of social interaction across multiple channels.  ...  These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance  ...  word clusters, and a dependency parser for tweets, besides annotated corpora and web-based annotation tools; • Stanford NLP 77 : software that provides statistical NLP, deep learning NLP and rule-based  ... 
arXiv:2012.03091v1 fatcat:bm5nydbdvbalzi33l3w2ivkdja

Message from the general chair

Benjamin C. Lee
2015 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
To maximize the utility of the injected knowledge, we deploy a learning-based multi-sieve approach and develop novel entity-based features.  ...  We propose a candidate ranking model for "this-issue" anaphora resolution that explores different "issue"-specific and general abstract-anaphora features. The model is not restricted to nominal  ...  We present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling.  ... 
doi:10.1109/ispass.2015.7095776 dblp:conf/ispass/Lee15 fatcat:ehbed6nl6barfgs6pzwcvwxria

Current Project Work on English to Kannada Machine Translation System: a Literature Survey on NLP

Mr Chethan, Chandra Basavaraddi, H Shashirekha
Language processing refers to the way human beings use words to communicate ideas and feelings, and how such communications are processed and understood.  ...  for kagunita, 34 vottakshara signs, etc. and has its own independent script and long document histories.  ...  ., for her valuable motivation, guidance and suggestion, which helped me for completion of this Research paper.  ... 

Native language identification: explorations and applications

Shervin Malmasi
Following our implementation of an NLI system for the shared task -- which investigated the effects of classifier ensembles, feature types and feature diversity -- we explored the task in several new ways  ...  Most work hitherto has focused on the core machine learning and feature engineering facets of the task, obtaining suitable data and unifying the area with a common evaluation framework.  ...  Acknowledgements First and foremost, I would like to thank my supervisor, Mark Dras, for all his help and support.  ... 
doi:10.25949/19437986 fatcat:wnf7vdyrsjfjrmbf3nclwdrire

Pluricentric languages : automatic identification and linguistic variation [article]

Marcos Zampieri, Universität des Saarlandes
It explores different computational methods and different sets of features for this task that go beyond character and word language models.  ...  This research shows, for example, that it is possible to discriminate between Brazilian and European Portuguese with 99.8% accuracy using journalistic texts.  ...  features and applied information gain, parallel text feature selection, and a manual feature selection to select the best features for classification.  ... 
doi:10.22028/d291-23660 fatcat:um5riv7ffvg4te7eqprfnbjyem

Some models and measures for learning on a budget

Avishek Saha
not depend on domain knowledge and feature selection.  ...  Regularization also played a role in clustering-based MTL. The Task-Clustering (TC) algorithm (Thrun and O'Sullivan, 1996) was the first work to propose clustering of related tasks. J.  ...  APPENDIX A SEMISUPERVISED TRANSFER In the following, we provide proofs for Theorem 4.2, Theorem 4.4 and Theorem 4.5.  ... 
doi:10.26053/0h-r1gf-kr00 fatcat:22g3cdjqbre2ferhrzm3vrluum