A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition
2012
Knowledge-Based Systems
Dimensionality reduction leads to performance enhancement in such situations. There are a number of approaches for dimensionality reduction based on feature selection and feature extraction. ...
Features used for named entity recognition (NER) are often high dimensional in nature. These cause overfitting when training data is not sufficient. ...
[33] we proposed word selection and word clustering based feature reduction techniques for the Hindi NER task. ...
doi:10.1016/j.knosys.2011.09.015
fatcat:wss3offx2ffnlf5iad6jhfbcse
A composite kernel for named entity recognition
2010
Pattern Recognition Letters
The features used in machine learning algorithms for NER are mostly string based features. The proposed kernel is based on calculating a novel distance function between the string based features. ...
The kernel function is applied to the Hindi and biomedical NER tasks and the results are quite promising. ...
Now we have trained a MaxEnt classifier along with the word selection and word clustering based feature reduction approaches proposed by Saha et al. (2009). ...
doi:10.1016/j.patrec.2010.05.004
fatcat:lcct5thwkvgx5polrka4jfpvii
Named Entity Recognition in Hindi using Maximum Entropy and Transliteration
2008
POLIBITS Research Journal on Computer Science and Computer Engineering With Applications
Proper transliteration makes the English lists useful in the NER tasks for such languages. In this paper, we have described a Maximum Entropy based NER system for Hindi. ...
We have explored different features applicable for the Hindi NER task. We have incorporated some gazetteer lists in the system to increase the performance of the system. ...
The system has used word selection and word clustering based feature reduction techniques to achieve this result. ...
doi:10.17562/pb-38-4
fatcat:ewq4jxuf6vfitmgrpvzmsmbtwy
A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition
2017
Journal of Intelligent Systems
For example, the Brown clustering algorithm is based on bigram statistics of the words. ...
To test the effectiveness of the approaches, we use two different NER data sets, namely, Hindi and BioCreative II Gene Mention Recognition. ...
Saha et al. used the MaxEnt classifier to develop a Hindi NER system [23, 24]. They explored the applicability of different NER features in Hindi language. ...
doi:10.1515/jisys-2016-0074
fatcat:lyk5zwprz5b7vcohkb2cdmzzu4
An Information-Extraction System for Urdu---A Resource-Poor Language
2010
ACM Transactions on Asian Language Information Processing
NLP systems begin with modules such as word segmentation, part-of-speech tagging, and morphological analysis and progress to modules such as shallow parsing and named entity tagging. ...
Techniques such as bootstrap learning and resource sharing from a syntactically similar language, Hindi, are explored to augment the available annotated Urdu data. ...
Ekbal and Bandyopadhyay [2010] have shown that using SVM based approach for Hindi NER also produces reasonably good results. There is limited work on NER for Urdu. ...
doi:10.1145/1838751.1838754
fatcat:ibmmwalmtfbfdpjufxccwolzgq
Bangla Natural Language Processing: A Comprehensive Review of Classical, Machine Learning, and Deep Learning Based Methods
[article]
2021
arXiv
pre-print
However, English is the predominant language for online resources and technical knowledge, journals, and documentation. ...
, Parts of Speech Tagging, Question Answering System, Sentiment Analysis, Spam and Fake Detection, Text Summarization, Word Sense Disambiguation, and Speech Processing and Recognition. ...
Acknowledgment: The authors would like to thank for the support from Taif University Researchers Supporting Project number (TURSP-2020/239), Taif University, Taif, Saudi Arabia. ...
arXiv:2105.14875v2
fatcat:kvqmgxpthvh2fj7jza64n6kaiq
Textual Similarity Measurement Approaches: A Survey (1)
2020
The Egyptian Journal of Language Engineering
However, many approaches for measuring textual similarity have been presented for Arabic text reviewed and compared in this paper. ...
Finding the similarity between terms is the essential portion of textual similarity, then used as a major phase for sentence-level, paragraph-level, and script-level similarities. ...
Hybrid-based; M. Al-Samdi et al. [38]; 2017; Latent Semantic Indexing and feature-based: N-grams, POS overlap features, Word Alignment and NER features. ...
doi:10.21608/ejle.2020.42018.1012
fatcat:a2fhtkub7nazlkgzqewqbb7koi
Bangla Natural Language Processing: A Comprehensive Analysis of Classical, Machine Learning, and Deep Learning Based Methods
2022
IEEE Access
However, English is the predominant language for online resources and technical knowledge, journals, and documentation. ...
To bridge the gap between limited support and increasing demand, researchers conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials. ...
a stemming cluster-based morphological parsing technique of Bangla words. ...
doi:10.1109/access.2022.3165563
fatcat:rmersduz6vbyjjczvobrebskmi
Information Extraction from Multifaceted Unstructured Big Data
2019
International journal of recent technology and engineering
These fields are shifting to smart and advanced technologies such as smart manufacturing industries, data-aware medical sciences, and other smart applications. ...
According to IDC, by 2020, over 40 zettabytes of data will be generated and reproduced. ...
Learning based methods can be supervised such as Hidden Markov Model (HMM), Maximum Entropy Model (MaxEnt), Support Vector Machine (SVM) and Conditional Random Fields (CRF), unsupervised such as clustering ...
doi:10.35940/ijrte.b1074.0882s819
fatcat:iwpaamsftrgztduhgfkbmgo3du
Over a Decade of Social Opinion Mining
[article]
2020
arXiv
pre-print
Social media popularity and importance is on the increase, due to people using it for various types of social interaction across multiple channels. ...
These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance ...
word clusters, and a dependency parser for tweets, besides annotated corpora and web-based annotation tools; • Stanford NLP: software that provides statistical NLP, deep learning NLP and rule-based ...
arXiv:2012.03091v1
fatcat:bm5nydbdvbalzi33l3w2ivkdja
Message from the general chair
2015
2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
To maximize the utility of the injected knowledge, we deploy a learning-based multi-sieve approach and develop novel entity-based features. ...
We propose a candidate ranking model for "this-issue" anaphora resolution that explores different "issue"-specific and general abstract-anaphora features. The model is not restricted to nominal ...
We present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling. ...
doi:10.1109/ispass.2015.7095776
dblp:conf/ispass/Lee15
fatcat:ehbed6nl6barfgs6pzwcvwxria
Current Project Work on English to Kannada Machine Translation System: a Literature Survey on NLP
unpublished
Language processing refers to the way human beings use words to communicate ideas and feelings, and how such communications are processed and understood. ...
for kagunita, 34 vottakshara signs, etc. and has its own independent script and long document histories. ...
., for her valuable motivation, guidance and suggestion, which helped me for completion of this Research paper. ...
fatcat:h54ljka555hmdd35p5fqwbqbdu
Native language identification: explorations and applications
2022
Following our implementation of an NLI system for the shared task -- which investigated the effects of classifier ensembles, feature types and feature diversity -- we explored the task in several new ways ...
Most work hitherto has focused on the core machine learning and feature engineering facets of the task, obtaining suitable data and unifying the area with a common evaluation framework. ...
Acknowledgements First and foremost, I would like to thank my supervisor, Mark Dras, for all his help and support. ...
doi:10.25949/19437986
fatcat:wnf7vdyrsjfjrmbf3nclwdrire
Pluricentric languages : automatic identification and linguistic variation
[article]
2016
It explores different computational methods and different sets of features for this task that go beyond character and word language models. ...
This research shows, for example, that it is possible to discriminate between Brazilian and European Portuguese with 99.8% accuracy using journalistic texts. ...
features and applied information gain, parallel text feature selection, and a manual feature selection to select the best features for classification. ...
doi:10.22028/d291-23660
fatcat:um5riv7ffvg4te7eqprfnbjyem
Some models and measures for learning on a budget
2013
not depend on domain knowledge and feature selection. ...
Regularization also played a role in clustering-based MTL. The Task-Clustering (TC) algorithm (Thrun and O'Sullivan, 1996) was the first work to propose clustering of related tasks. J. ...
APPENDIX A SEMISUPERVISED TRANSFER In the following, we provide proofs for Theorem 4.2, Theorem 4.4 and Theorem 4.5. ...
doi:10.26053/0h-r1gf-kr00
fatcat:22g3cdjqbre2ferhrzm3vrluum