Text Categorisation Using Document Profiling [chapter]

Maximilien Sauban, Bernhard Pfahringer
2003 Lecture Notes in Computer Science  
This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text categorisation.  ...  Similarly to standard feature selection for text classification, the dimensionality of instances is drastically reduced this way, which in turn greatly lowers the computational load for the subsequent  ...  Acknowledgments The authors would like to thank the other members of the WEKA group for their assistance, especially Eibe Frank and Nils Weidmann.  ... 
doi:10.1007/978-3-540-39804-2_37 fatcat:yglkexbayfepnmcjt74xtdxywa

Self-taught hashing for fast similarity search

Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu
2010 Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10  
In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via  ...  A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes  ...  Acknowledgements We are grateful to Dr Xi Chen (Alberta) for his valuable discussion and the London Mathematical Society (LMS) for their support of this work (SC7-09/10-6).  ... 
doi:10.1145/1835449.1835455 dblp:conf/sigir/ZhangWCL10 fatcat:o6gw5daigbfspf2aig3hqfghji

Self-Taught Hashing for Fast Similarity Search [article]

Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu
2010 arXiv pre-print
Our experiments on three real-world text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) outperforms state-of-the-art techniques  ...  In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via  ...  Acknowledgements We are grateful to Dr Xi Chen (Alberta) for his valuable discussion and the London Mathematical Society (LMS) for their support of this work (SC7-09/10-6).  ... 
arXiv:1004.5370v1 fatcat:kmaqddklzfd5tjnrljfjepwvze

Distributional Measures of Semantic Abstraction

Sabine Schulte im Walde, Diego Frassinelli
2022 Frontiers in Artificial Intelligence  
) and in terms of the generality–specificity distinction (e.g., animal–fish), in order to compare the strengths and weaknesses of the measures regarding categorisations of abstraction, and to determine  ...  This article provides an in-depth study of distributional measures for distinguishing between degrees of semantic abstraction.  ...  NN The neighbourhood density of a target word t is defined as the average vector-space distance between the k nearest neighbours of t .  ... 
doi:10.3389/frai.2021.796756 pmid:35252847 pmcid:PMC8892386 fatcat:g5owyb5gxnbpvgtjv3fv2ot7xu

Exploring Multidimensional Continuous Feature Space to Extract Relevant Words [chapter]

Márius Šajgalík, Michal Barla, Mária Bieliková
2014 Lecture Notes in Computer Science  
We evaluate our method within text categorisation problem using a well-known 20-newsgroups dataset and achieve state-of-the-art results.  ...  With growing amounts of text data the descriptive metadata become more crucial in efficient processing of it.  ...  VG1/0675/11, APVV-0208-10 and it is the partial result of the Research and Development Operational Programme project "University Science Park of STU Bratislava", ITMS 26240220084, co-funded by the European  ... 
doi:10.1007/978-3-319-11397-5_12 fatcat:y5cwfh4n4jczxhonyhlr2tl2hi

Dimensionality Reduction through Sub-space Mapping for Nearest Neighbour Algorithms [chapter]

Terry R. Payne, Peter Edwards
2000 Lecture Notes in Computer Science  
However, several studies have demonstrated that this assumption rarely holds; for many supervised learning algorithms, the inclusion of irrelevant or redundant attributes can result in a degradation in  ...  While a variety of different methods for dimensionality reduction exist, many of these are only appropriate for datasets which contain a small number of attributes (e.g. < 20).  ...  Acknowledgements T.Payne acknowledges financial support provided by the UK Engineering & Physical Sciences Research Council (EPSRC).  ... 
doi:10.1007/3-540-45164-1_35 fatcat:uweaarzjhbdy7g2bbe3rmjhbwy

On Document Classification with Self-Organising Maps [chapter]

Jyri Saarikoski, Kalervo Järvelin, Jorma Laurikkala, Martti Juhola
2009 Lecture Notes in Computer Science  
We compared the results gained to those of k nearest neighbour searching and k-means clustering.  ...  This research deals with the use of self-organising maps for the classification of text documents. The aim was to classify documents to separate classes according to their topics.  ...  Acknowledgements The research was partially funded by Alfred Kordelin Fund and Academy of Finland, projects 120264, 115609 and 124131. SNOWBALL stemmer was by Martin Porter.  ... 
doi:10.1007/978-3-642-04921-7_15 fatcat:sgoihxztcreolo2rx5gregzlzi

Twitter Sentiment Analysis Using Deep Learning Techniques

S. Kasifa Farnaaz and A. Sureshbabu
2022 International journal of modern trends in science and technology  
This study aims to perform Sentimental analysis using deep learning with bigrams and trigrams to classify the tweets accurately.  ...  It also shows a parametric relationship between operations that are influenced by perceived boundaries. The qualities conveyed in them address the tweets: positive, negative, or fair.  ...  In order to classify each data point's nearest neighbour, the K-NN employs a clear common vote. The number of neighbours closest to (K, or by definition or by a number of neighbours RNN model.  ... 
doi:10.46501/ijmtst0802035 fatcat:64ykzecyg5gjbex5nmlyzkxtu4

The impact of metadata on the accuracy of automated patent classification

Georg Richter, Andrew MacFarlane
2005 World Patent Information  
In this project, automated classifiers using the k-Nearest Neighbour algorithm were developed for the classification of patents into two different classification systems.  ...  The study shows that metadata can play an extremely useful role in the classification of patents. Nonetheless, it must not be used indiscriminately but only after careful evaluation of its usefulness.  ...  Acknowledgements The authors would like to thank Dr Stephen Robertson, Microsoft Corporation, for his friendly and generous support and advice on various aspects of this study, in particular term weight  ... 
doi:10.1016/j.wpi.2004.08.001 fatcat:2bwoswsh4zckdnzapf62pxchi4

Towards noise and error reduction on foundry data gathering processes

Igor Santos, Javier Nieves, Yoseba K. Penya, Pablo G. Bringas
2010 2010 IEEE International Symposium on Industrial Electronics  
In this paper, we address the use of Singular Value Decomposition (SVD) and Latent Semantic Analysis (LSA) in order to reduce the number of ambiguities and noise in the dataset.  ...  Further, we have tested this approach comparing the results without this preprocessing step in order to show the effectiveness of the proposed method.  ...  -K-nearest neighbour: For K-nearest neighbour we have performed experiments with k = 1, k = 2, k = 3, k = 4, and k = 5.  ... 
doi:10.1109/isie.2010.5637901 fatcat:llhr26xg2jexvikmz3c26t6tcu

A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks [article]

Israa Khalaf Salman Al-Tameemi, Mohammad-Reza Feizi-Derakhshi, Saeed Pashazadeh, Mohammad Asadpour
2022 arXiv pre-print
As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio and video.  ...  Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks.  ...  They also used a sentiment score vector of tweets to enhance the performance of the SVM classifier. K-Nearest Neighbour (KNN).  ... 
arXiv:2207.02160v1 fatcat:l3vxpjnqkrfthkvhdldwonpoe4

Finding translations for low-frequency words in comparable corpora

Viktor Pekar, Ruslan Mitkov, Dimitar Blagoev, Andrea Mulloni
2006 Machine Translation  
In this article, we study possibilities of improving the extraction of low-frequency equivalents from bilingual comparable corpora.  ...  We develop a method that aims to compensate for insufficient amounts of corpus evidence on rare words: prior to measuring cross-language similarities, the method uses same-language corpus data to model  ...  of nearest neighbours used for smoothing (k), the mean rank of the correct equivalent and the frequency rank, illustrated on the German-Spanish data when using the performance-based method for estimating  ... 
doi:10.1007/s10590-007-9029-7 fatcat:s6xxgb5nvngproveprfo3mwysq

Textual data classification for a sectoral categorisation of public investments

Carlo Amati, Fabio De Angelis, Francesca Romani
The result is achieved through a supervised classification methodology based on K-Nearest Neighbour Algorithm which works on the Singular Value Decomposition Matrices of the supervisor set, using appropriate  ...  Therefore, we present a strategy to apply a homogeneous sectoral categorisation of projects monitored in different Italian Databases on Public Investments, based on the exploitation of textual information  ...  The linkage between the classified and unclassified method is a starting point for new research in which the unsupervised classification can be used to enhance the training set adaptation and knowledge  ... 

A review on sentiment analysis in psychomedical diagnosis

Monali Kishor Patil, Nandini Chaudhari, Ram Bhavsar, BV Pawar
2020 Open Journal of Psychiatry and Allied Sciences  
In the domain of medical science, little amount of work has been done in clinical SA. However, the efforts are at an elementary level in the course of research of SA for behavioural psychology.  ...  In medical science, huge data is available which can be used for predictive modelling.  ...  SA is a multidisciplinary field which uses both NLP and ML. SA computationally identifies and categorises opinions which are expressed in a piece of text.  ... 
doi:10.5958/2394-2061.2020.00025.7 fatcat:x6tudvjoxrfvrbil6xn7jmak24

Sentiment Evaluation: User, Business Assessment and Hashtag Analysis

Chetan Jha, Ray Walshe
2017 Irish Conference on Artificial Intelligence and Cognitive Science  
The methods described here provide users with a robust and flexible way of profiling Twitter users using sentiment extracted from tweet data.  ...  different users in the form of tweets and using statistical, learning and natural language processing techniques.  ...  Vrunda and Vipul conducted a comparative study [8] of Support Vector Machine, Naïve Bayes, Multi-Layer Perceptron, Decision Tree, Subjective Lexicon Method, Case based Reasoning and K-Nearest Neighbour  ... 
dblp:conf/aics/JhaW17 fatcat:bndyfwzfxfh7xdouhwcahjgw7i