1,835 Hits in 7.2 sec

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings [article]

Andrei M. Butnaru, Radu Tudor Ionescu
2017 arXiv   pre-print
After each word in a collection of documents is represented as a word vector using a pre-trained word embeddings model, a k-means algorithm is applied to the word vectors in order to obtain a fixed-size  ...  In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision.  ...  Acknowledgements: The authors contributed equally to this work. The authors thank the reviewers for their helpful comments.  ... 
arXiv:1707.08098v1 fatcat:v74bzq7ygvcwxaczbskbaotlqy
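The pipeline in this snippet (embed every word, cluster the word vectors with k-means, then describe each document by a fixed-size count of its words per cluster) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy 2-D vectors stand in for a pre-trained embedding model, and the helper names `kmeans` and `bag_of_clusters` and the first-k centroid initialization are hypothetical choices.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (a simplifying assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids

def bag_of_clusters(doc, embeddings, centroids):
    """Fixed-size feature vector: number of words in `doc` assigned to each cluster."""
    hist = [0] * len(centroids)
    for word in doc:
        if word in embeddings:  # skip out-of-vocabulary words
            j = min(range(len(centroids)), key=lambda c: math.dist(embeddings[word], centroids[c]))
            hist[j] += 1
    return hist

# Hypothetical toy 2-D vectors standing in for a pre-trained embedding model.
embeddings = {
    "cat": (0.9, 0.1), "dog": (1.0, 0.0), "pet": (0.8, 0.2),
    "stock": (0.0, 1.0), "bond": (0.1, 0.9), "market": (0.2, 0.8),
}
docs = [["cat", "dog", "pet"], ["stock", "market", "bond", "dog"]]

word_vectors = [embeddings[w] for d in docs for w in d]
centroids = kmeans(word_vectors, k=2)
features = [bag_of_clusters(d, embeddings, centroids) for d in docs]
print(features)
```

Every document maps to a vector of length k regardless of its word count, which is what lets a standard classifier consume it, mirroring how a bag of visual words turns a variable number of image descriptors into one histogram.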

Distributed Learning over Massive XML Documents in ELM Feature Space

Xin Bi, Xiangguo Zhao, Guoren Wang, Zhen Zhang, Shuang Chen
2015 Mathematical Problems in Engineering  
In this paper, a solution to distributed learning over massive XML documents is proposed, which provides distributed conversion of XML documents into a representation model in parallel based on MapReduce  ...  and a distributed learning component based on Extreme Learning Machine for mining tasks of classification or clustering.  ...  The Vector Space Model (VSM) [1] is one of the most classic and popular representation models of plain text.  ... 
doi:10.1155/2015/923097 fatcat:2qxlvlx2pfesbnsgxzsgaplaea

Extension of the Rocchio Classification Method to Multi-modal Categorization of Documents in Social Media [chapter]

Amin Mantrach, Jean-Michel Renders
2012 Lecture Notes in Computer Science  
The extension consists of simultaneously maintaining different "centroid" representations for each class, in particular "crossmedia" centroids that correspond to pairs of modes.  ...  To classify new data points, different scores are derived from similarity measures between the new data point and these different centroids; a global classification score is finally obtained by suitably  ...  For the EF and LF fusion models we use a one-vs-rest logistic regression classifier with L2-norm regularization.  ... 
doi:10.1007/978-3-642-33460-3_14 fatcat:l47s4xr3qfftxe5qwvxr7yvkbi

Fisher kernel based relevance feedback for multimodal video retrieval

Ionut Mironica, Bogdan Ionescu, Jasper Uijlings, Nicu Sebe
2013 Proceedings of the 3rd ACM conference on International conference on multimedia retrieval - ICMR '13  
Hence during relevance feedback we create a new Fisher Kernel representation based on the most relevant examples.  ...  This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval.  ...  We also acknowledge the 2012 Genre Tagging Task of the MediaEval Multimedia Benchmark [6] for providing the test data set.  ... 
doi:10.1145/2461466.2461478 dblp:conf/mir/MironicaIUS13 fatcat:ijtevl3zbre2xj2dmjqn5ozavq

An Improved Plant Identification System by Fuzzy c-means Bag of Visual Words Model and Sparse Coding

Soodabeh Safa
2020 International Journal of Advanced Trends in Computer Science and Engineering  
The classic bag of visual words algorithm is based on k-means clustering, in which every SIFT feature belongs to a single cluster, and this degrades classification results.  ...  Sparse representation prevents over-fitting in the classifier by eliminating redundancies and evaluating high-frequency patterns between feature vectors.  ...  ACKNOWLEDGEMENT The authors would like to thank the Fakulti Teknologi Maklumat dan Komunikasi (FTMK), Universiti Teknikal Malaysia Melaka (UTeM), Centre of Advanced Computing Technologies (CACT) for supporting  ... 
doi:10.30534/ijatcse/2020/152942020 fatcat:y7t5rkoohvebvkpoh3j7czqvuq
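The snippet contrasts hard k-means assignment, where each SIFT descriptor counts toward exactly one visual word, with a fuzzy c-means variant. A minimal sketch of that difference, assuming the codebook centroids are already learned: the standard fuzzy c-means membership $u_j = 1 / \sum_k (d_j / d_k)^{2/(m-1)}$ lets each descriptor spread one unit of weight across all visual words. The fuzzifier `m = 2`, the toy 2-D descriptors, and the function names are illustrative assumptions, not the paper's settings.

```python
import math

def fuzzy_memberships(x, centroids, m=2.0, eps=1e-12):
    """Fuzzy c-means membership of descriptor x in each cluster; values sum to 1."""
    dists = [max(math.dist(x, c), eps) for c in centroids]  # eps guards against division by zero
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]

def hard_histogram(descriptors, centroids):
    """Classic bag of visual words: each descriptor votes only for its nearest word."""
    hist = [0.0] * len(centroids)
    for x in descriptors:
        hist[min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))] += 1
    return hist

def fuzzy_histogram(descriptors, centroids, m=2.0):
    """Soft variant: each descriptor adds its membership to every visual word."""
    hist = [0.0] * len(centroids)
    for x in descriptors:
        for j, u in enumerate(fuzzy_memberships(x, centroids, m)):
            hist[j] += u
    return hist

# Hypothetical 2-word codebook and three 2-D descriptors (real SIFT vectors are 128-D).
codebook = [(0.0, 0.0), (1.0, 0.0)]
descriptors = [(0.1, 0.0), (0.5, 0.0), (0.9, 0.0)]
print(hard_histogram(descriptors, codebook))
print(fuzzy_histogram(descriptors, codebook))
```

The descriptor at (0.5, 0) is equidistant from both words: the hard histogram credits it entirely to whichever word the tie-break picks, while the fuzzy histogram splits it evenly, which is the kind of quantization error the soft assignment is meant to reduce.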

A Framework of Centroid-Based Methods for Text Categorization

Dandan WANG, Qingcai CHEN, Xiaolong WANG
2014 IEICE transactions on information and systems  
It classifies a document into the class that owns the prototype vector nearest to the document. Many studies have been done on constructing prototype vectors.  ...  In this paper, based on the observation of its general procedure, the centroid-based text classification is treated as a kind of ranking task, and a unified framework for centroid-based TC methods is proposed  ...  The basic idea of centroid-based methods is to construct one prototype vector (centroid) for each class during the training phase and then classify a document into the class that owns the nearest prototype  ... 
doi:10.1587/transinf.e97.d.245 fatcat:nhturn45s5bxdmiveinzameabm
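The basic centroid-based scheme described in this snippet (one prototype vector per class built during training, then each document goes to the class owning the nearest prototype) fits in a few lines. In this sketch the prototype is the arithmetic mean of the class's training vectors and "nearest" means highest cosine similarity; both are common choices but assumptions here, since the paper's framework covers several ways of constructing prototypes. Returning the classes sorted by score reflects the snippet's framing of classification as a ranking task.

```python
import math
from collections import defaultdict

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train_centroids(vectors, labels):
    """Prototype for each class = mean of its training vectors (an assumed construction)."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}

def rank_classes(doc, centroids):
    """Rank classes by cosine similarity between the document and each prototype, best first."""
    return sorted(centroids, key=lambda c: cosine(doc, centroids[c]), reverse=True)

# Hypothetical term-frequency vectors over the vocabulary ["goal", "match", "vote", "party"].
train = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]]
labels = ["sport", "sport", "politics", "politics"]
centroids = train_centroids(train, labels)
ranking = rank_classes([1, 1, 0, 0], centroids)
predicted = ranking[0]  # the class that owns the nearest prototype
print(ranking, predicted)
```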

SOTXTSTREAM: Density-based self-organizing clustering of text streams

Avory C. Bryant, Krzysztof J. Cios, M. Sohel Rahman
2017 PLoS ONE  
SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach.  ...  A streaming data clustering algorithm is presented that builds upon the density-based self-organizing stream clustering algorithm SOSTREAM.  ...  There was no additional external funding received for this study.  ... 
doi:10.1371/journal.pone.0180543 pmid:28686655 pmcid:PMC5501566 fatcat:35pyboer4rdp3kzdlp5ygogehq

Text Mining using Nonnegative Matrix Factorization and Latent Semantic Analysis [article]

Ali Hassani, Amir Iranmanesh, Najme Mansouri
2020 arXiv   pre-print
As a result, we propose a new feature agglomeration method based on Nonnegative Matrix Factorization, which is employed to separate the terms into groups, and then each group's term vectors are agglomerated  ...  into a new feature vector.  ...  Therefore, we propose a nearest-neighbors-based centroid initialization for K-Means.  ... 
arXiv:1911.04705v3 fatcat:kkegojl2kffohl65isw7unttom
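The snippet only outlines the idea: NMF separates the terms into groups, and each group's term vectors are merged into a new feature. A minimal numpy sketch under stated assumptions: NMF is fitted with the Lee and Seung multiplicative updates, each term is assigned to the component that explains most of its mass, and "agglomeration" is taken to be a sum of the grouped term columns. These choices and the function names are hypothetical; the paper's update rule, grouping criterion, and pooling may differ.

```python
import numpy as np

def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative X (docs x terms) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def agglomerate_features(X, k):
    """Group terms by dominant NMF component, then sum each group's term columns."""
    W, H = nmf(X, k)
    # Mass of each term explained by each component; scale-invariant unlike raw H.
    contrib = W.sum(axis=0)[:, None] * H
    groups = contrib.argmax(axis=0)  # one group index per term (column of X)
    X_new = np.zeros((X.shape[0], k))
    for g in range(k):
        X_new[:, g] = X[:, groups == g].sum(axis=1)  # an empty group yields a zero column
    return X_new, groups

# Hypothetical document-term counts: terms 0-2 co-occur, terms 3-5 co-occur.
X = np.array([[2, 1, 3, 0, 0, 0],
              [4, 2, 6, 0, 0, 0],
              [0, 0, 0, 3, 3, 1],
              [0, 0, 0, 6, 6, 2]], dtype=float)
X_new, groups = agglomerate_features(X, k=2)
print(groups)
print(X_new)
```

Six term features collapse into two agglomerated features, and because every term lands in exactly one group, each document's total count is preserved.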

Asymmetric Learning and Dissimilarity Spaces for Content-Based Retrieval [chapter]

Eric Bruno, Nicolas Moenne-Loccoz, Stéphane Marchand-Maillet
2006 Lecture Notes in Computer Science  
We introduce here the idea of a Query-based Dissimilarity Space (QDS), which makes it possible to cope with the asymmetrical setup by converting it into a more classical 2-class problem.  ...  The proposed approach is evaluated on both artificial data and a real image database, and compared with state-of-the-art algorithms. This work is funded by the Swiss NCCR (IM)² (Interactive Multimodal Information  ...  Conclusion We have presented a new similarity-based representation space for content-based multimedia retrieval.  ... 
doi:10.1007/11788034_34 fatcat:gkm2chboqbbuhbqyol4eelivpe

Generating and Browsing Multiple Taxonomies Over a Document Collection

2003 Journal of Management Information Systems  
Amy Chow, Michael Danke, and JP Pietrzak for providing feedback on the user interface.  ...  Acknowledgments: The authors gratefully acknowledge Dharmendra Modha and Ray Strong for their contributions to the original design of eClassifier; Larry Proctor for initiating the MindMap project; and  ...  and a classification model for the taxonomy.  ... 
doi:10.1080/07421222.2003.11045749 fatcat:en5ucciawnf7bhrwdwnd2456we

Multilabel classification with meta-level features

Siddharth Gopal, Yiming Yang
2010 Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10  
The fine-grained features in such a space may not be sufficiently expressive for characterizing discriminative patterns, and worse, make the model complexity unnecessarily high.  ...  Rank-SVM, ML-kNN and IBLR-ML (Instance-based Logistic Regression for Multi-label Classification) in most cases.  ...  Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.  ... 
doi:10.1145/1835449.1835503 dblp:conf/sigir/GopalY10 fatcat:ir5cvdjszreqpdjqd6urmaav5a

Complementary Document Representations for Information Retrieval

Sylvia Melzer
2021 Proceedings of the ... International Florida Artificial Intelligence Research Society Conference  
In this paper, we present an approach for combining different document representations that helps retrieval systems deliver similar documents from different views.  ...  Analogously, to fold in a new $1 \times N$ term vector, $t$, into an existing LSI model, a projection, $\hat{t}$, of $t$ onto the span of the current document vectors (columns of $H$) is computed by $\hat{t} = \Sigma_k^{-1} V_k^{T} t^{T}$.  ...  The algorithm for holistic IR is defined in Algorithm 1. A holistic document representation $V^{T}$ is computed as an approximation of a term-document matrix $C$ by one of lower rank $k$ using the SVD.  ... 
doi:10.32473/flairs.v34i1.128528 fatcat:r2xrrzsphrgjffjmb7k6bymtdy
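The LSI fold-in step in this snippet can be checked numerically. A minimal numpy sketch, assuming $C$ is a terms x documents matrix with truncated SVD $C \approx U_k \Sigma_k V_k^{T}$, so a new $1 \times N$ term row $t$ (one entry per document) is projected as $\hat{t} = \Sigma_k^{-1} V_k^{T} t^{T}$. Folding in a row that is already in $C$ should reproduce that term's row of $U_k$, which makes a convenient sanity check. The function names and the toy matrix are illustrative.

```python
import numpy as np

def lsi(C, k):
    """Rank-k truncated SVD of a term-document matrix C (terms x documents)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T  # U_k, Sigma_k, V_k

def fold_in_term(t, Sigma_k, V_k):
    """Project a new term vector t (length N = number of documents): Sigma_k^{-1} V_k^T t^T."""
    return np.linalg.inv(Sigma_k) @ V_k.T @ t

# Hypothetical term-document counts: 4 terms x 3 documents.
C = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
U_k, Sigma_k, V_k = lsi(C, k=2)
t_hat = fold_in_term(C[0], Sigma_k, V_k)
print(t_hat, U_k[0])  # folding in an existing term recovers its row of U_k
```

The check works because $V_k^{T} V = [I_k \; 0]$, so $\Sigma_k^{-1} V_k^{T} (U_{i} \Sigma V^{T})^{T}$ reduces to the first $k$ entries of $U_i$.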

Improving Rocchio with Weakly Supervised Clustering [chapter]

Romain Vinot, François Yvon
2003 Lecture Notes in Computer Science  
This paper presents a novel approach for adapting the complexity of a text categorization system to the difficulty of the task.  ...  To this end, we propose several clustering algorithms, and report results of various evaluations on standard benchmark corpora such as the Newsgroups corpus.  ...  Classification of new documents is performed by computing the Euclidean distance between the document vector and the prototype vector of each class; the document is then assigned to the nearest class.  ... 
doi:10.1007/978-3-540-39857-8_41 fatcat:dubpon6iwbc33kanynmq3jndbu
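The classification rule quoted in this snippet (Euclidean distance to each class prototype, then assignment to the nearest one) is sketched below. Since the paper's title points to clustering within classes, the sketch accepts a list of labeled prototypes so a class may own one prototype (classic Rocchio) or several (one per sub-cluster); how the paper actually builds its prototypes is not stated in the snippet, so they are given here as hypothetical inputs.

```python
import math

def classify_nearest_prototype(doc, prototypes):
    """Return the class label of the prototype at the smallest Euclidean distance.

    prototypes: list of (class_label, vector) pairs; a class may appear once
    (one centroid per class) or several times (assumed sub-cluster prototypes).
    """
    label, _ = min(prototypes, key=lambda p: math.dist(doc, p[1]))
    return label

# Hypothetical prototypes: "sport" is represented by two sub-cluster prototypes.
prototypes = [
    ("sport", [1.0, 0.0, 0.0]),
    ("sport", [0.0, 0.0, 1.0]),
    ("politics", [0.0, 1.0, 0.0]),
]
print(classify_nearest_prototype([0.1, 0.2, 0.9], prototypes))
```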

Spherical Distance Metrics Applied to Protein Structure Classification [article]

James DeFelice, Vicente M. Reyes
2015 arXiv   pre-print
Prior work has shown that the Double Centroid Reduced Representation (DCRR) model is a useful geometric representation for protein structure with respect to visual models, reducing the quantity of modeled  ...  This work combines DCRR with DDPIn for the development of new DCRR centroid-based metrics: spherical binning distance and inter-centroid spherical distance.  ...  in which p-norms of such vectors are referred to as Lp norms, or Lp distance.  ... 
arXiv:1602.08079v1 fatcat:znqtuf3ag5bwhfm73zhctiwku4
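The closing remark in this snippet refers to the standard $L_p$ (Minkowski) distance, $d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$. A small sketch of it follows, applied to two hypothetical 3-D points such as centroid coordinates; the paper's spherical binning and inter-centroid spherical distance metrics are not reproduced, since the snippet does not define them.

```python
def lp_distance(x, y, p=2.0):
    """Minkowski (L_p) distance between two equal-length vectors; p >= 1 keeps it a metric."""
    if p < 1:
        raise ValueError("p must be >= 1 for L_p to satisfy the triangle inequality")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Two hypothetical 3-D centroid coordinates.
a, b = (0.0, 0.0, 0.0), (3.0, 4.0, 12.0)
print(lp_distance(a, b, p=1))  # Manhattan distance
print(lp_distance(a, b, p=2))  # Euclidean distance
```

Setting `p=1` gives the Manhattan distance and `p=2` the Euclidean distance, the two most common special cases.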

Using Titles vs. Full-text as Source for Automated Semantic Document Annotation [article]

Lukas Galke, Florian Mai, Alan Schelten, Dennis Brunsch, Ansgar Scherp
2017 arXiv   pre-print
Thus, conducting document classification by just using the titles is a reasonable approach for automated semantic annotation and opens up new possibilities for enriching Knowledge Graphs.  ...  For the first time, we offer a systematic comparison of classification approaches to investigate how far semantic annotations can be conducted using just the metadata of the documents such as titles published  ...  We thank Tobias Rebholz, Gabi Schädle, and Andreas Oskar Kempf from ZBW for valuable discussions in our expert workshops on how to use SKOS vocabularies and titles to annotate scientific papers.  ... 
arXiv:1705.05311v2 fatcat:hv7v77mhsjgqtcu3j65fn2bv3m
Showing results 1-15 out of 1,835 results