SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations [article]

Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick
2017 arXiv   pre-print
We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations  ...  that are widely used for text representation.  ...  Sparse Composite Document Vectors In this section, we present the proposed Sparse Composite Document Vector (SCDV) representation as a novel document vector learning algorithm.  ... 
arXiv:1612.06778v3 fatcat:3d3aqdz42rb4bhtboylvovp624

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick
2017 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing  
We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations  ...  that are widely used for text representation.  ...  Sparse Composite Document Vectors In this section, we present the proposed Sparse Composite Document Vector (SCDV) representation as a novel document vector learning algorithm.  ... 
doi:10.18653/v1/d17-1069 dblp:conf/emnlp/MekalaGPK17 fatcat:l6foqtwqrrcjhopdx4puqgqbba

Corpus Statistics Empowered Document Classification

Farid Uddin, Yibo Chen, Zuping Zhang, Xin Huang
2022 Electronics  
In this context, for long texts, we proposed Weighted Sparse Document Vector (WSDV), which performs clustering on the weighted data that emphasizes vital terms and moderates the soft clustering by removing  ...  Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification.  ...  using soft clustering over distributional representations [6] and Improving Document Classification with Multi-Sense (SCDV-MS) [13].  ... 
doi:10.3390/electronics11142168 fatcat:yjmkl3zhkjcfhpow7wmanmn5pa

Improving Document Classification with Multi-Sense Embeddings [article]

Vivek Gupta, Ankit Saw, Pegah Nokhiz, Harshit Gupta, Partha Talukdar
2019 arXiv   pre-print
Recently proposed Sparse Composite Document Vector (SCDV) (Mekala et al., 2017) extends this approach from sentences to documents using soft clustering over word vectors.  ...  Efficient representation of text documents is an important building block in many NLP tasks.  ...  Therefore, we can use an efficient sparse operation (sparse addition and multiplication) over sparse vectors to speed up feature formation.  ... 
arXiv:1911.07918v1 fatcat:oqu73ngsqnf5tny53qfuxp56ia

Unsupervised Contextualized Document Representation [article]

Ankur Gupta, Vivek Gupta
2021 arXiv   pre-print
SCDV (Mekala et al., 2017) further extends this from sentences to documents by employing soft and sparse clustering over pre-computed word vectors.  ...  sense disambiguation with SCDV soft clustering approach.  ...  Gupta et al., 2016; Mekala et al., 2017 proposed clustering-based technique with tf-idf weighting to form sparse composite document vector (SCDV), thus extending the simple averaging approach beyond a  ... 
arXiv:2109.10509v1 fatcat:weuxtrxpofcphozmz5cp7vm33i

P-SIF: Document Embeddings Using Partition Averaging

Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar
2020 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
effective document representation.  ...  While it is desirable to use the same method to represent documents as well, unfortunately, the effectiveness is lost when representing long documents involving multiple sentences.  ...  These representations essentially represent the global contexts of the documents as a distribution over topics.  ... 
doi:10.1609/aaai.v34i05.6292 fatcat:hq2iwfuqtnegleknxxpkhhp2fi

P-SIF: Document Embeddings Using Partition Averaging [article]

Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar
2020 arXiv   pre-print
effective document representation.  ...  While it is desirable to use the same method to represent documents as well, unfortunately, the effectiveness is lost when representing long documents involving multiple sentences.  ...  These representations essentially represent the global contexts of the documents as a distribution over topics.  ... 
arXiv:2005.09069v1 fatcat:6d3jitjpw5ey7fmyvidwsvc244

A Tensor Factorization on Rating Prediction for Recommendation by Feature Extraction from Reviews

Yang Sun, Guan-Shen Fang, Sayaka Kamei
2020 International Journal of Networking and Computing  
In our evaluation, we use pre-processed data of five cities in YELP challenge dataset, and apply one of LDA, Doc2Vec and SCDV to get numeric feature vectors of reviews.  ...  Secondly, it uses TF which is trained by the proposed first-order gradient descent method for TF named Feature Vector Gradient Descent (FVGD).  ...  Dirichlet Allocation (LDA) [3], Document to Vector (Doc2Vec) [15], Sparse Composite Document Vectors (SCDV) [17].  ... 
doi:10.15803/ijnc.10.2_111 fatcat:5yk4tf3wanchbaey5x5jlbjmky

Comprehensive biological interpretation of gene signatures using semantic distributed representation [article]

Yuumi Okuzono, Takashi Hoshino
2019 bioRxiv   pre-print
gene signatures by incorporating a method of distributed document representation from natural language processing (NLP).  ...  In the proposed algorithm, a gene-topic vector is created by multiplying the feature vector based on the gene's distributed representation by the probability of the gene signature topic and the low frequency  ...  In addition, we executed an original algorithm for creating a unique gene signature feature vector based on the sparse composite document vectors (SCDV) [9] method from NLP using only R language  ... 
doi:10.1101/846691 fatcat:fxfswgudjff7haq3g3zw4vztpa

An AI-based methodology for the automatic classification of a multiclass ebook collection using information from the tables of contents

E. Giannopoulou, N. Mitrou
2020 IEEE Access  
Extensive experiments were conducted using various configurations of preprocessing steps, NN set up and vector and vocabulary sizes to assess their impact on the classifier's performance.  ...  The vector construction leverages information that was extracted from the table of contents (ToC) of each book using the TF-IDF weighting scheme (for the first case) and the Keras tokenizer (for the second  ...  The sparse composite document vector (SCDV), which was proposed in [33], extended the weighted averaging of word vectors from sentences to documents by using soft clustering over word vectors, while  ... 
doi:10.1109/access.2020.3041651 fatcat:6ckarslugbgbjhskfcsabxppq4

Unsupervised Contextualized Document Representation

Ankur Gupta, Vivek Gupta
2021 Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing   unpublished
SCDV (Mekala et al., 2017) further extends this from sentences to documents by employing soft and sparse clustering over pre-computed word vectors.  ...  sense disambiguation with SCDV soft clustering approach.  ...  Gupta et al., 2016; Mekala et al., 2017 proposed clustering-based technique with tf-idf weighting to form sparse composite document vector, thus extending the simple averaging approach beyond a single  ... 
doi:10.18653/v1/2021.sustainlp-1.17 fatcat:clhusfwecbfcbcttbq7ekdyopa

Measurement of similarity between texts
テキスト間の類似度の測定

Hidetsugu NANBA
The Journal of Information Science and Technology Association  
SCDV: sparse composite document vectors using soft clustering over distributional representations, Proceedings of EMNLP 2017, 2017, p.659-669.  ...  11) Mikolov, T.; Le, Q.  ...  Distributed representations of sentences and documents. Proceedings of the 31st International Conference on Machine Learning, ICML 2014, 2014, p.1188-1196.  ... 
doi:10.18919/jkg.70.7_373 fatcat:hhmrzk4zwjdvnjasmd3p6nemly