19,608 Hits in 4.2 sec

Multi-Document Summarization using Distributed Bag-of-Words Model [article]

Kaustubh Mani, Ishan Verma, Hardik Meisheri, Lipika Dey
2018 arXiv   pre-print
In this paper, we present an unsupervised centroid-based document-level reconstruction framework using distributed bag of words model.  ...  As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since it can provide the main ideas in a document set in short time.  ...  In this work, we use Distributed bag of words (PV-DBOW) model to represent documents and sentences.  ... 
arXiv:1710.02745v2 fatcat:bazas5vnzbdorgqcvdodz2x5ge

Exploiting Semantic Term Relations in Text Summarization

2022 International Journal of Information Retrieval Research  
The experimental results reveal that performance of our multi-document text summarizer is significantly improved when the distributional term similarity measure is used for finding semantic term relations  ...  The traditional frequency based approach to creating multi-document extractive summary ranks sentences based on scores computed by summing up TF*IDF weights of words contained in the sentences.  ...  '' funded by the Department of Science and Technology (DST)[grant number EEQ/2017/000369], Government of India under the SERB scheme.  ... 
doi:10.4018/ijirr.289607 fatcat:hy4jjcucfngmbbtnjscc3invwi
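
The frequency-based baseline mentioned in the snippet above (ranking sentences by the sum of the TF*IDF weights of their words) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The function names, the whitespace tokenizer, the choice to treat each sentence as one "document" for IDF, and the formula idf = log(N / df) are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Score each sentence by summing the TF*IDF weights of its words.

    Assumptions (hypothetical, not taken from the listed paper):
    lowercase whitespace tokenization, raw counts as TF, each sentence
    counted as one document for DF, and idf = log(N / df).
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)

    # Document frequency: in how many sentences does each word occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append(sum(count * math.log(n / df[w]) for w, count in tf.items()))
    return scores


def rank_sentences(sentences):
    """Return sentence indices ordered from highest to lowest score."""
    scores = tfidf_sentence_scores(sentences)
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

Under these assumptions, a word that appears in every sentence contributes nothing to any score, which is one reason such a baseline cannot relate sentences that share meaning but no vocabulary.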

Centroid-based Text Summarization through Compositionality of Word Embeddings

Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro
2017 Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres  
The evaluations on multi-document and multilingual datasets prove the effectiveness of the continuous vector representation of words compared to the bag-of-words model.  ...  A bag-of-words representation does not allow to grasp the semantic relationships between concepts when comparing strongly related sentences with no words in common.  ...  In our work we use two models, continuous bag-of-words and skip-gram, introduced by (Mikolov et al., 2013a).  ... 
doi:10.18653/v1/w17-1003 dblp:conf/acl-multiling/RossielloBS17 fatcat:ahtytnpuqfg7fk7n3wapwqs6oe

Vectorization of Text Documents for Identifying Unifiable News Articles

Anita Kumari Singh, Mogalla Shashi
2019 International Journal of Advanced Computer Science and Applications  
The effectiveness of various text vectorization methods, namely the bag of word representations with tf-idf scores, word embeddings, and document embeddings are investigated for clustering news articles  ...  The proposed work targets at identifying unifiable news articles for performing multi-document summarization.  ...  The documents in each cluster are summarized using a Hybrid Multi-Document Summarization methodology proposed by the authors, details of which are elaborated in the paper [13].  ... 
doi:10.14569/ijacsa.2019.0100742 fatcat:3bolg7lwa5anbb4w5bbra5buj4

A Software System for Topic Extraction and Document Classification

Davide Magatti, Fabio Stella, Marco Faini
2009 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology  
While topic extraction is performed by using an optimized implementation of the Latent Dirichlet Allocation model, multi-label document classification is performed by using a specialized version of the  ...  This dataset is used for topic extraction while an independent dataset, consisting of 1,012 elements labeled by humans, is used to evaluate the performance of the Multi-Net Naive Bayes model.  ...  This software component exploits a general purpose Italian vocabulary to obtain the word-document matrix following the bag-of-words model.  ... 
doi:10.1109/wi-iat.2009.49 dblp:conf/webi/MagattiSF09 fatcat:3snbhltd2zcvtfimlzyxffjctm

Learning to Distill: The Essence Vector Modeling Framework [article]

Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang
2016 arXiv   pre-print
However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization.  ...  Learning representations of words is a pioneering study in this school of research.  ...  the contextual words at the input layer, the model is named the distributed bag-of-words (DBOW) model.  ... 
arXiv:1611.07206v1 fatcat:eenwfx4gr5hojboathny4gsdrq

Measurement of Text Similarity: A Survey

Jiapeng Wang, Yihong Dong
2020 Information  
, and document matching.  ...  The text distance can be divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic  ...  Funding: This research was funded by Natural Science Foundation of Zhejiang Province grant number LY20F020009. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/info11090421 fatcat:wmixtnd7fjfw5nnx6u3vpdktji

Exploring events and distributed representations of text in multi-document summarization

Luís Marujo, Wang Ling, Ricardo Ribeiro, Anatole Gershman, Jaime Carbonell, David Martins de Matos, João P. Neto
2016 Knowledge-Based Systems  
on multi-document summarization.  ...  We explore how to adapt this single-document method for multi-document summarization methods that are able to use event information.  ...  continuous bag-of-words models [18].  ... 
doi:10.1016/j.knosys.2015.11.005 fatcat:zagg74jeinajfhmmhnvimj7b2i

The Influence of Feature Representation of Text on the Performance of Document Classification

Sanda Martinčić-Ipšić, Tanja Miličić, Ljupčo Todorovski
2019 Applied Sciences  
In particular, we consider the most often used family of bag-of-words models, the recently proposed continuous space models word2vec and doc2vec, and the model based on the representation of text documents  ...  While the bag-of-words models have been extensively used for the document classification task, the performance of the other two models for the same task has not been well understood.  ...  Bag-of-Words Model: The bag-of-words (BOW) model represents each document as an unordered set (bag) of features that correspond to the terms in a vocabulary for a given document collection.  ... 
doi:10.3390/app9040743 fatcat:ccvsj6p5cbcrvn6wbsiidfivvy
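
The bag-of-words representation described in the snippet above can be illustrated with a small sketch: a vocabulary is collected from the document collection, and each document becomes a vector of term counts in which word order plays no role. The helper names `build_vocabulary` and `bow_vector`, the whitespace tokenizer, and the use of raw counts are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of terms seen in the collection.

    Assumption: lowercase whitespace tokenization (hypothetical choice).
    """
    return sorted({tok for doc in documents for tok in doc.lower().split()})


def bow_vector(document, vocabulary):
    """Map a document to term counts aligned with the vocabulary order."""
    counts = Counter(document.lower().split())
    # Counter returns 0 for terms that do not occur in the document.
    return [counts[term] for term in vocabulary]
```

Because only counts are kept, two documents containing the same words in a different order map to the same vector, which is the "unordered" property the snippet refers to.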

A Survey of Deep Learning Methods for Relation Extraction [article]

Shantanu Kumar
2017 arXiv   pre-print
Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision.  ...  In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.  ...  Figure 3 summarizes the results of the various multi-instance learning models applied on the distant supervision dataset created by Riedel et al. (2010).  ... 
arXiv:1705.03645v1 fatcat:5iwefizfa5fkvoze5qink2urku

Latent dirichlet allocation based multi-document summarization

Rachit Arora, Balaraman Ravindran
2008 Proceedings of the second workshop on Analytics for noisy unstructured text data - AND '08  
Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary.  ...  Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries.  ...  Also LDA and the summarization algorithms assume the documents to be "bag-of-words" and we don't involve the grammar.  ... 
doi:10.1145/1390749.1390764 dblp:conf/sigir/AroraR08 fatcat:7fjxnhytszevhlcxtmik4adlfq

Generative Adversarial Nets for Multiple Text Corpora [article]

Baiyang Wang, Diego Klabjan
2017 arXiv   pre-print
; (2) the generation of robust bag-of-words document embeddings for each corpora.  ...  We consider multiple text corpora as the input data, for which there can be two applications of GANs: (1) the creation of consistent cross-corpus word embeddings given different word embeddings per corpus  ...  words are used to predict the appearance of each word; the skipgram model, where each neighboring word is used individually for prediction.  ... 
arXiv:1712.09127v1 fatcat:fhur7ebflnhuplyc4wcfeeueae

Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization

Rachit Arora, Balaraman Ravindran
2008 2008 Eighth IEEE International Conference on Data Mining  
Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries.  ...  Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events.  ...  Also LDA, SVD and the summarization algorithm based on it assume the documents to be "bag-of-words" and we don't involve the grammar.  ... 
doi:10.1109/icdm.2008.55 dblp:conf/icdm/AroraR08 fatcat:arqxotiaqzfmhb3rtra7gh6tne

Privacy-Preserving Multi-Document Summarization [article]

Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj
2015 arXiv   pre-print
We use a hashing scheme known as Secure Binary Embeddings to convert documents representation containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances  ...  State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties.  ...  MULTI-DOCUMENT SUMMARIZATION To determine the most representative sentences of a set of documents, we used a multi-document approach based on KP-Centrality [16] .  ... 
arXiv:1508.01420v1 fatcat:tre4pfvjljd4hlnpe5bmdwc6t4

Event-based Multi-document Summarization

Luís Carlos dos Santos Marujo
2016 SIGIR Forum  
We use a hashing scheme known as Secure Binary Embeddings to convert documents representation containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances  ...  State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties.  ...  MULTI-DOCUMENT SUMMARIZATION To determine the most representative sentences of a set of documents, we used a multi-document approach based on KP-Centrality [16] .  ... 
doi:10.1145/2888422.2888448 fatcat:zmfaxn663bgxjeny7eisb2rmaq