Multi-Document Summarization using Distributed Bag-of-Words Model
[article]
2018
arXiv
pre-print
In this paper, we present an unsupervised centroid-based document-level reconstruction framework using distributed bag of words model. ...
As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since it can provide the main ideas in a document set in short time. ...
In this work, we use Distributed bag of words (PV-DBOW) model to represent documents and sentences. ...
arXiv:1710.02745v2
fatcat:bazas5vnzbdorgqcvdodz2x5ge
Exploiting Semantic Term Relations in Text Summarization
2022
International Journal of Information Retrieval Research
The experimental results reveal that performance of our multi-document text summarizer is significantly improved when the distributional term similarity measure is used for finding semantic term relations ...
The traditional frequency based approach to creating multi-document extractive summary ranks sentences based on scores computed by summing up TF*IDF weights of words contained in the sentences. ...
... funded by the Department of Science and Technology (DST) [grant number EEQ/2017/000369], Government of India, under the SERB scheme. ...
doi:10.4018/ijirr.289607
fatcat:hy4jjcucfngmbbtnjscc3invwi
Centroid-based Text Summarization through Compositionality of Word Embeddings
2017
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
The evaluations on multi-document and multilingual datasets prove the effectiveness of the continuous vector representation of words compared to the bag-of-words model. ...
A bag-of-words representation does not allow to grasp the semantic relationships between concepts when comparing strongly related sentences with no words in common. ...
In our work we use two models, continuous bag-of-words and skip-gram, introduced by (Mikolov et al., 2013a). ...
doi:10.18653/v1/w17-1003
dblp:conf/acl-multiling/RossielloBS17
fatcat:ahtytnpuqfg7fk7n3wapwqs6oe
Vectorization of Text Documents for Identifying Unifiable News Articles
2019
International Journal of Advanced Computer Science and Applications
The effectiveness of various text vectorization methods, namely the bag of word representations with tf-idf scores, word embeddings, and document embeddings are investigated for clustering news articles ...
The proposed work targets at identifying unifiable news articles for performing multi-document summarization. ...
The documents in each cluster are summarized using a Hybrid Multi-Document Summarization methodology proposed by the authors, details of which are elaborated in the paper [13]. ...
doi:10.14569/ijacsa.2019.0100742
fatcat:3bolg7lwa5anbb4w5bbra5buj4
A Software System for Topic Extraction and Document Classification
2009
2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology
While topic extraction is performed by using an optimized implementation of the Latent Dirichlet Allocation model, multi-label document classification is performed by using a specialized version of the ...
This dataset is used for topic extraction while an independent dataset, consisting of 1,012 elements labeled by humans, is used to evaluate the performance of the Multi-Net Naive Bayes model. ...
This software component exploits a general purpose Italian vocabulary to obtain the word-document matrix following the bag-of-words model. ...
doi:10.1109/wi-iat.2009.49
dblp:conf/webi/MagattiSF09
fatcat:3snbhltd2zcvtfimlzyxffjctm
Learning to Distill: The Essence Vector Modeling Framework
[article]
2016
arXiv
pre-print
However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. ...
Learning representations of words is a pioneering study in this school of research. ...
the contextual words at the input layer, the model is named the distributed bag-of-words (DBOW) model. ...
arXiv:1611.07206v1
fatcat:eenwfx4gr5hojboathny4gsdrq
Measurement of Text Similarity: A Survey
2020
Information
... and document matching. ...
The text distance can be divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic ...
Funding: This research was funded by Natural Science Foundation of Zhejiang Province grant number LY20F020009.
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/info11090421
fatcat:wmixtnd7fjfw5nnx6u3vpdktji
Exploring events and distributed representations of text in multi-document summarization
2016
Knowledge-Based Systems
... on multi-document summarization. ...
We explore how to adapt this single-document method for multi-document summarization methods that are able to use event information. ...
... continuous bag-of-words models [18]. ...
doi:10.1016/j.knosys.2015.11.005
fatcat:zagg74jeinajfhmmhnvimj7b2i
The Influence of Feature Representation of Text on the Performance of Document Classification
2019
Applied Sciences
In particular, we consider the most often used family of bag-of-words models, the recently proposed continuous space models word2vec and doc2vec, and the model based on the representation of text documents ...
While the bag-of-words models have been extensively used for the document classification task, the performance of the other two models for the same task has not been well understood. ...
Bag-of-Words Model: The bag-of-words (BOW) model represents each document as an unordered set (bag) of features that correspond to the terms in a vocabulary for a given document collection. ...
doi:10.3390/app9040743
fatcat:ccvsj6p5cbcrvn6wbsiidfivvy
A Survey of Deep Learning Methods for Relation Extraction
[article]
2017
arXiv
pre-print
Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. ...
In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead. ...
Figure 3 summarizes the results of the various multi-instance learning models applied on the distant supervision dataset created by Riedel et al. (2010). ...
arXiv:1705.03645v1
fatcat:5iwefizfa5fkvoze5qink2urku
Latent dirichlet allocation based multi-document summarization
2008
Proceedings of the second workshop on Analytics for noisy unstructured text data - AND '08
Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary. ...
Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. ...
Also LDA and the summarization algorithms assume the documents to be "bag-of-words" and we don't involve the grammar. ...
doi:10.1145/1390749.1390764
dblp:conf/sigir/AroraR08
fatcat:7fjxnhytszevhlcxtmik4adlfq
Generative Adversarial Nets for Multiple Text Corpora
[article]
2017
arXiv
pre-print
We consider multiple text corpora as the input data, for which there can be two applications of GANs: (1) the creation of consistent cross-corpus word embeddings given different word embeddings per corpus; (2) the generation of robust bag-of-words document embeddings for each corpus. ...
words are used to predict the appearance of each word; the skipgram model, where each neighboring word is used individually for prediction. ...
arXiv:1712.09127v1
fatcat:fhur7ebflnhuplyc4wcfeeueae
Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization
2008
2008 Eighth IEEE International Conference on Data Mining
Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. ...
Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events. ...
Also LDA, SVD and the summarization algorithm based on it assume the documents to be "bag-of-words" and we don't involve the grammar. ...
doi:10.1109/icdm.2008.55
dblp:conf/icdm/AroraR08
fatcat:arqxotiaqzfmhb3rtra7gh6tne
Privacy-Preserving Multi-Document Summarization
[article]
2015
arXiv
pre-print
We use a hashing scheme known as Secure Binary Embeddings to convert documents representation containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances ...
State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties. ...
Multi-Document Summarization: To determine the most representative sentences of a set of documents, we used a multi-document approach based on KP-Centrality [16]. ...
arXiv:1508.01420v1
fatcat:tre4pfvjljd4hlnpe5bmdwc6t4
Event-based Multi-document Summarization
2016
SIGIR Forum
We use a hashing scheme known as Secure Binary Embeddings to convert documents representation containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances ...
State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties. ...
Multi-Document Summarization: To determine the most representative sentences of a set of documents, we used a multi-document approach based on KP-Centrality [16]. ...
doi:10.1145/2888422.2888448
fatcat:zmfaxn663bgxjeny7eisb2rmaq