A Lakshmi Deepthi
2013 International Journal of Research in Engineering and Technology  
Graph-Based Document Clustering works with frequent senses rather than frequent keywords used in traditional text mining techniques. Similarity between a pair of objects can be defined either explicitly  ...  With this paper, we analyzed existing multi-viewpoint based similarity measure and two related clustering methods.  ...  We aim to cluster documents based on the similarity of one's sub graphs in the document-graphs.  ... 
doi:10.15623/ijret.2013.0208012 fatcat:ck7bwdueyveirhlvihfazvxqh4

Searching by corpus with fingerprints

Charu C. Aggarwal, Wangqun Lin, Philip S. Yu
2012 Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12  
To the best of our knowledge, this is the first work on corpus-based search in massive document collections.  ...  An even more general case is one in which a collection of documents is available as a query to the search process. In such cases, it is desirable to return sets of all pairwise similar documents.  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory  ... 
doi:10.1145/2247596.2247638 dblp:conf/edbt/AggarwalLY12 fatcat:g7pzad624ng2hkikjot37ele2i

Web Document Clustering Using Document Index Graph

B. F. Momin, P. J. Kulkarni, Amol Chaudhari
2006 2006 International Conference on Advanced Computing and Communications  
Hence the first part of the paper presents a phrase-based model, Document Index Graph (DIG), which allows incremental phrase-based encoding of documents and efficient phrase matching.  ...  It emphasizes the effectiveness of the phrase-based similarity measure over traditional single-term-based similarities.  ...  In this paper, we first used the DIG model to demonstrate the effectiveness of phrase-based similarity over term-based similarity, and then we proposed the DIGBC algorithm to cluster documents efficiently.  ... 
doi:10.1109/adcom.2006.4289851 fatcat:nluwzarscneppogw7jthrts3fm

Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs [article]

Rui Portocarrero Sarmento, Pavel Brazdil
2018 arXiv   pre-print
Thus, with this information, we leverage optimized algorithms used for graph-based applications.  ...  We are using bipartite graphs - one type of node are documents, and the other type of nodes are words - to know what documents are affected with a word arrival at the stream (the neighbors of the word  ...  Regarding our goal, the efficient update of the similarity between documents (ICS), we use the bipartite graph first order neighbors for new or updated words in the  ... 
arXiv:1811.11746v1 fatcat:l4poiarqvvfqrbaol2efi7xmd4

Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations [article]

Yan Xiao, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, Xueqi Cheng
2018 arXiv   pre-print
Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents, and propose two hybrid search schemes based on both neural  ...  That is to leverage neural representations to help re-rank a set of candidate documents, which are typically obtained from an initial retrieval stage based on some symbolic index and search scheme (e.g  ...  We expand these seeds to associate their semantically similar documents based on the graph index, i.e., obtaining their k-NNs via looking up the graph.  ... 
arXiv:1806.10869v2 fatcat:f7ggl2nnszchzdhqmkupfc63y4

A Framework for Semantic Text Clustering

Soukaina Fatimi, Chama EL, Larbi Alaoui
2020 International Journal of Advanced Computer Science and Applications  
to efficiently identify the more related groups in a document collection.  ...  The framework allows documents RDF representation, clustering, topic modeling, clusters summarizing, information retrieval based on RDF querying and Reasoning tools.  ...  We present an overall framework, and show how to apply machine learning techniques to mine textual documents using This work is within the framework of the research project "Big Data Analytics -Methods  ... 
doi:10.14569/ijacsa.2020.0110657 fatcat:undy4wffzvgkxiopyhtys64zuu

Document Similarity Search Algorithm Based On Hierarchy Model

Zhu Ge
2015 International Journal of Database Theory and Application  
The calculation of the similarity is based on the total probability model and the efficient search is achieved via level n nodes and paths of citation graph.  ...  Herein, a new document similarity calculation and search method with high efficiency is proposed.  ...  Search Algorithm Based on the citation graph and calculation of similarity, this paper puts forward Next-Level algorithm for document similarity search. Next-Level is a hierarchical search algorithm.  ... 
doi:10.14257/ijdta.2015.8.3.19 fatcat:xhzm53saffcdno6ydt4qjykyjy

Fast document summarization using locality sensitive hashing and memory access efficient node ranking

Ercan Canhasi
2016 International Journal of Electrical and Computer Engineering (IJECE)  
The common text modeling method connects a pair of sentences based on their similarities.  ...  Even though it can effectively represent the sentence similarity graph of given document(s), its big drawback is a large time complexity of $O(n^2)$, where n represents the number of sentences.  ...  We describe the method for sub-linear time text modeling by means of sentence similarity graph and very efficient node ranking in those graphs.  ... 
doi:10.11591/ijece.v6i3.9030 fatcat:2em66rwag5bm5mi3izyjhrm7ga

Learning multiple graphs for document recommendations

Ding Zhou, Shenghuo Zhu, Kai Yu, Xiaodan Song, Belle L. Tseng, Hongyuan Zha, C. Lee Giles
2008 Proceeding of the 17th international conference on World Wide Web - WWW '08  
used based on the nature of different graphs.  ...  Due to the sparsity of a single graph and noise in graph construction, we propose a new method for combining multiple graphs to measure document similarities, where different factorization strategies are  ...  The heart of memory-based CF methods is the measurement of similarity: either the similarity of users (a.k.a user-based CF) or the similarity of items (a.k.a items-based CF) or a hybrid of both.  ... 
doi:10.1145/1367497.1367517 dblp:conf/www/ZhouZYSTZG08 fatcat:rrxpxevxrfdlflnafigr5j6bke


Saiful Bahri Musa, Andi Baso Kaswar, Supria Supria, Susiana Sari
2016 Jurnal Ilmu Komputer dan Informasi  
In this paper, we proposed a new method for document clustering with dynamic hierarchy algorithm based on fuzzy set type - II from frequent itemset.  ...  Based on the experiment, it resulted in F-measure values of 0.40 for Newsgroup, 0.62 for Classic and 0.38 for Reuters.  ...  Max-graph relies on the maximum β-similarity relationship and it is a sub-graph of the first one. The vertices of the graph are the same as the vertices in the β-similarity graph.  ... 
doi:10.21609/jiki.v9i2.383 fatcat:effuztb4c5egfj6r6u4ntlgb4m

Graph-Based Extractive Text Summarization Models: A Systematic Review

Abdulkadir Bichi, Pantea Keikhosrokiani, Rohayanti Hassan, Khalil Almekhlafi
2022 Journal of Information Technology Management  
This paper presents a novel systematic review of various graph-based automatic text summarization models.  ...  Many approaches and algorithms have been proposed for automatic text summarization including; supervised machine learning, clustering, graph-based and lexical chain, among others.  ...  Semantic Graph-Based Model The semantic graph-based model used a semantic similarity measure to determine relations between document sentences The method used semantic properties of the documents, such  ... 
doi:10.22059/jitm.2022.84899 doaj:66273e45f5d7421dbdf101925fbfc6b8 fatcat:7wuowbtj4few3kp37sl5g4i42m

Virtualizing Document Algorithms using Predictive Semantic Data

Dr. Muhammad Usman Tariq, TJPRC
2020 International Journal of Mechanical and Production Engineering Research and Development  
Users can use the approach to perform the translation efficiently. Moreover, the research focuses on the construction of a virtual document and queries for the semantic web data.  ...  Often traditional search methods do not offer the adequate and required level of matching users' information with the available online documents, which act as a barrier for efficient usage and reproduction  ...  In order to search efficiently and effectively from a given keywords query, the virtual documents from RDF graph nodes are constructed and stored in the knowledge base.  ... 
doi:10.24247/ijmperdjun2020202 fatcat:hzdo55w7dbdznelkvu4wpu7ura

Sharding for literature search via cutting citation graphs

Haozhen Zhao
2014 2014 IEEE International Conference on Big Data (Big Data)  
This paper proposes a novel sharding policy for literature search that is based on cutting the document citation and co-citation graphs.  ...  Experiments on the iSearch test collection reveal that relevant documents for a given query distribute over the shards generated through citation graph cutting in such a pattern that a few shards becomes  ...  A possible way to address this is to add a step following graph partition that assigns documents out of the citation graphs into their most similar shards.  ... 
doi:10.1109/bigdata.2014.7004500 dblp:conf/bigdataconf/Zhao14 fatcat:6eaycgjtxraltmhsts37x6awey

Document Clustering based on Phrase and Single Term Similarity using Neo4j

Document representation - The DIG model incrementally constructs the graph and simultaneously finds the shared phrases between the current document and previously inserted documents from the graph.  ...  The hybrid similarities are used with the well-known density-based clustering technique DBSCAN to assess their effect on the quality of the clusters.  ...  For getting phrase-based similarity, different document representation models like n-gram [9], suffix tree [3], and document index graph (DIG) [7], [8] are used.  ... 
doi:10.35940/ijitee.c9050.109320 fatcat:t4a24pa3argxdecpi66t5sqchi

A Bag-of-Paths Based Serialized Subgraph Matching for Symbol Spotting in Line Drawings [chapter]

Anjan Dutta, Josep Lladós, Umapada Pal
2011 Lecture Notes in Computer Science  
Similar paths within the whole collection of documents are clustered and organized in a lookup table for efficient indexing.  ...  Efficient indexing of common substructures helps to reduce the computational burden of usual graph based methods.  ...  Moreover, graphs have long been widely adopted by the research community as a robust tool; as a result, lots of efficient methods and algorithms are available to handle the graph based methods.  ... 
doi:10.1007/978-3-642-21257-4_77 fatcat:q4vq32eszveslp5u5u6etgx4jy