3,443 Hits in 8.4 sec

Top-k interesting phrase mining in ad-hoc collections using sequence pattern indexing

Chuancong Gao, Sebastian Michel
2012 Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12  
In this paper we consider the problem of mining frequently occurring interesting phrases in large document collections in an ad-hoc fashion.  ...  Our approach to mine the top-k most interesting phrases consists of a novel indexing technique, called Sequence Pattern Indexing (SeqPattIndex), that benefits from the observation that phrases often overlap  ...  Algorithm 1: Top-k Phrase Mining (Sequence Pattern Indexing) Function: mine(D , k) Input: Current Document Collection D ⊆ D, Result Number Output: Top-k Interesting Phrase List 1 result ← ∅; 2 foreach  ... 
doi:10.1145/2247596.2247628 dblp:conf/edbt/GaoM12 fatcat:4p44ufclajbbnk7sfh5bmxpyxe

Interesting-phrase mining for ad-hoc text analytics

Srikanta Bedathur, Klaus Berberich, Jens Dittrich, Nikos Mamoulis, Gerhard Weikum
2010 Proceedings of the VLDB Endowment  
We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus.  ...  While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases.  ...  Finding interesting phrases is related to mining frequent subsequences in sequence collections.  ... 
doi:10.14778/1920841.1921007 fatcat:lmkv2yfnarhchmxszer4zsnsja

Multi-Dimensional, Phrase-Based Summarization in Text Cubes

Fangbo Tao, Honglei Zhuang, Chi Wang Yu, Qi Wang, Taylor Cassidy, Lance M. Kaplan, Clare R. Voss, Jiawei Han
2016 IEEE Data Engineering Bulletin  
To quickly digest the content of subsets of documents in the multi-dimensional context, we study the problem of phrase-based summarization of a subset of documents of interest.  ...  representative phrases.  ...  It mines top-k representative phrases based on three criteria: integrity, popularity and distinctiveness.  ... 
dblp:journals/debu/TaoZYWCKVH16 fatcat:jwez2zkmpncx7o3uts3pavmrie

Labeling Document Clusters with Thematic Phrases

Dr. Y. Sri Lalitha, Dr. N. V. Ganapathi Raju, Dr. O. Srinivasa Rao
2017 IARJSET  
The work considers embedding external knowledge to terms using WordNet and provides an approach to derive a theme in the group of documents and label that group with the most appropriate Phrase.  ...  To solve the problem, a phrase based cluster labeling is considered in this work.  ...  methods Data mining IP Dimensional feature space Image processing NS Tolerant authentication broadcast authentication protocol Ad hoc networks ST Test plans improvement using simulated defect  ... 
doi:10.17148/iarjset.2017.4703 fatcat:fc2iticjnbgijhx5ayhye7cocy

Processing Conjunctive and Phrase Queries with the Set-Based Model [chapter]

Bruno Pôssas, Nivio Ziviani, Berthier Ribeiro-Neto, Wagner Meira
2004 Lecture Notes in Computer Science  
For the TReC-8 collection, our extension led to a gain, relative to the standard vector space model, of 23.32% and 18.98% in average precision curves for conjunctive and phrase queries, respectively.  ...  and phrase queries.  ...  This pruning strategy is incorporated in the maximal termsets enumeration algorithm. Phrase Queries Search engines are used to find data in response to ad hoc queries.  ... 
doi:10.1007/978-3-540-30213-1_25 fatcat:o7hjcwhpmrhoplxf7lhxikaytu

PRIIME: A Generic Framework for Interactive Personalized Interesting Pattern Discovery [article]

Mansurul Bhuiyan, Mohammad Al Hasan
2016 arXiv   pre-print
In this work, we propose an interactive pattern discovery framework named PRIIME which identifies a set of interesting patterns for a specific user without requiring any prior input on the interestingness  ...  The proposed framework is generic to support discovery of the interesting set, sequence and graph type patterns.  ...  [14] develops a toolbox with interestingness measures, mining and post-processing algorithms as built-ins that can assist a user to visually mine interesting patterns.  ... 
arXiv:1607.05749v1 fatcat:mghsgatrfbegnlfbcmf4t6m4ka

PRIIME: A generic framework for interactive personalized interesting pattern discovery

Mansurul A Bhuiyan, Mohammad Al Hasan
2016 2016 IEEE International Conference on Big Data (Big Data)  
In this work, we propose an interactive pattern discovery framework named PRIIME which identifies a set of interesting patterns for a specific user without requiring any prior input on the interestingness  ...  The proposed framework is generic to support discovery of the interesting set, sequence and graph type patterns.  ...  [14] develops a toolbox with interestingness measures, mining and post-processing algorithms as built-ins that can assist a user to visually mine interesting patterns.  ... 
doi:10.1109/bigdata.2016.7840653 dblp:conf/bigdataconf/BhuiyanH16 fatcat:z4fps5cjszhnxbx5qei7exvgiy

An Intelligent Approach to Information Retrieval System Using Enhanced DIG and FP-Tree Techniques

P. Janarthanan, N. Rajkumar
2014 IOSR Journal of Computer Engineering  
Consequently, the enhanced indexing technique named Document Index Graph (DIG) used for indexing document collection in order to match and retrieve information efficiently.  ...  The most frequently appearing words are planted into FP (Frequent Pattern) Tree. The FP-tree is a compact representation of all relevant frequently occurring information in a corpus.  ...  It tries to find interesting patterns from the corpus.  ... 
doi:10.9790/0661-16516778 fatcat:lres7cpjq5dlxjgango4phych4

Scalable ad-hoc entity extraction from text collections

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
2008 Proceedings of the VLDB Endowment  
In this paper, we introduce the "ad-hoc" entity extraction task where entities of interest are constrained to be from a list of entities that is specific to the task.  ...  In such scenarios, traditional entity extraction techniques that process all the documents for each ad-hoc entity extraction task can be significantly expensive.  ...  CONCLUSIONS In this paper, we considered the problem of ad-hoc entity extraction from indexed document collections.  ... 
doi:10.14778/1453856.1453958 fatcat:gkeokco6urfddomvma22svop4u

Automatic assignment of biomedical categories: toward a generic approach

P. Ruch
2005 Bioinformatics  
Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units.  ...  Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary:  ...  Using noun phrases The index of phrases is used to reorder the set of terms returned by the engine.  ... 
doi:10.1093/bioinformatics/bti783 pmid:16287934 fatcat:sodow3a22vfrtmflp4qfbjw45y

Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

Xuerui Wang, Andrew McCallum, Xing Wei
2007 Seventh IEEE International Conference on Data Mining (ICDM 2007)  
However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks.  ...  Thus our model can model "white house" as a special meaning phrase in the 'politics' topic, but not in the 'real estate' topic. Successive bigrams form longer phrases.  ...  modeling framework to conduct ad-hoc retrieval on Gigabyte TREC collections.  ... 
doi:10.1109/icdm.2007.86 dblp:conf/icdm/WangMW07 fatcat:ciwc47rj45djnmmcexdb46jxfi

Moving Objects Analytics: Survey on Future Location & Trajectory Prediction Methods [article]

Harris Georgiou, Sophia Karagiorgou, Yannis Kontoulis, Nikos Pelekis, Petros Petrou, David Scarlatti, Yannis Theodoridis
2018 arXiv   pre-print
We also list the properties of several real datasets used in the past for validation purposes of those works and, motivated by this, we discuss challenges that arise in the transition from conventional  ...  Phrases: mobility data, moving object trajectories, trajectory prediction, future location prediction.  ...  and sequence mining and corset discovery, to discover different types of pattern.  ... 
arXiv:1807.04639v1 fatcat:lvje57kod5eldaplkl53wbwgti

Further reflections on TREC

Karen Sparck Jones
2000 Information Processing & Management  
The paper focuses on the ad hoc retrieval task, with discussion of other test tracks as appropriate.  ...  The analysis of the tests is presented through a series of key questions about indexing models, document and query descriptions, search strategies, etc.  ...  Acknowledgements I am grateful to Stephen Robertson and Donna Harman for their comments, and wish specifically to acknowledge the latter's scrupulous care in distinguishing her points as from a Referee,  ... 
doi:10.1016/s0306-4573(99)00044-8 fatcat:lqrad3oravfsxkzahlpj3afmea

Automatic labeling of multinomial topic models

Qiaozhu Mei, Xuehua Shen, ChengXiang Zhai
2007 Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '07  
Multinomial distributions over words are frequently used to model topics in text collections.  ...  A common, major challenge in applying all such topic models to any text mining problem is to label a multinomial topic model accurately so that a user can interpret the discovered topic.  ...  The advantage of such an ngram testing approach is that it does not require training data, and is applicable to text collection of any ad hoc domains/topics.  ... 
doi:10.1145/1281192.1281246 dblp:conf/kdd/MeiSZ07 fatcat:gpxwjhaeirfnlgrrx2vm5gxqvu

Advanced document description, a sequential approach

Antoine Doucet
2006 SIGIR Forum  
One natural improvement over this representation is the extraction and use of cohesive word sequences.  ...  In this dissertation, we consider the problem of the extraction, selection and exploitation of word sequences, with a particular focus on the applicability of our work to domain-independent document collections  ...  Sequential Pattern Mining Techniques SPADE.  ... 
doi:10.1145/1147197.1147212 fatcat:k32ofbs5szd4hph4mgq6yq65am
Showing results 1 to 15 out of 3,443 results