Top-k interesting phrase mining in ad-hoc collections using sequence pattern indexing
2012
Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12
In this paper we consider the problem of mining frequently occurring interesting phrases in large document collections in an ad-hoc fashion. ...
Our approach to mine the top-k most interesting phrases consists of a novel indexing technique, called Sequence Pattern Indexing (SeqPattIndex), that benefits from the observation that phrases often overlap ...
Algorithm 1: Top-k Phrase Mining (Sequence Pattern Indexing). Function: mine(D, k). Input: Current Document Collection D ⊆ D, Result Number k. Output: Top-k Interesting Phrase List. result ← ∅; foreach ...
doi:10.1145/2247596.2247628
dblp:conf/edbt/GaoM12
fatcat:4p44ufclajbbnk7sfh5bmxpyxe
Interesting-phrase mining for ad-hoc text analytics
2010
Proceedings of the VLDB Endowment
We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. ...
While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. ...
Finding interesting phrases is related to mining frequent subsequences in sequence collections. ...
doi:10.14778/1920841.1921007
fatcat:lmkv2yfnarhchmxszer4zsnsja
Multi-Dimensional, Phrase-Based Summarization in Text Cubes
2016
IEEE Data Engineering Bulletin
To quickly digest the content of subsets of documents in the multi-dimensional context, we study the problem of phrase-based summarization of a subset of documents of interest. ...
representative phrases. ...
It mines top-k representative phrases based on three criteria: integrity, popularity and distinctiveness. ...
dblp:journals/debu/TaoZYWCKVH16
fatcat:jwez2zkmpncx7o3uts3pavmrie
Labeling Document Clusters with Thematic Phrases
2017
IARJSET
The work considers embedding external knowledge to terms using WordNet and provides an approach to derive a theme in the group of documents and label that group with the most appropriate Phrase. ...
To solve the problem, a phrase based cluster labeling is considered in this work. ...
methods Data mining; IP Dimensional feature space Image processing; NS Tolerant authentication broadcast authentication protocol Ad hoc networks; ST Test plans improvement using simulated defect ...
doi:10.17148/iarjset.2017.4703
fatcat:fc2iticjnbgijhx5ayhye7cocy
Processing Conjunctive and Phrase Queries with the Set-Based Model
[chapter]
2004
Lecture Notes in Computer Science
For the TReC-8 collection, our extension led to a gain, relative to the standard vector space model, of 23.32% and 18.98% in average precision curves for conjunctive and phrase queries, respectively. ...
and phrase queries. ...
This pruning strategy is incorporated in the maximal termsets enumeration algorithm.
Phrase Queries: Search engines are used to find data in response to ad hoc queries. ...
doi:10.1007/978-3-540-30213-1_25
fatcat:o7hjcwhpmrhoplxf7lhxikaytu
PRIIME: A Generic Framework for Interactive Personalized Interesting Pattern Discovery
[article]
2016
arXiv
pre-print
In this work, we propose an interactive pattern discovery framework named PRIIME which identifies a set of interesting patterns for a specific user without requiring any prior input on the interestingness ...
The proposed framework is generic to support discovery of the interesting set, sequence and graph type patterns. ...
[14] develops a toolbox with interestingness measures, mining and post-processing algorithms as built-ins that can assist a user to visually mine interesting patterns. ...
arXiv:1607.05749v1
fatcat:mghsgatrfbegnlfbcmf4t6m4ka
PRIIME: A generic framework for interactive personalized interesting pattern discovery
2016
2016 IEEE International Conference on Big Data (Big Data)
In this work, we propose an interactive pattern discovery framework named PRIIME which identifies a set of interesting patterns for a specific user without requiring any prior input on the interestingness ...
The proposed framework is generic to support discovery of the interesting set, sequence and graph type patterns. ...
[14] develops a toolbox with interestingness measures, mining and post-processing algorithms as built-ins that can assist a user to visually mine interesting patterns. ...
doi:10.1109/bigdata.2016.7840653
dblp:conf/bigdataconf/BhuiyanH16
fatcat:z4fps5cjszhnxbx5qei7exvgiy
An Intelligent Approach to Information Retrieval System Using Enhanced DIG and FP-Tree Techniques
2014
IOSR Journal of Computer Engineering
Consequently, the enhanced indexing technique named Document Index Graph (DIG) is used for indexing the document collection in order to match and retrieve information efficiently. ...
The most frequently appearing words are planted into FP (Frequent Pattern) Tree. The FP-tree is a compact representation of all relevant frequently occurring information in a corpus. ...
It tries to find interesting patterns from the corpus. ...
doi:10.9790/0661-16516778
fatcat:lres7cpjq5dlxjgango4phych4
Scalable ad-hoc entity extraction from text collections
2008
Proceedings of the VLDB Endowment
In this paper, we introduce the "ad-hoc" entity extraction task where entities of interest are constrained to be from a list of entities that is specific to the task. ...
In such scenarios, traditional entity extraction techniques that process all the documents for each ad-hoc entity extraction task can be significantly expensive. ...
Conclusions: In this paper, we considered the problem of ad-hoc entity extraction from indexed document collections. ...
doi:10.14778/1453856.1453958
fatcat:gkeokco6urfddomvma22svop4u
Automatic assignment of biomedical categories: toward a generic approach
2005
Bioinformatics
Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. ...
Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: ...
Using noun phrases: The index of phrases is used to reorder the set of terms returned by the engine. ...
doi:10.1093/bioinformatics/bti783
pmid:16287934
fatcat:sodow3a22vfrtmflp4qfbjw45y
Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval
2007
Seventh IEEE International Conference on Data Mining (ICDM 2007)
However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. ...
Thus our model can model "white house" as a special meaning phrase in the 'politics' topic, but not in the 'real estate' topic. Successive bigrams form longer phrases. ...
modeling framework to conduct ad-hoc retrieval on Gigabyte TREC collections. ...
doi:10.1109/icdm.2007.86
dblp:conf/icdm/WangMW07
fatcat:ciwc47rj45djnmmcexdb46jxfi
Moving Objects Analytics: Survey on Future Location & Trajectory Prediction Methods
[article]
2018
arXiv
pre-print
We also list the properties of several real datasets used in the past for validation purposes of those works and, motivated by this, we discuss challenges that arise in the transition from conventional ...
Phrases: mobility data, moving object trajectories, trajectory prediction, future location prediction. ...
and sequence mining and corset discovery, to discover different types of pattern. ...
arXiv:1807.04639v1
fatcat:lvje57kod5eldaplkl53wbwgti
Further reflections on TREC
2000
Information Processing & Management
The paper focuses on the ad hoc retrieval task, with discussion of other test tracks as appropriate. ...
The analysis of the tests is presented through a series of key questions about indexing models, document and query descriptions, search strategies, etc. ...
Acknowledgements: I am grateful to Stephen Robertson and Donna Harman for their comments, and wish specifically to acknowledge the latter's scrupulous care in distinguishing her points as from a Referee, ...
doi:10.1016/s0306-4573(99)00044-8
fatcat:lqrad3oravfsxkzahlpj3afmea
Automatic labeling of multinomial topic models
2007
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '07
Multinomial distributions over words are frequently used to model topics in text collections. ...
A common, major challenge in applying all such topic models to any text mining problem is to label a multinomial topic model accurately so that a user can interpret the discovered topic. ...
The advantage of such an ngram testing approach is that it does not require training data, and is applicable to text collection of any ad hoc domains/topics. ...
doi:10.1145/1281192.1281246
dblp:conf/kdd/MeiSZ07
fatcat:gpxwjhaeirfnlgrrx2vm5gxqvu
Advanced document description, a sequential approach
2006
SIGIR Forum
One natural improvement over this representation is the extraction and use of cohesive word sequences. ...
In this dissertation, we consider the problem of the extraction, selection and exploitation of word sequences, with a particular focus on the applicability of our work to domain-independent document collections ...
Sequential Pattern Mining Techniques: SPADE. ...
doi:10.1145/1147197.1147212
fatcat:k32ofbs5szd4hph4mgq6yq65am