Utility-driven Mining of Contiguous Sequences
[article]
2021
arXiv
pre-print
In this paper, we propose a novel algorithm, fast utility-driven contiguous sequential pattern mining (FUCPM), to address the CSPM problem. ...
Recently, contiguous sequential pattern mining (CSPM) gained interest as a research topic, due to its varied potential real-world applications, such as web log and biological sequence analysis. ...
Previous studies on CSPM are mostly based on frequency, while only a few studies [25], [26], [27] have been conducted to address the problem of utility-driven contiguous sequential pattern mining ...
arXiv:2111.00247v1
fatcat:jywfqf3tyjhxrcilcnp4oo4ntu
Gathering meta‐data and instances from object referral lists on the web
2006
Online information review (Print)
Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities ...
Originality/value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain ...
An instance is defined as a sequence of one or more contiguous atoms. (4) Instance compatibility. ...
doi:10.1108/14684520610675807
fatcat:tldxs7lirnafxj53mwrk6wuqb4
A framework for representing navigational patterns as full temporal objects
2004
ACM SIGecom Exchanges
This is a shift from existing techniques that are driven by pre-defined thresholds that can only support partial temporal representation of navigational patterns. ...
The proposed framework also enhances the understanding and interpretation of discovered patterns, and provides a rich environment for integrating the analysis of navigational patterns with data from the ...
Our results show that mining all contiguous navigational patterns utilizing the techniques discussed in this paper performs well compared to the existing techniques that make use of support thresholds. ...
doi:10.1145/1120687.1120691
fatcat:wh74nie7e5cnzb4skcxdbl4fju
FlyExpress 7: An Integrated Discovery Platform To Study Coexpressed Genes Using in Situ Hybridization Images in Drosophila
2017
G3: Genes, Genomes, Genetics
analytical and visual mining of these patterns. ...
(image) and genomic (sequence) data. ...
This work was supported by the National Institutes of Health (grant HG002516-09 to S.K.).
LITERATURE CITED ...
doi:10.1534/g3.117.040345
pmid:28667017
pmcid:PMC5555482
fatcat:4s4dtcooazbklliffko5gvbwum
Semantic Partitioning of Web Pages
[chapter]
2005
Lecture Notes in Computer Science
In this paper we describe the semantic partitioner algorithm, that uses the structural and presentation regularities of the Web pages to automatically transform them into hierarchical content structures ...
Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of ...
Our pattern mining algorithm is different from these approaches and parses the given sequence in a bottom-up fashion and infers the grammar on-the-fly as it goes through the sequence multiple number of ...
doi:10.1007/11581062_9
fatcat:64phk6gijrfmvjv333dywrveb4
Metagenomics for Mining New Genetic Resources of Microbial Communities
2009
Journal of Molecular Microbiology and Biotechnology
Activity-based screening of such libraries has demonstrated that this new diversity is not simply variations on known sequence themes, but rather the existence of entirely new sequence classes and novel ...
Recent progress has revealed that the capture of genetic resources of complex microbial communities in metagenome libraries allows the discovery of a richness of new enzymatic diversity that had not previously ...
normally is, depending on sequencing method, between 80 and 1,000 bp, it results in artificial contiguous assemblies of DNA sequences derived from different hosts [DeLong, 2005; Noguchi et al., 2006; ...
doi:10.1159/000142898
pmid:18957866
fatcat:66hzhsyduvaldeq4ue4jjwnenq
Superstring-Based Sequence Obfuscation to Thwart Pattern Matching Attacks
[article]
2021
arXiv
pre-print
The algorithms are evaluated on synthetic data traces and on the Reality Mining Dataset to demonstrate their utility. ...
By relating the problem to a set of combinatorial questions on sequence construction, we are able to provide provable guarantees for our proposed constructions. ...
• We validate the developed approaches on both synthetic data and the Reality Mining dataset to demonstrate their utility and compare their performance (Section VIII). ...
arXiv:2108.12336v1
fatcat:7qy4ejhmbbceddwvnkkhrzs4fu
Scalable Topical Phrase Mining from Text Corpora
[article]
2014
arXiv
pre-print
Existing work either performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. ...
As such, we consider the problem of discovering topical phrases of mixed lengths. ...
We formally define phrases and other necessary notation and terminology as follows: • A phrase is a sequence of contiguous tokens: $P = \{w_{d,i}, \ldots, w_{d,i+n}\}$, $n > 0$ • A partition over d-th document is a sequence ...
arXiv:1406.6312v2
fatcat:umrmdntoabhntf4knzkvflywji
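The phrase definition quoted in the entry above (a phrase as a run of contiguous tokens within one document) can be illustrated with a minimal sketch. It only enumerates candidate spans and is not the paper's mining algorithm. The function name `contiguous_phrases` and the `max_len` cap are hypothetical, and including single-token spans is an assumption made for simplicity.

```python
from typing import List, Tuple


def contiguous_phrases(tokens: List[str], max_len: int) -> List[Tuple[str, ...]]:
    """Enumerate every contiguous token span of length 1..max_len.

    Hypothetical helper for illustration only: it lists candidate
    phrases and does no frequency counting or topic modeling.
    """
    phrases: List[Tuple[str, ...]] = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break  # the span would run past the end of the document
            phrases.append(tuple(tokens[i:i + n]))
    return phrases


if __name__ == "__main__":
    for phrase in contiguous_phrases(["support", "vector", "machine"], 2):
        print(" ".join(phrase))
```

Tokens that are not adjacent, such as "support" and "machine" here, never form a candidate, which is what distinguishes contiguous phrases from general subsequences.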
Scalable topical phrase mining from text corpora
2014
Proceedings of the VLDB Endowment
Existing work either performs post processing to the results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. ...
As such, we consider the problem of discovering topical phrases of mixed lengths. ...
We formally define phrases and other necessary notation and terminology as follows: • A phrase is a sequence of contiguous tokens: $P = \{w_{d,i}, \ldots, w_{d,i+n}\}$, $n > 0$ • A partition over d-th document is a sequence ...
doi:10.14778/2735508.2735519
fatcat:l5dgrmk3sngg3aghbnyrbahmli
Hierarchically clustered HMM for protein sequence motif extraction with variable length
2014
Tsinghua Science and Technology
Protein sequence motif extraction is an important field of bioinformatics because of its relevance to structural analysis. ...
Related data mining fields that use Hidden Markov Models may also benefit from this approach of clustering the HMMs themselves. ...
Motifs can then be extracted without any assumption on the length of the motif by analyzing the clusters and extracting contiguous sequences with a given threshold of clustered proteins. ...
doi:10.1109/tst.2014.6961032
fatcat:hpqnbnaldnfr3csezc3ohg6h2e
Data Mining for Modeling Chiller Systems in Data Centers
[chapter]
2010
Lecture Notes in Computer Science
We present a data mining approach to model the cooling infrastructure in data centers, particularly the chiller ensemble. ...
These infrastructures are poorly understood due to the lack of "first principles" models of chiller systems. At the same time, they abound in data due to instrumentation by modern sensor networks. ...
(c-b, 14) Frequent episode mining is now conducted over this sequence of transitions. ...
doi:10.1007/978-3-642-13062-5_13
fatcat:fll3ajjasfccbkdslim2xxm3ky
Programming Language Agnostic Mining of Code and Language Pairs with Sequence Labeling Based Question Answering
[article]
2022
arXiv
pre-print
In this paper, we propose a Sequence Labeling based Question Answering (SLQA) method to mine NL-PL pairs in a PL-agnostic manner. ...
In particular, we propose to apply the BIO tagging scheme instead of the conventional binary scheme to mine the code solutions which are often composed of multiple blocks of a post. ...
RELATED WORK 2.1 NL-PL Pair Mining Existing methods for mining NL-PL pairs first proposed to utilize the function's code-documentation pairs because they are naturally aligned and can be automatically ...
arXiv:2203.10744v1
fatcat:6nrtwkovirclhfvf3grcwjm6iu
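The BIO tagging scheme mentioned in the entry above can be sketched as follows. This is a generic decoder for B/I/O tag sequences, assuming one tag per block of a post; it is not the SLQA model itself. The function name `bio_spans` and the choice to treat a stray "I" as the start of a span are assumptions.

```python
from typing import List, Tuple


def bio_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode B/I/O tags into half-open (start, end) index spans.

    "B" begins a span, "I" continues it, "O" is outside any span.
    Assumption: an "I" with no open span is treated as a span start.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # close the previous span
            start = i
        elif tag == "I":
            if start is None:
                start = i  # tolerate a missing "B"
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    print(bio_spans(["O", "B", "I", "O", "B", "B", "I"]))
```

Unlike a binary in/out labeling, the "B" tag lets two adjacent spans be told apart, as with the spans starting at indices 4 and 5 in the usage example.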
Time-series Bitmaps: a Practical Visualization Tool for Working with Large Time Series Databases
[chapter]
2005
Proceedings of the 2005 SIAM International Conference on Data Mining
We demonstrate the utility of our approach with a set of comprehensive experiments on real datasets from a variety of domains. ...
The increasing interest in time series data mining in the last decade has resulted in the introduction of a variety of similarity measures, representations and algorithms. ...
Reproducible Results Statement: In the interests of competitive scientific inquiry, all datasets used in this work are available at the following URL [19]. ...
doi:10.1137/1.9781611972757.55
dblp:conf/sdm/KumarLKLR05
fatcat:b7ysobf5kjauncsw7zmsjcu26q
Mineralization at Seathwaite Tarn, near Coniston, English Lake District: The first occurrence of wittichenite in Great Britain
1979
Mineralogical magazine
Ore specimens from the abandoned copper mine at Seathwaite Tarn, Cumbria, were studied in polished section by reflected light microscopy. ...
S. wishes to acknowledge a University of Aston studentship and to thank his supervisors Drs J. W. Gaskarth and D. J. Vaughan and Professor D. D. Hawkes, Head of the Department of Geological Sciences. ...
Embrey for his criticism of the manuscript. ...
doi:10.1180/minmag.1979.043.325.06
fatcat:prqz452h7falxo6mu5aubk33oe
Plant genomic instability detected by microsatellite-primers
2000
Electronic Journal of Biotechnology
Many of these efforts were motivated by a known or likely utility of the proteins for therapy. ...
sequence data to assemble small sets of sequences into large contiguous nucleotide sequences, structural analysis of the large sequences to identify the presence of a gene, analysis of the putative amino ...
doi:10.2225/vol3-issue2-fulltext-2
fatcat:j4xcbo5bdzhyhlhryiepugkqrq