Utility-driven Mining of Contiguous Sequences [article]

Chunkai Zhang, Quanjian Dai, Zilin Du, Wensheng Gan, Jian Weng, Philip S. Yu
2021 arXiv   pre-print
In this paper, we propose a novel algorithm, fast utility-driven contiguous sequential pattern mining (FUCPM), to address the CSPM problem.  ...  Recently, contiguous sequential pattern mining (CSPM) gained interest as a research topic, due to its varied potential real-world applications, such as web log and biological sequence analysis.  ...  Previous studies on CSPM are mostly based on frequency, while only a few studies [25], [26], [27] have been conducted to address the problem of utility-driven contiguous sequential pattern mining  ... 
arXiv:2111.00247v1 fatcat:jywfqf3tyjhxrcilcnp4oo4ntu

Gathering meta‐data and instances from object referral lists on the web

Srinivas Vadrevu, Fatih Gelgi, Saravanakumar Nagarajan, Hasan Davulcu, Miguel‐Angel Sicilia
2006 Online information review (Print)  
Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities  ...  Originality/value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain  ...  An instance is defined as a sequence of one or more contiguous atoms. (4) Instance compatibility.  ... 
doi:10.1108/14684520610675807 fatcat:tldxs7lirnafxj53mwrk6wuqb4

A framework for representing navigational patterns as full temporal objects

Ajumobi Udechukwu, Ken Barker, Reda Alhajj
2004 ACM SIGecom Exchanges  
This is a shift from existing techniques that are driven by pre-defined thresholds that can only support partial temporal representation of navigational patterns.  ...  The proposed framework also enhances the understanding and interpretation of discovered patterns, and provides a rich environment for integrating the analysis of navigational patterns with data from the  ...  Our results show that mining all contiguous navigational patterns utilizing the techniques discussed in this paper performs well compared to the existing techniques that make use of support thresholds.  ... 
doi:10.1145/1120687.1120691 fatcat:wh74nie7e5cnzb4skcxdbl4fju

FlyExpress 7: An Integrated Discovery Platform To Study Coexpressed Genes Using in Situ Hybridization Images in Drosophila

Sudhir Kumar, Charlotte Konikoff, Maxwell Sanderford, Li Liu, Stuart Newfeld, Jieping Ye, Rob J. Kulathinal
2017 G3: Genes, Genomes, Genetics  
analytical and visual mining of these patterns.  ...  (image) and genomic (sequence) data.  ...  This work was supported by the National Institutes of Health (grant HG002516-09 to S.K.). LITERATURE CITED  ... 
doi:10.1534/g3.117.040345 pmid:28667017 pmcid:PMC5555482 fatcat:4s4dtcooazbklliffko5gvbwum

Semantic Partitioning of Web Pages [chapter]

Srinivas Vadrevu, Fatih Gelgi, Hasan Davulcu
2005 Lecture Notes in Computer Science  
In this paper we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of the Web pages to automatically transform them into hierarchical content structures  ...  Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of  ...  Our pattern mining algorithm is different from these approaches and parses the given sequence in a bottom-up fashion and infers the grammar on-the-fly as it goes through the sequence multiple number of  ... 
doi:10.1007/11581062_9 fatcat:64phk6gijrfmvjv333dywrveb4

Metagenomics for Mining New Genetic Resources of Microbial Communities

Manuel Ferrer, Ana Beloqui, Kenneth N. Timmis, Peter N. Golyshin
2009 Journal of Molecular Microbiology and Biotechnology  
Activity-based screening of such libraries has demonstrated that this new diversity is not simply variations on known sequence themes, but rather the existence of entirely new sequence classes and novel  ...  Recent progress has revealed that the capture of genetic resources of complex microbial communities in metagenome libraries allows the discovery of a richness of new enzymatic diversity that had not previously  ...  normally is, depending on sequencing method, between 80 and 1,000 bp, it results in artificial contiguous assemblies of DNA sequences derived from different hosts [DeLong, 2005; Noguchi et al., 2006;  ... 
doi:10.1159/000142898 pmid:18957866 fatcat:66hzhsyduvaldeq4ue4jjwnenq

Superstring-Based Sequence Obfuscation to Thwart Pattern Matching Attacks [article]

Bo Guan, Nazanin Takbiri, Dennis Goeckel, Amir Houmansadr, Hossein Pishro-Nik
2021 arXiv   pre-print
The algorithms are evaluated on synthetic data traces and on the Reality Mining Dataset to demonstrate their utility.  ...  By relating the problem to a set of combinatorial questions on sequence construction, we are able to provide provable guarantees for our proposed constructions.  ...  • We validate the developed approaches on both synthetic data and the Reality Mining dataset to demonstrate their utility and compare their performance (Section VIII).  ... 
arXiv:2108.12336v1 fatcat:7qy4ejhmbbceddwvnkkhrzs4fu

Scalable Topical Phrase Mining from Text Corpora [article]

Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han
2014 arXiv   pre-print
Existing work either performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models.  ...  As such, we consider the problem of discovering topical phrases of mixed lengths.  ...  We formally define phrases and other necessary notation and terminology as follows: • A phrase is a sequence of contiguous tokens: $P = \{w_{d,i}, \ldots, w_{d,i+n}\}$, $n > 0$ • A partition over d-th document is a sequence  ... 
arXiv:1406.6312v2 fatcat:umrmdntoabhntf4knzkvflywji

Scalable topical phrase mining from text corpora

Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, Jiawei Han
2014 Proceedings of the VLDB Endowment  
Existing work either performs post processing to the results of unigram-based topic models, or utilizes complex n-gram-discovery topic models.  ...  As such, we consider the problem of discovering topical phrases of mixed lengths.  ...  We formally define phrases and other necessary notation and terminology as follows: • A phrase is a sequence of contiguous tokens: $P = \{w_{d,i}, \ldots, w_{d,i+n}\}$, $n > 0$ • A partition over d-th document is a sequence  ... 
doi:10.14778/2735508.2735519 fatcat:l5dgrmk3sngg3aghbnyrbahmli

Hierarchically clustered HMM for protein sequence motif extraction with variable length

Cody Hudson, Bernard Chen, Dongsheng Che
2014 Tsinghua Science and Technology  
Protein sequence motif extraction is an important field of bioinformatics owing to its relevance to structural analysis.  ...  Related data mining fields that use Hidden Markov Models may also benefit from this approach of clustering the HMMs themselves.  ...  Motifs can then be extracted without any assumption on the length of the motif by analyzing the clusters and extracting contiguous sequences with a given threshold of clustered proteins.  ... 
doi:10.1109/tst.2014.6961032 fatcat:hpqnbnaldnfr3csezc3ohg6h2e

Data Mining for Modeling Chiller Systems in Data Centers [chapter]

Debprakash Patnaik, Manish Marwah, Ratnesh K. Sharma, Naren Ramakrishnan
2010 Lecture Notes in Computer Science  
We present a data mining approach to model the cooling infrastructure in data centers, particularly the chiller ensemble.  ...  These infrastructures are poorly understood due to the lack of "first principles" models of chiller systems. At the same time, they abound in data due to instrumentation by modern sensor networks.  ...  (c-b, 14) Frequent episode mining is now conducted over this sequence of transitions.  ... 
doi:10.1007/978-3-642-13062-5_13 fatcat:fll3ajjasfccbkdslim2xxm3ky

Programming Language Agnostic Mining of Code and Language Pairs with Sequence Labeling Based Question Answering [article]

Changran Hu, Akshara Reddi Methukupalli, Yutong Zhou, Chen Wu, Yubo Chen
2022 arXiv   pre-print
In this paper, we propose a Sequence Labeling based Question Answering (SLQA) method to mine NL-PL pairs in a PL-agnostic manner.  ...  In particular, we propose to apply the BIO tagging scheme instead of the conventional binary scheme to mine the code solutions which are often composed of multiple blocks of a post.  ...  RELATED WORK 2.1 NL-PL Pair Mining Existing methods for mining NL-PL pairs first proposed to utilize the function's code-documentation pairs because they are naturally aligned and can be automatically  ... 
arXiv:2203.10744v1 fatcat:6nrtwkovirclhfvf3grcwjm6iu

Time-series Bitmaps: a Practical Visualization Tool for Working with Large Time Series Databases [chapter]

Nitin Kumar, Venkata Nishanth Lolla, Eamonn Keogh, Stefano Lonardi, Chotirat Ann Ratanamahatana, Li Wei
2005 Proceedings of the 2005 SIAM International Conference on Data Mining  
We demonstrate the utility of our approach with a set of comprehensive experiments on real datasets from a variety of domains.  ...  The increasing interest in time series data mining in the last decade has resulted in the introduction of a variety of similarity measures, representations and algorithms.  ...  Reproducible Results Statement: In the interests of competitive scientific inquiry, all datasets used in this work are available at the following URL [19].  ... 
doi:10.1137/1.9781611972757.55 dblp:conf/sdm/KumarLKLR05 fatcat:b7ysobf5kjauncsw7zmsjcu26q

Mineralization at Seathwaite Tarn, near Coniston, English Lake District: The first occurrence of wittichenite in Great Britain

C. J. Stanley, A. J. Criddle
1979 Mineralogical magazine  
Ore specimens from the abandoned copper mine at Seathwaite Tarn, Cumbria, were studied in polished section by reflected light microscopy.  ...  S. wishes to acknowledge a University of Aston studentship and to thank his supervisors Drs J. W. Gaskarth and D. J. Vaughan and Professor D. D. Hawkes, Head of the Department of Geological Sciences.  ...  Embrey for his criticism of the manuscript.  ... 
doi:10.1180/minmag.1979.043.325.06 fatcat:prqz452h7falxo6mu5aubk33oe

Plant genomic instability detected by microsatellite-primers

Xavier J. Leroy, Karine Leon, Michel Branchard
2000 Electronic Journal of Biotechnology  
Many of these efforts were motivated by a known or likely utility of the proteins for therapy.  ...  sequence data to assemble small sets of sequences into large contiguous nucleotide sequences, structural analysis of the large sequences to identify the presence of a gene, analysis of the putative amino  ... 
doi:10.2225/vol3-issue2-fulltext-2 fatcat:j4xcbo5bdzhyhlhryiepugkqrq