
Linear Detrending Subsequence Matching in Time-Series Databases

Myeong-Seon GIL, Yang-Sae MOON, Bum-Soo KIM
2011 IEICE transactions on information and systems  
Based on the lower bounding theorem, we next propose the index building and subsequence matching algorithms for linear detrending subsequence matching. We finally show the superiority of our index-based  ...  In this paper we define this problem as linear detrending subsequence matching and propose its efficient index-based solution.  ...  This notion enables an LD-window to represent multiple subsequences of different lengths, and eventually, we can use only one index in subsequence matching [7].  ... 
doi:10.1587/transinf.e94.d.917 fatcat:abfb45mgibe63od6f2g35keidi
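
The snippet above names linear detrending without showing it. As a minimal sketch, assuming "linear detrending" means subtracting the least-squares line from a subsequence (the usual meaning; the paper's exact definition is not in the snippet), and with `linear_detrend` as a hypothetical name:

```python
def linear_detrend(values):
    """Subtract the least-squares line fitted over positions 0..n-1.

    Assumption: this is ordinary least-squares detrending, not
    necessarily the exact formulation used in the paper.
    """
    n = len(values)
    if n == 0:
        return []
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Sum of squared deviations of the positions; zero only when n == 1.
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    # Residuals after removing the fitted line.
    return [y - (y_mean + slope * (i - x_mean)) for i, y in enumerate(values)]
```

Two subsequences that differ only by a linear trend yield the same residuals, so a distance computed on the residuals ignores that trend.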

Similar sequence matching supporting variable-length and variable-tolerance continuous queries on time-series data stream

Hyo-Sang Lim, Kyu-Young Whang, Yang-Sae Moon
2008 Information Sciences  
To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences.  ...  We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data stream.  ...  Fig. 3 shows an example of similar sequence matching for two query sequences Q1 and Q2 using SSM-IS. The query sequences have different lengths and different tolerances.  ... 
doi:10.1016/j.ins.2007.10.026 fatcat:w7tjk5kv7bbqhei4nwfvt2jwdm

Short read DNA fragment anchoring algorithm

Wendi Wang, Peiheng Zhang, Xinchun Liu
2009 BMC Bioinformatics  
Limited by the reliable output sequence length of next-generation sequencing technologies, we are confined to studying gene fragments of 30~50 bps in general, and this is relatively shorter than traditional gene  ...  Due to the sheer number of fragments produced by next-generation sequencing technologies and the huge size of reference sequences, anchoring would rapidly become a computational bottleneck.  ...  Acknowledgements We are grateful for the resourceful feedback from our anonymous reviewers and Dongbo Bu at the Bioinformatics Lab, University of Waterloo.  ... 
doi:10.1186/1471-2105-10-s1-s17 pmid:19208116 pmcid:PMC2648759 fatcat:nbsdaallpfbknjgi5watc3eipi

Pan-genomic Matching Statistics for Targeted Nanopore Sequencing [article]

Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael Schatz, Travis Gagie, Christina Boucher, Ben Langmead
2021 bioRxiv   pre-print
We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing with the help of efficient pangenome indexes.  ...  SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics (half-maximal exact matches) in a streaming fashion.  ...  An MS at position i of a query sequence P of length m equals the length of the longest prefix of P[i..m] that exactly matches a sequence in the index.  ... 
doi:10.1101/2021.03.23.436610 fatcat:bjwrzvkbhvbqtpnhxbnujv45j4
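
The matching-statistics definition quoted above can be illustrated with a naive sketch. This is not SPUMONI's compressed-index method: it assumes a plain reference string stands in for the index and uses quadratic substring checks. The name `matching_statistics` is hypothetical, and positions are 0-based here.

```python
def matching_statistics(pattern, reference):
    """Naive matching statistics.

    ms[i] is the length of the longest prefix of pattern[i:] that
    occurs somewhere in reference. Illustration only: a real tool
    would query a compressed index instead of scanning a string.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Grow the candidate prefix while it still occurs in the reference.
        while (i + length < len(pattern)
               and pattern[i:i + length + 1] in reference):
            length += 1
        ms.append(length)
    return ms
```

For example, `matching_statistics("ACGT", "ACGA")` gives `[3, 2, 1, 0]`: "ACG" occurs in the reference, but no substring ending in "T" does.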

A DNA Index Structure Using Frequency and Position Information of Genetic Alphabet [chapter]

Woo-Cheol Kim, Sanghyun Park, Jung-Im Won, Sang-Wook Kim, Jee-Hee Yoon
2005 Lecture Notes in Computer Science  
Exact match queries, wildcard match queries, and k-mismatch queries are widely used in lots of molecular biology applications including the searching of ESTs (Expressed Sequence Tags) and DNA transcription  ...  Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide.  ...  In exact match queries, SeqScan and Suffix show nearly constant performance regardless of the length of query sequences.  ... 
doi:10.1007/11430919_21 fatcat:rxmt5zrjczhvtgquy3ottmolxe

Algorithms designed for compressed-gene-data transformation among gene banks with different references

Qiuming Luo, Chao Guo, Yi Jun Zhang, Ye Cai, Gang Liu
2018 BMC Bioinformatics  
With the reduction of gene sequencing cost and demand for emerging technologies such as precision medical treatment and deep learning in genome, it is an era of gene data outbreaks today.  ...  We will 1) analyze some different compression algorithms to find the similarities and the differences among all of them, 2) come up with a naïve method named TDM for data transformation between difference  ...  Availability of data and materials The Korean gene data can get from and the first Chinese standard genome sequence map YH-1 can get from http://  ... 
doi:10.1186/s12859-018-2230-2 pmid:29914357 pmcid:PMC6006589 fatcat:ggg3ssjlcbawrdeii4grufzgg4


Sebastian Wandelt, Johannes Starlinger, Marc Bux, Ulf Leser
2013 Proceedings of the VLDB Endowment  
However, due to the sharply falling cost of sequencing technology, studies of populations of individuals of the same species are now feasible and promise advances in areas such as personalized medicine  ...  Given a query, RCSI then searches the reference and all genome-specific individual differences.  ...  We define a referential match entry as a triple rme = (start, length, mismatch), where start is a number indicating the start of a match within the reference, length denotes the match length, and mismatch  ... 
doi:10.14778/2536258.2536265 fatcat:guyqkb34gjflbh6myf4o7vu4lu

Applying Shannon's information theory to bacterial and phage genomes and metagenomes

Sajia Akhter, Barbara A. Bailey, Peter Salamon, Ramy K. Aziz, Robert A. Edwards
2013 Scientific Reports  
Here, Shannon's index of complete phage and bacterial genomes was examined.  ...  The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size.  ...  Acknowledgements This work was supported by grant NSF DBI-0850356 from the NSF Division of Biological Infrastructure to RAE (PhAnToMe project).  ... 
doi:10.1038/srep01033 pmid:23301154 pmcid:PMC3539204 fatcat:weo4z5wuw5ghfaawjukybu6wqa
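
The dependence on sequence word size mentioned above can be explored with a small sketch. It assumes the standard Shannon entropy over the frequencies of overlapping words (k-mers); the paper's exact index and any normalization are not in the snippet, and `kmer_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def kmer_entropy(sequence, k):
    """Shannon entropy, in bits, of the overlapping k-mer distribution.

    Assumption: plain H = -sum(p * log2(p)) over observed k-mers,
    with no length or GC-content correction.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With word size 1, a sequence that uses all four nucleotides equally often reaches the maximum of 2 bits, while a homopolymer such as "AAAA" scores 0.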

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment

Altti Ilari Maarala, Ossi Arasalo, Daniel Valenzuela, Veli Mäkinen, Keijo Heljanko, Yanbin Yin
2021 PLoS ONE  
(n = 599) were blasted to the compressed index of 488 GB GenBank database (n = 13,375,031) in 26 minutes on 25 nodes. 78 MB mixed sequences (n = 4,167) were blasted to the compressed index of 18 GB E.  ...  Whole-genome sequencing (WGS) data volumes are growing rapidly, making genomic data compression and indexing methods very important.  ...  Acknowledgments CSC-IT Center for Science and the Finnish Grid and Cloud Infrastructure2 (FGCI2) are gratefully acknowledged for providing the computing capacity and their expertise.  ... 
doi:10.1371/journal.pone.0255260 pmid:34343181 pmcid:PMC8330939 fatcat:wq2tg5obivbzhpnneb3nylpbnm

Financial Time Series: Market Analysis Techniques Based on Matrix Profiles

Eoin Cartwright, Martin Crane, Heather J. Ruskin
2021 Engineering Proceedings  
It still permits the initial identification of time periods with indicatively similar behaviour across individual market sectors and indexes, together with the assessment of wider applications, such as  ...  Several approaches for the identification of similar behaviour patterns (or motifs) are proposed, illustrated, and the results discussed.  ...  combination of series based upon different measures of a single company or index.  ... 
doi:10.3390/engproc2021005045 fatcat:yjspar6gafbthk5gnvr5xo5xzq

An efficient DNA sequence searching method using position specific weighting scheme

Woo-Cheol Kim, Sanghyun Park, Jung-Im Won, Sang-Wook Kim, Jee-Hee Yoon
2006 Journal of information science  
Exact match queries, wildcard match queries, and k-mismatch queries are widely used in various molecular biology applications including the searching of ESTs (Expressed Sequence Tags) and DNA transcription  ...  Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide.  ...  Fig. 5. Time for processing exact match queries with various lengths of query sequences. Fig. 6. Time for processing wildcard match queries with various lengths of query sequences.  ... 
doi:10.1177/0165551506062329 fatcat:pexgpqhzerfarb4aazhms22f4i

Bit-Parallel Multiple Pattern Matching [chapter]

Tuan Tu Tran, Mathieu Giraud, Jean-Stéphane Varré
2012 Lecture Notes in Computer Science  
We present an extension of the bit-parallel Wu-Manber algorithm [16] to combine several searches for a pattern into a collection of fixed-length words.  ...  We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities with seed-based heuristics.  ...  For each platform, there are three different curves, corresponding to different seed lengths (3, 4 and 6), hence to different total numbers of neighborhoods.  ... 
doi:10.1007/978-3-642-31500-8_30 fatcat:k7aqvzhusrearfhxabm5bvbn2m

Indexing and retrieval for genomic databases

H.E. Williams, J. Zobel
2002 IEEE Transactions on Knowledge and Data Engineering  
Amino-acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing.  ...  We present an index-based approach for both selecting sequences that display broad similarity to a query and for fast local alignment.  ...  Availability The index-based gapped search system (cafe) is available free of charge by request to the authors.  ... 
doi:10.1109/69.979973 fatcat:rhb3572ocjhcnj4ymv5e6dir7y

An Index Based Forward Backward Multiple Pattern Matching Algorithm

Raju Bhukya, DVLN Somayajulu
2010 Zenodo  
In this paper we explore the applicability of a new pattern matching technique called Index based Forward Backward Multiple Pattern Matching algorithm(IFBMPM), for DNA Sequences.  ...  The number of comparisons rapidly decreases and execution time decreases accordingly and shows better performance.  ...  proposed pattern matching algorithm and tested with the pattern of length 16.  ... 
doi:10.5281/zenodo.1083901 fatcat:7eajtuemzjhiff4uvzdq5hf2qq

Efficient time-series subsequence matching using duality in constructing windows

Yang-Sae Moon, Kyu-Young Whang, Woong-Kee Loh
2001 Information Systems  
Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and thus, is a dual approach of the one by Faloutsos et al. (FRM in short), which divides data sequences into  ...  sliding windows and the query sequence into disjoint windows.  ...  Acknowledgements We would like to thank Byoung-Yong Moon for helping in revising an earlier English version of this paper.  ... 
doi:10.1016/s0306-4379(01)00021-7 fatcat:txc4wz3jgrg25lgzh3t4a3fq3i
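
The two window constructions contrasted in the snippet above can be sketched as follows. This covers only the window construction, not the index or the matching steps of either method; the function names are hypothetical, and dropping a trailing partial window is an assumption of the sketch.

```python
def disjoint_windows(seq, w):
    """Non-overlapping windows of length w.

    Assumption: a trailing partial window shorter than w is dropped.
    """
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]

def sliding_windows(seq, w):
    """Every window of length w, one per starting offset."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

# Dual Match style: disjoint windows over data, sliding windows over the query.
data_windows = disjoint_windows([1, 2, 3, 4, 5, 6], 2)
query_windows = sliding_windows([7, 8, 9, 10], 2)
```

Swapping the two calls (sliding windows over the data, disjoint windows over the query) gives the arrangement the snippet attributes to Faloutsos et al. Disjoint windows produce roughly 1/w as many data windows to store as sliding windows do.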