On-line String Matching in Highly Similar DNA Sequences

Nadia Ben Nsira, Mourad Elloumi, Thierry Lecroq
2017 Mathematics in Computer Science  
We consider the problem of on-line exact string matching of a pattern in a set of highly similar sequences. This can be useful in cases where indexing the sequences is not feasible.  ...  We exhibit experimental results showing that our algorithm is much faster than searching for the pattern in each sequence with a very fast on-line exact string matching algorithm.  ...  For the FJS algorithm, we launch the execution on the texts one by one. We ran the FJS algorithm on each sequence successively.  ... 
doi:10.1007/s11786-016-0280-2 fatcat:xc7zvtnx4fey3jkik3phrkex3m

A fast pattern matching algorithm for highly similar sequences

Nadia Ben Nsira, Thierry Lecroq, Mourad Elloumi
2014 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)  
In this paper we propose a very efficient algorithm that solves the on-line exact pattern matching problem in a set of highly similar DNA sequences.  ...  There is thus a strong need for efficient algorithms for performing fast pattern matching in such specific sets of sequences.  ...  problem on a set of highly similar sequences.  ... 
doi:10.1109/bibm.2014.6999384 dblp:conf/bibm/NsiraLE14 fatcat:pr7b2dlgizcwbib5kurzf3pjeq

Indexing DNA Sequences Using q-Grams [chapter]

Xia Cao, Shuai Cheng Li, Anthony K. H. Tung
2005 Lecture Notes in Computer Science  
Contributing to the interest, this paper presents a method for indexing the DNA sequences efficiently based on q-grams to facilitate similarity search in a DNA database and sidestep the need for linear  ...  Two level index - hash table and c-trees - are proposed based on the q-grams of DNA sequences.  ...  Conclusion: We have devised a novel two-level index structure based on q-grams of the DNA sequences which can support efficient similarity search in DNA sequence database.  ... 
doi:10.1007/11408079_4 fatcat:sczrqs2aingcxaka5wzavcqp7q

SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning

V. Vineetha, C. L. Biji, Achuthsankar S. Nair
2019 Scientific Reports  
Knowledge driven algorithms utilizing features of input sequences, such as high similarity in case of DNA sequences, can help in improving the efficiency of DNA MSA to assist in phylogenetic tree construction  ...  The algorithm uses suffix tree for identifying common substrings and uses a modified Needleman-Wunsch algorithm for pairwise alignments.  ...  DNA sequences are highly similar compared to protein sequences.  ... 
doi:10.1038/s41598-019-42966-5 pmid:31036850 pmcid:PMC6488671 fatcat:muqsmksj5bfnrdwwkdqfcveywe

An Improved Fast Search Method Using Histogram Features For DNA Sequence Database

Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi
2010 Zenodo  
An overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity will be excluded for later searching.  ...  Experimental results using GenBank sequence data show the proposed method combining histogram information and Smith-Waterman algorithm is more efficient for DNA sequence search.  ...  In section II, we will first introduce the proposed local search algorithm using histogram features for DNA sequences in detail.  ... 
doi:10.5281/zenodo.1076388 fatcat:nnti2eptzrahrmyb4i4dzimaoy

A Filtering Algorithm for Efficient Retrieving of DNA Sequence

M. Nordin A. Rahman, M. Yazid M. Saman, Aziz Ahmad, A. Osman M. Tap
2009 Journal of clean energy technologies  
Index Terms: Exact string matching, Aho-Corasick algorithm, sequence comparison, Smith-Waterman algorithm.  ...  The algorithm filters the DNA sequences in the database that are expected to be irrelevant, so that they are not passed to the dynamic-programming-based optimal alignment process.  ...  In general, this ranking process does not guarantee that the DNA sequences expected to be highly similar to a query are positioned at the top of the ranked list.  ... 
doi:10.7763/ijcte.2009.v1.16 fatcat:hjbt5el2frgaho6yenaukpptdq

Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays

Gustavo Encarnacao, Nuno Sebastiao, Nuno Roma
2011 2011 International Conference on High Performance Computing & Simulation  
of DNA sequences is presented.  ...  When compared with the CPU, the results demonstrate the possibility to achieve speedups as high as 85 when using the suffix array in the GPU, thus making it an adequate choice for high-performance bioinformatics  ...  The usage of a suffix array for string matching (in this case for DNA sequence alignment) is similar to using any other sorted array to search for a given element.  ... 
doi:10.1109/hpcsim.2011.5999806 dblp:conf/ieeehpcs/EncarnacaoSR11 fatcat:wbavz72f4fapbfa3qilfdfwb7i

Short Read Alignment Based on Maximal Approximate Match Seeds

Wei Quan, Dengfeng Guan, Guangri Quan, Bo Liu, Yadong Wang
2020 Frontiers in Molecular Biosciences  
Here, we propose a novel sequence alignment algorithm, named MAM, which can efficiently align short DNA sequences.  ...  Thus, most alignment tools prefer to simply discard highly repetitive seeds, but this may cause the true alignment to be missed.  ...  Repetitive DNA sequences are multiple copies of sequences with high similarity that occur throughout the genome.  ... 
doi:10.3389/fmolb.2020.572934 pmid:33251246 pmcid:PMC7674947 fatcat:wopc6aj3i5bytbqgw6asufw6gu

Implementation and performance analysis of efficient index structures for DNA search algorithms in parallel platforms

Nuno Sebastião, Gustavo Encarnação, Nuno Roma
2012 Concurrency and Computation  
These indexes have been widely adopted to perform DNA search operations of short query sequences against a large reference sequence in general purpose processors.  ...  Evaluation of index-based search algorithms: To evaluate the conceived highly concurrent implementations of the considered index-based search algorithms, a set of real DNA sequence data obtained from the  ... 
doi:10.1002/cpe.2970 fatcat:rr5nixxos5f6tdpdxaoe2xefpa

Multiple Co-Evolutionary Networks Are Supported by the Common Tertiary Scaffold of the LacI/GalR Proteins

Daniel J. Parente, Liskin Swint-Kruse, Andrew C. Gill
2013 PLoS ONE  
Alternatively, the tertiary scaffold might be adaptable, accommodating a unique set of functionally important sites for each paralogous function.  ...  Functionally important positions were identified by conservation and co-evolutionary sequence analyses.  ...  Brown (University of California at Santa Barbara) for providing the source code for his implementation of ZNMI.  ... 
doi:10.1371/journal.pone.0084398 pmid:24391951 pmcid:PMC3877293 fatcat:kqcjkbziojfrvgfcavcnovv3fa

Evolution of biosequence search algorithms: a brief survey [article]

Gregory Kucherov
2018 arXiv   pre-print
The paper surveys the evolution of main algorithmic techniques to compare and search biological sequences.  ...  We highlight key algorithmic ideas that emerged in response to several interconnected factors: shifts of biological analytical paradigm, advent of new sequencing technologies, and a substantial increase in  ...  Acknowledgements: Many thanks go to Karel Břinda for his helpful comments and suggestions.  ... 
arXiv:1808.01038v4 fatcat:uiyjrwvgprgu3nfcu6i47o4wpe

Googling DNA sequences on the World Wide Web

Mehrdad Hajibabaei, Gregory AC Singer
2009 BMC Bioinformatics  
We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google.  ...  We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words.  ...  Acknowledgements: Thanks to Donal Hickey for his advice and help in the development of this algorithm.  ... 
doi:10.1186/1471-2105-10-s14-s4 pmid:19900300 pmcid:PMC2775150 fatcat:on7fdsthqzcanirzh3xddalwa4

Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations [chapter]

Ali Alatabbi, Carl Barton, Costas S. Iliopoulos, Laurent Mouchard
2012 IFIP Advances in Information and Communication Technology  
In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem.  ...  In the High Similarity Sequencing Problem we are given the sequences S0, S1, ..., Sk where Sj = ej 1 Iσ 1 ej 2 Iσ 2 ej 3 Iσ 3 , . . . , ej Iσ and must perform pattern matching on the set of sequences  ...  Recent research has focused on exploiting the inherent similarity present in multiple DNA sequences to allow for in memory analysis of a large number of DNA sequences.  ... 
doi:10.1007/978-3-642-33412-2_60 fatcat:h2o2brxqynhvflhpqiucrwn74y

Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences

2019 Nucleic Acids Research  
We introduce and study a set of training-free methods of an information-theoretic and algorithmic complexity nature that we apply to DNA sequences to identify their potential to identify nucleosomal binding  ...  We test the measures on well-studied genomic sequences of different sizes drawn from different sources.  ...  Species close to each other will have similar DNA sequence entropy values, allowing lossless compression algorithms to compress statistical regularities of genomes of related species with similar compression  ... 
doi:10.1093/nar/gkz750 pmid:31511887 pmcid:PMC6846163 fatcat:7wydzyy62nf3njwqjauy4oyx2u

Querying highly similar sequences

Carl Barton, Mathieu Giraud, Costas S. Iliopoulos, Thierry Lecroq, Laurent Mouchard, Solon P. Pissis
2013 International Journal of Computational Biology and Drug Design  
We present an asymptotically fast O(n + occ log occ)-time algorithm, as well as a practical O(nk/w)-time algorithm for solving this problem, where n is the length of a sequence, occ is the number of  ...  The Extreme Similarity Sequencing problem consists of finding occurrences of a pattern p in a set S0, S1, ..., Sk of sequences of equal length, where Si, for all 1 ≤ i ≤ k, differs from S0 by a constant  ...  These allow for good compression rates for single repetitive sequences; however when there is a large number of sequences which are highly similar the entropy does not change.  ... 
doi:10.1504/ijcbdd.2013.052206 pmid:23428478 fatcat:35bw3t3lobe6ndd7ojbfpgeexu