
Accelerating protein classification using suffix trees

B Dorohonceanu, C G Nevill-Manning
2000 Proceedings. International Conference on Intelligent Systems for Molecular Biology  
We present a method for accelerating these searches using a suffix tree data structure computed from the sequences to be searched.  ...  Position-specific scoring matrices have been used extensively to recognize highly conserved protein regions.  ...  Acknowledgements We would like to thank Professor Martin Farach-Colton for many enlightening discussions on suffix-tree creation and use.  ... 
pmid:10977073 fatcat:e4yfxovjp5gzbmozrmnj7okudq

BLAST Tree: Fast Filtering for Genomic Sequence Classification

Stuart King, Yanni Sun, James Cole, Sakti Pramanik
2010 2010 IEEE International Conference on BioInformatics and BioEngineering  
These data sets are large, hard to assemble, and might encode rare or novel proteins, posing new computational challenges for protein homology search.  ...  This paper presents a novel protein homology search algorithm that combines the salient features of pairwise sequence alignment programs such as Blast and protein family based tools such as Hmmer.  ...  In order to ensure high match efficiency, we use a suffix tree based indexing data structure to organize protein families.  ... 
doi:10.1109/bibe.2010.74 dblp:conf/bibe/KingSCP10 fatcat:p2ianswitzaovggjs5axhps7gm

A Comparative Analysis of Motif Discovery Algorithms

Angela Makolo
2016 Computational Biology and Bioinformatics  
The proposed algorithm, Suffix Tree Gene Enrichment Motif Searching (STGEMS) as reported in [30], proved effective in identifying motifs from organisms with peculiarity in their genomic structure such  ...  The empirical time analysis of seven motif discovery algorithms was evaluated using four sets of genes from the intraerythrocytic development cycle of P. falciparum.  ...  The use of the suffix tree for preprocessing and organizing the input data resulted in an accelerated search for motifs.  ... 
doi:10.11648/j.cbb.20160401.11 fatcat:sin2jqhwafgkjjuadcgrlsudpu

Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment

J. Gracy, P. Argos
1998 Bioinformatics  
After the elimination of all but one sequence from each detected cluster of closely related proteins, the remaining sequences are compiled in a suffix tree which is self-compared to detect local sequence  ...  Sets of proteins which share similar sequence segments are then weighted according to their closeness and multiply aligned using a fast hierarchical dynamic programming algorithm.  ...  Local similarity search Suffix tree construction.  ... 
doi:10.1093/bioinformatics/14.2.164 pmid:9545449 fatcat:2zkka3diejc5fbkze632rv77fa

Significant speedup of database searches with HMMs by search space reduction with PSSM family models

Michael Beckstette, Robert Homann, Robert Giegerich, Stefan Kurtz
2009 Computer applications in the biosciences : CABIOS  
Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs as is necessary for large-scale analysis scenarios.  ...  Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families.  ...  To use enhanced suffix arrays for fast database searching with PSSMs, one simulates a depth first traversal of the suffix tree (cf.  ... 
doi:10.1093/bioinformatics/btp593 pmid:19828575 pmcid:PMC2788931 fatcat:ezmxd7rorzgk5boa23ofjlpzba

Database indexing for large DNA and protein sequence collections

Ela Hunt, Malcolm P. Atkinson, Robert W. Irving
2002 The VLDB journal  
We present a new method of building suffix trees, allowing us to build trees in excess of RAM size, which has hitherto not been possible.  ...  We show experimentally that suffix trees can be effectively used in approximate string matching with biological data.  ...  Considerably less research has been done in the use of suffix trees for protein analysis, with the exception of recent work in the area of protein classification [27].  ... 
doi:10.1007/s007780200064 fatcat:jupeurrbfrdunf4lvkumm2zuge

A Categorization of Relevant Sequence Alignment Algorithms with Respect to Data Structures

Hasna El Haji, Larbi Alaoui
2020 International Journal of Advanced Computer Science and Applications  
We describe the employed data structures and expose some important algorithms using each. Then we show potential strengths and weaknesses among all these structures.  ...  Suffix Arrays and Suffix Trees Given a sequence S, a suffix array is an ordered array of all suffixes of S, and a suffix tree is a representation of all suffixes in S in the form of a tree.  ...  Suffix trees are used for Read Alignment and Whole Genome Alignment while suffix arrays are more adequate to Prefix-suffix Overlaps Computation and Sequence Clustering. B.  ... 
doi:10.14569/ijacsa.2020.0110635 fatcat:kcvxuxducvhmvo7zrhnfkpaujq

Extracting key-substring-group features for text classification

Dell Zhang, Wee Sun Lee
2006 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06  
In particular, we propose a suffix tree based algorithm that can extract such features in linear time (with respect to the total number of characters in the corpus).  ...  In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words.  ...  tree [62] can accelerate the computation speed to some degree.  ... 
doi:10.1145/1150402.1150455 dblp:conf/kdd/ZhangL06 fatcat:fgtagwoeyfe55jzxpfiral3cri

Distributed String Mining for High-Throughput Sequencing Data [chapter]

Niko Välimäki, Simon J. Puglisi
2012 Lecture Notes in Computer Science  
The data structure used is that of Version Space Trees (VST, introduced by [6]) to organize the set of substring patterns being discovered.  ...  More specifically, we employ ideas from suffix trees [4, 5] to represent and compute the set of patterns of interest.  ... 
doi:10.1007/978-3-642-33122-0_35 fatcat:oyojyer7mfbtllp6inbzpof77q

A greedy alignment-free distance estimator for phylogenetic inference

Sharma V. Thankachan, Sriram P. Chockalingam, Yongchao Liu, Ambujam Krishnan, Srinivas Aluru
2017 BMC Bioinformatics  
Performance evaluation using real sequence datasets shows that our heuristic is able to reconstruct comparable, or even more accurate, phylogenetic tree topologies than the kmacs heuristic algorithm at  ...  Results: We present ALFRED-G, a greedy alignment-free distance estimator for phylogenetic tree reconstruction based on the concept of the generalized ACS approach.  ...  The distance based on these methods can be computed using suffix trees/arrays.  ... 
doi:10.1186/s12859-017-1658-0 pmid:28617225 pmcid:PMC5471951 fatcat:5hdl3lemd5acbez6mwdyoaalmq

Biological Sequence Indexing Using Persistent Java

Elzbieta Pustulka-Hunt
2001 Zenodo  
This methodology allowed us to develop a practical algorithm for the construction of suffix trees on disk up to any size supported by the available file and addressing space, which has hitherto not been  ...  The third contribution is a new experimental methodology for examining the usefulness of suffix indexes, and the use of this methodology in an empirical investigation of the indexing gain achieved by combining  ...  We thank David Leader for giving us access to the recently improved map browser and Guenter Teltow for his management work for the Chromosome 21 database.  ... 
doi:10.5281/zenodo.1341577 fatcat:jeeostwznzdrtakbhhw4bepsy4

Indexing Schemes for Similarity Search In Datasets of Short Protein Fragments [article]

Aleksandar Stojmirovic, Vladimir Pestov
2007 arXiv pre-print
We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments.  ...  This type of similarity search has importance in both providing a building block to more complex algorithms and for possible use in direct biological investigations where datasets are of the order of 60  ...  We also thank Marco Patella for providing us  ... 
arXiv:cs/0309005v4 fatcat:sm742ecd2nh5pcimxmbacqf524

The Average Common Substring Approach to Phylogenomic Reconstruction

Igor Ulitsky, David Burstein, Tamir Tuller, Benny Chor
2006 Journal of Computational Biology  
We implemented the algorithm, using suffix arrays.  ...  Comparing their outcome and running time to ours, using a "traditional" tree and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin.  ...  ., Protml: Maximum likelihood inference of protein phylogeny. [2] Ben-Dor, A., Chor, B., Graur, D., Ophir, R., and Pelleg, D. 1998 Constructing phylogenies  ... 
doi:10.1089/cmb.2006.13.336 pmid:16597244 fatcat:l2y4ypheo5bbncforxo55ffqdi

Information Theoretic Approaches to Whole Genome Phylogenies [chapter]

David Burstein, Igor Ulitsky, Tamir Tuller, Benny Chor
2005 Lecture Notes in Computer Science  
We implemented the algorithm, using suffix arrays.  ...  Comparing their outcome and running time to ours, using a "traditional" tree and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin.  ...  We can efficiently perform the subsequence search by using suffix trees [48]. Creating the generalized suffix tree for two sequences of lengths ℓ1, ℓ2 requires O(ℓ1 + ℓ2) time.  ... 
doi:10.1007/11415770_22 fatcat:fhdjl343eje7de2y2bheevty44

Mining and applications of repeating patterns

Ja-Hwung Su, Tzung-Pei Hong, Chu-Yu Chin, Zhi-Feng Liao, Shyr-Yuan Cheng
2018 Vietnam Journal of Computer Science  
In this paper, our purposes are to contribute an efficient mining algorithm for repeating patterns and to conduct a real application using the repeating patterns mined.  ...  In addition to RP tree, another tree structure for generating the repeating patterns is suffix tree. Basically, the suffix tree is the compressed tree for the nonempty suffixes of a string.  ...  Second, RP tree performs much worse than the proposed method and Suffix tree.  ... 
doi:10.1007/s40595-018-0120-1 fatcat:jjn7kqejdfhwracabifgq7snl4