Accelerating protein classification using suffix trees
2000
Proceedings. International Conference on Intelligent Systems for Molecular Biology
We present a method for accelerating these searches using a suffix tree data structure computed from the sequences to be searched. ...
Position-specific scoring matrices have been used extensively to recognize highly conserved protein regions. ...
Acknowledgements We would like to thank Professor Martin Farach-Colton for many enlightening discussions on suffix-tree creation and use. ...
pmid:10977073
fatcat:e4yfxovjp5gzbmozrmnj7okudq
BLAST Tree: Fast Filtering for Genomic Sequence Classification
2010
2010 IEEE International Conference on BioInformatics and BioEngineering
These data sets are large, hard to assemble, and might encode rare or novel proteins, posing new computational challenges for protein homology search. ...
This paper presents a novel protein homology search algorithm that combines the salient features of pairwise sequence alignment programs such as Blast and protein family based tools such as Hmmer. ...
In order to ensure high match efficiency, we use a suffix tree based indexing data structure to organize protein families. ...
doi:10.1109/bibe.2010.74
dblp:conf/bibe/KingSCP10
fatcat:p2ianswitzaovggjs5axhps7gm
A Comparative Analysis of Motif Discovery Algorithms
2016
Computational Biology and Bioinformatics
The proposed algorithm, Suffix Tree Gene Enrichment Motif Searching (STGEMS), as reported in [30], proved effective in identifying motifs from organisms with peculiarity in their genomic structure such ...
The empirical time analysis of seven motif discovery algorithms was evaluated using four sets of genes from the intraerythrocytic development cycle of P. falciparum. ...
The use of the suffix tree for preprocessing and organizing the input data resulted in an accelerated search for motifs. ...
doi:10.11648/j.cbb.20160401.11
fatcat:sin2jqhwafgkjjuadcgrlsudpu
Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment
1998
Bioinformatics
After the elimination of all but one sequence from each detected cluster of closely related proteins, the remaining sequences are compiled in a suffix tree which is self-compared to detect local sequence ...
Sets of proteins which share similar sequence segments are then weighted according to their closeness and multiply aligned using a fast hierarchical dynamic programming algorithm. ...
Local similarity search Suffix tree construction. ...
doi:10.1093/bioinformatics/14.2.164
pmid:9545449
fatcat:2zkka3diejc5fbkze632rv77fa
Significant speedup of database searches with HMMs by search space reduction with PSSM family models
2009
Computer applications in the biosciences : CABIOS
Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs as is necessary for large-scale analysis scenarios. ...
Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. ...
To use enhanced suffix arrays for fast database searching with PSSMs, one simulates a depth first traversal of the suffix tree (cf. ...
doi:10.1093/bioinformatics/btp593
pmid:19828575
pmcid:PMC2788931
fatcat:ezmxd7rorzgk5boa23ofjlpzba
Database indexing for large DNA and protein sequence collections
2002
The VLDB journal
We present a new method of building suffix trees, allowing us to build trees in excess of RAM size, which has hitherto not been possible. ...
We show experimentally that suffix trees can be effectively used in approximate string matching with biological data. ...
Considerably less research has been done in the use of suffix trees for protein analysis, with the exception of recent work in the area of protein classification [27]. ...
doi:10.1007/s007780200064
fatcat:jupeurrbfrdunf4lvkumm2zuge
A Categorization of Relevant Sequence Alignment Algorithms with Respect to Data Structures
2020
International Journal of Advanced Computer Science and Applications
We describe the employed data structures and expose some important algorithms using each. Then we show potential strengths and weaknesses among all these structures. ...
Suffix Arrays and Suffix Trees Given a sequence S, a suffix array is an ordered array of all suffixes of S, and a suffix tree is a representation of all suffixes in S in the form of a tree. ...
Suffix trees are used for Read Alignment and Whole Genome Alignment while suffix arrays are more adequate to Prefix-suffix Overlaps Computation and Sequence Clustering.
doi:10.14569/ijacsa.2020.0110635
fatcat:kcvxuxducvhmvo7zrhnfkpaujq
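The suffix-array definition quoted in the entry above (an ordered array of all suffixes of a sequence S) can be illustrated with a minimal Python sketch. This is not code from any of the listed papers: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction is a naive sort of the suffixes rather than one of the linear-time algorithms used in practice.

```python
from typing import List


def build_suffix_array(s: str) -> List[int]:
    """Return the start positions of all suffixes of s, in lexicographic order.

    Naive construction: sorts the suffixes directly, which is fine for
    short illustrative inputs but far slower than linear-time methods.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s: str, sa: List[int], pattern: str) -> List[int]:
    """Return all start positions of pattern in s, using binary search on sa.

    Suffixes that begin with the pattern form one contiguous run in the
    suffix array, so two binary searches locate the run's boundaries.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

A suffix tree stores the same set of suffixes as a tree with shared prefixes merged; the sorted array here trades that structure for a flat layout that is searched by bisection.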
Extracting key-substring-group features for text classification
2006
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06
In particular, we propose a suffix tree based algorithm that can extract such features in linear time (with respect to the total number of characters in the corpus). ...
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. ...
tree [62] can accelerate the computation speed to some degree. ...
doi:10.1145/1150402.1150455
dblp:conf/kdd/ZhangL06
fatcat:fgtagwoeyfe55jzxpfiral3cri
Distributed String Mining for High-Throughput Sequencing Data
[chapter]
2012
Lecture Notes in Computer Science
The data structure used is that of Version Space Trees (VST, introduced by [6]) to organize the set of substring patterns being discovered. ...
More specifically, we employ ideas from suffix trees [4, 5] to represent and compute the set of patterns of interest. ...
doi:10.1007/978-3-642-33122-0_35
fatcat:oyojyer7mfbtllp6inbzpof77q
A greedy alignment-free distance estimator for phylogenetic inference
2017
BMC Bioinformatics
Performance evaluation using real sequence datasets shows that our heuristic is able to reconstruct comparable, or even more accurate, phylogenetic tree topologies than the kmacs heuristic algorithm at ...
Results: We present ALFRED-G, a greedy alignment-free distance estimator for phylogenetic tree reconstruction based on the concept of the generalized ACS approach. ...
The distance based on these methods can be computed using suffix trees/arrays. ...
doi:10.1186/s12859-017-1658-0
pmid:28617225
pmcid:PMC5471951
fatcat:5hdl3lemd5acbez6mwdyoaalmq
Biological Sequence Indexing Using Persistent Java
2001
Zenodo
This methodology allowed us to develop a practical algorithm for the construction of suffix trees on disk up to any size supported by the available file and addressing space, which has hitherto not been ...
The third contribution is a new experimental methodology for examining the usefulness of suffix indexes, and the use of this methodology in an empirical investigation of the indexing gain achieved by combining ...
We thank David Leader for giving us access to the recently improved map browser and Guenter Teltow for his management work for the Chromosome 21 database. ...
doi:10.5281/zenodo.1341577
fatcat:jeeostwznzdrtakbhhw4bepsy4
Indexing Schemes for Similarity Search In Datasets of Short Protein Fragments
[article]
2007
arXiv
pre-print
We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments. ...
This type of similarity search has importance in both providing a building block to more complex algorithms and for possible use in direct biological investigations where datasets are of the order of 60 ...
We also thank Marco Patella for providing us ...
arXiv:cs/0309005v4
fatcat:sm742ecd2nh5pcimxmbacqf524
The Average Common Substring Approach to Phylogenomic Reconstruction
2006
Journal of Computational Biology
We implemented the algorithm, using suffix arrays. ...
Comparing their outcome and running time to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. ...
., Protml: Maximum likelihood inference of protein phylogeny. http://cmgm.stanford.edu/phylip/protml.html. [2] Ben-Dor, A., Chor, B., Graur, D., Ophir, R., and Pelleg, D. 1998 Constructing phylogenies ...
doi:10.1089/cmb.2006.13.336
pmid:16597244
fatcat:l2y4ypheo5bbncforxo55ffqdi
Information Theoretic Approaches to Whole Genome Phylogenies
[chapter]
2005
Lecture Notes in Computer Science
We implemented the algorithm, using suffix arrays. ...
Comparing their outcome and running time to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. ...
We can efficiently perform the subsequence search by using suffix trees [48]. Creating the generalized suffix tree for two sequences of lengths $\ell_1$, $\ell_2$ requires $O(\ell_1 + \ell_2)$ time. ...
doi:10.1007/11415770_22
fatcat:fhdjl343eje7de2y2bheevty44
Mining and applications of repeating patterns
2018
Vietnam Journal of Computer Science
In this paper, our purposes are to contribute an efficient mining algorithm for repeating patterns and to conduct a real application using the repeating patterns mined. ...
In addition to RP tree, another tree structure for generating the repeating patterns is suffix tree. Basically, the suffix tree is the compressed tree for the nonempty suffixes of a string. ...
Second, RP tree performs much worse than the proposed method and Suffix tree. ...
doi:10.1007/s40595-018-0120-1
fatcat:jjn7kqejdfhwracabifgq7snl4