Fast searching in biological sequences using multiple hash functions

Simone Faro, Thierry Lecroq
2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE)
With the availability of large amounts of DNA data, exact matching of nucleotide sequences has become an important application in modern computational biology and in metagenomics. In this paper we present an efficient method based on multiple hashing functions which improves the performance of existing string matching algorithms when used for searching DNA sequences. Our experimental results show that the newly proposed technique leads to algorithms which are up to 8 times faster than the best algorithm known for matching multiple patterns. The gain in performance is also larger when searching for larger pattern sets. Thus, considering that the number of reads produced by next-generation sequencing equipment is ever growing, the new technique serves as a good basis for massive multiple long pattern search applications.
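
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of filtering with more than one hash function when searching for several DNA patterns at once. It is not the authors' method: the names `hash_a`, `hash_b`, `build_filters` and `search`, the q-gram length `q`, and the table size are all assumptions made for illustration. A text position is verified against the patterns only if the q-gram starting there is flagged by both hash tables.

```python
from collections import defaultdict

# 2-bit codes for nucleotides; any other symbol (e.g. N) falls back to 0,
# which can only cause extra verifications, never missed matches.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def hash_a(qgram, table_size):
    """First hash (illustrative): pack 2 bits per nucleotide, then reduce."""
    value = 0
    for ch in qgram:
        value = (value << 2) | CODE.get(ch, 0)
    return value % table_size


def hash_b(qgram, table_size):
    """Second hash (illustrative): base-5 polynomial over shifted codes."""
    value = 0
    for ch in qgram:
        value = value * 5 + CODE.get(ch, 0) + 1
    return value % table_size


def build_filters(patterns, q, table_size):
    """Flag the hash slots hit by the first q characters of each pattern."""
    flags_a = [False] * table_size
    flags_b = [False] * table_size
    by_prefix = defaultdict(list)
    for pattern in patterns:
        prefix = pattern[:q]
        flags_a[hash_a(prefix, table_size)] = True
        flags_b[hash_b(prefix, table_size)] = True
        by_prefix[prefix].append(pattern)
    return flags_a, flags_b, by_prefix


def search(text, patterns, q=4, table_size=251):
    """Return (position, pattern) pairs for every exact occurrence."""
    if any(len(p) < q for p in patterns):
        raise ValueError("every pattern must have at least q characters")
    flags_a, flags_b, by_prefix = build_filters(patterns, q, table_size)
    matches = []
    for i in range(len(text) - q + 1):
        window = text[i:i + q]
        # Both filters must agree before the costly verification step.
        if not flags_a[hash_a(window, table_size)]:
            continue
        if not flags_b[hash_b(window, table_size)]:
            continue
        for pattern in by_prefix.get(window, ()):
            if text.startswith(pattern, i):
                matches.append((i, pattern))
    return matches


if __name__ == "__main__":
    print(search("ACGTACGTTTGACA", ["ACGTT", "TTGACA", "GGGGG"]))
```

In this sketch the second hash table only reduces the number of positions that reach verification; unlike practical algorithms of this family, it examines every text position and does not skip ahead.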
doi:10.1109/bibe.2012.6399669 dblp:conf/bibe/FaroL12 fatcat:uw77ar4g5fdfzpmaxisf3stuyy