Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets

Kazuki Takabatake, Kazuki Izawa, Motohiro Akikawa, Keisuke Yanagisawa, Masahito Ohue, Yutaka Akiyama
2021 Genes  
Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.
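The abstract does not spell out the algorithm. The sketch below shows one generic way a two-step seed search over two reduced amino acid alphabets can work. It is an illustration only and is not the authors' implementation. The sketch assumes the following. In step 1, exact k-mer matches are found through an index built in a coarse reduced alphabet, which is cheap and sensitive. In step 2, a seed survives only if it, plus `ext` following residues, also matches exactly in a finer reduced alphabet. The alphabet groupings, `k`, `ext`, and the function names `make_reducer`, `build_index`, and `two_step_seeds` are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical reduced alphabets (assumptions, not taken from the paper): each string is
# a group of residues treated as identical. Both cover the 20 standard amino acids.
COARSE_GROUPS = ["LVIMC", "AG", "ST", "P", "FYW", "EDNQ", "KRH"]
FINE_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def make_reducer(groups):
    """Return a function that rewrites a protein sequence in a reduced alphabet."""
    table = {aa: chr(ord("a") + i) for i, group in enumerate(groups) for aa in group}
    # Non-standard residues (e.g. X) all map to the shared symbol '*'.
    return lambda seq: "".join(table.get(aa, "*") for aa in seq)


coarse = make_reducer(COARSE_GROUPS)
fine = make_reducer(FINE_GROUPS)


def build_index(db_seqs, k):
    """Index coarse-alphabet k-mers as (sequence id, position); also precompute fine forms."""
    index = defaultdict(list)
    for sid, seq in enumerate(db_seqs):
        reduced = coarse(seq)
        for pos in range(len(reduced) - k + 1):
            index[reduced[pos:pos + k]].append((sid, pos))
    fine_db = [fine(seq) for seq in db_seqs]
    return index, fine_db


def two_step_seeds(query, index, fine_db, k, ext):
    """Return (query pos, sequence id, db pos) seeds that pass both steps."""
    q_coarse, q_fine = coarse(query), fine(query)
    hits = []
    for qpos in range(len(q_coarse) - k + 1):
        # Step 1: exact k-mer match in the coarse alphabet (index lookup).
        for sid, dpos in index.get(q_coarse[qpos:qpos + k], []):
            # Step 2: seed plus `ext` following residues must match in the fine alphabet.
            q_win = q_fine[qpos:qpos + k + ext]
            d_win = fine_db[sid][dpos:dpos + k + ext]
            if len(q_win) == len(d_win) == k + ext and q_win == d_win:
                hits.append((qpos, sid, dpos))
    return hits


if __name__ == "__main__":
    db = ["MKVLAAGIT", "MRIFSSGVT"]
    index, fine_db = build_index(db, k=4)
    print(two_step_seeds("MKVLAAGIT", index, fine_db, k=4, ext=2))
    # [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 3)]
    print(two_step_seeds("LHVFTTAIS", index, fine_db, k=4, ext=2))
    # [] : every coarse k-mer matches sequence 1, but the fine-alphabet check rejects them all
```

In the usage example, the second query is identical to `MRIFSSGVT` in the coarse alphabet. The first step therefore reports six candidate seeds, and the finer alphabet removes all of them. This cheap-then-strict division is the general motivation for splitting a seed search into two steps. The actual alphabets and acceptance criteria used by the authors may differ.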
doi:10.3390/genes12091455 pmid:34573438 pmcid:PMC8469100 fatcat:neohyzn6bngpxdu53va3vm3rhy