A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is
String Kernel (SK) techniques, especially those using gapped k-mers as features (gk), have obtained great success in classifying sequences like DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slow when we increase the dictionary size (Σ) or allow more mismatches (M). This is because current gk-SK uses a trie-based algorithm to calculate co-occurrence of mismatched substrings resulting in a time cost proportional to O(ΣM). We propose a fast algorithm for calculatingdoi:10.1101/329425 fatcat:v6dvhqoa2jgo7gfhklwooxfq5u