DeepMinimizer: A Differentiable Framework for Optimizing Sequence-Specific Minimizer Schemes [article]

Minh Hoang, Hongyu Zheng, Carl Kingsford
2022 bioRxiv   pre-print
Minimizers are k-mer sampling schemes designed to generate sketches for large sequences that preserve sufficiently long matches between sequences. Despite their widespread application, learning an effective minimizer scheme with optimal sketch size is still an open question. Most work in this direction focuses on designing schemes that work well in expectation over random sequences, which have limited applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, require greedy approximations to solve an intractable discrete optimization problem on the permutation space of k-mer orderings. To address this challenge, we propose: (a) a reformulation of the combinatorial solution space using a deep neural network reparameterization; and (b) a fully differentiable approximation of the discrete objective. We demonstrate that our framework, DeepMinimizer, discovers minimizer schemes that significantly outperform state-of-the-art constructions on genomic sequences.
doi:10.1101/2022.02.17.480870 fatcat:a4rqqwjwhvhbfal5qkxxbjs3lu
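
As a rough illustration of what the abstract means by a minimizer scheme, the sketch below selects, in every window of w consecutive k-mers, the k-mer that ranks lowest under some ordering, and reports the fraction of k-mer positions selected. This is not the DeepMinimizer method: the function names, the hash-based default ordering, the leftmost tie-break, and the choice to normalize density by the number of k-mers are all assumptions made for this example.

```python
import hashlib


def hash_rank(kmer):
    """Hypothetical default ordering: a pseudo-random rank from a hash of the k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_positions(seq, k, w, rank=hash_rank):
    """Return the sorted start positions of the k-mers selected as minimizers.

    Each window of w consecutive k-mers contributes its lowest-ranked k-mer;
    ties are broken by taking the leftmost position (an assumption).
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    ranks = [rank(seq[i:i + k]) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (ranks[i], i)))
    return sorted(selected)


def density(seq, k, w, rank=hash_rank):
    """Fraction of k-mer positions selected (assumed normalization: number of k-mers)."""
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return 0.0
    return len(minimizer_positions(seq, k, w, rank)) / n_kmers


if __name__ == "__main__":
    # Lexicographic ordering on a toy sequence, for a deterministic example.
    print(minimizer_positions("ACGTACGT", k=2, w=3, rank=lambda kmer: kmer))
    print(density("ACGTACGT", k=2, w=3, rank=lambda kmer: kmer))
```

Passing a different `rank` function changes which k-mers are selected and therefore the sketch size. Choosing that ordering well for one specific sequence is the discrete search problem the abstract describes.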