A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are "novel" compared to the others in the same dataset, and low weights to sequences that are over-represented. We formalise this principle by rigorously defining the …
doi:10.1101/2020.12.03.410100 fatcat:5naoqzvkxne35aeo3y5rjqvf6i
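To make the weighting principle concrete, here is a minimal sketch of one classic scheme, the position-based weights of Henikoff and Henikoff (1994). It is not the formalisation this abstract goes on to describe, which is cut off in this record. The function name `position_based_weights` and the toy alignment are illustrative assumptions, and gap characters are treated like any other symbol for simplicity.

```python
from collections import Counter


def position_based_weights(alignment):
    """Henikoff-style position-based sequence weights (illustrative sketch).

    `alignment` is a list of equal-length aligned sequences (strings).
    In each column, a sequence receives 1 / (k * n), where k is the number
    of distinct symbols in the column and n is the number of sequences
    sharing this sequence's symbol. Column scores are summed per sequence
    and then normalised so that the weights sum to 1.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if length == 0:
        raise ValueError("aligned sequences must not be empty")
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)   # symbol -> number of sequences carrying it
        k = len(counts)            # number of distinct symbols in this column
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (k * counts[symbol])

    total = sum(weights)           # equals the number of columns
    return [w / total for w in weights]


# Toy alignment (hypothetical data): three near-identical sequences
# and one divergent sequence.
toy_alignment = ["ACGT", "ACGT", "ACGA", "TTGA"]
weights = position_based_weights(toy_alignment)
for seq, w in zip(toy_alignment, weights):
    print(seq, round(w, 4))
```

In this toy example the divergent sequence `TTGA` receives the largest weight, while the three similar sequences share the rest equally, which matches the stated aim of up-weighting "novel" sequences and down-weighting over-represented ones.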