Proceedings of the Annual International Conference on BioInformatics and Computational Biology & Proceedings of the Annual International Conference on Advances in Biotechnology
The D2 statistic, defined as the number of matches of words of some pre-specified length k between two sequences, is a computationally fast, alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as it may be susceptible to single-sequence noise. We examine the extent of the problem and the effectiveness of overcoming it by using a mean-centred version of the statistic. We conclude that the D2 statistic is a useful measure of sequence similarity.
doi:10.5176/978-981-08-8119-1_bicb23 fatcat:zm46xjb3hjdszjfuapusbw5iky
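
To make the definition concrete, the following is a minimal Python sketch of the word-match count, together with one possible mean-centred variant. It is illustrative only and is not taken from the paper: the function names (`word_counts`, `d2`, `d2_centred`) are hypothetical, and the centring step assumes independent, uniformly distributed letters over a DNA alphabet, which may differ from the centring the authors actually use.

```python
from collections import Counter
from itertools import product


def word_counts(seq, k):
    """Count the overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of k-word matches between seq_a and seq_b.

    Equivalent to summing, over every word w, the product of the
    number of occurrences of w in each sequence.
    """
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())


def d2_centred(seq_a, seq_b, k, alphabet="ACGT"):
    """Mean-centred variant (assumption: i.i.d. uniform letters).

    Each word count is reduced by its expected value under the
    assumed background model before the products are summed.
    """
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    p_word = (1.0 / len(alphabet)) ** k           # probability of any one word
    mean_a = (len(seq_a) - k + 1) * p_word        # expected count in seq_a
    mean_b = (len(seq_b) - k + 1) * p_word        # expected count in seq_b
    total = 0.0
    for letters in product(alphabet, repeat=k):   # every possible k-word
        w = "".join(letters)
        total += (counts_a[w] - mean_a) * (counts_b[w] - mean_b)
    return total


if __name__ == "__main__":
    print(d2("ACGTACGT", "CGTACGTA", 3))
    print(d2_centred("ACGTACGT", "CGTACGTA", 3))
```

The uncentred count only needs the words that actually occur, which is what makes it fast; the centred variant in this sketch loops over all possible words, so its cost grows with the alphabet size raised to the power k.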