Detection of Underrepresented Biological Sequences Using Class-Conditional Distribution Models
[chapter]
2003
Proceedings of the 2003 SIAM International Conference on Data Mining
A labeled sequence data set related to a certain biological property is often biased and, therefore, does not completely capture its diversity in nature. To reduce this sampling bias problem, a data mining procedure is proposed for detecting underrepresented relevant sequences. The procedure is aimed at helping domain experts achieve a cost-effective qualitative enlargement of knowledge through an in-depth study of a small number of statistically underrepresented and functionally interesting
doi:10.1137/1.9781611972733.30
dblp:conf/sdm/VuceticPXO03
fatcat:xdetkknnafbulexarogp7pozpi