A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is
Given a black-box classification model and an unlabeled evaluation dataset from some application domain, efficient strategies need to be developed to evaluate the model. Random sampling allows a user to estimate metrics like accuracy, precision, and recall, but may not provide insight to high-confidence errors. High-confidence errors are rare events for which the model is highly confident in its prediction, but is wrong. Such errors can represent costly mistakes and should be explicitlyarXiv:2102.12844v1 fatcat:5d4fv4izo5axljv5orqujjghum