PMLB v1.0: An open source dataset collection for benchmarking machine learning methods [article]

Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, Jason H. Moore
2021 arXiv   pre-print
Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods
more » ... gated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community. Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.
arXiv:2012.00058v3 fatcat:7n7hgofluvfu5lowpc7pgtf2w4