Hands-on training about overfitting

Janez Demšar, Blaž Zupan
2021 PLoS Computational Biology  
Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis.
doi:10.1371/journal.pcbi.1008671 pmid:33661899 pmcid:PMC7932115 fatcat:cckfe3xxmbgohcx6gpearweuii