Crowdsourcing chart digitizer: task design and quality control for making legacy open data machine-readable

Satoshi Oyama, Yukino Baba, Ikki Ohmukai, Hiroaki Dokoshi, Hisashi Kashima
2016 International Journal of Data Science and Analytics  
Despite recent open data initiatives in many countries, a significant percentage of the data provided is in non-machine-readable formats, such as images, rather than in machine-readable electronic formats, which restricts its usability. Various types of software for digitizing data chart images have been developed. However, such software is designed for manual use and thus requires human intervention, making it unsuitable for automatically extracting data from a large number of chart images. This paper describes the first unified framework for converting legacy open data in chart images into a machine-readable and reusable format by using crowdsourcing. Crowd workers are asked not only to extract data from an image of a chart but also to reproduce the chart objects in a spreadsheet. The properties of the repro-
This paper is an extended version of the DSAA 2015 long presentation paper "From one star to three stars: upgrading legacy open data using crowdsourcing" [1].
doi:10.1007/s41060-016-0025-y dblp:journals/ijdsa/OyamaBODK16 fatcat:iy7nffj53bbpnf4o5nw7ih3rsq