Assessing Crowdsourced POI Quality: Combining Methods Based on Reference Data, History, and Spatial Relations

Guillaume Touya, Vyron Antoniou, Ana-Maria Olteanu-Raimond, Marie-Dominique Van Damme
2017 ISPRS International Journal of Geo-Information  
With the development of location-aware devices and the success and high use of Web 2.0 techniques, citizens are able to act as sensors by contributing geographic information. In this context, data quality is an important aspect that should be taken into account when using this source of data for different purposes. The goal of the paper is to analyze the quality of crowdsourced data and to study its evolution over time. We propose two types of approaches, each with two different methods: (1) use the intrinsic characteristics of the crowdsourced datasets; or (2) evaluate crowdsourced Points of Interest (POIs) using external datasets (i.e., authoritative reference or other crowdsourced datasets). The potential of the combination of these approaches is then demonstrated, to overcome the limitations associated with each individual method. In this paper, we focus on POIs and places coming from the very successful crowdsourcing project OpenStreetMap (OSM). The results show that the proposed approaches are complementary in assessing data quality. The positive results obtained for data matching show that the analysis of data quality through automatic data matching is possible, but considerable effort and attention are needed for schema matching given the heterogeneity of OSM and the representation of authoritative datasets. For the features studied, it can be noted that change over time is sometimes due to disagreements between contributors, but in most cases the change improves the quality of the data.

... with OSM_ID 26691437. Through an iterative process, all the versions of each OSM feature in scope were downloaded and stored in a PostgreSQL/PostGIS database. This method provided a complete timeline of the OSM edits made in the area for the data of interest.

Reference Dataset

The reference data were extracted from the BD TOPO database produced by IGN. BD TOPO is a topographic dataset with a positional accuracy below 1 m. Its scope does not cover all POIs that are captured in OSM (e.g., there are no shops or restaurants), but the POI layer covers education, administration, transportation, religion, health, sports, and hydrography, which gives a sufficient overlap with OSM POIs for comparison purposes. The attribute values, originally in French, have been translated to English to enable semantic matching with OSM. The IGN POI dataset contains 6202 features.
Changes in IGN features are due to updates (e.g., a change in the real world or the correction of errors), changes in specifications, or partnerships with organizations that provide data to IGN (e.g., the Ministry of Education for the position of schools, RATP for local public transportation).

Flickr Dataset

The development of Web 2.0 [21] has led to a bi-directional Web where the aim of many web-based applications is to provide platforms that enable users to create and publish their own content and share it with other users. This development has led to the emergence of social networking websites. Among the first examples of such websites were photograph-sharing applications, e.g., Flickr, Picasa Web, or the more recent Instagram, which urge users to share their photographs along with titles, comments, keywords (known as tags), and their location (known as geo-tagging), and then to use them as a means of networking. While geography or the location of the content is not the prime feature of such applications, they can still be implicit sources of GI, since newly developed GI retrieval methods can transform implicit information into geospatial content [22]. In this study, geo-tagged photographs downloaded from Flickr have been used to evaluate the validity and quality of ambiguous OSM features. Using the Flickr API, all geo-tagged photographs uploaded to the Flickr website between April and December 2015 were downloaded, yielding a total of 79,722 geo-tagged photographs. These help determine the presence and type of OSM POIs that cannot be identified using satellite imagery (e.g., POIs located under trees, or the type of a building).
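The text says only that the photographs were obtained "using the Flickr API", so the following is a minimal sketch of how such a download might look, not the authors' actual procedure. It assumes the public `flickr.photos.search` method with its `bbox`, `has_geo`, `min_upload_date`, and `max_upload_date` parameters. The bounding box, the API key, and the function names (`search_params`, `parse_page`, `fetch_all`) are hypothetical placeholders introduced for illustration.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

FLICKR_REST = "https://api.flickr.com/services/rest/"


def search_params(api_key, bbox, start, end, page=1):
    """Build the query parameters for one page of flickr.photos.search.

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are
    timezone-aware datetimes bounding the upload date.
    """
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in bbox),
        "has_geo": 1,                      # geo-tagged photographs only
        "min_upload_date": int(start.timestamp()),
        "max_upload_date": int(end.timestamp()),
        "extras": "geo,tags",              # ask for coordinates and tags
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }


def parse_page(payload):
    """Extract id, coordinates, title, and tags from one JSON response page."""
    photos = payload["photos"]
    rows = [
        {
            "id": p["id"],
            "lat": float(p["latitude"]),
            "lon": float(p["longitude"]),
            "title": p.get("title", ""),
            "tags": p.get("tags", "").split(),
        }
        for p in photos["photo"]
    ]
    return rows, int(photos["pages"])


def fetch_all(api_key, bbox, start, end):
    """Page through the search results (performs network requests)."""
    page, pages, out = 1, 1, []
    while page <= pages:
        url = FLICKR_REST + "?" + urlencode(
            search_params(api_key, bbox, start, end, page))
        with urlopen(url) as resp:
            rows, pages = parse_page(json.load(resp))
        out.extend(rows)
        page += 1
    return out


# Upload window from the text: April to December 2015 (UTC assumed).
START = datetime(2015, 4, 1, tzinfo=timezone.utc)
END = datetime(2016, 1, 1, tzinfo=timezone.utc)
# Hypothetical study-area bounding box; the real extent is not given here.
EXAMPLE_BBOX = (2.0, 48.0, 3.0, 49.0)
```

A call such as `fetch_all("YOUR_API_KEY", EXAMPLE_BBOX, START, END)` would return one dictionary per photograph, which could then be loaded into the same PostGIS database as the OSM features. For a large area, a real download would probably need to split the bounding box or the date range into smaller queries, since a single search does not return unlimited results.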
doi:10.3390/ijgi6030080 fatcat:vmryhwtl7jgepdvbolosamztiy