Geocoding with OpenStreetMap Data

Konstantin Clemens
OpenStreetMap (OSM) is a platform where users contribute geographic data. To serve multiple use cases, these data are held in a very generic format, which makes processing and indexing OSM data a challenge. Nominatim is an open source search and geocoding engine that consumes OSM data. While Nominatim processes OSM data well, it does not rank search results by term frequency-inverse document frequency (TF/IDF). Lucene is a framework offering TF/IDF for ranking of indexed documents. In this paper, Nominatim's processing of OSM data is used to assemble full addresses with their geocoordinates. These addresses are then indexed in Elasticsearch, a web service on top of Lucene. The resulting TF/IDF-based geocoding system is benchmarked against plain Nominatim. The analysis shows that TF/IDF-based ranking yields more accurate results, especially for queries with unordered address elements or only partially specified addresses.
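To illustrate why TF/IDF ranking tolerates unordered or partial address queries, here is a minimal pure-Python sketch. It is not the system described in the paper and does not reproduce Lucene's or Elasticsearch's actual scoring formula. The sample addresses, the function names (`tokenize`, `build_index`, `score`, `rank`), and the simplified weighting `sqrt(tf) * log(1 + N/df)` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical sample addresses; in the paper these would be assembled
# from OSM data via Nominatim's processing.
ADDRESSES = [
    "Alexanderplatz 1, 10178 Berlin, Germany",
    "Hauptstrasse 5, 10827 Berlin, Germany",
    "Hauptstrasse 5, 69117 Heidelberg, Germany",
]

def tokenize(text):
    """Lowercase the text and split it on whitespace and commas."""
    return text.lower().replace(",", " ").split()

def build_index(addresses):
    """Return per-document term counts and document frequencies."""
    docs = [Counter(tokenize(a)) for a in addresses]
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # each term counted once per document
    return docs, df

def score(query, doc, df, n_docs):
    """Sum a simplified TF/IDF weight over the query terms found in doc."""
    total = 0.0
    for term in set(tokenize(query)):
        tf = doc.get(term, 0)
        if tf == 0:
            continue  # missing terms contribute nothing; order is ignored
        idf = math.log(1 + n_docs / df[term])
        total += math.sqrt(tf) * idf
    return total

def rank(query, addresses):
    """Return the matching addresses, best score first."""
    docs, df = build_index(addresses)
    scored = [(score(query, d, df, len(docs)), a) for d, a in zip(docs, addresses)]
    scored.sort(key=lambda pair: -pair[0])
    return [a for s, a in scored if s > 0]

if __name__ == "__main__":
    # Address elements given out of order still find the right document.
    print(rank("berlin hauptstrasse 5", ADDRESSES)[0])
    # A partially specified address matches on the terms it does contain.
    print(rank("heidelberg", ADDRESSES))
```

Because the score is a sum over matching terms, the order of the query terms is irrelevant, and a query that names only some address elements still ranks the documents containing them.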