[JT] Jochen Topf's Blog
Fri 2014-02-21 19:30

OSM Test Data Repository

I have been writing lots of software for the OpenStreetMap project, and there is always one problem: how to test it? There is a huge variety of data in OSM: every conceivable way of tagging, and every way of getting the data slightly (or not so slightly) wrong, will sooner or later appear. We need to test file formats, the syntax and semantics of OSM objects, changing data, cases with failing referential integrity or invalid geometries, and much, much more. Just coming up with all those test cases is a huge task, and implementing the tests is even more work.

I have often wished that there was a common test repository that could be used to test many different programs in the OSM world, and I know others have wished the same. There were a few efforts to create such a repository, but they never went anywhere. Now I am trying again.

The current effort was triggered by my work for Mapbox on the Osmium library, in particular the new multipolygon assembly code I am working on. But there is much more code in Osmium that has not been tested well enough. Unlike some previous efforts to build such a test data repository, this one is part of my paid work, so there is a chance that it will get somewhere. Collecting test cases and writing unit tests for them is incredibly tedious… but it needs to be done.

Test cases in QGIS

You can see the current state of my efforts in the osm-testdata repository. I have spent some time organizing and re-organizing the test cases. They should be organized so that it is reasonably easy both to run them automatically and to look at them, for instance with QGIS, and see what's what. As an example, the image shows how the multipolygon test cases can be visualized in QGIS. (A green border means the test case contains valid data; a red border means it contains invalid data. Nodes and ways are brown, and ways are labelled with their IDs. The multipolygons created are blue. Test numbers are in the middle of the grid squares; gray grid squares are test numbers that are currently unused. If you look closely, you'll notice that test case 730 shows the wrong result.)

See the project README.md and the grid README.md for more information on how the data is currently organized. If you have QGIS 2 installed, clone the repository and load the tests.qgs QGIS project file to have a look around. Of course, not all test data can be neatly organized into this grid, but we'll find other places for that data later.

I have started to use this test data in the development of libosmium. See the test/data-tests directory for the code.

Of course, this is only the beginning. We need many more test cases covering a wide range of issues. And I fully expect there to be changes to accommodate other people's testing needs. Please contact me if you are interested in using this (or other) test data and/or if you already have test data and want to talk about integrating it.

Tags: dev · openstreetmap · osmdata