[JT] Jochen Topf's Blog
Wed 2012-08-15 20:43

Osmium MultiPolygon Refactoring

Osmium has had support for assembling multipolygons from relations for a long time. But the code had been lifted from an earlier Geofabrik project; it was rather messy and hard to extend, and it contained several memory leaks. Over the last month or so I finally chewed through the whole thing bit by bit and made it prettier.

First I pulled out the code that reads the relations and finds all their members. There is now a new class, Osmium::Relations::Assembler, that does this in a generic way. This gives us a powerful and easy-to-use way of handling any kind of relation, such as public transport routes. Specific relation types are handled by writing a subclass of the Assembler class, specifying a few template parameters and overriding a few member functions.
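To illustrate the general idea of a two-pass relation assembler with subclass hooks, here is a minimal, self-contained sketch. It is not Osmium's API: the types Way and Relation, the classes RelationAssemblerBase and RouteCounter, and the hooks keep_relation() and complete_relation() are all hypothetical. Only way members are tracked, and plain virtual functions stand in for whatever mix of template parameters and overridden member functions the real Assembler uses.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal OSM types for this sketch (not Osmium's classes).
struct Way {
    long id;
    std::vector<long> node_ids;
};

struct Relation {
    long id;
    std::map<std::string, std::string> tags;
    std::vector<long> way_member_ids;  // only way members, for brevity
};

// Generic two-pass assembler: pass 1 remembers interesting relations,
// pass 2 collects their member ways and reports each complete relation.
class RelationAssemblerBase {
public:
    virtual ~RelationAssemblerBase() = default;

    // Pass 1: keep relations the subclass is interested in.
    void relation(const Relation& rel) {
        if (!keep_relation(rel)) return;
        const std::size_t idx = relations_.size();
        relations_.push_back({rel, {}, rel.way_member_ids.size()});
        for (long wid : rel.way_member_ids) {
            wanted_[wid].push_back(idx);  // way ID -> relations needing it
        }
        if (rel.way_member_ids.empty()) finish(idx);
    }

    // Pass 2: store ways that are members of a kept relation.
    void way(const Way& w) {
        auto it = wanted_.find(w.id);
        if (it == wanted_.end()) return;
        for (std::size_t idx : it->second) {
            RelInfo& info = relations_[idx];
            info.ways.push_back(w);
            if (--info.missing == 0) finish(idx);  // all members present
        }
    }

protected:
    // Hooks a subclass overrides.
    virtual bool keep_relation(const Relation&) { return true; }
    virtual void complete_relation(const Relation&, const std::vector<Way>&) = 0;

private:
    struct RelInfo {
        Relation rel;
        std::vector<Way> ways;
        std::size_t missing;  // member ways not yet seen
    };

    void finish(std::size_t idx) {
        complete_relation(relations_[idx].rel, relations_[idx].ways);
        relations_[idx].ways.clear();  // drop member ways once handled
    }

    std::vector<RelInfo> relations_;
    std::unordered_map<long, std::vector<std::size_t>> wanted_;
};

// Example subclass: count the member ways of every type=route relation.
class RouteCounter : public RelationAssemblerBase {
public:
    std::map<long, std::size_t> way_counts;

protected:
    bool keep_relation(const Relation& rel) override {
        auto it = rel.tags.find("type");
        return it != rel.tags.end() && it->second == "route";
    }
    void complete_relation(const Relation& rel, const std::vector<Way>& ways) override {
        way_counts[rel.id] = ways.size();
    }
};
```

Relations are referenced by index rather than by pointer, so growing the relations_ vector during the first pass cannot invalidate anything stored in the lookup table.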

The multipolygon-specific code is in such a subclass, called Osmium::MultiPolygon::Assembler. It uses the Osmium::MultiPolygon::Builder class for the actual building of the multipolygons. This functionality used to be in the Osmium::OSM::Area class. The Area class is now only a small class similar to Node, Way, and Relation.
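A central step in building a multipolygon is joining the member ways end to end into closed rings. The sketch below shows one common greedy approach under simplifying assumptions: ways are plain lists of node IDs, geometry (and with it the inner/outer classification, which in Osmium involves GEOS) is ignored, and assemble_rings() is a hypothetical function, not part of Osmium::MultiPolygon::Builder. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

using NodeList = std::vector<long>;  // node IDs of a way, in order

// Join way segments end to end into closed rings (first node == last node).
// Returns std::nullopt if some segment cannot be closed into a ring.
std::optional<std::vector<NodeList>> assemble_rings(const std::vector<NodeList>& ways) {
    std::vector<bool> used(ways.size(), false);
    std::vector<NodeList> rings;

    for (std::size_t start = 0; start < ways.size(); ++start) {
        if (used[start] || ways[start].size() < 2) continue;  // skip degenerate ways
        used[start] = true;
        NodeList ring = ways[start];

        // Keep appending unused segments that continue from the ring's open end.
        while (ring.front() != ring.back()) {
            bool extended = false;
            for (std::size_t i = 0; i < ways.size() && !extended; ++i) {
                const NodeList& w = ways[i];
                if (used[i] || w.size() < 2) continue;
                if (w.front() == ring.back()) {
                    // Same direction: append all but the shared first node.
                    ring.insert(ring.end(), w.begin() + 1, w.end());
                    extended = true;
                } else if (w.back() == ring.back()) {
                    // Opposite direction: append reversed, minus the shared node.
                    ring.insert(ring.end(), w.rbegin() + 1, w.rend());
                    extended = true;
                }
                if (extended) used[i] = true;
            }
            if (!extended) return std::nullopt;  // open ring: invalid multipolygon
        }
        rings.push_back(ring);
    }
    return rings;
}
```

The search is quadratic in the number of ways and simply takes the first matching segment, so it can mis-join rings that touch at a shared node; a real builder has to handle such cases more carefully.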

The few huge functions that created the multipolygons have been split up into many smaller functions. This makes it much easier to see what is going on and to improve the code. I have found and plugged most of the memory leaks. (There is one leak left that only shows up when there are touching inner rings. I am not sure whether that's a problem in the Osmium code or in GEOS.) I have also improved many other aspects of the code. There is still more to do to make the code cleaner, faster and more easily extensible, but it is good enough to be released now.

I have also spent some time thinking about how all this functionality can be used more easily. You can see the result in the example programs. The osmium_mpdump example simply assembles all members of all relations and dumps some information to stdout. It takes only a few lines of code to do this. And it takes only a few lines more to write assembled multipolygons to any GIS format supported by the OGR library (see osmium_toogr2 for example code).

I have timed the new code. It seems to be slightly slower and to use slightly less RAM, but the differences are marginal. With the cleaner code it should be possible to find more optimization opportunities.

Tags: c++ · dev · openstreetmap · osmdata · osmium