Some months ago I introduced Taginfo and mentioned in passing that its statistics collection is built on a C++ framework called Osmium. I promised another post, and I am delivering on that promise now. I have been working on Osmium on and off over the last 6 months or so and, while it is far from perfect, I think it could actually be useful for other people now. So let's dive right in. What is Osmium and what is it good for?
Osmium is a C++ framework for working with OSM data files. Osmium will parse OSM files (XML or PBF) and call back into handlers for each object (node, way, relation) it encounters along the way. Osmium tries to be a very thin layer with as little overhead as possible. It doesn’t do anything with the OSM data, not even store it. It just gives you those objects one after another. The handlers it calls can then do interesting things with those objects.
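To make the callback idea a bit more concrete, here is a minimal, self-contained sketch of the pattern. It does not use Osmium itself: the Node, Way, Relation and Handler types and the dispatch() function are hypothetical stand-ins I made up for illustration, and Osmium's real classes and signatures may look quite different. The two example handlers count objects and track a bounding box while the objects stream past, without storing any of them.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins for OSM objects; these are not Osmium's real classes.
struct Node     { std::int64_t id; double lon; double lat; };
struct Way      { std::int64_t id; std::vector<std::int64_t> node_refs; };
struct Relation { std::int64_t id; };

// Hypothetical handler interface: one callback per object type, no-ops by default.
struct Handler {
    virtual ~Handler() = default;
    virtual void node(const Node&) {}
    virtual void way(const Way&) {}
    virtual void relation(const Relation&) {}
};

// Counts objects as they stream past.
struct CountHandler : Handler {
    std::size_t nodes = 0;
    std::size_t ways = 0;
    std::size_t relations = 0;
    void node(const Node&) override { ++nodes; }
    void way(const Way&) override { ++ways; }
    void relation(const Relation&) override { ++relations; }
};

// Tracks the bounding box of all node locations seen so far.
struct BboxHandler : Handler {
    double min_lon = std::numeric_limits<double>::max();
    double min_lat = std::numeric_limits<double>::max();
    double max_lon = std::numeric_limits<double>::lowest();
    double max_lat = std::numeric_limits<double>::lowest();
    void node(const Node& n) override {
        min_lon = std::min(min_lon, n.lon);
        min_lat = std::min(min_lat, n.lat);
        max_lon = std::max(max_lon, n.lon);
        max_lat = std::max(max_lat, n.lat);
    }
};

// Stand-in for the parser: hands every object to every handler, one object
// at a time, and keeps nothing itself. A real parser would read the objects
// from an XML or PBF file instead of taking them from vectors.
void dispatch(const std::vector<Node>& nodes,
              const std::vector<Way>& ways,
              const std::vector<Relation>& relations,
              const std::vector<Handler*>& handlers) {
    for (const Node& n : nodes)
        for (Handler* h : handlers) h->node(n);
    for (const Way& w : ways)
        for (Handler* h : handlers) h->way(w);
    for (const Relation& r : relations)
        for (Handler* h : handlers) h->relation(r);
}
```

To try it, fill the three vectors with a few objects, create a CountHandler and a BboxHandler, and pass pointers to both into dispatch(). Afterwards the handlers hold the results, while the dispatching code has stored nothing.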
Of course you can write your own handlers, but Osmium already comes with a few handlers:
The Statistics handler counts the number of nodes, ways, relations, tags etc. It’s pretty simple, really, and probably a good starting point if you want to write your own handler.
Even simpler is the Bbox handler that calculates the bounding box for the input data from the node locations.
The NodeLocationStore handler will store the location of each node in memory and then use this data to build the way geometries. There are two different ways of storing the data: one is better suited to smaller OSM files, the other to larger ones.
The TagStats handler is used for creating the statistics for Taginfo.
A more useful handler is the Multipolygon handler. It assembles proper multipolygons from relations tagged with type=multipolygon or type=boundary. It can even correct some common mistakes like rings that were not properly closed. If you add this handler to your application, you’ll get an additional callback for every (multi)polygon. This handler works only if you read the input file twice: on the first pass the handler stores information about all multipolygon relations in memory, and on the second pass it assembles the multipolygons from the node and way data.
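The two-pass idea can be sketched in a few lines. Two passes are needed because relations usually come after the ways in an OSM file, so on a single pass you would not yet know which ways to keep. The code below is only an illustration under my own assumptions: the Way and Relation structs and all function names are hypothetical, it is not how the Multipolygon handler is actually implemented, and close_ring() is the most naive repair imaginable rather than whatever Osmium really does.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for OSM objects; not Osmium's real classes.
struct Way {
    std::int64_t id;
    std::vector<std::int64_t> node_refs;
};
struct Relation {
    std::int64_t id;
    bool is_multipolygon;  // stands for type=multipolygon or type=boundary
    std::vector<std::int64_t> member_way_ids;
};

// Pass 1: remember the ids of all ways that some multipolygon relation needs.
std::set<std::int64_t> collect_member_ways(const std::vector<Relation>& relations) {
    std::set<std::int64_t> wanted;
    for (const Relation& r : relations) {
        if (!r.is_multipolygon) continue;
        wanted.insert(r.member_way_ids.begin(), r.member_way_ids.end());
    }
    return wanted;
}

// Pass 2: keep the node references of the wanted ways and skip everything else.
std::map<std::int64_t, std::vector<std::int64_t>>
collect_way_nodes(const std::vector<Way>& ways, const std::set<std::int64_t>& wanted) {
    std::map<std::int64_t, std::vector<std::int64_t>> kept;
    for (const Way& w : ways) {
        if (wanted.count(w.id) != 0) kept[w.id] = w.node_refs;
    }
    return kept;
}

// A ring is closed if it ends on the node it started with.
// Four references is the minimum: a triangle plus the repeated first node.
bool is_closed(const std::vector<std::int64_t>& ring) {
    return ring.size() >= 4 && ring.front() == ring.back();
}

// Naive repair (an assumption, not Osmium's method): append the first node again.
void close_ring(std::vector<std::int64_t>& ring) {
    if (!ring.empty() && ring.front() != ring.back()) ring.push_back(ring.front());
}
```

Only the ways that are members of a multipolygon relation are kept in memory, so the memory needed grows with the number of multipolygons and not with the size of the whole file.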
Osmium is available from GitHub. You’ll find more information in the README files. The documentation is currently pretty basic. Feel free to ask me questions if you want to use Osmium and don’t understand something.