After a long time I have finally released new versions of the Osmium command line tool and the Osmium C++ library. PyOsmium also got a new release to match the libosmium version.
Prompted by the new General Data Protection Regulation (GDPR) coming soon in the European Union, there is a discussion in the OSM community about having less personal information, such as user names, in the OSM data. I don’t want to go into the legal aspects here, but there are some technical issues to solve. In particular, the Osmium toolchain should be able to handle OSM data files in which some of the metadata on OSM objects, such as user name or timestamp, is missing.
Osmium already supported reading and writing files without any metadata (except object IDs, which are always needed), but it wasn’t possible to read or write data with only some metadata attributes and not others. This has now been changed and made completely flexible: any metadata attribute can be written to a file or left out. When reading files, missing metadata isn’t an error; a default value is simply assumed instead.
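To illustrate the "missing metadata gets a default" idea, here is a minimal Python sketch. It is not how libosmium does it internally; the attribute list mirrors the usual OSM metadata attributes, but the default values and the `read_node` helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, chosen for illustration only.
# The real defaults used by libosmium may differ.
METADATA_DEFAULTS = {
    "version": 0,
    "timestamp": "",
    "changeset": 0,
    "uid": 0,
    "user": "",
}


def read_node(xml_text):
    """Parse a single <node> element, tolerating missing metadata."""
    elem = ET.fromstring(xml_text)
    # The object ID is always required, so a missing ID raises KeyError.
    obj = {"id": int(elem.attrib["id"])}
    for name, default in METADATA_DEFAULTS.items():
        raw = elem.attrib.get(name)
        if raw is None:
            # Attribute not in the file: fall back to the default.
            obj[name] = default
        else:
            # Convert to the same type as the default (int or str).
            obj[name] = type(default)(raw)
    return obj


# A node with a version but no user, uid, changeset or timestamp.
node = read_node('<node id="17" lat="1.0" lon="2.0" version="3"/>')
print(node)
```

The point of the sketch is only that the reader never fails on an absent attribute: each metadata field is looked up independently, so any combination of present and missing attributes is accepted.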
Most of this work was done by Michael Reichert, who spent a huge amount of time testing different software and sending me patches and test cases. As a result, the new Osmium version supports OSM files with reduced metadata better than any of the other software he tested.
If you want to play around with this yourself, read the section on the add_metadata option in the file formats manual.
I went through the code in libosmium that reads OSM data files, including the XML parser and PBF reader. Some of this code wasn’t as robust as it could be. In some places it assumed data was correct, the XML correctly formatted etc. I have rewritten parts of the XML reader and added checks here and there to make sure the code will not fail even if somebody feeds it badly formatted input data. I have also used fuzzing to test this code extensively.
I also spent many hours going through the reports from the excellent clang-tidy linter tool which finds a lot of (potential) problems in C++ code. Most of the issues found by clang-tidy were probably not real problems, but I am happy for all the help I can get to make my code more robust and more correct. And I did find and fix some genuine bugs this way.
I have fixed nearly all of the issues reported by clang-tidy by now. Some warnings have been disabled where they don’t make much sense in our context, and I have documented this in the .clang-tidy config files (for libosmium, for osmium-tool). There are some false positives, and some reported problems I can’t fix because they are not in my code but in the C++ standard library or other external code, so the board is not completely green yet. But I’ll keep working on this and keep checking the reports from clang-tidy regularly.
The Osmium command line tool contains several further small changes that make life easier for its users or add options to commands. For instance, the bbox option now allows any two opposing corners instead of insisting on the bottom-left and top-right corners. This affects the changeset-filter and extract commands. Most of these changes are based on user feedback. See the change log for the complete list of changes.
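Accepting any two opposing corners boils down to sorting the coordinates per axis. The following Python sketch shows the idea; it is not the actual osmium-tool code, and the function names and the comma-separated input format are assumptions for this example.

```python
def normalize_bbox(x1, y1, x2, y2):
    """Return (left, bottom, right, top) for any two opposing corners."""
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def parse_bbox(text):
    """Parse a hypothetical 'x1,y1,x2,y2' string into a normalized bbox."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four coordinates")
    return normalize_bbox(*parts)


# Top-right corner given first, bottom-left second.
print(parse_bbox("9.0,48.5,8.0,47.5"))
```

With this normalization, all four ways of naming two opposing corners yield the same box, so the rest of the code can keep assuming the bottom-left/top-right order.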
Users of libosmium need to beware: the Protozero library used to be included in the libosmium code. While this was often convenient, it could also lead to problems when your application was already using a different version of Protozero than the one libosmium included. So now you have to install Protozero separately when using libosmium.
This libosmium release also contains a fix for a multipolygon problem: some more complex cases with touching inner rings weren’t handled properly. This was the last bug in the multipolygon code that I am aware of, which makes this version of libosmium a good basis for the next osm2pgsql release.