[JT] Jochen Topf's Blog
Tue 2021-04-27 13:24

Libosmium Speedup

Yesterday I released version 2.17.0 of the Osmium library. This version is noticeably faster when reading and writing PBF files.

Nothing is ever perfect, and that goes for software, too. I like going back to software that I wrote years ago and making it better. And for a library that is used in many places it makes a lot of sense to do that occasionally. Usually this is triggered when I work on some software to add a new feature or fix a bug and I notice something in the code which could be improved. Sometimes it is triggered by me finding a new tool which gives me some new insight or capability. In this case I stumbled upon the heaptrack memory profiler, which is really easy to use and gives you good insights into where memory is allocated and how much.

Memory allocation is often a performance problem, so it is always a good idea to take a closer look at code which does a lot of it. Much of the memory allocation in libosmium is in the input/output code, and the most important file format is PBF. There had also been a recent complaint from a user that the PBF code takes too much CPU time, so that was another reason to take a fresh look at that code and see what I could find. Some of what I found was related to memory management and I refactored some of that code. But the bigger issue in the PBF reading code, it turned out, was that some work was basically done twice. And this code (decoding varints) was quite expensive. Rewriting that section made the biggest difference.
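For readers who have not met varints before: PBF is built on Protocol Buffers, which store integers in a variable-length encoding with 7 payload bits per byte and the high bit marking "more bytes follow". Signed values are zigzag-encoded first, and many PBF fields are delta-coded on top of that. The following is only a minimal sketch of what decoding such a field in a single pass can look like. It is not the libosmium code, and the function names (decode_varint, decode_zigzag64, decode_packed_deltas) are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode one unsigned varint starting at *pos and advance *pos past it.
// Each byte carries 7 payload bits, least significant group first.
// A set high bit means another byte follows.
inline std::uint64_t decode_varint(const char** pos, const char* end) {
    std::uint64_t value = 0;
    // A 64-bit value needs at most 10 bytes (shifts 0, 7, ..., 63).
    for (unsigned shift = 0; shift < 70; shift += 7) {
        if (*pos == end) {
            throw std::out_of_range{"truncated varint"};
        }
        const auto byte = static_cast<unsigned char>(*(*pos)++);
        value |= static_cast<std::uint64_t>(byte & 0x7fU) << shift;
        if ((byte & 0x80U) == 0) {
            return value;
        }
    }
    throw std::invalid_argument{"varint longer than 10 bytes"};
}

// Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
inline std::int64_t decode_zigzag64(std::uint64_t n) {
    return static_cast<std::int64_t>((n >> 1U) ^ (~(n & 1U) + 1U));
}

// Decode a packed run of delta-coded, zigzag-encoded varints in one pass.
// Every byte of the input is looked at exactly once.
inline std::vector<std::int64_t> decode_packed_deltas(const char* data, std::size_t size) {
    std::vector<std::int64_t> out;
    const char* pos = data;
    const char* const end = data + size;
    std::int64_t current = 0;
    while (pos != end) {
        current += decode_zigzag64(decode_varint(&pos, end));
        out.push_back(current);
    }
    return out;
}
```

Because every decoded value costs a loop with a data-dependent branch per byte, walking the same bytes a second time (for instance once to count or skip values and once to read them) roughly doubles that cost. That is the kind of duplicated work that is worth hunting for in a decoder like this.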

PBF reading was already quite fast and well distributed over the available CPUs. With the new changes it became about 15% faster. On a modern server with 12 CPUs I measured reading and decoding of a planet PBF file at about 2.5 minutes.

Because PBF files are written less often than they are read, I have never given the writing code as much attention as the reading code. But many people need to, for instance, update planet files regularly, and this means writing a 60 GByte file. For technical reasons it is more difficult to move much of that work to other threads, i.e. CPUs. But this time around when I looked at the code I found an opportunity to do just that. Only for a small part of the code, but better than nothing. This made the writing code about 15% faster, too.

Taken together, this means a planet update can now be done in about 25 minutes on my server, saving about 4 to 5 minutes. Running osmium add-locations-to-ways is now about 25% faster overall. Not huge differences, but it means processing speed can keep up somewhat with the growth of the planet file.

In the previous libosmium version, 2.16.0, I had already added support for the LZ4 compression algorithm for compressing PBF blocks. LZ4 is much faster at compressing and decompressing than the zlib compression that is normally used. So for use cases where all the software you use supports LZ4 blocks, you can get even more performance out of libosmium.

There are more changes and fixes in the new release, see the change log. Thanks also to mmd who found several bugs in the OPL and O5M parsers and helped fix them.

The work on Osmium is partially supported by Geofabrik. Thank you!

Tags: openstreetmap · osmium