[JT] Jochen Topf's Blog
Mon 2023-09-25 12:39

Minutely updated map

Most OpenStreetMap-based maps you see these days on the Internet are only updated every so often. Some may be updated daily, most are probably only updated every few weeks or months. And that is totally okay for most use cases. But there are some maps that are updated continuously so that changes in the OSM data appear as fast as possible, which is important to motivate OSM contributors and helps with quickly detecting and fixing problems. The most important map with continuous updates is the main map you see on www.openstreetmap.org.

Minutely Diffs

To enable everybody to keep up with changes in OSM data, the OSMF publishes minutely diffs, i.e. files containing the OSM updates for a single minute. There are several challenges when keeping a map up to date from those diffs. The first is that the diffs don’t contain enough information. Only the immediate object that changed is in the diff, but not other objects that also need data from the changed object to draw them correctly. This means that you need a database with all the OSM data for the whole planet locally available to correctly interpret the diffs and update the map.
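
The dependency problem can be sketched in a few lines of Python. This is only an illustration, not how osm2pgsql implements it: the `ways` dictionary is a hypothetical, tiny stand-in for the full local database, and the function names are made up for this example. It shows why a diff that mentions only one changed node still forces every way using that node to be redrawn.

```python
from collections import defaultdict

# Hypothetical stand-in for the local copy of the planet data:
# way id -> ordered list of the node ids the way is built from.
ways = {
    100: [1, 2, 3],
    101: [3, 4],
    102: [5, 6],
}


def build_node_index(ways):
    """Map each node id to the set of way ids that reference it."""
    index = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            index[node_id].add(way_id)
    return index


def affected_ways(changed_nodes, node_index):
    """Return the sorted ids of all ways that use any of the changed nodes."""
    result = set()
    for node_id in changed_nodes:
        # .get() avoids creating empty entries for unknown nodes.
        result |= node_index.get(node_id, set())
    return sorted(result)


node_index = build_node_index(ways)

# The diff only contains node 3, but two ways depend on it.
print(affected_ways([3], node_index))
```

Without the reverse index, which can only be built from the complete data, there is no way to tell from the diff alone that ways 100 and 101 need new geometries.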

I am working with osm2pgsql, which solved this problem a long time ago and recently got some improvements on that front.

Data Conversion and Cleanup

The second problem is that even if you have an up-to-date OSM database available, you still need to draw the map from that data. And the OSM data format isn’t a good fit for that job. You can directly draw a map from OSM data (the JOSM editor does this for instance), but usually you want other data formats in between (such as PostgreSQL/PostGIS tables or vector tiles).

Osm2pgsql was invented years ago to solve this problem by doing conversions from OSM data into database tables with geometries which can be rendered into maps directly by many tools out there. With the flex output I have been working on for several years now, this conversion has become much more powerful, so that we are getting better and better at converting the data into formats that are easier to work with on the rendering side. In this step we can also clean up the data which makes it considerably easier to work with when rendering.

Generalization

The third problem is generalization. The conversions described above usually only work on one OSM object at a time (or a few, like a way and its member nodes). That works well enough for high zoom levels where you want to render features at the same level of detail as they are modelled in the OSM database. But for lower zoom levels you need to aggregate hundreds, sometimes tens of thousands of OSM objects into something larger to show on the map. You can, sort of, do without generalization, but then rendering can become very slow, which means the continuously updated map no longer updates so quickly.

I have been working on that problem in the generalization project, adding code to osm2pgsql to quickly calculate generalized data from the OSM data already in the database when something changes and write the data back into the database.
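
One way to keep such recalculation cheap is to work out which low-zoom tiles a change touches and redo only those. The sketch below is an assumption for illustration, not a description of the osm2pgsql code: it uses the standard slippy-map tile formula, and the function names `lonlat_to_tile`, `parent_tile` and `dirty_tiles` are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) slippy-map tile containing a WGS84 coordinate."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that coordinates on the world edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def parent_tile(x, y, zoom, target_zoom):
    """Return the ancestor of tile (x, y) at the lower zoom target_zoom."""
    shift = zoom - target_zoom
    return x >> shift, y >> shift


def dirty_tiles(changed_points, zoom):
    """Collect the set of tiles at one zoom level touched by changed points."""
    return {lonlat_to_tile(lon, lat, zoom) for lon, lat in changed_points}


# Two edits close to each other (made-up coordinates).
changes = [(8.40, 49.01), (8.41, 49.02)]
for zoom in (14, 10, 6):
    print(zoom, sorted(dirty_tiles(changes, zoom)))
```

The lower the zoom level, the more edits collapse into the same tile, so a minute's worth of changes typically marks only a small number of low-zoom tiles for regeneration.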

Technology Demo

To tie all of this together and to show what we can do today (and what we can’t do yet) I have created this technology demo. The map on that page is updated continuously from OSM data using the minutely diffs. For most data this includes updates of all the lower zoom levels. The only exception is the roads on lower zoom levels, which are only updated every once in a while, because I haven’t yet figured out a way to do this really quickly. And the ocean data is also only updated once daily from osmdata.openstreetmap.de.

Tiles are rendered on the fly from the PostgreSQL/PostGIS database. There is no caching involved; each tile is created anew for each request. Of course this is not fit for a production map, where some caching will be needed, but I left it out of this map on purpose to demonstrate how fast this setup still is at producing those tiles.
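
Rendering a tile on request starts with turning the requested z/x/y address into a bounding box that the database can be queried with. The following is a minimal sketch of that calculation in Web Mercator (EPSG:3857), assuming the usual tile scheme with the origin at the top left. The function name `tile_envelope` is made up for this example; PostGIS offers `ST_TileEnvelope` for the same purpose.

```python
import math

# Half the side length of the Web Mercator (EPSG:3857) world square in metres.
HALF_WORLD = math.pi * 6378137.0


def tile_envelope(zoom, x, y):
    """Return (xmin, ymin, xmax, ymax) of tile zoom/x/y in EPSG:3857 metres."""
    size = 2 * HALF_WORLD / (2 ** zoom)  # side length of one tile
    xmin = -HALF_WORLD + x * size        # x grows from west to east
    ymax = HALF_WORLD - y * size         # y grows from north to south
    return (xmin, ymax - size, xmin + size, ymax)


# The single zoom-0 tile covers the whole world square.
print(tile_envelope(0, 0, 0))
# A tile at zoom 10 (made-up address).
print(tile_envelope(10, 535, 351))
```

Everything inside that box is then read from the tables and encoded as a vector tile. Because the box shrinks by half with every zoom level, the amount of data per request stays manageable only if the low zoom levels read from generalized tables.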

The tiles use the Shortbread tiles format. It is a somewhat simple format, especially compared to the wealth of detail you see on the OSM Carto map on openstreetmap.org, but it shows the most important stuff and we have to start somewhere.

Want to try this yourself?

My setup runs on an 8-core (16 with HT) machine with 128 GB RAM, and the database needs about 800 GB plus about 80 GB for the flat node file. Setup takes about 20 hours, including the initial import, the initial calculation of generalized data, and then catching up with the changes that happened in OSM in the meantime.

You’ll need osm2pgsql and osm2pgsql-themepark (use the current master). I am using the T-Rex tileserver in my current setup and the osm2pgsql-themepark framework will generate a configuration for that, but you can also use a different tileserver. (In fact development on T-Rex has stopped, so you might want to look into alternatives.)

What’s next?

This map is only a prototype. There is more work to be done to turn this into something that could be used for real. To make this operational the whole stack including web server, tile server, and caching has to be set up properly. Because I am not caching tiles, I didn’t have to set up cache expiry either.

The generalization code we have now is already working quite well for lots of layers, but it is kind of simplistic. It has to become faster and more robust and needs to be extended to other feature types. It currently takes about a minute to regenerate all the data for the lower zoom levels, which is kind of tight for something that should update every minute. And, as mentioned, the road layers can only be updated less often. (They currently need about 15 or 20 minutes to update for the whole planet.)

Over a decade ago OSM was the first map out there that not only everybody could change, but that also would show those changes for everybody to see in minutes. But we have become stuck at that point. Changes in higher zoom levels are shown quickly, but in lower zoom levels it takes more time. And, arguably, we have also been moving backwards, because typical vector tile maps don’t support minutely updates. With this demo and blog post I hope to get some more people interested in this work so that together we can make a better, continuously updated, always current OSM map. My focus will be to keep working on the building blocks in osm2pgsql to make this possible.

Tags: openstreetmap · osm2pgsql