[JT] Jochen Topf's Blog
Mon 2024-09-30 23:17

Introducing OSM XRAY

This blog post is about the new OSM XRAY service. You can find a demo here. It uses vector tiles with all OSM data to really show all of OSM, mostly for debugging. But it is also interesting to just explore what’s there.

Vector Tiles with ALL Data

Every time there is talk about vector tiles of OpenStreetMap data the question arises whether we can have vector tiles with “all of OSM”. Not just data with this or that tag suitable for rendering this or that type of map but everything. All of it. All objects from the OSM database.

It is possible to do that. Nine years ago Mapbox/HOT already did that with something called QA tiles. The project is long defunct though. And we have a lot more data by now. More recently Wat Mildon published his WAMap (See also this thread on community.openstreetmap.org) which also shows that this is totally possible.

So it is possible. And it turns out it is actually not that complicated. But is it useful?

There are two problems with vector tiles containing “everything”:

  1. “Everything from OSM” is quite a lot of data. Vector tiles work best if they are at most a few kilobytes each. If you include everything in OSM, vector tiles will often need a few megabytes. This data takes longer to transmit and the browser will struggle to render it. Everything will become sluggish.
  2. Rendering OSM data into a usable map is difficult. You need to take many tags and tag combinations into account, handle bad data, and handle OSM specialties such as complex relations or ways that can be either lines or polygons. And you need to do that with a rendering configuration that is rather limited and hard to use, like the styling language of MapLibre.

These problems become much more pronounced for smaller zoom levels. It might be slow, but on zoom level 14 you can download and show everything. But on zoom level 0 you would have to download the whole planet file with more than 70GB and process that in the browser just to show the outlines of the continents. Not really feasible.
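To put some numbers on this, here is a small Python sketch. It assumes the usual web map tiling scheme (the post does not spell it out), in which every tile splits into four at the next zoom level; the function names are chosen for this example only.

```python
import math


def tile_count(zoom: int) -> int:
    """Number of tiles needed to cover the whole world at a zoom level."""
    return 4 ** zoom


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the (x, y) tile that contains a WGS84 coordinate."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that coordinates on the outer edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom level 0, `tile_count(0)` is 1: a single tile has to represent the whole planet, while at zoom level 14 the same data is spread over hundreds of millions of tiles.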

The solution for all these problems is usually to preprocess the data on the server, decide which parts you need on which zoom level and convert the data into a format that the renderer can work with more easily.

Use Cases

Okay, so most use cases don’t need the “full OSM data” in vector tiles and they are better off doing things differently. But maybe there are use cases which do need it. I can think of two. One is experimentation. Sometimes you want to create a map but don’t know yet what OSM data there is and how best to transform it into something more useful and faster to render. Having all data available might make creating the map slow, but it is still possible. So you can show the data, tweak the display and, when you are done and know what you want, go back to the server, convert the data there and then have a fast map. This is a definite use case and I have used this approach many times, but usually with QGIS and not vector tiles. It is much easier to play around with a GUI tool.

The other use case I can think of is a debug view, similar to what the “Map Data” button on openstreetmap.org does, or to what you see in an OSM editor where you can click on everything and see the tags etc. In fact the existence of OSM editors in the browser shows that this is totally feasible and does make sense.

Now having said all of this, the question of what would be possible is still open. How far can we go?

How to do it?

Just creating vector tiles with everything in them is pretty easy. Use the flex output of osm2pgsql to create a database with all OSM objects having a “tags” JSONB column that contains all the tags and we are already half there. Using osm2pgsql gives me something extra that nobody has done yet with “full” vector tiles: The database can be updated continuously from minutely diffs, so we can show a map that always shows all current OSM data.

From the data in the database we need to create the vector tiles and we need to do it fast. There are several vector tile servers that can do that. For this project I am using pg_tileserv, because it will automatically convert all the data from the JSONB tags column into vector tile attributes and can be configured easily and flexibly using PostgreSQL functions. Creating vector tiles on something like zoom level 12 or 14 is very straightforward that way. In a densely mapped urban area the browser might take a while to download and render that tile, but it does work.
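As a rough illustration of what such a tile query can look like, here is a Python sketch that assembles a parameterized SQL statement. The PostGIS functions `ST_TileEnvelope`, `ST_AsMVTGeom` and `ST_AsMVT` exist; the table name `osm_objects`, its columns and the layer name `objects` are assumptions made up for this example and are not the actual XRAY schema. It also assumes the geometries are stored in Web Mercator (EPSG:3857).

```python
# Hypothetical table, column and layer names; not the actual XRAY schema.
TILE_SQL = """
WITH bounds AS (
    SELECT ST_TileEnvelope(%(z)s, %(x)s, %(y)s) AS geom
)
SELECT ST_AsMVT(t, 'objects') AS mvt
FROM (
    SELECT ST_AsMVTGeom(o.geom, bounds.geom) AS geom,
           o.osm_id,
           o.tags  -- JSONB column whose top-level keys become tile attributes
    FROM osm_objects AS o, bounds
    WHERE o.geom && bounds.geom
) AS t
"""


def tile_query(z: int, x: int, y: int):
    """Return the SQL text and parameters for one vector tile."""
    if z < 0 or not 0 <= x < 2 ** z or not 0 <= y < 2 ** z:
        raise ValueError(f"tile {z}/{x}/{y} is outside the tile grid")
    return TILE_SQL, {"z": z, "x": x, "y": y}
```

The returned pair could be passed to the `execute()` method of a PostgreSQL driver that uses named `%(name)s` placeholders. With pg_tileserv itself you would wrap a query like this in a PostgreSQL function instead.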

Of course if you want to do complex rendering this will become more and more expensive with every rendering rule you have. But for a debug site a wire-frame view is enough. So we only need a few rules for rendering black dots and lines.

For smaller zoom levels this isn’t a solution though. We need to somehow reduce the amount of data. I played around with several approaches for a while, for instance simplifying the geometries. But sooner or later this approach breaks down. So I went back to “ye olde raster tile” approach for smaller zoom levels. I could have gone with a renderer like Mapnik, but just for the fun of it I decided to try the raster capabilities of the PostGIS extension. Turns out you can render vectors into raster tiles completely inside the database and after some experimentation I got it to be reasonably performant, too. There aren’t many styling options, though, so this approach will not work for most use cases. But in my case I am interested only in, as I said, drawing some black dots and lines, and PostGIS can do that.

Okay, I have the outline of a solution here. Vector tiles for zoom level 14 and above, raster tiles for zoom levels 0 to 13.

Caching and Prerendering

For the higher zoom levels the database query is actually quite performant. And because this service is not intended for massive use, just as a debugging tool for a few mappers, we could probably get by without any kind of caching. But just for the fun of it I added some caching inside the database. For a high traffic site I would not consider this: the overhead of caching inside PostgreSQL compared to just about any other solution must be quite large. And the database is the bottleneck for the whole operation, so we should not add any load to it. But I realized I could add caching inside the database without any additional software component, just using the abilities of pg_tileserv and osm2pgsql for caching and expiry, so I wanted to try it. And at least I created the tile_cache table as UNLOGGED to reduce the overhead somewhat. We’ll see how that goes.
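The general cache-and-expire pattern can be sketched in a few lines of Python. This is an in-memory stand-in for illustration only: the setup described here keeps the cache in a PostgreSQL table and relies on osm2pgsql for expiry, whereas the class and method names below are invented for this example.

```python
from typing import Callable, Dict, Iterable, Tuple

Tile = Tuple[int, int, int]  # (zoom, x, y)


class TileCache:
    """Render a tile on the first request, then serve it from the cache."""

    def __init__(self, render: Callable[[int, int, int], bytes]) -> None:
        self._render = render
        self._tiles: Dict[Tile, bytes] = {}

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        if key not in self._tiles:
            # Cache miss: do the expensive rendering once and remember it.
            self._tiles[key] = self._render(z, x, y)
        return self._tiles[key]

    def expire(self, tiles: Iterable[Tile]) -> int:
        """Drop tiles touched by a data update; return how many were cached."""
        removed = 0
        for key in tiles:
            if self._tiles.pop(key, None) is not None:
                removed += 1
        return removed
```

After each data update, the list of changed tiles would be handed to `expire()`, so the next request for one of them triggers a fresh render.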

The smaller the zoom level gets, the more data needs to be processed for each tile. And that takes time. So from zoom level 10 down I pre-render all tiles. Some tiles are rendered from the data and for the smallest zoom levels they are just resized bitmaps of the middle zoom levels. Currently these are rendered once on import and never rerendered, but it should be possible to do that once a week or so. It takes a few hours to do this.
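Putting the zoom thresholds from this post together, the serving strategy can be sketched as follows. The function names and the returned labels are made up for this example; only the thresholds (vector tiles from zoom 14, pre-rendered tiles up to zoom 10) come from the text. The second function shows which four tiles of the next zoom level cover a given tile, which is one plausible basis for building a small-zoom bitmap by shrinking already rendered ones.

```python
def tile_strategy(zoom: int) -> str:
    """Say how a tile at this zoom level is produced."""
    if zoom < 0:
        raise ValueError("zoom must not be negative")
    if zoom >= 14:
        return "vector"
    if zoom > 10:
        return "raster-on-demand"
    return "raster-prerendered"


def child_tiles(z: int, x: int, y: int) -> list:
    """Return the four tiles at zoom z + 1 that cover tile z/x/y."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]
```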

Creating all of this took a few hours here and there on a few weekends. But I needed more, I needed some way to show all of this in a browser.

The Application

Actually creating the tiles was easy compared to figuring out how to show the data and how to interact with it. I wanted a fast and easy to use tool that allows exploring the OSM data. It took me quite a while and many experiments to get the interface to where it is now. How do you show tags? How do you handle selecting objects? What if there are multiple objects under the mouse cursor? How do you show relations? All at once or one member at a time?

The solution I came up with is certainly not perfect, but I am sure we’ll improve it over time. There are some clear limitations, for instance selections only work in zoom level 14 and above. But it is already fun to explore OSM that way. So I am happy to release it in the state it is in. I am excited to see what improvements we’ll come up with.

Also it turns out that there is so much OSM data that it is sometimes difficult to navigate this map full of black dots and lines that cover the background map shown for reference. So I added an overlay of OSM boundaries. This was straightforward to do because XRAY uses the osm2pgsql Themepark framework, which makes it easy to mix in layers from different configurations. In this case I am using the boundaries layer from the Shortbread configuration.

What’s next?

Currently this is only running as a test/demo on the osm2pgsql test machine sponsored by FOSSGIS. It can’t stay there forever because we need that machine for testing new osm2pgsql versions etc. We’ll see where we can find a permanent host for this.

There are plenty of things that need improving. For some of them I have already opened issues in the OSM XRAY repository and I am sure you’ll come up with more ideas and find some bugs. The two most important issues in my mind are the question of how to handle selected objects in low zoom levels and how to handle filtering in low zoom levels. I am open to ideas.

Tags: openstreetmap · osm-xray · osm2pgsql