Jochen Topf's Blog

Mon 2022-11-07 10:16

Selection of generalization problems

To start the OSM generalization project I decided to look at some specific generalization problems with a wide variety of use cases and challenges, and with different types of solutions. It is obvious that I can’t implement every algorithm out there and solve every generalization problem, but I can take a sample of typical problems with different solutions and try to implement them. This should hopefully give us a better understanding of the problem space and a basis on which we can build later.

Based on some experience from earlier projects, some reading of papers, and a few looks into what other people have been doing before, I decided to concentrate on the following problems. I’ll give an overview about the problems and challenges here and leave the details for later blog posts.

1. Finding representative points for polygons

In OpenStreetMap points of interest are sometimes represented by nodes, i.e. point geometries, but sometimes also as polygons (usually building outlines). For many use cases the exact outline of the polygon isn’t important, you just need a single position somewhere on that polygon, for instance to draw an icon there. A similar problem is finding a labelling point of a polygon, that is the point where to put the name label of a forest or a lake or a country.

For a human that sounds like a simple problem, you just use the “center” of the polygon. But for a computer it is not that easy, because it is not at all clear where the “center” of an irregular shape actually is.

The problem is interesting, because…

the, at first, obvious and simple to compute solution of using the
centroid (center of gravity) doesn’t work well
several solutions differing in their suitability and performance are
in common use
the generalization is done only on single objects, it can be
implemented in osm2pgsql alone without the help of the database

2. Generalize landcover polygons

Small-scale maps often need generalized landcover polygons. Several types of landcover can be aggregated (for instance different types of wood) and the resulting polygons simplified for the target resolution.

The problem is interesting, because…

it is a very common problem and especially important for vector-tile
based maps which need small geometries with few points for
performance
the usual solution involves merging many complex polygons into one
large one and doing buffer operations on that which is really
expensive
the solution needs the database as backing store and probably a
tile-based approach for updates

3. Deriving built-up areas

Typical maps show built-up areas in lower zoom levels to visualize the important difference between urban and rural landscape. Those built-up areas have to be derived from a diverse set of input geometries like landuse and building outlines.

This is similar to the previous problem (landcover generalization) but a bit more complicated. While OSM data has tags for classifying an area as residential, commercial or industrial for instance, the use of these tags is spotty and we need to take other features like buildings or a dense road network into account, too.

The problem is interesting, because…

we need to take a variety of input features with their tags into
account

4. Generalization of waterway network

Large waterways such as rivers and canals are important features on small scale maps. On the other hand there are a lot of small streams and tiny lakes which should not show up on the map to not clutter up the display.

The problem is interesting, because…

it mixes linestring and polygon features
figuring out importance (for deciding which waterways to show) isn’t
straightforward
waterway linestrings need to be merged into long linestrings
before it makes sense so simplify the lines
updates probably have to be triggered based on the ids or tags of the
original objects

5. Filtering places based on density

Most maps show places like cities, towns, and villages with some symbol (like a dot) and a name. When “zooming out” fewer and fewer dots should appear on the map, only showing the more important places. We can approximate “importance” by using additional information like the population count of a place. Simple maps just use a cut-off value for the importance, something like “show only places with 100.000 inhabitants or more”, but this leads to very “unbalanced” maps with lots of dots in densly populated places like central Europe and empty maps in the middle of Australia.

The problem is interesting, because…

unlike the other problems this is only a about selecting features
not about doing anything complex with the geometry
there are several documented algorithms, but we need to find a good
solutions that works with updates
there are many similar problems out there, like deciding which
mountain peaks to show in densly mountainous areas

Conclusions

All of these problems were chosen because they are relevant for most maps. They also cover a variety of problems and solutions. The first problem only needs to process a single objects at a time and can thus be implemented directly in osm2pgsql without help from the database. For all others we need to aggregate several objects together using the database in some way. Some problems are geometrically challenging, for others the difficulties are more in marshalling lots of data and finding ways to do partial updates, instead of having to re-process the whole world whenever something changes.

None of these problems or their solutions are completely new. The goal here is not to invent a new algorithm, but to implement something that’s reasonably easy to use and to extend and that works with real-world data and the quick update cycles of OpenStreetMap.

This is a research project so it is not clear how much of this plan will be implemented in the end. It depends on how much time the work will need and on whether I find good and practical solutions for the problems mentioned. We need solutions that are good and robust enough and that are maintainable. After all we don’t want just a prototype but we have to integrate everything into an existing program that’s used by many people.

Tags: generalization · openstreetmap