To start the OSM generalization project I decided to look at some specific generalization problems with a wide variety of use cases and challenges, and with different types of solutions. It is obvious that I can’t implement every algorithm out there and solve every generalization problem, but I can take a sample of typical problems with different solutions and try to implement them. This should hopefully give us a better understanding of the problem space and a basis on which we can build later.
Based on some experience from earlier projects, some reading of papers, and a few looks into what other people have been doing before, I decided to concentrate on the following problems. I’ll give an overview about the problems and challenges here and leave the details for later blog posts.
In OpenStreetMap points of interest are sometimes represented by nodes, i.e. point geometries, but sometimes also as polygons (usually building outlines). For many use cases the exact outline of the polygon isn’t important, you just need a single position somewhere on that polygon, for instance to draw an icon there. A similar problem is finding a labelling point of a polygon, that is the point where to put the name label of a forest or a lake or a country.
For a human that sounds like a simple problem, you just use the “center” of the polygon. But for a computer it is not that easy, because it is not at all clear where the “center” of an irregular shape actually is.
The problem is interesting, because…
Small-scale maps often need generalized landcover polygons. Several types of landcover can be aggregated (for instance different types of wood) and the resulting polygons simplified for the target resolution.
The problem is interesting, because…
Typical maps show built-up areas in lower zoom levels to visualize the important difference between urban and rural landscape. Those built-up areas have to be derived from a diverse set of input geometries like landuse and building outlines.
This is similar to the previous problem (landcover generalization) but a bit more complicated. While OSM data has tags for classifying an area as residential, commercial or industrial for instance, the use of these tags is spotty and we need to take other features like buildings or a dense road network into account, too.
The problem is interesting, because…
Large waterways such as rivers and canals are important features on small scale maps. On the other hand there are a lot of small streams and tiny lakes which should not show up on the map to not clutter up the display.
The problem is interesting, because…
Most maps show places like cities, towns, and villages with some symbol (like a dot) and a name. When “zooming out” fewer and fewer dots should appear on the map, only showing the more important places. We can approximate “importance” by using additional information like the population count of a place. Simple maps just use a cut-off value for the importance, something like “show only places with 100.000 inhabitants or more”, but this leads to very “unbalanced” maps with lots of dots in densly populated places like central Europe and empty maps in the middle of Australia.
The problem is interesting, because…
All of these problems were chosen because they are relevant for most maps. They also cover a variety of problems and solutions. The first problem only needs to process a single objects at a time and can thus be implemented directly in osm2pgsql without help from the database. For all others we need to aggregate several objects together using the database in some way. Some problems are geometrically challenging, for others the difficulties are more in marshalling lots of data and finding ways to do partial updates, instead of having to re-process the whole world whenever something changes.
None of these problems or their solutions are completely new. The goal here is not to invent a new algorithm, but to implement something that’s reasonably easy to use and to extend and that works with real-world data and the quick update cycles of OpenStreetMap.
This is a research project so it is not clear how much of this plan will be implemented in the end. It depends on how much time the work will need and on whether I find good and practical solutions for the problems mentioned. We need solutions that are good and robust enough and that are maintainable. After all we don’t want just a prototype but we have to integrate everything into an existing program that’s used by many people.
Tags: generalization · openstreetmap