Every map of the world has to take into account that there are land areas and water areas. You could have huge polygons for each continent and paint them on a blue background. But handling those huge polygons is difficult. OpenStreetMap solves this problem by having ways tagged with “natural=coastline” and the convention that the land is always on the left side of the way when you follow it in its direction. So a continent or an island is surrounded by one or more ways in counter-clockwise order. It is much easier to edit those coastlines than large multipolygon relations or something like them. Coastlines are the only case where OSM has this special rule.
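To make the “land on the left” convention concrete, here is a minimal sketch (not taken from any OSM tool) that checks the winding order of a closed ring with the shoelace formula. The function names are made up for illustration, and the ring is treated as flat (lon, lat) coordinates, which is only an approximation for very large rings or rings crossing the antimeridian.

```python
def signed_area(ring):
    """Shoelace formula on a closed ring of (lon, lat) pairs.

    The ring must repeat its first point at the end.
    The result is positive for counter-clockwise rings.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def encloses_land(ring):
    """Counter-clockwise means the land is on the left, i.e. inside the ring."""
    return signed_area(ring) > 0


# A counter-clockwise square: an island.
island = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
# The same square walked clockwise: water inside, land outside.
lake = list(reversed(island))

print(encloses_land(island), encloses_land(lake))
```

A clockwise ring is what you get for a body of water inside a land area that has been mapped with “natural=coastline”.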
But to use those coastlines for rendering or anything else, the coastline ways have to be assembled into “proper” polygons. And because those polygons are then too big to be used efficiently for many purposes (including the rendering of maps), they have to be cut up into smaller pieces. For years we have been using the coastcheck application for this. Unfortunately coastcheck is quite a horrible mixture of scripts and programs that’s difficult to work with and even more difficult to work on. And it is slow. But, to be fair, it did its job for a long time.
Apart from being slow there are some things it doesn’t do, and the problem of amending or replacing it has come up a few times. Most recently at the Karlsruhe Hack Weekend when Sven Geggus needed polygons for the water (and not for the land as coastcheck creates them). I have also wished several times to have nice land polygons to clip administrative boundaries against so that they don’t go out into the sea.
So last week I thought: How hard can it be to re-write all of that from scratch? To a) make it faster and b) allow a few more options in the output. As usual it turns out to be a mixture of “not hard at all” and “surprisingly tricky”. Using Osmium to walk through an OSM file and get all the ways tagged with “natural=coastline” is easy. And assembling the polygons isn’t that difficult either. But I also want to generate reasonably sensible error indications when something goes wrong, for instance show the place where there is a self-intersection in one of the polygons. So that needed a bit of experimenting. Also we have to take “holes” into account, i.e. bodies of water inside land areas. Normally “natural=coastline” should not be used for those (use “natural=water” or one of a few other tags instead), but there are a few hundred cases (for instance the Great Lakes in North America), so that has to be taken into account.
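The assembly step can be pictured roughly like this. This is a simplified sketch and not the OSMCoastline implementation: it assumes each way is a list of at least two node IDs in way direction, that at most one way starts at any given node, and it does not look for self-intersections. The name `assemble_rings` is made up for the example.

```python
def assemble_rings(ways):
    """Join directed ways end-to-start into chains of node IDs.

    Returns (rings, broken): closed rings, and open chains that
    point to gaps in the coastline.
    """
    by_first = {way[0]: list(way) for way in ways}
    ends = {way[-1] for way in ways}
    # Ways whose first node is not the end of any way start an open chain.
    heads = [node for node in by_first if node not in ends]

    def follow(start):
        chain = by_first.pop(start)
        # Keep appending the way that starts where the chain currently ends.
        while chain[-1] != chain[0] and chain[-1] in by_first:
            chain += by_first.pop(chain[-1])[1:]
        return chain

    broken = [follow(node) for node in heads]
    rings = []
    while by_first:
        chain = follow(next(iter(by_first)))
        (rings if chain[0] == chain[-1] else broken).append(chain)
    return rings, broken


ways = [[1, 2, 3], [3, 4], [4, 1], [10, 11], [11, 12]]
rings, broken = assemble_rings(ways)
print(rings)   # the three ways 1..4 close into one ring
print(broken)  # the chain 10..12 never closes
```

The open chains are the interesting part for error reporting: their first and last nodes are the places where somebody has to look at the data.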
And the whole thing should be a) fast and b) not use too much memory. I first tried the simple approach of reading the OSM file only once, keeping all node positions in RAM for later use when assembling the polygons from the ways. But that needs about 13 GB of RAM just for those node positions. So I switched to reading the OSM file twice which makes the whole thing a bit slower but not really that much.
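One way to picture the two-pass idea is sketched below. It is an assumption about how the passes could be split, not a description of the actual code: the first pass only collects the IDs of nodes referenced by coastline ways, the second pass keeps positions for just those nodes. The input is plain Python data standing in for the OSM file, and both function names are invented for the example.

```python
def coastline_node_ids(ways):
    """Pass 1: collect IDs of all nodes used by coastline ways.

    `ways` yields (tags, node_ids) pairs, a stand-in for reading
    the ways from an OSM file.
    """
    needed = set()
    for tags, node_ids in ways:
        if tags.get("natural") == "coastline":
            needed.update(node_ids)
    return needed


def needed_positions(nodes, needed):
    """Pass 2: keep positions only for the nodes found in pass 1.

    `nodes` yields (node_id, lon, lat) tuples.
    """
    return {nid: (lon, lat) for nid, lon, lat in nodes if nid in needed}


ways = [
    ({"natural": "coastline"}, [1, 2]),
    ({"highway": "residential"}, [3, 4]),
]
nodes = [(1, 8.4, 49.0), (2, 8.5, 49.1), (3, 8.6, 49.2), (4, 8.7, 49.3)]

needed = coastline_node_ids(ways)
positions = needed_positions(nodes, needed)
print(positions)
```

Only the coastline nodes end up in memory, at the price of going through the input a second time.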
I am not finished yet, but I am releasing my OSMCoastline code anyway. It reads a planet file, assembles the polygons, and writes out the coastline and several error files in different formats. It does not fix broken rings (when there are gaps in the coastline), it does not project the coordinates into Mercator and it does not split up the huge polygons into smaller ones. But it is reasonably clean and documented code, so those things can be added. It takes about 3 GB of RAM and not even 20 minutes to run on a 2.8 GHz Intel Core i7 processor.
If and when I have the time I’ll add those extra features (and work on the documentation). But if you feel adventurous you can already play around with it (or add some features). You can look at the shapefiles it generates with QGIS for instance. I have supplied a QGIS project file. Just be aware that some of the polygons are huge (the largest polygon – Europe/Asia/Africa – has over 4 million points from over 38000 ways) and even on modern machines it takes a renderer quite a while to display that.