It’s high time I wrote something about the news in the Osmium world. After 10 days or so of multiple releases in short succession things have quieted down some and I think we have something more stable now.
I am busy preparing for two events: Only two days now until the State of the Map 2016 in Brussels, where I’ll be giving a talk about taginfo. On Sunday I’ll be hosting an Area Workshop where we want to talk about the effort to fix broken and inefficiently tagged (multi)polygons in OSM. I’ll be around for (part of) the hack day, too.
The most requested feature of taginfo is a view showing historical use of tags. I never had the time to add this, but fortunately there is now a solution: Martin Raifer created a service that shows the history of tags in nice graphs. You can see it at taghistory.raifer.tech. Some more background in a diary entry from Martin. Matthijs Melissen has already used this to find some interesting statistics.
About two years ago I introduced the taginfo “projects” feature: A way to connect taginfo with all the uses of OSM tags out there, to editors, special maps, routing services etc. By now we have about 50 projects that have published a project file describing their tag use. This data gets integrated into taginfo and helps mappers and users of OSM data to see who actually uses which tags. Today I have released some changes to taginfo that integrate this information even better.
OpenStreetMap has a somewhat peculiar data model. OSM ways don’t contain the locations of the points they go through, but only references to nodes, which in turn contain their location. This model has some advantages, but it also has a major disadvantage: For almost all uses of OSM, like drawing maps or calculating routes, the software first has to find the locations of all nodes in a way, before it can do anything with it. This is cumbersome and potentially expensive. To do this quickly, the lookup needs to be done in main memory without any disk access, but to store the locations of all nodes of a full planet file we already need about 32 GB of RAM. But maybe we can create a shortcut, so that we can do this once, instead of everybody doing this again and again?
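To make the lookup step concrete, here is a minimal sketch in Python. The tiny data set and the name `resolve_way` are made up for illustration, and the 8 bytes per location used in the memory estimate is an assumption, not a figure from this post.

```python
# Hypothetical, tiny in-memory index: node ID -> (lon, lat).
node_locations = {
    1: (8.40, 49.01),
    2: (8.41, 49.02),
    3: (8.42, 49.01),
}

# A way only stores references to nodes, not coordinates.
way_node_refs = [1, 2, 3, 1]


def resolve_way(node_refs, locations):
    """Return the (lon, lat) of every node referenced by a way, in order."""
    return [locations[ref] for ref in node_refs]


coordinates = resolve_way(way_node_refs, node_locations)
print(coordinates)

# Rough plausibility check for the memory figure, ASSUMING 8 bytes per
# location and an index covering about 4 billion node IDs.
estimated_bytes = 4_000_000_000 * 8
print(estimated_bytes / 10**9, "GB")
```

Every consumer of the data has to do the equivalent of `resolve_way` for every way, which is why doing it once and sharing the result looks attractive.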
The number of objects in OpenStreetMap is growing and with it the number of unique IDs. For node IDs we have gone over 4 billion now. Soon they will not fit into an unsigned 32 bit integer any more (2^32 = 4294967296).
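The arithmetic can be checked with a few lines of Python. The helper names `fits_uint32` and `wrap_uint32` are made up for this sketch; the wrap-around shown is what typically happens when a larger value is forced into 32 bits, not the behaviour of any specific OSM tool.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32 bit integer can hold


def fits_uint32(node_id):
    """True if the ID can be stored in an unsigned 32 bit integer."""
    return 0 <= node_id <= UINT32_MAX


def wrap_uint32(node_id):
    """Keep only the lowest 32 bits, simulating a silent overflow."""
    return node_id & 0xFFFFFFFF


print(2**32)
print(fits_uint32(4_294_967_295))
print(fits_uint32(4_294_967_296))
print(wrap_uint32(4_294_967_296 + 5))
```

The last call shows why overflow is dangerous: an ID just past the limit would silently turn into the ID of a completely different, very old node.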
It has been a long time since I wrote something about Osmium in this blog. Considering that I have worked on Osmium and related software almost every day for many months now, this is a huge omission. I want to make up for this a little bit with this posting.
The recent London OSM hack weekend gave me the incentive to work on taginfo once more and we got quite a few things done on the weekend and the days after. Taginfo now works better on mobile, is faster and we have a very cool tag list feature for the wiki. Let’s look at these in turn…
Recently I was working on some OSM software when I noticed a strange thing…
Three years ago I launched the OpenStreetMapData.com web site to host the OSM coastline data extracts I was generating. Since then I have been updating those extracts at least daily and offering them for download on that site. The coastline extracts are used in many OSM maps including the OSM standard style.
Last week I gave a talk about taginfo at the FOSSGIS 2015 conference. Thanks to the amazing video team all talks from the conference are available online in great quality. As always you can find my talk on my Talks page.
I just added a few features to taginfo: There is a new “Similar” tab on all key pages, see for instance all keys similar to the “highway” key. This makes it easy to find related keys and typos. The new Similar keys report lists commonly used keys together with similar keys that are rarely used. Often, but not always, the rare keys are just misspellings.
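The idea behind such a report can be sketched in Python. This is not taginfo's implementation: the similarity measure here is the one from Python's standard `difflib` module, and the key counts, the threshold values and the name `likely_typos` are all assumptions made up for the example.

```python
import difflib

# Hypothetical usage counts; real numbers would come from the OSM database.
key_counts = {
    "highway": 150_000_000,
    "hihgway": 12,
    "highways": 3,
    "building": 400_000_000,
    "name": 80_000_000,
}


def likely_typos(key, counts, rare_below=100, cutoff=0.8):
    """Return rarely used keys that look similar to the given key."""
    candidates = [k for k in counts if k != key and counts[k] < rare_below]
    return difflib.get_close_matches(key, candidates, n=10, cutoff=cutoff)


print(likely_typos("highway", key_counts))
print(likely_typos("name", key_counts))
```

Similarity alone is only a hint. A rare key that resembles a common one may still be intentional, which matches the "often, but not always" caveat above.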
The next annual FOSSGIS conference on Open Source geographic information systems and OpenStreetMap will be held in Münster, Germany, from March 11th to 13th. I’ll be giving a talk there about taginfo.
I am a heavy git user. I am actively developing in about half a dozen repositories at the same time and have many more I occasionally check out or add to. I have files lying around that need to be added, changes to be committed, commits to be pushed, all in several branches, local and remote. And then there is the stuff I stashed months ago and forgot…
Taginfo has the “mission” of aggregating all available information about OSM tag usage and presenting it to the users. The main sources of information are the OSM database and the documentation in the OSM wiki. But there are many more sources of information to be tapped. Everybody who creates a map, or writes presets for an editor, everybody writing a data export program or a routing application has to decide which tags to use in what way. All this is useful information, but it is hard to come by. Everybody uses different programming languages and different config file formats. And all this “stuff” is spread out over many software repositories and web sites.
When I started using the ZSH shell years ago, it was because of its superior command line completion features. Since then, the Bash shell has improved a lot, and probably has the same features, but I am used to ZSH now. And when a command doesn’t have completion it feels cumbersome to use. So, of course, I wanted to have completion for the osmium tool.
Last weekend we had the SOTM-EU here in Karlsruhe. I had a lot of fun, but didn’t get enough sleep, something you’ll notice in the video from my Osmium talk. Oh well, not everybody can have their talk on the first day.
I spent a few days working on taginfo again. I fixed some bugs and refactored code here and there and added a heap of new features.
QGIS caches the data for the layers it is showing. This can be a bit annoying when the data changes and QGIS doesn’t know about this. What is missing is some kind of “reload layer data” function. Instead I have to remove layers and add them again all the time. Or, instead of reloading each layer, I reload the whole project. Still, this needs several clicks and after the reload you are back to the project extent instead of where you were before the reload.
Next week the (German language) FOSSGIS-Konferenz takes place in Berlin. For the first time since 2007 I am not giving a talk there, but I’ll be around all three days to learn new things and (re-)connect with people from the OSM and Open Source GIS communities. Lars Lingner and I are also organizing a BoF session for freelancers. We both have been seeing more and more interest in OSM and GIS development and services and are often contacted for projects larger than one person can handle or where different skillsets are needed. We want to better connect the freelancers in these fields to be able to help each other out. So if you are working freelance or have a small company in the OSM or Open Source GIS fields, please come join us on Wednesday for a chat.
I have been writing lots of software for the OpenStreetMap project and there is always one problem: How to test it? There is a huge variety of data in OSM, every conceivable way of tagging, every way of getting the data slightly (or not so slightly) wrong, will sooner or later appear. We need to test file formats, OSM object syntax and semantics, changing data, cases with failing referential integrity or invalid geometries and much, much more. But just coming up with all those test cases is a huge task and implementing those tests is even more work.
In October 2010 I introduced taginfo. Since then I have updated it occasionally and added new functions. And over time it has become part of the “OSM landscape”.
On January 16 2014 I will give a talk about the OpenStreetMap data model at the HafenCity University in Hamburg. The talk will be in German. Everybody is welcome to attend, you don’t need to be a student there.
I have written before about my work on the software doing the assembly, repair, and conversion of the OSM coastline data. I wrote the Osmcoastline software and I run it daily to create shapefiles with land and water polygons available for download on OpenStreetMapData.com. For a while this data has now been used on the main OSM map. Unlike with the old process, this new process usually makes sure the map is updated within a few days when the coastline changes.
A few weeks back I gave two talks at the State Of The Map 2013 conference. One about Osmium, the other called “Towards an Area Datatype for OSM”. The slides and videos from both talks are now available on my talks page. Thanks to Peter Barth for “liberating” the videos from the 8 hour long flash-only videos put up originally.
Over the last months I have been busy working on a project for Mapbox. As part of that project I have spent a lot of time improving Osmium. And not just a few changes here or there but a more or less complete redesign based on the experience of developing and using Osmium for the last nearly three years now.
Last week I was at the yearly FOSSGIS conference, the German language conference for Open Source GIS, Open Data and OpenStreetMap. This year it took place in Rapperswil, Switzerland near Zurich. I like the relaxed atmosphere and the good mixture of introductory talks and talks for power users.
I have had a personal web page for a long time. I started creating web pages in 1993 when I was a student at Freiburg University. At that time I didn’t have any way to publish it. But when I moved to Karlsruhe later that year I could put up my first real public home page on the university server there. In 1995 my web pages moved to the INKA web server and in 1998 they moved again to my new home in cyberspace on remote.org. In 2005 I added a web page for my “professional self” at www.jtic.de.
The yearly FOSSGIS conference, the German language conference for Open Source GIS, Open Data and OpenStreetMap, will take place from June 12th to June 14th 2013 in Rapperswil, Switzerland near Zurich.
A year ago I released the first version of the OSMCoastline program that extracts and assembles coastlines from OSM data. Since then I have learned more about coastlines in OSM than I ever wanted to know, fixed numerous coastline bugs myself and added more features to the software. OSMCoastline has been in production use at Geofabrik and the German map server for more than six months now.
I have written before about the mapping of Antarctica in OSM. Since then there has been a lot of activity. Christoph Hormann has prepared better Antarctica data for import and has documented the planned import extensively on the wiki. Peter Körner has created a web map in Polar Stereographic Projection.
The Map Features wiki page is one of the oldest pages on the OSM wiki and it has been getting larger and larger over time. And more and more wiki templates are used to create it and its language variants. Very similar tables can also be found on many “Key:*” pages on the wiki to list the tags used with this key.
OpenStreetMap wants to map the whole world, but there is a whole continent that has been neglected a bit: Antarctica. Of course there isn’t much there besides rocks, ice, and penguins and not too many people live there. But still, it is a huge area and it should appear properly on our maps.
It has been nearly a year since I did some serious work on taginfo. Taginfo works well, but there has been and still is a backlog of todo items. And I get new ideas on how to improve it all the time. So I spent the first days of the new year doing some work on it. A lot of this work has been “behind the scenes”, improving code here or there, so there isn’t actually that much to see. (If you are taking a look remember that you might have to clear your browser cache for all the goodies to show up properly.) But here are the larger improvements:
It is high time I wrote something about the status of the Wikipedia Multilingual Maps project I have been working on for many months now.
One of the biggest problems with the OSM data model is that there are no real polygons. Polygons or multipolygons are “simulated” using closed ways or multipolygon relations and several other methods. These multipolygon relations are often broken. Some of the problems resulting from this non-existence of multipolygons are affecting many people, for instance with broken coastlines and country or other boundaries.
In a recent discussion on the OpenStreetMap talk mailing list around imports in France a point came up again that has been raised a few times: If we had some notion of layers in OSM data, maybe, some tasks such as imports would be easier or could be done in a better way. I want to look into this “layer” issue a little bit.
Yesterday Development Seed/MapBox announced that they received a grant of $575,000 to “improve the core infrastructure of OpenStreetMap”. This is a great development. We do have a lot of problems and there are many things that can be improved. The community has been working on all of these things over the years of course, but I do think that we lag behind the growth curve of OSM and any help is appreciated.
A while ago I wrote a small QGIS plugin to control a running JOSM, or another OSM editor that supports the remote control API, from QGIS. I finally got around to adding it to the official repository so that you can now install it with a few clicks from inside QGIS.
Osmium has had support for assembling multipolygons from relations for a long time. But the code had been lifted from an earlier Geofabrik project and was rather messy, hard to extend, and it contained several memory leaks. In the last month or so I finally chewed through the whole thing bit by bit and made it prettier.
I have done some long-overdue refactoring of the Osmium code which simplifies Osmium itself and simplifies its use. And should make it faster to compile, because only those parts that are really needed are included.
After setting up the MapQuest Render Stack and poking around in its code I consider using it for the Multilingual Maps project. Unlike Tirex it can run on a cluster of machines for reliability and load distribution. That will be important for a tile server working on “Wikipedia scale”.
Whatever way we’ll develop for rendering the multilingual tiles, we have to get the information about which language(s) the user wants from the user to the tileserver. Web browsers typically support the selection of an ordered list of preferred languages. This list is sent through the Accept-Language header to the web server. We could use this setting to determine the language for the labels. But in some situations people might not know about this setting or can’t change it. Maybe they are sitting in an Internet cafe in a foreign country. In that case it might be easier for them if they can just change the language setting on the web page. This is especially interesting if the web site knows which languages are available and only shows those options. The browser setting doesn’t know anything about actually available languages, it just has one large master list.
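How a server might combine the browser's ordered list with its own list of available languages can be sketched in Python. This is only an illustration: the function names, the fallback to the primary language subtag and the default of `en` are assumptions for this example, and the parsing handles only the common `tag;q=value` form of the header.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        tag = pieces[0]
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has full weight
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            entries.append((-quality, position, tag))
    entries.sort()  # highest quality first, header order breaks ties
    return [tag for _, _, tag in entries]


def choose_language(header, available, default="en"):
    """Pick the first preferred language the tile server can deliver."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "de-AT" falls back to "de"
        if primary in available:
            return primary
    return default


print(choose_language("de-AT,de;q=0.9,en;q=0.5", {"de", "en", "fr"}))
print(choose_language("ja", {"de", "en", "fr"}))
```

A language selector on the web page would simply override the result of `choose_language`, and it could offer exactly the entries in `available`.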
Occasionally people ask me why there is no comment function on this blog. There are two answers for that: First, the software I use to create this blog does not support comments. My blog posts are “compiled” into static web pages that I then upload to my public site. This has many advantages. The site is fast, I don’t need a database around, the files are easy to back up and can be kept under version control. But mainly it means I can sleep soundly, because I don’t have to update WordPress or some other blogging software every time a new security vulnerability is found.
If we want to render multilingual maps by separating the “label layers” from the “base layers” as described in my last blog post, we have to make sure the rendering results don’t change. Or, at least, that they do not look too bad. Let’s look into that.
As a proof-of-concept for the rendering of labels on demand I set up an OpenLayers map with two XYZ tile layers, one for the background and one for the labels. I configured the MapQuest render stack to deliver those two layers, the background layer is stored in the tile store, the label layer is never stored, but always rendered on demand.
Nearly everybody setting up an OSM tile server uses either the Apache/mod_tile/renderd or the Apache/mod_tile/Tirex combination. The renderd software has been around for a while. When we needed something more flexible two years ago, Frederik Ramm and I created Tirex. To keep the development work needed small and make it easier to switch from renderd, Tirex uses the same Apache/mod_tile frontend as renderd. We had plans to add different frontends, but they never materialized. Apache/mod_tile is difficult to configure and not very flexible, but it does its job well enough. I wrote a proof-of-concept frontend using node.js, but that was never taken up either.
In Choosing a Language I have proposed to think about how to render labels in different languages into the tiles on the fly when they are requested.
One key issue of the multilingual map project is obviously how we choose which language labels to render into a map. There are two sides to consider:
(This article is a (modified) translation of a German language article I wrote for the German OpenStreetMap blog. It should have been translated months ago, but I am only now getting around to it. The project was a bit slow in starting but is gathering momentum now. It has progressed a bit since the original article.)
In the last months I have spent a lot of my spare time working on the OSMCoastline program mentioned several times in this blog already. It is by no means perfect yet, but it works and several people have tried it and used its output. But it is not the easiest program to use with all its options and you need a current planet file to work from, not everybody has one lying around. So I thought about how to make access to its output easier for everybody.
I just noticed that someone has created official Debian packages for Osmium (libosmium-dev, osmjs). They have also found their way into the new Ubuntu 12.04 version. Unfortunately they are already outdated. It is fine if you only want to use osmjs, but if you do any development based on Osmium, I suggest you use the version from GitHub.
For a while now I have been working on OSMCoastline adding features bit by bit. (See this blog article for some background info.) OSMCoastline now closes small gaps in the coastline and it can split up the huge polygons into smaller ones. You can create land polygons or water polygons depending on your needs and you can have it all in WGS84 or in the usual Mercator projection used for tiled web maps (EPSG 3857). So it does a lot more than the coastcheck application used for these things before. And the software is still a lot faster.
Today the April 2012 issue of the German computer magazine “iX” was in my mailbox. It contains my four and a half page article on how to create your own maps with OSM data. I describe how to use Osmium/osmjs software to extract the data and Tilemill to style and create the maps.
I gave two talks at the FOSSGIS 2012 conference last week. The first, OpenStreetMap in Zahlen und Karten, was not recorded and I will not release the slides as they will be difficult to understand and easy to mis-understand without my words. I will probably re-use the content for a few blog articles here in the future.
Next week is the FOSSGIS conference in Dessau. I’ll be speaking there about OpenStreetMap in Zahlen und Karten and Das Osmium-Framework. Together with Tim Alder I am also organising a community session to discuss the Wikipedia Multilingual Maps Project. See you in Dessau!
Every map of the world has to take into account that there are land areas and water areas. You could have huge polygons for each continent and paint them on a blue background. But handling those huge polygons is difficult. OpenStreetMap solves this problem by having ways tagged with “natural=coastline” and the convention that the land is always on the left side of this way. So a continent or an island is surrounded by one or more ways in counter-clockwise order. It is much easier to edit those coastlines instead of large multipolygon relations or something like it. Coastlines are the only case where OSM has this special rule.
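The "land on the left" convention can be checked with the signed area of a closed ring. The following Python sketch uses the shoelace formula on plain longitude/latitude pairs. It is a planar approximation that ignores the antimeridian and the poles, and the function names are made up for this example; it is not how any particular coastline tool works.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring of (lon, lat) pairs.

    The result is positive for counter-clockwise rings and negative for
    clockwise rings. The first and last point must be identical.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def encloses_land(ring):
    """True if the ring runs counter-clockwise, so the land on its left is inside."""
    return signed_area(ring) > 0


# A square "island" drawn counter-clockwise: land is on the left, inside.
island = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(encloses_land(island))

# The same ring reversed runs clockwise: the left side is now the outside.
print(encloses_land(island[::-1]))
```

A clockwise coastline ring is therefore not necessarily an error. It can also mean that the ring encloses water, with the land on the outside.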
The Karlsruhe Hacking Weekend is over. 18 people were there at least part of the time. I had a lot of fun and even got a little bit of work done: As mentioned, I worked on Taginfo and on the second day I worked a bit on the statistics for my upcoming talk at the FOSSGIS conference. I am using Osmium and full-history planet dumps to see what has been happening in OSM. I’ll publish some code and graphics at some point.
The Karlsruhe Hacking Weekend is in full swing. About 15 people are sitting around, munching apples, pizza and Gummi Bears and talking about things “somebody” should do “someday”. And we do some actual work, too. I fixed a few bugs in Taginfo, Sven and others are working on updating the German map style. Some people are talking about obscure Mapnik config options. No idea what the other people are doing.
I recently started writing my own RSS and Atom feed reader. I’ll write another blog post about why and how I am doing that. Today I want to focus just on one part of the job: Reading the RSS and Atom feeds.
The Osmium framework has been around for about a year and a half now and has grown to include a lot of functionality. But there is still much to do. In this blog post, I have collected some of those things. There is no particular order to the following list.
Time for the next OpenStreetMap hack weekend. Hope to see you there.
Lately I have been working on OSM-derived data in QGIS and finding bugs in the OSM data in the process. To fix the data I have to go into an OSM editor (that’s JOSM for me), find the right position and do the actual fix. To make that process easier I added a button to the QGIS user interface that uses the JOSM remote control feature to tell the editor which area to load data for.
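A request of this kind can be sketched in Python. The sketch assumes JOSM's remote control is enabled and listening on its usual local port 8111, and it uses the `load_and_zoom` command with a bounding box. The function name and the example coordinates are made up, and this is not the code behind the QGIS button.

```python
from urllib.parse import urlencode


def josm_load_and_zoom_url(left, bottom, right, top, host="localhost", port=8111):
    """Build a remote control URL asking the editor to load a bounding box."""
    params = urlencode({"left": left, "right": right, "top": top, "bottom": bottom})
    return f"http://{host}:{port}/load_and_zoom?{params}"


# Hypothetical bounding box in WGS84 degrees.
url = josm_load_and_zoom_url(left=8.4, bottom=49.01, right=8.41, top=49.02)
print(url)
```

Opening the resulting URL, for example with `urllib.request.urlopen(url)`, only works while the editor is running with remote control switched on, so the sketch stops at building the URL.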
I spend a lot of time slicing and dicing OSM and other geodata. Sometimes that is done with custom software but often a database is involved somewhere. Recently that sometimes has been the SQLite/SpatiaLite duo which works great for simple things. But the heavy lifting has to be done with PostgreSQL/PostGIS which has much more “geo functionality” and is much faster when complex queries are used. So here comes the first tip:
While working with OpenStreetMap or other geodata I often stumble upon interesting (or just pretty) visualization of the data. I have started a collection on my Miscellaneous Maps Flickr set. Some of these maps were created “on purpose”, some just appeared on my screen by accident. Maybe it inspires someone to create their own whacky visualizations.
Osmium now includes OGR support. OGR is a popular Open Source library for converting different GIS storage formats into each other. It supports PostGIS, Shapefiles, Spatialite, and many other formats. I had started working on OGR integration many weeks ago, but didn’t quite finish it. I have now added the last bit: Support for OGR Multipolygons. This opens up a lot of new possibilities for Osmium users.
More and more boundaries of administrative areas find their way into OpenStreetMap. In some countries there is already quite a lot of data for several levels of government. Unfortunately it is hard to edit and hard to use because relations must be used to model areas in OSM and relations are notoriously difficult to handle. Relations and with it the areas they represent are often broken as a result. It’s therefore hard to get a complete set of administrative areas for a particular country and level of government.
Osmium is getting bigger and bigger and doing more and more. But testing the code is still done manually and ad-hoc. Not good. I have used some testing frameworks in other languages, but never in C++. As usual, Boost comes to the rescue. I recently added some test code using the Boost Unit Test Framework which (at least for now) does all I need.
Martijn van Exel is organizing a hack weekend on June 10-12 2011 at the Linuxhotel. I am planning to be there. After many hack weekends in London this is the first such event in Essen after a long time. The first OSM event at the Linuxhotel was the Essen Developers Workshop I organized there four years ago. Has it already been that long?
The organisers of the State of the Map Europe have published the programme and it looks very interesting. I will give a talk about Taginfo and a talk about Osmium, the two projects I currently spend my free time on.
Yesterday I gave a talk about Taginfo at the FOSSGIS2011 conference. The talk is in German, you can download a video (570 MB MP4). Many thanks to the volunteers who created the awesome videos of all the OSM talks at FOSSGIS!
It is quite amazing what can be done with good data and current map rendering technology. Mapnik and other renderers have continually pushed the envelope and there are many examples of beautiful maps out there.
The Unicode character set contains somewhat over one million code points from 0 to hex 10ffff. That wasn’t always so. Unicode started out with only 16 bit characters, or about 65000 code points. At some point it was decided that that wasn’t enough and the version 2.0 released in 1996 switched to the larger character set.
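The numbers are easy to verify in Python. The helper name `utf16_units` is made up for this sketch. It shows that code points above the original 16 bit range need two 16 bit units (a surrogate pair) in UTF-16.

```python
MAX_CODE_POINT = 0x10FFFF

# Number of code points from 0 to hex 10ffff, and the original 16 bit range.
print(MAX_CODE_POINT + 1)
print(2**16)


def utf16_units(char):
    """Number of 16 bit units needed to encode one character in UTF-16."""
    return len(char.encode("utf-16-be")) // 2


print(utf16_units("A"))           # fits into the original 16 bit range
print(utf16_units("\U0001F600"))  # outside of it, needs a surrogate pair

# Python refuses code points beyond the end of the Unicode range.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as error:
    print("not a code point:", error)
```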
Taginfo always had an API so everybody can access all the juicy data in its database through simple HTTP queries. But that API was never used by anyone (as far as I can tell from a quick look at the log files). One reason was probably that I didn’t document the API.
In the past week two people contacted me independently who are working on Taginfo sites with local information for Ireland and Brazil. This prompted me to work on Taginfo again. Taginfo was never intended to be run on more than one web site, but it makes sense for other people to use it, too. So I incorporated some stuff that makes it easier to do that. Taginfo now has a new icon in the top left corner that can be changed for each instance, when you click on it you get some local information. Taginfo sites can add their address to the wiki.
Even hotels that are proud of being different are similar in a way. Ideally there might be different experiences to be had, but somehow they are all standard. At least in the bathroom.
Some months ago I introduced Taginfo and mentioned there in passing that the statistics collection is done based on a C++ framework called Osmium. I promised another post and I am delivering on this promise now. I have been working on Osmium on and off over the last 6 months or so and, while it is far from perfect, I think it could be actually useful for other people now. So let’s dive right in. What is Osmium and what is it good for?
The OpenStreetMap database currently contains nearly 900 million nodes. Because some nodes have already been deleted the highest node ID is already past 1 billion. IDs in the database are already 64 bit wide so that space will not run out. But lots of software only uses 32 bit IDs. There recently was a question on the German mailing list about when the 32 bit IDs would run out, so I want to give some answers here.
Yesterday Frederik Ramm sent me a link to this Wikipedia page. It contains 73 observations on (mostly obnoxious) behaviour in the Wikipedia and some recommendations how to deal with it. Almost all of that can be translated directly to OpenStreetMap and I recommend that every OSMer read it.
One of the things I like to do when on holiday is spending huge amounts of time and money in book shops and then more time reading the books. And inevitably the time runs out faster than the books, so my stack of unread books keeps growing. I recently added a new bookshelf conveniently located near my bed just for the unread books. There are just too many interesting things and interesting books about those things out there…
The OpenStreetMap project gets bigger every day. More people edit the data which leads to more data in the database and this to more people using the data. And with the growing database it gets harder and harder to see what’s in there. Gone are the days when the common mapper would need to know maybe a dozen different tag keys, each with just a handful of typical values. Today people tag parking spaces and public transport, power lines and penguin pools in their local zoo. Some people are valiantly trying to keep up by documenting tags in the wiki, but sometimes that seems to be a lost cause. There are only about 550 wiki pages describing a key, but there are about 20,000 different keys in the database (about one third only used once, many will be typos) not to mention the 50 million different tags. What are all these tags? How are they used and by whom? The endless discussions on the mailing lists about what tags to use are partly because the world is complex and fitting everything in our tagging model is not easy, but they are also in part a symptom of this confusing situation.