The OpenStreetMap project gets bigger every day. More people edit the data, which leads to more data in the database, and this in turn leads to more people using the data. And with the growing database it gets harder and harder to see what’s in there. Gone are the days when the common mapper needed to know maybe a dozen different tag keys, each with just a handful of typical values. Today people tag parking spaces and public transport, power lines and penguin pools in their local zoo. Some people are valiantly trying to keep up by documenting tags in the wiki, but sometimes that seems to be a lost cause. There are only about 550 wiki pages describing a key, but there are about 20,000 different keys in the database (about one third used only once; many will be typos), not to mention the 50 million different tags. What are all these tags? How are they used and by whom? The endless discussions on the mailing lists about what tags to use are partly because the world is complex and fitting everything into our tagging model is not easy, but they are also in part a symptom of this confusing situation.
Some people have already tried solving this by creating tools to look at the database content. There are Tagwatch, Tagstat, OSMdoc, and probably others. They create statistics from the planet file and present them to the user. But they don’t go far enough. There is more information out there about tags. The wiki has a lot (and Tagwatch already uses some of that), but that’s not all. When I am trying to decide which tag to use for a specific feature, I might want to see how the OSM editors (JOSM, Potlatch, and others) handle this tag. There is a lot of information built into the styles, presets, and other configuration of these editors. Even more information is in the different map styles used in rendering.
So we need some kind of system that can find and aggregate all these bits of information about tags and make them browsable and searchable. A bit like Google search, it should crawl the information sources out there and bring everything together.
So I went out and built it. It’s called Taginfo and is available at taginfo.openstreetmap.org [UPDATE: URL]. Taginfo currently collects information about tags from the OSM planet file, from OSM wiki pages about keys and tags, and from the JOSM style configuration. Of course that’s only the beginning. There are many other sources, as I mentioned, and I or others can write modules to include this information. Taginfo already has nearly all the features that Tagwatch, Tagstat, and OSMdoc have, and many more on top. And, of course, all of Taginfo is Open Source.
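To make the idea of bringing several sources together a bit more concrete, here is a minimal Python sketch. It is not Taginfo's actual code: the function name `merge_sources`, the record fields, and all the sample numbers are made up for illustration. It only shows the general shape of the task, which is to collect what each source knows about a key into one record per key.

```python
from collections import defaultdict


def merge_sources(sources):
    """Combine per-source records into one record per key.

    `sources` maps a source name to a dict of {key: info}.
    The result maps each key to {source name: info}.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for key, info in records.items():
            merged[key][source_name] = info
    return dict(merged)


# Invented sample data, standing in for the planet file statistics,
# the wiki pages, and the JOSM style configuration.
sources = {
    "planet": {"highway": {"count": 1000}, "hihgway": {"count": 1}},
    "wiki": {"highway": {"has_page": True}},
    "josm": {"highway": {"styled": True}},
}

merged = merge_sources(sources)
for key in sorted(merged):
    # Show which sources know something about each key.
    print(key, sorted(merged[key]))
```

In this toy data the misspelled key shows up only in the planet statistics, which is the kind of thing that becomes easy to spot once everything sits side by side.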
Taginfo has been designed to be very modular. There are several components:
More information about Taginfo should appear on the wiki. The code is available from GitHub. The statistics collection from the planet file is done using a C++ framework called Osmium that I am also releasing today. Details about that will be in a later post.
A word about updates: Taginfo is currently updated manually by me, roughly whenever a new planet file comes out. But I want to move to daily updates as soon as possible. Currently, creating the statistics from the planet takes about 6 hours and 10 GB of RAM. There is room for improvement here once I can switch over to using the new binary format instead of XML, but some code needs to be written for that first. And, yes, it would be nice to have instant updates from minutely diffs, but we’ll solve that problem later. :-)
I have many ideas on how to improve Taginfo. There are some obvious ones, like a better search function and I18N support. I want some reports about inconsistencies in the data, such as popular keys without wiki pages. And many data sources need to be integrated: more wiki pages, JOSM styles and presets, Potlatch config, Mapnik rendering styles, and many others. I also already know of some bugs that need fixing, and you will probably find more.
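As an illustration of what a report on popular keys without wiki pages could look like, here is a small Python sketch. Again, this is only an assumption about how such a report might work, not something Taginfo does today: the function name `keys_missing_wiki_page`, the `min_count` threshold, and the sample data are all hypothetical.

```python
def keys_missing_wiki_page(key_counts, documented_keys, min_count=100):
    """Return (key, count) pairs for frequently used keys that have no wiki page.

    `key_counts` maps a key to how often it is used in the database.
    `documented_keys` is the set of keys that have a wiki page.
    Results are sorted by count (highest first), then by key name.
    """
    missing = [
        (key, count)
        for key, count in key_counts.items()
        if count >= min_count and key not in documented_keys
    ]
    return sorted(missing, key=lambda item: (-item[1], item[0]))


# Invented sample data for illustration only.
key_counts = {"highway": 5000, "penguin_pool": 250, "source": 3000, "hihgway": 1}
documented_keys = {"highway"}

report = keys_missing_wiki_page(key_counts, documented_keys)
for key, count in report:
    print(f"{key}: used {count} times, no wiki page")
```

The threshold keeps one-off typos out of the report, so the list stays focused on keys that are in real use and would benefit from documentation.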
I am looking forward to hearing your ideas and seeing your code to improve Taginfo.