[JT] Jochen Topf's Blog
Sun 2020-11-08 17:06

10 Years of Taginfo

10 years ago I started the taginfo project. To celebrate this I am introducing some new features!

I am working on taginfo in my free time. And there isn’t that much free time to go around. So usually I set aside a few days once a year or so to go through the issues on GitHub, fix some problems and add a new feature here or there. The rest of the time the taginfo server just keeps chugging along and usually doesn’t need any maintenance at all. And even though taginfo has been running for 10 years and OSM grew by leaps and bounds over that time, the original architecture still holds and mostly works smoothly.

But there is a long list of things I need to (or want to) work on to improve taginfo. Here are some new features I added in the last month or so.

The New Chronology Tab

One of the most often requested features for taginfo is some kind of way to see the historical change in tag usage. This isn’t that difficult to generate, I just never got around to it. But now we have it! Every night, when the taginfo update runs, it goes through the full history planet file and creates statistics about all of the roughly 200,000 keys that ever existed and about 80,000 frequently used tags.

Why am I going through the history planet on every run instead of updating the taginfo database from change files? Turns out this only takes about 20 minutes, so it isn’t that expensive. And I had to write the code anyway, because I have to start the whole process somewhere. So for the time being it is just easiest this way. And very robust: if something goes wrong, the whole process starts from scratch the next night and usually rights itself.
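To give an idea of what such a pass over the history computes, here is a minimal Python sketch. It is not taginfo's actual code. It assumes a simplified, hypothetical input: a list of object versions as (object_id, date, keys) tuples, sorted by object and version, where a deleted version has an empty set of keys. The function name key_chronology is made up for this example.

```python
from collections import defaultdict
from itertools import accumulate


def key_chronology(versions):
    """Return {key: [(date, objects_with_key), ...]} from object versions.

    `versions` is an iterable of (object_id, date, keys) tuples, sorted by
    object_id and then by version. This input format is an assumption made
    for this sketch, not the real history planet format.
    """
    deltas = defaultdict(lambda: defaultdict(int))  # key -> date -> change
    prev_id, prev_keys = None, set()
    for object_id, date, keys in versions:
        if object_id != prev_id:
            prev_keys = set()  # first version of a new object
        keys = set(keys)
        for key in keys - prev_keys:  # key appeared in this version
            deltas[key][date] += 1
        for key in prev_keys - keys:  # key disappeared in this version
            deltas[key][date] -= 1
        prev_id, prev_keys = object_id, keys

    chronology = {}
    for key, per_day in deltas.items():
        days = sorted(per_day)  # ISO dates sort correctly as strings
        totals = accumulate(per_day[day] for day in days)
        chronology[key] = list(zip(days, totals))
    return chronology


versions = [
    (1, "2010-01-01", {"name", "created_by"}),
    (1, "2010-01-03", {"name"}),  # created_by removed from object 1
    (2, "2010-01-02", {"name"}),
]
chronology = key_chronology(versions)
```

Because everything is recomputed from the full history, a failed run leaves no state behind that the next run would have to repair.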

You can see the result as a tab on the key and tag pages. So here is the graph for the name key, for instance. Typically these graphs point straight up, with a bump here or there for imports. But there are more interesting examples: for instance, usage of the deprecated created_by tag is actually declining nicely.

Just like with the maps I can create the chronology for all the keys, but only some of the tags. This is mostly about the memory requirements to store the data and also a bit about the time it would take to generate these. Taginfo doesn’t need a high-end server and it should stay that way.

Search Upgrades

While most of the taginfo UI is pretty snappy, there always was one thing that didn’t work as fast as it should: the search. Depending on what you type into the search box, taginfo will do a full-text search over all tag keys and values. This was slow. But I have now upgraded to a newer SQLite full-text search implementation (FTS5 instead of FTS3) and played around with it until I found a way to improve search speed tremendously. Before, a search could take several minutes; now the same search gives results instantly.

Note that this only affects the full-text search, which is used if you don’t use an equals sign (=) in the search box. If you have an equals sign, taginfo uses a slower substring search which can still take a long time. This is certainly something which needs more work.
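The difference between the two kinds of search can be sketched with Python's sqlite3 module. This is only an illustration: the table name tag_search, its columns and the helper functions are made up, the real taginfo schema and queries are different, and the sketch assumes an SQLite build with FTS5 enabled.

```python
import sqlite3


def build_index(tags):
    """Create an in-memory FTS5 table from (key, value) pairs."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema for this sketch, not the taginfo schema.
    db.execute("CREATE VIRTUAL TABLE tag_search USING fts5(key, value)")
    db.executemany("INSERT INTO tag_search (key, value) VALUES (?, ?)", tags)
    return db


def fulltext_search(db, term):
    """Token-based search using the FTS5 index."""
    # Quote the term so FTS5 treats it as a string, not as query syntax.
    query = '"' + term.replace('"', '""') + '"'
    rows = db.execute(
        "SELECT key, value FROM tag_search"
        " WHERE tag_search MATCH ? ORDER BY key, value",
        (query,),
    )
    return rows.fetchall()


def substring_search(db, term):
    """Substring search on values; has to look at every row."""
    rows = db.execute(
        "SELECT key, value FROM tag_search"
        " WHERE value LIKE ? ORDER BY key, value",
        ("%" + term + "%",),
    )
    return rows.fetchall()


db = build_index([
    ("amenity", "restaurant"),
    ("cuisine", "italian restaurant"),
    ("shop", "bakery"),
])
```

Here fulltext_search(db, "restaurant") finds both restaurant rows through the index, while a fragment like "aura" is only found by substring_search, which cannot use the token index and scans all rows.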

Useful Links

Another feature mappers have requested again and again is adding links from tag values to “appropriate” web sites. I have blocked these requests for a long time, because I didn’t want to get into the semantics of the tags themselves. Taginfo mostly just shows general information about tags, like how often they are in the database or which project uses them. But it doesn’t know what the bits and bytes in the tag value mean. Is this a text, a number, a link?

But it is occasionally useful to connect the tags to other websites and so I broke down and added that feature. Taginfo knows about some special tags and shows one or more links from the tag overview to canonical websites. So, for example, the tag species=Celtis australis links to the corresponding page on Wikispecies.

Currently this works for the addr:country, phone, ref:bag, species, url, website, wikidata, and wikipedia tags. In all cases the tag values are checked for the right format and sometimes additional checks are done. For instance the wikipedia links have to go to an actual existing Wikipedia site.
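As a rough sketch of the idea (check the format of the value first, only then build a link), here is some Python. The regular expressions, the URL templates and the name link_for_tag are assumptions made for this example. The actual checks in taginfo are more thorough and cover more keys.

```python
import re
from urllib.parse import quote

# Hypothetical rules for this sketch: key -> (value format, URL template).
LINK_RULES = {
    "wikidata": (
        re.compile(r"Q[1-9][0-9]*"),
        "https://www.wikidata.org/wiki/{}",
    ),
    "species": (
        re.compile(r"[A-Z][a-z]+ [a-z]+"),
        "https://species.wikimedia.org/wiki/{}",
    ),
}


def link_for_tag(key, value):
    """Return a URL for the tag, or None if no link should be shown."""
    rule = LINK_RULES.get(key)
    if rule is None:
        return None  # not one of the special keys
    pattern, template = rule
    if not pattern.fullmatch(value):
        return None  # value does not have the expected format
    return template.format(quote(value.replace(" ", "_")))


species_link = link_for_tag("species", "Celtis australis")
```

Values that fail the format check simply get no link, so a malformed tag value never turns into a broken or misleading URL.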

This is certainly a feature that will evolve over time. I am willing to add new links, within reason. Only reasonably common, well established and well documented tags will be considered. And links should go to obvious, well-established, and somewhat “official” sites.

Approval Status

Another addition that was requested often and that I have thought about for a long time: The “Approval status” of keys/tags from the wiki pages is now shown in the tables on the wiki tabs of the key and tag pages. Here is an example.

I have hesitated a long time to add this, because I see two problems with this information: First, the approval process is, in my opinion, far from optimal. Too few people participate and the decisions are not a good reflection of community consensus. And the process often focuses too narrowly on a single tag or a few tags instead of looking at the larger picture. But, for better or worse, this is what we have, and if taginfo wants to bring all available information about tags together, it should include this.

But this brings me to the other problem: The approval status on the wiki pages is just a text string. Often there are several versions of the same status (“de facto”, “defacto”, “De facto”, …) documented and used. And, even worse, the wiki pages for the different languages often disagree about the status. So all I can do at the moment is show the status in the wiki table for each language version that exists.
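To illustrate the problem, here is a small Python sketch of how such spelling variants could be folded together and how disagreement between language versions could be detected. This is not something taginfo does, and the function names and the normalization rule are assumptions made for this example.

```python
import re


def normalize_status(status):
    """Fold spelling variants such as "De facto" and "defacto" together."""
    # Lowercase, then drop whitespace, underscores and hyphens.
    return re.sub(r"[\s_-]+", "", status.strip().lower())


def statuses_disagree(status_by_language):
    """True if the language versions differ after normalization."""
    normalized = {normalize_status(s) for s in status_by_language.values()}
    return len(normalized) > 1


same = statuses_disagree({"en": "de facto", "de": "Defacto"})
different = statuses_disagree({"en": "approved", "de": "de facto"})
```

Normalization like this only handles spelling. It does nothing about the case where the language versions really document a different status.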

I worked on all the above features and a lot more behind-the-scenes cleanups, fixes and performance improvements in the last months. Now it’s time to let things rest a while and work on other projects. Sorry if I didn’t get around to your favourite feature. Keep the suggestions, ideas, and bug reports coming. It might take a while, but I intend to keep improving taginfo for the next 10 years, too.

Oh, and if you want to help: Taginfo is available in 20 languages, but not all translations are complete and some languages are missing. See the wiki for information on how to help.

Tags: openstreetmap · taginfo