[JT] Jochen Topf's Blog
Thu 2012-06-21 18:01

Wikipedia Multilingual Maps Project

(This article is a (modified) translation of a German language article I wrote for the German OpenStreetMap blog. It should have been translated months ago, but I am only now getting around to it. The project was a bit slow in starting but is gathering momentum now. It has progressed a bit since the original article.)

In autumn of last year the German Wikimedia e.V. invited the public to tender ideas for projects improving free knowledge in general and the Wikipedia in particular. It set aside a budget for these projects.

The project for rendering multilingual maps was accepted as one of the projects. It was started by Tim Alder, who has been working for years on the cooperation between OpenStreetMap and Wikipedia. Tim asked me whether I wanted to do the implementation. As the main author of Tirex (which is used at the Wikipedia) I have been working on web maps for a long time.

The goal of the project is to create the technical base that would allow the Wikipedia to have maps in all the languages that the encyclopaedia exists in. At the moment there are maps in about ten Wikipedias. This is only a small subset of the about 270 languages in which Wikipedias exist. Because articles in Wikipedia with geographical references are used, for instance, by tourists and school children, it is necessary to translate not only the articles themselves but also the maps associated with them.

For use on the web the maps are cut up into tiles of 256×256 pixels. At the moment map tiles for each language are handled separately. But many map tiles are the same for all languages (because there are no names on them at all). And for many map tiles there will only be a few variants. For instance most map tiles in Germany can be used on an English language map with the German names, because there is no translation for the names anyway. Only a few cities in Germany have special English language names, for instance Munich (“München” in German).
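The reuse argument can be sketched as a label lookup with a fallback. This is only a minimal sketch, assuming places carry OpenStreetMap-style `name` and `name:<language code>` tags; the function names and the sample data are made up for illustration and are not part of the project.

```python
def label_for(tags, lang):
    """Return the name tagged for `lang` if there is one, else the local name."""
    return tags.get(f"name:{lang}", tags.get("name"))


def tile_labels(places, lang):
    """All labels that would be drawn on a tile for one language."""
    return tuple(label_for(tags, lang) for tags in places)


# Hypothetical sample data: the places found on two different tiles.
munich_tile = [{"name": "München", "name:en": "Munich"}]
other_tile = [{"name": "Karlsruhe"}]  # no name:en tag in this sample

# If the labels are identical for two languages, one rendered tile can
# serve both maps. Otherwise each language needs its own variant.
same_other = tile_labels(other_tile, "de") == tile_labels(other_tile, "en")
same_munich = tile_labels(munich_tile, "de") == tile_labels(munich_tile, "en")
```

In this sample, `same_other` is true and `same_munich` is false, so only the tile containing Munich would need a second rendering.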

The setup used at the moment is very inefficient in that regard and can’t be scaled up for many more languages. To find a better solution the project has to look at several issues. First of all we have to find out in which languages a map tile actually exists. We do not want to render a map tile in Patagonia hundreds of times only to find out that there are only two language versions of it. We’ll probably need some kind of database that records which languages are used where on the planet.
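One way to picture such a database is an index from tile address to the set of languages that have translated names on that tile. The sketch below is an assumption about how this could look, not a design decision of the project; the class name, the method names and the tile coordinates are all invented.

```python
from collections import defaultdict


class TileLanguageIndex:
    """Hypothetical index: (zoom, x, y) -> languages with translated names."""

    def __init__(self):
        self._languages = defaultdict(set)

    def add(self, tile, lang):
        """Record that `tile` contains at least one name in `lang`."""
        self._languages[tile].add(lang)

    def variants_to_render(self, tile, requested):
        """Languages from `requested` that need their own rendering.

        All other requested languages can share the default tile that
        shows the local names.
        """
        known = self._languages.get(tile, set())
        return sorted(known & set(requested))


index = TileLanguageIndex()
index.add((10, 544, 356), "en")  # made-up tile address
index.add((10, 544, 356), "fr")

wanted = ["de", "en", "fr", "ja"]
busy_tile = index.variants_to_render((10, 544, 356), wanted)
empty_tile = index.variants_to_render((10, 300, 650), wanted)
```

For a tile without any translated names the result is an empty list, so it would be rendered once instead of once per language.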

Then the Mapnik renderer has to be configured so that it renders the right labels into the maps. At the moment this is problematic, because a complete copy of the huge Mapnik configuration has to be kept for each language although only small parts differ between language versions.
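One conceivable way around the duplicated configuration is to keep a single template and fill in only the part that differs per language. The sketch below illustrates the idea with Python’s `string.Template`. The style fragment is invented and is not real Mapnik syntax; whether Mapnik is best driven this way is an open question.

```python
from string import Template

# Hypothetical style fragment, NOT real Mapnik syntax. Only the label
# field differs between language versions.
STYLE_TEMPLATE = Template(
    "<style name='places'>\n"
    "  <label field='$label_field' fallback='name'/>\n"
    "</style>\n"
)


def style_for(lang=None):
    """Build the style for one language, or the default style if lang is None."""
    field = f"name:{lang}" if lang else "name"
    return STYLE_TEMPLATE.substitute(label_field=field)


english_style = style_for("en")
default_style = style_for()
```

With this approach there is one template to maintain, and the per-language styles are generated when needed.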

Once the tiles have been rendered they have to be kept in a space-efficient way together with the information about which languages they can be used for. There will probably be a lot of tiles that have been rendered several times for different languages but turn out to be identical in the end. We should only store those tiles once.
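Storing identical tiles only once could, for example, be done by addressing the tile data by a hash of its content. The following is a minimal in-memory sketch of that idea, assuming that identical renderings produce byte-identical files; the class and method names are hypothetical and a real store would live on disk.

```python
import hashlib


class TileStore:
    """Hypothetical content-addressed tile store (in memory)."""

    def __init__(self):
        self._blobs = {}  # digest -> tile bytes, each kept once
        self._index = {}  # (tile, lang) -> digest

    def put(self, tile, lang, data):
        """Store a rendered tile; identical data is shared between languages."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._index[(tile, lang)] = digest

    def get(self, tile, lang):
        """Return the tile bytes for one tile address and language."""
        return self._blobs[self._index[(tile, lang)]]

    def blob_count(self):
        """Number of distinct tile images actually stored."""
        return len(self._blobs)


store = TileStore()
tile = (10, 544, 356)  # made-up tile address
store.put(tile, "de", b"image-with-local-names")
store.put(tile, "en", b"image-with-local-names")
store.put(tile, "ja", b"image-with-japanese-names")
```

Here three language versions are registered but only two images are stored. The sketch does not remove images that are no longer referenced after a re-render; a real implementation would need some form of cleanup.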

In all these steps we have to keep in mind that OpenStreetMap data changes all the time and the tiles have to be re-rendered often. And of course it can happen that a tile that used to be available in only one language is now available in two languages. In that case all the necessary modifications need to be made to the databases and tiles.

The solution should be as flexible as possible so that we can experiment with tile expiry policies etc. once we are in operation.

And even with all those complex components the whole system has to be really fast. The Wikipedia is used by many people and the map tiles have to be delivered to their browsers quickly and efficiently. And the fewer resources we use the better: hardware costs money and needs to be operated and maintained.

One possible alternative would be to switch the whole map over to using vector data that is rendered in the browser. It would be relatively easy to switch to the right language labels in the browser. Or it would be possible to have a hybrid model where the base map is rendered on the server but the names of places, streets, etc. are rendered on top of that from vector data in the browser. But both approaches might lead to lower map quality, and they only work in modern browsers and on fast clients, which do not exist everywhere. Wikipedia must take poorer countries into account where the Internet is used with old computers or from mobile phones. And all the techniques used for rendering maps in the browser are relatively new and we don’t have as much operational experience with them as with the tiled bitmap maps. That’s why we have decided to keep the well-proven technology for this project and build on it.

The project will not only be interesting for Wikipedia, but it will help everybody who uses OpenStreetMap maps. Larger and more robust tile servers are needed in other places, too. And the software developed for the project could also be interesting for other uses. If you want, for instance, to render highways according to the color usually used in each country, you essentially have the same problem: Maps that are styled a bit differently depending on the area or a user choice. So the solution should be “translatable” for other applications, too, even if that is not the focus of this project.

The project is supported by the Wikimedia e.V. with 20,000 EUR. This means I can give the project the attention it needs. We are currently still in the first phase where we design the architecture of the system and evaluate software to be used.

Of course we are interested in hearing from you, especially if you are already running a large tile server or a tile server with many styles. But we also need all the ideas and support we can get, for example how to store large amounts of tiles efficiently or how we can store the mapping from languages to tiles.

Ideas and discussions are best sent to the OSM dev list. But feel free to contact me directly. There is also the project wiki page.

Comments can be directed to the Multilingual maps wiki page.

Tags: multilingual maps · osm · wikipedia