Fri 2012-09-21 13:44

Making it easier to add data to OpenStreetMap

Yesterday Development Seed/MapBox announced that they received a grant of $575,000 to “improve the core infrastructure of OpenStreetMap”. This is a great development. We do have a lot of problems and there are many things that can be improved. The community has been working on all of these things over the years of course, but I do think that we lag behind the growth curve of OSM and any help is appreciated.

They haven’t announced yet what exactly they are planning to do, but judging just from the description in the announcement, this is quite an ambitious project. And while half a million dollars is a lot of money, it will shrink quickly once you start paying software developers with it and take the management overhead into account.

One of their goals is to “use this investment to make it easier to add data to OpenStreetMap”. I don’t think adding data was ever really the problem; what they probably mean is adding, editing and maintaining the data. And there we could use a lot of help. But it is a huge task. Ohloh [Updated: now OpenHub] estimates the cost of developing JOSM, the most powerful OSM editor we currently have, at about 10 million dollars. Suddenly the half million does look small :-)

People have wanted easier-to-use editors since the project started. As things progressed, editors became easier to use, and so we could attract more and different groups of people to the project. But there is a countermovement: OpenStreetMap keeps getting more complex, with the addition of relations, mappers trying to get into the third dimension, and so on. Of course we want editors to be simple and attract more people to the project. Unfortunately, that’s not an easy goal to achieve. Let’s dig into this a bit deeper. There are three reasons why OpenStreetMap is complex and difficult to work with:

1. OpenStreetMap is trying to model the real world, and the real world is complex. The world isn’t split up into neatly categorized and distinct objects. There are bridges that carry a road and a railway next to each other. There are houses that have several addresses. There are rivers that flow in different directions depending on the season. We, as a community, have to decide how much of that complexity we want to allow in OpenStreetMap. Over the last few years, we have added more and more complex and detailed data to OSM, often just because we could, but also because we needed this data for some new task we wanted to achieve. As long as we use OpenStreetMap to model all these complex things, there is a limit to how simple the job of a mapper can be.

2. The user interfaces of all the editors are, er, let’s say, less than optimal. This is an old grievance, and the editors are already much better than they used to be. Some focus on user interface design could improve things a lot. But user interface is more than skin deep. It is not just about nice icons in the right places, and that leads us to the next point:

3. OpenStreetMap is complex and difficult to work with because of our data model. Our data model is very open, and that is a good thing. We want this flexibility. Especially in the beginning, when nobody really knew how OpenStreetMap would turn out and what people would do with it, the simple model of nodes, ways and (later) relations with “any tag you like” allowed innovation to happen. But the model shows its age; some decisions we would probably make differently today. What is more problematic: the open model stands in the way of a simpler user interface. There is, for instance, no reason I know of why ways containing only a single node are allowed. But they are. And they do exist. And they do mess up the data. Any editor has to take this into account and find a way to visualize single-node ways so that people can fix them.
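To make the single-node-way problem concrete, here is a minimal Python sketch of the kind of check an editor or validator ends up having to run. The `Way` class and the `degenerate_ways` function are hypothetical illustrations, not part of any OSM library; they simply model a way as an ordered list of node ids plus a free-form tag dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    """Hypothetical minimal model of an OSM way (not a real library class)."""
    id: int
    node_ids: list[int]                                 # ordered node references
    tags: dict[str, str] = field(default_factory=dict)  # "any tag you like"


def degenerate_ways(ways):
    """Return ids of ways that reference fewer than two distinct nodes.

    Such a way has no length, so it cannot be drawn as a line. An editor
    needs some special rendering for it so that mappers can find and fix it.
    """
    return [way.id for way in ways if len(set(way.node_ids)) < 2]


ways = [
    Way(1, [10, 11, 12], {"highway": "residential"}),
    Way(2, [13]),        # the single-node case
    Way(3, [14, 14]),    # two references to the same node: just as useless
]
print(degenerate_ways(ways))  # [2, 3]
```

The check itself is trivial; the hard part for an editor is deciding how to show such an object on screen at all.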

What about a way that has both a highway tag and a railway tag? There have been discussions about whether that’s the right way to tag things, but even if we decide that that’s not how things should be done, those ways exist and will keep existing as long as we have the data model we have now. So an editor has to take this into account. A simplistic editor that draws simple lines in different colors (or whatever) for highways and railways now has to decide which color to use. In all probability it will draw one and ignore the other, maybe even depending on which tag comes first. But this way the user gets the illusion that everything is simple, that a way can be either a highway or a railway, when in the data the case is more complex.
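A toy Python sketch of this dilemma. `STYLE`, `naive_color` and `conflicting_keys` are made-up names for illustration, not taken from any real editor. The naive function returns the color of whichever styled key it meets first, so the same way changes color depending on tag order; the second function reports the ambiguity instead of hiding it.

```python
# Hypothetical styling table for a simplistic editor: one color per key.
STYLE = {"highway": "orange", "railway": "black"}


def naive_color(tags):
    """Pick the color of the first styled key in the tag dict.

    Python dicts preserve insertion order, so the result depends on which
    tag happens to come first, which is exactly the arbitrary behavior
    a simplistic editor would show.
    """
    for key in tags:
        if key in STYLE:
            return STYLE[key]
    return None  # nothing we know how to draw


def conflicting_keys(tags):
    """Return every styled key present, so an editor could show them all."""
    return sorted(key for key in tags if key in STYLE)


print(naive_color({"highway": "primary", "railway": "tram"}))       # orange
print(naive_color({"railway": "tram", "highway": "primary"}))       # black
print(conflicting_keys({"railway": "tram", "highway": "primary"}))  # ['highway', 'railway']
```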

I have cited two small problems of this type to illustrate my point, but there are many, many more, and many of them are much worse. The biggest problem of this type is, in my opinion, the polygon problem. There are different ways to draw polygons (closed ways or multipolygon relations), and there is no way to put a simple interface on this because there are so many ways in which such polygons can be incomplete or broken. As a user you have to understand the underlying data model; otherwise you cannot work effectively with the data. We can put a nice, easy-to-use interface on top of the data, but that only works if the data is correct.
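To give an idea of how much can go wrong, here is a simplified Python sketch of ring assembly, the step software has to perform before it can treat a multipolygon relation as an area. It assumes member ways are given as plain lists of node ids, and the function names are hypothetical. The greedy joining ignores inner/outer roles and nodes where more than two segments meet, both of which real implementations have to handle.

```python
def is_closed_ring(node_ids):
    """A closed ring: at least 4 node refs and the last equals the first."""
    return len(node_ids) >= 4 and node_ids[0] == node_ids[-1]


def assemble_rings(member_ways):
    """Greedily join way segments (lists of node ids) end to end into rings.

    Returns (rings, leftovers). Leftovers are chains that could not be
    closed, for instance because a member way is missing from the relation.
    """
    pending = [list(w) for w in member_ways if w]  # copy, skip empty ways
    rings, leftovers = [], []
    while pending:
        chain = pending.pop(0)
        while not is_closed_ring(chain):
            for i, seg in enumerate(pending):
                if seg[0] == chain[-1]:
                    chain.extend(seg[1:])            # segment continues forward
                elif seg[-1] == chain[-1]:
                    chain.extend(reversed(seg[:-1]))  # segment drawn backwards
                else:
                    continue
                pending.pop(i)
                break
            else:
                leftovers.append(chain)  # dead end: polygon is broken here
                break
        else:
            rings.append(chain)          # loop ended because the ring closed
    return rings, leftovers
```

For example, `assemble_rings([[1, 2, 3], [3, 4, 1]])` yields one ring `[1, 2, 3, 4, 1]`, while `assemble_rings([[1, 2, 3], [4, 5, 1]])` yields no rings and two leftover chains: the kind of broken polygon an editor must still let the user see and repair.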

So I think we can make OSM a lot easier to edit by working on issue 2. But it will remain too hard to use for occasional users until we tackle issues 1 and 3. Some kinds of type 1 problems can be solved by a better data model. We need to “isolate” the difficult parts from the easy parts so that we can give novice or occasional users an easy-to-use interface and leave the heavy-duty work to the specialists. And we need to discuss how much flexibility we want to allow in the future. There is a tradeoff here: more flexibility means more complexity, which means it is harder to use. There is just no way around that.

People who only use OpenStreetMap data (and don’t edit it) have it much easier. Everybody who has created maps from OpenStreetMap data (or done anything else with it) has approached the job by implementing the easy bits and ignoring the difficult bits. Broken data, undefined data and data you don’t understand are ignored. You start with something simple and, if needed, iterate, improving your work piece by piece by teasing more and more information out of the OpenStreetMap data. But you still ignore everything you are not interested in or can’t understand. That is a valid (and, I think, the only workable) approach when you use the OpenStreetMap data.
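The data consumer’s strategy is easy to express in code. This Python sketch (hypothetical function name and tag list, not from any real renderer) keeps only the ways it can draw and understands, and silently drops everything else.

```python
# Hypothetical set of highway values this toy renderer knows how to draw.
KNOWN_HIGHWAYS = {"motorway", "primary", "secondary", "residential"}


def roads_for_map(ways):
    """Return (node_ids, highway_value) pairs for the roads we understand.

    `ways` is an iterable of (node_ids, tags) pairs. Anything broken or
    unfamiliar is skipped without complaint.
    """
    roads = []
    for node_ids, tags in ways:
        if len(node_ids) < 2:
            continue  # broken geometry: ignore it
        highway = tags.get("highway")
        if highway not in KNOWN_HIGHWAYS:
            continue  # not a road, or a value we don't understand: ignore it
        roads.append((node_ids, highway))
    return roads


ways = [
    ([1, 2], {"highway": "primary"}),
    ([3], {"highway": "primary"}),          # single-node way, dropped
    ([4, 5], {"highway": "something_odd"}),  # unknown value, dropped
    ([6, 7], {"building": "yes"}),           # not a road, dropped
]
print(roads_for_map(ways))  # [([1, 2], 'primary')]
```

An editor cannot take the `continue` shortcuts: every one of the skipped objects is something a mapper may need to see and fix.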

Unfortunately, this approach doesn’t work for an editor. An OSM editor has to understand (to a degree) everything. It has to give you a way to work with the data even if it is broken or you can’t interpret the tags. And that’s what makes this task so difficult. Development Seed/MapBox has done some amazing work over the last few years. But I haven’t seen much collaboration with the wider OSM community from them, especially not while drawing up the application for this grant. So it will be interesting to see which problems they decide to tackle and how they go about involving the community, without which their goals can’t be achieved.

Tags: openstreetmap