Since March the OSMData Service has been running on new servers sponsored by FOSSGIS e.V. I wrote about the switch in an earlier blog post. Now, half a year later, I want to write about my experiences with the new setup.
First, the switch itself. It went without any hiccups. I had expected more complaints or questions from users of the old servers, but it seems everybody found the new ones. As mentioned in my blog post from March, I disabled some products, but that affected only one or two people and I had reached out to them beforehand.
My experience with the Hetzner cloud servers is somewhat mixed. Most of the time they work well and are really cheap, but there have been some problems along the way. Here is a list of pros and cons:
Pros
- It is really nice to start a server from nothing and have completely automated scripts for the setup (see the sketch after this list). If something goes wrong, I just start from scratch. When Debian Buster was released, I could test everything on new virtual machines with the exact same setup as the production system and then switch over.
- Costs are lower than having your own server, but I am not sure how long this will be true. I have been thinking about adding more “products” to the site, and pretty soon one large server would be cheaper than running multiple small ones plus the extra storage they need. This is something to keep in mind if and when the service is extended.
- In theory the data updates could be run in parallel without any extra cost. So I could, for instance, update the coastline and, at the same time, on a second server update the icesheet polygons. That’s because it doesn’t cost more to run two servers for n hours instead of one server for 2n hours. This would allow faster access to the new data. But it doesn’t work (or gets more complicated) when data from one processing step is needed for another processing step, especially because there is no easy way to share data between hosts. The network disks can only be mounted on one server at a time. Of course I could set up some data sharing or copy the data around, but that is more work and copies of data cost money.
- The Hetzner Cloud web user interface, API and hcloud tools are really nice. They are easy to use and do everything that’s needed. Some features I requested for the hcloud tool were quickly added.
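To give an idea of what such an automated setup can look like, here is a minimal sketch using the official hcloud Python library. It is not my actual setup script: the token, the cloud-init file, the job name, and the server type are placeholders, and the exact import paths and parameters may differ between library versions.

```python
# Minimal sketch, not my actual setup script: create a throwaway server
# that sets itself up from a cloud-init file, then delete it when done.
# Token, file name, job name, and server type are placeholders; import
# paths and parameters may differ between hcloud library versions.
from hcloud import Client
from hcloud.images.domain import Image
from hcloud.server_types.domain import ServerType

client = Client(token="MY_API_TOKEN")  # placeholder

with open("setup-coastline.yaml") as f:  # hypothetical cloud-init script
    user_data = f.read()

response = client.servers.create(
    name="coastline-update",             # hypothetical job name
    server_type=ServerType(name="cx41"),
    image=Image(name="debian-10"),
    user_data=user_data,                 # all further setup happens in cloud-init
)
server = response.server
print(server.id, server.status)

# ... later, after the job has reported success:
server.delete()
```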
Cons
- More servers mean more things that can go wrong. The network can be slow or broken in some way, or a new server can take more time to come up than expected.
- Installing a server from scratch every time it is used means you might get slightly different setups from one day to the next, for instance when Debian packages are updated without you noticing. Or the Debian repository might be temporarily unreachable, which makes the whole setup more brittle.
- If something goes wrong, it is sometimes difficult to find the problem, because there are more servers involved. Also, because a server might already have been shut down after a problem, the information needed to find and fix it might already be gone. I have opted not to shut down servers automatically when there are problems so that I can log in and do the debugging, but that means servers might run longer than needed.
- Because Hetzner doesn’t provide shared data access, data has to be copied between servers. It would be nice if they offered some kind of shared disks.
- The scaling options offered by Hetzner are not as flexible as I need for some use cases. CPU, memory, and disk grow proportionally from one server type to the next, but sometimes I need much more RAM without more CPUs, or just a larger disk. They probably do this to make it easy to distribute virtual servers onto their actual hardware, but it means I sometimes have to buy a much larger server than I really need.
- One of the biggest problems I have encountered is the large CPU performance difference between virtual hosts. Just in the last month, the very same job sometimes ran for about 4.5 hours and sometimes needed more than 8 hours. Usually it is somewhere in between, but the huge spread makes it difficult to plan and basically impossible to measure how any changes I work on affect the run time. To be fair, Hetzner offers servers with guaranteed CPU performance, but they are so much more expensive that this isn’t a real option.
- Another large problem is the unpredictable provisioning time for new servers. Usually a new server is available in only a minute or two, but I had to extend the timeout in my setup script several times, because it can sometimes take 10 or 15 minutes or more until the new server is ready (see the sketch below).
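For illustration, this is roughly the kind of wait loop I mean, again sketched with the hcloud Python library under the same assumptions as above; the timeout and polling interval are just example values.

```python
# Sketch of a wait loop with a generous timeout: provisioning usually takes
# a minute or two, but occasionally much longer, so poll until the server
# reports the "running" status or give up after the timeout.
import time

from hcloud import Client

client = Client(token="MY_API_TOKEN")  # placeholder


def wait_for_server(server_id, timeout=20 * 60, interval=10):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        server = client.servers.get_by_id(server_id)
        if server.status == "running":
            return server
        time.sleep(interval)
    raise TimeoutError(f"Server {server_id} not running after {timeout} seconds")
```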
All in all I am reasonably happy with what I have got here, even if it took me a while to work around some of the problems with the Hetzner service. But when I think about extending the OSMData service, what I would need to get that done, and how much the service would cost then, I am not sure whether this cloud approach is the right way to go.
OSM data is notoriously large and often needs a lot of RAM to process, so using a large dedicated server might be easier and cheaper once I do more processing. Another option I am considering is having one large dedicated server for the heavy lifting up front, that is everything that involves the whole planet: keeping the complete planet around and updating it, creating extracts from it, and so on. Further work that only has to deal with smaller amounts of data could then be farmed out to cloud servers. Unfortunately the Hetzner dedicated servers and the cloud servers are two different worlds; you can’t mount the cloud disks on dedicated servers, for instance, so this becomes a bit more difficult.
I’d love to hear about your experiences with Hetzner or other providers if you are doing similar things.