I spend a lot of time slicing and dicing OSM and other geodata. Sometimes that is done with custom software but often a database is involved somewhere. Recently that sometimes has been the SQLite/SpatiaLite duo which works great for simple things. But the heavy lifting has to be done with PostgreSQL/PostGIS which has much more “geo functionality” and is much faster when complex queries are used. So here comes the first tip:
Use UPDATE FROM instead of UPDATE with a subselect. PostgreSQL has a non-standard extension to the UPDATE SQL command that allows you to add a FROM clause. This often makes subselects unnecessary that are normally needed when updating the content of a table from the content of another table or tables. Instead of writing “UPDATE foo SET a=(SELECT x FROM bar WHERE bar.id=foo.id)” you write more succinctly “UPDATE foo SET a=x FROM bar WHERE bar.id=foo.id”. Not only is this shorter to write it often is much faster to execute!
I use the great Quantum GIS nearly every day to visualize the geodata I am working on. This leads me to my second tip:
Change rendering settings in QGIS for faster rendering. It has been bugging me for a long time that QGIS seemed so slow when rendering. Especially in a setting very common in my day-to-day work: I often work with some data from local files or a local database and use some external WMS server as a background layer. The local data renders really fast, but the WMS is slower because the large PNG file has to be downloaded. With the default QGIS settings you have to wait till all layers are finished downloading and rendering before you see anything. But I recently discovered a setting in “Settings → Options → Tab Rendering” called “Number of features to draw before updating the display”. Set this from the default zero to some high number and you don’t have to wait for slow layers any more!
All the data mangling is done in many steps, some using PostgreSQL, some standard tools like the OGR tools, others using my own software. All of that is kept together by some shell scripts. That is made easier by my third tip:
Use “set -e” and psql ON_ERROR_STOP. By default shell scripts keep running if a program called from it exits with an error. Many of my shell scripts start with “set -e” which makes the shell exit by default in case there is an error. At least while you are developing and haven’t added some other error checks this makes sure your shell script will not silently ignore errors. When using the PostgreSQL psql command to run SQL scripts you have the same problem: By default psql will not terminate if there is an error executing the SQL commands. But you can use the option “—set ON_ERROR_STOP=” to change this.