Clustering Munich

It so happens that I need to write a blog post as the final assignment of a Coursera specialization, so here goes - it will also be useful to jot down some of what I learned.

This final project is a more or less free assignment, nach dem Motto fatti una domanda e datti una risposta. The only rule is that the question should be answerable by collecting geographic data and running k-means clustering.

It’s not like the assignment pops out of nowhere - there are several examples along the course, and you can build on that. In one of these assignments the idea was to analyze the neighbourhoods of Toronto after being shown an example with the ones in Manhattan: You pick a table with neighbourhood names and coordinates, draw a ball around each coordinate, see what’s inside and check whether there’s some kind of natural clustering, whatever that means. Already back then I had tried to run the exercise with Munich, because I honestly don’t care that much about Toronto, except (as usual) real data is messier than precooked example.

What happened in the Toronto dataset is that you had the city divided in postal codes and every postal code neatly divided in neighbourhood with tiny cute names and geographic coordinates; what happens in Munich is that there’s a complete mismatch between postal codes and neighbourhoods - postal code X covers part of neighbourhood A and part of neighbourhood B, neighbourhood A in its entirety covers part of postal code X and part of postal code Y -, and neighbourhood names seem to be composite of subneighbourhoods, but the internal subdivision is nowhere to be found.

All of this is of course extremely irritating, so I thought I would create new neighbourhoods from scratch; you just draw a grid over the city, pick a ball around every grid point and then have the algorithm group the points together. Two remarks dazu:

So you can run the algorithm and decide what number of clusters you like and so on and so forth. There isn’t really a “right” answer, but it was nice to toggle the parameter a bit and look at the results. In the end I settled for 28 clusters, which give the following picture:

The Gewerbegebiete in the north and the zoo in the south got nicely clustered, also the various bathing spots along the Isar got their own neighbourhood. You can download the map as html file here and play with it.