Thursday, September 8, 2011

Tutorial: Geographic data on a map with Python

Plotting geographic data on a world map with Python

via Seeing Data by Chris McDowall on 8/11/11

Last week I promised to write a blog post detailing how I created this public transport animation. On reflection, it’s a topic best dealt with over a few sessions. Let’s start simple. How might you plot lots of geographic data on a map? In this post I will show you how to programmatically create a map of the World’s top ten most populated cities. It will end up looking something like this.

This tutorial involves programming with the Python language and some additional modules. If you have never programmed before it might be a tad confusing. I suggest you first read one of the many fine introductory Python tutorials to get your head around the language. The rest of this post will assume that you understand how to install and import modules, how to write and run a script and the fundamentals of Python data structures.

Before we begin you will need to install the following:
Python 2.5, 2.6 or 2.7.
NumPy (a Python extension that adds support for multi-dimensional arrays and a host of whiz-bang high-level mathematical operations).
Matplotlib (a flexible 2D plotting library for Python that “tries to make easy things easy and hard things possible”).
Basemap (a Matplotlib extension for plotting data on geographic projections).

Right, let’s make a map. Wikipedia has a list of World metropolitan areas by population that can serve as a nice test dataset. I will demonstrate how to turn this data into a labelled proportional symbol world map.

First we need to import our various libraries:
from mpl_toolkits.basemap import Basemap import matplotlib.pyplot as plt import numpy as np

Next we will set up our map. Basemap supports many different geographic projections. For this exercise I am using the Robinson projection.
# lon_0 is central longitude of robinson projection. # resolution = 'c' means use crude resolution coastlines. m = Basemap(projection='robin',lon_0=0,resolution='c') #set a background colour m.drawmapboundary(fill_color='#85A6D9')

Basemap comes packaged with some global geographic data at four resolutions: ‘crude’, ‘low’, ‘medium’ and ‘high’. Let’s draw the continent and country datasets using the ‘crude’ outlines.
# draw coastlines, country boundaries, fill continents. m.fillcontinents(color='white',lake_color='#85A6D9') m.drawcoastlines(color='#6D5F47', linewidth=.4) m.drawcountries(color='#6D5F47', linewidth=.4)

We can also ask Basemap to draw lines of longitude and latitude.
# draw lat/lon grid lines every 30 degrees. m.drawmeridians(np.arange(-180, 180, 30), color='#bbbbbb') m.drawparallels(np.arange(-90, 90, 30), color='#bbbbbb')

Populate three arrays of equal length with the latitude, longitude and population values for each city. Normally we would read the data from a file, database or service but in the interest of simplicity I have typed them directly into the script.
# lat/lon coordinates of top ten world cities lats = [35.69,37.569,19.433,40.809,18.975,-6.175,-23.55,28.61,34.694,31.2] lngs = [139.692,126.977,-99.133,-74.02,72.825,106.828,-46.633,77.23,135.502,121.5] populations = [32.45,20.55,20.45,19.75,19.2,18.9,18.85,18.6,17.375,16.65] #millions

Use our basemap object to convert the latitude/longitude values into map display coordinates.
# compute the native map projection coordinates for cities x,y = m(lngs,lats)

Multiply each population by itself to create a scaled list of values. These will be our circle display sizes.
#scale populations to emphasise different relative pop sizes s_populations = [p * p for p in populations]

Use the matplotlib scatter function to plot the circles. Note the use of the zorder parameter. This ensures that the scattered circles will be rendered on top of the continents.
#scatter scaled circles at the city locations m.scatter( x, y, s=s_populations, #size c='blue', #color marker='o', #symbol alpha=0.25, #transparency zorder = 2, #plotting order )

Loop though the unscaled population values and the display coordinates. Label each circle with the city population rounded to the nearest million people.
# plot population labels of the ten cities. for population, xpt, ypt in zip(populations, x, y): label_txt = int(round(population, 0)) #round to 0 dp and display as integer plt.text( xpt, ypt, label_txt, color = 'blue', size='small', horizontalalignment='center', verticalalignment='center', zorder = 3, )

Finally, add a title and display the map on your screen.
#add a title and display the map on screen plt.title('Top Ten World Metropolitan Areas By Population')

Run your script and you should see a map. Lovely. I recommend you experiment further by tweaking parameters, importing other datasets and using alternative plotting methods.

This is the basic overview of how I typically plot geographic data in Python. Next time I’ll take this a step further and show you how to map, export and animate temporal geographic data.