Thursday, January 12, 2012

Viz: Embracing Uncertainty in Two-Line Charts

I like Robert's alternative. As he says, we need to show the undecided. 

Embracing Uncertainty in Two-Line Charts

As we’re heading towards elections again, there is a chart type that is as unavoidable as political ads, baby-kissing, and smear campaigns: line charts showing polling data. The most common ones pit two candidates against each other, and often make a big deal out of the fact that the lines cross. Not only are these charts misleading in the way they depict the choice, they also hide an important fact: the number of undecided voters.

The following image shows slightly different data, but the idea is the same. The data is from a long-running Gallup poll about job approval ratings of President Obama, from early 2009 to the end of 2011. Each data point is actually a three-day average. Blue shows approval, green disapproval.

There is a clear trend here that shows approval dropping from very high in early 2009 to just below 50% in mid-2010. From there, things get murkier. The blue and green lines cross, then continue for a while, then cross again. There’s clearly a lot of noise, but every inversion looks like a big event.

The thing that also stands out is the symmetry: since there are (apparently) only two choices, the disapproval percentage is always going to be 100% minus the approval percentage. That is visually very appealing, but it also creates the illusion of two independent values, shown as different lines, being in much more agreement than would be expected. There is also a lot of noise in the regions where the lines cross or are close to crossing, which makes it hard to see what is going on.

An Alternative

How else could the data be shown? In particular, what else is there to show about this data? There are two aspects to the data that are getting lost in the line chart above: the number of undecided people, and the fact that the numbers have to add up to 100%. It also makes sense to reduce the noise and make it easier to see the trend, especially in parts where the approval and disapproval numbers are very close together.
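To make the two missing aspects concrete, here is a minimal Python sketch. It assumes the undecided share is simply whatever remains after approval and disapproval are subtracted from 100%, and it uses a plain trailing moving average as one possible way to reduce noise. The numbers are made up for illustration and are not Gallup figures; the function names are hypothetical.

```python
# Illustrative, made-up percentages (NOT actual Gallup figures).
approve = [62.0, 60.0, 55.0, 50.0, 47.0, 49.0, 46.0, 48.0]
disapprove = [25.0, 30.0, 37.0, 43.0, 46.0, 44.0, 47.0, 45.0]


def undecided_share(approve, disapprove):
    """Remainder once approval and disapproval are subtracted from 100%."""
    return [100.0 - a - d for a, d in zip(approve, disapprove)]


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


undecided = undecided_share(approve, disapprove)
smooth_approve = moving_average(approve, window=3)
```

A wider window gives a smoother line but reacts more slowly to real changes, so the window size is a judgment call rather than a fixed rule.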

Here is my alternative. It is a stacked area chart that contains the approval at the bottom, the undecided percentage in the middle, and the disapproval on top. The colors were chosen deliberately to be easy to interpret (red is bad; blue matches the approval line in the chart above and is also the color of the Democratic Party), and the undecided layer is actually transparent.
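The stacking described here can be sketched as follows. This is only an illustration of how the three layers fit together, not the code behind the chart; `stack_bands` is a hypothetical helper, and the sample values are made up.

```python
def stack_bands(approve, disapprove):
    """Return (lower, upper) edges of each layer at every time step.

    Approval sits at the bottom, undecided in the middle, and
    disapproval on top, so the three layers always fill 0 to 100.
    """
    bands = {"approve": [], "undecided": [], "disapprove": []}
    for a, d in zip(approve, disapprove):
        u = 100.0 - a - d  # assumed: undecided is the remainder
        bands["approve"].append((0.0, a))
        bands["undecided"].append((a, a + u))
        bands["disapprove"].append((a + u, 100.0))
    return bands


# Made-up sample values, not real polling data.
bands = stack_bands([62.0, 47.0], [25.0, 46.0])
```

Because disapproval hangs down from the 100% line, the gap between the two colored areas is the undecided share, and the two areas can only touch when nobody is undecided. A plotting library's stacked-area function, such as matplotlib's `stackplot`, could draw the same three layers directly.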

How is this better? For one, it shows the approval trend in a much clearer way than the first chart. If we assume that the number of undecideds is constant, we don’t actually need to see the disapproval numbers, and so can initially ignore anything above the blue area.

But the undecided perc...