“Find the narrative in the numbers.” It’s this year’s mantra of data visualization, and some variation thereof is the watchword for all modern journalism: Find the story. Let the facts speak for themselves to tell what happened.
It’s a beguiling idea, the concept that a narrative is hidden like a sculpture in every misshapen lump of data, if only it could be liberated from the clay of unrelated information. And it’s true that in a world of infinite resources, where every bit of existing data could be considered holistically, this would definitely be the case. After all, everything influences something. But perhaps it’s time to read a little closer into this much-abused buzzphrase: when we say “Find the narrative,” don’t we really mean “Attribute some causality”?
Data vis is cool, fun, exciting stuff, and yes, everyone and their aunt has a right, nay, a duty, to give it a try. But in the last couple of months we’ve watched this well-meaning catchphrase morph from a description of data-cleaning processes into an injunction to project all kinds of causality onto any given collection of numbers. Just three years shy of its 50th anniversary, the prime directive of How to Lie with Statistics (“Don’t!”) is getting brushed aside in our excitement to plot the hell out of any data set we can get our hands on.
It’s a difficult thing to explain to a client: sometimes things just happen. An upward trend in sales numbers may not be related to advertising campaigns, and a downward trend may not be the fault of the economy. This is basic, basic stuff, people, and it’s just as true now that we can instantly make a groovy-looking visualization in FusionCharts as it was when we needed some graph paper and a slide rule. Being two steps removed from reality means that every visualization has an element of editorialization, but it doesn’t follow that we can suddenly make wild claims about the real-world events it very, very abstractly represents.
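If that sounds too abstract, here is a minimal sketch of how easily it happens. Everything in it is made up for illustration: `sales` and `ad_spend` are hypothetical series, generated independently of each other, that both happen to drift upward over 120 weeks. Nothing connects them except the passage of time, yet they will look tightly linked on a chart.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trending_series(seed, weeks=120, slope=1.0, noise=5.0):
    """A hypothetical weekly series: steady upward drift plus random noise."""
    rng = random.Random(seed)  # separate generator, so each series is independent
    return [slope * week + rng.gauss(0, noise) for week in range(weeks)]


def week_over_week(series):
    """Change from each week to the next, which removes the shared drift."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Two invented series that share nothing but an upward trend.
sales = trending_series(seed=1)
ad_spend = trending_series(seed=2)

r_levels = pearson(sales, ad_spend)
r_changes = pearson(week_over_week(sales), week_over_week(ad_spend))

print(f"correlation of raw levels:        {r_levels:.2f}")
print(f"correlation of weekly changes:    {r_changes:.2f}")
```

The raw levels should correlate strongly simply because both lines go up. Compare the week-to-week changes instead and the apparent relationship should mostly evaporate. Neither number says anything about whether advertising moved sales; the first just looks like it does.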
How would we feel if we treated past representational art forms this way? We all know that the square-looking blob that is a Picasso nude says more about Picasso’s mental tools than what his model actually looked like. Certainly no company would decide to re-tailor their fall clothing line based on his “findings” about the female body. The graph is not the phenomenon.
But there’s something so finite about numbers that, when we’re faced with a visualization, all of this logic suddenly goes out the window. Correlation is easy to show, and causation is impossible to prove. Truly impossible. Short of that infinite holistic data set I mentioned, we’re going to have to accept that causality is networked: all a data set can show is how a data set changed over time. Read causality – I’m sorry, “narrative” – into it at your own peril.