Archive for the ‘Data Vis’ Category

“Find the narrative in the numbers.” It’s this year’s mantra of data visualization, and some variation thereof is the watchword for all modern journalism: Find the story. Let the facts speak for themselves to tell what happened.

It’s a beguiling idea, the concept that a narration is hidden like a sculpture in every misshapen lump of data, if only it could be liberated from the clay of unrelated information. And it’s true that in a world of infinite resources where every bit of existing data could be considered holistically, this would definitely be the case. After all, everything influences something. But perhaps it’s time to read a little closer into this much-abused buzzphrase: when we say “Find the narrative”, don’t we really mean “Attribute some causality”?

Data vis is cool, fun, exciting stuff, and yes, everyone and their aunt has a right, nay, a duty to give it a try. But in the last couple of months we’ve watched this well-meaning catchphrase morph from a description of data-cleaning processes into an injunction to project all kinds of causality onto any given collection of numbers. Just three years shy of its 60th anniversary, the prime directive of How to Lie with Statistics (“Don’t!”) is getting brushed aside in our excitement to plot the hell out of any data set we can get our hands on.

It’s a difficult thing to explain to a client: sometimes things just happen. An upward trend in sales numbers may not be related to advertising campaigns, and a downward trend may not be the fault of the economy. This is basic, basic stuff, people, and it’s just as true now that we can instantly make a groovy-looking visualization in FusionCharts as it was when we needed some graph paper and a slide rule. Being two steps removed from reality means that every visualization has an element of editorialization, but it doesn’t follow that we can suddenly make wild claims about the real-world events it very, very abstractly represents.

How would we feel if we treated past representational art forms this way? We all know that the square-looking blob that is a Picasso nude says more about Picasso’s mental tools than what his model actually looked like. Certainly no company would decide to re-tailor their fall clothing line based on his “findings” about the female body. The graph is not the phenomenon.

But there’s something so finite about numbers that when faced with a visualization all of this logic suddenly goes out the window. Correlation is easy to show; causation is impossible to prove. Truly impossible. Short of that infinite holistic data set I mentioned, we’re going to have to accept that causality is networked: all a data set can show is how a data set changed over time. Read causality (I’m sorry, “narrative”) into it at your own peril.
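For anyone who would rather see the sales-versus-advertising trap in numbers, here is a minimal sketch. Everything in it is made up for illustration: the two series, their names and their slopes are assumptions, built separately so that neither one influences the other. Both simply drift upward, which is enough to make the raw series correlate strongly, while their month-to-month changes barely correlate at all.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def changes(series):
    """Month-to-month differences, a crude way to take the trend out."""
    return [b - a for a, b in zip(series, series[1:])]


months = range(12)

# Hypothetical data: each series is a straight upward trend plus its own
# small wiggle. The wiggles follow different patterns and share no cause.
sales = [100 + 5 * m + (1 if m % 2 == 0 else -1) for m in months]
ad_spend = [3 + 2 * m + (1 if m % 4 < 2 else -1) for m in months]

raw_r = pearson(sales, ad_spend)
change_r = pearson(changes(sales), changes(ad_spend))

print(f"correlation of the raw series:      {raw_r:.2f}")
print(f"correlation of the monthly changes: {change_r:.2f}")
```

The first number comes out close to 1 and the second is much weaker. Nothing about the first number says that one series drives the other; it only says that both went up over the same twelve months.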



The road to professional academic success often seems to be paved by wild refutations: pick a pet theory by another leading academic and disagree as loudly as you can. Maybe the concept is to provoke a response which will drive traffic to your site. I’d feel bad about all of the recent flak Malcolm Gladwell is receiving for his well-researched and thoughtful New Yorker article from people who obviously did no more than skim it, except that Gladwell himself is occasionally guilty of this method. Another prime example: well, Evgeny Morozov, love him though I do.

The lesson here: it’s never too early to backlash. Let’s call it the ‘Remora Effect with a twist’, although I’m sure there’s an official name for it (and someone who loudly disagrees with that name). Remoras (AKA suckerfish) are fish that hitch rides on a host to save themselves the effort of really thinking about a topic. Alright, I added that last bit.

The concept does bring up an interesting question, though: how much of the news media released is original thought, that is to say primary-source reporting and editorial, and how much of it is a rehashing of other sources? Jay Rosen points out that a surprising amount of “reporting” is taken directly from press releases, and anyone following Japan on Twitter the last two days can see that new developments come about once every six hours, not on the second-by-second basis that the firehose would have you believe.

To do this xkcd style, I’m guessing the graph of actual news before and after digital tools looks something like this:


Transportation visualizations have a huge advantage over their drier, more numerical cousins in that there’s an intrinsic visual already built into their nature. After all, the more steps required for a viewer’s brain to switch between an obvious metaphor and a completely unrelated one, the less likely it is for the visualization to have any impact. What size is to astronomical visualizations, what a good solid X-axis is to time visualizations, and what color is to wavelengths or hex codes, length is to distance. When it comes to transportation visualizations, it would be a fool who passed up length in favor of something with no visual correlation to distance, like, say, color, or size of circles.

I am that fool!

The following transportation visualization was completed in two parts: The first as part of Great Urban Hack in Manhattan this last December, and the second a month later once the sleep deprivation had worn off. Perhaps that is why one is successful, and one is such a complete and embarrassing failure.

The first analyzes taxi rides throughout Manhattan during a 24-hour period in March 2009. Each visual aspect is mapped to a particular variable for the rides occurring during the 5-minute period it represents, but no account is taken of what type of variable it is. So, for example, the time and length of a cab ride and the distance traveled are all minimized by their placement in the interior of the 24-hour cycle, while number of passengers is represented by circle size instead of the obvious indicator: quantity.
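For the curious, the 5-minute grouping described above can be sketched roughly like this. It is an illustration only, not the code the team used: the record layout (pickup minute of the day, distance in miles, passenger count), the function names and the sample rides are all assumptions, not the actual Taxi and Limousine Commission data.

```python
from collections import defaultdict

BIN_MINUTES = 5  # 24 hours of 5-minute periods gives 288 bins


def bin_index(minute_of_day):
    """Which 5-minute period (0 to 287) a pickup minute falls into."""
    return minute_of_day // BIN_MINUTES


def summarize(rides):
    """Total up rides, distance and passengers for each 5-minute period.

    rides: iterable of (pickup_minute_of_day, distance_miles, passengers),
    a hypothetical layout chosen for this sketch.
    """
    bins = defaultdict(lambda: {"rides": 0, "distance": 0.0, "passengers": 0})
    for minute, distance, passengers in rides:
        period = bins[bin_index(minute)]
        period["rides"] += 1
        period["distance"] += distance
        period["passengers"] += passengers
    return dict(bins)


# Made-up rides, purely for illustration.
sample_rides = [(2, 1.5, 1), (4, 3.0, 2), (7, 0.5, 1), (1439, 2.0, 4)]
summary = summarize(sample_rides)

for period in sorted(summary):
    print(period, summary[period])
```

Each period ends up with a few numbers of different kinds (a count, a distance, a head count), and the design question is which visual property each one gets.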

What this entire visualization suffers from is the ancient and evil curse of “Data Inspired Environment”. In the same way that the producer of a dramatic movie may, in the face of their terrible product, suddenly decide that they meant to produce a dark comedy all along, so do many visualizations hide behind the label of “data-inspired environment”. They are easy to spot: the purpose is an impressive visual, and the data is relegated to just another way to get some random variation into the visuals. With any numerical narrative buried under mis-chosen visuals, these poor excuses for visualizations would have been better seeded with some Perlin noise and good riddance.

As a thorough apology, I offer this second visualization. It attempts to solve these problems in a couple of ways: first and foremost by choosing a more intuitive metaphor for a measurement of distance (a chart that requires the eye to “travel” to follow the narrative) and a much more intuitive metaphor for volume: volume. It also cuts down the amount of data displayed to a single hour. Less impressive, yes, but with the singular advantage of actually having some kind of meaning.

This isn’t art, it’s data visualization. Like an attractive taxi driver, pretty only counts if it also gets you where you’re going.

Analysis & Data Visualizations done by Zoe Fraade-Blanar, Kevin Webb, Aaron Glazer, John Keefe, Lev Steshenko. Data is based on a record of GPS-tagged taxi rides in March 2009 provided by the New York City Taxi and Limousine Commission.
