Readers have come to rely on interactive presentations to understand complicated stories, using them to zoom in on periods of time and highlight areas of interest. Yet to investigate these stories, reporters often create what amounts to handcrafted investigative art: flow charts with circles and arrows, maps shaded with highlighters and stuck with pins.
More and more, though, some reporters are using data visualization tools to find the story hidden in the data. Those tools help them discover patterns and focus their reporting on particular places and times. Many of the presentations, which can have rough interfaces or less-than-sleek design, are never published.
At the recent National Institute for Computer-Assisted Reporting (NICAR) conference, Sarah Cohen, database editor for The Washington Post’s investigative team and recently named professor of computational journalism at Duke University, showed how reporters can use interactive graphics for their exploratory reporting. [PDF]
Cohen described this approach to me via e-mail. Here’s an edited version of our exchange.
Steve Myers: How would creating a digital, visual representation of data help a reporter? What does it tell you that you wouldn’t be able to find otherwise?
Sarah Cohen: The same way that visualizations and graphics help readers cut through a lot of clutter and display dense information in an efficient way. The most common things that early visualizations help with are place and time — two of the most important elements in reporting a complex story. Those two things are really hard to see in text. They’re really, really hard to see in combination. So the graphics can show you where to go to find your subjects or where to go to find the most typical subjects. They can also show you when the story you are trying to find peaked. Put them together, and you can start finding the very best examples for your story.
That’s pretty general, so let me give you a couple of examples. During a story on disaster payments in the farm subsidy system, we wanted to make sure that we went to places that had received the payments year after year after year. Using a database, we could find farms that had received multiple payments pretty easily. But looking at repeated images of density maps that I made of the payments, it was really obvious where to go — specific areas of North Dakota and Kansas.
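The database step Cohen mentions, finding farms paid in multiple years, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the farm IDs, years and amounts are made up, and `repeat_recipients` is a hypothetical helper, not the Post's actual query.

```python
from collections import defaultdict

# Hypothetical payment records: (farm_id, year, amount). Not real data.
SAMPLE_PAYMENTS = [
    ("ND-001", 2001, 12000),
    ("ND-001", 2002, 9500),
    ("ND-001", 2003, 11000),
    ("KS-014", 2001, 4000),
    ("KS-014", 2001, 2500),  # second payment in the same year
    ("TX-200", 2002, 7000),
]


def repeat_recipients(payments, min_years=3):
    """Return farm IDs that received payments in at least min_years distinct years."""
    years_by_farm = defaultdict(set)
    for farm_id, year, _amount in payments:
        years_by_farm[farm_id].add(year)  # a set ignores repeat payments within a year
    return sorted(
        farm for farm, years in years_by_farm.items() if len(years) >= min_years
    )


print(repeat_recipients(SAMPLE_PAYMENTS))
```

A list like this answers "who," but not "where to go." That is the gap the density maps filled.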
In another example, we were working last year on a story on practices used by landlords to empty their buildings, partly in order to avoid strict laws on condo conversions. We knew one neighborhood of the city was Ground Zero — an area called Columbia Heights, in Northwest D.C. But making an interactive map with a slider that showed the timing, we could see that it was moving into other areas of the city, especially in Southeast. We could also quickly see that the most affluent areas of the city had none of them. [A visualization similar to the one used to report the story was published online.]
Can you describe some of the forms these visualizations take and how they have guided or improved the reporting?
Cohen: Again, most often it’s some combination of place and time. Almost any complex story will benefit from some kind of chronology. The ability to do this interactively rather than on paper has the advantage of being able to zoom in and out on specific periods of time, turn on and off various types of entries or players. On a time line, you might want to see a broad view at one time, but zoom in on one person’s role at another. These are really hard to make — I haven’t mastered it. The tools available to do it are really hard, really expensive, or require a kind of discipline in entering your notes that few of us can muster.
Another typical form is some kind of frequency: grouping cases by age, by stage, or some other element. Last year, when two reporters here were working on a story on deaths in detention centers, I made a simple Flash interactive that let them look at deaths on a map by age group, cause of death, year and a couple of other variables. It helped to be able to see whether certain kinds of deaths were centered in certain areas. It also helped show that what appeared to be a site of many deaths was really one where they sent very, very ill people, which made that site far less newsworthy and made it obvious that they should focus somewhere else.
How frequently is this reporting technique used at the Post?
Cohen: Not enough. I try to do what I can in almost every long-term story I work on, but it’s hard. I’m neither a designer nor a real computer programmer, so I get frustrated over the difficulty of doing this. I can almost always picture what I want to see. Accomplishing it in some reasonable amount of time is another story.
But you don’t have to get so fancy. I think if you walk around any newsroom, you’ll see hand-drawn diagrams, time lines, flow charts and all kinds of other visualizations tacked up on the cubicle walls of beat reporters and people working on longer-term projects. We often try to draw out a story for an editor, or a source, as an easier way to think about it. We just don’t usually use fancy tools to do it.
Is this a common method of reporting in-depth stories in the industry?
Cohen: I wish it were more common. It’s hard, though — you need standardized data; you need someone with time to create the material; and you have to be willing to walk away from it if it doesn’t help. And it does take some programming skill in most cases.
When I sent out that note [on the NICAR-L listserv] asking for examples of visualizations, it was clear that reporters from large and small news organizations are trying to create their own visualizations. Sometimes they have to draw something by hand, and sometimes it’s just coloring in a map or something — but the idea is there across a bunch of beats and the whole spectrum of sizes.
It’s just that some of the more professional looking ones can’t be done as a reporting tool. It wouldn’t be an efficient use of our graphic artists’ time to create a really elegant interactive visualization if we didn’t know we were going to publish it. So instead we, as reporters, kind of fumble around with it. But I have gotten advice from our artists and designers. In turn, they have used my amateurish attempts as a start for some visualizations for publication.
Tell me how you used data visualization tools to report on lead levels in Washington.
Cohen: The lead one is an interesting example of using visualizations during reporting, and of moving pretty seamlessly into publication both in the paper and online. We had obtained a copy of, I believe, about 6,000 water tests from a source on a paper printout. We had to scan them in and then geocode them (get them on a map). There were a lot of places where things could go wrong in the process and we had no real ability to check every one. So we built the fact-checking into the reporting.
For about six or seven areas of high-lead readings, we armed our team of reporters with a satellite image of a few blocks with dots on each house that had taken a test, color-coded for the level. We also had a database and interactive map internally for reporting purposes. The story was partly about people who lived in high-contamination areas and had not been warned. The reporting was also used to spot-check the scanned database. (It turned out to be really good.) … This was a case where we went almost seamlessly from the reporting stage to the publication stage. I think every element of the print graphic and the online graphic was used as a reporting tool before it was considered for publication.
How can using data visualizations early in the reporting process help find errors or holes in your data?
Cohen: Sometimes seeing a bunch of records that have no dates on them but are all in one place tells you that there is something consistently wrong with the data. Or that there are cases that are way off the charts in money or time. You can find all of this through querying the data, but visualizing it sometimes makes it easier to see some combination of variables that you wouldn’t otherwise notice.
A simple scatterplot is a good example of that. Many reporters have used a scatterplot with school test scores on one axis and school poverty rates on the other to see which schools fall outside the usual pattern. Sometimes it’s an error. Sometimes it’s a story. Sometimes it has uncovered cheating by teachers or schools.
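The scatterplot check can also be approximated without drawing anything. The sketch below fits a straight line to scores against poverty rates and flags schools that sit far from it. It is a rough sketch, not a method described by Cohen: the school names and numbers are invented, `flag_outliers` is a hypothetical helper, and the cutoff of two standard deviations is an arbitrary assumption.

```python
from statistics import mean, pstdev

# Hypothetical schools: poverty rate (percent) and average test score. Not real data.
SAMPLE_SCHOOLS = [
    {"name": "Adams", "poverty_rate": 10, "test_score": 90},
    {"name": "Baker", "poverty_rate": 20, "test_score": 80},
    {"name": "Carver", "poverty_rate": 30, "test_score": 70},
    {"name": "Dunbar", "poverty_rate": 40, "test_score": 60},
    {"name": "Eastside", "poverty_rate": 50, "test_score": 85},  # well above the trend
    {"name": "Franklin", "poverty_rate": 60, "test_score": 40},
    {"name": "Grant", "poverty_rate": 70, "test_score": 30},
    {"name": "Hayes", "poverty_rate": 80, "test_score": 20},
    {"name": "Irving", "poverty_rate": 90, "test_score": 10},
]


def flag_outliers(schools, threshold=2.0):
    """Return names of schools whose score is far from the least-squares trend line."""
    xs = [s["poverty_rate"] for s in schools]
    ys = [s["test_score"] for s in schools]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least-squares slope and intercept for score as a function of poverty.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual: how far each school's score sits above or below the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every school sits exactly on the line
    return [
        s["name"]
        for s, r in zip(schools, residuals)
        if abs(r) > threshold * spread
    ]


print(flag_outliers(SAMPLE_SCHOOLS))
```

A flagged school is only a lead. As with the scatterplot itself, it may be a data error, a story, or nothing at all.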
Do you ever use this reporting method for a story and end up not publishing the data visualization? What would lead you to do that?
Cohen: It’s more common NOT to publish than to publish. Only a handful of the slides [in my presentation at the NICAR conference] ended up in anything published. Just like most of your notes never end up in a story, or most of the photographs taken don’t end up published, most of these are used only for our own understanding, not for publication. If it works, that’s fine. If not, well, it’s just part of the process. There are some exceptions — some you know will not only help you focus a story, but will be natural for publication, such as one that we used to investigate reconstruction spending in Iraq.
What tools are available to create these, and what kind of expertise do reporters need to use them? Does it make sense to do these in Flash before you know what you’ll end up presenting to the public?
Cohen: There are various tools, some easier or harder than others.
It’s common in most newsrooms to use visualizations of maps, often using very sophisticated software such as ArcView. We’re used to the idea that it is very difficult to summarize or digest geographic data on anything other than a picture.
But the other tools are a little rarer. Some reporters who know HTML and a tiny little bit of JavaScript have been able to use Google Maps, and now Google Charts, pretty well. It takes a little discipline to get everything right, but it’s not as hard as some other tools.
The MIT Simile project has something called Exhibit, which is also a little picky, but it’s one of the only places I know of to make interactive time lines.
Many Eyes, an experiment by IBM, is also good for creating quick graphics just by cutting and pasting. The only drawback is that it is a public site — you can’t keep your data secret. ProPublica has been doing a lot with it lately, like these: