March 12, 2012

JSON stands for “JavaScript Object Notation,” which makes it sound like an esoteric bit of programming trivia that non-Web developers won’t ever have to deal with.

But JSON is neither esoteric, nor does it have to involve programming. It is just a data format that's easy on the eyes for both humans and computers. This is one reason why it has become one of the preferred data formats for programmers and major Web applications.

JSON is just structured text, like CSV (comma-separated values) and XML. However, CSV is typically used to store spreadsheet-like data, with each line representing a row and each comma denoting a column. XML and JSON can describe data with nested information; for example, a list of users and the last 20 tweets belonging to each user. JSON, however, is more lightweight than XML and easier to read.
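To make that difference concrete, here's a small, made-up sketch of the users-and-tweets example. A CSV file would have to repeat the user's information on every row, one row per tweet; JSON can simply nest a list of tweets inside each user record (all of the values below are invented for illustration):

  [
    {
      "screen_name": "stephenfry",
      "name": "Stephen Fry",
      "tweets": [
        { "text": "First example tweet..." },
        { "text": "Second example tweet..." }
      ]
    }
  ]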

In other words, if someone tells you that a website’s data comes in JSON form, this is great news. It means that the data will be easy to collect and parse, and it indicates that the site developer intended this data to be easily usable. This is in contrast to the practice of “Web-scraping,” which involves the tedious work of collecting Web pages and breaking them apart into usable data. JSON is much more enjoyable to work with, which is why most major and successful Web services such as Facebook, Twitter, and Google use it to communicate with your browser. Unfortunately, older websites (e.g. most government websites) do not deliver data in JSON format.

In this piece, I’ll try to demystify JSON so that you can at least recognize it when you come across it. Again, it is just a data format. Reading and understanding JSON doesn’t require programming. But after you see how JSON is used, you’ll realize why it might be worth your while to learn some programming.

Your tweets as JSON

The best way to explain JSON is to show it in the wild. Here’s a simplified version of how Twitter uses JSON to store and transmit your tweets:
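Trimmed down to a handful of fields, a single tweet looks something like the sketch below. The keys (created_at, text, retweet_count, user, screen_name) are real field names from Twitter's API; the values are illustrative:

  {
    "created_at": "Mon Mar 05 21:36:31 +0000 2012",
    "text": "An example tweet goes here...",
    "retweet_count": 193,
    "user": {
      "screen_name": "stephenfry",
      "name": "Stephen Fry",
      "followers_count": 4000000
    }
  }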

Compare this to how this data would be stored in a spreadsheet:

JSON data is stored as key->value pairs

Instead of using column headers to describe a datafield, JSON uses a key->value system to associate a datapoint (e.g. “193”) with its descriptor (e.g. “retweet_count”).

Below, I have circled the keys in orange.

Unlike a simple spreadsheet, however, JSON allows data to be nested, so one record can contain other records. The following tweet has a couple of hashtags:

The JSON format allows Twitter to associate the metadata for several hashtags with a single tweet, like so:
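Boiled down, the structure looks something like this; the entities and hashtags keys follow Twitter's API of the time, while the hashtag values themselves are invented for illustration:

  {
    "text": "An example tweet with #onehashtag and #anotherhashtag",
    "retweet_count": 193,
    "entities": {
      "hashtags": [
        { "text": "onehashtag", "indices": [22, 33] },
        { "text": "anotherhashtag", "indices": [38, 53] }
      ]
    }
  }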

Lighter than XML

If you've ever used XML before, or even tried learning HTML (which has a similar tag-based structure), you might see that it offers the same kind of structure as JSON. In fact, some services, including Twitter, provide their data in both XML and JSON formats.

Here’s the XML version of the previous JSON snippet of Stephen Fry’s tweet:
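What follows is a trimmed sketch mirroring the illustrative values above; note that every field needs both an opening tag and a closing tag, which is part of what makes XML bulkier:

  <status>
    <created_at>Mon Mar 05 21:36:31 +0000 2012</created_at>
    <text>An example tweet goes here...</text>
    <retweet_count>193</retweet_count>
    <user>
      <screen_name>stephenfry</screen_name>
      <name>Stephen Fry</name>
      <followers_count>4000000</followers_count>
    </user>
  </status>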

It may seem like a difference of only a few dozen characters, but multiply that by millions of such requests in time-sensitive applications, and it should be obvious why JSON is becoming the preferred format.

APIs: JSON in the wild

You've probably heard the term "API" before, which is an acronym for Application Programming Interface. That's a fancy way of saying that an online service has taken the time to design a predictable way to send you data based on the requests you make.

For example, here’s the Twitter API call to get the latest 50 tweets from Stephen Fry. And here’s the API call to get 20 tweets from Stephen Colbert.

Notice how the screen_name and count change correspondingly.
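Those two requests looked something along these lines, using the same user_timeline endpoint that appears in the code later in this piece (I'm assuming @StephenAtHome as Colbert's screen name); only the screen_name and count parameters differ:

  http://api.twitter.com/1/statuses/user_timeline.json?screen_name=stephenfry&count=50
  http://api.twitter.com/1/statuses/user_timeline.json?screen_name=StephenAtHome&count=20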

Sending this request gets you back a JSON response with the raw tweet data. For Twitter, this is a lot more lightweight than sending you all the HTML markup that composes the page for Fry's tweets:

It’s a lot more convenient to parse, if you know a little programming (which I’ll get to later).

The joys of JSON

The biggest joy of JSON — actually, APIs in general — is that services have done the hard work of building a database of information. It’s up to you to be creative in combining it.

Let’s start with The New York Times’ Congress API, which is free for developers to use. It’s pretty straightforward. Give it a chamber (House, Senate), a session (e.g. 112), and it gives you all the members (try it out here). Another type of call to the New York Times API retrieves legislation by bill number.
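As a rough illustration of the first kind of call, the members request and its JSON response looked something like the sketch below; treat the exact URL, the api-key parameter and the field names as approximations from memory rather than official documentation:

  http://api.nytimes.com/svc/politics/v3/us/legislative/congress/112/senate/members.json?api-key=YOUR-KEY

  {
    "status": "OK",
    "results": [{
      "congress": "112",
      "chamber": "Senate",
      "members": [
        { "id": "R000146", "first_name": "Harry", "last_name": "Reid", "party": "D" },
        ...
      ]
    }]
  }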

When building the SOPA Opera app at ProPublica, we cross-referenced the congressmember identities with the meta-information for the SOPA bill (such as who signed on as co-sponsors). This made it easy to generate a site that showed all the congressmembers who sponsored the bill, with their mugshots and respective parties:

Matching up who voted for or sponsored each bill is pretty straightforward. Now let's do something completely facetious: the Face.com face-detection API. When you upload an image or the URL of an image, it returns a JSON file detailing:

  • How many faces were found in the photo
  • The coordinates of the detected faces
  • Miscellaneous attributes, such as perceived gender and whether a given face is smiling, wearing glasses, etc.

You can try out the Face.com API here:

The Times' Congress API doesn't provide Congressmember photos, but we can use the ID to grab the photo directly from the official Congressional directory. Sen. Harry Reid's ID is R000146, so his photo can be found here.
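The directory's photo URLs followed a predictable pattern built from the first letter of the ID plus the ID itself; my recollection of the exact path is something like the following, so treat it as an assumption:

  http://bioguide.congress.gov/bioguide/photo/R/R000146.jpg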

A fun, multi-part script involves:

  • Querying the Times’ Congress API to get a JSON datafile of all current Senate members
  • Using the IDs for each Senator to get the image from bioguide.congress.gov
  • Sending each image to Face.com to get the facial characteristics data

Here are the results from the Times’ Congress API:

Getting the image from the Congressional directory:

And then sending the image URL to Face.com’s face-detection API:
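Strung together, the whole routine might look something like the jQuery sketch below. The API keys are placeholders, the Times' URL and field names follow the rough approximation shown earlier, and the Face.com endpoint and parameters are hypothetical stand-ins for a service that required its own key; read it as an outline of the three steps rather than copy-and-paste code.

  // Step 1: get all current senators from the Times' Congress API
  // (URL pattern and api-key parameter are approximations)
  var membersUrl = 'http://api.nytimes.com/svc/politics/v3/us/legislative/' +
                   'congress/112/senate/members.json?api-key=YOUR-NYT-KEY';

  jQuery.getJSON(membersUrl, function(data){
      var members = data.results[0].members;   // field names may differ slightly

      jQuery.each(members, function(i, member){
          // Step 2: build the portrait URL from the member's bioguide ID
          // (assumed URL pattern for the Congressional directory)
          var photoUrl = 'http://bioguide.congress.gov/bioguide/photo/' +
                         member.id.charAt(0) + '/' + member.id + '.jpg';

          // Step 3: hand the portrait to Face.com's detect call
          // (hypothetical endpoint and parameters)
          var faceUrl = 'http://api.face.com/faces/detect.json?api_key=YOUR-FACE-KEY' +
                        '&urls=' + encodeURIComponent(photoUrl);

          jQuery.getJSON(faceUrl, function(faceData){
              // faceData would contain face coordinates and attributes,
              // such as whether the senator is smiling
              console.log(member.last_name, faceData);
          });
      });
  });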

If you repeat this script 100 times — once for each senator — you can do something amusing like analyze which sitting senator has the biggest smile. If you’re interested in the programming details, you can see a detailed explanation on my blog.

JSON for non-programmers

So how can you do this kind of data mashup without going crazy from all the copying and pasting? Well, if you don’t know how to program, the fun of JSON pretty much ends here. At this point, you know what JSON is and how to read it. But your human hands are far too slow and clumsy to request and parse data at a speed that exploits the usefulness of JSON.

Automating this kind of tedious data-crunching is one of the most common use cases for programming in journalism. There's a wealth of data in JSON (and less convenient formats) waiting to be fully analyzed and combined.

How to try a JSON-parsing program right in your browser

The following is just an experiment. You should be able to run some JavaScript code to play with JSON even as you read this article. But it may not work, especially if Twitter’s service is not responding at the moment.

The Web inspector and console

Most major browsers have a Web Inspector tool that includes a Console in which to type JavaScript code. Developers use it to debug and decipher a webpage, but we can also use it to enter JavaScript and write a quick JSON-fetching program.

This is not an ideal situation for programming, but it’s convenient since you’re already in a Web browser reading this webpage.

How to open your Web inspector tool

If you are in Chrome, go to the following submenu from the menubar:
View >> Developer >> JavaScript Console

If you are in Firefox, go to this submenu:
Tools >> Web Developer >> Web Console

Here’s a screenshot of what you should see in Firefox:

The line where I've typed console.log("Hi there, this is the console"); is where the console begins.

Note: Open the console while you're still reading this article, or else the code snippet below may not work.

Now that your browser’s console is open, you can start entering JavaScript code. For example, you can type in:

  console.log("Hello World");  

When you hit Enter, you should see the console respond with "Hello World." Make sure to copy every punctuation mark, especially the quotation marks, exactly as is.

This is what it looks like in Chrome:

Assuming that worked for you, retype the following snippet (again, a small typo can derail the entire script) or copy and paste it into your console:

  // Build the API request for @stephenfry's 10 latest tweets
  var url = 'http://api.twitter.com/1/statuses/user_timeline.json?' +
            'screen_name=stephenfry&count=10&callback=?';

  jQuery.getJSON(url, function(tweets){
      console.log("Success!");
      var str = "";
      // Loop through each tweet and collect its text
      for(var i in tweets){
          str += tweets[i].text + "    ";
      }
      // Display the combined text on this page
      jQuery('#resultsBox').html(str);
  });

If the Twitter API service is working, when you hit Enter, you should see the text of 10 tweets below:

Now try changing the part of the code that pulls out the "text" attribute so that it pulls out "created_at" instead. In the url variable, you can also change the screen_name to a different Twitter account.

For example:

  var url = 'http://api.twitter.com/1/statuses/user_timeline.json?' +
            'screen_name=poynter&count=10&callback=?';

  jQuery.getJSON(url, function(tweets){
      console.log("Success!");
      var str = "";
      for(var i in tweets){
          str += tweets[i].created_at + "    ";
      }
      jQuery('#resultsBox').html(str);
  });

The above, slightly altered snippet will print out the posting times of @Poynter's latest tweets.

The snippet simply loops through the tweets retrieved from Twitter and prints out the selected attribute.

If you’re not a programmer, don’t worry if the code doesn’t make sense yet. You should now be able to see how a short snippet can quickly handle tedious work (copying and pasting the text of multiple tweets) and how you can easily select whatever JSON fields you want to work with.

If you have no intention of going into programming, at the very least, it’s important not to be intimidated by JSON. It’s just a data format, after all.

This piece is part of a Poynter Hacks/Hackers series featuring How To’s that focus on what journalists can learn from emerging trends in technology and new tech tools.

Dan Nguyen is a developer/journalist for ProPublica, a non-profit investigative news organization based in Manhattan.
