In recent years, growing numbers of news organizations in the U.S. have begun to systematically track the demographics or other characteristics of the sources they cover and experts they quote. Variously referred to as “source audits” or “source diversity tracking,” such efforts are often described as part of a larger strategy around addressing persistent inequalities, underrepresentation and misrepresentation in news.
The thinking is that collecting better metrics about sourcing will lead to more inclusive and more accurate journalism. NPR has said its source-tracking efforts, begun in 2013, are part of its mission to “look and sound like America.” Minnesota Public Radio launched its source-tracking initiative in 2021 with “the goal of reflecting the voices of all Minnesotans.” Others say they want to address existing power structures; as the editors of the San Antonio Report wrote in a commentary piece, “While who we quote and use as news sources is often affected by structures of power and influence in our city, that doesn’t mean our work has to reflect those structures.”
Motivations aside, the question of how to collect and analyze such data has so far attracted only limited attention as an object of study in its own right, among either academic or professional communities.
This month, at the Association for Education in Journalism and Mass Communication (AEJMC) conference in Philadelphia, we sought to address that gap, offering an overview of the most common approaches U.S. news organizations are taking to source tracking and several of the key questions these approaches raise about tradeoffs and limitations. We see promise in the intentions of these efforts, but newsrooms should understand the complex challenges involved in seeking to guide and improve their sourcing practices.
Three approaches to source tracking
We identified and defined three distinct approaches to source tracking:
- retroactive source auditing
- real-time source tracking
- automated source monitoring
The approaches differ in when and how sources’ demographic data are collected and in when and how those data are analyzed.
Retroactive source auditing
The most widely adopted approach to source tracking is some form of retroactive auditing, or review, of the sources quoted in a defined body of news stories over a specific period of time. Under this approach, both the collection of data about source demographics and the analysis of these data happen some time after news stories have been published or broadcast, and findings from these audits must be reported back to the newsroom at a later date.
Newsrooms have taken different approaches in deciding who should conduct these audits, as well as how such data ought to be collected and analyzed. Some news organizations outsource source auditing to external researchers or consultants. Impact Architects, for example, has conducted source audits for a handful of public media organizations, including KUOW and KQED. It collects a sample of source records from a news outlet’s website and then codes those records by demographic category. Impact Architects explains in its source audit for KUOW that it uses “visual cues, reporters’ and sources’ own words and descriptions, and online research to make judgments about each source’s characteristics.” Others conduct audits in house. For example, KUT hired a part-time employee to perform an internal source audit. That employee first contacted reporters to seek source data before reaching out to sources directly to confirm demographic data. Wisconsin Public Radio, by contrast, used emailed surveys to collect source data, relying on the sources themselves to report their own demographic data.
Real-time source tracking
We use the term real-time source tracking to refer to a second approach, in which journalists collect source demographic data during the reporting process itself. These data are then aggregated and analyzed after sources have appeared in coverage.
How newsrooms organize these data varies considerably. Some newsrooms record and analyze source data using ad hoc systems involving Google Forms, Airtable or Microsoft Excel. A growing number of organizations have turned instead to Source Matters, a tool designed to streamline this work that was created by the American Press Institute in 2021. Drawing on RSS feeds of news organizations’ digital content, Source Matters uses natural language processing to identify sources in published stories, building an ever-expanding database of the sources appearing in each news outlet’s coverage. The system then prompts newsroom staff to report source characteristics for each story tracked and, in turn, provides real-time analyses of sourcing patterns to newsroom staffers.
Some, such as New Hampshire Public Radio and MPR News, conduct comprehensive source tracking, with the goal of collecting data on every source quoted in every story. Others, such as LAist, have tracked demographic information for samples of stories at specific points in time.
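As a rough illustration of the aggregation step, the sketch below tallies what share of tracked sources falls into each category of a single characteristic. It is written in Python under stated assumptions: the record layout, the field name "role" and the sample entries are all hypothetical, and nothing here is drawn from any newsroom's system or from Source Matters.

```python
from collections import Counter


def category_shares(records, field):
    """Return each category's share of sources for one field.

    Records where the field is blank (None or absent) are left out of
    the denominator, so shares describe only the sources with data.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return {}
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}


# Hypothetical entries a reporter might log during reporting.
tracked_sources = [
    {"story": "Story 1", "source": "Source A", "role": "expert"},
    {"story": "Story 1", "source": "Source B", "role": "official"},
    {"story": "Story 2", "source": "Source C", "role": "expert"},
    {"story": "Story 2", "source": "Source D", "role": "resident"},
]

shares = category_shares(tracked_sources, "role")
for role, share in sorted(shares.items()):
    print(f"{role}: {share:.0%}")
```

Leaving blank entries out of the denominator is one possible design choice. It keeps the shares easy to read, but it also hides how much data is missing, so that figure should be reported alongside them.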
Automated source monitoring
The least common approach to source tracking we found, as of now, involves what we call automated source monitoring. Like Source Matters, these approaches use computational systems to identify sources in news content, but they go further by also categorizing source demographics using automated methods. Source data collection and analysis happen during the reporting and editing process and, as such, have the potential to affect reporting before stories are published or broadcast, for instance by prompting reporters to diversify the range of voices included in their drafts. Perhaps the most prominent example of these approaches is NPR’s Dex, a rolodex and source-tracking system integrated into the media organization’s content management system. As artificial intelligence tools continue to improve and newsrooms experiment with adopting these technologies, such approaches may become more feasible across the industry.
Tradeoffs and key questions
In our research, we have sought not only to identify these burgeoning approaches but also to consider their distinct strengths and weaknesses. We highlight four critical questions we think newsrooms should consider when evaluating which of these approaches (if any) might be right for their needs.
- What source characteristics does your news organization wish to track?
When establishing a source-tracking protocol, news organizations inevitably need to decide which specific source characteristics they wish to track. Our review showed that newsrooms commonly track, at the very least, the gender, age, race and geographic location of sources. Some also track additional information, such as whether sources identify with minority groups or the specific role a source plays in a story (for example, as an expert or an official spokesperson).
Different approaches to source tracking are better suited to tracking different characteristics. For example, when source data are collected after publication with retroactive source audits, many source details may be difficult to obtain or may have changed in the interim (such as geographic location or job title). Real-time source tracking allows reporters to collect more extensive source data at the time stories are drafted, but doing so may be more labor intensive or intrusive. These tradeoffs are largely reversed for automated source monitoring, which asks less of reporters but can currently capture a narrower set of characteristics. Depending on the specific areas of diversity a newsroom seeks to address, a particular approach may be better suited than alternatives.
- How integrated should the approach be with existing newsroom workflows?
Embedding source-tracking initiatives in current newsroom workflows can have multiple consequences. It can add to journalists’ workload, but it can also give newsrooms an opportunity to reflect on sourcing practices as part of the daily production process. Real-time source tracking imposes the largest burden on journalists, who are asked to collect a variety of sourcing data each time they converse with a source and then take the time to enter those data into a source-tracking system. On the other hand, outsourcing retroactive source audits to external parties means less work for journalists day to day but also fewer opportunities to reflect on and engage with what sourcing data may reveal about their coverage. Automated source monitoring may prove to be a middle ground in balancing these tradeoffs, although what can be tracked using these methods is currently considerably more limited.
- How important is data reliability, and how will you handle missing sourcing data?
Source tracking is presumably only effective when source data are accurate. Systematically missing data about sources are a pernicious problem in source tracking, especially for newsrooms using a real-time source-tracking approach, which are unlikely to succeed in identifying all sources in their content. Because this approach often relies on reporters to manually enter their sourcing information, there is no way to track what proportion of sources’ demographic data has not been entered, and the gaps may systematically bias results depending on how attentive newsroom staff are to following through with these efforts.
Over time, if only reporters who care most about diversity, equity and inclusion diligently enter all of their sourcing data, resulting analyses will likely paint a rosier picture of source diversity than is warranted. While tools like Source Matters and other automated systems help address these problems by identifying the proportion of content where missing data appear and under whose bylines, these systems often still rely on journalists to collect, confirm and enter sourcing data. One advantage of retroactive source audits is that, depending on how they are designed, they can offer a more comprehensive picture of source diversity in context, allowing for more systematic comparisons rooted in controlled approaches to sampling. If, on the other hand, a news organization places a larger premium on staff being directly involved in tracking their own sources, over and above what they might learn from analyzing what they collect, these concerns may ultimately be outweighed.
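To make the missing-data concern concrete, the sketch below computes the share of identified sources that lack a given demographic field, both overall and by byline. It is a minimal Python illustration, not a description of how Source Matters or any other tool works. It assumes that sources have already been identified in published stories by some other step, and the record layout, field names and sample values are hypothetical.

```python
from collections import defaultdict


def missing_rates_by_byline(records, field):
    """Return {byline: share of that byline's sources lacking `field`}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        byline = record["byline"]
        totals[byline] += 1
        if record.get(field) is None:
            missing[byline] += 1
    return {byline: missing[byline] / totals[byline] for byline in totals}


def overall_missing_rate(records, field):
    """Return the share of all records lacking `field` (0.0 if no records)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


# Hypothetical records: one per identified source; None marks data not entered.
records = [
    {"byline": "Reporter A", "source": "Source 1", "gender": "woman"},
    {"byline": "Reporter A", "source": "Source 2", "gender": "man"},
    {"byline": "Reporter B", "source": "Source 3", "gender": None},
    {"byline": "Reporter B", "source": "Source 4", "gender": "man"},
    {"byline": "Reporter B", "source": "Source 5", "gender": None},
    {"byline": "Reporter C", "source": "Source 6", "gender": None},
]

by_byline = missing_rates_by_byline(records, "gender")
overall = overall_missing_rate(records, "gender")

print(f"Overall missing: {overall:.0%}")
for byline, rate in sorted(by_byline.items()):
    print(f"{byline}: {rate:.0%} missing")
```

If missing rates differ sharply across bylines, as they do in this made-up sample, any diversity figure calculated only from the completed entries reflects the reporters who filled in their data rather than the newsroom as a whole.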
- How will your newsroom “close the loop”?
Last but not least, whichever approach newsrooms employ for source tracking, such efforts require considerable time and money. Most newsrooms will therefore want to ensure the data they are collecting lead to actionable and meaningful change in coverage and to more representative sourcing practices. The first step in this process is being attentive to precisely how source data will be aggregated and analyzed. If instantaneous analysis of source demographics is important to a newsroom’s goals, automated source-monitoring approaches best fit the bill. Alternatively, either real-time source tracking or retroactive source audits can offer aggregate, in-depth analysis after publication, but they may require designing systems for sharing source diversity data and prompting members of the newsroom to engage with these results and reflect on the practices that produce them.
Implications of source tracking
We set out to review approaches to source tracking because, in recent years, a growing number of news organizations have announced that they are embarking on some form of these efforts. In doing so, we surfaced several key questions involving tradeoffs around each approach. But ample questions remain. It is not clear, for example, whether tracking source diversity actually leads to improvements in the diversity of sources appearing in news coverage, or under what circumstances these efforts may be effective. Does source tracking change the way journalists think about sourcing or how they engage in newsgathering? Likewise, how do audiences perceive such efforts, and to what extent do they discern changes in sourcing patterns?
By outlining the different approaches to source tracking and their tradeoffs, we hope to spur greater dialogue around these questions.