Press Coverage of the Early 2020 Primary

Observations of the early press coverage in the 2020 Democratic presidential primary race

Stephen Godfrey
Towards Data Science

--

Admittedly, it’s still 2019 and the next U.S. presidential election is in 2020, about 17 months from the time of this writing. However, the election process has already begun: more than 20 individuals have declared their candidacies and are competing for the Democratic Party’s nomination. In fact, the party has already held its first debate, and the race is well underway.

This makes it a good time to explore the relationship between candidate polling and press coverage. One way to gain insights and establish a comprehensive overview of an election’s status is to analyze polling and press-metric data across the participating campaigns to look for similarities, differences and patterns. By combining these two data dimensions, we are better able to appreciate the relative performance and positioning of individual candidates.

Previous work and data

This analysis updates previous work which examined similar data for the 2016 Republican presidential primary (post). In addition, more information and the underlying code for the 2016 and 2020 analyses can be found at my GitHub site.

Poll data come from FiveThirtyEight, an analytical service with a popular website that provides quantitative and statistical analysis of politics, sports, science and health, economics and culture. Press data come from the Global Database of Events, Language, and Tone (GDELT) project, which monitors, stores and provides “the world’s broadcast, print, and web news from nearly every corner of every country.”

Debate results

The first place to start is to look at the candidate polling performance after last week’s debate. The chart below contains a snapshot of standings from the first poll (released June 28, 2019) after the debate and measures of each candidate’s change from averages in previous polls. From this, we note that Joe Biden is leading and his poll results did not move much after the debate even though he was widely viewed as having underperformed on the stage. Kamala Harris and Elizabeth Warren were seen by many as having performed well and both saw substantial polling gains. Several other candidates, notably Beto O’Rourke, saw declines from relatively low levels.

Press Coverage

It is helpful to plot poll standings versus the amount of press coverage. The chart below contains average poll performance versus the average number of articles appearing between polling periods for the 10 candidates with the highest polling averages. As expected, there is a positive relationship between the two with leading candidates garnering the most coverage.

In this chart, Biden appears to be the clear front-runner, with a substantial lead in the polls and considerably more press coverage than any other candidate. Bernie Sanders’ press coverage is consistent with his second-place standing in the polls, but one might expect coverage closer to Biden’s given Sanders’ 2016 presidential run and longstanding public presence. Warren and Harris have similar poll numbers, but Warren draws considerably more press coverage. Interestingly, O’Rourke and Harris have similar article counts but seem to be moving in opposite directions in the polls, so it will be worth watching whether their respective press-coverage metrics diverge.
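
One way to put a number on the positive relationship between poll standing and article volume is a Pearson correlation coefficient. The sketch below is illustrative only: the `pearson` helper and the sample figures are hypothetical placeholders, not the FiveThirtyEight or GDELT values behind the chart, and the original analysis may well have used a different measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical placeholder data: (average poll share in %, average article count).
# These are NOT the figures from the chart.
sample = {
    "Candidate A": (30.0, 1200),
    "Candidate B": (17.0, 700),
    "Candidate C": (9.0, 650),
    "Candidate D": (8.0, 400),
    "Candidate E": (4.0, 380),
}

polls = [poll for poll, _ in sample.values()]
articles = [count for _, count in sample.values()]
r = pearson(polls, articles)
print(f"Pearson r between poll share and article count: {r:.2f}")
```

A value of r near +1 would indicate that coverage rises almost in lockstep with poll share. With only ten candidates in the chart, any such coefficient should be read as descriptive rather than conclusive.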

Tone

One attribute the GDELT project tracks is a measure of article tone. More information on tone can be found in the GDELT documentation, but, to summarize, it measures the difference between positive and negative word frequencies. Although it can range from -100 to +100, the documents examined here have scores concentrated around zero, and with these data, relatively small differences are statistically significant (for example, the average tone of articles covering Warren is statistically different from both 0 and -0.5).
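
To make the tone calculation and the significance claim concrete, here is a minimal sketch. It assumes tone is the percentage of positive words minus the percentage of negative words, and it uses a one-sample t statistic to compare an average against a reference value. The word lists, the sample sentence, the tone values and both function names are hypothetical; GDELT’s actual dictionaries and tokenization are not reproduced, and the article does not say which statistical test was used.

```python
from math import sqrt
from statistics import mean, stdev


def tone_score(words, positive_words, negative_words):
    """Percentage of positive words minus percentage of negative words."""
    if not words:
        return 0.0
    positive = sum(1 for w in words if w in positive_words)
    negative = sum(1 for w in words if w in negative_words)
    return 100.0 * (positive - negative) / len(words)


def one_sample_t(values, mu0):
    """t statistic for testing whether the mean of `values` equals `mu0`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return (mean(values) - mu0) / (stdev(values) / sqrt(n))


# Hypothetical word lists and text, for illustration only.
POSITIVE = {"strong", "win", "hope"}
NEGATIVE = {"attack", "fail", "crisis"}
article = "the candidate expects a strong win despite the latest attack".split()
print("tone of sample text:", tone_score(article, POSITIVE, NEGATIVE))

# Hypothetical per-article tone values for one candidate (not real GDELT data).
tones = [-0.31, -0.22, -0.27, -0.18, -0.35, -0.24, -0.29, -0.20]
t_vs_zero = one_sample_t(tones, 0.0)
t_vs_half = one_sample_t(tones, -0.5)
print(f"t vs 0: {t_vs_zero:.1f}, t vs -0.5: {t_vs_half:.1f}")
```

Because the standard error shrinks with the square root of the number of articles, a large article count can make an average tone only slightly below zero clearly distinguishable from both 0 and -0.5.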

It’s informative to plot poll performance versus this metric, averaged across articles referencing each candidate. Even with small tone averages, noticeable patterns emerge. As can be seen in the graph below, candidates can be organized into two groups denoted by the rectangle and oval shapes. Warren, Harris, Sanders, Kirsten Gillibrand and Biden all see more negative tone in their coverage than O’Rourke, Pete Buttigieg, Cory Booker and Amy Klobuchar.

Polarity

The GDELT project also measures document polarity, and plotting poll results versus this value across candidates yields interesting results as well. A higher value indicates a higher prevalence of polarizing language in the document.

In this case, we see a grouping of the four leading candidates, Biden, Sanders, Warren and Harris, all receiving higher-polarity coverage than the rest of the top-10 field. Within this group, Sanders’ coverage has the highest average polarity. Given the perception that his policy positions are far left of center, this might be expected. It is also interesting to note that Biden’s coverage is less polarized than that of Harris, Warren and Sanders.
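
Polarity and tone can diverge, which is why both are worth plotting. The sketch below assumes, based on my reading of the GDELT documentation, that polarity is the share of words matching the tonal dictionary, approximated here as the positive percentage plus the negative percentage. The function name, word lists and sample texts are hypothetical.

```python
def tone_and_polarity(words, positive_words, negative_words):
    """Return (tone, polarity) as percentages of the total word count."""
    if not words:
        return 0.0, 0.0
    positive = 100.0 * sum(1 for w in words if w in positive_words) / len(words)
    negative = 100.0 * sum(1 for w in words if w in negative_words) / len(words)
    # Tone is the net balance; polarity is the total share of charged words.
    return positive - negative, positive + negative


# Hypothetical word lists and texts, for illustration only.
POSITIVE = {"strong", "win"}
NEGATIVE = {"harsh", "attack"}

charged = "strong win harsh attack".split()
bland = "the meeting is on tuesday".split()

print(tone_and_polarity(charged, POSITIVE, NEGATIVE))
print(tone_and_polarity(bland, POSITIVE, NEGATIVE))
```

Under this assumption, both texts have a tone of zero, but the first is made up entirely of charged words while the second contains none. A candidate’s coverage can therefore look neutral on tone while still being highly polarized.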

In summary

Although it’s clearly early in the 2020 Democratic presidential primary race, potential voters are already being asked to start forming opinions and selecting leaders among a large candidate field. Collecting and analyzing data on these candidates is one way to gain a broad perspective on the nature of these opinions.

From this work, we see that leading candidates receive the most press coverage. However, that coverage has a more negative tone and higher polarity than what is observed for middle-of-the-pack participants. And among the race leaders, these metrics differ notably, with some receiving particularly negative and polarized coverage.

--

Stephen Godfrey is an experienced technical product manager with deep expertise in quantitative analysis and strategic planning.