Suspicious Votes? Data Should Include Correlations

Journalistic standards protect data science from ambiguity

5 min read · Nov 22, 2020

Image edited by author, plain outline at publicdomainvectors.org, source Openclipart

On Monday, November 9, Trump campaign adviser Steve Cortes claimed that Georgia’s results showed a suspicious number of Biden-only votes — that is, ballots for which voters apparently declined to vote for any down-ballot offices or propositions, ostensibly because they lacked interest or deemed themselves insufficiently informed. But it’s not a great leap to imagine that Cortes really wanted his readers to accept that such voters were rushing to fabricate a glut of ballots favorable to their candidate.

Two days later, National Review disputed this. Apparently using the file detail.xls, reached by following links from sos.ga.gov, National Review asserted that the state’s difference between the number of votes for president and the smaller number of senate votes in the Perdue race wasn’t sufficient to support Cortes’ claim of so many extra Biden-only ballots.

I found both merit and disingenuousness in the arguments of both sides. This does not bode well for the standard of straightforwardness.

I’ll let the reader decide which was worse

Cortes clearly explained his calculation, but muddied matters by segueing from another source, as if to conflate cross-party voting with ballots missing down-ballot votes.

NR disproved Cortes, but unfortunately was technically inaccurate in its precise method, thereby inviting resistance from opponents and, however slightly, harming its credibility and that of journalism going forward.

Partisan rationalizations aside, many will think the NR article was better-written because its error seems so unlikely to occur in real life. Others will find the error itself to matter more than Cortes’ mishmash of sources since he did cite his calculation accurately, even if his claim was off-base.

Since the NR article came later, its figures are more in line with those more recently available. This is sufficient for comparison purposes, as is disregarding the special senate election.

As rational as NR sounds here, its big mistake is that it mentions only the net difference in totals. That net figure offsets ballots with a vote for president but not senator against ballots with a vote for senator but not president, so if there are 50,000 ballots with votes for senator but not president, there would be enough to make Cortes’ claim correct. Any reader uncertain about this can find that such votes do exist among the totals: in a few counties, the total senate votes for a category (early, absentee, provisional or regular) exceed the presidential votes for that category. Unlikely as so large a number may be, it appears ignorant of NR to ignore the possibility. As I explain, that undermines their journalism.
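The arithmetic behind this objection can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the county figures are invented placeholders, not values from the state's detail.xls file. It shows that the net gap is a floor on president-only ballots, not their count, and that any category where senate votes exceed presidential votes puts a floor under the senate-only ballots.

```python
def president_only(president_total, senate_total, senate_only):
    """Ballots with a presidential vote but no senate vote.

    Identity: president_total - senate_total
              == president_only - senate_only
    """
    return (president_total - senate_total) + senate_only


def senate_only_lower_bound(cells):
    """Minimum number of senate-only ballots implied by the totals.

    `cells` is a list of (president_votes, senate_votes) pairs, one per
    county and vote category. Wherever senate votes exceed presidential
    votes, at least that many ballots skipped the presidential race.
    """
    return sum(max(0, senate - president) for president, senate in cells)


# Hypothetical figures, invented for illustration only.
cells = [(1000, 980), (400, 405), (50, 52)]

president_total = sum(p for p, _ in cells)   # 1450
senate_total = sum(s for _, s in cells)      # 1437
net_gap = president_total - senate_total     # 13

floor = senate_only_lower_bound(cells)       # 7
print(net_gap, floor, president_only(president_total, senate_total, floor))
# prints: 13 7 20
```

With these made-up numbers the net gap is 13, yet at least 20 ballots must have voted for president only. The true count could be higher still, because senate-only ballots can hide inside cells where presidential votes are the larger total.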

But since the state data don’t distinguish how votes for each candidate correlate with other votes on individual ballots, there would be no way to prove a Biden-only ballot claim anyway. What we can prove is that both sides made logical or arithmetic errors.
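To make the point about correlations concrete, here is a minimal sketch with two invented sets of ballots. The candidate labels for the senate race are generic placeholders and the ballots are hypothetical. Both sets produce identical published totals, yet they contain different numbers of Biden-only ballots, which is why per-race totals alone cannot settle the question.

```python
from collections import Counter


def race_totals(ballots):
    """Per-race vote totals, the only thing aggregate data reports."""
    totals = Counter()
    for president, senate in ballots:
        if president is not None:
            totals[("president", president)] += 1
        if senate is not None:
            totals[("senate", senate)] += 1
    return totals


def biden_only(ballots):
    """Ballots that chose Biden and left the senate race blank."""
    return sum(1 for president, senate in ballots
               if president == "Biden" and senate is None)


# Each ballot is (presidential choice, senate choice); None means blank.
# Both scenarios are hypothetical.
scenario_a = [("Biden", "Dem"), ("Biden", "Dem"),
              ("Trump", "Rep"), ("Trump", None)]
scenario_b = [("Biden", "Dem"), ("Biden", None),
              ("Trump", "Rep"), ("Trump", "Dem")]

print(race_totals(scenario_a) == race_totals(scenario_b))  # True
print(biden_only(scenario_a), biden_only(scenario_b))      # 0 1
```

Only ballot-level records, which tie each presidential choice to the rest of the same ballot, would distinguish the two scenarios.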

The Cortes article’s entire section on Biden-only votes is reproduced below because the precise wording is instructive:

Trump campaign legal counsel Sidney Powell reports that, nationwide, over 450,000 Biden-only ballots were cast, meaning the voter allegedly selected Biden but then neglected down-ballot candidates, including closely-contested Senate and House races.
Again, this phenomenon appears far more prominently in battleground states, raising the alarm for manipulation.
Why would so many people vote Biden–only in battleground Georgia, but not in deeply-red Wyoming, for instance?
In the Peach State, President Trump’s vote total almost exactly tracked the vote totals for the Republican senate candidates, separated by merely 818 votes out of 2.43 million votes Trump earned there. But, Joe Biden saw an astounding surplus of 95,801 votes over the Democratic Senate candidates.
By comparison, in Wyoming Biden only registered a surplus “Biden-only” take of just 725 votes over the Democratic Senate candidate there, or about 1/4th his take in in Georgia, on a percentage basis.
The Biden-only ballots do not conclusively prove fraud, but they sure reek of something very amiss.

The portion of Cortes’ essay that I included can be divided into two parts. The latter half provides numbers to substantiate NR’s claim as to how Biden-only ballots were defined. While readers may well deny the usefulness of this metric, there is no denying the method is declared, even if harder to follow without the imagery and narration of the video embedded in the NR piece.
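For readers who want to follow the declared method, the "percentage basis" comparison amounts to dividing a candidate's gap over his party's senate candidates by his own vote total. The sketch below uses only the Trump figures quoted above; the function name is my own, and Biden's total is not filled in because the quoted passage does not supply it.

```python
def gap_share(gap_votes, candidate_total):
    """Gap between a presidential candidate and his party's senate
    candidates, expressed as a share of the presidential candidate's votes."""
    return gap_votes / candidate_total


# Figures as quoted from Cortes: 818 votes out of 2.43 million for Trump.
trump_share = gap_share(818, 2_430_000)
print(f"{trump_share:.4%}")  # prints: 0.0337%
```

The same function applied to the 95,801 surplus and Biden's Georgia total would give the figure Cortes compares against Wyoming.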

On the other hand, Cortes’ first paragraph above clearly refers to voters who “allegedly selected Biden but then neglected down-ballot candidates.” I saw no evidence that this metric was recorded anywhere on the state website, and one may well wonder if situating it adjacent to another source is meant to provide wiggle room for Cortes to distance himself from the assertions he hopes the reader will find bolstering.

Even when Cortes’ (and Powell’s) claim is presented accurately, it’s both legitimate and noncommittal in any case. It’s fine to say something unusual must be going on in states where down-ballot voting is less in sync, just as others can claim that voters in a state with an unusual senate race might be expected to do more cross-party voting. A charitable reader might say this was really the anomaly Cortes meant to address. My point is not to attribute motives, but to call attention to actual errors.

But this is only one of Cortes’ categories

I offer a reference to a point you may have once heard about the significance of an absence of evidence. In the Sherlock Holmes story “Silver Blaze,” Holmes deduced that an act was committed not by an intruder, but by an insider, because a dog did not bark at the time.

Similarly, the NR article had no argument with the other categories of Cortes’ claims of incredulity:

Oddly high turnout percentages
“Sleepy Joe” outperforming “rock star” Obama in some counties
Pennsylvania rejecting mail-in votes much less than in other times or places

Either side can easily appear to claim that the normal expectations don’t apply, whether for the reasons Trump defied predictions in 2016 or because of especially intense rejection based on his actions or portrayals since.

House percentages aside, a blackjack dealer plays by different rules than the other players; dealer and player have different roles, as do plaintiff and defendant. Some will say National Review has an exalted journalistic status and therefore should be held to a higher standard, especially in an era of social media censorship; others will place the burden of proof on Cortes because he takes the “alarmist” position. Still others will simply acknowledge and reject error no matter what the source.

In a world where real concerns and hype compete for attention, this analysis may seem like more detail than anyone would want to spend time on. But satisfying one’s curiosity in depth on a narrow subject can leave one with an understanding that can transfer to other subjects without devoting great detail to them.

I demonstrate this to data scientists not only as an invitation to step back and consider the motives behind what they’re exposed to. It’s also an invitation to defend their hot stock against cynics of statistics and reporting who might apply guilt by association.

Written by Chris Dungan
