Did You Hear This Epidemiology Anywhere? If Not, Ask Yourself Why

Chris Dungan
5 min read · Jun 14, 2020

This post breaks down the salient points of the academic article How data science can ease the COVID-19 pandemic. In addition to the comments I added among the quotes, any links within the quotes are original to the article and are useful to any journalist wishing to expand on it. I formatted in bold what are perhaps the largest points, whether taken directly from the article or inferred from it (inference being something we should aspire to as news consumers); none of the bold comes from the original article.

It starts by crediting slowed infection to “social distancing and stay-at-home orders.” While proving that claim may be irrelevant to a communication narrowly aimed at busy scientists, the proof would be informative when reported to the general public, where it could quell skepticism or confusion about, for example, the varied practices of Florida and New York.

Their attribution that this “halted the immediate threat to the U.S. healthcare system” hearkens back to the older reasons given for the economic shutdown, which were eventually reported less. While reassuring us that “data science can already provide ongoing, accurate estimates of health system demand,” the article acknowledges earlier overestimation and blames it on a failure to take accurate lags to hospitalization into account.

I recommend the article’s detailed but not-too-technical language for those who want to learn more about the measurement of data, especially the kind relevant to a pandemic. It also makes the following useful points about how the quality of data is an important limitation on such science; you might ask yourself whether your accustomed news services gave these points sufficient priority relative to what they did report, especially what they reported frequently:

“Most departments of public health are collecting and reporting metrics that are not helpful, and are reporting them with 48 hour delays, and often with errors.” Putting aside misleading irrelevance, delays and errors, that quote makes it reasonable to assume, or at least ask, whether methods differ.

“By and large, the recommendations from the health IT community around accurate and fast public health reporting remain ignored.”

“The number of COVID-19 hospitalizations, which is the best indicator of the disease’s burden on the regional health system” is specifically cited as corrupted due to:

“time lags in confirming and reporting cases” [emphasis mine]

“failure to distinguish between current and cumulative hospitalizations”

“even regions that report hospitalization data often provide only a blurry picture of the burden on the regional health system,” adding that data should “indicate the date of admission, in addition to the date of report or confirmation” [emphasis mine; I take it methods do differ]

The article is a call to action for different parts of the community to fill the gap: “Just as the maker community stepped up to cover for the failures of the government to provide adequate protective gear to health workers, this is an opportunity for the data and tech community to partner with healthcare experts and provide a measure of public health planning that governments are unable to do.”

The article can be mined by journalists looking for material from different angles; it proposes tracking hot spots via “cough sounds.” Would this be based on phones whose microphone is always on by default, and how might that skew the sample? To what extent could coughs from other sources be distinguished if an inordinate number of them were due to dust storms or construction, and might coughs be over- or under-reported because of noise in the environment?

“Contact tracing, which currently requires significant human effort, can also help tracking of potential cases if it can be scaled using technology under development by major American tech companies.” What would be the effects of such technology, and what new obligations or expectations might face us?

I especially like the reminder that “symptom tracking is nonspecific and may have difficulty tracking virus activity at low prevalence.” Might this create a tradeoff incentive to keep cases higher to some extent, especially if that avoids unpopular shutdown restrictions?

Perhaps its most relevant lay pointer about data is its reminder that “daily assessments of a valid sample of the population (via testing, via daily surveys, via electronic health record-based surveillance) would allow monitoring of changes in transmission which can alert us to the need to intervene, such as by reducing mobility.” This one sentence calls to our attention that data collection methods may vary and impose certain restrictions, and that the main goal is to determine the change in conditions. This may be especially good news from a privacy standpoint: if one can show that any errors in data collection and analysis are constant, those errors drop out when measuring change, which could undermine arguments that increasingly invasive measures are needed for safety.
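
To see why a constant error matters less when the goal is to track change, here is a minimal sketch with made-up numbers. The infection counts and the 40% detection rate are illustrative assumptions, not figures from the article. If surveillance consistently catches the same fraction of true infections, the measured level is wrong, but the measured day-to-day change matches the true change.

```python
def relative_changes(series):
    """Day-over-day relative change: (today - yesterday) / yesterday."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

# Hypothetical true infections per day (illustrative only).
true_infections = [1000, 1100, 1210, 1331]

# Assumption: the sample consistently detects 40% of true infections.
DETECTION_RATE = 0.4
observed = [n * DETECTION_RATE for n in true_infections]

true_trend = relative_changes(true_infections)
observed_trend = relative_changes(observed)

for t, o in zip(true_trend, observed_trend):
    print(f"true change: {t:.1%}   observed change: {o:.1%}")
```

Both series show roughly 10% growth per day even though the observed counts are far below the true ones. The argument only holds while the error really is constant; a detection rate that drifts over time would distort the trend as well.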

The authors acknowledge that useful evaluation and collection of data “will require thoughtful legislation so that the solutions invented for enduring the current pandemic do not lead to loss of privacy in perpetuity.” With mainstream skeptics asserting current economic practices are meant to sway the November elections — if not inure fearful citizens to a perceived need for intrusive protection — much more than “thoughtful” leadership will be needed to convince millions of Americans of this.

The linked article https://thehill.com/opinion/white-house/492025-poor-state-reporting-hampers-pandemic-fight gives a more detailed picture of the data reporting itself:

“We cannot tell whether an increase in cases is due to the daily increases in testing, more testing of symptomatic patients, or a true change in the infection rate.” Case counts “currently contribute to three dangerous illusions: they underestimate true prevalence because they don’t count infections scientifically, they exaggerate the increase in infections as we test more people, and they inflate the apparent mortality rate per infection because we’re underestimating the number of people infected. This is a formula for transforming rational concern into a panic.” It recommends that “new hospitalizations, new ICU admissions and the daily number of deaths” give a more consistent view of demands on hospitals.
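
The arithmetic behind those three illusions can be sketched with invented numbers. Every figure below is a hypothetical chosen for illustration, and the constant positive-test rate is a simplifying assumption rather than anything either article reports.

```python
# All numbers are hypothetical, chosen only to illustrate the arithmetic.
TRUE_INFECTIONS = 7_000   # assumed actual infections in a region
DEATHS = 35               # assumed deaths among them
POSITIVE_RATE = 0.10      # assumed constant share of tests that are positive

# Testing doubles each day while the underlying situation stays the same.
tests_per_day = [1_000, 2_000, 4_000]

# Illusion 2: daily cases rise only because more people are tested.
cases_per_day = [round(t * POSITIVE_RATE) for t in tests_per_day]

# Illusion 1: confirmed cases undercount true infections.
confirmed = sum(cases_per_day)

# Illusion 3: dividing deaths by confirmed cases inflates apparent mortality.
apparent_fatality = DEATHS / confirmed
true_fatality = DEATHS / TRUE_INFECTIONS

print("cases per day:", cases_per_day)
print("confirmed vs. true infections:", confirmed, "vs.", TRUE_INFECTIONS)
print(f"apparent fatality: {apparent_fatality:.1%}, true: {true_fatality:.1%}")
```

In this toy setup the case count quadruples in three days with no change in the infection rate, and the apparent mortality rate is ten times the true one simply because the denominator is too small.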

“Not one of the 11 states with the highest number of COVID-19 deaths provides daily trends in even two of those numbers” and they “report deaths as cumulative totals.” Of those eleven, “the five that report any data on hospitalizations (California, Louisiana, Florida, Connecticut and Georgia), provide only the current hospital census or cumulative totals, not the trend in daily admissions. How can we know if we are flattening the curve without seeing curves?”
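
For readers wondering why cumulative totals hide the curve, here is a small sketch of recovering daily counts from a running total. The admission figures are invented for illustration, and `daily_from_cumulative` is a hypothetical helper, not something from either article.

```python
def daily_from_cumulative(cumulative):
    """Turn running totals into per-day counts by differencing.

    The first total is dropped because it lumps together everything
    reported before the series began, so it is not a daily figure.
    """
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]

# Hypothetical cumulative hospital admissions over one week.
cumulative_admissions = [120, 150, 185, 215, 238, 255, 266]

daily_admissions = daily_from_cumulative(cumulative_admissions)
print(daily_admissions)  # [30, 35, 30, 23, 17, 11]
```

The cumulative series rises every day, which can look alarming on its own, while the daily series shows admissions peaking and then falling. That daily trend is the curve the quoted authors say the states were not publishing.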

Written by Chris Dungan

The biggest problem and achievement of this L.A. based data scientist and sociologist is melding so many interests into unique career steps.