
Recommendations for Improving the United States Centers for Disease Control (CDC) Data Practices for Pneumonia, Influenza, and COVID-19

Corresponding Author:
John F. McGowan
Mathematical Software Inc., Sunnyvale, California, United States
E-mail: ceo@mathematical-software.com

Received: 21-August-2023, Manuscript No. ijocs-23-110913; Editor assigned: 22-August-2023, PreQC No. ijocs-23-110913 (PQ); Reviewed: 31-August-2023, QC No. ijocs-23-110913 (Q); Revised: 02-September-2023, Manuscript No. ijocs-23-110913 (R); Published: 28-July-2023, DOI: 10.37532/1753-0431.2023.17(8).318

Abstract

During the pandemic, millions of Americans have become acquainted with the CDC because its reports and the data it collects affect their day-to-day lives. But the methodology used and even some of the data collected by CDC remain opaque to the public and even to epidemiologists. In this paper, we highlight areas in which CDC methodology might be improved and where greater transparency could lead to broad collaboration. "Excess" deaths are routinely reported, but not "years of life lost", an easily computed and more granular datum that is important for public policy. What counts as an "excess death"? The method for computing the number of excess deaths does not include error bars, and we show that a substantial range of estimates is possible. Pneumonia and influenza death data on different CDC pages are grossly contradictory. The methodology for computing influenza deaths is not described in sufficient detail for an outside analyst to pursue the source of the discrepancy. Guidelines for filling out death certificates have changed during the COVID-19 pandemic, preventing the comparison of 2020-21 death profiles with any previous year. We conclude with a series of explicit recommendations for greater consistency and transparency, and ultimately to make CDC data more useful to the public, epidemiologists, and other scientists.

Keywords

Pneumonia, Influenza, COVID-19

Introduction

The United States Centers for Disease Control (CDC) was tasked with a wide array of data tracking and policy recommendations during the course of the COVID-19 pandemic. Many choices were made under extreme time pressure, and CDC personnel did the best they could given the conditions they faced. As a result, a number of CDC practices since the start of the pandemic in early 2020 have not followed common scientific and engineering practice. However, several problems with data presentation and analyses for pneumonia and influenza predate the pandemic.

Common scientific and engineering practices are designed to prevent serious errors and minimize faulty results due to cognitive biases [1-3]. Proper use of significant figures and reporting of statistical and systematic errors is generally required for most peer-reviewed journal publications, Ph.D. dissertations, and other scientific and engineering publications. During times of crisis, common scientific and engineering practice should be followed rigorously and uniformly to minimize the chances of serious errors [4-12].

For example, CDC analyses and data presentations for pneumonia, influenza, and COVID-19 frequently do not follow common scientific and engineering practice for proper use of significant figures, reporting of statistical and systematic errors, clear and consistent definitions of measured quantities, or transparency and reproducibility [13-20].

This omission of common scientific and engineering practices raises questions about the accuracy of the CDC's data, conclusions, and public health policies in a number of important areas, including the COVID-19 pandemic. These issues may undermine public confidence in the CDC and public health policies if not corrected. These issues are sometimes shared with other government agencies such as the US Social Security Administration (SSA) and US Census Bureau that work closely with the CDC [21].

As another example, death counts for both individual causes and “all cause” deaths are frequently reported as precise to the last digit without any statistical or systematic errors, despite both known and unknown uncertainties in counting deaths. These include missing persons, unreported deaths due to deceased payee fraud, the ~1,000 living Americans incorrectly added to the government Deaths Master File (DMF) each month for unknown reasons, considerable uncertainties in the assignment of the Underlying Cause Of Death (UCOD) by coroners and doctors, and other issues [20-30].

Similarly, raw counts, adjusted counts, and estimates (the last often based on incompletely documented mathematical models) are often not clearly identified as such. The Deaths Master File, with names and dates of death of deceased persons, is exempt from the Freedom of Information Act (FOIA) and unavailable to the general public, independent researchers, and even other government agencies such as the IRS. This confidentiality of data makes independent verification of many CDC numbers, such as the excess deaths numbers tracked during the COVID-19 pandemic, all but impossible.

"Excess" deaths are routinely tracked by CDC, but not Years of Life Lost (YLL), an easily computed and more granular datum that is important for public policy.

This article gives more detail on specific examples of failures to follow common scientific and engineering practice, and related data and policy questions. We conclude with recommended improvements to the CDC's data practices, to improve quality and increase public confidence in the data, analysis, and public health policies where warranted. We review a number of examples in the following sections.

Literature Review

■ Discrepancies in tracking pneumonia and influenza deaths

One of the most striking examples is the significant difference in the number of deaths attributed to “pneumonia and influenza” on the CDC FluView website (~188,000 per year), in the leading causes of death report (~55,000 per year), and on the CDC Excess Deaths website (~55,000 per year). The discrepancy between the FluView website and the leading causes of death report predates the COVID-19 pandemic by several years (Figure 1). It seems likely that the weekly pneumonia and influenza death numbers reported on the CDC Excess Deaths website – added during the COVID-19 pandemic – are derived from the same underlying data as the leading causes of death reports [31, 32].


Figure 1: CDC's Contradictory Pneumonia and Influenza Death Numbers, with CDC’s excess deaths data showing significantly less than the FluView data. (Our plot of CDC data)

The CDC FluView website shows that 6-10 percent of all deaths, varying seasonally, are due to Pneumonia and Influenza (P&I) according to the vertical axis label on the FluView Pneumonia & Influenza Mortality plot. The underlying data files from the National Center for Health Statistics (NCHS) list, as mentioned, ~188,000 deaths per year attributed to pneumonia and influenza (Figure 2).


Figure 2: US Centers for Disease Control (CDC) FluView Pneumonia & Influenza Mortality Plot (June 9, 2021)

The CDC FluView graphic and underlying data files list no statistical or systematic errors. The counts of deaths in the data files give the numbers to the last significant digit, implying an error of less than one count, one death, based on common scientific and engineering practice.

In contrast, the CDC’s leading causes of death report, in “Deaths and percentage of total deaths for the 10 leading causes of death: United States, 2016 and 2017” on page nine (Figure 3), attributes only 2 percent of annual deaths (about 55,000 in 2017) to “influenza and pneumonia.”


Figure 3: CDC’s leading causes of deaths report suggests accuracy of death counts to the single digit level, with no error bars or uncertainties reported.

The difference between the CDC FluView and leading causes of death report numbers seems to be due to the requirement that pneumonia or influenza be listed as “the underlying cause of death” in the leading causes of death report and only “a cause of death” in the FluView data. This is not, however, clear. Many deaths have multiple “causes of death.” The assignment of an “underlying cause of death” may be quite arbitrary in some or even many cases. Despite this, none of these official numbers, either in the leading causes of death report or on the FluView website, are reported with error bars or error estimates, as is the common scientific and engineering practice when numbers are uncertain. The leading causes of death report for 2017 reports exactly 55,672 deaths from “influenza and pneumonia” in 2017 with no errors as shown in Figure 1.

Death certificates frequently have multiple causes of death. One of these is assigned as the underlying cause of death. This may be quite arbitrary in some cases. Indeed, the concept of “underlying cause of death” may not be well defined for some deaths because elderly patients will often develop multiple health problems in parallel that are fatal either in combination or due to one of the comorbidities reaching a level of severity sufficient to induce death. (See the discussion of the uncertain assignment of the underlying cause of death for deaths where pneumonia is present or a cause of death in the CDC’s Medical Examiners’ and Coroners’ Handbook on Death Registration and Fetal Death Reporting (2003 Revision) and Randy Hanzlick’s Cause of Death and the Death Certificate: Important Information for Physicians, Coroners, Medical Examiners, And the Public, Randy Hanzlick Editor (2006), College of American Pathologists below for examples of this problem).

In contrast, the FluView site, with a much larger number of deaths, appears to count deaths where pneumonia or influenza is listed as “a cause of death,” even if it is not the “underlying cause of death.” The FluView website and the leading causes of death report use semantically equivalent names for the two grossly different sets of numbers.

Both of these sources, especially the FluView website, are intended for the public, busy health professionals, policy makers and others, all of whom have limited time or knowledge to decipher the technical notes provided by CDC and whose confidence in these numbers may be significantly diminished if they notice the gross discrepancy in these two sets of numbers that are not clearly distinguished.

In “Peer Review in Scientific Publications: Benefits, Critiques, and A Survival Guide,” Kelly et al. note in their section on “Common Errors in Scientific Papers”:

Another common fault is the author’s failure to define terms or use words with precision, as these practices can mislead readers [33-36].

The scientific and medical distinction between the numbers is substantial if the FluView website is listing deaths where “pneumonia and influenza” are only “a cause of death.” The FluView numbers likely include large numbers of deaths of persons with Chronic Obstructive Pulmonary Disease (COPD), mostly late-stage chronic bronchitis and emphysema, a terminal condition, as well as other often terminal conditions, who are much more likely to die from a respiratory infection than most healthy persons – presumably the “influenza and pneumonia” deaths listed in the leading causes of death report.

Note that the label on the vertical axis of the FluView graph uses the language “% of All Deaths Due to P&I” – where P&I is an abbreviation for “pneumonia and influenza” – not “Deaths Involving P&I” or “Deaths with P&I” (Figure 2). There is no suggestion of any difference between these quite divergent mortality numbers (Figure 4).


Figure 4: FluView Mortality Surveillance notes with “A Cause of Death” Language Circled in Red (Dec. 18, 2020)

■ The CDC influenza US deaths model

The CDC uses an incompletely documented mathematical model that attributes roughly 55,000 deaths from pneumonia and influenza to the influenza virus as the underlying cause of death, a number roughly comparable to the total pneumonia and influenza deaths in the leading causes of death data. The presence of the influenza virus is confirmed by laboratory tests, however, in only a small fraction of pneumonia and influenza deaths, ~6,000 per year in most years [37].

Although the language is often unclear in the CDC documents and websites, the CDC appears to claim that there is substantial under-testing for the influenza virus (see the discussion of the influenza deaths model below) and that an initial influenza infection, which often disappears or becomes undetectable in laboratory tests, leads to the subsequent pneumonia, presumably a bacterial pneumonia, although other viruses would be consistent with some lab tests. Based on this argument, the CDC appears to attribute most pneumonia deaths where, historically, pneumonia was listed as the “underlying cause of death,” to the influenza virus — even though laboratory tests frequently fail to confirm influenza or even detect other viruses or bacteria as the cause of death instead of influenza. The “underlying cause of death” issue is discussed in more detail below.

As shown in Figure 5, the CDC website Disease Burden of Influenza appears to give a range from 12,000 to 61,000 influenza deaths from this model. The graphic does not indicate whether this range is a 95% confidence interval (another common scientific and engineering practice) or some other error estimate. The range in the graphic does not appear to match any of the 95% confidence intervals for estimated deaths attributed to influenza on the CDC Disease Burden of Influenza website.


Figure 5: The US CDC attributes 12,000 to 61,000 pneumonia deaths to influenza

The website does not provide the source code for the model, nor the data used to produce the model except for the seasons 2010-2011 and 2011-2012 provided in the single reference cited. The model was apparently implemented in the proprietary and quite expensive SAS statistics tool, judging from references to the use of the freely available SAS macro BETABIN for fitting a beta-binomial distribution to the data. We did not find any goodness of fit statistics for the beta-binomial model. For example, supplemental Figure 6 shows plots of the fitted beta distributions with no error bars on the fitted models and no goodness of fit statistics or tests.


Figure 6: Beta-binomial probability distributions of the summary proportion of patients tested for influenza and sensitivity of influenza testing across six FluSurv-NET sites, by age group and year.

The beta distributions shown are fitted to data from only six sites in 2010-2011 with a total of 5,458 hospitalized patients and five sites in 2011-2012 with a total of 2,502 patients. In both seasons, all sites but one (in New York) are in the western United States. The paper estimated 114,018–633,001 total hospitalizations per season for the 2010-11, 2011-12, and 2012-13 seasons. Thus, the sample is a small fraction of the actual hospitalizations and testing for influenza. The beta distribution is a model of the distribution of fractions of patients with respiratory illnesses tested for the influenza virus at different sites.

The sites have widely differing frequencies of influenza testing and significant variation in the sensitivity of the influenza testing, ranging from a high of 54% of adults aged 65+ at one site in California (with 1,049 patients) to a low of 18% of adults aged 65+ in New Mexico (with only 102 patients). These fitted beta-binomial distributions then appear to be extrapolated nationwide to produce the estimates of influenza deaths, which are reported with wide 95% confidence intervals. The model is used to adjust the reported deaths with laboratory-confirmed influenza by a large multiplier.

The CDC paper cited on the web site as the reference for the model notes (page 11/13 of PDF version) [37]:

“Our analysis was subject to some limitations. First, we assumed that the probability of a person with influenza being tested for influenza was the same as all persons with a respiratory illness. If physicians were more likely to recognize influenza patients clinically and select those patients for testing, we may have over-estimated the magnitude of under-detection”.

The CDC’s Cold Versus Flu web page (retrieved Sep 27, 2021) presents a graphic that seems to imply that a cold and flu (influenza virus) can be distinguished based on clinical symptoms, absent a diagnostic test [38].

We were unable to locate any obvious references for this graphic or any of the statements on the “Cold versus Flu” page which includes contradictory text next to the graphic:

Because colds and flu share many symptoms, it can be difficult (or even impossible) to tell the difference between them based on symptoms alone. Special tests can tell if a person is sick with flu.

The influenza deaths model reference contains a remarkable and counter-intuitive statement with no reference or obvious source (also on page 11/13)[37]:

“Likewise, our estimate of deaths may also be underestimated because we did not adjust for the finding that patients who died in the hospital were less likely to have been tested for influenza than other hospitalized patients”.

One might expect, however, that deaths would be more severe cases of pneumonia and influenza where doctors would order more tests (Figure 7).


Figure 7: CDC cold versus flu graphic

There is a substantial history of serious criticism of the CDC’s influenza death numbers by medical scientists and others [39-41]. One prominent critic is Peter Doshi, currently a professor at the University of Maryland and a senior editor at the British Medical Journal (BMJ). Citing the results of actual laboratory tests of deceased patients, critics of the CDC’s flu death numbers such as Doshi have argued that pneumonia deaths are actually due to a range of different viruses, bacteria, other pathogens, and even toxins, rather than predominantly influenza, as implied by the CDC’s influenza deaths model. The output of this model appears to be the basis of the baseline “flu” deaths numbers used in most popular and public policy discussions of COVID-19 deaths — although the leading causes of death report number may also be used.

CDC scientists have published rebuttals to some of Doshi’s arguments. The unresolved controversy illustrates the difficulties with using models instead of direct measurement, especially models that change consequential results by large factors rather than making small improvements in accuracy of a few percent. We recommend reducing the use of models in this area as much as possible. Ideally, all patients with respiratory illnesses would be tested for influenza and other respiratory viruses; improvements in PCR and other molecular technologies may make this feasible now or in the near future. In the short term, comprehensive influenza testing is probably not possible, but a better option than modeling is to randomly test symptomatic patients from a representative sample of the entire country for influenza and other respiratory viruses to determine the fraction with influenza and the fraction of those who die with influenza.

■ CDC excess deaths website data presentation and analysis issues

Turning to the COVID-19 pandemic data, the CDC Excess Deaths website presents an estimate of the excess deaths due to the COVID-19 pandemic or the pandemic response – associated with COVID-19 in CDC language – based on a mathematical model, the Noufaily or “extended Farrington” model, developed and used for early epidemic detection by the UK Public Health Service. The CDC’s website technical notes indicate the CDC has modified the Noufaily algorithm to “zero out” negative excess deaths in any categories – a statistically invalid procedure for estimating excess deaths that ensures that excess deaths will always be zero or positive even if the actual deaths are lower than expected based on historical deaths data – although this zeroing may be justified as a conservative measure for outbreak detection rather than evaluating the impact of the pandemic and the policy responses to the pandemic [42,43].

Estimates of excess deaths for the US overall were computed as a sum of jurisdiction-specific numbers of excess deaths (with negative values set to zero), and not directly estimated using the Farrington surveillance algorithms (CDC Excess Deaths website, Technical Notes, Retrieved June 7, 2021, emphasis added).

One purpose of the excess deaths analysis is to verify that reported COVID-19 deaths are an actual increase in the all-cause mortality rate rather than relabeling of deaths due to other causes such as Chronic Obstructive Pulmonary Disease (COPD). In the absence of lockdowns, aggressive intubation, and other novel responses to the COVID-19 pandemic, this would be a straightforward inference from a positive excess deaths value larger than the modeling error on the predicted/expected number of deaths from the Noufaily and other models – see the discussion of modeling below. In this context, the problem with the zeroing procedure seems clear. Consider that the US has fifty state jurisdictions. Even if there is no actual increase in the mortality rate between 2019 and 2020, the zeroing procedure can still produce a spurious estimate of increased mortality in 2020. There will be statistical fluctuations in the number of deaths in each state. With no overall increase in all-cause mortality, about half the states will see more deaths in 2020 than in 2019, balanced by declines in the number of deaths in the other states. If the negative “excess deaths” in the states with purely statistical declines are set to zero, however, an overall positive excess will be incorrectly reported, because CDC’s current procedure does not account for negative excess deaths in individual jurisdictions.

Note also that it is theoretically possible for a new virus to lower the all-cause mortality rate if it out-competes and crowds out a more dangerous virus or viruses. It could, for example, become the immediate cause of death in COPD patients and yet lower the number of total deaths. In this case, most jurisdictions could show a decrease in deaths (negative excess deaths) but the zeroing procedure would still show positive excess deaths if some jurisdictions showed increases due to chance.
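The effect of the zeroing procedure can be illustrated with a small simulation. The following is a minimal sketch, not the CDC's code: it assumes 50 hypothetical jurisdictions with identical expected deaths and purely random Gaussian fluctuations, so the true excess is zero by construction. The function names and the values of expected_deaths and sd are illustrative assumptions.

```python
import random


def summed_excess(observed, expected, zero_negatives):
    """Sum jurisdiction-level excess deaths (observed minus expected).

    When zero_negatives is True, negative jurisdiction values are set to
    zero before summing, mimicking the procedure described in the text.
    """
    total = 0.0
    for obs, exp in zip(observed, expected):
        diff = obs - exp
        if zero_negatives and diff < 0:
            diff = 0.0
        total += diff
    return total


def simulate(n_jurisdictions=50, expected_deaths=1000.0, sd=30.0, seed=0):
    """Draw observed deaths with NO true change in mortality.

    All parameter values are hypothetical and chosen only for illustration.
    Returns (plain sum of excess, sum with negatives zeroed).
    """
    rng = random.Random(seed)
    expected = [expected_deaths] * n_jurisdictions
    observed = [rng.gauss(expected_deaths, sd) for _ in range(n_jurisdictions)]
    return (summed_excess(observed, expected, zero_negatives=False),
            summed_excess(observed, expected, zero_negatives=True))


plain, zeroed = simulate()
print(f"sum of excess, negatives kept:   {plain:.0f}")
print(f"sum of excess, negatives zeroed: {zeroed:.0f}")
```

The plain sum fluctuates around zero, as it should when mortality is unchanged, while the zeroed sum can never be smaller than the plain sum and is positive whenever any jurisdiction fluctuates upward.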

Note that the graph in Figure 8 can be confusing. The legend in the upper left corner (the blue “g”) seems to indicate that the blue bars are the predicted number of deaths from all causes according to the CDC’s Noufaily, “improved Farrington,” algorithm, but show spikes in the spring, summer, and fall of 2020 suggesting these are the actual weekly deaths during the pandemic. A model based on data before March of 2020 should resemble the beige line, showing a predicted drop in weekly deaths from all causes during the summer of 2020 and no spikes. The legend indicates that the red plus signs are the actual weekly deaths when these exceed the threshold. In common scientific and engineering practice, a plot will show both the model, meaning the predicted deaths, and the data for actual deaths, for the full range of the data – in this case January 2017 through May 2021.


Figure 8: CDC excess deaths website is an interactive tool that allows various displays of data relevant to excess deaths since early 2020 in the U.S.

The confusing “Predicted number of deaths from all causes” label refers to a second model used to adjust the weekly death counts for delays in receiving all death certificates based on past experience with the delays. This is distinct from the Noufaily model used to predict expected deaths – the beige line – and compute the excess deaths. This is another example of confusing language on the CDC web site and in some documents where it is unclear what is actually meant. In our recommendations section, we suggest some practices to improve names, labeling, and avoid confusion between different models [44,45].

As noted previously, the data on the CDC excess deaths website provides a significantly lower historical (pre-2020) number of deaths attributed to “pneumonia and influenza” (~55,000 per year) than the FluView website (~188,000 per year) [46].

The website does not report the coefficient of determination (usually denoted R2 or r2 and pronounced “R squared” in statistics, sometimes denoted R**2 in plain text and statistical programming) or other goodness of fit statistics for their model, nor does it give any estimate or illustration of the systematic modeling error (Figure 9). It is common scientific and engineering practice to report a goodness of fit statistic, frequently the chi-squared statistic χ2 or the coefficient of determination R2, for any models and rank the models by the goodness of fit statistic for comparison. The goodness of fit statistic such as R2 is itself an estimate, and errors on this measure, usually a 95% confidence interval, should also be reported [47,48].
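As an illustration of reporting R2 together with an error estimate, the following minimal sketch computes the coefficient of determination and a 95% percentile bootstrap interval for it. The bootstrap is only one possible way to obtain such an interval and is an assumption here; it is not necessarily the method used for the intervals quoted in this paper. Simple resampling also ignores the week-to-week correlation present in real death counts, so the result should be read as a rough indication only. The function names are hypothetical.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def bootstrap_r2_interval(actual, predicted, n_resamples=2000, seed=0):
    """95% percentile bootstrap interval for R2.

    Resamples (actual, predicted) pairs with replacement and recomputes
    R2 for each resample.
    """
    rng = random.Random(seed)
    n = len(actual)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        a = [actual[i] for i in idx]
        p = [predicted[i] for i in idx]
        if len(set(a)) < 2:
            continue  # SS_total would be zero; skip degenerate resample
        values.append(r_squared(a, p))
    values.sort()
    lower = values[int(0.025 * len(values))]
    upper = values[int(0.975 * len(values)) - 1]
    return lower, upper


# Hypothetical weekly counts and model predictions, for illustration only.
actual = [52, 55, 61, 58, 50, 47, 49, 53, 60, 63]
predicted = [53, 54, 59, 59, 51, 48, 48, 54, 58, 62]
print(r_squared(actual, predicted))
print(bootstrap_r2_interval(actual, predicted))
```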


Figure 9: Annual Deaths in USA (CDC Wonder)

We obtained the algorithm from CDC’s GitHub and performed a series of sensitivity analyses under various data assumptions. Figure 10 below shows different possible results under the Noufaily algorithm without the CDC’s inappropriate zeroing procedure and with different parameters and using simple alternative models. Our version of the Noufaily model finds about 411,000 excess deaths with the set of parameters that produces the best R2 value of 0.94. There is an error on the computation of R2 which is shown as a ninety-five percent confidence level range: 0.91 to 0.96. The largest and smallest number of excess deaths with R2 in this range are also shown: about 390,000 deaths and 423,000 deaths. This is based on data from the FluView website downloaded on May 17, 2021, through the period ending January 1, 2021 [49-51].


Figure 10: U.S. excess deaths using various statistical models, including Noufaily, with best fit parameters and Alternative Models, Feb. 1 2020-Jan. 1 2021

Note that CDC uses a different set of model parameters with a lower R2 of about 0.74 (i.e. not as good a fit) to produce their estimate of ~500,000 excess deaths in 2020. The CDC parameters are shown in the white line in Figure 10 below [52]. These results and graph are presented as an illustration of the excess deaths data analysis and presentation that we recommend for the CDC excess deaths website and documents.

Annual deaths in the United States rose significantly from 2010 to 2017, at which time the increase slowed dramatically and almost stopped prior to the COVID pandemic in 2020. The 2010 to 2017 rise appears to reflect the aging and expected increase in mortality of the 1947-1964 “baby boom” generation. The flattening in 2017-2019 is unexpected and appears to reflect declining death rates, notably for heart and other blood coagulation related conditions, possibly due to reductions in risk factors and improved medical therapies [50].

The Noufaily model and other simple trend detection models are unable to realistically model this complex evolution of mortality rates. However, the Noufaily model matches this behavior more accurately, with the higher R2 shown below, when given a longer lookback period of five or more years rather than the CDC’s default of four years. The shorter lookback period used by the CDC gives more weight to the slow, almost minimal growth in the death rate during the anomalous, unexpected 2017-2019 period.

We recommend the use of medically-based models that explicitly incorporate and model demographics and aging as well as trends in specific cause mortality rates such as the reported declining mortality from heart attacks for excess deaths modeling and calculations. See the recommendations sections at the end of this paper.

Figure 10 below shows the results of fitting the Noufaily algorithm in the R surveillance package with different parameters and two simple trend models implemented in Python to the FluView deaths data. The excess deaths, the coefficient of determination R2 goodness of fit statistics, and the 95% confidence interval for R2 are given for each model. For the Noufaily models, b, w, and t in the model name refer to key parameters of the model. The most consequential is b, the number of previous years used in the prediction as discussed above. The Noufaily_b4_w2_t2.58 white line model is the CDC’s choice of parameters. The FluView weekly death counts data are shown as black plus signs. The date in years is indicated on the horizontal axis.
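To make the role of the lookback period concrete, the following minimal sketch fits a straight-line trend to baseline death counts and sums the differences between observed and predicted deaths over a later period. It is an illustrative assumption, not the Noufaily algorithm and not the specific trend models behind Figure 10; the function names and the lookback argument, which plays a role loosely analogous to the parameter b, are hypothetical.

```python
def fit_linear_trend(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def excess_deaths(baseline_times, baseline_deaths,
                  period_times, period_deaths, lookback=None):
    """Sum of (observed - predicted) deaths over the evaluation period.

    If lookback is given, only the last `lookback` baseline points are
    used for the fit. Negative differences are kept, not zeroed.
    """
    if lookback is not None:
        baseline_times = baseline_times[-lookback:]
        baseline_deaths = baseline_deaths[-lookback:]
    slope, intercept = fit_linear_trend(baseline_times, baseline_deaths)
    return sum(d - (slope * t + intercept)
               for t, d in zip(period_times, period_deaths))


# Hypothetical counts: a steady rise followed by two elevated periods.
base_t, base_d = [0, 1, 2, 3], [100, 102, 104, 106]
print(excess_deaths(base_t, base_d, [4, 5], [118, 120]))
```

Calling the function with different lookback values on the same data shows how strongly the estimated excess can depend on the choice of baseline window.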

■ Lack of reproducibility of CDC excess deaths

It does not appear possible to independently reproduce the CDC excess deaths graph (Figure 8), or the numerical results from raw data such as actual death certificates. The full Deaths Master File (DMF) used by the Social Security Administration (SSA) is not public and not subject to the Freedom of Information Act (FOIA). Even most other government agencies, including the IRS, lack access to this data that includes the names and dates of deaths of all persons reported deceased to the US government [51].

The ostensible reason for this secrecy is that much of the data is reported to the CDC’s National Center for Health Statistics (NCHS) by the Vital Registration Offices (VROs) of individual states and is considered property of the states and not the federal government. The federal government reportedly pays for limited access to this data, instead of the general access for the government and general public that transparency and scientific reproducibility would require.

The CDC provides data files that appear to contain de-identified information on each death on their website. Verifying these files requires the actual names, dates of death, and possibly other identifying information on the deceased persons. A complete verification of all deaths could involve substantial cost and time, but verification of a random sample of the reported deaths provides an affordable alternative. The CDC is not involved in collecting the Deaths Master File – a Social Security Administration project – which means the DMF provides an independent check on CDC tabulations [52].
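To give a sense of why a random sample is affordable, the following minimal sketch applies the standard normal-approximation formula for the sample size needed to estimate a proportion, here the fraction of death records that fail verification, within a chosen margin of error. The function name is hypothetical, and the choice of a 95% confidence level (z = 1.96) and the worst-case proportion p = 0.5 are assumptions for illustration.

```python
import math


def sample_size_for_proportion(margin, z=1.96, p=0.5):
    """Records to sample so that the estimated proportion has roughly
    the given margin of error (normal approximation, large population).

    margin: half-width of the interval, e.g. 0.01 for +/- 1 percentage point
    z:      normal quantile (1.96 corresponds to about 95% confidence)
    p:      assumed true proportion; 0.5 is the most conservative choice
    """
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


print(sample_size_for_proportion(0.05))  # +/- 5 percentage points
print(sample_size_for_proportion(0.01))  # +/- 1 percentage point
```

Under these assumptions, checking on the order of ten thousand randomly chosen records would bound the record error rate to within about one percentage point, a small fraction of the roughly three million deaths recorded in the United States each year.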

■ Including years of life lost analysis alongside excess deaths analysis

Years of Life Lost (YLL) is a granular mortality impact measure that considers age and comorbidities in relation to mortality. Excess deaths analysis, in contrast, does not consider age or comorbidities, just the number of deaths. The average age at death of U.S. COVID-19 victims is ~76 and the average number of comorbidities is ~4, according to CDC data. ~38 percent of all U.S. COVID-19-related deaths occurred in nursing homes, and an even higher proportion occurred in long-term care homes more generally (1.3 million people lived in skilled nursing homes and another 1.7 million in other assisted living and other long-term care) [53-57].

We note that the CDC Wonder database of deaths in the United States shows an average age of death of ~74 years in 2019, the year before the start of the COVID-19 pandemic, suggesting the YLL from COVID-19 may be quite small (COVID-19 average age of death, as just mentioned, was ~76).

Methodology and assumptions are important for YLL analysis, and will affect outcomes significantly. Briggs et al. 2020 found, for example, a weighted mean of 7.33 YLL for COVID-19 deaths through July of 2020 in the United Kingdom, and 8.42 for the United States. Quast et al. 2021 found an average of 9.2 YLL for U.S. COVID-19 deaths in 2020. Both of these estimates are significantly larger than might be expected from the average age of death of COVID-19 victims. We updated Briggs et al.’s data with CDC’s 4.0 average comorbidities/additional causes of death (their analysis assumed just 2.0 average comorbidities), and this results in a weighted mean of 5.3 YLL for U.S. COVID-19-related deaths.

AYLL analysis is not as simple as counting deaths and age of death. AYLL analysis is also sensitive to assumptions about pre-existing conditions that generally shorten life expectancy such as obesity, diabetes, Chronic Obstructive Pulmonary Disease (COPD), and others common in COVID-19 victims. A proper YLL analysis should show the YLL results for different reasonable assumptions about pre-existing conditions, similar to the ensemble of models shown in Figure 10 for a simple excess deaths analysis.

To enable evaluation of the costs and benefits of the pandemic response, the CDC should compare the direct COVID-19 YLL to the YLL due to overdose deaths, homicides, suicides, and other deaths reasonably attributed primarily to the pandemic response (such as “lockdown” policies). For example, based on an average age of death of ~43 years for overdose deaths, we calculate an average of 36.8 YLL per overdose death (those living to 43 years old have an average of 36.8 additional years to live, based on the Social Security Administration actuarial life table; SSA 2020) [57-59].

Average age at death is even younger, at ~30 for 2019 homicide deaths. Average YLL for these homicide deaths is significantly higher than overdose deaths, at 49.8. These non-COVID-19 YLL figures are significantly higher than COVID-19 average YLL figures (in the middle or high single digits in the various analyses mentioned) because the age of death is so much younger for these other causes of death [60].

Figure 11 shows a sharp increase, the highest on record at over 30% annually, from 70,357 overdose deaths in the 12 months preceding November 2019, to over 93,000 overdose deaths in 2020 (and still rising through February 2021, to over 95,000, which is the extent of the data available as of September 2021). Based on these trends, we estimate conservatively 22,000 excess overdose deaths for the full year 2020 [61].

There were ~10,000 excess homicides for 2020 through the third quarter (Figure 12), for a preliminary total of ~32,000 excess overdoses and homicides that correlate with the pandemic in 2020 [62].

Figure 11: US Overdose Deaths in 2020 compared to previous years (Source: Ahmad et al. 2021)

Applying the average YLL figures above to these ~32,000 excess overdose deaths and homicides in 2020 yields ~1.3 million total YLL for just these two categories of non-COVID-19 excess deaths.

YLL is strongly affected both by pre-existing conditions that shorten life expectancy and by causes of death, such as overdoses and homicides, that affect younger people at a higher rate. It is therefore important to report COVID-19 YLL figures alongside, or possibly instead of, excess deaths figures, primarily because YLL is the more granular measure.
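The arithmetic behind this figure can be sketched in a few lines of Python. The sketch assumes the split given above (~22,000 excess overdose deaths at 36.8 YLL each and ~10,000 excess homicides at 49.8 YLL each). The function and variable names are ours, chosen for illustration only.

```python
# Rough check of the ~1.3 million YLL figure using the rounded estimates
# quoted in the text. Names are illustrative; no new data is introduced.

def total_yll(categories):
    """Sum (deaths * average YLL per death) over all categories."""
    return sum(deaths * yll_per_death for deaths, yll_per_death in categories.values())

excess_2020 = {
    # category: (estimated excess deaths, average YLL per death)
    "overdose": (22_000, 36.8),
    "homicide": (10_000, 49.8),
}

total = total_yll(excess_2020)
print(f"Total YLL for the two categories: {total:,.0f}")
```

Because the inputs are rounded estimates, the result should be read as an order-of-magnitude figure rather than a precise count.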

■ Changing death certification guidelines

During the COVID-19 pandemic the CDC (through its National Vital Statistics System, or NVSS) adopted new death certification guidelines and related practices in ways that appear inconsistent with prior practice, and without soliciting public review or comment on these very significant changes (see, e.g., Florida v. Becerra 2021, finding that the CDC acted in an “arbitrary and capricious” manner in imposing cruise ship restrictions without adequate notice and review). These changes in death certification guidelines, and related coding practices by CDC, make comparing historical (pre-2020) pneumonia and influenza death numbers with COVID-19 pandemic numbers difficult or impossible. They also make highly important public health policy decisions largely immune from public review and comment [63, 64].

■ The Rules for assigning the underlying cause of death before COVID-19

Prior to 2020 and COVID-19, most pneumonia deaths did not list pneumonia or the pneumonia-causing pathogen, if known, as the underlying cause of death. This will be discussed in detail below. The only common partial exception was HIV/AIDS, where Pneumocystis pneumonia (caused by a common fungus, formerly known as Pneumocystis carinii, renamed in a confusing process about 2005) was often the immediate cause of death and the Human Immunodeficiency Virus (HIV) was almost always listed as the underlying cause of death.

However, HIV is not the pneumonia-causing pathogen, which is the Pneumocystis fungus. Most other pneumonia deaths, those included in the FluView numbers but not in the leading causes of death numbers, were instead assigned an underlying cause of death such as chronic lower respiratory disease, heart disease, cancer, even accidents, and other usually pre-existing conditions (Figure 12).

Figure 12: US Homicide Deaths in 2020 Compared to 2019

The CDC follows the World Health Organization (WHO)’s definition of the underlying cause of death. WHO defines the underlying cause of death as “the disease or injury which initiated the train of morbid events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury” in accordance with the rules of the International Classification of Diseases (ICD) [65]. In the United States, the underlying cause of death is listed at the bottom of the list of causes of death in Part 1 of the death certificate. The immediate cause of death is listed first. Part 2 lists other conditions that are considered contributing factors but not implicated in the causal chain leading to death. Pneumonia is often the immediate cause of death in Part 1 of the death certificate.

In principle, death certificates and the assignment of causes of death, including the underlying cause of death, are governed or at least guided by the CDC’s Medical Examiners’ and Coroners’ Handbook on Death Registration and Fetal Death Reporting (2003 Revision) [66]. This 138-page manual provides, however, only limited guidance on how to assign the underlying cause of death in cases where pneumonia is present. Page 17 of the document contains the only detailed discussion of deaths involving pneumonia, as follows (Figure 13):

Figure 13: CDC Medical Examiner and Coroner's Handbook (2003) on pneumonia

Although the CDC’s Medical Examiners’ Handbook 2003 gives little specific direction on deaths involving pneumonia, it references several books and articles edited or authored by Randy Hanzlick, M.D., now retired Chief of the Fulton County Medical Examiner’s Office and former pathologist with the CDC. These include Cause of Death and the Death Certificate: Important Information for Physicians, Coroners, Medical Examiners, and the Public, Randy Hanzlick, Editor (2006), College of American Pathologists (the reference seems to have been updated to the year 2006 since the original release of the handbook in 2003). This book discusses the cause of death for pneumonia cases in more detail, notably on pages 89 and 90 (emphasis added):

“Pneumonia is often a nonspecific process that occurs as the terminal event in someone who dies of a more specific underlying cause of death, such as congestive heart failure resulting from ischemic heart disease. In such cases, the specific underlying cause of death should be included in the cause-of-death statement.

Pneumonia is often designated as either community acquired or hospital or institution acquired (nosocomial). If the community- or institution-acquired nature of the pneumonia is known, the cause-of-death statement should include an indication of which one applies.

The specific bacterial, viral, or other infectious agent, if known, should be cited in the cause-of-death statement.

Relevant risk factors should also be cited in the cause-of-death statement, as might occur in an alcoholic who develops tuberculous pneumonia. Only in those instances where pneumonia has caused death and there is no known underlying cause or risk factor should the underlying cause of death be stated as “Pneumonia,” being sure to specify the infectious agent, if known, or specifying that a specific etiology is unknown, if such is the case” [67].

And on page 113 of Cause of Death and the Death Certificate by Randy Hanzlick, dementia, cerebrovascular disease, cardiac disease, and lung disease are all listed as common underlying causes of death in cases of deaths due to pneumonia (Figure 14):

Figure 14: Hanzlick on assigning pneumonia as underlying cause of death

Thus, in common pre-pandemic medical practice, pneumonia deaths were frequently assigned a non-pneumonia underlying cause of death, usually a pre-existing condition rather than the pneumonia-causing pathogen such as the influenza virus or SARS-CoV-2.

Based on the CDC’s technical notes mentioned above, these pneumonia and influenza deaths would be included in the FluView death numbers but not in the leading causes of death report.

■ Comparing COVID-19 death numbers to the pneumonia and influenza death numbers and estimates from previous years

As shown above, the CDC tracks at least three (3) different pneumonia and influenza death numbers and estimates: the Leading Causes of Death Report (~55,000 deaths per year, about two percent of annual deaths from all causes), the FluView graph and underlying data from the NCHS (~188,000 deaths per year, six to ten percent of annual deaths from all causes, before 2020), and the influenza death model estimates that range from 12,000 deaths per year to 61,000 deaths per year, with the best estimate close to the number of pneumonia and influenza deaths in the leading causes of death report.

Are any of these the proper baseline for comparing COVID-19 deaths to prior years or should some other number or estimate be used?

In the absence of the RT-PCR, antigen, and antibody tests for the SARS-CoV-2 virus, most COVID-19 deaths would likely have been recorded as unexplained pneumonia deaths lacking a laboratory test confirming influenza or another known pathogen. Some COVID-19 deaths, namely those attributed to the blood clots and other blood-related anomalies currently blamed on COVID-19, might have been listed as heart attacks or strokes, or even under some other cause [68].

The rest of this article will focus on the pneumonia deaths that would probably comprise most of the COVID-19 deaths in the absence of Emergency Use Authorization (EUA) laboratory tests for COVID-19. These tests may be misleading or inaccurate, sometimes to a high degree, depending on how they are employed (see, e.g., Skittrall et al. 2021, finding, based on a hypothetical application of standard Positive Predictive Value analysis, 25 times more false positives than true positives in testing the United Kingdom population in June 2020, based on measured background prevalence and test sensitivity and specificity).
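The Positive Predictive Value reasoning referred to here can be illustrated with a short Python sketch. The prevalence, sensitivity, and specificity below are hypothetical round numbers chosen only to show the mechanics. They are not the values used by Skittrall et al., and the function name is ours.

```python
def expected_test_outcomes(prevalence, sensitivity, specificity):
    """Return (true_positive_fraction, false_positive_fraction, ppv)
    for a population tested once, given the three input rates."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv

# Hypothetical inputs: 0.1% prevalence, 80% sensitivity, 99% specificity.
tp, fp, ppv = expected_test_outcomes(prevalence=0.001, sensitivity=0.80, specificity=0.99)
print(f"False positives per true positive: {fp / tp:.1f}")
print(f"Positive predictive value: {ppv:.3f}")
```

With these assumed inputs, false positives outnumber true positives by roughly twelve to one even though the test is 99% specific, which is the general effect of testing a population in which the condition is rare.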

The US CDC’s April 2020 guidelines for reporting COVID-19 deaths (NVSS: Vital Statistics Reporting Guidance, Report 3, April 2020) clearly direct physicians and others not to list Chronic Obstructive Pulmonary Disease (COPD) as the underlying cause of death in COVID-19 cases. Instead, it should be included in Part 2 of the death certificate, which is reserved for “non-cause” contributing factors. This guidance differs dramatically from medical practice prior to 2020, as described in Randy Hanzlick’s book and implicit in the FluView pneumonia and influenza deaths data above. The April 2020 guidance states, in relevant part:

In some cases, survival from COVID–19 can be complicated by pre-existing chronic conditions, especially those that result in diminished lung capacity, such as Chronic Obstructive Pulmonary Disease (COPD) or asthma. These medical conditions do not cause COVID–19, but can increase the risk of contracting a respiratory infection and death, so these conditions should be reported in Part II and not in Part I.

This guidance also gives a specific example of a COVID-19 death with COPD relegated to Part 2 (see Figure 15).

Although other causes of death that are often given as the underlying cause of death in pneumonia cases on pre-2020 death certificates are not explicitly identified in the April 2020 guidance document, it seems probable that most physicians would, based on that document, move these pre-existing conditions to Part 2 and not list them as the underlying cause of death for COVID-19. Note that COPD would fall under the category “lung disease” in the list of “distractors” from Hanzlick’s Cause of Death and the Death Certificate, mentioned above (Figure 14).

Thus, COVID-19 deaths since the April 2020 guidance are probably roughly comparable to the ~188,000 FluView pneumonia and influenza deaths that occur in a normal flu year. The word “roughly” is used because the April 2020 guidance encourages physicians and others to assign COVID-19 as the underlying cause of death in any death where COVID-19 is detected by tests or even just suspected. This raises the possibility that heart attack and stroke deaths, in addition to the traditional pneumonia and influenza deaths that would be listed in the FluView data, might be wrongly classified as COVID-19 deaths. These would presumably be misclassified (“reassigned”) as the COVID-19 deaths exhibiting the mysterious blood clots and other blood-related problems reported in some COVID-19 cases and deaths. Thus, the FluView death numbers may represent a lower bound on COVID-19 deaths rather than an exact baseline (Figures 14 and 15) [68-71].

Figure 15: COVID-19 Death Certificate Guidance Example with COPD as Contributing Factor Only (source: NVSS Vital Statistics Reporting Guidance, April 2020)

Conclusion & Recommendations

In light of the previous discussion, we make a number of recommendations to improve CDC’s data practices, including improved observance of common scientific and engineering practice – such as use of significant figures and reporting of statistical and systematic errors. Common scientific and engineering practice is designed to prevent serious errors and should be followed rigorously in a crisis such as the COVID-19 pandemic.

Note that some of these recommendations may require changes in federal or state laws, federal or state regulations, or renegotiation of contracts between the federal government and states. This is probably the case for making the Deaths Master File (DMF), with names and dates of death of persons reported as deceased to the states and federal government, freely available to the public and other government agencies.

All CDC numbers, where possible, should be clearly identified as estimates, adjusted counts, or raw counts, with statistical errors and systematic errors given, using consistent clear standard language in all documents. The errors should be provided as both ninety-five percent (95%) confidence level intervals and the standard deviation – at least for the statistical errors.
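As a minimal sketch of what reporting a statistical error might look like, the Python below attaches a standard deviation and a 95% interval to a raw count. It assumes, purely for illustration, that the count behaves like a Poisson variable and that the normal approximation applies. It says nothing about systematic errors, and the function name is ours.

```python
import math

def poisson_count_errors(count, z=1.96):
    """Return (std_dev, lower, upper) for a raw count, using the normal
    approximation to a Poisson distribution (an assumption of this sketch)."""
    std_dev = math.sqrt(count)
    return std_dev, count - z * std_dev, count + z * std_dev

# ~55,000 is the leading-causes-of-death pneumonia and influenza figure quoted in the text.
sd, low, high = poisson_count_errors(55_000)
print(f"55,000 deaths (std. dev. {sd:.0f}; 95% interval {low:,.0f} to {high:,.0f})")
```

The purely statistical interval is narrow compared with the differences between the CDC's various pneumonia and influenza numbers, which is one reason systematic errors should be reported separately.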

In the case of adjusted counts, the raw count should be explicitly listed immediately following the adjusted count as well as a brief description of the adjustment and a reference for the adjustment methodology. For example, if the adjusted number of deaths in the United States in 2020 is 3.4 million but the raw count of deaths was 3.3 million with 100,000 deaths added to adjust for unreported deaths of undocumented immigrants, the web pages and reports would say: Total deaths (2020): 3.4 million (adjusted, raw count 3.3 million, unreported deaths of undocumented immigrants, adjustment methodology citation: Smith et al, MMWR Volume X, Number Y)

The distinction between the leading causes of death report “pneumonia and influenza” deaths, ~55,000 per year pre-pandemic, and the FluView website “pneumonia and influenza” deaths, ~188,000 per year pre-pandemic, should be clarified in the labels and legends for the graphics and prominently in the table of leading causes of death or immediately adjacent text. Statistical and systematic errors on these numbers should be provided in graphs and tables.

In general, where grossly different raw counts, adjusted counts, or estimates are presented in CDC documents and websites under the same name, or under semantically equivalent or nearly equivalent names such as “pneumonia and influenza” and “influenza and pneumonia,” clearly distinct names should be used instead, or the reasons for the gross difference in the values should be prominently listed in the graphs and tables or immediately adjacent text. It should be easy for the public, busy health professionals, policy makers and others to recognize and understand the differences.

Where mathematical models are fit to data, such as the excess deaths computation, goodness of fit statistics should be reported in results, in or immediately adjacent to any plots, graphs, or tables showing the results. We recommend at least the standard chi-squared statistic and the standard coefficient of determination (R²), which is often of greater practical utility than the chi-squared statistic, as is common scientific and engineering practice in most fields.

CDC should provide results for different models fit to the same data with similar R² (coefficient of determination) values, to give the audience a quick sense of the systematic modeling errors, since there is no generally accepted methodology for estimating the 95% confidence level for the systematic modeling errors. See Figure 10 above for an example.
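Both statistics are simple to compute once a model's predictions are in hand. The Python sketch below uses made-up weekly counts and predictions and assumes Poisson counting errors for the chi-squared weights. None of the numbers come from CDC data, and the function name is ours.

```python
def goodness_of_fit(observed, predicted, sigma):
    """Return (chi_squared, r_squared) for model predictions against observations.

    sigma holds the assumed one-standard-deviation uncertainty of each observation.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    chi_squared = sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)               # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return chi_squared, r_squared

# Made-up weekly death counts and model predictions, for illustration only.
observed = [52_000, 53_500, 55_200, 54_100, 56_800]
predicted = [52_400, 53_300, 54_700, 54_900, 56_200]
sigma = [o ** 0.5 for o in observed]  # assumes Poisson counting errors

chi2, r2 = goodness_of_fit(observed, predicted, sigma)
print(f"chi-squared = {chi2:.2f}, R^2 = {r2:.3f}")
```

The chi-squared value is normally judged against the number of degrees of freedom (data points minus fitted parameters), so that number would need to be reported alongside it.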

All mathematical models should be free and open source with associated data provided using commonly used free open-source scientific programming languages such as Python or R, made available on the CDC website, GitHub, and other popular sources. The models and data should be provided in a package form such that anyone with access to a standard MS Windows, Mac OS X, or Linux/Unix computer can easily download and run the analysis – similar to the package structure used by the GNU project, for example.

Specifically, the influenza virus deaths model should be provided to the public as code and data.

Mathematical models should have distinct short English names where possible. We recommend the use of a unique digital identifier, possibly the DOI (Digital Object Identifier) system, for each model, and increasing sequential version numbers (e.g., 1.1, 1.2, 2.0, 2.1…) for different versions of the model. The digital identifier should point directly to the free, open-source code used. A footnote or link such as (English Model Name, Point of Contact, MODEL ID, Version) should be associated with plots, tables, or other documents generated with the model. For example, (Influenza Deaths Model, Smith, 123423, v 1.12) would enable quick reproducibility of results and avoid confusion between different models. In particular, several different models appear to be used in various aspects of reporting the influenza disease burden, estimating reductions in the burden due to the influenza vaccination program, and other influenza related metrics.

We recommend minimizing the use of models that produce large changes in the measured value, certainly changes greater than 100%, such as the influenza death model, which applies multipliers of 2-12 to raw counts of death certificates listing influenza as a cause of death. Such models should be phased out in favor of direct measurement, or as close to direct measurement as possible.

With respect to excess deaths tracking, include all major cause of death categories, rather than just the thirteen (13) in the cause-specific excess deaths that CDC tracks, which currently account for about 2/3 of all deaths.

Include a Years of Life Lost (YLL) display for COVID-19 deaths and non-COVID-19 deaths, as well as excess deaths analysis, because YLL analysis is more granular than excess deaths analysis. Explain the pros and cons of both analytical tools. Do the same for any future pandemics or health crises.

Adopt or develop a different algorithm or algorithms for tracking excess deaths which are mostly attributed to non-infectious causes such as heart attacks, cancer, and strokes. The Farrington/Noufaily algorithms were specifically developed as an early warning for often nonlethal infectious disease outbreaks such as salmonella. A medically-based model or models incorporating population demographics such as the aging “baby boom” and evolving death rates broken down by age, sex, and possibly other factors where known is probably better practice than simple empirical trend models such as the Noufaily algorithm.

Eliminate the zeroing procedure in calculating excess deaths, in which negative excess deaths in some categories are set to zero, rather than being added to the full excess deaths sum over all categories.
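The effect of the zeroing procedure can be seen in a small Python sketch. The per-category values are hypothetical and chosen only to show how clipping negative values changes the total. The function name is ours.

```python
def summed_excess(excess_by_category, zero_negatives):
    """Sum per-category excess deaths, optionally clipping negative values to zero."""
    if zero_negatives:
        return sum(max(0, x) for x in excess_by_category.values())
    return sum(excess_by_category.values())

# Hypothetical per-category excess deaths (observed minus expected), for illustration only.
excess = {"category_a": 1_200, "category_b": -400, "category_c": 300, "category_d": -150}

with_zeroing = summed_excess(excess, zero_negatives=True)
without_zeroing = summed_excess(excess, zero_negatives=False)
print(f"With zeroing: {with_zeroing}, without zeroing: {without_zeroing}")
```

Whenever any category has fewer deaths than expected, the zeroed total is larger than the plain sum, so zeroing can only bias the overall excess deaths figure upward.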

The anonymized data with causes of death, as close to the actual data (e.g., the actual death certificates) as possible, should be available on the CDC website in a simple, accessible, widely used format such as CSV (comma separated values) files. The code used to aggregate the data into summary data such as the FluView website data files should also be public.

The full Deaths Master File (DMF) including the actual names of the deceased persons and dates of death should be made available to the general public, independent researchers, and others. This is critical to independent verification of many numbers from the CDC, SSA, and US Census.

COVID-19-related deaths figures should be tracked based on year-specific age of death, rather than 10-year age ranges, as is currently the case.

CDC frequently changes the structure and layout of the CSV files/spreadsheets on their websites. The CDC should either (1) not do this or (2) provide easy conversion between different file formats with each new format so it is trivial for third parties to quickly adapt to the changes without writing additional code. CDC should provide a program or programs in a free and open-source language like R to convert between the formats.
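A converter of this kind need not be elaborate. The sketch below is written in Python, although R would serve equally well, and maps the column names of a newer layout back to an older one. The column names and the sample row are hypothetical and do not describe any actual CDC file.

```python
import csv
import io

# Hypothetical mapping from column names in a new layout to names in an older layout.
NEW_TO_OLD_COLUMNS = {
    "week_ending_date": "Week Ending Date",
    "jurisdiction": "State",
    "total_deaths": "All Cause",
}

def convert_layout(new_csv_text, column_map):
    """Rewrite CSV text so mapped columns carry their old names, in the mapped order."""
    reader = csv.DictReader(io.StringIO(new_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({old: row[new] for new, old in column_map.items()})
    return out.getvalue()

# Made-up sample row in the hypothetical new layout.
sample = "jurisdiction,week_ending_date,total_deaths\nExample State,2020-01-04,1000\n"
converted = convert_layout(sample, NEW_TO_OLD_COLUMNS)
print(converted)
```

Publishing the mapping table itself with each format change would let third parties keep existing analysis code working with a one-line update.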

The CDC and other agencies should be required to announce and solicit public comment for changes to case definitions, data collection rules, etc. for key public policy data such as the COVID-19 case definitions, death certification guidelines, and coding rules. Other government agencies have significantly more public participation than CDC, which is appropriate in a modern democracy.

Any practices and policies imposed in a public emergency without public comment and review, such as case definitions, definitions of measured quantities, and data reporting practices, should have an expiration date (e.g., sixty days) beyond which they must be subject to public review. Public comment, reviews, and cost/benefit analyses should happen during this emergency period.

Enacting these reforms should reduce the risk of serious errors, increase the quality and accuracy of CDC data and analyses, as well as any policies or CDC guidelines based on the data and analysis, and strengthen public confidence in the CDC and public health policies.

Acknowledgements

We thank Dr. Robert Anderson, Dr. Lauren Rossen and other staff of the CDC’s National Center for Health Statistics (NCHS) for detailed answers to our questions regarding the CDC’s excess deaths analysis and for providing a copy of the R statistical programming language code and data used for the excess deaths analysis during this difficult time. All conclusions and opinions in this paper belong to the authors only.

References