One of the most important things about data is that it has to be collected. As mentioned in the last post, the data about COVID-19 infections comes from a number of different agencies across the county, state, nation and world. The story we wish to tell with this data is where this thing is headed and how to stop it. The CDC has outlined several specific goals for data collection:
- Monitor spread and intensity
- Determine disease severity and characteristics
- Identify risks increasing severity and transmission
- Track changes in the virus itself
- Estimate impact on health system
- Forecast the spread and magnitude
Much of the data about the virus's behavior comes from researchers and medical staff working on the front lines of care. Data about the spread and societal impact of the disease depends on a comprehensive testing regime. Both of these data streams require a rigorous reporting infrastructure as well as a detailed inspection process to ensure data is consistent across disparate sources.
Combining the various testing reports results in a broad and many-layered repository. From it, analysts can extract information about the spread of the virus and about the pattern of testing itself. An example of the latter is a study from April 1st that produced two related maps. They illustrate differences in testing between the states as a way of evaluating the variability and extent of testing. I selected just the northeast region of the country from each map and show the two side by side below.
The map on the left shows tests performed per 100,000 people: New York has a higher rate of testing than Vermont, and Vermont has a higher rate than Pennsylvania. The map on the right tells a different story. It shows the percentage of all tests that came back positive, and here Vermont and Pennsylvania are relatively low. In New York, on the other hand, 37% of tests were positive. This could mean that many more people are infected in New York, or that much more testing needs to be done. We know from the map in the previous post that New York on April 26 was reporting more than 400 cases per 100,000. While this is higher than elsewhere, it represents only 0.4 percent of the total population.
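The two maps rest on two different ratios, and the arithmetic is easy to mix up. The following is a minimal sketch of both calculations, plus the conversion from "cases per 100,000" to a percentage of the population. The function names and the state-level counts are hypothetical and chosen only for illustration; only the 37% and 400-per-100,000 figures come from the discussion above.

```python
def per_100k(count, population):
    """Scale a raw count (tests or cases) to a rate per 100,000 residents."""
    return count * 100_000 / population


def percent_positive(positive_tests, total_tests):
    """Share of all tests performed that came back positive, as a percentage."""
    return positive_tests * 100 / total_tests


def per_100k_to_percent(rate_per_100k):
    """Convert a rate per 100,000 people into a percentage of the population."""
    return rate_per_100k * 100 / 100_000


# Hypothetical counts for an imaginary state (not real data).
population = 2_000_000
total_tests = 30_000
positive_tests = 11_100

print(f"Tests per 100,000 people: {per_100k(total_tests, population):.0f}")
print(f"Percent of tests positive: {percent_positive(positive_tests, total_tests):.1f}%")

# The figure quoted above: 400 cases per 100,000 people.
print(f"400 per 100,000 as a share of population: {per_100k_to_percent(400):.1f}%")
```

The first ratio divides by population and describes how much testing is being done. The second divides by the number of tests and describes what the tests found. A state can rank high on one and low on the other.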
If most of the testing took place in New York City, where so many infected people were overburdening the health care system, this could skew the share of positive results upward. Perhaps if more testing were done throughout the state, the percentage of positives would come down slightly. Interpreting this result is difficult, since the data was collected from a relatively small sample of the people in the state. This is an example of a data quality issue that was mentioned in the previous post: gaps in the data. According to the CDC and other groups studying the pandemic, testing needs to increase substantially in order to gauge the effectiveness of any policy or treatment changes.
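To see how concentrating tests in a hotspot can skew the statewide figure, here is a small sketch that pools test results from two regions. All of the counts are invented for illustration, as is the assumption that a hotspot returns 40% positives while the rest of the state returns 10%; the point is only that the pooled percentage depends on where the tests were done, not just on how many people are infected.

```python
def pooled_percent_positive(regions):
    """Overall percent positive across regions.

    Each region is a (total_tests, positive_tests) pair.
    """
    total_tests = sum(tests for tests, _ in regions)
    total_positive = sum(positives for _, positives in regions)
    return total_positive * 100 / total_tests


# Hypothetical scenario A: 90% of tests are done in the hotspot.
# Hotspot: 9,000 tests, 40% positive. Rest of state: 1,000 tests, 10% positive.
hotspot_heavy = [(9_000, 3_600), (1_000, 100)]

# Hypothetical scenario B: same number of tests, split evenly.
# Each region keeps its own positive rate (40% and 10%).
evenly_spread = [(5_000, 2_000), (5_000, 500)]

print(f"Hotspot-heavy testing: {pooled_percent_positive(hotspot_heavy):.1f}% positive")
print(f"Evenly spread testing: {pooled_percent_positive(evenly_spread):.1f}% positive")
```

Neither scenario changes how many people are actually infected. Only the mix of who gets tested differs, which is why a single percent-positive figure is hard to interpret without knowing how the sample was drawn.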
We cannot simply ignore the data we have, even if it is not as complete as we would like. There is a wildfire burning and we have to act to put it out using any means we can. In a LinkedIn interview, Tom Lawson, CEO of FM Global, was asked how his leadership style has changed in this crisis. He responded that "Speed beats perfection in a crisis because if you wait until you get all the information, it may be too late. You have to make do with the best information you have."