Thursday, May 14, 2020

Using Data to Tell a Story

Every day we see graphs and maps in the news that illustrate how things are going with the coronavirus. We look to these visualizations (views of data) as signposts to guide us through this nightmare pandemic. In this blog I have already shown several visualizations and talked about the stories they tell and what they mean to those of us in the path of COVID-19.

We are all familiar with the map of the United States; we see it at election time when news reports want to show which states favor one party or the other. It is a useful framework for showing how something is distributed across the nation. In today's post, I am highlighting two similar maps that show different views of COVID-19 cases in the United States. The first map shows the "Total Cases" for each state and was published in the Brazil Times using data from the CDC (Centers for Disease Control and Prevention). The map was designed to show where the most and the fewest people are infected.



The data needed to make this map is fairly simple: the outcome of each test (positive or negative), the state where the test took place, and the date of testing. For this map, the most recent test results were used. Rather than show the individual totals as numbers on the map, the counts were assigned a color based on which range of values they fell into (1 to 100, 101 to 1,000, etc.). By graduating the colors from lighter to darker, the map presents a quickly understood picture of where the virus is most active.
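
To make the color-range idea concrete, here is a minimal Python sketch of how counts might be sorted into classes. Only the first two ranges (1 to 100 and 101 to 1,000) come from the map described above; the remaining ranges, the color names, the function name, and the sample counts are all assumptions made up for illustration.

```python
from bisect import bisect_left

# Upper bound of each range: 1-100, 101-1,000, 1,001-10,000, 10,001-100,000.
# Only the first two ranges come from the post; the others are assumed.
UPPER_BOUNDS = [100, 1_000, 10_000, 100_000]

# One color per range, plus one for counts above the last bound (assumed palette).
COLORS = ["light yellow", "yellow", "orange", "dark orange", "red"]


def color_for(count):
    """Return the color class whose range contains the given case count."""
    return COLORS[bisect_left(UPPER_BOUNDS, count)]


# Illustrative counts only, not real figures.
sample_counts = {"State A": 85, "State B": 1_000, "State C": 45_000}
colors = {state: color_for(n) for state, n in sample_counts.items()}
print(colors)
```

Because each state is reduced to one of a handful of colors, two states with quite different totals can end up looking identical on the map if they fall into the same range.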

Most people know that more people live in California than in North Dakota. It is not surprising, then, that the map of "Total Cases" shows the state with the higher population in the red range (the highest virus counts) and the more sparsely populated state in the yellow range (fewer cases).

The second map shows "Cases Per 100,000" and was published in the Daily Independent on April 30. It uses slightly more recent testing data and calculates the cases per capita from each state's total population (obtained from the US Census Bureau).

What the first map does not tell us is how each state is doing when you compare the number of cases to the number of people in the state. The "cases per 100,000" values represent the total cases that have occurred for every 100,000 people. This approach offers a way to compare states on a level playing field. Are they doing better or worse at controlling the virus across an equal sample of people? According to this view, California and North Dakota are doing equally well at handling the outbreak for their respective population sizes.
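
The per-capita calculation is simple arithmetic: divide the total cases by the population, then multiply by 100,000. Here is a minimal Python sketch; the function name and the two sample states are made up for illustration and are not figures from either map.

```python
def cases_per_100k(total_cases, population):
    """Scale a raw case count to a rate per 100,000 residents."""
    return total_cases * 100_000 / population


# Illustrative numbers only: one large state and one small state.
large_state = cases_per_100k(total_cases=50_000, population=40_000_000)
small_state = cases_per_100k(total_cases=950, population=760_000)

print(large_state, small_state)
```

In this made-up example the large state has more than fifty times as many cases as the small one, yet both work out to the same rate, which is the kind of pattern the second map is designed to reveal.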

Each of these graphic narratives depends on data that has been collected by government agencies, health care organizations, and other official sources. Often the data is combined in order to give a more complete description of events or to show how the virus is progressing across broader geographic extents such as states or nations. To make the best use of data and tell a reliable story, we need a clearly defined goal as well as a detailed understanding of the data itself. We need to know how it was collected, where it was collected, and, in the case of the virus, who was selected for testing.

Whether data comes from multiple sources or a single source, we always want to check if it meets some basic criteria. This "fitness for analysis" (Data Quality) is determined by asking several questions:
  • Is the data relevant to the story you wish to tell?
  • Is the data consistently measured and accurate to a similar standard?
  • Is the data complete for the appropriate time period and geographic area, without gaps?
  • Is the data clean, meaning free of duplicates and with a well-documented structure, so we know what each column or field refers to?
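
A few of these questions can be checked automatically. The sketch below is one possible approach in Python, assuming the data is a list of rows with hypothetical fields named state, date, and cases; the rows and the list of expected states are invented for illustration.

```python
from collections import Counter

# Hypothetical rows; the field names and values are made up for illustration.
rows = [
    {"state": "State A", "date": "2020-04-28", "cases": 120},
    {"state": "State A", "date": "2020-04-28", "cases": 120},   # duplicate row
    {"state": "State B", "date": "2020-04-28", "cases": None},  # missing value
    {"state": "State C", "date": "2020-04-28", "cases": 45},
]
expected_states = {"State A", "State B", "State C", "State D"}


def quality_report(rows, expected_states):
    """Flag duplicate state/date pairs, missing counts, and absent states."""
    keys = Counter((r["state"], r["date"]) for r in rows)
    return {
        "duplicates": sorted(k for k, n in keys.items() if n > 1),
        "missing_values": [r for r in rows if r["cases"] is None],
        "missing_states": sorted(expected_states - {r["state"] for r in rows}),
    }


report = quality_report(rows, expected_states)
print(report)
```

Checks like these cover completeness and cleanliness only. Whether the data is relevant and consistently measured still takes human judgment.
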
During the pandemic, more and more data is being collected every day. The story you want to tell will help determine which sets of data are relevant. If you want to create a map from your data, you also need a geographic representation of the area, with the name of each state tied to that state's boundary. This provides a way to link your tabular data to the map.
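
Linking a table to a map by state name can be sketched in a few lines of Python. Everything here is hypothetical: the boundaries are placeholder strings standing in for real shapes, and the function names and sample values are invented for illustration. Real mapping software performs the same kind of join on a shared name field.

```python
def normalize(name):
    """Collapse extra spaces and ignore case so 'state a ' matches 'State A'."""
    return " ".join(name.split()).casefold()


def join_by_state(boundaries, table):
    """Attach each table value to the boundary with the same state name."""
    lookup = {normalize(name): value for name, value in table.items()}
    joined, unmatched = {}, []
    for name, geometry in boundaries.items():
        key = normalize(name)
        if key in lookup:
            joined[name] = {"geometry": geometry, "cases": lookup[key]}
        else:
            unmatched.append(name)  # boundary with no data: a gap on the map
    return joined, unmatched


# Placeholder geometries and made-up counts.
boundaries = {"State A": "polygon-a", "State B": "polygon-b"}
table = {"state a ": 120}

joined, unmatched = join_by_state(boundaries, table)
print(joined, unmatched)
```

Reporting the unmatched names is worth the extra line, because a misspelled or missing state name would otherwise show up only as a blank area on the finished map.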

