Suppose we’re looking at a crowd of people.
We want to find out something about them, but we don’t have enough time or resources to ask them all. What do we do? We choose a few people, and only ask this sample of the population.
How does the way we sample a population impact what we find out?
What if our crowd was just exposed to a virus? Let’s say it’s the virus that causes COVID-19. Some of the people in our population are showing symptoms. Some are infected, but not showing any symptoms yet. Some have even recovered.
This is a new disease, and any predictions we make now are just models based on the limited information we have, which isn't much.
We need a clearer picture of who is infected, so that we can be ready to care for those who will become most ill.
How we conduct testing for the virus will impact how well we can tell what's going on in our population.
Let’s try an experiment with our simulated population. If we select people randomly, does the sample reflect the rest of the crowd? Sometimes, but usually not.
What happens in our model if we take another random sample of our population? Or only sample people who are noticeably sick?
What happens if we change our population size?
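The experiment above can be sketched in a few lines of Python. This is only a rough illustration, not the model behind this piece: the function names, the 10% infection rate, and the 50% chance that an infected person shows symptoms are all assumptions made up for the example.

```python
import random

def make_population(size, infected_rate=0.1, symptomatic_rate=0.5, seed=0):
    """Build a hypothetical crowd. The rates are illustrative assumptions."""
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        infected = rng.random() < infected_rate
        # Only infected people can show symptoms in this toy model.
        symptomatic = infected and rng.random() < symptomatic_rate
        people.append({"infected": infected, "symptomatic": symptomatic})
    return people

def infected_share(group):
    """Fraction of a group that is infected (0.0 for an empty group)."""
    if not group:
        return 0.0
    return sum(p["infected"] for p in group) / len(group)

def random_sample(people, k, rng):
    """Pick k people uniformly at random, without replacement."""
    return rng.sample(people, min(k, len(people)))

def symptomatic_sample(people, k, rng):
    """Pick up to k people, but only from those who look sick."""
    sick = [p for p in people if p["symptomatic"]]
    return rng.sample(sick, min(k, len(sick)))

if __name__ == "__main__":
    rng = random.Random(1)
    for size in (100, 1000, 10000):
        crowd = make_population(size)
        print(
            size,
            "actual:", round(infected_share(crowd), 3),
            "random sample:", round(infected_share(random_sample(crowd, 20, rng)), 3),
            "symptomatic only:", round(infected_share(symptomatic_sample(crowd, 20, rng)), 3),
        )
```

Running it with different seeds, sample sizes, and population sizes mirrors the questions above: the random sample bounces around the true share, while the symptomatic-only sample always reports that everyone tested is infected.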
Both strategies, testing purely at random and testing only people with visible symptoms, can give us an inaccurate picture of the actual population. With random testing, we might miss infections altogether if our sample isn't large enough. And yet, restricting testing to only those who are ill enough to go to a hospital and meet 'testing criteria' will continue to leave us with incomplete, likely biased, data.
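To get a feel for how easily a small random sample can miss infections entirely, here is a minimal sketch. It assumes each person tested is infected independently with the same probability, which is a reasonable approximation only when the population is much larger than the sample. The function name and the 1% prevalence are assumptions chosen for illustration.

```python
def chance_of_missing_all(prevalence, sample_size):
    """Probability that a random sample contains no infected person,
    assuming each sampled person is infected independently with
    probability `prevalence` (a simplifying assumption)."""
    return (1.0 - prevalence) ** sample_size

if __name__ == "__main__":
    # Hypothetical 1% prevalence, tested with samples of different sizes.
    for n in (20, 100, 300):
        print(n, round(chance_of_missing_all(0.01, n), 3))
```

Under these assumptions, a sample of 20 people misses every infection about 82% of the time, while a sample of 300 does so only about 5% of the time.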
In the case of COVID-19, we need both a large enough sample and an understanding of the biases in our sampling strategies.