What We Learned from Modeling the Pandemic
Data scientists at First Analytics were among many in the profession who were pressed into service to address the pandemic. Our chairman, Tom Davenport, recently published several stories showing how various companies grappled with the radical change in their business.
Read: When Data Science Met Epidemiology.
Becoming “citizen epidemiologists”, in a sense, provided us with some learning experiences. Among them:
- Traditional epidemiological models, like SIR and its variants, proved not to be very predictive, even with modifications (a minimal SIR sketch appears after this list).
- Public forecast models had enormous ranges between their predictions, reducing confidence in using any single model. Model ensembles helped.
- Tracking dashboards were innumerable, yet almost none of them were forward-looking.
- Most dashboards and data were aggregated at geographic levels too coarse to correlate with a business's operations. Generally, these were at the nation or state level, although some offered US county-level data.
- Data flukes would surprise some of the earlier, less sophisticated models we developed: for example, the state of Massachusetts taking 8,200 cases off the books in a single day, or New Jersey adding 51,000.
- Some underlying patterns reflected the reporting process rather than the trajectory of the virus's spread (e.g., big, consistent spikes on Mondays); see the smoothing sketch after this list.
- Our own case prediction models were generally reliable only about 14-21 days out, although we provided projections out several months. We had a 10.1% error rate at the county-day level looking forward to day 14, but advised our clients not to put too much stock in daily estimates extending beyond three weeks.
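To make the first point above concrete, here is a minimal sketch of the classic SIR model in Python. The population size and parameter values are purely illustrative, not fitted to any real data; the point is that even this standard structure, with tuned transmission and recovery rates, struggled to track reported case curves shaped by testing, reporting, and behavioral changes.

```python
def simulate_sir(population, beta, gamma, i0, days):
    """Basic SIR model with a simple daily (Euler) update.

    beta  : transmission rate (expected new infections per infectious person per day)
    gamma : recovery rate (1 / average infectious period in days)
    """
    s, i, r = population - i0, float(i0), 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Illustrative (not fitted) parameters: R0 of about 2.5 with a 10-day infectious period.
trajectory = simulate_sir(population=1_000_000, beta=0.25, gamma=0.1, i0=10, days=180)
peak_infectious = max(i for _, i, _ in trajectory)
print(f"Peak concurrent infections: {peak_infectious:,.0f}")
```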
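The reporting artifacts and the forecast-error figure mentioned above can also be illustrated with a short sketch. The daily case counts below are made up for illustration, and the naive forecast is only a stand-in for an actual model; the sketch shows a trailing 7-day average to dampen the Monday reporting spikes, plus a mean absolute percentage error (MAPE) calculation of the kind behind the 10.1% figure.

```python
import pandas as pd

# Hypothetical daily new-case counts for one county (dates and values are illustrative).
cases = pd.Series(
    [120, 95, 310, 180, 150, 140, 90, 130, 100, 340, 200, 160, 150, 95],
    index=pd.date_range("2020-11-02", periods=14, freq="D"),
)

# A trailing 7-day average dampens the Monday spikes so a model sees the
# underlying trajectory rather than the reporting cycle.
smoothed = cases.rolling(window=7, min_periods=7).mean()

def mape(actual: pd.Series, predicted: pd.Series) -> float:
    """Mean absolute percentage error over the dates where both series exist."""
    aligned = pd.concat([actual, predicted], axis=1, keys=["actual", "predicted"]).dropna()
    return (100.0 * (aligned["predicted"] - aligned["actual"]).abs() / aligned["actual"]).mean()

# Naive one-day-ahead forecast (yesterday's smoothed value), purely illustrative.
predicted = smoothed.shift(1)
print(f"MAPE: {mape(cases, predicted):.1f}%")
```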
Apart from these technical aspects, we observed several cultural and societal phenomena.
- The media is slow to pick up on changes in trends. For example, our models detected the trend reversal in mid-September that manifested itself in the enormous increases in cases beginning in early November, yet the public alarm was not sounded until late October. The use of models, rather than backward-looking dashboards, could have helped mitigate the upswing.
- Numeric illiteracy is still a problem, even in the media and among educated professionals.
- Data can be manipulated to support any story somebody wants to promote (e.g. the effectiveness of lockdowns).
- The phrase “follow the science” was often wielded to insist that the analysis was complete and conclusive, and to end discussion. See the previous point.
- Having forward-looking predictions of cases became a hot-potato issue for corporate legal and human resources departments as they deliberated safety policies for their employees in light of the information they possessed.
For the data scientists working this problem (both at First Analytics and at some of our clients), modeling the pandemic became a drag on the psyche. This was especially true in the early days, when some modelers were updating models every day, seven days a week, for an extended period, and witnessing some of the terrible turns in the data that they thought and hoped would settle down. Tom’s article was correct when he said, “all the data science and analytics people I spoke with will be happy to return to more traditional domains.”
For some examples of things we would prefer NOT to do again, check out this Pandemic Response case study.