Who’s Youyang Gu?

Recently I read an interesting article about the leading causes of COVID deaths, according to statistical analysis. Surprisingly, neither population age nor lockdown measures are strong predictors of the number of deaths in a given country; other factors matter more, as you will see below. The author is Youyang Gu, a data scientist and practical machine learning expert. He is the creator of covid19-projections.com, a site dedicated to accurate modelling of the COVID-19 pandemic.

A data-driven approach

I like data analytics because it lets you find unexpected causes of many phenomena. It is a habit people should practice more during this pandemic of self-made immunologists.

For instance, in the book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, the author Seth Stephens-Davidowitz describes how statistical correlation and modern data collection methods disrupted horse racing. There was a time when a horse’s potential was judged by the quality of its pedigree: if a horse’s parents or siblings were great performers, it was assumed to have a shot at success on the racetrack too, and it would sell for a lot of money at auction. Jeff Seder applied a statistical methodology and determined that pedigree is not a good predictor of racetrack performance. After years of research, he finally found one unusual physical attribute that was highly predictive: the size of the horse’s left ventricle. This story shows that what sounds logical sometimes needs to be backed up by data, and that the root causes of what happens around us may be hidden.

What’s causing the deaths? Why?

Coming back to the COVID outbreak, the equivalent of the left ventricle’s size turns out to be three factors: inequality, population density and nursing home residents per capita. Countries with a higher Gini coefficient, a higher population density, and more nursing home residents per capita tend to have more COVID-related deaths (the R^2 equals 0.35). The method used was a linear regression with increasing L1 regularisation to find the most robust predictors. Other variables correlate only weakly: stringency index, relative humidity, poverty rate, minority population, flu death rate, church-going rate. Many assumptions seem to fall here: an older population does not necessarily mean more deaths, and lockdowns do not seem to be an effective means of limiting deaths from the virus.
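To make "linear regression with increasing L1 regularisation" more concrete, here is a minimal sketch of the idea. It is not the code or the data from the original analysis: the data is synthetic, the feature names and effect sizes are made up for illustration, and the lasso solver is a plain coordinate-descent implementation written from scratch. The point is only to show how raising the penalty pushes weak predictors to exactly zero while strong ones survive longer.

```python
import random

def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        out.append([(v - mean) / std for v in col])
    return out

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; this step produces exact zeros."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso(columns, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n) * ||y - Xw||^2 + alpha * ||w||_1.

    Assumes standardized feature columns and a centered target.
    """
    n, p = len(y), len(columns)
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = columns[j]
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha)  # unit-variance column, so no rescaling
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
    return w

# Synthetic data (an assumption for illustration): the first three features
# drive the target, the last three are pure noise.
random.seed(0)
names = ["inequality", "density", "nursing_homes", "stringency", "humidity", "median_age"]
true_effect = [1.0, 0.6, 0.6, 0.0, 0.0, 0.0]
n = 200
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in names]
y = [sum(b * raw[j][i] for j, b in enumerate(true_effect)) + random.gauss(0, 0.5)
     for i in range(n)]
X = standardize(raw)
y_mean = sum(y) / n
y = [v - y_mean for v in y]

def survivors(alpha):
    """Names of the features whose coefficient is still non-zero at this penalty."""
    return [name for name, coef in zip(names, lasso(X, y, alpha)) if coef != 0.0]

# Increase the penalty step by step and watch which predictors remain.
for alpha in [0.01, 0.05, 0.1, 0.3, 0.6, 5.0]:
    print(alpha, survivors(alpha))
```

The predictors that stay non-zero at the highest penalties are the ones the method treats as most robust. On real data you would more likely reach for an established library implementation than this hand-written loop.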

It is natural to wonder why these three factors harm public health. According to The Economist, there are three possible explanations for the role of inequality:

  1. A study in 2016 by Beth Truesdale and Christopher Jencks found “modest evidence” of a link between lower life expectancy and higher income inequality. An extra dollar benefits a wealthy individual less than losing a dollar harms a needy one, so people in the lowest income percentiles tend to suffer more.
  2. A second cause could be the work relations formed in more egalitarian countries. Where there is less power distance between employer and employee, such as in Europe, workers feel more “entitled” to raise concerns about their working conditions than in stricter countries.
  3. Lastly, highly unequal countries have less civic engagement and community spirit, which leads to lower compliance with restrictions.

At the end of the day, I believe that following common sense without backing it up with data could be very costly for societies, and it’s worth making an extra effort to challenge people’s ideas and always look for data and evidence, especially after such a tough year.
