We are back to forecasting chess tournaments!! The model was designed for classical games, so I could not use it for all the online rapid tournaments that we saw over the last year. World Champion Magnus Carlsen is the top seed in the 14-player round robin.
Friday, January 15, 2021
Tuesday, December 29, 2020
Trends in Social Distancing - Part 2
A few weeks ago, I wrote Part 1 of this series. It looked at anonymous smartphone data from several different sources. In all cases, I found that social distancing started after WHO declared a pandemic, even though the shutdowns did not begin until a week or two later. In this update, I now have data on Thanksgiving. Many feared that there would be a surge in cases after people celebrated the holiday together. But it looks like there was little change in social distancing.
One of my sources, SafeGraph, changed its methodology, so I am not using it anymore. I still have data from the adjusted device exposure index (DEXA). Every day, it tracks how many smartphones were in each store (more information in Part 1). In the graph, time (t) is zero on January 20, when the first case in the US was confirmed, so Thanksgiving (November 26) is at t = 311. There isn't a big spike during the holiday, but this may not be the best data source to capture one: gatherings happened in people's homes, not in stores. It should, however, pick up a Black Friday surge. That surge is hard to spot on the graph, which suggests that many people shopped online instead.
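For anyone who wants to locate a date on the graph, here is a minimal sketch of the time index, assuming t simply counts calendar days since January 20, 2020. The function name `day_index` is my own and is not part of the DEX data.

```python
from datetime import date

# t = 0 on January 20, 2020, the date used as the origin in the graph
T0 = date(2020, 1, 20)

def day_index(d: date) -> int:
    """Number of calendar days between T0 and d (assumed definition of t)."""
    return (d - T0).days

thanksgiving = date(2020, 11, 26)
black_friday = date(2020, 11, 27)
print(day_index(thanksgiving), day_index(black_friday))
```

Under that assumption, Thanksgiving and Black Friday sit next to each other a little past t = 300, which is where to look for a spike.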
DEXA
Saturday, December 5, 2020
Comments
I just saw that there were a bunch of comments on old articles that I never noticed. Blogspot was putting comments on hold until they could be moderated, but somehow I didn't see any notification to check them. I changed the settings so it should work better now.
Trends in Social Distancing
I came across some interesting data while researching the coronavirus. Social contact is a key factor in explaining the disease's spread. About a week or two before the shutdowns, social contact fell dramatically. After a while, it started to rise, but it is still well below normal. Ideally, we would track everyone and measure how many times they got within 6 feet of someone else, weighted by the amount of time that they were in close contact. That data does not exist. There is measurement error in the available data. However, all my sources tell the same story. That's why I think this pattern is real.
My first source is the Device Exposure Index (DEX) (link to the methodology). It uses smartphone location data. When you go to a store, how many other devices were in that store? One issue is that if someone stays at home, their smartphone drops out of the sample. The Adjusted DEX fixes that problem. I averaged the Adjusted DEX across all US counties, weighting by population.
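The population-weighted average can be sketched as follows. This is only an illustration of the idea, not my exact code: the function name, the made-up numbers, and the choice to skip counties with a missing reading are all assumptions.

```python
import numpy as np

def weighted_mean_dex(dex, population):
    """Population-weighted average of county-level Adjusted DEX for one day.

    Assumption: counties with a missing (NaN) reading are dropped.
    """
    dex = np.asarray(dex, dtype=float)
    population = np.asarray(population, dtype=float)
    keep = ~np.isnan(dex)
    return float(np.average(dex[keep], weights=population[keep]))

# Made-up numbers for three hypothetical counties, for illustration only
example = weighted_mean_dex([100.0, 40.0, 10.0], [1_000_000, 200_000, 50_000])
print(example)
```

Weighting by population keeps a handful of small rural counties from pulling the national average as much as a large city does.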
My last source is data from SafeGraph (downloaded from Carnegie Mellon: link). If a smartphone leaves the house for 3-6 hours, they assume you are working part-time. If it's away for 6+ hours, then it's full-time.
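Here is a rough sketch of that rule as I understand it. The function name and the "neither" label are mine, and treating exactly 6 hours as full-time is an assumption, since the description above leaves the boundary open.

```python
def classify_away_time(hours_away: float) -> str:
    """Label a device-day by time spent away from home.

    Assumptions: exactly 6 hours counts as full-time, and anything
    under 3 hours gets the hypothetical label "neither".
    """
    if hours_away >= 6:
        return "full-time"
    if hours_away >= 3:
        return "part-time"
    return "neither"

print(classify_away_time(4.5))
print(classify_away_time(8))
```

A rule like this will count a long hike as a full-time job and miss anyone who works from home, which is one reason to treat the levels with caution.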
Part-time:
Full-time:
The numbers are suspiciously low. Back in February, only 9% of people worked full-time and 13% worked part-time? That can't be right. I almost threw out this dataset due to the measurement error. However, it might still have some uses. It does display the same trend: social distancing began a week or two before the shutdowns. Contact starts rising again, but it's far from normal.
Social distancing began voluntarily, but that doesn't prove that government policies were unnecessary. Right now, I'm studying optimal policy. No results yet - just sharing some data that I found along the way. Take care.
Monday, July 27, 2020
The Curve is Starting to Flatten Again
-Because the growth is exponential, it's best to take logs of the data
-I prefer data on active cases rather than total confirmed cases. This is because people who have recovered or died aren't spreading the disease anymore. Unfortunately, I haven't found good data on recoveries. That is why I look at the total number of confirmed cases.
-I run statistical tests to find significant changes in the trend
-The curve was flattening for a while until late June
The tests found 7 distinct periods. The first period runs from the beginning of the data until March 22. Growth was very rapid back then. You can see that in the table below; the coefficients are circled. The number next to "t1" is the slope in the first period. The slope drops from 0.14 to 0.12 for the second period (March 23-April 13). It keeps falling through period 4, then rises in period 5, which began on June 17, and rises again in period 6, which began on June 27. Now we are in period 7, which started on July 18, and the slope has begun to decline again. The number of cases is still going up, but at a slower rate.
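To make the slopes concrete, here is a minimal sketch of fitting a separate trend line to log cases in each period. It is not the test I ran: it assumes the break dates are already known, that the log is the natural log, and that t is measured in days. The function names and the synthetic data are made up for illustration.

```python
import numpy as np

def segment_slopes(t, cases, breaks):
    """Fit log(cases) = a + b*t separately on each period; return each slope b.

    `breaks` holds the first t of every period after the first one.
    Assumption: break dates are given rather than estimated.
    """
    t = np.asarray(t, dtype=float)
    log_c = np.log(np.asarray(cases, dtype=float))
    edges = [t.min()] + list(breaks) + [t.max() + 1]
    slopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (t >= lo) & (t < hi)
        slope, _intercept = np.polyfit(t[mask], log_c[mask], 1)
        slopes.append(float(slope))
    return slopes

def doubling_time(slope):
    """Days for cases to double when log cases grow by `slope` per day."""
    return float(np.log(2) / slope)

# Synthetic data: slope 0.14 for 20 days, then 0.12 for 20 days
t = np.arange(40)
log_c = np.where(t < 20, 0.14 * t, 0.14 * 20 + 0.12 * (t - 20))
slopes = segment_slopes(t, np.exp(log_c), breaks=[20])
print(slopes, [doubling_time(s) for s in slopes])
```

Under those assumptions, a slope of 0.14 means cases double roughly every 5 days, and a slope of 0.12 stretches that to a little under 6 days.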
Saturday, June 27, 2020
Coronavirus - Statistics Update
My data comes from Bing's COVID tracker. I prefer to focus on active cases, since you can't be infected by people who have recovered. However, I noted earlier that the recovery data is not very reliable. First, not every county's recovery data is reflected in Bing's tracker. Second, the data that is reported appears to be cumulative recoveries rather than daily recoveries. That would be fine, but on some days the cumulative total goes down, which should be impossible. The numbers can't be daily recoveries either, because their sum exceeds the total number of cases, which is also impossible.
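The two sanity checks described above can be sketched like this. The function names and the numbers are hypothetical; this is not the tracker's data or format.

```python
import numpy as np

def decreasing_days(cumulative):
    """Indices of days where a supposedly cumulative series goes down."""
    values = np.asarray(cumulative, dtype=float)
    return [int(i) + 1 for i in np.flatnonzero(np.diff(values) < 0)]

def sum_exceeds_total(series, total_cases):
    """True if reading the series as daily counts gives more recoveries than cases."""
    return float(np.sum(series)) > total_cases

# Made-up recovery series for illustration only
recovered = [10, 15, 14, 20]
print(decreasing_days(recovered))
print(sum_exceeds_total(recovered, total_cases=30))
```

If the first check flags any day, the series can't be a clean cumulative count. If the second check is true, it can't be a daily count either.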
Instead, I focus on total confirmed cases. Since the virus spreads exponentially, it's best to take logs (see my explanation in an earlier post). In the graph below, "logC" is the log of confirmed cases and "t" is time. The rapid growth slowed around late March and early April. It continued to slow until the very end of the graph, where it picks up again. That change is hard to see, which is why we rely upon statistical tests rather than just eyeballing pictures.
The tests found that there were 6 different periods. In each period, the slope is significantly different from the previous period. For the first 5 periods, the slopes were getting closer and closer to 0. That's a good thing. It means that the disease's spread is slowing down. Unfortunately, Period 6 (June 21 - present) is different. The slopes for Periods 5 and 6 are circled in the picture below.
Monday, June 1, 2020
America is Reopening, but the Curve is Still Flattening
My data comes from Bing's Covid-19 tracker. I focus on active cases (active = confirmed - recovered - deaths). Neither the dead nor the recovered can spread the virus - that's why active cases are the relevant factor. My earlier blog post was about total confirmed cases, but when I turn my attention to active cases, I still get the same result: the curve started to flatten in early April.
Here is the graph of active cases in the US.