e4stat

Friday, January 15, 2021

Tata Steel Forecast

We are back to forecasting chess tournaments!! The model was designed for classical games, so I could not use it for all the online rapid tournaments that we saw over the last year. World Champion Magnus Carlsen is the top seed in the 14-player round robin.

Tuesday, December 29, 2020

Trends in Social Distancing - Part 2

A few weeks ago, I wrote Part 1 of this series. It looked at anonymous smartphone data from several different sources. In all cases, I found that social distancing started after the WHO declared a pandemic, even though the shutdowns did not begin until a week or two later. In this update, I now have data on Thanksgiving. Many feared that there would be a surge in cases after people celebrated the holiday together. But it looks like there was little change in social distancing.

One of my sources, SafeGraph, changed its methodology, so I am not using it anymore. I still have data from the adjusted device exposure index (DEXA). Every day, it tracks how many smartphones were in each store (more information in Part 1). In the graph, time (t) is zero on January 20, when the first case in the US was confirmed. So Thanksgiving is around t = 300. There isn't a big spike during the holiday. But perhaps this is not the best data source to capture that. Gatherings happened in people's homes, not in stores. However, it should pick up a Black Friday surge. That surge is hard to spot on the graph, which suggests that people were shopping online instead.

DEXA

If social distancing fell during Thanksgiving, it should show up in the trips data (more information in Part 1). This is collected by the Department of Transportation. If you look closely, there does seem to be an increase around t = 300. However, it is pretty small.

Number of Trips

The Department of Transportation also estimates how many people left their homes. It was actually trending downwards before Thanksgiving. That isn't surprising; cases were rising, so people became more cautious. The trend briefly reversed for the holiday, but now it is heading back down. Once again, Thanksgiving only has a small effect.

Percent of people who left home

I don't have data on Christmas yet - when the Department of Transportation posts an update, there is a lag of about one or two weeks. Stay safe!

Saturday, December 5, 2020

Comments

I just saw that there were a bunch of comments on old articles that I never noticed. Blogspot was putting comments on hold until they could be moderated, but somehow I didn't see any notification to check them. I changed the settings so it should work better now.

Trends in Social Distancing

I came across some interesting data while researching the coronavirus. Social contact is a key factor in explaining the disease's spread. About a week or two before the shutdowns, social contact fell dramatically. After a while, it started to rise, but it is still well below normal. Ideally, we would track everyone and measure how many times they got within 6 feet of someone else, weighted by the amount of time that they were in close contact. That data does not exist. There is measurement error in the available data. However, all my sources tell the same story. That's why I think this pattern is real.

My first source is the Device Exposure Index (DEX) (link to the methodology). It uses smartphone location data. When you go to a store, how many other devices were in that store? One issue is that if someone stays at home, their smartphone drops out of the sample. The Adjusted DEX fixes that problem. I averaged the Adjusted DEX across all US counties, weighting by population.
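Here is a minimal sketch of the population-weighted average described above. The DEX values and populations are made-up placeholders, not the real county data, and the function name is my own.

```python
def weighted_mean(values, weights):
    """Average of `values`, where each entry counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical numbers for three counties (illustration only).
adjusted_dex = [200.0, 100.0, 50.0]
population = [1_000_000, 500_000, 500_000]

national_dex = weighted_mean(adjusted_dex, population)
simple_mean = sum(adjusted_dex) / len(adjusted_dex)
```

The weighted figure leans toward the largest county, so it differs from the simple average of the three values.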

The big drop that you see is March 11. That is when the WHO declared a pandemic. Shutdowns did not begin until a week or two later, but people had already started social distancing. The average value for the Adjusted DEX plummets from around 200 to roughly 50. It's about 100 now, so life is still very far from normal.

My next data source is the Department of Transportation (link). It uses smartphone data to count how many times people leave their homes and how far they travel. I calculated trips per person for each county. Then I averaged across all counties, weighting by population. I was surprised to see that the average person took almost 4 trips per day before the pandemic. That seems suspiciously high. But then I looked into the methodology. Each time you go somewhere, that is counted as a separate trip. So if you (1) grab coffee from Starbucks in the morning and then (2) go to work and (3) take a walk after lunch and (4) buy groceries on the way home, that is counted as 4 different trips.
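One property of this calculation is worth spelling out: weighting each county's trips per person by its population gives the same answer as dividing total trips by total population. The sketch below checks that with two hypothetical counties. The trip counts and populations are invented for illustration.

```python
# Hypothetical counties: name -> (daily trips, population). Not real data.
counties = {
    "A": (3_600_000, 1_000_000),
    "B": (400_000, 200_000),
}

def national_trips_per_person(counties):
    """Population-weighted average of county-level trips per person."""
    total_pop = sum(pop for _, pop in counties.values())
    weighted_sum = sum((trips / pop) * pop for trips, pop in counties.values())
    return weighted_sum / total_pop

national = national_trips_per_person(counties)

# Same quantity computed directly: all trips divided by all people.
total_trips = sum(trips for trips, _ in counties.values())
total_pop = sum(pop for _, pop in counties.values())
direct = total_trips / total_pop
```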

A similar pattern emerges: a big drop that preceded the shutdowns, then a gradual recovery, but trips are still far below their January and February levels. June 1 corresponds to Ndate=22067, and we see the protests showing up in the graph.

My last source is data from SafeGraph (downloaded from Carnegie Mellon: link). If a smartphone leaves the house for 3-6 hours, they assume you are working part-time. If it's away for 6+ hours, then it's full-time.
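The rule can be written as a small function. This is only a sketch of the thresholds as I described them, not SafeGraph's actual code. The text leaves exactly 6 hours ambiguous, so I assume it counts as full-time, and the "not working" label for shorter absences is my own.

```python
def classify_hours_away(hours_away):
    """Label one device-day by how long the phone was away from home."""
    if hours_away >= 6:   # assumption: exactly 6 hours counts as full-time
        return "full-time"
    if hours_away >= 3:
        return "part-time"
    return "not working"  # hypothetical label for absences under 3 hours

labels = [classify_hours_away(h) for h in (1.5, 4.0, 8.5)]
```

A rule like this would miss anyone who works from home, since the phone never leaves the house.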

Part-time:

Full-time:

The numbers are suspiciously low. Back in February, only 9% of people worked full-time and 13% worked part-time? That can't be right. I almost threw out this dataset due to the measurement error. However, it might still have some uses. It does display the same trend: social distancing began a week or two before the shutdowns. Contact starts rising again, but it's far from normal.

Social distancing began voluntarily, but that doesn't prove that government policies were unnecessary. Right now, I'm studying optimal policy. No results yet - just sharing some data that I found along the way. Take care!

Monday, July 27, 2020

The Curve is Starting to Flatten Again

Welcome back to our coronavirus series. Here is a quick summary of my earlier posts:

-Because the growth rate is exponential, it's best to take logs of the data
-I prefer data on active cases rather than total confirmed cases. This is because people who have recovered or died aren't spreading the disease anymore. Unfortunately, I haven't found good data on recoveries. That is why I look at the total number of confirmed cases.
-I run statistical tests to find significant changes in the trend
-The curve was flattening for a while until late June

The tests found 7 distinct periods. The first period was the beginning until March 22. Growth was very rapid back then. You can see that in the table below; the coefficients are circled. The number next to "t1" is the slope in the first period. The slope drops from 0.14 to 0.12 for the second period (March 23-April 13). It keeps falling but then it begins to rise in period 5, which began on June 17. It rises again on June 27. But now we are in period 7. It started on July 18 and the slope began to decline again. The number of cases is still going up, but at a slower rate now.
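To get a feel for what these slopes mean, they can be converted into a daily growth rate and a doubling time. The sketch below assumes the logs are natural logs and that t is measured in days; if either assumption is wrong, the numbers change. The function names are mine.

```python
import math

def daily_growth_rate(slope):
    """Daily percentage growth implied by a slope in natural-log cases per day."""
    return math.exp(slope) - 1.0

def doubling_time_days(slope):
    """Days needed for cases to double at a constant log-slope."""
    return math.log(2.0) / slope

first_period = doubling_time_days(0.14)   # slope from period 1
second_period = doubling_time_days(0.12)  # slope from period 2
```

Under those assumptions, a slope of 0.14 means cases double roughly every 5 days, and a slope of 0.12 stretches that to a little under 6 days.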

I graphed the data along with the trend lines, but it is hard to see all of them in the top graph. In the bottom graph, I only show the trends.

Stay safe!

Saturday, June 27, 2020

Coronavirus - Statistics Update

We have been warned that as America reopens, the virus will spread more rapidly. But this didn't show up in the data until very recently. About a week ago, I did a quick check and didn't see any evidence. It's different now.

My data comes from Bing's COVID tracker. I prefer to focus on active cases, since you can't be infected by people who have recovered. However, I noted earlier that the recovery data is not very reliable. First, not every county's recovery data is reflected in Bing's tracker. Second, the data that is reported appears to be cumulative recoveries rather than daily recoveries. That's fine, but on some days, the cumulative total of recoveries goes down - which should be impossible. It can't be daily recoveries either, because the sum of them all eclipses the total number of cases - also impossible.
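Both sanity checks are easy to automate. The sketch below uses invented numbers, not the Bing data, and the function names are hypothetical.

```python
def first_decrease(cumulative):
    """Index of the first day a cumulative series falls, or None if it never does."""
    for day in range(1, len(cumulative)):
        if cumulative[day] < cumulative[day - 1]:
            return day
    return None

def sum_exceeds_cases(recoveries, confirmed):
    """True if treating `recoveries` as daily counts would exceed total confirmed cases."""
    return sum(recoveries) > confirmed[-1]

# Hypothetical series (illustration only).
recoveries = [0, 5, 9, 7, 12]
confirmed = [10, 14, 18, 22, 25]

bad_day = first_decrease(recoveries)              # cumulative totals should never fall
too_many = sum_exceeds_cases(recoveries, confirmed)
```

If the first check flags a day, the series can't be a clean cumulative total. If the second check is true, it can't be a daily count either.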

Instead, I focus on total confirmed cases. Since the virus spreads exponentially, it's best to take logs (see my explanation in an earlier post). In the graph below, "logC" is the log of confirmed cases and "t" is time. The rapid growth slowed around late March and early April. It continued to slow for a while until the very end of the graph. It's hard to see, but that's why we rely upon statistical tests rather than just eyeballing pictures.
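Here is a short illustration of why logs help. The series below is hypothetical: it grows by exactly 10% per day and is not real case data. After taking logs, the day-to-day change is constant, so the curve becomes a straight line.

```python
import math

# Hypothetical series growing 10% per day (not real case counts).
cases = [100 * 1.10 ** t for t in range(6)]

log_cases = [math.log(c) for c in cases]

# Change in log cases from one day to the next.
daily_changes = [b - a for a, b in zip(log_cases, log_cases[1:])]
```

Every entry in `daily_changes` equals log(1.10), so any bend in the log graph signals a change in the growth rate.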

The tests found that there were 6 different periods. In each period, the slope is significantly different from the previous period. For the first 5 periods, the slopes were getting closer and closer to 0. That's a good thing. It means that the disease's spread is slowing down. Unfortunately, Period 6 (June 21 - present) is different. The slopes for Periods 5 and 6 are circled in the picture below.
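For readers who want to experiment, here is a rough sketch of fitting a separate trend line in each period once the break dates are known. It is not the test that I ran: it only estimates the slopes and says nothing about statistical significance. The data is synthetic and the function names are my own.

```python
def ols_slope(t, y):
    """Ordinary least squares slope of y on t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    den = sum((ti - t_bar) ** 2 for ti in t)
    return num / den

def slopes_by_period(t, y, breaks):
    """Fit one line per period; `breaks` lists the first t of each new period."""
    edges = [t[0]] + list(breaks) + [t[-1] + 1]
    slopes = []
    for start, end in zip(edges, edges[1:]):
        idx = [i for i, ti in enumerate(t) if start <= ti < end]
        slopes.append(ols_slope([t[i] for i in idx], [y[i] for i in idx]))
    return slopes

# Synthetic log-cases: slope 0.10 for t < 10, then slope 0.04 (illustration only).
t = list(range(20))
log_c = [0.10 * ti if ti < 10 else 1.0 + 0.04 * (ti - 10) for ti in t]

slopes = slopes_by_period(t, log_c, breaks=[10])
```

A proper analysis would also have to choose the break dates and test whether neighboring slopes really differ.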

The slope is about the same as it was in Period 4, which was mid-May. It's true that testing has expanded. This means that we are detecting more cases than before. However, it does not entirely explain the increase. The expansion in testing started well before June 21. I don't know if new lockdowns are justified. The benefits of flattening the curve have to be weighed against the economic costs. I'm working on a research project to address this, but progress has stalled. In the meantime, I hope you're staying safe and healthy.

Monday, June 1, 2020

America is Reopening, but the Curve is Still Flattening

As the shutdowns gradually end, the big fear is that there will be a surge of new cases. I find that the number of active cases has continued to rise. However, the growth rate has not increased. The curve is still flattening.

My data comes from Bing's Covid-19 tracker. I focus on active cases (active = confirmed - recovered - deaths). Neither the dead nor the recovered can spread the virus - that's why active cases are the relevant factor. My earlier blog post was about total confirmed cases, but when I turn my attention to active cases, I still get the same result: the curve started to flatten in early April.
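The active-case formula is simple enough to write out. The totals below are hypothetical cumulative counts, not figures from the tracker.

```python
def active_cases(confirmed, recovered, deaths):
    """Active cases per day: confirmed minus recovered minus deaths."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]

# Hypothetical cumulative totals for three days (illustration only).
confirmed = [100, 150, 210]
recovered = [10, 30, 60]
deaths = [2, 4, 7]

active = active_cases(confirmed, recovered, deaths)
```

Active cases can keep rising even while recoveries climb, as long as new confirmed cases arrive faster.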

Here is the graph of active cases in the US.

You can see the exponential growth. As I explained earlier, when you have exponential growth, it's better to look at the log graph:

It's still spreading, but not as rapidly as before.

When examining the data, I asked, "When did the curve's slope change?" I found that it changed several times.

Time1 is before March 20. Time2 is from March 20 - April 7. Though this was during the lockdown, the disease was actually spreading more rapidly than before. However, it can take up to 2 weeks before symptoms appear, so this increase reflected cases that had started earlier. Time3 is April 8 - April 29. Here we see the curve flattening: the coefficients (circled in red) started shrinking. However, they were still positive. Overall, that means that the virus was spreading but the growth rate was slowing down. Time4 is April 30 - May 9. Some states had started reopening by then. Nevertheless, the growth rate kept slowing down. Right now, we are in Time5: May 10 - present. The growth rate is even slower.

This analysis does *not* prove that the reopenings were a success or that shutdowns were unnecessary. A longer shutdown would have driven down the growth rates more quickly, but we don't know if that benefit outweighs the economic costs. That question is too complicated for a single blog post. I have one piece of the puzzle: the curve is still flattening in spite of the reopenings. Other researchers will have to fill in the rest.