Tuesday, December 29, 2020

Trends in Social Distancing - Part 2

A few weeks ago, I wrote Part 1 of this series. It looked at anonymous smartphone data from several different sources. In all cases, I found that social distancing started after the WHO declared a pandemic, even though the shutdowns did not begin until a week or two later. In this update, I now have data on Thanksgiving. Many feared that there would be a surge in cases after people celebrated the holiday together. But it looks like there was little change in social distancing.


One of my sources, SafeGraph, changed its methodology, so I am not using it anymore. I still have data from the adjusted device exposure index (DEXA). Every day, it tracks how many smartphones were in each store (more information in Part 1). In the graph, time (t) is zero on January 20, when the first case in the US was confirmed. So Thanksgiving is around t = 300. There isn't a big spike during the holiday. But perhaps this is not the best data source to capture that. Gatherings happened in people's homes, not in stores. However, it should pick up a Black Friday surge. That surge is hard to spot on the graph, which suggests that many people shopped online instead.
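To place a calendar date on the time axis, subtract January 20 from it. Here is a minimal Python sketch; the helper name `day_index` is made up for illustration and is not part of any data source.

```python
from datetime import date

T0 = date(2020, 1, 20)  # t = 0: first confirmed US case


def day_index(d: date) -> int:
    """Number of days between T0 and the given date."""
    return (d - T0).days


print(day_index(date(2020, 11, 26)))  # Thanksgiving 2020 -> 311
print(day_index(date(2020, 11, 27)))  # Black Friday 2020 -> 312
```

So the holiday sits just to the right of the t = 300 mark.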


DEXA



If social distancing fell during Thanksgiving, it should show up in the trips data (more information in Part 1). This is collected by the Department of Transportation. If you look closely, there does seem to be an increase around t = 300. However, it is pretty small.

Number of Trips


The Department of Transportation also estimates how many people left their homes. It was actually trending downwards before Thanksgiving. That isn't surprising; cases were rising, so people became more cautious. The trend briefly reversed for the holiday, but now it is heading back down. Once again, Thanksgiving only has a small effect.

Percent of people who left home


I don't have data on Christmas yet - when the Department of Transportation posts an update, there is a lag of about one or two weeks. Stay safe 




Saturday, December 5, 2020

Comments

 I just saw that there were a bunch of comments on old articles that I never noticed. Blogspot was putting comments on hold until they could be moderated, but somehow I didn't see any notification to check them. I changed the settings so it should work better now.

Trends in Social Distancing

I came across some interesting data while researching the coronavirus. Social contact is a key factor in explaining the disease's spread. About a week or two before the shutdowns, social contact fell dramatically. After a while, it started to rise, but it is still well below normal. Ideally, we would track everyone and measure how many times they got within 6 feet of someone else, weighted by the amount of time that they were in close contact. That data does not exist. There is measurement error in the available data. However, all my sources tell the same story. That's why I think this pattern is real.

My first source is the Device Exposure Index (DEX) (link to the methodology). It uses smartphone location data. When you go to a store, how many other devices were in that store? One issue is that if someone stays at home, their smartphone drops out of the sample. The Adjusted DEX fixes that problem. I averaged the Adjusted DEX across all US counties, weighting by population. 
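Weighting by population means that a large county counts for more than a small one. A minimal sketch in Python, using made-up numbers rather than the real county data:

```python
def weighted_mean(values, weights):
    """Average of `values` where each entry counts in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical counties (not real data): Adjusted DEX and population.
dex = [200.0, 50.0]
population = [1_000_000, 3_000_000]

print(weighted_mean(dex, population))  # 87.5
print(sum(dex) / len(dex))             # 125.0 (unweighted, for comparison)
```

The weighted figure is pulled toward the larger county, which is the point of weighting: it describes the typical person rather than the typical county.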



The big drop that you see is March 11. That is when the WHO declared a pandemic. Shutdowns did not begin until a week or two later, but people had already started social distancing. The average value for the Adjusted DEX plummets from around 200 to roughly 50. It's about 100 now, so life is still very far from normal.

My next data source is the Department of Transportation (link). It uses smartphone data to count how many times people leave their homes and how far they travel. I calculated trips per person for each county. Then I averaged across all counties, weighting by population. I was surprised to see that the average person took almost 4 trips per day before the pandemic. That seems suspiciously high. But then I looked into the methodology. Each time you go somewhere, that is counted as a separate trip. So if you (1) grab coffee from Starbucks in the morning and then (2) go to work and (3) take a walk after lunch and (4) buy groceries on the way home, that is counted as 4 different trips.
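One property of this two-step calculation is worth spelling out. If the same population figure is used both to compute trips per person and to weight the counties, the result equals total trips divided by total population. The sketch below checks this on made-up numbers; it assumes that reading of the method, which may not match the actual spreadsheet.

```python
# Hypothetical counties (not real data): (trips per day, population).
counties = [(4_000_000, 1_000_000), (6_000_000, 2_000_000)]

total_pop = sum(pop for _, pop in counties)

# Step 1: trips per person in each county.
per_person = [trips / pop for trips, pop in counties]

# Step 2: average those rates, weighting each county by its population.
weighted = sum(rate * pop for rate, (_, pop) in zip(per_person, counties)) / total_pop

# Shortcut: all trips divided by all people.
pooled = sum(trips for trips, _ in counties) / total_pop

print(per_person)  # [4.0, 3.0]
print(weighted, pooled)
```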



A similar pattern emerges. A big drop that preceded the shutdowns, then a gradual recovery, but it's still far below January and February. June 1 corresponds to Ndate=22067, and we see the protests showing up in the graph.
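The value Ndate=22067 is consistent with counting days from January 1, 1960, which is the day-zero convention in Stata and SAS. That reading is inferred from the arithmetic, not stated by the data source. A short Python sketch for converting in both directions, with made-up function names:

```python
from datetime import date, timedelta

EPOCH = date(1960, 1, 1)  # assumed day zero


def ndate_to_date(n: int) -> date:
    """Calendar date for a day count measured from EPOCH."""
    return EPOCH + timedelta(days=n)


def date_to_ndate(d: date) -> int:
    """Day count from EPOCH for a calendar date."""
    return (d - EPOCH).days


print(ndate_to_date(22067))               # 2020-06-01
print(date_to_ndate(date(2020, 3, 11)))   # WHO declaration -> 21985
```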

My last source is data from SafeGraph (downloaded from Carnegie Mellon: link). If a smartphone leaves the house for 3-6 hours, they assume you are working part-time. If it's away for 6+ hours, then it's full-time.
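The rule is simple enough to write down directly. In this sketch, the function name and the "neither" label are made up, and treating exactly 6 hours as full-time is an assumption, since the description above leaves the boundary ambiguous.

```python
def classify_away_time(hours_away: float) -> str:
    """Label a device-day by how long the phone was away from home."""
    if hours_away >= 6:
        return "full-time"   # assumption: exactly 6 hours counts as full-time
    if hours_away >= 3:
        return "part-time"
    return "neither"


for hours in (1.5, 4, 6, 9):
    print(hours, classify_away_time(hours))
```

A rule like this only sees time away from home, so it will miss anyone who works from home or leaves the phone behind, which may help explain the low numbers below.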

Part-time:


Full-time:


The numbers are suspiciously low. Back in February, only 9% of people worked full-time and 13% worked part-time? That can't be right. I almost threw out this dataset due to the measurement error. However, it might still have some uses. It does display the same trend: social distancing began a week or two before the shutdowns. Contact starts rising again, but it's far from normal. 

Social distancing began voluntarily, but that doesn't prove that government policies were unnecessary. Right now, I'm studying optimal policy. No results yet - just sharing some data that I found along the way. Take care

Monday, July 27, 2020

The Curve is Starting to Flatten Again

Welcome back to our coronavirus series. Here is a quick summary of my earlier posts:

-Because the growth rate is exponential, it's best to take logs of the data
-I prefer data on active cases rather than total confirmed cases. This is because people who have recovered or died aren't spreading the disease anymore. Unfortunately, I haven't found good data on recoveries. That is why I look at the total number of confirmed cases.
-I run statistical tests to find significant changes in the trend
-The curve was flattening for a while until late June

The tests found 7 distinct periods. The first period was the beginning until March 22. Growth was very rapid back then. You can see that in the table below; the coefficients are circled. The number next to "t1" is the slope in the first period. The slope drops from 0.14 to 0.12 for the second period (March 23-April 13). It keeps falling but then it begins to rise in period 5, which began on June 17. It rises again on June 27. But now we are in period 7. It started on July 18 and the slope began to decline again. The number of cases is still going up, but at a slower rate now.
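To get a feel for what these slopes mean, you can convert them to doubling times. The sketch below assumes that the logs are natural logs and that t is measured in days; neither is visible in the table, so treat the numbers as illustrative.

```python
import math


def doubling_time(slope: float) -> float:
    """Days for cases to double when log(cases) rises by `slope` per day."""
    return math.log(2) / slope


def daily_growth(slope: float) -> float:
    """Fractional day-over-day growth implied by the slope."""
    return math.exp(slope) - 1


for slope in (0.14, 0.12):
    print(f"slope {slope}: doubles every {doubling_time(slope):.1f} days, "
          f"grows {daily_growth(slope):.1%} per day")
```

Under those assumptions, a slope of 0.14 means cases double about every 5.0 days, and a slope of 0.12 stretches that to about 5.8 days.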



I graphed the data along with the trend lines, but it is hard to see all of them in the top graph. In the bottom graph, I only show the trends.



Stay safe!

Saturday, June 27, 2020

Coronavirus - Statistics Update

We have been warned that as America reopens, the virus will spread more rapidly. But this didn't show up in the data until very recently. About a week ago, I did a quick check and didn't see any evidence. It's different now.

My data comes from Bing's COVID tracker. I prefer to focus on active cases, since you can't be infected by people who have recovered. However, I noted earlier that the recovery data is not very reliable. First, not every county's recovery data is reflected in Bing's tracker. The data that is reported appears to be cumulative recoveries rather than daily recoveries. That's fine, but on some days, the cumulative total of recoveries goes down - which should be impossible. It can't be daily recoveries because the sum of them all eclipses the total number of cases - also impossible.
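Both consistency checks are easy to automate. Here is a minimal sketch in Python with a made-up series; these are not Bing's actual numbers, and the function names are hypothetical.

```python
def find_decreases(cumulative):
    """Indices where a supposedly cumulative series falls below the previous value."""
    return [i for i in range(1, len(cumulative))
            if cumulative[i] < cumulative[i - 1]]


def daily_reading_is_impossible(recoveries, total_cases):
    """If the numbers were daily counts, their sum could not exceed total cases."""
    return sum(recoveries) > total_cases


# Hypothetical recovery series (not real data).
recoveries = [10, 25, 40, 38, 60]

print(find_decreases(recoveries))                    # [3]
print(daily_reading_is_impossible(recoveries, 150))  # True (the sum is 173)
```

A series that fails both checks cannot be read as either cumulative or daily recoveries, which is the problem described above.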

Instead, I focus on total confirmed cases. Since the virus spreads exponentially, it's best to take logs (see my explanation in an earlier post). In the graph below, "logC" is the log of confirmed cases and "t" is time. The rapid growth slowed around late March and early April. It continued to slow for a while until the very end of the graph. It's hard to see, but that's why we rely upon statistical tests rather than just eyeballing pictures.




The tests found that there were 6 different periods. In each period, the slope is significantly different from the previous period. For the first 5 periods, the slopes were getting closer and closer to 0. That's a good thing. It means that the disease's spread is slowing down. Unfortunately, Period 6 (June 21 - present) is different. The slopes for Periods 5 and 6 are circled in the picture below.



The slope is about the same as it was in Period 4, which was mid-May. It's true that testing has expanded. This means that we are detecting more cases than before. However, it does not entirely explain the increase. The expansion in testing started well before June 21. I don't know if new lockdowns are justified. The benefits of flattening the curve have to be weighed against the economic costs. I'm working on a research project to address this, but progress has stalled. In the meantime, I hope you're staying safe and healthy.


Monday, June 1, 2020

America is Reopening, but the Curve is Still Flattening

As the shutdowns gradually end, the big fear is that there will be a surge of new cases. I find that the number of active cases has continued to rise. However, the growth rate has not increased. The curve is still flattening.

My data comes from Bing's Covid-19 tracker. I focus on active cases (active = confirmed - recovered - deaths). Neither the dead nor the recovered can spread the virus - that's why active cases are the relevant factor. My earlier blog post was about total confirmed cases, but when I turn my attention to active cases, I still get the same result: the curve started to flatten in early April.

Here is the graph of active cases in the US.


You can see the exponential growth. As I explained earlier, when you have exponential growth, it's better to look at the log graph:


It's still spreading, but not as rapidly as before.
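On a log scale, the change from one day to the next is the growth rate, so a flattening curve shows up as shrinking day-to-day differences. A small Python sketch with made-up case counts (not the Bing data):

```python
import math


def log_growth_rates(cases):
    """Day-over-day differences in log(cases)."""
    logs = [math.log(c) for c in cases]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical counts: doubling daily at first, then growing 10% per day.
cases = [100, 200, 400, 800, 880, 968]

print([round(r, 3) for r in log_growth_rates(cases)])
# [0.693, 0.693, 0.693, 0.095, 0.095]
```

The raw counts rise every day, but the log differences drop from 0.693 to 0.095. That is what "still spreading, but not as rapidly" looks like in numbers.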

When examining the data, I asked, "When did the curve's slope change?" I found that it changed several times.


Time1 is before March 20. Time2 is from March 20 - April 7. Though this was during the lockdown, the disease was actually spreading more rapidly than before. However, it can take up to 2 weeks before symptoms appear, so this increase reflected cases that had started earlier. Time3 is April 8 - April 29. Here we see the curve flattening: the coefficients (circled in red) started shrinking. However, they were still positive. Overall, that means that the virus was spreading but the growth rate was slowing down. Time4 is April 30 - May 9. Some states had started reopening by then. Nevertheless, the growth rate kept slowing down. Right now, we are in Time5: May 10 - present. The growth rate is even slower.

This analysis does *not* prove that the reopenings were a success or that shutdowns were unnecessary. A longer shutdown would have driven down the growth rates more quickly, but we don't know if that benefit outweighs the economic costs. That question is too complicated for a single blog post. I have one piece of the puzzle: the curve is still flattening in spite of the reopenings. Other researchers will have to fill in the rest.

Friday, May 15, 2020

K+Q vs. K+R: How to Win This Tricky Ending


Over the last few weeks, I have been exploring ChessTempo's Endgame Benchmark problems. Most of them weren't too hard and my rating rose steadily. But then I ran into a bunch of Queen vs. Rook exercises. They were extremely difficult. I don't think most players realize how tough they are. Successfully solving them would often take more than an hour plus a bit of "luck." (I.e., the computer playing a move that delays checkmate but is easy to refute). In a practical game, this could be even harder. The defensive techniques are very simple, you won't have hours and hours of time on the clock, and you'll be exhausted from the long battle that preceded the endgame. So here is my guide to winning this tricky ending.

Tip #1: *Always* watch out for stalemate

I was working on a problem, and the following position came up in my calculations. You can quickly verify that there is no safe square for the rook. Either it falls immediately or it falls a few moves later due to tactics. I was about to head for this position, but then I remembered to look for stalemate.

[Event "?"] [Site "?"] [Date "????.??.??"] [Round "?"] [White "New game"] [Black "?"] [Result "*"] [SetUp "1"] [FEN "8/8/1Q6/8/k7/8/1K6/2r5 b - - 0 1"] [PlyCount "1"] 1... Rc5 $1 {and the rook is immune. White's pieces don't coordinate very well and the win will be difficult.} *



The next position is well known.


[Event "?"] [Site "?"] [Date "????.??.??"] [Round "?"] [White "New game"] [Black "?"] [Result "*"] [SetUp "1"] [FEN "5k2/7r/4Q1K1/8/8/8/8/8 b - - 0 1"] [PlyCount "3"] [TimeControl "40/7200:20/3600:1800"] 1... Rg7+ $1 (1... Rh6+ $1 {also works}) 2. Kf5 (2. Kh6 Rh7+ $1) (2. Kf6 Rg6+ $1 {This idea is especially important to know. It comes up in a lot of variations}) 2... Rf7+ {The only way to escape the checks is to play 3.Ke5 or 3.Ke4. But in either case, Black draws immediately with 3...Re7.} *



Tip #2: Centralize the queen

We all know that a centralized queen controls more squares than a cornered queen. Thus, if you keep your queen in the center, she can often pick off the rook with a fork. So far I'm just stating the obvious. But if you combine the first 2 tips, you may learn something new: when you push back the defender, lead with your king, not your queen. To see this, change the last diagram just a little.


[Event "?"] [Site "?"] [Date "????.??.??"] [Round "?"] [White "New game"] [Black "?"] [Result "*"] [SetUp "1"] [FEN "5k2/7r/6K1/3Q4/8/8/8/8 b - - 0 1"] [PlyCount "4"] {The situation is completely different.} 1... Rg7+ (1... Rh6+ {doesn't work any more. Because White kept his queen in the center, there are no stalemates.} ) (1... Re7 {(to prevent 2.Qd8#)} 2. Qd8+ Re8 3. Qf6+) 2. Kf6 Re7 ({ Another important difference:} 2... Rg6+ {isn't stalemate. Again, there are no stalemates because the queen stayed in the center.}) 3. Qd6 {and the rook falls } *
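The stalemate claims in the diagrams above can be checked mechanically. Below is a minimal Python sketch, not a full chess engine: it only handles a bare Black king against White's king and queen (what is left once the rook has been captured), and all of the function names are made up for illustration.

```python
def sq(name):
    """Convert a square name like 'e6' to (file, rank) numbers from 0 to 7."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def adjacent(a, b):
    """True if two different squares touch (a king's move apart)."""
    return a != b and max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1


def queen_attacks(queen, target, blockers):
    """True if the queen sees `target` along a line not interrupted by `blockers`."""
    df, dr = target[0] - queen[0], target[1] - queen[1]
    if (df, dr) == (0, 0) or not (df == 0 or dr == 0 or abs(df) == abs(dr)):
        return False
    step = ((df > 0) - (df < 0), (dr > 0) - (dr < 0))
    f, r = queen[0] + step[0], queen[1] + step[1]
    while (f, r) != target:
        if (f, r) in blockers:
            return False
        f, r = f + step[0], r + step[1]
    return True


def is_stalemate(white_king, white_queen, black_king):
    """Black to move with a bare king: not in check and no legal move."""
    wk, wq, bk = sq(white_king), sq(white_queen), sq(black_king)

    def attacked(target):
        # Only the white king can block the queen; the black king is treated
        # as lifted off the board so it cannot hide behind its own square.
        return adjacent(wk, target) or queen_attacks(wq, target, {wk})

    if attacked(bk):
        return False  # in check, so it is mate or there is an escape
    for df in (-1, 0, 1):
        for dr in (-1, 0, 1):
            t = (bk[0] + df, bk[1] + dr)
            if t == bk or t == wk or not (0 <= t[0] < 8 and 0 <= t[1] < 8):
                continue
            if t == wq:
                if not adjacent(wk, wq):
                    return False  # the queen is undefended and can be taken
                continue
            if not attacked(t):
                return False
    return True


# Tip #1: positions after White captures the rook.
print(is_stalemate("b2", "c5", "a4"))  # after 1...Rc5 2.Qxc5 -> True
print(is_stalemate("h6", "e6", "f8"))  # after 1...Rh6+ 2.Kxh6 -> True
print(is_stalemate("g6", "e6", "f8"))  # after 2.Kf6 Rg6+ 3.Kxg6 -> True
# Tip #2: the same king positions, but with the queen on d5.
print(is_stalemate("h6", "d5", "f8"))  # False: e7 and e8 are available
```

The last line is the whole point of Tip #2: with the queen on d5 instead of e6, the Black king always has a flight square, so the rook sacrifices stop working.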



Tip #3: The Temporary Retreat

At first, it's easy to push the defender back. The tough part comes once they are near the edge: stalemate threats make further progress hard. The defender should keep the rook on the third rank. This simple idea will probably frustrate the attacker unless he knows about the Temporary Retreat.


[Event "?"] [Site "?"] [Date "????.??.??"] [Round "?"] [White "New game"] [Black "?"] [Result "*"] [SetUp "1"] [FEN "8/8/8/8/3Q4/1k1K4/8/2r5 w - - 0 1"] [PlyCount "3"] [TimeControl "?"] 1. Qe5 $1 (1. Qb6+ {is tempting, since it pushes back the enemy king. But White's king cannot approach and there's no way to fork the rook.}) 1... Rd1+ 2. Ke2 {At first glance, White is losing ground. But Black's blockade quickly collapses. The rook can't hold the d-file because of forks (I'll let you work out the details). Trying to give checks from the side with 2...Rg1 or 2...Rh1 fails for the same reason. Returning to the c-file doesn't work either: 2... Rc1 3.Kd2 and the checks end shortly. The rest is straightforward.} *



Thus, the Temporary Retreat is very effective. However, it's so counterintuitive that you probably wouldn't find it during a game. Voluntarily giving up ground looks very unnatural; almost everyone will prefer 1.Qb6+ instead and then struggle to break the blockade. Now there is one more paradoxical idea to know, which brings us to Tip #4.


Tip #4: Letting the King escape


[Event "?"] [Site "?"] [Date "????.??.??"] [Round "?"] [White "New game"] [Black "?"] [Result "*"] [SetUp "1"] [FEN "8/8/8/4Q3/8/k2K4/1r6/8 w - - 0 1"] [PlyCount "25"] 1. Qa5+ {My first instinct was to approach with the king, but then I just get checked. So instead, I intentionally let Black's king escape from the a-file.} Kb3 2. Qa1 $3 {And now I violate Tip #2 and put my queen in the corner. At first, White's last 2 moves look terrible. But let's look deeper. While Black's king did escape from the a-file, he can't run any further - he's chained to the rook. Furthermore, the Black king is taking away squares from the rook. Black's pieces are badly misplaced and he's in zugzwang.} Rh2 (2... Ra2 3. Qc3+ Ka4 4. Kc4 {and it's very easy}) 3. Qc3+ Ka2 (3... Ka4 4. Qd4+ $1 { (centralization) and the rook drops off shortly:} Kb3 { (other moves lose the rook even faster. I'll let you verify)} 5. Qd5+ $1 Ka4 { (again, other moves lose the rook more quickly)} 6. Qa8+ $1 { followed by 7.Qb8+ and 8.Qxh2}) 4. Qe5 $1 {Centralizing the queen. It's hard to believe, but White can't fork the rook yet. Instead, I take away squares and prepare to advance my king.} Rb2 (4... Rh3+ 5. Kc2 { and the centralized queen denies Black ...Rh2+}) 5. Kc3 Kb1 { The rook can't move because of tactics} 6. Qe1+ Ka2 7. Qd1 $1 ({ Remember our earlier principle: lead with the king, not the queen.} 7. Qc1 $6 Rb3+ {and now White can't play} 8. Kc2 $4 {because of} Rc3+ $1 {. So White has to play 8.Kc4 and then he's driven back even further after 8...Rb4+!}) 7... Rh2 {Black is in zugzwang; all the rook moves lose quickly.} 8. Qa4+ Kb1 9. Qb5+ Ka2 10. Qa6+ Kb1 11. Qb6+ Ka2 12. Qa7+ Kb1 13. Qb8+ { winning the rook and the game} *



And that is how you win Queen vs. Rook. Stay healthy everyone!

Tuesday, April 14, 2020

The Curve is Almost Certainly Flattening

Almost all chess events have been cancelled due to the coronavirus, so instead we use our statistical tools to analyze the epidemic. To flatten the curve, the US has shut down much of the economy. Are these efforts succeeding? At first glance, it looks like we haven't made any progress.

Confirmed cases in the US






















(Source: CDC)

However, the trend is exponential, so it can be hard to spot changes in the growth rate. That's why we take logarithms:

Log of confirmed cases in the US






















Now there does appear to be a recent slowdown in the growth rate. I checked this with a simple model. New York State announced a lockdown on 3/20. Within a few days, other states with big outbreaks made similar announcements. It can take a week or two for an infected person to show symptoms. Thus, even if the social distancing is working, you might not see results until the end of March or early April. So does the curve flatten starting in late March/early April? Yes, it does. In the picture below, "time73" is for the curve after the first week of April. "Time" is for the whole curve. You can see that the number next to "time73" is substantially smaller - this means that the curve started to flatten in early April.

The next part of this post is more technical. If you haven't studied a lot of stats, this is a good time to stop reading. The bottom part of the screenshot shows that the slowdown in the growth rate is statistically significant. However, this test presumes that we had already known the exact date of the structural break. We didn't. It could have been anywhere from 1 week to 2 weeks after the lockdown. Andrews (1993) showed how to test for an unknown structural break (see Table 1 in this paper). The slowdown remains statistically significant even after making this adjustment. Thus, I conclude that the curve really is flattening.
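For readers who want to try this themselves, here is a rough Python sketch of the idea on synthetic data. It is not the code behind the screenshot. It assumes a linear spline, log(cases) = a + b*t + c*max(0, t - break), so the slope after the break is b + c, and it uses only the standard library. The helper names are made up.

```python
import math


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def ssr(X, y, beta):
    """Sum of squared residuals."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def break_f_stat(t, y, break_t):
    """F statistic for adding a slope change at break_t to a straight-line trend."""
    X0 = [[1.0, ti] for ti in t]                            # no break
    X1 = [[1.0, ti, max(0.0, ti - break_t)] for ti in t]    # slope change
    ssr0 = ssr(X0, y, ols(X0, y))
    beta1 = ols(X1, y)
    ssr1 = ssr(X1, y, beta1)
    return (ssr0 - ssr1) / (ssr1 / (len(t) - 3)), beta1


def sup_f(t, y, candidates):
    """Largest F statistic over the candidate break dates, and the date that gives it."""
    return max((break_f_stat(t, y, c)[0], c) for c in candidates)


# Synthetic log-case series (not real data): slope 0.30 until t = 20,
# then 0.10, plus a small deterministic wiggle standing in for noise.
t = list(range(30))
y = [1 + 0.3 * min(ti, 20) + 0.1 * max(0, ti - 20) + 0.005 * math.sin(ti)
     for ti in t]

_, (a, b, c) = break_f_stat(t, y, 20)
print(f"slope before: {b:.2f}, slope after: {b + c:.2f}")

stat, best = sup_f(t, y, range(15, 26))
print(f"largest F = {stat:.1f} at t = {best}")
```

Because the break date is chosen to maximize the statistic, the largest F must be compared with the critical values tabulated by Andrews (1993), not with the usual F table. The sketch only computes the statistic; it does not include those critical values.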

Though I looked for a structural break based on when the social distancing started, the test does not prove that the lockdown caused the curve to flatten. It seems likely to be true, but showing that is more complicated. I guess I would start by comparing the results for different states that began the lockdown at different times (while being aware of the endogeneity). I don't have time to explore that, but hopefully one of my readers does. Write a comment with your results. My analysis also does not show if the shutdown was worth it. That would be a separate project. 

That's all for today. Stay safe, everyone!


Sunday, March 15, 2020

Candidates Tournament

Caruana is the favorite, but the opposition is very strong - don't be surprised if someone else wins.


Monday, February 10, 2020

Thursday, January 9, 2020

Tata Steel 2020

Magnus Carlsen is the top seed in the 14-player round robin. Here is the forecast