Monday, May 30, 2022
Updated forecast after Tari replaced Rapport
Friday, May 20, 2022
Norway Chess 2022
Anand returns to classical chess! Though many players are skipping the tournament in order to focus on the Candidates, the field is still very strong.
Saturday, May 7, 2022
Are 2700s overrated? Insights from the new model
In my last post, I described the new model. Now we will look at more results, though this time I will try to stay away from all the technical jargon.
If you perform better than your expected score, your rating goes up. But is the expected score formula accurate? The model can tell us. Here is the graph for a 2650 player facing opponents rated from 2600 to 2900. When a 2650 plays another 2650, the expected score is 0.5. Not surprising - when you play someone with the same rating, you have equal chances. The model and Elo's formula agree on that. But then the two lines diverge. On the far left, we have a 2650 playing a 2600. Elo's formula ("theory") says that the 2650's expected score is about 0.57. But in the data, the 2650 scores slightly worse - roughly 0.55. This means that in real life, the 2650 will, on average, lose rating points when playing weaker players. On the other side of the graph, the pattern reverses: when facing stronger opponents, 2650s perform better than their rating predicts and gain points.
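For reference, the "theory" line in that graph is just Elo's expected-score formula, E = 1 / (1 + 10^((Rb - Ra) / 400)). Here is a small Python sketch (not part of the model, just the textbook formula) that reproduces the theoretical curve for the example above:

def elo_expected_score(rating_a, rating_b):
    # Standard Elo expected score of player A against player B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# The example from the text: a 2650 facing opponents rated 2600 to 2900.
for opp in range(2600, 2901, 50):
    print(f"2650 vs {opp}: theory says {elo_expected_score(2650, opp):.3f}")

# 2650 vs 2600 gives about 0.571 - the ~0.57 "theory" value above - while
# the empirical curve estimated by the model sits closer to 0.55 there.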
Friday, May 6, 2022
New Methodology
Before the Carlsen-Nepo match, I had noticed that my model was underestimating the draw rate. I improved it before the World Championship, but there was more work to do. The original model was based on a large database, and most of the games involved strong amateurs, such as 2300s and 2400s. Elite tournaments were a tiny minority, so the model was not focused on them. However, my forecasts are always for top tournaments. First, I needed a better sample: I kept only games where both players were 2500+. Then I estimated the model on games from 2010-2021 where at least one player was 2700+.
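To make the filtering step concrete, here is a minimal pandas sketch. The file name and the column names (white_elo, black_elo, year) are hypothetical - the post does not describe the database format:

import pandas as pd

games = pd.read_csv("games.csv")  # hypothetical file and column names

# Step 1: keep only games where both players are rated 2500+.
sample = games[(games["white_elo"] >= 2500) & (games["black_elo"] >= 2500)]

# Step 2: estimation sample - 2010-2021 games with at least one 2700+ player.
estimation = sample[
    sample["year"].between(2010, 2021)
    & ((sample["white_elo"] >= 2700) | (sample["black_elo"] >= 2700))
]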
The rest of the details are more technical. Each chess game has three possible outcomes (win, draw, loss), which suggests that an ordered logit model is appropriate. However, several sources said that the proportional odds assumption is often violated in the data, so instead I used a generalized ordered logit (a rough sketch of the specification follows the variable list below). The explanatory variables were:
-the rating gap (i.e. White's rating - Black's rating)
-the average rating of the two players
-the year in which the game was played
-"elite": this variable equals 1 if one of the players is 2750+ and other other isn't. Otherwise it equals 0. This variable could matter if 2750's are overrated. Normally if a player is overrated, then their rating should adjust. When their rating is above their true strength, they won't be able to perform well enough to maintain it. However, top players mostly compete in round robins against each other. They don't mix with the rest of the pool. If overrated players face each other, then the ratings don't adjust.
-interaction and squared terms (gap x avg, gap x year, avg x year, elite x gap, elite x avg, elite x year, gap^2, avg^2, year^2)
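Below is a rough Python sketch of this specification, with two caveats: the column names and result coding are assumptions (the post does not give them), and since statsmodels has no built-in generalized ordered logit, the sketch approximates one by fitting a separate binary logit for each cumulative split of the outcome, so the coefficients are free to differ across thresholds rather than being forced to satisfy proportional odds. The software actually used to estimate the model is not specified in the post.

import statsmodels.api as sm

df = estimation.copy()  # the 2500+/2700+ sample from the earlier sketch
# Assumed result coding: 1 = White win, 0.5 = draw, 0 = Black win.

df["gap"] = df["white_elo"] - df["black_elo"]
df["avg"] = (df["white_elo"] + df["black_elo"]) / 2
df["elite"] = (
    ((df["white_elo"] >= 2750) & (df["black_elo"] < 2750))
    | ((df["white_elo"] < 2750) & (df["black_elo"] >= 2750))
).astype(int)

# Centering avg and year keeps the squared and interaction terms well scaled.
df["avg"] -= 2700
df["year"] -= 2015

X = df[["gap", "avg", "year", "elite"]].copy()
X["gap_x_avg"] = X["gap"] * X["avg"]
X["gap_x_year"] = X["gap"] * X["year"]
X["avg_x_year"] = X["avg"] * X["year"]
X["elite_x_gap"] = X["elite"] * X["gap"]
X["elite_x_avg"] = X["elite"] * X["avg"]
X["elite_x_year"] = X["elite"] * X["year"]
X["gap_sq"] = X["gap"] ** 2
X["avg_sq"] = X["avg"] ** 2
X["year_sq"] = X["year"] ** 2
X = sm.add_constant(X)

# One binary logit per cumulative threshold; the proportional odds
# restriction is not imposed because each threshold gets its own betas.
fit_draw_or_better = sm.Logit((df["result"] >= 0.5).astype(int), X).fit(disp=0)
fit_white_win = sm.Logit((df["result"] == 1.0).astype(int), X).fit(disp=0)

print(fit_draw_or_better.summary())
print(fit_white_win.summary())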
The terms involving elite were statistically significant (p-value = 0.0004). Here are the results (N = 25,534 games):