Before the Carlsen-Nepo match, I had noticed that my model was underestimating the draw rate. I improved it before the World Championship, but there was more work to do. The original model was based on a large database in which most games involved strong amateurs rated in the 2300s and 2400s. Elite tournaments were a tiny minority, so the model was not tuned to them, yet my forecasts are always for top tournaments. First, I needed a better sample: I only kept games where both players were rated 2500+. Then I estimated the model on games from 2010-2021 in which at least one player was rated 2700+.
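A minimal Python sketch of the sample filter, assuming each game record carries both players' ratings at the time of the game and the year it was played. The `Game` class and the `in_sample` / `filter_games` functions are hypothetical names, and the sketch reads the two filters as applied together (both players 2500+, at least one 2700+, played 2010-2021).

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record layout: ratings at the time of the game.
    white_elo: int
    black_elo: int
    year: int
    result: str  # "1-0", "1/2-1/2" or "0-1"


def in_sample(g: Game) -> bool:
    """True if both players are 2500+, at least one is 2700+, and the game is from 2010-2021."""
    both_2500 = min(g.white_elo, g.black_elo) >= 2500
    one_2700 = max(g.white_elo, g.black_elo) >= 2700
    return both_2500 and one_2700 and 2010 <= g.year <= 2021


def filter_games(games: list) -> list:
    # Keep only the games that meet every condition above.
    return [g for g in games if in_sample(g)]
```

For example, a 2750 vs 2480 game is dropped by the 2500 floor even though one player is above 2700.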
The rest of the details are more technical. Each chess game has three possible outcomes (a win for Black, a draw or a win for White), which suggests that an ordered logit model is appropriate. However, several sources said that the proportional odds assumption is often violated in practice, so I used a generalized ordered logit instead. The explanatory variables were:
-the rating gap (i.e. White's rating - Black's rating)
-the average rating of the two players
-the year in which the game was played
-"elite": this variable equals 1 if exactly one of the players is 2750+, and 0 otherwise. It could matter if 2750s are overrated. Normally, an overrated player's rating should adjust: when their rating is above their true strength, they won't be able to perform well enough to maintain it. However, top players mostly compete in round robins against each other rather than mixing with the rest of the pool. If overrated players face each other, their ratings don't adjust.
-interaction and squared terms (gap x avg, gap x year, avg x year, elite x gap, elite x avg, elite x year, gap^2, avg^2, year^2)
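To make the specification concrete, here is a minimal Python sketch of how the variables above could be built and fed into a three-outcome generalized ordered logit. The function names, the outcome ordering (Black win < draw < White win) and the coefficient inputs are assumptions for illustration; the fitted coefficients are not reproduced here. Each of the two cumulative equations gets its own slope vector, which is what relaxes the proportional odds assumption.

```python
import math

FEATURE_NAMES = [
    "gap", "avg", "year", "elite",
    "gap*avg", "gap*year", "avg*year",
    "elite*gap", "elite*avg", "elite*year",
    "gap^2", "avg^2", "year^2",
]


def features(white_elo: float, black_elo: float, year: float) -> list:
    """Build the explanatory variables listed above for one game."""
    gap = white_elo - black_elo
    avg = (white_elo + black_elo) / 2
    # elite = 1 when exactly one of the two players is rated 2750+
    elite = 1.0 if (white_elo >= 2750) != (black_elo >= 2750) else 0.0
    return [
        gap, avg, year, elite,
        gap * avg, gap * year, avg * year,
        elite * gap, elite * avg, elite * year,
        gap ** 2, avg ** 2, year ** 2,
    ]


def logistic(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gologit_probs(x: list, alphas: list, betas: list) -> tuple:
    """Return (P(Black wins), P(draw), P(White wins)) under a 3-category gologit.

    Equation j (j = 0, 1) gives P(Y > j) = logistic(alphas[j] + x . betas[j]),
    with the outcomes ordered Black win < draw < White win (hypothetical convention).
    """
    p_above = [
        logistic(a + sum(xi * bi for xi, bi in zip(x, b)))
        for a, b in zip(alphas, betas)
    ]
    p_black = 1.0 - p_above[0]
    p_draw = p_above[0] - p_above[1]
    p_white = p_above[1]
    return p_black, p_draw, p_white
```

If `betas[0] == betas[1]`, this reduces to an ordinary ordered logit. Because the two equations are not forced to be parallel, `p_draw` can come out negative for extreme inputs, a known caveat of the generalized ordered logit.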
The terms involving elite were statistically significant (p-value = 0.0004). Here are the results (N = 25534 games):