e4stat: June 2018

Thursday, June 14, 2018

"Is Your Opponent Underrated?" methodology

The complete list of tournaments used in the sample:

Atlantic Open 2017 (U2100, U1900, U1700, U1500)
Bradley Open 2017 (U2100, U1800, U1500)
Cherry Blossom Classic 2018 (U2200, U1900, U1600)
Chesapeake Bay Open 2018 (U2200, U1800, U1600)
Chicago Open 2018 (U2300, U1900, U1700, U1500)
Chicago Class 2017 (Expert, A, B, C)
National Chess Congress 2017 (U2200, U2000, U1800, U1600, U1400)
Continental Open 2017 (U2100, U1900, U1700, U1500)
US Amateur East Individual 2018 (U2200, U1800, U1400)
Eastern Class 2018 (Expert, A, B, C)
Eastern Chess Congress 2017 (U2100, U1900, U1700, U1500)
Evans Memorial 2017 (Expert, A, B, C)
George Washington Open 2017 (U2100, U1800, U1500)
Kings Island Open 2017 (U2100, U1900, U1700, U1500)
Liberty Bell Open 2018 (U2100, U1900, U1700, U1500)
Manhattan Open 2017 (U2200, U2000, U1800, U1600, U1400)
National Open 2017 (U2300, U2100, U1900, U1700, U1500)
North American Open 2017 (U2300, U2100, U1900, U1700, U1500)
North Eastern Open 2017 (U2050, U1650)
Pacific Coast Open 2017 (U2100, U1900, U1700, U1500)
Pan-American Intercollegiate 2017
Philadelphia Open 2018 (U2200, U2000, U1800, U1600, U1400)
Potomac Open 2017 (U2300, U2100, U1900, U1700, U1500)
Southern Open 2017 (U2100, U1800, U1500)
Southwestern Class 2018 (Expert, A, B, C)
US Amateur Team North 2018
US Open 2017
World Open 2017 (U2200, U2000, U1800, U1600)
World Amateur Team 2017

Total = 22,828 games
________________________________________________________________

You can see that I often dropped the open section. Why is that? GMs and IMs frequently travel to large tournaments, so their opponents come from all over the country. Thus, their ratings will probably not be affected by local inflation or deflation. Regional differences in ratings will only be detected among players who compete primarily in local tournaments, as explained in the article. I also typically dropped the bottom sections of the tournaments, since ratings are very volatile at those levels.

Define the "Elo residual" to be (actual score - expected score). Let "D" be (player A's rating - player B's rating), i.e., the difference in the ratings. Then Player A's expected score against Player B is:

Expected score = 1 / (1 + 10^(-D/400))

This comes from Section 4.2 of "The US Chess Rating System" by Glickman and Doan (April 24, 2017).

Example: In the article, I said that a 1700's expected score against a 1500 is about 0.75. Let's suppose that the 1700 won the game. In that case, his actual score is 1. Therefore, his Elo residual = actual score - expected score = 1 - 0.75 = 0.25. We can say that the 1700 scored 0.25 more points than expected. On average, the Elo residual is zero.
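The arithmetic in this example can be checked with a short Python sketch. It assumes the standard US Chess winning-expectancy formula, 1 / (1 + 10^(-D/400)), from the Glickman and Doan document cited above. The function names are hypothetical, chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Player A's expected score: 1 / (1 + 10^(-D/400)), with D = rating_a - rating_b."""
    d = rating_a - rating_b
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


def elo_residual(actual_score: float, rating_a: float, rating_b: float) -> float:
    """Actual score minus expected score, from Player A's point of view."""
    return actual_score - expected_score(rating_a, rating_b)


# A 1700 beats a 1500 (actual score = 1).
print(round(expected_score(1700, 1500), 2))     # 0.76
print(round(elo_residual(1.0, 1700, 1500), 2))  # 0.24
```

The unrounded expected score is closer to 0.76 than to 0.75, so the residual for a win is a shade under the 0.25 quoted in the example. Note also that the two players' expected scores always sum to 1, so their residuals in any single game are equal and opposite.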

This brings us to a crucial idea. If a player is truly underrated, then he should score better than his rating indicates when he goes to a national tournament. In other words, his Elo residual should be positive. If players from a certain state consistently have positive Elo residuals, then that state is underrated.

I created dummy variables for all 50 states, British Columbia, Ontario, and "other" (for foreign players). For Player A, the dummy variable for a state equals 1 if Player A is from that state. If Player A's opponent is from that state, then the dummy equals -1. Exception: if both players are from the same state, then the dummy is zero. In all other cases, the dummy is zero.

Example: Suppose that Player A is from Montana. He faces an opponent from Kansas. The dummy variable for Montana equals 1. The dummy variable for Kansas is -1. The dummy variables for all the other states are 0.
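The coding rule above can be sketched in a few lines of Python. The function name and the short list of states are hypothetical; the real model uses all 50 states plus British Columbia, Ontario, and "other".

```python
def state_dummies(state_a: str, state_b: str, states: list) -> dict:
    """Dummy variables for one game, seen from Player A's side."""
    dummies = {s: 0 for s in states}
    if state_a != state_b:            # same-state pairings leave every dummy at 0
        if state_a in dummies:
            dummies[state_a] = 1      # Player A's home state
        if state_b in dummies:
            dummies[state_b] = -1     # the opponent's home state
    return dummies


STATES = ["KS", "MT", "WA"]  # hypothetical short list, for illustration only

print(state_dummies("MT", "KS", STATES))  # {'KS': -1, 'MT': 1, 'WA': 0}
print(state_dummies("MT", "MT", STATES))  # {'KS': 0, 'MT': 0, 'WA': 0}
```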

The first step would be to estimate a linear regression of the following form:

Elo residual = (BetaAlabama)(Alabama dummy) + (BetaAlaska)(Alaska dummy) + (BetaArizona)(Arizona dummy)+ ... + epsilon

Here, "BetaAlabama" is the coefficient on the Alabama dummy; that is what the model is trying to estimate. If Alabama players are underrated, then "BetaAlabama" will be positive. A positive "BetaAlabama" means that Alabama players have positive Elo residuals, i.e., they outperform their ratings. It also means that players facing an Alabama opponent will tend to underperform. "BetaAlaska" is the coefficient on the Alaska dummy, "BetaArizona" is the coefficient on the Arizona dummy, and so on for all the other states. As usual, epsilon is the disturbance term.
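To make the regression concrete, here is a minimal pure-Python sketch of ordinary least squares on a toy version of the problem. Everything in it is an assumption for illustration: there are only two state dummies (MT and KS), the "true" coefficients are made up, and the residuals are generated without noise so that the estimates can be checked by eye. It is not the software or the data behind the article.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical games. Columns: MT dummy, KS dummy (opponents from elsewhere get 0).
X = [[1, -1], [1, 0], [0, 1], [-1, 1], [-1, 0], [0, -1]]
true_beta = [0.05, -0.02]  # made-up values: MT underrated, KS slightly overrated
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]  # noise-free residuals

beta_hat = ols(X, y)
print([round(b, 3) for b in beta_hat])  # [0.05, -0.02]
```

With real data the residuals are noisy, so the estimates come with standard errors, and a statistics package would be the sensible choice rather than hand-rolled elimination.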

However, there is one issue with this approach. Consider someone who plays 9 games in the US Open. Each game is one observation in my sample. However, those observations might not be independent, which can lead to - how do I say this in plain English? - let's say it leads to problems. If you don't have a stats background and you have read this far, I admire your persistence and curiosity. Unfortunately, the rest of this article isn't going to make much sense if you haven't taken - at the very least - an advanced undergrad course in stats. Preferably a graduate level course.

To correct for these issues, I can insert fixed effects or random effects for each player in each tournament:

EloResidual_ij = Delta + (BetaAlabama)(Alabama dummy) + (BetaAlaska)(Alaska dummy) + ... + alpha_i + epsilon_ij

Delta is the intercept and alpha_i is the fixed or random effects term for player i. EloResidual_ij is the Elo residual in the game between player i and player j. Due to multicollinearity, the dummy for "other" was dropped.

A Lagrange multiplier test soundly rejected the null of no random effects (test statistic = 279.13, p-value = 0.0000). Then I performed a Hausman test (test statistic = 43.87, p-value = 0.7164); the assumptions of the random effects model were not rejected. Therefore, I based my results on the random effects model.

One last issue. The coefficients in the model above are in terms of the Elo residual. E.g., the intercept plus the coefficient for Washington state was about 0.05, which means that Washington players tend to score 0.05 points more than their ratings would indicate.

In order to convert this into rating points, I took a first order Taylor series approximation of the expected score formula. The rating adjustment for state k is the following.

This transformation was also applied to the confidence intervals in order to generate the second graph.
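Here is a minimal Python sketch of one such conversion. It assumes the Taylor expansion is centered at a rating difference of D = 0 (the post does not say where it is centered) and uses the US Chess expected score formula 1 / (1 + 10^(-D/400)). At D = 0 the slope of that curve is ln(10)/1600, so a small Elo residual divided by this slope gives an approximate rating adjustment. The function name and the confidence interval endpoints are hypothetical.

```python
import math

# Slope of E(D) = 1 / (1 + 10^(-D/400)) at D = 0:
# dE/dD = (ln 10 / 400) * E * (1 - E), and E(0) = 0.5.
SLOPE_AT_ZERO = math.log(10) / 1600


def residual_to_rating_points(residual: float) -> float:
    """First-order approximation: rating points implied by an Elo residual."""
    return residual / SLOPE_AT_ZERO


# Washington: intercept plus coefficient of about 0.05.
print(round(residual_to_rating_points(0.05)))  # 35

# The same map applied to hypothetical confidence interval endpoints.
low, high = 0.01, 0.09
print(round(residual_to_rating_points(low)), round(residual_to_rating_points(high)))  # 7 63
```

Under these assumptions each 0.01 of Elo residual is worth roughly 7 rating points, so the Washington figure of 0.05 corresponds to about 35 points.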

________________________________________________________________

UPDATE: It looks like some of the graphs had to be cut from the original article. It's probably because the magazine was running out of space. The figures below show the 95% confidence intervals for each state. Here is an example of how to interpret them. Take the first state, Alabama (AL). Our best estimate is that their players are underrated by about 30 points. However, they could be anywhere from 10 points overrated to as much as 70 points underrated. There is a fair amount of uncertainty with smaller states. That is because their sample sizes are small. I get a much bigger sample from states like California (CA), so their confidence interval is not so wide.

------------------------------------------------------------------------------------------------
UPDATE2: Due to a software upgrade, I can now estimate a nonlinear model with random effects.

Though this bypasses the need for a linear approximation, there are other challenges. A key assumption is that the random effects are uncorrelated with the regressors. In the linear model, I verified that with a Hausman test. That doesn't work for nonlinear models because the fixed effects estimator is not consistent. The results for the nonlinear model are in the graphs below, but be aware that they rest on an assumption that I haven't been able to test.

Saturday, June 9, 2018

Methodology

Links to all the nerd stuff:

MinStrength Methodology

Methodology for "Is Your Opponent Underrated?"

How the forecasts are generated (before May 2022)

Revised Methodology (May 2022)

Methodology for my Chess Life article "Reforming the Candidates Cycle"

Bonus material to "Reforming the Candidates Cycle". In order to save space, some interesting parts had to be cut.

Friday, June 8, 2018

World Championship Forecast Update

The Norway tournament began badly for Fabi fans: Caruana lost to Magnus Carlsen. But then Carlsen lost to So, and Caruana surpassed him in the end. The rating gap narrowed slightly, so a new forecast is in order:

Methodology

In other news, Jakovenko won the Poikovsky Karpov tournament. I first became a Jakovenko fan when I was studying the 6.Be2 Najdorf; he won a number of instructive games with the White pieces in that line. Gelfand returned to the 2700 club.

Thursday, June 7, 2018

Rook endings at the Chicago Open

Improvement is possible. E4stat spent a decade in Class A, but finally became an expert in 2016. Then at the 2018 Chicago Open, I crossed the 2100 threshold. Today's lesson is about two instructive rook endings from the tournament. You may have read about the Lucena position, but do you really know it? Could you still win against a different defense that tries to stop you from building the bridge? Unfortunately, my opponent could. Later, in Round 6, I was tested on the famous Alekhine-Capablanca ending. If you are unaware of that game, click here! Or look it up in Alexander Alekhine's "My Best Games of Chess" to see it with his annotations.


[Event "Chicago Open"] [Site "Wheeling"] [Date "2018.05.26"] [Round "2"] [White "Wilson, Matthew"] [Black "Eckert, FM Doug"] [Result "0-1"] [ECO "C95"] [WhiteElo "2093"] [BlackElo "2238"] [Annotator "Wilson,Matthew"] [SetUp "1"] [FEN "R7/8/8/8/3pk3/4r1p1/6K1/8 b - - 0 62"] [PlyCount "43"] 62... d3 $6 {Black is still winning after this move, but it does allow White to prolong the battle.} 63. Rd8 $1 { Black is in zugzwang! But the FM doesn't lose his cool.} Re2+ 64. Kxg3 Ke3 { Heading for the Lucena position, an endgame that you must know if you're serious about chess.} 65. Re8+ Kd2 66. Rd8 Re7 67. Kf3 Rf7+ 68. Kg3 (68. Ke4 Ke2 69. Rxd3 Re7+ $1 70. Kd4 Rd7+ {wins the rook}) 68... Rf6 {Black would like to park the rook on f5 in order to build the famous Lucena bridge. The reason that I played 68.Kg3 was to kick the rook away with Kg4 if it ever came to f5. So instead Black puts the rook on f6, but that's too far away for his construction plans to work.} 69. Rd7 Ke2 70. Re7+ Kd1 71. Rd7 d2 72. Rc7 Rf5 73. Rc8 ({If White tries to sabotage the bridge with} 73. Kg4 {, then} Rf1 $1 { wins easily.} 74. Kg3 Ke2 75. Re7+ Kd3 76. Rd7+ Kc3 77. Rc7+ Kb4 { and the checks are certainly not perpetual. So I tried a different plan.}) 73... Ke2 74. Re8+ Kd3 75. Rd8+ Ke3 76. Rd7 (76. Re8+ Kd4 77. Rd8+ Rd5 { is the key idea in the Lucena. Instead, I keep my rook on the d-file, since the pawn is not threatening to promote at this moment.}) 76... Rg5+ $1 { Pushing my king back.} 77. Kh3 Ke2 78. Re7+ Kf3 {Threatening mate and promotion } 79. Rf7+ $1 {Foiling Black's plan, but he has other ways to win} Ke2 80. Re7+ Kd1 81. Rc7 { Or else 81...Rc5 followed by 82...Kc1 and Black wins without any difficulties} Re5 (81... Rg1 82. Kh2 Re1 83. Kg2 Ke2 84. Re7+ Kd3 85. Rd7+ Kc3 86. Rc7+ Kb4 { is also good enough. White quickly runs out of checks}) 82. Kg3 Ke1 83. Kf4 Re8 ({Ending my last hope. After} 83... d1=Q 84. Kxe5 { I could test if he knew how to win queen vs. rook}) 0-1

This is the game that pushed me over the 2100 barrier: a win against a National Master.

[Event "Chicago Open"] [Site "Wheeling"] [Date "2018.05.28"] [Round "6"] [White "Wilson, Matthew"] [Black "Shanmugasundaram, NM Raj"] [Result "1-0"] [ECO "B45"] [WhiteElo "2093"] [BlackElo "2283"] [Annotator "Wilson,Matthew"] [SetUp "1"] [FEN "6k1/3n1p2/p1r5/5bp1/2B5/PP3P2/1B1R1KP1/8 w - - 0 38"] [PlyCount "67"] {My clock ticked down to 12 seconds. I went for the combination.} 38. g4 $1 (38. Rd5 Rc5 { is less convincing}) 38... Be6 39. Bxe6 fxe6 40. Rxd7 Rc2+ 41. Ke3 Rxb2 42. Rd3 {The dust has settled after the time scramble. I was quite confident here. I'm up a pawn. I don't have any weaknesses, but Black does.} Kf7 43. b4 {I had vague ideas of Rd3-d6xa6, but those variations never seem to work. 43.Kd4 was simpler, but Stockfish thinks the two moves are about equally good.} a5 { White threatens to raid the queenside, so Black figures that it's better to trade his a-pawn than it is to risk losing it for nothing.} (43... Kf6 44. Kd4 Rc2 45. Rc3 Rf2 {If Black trades rooks, then he loses the pawn ending, even if he inserts 45...e5+ first} (45... Rd2+ 46. Kc5 Ke5 47. a4 $1 {is similar}) 46. a4 $1 Ra2 47. a5 Rf2 48. Kc5 Ke5 { Here my notes continue with 49.Kb6, which is good enough, but Stockfish's} 49. b5 $1 {wins easily}) (43... Ke7 44. Kd4 Kd6 $4 {falls into an unusual trap:} 45. Kc3+ $1) 44. bxa5 Rb5 45. a4 $1 ({Maybe} 45. a6 Ra5 46. Rd6 {also works, but I was nearly certain that the text was winning. Know your classics! After.. .}) 45... Rxa5 46. Ra3 {...White has an improved version of the famous Alekhine-Capablanca ending. Unlike in that game, here White can create a second passer and Black's pawns are more vulnerable.} Ke7 47. Kd4 Kd6 48. Kc4 Re5 (48... Kc6 49. Kb4 Kb6 50. Re3 $1 ({During the game, I was planning on} 50. Rd3 {, but then} Re5 {and the rook has found the ideal defensive post}) 50... e5 51. Rd3 $1 {and Black collapses}) 49. a5 Kc7 (49... Rc5+ 50. Kd4 Rc7 51. a6 Ra7 52. Ra5 Ke7 53. Ke5 {Zugzwang!}) 50. a6 Kb8 51. 
Kd4 {My original intention was to insert 51.a7+ Ka8 here. However, in some lines, White can benefit from keeping the pawn safe on a6, where it doesn't have to be defended. E.g., 51. Kd4 Re1 52.Re3 and it will take Black's king two tempi to snap up the pawn.} Rd5+ 52. Ke4 Rb5 {Stockfish announces mate in 43!} (52... Ka7 53. f4 gxf4 54. Kxf4 {White will simply march the g-pawn up the board. There isn't much that Black can do about it.}) 53. Ra4 Rb1 (53... e5 54. Kf5 Ka7 55. Kxg5 e4+ 56. Kf4 $18) 54. Ke5 Re1+ 55. Kf6 Re3 56. Kxg5 Rxf3 57. Re4 $1 {The rest is straightforward, even with just a minute left on my clock (there was a 10 second delay)} Kc7 58. Rxe6 Kd7 59. Rf6 Ra3 60. Kg6 Ke8 61. g5 Ra5 62. Kh6 Ke7 63. Rb6 Ra1 64. g6 Rh1+ 65. Kg7 Ra1 66. Kh7 Rh1+ 67. Kg8 Rc1 68. g7 Rc2 69. a7 Ra2 70. Kh7 Rh2+ 71. Rh6 1-0