Tuesday, October 25, 2022

The 2022 US Memory Championship

I qualified comfortably. The finals were scheduled for October in the Orlando area. After five events, one of the twelve finalists would be crowned national champion. I slept well, but I was nervous on the day of the event. There was tension in my chest. My palms sweated when I was holding the microphone. Each year I feel more pressure since I keep improving and expect more from myself.


The first event was Pinball Recall, a Lumosity game (described here in last year's post). It would determine seeding. The finalist with the highest score got to pick their seat. In the later events, the person in the first seat was asked the first question, the person in the second seat was asked the second question, etc. Being in the last seat is a slight advantage, since there is a chance that someone will be eliminated before you are questioned again. However, I did not consider seeding to be very important, so I neglected this event. My preparation consisted of playing the game once on the morning of the championship. I beat my high score, so I figured I was in good shape. Then I finished in last place. Not a good start. I was in the 4th seat.


Event #2: Words. We had 15 minutes to memorize a list of 200 words. Two mistakes and you are eliminated. The round would continue until three of the twelve finalists were knocked out. My plan was to memorize 120 words. It's important to be calm when memorizing and I wasn't calm. Plus, some of the words were tough, so I decided to do just 100. I didn't need to know all 200; I just had to know more than the three weakest competitors. The finalist in the first seat had to recite the first word. Then the finalist in the second seat had to say the second word, etc. But if someone made a mistake, the next person had to correct them. E.g. if the tenth finalist screws up the 10th word, now the 11th finalist has to say the 10th word, then the 12th finalist has to say the 11th word, and so on. So you can't just memorize words 1, 13, 25, etc. I did all of the first 100. I was sailing through until around the 60th word. I had visualized "riptide - image" on a table in Ophelia Parish hall and then there was "autonomy - house" on a couch. Then when we were recalling the words on stage, I skipped the table and went straight to the couch. I almost never do this. The pressure got to me. Strike one for Matt. If I did that again, I would be knocked out.


Soon it was my turn again. The previous word was "omen" but I was drawing a blank on what came next. My time was ticking; we only have 15 seconds to answer. Was I going to finish in last place?! Then I saw a "k" in my mind's eye. "Knelt," I said. I dodged elimination. For the rest of the round, everything was under control and I advanced to the third event.


Event #3: Long Term Recall. On September 1, we were given a 12-page spreadsheet with trivia about WWE wrestlers, Fortune 99 companies, and space shuttles. I remember hating this event last year since it was so much work. But I had forgotten just how hard it was. I was spending about 2 hours a day memorizing and reviewing the information. Usually that meant working late into the night. It took so much time and energy away from training for the other events. But even after all that toil, I wasn't ready for the finals on October 1. Then Hurricane Ian forced the organizers to postpone the championship. I needed the extra three weeks, but I was still a bit shaky with the space shuttles. There were many repeated dates - more than one shuttle launched on a February 3 - but in different years, so it was easy to mix them up. There were discrepancies with the astronauts' names. E.g. in one part of the spreadsheet, there is a Frederick D Gregory but in another part, his name is Frederick G Gregory. Easy to confuse. I had no problems with the first two questions, which were about wrestlers and the Fortune 99 list. Then we got to the dreaded space shuttles. I found a new way to mess up: I was asked about mission STS-87. My mind jumped to a shuttle that launched on 8/7. Midway through, I realized that I was doing something wrong. But we only have 15 seconds to answer, and I didn't have time to fix it. Once again, I was on the cusp of elimination. But I wasn't the only one struggling with the shuttles. Three more finalists were knocked out and the round ended before I could be questioned again. Another event where I had made a dumb mistake and gotten away with it.


Event #4: The Tea Party. I was one of the six remaining finalists. We would watch five videos where a "tea party guest" told us their name, birthday, occupation, and much more. I knew this event would be extremely hard. To prepare, I had written a computer program that generated scripts and read them aloud. I can't memorize as fast as the guests can talk, even if they are speaking at a normal speed. My strategy was to skip some parts and focus on getting the rest correct. Everyone else would also be struggling, so maybe I could outlast them. I knew I didn't have to be perfect; I only had to be better than the bottom three finalists. By now, the ones in the first three seats had been eliminated, so I had moved up to the first chair. The first question was always about names, so I was extra careful when memorizing each guest's name. I skipped the next two pieces of information (birthday and hometown), since the other five finalists would probably be asked about those. But as in the words event, if a finalist misses a question, the next finalist has to correct them. Former champ John Graham was in the second seat and recited the birthday correctly. But the third finalist forgot the hometown. The fourth finalist had to correct them, but they didn't know the hometown either. Then the fifth finalist also screwed up the hometown. And the sixth finalist. Now it was my turn again. I confessed that I had skipped the hometowns. Strike one for Matt. But four others also had a strike, so I still had a chance. This event was like baseball: three strikes and you're out.


As expected, the rest of the field continued to struggle. It was my turn again, and I was asked about the guest's occupation. Job title, employer, and date that they started working. I knew it all. A few other finalists kept racking up strikes. Then I was asked about the guest's favorite sports team. I didn't completely skip that part, but I hadn't given it much attention. I knew one of them liked the Carolina Panthers, but I wasn't sure if that was a different guest. I tried it anyways and got it wrong. Strike two. But I only had to outlast the three weakest finalists. Two were eliminated. Then it was my turn again and I had to remember the guest's three favorite foods. This was another part that got minimal attention. I hoped the answer would spring to mind within the 15 seconds that we have to answer. I saw pineapple pizza, but that might have been for a different guest and there's no partial credit if you only get one of the three favorite foods. I was out. Fourth place - my best ever.


The last three finalists advanced to the double decker. I got to relax and watch from the audience. They had five minutes to memorize two decks of cards. 2018 champion John Graham was the favorite, but perhaps James Cumming (winner of the qualifier) could take him to tiebreaks. Two-time finalist Tracy Miller was also in contention. But Tracy stumbled early, mixing up the Queen of Clubs and the Queen of Spades. James corrected the mistake, but he didn't last much longer. John won his second championship and gave an emotional victory speech. Congrats John!


I'm honestly not sure I will compete in the US Championship again. I think I have the potential to do better than 4th place, but the Long Term Recall event was such torture. I pushed myself through those long nights of review by swearing that I would never do it again. Maybe if Long Term Recall were shorter, I would be willing. But I'm not retiring from memory competitions. Instead, I want to focus on tournaments that are internationally ranked. That will likely involve travelling to Europe or Asia, since those events are rare in America. Hopefully I can do that over the summer.

Sunday, October 2, 2022

Tuesday, August 30, 2022

Sinquefield Cup 2022

 Magnus is the favorite, but don't be shocked if someone else takes first - the field is extremely strong.



Thursday, August 11, 2022

Ratings and the Olympiad

In an earlier article (Are 2700s Overrated? Insights From the New Model), I showed that 2700s underperform when they face weaker opponents. For example, Elo's formula says that a 2750 is expected to score about 70% against a 2600. However, in the data, he is only scoring about 65%. Thus, if a 2750 plays 11 games against 2600s, he will probably lose 5.5 rating points (i.e. a shortfall of 0.05 points per game multiplied by 11 games multiplied by a K factor of 10). So we would expect members of the 2700 club to shed points in the Olympiad. Here is how they did:

Carlsen -3

So -2

Giri +4.3

Aronian -16.4

Caruana -18.3

Mamedyarov -0.5

Dominguez -9.4

Duda -9.1

Harikrishna -3.8

Maghsoodloo +11.9

Shankland -7.5

Vallejo +0.8

Vidit -3.7

Wojtaszek -14.8

(Data from 2700chess.com)


With a few exceptions, their ratings fell. However, it was especially bad for Team USA. "2700s tend to be overrated" doesn't fully explain their performance. Some other factor must be at work.
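
For anyone who wants to check the arithmetic behind the 5.5-point estimate, here is a minimal Python sketch. The function names are made up for illustration. The 65% observed score and the K factor of 10 are the figures quoted above, not new data.

```python
def elo_expected_score(rating, opponent_rating):
    """Expected score per game from Elo's logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def expected_rating_change(actual_rate, expected_rate, games, k_factor=10):
    """Rating change if a player scores actual_rate per game over `games` games."""
    return k_factor * games * (actual_rate - expected_rate)


theory = elo_expected_score(2750, 2600)  # roughly 0.70
rounded_change = expected_rating_change(0.65, 0.70, games=11)
unrounded_change = expected_rating_change(0.65, theory, games=11)

print(f"Elo expected score: {theory:.3f}")
print(f"Change using the rounded 70%: {rounded_change:+.1f}")
print(f"Change using the unrounded formula: {unrounded_change:+.1f}")
```

Using the unrounded expected score makes the projected loss slightly larger than 5.5 points, but the conclusion is the same.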


Sunday, June 12, 2022

Friday, May 20, 2022

Norway Chess 2022

 Anand returns to classical chess! Though many players are skipping the tournament in order to focus on the Candidates, the field is still very strong.



Saturday, May 7, 2022

Are 2700s overrated? Insights from the new model

In my last post, I described the new model. Now we will look at more results, though this time I will try to stay away from all the technical jargon.

If you perform better than your expected score, then your rating goes up. But is the expected score formula accurate? The model can tell us. Here is the graph for a 2650 player facing opponents from 2600 to 2900. When a 2650 plays another 2650, the expected score is 0.5. Not surprising - when you play someone with the same rating, you have equal chances. The model and Elo's formula agree on that. But then the two lines diverge. On the far left, we have a 2650 playing a 2600. Elo's formula ("theory") says that the 2650's expected score is about 0.57. But in the data, the 2650 scores slightly worse - roughly 0.55. This means that in real life, the 2650 will lose rating points if playing weaker players. On the other side of the graph, the pattern reverses. When facing stronger opponents, 2650s perform better than their rating and gain points.
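
The "theory" line can be reproduced directly from Elo's expected score formula. This is only a sketch of that one line; the model's line comes from the estimated regression and is not recreated here. The function and variable names are illustrative.

```python
def elo_expected_score(rating, opponent_rating):
    """Expected score from Elo's logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


PLAYER = 2650
# Opponent ratings from 2600 to 2900 in steps of 50, matching the graph's range
theory_line = {opp: elo_expected_score(PLAYER, opp) for opp in range(2600, 2901, 50)}

for opp, score in theory_line.items():
    print(f"{PLAYER} vs {opp}: {score:.3f}")
```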


The next graph is for a 2700. It is a similar story. 2700s underperform when facing weaker opponents. However, they do better than expected against stronger players.


It's also true for 2750s:


What does this mean for the forecasts? The new model is based on the data, so it will give more weight to the underdogs. The old model was based on Elo's formula; it was overestimating the favorites. We can see this when we revisit the Carlsen-Nepo match. Carlsen was the higher rated player, so the original model gave him higher chances.

Old model
Carlsen wins the match: 82.86%
Nepo wins the match: 9.115%
Drawn match: 8.025%

New model
Carlsen wins the match: 75.535%
Nepo wins the match: 13.56%
Drawn match: 10.905%
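
To see why a small shift in the per-game probabilities moves the match odds so much, here is a rough sketch that turns per-game win/draw/loss probabilities into match probabilities. It is not the model used for the numbers above. It assumes 14 independent games with the same probabilities in every game (ignoring colors), and the per-game inputs are invented for illustration.

```python
def match_outcome_probabilities(p_win, p_draw, p_loss, games=14):
    """Exact distribution of the favorite's match score, tracked in half-points."""
    dist = {0: 1.0}  # half-points scored so far -> probability
    for _ in range(games):
        nxt = {}
        for half_points, prob in dist.items():
            for gain, p in ((2, p_win), (1, p_draw), (0, p_loss)):
                key = half_points + gain
                nxt[key] = nxt.get(key, 0.0) + prob * p
        dist = nxt
    # With 2 * games half-points available, exactly `games` half-points is a tie
    win = sum(p for hp, p in dist.items() if hp > games)
    tie = dist.get(games, 0.0)
    loss = sum(p for hp, p in dist.items() if hp < games)
    return win, tie, loss


# Hypothetical per-game probabilities for the favorite (assumptions, not estimates)
win, tie, loss = match_outcome_probabilities(0.20, 0.70, 0.10)
print(f"Favorite wins: {win:.1%}, drawn match: {tie:.1%}, underdog wins: {loss:.1%}")
```

Nudging the per-game numbers slightly toward the underdog and rerunning shows how quickly the favorite's match chances shrink over 14 games.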

A word of caution about interpreting the results: the model was based on games with at least one 2700 player. Will a 2000-rated amateur gain points if they play up a section? I don't know. The model was designed for elite tournaments, not amateurs. 

I don't have a clear-cut answer for the question in the title. Are 2700s overrated? Yes - but only when playing weaker opponents. When playing stronger opponents, they are underrated. Should FIDE fix this problem by adjusting the expected score formula? It might be more complicated than that. There is going to be a feedback effect on the ratings. There are many rating systems that are superior to Elo; it would be better to switch to one of them instead.








Friday, May 6, 2022

New Methodology

 Before the Carlsen-Nepo match, I had noticed that my model was underestimating the draw rate. I improved it before the World Championship, but there was more work to do. The original model was based on a large database. Most of the games involved strong amateurs, such as 2300s and 2400s. Elite tournaments were a tiny minority, so the model was not focused on them. However, my forecasts are always for top tournaments. First, I needed a better sample. I only kept games where both players were 2500+. Then I estimated a model based on games where at least one player was 2700+ during the period 2010-2021.

The rest of the details are more technical. Each chess game has 3 possible outcomes; this suggests that an ordered logit model is appropriate. However, several sources said that the proportional odds assumption is often violated in the data. Instead, I used a generalized ordered logit. The explanatory variables were:

-the rating gap (i.e. White's rating - Black's rating)

-the average rating of the two players

-the year in which the game was played

-"elite": this variable equals 1 if one of the players is 2750+ and the other isn't. Otherwise it equals 0. This variable could matter if 2750s are overrated. Normally if a player is overrated, then their rating should adjust. When their rating is above their true strength, they won't be able to perform well enough to maintain it. However, top players mostly compete in round robins against each other. They don't mix with the rest of the pool. If overrated players face each other, then the ratings don't adjust.

-interaction terms (gap x avg, gap x year, avg x year, elite x gap, elite x avg, elite x year, gap^2, avg^2, year^2)
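
The variable list above can be sketched as a small function that builds one row of inputs for a single game. The function name and dictionary keys are invented for illustration. The post does not say whether the variables were rescaled or centered before estimation, so this sketch uses raw values.

```python
def build_features(white_rating, black_rating, year, elite_cutoff=2750):
    """Build the explanatory variables for one game (names are illustrative)."""
    gap = white_rating - black_rating
    avg = (white_rating + black_rating) / 2
    # elite = 1 only when exactly one of the two players is at or above the cutoff
    elite = int((white_rating >= elite_cutoff) != (black_rating >= elite_cutoff))
    return {
        "gap": gap,
        "avg": avg,
        "year": year,
        "elite": elite,
        "gap_x_avg": gap * avg,
        "gap_x_year": gap * year,
        "avg_x_year": avg * year,
        "elite_x_gap": elite * gap,
        "elite_x_avg": elite * avg,
        "elite_x_year": elite * year,
        "gap_sq": gap ** 2,
        "avg_sq": avg ** 2,
        "year_sq": year ** 2,
    }


row = build_features(white_rating=2760, black_rating=2700, year=2020)
print(row["gap"], row["avg"], row["elite"])
```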

The terms involving Elite were statistically significant (p-value = 0.0004). Here are the results (N = 25534 games):



Interpreting ordered logit coefficients is notoriously difficult. Based on this table, I can't say if Elite players are overrated or underrated, but the Elite variable is clearly significant. 
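
One way to make the coefficients easier to read is to convert them into win, draw, and loss probabilities for a specific pairing. In a generalized ordered logit, each cumulative split (loss vs. not loss, and not win vs. win) gets its own set of coefficients. The sketch below shows that conversion from White's point of view. The coefficient values are invented placeholders, not the estimates from the table, and the single rescaled "gap" input is an assumption to keep the example short.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probabilities(features, not_loss_coefs, win_coefs):
    """Win/draw/loss probabilities for White from two sets of coefficients."""
    def linear(coefs):
        return coefs["const"] + sum(coefs[name] * value for name, value in features.items())

    p_not_loss = logistic(linear(not_loss_coefs))  # P(draw or win)
    p_win = logistic(linear(win_coefs))            # P(win)
    # Nothing forces p_not_loss >= p_win in this model, so check the draw
    # probability when trying extreme inputs.
    return {"loss": 1.0 - p_not_loss, "draw": p_not_loss - p_win, "win": p_win}


# Placeholder coefficients; "gap_100" is the rating gap in hundreds of points
not_loss_coefs = {"const": 1.8, "gap_100": 0.9}
win_coefs = {"const": -1.0, "gap_100": 0.8}

example = outcome_probabilities({"gap_100": 0.5}, not_loss_coefs, win_coefs)
print({outcome: round(p, 3) for outcome, p in example.items()})
```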

Next, I checked if the model fixes the original problem: does it underestimate the draw rate? If both players are 2750+, the model's draw rate is only 0.19% lower than the observed rate. This difference is statistically insignificant (p-value = 0.832). Though the model is targeted at 2700s, sometimes an organizer will invite a local 2600 to an elite round robin. Does the model still work for them? I tested it in this sample: one player is 2700+, the other isn't. The model's draw rate is 0.14% too high, which is statistically insignificant (p-value = 0.709).

The new model estimates the win, loss, and draw probabilities. The traditional model only estimated the draw rate; the win and loss probabilities were then extrapolated from Elo's expected score formula. This allows me to test the validity of the expected score formula. I will discuss this in a different post, since most readers probably gave up somewhere around the second paragraph.
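
For the curious, the older approach works because the expected score is the win probability plus half the draw probability. Given a draw rate and Elo's expected score, the win and loss probabilities follow from that identity. Here is a minimal sketch; the function name and the example draw rate of 60% are assumptions for illustration.

```python
def win_loss_from_draw_rate(expected_score, draw_rate):
    """Back out win and loss probabilities from E = P(win) + 0.5 * P(draw)."""
    win = expected_score - draw_rate / 2
    loss = 1 - expected_score - draw_rate / 2
    if win < 0 or loss < 0:
        raise ValueError("draw rate is too high for this expected score")
    return win, loss


# Example: Elo expected score of 0.57 combined with an assumed 60% draw rate
p_win, p_loss = win_loss_from_draw_rate(expected_score=0.57, draw_rate=0.60)
print(f"win: {p_win:.2f}, draw: 0.60, loss: {p_loss:.2f}")
```

Because the win and loss probabilities are pinned down by the expected score in this setup, the older approach cannot disagree with Elo's formula. Estimating all three outcomes separately is what makes the formula testable.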


Saturday, January 15, 2022

2022 Tata Steel Masters

 I changed the model before the World Championship, but I'm still not entirely happy with it. In elite tournaments, the draw rate seems to have risen in the last decade or so. My model does account for the year in which the game was played, but its predicted draw rate remains too low for recent tournaments. No forecast - I'm still making changes.

Wednesday, January 5, 2022

Stockfish + Bongcloud vs. 2100

[Event "Stockbong Challenge"] [Site "Binghamton"] [Date "2022.01.01"] [Round "1"] [White "Stockfish 14.1"] [Black "Wilson, Matthew"] [Result "1-0"] [BlackElo "2127"] [SetUp "1"] [FEN "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"] [PlyCount "109"] [EventDate "2022.??.??"] 1. e4 c5 2. Ke2 {The Stockbong challenge: I made Stockfish play the Bongcloud. It's move 2 and I already have a big advantage, so maybe I can hope to survive?!} d5 3. exd5 Qxd5 4. Ke1 e5 5. Qf3 Nf6 $2 {I took the wrong approach to this game. Black needs to keep pieces on the board since White can't castle. But I was happy to trade down towards a draw. I did look at 5...Qd6. However, I didn't like 6.Bc4 followed by Nc3 and Ne4 - White would get trades anyways and his pieces would be centralized.} (5... Qe6 {(SF) is stronger.} 6. b3 Nc6 7. Bc4 Qe7 8. Nc3 Nd4 {is great for Black}) 6. Qxd5 Nxd5 7. Nc3 Be6 8. Nxd5 ({ I was planning to meet} 8. Bc4 {with} Nf4) 8... Bxd5 9. Ne2 Nc6 10. Nc3 Be6 11. Bb5 Rc8 {I thought it was important to avoid doubled c-pawns. I did consider 11...Kd7, but I felt that the text was better. The rook commonly goes to c8 in these Maroczy bind structures. If instead 11...Kd7, I will probably have to spend a tempo on ...Kc7 sometime in the future} 12. d3 Be7 13. b3 O-O 14. Bc4 Nd4 15. Kd2 Bg5+ $6 {Giving away my advantage. Stockfish prefers 15...f5.} 16. Kd1 Bxc1 ({At first, I thought I was winning material with} 16... Bg4+ 17. f3 Nxf3 {, but then I spotted} (17... Bxc1 18. Kxc1 { and now there is no fork on f3}) 18. h3 $1 ({not} 18. gxf3 $2 Bxf3+ 19. Ne2 ( 19. Ke1 Bxc1 $1 20. Rf1 Bb2 $1) 19... Bxc1 $1 20. Rf1 Bxe2+) 18... Bh5 19. Bxg5 Nxg5+ 20. g4 Bg6 (20... Nxh3 {(SF) is better, but after} 21. gxh5 Nf2+ 22. Kd2 Nxh1 23. Rxh1 {, the minor pieces outweigh the rook}) 21. h4 { followed by 22.h5, trapping the bishop}) 17. Kxc1 Rfd8 18. a4 ({ During the game, I wasn't sure about} 18. Re1 f6 19. f4 Bxc4 20. 
bxc4 exf4 { , but Stockfish shows that Black is at least equal here:} 21. Nd5 (21. Re7 $2 Re8 $1 22. Rxb7 Re1+ 23. Kb2 Rxa1 24. Kxa1 Nxc2+ {followed by ...Nc2-e3xg2}) 21... Re8 $1 {Now 22.Ne7+ doesn't work, so White has to play 22.Kd2 and try to recover the pawn}) 18... f6 19. Nb5 Bxc4 ({My original intention was} 19... a6 20. Nxd4 Bxc4 {but then I saw} 21. Nf5 $1 {winning material}) 20. bxc4 Nxb5 { A concession} (20... Kf7 {is best (SF). The tactical justification is} 21. Nxa7 Ra8 22. Nb5 Nxb5 23. cxb5 Ra5 $1 {and Black will recover the pawn}) (20... Nc6 {is simple and White doesn't have much}) 21. axb5 Ra8 22. Kd2 Rdb8 (22... a5 $2 23. bxa6 Rxa6 24. Rxa6 bxa6 25. Rb1 {and White dominates the open file. The text aims to open the b-file under more favorable conditions}) 23. h4 ({ I was expecting} 23. Rhb1 {in order to prevent ...a5. But Stockfish correctly judges that ...a5 weakens my position}) 23... a5 $6 {This gets me in trouble} ( 23... b6 24. Ra6 Rb7 25. Rha1 { and my rooks are very passive. However, White can't break through:} h5 26. Ke3 Kf7 27. Ke4 Ke6 28. c3 (28. f4 exf4 29. d4 Kd6) 28... g5 {. Even if the White king could somehow get to d5, I could reorganize my defense with ...Rd8+ ... Rdd7}) 24. bxa6 Rxa6 25. Rhb1 (25. Rxa6 bxa6 26. Ra1 Rb6 27. Ra5 Rc6 { and Black should be able to hang on}) 25... Rxa1 26. Rxa1 Kf7 27. Ra5 Rc8 $2 { The decisive mistake, which I played quickly} ({I rejected} 27... b6 28. Ra7+ { since I didn't want to surrender the 7th rank. But Stocky shows that Black can hold:} Kg6 29. Rc7 (29. Ke3 Rd8 {followed by ...Rd6}) 29... h5 30. Ke3 Ra8 31. Rb7 Ra6 {Black's pieces look awkward, but White can't move forward without allowing counterplay.} 32. Ke4 (32. g3 Kh6 33. Ke4 Ra2 34. Rxb6 Rxc2 35. Ke3 e4 $1 36. Kxe4 Rxf2 37. d4 cxd4 38. c5 Kg6 39. Kxd4 Kf5 40. Rb3 Ke6 {triple zeros} ) 32... Ra2 33. Rxb6 Rxc2 34. Ke3 Kf5 35. Rc6 g5 36. hxg5 fxg5 37. Rxc5 Kf6 { and White can't make progress}) 28. Rb5 Rc7 29. Ke3 Ke6 30. 
Ke4 {I was expecting 30.g4, but as we will see in the next note, White is not afraid of ...f5+} Rd7 {Sacrificing a pawn for activity} (30... f5+ 31. Ke3 Kf7 32. f4 exf4+ 33. Kxf4 {and the White king will invade on the dark squares}) 31. Rxc5 Rd4+ 32. Ke3 Rxh4 33. Rb5 Rg4 34. g3 h5 35. Rxb7 g5 {Trying to create a passed h-pawn, but Stockfish quickly turns it into a weakness} 36. Rh7 h4 37. gxh4 gxh4 38. c3 Kd6 {I spent a lot of time on this move.} (38... Kf5 {was the main alternative. The idea is to kick out the White rook so I can push the h-pawn} 39. f3 $1 Rf4 40. Rh8 $1 Kg6 41. c5 Kg7 42. c6 $1 {and the pawn promotes}) 39. f3 $1 Rf4 { Now my "active" rook is nearly trapped. I hoped that it would keep White's king cut off, but it isn't enough} ({I looked at} 39... Rg2 40. Rxh4 Rc2 { , but I didn't like my position after} 41. f4 {. Either I let White have connected passers or my e5-pawn is weak. Stocky's main line continues} Rxc3 42. fxe5+ fxe5 43. Rh6+ Kc5 44. Re6 {and White wins}) 40. d4 { Now the passers just march down the board} Ke6 41. c5 Kd5 42. Rh8 Kc6 43. Rc8+ Kd7 44. Rf8 Rf5 ({I thought} 44... Ke7 {lost to} 45. dxe5 { , but Stockfish finds} Rf5 $1 {. Instead 45.c6 wins}) 45. d5 Rf4 46. c6+ Kd6 47. Rd8+ Kc7 48. Rd7+ Kc8 49. Rh7 Rc4 50. Kd3 Rc5 51. c4 f5 52. Kc3 e4 53. Kb4 exf3 54. Rh8+ Kc7 55. Kxc5 1-0