e4stat: Reforming the Candidates Cycle: Discussion

I am thrilled that my article, "Reforming the Candidates Cycle," was published in Chess Life. To save space (an important consideration for magazines), the methodology and discussion sections are posted here online instead of in print.

Reforming the Candidates Cycle: Additional Discussion

By Matthew S. Wilson

The proposed Candidates cycle begins with 12 players competing in 4 Grand Prix tournaments. Why use 12 players instead of a different number? The choice is based on general considerations rather than a statistical model. Ratings fluctuate – for instance, around early 2015, Grischuk rose all the way to #3 in the world, but within a year he dropped out of the top 10. We want to invite enough players to be sure that the best one is not accidentally excluded. On the other hand, inviting too many players can also create problems. Pardon me for a moment while I take one last swipe at the knockout world championships. At the beginning, there were 128 players. Garry Kasparov famously referred to some of them as “chess tourists.” Many participants could only play the role of spoiler, and by chance a few of them would eliminate a top player. That just made it less likely that the tournament would crown the best player.

So what is the right way to navigate between the extremes of too few players and too many players? I felt that 12 was the right number. It ensures that no one in the top 10 will be excluded. The most recent Candidates Tournament featured 8 players, and somehow it did not include World #2, Vladimir Kramnik. That suggests that 8 players are not enough. FIDE recently expanded the Grand Prix to 24 players, but that necessarily involves participants who are not in the top 20. It is highly unlikely that someone outside the top 20 is the best in the world, so that is probably too many players. However, my cutoff of 12 players is a bit arbitrary, and reasonable people can differ on the best number.

It is preferable that players are invited on the basis of rating rather than on success in a qualifying tournament. That is because ratings reward consistently strong performance, rather than a strong performance in a single tournament. As a practical matter, to find sponsorship it may be necessary to allow the organizers to nominate a player. More discussion of that later.

I would be surprised if this proposal were adopted with no alterations. Since some modifications are almost certain, it’s important to know which changes would be minor and which would be major.

-The “artificial” environment in the statistical model. In the model, Player A is exactly 50 rating points stronger than all of his rivals. It is very unlikely that a real tournament will precisely replicate this situation. Earlier, I discussed the possibility of Player A’s nearest rival being fewer than 50 points behind him. That leaves two other possibilities: (1) all the opponents are more than 50 points weaker, or (2) one or more opponents are exactly 50 points weaker and the rest are more than 50 points weaker.

The goal of the World Championship cycle is to identify the best player. Clearly, Option #1 just makes that even more likely to occur. The more that Player A towers over his rivals, the more likely it is that he will become World Champion. So if the model says that Player A has a 90% chance of reaching the World Championship match, then his actual chance will be above 90%. That is not at all bad for the chess world – my proposed Candidates cycle turns out to be better than advertised.

Option #2 leads to a similar outcome. “Player B” is the individual 50 rating points below Player A. The other contenders are “Player C,” “Player D,” etc. In the model, Players B, C, D, etc. were all equally good, so they had equal chances of winning.

Now instead we’ll suppose that Players G and H are more than 50 points weaker than Player A, as in Option #2. This lowers the chances of Players G and H – weaker players are less likely to win. However, this necessarily raises the chances of all the other players, including Player A.

So once again the proposed Candidates cycle works better than advertised. Player A earns the right to play in the World Championship match even more often than predicted.
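To make the Option #1 and Option #2 argument concrete, here is a rough Monte Carlo sketch. It is not the model behind the article: it simulates a single 12-player round robin rather than the whole cycle, and it assumes the Elo logistic formula for expected score, a fixed 62.5% draw rate in every game, and that a tie for first gives each tied player an equal fractional share. The 75-point and 100-point gaps are illustrative values. Ratings are given relative to Player A, so only the gaps matter.

```python
import random


def wdl(rating_diff, draw_rate=0.625):
    """Win/draw/loss probabilities for a player rated `rating_diff` points above the opponent.

    Assumes the Elo logistic expected score and a fixed draw rate.
    """
    expected = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
    win = expected - draw_rate / 2
    loss = 1.0 - expected - draw_rate / 2
    if win < 0 or loss < 0:
        raise ValueError("draw rate is too high for this rating gap")
    return win, draw_rate, loss


def chance_a_wins(ratings, trials=20000, seed=2017):
    """Estimate how often player 0 (Player A) finishes first in one round robin."""
    n = len(ratings)
    # For each pairing, precompute the thresholds for "i wins" and "i wins or draws".
    pairings = []
    for i in range(n):
        for j in range(i + 1, n):
            win, draw, _ = wdl(ratings[i] - ratings[j])
            pairings.append((i, j, win, win + draw))
    rng = random.Random(seed)
    credit = 0.0
    for _ in range(trials):
        scores = [0.0] * n
        for i, j, win, win_or_draw in pairings:
            u = rng.random()
            if u < win:
                scores[i] += 1.0
            elif u < win_or_draw:
                scores[i] += 0.5
                scores[j] += 0.5
            else:
                scores[j] += 1.0
        best = max(scores)
        if scores[0] == best:
            # Shared first place: split the credit among the tied players.
            credit += 1.0 / scores.count(best)
    return credit / trials


fields = {
    "model: every rival exactly 50 points weaker": [0] + [-50] * 11,
    "Option #1: every rival 75 points weaker": [0] + [-75] * 11,
    "Option #2: Players G and H 100 points weaker": [0] + [-50] * 9 + [-100] * 2,
}
results = {name: chance_a_wins(ratings) for name, ratings in fields.items()}
for name, p in results.items():
    print(f"{name}: {p:.3f}")
```

All three fields reuse the same random seed, so the comparison is between the same simulated games with only the rating gaps changed.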

The practical implications? It is not a big problem if the organizers nominate one or two players who are slightly weaker than the rest. But there is one possible issue. Having more organizer nominees will crowd out players who would have earned the right to participate due to their rating. This raises the risk that the best player in the world is not even invited to the Candidates cycle. As a general guideline, I think that it would be fine if there were one or two players nominated by the organizers, but not many more than that.

-The 6-game rapid tiebreaks. Changing this would have a small impact. Tiebreaks are first applied after the Grand Prix cycle of four 12-player round robins. After that many games (44 per player), ties are less likely. Changing the tiebreak system to 4-game or 2-game rapid matches would lower Player A’s chances by about 1%. Other tiebreak systems also have a very small effect. Ideally, ties would be broken by a short match at classical time controls, but that may be impractical since it would take a few extra days: organizers would need a contingency plan to find accommodations and a playing hall in case of ties. So 6-game rapid matches may be the best solution, but the exact tiebreak format is not critical to the Candidates cycle proposal.
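As a rough illustration of the trade-off, the sketch below computes exactly how often Player A wins a 2-, 4- or 6-game rapid match against a rival 50 points weaker, using the 40% rapid draw rate. The Elo expected-score formula, independent games, and the rule that a match still tied is settled by a coin flip are assumptions of this sketch, not of the proposal. It looks at a single tiebreak match; because ties after four round robins are uncommon, the effect on the whole cycle (about 1%, per the simulations above) is much smaller than the gaps between these per-match figures.

```python
def lead_distribution(games, rating_diff, draw_rate):
    """Exact distribution of Player A's lead (wins minus losses) after `games` games.

    Assumes independent games, the Elo logistic expected score and a fixed draw rate.
    """
    expected = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
    win = expected - draw_rate / 2
    loss = 1.0 - expected - draw_rate / 2
    lead = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for d, p in lead.items():
            for step, q in ((1, win), (0, draw_rate), (-1, loss)):
                nxt[d + step] = nxt.get(d + step, 0.0) + p * q
        lead = nxt
    return lead


def rapid_match_win(games, rating_diff=50, draw_rate=0.40):
    """Chance Player A wins the rapid match; a tied match is treated as a coin flip."""
    lead = lead_distribution(games, rating_diff, draw_rate)
    ahead = sum(p for d, p in lead.items() if d > 0)
    return ahead + 0.5 * lead.get(0, 0.0)


for games in (2, 4, 6):
    print(f"{games}-game rapid match: {rapid_match_win(games):.3f}")
```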

-The scoring system. Each game is scored as 1/0.5/0 for win/draw/loss, and each player’s total score is the sum of the points scored in all the previous tournaments in the cycle. For instance, if you scored 25/44 in the Grand Prix and qualified for the Candidates Tournaments, you don’t start with 0/0 in the next stage – you start with the 25/44 you earned in the Grand Prix. This is a very important feature of the proposal.

Consider two players who have succeeded in the first stage of a Candidates cycle, qualifying them to continue to the second stage. Player 1 qualified by scoring a dominating 9.5/11. Player 2 managed 6.5/11, barely enough to qualify. There is definitely evidence that Player 1 is better than Player 2. But if we have them both start from 0/0 in the second stage, we have discarded useful information. The fresh start for both does not reflect the fact that Player 1 did better in the previous tournament. Having them start from their 9.5/11 and 6.5/11 scores is the simplest and best way to account for Player 1’s superior performance in the earlier stage.
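A minimal sketch of the carry-over rule, using the Player 1 and Player 2 example. The second-stage scores are hypothetical, chosen so that Player 2 has the better second stage; the `Record` class and `leader` function are illustrative helpers, not part of any official system.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A running score in the 1/0.5/0 system, e.g. 9.5 points from 11 games."""
    points: float
    games: int

    def __add__(self, other):
        # Carrying a score forward just adds points and games.
        return Record(self.points + other.points, self.games + other.games)

    def __str__(self):
        return f"{self.points:g}/{self.games}"


stage_1 = {"Player 1": Record(9.5, 11), "Player 2": Record(6.5, 11)}
# Hypothetical second-stage results, for illustration only.
stage_2 = {"Player 1": Record(5.5, 11), "Player 2": Record(7.0, 11)}


def leader(table):
    """Name of the player with the most points in `table`."""
    return max(table, key=lambda name: table[name].points)


fresh_start = stage_2
carried_over = {name: stage_1[name] + stage_2[name] for name in stage_1}

print("fresh start:", leader(fresh_start))
print("carried over:", leader(carried_over), {n: str(r) for n, r in carried_over.items()})
```

With a fresh start, Player 2’s better second stage decides it; with scores carried over, Player 1 leads 15/22 to 13.5/22 because his 3-point margin from the first stage still counts.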

Giving everyone a fresh start at 0/0 in each stage of the cycle has large implications for the 50 Point Principle. Player A’s chances of making it into the World Championship match plummet to 73.6%. We want the best players to qualify for the World Championship match. But there is always a chance of an upset. Player A has a good chance (about 55%) of winning the Grand Prix. If we use the total score as in the proposal, then Player A’s performance in the Grand Prix can provide a buffer against upsets in the later stages. Similarly, his probable victory in the Candidates Tournaments can offset potential poor form in the Final. Starting everyone at 0/0 in each stage takes away that buffer. Adopting the proposed scoring system is a virtually costless change that can greatly increase the chance that the best player wins.
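The buffer effect shows up even in a deliberately tiny toy that is far simpler than the article’s model: Player A plays a single rival 50 points weaker, each stage is 11 games between them, and the higher score wins (a tied score counts as half). The Elo expected-score formula and the 62.5% draw rate are assumptions of the sketch. Counting both stages turns the decision into a 22-game contest, so a good first stage cushions a bad second one.

```python
def head_to_head(games, rating_diff=50, draw_rate=0.625):
    """Chance Player A outscores one rival over `games` mutual games (a tie counts as half)."""
    expected = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
    win = expected - draw_rate / 2
    loss = 1.0 - expected - draw_rate / 2
    lead = {0: 1.0}  # distribution of A's wins minus losses
    for _ in range(games):
        nxt = {}
        for d, p in lead.items():
            for step, q in ((1, win), (0, draw_rate), (-1, loss)):
                nxt[d + step] = nxt.get(d + step, 0.0) + p * q
        lead = nxt
    ahead = sum(p for d, p in lead.items() if d > 0)
    return ahead + 0.5 * lead.get(0, 0.0)


STAGE_GAMES = 11  # illustrative stage length for the toy
fresh_start = head_to_head(STAGE_GAMES)       # only the last stage counts
carried_over = head_to_head(2 * STAGE_GAMES)  # both stages count
print(f"fresh start: {fresh_start:.3f}, carried over: {carried_over:.3f}")
```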

I cannot recommend the Grand Prix Points scoring system. First of all, it is more complicated than the 1/0.5/0 system for win/draw/loss that every player knows. More importantly, it can create distortions.

A few examples will illustrate what I mean by “distortions.” Clearly, it should be better to win a Grand Prix Tournament with 8/11 than with 7/11. But either result yields the same number of Grand Prix Points (170). Also, it is hard to see why the gap between 1st and 2nd is 30 points, but the gap between 4th and 5th is just 10. These distortions are not just hypothetical problems; they had a decisive impact on the 2014-2015 Grand Prix cycle.

The top two players from the Grand Prix qualified for the Candidates. According to the Grand Prix Points system, those players were Caruana and Nakamura (the tables below only show the top four players). Tomashevsky finished fourth in Grand Prix Points, even though he did just as well as Caruana and Nakamura! The reason? Tomashevsky “wasted” a point by winning Tbilisi with 8/11 (1.5 points above his nearest rival), when 7/11 would have sufficed and given him just as many Grand Prix Points. Caruana and Nakamura spread out their victories more “efficiently” and thus transformed the same 19/33 score into a larger number of Grand Prix Points. And so Caruana and Nakamura qualified for the Candidates while Tomashevsky was left out.
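The same kind of distortion is easy to reproduce with two hypothetical players, X and Y, over two events. Only the two Grand Prix Points values implied above are used (170 for first place, and 140 for second given the 30-point gap); the rest of the real table, including how shared places are handled, is not modeled, and the results are invented for illustration.

```python
# Only the two values implied in the text; the full Grand Prix Points table is not reproduced.
GP_POINTS = {1: 170, 2: 140}

# Hypothetical results: (player, place, score out of 11) in each event.
events = [
    [("X", 1, 8.0), ("Y", 2, 6.5)],
    [("Y", 1, 7.0), ("X", 2, 6.5)],
]


def totals(events):
    """Total 1/0.5/0 score and total Grand Prix Points for each player."""
    score, points = {}, {}
    for event in events:
        for player, place, pts in event:
            score[player] = score.get(player, 0.0) + pts
            points[player] = points.get(player, 0) + GP_POINTS[place]
    return score, points


score, points = totals(events)
print("1/0.5/0 totals:", score)
print("Grand Prix Points:", points)
```

X scores a full point more than Y over the two events, yet they finish level on Grand Prix Points, because X’s extra margin in the first event earns nothing.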

Simply using the 1/0.5/0 system is easier, fairer, and it doesn’t cost FIDE anything.

-The World Champion participates in the Candidates cycle. This is a break from tradition that may be unpopular. The tradition can be preserved, though it will take one more event to maintain the 50 Point Principle. This may well be a cost that the chess world is willing to bear.

Variation: The World Champion does not play in the Candidates cycle. Instead, he plays a match against the winner of the cycle. The Candidates cycle is the same as before, except for two changes. First, the top 2 players from the Final face off in a 24-game match, and the player who finished higher in the Final has draw odds if the match is tied 12-12. Second, the winner of this match becomes the Challenger and plays the World Champion in a 26-game match. If that match is drawn, there will be a 4-game tiebreaker at classical time controls. This approximately satisfies the 50 Point Principle, though as you can see, it takes an extra match to do so.
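A sketch of what draw odds are worth in the 24-game match, under assumed conditions: the Elo expected-score formula, independent games, a 62.5% draw rate, and a 50-point gap between the two finalists. It computes the exact chance of each match result for the stronger player, then awards a 12-12 tie to whichever player holds draw odds.

```python
def match_outcomes(games=24, rating_diff=50, draw_rate=0.625):
    """Exact (win, tie, loss) probabilities for the stronger player over `games` games."""
    expected = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
    win = expected - draw_rate / 2
    loss = 1.0 - expected - draw_rate / 2
    lead = {0: 1.0}  # distribution of wins minus losses
    for _ in range(games):
        nxt = {}
        for d, p in lead.items():
            for step, q in ((1, win), (0, draw_rate), (-1, loss)):
                nxt[d + step] = nxt.get(d + step, 0.0) + p * q
        lead = nxt
    p_win = sum(p for d, p in lead.items() if d > 0)
    p_loss = sum(p for d, p in lead.items() if d < 0)
    return p_win, lead.get(0, 0.0), p_loss


p_win, p_tie, p_loss = match_outcomes()
stronger_has_odds = p_win + p_tie  # a 12-12 tie goes to the stronger player
weaker_has_odds = p_win            # a 12-12 tie goes to the weaker player
print(f"tie: {p_tie:.3f}, with draw odds: {stronger_has_odds:.3f}, "
      f"without: {weaker_has_odds:.3f}")
```

The gap between the last two figures is exactly the probability of a 12-12 tie.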

What are the benefits of having the World Champion participate in the Candidates cycle? First of all, when the Candidates cycle whittles down the field to one player instead of two, there is a greater chance that the best player will be eliminated. That is why the original proposed Candidates cycle reduces the field to two players, who then play a match for the title. Second, having the World Champion play alongside the candidates grants us useful information. That information can be used in assigning draw odds in case the match is tied. Tiebreak systems tend to be very arbitrary, but in this case, the draw odds are earned by performing better against the same opposition. If the World Champion does not participate in the cycle, then this information is never obtained, so there is no basis for granting draw odds. We are left with the unpleasant question of how to break ties. Many fans would rather not see the World Championship title determined by rapid games, which has already happened in Topalov-Kramnik and Anand-Gelfand. And wouldn’t participation by the World Champion increase interest in the Candidates cycle?

Nevertheless, this is not an essential feature. The variation above can accommodate the more traditional approach.

-The draw rate in the statistical model. As described earlier, the simulations assume that there is a 62.5% probability of a draw. In the rapid tiebreaks, the draw probability is 40%. This second number comes from the draw rate in the Paris rapid section of the 2016 Grand Chess Tour. Small changes in the draw rate have only a small impact on the results. This is especially true for the rapid tiebreaks, since they are frequently not needed in this format. We can tweak these parts of the model if you disagree with them, but it will not do anything dramatic to the results.

However, the same cannot be said about large changes in the draw rate. There is no reason to think that 62.5% is wildly inaccurate for the regular World Championship, but it would not apply to the Women’s World Championship. The draw rate for those tournaments is closer to 50% (based on the 2011-2012 Women’s Grand Prix; lower-rated players tend to draw less often). A lower draw rate means more wins and more losses, so upsets become more likely. Suppose that Player 1 is expected to score 60% against Player 2. With an 80% draw rate, Player 1 wins 20% of the games and draws the rest, so he never loses and upsets are impossible. With a 0% draw rate, Player 2 wins 40% of the games, and an upset would not be very surprising. As discussed earlier, the top player has a better chance of prevailing in longer events. So upsets are more likely when draws are rare, and less likely when events have many games. Therefore, to achieve the same level of certainty that the best woman wins, we need more games in the Women’s Candidates cycle.
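The 60% example follows from simple arithmetic: if the expected score is E and the draw rate is D, then the win probability W and loss probability L must satisfy W + D/2 = E and W + D + L = 1. This sketch solves for W and L; the function name is illustrative.

```python
import math


def win_loss_from_expected(expected, draw_rate):
    """Solve W + D/2 = E and W + D + L = 1 for the win and loss probabilities."""
    win = expected - draw_rate / 2
    loss = 1.0 - expected - draw_rate / 2
    # A small tolerance absorbs floating-point rounding at the boundary (e.g. L = 0).
    if win < -1e-12 or loss < -1e-12:
        raise ValueError("no valid split: the draw rate is too high for this expected score")
    return win, loss


for draw_rate in (0.8, 0.625, 0.5, 0.0):
    win, loss = win_loss_from_expected(0.60, draw_rate)
    print(f"draw rate {draw_rate:.3f}: win {win:.3f}, loss {loss:.3f}")
```

Any draw rate above 80% is impossible for a 60% expected score, because Player 1 would need a negative chance of losing.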

Hou Yifan recommended that FIDE use the same format for the Women’s World Championship as the regular World Championship. This would undoubtedly be a large improvement. But due to the different draw rates, duplicating the formats will not duplicate the 50 Point Principle. With a 50% draw rate in my proposed format, Player A reaches the World Championship 83.58% of the time, compared to 89.66% earlier. Applying the 50 Point Principle to the Women’s World Championship will require more tournaments (or longer tournaments) for the women than for the men. But that also may strike people as unfair. A difficult and controversial problem has arisen.
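To see why a lower draw rate calls for more games, this sketch compares a head-to-head match between Player A and a rival 50 points weaker at the two draw rates mentioned (62.5% and 50%), then searches for the match length at 50% that gives Player A at least the same chance as a 12-game match at 62.5%. The 12-game length, the Elo formula, and counting a tied match as half are assumptions made for illustration; this is not the article’s full-cycle model.

```python
def match_win(games, rating_diff=50, draw_rate=0.625):
    """Chance Player A wins a `games`-game match against a weaker rival (a tied match counts as half)."""
    expected = 1.0 / (1.0 + 10 ** (-rating_diff / 400))
    win = expected - draw_rate / 2
    loss = 1.0 - expected - draw_rate / 2
    lead = {0: 1.0}  # distribution of A's wins minus losses
    for _ in range(games):
        nxt = {}
        for d, p in lead.items():
            for step, q in ((1, win), (0, draw_rate), (-1, loss)):
                nxt[d + step] = nxt.get(d + step, 0.0) + p * q
        lead = nxt
    ahead = sum(p for d, p in lead.items() if d > 0)
    return ahead + 0.5 * lead.get(0, 0.0)


def games_needed(target, draw_rate, rating_diff=50, max_games=200):
    """Shortest match in which Player A's chance reaches `target` at the given draw rate."""
    for games in range(1, max_games + 1):
        if match_win(games, rating_diff, draw_rate) >= target:
            return games
    raise ValueError("target not reached within max_games")


MATCH_GAMES = 12  # illustrative match length, not taken from the proposal
target = match_win(MATCH_GAMES, draw_rate=0.625)
at_fifty = match_win(MATCH_GAMES, draw_rate=0.50)
needed = games_needed(target, draw_rate=0.50)
print(f"{MATCH_GAMES} games: {target:.3f} at 62.5% draws, {at_fifty:.3f} at 50% draws")
print(f"games needed at 50% draws to reach {target:.3f}: {needed}")
```

Under these assumptions the 50% draw rate needs noticeably more games to match the 12-game figure, which is the direction described above; the exact count depends on the assumed match length and rating gap.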


Thursday, February 2, 2017
