Rob Weir, the statistician whom I mentioned in my last post, graciously shared with me a database of the performance of all the openings, organized by ECO code. This allows us to create something that I’ve never quite seen before: a “report card” of all the chess openings. Which are best for White? Which are best for Black? Which are the most drawish? Which are the most or the least popular?
In this post I am mostly going to stay on the level of whole openings, in other words the first two characters of the ECO code (the letter and the first digit, e.g., A0, A1, A2, etc.). In my next post I will drill down into the second digit, which I think might be more interesting.
First, a word about Weir’s database. He took the ChessBase database and filtered it so that it includes only games played since the year 2000 in which both players were rated 2200 or above. This means that the games reflect current opening practice and were all played by masters. I think this gives the truest test of the strength of the opening in human play. ChessBase itself does not give users a way to search openings by ECO code, a very peculiar omission in their program. It’s a function that presumably a programmer could add in ten minutes, but for whatever reason they have chosen not to do that, so we are indebted to Weir for repairing the oversight. [Note added 3/5/2015: This is not quite correct. See discussion in the comments.]
The entire database contains 974,926 games (conveniently close to a million). White’s overall winning percentage, counting each draw as half a win, is 54.7%, with 35.1% of games ending in wins for White, 25.7% in wins for Black, and 39.3% in draws. I imagine that the percentage of draws is higher for master games than it would be for amateur games.
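The 54.7 percent figure counts a draw as half a win. Here is a minimal Python sketch of that arithmetic, using the rounded percentages quoted above (the function name `score_pct` is just an illustrative choice):

```python
def score_pct(win_pct: float, draw_pct: float) -> float:
    """Score of one side in percent: a win is worth 1 point, a draw 1/2 point."""
    return win_pct + draw_pct / 2.0


white = score_pct(35.1, 39.3)  # White: 35.1% wins, 39.3% draws
black = score_pct(25.7, 39.3)  # Black: 25.7% wins, 39.3% draws
print(f"White {white:.2f}%, Black {black:.2f}%")
```

With these rounded inputs White comes out at 54.75 percent, close to the quoted 54.7. The three rounded percentages add up to 100.1 rather than 100, so the two scores do as well.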
In his blog, Weir analyzed what White’s advantage is worth in rating points and found that it is about 35 points. In other words, when Black is rated 35 points higher than White, the game is essentially an even match: that is the point where White’s winning percentage is 50 percent, and it’s also the point where the draw percentage is highest. This leads to my first important conclusion from the opening database:
1. Openings don’t matter much. But they matter some.
Of course I’ve been saying this for years, but I think the statistics bear me out. Every percentage point added to White’s winning percentage translates to about 7 rating points. Even if you could play White in every game, it would add only about 33 points to your rating (White’s 4.7-point edge over an even 50 percent score, at 7 rating points per percentage point), which is in line with Weir’s 35-point estimate. That’s less than you could add by studying tactics, strategy, endgames, and master games and just generally getting better at chess: seeing deeper, coordinating your pieces, etc., etc.
But still, they matter some. The difference between White’s worst opening and his best (see below) is 11.8 percentage points, or about 80 rating points. When 100 points makes the difference between two different prize brackets, as it does in many open tournaments, 80 points is a fairly big deal.
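Where does “7 rating points per percentage point” come from? One way to check it is the logistic Elo expectation formula E = 1 / (1 + 10^(−D/400)), which I am assuming here as the rating model (the post doesn’t say which model Weir used). Near an even score its slope is roughly 7 points per percentage point, and the same formula puts the gap between C3 (47.8 percent) and D6 (59.6 percent) at a bit over 80 points. A minimal sketch:

```python
import math


def expected_score(diff: float) -> float:
    """Logistic Elo expectation for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def rating_diff(score: float) -> float:
    """Inverse of expected_score: rating edge implied by a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Rating points bought by one extra percentage point near an even score.
points_per_pct = rating_diff(0.51) - rating_diff(0.50)

# Gap between White's worst group (C3, 47.8%) and best group (D6, 59.6%).
gap = rating_diff(0.596) - rating_diff(0.478)

print(f"{points_per_pct:.1f} rating points per percentage point")
print(f"C3-to-D6 gap: {gap:.0f} rating points")
```

Because the logistic curve is nearly straight between about 45 and 60 percent, the simple rule of 7 points per percentage point gives almost the same answers as the full formula.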
Okay, now let’s get to the data. First, popularity.
Number of games in each ECO group (columns: ECO letter; rows: first digit of the code):
Digit | A | B | C | D | E |
0 | 26767 | 55694 | 34351 | 25539 | 16546 |
1 | 25950 | 37690 | 28107 | 36687 | 37051 |
2 | 21195 | 40990 | 5447 | 14775 | 5897 |
3 | 29481 | 40242 | 2018 | 32138 | 12665 |
4 | 41144 | 36859 | 31892 | 24070 | 7801 |
5 | 16159 | 26174 | 12695 | 6722 | 4472 |
6 | 8553 | 9946 | 16570 | 2405 | 20336 |
7 | 3831 | 12122 | 8382 | 8782 | 7051 |
8 | 14977 | 24004 | 9038 | 9583 | 4794 |
9 | 4403 | 22761 | 12957 | 11080 | 25470 |
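To reproduce the popularity rankings discussed below, you can load the table into a dictionary and sort it. A minimal Python sketch; the names `GAMES` and `TOTAL_GAMES` are illustrative, and the counts are copied from the table above:

```python
# Games per ECO group, copied from the table: GAMES[letter][first digit].
GAMES = {
    "A": [26767, 25950, 21195, 29481, 41144, 16159, 8553, 3831, 14977, 4403],
    "B": [55694, 37690, 40990, 40242, 36859, 26174, 9946, 12122, 24004, 22761],
    "C": [34351, 28107, 5447, 2018, 31892, 12695, 16570, 8382, 9038, 12957],
    "D": [25539, 36687, 14775, 32138, 24070, 6722, 2405, 8782, 9583, 11080],
    "E": [16546, 37051, 5897, 12665, 7801, 4472, 20336, 7051, 4794, 25470],
}
TOTAL_GAMES = 974_926  # size of the whole database, as stated in the post

# Flatten to {"A0": 26767, "A1": 25950, ...}.
counts = {
    f"{letter}{digit}": n
    for letter, row in GAMES.items()
    for digit, n in enumerate(row)
}

ranked = sorted(counts, key=counts.get, reverse=True)  # most popular first
most_popular = ranked[:3]
least_popular = ranked[::-1][:3]

# How many games in the database per King's Gambit (C3) game.
kings_gambit_rate = TOTAL_GAMES / counts["C3"]

print("Most popular:", most_popular)
print("Least popular:", least_popular)
print(f"One C3 game per {kings_gambit_rate:.0f} games")
```

This gives B0, A4, and B2 as the three most popular groups, C3, D6, and A7 as the three least popular, and roughly one King’s Gambit game in every 483. The group counts in the table add up to slightly less than the 974,926-game total; the post doesn’t say why.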
The most popular ECO code is B0, which means e-pawn games other than the Big Four (Sicilian, Caro-Kann, French, and double e-pawn). Grab-bag categories like this one tend to be pretty popular in the ECO scheme, because there are so many choices: Scandinavians, Alekhines, Pircs (Pirc-es? What is the plural of Pirc, anyhow?), etc. Next in order of popularity is A4, another grab-bag category that means d-pawn games other than the Big Three (Indians, Dutch, and double d-pawn), as well as 1. d4 Nf6 games where White doesn’t play 2. c4 (like the Trompowsky). The most popular category that is somewhat unified is B2, the Closed Sicilians, although even this category includes Sicilians with unusual responses to 1. e4 c5 2. Nf3 (which might thus bleed into unusual open Sicilians).
The unpopular openings are a bit more interesting. They tend to be unpopular for one of two reasons: either the people who set up the ECO codes overrated them, or they are too favorable for White or for Black. The latter reason makes sense, because both players have to cooperate to get into a particular ECO code. If one player has a decided disadvantage, he isn’t going to cooperate. (Remember, these are masters we’re talking about, so they know their opening theory.)
The least popular ECO opening is, sadly, one that is near and dear to my heart: C3, the King’s Gambit. It is by far White’s worst opening, with a winning percentage of 47.8 percent. And of course, White can easily avoid it, just by not moving his f-pawn on move two. So no wonder it appears so infrequently.
The next most unpopular opening, D6, is the same story in reverse. It is Black’s worst opening, the Main Line Queen’s Gambit Declined. (I told you there were going to be some surprises!) White’s winning percentage in this opening is 59.6 percent. Black can easily avoid it in a million ways, most notably in recent years by playing the Slav or Semi-Slav.
Finally, number three in our Hall of Opening Infamy (popularity division) is, I think, unpopular for both reasons. It is not very good for Black (White scores 58.9 percent) and I also think that the ECO people overstated its importance. I’m talking about A7, the “Main Line” Benoni (1. d4 Nf6 2. c4 c5 3. d5 e6 4. Nc3 ed 5. cd d6 6. e4 g6 7. Nf3). I put “Main Line” in quotes because it’s really questionable whether it deserves to be called the main line anymore. What about the Benko Gambit, 3. … b5? What about the Taimanov Variation, 7. f4 followed by 8. Bb5+? Nevertheless, at the time the ECO codes were invented, in the 1960s, the Main Line Benoni was at the peak of its popularity, and so for at least that brief period it appeared to deserve its own ECO code, A7.
Now let’s move on to the grades.
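White’s score in percent for each ECO group (columns: ECO letter; rows: first digit of the code):
Digit | A | B | C | D | E |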
0 | 50.7 | 56.1 | 54.4 | 52.3 | 58.6 |
1 | 54.4 | 54.2 | 56.1 | 55.2 | 55.7 |
2 | 53.2 | 50.6 | 53.4 | 56.1 | 52.4 |
3 | 53.4 | 56.1 | 47.8 | 57.5 | 53.6 |
4 | 52.6 | 52.8 | 55.7 | 56.1 | 52.7 |
5 | 58.0 | 52.0 | 53.1 | 58.2 | 52.2 |
6 | 55.3 | 55.1 | 54.9 | 59.6 | 57.3 |
7 | 58.9 | 55.7 | 55.5 | 55.6 | 57.1 |
8 | 57.9 | 52.4 | 55.7 | 54.9 | 56.5 |
9 | 58.0 | 52.9 | 55.9 | 56.5 | 57.5 |
Oh yeah! Here’s the juicy stuff! Here’s where you can look up your favorite opening and see how it rates.
Let’s start with the good openings for White. I’ll say that an opening is good if White’s winning percentage is 57.5 percent or higher. First of all, if you play 1. e4, forget it. There aren’t any good openings for you. Your best hopes are B0 (your opponent plays something other than the Big Four), B3 (Sicilian with 2. … Nc6), or C1 (Main Line French, i.e., 3. Nc3). All of these give you a 56.1 percent winning percentage. Unfortunately, you don’t have any control over the first two and you have only limited control over the third.
If you play 1. d4, there are tons of good openings for you: A5 (e.g., Benkos and Budapests), A7 (mentioned above), A8 and A9 (all Dutches), D3, D5, and D6 (the Queen’s Gambit Declined except the Semi-Slav and Semi-Tarrasch), E0 (miscellaneous 1. d4 Nf6 2. c4 e6 openings), and E9 (Main Line King’s Indian). Again, the choice is mostly out of White’s control, but the good news is that there are so many attractive possibilities.
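The 57.5 percent cutoff is easy to apply mechanically. A minimal sketch with the grades table copied into a dictionary (`SCORES` and `GOOD_THRESHOLD` are illustrative names):

```python
# White's score (%) per ECO group, copied from the table: SCORES[letter][first digit].
SCORES = {
    "A": [50.7, 54.4, 53.2, 53.4, 52.6, 58.0, 55.3, 58.9, 57.9, 58.0],
    "B": [56.1, 54.2, 50.6, 56.1, 52.8, 52.0, 55.1, 55.7, 52.4, 52.9],
    "C": [54.4, 56.1, 53.4, 47.8, 55.7, 53.1, 54.9, 55.5, 55.7, 55.9],
    "D": [52.3, 55.2, 56.1, 57.5, 56.1, 58.2, 59.6, 55.6, 54.9, 56.5],
    "E": [58.6, 55.7, 52.4, 53.6, 52.7, 52.2, 57.3, 57.1, 56.5, 57.5],
}
GOOD_THRESHOLD = 57.5  # "good for White" cutoff used in the post

# Flatten to {"A0": 50.7, "A1": 54.4, ...}.
scores = {
    f"{letter}{digit}": s
    for letter, row in SCORES.items()
    for digit, s in enumerate(row)
}

good_for_white = sorted(code for code, s in scores.items() if s >= GOOD_THRESHOLD)

# Every B and C group starts with 1. e4.
best_after_e4 = max(s for code, s in scores.items() if code[0] in "BC")

print("Good for White:", ", ".join(good_for_white))
print("Best 1. e4 group:", best_after_e4)
```

The filter returns exactly the nine groups named above (A5, A7, A8, A9, D3, D5, D6, E0, E9), and no B or C group reaches the cutoff; the best of them sits at 56.1.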
Now suppose you are playing Black. What are the good choices? Well, as I said before, you can hope your opponent plays the King’s Gambit. But that happens only about once every 500 games, so you’d better have another strategy.
Actually, the best strategy for Black is blindingly simple. And it’s hardly original. Against 1. e4, play the Sicilian (B2-B9). Although B3, B6, and B7 (roughly the Sveshnikov, Richter-Rauzer, and the Dragon) aren’t great, everything else scores 47 percent or better for Black, which is really good. Also, because of the popularity statistics, you have a good chance of getting a Closed Sicilian (B2 is the most popular Sicilian group, with about a fifth of the Sicilian games), which is basically a coin flip (50 percent).
Against 1. d4, play the Nimzo-Indian (1. d4 Nf6 2. c4 e6 3. Nc3 Bb4), openings E2-E5. Again, they’re all in the 47 percent range for Black, which is the best that Black can do in a d-pawn opening. The only problem, of course, is that White knows this, and so 3. Nf3 (E1) has become an ultra-popular response. In fact, apart from the grab-bag category A4, it’s the most popular ECO code that doesn’t start with 1. e4. And code E0, moves other than 3. Nf3 (principally 3. g3), is even better for White. Sorry about that! There’s no free lunch.
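Black’s score is simply 100 minus White’s. A minimal sketch applying that to the Sicilian (B2-B9) and Nimzo-Indian (E2-E5) groups, with scores and game counts copied from the two tables (`WHITE_SCORE` and `GAMES` are illustrative names):

```python
# White's score (%) for the Sicilian and Nimzo-Indian groups, from the grades table.
WHITE_SCORE = {
    "B2": 50.6, "B3": 56.1, "B4": 52.8, "B5": 52.0,
    "B6": 55.1, "B7": 55.7, "B8": 52.4, "B9": 52.9,
    "E2": 52.4, "E3": 53.6, "E4": 52.7, "E5": 52.2,
}
# Number of games in each Sicilian group, from the popularity table.
GAMES = {
    "B2": 40990, "B3": 40242, "B4": 36859, "B5": 26174,
    "B6": 9946, "B7": 12122, "B8": 24004, "B9": 22761,
}


def black_score(code: str) -> float:
    """Black's score in percent, rounded to one decimal like the table."""
    return round(100.0 - WHITE_SCORE[code], 1)


sicilian = [f"B{d}" for d in range(2, 10)]
nimzo = ["E2", "E3", "E4", "E5"]

# Sicilian groups where Black scores at least 47 percent.
solid_sicilians = [c for c in sicilian if black_score(c) >= 47.0]
# Fraction of all Sicilian (B2-B9) games that fall in B2.
b2_share = GAMES["B2"] / sum(GAMES[c] for c in sicilian)

print("Sicilians with Black >= 47%:", solid_sicilians)
print("Nimzo-Indian, Black's score:", {c: black_score(c) for c in nimzo})
print(f"B2 share of Sicilian games: {b2_share:.1%}")
```

Only B3, B6, and B7 fall below 47 percent for Black, the Nimzo-Indian groups land between 46.4 and 47.8 percent, and B2 accounts for a little under a fifth of the Sicilian games.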
Finally, I’d like to end the post with some musings about Fair Chess. A Fair Chess opening is one that offers both players an expected score of about 50 percent, and you can see from the table that there are two of them: A0 (miscellaneous first moves) and B2 (Closed Sicilian). Both of them are still very slightly in White’s favor, but you can make them even fairer if you want. For code A0, you simply say that White is not allowed to play 1. e4, 1. d4, or 1. c4. Furthermore, if White chooses 1. Nf3 and Black plays 1. … d5 in response, then White is not allowed to play 2. g3. This takes out variations A07 and A08, which are slightly better for White. The remaining variations, A00-A06 and A09, have a combined win percentage of 50.2 percent, which is very close to even. In 1000 games, White would score 502-498.
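The two A0 restrictions can be written as a literal check on the opening moves. A minimal sketch, assuming moves are given as a list of SAN strings alternating White and Black; the function name `breaks_a0_rules` is hypothetical, and the check does no transposition detection, so a line like 1. Nf3 d5 2. d4 passes even though it leads to a double d-pawn game:

```python
def breaks_a0_rules(moves: list[str]) -> bool:
    """Return True if the opening moves violate the A0 Fair Chess restrictions.

    `moves` lists SAN moves in order, White first (e.g. ["Nf3", "d5", "g3"]).
    Check and mate suffixes ("+", "#") are ignored.
    """
    clean = [m.rstrip("+#") for m in moves]
    # Restriction 1: White may not open with 1. e4, 1. d4, or 1. c4.
    if clean and clean[0] in {"e4", "d4", "c4"}:
        return True
    # Restriction 2: after 1. Nf3 d5, White may not play 2. g3 (removes A07-A08).
    if clean[:3] == ["Nf3", "d5", "g3"]:
        return True
    return False


print(breaks_a0_rules(["Nf3", "d5", "g3"]))  # forbidden King's Indian Attack setup
print(breaks_a0_rules(["b3", "e5"]))         # allowed flank opening
```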
But there is an even better and simpler version of Fair Chess: just play the Closed Sicilian (B20-B26). Here White’s win percentage is 50.1 percent, meaning that in 1000 games White would score 501-499. This advantage is so small that I think Black would be willing to live with it. In fact, it’s certainly within the sampling uncertainty (if you view this database as a random sample of all possible competently played chess games).
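How large is the sampling uncertainty? A rough sketch: treat each game as an independent result worth 1, 1/2, or 0 points for White, borrow the outcome mix of the whole database (an assumption, since the post doesn’t give the Closed Sicilian’s own win/draw/loss split), and compute the standard error of the score. Because B20-B26 is a subset of the 40,990 B2 games, its standard error can only be larger than the value computed for 40,990 games.

```python
import math

# Outcome mix for the whole database, as quoted in the post (rounded percentages).
# Assumption: the Closed Sicilian has a similar win/draw/loss mix.
P_WHITE_WIN, P_DRAW, P_BLACK_WIN = 0.351, 0.393, 0.257


def score_std_error_pct(n_games: int) -> float:
    """Standard error of White's score, in percentage points, over n independent games."""
    total = P_WHITE_WIN + P_DRAW + P_BLACK_WIN  # 1.001 after rounding, so renormalize
    p_win, p_draw = P_WHITE_WIN / total, P_DRAW / total
    mean = p_win * 1.0 + p_draw * 0.5        # expected points per game
    mean_sq = p_win * 1.0 + p_draw * 0.25    # expected squared points per game
    per_game_sd = math.sqrt(mean_sq - mean ** 2)
    return 100.0 * per_game_sd / math.sqrt(n_games)


# All of B2 has 40,990 games; B20-B26 is a subset, so its standard error is at least this big.
se = score_std_error_pct(40_990)
print(f"Standard error: {se:.2f} percentage points; offset from 50%: 0.10")
```

That comes to roughly 0.19 percentage points, about twice the 0.1-point gap between 50.1 and 50 percent.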
Why does Fair Chess matter? Because it can replace Armageddon! Longtime readers of this blog know that this is a huge issue for me, because Armageddon (where White gets a time advantage but Black gets draw odds) is a distortion of regular chess. It is generally used in playoff situations, for instance in matches where the players remain tied after several rounds of blitz chess. I believe it is an abomination that should be exterminated.
By contrast, we have an option that is real chess. Simply play Fair Chess until somebody wins a game. As noted, both versions of Fair Chess give White a tiny, microscopic, meaningless advantage, but we can cancel out even that advantage by giving Black a choice of which version to play. He can choose either A0 without A07 or A08, or he can choose the Closed Sicilian (B2 without B27-B29).
What do you think? Do you like my idea of replacing Armageddon by Fair Chess? Which version of Fair Chess would you choose? Do you have any other comments on the opening grades?
Next time, we’ll get to look at the good, the bad, and the ugly of individual opening variations (the second digit of the ECO code).
Addendum: Actually, the rules for A0 Fair Chess need to be a little bit more complicated, because of the possibility of transpositions. For instance, after 1. Nf3 d5 White can play 2. d4, transposing into a double d-pawn opening where he has a distinct advantage. I’ve tried tinkering with the rules to say what moves are “forbidden,” but it tends to get messy. Perhaps the best way nowadays is to have a computer program turned on for the first five moves that tells you whether the opening can still be considered A00-A06 or A09. White just needs to know that he is supposed to play a flank opening that is not an English or a King’s Indian Attack, and the computer will arbitrate in case of a dispute.
A slight revision to the rules for B2 Fair Chess would also be necessary to prevent White from transposing into an open Sicilian a move or two later than usual. I would suggest the following:
1. The game starts 1. e4 c5.
2. White is not allowed to play 2. Nf3.
3. White is not allowed to play d4 within the first five moves.
I think this would sufficiently force White into a non-open Sicilian setup.
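The three B2 rules can also be checked mechanically. A minimal sketch, assuming moves are given as SAN strings alternating White and Black; `breaks_b2_rules` is a hypothetical name, and I am assuming rule 3 refers to the pawn move d4 itself, so a pawn capture landing on d4 (such as cxd4) is not caught:

```python
def breaks_b2_rules(moves: list[str]) -> bool:
    """Return True if the opening moves violate the B2 Fair Chess rules.

    `moves` lists SAN moves in order, White first. Check and mate suffixes are ignored.
    """
    clean = [m.rstrip("+#") for m in moves]
    # Rule 1: the game starts 1. e4 c5 (a shorter prefix must still match).
    required_start = ["e4", "c5"]
    if clean[:2] != required_start[:len(clean)]:
        return True
    # Rule 2: White may not play 2. Nf3.
    if len(clean) > 2 and clean[2] == "Nf3":
        return True
    # Rule 3: no d4 pawn move among White's first five moves (list indices 0, 2, 4, 6, 8).
    if "d4" in clean[0:10:2]:
        return True
    return False


closed_setup = ["e4", "c5", "Nc3", "Nc6", "g3", "g6", "Bg2", "Bg7", "d3"]
late_open = ["e4", "c5", "Nc3", "Nc6", "Nf3", "e6", "d4"]
print(breaks_b2_rules(closed_setup), breaks_b2_rules(late_open))
```

Here the Closed Sicilian setup with g3, Bg2, and d3 passes, while 1. e4 c5 2. Nc3 Nc6 3. Nf3 e6 4. d4, an open Sicilian reached a move or two late, is rejected under rule 3.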
The point of Armageddon is that you are guaranteed a result in a set period of time. One game, one result. In your Fair Chess idea, players could play for eternity with draw after draw and no result. I don’t see how that solves the problem.
I am a bit confused. When you wrote that “ChessBase itself does not give users a way to search openings by ECO code,” you must be talking about something other than the Filter command (Ctrl-F). When I searched for A02 and restricted the results to games between masters and since 2000, I found 629 games in my main database.
Your results are interesting. I am a bit concerned about temporal effects, i.e. which chess epoch you are playing in. Who played the Berlin defense before 2000? Now it seems almost every top player has tried it. And amateur players often imitate the opening choices of the Great Masters.
Hi Michael,
I couldn’t find it on the menu of options, but I didn’t know about any keyboard shortcuts (i.e., Ctrl-F). To be honest, I don’t know ChessBase 13 very well yet. Previously I had something like ChessBase Light 9, and it’s taken me a while to figure out how to do the things that I knew how to do on the earlier version. (I hate upgrades. They always make it harder to do the things you want to do, and the new stuff is never worth it.)
So I should probably have said that *I* couldn’t figure out how to do an ECO code search, not that it couldn’t be done.
Rob Weir got 655 games in code A02, so the two of you are in the same ballpark but maybe with a slightly different cutoff date. (Maybe he included the year 2000 and you didn’t?)
Yeah, of course you can search by ECO code; it’s one of the first features you would put into any chess database. It was probably in ChessBase 1 🙂
Dana – it’s in the same search dialog with all the other filter criteria, such as player name, year, and rating.
This is really bugging me, but I think I have a possible explanation. I have ChessBase 13 (the program) but I do not have Megabase or whatever they call the database that comes on a hard disk. Instead, I access their online database via ChessBase 13. There are two reasons I do this: one is that I’m a cheapskate and don’t like to pay for something I can get for free. The other reason is that the online database is constantly updated, but a hard-disk database would be static.
The absence of the ECO code search (which, believe me, I’ve looked for) may be their way of penalizing cheapskates like me. “Okay, you can use our online database, but we won’t let you do the most elementary opening search.”
Is this possible? Am I missing something? Or is my ChessBase 13 just defective?
Maybe that’s it! I haven’t used their online database, so I don’t know anything about searching it.
The ECO is to openings as the Mercator Projection is to cartography.
I question whether these openings would remain fair for long if opening analysts turned their full efforts to them, as they surely would if the outcome of major tournaments was known to hinge on a single fair-chess game.
I also think that relative performance of openings hides some gotchas. Some players have a set opening repertoire against all opponents, but others choose their opening based on their opponent. I believe this is why in chess.com’s master database the Exchange French does better for Black than for White. It’s hard to imagine that Black has any advantage in this opening–even for a French player like me it’s hard to imagine!–but White may be playing it out of fear, and sometimes that fear will have been justified…. I wonder how many Closed Sicilians arise this way as well.
For the top letter/number openings you might try limiting the search to games where the opponents’ ratings were within a certain limit, say 100 or 200 points; there might be enough games left for that to work.
I like the idea in your last paragraph and I’ll suggest it to Weir, if he isn’t reading these comments already.
The French Exchange Variation is an interesting case. I think you’re right about the psychology, that people go into it because they’re scared and not because they’ve actually studied the Exchange Variation. Of course, I had a recent post about it where I discussed a positive reason for White to play it and a strategy he could use. So it *is* possible for a master player to play it as White for good reasons, and I would expect such players to have a slightly better than 50 percent score.
Playing the devil’s advocate, though, there is one reason that Black could (in theory at least) claim an ultra-small advantage in the French Exchange. In the opening White usually has an inherent advantage in space and mobility, while Black has an advantage in *information*–White has to choose a setup first, and this often lets Black react accordingly. In the French Exchange, White has given up his advantage in space and mobility, but Black still has his advantage in information!
That’s one reason that Black does well after 4. Bd3; he plays 4. … Nc6 and the bishop either becomes a target or White commits himself to a certain pawn structure with 5. c3 (thus ruling out c4, at least for the time being). Black has definitely won the information battle. The thinking behind Mike Splane’s move 4. Be3 is that it’s very non-informative. Now maybe Black will play 4. … Bd6 and then White will win the information battle.
I’m not 100 percent convinced of this “information theory” myself, but it at least gives you a different way of thinking about what’s going on in a variation like the Exchange French.
Hello Dana,
Your statistics show White’s winning expectation is 54% from the symmetrical opening position. Shouldn’t it remain 54% if the position is still symmetrical after a few moves, for example after moves like 1.e4 e5 2. Nf3 Nf6, or 1. e4 e6 2. d4 d5 3. ed5 ed5. White still has the advantage of the right to move first but somehow chess “wisdom” dismisses these lines as equal positions.
What do the statistics say about these two opening lines?
If the positions really are statistically equal, how do you explain the disappearance of the 54%-46% advantage for White? Something doesn’t add up.
Hi Mike,
I think you might find my next post relevant to your question. White’s third-best opening variation according to the data … is a symmetric position! Not only has White maintained his 54-46 advantage, but it’s gone up. On the other hand, it’s also true that some of the most drawish opening variations (three of the top five) are symmetric positions.
What this shows is that not all symmetric positions are created equal. In particular, a symmetric position where there have been no pawn exchanges is still a position in its embryonic stage, full of possibilities. I think that the drawing chances go up, and perhaps White’s advantage goes down, after a file opens up or the pawn formation becomes locked.
Maybe ECO codes are just useless today. Here is what GM Ian Rogers wrote in a Chess Life Online article about the Reykjavik tournament:
“Showing that the Encyclopaedia of Chess Opening codes, developed by Chess Informant in the 1960s, are no longer essential knowledge for top players, neither Carlsen nor Hammer knew that the A45 code represented the Trompowsky opening. (Since Carlsen had twice prepared for world title matches against Viswanathan Anand, and the Indian had used the Trompovsky to win a critical game against Anatoly Karpov in their 1997 FIDE World Championship match, team MC/Hammer’s guess of the Reti Opening is doubly surprising.)”
http://www.uschess.org/content/view/12985/806/