Or Why Silumgar Is Not the Best Deck and Why 8-4s Make You a Worse Drafter
Imagine that you get up to go to the airport. Your wife drives you in and you shoulder your bags and walk towards the ticket counter. You get to security, but before you go through, you turn to your wife and tell her that there’s a fifty percent chance you won’t be coming home. That was the life of a bomber pilot in World War II. The casualty count was a state of national emergency, and the Navy decided they needed to call in someone with a special set of skills. He wasn’t Liam Neeson. He was a statistician.
Okay, technically they formed an entire organization of statisticians with the goal of solving all kinds of technical and logistical problems in the war, but there was one statistician who stood out. His name was Abraham Wald, and it’s a shame that more people don’t know who he was. He was an absolute war hero, saving thousands of lives, and revolutionizing the war. A lot of his work was classified, only to be released at a later date, and he died when he was still young, but his story is still incredible; he’s basically the Captain America of statistics.
Once the Navy formed up this group of mathematicians, they set them to work on figuring out how to get more of their planes home. So they started looking at all of the planes that returned from battles. The planes were riddled with bullet holes, but the holes appeared most frequently along the wings, the body, and near the tail gunner. They knew they needed more armor, but you couldn’t just add plates of metal to the entire plane. You only got so much armor to add, so you had to make sure you got it in the right places. They determined that since the planes were getting shot in these three places, the wings, body, and tail gunner, they should reinforce the armor in those locations and it should improve the survivability of the aircraft.
This is where Abraham Wald enters the story. You might not have seen the problem with this strategy, but Abraham Wald definitely did. The problem is that these were the planes that had survived their missions. This data wasn’t showing researchers where a plane was more likely to be hit; instead, it showed them where a plane could take a bullet and keep on flying. It was obvious that a plane could take a large amount of damage on the wings, along the body, or near the tail gunner without going down, because all of these planes were already doing that. They were missing important data, which was the location of the bullet holes on the planes that didn’t return. Wald did a bunch of very mathy and super cool calculations to figure out where the bullet holes would cause the most damage based on this missing data. Basically, he said that the places with the fewest bullet holes were the most critical, because they could probably handle the least damage without going down. So they reinforced those areas, and the survivability of US bomber pilots went up dramatically. His calculations are still in use by the US military.
This concept is called survivorship bias. A simple definition is that survivorship bias is when research is focused on survivors or winners and omits data about casualties or losers. It’s a bias of missing data. In the case above, the researchers reached the wrong conclusions because they focused on the bombers that survived their runs. This same concept is used to better understand investing, medicine, and dating, and it’s also the reason why 8-4 drafts make you into a worse drafter and Silumgar isn’t the best deck in DTK draft.
Let me start by making some obvious connections between survivorship bias and MTG, and then I’m going to explain how it relates to 8-4 drafts and Silumgar in DTK draft.
We’ve all seen survivorship bias in action. For example, we just had Pro Tour: Dragons of Tarkir, which featured Martin Dang winning the whole thing with RG aggro. Someone operating with survivorship bias would then say that RG aggro is the best deck in standard because it just won the Pro Tour. However, a better analysis would realize that the UB control deck had three copies in the top 8, and that people playing the deck had a sort of absurd win rate in standard. Meanwhile, there was only one RG aggro deck in the top 8. It seems like RG aggro would be pretty bad in a metagame full of Abzan midrange decks, but in the top 8, he managed to face three slow control decks in a row before taking home the trophy. It seems clear that these UB control decks were optimized for a metagame that was more green-centric, and that Dang was pretty lucky to be in a top 8 facing only the decks that his deck probably fares the best against.
Another example of survivorship bias would be when someone calls Shahar Shenhar the current best MTG player in the world because he has won back-to-back World Championships. While Shahar is definitely a very good player, this would not take into account the nine players that are currently ahead of him on the top 25 pro player rankings, where Eric Froehlich has just edged out Owen Turtenwald. Both of those players are at the top of the list because they consistently put up good results at most of their major tournaments, though they didn’t take home the World Champion trophy.
Both of these examples show the fallacy of focusing on only the winners. For people that play a lot of Magic and follow the Pro Tour closely, it seems obvious that it is a logical fallacy to focus only on the deck that won the PT, or to focus only on the player that won the World Championship, because we see all of the non-survivors over and over again. For regular competitive players, we aren’t missing that data. We can look at the pro point totals for the year, or we can look at the other decks that made the top 8 at PT DTK. We can look at the limited records of the players that made top 8, and we can see data on how well various decks performed at all levels of that tournament. We aren’t missing so many pieces of data on the non-survivors, so we are able to draw better conclusions.
There are a few more places where survivorship bias is very common in Magic, but people seem to accept it much too readily, and that’s what I’m going to discuss here.
Is Silumgar (UB) the best deck in DTK Draft?
During Pro Tour Dragons of Tarkir this past weekend, I heard over and over from players and commentators that UB is just the best deck in Dragons of Tarkir. It seems obvious that this is the “consensus best deck” in the format. The generally accepted wisdom was also that green is just the worst color. When I heard this, I was skeptical, because green has been so good in my own experience. I wondered whether I was just getting lucky with my green decks and unlucky with my UB decks. I wondered if I was just missing something. But I also wondered if there were some effect that was causing such a large number of players to overestimate UB. I’ve definitely written articles in the past that called out the “generally accepted wisdom” as being incorrect, but I don’t like to do that unless I have some good solid data to the contrary.
Luckily, I had started my statistical analysis of Dragons of Tarkir, so over the past week I’ve been gathering data and starting to get a much better understanding of the format, and it turns out that UB is performing poorly while green is performing very well. The next question was to figure out why so many players were jumping on the UB bandwagon. As I was mulling over the topic, I came across this thread in the LRCast subreddit, which inspired me to put together this blogpost.
Imagine that a draft format has three different decks. I’m going to call them Solid decks, Slugger decks, and Achilles decks. Each of those decks has three different versions that you might end up with, at varying degrees of strength, analogous to their win rate. Solid decks are the kind that are always going to be competitive, but they won’t necessarily be the best deck at any given table. A great example is the BG decks from Theros block, which were always competitive, though they were never as powerful as the best heroic decks. We’ll set the ratings for the different versions at 64, 65, and 66. Slugger decks are the kind that are usually mediocre, but sometimes they are absurdly good and nothing else in the format can compete with them. A good example would be the UR Kiln Fiend deck from Rise of the Eldrazi. It usually wasn’t very good, though you could sometimes win against bad decks just because you had a lot of evasion. But when everything came together, it was so fast that most decks could never compete. We’ll set the numbers for it at 95, 45, and 40. The last deck is the Achilles deck. This is the kind of deck that is usually very good, but when it gets overdrafted or comes together poorly, it goes from very good to abysmal. An example of this would be the BG infect deck from Scars of Mirrodin. It was usually a very strong deck, but sometimes you would get a version with too even of a balance between infect and regular damage, with not enough power on either side, and your deck would just be trash. We’ll set the numbers for this deck at 20, 75, and 85.
This is where we start to see survivorship bias affect the community’s evaluations of a format. Because the community is so affected by hyperbole and extreme evaluations based on limited sample sizes, you’ll often see this survivorship bias have a strange effect on evaluations. Imagine that the community lines these decks up in order: Achilles 20, Slugger 40, Slugger 45, Solid 64, Solid 65, Solid 66, Achilles 75, Achilles 85, and Slugger 95. If you just take the top three performing decks from this lineup, you’ll see an interesting fight between the Achilles and the Slugger deck. People will go back and forth over which of the two decks is stronger. People will say that the Solid deck is just bad because it doesn’t hold up to these other decks. Soon, you’ll have people posting their 3-0 decklists from the Achilles and Slugger decks, and the community will quickly come to a consensus that these are the best decks.
This happens in virtually every set; early on in the format, people latch on to a deck that is occasionally very powerful, and they call it the best deck in the format. It’s exactly what is happening right now with the UB deck. If I were to categorize that deck, I would put it in the Slugger category; when the UB deck is open and comes together, it is easily the best thing in the entire format. If you face opponents that just don’t try to beat you down, you’re going to crush everybody. Meanwhile, the green decks fit into the Solid category. They don’t even show up at the top of this list, so they kind of get dismissed, and you see something like what happened when Shahar Shenhar completely avoided green in his day 2 draft, and then ended up going 1-2.
The problem is survivorship bias. People are missing huge pieces of important information. The Slugger and Achilles decks have a cost, and that cost is that sometimes they are much worse than the Solid decks. Now, it might not seem like much, but if you take the exact example above, you’ll see that the Solid deck averages out to 65, while the other two decks average out to 60. Again, this is a hypothetical example, and not a literal or exact example, but it is meant to demonstrate how a good deck can often be overlooked by people that play too heavily into survivorship bias.
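To make the arithmetic concrete, here is a small Python sketch using the hypothetical ratings from the example above. The numbers are the made-up ones from the example, not real win rates, and the function names are just illustrative. It compares what you see when you only look at the top three results with what you see when you average every version of each deck.

```python
# Hypothetical ratings from the example above (not real win rates).
ratings = {
    "Solid": [64, 65, 66],
    "Slugger": [95, 45, 40],
    "Achilles": [20, 75, 85],
}

def top_results(ratings, n=3):
    """Return the n highest (rating, archetype) pairs: the 'survivors'."""
    all_results = [(r, name) for name, rs in ratings.items() for r in rs]
    return sorted(all_results, reverse=True)[:n]

def averages(ratings):
    """Average rating per archetype, counting every version, good or bad."""
    return {name: sum(rs) / len(rs) for name, rs in ratings.items()}

survivors = top_results(ratings)
print(survivors)  # [(95, 'Slugger'), (85, 'Achilles'), (75, 'Achilles')]

for name, avg in averages(ratings).items():
    print(name, avg)
```

The Solid deck never appears among the survivors, yet it has the highest average once the bad versions of the other two decks are counted.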
This is where the Magic echo tunnel becomes very problematic. With UB, you see a small group of players with a lot of influence touting the UB deck as the best deck in the format. You see people getting insane versions of the deck, and posting their decklists, records, and videos. What people don’t show you is the drafts where they attempted the UB deck and it just fell apart. It’s not malicious on their part, it’s just that people aren’t that interested in reading about a UB deck that didn’t work. Soon, all the negative examples of the deck will be pushed out by the community, and you have this consensus start to build that UB is the best deck in the format, when it really isn’t. I’ve been running the numbers on this deck, and it’s just not performing anywhere near what people think it’s doing, and a lot of players are just missing out on a lot of great decks.
The bad UB decks are pretty bad. They might not have enough Exploit based card advantage, which means that they can’t end up winning the long game. They might not have enough Exploit enablers, so it’s too hard for the pieces to come together in the same hand. They also might be just too slow to compete against some of the faster decks, and get run over by large creatures. Finally, they often just don’t have good long game win conditions, so they aren’t able to close out the game. When all of these things line up perfectly, the deck really is unbeatable, but you just can’t throw out the negative examples when you talk about the deck, and UB has a lot of them. Meanwhile, I’ve been watching green decks that look really mediocre just put up consistently solid results. It might not ever feel like the best deck at the table, but it performs well enough on average to compete against the other decks.
This is the point. It’s vital to look at both the winners and the losers when you evaluate a format. Looking only at the good examples of a deck is going to lead to a very skewed perspective on a format, and it leads even great players to have very mediocre results. It’s a phenomenon that I’ve only seen grow over the past five years in MTG, since we have such a powerful, immediate feedback loop. The conversation on decks forms so quickly based on information that is biased towards winners, and it’s very difficult for different ideas to break through that monoculture.
But there’s another aspect of MTG that is even more dramatically affected by survivorship bias. It contributes significantly to this exact problem with deck evaluation, but it also causes many other kinds of misunderstandings. This is the problem of 8-4s.
Survivorship Bias in 8-4 Drafts
The subtitle of this article is “Why 8-4s Make You a Worse Drafter” but the truth is more nuanced than that. There are lots of great benefits to drafting in 8-4s. You face better competition. If you’re better than about 80% of the field, then it will allow you to draft nearly endlessly at a very small cost. However, if a player does not account for survivorship bias, then 8-4s can teach them bad habits and reinforce incorrect strategies, and actually make them into a worse drafter in important competitions.
How many limited events are actually single elimination? Pro Tour drafts use a swiss format. It’s important to go 3-0 if you want to make top 8, though it’s not necessarily required, but a 2-1 performance in both drafts can be a solid foundation to put together solid PT finishes even if you don’t make the top 8. Grand Prix drafts also use a swiss format; it can feel like single elimination when you enter the day on 7-2, but even if you pick up a few losses, you can still piece together a decent money finish. Grand Prix Top 8 Drafts are single elimination, and match wins are heavily skewed towards the person that wins the draft, so that’s the format that’s most analogous to 8-4 drafts. One of the problems with 8-4 drafts is that the format does not match up very well with the formats for which you are preparing.
But the real problem with 8-4s is the way that they reinforce survivorship bias. When people only draft 8-4s, they can often reinforce the appearance that draft winners (or survivors) are stronger than shows up in their actual results. 8-4 drafts don’t often show you the difference between decks that would go 0-3 or 1-2 and the decks that would end up going 2-1. In a swiss draft, you get a lot more data points on every draft that you do. It’s easier to figure out which decks are the Solid decks, which are Sluggers, and which are Achilles, because you have to play out all three rounds, regardless of when you take your losses. In about 20% of my swiss drafts, I end up losing in the first round. In an 8-4, this can make me feel like the deck I just drafted is not very good. But in a swiss draft, I often go on to get a 2-1 record with those same decks. I just get more data points on a deck that wouldn’t be a round one survivor in an 8-4 draft.
One of the other big problems is that it reinforces the behavior of drafting swingy decks. Players will overvalue decks that can produce an easy 3-0, 6-0 win in an 8-4. One problem I see frequently is that players will abandon mediocre drafts; they aren’t willing to fight it out with an average deck. This happens a lot when you see people migrate out of an 8-4 draft into a swiss format. When they lose in round one, they’ll just drop from the event, instead of fighting out the next two rounds. This is a horrible practice that will quickly turn you into a worse drafter. Anybody can win rounds with great decks; that’s not hard to do. But great drafters are able to grind out wins with mediocre decks. 8-4s also cause players to value Slugger decks too highly because they are overawed by the drafts that they win, but forget the ones where they lose in round one. I’ll often hear players shrug off their round one losses because they just got unlucky, but a lot of the time, those players actually drafted quite poorly. But they forget those drafts because they lost in the first round, and it wasn’t that big of a deal. In a Swiss event, when I draft a deck that goes 1-2 or 0-3, which happens extremely rarely, I know that something went critically wrong with my draft. These are often the drafts that teach me the most about the format, because I start to find the kinds of deck problems that create fatal situations.
I want to make it clear that although the subtitle of this post is that 8-4s make you a worse player, that’s not the actual point I’m trying to make. You’ll definitely face better competition in 8-4s, though the difference between skill level in 8-4s and Swiss drafts is usually exaggerated. The point is that 8-4s reinforce survivorship bias, and if you don’t take steps to fight back against that bias, then you will have major holes in your abilities. Furthermore, this isn’t the kind of thing that you can simply wave away and say “well, I don’t do that, so I’m okay.” What I am describing isn’t just a bad habit that you can avoid; it’s human nature. Our psychology and our mental capabilities make us prone to survivorship bias, regardless of a person’s intelligence level. This means that you have to take deliberate steps to overcome survivorship bias.
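The difference in data points can be counted directly. The following Python sketch is only an illustration: it assumes an eight-player pod, three rounds, and nobody dropping from the swiss event, and the function names are made up for this example. It counts how many matches each deck gets to play in a single-elimination draft versus a swiss draft.

```python
def single_elim_matches_per_player(players=8):
    """Matches played by each seat in a single-elimination pod.

    Assumes the player count is a power of two, as in an 8-player pod.
    """
    counts = []
    remaining = players
    round_number = 0
    while remaining > 1:
        round_number += 1
        losers = remaining // 2
        # Players eliminated this round played exactly round_number matches.
        counts.extend([round_number] * losers)
        remaining -= losers
    counts.append(round_number)  # the winner played every round
    return counts

def swiss_matches_per_player(players=8, rounds=3):
    """Matches played by each seat in swiss, assuming nobody drops."""
    return [rounds] * players

single_elim = single_elim_matches_per_player()
swiss = swiss_matches_per_player()
print(single_elim)  # [1, 1, 1, 1, 2, 2, 3, 3]
print(sum(single_elim), sum(swiss))  # 14 24
```

Under these assumptions, half of the decks in a single-elimination pod get exactly one match of data, while every deck in the swiss pod gets three.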
How do you do that? The number one thing that you can do is to start keeping track of your results. At some point, I’ll put a link to the spreadsheet that I use to keep track of tournament results, and I’ll go into depth about how to use it, but this is the biggest thing that you can do to help yourself overcome these problems. In my draft spreadsheet, I keep track of the archetypes that I draft, as well as the formats, and over time I can build up a pretty good understanding of what decks are performing best for me in a given format. Recording data, specifically in a way that notes the pieces of data that human beings tend to overlook, is the most important way to fight back against survivorship bias.
The other key thing is to search out information that does more than just confirm the biases that you already have. If you think that UB is the best deck in the format, you should be looking for all the reasons why it is not the best deck, rather than just seeking out information that confirms your biases. It’s very valuable to take note of the data that you are missing, make hypotheses about what that missing data might represent, and then actively seek out as many of those pieces of data as you can find.
Understanding Magic is like building a puzzle, and not just any kind of puzzle, but one that is giant. Many of the pieces have similar shapes, and many of them look similar on their face. Furthermore, it is a puzzle that is simply too big for any one person or group to build by themselves. Instead, you’ll have some people working on one corner, and some people working on another. The problem with survivorship bias is that it represents the human tendency to throw away certain groups of data. This is like working on a puzzle, and simply throwing away the pieces that look like they don’t fit. This is one of the keys to success: finding the missing pieces.
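If you would rather start with something simpler than a spreadsheet, the same idea can be sketched in a few lines of Python. This is not my actual spreadsheet; the log layout, the function name, and the sample rows are all hypothetical placeholders. The important part is that every draft gets logged, including the ones that fall apart.

```python
from collections import defaultdict

# Hypothetical log entries: (format, archetype, match wins, match losses).
# These rows are made-up placeholders, not real results.
draft_log = [
    ("DTK", "UB", 3, 0),
    ("DTK", "UB", 0, 3),
    ("DTK", "GW", 2, 1),
    ("DTK", "GW", 2, 1),
]

def win_rates(log):
    """Match win rate per (format, archetype), counting every logged draft."""
    totals = defaultdict(lambda: [0, 0])  # key -> [wins, matches played]
    for fmt, archetype, wins, losses in log:
        totals[(fmt, archetype)][0] += wins
        totals[(fmt, archetype)][1] += wins + losses
    return {key: w / played for key, (w, played) in totals.items() if played}

for (fmt, archetype), rate in sorted(win_rates(draft_log).items()):
    print(fmt, archetype, round(rate, 2))
```

In this made-up log, the archetype with the flashy 3-0 ends up with a lower overall win rate than the one that quietly goes 2-1 twice, which is exactly the kind of thing memory alone tends to miss.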