Tuesday, 13 December 2016

Is It Possible to Determine Caster and Faction Strength?

I just saw a post from a website that lets people self-report battle results, and then posts faction rankings and a top 10 caster list based on this data on Facebook.


I did some quick analysis, and there are numerous problems with the data set. They lead me to believe that these top 10 lists do not show the strongest casters and factions, but are so noisy and inaccurate as to be almost random.

Analysis of the data set

Bear in mind I am not a statistician, but I have studied math and have work experience with programming, algorithms, game design and analysis of game statistics. One of the things I have learnt is that it is bloody hard to get useful information out of data!

Bias Due to Self Reporting

All of the data is based on voluntary self reporting. This is always a low quality data source. People can choose to report or not report their games for any reason, and this introduces an unknown amount of bias in the data. As far as I can tell people do not need to log in to report games, so even though the app reports a number of players, I am unsure how they can know.

From just looking at the reported games, it seemed pretty obvious to me that a lot of the games with some casters come from only a very few individuals.

Too Low Volume to Do Analysis

The amount of data is low. There are around 180 warcasters and warlocks in the game, and as I understand it 3500 games have been reported (as of today). This means that on average there are fewer than 20 games per caster. You cannot compute a meaningful win rate from such a low number of matches per caster.
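To put a rough number on how little 20 games tells you, here is a minimal sketch that computes a 95% Wilson score interval for a win rate. The function name and the example of 10 wins in 20 games are assumptions picked for illustration; nothing here comes from the website's own calculations.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Return the (low, high) Wilson score interval for a win rate.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2_over_n = z * z / games
    denominator = 1 + z2_over_n
    center = (p + z2_over_n / 2) / denominator
    half_width = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denominator
    return center - half_width, center + half_width


# Hypothetical caster with an even record over 20 reported games.
low, high = wilson_interval(10, 20)
print(f"20 games:   {low:.2f} to {high:.2f}")

# The same even record over 1000 games, for comparison.
low_many, high_many = wilson_interval(500, 1000)
print(f"1000 games: {low_many:.2f} to {high_many:.2f}")
```

With an even record over 20 games the interval runs from roughly 30% to 70%, so such a caster cannot be told apart from one that is either quite weak or quite strong.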

The low number of matches per caster means that by chance popular casters will tend to go near 50% and the middle of the list, while less popular casters go to the top and bottom of the list.

To prove this, I sorted the list by the number of games reported. The most-reported casters are all well-known powerful casters, and none of them are in the "top 10 casters" reported by the website.
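The same effect can be reproduced with a small simulation. This is only a sketch under stated assumptions: every simulated caster has a true win rate of exactly 50%, half of them have 5 reported games and the other half 100, and the numbers 180, 5 and 100 are illustrative choices rather than figures from the actual data set.

```python
import random


def simulate_rankings(n_casters=180, few=5, many=100, seed=0):
    """Rank equally strong casters by their observed win rate."""
    rng = random.Random(seed)
    casters = []
    for caster_id in range(n_casters):
        # Alternate between rarely played and frequently played casters.
        games = few if caster_id % 2 == 0 else many
        # Every game is a fair coin flip: no caster is actually better.
        wins = sum(rng.random() < 0.5 for _ in range(games))
        casters.append({"id": caster_id, "games": games, "win_rate": wins / games})
    return sorted(casters, key=lambda c: c["win_rate"], reverse=True)


ranking = simulate_rankings()
top10 = ranking[:10]
bottom10 = ranking[-10:]

print("Games played by the top 10:   ", [c["games"] for c in top10])
print("Games played by the bottom 10:", [c["games"] for c in bottom10])
```

Even though no simulated caster is better than any other, the top and bottom of the ranking fill up with rarely played casters, while the frequently played ones cluster around 50%.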

Confounding Variables

The data does not contain a vital piece of information - accurate estimates of player skill. WARMACHINE is a skill intensive game. Being better is much more important than relative faction or caster strength! If an experienced player plays a new player he is almost certain to win. When you have confounding variables that are likely to be much more important than the variables you want to measure, and don't have those variables, you have a problem.
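A toy calculation shows how player skill can swamp caster strength. Everything in this sketch is an assumption made up for illustration: skill is expressed on an Elo-like scale, a caster adds a fixed bonus to its player's effective rating, and the specific values (1400, 1600, a bonus of 50, an average opponent at 1500) are invented.

```python
def win_probability(player_skill, caster_bonus, opponent_rating=1500):
    """Chance to beat an average opponent under a logistic (Elo-style) model."""
    effective_rating = player_skill + caster_bonus
    return 1 / (1 + 10 ** ((opponent_rating - effective_rating) / 400))


# Hypothetical: a genuinely stronger caster, mostly played by newer players.
strong_caster_new_players = win_probability(player_skill=1400, caster_bonus=50)

# Hypothetical: a weaker caster, mostly played by veterans.
weak_caster_veterans = win_probability(player_skill=1600, caster_bonus=0)

print(f"Stronger caster, newer players: {strong_caster_new_players:.0%}")
print(f"Weaker caster, veterans:        {weak_caster_veterans:.0%}")
```

Under these made-up numbers the stronger caster shows a win rate of about 43% and the weaker one about 64%, so a ranking by raw win rate would put them in the wrong order.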

Lack of Data Validation

You cannot discount the possibility of people simply reporting invalid data. Probably this would be rare, but with so few data points one bad apple could totally destroy the results. We don't know if this has happened, and probably not, but it cannot be ruled out - people do the strangest things.

Invalid Use of Elo Ratings

For some reason the factions and casters have been given an Elo rating, but Elo ratings are based on the skill of single people playing a game. You cannot apply an Elo rating to a caster played by several different people. Using Elo this way is akin to giving the white pieces in chess an Elo rating.

The Results are Obviously Wrong

If you look at the list of top casters and top factions, then the numbers seem to have nothing to do with what people win tournaments with. Since actually winning tournaments is the litmus test for what actually is strong in reality, this disparity shows that what is being computed is not what is claimed to be computed.

If you filter the data set to only include tournament games, you can see that even then the top casters are missing most of the really good ones.

The win rates reported are unlikely to be the real win rates. Warmachine is a well balanced game, and I would be very surprised if any caster had an intrinsic win rate of more than 55% against a field of other strong casters.


While I appreciate the effort in setting up this website, and it would have been a cool thing to have such information, a system like this would need strong quality control in order to do what is wanted. You just cannot get quality information from poor quality data, and having lots of poor quality data is not going to help a lot.

I would advise everyone to not use this data to support any conclusion about what factions are better or worse, which models to buy, or what casters to use in a tournament.

Possible Improvements

I would guess the goal of a project like this would be to figure out which casters or factions are stronger, since that is what the posted reports contain and what people want to know.

One way of doing that would be to just ask people the question directly. After you have played a lot of games, you get some idea of which casters are hard to beat, which ones are fairly even, and which ones are relatively easy. Of course the skill of the people in your local meta might affect this, but this should average out as more opinions are gathered.

Another would be to base the statistics purely on data that is not prone to self-selection, and where a single person cannot skew the results as much. The obvious source here is tournament data. Which factions are strongest? Well, the ones that win the most tournaments. Which casters are strongest? Also the ones that win the most tournaments. If you are able to track the players, you can compute their Elo, which would also help in removing player skill from the equation. There is of course some bias in that good players will take the strongest options. It would probably give reasonable data, though.
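For reference, tracking per-player Elo from tournament results could look something like the sketch below. It uses the standard Elo expected-score and update formulas; the K-factor of 32 and the starting rating of 1500 are common conventions assumed here, not values taken from any existing system.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (between 0 and 1)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return the new ratings for both players after one game.

    score_a is 1 if A won, 0 if A lost and 0.5 for a draw.
    k=32 is an assumed K-factor; real systems tune this value.
    """
    expected_a = expected_score(rating_a, rating_b)
    change = k * (score_a - expected_a)
    # What A gains, B loses, so the total rating pool stays constant.
    return rating_a + change, rating_b - change


# Two hypothetical players who both start at 1500; the first one wins.
winner, loser = update_elo(1500, 1500, score_a=1)
print(winner, loser)
```

The rating belongs to the person, which is the point: once each player has a rating, a caster's results can be judged against how well its players were expected to do, rather than against a raw 50%.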

1 comment:

  1. A very good and interesting analysis Christian. I agree with your points.

