11 Jun 2018

How we made our first data-led game to forecast the World Cup

Last week, my team at The Telegraph launched our first data-led game. It lets our readers forecast the Russia 2018 World Cup by rating how important they think each of six factors is in the world’s biggest footballing event.

The result would be a personalised projection of the competition based on what the reader thinks is important, as opposed to an austere "The Telegraph Predicts The World Cup" style of forecasting.

To do this, Patrick Scott and I had to collect a variety of data on six key areas for each team. Each of these six areas took in anywhere from one to six metrics, so that the final summary figure was a robust representation of how good that team is in that area.

The data behind the interactive

You can now see this data here. It gives each of the 32 teams a summarising score for each of the six areas, and then scales it to provide a score between zero and one.
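As a rough illustration of that scaling step, here is a minimal sketch in Python. It assumes simple min-max scaling, which is one common way to map scores onto a zero-to-one range and not necessarily the exact method behind the published data. The function name and the example numbers are made up for illustration.

```python
def scale_zero_to_one(raw_scores):
    """Min-max scale a dict of team -> raw score onto the range 0 to 1.

    Assumption: min-max scaling; the published data may use another method.
    """
    lowest = min(raw_scores.values())
    highest = max(raw_scores.values())
    spread = highest - lowest
    if spread == 0:
        # Every team has the same raw score, so nothing separates them.
        return {team: 0.0 for team in raw_scores}
    return {team: (score - lowest) / spread for team, score in raw_scores.items()}


# Made-up raw scores for three hypothetical teams.
example_raw = {"Team A": 12.0, "Team B": 30.0, "Team C": 21.0}
example_scaled = scale_zero_to_one(example_raw)
```

Under this approach the weakest team in an area always scores zero and the strongest always scores one, with everyone else spaced in between.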

Below, we’ve set out the figures we dug into to get a final score for each of the six factors. This includes the relative weightings we applied to each of the individual metrics, as some were deemed to be more important than others.

  • Qualifying record: Points won out of possible total in World Cup qualifiers (1.7x weighting)
  • Elo rating: Similar to a FIFA ranking, Elo ratings are a measure of a team’s strength based on their results. Teams gain or lose points after each game and are given more points for getting positive results against teams who are ranked higher than them. Home advantage and scoreline are also factored in (1.1x)
  • Weighted qualifying results: Qualifying record ranked by net Elo gains/losses in these games. This ensures that teams with tougher routes to the World Cup are rewarded more (1.2x)
  • Momentum: 12-month net Elo rating change (0.3x)

  • World Cup pedigree: Finishing positions at previous World Cups (2x weighting)
  • Performance vs expectation: How teams have over- or under-performed at World Cups. This is determined by the stage each team should have reached based on pre-tournament Elo ratings. Within this measure we’ve also factored in how teams from different continents fare in European-based World Cups, given Russia some home advantage and penalised the holders, Germany (1x)

  • Transfer value: Estimated transfer value of each squad (1.5x weighting)
  • Club quality: Players are ranked based on the strength of the league in which they play their domestic football (based on Club Elo rankings) (1x)
  • Experience: Total caps and previous World Cup appearances (0.5x)
  • FIFA 18 rating: The average Overall Rating score for each squad (0.8x)
  • Star player: The FIFA 18 Overall Rating for each team’s best player (1x)
  • Room to grow: The difference between a squad’s average Overall Rating on FIFA 18 and its average Potential rating (1.2x)

  • Honours won: Trophies won weighted by tournament Elo ratings (0.8x weighting)
  • World Cup experience: How each manager has fared at previous World Cups (0.7x)
  • Record: Results with their current national side over the course of their tenure, including length of time in charge (1.5x)
  • Signs of improvement: Elo points gain/loss under current manager (2x)

  • Latest odds: Who should win based on what the bookmakers think (only metric)

  • A random number: A randomly generated number. Teams can have good or bad luck (only metric)
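To show how weightings like those above might turn several metrics into one figure, here is a minimal Python sketch. It uses the weightings from the first group of metrics in the list and assumes a weighted average of metrics that have already been put on a common zero-to-one scale. Whether the real calculation averages or simply sums is an assumption on our part here, and the metric values and names in the code are invented for illustration.

```python
# Weightings copied from the first group of metrics listed above.
FIRST_GROUP_WEIGHTS = {
    "qualifying_record": 1.7,
    "elo_rating": 1.1,
    "weighted_qualifying_results": 1.2,
    "momentum": 0.3,
}


def combine_metrics(metric_values, weights):
    """Weighted average of metric values.

    Assumption: every metric is already scaled to the range 0 to 1.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(metric_values[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# Invented metric values for one hypothetical team.
example_metrics = {
    "qualifying_record": 0.8,
    "elo_rating": 0.6,
    "weighted_qualifying_results": 0.7,
    "momentum": 0.5,
}
example_factor_score = combine_metrics(example_metrics, FIRST_GROUP_WEIGHTS)
```

Because momentum carries a 0.3x weighting against 1.7x for qualifying record, a poor 12 months moves the combined figure far less than a poor qualifying campaign would.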

How it worked

Once we’d worked out all of these weightings, we had one final score for each of the six factors across the 32 teams.

This is where the reader comes into play. We asked the reader to score the importance of each of these factors from one to five, which then provides a multiplier. This introduces the “game” element of the forecaster.

To generate the winner of the tournament, we add each team’s six newly weighted figures together to get a final score for that team. Each weighting has a random margin of five percentage points, to help mimic the randomness of any one World Cup game (this is in addition to the randomness of the actual “random luck” category).
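The scoring step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the game: the factor keys are hypothetical labels, the factor scores are invented, and the random margin is interpreted as a uniform nudge of up to five per cent either way on each weighted figure.

```python
import random


def team_score(factor_scores, importance, margin=0.05, rng=random):
    """Sum a team's factor scores after reader multipliers and a random nudge.

    factor_scores: factor name -> score between 0 and 1.
    importance: factor name -> reader rating from 1 to 5, used as a multiplier.
    margin: assumed size of the random margin (0.05 = five per cent either way).
    """
    total = 0.0
    for factor, score in factor_scores.items():
        multiplier = importance[factor]
        nudge = rng.uniform(-margin, margin)
        total += score * multiplier * (1 + nudge)
    return total


# Hypothetical factor labels with invented scores for one team.
example_scores = {
    "form": 0.7,
    "world_cup_history": 0.9,
    "player_pedigree": 0.6,
    "manager_pedigree": 0.8,
    "odds": 0.5,
    "luck": 0.3,
}
# One reader's importance ratings, from one to five.
example_importance = {
    "form": 5,
    "world_cup_history": 2,
    "player_pedigree": 4,
    "manager_pedigree": 1,
    "odds": 3,
    "luck": 1,
}
example_total = team_score(example_scores, example_importance)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is handy when checking that a change in the reader’s ratings, and not the random margin, is what moved a result.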

Once this is done, we simulate the progression of the World Cup, with the higher-scoring team in any one game beating the lower-scoring team, until we get to the final and a winner.
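A knockout progression of this kind can be sketched in a few lines of Python. This is a simplified illustration: it skips the group stage, assumes each team carries one fixed score through the whole bracket (the real game may redraw the random margin for each match), and uses invented team names and scores.

```python
def play_knockout(bracket, scores):
    """Return the winner of a single-elimination bracket.

    bracket: team names in draw order; the length must be a power of two.
    scores: team name -> final score; the higher score wins each match.
    Assumption: ties go to the team listed first in the pairing.
    """
    remaining = list(bracket)
    while len(remaining) > 1:
        next_round = []
        # Pair neighbours in the draw: 0 v 1, 2 v 3, and so on.
        for i in range(0, len(remaining), 2):
            first, second = remaining[i], remaining[i + 1]
            next_round.append(first if scores[first] >= scores[second] else second)
        remaining = next_round
    return remaining[0]


# Invented four-team bracket: semi-finals are A v B and C v D.
example_bracket = ["Team A", "Team B", "Team C", "Team D"]
example_final_scores = {"Team A": 3.0, "Team B": 5.0, "Team C": 4.0, "Team D": 1.0}
example_winner = play_knockout(example_bracket, example_final_scores)
```

In this example Team B beats Team A, Team C beats Team D, and Team B then wins the final.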

So far, Germany have come out on top in 42 per cent of simulations, while Brazil are second-best on 25 per cent. England’s share rounds down to zero per cent, although the Three Lions have still won several dozen simulations.

Making a game: An element of randomness

Even though Germany and Brazil between them account for two thirds of the outcomes in our forecasting game, we worked hard to make sure that these two favourites didn’t dominate it.

After all, countries like Spain, France, Portugal or Argentina all have a decent shot at the tournament. And with luck on their side, any team technically could win their next seven games and lift the trophy.

This is why we added two elements of randomness: first, the actual randomness score, and secondly, the randomness margin of ten percentage points on any one weighting.

We also ensured that we had categories that favoured other teams. While favourites Brazil and Germany dominate odds and manager pedigree, France performs best at player pedigree, and Spain's recent good run means that their form score is highest.

So Germany will indeed likely win if you score the importance of manager pedigree and odds categories highly. But you’d likely get France (the winner in 10 per cent of simulations) winning if you gave player pedigree the highest weighting.

You get the idea. We wanted this to be a game, so it was important that readers could go back, change a couple of key ratings and see noticeable and interesting changes in how the World Cup progressed. Our usage figures so far suggest that readers are indeed running the simulation multiple times to see who else they can get to win the competition.

After every stage of the tournament, we’ll update our data to reflect the latest picture. This will most likely impact the form and player categories the most. If, for example, Argentina score five goals in their first game against Iceland, or equally if Messi gets injured in that game, their odds of progressing in both the World Cup and our simulations will be impacted. This way we ensure that our game stays relevant throughout the competition.

If you’re interested in finding out more about how this all works, check out our data or contact me or Patrick.

