AIrsenal - the difficult second season

Converting a struggling Fantasy Football team into an open source success.

Friday 25 Oct 2019


Introduction

Back in April 2019 we wrote a blog post about AIrsenal, our machine-learning-based Fantasy Premier League (FPL) manager. It's a fun side project that a couple of us at The Alan Turing Institute started in summer 2018, so that it could take part in the FPL game for the 2018/19 football season. It did reasonably well, finishing in the top 30% of players, and we were full of optimism and ideas about how we could do better for the 2019/20 season.

What happened next?

So, with nine weeks of the season gone, how are we doing?

Erm, not so well.

[Image: AIrsenal league standings, 2019/20 season]

That's AIrsenal, rock bottom of the Turing mini-league, and about five-millionth out of six million players.

So what went wrong?  I have a few ideas...

1) Nothing went wrong, we have just been unlucky

There is definitely some truth in this. Football, at the level of who scores or concedes goals in individual matches, has a high degree of randomness. Since goalkeepers and defenders get four points for a clean sheet, a single conceded goal can cost a lot of FPL teams a lot of points! To hedge against this, many human FPL players opt against "doubling up" or "trebling up" on defenders from the same team. AIrsenal's optimization doesn't incorporate this logic, and in the first weeks it was telling us to field three Chelsea defenders (or a keeper and two defenders).
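To give a flavour of the kind of fix we have in mind, here is a rough Python sketch of how such a hedge could be bolted onto the squad-scoring step. The data structures and numbers are invented for illustration; this is not AIrsenal's actual optimization code.

```python
from collections import Counter

# Hypothetical sketch: dock the predicted score of a starting XI that
# "trebles up" on defensive players (goalkeeper or defenders) from the
# same team. `players` is a list of (name, team, position, predicted_points)
# tuples -- made up for this example, not AIrsenal's real data model.

def adjusted_score(players, penalty_per_extra=1.5):
    total = sum(points for (_, _, _, points) in players)
    defensive_counts = Counter(
        team for (_, team, position, _) in players
        if position in ("GK", "DEF")
    )
    # Subtract a flat penalty for every defensive player beyond the
    # second from a single team, hedging against one bad team result.
    for count in defensive_counts.values():
        total -= penalty_per_extra * max(0, count - 2)
    return total

partial_xi = [
    ("Kepa", "CHE", "GK", 4.0),
    ("Azpilicueta", "CHE", "DEF", 4.5),
    ("Zouma", "CHE", "DEF", 4.2),
    ("van Dijk", "LIV", "DEF", 5.5),
]
print(adjusted_score(partial_xi))  # 18.2 - 1.5 = 16.7: the Chelsea treble-up is docked
```

With a penalty like this, the optimizer would only treble up when the extra predicted points genuinely outweigh the clean-sheet risk. As it was, AIrsenal trebled up regardless, which was a problem because: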

2) Chelsea's defence are rubbish

If Chelsea had kept any clean sheets in the first six weeks, trebling up on their defensive players would have paid off handsomely. But they didn't: they conceded at least one goal in every game, and four in the first week! Frank Lampard's team is full of exciting young talent, but it is definitely not as tight at the back as the Mourinho sides of a few seasons ago. Which is a problem because:

3) Our team-level model treats all past matches equally, no matter how long ago

We train our model on all Premier League results from the past four seasons, in order to obtain fitted values for the attacking and defending abilities of each team. There is no time dependence in this process, so a match from 2016 has just as much influence on the fitted parameters as last week's match.
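A common remedy is to down-weight old matches when fitting. The snippet below is an illustrative sketch rather than AIrsenal's actual fitting code: it uses the exponential down-weighting trick found in Dixon-Coles-style football models, where a match played t days ago contributes to the likelihood with weight exp(-xi * t) for some tunable decay rate xi.

```python
import math

# Illustrative sketch (not AIrsenal's real fitting code): weight each
# past match by exp(-xi * days_ago), so recent results dominate the fit.
# These weights would multiply each match's term in the log-likelihood
# when estimating the team attack/defence parameters.

def match_weight(days_ago, xi=0.005):
    return math.exp(-xi * days_ago)

print(match_weight(7))        # last week's match: ~0.97
print(match_weight(3 * 365))  # a match from three seasons ago: ~0.004
```

Adding something along these lines was one of the things we had in mind to improve before the start of this season, but: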

4) We hugely underestimated how long it would take to update the code for the new season

This is where we can definitely learn from the experience and do better next year! Although we had included the season as an argument in most of our function calls, we hadn't fully tested running many of our procedures end-to-end with anything other than last season's default value.

We had lots of unit tests, but many of them were broken because they relied on team history data being available from the API, which wasn't the case before any gameweeks had been played. These broken tests masked some genuine problems, which is a real lesson learned for the future!
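The kind of fix we have in mind is sketched below, with made-up function names rather than our real test suite: swap the live API call for canned data so the tests still run before gameweek 1, and parametrise over the season so the non-default code paths get exercised too.

```python
import sys
import pytest

# Hypothetical stand-in for a function that queries the live FPL API;
# before the season starts it has no history to return.
def fetch_team_history(season):
    raise RuntimeError("no gameweek data available yet!")

# Code under test, which depends on the API call above.
def count_results(season):
    return len(fetch_team_history(season)["results"])

FAKE_HISTORY = {"results": [{"opponent": "MUN", "score": "0-4"}]}

# Run the same test for the default and a non-default season.
@pytest.mark.parametrize("season", ["1819", "1920"])
def test_count_results(monkeypatch, season):
    # Replace the live API call with canned data.
    monkeypatch.setattr(sys.modules[__name__], "fetch_team_history",
                        lambda season: FAKE_HISTORY)
    assert count_results(season) == 1
```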


But, more importantly, what went right?

Results aren't everything! Well, OK, maybe in real professional football they are, but AIrsenal is a fun project first and an FPL manager second, and as a project it's going from strength to strength! We have new people at the Turing involved as core developers, and lots of people both inside and outside the Institute have been checking out (and hopefully using!) our GitHub repo.

AIrsenal as an open source project

Back in July, Red Hat and Newcastle University held a great workshop here at the Turing about how to grow a successful open source project. Inspired by this, we are trying to make AIrsenal easier to use, and easier to contribute to. Some of the steps we have taken are:

  • Added a CONTRIBUTING.md file to the repository, giving clear guidelines on how people can collaborate on the project.
  • Made sure the instructions on how to set up and run the code are clear and reliable - apparently, if new people can't get up and running within half an hour of cloning the repository, they are likely to lose interest.
  • Looked at our use of GitHub issues, which we'd previously been using as very concise placeholders for things we wanted to consider. We now aim to write a bit more detail and context for each issue, and to mark suitable ones as "Help wanted" or "Good first issue" to encourage people to contribute.

We recently held an afternoon hackathon for our team, where we helped people to identify and fix issues and submit pull requests to integrate the improvements into the codebase. (As a side note, this was also an opportunity for people to take a few steps towards getting a free T-shirt, thanks to Hacktoberfest, a great initiative from DigitalOcean to encourage participation in open source software: anyone who makes four pull requests to open source projects in October is eligible for a T-shirt.)

This session was a great success and resulted in numerous fixes from five new contributors. It's probably fair to say that many of the issues were "low-hanging fruit" that the core developers could have fixed quickly, but the investment in getting new people involved will undoubtedly pay off in the long run.

Conclusion

We're looking forward to continuing to improve the AIrsenal codebase, and hopefully to growing the open source community around it. We have some long-overdue new features to work on, such as dealing with bonus points, and we're also hoping to introduce a new web front-end that would allow people to enter their FPL teams and get transfer suggestions from our model.

Overall, to stretch a football analogy, we may have a lot of ground to make up on the (virtual) pitch, but the backroom operation is looking healthier than ever!

Check out our GitHub repo if you'd like to get involved, either to contribute to the project or just to use it for your own FPL team.