Sila’s March Madness Code Challenge
Sila added a wrinkle to its annual NCAA March Madness 2017 Tournament competition by having its programmers write algorithms to generate their brackets, rather than hand-picking their teams.
The rules were wide open; people could use any data set they wanted and they could write their code using any language or tool. The only requirements were that the contestants had to provide the code used to generate the brackets, a file showing the output of the code, and a README file describing the algorithm and how to run it.
There were two different algorithm competitions. One was to determine which bracket-choosing algorithm produced the best results, and participants entered their generated bracket online at CBSSports.com.
The second was a subjective competition, where a panel of three judges determined which algorithm they liked the best.
There were no other stated criteria for the subjective competition, but Sila’s three judges found themselves influenced by two factors.
- Could they understand what the algorithm was trying to accomplish, and why?
- Did the algorithm accomplish something interesting, unique, or innovative?
Said one judge: “Ultimately, we liked aspects of each of the algorithms but ended up going with a team in our Washington, D.C., office. They took several factors into account for each game, including the teams’ offensive efficiency, defensive efficiency, luck, and tempo to figure out an expected score. If the calculated margin was less than five points, they picked a winner at random.”
The competition was open to all employees in Sila’s six offices nationwide: Washington, D.C.; Seattle, WA; Shelton, CT; Annapolis Junction, MD; Denver, CO; and Chicago, IL.
In the following sections, the winning and runner-up teams each described the challenge and the rationale behind their algorithms.
Winning Team (from Sila’s Washington, D.C., office)
The challenge to write an algorithm to predict the outcome of the NCAA March Madness Tournament immediately struck our fancy as basketball fans and technical enthusiasts. We knew basketball and we knew how to write code, but we were very unsure of ourselves when it came to data analysis and algorithms. After spending some time looking at different metrics and data sources, we found that Ken Pomeroy’s kenpom.com did most of the analysis for us. Kenpom provides offensive and defensive efficiencies per 100 possessions, tempo (possessions per 40 minutes), and luck, all adjusted for the opponents of each team. Given these metrics, we were able to come up with a formula to predict the points scored by each team.
Our algorithm predicted the points scored by a team by first computing points scored per possession, using the team’s luck-adjusted offensive efficiency and the opponent’s luck-adjusted defensive efficiency, and then multiplying by the average tempo of the two teams (the total number of possessions each team is expected to play).
The average tempo is a very important piece of the puzzle because that determines how many possessions will be played and, ultimately, how many points will be scored. Given the nature of the NCAA March Madness tournament, we felt that we had to incorporate an element of “madness” to our algorithm. With that thought in mind, we added a random component to our algorithm where if the outcome of a game was within five points, we randomly chose the winner.
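The approach above can be sketched in a few lines of Python. This is our illustration, not the team’s actual code: the function names, the simple averaging of the team’s offensive efficiency with the opponent’s defensive efficiency, and the exact form of the luck adjustment are all assumptions; the inputs are assumed to be kenpom-style metrics (efficiency per 100 possessions, tempo in possessions per 40 minutes).

```python
import random

def predicted_score(off_eff, opp_def_eff, tempo_a, tempo_b):
    """Predict a team's score from luck-adjusted kenpom-style metrics.

    off_eff:      team's adjusted offensive efficiency (pts per 100 possessions)
    opp_def_eff:  opponent's adjusted defensive efficiency (pts allowed per 100)
    tempo_a/b:    each team's tempo (possessions per 40 minutes)
    """
    # Points per possession: blend the team's offense with the
    # opponent's defense, scaled down from per-100-possession units.
    pts_per_possession = (off_eff + opp_def_eff) / 2 / 100
    # Expected possessions in the game: average of the two tempos.
    possessions = (tempo_a + tempo_b) / 2
    return pts_per_possession * possessions

def pick_winner(team_a, team_b, score_a, score_b, margin=5):
    # The "madness" rule: if the predicted margin is within five
    # points, choose the winner at random.
    if abs(score_a - score_b) < margin:
        return random.choice([team_a, team_b])
    return team_a if score_a > score_b else team_b
```

For example, a team with an offensive efficiency of 110 facing a defense that allows 100, in a game averaging 69 possessions, projects to about 72 points; if the other team projects within five points of that, the pick becomes a coin flip.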
Runner-up Team (from Sila’s Connecticut office)
When we first decided to create the algorithm, we were faced with two questions. First, how detailed do we want to get? We knew that there were certainly statistics that could help predict the outcome, but we also knew that more variables involved might have a negative impact on our algorithm.
Second, what variables should we use? We didn’t want to just create an algorithm that gave each team a score based on their stats. We wanted an algorithm that compared teams to each other. In other words, a team with a lot of points may be good, but if they played teams that were not ranked high throughout the season, then they may not be as good as they seem.
The goal was to create a simple algorithm that would take into account these key areas: opponent comparison, strength of regular season schedule, and team rank.
Based on those areas, we created an algorithm that used data mining in SQLite that would output a score for the teams in each game of the tournament. The team with the higher score would then move on to the next round and the score would be recalculated based off the new projected opponent.
Once we could test the algorithm, we quickly realized that we needed to modify it. The algorithm weighted team rank too heavily and basically chose the higher seed in each matchup. To account for this, we added weights to some variables, which let us experiment and test which weightings seemed to be the most logical fit. After a few days of testing, we were satisfied with the result and decided to stick with our final algorithm.
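A minimal sketch of this kind of weighted scoring is shown below. The actual variables, data-mining queries, and tuned weights were not published, so everything here is hypothetical: the three components (team rank, strength of schedule, opponent comparison) come from the description above, but the formulas and weight values are placeholders for whatever the team settled on.

```python
def team_score(rank, strength_of_schedule, opponent_rank,
               w_rank=0.4, w_sos=0.35, w_opp=0.25):
    """Hypothetical weighted score for one team in one matchup.

    rank:                  team's seed/rank (lower is better)
    strength_of_schedule:  regular-season schedule strength, scaled 0-1
    opponent_rank:         the projected opponent's rank
    The weights are illustrative; tuning them down-weights raw rank
    so the algorithm doesn't just pick the higher seed every time.
    """
    rank_component = 1.0 / rank                       # invert: lower rank, higher score
    opp_component = opponent_rank / (rank + opponent_rank)  # head-to-head comparison
    return (w_rank * rank_component
            + w_sos * strength_of_schedule
            + w_opp * opp_component)

def advance(team_a, team_b, score_a, score_b):
    # The higher-scoring team moves on; its score is recalculated
    # next round against the new projected opponent.
    return team_a if score_a >= score_b else team_b
```

The key design point mirrored here is that a team’s score depends on its opponent, so the same team can score differently in each round as its projected matchup changes.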
After seeing the results, we have some ideas on how to optimize it for next year and we look forward to competing again!