Winning the MIT Sloan Sports Analytics 2022 Hackathon

Daniel Lee
9 min read · Jun 9, 2022
Daryl Morey, Frankie and me, Fabrice Mulumba, and Jessica Gelman; credit: MIT SSAC 22

March 2022. I participated in the SSAC22 hackathon. I showed up, found a teammate, and won. Here’s a writeup about our project, our strategy for winning, and how we did it.

The Hackathon Presented by ESPN and NHL

The Data

All hackathon participants were provided data from the 2020 Stanley Cup Finals. This included:

  • Tracking data for 40 players, the puck, and the referees.
    This data is really cool. There are x, y, z positions with estimates of velocity recorded at ~100 Hz. The data are from chips attached to jerseys and in the puck.
  • Play-by-play data.
    Two separate streams of play-by-play data were included: hand-generated and system-generated. The hand-generated data is more correct, but it has the characteristic problems of any human-generated data: every record of the scorekeeper’s updates is included, and the times are approximate and sometimes off by a minute. But ultimately, the players’ intent is captured by scorekeepers.
    The system-generated play-by-play data is based on what the system inferred from the positional data. In the data we had, events were not linked to the events that followed them, e.g., the result of a shot attempt was not linked back to the shot itself.
  • Other metadata.
    Player information, rink information, game time, etc.

Data was provided for each of the 6 games in the series. For a sense of scale: one game has about 1.5M rows of tracking data with 1.5 GB of JSON files across the different types of data.
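
To make the linking problem in the system-generated play-by-play data concrete, here is a minimal sketch of one way to attach an outcome to each shot: take the first outcome event that follows the shot within a short time window. This is not the code we ran at the hackathon. The field names (`t`, `type`), the 3-second window, and the example events are all hypothetical.

```python
from bisect import bisect_right

def link_shots_to_outcomes(shots, outcomes, max_gap_s=3.0):
    """Attach the first outcome within max_gap_s seconds after each shot.

    Both inputs are lists of dicts sorted by "t" (seconds of game time).
    The field names and the time window are illustrative assumptions.
    """
    outcome_times = [o["t"] for o in outcomes]
    linked = []
    for shot in shots:
        # Index of the first outcome strictly after the shot time.
        i = bisect_right(outcome_times, shot["t"])
        result = None
        if i < len(outcomes) and outcome_times[i] - shot["t"] <= max_gap_s:
            result = outcomes[i]["type"]
        linked.append({**shot, "result": result})
    return linked

# Made-up events, for illustration only.
example_shots = [{"t": 10.0}, {"t": 55.2}, {"t": 120.0}]
example_outcomes = [
    {"t": 10.8, "type": "save"},
    {"t": 56.0, "type": "goal"},
    {"t": 200.0, "type": "save"},
]
linked_shots = link_shots_to_outcomes(example_shots, example_outcomes)
print([s["result"] for s in linked_shots])
```

The third shot has no outcome inside the window, so its result stays `None` rather than being matched to an unrelated event much later in the game.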

The Hackathon

There were two divisions for the Hackathon: Student and Open. The competition itself had very little structure. It started at 9 am on Thursday and the organizers asked us to be ready to present by 3:45 pm that day.

Each team would present to the judges starting at 4 pm and the top teams would present in the finals. We didn’t actually know how many teams were competing in each division or how many would make it to the finals.

The Result

I met Fabrice a few minutes before 9 am. We made it to the finals in the Open Division. We spent a couple hours cleaning up our presentation on Friday. We presented on Saturday morning (sorry, no video available). And we won!

The Winning Project: Sloan Goalie Card

We focused on a simple question.

Does goaltender depth matter?

Having access to the x, y, z positions of every player meant that we could analyze where the goalie was when each shot was taken. After speaking with some hockey people, we learned that this data wasn’t publicly available, so this would be one of the first attempts at this type of analysis.
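
As a rough illustration of what "goalie depth" can mean in terms of the tracking data, here is a minimal sketch that measures how far the goalie is from the center of the goal line at the moment of a shot, along with the shot distance. The coordinate system is an assumption: feet, origin at center ice, x running along the length of the rink, with the goal line 89 ft from center (a 200 ft rink with goal lines 11 ft from the end boards). The function names are hypothetical and the actual data may use a different convention.

```python
import math

# Assumption: feet, origin at center ice, goal line 89 ft from center.
GOAL_LINE_X_FT = 89.0

def _net_center(defending_positive_x):
    """Center of the goal line for the net being defended."""
    return (GOAL_LINE_X_FT if defending_positive_x else -GOAL_LINE_X_FT, 0.0)

def goalie_depth(goalie_xy, defending_positive_x=True):
    """Distance (ft) from the goalie to the center of the goal line."""
    net_x, net_y = _net_center(defending_positive_x)
    return math.hypot(goalie_xy[0] - net_x, goalie_xy[1] - net_y)

def shot_distance(puck_xy, defending_positive_x=True):
    """Distance (ft) from the puck to the center of the goal line."""
    net_x, net_y = _net_center(defending_positive_x)
    return math.hypot(puck_xy[0] - net_x, puck_xy[1] - net_y)

# Made-up positions at the moment of a shot, for illustration only.
depth_ft = goalie_depth((85.0, 3.0))
distance_ft = shot_distance((65.0, 7.0))
print(depth_ft, distance_ft)
```

A larger depth means the goalie has come further out from the net (more aggressive); a smaller depth means the goalie is hanging back.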

In the allotted time, we pulled off a quick analysis of goalie depth and built the Sloan Goalie Card web app. Since we don’t have video of our presentation available, I’m annotating our slides below. (Don’t worry, there were only 10 slides total.)

Slides 1–3. Introduction, Focusing on Process.

Hockey is a hard game to analyze because of the few events of interest that we observe, especially in a single series. There was only 1 winner of the 2020 Stanley Cup Finals. There were only 6 games played. There were only 33 goals scored in the whole series.

We focused on the process leading to events of interest. For our project, we looked at how goals are scored. There were 500+ shot attempts for 33 goals scored. This was something we could work with.
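
For a sense of the numbers, here is a quick back-of-the-envelope calculation of the overall shot percentage. The count of 500 shot attempts is an assumption standing in for "500+", so the result is only a rough upper bound on the series-wide rate.

```python
import math

goals = 33
shot_attempts = 500  # assumption: the exact count is only known as "500+"

# Overall shot percentage and a simple binomial standard error.
shot_pct = goals / shot_attempts
std_err = math.sqrt(shot_pct * (1 - shot_pct) / shot_attempts)
print(f"shot percentage: {shot_pct:.1%} (standard error about {std_err:.1%})")
```

That works out to roughly 6.6%, with a standard error of about one percentage point. Shots give us far more observations than goals alone, but goals are still rare events.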

Slide 4. The Problem: Does Goaltender Depth Matter?

Fans have a sense of whether their favorite goalie plays more aggressively or more defensively. Coaches have thoughts on when to come out of the crease and when to hang back. Without data, this is anecdotal.

Can we quantify the impact of goalie depth?

Slide 5. Statistical Modeling of Shot Percentage with Goalie Depth.

We wrote a statistical model to estimate shot percentage taking into account goalie depth. The statistical model was a Bayesian logistic regression implemented in Stan.
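
To show the shape of this kind of model, here is a minimal sketch in plain Python. It is not our Stan program: it swaps full Bayesian inference for a MAP point estimate (gradient ascent on the log-likelihood plus a normal log-prior). The predictors (standardized goalie depth and shot distance), the Normal(0, 2) priors, the learning rate, and the simulated data are all assumptions made for illustration. The coefficients used to simulate the data are arbitrary and are not results from the project.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(beta, depth, distance):
    """P(goal) given standardized goalie depth and shot distance."""
    return sigmoid(beta[0] + beta[1] * depth + beta[2] * distance)

def fit_map(shots, goals, prior_sd=2.0, lr=0.5, steps=500):
    """MAP estimate: Bernoulli log-likelihood + Normal(0, prior_sd) priors."""
    beta = [0.0, 0.0, 0.0]  # intercept, depth, distance
    n = len(shots)
    for _ in range(steps):
        # Gradient of the log-prior pulls each coefficient toward zero.
        grad = [-b / prior_sd ** 2 for b in beta]
        # Gradient of the log-likelihood, summed over shots.
        for (depth, distance), goal in zip(shots, goals):
            err = goal - predict(beta, depth, distance)
            grad[0] += err
            grad[1] += err * depth
            grad[2] += err * distance
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Simulated shots only; these are not the Stanley Cup Finals data.
random.seed(0)
SIM_BETA = [-2.5, 0.5, -1.0]  # arbitrary values used to simulate goals
shots = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
goals = [1 if random.random() < predict(SIM_BETA, d, s) else 0
         for d, s in shots]

beta_hat = fit_map(shots, goals)
print([round(b, 2) for b in beta_hat])
```

A full Bayesian fit, like the one in Stan, returns a posterior distribution over the coefficients instead of a single point, which matters when the event being modeled is as rare as a goal.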

Slides 6–7. Results.

This visualization of the statistical model is goalie-centric. Blue indicates a lower shot percentage and red indicates a higher shot percentage. The x-axis is the depth of the goalie and the y-axis is the shot distance.

According to this model, there’s some difference between the two goalies. Since goals scored are low-probability events, we’re looking at the visualization for large differences. Estimated probabilities shouldn’t be overly scrutinized.

Does goaltender depth matter? These results say:

  • Andrei Vasilevskiy (Tampa Bay) is better playing defensively than aggressively.
  • Short shots taken against Anton Khudobin (Dallas) have a higher shot percentage than those taken against Andrei Vasilevskiy at the same shot distance.

This is the result of a very shallow analysis done in the context of a hackathon. The model can be improved in a lot of different ways.

That said, we imagined how we’d use this information tactically. How should a goalie play differently to gain more wins? How could an offensive team alter their strategy to put the goalie out of position?

Slide 8. Sloan Goalie Card.

The Sloan Goalie Card web app is still live here: https://sloan-goalie-card.netlify.app/

Let us know what you think!

Slide 9. Thank You!

Fabrice and I were grateful to have the opportunity to compete at the hackathon this year. A lot of people supported us along the way including our families, AfroTech, open-source developers including the Stan team, friends that provided hockey knowledge, and everyone that participated in the hackathon.

Slide 10. The Team.

Our team:

  • Fabrice Mulumba. Software engineer. For the project, he worked in Python and JavaScript and got the web app deployed.
  • Daniel Lee (me). Bayesian statistician. I worked in R and Stan with a touch of Python.

Our Winning Hackathon Strategy

I walked in knowing a few things about the work needed to win hackathons:

  • Define a problem.
    If you can clearly define a problem, you’ll end up in the top third of the competition. It has to be clear why the problem matters and you have to communicate this effectively.
  • Specify a solution.
    If you’re able to specify a solution to the problem, you’ll end up in the top 10%. It has to be clear to the judges that this solution solves the problem.
  • Implement the solution.
    If you’ve gotten this far and you’re now able to actually implement the solution that you’ve outlined, you’ll end up in the top 3. It’s hard to get to this point. We’re talking about understanding the topic well enough to define a problem of interest, having explored enough of the solution space to specify a solution, then applying skills through focused effort to build the solution in a short amount of time. Do that and I’m sure you’ll be a finalist.
  • Build interactivity.
    If the judges can do something with the solution, specifically evaluate “what if” scenarios, then you’ve gone above and beyond the scope of a hackathon. That should get you a win.

Winning a hackathon takes work and focus. It’s mentally and physically draining to compete in a hackathon. You have to pace yourself well, adjust to different challenges as they come, and have enough time and energy at the end to switch context to present the work.

One additional note: the solution only needs to be a proof of concept and pass a smell test. It’s important to know when to move on.

Our Team’s Journey

Fabrice and I made a pretty good team. But it almost didn’t happen.

Both Fabrice and I had competed in hackathons before. We first met around 8:30 am, half an hour before the hackathon started. As Fabrice was setting up, I saw that he had on an AfroTech sweatshirt and a Major League Hacking sticker on his laptop. I said hi, asked if he was competing alone, and if he was looking for a teammate. He told me he wanted to compete alone. I was hoping to find a teammate, but had been preparing to compete alone too. While it’s hard to do all the things above alone, it’s actually harder if you have the wrong teammate. We went our separate ways. A few minutes later, we decided to team up.

Something about the team felt right from the start. Maybe I was more comfortable teaming up with one of the few other POC in the room. Maybe there was a familiar cadence and vibe from having parents that immigrated to the US. Maybe it was knowing that the other had been through an intense working session in the past and was voluntarily going through it again. Whatever it was, it worked.

In the few days prior, I had spent a couple hours trying to gain some knowledge about hockey from friends that know the sport. The night before, I found a couple of people that worked for the LA Kings and asked questions about what they thought about and why. I came in thinking we should look at something related to goalie position. Fabrice came in wanting to work on a web app and focus on identifying a process within the game. These ideas melded together and formed the winning project.

For the most part, we worked on separate parts of the problem. We were able to split the work and trust that the other would get their part done. I spent a lot of time asking Keith Horstman and Sam Wood from the NHL basic questions about hockey to validate our problem. I parsed the data and did most of the processing. Fabrice turned what I gave him into a working web application. I worked on the statistical model and drafted a version of the presentation. We refined that together and we were able to showcase both our strengths and have both our voices represented.

Some of the intangibles: there was no bickering. We were able to critique without being critical. We were able to communicate ideas with a few sketches and the other would handle it.

Between the Hackathon and the Finals

There were two teams in the Open Division that were in the finals. We were going up against a juggernaut team of David Bergman and Daniel Brown. They had created a new metric in the week prior to the hackathon on sample data and had spent the hackathon applying it to the full dataset. It was compelling and after the preliminary judging, we were clearly down.

I spent about an hour updating the Stan model. Fabrice and I met for another hour where we updated the presentation. Those hours made a huge difference. They gave our work a clear narrative, one that had been missing and fumbled at the time of the preliminary competition.

The Finals

Our 15-minute presentation was as clean and clear as we could have hoped. I was nervous getting up on stage. It was the first time I’d been up on stage in a while, and the first time in a while that I was presenting to experts in a field that I didn’t know. That really made me nervous.

After our talk, there were two comments that really stood out:

  1. David told us our presentation was really polished. That was amazing to hear. We didn’t put that much more time in after the fact, but it was enough for it to really come off cleanly.
  2. I was talking to Allison Loucks (ESPN, hackathon judge) after it was all over. I told her my nerves were going because I felt a bit like an imposter. She said that it sounded like we knew hockey. At that point I knew that putting so much focus on defining the problem had paid off in the end.

Additional Thanks

Thanks to the MIT SSAC22 Hackathon organizers, especially James Hogan.

Thanks to all the judges! Brant Berglund, Meghan Chakya, Keith Horstman, Allison Loucks, Sam Wood, and the rest of the NHL (Christopher Baker) and ESPN.


Daniel Lee

Ramblings of a statistician, dj, basketball theorist. Stan developer (mc-stan.org). Data Scientist at Zelus Analytics.