Reimplementing the Inference Algorithms in Stan

Daniel Lee
Dec 31, 2021

I’ve wanted to rewrite the inference algorithms in Stan for many years. It’s one of those things that was never urgent enough to warrant the effort, but entering 2022, I have some spare time that I will dedicate to this project. (Also, publicly writing this down will force me to work on it.)


Why would I want to do this?
The code is hard to read, understand, and modify. Don’t get me wrong: it works and is fairly optimized. But in software development, especially open-source development, it’s worth the effort to make the code base easy for existing and future contributors to work on. This part of the Stan code base has been pretty stagnant.

I would love to get to a place where the algorithm API could be redesigned safely and sanely, where features like checkpointing and updating the warmup phase could be added, and where it’s easier to add and remove inference algorithms.

Some of the benefits of doing this:

  1. Make the code easier for developers to read and understand
  2. Encourage new contributors to join
  3. Make the algorithms a living part of the code base
  4. Enable more algorithms research, if architected well
  5. Improve maintenance of the code: the ability to refactor the code, redesign the API, and reimplement parts of the algorithms without fear of breaking something

What next?
The work will be done in this repository: https://github.com/syclik/stan-algorithms

Implementing algorithms is hard, especially when they depend on a random number generator. I’ll start by creating a test suite, instantiating the existing Stan algorithms code, and writing small, reproducible tests that will let me work in fast iterations. Debugging and tracing through C++ code isn’t the easiest, but I’ve done it before and know what I’m looking for.
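As a rough illustration of what a small, reproducible test around a random number generator might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: `toy_metropolis` is a hypothetical stand-in sampler (a random-walk Metropolis targeting a standard normal), and `is_reproducible` and `run_reproducibility_checks` are made-up helper names. None of it is Stan's API or code from the repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for an inference algorithm (NOT Stan code):
// random-walk Metropolis targeting a standard normal density.
std::vector<double> toy_metropolis(std::uint32_t seed, int num_draws,
                                   double step_size = 1.0) {
  std::mt19937 rng(seed);  // all randomness flows from this one seeded engine

  // Map the raw 32-bit engine output to [0, 1) by hand. The C++ standard
  // fixes the output sequence of std::mt19937, but leaves the algorithms of
  // the distribution classes up to the implementation.
  auto uniform01 = [&rng]() { return rng() / 4294967296.0; };

  // Log density of a standard normal, up to an additive constant.
  auto log_density = [](double x) { return -0.5 * x * x; };

  std::vector<double> draws;
  draws.reserve(num_draws);
  double theta = 0.0;
  for (int i = 0; i < num_draws; ++i) {
    // Symmetric uniform proposal in [theta - step_size, theta + step_size).
    double proposal = theta + step_size * (2.0 * uniform01() - 1.0);
    double log_accept_ratio = log_density(proposal) - log_density(theta);
    if (std::log(uniform01()) < log_accept_ratio) {
      theta = proposal;  // accept; otherwise keep the current value
    }
    draws.push_back(theta);
  }
  return draws;
}

// Reproducibility check: the same seed must give exactly the same draws.
bool is_reproducible(std::uint32_t seed, int num_draws) {
  return toy_metropolis(seed, num_draws) == toy_metropolis(seed, num_draws);
}

// A few checks of the kind a test suite could run on every change.
void run_reproducibility_checks() {
  assert(is_reproducible(42, 1000));
  // Different seeds should lead to different chains.
  assert(toy_metropolis(1, 100) != toy_metropolis(2, 100));
}
```

The point of a check like this is that it compares draws exactly rather than statistically. Under that assumption, draws recorded from an existing implementation with a fixed seed could serve as a reference, and a rewrite that consumes random numbers in the same order should match them exactly.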

I’ll post updates as I make progress.

Do you want to help?
If this is of interest to you, please reach out. I’m happy to collaborate! You’ll need to know C++, but not necessarily the algorithms by heart. We’ll break that down as we go along.

Daniel Lee

Ramblings of a statistician, dj, basketball theorist. Stan developer (mc-stan.org). Data Scientist at Zelus Analytics.