Motivation

I recently spent some time exploring the internals of Polars, the lightning-fast DataFrame library built on top of Arrow2. To deepen my understanding, I reimplemented a subset of Polars (with some code respectfully copied) as a toy DataFrame library that supports only three data types. The full source code is available on GitHub.

It all began when I stumbled across an excellent blog post by Ritchie Vink, the creator of Polars, titled "I wrote one of the fastest DataFrame libraries." The post highlighted how well Polars performs in benchmarks, and it inspired me to understand the secret behind that speed.

Here are some screenshots from the recent H2O.ai benchmark for groupby queries, conducted by DuckDB, showing how fast Polars is.

In this post, I will explain in detail how Polars implements join, groupby, filter, and more.