What is Polars?
Before we go into the implementation details of Polars, let’s quickly look at what Polars is.
Polars is a super fast DataFrame library for Rust and Python. It exposes a set of query APIs that are similar to Pandas.
For example, Polars provides an example query that loads a parquet file and performs operations such as groupby, agg, and join.
Polars uses Apache Arrow as its memory model. It also relies on SIMD instructions, parallelization, and query optimization, among other techniques, to achieve lightning-fast performance.
Apache Arrow
The best way to understand Apache Arrow is to check out its official documentation. To summarize, it is an in-memory columnar format for representing tabular data.
The columnar format is beneficial for computations because it stores the values of each column contiguously, which enables vectorization using SIMD (Single Instruction, Multiple Data). Apache Arrow also provides zero-copy reads, which make sharing data cheap because the data does not have to be copied at all.
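To make the two ideas concrete, here is a small sketch using only the Python standard library. It does not use Apache Arrow itself; the array module and memoryview stand in for a contiguous column buffer and a zero-copy view of it, and the sample values are made up.

```python
from array import array

# Row-oriented layout: every record is a separate tuple object.
rows = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]

# Column-oriented layout: one contiguous, typed buffer per column.
ids = array("q", [r[0] for r in rows])      # 64-bit signed integers
amounts = array("d", [r[1] for r in rows])  # 64-bit floats

# Aggregating a column only scans that column's buffer.
total = sum(amounts)

# Zero-copy: a memoryview slice points into the same buffer
# instead of copying the values out of it.
view = memoryview(amounts)
tail = view[2:]

# A write through the original column is visible through the slice,
# which shows that no copy was made.
amounts[3] = 45.0
print(total, tail.tolist())
```

In the columnar layout all values of a column have the same type and sit next to each other in memory, which is the property that lets real engines process them in SIMD-sized batches.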
I personally haven’t had the time to dig into the details of the Apache Arrow design but I definitely plan to one day!