LazyFrame

Polars recommends using the Lazy API when dealing with performance-critical code. Unlike the eager API, the lazy API defers the execution until the end which allows Polars to perform query optimizations.

Here is an example of the lazy API:

#![allow(unused)]
fn main() {
df
  .lazy()
  .filter(col("age").gt(lit(25)))
  .groupby(vec!["team"])
  .agg(vec![col("points").sum()])
  .collect();
}

All lazy queries begin with the lazy() method. The execution of the query is delayed until collect is called. During execution, Polars first rearranges the query with optimizations like predicate pushdown, projection pushdown, type coercion, etc before actually executing the operations. Check out the list of optimizations used by Polars.

LazyFrame and LogicalPlan

When lazy() is called, the dataframe is converted to a LazyFrame.

#![allow(unused)]
fn main() {
pub struct LazyFrame {
    pub logical_plan: LogicalPlan,
		...
}
}

LazyFrame is just an abstraction around LogicalPlan. A LogicalPlan is an enum of transformations that makes up a query.

#![allow(unused)]
fn main() {
enum LogicalPlan {
		Selection {
        input: Box<LogicalPlan>,
        predicate: Expr,
    },
		Join {
        input_left: Box<LogicalPlan>,
        input_right: Box<LogicalPlan>,
        left_on: Vec<Expr>,
        right_on: Vec<Expr>,
				..
    },
		...
}
}

Operations like filter, select, join, etc creates the LogicalPlans.

#![allow(unused)]
fn main() {
lf.filter(col("age").gt(lit(25)))
}

For example, here is a simplified implementation of filter:

#![allow(unused)]
fn main() {
impl LazyFrame {
		pub fn filter(self, predicate: Expr) -> LazyFrame {
			let lp = LogicalPlan::Selection {
				input: Box::new(self.logical_plan),
				predicate
			};
			LazyFrame::from_logical_plan(lp)
		}
}
}

If you want to look at the logical_plan constructed by your query, you can perform:

#![allow(unused)]
fn main() {
lazy_frame.logical_plan.describe()
}

Optimization

When collect is called, it will optimize and rearrange the logical plan. The optimized logical_plan will then be converted to a physical_plan. A physical plan is an [Executor](<https://github.com/pola-rs/polars/blob/main/polars/polars-lazy/src/physical_plan/executors/executor.rs#L10>) that can generate a dataframe. For example, here is the physical plan for filter and here is the function that converts a logical plan to a physical plan.

If you want to look at the optimized logical plan, you can perform:

#![allow(unused)]
fn main() {
lazy_frame.describe_optimized_plan()
}

You can also turn each optimizers on and off like this:

#![allow(unused)]
fn main() {
lf.with_predicate_pushdown(true)
        .describe_optimized_plan()
}

In the remaining sections, we will deep dive into a couple of optimizations Polars uses.

Building a Fast DataFrame Library like Polars

LazyFrame

LazyFrame and LogicalPlan

Optimization