Thread Pinning

Thread-per-core is a programming paradigm in which developers are not allowed to spawn new threads to run tasks. Instead, each core only runs a single thread. This is to avoid expensive context switches and avoid using synchronization primitives such as locks. Check out this excellent blog by Glommio to explain the benefits of the thread-per-core archicture.

API

In this section, we will enable the developer to create a LocalExecutor that only runs on a particular CPU. In this code snippet below, we create an executor that only runs on Cpu 0 with the help of the LocalExecutorBuilder.

#![allow(unused)]
fn main() {
// The LocalExecutor will now only run on Cpu 0
let builder = LocalExecutorBuilder::new(Placement::Fixed(0));
let local_ex = builder.build();
let res = local_ex.run(async {
   ...
});
}

By creating N executors and binding each executor to a specific CPU, the developer can implement a thread-per-core system.

Implementation

sched_setaffinity

To force a thread to run on a particular CPU, we will be modifying the thread's CPU affinity mask by using Linux's sched_affinity command. As specified in Linux’s manual page, After a call to **sched_setaffinity**(), the set of CPUs on which the thread will actually run is the intersection of the set specified in the *mask* argument and the set of CPUs actually present on the system..

LocalExecutor

We modify LocalExecutor's constructor to take a list of CPUs as its parameter. It then calls bind_to_cpu_set

#![allow(unused)]
fn main() {
impl LocalExecutor {
    pub fn new(cpu_binding: Option<impl IntoIterator<Item = usize>>) -> Self {
        match cpu_binding {
            Some(cpu_set) => bind_to_cpu_set(cpu_set),
            None => {}
        }
        LocalExecutor { ... }
    }
  
  	pub(crate) fn bind_to_cpu_set(cpus: impl IntoIterator<Item = usize>) {
        let mut cpuset = nix::sched::CpuSet::new();
        for cpu in cpus {
            cpuset.set(cpu).unwrap();
        }
        let pid = nix::unistd::Pid::from_raw(0);
        nix::sched::sched_setaffinity(pid, &cpuset).unwrap();
    }
  ...
}
}

In bind_to_cpu_set, the pid is set to 0 because the manual page says that If *pid* is zero, then the calling thread is used.

Placement

Next, we introduce Placements. A Placement is a policy that determines what CPUs the LocalExecutor will run on. Currently, there are two Placements. We may add more in Phase 4.

#![allow(unused)]
fn main() {
pub enum Placement {
    /// The `Unbound` variant creates a [`LocalExecutor`]s that are not bound to
    /// any CPU.
    Unbound,
    /// The [`LocalExecutor`] is bound to the CPU specified by
    /// `Fixed`.
    Fixed(usize),
}
}

Placement::Unbound means that the LocalExecutor is not bound to any CPU. Placement::Fixed(cpu_id) means that the LoccalExecutor is bound to the specified CPU.

LocalExecutorBuilder

Finally, all the LocalExecutorBuilder does is that it transforms a Placement into a list of CPUs that will be passed into LocalExecutor's constructor.

#![allow(unused)]
fn main() {
pub(crate) struct LocalExecutorBuilder {
    placement: Placement,
}

impl LocalExecutorBuilder {
    pub fn new(placement: Placement) -> LocalExecutorBuilder {
        LocalExecutorBuilder { placement }
    }

    pub fn build(self) -> LocalExecutor {
        let cpu_binding = match self.placement {
            Placement::Unbound => None::<Vec<usize>>,
            Placement::Fixed(cpu) => Some(vec![cpu]),
        };
        let mut ex = LocalExecutor::new(cpu_binding);
        ex.init();
        ex
    }
}
}

When Placement::Fixed(cpu) is provided, the LocalExecutorBuilder simply creates the LocalExecutor with vec![cpu] as the specified CPU.

Building an Asynchronous Runtime like Glommio

Thread Pinning

API

Implementation