Io_uring

In this page, I’ll provide a surface-level explanation of how io_uring works. If you want a more in-depth explanation, check out this tutorial or this redhat article.

As mentioned, io_uring manages file descriptors for the users and lets them know when one or more of them are ready.

Each io_uring instance is composed of two ring buffers - the submission queue and the completion queue.

To register interest in a file descriptor, you add an SQE to the tail of the submission queue. Adding to the submission queue doesn’t automatically send the requests to the kernel, you need to submit it via the io_uring_enter system call. Io_uring supports batching by allowing you to add multiple SQEs to the ring before submitting.

The kernel processes the submitted entries and adds completion queue events (CQEs) to the completion queue when it is ready. While the order of the CQEs might not match the order of the SQEs, there will be one CQE for each SQE, which you can identify by providing user data.

The user can then check the CQE to see if there are any completed I/O operations.

Using io_uring for TcpListener

Let’s look at how we can use IoUring to manage the accept operation for a TcpListener. We will be using the iou crate, a library built on top of liburing, to create and interact with io_uring instances.

#![allow(unused)]
fn main() {
let l = std::net::TcpListener::bind("127.0.0.1:8080").unwrap();
l.set_nonblocking(true).unwrap();
let mut ring = iou::IoUring::new(2).unwrap();

unsafe {
    let mut sqe = ring.prepare_sqe().expect("failed to get sqe");
    sqe.prep_poll_add(l.as_raw_fd(), iou::sqe::PollFlags::POLLIN);
    sqe.set_user_data(0xDEADBEEF);
    ring.submit_sqes().unwrap();
}
l.accept();
let cqe = ring.wait_for_cqe().unwrap();
assert_eq!(cqe.user_data(), 0xDEADBEEF);
}

In this example, we first create a [TcpListener](<https://doc.rust-lang.org/stable/std/net/struct.TcpListener.html>) and set it to non-blocking. Next, we create an io_uring instance. We then register interest in the socket’s file descriptor by making a call to prep_poll_add (a wrapper around Linux’s io_uring_prep_poll_add call). This adds a SQE entry to the submission queue which will trigger a CQE to be posted when there is data to be read.

We then call accept to accept any incoming TCP connections. Finally, we call wait_for_cqe, which returns the next CQE, blocking the thread until one is ready if necessary. If we wanted to avoid blocking the thread in this example, we could’ve called peek_for_cqe which peeks for any completed CQE without blocking.

Efficiently Checking the CQE

You might be wondering - if we potentially need to call peek_for_cqe() repeatedly until it is ready, how is this different from calling listener.accept() repeatedly?

The difference is that accept is a system call while peek_for_cqe, which calls io_uring_peek_batch_cqe under the hood, is not a system call. This is due to the unique property of io_uring such that the completion ring buffer is shared between the kernel and the user space. This allows you to efficiently check the status of completed I/O operations.