The `io_uring` interface works through two main data structu...

The io_uring interface works through two main data structures: the submission queue entry (sqe) and the completion queue entry (cqe). Instances of those structures live in a shared memory single-producer-single-consumer ring buffer between the kernel and the application.

The application asynchronously adds sqes to the queue (potentially many) and then tells the kernel that there is work to do. The kernel does its thing, and when work is ready it posts the results in the cqe ring. This also has the added advantage that system calls are now batched. Remember Meltdown? At the time I wrote about how little it affected our Scylla NoSQL database, since we would batch our I/O system calls through aio. Except now we can batch much more than just the storage I/O system calls, and this power is also available to any application.

The application, whenever it wants to check whether work is ready or not, just looks at the cqe ring buffer and consumes entries if they are ready. There is no need to go to the kernel to consume those entries.

Here are some of the operations that io_uring supports: read, write, send, recv, accept, openat, stat, and even way more specialized ones like fallocate.

www.joshbeckman.org/notes/482766650