io_uring(7) — Linux manual page

NAME \| SYNOPSIS \| DESCRIPTION \| CONFORMING TO \| EXAMPLES \| SEE ALSO \| COLOPHON


io_uring(7)             Linux Programmer's Manual             io_uring(7)

NAME top

       io_uring - Asynchronous I/O facility

SYNOPSIS top

       #include <linux/io_uring.h>

DESCRIPTION top

       io_uring is a Linux-specific API for asynchronous I/O.  It allows
       the user to submit one or more I/O requests, which are processed
       asynchronously without blocking the calling process.  io_uring
       gets its name from ring buffers which are shared between user
       space and kernel space. This arrangement allows for efficient I/O,
       while avoiding the overhead of copying buffers between them, where
       possible.  This interface makes io_uring different from other UNIX
       I/O APIs, wherein, rather than just communicate between kernel and
       user space with system calls, ring buffers are used as the main
       mode of communication.  This arrangement has various performance
       benefits which are discussed in a separate section below.  This
       man page uses the terms shared buffers, shared ring buffers and
       queues interchangeably.

       The general programming model you need to follow for io_uring is
       outlined below

       •      Set up shared buffers with io_uring_setup(2) and mmap(2),
              mapping into user space shared buffers for the submission
              queue (SQ) and the completion queue (CQ).  You place I/O
              requests you want to make on the SQ, while the kernel
              places the results of those operations on the CQ.

       •      For every I/O request you need to make (like to read a
              file, write a file, accept a socket connection, etc), you
              create a submission queue entry, or SQE, describe the I/O
              operation you need to get done and add it to the tail of
              the submission queue (SQ).  Each I/O operation is, in
              essence, the equivalent of a system call you would have
              made otherwise, if you were not using io_uring.  For
              instance, a SQE with the opcode set to IORING_OP_READ will
              request a read operation to be issued that is similar to
              the read(2) system call. Refer to the opcode documentation
              in io_uring_enter(2) for all supported opcodes and their
              properties.  You can add more than one SQE to the queue
              depending on the number of operations you want to request.

       •      After you add one or more SQEs, you need to call
              io_uring_enter(2) to tell the kernel to dequeue your I/O
              requests off the SQ and begin processing them.

       •      For each SQE you submit, once it is done processing the
              request, the kernel places a completion queue event or CQE
              at the tail of the completion queue or CQ.  The kernel
              places exactly one matching CQE in the CQ for every SQE you
              submit on the SQ.  After you retrieve a CQE, minimally, you
              might be interested in checking the res field of the CQE
              structure, which corresponds to the return value of the
              system call's equivalent, had you used it directly without
              using io_uring.  Given that io_uring is an async interface,
              errno is never used for passing back error information.
              Instead, res will contain what the equivalent system call
              would have returned in case of success, and in case of
              error res will contain -errno.  For example, if the normal
              read system call would have returned -1 and set errno to
              EINVAL, then res would contain -EINVAL.  If the normal
              system call would have returned a read size of 1024, then
              res would contain 1024.

       •      Optionally, io_uring_enter(2) can also wait for a specified
              number of requests to be processed by the kernel before it
              returns.  If you specified a certain number of completions
              to wait for, the kernel would have placed at least those
              many number of CQEs on the CQ, which you can then readily
              read, right after the return from io_uring_enter(2).

       •      It is important to remember that I/O requests submitted to
              the kernel can complete in any order.  It is not necessary
              for the kernel to process one request after another, in the
              order you placed them.  Given that the interface is a ring,
              the requests are attempted in order, however that doesn't
              imply any sort of ordering on their execution or
              completion.  When more than one request is in flight, it is
              not possible to determine which one will execute or
              complete first.  When you dequeue CQEs off the CQ, you
              should always check which submitted request it corresponds
              to.  The most common method for doing so is utilizing the
              user_data field in the request, which is passed back on the
              completion side.

       •      Concretely, for operations where strict ordering is
              required, such as for sends and receives on a stream-
              oriented TCP socket, it is generally unsafe to have more
              than one outstanding send, or more than one outstanding
              receive (the two directions are independent) on a given
              socket at a time, as the kernel may reorder their execution
              if poll arming or other background kernel activities are
              involved.  However, io_uring provides various facilities to
              enable applications to efficiently pipeline their
              operations safely. If the requests are submitted in a
              single batch, the application may use IOSQE_IO_LINK to
              enforce an execution order in the kernel. Otherwise,
              io_uring provides advanced features like multi shot and
              send/receive bundles to allow applications to provide more
              data in fewer, more efficient trips to the kernel. Even if
              these features are used, applications must still ensure
              they do not overlap different sends or different receives
              on a given file.

       Adding to and reading from the queues:

       •      You add SQEs to the tail of the SQ.  The kernel reads SQEs
              off the head of the queue.

       •      The kernel adds CQEs to the tail of the CQ.  You read CQEs
              off the head of the queue.

       It should be noted that depending on the configuration io_uring's
       behavior can deviate from the behavior outlined above, like not
       posting a CQE for every SQE when setting IOSQE_CQE_SKIP_SUCCESS in
       the SQE or posting multiple CQEs for a single SQE for multi shot
       operations or requiring an io_uring_enter(2) syscall to make the
       kernel begin processing newly added SQEs when using submission
       queue polling.

   Submission queue polling
       One of the goals of io_uring is to provide a means for efficient
       I/O.  To this end, io_uring supports a polling mode that lets you
       avoid the call to io_uring_enter(2), which you use to inform the
       kernel that you have queued SQEs on to the SQ.  With SQ Polling,
       io_uring starts a kernel thread that polls the submission queue
       for any I/O requests you submit by adding SQEs.  With SQ Polling
       enabled, there is no need for you to call io_uring_enter(2),
       letting you avoid the overhead of system calls.  A designated
       kernel thread dequeues SQEs off the SQ as you add them and
       dispatches them for asynchronous processing.

   Setting up io_uring
       The main steps in setting up io_uring consist of mapping in the
       shared buffers with mmap(2) calls.  In the example program
       included in this man page, the function app_setup_uring() sets up
       io_uring with a QUEUE_DEPTH deep submission queue.  Pay attention
       to the 2 mmap(2) calls that set up the shared submission and
       completion queues.  If your kernel is older than version 5.4,
       three mmap(2) calls are required.

   Submitting I/O requests
       The process of submitting a request consists of describing the I/O
       operation you need to get done using an io_uring_sqe structure
       instance.  These details describe the equivalent system call and
       its parameters.  Because the range of I/O operations Linux
       supports are very varied and the io_uring_sqe structure needs to
       be able to describe them, it has several fields, some packed into
       unions for space efficiency.  Here is a simplified version of
       struct io_uring_sqe with some of the most often used fields:

           struct io_uring_sqe {
                   __u8    opcode;         /* type of operation for this sqe */
                   __s32   fd;             /* file descriptor to do IO on */
                   __u64   off;            /* offset into file */
                   __u64   addr;           /* pointer to buffer or iovecs */
                   __u32   len;            /* buffer size or number of iovecs */
                   __u64   user_data;      /* data to be passed back at completion time */
                   __u8    flags;          /* IOSQE_ flags */
                   ...
           };

       Here is struct io_uring_sqe in full:

           struct io_uring_sqe {
                __u8 opcode;        /* type of operation for this sqe */
                __u8 flags;         /* IOSQE_ flags */
                __u16     ioprio;        /* ioprio for the request */
                __s32     fd;       /* file descriptor to do IO on */
                union {
                     __u64     off; /* offset into file */
                     __u64     addr2;
                     struct {
                          __u32     cmd_op;
                          __u32     __pad1;
                     };
                };
                union {
                     __u64     addr;     /* pointer to buffer or iovecs */
                     __u64     splice_off_in;
                     struct {
                          __u32     level;
                          __u32     optname;
                     };
                };
                __u32     len;      /* buffer size or number of iovecs */
                union {
                     __kernel_rwf_t rw_flags;
                     __u32          fsync_flags;
                     __u16          poll_events;   /* compatibility */
                     __u32          poll32_events; /* word-reversed for BE */
                     __u32          sync_range_flags;
                     __u32          msg_flags;
                     __u32          timeout_flags;
                     __u32          accept_flags;
                     __u32          cancel_flags;
                     __u32          open_flags;
                     __u32          statx_flags;
                     __u32          fadvise_advice;
                     __u32          splice_flags;
                     __u32          rename_flags;
                     __u32          unlink_flags;
                     __u32          hardlink_flags;
                     __u32          xattr_flags;
                     __u32          msg_ring_flags;
                     __u32          uring_cmd_flags;
                     __u32          waitid_flags;
                     __u32          futex_flags;
                     __u32          install_fd_flags;
                     __u32          nop_flags;
                };
                __u64     user_data;     /* data to be passed back at completion time */
                /* pack this to avoid bogus arm OABI complaints */
                union {
                     /* index into fixed buffers, if used */
                     __u16     buf_index;
                     /* for grouped buffer selection */
                     __u16     buf_group;
                } __attribute__((packed));
                /* personality to use, if used */
                __u16     personality;
                union {
                     __s32     splice_fd_in;
                     __u32     file_index;
                     __u32     optlen;
                     struct {
                          __u16     addr_len;
                          __u16     __pad3[1];
                     };
                };
                union {
                     struct {
                          __u64     addr3;
                          __u64     __pad2[1];
                     };
                     __u64     optval;
                     /*
                      * If the ring is initialized with IORING_SETUP_SQE128, then
                      * this field is used for 80 bytes of arbitrary command data
                      */
                     __u8 cmd[0];
                };
           };

       To submit an I/O request to io_uring, you need to acquire a
       submission queue entry (SQE) from the submission queue (SQ), fill
       it up with details of the operation you want to submit and call
       io_uring_enter(2).  There are helper functions of the form
       io_uring_prep_X to enable proper setup of the SQE. If you want to
       avoid calling io_uring_enter(2), you have the option of setting up
       Submission Queue Polling.

       SQEs are added to the tail of the submission queue.  The kernel
       picks up SQEs off the head of the SQ.  The general algorithm to
       get the next available SQE and update the tail is as follows.

           struct io_uring_sqe *sqe;
           unsigned tail, index;
           tail = *sqring->tail;
           index = tail & (*sqring->ring_mask);
           sqe = &sqring->sqes[index];
           /* fill up details about this I/O request */
           describe_io(sqe);
           /* fill the sqe index into the SQ ring array */
           sqring->array[index] = index;
           tail++;
           atomic_store_explicit(sqring->tail, tail, memory_order_release);

       To get the index of an entry, the application must mask the
       current tail index with the size mask of the ring.  This holds
       true for both SQs and CQs.  Once the SQE is acquired, the
       necessary fields are filled in, describing the request.  While the
       CQ ring directly indexes the shared array of CQEs, the submission
       side has an indirection array between them.  The submission side
       ring buffer is an index into this array, which in turn contains
       the index into the SQEs.

       The following code snippet demonstrates how a read operation, an
       equivalent of a preadv2(2) system call is described by filling up
       an SQE with the necessary parameters.

           struct iovec iovecs[16];
            ...
           sqe->opcode = IORING_OP_READV;
           sqe->fd = fd;
           sqe->addr = (unsigned long) iovecs;
           sqe->len = 16;
           sqe->off = offset;
           sqe->flags = 0;

       Memory ordering
              Modern compilers and CPUs freely reorder reads and writes
              without affecting the program's outcome to optimize
              performance.  Some aspects of this need to be kept in mind
              on SMP systems since io_uring involves buffers shared
              between kernel and user space.  These buffers are both
              visible and modifiable from kernel and user space.  As
              heads and tails belonging to these shared buffers are
              updated by kernel and user space, changes need to be
              coherently visible on either side, irrespective of whether
              a CPU switch took place after the kernel-user mode switch
              happened.  We use memory barriers to enforce this
              coherency.  Being significantly large subjects on their
              own, memory barriers are out of scope for further
              discussion on this man page.  For more information on
              modern memory models the reader may refer to the
              Documentation/memory-barriers.txt in the kernel tree or to
              the documentation of the formal C11 or kernel memory model.

       Letting the kernel know about I/O submissions
              Once you place one or more SQEs on to the SQ, you need to
              let the kernel know that you've done so.  You can do this
              by calling the io_uring_enter(2) system call.  This system
              call is also capable of waiting for a specified count of
              events to complete.  This way, you can be sure to find
              completion events in the completion queue without having to
              poll it for events later.

   SQE pointer lifetimes & data stability
       Due to the fixed size of the submission queue entry (SQE) some
       data you provide in order to perform a desired operation will be
       passed in the form of a pointer rather than value. In this
       situation, you may free the memory backing the pointer once the
       succeeding io_uring_enter(2) call has completed; providing it is
       only required by the operation when submitting.

       When IORING_SETUP_SQPOLL is not enabled, this is done when you
       call io_uring_submit(3) In The event IORING_SETUP_SQPOLL is
       enabled, you must ensure any provided pointers remain valid until
       completion.

       However, very early kernels (5.4 and earlier) required state to be
        stable until the completion occurred regardless. Applications can
       test for this
        behavior by inspecting the IORING_FEAT_SUBMIT_STABLE flag passed
       back from io_uring_queue_init_params(3).

       As an example, the IORING_OP_TIMEOUT operation takes a pointer to
       a __kernel_timespec struct. This struct is then read by the kernel
       when you submit the submission queue entries, once submitted, you
       may free the backing memory of the __kernel_timespec as it will
       not be read again by the kernel.

       It should be noted that this behaviour does not apply to data that
       is read or written while the operation is inflight. For example,
       the pointers to a buffer used as part of a IORING_OP_WRITE or
       IORING_OP_READ operation must remain valid until completion.

   Reading completion events
       Similar to the submission queue (SQ), the completion queue (CQ) is
       a shared buffer between the kernel and user space.  Whereas you
       placed submission queue entries on the tail of the SQ and the
       kernel read off the head, when it comes to the CQ, the kernel
       places completion queue events or CQEs on the tail of the CQ and
       you read off its head.

       Submission is flexible (and thus a bit more complicated) since it
       needs to be able to encode different types of system calls that
       take various parameters.  Completion, on the other hand is simpler
       since we're looking only for a return value back from the kernel.
       This is easily understood by looking at the completion queue event
       structure, struct io_uring_cqe:

           struct io_uring_cqe {
                __u64     user_data;  /* sqe->data submission passed back */
                __s32     res;        /* result code for this event */
                __u32     flags;
           };

       Here, user_data is custom data that is passed unchanged from
       submission to completion.  That is, from SQEs to CQEs.  This field
       can be used to set context, uniquely identifying submissions that
       got completed.  Given that I/O requests can complete in any order,
       this field can be used to correlate a submission with a
       completion.  res is the result from the system call that was
       performed as part of the submission; its return value.

       The flags field carries request-specific information. As of the
       6.12 kernel, the following flags are defined:

       IORING_CQE_F_BUFFER
              If set, the upper 16 bits of the flags field carries the
              buffer ID that was chosen for this request. The request
              must have been issued with IOSQE_BUFFER_SELECT set, and
              used with a request type that supports buffer selection.
              Additionally, buffers must have been provided upfront
              either via the IORING_OP_PROVIDE_BUFFERS or the
              IORING_REGISTER_PBUF_RING methods.

       IORING_CQE_F_MORE
              If set, the application should expect more completions from
              the request. This is used for requests that can generate
              multiple completions, such as multi-shot requests, receive,
              or accept.

       IORING_CQE_F_SOCK_NONEMPTY
              If set, upon receiving the data from the socket in the
              current request, the socket still had data left on
              completion of this request.

       IORING_CQE_F_NOTIF
              Set for notification CQEs, as seen with the zero-copy
              networking send and receive support.

       IORING_CQE_F_BUF_MORE
              If set, the buffer ID set in the completion will get more
              completions. This means that the provided buffer has been
              partially consumed and there's more buffer space left, and
              hence the application should expect more completions with
              this buffer ID. Each completion will continue where the
              previous one left off. This can only happen if the provided
              buffer ring has been setup with IOU_PBUF_RING_INC to allow
              for incremental / partial consumption of buffers.

       IORING_CQE_F_SKIP
              If the ring has been configured with IORING_SETUP_CQE_MIXED
              , then CQEs may be posted which has this flag set. This can
              happen if the ring is a single 16b CQE entry away from
              wrapping, but someone needs to post a 32b CQE. As CQEs must
              be contiguous in memory, a filler/pad CQE needs to get
              posted to allow posting of the 32b CQE. CQEs with this flag
              set should simply be skipped and ignored, they serve no
              other purpose than to fill a gap in the CQ ring.

       IORING_CQE_F_32
              If the ring has been configured with IORING_SETUP_CQE_MIXED
              , this flag is set when the CQE is of the 32b type. This
              tells the application that there's an extra 16b of space in
              this CQE, and that to get to the next CQE the CQ ring must
              be advanced by twice as much as for a normal 16b CQE.

       The general sequence to read completion events off the completion
       queue is as follows:

           unsigned head;
           head = *cqring->head;
           if (head != atomic_load_acquire(cqring->tail)) {
               struct io_uring_cqe *cqe;
               unsigned index;
               index = head & (cqring->mask);
               cqe = &cqring->cqes[index];
               /* process completed CQE */
               process_cqe(cqe);
               /* CQE consumption complete */
               head++;
           }
           atomic_store_explicit(cqring->head, head, memory_order_release);

       It helps to be reminded that the kernel adds CQEs to the tail of
       the CQ, while you need to dequeue them off the head.  To get the
       index of an entry at the head, the application must mask the
       current head index with the size mask of the ring.  Once the CQE
       has been consumed or processed, the head needs to be updated to
       reflect the consumption of the CQE.  Attention should be paid to
       the read and write barriers to ensure successful read and update
       of the head.

   io_uring performance
       Because of the shared ring buffers between kernel and user space,
       io_uring can be a zero-copy system.  Copying buffers to and from
       becomes necessary when system calls that transfer data between
       kernel and user space are involved.  But since the bulk of the
       communication in io_uring is via buffers shared between the kernel
       and user space, this huge performance overhead is completely
       avoided.

       While system calls may not seem like a significant overhead, in
       high performance applications, making a lot of them will begin to
       matter.  While workarounds the operating system has in place to
       deal with Spectre and Meltdown are ideally best done away with,
       unfortunately, some of these workarounds are around the system
       call interface, making system calls not as cheap as before on
       affected hardware.  While newer hardware should not need these
       workarounds, hardware with these vulnerabilities can be expected
       to be in the wild for a long time.  While using synchronous
       programming interfaces or even when using asynchronous programming
       interfaces under Linux, there is at least one system call involved
       in the submission of each request.  In io_uring, on the other
       hand, you can batch several requests in one go, simply by queueing
       up multiple SQEs, each describing an I/O operation you want and
       make a single call to io_uring_enter(2).  This is possible due to
       io_uring's shared buffers based design.

       While this batching in itself can avoid the overhead associated
       with potentially multiple and frequent system calls, you can
       reduce even this overhead further with Submission Queue Polling,
       by having the kernel poll and pick up your SQEs for processing as
       you add them to the submission queue. This avoids the
       io_uring_enter(2) call you need to make to tell the kernel to pick
       SQEs up.  For high-performance applications, this means even fewer
       system call overheads.

CONFORMING TO top

       io_uring is Linux-specific.

EXAMPLES top

       The following example uses io_uring to copy stdin to stdout.
       Using shell redirection, you should be able to copy files with
       this example.  Because it uses a queue depth of only one, this
       example processes I/O requests one after the other.  It is
       purposefully kept this way to aid understanding.  In real-world
       scenarios however, you'll want to have a larger queue depth to
       parallelize I/O request processing so as to gain the kind of
       performance benefits io_uring provides with its asynchronous
       processing of requests.

       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/stat.h>
       #include <sys/ioctl.h>
       #include <sys/syscall.h>
       #include <sys/mman.h>
       #include <sys/uio.h>
       #include <linux/fs.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <string.h>
       #include <stdatomic.h>
       #include <errno.h>

       #include <linux/io_uring.h>

       #define QUEUE_DEPTH 1
       #define BLOCK_SZ    1024

       /* Macros for barriers needed by io_uring */
       #define io_uring_smp_store_release(p, v)            \
           atomic_store_explicit((_Atomic typeof(*(p)) *)(p), (v), \
                         memory_order_release)
       #define io_uring_smp_load_acquire(p)                \
           atomic_load_explicit((_Atomic typeof(*(p)) *)(p),   \
                        memory_order_acquire)

       int ring_fd;
       unsigned *sring_tail, *sring_mask, *sring_array,
                   *cring_head, *cring_tail, *cring_mask;
       struct io_uring_sqe *sqes;
       struct io_uring_cqe *cqes;
       char buff[BLOCK_SZ];
       off_t offset;

       /*
        * System call wrappers provided since glibc does not yet
        * provide wrappers for io_uring system calls.
       * */

       int io_uring_setup(unsigned entries, struct io_uring_params *p)
       {
           int ret;
           ret = syscall(__NR_io_uring_setup, entries, p);
           return (ret < 0) ? -errno : ret;
       }

       int io_uring_enter(int ring_fd, unsigned int to_submit,
                          unsigned int min_complete, unsigned int flags)
       {
           int ret;
           ret = syscall(__NR_io_uring_enter, ring_fd, to_submit,
                         min_complete, flags, NULL, 0);
           return (ret < 0) ? -errno : ret;
       }

       int app_setup_uring(void) {
           struct io_uring_params p;
           void *sq_ptr, *cq_ptr;

           /* See io_uring_setup(2) for io_uring_params.flags you can set */
           memset(&p, 0, sizeof(p));
           ring_fd = io_uring_setup(QUEUE_DEPTH, &p);
           if (ring_fd < 0) {
               perror("io_uring_setup");
               return 1;
           }

           /*
            * io_uring communication happens via 2 shared kernel-user space ring
            * buffers, which can be jointly mapped with a single mmap() call in
            * kernels >= 5.4.
            */

           int sring_sz = p.sq_off.array + p.sq_entries * sizeof(unsigned);
           int cring_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);

           /* Rather than check for kernel version, the recommended way is to
            * check the features field of the io_uring_params structure, which is a
            * bitmask. If IORING_FEAT_SINGLE_MMAP is set, we can do away with the
            * second mmap() call to map in the completion ring separately.
            */
           if (p.features & IORING_FEAT_SINGLE_MMAP) {
               if (cring_sz > sring_sz)
                   sring_sz = cring_sz;
               cring_sz = sring_sz;
           }

           /* Map in the submission and completion queue ring buffers.
            *  Kernels < 5.4 only map in the submission queue, though.
            */
           sq_ptr = mmap(0, sring_sz, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_POPULATE,
                         ring_fd, IORING_OFF_SQ_RING);
           if (sq_ptr == MAP_FAILED) {
               perror("mmap");
               return 1;
           }

           if (p.features & IORING_FEAT_SINGLE_MMAP) {
               cq_ptr = sq_ptr;
           } else {
               /* Map in the completion queue ring buffer in older kernels separately */
               cq_ptr = mmap(0, cring_sz, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_POPULATE,
                             ring_fd, IORING_OFF_CQ_RING);
               if (cq_ptr == MAP_FAILED) {
                   perror("mmap");
                   return 1;
               }
           }
           /* Save useful fields for later easy reference */
           sring_tail = sq_ptr + p.sq_off.tail;
           sring_mask = sq_ptr + p.sq_off.ring_mask;
           sring_array = sq_ptr + p.sq_off.array;

           /* Map in the submission queue entries array */
           sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
                          PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
                          ring_fd, IORING_OFF_SQES);
           if (sqes == MAP_FAILED) {
               perror("mmap");
               return 1;
           }

           /* Save useful fields for later easy reference */
           cring_head = cq_ptr + p.cq_off.head;
           cring_tail = cq_ptr + p.cq_off.tail;
           cring_mask = cq_ptr + p.cq_off.ring_mask;
           cqes = cq_ptr + p.cq_off.cqes;

           return 0;
       }

       /*
       * Read from completion queue.
       * In this function, we read completion events from the completion queue.
       * We dequeue the CQE, update and head and return the result of the operation.
       * */

       int read_from_cq() {
           struct io_uring_cqe *cqe;
           unsigned head;

           /* Read barrier */
           head = io_uring_smp_load_acquire(cring_head);
           /*
           * Remember, this is a ring buffer. If head == tail, it means that the
           * buffer is empty.
           * */
           if (head == *cring_tail)
               return -1;

           /* Get the entry */
           cqe = &cqes[head & (*cring_mask)];
           if (cqe->res < 0)
               fprintf(stderr, "Error: %s\n", strerror(abs(cqe->res)));

           head++;

           /* Write barrier so that update to the head are made visible */
           io_uring_smp_store_release(cring_head, head);

           return cqe->res;
       }

       /*
       * Submit a read or a write request to the submission queue.
       * */

       int submit_to_sq(int fd, int op) {
           unsigned index, tail;

           /* Add our submission queue entry to the tail of the SQE ring buffer */
           tail = *sring_tail;
           index = tail & *sring_mask;
           struct io_uring_sqe *sqe = &sqes[index];
           /* Fill in the parameters required for the read or write operation */
           sqe->opcode = op;
           sqe->fd = fd;
           sqe->addr = (unsigned long) buff;
           if (op == IORING_OP_READ) {
               memset(buff, 0, sizeof(buff));
               sqe->len = BLOCK_SZ;
           }
           else {
               sqe->len = strlen(buff);
           }
           sqe->off = offset;

           sring_array[index] = index;
           tail++;

           /* Update the tail */
           io_uring_smp_store_release(sring_tail, tail);

           /*
           * Tell the kernel we have submitted events with the io_uring_enter()
           * system call. We also pass in the IORING_ENTER_GETEVENTS flag which
           * causes the io_uring_enter() call to wait until min_complete
           * (the 3rd param) events complete.
           * */
           int ret =  io_uring_enter(ring_fd, 1,1,
                                     IORING_ENTER_GETEVENTS);
           if(ret < 0) {
               perror("io_uring_enter");
               return -1;
           }

           return ret;
       }

       int main(int argc, char *argv[]) {
           int res;

           /* Setup io_uring for use */
           if(app_setup_uring()) {
               fprintf(stderr, "Unable to setup uring!\n");
               return 1;
           }

           /*
           * A while loop that reads from stdin and writes to stdout.
           * Breaks on EOF.
           */
           while (1) {
               /* Initiate read from stdin and wait for it to complete */
               submit_to_sq(STDIN_FILENO, IORING_OP_READ);
               /* Read completion queue entry */
               res = read_from_cq();
               if (res > 0) {
                   /* Read successful. Write to stdout. */
                   submit_to_sq(STDOUT_FILENO, IORING_OP_WRITE);
                   read_from_cq();
               } else if (res == 0) {
                   /* reached EOF */
                   break;
               }
               else if (res < 0) {
                   /* Error reading file */
                   fprintf(stderr, "Error: %s\n", strerror(abs(res)));
                   break;
               }
               offset += res;
           }

           return 0;
       }

COLOPHON top

       This page is part of the liburing (A library for io_uring)
       project.  Information about the project can be found at 
       ⟨https://github.com/axboe/liburing⟩.  If you have a bug report for
       this manual page, send it to io-uring@vger.kernel.org.  This page
       was obtained from the project's upstream Git repository
       ⟨https://github.com/axboe/liburing⟩ on 2026-05-24.  (At that time,
       the date of the most recent commit that was found in the
       repository was 2026-05-18.)  If you discover any rendering
       problems in this HTML version of the page, or you believe there is
       a better or more up-to-date source for the page, or you have
       corrections or improvements to the information in this COLOPHON
       (which is not part of the original manual page), send a mail to
       man-pages@man7.org

Linux                         July 26, 2020                   io_uring(7)

Pages that refer to this page: io_uring_register(2), systemd.exec(5), io_uring_cancelation(7), io_uring_linked_requests(7), io_uring_multishot(7), io_uring_provided_buffers(7), io_uring_registered_buffers(7), io_uring_registered_files(7), io_uring_setup_flags(7), io_uring_sqpoll(7)

io_uring(7) — Linux manual page

NAME top

SYNOPSIS top

DESCRIPTION top

CONFORMING TO top

EXAMPLES top

SEE ALSO top

COLOPHON top