io_uring is a relatively new asynchronous batched IO interface added to the Linux kernel in version 5.1 (just over 3 years ago). At a low level, it provides batched asynchronous zero-copy IO with minimal system call overhead.
Due to the very large number of files that need to be read by certain plugins on each collection cycle, Netdata seems like a prime candidate to take advantage of this, specifically for the IO batching aspects.
At a low level, the interface consists of a pair of ring buffers shared between a user process and the kernel, one for IO submission (the submission queue, or SQ), and one for reading back completed IO (the completion queue, or CQ). The user process adds a number of IO requests to the SQ, then calls a specific system call to inform the kernel that there are requests queued. The kernel then processes the requests in the SQ, creating a completion entry in the CQ for each one processed as it completes. The user process can either block waiting for some number of entries in the CQ, or check in a non-blocking manner for available completion entries. Once a completion entry is available, the user process then collects the data for it, and marks it as processed so the kernel knows that that slot in the CQ can be reused.
Realistically, we would be utilizing the liburing userspace library to leverage io_uring, as it provides a much cleaner and much simpler interface to work with.
Because of how this works, there are both far fewer context switches and far fewer bytes of data copied around than would be required for traditional synchronous IO, resulting in both a much faster average completion time for large batches of IO and a lower impact on the rest of the system. Instead of requiring at least two context switches per IO operation, io_uring only requires two context switches for each batch of IO.
The downside to this is that it would require a significant rewrite of much of the code that handles processing of files in /sys and /proc to utilize effectively, and it is only available on Linux. It’s also relatively new, so we would need either build-time, or ideally runtime, detection for it.
The apps plugin seems like the prime candidate for testing this, as it reads hundreds of files on each collection cycle even on a system with relatively few processes, is usually one of the biggest users of CPU time on most Netdata installs, and can also easily be artificially benchmarked with very large numbers of files.