How to Display the Current Disk IO Queue Length on Linux

How to monitor the IO queue depth

Solution:

cd /sys/kernel/debug/tracing/events/nvme/nvme_sq
# filter by disk name:
echo 'disk=="nvme0n1"' > filter
# enable the event:
echo 1 > enable
# check results from trace_pipe:
cat /sys/kernel/debug/tracing/trace_pipe

I suggest also enabling /sys/kernel/debug/tracing/events/nvme/nvme_setup_cmd; then you can get a rough idea of what the nvme driver is doing.
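
For example, a minimal sketch reusing the same disk filter (write 0 to the enable files to turn tracing back off when you are done):

cd /sys/kernel/debug/tracing/events/nvme/nvme_setup_cmd
# filter by the same disk name:
echo 'disk=="nvme0n1"' > filter
# enable the event:
echo 1 > enable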

          <idle>-0       [002] d.h.  2558.073405: nvme_sq: nvme0: disk=nvme0n1, qid=3, head=76, tail=76
   systemd-udevd-3805    [002] ....  2558.073454: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=48, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=104856608, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
          <idle>-0       [002] d.h.  2558.073664: nvme_sq: nvme0: disk=nvme0n1, qid=3, head=77, tail=77
   systemd-udevd-3805    [002] ....  2558.073704: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=49, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=104856648, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
          <idle>-0       [002] d.h.  2558.073899: nvme_sq: nvme0: disk=nvme0n1, qid=3, head=78, tail=78
   systemd-udevd-3805    [002] ....  2558.073938: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=50, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=104854512, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
          <idle>-0       [002] d.h.  2558.074134: nvme_sq: nvme0: disk=nvme0n1, qid=3, head=79, tail=79

The explanation of each field in this output can be found here.

Does anybody know of a way to access the disk IO queue length from Java?

sar -dp on Linux should give you the queue length.
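
For example (an illustrative sketch; the queue-length column is called avgqu-sz in older sysstat releases and aqu-sz in newer ones, and the device name nvme0n1 is an assumption):

# per-device statistics, one sample per second, 5 samples;
# the queue length is the avgqu-sz / aqu-sz column:
sar -dp 1 5

# from Java (or any language), the simplest route is to read sysfs directly;
# /sys/block/<dev>/inflight holds two counters: reads and writes in flight:
cat /sys/block/nvme0n1/inflight

Since /sys/block/<dev>/inflight is plain text, reading it from Java is just an ordinary file read; no native code is needed.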

How is run queue length computed in the Linux proc filesystem

As gcla said, you can use

cat /proc/loadavg

to read the load average from the kernel, but strictly speaking, it is not a queue length.

Take a look at

grep procs_running /proc/stat

and

grep procs_blocked /proc/stat

The first is the actual run queue length and the second is the number of processes blocked on disk IO. The load average is a function of the sum of both.
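
To watch both counters update side by side (an illustrative one-liner; the one-second interval is arbitrary):

# sample the run queue and blocked-process counters once per second:
watch -n 1 'grep -E "^procs_(running|blocked)" /proc/stat'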

Disk Write Queue Length

It is a device driver detail. Whenever you go hunting for such details, there are three places to look:

  • An IOCTL, the kind you use with DeviceIoControl(). That is a dead end.
  • A performance counter; Perfmon.exe is the best tool to see what is available. Out pops the "LogicalDisk" category with the "Current Disk Queue Length" counter; the instance is the drive letter.
  • A WMI query, best googled with a query like "wmi disk queue length". Out pops the first hit, the Win32_PerfFormattedData_PerfDisk_PhysicalDisk class.

There is lots of sample code around showing how to use a performance counter or a WMI query in your code; google away.

Disk IO queue overflow

The queue doesn't extend to RAM. There's a disk cache with dirty pages. The OS really would like to write those to disk. Some programs may even block while they're waiting for their dirty pages to be written. And as programs get blocked, they stop writing further data to disk. Pretty self-limiting, actually.
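
On Linux, the thresholds behind this throttling and the current amount of dirty data are visible in procfs (a read-only sketch; the values shown are system-specific):

# percentage of memory that may hold dirty pages before writers are throttled:
cat /proc/sys/vm/dirty_ratio
# percentage at which background writeback starts:
cat /proc/sys/vm/dirty_background_ratio
# current dirty and writeback totals, in kB:
grep -E '^(Dirty|Writeback):' /proc/meminfo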

IO request queueing

1. How exactly are they implemented within the OS?

Making the request

Let's see what happens when you invoke a write on a device.

  1. You ask your runtime to perform a file write.
  2. The runtime in the end invokes a system call.
  3. The kernel locates the driver/module associated with the device you wrote to.
  4. The driver/module performs its operations and may forward the request to another driver/module.
  5. There is a last driver/module that specifically handles the hardware device; it records the request and sends the write command.

Putting the task to sleep
The last driver/module returns a status code to inform its caller what happened. For the sake of simplicity, assume the possible return statuses are only: Success, Failure, Pending.

By returning Pending, the driver/module informs the caller that the request is ongoing.

Control returns from step 5 to step 2, where the kernel is given the return status of the write.

If it is Success or Failure, the kernel returns to user space indicating success or failure. If it is Pending, however, it marks the task as Waiting for IO (using whatever OS-specific mechanism).

This is the key point: a task that is waiting is not scheduled, even if there is nothing else to do.

While going from step 1 to step 5, the kernel can be preempted, i.e. interrupted (in most OSes, and only when the driver/module or the kernel itself allows it), but the state of the calling task would still be Running, so it will be rescheduled later; a waiting task, by contrast, is not scheduled.
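
On Linux you can observe such waiting tasks from user space: a process state starting with D means uninterruptible sleep, which usually indicates a task blocked on IO. An illustrative sketch (on an idle machine the list may well be empty):

# list tasks in uninterruptible sleep (state D), typically blocked on IO;
# wchan shows the kernel function they are sleeping in:
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'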

Waking the task
When the hardware is done, it requests the CPU's attention with an interrupt. The module/driver in step 5 registers a callback so that its code is executed when the hardware it manages raises an interrupt.

Once the hardware operation is finalized, the module/driver informs the kernel that a specific, previously recorded request is done.

The kernel knows which task made the request and changes its state to Running, thereby waking it.

2. How do the underlying devices work on an instruction level - e.g. the exact CPU instructions for storage and network hardware?

To handle a network device you mostly use move instructions.

To handle a storage device you mostly use move instructions.

To handle a [insert device type here] device you mostly use move instructions.

In a computer everything is just a move of data. Move 09h and 41h into a multiplier ALU and you get 249h. Move 09h and 41h into display memory and you get a colored A. Move 09h and 41h into a timer register and you get a beep sound.

Systems can be very different, but the principle is the same: you move data to and from locations identified by an address. In most systems this is the same as writing to RAM, but every system may have peculiarities, such as different sets of addresses accessed with different instructions (these are commonly referred to as address spaces; examples are: memory address space, IO address space, bus address space, PCI configuration address space, NUMA address space, ...).

The x86 CPU itself is aware of two address spaces: the memory address space (up to 48 bits nowadays) and the IO address space (16 bits).

The first is accessed with ordinary memory instructions (as if you were accessing global variables), the second with dedicated instructions whose mnemonics start with in and out.

Note, however, that the IO address space is a vestige of the early days of IA32; new devices nowadays generally don't use it.
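
On Linux you can see how both x86 address spaces are carved up among devices; the files below are standard procfs entries (reading them as root shows fuller detail):

# port IO (in/out instruction) address space assignments:
sudo cat /proc/ioports
# memory-mapped IO regions inside the memory address space:
sudo cat /proc/iomem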

3. How many IO instructions can be pending with a device at a time?

4. How many IO requests can be queued with the OS, e.g. Linux and Windows at a time? Where to look for this info for other OSes?

If you mean actively pending, i.e. being executed by the hardware concurrently, then it is usually one or a few.

Most controllers (AHCI, XHCI, EHCI and so on) use data structures that let the driver enqueue more than one request, although the requests are handled one at a time. Usually there is no limit other than the memory available to describe the requests, but this is hardware specific.

Other controllers, like SCSI ones, can handle more than one command at a time.

The OS keeps a software queue of requests too, so again it is potentially unbounded and limited only by the available memory. Usually, however, the limit is set to a reasonable number.
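
On Linux, those limits are visible per device through sysfs. A sketch, assuming device names nvme0n1 and sda:

# maximum number of requests the block layer will queue for this device:
cat /sys/block/nvme0n1/queue/nr_requests
# for SCSI/SATA devices, the queue depth the driver uses with the hardware:
cat /sys/block/sda/device/queue_depth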

Alex D.'s answer has more details on this.

For Windows, the DDK help states:

Except for file system drivers, the I/O manager associates a device queue object (for queuing IRPs) with each device object that a driver creates.

However, it doesn't specify whether there is a limit, and I found no API in the programming interface to set a limit on the number of requests.

So it is up to the driver author to impose an upper bound, if any.

I don't know whether Windows has a limit per se (I presume it does not).


