Total Number of Bytes Read/Written by a Linux Process and Its Children

A little awk and strace is what you want.

strace -f -e trace=read,write -o ls.log ls

gives you a log of the read and write syscalls (the -f option makes strace follow child processes too, which is what you need for a process and its children). Now you can take this log and sum the last column like this:

cat ls.log | grep read | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'

You might want to change the grep to match only a read at the beginning of the line (e.g. grep '^read'), so that it doesn't also count other lines that merely contain the word read.

Determine the number of logical bytes read/written in a Linux system

If you want to use the /proc filesystem for the total counts (and not for per-second rates), it is quite easy.

This also works on quite old kernels (tested on a Debian Squeeze 2.6.32 kernel).

# cat /proc/1979/io
rchar: 111195372883082
wchar: 10424431162257
syscr: 130902776102
syscw: 6236420365
read_bytes: 2839822376960
write_bytes: 803408183296
cancelled_write_bytes: 374812672

For system-wide totals, just sum the numbers from all processes. However, this is only good enough in the short term, because when a process dies its statistics are removed from memory; you would need process accounting enabled to preserve them.

The meaning of these fields is documented in the kernel source file Documentation/filesystems/proc.txt:

rchar - I/O counter: chars read

The number of bytes which this task has caused
to be read from storage. This is simply the sum of bytes which this
process passed to read() and pread(). It includes things like tty IO
and it is unaffected by whether or not actual physical disk IO was
required (the read might have been satisfied from pagecache)

wchar - I/O counter: chars written

The number of bytes which this task has
caused, or shall cause to be written to disk. Similar caveats apply
here as with rchar.

syscr - I/O counter: read syscalls

Attempt to count the number of read I/O
operations, i.e. syscalls like read() and pread().

syscw - I/O counter: write syscalls

Attempt to count the number of write I/O
operations, i.e. syscalls like write() and pwrite().

read_bytes - I/O counter: bytes read

Attempt to count the number of bytes which
this process really did cause to be fetched from the storage layer.
Done at the submit_bio() level, so it is accurate for block-backed
filesystems.

write_bytes - I/O counter: bytes written

Attempt to count the number of bytes which
this process caused to be sent to the storage layer. This is done at
page-dirtying time.

cancelled_write_bytes

The big inaccuracy here is truncate. If a process writes 1MB to a file
and then deletes the file, it will in fact perform no writeout. But it
will have been accounted as having caused 1MB of write. In other
words: The number of bytes which this process caused to not happen, by
truncating pagecache. A task can cause "negative" IO too.

FIFO read() function gets stuck in C

The problem is that you closed the pipe in the first process. A pipe doesn't have any permanent storage, it can only hold data while it's open by at least one process. When you close the pipe, the data that you've written to it is discarded.

As a result, when the second process tries to read from the pipe, there's nothing available.

You need to keep the pipe FD open when you execute the second process. Get rid of close(fd); in the reader program.

Two processes in C reading from the same file problem

Basic answer

Given that the data file is 73 bytes long (give or take; you might have extra white space around that I didn't guess at), the first call to fscanf() will read the whole file into memory. The parent process then reads 10 lines' worth from that buffer, moving the read pointer in the standard I/O buffer. The trailing newlines in the fscanf() format strings are not really needed: %d skips white space, which includes newlines, and if the input were not coming from a file, the trailing newline would make for a very bad user experience, since the user would have to type the (start of the) next number to complete the current input. (See scanf() leaves the newline in the buffer and What is the effect of trailing white space in a scanf() format string?.)

Then the process forks. The child is an exact copy of the parent, so it continues reading where the parent left off, and prints 10 numbers as you expected, and then exits.

The parent process then resumes. It has done nothing to change the position of the pointer in memory, so it continues where it left off. However, the reading code now reads single characters and prints their decimal values, so it gets 50, 57, 10: the character codes for '2', '9', and '\n'. And so the output continues for all the rest of the prime numbers in the input.

You really need to fix the code so that it resumes using fscanf() instead of fgetc().

There isn't a sensible way for the parent to know what the child has done other than by changing from buffered I/O to unbuffered I/O. If you switched to unbuffered I/O, by calling setbuf(fichier, NULL); or setvbuf(fichier, NULL, _IONBF, 0); after opening the file but before doing any other operation on the file stream, then you would see that the parent process continues where it left off.

A side-note: I'm not convinced about the loop in create_process() — if there aren't enough resources, at least wait a little to give the system time to find some, but it is more common to treat 'out of resources' as a fatal error.

Another side-note: sending a signal to a process that's already died (because you waited for it to die) isn't going to work.

Here's some revised code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

static pid_t create_process(void)
{
    pid_t pid = fork();
    if (pid < 0)
    {
        fprintf(stderr, "Failed to fork\n");
        exit(1);
    }
    return pid;
}

int main(void)
{
    const char filename[] = "entiers.txt";
    FILE *fichier = fopen(filename, "r");
    int i = 0;
    int n = 0;

    // setbuf(fichier, NULL);
    // setvbuf(fichier, NULL, _IONBF, 0);

    if (fichier == 0)
    {
        fprintf(stderr, "Failed to open file '%s' for reading\n", filename);
        exit(1);
    }

    printf("I am the parent process with the pid %d\n", getpid());
    for (i = 0; i < 10; i++)
    {
        if (fscanf(fichier, "%d", &n) != 1)
            break;
        printf("%d\n", n);
    }

    pid_t pid = create_process();

    if (pid == 0)
    {
        printf("I am the child process with the pid %d\n", getpid());
        for (i = 0; i < 10; i++)
        {
            if (fscanf(fichier, "%d", &n) != 1)
                break;
            printf("%d\n", n);
        }
    }
    else
    {
        wait(NULL);
        printf("I am the parent process with the pid %d\n", getpid());
        while (fscanf(fichier, "%d", &n) == 1)
            printf("%d\n", n);
    }

    fclose(fichier);

    return EXIT_SUCCESS;
}

Sample output:

I am the parent process with the pid 15704
2
3
5
7
11
13
17
19
23
29
I am the child process with the pid 15705
31
37
41
43
47
53
59
61
67
71
I am the parent process with the pid 15704
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97

Very often, questions like this involve file descriptor I/O, and the discussion has to cover the difference between an open file descriptor and an open file description and explain what's shared between processes and what isn't. Because the input file is so small, that isn't an issue with this code. If the table of primes went up to, say, 999983 (the largest prime smaller than a million), and the child process read much more data, then you'd see different effects altogether.

Unbuffered input and trailing newlines in scanf() format strings

Empirical observation shows that when the original version of the code shown above had scanf("%d\n", &n) in both the parent's first read loop and the child's read loop, and the program was configured to use unbuffered input, the output would look like:


67
71
I am the parent process with the pid 27071
33
79

where the 33 isn't expected at first glance. However, there is an explanation for what goes wrong.

There's at least one byte of pushback available on the stream (even with no buffering), so at the point where the parent forks, both parent and child have the 3 from 31 in the pushback position (the newline was read as a white-space character, and the first non-blank character, the 3 of the line containing 31, was read and pushed back into the input buffer).

The child is an almost exact copy of the parent, and reads the pushback character and continues with the 1 and gets the newline and then the 3 of 37, and prints 31 as you'd expect. This continues until it reads the 7 at the start of 73 and pushes it back into its own input buffer, but that has no effect on the parent's input buffer of course (they're separate processes). The child exits.

The parent resumes. It has a 3 in its pushback position, and then gets the 3 from 73 (because the parent and child share the same open file description, and the read position is associated with the description, not the descriptor, so the child has moved the read position), and then it gets a newline and terminates its scanning (the last loop was missing the trailing white space in the scanf() format string anyway), and prints 33. It then proceeds to read the rest of the input cleanly, skipping over white space (newlines) before reading each number.

Changing the code to use fscanf(fichier, "%d", &n) throughout means that the child process stops with the newline before 73 in its pushback buffer, and the read position pointing at the 7 of 73, which is exactly where the parent needs it.

If the first parent loop had omitted the newline in the fscanf() format, then the child would still have worked, but the parent would have reported 3 as the first number when it resumed, instead of 33.

How to read the entirety of a pipe

For performance reasons, one typically reads a chunk at a time, not a character at a time.

  1. Loop,
    1. Attempt to enlarge the buffer so it can fit CHUNK_SIZE more bytes.
    2. If an error occurred,
      1. Fail.
    3. Attempt to read CHUNK_SIZE bytes from the pipe into the unused part of the buffer.
    4. If an error occurred,
      1. Fail.
    5. If EOF was reached,
      1. Break.
    6. Increase the total number of bytes read by the number of bytes read.

Block Linux read(2) until all of count bytes have arrived

While read() can be interrupted by a signal before the requested data arrives, and can also return fewer bytes than requested, there is no way to do this without a while loop.

You have to check the return value and count the bytes, unfortunately.
And yes, the easiest way is to write a wrapper function.

Unable to exit while loop after reading information written to pipe

There are several problems in the code.

Unfortunately I cannot compile it and fix the errors because it is incomplete.

  1. Arrays with a non-constant size like

    int pipes[proc][2]; // Pipes to be created

    are variable-length arrays, which are only valid in C99 and later (and support for them is optional since C11), so depending on your compiler and options this line may produce a warning or an error.

    You should either use dynamic allocation (malloc) or statically allocate the arrays with a maximum size and check that proc is not greater than the maximum.

  2. You have to close the write end of all pipes in all children. The read will detect EOF only if no process has the write end still open.

  3. Instead of

    while (1)
    {
        if ((r_val = read(pipes[procid][0], expression, MAIN_BUF_LEN)) > 0)
        {
            /*...*/
        }
        else
        {
            break;
        }
    }

    I suggest

    while((r_val = read(pipes[procid][0], expression, MAIN_BUF_LEN)) > 0)
    {
    /*...*/
    }
  4. Instead of

    pid = fork();
    if (pid != 0)
        printf("created child with child pid %d\n", pid);

    it should be

    pid = fork();
    if (pid > 0)
        printf("created child with child pid %d\n", pid);

    because pid < 0 is an error.

  5. Instead of

    if (pid == 0) // in child process
    {
        child_work(pipes, proc, i, out_ptr);
        break;
    }

    use

    if (pid == 0) // in child process
    {
        child_work(pipes, proc, i, out_ptr);
        return 0;
    }

    With break; the child would continue with the code after the for loop when child_work returns, and would itself read the file and write to the pipes.

  6. It is not guaranteed that every child will get its turn to read from the pipe before the parent writes the next data, so it may get two or more messages in a single read. In real applications you should also be prepared to handle incomplete read or write calls and to continue writing/reading the remaining data with additional read or write calls.

    I think the easiest way to handle a partial read or write would be to use buffered I/O. You can use fdopen with the write file descriptor or the read file descriptor of the pipe and write/read the data as a line of text terminated with a newline, using e.g. fprintf or fgets respectively.


