Using Named Pipes with Bash - Problem with Data Loss

Your problem is the if statement below:

while true
do
if read txt <"$pipe"
....
done

What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.

To fix this, change the loop in your server so that it opens the pipe once for the entire loop:

while true
do
if read txt
....
done < "$pipe"

Done this way, the pipe is opened once and kept open.

You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
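
A minimal, self-contained sketch of that caveat (the fifo path and the process_job handler are made-up names for this demo, not part of the original answer):

```shell
#!/bin/sh
# Sketch only: $pipe and process_job are hypothetical names.
pipe="${TMPDIR:-/tmp}/jobq.$$"
mkfifo "$pipe"

process_job() { echo "handled: $1"; }

# A background writer feeds one job and a quit marker.
{ echo "job1"; echo "quit"; } > "$pipe" &

while true
do
    if read txt
    then
        if [ "$txt" = quit ]; then break; fi
        # Redirect the handler's stdin away from the fifo so it cannot
        # consume lines that `read` should see.
        process_job "$txt" < /dev/null
    fi
done < "$pipe"

wait
rm -f "$pipe"
```

Without the `< /dev/null`, process_job would inherit the fifo as its stdin and could swallow queued lines.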

Edit: Now that the problem is that you get EOF on your reads when the last client closes the pipe, you can use jilles's method of duping the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:

while true
do
if read txt
....
done < "$pipe" 3> "$pipe"

This will hold the write side of the pipe open on fd 3. The same caveat applies to this file descriptor as to stdin: you will need to close it so that child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
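
A sketch of that cleanup (the fifo name is an example): `3>&-` on the child command closes the inherited write end in the child only.

```shell
#!/bin/sh
# Sketch: the fifo name is an example; the child runs with fd 3 closed.
pipe="${TMPDIR:-/tmp}/jobq.$$"
mkfifo "$pipe"

echo "demo" > "$pipe" &

while true
do
    if read txt
    then
        # 3>&- closes the held-open write end for this child only.
        sh -c 'echo "child saw: $1"' _ "$txt" 3>&-
        break
    fi
done < "$pipe" 3> "$pipe"

wait
rm -f "$pipe"
```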

Linux - Named pipes - losing data

Named pipes lose their contents when the last process closes them. In your example, this can happen if the writer process does another iteration while the reader process is about to do fis.close(). No error is reported in this case.

A possible fix is to arrange that the reader process never closes the fifo. To get rid of the EOF condition when the last writer disconnects, open the fifo for writing, close the read end, reopen the read end and close the temporary write end.
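
In shell terms, the described sequence might look like the sketch below (the fd numbers are arbitrary; the temporary write end is held slightly longer here than the text describes, so the demo is race-free):

```shell
#!/bin/sh
# Sketch of the fd dance that clears the EOF condition on a fifo reader.
pipe="${TMPDIR:-/tmp}/fifo.$$"
mkfifo "$pipe"

echo first > "$pipe" &
exec 3< "$pipe"                      # reader opens the fifo
read line <&3 && echo "got: $line"
wait                                 # the only writer is gone: fd 3 now reads EOF

exec 4> "$pipe"                      # 1. open for writing (fd 3 keeps this from blocking)
exec 3<&-                            # 2. close the EOF'd read end
exec 3< "$pipe"                      # 3. reopen the read end (fd 4 keeps this from blocking)

echo second > "$pipe" &              # a new client connects...
read line <&3 && echo "got: $line"   # ...and is read normally, no EOF seen
wait
exec 4>&-                            # 4. drop the temporary write end
exec 3<&-
rm -f "$pipe"
```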

Wrong read from a named pipe

You must have more than one reader of the named pipe host-pipe running for this to happen.

Check to see if you have a second instance of the script running in the background or possibly in another terminal.

Explanation

You will find that bash will issue reads from the pipe 1 byte at a time. If you are on Linux, you can strace your script. Here is an excerpt:

open("host-pipe", O_RDONLY|O_LARGEFILE) = 3
fcntl64(0, F_GETFD) = 0
fcntl64(0, F_DUPFD, 10) = 10
fcntl64(0, F_GETFD) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
dup2(3, 0) = 0
close(3) = 0
ioctl(0, TCGETS, 0xbf99bfec) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(0, 0, 0xbf99c068, SEEK_CUR) = -1 ESPIPE (Illegal seek)
read(0, "a", 1) = 1
read(0, "b", 1) = 1
read(0, "c", 1) = 1
read(0, "d", 1) = 1
read(0, "e", 1) = 1
read(0, "f", 1) = 1
read(0, "\n", 1) = 1
dup2(10, 0) = 0
fcntl64(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(10) = 0

Once you have more than one process with this consumption pattern, any single process will see lost characters.

How can I have output from one named pipe fed back into another named pipe?

It seems to me that you do not understand what a named pipe really is. A named pipe is not one stream like a normal pipe. It is a series of normal pipes, because a named pipe can be closed, and a close on the producer side might show up as a close on the consumer side.

The "might" part is this: the consumer will read data until there is no more data. "No more data" means that, at the time of the read call, no producer has the named pipe open. This means that multiple producers can feed one consumer only when there is no point in time without at least one producer. Think of it as a door which closes automatically: if there is a steady stream of people keeping the door open, either by handing the doorknob to the next person or by squeezing through together, the door stays open. But once the door is closed, it stays closed.

A little demonstration should make the difference a little clearer:

Open three shells. First shell:

1> mkfifo xxx
1> cat xxx

No output is shown, because cat has opened the named pipe and is waiting for data.

Second shell:

2> cat > xxx 

No output, because this cat is a producer which keeps the named pipe open until we tell it to close explicitly.

Third shell:

3> echo Hello > xxx
3>

This producer immediately returns.

First shell:

Hello

The consumer received data, wrote it and, since one more producer keeps the door open, continues to wait.

Third shell

3> echo World > xxx
3>

First shell:

World

The consumer received data, wrote it and, since one more producer keeps the door open, continues to wait.

Second Shell: write into the cat > xxx window:

And good bye!
(control-d key)
2>

First shell

And good bye!
1>

The ^D key closed the last producer, the cat > xxx, and hence the consumer exits as well.


In your case which means:

  • Your log function will try to open and close the pipes multiple times. Not a good idea.
  • Both your while loops exit earlier than you think. (Check this with ( while ... done < $PIPE_X; echo FINISHED; ) &.)
  • Depending on the scheduling of your various producers and consumers, the door might slam shut sometimes and sometimes not: you have a race condition built in. (For testing you can add a sleep 1 at the end of the log function.)
  • Your "testcases" only try each possibility once. Try to run them multiple times (you will block, especially with the sleeps), because your producer might not find any consumer.

So I can explain the problems in your code but I cannot tell you a solution because it is unclear what the edges of your requirements are.

Sending structured data over named pipe (Linux)


  • A pipe (named or not) is a stream of bytes. If you were using the same language on both sides, there might be a better way of sending structured data. In your situation, a manual encoding and decoding, like you're doing, is by far the easiest solution.

  • Don't use spaces to separate fields that may contain spaces, such as people's names. Use :, like /etc/passwd.

  • In C, read is hard to use, because you have to decide on a buffer size in advance and you have to call it in a loop because it may return less than the buffer size on a whim. The functions from stdio.h (that operate on a FILE* rather than a file descriptor) are easier to use but still require work to handle long lines. If you don't care about portability outside Linux, use getline:

    FILE *pipe = fdopen(fd, "r");
    char *line = NULL;
    size_t line_length = 0;  /* must be 0 when line is NULL */
    if (getline(&line, &line_length, pipe) == -1) {
        /* handle EOF or read error */
    }

Then use strchr to locate the :s in the line. (Don't be tempted to use strtok, it's only suitable for whitespace-separated fields that can't be empty.)
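
If the consumer end happens to be a shell script instead of C, the same `:`-separated records can be split without any manual scanning, using read with IFS (the field names below are made up for the example):

```shell
#!/bin/sh
# Example: splitting a :-separated record in shell. Field names are
# hypothetical; the record format mirrors /etc/passwd-style encoding.
record="Ada Lovelace:36:London"
IFS=: read -r name age city <<EOF
$record
EOF
echo "name=$name age=$age city=$city"
```

Prefixing `IFS=:` to the read command scopes the separator change to that one command, so the rest of the script is unaffected.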

Can a pipe in Linux ever lose data?

Barring a machine crash, no, it can't lose data. However, it's easy to misuse it and think you're losing data: either a write failed to write all the data you requested and you didn't check the return value, or you did something wrong with the read.

The maximum amount of data it can hold is system dependent -- if you try to write more than that, you'll either get a short write or the writer will block until space is available. The pipe(7) man page contains lots of useful info about pipes, including (on Linux at least) how big the buffer is. Linux has buffers of 4K or 64K depending on version.

edit

Tim mentions SIGPIPE, which is also a potential issue that can seem to lose data. If the reader closes the pipe before reading everything in it, the unread data is thrown away, and the writer will get a SIGPIPE signal on its next write, indicating that this has occurred. If the writer blocks or ignores SIGPIPE, the write fails with an EPIPE error instead. This covers the situation Paul mentioned.
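
This is easy to observe from a shell; bash's PIPESTATUS array exposes the exit status of each pipeline member, including writers killed by SIGPIPE:

```shell
#!/bin/bash
# head exits after one line, closing the read end; yes then dies from
# SIGPIPE on a later write. Bash reports signal deaths as 128 + signal
# number, and SIGPIPE is 13, so the first pipeline status is 141.
yes | head -n 1 > /dev/null
echo "${PIPESTATUS[0]}"
```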

PIPE_BUF is a constant that tells you the limit of atomic writes to the buffer. Any write this size or smaller will either succeed completely or block until it can succeed completely (or give EWOULDBLOCK/EAGAIN if the pipe is in non-blocking mode). It has no relation to the actual size of the kernel's pipe buffer, though obviously the buffer must be at least PIPE_BUF in size to meet the atomicity guarantee.
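
The limit can be queried at run time with getconf; the path argument selects the filesystem the answer applies to:

```shell
#!/bin/sh
# POSIX guarantees PIPE_BUF is at least 512; on Linux it is 4096.
getconf PIPE_BUF /
```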

Varying behavior when using named pipe with background process


a_or_b <pipe &

Redirections are processed before starting commands. The shell blocks trying to open pipe for reading. From the mkfifo(3) man page:

Opening a FIFO for reading normally blocks until some other process opens the same FIFO for writing, and vice versa.

The shell can't proceed until another process opens the FIFO for writing. Only then will it finish setting up the redirection and actually call a_or_b.


echo 'x' > pipe
a_or_b <pipe &

Unfortunately this has the same problem, only inverted. The shell can't proceed past the > pipe redirection until another process opens the FIFO for reading. It never gets to the echo, nor to the second line that would read from the pipe.


a_or_b <pipe &
echo 'x' > pipe

Hopefully you can now see why this version works. The background shell tries to read from the pipe and blocks. The foreground shell, a separate process, writes to it. Aha! Two processes have opened the pipe, one for reading and one for writing. They can now both proceed. The birds sing and everybody is happy.
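
The working ordering condenses to a short, self-contained sketch (cat stands in here for the a_or_b program from the question):

```shell
#!/bin/sh
# Working ordering: start the background reader first, then write.
pipe="${TMPDIR:-/tmp}/demo.$$"
mkfifo "$pipe"

cat < "$pipe" &     # blocks opening the fifo for reading
echo 'x' > "$pipe"  # opening for writing completes both opens; 'x' flows through
wait                # prints: x
rm -f "$pipe"
```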


