Can a pipe in Linux ever lose data?
Barring a machine crash, no, it can't lose data. It's easy to misuse it and think you're losing data, however: either a write failed to write all the data you requested and you didn't check the return value, or you did something wrong with the read.
The maximum amount of data it can hold is system-dependent: if you try to write more than that, you'll either get a short write or the writer will block until space is available. The pipe(7) man page contains lots of useful information about pipes, including (on Linux at least) how big the buffer is. Linux has buffers of 4K or 64K depending on the version.
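As a quick illustration of the short-write behavior (my sketch, not part of the original answer), a write larger than the pipe buffer on a non-blocking pipe is truncated to whatever fits:

```python
import os

# Minimal sketch: a non-blocking write larger than the pipe buffer is
# truncated to whatever fits, and os.write reports the short count.
r, w = os.pipe()
os.set_blocking(w, False)          # request short writes instead of blocking

payload = b"x" * (1 << 20)         # 1 MiB, far more than any default pipe buffer
written = os.write(w, payload)     # the kernel accepts only up to the pipe's capacity

print(written < len(payload))      # True: the write was short
os.close(r)
os.close(w)
```

In blocking mode the same oversized write would instead block until the reader drains enough space, then continue.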
Edit: Tim mentions SIGPIPE, which is another potential issue that can seem to lose data. If the reader closes the pipe before reading everything in it, the unread data is thrown away and the writer receives a SIGPIPE signal on its next write or close, indicating that this has occurred. If the writer blocks or ignores SIGPIPE, the write fails with an EPIPE error instead. This covers the situation Paul mentioned.
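To see the EPIPE case concretely, here is a small Python sketch (the CPython interpreter ignores SIGPIPE by default, so the failed write surfaces as BrokenPipeError rather than killing the process):

```python
import errno
import os

# Sketch of the EPIPE case: close the read end, then write. Because the
# interpreter ignores SIGPIPE, the write fails with EPIPE (BrokenPipeError)
# instead of delivering a fatal signal.
r, w = os.pipe()
os.close(r)                         # the reader goes away

err = None
try:
    os.write(w, b"data")
except BrokenPipeError as e:
    err = e.errno                   # errno.EPIPE

print(err == errno.EPIPE)           # True
os.close(w)
```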
PIPE_BUF is a constant that tells you the limit of atomic writes to the buffer. Any write of this size or smaller will either succeed completely or block until it can succeed completely (or fail with EWOULDBLOCK/EAGAIN if the pipe is in non-blocking mode). It has no relation to the actual size of the kernel's pipe buffer, though obviously the buffer must be at least PIPE_BUF bytes to meet the atomicity guarantee.
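Python exposes this constant as select.PIPE_BUF; a quick sketch of the all-or-nothing behavior on an empty non-blocking pipe (my illustration, not from the original answer):

```python
import os
import select

# select.PIPE_BUF is the largest write guaranteed to be atomic.
# On an empty pipe, a write of PIPE_BUF bytes or less is never split:
# it transfers completely, since the kernel buffer is always >= PIPE_BUF.
r, w = os.pipe()
os.set_blocking(w, False)

n = os.write(w, b"y" * select.PIPE_BUF)
print(n == select.PIPE_BUF)         # True: the whole chunk went in at once

os.close(r)
os.close(w)
```

POSIX requires PIPE_BUF to be at least 512 bytes; on Linux it is 4096.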
Linux - Named pipes - losing data
Named pipes lose their contents when the last process closes them. In your example, this can happen if the writer process does another iteration while the reader process is about to do fis.close(). No error is reported in this case.
A possible fix is to arrange that the reader process never closes the fifo. To get rid of the EOF condition when the last writer disconnects, open the fifo for writing, close the read end, reopen the read end and close the temporary write end.
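On Linux you can get the same effect in one step by opening the FIFO O_RDWR (behavior left unspecified by POSIX, but reliable on Linux). A Python sketch with an illustrative path:

```python
import os
import tempfile

# Sketch of keeping the read side alive: the reader also holds a write
# descriptor, so EOF never occurs when external writers come and go.
# Opening a FIFO O_RDWR is a Linux-supported shortcut for the
# open-for-write / reopen-for-read dance described above.
fifo = os.path.join(tempfile.mkdtemp(), "queue.fifo")  # illustrative path
os.mkfifo(fifo)

fd = os.open(fifo, os.O_RDWR)       # does not block; we are reader and writer

os.write(fd, b"hello\n")            # stand-in for an external client writing
data = os.read(fd, 6)               # data waits in the buffer, no EOF
print(data)                         # b'hello\n'

os.close(fd)
os.unlink(fifo)
```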
Using named pipes with bash - Problem with data loss
Your problem is the if statement below:
while true
do
if read txt <"$pipe"
....
done
What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.
To fix this, change your loop in the server to open the pipe once for the entire loop:
while true
do
if read txt
....
done < "$pipe"
Done this way, the pipe is opened once and kept open.
You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
Edit: With the problem now being that you are getting EOF on your reads when the last client closes the pipe, you can use jilles' method of duping the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:
while true
do
if read txt
....
done < "$pipe" 3> "$pipe"
This will hold the write side of the pipe open on fd 3. The same caveat applies to this file descriptor as to stdin: you will need to close it so any child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
Do Named Pipes Clear Read Data
Data from pipes, named or otherwise, is consumed when read. If you want to write persistent data, use a regular file.
Note that the pipe will grow if data isn't read, up to a size limit defined by system configuration.
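On Linux you can ask the kernel for that limit with the F_GETPIPE_SZ fcntl (Linux-specific; this sketch is mine, not from the original answer):

```python
import fcntl
import os

# Query the kernel's capacity for a pipe. F_GETPIPE_SZ is Linux-specific;
# Python gained the named constant in 3.10, so fall back to its raw value (1032).
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)

r, w = os.pipe()
capacity = fcntl.fcntl(w, F_GETPIPE_SZ)
print(capacity)                     # commonly 65536 (64 KiB) on modern Linux

os.close(r)
os.close(w)
```

The same call works on a named pipe's descriptor, and F_SETPIPE_SZ can grow the buffer up to the limit in /proc/sys/fs/pipe-max-size.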
Is pipe or BufferedReader in java likely to lose data?
No. Not even slightly. The Reader is exactly as reliable as the underlying pipe or socket. If it's TCP, it can't lose data without resetting the connection.
How does data get processed across pipes?
There is buffering in each pipe, and maybe in the stdio layers of each program. Data will not make it to the disk until the final grep has processed enough lines to cause its buffers to fill to the point of being spilled to disk.
If you run your pipeline on the command line and then hit Ctrl-C, SIGINT will be sent to every process, terminating each and losing any pending output.
Either:

1. Ignore SIGINT in all processes but the first. Bash hackery follows:

$ wget --spider --force-html -r -l2 http://example.com 2>&1 | grep '^--' |
    { trap '' int; awk '{ print $3 }'; } |
    ⋮

2. Simply deliver the keyboard interrupt to the first process. Interactively you can discover the pid with jobs -l and then kill that. (Run the pipeline in the background.)

$ jobs -l
[1]+ 10864 Running    wget
     3364 Running    | grep
    13500 Running    | awk
⋮
$ kill -INT 10864

3. Play around with the disown bash builtin.
Stop accepting input on pipe but read buffered data
After struggling with this for a few days, I ended up having to drop IO.read and use IO.sysread instead, doing my own buffering. The solution really isn't that complex; the implementation is below.
Signal.trap('INT') do
  $stdin.close
end

def myread(bufio, bytes) # `bufio` is a StringIO object, `bytes` is bytes to read
  begin
    while bufio.size < bytes do
      bufio.write($stdin.sysread(bytes - bufio.size))
    end
  rescue SignalException, Interrupt, Errno::EINTR => e
    retry
  rescue SystemCallError, IOError, EOFError => e
    # nothing, we're done
  end
end
My exact code is a little different from that, as I'm using the AWS Ruby SDK, so the myread method is actually just a block passed to AWS::S3::S3Object.write.