How to Stop Sed from Buffering

An alternative way to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to make its output command-buffered (flushed after every print), perhaps like

BEGIN { $| = 1 }

Another reason to do this is that it gives you the more convenient ERE notation instead of the backslash-heavy legacy BREs. You also get the full complement of Unicode properties, which is often critical.

But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:

perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'

perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'

perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'

Now you also have access to the minimal quantifiers *?, +?, ??, {N,}?, and {N,M}?. These allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.

Turn off buffering in pipe

You can use the unbuffer command (which comes as part of the expect package), e.g.

unbuffer long_running_command | print_progress

unbuffer connects to long_running_command via a pseudoterminal (pty), which makes the system treat it as an interactive process and thus skip the 4 KiB pipe buffering that is the likely cause of the delay.

For longer pipelines, you may have to unbuffer each command (except the final one), e.g.

unbuffer x | unbuffer -p y | z

Make sed not buffer by lines

There is a tool that matches an input stream against multiple regular expressions in parallel and acts as soon as it decides on a match. It's not sed. It's lex. Or the GNU version, flex.

To make this demonstration work, I had to define a YY_INPUT macro, because flex was line-buffering input by default. Even with no buffering at the stdio level, and even in "interactive" mode, there is an assumption that you don't want to process less than a line at a time.

So this is probably not portable to other versions of lex.

%{
#include <stdio.h>

#define YY_INPUT(buf,result,max_size) \
    { \
        int c = getchar(); \
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    }
%}

%%

abc     fputs("zzz", stdout); fflush(stdout);
.       fputs(yytext, stdout); fflush(stdout);

%%

int main(void)
{
    setbuf(stdin, 0);
    yylex();
    return 0;
}

Usage: put that program into a file called abczzz.l and run

flex --always-interactive -o abczzz.c abczzz.l
cc abczzz.c -ll -o abczzz
for ch in a b c 1 2 3 ; do echo -n $ch ; sleep 1 ; done | ./abczzz ; echo

Force line-buffering of stdout in a pipeline

Try unbuffer (man page) which is part of the expect package. You may already have it on your system.

In your case you would use it like this:

unbuffer ./a | tee output.txt

The -p option is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments.

sed and tee when working together in pipes no longer behave like streams

stdout is usually line buffered only if connected to a terminal. Connecting it to a pipe causes full buffering.

Some commands, however, including tee, are guaranteed not to buffer at all (which is why a tee | sed pipeline can still stream).

You can use stdbuf -oL sed 's...g' or unbuffer sed 's...g' to get line buffering. stdbuf is part of GNU coreutils, so it should be available on Linux; it's also available by default on FreeBSD.
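A minimal sketch of stdbuf in a pipeline (with finite input the timing difference isn't visible, but the invocation is the same as for a long-running stream):

```shell
# stdbuf -oL makes sed line-buffer its output even though stdout is a pipe,
# so each transformed line reaches the next stage as soon as it is complete.
printf 'one\ntwo\n' | stdbuf -oL sed 's/o/0/g' | cat
# prints:
# 0ne
# tw0
```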

This article from 2006 is old, but provides some good detail. The author is a GNU coreutils maintainer.

http://www.pixelbeat.org/programming/stdio_buffering/

How to make output of any shell command unbuffered?

AFAIK, you can't do it without ugly hacks. Writing to a pipe (or reading from one) automatically turns on full buffering, and there is nothing you can do about it from the outside. "Line buffering" (which is what you want) is only used when reading from or writing to a terminal. The ugly hacks do exactly this: they connect a program to a pseudo-terminal, so that the other tools in the pipe read from and write to that terminal in line-buffered mode. The whole problem is described here:

  • http://www.pixelbeat.org/programming/stdio_buffering/

The page also has some suggestions (the aforementioned "ugly hacks") for what to do, e.g. using unbuffer or pulling some tricks with LD_PRELOAD.

Block cut with sed and suppress the last line

The script below:

sed -n '/LBL 75677/{p;:loop;n;/LBL/!{p;b loop}}' file

may be what you're looking for.

:loop here is a label, and b loop unconditionally jumps back to that label.

Here we create a small loop and keep printing lines until the next LBL is reached.
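A quick demonstration with made-up input, reusing the LBL 75677 marker from the question (GNU sed; other seds may require each command on its own line):

```shell
# Prints from the matching LBL line up to (but not including) the next LBL.
printf 'LBL 1\nx\nLBL 75677\na\nb\nLBL 9\nc\n' |
    sed -n '/LBL 75677/{p;:loop;n;/LBL/!{p;b loop}}'
# prints:
# LBL 75677
# a
# b
```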

How to prevent pipes from delaying output?

You can concatenate several lines of perl code with repeated -e options (be sure to end each with ; -- they are strung together to form a program). And you can make your pipes piping "hot" with $|=1. See the perl manual on $| for details (about two-thirds of the way down the page; search for OUTPUT_AUTOFLUSH).

{ for i in `seq 3` ; do echo $i ; sleep 1 ; done ; } \
| perl -p -e 'BEGIN{$|=1};' \
-e 's,(.*ERROR.*),\e[01;31m\1\e[00m,g;' \
-e 's,(.*WARNING.*),\e[01;33m\1\e[00m,g;' \
-e 's,(.*TCPEchoTest.*),\e[01;30m\1\e[00m,g;' \
-e 's,(.*enters.*),\e[00;33m\1\e[00m,g;'

This prints 1, 2, 3 with one second between each number. In fact, the BEGIN line is not needed when perl's output goes to the terminal, but you want it if you keep piping to another program.

How to avoid the last newline in sed?

$ awk '/^STOP/{exit} {printf "%s%s", ors, $0; ors=RS}' file
keep$

The above prints every line without a trailing newline, but preceded by a newline (\n or \r\n, whichever your environment dictates, so it behaves correctly on UNIX, Windows, or whatever) for every 2nd and subsequent line. When it finds a STOP line, it just exits before printing anything.

Note that the above doesn't keep anything in memory except the current line, so it'll work no matter how large your input file is and no matter where STOP appears in it. It'll even work if STOP is the first line of the file, unlike the other answers you have so far.

It will also work using any awk in any shell on every UNIX box.
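For instance, with a made-up four-line input where STOP appears third:

```shell
printf 'keep1\nkeep2\nSTOP\nafter\n' |
    awk '/^STOP/{exit} {printf "%s%s", ors, $0; ors=RS}'
# prints keep1, a newline, then keep2 -- with no newline after keep2
```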

The Concept of 'Hold space' and 'Pattern space' in sed

When sed reads a file line by line, the line that has currently been read is inserted into the pattern buffer (pattern space). The pattern buffer is a temporary buffer, the scratchpad where the current line is stored. When you tell sed to print, it prints the pattern buffer.

The hold buffer / hold space is long-term storage: you can catch something, store it, and reuse it later when sed is processing another line. You do not process the hold space directly; instead, you need to copy it to, or append it to, the pattern space if you want to do something with it. For example, the print command p prints only the pattern space. Likewise, s operates on the pattern space.

Here is an example:

sed -n '1!G;h;$p'

(the -n option suppresses automatic printing of lines)

There are three commands here: 1!G, h and $p. 1!G has an address, 1 (first line), but the ! means that the command will be executed everywhere but on the first line. $p on the other hand will only be executed on the last line. So what happens is this:

  1. first line is read and inserted automatically into the pattern space
  2. on the first line, first command is not executed; h copies the first line into the hold space.
  3. now the second line replaces whatever was in the pattern space
  4. on the second line, first we execute G, appending the contents of the hold buffer to the pattern buffer, separating it by a newline. The pattern space now contains the second line, a newline, and the first line.
  5. Then, the h command copies the concatenated contents of the pattern buffer into the hold space, which now holds lines two and one in reverse order.
  6. We proceed to line number three -- go to the point (3) above.

Finally, after the last line has been read and the hold space (containing all the previous lines in reverse order) has been appended to the pattern space, the pattern space is printed with p. As you have guessed, the above does exactly what the tac command does -- it prints the file in reverse.
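You can check that it behaves like tac with a three-line input:

```shell
printf '1\n2\n3\n' | sed -n '1!G;h;$p'
# prints:
# 3
# 2
# 1
```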


