Force Another Program's Standard Output to Be Unbuffered Using Python

Unbuffered stdout in python (as in python -u) from within the program

The best I could come up with:

>>> import os
>>> import sys
>>> unbuffered = os.fdopen(sys.stdout.fileno(), 'w', 0)
>>> unbuffered.write('test')
test>>> 
>>> sys.stdout = unbuffered
>>> print 'test'
test

Tested on GNU/Linux. It seems it should work on Windows too. If I knew how to reopen sys.stdout, it would be much easier:

sys.stdout = open('???', 'w', 0)

References:

http://docs.python.org/library/stdtypes.html#file-objects

http://docs.python.org/library/functions.html#open

http://docs.python.org/library/os.html#file-object-creation

[Edit]

Note that it would probably be better to close sys.stdout before overwriting it.

Disable output buffering

From Magnus Lycka's answer on a mailing list:

You can skip buffering for a whole python process using python -u (or #!/usr/bin/env python -u etc.) or by setting the environment variable PYTHONUNBUFFERED.

You could also replace sys.stdout with some other stream-like wrapper which does a flush after every call.

class Unbuffered(object):
   def __init__(self, stream):
       self.stream = stream
   def write(self, data):
       self.stream.write(data)
       self.stream.flush()
   def writelines(self, datas):
       self.stream.writelines(datas)
       self.stream.flush()
   def __getattr__(self, attr):
       return getattr(self.stream, attr)

import sys
sys.stdout = Unbuffered(sys.stdout)
print 'Hello'
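
When the other program is itself a Python script that you launch from Python, the two options quoted above (the -u switch and PYTHONUNBUFFERED) can be applied to the child process. The following is a minimal sketch, assuming Python 3.7 or later; run_unbuffered is a hypothetical helper name, not part of any library, and either of the two settings would be enough on its own.

```python
import os
import subprocess
import sys


def run_unbuffered(script_args):
    """Start a child Python interpreter with unbuffered stdout and yield its lines.

    run_unbuffered is a hypothetical helper; script_args are the arguments
    passed to the child interpreter (for example a script path).
    """
    # Either "-u" or PYTHONUNBUFFERED alone should suffice; both are shown here.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with subprocess.Popen(
        [sys.executable, "-u"] + list(script_args),
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    ) as proc:
        # Lines arrive as the child writes them instead of when it exits.
        for line in proc.stdout:
            yield line.rstrip("\n")
```

For example, iterating over run_unbuffered(["myscript.py"]) (a placeholder script name) would hand each line to the caller as soon as the child prints it.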

Pyinstaller: setting unbuffered stdio on python 2.7 on Windows

This is more of a workaround than a fix, but I was able to get around this by passing flush=True when calling print. In other words, I replaced all print calls in my application with a new function called app_print(), as follows:

#Do something
app_print("printing a generic string")

Then I defined app_print to always flush stdout:

def app_print(string):
    # Flush after every call so the output appears immediately.
    print(string, flush=True)

This fixed my problem and the resulting executable now has an instantaneous feel similar to how the original python script did.
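
A shorter way to get the same kind of wrapper is sketched below. It assumes Python 3.3 or later, where print accepts the flush keyword, and reuses the app_print name from this answer. Unlike the version above, it also forwards print's other arguments such as sep, end and file.

```python
import functools

# A print variant that always flushes; every other argument of print
# (several values, sep, end, file) is passed through unchanged.
app_print = functools.partial(print, flush=True)

app_print("printing a generic string")
```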

Force unbuffered output for script made with buildout and zc.recipe.egg:scripts

You can force unbuffered I/O from within your Python script by opening a new file object on the file descriptor of stdin or stdout:

import io, os, sys
try:
    # Python 3, open as binary, then wrap in a TextIOWrapper
    unbuffered = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0), write_through=True)
except TypeError:
    # Python 2
    unbuffered = os.fdopen(sys.stdout.fileno(), 'w', 0)

You can then reassign sys.stdout if you want to use other modules or built-ins that use stdout or stdin:

sys.stdout = unbuffered

Also see unbuffered stdout in python (as in python -u) from within the program
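
If line buffering is enough and the script only has to run on Python 3.7 or later, text streams offer a reconfigure() method that changes the buffering of the existing stream in place. This is a minimal sketch of that alternative, not the approach used in the answer above; make_line_buffered is a hypothetical helper name.

```python
import sys


def make_line_buffered(stream):
    """Switch a text stream to line buffering in place.

    Returns False when the stream has no reconfigure() method
    (older Python versions, or a replaced sys.stdout object).
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    # After this call, every write containing a newline is flushed.
    reconfigure(line_buffering=True)
    return True


make_line_buffered(sys.stdout)
```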

How can I flush the output of the print function?

In Python 3, print can take an optional flush argument:

print("Hello, World!", flush=True)

In Python 2, after calling print, do:

import sys
sys.stdout.flush()

By default, print prints to sys.stdout (see the documentation for more about file objects).

Why is python process with unbuffered output scrambled using xargs --max-procs?

You might want to consider using GNU Parallel. By default, the output is buffered until the instance has completed running:

When running jobs that output data, you often do not want the output
of multiple jobs to run together. GNU parallel defaults to grouping
the output of each job, so the output is printed when the job
finishes. If you want the output to be printed while the job is
running you can use -u.

I believe the best way to run your script is via either of the following:

find /path/to/logfiles/*.gz | parallel python logparser.py

parallel python logparser.py ::: /path/to/logfiles/*.gz

You can specify the number of processes to run using the -j flag, e.g., -j4.

The nice thing about Parallel is that it supports Cartesian products of input arguments. For example, if you had some additional arguments that you wanted to iterate through for each file, you can use:

parallel python logparser.py ::: /path/to/logfiles/*.gz ::: 1 2 3

This will result in running the following across multiple processes:

python logparser.py /path/to/logfiles/A.gz 1
python logparser.py /path/to/logfiles/A.gz 2
python logparser.py /path/to/logfiles/A.gz 3
python logparser.py /path/to/logfiles/B.gz 1
python logparser.py /path/to/logfiles/B.gz 2
python logparser.py /path/to/logfiles/B.gz 3
...
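
To make the expansion above concrete, here is a small Python sketch that builds the same list of command lines with itertools.product. It only illustrates the Cartesian product and does not run anything or reproduce GNU Parallel itself; expand_jobs is a hypothetical helper name.

```python
import itertools


def expand_jobs(command, *argument_lists):
    """Return one argv list per combination of the given argument lists."""
    return [
        list(command) + list(combo)
        for combo in itertools.product(*argument_lists)
    ]


jobs = expand_jobs(
    ["python", "logparser.py"],
    ["/path/to/logfiles/A.gz", "/path/to/logfiles/B.gz"],
    ["1", "2", "3"],
)
for argv in jobs:
    print(" ".join(argv))
```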

Good luck!

Force line-buffering of stdout in a pipeline

Try unbuffer (man page) which is part of the expect package. You may already have it on your system.

In your case you would use it like this:

unbuffer ./a | tee output.txt

The -p option is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments.

Popen does not give output immediately when available

Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is to a terminal. When the output is to a pipe, it is fully (block) buffered, not line-buffered, and so the data is not transferred to the kernel and thence to the other end of the pipe until the stdio buffer fills or is explicitly flushed.

Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.

However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because now the whole line is flushed when the newly launched /bin/echo process terminates each time. This is hardly an efficient use of system resources, but may be an efficient use of your own.
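
The pseudo-tty approach mentioned above can be sketched in Python with the standard pty module. This is a minimal illustration assuming a POSIX system (the end-of-output handling follows Linux behaviour); run_on_pty is a hypothetical helper name, and a real program would usually read and act on the output incrementally instead of collecting it all.

```python
import os
import pty
import subprocess
import sys


def run_on_pty(argv):
    """Run argv with stdout on a pseudo-terminal and return the raw output bytes."""
    master_fd, slave_fd = pty.openpty()
    try:
        # The child sees a terminal on stdout, so stdio chooses line buffering.
        proc = subprocess.Popen(argv, stdout=slave_fd, stderr=subprocess.STDOUT)
    finally:
        # The parent only needs the master side.
        os.close(slave_fd)

    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                # Linux reports EIO once the child has closed the slave side.
                break
            if not data:
                # Other systems may signal the end of output with an empty read.
                break
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks)
```

Because the output passes through a terminal, newlines typically come back as "\r\n", so the caller may want to strip the carriage returns.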