Unbuffered stdout in python (as in python -u) from within the program
The best I could come up with:
>>> import os
>>> import sys
>>> unbuffered = os.fdopen(sys.stdout.fileno(), 'w', 0)
>>> unbuffered.write('test')
test>>>
>>> sys.stdout = unbuffered
>>> print 'test'
test
Tested on GNU/Linux. It seems it should work on Windows too. If I knew how to reopen sys.stdout, it would be much easier:
sys.stdout = open('???', 'w', 0)
References:
http://docs.python.org/library/stdtypes.html#file-objects
http://docs.python.org/library/functions.html#open
http://docs.python.org/library/os.html#file-object-creation
[Edit]
Note that it would probably be better to close sys.stdout before overwriting it.
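On Python 3.7 and later, reopening may not be needed at all: text streams such as sys.stdout have a reconfigure() method that can switch on line buffering in place. A minimal sketch, assuming a Python 3.7+ interpreter; the helper name make_line_buffered is made up for illustration:

```python
import sys


def make_line_buffered(stream):
    """Turn on line buffering for a text stream in place.

    Returns True on success, or False if the stream has no reconfigure()
    method (for example, when sys.stdout was replaced by another object).
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(line_buffering=True)
    return True


if __name__ == "__main__":
    make_line_buffered(sys.stdout)
    print("test")  # flushed at the newline, even when stdout is a pipe
```

Line buffering flushes at every newline, which is usually enough for interactive output, but it does not push out partial lines the way a fully unbuffered stream does.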
Disable output buffering
From Magnus Lycka's answer on a mailing list:
You can skip buffering for a whole python process using python -u (or #!/usr/bin/env python -u etc.) or by setting the environment variable PYTHONUNBUFFERED.
You could also replace sys.stdout with some other stream-like wrapper which does a flush after every call.
class Unbuffered(object):
    def __init__(self, stream):
        self.stream = stream
    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
    def writelines(self, datas):
        self.stream.writelines(datas)
        self.stream.flush()
    def __getattr__(self, attr):
        return getattr(self.stream, attr)
import sys
sys.stdout = Unbuffered(sys.stdout)
print 'Hello'
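The wrapper above is written for Python 2. A sketch of the same idea in Python 3 syntax follows; the only intended differences are the print() call and returning the character count from write(), as file objects normally do:

```python
import sys


class Unbuffered:
    """Stream wrapper that flushes after every write (Python 3 syntax)."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, data):
        written = self.stream.write(data)
        self.stream.flush()
        return written

    def writelines(self, lines):
        self.stream.writelines(lines)
        self.stream.flush()

    def __getattr__(self, attr):
        # Delegate everything else (fileno, encoding, ...) to the real stream.
        return getattr(self.stream, attr)


if __name__ == "__main__":
    sys.stdout = Unbuffered(sys.stdout)
    print("Hello")
```

Because print() only needs a write() method on its target, the wrapper can also be passed explicitly with print(..., file=wrapper) instead of replacing sys.stdout globally.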
Pyinstaller: setting unbuffered stdio on python 2.7 on Windows
This is more of a workaround than a fix, but I got around this by passing flush=True when calling the print function. In other words, I replaced all print calls in my application with a new function called app_print(), as follows:
# Do something
app_print("printing a generic string")
Then I defined app_print to always flush stdout:
def app_print(string):
    print(string, flush=True)
This fixed my problem, and the resulting executable now feels as responsive as the original Python script did.
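The flush keyword of print() exists only in Python 3.3 and later, so the helper above assumes a Python 3 interpreter. Under that assumption, a shorter way to build the same helper is functools.partial, which also keeps the other print() keywords (sep, end, file) available. A small sketch:

```python
import functools

# Same call signature as print(), but flush=True is always passed.
app_print = functools.partial(print, flush=True)

app_print("printing a generic string")
```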
More links about this:
- How can I flush the output of the print function
- How to flush output of print function
- Python print function
Force unbuffered output for script made with buildout and zc.recipe.egg:scripts
You can force unbuffered I/O from within your Python script by reopening stdin or stdout, that is, by opening a new file object on the same file descriptor:
import io, os, sys
try:
    # Python 3, open as binary, then wrap in a TextIOWrapper
    unbuffered = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0), write_through=True)
except TypeError:
    # Python 2
    unbuffered = os.fdopen(sys.stdout.fileno(), 'w', 0)
You can then reassign sys.stdout if you want to use other modules or built-ins that use stdout or stdin:
sys.stdout = unbuffered
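To check that a stream reopened this way really bypasses buffering, one option is to point the same recipe at a pipe and read from the other end without ever calling flush(). A small sketch for Python 3 only; the helper name open_unbuffered and the use of closefd=False (so the original descriptor stays open) are choices made for this example, not part of the answer above:

```python
import io
import os


def open_unbuffered(fd):
    """Return a text stream on fd that passes every write straight through."""
    raw = open(fd, "wb", buffering=0, closefd=False)
    return io.TextIOWrapper(raw, write_through=True)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    stream = open_unbuffered(write_fd)
    stream.write("test")          # no flush() call
    print(os.read(read_fd, 16))   # the bytes are already in the pipe
    os.close(read_fd)
    os.close(write_fd)
```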
Also see unbuffered stdout in python (as in python -u) from within the program
How can I flush the output of the print function?
In Python 3, print can take an optional flush argument:
print("Hello, World!", flush=True)
In Python 2, after calling print, do:
import sys
sys.stdout.flush()
By default, print prints to sys.stdout (see the documentation for more about file objects).
Why is python process with unbuffered output scrambled using xargs --max-procs?
You might want to consider using GNU Parallel. By default, the output is buffered until the instance has completed running:
When running jobs that output data, you often do not want the output
of multiple jobs to run together. GNU parallel defaults to grouping
the output of each job, so the output is printed when the job
finishes. If you want the output to be printed while the job is
running you can use -u.
I believe the best way to run your script is via:
find /path/to/logfiles/*.gz | parallel python logparser.py
or
parallel python logparser.py ::: /path/to/logfiles/*.gz
You can specify the number of processes to run using the -j flag, e.g., -j4.
The nice thing about Parallel is that it supports Cartesian products of input arguments. For example, if you had some additional arguments that you wanted to iterate through for each file, you can use:
parallel python logparser.py ::: /path/to/logfiles/*.gz ::: 1 2 3
This will result in running the following across multiple processes:
python logparser.py /path/to/logfiles/A.gz 1
python logparser.py /path/to/logfiles/A.gz 2
python logparser.py /path/to/logfiles/A.gz 3
python logparser.py /path/to/logfiles/B.gz 1
python logparser.py /path/to/logfiles/B.gz 2
python logparser.py /path/to/logfiles/B.gz 3
...
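The combinations that ::: generates correspond to a Cartesian product of the argument lists. As a rough illustration only (this mimics the ordering shown above and is not how GNU Parallel is implemented), the same command lines can be enumerated in Python with itertools.product; expand_jobs is a hypothetical helper name:

```python
import itertools


def expand_jobs(command, *arg_lists):
    """Build one argv list per combination of the given argument lists."""
    return [list(command) + list(combo)
            for combo in itertools.product(*arg_lists)]


jobs = expand_jobs(
    ["python", "logparser.py"],
    ["/path/to/logfiles/A.gz", "/path/to/logfiles/B.gz"],
    ["1", "2", "3"],
)
for argv in jobs:
    print(" ".join(argv))
```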
Good luck!
Force line-buffering of stdout in a pipeline
Try unbuffer (man page), which is part of the expect package. You may already have it on your system.
In your case you would use it like this:
unbuffer ./a | tee output.txt
The -p option is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments.
Popen does not give output immediately when available
Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is to a terminal. When the output is to a pipe, it is fully buffered, not line-buffered, and so the data is not transferred to the kernel and thence to the other end of the pipe until the stdio buffer fills.
Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.
However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because now the whole line is flushed when the newly launched /bin/echo process terminates each time. This is hardly an efficient use of system resources, but may be an efficient use of your own.
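From Python, the pseudo-tty route mentioned above is less painful than from the shell, because the standard library ships a pty module. A minimal sketch for Unix-like systems only; iter_pty_output is a hypothetical helper name, and the child command in the demo is just a stand-in for your co-process:

```python
import os
import pty
import subprocess
import sys


def iter_pty_output(argv):
    """Run argv with stdout and stderr attached to a pseudo-terminal and
    yield its output in chunks as soon as they arrive."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child keeps its own copy
    try:
        while True:
            try:
                chunk = os.read(master_fd, 1024)
            except OSError:
                break  # Linux reports EIO once the child side is closed
            if not chunk:
                break  # other platforms signal end of output with b""
            yield chunk
    finally:
        os.close(master_fd)
        proc.wait()


if __name__ == "__main__":
    child = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]
    for chunk in iter_pty_output(child):
        sys.stdout.write(chunk.decode(errors="replace"))
```

Because the child sees a terminal on its stdout, stdio falls back to line buffering and each line becomes readable as soon as it is written. Note that the terminal layer typically translates newlines to \r\n in what the parent reads.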