Understanding Popen.Communicate

.communicate() writes input (there is no input in this case, so it just closes the subprocess's stdin to tell the subprocess that there is no more input), reads all output, and waits for the subprocess to exit.

The EOFError exception is raised in the child process by raw_input(): it expected data but got EOF (no data).
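A minimal Python 3 sketch (3.7+ for text=True) that reproduces the EOFError. The inline child program is hypothetical and stands in for the asker's script; Python 3's input() plays the role of raw_input(). Because no input is passed, communicate() just closes the child's stdin, so input() hits EOF.

```python
import subprocess
import sys

# Hypothetical child: waits for one line of input, like the raw_input() call.
CHILD = "name = input('Your name? ')\nprint('Hello, ' + name)"

p = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
# No input is given, so communicate() only closes the child's stdin.
out, err = p.communicate()
print("exit code:", p.returncode)     # non-zero: the child died with EOFError
print(err.strip().splitlines()[-1])   # the last traceback line names EOFError
```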

p.stdout.read() hangs forever because it tries to read all output from the child while the child is waiting for input in raw_input(). Each process waits for the other, which is a deadlock.

To avoid the deadlock you need to read/write asynchronously (e.g., by using threads or select) or to know exactly when and how much to read/write, for example:

from subprocess import PIPE, Popen

p = Popen(["python", "-u", "1st.py"], stdin=PIPE, stdout=PIPE, bufsize=1)
print p.stdout.readline(), # read the first line
for i in range(10): # repeat several times to show that it works
    print >>p.stdin, i # write input
    p.stdin.flush() # not necessary in this case
    print p.stdout.readline(), # read output

print p.communicate("n\n")[0], # signal the child to exit,
                               # read the rest of the output, 
                               # wait for the child to exit

Note: this code is very fragile; if reads and writes get out of sync, it deadlocks.

Beware of block-buffering issues (here they are avoided with the "-u" flag, which turns off buffering of stdin and stdout in the child).

bufsize=1 makes the pipes line-buffered on the parent side.
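The same lockstep exchange in Python 3 might look like the sketch below. The inline child program is a hypothetical stand-in for 1st.py: it prints one greeting line, then answers each input line with exactly one line (the number doubled) until it reads "n". text=True gives str pipes, bufsize=1 (only meaningful in text mode) makes them line-buffered on the parent side, and -u keeps the child's stdout unbuffered.

```python
import subprocess
import sys

# Hypothetical stand-in for 1st.py: one greeting line, then exactly one
# reply line per input line, until it reads "n".
CHILD = """
print("ready")
while True:
    line = input()
    if line == "n":
        print("bye")
        break
    print(int(line) * 2)
"""

p = subprocess.Popen(
    [sys.executable, "-u", "-c", CHILD],  # -u: child's stdout is unbuffered
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered pipes on the parent side (text mode only)
)
first = p.stdout.readline()  # read the first line
print(first, end="")
replies = []
for i in range(10):  # repeat several times to show that it works
    p.stdin.write(f"{i}\n")  # write exactly one request line
    p.stdin.flush()          # bufsize=1 already flushes on newline; harmless
    reply = p.stdout.readline()  # read exactly one reply line
    replies.append(reply.strip())
    print(reply, end="")
rest, _ = p.communicate("n\n")  # signal the child to exit, read the rest, wait
print(rest, end="")
```

Every readline() is matched by exactly one line from the child; asking for one line too many would block forever.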

How to use the subprocess Popen.communicate() method?

You're not providing any stdout to the Popen constructor, so by default the child simply writes its output to the parent's stdout handle. Hence you're seeing it printed in your shell.

Quoting from Popen's docs:

With the default settings of None, no redirection will occur; the child’s file handles will be inherited from the parent.

To populate stdout in the resulting tuple, pass stdout=subprocess.PIPE.

Quoting from docs:

To get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.

>>> import subprocess
>>> p = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
>>> p.communicate()
('hello\n', None)
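A small Python 3 sketch of both cases. A Python one-liner stands in for echo so it runs on any platform; note that in Python 3 the captured output is bytes, not str.

```python
import subprocess
import sys

# Both pipes requested: both elements of the tuple are filled in.
p = subprocess.Popen(
    [sys.executable, "-c", "import sys; print('hello'); print('oops', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = p.communicate()

# Only stdout piped: the stderr slot stays None (stderr goes to the parent's stderr).
q = subprocess.Popen([sys.executable, "-c", "print('hi')"], stdout=subprocess.PIPE)
out_only, err_only = q.communicate()
```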

Read streaming input from subprocess.communicate()

Please note, I think J.F. Sebastian's method (below) is better.

Here is a simple example (with no error checking):

import subprocess
proc = subprocess.Popen('ls',
                       shell=True,
                       stdout=subprocess.PIPE,
                       )
while proc.poll() is None:
    output = proc.stdout.readline()
    print output,

If ls ends too fast, then the while loop may end before you've read all the data.

You can catch the remainder in stdout this way:

output = proc.communicate()[0]
print output,
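A Python 3 version of the same pattern, assuming a short-lived child (a Python one-liner standing in for ls). text=True makes readline() return str, and the final communicate() picks up anything still in the pipe after the child has exited.

```python
import subprocess
import sys

# Stand-in for ls: a short-lived command that prints a few lines and exits.
proc = subprocess.Popen(
    [sys.executable, "-c", "for i in range(5): print('line', i)"],
    stdout=subprocess.PIPE,
    text=True,
)
lines = []
while proc.poll() is None:  # the child is still running
    line = proc.stdout.readline()
    if line:
        lines.append(line)
        print(line, end="")
# The child may exit while output is still unread; collect the remainder.
rest = proc.communicate()[0]
lines.extend(rest.splitlines(keepends=True))
print(rest, end="")
```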

Python Popen - wait vs communicate vs CalledProcessError

about the deadlock: it is safe to use stdout=PIPE and wait() together only if you read everything from the pipe before waiting. .communicate() does the reading and calls wait() for you.
about the memory: if the output can be unlimited, then you should not use .communicate(), which accumulates all output in memory.

what is the proper thing to use here?

To start a subprocess, read its output line by line, and wait for it to exit:

#!/usr/bin/env python
from subprocess import Popen, PIPE

process = Popen(command, stdout=PIPE, bufsize=1)
with process.stdout:
    for line in iter(process.stdout.readline, b''): 
        handle(line)
returncode = process.wait()

This code does not deadlock even though the OS pipe buffer is finite, because it keeps draining the pipe while the child runs. The code also supports commands with unlimited output (as long as each individual line fits in memory).

iter() is used to read each line as soon as the subprocess flushes its stdout buffer, working around the read-ahead bug in Python 2. If you don't need lines as soon as they are written (rather than when the read-ahead buffer fills or the child process ends), a simple for line in process.stdout is enough. See Python: read streaming input from subprocess.communicate().
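A runnable Python 3 sketch of the same loop. command and handle() are hypothetical placeholders filled in for illustration. bufsize=1 is left out because Python 3 only supports line buffering in text mode (it warns otherwise), and readline() returns bytes here, so EOF is b''. In Python 3 a plain for line in process.stdout does not have the read-ahead delay, but the iter() form still works.

```python
import subprocess
import sys

received = []

def handle(line):
    # Hypothetical handler: collect each line and echo it to the console.
    received.append(line)
    sys.stdout.write(line.decode())

# Hypothetical command: any program that writes lines to stdout.
command = [sys.executable, "-u", "-c", "for i in range(3): print('tick', i)"]

process = subprocess.Popen(command, stdout=subprocess.PIPE)
with process.stdout:
    for line in iter(process.stdout.readline, b""):  # b"" signals EOF
        handle(line)
returncode = process.wait()
```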

If you know that the command output can fit in memory in all cases then you could get the output all at once:

#!/usr/bin/env python
from subprocess import check_output

all_output = check_output(command)

It raises CalledProcessError if the command returns a non-zero exit status. Internally, check_output() uses Popen() and .communicate().
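A Python 3 sketch (3.7+ for text=True) of both outcomes; the inline commands are hypothetical stand-ins. The exception still carries the exit status and whatever output was captured before the child exited.

```python
import subprocess
import sys

# Success: the whole stdout comes back as one string.
all_output = subprocess.check_output(
    [sys.executable, "-c", "print('all good')"], text=True
)

# Failure: a non-zero exit status raises CalledProcessError.
try:
    subprocess.check_output(
        [sys.executable, "-c", "import sys; print('partial'); sys.exit(3)"],
        text=True,
    )
except subprocess.CalledProcessError as exc:
    status = exc.returncode  # the child's exit status
    partial = exc.output     # output captured before the failure
```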

There should be one-- and preferably only one --obvious way to do it

subprocess.Popen() is the main API, and it works in many cases. There are convenience functions/methods such as Popen.communicate(), check_output(), and check_call() for common use cases.

There are multiple methods and functions because there are multiple different use cases.

subprocess.Popen communicate() writes to console, but not to log file

Keep in mind that subprocess spawns a new process, which doesn't really communicate with the parent process (they're pretty much independent entities). Despite its name, the communicate method is just a way of sending data to and receiving data from the child process (for instance, to simulate a user typing something on the terminal).

To know where to write its output, a process uses numbers (file descriptors, also called file numbers). When subprocess spawns a process, the child only knows that its standard output is the file the OS identifies as, say, 7, and that's pretty much it. The child independently asks the operating system something like "Hey! What is file number 7? Give it to me, I have something to write in it." (Understanding what a C fork does is quite helpful here.)

Basically, the spawned subprocess doesn't understand your Logger class. It just knows it has to write its stuff to a file: a file that is uniquely identified within the OS by a number. Unless otherwise specified, that number refers to the parent's standard output (but, as the second solution below shows, you can change it if you want).

So you have several "solutions"...

Clone (tee) stdout to a file, so that when something is written to stdout, it ALSO gets written to your file (this is really not Python-related; the shell and the tee utility do the work):

import os
import tempfile
import subprocess

file_log = os.path.join(tempfile.gettempdir(), 'foo.txt')
p = subprocess.Popen("python ./run_something.py | tee %s" % file_log, shell=True)
p.wait()

Choose whether to write to the terminal OR to the file by passing the right file descriptor (from fileno()) as stdout. For instance, to write only to the file:

import os
import tempfile
import subprocess

file_log = os.path.join(tempfile.gettempdir(), 'foo.txt')
with open(file_log, 'w') as f:
    p = subprocess.Popen("python ./run_something.py", shell=True, stdout=f.fileno())
    p.wait()

What I personally find "safer" (I don't feel comfortable overwriting sys.stdout): just let the command run, store its output in a variable, and pick it up later (in the parent process):

import os
import tempfile
import subprocess

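p = subprocess.Popen("python ./run_something.py", shell=True, stdout=subprocess.PIPE)
# communicate() reads all the output and then waits for the child to exit;
# calling p.wait() before p.stdout.read() can deadlock once the pipe buffer fills
contents = p.communicate()[0]
# Whatever the output of Subprocess was is now stored in 'contents'
# Let's write it to file ('wb', because in Python 3 the output is bytes):
file_log = os.path.join(tempfile.gettempdir(), 'foo.txt')
with open(file_log, 'wb') as f:
    f.write(contents)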

This way, you can also do a print(contents) somewhere in your code to output whatever the subprocess "said" to the terminal.

For example purposes, the script "./run_something.py" is just this:

print("Foo1")
print("Foo2")
print("Foo3")
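If you want the child's output on the console and in the log file at the same time without shell=True and the external tee, the parent can copy each line itself. A minimal Python 3 sketch, with an inline one-liner standing in for ./run_something.py and the same foo.txt path as above:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for ./run_something.py.
CHILD = "print('Foo1'); print('Foo2'); print('Foo3')"
file_log = os.path.join(tempfile.gettempdir(), "foo.txt")

with open(file_log, "w") as log:
    p = subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                         stdout=subprocess.PIPE, text=True)
    with p.stdout:
        for line in p.stdout:       # lines arrive as the child writes them
            sys.stdout.write(line)  # copy to the console...
            log.write(line)         # ...and to the log file
    p.wait()
```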

subprocess popen.communicate() vs. stdin.write() and stdout.read()

Your program without communicate() deadlocks because both processes are waiting on each other to write something before they write anything more themselves.

communicate() does not deadlock in your example because it closes the child's stdin, like a.stdin.close() would. This sends an EOF to your subprocess, letting it know that there is no more input coming, so it can exit, which in turn closes its output, so a.stdout.read() eventually returns an EOF (empty string).

There is no special signal that your main process will receive from your subprocess to let you know that it is done writing the results from one command, but is ready for another command.

This means that to communicate back and forth with one subprocess like you're trying to, you must read the exact number of lines that the subprocess sends. As you saw, if you try to read too many lines, you deadlock. You might be able to use what you know, such as the command you sent and the output you have seen so far, to figure out exactly how many lines to read.
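To illustrate the close-then-read mechanism (a Python 3 sketch; the upper-casing child is hypothetical): writing the input, closing stdin, and then reading stdout is roughly what communicate() does for you. It works here without threads only because the child reads all of its input before writing anything; with larger data flowing in both directions, prefer communicate().

```python
import subprocess
import sys

# Hypothetical child: upper-cases everything it reads until EOF.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"

a = subprocess.Popen([sys.executable, "-c", CHILD],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
a.stdin.write(b"first command\nsecond command\n")
a.stdin.close()           # the child sees EOF, finishes, and closes its stdout
result = a.stdout.read()  # returns once the child's stdout reaches EOF
a.stdout.close()
a.wait()
```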

Why does Popen.communicate() return b'hi\n' instead of 'hi'?

The echo command prints a trailing newline by default.

Compare with this:

import subprocess

print(subprocess.Popen("echo -n hi",
    shell=True, stdout=subprocess.PIPE).communicate()[0])

As for the b preceding the string, it indicates a byte sequence (bytes): in Python 3, communicate() returns bytes unless the pipes are opened in text mode. In Python 2.6+ a b'' literal is equivalent to a normal string.

http://docs.python.org/3/reference/lexical_analysis.html#literals
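In Python 3 you can either decode the bytes yourself or ask for text up front (text=True, Python 3.7+), then strip the newline. A Python one-liner stands in for echo here so the sketch does not depend on the shell's echo:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hi')"]  # portable stand-in for echo hi

raw = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
as_text = raw.decode().rstrip("\r\n")  # bytes -> str, then drop the line ending

# Alternatively, request text mode and strip afterwards.
text_out = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            text=True).communicate()[0].rstrip("\r\n")
```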

Is there a difference between subprocess.call() and subprocess.Popen.communicate()?

As the documentation already suggests, avoid Popen whenever one of the higher-level functions covers your use case.

The subprocess.check_output() function in Python 2.7 lets you retrieve the output from the subprocess, but otherwise works basically like check_call() (which in turn differs from call only in that it will raise an exception if the subprocess fails, which is something you usually want and need).

The case for Popen is that it enables you to build new high-level functions like these if you need to (as subprocess.run() in Python 3.5+ does; it is more versatile and basically subsumes the functionality of all three functions above). They all use Popen under the hood, but Popen is finicky and requires you to take care of several details of managing the subprocess object if you use it directly; the higher-level functions already do those things for you.

A common case where you do need Popen is when you want the subprocess to run in parallel with your main Python script. There is no simple library function that does this for you.
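A Python 3 sketch of both situations (capture_output needs 3.7+); the inline commands are hypothetical stand-ins. run() waits for the child before returning, whereas Popen returns immediately, so the parent can keep working until it calls communicate().

```python
import subprocess
import sys

# run() covers the common case: start, wait, and collect the output.
done = subprocess.run([sys.executable, "-c", "print('finished')"],
                      capture_output=True, text=True, check=True)

# Popen lets the child run in parallel with the parent.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.5); print('slow')"],
    stdout=subprocess.PIPE, text=True,
)
busy_work = sum(range(1_000_000))  # the parent keeps working meanwhile
out, _ = child.communicate()       # then collects the output and waits
```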

Python subprocess Popen.communicate() equivalent to Popen.stdout.read()?

If you look at the source of Popen.communicate() (Python 2), it shows a perfect example of the difference:

def communicate(self, input=None):
    ...
    # Optimization: If we are only using one pipe, or no pipe at
    # all, using select() or threads is unnecessary.
    if [self.stdin, self.stdout, self.stderr].count(None) >= 2:
        stdout = None
        stderr = None
        if self.stdin:
            if input:
                self.stdin.write(input)
            self.stdin.close()
        elif self.stdout:
            stdout = self.stdout.read()
            self.stdout.close()
        elif self.stderr:
            stderr = self.stderr.read()
            self.stderr.close()
        self.wait()
        return (stdout, stderr)

    return self._communicate(input)

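You can see that communicate does make use of the read calls to stdout and stderr, and also calls wait(). It is just a matter of order of operations. In your case, because you are using PIPE for both stdout and stderr, it goes into _communicate() (this is the Windows version; on POSIX systems Python 2.7 uses poll() or select() instead of threads, but the idea is the same):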

def _communicate(self, input):
    stdout = None # Return
    stderr = None # Return

    if self.stdout:
        stdout = []
        stdout_thread = threading.Thread(target=self._readerthread,
                                         args=(self.stdout, stdout))
        stdout_thread.setDaemon(True)
        stdout_thread.start()
    if self.stderr:
        stderr = []
        stderr_thread = threading.Thread(target=self._readerthread,
                                         args=(self.stderr, stderr))
        stderr_thread.setDaemon(True)
        stderr_thread.start()

    if self.stdin:
        if input is not None:
            self.stdin.write(input)
        self.stdin.close()

    if self.stdout:
        stdout_thread.join()
    if self.stderr:
        stderr_thread.join()

    # All data exchanged.  Translate lists into strings.
    if stdout is not None:
        stdout = stdout[0]
    if stderr is not None:
        stderr = stderr[0]

    # Translate newlines, if requested.  We cannot let the file
    # object do the translation: It is based on stdio, which is
    # impossible to combine with select (unless forcing no
    # buffering).
    if self.universal_newlines and hasattr(file, 'newlines'):
        if stdout:
            stdout = self._translate_newlines(stdout)
        if stderr:
            stderr = self._translate_newlines(stderr)

    self.wait()
    return (stdout, stderr)

This uses threads to read from multiple streams at once. Then it calls wait() at the end.
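A stripped-down Python 3 sketch of the same thread-based idea; drain() is a hypothetical helper, not the standard library's _readerthread. The child writes far more to both pipes than a typical OS pipe buffer holds, which is exactly the situation where reading the pipes one after another (or calling wait() first) could deadlock.

```python
import subprocess
import sys
import threading

def drain(pipe, chunks):
    """Read a pipe until EOF and store everything in chunks (runs in a thread)."""
    with pipe:
        chunks.append(pipe.read())

# Hypothetical child that floods both stdout and stderr.
CHILD = """
import sys
for _ in range(20000):
    print("o" * 10)
    print("e" * 10, file=sys.stderr)
"""

p = subprocess.Popen([sys.executable, "-c", CHILD],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out_chunks, err_chunks = [], []
readers = [threading.Thread(target=drain, args=(p.stdout, out_chunks)),
           threading.Thread(target=drain, args=(p.stderr, err_chunks))]
for t in readers:
    t.start()
for t in readers:
    t.join()  # both pipes were drained concurrently
p.wait()
stdout_data, stderr_data = out_chunks[0], err_chunks[0]
```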

So to sum it up:

Your 1st example reads from one stream at a time and does not wait for the process to finish.
Your 2nd example reads from both streams concurrently and waits for the process to finish.
Your 3rd example waits for the process to finish and then reads one stream at a time. As you mentioned, it can deadlock if the child writes more to the pipes than the OS pipe buffer can hold.

Also, you don't need these two import statements in your 2nd and 3rd examples:

from subprocess import communicate
from subprocess import wait

They are both methods of the Popen object.

Dynamic communication between main and subprocess in Python

You want to make a Popen object with subprocess.PIPE for standard input and output and use its file objects to communicate—rather than using one of the cantrips like run (and the older, more specific ones like check_output). The challenge is avoiding deadlock: it’s easy to land in a situation where each process is trying to write, the pipe buffers fill (because no one is reading from them), and everything hangs. You also have to remember to flush in both processes, to avoid having a request or response stuck in a file object’s buffer.

Popen.communicate is provided to avoid these issues, but it supports only a single string (rather than an ongoing conversation). The traditional solution is select, but it also works to use separate threads to send requests and read results. (This is one of the reasons to use CPython threads in spite of the GIL: each exists to run while the other is blocked, so there’s very little contention.) Of course, synchronization is then an issue, and you may need to do some work to make the multithreaded client act like a simple, synchronous function call on the outside.

Note that both processes need to flush, but it’s enough if either implements such non-blocking I/O; one normally does that job in the process that starts the other because that’s where it’s known to be necessary (and such programs are the exception).
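A minimal Python 3 sketch of the thread approach, wrapped in a synchronous ask() function. The echoing child, ask(), and the one-line-per-request protocol are all assumptions for illustration: a reader thread drains the child's stdout into a queue, the main thread writes requests, and both sides flush after every message.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical child: replies to each request line with one response line.
CHILD = """
import sys
for line in sys.stdin:
    print("echo:", line.strip(), flush=True)
"""

p = subprocess.Popen([sys.executable, "-c", CHILD],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     text=True, bufsize=1)
responses = queue.Queue()

def reader():
    # Runs in its own thread so the parent never blocks on a full pipe.
    for line in p.stdout:
        responses.put(line.rstrip("\n"))
    responses.put(None)  # sentinel: the child closed its stdout

threading.Thread(target=reader, daemon=True).start()

def ask(request, timeout=5):
    """Send one request and wait for its one-line reply (synchronous facade)."""
    p.stdin.write(request + "\n")
    p.stdin.flush()  # the parent must flush too
    return responses.get(timeout=timeout)

answers = [ask("ping"), ask("pong")]
p.stdin.close()  # EOF ends the child's loop
p.wait()
```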
