Why Use Python's Os Module Methods Instead of Executing Shell Commands Directly

Why use Python's os module methods instead of executing shell commands directly?

  1. It's faster, os.system and subprocess.call create new processes which is unnecessary for something this simple. In fact, os.system and subprocess.call with the shell argument usually create at least two new processes: the first one being the shell, and the second one being the command that you're running (if it's not a shell built-in like test).

  2. Some commands are useless in a separate process. For example, if you run os.spawn("cd dir/"), it will change the current working directory of the child process, but not of the Python process. You need to use os.chdir for that.

  3. You don't have to worry about special characters interpreted by the shell. os.chmod(path, mode) will work no matter what the filename is, whereas os.spawn("chmod 777 " + path) will fail horribly if the filename is something like ; rm -rf ~. (Note that you can work around this if you use subprocess.call without the shell argument.)

  4. You don't have to worry about filenames that begin with a dash. os.chmod("--quiet", mode) will change the permissions of the file named --quiet, but os.spawn("chmod 777 --quiet") will fail, as --quiet is interpreted as an argument. This is true even for subprocess.call(["chmod", "777", "--quiet"]).

  5. You have fewer cross-platform and cross-shell concerns, as Python's standard library is supposed to deal with that for you. Does your system have chmod command? Is it installed? Does it support the parameters that you expect it to support? The os module will try to be as cross-platform as possible and documents when that it's not possible.

  6. If the command you're running has output that you care about, you need to parse it, which is trickier than it sounds, as you may forget about corner-cases (filenames with spaces, tabs and newlines in them), even when you don't care about portability.

How do I execute a program or call a system command?

Use the subprocess module in the standard library:

import subprocess
subprocess.run(["ls", "-l"])

The advantage of subprocess.run over os.system is that it is more flexible (you can get the stdout, stderr, the "real" status code, better error handling, etc...).

Even the documentation for os.system recommends using subprocess instead:

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.

On Python 3.4 and earlier, use subprocess.call instead of .run:

subprocess.call(["ls", "-l"])

Running shell command and capturing the output

In all officially maintained versions of Python, the simplest approach is to use the subprocess.check_output function:

>>> subprocess.check_output(['ls', '-l'])
b'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

check_output runs a single program that takes only arguments as input.1 It returns the result exactly as printed to stdout. If you need to write input to stdin, skip ahead to the run or Popen sections. If you want to execute complex shell commands, see the note on shell=True at the end of this answer.

The check_output function works in all officially maintained versions of Python. But for more recent versions, a more flexible approach is available.

Modern versions of Python (3.5 or higher): run

If you're using Python 3.5+, and do not need backwards compatibility, the new run function is recommended by the official documentation for most tasks. It provides a very general, high-level API for the subprocess module. To capture the output of a program, pass the subprocess.PIPE flag to the stdout keyword argument. Then access the stdout attribute of the returned CompletedProcess object:

>>> import subprocess
>>> result = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE)
>>> result.stdout
b'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

The return value is a bytes object, so if you want a proper string, you'll need to decode it. Assuming the called process returns a UTF-8-encoded string:

>>> result.stdout.decode('utf-8')
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

This can all be compressed to a one-liner if desired:

>>> subprocess.run(['ls', '-l'], stdout=subprocess.PIPE).stdout.decode('utf-8')
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

If you want to pass input to the process's stdin, you can pass a bytes object to the input keyword argument:

>>> cmd = ['awk', 'length($0) > 5']
>>> ip = 'foo\nfoofoo\n'.encode('utf-8')
>>> result = subprocess.run(cmd, stdout=subprocess.PIPE, input=ip)
>>> result.stdout.decode('utf-8')
'foofoo\n'

You can capture errors by passing stderr=subprocess.PIPE (capture to result.stderr) or stderr=subprocess.STDOUT (capture to result.stdout along with regular output). If you want run to throw an exception when the process returns a nonzero exit code, you can pass check=True. (Or you can check the returncode attribute of result above.) When security is not a concern, you can also run more complex shell commands by passing shell=True as described at the end of this answer.

Later versions of Python streamline the above further. In Python 3.7+, the above one-liner can be spelled like this:

>>> subprocess.run(['ls', '-l'], capture_output=True, text=True).stdout
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

Using run this way adds just a bit of complexity, compared to the old way of doing things. But now you can do almost anything you need to do with the run function alone.

Older versions of Python (3-3.4): more about check_output

If you are using an older version of Python, or need modest backwards compatibility, you can use the check_output function as briefly described above. It has been available since Python 2.7.

subprocess.check_output(*popenargs, **kwargs)  

It takes takes the same arguments as Popen (see below), and returns a string containing the program's output. The beginning of this answer has a more detailed usage example. In Python 3.5+, check_output is equivalent to executing run with check=True and stdout=PIPE, and returning just the stdout attribute.

You can pass stderr=subprocess.STDOUT to ensure that error messages are included in the returned output. When security is not a concern, you can also run more complex shell commands by passing shell=True as described at the end of this answer.

If you need to pipe from stderr or pass input to the process, check_output won't be up to the task. See the Popen examples below in that case.

Complex applications and legacy versions of Python (2.6 and below): Popen

If you need deep backwards compatibility, or if you need more sophisticated functionality than check_output or run provide, you'll have to work directly with Popen objects, which encapsulate the low-level API for subprocesses.

The Popen constructor accepts either a single command without arguments, or a list containing a command as its first item, followed by any number of arguments, each as a separate item in the list. shlex.split can help parse strings into appropriately formatted lists. Popen objects also accept a host of different arguments for process IO management and low-level configuration.

To send input and capture output, communicate is almost always the preferred method. As in:

output = subprocess.Popen(["mycmd", "myarg"], 
stdout=subprocess.PIPE).communicate()[0]

Or

>>> import subprocess
>>> p = subprocess.Popen(['ls', '-a'], stdout=subprocess.PIPE,
... stderr=subprocess.PIPE)
>>> out, err = p.communicate()
>>> print out
.
..
foo

If you set stdin=PIPE, communicate also allows you to pass data to the process via stdin:

>>> cmd = ['awk', 'length($0) > 5']
>>> p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
... stderr=subprocess.PIPE,
... stdin=subprocess.PIPE)
>>> out, err = p.communicate('foo\nfoofoo\n')
>>> print out
foofoo

Note Aaron Hall's answer, which indicates that on some systems, you may need to set stdout, stderr, and stdin all to PIPE (or DEVNULL) to get communicate to work at all.

In some rare cases, you may need complex, real-time output capturing. Vartec's answer suggests a way forward, but methods other than communicate are prone to deadlocks if not used carefully.

As with all the above functions, when security is not a concern, you can run more complex shell commands by passing shell=True.

Notes

1. Running shell commands: the shell=True argument

Normally, each call to run, check_output, or the Popen constructor executes a single program. That means no fancy bash-style pipes. If you want to run complex shell commands, you can pass shell=True, which all three functions support. For example:

>>> subprocess.check_output('cat books/* | wc', shell=True, text=True)
' 1299377 17005208 101299376\n'

However, doing this raises security concerns. If you're doing anything more than light scripting, you might be better off calling each process separately, and passing the output from each as an input to the next, via

run(cmd, [stdout=etc...], input=other_output)

Or

Popen(cmd, [stdout=etc...]).communicate(other_output)

The temptation to directly connect pipes is strong; resist it. Otherwise, you'll likely see deadlocks or have to do hacky things like this.

Python exit commands - why so many and when should each be used?

The functions* quit(), exit(), and sys.exit() function in the same way: they raise the SystemExit exception. So there is no real difference, except that sys.exit() is always available but exit() and quit() are only available if the site module is imported (docs).

The os._exit() function is special, it exits immediately without calling any cleanup functions (it doesn't flush buffers, for example). This is designed for highly specialized use cases... basically, only in the child after an os.fork() call.

Conclusion

  • Use exit() or quit() in the REPL.

  • Use sys.exit() in scripts, or raise SystemExit() if you prefer.

  • Use os._exit() for child processes to exit after a call to os.fork().

All of these can be called without arguments, or you can specify the exit status, e.g., exit(1) or raise SystemExit(1) to exit with status 1. Note that portable programs are limited to exit status codes in the range 0-255, if you raise SystemExit(256) on many systems this will get truncated and your process will actually exit with status 0.

Footnotes

* Actually, quit() and exit() are callable instance objects, but I think it's okay to call them functions.

Assign output of os.system to a variable and prevent it from being displayed on the screen

From "https://stackoverflow.com/questions/1410976/equivalent-of-backticks-in-python", which I asked a long time ago, what you may want to use is popen:

os.popen('cat /etc/services').read()

From the docs for Python 3.6,

This is implemented using subprocess.Popen; see that class’s
documentation for more powerful ways to manage and communicate with
subprocesses.


Here's the corresponding code for subprocess:

import subprocess

proc = subprocess.Popen(["cat", "/etc/services"], stdout=subprocess.PIPE, shell=True)
(out, err) = proc.communicate()
print("program output:", out)

How to escape os.system() calls?

This is what I use:

def shellquote(s):
return "'" + s.replace("'", "'\\''") + "'"

The shell will always accept a quoted filename and remove the surrounding quotes before passing it to the program in question. Notably, this avoids problems with filenames that contain spaces or any other kind of nasty shell metacharacter.

Update: If you are using Python 3.3 or later, use shlex.quote instead of rolling your own.



Related Topics



Leave a reply



Submit