Rescuing "Command Not Found" for IO.popen

Yes: upgrade to Ruby 1.9. If you run that in 1.9, an Errno::ENOENT will be raised instead, and you will be able to rescue it.

(Edit) Here is a hackish way of doing it in 1.8:

error = IO.pipe
$stderr.reopen error[1]
pipe = IO.popen 'qwe' # <- not a real command
$stderr.reopen IO.new(2)
error[1].close

if !select([error[0]], nil, nil, 0.1)
  # The command was found. Use `pipe' here.
  puts 'found'
else
  # The command could not be found.
  puts 'not found'
end

Run external program in Ruby IO.popen: rescue not working

After your call to IO.popen you are passing the output from the child program to JSON.parse, regardless of whether it is valid. The exception you see is the JSON parser trying to parse the Java exception message, which is captured because you redirect stderr with 2>&1.

You need to check that the child process completed successfully before continuing. The simplest way is probably to use the $? special variable, which indicates the status of the last executed child process, after the call to popen. This variable is an instance of Process::Status. You could do something like this:

output = IO.popen(command + " 2>&1") do |io|
  io.read
end

unless $?.success?
  # Handle the error however you feel is best, e.g.
  puts "Tika had an error, the message was:\n#{output}"
  raise "Tika error"
end

For more control you could look at the Open3 module in the standard library. Since Tika is a Java program, another possibility might be to look into using JRuby and call it directly.

Stopping IO.popen in the middle of execution using Exceptions given a bad condition

So, a disclaimer: I am using Ruby 1.8.6 on Windows. It is the only Ruby that the software I use currently supports, so there may be a more elegant solution. Overall, it came down to making sure the process died, using Process.kill, before continuing execution.

IO.popen(cmdLineExecution) do |stream|
  stream.each do |line|
    puts line
    begin
      # if it finds an error, throws an exception
      analyzeLine(line)
    rescue CorrectionException
      # if it was able to handle the error
      puts "Handled the exception successfully"
      Process.kill("KILL", stream.pid) # stop the system process
    rescue CorrectionFailedException => failedEx
      # not able to handle the error
      puts "Failed handling the exception"
      Process.kill("KILL", stream.pid) # stop the system process
      raise "Was unable to make a known correction to the running environment: #{failedEx.message}"
    end
  end
end

I made both of the exceptions standard classes that inherit from Exception.

How to detect if the shell failed to execute a command after a popen() call? Not to be confused with the command's exit status

General comments about when to use errno

- No standard C or POSIX library function ever sets errno to zero.
- Printing an error message based on errno when fd is not NULL is not appropriate; the error number is not from popen() (or it was not set because popen() failed).
- Printing res after pclose() is OK; adding strerror(errno) runs into the same problem (the information in errno may be entirely irrelevant).
- You can set errno to zero before calling a function. If the function returns a failure indication, it may be relevant to look at errno (check the specification of the function: is it defined to set errno on failure?).
- However, errno can be set non-zero by a function even if it succeeds. Solaris standard I/O used to set errno = ENOTTY if the output stream was not connected to a terminal, even though the operation succeeded; it probably still does.
- Solaris setting errno even on success is perfectly legitimate; it is only legitimate to look at errno if (1) the function reports failure and (2) the function is documented to set errno (by POSIX or by the system manual).

See C11 §7.5 Errors <errno.h> ¶3:

The value of errno in the initial thread is zero at program startup (the initial value of errno in other threads is an indeterminate value), but is never set to zero by any library function.202) The value of errno may be set to nonzero by a library function call whether or not there is an error, provided the use of errno is not documented in the description of the function in this International Standard.

202) Thus, a program that uses errno for error checking should set it to zero before a library function call, then inspect it before a subsequent library function call. Of course, a library function can save the value of errno on entry and then set it to zero, as long as the original value is restored if errno's value is still zero just before the return.

POSIX is similar (errno):

Many functions provide an error number in errno, which has type int and is defined in <errno.h>. The value of errno shall be defined only after a call to a function for which it is explicitly stated to be set and until it is changed by the next function call or if the application assigns it a value. The value of errno should only be examined when it is indicated to be valid by a function's return value. Applications shall obtain the definition of errno by the inclusion of <errno.h>. No function in this volume of POSIX.1-2017 shall set errno to 0. The setting of errno after a successful call to a function is unspecified unless the description of that function specifies that errno shall not be modified.

popen() and pclose()

The POSIX specification for popen() is not dreadfully helpful. There's only one circumstance under which popen() 'must fail'; everything else is 'may fail'.

However, the details for pclose() are much more helpful, including:

If the command language interpreter cannot be executed, the child termination status returned by pclose() shall be as if the command language interpreter terminated using exit(127) or _exit(127).

and

Upon successful return, pclose() shall return the termination status of the command language interpreter. Otherwise, pclose() shall return -1 and set errno to indicate the error.

That means that pclose() returns the value it received from waitpid(): the exit status from the command that was invoked. Note that it must use waitpid() (or an equivalently selective function; hunt for wait3() and wait4() on BSD systems); it is not authorized to wait for any child processes other than the one created by popen() for this file stream. There are prescriptions that pclose() must be sure the child has exited, even if some other function waited on the dead child in the interim and thereby caused the system to lose the status for the child created by popen().

If you interpret decimal 32512 as hexadecimal, you get 0x7F00. And if you used the WIFEXITED and WEXITSTATUS macros from <sys/wait.h> on that, you'd find that the exit status is 127 (because 0x7F is 127 decimal, and the exit status is encoded in the high-order bits of the status returned by waitpid()).

int res = pclose(fd);

if (WIFEXITED(res))
    printf("Command exited with status %d (0x%.4X)\n", WEXITSTATUS(res), res);
else if (WIFSIGNALED(res))
    printf("Command exited from signal %d (0x%.4X)\n", WTERMSIG(res), res);
else
    printf("Command exited with unrecognized status 0x%.4X\n", res);

And remember that 0 is the exit status indicating success; anything else normally indicates an error of some sort. You can further analyze the exit status to look for 127 or relayed signals, etc. It's unlikely you'd get a 'signalled' status, or an unrecognized status.

Here it is the 127 exit status reported by pclose(), not popen() itself, that told you the child failed.

Of course, it is possible that the executed command actually exited itself with status 127; that's unavoidably confusing, and the only way around it is to avoid exit statuses in the range 126 to 128 + 'maximum signal number' (which might mean 126 .. 191 if there are 63 recognized signals). The value 126 is also used by POSIX to report when the interpreter specified in a shebang (#!/usr/bin/interpreter) is missing (as opposed to the program to be executed not being available). Whether that's returned by pclose() is a separate discussion. And the signal reporting is done by the shell because there's no (easy) way to report that a child died from a signal otherwise.
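As a cross-check of the arithmetic above, the same WIFEXITED/WEXITSTATUS decoding is exposed by Python's os module on POSIX systems, which makes it easy to experiment with a raw wait status such as 32512 (this is just an illustrative sketch, not part of the original answers):

```python
import os

status = 32512  # 0x7F00, as returned by waitpid()/pclose()

# WIFEXITED: the low 7 bits (the signal field) are zero, so the child
# exited normally rather than dying from a signal.
print(os.WIFEXITED(status))    # -> True

# WEXITSTATUS extracts the high-order byte: 0x7F == 127, the
# conventional "command not found" status from the shell.
print(os.WEXITSTATUS(status))  # -> 127
```

These functions mirror the C macros bit for bit, so the result confirms that 32512 encodes an ordinary exit with status 127.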

Timeout within a popen works, but popen inside a timeout doesn't?

Aha, subtle.

There is a hidden, blocking ensure clause at the end of the IO#popen block in the second case. The Timeout::Error is raised in a timely fashion, but you cannot rescue it until execution returns from that implicit ensure clause.

Under the hood, IO.popen(cmd) { |io| ... } does something like this:

def my_illustrative_io_popen(cmd, &block)
  begin
    pio = IO.popen(cmd)
    block.call(pio) # This *is* interrupted...
  ensure
    pio.close # ...but then control goes here, which blocks on cmd's termination
  end
end

and the IO#close call is really more-or-less a pclose(3), which blocks you in waitpid(2) until the sleeping child exits.

You can verify this like so:

#!/usr/bin/env ruby

require 'timeout'

BEGIN { $BASETIME = Time.now.to_i }

def xputs(msg)
  puts "%4.2f: %s" % [(Time.now.to_f - $BASETIME), msg]
end

begin
  Timeout.timeout(3) do
    begin
      xputs "popen(sleep 10)"
      pio = IO.popen("sleep 10")
      sleep 100 # or loop over pio.gets or whatever
    ensure
      xputs "Entering ensure block"
      # Process.kill 9, pio.pid # <--- This would solve your problem!
      pio.close
      xputs "Leaving ensure block"
    end
  end
rescue Timeout::Error => ex
  xputs "rescuing: #{ex}"
end

So, what can you do?

You'll have to do it the explicit way, since the interpreter doesn't expose a way to override the IO#popen ensure logic. You can use the above code as a starting template and uncomment the kill() line, for example.

Output redirection to file using subprocess.Popen

I/O redirection with < and > is done by the shell. When you call subprocess.Popen() with a list as the first argument and without shell=True, the program is executed directly, without the shell parsing the command line. So you are executing the program and passing it the literal arguments < and >. It's as if you had executed the shell command with the < and > characters quoted:

scriptname '<' infile.txt '>' outfile.txt

If you want to use the shell, you have to pass a single string (just like when using os.system()):

data = subprocess.Popen(" ".join([ shlex.quote(script.out), "<", shlex.quote(input_file[i]), ">", shlex.quote(output_file[i])]), shell=True)

Use shlex.quote() to escape arguments that shouldn't be treated as shell metacharacters.

Running shell command and capturing the output

In all officially maintained versions of Python, the simplest approach is to use the subprocess.check_output function:

>>> subprocess.check_output(['ls', '-l'])
b'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

check_output runs a single program that takes only arguments as input.1 It returns the result exactly as printed to stdout. If you need to write input to stdin, skip ahead to the run or Popen sections. If you want to execute complex shell commands, see the note on shell=True at the end of this answer.

The check_output function works in all officially maintained versions of Python. But for more recent versions, a more flexible approach is available.

Modern versions of Python (3.5 or higher): run

If you're using Python 3.5+, and do not need backwards compatibility, the new run function is recommended by the official documentation for most tasks. It provides a very general, high-level API for the subprocess module. To capture the output of a program, pass the subprocess.PIPE flag to the stdout keyword argument. Then access the stdout attribute of the returned CompletedProcess object:

>>> import subprocess
>>> result = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE)
>>> result.stdout
b'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

The return value is a bytes object, so if you want a proper string, you'll need to decode it. Assuming the called process returns a UTF-8-encoded string:

>>> result.stdout.decode('utf-8')
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

This can all be compressed to a one-liner if desired:

>>> subprocess.run(['ls', '-l'], stdout=subprocess.PIPE).stdout.decode('utf-8')
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

If you want to pass input to the process's stdin, you can pass a bytes object to the input keyword argument:

>>> cmd = ['awk', 'length($0) > 5']
>>> ip = 'foo\nfoofoo\n'.encode('utf-8')
>>> result = subprocess.run(cmd, stdout=subprocess.PIPE, input=ip)
>>> result.stdout.decode('utf-8')
'foofoo\n'

You can capture errors by passing stderr=subprocess.PIPE (capture to result.stderr) or stderr=subprocess.STDOUT (capture to result.stdout along with regular output). If you want run to throw an exception when the process returns a nonzero exit code, you can pass check=True. (Or you can check the returncode attribute of result above.) When security is not a concern, you can also run more complex shell commands by passing shell=True as described at the end of this answer.
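A quick sketch of what check=True buys you (assuming a POSIX system where the standard false utility, which always exits with status 1, is on PATH):

```python
import subprocess

# Without check=True, a failing command just sets returncode:
result = subprocess.run(['false'])
print(result.returncode)  # -> 1

# With check=True, the same failure raises CalledProcessError instead:
try:
    subprocess.run(['false'], check=True)
except subprocess.CalledProcessError as exc:
    print(exc.returncode)  # -> 1
```

The exception carries the return code (and, if you captured them, stdout/stderr), so the error path has everything the success path does.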

Later versions of Python streamline the above further. In Python 3.7+, the above one-liner can be spelled like this:

>>> subprocess.run(['ls', '-l'], capture_output=True, text=True).stdout
'total 0\n-rw-r--r-- 1 memyself staff 0 Mar 14 11:04 files\n'

Using run this way adds just a bit of complexity, compared to the old way of doing things. But now you can do almost anything you need to do with the run function alone.

Older versions of Python (3-3.4): more about check_output

If you are using an older version of Python, or need modest backwards compatibility, you can use the check_output function as briefly described above. It has been available since Python 2.7.

subprocess.check_output(*popenargs, **kwargs)  

It takes the same arguments as Popen (see below), and returns a string containing the program's output. The beginning of this answer has a more detailed usage example. In Python 3.5+, check_output is equivalent to executing run with check=True and stdout=PIPE, and returning just the stdout attribute.
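That equivalence can be checked directly (a sketch for Python 3.5+, assuming echo is on PATH):

```python
import subprocess

# check_output(...) behaves like run(..., check=True, stdout=PIPE).stdout
a = subprocess.check_output(['echo', 'hello'])
b = subprocess.run(['echo', 'hello'], check=True,
                   stdout=subprocess.PIPE).stdout

print(a == b)  # -> True
print(a)       # -> b'hello\n'
```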

You can pass stderr=subprocess.STDOUT to ensure that error messages are included in the returned output. When security is not a concern, you can also run more complex shell commands by passing shell=True as described at the end of this answer.

If you need to pipe from stderr or pass input to the process, check_output won't be up to the task. See the Popen examples below in that case.

Complex applications and legacy versions of Python (2.6 and below): Popen

If you need deep backwards compatibility, or if you need more sophisticated functionality than check_output or run provide, you'll have to work directly with Popen objects, which encapsulate the low-level API for subprocesses.

The Popen constructor accepts either a single command without arguments, or a list containing a command as its first item, followed by any number of arguments, each as a separate item in the list. shlex.split can help parse strings into appropriately formatted lists. Popen objects also accept a host of different arguments for process IO management and low-level configuration.
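For instance (a minimal sketch of the point about shlex.split), a shell-style command string becomes a Popen-ready argument list, with quoted substrings kept as single items:

```python
import shlex

# Double quotes group "my dir" into one argument, as a shell would.
args = shlex.split('ls -l "my dir"')
print(args)  # -> ['ls', '-l', 'my dir']
```

This is the safe way to go from a human-readable command line to the list form that Popen, run, and check_output expect.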

To send input and capture output, communicate is almost always the preferred method. As in:

output = subprocess.Popen(["mycmd", "myarg"],
                          stdout=subprocess.PIPE).communicate()[0]

Or

>>> import subprocess
>>> p = subprocess.Popen(['ls', '-a'], stdout=subprocess.PIPE,
... stderr=subprocess.PIPE)
>>> out, err = p.communicate()
>>> print out
.
..
foo

If you set stdin=PIPE, communicate also allows you to pass data to the process via stdin:

>>> cmd = ['awk', 'length($0) > 5']
>>> p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
... stderr=subprocess.PIPE,
... stdin=subprocess.PIPE)
>>> out, err = p.communicate('foo\nfoofoo\n')
>>> print out
foofoo

Note Aaron Hall's answer, which indicates that on some systems, you may need to set stdout, stderr, and stdin all to PIPE (or DEVNULL) to get communicate to work at all.

In some rare cases, you may need complex, real-time output capturing. Vartec's answer suggests a way forward, but methods other than communicate are prone to deadlocks if not used carefully.

As with all the above functions, when security is not a concern, you can run more complex shell commands by passing shell=True.

Notes

1. Running shell commands: the shell=True argument

Normally, each call to run, check_output, or the Popen constructor executes a single program. That means no fancy bash-style pipes. If you want to run complex shell commands, you can pass shell=True, which all three functions support. For example:

>>> subprocess.check_output('cat books/* | wc', shell=True, text=True)
' 1299377 17005208 101299376\n'

However, doing this raises security concerns. If you're doing anything more than light scripting, you might be better off calling each process separately, and passing the output from each as an input to the next, via

run(cmd, [stdout=etc...], input=other_output)

Or

Popen(cmd, [stdout=etc...]).communicate(other_output)

The temptation to directly connect pipes is strong; resist it. Otherwise, you'll likely see deadlocks or have to do hacky things like this.
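To sketch that run-based approach concretely (assuming a POSIX system with the standard seq and wc utilities), here is the equivalent of the shell pipeline seq 5 | wc -l without shell=True:

```python
import subprocess

# Stage 1: run `seq 5` to completion and capture its full output.
first = subprocess.run(['seq', '5'], stdout=subprocess.PIPE, check=True)

# Stage 2: feed that captured output to `wc -l` via its stdin.
second = subprocess.run(['wc', '-l'], input=first.stdout,
                        stdout=subprocess.PIPE, check=True)

print(second.stdout.decode().strip())  # -> 5
```

Each stage is buffered fully before the next starts, which costs memory for large outputs but sidesteps the deadlocks that hand-wired pipes invite.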


