Emulating Bash 'Source' in Python

The problem with your approach is that you are trying to interpret bash scripts. First you just try to interpret the export statement. Then you notice people are using variable expansion. Later people will put conditionals in their files, or process substitutions. In the end you will have a full blown bash script interpreter with a gazillion bugs. Don't do that.

Let Bash interpret the file for you and then collect the results.

You can do it like this:

#! /usr/bin/env python

import os
import pprint
import shlex
import subprocess

command = shlex.split("env -i bash -c 'source init_env && env'")
proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                        universal_newlines=True)
for line in proc.stdout:
    (key, _, value) = line.rstrip("\n").partition("=")
    os.environ[key] = value
proc.communicate()

pprint.pprint(dict(os.environ))

Make sure that you handle errors in case bash fails to source init_env, or bash itself fails to execute, or subprocess fails to execute bash, or any other errors.
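As a minimal sketch of that error handling (the function name `source_env` is mine, not from the answer above), subprocess.run with check=True turns a failed source, or a failed bash, into an exception you can catch:

```python
import subprocess


def source_env(path):
    """Source `path` in a clean bash and return the resulting environment.

    Raises subprocess.CalledProcessError if bash exits non-zero (e.g. the
    file does not exist), and FileNotFoundError if bash itself is missing.
    Assumes no variable values contain newlines.
    """
    result = subprocess.run(
        ["env", "-i", "bash", "-c", 'source "{}" && env'.format(path)],
        stdout=subprocess.PIPE,
        check=True,                 # raise if sourcing (or bash) fails
        universal_newlines=True,
    )
    env = {}
    for line in result.stdout.splitlines():
        key, _, value = line.partition("=")
        env[key] = value
    return env
```

Wrap the call in try/except subprocess.CalledProcessError to decide how your program should react when init_env cannot be sourced.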

The env -i at the beginning of the command line creates a clean environment. That means you will only get the environment variables from init_env. If you want the inherited system environment then omit env -i.

Read the documentation on subprocess for more details.

Note: this will only capture variables set with the export statement, as env only prints exported variables.
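If you also need variables that the file assigns without export, one workaround (my suggestion, not part of the answer above) is bash's set -a, which marks every variable assigned after it for export, so plain NAME=value lines show up in the env output too:

```python
import subprocess


def source_all_vars(path):
    # `set -a` auto-exports every variable assigned while sourcing,
    # so even non-exported assignments appear in `env`'s output.
    # Assumes no variable values contain newlines.
    out = subprocess.check_output(
        ["env", "-i", "bash", "-c", 'set -a && source "{}" && env'.format(path)],
        universal_newlines=True,
    )
    return dict(line.partition("=")[::2] for line in out.splitlines())
```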

Enjoy.

Note that the Python documentation says that if you want to manipulate the environment you should manipulate os.environ directly instead of using os.putenv(). I consider that a bug, but I digress.

What is a good way to source a Bash script from within Python?

The line you are seeing is the result of the script doing the following:

module() { eval `/usr/bin/modulecmd bash $*`; }
export -f module

That is, it is explicitly exporting the bash function module so that sub(bash)shells can use it.

We can tell from the format of the environment variable that you upgraded your bash in the middle of the shellshock patches. I don't think there is a current patch which would generate BASH_FUNC_module()= instead of BASH_FUNC_module%%()=, but iirc there was such a patch distributed during the flurry of fixes. You might want to upgrade your bash again now that things have settled down. (If that was a cut-and-paste error, ignore this paragraph.)

And we can also tell that /bin/sh on your system is bash, assuming that the module function was introduced by sourcing the shell script.

Probably you should decide whether you care about exported bash functions. Do you want to export module into the environment you are creating, or just ignore it? The solution below just returns what it finds in the environment, so it will include module.

In short, if you're going to parse the output of some shell command which tries to print the environment, you're going to have three possible issues:

  1. Exported functions (bash only), which look different pre- and post-shellshock patch, but always contain at least one newline. (Their value always starts with () { so they are easy to identify. Post shellshock, their names will be BASH_FUNC_funcname%%, but as long as both pre- and post-patch bashes are found in the wild, you might not want to rely on that.)

  2. Exported variables which contain a newline.

  3. In some cases, exported variables with no value at all. These actually have the value of an empty string, but it is possible for them to be in the environment list without an = sign, and some utilities will print them out without an =.

As always, the most robust (and possibly even simplest) solution would be to avoid parsing, but we can fall back on the strategy of parsing a formatted string we create ourselves, which is carefully designed to be parsed.

We can use any programming language with access to the environment to produce this output; for simplicity, we can use Python itself. We'll output the environment variables in a very simple format: the variable name (which must be alphanumeric), followed by an equals sign, followed by the value, followed by a NUL (0) byte (which cannot appear in the value). Something like the following:

from subprocess import Popen, PIPE

# The commented-out line really should not be necessary; it's impossible
# for an environment variable name to contain an =. However, it could
# be replaced with a more stringent check.
prog = ( r'''from os import environ;'''
       + r'''from sys import stdout;'''
       + r'''stdout.write("\0".join("{0}={1}".format(*kv)'''
       + r''' for kv in environ.items()'''
     #+ r''' if "=" not in kv[0]'''
       + r''' ))'''
)

# Lots of error checking omitted.
def getenv_after_sourcing(fn):
    argv = [ "bash"
           , "-c"
           , '''. "{fn}"; python -c '{prog}' '''.format(fn=fn, prog=prog)]
    data = Popen(argv, stdout=PIPE, universal_newlines=True).communicate()[0]
    return dict(kv.split('=', 1) for kv in data.split('\0'))

Linux shell source command equivalent in python

Too much work for too little return. I'm going to keep a small shell script to get all the env vars we need and forget about reading them into Python.

Named variables from Python for use in Bash source without temp file

You will need quotes around the strings and will also need to ensure that there are no spaces before or after the =, like so:

print("STARTDATE=\"" + STARTDATE + "\"")
print("STARTTIME=\"" + STARTTIME + "\"")
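Hand-written quoting like this breaks if a value itself contains quotes or other shell metacharacters. A more robust sketch uses shlex.quote (Python 3.3+); the example values below are mine, standing in for whatever your script computes for STARTDATE and STARTTIME:

```python
import shlex

STARTDATE = "2024-01-01"           # hypothetical example value
STARTTIME = "12:00 PM (local)"     # hypothetical example value

# shlex.quote escapes spaces, quotes, and other shell metacharacters,
# so the emitted assignments are always safe to `source`.
print("STARTDATE=" + shlex.quote(STARTDATE))
print("STARTTIME=" + shlex.quote(STARTTIME))
```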

Making sure that you are using a bash shell and not sh, you should then be able to run the script and set the variables with:

source <(./script.py)

Or in a script:

#!/bin/bash
source <(./script.py)

Calling the source command from subprocess.Popen

source is not an executable command, it's a shell builtin.

The most usual case for using source is to run a shell script that changes the environment and to retain that environment in the current shell. That's exactly how virtualenv works to modify the default python environment.

Creating a sub-process and using source in the subprocess probably won't do anything useful, it won't modify the environment of the parent process, none of the side-effects of using the sourced script will take place.
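A quick demonstration of that point (the variable name DEMO_VAR is made up for illustration): an export performed inside a child bash vanishes when that child exits, so the Python parent never sees it.

```python
import os
import subprocess

# The export happens in the child bash process only; its environment
# disappears when it exits, leaving the parent's os.environ untouched.
subprocess.call(["bash", "-c", "export DEMO_VAR=hello"])
print("DEMO_VAR" in os.environ)   # the parent never sees the variable
```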

Python has an analogous function, execfile (Python 2 only; it was removed in Python 3, where exec on the file's contents serves the same purpose), which runs the specified file using the current Python global namespace (or another one, if you supply it), and which you could use in a way similar to the bash command source.
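A small sketch of that idea which works on Python 3 (the helper name `pysource` is mine):

```python
def pysource(filename, namespace=None):
    # Python 3 replacement for Python 2's execfile(): run the file's
    # code in the given namespace (defaulting to the caller's globals).
    if namespace is None:
        namespace = globals()
    with open(filename) as f:
        # compile() with the filename gives readable tracebacks
        exec(compile(f.read(), filename, "exec"), namespace)
```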

Unix Source command equivalent in Python

You can in fact customize the import mechanism for such purposes. However, let me provide a quick hack instead:

def source(filename):
    variables = {}
    with open(filename) as f:
        for line in f:
            try:
                # split only on the first =, so values may contain =
                name, value = line.strip().split('=', 1)
            except ValueError:
                # skip blank lines, comments, and anything else that
                # isn't a simple name=value assignment
                continue
            variables[name] = value
    return variables

variables = source('job.properties')
print(variables)

The function source iterates over the lines in the supplied file and stores the assignments in the dictionary variables. Lines that do not contain an assignment are simply skipped by the try/except.

To emulate the behavior of shell sourcing further, you might add

globals().update(variables)

if working at the module (non-function) level, which will make databaseName and hdfspath available as Python variables.

Note that all "sourced" variables will be strs, even for lines such as my_int=42 in the sourced file.
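If you do want typed values, a best-effort conversion helper (my addition, named `coerce` for illustration) can be applied to each value after parsing:

```python
def coerce(value):
    # Best-effort conversion of a sourced string: try int, then float;
    # anything that fails both conversions stays a string.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value
```

For example, coerce could be applied to every value in the dictionary returned by source() before calling globals().update().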


