Hash ("#") Symbol in /etc/environment Causes String to Be Split

Unable to see or modify value of PYTHONHASHSEED through a module

You can set PYTHONHASHSEED from within a Python script (e.g. via os.environ), but doing so has no effect on the behavior of that interpreter's hash() function. The variable must already be in the interpreter's environment when the interpreter starts up.
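The following is a minimal sketch, not part of the original answer, that illustrates this. Assigning os.environ['PYTHONHASHSEED'] leaves hash() unchanged in the running interpreter, but any interpreter started afterwards inherits the variable. The helper name child_hash is hypothetical, and the seed 12345 is arbitrary.

```python
import os
import subprocess
import sys


def child_hash(text: str = "abc") -> int:
    """Return hash(text) as computed by a freshly started interpreter.

    The child inherits the current os.environ, including any PYTHONHASHSEED
    assigned below.
    """
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout)


before = hash("abc")
os.environ["PYTHONHASHSEED"] = "12345"  # too late for *this* interpreter
after = hash("abc")

print(before == after)               # True: the running interpreter ignores the change
print(child_hash() == child_hash())  # True: both children start with PYTHONHASHSEED=12345
```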


How to set its value using pure Python

The trick is to pass the environment variable to the Python interpreter in a subprocess.

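import random
import sys
from subprocess import call

random.seed(37)
# Use the running interpreter's path: the replacement env below has no PATH
cmd = [sys.executable, '-c', 'print(hash("abc"))']

for i in range(5):
    # PYTHONHASHSEED must be a decimal integer in [0, 4294967295], passed as a string
    hashseed = str(random.randint(0, 4294967295))
    # flush so this line appears before the child's output, even when piped
    print('\nhashseed', hashseed, flush=True)
    call(cmd, env={'PYTHONHASHSEED': hashseed})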

output

hashseed 2929187283
-972692480

hashseed 393430205
2066796829

hashseed 2653501013
1620854360

hashseed 3616018455
-599248233

hashseed 3584366196
-2103216293

You can change the cmd list so that it runs the hashtest.py script from the Bash section below:

cmd = ['python', 'hashtest.py']

or, if hashtest.py is executable and starts with a #! line,

cmd = ['./hashtest.py']

By passing a dict as the env argument, we replace the default environment that would otherwise be passed to the command. If you need access to those other environment variables, modify os.environ in the calling script instead, e.g. os.environ['PYTHONHASHSEED'] = hashseed.
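A third option is sketched below. It assumes you want to keep the rest of the environment without mutating os.environ, so it passes the child a copy of the current environment with PYTHONHASHSEED added. The function name run_with_seed is hypothetical. Running the same seed twice should give identical hash values.

```python
import os
import subprocess
import sys

# The child prints hash('abc') and whether it can still see PATH.
CHILD = "import os; print(hash('abc')); print('PATH' in os.environ)"


def run_with_seed(seed: str) -> list:
    """Run CHILD with PYTHONHASHSEED=seed added to a copy of the current environment."""
    env = dict(os.environ, PYTHONHASHSEED=seed)  # os.environ itself is not modified
    out = subprocess.run([sys.executable, "-c", CHILD],
                         env=env, capture_output=True, text=True, check=True)
    return out.stdout.split()


first = run_with_seed("2929187283")
second = run_with_seed("2929187283")
print(first == second)  # True: same seed, same hash
print(first[1])         # 'True' if PATH was set in the parent
```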

How to set its value using Bash

First, we have a short Bash script, pyhashtest.bsh, that uses Bash's special RANDOM variable (a random integer from 0 to 32767) as the seed for PYTHONHASHSEED. PYTHONHASHSEED must be exported so that the Python interpreter can see it. The script then runs our Python script, hashtest.py. It does this 5 times in a loop, so we can see that using different seeds changes the hash value.

The Python script hashtest.py reads PYTHONHASHSEED from the environment and prints it to show that it has the value we expect. It then calculates and prints the hash of a short string.

pyhashtest.bsh

#!/usr/bin/env bash

for ((i=0; i<5; i++)); do
    n=$RANDOM
    echo "$i: Seed is $n"
    export PYTHONHASHSEED="$n"
    python hashtest.py
    echo
done

hashtest.py

#!/usr/bin/env python
import os

s = 'abc'
print('Hashseed is', os.environ['PYTHONHASHSEED'])
print('hash of s is', hash(s))

typical output

0: Seed is 9352
Hashseed is 9352
hash of s is 401719638

1: Seed is 24945
Hashseed is 24945
hash of s is -1250185385

2: Seed is 17661
Hashseed is 17661
hash of s is -571990551

3: Seed is 24313
Hashseed is 24313
hash of s is 99658978

4: Seed is 21142
Hashseed is 21142
hash of s is -662114263

To run these programs, save them both in the same directory, e.g. the directory you usually run Python scripts from. Then open a Bash shell and navigate to that directory using the cd command.

E.g., if you've saved the scripts to /mnt/sda2/fred/python, you'd do

cd /mnt/sda2/fred/python

Next, make pyhashtest.bsh executable using this command:

chmod a+x pyhashtest.bsh

Then run it with

./pyhashtest.bsh

Perl string replace with backreferenced values and shell variables

The meaning of $1 is different in the shell and in Perl.

In the shell, it means the first positional argument. As double quotes expand variables, $1 in double quotes also means the first positional argument.

In Perl, $1 means the first capture group matched by a regular expression.

But, if you use $1 in double quotes on the shell level, Perl never sees it: the shell expands $1 as the first positional argument and sends the expanded string to Perl.

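So keep the Perl code in single quotes, which lets the shell pass $1 through untouched. To get a shell value into Perl, put it in the environment (set it in front of the command, as here, or export it) and read it with the %ENV hash:

aaa=5 perl -i.bak -pe 's/pm\.max_children\s*=\s*\K([0-9]+)/($1 * $ENV{aaa})/ge' /usr/local/etc/php-fpm.d/www.conf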
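For comparison, here is a sketch of the same edit in Python (not from the original answer). Python's re module has no \K, so the prefix is captured as group 1 and put back in the replacement. The multiplier is read from the environment variable aaa, mirroring $ENV{aaa}. The function name scale_max_children and the default of 1 when aaa is unset are assumptions made for illustration.

```python
import os
import re

# Group 1 is the "pm.max_children = " prefix (kept), group 2 is the number (scaled).
PATTERN = re.compile(r"(pm\.max_children\s*=\s*)([0-9]+)")


def scale_max_children(conf_text: str, factor: int) -> str:
    """Multiply every pm.max_children value in conf_text by factor."""
    return PATTERN.sub(lambda m: m.group(1) + str(int(m.group(2)) * factor), conf_text)


if __name__ == "__main__":
    factor = int(os.environ.get("aaa", "1"))  # like $ENV{aaa}; assumed 1 if unset
    sample = "pm = dynamic\npm.max_children = 5\n"
    print(scale_max_children(sample, factor), end="")
```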

How to substitute shell variables in complex text files

Looking around, it turns out my system has an envsubst command, which is part of the gettext-base package (from GNU gettext).

So, this makes it easy:

envsubst < "source.txt" > "destination.txt"

Note that if you want to use the same file for both, you'll have to use something like sponge from moreutils, as suggested by Johnny Utahh: envsubst < "source.txt" | sponge "source.txt". (Otherwise the shell's output redirection truncates the file before envsubst has read it.)
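If envsubst isn't available, Python's string.Template can do a rough equivalent. This is a sketch, not a drop-in replacement. safe_substitute leaves unknown variables as they are (envsubst replaces them with an empty string), and Template treats $$ as an escaped $. Reading the whole file before writing is what makes it safe to use the same file as source and destination here. The function name envsubst_file is hypothetical.

```python
import os
from string import Template
from typing import Mapping, Optional


def envsubst_file(src: str, dst: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Replace $VAR and ${VAR} in src with values from env, then write the result to dst."""
    if env is None:
        env = os.environ
    with open(src, encoding="utf-8") as f:
        text = f.read()  # read everything first, so src and dst may be the same file
    result = Template(text).safe_substitute(env)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(result)


# Usage, analogous to: envsubst < "source.txt" > "destination.txt"
# envsubst_file("source.txt", "destination.txt")
```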

bash : Bad Substitution

The default shell (/bin/sh) under Ubuntu points to dash, not bash.

me@pc:~$ readlink -f $(which sh)
/bin/dash

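So if your script's first line is #!/bin/bash, making it executable with chmod +x your_script_file.sh and running it as ./your_script_file.sh should work fine. Running it explicitly with bash your_script_file.sh also works, whatever the hashbang line says.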

Running it with sh your_script_file.sh will not work because the hashbang line will be ignored and the script will be interpreted by dash, which does not support that string substitution syntax.

What's a concise way to check that environment variables are set in a Unix shell script?

Parameter Expansion

The obvious answer is to use one of the special forms of parameter expansion:

: ${STATE?"Need to set STATE"}
: ${DEST:?"Need to set DEST non-empty"}

Or, better (see section on 'Position of double quotes' below):

: "${STATE?Need to set STATE}"
: "${DEST:?Need to set DEST non-empty}"

The first variant (using just ?) requires STATE to be set, but STATE="" (an empty string) is OK. That is usually not exactly what you want, but it is the alternative, older notation.

The second variant (using :?) requires DEST to be set and non-empty.

If you supply no message, the shell provides a default message.
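To see the difference between ? and :? in practice, the following sketch runs both checks under bash with STATE and DEST unset, empty, or set. It assumes bash is installed, and the helper name run_check is hypothetical.

```python
import os
import shutil
import subprocess

BASH = shutil.which("bash") or "/bin/bash"  # assumption: bash is installed

NEED_STATE = ': "${STATE?Need to set STATE}"; echo ok'
NEED_DEST = ': "${DEST:?Need to set DEST non-empty}"; echo ok'


def run_check(script: str, **overrides: str) -> subprocess.CompletedProcess:
    """Run script with bash -c after removing STATE and DEST from the environment,
    then adding back any overrides (e.g. STATE="")."""
    env = {k: v for k, v in os.environ.items() if k not in ("STATE", "DEST")}
    env.update(overrides)
    return subprocess.run([BASH, "-c", script], env=env,
                          capture_output=True, text=True)


cases = [
    ("STATE unset", run_check(NEED_STATE)),
    ("STATE empty", run_check(NEED_STATE, STATE="")),
    ("DEST empty", run_check(NEED_DEST, DEST="")),
    ("DEST set", run_check(NEED_DEST, DEST="/tmp")),
]
for label, result in cases:
    print(f"{label:12} exit={result.returncode} "
          f"stdout={result.stdout.strip()!r} stderr={result.stderr.strip()!r}")
```

The unset STATE and empty DEST cases should exit with a non-zero status and the message on stderr. The other two should print ok.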

The ${var?} construct is portable back to Version 7 UNIX and the Bourne Shell (1978 or thereabouts). The ${var:?} construct is slightly more recent: I think it was in System III UNIX circa 1981, but it may have been in PWB UNIX before that. It is therefore in the Korn Shell, and in the POSIX shells, including specifically Bash.

It is usually documented in the shell's man page in a section called Parameter Expansion. For example, the bash manual says:

${parameter:?word}

Display Error if Null or Unset. If parameter is null or unset, the expansion of word (or a message to that effect if word is not present) is written to the standard error and the shell, if it is not interactive, exits. Otherwise, the value of parameter is substituted.

The Colon Command

I should probably add that the colon command simply has its arguments evaluated and then succeeds. It is the original shell comment notation (before '#' to end of line). For a long time, Bourne shell scripts had a colon as the first character. The C Shell would read a script and use the first character to determine whether it was for the C Shell (a '#' hash) or the Bourne shell (a ':' colon). Then the kernel got in on the act and added support for '#!/path/to/program' and the Bourne shell got '#' comments, and the colon convention went by the wayside. But if you come across a script that starts with a colon, now you will know why.


Position of double quotes

blong asked in a comment:

Any thoughts on this discussion? https://github.com/koalaman/shellcheck/issues/380#issuecomment-145872749

The gist of the discussion is:

… However, when I shellcheck it (with version 0.4.1), I get this message:

In script.sh line 13:
: ${FOO:?"The environment variable 'FOO' must be set and non-empty"}
^-- SC2086: Double quote to prevent globbing and word splitting.

Any advice on what I should do in this case?

The short answer is "do as shellcheck suggests":

: "${STATE?Need to set STATE}"
: "${DEST:?Need to set DEST non-empty}"

To illustrate why, study the following. Note that the : command doesn't echo its arguments (but the shell does evaluate the arguments). We want to see the arguments, so the code below uses printf "%s\n" in place of :.

$ mkdir junk
$ cd junk
$ > abc
$ > def
$ > ghi
$
$ x="*"
$ printf "%s\n" ${x:?You must set x} # Careless; not recommended
abc
def
ghi
$ unset x
$ printf "%s\n" ${x:?You must set x} # Careless; not recommended
bash: x: You must set x
$ printf "%s\n" "${x:?You must set x}" # Careful: should be used
bash: x: You must set x
$ x="*"
$ printf "%s\n" "${x:?You must set x}" # Careful: should be used
*
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
abc
def
ghi
$ x=
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
bash: x: You must set x
$ unset x
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
bash: x: You must set x
$

Note how the value in $x is expanded to first * and then a list of file names when the overall expression is not in double quotes. This is what shellcheck is recommending should be fixed. I have not verified that it doesn't object to the form where the expression is enclosed in double quotes, but it is a reasonable assumption that it would be OK.
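The same experiment can be scripted so that it is repeatable. This sketch assumes bash is installed, and printf_args is a hypothetical helper. It recreates the abc, def and ghi files in a temporary directory, sets x="*", and captures what printf receives for each quoting style.

```python
import os
import shutil
import subprocess
import tempfile

BASH = shutil.which("bash") or "/bin/bash"  # assumption: bash is installed


def printf_args(expansion: str) -> list:
    """Return the lines printed by  x="*"; printf "%s\\n" <expansion>
    when run in a scratch directory that contains the files abc, def and ghi."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("abc", "def", "ghi"):
            open(os.path.join(d, name), "w").close()  # like: > abc
        script = 'x="*"; printf "%s\\n" ' + expansion
        out = subprocess.run([BASH, "-c", script], cwd=d,
                             capture_output=True, text=True, check=True)
        return out.stdout.splitlines()


print(printf_args('${x:?You must set x}'))    # careless: * is globbed
print(printf_args('"${x:?You must set x}"'))  # careful: a single literal *
print(printf_args('${x:?"You must set x"}'))  # only the message is quoted: still globbed
```

The first and third calls should list the three file names, while the fully quoted form prints a single *.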

How to store /etc/passwd in a hash or array?

Store it in a hash with usernames as keys, and the split array as value:

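use strict;
use warnings;

my %passwd;

open my $fh, '<', '/etc/passwd' or die "Cannot open /etc/passwd: $!";
while (my $line = <$fh>) {
    chomp $line;
    # The -1 limit keeps trailing empty fields (e.g. an empty login-shell field)
    my @f = split /:/, $line, -1;
    $passwd{$f[0]} = [@f];    # username => reference to an array of its fields
}
close $fh;

# Field 3 is the numeric group ID
print $passwd{'Sjoerder'}[3], "\n" if exists $passwd{'Sjoerder'};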
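For comparison, here is a Python sketch of the same idea. The function names parse_passwd and load_passwd are hypothetical. str.split(':') keeps trailing empty fields, so every well-formed entry yields seven fields. For lookups by name, Python's standard pwd module (pwd.getpwnam) is usually the better choice, because it goes through the system's user database (NSS) rather than reading only the local file.

```python
from typing import Dict, List


def parse_passwd(text: str) -> Dict[str, List[str]]:
    """Map each username to the list of its colon-separated fields."""
    passwd: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(":")  # unlike Perl's default split, keeps trailing empty fields
        passwd[fields[0]] = fields
    return passwd


def load_passwd(path: str = "/etc/passwd") -> Dict[str, List[str]]:
    """Read and parse a passwd-format file."""
    with open(path, encoding="utf-8") as f:
        return parse_passwd(f.read())


sample = "root:x:0:0:root:/root:/bin/bash\nsjoerder:x:1000:1000::/home/sjoerder:\n"
users = parse_passwd(sample)
print(users["sjoerder"][3])  # field 3 is the GID
```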

Should I use quotes in environment path names?

Tip of the hat to @gniourf_gniourf and @chepner for their help.

tl;dr

To be safe, double-quote: it'll work in all cases, across all POSIX-like shells.

If you want to add a ~-based path, selectively leave the ~/ unquoted to ensure that ~ is expanded; e.g.: export PATH=~/"bin:$PATH".
See below for the rules of ~ expansion in variable assignments.
Alternatively, simply use $HOME inside a single, double-quoted string:

export PATH="$HOME/bin:$PATH"


NOTE: The following applies to bash, ksh, and zsh, but NOT to (mostly) strictly POSIX compliant shells such as dash; thus, when you target /bin/sh, you MUST double-quote the RHS of export.[1]

  • Double-quotes are optional, ONLY IF the literal part of your RHS (the value to assign) contains neither whitespace nor other shell metacharacters.
  • Whether the values of the variables referenced contain whitespace/metacharacters or not does not matter - see below.

    • Again: It does matter with sh, when export is used, so always double-quote there.

The reason you can get away without double-quoting in this case is that variable-assignment statements in POSIX-like shells interpret their RHS differently than arguments passed to commands, as described in section 2.9.1 of the POSIX spec:

  • Specifically, even though initial word-splitting is performed, it is only applied to the unexpanded (raw) RHS (that's why you do need quoting with whitespace/metacharacters in literals), and not to its results.

  • This only applies to genuine assignment statements of the form <name>=<value> in all POSIX-like shells, i.e., if there is no command name before the variable name; note that this includes assignments prepended to a command to define ad-hoc environment variables for it, e.g., foo=$bar cmd ....

  • Assignments in the context of other commands should always be double-quoted, to be safe:

    • With sh (in a (mostly) strictly POSIX-compliant shell such as dash) an assignment with export is treated as a regular command, and the foo=$bar part is treated as the 1st argument to the export builtin and therefore treated as usual (subject to word-splitting of the result, too).

      (POSIX doesn't specify any other commands involving (explicit) variable-assignment; declare, typeset, and local are nonstandard extensions).

    • bash, ksh, zsh, in an understandable deviation from POSIX, extend the assignment logic to export foo=$bar and typeset/declare/local foo=$bar as well. In other words: in bash, ksh, zsh, export/typeset/declare/local commands are treated like assignments, so that quoting isn't strictly necessary.

      • Perhaps surprisingly, dash, which also chose to implement the non-POSIX local builtin[2], does NOT extend assignment logic to it; it is consistent with its export behavior, however.
    • Assignments passed to env (e.g., env foo=$bar cmd ...) are also subject to expansion as a command argument and therefore need double-quoting - except in zsh.

      • That env acts differently from export in ksh and bash in that regard is due to the fact that env is an external utility, whereas export is a shell builtin.

        (zsh's behavior fundamentally differs from that of the other shells when it comes to unquoted variable references).
  • Tilde (~) expansion happens as follows in genuine assignment statements:

    • In addition to the ~ needing to be unquoted, as usual, it is also only applied:

      • If the entire RHS is ~; e.g.:

        • foo=~ # same as: foo="$HOME"
      • Otherwise: only if both of the following conditions are met:

        • if ~ starts the string or is preceded by an unquoted :
        • if ~ is followed by an unquoted /.
        • e.g.,

          foo=~/bin # same as foo="$HOME/bin"
          foo=$foo:~/bin # same as foo="$foo:$HOME/bin"
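The following sketch checks several of these rules against bash. It assumes bash is installed and uses a hypothetical HOME of /home/example so that the ~ results are predictable. bash_lines is a hypothetical helper. Other shells, dash in particular, may behave differently for the export case, as described above.

```python
import os
import shutil
import subprocess

BASH = shutil.which("bash") or "/bin/bash"  # assumption: bash is installed


def bash_lines(script: str) -> list:
    """Run script with bash -c and a fixed, hypothetical HOME; return its output lines."""
    env = dict(os.environ, HOME="/home/example")
    out = subprocess.run([BASH, "-c", script], env=env,
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


# Genuine assignment: the expanded value of $bar is NOT word-split.
print(bash_lines(r'bar="x   y"; foo=$bar; printf "[%s]\n" "$foo"'))
# bash also treats export foo=$bar as an assignment.
print(bash_lines(r'bar="x   y"; export foo=$bar; printf "[%s]\n" "$foo"'))
# Ordinary command argument: unquoted $bar IS word-split.
print(bash_lines(r'bar="x   y"; printf "[%s]\n" $bar'))
# Tilde expansion in assignments: unquoted ~ at the start or after ':'.
print(bash_lines(r'foo=~/bin; foo=$foo:~/bin; printf "%s\n" "$foo"'))
# A quoted ~ is left alone.
print(bash_lines(r'foo="~/bin"; printf "%s\n" "$foo"'))
```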

Example

This example demonstrates that in bash, ksh, and zsh you can get away without double-quoting, even when using export, but I do not recommend it.

#!/usr/bin/env bash
# or ksh or zsh - but NOT /bin/sh!

# Create env. variable with whitespace and other shell metacharacters
export FOO="b:c &|<> d"

# Extend the value - the double quotes here are optional, but ONLY
# because the literal part, 'a:', contains no whitespace or other shell metacharacters.
# To be safe, DO double-quote the RHS.
export FOO=a:$FOO # OK - $FOO now contains 'a:b:c &|<> d'

[1] As @gniourf_gniourf points out: Use of export to modify the value of PATH is optional, because once a variable is marked as exported, you can use a regular assignment (PATH=...) to change its value.

That said, you may still choose to use export, so as to make it explicit that the variable being modified is exported.

[2] @gniourf_gniourf states that a future version of the POSIX standard may introduce the local builtin.


