Do Here-Strings Undergo Word-Splitting

Do here-strings undergo word-splitting?

The change from splitting here-strings to not splitting them happened between bash-4.4-alpha and bash-4.4-beta. Quoting from CHANGES:

Bash no longer splits the expansion of here-strings, as the documentation has always said.

So the manuals of older Bash versions said that here-strings are not split, but those versions split them anyway.
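One way to check which behavior a given Bash has is to pass an unquoted variable containing repeated spaces through a here-string. This is only a sketch: the function name `here_string_preserves_spaces` is made up for illustration, and it assumes the default IFS.

```shell
# Hypothetical helper: succeeds if this Bash leaves an unquoted here-string
# intact (expected from bash-4.4-beta on), fails if it splits and rejoins it.
here_string_preserves_spaces() {
  local var='a   b' out
  out=$(cat <<< $var)          # deliberately unquoted
  [[ $out == 'a   b' ]]
}

if here_string_preserves_spaces; then
  echo "bash $BASH_VERSION: here-string was not split"
else
  echo "bash $BASH_VERSION: here-string was split and rejoined"
fi
```

Quoting the expansion (`<<< "$var"`) sidesteps the question on every version.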

Word splitting in Bash with IFS set to a non-whitespace character

The Bash manual describes word splitting as follows:

The shell scans the results of parameter expansion, command substitution,
and arithmetic expansion that did not occur within double quotes for word splitting.

When you pass the bare string one:two:three as an argument with IFS set to :, Bash doesn't do word splitting, because a bare string is not the result of parameter expansion, command substitution, or arithmetic expansion.

However, when the same string is assigned to a variable and the variable is passed to the script unquoted, word splitting does occur as it is a case of parameter expansion.
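Both cases can be reproduced in one place. In this sketch the `args` function is a hypothetical stand-in for the ./args script, which is not shown here, and `demo_ifs_colon` exists only to keep the IFS change local.

```shell
# Hypothetical stand-in for ./args: prints the argument count,
# then each argument in angle brackets.
args() {
  printf '%d args:' "$#"
  (( $# )) && printf ' <%s>' "$@"
  printf '\n'
}

demo_ifs_colon() {
  local IFS=:                  # scope the IFS change to this function
  local var=one:two:three
  args one:two:three           # literal word, not an expansion: no splitting
  args $var                    # unquoted parameter expansion: split on ':'
  args "$var"                  # quoted expansion: no splitting
}

demo_ifs_colon
# 1 args: <one:two:three>
# 3 args: <one> <two> <three>
# 1 args: <one:two:three>
```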

The same thing applies to these as well (command substitution):

$ ./args $(echo one:two:three)
3 args: <one> <two> <three>

$ ./args "$(echo one:two:three)"
1 args: <one:two:three>

As documented, the read command does word splitting on every line it reads, unless IFS has been set to an empty string.
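A short illustration of both cases, with variable names chosen only for this example:

```shell
# read splits the line it receives on IFS, even though the here-string is quoted.
IFS=: read -r first second third <<< "one:two:three"
printf '<%s> <%s> <%s>\n' "$first" "$second" "$third"   # <one> <two> <three>

# With IFS empty, read does no splitting and keeps surrounding whitespace.
IFS= read -r whole <<< "  one:two:three  "
printf '<%s>\n' "$whole"                                 # <  one:two:three  >
```

Because the assignment prefixes the read command, IFS keeps its previous value afterwards.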

using bash here string to STDOUT

You're misreading the description. It doesn't say word is appended to anything, it says it is "...supplied as a single string, with a newline appended, to the command...". So it's a newline being appended to word, not word being appended to anything.

word plus a newline is "supplied" to the command. That means it is made available (in the form of a file, or pipe, or something like that) for the command to read as input, either on FD 0 (standard input aka stdin) by default, or some other FD if a number is specified. So someCommand 3<<<"test" is essentially equivalent to:

echo "test" >tempfile
someCommand 3<tempfile

If the command doesn't read from that FD (3 in my example here), then the redirection doesn't do anything. The input just sits there unread. If there's no command there (as in just 1<<<test), then it certainly won't get read.
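To see the redirection actually being consumed, the command has to read from that FD. In this sketch, `read_fd3` is a hypothetical helper written for the purpose.

```shell
# Hypothetical helper that reads one line from FD 3 instead of stdin.
read_fd3() {
  local line
  IFS= read -r line <&3
  printf 'got: %s\n' "$line"
}

read_fd3 3<<< "test"                          # got: test

# The here-string supplies the word plus one newline: 4 + 1 bytes.
printf 'bytes: %d\n' "$(wc -c <<< "test")"    # bytes: 5
```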

FD 1 is a bit special, because it's reserved for standard output (stdout). Commands normally don't read from it at all, they write to it. Also, it's normally directed to your terminal device; if you redirect it, that replaces the connection to your terminal, so you're not going to see the output. And since it's now open for reading only, anything that even tries to write to it will run into trouble.

Suppose you run echo "something" 1<<<"test", that's essentially:

echo "test" >tempfile
echo "something" 1<tempfile

...so the second echo command tries to write "something" to stdout (FD 1), but since stdout is a file (or pipe or whatever) that's been opened for reading only (no writing allowed!), its attempt fails and you get "write error: Bad file descriptor" (the error message is sent to FD 2, stderr, so it successfully reaches your terminal).
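The failure shows up in echo's exit status as well. A small sketch, with the error message silenced so that only the status is inspected:

```shell
# Writing to an FD that a here-string opened for reading fails.
# The error message goes to stderr; it is discarded here to keep the demo quiet.
if echo "something" 1<<< "test" 2>/dev/null; then
  result="write succeeded"
else
  result="write failed"
fi
echo "$result"                 # write failed
```

The redirection applies only to that one echo, so the final echo writes to the normal stdout again.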

Now look at your second command, exec 1<<<test. Again, that's essentially:

echo "test" >tempfile
exec 1<tempfile

The exec command applies its redirects to the shell (and they're inherited by everything that shell runs). That means the shell and all commands it runs will be trying to send their normal output to a read-only file descriptor, which means almost everything will fail.

Bash word splitting mechanism

Read man bash. For assignment, it says

All values undergo tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, and quote removal [ ... ] Word splitting is not performed, with the exception of "$@" as explained below under Special Parameters. Pathname expansion is not performed.

Word splitting also does not happen in [[ ]] conditions:

Word splitting and pathname expansion are not performed on the words between the [[ and ]]
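Both statements are easy to confirm with a value that would otherwise be split and globbed. The variable names below are arbitrary.

```shell
var='one   two *'

# Assignment: the unquoted expansion is neither split nor globbed.
copy=$var
printf '<%s>\n' "$copy"        # <one   two *>

# [[ ]]: the unquoted left-hand side is likewise taken as a single word.
if [[ $var == 'one   two *' ]]; then
  echo "compared as one word"
fi
```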

Trying to split a string into two variables

This is a bug in Bash 4.2. See chepner's answer for a proper explanation.

It is about quotes. Use:

IFS=':' read var1 var2 <<< "$var"
                           ^    ^

instead of

IFS=':' read var1 var2 <<< $var

See result:

$ IFS=':' read var1 var2 <<< "$var"
$ echo "var1=$var1, var2=$var2"
var1=hello, var2=world

But

$ IFS=':' read var1 var2 <<< $var
$ echo "var1=$var1, var2=$var2"
var1=hello world, var2=

Why does read here incorrectly populate an array when used with an unquoted here-string?

This is a known bug in bash version 4.2 (and apparently some 4.1.x versions), which was fixed in version 4.3. See this bug-bash thread, which references this StackOverflow question; there is an explanation of the bug in @chepner's answer to that question.

What does this line in the Bash manual mean: Expansion is performed on the command line after it has been split into words?

When the manual for bash states this:

Expansion is performed on the command line after it has been split into words.

It means that the command line starts as one whole string, then, from left to right, that string is split on metacharacters:

metacharacter

A character that, when unquoted, separates words. One of the following:

| & ; ( ) < > space tab

What results from such splitting is called "words" or "tokens".

That is applied to the whole command line string.

In contrast, what is called "word splitting" is applied only to the result of "unquoted expansions" and uses only the characters in $IFS to split.

Thus, this command line prints only two lines:

$ a="word splitting"  b="single_word"

$ printf '%s\n' "$a" "$b"
word splitting
single_word

The command line printf '%s\n' "$a" "$b" was divided at spaces into four tokens (words):

one: printf, two: '%s\n', three: "$a", and four: "$b".

If the same command line is unquoted:

$ printf '%s\n' $a $b
word
splitting
single_word

Then the expansion of the third token (word), word splitting, is subject to word splitting and becomes two tokens (words): word and splitting (because IFS contains a space by default).

The fourth token is also subject to word splitting (as it is unquoted) but remains as one token (word) as it has no IFS characters.

Please note that "word splitting" is usually associated with globbing, in that both are applied to unquoted expansions. Globbing can be turned off with the command set -f, and word splitting can be avoided by quoting.
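The two controls are independent, as this sketch shows. `count_args` is a hypothetical helper, and the default IFS is assumed.

```shell
count_args() { echo "$#"; }    # hypothetical helper: prints its argument count

value='a * b'

set -f                         # globbing off; word splitting still happens
count_args $value              # 3
set +f                         # globbing back on

count_args "$value"            # 1: quoting prevents both splitting and globbing
```

Without set -f, the first call would report a count that depends on how many files are in the current directory, because the * would be expanded.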

Correctly allow word splitting of command substitution in bash

The safe way to capture output from one command and pass it to another is to temporarily capture the output in an array. This allows splitting on arbitrary delimiters and prevents unintentional splitting or globbing while capturing output as more than one string to be passed on to another command.

If you want to read a whitespace-separated string into an array, use read -a:

# -d '' makes read consume all lines (it returns non-zero at EOF, which is expected)
read -r -d '' -a names < <(docker ps | awk '{print $NF}' | grep -v NAMES)
printf 'Found name: %s\n' "${names[@]}"

Unlike the unquoted-expansion approach, this doesn't expand globs. Thus, foo[bar] can't be replaced with a filesystem entry named foob, or with an empty string if no such filesystem entry exists and the nullglob shell option is set. (Likewise, * will no longer be replaced with a list of files in the current directory).

To go into detail regarding behavior: read -r -a reads up to a delimiter, which is the first character of the option argument following -d if given (a NUL if that argument is empty) or a newline otherwise. It splits the result into fields based on the characters within IFS (a set which, by default, contains the newline, the tab, and the space), then assigns those fields to an array.

This behavior does not meaningfully vary based on shell-local configuration, except for IFS, which can be modified with its scope limited to the single command.

mapfile -t and readarray -t are similarly consistent in behavior, and likewise recommended if portability constraints do not prevent their use.
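A minimal sketch of the mapfile form, which stores one array element per input line. It needs bash 4.0 or later, and printf stands in for a real command here.

```shell
# printf is a placeholder for whatever command produces the lines.
mapfile -t lines < <(printf '%s\n' 'first line' 'second *' 'third')

printf 'count: %d\n' "${#lines[@]}"    # count: 3
printf 'line: %s\n' "${lines[@]}"      # spaces and * are kept literally
```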

By contrast, array=( $string ) is much more dependent on the shell's configuration and settings, and will behave badly if the shell's configuration is left at defaults:

- When using array=( $string ), if set -f is not set, each word created by splitting $string is evaluated as a glob, with further variation in behavior depending on the shopt settings nullglob (which causes a pattern that matches nothing to expand to an empty set, rather than the default of expanding to the glob expression itself), failglob (which causes a pattern that matches nothing to result in a failure), extglob, dotglob and others.
- When using array=( $string ), the value of IFS used for the split operation cannot be easily and reliably altered in a manner scoped to this single operation. By contrast, one can run IFS=: read to force read to split only on :s without modifying the value of IFS outside the scope of that single command; no equivalent for array=( $string ) exists without storing and re-setting IFS (which is an error-prone operation; some common idioms [such as assignment to oIFS or a similar variable name] operate contrary to intent in common scenarios, such as failing to reproduce an unset or empty IFS at the end of the block to which the temporary modification is intended to apply).
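The scoped form looks like this in practice. The sample string is invented for illustration and deliberately contains a space and a glob character.

```shell
string='a:b c:*'

# IFS=: applies only to this read; fields may hold spaces or glob characters.
IFS=: read -r -a fields <<< "$string"

printf '<%s>' "${fields[@]}"; echo     # <a><b c><*>
```

The * is stored as a literal element, and IFS has its previous value again as soon as read returns.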

BASH: Unavoidable wordsplitting in subcommand expansion?

what is the correct way to invoke/format a command, stored as a string in a BASH variable, containing a quoted literal string

You assume that there is no difference between

- typing a command directly into the terminal/script
- storing the exact same command string into a variable and then executing $variable.

But there are many differences! Commands typed directly into bash undergo more processing steps than anything else. These steps are documented in bash's manual:

1. Tokenization
   Quotes are interpreted. Operators are identified. The command is split into words at whitespace between unquoted parts. IFS is not used here.
2. Several expansions in a left-to-right fashion. That is, after one of these transformations has been applied to a token, bash continues to process its result with step 3. For example, you can safely use a home directory with a literal $ in its pathname, because the result of expanding ~ does not undergo variable expansion, so the $ remains uninterpreted.
   - brace expansion {1..9}
   - tilde expansion ~
   - parameter and variable expansion $var
   - arithmetic expansion $((...))
   - command substitution $(...), `...`
   - process substitution <()
3. Word splitting
   Split the result of unquoted expansions using IFS.
4. Filename expansion
   Also known as globbing: *, ?, [...] and more with shopt -s extglob.

Admittedly, this confuses most bash beginners. It seems to me that most of Stack Overflow's bash questions are about things related to these processing steps. Some classic examples are "for i in {1..$n} does not work" and "echo $var does not print what I assigned to var".
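The first of those classic examples follows directly from the order of the steps, as this small sketch shows. The C-style loop is one common alternative, not the only one.

```shell
n=3

# Brace expansion runs before parameter expansion, so {1..$n} is not a valid
# sequence at that point and stays literal; $n is substituted afterwards.
echo {1..$n}                   # {1..3}

# A C-style loop evaluates n arithmetically instead and counts from 1 to 3.
for (( i = 1; i <= n; i++ )); do
  printf '%d\n' "$i"
done
```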

Strings from unquoted variables only undergo some of the processing steps listed above. As described, these steps are "3. word splitting" and "4. filename expansion".

If you want to apply all processing steps to a string, you can use the eval command. However, this is strongly discouraged, as there are either better alternatives (if you define the command yourself) or huge security implications (if an outsider defines the command).

In your example, I don't see a reason to store the command at all. But if you really want to access it as a string somewhere else, then use an array:

command=(node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)
echo "${command[*]}" # print
"${command[@]}"      # execute
