When Setting IFS to Split on Newlines, Why Is It Necessary to Include a Backspace

When setting IFS to split on newlines, why is it necessary to include a backspace?

Because, as the Bash manual says regarding command substitution:

Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.

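So, by appending \b, the \n is no longer the trailing character of the output, and command substitution does not strip it. (IFS then contains both the newline and the backspace.)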

A cleaner way to do this could be to use $'' quoting, like this:

IFS=$'\n'
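
To see the difference for yourself, here is a minimal sketch. It uses printf rather than echo -e, and the variable names are made up for illustration; it stores the values in ordinary variables instead of IFS so that the current shell is not affected.

```shell
#!/usr/bin/env bash
# Command substitution deletes trailing newlines, so nothing is left here.
ifs_stripped=$(printf '\n')

# With a backspace appended, the newline is no longer trailing and survives.
ifs_guarded=$(printf '\n\b')

# ANSI-C quoting involves no command substitution, so nothing is stripped.
ifs_clean=$'\n'

printf 'stripped: %d char(s)\n' "${#ifs_stripped}"   # 0
printf 'guarded:  %d char(s)\n' "${#ifs_guarded}"    # 2 (newline + backspace)
printf 'clean:    %d char(s)\n' "${#ifs_clean}"      # 1 (newline only)
```

Note that the guarded variant also makes backspace a field separator, which is another reason to prefer $'\n'.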

shell - temp IFS as newline only. Why doesn't this work: IFS=$(echo -e '\n')

Update - changing my pseudo-comment to a real answer.

I think this answer should explain the behavior you are seeing. Specifically, the command substitution operators $() and backticks strip trailing newlines from the command output. However, the direct assignment in your second example doesn't do any command substitution, so it works as expected.

So I'm afraid to say I think the upvoted comment you refer to is incorrect.


I think the safest way to restore IFS is to set it in a subshell. All you need to do is put the relevant commands in parentheses ():

(
IFS=$'\n'
echo -n "$IFS" | od -t x1
for file in `printf 'one\ntwo two\nthree'`; do
echo "Found $file"
done
)

Of course invoking a subshell incurs a small delay, so performance needs to be considered if this is to be repeated many times.
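
As a rough check that the subshell really does leave the parent's IFS alone, something like the following could be used. The helper name ifs_hex is made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bytes of IFS as a compact hex string.
ifs_hex() { printf '%s' "$IFS" | od -An -t x1 | tr -d ' \n'; }

before=$(ifs_hex)

(
  # This assignment only exists inside the parentheses.
  IFS=$'\n'
  echo "inside subshell: $(ifs_hex)"   # 0a
)

after=$(ifs_hex)
echo "before: $before, after: $after"  # the two values are identical
```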


As an aside, be very careful: filenames can contain both \b and \n. I think just about the only characters they cannot contain are the slash and \0. At least that's what it says on this Wikipedia article.

$ touch $'12345\b67890'
$ touch "new
> line"
$ ls --show-control-chars
123467890 new
line
$
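
If filenames like these are a concern, one option (my suggestion, not something from the question) is to let the shell glob the names instead of splitting command output on IFS. A small sketch, using a throwaway directory created with mktemp:

```shell
#!/usr/bin/env bash
# Work in a temporary directory so nothing is left behind.
tmpdir=$(mktemp -d)
touch "$tmpdir"/$'12345\b67890' "$tmpdir"/$'new\nline'

count=0
# Glob results are never word-split, so each file is exactly one iteration,
# even when its name contains a newline or a backspace.
for f in "$tmpdir"/*; do
  count=$((count + 1))
  printf 'entry %d: %q\n' "$count" "${f##*/}"   # %q shows control chars escaped
done

echo "files seen: $count"   # 2
rm -rf "$tmpdir"
```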

Hi what is the meaning of IFS with single quote in next line?

It means you are specifying the IFS to use newline for splitting. This would be similar to doing:

IFS=$'\n'

The difference is that your way is POSIX compliant.

My sources for this answer are here and here

You may find that the different methods are preferred depending on which shell implementation you are using (I think that's the right term?)


NOTE: My answer is based purely on the last 10 minutes of research, I have no prior experience or knowledge with this.

What is the exact meaning of IFS=$'\n'?

Normally bash doesn't interpret escape sequences in string literals. So if you write \n or "\n" or '\n', that's not a linebreak - it's the letter n (in the first case) or a backslash followed by the letter n (in the other two cases).

$'somestring' is a syntax for string literals with escape sequences. So unlike '\n', $'\n' actually is a linebreak.
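
A quick way to convince yourself is to compare the lengths of the four forms. The variable names below are just for illustration.

```shell
#!/usr/bin/env bash
a=\n       # unquoted: the backslash is removed, leaving the letter n
b="\n"     # double quotes: backslash followed by n
c='\n'     # single quotes: backslash followed by n
d=$'\n'    # ANSI-C quoting: a real linebreak

printf '%d %d %d %d\n' "${#a}" "${#b}" "${#c}" "${#d}"   # 1 2 2 1
```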

Why can't I set IFS for one call of printf

In this command line:

IFS=',' printf "setup-x86.exe -q -p='%s'\n" "${deps[*]}"

printf does not expand "${deps[*]}". The shell does the expansion. In fact, that's pretty well always true. Although printf happens to be a shell builtin, it doesn't do anything special to its arguments, and you would get exactly the same behaviour with an external printf.

The syntax

envvar=value program arg1 arg2 arg3

causes the shell to add envvar=value to the list of environment variables provided to program, and the strings arg1, arg2, and arg3 to be made into an argv list for program. Before all that happens, the shell does its normal expansions of various types, which will cause shell variables referenced in the value and the three arguments to be replaced with their values. But the environment variable setting envvar=value is not part of the shell's execution environment.

Equally,

FOO=World echo "Hello, $FOO"

will not use FOO=World when expanding $FOO in the argument to echo. "Hello, $FOO" is expanded by the shell in the shell's execution environment, and then passed to echo as an argument, and FOO=World is passed to echo as part of its environment.
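
Here is a small sketch of that difference. It assumes FOO is not otherwise set (hence the unset), and uses sh -c as a stand-in for any program that reads its own environment.

```shell
#!/usr/bin/env bash
unset FOO

# "$FOO" is expanded by the current shell, where FOO is unset.
from_shell=$(FOO=World echo "Hello, $FOO")

# Single quotes delay expansion; the child sh finds FOO in its environment.
from_child=$(FOO=World sh -c 'echo "Hello, $FOO"')

printf '[%s]\n' "$from_shell"   # [Hello, ]
printf '[%s]\n' "$from_child"   # [Hello, World]
```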

Putting the variable setting in a separate command is completely different.

IFS=','; printf "setup-x86.exe -q -p='%s'\n" "${deps[*]}"

first sets the value of IFS in the shell's environment, before the shell expands the arguments of the printf command. When the shell then does its expansions in the arguments it will eventually pass to printf, it uses the value of IFS in order to expand the array deps[*]. In this case, IFS is not included in the environment variables passed to printf, unless IFS has previously been exported.

The use of IFS with the read builtin may seem confusing, but it is entirely consistent with the above. In the command

IFS=, read A B C

IFS=, is passed as part of the list of environment variables to read. read then consumes a line of input, and consults the value of IFS in its environment in order to figure out how to split the input line into words.
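
A minimal sketch of the read case, with made-up input fed through a Bash here-string. I added -r so backslashes in the input are left alone; that is my choice, not part of the command above.

```shell
#!/usr/bin/env bash
ifs_before="$IFS"

# IFS=, applies only to this one invocation of read.
IFS=, read -r A B C <<< "one,two,three"

printf 'A=%s B=%s C=%s\n' "$A" "$B" "$C"   # A=one B=two C=three

# The shell's own IFS is untouched afterwards.
[ "$IFS" = "$ifs_before" ] && echo "IFS unchanged"
```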

In order to change IFS for the purposes of parameter expansion in an argument, the change must be made in the shell's environment, which is a global change. Since you rarely want to globally change the value of IFS, a common idiom is to change it within a subshell created with ():

( IFS=,; printf "setup-x86.exe -q -p='%s'\n" "${deps[*]}"; )

probably does what you want.
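
For comparison, here is a sketch of both forms side by side. The contents of deps are hypothetical, and I use command substitution (which also runs in a subshell) so the results can be captured in variables.

```shell
#!/usr/bin/env bash
deps=(gcc make git)   # hypothetical package list

# Prefix assignment: "${deps[*]}" is expanded with the shell's current IFS,
# so with the default IFS the elements are joined with spaces.
not_joined=$(IFS=, printf '%s' "${deps[*]}")

# Separate assignment inside the subshell: the expansion now sees IFS=,
joined=$(IFS=,; printf '%s' "${deps[*]}")

echo "$not_joined"   # gcc make git
printf "setup-x86.exe -q -p='%s'\n" "$joined"   # setup-x86.exe -q -p='gcc,make,git'
```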

How can I have a newline in a string in sh?

If you're using Bash, you can use backslash-escapes inside of a specially-quoted $'string'. For example, adding \n:

STR=$'Hello\nWorld'
echo "$STR" # quotes are required here!

Prints:

Hello
World

If you're using pretty much any other shell, just insert the newline as-is in the string:

STR='Hello
World'
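
The two forms produce the same string, and the comment about quoting is easy to verify. A small sketch (variable names are mine, and it assumes the default IFS):

```shell
#!/usr/bin/env bash
bash_style=$'Hello\nWorld'
portable='Hello
World'

[ "$bash_style" = "$portable" ] && echo "same string"

# Without quotes the value is word-split on the newline and echo joins
# the pieces with a space; with quotes the newline is preserved.
unquoted=$(echo $bash_style)
quoted=$(echo "$bash_style")

printf '[%s]\n' "$unquoted"   # [Hello World]
printf '[%s]\n' "$quoted"     # prints Hello and World on separate lines
```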

Bash recognizes a number of other backslash escape sequences in the $'' string. Here is an excerpt from the Bash manual page:

Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
    \a     alert (bell)
    \b     backspace
    \e
    \E     an escape character
    \f     form feed
    \n     new line
    \r     carriage return
    \t     horizontal tab
    \v     vertical tab
    \\     backslash
    \'     single quote
    \"     double quote
    \nnn   the eight-bit character whose value is the octal value nnn (one to three digits)
    \xHH   the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
    \cx    a control-x character

The expanded result is single-quoted, as if the dollar sign had not
been present.

A double-quoted string preceded by a dollar sign ($"string") will cause
the string to be translated according to the current locale. If the
current locale is C or POSIX, the dollar sign is ignored. If the
string is translated and replaced, the replacement is double-quoted.
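
A few of these escapes in action, as a quick sketch (the variable names are only for illustration):

```shell
#!/usr/bin/env bash
tab=$'\t'            # horizontal tab
letter_hex=$'\x41'   # hexadecimal 41 is the letter A
letter_oct=$'\101'   # octal 101 is also the letter A
quote=$'it\'s'       # \' gives a single quote inside $'...'

printf '%s\n' "$letter_hex" "$letter_oct" "$quote"   # A, A, it's
printf 'a%sb\n' "$tab"                               # a and b separated by a tab
```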

Using Python to read newline characters correctly in Linux

Each text file in Linux consists of a series of lines, and each line ends with a terminating newline character. If the last line of a file doesn't end with a newline character, that last part is not strictly a line, and the file is not strictly a text file.
This is defined by the POSIX standard that Linux follows, not by the file system:

3.206 Line

A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

So it is not an issue with your code. It is just the POSIX convention for text files that Linux tools follow.
You can simply remove the terminating newline character from the string you read from the file.
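
You can see the convention at work from the shell: wc -l counts newline characters, so an unterminated last line is not counted. A sketch using a temporary file from mktemp:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)

# Two pieces of text, but only one terminating newline.
printf 'a\nb' > "$tmp"
incomplete=$(( $(wc -l < "$tmp") ))   # arithmetic strips any padding from wc

# Same text with the final newline in place.
printf 'a\nb\n' > "$tmp"
complete=$(( $(wc -l < "$tmp") ))

rm -f "$tmp"
echo "without final newline: $incomplete line(s)"   # 1
echo "with final newline:    $complete line(s)"     # 2
```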

Bash: Strip trailing linebreak from output

If your expected output is a single line, you can simply remove all newline characters from the output. It would not be uncommon to pipe to the tr utility, or to Perl if preferred:

wc -l < log.txt | tr -d '\n'

wc -l < log.txt | perl -pe 'chomp'

You can also use command substitution to remove the trailing newline:

echo -n "$(wc -l < log.txt)"

printf "%s" "$(wc -l < log.txt)"

If your expected output may contain multiple lines, you have another decision to make:

If you want to remove MULTIPLE newline characters from the end of the file, again use command substitution:

printf "%s" "$(< log.txt)"

If you want to strictly remove THE LAST newline character from a file, use Perl:

perl -pe 'chomp if eof' log.txt
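
To make the difference between the tr approach and command substitution concrete, here is a sketch that counts the bytes left over. The sample string is a made-up stand-in for the contents of log.txt.

```shell
#!/usr/bin/env bash
sample=$'a\nb\n\n'   # 5 bytes: two lines followed by an extra blank line

# tr -d deletes every newline, including the one between a and b.
all_removed=$(( $(printf '%s' "$sample" | tr -d '\n' | wc -c) ))

# Command substitution deletes only the trailing newlines.
trailing_removed=$(( $(printf '%s' "$(printf '%s' "$sample")" | wc -c) ))

echo "tr -d '\\n':          $all_removed bytes"        # 2
echo "command substitution: $trailing_removed bytes"   # 3
```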

Note that if you are certain you have a trailing newline character you want to remove, you can use head from GNU coreutils to select everything except the last byte. This should be quite quick:

head -c -1 log.txt

Also, for completeness, you can quickly check where your newline (or other special) characters are in your file using cat and the 'show-all' flag -A. The dollar sign character will indicate the end of each line:

cat -A log.txt

