When should xargs be preferred over while-read loops?
The thing with while loops is that they tend to process one item at a time, often when it's unnecessary. This is where xargs has an advantage: it can batch up the arguments to allow one command to process lots of items.
For example, a while loop:
pax> echo '1
2
3
4
5' | while read -r; do echo "$REPLY"; done
1
2
3
4
5
and the corresponding xargs:
pax> echo '1
2
3
4
5' | xargs echo
1 2 3 4 5
Here you can see that the lines are processed one-by-one with the while loop and all together with the xargs. In other words, the former is equivalent to echo 1 ; echo 2 ; echo 3 ; echo 4 ; echo 5 while the latter is equivalent to echo 1 2 3 4 5 (five invocations as opposed to one; for an external command, five processes as opposed to one). This really makes a difference when processing thousands or tens of thousands of lines, since process creation takes time.
It's mostly advantageous when using commands that can accept multiple arguments, since it reduces the number of individual processes started, making things much faster.
When I'm processing small files or the commands to run on each item are complicated (where I'm too lazy to write a separate script to give to xargs), I will use the while variant.
Where I'm interested in performance (large files), I will use xargs, even if I have to write a separate script.
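To make the difference in invocation counts concrete, here is a small bash sketch (an illustration, not a benchmark). It deliberately calls the external /bin/echo inside the loop, assuming it exists at that path as it does on typical Linux and macOS systems, because bash's builtin echo would not start a process at all:

```shell
#!/usr/bin/env bash
# Sketch: count how many times the command is invoked by each approach.
# Assumption: /bin/echo exists; it is used so that every call in the
# while loop really starts a separate process (builtin echo would not).

# while-read: one /bin/echo invocation per input line, one output line each
while_calls=$(( $(seq 1 1000 | while read -r n; do /bin/echo "$n"; done | wc -l) ))

# xargs: packs as many items as fit into each /bin/echo command line;
# each invocation prints one line, so the line count equals the invocation count
xargs_calls=$(( $(seq 1 1000 | xargs /bin/echo | wc -l) ))

echo "while-read invocations: $while_calls"
echo "xargs invocations:      $xargs_calls"
```

With 1000 short numbers, xargs typically fits everything into a single command line; the exact batch size depends on the system's argument-length limits.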
xargs split at newlines not spaces
Try:
printf %b 'ac s\nbc s\ncc s\n' | xargs -d '\n' bash /tmp/test.sh
You neglected to quote the \n passed to -d, which means that just n rather than \n was passed to xargs as the delimiter: the shell "ate" the \ (when the shell parses an unquoted string, \ functions as an escape character; if an ordinary character such as n follows the \, only that ordinary character is used).
Also heed @glenn jackman's advice to double-quote the $@ inside the script (or omit the in "$@" part altogether).
Also: xargs -d is a GNU extension, which, for instance, won't work on FreeBSD/macOS. To make it work there, see @glenn jackman's xargs -0-based solution.
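As a sketch of that portable route, the pipeline below converts newlines to NULs with tr and then uses xargs -0, which both GNU and BSD/macOS xargs support. The inline sh -c script is a hypothetical stand-in for /tmp/test.sh, whose content isn't shown here; it quotes "$@" so each line stays a single argument:

```shell
#!/usr/bin/env bash
# Sketch: split xargs input at newlines only, without the GNU-only -d option.
# tr turns each newline into a NUL, and xargs -0 splits on NULs only.
# The sh -c script is a hypothetical stand-in for /tmp/test.sh.
out=$(printf %b 'ac s\nbc s\ncc s\n' |
  tr '\n' '\0' |
  xargs -0 sh -c 'for a in "$@"; do printf "[%s]\n" "$a"; done' _)

# Each bracketed item keeps its embedded space.
printf '%s\n' "$out"
```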
Note that I'm using printf rather than echo to ensure that the \n instances in the string are interpreted as newlines in all Bourne-like shells:
In bash and ksh[1], echo defaults to NOT interpreting \-based escape sequences (you have to use -e to achieve that), unlike in zsh and strictly POSIX-compliant shells such as dash.
Therefore, printf is the more portable choice.
[1] According to the manual, ksh's echo builtin exhibits the same behavior as the host platform's external echo utility; while this may vary across platforms, the Linux and BSD/macOS implementations do not interpret \ escape sequences by default.
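The distinction that matters in the printf call above is between the %b format, which expands backslash escapes in its argument, and %s, which prints the argument verbatim. A quick check in bash:

```shell
#!/usr/bin/env bash
# %b expands backslash escape sequences such as \n in the argument itself.
with_b=$(printf %b 'ac s\nbc s\n')
# %s prints the argument as-is, so the backslashes survive.
with_s=$(printf %s 'ac s\nbc s\n')

printf '%s\n' "$with_b"   # two lines: "ac s" and "bc s"
printf '%s\n' "$with_s"   # one line containing literal \n sequences
```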
Looping through the content of a file in Bash
One way to do it is:
while read p; do
echo "$p"
done <peptides.txt
As pointed out in the comments, this has the side effects of trimming leading whitespace, interpreting backslash sequences, and skipping the last line if it's missing a terminating linefeed. If these are concerns, you can do:
while IFS="" read -r p || [ -n "$p" ]
do
printf '%s\n' "$p"
done < peptides.txt
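To check those three properties, the sketch below writes a temporary file (created with mktemp as a hypothetical stand-in for peptides.txt) that has leading whitespace, a backslash, and no final newline, then reads it back with the robust loop:

```shell
#!/usr/bin/env bash
# Sketch: the temp file is a hypothetical stand-in for peptides.txt.
tmp=$(mktemp)
# Line 1 has leading spaces, line 2 a backslash, line 3 no trailing newline.
printf '  indented\nback\\slash\nno newline at end' > "$tmp"

# IFS="" keeps leading whitespace, -r keeps backslashes,
# and || [ -n "$p" ] still processes the unterminated last line.
out=$(while IFS="" read -r p || [ -n "$p" ]; do
  printf '%s\n' "$p"
done < "$tmp")

rm -f "$tmp"
printf '%s\n' "$out"
```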
If the loop body may itself read from standard input, you can open the file using a different file descriptor:
while read -u 10 p; do
...
done 10<peptides.txt
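Here, 10 is just an arbitrary file descriptor number (different from 0, 1 and 2, which are standard input, output and error). Note that read -u is not POSIX; it is supported by bash, ksh and zsh.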
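Here is a small bash sketch of why that matters: the file (a hypothetical three-line temp file) is read on descriptor 10, while a read inside the body consumes answers from standard input, supplied here by a here-document, without stealing lines from the file:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"

out=$(
  while read -r -u 10 item; do
    # This read uses standard input (the here-document), not descriptor 10.
    read -r answer
    printf '%s=%s\n' "$item" "$answer"
  done 10< "$tmp" <<'EOF'
yes
no
maybe
EOF
)

rm -f "$tmp"
printf '%s\n' "$out"
```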
Loop over file names from `find`?
For this, use the read builtin:
sudo find . -name '*.mp3' |
while IFS= read -r filename
do
echo "$filename" # ... or any other command using $filename
done
Provided that your filenames don't contain the newline (\n) character, this should work fine.
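If filenames may contain newlines, a common bash alternative is to have find emit NUL-terminated names with -print0 and read them with read -d ''. The sketch below uses a hypothetical temporary directory and feeds the loop through process substitution rather than a pipe, so the count variable is still set after the loop (a piped while loop runs in a subshell in bash by default):

```shell
#!/usr/bin/env bash
# Sketch: NUL-delimited find output survives spaces and newlines in names.
dir=$(mktemp -d)
touch "$dir/plain.mp3" "$dir/with space.mp3" "$dir/with"$'\n'"newline.mp3" "$dir/skip.txt"

count=0
while IFS= read -r -d '' filename; do
  count=$((count + 1))
  printf 'found: %q\n' "$filename"   # %q shows the embedded newline unambiguously
done < <(find "$dir" -name '*.mp3' -print0)

rm -rf "$dir"
echo "matched $count files"
```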
Running multiple commands with xargs
cat a.txt | xargs -d $'\n' sh -c 'for arg do command1 "$arg"; command2 "$arg"; ...; done' _
...or, without a Useless Use Of cat:
<a.txt xargs -d $'\n' sh -c 'for arg do command1 "$arg"; command2 "$arg"; ...; done' _
To explain some of the finer points:
The use of "$arg" instead of % (and the absence of -I in the xargs command line) is for security reasons: passing data on sh's command-line argument list instead of substituting it into code prevents content that the data might contain (such as $(rm -rf ~), to take a particularly malicious example) from being executed as code.
Similarly, the use of -d $'\n' is a GNU extension which causes xargs to treat each line of the input file as a separate data item. Either this or -0 (which expects NULs instead of newlines) is necessary to prevent xargs from trying to apply shell-like (but not quite shell-compatible) parsing to the stream it reads. (If you don't have GNU xargs, you can use tr '\n' '\0' <a.txt | xargs -0 ... to get line-oriented reading without -d.)
The _ is a placeholder for $0, such that other data values added by xargs become $1 and onward, which happens to be the default set of values a for loop iterates over.
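Putting those pieces together, the sketch below replaces the command1/command2 placeholders with two illustrative commands (printing each line's length, then the line itself) and uses the tr + xargs -0 route so it also runs without GNU xargs. The input is a hypothetical temporary file standing in for a.txt; its last line shows that data containing $(...) is printed, not executed:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)   # hypothetical stand-in for a.txt
printf '%s\n' 'first line' 'second line' '$(echo injected)' > "$tmp"

# Each line arrives as a positional parameter, so it is never parsed as code.
out=$(tr '\n' '\0' < "$tmp" | xargs -0 sh -c '
  for arg do
    printf "len=%s\n" "${#arg}"   # illustrative command1
    printf "arg=%s\n" "$arg"      # illustrative command2
  done
' _)

rm -f "$tmp"
printf '%s\n' "$out"
```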