How to Use Xargs to Run a Function in a Command Substitution for Each Match

How can I use xargs to run a function in a command substitution for each match?

If you really need to do this (and you probably don't, but we can't help without a more representative sample), a better-practice approach might look like:

subs() { sed -E "s/(.)/\1\1/g" <<<"$1"; }
export -f subs

echo "ABC" | xargs bash -c 'for arg; do subs "$arg"; done' _
  • The use of echo "$(subs "$arg")" instead of just subs "$arg" adds nothing but bugs (consider what happens if one of your arguments is -n -- and that's assuming a relatively tame echo; they're allowed to consume backslashes even without a -e argument and to do all manner of other surprising things). You could do it above, but it slows your program down and makes it more prone to surprising behaviors; there's no point.
  • Running export -f subs exports your function to the environment, so it can be run by other instances of bash invoked as child processes (all programs invoked by xargs are outside your shell, so they can't see shell-local variables or functions).
  • Without -I -- which is to say, in its default mode of operation -- xargs appends arguments to the end of the command it's given. This permits a much more efficient usage mode, where instead of invoking one command per line of input, it passes as many arguments as possible to the smallest possible number of subprocesses.

    This also avoids major security bugs that can happen when using xargs -I in conjunction with bash -c '...' or sh -c '...'. (If you ever use -I% sh -c '...%...', then your filenames become part of your code, and are able to be used in injection attacks on your system).
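
As a rough illustration of the batching described above, the sketch below compares the default mode with -n 1. The inline script that reports its argument count is made up for the demonstration and is not part of the original answer.

```shell
#!/usr/bin/env bash
# Default mode: xargs appends as many arguments as fit to a single command.
batched=$(printf '%s\n' A B C | xargs bash -c 'echo "$# argument(s): $*"' _)
echo "$batched"
# 3 argument(s): A B C

# With -n 1, xargs starts one bash process per argument instead.
one_each=$(printf '%s\n' A B C | xargs -n 1 bash -c 'echo "$# argument(s): $*"' _)
echo "$one_each"
# 1 argument(s): A
# 1 argument(s): B
# 1 argument(s): C
```

In both cases the data arrives as positional parameters, never as part of the script text, which is what keeps this pattern safe from injection.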

How to use substitution in xargs?

  1. How do I do the substitution correctly?

You cannot use substitution the way you are trying to, because {} is not a bash variable (it is only part of xargs syntax), so bash cannot perform parameter expansion on it.

A better way to do it is to build a full bash command and pass each item to it as an argument (e.g. xargs -0 -I '{}' bash -c 'echo cp "$1" "${1%.txt}.dat"' - '{}'). This way bash sees the item as $1 and can apply its own substitution.
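
A minimal sketch of that idea, assuming NUL-delimited input and two made-up file names. It only echoes the cp commands it would run, and it uses the batching form (no -I), so one bash process handles every name.

```shell
#!/usr/bin/env bash
# NUL-delimited names survive spaces; the names here are hypothetical examples.
planned=$(printf '%s\0' 'notes.txt' 'my report.txt' |
  xargs -0 bash -c 'for f; do echo cp "$f" "${f%.txt}.dat"; done' _)
echo "$planned"
# cp notes.txt notes.dat
# cp my report.txt my report.dat
```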


  2. Does xargs run things in parallel, while a for loop runs them one by one?

A for loop runs its commands sequentially, and by default so does xargs. However, you can use the -P option of xargs to parallelize it. From the xargs man page:

   -P max-procs, --max-procs=max-procs
Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done. While xargs is running, you can send its process a SIGUSR1 signal to increase the number of commands to run simultaneously, or a SIGUSR2 to decrease the number. You cannot increase it above an implementation-defined limit (which is shown with --show-limits). You cannot decrease it below 1. xargs never terminates its commands; when asked to decrease, it merely waits for more than one existing command to terminate before starting another.

Please note that it is up to the called processes to properly manage parallel access to shared resources. For example, if more than one of them tries to print to stdout, the output will be produced in an indeterminate order (and very likely mixed up) unless the processes collaborate in some way to prevent this. Using some kind of locking scheme is one way to prevent such problems. In general, using a locking scheme will help ensure correct output but reduce performance. If you don't want to tolerate the performance difference, simply arrange for each process to produce a separate output file (or otherwise use separate resources).
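
One way to follow the man page's last suggestion is sketched below: each parallel job writes to its own file, and the results are combined afterwards. The scratch directory, the out.N file names, and the job text are assumptions made for this example.

```shell
#!/usr/bin/env bash
outdir=$(mktemp -d)   # scratch directory for the per-job output files

# "$outdir" is passed as $1; xargs appends one input item per run as $2.
printf '%s\n' 1 2 3 4 |
  xargs -n 1 -P 4 sh -c 'echo "job $2 finished" > "$1/out.$2"' _ "$outdir"

# Combine the results afterwards; the glob expands in sorted order,
# so the combined output is stable even though the jobs ran concurrently.
combined=$(cat "$outdir"/out.*)
echo "$combined"
rm -rf "$outdir"
```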

Calling shell functions with xargs

Exporting the function should do it (untested):

export -f echo_var
seq -f "n%04g" 1 100 | xargs -P 10 -I {} bash -c 'echo_var "$@"' _ {}

You can use the builtin printf instead of the external seq:

printf "n%04g\n" {1..100} | xargs -P 10 -I {} bash -c 'echo_var "$@"' _ {}

Also, using return 0 and exit 0 like that masks any error value that might be produced by the command preceding it. Also, if there's no error, it's the default and thus somewhat redundant.

@phobic mentions that the Bash command could be simplified to

bash -c 'echo_var "{}"'

moving the {} directly inside it. But it's vulnerable to command injection as pointed out by @Sasha.

Here is an example of why you should not use the embedded format:

$ echo '$(date)' | xargs -I {} bash -c 'echo_var "{}"'
Sun Aug 18 11:56:45 CDT 2019

Another example of why not:

echo '\"; date\"' | xargs -I {} bash -c 'echo_var "{}"'

This is what is output using the safe format:

$ echo '$(date)' | xargs -I {} bash -c 'echo_var "$@"' _ {}
$(date)

This is comparable to using parameterized SQL queries to avoid injection.

I'm using date in a command substitution or in escaped quotes here instead of the rm command used in Sasha's comment since it's non-destructive.
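
To try both forms side by side, here is a self-contained sketch. The echo_var definition is a hypothetical stand-in (the original function is not shown), and the payload uses echo INJECTED instead of date so the result is predictable.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the echo_var function from the question.
echo_var() { printf 'value: %s\n' "$1"; }
export -f echo_var

# Embedded form: the input becomes part of the script text and is executed.
unsafe=$(echo '$(echo INJECTED)' | xargs -I {} bash -c 'echo_var "{}"')
echo "$unsafe"
# value: INJECTED

# Safe form: the input is passed as data in "$@" and is never evaluated.
safe=$(echo '$(echo INJECTED)' | xargs -I {} bash -c 'echo_var "$@"' _ {})
echo "$safe"
# value: $(echo INJECTED)
```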

Make xargs execute the command once for each line of input

The following will only work if you do not have spaces in your input:

xargs -L 1
xargs --max-lines=1 # synonym for the -L option


From the man page:

-L max-lines
Use at most max-lines nonblank input lines per command line.
Trailing blanks cause an input line to be logically continued on
the next input line. Implies -x.
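
The sketch below shows what the caveat about spaces means in practice: with -L 1 the command runs once per line, but a line containing blanks is still split into several arguments. The second pipeline is one possible workaround (an assumption, not part of the original answer) that converts newlines to NUL bytes so each line stays a single argument.

```shell
#!/usr/bin/env bash
# Small helper script that reports how many arguments it received.
show='echo "$# arg(s): $*"'

per_line=$(printf '%s\n' 'one two' 'three' | xargs -L 1 sh -c "$show" _)
echo "$per_line"
# 2 arg(s): one two
# 1 arg(s): three

per_item=$(printf '%s\n' 'one two' 'three' | tr '\n' '\0' | xargs -0 -n 1 sh -c "$show" _)
echo "$per_item"
# 1 arg(s): one two
# 1 arg(s): three
```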

xargs with multiple commands

To start with, there is virtually no difference between:

find . | grep "file_for_print" | xargs echo

and

find . -name "file_for_print*"

except that the second one will not match filenames like this_is_not_the_file_for_print, and it will print the filenames one per line. It will also be a lot faster, because it doesn't need to generate and print the entire recursive directory structure just in order for grep to toss most of it away.

find . -name "file_for_print*"

is actually exactly the same as

find . -name "file_for_print*" -print

where the -print action prints each matched filename followed by a newline. If you don't provide find with any actions, it assumes you wanted -print. But it has more tricks up its sleeve than that. For example:

find . -name "file_for_print*" -exec cat {} \;

The -exec action causes find to execute the following command, up to the \;, replacing {} with each matching file name.

find does not limit itself to a single action. You can tell it to do however many you want. So:

find . -name "file_for_print*" -print -exec cat {} \;

will probably do pretty much what you want.
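
To try the combined -print and -exec actions without touching real files, a throwaway directory can be used. The directory and file contents below are made up for the demonstration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)                              # scratch directory
printf 'hello\n'   > "$tmp/file_for_print_1"  # matches the pattern
printf 'ignored\n' > "$tmp/other"             # does not match

# Print each matching name, then print that file's contents.
result=$(cd "$tmp" && find . -name "file_for_print*" -print -exec cat {} \;)
echo "$result"
rm -rf "$tmp"
```

The output should contain the matched path and the contents of that file, and nothing from the non-matching file.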

For lots more information on this very useful utility, type:

man find

or

info find

and read all about it.

Is there a better way to 'use arguments in a pipes sequence' with xargs?

I think it is more concise with GNU Parallel as follows:

find . -name "*.obj" -print0 | parallel -0 sha256sum {} \| tee {}.sha256

In addition, it is:

  • potentially more performant since it works in parallel across all CPU cores
  • debuggable by using parallel --dry-run ...
  • more informative by using --bar or --eta to get a progress bar or ETA for completion
  • more flexible, since it gives you many predefined variables, such as {.} meaning the current file minus its extension, {/} meaning "base name" of current file, {//} meaning directory of current file and so on
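
For comparison, roughly the same pipeline can be written with plain xargs by passing each file name to a small sh -c script as $1. This is a sketch under assumptions: sha256sum is available (as in GNU coreutils), and the sample file is made up.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'demo\n' > "$workdir/my model.obj"   # hypothetical sample file

# One sh per file; the name arrives as "$1", so spaces are harmless.
find "$workdir" -name '*.obj' -print0 |
  xargs -0 -n 1 sh -c 'sha256sum "$1" | tee "$1.sha256"' _

saved=$(cat "$workdir/my model.obj.sha256")
rm -rf "$workdir"
```

Adding -P to the xargs call would run several checksums at once, at the cost of interleaved output on the terminal.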

xargs with multiple arguments

None of the solutions given so far deals correctly with file names containing spaces. Some even fail if the file names contain ' or ". If your input files are generated by users, you should be prepared for surprising file names.

GNU Parallel deals nicely with these file names and gives you (at least) 3 different solutions. If your program takes 3 and only 3 arguments then this will work:

(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -N 3 my-program --file={1} --file={2} --file={3}

Or:

(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -X -N 3 my-program --file={}

If, however, your program takes as many arguments as will fit on the command line:

(echo a1.txt; echo b1.txt; echo c1.txt;
echo d1.txt; echo e1.txt; echo f1.txt;) |
parallel -X my-program --file={}
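
If GNU Parallel is not available, the fixed three-argument case can be approximated with xargs, as in the sketch below. It assumes one file name per input line, converts newlines to NUL bytes so names with spaces stay intact, and uses printf as a stand-in for my-program.

```shell
#!/usr/bin/env bash
# printf stands in for my-program; "a 1.txt" checks that spaces survive.
grouped=$(printf '%s\n' 'a 1.txt' b1.txt c1.txt a2.txt b2.txt c2.txt |
  tr '\n' '\0' |
  xargs -0 -n 3 sh -c 'printf "%s\n" "--file=$1 --file=$2 --file=$3"' _)
echo "$grouped"
# --file=a 1.txt --file=b1.txt --file=c1.txt
# --file=a2.txt --file=b2.txt --file=c2.txt
```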

Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ

How to properly pass filenames with spaces with $* from xargs to sed via sh?

Finally ended up with this single-line script:

sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | xargs -I% sh -c 'sed -i "" "s@%@`curl -s % | base64`@" "$@"' _ "$@"

which does properly support filenames with or without spaces.

How to apply shell command to each line of a command output?

It's probably easiest to use xargs. In your case:

ls -1 | xargs -L1 echo

The -L1 flag makes xargs run the command once for each input line. From the man page of xargs:

-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
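
When lines may contain spaces or quotes, a plain while read loop is a common alternative that avoids xargs word splitting entirely. The sketch below is not from the original answer, and the sample lines are made up.

```shell
#!/usr/bin/env bash
# IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
lines=$(printf '%s\n' 'first line' 'second  line' |
  while IFS= read -r line; do
    printf '[%s]\n' "$line"
  done)
echo "$lines"
# [first line]
# [second  line]
```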

