How can I use xargs to run a function in a command substitution for each match?
If you really need to do this (and you probably don't, but we can't help without a more representative sample), a better-practice approach might look like:
subs() { sed -E "s/(.)/\1\1/g" <<<"$1"; }
export -f subs
echo "ABC" | xargs bash -c 'for arg; do subs "$arg"; done' _
- The use of echo "$(subs "$arg")" instead of just subs "$arg" adds nothing but bugs (consider what happens if one of your arguments is -n -- and that's assuming a relatively tame echo; implementations are allowed to consume backslashes even without a -e argument and to do all manner of other surprising things). You could do it above, but it slows your program down and makes it more prone to surprising behavior; there's no point.
- Running export -f subs exports your function to the environment, so it can be run by other instances of bash invoked as child processes (all programs invoked by xargs are outside your shell, so they can't see shell-local variables or functions).
- Without -I -- which is to say, in its default mode of operation -- xargs appends arguments to the end of the command it's given. This permits a much more efficient usage mode: instead of invoking one command per line of input, it passes as many arguments as possible to the smallest possible number of subprocesses. This also avoids major security bugs that can happen when using xargs -I in conjunction with bash -c '...' or sh -c '...'. (If you ever use -I% sh -c '...%...', your filenames become part of your code and can be used in injection attacks on your system.)
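The batching described in the last point is easy to observe directly. In the sketch below (assuming a GNU or BSD xargs), the default mode packs all five arguments into a single echo invocation, while -n 1 forces one invocation per argument:

```shell
# Default mode: xargs passes all five arguments to one echo call,
# so everything appears on a single line.
seq 5 | xargs echo
# -n 1: one echo call per argument, five lines of output.
seq 5 | xargs -n 1 echo
```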
How to use substitution in xargs?
- How to do the substitution correctly?
You cannot use substitution the way you are trying to, because {} is not a bash variable (it is only part of xargs syntax), so bash cannot perform substitution on it.
A better way to do it is to build a full bash command and pass {} to it as an argument (e.g. xargs -0 -I {} bash -c 'echo cp "$1" "${1%.txt}.dat"' _ {} -- this way bash can do the substitution).
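As a concrete sketch of that pattern (the .txt file names here are made up for illustration), the substitution happens inside bash, with each file name passed as a positional parameter rather than spliced into the code:

```shell
# Each NUL-delimited name becomes $1 of a fresh bash -c invocation;
# ${1%.txt} strips the .txt suffix safely, even for hostile names.
printf '%s\0' a.txt b.txt |
  xargs -0 -I {} bash -c 'echo cp "$1" "${1%.txt}.dat"' _ {}
```

This prints the cp commands it would run; drop the echo once the output looks right.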
- I am curious: will xargs do things in parallel, whereas a for loop does them one by one?
A for loop will do things sequentially, and by default so will xargs. However, you can use the -P option of xargs to parallelize it. From the xargs man page:
-P max-procs, --max-procs=max-procs
Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done. While xargs is running, you can send its process a SIGUSR1 signal to increase the number of commands to run simultaneously, or a SIGUSR2 to decrease the number. You cannot increase it above an implementation-defined limit (which is shown with --show-limits). You cannot decrease it below 1. xargs never terminates its commands; when asked to decrease, it merely waits for more than one existing command to terminate before starting another.
Please note that it is up to the called processes to properly manage parallel access to shared resources. For example, if more than one of them tries to print to stdout, the output will be produced in an indeterminate order (and very likely mixed up) unless the processes collaborate in some way to prevent this. Using some kind of locking scheme is one way to prevent such problems. In general, using a locking scheme will help ensure correct output but reduce performance. If you don't want to tolerate the performance difference, simply arrange for each process to produce a separate output file (or otherwise use separate resources).
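A minimal sketch of -P in action, pairing it with -n as the man page advises (the sleep is made up, just to give the parallelism something to overlap):

```shell
# Four jobs, up to four at once; with -P 4 this takes about one
# second instead of four. Output order is not guaranteed.
seq 4 | xargs -n 1 -P 4 sh -c 'sleep 1; echo "done $0"'
```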
Calling shell functions with xargs
Exporting the function should do it (untested):
export -f echo_var
seq -f "n%04g" 1 100 | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}
You can use the builtin printf instead of the external seq:
printf "n%04g\n" {1..100} | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}
Also, using return 0 and exit 0 like that masks any error value that might be produced by the command preceding it. And if there's no error, it's the default anyway and thus somewhat redundant.
@phobic mentions that the bash command could be simplified to
bash -c 'echo_var "{}"'
moving the {} directly inside it. But it's vulnerable to command injection, as pointed out by @Sasha.
Here is an example why you should not use the embedded format:
$ echo '$(date)' | xargs -I {} bash -c 'echo_var "{}"'
Sun Aug 18 11:56:45 CDT 2019
Another example of why not:
echo '\"; date\"' | xargs -I {} bash -c 'echo_var "{}"'
This is what is output using the safe format:
$ echo '$(date)' | xargs -I {} bash -c 'echo_var "$@"' _ {}
$(date)
This is comparable to using parameterized SQL queries to avoid injection.
I'm using date in a command substitution (or in escaped quotes) here, instead of the rm command used in Sasha's comment, since it's non-destructive.
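Putting the whole safe pattern together (echo_var here is a stand-in definition for whatever your real function does):

```shell
# Define and export the function so child bash processes can see it.
echo_var() { echo "var: $1"; }
export -f echo_var
# The input reaches the function as a positional parameter and is
# never parsed as code, so the command substitution is printed
# literally instead of being executed.
echo '$(date)' | xargs -I {} bash -c 'echo_var "$@"' _ {}
```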
Make xargs execute the command once for each line of input
The following will only work if you do not have spaces in your input:
xargs -L 1
xargs --max-lines=1 # synonym for the -L option
from the man page:
-L max-lines
Use at most max-lines nonblank input lines per command line.
Trailing blanks cause an input line to be logically continued on
the next input line. Implies -x.
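For example (with hypothetical input), -L 1 gives one command invocation per input line, though note that each line is still split into separate arguments on whitespace:

```shell
# Two invocations: `echo one two`, then `echo three`.
printf '%s\n' 'one two' 'three' | xargs -L 1 echo
```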
xargs with multiple commands
To start with, there is virtually no difference between:
find . | grep "file_for_print" | xargs echo
and
find . -name "file_for_print*"
except that the second one will not match filenames like this_is_not_the_file_for_print, and it will print the filenames one per line. It will also be a lot faster, because it doesn't need to generate and print the entire recursive directory structure just for grep to toss most of it away.
find . -name "file_for_print*"
is actually exactly the same as
find . -name "file_for_print*" -print
where the -print action prints each matched filename followed by a newline. If you don't provide find with any actions, it assumes you wanted -print. But it has more tricks up its sleeve than that. For example:
find . -name "file_for_print*" -exec cat {} \;
The -exec action causes find to execute the following command, up to the \;, replacing {} with each matching file name.
find does not limit itself to a single action. You can tell it to do as many as you want. So:
find . -name "file_for_print*" -print -exec cat {} \;
will probably do pretty well what you want.
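A self-contained sketch of that combination, using a throwaway directory so it is safe to run anywhere:

```shell
# Create a scratch directory with one matching file.
d=$(mktemp -d)
echo hello > "$d/file_for_print_1"
# -print shows the matched name, -exec cat shows its contents.
find "$d" -name "file_for_print*" -print -exec cat {} \;
rm -rf "$d"
```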
For lots more information on this very useful utility, type:
man find
or
info find
and read all about it.
Is there a better way to 'use arguments in a pipes sequence' with xargs?
I think it is more concise with GNU Parallel as follows:
find . -name "*.obj" -print0 | parallel -0 sha256sum {} \| tee {}.sha256
In addition, it is:
- potentially more performant, since it works in parallel across all CPU cores
- debuggable by using parallel --dry-run ...
- more informative: use --bar or --eta to get a progress bar or an ETA for completion
- more flexible, since it gives you many predefined replacement strings, such as {.} meaning the current file minus its extension, {/} meaning the basename of the current file, {//} meaning the directory of the current file, and so on
xargs with multiple arguments
None of the solutions given so far deals correctly with file names containing spaces. Some even fail if the file names contain ' or ". If your input files are generated by users, you should be prepared for surprising file names.
GNU Parallel deals nicely with these file names and gives you (at least) 3 different solutions. If your program takes 3 and only 3 arguments then this will work:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -N 3 my-program --file={1} --file={2} --file={3}
Or:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -X -N 3 my-program --file={}
If, however, your program takes as many arguments as will fit on the command line:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo d1.txt; echo e1.txt; echo f1.txt;) |
parallel -X my-program --file={}
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
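For comparison, here is a rough plain-xargs equivalent of the -N 3 grouping, without GNU Parallel's quoting guarantees for arbitrary file names (my-program is hypothetical, so this just echoes the command it would run):

```shell
# xargs hands bash three arguments at a time; bash formats them
# into --file= options. Prints each command instead of running it.
printf '%s\n' a1.txt b1.txt c1.txt a2.txt b2.txt c2.txt |
  xargs -n 3 bash -c 'echo my-program --file="$1" --file="$2" --file="$3"' _
```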
How to properly pass filenames with spaces with $* from xargs to sed via sh?
I finally ended up with this one-line script:
sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | xargs -I% sh -c 'sed -i "" "s@%@`curl -s % | base64`@" "$@"' _ "$@"
which properly supports filenames with or without spaces. (Note that, as warned above, embedding % inside sh -c makes the extracted URLs part of the code, so only use this on input you trust.)
How to apply shell command to each line of a command output?
It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag controls how many input lines are used per command; -L1 makes xargs run the command once per non-empty input line. From the xargs man page:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
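If file names may contain spaces, line-based splitting like this is fragile; a common safer pattern, sketched here with find (since ls output is not meant for parsing), is a NUL-delimited stream:

```shell
# Scratch directory with a space in one file name.
d=$(mktemp -d)
touch "$d/a file.txt" "$d/b.txt"
# -print0 / -0 delimit on NUL, so spaces survive intact;
# -n 1 runs basename once per file.
find "$d" -type f -print0 | xargs -0 -n 1 basename | sort
rm -rf "$d"
```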