Using a Glob Expression Passed as a Bash Script Argument


Addressing the "why"

Assignments, as in var=foo*, don't expand globs -- that is, when you run var=foo*, the literal string foo* is put into the variable var, not the list of files matching foo*.

By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.

Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.
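The difference between the two cases can be seen in a throwaway directory. This is a minimal sketch; the scratch directory and the file names foo01 to foo03 are assumptions made for illustration:

```shell
#!/bin/bash
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch foo01 foo02 foo03

var=foo*                        # assignment: the glob is NOT expanded here
stored=$var                     # still the literal pattern
printf 'stored: %s\n' "$stored"

set -- foo*                     # command-line position: the glob IS expanded
arg_count=$#
first_arg=$1
printf 'arguments: %d, first: %s\n' "$arg_count" "$first_arg"
```

Here set -- stands in for invoking ./yourscript: it shows what the script would receive as its positional parameters.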

Running ./yourscript "foo*" works as a workaround because the unquoted expansion inside the script lets the glob be expanded at that later time. However, this is bad practice: glob expansion happens together with string-splitting (so relying on this behavior removes your ability to pass filenames containing characters found in IFS, typically whitespace), and it also means that you can't pass literal filenames that could also be interpreted as globs (if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1).
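Both pitfalls can be reproduced in a few lines. The sketch below assumes a scratch directory containing files named [1], 1 and "two words"; the variable names are invented for the demo:

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch '[1]' 1 'two words'

# Pitfall 1: a literal name that is also a valid glob gets rewritten.
pattern='[1]'
set -- $pattern                 # unquoted: [1] matches the file named 1
glob_result=$1
printf 'asked for [1], got: %s\n' "$glob_result"

# Pitfall 2: a name containing whitespace is split into several words.
name='two words'
set -- $name                    # unquoted: split on the space
split_count=$#
printf 'one name became %d words\n' "$split_count"
```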


Idiomatic Usage

The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:

#!/bin/bash
out_base=$1; shift

shopt -s nullglob # avoid generating an error if a directory has no .status

for dir; do # iterate over directories passed in $2, $3, etc
  for file in "$dir"/*.status; do # iterate over files ending in .status within those
    grep -e "string" "$file" # match a single file
  done
done >"${out_base}.extension"

If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:

#!/bin/bash
out_base=$1; shift

find "$@" -maxdepth 1 -type f -name '*.status' \
  -exec grep -h -e "string" -- /dev/null '{}' + \
  >"${out_base}.extension"

Both scripts above expect the globs passed not to be quoted on the invoking shell. Thus, usage is of the form:

# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map

This is considerably better practice than passing globs to your script (which is then required to expand them to retrieve the actual files to use); it works correctly with filenames containing whitespace (which the other practice doesn't), and with files whose names are themselves glob expressions.
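To illustrate the whitespace claim, here is a sketch in which a function stands in for the script body. The function name list_status and the directory names are hypothetical; only the loop structure is taken from the script above:

```shell
#!/bin/bash
shopt -s nullglob               # a directory without .status files yields nothing

# Hypothetical stand-in for the script body: print every .status file found.
list_status() {
  local dir file
  for dir; do
    for file in "$dir"/*.status; do
      printf '%s\n' "$file"
    done
  done
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir 'dir_a b_map' dir_c_map dir_empty_map
touch 'dir_a b_map/one.status' dir_c_map/two.status

# The glob is unquoted on the calling side; each directory arrives as one argument.
result=$(list_status dir_*_map)
printf '%s\n' "$result"
```

The directory containing a space is handled as a single argument, and the empty directory contributes no output.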


Some other points of note:

  • Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
  • for dir; do is precisely equivalent to for dir in "$@"; do, which iterates over arguments. Don't make the mistake of using for dir in $*; do or for dir in $@; do instead! These latter invocations combine each element of the list with the first character of IFS (which, by default, contains the space, the tab and the newline in that order), then split the resulting string on any IFS characters found within, then expand each component of the resulting list as a glob.
  • Passing /dev/null as an argument to grep is a safety measure: It ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can).
  • Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see the fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.
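The difference between the loop forms mentioned in the list can be checked directly. In this sketch the three counting functions are hypothetical helpers, each called with two arguments that contain a space:

```shell
#!/bin/bash
# Each function echoes how many loop iterations it performed.
count_implicit() {
  local n=0 arg
  for arg; do n=$((n + 1)); done
  echo "$n"
}
count_quoted() {
  local n=0 arg
  for arg in "$@"; do n=$((n + 1)); done
  echo "$n"
}
count_unquoted() {
  local n=0 arg
  for arg in $*; do n=$((n + 1)); done   # deliberately wrong, for comparison
  echo "$n"
}

implicit=$(count_implicit 'dir one' 'dir two')
quoted=$(count_quoted 'dir one' 'dir two')
unquoted=$(count_unquoted 'dir one' 'dir two')
printf 'for arg: %s, "$@": %s, $*: %s\n' "$implicit" "$quoted" "$unquoted"
```

The first two forms see two arguments; the unquoted form sees four words, because each name is split at its space.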

Glob as the argument of a shell function

lists all the files ending with .mp3 in an array ... there is no array involved in your question.

But to your problem: First, you want to pass to your function a wildcard pattern, but this is not what you are actually doing. testf *.mp3 expands the pattern before the function is invoked (this process is called filename generation), and your testf gets just a list of files as parameters. You can pass a pattern, but you have to ask the shell not to expand it:

testf '*.mp3'

In this case, your $1 will indeed contain the string *.mp3. However, your print ./$1 will still not work. The reason is that zsh, by default, does not apply filename generation to the result of a parameter expansion (which is the process where $1 is replaced by the string it contains). Again, you have to ask the shell explicitly to do so:

print ./${~1}
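The answer above uses zsh syntax (print and ${~1}). For comparison, here is a rough bash counterpart, offered only as a sketch for the case where the function really must receive a quoted pattern. Bash applies pathname expansion to an unquoted $1 by default, so no special flag is needed, but word splitting is switched off here so that matches containing spaces survive. The name testf comes from the question; the .mp3 files are invented:

```shell
#!/bin/bash
testf() {
  local IFS=                    # empty IFS: no word splitting, globbing still applies
  set -- $1                     # the pattern held in $1 is expanded here
  printf './%s\n' "$@"
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.mp3 'b c.mp3' notes.txt

listing=$(testf '*.mp3')        # quoted: the function receives the pattern itself
printf '%s\n' "$listing"
```

A pattern passed this way still cannot name a file literally called *.mp3, which is the limitation described in the first answer.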

How to assign a glob expression to a variable in a Bash script?

I think it is the order of expansions:

The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.

So if your variable is substituted, brace expansion doesn't take place anymore. This works for me:

eval ls $dirs

Be very careful with eval. It executes the string you give it verbatim, in the current shell. So if dirs contains f{m,k}t*; some_command, then some_command will be executed after ls has finished. Also note that brace expansion on its own passes /content/dev01 /content/dev02 to ls, whether they exist or not. Putting * after the braces turns each result into a pathname expansion, which omits non-existing paths when nullglob is set (by default, bash leaves an unmatched pattern in place as a literal word):

dirs=/content/{dev01,dev02}*

I'm not 100% sure about this, but it makes sense to me.
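The ordering can be verified without eval. The sketch below also shows an array as one possible alternative to eval; that alternative is not part of the answer above, and the scratch directory standing in for /content is an assumption:

```shell
#!/bin/bash
demo_root=$(mktemp -d)          # stands in for /content
mkdir "$demo_root/dev01_site"   # only a dev01 directory exists

# Braces that arrive via a variable are not expanded: one literal word.
dirs='/content/{dev01,dev02}'
set -- $dirs
from_variable=$#

# Braces typed directly are expanded before anything else: two words.
set -- /content/{dev01,dev02}
typed_directly=$#

# Array alternative to eval: braces expand first, then each glob is matched.
shopt -s nullglob               # patterns without matches expand to nothing
dirs_array=( "$demo_root"/{dev01,dev02}* )
existing=${#dirs_array[@]}

printf 'variable: %d, typed: %d, existing: %d\n' \
  "$from_variable" "$typed_directly" "$existing"
```

The array keeps each match as a separate element, so it can later be passed on as "${dirs_array[@]}" without a second round of parsing.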

Get unexpanded argument from bash command line


The only way to pass an argument unexpanded by the shell is to either quote it or to use appropriate escapes.
Specifically, to quote or escape the portions that the shell would try to expand.

However, this requires adding quotes around the arguments, so tab-completion cannot be used directly.

You don't need to add quotes around entire arguments.
It's enough to do that around characters that have special meaning in the shell.

For example, if you autocomplete this command line until this point:

python stuff.py file_00
                       ^ cursor is here, and you have many files,
                         for example file_001, file_002, ...

At this point, if you want to add a literal * to pass file_00* to the Python script without the shell interpreting it,
you can write like this:

python stuff.py file_00\*

Or like this:

python stuff.py file_00'*'

As a further example,
note that when the file pattern contains spaces,
tab completion will add the \ correctly, for example:

python stuff.py file\ with\ spaces\ 00

Here too, you can add the escaped * as usual:

python stuff.py file\ with\ spaces\ 00\*

In conclusion,
you can use tab completion naturally,
and escape only the special characters after the tab completion.
And then use the glob Python module to expand the glob parts in arguments.
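The effect of escaping only the * can be checked from the shell side alone. In this sketch the function first_arg is a hypothetical stand-in for python stuff.py; it simply reports the first argument it receives:

```shell
#!/bin/bash
# Hypothetical stand-in for "python stuff.py": report the first argument received.
first_arg() { printf '%s\n' "$1"; }

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file_001 file_002

escaped=$(first_arg file_00\*)      # backslash keeps the * literal
quoted=$(first_arg file_00'*')      # quoting only the * works too
expanded=$(first_arg file_00*)      # unprotected: the shell expands it first

printf 'escaped: %s\nquoted: %s\nexpanded: %s\n' "$escaped" "$quoted" "$expanded"
```

The first two calls deliver the pattern file_00* untouched, while the third delivers the first matching file name.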

Bash glob parameter only shows first file instead of all files

In bash and ksh you can iterate through all arguments except the last like this:

for f in "${@:1:$#-1}"; do
  echo "$f"
done

In zsh, you can do something similar:

for f in $@[1,${#}-1]; do
  echo "$f"
done

$# is the number of arguments and ${@:start:length} is substring/subsequence notation in bash and ksh, while $@[start,end] is subsequence in zsh. In all cases, the subscript expressions are evaluated as arithmetic expressions, which is why $#-1 works. (In zsh, you need ${#}-1 because $#- is interpreted as "the length of $-".)

In all three shells, you can use the ${x:start:length} syntax with a scalar variable, to extract a substring; in bash and ksh, you can use ${a[@]:start:length} with an array to extract a subsequence of values.
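A short bash sketch of the notations described above. The helper functions and file names are hypothetical, and "${@:$#}" is used here to pick out the last argument:

```shell
#!/bin/bash
# Print every argument except the last, one per line.
all_but_last() { printf '%s\n' "${@:1:$#-1}"; }

# Print only the last argument (the subsequence starting at position $#).
last_only() { printf '%s\n' "${@:$#}"; }

inputs=$(all_but_last 'a b.txt' c.txt out.txt)
output=$(last_only 'a b.txt' c.txt out.txt)

# The same notation on a scalar extracts a substring.
word=abcdef
middle=${word:1:3}

printf 'inputs:\n%s\noutput: %s\nmiddle: %s\n' "$inputs" "$output" "$middle"
```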

Passing regular expression as parameter

The if statement takes a command. [[ is one such command and grep is another, so writing [[ grep ... ]] is essentially as wrong as writing vim grep or cat grep. Just use:

if grep -q -e "$pattern"
then
...

instead.

The -q switch to grep will disable output, but set the exit status to 0 (success) when the pattern is matched, and 1 (failure) otherwise, and the if statement will only execute the then block if the command succeeded.

Using -q will also allow grep to exit as soon as the first line is matched.

And as always, remember to wrap your parameter expansions in double quotes, to avoid pathname expansion and word splitting.

Note that square brackets [...] will be interpreted by your calling shell, and you should escape them, or wrap the whole pattern in quotes.

It's always recommended to use single quotes, as the only special character is another single quote.

$ ./lab12.sh find '[Gg]reen'
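Put together, the pieces above might look like the following sketch. The helper file_matches and the sample file are assumptions; the original script lab12.sh is not shown in the answer, so this is not a reconstruction of it:

```shell
#!/bin/bash
# Hypothetical helper: report whether any line of the file matches the pattern.
file_matches() {
  local pattern=$1 file=$2
  if grep -q -e "$pattern" -- "$file"; then
    echo "match"
  else
    echo "no match"
  fi
}

sample=$(mktemp)
printf 'red\nGreen\nblue\n' >"$sample"

# Single quotes keep the calling shell from treating [Gg] as a glob.
green=$(file_matches '[Gg]reen' "$sample")
purple=$(file_matches '[Pp]urple' "$sample")
printf 'green: %s, purple: %s\n' "$green" "$purple"
```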

Expand a glob string and pass resulting arguments to GNU parallel

Try this:

string=./script*.sh

# expand * here; printf emits one name per line, so names containing spaces stay intact
printf '%s\n' $string | parallel
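The expansion step can also be done into an array first, which makes it easy to inspect what would be handed to parallel. This is a sketch under stated assumptions: the script names are invented, and the final hand-off to parallel is left out so that the example runs without it installed:

```shell
#!/bin/bash
shopt -s nullglob               # an unmatched pattern yields an empty array

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch script1.sh 'script2 copy.sh' other.txt

string='./script*.sh'
scripts=( $string )             # unquoted on purpose: the stored pattern is expanded here
script_count=${#scripts[@]}

# One name per line, which is the form parallel reads on standard input.
printf '%s\n' "${scripts[@]}"
```

Names that contain spaces remain single array elements, because the results of pathname expansion are not split again.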

