Match All Files Under All Nested Directories with Shell Globbing

In Bash 4 and later (with shopt -s globstar) and in zsh, you can use **/*, which matches everything except hidden files. To include hidden files, run shopt -s dotglob in Bash or setopt dotglob in zsh.

In ksh, set -o globstar enables it. I don't think there's a way to include dot files implicitly, but I think **/{.[^.],}* works.

Is there a globbing pattern to match by file extension, both PWD and recursively?

With shell globbing, you can match only directories by adding a / at the end of the glob, but there is no way to match only files (zsh being an exception).

Illustration:

With the given tree:

file.php
inc.php/include.php
lib/lib.php

Supposing that the shell supports the non-standard ** glob:

  • **/*.php/ expands to inc.php/

  • **/*.php expands to file.php inc.php inc.php/include.php lib/lib.php

  • For getting file.php inc.php/include.php lib/lib.php, you cannot use a glob.

    => with zsh it would be **/*.php(.)
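For comparison, the same distinction can be sketched in Python, which this page also covers further down. This is only an illustration and not part of the original answer: it rebuilds the example tree in a temporary directory (the helper name build_tree is hypothetical) and filters the recursive matches with Path.is_file(), which is roughly what the zsh (.) qualifier does.

```python
import tempfile
from pathlib import Path


def build_tree(root: Path) -> None:
    """Create the example tree: file.php, inc.php/include.php, lib/lib.php."""
    (root / "inc.php").mkdir()
    (root / "lib").mkdir()
    for name in ("file.php", "inc.php/include.php", "lib/lib.php"):
        (root / name).touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)

    # rglob('*.php') matches the directory inc.php as well as the files.
    all_matches = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.php")
    )

    # Keeping only regular files drops the directory inc.php.
    files_only = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.php")
        if p.is_file()
    )

print(all_matches)
print(files_only)
```

Note that is_file() follows symbolic links, so a link pointing at a regular file is kept as well.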

Standard work-around (any shell, any OS)

The POSIX way to recursively get the files that match a given standard glob and then apply a command to them is to use find -type f -name ... -exec ...:

  • ls -l <all .php files> would be:
find . -type f -name '*.php' -exec ls -l {} +
  • grep "find me" <all .php files> would be:
find . -type f -name '*.php' -exec grep "find me" {} +
  • cp <all .php files> ~/destination/ would be:
find . -type f -name '*.php' -exec sh -c 'cp "$@" ~/destination/' _ {} +

Remark: this one is a little trickier because ~/destination/ must come after the file arguments, and find's syntax doesn't allow find -exec ... {} ~/destination/ +

What expands to all files in current directory recursively?

This will work in Bash 4:

ls -l {,**/}*.ext

In order for the double-asterisk glob to work, the globstar option needs to be set (default: off):

shopt -s globstar

From man bash:


globstar
If set, the pattern ** used in a filename expansion context
will match all files and zero or more directories and
subdirectories. If the pattern is followed by a /, only
directories and subdirectories match.

Now I'm wondering if there might have once been a bug in globstar processing, because now using simply ls **/*.ext I'm getting correct results.

Regardless, I looked at the analysis kenorb did using the VLC repository and found some problems with that analysis and in my answer immediately above:

The comparisons to the output of the find command are invalid since specifying -type f doesn't include other file types (directories in particular) and the ls commands listed likely do. Also, one of the commands listed, ls -1 {,**/}*.* - which would seem to be based on mine above, only outputs names that include a dot for those files that are in subdirectories. The OP's question and my answer include a dot since what is being sought is files with a specific extension.

Most importantly, however, is that there is a special issue using the ls command with the globstar pattern **. Many duplicates arise since the pattern is expanded by Bash to all file names (and directory names) in the tree being examined. Subsequent to the expansion the ls command lists each of them and their contents if they are directories.

Example:

In our current directory is the subdirectory A and its contents:

A
└── AB
    └── ABC
        ├── ABC1
        ├── ABC2
        └── ABCD
            └── ABCD1

In that tree, ** expands to "A A/AB A/AB/ABC A/AB/ABC/ABC1 A/AB/ABC/ABC2 A/AB/ABC/ABCD A/AB/ABC/ABCD/ABCD1" (7 entries). If you do echo ** that's the exact output you'd get and each entry is represented once. However, if you do ls ** it's going to output a listing of each of those entries. So essentially it does ls A followed by ls A/AB, etc., so A/AB gets shown twice. Also, ls is going to set each subdirectory's output apart:

...
<blank line>
directory name:
content-item
content-item

So using wc -l counts all those blank lines and directory name section headings, which throws off the count even further.

This is yet another reason why you should not parse ls.

As a result of this further analysis, I recommend not using the globstar pattern in any circumstance other than iterating over a tree of files in this manner:

for entry in **
do
    something "$entry"
done

As a final comparison, I used a Bash source repository I had handy and did this:

shopt -s globstar dotglob
diff <(echo ** | tr ' ' '\n') <(find . | sed 's|\./||' | sort)
0a1
> .

I used tr to change spaces to newlines which is only valid here since no names include spaces. I used sed to remove the leading ./ from each line of output from find. I sorted the output of find since it is normally unsorted and Bash's expansion of globs is already sorted. As you can see, the only output from diff was the current directory . output by find. When I did ls ** | wc -l the output had almost twice as many lines.

How can I recursively find all files in current and subfolders based on wildcard matching?

Use find:

find . -name "foo*"

find needs a starting point, so the . (dot) points to the current directory.

How to ls all the files in the subdirectories using wildcard?

Three solutions:

Simple glob

ls */*.pdb

Recursive using bash

shopt -s globstar
ls **/*.pdb

Recursive using find

find . -type f -name '*.pdb'

How can I search sub-folders using glob.glob module?

In Python 3.5 and newer use the new recursive **/ functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I'd use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in fnmatch.filter(files, '*.txt')]

This walks your directories recursively and returns the absolute pathnames of all matching .txt files (absolute because path itself is absolute). In this specific case fnmatch.filter() may be overkill; you could also use an .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in files if f.endswith('.txt')]
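As a rough check that the two approaches agree, the sketch below runs both against a small temporary tree. The function names find_with_glob and find_with_walk and the file names are made up for the illustration; only glob.glob(..., recursive=True) and os.walk() come from the answer above.

```python
import glob
import os
import tempfile


def find_with_glob(root, pattern="*.txt"):
    """Recursive search using glob's ** support (Python 3.5+)."""
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))


def find_with_walk(root, suffix=".txt"):
    """Recursive search using os.walk and a suffix test."""
    return sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirnames, files in os.walk(root)
        for name in files
        if name.endswith(suffix)
    )


with tempfile.TemporaryDirectory() as root:
    # Hypothetical sample tree: three .txt files at different depths, one .log.
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel in ("a.txt", "sub/b.txt", "sub/deeper/c.txt", "sub/skip.log"):
        open(os.path.join(root, *rel.split("/")), "w").close()

    via_glob = [os.path.relpath(p, root) for p in find_with_glob(root)]
    via_walk = [os.path.relpath(p, root) for p in find_with_walk(root)]

print(via_glob)
print(via_walk)
```

The results only coincide for trees like this one: the glob version would also return a directory whose name ends in .txt and skips names beginning with a dot, while the os.walk version looks at file names only and includes dot files.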

Globbing pattern to include all files in the intermediate folder

There is no "webapp" in your directory structure :) Maybe you want something like this?

$ find . -wholename "**/web/libs/*"
./src2/web/libs/t
./src2/web/libs/tt
./src/web/libs/ttt

Bash - What is a good way to recursively find the type of all files in a directory and its subdirectories?

This may help: How to recursively list subdirectories in Bash without using "find" or "ls" commands?

That said, I modified it to accept user input as follows:

#!/bin/bash

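# Print every directory and regular file below "$1", depth-first.
recurse() {
    for i in "$1"/*; do
        if [ -d "$i" ]; then
            echo "dir: $i"
            recurse "$i"
        elif [ -f "$i" ]; then
            echo "file: $i"
        fi
    done
}

# Quote the argument so paths containing spaces survive.
recurse "$1"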

If you didn't want the files portion (which it appears you don't) then just remove the elif and line below it. I left it in as the original post had it also. Hope this helps.
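For readers who prefer Python, which other answers on this page use, a comparable traversal might look like the sketch below. It is not a translation endorsed by the original answer: the function name classify and the sample file names are hypothetical, and unlike the shell loop (whose * skips dot files by default) os.scandir() also returns hidden entries.

```python
import os
import tempfile


def classify(top):
    """Yield ('dir' | 'file', path) for everything below top, depth-first."""
    for entry in sorted(os.scandir(top), key=lambda e: e.name):
        if entry.is_dir():
            yield "dir", entry.path
            yield from classify(entry.path)
        elif entry.is_file():
            yield "file", entry.path


with tempfile.TemporaryDirectory() as top:
    # Hypothetical sample tree: docs/readme.txt and main.c.
    os.makedirs(os.path.join(top, "docs"))
    open(os.path.join(top, "docs", "readme.txt"), "w").close()
    open(os.path.join(top, "main.c"), "w").close()

    listing = [(kind, os.path.relpath(path, top)) for kind, path in classify(top)]

for kind, rel in listing:
    print(f"{kind}: {rel}")
```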

How to use glob() to find files recursively?

pathlib.Path.rglob

Use pathlib.Path.rglob from the pathlib module, which was introduced in Python 3.4.

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

If you don't want to use pathlib, you can use glob.glob('**/*.c', recursive=True). Don't forget the recursive keyword argument, and be aware that it can take an inordinate amount of time on large directories.

glob.glob does not match names beginning with a dot (.) by default, so to pick up hidden files on Unix-based systems, use the os.walk solution below.

os.walk

For older Python versions, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))
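The dot-file difference mentioned above can be seen with a small experiment. This is a sketch under stated assumptions: the file names visible.c and .hidden.c are invented for the illustration, and the tree lives in a temporary directory.

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical sample files: one ordinary, one hidden.
    for name in ("visible.c", ".hidden.c"):
        open(os.path.join(root, name), "w").close()

    # glob: '*' does not match a leading dot, so .hidden.c is skipped.
    globbed = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(root, "**", "*.c"), recursive=True)
    )

    # os.walk + fnmatch.filter: leading dots are not special to fnmatch.
    walked = sorted(
        name
        for _dirpath, _dirnames, filenames in os.walk(root)
        for name in fnmatch.filter(filenames, "*.c")
    )

print(globbed)
print(walked)
```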

