Shell: Find Files in a List Under a Directory

Shell: find files in a list under a directory

If filelist.txt has a single filename per line:

find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)

(The -f option means that grep searches for all the patterns in the given file.)

Explanation of <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):

The <( ... ) is called a process substitution, and is a little similar to $( ... ). It is equivalent to the following (though using the process substitution is neater and possibly a little faster):

sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt

The call to sed runs the commands s@^@/@, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.

  • s@^@/@ means put a / at the start of each filename. (The ^ means "start of line" in a regex.)
  • s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line", the second is just a literal $ which is then interpreted by grep to mean "end of line").

The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.

s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of . [ ] or *. Grep uses regexes, and those characters are special in a regex, but we want them matched literally, so we need to escape them (if we didn't escape them, then a filename like a.txt would also match names like abtxt).

As an example:

$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile

$ sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$

Grep then uses each line of that output as a pattern when it is searching the output of find.
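Putting the pieces together with the filelist.txt above (the contents of /dir here are invented purely for illustration), the pipeline keeps only the exact-name matches:

$ find /dir
/dir
/dir/abba.txt
/dir/file1.txt
/dir/file1.txt.backup
/dir/blah[2012].txt
$ find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
/dir/file1.txt
/dir/blah[2012].txt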

How to get the list of files in a directory in a shell script?


search_dir=/the/path/to/base/dir/
for entry in "$search_dir"/*
do
  echo "$entry"
done
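One caveat, sketched below rather than taken from the original answer: if the directory is empty, the glob stays unexpanded and the loop echoes the literal pattern /the/path/to/base/dir/*. Enabling nullglob makes the loop simply run zero times instead:

search_dir=/the/path/to/base/dir/
shopt -s nullglob    # an empty directory now yields zero loop iterations
for entry in "$search_dir"/*
do
  echo "$entry"
done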

How to list only files and not directories of a directory in Bash?

Using find:

find . -maxdepth 1 -type f

Using the -maxdepth 1 option ensures that you only look in the current directory (or, if you replace the . with some path, that directory). If you want a full recursive listing of all files in that and subdirectories, just remove that option.
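If you prefer to stay in the shell rather than call find, a rough non-recursive equivalent is the sketch below (like a bare *, it skips dotfiles, and unlike find -type f it also accepts symlinks that point to regular files):

for f in ./*; do
  [[ -f $f ]] && printf '%s\n' "$f"    # keep only regular files
done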

Linux: Search for a particular word in a list of files under a directory

grep is made for this.

Use:

  • grep myword * for a simple word
  • grep 'my sentence' * for a literal string
  • grep "I am ${USER}" * when you need variable replacement

You can also use regular expressions.

Add -r for recursive and -n to show the line number of matching lines.

And check man grep.
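For example, combining the recursive and line-number options (the file names and matching lines below are invented for illustration):

$ grep -rn 'myword' .
./notes/todo.txt:3:remember to replace myword everywhere
./scripts/run.sh:12:echo "myword"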

Shell script to list all files in a directory

Try this Shellcheck-clean pure Bash code for the "further plan" mentioned in a comment:

#! /bin/bash -p

# List all subdirectories of the directory given in the first positional
# parameter. Include subdirectories whose names begin with dot. Exclude
# symlinks to directories.

shopt -s dotglob
shopt -s nullglob
for d in "$1"/*/; do
  dir=${d%/}                  # Remove trailing slash
  [[ -L $dir ]] && continue   # Skip symlinks
  printf '%s\n' "$dir"
done

  • shopt -s dotglob causes shell glob patterns to match names that begin with a dot (.). (find does this by default.)
  • shopt -s nullglob causes shell glob patterns to expand to nothing when nothing matches, so looping over glob patterns is safe.
  • The trailing slash on the glob pattern ("$1"/*/) causes only directories (including symlinks to directories) to be matched. It's removed (dir=${d%/}) partly for cleanliness but mostly to enable the test for a symlink ([[ -L $dir ]]) to work.
  • See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I used printf instead of echo to print the subdirectory paths.
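Assuming the script is saved as list_subdirs.sh (the file name and the directory contents below are hypothetical), a run might look like:

$ chmod +x list_subdirs.sh
$ ./list_subdirs.sh /some/dir
/some/dir/.git
/some/dir/docs
/some/dir/src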

How to find all files containing specific text (string) on Linux

Do the following:

grep -rnw '/path/to/somewhere/' -e 'pattern'
  • -r or -R is recursive,
  • -n is line number, and
  • -w stands for match the whole word.
  • -l (lower-case L) can be added to just give the file name of matching files.
  • -e specifies the pattern to search for

Along with these, the --exclude, --include, and --exclude-dir flags can be used to make searching more efficient:

  • This will only search through those files which have .c or .h extensions:
grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
  • This will exclude from the search all files ending with the .o extension:
grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
  • For directories, it's possible to exclude one or more of them using the --exclude-dir parameter. For example, this will exclude the directories dir1/, dir2/ and all those matching *.dst/:
grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/somewhere/' -e "pattern"

This works very well for me, and achieves almost the same purpose as yours.

For more options, see man grep.
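These flags can also be combined; as a sketch, the following prints only the names (-l) of .py files containing the whole word pattern, skipping any build directory:

grep -rlw --include=\*.py --exclude-dir=build '/path/to/somewhere/' -e "pattern"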

Shell Script to list files in a given directory and if they are files or directories

Your line:

for file in $dir; do

will expand $dir just to a single directory string. What you need to do is expand that to a list of files in the directory. You could do this using the following:

for file in "${dir}/"* ; do

This will expand "${dir}/"* into the list of names in that directory. As Biffen points out, the quoting guarantees that the file list won't end up with split partial file names in file if any of them contain whitespace.

If you want to recurse into the directories in dir, then using find might be a better approach. Simply use:

for file in $( find ${dir} ); do

Note that while simple, this will not handle files or directories with spaces in them. Because of this, I would be tempted to drop the loop and generate the output in one go. This might be slightly different from what you want, but is likely to be easier to read and a lot more efficient, especially with large numbers of files. For example, to list all the directories:

find ${dir} -maxdepth 1 -type d

and to list the files:

find ${dir} -maxdepth 1 -type f

If you want to iterate into the directories below, then remove the -maxdepth 1 option.
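If you do need the loop and want it to survive whitespace (or even newlines) in names, one common Bash pattern, sketched here, is to feed a null-delimited find listing into read:

while IFS= read -r -d '' file; do
  if [[ -d $file ]]; then
    echo "$file is a directory"
  else
    echo "$file is a file"
  fi
done < <(find "${dir}" -mindepth 1 -maxdepth 1 -print0)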

How to get a list of the filenames of a specific folder in shell script?

When the argument to ls is a directory, it lists the filenames in the directory.

But when you use a wildcard, the shell expands the wildcard to all the filenames. So ls doesn't receive the directory as its argument, it receives all the filenames, and it lists them as given.

You can change to the directory and then list the matching files in the current directory:

(cd /a/b/c; ls *2021-08-18*) > filenames.txt

The parentheses make this run in a subshell, so the working directory of the original shell is unaffected.
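If you'd rather not depend on ls at all, the shell itself can expand the wildcard and printf can write one name per line; a minimal sketch of the same idea:

( cd /a/b/c && printf '%s\n' *2021-08-18* ) > filenames.txt

As with any glob, if nothing matches, the unexpanded pattern itself is written, so you may want to enable nullglob in the subshell first or check the result.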


