Shell: find files in a list under a directory
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is a little similar to $( ... ). The situation is equivalent to the following (but using the process substitution is neater and possibly a little faster):
sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s@^@/@, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.
s@^@/@ means put a / before each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line"; the second is just a literal $, which is then interpreted by grep to mean "end of line".)
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of ., [, ] or *. Grep uses regexes, and those characters are considered special, but we want them to be plain, so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
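As a further sketch, the whole pipeline can be exercised against a throwaway directory (all names below are invented for the demo; run with bash, since <( ... ) is a bash feature):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/a.txt" "$tmp/a.txt.backup" "$tmp/sub/a.txt"
printf 'a.txt\n' > "$tmp/filelist.txt"

# Matches .../a.txt and .../sub/a.txt, but not .../a.txt.backup:
find "$tmp" | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' "$tmp/filelist.txt")

rm -rf "$tmp"
```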
How to get the list of files in a directory in a shell script?
search_dir=/the/path/to/base/dir/
for entry in "$search_dir"/*
do
    echo "$entry"
done
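One hedged refinement of the loop above: if the directory is empty, the unmatched glob is passed through literally and the loop prints the pattern itself. Setting nullglob (a Bash option) avoids that. A minimal sketch, with a placeholder path:

```shell
#!/bin/bash
search_dir=/the/path/to/base/dir   # hypothetical path
shopt -s nullglob                  # empty dir => loop body never runs
for entry in "$search_dir"/*; do
    printf '%s\n' "$entry"
done
```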
How to list only files and not directories of a directory Bash?
Using find:
find . -maxdepth 1 -type f
Using the -maxdepth 1 option ensures that you only look in the current directory (or, if you replace the . with some path, that directory). If you want a full recursive listing of all files in that directory and its subdirectories, just remove that option.
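If those file names then need to be processed one by one, a NUL-delimited loop avoids breaking on names containing spaces. This is a sketch; -print0 and read -d '' assume GNU or BSD find plus Bash:

```shell
# Iterate over regular files in the current directory only,
# NUL-delimited so names with spaces survive intact.
find . -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    printf 'file: %s\n' "$f"
done
```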
Linux : Search for a Particular word in a List of files under a directory
grep is made for this.
Use:
- grep myword * for a simple word
- grep 'my sentence' * for a literal string
- grep "I am ${USER}" * when you need variable replacement
You can also use regular expressions.
Add -r for recursive and -n to show the line number of matching lines.
And check man grep.
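A quick throwaway demonstration (file names and contents are invented):

```shell
tmp=$(mktemp -d)
printf 'hello world\nmyword here\n' > "$tmp/one.txt"
printf 'nothing to see\n' > "$tmp/two.txt"

# With -r and -n, grep reports the matching file and line number:
grep -rn 'myword' "$tmp"
# one.txt, line 2, is the only hit.

rm -rf "$tmp"
```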
Shell script to list all files in a directory
Try this Shellcheck-clean pure Bash code for the "further plan" mentioned in a comment:
#! /bin/bash -p
# List all subdirectories of the directory given in the first positional
# parameter. Include subdirectories whose names begin with dot. Exclude
# symlinks to directories.
shopt -s dotglob
shopt -s nullglob
for d in "$1"/*/; do
    dir=${d%/}                 # Remove trailing slash
    [[ -L $dir ]] && continue  # Skip symlinks
    printf '%s\n' "$dir"
done
- shopt -s dotglob causes shell glob patterns to match names that begin with a dot (.). (find does this by default.)
- shopt -s nullglob causes shell glob patterns to expand to nothing when nothing matches, so looping over glob patterns is safe.
- The trailing slash on the glob pattern ("$1"/*/) causes only directories (including symlinks to directories) to be matched. It's removed (dir=${d%/}) partly for cleanliness but mostly to enable the test for a symlink ([[ -L $dir ]]) to work.
- See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I used printf instead of echo to print the subdirectory paths.
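To try the script's behaviour without saving it to a file, the same logic can be wrapped in a throwaway function and run against an invented tree (all names below are made up for the demo):

```shell
#!/bin/bash
list_subdirs() {
    shopt -s dotglob nullglob
    local d dir
    for d in "$1"/*/; do
        dir=${d%/}                 # Remove trailing slash
        [[ -L $dir ]] && continue  # Skip symlinks
        printf '%s\n' "$dir"
    done
}

tmp=$(mktemp -d)
mkdir "$tmp/.hidden" "$tmp/plain"
ln -s "$tmp/plain" "$tmp/link"
touch "$tmp/file"
list_subdirs "$tmp"   # prints .hidden and plain; skips the symlink and the file
rm -rf "$tmp"
```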
How to find all files containing specific text (string) on Linux
Do the following:
grep -rnw '/path/to/somewhere/' -e 'pattern'
- -r or -R is recursive,
- -n is line number, and
- -w stands for match the whole word.
- -l (lower-case L) can be added to just give the file name of matching files.
- -e is the pattern used during the search.
Along with these, the --exclude, --include, and --exclude-dir flags could be used for efficient searching:
- This will only search through those files which have .c or .h extensions:
grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
- This will exclude searching all the files ending with .o extension:
grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
- For directories it's possible to exclude one or more directories using the --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of those matching *.dst/:
grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/somewhere/' -e "pattern"
This works very well for me, achieving almost the same purpose as yours.
For more options, see man grep.
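The --include and --exclude-dir flags above (GNU grep features) can be combined; here is a sketch against an invented tree, where only src/a.c should survive all three filters:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf 'int pattern;\n' > "$tmp/src/a.c"
printf 'pattern here\n' > "$tmp/build/a.c"
printf 'pattern here\n' > "$tmp/src/notes.txt"

# .txt is filtered by --include, build/ is pruned by --exclude-dir,
# and -w restricts the hit to whole-word matches:
grep --include=\*.c --exclude-dir=build -rnw "$tmp" -e 'pattern'

rm -rf "$tmp"
```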
Shell Script to list files in a given directory and if they are files or directories
Your line:
for file in $dir; do
will expand $dir to just a single directory string. What you need to do is expand that to a list of files in the directory. You could do this using the following:
for file in "${dir}/"* ; do
This will expand the "${dir}/"* section into a name-only list of the current directory. As Biffen points out, this should guarantee that the file list won't end up with split partial file names in file if any of them contain whitespace.
If you want to recurse into the directories in dir, then using find might be a better approach. Simply use:
for file in $( find ${dir} ); do
Note that while simple, this will not handle files or directories with spaces in them. Because of this, I would be tempted to drop the loop and generate the output in one go. This might be slightly different from what you want, but is likely to be easier to read and a lot more efficient, especially with large numbers of files. For example, to list all the directories:
find "${dir}" -maxdepth 1 -type d
and to list the files:
find "${dir}" -maxdepth 1 -type f
If you want to iterate into directories below, then remove the -maxdepth 1 option.
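On a small invented tree, the two find commands behave like this. Note that -type d at depth 1 also reports the starting directory itself; adding -mindepth 1 suppresses it:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/d1"
touch "$tmp/f1" "$tmp/d1/f2"

find "$tmp" -maxdepth 1 -type d              # prints $tmp and $tmp/d1
find "$tmp" -mindepth 1 -maxdepth 1 -type d  # prints only $tmp/d1
find "$tmp" -maxdepth 1 -type f              # prints only $tmp/f1

rm -rf "$tmp"
```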
How to get a list of the filenames of a specific folder in shell script?
When the argument to ls is a directory, it lists the filenames in the directory.
But when you use a wildcard, the shell expands the wildcard to all the matching filenames. So ls doesn't receive the directory as its argument; it receives all the filenames, and it lists them as given.
You can change to the directory and then list the matching files in the current directory:
(cd /a/b/c; ls *2021-08-18*) > filenames.txt
The parentheses make this run in a subshell, so the working directory of the original shell is unaffected.
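An alternative sketch that avoids the subshell entirely, stripping the directory prefix with parameter expansion (the directory and file names below are placeholders standing in for /a/b/c and its contents):

```shell
tmp=$(mktemp -d)                   # stand-in for /a/b/c
touch "$tmp/log-2021-08-18.txt" "$tmp/log-2021-08-19.txt"

for f in "$tmp"/*2021-08-18*; do
    printf '%s\n' "${f##*/}"       # drop everything up to the last /
done > filenames.txt

cat filenames.txt                  # log-2021-08-18.txt
rm -rf "$tmp" filenames.txt
```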