Exclude List of Files from Find

I don't think find has an option for this, but you can build the command from your exclude list using printf:

find /dir -name "*.gz" $(printf "! -name %s " $(cat skip_files))

Which is the same as doing:

find /dir -name "*.gz" ! -name first_skip ! -name second_skip .... etc

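Alternatively, you can pipe the output of find into grep. Note that grep -F matches each entry as a substring anywhere in the path, not only against the file name: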

find /dir -name "*.gz" | grep -vFf skip_files
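The printf approach relies on word splitting, so it breaks as soon as an entry in skip_files contains a space. Here is a minimal sketch of a variant that adds one `! -name` test per line instead. The demo directory and file names are made up for illustration, and each entry is still treated as a -name glob pattern, so entries containing `*`, `?` or `[` would need escaping.

```shell
#!/bin/sh
# Hypothetical demo tree; all names below are illustrative only.
demo=$(mktemp -d)
touch "$demo/keep.gz" "$demo/first_skip.gz" "$demo/second skip.gz"
printf '%s\n' 'first_skip.gz' 'second skip.gz' > "$demo/skip_files"

# Collect one "! -name <entry>" triple per non-empty line of the list.
set --
while IFS= read -r name; do
    if [ -n "$name" ]; then
        set -- "$@" ! -name "$name"
    fi
done < "$demo/skip_files"

# "$@" expands to the collected tests, one argument per word.
result=$(find "$demo" -name '*.gz' "$@" | LC_ALL=C sort)
echo "$result"
```

Because every entry travels as its own argument, "second skip.gz" is excluded correctly even though it contains a space.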

How to exclude files from list

You can use grep to filter the output of find, then use xargs to process the resulting list.

find /20210111/ -type f -iname '*.zip' -print0 \
| grep -zvFf Exclude.list - \
| xargs -0 rm
  • The -print0, -z, and -0 options separate the filenames with a null byte, so filenames can contain any valid character. (Exclude.list itself is still newline-separated, so it cannot hold patterns containing literal newlines.)
  • grep's -F interprets the patterns as fixed strings instead of regexes.
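Since this pipeline ends in rm, it may be worth doing a dry run first. The sketch below assumes GNU grep (for -z) and uses a throwaway directory with made-up file names; it lists what would be deleted instead of deleting it.

```shell
#!/bin/sh
# Hypothetical demo tree; all names below are illustrative only.
demo=$(mktemp -d)
mkdir "$demo/archive"
touch "$demo/archive/keep me.zip" "$demo/archive/old.zip" "$demo/archive/new.zip"
printf '%s\n' 'keep me.zip' > "$demo/Exclude.list"

# Dry run: same find and grep stages, but the null-separated survivors are
# turned into lines for display instead of being handed to rm.
result=$(find "$demo/archive" -type f -iname '*.zip' -print0 \
    | grep -zvFf "$demo/Exclude.list" - \
    | tr '\0' '\n' \
    | LC_ALL=C sort)
echo "$result"

# Once the list looks right, replace the tr and sort stages with: xargs -0 rm
```

The entries in Exclude.list are matched as substrings of the whole path, so a short entry such as a.zip would also protect data.zip.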

Use find command but exclude files in two directories

Here's how you can specify that with find:

find . -type f -name "*_peaks.bed" ! -path "./tmp/*" ! -path "./scripts/*"

Explanation:

  • find . - Start find from current working directory (recursively by default)
  • -type f - Specify to find that you only want files in the results
  • -name "*_peaks.bed" - Look for files with the name ending in _peaks.bed
  • ! -path "./tmp/*" - Exclude all results whose path starts with ./tmp/
  • ! -path "./scripts/*" - Also exclude all results whose path starts with ./scripts/

Testing the Solution:

$ mkdir a b c d e
$ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
$ find . -type f ! -path "./a/*" ! -path "./b/*"

./d/4
./c/3
./e/a
./e/b
./e/5

You were pretty close: the -name option only considers the basename, whereas -path considers the entire path =)

Exclude list of file extensions from find in bash shell

find . -not -name \( '*.f' '*.F' '*.h' \)

is interpreted as

find
. # path to search
-not # negate next expression
-name \( # expression for files named "("
'*.f' '*.F' '*.h' \) # more paths to search?

leading to the error.

Since these are single-letter extensions, you can collapse them to a single glob:

find . -not -name '*.[fFh]'

but if they are longer, you have to write out the globs

find . -not -name '*.f' -not -name '*.F' -not -name '*.h'

or

find . -not \( -name '*.f' -o -name '*.F' -o -name '*.h' \)

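or switch to using regular expressions. With GNU find, the default regex syntax treats a bare ( and | as literal characters, so select an extended syntax with -regextype before the -regex test:

find . -regextype posix-extended -not -regex '.*\.(f|F|h)$'

Note that regular expressions in find are not part of the POSIX standard and might not be available in all implementations.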
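As a quick check that the glob-based forms agree, here is a small sketch run against a throwaway directory (the file names are made up). It uses `!`, the POSIX spelling of -not.

```shell
#!/bin/sh
# Hypothetical demo files; names are illustrative only.
demo=$(mktemp -d)
touch "$demo/a.f" "$demo/b.F" "$demo/c.h" "$demo/d.c" "$demo/e.txt"

# Single bracket glob for one-letter extensions.
by_glob=$(cd "$demo" && find . -type f ! -name '*.[fFh]' | LC_ALL=C sort)

# Grouped alternatives, which also works for longer extensions.
by_group=$(cd "$demo" && find . -type f ! \( -name '*.f' -o -name '*.F' -o -name '*.h' \) | LC_ALL=C sort)

echo "$by_glob"
echo "$by_group"
```

Both commands should print only ./d.c and ./e.txt.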

How do I exclude a directory when using `find`?

Use the -prune primary. For example, if you want to exclude ./misc:

find . -path ./misc -prune -o -name '*.txt' -print

To exclude multiple directories, OR them between parentheses.

find . -type d \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -o -name '*.txt' -print

And, to exclude directories with a specific name at any level, use the -name primary instead of -path.

find . -type d -name node_modules -prune -o -name '*.json' -print
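The explicit -print at the end matters more than it looks. A small sketch with a made-up directory tree shows both -prune forms, plus what happens when -print is left off: find then applies an implicit -print to the whole expression, so the pruned directory itself shows up in the output.

```shell
#!/bin/sh
# Hypothetical demo tree; all names below are illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/misc" "$demo/src/node_modules/pkg"
touch "$demo/notes.txt" "$demo/misc/skip.txt" "$demo/src/app.txt" \
      "$demo/src/node_modules/pkg/readme.txt"

# Prune one directory by path.
by_path=$(cd "$demo" && find . -path ./misc -prune -o -name '*.txt' -print | LC_ALL=C sort)

# Prune a directory name at any depth.
by_name=$(cd "$demo" && find . -type d -name node_modules -prune -o -name '*.txt' -print | LC_ALL=C sort)

# Without the explicit -print, ./misc itself is listed.
no_print=$(cd "$demo" && find . -path ./misc -prune -o -name '*.txt' | LC_ALL=C sort)

printf 'by_path:\n%s\nby_name:\n%s\nno_print:\n%s\n' "$by_path" "$by_name" "$no_print"
```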

Exclude range of directories in find command

You can use wildcards in the pattern for the option -not -path:

find ./ -type f -name "*.bz2" -not -path "./0*/*" -not -path "./1*/*"

This will exclude all directories starting with 0 or 1. Or even better:

find ./ -type f -name "*.bz2" -not -path "./[01]*/*"
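To try the bracket pattern safely, here is a short sketch against a throwaway tree. The directory names are made up, and it starts from `.` rather than `./`. Keep in mind that wildcards in -path also match `/`, so files nested deeper under an excluded directory are skipped too.

```shell
#!/bin/sh
# Hypothetical demo tree; all names below are illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/01/nested" "$demo/12" "$demo/25"
touch "$demo/01/a.bz2" "$demo/01/nested/deep.bz2" "$demo/12/b.bz2" "$demo/25/c.bz2"

# Skip every top-level directory whose name starts with 0 or 1.
result=$(cd "$demo" && find . -type f -name '*.bz2' -not -path './[01]*/*' | LC_ALL=C sort)
echo "$result"
```

Only ./25/c.bz2 should remain.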

How to ignore/exclude files from the output of find command using grep?

You can use grep's -v flag to achieve this. In order to exclude one such item from results:

your commands here | grep -v "cards.js"

And if you want to chain multiple grep matches, do this:

your commands here | grep -v -e "cards.js" -e "radios-and-checkboxes.css"

Use -w if you want the pattern to match only whole words. For "cards.js", that is: grep -v -w -e "cards.js". A single -w applies to all of the -e patterns. Keep in mind that the pattern is still a regular expression, so the . matches any character; add -F to treat the patterns as fixed strings.
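To see how -w differs from an exact match on the file name, here is a small sketch that feeds grep a made-up list of paths. The anchored pattern at the end is one possible way to match only a file named exactly cards.js; it is an illustration, not the only option.

```shell
#!/bin/sh
# Hypothetical find output; the paths are illustrative only.
files=$(printf '%s\n' './js/cards.js' './js/my-cards.js' './js/cardsXjs' './css/site.css')

# -w only requires non-word characters around the match, and "." is a regex
# wildcard, so my-cards.js and cardsXjs are filtered out as well.
with_w=$(printf '%s\n' "$files" | grep -v -w -e 'cards.js')

# Anchor on the final path component and escape the dot for an exact name match.
anchored=$(printf '%s\n' "$files" | grep -v -E '(^|/)cards\.js$')

printf 'with -w:\n%s\nanchored:\n%s\n' "$with_w" "$anchored"
```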

find directories but exclude list where directories have a space in name

You could read the exclude file into a Bash array and then craft a find command like this:

mapfile -t exclude < exclude.txt
# -mindepth 1 skips the starting directory itself
# -regextype egrep means alternation "|" does not have to be escaped
# -printf '%f\n' prints just the name without leading directories
find ./base_dir \
    -mindepth 1 \
    -type d \
    -regextype egrep \
    ! -iregex ".*/($(IFS='|'; echo "${exclude[*]}"))" \
    -printf '%f\n'

resulting in

sub_dir1
sub_dir4

For your example input, the -iregex test expands like this:

$ IFS='|'
$ echo "${exclude[*]}"
sub_dir2|sub dir3

so the regular expression for paths to exclude becomes

.*/(sub_dir2|sub dir3)

The change to IFS is limited to the command substitution.

The limitation to this is if the directories to be excluded contain characters that are special to regexes, you have to escape those, which can get messy. If you wanted to escape, for example, pipes, you could use

echo "${exclude[*]//|/\\|}"

in the command substitution, resulting in

sub_dir2|sub dir3|has\|pipe

where the directory has|pipe with a | in its name has its pipe properly escaped.

Exclude files and directories from find . function that match some regular expression

You have several options. Closest to what you asked, if you are using GNU find then you can use a negated -regex test to filter out the files you don't want to see. Since this matches against the whole path to each file (relative to one of the starting directories) you can write a regex that matches both paths ending with your file name and those having that name as an intermediate directory. For example,

find . -not -regex '\(.*/\)?bbb\(/.*\)?'

(Note that anchoring the pattern is unnecessary, as the test succeeds only if the pattern matches the whole path under consideration anyway.)

But better might be to use a negated filename test combined with the -prune action, something like this:

find . -not -name 'bbb' -o \( -prune -false \)

The -name test compares the base file name of each file considered with the specified shell pattern (glob), and the result is negated by the -not operator. The right-hand expression of the -o (logical or) operator is evaluated only if the left-hand expression evaluates to false, and in that case it performs the -prune action before ultimately evaluating to false itself. Thus, files with the given name are suppressed (by -false) and their descendants, if any, are not scanned at all (because of -prune). The -name test and the -prune action are specified by POSIX, but -not and -false are extensions found in GNU find and some other implementations; the portable spelling of -not is !, and the same filtering can be written with POSIX primaries only as find . -name 'bbb' -prune -o -print.
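Here is a small sketch that runs the -prune command from above next to a spelling that uses only POSIX primaries, on a made-up tree where bbb appears once as a directory and once as a plain file. The first command assumes a find that supports -not and -false, such as GNU find.

```shell
#!/bin/sh
# Hypothetical demo tree; all names below are illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/aaa/bbb/ccc" "$demo/ddd"
touch "$demo/aaa/file" "$demo/aaa/bbb/ccc/deep" "$demo/ddd/bbb"

# Form from the answer above (uses the -not and -false extensions).
by_ext=$(cd "$demo" && find . -not -name 'bbb' -o \( -prune -false \) | LC_ALL=C sort)

# Same filtering with POSIX primaries only.
by_posix=$(cd "$demo" && find . -name 'bbb' -prune -o -print | LC_ALL=C sort)

echo "$by_posix"
```

Both should list ., ./aaa, ./aaa/file and ./ddd, with nothing named bbb and nothing below it.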

find common files between two directories - exclude file extension

Python version:

EDIT: now supports multiple extensions

#!/usr/bin/python3

import glob
import os


def removeext(filename):
    # Strip everything from the first dot onward: "file1.fasta.profile" -> "file1".
    # partition() returns the name unchanged when it contains no dot.
    return filename.partition(".")[0]


setA = set(map(removeext, os.listdir('A')))
print("Files in directory A: " + str(setA))

setB = set(map(removeext, os.listdir('B')))
print("Files in directory B: " + str(setB))

# Base names present in A but not in B
setDiff = setA.difference(setB)
print("Files only in directory A: " + str(setDiff))

# Delete every file in A whose base name has no counterpart in B
for filename in setDiff:
    file_path = "A/" + filename + ".*"
    for file in glob.glob(file_path):
        print("file=" + file)
        os.remove(file)

Does pretty much the same as my bash version above.

  • list files in A
  • list files in B
  • get the list of differences
  • delete the differences from A
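The bash version mentioned above is not reproduced here. As a separate, minimal sketch of the same four steps in plain shell, the following does a dry run on a made-up pair of directories: it prints the files in A that have no counterpart in B instead of deleting them.

```shell
#!/bin/sh
# Hypothetical demo directories; all names below are illustrative only.
demo=$(mktemp -d)
mkdir "$demo/A" "$demo/B"
touch "$demo/A/file1.fasta.profile" "$demo/A/file4.fasta.profile" "$demo/B/file1.dssp"

only_in_a=$(
    for path in "$demo"/A/*; do
        name=${path##*/}    # drop the directory part
        stem=${name%%.*}    # drop everything from the first dot onward
        # Look for any file in B with the same stem; an unmatched glob stays literal.
        set -- "$demo"/B/"$stem".*
        if [ ! -e "$1" ]; then
            printf '%s\n' "$name"
        fi
    done
)
echo "Would remove from A: $only_in_a"
```

Swapping the printf for rm -- "$path" would turn the dry run into the actual deletion.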

Test output, done on Linux Mint, bash 4.4.20

mint:~/SO$ l
drwxr-xr-x 2 Nic3500 Nic3500 4096 May 10 10:36 A/
drwxr-xr-x 2 Nic3500 Nic3500 4096 May 10 10:36 B/

mint:~/SO$ l A
total 0
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file1.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file2.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:14 file3.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:36 file4.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file.fasta.profile
mint:~/SO$ l B
total 0
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:05 file1.dssp
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file2.dssp
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file3.dssp
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:05 file.dssp

mint:~/SO$ ./so.py
Files in directory A: {'file1', 'file', 'file3', 'file2', 'file4'}
Files in directory B: {'file1', 'file', 'file3', 'file2'}
Files only in directory A: {'file4'}
file=A/file4.fasta.profile

mint:~/SO$ l A
total 0
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file1.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file2.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:14 file3.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file.fasta.profile

