find files not in a list
Pipe your find
command through grep -Fxvf
:
find -mtime +7 -print | grep -Fxvf file.lst
What the flags mean:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
Exclude list of files from find
I don't think find
has an option like this, you could build a command using printf
and your exclude list:
find /dir -name "*.gz" $(printf "! -name %s " $(cat skip_files))
Which is the same as doing:
find /dir -name "*.gz" ! -name first_skip ! -name second_skip .... etc
Alternatively you can pipe from find
into grep
:
find /dir -name "*.gz" | grep -vFf skip_files
find filenames NOT ending in specific extensions on Unix?
Or without (
and the need to escape it:
find . -not -name "*.exe" -not -name "*.dll"
and to also exclude the listing of directories
find . -not -name "*.exe" -not -name "*.dll" -not -type d
or in positive logic ;-)
find . -not -name "*.exe" -not -name "*.dll" -type f
How to list only files and not directories of a directory Bash?
Using find
:
find . -maxdepth 1 -type f
Using the -maxdepth 1
option ensures that you only look in the current directory (or, if you replace the .
with some path, that directory). If you want a full recursive listing of all files in that and subdirectories, just remove that option.
find all files that do not begin with a specific char?
The -name
parameter does allow fnmatch style globing patterns, which is not a regular expression and has no negative patterns. So you need one of two alternatives (if you want to filter inside the find operation):
You have the option to use a RE instead. You need to be aware that regular expressions match against the whole path not the name only, so something like this should work:
find ./src -regex '.*/[^_/][^/]*\.php'
An alternative is to combine -name
with a negation expression (-a -not
means „and not“):
find ./src -name '*.php' -a -not -name '_*'
How do I find files that do not contain a given string pattern?
The following command gives me all the files that do not contain the pattern foo
:
find . -not -ipath '.*svn*' -exec grep -H -E -o -c "foo" {} \; | grep 0
Find the files existing in one directory but not in the other
diff -r dir1 dir2 | grep dir1 | awk '{print $4}' > difference1.txt
Explanation:
diff -r dir1 dir2
shows which files are only in dir1 and those only in dir2 and also the changes of the files present in both directories if any.diff -r dir1 dir2 | grep dir1
shows which files are only in dir1awk
to print only filename.
Find files that were not accessed in the last x days
You should use -atime
for access time and +10
for greater than 10 days:
find / -type f -atime +10 -name "*.png"
Fast way of finding lines in one file that are not in another?
You can achieve this by controlling the formatting of the old/new/unchanged lines in GNU diff
output:
diff --new-line-format="" --unchanged-line-format="" file1 file2
The input files should be sorted for this to work. With bash
(and zsh
) you can sort in-place with process substitution <( )
:
diff --new-line-format="" --unchanged-line-format="" <(sort file1) <(sort file2)
In the above new and unchanged lines are suppressed, so only changed (i.e. removed lines in your case) are output. You may also use a few diff
options that other solutions don't offer, such as -i
to ignore case, or various whitespace options (-E
, -b
, -v
etc) for less strict matching.
Explanation
The options --new-line-format
, --old-line-format
and --unchanged-line-format
let you control the way diff
formats the differences, similar to printf
format specifiers. These options format new (added), old (removed) and unchanged lines respectively. Setting one to empty "" prevents output of that kind of line.
If you are familiar with unified diff format, you can partly recreate it with:
diff --old-line-format="-%L" --unchanged-line-format=" %L" \
--new-line-format="+%L" file1 file2
The %L
specifier is the line in question, and we prefix each with "+" "-" or " ", like diff -u
(note that it only outputs differences, it lacks the ---
+++
and @@
lines at the top of each grouped change).
You can also use this to do other useful things like number each line with %dn
.
The diff
method (along with other suggestions comm
and join
) only produce the expected output with sorted input, though you can use <(sort ...)
to sort in place. Here's a simple awk
(nawk) script (inspired by the scripts linked-to in Konsolebox's answer) which accepts arbitrarily ordered input files, and outputs the missing lines in the order they occur in file1.
# output lines in file1 that are not in file2
BEGIN { FS="" } # preserve whitespace
(NR==FNR) { ll1[FNR]=$0; nl1=FNR; } # file1, index by lineno
(NR!=FNR) { ss2[$0]++; } # file2, index by string
END {
for (ll=1; ll<=nl1; ll++) if (!(ll1[ll] in ss2)) print ll1[ll]
}
This stores the entire contents of file1 line by line in a line-number indexed array ll1[]
, and the entire contents of file2 line by line in a line-content indexed associative array ss2[]
. After both files are read, iterate over ll1
and use the in
operator to determine if the line in file1 is present in file2. (This will have have different output to the diff
method if there are duplicates.)
In the event that the files are sufficiently large that storing them both causes a memory problem, you can trade CPU for memory by storing only file1 and deleting matches along the way as file2 is read.
BEGIN { FS="" }
(NR==FNR) { # file1, index by lineno and string
ll1[FNR]=$0; ss1[$0]=FNR; nl1=FNR;
}
(NR!=FNR) { # file2
if ($0 in ss1) { delete ll1[ss1[$0]]; delete ss1[$0]; }
}
END {
for (ll=1; ll<=nl1; ll++) if (ll in ll1) print ll1[ll]
}
The above stores the entire contents of file1 in two arrays, one indexed by line number ll1[]
, one indexed by line content ss1[]
. Then as file2 is read, each matching line is deleted from ll1[]
and ss1[]
. At the end the remaining lines from file1 are output, preserving the original order.
In this case, with the problem as stated, you can also divide and conquer using GNU split
(filtering is a GNU extension), repeated runs with chunks of file1 and reading file2 completely each time:
split -l 20000 --filter='gawk -f linesnotin.awk - file2' < file1
Note the use and placement of -
meaning stdin
on the gawk
command line. This is provided by split
from file1 in chunks of 20000 line per-invocation.
For users on non-GNU systems, there is almost certainly a GNU coreutils package you can obtain, including on OSX as part of the Apple Xcode tools which provides GNU diff
, awk
, though only a POSIX/BSD split
rather than a GNU version.
Bash script to find files in a list, copy them to dest, print files not found
Assuming that your list is separated by newlines; something like this should work
#!/bin/bash
dir=someWhere
dest=someWhereElse
toCopyList=filesomewhere
notCopied=filesomewhereElse
while read line; do
find "$dir" -name "$line" -exec cp '{}' $dest \; -printf "%f\n"
done < "$toCopyList" > cpList
#sed -i 's#'$dir'/##' cpList
# I used # instead of / in sed to not confuse sed with / in $dir
# Also, I assumed the string in $dir doesnot end with a /
cat cpList "$toCopyList" | sort | uniq -c | sed -nr '/^ +1/s/^ +1 +(.*)/\1/p' > "$notCopied"
# Will not work if you give wild cards in your "toCopyList"
Hope it helps
Related Topics
Limit on File Name Length in Bash
Linux Tool to Send Raw Data to a Tcp Server
Move Files to Directories Based on Extension
How to Access an Environment Variable in a .Desktop File's Exec Line
Why Disabling Interrupts Disables Kernel Preemption and How Spin Lock Disables Preemption
Rm: Cannot Remove: Permission Denied
How to Add a Line to a File in a Shell Script
What Do the .Eh_Frame and .Eh_Frame_Hdr Sections Store, Exactly
What Is the Use of _Iomem in Linux While Writing Device Drivers
Can't Change Tomcat 7 Heap Size
How to Convert Pptx Files to Jpg or Png (For Each Slide) on Linux
Hadoop: «Error:Java_Home Is Not Set»
Using Jq to Fetch Key Value from JSON Output
What Is the "Current" in Linux Kernel Source
Gcc Compilation "Cannot Compute Suffix of Object Files: Cannot Compile"