Linux: compute a single hash for a given folder & contents?
One possible way would be:
sha1sum path/to/folder/* | sha1sum
If there is a whole directory tree, you're probably better off using find and xargs. One possible command would be:
find path/to/folder -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum
And, finally, if you also need to take account of permissions and empty directories:
(find path/to/folder -type f -print0 | sort -z | xargs -0 sha1sum;
find path/to/folder \( -type f -o -type d \) -print0 | sort -z | \
xargs -0 stat -c '%n %a') \
| sha1sum
The arguments to stat will cause it to print the name of the file, followed by its octal permissions. The two finds run one after the other, which doubles the disk I/O: the first finds all file names and checksums their contents, and the second finds all file and directory names and prints each name and mode. The list of "file names and checksums", followed by the list of "file and directory names, with permissions", is then checksummed to give a single, smaller checksum.
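For readers who prefer Python, the same idea (relative names, permission bits and file contents, visited in sorted order) can be sketched as below. This is only a sketch under assumptions: the function name hash_tree and the record layout are made up for illustration, and the digest will not match the output of the shell pipeline.

```python
import hashlib
import os
import stat


def hash_tree(root):
    """Return a SHA-1 hex digest covering names, modes and file contents under root."""
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the result is reproducible
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            rel = os.path.relpath(path, root)
            mode = stat.S_IMODE(info.st_mode)
            # Record: relative name, octal permissions, size (hypothetical layout).
            header = f"{mode:o}\0{info.st_size}\0".encode("ascii")
            digest.update(os.fsencode(rel) + b"\0" + header)
            # Only regular files contribute contents; directories add name and mode.
            if stat.S_ISREG(info.st_mode):
                with open(path, "rb") as handle:
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
    return digest.hexdigest()
```

Unlike the two-pass shell version, this reads the tree once, and empty directories still change the result because every directory contributes its name and mode.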
How can I create a hash of a directory in Linux in Shell or Python?
Hash a directory's structure using filenames only
List all file paths in the directory (recursively), sort them (in case find returns them in an inconsistent order), hash the whole list with sha1sum, and print the hash:
find /my/dir -mindepth 1 -type f -print0 | sort -z | sha1sum
You can put that in a script, like:
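#!/bin/bash
# hashtree-names.sh - hash a dir's structure by filenames
# (files with same names are considered identical)
# Usage: hashtree-names.sh <dirname>
DIR=$1
# cd first so paths are relative to DIR; otherwise the DIR prefix enters the
# hash and two directories with the same file names never match.
cd "$DIR" || exit 1
find . -mindepth 1 -type f -print0 | sort -z | sha1sum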
And execute it on every dir under a large tree like so:
find /my/tree -mindepth 1 -type d -exec hashtree-names.sh {} \; | sort
Which will produce output similar to:
3cd8fea391f3055d9de3d6e05a422b6e97ce4204 *-
8cd93d83e9baeea479785fe0cc03c8b58aa293a3 *-
8cd93d83e9baeea479785fe0cc03c8b58aa293a3 *-
fe7dd981bb0d978608ba648eb3d38bb41f6cd956 *-
afc483808be60fbd48e716a7b916b5deaa9c78b5 *-
a518cfa27e7e9afbab2ba2209c80dbab0631736b *-
251f3cfc11eeccdfaf28142dadc5aa3aa4e2aec1 *-
251f3cfc11eeccdfaf28142dadc5aa3aa4e2aec1 *-
4a689e7c27733498c4ac5730f172c844cb6b21d1 *-
600a61b8c1a973aa6322ab4a7d57f7c07174e0ec *-
a401f27520252ae334625ca1b452396f0287f42d *-
e0b2d5f825f062d40f0f2490673888b5eb6c66fd *-
85a533625c5a38892d392f2ae9e7974e3eceaf6a *-
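Repeated hashes in output like this point to directories that contain the same set of file names. A Python sketch of the same names-only idea follows, using paths relative to each directory so that two directories in different places can match. The function names hash_names and group_by_structure are hypothetical, and unlike find -type f, os.walk also lists symlinks to files.

```python
import hashlib
import os
from collections import defaultdict


def hash_names(directory):
    """SHA-1 over the sorted, NUL-terminated relative paths of files under directory."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), directory))
    digest = hashlib.sha1()
    for rel in sorted(paths):
        digest.update(os.fsencode(rel) + b"\0")
    return digest.hexdigest()


def group_by_structure(tree):
    """Map each names-hash to the list of subdirectories of tree that share it."""
    groups = defaultdict(list)
    for dirpath, dirnames, _filenames in os.walk(tree):
        for name in dirnames:
            sub = os.path.join(dirpath, name)
            groups[hash_names(sub)].append(sub)
    return groups
```

Any entry of the returned mapping with more than one directory is a candidate duplicate by name; file contents are not compared.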
Hash a directory's structure, complete with file contents
See Vatine's and David Schmitt's answers to Linux: compute a single hash for a given folder & contents?.
EDIT 2017-01-27
- Code improvements: Added -mindepth 1 to find
Run a single command for every file in a folder - Linux
Maybe this solution would be easier:
find <path> -maxdepth 1 -type f -exec sed -i "s/\$/\t\$f/" '{}' \;
If you omit or increase -maxdepth, the command will search in subfolders too.
How to hash the (possibly recursive) contents of a directory
Computing a hash of a directory hierarchy is expensive, especially in a large git repository.
You should look at the API provided by git. There may be a way to ask git to tell you what it is changing.
You should look at OS X's file system events API. This can send your app a notification when something in a directory hierarchy changes.
https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/FSEvents_ProgGuide/Introduction/Introduction.html
Create separate MD5 file for each file recursively
find is the go-to tool for recursively doing anything with files:
find . -type f ! -name '*.md5' -execdir sh -c 'md5sum "$1" > "$1.md5"' _ {} \;
This picks files (not named '*.md5') and runs the given inline shell script with the filename as $1.
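The same per-file approach can be sketched in Python. This is an illustrative equivalent rather than a drop-in replacement: the function name write_md5_sidecars is hypothetical, and the line written to each sidecar file only imitates the layout md5sum prints.

```python
import hashlib
from pathlib import Path


def write_md5_sidecars(root):
    """Write '<name>.md5' next to every file under root; return the sidecars written."""
    written = []
    # sorted() collects the full listing first, so new .md5 files are not revisited.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.name.endswith(".md5"):
            continue
        digest = hashlib.md5()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        sidecar = path.with_name(path.name + ".md5")
        # md5sum-style line: digest, two spaces, file name.
        sidecar.write_text(f"{digest.hexdigest()}  {path.name}\n")
        written.append(sidecar)
    return written
```

Running it again overwrites the existing sidecar files and never creates a .md5.md5 file, because names ending in .md5 are skipped.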
How can I calculate a hash for a filesystem-directory using Python?
This Recipe provides a nice function to do what you are asking. I've modified it to use the MD5 hash instead of SHA1, as your original question asks:
def GetHashofDirs(directory, verbose=0):
    import hashlib, os
    SHAhash = hashlib.md5()
    if not os.path.exists(directory):
        return -1
    try:
        for root, dirs, files in os.walk(directory):
            dirs.sort()  # walk subdirectories in a fixed order
            for names in sorted(files):
                if verbose == 1:
                    print('Hashing', names)
                filepath = os.path.join(root, names)
                try:
                    f1 = open(filepath, 'rb')
                except OSError:
                    # You can't open the file for some reason; skip it
                    continue
                with f1:
                    while True:
                        # Read file in as little chunks
                        buf = f1.read(4096)
                        if not buf:
                            break
                        # Feed in the hex digest of each chunk, as the recipe does
                        SHAhash.update(hashlib.md5(buf).hexdigest().encode('ascii'))
    except Exception:
        import traceback
        # Print the stack traceback
        traceback.print_exc()
        return -2
    return SHAhash.hexdigest()
You can use it like this:
print(GetHashofDirs('folder_to_hash', 1))
The output looks like this, as it hashes each file:
...
Hashing file1.cache
Hashing text.txt
Hashing library.dll
Hashing vsfile.pdb
Hashing prog.cs
5be45c5a67810b53146eaddcae08a809
The function returns the hash as its value; in this case, 5be45c5a67810b53146eaddcae08a809.
MD5-Checksum hashing with powershell for a whole directory
EDIT: Here's an alternate method that is consistent even if all the files are moved/copied to another location. This one uses the hashes of all files to create a "master hash". It takes longer to run for obvious reasons but will be more reliable.
$HashString = (Get-ChildItem C:\Temp -Recurse | Get-FileHash -Algorithm MD5).Hash | Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
Original, faster but less robust, method:
$HashString = Get-ChildItem C:\script\test\TestFolders -Recurse | Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
This could be condensed into one line if wanted, although it starts getting harder to read:
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]"$(Get-ChildItem C:\script\test\TestFolders -Recurse|Out-String)"))
Basically it creates a memory stream with the information from Get-ChildItem and passes that to Get-FileHash.
Not sure if this is a great way of doing it, but it's one way :-)
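The "master hash" idea (hash every file, then hash the combined list of hashes) can also be sketched in Python. One assumption differs from the PowerShell version: the per-file digests are sorted before being combined, so the result does not depend on listing order, file names or location. The names file_md5 and master_hash are hypothetical.

```python
import hashlib
import os


def file_md5(path):
    """Return the MD5 hex digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def master_hash(root):
    """MD5 over the sorted per-file MD5 digests; names and locations are ignored."""
    digests = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symlinks and special files
                digests.append(file_md5(path))
    combined = "\n".join(sorted(digests)).encode("ascii")
    return hashlib.md5(combined).hexdigest()
```

Because names are ignored, renaming or moving files inside the tree leaves the result unchanged; only adding, removing or editing file contents changes it.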
Is there a way of making an md5sum of all files in subfolders?
With bash:
shopt -s globstar
md5sum ** >/tmp/hash.md5
Ignore errors of the kind: md5sum: foobar: Is a directory
From man bash:
globstar: If set, the pattern ** used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.
Compare all files of a folder pairwise
z_files="./z*"
for i in $z_files; do
for j in $z_files; do
if [ "$i" \> "$j" ]; then
diff -q "$i" "$j"
fi
done
done
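The nested loop above visits each unordered pair once (only when "$i" sorts after "$j"). A Python sketch of the same pairwise comparison is shown below, using itertools.combinations to enumerate the pairs and filecmp.cmp with shallow=False to compare contents. The function name differing_pairs is hypothetical.

```python
import filecmp
import glob
import os
from itertools import combinations


def differing_pairs(pattern):
    """Return (a, b) pairs of files matching pattern whose contents differ."""
    files = sorted(path for path in glob.glob(pattern) if os.path.isfile(path))
    return [
        (a, b)
        for a, b in combinations(files, 2)
        # shallow=False forces a content comparison instead of trusting stat data.
        if not filecmp.cmp(a, b, shallow=False)
    ]
```

Calling differing_pairs("./z*") would report roughly what diff -q prints in the loop, as a list of path pairs instead of messages.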