Linux: Compute a Single Hash for a Given Folder & Contents

Linux: compute a single hash for a given folder & contents?

One possible way would be:


sha1sum path/to/folder/* | sha1sum

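If there is a whole directory tree, you're probably better off using find and xargs, sorting the file list so the result does not depend on the order in which find happens to return the files. One possible command would be: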


find path/to/folder -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum
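For comparison, the same "checksum of a sorted sha1sum listing" idea can be sketched in Python. This is a minimal sketch, not a drop-in replacement. `tree_sha1` and `file_sha1` are hypothetical helper names. The sketch assumes ordinary file names (no newlines or backslashes, which sha1sum escapes) and a byte-order sort such as `LC_ALL=C sort -z`. Under those assumptions, the manifest it hashes should line up with the pipeline's output, but treat that as something to check rather than a guarantee.

```python
import hashlib
import os


def file_sha1(path, chunk_size=65536):
    """Hex SHA-1 of one file, read in chunks to keep memory use flat."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_sha1(top):
    """SHA-1 of a sha1sum-style manifest ("<hex>  <path>" plus a newline)
    of every regular file under top, with the paths sorted bytewise."""
    top = os.fsencode(top)  # raw bytes, as find and sort -z see them
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            # like find -type f: regular files only, symlinks not followed
            if os.path.isfile(path) and not os.path.islink(path):
                paths.append(path)
    manifest = hashlib.sha1()
    for path in sorted(paths):  # byte order, like LC_ALL=C sort -z
        manifest.update(file_sha1(path).encode() + b"  " + path + b"\n")
    return manifest.hexdigest()
```

Usage would be along the lines of `print(tree_sha1("path/to/folder"))`. Sorting the full byte paths is what keeps the result independent of directory order.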

And, finally, if you also need to take account of permissions and empty directories:

(find path/to/folder -type f -print0  | sort -z | xargs -0 sha1sum;
find path/to/folder \( -type f -o -type d \) -print0 | sort -z | \
xargs -0 stat -c '%n %a') \
| sha1sum

The arguments to stat make it print each name followed by its octal permissions. The two finds run one after the other, so the directory tree is walked twice.

- The first pass finds all file names and checksums their contents.
- The second pass finds all file and directory names and prints each name with its mode.

The combined output (the "file names and checksums" list followed by the "file and directory names with permissions" list) is then checksummed once more to give a single short checksum.
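A Python sketch of that two-part manifest follows. It assumes `stat -c '%a'` output is matched by the permission bits from `stat.S_IMODE` printed in octal. `tree_sha1_with_modes` and `file_sha1` are hypothetical names. Both parts are fed into one hash object, which is equivalent to piping both lists into a single sha1sum.

```python
import hashlib
import os
import stat


def file_sha1(path, chunk_size=65536):
    """Hex SHA-1 of one file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_sha1_with_modes(top):
    """One SHA-1 over (1) "<sha1>  <path>" for each regular file and
    (2) "<path> <octal mode>" for each file and directory, so empty
    directories and permission changes both affect the result."""
    top = os.fsencode(top)
    files, entries = [], [top]  # find also reports the starting directory
    for root, dirs, names in os.walk(top):
        for name in dirs + names:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # -type f / -type d do not match symlinks
            if os.path.isfile(path):
                files.append(path)
                entries.append(path)
            elif os.path.isdir(path):
                entries.append(path)
    h = hashlib.sha1()
    for path in sorted(files):  # part 1: like xargs -0 sha1sum
        h.update(file_sha1(path).encode() + b"  " + path + b"\n")
    for path in sorted(entries):  # part 2: like stat -c '%n %a'
        mode = stat.S_IMODE(os.stat(path).st_mode)
        h.update(path + b" " + format(mode, "o").encode() + b"\n")
    return h.hexdigest()
```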

How can I create a hash of a directory in Linux in Shell or Python?


Hash a directory's structure using filenames only

List all file paths in the dir (recursively), sort them (find returns entries in directory order, which is not guaranteed to be the same everywhere), hash it all with sha1sum and print the hash:

find /my/dir -mindepth 1 -type f -print0 | sort -z | sha1sum

You can put that in a script, like:

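#!/bin/bash
# hashtree-names.sh - hash a dir's structure by filenames
# (files with same names are considered identical)
# Usage: hashtree-names.sh <dirname>
DIR="$1"
# cd into DIR so the hashed paths are relative to it (./a, ./b/c);
# otherwise DIR's own path would make every directory's hash differ
cd "$DIR" || exit 1
find . -mindepth 1 -type f -print0 | sort -z | sha1sum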

And execute it on every dir under a large tree like so:

find /my/tree -mindepth 1 -type d -exec hashtree-names.sh {} \; | sort

Which will produce output similar to:


3cd8fea391f3055d9de3d6e05a422b6e97ce4204 *-
8cd93d83e9baeea479785fe0cc03c8b58aa293a3 *-
8cd93d83e9baeea479785fe0cc03c8b58aa293a3 *-
fe7dd981bb0d978608ba648eb3d38bb41f6cd956 *-
afc483808be60fbd48e716a7b916b5deaa9c78b5 *-
a518cfa27e7e9afbab2ba2209c80dbab0631736b *-
251f3cfc11eeccdfaf28142dadc5aa3aa4e2aec1 *-
251f3cfc11eeccdfaf28142dadc5aa3aa4e2aec1 *-
4a689e7c27733498c4ac5730f172c844cb6b21d1 *-
600a61b8c1a973aa6322ab4a7d57f7c07174e0ec *-
a401f27520252ae334625ca1b452396f0287f42d *-
e0b2d5f825f062d40f0f2490673888b5eb6c66fd *-
85a533625c5a38892d392f2ae9e7974e3eceaf6a *-
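The output above lists only hashes, so you can't tell which directories share one. If the goal is to find directories with identical file-name structure, the Python sketch below keeps each directory next to its hash and groups them. `names_hash` and `duplicate_structures` are hypothetical names.

- Paths are hashed relative to each directory and NUL-separated, as with -print0. The values therefore won't match the shell script's output.
- Directories that contain no files all share the same hash, so they show up together as one group.

```python
import hashlib
import os
from collections import defaultdict


def names_hash(directory):
    """SHA-1 over the sorted relative paths of the regular files under directory."""
    rel_paths = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                rel_paths.append(os.path.relpath(path, directory))
    h = hashlib.sha1()
    for rel in sorted(rel_paths):
        h.update(os.fsencode(rel) + b"\0")  # NUL-separated, like -print0
    return h.hexdigest()


def duplicate_structures(tree):
    """Group every subdirectory of tree by names_hash; keep groups of 2+."""
    groups = defaultdict(list)
    for root, dirs, files in os.walk(tree):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # find -type d skips symlinks
                groups[names_hash(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```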

Hash a directory's structure, complete with file contents

See Vatine's and David Schmitt's answers to the question "Linux: compute a single hash for a given folder & contents?"

EDIT 2017-01-27

  • Code improvements: Added -mindepth 1 to find

Run a single command for every file in a folder - Linux

Maybe this solution would be easier:

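find <path> -maxdepth 1 -type f -exec sh -c 'sed -i "s|\$|\t${1##*/}|" "$1"' sh {} \;

The file name is handed to sh -c as $1, so it is expanded separately for each file. ${1##*/} strips the directory part, and | is used as the sed delimiter because the path contains slashes.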

If you omit or increase -maxdepth the command will search in subfolders too.
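Assume the intent of that sed expression is to append a tab and the file's own name to every line; the question text isn't included here, so this is an assumption. Under that assumption, a Python sketch of the same per-file edit could look like the code below. `append_name_to_lines` is a hypothetical name.

- Like `-maxdepth 1`, it only touches regular files directly inside the folder.
- Like sed, it leaves empty files alone and keeps a missing final newline missing.

```python
import os


def append_name_to_lines(folder):
    """Hypothetical helper: append a tab plus the file name to every line
    of each regular file directly inside folder (no recursion)."""
    for entry in os.scandir(folder):
        if not entry.is_file(follow_symlinks=False):
            continue  # find -type f: regular files only
        with open(entry.path, "rb") as f:
            data = f.read()
        if not data:
            continue  # an empty file has no lines for sed to edit
        suffix = b"\t" + os.fsencode(entry.name)
        lines = data.split(b"\n")
        ends_with_newline = data.endswith(b"\n")
        if ends_with_newline:
            lines.pop()  # drop the empty piece after the final newline
        result = b"\n".join(line + suffix for line in lines)
        if ends_with_newline:
            result += b"\n"
        with open(entry.path, "wb") as f:
            f.write(result)
```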

How to hash the (possibly recursive) contents of a directory

Computing a hash of a directory hierarchy is expensive, especially in a large git repository.

You should look at the API provided by git. There may be a way to ask git which files have changed, so that only those need to be rehashed.

You should look at OS X's file system events API. This can send your app a notification when something in a directory hierarchy changes.
https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/FSEvents_ProgGuide/Introduction/Introduction.html

Create separate MD5 file for each file recursively

find is the go-to tool for recursively doing anything with files:

find . -type f ! -name '*.md5' -execdir sh -c 'md5sum "$1" > "$1.md5"' _ {} \;

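This picks regular files not named '*.md5' and runs the given inline shell script on each one, with the file name as $1. Because of -execdir, the script runs in the file's own directory, so each .md5 file is written next to the file it describes.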
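A Python sketch of the same sidecar-file idea is below; `write_md5_sidecars` is a hypothetical name. It writes one md5sum-style line ("<hex>  <name>") per .md5 file, using the bare file name. The intent is that `md5sum -c name.md5`, run from that directory, can verify it. Existing *.md5 files and symlinks are skipped, mirroring `! -name '*.md5'` and `-type f`.

```python
import hashlib
import os


def write_md5_sidecars(top):
    """Write <file>.md5 next to every regular file under top (skipping
    files that already end in .md5), in md5sum's "<hex>  <name>" format."""
    for root, dirs, files in os.walk(top):
        for name in files:  # listed before any .md5 file is created here
            path = os.path.join(root, name)
            if name.endswith(".md5") or os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            with open(path + ".md5", "wb") as out:
                # bare name, so the check works from the file's own directory
                out.write(h.hexdigest().encode() + b"  " + os.fsencode(name) + b"\n")
```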

How can I calculate a hash for a filesystem-directory using Python?

This recipe provides a nice function to do what you are asking. I've modified it to use the MD5 hash instead of SHA1, as your original question asks.

import hashlib
import os
import traceback


def GetHashofDirs(directory, verbose=0):
    MD5hash = hashlib.md5()
    if not os.path.exists(directory):
        return -1

    try:
        for root, dirs, files in os.walk(directory):
            dirs.sort()  # visit subdirectories in a stable order
            for names in sorted(files):  # and files too, so the hash is reproducible
                if verbose == 1:
                    print('Hashing', names)
                filepath = os.path.join(root, names)
                try:
                    f1 = open(filepath, 'rb')
                except OSError:
                    # You can't open the file for some reason; skip it
                    continue

                with f1:
                    while True:
                        # Read the file in small chunks
                        buf = f1.read(4096)
                        if not buf:
                            break
                        # Fold each chunk's MD5 (hex string, as bytes) into the running hash
                        MD5hash.update(hashlib.md5(buf).hexdigest().encode())

    except Exception:
        # Print the stack traceback
        traceback.print_exc()
        return -2

    return MD5hash.hexdigest()

You can use it like this:

print(GetHashofDirs('folder_to_hash', 1))

The output looks like this, as it hashes each file:

...
Hashing file1.cache
Hashing text.txt
Hashing library.dll
Hashing vsfile.pdb
Hashing prog.cs
5be45c5a67810b53146eaddcae08a809

The function returns the hash, in this case 5be45c5a67810b53146eaddcae08a809. Note that file names are not fed into the hash, only the file contents (in the order the files are visited).

MD5-Checksum hashing with powershell for a whole directory

EDIT: Here's an alternate method that is consistent even if all the files are moved/copied to another location. It uses the hashes of all files to create a "master hash". It takes longer to run, because it has to read the contents of every file, but it is more reliable.

$HashString = (Get-ChildItem C:\Temp -Recurse | Get-FileHash -Algorithm MD5).Hash | Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))

Original, faster but less robust, method:

$HashString = Get-ChildItem C:\script\test\TestFolders -Recurse | Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))

This could be condensed into one line if wanted, although it becomes harder to read:

Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]"$(Get-ChildItem C:\script\test\TestFolders -Recurse|Out-String)"))

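Basically it turns the text listing that Get-ChildItem produces (mode, last write time, length and name of each item, grouped under "Directory:" headers) into a memory stream and passes that to Get-FileHash. File contents are never read. Because the listing includes the full directory path, the hash changes if the folder is moved.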

Not sure if this is a great way of doing it, but it's one way :-)
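A rough Python analogue of the faster, listing-based approach is sketched below. It assumes name, size and modification time are the attributes you care about; `metadata_hash` is a hypothetical name. It uses paths relative to the top directory, so the result does not depend on where the tree lives. The trade-off is the same as above: a content edit that keeps the size and modification time unchanged goes unnoticed.

```python
import hashlib
import os


def metadata_hash(top):
    """MD5 over "relative path, size, mtime" rows for everything under top.

    No file contents are read, so this is fast, but it only notices changes
    to names, sizes or modification times.
    """
    rows = []
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # do not follow symlinks
            rel = os.path.relpath(path, top)
            rows.append(f"{rel}\t{st.st_size}\t{st.st_mtime_ns}\n")
    h = hashlib.md5()
    for row in sorted(rows):  # stable order regardless of directory order
        h.update(row.encode("utf-8", "surrogateescape"))
    return h.hexdigest()
```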

Is there a way of making an md5sum of all files in subfolders?

With bash:

shopt -s globstar
md5sum ** >/tmp/hash.md5

Ignore errors of the kind "md5sum: foobar: Is a directory": ** matches directories as well as files, and md5sum cannot hash a directory.

From man bash:

globstar: If set, the pattern ** used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.
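Here is a Python sketch of the same manifest that skips directories quietly instead of printing errors; `md5_manifest` is a hypothetical name. Two behaviours differ from bash or deserve a note:

- Like bash's ** without dotglob, Python's recursive glob skips hidden names.
- Files are listed in Python's sort order, which may differ from bash's locale-aware ordering.

```python
import glob
import hashlib
import os


def md5_manifest(top="."):
    """md5sum-style lines for every regular file under top, similar to
    `shopt -s globstar; md5sum **`, but non-files are skipped quietly."""
    lines = []
    for path in sorted(glob.glob(os.path.join(top, "**"), recursive=True)):
        if not os.path.isfile(path):
            continue  # directories would give "Is a directory" with md5sum
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        lines.append(f"{h.hexdigest()}  {path}\n")
    return "".join(lines)
```

Writing the result to /tmp/hash.md5 (e.g. `with open("/tmp/hash.md5", "w") as f: f.write(md5_manifest())`) should give a file that `md5sum -c /tmp/hash.md5` can check from the same directory.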

Compare all files of a folder pairwise


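# Glob directly in the loops so file names containing spaces stay intact
for i in ./z*; do
    for j in ./z*; do
        # String comparison, so each unordered pair is diffed only once;
        # diff -q prints only the pairs that differ
        if [ "$i" \> "$j" ]; then
            diff -q "$i" "$j"
        fi
    done
done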
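The same pairwise comparison can be sketched in Python; `differing_pairs` is a hypothetical name. It relies on two standard-library helpers:

- `itertools.combinations` yields each unordered pair exactly once, which is the job the `\>` string comparison does in the loop.
- `filecmp.cmp(..., shallow=False)` compares file contents rather than just stat information.

Like `diff -q`, the function reports only the pairs that differ.

```python
import filecmp
import glob
import os
from itertools import combinations


def differing_pairs(pattern="./z*"):
    """Return every unordered pair of files matching pattern whose
    contents differ, comparing each pair only once."""
    files = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    return [
        (a, b)
        for a, b in combinations(files, 2)
        if not filecmp.cmp(a, b, shallow=False)  # compare contents, not just stat info
    ]
```

Usage could be `for a, b in differing_pairs(): print(f"Files {a} and {b} differ")`.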

