Compare Md5 Sums in Bash Script

Compare md5 sums in bash script

The problem appears to be that the format of the md5sum.txt file you create doesn't match the format of the .md5 file you download, which is the file you need to check your calculated value against.

The following would be closer to my version of the script. (Explanation below.)

#!/bin/bash

if ! cd /home/example/public_html/exampledomain.com/billing/system/; then
  echo "Can't find work directory" >&2
  exit 1
fi

rm -f GeoLiteCity.dat

curl -L https://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz | gunzip > GeoLiteCity.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz | gunzip > GeoLite2-City.dat
curl -O https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.md5
md5sum < GeoLite2-City.dat | cut -d\  -f1 > md5sum.txt

file1="md5sum.txt"
file2="GeoLite2-City.md5"

if ! cmp --silent "$file1" "$file2"; then
  mail -s "Results of GeoLite Updates" email@address.com <<< "md5sum for GeoLite2-City failed. Please check the md5sum. File may possibly be corrupted."
fi

The major differences here are..

rm -f GeoLiteCity.dat instead of rm -rf. Let's not reach farther than we need to.
md5sum takes standard input rather than processing the file by name, so its output does not include a file name. Unfortunately, the Linux md5sum command still prints a "-" placeholder after the hash in that case, so the output doesn't match the .md5 file you download from MaxMind. Therefore:
cut is used to trim the output, leaving only the calculated md5.
cmp is used instead of subshells, per the comments on your question.

The second and third points are perhaps the most important ones for you.
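To make those two points concrete, here is a minimal sketch of the check written as a function. The name verify_md5 and its arguments are hypothetical, and it assumes the downloaded .md5 file begins with the hex digest (possibly without a trailing newline). Under that assumption it compares the two digests as strings instead of comparing the files byte for byte, so a missing newline cannot cause a mismatch.

```shell
#!/bin/bash
# verify_md5 is a hypothetical helper, not part of the original script.
# Usage: verify_md5 DATA_FILE MD5_FILE
# Assumption: MD5_FILE starts with the hex digest, optionally followed by
# whitespace or a file name.
verify_md5() {
    # Reading from stdin makes md5sum print "<hash>  -"; cut keeps only the hash.
    actual=$(md5sum < "$1" | cut -d' ' -f1)
    # Take the first whitespace-separated field of the first line.
    expected=$(awk '{ print $1; exit }' "$2")
    [ "$actual" = "$expected" ]
}
```

In the script above it could be used as: verify_md5 GeoLite2-City.dat GeoLite2-City.md5 || echo "checksum mismatch" >&2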

Another option for creating your md5sum.txt file would be to do it on-the-fly as you download. For example:

curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz \
| gunzip | tee GeoLite2-City.dat | md5sum | cut -d\  -f1 > md5sum.txt

This uses the tee command to split the file into its "save" location and another pipe, which goes through md5sum to generate your .txt file.

Might save you a minute that would otherwise be eaten by the md5sum that runs afterwards. And it'll take better advantage of SMP. :)
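If you want to try the tee idea without hitting the network, the sketch below wraps it in a function that works on any stream. The name save_and_hash is hypothetical, and it assumes GNU md5sum is available.

```shell
#!/bin/bash
# save_and_hash is a hypothetical helper name.
# It copies stdin to the file named in $1 and prints the MD5 of the same bytes.
save_and_hash() {
    # tee without -a overwrites the target, so a rerun does not append to old data.
    tee "$1" | md5sum | cut -d' ' -f1
}

# Example with a stand-in for the "curl | gunzip" stream:
#   printf 'hello' | save_and_hash GeoLite2-City.dat > md5sum.txt
```

The data is read only once: tee writes it to disk while md5sum hashes the copy that continues down the pipe.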

Comparing content of 2 files with md5sum

You need a program/built-in that evaluates the comparison. Usually you would use test/[/[[ to do so. With these programs -eq compares decimal numbers. Therefore use the string comparison = instead.

[[ "$(md5sum < file_1.sql)" = "$(md5sum < file_2.sql)" ]]

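The exit code $? of this command tells you whether the two strings were equal. Note the input redirection: if md5sum is given the file names as arguments, it appends each name to its output, and the two strings can never match.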

However, you may want to use cmp instead. This program compares the files directly, byte by byte. It should be faster because it doesn't have to compute a hash and can stop at the first difference, and it is also safer because, unlike a hash comparison, it cannot report a false match caused by a collision.

cmp file_1.sql file_2.sql
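Here is a small sketch that puts both approaches side by side as functions you can use in an if statement. The names same_content and same_hash are hypothetical.

```shell
#!/bin/bash
# same_content is a hypothetical wrapper around cmp.
# -s suppresses output; the exit status is 0 when the files are identical.
same_content() {
    cmp -s "$1" "$2"
}

# same_hash compares MD5 digests instead. Redirecting stdin keeps the
# file names out of the md5sum output, so only the hashes are compared.
same_hash() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Usage:
#   if same_content file_1.sql file_2.sql; then echo "identical"; fi
```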

compare files in shell script with md5sum and create csv for the changed file

Food for thought, maybe. The script below checks whether the two files differ and, if so, prints the differing lines, reduced to the fields you said you wanted to save to the CSV.

#!/bin/bash

#Check whether the files differ, then grep for the word "differ"
#diff -q would normally print: Files file2 and file1 differ
# flags are -F fixed string, -w match only full words
# -q quiet ie no output to stdout (screen)

if diff -q "$2" "$1" | grep -Fwq "differ"
then
    #create a var of the changed text, awk looking at 
    #beginning of line to see if it begins with > and then
    #output the full line for awk to then select the 
    #vars you want
    changeSyn=$(diff "$2" "$1" | awk '$1 ~ /^ *>/' | awk '{print $2","$5","$7 }')
    #same again only for new vars
    addedSyn=$(diff "$2" "$1" | awk '$1 ~ /^ *</' | awk '{print $2","$5","$7 }')
    echo "$changeSyn"
    echo "$addedSyn"
else
    echo "No change"
fi

Bash - Compare 2 lists of files with their md5 check sums

An attempt using Awk which is the right tool meant for this,

awk -F"/" 'FNR==NR{filearray[$1]=$NF; next }!($1 in filearray){printf "%s has a different md5sum\n",$NF}' file2 file1
file-4.php has a different md5sum

Where, file2 and file1 are as follows

$ cat file1
df7a0edcb7994581430379db56d8d53b  /home/user/vanila/file-1.php
e1af39e94239a944440ab2925393ae60  /home/user/vanila/file-2.php
ce74e43d24d9c36cd579e932ee94b152  /home/user/vanila/file-3.php
95b7d47ed7134912270f8d3059100e8c  /home/user/vanila/file-4.php

$ cat file2
df7a0edcb7994581430379db56d8d53b  /home/user/file-1.php
94b2a24a1fc9883246fc103f22818930  /home/user/file-1.1.php
e1af39e94239a944440ab2925393ae60  /home/user/file-2.php
ce74e43d24d9c36cd579e932ee94b152  /home/user/file-3.php
f5233ee990c50aade7c4e3ab9b4fe524  /home/user/file-4.php

To find a file that is present in one list but not in the other,

awk -F"/" 'FNR==NR{filelist[$NF]=$NF; next;}!($NF in filelist){printf "%s is an extra file\n",$NF}' file1 file2
file-1.1.php is an extra file
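As a variation on the first one-liner, the sketch below keys the lookup on the base name instead of the hash, so it reports only files that exist in both lists with different checksums. The function name changed_files is hypothetical, and it assumes md5sum-style lines ("hash  path") with no whitespace in the paths.

```shell
#!/bin/bash
# changed_files is a hypothetical helper.
# Usage: changed_files REFERENCE_LIST LIST_TO_CHECK
# Assumption: each line is "<hash>  <path>" and paths contain no whitespace.
changed_files() {
    awk '
        # Runs for every line: extract the base name from the path.
        { n = split($2, parts, "/"); name = parts[n] }
        # First list: remember the hash for each base name.
        FNR == NR { sum[name] = $1; next }
        # Second list: report names known from the first list whose hash differs.
        (name in sum) && sum[name] != $1 { print name " has a different md5sum" }
    ' "$1" "$2"
}

# Usage with the sample lists shown above:
#   changed_files file2 file1
```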

How to compare md5 hash values on a condition on shell script?

The issue you are having with the (updated) posted code is that you are using a for loop when a while loop works.

The following code works for me. I simply changed the for loop to a while loop.

#!/bin/sh

check() {
        dir="$1"
        chsum1=`find ~/NASAtest -type f -exec cat {} \; | md5`
        chsum2=$chsum1

        while [ "$chsum1" = "$chsum2" ]
        do
                echo "hello"
                sleep 10
                chsum2=`find ~/NASAtest -type f -exec cat {} \; | md5`
        done

        echo "hello"
        #eval $2
}

check "$@"

The reason the while loop wasn't working is that you were missing spaces between the square brackets and the expression.
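The spacing matters because [ is an ordinary command, so the brackets and the operator each have to be separate words. Here is a minimal sketch of the same polling idea with that in mind. The function names dir_checksum and wait_for_change are hypothetical, and it assumes GNU md5sum; on BSD or macOS, md5 would take its place.

```shell
#!/bin/sh
# dir_checksum is a hypothetical helper: one checksum over the contents
# of every file under directory $1. It assumes GNU md5sum.
dir_checksum() {
    # Sorting the file list keeps the concatenation order stable between runs.
    # Assumption: no file name contains a newline.
    find "$1" -type f | sort | while IFS= read -r f; do
        cat "$f"
    done | md5sum | cut -d' ' -f1
}

# wait_for_change (also hypothetical) blocks until the checksum of
# directory $1 changes, polling every $2 seconds (default 10).
wait_for_change() {
    before=$(dir_checksum "$1")
    # Spaces are required after [, before ], and around the = operator.
    while [ "$before" = "$(dir_checksum "$1")" ]; do
        sleep "${2:-10}"
    done
}

# Usage:
#   wait_for_change ~/NASAtest 10 && echo "directory changed"
```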

How to detect only the different files in my bash shell script?

Here is your script corrected:

while IFS= read -r filename;
    do
        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
        # inspecting the digest of each file individually         #
        # shows many files are identical and so are the digests   #
        # It also prints MD5 (full file path) = md5_signature!    #
        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
        md5 "old/$filename"              # please use double quotes
        md5 "new/$filename" 
        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
        # Using -q eliminates all output from md5 except the sig      #
        # Your script now works correctly                             #
        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        [[ $(md5 -q "old/$filename") == $(md5 -q "new/$filename") ]] || echo differs; # differs
    done < files.txt

Problems:

You had a typo of new/$fullfile rather than new/$filename
You should use "new/$filename" (ie, use double quotes) around the file name expansions
Use md5 -q to compare output of md5 on different files. Otherwise md5, by default, prints the input file path in the form of MD5 (full_path/base_name) = 2504fcc0c0a57d14aa6b4193b5efaf94. Since these paths are guaranteed to be different in two different directories, the different path names will cause the failure in the string comparison.

The comments above assume you are using md5 on BSD or, likely, on macOS.

Here is an alternate solution that works both on Linux with md5sum and BSD with md5. Just feed the content of the file to the stdin of either program and only the md5 signature is printed:

$ md5 <new/file.pdf
2504fcc0c0a57d14aa6b4193b5efaf94

whereas if you pass the file name, the path is printed along with the MD5 hash:

$ md5 new/file.pdf
MD5 (new/file.pdf) = 2504fcc0c0a57d14aa6b4193b5efaf94

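Much the same holds for md5sum on Linux (GNU core utilities): given a file name, it prints the hash followed by that name; reading from stdin, it prints the hash followed by a "-" placeholder, which is identical for both files, so the comparison still works.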

MD5 comparison between two text files

I don't know whether such a command exists, but I've taken the liberty of writing you a sorting mechanism in Bash. Although it's optimised, I suggest you recreate it in a language of your own choice.

#! /bin/bash

# Sets the array delimiter to a newline
IFS=$'\n'

# If $1 is empty, default to 'file1.txt'. Same for $2.
FILE1=${1:-file1.txt}
FILE2=${2:-file2.txt}

DELETED=()
ADDED=()
CHANGED=()

# Loop over array $1 and print content
function array_print {
        # -n creates a "pointer" to an array. This
        # way you can pass large arrays to functions.
        local -n array=$1
        echo "$1: "

        for i in "${array[@]}"; do
                echo "$i"
        done
}

# This function loops over the entries in file_in and checks
# if they exist in file_tst. Unless doubles are found, a
# callback is executed.
function array_sort {
        local file_in="$1"
        local file_tst="$2"
        local callback=${3:-true}
        local -n arr0=$4
        local -n arr1=$5

        while read -r line; do

                tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
                tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
                hit=$(grep -F -- "$tst_name" "$file_tst")

                # If found, skip. Nothing is changed.
                [[ $hit != "$line" ]] || continue

                # Run callback
                $callback "$hit" "$line" arr0 arr1

        done < "$file_in"
}

# If tst is empty, line will be added to not_found. For file 1 this 
# means that file doesn't exist in file2, thus is deleted. Otherwise
# the file is changed.
function callback_file1 {
        local tst=$1
        local line=$2
        local -n not_found=$3
        local -n found=$4

        if [[ -z $tst ]]; then
                not_found+=($line)
        else
                found+=($line)
        fi
}

# If tst is empty, line will be added to not_found. For file 2 this
# means that file doesn't exist in file1, thus is added. Since the 
# callback for file 1 already filled all the changed files, we do 
# nothing with the fourth parameter.
function callback_file2 {
        local tst=$1
        local line=$2
        local -n not_found=$3

        if [[ -z $tst ]]; then
                not_found+=($line)
        fi
}

array_sort "$FILE1" "$FILE2" callback_file1 DELETED CHANGED 
array_sort "$FILE2" "$FILE1" callback_file2 ADDED CHANGED 

array_print ADDED
array_print DELETED
array_print CHANGED
exit 0

Since it might be hard to understand the code above, I've written it out. I hope it helps :-)

while read -r line; do
       tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
       tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
       hit=$(grep -F -- "$tst_name" "$FILE2")

       # If found, skip. Nothing is changed.
       [[ $hit != "$line" ]] || continue

       # If name does not occur, it's deleted (exists in 
       # file1, but not in file2)
       if [[ -z $hit ]]; then
               DELETED+=($line)
       else
       # If name occurs, it's changed. Otherwise it would
       # not come here due to previous if-statement.
               CHANGED+=($line)
       fi
done < "$FILE1"

while read -r line; do
       tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
       tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
       hit=$(grep -F -- "$tst_name" "$FILE1")

       # If found, skip. Nothing is changed.
       [[ $hit != "$line" ]] || continue

       # If name does not occur, it's added. (exists in 
       # file2, but not in file1)
       if [[ -z $hit ]]; then
               ADDED+=($line)
       fi
done < "$FILE2"
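For comparison, the same added/deleted/changed classification can be sketched in a single awk pass. This is an alternative technique, not a rewrite of the script above. The function name classify_changes is hypothetical, and it assumes md5sum-style lines ("hash  name"), no whitespace in the names, and a non-empty first list.

```shell
#!/bin/bash
# classify_changes is a hypothetical helper.
# Usage: classify_changes OLD_LIST NEW_LIST
# Assumption: each line is "<hash>  <name>" and names contain no whitespace.
classify_changes() {
    awk '
        # First list: remember the hash recorded for each name.
        FNR == NR     { old[$2] = $1; next }
        # Second list: a name unknown to the first list was added.
        !($2 in old)  { print "ADDED " $2; next }
        # Known name with a different hash: changed.
        old[$2] != $1 { print "CHANGED " $2 }
        # Mark every known name that still exists in the second list.
                      { seen[$2] = 1 }
        # Names from the first list that never showed up again were deleted.
        END {
            for (name in old)
                if (!(name in seen)) print "DELETED " name
        }
    ' "$1" "$2" | sort
}

# Usage:
#   classify_changes file1.txt file2.txt
```

The trailing sort is there only to make the output order predictable, because awk does not guarantee the iteration order of its arrays.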

Bash script md5sum

#! /bin/bash
while read -r user passwd ; do
    md5=$(printf %s "$passwd" | md5sum | cut -c1-32)
    printf '%s %s %s\n' "$user" "$passwd" "$md5"
done
