Algorithm to Compare Two Images in C#

Algorithm to compare two images in C#

Here is a simple approach with a 256 bit image-hash (MD5 has 128 bit)

  1. resize the picture to 16x16 pixel

16x16 resized


  1. reduce colors to black/white (which equals true/false in this console output)

Sample Image


  1. read the boolean values into List<bool> - this is the hash

Code:

public static List<bool> GetHash(Bitmap bmpSource)
{
List<bool> lResult = new List<bool>();
//create new image with 16x16 pixel
Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16));
for (int j = 0; j < bmpMin.Height; j++)
{
for (int i = 0; i < bmpMin.Width; i++)
{
//reduce colors to true / false
lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
}
}
return lResult;
}

I know, GetPixel is not that fast but on a 16x16 pixel image it should not be the bottleneck.


  1. compare this hash to hash values from other images and add a tolerance.(number of pixels that can differ from the other hash)

Code:

List<bool> iHash1 = GetHash(new Bitmap(@"C:\mykoala1.jpg"));
List<bool> iHash2 = GetHash(new Bitmap(@"C:\mykoala2.jpg"));

//determine the number of equal pixel (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);

So this code is able to find equal images with:

  • different file formats (e.g. jpg, png, bmp)
  • rotation (90, 180, 270), horizontal /vertical flip - by changing the iteration order of i and j
  • different dimensions (same aspect is required)
  • different compression (tolerance is required in case of quality loss like jpeg artifacts) - you can accept a 99% equality to be the same image and 50% to be a different one.
  • colored changed to geyscaled and the other way round (because brightness is independent of color)

Update / Improvements:

after using this method for a while I noticed a few improvements that can be done

  • replacing GetPixel for more performance
  • using the exeif-thumbnail instead of reading the whole image for a performance improvement
  • instead of setting 0.5f to differ between light and dark - use the distinct median brightness of all 256 pixels. Otherwise dark/light images are assumed to be the same and it enables to detect images which have a changed brightness.
  • if you need fast calculations, use bool[] or List<bool> if you need to store a lot hashes with the need to save memory, use a Bitarray because a Boolean isn't stored in a bit, it takes a byte!

Comparing two images problem

http://en.wikipedia.org/wiki/BMP_file_format#Bitmap_file_header

Read 4 bytes at offset 10 to obtain the offset of the pixel data. Read 4 bytes at offset 2 to get the size of the entire thing including header. Subtract the former from the latter to get the size of the pixel data.

C# Comparing 2 Bitmaps with tolerance?

There are a few possible test cases for how "different" images can be, and the approach that you need to take to "match" them become progressively harder.

  • The simple case is where you expect the images to be the same size and contain the same features, except for pixels being slightly lighter or darker as you state. For example, if you were to save a bitmap image once as 24-bit, then again as 8-bit (without dithering). The two images would have the same features in the same place, but slightly different colours for each pixel.
  • Another possibility is where you take a bitmap image, and crop 1 line of pixels of the left to create the first image, and crop a line off the right for the second image.
  • A third possibility is where you save a bitmap image, then double the image size in both directions to create the second (so one pixel in the source becomes four pixels in the output). I accept that you have size checking code to detect this case already.
  • A fourth possibility (more of a test case) is to have a large image of just white, and a large image of just black. We expect this to return ciPixelMismatch, but without throwing an exception.

If we consider only the first case, then what you really want is a function to compare two Colors and return a value for how different they are. A simple way to compare two colours is to calculate the Pythagorean distance between the Red, Green and Blue components, such as

static int CompareColours(Color x, Color y)
{
return (int)(Math.Pow((int)x.R - y.R, 2) + Math.Pow((int)x.B - y.B, 2) + Math.Pow((int)x.G - y.G, 2));
}

this will return an number between 0 (when the Colors are identical) and 198608 (between Black and White, which is Math.Pow(256, 2) * 3).

With this, you can apply the function to each pair of pixels (one from each image) and accumulate the error. Average this error across the number of pixels to determine the average pixel error across the whole image, then compare this to a threshold to determine if they are "the same":

const decimal errorThreshold = 0.0001D
decimal totalError = 0;
for (int x = 0; x < bmp1.Width; x++)
{
for (int y = 0; y < bmp1.Height; y++)
{
totalError += CompareColours(bmp1.GetPixel(x, y), bmp2.GetPixel(x, y)) / 198608D;
}
}
decimal averageError = totalError / (bmp1.Width * bmp1.Height);
if ( averageError > errorThreshold ) cr = CompareResult.ciPixelMismatch;

(I divide by 198608D to avoid the possibility of an Integer Overflow while adding. averageError is then a value between 0D for identical and 1D for completely different.)

I'd also recommend you look at some of the other questions here on StackOverflow. While this pixel-colour-matching works for the simplest case, it won't work for the others. The approaches given in answers to other questions will be useful if you need something more complex:

  • Algorithm to compare two images
  • Algorithm to compare two images in C#
  • Image comparison - fast algorithm

Hope this helps

How to compare 1,000 images using the available memory efficiently

I managed to come up with a compact algorithm that solves this quite difficult problem. The algorithm maintains 5 pieces of information as state:

  1. A list with the items that are currently loaded.
  2. The total size of the currently loaded items.
  3. A queue with the items that are not currently loaded.
  4. An array containing the number of pairs that are remaining for each item.
  5. A BitArray storing whether any particular pair has been emitted.

The algorithm moves items between the two collections loaded and unloaded, loads and unloads items in respect to the maxConcurrentSize policy, yields as many pairs from the loaded list is possible, and continues until both collections are empty. At that point all possible pairs are expected to be emitted, provided that the algorithm is correct.

public static IEnumerable<(TItem, TItem)> GetPairs<TSource, TItem>(
IReadOnlyList<TSource> source,
Func<TSource, long> sizeSelector,
Func<TSource, TItem> itemLoader,
long maxConcurrentSize) where TItem : IDisposable
{
List<(int Index, TItem Item, long Size)> loaded = new();
try
{
long loadedSumSize = 0L;
Queue<int> unloaded = new(Enumerable.Range(0, source.Count));
int[] pending = new int[source.Count];
for (int i = 0; i < pending.Length; i++) pending[i] = source.Count - 1;
BitArray emittedPairs = new(checked(source.Count * (source.Count - 1) >> 1));
while (unloaded.Count > 0)
{
// Select the next item to load, in FIFO order.
int indexToLoad = unloaded.Dequeue();
long sizeToLoad = Math.Max(0L, sizeSelector(source[indexToLoad]));

// Unload already loaded items until there is enough space for the
// new item, in LIFO order.
while (loaded.Count > 1 && loadedSumSize + sizeToLoad > maxConcurrentSize)
{
var toUnload = loaded[loaded.Count - 1];
loaded.RemoveAt(loaded.Count - 1);
toUnload.Item.Dispose();
unloaded.Enqueue(toUnload.Index);
loadedSumSize -= toUnload.Size;
}

// Load the new item.
var itemToLoad = itemLoader(source[indexToLoad]);
loaded.Add((indexToLoad, itemToLoad, sizeToLoad));
checked { loadedSumSize += sizeToLoad; }
var justLoaded = loaded[loaded.Count - 1];

// Yield pairs constituted of the new item with all already loaded items.
for (int i = 0; i < loaded.Count - 1; i++)
{
var existing = loaded[i];
Debug.Assert(existing.Index != justLoaded.Index);

// Switch the order in case the pair is not in the correct order.
var (x, y) = existing.Index < justLoaded.Index ?
(existing, justLoaded) : (justLoaded, existing);

// Emit the pair if it's not already emitted.
int emittedIndex = (y.Index * (y.Index - 1) >> 1) + x.Index;
if (emittedPairs[emittedIndex]) continue;
yield return (x.Item, y.Item);
emittedPairs[emittedIndex] = true;
pending[x.Index]--;
pending[y.Index]--;
}

// Unload all items that are completed.
for (int i = loaded.Count - 1; i >= 0; i--)
{
var (index, item, size) = loaded[i];
if (pending[index] > 0) continue;
Debug.Assert(pending[index] == 0);
loaded.RemoveAt(i);
item.Dispose();
loadedSumSize -= size;
}
}

// If the algorithm is correct, these final assertions should never fail.
Debug.Assert(loaded.Count == 0);
Debug.Assert(loadedSumSize == 0L);
Debug.Assert(pending.All(n => n == 0));
Debug.Assert(emittedPairs.Cast<bool>().All(b => b));
}
finally
{
// Ensure that in case the enumerable is enumerated only partially,
// all currently loaded items will be disposed.
foreach (var entry in loaded) entry.Item.Dispose();
}
}

I measured the efficiency of this implementation by writing a simulation of the original problem:

  1. 1,000 files, each having size between 5 and 50 MB.
  2. maxConcurrentSize equal to 2 GB.
var source = Enumerable.Range(1, 1000).ToArray();
var random = new Random(0);
var sizes = source.ToDictionary(x => x,
_ => (long)random.Next(5_000_000, 50_000_000));
int loaderInvokedCount = 0;
var query = GetPairs(source, x => sizes[x], x =>
{
loaderInvokedCount++;
return new MemoryStream();
}, 2_000_000_000);
var emittedPairsCount = query.Count();
Console.WriteLine($"Emitted {emittedPairsCount:#,0} pairs");
Console.WriteLine($"Loader invoked {loaderInvokedCount:#,0} times");

Output:

Emitted 499,500 pairs
Loader invoked 7,682 times

Live demo.

The naive implementation presented as part of the question invokes the loader 500,500 times for the same input, which is 65 times more compared to this implementation. Achieving a 98,5% discount is a quite satisfactory outcome.

What is the best way to compare images in a bitmap list

I would avoid calculating the hash multiple times for the same image, and looping through the images only once:

public static void Main(string[] args)
{
var files = new Dictionary<string, string>();
foreach (var file in Directory.GetFiles("c:\\", "*.png"))
{
files.Add(file, CalculateHash(file));
}

var duplicates = files.GroupBy(item => item.Value).Where(group => group.Count() > 1);
}

private static string CalculateHash(string file)
{
using (var stream = File.OpenRead(file))
{
var sha = new SHA256Managed();
var checksum = sha.ComputeHash(stream);
return BitConverter.ToString(checksum).Replace("-", String.Empty);
}
}

Image Comparison for find percentage of similarity between images

OpenCV is a competent image algorithm library. I'd recommend EmGuCV as a C#-wrapper:
http://www.emgu.com/

Comparing images are quite tricky and you should be more specific in your question to get a good and specific answer. I'd guess you want to ignore things as different brightness levels, contrast and translation...

One simple way to look for an image within an image (if no rotation is applied) is to use the small image as a kernel and use convolution to get the best match(es) in the image. Threshold or apply a percentage scale (I recommend a non-linear) to the response and you have a decent filter.



Related Topics



Leave a reply



Submit