Possible to Calculate MD5 (or Other) Hash with Buffered Reads

Possible to calculate MD5 (or other) hash with buffered reads?

You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.

// Init
MD5 md5 = MD5.Create();
int offset = 0;

// For each block (pass the number of bytes actually read if the buffer is not full):
offset += md5.TransformBlock(block, 0, block.Length, block, 0);

// For last block:
md5.TransformFinalBlock(block, 0, block.Length);

// Get the hash code
byte[] hash = md5.Hash;

Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
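As a rough sketch of how the fragments above might fit together, the following helper reads a stream buffer by buffer and finalises with an empty block, as the note describes. The class name ChunkedHash, the method name ComputeMd5 and the 4096-byte default buffer are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only.
public static class ChunkedHash
{
    // Hashes a stream one buffer at a time, so only bufferSize bytes
    // of the input are held in memory at once.
    public static byte[] ComputeMd5(Stream input, int bufferSize = 4096)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Pass bytesRead rather than buffer.Length: a read may
                // return fewer bytes than the buffer can hold.
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            }

            // Finalise with an empty block once all data has been fed in.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

It could be called with any readable stream, for example a FileStream opened with File.OpenRead.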

Calculate hash without having the entire buffer in memory at once

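You simply want to use the TransformBlock and TransformFinalBlock members of the hash class, which allow you to compute the hash in chunks. They are defined on HashAlgorithm, so MD5, SHA256 and the other hash classes all inherit them.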

MSDN has a good example of how to do this.

Combining MD5 hash values

"In order to calculate MD5 values for files which are too large to fit in memory"

With that in mind, you don't want to "combine" two MD5 hashes. With any MD5 implementation, you have an object that keeps the current checksum state. So you can extract the MD5 checksum at any time, which is very handy when hashing two files that share the same beginning. For big files, you just keep feeding in data. There's no difference whether you hash the file all at once or in blocks, because the state is remembered; in both cases you will get the same hash.
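To illustrate the point that the state is carried between calls, here is a minimal sketch that feeds the same bytes in slices and compares the result with a one-shot hash. It uses the IncrementalHash class, which the original answer does not mention, and assumes a runtime that ships it (.NET Core, or a recent .NET Framework). The names IncrementalDemo, HashInSlices and MatchesOneShot are hypothetical.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper for illustration only.
public static class IncrementalDemo
{
    // Hashes data in slices of sliceSize bytes (sliceSize must be positive);
    // the hasher object carries the running state from one call to the next.
    public static byte[] HashInSlices(byte[] data, int sliceSize)
    {
        using (IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            for (int offset = 0; offset < data.Length; offset += sliceSize)
            {
                int count = Math.Min(sliceSize, data.Length - offset);
                hasher.AppendData(data, offset, count);
            }
            return hasher.GetHashAndReset();
        }
    }

    // Returns true when slicing the input does not change the digest.
    public static bool MatchesOneShot(byte[] data, int sliceSize)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(data).SequenceEqual(HashInSlices(data, sliceSize));
        }
    }
}
```

For a large file, the slices would come from successive reads of the file rather than from an array that is already in memory.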

How to calculate the MD5 hash of a large file in C?

Example (compile with the command below):

gcc -g -Wall -o file file.c -lssl -lcrypto

#include <stdio.h>
#include <openssl/md5.h>

int main(void)
{
    unsigned char c[MD5_DIGEST_LENGTH];
    const char *filename = "file.c";
    int i;
    FILE *inFile = fopen(filename, "rb");
    MD5_CTX mdContext;
    size_t bytes;
    unsigned char data[1024];

    if (inFile == NULL) {
        printf("%s can't be opened.\n", filename);
        return 1;
    }

    /* Feed the file to MD5 in chunks of up to 1024 bytes. */
    MD5_Init(&mdContext);
    while ((bytes = fread(data, 1, sizeof data, inFile)) != 0)
        MD5_Update(&mdContext, data, bytes);
    MD5_Final(c, &mdContext);

    /* Print the digest as hex, followed by the file name. */
    for (i = 0; i < MD5_DIGEST_LENGTH; i++)
        printf("%02x", c[i]);
    printf(" %s\n", filename);
    fclose(inFile);
    return 0;
}

Result:

$ md5sum file.c
25a904b0e512ee546b3f47574703d9fc file.c
$ ./file
25a904b0e512ee546b3f47574703d9fc file.c

Calculate MD5 of uploaded image

When you "upload" a file, it is just stored in memory on the server until you do something with it (and if you do nothing, it's lost at the end of the request handler). Since you also need to read it into memory to compute the md5, you can skip the step of writing to disk, at least for the MD5 part:

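public void ProcessRequest(HttpContext context)
{
    using (var md5 = MD5.Create())
    {
        // Enumerating Request.Files yields the keys, not the files,
        // so index into the collection instead.
        HttpFileCollection files = context.Request.Files;
        for (int i = 0; i < files.Count; i++)
        {
            HttpPostedFile file = files[i];
            byte[] hash = md5.ComputeHash(file.InputStream);

            // do whatever with the file + md5 now
        }
    }
}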

You can still write it to disk after this if that's what you want, but doing the md5 calculation first saves reading it back out again, plus lets you do duplicate checking or lets you name the file based on the hash.

You should also strongly consider a better hash than MD5, which is considered weak these days because practical collision attacks exist. SHA256 is a popular and solid choice.
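Switching to SHA256 needs very little change, because it exposes the same ComputeHash(Stream) method. A minimal sketch, assuming the hash is wanted as a lowercase hex string (for instance to name the stored file); the names UploadHash and Sha256Hex are hypothetical:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only.
public static class UploadHash
{
    // Returns the SHA-256 digest of a stream as a lowercase hex string.
    public static string Sha256Hex(Stream input)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(input);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

In the handler above it could be called as UploadHash.Sha256Hex(file.InputStream).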


