I'm doing a presentation on MD5 collisions and I'd like to give people any idea how likely a collision is.
It would be good to have two blocks of text which hash to the same thing, and explain how many combinations of [a-zA-Z ] were needed before I hit a collision.
The obvious answer is hash every possible combination until hit two hashes the same. So how would you go about coding this. As a quick experiment I tried hashing every combination of 5 columns of [A-Z], storing this in a .net hashtable and catching the collision exception. Two problems with this – the hashtable eventually times out, and I'm pretty sure I'm going to need A LOT more characters.
Obviously this data structure is too big to handle in memory, so now I'll have to get a database involved. Also sounds like a good project to test out azure – a bit like these guys.
Can anyone point me in the direction of an efficient way of doing this?
Best Solution
These following two different 128 byte sequences hash to the same:
MD5 Hash: 79054025255fb1a26e4bc422aef54eb4
The differences below are highlighted (bold). Sorry it's kind of hard to see.
and
The visualization of the collision/block1 (Source: Links.Org)
The visualization of the collision/block2 (Source: Links.Org)