Wednesday, June 24, 2009

Mathematics of a Perfect Hash

This is in response to a problem known as the "Kember Identity". I wanted to know if this search was in vane, or had some chance of success. The third requirement of a Hash is that "It is computationally infeasible to find messages m1 and m2 with h(m1) = h(m2) (in this case, the function h is said to be strongly collision-free)" (Trappe p219). Given that statement it is reasonable to state that the bits of the hash should be randomly flipped.

The idea of the Kember Identity is simple, find a hash of a message (in this case MD5) such that: h(m) = m. In order for this to occur the bits of at least one 128-bit message need to hash into the same 128 bit digest. Assuming a hash that has the property of treating all bit combinations as equal even its own, there is a 1 in 2128 chance that any single value will achieve this. This can be restated as there is a 2128-1 in 2128 chance that it will be anything else. The question is what are the odds that all 2128 will be anything else, the answer is:

$\left (\frac{2^{128}-1}{2^{128}} \right )\left (\frac{2^{128}-1}{2^{128}} \right )\left ( ... \right )=\left (\frac{2^{128}-1}{2^{128}} \right )^{2^{128}}$

Using my favorite package to ask big math questions on (PARI), I came up with an answer of approximately 37%. That means there is a 37% chance that in a perfect hash there will be no combination of a 128-bit message that will produce the same output. To put it in the reverse which I am sure my friend who is searching for the "Kember Identity" is interested in, that means that there is a 63% percent chance that there is at least one 128-bit message whose digest is the same.

Kember came out here much better then I predicted. I should have remembered the lessons of the Birthday Paradox. Needless to say if MD5 performs near the assumptions above there is a better then 50/50 chance that such a hash exists. I wish Kember the best of luck in his search.

UPDATE: Since 2128 is a very large number lets see what happens when the above equation goes to infinity:

$\lim_{n\to \infty } \left (\frac{n-1}{n} \right )^{n} = e^{-1}\approx 0.37$

Which gives an answer near the long and more correct way to calculate this.