Cryptography/Breaking Hash Algorithms

Cryptographic hash functions are one of the more difficult, from a cryptography perspective, things to break.

Cryptographic hash functions are specifically designed to be "one-way": If you have some message, it is easy to go forward to the corresponding hashed value; but if you only have the hashed value, cryptographic hashes are specifically designed to be difficult to calculate the original message that produced that hash value -- or any other message that produces the same hash value.

As we previously mentioned in ../Hashes/, a cryptographically secure hash is designed to have these properties:


 * Preimage resistant: Given H it should be hard to find M such that H = hash(M).
 * Second preimage resistant: Given an input m1, it should be hard to find another input, m2 (not equal to m1) such that hash(m1) = hash(m2).
 * Collision-resistant: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2).

Cryptographers distinguish between three different kinds of attacks on hash functions:


 * collision attack: try to find any two different messages m1 and m2 such that hash(m1) = hash(m2).
 * preimage attack: Given only the hash value H, try to recover *any* M such that H = hash(M).
 * second-preimage attack: Given an input m1, try to find another input, m2 (not equal to m1) such that hash(m1) = hash(m2).
 * Some hash functions (MD5, SHA-1, SHA-256, etc.) are vulnerable to a "length extension attack".

(Alas, different cryptographers use different and sometimes use contradictory terms for these three kinds of attacks. Outside of this book, some cryptographers use "collision" to refer to a successful attack of any of these 3 types, and use the term "free collision" for what this book calls a "successful collision attack", or "bound collision" for either one of a "successful preimage attack" or a "successful second-preimage attack".)

When designing a new system that requires some hash function, most cryptographers recommend using hash fuctions that, as far as we know, are resistant to all these attacks (such as SHA-3, BLAKE, Grøstl, Skein, etc.).

The collision attack is the easiest kind of attack, and the most difficult to defend against. Because there are an infinite number of possible files, the pigeonhole principle tells us that there are in theory an infinite number of hash collisions, even for the "ideal" random oracle hash. Cryptographic hashes are designed to make it difficult -- using only resources available in our solar system, practically impossible -- to find *any* of those messages that hash to some given hash value.

Some applications require collision resistance. When a possible attacker generates a message and we want to confirm that the message that person shows Alice is the same as the message that person shows Bob, ensuring message integrity, we need a hash that hash collision resistance.

Many applications do not actually require collision resistance. For example, password hashing requires preimage and second-preimage resistance (and a few other special characteristics), but not collision resistance. For example, de-duplicating file systems, host-proof file systems such as IPFS, digital signatures, etc. only require second-preimage resistance, not preimage or collision resistance, because in those applications it is assumed that the attacker already knows the original message that hashes to the given value. For example, message authentication using HMAC does not require collision resistance and is immune to length extension; so as of 2011 cryptographers find using HMAC-MD5 message authentication in existing applications acceptable, although they recommend that new applications use some alternative such as HMAC-SHA256 or AES-CMAC.

The MD5 and SHA-1 hash functions, in applications that do not actually require collision resistance, are still considered adequate.

Many people criticise MD5 and SHA1 for the wrong reasons. There is no known practical or almost-practical preimage attack on MD5 or SHA-1, much less second-preimage attacks, only collision attacks.

Such collision attacks include:


 * Dobbertin announced a collision of the MD5 compression function in 1996 ...
 * As of 2009, finding chosen-prefix collisions in MD5 takes about 30 seconds on a laptop.
 * Manuel and Peyrin's SHA-0 attack
 * Nat McHugh's MD5 collision attacks


 * etc.
 * etc.

In the next chapters we will discuss


 * ../Collisions/
 * ../Birthday Attack/
 * ../Rainbow tables/