I’ve been working a bit on password hashing lately, and have come up with a mode which seems to be so simple and fundamental that I’m surprised that it isn’t a standard construct. Or at least it doesn’t appear to be. I’ve looked through some, but not all, of the password hashing competition candidates and to the best of my ability to read them they appear to take slightly different approaches to the problem. Analysis of what’s presented here and comparisons to other work which has been done would be much appreciated.
The goal of password hashing is to make something which is no easier to do on custom hardware than it is on a general purpose CPU. To that end, you’d like an operation which requires that cache memory be used extensively, and that the order in which cache memory is used is unpredictable. You’d also like to use a core operation which has the same property. Going with AES128 is a reasonable bet, because it’s already accelerated on common hardware.
The basic idea behind this construction is to have bulk memory, which will be accessed pseudorandomly, and an internal state which consisting of at least a reference to some part of bulk memory. Everything is initially zeroed out. There’s a reversible operation based on the data being hashed which munges the internal state and the part of bulk memory referred to, resulting in the pointer referring to a new pseudorandomly selected block of memory. This operation is repeated some number of times until enough work has been done, and then output is extracted. It’s reasonable to set the number of rounds to be be double the number of blocks stored in bulk memory. It might be okay to go far lower, but that feels sketchy. In particular, going lower might result in custom hardware being able to effectively pipeline the next block being uninitialized (which might be able to produce a small speedup on a general CPU anyway, further research needed).
For highly convoluted and technical reasons it appears to be that the internal state consisting of just a reference isn’t enough. It also appears to be that having an internal state consisting of a single block isn’t enough. The construction I’ve come up with requires the internal state be a single block plus another reference. It was surprisingly difficult to come up with the construction, but it’s trivial to show that it’s reversible and easy to intuit that it’s secure if the underlying primitive is secure. It might be an interesting exercise to prove that, but I don’t know how such proofs work.
Anyway, on to the actual construction.
There’s a difficulty factor, k, which is a parameter to the whole thing. There’s also an encryption function used, probably AES128, resulting in a block size of 128 bits. Bulk memory is 2^k * blocksize in size. Internal state is one block of length blocksize, plus a backpointer consisting of k bits.
Every round operates on the internal block and the external block referred to by the last k bits of the internal block. Transforms are done as follows:
The new value of the backpointer is the last k bits of the old value of the internal block.
The new value of the external block is old value of the back pointer concatenated to everything but the last k bits of the old value of the internal block.
The new value of the internal block is the old value of the internal block xored with the old value of the external block and encrypted.
The extracted value is the final state of the internal block.
That’s the entire construction. It’s so simple and elegant that I’m surprised that it isn’t a standard construction.
One might protest that the input and output of this construction using AES128 are 128 bits and for password hashing you want them to be 256 bits. That can be fixed straightforwardly. From the input, set the AES key to be the first 128 bits of the input and before every encryption xor the block with the last 128 bits of the input.
For the output, take the 128 bits which are the direct output of the hashing process and encrypt those using the same encryption function used everywhere else to get the next 128 bits. It feels a bit like the output should be xored by the original key to make it a true hash function and remove all trace of reversibility from it, but that feels a bit unnecessary after all that junk done in the middle. It would be good to have an opinion from someone who knows these things better than I.Update:I figured it out. The first 128 bits of the output are the final state of the internal block xored with the part of the key which is used for xoring (on the general principle that the way you turn an encryption function into a hash function is by encrypting and then xoring with the original key). It’s then encrypted (without a pre-xor, obviously) and finally xored one last time to produce the last 128 bits of the output.