Ideally, one would like any password handling algorithm to be a strong one-way function of the password--that is, given the algorithm's output and other inputs, an attacker should have little chance of learning even partial information she could not already have guessed about the password. Unfortunately, one-way functions are defined asymptotically with respect to their input lengths--an attacker has negligible probability of inverting a one-way function on sufficiently large inputs, but exactly how large depends on the attacker. Because there is a fixed limit to the size of passwords users will tolerate, we need a different criterion for functions on passwords.
Informally, we would like a password scheme to be ``as good as the
passwords users choose.'' Given a probability distribution D on
passwords, we define the predictability R(D) of the
distribution to be the highest probability of any
single password s in D:
. A
function of a password is secure if an attacker's probability of
learning any partial information about the password is proportional to
the product of the work she invests and the predictability of the
password distribution.
What does it mean for an attacker to learn partial information about a password? We define partial information to be the value of any single-bit predicate on a password. Interesting predicates on passwords might include the first bit of a password, or the parity of bits in a password. An attacker can always guess certain predicates with high probability--for instance, the trivial predicate P(s)=1 which returns 1 on all passwords. If a function of a password is secure, however, its output should not let an attacker guess any predicate more accurately than she could have without the function's output.
More formally, let F(s,t) be a function. The argument s
represents a user's secret password, which will be drawn from a
probability distribution D. The argument t represents any
additional non-secret inputs F might take. Let the values of t be
drawn from a probability distribution T. We model an attacker as a
randomized boolean circuit, A,
that tries to guess a predicate P about a password. The cost of an
attack--or the work invested by an attacker--is the number of gates
in the circuit, which we denote |A|. We use the notation
to denote the
probability of statement B after an experiment in which variables
are drawn from probability distributions
, respectively. Now we can define what it means for a
password function to resist attack. We say that function F(s,t) is
an
-secure password function if the following hold:
We should first note that this definition matches our intuition about a password hashing function like crypt. If users choose predictable enough passwords, knowing a password hash gives adversaries a large advantage--they can compare hashes of the most popular passwords to that of the password they are trying to break. If, additionally, one can guess a useful predicate without even looking at a password hash--for instance by knowing that the third character of most passwords is a lower-case letter--then clearly an adversary can guess this too.
If, however, no single password occurs with particularly high probability, an adversary should need to expend a large amount of effort (as measured in circuit gates) to discover any non-trivial information about a password. Finally, we also wish to prevent an attacker from finding other strings that hash to the same value as a password; such strings may prove equivalent to passwords during authentication. The requirement of second preimage resistance guarantees such collisions are hard to find, even with knowledge of the original password. It also ensures that F does not ignore any bits of its password input.
The definition implies that a secure password function F(s,t) must make non-trivial use of its second argument, t. To see this, consider that the first bit of F(s,0) is a perfectly valid predicate on passwords. An attacker could easily guess this predicate if either F ignored its second argument or the string 0 occurred in T with high probability. This point is not merely an academic one. A single-input password hashing function F(s) can be inverted by a circuit large enough to encode a lookup table mapping F(s) (or sufficiently many bits of F(s)) to s. The size of such a circuit depends only on the probability distribution of the passwords, not on the particulars of F.
As proposed by Morris and Thompson [9], however, lookup tables can be thwarted with the second input to F, which they call a salt. If a random salt is chosen whenever users establish new passwords, and if the salt space is large enough to ensure a negligible probability of recurrence, lookup tables offer an adversary no advantage; he may as well compute F at the time of attack. If, on the other hand, the salt space is too small, the output bits of F become useful predicates on passwords, a fact exploited by the QCrack [12] program described in Section 6.
While salted passwords defeat lookup tables, given a particular salt
and hash, an adversary can still mount a brute force attack by
evaluating F(s,t) on every possible password. It follows that the
best security one can achieve is , where |F|
is the cost in gates of implementing F. Usability requirements
therefore effect a lower limit on
--if people can only
tolerate a one second delay for checking passwords, F can take at
most one second to evaluate. F should not take significantly less,
however, as this would unnecessarily weaken security.
The number of gates |A| that an adversary can reasonably muster for
an attack increases constantly as hardware improves. Fortunately, so
does the speed of machines that must legitimately evaluate F. That
means passwords should not be hashed by a single function F with
fixed computational cost, but rather by one of a family of functions
with arbitrarily high cost. Instead of repeatedly throwing out
functions like crypt and MD5 crypt to start over with more
expensive but incompatible ones, systems should allow the cost of any
password manipulation software to scale gracefully with a tunable
parameter. Thus, can decrease as fast as hardware improves
and users will tolerate. Compromised password databases will then
enjoy maximum security against off-line attacks.
In summary, a good password function makes extracting any partial
information about passwords as difficult as guessing passwords. A
concrete parameter, , should characterize this difficulty.
To achieve low values of
, a password function must take a
second input, the salt, that prevents adversaries from benefiting from
large lookup tables. The best value of
is inversely
proportional to the cost of evaluating a password function. This
establishes a lower limit for
based on the maximum
tolerable cost of evaluating F during legitimate use. As hardware
speeds constantly improve, a good password scheme should allow the
cost of F to increase gradually so that
can decrease over
time.
One final criterion for a good password function is then to minimize
the value . That means one should make any password
function as efficient as possible for the setting in which it will
operate. The designers of crypt failed to do this. They based
crypt on DES [10], a particularly inefficient algorithm to
implement in software because of many bit transpositions. They
discounted hardware attacks, in part because crypt cannot be
calculated with stock DES hardware. Unfortunately, Biham [4]
later discovered a software technique known as bitslicing that
eliminates the cost of bit transpositions in computing many
simultaneous DES encryptions. While bitslicing won't help anyone log
in faster, it offers a staggering speedup to brute force password
searches.
In general, a password algorithm, whatever its cost, should execute with near optimal efficiency in any setting in which it sees legitimate use, while offering little opportunity for speedup in other contexts. It should rely heavily on a CPU's fast instructions--for instance addition, bitwise exclusive-or, shifts, and memory access to state that fits in a processor's first level cache. Ideally these operations should all be portably accessible from high-level languages like C, so as to minimize the benefit of hand-coded assembly language implementations. Conversely, the algorithm should avoid operations like bit transposition on which customized hardware enjoys a large advantage.
A password function should also not lend itself to any kind of pipelined hardware implementation. It should permit relatively little speedup from any kind of precomputation--for instance, hashing 1,000 passwords with the same salt and hashing one password under 1,000 salts should each cost 1,000 times more than hashing a single password.
![]() |