The key to proper application of cryptography is to define with enough precision what properties you are after.
Usually, when someone wants to hash passwords, it is in the following context: a server is authenticating users; users show their password, through a confidential channel (HTTPS...). Thus, the server must store user passwords, or at least store something which can be used to verify a password. We do not want to store the passwords "as is" because an attacker gaining read access to the server database would then learn all passwords. This is our attack model.
A password is something which fits in the brain of the average user, hence it cannot be fully unguessable. A few users will choose very long passwords with high entropy, but most will select passwords with an entropy no higher than, say, 32 bits. This is a way of saying that an attacker will have to "try" on average less than 231 (about 2 billions) potential passwords before finding the right one.
Whatever the server stores, it is sufficient to verify a password; hence, our attacker has all the data needed to try passwords, limited only by the computing power he can muster. This is known as an offline dictionary attack.
One must assume that our attacker can crack one password. At that point we may hope for two properties:
- cracking a single password should be difficult (a matter of days or weeks, rather than seconds);
- cracking two passwords should be twice as hard as cracking one.
Those two properties call for distinct countermeasures, which can be combined.
1. Slow hash
Hash functions are fast. Computing power is cheap. As a data point, with SHA-1 as hash function, and a 130$ NVidia graphic card, I can hash 160 millions passwords per second. The 231 cost is paid in about 13 seconds. SHA-1 is thus too fast for security.
On the other hand, the user will not see any difference between being authenticated in 1µs, and being authenticated in 1ms. So the trick here is to warp the hash function in a way which makes it slow.
For instance, given a hash function H, use another hash function H' defined as:
H'(x) = H(x || x || x || ... || x)
where '||' means concatenation. In plain words, repeat the input enough times so that computing the H' function takes some non-negligible time. So you set a timing target, e.g. 1ms, and adjust the number of repetitions needed to reach that target. 10ms means that your server will be able to authenticate 10 users per second at the cost of only 10% of its computing power. Note that we are talking about a server storing a hashed password for its own ulterior usage, hence there is no interoperability issue here: each server can use a specific repetition count, tailored for its power.
Suppose now that the attacker can have 100 times your computing power; e.g. the attacker is a bored student -- the nemesis of many security systems -- and can use dozens of computers across his university campus. Also, the attacker may use a more thoroughly optimized implementation of the hash function H (you are talking about PHP but the attacker can do assembly). Moreover, the attacker is patient: users cannot wait for more than a fraction of a second, but a sufficiently bored student may try for several days. Yet, trying 2 billions passwords will still require about 3 full days worth of computing. This is not ultimately secure, but is much better than 13 seconds on a single cheap PC.
2. Salts
A salt is a piece of public data which you hash with the password in order to prevent sharing.
"Sharing" is what happens when the attacker can reuse his hashing efforts over several attacked passwords. This is what happens when the attacker has several hashed passwords (he read the whole database of hashed passwords): whenever he hashes one potential password, he can look it up against all hashed passwords he is trying to attack. We call that a parallel dictionary attack. Another instance of sharing is when the attacker can build a precomputed table of hashed passwords, and then use his table repeatedly (by simple lookups). The fabled rainbow table is just a special case of a precomputed table (that's just a time-memory trade-off which allows for using a precomputed table much bigger than what would fit on a hard disk; but building the table still requires hashing each potential password). Space-time wise, parallel attacks and precomputed tables are the same attack.
Salting defeats sharing. The salt is a public data element which alters the hashing process (one could say that the salt selects the hash function among a whole set of distinct functions). The point of the salt is that it is unique for each password. The attacker can no longer share cracking efforts because any precomputed table would have to use a specific salt and would be useless against a password hashed with a distinct salt.
The salt must be used to verify a password, hence the server must store, for each hashed password, the salt value which was used to hash that password. In a database, that's just an extra column. Or you could concatenate the salt and the hash password in a single blob; that's just a matter of data encoding and it is up to you.
Assuming S to be the salt (i.e. some bytes), the hashing process for password p is: H'(S||p) (with the H' function defined in the previous section). That's it!
The point of the salt is to be, as much as possible, unique to each hashed password. A simple way to achieve that is to use random salts: whenever a password is created or changed, use a random generator to get 16 random bytes. 16 bytes ought to be enough to make salt reuse highly improbable. Note that the salt should be unique for each password: using the user name as a salt is not sufficient (some distinct server instances may have users with the same name -- how many "bob"s exist out there ? -- and, also, some users change their password, and the new password should not use the same salt than the previous password).
3. Choice of hash function
The H' hash function is built over a hash function H. Some traditional implementations have used encryption algorithms twisted into hash functions (e.g. DES for Unix's crypt()). This has promoted the use of the "encrypted password" expression, although it is not proper (the password is not encrypted because there is no decryption process; the correct term is "hashed password"). It seems safer, however, to use a real hash function, designed for the purpose of hashing.
The most used hash functions are: MD5, SHA-1, SHA-256, SHA-512 (the latter two are collectively known as "SHA-2"). Some weaknesses have been found in MD5 and SHA-1. Those weaknesses have serious impact for some usages, but not for what is described above (the weaknesses are about collisions, whereas we work here on preimage resistance). However, it is better public relations to choose SHA-256 or SHA-512: if you use MD5 or SHA-1, you may have to justify yourself. SHA-256 and SHA-512 differ by their output size and performance (on some systems, SHA-256 is much faster than SHA-512, and on others SHA-512 is faster than SHA-256). However, performance is not an issue here (regardless of the hash function intrinsic speed, we make it much slower through input repetitions), and the 256 bits of SHA-256 output are more than enough. Truncating the hash function output to the first n bits, in order to save on storage costs, is cryptographically valid, as long as you keep at least 128 bits (n >= 128).
4. Conclusion
Whenever you create or modify a password, generate a new random salt S (16 bytes). Then hash the password p as SHA-256(S||p||S||p||S||p||...||S||p) where the 'S||p' pattern is repeated enough times to that the hashing process takes 10ms. Store both S and the hash result. To verify a user password, retrieve S, recompute the hash, and compare it with the stored value.
And you will live longer and happier.