Password Management - Approach to Hash, Salt & Iteration - security

I have gone through several questions on this topic at SO and am unable to find answers to this specific query. I've seen
Salting Your Password: Best Practices? and the excellent answer to Non-random salt for password hashes, which both have very helpful guidelines but don't give a clear guideline on storage.
Is it advisable to have the hash, random salt and iteration count all in the same table? If not, what is a suggested approach?
I do understand that rainbow tables can't easily be built when random salts are in place, even if the salts are stored alongside the hashes. I ask because there are many simple extra deterrents that can go a long way. For example, keep the salt in a different table (injections usually leak a single table, not a whole DB) and the iteration count in a different tier (say, a constant in the mid-tier).

It is the normal pattern to store the salt and iteration count together with the computed hash.
The salt is not a secret. A salt 'works' by being different for each computed hash. If the attacker knows the salt and iteration count, it does not help him in any way.
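A minimal sketch of that pattern, assuming PBKDF2-HMAC-SHA256 from Python's standard library and a hypothetical `$`-separated record format (the layout is an illustration, not a standard): the algorithm name, iteration count, salt and hash all live in one column, and verification reads the salt and count back out of it.

```python
import hashlib
import hmac
import os

ALGORITHM = "pbkdf2_sha256"  # label stored in the record (hypothetical format)

def make_record(password: str, iterations: int = 200_000) -> str:
    """Hash a password and pack algorithm, iteration count, salt and hash into one string."""
    salt = os.urandom(16)  # random per-password salt; it is not a secret
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{ALGORITHM}${iterations}${salt.hex()}${digest.hex()}"

def check_record(password: str, record: str) -> bool:
    """Re-derive the hash with the salt and iteration count read back from the record."""
    algorithm, iterations, salt_hex, digest_hex = record.split("$")
    if algorithm != ALGORITHM:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # compare_digest's running time does not depend on where the two inputs differ
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because the iteration count travels with each record in this layout, it can be raised for new hashes later without invalidating old ones.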

We have answered this question in various forms over on Security Stack Exchange.
Salting with the first 8 bits of the password - general agreement this isn't a good idea
Splitting a password - also not a benefit
These are similar in concept to your approach of splitting the information across different tables and tiers.
@Greg's comment is partially right: a determined attacker will be able to get all the data eventually, but the key here is timing. A skilled attacker, given enough time and resources, will be able to access your systems; the key is making it difficult or noisy enough that you spot it in time.
From one of cryptographer Thomas Pornin's posts on our Security Stack Exchange blog:
Why passwords should be hashed - we hash passwords to prevent an attacker with read-only access from escalating to higher power levels. Password hashing will not make your Web site impervious to attacks; it will still be hacked. Password hashing is damage containment.

Related

Does adding a constant string to the user's password before hashing it make it more secure?

Does adding a constant string that is stored in the code to the password before hashing make it harder for an attacker to figure out the original password?
This constant string is in addition to a salt. So, Hash(password + "string in code added to every password" + randomSaltForEachPassword)
Normally, if an attacker gets their hands on the database, they can possibly figure out someone's password by brute force. The database contains the salts corresponding to each password, so they would know what to salt their brute force attempts with. But, with the constant string in code, the attacker would also have to obtain the source code to know what to append to each of their brute force attempts.
I think it would be more secure, but I wanted to get other people's thoughts, and also make sure I'm not inadvertently making it less secure.
Given that you already have a random salt, appending some other string neither adds nor detracts from the security level.
Basically, it's just a waste of time.
update
This was getting a little long to use the comments.
First off, if the attacker has the database and the only thing you've encrypted is the password, then game's over anyhow. They have the data, which is the truly important part.
Second, the salt means that they have to create a larger rainbow table to encompass the larger password length possibilities. The time this takes becomes impractical depending on salt length and the resources available to the cracker. See this question for a bit more info:
How to implement password protection for individual files?
update 2
;)
It is true that users reuse passwords (as some of the latest hacked sites reveal) and it's good that you want to prevent your data loss from impacting them. However, once you finish reading this update you'll see why that's not entirely possible.
The other questions will have to be taken together. The entire purpose of a salt is to ensure that two identical passwords result in different hash values. Each salt value would require its own rainbow table encompassing all of the password hash possibilities.
Therefore not using a salt value means that a single global rainbow table can be referenced. It also means that if you use just one salt value for all passwords on the site, then, again, they can create a single rainbow table and grab all of the passwords at once.
However, when each password has a separate salt value, they have to create a rainbow table for each salt value. Rainbow tables take time and resources to build. One thing that helps limit the time it takes to create a table is knowing the password length restrictions. For example, if your passwords must be between 7 and 9 characters, then the hacker only has to compute hash values in that range.
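To put numbers on the length-restriction point, a small hypothetical helper can count how many candidates an exhaustive search must hash, assuming every character is drawn from a fixed alphabet (95 printable ASCII characters is an illustrative assumption, not something the answer specifies).

```python
def keyspace(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Number of candidate passwords of every length from min_len to max_len inclusive."""
    return sum(alphabet_size ** length for length in range(min_len, max_len + 1))

# Example: 95 printable ASCII characters, passwords of 7 to 9 characters
print(f"{keyspace(95, 7, 9):,}")
```

The longest permitted length dominates the total (here 95^9 is roughly 99% of it), so knowing the upper bound is what mostly limits the attacker's work.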
Now the salt value has to be available to the function that is going to hash a password attempt. Generally speaking you could hide this value elsewhere; but quite frankly if they've stolen the database then they'll be able to track it down pretty easily. So, placing the values next to the actual password has zero impact on security.
Adding a few extra characters that are common to ALL passwords adds nothing to the mix. Once a hacker cracks the first one, it will be obvious that the others have this value, and they can code their rainbow table generator accordingly. This means it buys you essentially no time. Further, it leads to a false sense of security on your part, which can lead to you making bad choices.
Which leads us back to the purpose of salting passwords. The purpose is not to make it impossible, as anyone with time and resources can crack them. The purpose is to make it difficult and time consuming. The time consuming part is to allow you the time to detect the break in, notify everyone you have to, and enforce password changes in your system.
In other words, once the database is lost then all users should be notified so that they can take the appropriate action of changing their passwords on yours and other systems. The salt is just buying you and them time to do this.
The reason I mentioned "impractical" before with regards to cracking them is that the question is really one of the hacker weighing the value of the passwords against the cost of cracking them. Using reasonable salt values you can drive the computational costs up enough that very few hackers would bother. They tend to be low-hanging-fruit kind of people, unless you have a reason to be a target, at which point you should look into other forms of authentication.
This only helps if your threat model includes a situation in which your attacker somehow obtains your password database, but cannot read the secret key stored in your code. For most, this isn't a terribly likely scenario, so it's not worth catering for.
Even in that limited case, it doesn't gain you a great deal of additional security, as the attacker can simply take their own password, and iterate over all possible secret key values. Once they find the right one (because it hashes their own password correctly), they can use that to attack all the other passwords in the database as they would normally.
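The attack described here can be sketched as follows, assuming the question's `Hash(password + constant + salt)` construction with SHA-256 and a deliberately tiny, hypothetical constant `"k9"`: the attacker registers an account with a password they know, obtains the database, and searches for the constant that reproduces their own row. The search is only this quick because the constant is two characters long; its cost grows with the constant's length and randomness.

```python
import hashlib
import itertools
import string

def peppered_hash(password: str, constant: str, salt: str) -> str:
    # Hash(password + constant + salt), as in the question, using SHA-256 for illustration
    return hashlib.sha256((password + constant + salt).encode("utf-8")).hexdigest()

def recover_constant(own_password, salt, stored_hash, alphabet, max_len):
    """Try candidate constants until the attacker's own known password reproduces its hash."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if peppered_hash(own_password, candidate, salt) == stored_hash:
                return candidate
    return None

# The attacker's own row, as it would appear in a stolen database
SECRET_CONSTANT = "k9"  # hypothetical value hidden in the site's code
row_salt = "f3a1"
row_hash = peppered_hash("attacker-password", SECRET_CONSTANT, row_salt)

found = recover_constant("attacker-password", row_salt, row_hash,
                         string.ascii_lowercase + string.digits, max_len=2)
print(found)
```

Once `found` is known, every other row can be attacked exactly as if no constant had been used.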
If you're concerned about storing passwords securely, you should use a standard scheme like PBKDF2, which uses key stretching to make brute forcing much less practical.

Ultimate Hash Protection - Discussion of Concepts

Ok, so the whole problem with hashes is that users don't enter passwords over 15 characters long. Most only use 4-8 characters making them easy for attackers to crack with a rainbow table.
Solution: use a user salt to make the hash input more complex and over 50 chars so that they will never be able to generate a table (way too big for strings that size). Plus, they will have to create a new table for each user. Problem: if they download the DB they will get the user salt, so you are back to square one if they care enough.
Solution: use a site "pepper" plus the user salt; then even if they get the DB they will still have to know the config file. Problem: if they can get into your DB, chances are they might also get into your filesystem and discover your site pepper.
So, with all of this known, let's assume that an attacker makes it into your site and gets everything, EVERYTHING. So what do you do now?
At this point in the discussion, most people reply with "who cares at this point?". But that is just a cheap way of saying "I don't know what to do next so it can't be that important". Sadly, everywhere else I have asked this question that has been the reply. Which shows that most programmers miss a very important point.
Let's imagine that your site is like the other 95% of sites out there and the user data, or even full server access, isn't worth squat. The attacker happens to be after one of your users, "Bob", because he knows that Bob uses the same password on your site as he does on the bank's site. He also happens to know Bob has his life savings in there. Now, if the attacker can just crack our site's hashes, the rest will be a piece of cake.
So here is my question: how do you extend the length of the password without any traceable path? Or how do you make the hashing process too complex to duplicate in a timely manner? The only thing that I have come up with is that you can re-hash a hash several thousand times and increase the time it would take to create the final rainbow table by a factor of 1,000. This is because the attacker must follow that same path when creating his tables.
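The re-hashing idea is usually called key stretching. A minimal sketch, assuming SHA-256 and a hypothetical `stretched_hash` helper: every guess costs the attacker `rounds` hash computations, the same as it costs the site. Standardized versions of this idea exist, e.g. PBKDF2, available in Python as `hashlib.pbkdf2_hmac`.

```python
import hashlib

def stretched_hash(password: str, salt: bytes, rounds: int) -> bytes:
    """Naive key stretching: one salted hash, then `rounds - 1` re-hashes of the digest."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(rounds - 1):
        # each round needs the previous digest, so a guess must run every round in sequence
        digest = hashlib.sha256(digest).digest()
    return digest
```

This only multiplies the cost of each guess by the chosen factor; it does not replace the per-user salt, which is still what stops one table from covering every account.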
Any other ideas?
Solution: use a user salt to make the hash input more complex and over 50 chars so that they will never be able to generate a table (way too big for strings that size). Plus, they will have to create a new table for each user. Problem: if they download the DB they will get the user salt, so you are back to square one if they care enough.
This reasoning is fallacious.
A rainbow table (which is a specific implementation of the general dictionary attack) trades space for time. However, generating a dictionary (rainbow or otherwise) takes a lot of time. It is only worthwhile when it can be used against multiple hashes. Salt prevents this. The salt does not need to be secret, it just needs to be unpredictable for a given password. This makes the chance of an attacker having a dictionary generated for that particular salt negligibly small.
"The only thing that I have come up with is that you can re-hash a hash several thousand times and increase the time it would take to create the final rainbowtable by a factor of 1,000."
Isn't that exactly what the Blowfish-based bcrypt hash is about? Increasing the time it takes to compute a hash so that brute-force cracking (and rainbow table creation) becomes infeasible?
"We present two algorithms with adaptable cost (...)"
More about adaptable cost hashing algorithms: http://www.usenix.org/events/usenix99/provos.html
How about taking the "pepper" idea and implementing it on a separate server dedicated to hashing passwords, locked down except for this one simple and secure-as-possible service, possibly even with rate limits to prevent abuse? This gives the attacker one more hurdle to overcome: either gaining access to this server or reverse engineering the pepper, custom RNG and cleartext extension algorithm.
Of course, if they have access to EVERYTHING, they could just eavesdrop on user activity for a little while.
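A rough in-process sketch of that design, with a hypothetical `HashingService` class standing in for the separate server: the pepper lives only inside the service, the final step is keyed with it (HMAC-SHA256 over a PBKDF2 result, an assumed choice), and a sliding-window rate limit caps how many hashes can be requested. In a real deployment the boundary would be a network interface on a separate host.

```python
import hashlib
import hmac
import time

class HashingService:
    """Hypothetical stand-in for a locked-down hashing server that alone holds the pepper."""

    def __init__(self, pepper: bytes, max_calls: int, per_seconds: float):
        self._pepper = pepper          # never leaves this service
        self._max_calls = max_calls
        self._per_seconds = per_seconds
        self._calls: list[float] = []  # timestamps of recent requests

    def hash(self, password: str, salt: bytes) -> bytes:
        now = time.monotonic()
        # keep only the requests that fall inside the current window
        self._calls = [t for t in self._calls if now - t < self._per_seconds]
        if len(self._calls) >= self._max_calls:
            raise RuntimeError("rate limit exceeded")
        self._calls.append(now)
        stretched = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
        # keyed final step: without the pepper, stolen hashes cannot be checked offline
        return hmac.new(self._pepper, stretched, hashlib.sha256).digest()
```

The rate limit matters because a compromised web server could otherwise use the service itself as a fast guessing oracle.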
uhmm... Okay, my take on this:
You can't get the original password back from a hash. If I have your hash, I may find a password that fits that hash, but I cannot log in to any other site that uses this password, assuming they all use salting. So no real issue here.
If someone gets your DB or even your site to get your config, you're screwed anyway.
For admin or other super accounts, implement a second means of verification, e.g. limit logins to certain IP ranges, use client-side SSL certificates, etc.
For normal users, you won't have much of a chance. Everything you do with their password needs to be stored in some config or database, so if I have your site, I have your magic snake oil as well.
Strong Password limitations don't always work. Some sites require passwords to have a numeric character - and as a result, most users add 1 to their usual password.
So I'm not entirely sure what you want to achieve here. Adding a salt to the front of the user's password and protecting admin accounts with a second means of authentication seems to be the best way, given that users simply don't pick proper passwords and can't be forced to either.
I was hoping that someone might have a solution, but sadly I am no better off than when I first posted the question. It seems that there is nothing that can be done but to find a time-costly algorithm or re-hash thousands of times to slow down the whole process of generating rainbow tables for (or brute-forcing) a hash.

Crypto, hashes and password questions, total noob?

I've read several stackoverflow posts about this topic, particularly this one:
Secure hash and salt for PHP passwords
but I still have a few questions and need some clarification. Please let me know if the following statements are true, and explain your comments:
If someone has access to your database/data, then they would still have to figure out your hashing algorithm and your data would still be somewhat secure, depending on your algorithm? All they would have is the hash and the salt.
If someone has access to your database/data and your source code, then it seems like no matter what you do, your hashing algorithm can be reverse engineered; the only thing you would have on your side would be how complex and time-consuming your algorithm is?
It seems like the weakest link is: how secure your own systems are and who has access to it?
Lasse V. Karlsen ... brings up a good point, if your data is compromised then game over ... my follow up question is: what types of attacks are these hashes trying to protect against? I've read about rainbow table and dictionary attacks (brute force), but how are these attacks administered?
The security of cryptographic algorithms is always in their secret input. Reasonable cryptanalysis is based on an assumption that any attacker knows what algorithm you use. Good cryptographic hashes are non-invertible and collision resistant. This means that there's still a lot of work to do going from a hash to the value that generated it, regardless of whether you know the algorithm applied.
If you used a secure hash, access to the hash, salt, and algorithm will still leave a lot of work for a would-be attacker.
Yes, a secure hash puts a very hard to invert algorithm on your side. Note that this inversion is not 'reverse engineering'.
The weak link is probably the processes and procedures that get those password hashes into the database. There are all sorts of ways to screw up and store sensitive data in the clear.
As I noted in a comment, there are attacks that these measures defend against. First, knowing the password may lead to authorization to do things beyond what the contents of the database suggest. Second, those passwords may be used elsewhere, and you expose your users to risk by revealing their passwords as a result of a break-in. Third, with hashing, an insider can't exploit read-only access to the database (subject to less auditing, etc.) to impersonate a user.
Dictionaries and rainbow tables are techniques for accelerating hash inversion.
Your question is about using passwords as an authentication mechanism and how to securely store these passwords in a database using a hash. As you probably already know, the goal is to be able to verify passwords without storing them in clear text in the database. In this context let me try to answer each of your questions:
If someone has access to your database/data, then they would still have to figure out your hashing algorithm and your data would still be somewhat secure, depending on your algorithm? All they would have is the hash and the salt.
The basic idea of hashing passwords is that the attacker has knowledge of the hashing algorithm and has access to both the hash and the salt. By selecting a cryptographically strong hash function and a suitable salt value that is different for each password, the computational effort required to guess the password is so high that the cost exceeds the possible gain the attacker can get from guessing the password. So to answer your question, hiding the hash function does not improve the security.
If someone has access to your database/data and your source code, then it seems like no matter what you do, your hashing algorithm can be reverse engineered; the only thing you would have on your side would be how complex and time-consuming your algorithm is?
You should always use a well-known (and suitably strong) hashing algorithm, and reverse engineering this algorithm is not meaningful as there is nothing hidden in your code. If you didn't mean reverse engineer but actually reverse, then yes, the passwords are protected by the complexity of reversing the hash function (or guessing a password that matches a hash value). Good hash functions make this very hard.
It seems like the weakest link is: how secure your own systems are and who has access to it?
In general this is true, but when it comes to securing passwords by storing them as hashes you should still assume that the attacker has full access to the hashes and design your system accordingly by choosing an appropriate hash function and using salts.
What types of attacks are these hashes trying to protect against? I've read about rainbow table and dictionary attacks (brute force), but how are these attacks administered?
The basic attack that password hashing protects against is when the attacker gets access to your database. The clear text password cannot be read from the database and the password is protected.
A more sophisticated attacker can generate a list of possible passwords and compute the hash using the same algorithm as you. He can then compare the computed hash to the stored hash and if he finds a match he has a valid password. This is a brute force attack and it is generally assumed that the attacker has "offline" access to your database. By requiring the users to use long and complex passwords the effort required to "brute force" a password is significantly increased.
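The offline brute-force attack can be sketched like this, assuming a salted SHA-256 hash, a known salt and a deliberately tiny search space (lowercase passwords of up to three characters) so it finishes instantly; real attacks run the same loop over far larger candidate lists. The helper names are hypothetical.

```python
import hashlib
import itertools
import string

def salted_sha256(password: str, salt: str) -> str:
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

def brute_force(stored_hash, salt, alphabet, max_len):
    """Offline attack: hash every candidate with the known salt and compare to the stolen hash."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if salted_sha256(candidate, salt) == stored_hash:
                return candidate
    return None

stolen = salted_sha256("cab", "x7Q")  # one row from a leaked table: hash plus its salt
cracked = brute_force(stolen, "x7Q", string.ascii_lowercase, max_len=3)
print(cracked)
```

Each extra character multiplies the work by the alphabet size, which is why long, complex passwords raise the attacker's cost.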
When the attacker wants to attack not one password but all the passwords in the database, a large table of password and hash value pairs can be precomputed and further improved by using what are called hash chains. Rainbow tables are an application of this idea and can be used to brute force many passwords simultaneously without increasing the effort significantly. However, if a unique salt is used to compute the hash for each password, a precomputed table becomes useless, as it is different for each salt and cannot be reused.
To sum it up: Security by obscurity is not a good strategy for protecting sensitive information and modern cryptography allows you to secure information without having to resort to obscurity.
what types of attacks are these hashes trying to protect against?
The type where someone gets your password from a poorly secured site, reverses it, and then tries to access your bank/PayPal/etc. account. It happens all the time, and many people still use the same (and often weak) passwords everywhere.
As a side note, from what I've read, key derivation functions (PBKDF2/scrypt/bcrypt) are considered better/more secure (#1, #2) than plain salted SHA-1/SHA-2 hashes by crypto people.
If you have just a hash, no salt, then once they know your data (and algorithm) they can get your password via a rainbow table lookup. If you have a hash and a salt, they can get your password by burning a lot of CPU cycles and building a rainbow table.
If your salt is the same for all your data, they only need to burn a lot of CPU cycles once to build the table and then they have all the passwords. If your salt is not always the same, they need to burn through the CPU cycles to make a unique rainbow table for each record.
If the salt is long enough, the CPU cycles they need become very cost-prohibitive.
If you know your data security is breached, of course, you need to reset all the passwords immediately anyway, because as far as you know the attacker is willing to spend that time.
If someone has access to your database/data, then they would still have to figure out your hashing algorithm and your data would still be somewhat secure, depending on your algorithm? All they would have is the hash and the salt.
This might be all a really dedicated opponent would need. Much of this answer depends on how valuable the data is, which would tell you how motivated the opponent is. Credit card numbers are going to be extremely valuable, and criminal attackers seem to have plenty of time and accomplices to do their dirty work. Some bad guys have been known to farm out key decryption tasks to botnets!
If someone has access to your database/data and your source code, then it seems like no matter what you do, your hashing algorithm can be reverse engineered; the only thing you would have on your side would be how complex and time-consuming your algorithm is?
If they have access to your source and all the data, the question is going to be "how did you load your key into the memory of the server in the first place?" If it's embedded in the data or in the program code, it's game over and you've lost. If it was hand-keyed by an operator at the machine's boot time, it should be as secure as your trust in your operator. If it is stored in an HSM*, it should still be secure.
And if they have root-level authority access to your running machine, then they can probably trigger and recover a memory dump that will reveal the secret key.
It seems like the weakest link is: how secure your own systems are and who has access to it?
This is true. But there are alternatives that help improve security.
For bank-like protection, the kind that passes security and industry audits, it's recommended that you use a *Hardware Security Module (HSM) to perform key storage and encryption/decryption functions. The commercial-strength HSMs we're looking at cost tens of thousands of dollars or more each, depending on capacity. But I have seen hardware encryption cards that plug into a PCI slot that cost substantially less.
The idea behind an HSM is that the encryption happens on a secure, hardened platform that nobody has access to without the secret keys. Most of them have cabinets with intrusion detection switches, trip wires, epoxied chips, and memory that will self-destruct if tampered with. Not even the legitimate owner or the factory should be able to recover the database key from an HSM without the set of authorized crypto keys (usually carried on smart cards).
For a very small installation, an HSM can be as simple as a smart card. Smart cards aren't high performance encryption devices, though, so you can't pump more than about one decryption transaction per second through them. Systems using smart cards usually just store the root key, then decrypt the working database key on the smart card and send it back to the database accessing system. These will still yield the working database key if the attacker can access running memory, or if the attacker can sniff the USB traffic to and from the smart card.
And I have no experience getting TPM chips to work (yet), but theoretically they can be used to securely store keys on a machine. Again, it is still no defense against an attacker taking a memory dump while the key is loaded in memory, but they would prevent a stolen hard drive containing code and data from revealing its secrets.
A hash cannot be reversed. Conceptually, think of a hash as taking the value to be hashed as the seed to a random number generator, then taking the 500th number that it generates. This is a repeatable process, but it is not a reversible process.
If you store a hashed password in your database, when your user logs in, you take his password from the input to the login page, you apply the same hash to it, and then you compare the result of that operation to what you have stored in the database. If they match, the user typed the right password. (Or, in theory, they could have typed something that happens to hash to the same value, but in practice, you can completely ignore this.)
The purpose of the salt is so that even if users have the same password, you can't tell, and also lots of other things which are equivalent to this idea. If the user's password is "secret", and the salt is "abc", then instead of making a hash of "secret", you hash "secretabc" and store the results of that in your database. You also store the salt, but this is perfectly safe to store -- you can't figure out any information about the password from it.
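The answer's example, written out under the assumption of plain SHA-256 (chosen only for brevity; the other answers recommend PBKDF2 or bcrypt for real password storage): two users pick the same password "secret", but because their salts differ, the stored hashes differ, and each salt sits in the clear next to its hash. The `users` dictionary is a hypothetical stand-in for database rows.

```python
import hashlib
import os

def salted_hash(password: str, salt: str) -> str:
    # Append the salt, as in the example: hash("secret" + "abc") == hash("secretabc")
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()

users = {
    "alice": {"salt": "abc"},
    "bob": {"salt": os.urandom(8).hex()},  # a random per-user salt
}
for row in users.values():
    row["hash"] = salted_hash("secret", row["salt"])  # both users chose "secret"

for name, row in users.items():
    print(name, row["salt"], row["hash"][:16])
```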
The only reason to safeguard the hashed passwords and salt is that if an attacker has a copy of them, he can test passwords offline on his own machine, rather than repeatedly trying to log in to your server, where you would probably lock him out after three attempts or so. Even if you don't lock him out, it's much faster to test locally than to wait for the network round-trip.
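To illustrate why the online route is so much slower for an attacker, here is a sketch of a per-account lockout after three consecutive failures, using a hypothetical `LoginThrottle` class where the caller passes in whether the password check succeeded. An attacker holding a copy of the hashes never touches this code, which is exactly the answer's point.

```python
class LoginThrottle:
    """Hypothetical per-account lockout after a fixed number of consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures: dict[str, int] = {}

    def attempt(self, username: str, password_ok: bool) -> bool:
        if self.failures.get(username, 0) >= self.max_failures:
            return False  # locked out: even a correct password is refused
        if password_ok:
            self.failures[username] = 0  # success resets the counter
            return True
        self.failures[username] = self.failures.get(username, 0) + 1
        return False
```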
( OP )
brings up a good point, if your data is compromised then game over ... my follow up question is: what types of attacks are these hashes trying to protect against? I've read about rainbow table and dictionary attacks (brute force), but how are these attacks administered
( discussion )
It's not a game, except to the attacker. Research these terms:
Sarbanes-Oxley
Gramm-Leach-Bliley Act (GLBA)
HIPAA
Digital Millennium Copyright Act (DMCA)
PATRIOT Act
Then tell us ( as thought provocation for you ) how do we protect against whom? For one thing, it is the efforts of innocents vis-a-vis intruders - and for another it is data-recovery if part of the system fails.
It is an interesting thought experiment that TCP/IP and the like were originally advertised as a weapon of war, built for survivability under attack. Okay, so passwords are hashed and no one can recover them ...
Which, duh, includes the owner-operator of the system.
So you build a robust record locking tool that implements key controls, then political pressures force the use of brand-x tools.
You can read Federal Information Security Management Act (FISMA) and by the time you have read it some governmental entity somewhere will have had an entire disk either stolen or compromised.
How would you protect that disk if it held your personal identity information?
I can tell you from the caliber of Martin Liversage and jadeters they will be paying attention.
Here are my thoughts to your points:
If people have access to your database you have bigger security concerns than your hash algorithm and salt phrase. Hashes are somewhat secure, however there are problems such as hash collisions and hash lookups.
Hashes are one-way, so unless they can guess the input there is no way to reverse out the original text even with the algorithm and salt; hence the name one-way hash.
Security is about obscurity and layers of defense. If you layer your defenses and make it difficult to determine what those defenses are, you stand a much better chance of staving off an attack than if you relied on a single approach to security such as password hashing and running OS/network hardware updates. Throw in some curveballs like obfuscation of the web server platform and clear boundaries between the prod web and database environments. Layers and hiding implementation details buy you valuable time.
When hashing a password, it is one-way. So it is very difficult to get the password even if you have the salt, the source and a lot of CPU cycles to burn.

The necessity of hiding the salt for a hash

At work we have two competing theories for salts. The products I work on use something like a user name or phone number to salt the hash. Essentially something that is different for each user but is readily available to us. The other product randomly generates a salt for each user and changes each time the user changes the password. The salt is then encrypted in the database.
My question is if the second approach is really necessary? I can understand from a purely theoretical perspective that it is more secure than the first approach, but what about from a practicality point of view. Right now to authenticate a user, the salt must be unencrypted and applied to the login information.
After thinking about it, I just don't see a real security gain from this approach. Changing the salt from account to account, still makes it extremely difficult for someone to attempt to brute force the hashing algorithm even if the attacker was aware of how to quickly determine what it was for each account. This is going on the assumption that the passwords are sufficiently strong. (Obviously finding the correct hash for a set of passwords where they are all two digits is significantly easier than finding the correct hash of passwords which are 8 digits). Am I incorrect in my logic, or is there something that I am missing?
EDIT: Okay so here's the reason why I think it's really moot to encrypt the salt. (lemme know if I'm on the right track).
For the following explanation, we'll assume that the passwords are always 8 characters and the salt is 5 and all passwords are comprised of lowercase letters (it just makes the math easier).
Having a different salt for each entry means that I can't use the same rainbow table (actually, technically I could if I had one of sufficient size, but let's ignore that for the moment). This is the real key to the salt from what I understand, because to crack every account I have to reinvent the wheel, so to speak, for each one. Now if I know how to apply the correct salt to a password to generate the hash, I'd do it, because a salt really just extends the length/complexity of the hashed phrase. So I would be cutting the number of possible combinations I would need to generate to "know" I have the password + salt from 26^13 to 26^8, because I know what the salt is. Now that makes it easier, but still really hard.
So on to encrypting the salt. If I know the salt is encrypted (and assume it has a sufficient level of encryption), I wouldn't try to decrypt it first. I would ignore it. Instead of trying to figure out how to decrypt it, going back to the previous example, I would just generate a larger rainbow table containing all keys for the 26^13 combinations. Not knowing the salt would definitely slow me down, but I don't think it would add the monumental task of trying to crack the salt encryption first. That's why I don't think it's worth it. Thoughts?
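The combination counts in this argument follow from 26 choices per character, so n lowercase characters give 26^n strings (alphabet size as the base, length as the exponent). Under the question's assumptions of an 8-character password and a 5-character salt:

```latex
\begin{aligned}
\text{salt unknown (13 characters): } & 26^{13} \approx 2.48 \times 10^{18} \\
\text{salt known (8 characters): } & 26^{8} = 208{,}827{,}064{,}576 \approx 2.09 \times 10^{11} \\
\text{work saved by knowing the salt: } & 26^{13} / 26^{8} = 26^{5} = 11{,}881{,}376
\end{aligned}
```

So knowing the salt cuts the per-account search by a factor of about 11.9 million, but 26^8 guesses are still needed for each account.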
Here is a link describing how long passwords will hold up under a brute force attack:
http://www.lockdown.co.uk/?pg=combi
Hiding a salt is unnecessary.
A different salt should be used for every hash. In practice, this is easy to achieve by getting 8 or more bytes from a cryptographic-quality random number generator.
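A minimal sketch of salt generation with Python's `secrets` module, which draws from the operating system's cryptographically secure generator; the 16-byte default and the hypothetical `new_salt` name are assumptions (the answer only asks for 8 or more bytes).

```python
import secrets

def new_salt(num_bytes: int = 16) -> bytes:
    """Fresh random salt from the OS CSPRNG; store it next to the hash, no need to hide it."""
    if num_bytes < 8:
        raise ValueError("use at least 8 bytes of salt")
    return secrets.token_bytes(num_bytes)
```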
From a previous answer of mine:
Salt helps to thwart pre-computed dictionary attacks.
Suppose an attacker has a list of likely passwords. He can hash each
and compare it to the hash of his victim's password, and see if it
matches. If the list is large, this could take a long time. He doesn't
want spend that much time on his next target, so he records the result
in a "dictionary" where a hash points to its corresponding input. If
the list of passwords is very, very long, he can use techniques like a
Rainbow Table to save some space.
However, suppose his next target salted their password. Even if the
attacker knows what the salt is, his precomputed table is
worthless—the salt changes the hash resulting from each password. He
has to re-hash all of the passwords in his list, affixing the target's
salt to the input. Every different salt requires a different
dictionary, and if enough salts are used, the attacker won't have room
to store dictionaries for them all. Trading space to save time is no
longer an option; the attacker must fall back to hashing each password
in his list for each target he wants to attack.
So, it's not necessary to keep the salt secret. Ensuring that the
attacker doesn't have a pre-computed dictionary corresponding to that
particular salt is sufficient.
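To make the quoted argument concrete, here is a toy sketch in Python (the candidate list is made up, and plain SHA-256 is used only because it keeps the example short): the precomputed dictionary cracks an unsalted hash with a single lookup, but is useless against a salted one, where the attacker has to re-hash every candidate with that target's salt.

```python
import hashlib
import os


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# The attacker's list of likely passwords (illustrative).
likely_passwords = [b"123456", b"password", b"letmein", b"dragon"]

# Precomputed "dictionary": unsalted hash -> password, built once and reused for every target.
dictionary = {sha256_hex(p): p for p in likely_passwords}


def crack_with_dictionary(stored_hash: str):
    """Single lookup; only works if the target hashed without a salt."""
    return dictionary.get(stored_hash)


def crack_salted(stored_hash: str, salt: bytes):
    """With a salt, every candidate must be re-hashed for this one target."""
    for p in likely_passwords:
        if sha256_hex(salt + p) == stored_hash:
            return p
    return None


unsalted_target = sha256_hex(b"letmein")
salt = os.urandom(8)                      # the salt is known to the attacker here
salted_target = sha256_hex(salt + b"letmein")
```

Even with the salt in hand, the attacker's stored table gives no shortcut; the work in crack_salted has to be repeated for every new salt.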
After thinking about this a bit more, I've realized that fooling yourself into thinking the salt can be hidden is dangerous. It's much better to assume the salt cannot be hidden, and design the system to be safe in spite of that. I provide a more detailed explanation in another answer.
However, recent recommendations from NIST encourage the use of an additional, secret "salt" (I've seen others call this additional secret "pepper"). One additional iteration of the key derivation can be performed using this secret as a salt. Rather than increasing strength against a pre-computed lookup attack, this round protects against password guessing, much like the large number of iterations in a good key derivation function. This secret serves no purpose if stored with the hashed password; it must be managed as a secret, and that could be difficult in a large user database.
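One way the extra secret round could look, sketched in Python under stated assumptions: PBKDF2-HMAC-SHA256 with a per-user salt, followed by a single HMAC-SHA256 keyed with a hypothetical pepper. HMAC as the keyed step is an assumption of this sketch. In a real deployment the pepper would live outside the database (for example in a secrets manager); generating it at import time, as below, is only for the demo.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor

# Hypothetical pepper: in practice loaded from outside the database, never stored with the hashes.
PEPPER = secrets.token_bytes(32)


def peppered_hash(password: str, salt: bytes, pepper: bytes = PEPPER) -> bytes:
    # Step 1: ordinary salted, iterated KDF; the salt is stored alongside the result.
    inner = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Step 2: one extra keyed round using the secret pepper (HMAC-SHA256 here).
    return hmac.new(pepper, inner, hashlib.sha256).digest()


def verify(password: str, salt: bytes, stored: bytes, pepper: bytes = PEPPER) -> bool:
    return hmac.compare_digest(peppered_hash(password, salt, pepper), stored)
```

Someone who steals only the database gets salts and final hashes but cannot test guesses without the pepper, which is why the pepper is useless if it is stored next to them.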
The answer here is to ask yourself what you're really trying to protect against. If someone has access to your database, then they have access to the encrypted salts, and they probably have access to your code as well. With all that, could they decrypt the encrypted salts? If so, the encryption is pretty much useless anyway. The salt is really there to make it impossible to build a single rainbow table that cracks your entire password database in one go if it gets broken into. From that point of view, as long as each salt is unique there is no difference: a brute-force attack on each password individually would be required whether the salts are stored in the clear or encrypted.
A hidden salt is no longer salt. It's pepper. It has its use. It's different from salt.
Pepper is a secret key added to the password + salt, which makes the hash into an HMAC (Hash-based Message Authentication Code). A hacker with access to the hash output and the salt can, in theory, brute-force an input that generates the hash (and therefore passes validation in the password textbox). Adding pepper enlarges the search space in a cryptographically random way, rendering the problem intractable for an attacker who does not also know the pepper.
For more information on pepper, check here.
See also HMAC.
My understanding of "salt" is that it makes cracking more difficult, but it doesn't try to hide the extra data. If you are trying to get more security by making the salt "secret", then you really just want more bits in your encryption keys.
The second approach is only slightly more secure. Salts protect users from dictionary attacks and rainbow table attacks. They make it harder for an ambitious attacker to compromise your entire system, but are still vulnerable to attacks that are focused on one user of your system. If you use information that's publicly available, like a telephone number, and the attacker becomes aware of this, then you've saved them a step in their attack. Of course the question is moot if the attacker gets your whole database, salts and all.
EDIT: After rereading this answer and some of the comments, it occurs to me that some of the confusion may be because I'm only comparing the two very specific cases presented in the question: random salt vs. non-random salt. What becomes moot if the attacker gets your whole database is the question of using a telephone number as a salt, not the question of using a salt at all.
... something like a user name or phone number to salt the hash. ...
My question is whether the second approach is really necessary. I can understand from a purely theoretical perspective that it is more secure than the first approach, but what about from a practical point of view?
From a practical point of view, a salt is an implementation detail. If you ever change how user info is collected or maintained – and both user names and phone numbers sometimes change, to use your exact examples – then you may have compromised your security. Do you want such an outward-facing change to have much deeper security concerns?
Should dropping the requirement that each account have a phone number force a complete security review to make sure you haven't opened those accounts up to compromise?
Here is a simple example showing why it is bad to use the same salt for every hash.
Consider the following table:
UserId  UserName  Password
1       Fred      Hash1 = Sha(Salt1 + Password1)
2       Ted       Hash2 = Sha(Salt2 + Password2)
Case 1: Salt1 is the same as Salt2
If Hash2 is replaced with Hash1, then user 2 can log on with user 1's password.
Case 2: Salt1 is not the same as Salt2
If Hash2 is replaced with Hash1, then user 2 cannot log on with user 1's password.
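The two cases can be reproduced with a short Python sketch. SHA-256 stands in for the unspecified Sha, and the row layout and helper names are hypothetical; only the hash column is overwritten, as in the example.

```python
import hashlib


def sha(salt: str, password: str) -> str:
    # Hash = Sha(Salt + Password), with SHA-256 assumed for Sha
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


def can_log_on(row: dict, password: str) -> bool:
    return sha(row["salt"], password) == row["hash"]


def swap_scenario(salt1: str, salt2: str) -> bool:
    """Return True if Ted can log on with Fred's password after Hash2 is overwritten with Hash1."""
    fred = {"salt": salt1, "hash": sha(salt1, "Password1")}
    ted = {"salt": salt2, "hash": sha(salt2, "Password2")}
    ted["hash"] = fred["hash"]  # copy Hash1 over Hash2; Ted's salt is left alone
    return can_log_on(ted, "Password1")
```

With equal salts the copied hash verifies on Ted's row (Case 1); with different salts it does not (Case 2).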
There are two techniques, with different goals:
The "salt" is used to make two otherwise equal passwords hash differently. This way, an intruder can't efficiently use a dictionary attack against a whole list of hashed passwords.
The (shared) "secret" is added before hashing a message, so that an intruder can't create his own messages and have them accepted.
I tend to hide the salt. I use 10 bits of salt by prepending a random number from 1 to 1024 to the password before hashing it. When checking the password the user entered against the hash, I loop from 1 to 1024 and try every possible salt value until I find a match. This takes less than 1/10 of a second. I got the idea from PHP's password_hash and password_verify; in my example, the "cost" is 10, for 10 bits of salt. As another user said, a hidden "salt" is called "pepper". The salt is not encrypted in the database; it's brute-forced out. It would make the rainbow table needed to reverse the hash roughly 1,024 times larger. I use SHA-256 because it's fast but still considered secure.
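A rough Python sketch of the scheme described above, under a few assumptions: the hidden number is written as four zero-padded digits so that, say, 1 + "2abc" cannot collide with 12 + "abc", and SHA-256 is used as the answer states. Only the hash is stored; the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets

HIDDEN_SALT_MAX = 1024  # values 1..1024, i.e. 10 bits, as described in the answer


def _h(n: int, password: str) -> str:
    # Prepend the hidden number (zero-padded, an assumption of this sketch) before hashing
    return hashlib.sha256(f"{n:04d}{password}".encode("utf-8")).hexdigest()


def store(password: str) -> str:
    n = secrets.randbelow(HIDDEN_SALT_MAX) + 1  # random value in 1..1024, never stored
    return _h(n, password)


def check(password: str, stored: str) -> bool:
    # Brute-force the hidden value: up to 1,024 hashes per login attempt
    return any(
        hmac.compare_digest(_h(n, password), stored)
        for n in range(1, HIDDEN_SALT_MAX + 1)
    )
```

One design consequence: because every user draws from the same 1,024 values, two users with the same password have a 1-in-1,024 chance of getting identical hashes, which a stored per-user random salt avoids.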
Really, it depends on what type of attack you're trying to protect your data from.
The purpose of a unique salt for each password is to prevent a dictionary attack against the entire password database.
Encrypting the unique salt for each password would make it more difficult to crack an individual password, yes, but you must weigh whether there's really much of a benefit. If the attacker, by brute force, finds that this string:
Marianne2ae85fb5d
hashes to a hash stored in the DB, is it really that hard to figure out which part is the password and which part is the salt?

Non-random salt for password hashes

UPDATE: I recently learned from this question that throughout the discussion below, I was a bit confused (as I'm sure others were too): what I keep calling a rainbow table is in fact called a hash table. Rainbow tables are more complex creatures, and are actually a variant of Hellman hash chains. Though I believe the answer is still the same (since it doesn't come down to cryptanalysis), some of the discussion might be a bit skewed.
The question: "What are rainbow tables and how are they used?"
Typically, I always recommend using a cryptographically-strong random value as salt, to be used with hash functions (e.g. for passwords), such as to protect against Rainbow Table attacks.
But is it actually cryptographically necessary for the salt to be random? Would any unique value (unique per user, e.g. userId) suffice in this regard? It would in fact prevent using a single Rainbow Table to crack all (or most) passwords in the system...
But does lack of entropy really weaken the cryptographic strength of the hash functions?
Note, I am not asking about why to use salt, how to protect it (it doesn't need to be), using a single constant hash (don't), or what kind of hash function to use.
Just whether salt needs entropy or not.
Thanks, all, for the answers so far, but I'd like to focus on the areas I'm (a little) less familiar with, mainly the implications for cryptanalysis. I'd most appreciate input from anyone with a crypto-mathematical point of view.
Also, if there are additional vectors that haven't been considered, that's great input too (see @Dave Sherohman's point on multiple systems).
Beyond that, if you have any theory, idea, or best practice, please back it up with proof, an attack scenario, or empirical evidence, or even valid considerations for acceptable trade-offs. I'm familiar with Best Practice (capital B, capital P) on the subject; I'd like to prove what value it actually provides.
EDIT: Some really good answers here, but I think, as @Dave says, it comes down to rainbow tables for common user names... and possibly less common names too. However, what if my usernames are globally unique? Not merely unique within my system, but unique to each user everywhere, e.g. an email address.
There would be no incentive to build a rainbow table (RT) for a single user (as @Dave emphasized, the salt is not kept secret), and this would still prevent clustering. The only issue would be that I might use the same email and password on a different site, but a salt wouldn't prevent that anyway.
So, it comes back down to cryptanalysis - IS the entropy necessary, or not? (My current thinking is it's not necessary from a cryptanalysis point of view, but it is from other practical reasons.)
Salt is traditionally stored as a prefix to the hashed password. This already makes it known to any attacker with access to the password hash. Using the username as salt or not does not affect that knowledge and, therefore, it would have no effect on single-system security.
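A Python sketch of keeping the salt and iteration count in the same string as the hash, which is also the pattern the original question asks about. The $-separated layout and the pbkdf2-sha256 label are hypothetical, loosely modeled on crypt(3)-style strings rather than any specific library's format:

```python
import base64
import hashlib
import hmac
import secrets

# Hypothetical record layout:  $pbkdf2-sha256$<iterations>$<salt base64>$<hash base64>
PREFIX = "pbkdf2-sha256"


def _b64(b: bytes) -> str:
    return base64.b64encode(b).decode("ascii")


def make_record(password: str, iterations: int = 100_000) -> str:
    salt = secrets.token_bytes(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Salt and iteration count travel with the hash; neither is secret.
    return f"${PREFIX}${iterations}${_b64(salt)}${_b64(dk)}"


def check_record(password: str, record: str) -> bool:
    _, scheme, iters, salt_b64, hash_b64 = record.split("$")
    if scheme != PREFIX:
        raise ValueError("unknown scheme")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iters))
    return hmac.compare_digest(dk, expected)
```

Storing the iteration count per record also lets you raise it later for new hashes without breaking verification of old ones.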
However, using the username or any other user-controlled value as salt would reduce cross-system security, as a user who had the same username and password on multiple systems which use the same password hashing algorithm would end up with the same password hash on each of those systems. I do not consider this a significant liability because I, as an attacker, would try passwords that a target account is known to have used on other systems first before attempting any other means of compromising the account. Identical hashes would only tell me in advance that the known password would work, they would not make the actual attack any easier. (Note, though, that a quick comparison of the account databases would provide a list of higher-priority targets, since it would tell me who is and who isn't reusing passwords.)
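The cross-system effect is easy to demonstrate in Python: with a salt derived from the username, two independent systems running the same algorithm store byte-identical hashes for a user who reuses a password, while random salts make the stored values unrelated. PBKDF2-HMAC-SHA256, the iteration count, and the sample credentials are assumptions for illustration.

```python
import hashlib
import secrets

ITERATIONS = 10_000  # illustrative


def kdf(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def username_salted(username: str, password: str) -> bytes:
    # Salt derived from the username: deterministic, so identical on every site using this scheme
    return kdf(password, username.encode("utf-8"))


def random_salted(password: str) -> tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)
    return salt, kdf(password, salt)


# "Dave" reuses one password on two systems that use the same algorithm.
site_a = username_salted("Dave", "Tr0ub4dor&3")
site_b = username_salted("Dave", "Tr0ub4dor&3")
_, rand_a = random_salted("Tr0ub4dor&3")
_, rand_b = random_salted("Tr0ub4dor&3")
```

Comparing the two username-salted databases would immediately reveal Dave as a password reuser; the randomly salted values reveal nothing of the kind.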
The greater danger from this idea is that usernames are commonly reused - just about any site you care to visit will have a user account named "Dave", for example, and "admin" or "root" are even more common - which would make construction of rainbow tables targeting users with those common names much easier and more effective.
Both of these flaws could be effectively addressed by adding a second salt value (either fixed and hidden or exposed like standard salt) to the password before hashing it, but, at that point, you may as well just be using standard entropic salt anyhow instead of working the username into it.
Edited to Add: A lot of people are talking about entropy and whether entropy in salt is important. It is, but not for the reason most of the comments on it seem to think.
The general thought seems to be that entropy is important so that the salt will be difficult for an attacker to guess. This is incorrect and, in fact, completely irrelevant. As has been pointed out a few times by various people, attacks which will be affected by salt can only be made by someone with the password database and someone with the password database can just look to see what each account's salt is. Whether it's guessable or not doesn't matter when you can trivially look it up.
The reason that entropy is important is to avoid clustering of salt values. If the salt is based on username and you know that most systems will have an account named either "root" or "admin", then you can make a rainbow table for those two salts and it will crack most systems. If, on the other hand, a random 16-bit salt is used and the random values have roughly even distribution, then you need a rainbow table for all 2^16 possible salts.
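A toy Python illustration of the clustering argument, using made-up data: if one admin-type account from each of 1,000 systems is named either root or admin, tables for just two salts cover every account when the username is the salt, whereas two tables cover only a tiny fraction of accounts when each uses a random 16-bit salt.

```python
import random
from collections import Counter


def coverage_of_top_salts(salts: list, k: int) -> float:
    """Fraction of accounts whose salt is among the k most common salts,
    i.e. how many accounts k precomputed tables would cover."""
    counts = Counter(salts)
    covered = sum(c for _, c in counts.most_common(k))
    return covered / len(salts)


# Hypothetical sample: one admin-type account from each of 1,000 systems.
rng = random.Random(42)
usernames = [rng.choice(["root", "admin"]) for _ in range(1000)]

username_salts = usernames                               # salt = username
random_salts = [rng.getrandbits(16) for _ in usernames]  # random 16-bit salt per account
```

The attacker's best strategy is always to build tables for the most common salts first; high-entropy salts make "most common" nearly meaningless.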
It's not about preventing the attacker from knowing what an individual account's salt is, it's about not giving them the big, fat target of a single salt that will be used on a substantial proportion of potential targets.
Using a high-entropy salt is absolutely necessary to store passwords securely.
Taking my username 'gs' and adding it to my password 'MyPassword' gives 'gsMyPassword'. This is easily broken with a rainbow table: if the username doesn't have enough entropy, the combined value may already be stored in the table, especially if the username is short.
Another problem is attacks where you know that a user participates in two or more services. There are lots of common usernames; probably the most important ones are admin and root. If somebody created rainbow tables for salts equal to the most common usernames, they could use them to compromise accounts.
They used to have a 12-bit salt. 12 bits give 4096 different combinations. That was not secure enough, because tables for that many salts can easily be stored nowadays. The same applies to the 4096 most-used usernames: it's likely that a few of your users will choose one of the most common usernames.
I've found this password checker, which works out the entropy of your password. Lower entropy in the hashed input (for example, from using usernames as salt) makes things much easier for rainbow tables, since they try to cover at least all low-entropy passwords, which are the most likely to occur.
It is true that the username alone may be problematic, since people may share usernames across different websites. But it should be rather unproblematic if users had a different name on each website. So why not just make it unique for each website? Hash the password something like this:
hashfunction("www.yourpage.com/"+username+"/"+password)
This should solve the problem. I'm not a master of cryptanalysis, but I sure doubt that the fact that we don't use high entropy would make the hash any weaker.
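A runnable version of the hashfunction pseudocode above, assuming SHA-256 for the unspecified hashfunction and keeping the site string from the answer as a placeholder:

```python
import hashlib

SITE = "www.yourpage.com"  # placeholder from the answer; stands in for your site's identifier


def site_scoped_hash(username: str, password: str, site: str = SITE) -> str:
    # hashfunction("www.yourpage.com/" + username + "/" + password), SHA-256 assumed
    data = f"{site}/{username}/{password}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

The site prefix keeps admin on this site from sharing a salt with admin elsewhere, but the value is still deterministic, so anyone targeting one specific account can precompute for it before ever seeing the database.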
I like to use both: a high-entropy random per-record salt, plus the unique ID of the record itself.
Though this doesn't add much to security against dictionary attacks, etc., it does remove the fringe case where someone copies their salt and hash to another record with the intention of replacing the password with their own.
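A sketch of the "belts and braces" idea in Python, assuming the record ID is appended to the random salt before PBKDF2-HMAC-SHA256 (the row structure, helper names, and passwords are hypothetical). Copying one row's salt and hash onto another row no longer lets the copied password work there:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 10_000  # illustrative


def record_hash(record_id: int, salt: bytes, password: str) -> bytes:
    # Mix the row's unique ID into the salt so a (salt, hash) pair only verifies on its own row
    bound_salt = salt + record_id.to_bytes(8, "big")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), bound_salt, ITERATIONS)


def verify_row(row: dict, password: str) -> bool:
    expected = record_hash(row["id"], row["salt"], password)
    return hmac.compare_digest(expected, row["hash"])


salt_m = secrets.token_bytes(16)
mallory = {"id": 1, "salt": salt_m, "hash": record_hash(1, salt_m, "mallory-pw")}
salt_v = secrets.token_bytes(16)
victim = {"id": 2, "salt": salt_v, "hash": record_hash(2, salt_v, "victim-pw")}

# Mallory copies her own salt and hash onto the victim's row.
tampered = dict(victim, salt=mallory["salt"], hash=mallory["hash"])
```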
(Admittedly it's hard to think of a circumstance where this applies, but I can see no harm in belts and braces when it comes to security.)
If the salt is known or easily guessable, you have not increased the difficulty of a dictionary attack. It may even be possible to create a modified rainbow table that takes a "constant" salt into account.
Using unique salts increases the difficulty of BULK dictionary attacks.
Having unique, cryptographically strong salt value would be ideal.
I would say that as long as the salt is different for each password, you will probably be OK. The point of the salt is that you can't use a standard rainbow table to solve every password in the database. So if you apply a different salt to every password (even if it isn't random), the attacker would basically have to compute a new rainbow table for each password, since each password uses a different salt.
Using a salt with more entropy doesn't help a whole lot, because the attacker in this case is assumed to already have the database. Since you need to be able to recreate the hash, you must already know what the salt is, so you have to store the salt, or the values that make up the salt, in your file anyway. In systems like Linux, the method for getting the salt is known, so there is no use in having a secret salt. You have to assume that an attacker who has your hash values probably knows your salt values as well.
The strength of a hash function is not determined by its input!
Using a salt that is known to the attacker obviously makes constructing a rainbow table (particularly for hard-coded usernames like root) more attractive, but it doesn't weaken the hash. Using a salt which is unknown to the attacker will make the system harder to attack.
The concatenation of a username and password might still provide an entry for an intelligent rainbow table, so using a salt made of a series of pseudo-random characters, stored with the hashed password, is probably a better idea. As an illustration, if I had username "potato" and password "beer", the concatenated input for your hash is "potatobeer", which is a reasonable entry for a rainbow table.
Changing the salt each time the user changes their password might help to defeat prolonged attacks, as would the enforcement of a reasonable password policy, e.g. mixed case, punctuation, min length, change after n weeks.
However, I would say your choice of digest algorithm is more important. Use of SHA-512 is going to prove to be more of a pain for someone generating a rainbow table than MD5, for example.
Salt should have as much entropy as possible to ensure that should a given input value be hashed multiple times, the resulting hash value will be, as close as can be achieved, always different.
Using ever-changing salt values with as much entropy as possible ensures that hashing the same input (say, password + salt) is very likely to produce entirely different hash values each time.
The less entropy in the salt, the greater the chance of generating the same salt value, and thus the greater the chance of generating the same hash value.
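The chance of "generating the same salt value" can be quantified with the birthday bound: for n random b-bit salts, the probability that at least two coincide is approximately 1 - exp(-n(n-1)/2^(b+1)). A small Python helper (the function name is hypothetical) shows how quickly a 12-bit salt space like the one mentioned above fills up compared with a 128-bit one:

```python
import math


def salt_collision_probability(num_hashes: int, salt_bits: int) -> float:
    """Approximate chance that at least two of num_hashes uniformly random salts coincide
    (birthday bound: 1 - exp(-n(n-1) / 2^(b+1)))."""
    space = 2 ** salt_bits
    x = num_hashes * (num_hashes - 1) / (2 * space)
    return -math.expm1(-x)  # equals 1 - exp(-x), accurate even when x is tiny
```

With only 100 accounts, a 12-bit salt already gives roughly a 70% chance of at least one shared salt, while a million 128-bit salts have a collision probability on the order of 10^-27.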
It is the fact that the hash value is "constant" when the input is known and "constant" that makes dictionary attacks and rainbow tables so effective. Varying the resulting hash value as much as possible (by using high-entropy salt values) ensures that hashing the same input with a random salt produces many different hash values, thereby defeating (or at least greatly reducing the effectiveness of) rainbow table attacks.
Entropy is the point of a salt value.
If there is some simple, reproducible "math" behind the salt, then it's as if the salt weren't there. Just adding a time value should be fine.

Resources