Store passwords safely but determine same passwords - security

I have a legacy browser game which historically uses a simple hashing function for password storage. I know that it's far from ideal. However, time has proven that most of the cheaters (multi-accounts) use the same password for all of their fake accounts.
In an update of my game I want to store passwords more safely. I already know that passwords should be randomly salted, hashed by safe algorithms, etc. That's all nice.
But is there any way to store passwords properly and still determine that two (or more) users use the same password? I don't want to know the password. I don't want to be able to search by password. I only need to tell that suspect users A, B and C use the same one.
Thanks.

If you store them correctly - no. This is one of the points of a proper password storage.
You could have very long passwords, beyond what is available on rainbow tables (not sure about the current state of the art, but it used to be 10 or 12 characters) and not salt them. In this case two passwords would have the same hash. This is a very bad idea (but a solution nevertheless) - if your passwords leak someone may be able to guess them indirectly (xkcd reference).
You may also look at homomorphic encryption, but this is in the realm of science fiction for now.

Well, if you use salt + hashing, you have all the salts as plain text. When a user enters a password, before storing/verifying it, you can hash it with all the salts available and see if you get the corresponding existing hash. :)
The obvious problem with this is that if you are doing it properly with bcrypt or pbkdf2 for hashing, this would be very slow - that's kind of the point in these functions.
I don't think there is any other way you can tell whether two passwords are the same - you need at least one of them plain text, which is only when the user enters it. And then you want to remove it from memory asap, which contradicts doing all these calculations with the plain text password in memory.
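A minimal sketch of the check described above, assuming PBKDF2-HMAC-SHA256 from Python's standard library and a hypothetical in-memory `records` mapping of user name to (salt, hash). The iteration count is an assumption, not a recommendation. It shows why the approach is slow: one full key derivation per stored user.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware


def hash_password(password: str, salt: bytes) -> bytes:
    """Stretch a password with PBKDF2-HMAC-SHA256 under the given salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def users_sharing_password(password: str, records: dict[str, tuple[bytes, bytes]]) -> list[str]:
    """Return the users whose stored (salt, hash) pair matches the plaintext.

    This only works while the plaintext is available (at login or signup),
    and it costs one PBKDF2 run per stored user.
    """
    matches = []
    for user, (salt, stored) in records.items():
        if hmac.compare_digest(hash_password(password, salt), stored):
            matches.append(user)
    return matches
```

With thousands of accounts this loop takes minutes per login, so in practice it would have to be limited to a short list of suspect accounts.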

This will reduce the security of all passwords somewhat, since it leaks information about when two users have the same password. Even so, it is a workable trade-off and is straightforward to secure within that restriction.
The short answer is: use the same salt for all the passwords, but make that salt unique to your site.
Now the long answer:
First, to describe a standard and appropriate way to handle passwords. I'll get to the differences for you afterwards. (You may know all of this already, but it's worth restating.)
Start with a decent key-stretching algorithm, such as PBKDF2 (there are others, some even better, but PBKDF2 is ubiquitous and sufficient for most uses). Select a number of iterations depending on what client-side environment is involved. For JavaScript, you'll want something like 1k-4k iterations. For languages with faster math, you can use 10k-100k.
The key stretcher will need a salt. I'll talk about the salt in a moment.
The client sends the stretched password (the PBKDF2 output) to the server, not the bare password. The server applies a fast hash (SHA-256 is nice) and compares that to the stored hash. (For setting the password, the server does the same thing; it accepts a PBKDF2 hash, applies SHA-256, and then stores it.)
All that is standard stuff. The question is the salt. The best salt is random, but no good for this. The second-best salt is built from service_id+user_id (i.e. use a unique identifier for the service and concatenate the username). Both of these make sure that every user's password hash is unique, even if their passwords are identical. But you don't want that.
So now finally to the core of your question. You want to use a per-service, but not per-user, static salt. So something like "com.example.mygreatapp" (obviously don't use that actual string; use a string based on your app). With a constant salt, all passwords on your service that are the same will stretch (PBKDF2) and hash (SHA256) to the same value and you can compare them without having any idea what the actual password is. But if your password database is stolen, attackers cannot compare the hashes in it to hashes in other sites' databases, even if they use the same algorithm (because they'll have a different salt).
The disadvantage of this scheme is exactly its goal: if two people on your site have the same password and an attacker steals your database and knows the password of one user, they know the password of the other user, too. That's the trade-off.
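As a rough illustration of the site-wide salt scheme, here is a sketch in Python. The salt string is the placeholder from the answer, the iteration count is an assumed value, and `groups_with_shared_password` is a hypothetical helper name. Both the stretching and the final SHA-256 are done in one function here for brevity, although the answer splits them between client and server.

```python
import hashlib
from collections import defaultdict

SITE_SALT = b"com.example.mygreatapp"  # placeholder from the text; use your own string
ITERATIONS = 10_000  # assumed value


def stored_value(password: str) -> str:
    """Stretch with PBKDF2 under the site-wide salt, then apply SHA-256."""
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), SITE_SALT, ITERATIONS)
    return hashlib.sha256(stretched).hexdigest()


def groups_with_shared_password(stored: dict[str, str]) -> list[list[str]]:
    """Group user names by stored hash and keep groups with more than one user."""
    by_hash = defaultdict(list)
    for user, value in stored.items():
        by_hash[value].append(user)
    return [sorted(users) for users in by_hash.values() if len(users) > 1]
```

Because every user shares the salt, equal passwords produce equal stored values, so the comparison never touches a plaintext password.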

Related

how salt can be implemented to prevent pre-computation dictionary attack on password

A salt makes every user's password hash unique, and adding a salt to a password before hashing is supposed to protect against a dictionary attack. But how?
The tool you almost certainly want is called PBKDF2 (Password-Based Key Derivation Function 2). It's widely available, either under the name "pbkdf2" or "RFC 2898". PBKDF2 provides both salting (making two otherwise identical passwords different) and stretching (making it expensive to guess passwords).
Whatever system you are developing for probably has a function available that takes a password, a salt, a number of iterations, and an output size. Given those, it will output some string of bytes. There are several ways to actually make use of this depending on your situation (most notably are you dealing with local authentication or remote authentication?)
Most people are looking for remote authentication, so let's walk through a reasonable way to implement that using a mix of deterministic and random salts. (See further discussion below w/ @SilverlightFox.)
First, the high-level approach:
Hash on the client against a deterministic salt. The client should never send a bare password to the server. Users reuse their passwords all the time. You don't want to know their actual password. You'd rather never see it.
Salt randomly and stretch on the server and then compare.
Here's the actual breakdown:
Choose an app-specific component for your salt. For example, "net.robnapier.mygreatapp" might be my prefix.
Choose a user-specific component for your salt. The userid is usually ideal here.
Concatenate them to create your salt. For example, my salt might be "net.robnapier.mygreatapp:suejones@example.org". The actual salt does not matter too much. What matters is that it is at least "mostly" unique across all of your users and across all other sites that might also hash passwords from your users. The scheme I've given achieves that.
Choose a local number of iterations for PBKDF2. That number is almost certainly 1000. This is too few iterations, but is about all JavaScript can handle reasonably. The more iterations, the more secure the system, but the worse the performance. It's a tension.
Choose a length for your hash. 32 bytes is generally a good choice.
Choose a "PRF" if your system allows you to pick one. HMAC-SHA-256 is a good choice.
You now have all the basic pieces in place. Let's compute some hashes.
On the client, take the password and pass it through PBKDF2 with the above settings. That will give you 32 bytes to send to the server.
On the server, if this is the account creation, create 8 or 16 bytes of random data as your salt for this account. Save that in the database along with the username. Use that salt, and another set of iterations (usually 10,000 or 100,000 if you're not in Node) and apply PBKDF2 to the data that the user sent. Store that in the database. If you're testing the password, just read the salt from the database and reapply PBKDF2 to validate.
Everywhere I say "PBKDF2" here there are other options, probably the most common of which is scrypt (there is also bcrypt). The other options are technically better than PBKDF2. I don't think anyone would disagree with that. I usually recommend PBKDF2 because it's so ubiquitous and there's nothing really wrong with it. But if you have scrypt available, feel free to use that. The client and server do not have to use the same algorithm (the client can use PBKDF2 and the server can use scrypt if you like).
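The two-stage flow above might look roughly like this. It is only a sketch: in a real deployment the client side would run in the browser rather than in Python, and the function names and iteration counts are assumptions chosen to match the numbers in the text.

```python
import hashlib
import hmac
import os

APP_PREFIX = "net.robnapier.mygreatapp"  # example prefix from the text
CLIENT_ITERATIONS = 1_000    # the "local" count mentioned above
SERVER_ITERATIONS = 10_000   # assumed server-side count


def client_hash(user_id: str, password: str) -> bytes:
    """Client side: PBKDF2 with a deterministic app+user salt, 32-byte output."""
    salt = f"{APP_PREFIX}:{user_id}".encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               CLIENT_ITERATIONS, dklen=32)


def server_create(client_value: bytes) -> tuple[bytes, bytes]:
    """Server side, account creation: random salt, then stretch what the client sent."""
    salt = os.urandom(16)
    stored = hashlib.pbkdf2_hmac("sha256", client_value, salt, SERVER_ITERATIONS, dklen=32)
    return salt, stored


def server_verify(client_value: bytes, salt: bytes, stored: bytes) -> bool:
    """Server side, login: reapply PBKDF2 with the saved salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", client_value, salt, SERVER_ITERATIONS, dklen=32)
    return hmac.compare_digest(candidate, stored)
```

The server never sees the bare password, only the 32 bytes derived from it, and the database holds only the second-stage result plus the random salt.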
What's the md5 hash of "superCommonPassword"? That's easy to pre-calculate.
It's b77755edafab848ffcb9580307e97414
If you steal a password database and see that hash value, you know the password is probably "superCommonPassword".
What's the md5 hash ("aStringYouDontKnowUntilYouStealAPasswordDatabase" + "superCommonPassword")? Oh, you can't calculate that until you steal the database.
An unknown salt means pre-calculating hashes of common passwords is useless. An unknown salt per user means you need to calculate hashes of common passwords for each user. This slows down the attacker and increases his costs.
Don't use md5 for password hashing though. Use bcrypt or scrypt or PBKDF2.
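To make the pre-computation argument concrete, here is a small sketch. It uses MD5 only because the example above does (not as a recommendation), and the word list and helper names are made up for illustration.

```python
import hashlib
import os

COMMON_PASSWORDS = ["superCommonPassword", "123456", "password"]  # illustrative list


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The attacker builds this table once, before stealing any database.
precomputed = {md5_hex(p): p for p in COMMON_PASSWORDS}


def lookup(stored_hash: str):
    """Return the password for a stolen hash if the table contains it, else None."""
    return precomputed.get(stored_hash)


unsalted = md5_hex("superCommonPassword")
salt = os.urandom(8).hex()  # unknown to the attacker until the database leaks
salted = md5_hex(salt + "superCommonPassword")
```

Here `lookup(unsalted)` finds the password immediately, while `lookup(salted)` finds nothing: the attacker has to redo the hashing work with the leaked salt, once per distinct salt.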

Multiple Salts to protect passwords

If you use a salt before hashing a password - it will make the hash more secure. It makes sense, because rainbow table attacks become much more difficult (impossible?).
What if you use multiple salts? For example - you check if the day is Monday, or the Month, the hour, etc (or some combination). Then you have a database which stores the fields: (userid, hash1, hash2, hash3...).
Would this make the information any more (or less) secure?
Example:
1) User registers with password 'PASS'.
2) System (PHP in this example) stores md5($password.$this_day) for each day (7 hashes) into table password, columns hash_monday, hash_tuesday, etc.
3) user logs in, and script checks password where 'hash_'.$this_day matches what is entered.
Your system will be no more secure - you end up with several single-salt databases instead of one. In principle it may be even less secure, since you helpfully provide the attacker with 7 hashes of the same string to choose from and he only needs to guess one. These multiple hashes of the same plaintext may also have implications for the cryptographic strength of the hashing used for passwords (not sure on that one, and it will depend on the algorithm used).
Maybe you should have a look at this small article. There are several things wrong with your approach.
A salt does not protect against a dictionary attack. It protects against rainbow-tables if correctly used.
Use a unique salt for each password. The salt should be a random value, not derived from known information. It has to be stored with the password.
Do not use MD5 for hashing passwords. MD5 is considered broken, and it is far too fast for hashing passwords. With an off-the-shelf GPU, you are able to calculate 8 Giga MD5-hashes per second (in 2012). That makes it possible to brute-force a whole English dictionary with about 500000 words in less than 0.1 milliseconds!
Use Bcrypt for hashing passwords. It is recommended to use a well established library like phpass, and if you want to understand how it can be implemented, you can read the article above.
If you want to add a secret to your hash function (like a hidden key, or a hidden function), you can add a pepper to the password. The pepper should not be stored in the database, and should remain secret. The pepper can protect against dictionary attacks, as long as the attacker has only access to your password-hashes (SQL-Injection), but not to the server with the secret.
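The MD5 brute-force estimate in this answer can be checked with quick arithmetic. The two inputs are the figures quoted in the text, not measurements of my own.

```python
HASHES_PER_SECOND = 8e9     # "8 Giga MD5-hashes per second" from the text
DICTIONARY_WORDS = 500_000  # dictionary size from the text

seconds = DICTIONARY_WORDS / HASHES_PER_SECOND
milliseconds = seconds * 1000
print(f"{milliseconds:.4f} ms")  # prints "0.0625 ms"
```

So the "less than 0.1 milliseconds" claim holds for those figures.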
I do not believe multiple hashes are going to help you in this scenario, primarily because when someone compromises your database they will notice that you have 7 different salts to go against and may make an educated guess that they are based on days of the week. There is nothing fundamentally wrong with MD5, as so many people like to jump on that bandwagon. The types of people that say MD5 is a broken hash have a fundamental misunderstanding between a hash function and a cryptographic hash function, I would recommend ignoring them. In the event you need a cryptographic hash function, use SHA-2 (or something from that family or greater).
You will need to salt the user input, as you know. A random value is generally recommended, but it can also be a value you store in a separate application space (outside of the database); you just have to protect that information as well. I highly recommend making the password hashing function take several thousand iterations for any input, as this will slow down the automated process of matching hashes on the database.
If your users use easy-to-guess passwords, dictionary attacks will beat you every day; you can't protect against stupidity.

Password hashing, salt and storage of hashed values

Suppose you were at liberty to decide how hashed passwords were to be stored in a DBMS. Are there obvious weaknesses in a scheme like this one?
To create the hash value stored in the DBMS, take:
A value that is unique to the DBMS server instance as part of the salt,
And the username as a second part of the salt,
And create the concatenation of the salt with the actual password,
And hash the whole string using the SHA-256 algorithm,
And store the result in the DBMS.
This would mean that anyone wanting to come up with a collision would have to do the work separately for each user name and each DBMS server instance. I'd plan to keep the actual hash mechanism somewhat flexible to allow for the use of the new NIST standard hash algorithm (SHA-3) that is still being worked on.
The 'value that is unique to the DBMS server instance' need not be secret - though it wouldn't be divulged casually. The intention is to ensure that if someone uses the same password in different DBMS server instances, the recorded hashes would be different. Likewise, the user name would not be secret - just the password proper.
Would there be any advantage to having the password first and the user name and 'unique value' second, or any other permutation of the three sources of data? Or what about interleaving the strings?
Do I need to add (and record) a random salt value (per password) as well as the information above? (Advantage: the user can re-use a password and still, probably, get a different hash recorded in the database. Disadvantage: the salt has to be recorded. I suspect the advantage considerably outweighs the disadvantage.)
There are quite a lot of related SO questions - this list is unlikely to be comprehensive:
Encrypting/Hashing plain text passwords in database
Secure hash and salt for PHP passwords
The necessity of hiding the salt for a hash
Clients-side MD5 hash with time salt
Simple password encryption
Salt generation and Open Source software
Password hashes: fixed-length binary fields or single string field?
I think that the answers to these questions support my algorithm (though if you simply use a random salt, then the 'unique value per server' and username components are less important).
The salt just needs to be random and unique. It can be freely known as it doesn't help an attacker. Many systems will store the plain text salt in the database in the column right next to the hashed password.
The salt helps to ensure that if two people (User A and User B) happen to share the same password it isn't obvious. Without the random and unique salt for each password the hash values would be the same and obviously if the password for User A is cracked then User B must have the same password.
It also helps protect from attacks where a dictionary of hashes can be matched against known passwords. e.g. rainbow tables.
Also using an algorithm with a "work factor" built in also means that as computational power increases the work an algorithm has to go through to create the hash can also be increased. For example, bcrypt. This means that the economics of brute force attacks become untenable. Presumably it becomes much more difficult to create tables of known hashes because they take longer to create; the variations in "work factor" will mean more tables would have to be built.
I think you are over-complicating the problem.
Start with the problem:
Are you trying to protect weak passwords?
Are you trying to mitigate against rainbow attacks?
The mechanism you propose does protect against a simple rainbow attack, because even if user A and user B have the SAME password, the hashed passwords will be different. It does seem like a rather elaborate, overly complicated way to salt a password.
What happens when you migrate the DB to another server?
Can you change the unique per-DB value? If so, then a global rainbow table can be generated; if not, then you cannot restore your DB.
Instead I would just add the extra column and store a proper random salt. This would protect against any kind of rainbow attack. Across multiple deployments.
However, it will not protect you against a brute force attack. So if you are trying to protect users that have crappy passwords, you will need to look elsewhere. For example if your users have 4 letter passwords, it could probably be cracked in seconds even with a salt and the newest hash algorithm.
I think you need to ask yourself "What are you hoping to gain by making this more complicated than just generating a random salt value and storing it?" The more complicated you make your algorithm, the more likely you are to introduce a weakness inadvertently. This will probably sound snarky no matter how I say it, but it's meant helpfully - what is so special about your app that it needs a fancy new password hashing algorithm?
Why not add a random salt to the password and hash that combination. Next concatenate the hash and salt to a single byte[] and store that in the db?
The advantage of a random salt is that the user is free to change their username. The salt doesn't have to be secret, since it's used to prevent dictionary attacks.
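A sketch of the "random salt, concatenate salt and hash into one byte string" suggestion. Note one swap: where the answer hashes the salt and password once, this sketch uses PBKDF2 (as other answers here recommend). The salt length, iteration count, and function names are assumptions.

```python
import hashlib
import hmac
import os

SALT_LEN = 16         # assumed salt size in bytes
ITERATIONS = 100_000  # assumed work factor


def make_record(password: str) -> bytes:
    """Return salt || hash as a single byte string for one database column."""
    salt = os.urandom(SALT_LEN)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt + digest


def check_record(password: str, record: bytes) -> bool:
    """Split the stored record back into salt and hash, then recompute and compare."""
    salt, digest = record[:SALT_LEN], record[SALT_LEN:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Since nothing in the record depends on the user name, renaming an account leaves the stored value valid.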

How to store passwords *correctly*?

An article that I stumbled upon here in SO provided links to other articles which in turn provided links to even more articles etc.
And in the end I was left completely stumped - so what is the best way to store passwords in the DB? From what I can put together you should:
Use a long (at least 128 fully random bits) salt, which is stored in plaintext next to the password;
Use several iterations of SHA-256 (or even greater SHA level) on the salted password.
But... the more I read about cryptography the more I understand that I don't really understand anything, and that things I had thought to be true for years are actually flat out wrong. Are there any experts on the subject here?
Added: Seems that some people are missing the point. I repeat the last link given above. That should clarify my concerns.
https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2007/july/enough-with-the-rainbow-tables-what-you-need-to-know-about-secure-password-schemes/
You got it right. Only two suggestions:
If one day SHA1 becomes too weak and you want to use something else, it is impossible to unhash the old passwords and rehash them with the new scheme. For this reason, I suggest attaching to each password a "version" number that tells you what scheme you used (salt length, which hash, how many times). If one day you need to switch from SHA to something stronger, you can create new-style passwords while still having old-style passwords in the database and still tell them apart. Migrating users to the new scheme will be easier.
Passwords still go from user to system without encryption. Look at SRP if that's a problem. SRP is so new that you should be a little paranoid about implementing it, but so far it looks promising.
Edit: Turns out bcrypt beat me to it on idea number 1. The stored info is (cost, salt, hash), where cost is how many times the hashing has been done. Looks like bcrypt did something right. Increasing the number of times that you hash can be done without user intervention.
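One way to sketch the "store the scheme alongside the hash" idea is shown below. The `$`-separated layout, the scheme label, and the target iteration count are all hypothetical choices for illustration. This is not bcrypt's actual format.

```python
import hashlib
import hmac
import os

SCHEME = "pbkdf2-sha256"      # hypothetical scheme label
CURRENT_ITERATIONS = 200_000  # hypothetical current target


def encode(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    """Produce 'scheme$iterations$salt$hash' so the record describes itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{SCHEME}${iterations}${salt.hex()}${digest.hex()}"


def verify(password: str, encoded: str) -> bool:
    """Verify using the parameters recorded in the stored string."""
    scheme, iterations, salt_hex, digest_hex = encoded.split("$")
    if scheme != SCHEME:
        raise ValueError(f"unknown scheme: {scheme}")
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def needs_rehash(encoded: str) -> bool:
    """True when the record was made with an older scheme or fewer iterations."""
    scheme, iterations, _, _ = encoded.split("$")
    return scheme != SCHEME or int(iterations) < CURRENT_ITERATIONS
```

Old and new records can then sit side by side in the same column, and `needs_rehash` tells you which ones to upgrade.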
In truth it depends on what the passwords are for. You should take storing any password with care, but sometimes much greater care is needed than others. As a general rule all passwords should be hashed and each password should have a unique salt.
Really, salts don't need to be that complex, even small ones can cause a real nightmare for crackers trying to gain entry into the system. They are added to a password to prevent the use of Rainbow tables to hack multiple account's passwords. I wouldn't add a single letter of the alphabet to a password and call it a salt, but you don't need to make it a unique guid which is encrypted somewhere else in the database either.
One other thing concerning salts. The key to making a password + salt work when hashing is the complexity of the combination of the two. If you have a 12 character password and add a 1 character salt to it, the salt doesn't do much, but cracking the password is still a monumental feat. The reverse is also true.
Use:
Hashed password storage
A 128+ bit user-level salt, random, regenerated (i.e. you make new salts when you make new password hashes, you don't persistently keep the same salt for a given user)
A strong, computationally expensive hashing method
Methodology that is somewhat different (hash algorithm, how many hashing iterations you use, what order the salts are concatenated in, something) from both any 'standard implementation guides' like these and from any other password storage implementation you've written
I think no extra iterations on the password are needed; just make sure there is a salt, and a complex one ;)
I personally use SHA-1 combined with 2 salt keyphrases.
The length of the salt doesn't really matter, as long as it is unique to a user. The reason for a salt is so that a given generated attempt at a hash match is only useful for a single row of your users table in the DB.
Simply said, use a cryptographically secure hash algorithm and some salt for the passwords, that should be good enough for 99.99% of all use cases. The weak link will be the code that checks the password as well as the password input.

The necessity of hiding the salt for a hash

At work we have two competing theories for salts. The products I work on use something like a user name or phone number to salt the hash. Essentially something that is different for each user but is readily available to us. The other product randomly generates a salt for each user and changes each time the user changes the password. The salt is then encrypted in the database.
My question is if the second approach is really necessary? I can understand from a purely theoretical perspective that it is more secure than the first approach, but what about from a practicality point of view. Right now to authenticate a user, the salt must be unencrypted and applied to the login information.
After thinking about it, I just don't see a real security gain from this approach. Changing the salt from account to account, still makes it extremely difficult for someone to attempt to brute force the hashing algorithm even if the attacker was aware of how to quickly determine what it was for each account. This is going on the assumption that the passwords are sufficiently strong. (Obviously finding the correct hash for a set of passwords where they are all two digits is significantly easier than finding the correct hash of passwords which are 8 digits). Am I incorrect in my logic, or is there something that I am missing?
EDIT: Okay so here's the reason why I think it's really moot to encrypt the salt. (lemme know if I'm on the right track).
For the following explanation, we'll assume that the passwords are always 8 characters and the salt is 5 and all passwords are comprised of lowercase letters (it just makes the math easier).
Having a different salt for each entry means that I can't use the same rainbow table (actually technically I could if I had one of sufficient size, but let's ignore that for the moment). This is the real key to the salt from what I understand, because to crack every account I have to reinvent the wheel so to speak for each one. Now if I know how to apply the correct salt to a password to generate the hash, I'd do it because a salt really just extends the length/complexity of the hashed phrase. So I would be cutting the number of possible combinations I would need to generate to "know" I have the password + salt from 26^13 to 26^8 because I know what the salt is. Now that makes it easier, but still really hard.
So onto encrypting the salt. If I know the salt is encrypted, I wouldn't try and decrypt (assuming I know it has a sufficient level of encryption) it first. I would ignore it. Instead of trying to figure out how to decrypt it, going back to the previous example I would just generate a larger rainbow table containing all keys for the 26^13. Not knowing the salt would definitely slow me down, but I don't think it would add the monumental task of trying to crack the salt encryption first. That's why I don't think it's worth it. Thoughts?
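For the counting in this example: with lowercase letters only, the number of candidate strings is 26 raised to the string length. A quick sketch of the two search spaces, under the stated assumptions of an 8-character password and a 5-character salt:

```python
ALPHABET = 26      # lowercase letters only, as assumed above
PASSWORD_LEN = 8
SALT_LEN = 5

known_salt = ALPHABET ** PASSWORD_LEN                 # salt known: guess the password only
unknown_salt = ALPHABET ** (PASSWORD_LEN + SALT_LEN)  # salt hidden: guess both

print(known_salt)
print(unknown_salt // known_salt)  # factor gained by hiding the salt
```

Hiding the salt multiplies the work by 26^5, roughly 11.9 million, which is the same gain as making the password five characters longer.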
Here is a link describing how long passwords will hold up under a brute force attack:
http://www.lockdown.co.uk/?pg=combi
Hiding a salt is unnecessary.
A different salt should be used for every hash. In practice, this is easy to achieve by getting 8 or more bytes from cryptographic quality random number generator.
From a previous answer of mine:
Salt helps to thwart pre-computed dictionary attacks.
Suppose an attacker has a list of likely passwords. He can hash each
and compare it to the hash of his victim's password, and see if it
matches. If the list is large, this could take a long time. He doesn't
want to spend that much time on his next target, so he records the result
in a "dictionary" where a hash points to its corresponding input. If
the list of passwords is very, very long, he can use techniques like a
Rainbow Table to save some space.
However, suppose his next target salted their password. Even if the
attacker knows what the salt is, his precomputed table is
worthless—the salt changes the hash resulting from each password. He
has to re-hash all of the passwords in his list, affixing the target's
salt to the input. Every different salt requires a different
dictionary, and if enough salts are used, the attacker won't have room
to store dictionaries for them all. Trading space to save time is no
longer an option; the attacker must fall back to hashing each password
in his list for each target he wants to attack.
So, it's not necessary to keep the salt secret. Ensuring that the
attacker doesn't have a pre-computed dictionary corresponding to that
particular salt is sufficient.
After thinking about this a bit more, I've realized that fooling yourself into thinking the salt can be hidden is dangerous. It's much better to assume the salt cannot be hidden, and design the system to be safe in spite of that. I provide a more detailed explanation in another answer.
However, recent recommendations from NIST encourage the use of an additional, secret "salt" (I've seen others call this additional secret "pepper"). One additional iteration of the key derivation can be performed using this secret as a salt. Rather than increasing strength against a pre-computed lookup attack, this round protects against password guessing, much like the large number of iterations in a good key derivation function. This secret serves no purpose if stored with the hashed password; it must be managed as a secret, and that could be difficult in a large user database.
The answer here is to ask yourself what you're really trying to protect from? If someone has access to your database, then they have access to the encrypted salts, and they probably have access to your code as well. With all that could they decrypt the encrypted salts? If so then the encryption is pretty much useless anyway. The salt really is there to make it so it isn't possible to form a rainbow table to crack your entire password database in one go if it gets broken into. From that point of view, so long as each salt is unique there is no difference, a brute force attack would be required with your salts or the encrypted salts for each password individually.
A hidden salt is no longer salt. It's pepper. It has its use. It's different from salt.
Pepper is a secret key added to the password + salt which makes the hash into an HMAC (Hash Based Message Authentication Code). A hacker with access to the hash output and the salt can theoretically brute force guess an input which will generate the hash (and therefore pass validation in the password textbox). By adding pepper you increase the problem space in a cryptographically random way, rendering the problem intractable without serious hardware.
For more information on pepper, check here.
See also hmac.
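A minimal sketch of a pepper applied as an HMAC key. Running PBKDF2 first and then HMAC is one possible construction chosen here for illustration. The function name and iteration count are assumptions, and how the pepper is stored (outside the database) is left to the deployment.

```python
import hashlib
import hmac
import os


def peppered_hash(password: str, salt: bytes, pepper: bytes,
                  iterations: int = 100_000) -> bytes:
    """Stretch with the public per-user salt, then key the result with the secret pepper."""
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.new(pepper, stretched, hashlib.sha256).digest()


salt = os.urandom(16)    # stored next to the hash, as usual
pepper = os.urandom(32)  # must be kept out of the password database
stored = peppered_hash("correct horse", salt, pepper)
```

An attacker who obtains only the database (salt and `stored`) cannot test guesses without the pepper. One who also has the server's secret is back to ordinary brute force.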
My understanding of "salt" is that it makes cracking more difficult, but it doesn't try to hide the extra data. If you are trying to get more security by making the salt "secret", then you really just want more bits in your encryption keys.
The second approach is only slightly more secure. Salts protect users from dictionary attacks and rainbow table attacks. They make it harder for an ambitious attacker to compromise your entire system, but are still vulnerable to attacks that are focused on one user of your system. If you use information that's publicly available, like a telephone number, and the attacker becomes aware of this, then you've saved them a step in their attack. Of course the question is moot if the attacker gets your whole database, salts and all.
EDIT: After re-reading over this answer and some of the comments, it occurs to me that some of the confusion may be due to the fact that I'm only comparing the two very specific cases presented in the question: random salt vs. non-random salt. The question of using a telephone number as a salt is moot if the attacker gets your whole database, not the question of using a salt at all.
... something like a user name or phone number to salt the hash. ...
My question is if the second approach is really necessary? I can understand from a purely theoretical perspective that it is more secure than the first approach, but what about from a practicality point of view?
From a practical point of view, a salt is an implementation detail. If you ever change how user info is collected or maintained – and both user names and phone numbers sometimes change, to use your exact examples – then you may have compromised your security. Do you want such an outward-facing change to have much deeper security concerns?
Does stopping the requirement that each account have a phone number need to involve a complete security review to make sure you haven't opened up those accounts to a security compromise?
Here is a simple example showing why it is bad to have the same salt for each hash.
Consider the following table:
UserId  UserName  Password
1       Fred      Hash1 = Sha(Salt1+Password1)
2       Ted       Hash2 = Sha(Salt2+Password2)
Case 1, when Salt1 is the same as Salt2:
If Hash2 is replaced with Hash1, then user 2 could log on with user 1's password.
Case 2, when Salt1 is not the same as Salt2:
If Hash2 is replaced with Hash1, then user 2 cannot log on with user 1's password.
There are two techniques, with different goals:
The "salt" is used to make two otherwise equal passwords encrypt differently. This way, an intruder can't efficiently use a dictionary attack against a whole list of encrypted passwords.
The (shared) "secret" is added before hashing a message, so that an intruder can't create his own messages and have them accepted.
I tend to hide the salt. I use 10 bits of salt by prepending a random number from 1 to 1024 to the beginning of the password before hashing it. When comparing the password the user entered with the hash, I loop from 1 to 1024 and try every possible value of salt until I find the match. This takes less than 1/10 of a second. I got the idea to do it this way from the PHP password_hash and password_verify. In my example, the "cost" is 10 for 10 bits of salt. Or from what another user said, hidden "salt" is called "pepper". The salt is not encrypted in the database. It's brute forced out. It would make the rainbow table necessary to reverse the hash 1000 times larger. I use sha256 because it's fast, but still considered secure.
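The brute-forced hidden salt described above could be sketched as follows. One deliberate change: the salt is prepended as a fixed-width 2-byte value rather than as decimal text, because with decimal text salt 1 + "2pass" and salt 12 + "pass" would hash the same input. Function names are hypothetical, and SHA-256 is used only because the answer does.

```python
import hashlib
import hmac
import secrets

SALT_VALUES = 1024  # 10 bits of hidden salt, as in the text


def _digest(salt: int, password: str) -> str:
    # Fixed-width salt encoding avoids ambiguous salt/password boundaries.
    return hashlib.sha256(salt.to_bytes(2, "big") + password.encode("utf-8")).hexdigest()


def make_hash(password: str) -> str:
    """Hash with a random salt in 1..1024 that is NOT stored anywhere."""
    salt = secrets.randbelow(SALT_VALUES) + 1
    return _digest(salt, password)


def verify(password: str, stored: str) -> bool:
    """Try every possible salt until one reproduces the stored hash."""
    for salt in range(1, SALT_VALUES + 1):
        if hmac.compare_digest(_digest(salt, password), stored):
            return True
    return False
```

An attacker guessing passwords pays the same factor of up to 1024 per guess, so the effect resembles a modest work factor more than a salt.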
Really, it depends on from what type of attack you're trying to protect your data.
The purpose of a unique salt for each password is to prevent a dictionary attack against the entire password database.
Encrypting the unique salt for each password would make it more difficult to crack an individual password, yes, but you must weigh whether there's really much of a benefit. If the attacker, by brute force, finds that this string:
Marianne2ae85fb5d
hashes to a hash stored in the DB, is it really that hard to figure out which part is the pass and which part is the salt?

Resources