I have gone through several questions on this topic on SO and am unable to find an answer to this specific query. I've seen
Salting Your Password: Best Practices? and the excellent answer to Non-random salt for password hashes, which both have very helpful guidelines but don't give a clear guideline on storage.
Is it advisable to have the hash, random salt and iteration count all in the same table? If not, what is a suggested approach?
I do understand that rainbow tables can't be made easily with random salts in place, even if we have them together. I ask because there are many simple extra deterrents that can go a long way. For example, have the salt in a different table (injections usually leak a table, not a DB) and the iteration count in a different tier (say, a constant in the mid-tier).
It is the normal pattern to store the salt and iteration count together with the computed hash.
The salt is not a secret. A salt 'works' by being different for each computed hash. If the attacker knows the salt and iteration count, it does not help him in any way.
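A minimal sketch of that storage pattern, assuming PBKDF2-HMAC-SHA256 from Python's standard library. The record layout (`iterations$salt$hash`), the function names and the iteration count are illustrative assumptions, not a standard:

```python
import base64
import hashlib
import hmac
import os

def make_record(password: str, iterations: int = 100_000) -> str:
    """Return one string holding the iteration count, salt and hash together."""
    salt = os.urandom(16)  # random per-password salt; it is not a secret
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "$".join([
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])

def verify_record(record: str, password: str) -> bool:
    """Re-derive the hash using the salt and iteration count stored in the record."""
    iterations, salt_b64, digest_b64 = record.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(candidate, expected)

record = make_record("mysecret")  # this single string is what goes in the table
```

Keeping the iteration count in the record also lets you raise it later for new passwords without breaking verification of old ones.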
We have answered this question in various forms over on Security Stack Exchange.
Salting with the first 8 bits of the password - general agreement this isn't a good idea
Splitting a password - also not a benefit
These are similar in concept to your approach of holding all the information in the same table.
@Greg's comment is partially right - a determined attacker will be able to get all the data eventually, but the key here is timing. A skilled attacker, given enough time and resources, will be able to access your systems - the key is making it difficult or noisy enough that you spot it in time.
From one of cryptographer Thomas Pornin's posts on our Security Stack Exchange blog:
Why passwords should be hashed - we hash passwords to prevent an attacker with read-only access from escalating to higher power levels. Password hashing will not make your Web site impervious to attacks; it will still be hacked. Password hashing is damage containment.
I've just been reading up a bit about the use of salts, and the example I've been reading gives that of adding a salt to a password before hashing to protect against a dictionary attack.
However, I don't really see how that helps - if the attacker has access to the hash of the password (as they do in the example I've been reading), they will most likely also have access to the salt.
Therefore, can't an attacker just prepend and append the salt to each item in a dictionary before running through the dictionary to see if it matches the hash? So they have to iterate through the dictionary more than once, but that doesn't seem like much of a protection enhancement.
A dictionary attack is an attack where the attacker takes a large list of passwords, possibly ordered by likelihood, and applies the algorithm to each of them, checking the result.
In the case of a salted password, such an attack is still possible (and not significantly costlier) if the attacker has the salt (which is normally assumed): simply input the salt into your algorithm, too.
What a salt protects against is a rainbow table. A rainbow table is a table containing pairs of plaintext (e.g. passwords) and the corresponding hashes, ordered by hash. Such a table allows a simple lookup of the password, given the hash.
Producing a rainbow table is a costly step (depending on the size of the dictionary used as input), but then you can use it without any cost later to look up as many passwords as wanted.
A salt protects against this, since you would now need a separate table for each salt. Even with the traditional Unix crypt's 2-character salt (drawn from a 64-character alphabet), this is already a factor of 4,096. Modern password hash algorithms use a much larger salt (for example, bcrypt uses a 128-bit salt, which gives a factor of 2^128).
To protect against dictionary attacks, too, you'll use a slow hash algorithm instead of a fast one like simple MD5 or SHA1/SHA2. Bcrypt is such an algorithm (with a configurable work factor), and the later-proposed scrypt goes further: it not only takes much time, but also needs lots of memory, which attackers often don't have as much of as processing power.
1- You can't use rainbow tables to crack the hashes
2- If two users have the same password the hash would be different if salted (so it's harder to catch common passwords)
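A small sketch of the second point. Plain SHA-256 is used here only to keep the illustration short (a real system would use a slow password hash), and the helper names are made up for this example:

```python
import hashlib
import os

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Two users who happen to pick the same password, each with their own random salt.
alice_salt, bob_salt = os.urandom(16), os.urandom(16)

same_unsalted = unsalted("PASSWORD") == unsalted("PASSWORD")
same_salted = salted("PASSWORD", alice_salt) == salted("PASSWORD", bob_salt)
```

Without a salt the two stored values are identical, so one cracked entry reveals every user with that password; with per-user salts they differ.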
It does increase the work they have to do by increasing the amount of possible answers in the password file.
One means of doing a dictionary attack is to scan the password file. If there is no salt and you see "DFGE$%$%£TEW" then you know the password is "PASSWORD". Adding salt means you'll have to use either a much larger dictionary containing all the values for "PASSWORD" with all possible salts, or you have to spend the effort to read the salt and do the encryption which slows you down. It's no longer a simple search.
Salt also helps in situations where more than one user chooses the same password. Especially in the old days when the password file was readable by all users, it makes it not obvious if another user has the same password as you, or the same password as one you know.
Actually a salt doesn't protect against dictionary attack. It has the following benefits:
It increases the computational cost of breaking it, because for each password in the dictionary the attacker needs to try hashing it with all possible salts.
It prevents two users that have the same password from also having the same hash. This way an attacker has to explicitly break all the passwords even if there are identical passwords in the same file (the hash of the password is always different).
Dictionary attacks are based on words from the dictionary. By adding a random salt, you no longer have dictionary words. Thus a password hash table based on dictionary words will not be helpful in cracking a password.
Each salt value requires a different dictionary, so every database that doesn't use a salt can be attacked with the same dictionary.
Without any salt, an attacker can just use an off-the-shelf pre-computed dictionary, of which there are plenty.
If you have one salt for your entire database, then they need to create a dictionary specific to your database.
If each user record has its own salt, now they need to create one dictionary per user.
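To make the "one dictionary per user" point concrete, here is a toy sketch. The four-word list is hypothetical, and plain SHA-256 stands in for whatever hash the database uses:

```python
import hashlib
import os

WORDLIST = ["password", "123456", "letmein", "qwerty"]  # hypothetical tiny dictionary

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Built once, reusable against every database that does not salt.
precomputed = {sha256_hex(w.encode("utf-8")): w for w in WORDLIST}

unsalted_hash = sha256_hex(b"letmein")
salt = os.urandom(16)
salted_hash = sha256_hex(salt + b"letmein")

cracked_unsalted = precomputed.get(unsalted_hash)  # found by a plain lookup
cracked_salted = precomputed.get(salted_hash)      # not in the table

# Knowing the salt, the attacker has to redo all the hashing for this one user.
per_user_table = {sha256_hex(salt + w.encode("utf-8")): w for w in WORDLIST}
cracked_after_rework = per_user_table.get(salted_hash)
```

The salt does not stop the attack; it forces the attacker to repeat the work for every salt value instead of reusing one table.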
Today I came up with a question about the web application conventions.
For the sake of security, if we store our users' passwords, we are most probably hashing them (with MD5, SHA-1 etc.) and storing the digest in order to make them difficult or impossible to reverse.
Today there are many rainbow tables that are lookup tables of the usual A-Za-z0-9 sequences up to 6 chars or of widely used passwords. Let's say you are MD5-ing the user password once and storing the hash as the password in the database, and someday hackers pwn your database and now have many MD5 hashes and e-mail addresses. Surely they'll look up passwords, and when they get a preindexed match, they will try to log in to that user's e-mail account.
This can be easily solved by digesting the message twice or simply reversing it. However, I am wondering what the convention is for this problem and how (as far as you know) enterprise applications or giants (Facebook, Google) solve it.
You use what is called a salt. Prepend some string that you make up before hashing. Prepend it also when you are checking the password. This is an application-wide string. This makes it much harder to look up via a rainbow table.
So if your salt is "kdi37s!!" and the password is "P#$$w3rd", save md5("kdi37s!!P#$$w3rd") in the db, and do the same when checking.
Use a little bit of salt and make a hash using sha1 or so.
Check out PBKDF2, it is one of the correct ways to do it.
If you use an algorithm like BCrypt (which is based on the Blowfish block cipher) with a salt, it makes your db pretty safe against brute force attacks. Naturally, you want to require that your users have a reasonable amount of complexity in their password; if a user's password is "a", it's not going to take long to guess it.
If an attacker gets a copy of your db, only being able to try 10 or so passwords a second will mean it will take a real long time to gain any passwords. If you are worried about Moore's law and would like to future proof this, you can specify a cost and make the algorithm even slower.
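As a rough worked example of why the guess rate matters: the dictionary size and the "fast hash" rate below are assumed round numbers for illustration, not measurements.

```python
SECONDS_PER_DAY = 86_400

def days_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Time to try every candidate against ONE salted hash, in days."""
    return candidates / guesses_per_second / SECONDS_PER_DAY

DICTIONARY_SIZE = 10_000_000  # hypothetical wordlist size

slow = days_to_exhaust(DICTIONARY_SIZE, 10)             # about 11.6 days per user
fast = days_to_exhaust(DICTIONARY_SIZE, 1_000_000_000)  # a tiny fraction of a second
```

With per-user salts the slow figure applies to each account separately, so the total cost grows with the number of users.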
The trouble with a pure SHA-x or MD5 password hash is that these algorithms are very fast by design, which makes them very sensitive to brute force attacks. Of course, if you don't salt your hashes, there are tons of rainbow tables that make cracking all the passwords in your db trivial.
I've inherited a web app that I've just discovered stores over 300,000 usernames/passwords in plain text in a SQL Server database. I realize that this is a Very Bad Thing™.
Knowing that I'll have to update the login and password update processes to encrypt/decrypt, and with the smallest impact on the rest of the system, what would you recommend as the best way to remove the plain text passwords from the database?
Any help is appreciated.
Edit: Sorry if I was unclear, I meant to ask what would be your procedure to encrypt/hash the passwords, not specific encryption/hashing methods.
Should I just:
Make a backup of the DB
Update login/update password code
After hours, go through all records in the users table hashing the password and replacing each one
Test to ensure users can still login/update passwords
I guess my concern is more from the sheer number of users so I want to make sure I'm doing this correctly.
EDIT (2016): use Argon2, scrypt, bcrypt, or PBKDF2, in that order of preference. Use as large a slowdown factor as is feasible for your situation. Use a vetted existing implementation. Make sure you use a proper salt (although the libraries you're using should be making sure of this for you).
When you hash the passwords, DO NOT USE PLAIN MD5.
Use PBKDF2, which basically means using a random salt to prevent rainbow table attacks, and iterating (re-hashing) enough times to slow the hashing down - not so much that your application takes too long, but enough that an attacker brute-forcing a large number of different passwords will notice.
From the document:
Iterate at least 1000 times, preferably more - time your implementation to see how many iterations are feasible for you.
8 bytes (64 bits) of salt are sufficient, and the random source doesn't need to be cryptographically secure (the salt is unencrypted, we're not worried someone will guess it).
A good way to apply the salt when hashing is to use HMAC with your favorite hash algorithm, using the password as the HMAC key and the salt as the text to hash (see this section of the document).
Example implementation in Python, using SHA-256 as the secure hash:
EDIT: as mentioned by Eli Collins this is not a PBKDF2 implementation. You should prefer implementations which stick to the standard, such as PassLib.
import base64
import random
from hashlib import sha256
from hmac import HMAC

def random_bytes(num_bytes):
    # random is not a cryptographically secure source
    return bytes(random.randrange(256) for _ in range(num_bytes))

def pbkdf_sha256(password, salt, iterations):
    # despite the name, this is NOT standard PBKDF2 (see the EDIT above)
    result = password.encode("utf-8")
    for _ in range(iterations):
        result = HMAC(result, salt, sha256).digest()  # use HMAC to apply the salt
    return result

NUM_ITERATIONS = 5000

def hash_password(plain_password):
    salt = random_bytes(8)  # 64 bits
    hashed_password = pbkdf_sha256(plain_password, salt, NUM_ITERATIONS)
    # return the salt and hashed password, encoded in base64 and split with ","
    return base64.b64encode(salt).decode("ascii") + "," + base64.b64encode(hashed_password).decode("ascii")

def check_password(saved_password_entry, plain_password):
    salt, hashed_password = saved_password_entry.split(",")
    salt = base64.b64decode(salt)
    hashed_password = base64.b64decode(hashed_password)
    return hashed_password == pbkdf_sha256(plain_password, salt, NUM_ITERATIONS)

password_entry = hash_password("mysecret")
print(password_entry)  # will print, for example: 8Y1ZO8Y1pi4=,r7Acg5iRiZ/x4QwFLhPMjASESxesoIcdJRSDkqWYfaA=
check_password(password_entry, "mysecret")  # returns True
The basic strategy is to use a key derivation function to "hash" the password with some salt. The salt and the hash result are stored in the database. When a user inputs a password, the salt and their input are hashed in the same way and compared to the stored value. If they match, the user is authenticated.
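That strategy can be sketched as follows, assuming PBKDF2-HMAC-SHA256 from Python's `hashlib`. The `users` dict is a stand-in for the database, and the names and the iteration count are assumptions made for this example:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value; tune for your hardware
users = {}            # stand-in for the database: username -> (salt, derived key)

def derive(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    users[username] = (salt, derive(password, salt))  # store salt and result, never the password

def authenticate(username: str, password: str) -> bool:
    if username not in users:
        return False
    salt, stored = users[username]
    # Hash the input the same way and compare in constant time.
    return hmac.compare_digest(derive(password, salt), stored)

register("alice", "mysecret")
```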
The devil is in the details. First, a lot depends on the hash algorithm that is chosen. A key derivation algorithm like PBKDF2, based on a hash-based message authentication code, makes it "computationally infeasible" to find an input (in this case, a password) that will produce a given output (what an attacker has found in the database).
A pre-computed dictionary attack uses a pre-computed index, or dictionary, from hash outputs to passwords. Hashing is slow (or it's supposed to be, anyway), so the attacker hashes all of the likely passwords once, and stores the result indexed in such a way that given a hash, he can look up a corresponding password. This is a classic tradeoff of space for time. Since password lists can be huge, there are ways to tune the tradeoff (like rainbow tables), so that an attacker can give up a little speed to save a lot of space.
Pre-computation attacks are thwarted by using "cryptographic salt". This is some data that is hashed with the password. It doesn't need to be a secret; it just needs to be unpredictable for a given password. For each value of salt, an attacker would need a new dictionary. If you use one byte of salt, an attacker needs 256 copies of their dictionary, each generated with a different salt. First, he'd use the salt to look up the correct dictionary, then he'd use the hash output to look up a usable password. But what if you add 4 bytes? Now he needs 4 billion copies of the dictionary. By using a large enough salt, a dictionary attack is precluded. In practice, 8 to 16 bytes of data from a cryptographic quality random number generator makes a good salt.
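The arithmetic behind those figures, plus one way to generate such a salt in Python. The helper name is made up for this example; `secrets.token_bytes` is the standard-library call:

```python
import secrets

def dictionaries_needed(salt_bytes: int) -> int:
    """Number of distinct salt values, i.e. pre-computed dictionaries required."""
    return 256 ** salt_bytes

one_byte = dictionaries_needed(1)    # 256
four_bytes = dictionaries_needed(4)  # 4294967296, about 4 billion

salt = secrets.token_bytes(16)       # 16 bytes from a cryptographic-quality generator
```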
With pre-computation off the table, an attacker has to compute the hash on each attempt. How long it takes to find a password now depends entirely on how long it takes to hash a candidate. This time is increased by iteration of the hash function. The number of iterations is generally a parameter of the key derivation function; today, a lot of mobile devices use 10,000 to 20,000 iterations, while a server might use 100,000 or more. (The bcrypt algorithm uses the term "cost factor", which is a logarithmic measure of the time required.)
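To illustrate "logarithmic": bcrypt's cost factor is the base-2 logarithm of its iteration count, so each increment doubles the work. The helper name below is made up for this example:

```python
def bcrypt_rounds(cost: int) -> int:
    """Iterations implied by a bcrypt cost factor."""
    return 2 ** cost

rounds_10 = bcrypt_rounds(10)    # 1024
rounds_12 = bcrypt_rounds(12)    # 4096
ratio = rounds_12 // rounds_10   # 4: two more cost steps means four times the work
```

A PBKDF2 iteration count, by contrast, is linear: doubling the number doubles the time.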
I would imagine you will have to add a column to the database for the encrypted password, then run a batch job over all records which gets the current password, encrypts it (as others have mentioned, a hash like md5 is pretty standard; edit: but should not be used on its own - see other answers for good discussions), stores it in the new column, and checks it all happened smoothly.
Then you will need to update your front-end to hash the user-entered password at login time and verify that vs the stored hash, rather than checking plaintext-vs-plaintext.
It would seem prudent to me to leave both columns in place for a little while to ensure that nothing hinky has gone on, before eventually removing the plaintext passwords altogether.
Don't forget also that any code that accesses the password will have to change, such as password change / reminder requests. You will of course lose the ability to email out forgotten passwords, but this is no bad thing. You will have to use a password reset system instead.
Edit:
One final point, you might want to consider avoiding the error I made on my first attempt at a test-bed secure login website:
When processing the user password, consider where the hashing takes place. In my case the hash was calculated by the PHP code running on the webserver, but the password was transmitted to the page from the user's machine in plaintext! This was ok(ish) in the environment I was working in, as it was inside an https system anyway (uni network). But, in the real world I imagine you would want to hash the password before it leaves the user system, using javascript etc. and then transmit the hash to your site.
Follow Xan's advice of keeping the current password column around for a while so if things go bad, you can rollback quick-n-easy.
As far as encrypting your passwords:
use a salt
use a hash algorithm that's meant for passwords (i.e., it's slow)
See Thomas Ptacek's Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes for some details.
I think you should do the following:
Create a new column called HASHED_PASSWORD or something similar.
Modify your code so that it checks for both columns.
Gradually migrate passwords from the non-hashed column to the hashed one. For example, when a user logs in, migrate his or her password automatically to the hashed column and remove the unhashed version. All newly registered users will have hashed passwords.
After hours, you can run a script which migrates n users at a time.
When you have no more unhashed passwords left, you can remove your old password column (you may not be able to do so, depends on the database you are using). Also, you can remove the code to handle the old passwords.
You're done!
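The steps above can be sketched as follows. The `users` dict stands in for the table, a separate `salt` column and PBKDF2-HMAC-SHA256 are assumptions added for this example, and all names are hypothetical:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value

# Stand-in for the users table: the old plaintext column plus the new columns.
users = {
    "alice": {"password": "mysecret", "salt": None, "hashed_password": None},
}

def derive(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def migrate(row):
    """Fill the hashed column and blank the plaintext one."""
    row["salt"] = os.urandom(16)
    row["hashed_password"] = derive(row["password"], row["salt"])
    row["password"] = None

def login(username, password):
    row = users.get(username)
    if row is None:
        return False
    if row["hashed_password"] is not None:  # already migrated
        return hmac.compare_digest(derive(password, row["salt"]), row["hashed_password"])
    # Legacy row: check the plaintext, then migrate on success.
    ok = hmac.compare_digest(row["password"].encode("utf-8"), password.encode("utf-8"))
    if ok:
        migrate(row)
    return ok

def migrate_batch(n):
    """After-hours job: migrate up to n remaining legacy rows."""
    pending = [r for r in users.values() if r["hashed_password"] is None][:n]
    for row in pending:
        migrate(row)
    return len(pending)
```

Once `migrate_batch` reports zero pending rows, the legacy branch in `login` and the old column can go.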
As the others mentioned, you don't want to decrypt if you can help it. Standard best practice is to encrypt using a one-way hash, and then when the user logs in hash their password to compare it.
Otherwise you'll have to use a strong encryption to encrypt and then decrypt. I'd only recommend this if the political reasons are strong (for example, your users are used to being able to call the help desk to retrieve their password, and you have strong pressure from the top not to change that). In that case, I'd start with encryption and then start building a business case to move to hashing.
For authentication purposes you should avoid storing the passwords using reversible encryption, i.e. you should only store the password hash and check the hash of the user-supplied password against the hash you have stored. However, that approach has a drawback: it's vulnerable to rainbow table attacks, should an attacker get hold of your password store database.
What you should do is store the hashes of a pre-chosen (and secret) salt value + the password. I.e., concatenate the salt and the password, hash the result, and store this hash. When authenticating, do the same - concatenate your salt value and the user-supplied password, hash, then check for equality. This makes rainbow table attacks unfeasible.
Of course, if the user sends passwords across the network (for example, if you're working on a web or client-server application), then you should not send the password across in clear text. So instead of storing hash(salt + password), you should store and check against hash(salt + hash(password)), and have your client pre-hash the user-supplied password and send that one across the network. This protects your user's password as well, should the user (as many do) re-use the same password for multiple purposes.
Encrypt using something like MD5, encode it as a hex string
You need a salt; in your case, the username can be used as the salt (it has to be unique, the username should be the most unique value available ;-)
use the old password field to store the MD5, but tag the MD5 (e.g. "MD5:687A878....") so that old (plain text) and new (MD5) passwords can co-exist
change the login procedure to verify against the MD5 if there is an MD5, and against the plain password otherwise
change the "change password" and "new user" functions to create MD5'ed passwords only
now you can run the conversion batch job, which might take as long as needed
after the conversion has been run, remove the legacy-support
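A sketch of the tagged-field idea. Note the substitutions: following the other answers, it uses PBKDF2-HMAC-SHA256 with a random salt instead of MD5 with the username, and the `PBKDF2:` tag and function names are hypothetical:

```python
import base64
import hashlib
import hmac
import os

TAG = "PBKDF2:"       # hypothetical tag, playing the role of "MD5:" above
ITERATIONS = 100_000  # assumed value

def encode_entry(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return TAG + base64.b64encode(salt).decode("ascii") + ":" + base64.b64encode(digest).decode("ascii")

def check_entry(stored: str, password: str) -> bool:
    if stored.startswith(TAG):
        salt_b64, digest_b64 = stored[len(TAG):].split(":")
        salt = base64.b64decode(salt_b64)
        expected = base64.b64decode(digest_b64)
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
        return hmac.compare_digest(candidate, expected)
    # Untagged value: legacy plain-text password.
    return hmac.compare_digest(stored.encode("utf-8"), password.encode("utf-8"))

def convert(stored: str) -> str:
    """Batch-job step: hash a legacy entry, leave tagged ones untouched."""
    return stored if stored.startswith(TAG) else encode_entry(stored)
```

One caveat of tagging in place: a legacy plain-text password that happens to start with the tag would be misread, so it is worth checking for such rows before converting.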
Step 1: Add encrypted field to database
Step 2: Change code so that when password is changed, it updates both fields but logging in still uses old field.
Step 3: Run script to populate all the new fields.
Step 4: Change code so that logging in uses new field and changing passwords stops updating old field.
Step 5: Remove unencrypted passwords from database.
This should allow you to accomplish the changeover without interruption to the end user.
Also:
Something I would do is name the new database field something that is completely unrelated to password like "LastSessionID" or something similarly boring. Then instead of removing the password field, just populate with hashes of random data. Then, if your database ever gets compromised, they can spend all the time they want trying to decrypt the "password" field.
This may not actually accomplish anything, but it's fun thinking about someone sitting there trying to figure out worthless information.
As with all security decisions, there are tradeoffs. If you hash the password, which is probably your easiest move, you can't offer a password retrieval function that returns the original password, nor can your staff look up a person's password in order to access their account.
You can use symmetric encryption, which has its own security drawbacks. (If your server is compromised, the symmetric encryption key may be compromised also).
You can use public-key encryption, and run password retrieval/customer service on a separate machine which stores the private key in isolation from the web application. This is the most secure, but requires a two-machine architecture, and probably a firewall in between.
MD5 and SHA1 have shown a bit of weakness (two words can result in the same hash) so using SHA256-SHA512 / iterative hashes is recommended to hash the password.
I would write a small program in the language that the application is written in that goes and generates a random salt that is unique for each user and a hash of the password. The reason I tend to use the same language as the verification is that different crypto libraries can do things slightly differently (i.e. padding) so using the same library to generate the hash and verify it eliminates that risk. This application could also then verify the login after the table has been updated if you want as it knows the plain text password still.
Don't use MD5/SHA1
Generate a good random salt (many crypto libraries have a salt generator)
An iterative hash algorithm as orip recommended
Ensure that the passwords are not transmitted in plain text over the wire
I would like to suggest one improvement to the great python example posted by Orip. I would redefine the random_bytes function to be:
import os

def random_bytes(num_bytes):
    return os.urandom(num_bytes)
Of course, you would have to import the os module. The os.urandom function provides a random sequence of bytes that can be safely used in cryptographic applications. See the reference help of this function for further details.
To hash the password you can use the HashBytes function. It returns a varbinary, so you'd have to create a new column and then delete the old varchar one.
Like
ALTER TABLE users ADD hashedPassword varbinary(max);
ALTER TABLE users ADD salt char(10);
--Generate random salts and update the column, after that
UPDATE users SET hashedPassword = HashBytes('SHA1', salt + '|' + password);
Then you modify the code to validate the password, using a query like
SELECT count(*) from users WHERE hashedPassword =
HashBytes('SHA1',salt + '|' + <password>)
where <password> is the value entered by the user.
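When the user-entered value reaches the query from application code, it should be bound as a parameter rather than concatenated into the SQL string. A minimal sketch of that, with two substitutions named plainly: it uses Python's built-in `sqlite3` instead of SQL Server, and it derives the hash in the application with PBKDF2 instead of calling HashBytes. Table and function names are illustrative:

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, hashedPassword BLOB)")

def derive(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def add_user(username, password):
    salt = os.urandom(16)
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", (username, salt, derive(password, salt)))

def validate(username, password):
    # The ? placeholder keeps user input out of the SQL text.
    row = conn.execute(
        "SELECT salt, hashedPassword FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(derive(password, salt), stored)

add_user("alice", "mysecret")
```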
I'm not a security expert, but I think the current recommendation is to use bcrypt/blowfish or a SHA-2 variant, not MD5 / SHA1.
You probably need to think in terms of a full security audit, too.