Related
We've got a legacy system that stores passwords using the MS membership provider and have just found out that it only used a SHA1 with a random salt to store the passwords, so obviously we are concerned about this situation. I know the ideal situation would be to force a global password reset, but for assorted reasons we would like to avoid this, if possible, and keep the existing passwords as-is. I've done some poking around and have managed to find some source-code, and can re-hash my password so I get the same result as the stored version of it, so I am wanting to override all the appropriate methods to reimplement the code in a secure manner.
What I am proposing to do is to rehash the stored passwords using a currently "secure" hash (AFAIK, the current methods are only classed as secure due to the amount of computation time it takes to brute-force a password, so if systems get a big performance upgrade the whole programming world could end up having to revisit this), then wrap this hash around the existing hash in the code, but I have 2 questions:-
Is this actually secure? As far as I have read, each hash needs to increase the amount of entropy and I'm 90% sure this will do so, but are there any issues in doing this that I need to be aware of? I'm also guessing that in chains of hashing it's the strongest hash function that determines the "base-line" security level, but I thought I'd double-check there weren't any weird mathematical quirks with hashing an "insecure" hash. I'm sure not, but due to the nature of the problem, I'd rather ask a stupid question than make any incorrect assumptions, as the technical aspect of hashing functions isn't something I've really looked into.
Should I re-apply the salt to the current hash, before re-hashing. My thinking on this was in case there were existing calculated tables which convert older hashes to newer ones - in case someone nefarious had done the grunt-work to try to bypass this method. I believe b-crypt may already include a salt, but if I use an alternative that doesn't, I'm guessing should include one?
Double hashing can be a good way to protect very weak password-hashes immediately, if you can't wait on the next user login and don't want to enforce a login. Weak password-hashes include unsalted hashes or very fast hashes like SHA-*/MD5.
So you can prepare your database like this:
Make the old salt persistent in the database, you need the oldSalt to verify the double hash.
Calculate the double hash and store it in the database newHash = newSafeHashFunction(oldHash, newSalt). Nowadays safe hash functions are BCrypt, SCrypt, Argon2 and PBKDF2. Generate a new salt fullfilling the requirements of the new password-hash function.
After the next successful login, the double hash should be replaced with the pure new algorithm newSafeHashFunction(password, newSalt).
Most password-hash implementations will generate a safe salt on their own and include it in the resulting password-hash string, so there is no need to generate and store them separately. When the user logs in the next time, the password can be verified like this:
if (checkIfDoubleHash(storedHash))
correctPassword = newSafeHashFunction(oldUnsafeHashFunction(password, oldSalt), storedHash)
else
correctPassword = newSafeHashFunction(password, storedHash)
➽ Note the function checkIfDoubleHash(), it is crucial and a common pitfall for double hashing. If we would generally accept newSafeHashFunction(password, storedHash) and an attacker can get hold of an old backup, or has values from an earlier SQL-injection, (s)he could use the old hashes directly as password.
The implementation of checkIfDoubleHash() can be as easy as checking for the old salt, or it can be made future proof in marking the hash as double hash. Most frameworks already offer a password_hash() function which adds such a "mark", so they can switch to newer algorithms when necessary.
$2y$10$nOUIs5kJ7naTuTFkBy1veuK0kSxUFXfuaOKdOKf9xYT0KKIGSJwFa
|
hash-algorithm-descriptor = 2y = BCrypt
This is an often adopted format used by the Unix crypt() function. There is nothing preventing you from using your own descriptor for the double hashes. Of course the mark can also be stored in a separate database field.
I've been asked to implement some changes/updates to an intranet-site; make it 'future proof' as they call it.
We found that the passwords are hashed using the MD5 algorithm. (the system has been around since 2001 so it was adequate at time).
We would now like to upgrade the hashing-algorithm to a stronger one (BCrypt-hash or SHA-256).
We obviously do not know the plaintext-passwords and creating a new password for the userbase is not an option*).
So, my question is:
What is the accepted way to change hashing-algorithm without having access to the plaintext passwords?
The best solution would be a solution that is entirely 'behind the scenes'.
*) we tried; tried to convince them, we used the argument of 'password age', tried to bribe them with coffee, tried to bribe them with cake, etc. etc. But it is not an option.
Update
I was hoping for some sort of automagic solution for solving the problem, but apparently there are no other options than just 'wait for the user to log in, then convert'.
Well, at least now I now there is no other solution available.
First, add a field to the DB to identify whether or not the password is using MD5 or the new algorithm.
For all passwords still using MD5:
-- In the login process, where you verify a user's entered password: temporarily store the user's submitted password in memory (no security issue here, as it is already in memory somewhere) and do the usual MD5 hash & compare with the stored hash;
-- If the correct password was given (matches the existing hash), run the temporarily stored password through the new algorithm, store that value, update the new field to identify that this password has been updated to the new algorithm.
(Of course you would just use the new algorithm for any new users/new passwords.)
I'm not entirely sure about this option, since I'm not an expert on cryptography. Please correct me if I'm wrong at some point here!
I think Dave P. has clearly the best option.
... but. There is an automagic solution - hash the older hashes themselves. That is, take the current hashes, and hash them again with a stronger algorithm. Notice that as far as I understand, you don't get any added security from hash length here, only the added cryptographical complexity of the new algorithm.
The problem is, of course, that checking a password would then have to go through both hashes. And you'd have to do the same for evey new password as well. Which is, well, pretty much silly. Unless you want to use a similar scheme like Dave P. explained to eventually graduate back to single-hashed passwords with the new hashing algorithm... in which case, why even bother with this? (Granted, you might use it in a flashy "Improved security for all passwords, applied immediately!"-way at a presentation to corporate suits, with a relatively straight face...)
Still, it's an option that can be applied immediately to all current passwords, without any gradual migration phase.
But boy, oh boy, is someone going to have a good laugh looking at that code later on! :)
Add passwordChange datetime field to the database.
All password set before day X, check using MD5
All passwords set after day X, check using BCrypt or whatever.
You could store, either in the hash field itself (e.g. "MD5:d41d8cd98f00b204e9800998ecf8427e") or in another column, which algorithm was used to create that hash. Then you'd have to modify the login process to use the correct algorithm when checking the password. Naturally, any new passwords will be hashed using the new algorithm. Hopefully, passwords eventually expire, and over time all of the MD5 hashes will be phased out.
Since you don't know plaintext password, maybe you should to create a field which indicates encription version (like PasswordVersion bit default 0)
Next time user tries to log in, check hashed password using current algorithm version, just like you do today. If it matches, hash it again and update PasswordVersion field.
Hopefully you'll not need a PasswordVersion column bigger than bit. =)
You should change your password database to store 3 items:
An algorithm identifier.
A random salt string chosen by the server when it first computes and stores the password hash.
The hash of the concatenation of salt+password using the specified algorithm.
Of course these could just be stored together in one text field with a delimiter:
"SHA256:this-is-salt:this-is-hash-value"
Now convert you existing entries to a value with empty salt and the old algorithm
"MD5::this-is-the-old-md5-hash-without-salt"
Now you have enough information to verify all you existing password entries, but you can also verify new entries (since you know which hash function was used). You can convert the old entries to the new algorithm the next time the existing users login since you will have their password available during this process:
If your database indicates they are using the old algorithm with no salt, first verify the password the old way by checking that the MD5 hash of the password matches. If not, reject the login.
If the password was verified, have the server choose a random salt string, compute the SHA256 hash of the salt+password, and replace the password table entry with a new one specifiy the new algorithm, salt and hash.
When the user logs in again, you'll see they are using the new algorithm, so compute the hash of the salt+password and check that it matches the stored hash.
Eventually, after this system has been running for a suitable time, you can disable accounts that haven't been converted (if desired).
The addition of a random salt string unique to each entry makes this scheme much more resistent to dictionary attacks using rainbow tables.
The best answer is from an actual cryptography expert
https://paragonie.com/blog/2016/02/how-safely-store-password-in-2016#legacy-hashes
This post also helps explain which hashing you should use. It's still current even if it says 2016. If in doubt use bcrypt.
Add a column to your user accounts table, called legacy_password (or equivalent). This is just a Boolean
Calculate the new stronger hash of the existing password hashes and store them in the database.
Modify your authentication code to handle the legacy flag.
When a user attempts to login, first check if the legacy_password flag is set. If it is, first pre-hash their password with your old password hashing algorithm, then use this prehashed value in place of their password. Afterwards (md5), recalculate the new hash and store the new hash in the database, disabling the legacy_password flag in the process.
Someone told me that he has seen software systems that:
retrieve MD5 encrypted passwords from other systems;
decrypt the encrypted passwords and
store the passwords in the database of the system using the systems own algorithm.
Is that possible? I thought that it wasn't possible / feasible to decrypt MD5 hashes.
I know there are MD5 dictionaries, but is there an actual decryption algorithm?
No. MD5 is not encryption (though it may be used as part of some encryption algorithms), it is a one way hash function. Much of the original data is actually "lost" as part of the transformation.
Think about this: An MD5 is always 128 bits long. That means that there are 2128 possible MD5 hashes. That is a reasonably large number, and yet it is most definitely finite. And yet, there are an infinite number of possible inputs to a given hash function (and most of them contain more than 128 bits, or a measly 16 bytes). So there are actually an infinite number of possibilities for data that would hash to the same value. The thing that makes hashes interesting is that it is incredibly difficult to find two pieces of data that hash to the same value, and the chances of it happening by accident are almost 0.
A simple example for a (very insecure) hash function (and this illustrates the general idea of it being one-way) would be to take all of the bits of a piece of data, and treat it as a large number. Next, perform integer division using some large (probably prime) number n and take the remainder (see: Modulus). You will be left with some number between 0 and n. If you were to perform the same calculation again (any time, on any computer, anywhere), using the exact same string, it will come up with the same value. And yet, there is no way to find out what the original value was, since there are an infinite number of numbers that have that exact remainder, when divided by n.
That said, MD5 has been found to have some weaknesses, such that with some complex mathematics, it may be possible to find a collision without trying out 2128 possible input strings. And the fact that most passwords are short, and people often use common values (like "password" or "secret") means that in some cases, you can make a reasonably good guess at someone's password by Googling for the hash or using a Rainbow table. That is one reason why you should always "salt" hashed passwords, so that two identical values, when hashed, will not hash to the same value.
Once a piece of data has been run through a hash function, there is no going back.
You can't - in theory. The whole point of a hash is that it's one way only. This means that if someone manages to get the list of hashes, they still can't get your password. Additionally it means that even if someone uses the same password on multiple sites (yes, we all know we shouldn't, but...) anyone with access to the database of site A won't be able to use the user's password on site B.
The fact that MD5 is a hash also means it loses information. For any given MD5 hash, if you allow passwords of arbitrary length there could be multiple passwords which produce the same hash. For a good hash it would be computationally infeasible to find them beyond a pretty trivial maximum length, but it means there's no guarantee that if you find a password which has the target hash, it's definitely the original password. It's astronomically unlikely that you'd see two ASCII-only, reasonable-length passwords that have the same MD5 hash, but it's not impossible.
MD5 is a bad hash to use for passwords:
It's fast, which means if you have a "target" hash, it's cheap to try lots of passwords and see whether you can find one which hashes to that target. Salting doesn't help with that scenario, but it helps to make it more expensive to try to find a password matching any one of multiple hashes using different salts.
I believe it has known flaws which make it easier to find collisions, although finding collisions within printable text (rather than arbitrary binary data) would at least be harder.
I'm not a security expert, so won't make a concrete recommendation beyond "Don't roll your own authentication system." Find one from a reputable supplier, and use that. Both the design and implementation of security systems is a tricky business.
Technically, it's 'possible', but under very strict conditions (rainbow tables, brute forcing based on the very small possibility that a user's password is in that hash database).
But that doesn't mean it's
Viable
or
Secure
You don't want to 'reverse' an MD5 hash. Using the methods outlined below, you'll never need to. 'Reversing' MD5 is actually considered malicious - a few websites offer the ability to 'crack' and bruteforce MD5 hashes - but all they are are massive databases containing dictionary words, previously submitted passwords and other words. There is a very small chance that it will have the MD5 hash you need reversed. And if you've salted the MD5 hash - this won't work either! :)
The way logins with MD5 hashing should work is:
During Registration:
User creates password -> Password is hashed using MD5 -> Hash stored in database
During Login:
User enters username and password -> (Username checked) Password is hashed using MD5 -> Hash is compared with stored hash in database
When 'Lost Password' is needed:
2 options:
User sent a random password to log in, then is bugged to change it on first login.
or
User is sent a link to change their password (with extra checking if you have a security question/etc) and then the new password is hashed and replaced with old password in database
Not directly. Because of the pigeonhole principle, there is (likely) more than one value that hashes to any given MD5 output. As such, you can't reverse it with certainty. Moreover, MD5 is made to make it difficult to find any such reversed hash (however there have been attacks that produce collisions - that is, produce two values that hash to the same result, but you can't control what the resulting MD5 value will be).
However, if you restrict the search space to, for example, common passwords with length under N, you might no longer have the irreversibility property (because the number of MD5 outputs is much greater than the number of strings in the domain of interest). Then you can use a rainbow table or similar to reverse hashes.
Not possible, at least not in a reasonable amount of time.
The way this is often handled is a password "reset". That is, you give them a new (random) password and send them that in an email.
You can't revert a md5 password.(in any language)
But you can:
give to the user a new one.
check in some rainbow table to maybe retrieve the old one.
No, he must have been confused about the MD5 dictionaries.
Cryptographic hashes (MD5, etc...) are one way and you can't get back to the original message with only the digest unless you have some other information about the original message, etc. that you shouldn't.
Decryption (directly getting the the plain text from the hashed value, in an algorithmic way), no.
There are, however, methods that use what is known as a rainbow table. It is pretty feasible if your passwords are hashed without a salt.
MD5 is a hashing algorithm, you can not revert the hash value.
You should add "change password feature", where the user gives another password, calculates the hash and store it as a new password.
There's no easy way to do it. This is kind of the point of hashing the password in the first place. :)
One thing you should be able to do is set a temporary password for them manually and send them that.
I hesitate to mention this because it's a bad idea (and it's not guaranteed to work anyway), but you could try looking up the hash in a rainbow table like milw0rm to see if you can recover the old password that way.
See all other answers here about how and why it's not reversible and why you wouldn't want to anyway.
For completeness though, there are rainbow tables which you can look up possible matches on. There is no guarantee that the answer in the rainbow table will be the original password chosen by your user so that would confuse them greatly.
Also, this will not work for salted hashes. Salting is recommended by many security experts.
No, it is not possible to reverse a hash function such as MD5: given the output hash value it is impossible to find the input message unless enough information about the input message is known.
Decryption is not a function that is defined for a hash function; encryption and decryption are functions of a cipher such as AES in CBC mode; hash functions do not encrypt nor decrypt. Hash functions are used to digest an input message. As the name implies there is no reverse algorithm possible by design.
MD5 has been designed as a cryptographically secure, one-way hash function. It is now easy to generate collisions for MD5 - even if a large part of the input message is pre-determined. So MD5 is officially broken and MD5 should not be considered a cryptographically secure hash anymore. It is however still impossible to find an input message that leads to a hash value: find X when only H(X) is known (and X doesn't have a pre-computed structure with at least one 128 byte block of precomputed data). There are no known pre-image attacks against MD5.
It is generally also possible to guess passwords using brute force or (augmented) dictionary attacks, to compare databases or to try and find password hashes in so called rainbow tables. If a match is found then it is computationally certain that the input has been found. Hash functions are also secure against collision attacks: finding X' so that H(X') = H(X) given H(X). So if an X is found it is computationally certain that it was indeed the input message. Otherwise you would have performed a collision attack after all. Rainbow tables can be used to speed up the attacks and there are specialized internet resources out there that will help you find a password given a specific hash.
It is of course possible to re-use the hash value H(X) to verify passwords that were generated on other systems. The only thing that the receiving system has to do is to store the result of a deterministic function F that takes H(X) as input. When X is given to the system then H(X) and therefore F can be recalculated and the results can be compared. In other words, it is not required to decrypt the hash value to just verify that a password is correct, and you can still store the hash as a different value.
Instead of MD5 it is important to use a password hash or PBKDF (password based key derivation function) instead. Such a function specifies how to use a salt together with a hash. That way identical hashes won't be generated for identical passwords (from other users or within other databases). Password hashes for that reason also do not allow rainbow tables to be used as long as the salt is large enough and properly randomized.
Password hashes also contain a work factor (sometimes configured using an iteration count) that can significantly slow down attacks that try to find the password given the salt and hash value. This is important as the database with salts and hash values could be stolen. Finally, the password hash may also be memory-hard so that a significant amount of memory is required to calculate the hash. This makes it impossible to use special hardware (GPU's, ASIC's, FPGA's etc.) to allow an attacker to speed up the search. Other inputs or configuration options such as a pepper or the amount of parallelization may also be available to a password hash.
It will however still allow anybody to verify a password given H(X) even if H(X) is a password hash. Password hashes are still deterministic, so if anybody has knows all the input and the hash algorithm itself then X can be used to calculate H(X) and - again - the results can be compared.
Commonly used password hashes are bcrypt, scrypt and PBKDF2. There is also Argon2 in various forms which is the winner of the reasonably recent password hashing competition. Here on CrackStation is a good blog post on doing password security right.
It is possible to make it impossible for adversaries to perform the hash calculation verify that a password is correct. For this a pepper can be used as input to the password hash. Alternatively, the hash value can of course be encrypted using a cipher such as AES and a mode of operation such as CBC or GCM. This however requires the storage of a secret / key independently and with higher access requirements than the password hash.
MD5 is considered broken, not because you can get back the original content from the hash, but because with work, you can craft two messages that hash to the same hash.
You cannot un-hash an MD5 hash.
There is no way of "reverting" a hash function in terms of finding the inverse function for it. As mentioned before, this is the whole point of having a hash function. It should not be reversible and it should allow for fast hash value calculation. So the only way to find an input string which yields a given hash value is to try out all possible combinations. This is called brute force attack for that reason.
Trying all possible combinations takes a lot of time and this is also the reason why hash values are used to store passwords in a relatively safe way. If an attacker is able to access your database with all the user passwords inside, you loose in any case. If you have hash values and (idealistically speaking) strong passwords, it will be a lot harder to get the passwords out of the hash values for the attacker.
Storing the hash values is also no performance problem because computing the hash value is relatively fast. So what most systems do is computing the hash value of the password the user keyed in (which is fast) and then compare it to the stored hash value in their user database.
You can find online tools that use a dictionary to retrieve the original message.
In some cases, the dictionary method might just be useless:
if the message is hashed using a SALT message
if the message is hash more than once
For example, here is one MD5 decrypter online tool.
The only thing that can be work is (if we mention that the passwords are just hashed, without adding any kind of salt to prevent the replay attacks, if it is so you must know the salt)by the way, get an dictionary attack tool, the files of many words, numbers etc. then create two rows, one row is word,number (in dictionary) the other one is hash of the word, and compare the hashes if matches you get it...
that's the only way, without going into cryptanalysis.
The MD5 Hash algorithm is not reversible, so MD5 decode in not possible, but some website have bulk set of password match, so you can try online for decode MD5 hash.
Try online :
MD5 Decrypt
md5online
md5decrypter
Yes, exactly what you're asking for is possible.
It is not possible to 'decrypt' an MD5 password without help, but it is possible to re-encrypt an MD5 password into another algorithm, just not all in one go.
What you do is arrange for your users to be able to logon to your new system using the old MD5 password. At the point that they login they have given your login program an unhashed version of the password that you prove matches the MD5 hash that you have. You can then convert this unhashed password to your new hashing algorithm.
Obviously, this is an extended process because you have to wait for your users to tell you what the passwords are, but it does work.
(NB: seven years later, oh well hopefully someone will find it useful)
No, it cannot be done. Either you can use a dictionary, or you can try hashing different values until you get the hash that you are seeking. But it cannot be "decrypted".
MD5 has its weaknesses (see Wikipedia), so there are some projects, which try to precompute Hashes. Wikipedia does also hint at some of these projects. One I know of (and respect) is ophrack. You can not tell the user their own password, but you might be able to tell them a password that works. But i think: Just mail thrm a new password in case they forgot.
In theory it is not possible to decrypt a hash value but you have some dirty techniques for getting the original plain text back.
Bruteforcing: All computer security algorithm suffer bruteforcing. Based on this idea today's GPU employ the idea of parallel programming using which it can get back the plain text by massively bruteforcing it using any graphics processor. This tool hashcat does this job. Last time I checked the cuda version of it, I was able to bruteforce a 7 letter long character within six minutes.
Internet search: Just copy and paste the hash on Google and see If you can find the corresponding plaintext there. This is not a solution when you are pentesting something but it is definitely worth a try. Some websites maintain the hash for almost all the words in the dictionary.
MD5 is a cryptographic (one-way) hash function, so there is no direct way to decode it. The entire purpose of a cryptographic hash function is that you can't undo it.
One thing you can do is a brute-force strategy, where you guess what was hashed, then hash it with the same function and see if it matches. Unless the hashed data is very easy to guess, it could take a long time though.
It is not yet possible to put in a hash of a password into an algorithm and get the password back in plain text because hashing is a one way thing. But what people have done is to generate hashes and store it in a big table so that when you enter a particular hash, it checks the table for the password that matches the hash and returns that password to you. An example of a site that does that is http://www.md5online.org/ . Modern password storage system counters this by using a salting algorithm such that when you enter the same password into a password box during registration different hashes are generated.
No, you can not decrypt/reverse the md5 as it is a one-way hash function till you can not found a extensive vulnerabilities in the MD5.
Another way is there are some website has a large amount of set of password database, so you can try online to decode your MD5 or SHA1 hash string.
I tried a website like http://www.mycodemyway.com/encrypt-and-decrypt/md5 and its working fine for me but this totally depends on your hash if that hash is stored in that database then you can get the actual string.
Suppose you were at liberty to decide how hashed passwords were to be stored in a DBMS. Are there obvious weaknesses in a scheme like this one?
To create the hash value stored in the DBMS, take:
A value that is unique to the DBMS server instance as part of the salt,
And the username as a second part of the salt,
And create the concatenation of the salt with the actual password,
And hash the whole string using the SHA-256 algorithm,
And store the result in the DBMS.
This would mean that anyone wanting to come up with a collision should have to do the work separately for each user name and each DBMS server instance separately. I'd plan to keep the actual hash mechanism somewhat flexible to allow for the use of the new NIST standard hash algorithm (SHA-3) that is still being worked on.
The 'value that is unique to the DBMS server instance' need not be secret - though it wouldn't be divulged casually. The intention is to ensure that if someone uses the same password in different DBMS server instances, the recorded hashes would be different. Likewise, the user name would not be secret - just the password proper.
Would there be any advantage to having the password first and the user name and 'unique value' second, or any other permutation of the three sources of data? Or what about interleaving the strings?
Do I need to add (and record) a random salt value (per password) as well as the information above? (Advantage: the user can re-use a password and still, probably, get a different hash recorded in the database. Disadvantage: the salt has to be recorded. I suspect the advantage considerably outweighs the disadvantage.)
There are quite a lot of related SO questions - this list is unlikely to be comprehensive:
Encrypting/Hashing plain text passwords in database
Secure hash and salt for PHP passwords
The necessity of hiding the salt for a hash
Clients-side MD5 hash with time salt
Simple password encryption
Salt generation and Open Source software
Password hashes: fixed-length binary fields or single string field?
I think that the answers to these questions support my algorithm (though if you simply use a random salt, then the 'unique value per server' and username components are less important).
The salt just needs to be random and unique. It can be freely known as it doesn't help an attacker. Many systems will store the plain text salt in the database in the column right next to the hashed password.
The salt helps to ensure that if two people (User A and User B) happen to share the same password it isn't obvious. Without the random and unique salt for each password the hash values would be the same and obviously if the password for User A is cracked then User B must have the same password.
It also helps protect from attacks where a dictionary of hashes can be matched against known passwords. e.g. rainbow tables.
Also using an algorithm with a "work factor" built in also means that as computational power increases the work an algorithm has to go through to create the hash can also be increased. For example, bcrypt. This means that the economics of brute force attacks become untenable. Presumably it becomes much more difficult to create tables of known hashes because they take longer to create; the variations in "work factor" will mean more tables would have to be built.
I think you are over-complicating the problem.
Start with the problem:
Are you trying to protect weak passwords?
Are you trying to mitigate against rainbow attacks?
The mechanism you propose does protect against a simple rainbow attack, cause even if user A and user B have the SAME password, the hashed password will be different. It does, seem like a rather elaborate method to be salting a password which is overly complicated.
What happens when you migrate the DB to another server?
Can you change the unique, per DB value, if so then a global rainbow table can be generated, if not then you can not restore your DB.
Instead I would just add the extra column and store a proper random salt. This would protect against any kind of rainbow attack. Across multiple deployments.
However, it will not protect you against a brute force attack. So if you are trying to protect users that have crappy passwords, you will need to look elsewhere. For example if your users have 4 letter passwords, it could probably be cracked in seconds even with a salt and the newest hash algorithm.
I think you need to ask yourself "What are you hoping to gain by making this more complicated than just generating a random salt value and storing it?" The more complicated you make your algorithm, the more likely you are to introduce a weakness inadvertently. This will probably sound snarky no matter how I say it, but it's meant helpfully - what is so special about your app that it needs a fancy new password hashing algorithm?
Why not add a random salt to the password and hash that combination. Next concatenate the hash and salt to a single byte[] and store that in the db?
The advantage of a random salt is that the user is free to change it's username. The Salt doesn't have to be secret, since it's used to prevent dictionary attacks.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed last year.
The community reviewed whether to reopen this question last year and left it closed:
Original close reason(s) were not resolved
Improve this question
I've inherited a web app that I've just discovered stores over 300,000 usernames/passwords in plain text in a SQL Server database. I realize that this is a Very Bad Thing™.
Knowing that I'll have to update the login and password update processes to encrypt/decrypt, and with the smallest impact on the rest of the system, what would you recommend as the best way to remove the plain text passwords from the database?
Any help is appreciated.
Edit: Sorry if I was unclear, I meant to ask what would be your procedure to encrypt/hash the passwords, not specific encryption/hashing methods.
Should I just:
Make a backup of the DB
Update login/update password code
After hours, go through all records in the users table hashing the password and replacing each one
Test to ensure users can still login/update passwords
I guess my concern is more from the sheer number of users so I want to make sure I'm doing this correctly.
EDIT (2016): use Argon2, scrypt, bcrypt, or PBKDF2, in that order of preference. Use as large a slowdown factor as is feasible for your situation. Use a vetted existing implementation. Make sure you use a proper salt (although the libraries you're using should be making sure of this for you).
When you hash the passwords use DO NOT USE PLAIN MD5.
Use PBKDF2, which basically means using a random salt to prevent rainbow table attacks, and iterating (re-hashing) enough times to slow the hashing down - not so much that your application takes too long, but enough that an attacker brute-forcing a large number of different password will notice
From the document:
Iterate at least 1000 times, preferably more - time your implementation to see how many iterations are feasible for you.
8 bytes (64 bits) of salt are sufficient, and the random doesn't need to be secure (the salt is unencrypted, we're not worried someone will guess it).
A good way to apply the salt when hashing is to use HMAC with your favorite hash algorithm, using the password as the HMAC key and the salt as the text to hash (see this section of the document).
Example implementation in Python, using SHA-256 as the secure hash:
EDIT: as mentioned by Eli Collins this is not a PBKDF2 implementation. You should prefer implementations which stick to the standard, such as PassLib.
from hashlib import sha256
from hmac import HMAC
import random
def random_bytes(num_bytes):
return "".join(chr(random.randrange(256)) for i in xrange(num_bytes))
def pbkdf_sha256(password, salt, iterations):
result = password
for i in xrange(iterations):
result = HMAC(result, salt, sha256).digest() # use HMAC to apply the salt
return result
NUM_ITERATIONS = 5000
def hash_password(plain_password):
salt = random_bytes(8) # 64 bits
hashed_password = pbkdf_sha256(plain_password, salt, NUM_ITERATIONS)
# return the salt and hashed password, encoded in base64 and split with ","
return salt.encode("base64").strip() + "," + hashed_password.encode("base64").strip()
def check_password(saved_password_entry, plain_password):
salt, hashed_password = saved_password_entry.split(",")
salt = salt.decode("base64")
hashed_password = hashed_password.decode("base64")
return hashed_password == pbkdf_sha256(plain_password, salt, NUM_ITERATIONS)
password_entry = hash_password("mysecret")
print password_entry # will print, for example: 8Y1ZO8Y1pi4=,r7Acg5iRiZ/x4QwFLhPMjASESxesoIcdJRSDkqWYfaA=
check_password(password_entry, "mysecret") # returns True
The basic strategy is to use a key derivation function to "hash" the password with some salt. The salt and the hash result are stored in the database. When a user inputs a password, the salt and their input are hashed in the same way and compared to the stored value. If they match, the user is authenticated.
The devil is in the details. First, a lot depends on the hash algorithm that is chosen. A key derivation algorithm like PBKDF2, based on a hash-based message authentication code, makes it "computationally infeasible" to find an input (in this case, a password) that will produce a given output (what an attacker has found in the database).
A pre-computed dictionary attack uses pre-computed index, or dictionary, from hash outputs to passwords. Hashing is slow (or it's supposed to be, anyway), so the attacker hashes all of the likely passwords once, and stores the result indexed in such a way that given a hash, he can lookup a corresponding password. This is a classic tradeoff of space for time. Since password lists can be huge, there are ways to tune the tradeoff (like rainbow tables), so that an attacker can give up a little speed to save a lot of space.
Pre-computation attacks are thwarted by using "cryptographic salt". This is some data that is hashed with the password. It doesn't need to be a secret, it just needs to be unpredictable for a given password. For each value of salt, an attacker would need a new dictionary. If you use one byte of salt, an attacker needs 256 copies of their dictionary, each generated with a different salt. First, he'd use the salt to lookup the correct dictionary, then he'd use the hash output to look up a usable password. But what if you add 4 bytes? Now he needs 4 billion copies of the the dictionary. By using a large enough salt, a dictionary attack is precluded. In practice, 8 to 16 bytes of data from a cryptographic quality random number generator makes a good salt.
With pre-computation off the table, an attacker has compute the hash on each attempt. How long it takes to find a password now depends entirely on how long it takes to hash a candidate. This time is increased by iteration of the hash function. The number iterations is generally a parameter of the key derivation function; today, a lot of mobile devices use 10,000 to 20,000 iterations, while a server might use 100,000 or more. (The bcrypt algorithm uses the term "cost factor", which is a logarithmic measure of the time required.)
I would imagine you will have to add a column to the database for the encrypted password then run a batch job over all records which gets the current password, encrypts it (as others have mentiond a hash like md5 is pretty standard edit: but should not be used on its own - see other answers for good discussions), stores it in the new column and checks it all happened smoothly.
Then you will need to update your front-end to hash the user-entered password at login time and verify that vs the stored hash, rather than checking plaintext-vs-plaintext.
It would seem prudent to me to leave both columns in place for a little while to ensure that nothing hinky has gone on, before eventually removing the plaintext passwords all-together.
Don't forget also that anytime the password is acessed the code will have to change, such as password change / reminder requests. You will of course lose the ability to email out forgotten passwords, but this is no bad thing. You will have to use a password reset system instead.
Edit:
One final point, you might want to consider avoiding the error I made on my first attempt at a test-bed secure login website:
When processing the user password, consider where the hashing takes place. In my case the hash was calculated by the PHP code running on the webserver, but the password was transmitted to the page from the user's machine in plaintext! This was ok(ish) in the environment I was working in, as it was inside an https system anyway (uni network). But, in the real world I imagine you would want to hash the password before it leaves the user system, using javascript etc. and then transmit the hash to your site.
Follow Xan's advice of keeping the current password column around for a while so if things go bad, you can rollback quick-n-easy.
As far as encrypting your passwords:
use a salt
use a hash algorithm that's meant for passwords (ie., - it's slow)
See Thomas Ptacek's Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes for some details.
I think you should do the following:
Create a new column called HASHED_PASSWORD or something similar.
Modify your code so that it checks for both columns.
Gradually migrate passwords from the non-hashed table to the hashed one. For example, when a user logs in, migrate his or her password automatically to the hashed column and remove the unhashed version. All newly registered users will have hashed passwords.
After hours, you can run a script which migrates n users a time
When you have no more unhashed passwords left, you can remove your old password column (you may not be able to do so, depends on the database you are using). Also, you can remove the code to handle the old passwords.
You're done!
As the others mentioned, you don't want to decrypt if you can help it. Standard best practice is to encrypt using a one-way hash, and then when the user logs in hash their password to compare it.
Otherwise you'll have to use a strong encryption to encrypt and then decrypt. I'd only recommend this if the political reasons are strong (for example, your users are used to being able to call the help desk to retrieve their password, and you have strong pressure from the top not to change that). In that case, I'd start with encryption and then start building a business case to move to hashing.
For authentication purposes you should avoid storing the passwords using reversible encryption, i.e. you should only store the password hash and check the hash of the user-supplied password against the hash you have stored. However, that approach has a drawback: it's vulnerable to rainbow table attacks, should an attacker get hold of your password store database.
What you should do is store the hashes of a pre-chosen (and secret) salt value + the password. I.e., concatenate the salt and the password, hash the result, and store this hash. When authenticating, do the same - concatenate your salt value and the user-supplied password, hash, then check for equality. This makes rainbow table attacks unfeasible.
Of course, if the user send passwords across the network (for example, if you're working on a web or client-server application), then you should not send the password in clear text across, so instead of storing hash(salt + password) you should store and check against hash(salt + hash(password)), and have your client pre-hash the user-supplied password and send that one across the network. This protects your user's password as well, should the user (as many do) re-use the same password for multiple purposes.
Encrypt using something like MD5, encode it as a hex string
You need a salt; in your case, the username can be used as the salt (it has to be unique, the username should be the most unique value available ;-)
use the old password field to store the MD5, but tag the MD5 (i.e.g "MD5:687A878....") so that old (plain text) and new (MD5) passwords can co-exist
change the login procedure to verify against the MD5 if there is an MD5, and against the plain password otherwise
change the "change password" and "new user" functions to create MD5'ed passwords only
now you can run the conversion batch job, which might take as long as needed
after the conversion has been run, remove the legacy-support
Step 1: Add encrypted field to database
Step 2: Change code so that when password is changed, it updates both fields but logging in still uses old field.
Step 3: Run script to populate all the new fields.
Step 4: Change code so that logging in uses new field and changing passwords stops updating old field.
Step 5: Remove unencrypted passwords from database.
This should allow you to accomplish the changeover without interruption to the end user.
Also:
Something I would do is name the new database field something that is completely unrelated to password like "LastSessionID" or something similarly boring. Then instead of removing the password field, just populate with hashes of random data. Then, if your database ever gets compromised, they can spend all the time they want trying to decrypt the "password" field.
This may not actually accomplish anything, but it's fun thinking about someone sitting there trying to figure out worthless information
As with all security decisions, there are tradeoffs. If you hash the password, which is probably your easiest move, you can't offer a password retrieval function that returns the original password, nor can your staff look up a person's password in order to access their account.
You can use symmetric encryption, which has its own security drawbacks. (If your server is compromised, the symmetric encryption key may be compromised also).
You can use public-key encryption, and run password retrieval/customer service on a separate machine which stores the private key in isolation from the web application. This is the most secure, but requires a two-machine architecture, and probably a firewall in between.
MD5 and SHA1 have shown a bit of weakness (two words can result in the same hash) so using SHA256-SHA512 / iterative hashes is recommended to hash the password.
I would write a small program in the language that the application is written in that goes and generates a random salt that is unique for each user and a hash of the password. The reason I tend to use the same language as the verification is that different crypto libraries can do things slightly differently (i.e. padding) so using the same library to generate the hash and verify it eliminates that risk. This application could also then verify the login after the table has been updated if you want as it knows the plain text password still.
Don't use MD5/SHA1
Generate a good random salt (many crypto libraries have a salt generator)
An iterative hash algorithm as orip recommended
Ensure that the passwords are not transmitted in plain text over the wire
I would like to suggest one improvement to the great python example posted by Orip. I would redefine the random_bytes function to be:
def random_bytes(num_bytes):
return os.urandom(num_bytes)
Of course, you would have to import the os module. The os.urandom function provides a random sequence of bytes that can be safely used in cryptographic applications. See the reference help of this function for further details.
To hash the password you can use the HashBytes function. Returns a varbinary, so you'd have to create a new column and then delete the old varchar one.
Like
ALTER TABLE users ADD COLUMN hashedPassword varbinary(max);
ALTER TABLE users ADD COLUMN salt char(10);
--Generate random salts and update the column, after that
UPDATE users SET hashedPassword = HashBytes('SHA1',salt + '|' + password);
Then you modify the code to validate the password, using a query like
SELECT count(*) from users WHERE hashedPassword =
HashBytes('SHA1',salt + '|' + <password>)
where <password> is the value entered by the user.
I'm not a security expert, but i htink the current recommendation is to use bcrypt/blowfish or a SHA-2 variant, not MD5 / SHA1.
Probably you need to think in terms of a full security audit, too