Reversing an MD5 Hash [duplicate] - security

Someone told me that he has seen software systems that:
retrieve MD5 encrypted passwords from other systems;
decrypt the encrypted passwords and
store the passwords in the database of the system using the systems own algorithm.
Is that possible? I thought that it wasn't possible / feasible to decrypt MD5 hashes.
I know there are MD5 dictionaries, but is there an actual decryption algorithm?

No. MD5 is not encryption (though it may be used as part of some encryption algorithms), it is a one way hash function. Much of the original data is actually "lost" as part of the transformation.
Think about this: An MD5 is always 128 bits long. That means that there are 2128 possible MD5 hashes. That is a reasonably large number, and yet it is most definitely finite. And yet, there are an infinite number of possible inputs to a given hash function (and most of them contain more than 128 bits, or a measly 16 bytes). So there are actually an infinite number of possibilities for data that would hash to the same value. The thing that makes hashes interesting is that it is incredibly difficult to find two pieces of data that hash to the same value, and the chances of it happening by accident are almost 0.
A simple example for a (very insecure) hash function (and this illustrates the general idea of it being one-way) would be to take all of the bits of a piece of data, and treat it as a large number. Next, perform integer division using some large (probably prime) number n and take the remainder (see: Modulus). You will be left with some number between 0 and n. If you were to perform the same calculation again (any time, on any computer, anywhere), using the exact same string, it will come up with the same value. And yet, there is no way to find out what the original value was, since there are an infinite number of numbers that have that exact remainder, when divided by n.
That said, MD5 has been found to have some weaknesses, such that with some complex mathematics, it may be possible to find a collision without trying out 2128 possible input strings. And the fact that most passwords are short, and people often use common values (like "password" or "secret") means that in some cases, you can make a reasonably good guess at someone's password by Googling for the hash or using a Rainbow table. That is one reason why you should always "salt" hashed passwords, so that two identical values, when hashed, will not hash to the same value.
Once a piece of data has been run through a hash function, there is no going back.

You can't - in theory. The whole point of a hash is that it's one way only. This means that if someone manages to get the list of hashes, they still can't get your password. Additionally it means that even if someone uses the same password on multiple sites (yes, we all know we shouldn't, but...) anyone with access to the database of site A won't be able to use the user's password on site B.
The fact that MD5 is a hash also means it loses information. For any given MD5 hash, if you allow passwords of arbitrary length there could be multiple passwords which produce the same hash. For a good hash it would be computationally infeasible to find them beyond a pretty trivial maximum length, but it means there's no guarantee that if you find a password which has the target hash, it's definitely the original password. It's astronomically unlikely that you'd see two ASCII-only, reasonable-length passwords that have the same MD5 hash, but it's not impossible.
MD5 is a bad hash to use for passwords:
It's fast, which means if you have a "target" hash, it's cheap to try lots of passwords and see whether you can find one which hashes to that target. Salting doesn't help with that scenario, but it helps to make it more expensive to try to find a password matching any one of multiple hashes using different salts.
I believe it has known flaws which make it easier to find collisions, although finding collisions within printable text (rather than arbitrary binary data) would at least be harder.
I'm not a security expert, so won't make a concrete recommendation beyond "Don't roll your own authentication system." Find one from a reputable supplier, and use that. Both the design and implementation of security systems is a tricky business.

Technically, it's 'possible', but under very strict conditions (rainbow tables, brute forcing based on the very small possibility that a user's password is in that hash database).
But that doesn't mean it's
Viable
or
Secure
You don't want to 'reverse' an MD5 hash. Using the methods outlined below, you'll never need to. 'Reversing' MD5 is actually considered malicious - a few websites offer the ability to 'crack' and bruteforce MD5 hashes - but all they are are massive databases containing dictionary words, previously submitted passwords and other words. There is a very small chance that it will have the MD5 hash you need reversed. And if you've salted the MD5 hash - this won't work either! :)
The way logins with MD5 hashing should work is:
During Registration:
User creates password -> Password is hashed using MD5 -> Hash stored in database
During Login:
User enters username and password -> (Username checked) Password is hashed using MD5 -> Hash is compared with stored hash in database
When 'Lost Password' is needed:
2 options:
User sent a random password to log in, then is bugged to change it on first login.
or
User is sent a link to change their password (with extra checking if you have a security question/etc) and then the new password is hashed and replaced with old password in database

Not directly. Because of the pigeonhole principle, there is (likely) more than one value that hashes to any given MD5 output. As such, you can't reverse it with certainty. Moreover, MD5 is made to make it difficult to find any such reversed hash (however there have been attacks that produce collisions - that is, produce two values that hash to the same result, but you can't control what the resulting MD5 value will be).
However, if you restrict the search space to, for example, common passwords with length under N, you might no longer have the irreversibility property (because the number of MD5 outputs is much greater than the number of strings in the domain of interest). Then you can use a rainbow table or similar to reverse hashes.

Not possible, at least not in a reasonable amount of time.
The way this is often handled is a password "reset". That is, you give them a new (random) password and send them that in an email.

You can't revert a md5 password.(in any language)
But you can:
give to the user a new one.
check in some rainbow table to maybe retrieve the old one.

No, he must have been confused about the MD5 dictionaries.
Cryptographic hashes (MD5, etc...) are one way and you can't get back to the original message with only the digest unless you have some other information about the original message, etc. that you shouldn't.

Decryption (directly getting the the plain text from the hashed value, in an algorithmic way), no.
There are, however, methods that use what is known as a rainbow table. It is pretty feasible if your passwords are hashed without a salt.

MD5 is a hashing algorithm, you can not revert the hash value.
You should add "change password feature", where the user gives another password, calculates the hash and store it as a new password.

There's no easy way to do it. This is kind of the point of hashing the password in the first place. :)
One thing you should be able to do is set a temporary password for them manually and send them that.
I hesitate to mention this because it's a bad idea (and it's not guaranteed to work anyway), but you could try looking up the hash in a rainbow table like milw0rm to see if you can recover the old password that way.

See all other answers here about how and why it's not reversible and why you wouldn't want to anyway.
For completeness though, there are rainbow tables which you can look up possible matches on. There is no guarantee that the answer in the rainbow table will be the original password chosen by your user so that would confuse them greatly.
Also, this will not work for salted hashes. Salting is recommended by many security experts.

No, it is not possible to reverse a hash function such as MD5: given the output hash value it is impossible to find the input message unless enough information about the input message is known.
Decryption is not a function that is defined for a hash function; encryption and decryption are functions of a cipher such as AES in CBC mode; hash functions do not encrypt nor decrypt. Hash functions are used to digest an input message. As the name implies there is no reverse algorithm possible by design.
MD5 has been designed as a cryptographically secure, one-way hash function. It is now easy to generate collisions for MD5 - even if a large part of the input message is pre-determined. So MD5 is officially broken and MD5 should not be considered a cryptographically secure hash anymore. It is however still impossible to find an input message that leads to a hash value: find X when only H(X) is known (and X doesn't have a pre-computed structure with at least one 128 byte block of precomputed data). There are no known pre-image attacks against MD5.
It is generally also possible to guess passwords using brute force or (augmented) dictionary attacks, to compare databases or to try and find password hashes in so called rainbow tables. If a match is found then it is computationally certain that the input has been found. Hash functions are also secure against collision attacks: finding X' so that H(X') = H(X) given H(X). So if an X is found it is computationally certain that it was indeed the input message. Otherwise you would have performed a collision attack after all. Rainbow tables can be used to speed up the attacks and there are specialized internet resources out there that will help you find a password given a specific hash.
It is of course possible to re-use the hash value H(X) to verify passwords that were generated on other systems. The only thing that the receiving system has to do is to store the result of a deterministic function F that takes H(X) as input. When X is given to the system then H(X) and therefore F can be recalculated and the results can be compared. In other words, it is not required to decrypt the hash value to just verify that a password is correct, and you can still store the hash as a different value.
Instead of MD5 it is important to use a password hash or PBKDF (password based key derivation function) instead. Such a function specifies how to use a salt together with a hash. That way identical hashes won't be generated for identical passwords (from other users or within other databases). Password hashes for that reason also do not allow rainbow tables to be used as long as the salt is large enough and properly randomized.
Password hashes also contain a work factor (sometimes configured using an iteration count) that can significantly slow down attacks that try to find the password given the salt and hash value. This is important as the database with salts and hash values could be stolen. Finally, the password hash may also be memory-hard so that a significant amount of memory is required to calculate the hash. This makes it impossible to use special hardware (GPU's, ASIC's, FPGA's etc.) to allow an attacker to speed up the search. Other inputs or configuration options such as a pepper or the amount of parallelization may also be available to a password hash.
It will however still allow anybody to verify a password given H(X) even if H(X) is a password hash. Password hashes are still deterministic, so if anybody has knows all the input and the hash algorithm itself then X can be used to calculate H(X) and - again - the results can be compared.
Commonly used password hashes are bcrypt, scrypt and PBKDF2. There is also Argon2 in various forms which is the winner of the reasonably recent password hashing competition. Here on CrackStation is a good blog post on doing password security right.
It is possible to make it impossible for adversaries to perform the hash calculation verify that a password is correct. For this a pepper can be used as input to the password hash. Alternatively, the hash value can of course be encrypted using a cipher such as AES and a mode of operation such as CBC or GCM. This however requires the storage of a secret / key independently and with higher access requirements than the password hash.

MD5 is considered broken, not because you can get back the original content from the hash, but because with work, you can craft two messages that hash to the same hash.
You cannot un-hash an MD5 hash.

There is no way of "reverting" a hash function in terms of finding the inverse function for it. As mentioned before, this is the whole point of having a hash function. It should not be reversible and it should allow for fast hash value calculation. So the only way to find an input string which yields a given hash value is to try out all possible combinations. This is called brute force attack for that reason.
Trying all possible combinations takes a lot of time and this is also the reason why hash values are used to store passwords in a relatively safe way. If an attacker is able to access your database with all the user passwords inside, you loose in any case. If you have hash values and (idealistically speaking) strong passwords, it will be a lot harder to get the passwords out of the hash values for the attacker.
Storing the hash values is also no performance problem because computing the hash value is relatively fast. So what most systems do is computing the hash value of the password the user keyed in (which is fast) and then compare it to the stored hash value in their user database.

You can find online tools that use a dictionary to retrieve the original message.
In some cases, the dictionary method might just be useless:
if the message is hashed using a SALT message
if the message is hash more than once
For example, here is one MD5 decrypter online tool.

The only thing that can be work is (if we mention that the passwords are just hashed, without adding any kind of salt to prevent the replay attacks, if it is so you must know the salt)by the way, get an dictionary attack tool, the files of many words, numbers etc. then create two rows, one row is word,number (in dictionary) the other one is hash of the word, and compare the hashes if matches you get it...
that's the only way, without going into cryptanalysis.

The MD5 Hash algorithm is not reversible, so MD5 decode in not possible, but some website have bulk set of password match, so you can try online for decode MD5 hash.
Try online :
MD5 Decrypt
md5online
md5decrypter

Yes, exactly what you're asking for is possible.
It is not possible to 'decrypt' an MD5 password without help, but it is possible to re-encrypt an MD5 password into another algorithm, just not all in one go.
What you do is arrange for your users to be able to logon to your new system using the old MD5 password. At the point that they login they have given your login program an unhashed version of the password that you prove matches the MD5 hash that you have. You can then convert this unhashed password to your new hashing algorithm.
Obviously, this is an extended process because you have to wait for your users to tell you what the passwords are, but it does work.
(NB: seven years later, oh well hopefully someone will find it useful)

No, it cannot be done. Either you can use a dictionary, or you can try hashing different values until you get the hash that you are seeking. But it cannot be "decrypted".

MD5 has its weaknesses (see Wikipedia), so there are some projects, which try to precompute Hashes. Wikipedia does also hint at some of these projects. One I know of (and respect) is ophrack. You can not tell the user their own password, but you might be able to tell them a password that works. But i think: Just mail thrm a new password in case they forgot.

In theory it is not possible to decrypt a hash value but you have some dirty techniques for getting the original plain text back.
Bruteforcing: All computer security algorithm suffer bruteforcing. Based on this idea today's GPU employ the idea of parallel programming using which it can get back the plain text by massively bruteforcing it using any graphics processor. This tool hashcat does this job. Last time I checked the cuda version of it, I was able to bruteforce a 7 letter long character within six minutes.
Internet search: Just copy and paste the hash on Google and see If you can find the corresponding plaintext there. This is not a solution when you are pentesting something but it is definitely worth a try. Some websites maintain the hash for almost all the words in the dictionary.

MD5 is a cryptographic (one-way) hash function, so there is no direct way to decode it. The entire purpose of a cryptographic hash function is that you can't undo it.
One thing you can do is a brute-force strategy, where you guess what was hashed, then hash it with the same function and see if it matches. Unless the hashed data is very easy to guess, it could take a long time though.

It is not yet possible to put in a hash of a password into an algorithm and get the password back in plain text because hashing is a one way thing. But what people have done is to generate hashes and store it in a big table so that when you enter a particular hash, it checks the table for the password that matches the hash and returns that password to you. An example of a site that does that is http://www.md5online.org/ . Modern password storage system counters this by using a salting algorithm such that when you enter the same password into a password box during registration different hashes are generated.

No, you can not decrypt/reverse the md5 as it is a one-way hash function till you can not found a extensive vulnerabilities in the MD5.
Another way is there are some website has a large amount of set of password database, so you can try online to decode your MD5 or SHA1 hash string.
I tried a website like http://www.mycodemyway.com/encrypt-and-decrypt/md5 and its working fine for me but this totally depends on your hash if that hash is stored in that database then you can get the actual string.

Related

What's the big deal with brute force on hashes like MD5

I just spent some time reading https://stackoverflow.com/questions/2768248/is-md5-really-that-bad (I highly recommend!).
In it, it talks about hash collisions. Maybe I'm missing something here, but can't you just encrypt your password using, say, MD5 and then, say, SHA-1 (or any other, doesn't matter.) Wouldn't this increase the processing power required to brute-force the hash and reduce the possibility of collision?
First of all md5 and sha1 are not encryption functions, they are message digest functions. Also most hashes are broken in real world using dictionary attacks like John The Ripper and Rainbow Crack.
John The Ripper is best suited for salted passwords where the attacker knows the salt value. Rainbow Crack is good for passwords with small unknown salts and straight hashes like md5($pass).
Rainbow Crack takes a long time to build the tables, but after that passwords break in a matter of seconds. It depends on how fast your disk drives are.
You are talking about 2 distinct (although related) problems. First is the likely-hood of a collision, and the second is the ability to run the algorithm on tons of values to find the original value which created the hash.
Collisions. If you run sha1(md5(text)) you first get the hash of md5, then pass that to sha1. Lets assume the sha1 function has a 128-bit output, and the md5 also has 128-bit output. Your chance of collision in the md5 function is 1/2^128. Then your chance of collision in the sha1 is 1/2^128. If either collides then the function overall collides and hence the result is (1/2^128) + (1/2^128) or 1/2^127
Brute forcing. Running sha1(md5(text)) will only double the time it takes to find the original string. This is nothing in terms of security. FOr instance, if you have 128-bits of output space for each algorithm, and it takes 1 hour to brute force, then it will take 2 hours to run the same brute force twice to get the original string. This would be the same as increasing the output space to 129-bits. However, if you want to really make brute forcing impossible, what you have to do is double the output-size (which can be compared to the key size in encryption).
A collision attack (the type that's known against MD5, for example) does no real good. To be effective with regard to a password, you need a preimage attack (i.e. the ability to find some input that will hash to a known hash code). Though there are preimage attacks known against MD5, they're not currently practical.
Collision attacks are useful for entirely different purposes. One example that's been carried out is creating two X.509 certificates for two different identities that collide. Submit one to be signed by a certificate authority, and then you can use the other to claim that you're somebody else entirely. Since the hash will collide with the first, when/if a user tries to verify the certificate, it will show up as having been verified.
First not encryption creating Message Digest using the hash functions.
your question:
but can't you just encrypt (hash) your
password using, say, MD5 and then,
say, SHA-1 (or any other, doesn't
matter.)
if the hash function does not provide any of these properties, it does not matter how many times you hashed, also the attacker can hash n times to get the collisions.
For any given code h, it is computationally infeasible to find
such x that H(x)=h, this property is
called one way or preimage resistant.
For any given block x ,it is computationally infeasible to find y≠x
with H(y)=H(x).This property is
referred second preimage resistant or
weak collision resistant
It is computationally infeasible to find any pear (x,y) such that
H(x)=H(y). This is called Strong
collision resistant.
So as The Rook mentioned, the passwords are stored by adding different salt values for each users. The dictionary gets longer and also computational overhead and time gets longer for the attacker if she exploits the password file.
Let's say attacker has the hashed values of the passwords, and starts reading from the dictionary file and compares with the hashed values if matches then pasword is cracked, if salt is used then read from the dictionary and add some salt value then try to find a match.However this should be done for each user. So the complexity that salt adds is (from wikipedia)
Assume a user’s (encrypted) secret key
is stolen and he is known to use one
of 200,000 English words as his
password. The system uses a 32-bit
salt. The salted key is now the
original password appended to this
random 32-bit salt. Because of this
salt, the attacker’s pre-calculated
hashes are of no value. He must
calculate the hash of each word with
each of 2^32 (4,294,967,296) possible
salts appended until a match is found.
The total number of possible inputs
can be obtained by multiplying the
number of words in the dictionary with
the number of possible salts:
if H(password+salt)(in system)=H(Your password+salt) (login process)
login else
print<<error
When you hash a password multiple times you actually increase the chance of hash collisions, so best practice is to hash only once.
It also has nothing to do with how easy it will be to perform a brute-force attack. Such an attack will systematically try every possible password within a given range. Thus, if your password is "foobar" and the attack tests the password "foobar" it wont matter how or how many times you hashed the password, because the brute-force attack successfully guessed it.
Therefore, if you wish to guard yourself against a brute-force attack, you could limit how often a user can attempt authorization or require passwords to be above a certain length.
On a side note; Rainbow Tables and similar methods are used by hackers that have already gained access to your database and are meant to decrypt the stored password. In order make such an attack more difficult, you should use static and dynamic salts.
Hashing a hash is sort of "encryption though obfuscation" which isn't really a best practice. You're right in that it could theoretically "reduce" the possibility of a collision but it probably wont eliminate the possibility. Whats more, a hashing function isn't really an encrypting function, google "hashing vs encrypting" for several hundred explanations.

Is It Possible To Reconstruct a Cryptographic Hash's Key

We would like to cryptographically (SHA-256) hash a secret value in our database. Since we want to use this as a way to lookup individual records in our database, we cannot use a different random salt for each encrypted value.
My question is: given unlimited access to our database, and given that the attacker knows at least one secret value and hashed value pair, is it possible for the attacker to reverse engineer the cryptographic key? IE, would the attacker then be able to reverse all hashes and determine all secret values?
It seems like this defeats the entire purpose of a cryptographic hash if it is the case, so perhaps I'm missing something.
There are no published "first pre-image" attacks against SHA-256. Without such an attack to open a shortcut, it is impossible for an attacker to the recover a secret value from its SHA-256 hash.
However, the mention of a "secret key" might indicate some confusion about hashes. Hash algorithms don't use a key. So, if an attacker were able to attack one "secret-value–hash-value" pair, he wouldn't learn a "key" that would enable him to easily invert the rest of the hash values.
When a hash is attacked successfully, it is usually because the original message was from a small space. For example, most passwords are chosen from a relatively short list of real words, perhaps with some simple permutations. So, rather than systematically testing every possible password, the attacker starts with an ordered list of the few billion most common passwords. To avoid this, it's important to choose the "secret value" randomly from a large space.
There are message authentication algorithms that hash a secret key together with some data. These algorithms are used to protect the integrity of the message against tampering. But they don't help thwart pre-image attacks.
In short, yes.
No, a SHA hash is not reversible (at least not easily). When you Hash something if you need to reverse it you need to reconstruct the hash. This is usually done with a private (salt) and public key.
For example, if I'm trying to prevent access based off my user id. I would hash my user id and the salt. Let say MD5 for example. My user id is "12345" and the salt is "abcde"
So I will hash the string "12345_abcde", which return a hash of "7b322f78afeeb81ad92873b776558368"
Now I will pass to the validating application the hash and the public key, "12345" which is the public key and the has.
The validating application, knows the salt, so it hashes the same values. "12345_abcde", which in turn would generate the exact same hash. I then compare the hash i validated with the one passed off and they match. If I had somehow modified the public key without modifying the hash, a different has would have been generated resulting in a mismatch.
Yes it's possible, but not in this lifetime.
Modern brute-force attacks using multiple GPUs could crack this in short order. I recommend you follow the guidelines for password storage for this application. Here are the current password storage guidelines from OWASP. Currently, they recommend a long salt value, and PBKDF2 with 64,000 iterations, which iteratively stretches the key and makes it computationally complex to brute force the input values. Note that this will also make it computationally complex for you to generate your key values, but the idea is that you will be generating keys far less frequently than an attacker would have to. That said, your design requires many more key derivations than a typical password storage/challenge application, so your design may be fatally flawed. Also keep in mind that the iteration count should doubled every 18 months to make the computational complexity follow Moore's Law. This means that your system would need some way of allowing you to rehash these values (possibly by combining hash techniques). Over time, you will find that old HMAC functions are broken by cryptanalysts, and you need to be ready to update your algorithms. For example, a single iteration of MD5 or SHA-1 used to be sufficient, but it is not anymore. There are other HMAC functions that could also suit your needs that wouldn't require PBKDF2 (such as bcrypt or scrypt), but PBKDF2 is currently the industry standard that has received the most scrutiny. One could argue that bcrypt or scrypt would also be suitable, but this is yet another reason why a pluggable scheme should be used to allow you to upgrade HMAC functions over time.

Salt Generation and open source software

As I understand it, the best practice for generating salts is to use some cryptic formula (or even magic constant) stored in your source code.
I'm working on a project that we plan on releasing as open source, but the problem is that with the source comes the secret formula for generating salts, and therefore the ability to run rainbow table attacks on our site.
I figure that lots of people have contemplated this problem before me, and I'm wondering what the best practice is. It seems to me that there is no point having a salt at all if the code is open source, because salts can be easily reverse-engineered.
Thoughts?
Since questions about salting hashes come along on a quite regular basis and there seems to be quite some confusion about the subject, I extended this answer.
What is a salt?
A salt is a random set of bytes of a fixed length that is added to the input of a hash algorithm.
Why is salting (or seeding) a hash useful?
Adding a random salt to a hash ensures that the same password will produce many different hashes. The salt is usually stored in the database, together with the result of the hash function.
Salting a hash is good for a number of reasons:
Salting greatly increases the difficulty/cost of precomputated attacks (including rainbow tables)
Salting makes sure that the same password does not result in the same hash.
This makes sure you cannot determine if two users have the same password. And, even more important, you cannot determine if the same person uses the same password across different systems.
Salting increases the complexity of passwords, thereby greatly decreasing the effectiveness of both Dictionary- and Birthday attacks. (This is only true if the salt is stored separate from the hash).
Proper salting greatly increases the storage need for precomputation attacks, up to the point where they are no longer practical. (8 character case-sensitive alpha-numeric passwords with 16 bit salt, hashed to a 128 bit value, would take up just under 200 exabytes without rainbow reduction).
There is no need for the salt to be secret.
A salt is not a secret key, instead a salt 'works' by making the hash function specific to each instance. With salted hash, there is not one hash function, but one for every possible salt value. This prevent the attacker from attacking N hashed passwords for less than N times the cost of attacking one password. This is the point of the salt.
A "secret salt" is not a salt, it is called a "key", and it means that you are no longer computing a hash, but a Message Authentication Code (MAC). Computing MAC is tricky business (much trickier than simply slapping together a key and a value into a hash function) and it is a very different subject altogether.
The salt must be random for every instance in which it is used. This ensures that an attacker has to attack every salted hash separately.
If you rely on your salt (or salting algorithm) being secret, you enter the realms of Security Through Obscurity (won't work). Most probably, you do not get additional security from the salt secrecy; you just get the warm fuzzy feeling of security. So instead of making your system more secure, it just distracts you from reality.
So, why does the salt have to be random?
Technically, the salt should be unique. The point of the salt is to be distinct for each hashed password. This is meant worldwide. Since there is no central organization which distributes unique salts on demand, we have to rely on the next best thing, which is random selection with an unpredictable random generator, preferably within a salt space large enough to make collisions improbable (two instances using the same salt value).
It is tempting to try to derive a salt from some data which is "presumably unique", such as the user ID, but such schemes often fail due to some nasty details:
If you use for example the user ID, some bad guys, attacking distinct systems, may just pool their resources and create precomputed tables for user IDs 1 to 50. A user ID is unique system-wide but not worldwide.
The same applies to the username: there is one "root" per Unix system, but there are many roots in the world. A rainbow table for "root" would be worth the effort, since it could be applied to millions of systems. Worse yet, there are also many "bob" out there, and many do not have sysadmin training: their passwords could be quite weak.
Uniqueness is also temporal. Sometimes, users change their password. For each new password, a new salt must be selected. Otherwise, an attacker obtained the hash of the old password and the hash of the new could try to attack both simultaneously.
Using a random salt obtained from a cryptographically secure, unpredictable PRNG may be some kind of overkill, but at least it provably protects you against all those hazards. It's not about preventing the attacker from knowing what an individual salt is, it's about not giving them the big, fat target that will be used on a substantial number of potential targets. Random selection makes the targets as thin as is practical.
In conclusion:
Use a random, evenly distributed, high entropy salt. Use a new salt whenever you create a new password or change a password. Store the salt along with the hashed password. Favor big salts (at least 10 bytes, preferably 16 or more).
A salt does not turn a bad password into a good password. It just makes sure that the attacker will at least pay the dictionary attack price for each bad password he breaks.
Usefull sources:
stackoverflow.com: Non-random salt for password hashes
Bruce Schneier: Practical Cryptography (book)
Matasano Security: Enough with the Rainbow Tables
usenix.org: Unix crypt used salt since 1976
owasp.org: Why add salt
openwall.com: Salts
Disclaimer:
I'm not a security expert. (Although this answer was reviewed by Thomas Pornin)
If any of the security professionals out there find something wrong, please do comment or edit this wiki answer.
Really salts just need to be unique for each entry. Even if the attacker can calculate what the salt is, it makes the rainbow table extremely difficult to create. This is because the salt is added to the password before it is hashed, so it effectively adds to the total number of entries the rainbow table must contain to have a list of all possible values for a password field.
Since Unix became popular, the right way to store a password has been to append a random value (the salt) and hash it. Save the salt away where you can get to it later, but where you hope the bad guys won't get it.
This has some good effects. First, the bad guys can't just make a list of expected passwords like "Password1", hash them into a rainbow table, and go through your password file looking for matches. If you've got a good two-byte salt, they have to generate 65,536 values for each expected password, and that makes the rainbow table a lot less practical. Second, if you can keep the salt from the bad guys who are looking at your password file, you've made it much harder to calculate possible values. Third, you've made it impossible for the bad guys to determine if a given person uses the same password on different sites.
In order to do this, you generate a random salt. This should generate every number in the desired range with uniform probability. This isn't difficult; a simple linear congruential random number generator will do nicely.
If you've got complicated calculations to make the salt, you're doing it wrong. If you calculate it based on the password, you're doing it WAY wrong. In that case, all you're doing is complicating the hash, and not functionally adding any salt.
Nobody good at security would rely on concealing an algorithm. Modern cryptography is based on algorithms that have been extensively tested, and in order to be extensively tested they have to be well known. Generally, it's been found to be safer to use standard algorithms rather than rolling one's own and hoping it's good. It doesn't matter if the code is open source or not, it's still often possible for the bad guys to analyze what a program does.
You can just generate a random salt for each record at runtime. For example, say you're storing hashed user passwords in a database. You can generate an 8-character random string of lower- and uppercase alphanumeric characters at runtime, prepend that to the password, hash that string, and store it in the database. Since there are 628 possible salts, generating rainbow tables (for every possible salt) will be prohibitively expensive; and since you're using a unique salt for each password record, even if an attacker has generated a couple matching rainbow tables, he still won't be able to crack every password.
You can change the parameters of your salt generation based on your security needs; for example, you could use a longer salt, or you could generate a random string that also contains punctuation marks, to increase the number of possible salts.
Use a random function generator to generate the salt, and store it in the database, make salt one per row, and store it in the database.
I like how salt is generated in django-registration. Reference: http://bitbucket.org/ubernostrum/django-registration/src/tip/registration/models.py#cl-85
salt = sha_constructor(str(random.random())).hexdigest()[:5]
activation_key = sha_constructor(salt+user.username).hexdigest()
return self.create(user=user,
activation_key=activation_key)
He uses a combination of sha generated by a random number and the username to generate a hash.
Sha itself is well known for being strong and unbreakable. Add multiple dimensions to generate the salt itself, with random number, sha and the user specific component, you have unbreakable security!
In the case of a desktop application that encrypts data and send it on a remote server, how do you consider using a different salt each time?
Using PKCS#5 with the user's password, it needs a salt to generate an encryption key, to encrypt the data. I know that keep the salt hardcoded (obfuscated) in the desktop application is not a good idea.
If the remote server must NEVER know the user's password, is it possible to user different salt each time? If the user use the desktop application on another computer, how will it be able to decrypt the data on the remote server if he does not have the key (it is not hardcoded in the software) ?

Difference between Hashing a Password and Encrypting it

The current top-voted to this question states:
Another one that's not so much a security issue, although it is security-related, is complete and abject failure to grok the difference between hashing a password and encrypting it. Most commonly found in code where the programmer is trying to provide unsafe "Remind me of my password" functionality.
What exactly is this difference? I was always under the impression that hashing was a form of encryption. What is the unsafe functionality the poster is referring to?
Hashing is a one way function (well, a mapping). It's irreversible, you apply the secure hash algorithm and you cannot get the original string back. The most you can do is to generate what's called "a collision", that is, finding a different string that provides the same hash. Cryptographically secure hash algorithms are designed to prevent the occurrence of collisions. You can attack a secure hash by the use of a rainbow table, which you can counteract by applying a salt to the hash before storing it.
Encrypting is a proper (two way) function. It's reversible, you can decrypt the mangled string to get original string if you have the key.
The unsafe functionality it's referring to is that if you encrypt the passwords, your application has the key stored somewhere and an attacker who gets access to your database (and/or code) can get the original passwords by getting both the key and the encrypted text, whereas with a hash it's impossible.
People usually say that if a cracker owns your database or your code he doesn't need a password, thus the difference is moot. This is naïve, because you still have the duty to protect your users' passwords, mainly because most of them do use the same password over and over again, exposing them to a greater risk by leaking their passwords.
Hashing is a one-way function, meaning that once you hash a password it is very difficult to get the original password back from the hash. Encryption is a two-way function, where it's much easier to get the original text back from the encrypted text.
Plain hashing is easily defeated using a dictionary attack, where an attacker just pre-hashes every word in a dictionary (or every combination of characters up to a certain length), then uses this new dictionary to look up hashed passwords. Using a unique random salt for each hashed password stored makes it much more difficult for an attacker to use this method. They would basically need to create a new unique dictionary for every salt value that you use, slowing down their attack terribly.
It's unsafe to store passwords using an encryption algorithm because if it's easier for the user or the administrator to get the original password back from the encrypted text, it's also easier for an attacker to do the same.
As shown in the above image, if the password is encrypted it is always a hidden secret where someone can extract the plain text password. However when password is hashed, you are relaxed as there is hardly any method of recovering the password from the hash value.
Extracted from Encrypted vs Hashed Passwords - Which is better?
Is encryption good?
Plain text passwords can be encrypted using symmetric encryption algorithms like DES, AES or with any other algorithms and be stored inside the database. At the authentication (confirming the identity with user name and password), application will decrypt the encrypted password stored in database and compare with user provided password for equality. In this type of an password handling approach, even if someone get access to database tables the passwords will not be simply reusable. However there is a bad news in this approach as well. If somehow someone obtain the cryptographic algorithm along with the key used by your application, he/she will be able to view all the user passwords stored in your database by decryption. "This is the best option I got", a software developer may scream, but is there a better way?
Cryptographic hash function (one-way-only)
Yes there is, may be you have missed the point here. Did you notice that there is no requirement to decrypt and compare? If there is one-way-only conversion approach where the password can be converted into some converted-word, but the reverse operation (generation of password from converted-word) is impossible. Now even if someone gets access to the database, there is no way that the passwords be reproduced or extracted using the converted-words. In this approach, there will be hardly anyway that some could know your users' top secret passwords; and this will protect the users using the same password across multiple applications. What algorithms can be used for this approach?
I've always thought that Encryption can be converted both ways, in a way that the end value can bring you to original value and with Hashing you'll not be able to revert from the end result to the original value.
Hashing algorithms are usually cryptographic in nature, but the principal difference is that encryption is reversible through decryption, and hashing is not.
An encryption function typically takes input and produces encrypted output that is the same, or slightly larger size.
A hashing function takes input and produces a typically smaller output, typically of a fixed size as well.
While it isn't possible to take a hashed result and "dehash" it to get back the original input, you can typically brute-force your way to something that produces the same hash.
In other words, if a authentication scheme takes a password, hashes it, and compares it to a hashed version of the requires password, it might not be required that you actually know the original password, only its hash, and you can brute-force your way to something that will match, even if it's a different password.
Hashing functions are typically created to minimize the chance of collisions and make it hard to just calculate something that will produce the same hash as something else.
Hashing:
It is a one-way algorithm and once hashed can not rollback and this is its sweet point against encryption.
Encryption
If we perform encryption, there will a key to do this. If this key will be leaked all of your passwords could be decrypted easily.
On the other hand, even if your database will be hacked or your server admin took data from DB and you used hashed passwords, the hacker will not able to break these hashed passwords. This would actually practically impossible if we use hashing with proper salt and additional security with PBKDF2.
If you want to take a look at how should you write your hash functions, you can visit here.
There are many algorithms to perform hashing.
MD5 - Uses the Message Digest Algorithm 5 (MD5) hash function. The output hash is 128 bits in length. The MD5 algorithm was designed by Ron Rivest in the early 1990s and is not a preferred option today.
SHA1 - Uses Security Hash Algorithm (SHA1) hash published in 1995. The output hash is 160 bits in length. Although most widely used, this is not a preferred option today.
HMACSHA256, HMACSHA384, HMACSHA512 - Use the functions SHA-256, SHA-384, and SHA-512 of the SHA-2 family. SHA-2 was published in 2001. The output hash lengths are 256, 384, and 512 bits, respectively,as the hash functions’ names indicate.
Ideally you should do both.
First Hash the pass password for the one way security. Use a salt for extra security.
Then encrypt the hash to defend against dictionary attacks if your database of password hashes is compromised.
As correct as the other answers may be, in the context that the quote was in, hashing is a tool that may be used in securing information, encryption is a process that takes information and makes it very difficult for unauthorized people to read/use.
Here's one reason you may want to use one over the other - password retrieval.
If you only store a hash of a user's password, you can't offer a 'forgotten password' feature.

Non-random salt for password hashes

UPDATE: I recently learned from this question that in the entire discussion below, I (and I am sure others did too) was a bit confusing: What I keep calling a rainbow table, is in fact called a hash table. Rainbow tables are more complex creatures, and are actually a variant of Hellman Hash Chains. Though I believe the answer is still the same (since it doesn't come down to cryptanalysis), some of the discussion might be a bit skewed.
The question: "What are rainbow tables and how are they used?"
Typically, I always recommend using a cryptographically-strong random value as salt, to be used with hash functions (e.g. for passwords), such as to protect against Rainbow Table attacks.
But is it actually cryptographically necessary for the salt to be random? Would any unique value (unique per user, e.g. userId) suffice in this regard? It would in fact prevent using a single Rainbow Table to crack all (or most) passwords in the system...
But does lack of entropy really weaken the cryptographic strength of the hash functions?
Note, I am not asking about why to use salt, how to protect it (it doesn't need to be), using a single constant hash (don't), or what kind of hash function to use.
Just whether salt needs entropy or not.
Thanks all for the answers so far, but I'd like to focus on the areas I'm (a little) less familiar with. Mainly implications for cryptanalysis - I'd appreciate most if anyone has some input from the crypto-mathematical PoV.
Also, if there are additional vectors that hadn't been considered, that's great input too (see #Dave Sherohman point on multiple systems).
Beyond that, if you have any theory, idea or best practice - please back this up either with proof, attack scenario, or empirical evidence. Or even valid considerations for acceptable trade-offs... I'm familiar with Best Practice (capital B capital P) on the subject, I'd like to prove what value this actually provides.
EDIT: Some really good answers here, but I think as #Dave says, it comes down to Rainbow Tables for common user names... and possible less common names too. However, what if my usernames are globally unique? Not necessarily unique for my system, but per each user - e.g. email address.
There would be no incentive to build a RT for a single user (as #Dave emphasized, the salt is not kept secret), and this would still prevent clustering. Only issue would be that I might have the same email and password on a different site - but salt wouldnt prevent that anyway.
So, it comes back down to cryptanalysis - IS the entropy necessary, or not? (My current thinking is it's not necessary from a cryptanalysis point of view, but it is from other practical reasons.)
Salt is traditionally stored as a prefix to the hashed password. This already makes it known to any attacker with access to the password hash. Using the username as salt or not does not affect that knowledge and, therefore, it would have no effect on single-system security.
However, using the username or any other user-controlled value as salt would reduce cross-system security, as a user who had the same username and password on multiple systems which use the same password hashing algorithm would end up with the same password hash on each of those systems. I do not consider this a significant liability because I, as an attacker, would try passwords that a target account is known to have used on other systems first before attempting any other means of compromising the account. Identical hashes would only tell me in advance that the known password would work, they would not make the actual attack any easier. (Note, though, that a quick comparison of the account databases would provide a list of higher-priority targets, since it would tell me who is and who isn't reusing passwords.)
The greater danger from this idea is that usernames are commonly reused - just about any site you care to visit will have a user account named "Dave", for example, and "admin" or "root" are even more common - which would make construction of rainbow tables targeting users with those common names much easier and more effective.
Both of these flaws could be effectively addressed by adding a second salt value (either fixed and hidden or exposed like standard salt) to the password before hashing it, but, at that point, you may as well just be using standard entropic salt anyhow instead of working the username into it.
Edited to Add: A lot of people are talking about entropy and whether entropy in salt is important. It is, but not for the reason most of the comments on it seem to think.
The general thought seems to be that entropy is important so that the salt will be difficult for an attacker to guess. This is incorrect and, in fact, completely irrelevant. As has been pointed out a few times by various people, attacks which will be affected by salt can only be made by someone with the password database and someone with the password database can just look to see what each account's salt is. Whether it's guessable or not doesn't matter when you can trivially look it up.
The reason that entropy is important is to avoid clustering of salt values. If the salt is based on username and you know that most systems will have an account named either "root" or "admin", then you can make a rainbow table for those two salts and it will crack most systems. If, on the other hand, a random 16-bit salt is used and the random values have roughly even distribution, then you need a rainbow table for all 2^16 possible salts.
It's not about preventing the attacker from knowing what an individual account's salt is, it's about not giving them the big, fat target of a single salt that will be used on a substantial proportion of potential targets.
Using a high-entropy salt is absolutely necessary to store passwords securely.
Take my username 'gs' and add it to my password 'MyPassword' gives gsMyPassword. This is easily broken using a rainbow-table because if the username hasn't got enough entropy it could be that this value is already stored in the rainbow-table, especially if the username is short.
Another problem are attacks where you know that a user participates in two or more services. There are lots of common usernames, probably the most important ones are admin and root. If somebody created a rainbow-table that have salts with the most common usernames, he could use them to compromise accounts.
They used to have a 12-bit salt. 12 bit are 4096 different combinations. That was not secure enough because that much information can be easily stored nowadays. The same applies for the 4096 most used usernames. It's likely that a few of your users will be choosing a username that belongs to the most common usernames.
I've found this password checker which works out the entropy of your password. Having smaller entropy in passwords (like by using usernames) makes it much easier for rainbowtables as they try to cover at least all passwords with low entropy, because they are more likely to occur.
It is true that the username alone may be problematic since people may share usernames among different website. But it should be rather unproblematic if the users had a different name on each website. So why not just make it unique on each website. Hash the password somewhat like this
hashfunction("www.yourpage.com/"+username+"/"+password)
This should solve the problem. I'm not a master of cryptanalysis, but I sure doubt that the fact that we don't use high entropy would make the hash any weaker.
I like to use both: a high-entropy random per-record salt, plus the unique ID of the record itself.
Though this doesn't add much to security against dictionary attacks, etc., it does remove the fringe case where someone copies their salt and hash to another record with the intention of replacing the password with their own.
(Admittedly it's hard to think of a circumstance where this applies, but I can see no harm in belts and braces when it comes to security.)
If the salt is known or easily guessable, you have not increased the difficulty of a dictionary attack. It even may be possible to create a modified rainbow table that takes a "constant" salt into account.
Using unique salts increases the difficulty of BULK dictionary attacks.
Having unique, cryptographically strong salt value would be ideal.
I would say that as long as the salt is different for each password, you will probably be ok. The point of the salt, is so that you can't use standard rainbow table to solve every password in the database. So if you apply a different salt to every password (even if it isn't random), the attacker would basically have to compute a new rainbow table for each password, since each password uses a different salt.
Using a salt with more entropy doesn't help a whole lot, because the attacker in this case is assumed to already have the database. Since you need to be able to recreate the hash, you have to already know what the salt is. So you have to store the salt, or the values that make up the salt in your file anyway. In systems like Linux, the method for getting the salt is known, so there is no use in having a secret salt. You have to assume that the attacker who has your hash values, probably knows your salt values as well.
The strength of a hash function is not determined by its input!
Using a salt that is known to the attacker obviously makes constructing a rainbow table (particularly for hard-coded usernames like root) more attractive, but it doesn't weaken the hash. Using a salt which is unknown to the attacker will make the system harder to attack.
The concatenation of a username and password might still provide an entry for an intelligent rainbow table, so using a salt of a series pseudo-random characters, stored with the hashed password is probably a better idea. As an illustration, if I had username "potato" and password "beer", the concatenated input for your hash is "potatobeer", which is a reasonable entry for a rainbow table.
Changing the salt each time the user changes their password might help to defeat prolonged attacks, as would the enforcement of a reasonable password policy, e.g. mixed case, punctuation, min length, change after n weeks.
However, I would say your choice of digest algorithm is more important. Use of SHA-512 is going to prove to be more of a pain for someone generating a rainbow table than MD5, for example.
Salt should have as much entropy as possible to ensure that should a given input value be hashed multiple times, the resulting hash value will be, as close as can be achieved, always different.
Using ever-changing salt values with as much entropy as possible in the salt will ensure that the likelihood of hashing (say, password + salt) will produce entirely different hash values.
The less entropy in the salt, the more chance you have of generating the same salt value, as thus the more chance you have of generating the same hash value.
It is the nature of the hash value being "constant" when the input is known and "constant" that allow dictionary attacks or rainbow tables to be so effective. By varying the resulting hash value as much as possible (by using high entropy salt values) ensures that hashing the same input+random-salt will produce many different hash value results, thereby defeating (or at least greatly reducing the effectiveness of) rainbow table attacks.
Entropy is the point of Salt value.
If there is some simple and reproducible "math" behind salt, than it's the same as the salt is not there. Just adding time value should be fine.

Resources