Do similar passwords have similar hashes? - security

Our computer system at work requires users to change their password every few weeks, and you cannot have the same password as you had previously. It remembers something like 20 of your last passwords. I discovered most people simply increment a digit at the end of their password, so "thisismypassword1" becomes "thisismypassword2" then 3, 4, 5 etc.
Since all of these passwords are stored somewhere, I wondered if there was any weakness in the hashes themselves, for standard hashing algorithms used to store passwords like MD5. Could a hacker increase their chances of brute-forcing the password if they have a list of hashes of similar passwords?

With a good hash algorithm, similar passwords will get distributed across the hashes. So similar passwords will have very different hashes.
You can try this with MD5 and different strings.
"hello world" - 5eb63bbbe01eeed093cb22bb8f5acdc3
"hello world" - fd27fbb9872ba413320c606fdfb98db1

Do similar passwords have similar hashes?
No.
Any similarity, even a complex correlation, would be considered a weakness in the hash. Once discovered by the crypto community it would be published, and enough discovered weaknesses in the hash eventually add up to advice not to use that hash any more.
Of course there's no way to know whether a hash has undiscovered weaknesses, or weaknesses known to an attacker but not published, in which case most likely the attacker is a well-funded government organization. The NSA certainly is in possession of non-public theoretical attacks on some crypto components, but whether those attacks are usable is another matter. GCHQ probably is. I'd guess that a few other countries have secret crypto programs with enough mathematicians to have done original work: China would be my first guess. All you can do is act on the best available information. And if the best available information says that a hash is "good for crypto", then one of the things that means is no usable similarities of this kind.
Finally, some systems use weak hashes for passwords -- either due to ignorance by the implementer or legacy. All bets are off for the properties of a hashing scheme that either hasn't had public review, or else has been reviewed and found wanting, or else is old enough that significant weaknesses have eventually been found. MD5 is broken for some purposes (since there exist practical means to generate collisions) but not for all purposes. AFAIK it's OK for this, in the sense that there is no practical pre-image attack, and having a handful of hashes of related plaintexts is no better than having a handful of hashes of unrelated plaintexts. But for unrelated reasons you shouldn't really use a single application of any hash for password storage anyway, you should use multiple rounds.
Could a hacker increase their chances of brute-forcing the password if they have a list of hashes of similar passwords?
Indirectly, yes, knowing that those are your old passwords. Not because of any property of the hash, but suppose the attacker manages to (very slowly) brute-force one or more of your old passwords using those old hashes, and sees that in the past it has been "thisismypassword3" and "thisismypassword4".
Your password has since changed, to "thisismypassword5". Well done, by changing it before the attacker cracked it, you have successfully ensured that the attacker did not recover a valuable password! Victory! Except it does you no good, since the attacker has the means to guess the new one quickly anyway using the old password(s).
Even if the attacker only has one old password, and therefore cannot easily spot a trend, password crackers work by trying passwords which are similar to dictionary words and other values. To over-simplify a bit, it will try the dictionary words first, then strings consisting of a word with one extra character added, removed or changed, then strings with two changes, and so on.
By including your old password in the "other values", the attacker can ensure that strings very similar to it are checked early in the cracking process. So if your new password is similar to old ones, then having the old hashes does have some value to the attacker - reversing any one of them gives him a good seed to crack your current password.
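A rough sketch of that seeding idea, assuming unsalted MD5 as in the question and a made-up mutation rule (try nearby values of a trailing number). The names variants and crack_from_old are hypothetical, and real cracking tools use far richer rule sets than this.

```python
import hashlib
import re

def md5_hex(text):
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def variants(old_password, span=20):
    """Yield guesses derived from a recovered old password.

    Hypothetical rule: if the password ends in digits, try nearby numbers.
    Leading zeros in the numeric suffix are not preserved.
    """
    match = re.match(r"^(.*?)(\d+)$", old_password)
    if not match:
        return
    stem, number = match.group(1), int(match.group(2))
    for candidate in range(max(0, number - span), number + span + 1):
        yield f"{stem}{candidate}"

def crack_from_old(old_password, target_hash):
    """Return the first variant whose MD5 matches target_hash, or None."""
    for guess in variants(old_password):
        if md5_hex(guess) == target_hash:
            return guess
    return None

if __name__ == "__main__":
    current = md5_hex("thisismypassword5")  # hash of the not-yet-cracked password
    print(crack_from_old("thisismypassword3", current))
```

With one old password recovered, the current one falls after a few dozen hash computations instead of a full brute-force search.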
So, incrementing your password regularly doesn't add much. Changing your password to something that's guessable from the old password puts your attacker in the same position as they'd be in if they knew nothing at all, but your password was guessable from nothing at all.
The main practical attacks on password systems these days are eavesdropping (via keyloggers and other malware) and phishing. Trying to reverse password hashes isn't a good percentage attack, although if an attacker has somehow got hold of an /etc/passwd file or equivalent, they will break some weak passwords that way on the average system.

It depends on the hashing algorithm. If it is any good, similar passwords should not have similar hashes.

The whole point of a cryptographic hash is that similar passwords would absolutely not create similar hashes.
More importantly, you would most likely salt the password so that even the same passwords do not produce the same hash.
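A minimal sketch of the salting point, using PBKDF2 from Python's standard library as an assumed stand-in for whatever password hash the system really uses. The iteration count and function names are illustrative choices, not taken from the answers here.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch

def hash_password(password, salt=None):
    """Return (salt, derived_key); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

if __name__ == "__main__":
    salt_a, key_a = hash_password("thisismypassword1")
    salt_b, key_b = hash_password("thisismypassword1")
    print(key_a == key_b)  # same password, different salts
    print(verify_password("thisismypassword1", salt_a, key_a))
```

Because each call draws its own salt, two users with the same password end up with unrelated stored values.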

It depends on the hash algorithm used. A good one will distribute similar inputs to disparate outputs.

Different inputs may produce the same hash; this is called a hash collision.
Check here:
http://en.wikipedia.org/wiki/Collision_%28computer_science%29
Hash collisions may be used to increase the chances of a successful brute-force attack, see:
http://en.wikipedia.org/wiki/Birthday_attack
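To get a feel for the birthday effect, here is a small sketch that truncates MD5 to 16 bits (an assumption made purely so that a collision shows up quickly) and hashes "msg0", "msg1", ... until two inputs collide. Note that this finds two arbitrary colliding inputs; it does not recover the input behind one particular hash, which is the harder pre-image problem.

```python
import hashlib

def truncated_md5(text, bits=16):
    """Return the first `bits` bits of the MD5 digest as an integer."""
    digest = int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")
    return digest >> (128 - bits)

def find_collision(bits=16):
    """Hash "msg0", "msg1", ... until two inputs share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        text = f"msg{counter}"
        tag = truncated_md5(text, bits)
        if tag in seen:
            # Return both colliding inputs and how many inputs were hashed.
            return seen[tag], text, counter + 1
        seen[tag] = text
        counter += 1

if __name__ == "__main__":
    first, second, attempts = find_collision()
    print(first, second, attempts)
```

For a 16-bit output a collision is expected after a few hundred inputs, far fewer than the 65,536 possible values. The same square-root effect is why a full 128-bit digest offers only about 2^64 collision resistance.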

To expand on what others have said, a quick test shows that you get vastly different hashes with small changes made to the input.
I used the following code to run a quick test:
<?php
// Print the MD5 hash of five passwords that differ only in the final digit.
for ($i = 0; $i < 5; $i++) {
    echo 'password' . $i . ' - ' . md5('password' . $i) . "<br />\n";
}
?>
and I got the following results:
password0 - 305e4f55ce823e111a46a9d500bcb86c
password1 - 7c6a180b36896a0a8c02787eeafb0e4c
password2 - 6cb75f652a9b52798eb6cf2201057c73
password3 - 819b0643d6b89dc9b579fdfc9094f28e
password4 - 34cc93ece0ba9e3f6f235d4af979b16c
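To put a number on "vastly different", here is a short Python sketch that counts how many of the 128 output bits differ between the MD5 hashes of neighbouring passwords. The helper names are my own; for a well-behaved hash, roughly half the bits are expected to flip.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def differing_bits(a: str, b: str) -> int:
    """Count the bits that differ between the MD5 digests of a and b."""
    xor = int(md5_hex(a), 16) ^ int(md5_hex(b), 16)
    return bin(xor).count("1")

if __name__ == "__main__":
    for i in range(4):
        first, second = f"password{i}", f"password{i + 1}"
        print(first, second, differing_bits(first, second))
```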

Short answer, no!
The output of a hash function changes greatly even if a single character is incremented.
But that only matters if you want to break the hash function itself.
Incrementing a password is still bad practice, since it makes brute-forcing easier.

No, if you change the password even slightly it produces a completely new hash.

As a general rule, a "good hash" will not hash two similar (but unequal) strings to similar hashes. MD5 is good enough that this isn't a problem. However, there are "rainbow tables" (essentially precomputed password:hash pairs) for quite a few common passwords, and for some password hashes (the traditional DES-based Unix passwords, for example) full rainbow tables exist.
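A simplified sketch of the precomputation idea: a plain hash-to-password dictionary built from a tiny, made-up word list. Real rainbow tables use hash chains to trade lookup time for storage, so treat this only as an illustration of why unsalted hashes of common passwords fall instantly.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def build_lookup(wordlist):
    """Precompute hash -> password for every word in the list."""
    return {md5_hex(word): word for word in wordlist}

# Illustrative list only; real tables cover millions of candidates.
COMMON = ["123456", "password", "password1", "letmein", "qwerty"]
TABLE = build_lookup(COMMON)

def reverse(hash_hex):
    """Return the password for a known hash, or None if it is not in the table."""
    return TABLE.get(hash_hex)

if __name__ == "__main__":
    print(reverse(md5_hex("password1")))            # found in the table
    print(reverse(md5_hex("x9$k" + "password1")))   # a salt prefix defeats the lookup
```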

Related

Best Practices: Salting & peppering passwords?

I came across a discussion in which I learned that what I'd been doing wasn't in fact salting passwords but peppering them, and I've since begun doing both with a function like:
hash_function($salt.hash_function($pepper.$password)) [multiple iterations]
Ignoring the chosen hash algorithm (I want this to be a discussion of salts & peppers and not specific algorithms but I'm using a secure one), is this a secure option or should I be doing something different? For those unfamiliar with the terms:
A salt is a randomly generated value usually stored with the string in the database designed to make it impossible to use hash tables to crack passwords. As each password has its own salt, they must all be brute-forced individually in order to crack them; however, as the salt is stored in the database with the password hash, a database compromise means losing both.
A pepper is a site-wide static value stored separately from the database (usually hard-coded in the application's source code) which is intended to be secret. It is used so that a compromise of the database would not cause the entire application's password table to be brute-forceable.
Is there anything I'm missing and is salting & peppering my passwords the best option to protect my user's security? Is there any potential security flaw to doing it this way?
Note: Assume for the purpose of the discussion that the application & database are stored on separate machines, do not share passwords etc. so a breach of the database server does not automatically mean a breach of the application server.
Ok. Seeing as I need to write about this over and over, I'll do one last canonical answer on pepper alone.
The Apparent Upside Of Peppers
It seems quite obvious that peppers should make hash functions more secure. I mean, if the attacker only gets your database, then your users' passwords should be secure, right? Seems logical, right?
That's why so many people believe that peppers are a good idea. It "makes sense".
The Reality Of Peppers
In the security and cryptography realms, "make sense" isn't enough. Something has to be provable and make sense in order for it to be considered secure. Additionally, it has to be implementable in a maintainable way. The most secure system that can't be maintained is considered insecure (because if any part of that security breaks down, the entire system falls apart).
And peppers fit neither the provable nor the maintainable models...
Theoretical Problems With Peppers
Now that we've set the stage, let's look at what's wrong with peppers.
Feeding one hash into another can be dangerous.
In your example, you do hash_function($salt . hash_function($pepper . $password)).
We know from past experience that "just feeding" one hash result into another hash function can decrease the overall security. The reason is that both hash functions can become a target of attack.
That's why algorithms like PBKDF2 use special operations to combine them (hmac in that case).
The point is that while it's not a big deal, it is also not a trivial thing to just throw around. Crypto systems are designed to avoid "should work" cases, and instead focus on "designed to work" cases.
While this may seem purely theoretical, it's in fact not. For example, bcrypt cannot accept arbitrary passwords: many implementations treat the input as a NUL-terminated string, so everything after the first zero byte is ignored. So passing bcrypt(hash(pw), salt) can indeed result in a far weaker hash than bcrypt(pw, salt) if hash() returns a binary string.
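A sketch of why a binary digest is risky as input, assuming a password hashing API that reads its input as a NUL-terminated C string. The c_string_view helper below only simulates that behaviour; it is not bcrypt. The search loop looks for a password whose raw SHA-256 digest starts with a zero byte, which happens for about one input in 256.

```python
import base64
import hashlib

def c_string_view(data: bytes) -> bytes:
    """Simulate an API that stops reading at the first zero byte."""
    return data.split(b"\x00", 1)[0]

def first_digest_with_leading_zero():
    """Search "pw0", "pw1", ... for a SHA-256 digest whose first byte is zero."""
    counter = 0
    while True:
        password = f"pw{counter}"
        digest = hashlib.sha256(password.encode("utf-8")).digest()
        if digest[0] == 0:
            return password, digest
        counter += 1

if __name__ == "__main__":
    password, digest = first_digest_with_leading_zero()
    print(password, len(c_string_view(digest)))            # effective input length
    print(len(c_string_view(base64.b64encode(digest))))    # encoded form is intact
```

For such a password the simulated API sees an empty input, so every affected password would hash identically. Encoding the digest (hex or base64) before passing it on avoids the truncation.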
Working Against Design
The way bcrypt (and other password hashing algorithms) were designed is to work with a salt. The concept of a pepper was never introduced. This may seem like a triviality, but it's not. The reason is that a salt is not a secret. It is just a value that can be known to an attacker. A pepper on the other hand, by very definition is a cryptographic secret.
The current password hashing algorithms (bcrypt, pbkdf2, etc) all are designed to only take in one secret value (the password). Adding in another secret into the algorithm hasn't been studied at all.
That doesn't mean it is not safe. It means we don't know if it is safe. And the general recommendation with security and cryptography is that if we don't know, it isn't.
So until algorithms are designed and vetted by cryptographers for use with secret values (peppers), current algorithms shouldn't be used with them.
Complexity Is The Enemy Of Security
Believe it or not, Complexity Is The Enemy Of Security. Making an algorithm that looks complex may be secure, or it may be not. But the chances are quite significant that it's not secure.
Significant Problems With Peppers
It's Not Maintainable
Your implementation of peppers precludes the ability to rotate the pepper key. Since the pepper is used at the input to the one way function, you can never change the pepper for the lifetime of the value. This means that you'd need to come up with some wonky hacks to get it to support key rotation.
This is extremely important as it's required whenever you store cryptographic secrets. Not having a mechanism to rotate keys (periodically, and after a breach) is a huge security vulnerability.
And your current pepper approach would require every user to either have their password completely invalidated by a rotation, or wait until their next login to rotate (which may be never)...
Which basically makes your approach an immediate no-go.
It Requires You To Roll Your Own Crypto
Since no current algorithm supports the concept of a pepper, it requires you to either compose algorithms or invent new ones to support a pepper. And if you can't immediately see why that's a really bad thing:
Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.
Bruce Schneier
NEVER roll your own crypto...
The Better Way
So, out of all the problems detailed above, there are two ways of handling the situation.
Just Use The Algorithms As They Exist
If you use bcrypt or scrypt correctly (with a high cost), all but the weakest dictionary passwords should be statistically safe. The current record for hashing bcrypt at cost 5 is 71k hashes per second. At that rate, exhausting even the 6 character random passwords over printable ASCII would take months. And considering my minimum recommended cost is 10, that reduces the hashes per second by a factor of 32 (2^5). So we'd be talking only about 2200 hashes per second, which stretches that search to years. At that rate, even some dictionary phrases or modifications may be safe.
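A rough check of that arithmetic. The 71k hashes per second at cost 5 is the figure quoted above; the 95-character printable ASCII alphabet and the exhaustive-search model are my assumptions, so the resulting times are ballpark only.

```python
BASE_RATE = 71_000   # hashes per second at cost 5, the figure quoted above
BASE_COST = 5

def hashes_per_second(cost):
    """Each bcrypt cost increment doubles the work, halving the guess rate."""
    return BASE_RATE / 2 ** (cost - BASE_COST)

def days_to_exhaust(length, alphabet_size, cost):
    """Days needed to try every password of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / hashes_per_second(cost) / 86_400

if __name__ == "__main__":
    for cost in (5, 10):
        rate = hashes_per_second(cost)
        days = days_to_exhaust(6, 95, cost)  # assumed: 95 printable ASCII characters
        print(f"cost {cost}: {rate:.0f} hashes/s, {days:.0f} days for 6 characters")
```

Under these assumptions the full 6-character space takes on the order of 120 days at cost 5 and roughly ten years at cost 10; a smaller alphabet shrinks both figures considerably.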
Additionally, we should be checking for those weak classes of passwords at the door and not allowing them in. As password cracking gets more advanced, so should password quality requirements. It's still a statistical game, but with a proper storage technique, and strong passwords, everyone should be practically very safe...
Encrypt The Output Hash Prior To Storage
There exists in the security realm an algorithm designed to handle everything we've said above. It's a block cipher. It's good, because it's reversible, so we can rotate keys (yay! maintainability!). It's good because it's being used as designed. It's good because it gives the user no information.
Let's look at that line again. Let's say that an attacker knows your algorithm (which is required for security, otherwise it's security through obscurity). With a traditional pepper approach, the attacker can create a sentinel password, and since he knows the salt and the output, he can brute force the pepper. Ok, that's a long shot, but it's possible. With a cipher, the attacker gets nothing. And since the salt is randomized, a sentinel password won't even help him/her. So the best they are left with is to attack the encrypted form. Which means that they first have to attack your encrypted hash to recover the encryption key, and then attack the hashes. But there's a lot of research into the attacking of ciphers, so we want to rely on that.
TL/DR
Don't use peppers. There are a host of problems with them, and there are two better ways: not using any server-side secret (yes, it's ok) and encrypting the output hash using a block cipher prior to storage.
First we should talk about the exact advantage of a pepper:
The pepper can protect weak passwords from a dictionary attack, in the special case, where the attacker has read-access to the database (containing the hashes) but does not have access to the source code with the pepper.
A typical scenario would be SQL-injection, thrown away backups, discarded servers... These situations are not as uncommon as it sounds, and often not under your control (server-hosting). If you use...
A unique salt per password
A slow hashing algorithm like BCrypt
...strong passwords are well protected. It's nearly impossible to brute force a strong password under those conditions, even when the salt is known. The problem is the weak passwords: those that are part of a brute-force dictionary or are derivations of such entries. A dictionary attack will reveal those very fast, because you test only the most common passwords.
The second question is how to apply the pepper.
An often recommended way to apply a pepper, is to combine the password and the pepper before passing it to the hash function:
$pepperedPassword = hash_hmac('sha512', $password, $pepper);
$passwordHash = bcrypt($pepperedPassword);
There is another even better way though:
$passwordHash = bcrypt($password);
$encryptedHash = encrypt($passwordHash, $serverSideKey);
This not only lets you add a server-side secret, it also lets you replace the $serverSideKey should that become necessary. This method involves a bit more work, but once the code exists (as a library) there is no reason not to use it.
The point of salt and pepper is to increase the cost of a pre-computed password lookup, called a rainbow table.
In general, trying to find a collision for a single hash is hard (assuming the hash is secure). However, with short hashes, it is possible to use a computer to generate all possible hashes into a lookup table stored on a hard disk. This is called a rainbow table. If you create a rainbow table you can then go out into the world and quickly find plausible passwords for any (unsalted, unpeppered) hash.
The point of a pepper is to make the rainbow table needed to hack your password list unique, so the attacker has to spend more time constructing that table.
The point of the salt however is to make the rainbow table for each user be unique to the user, further increasing the complexity of the attack.
Really the point of computer security is almost never to make it (mathematically) impossible, just mathematically and physically impractical (for example in secure systems it would take all the entropy in the universe (and more) to compute a single user's password).
I want this to be a discussion of salts & peppers and not specific algorithms but I'm using a secure one
Every secure password hashing function that I know of takes the password and the salt (and the secret/pepper if supported) as separate arguments and does all of the work itself.
Merely by the fact that you're concatenating strings and that your hash_function takes only one argument, I know that you aren't using one of those well tested, well analyzed standard algorithms, but are instead trying to roll your own. Don't do that.
Argon2 won the Password Hashing Competition in 2015, and as far as I know it's still the best choice for new designs. It supports pepper via the K parameter (called "secret value" or "key"). I know of no reason not to use pepper. At worst, the pepper will be compromised along with the database and you are no worse off than if you hadn't used it.
If you can't use built-in pepper support, you can use one of the two suggested formulas from this discussion:
Argon2(salt, HMAC(pepper, password)) or HMAC(pepper, Argon2(salt, password))
Important note: if you pass the output of HMAC (or any other hashing function) to Argon2 (or any other password hashing function), either make sure that the password hashing function supports embedded zero bytes or else encode the hash value (e.g. in base64) to ensure there are no zero bytes. If you're using a language whose strings support embedded zero bytes then you are probably safe, unless that language is PHP, but I would check anyway.
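A sketch of the first formula with the base64 precaution applied. Argon2 is not in Python's standard library, so PBKDF2-HMAC-SHA256 stands in for it here; that substitution, the iteration count and all function names are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import os

def pepper_password(password: str, pepper: bytes) -> bytes:
    """HMAC the password with the pepper, then base64-encode to avoid zero bytes."""
    mac = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(mac)

def slow_hash(data: bytes, salt: bytes) -> bytes:
    """Stand-in for Argon2: PBKDF2-HMAC-SHA256 from the standard library."""
    return hashlib.pbkdf2_hmac("sha256", data, salt, 100_000)

def hash_with_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Sketch of SlowHash(salt, HMAC(pepper, password))."""
    return slow_hash(pepper_password(password, pepper), salt)

if __name__ == "__main__":
    salt = os.urandom(16)      # stored next to the hash
    pepper = os.urandom(32)    # kept outside the database
    print(hash_with_pepper("thisismypassword1", salt, pepper).hex())
```

With a library that exposes Argon2's secret parameter directly, the HMAC step would not be needed.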
Can't see storing a hardcoded value in your source code as having any security relevance. It's security through obscurity.
If a hacker acquires your database, he will be able to start brute forcing your user passwords. It won't take long for that hacker to identify your pepper if he manages to crack a few passwords.

Hashing Passwords With Multiple Algorithms

Does using multiple algorithms make passwords more secure? (Or less?)
Just to be clear, I'm NOT talking about doing anything like this:
key = Hash(Hash(salt + password))
I'm talking about using two separate algorithms and matching both:
key1 = Hash1(user_salt1 + password)
key2 = Hash2(user_salt2 + password)
Then requiring both to match when authenticating. I've seen this suggested as a way to eliminate collision matches, but I'm wondering about unintended consequences, such as creating a 'weakest link' scenario or providing information that makes the user database easier to crack, since this method provides more data than a single key does (e.g. somehow combining the information from the two hashes to find the password more easily). Also, if collisions were truly eliminated, you could theoretically brute force the actual password, not just a matching password. In fact, you'd have to in order to brute force the system at all.
I'm not actually planning to implement this, but I'm curious whether or not this is actually an improvement over the standard practice of single key = Hash(user_salt + password).
EDIT:
Many good answers, so just to summarize here: this should have been obvious looking back, but you do create a weakest link by using both, because matches found against the weaker of the two algorithms can be tried against the other. For example, if you used a weak (fast) MD5 and a PBKDF2, I'd brute force the MD5 first, then try any match I found against the other, so by having the MD5 (or whatever) you actually make the situation worse. Also, even if both are among the more secure set (bcrypt+PBKDF2 for example), you double your exposure to one of them breaking.
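The weakest-link effect can be sketched as follows, assuming salted MD5 as the fast hash and PBKDF2 as the slow one. The record layout, word list and function names are made up for the example.

```python
import hashlib
import os

def fast_hash(password, salt):
    """Cheap hash: a single salted MD5."""
    return hashlib.md5(salt + password.encode("utf-8")).digest()

def slow_hash(password, salt):
    """Expensive hash: PBKDF2 with an assumed 100,000 iterations."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def store(password):
    """Store both keys, as in the two-algorithm scheme from the question."""
    salt1, salt2 = os.urandom(16), os.urandom(16)
    return {"salt1": salt1, "key1": fast_hash(password, salt1),
            "salt2": salt2, "key2": slow_hash(password, salt2)}

def attack(record, wordlist):
    """Filter guesses with the cheap hash; pay for the slow hash only on a match."""
    slow_calls = 0
    for guess in wordlist:
        if fast_hash(guess, record["salt1"]) == record["key1"]:
            slow_calls += 1
            if slow_hash(guess, record["salt2"]) == record["key2"]:
                return guess, slow_calls
    return None, slow_calls

if __name__ == "__main__":
    record = store("letmein")
    print(attack(record, ["123456", "password", "letmein"]))
```

The attacker pays the PBKDF2 cost once, for the guess that already matched MD5, so the slow hash adds almost nothing to the overall effort.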
The only thing this would help with would be reducing the possibility of collisions. As you mention, there are several drawbacks (weakest link being a big one).
If the goal is to reduce the possibility of collisions, the best solution would simply be to use a single secure algorithm (e.g. bcrypt) with a larger hash.
Collisions are not a concern with modern hashing algorithms. The point isn't to ensure that every hash in the database is unique. The real point is to ensure that, in the event your database is stolen or accidentally given away, the attacker has a tough time determining a user's actual password. And the chance of a modern hashing algorithm recognizing the wrong password as the right password is effectively zero -- which may be more what you're getting at here.
To be clear, there are two big reasons you might be concerned about collisions.
A collision between the "right" password and a supplied "wrong" password could allow a user with the "wrong" password to authenticate.
A collision between two users' passwords could "reveal" user A's password if user B's password is known.
Concern 1 is addressed by using a strong/modern hashing algorithm (and avoiding terribly anti-brilliant things, like looking for user records based solely on their password hash). Concern 2 is addressed with proper salting -- a "lengthy" unique salt for each password. Let me stress, proper salting is still necessary.
But, if you add hashes to the mix, you're just giving potential attackers more information. I'm not sure there's currently any known way to "triangulate" message data (passwords) from a pair of hashes, but you're not making significant gains by including another hash. It's not worth the risk that there is a way to leverage the additional information.
To answer your question:
Having a unique salt is better than having a generic salt. H(S1 + PW1) , H(S2 + PW2)
Using multiple algorithms may be better than using a single one H1(X) , H2(Y)
(But probably not, as svidgen mentions)
However,
The spirit of this question is a bit wrong for two reasons:
You should not be coming up with your own security protocol without guidance from a security expert. I know it's not your own algorithm, but most security problems start because they were used incorrectly; the algorithms themselves are usually air-tight.
You should not be using hash(salt+password) to store passwords in a database. This is because hashing was designed to be fast - not secure. It's somewhat easy with today's hardware (especially with GPU processing) to find hash collisions in older algorithms. You can of course use a newer secure Hashing Algorithm (SHA-256 or SHA-512) where collisions are not an issue - but why take chances?
You should be looking into Password-Based Key Derivation Functions (PBKDF2) which are designed to be slow to thwart this type of attack. Usually it takes a combination of salting, a secure hashing algorithm (SHA-256) and iterates a couple hundred thousand times.
Making the function take about a second is no problem for a user logging in where they won't notice such a slowdown. But for an attacker, this is a nightmare since they have to perform these iterations for every attempt; significantly slowing down any brute-force attempt.
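A small sketch for seeing the cost per guess on your own hardware, using hashlib.pbkdf2_hmac from the Python standard library. The iteration counts are arbitrary sample values, and timings will vary by machine.

```python
import hashlib
import os
import time

def derive(password, salt, iterations):
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def seconds_per_guess(iterations):
    """Time one derivation, which is what each attacker guess costs."""
    salt = os.urandom(16)
    start = time.perf_counter()
    derive("correct horse", salt, iterations)
    return time.perf_counter() - start

if __name__ == "__main__":
    for iterations in (1, 1_000, 200_000):
        print(iterations, f"{seconds_per_guess(iterations):.4f} s")
```

A legitimate login pays this cost once; an attacker pays it for every candidate password.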
Take a look at libraries supporting PBKDF2 hashing as a better way of doing this. Jasypt is one of my favorites for Java encryption.
See this related security question: How to securely hash passwords
and this loosely related SO question
A salt is added to password hashes to prevent the use of generic pre-built hash tables. The attacker would be forced to generate new tables based on their word list combined with your random salt.
As mentioned, hashes were designed to be fast for a reason. To use them for password storage, you need to slow them down (large number of nested repetitions).
You can create your own password-specific hashing method. Essentially, nest your preferred hashes on the salt+password and repeat.
string MyAlgorithm(string data) {
    string temp = data;
    // Repeat the nested hashes X times to slow the function down.
    for i = 0 to X {
        temp = Hash3(Hash2(Hash1(temp)));
    }
    return temp;
}
result = MyAlgorithm("salt+password");
Where "X" is a large number of repetitions, enough so that the whole thing takes at least a second on decent hardware. As mentioned elsewhere, the point of this delay is to be insignificant to the normal user (who knows the correct password and only waits once), but significant to the attacker (who must run this process for every combination). Of course, this is all for the sake of learning and probably simpler to just use proper existing APIs.

Any value in salting an already "strong" password?

Is there any benefit in salting passwords for a strong, unique (not used for other applications by the user) password?
Salting (as I am aware) protects against rainbow tables generated with a dictionary or common passwords. It also protects against an attacker noticing a user with the same hash in another application.
Seeing as a strong password will (likely) not appear on a generated rainbow table, and a smart user will use unique passwords for each application he wants to protect, does salting protect an already "smart" user?
this is theoretical. i have no inclination to stop salting.
in essence, doesn't the salt just become part of the password? it just happens to be supplied by the gatekeeper rather than the user.
If you can guarantee that all users will never reuse passwords, and that none of their passwords will ever be of a form that it is computationally feasible to precalculate colliding hashes for, then indeed the salt is little additional benefit.
However, the salt is also of little additional cost; while these premises are very hard indeed to guarantee, and the cost of being wrong about them is high. Keep the salt.
Apart from rainbow tables, there are also brute-force tools for reversing a hash. A strong password does not prevent an unsalted hash from being cracked; it only makes the attack take longer the stronger the password is. Salting would certainly still make sense.
This feels like you want to make an assumption, then base your security on that assumption. When your assumption becomes bad, for whatever reason, then your security becomes bad.
So how might your assumption (that strong passwords don't need salting) become invalid?
1) Over time, larger, more comprehensive rainbow tables are generated. This is something I would worry about if it is up to your user to choose a strong password. They might think they have done a good job, and you and your safety checking might think they have done a good job too, but later it turns out their thought process creating the password was easily duplicated by stringing a few words and numbers together.
2) If users cannot choose their password, your strong password generation process might, due to bug or whatever, turn out to be not as strong as you want.
3) Your user might be too lazy to come up with a site-unique/strong password! This is surely the most important problem. Do you really want to generate a system which is usable only by cryptographic experts? :)
Rainbow tables are most definitely not restricted to dictionary passwords or the like. Most tend to include every character combination up to some max length - after all, it's a one time cost for generation. Do your users all use 12+ character passwords? Unlikely.
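To see how quickly "every combination up to some max length" grows, here is a back-of-the-envelope sketch. The 95-character alphabet and the 16 bytes per entry (one raw MD5 digest, no chain compression) are my assumptions; real rainbow tables store far less than one entry per candidate.

```python
def keyspace(max_length, alphabet_size=95):
    """Number of strings of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))

def table_bytes(max_length, alphabet_size=95, bytes_per_entry=16):
    """Naive storage estimate: one 16-byte MD5 digest per candidate."""
    return keyspace(max_length, alphabet_size) * bytes_per_entry

if __name__ == "__main__":
    for length in (6, 8, 10, 12):
        print(length, f"{keyspace(length):.3e} candidates",
              f"{table_bytes(length) / 1e12:.3e} TB")
```

Each extra character multiplies the work by about 95, which is why long random passwords fall outside precomputed tables while short ones do not.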

How to store passwords *correctly*?

An article that I stumbled upon here in SO provided links to other articles which in turn provided links to even more articles etc.
And in the end I was left completely stumped - so what is the best way to store passwords in the DB? From what I can put together you should:
Use a long (at least 128 fully random bits) salt, which is stored in plaintext next to the password;
Use several iterations of SHA-256 (or even greater SHA level) on the salted password.
But... the more I read about cryptography, the more I understand that I don't really understand anything, and that things I had thought to be true for years are actually flat out wrong. Are there any experts on the subject here?
Added: Seems that some people are missing the point. I repeat the last link given above. That should clarify my concerns.
https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2007/july/enough-with-the-rainbow-tables-what-you-need-to-know-about-secure-password-schemes/
You got it right. Only two suggestions:
If one day SHA1 becomes too weak and you want to use something else, it is impossible to unhash the old passwords and rehash them with the new scheme. For this reason, I suggest that you attach to each password a "version" number that tells you what scheme you used (salt length, which hash, how many times). If one day you need to switch from SHA to something stronger, you can create new-style passwords while still having old-style passwords in the database and still tell them apart. Migrating users to the new scheme will be easier.
Passwords still go from user to system without encryption. Look at SRP if that's a problem. SRP is so new that you should be a little paranoid about implementing it, but so far it looks promising.
Edit: Turns out bcrypt beat me to it on idea number 1. The stored info is (cost, salt, hash), where cost determines how many times the hashing is repeated. Looks like bcrypt did something right. Increasing the number of times that you hash can be done without user intervention.
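A minimal sketch of a self-describing password record in the spirit of the "version number" idea, using PBKDF2 from the Python standard library. The "scheme$iterations$salt$hash" layout and the needs_upgrade check are assumptions for illustration, not bcrypt's actual format. One common way to migrate, assumed here, is to re-hash at the user's next successful login, when the plaintext is briefly available.

```python
import base64
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 200_000  # assumed current policy for this sketch

def make_record(password, iterations=CURRENT_ITERATIONS):
    """Return "scheme$iterations$salt$hash" so the parameters travel with the hash."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "$".join(["pbkdf2_sha256", str(iterations),
                     base64.b64encode(salt).decode("ascii"),
                     base64.b64encode(key).decode("ascii")])

def verify(password, record):
    """Check a password using whatever parameters the record itself declares."""
    scheme, iterations, salt_b64, key_b64 = record.split("$")
    if scheme != "pbkdf2_sha256":
        raise ValueError("unknown scheme: " + scheme)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              base64.b64decode(salt_b64), int(iterations))
    return hmac.compare_digest(key, base64.b64decode(key_b64))

def needs_upgrade(record):
    """True when the record was created under a weaker policy than the current one."""
    return int(record.split("$")[1]) < CURRENT_ITERATIONS

if __name__ == "__main__":
    old = make_record("thisismypassword1", iterations=1_000)
    if verify("thisismypassword1", old) and needs_upgrade(old):
        old = make_record("thisismypassword1")  # re-hash with the current policy
    print(old.split("$")[:2])
```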
In truth it depends on what the passwords are for. You should take storing any password with care, but sometimes much greater care is needed than others. As a general rule all passwords should be hashed and each password should have a unique salt.
Really, salts don't need to be that complex; even small ones can cause a real nightmare for crackers trying to gain entry into the system. They are added to a password to prevent the use of rainbow tables to hack multiple accounts' passwords. I wouldn't add a single letter of the alphabet to a password and call it a salt, but you don't need to make it a unique GUID which is encrypted somewhere else in the database either.
One other thing concerning salts. The key to making a password + salt work when hashing is the complexity of the combination of the two. If you have a 12 character password and add a 1 character salt to it, the salt doesn't do much, but cracking the password is still a monumental feat. The reverse is also true.
Use:
Hashed password storage
A 128+ bit user-level salt, random, regenerated (i.e. you make new salts when you make new password hashes, you don't persistently keep the same salt for a given user)
A strong, computationally expensive hashing method
Methodology that is somewhat different (hash algorithm, how many hashing iterations you use, what order the salts are concatenated in, something) from both any 'standard implementation guides' like these and from any other password storage implementation you've written
I think there is no extra iteration on the password needed; just make sure there is a salt, and a complex one ;)
I personally use SHA-1 combined with 2 salt keyphrases.
The length of the salt doesn't really matter, as long as it is unique to a user. The reason for a salt is so that a given generated attempt at a hash match is only useful for a single row of your users table in the DB.
Simply said, use a cryptographically secure hash algorithm and some salt for the passwords, that should be good enough for 99.99% of all use cases. The weak link will be the code that checks the password as well as the password input.

Is MD5 less secure than SHA et al. in a practical sense?

I've seen a few questions and answers on SO suggesting that MD5 is less secure than something like SHA.
My question is: is this worth worrying about in my situation?
Here's an example of how I'm using it:
On the client side, I'm providing a "secure" checksum for a message by appending the current time and a password and then hashing it using MD5. So: MD5(message+time+password).
On the server side, I'm checking this hash against the message that's sent using my knowledge of the time it was sent and the client's password.
In this example, am I really better off using SHA instead of MD5?
In what circumstances would the choice of hashing function really matter in a practical sense?
Edit:
Just to clarify - in my example, is there any benefit moving to an SHA algorithm?
In other words, is it feasible in this example for someone to send a message and a correct hash without knowing the shared password?
More Edits:
Apologies for repeated editing - I wasn't being clear with what I was asking.
Yes, it is worth worrying about in practice. MD5 is so badly broken that researchers have been able to forge fake certificates that matched a real certificate signed by a certificate authority. This meant that they were able to create their own fake certificate authority, and thus could impersonate any bank or business they felt like with browsers completely trusting them.
Now, this took them a lot of time and effort using a cluster of PlayStation 3s, and several weeks to find an appropriate collision. But once broken, a hash algorithm only gets worse, never better. If you care at all about security, it would be better to choose an unbroken hash algorithm, such as one of the SHA-2 family (SHA-1 has also been weakened, though not broken as badly as MD5 is).
edit: The technique used in the link that I provided you involved being able to choose two arbitrary message prefixes and a common suffix, from which it could generate for each prefix a block of data that could be inserted between that prefix and the common suffix, to produce a message with the same MD5 sum as the message constructed from the other prefix. I cannot think of a way in which this particular vulnerability could be exploited in the situation you describe, and in general, using a secure hash for message authentication is more resistant to attack than using it for digital signatures, but I can think of a few vulnerabilities you need to watch out for, which are mostly independent of the hash you choose.
As described, your algorithm involves storing the password in plain text on the server. This means that you are vulnerable to any information disclosure attacks that may be able to discover passwords on the server. You may think that if an attacker can access your database then the game is up, but your users would probably prefer that their passwords stay safe even if your server is compromised. Because of the proliferation of passwords online, many users use the same or similar passwords across services. Furthermore, information disclosure attacks may be possible even in cases when code execution or privilege escalation attacks are not.
You can mitigate this attack by storing the password on your server hashed with a random salt; you store the pair <salt,hash(password+salt)> on the server, and send the salt to the client so that it can compute hash(password+salt) to use in place of the password in the protocol you mention. This does not protect you from the next attack, however.
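A rough Python sketch of that mitigation, under some assumptions: SHA-256 stands in for "hash", the message and time are concatenated as plain strings, and the names derive_key, checksum and server_accepts are invented for this example. The server stores only the salt and hash(password+salt), and the client derives the same value from the salt it is sent.

```python
import hashlib
import hmac


def derive_key(password: str, salt: bytes) -> bytes:
    """hash(password+salt): stored by the server, recomputed by the client."""
    return hashlib.sha256(password.encode("utf-8") + salt).digest()


def checksum(message: str, timestamp: str, key: bytes) -> str:
    """The asker's checksum, with the derived key in place of the password."""
    data = message.encode("utf-8") + timestamp.encode("utf-8") + key
    return hashlib.sha256(data).hexdigest()


def server_accepts(message: str, timestamp: str, tag: str, stored_key: bytes) -> bool:
    """Server side: recompute the checksum from the stored hash and compare."""
    return hmac.compare_digest(checksum(message, timestamp, stored_key), tag)
```

The plain-text password never has to sit in the database, but the stored hash is now itself the secret used in the protocol, so it still needs protecting.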
If an attacker can sniff a message sent from the client, he can do an offline dictionary attack against the client's password. Most users have passwords with fairly low entropy, and a good dictionary of a few hundred thousand existing passwords plus some time randomly permuting them could make finding a password given the information an attacker has from sniffing a message pretty easy.
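To make the offline attack concrete, here is a small Python sketch. It assumes the checksum is exactly MD5(message+time+password) with simple string concatenation, as in the question; the function names and the tiny candidate list are illustrative only.

```python
import hashlib
from typing import Iterable, Optional


def md5_checksum(message: str, timestamp: str, password: str) -> str:
    """The checksum from the question: MD5(message+time+password)."""
    return hashlib.md5((message + timestamp + password).encode("utf-8")).hexdigest()


def dictionary_attack(message: str, timestamp: str, sniffed: str,
                      candidates: Iterable[str]) -> Optional[str]:
    """Try each candidate password offline until one reproduces the sniffed hash."""
    for guess in candidates:
        if md5_checksum(message, timestamp, guess) == sniffed:
            return guess
    return None
```

Nothing here depends on a weakness in MD5. The loop works the same with any fast hash, which is why a low-entropy password is the real problem.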
The technique you propose does not authenticate the server. I don't know if this is a web app that you are talking about, but if it is, then someone who can perform a DNS hijack attack, or DHCP hijacking on an unsecured wireless network, or anything of the sort, can just do a man-in-the-middle attack in which they collect passwords in clear text from your clients.
While the current attack against MD5 may not work against the protocol you describe, MD5 has been severely compromised, and a hash will only ever get weaker, never stronger. Do you want to bet that you will find out about new attacks that could be used against you and will have time to upgrade hash algorithms before your attackers have a chance to exploit it? It would probably be easier to start with something that is currently stronger than MD5, to reduce your chances of having to deal with MD5 being broken further.
Now, if you're just doing this to make sure no one forges a message from another user on a forum or something, then sure, it's unlikely that anyone will put the time and effort in to break the protocol that you described. If someone really wanted to impersonate someone else, they could probably just create a new user name that has a 0 in place of a O or something even more similar using Unicode, and not even bother with trying to forge message and break hash algorithms.
If this is being used for something where the security really matters, then don't invent your own authentication system. Just use TLS/SSL. One of the fundamental rules of cryptography is not to invent your own. And then even for the case of the forum where it probably doesn't matter all that much, won't it be easier to just use something that's proven off the shelf than rolling your own?
In this particular case, I don't think that the weakest link in your application is using md5 rather than sha. The manner in which md5 is "broken" is that given that md5(K) = V, it is possible to generate K' such that md5(K') = V, because the output-space is limited (not because there are any tricks to reduce the search space). However, K' is not necessarily K. This means that if you know md5(M+T+P) = V, you can generate P' such that md5(M+T+P') = V, thus giving a valid entry. However, in this case the message still remains the same, and P hasn't been compromised. If the attacker tries to forge message M', with a T' timestamp, then it is highly unlikely that md5(M'+T'+P') = md5(M'+T'+P) unless P' = P, in which case they would have brute-forced the password. If they have brute-forced the password, then it doesn't matter if you used sha or md5, since checking if md5(M+T+P) = V is equivalent to checking if sha(M+T+P) = V (except that sha might take a constant factor longer to calculate, which doesn't affect the complexity of the brute-force on P).
However, given the choice, you really ought to just go ahead and use sha. There is no sense in not using it, unless there is a serious drawback to using it.
A second thing is you probably shouldn't store the user's password in your database in plain-text. What you should store is a hash of the password, and then use that. In your example, the hash would be of: md5(message + time + md5(password)), and you could safely store md5(password) in your database. However, an attacker stealing your database (through something like SQL injection) would still be able to forge messages. I don't see any way around this.
Brian's answer covers the issues, but I do think it needs to be explained a little less verbosely.
You are using the wrong crypto algorithm here.
MD5 is wrong here, Sha1 is wrong to use here, Sha2xx is wrong to use and Skein is wrong to use.
What you should be using is something like RSA.
Let me explain:
Your secure hash is effectively sending the password out for the world to see.
You mention that your hash is "time + payload + password". If a third party gets a copy of your payload and knows the time, it can find the password (using a brute force or dictionary attack). So it's almost as if you are sending the password in clear text.
Instead of this, you should look at public key cryptography: have your server send out public keys to your agents and have the agents encrypt the data with the public key.
No man in the middle will be able to tell what's in the messages, and no one will be able to forge the messages.
On a side note, MD5 is plenty strong most of the time.
It depends on how valuable the contents of the messages are. The SHA family is demonstrably more secure than MD5 (where "more secure" means "harder to fake"), but if your messages are twitter updates, then you probably don't care.
If those messages are the IPC layer of a distributed system that handles financial transactions, then maybe you care more.
Update: I should add, also, that the two digest algorithms are essentially interchangeable in many ways, so how much more trouble would it really be to use the more secure one?
Update 2: this is a much more thorough answer: http://www.schneier.com/essay-074.html
Yes, someone can send a message and a correct hash without knowing the shared password. They just need to find a string that hashes to the same value.
How common is that? In 2007, a group from the Netherlands announced that they had predicted the winner of the 2008 U.S. Presidential election in a file with the MD5 hash value 3D515DEAD7AA16560ABA3E9DF05CBC80. They then created twelve files, all identical except for the candidate's name and an arbitrary number of spaces following, that hashed to that value. The MD5 hash value is worthless as a checksum, because too many different files give the same result.
This is the same scenario as yours, if I'm reading you right. Just replace "candidate's name" with "secret password". If you really want to be secure, you should probably use a different hash function.
If you are going to generate a hash-MAC, don't invent your own scheme. Use HMAC. There are issues with doing HASH(secret-key || message) and HASH(message || secret-key). If you are using a password as a key you should also be using a key derivation function. Have a look at PBKDF2.
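As a hedged sketch of that advice in Python, using only the standard library: the key is derived from the password with PBKDF2 and the tag is computed with HMAC-SHA256. The iteration count, the "|" separator between message and time, and the function names are assumptions made for this example, not part of any standard.

```python
import hashlib
import hmac


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Turn a password into a MAC key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def sign(message: str, timestamp: str, key: bytes) -> str:
    """HMAC over the message and timestamp instead of a bare hash."""
    data = message.encode("utf-8") + b"|" + timestamp.encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(message: str, timestamp: str, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, timestamp, key), tag)
```

Both sides would need to agree on the salt and iteration count out of band. HMAC takes care of how the key is mixed with the message, which is the part that HASH(key || message) style constructions get wrong.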
Yes, it is worth worrying about which hash to use in this case. Let's look at the attack model first. An attacker might not only try to generate values md5(M+T+P), but might also try to find the password P. In particular, if the attacker can collect tuples of values Mi, Ti, and the corresponding md5(Mi+Ti+P) then he/she might try to find P. This problem hasn't been studied as extensively for hash functions as finding collisions. My approach to this problem would be to try the same types of attacks that are used against block ciphers, e.g. differential attacks. And since MD5 is already highly susceptible to differential attacks, I can certainly imagine that such an attack could be successful here.
Hence I do recommend that you use a stronger hash function than MD5 here. I also recommend that you use HMAC instead of just md5(M+T+P), because HMAC has been designed for the situation that you describe and has accordingly been analyzed.
There is nothing insecure about using MD5 in this manner. MD5 was only broken in the sense that there are algorithms that, given a bunch of data A, can generate additional data B to create a desired hash. Meaning, if someone knows the hash of a password, they could produce a string that will result in that hash. These generated strings are usually very long, though, so if you limit passwords to 20 or 30 characters you're still probably safe.
The main reason to use SHA1 over MD5 is that MD5 functions are being phased out. For example the Silverlight .Net library does not include the MD5 cryptography provider.
MD5 produces more collisions than SHA, which means someone can actually get the same hash from a different word (though this is rare).
The SHA family has been known for its reliability: SHA1 has been the standard for daily use, while SHA256/SHA512 were the standard for government and bank applications.
For your personal website or forum, I suggest you consider SHA1, and if you create something more serious, like a commerce site, I suggest you use SHA256/SHA512 (the SHA2 family).
You can check the Wikipedia articles about MD5 and SHA.
Both MD5 and SHA-1 have cryptographic weaknesses. MD4 and SHA-0 are also compromised.
You can probably safely use MD6, Whirlpool, and RIPEMD-160.
See the following powerpoint from Princeton University, scroll down to the last page.
http://gcu.googlecode.com/files/11Hashing.pdf
I'm not going to comment on the MD5/SHA1/etc. issue, so perhaps you'll consider this answer moot, but something that amuses me very slightly is whenever the use of MD5 et al. for hashing passwords in databases comes up.
If someone's poking around in your database, then they might very well want to look at your password hashes, but it's just as likely they're going to want to steal personal information or any other data you may have lying around in other tables. Frankly, in that situation, you've got bigger fish to fry.
I'm not saying ignore the issue, and like I said, this doesn't really have much bearing on whether or not you should use MD5, SHA1 or whatever to hash your passwords, but I do get tickled slightly pink every time I read someone getting a bit too upset about plain text passwords in a database.

Resources