Related
I came across a discussion in which I learned that what I'd been doing wasn't in fact salting passwords but peppering them, and I've since begun doing both with a function like:
hash_function($salt.hash_function($pepper.$password)) [multiple iterations]
Ignoring the chosen hash algorithm (I want this to be a discussion of salts & peppers and not specific algorithms but I'm using a secure one), is this a secure option or should I be doing something different? For those unfamiliar with the terms:
A salt is a randomly generated value usually stored with the string in the database designed to make it impossible to use hash tables to crack passwords. As each password has its own salt, they must all be brute-forced individually in order to crack them; however, as the salt is stored in the database with the password hash, a database compromise means losing both.
A pepper is a site-wide static value stored separately from the database (usually hard-coded in the application's source code) which is intended to be secret. It is used so that a compromise of the database would not cause the entire application's password table to be brute-forceable.
Is there anything I'm missing and is salting & peppering my passwords the best option to protect my user's security? Is there any potential security flaw to doing it this way?
Note: Assume for the purpose of the discussion that the application & database are stored on separate machines, do not share passwords etc. so a breach of the database server does not automatically mean a breach of the application server.
Ok. Seeing as I need to write about this over and over, I'll do one last canonical answer on pepper alone.
The Apparent Upside Of Peppers
It seems quite obvious that peppers should make hash functions more secure. I mean, if the attacker only gets your database, then your users passwords should be secure, right? Seems logical, right?
That's why so many people believe that peppers are a good idea. It "makes sense".
The Reality Of Peppers
In the security and cryptography realms, "make sense" isn't enough. Something has to be provable and make sense in order for it to be considered secure. Additionally, it has to be implementable in a maintainable way. The most secure system that can't be maintained is considered insecure (because if any part of that security breaks down, the entire system falls apart).
And peppers fit neither the provable or the maintainable models...
Theoretical Problems With Peppers
Now that we've set the stage, let's look at what's wrong with peppers.
Feeding one hash into another can be dangerous.
In your example, you do hash_function($salt . hash_function($pepper . $password)).
We know from past experience that "just feeding" one hash result into another hash function can decrease the overall security. The reason is that both hash functions can become a target of attack.
That's why algorithms like PBKDF2 use special operations to combine them (hmac in that case).
The point is that while it's not a big deal, it is also not a trivial thing to just throw around. Crypto systems are designed to avoid "should work" cases, and instead focus on "designed to work" cases.
While this may seem purely theoretical, it's in fact not. For example, Bcrypt cannot accept arbitrary passwords. So passing bcrypt(hash(pw), salt) can indeed result in a far weaker hash than bcrypt(pw, salt) if hash() returns a binary string.
Working Against Design
The way bcrypt (and other password hashing algorithms) were designed is to work with a salt. The concept of a pepper was never introduced. This may seem like a triviality, but it's not. The reason is that a salt is not a secret. It is just a value that can be known to an attacker. A pepper on the other hand, by very definition is a cryptographic secret.
The current password hashing algorithms (bcrypt, pbkdf2, etc) all are designed to only take in one secret value (the password). Adding in another secret into the algorithm hasn't been studied at all.
That doesn't mean it is not safe. It means we don't know if it is safe. And the general recommendation with security and cryptography is that if we don't know, it isn't.
So until algorithms are designed and vetted by cryptographers for use with secret values (peppers), current algorithms shouldn't be used with them.
Complexity Is The Enemy Of Security
Believe it or not, Complexity Is The Enemy Of Security. Making an algorithm that looks complex may be secure, or it may be not. But the chances are quite significant that it's not secure.
Significant Problems With Peppers
It's Not Maintainable
Your implementation of peppers precludes the ability to rotate the pepper key. Since the pepper is used at the input to the one way function, you can never change the pepper for the lifetime of the value. This means that you'd need to come up with some wonky hacks to get it to support key rotation.
This is extremely important as it's required whenever you store cryptographic secrets. Not having a mechanism to rotate keys (periodically, and after a breach) is a huge security vulnerability.
And your current pepper approach would require every user to either have their password completely invalidated by a rotation, or wait until their next login to rotate (which may be never)...
Which basically makes your approach an immediate no-go.
It Requires You To Roll Your Own Crypto
Since no current algorithm supports the concept of a pepper, it requires you to either compose algorithms or invent new ones to support a pepper. And if you can't immediately see why that's a really bad thing:
Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.
Bruce Schneier
NEVER roll your own crypto...
The Better Way
So, out of all the problems detailed above, there are two ways of handling the situation.
Just Use The Algorithms As They Exist
If you use bcrypt or scrypt correctly (with a high cost), all but the weakest dictionary passwords should be statistically safe. The current record for hashing bcrypt at cost 5 is 71k hashes per second. At that rate even a 6 character random password would take years to crack. And considering my minimum recommended cost is 10, that reduces the hashes per second by a factor of 32. So we'd be talking only about 2200 hashes per second. At that rate, even some dictionary phrases or modificaitons may be safe.
Additionally, we should be checking for those weak classes of passwords at the door and not allowing them in. As password cracking gets more advanced, so should password quality requirements. It's still a statistical game, but with a proper storage technique, and strong passwords, everyone should be practically very safe...
Encrypt The Output Hash Prior To Storage
There exists in the security realm an algorithm designed to handle everything we've said above. It's a block cipher. It's good, because it's reversible, so we can rotate keys (yay! maintainability!). It's good because it's being used as designed. It's good because it gives the user no information.
Let's look at that line again. Let's say that an attacker knows your algorithm (which is required for security, otherwise it's security through obscurity). With a traditional pepper approach, the attacker can create a sentinel password, and since he knows the salt and the output, he can brute force the pepper. Ok, that's a long shot, but it's possible. With a cipher, the attacker gets nothing. And since the salt is randomized, a sentinel password won't even help him/her. So the best they are left with is to attack the encrypted form. Which means that they first have to attack your encrypted hash to recover the encryption key, and then attack the hashes. But there's a lot of research into the attacking of ciphers, so we want to rely on that.
TL/DR
Don't use peppers. There are a host of problems with them, and there are two better ways: not using any server-side secret (yes, it's ok) and encrypting the output hash using a block cipher prior to storage.
Fist we should talk about the exact advantage of a pepper:
The pepper can protect weak passwords from a dictionary attack, in the special case, where the attacker has read-access to the database (containing the hashes) but does not have access to the source code with the pepper.
A typical scenario would be SQL-injection, thrown away backups, discarded servers... These situations are not as uncommon as it sounds, and often not under your control (server-hosting). If you use...
A unique salt per password
A slow hashing algorithm like BCrypt
...strong passwords are well protected. It's nearly impossible to brute force a strong password under those conditions, even when the salt is known. The problem are the weak passwords, that are part of a brute-force dictionary or are derivations of them. A dictionary attack will reveal those very fast, because you test only the most common passwords.
The second question is how to apply the pepper ?
An often recommended way to apply a pepper, is to combine the password and the pepper before passing it to the hash function:
$pepperedPassword = hash_hmac('sha512', $password, $pepper);
$passwordHash = bcrypt($pepperedPassword);
There is another even better way though:
$passwordHash = bcrypt($password);
$encryptedHash = encrypt($passwordHash, $serverSideKey);
This not only allows to add a server side secret, it also allows to exchange the $serverSideKey, should this be necessary. This method involves a bit more work, but if the code once exists (library) there is no reason not to use it.
The point of salt and pepper is to increase the cost of a pre-computed password lookup, called a rainbow table.
In general trying to find a collision for a single hash is hard (assuming the hash is secure). However, with short hashes, it is possible to use computer to generate all possible hashes into a lookup onto a hard disk. This is called a Rainbow Table. If you create a rainbow table you can then go out into the world and quickly find plausable passwords for any (unsalted unpeppered) hash.
The point of a pepper is to make the rainbow table needed to hack your password list unique. Thus wasting more time on the attacker to construct the rainbow table.
The point of the salt however is to make the rainbow table for each user be unique to the user, further increasing the complexity of the attack.
Really the point of computer security is almost never to make it (mathematically) impossible, just mathematically and physically impractical (for example in secure systems it would take all the entropy in the universe (and more) to compute a single user's password).
I want this to be a discussion of salts & peppers and not specific algorithms but I'm using a secure one
Every secure password hashing function that I know of takes the password and the salt (and the secret/pepper if supported) as separate arguments and does all of the work itself.
Merely by the fact that you're concatenating strings and that your hash_function takes only one argument, I know that you aren't using one of those well tested, well analyzed standard algorithms, but are instead trying to roll your own. Don't do that.
Argon2 won the Password Hashing Competition in 2015, and as far as I know it's still the best choice for new designs. It supports pepper via the K parameter (called "secret value" or "key"). I know of no reason not to use pepper. At worst, the pepper will be compromised along with the database and you are no worse off than if you hadn't used it.
If you can't use built-in pepper support, you can use one of the two suggested formulas from this discussion:
Argon2(salt, HMAC(pepper, password)) or HMAC(pepper, Argon2(salt, password))
Important note: if you pass the output of HMAC (or any other hashing function) to Argon2 (or any other password hashing function), either make sure that the password hashing function supports embedded zero bytes or else encode the hash value (e.g. in base64) to ensure there are no zero bytes. If you're using a language whose strings support embedded zero bytes then you are probably safe, unless that language is PHP, but I would check anyway.
Can't see storing a hardcoded value in your source code as having any security relevance. It's security through obscurity.
If a hacker acquires your database, he will be able to start brute forcing your user passwords. It won't take long for that hacker to identify your pepper if he manages to crack a few passwords.
Is it possible to reverse a SHA-1?
I'm thinking about using a SHA-1 to create a simple lightweight system to authenticate a small embedded system that communicates over an unencrypted connection.
Let's say that I create a sha1 like this with input from a "secret key" and spice it with a timestamp so that the SHA-1 will change all the time.
sha1("My Secret Key"+"a timestamp")
Then I include this SHA-1 in the communication and the server, which can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1.
Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
Note1:
I guess you could brute force in some way, but how much work would that actually be?
Note2:
I don't plan to encrypt any data, I just would like to know who sent it.
No, you cannot reverse SHA-1, that is exactly why it is called a Secure Hash Algorithm.
What you should definitely be doing though, is include the message that is being transmitted into the hash calculation. Otherwise a man-in-the-middle could intercept the message, and use the signature (which only contains the sender's key and the timestamp) to attach it to a fake message (where it would still be valid).
And you should probably be using SHA-256 for new systems now.
sha("My Secret Key"+"a timestamp" + the whole message to be signed)
You also need to additionally transmit the timestamp in the clear, because otherwise you have no way to verify the digest (other than trying a lot of plausible timestamps).
If a brute force attack is feasible depends on the length of your secret key.
The security of your whole system would rely on this shared secret (because both sender and receiver need to know, but no one else). An attacker would try to go after the key (either but brute-force guessing or by trying to get it from your device) rather than trying to break SHA-1.
SHA-1 is a hash function that was designed to make it impractically difficult to reverse the operation. Such hash functions are often called one-way functions or cryptographic hash functions for this reason.
However, SHA-1's collision resistance was theoretically broken in 2005. This allows finding two different input that has the same hash value faster than the generic birthday attack that has 280 cost with 50% probability. In 2017, the collision attack become practicable as known as shattered.
As of 2015, NIST dropped SHA-1 for signatures. You should consider using something stronger like SHA-256 for new applications.
Jon Callas on SHA-1:
It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off.
The question is actually how to authenticate over an insecure session.
The standard why to do this is to use a message digest, e.g. HMAC.
You send the message plaintext as well as an accompanying hash of that message where your secret has been mixed in.
So instead of your:
sha1("My Secret Key"+"a timestamp")
You have:
msg,hmac("My Secret Key",sha(msg+msg_sequence_id))
The message sequence id is a simple counter to keep track by both parties to the number of messages they have exchanged in this 'session' - this prevents an attacker from simply replaying previous-seen messages.
This the industry standard and secure way of authenticating messages, whether they are encrypted or not.
(this is why you can't brute the hash:)
A hash is a one-way function, meaning that many inputs all produce the same output.
As you know the secret, and you can make a sensible guess as to the range of the timestamp, then you could iterate over all those timestamps, compute the hash and compare it.
Of course two or more timestamps within the range you examine might 'collide' i.e. although the timestamps are different, they generate the same hash.
So there is, fundamentally, no way to reverse the hash with any certainty.
In mathematical terms, only bijective functions have an inverse function. But hash functions are not injective as there are multiple input values that result in the same output value (collision).
So, no, hash functions can not be reversed. But you can look for such collisions.
Edit
As you want to authenticate the communication between your systems, I would suggest to use HMAC. This construct to calculate message authenticate codes can use different hash functions. You can use SHA-1, SHA-256 or whatever hash function you want.
And to authenticate the response to a specific request, I would send a nonce along with the request that needs to be used as salt to authenticate the response.
It is not entirely true that you cannot reverse SHA-1 encrypted string.
You cannot directly reverse one, but it can be done with rainbow tables.
Wikipedia:
A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a plaintext password up to a certain length consisting of a limited set of characters.
Essentially, SHA-1 is only as safe as the strength of the password used. If users have long passwords with obscure combinations of characters, it is very unlikely that existing rainbow tables will have a key for the encrypted string.
You can test your encrypted SHA-1 strings here:
http://sha1.gromweb.com/
There are other rainbow tables on the internet that you can use so Google reverse SHA1.
Note that the best attacks against MD5 and SHA-1 have been about finding any two arbitrary messages m1 and m2 where h(m1) = h(m2) or finding m2 such that h(m1) = h(m2) and m1 != m2. Finding m1, given h(m1) is still computationally infeasible.
Also, you are using a MAC (message authentication code), so an attacker can't forget a message without knowing secret with one caveat - the general MAC construction that you used is susceptible to length extension attack - an attacker can in some circumstances forge a message m2|m3, h(secret, m2|m3) given m2, h(secret, m2). This is not an issue with just timestamp but it is an issue when you compute MAC over messages of arbitrary length. You could append the secret to timestamp instead of pre-pending but in general you are better off using HMAC with SHA1 digest (HMAC is just construction and can use MD5 or SHA as digest algorithms).
Finally, you are signing just the timestamp and the not the full request. An active attacker can easily attack the system especially if you have no replay protection (although even with replay protection, this flaw exists). For example, I can capture timestamp, HMAC(timestamp with secret) from one message and then use it in my own message and the server will accept it.
Best to send message, HMAC(message) with sufficiently long secret. The server can be assured of the integrity of the message and authenticity of the client.
You can depending on your threat scenario either add replay protection or note that it is not necessary since a message when replayed in entirety does not cause any problems.
Hashes are dependent on the input, and for the same input will give the same output.
So, in addition to the other answers, please keep the following in mind:
If you start the hash with the password, it is possible to pre-compute rainbow tables, and quickly add plausible timestamp values, which is much harder if you start with the timestamp.
So, rather than use
sha1("My Secret Key"+"a timestamp")
go for
sha1("a timestamp"+"My Secret Key")
I believe the accepted answer is technically right but wrong as it applies to the use case: to create & transmit tamper evident data over public/non-trusted mediums.
Because although it is technically highly-difficult to brute-force or reverse a SHA hash, when you are sending plain text "data & a hash of the data + secret" over the internet, as noted above, it is possible to intelligently get the secret after capturing enough samples of your data. Think about it - your data may be changing, but the secret key remains the same. So every time you send a new data blob out, it's a new sample to run basic cracking algorithms on. With 2 or more samples that contain different data & a hash of the data+secret, you can verify that the secret you determine is correct and not a false positive.
This scenario is similar to how Wifi crackers can crack wifi passwords after they capture enough data packets. After you gather enough data it's trivial to generate the secret key, even though you aren't technically reversing SHA1 or even SHA256. The ONLY way to ensure that your data has not been tampered with, or to verify who you are talking to on the other end, is to encrypt the entire data blob using GPG or the like (public & private keys). Hashing is, by nature, ALWAYS insecure when the data you are hashing is visible.
Practically speaking it really depends on the application and purpose of why you are hashing in the first place. If the level of security required is trivial or say you are inside of a 100% completely trusted network, then perhaps hashing would be a viable option. Hope no one on the network, or any intruder, is interested in your data. Otherwise, as far as I can determine at this time, the only other reliably viable option is key-based encryption. You can either encrypt the entire data blob or just sign it.
Note: This was one of the ways the British were able to crack the Enigma code during WW2, leading to favor the Allies.
Any thoughts on this?
SHA1 was designed to prevent recovery of the original text from the hash. However, SHA1 databases exists, that allow to lookup the common passwords by their SHA hash.
Is it possible to reverse a SHA-1?
SHA-1 was meant to be a collision-resistant hash, whose purpose is to make it hard to find distinct messages that have the same hash. It is also designed to have preimage-resistant, that is it should be hard to find a message having a prescribed hash, and second-preimage-resistant, so that it is hard to find a second message having the same hash as a prescribed message.
SHA-1's collision resistance is broken practically in 2017 by Google's team and NIST already removed the SHA-1 for signature purposes in 2015.
SHA-1 pre-image resistance, on the other hand, still exists. One should be careful about the pre-image resistance, if the input space is short, then finding the pre-image is easy. So, your secret should be at least 128-bit.
SHA-1("My Secret Key"+"a timestamp")
This is the pre-fix secret construction has an attack case known as the length extension attack on the Merkle-Damgard based hash function like SHA-1. Applied to the Flicker. One should not use this with SHA-1 or SHA-2. One can use
HMAC-SHA-256 (HMAC doesn't require the collision resistance of the hash function therefore SHA-1 and MD5 are still fine for HMAC, however, forgot about them) to achieve a better security system. HMAC has a cost of double call of the hash function. That is a weakness for time demanded systems. A note; HMAC is a beast in cryptography.
KMAC is the pre-fix secret construction from SHA-3, since SHA-3 has resistance to length extension attack, this is secure.
Use BLAKE2 with pre-fix construction and this is also secure since it has also resistance to length extension attacks. BLAKE is a really fast hash function, and now it has a parallel version BLAKE3, too (need some time for security analysis). Wireguard uses BLAKE2 as MAC.
Then I include this SHA-1 in the communication and the server, which can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1. Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
You did not define the size of your secret. If your attacker knows the timestamp, then they try to look for it by searching. If we consider the collective power of the Bitcoin miners, as of 2022, they reach around ~293 double SHA-256 in a year. Therefore, you must adjust your security according to your risk. As of 2022, NIST's minimum security is 112-bit. One should consider the above 128-bit for the secret size.
Note1: I guess you could brute force in some way, but how much work would that actually be?
Given the answer above. As a special case, against the possible implementation of Grover's algorithm ( a Quantum algorithm for finding pre-images), one should use hash functions larger than 256 output size.
Note2: I don't plan to encrypt any data, I just would like to know who sent it.
This is not the way. Your construction can only work if the secret is mutually shared like a DHKE. That is the secret only known to party the sender and you. Instead of managing this, a better way is to use digital signatures to solve this issue. Besides, one will get non-repudiation, too.
Any hashing algorithm is reversible, if applied to strings of max length L. The only matter is the value of L. To assess it exactly, you could run the state of art dehashing utility, hashcat. It is optimized to get best performance of your hardware.
That's why you need long passwords, like 12 characters. Here they say for length 8 the password is dehashed (using brute force) in 24 hours (1 GPU involved). For each extra character multiply it by alphabet length (say 50). So for 9 characters you have 50 days, for 10 you have 6 years, and so on. It's definitely inaccurate, but can give us an idea, what the numbers could be.
I've read several stackoverflow posts about this topic, particularly this one:
Secure hash and salt for PHP passwords
but I still have a few questions, I need some clarification, please let me know if the following statements are true and explain your comments:
If someone has access to your database/data, then they would still have to figure out your hashing algorithm and your data would still be somewhat secure, depending on your algorithm? All they would have is the hash and the salt.
If someone has access to your database/data and your source code, then it seems like no matter what your do, your hashing algorithm can be reversed engineered, the only thing you would have on your side would be how complex and time consuming your algorithm is?
It seems like the weakest link is: how secure your own systems are and who has access to it?
Lasse V. Karlsen ... brings up a good point, if your data is compromised then game over ... my follow up question is: what types of attacks are these hashes trying to protect against? I've read about rainbow table and dictionary attacks (brute force), but how are these attacks administered?
The security of cryptographic algorithms is always in their secret input. Reasonable cryptanalysis is based on an assumption that any attacker knows what algorithm you use. Good cryptographic hashes are non-invertible and collision resistant. This means that there's still a lot of work to do going from a hash to the value that generated it, regardless of whether you know the algorithm applied.
If you used a secure hash, access to the hash, salt, and algorithm will still leave a lot of work for a would-be attacker.
Yes, a secure hash puts a very hard to invert algorithm on your side. Note that this inversion is not 'reverse-engineering'
The weak link is probably the processes and procedures that get those password hashes into the database. There are all sorts of ways to screw up and store sensitive data in the clear.
As I noted in a comment, there are attacks that these measures defend against. First, knowing the password may lead to authorization to do things beyond what the contents of the database suggest. Second, those passwords may be used elsewhere, and you expose your users to risk by revealing their passwords as a result of a break-in. Third, with hashing, an insider can't exploit read-only access to the database (subject to less auditing, etc.) to impersonate a user.
Dictionaries and rainbow tables are techniques for accelerating hash inversion.
You question is about using passwords as an authentication mechanism and how to securely store these passwords in a database using a hash. As you probably already know the goal is to be able to verify passwords without storing these passwords i clear text in the database. In this context let me try to answer each of your questions:
If someone has access to your database/data, then they would still have to figure out your hashing algorithm and your data would still be somewhat secure, depending on your algorithm? All they would have is the hash and the salt.
The basic idea of hashing passwords is that the attacker has knowledge of the hashing algorithm and has access to both the hash and the salt. By selecting a cryptographic strong hash function and a suitable salt value that is different for each password the computational effort required to guess the password is so high that the cost exceeds the possible gain the attacker can get from guessing the password. So to answer your question, hiding the hash function does not improve the security.
If someone has access to your database/data and your source code, then it seems like no matter what your do, your hashing algorithm can be reversed engineered, the only thing you would have on your side would be how complex and time consuming your algorithm is?
You should always use a well-known (and suitably strong) hashing algorithm, and reverse engineering this algorithm is not meaningful as there is nothing hidden in your code. If you didn't mean reverse engineer but actually reverse then, yes, the passwords are protected by the complexity of reversing the hash function (or guessing a password that matches a hash value). Good hash functions makes this very hard.
It seems like the weakest link is: how secure your own systems are and who has access to it?
In general this is true, but when it comes to securing passwords by storing them as hashes you should still assume that the attacker has full access to the hashes and design your system accordingly by choosing an appropriate hash function and using salts.
What types of attacks are these hashes trying to protect against? I've read about rainbow table and dictionary attacks (brute force), but how are these attacks administered?
The basic attack that password hashing protects against is when the attacker gets access to your database. The clear text password cannot be read from the database and the password is protected.
A more sophisticated attacker can generate a list of possible passwords and compute the hash using the same algorithm as you. He can then compare the computed hash to the stored hash and if he finds a match he has a valid password. This is a brute force attack and it is generally assumed that the attacker has "offline" access to your database. By requiring the users to use long and complex passwords the effort required to "brute force" a password is significantly increased.
When the attacker wants to attack not one password, but all the passwords in the database a large table of passwords and hash value pairs can be precomputed and further improved by using what is called hash chains. Rainbow tables is an application of this idea and can be used to brute force many passwords simultaneously without increasing the effort significantly. However, if a unique salt is used to compute the hash for each password a precomputed table becomes useless as it is different for each salt and cannot be reused.
To sum it up: Security by obscurity is not a good strategy for protecting sensitive information and modern cryptography allows you to secure information without having to resort to obscurity.
what types of attacks are these hashes trying to protect against?
That type when someone gets your password from poorly secured site, reverses it, and then tries to access your bank/PayPal/etc. account. It happens all the time, and many people are still using same (and often weak) passwords everywhere.
As a side note, from what I've read, key derivation functions (PBKDF2/scrypt/bcrypt) are considered better/more secure (#1, #2) than plain salted SHA-1/SHA-2 hashes by crypto people.
If you have just a hash, no salt, then once they know your data (and algorithm) they can get your password via a rainbow table lookup. If you have a hash and a salt, they can get your password by burning a lot of CPU cycles and building a rainbow table.
If your salt is the same for all your data, they only need to burn a lot of CPU cycles once to build the table and then they have all the passwords. If your salt is not always the same, they need to burn through the CPU cycles to make a unique rainbow table for each record.
If the salt is long enough, the CPU cycles they need become very cost-prohibitive.
If you know your data security is breached, of course, you need to reset all the passwords immediately anyway, because as far as you know the attacker is willing to spend that time.
If someone has access to your database/data, then they would still
have to figure out your hashing
algorithm and your data would still be
somewhat secure, depending on your
algorithm? All they would have is the
hash and the salt.
This might be all a really dedicated opponent would need. Much of this answer depends on how valuable the data is, which would tell you how motivated the opponent is. Credit card numbers are going to be extremely valuable, and criminal attackers seem to have plenty of time and accomplices to do their dirty work. Some bad guys have been known to farm out key decryption tasks to botnets!
If someone has access to your database/data and your source code,
then it seems like no matter what your
do, your hashing algorithm can be
reversed engineered, the only thing
you would have on your side would be
how complex and time consuming your
algorithm is?
If they have access to your source and all the data, the question is going to be "how did you load your key into the memory of the server in the first place?" If it's embedded in the data or in the program code, it's game over and you've lost. If it was hand-keyed by an operator at the machine's boot time, it should be as secure as your trust in your operator. If it is stored in an HSM*, it should still be secure.
And if they have root-level authority access to your running machine, then they can probably trigger and recover a memory dump that will reveal the secret key.
It seems like the weakest link is: how
secure your own systems are and who
has access to it?
This is true. But there are alternatives that help improve security.
For bank-like protection, the kind that passes security and industry audits, it's recommended that you use a *Hardware Security Module (HSM) to perform key storage and encryption/decryption functions. The commercial strength HSMs we're looking at cost 10s of thousands of dollars or more each, depending on capacity. But I have seen hardware encryption cards that plug into a PCI slot that cost substantially less.
The idea behind an HSM is that the encryption happens on a secure, hardened platform that nobody has access to without the secret keys. Most of them have cabinets with intrusion detection switches, trip wires, epoxied chips, and memory that will self-destruct if tampered with. Not even the legitimate owner or the factory should be able to recover the database key from an HSM without the set of authorized crypto keys (usually carried on smart cards.)
For a very small installation, an HSM can be as simple as a smart card. Smart cards aren't high performance encryption devices, though, so you can't pump more than about one decryption transaction per second through them. Systems using smart cards usually just store the root key, then decrypt the working database key on the smart card and send it back to the database accessing system. These will still yield the working database key if the attacker can access running memory, or if the attacker can sniff the USB traffic to and from the smart card.
And I have no experience getting TPM chips to work (yet), but theoretically they can be used to securely store keys on a machine. Again, it is still no defense against an attacker taking a memory dump while the key is loaded in memory, but they would prevent a stolen hard drive containing code and data from revealing its secrets.
A hash cannot be reversed. Conceptually, think of a hash as taking the value to be hashed as the seed to a random number generator, then taking the 500th number that it generates. This is a repeatable process, but it is not a reversible process.
If you store a hashed password in your database, when your user logs in, you take his password from the input to the login page, you apply the same hash to it, and then you compare the result of that operation to what you have stored in the database. If they match, the user typed the right password. (Or, in theory, they could have typed something that happens to hash to the same value, but in practice, you can completely ignore this.)
The purpose of the salt is so that even if users have the same password, you can't tell, and also lots of other things which are equivalent to this idea. If the user's password is "secret", and the salt is "abc", then instead of making a hash of "secret", you hash "secretabc" and store the results of that in your database. You also store the salt, but this is perfectly safe to store -- you can't figure out any information about the password from it.
The only reason to safeguard the hashed passwords and salt is that if an attacker has a copy of it, he can test passwords offline on his own machine, rather than repeatedly trying to log in to your server, which you would probably lock him out after three attempts or something like that. Even if you don't lock him out, it's much faster to test locally than to wait for the network round-trip.
( OP )
brings up a good point, if your data
is compromised then game over ... my
follow up question is: what types of
attacks are these hashes trying to
protect against? I've read about
rainbow table and dictionary attacks
(brute force), but how are these
attacks administered
( discussion )
It's not a game, except to the attacker. Research these terms:
Sarbanes-Oxley
Gramm-Leach-Bliley Act (GLBA)
HIPAA
Digital Millenium Copyright Act (DMCA)
PATRIOT Act
Then tell us ( as thought provocation for you ) how do we protect against whom? For one thing, it is the efforts of innocents vis-a-vis intruders - and for another it is data-recovery if part of the system fails.
It is an interesting experiment that the original intent of tcp/ip and so on is advertised as being a weapon of war, survivability under attacks. Okay, so passwords are hashed - no one can recover them ...
Which, duh, includes the owner-operator of the system.
So you build a robust record locking tool that implements key controls, then political pressures force the use of brand-x tools.
You can read Federal Information Security Management Act (FISMA) and by the time you have read it some governmental entity somewhere will have had an entire disk either stolen or compromised.
How would you protect that disk if it was your personal identity information on that disk.
I can tell you from the caliber of Martin Liversage and jadeters they will be paying attention.
Here are my thoughts to your points:
If people have access to your database you have bigger security concerns than your hash algorithm and salt phrase. Hashes are somewhat secure, however there are problems such as hash collisions and hash lookups.
Hashes are one-way, so unless they can guess the input there is no way to reverse out the original text even with the algorithm and salt; hence the name one-way hash.
Security is about obscurity and layers of defense. If you layer your defenses and make determining what those defenses are you stand a much better chance of staving off an attack than if you relied on a single approach to security such as password hashing and running OS/network hardware updates. Throw in some curveballs like obsfucation of the web server platform and clear boundaries between the prod web and database environments. Layers and hiding implementation details buy you valuable time.
When hashing a password, it is one way. So it is very difficult to get the password even if you have the salt, source and alot of cpu cyles to burn.
I've seen a few questions and answers on SO suggesting that MD5 is less secure than something like SHA.
My question is, Is this worth worrying about in my situation?
Here's an example of how I'm using it:
On the client side, I'm providing a "secure" checksum for a message by appending the current time and a password and then hashing it using MD5. So: MD5(message+time+password).
On the server side, I'm checking this hash against the message that's sent using my knowledge of the time it was sent and the client's password.
In this example, am I really better off using SHA instead of MD5?
In what circumstances would the choice of hashing function really matter in a practical sense?
Edit:
Just to clarify - in my example, is there any benefit moving to an SHA algorithm?
In other words, is it feasible in this example for someone to send a message and a correct hash without knowing the shared password?
More Edits:
Apologies for repeated editing - I wasn't being clear with what I was asking.
Yes, it is worth worrying about in practice. MD5 is so badly broken that researchers have been able to forge fake certificates that matched a real certificate signed by a certificate authority. This meant that they were able to create their own fake certificate authority, and thus could impersonate any bank or business they felt like with browsers completely trusting them.
Now, this took them a lot of time and effort using a cluster of PlayStation 3s, and several weeks to find an appropriate collision. But once broken, a hash algorithm only gets worse, never better. If you care at all about security, it would be better to choose an unbroken hash algorithm, such as one of the SHA-2 family (SHA-1 has also been weakened, though not broken as badly as MD5 is).
edit: The technique used in the link that I provided you involved being able to choose two arbitrary message prefixes and a common suffix, from which it could generate for each prefix a block of data that could be inserted between that prefix and the common suffix, to produce a message with the same MD5 sum as the message constructed from the other prefix. I cannot think of a way in which this particular vulnerability could be exploited in the situation you describe, and in general, using a secure has for message authentication is more resistant to attack than using it for digital signatures, but I can think of a few vulnerabilities you need to watch out for, which are mostly independent of the hash you choose.
As described, your algorithm involves storing the password in plain text on the server. This means that you are vulnerable to any information disclosure attacks that may be able to discover passwords on the server. You may think that if an attacker can access your database then the game is up, but your users would probably prefer if even if your server is compromised, that their passwords not be. Because of the proliferation of passwords online, many users use the same or similar passwords across services. Furthermore, information disclosure attacks may be possible even in cases when code execution or privilege escalation attacks are not.
You can mitigate this attack by storing the password on your server hashed with a random salt; you store the pair <salt,hash(password+salt)> on the server, and send the salt to the client so that it can compute hash(password+salt) to use in place of the password in the protocol you mention. This does not protect you from the next attack, however.
If an attacker can sniff a message sent from the client, he can do an offline dictionary attack against the client's password. Most users have passwords with fairly low entropy, and a good dictionary of a few hundred thousand existing passwords plus some time randomly permuting them could make finding a password given the information an attacker has from sniffing a message pretty easy.
The technique you propose does not authenticate the server. I don't know if this is a web app that you are talking about, but if it is, then someone who can perform a DNS hijack attack, or DHCP hijacking on an unsecure wireless network, or anything of the sort, can just do a man-in-the-middle attack in which they collect passwords in clear text from your clients.
While the current attack against MD5 may not work against the protocol you describe, MD5 has been severely compromised, and a hash will only ever get weaker, never stronger. Do you want to bet that you will find out about new attacks that could be used against you and will have time to upgrade hash algorithms before your attackers have a chance to exploit it? It would probably be easier to start with something that is currently stronger than MD5, to reduce your chances of having to deal with MD5 being broken further.
Now, if you're just doing this to make sure no one forges a message from another user on a forum or something, then sure, it's unlikely that anyone will put the time and effort in to break the protocol that you described. If someone really wanted to impersonate someone else, they could probably just create a new user name that has a 0 in place of a O or something even more similar using Unicode, and not even bother with trying to forge message and break hash algorithms.
If this is being used for something where the security really matters, then don't invent your own authentication system. Just use TLS/SSL. One of the fundamental rules of cryptography is not to invent your own. And then even for the case of the forum where it probably doesn't matter all that much, won't it be easier to just use something that's proven off the shelf than rolling your own?
In this particular case, I don't think that the weakest link your application is using md5 rather than sha. The manner in which md5 is "broken" is that given that md5(K) = V, it is possible to generate K' such that md5(K') = V, because the output-space is limited (not because there are any tricks to reduce the search space). However, K' is not necessarily K. This means that if you know md5(M+T+P) = V, you can generate P' such that md5(M+T+P') = V, this giving a valid entry. However, in this case the message still remains the same, and P hasn't been compromised. If the attacker tries to forge message M', with a T' timestamp, then it is highly unlikely that md5(M'+T'+P') = md5(M'+T'+P) unless P' = P. In which case, they would have brute-forced the password. If they have brute-forced the password, then that means that it doesn't matter if you used sha or md5, since checking if md5(M+T+P) = V is equivalent to checking if sha(M+T+P) = V. (except that sha might take constant time longer to calculate, that doesn't affect the complexity of the brute-force on P).
However, given the choice, you really ought to just go ahead and use sha. There is no sense in not using it, unless there is a serious drawback to using it.
A second thing is you probably shouldn't store the user's password in your database in plain-text. What you should store is a hash of the password, and then use that. In your example, the hash would be of: md5(message + time + md5(password)), and you could safely store md5(password) in your database. However, an attacker stealing your database (through something like SQL injection) would still be able to forge messages. I don't see any way around this.
Brian's answer covers the issues, but I do think it needs to be explained a little less verbosely
You are using the wrong crypto algorithm here
MD5 is wrong here, Sha1 is wrong to use here Sha2xx is wrong to use and Skein is wrong to use.
What you should be using is something like RSA.
Let me explain:
Your secure hash is effectively sending the password out for the world to see.
You mention that your hash is "time + payload + password", if a third party gets a copy of your payload and knows the time. It can find the password (using a brute force or dictionary attack). So, its almost as if you are sending the password in clear text.
Instead of this you should look at a public key cryptography have your server send out public keys to your agents and have the agents encrypt the data with the public key.
No man in the middle will be able to tell whats in the messages, and no one will be able to forge the messages.
On a side note, MD5 is plenty strong most of the time.
It depends on how valuable the contents of the messages are. The SHA family is demonstrably more secure than MD5 (where "more secure" means "harder to fake"), but if your messages are twitter updates, then you probably don't care.
If those messages are the IPC layer of a distributed system that handles financial transactions, then maybe you care more.
Update: I should add, also, that the two digest algorithms are essentially interchangeable in many ways, so how much more trouble would it really be to use the more secure one?
Update 2: this is a much more thorough answer: http://www.schneier.com/essay-074.html
Yes, someone can send a message and a correct hash without knowing the shared password. They just need to find a string that hashes to the same value.
How common is that? In 2007, a group from the Netherlands announced that they had predicted the winner of the 2008 U.S. Presidential election in a file with the MD5 hash value 3D515DEAD7AA16560ABA3E9DF05CBC80. They then created twelve files, all identical except for the candidate's name and an arbitrary number of spaces following, that hashed to that value. The MD5 hash value is worthless as a checksum, because too many different files give the same result.
This is the same scenario as yours, if I'm reading you right. Just replace "candidate's name" with "secret password". If you really want to be secure, you should probably use a different hash function.
if you are going to generate a hash-mac don't invent your scheme. use HMAC. there are issues with doing HASH(secret-key || message) and HASH(message || secret-key). if you are using a password as a key you should also be using a key derivation function. have a look at pbkdf2.
Yes, it is worth to worry about which hash to use in this case. Let's look at the attack model first. An attacker might not only try to generate values md5(M+T+P), but might also try to find the password P. In particular, if the attacker can collect tupels of values Mi, Ti, and the corresponding md5(Mi, Ti, P) then he/she might try to find P. This problem hasn't been studied as extensively for hash functions as finding collisions. My approach to this problem would be to try the same types of attacks that are used against block ciphers: e.g. differential attacks. And since MD5 already highly susceptible to differential attacks, I can certainly imagine that such an attack could be successful here.
Hence I do recommend that you use a stronger hash function than MD5 here. I also recommend that you use HMAC instead of just md5(M+T+P), because HMAC has been designed for the situation that you describe and has accordingly been analyzed.
There is nothing insecure about using MD5 in this manner. MD5 was only broken in the sense that, there are algorithms that, given a bunch of data A additional data B can be generated to create a desired hash. Meaning, if someone knows the hash of a password, they could produce a string that will result with that hash. Though, these generated strings are usually very long so if you limit passwords to 20 or 30 characters you're still probably safe.
The main reason to use SHA1 over MD5 is that MD5 functions are being phased out. For example the Silverlight .Net library does not include the MD5 cryptography provider.
MD5 provide more collision than SHA which mean someone can actually get same hash from different word (but it's rarely).
SHA family has been known for it's reliability, SHA1 has been standard on daily use, while SHA256/SHA512 was a standard for government and bank appliances.
For your personal website or forum, i suggest you to consider SHA1, and if you create a more serious like commerce, i suggest you to use SHA256/SHA512 (SHA2 family)
You can check wikipedia article about MD5 & SHA
Both MD5 amd SHA-1 have cryptographic weaknesses. MD4 and SHA-0 are also compromised.
You can probably safely use MD6, Whirlpool, and RIPEMD-160.
See the following powerpoint from Princeton University, scroll down to the last page.
http://gcu.googlecode.com/files/11Hashing.pdf
I'm not going to comment on the MD5/SHA1/etc. issue, so perhaps you'll consider this answer moot, but something that amuses me very slightly is whenever the use of MD5 et al. for hashing passwords in databases comes up.
If someone's poking around in your database, then they might very well want to look at your password hashes, but it's just as likely they're going to want to steal personal information or any other data you may have lying around in other tables. Frankly, in that situation, you've got bigger fish to fry.
I'm not saying ignore the issue, and like I said, this doesn't really have much bearing on whether or not you should use MD5, SHA1 or whatever to hash your passwords, but I do get tickled slightly pink every time I read someone getting a bit too upset about plain text passwords in a database.
I'm considering the following: I have some data stream which I'd like to protect as secure as possible -- does it make any sense to apply let's say AES with some IV, then Blowfish with some IV and finally again AES with some IV?
The encryption / decryption process will be hidden (even protected against debugging) so it wont be easy to guess which crypto method and what IVs were used (however, I'm aware of the fact the power of this crypto chain can't be depend on this fact since every protection against debugging is breakable after some time).
I have computer power for this (that amount of data isn't that big) so the question only is if it's worth of implementation. For example, TripleDES worked very similarly, using three IVs and encrypt/decrypt/encrypt scheme so it probably isn't total nonsense. Another question is how much I decrease the security when I use the same IV for 1st and 3rd part or even the same IV for all three parts?
I welcome any hints on this subject
I'm not sure about this specific combination, but it's generally a bad idea to mix things like this unless that specific combination has been extensively researched. It's possible the mathematical transformations would actually counteract one another and the end result would be easier to hack. A single pass of either AES or Blowfish should be more than sufficient.
UPDATE: From my comment below…
Using TripleDES as an example: think of how much time and effort from the world's best cryptographers went into creating that combination (note that DoubleDES had a vulnerability), and the best they could do is 112 bits of security despite 192 bits of key.
UPDATE 2: I have to agree with Diomidis that AES is extremely unlikely to be the weak link in your system. Virtually every other aspect of your system is more likely to be compromised than AES.
UPDATE 3: Depending on what you're doing with the stream, you may want to just use TLS (the successor to SSL). I recommend Practical Cryptography for more details—it does a pretty good job of addressing a lot of the concerns you'll need to address. Among other things, it discusses stream ciphers, which may or may not be more appropriate than AES (since AES is a block cipher and you specifically mentioned that you had a data stream to encrypt).
I don't think you have anything to loose by applying one encryption algorithm on top of another that is very different from the first one. I would however be wary of running a second round of the same algorithm on top of the first one, even if you've run another one in-between. The interaction between the two runs may open a vulnerability.
Having said that, I think you're agonizing too much on encryption part. Most exposures of data do not happen by breaking an industry-standard encryption algorithm, like AES, but through other weaknesses in the system. I would suggest to spend more time on looking at key management, the handling of unencrypted data, weaknesses in the algorithm's implementation (the possibility of leaking data or keys), and wider system issues, for instance, what are you doing with data backups.
A hacker will always attack the weakest element in a chain. So it helps little to make a strong element even stronger. Cracking an AES encryption is already impossible with 128 Bit key length. Same goes for Blowfish. Choosing even bigger key lengths make it even harder, but actually 128 Bit has never been cracked up to now (and probably will not within the next 10 or 20 years). So this encryption is probably not the weakest element, thus why making it stronger? It is already strong.
Think about what else might be the weakest element? The IV? Actually I wouldn't waste too much time on selecting a great IV or hiding it. The weakest key is usually the enccryption key. E.g. if you are encrypting data stored to disk, but this data needs to be read by your application, your application needs to know the IV and it needs to know the encryption key, hence both of them needs to be within the binary. This is actually the weakest element. Even if you take 20 encryption methods and chain them on your data, the IVs and encryption keys of all 20 needs to be in the binary and if a hacker can extract them, the fact that you used 20 instead of 1 encryption method provided zero additional security.
Since I still don't know what the whole process is (who encrypts the data, who decrypts the data, where is the data stored, how is it transported, who needs to know the encryption keys, and so on), it's very hard to say what the weakest element really is, but I doubt that AES or Blowfish encryption itself is your weakest element.
Who are you trying to protect your data from? Your brother, your competitor, your goverment, or the aliens?
Each of these has different levels at which you could consider the data to be "as secure as possible", within a meaningful budget (of time/cash)
I wouldn't rely on obscuring the algorithms you're using. This kind of "security by obscurity" doesn't work for long. Decompiling the code is one way of revealing the crypto you're using but usually people don't keep secrets like this for long. That's why we have private/public key crypto in the first place.
Also, don't waste time obfuscating the algorithm - apply Kirchoff's principle, and remember that AES, in and of itself, is used (and acknowledged to be used) in a large number of places where the data needs to be "secure".
Damien: you're right, I should write it more clearly. I'm talking about competitor, it's for commercial use. So there's meaningful budget available but I don't want to implement it without being sure I know why I'm doing it :)
Hank: yes, this is what I'm scared of, too. The most supportive source for this idea was mentioned TripleDES. On the other side, when I use one algorithm to encrypt some data, then apply another one, it would be very strange if the 'power' of whole encryption would be lesser than using standalone algorithm. But this doesn't mean it can't be equal... This is the reason why I'm asking for some hint, this isn't my area of knowledge...
Diomidis: this is basically my point of view but my colleague is trying to convince me it really 'boosts' security. My proposal would be to use stronger encryption key instead of one algorithm after another without any thinking or deep knowledge what I'm doing.
#Miro Kropacek - your colleague is trying to add security through Voodoo. Instead, try to build something simple that you can analyse for flaws - such as just using AES.
I'm guessing it was he (she?) who suggested enhancing the security through protection from debugging too...
You can't actually make things less secure if you encrypt more than once with distinct IVs and keys, but the gain in security may be much less than you anticipate: In the example of 2DES, the meet-in-the-middle attack means it's only twice as hard to break, rather than squaring the difficulty.
In general, though, it's much safer to stick with a single well-known algorithm and increase the key length if you need more security. Leave composing cryptosystems to the experts (and I don't number myself one of them).
Encrypting twice is more secure than encrypting once, even though this may not be clear at first.
Intuitively, it appears that encrypting twice with the same algorithm gives no extra protection because an attacker might find a key which decrypts all the way from the final cyphertext back to the plaintext. ... But this is not the case.
E.g. I start with plaintext A and encrypt with key K1 it to get B. Then I encrypt B with key K2 to get C.
Intuitively, it seems reasonable to assume that there may well be a key, K3, which I could use to encrypt A and get C directly. If this is the case, then an attacker using brute force would eventually stumble upon K3 and be able to decrypt C, with the result that the extra encryption step has not added any security.
However, it is highly unlikely that such a key exists (for any modern encryption scheme). (When I say "highly unlikely" here, I mean what a normal person would express using the word "impossible").
Why?
Consider the keys as functions which provide a mapping from plaintext to cyphertext.
If our keys are all KL bits in length, then there are 2^KL such mappings.
However, if I use 2 keys of KL bits each, this gives me (2^KL)^2 mappings.
Not all of these can be equivalent to a single-stage encryption.
Another advantage of encrypting twice, if 2 different algorithms are used, is that if a vulnerability is found in one of the algorithms, the other algorithm still provides some security.
As others have noted, brute forcing the key is typically a last resort. An attacker will often try to break the process at some other point (e.g. using social engineering to discover the passphrase).
Another way of increasing security is to simply use a longer key with one encryption algorithm.
...Feel free to correct my maths!
Yes, it can be beneficial, but probably overkill in most situations. Also, as Hank mentions certain combinations can actually weaken your encryption.
TrueCrypt provides a number of combination encryption algorithms like AES-Twofish-Serpent. Of course, there's a performance penalty when using them.
Changing the algorithm is not improving the quality (except you expect an algorithm to be broken), it's only about the key/block length and some advantage in obfuscation. Doing it several times is interesting, since even if the first key leaked, the resulting data is not distinguishable from random data. There are block sizes that are processed better on a given platform (eg. register size).
Attacking quality encryption algorithms only works by brute force and thus depending on the computing power you can spend on. This means eventually you only can increase the probable
average time somebody needs to decrypt it.
If the data is of real value, they'd better not attack the data but the key holder...
I agree with what has been said above. Multiple stages of encryption won't buy you much. If you are using a 'secure' algorithm then it is practically impossible to break. Using AES in some standard streaming mode. See http://csrc.nist.gov/groups/ST/toolkit/index.html for accepted ciphers and modes. Anything recommended on that site should be sufficiently secure when used properly. If you want to be extra secure, use AES 256, although 128 should still be sufficient anyway. The greatest risks are not attacks against the algorithm itself, but rather attacks against key management, or side channel attacks (which may or may not be a risk depending on the application and usage). If you're application is vulnerable to key management attacks or to side channel attacks then it really doesn't matter how many levels of encryption you apply. This is where I would focus your efforts.