Double hashing security

My first question: I've heard that hashing a string twice (e.g. sha1(sha1(password))) is less secure, because the input to the second hash always has a fixed length. Is that true?
My second question: which of these is safer (var1 and var2 are two strings)?
sha1(var1 + sha1(var2))
sha1(var1 + var2)
If it is the 1st one, is it worth the performance cost?

By hashing the string twice, you are increasing the risk of collisions, which is bad security-wise.
Instead of an infinite number of inputs leading to 2^128 possible outputs (for a 128-bit hash such as MD5; SHA-1 has 2^160), the second hash has only 2^128 possible inputs, which may lead to fewer than 2^128 outputs.
Using a salt and pepper before hashing at least keeps the input space unbounded. Hashing twice increases the run time of your script and increases the risk of collisions, unless you keep feeding the original, unbounded input back in.
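A small sketch makes the output-space argument concrete. It uses a hypothetical 8-bit "hash" (the first byte of SHA-256) so that every input can be enumerated. Real SHA-1 is far too large for this. Treat the sketch only as an illustration: feeding a hash's output back into itself can merge outputs but never separate them.

```python
import hashlib

def toy_hash(x: int) -> int:
    # Hypothetical 8-bit "hash": SHA-256 truncated to its first byte,
    # so the whole input/output space is small enough to enumerate.
    return hashlib.sha256(bytes([x])).digest()[0]

domain = range(256)
once = {toy_hash(x) for x in domain}
twice = {toy_hash(toy_hash(x)) for x in domain}

# Any collision in the first pass survives the second pass, and the second
# pass can add new ones, so the set of reachable outputs can only shrink.
print(len(once), len(twice))
assert twice <= once
```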

sha1(var1 + sha1(var2))
sha1(var1 + var2)
Neither is very secure. Don't reinvent the wheel. HMAC was designed for this. In particular, it takes steps (padding, etc.) to prevent weaknesses in certain hash algorithms from interacting badly with 'simple' constructions like the ones above.
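For reference, here is a minimal sketch of what the HMAC construction does internally, written against Python's standard hashlib and hmac modules. It swaps in SHA-256 for SHA-1 and treats var1 as the key and var2 as the message; both choices are assumptions for illustration. In real code you would simply call hmac.new(...). The hand-written version only shows the key padding and the inner/outer hashing.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks

def hmac_sha256_by_hand(key: bytes, message: bytes) -> bytes:
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    # H((K xor opad) || H((K xor ipad) || message))
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()

key, msg = b"var1-as-secret-key", b"var2"
library = hmac.new(key, msg, hashlib.sha256).digest()
assert hmac_sha256_by_hand(key, msg) == library
```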
Additionally, double hashing doesn't improve security. Consider: at best, you map one random-ish value to another random-ish value. However, if the second hash produces the same value for two different inputs, you've lost security by ending up with a collision. That is, you can never get rid of a collision this way, but you can gain one.
That said, existing preimage attacks might not work against double hashing ... but! They might still work. I'm not qualified to say which, and if you're asking this here, you're probably not either. Certainly collision attacks, which are all that are practical with MD5 today, aren't hindered by double hashing.
It's best to stick with the proven algorithms that have stood up to decades of analysis, because with cryptography, it's all too easy to make something that looks secure but isn't.

I doubt this would add any security. It would increase the runtime of your script but would not add much security to the hash. Remember that when someone tries to break your hash, they don't need to find the exact values of var1 and var2; they only need to find values that produce the same hash.

I am no expert on cryptography, but as far as I know you will gain no security from hashing a value multiple times besides slowing down the process. So if you hash 1000 times, you will make an attack (for example a dictionary attack) 1000 times slower but this is quite irrelevant in the case of hashing once or twice.

Using a single hash with no salt makes the password easy to recover.
There are many large tables containing tons of precomputed hashes, so a single unsalted hash can be cracked almost instantly.
Adding a second hash will not help much either, because once again these tables already exist with those values saved.
A dynamic (per-user) salt added to the password helps much more than multiple hashes. Multiple hashes do not really add much.
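Here is a minimal sketch of a per-user random salt, using Python's standard library. It only shows why precomputed tables stop working: the same password yields different stored hashes. A single fast SHA-256 is still cheap to brute-force, which is what the slow functions discussed further down address.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    # A fresh random salt per password means tables built for
    # unsalted hashes no longer apply.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def check(password: str, salt: bytes, digest: bytes) -> bool:
    # Recompute with the stored salt and compare in constant time.
    _, candidate = salted_hash(password, salt)
    return hmac.compare_digest(candidate, digest)

salt_a, hash_a = salted_hash("hunter2")
salt_b, hash_b = salted_hash("hunter2")
# Same password, different salts, so different stored hashes.
print(hash_a != hash_b, check("hunter2", salt_a, hash_a))
```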

My feeling about this is that you're building a brick wall with a steel door, but the other walls are made of plywood. In other words, you can inject as much security as you want into hashing passwords, but there are still going to be much easier ways to crack your system. I think it's much better to take a broad approach to security instead of worrying too much about things like this.

Can var1 + var2 be the same for different pairs in a realistic situation (e.g. "ab" + "c" and "a" + "bc")? If so, the first proposition should be fine, because it avoids that collision. As for whether it's worth the cost, that's a question only you can answer: is avoiding that collision worth the cost?
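The collision this answer alludes to can be shown directly. This sketch assumes var1 and var2 are text strings and uses Python's hashlib. The names option1 and option2 are hypothetical labels for the two proposals. The inner SHA-1 hex digest is always 40 characters, so the first proposal fixes where var2's contribution starts. Plain concatenation cannot tell "ab" + "c" from "a" + "bc".

```python
import hashlib

def sha1_hex(data: str) -> str:
    return hashlib.sha1(data.encode("utf-8")).hexdigest()

def option1(var1: str, var2: str) -> str:
    # sha1(var1 + sha1(var2)): the inner digest is always 40 hex chars,
    # so the split between var1 and var2 is unambiguous.
    return sha1_hex(var1 + sha1_hex(var2))

def option2(var1: str, var2: str) -> str:
    # sha1(var1 + var2): different splits of the same string collide.
    return sha1_hex(var1 + var2)

print(option2("ab", "c") == option2("a", "bc"))  # both hash "abc"
print(option1("ab", "c") == option1("a", "bc"))
```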
As for the SHA algorithm, it is supposed to thoroughly scramble its output. So applying it many times shouldn't give better results, nor should applying it to a "more random" input.

Hashing a string multiple times will complicate the task of the attacker in case of a dictionary attack, making it "much longer" to brute force. Your first question is not clear, but multiple hashes are not less secure because the second hash has a fixed length.
But more important than a multiple hash, the use of a salt is essential for a better security.
Concerning your second question, it is not straightforward to say which of the two is safer. I think option 2 is good enough.

Related

Why does HashMap need a cryptographically secure hashing function?

I'm reading a Rust book about HashMap hashing functions, and I can't understand these two sentences.
By default, HashMap uses a cryptographically secure hashing function that can provide resistance to Denial of Service (DoS) attacks. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it.
I know what a cryptographically secure hash function is, but I don't understand the rationale behind it. From my understanding, a good hash function for HashMap should only have three properties:
be deterministic (the same object always has the same hash value),
be VERY fast,
have a uniform distribution of bits in the hash value (which reduces collisions).
Other properties, in cryptographically secure hash function, are not really relevant 99% (maybe even 99.99%) of the time for hash tables.
So my question is: what does "resistance to DoS attacks and better security" even mean in the context of HashMap?

Let's start backward: how do you DoS a HashMap?
Over the years, there have been multiple attacks on various software stacks based on Hash Flooding. If you know which framework a site is powered by, and therefore which hash function is used, and this hash function is not cryptographically secure then you may be able to pre-compute, offline, a large set of strings hashing to the same number.
Then you simply inject this set into the site, and for each (simple) request it does a disproportionately large amount of work, because inserting N colliding elements takes O(N^2) operations.
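To see where the quadratic cost comes from, here is a toy separate-chaining table with a deliberately weak, public hash: the sum of character codes, so anagrams always collide. Both the hash and the table are hypothetical stand-ins, much simpler than any real HashMap. The attacker's "precomputed" keys still behave the same way: every insert has to walk the whole chain.

```python
from itertools import permutations

def weak_hash(key: str) -> int:
    # Hypothetical non-cryptographic hash: anagrams always collide.
    return sum(map(ord, key))

class ChainedTable:
    """Toy separate-chaining hash table that counts key comparisons."""

    def __init__(self, n_buckets: int = 64) -> None:
        self.buckets = [[] for _ in range(n_buckets)]
        self.comparisons = 0

    def insert(self, key: str, value: object) -> None:
        bucket = self.buckets[weak_hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(bucket):
            self.comparisons += 1
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

# "Precomputed offline": 720 distinct keys with the same weak_hash.
attack_keys = ["".join(p) for p in permutations("abcdef")]
table = ChainedTable()
for k in attack_keys:
    table.insert(k, None)

n = len(attack_keys)
# Every insert scans the whole chain, so comparisons total n*(n-1)/2.
print(n, table.comparisons)
```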
Rust was conceived with the benefit of hindsight, and therefore attention was paid to avoiding this attack by default, reasoning that users who really need performance out of HashMap would simply switch the hash function.

Let's say we use HashMap to store some user data in a web-application. Suppose that users can choose (part of) the key in some way – maybe the key is a username or a filename of an uploaded file or anything like that.
If we are not using a cryptographically secure hash function, the attacker could possibly craft multiple inputs that all map to the same output. Of course, a hash map has to deal with collisions, because they occur naturally.
But when unnaturally many collisions occur, the hash map implementation might do strange things. For example, looking up some keys could have a runtime of O(n). Or the hash map might think that it has to grow because of all the collisions; but growing won't solve the problem, so the hash map grows until all memory is used. In either case, it's bad. Hash maps just assume that statistically, collisions rarely occur.
Of course, this is not a "stealing user data" attack -- at least not directly. But if one part of a system is weak, this makes it easier for attackers to find other weaknesses.
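A cryptographically secure hash function prevents this attack, provided it is keyed with a secret random seed (as Rust's default hasher is). A hash map only uses a few bits of the hash to choose a bucket, so it is the unknown seed that stops the attacker from predicting which keys will collide.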
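A hash table only uses a few bits of the hash to pick a bucket. An unkeyed function, even a cryptographic one, can therefore still be brute-forced offline for bucket collisions. What defeats the precomputation is mixing in a secret random key chosen when the program starts. This sketch uses BLAKE2b's keyed mode from Python's hashlib as a stand-in for a keyed hash such as SipHash, which hashlib does not expose. The class name and bucket count are hypothetical.

```python
import hashlib
import os
from itertools import permutations

class KeyedBucketHasher:
    """Bucket index from BLAKE2b keyed with a secret chosen at startup."""

    def __init__(self, n_buckets: int = 64) -> None:
        self._key = os.urandom(16)  # never revealed to clients
        self.n_buckets = n_buckets

    def bucket(self, key: str) -> int:
        digest = hashlib.blake2b(key.encode("utf-8"), key=self._key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.n_buckets

hasher = KeyedBucketHasher()
# Anagrams that all collide under a weak, public hash now spread out,
# and an attacker cannot precompute new collisions without the key.
keys = ["".join(p) for p in permutations("abcdef")]
used = {hasher.bucket(k) for k in keys}
print(len(used), "of", hasher.n_buckets, "buckets used")
```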
You wrote that these properties are "not really relevant 99% (maybe even 99.99%) of the time for hash tables."
Yes, probably. But this is difficult to balance. I guess we all would agree that if 20% of users would have security problems in their application due to an insecure hash function (while 80% don't care), it's still a good idea to use the "secure by default" approach. What about 5%/95%? What about 1%/99%? Hard to tell where the threshold is, right?
There has been a ton of discussion about this already. Because yes, most people only notice the slowness of the hash map. Maybe the situation I described above is incredibly rare and it isn't worth slowing down all other users' code by default. But this has been decided, the default hash function won't change, and luckily you can choose your own hash function.

If a server application stores user input (such as post data in a web application) in a hash table, a malicious user may try to provide a large number of inputs that all have the same hash value, leading to a large number of hash collisions and thus slowing down operations on the map significantly, to the point that it can be used as a DoS attack (as described in this article for example).
If the hash is cryptographically secure, attackers will have a much harder time trying to find inputs with the same hash value.

Bcrypt for password hashing because it is slow?

I read today on not-implemented.com:
Sha-256 should be chosen in most cases where a high speed hash function is desired. It is considered secure with no known theoretical vulnerabilities and it has a reasonable digest size of 32 bytes. For things like hashing user password, though, a function designed to be slow is preferred: a great one is bcrypt.
Can somebody explain the last sentence:
For things like hashing user password, though, a function designed to be slow is preferred: a great one is bcrypt.
I'm not saying it's incorrect; my question is simply:
Why is it preferable to use a slow function for hashing user passwords?

Because if it takes more time to hash the value, it also takes a much longer time to brute-force the password.
Keep in mind that slow means that it requires more computing power. The same goes for when a potential hacker tries to brute-force a password.

On your side, the password hash needs to be computed rather rarely. But an attacker trying to brute-force a password from a stolen hash relies on computing as many hashes as possible.
So if your login now takes 100 ms instead of 0.1 ms (probably less), that's not really a problem for you. But it makes a huge difference for an attacker if they need 2000 days to break a password instead of 2 days.
bcrypt is designed to be slow and not to allow any shortcut.
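To put rough numbers on this, the sketch below times one salted SHA-256 against PBKDF2-HMAC-SHA256 with 200,000 iterations. bcrypt itself needs a third-party package, so PBKDF2 from Python's standard library stands in as the "designed to be slow" function here. The iteration count is an illustrative assumption, and the figures depend entirely on your hardware.

```python
import hashlib
import os
import time

def average_seconds(fn, repeats: int) -> float:
    # Wall-clock average over several calls.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

password = b"correct horse battery staple"
salt = os.urandom(16)

fast = average_seconds(lambda: hashlib.sha256(salt + password).digest(), repeats=1000)
# 200_000 iterations is an illustrative work factor, not a recommendation.
slow = average_seconds(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000), repeats=3)

print(f"one SHA-256: {fast * 1e6:.2f} us, PBKDF2 (200k): {slow * 1e3:.1f} ms")
print(f"each attacker guess costs roughly {slow / fast:,.0f}x more")
```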

It takes more effort to brute-force the password. The slower the algorithm, the fewer guesses can be made per second. The extra time won't be noticed by a user of the system, but it makes the password harder to crack.

Brute-force a hashed password? Easier said than done.
If the passwords are not salted, then it is possible to break them, no matter the kind of encryption (because we could use a dictionary / precomputed hash attack).
The speed of the algorithm means nothing; it's just a myth that some people are spreading for the wrong reasons.
For example, suppose our hash is generated with the following formula:
MD5(SALT+MD5(SALT+VALUE))
Even if we could generate every possible MD5 combination in a split second, how would we know whether we had found the right value? The answer is that we couldn't. MD5 (or SHA) doesn't check whether the value is right or not; it simply generates a sequence of values and nothing more.
We could try a brute-force attack if and only if we had a way to determine whether a generated hash matches some criterion. That criterion could be a dictionary, which means a slow process too, and only if we could find such a criterion at all.

Hashing Passwords With Multiple Algorithms

Does using multiple algorithms make passwords more secure? (Or less?)
Just to be clear, I'm NOT talking about doing anything like this:
key = Hash(Hash(salt + password))
I'm talking about using two separate algorithms and matching both:
key1 = Hash1(user_salt1 + password)
key2 = Hash2(user_salt2 + password)
Then requiring both to match when authenticating. I've seen this suggested as a way to eliminate collision matches, but I'm wondering about unintended consequences. One is creating a 'weakest link' scenario. Another is providing information that makes the user database easier to crack, since this method provides more data than a single key does, e.g. combining information from the two hashes to find passwords more easily. Also, if collisions were truly eliminated, you could theoretically brute-force the actual password, not just a matching password. In fact, you'd have to in order to brute-force the system at all.
I'm not actually planning to implement this, but I'm curious whether or not this is actually an improvement over the standard practice of single key = Hash(user_salt + password).
EDIT:
Many good answers, so just to summarize: this should have been obvious in hindsight, but you do create a weakest link by using both, because matches for the weaker of the two algorithms can be tried against the other. For example, if you used a weak (fast) MD5 and a PBKDF2, I'd brute-force the MD5 first and then try any match I found against the other. So by including the MD5 (or whatever), you actually make the situation worse. Also, even if both are among the more secure set (bcrypt+PBKDF2, for example), you double your exposure to one of them breaking.
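The weakest-link effect from the EDIT can be sketched in a few lines. Everything here is hypothetical: the salts, the five-word list, and the choice of MD5 as the fast hash and PBKDF2 (with an illustrative 100,000 iterations) as the slow one. The attacker filters every guess through the cheap hash and only pays for the expensive one on guesses that already match.

```python
import hashlib
import hmac

# Hypothetical per-user salts and a tiny stand-in wordlist.
salt1, salt2 = b"salt-one", b"salt-two"
wordlist = ["123456", "password", "letmein", "dragon", "sunshine"]
secret = "letmein"

# The stored pair: one fast hash (MD5) and one slow hash (PBKDF2).
key1 = hashlib.md5(salt1 + secret.encode()).digest()
key2 = hashlib.pbkdf2_hmac("sha256", secret.encode(), salt2, 100_000)

slow_calls = 0
recovered = None
for guess in wordlist:
    # Filter every guess through the cheap hash first...
    if hashlib.md5(salt1 + guess.encode()).digest() != key1:
        continue
    # ...and only pay for the slow hash on candidates that already match.
    slow_calls += 1
    if hmac.compare_digest(hashlib.pbkdf2_hmac("sha256", guess.encode(), salt2, 100_000), key2):
        recovered = guess
        break

print(recovered, slow_calls)
```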

The only thing this would help with would be reducing the possibility of collisions. As you mention, there are several drawbacks (weakest link being a big one).
If the goal is to reduce the possibility of collisions, the best solution would simply be to use a single secure algorithm (e.g. bcrypt) with a larger hash.

Collisions are not a concern with modern hashing algorithms. The point isn't to ensure that every hash in the database is unique. The real point is to ensure that, in the event your database is stolen or accidentally given away, the attacker has a tough time determining a user's actual password. And the chance of a modern hashing algorithm recognizing the wrong password as the right password is effectively zero -- which may be more what you're getting at here.
To be clear, there are two big reasons you might be concerned about collisions.
A collision between the "right" password and a supplied "wrong" password could allow a user with the "wrong" password to authenticate.
A collision between two users' passwords could "reveal" user A's password if user B's password is known.
Concern 1 is addressed by using a strong/modern hashing algorithm (and avoiding terribly anti-brilliant things, like looking for user records based solely on their password hash). Concern 2 is addressed with proper salting -- a "lengthy" unique salt for each password. Let me stress, proper salting is still necessary.
But, if you add hashes to the mix, you're just giving potential attackers more information. I'm not sure there's currently any known way to "triangulate" message data (passwords) from a pair of hashes, but you're not making significant gains by including another hash. It's not worth the risk that there is a way to leverage the additional information.

To answer your question:
Having a unique salt is better than having a generic salt. H(S1 + PW1) , H(S2 + PW2)
Using multiple algorithms may be better than using a single one H1(X) , H2(Y)
(But probably not, as svidgen mentions)
However,
The spirit of this question is a bit wrong for two reasons:
You should not be coming up with your own security protocol without guidance from a security expert. I know it's not your own algorithm, but most security problems start because they were used incorrectly; the algorithms themselves are usually air-tight.
You should not be using hash(salt+password) to store passwords in a database. This is because hashing was designed to be fast - not secure. It's somewhat easy with today's hardware (especially with GPU processing) to find hash collisions in older algorithms. You can of course use a newer secure Hashing Algorithm (SHA-256 or SHA-512) where collisions are not an issue - but why take chances?
You should be looking into Password-Based Key Derivation Functions (PBKDF2) which are designed to be slow to thwart this type of attack. Usually it takes a combination of salting, a secure hashing algorithm (SHA-256) and iterates a couple hundred thousand times.
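To show what "iterates" means here, this sketch writes out the first output block of PBKDF2-HMAC-SHA256 by hand and checks it against Python's hashlib.pbkdf2_hmac. Per the PBKDF2 definition, the block is a chain of HMACs of the password, starting from the salt, with the results XORed together. The iteration count is kept small so the example runs quickly. In practice you would just call hashlib.pbkdf2_hmac with a much larger count.

```python
import hashlib
import hmac
import os

def pbkdf2_sha256_single_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    # U1 = HMAC(password, salt || INT(1)); U_j = HMAC(password, U_{j-1});
    # the first output block is U1 xor U2 xor ... xor U_iterations.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    block = u
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha256).digest()
        block = bytes(a ^ b for a, b in zip(block, u))
    return block

password, salt = b"hunter2", os.urandom(16)
iterations = 1_000  # kept small here; real deployments use far more
assert pbkdf2_sha256_single_block(password, salt, iterations) == \
    hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
```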
Making the function take about a second is no problem for a user logging in where they won't notice such a slowdown. But for an attacker, this is a nightmare since they have to perform these iterations for every attempt; significantly slowing down any brute-force attempt.
Take a look at libraries supporting PBKDF encryption as a better way of doing this. Jasypt is one of my favorites for Java encryption.
See this related security question: How to securely hash passwords
and this loosely related SO question

A salt is added to password hashes to prevent the use of generic pre-built hash tables. The attacker would be forced to generate new tables based on their word list combined with your random salt.
As mentioned, hashes were designed to be fast for a reason. To use them for password storage, you need to slow them down (large number of nested repetitions).
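You can create your own password-specific hashing method. Essentially, nest your preferred hashes on the salt+password and repeat. A Python sketch, where SHA-1, SHA-256 and SHA-512 are arbitrary stand-ins for Hash1, Hash2 and Hash3:
import hashlib
def my_algorithm(data: bytes, rounds: int) -> bytes:
    temp = data
    for _ in range(rounds):
        # Hash1 -> Hash2 -> Hash3, feeding each round's output into the next
        temp = hashlib.sha512(hashlib.sha256(hashlib.sha1(temp).digest()).digest()).digest()
    return temp
X = 100_000  # illustrative; tune so the call takes about a second
result = my_algorithm(b"salt" + b"password", X)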
Where "X" is a large number of repetitions, enough so that the whole thing takes at least a second on decent hardware. As mentioned elsewhere, the point of this delay is to be insignificant to the normal user (who knows the correct password and only waits once), but significant to the attacker (who must run this process for every combination). Of course, this is all for the sake of learning and probably simpler to just use proper existing APIs.

SHA512 and MD5 hashing

For a while I have been looking for a more secure way to hash a user's password on my website before inserting it into a database. I have looked into all of the hashing methods available. It's been said that bcrypt is the best because of its slowness. Suppose I don't need the highest level of security but still want to stay safe: what if I used sha512 and then md5 on that hash? Would it matter if I reversed the order of the hashing? Keep in mind I will be using a separate salt for each operation. How safe would this be? Are there any other combinations that would do the same?

Any custom method you invent is more likely to have subtle bugs which make your storage method vulnerable. It's not worth the effort, since you're not likely to find these subtle bugs until it's too late. It's much better to use something that has been tried and tested.
Generally, these are your choices when storing a password,
Use bcrypt.
Use a salted, strengthened hash.
The first method is easy: just use bcrypt. It's supported in pretty much every language and has been widely used and tested to ensure its security.
The second method requires a general-purpose hash function (SHA-2 family or better, not MD5 or SHA-1, as they're broken/weak) or, better yet, an HMAC. Create a unique salt for the user and a unique application-wide salt, and then iterate the hash function many times (100,000, etc.) to slow it down (called key stretching/strengthening).
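e.g. (Python sketch; site_salt is an application-wide secret, and input_password and user_salt come from your login/registration code):
import hashlib
def sha512_hex(data: str) -> str:
    return hashlib.sha512(data.encode("utf-8")).hexdigest()
def stretch(input_password: str, user_salt: str, site_salt: str, iterations: int = 100000) -> str:
    # iterations should be changed based on hardware speed
    sofar = sha512_hex(user_salt + site_salt + input_password)
    for _ in range(iterations):
        # each round depends on the previous round's output
        sofar = sha512_hex(user_salt + site_salt + sofar)
    # store the configured iteration count so the hash can be verified later
    return "$sha512$" + user_salt + "$" + str(iterations) + "$" + sofar
# user.save("password", stretch(input_password, user_salt, site_salt))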
Each iteration should rely on the previous one, so that a single guess can't be parallelized or shortcut. Likewise, the number of iterations should be tuned to the speed of your hardware so that the process is slow enough. Slower is better when it comes to password hashing.
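The stored record only helps if you can check a login against it later. Below is a minimal verification sketch. It assumes the "$sha512$<user_salt>$<iterations>$<digest>" record layout saved above and a user_salt that never contains "$" (e.g. hex). The site_salt value is hypothetical. The iteration count is read back from the record, which also lets you raise it for new passwords later.

```python
import hashlib
import hmac

def _sha512_hex(data: str) -> str:
    return hashlib.sha512(data.encode("utf-8")).hexdigest()

def verify(stored: str, input_password: str, site_salt: str) -> bool:
    # Expected layout: "$sha512$<user_salt>$<iterations>$<hex digest>".
    _, scheme, user_salt, iterations, expected = stored.split("$")
    if scheme != "sha512":
        return False
    # Recompute exactly as when the password was stored.
    sofar = _sha512_hex(user_salt + site_salt + input_password)
    for _ in range(int(iterations)):
        sofar = _sha512_hex(user_salt + site_salt + sofar)
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sofar, expected)

# Hypothetical record built the same way (small iteration count for the demo).
site_salt, user_salt, n = "site-secret", "9f2c4e1a", 1_000
digest = _sha512_hex(user_salt + site_salt + "hunter2")
for _ in range(n):
    digest = _sha512_hex(user_salt + site_salt + digest)
record = f"$sha512${user_salt}${n}${digest}"

print(verify(record, "hunter2", site_salt), verify(record, "hunter3", site_salt))
```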
Summary
Use bcrypt.

I don't see the point of employing MD5 at all. It's broken. But multiple rounds is apparently stronger. However the improvement from two rounds is unlikely to make much difference. Applying SHA512 lots and lots of times would be better.
You should instead be looking at user password length and salting the passwords, if you're not already doing so.
Is "double hashing" a password less secure than just hashing it once? provides a detailed commentary on multiple rounds of an algorithm.
In reality I suspect that the number of SHA-512'd passwords cracked are small and there are more important things to worry about like preventing them seeing the passwords in the first place - ensuring your system is safe from SQL injection, privilege escalation, etc.

Given a hashing algorithm, is there a more efficient way to 'unhash' besides bruteforce?

So I have the code for a hashing function, and from the looks of it, there's no way to simply unhash it (lots of bitwise ANDs, ORs, Shifts, etc). My question is, if I need to find out the original value before being hashed, is there a more efficient way than just brute forcing a set of possible values?
Thanks!
EDIT: I should add that in my case, the original message will never be longer than several characters, for my purposes.
EDIT2: Out of curiosity, are there any ways to do this on the run, without precomputed tables?

Yes: rainbow table attacks. This is especially true for hashes of shorter strings. For example, hashes of small strings like 'true', 'false', etc. can be stored in a dictionary and used as a comparison table. This speeds up the cracking process considerably. Also, if the hash size is short (e.g. MD5), the algorithm becomes especially easy to crack. Of course, the way around this issue is to combine 'cryptographic salts' with passwords before hashing them.
There are two very good sources of info on the matter: Coding Horror: Rainbow Hash Cracking and
Wikipedia: Rainbow table
Edit: Rainbow tables can take tens of gigabytes, so downloading (or reproducing) them may take weeks just to run simple tests. Instead, there seem to be some online tools for reversing simple hashes: http://www.onlinehashcrack.com/ (e.g. try to reverse 463C8A7593A8A79078CB5C119424E62A, which is the MD5 hash of the word 'crack').

"Unhashing" is called a "preimage attack": given a hash output, find a corresponding input.
If the hash function is "secure", then there is no better attack than trying possible inputs until a hit is found. For a hash function with an n-bit output, the average number of hash function invocations will be about 2^n, i.e. Way Too Much for current earth-based technology if n is greater than 180 or so. To put it another way: if an attack method faster than this brute-force method is found for a given hash function, then the hash function is deemed irreparably broken.
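Applied to the questioner's case (inputs only several characters long), brute force is entirely practical. This sketch assumes lowercase ASCII inputs of at most four characters and uses MD5 as a stand-in for the unknown hashing function. The cost grows as len(alphabet) ** max_len, which is why the same loop is hopeless for long inputs.

```python
import hashlib
import string
from itertools import product
from typing import Optional

def brute_force_md5(target_hex: str, alphabet: str = string.ascii_lowercase, max_len: int = 4) -> Optional[str]:
    # Try every string up to max_len characters, shortest first.
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

target = hashlib.md5(b"cat").hexdigest()
print(brute_force_md5(target))
```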
MD5 is considered broken, but because of other weaknesses. There is a published method for preimages with cost 2^123.4, which is about 24 times faster than the brute-force cost of 2^128. It is still so far into the technologically unfeasible that it cannot be confirmed.
When the hash function input is known to be part of a relatively small space (e.g. it is a "password", so it could fit in the brain of a human user), then one can optimize preimage attacks by using precomputed tables: the attacker still has to pay the search cost once, but he can reuse his tables to attack multiple instances. Rainbow tables are precomputed tables with a space-efficient compressed representation: with rainbow tables, the bottleneck for the attacker is CPU power, not the size of his hard disks.
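In its simplest form, a precomputed table is just a dictionary from digest to input, built once and reused for every stolen hash. This sketch assumes MD5 and lowercase strings of up to three characters. It is a plain lookup table, not a rainbow table. Rainbow tables store chains of hash and reduction steps to save space, which this sketch does not implement.

```python
import hashlib
import string
from itertools import product

def build_table(alphabet: str = string.ascii_lowercase, max_len: int = 3) -> dict:
    # Pay the search cost once: map every digest back to its input.
    table = {}
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            word = "".join(chars)
            table[hashlib.md5(word.encode()).hexdigest()] = word
    return table

table = build_table()
stolen = [hashlib.md5(w.encode()).hexdigest() for w in ("cat", "dog", "zq")]
# Each lookup is now a single dictionary hit instead of a fresh search.
print([table.get(h) for h in stolen])
```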

Assuming the "normal case", the original message will be many times longer than the hash. Therefore, it is in principle absolutely impossible to derive the message from the hash, simply because you cannot calculate information that is not there.
However, you can guess what the right message probably is, and there are techniques to accelerate this process for common messages (such as passwords), for example rainbow tables. If the hash matches and the candidate looks sensible, it is very likely the right message.
Finally, it may not be necessary at all to find the good message as long as one can be found which will pass. This is the subject of a known attack on MD5. This attack lets you create a different message which gives the same hash.
Whether this is a security problem or not depends on what exactly you use the hash for.

This may sound trivial, but if you have the code to the hashing function, you could always override a hash table container class's hash() function (or similar, depending on your programming language and environment). That way, you can hash strings of say 3 characters or less, and then you can store the hash as a key by which you obtain the original string, which appears to be exactly what you want. Use this method to construct your own rainbow table, I suppose. If you have the code to the program environment in which you want to find these values out, you could always modify it to store hashes in the hash table.
