Is there a bruteforce-proof hashing algorithm? - security

Well, from the discussion of the weaknesses of hashing methods, I gather that only good ol' brute force is an efficient way to break them.
So, the question is:
Is there a hashing algorithm which is more resistant to brute force than others?
Specifically, in the case of hashing passwords.

The only protection against brute force is the fact that it takes an inordinately long time to perform a brute force.
Brute force works by simply going through every possible input string and trying it, one at a time. There's no way to protect against simply trying every possible combination.

This question is a decade old and now I've got the answer.
Yes, there are brute-force-proof algorithms. The key property of such an algorithm is being slow. It does no harm if a correct password is verified in a few milliseconds, but it drastically slows down a brute-force attack. Moreover, these algorithms can be tuned to keep up with future increases in CPU performance. Such algorithms include
bcrypt
argon2
In PHP specifically, the password_hash() function must be used for hashing passwords.
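For a concrete illustration outside PHP, here is a minimal sketch assuming the third-party Python bcrypt package is installed; hashpw, gensalt and checkpw are that package's API, not something defined in this thread.

    import bcrypt

    # gensalt(rounds=12) embeds a work factor; each +1 roughly doubles the cost,
    # which is how these algorithms adapt to faster future hardware.
    password = b"correct horse battery staple"
    hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

    # Verification costs the legitimate server a few milliseconds,
    # but an attacker pays the same price for every single guess.
    assert bcrypt.checkpw(password, hashed)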

If you know that the input space is small enough for a brute force attack to be feasible, then there are two options for protecting against brute-force attacks:
Artificially enlarging the input space. This isn't really feasible - password salting looks like it does that at first glance, but it really only prevents attackers from amortizing the cost of a brute force attack across multiple targets.
Artificially slowing down the hashing through key strengthening or using a hash algorithm that is inherently slow to compute - presumably, it's only a small extra cost to have the hash take a relatively long time (say, a tenth of a second) in production. But a brute-force attacker incurs this cost billions of times.
So that's the answer: the slower a hash algorithm is to compute, the less susceptible it is to brute-forcing of the input space.
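A rough back-of-the-envelope sketch of that trade-off, with illustrative numbers only (Python):

    # A tenth of a second per hash is barely noticeable at login time,
    # but multiplied over even a small input space it adds up fast.
    guesses = 26 ** 8                    # every 8-letter lowercase password, ~2.1e11
    seconds_per_guess = 0.1              # the "tenth of a second" mentioned above
    years = guesses * seconds_per_guess / (365 * 24 * 3600)
    print(f"{years:,.0f} years of single-threaded guessing")   # roughly 660 years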
(Original Answer follows)
Any additional bit in the output format makes the algorithm twice as strong against a straightforward brute force attack.
But consider that if you had a trillion computers that could each try a trillion hashes per second, it would still take you over ten million years to brute-force a 128-bit hash, and you'll realize that a straightforward brute-force attack on the output is simply not worth wasting any thoughts on.
Of course, if the input of the hash has less than 128 bits of entropy, then you can brute-force the input - this is why brute-forcing passwords is often feasible (nobody can actually remember a password with 128 bits of entropy).
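The arithmetic behind that claim, as a quick sanity check (Python, illustrative only):

    # 2^128 outputs, tried by a trillion machines doing a trillion hashes per second
    seconds = 2 ** 128 / (10 ** 12 * 10 ** 12)
    years = seconds / (365 * 24 * 3600)
    print(f"{years:.1e} years")   # ~1.1e7, i.e. over ten million years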

Consider the output of the hash algorithms.
An MD5 (128-bit) or SHA-1 (160-bit) hash is certainly easier to brute-force than a SHA-2 hash (224, 256, 384 or even 512 bits).
Of course, there can be other flaws (as in MD5, and to a lesser extent SHA-1) which weaken the algorithm a lot more.

As Codeka said, no hashing algorithm is 100% secure against brute force attacks. However, even with hardware-assisted password cracking (using the GPU to try passwords), the time it takes to crack a sufficiently long password is astronomical. If you have a password of 8ish characters, you could be vulnerable to a brute force attack. But if you add a few more characters, the time it takes to crack increases radically.
Of course, this doesn't mean you're safe from rainbow table attacks. The solution to that is to add a salt to your password and use a hashing algorithm that isn't vulnerable to preimage attacks.
If you use a salted password of 12-14 characters, preferably hashed with a SHA-2 algorithm (or equivalent), you've got a pretty secure password.
Read more here: http://www.codinghorror.com/blog/2007/10/hardware-assisted-brute-force-attacks-still-for-dummies.html
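For what a salted SHA-2 hash looks like in practice, here is a minimal Python sketch using only the standard library (hash_password is a made-up helper name; as other answers here note, a deliberately slow algorithm such as bcrypt is still the better choice for passwords):

    import hashlib
    import os

    def hash_password(password, salt=None):
        # A fresh random salt per password defeats precomputed rainbow tables.
        salt = salt if salt is not None else os.urandom(16)
        digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
        return salt, digest

    salt, digest = hash_password("a 12-14 char passphrase")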

All cryptographic systems are vulnerable to brute force. Another term for this is a "Trivial Attack".
A simple explanation of hashing is that all the hashing algorithms we use accept an arbitrarily large input and produce a fixed-size output, so collisions are unavoidable. For something like sha256, finding a collision by brute force takes about 2^128 operations (the birthday bound), and finding a preimage about 2^256. md5() has known shortcuts that bring collision finding down to roughly 2^39 operations.
One thing you can do to make your passwords stronger is to hide your salt. A password hash cannot be broken until its salt is retrieved. John The Ripper can be given a Dictionary, a Salt and a Password to recover password hashes of any type. In this case sha256() and md5() will break in about the same amount of time. If the attacker doesn't have the salt he will have to make significantly more guesses. If your salt is the same size as sha256 (32 bytes) it will take (dictionary size)*2^256 guesses to break one password. This property of salts is the basis of CWE-760.

Brute force is the worst attack, nothing can be brute force proof...
right now ~80-90 bits is considered cryptographically safe from a brute-force standpoint, so you would only need about 10 bytes of output if a collision-resistant hash function were perfect, but they aren't, so you just use more bits...
the proof that nothing can be brute-force proof is the pigeonhole principle:
since a hash function H accepts arbitrary-sized inputs from {0,1}^n and produces a fixed-size output in {0,1}^k, whenever the input size exceeds the output size (n > k) there are necessarily some outputs that can be produced by more than one input.
you can visualize that with a square divided into 9 sub squares.
0 | 0 | 0
0 | 0 | 0
0 | 0 | 0
these are your 9 holes. We are a brute-force attacker; we have unlimited chances to attack... we have unlimited pigeons... but we need at most 10 to find a collision...
after 4 pigeons and a good collision-resistant hashing algorithm:
P | 0 | 0
0 | P | P
0 | 0 | P
after 9 pigeons:
P | P | P
P | P | P
P | P | P
so our 10th pigeon will necessarily be a collision, because all of the holes are full.
but it really isn't even that good, because of another numerical property called the Birthday Paradox where given a number of independent selections you will find a duplicate much much faster than it takes to fill all of your "holes".
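The birthday-paradox effect is easy to quantify with the standard approximation p ≈ 1 - e^(-h(h-1)/(2m)) for h random draws from m holes (a small Python sketch with illustrative numbers):

    import math

    def collision_probability(h, m):
        # Probability of at least one duplicate among h uniform draws from m values.
        return 1 - math.exp(-h * (h - 1) / (2 * m))

    print(collision_probability(23, 365))            # classic birthday problem: ~0.5
    print(collision_probability(2 ** 64, 2 ** 128))  # ~0.39 after only 2^64 draws from a 128-bit space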

Related

Is there a hash function to generate a hash with a given length?

Is there a function that generates a hash that has the exact length I want? I know that MD5 always has 16 bytes. But I want to define the length of the resulting hash.
Example:
hash('Something', 2) = 'gn'
hash('Something', 5) = 'a5d92'
hash('Something', 20) = 'RYNSl7cMObkPuXCK1GhF'
When the length increases, the result should be more secure from duplicates.
The upcoming SHAKE256 (or SHAKE128 for a security level of 128 bits instead of 256 bits), a so-called extendable-output function (XOF), is exactly what you are looking for. It will be defined alongside SHA-3. There is already a draft online.
If you need an established solution now, follow CodesInChaos' advice and truncate SHA-512 if a maximum of 64 bytes is enough; otherwise seed a stream cipher with the output of a hash of the original data.
Technical disclaimer: Beyond an output length of 512 bits, the "security against duplicates" (collision resistance) does not increase any more with longer output, as SHAKE256 has then reached the maximum security level against collisions the primitive supports (256 bits). (Note that because of the birthday paradox, the security level of an ideal hash function with an output length of n bits against collisions is only n/2 bits.) Any higher security level is pretty much meaningless anyway (256 bits is probably already overkill) given that our solar system does not provide enough energy to even count from 0 to 2^256.
Please do not confuse "security levels" with key lengths: With symmetric algorithms one usually expects a security level equal to the key size, but with asymmetric algorithms the numbers are completely unrelated: A 512 bit RSA encryption scheme is far less secure than 128bit AES (i.e. 512bit RSA moduli can be factored by brute force already).
If a cryptographic primitive tries to achieve a "security level of n bits", it means that there are supposed to be no attacks against it that are faster than 2^n operations.
BLAKE2 can produce digests of any size between 1 and 64 bytes.
If you want a digest considered cryptographically secure, consider the Birthday problem and what other algorithms use — e.g. SHA-1 uses 20 bytes and is considered insecure, SHA-2 uses 28/32/48/64 bytes and is generally considered secure.
If you just want to avoid accidental collisions, still consider the Birthday problem (above), but 16 or even 8 bytes might be considered sufficient depending on the application (see table).
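For illustration, both options are available in Python's hashlib (3.6+), which makes choosing the digest length yourself quite compact:

    import hashlib

    data = b"Something"

    # SHAKE256 is an extendable-output function: pass the desired length to hexdigest().
    print(hashlib.shake_256(data).hexdigest(8))    # 8 bytes  -> 16 hex characters
    print(hashlib.shake_256(data).hexdigest(20))   # 20 bytes -> 40 hex characters

    # BLAKE2b lets you fix any digest size from 1 to 64 bytes at construction time.
    print(hashlib.blake2b(data, digest_size=20).hexdigest())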

Iterate over hash function though it reduces search space

I was reading this article regarding the number of times you should hash your password
A salt is added to the password before the password is hashed, to safeguard against dictionary attacks and rainbow table attacks.
The commenters on the answer by ORIP stated:
hashing a hash is not something you should do, as the possibility of hash collision increases with each iteration, which may reduce the search space (salt doesn't help), but this is irrelevant for password-based cryptography. To reach the 256-bit search space of this hash you'd need a completely random password, 40 characters long, from all available keyboard characters (log2(94^40))
The answer by erickson recommended
With pre-computation off the table, an attacker has to compute the hash on each attempt. How long it takes to find a password now depends entirely on how long it takes to hash a candidate. This time is increased by iteration of the hash function. The number of iterations is generally a parameter of the key derivation function; today, a lot of mobile devices use 10,000 to 20,000 iterations, while a server might use 100,000 or more. (The bcrypt algorithm uses the term "cost factor", which is a logarithmic measure of the time required.)
My questions are
1) Why do we iterate over the hash function, since each iteration reduces the search space and hence makes it easier to crack the password?
2) What does "search space" mean?
3) Why is the reduction of search space irrelevant for password-based cryptography?
4) When is the reduction of search space relevant?
Let's start with the basic question: What is a search space?
A search space is the set of all values that must be searched in order to find the one you want. In the case of AES-256, the total key space is 2^256. This is a really staggeringly large number. This is the number that most people are throwing around when they say that AES cannot be brute forced.
The search space of "8-letter sequences of lowercase letters" is 26^8, or about 200 billion (~2^37), which from a cryptographic point of view is a tiny, insignificant number that can be searched pretty quickly. It's less than 3 days at 1,000,000 checks per second. Real passwords are chosen out of much smaller sets, since most people don't type 8 totally random letters. (You can up this with upper case and numbers and symbols, but people pick from a tiny set of those, too.)
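Checking that "less than 3 days" figure with the rate assumed above (Python, illustrative only):

    space = 26 ** 8                 # ~2.1e11, about 2^37
    rate = 1_000_000                # checks per second
    print(space / rate / 86_400)    # ~2.4 days to sweep the whole space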
OK, so people like to type short, easy passwords, but we want to make them hard to brute-force. So we need a way to convert "easy to guess passwords" into "hard to guess key." We call this a Key Derivation Function (KDF). We need two things for it:
The KDF must be "computationally indistinguishable from random." This means that there is no inverse of the hash function that can be computed more quickly than a brute force search.
The KDF should take non-trivial time to compute, so that brute forcing the tiny password space is still very difficult. Ideally it should be made as difficult as brute forcing the entire key space, but it is rare to push it that far.
The first point is the answer to your question of "why don't we care about collisions?" It is because collisions, while they could possibly exist, cannot be predicted in a computationally efficient manner. If collisions could be efficiently predicted, then your KDF function is not indistinguishable from random.
A KDF is not the same as just "repeated hashing." Repeated hashing can be distinguished from random, and is subject to significant attacks (most notably length-extension attacks).
PBKDF2, as a specific KDF example, is proven to be computationally indistinguishable from random, as long as it is provided with a pseudorandom function (PRF). A PRF is defined as itself being computationally indistinguishable from random. PBKDF2 uses HMAC, which is proven to be a PRF as long as it is provided a hashing function that is at least weakly collision resistant (the requirement is actually a bit weaker than even that).
Note the word "proven" here. Good cryptography lives on top of mathematical security proofs. It is not just "tie a lot of knots and hope it holds."
So that's a little tiny bit of the math behind why we're not worried about collisions, but let's also consider some intuition about it.
The total number of 16-character (absurdly long) passwords that can be easily typed on a common English keyboard is about 95^16 or 2^105 (that doesn't count the 15, 14, 13, etc length passwords, but since 95^16 is almost two orders of magnitude larger than 95^15, it's close enough). Now, consider that for each password, we're going to randomly map it to 10,000 intermediate keys (via 10,000 iterations of PBKDF2). That gets us up to 2^118 random choices that we hope never collide in our hash. What are the chances?
Well, 2^256 (our total space) divided by 2^118 (our keys) is 2^138. That means we're using much less than 10^-41 of the space for all passwords that could even be remotely likely. If we're picking these randomly (and the definition of a PRF says we are), the chances of two colliding are, um, small. And if two somehow did, no attacker would ever be able to predict it.
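The same numbers, spelled out (Python):

    import math

    passwords = 95 ** 16                     # every 16-char keyboard password
    print(math.log2(passwords))              # ~105.1 bits
    print(math.log2(passwords * 10_000))     # ~118.4 bits after 10,000 intermediate keys each
    print(2 ** 118 / 2 ** 256)               # ~2.9e-42 of the 256-bit space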
Take away lesson: Use PBKDF2 (or another good KDF like scrypt or bcrypt) to convert passwords into keys. Use a lot of iterations (10,000-100,000 at a minimum). Do not worry about the collisions.
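A minimal PBKDF2 sketch with Python's standard library (the parameter values are illustrative, not a recommendation from this answer):

    import hashlib
    import os

    password = b"hunter2"
    salt = os.urandom(16)
    # 100,000 iterations of HMAC-SHA-256, deriving a 32-byte key.
    key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)
    print(key.hex())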
You may be interested in a little more discussion of this in Brute-Forcing Passwords.
1) As the second snippet said, each iteration makes each "guess" a hacker makes take longer, therefore increasing the total time it will take them to crack an average password.
2) Search space is all the possible hashes for a password after however many iterations you are using. Each iteration decreases the search space.
3) Because of #1, as the size of the search space decreases, the time to check each possibility increases, balancing out that negative effect.
4) According to the second snippet and the answers to #1 and #3, it actually isn't.
I hope this makes sense, it's a very complicated topic.
The reason to iterate is to make it harder for an attacker to brute force the hash. If you have a single round of hashing for a value, then in order to precompute a table for cracking that hash, you need to do 1 * keyspace hashes. If you do 1000 hashes of the value, then it would require the work of 1000 * keyspace.
Search space generally refers to the total number of combinations of characters that could make up a password.
I would say that the reduction of search space is irrelevant because passwords are generally not cracked by attempting 0000000, then 0000001, etc. They are instead cracked using dictionaries and combinatorics. There is essentially a realm of passwords that are likely to get cracked (like "password", "abcdef1", "goshawks", etc.), but creating a larger work factor will make it much more difficult for an attacker to hit all of the likely passwords in the space. Combining that with a salt means they have to do all of the work for those likely passwords, for every hash they want to crack.
The reduction in search space becomes relevant if you are trying to crack something that is random and could take up any value in the search space.

possible collision hashing uuid cakephp

Is it possible to have collisions if I use Security::hash on a uuid() string? I know that uuid() generates a truly unique string, but I need them to be hashed, and I am worried that there is a possibility that the hashed strings can repeat.
Thanks
Firstly, contrary to the name, a uuid does not create a truly unique string. It generates a string that is unique with very high probability (high enough that it can for pretty much all purposes be treated as unique).
As for your chances of getting a collision, that really depends on which hashing algorithm you are using. Assuming a well-built hashing algorithm which distributes uniformly over its output space, the odds of any two given hashes colliding are 1 / 2^n, where n is the hash length in bits. The odds of any two hashes colliding in a birthday-attack scenario can be approximated using the formula p(h) ≈ h^2 / (2m), where h is the number of hashes you expect to generate and m is the size of the output space (2^256 in the case of SHA256, for example).
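Plugging illustrative numbers into that approximation (Python):

    h = 10 ** 12            # say, a trillion hashed uuids
    m = 2 ** 256            # SHA256 output space
    print(h ** 2 / (2 * m)) # ~4.3e-54: a vanishingly small collision probability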
So, to sum it all up, you will always have a chance of getting a hash collision regardless of what hashing algorithm you're using. However, in the case of pretty much anything equal to or greater than SHA256, the chance is so vanishingly small that it is not worth worrying about. Your time is better spent worrying about the chances of a bus running over your server in the next second.
uuid() can generate duplicates, but the chance is very, very small.
Security::hash in CakePHP looks like a wrapper around PHP's hash function.
If you use it with sha512 it should be pretty good.

Encryption length

Does the length of the resulting hash influence its security?
I mean, if I use md5() or sha() and the returned hash has 35 chars or 55 chars, will this influence the security of the hash in some way?
Hashes don't have "security" in that sense, as they generally can't be inverted.
What they do have is risk of collision, i.e. two different messages (filenames, passwords etc.) mapping to the same hash.
To prevent that, the more bits you have, the better, provided that hash distribution is as flat (spread out) as possible.
In that case, the collision probability for any given pair of inputs is approximately the reciprocal of the number of possible outputs: 1/2^32 for a 32-bit hash, and so on.
There are also other considerations: under certain circumstances it is possible to craft a collision against MD5, and SHA is therefore to be preferred. The difficulty of the attack is very high, and so you can still use MD5 for most purposes except very high security scenarios.

Password salts: prepending vs. appending

I just looked at the implementation of password hashing in Django and noticed that it prepends the salt, so the hash is created like sha1(salt + password), for example.
In my opinion, salts are good for two purposes
Preventing rainbow table lookups
Alright, prepending/appending the salt doesn't really make a difference for rainbow tables.
Hardening against brute-force/dictionary attacks
This is what my question is about. If someone wants to attack a single password from a stolen password database, he needs to try a lot of passwords (e.g. dictionary words or [A-Za-z0-9] permutations).
Let's assume my password is "abcdef", the salt is "salt" and the attacker tries all [a-z]{6} passwords.
With a prepended salt, one must calculate hash("salt"), store the hash algorithm's state and then go on from that point for each permutation. That is, going through all permutations would take 26^6 copy-hash-algorithm's-state-struct operations and 26^6 hash(permutation of [a-z]{6}) operations. As copying the hash algorithm's state is freakin fast, the salt hardly adds any complexity here, no matter how long it is.
But with an appended salt, the attacker must calculate hash(permutation of [a-z]{6} + salt) from scratch for each permutation, i.e. 26^6 full hash computations over the longer 10-character input, with no precomputed state to reuse. So appending the salt does add some work that depends on the salt length.
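The copy-the-state trick described above maps directly onto hashlib's copy() method in Python (a sketch; note that with a salt shorter than the hash's 64-byte block, the saving is tiny, which is part of the point made in the answers below):

    import hashlib

    salt = b"salt"
    prefix = hashlib.sha1(salt)       # absorb the salt once
    for guess in (b"aaaaaa", b"aaaaab", b"abcdef"):
        h = prefix.copy()             # cheap copy of the partially-fed hash state
        h.update(guess)
        candidate = h.hexdigest()     # compare against the stolen sha1(salt + password)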
I don't believe this is for historical reasons because Django is rather new. So what's the sense in prepending salts?
Do neither; use a standard key derivation function like PBKDF2. Never roll your own crypto. It's much too easy to get it wrong. PBKDF2 uses many iterations to protect against brute force, which is a much bigger improvement than the ordering of the salt.
And your trick of pre-calculating the internal state of the hash function after processing the salt probably isn't that easy to pull off unless the length of the salt corresponds to the block length of the underlying hash function.
If the salt is prepended, an attacker can build a database of hash states for the salts (assuming the salt is long enough to complete a hashing step) and then run a dictionary attack.
But if the salt is appended, the attacker can build such a database for the password dictionary and then only has to process the salt for each entry. Given that the salt is usually shorter than the password (say a 4-character salt and an 8-character password), that would be the faster attack.
You are making a valid point, of course; but really, if you want to increase the time it takes to calculate the hash, just use a longer hash: SHA256 instead of SHA1, for example.
