I'm looking at hashing algorithms, but couldn't find an answer.
Bcrypt uses Blowfish
Blowfish is better than MD5
Q: but is Blowfish better than SHA512?
Thanks..
Update:
I want to clarify that I understand the difference between hashing and encryption. What prompted me to ask the question this way is this article, where the author refers to bcrypt as "adaptive hashing"
Since bcrypt is based on Blowfish, I was led to think that Blowfish is a hashing algorithm. If it's encryption as answers have pointed out, then seems to me like it shouldn't have a place in this article. What's worse is that he's concluding that bcrypt is the best.
What's also confusing me now is that the phpass class (used for password hashing I believe) uses bcrypt (i.e. blowfish, i.e. encryption). Based on this new info you guys are telling me (blowfish is encryption), this class sounds wrong. Am I missing something?
It should suffice to say whether bcrypt or SHA-512 (in the context of an appropriate algorithm like PBKDF2) is good enough. And the answer is yes, either algorithm is secure enough that a breach will occur through an implementation flaw, not cryptanalysis.
If you insist on knowing which is "better", SHA-512 has had in-depth reviews by NIST and others. It's good, but flaws have been recognized that, while not exploitable now, have led to the SHA-3 competition for new hash algorithms. Also, keep in mind that the study of hash algorithms is "newer" than that of ciphers, and cryptographers are still learning about them.
Even though bcrypt as a whole hasn't had as much scrutiny as Blowfish itself, I believe that being based on a cipher with a well-understood structure gives it some inherent security that hash-based authentication lacks. Also, it is easier to use common GPUs as a tool for attacking SHA-2–based hashes; because of its memory requirements, optimizing bcrypt requires more specialized hardware like FPGA with some on-board RAM.
Note: bcrypt is an algorithm that uses Blowfish internally. It is not an encryption algorithm itself. It is used to irreversibly obscure passwords, just as hash functions are used to do a "one-way hash".
Cryptographic hash algorithms are designed to be impossible to reverse. In other words, given only the output of a hash function, it should take "forever" to find a message that will produce the same hash output. In fact, it should be computationally infeasible to find any two messages that produce the same hash value. Unlike a cipher, hash functions aren't parameterized with a key; the same input will always produce the same output.
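A minimal sketch of that last point, using Python's standard hashlib and hmac modules. The message and keys are made-up values, and HMAC is used here only as a convenient stand-in for "a primitive parameterized by a key"; it is not a cipher.

```python
import hashlib
import hmac

message = b"correct horse battery staple"  # illustrative input, not from the question

# A plain hash has no key: hashing the same bytes twice gives the same digest.
digest_a = hashlib.sha512(message).hexdigest()
digest_b = hashlib.sha512(message).hexdigest()
print(digest_a == digest_b)  # True

# A keyed primitive gives different outputs for different keys,
# even though the message is identical.
tag_1 = hmac.new(b"key-one", message, hashlib.sha512).hexdigest()
tag_2 = hmac.new(b"key-two", message, hashlib.sha512).hexdigest()
print(tag_1 == tag_2)  # False
```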
If someone provides a password that hashes to the value stored in the password table, they are authenticated. In particular, because of the irreversibility of the hash function, it's assumed that the user isn't an attacker that got hold of the hash and reversed it to find a working password.
Now consider bcrypt. It uses Blowfish to encrypt a magic string, using a key "derived" from the password. Later, when a user enters a password, the key is derived again, and if the ciphertext produced by encrypting with that key matches the stored ciphertext, the user is authenticated. The ciphertext is stored in the "password" table, but the derived key is never stored.
In order to break the cryptography here, an attacker would have to recover the key from the ciphertext. This is called a "known-plaintext" attack, since the attack knows the magic string that has been encrypted, but not the key used. Blowfish has been studied extensively, and no attacks are yet known that would allow an attacker to find the key with a single known plaintext.
So, just like irreversible algorithms based on cryptographic digests, bcrypt produces an irreversible output from a password, salt, and cost factor. Its strength lies in Blowfish's resistance to known-plaintext attacks, which is analogous to a "first pre-image attack" on a digest algorithm. Since it can be used in place of a hash algorithm to protect passwords, bcrypt is confusingly referred to as a "hash" algorithm itself.
Assuming that rainbow tables have been thwarted by proper use of salt, any truly irreversible function reduces the attacker to trial-and-error. And the rate that the attacker can make trials is determined by the speed of that irreversible "hash" algorithm. If a single iteration of a hash function is used, an attacker can make millions of trials per second using equipment that costs on the order of $1000, testing all passwords up to 8 characters long in a few months.
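The "few months" estimate can be checked with rough arithmetic. The sketch below assumes a 62-symbol alphabet (a-z, A-Z, 0-9) and 20 million hash evaluations per second; both numbers are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
def keyspace(alphabet_size: int, max_length: int) -> int:
    """Count all passwords of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** k for k in range(1, max_length + 1))


def days_to_exhaust(alphabet_size: int, max_length: int,
                    hashes_per_second: float, iterations: int = 1) -> float:
    """Worst-case days to try every candidate.

    Each password guess costs `iterations` hash evaluations.
    """
    total_hashes = keyspace(alphabet_size, max_length) * iterations
    return total_hashes / hashes_per_second / 86_400


# Assumed figures: 62 symbols, up to 8 characters, 20 million hashes per second.
single = days_to_exhaust(62, 8, 20e6)
print(f"single iteration: about {single:.0f} days")
```

Under these assumptions the whole space falls in roughly four months. The iterations parameter scales the result linearly, so a method that costs thousands of hash evaluations per guess multiplies the time by the same factor.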
If however, the digest output is "fed back" thousands of times, it will take hundreds of years to test the same set of passwords on that hardware. Bcrypt achieves the same "key strengthening" effect by iterating inside its key derivation routine, and a proper hash-based method like PBKDF2 does the same thing; in this respect, the two methods are similar.
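As a rough illustration of the hash-based route, here is a sketch using PBKDF2 with HMAC-SHA-512 from Python's standard library. The iteration count, salt length, and function names are assumptions chosen for the example; a real deployment would tune the cost to its own hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed cost setting; higher means slower guesses


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, stored))  # True
print(verify_password("hunter3", salt, stored))  # False
```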
So, my recommendation of bcrypt stems from the assumptions 1) that Blowfish has had a similar level of scrutiny as the SHA-2 family of hash functions, and 2) that cryptanalytic methods for ciphers are better developed than those for hash functions.
I agree with erickson's answer, with one caveat: for password authentication purposes, bcrypt is far better than a single iteration of SHA-512 - simply because it is far slower. If you don't get why slowness is an advantage in this particular game, read the article you linked to again (scroll down to "Speed is exactly what you don’t want in a password hash function.").
You can of course build a secure password hashing algorithm around SHA-512 by iterating it thousands of times, just like the way PHK's MD5 algorithm works. Ulrich Drepper did exactly this, for glibc's crypt(). There's no particular reason to do this, though, if you already have a tested bcrypt implementation available.
Blowfish is not a hashing algorithm. It's an encryption algorithm. What that means is that you can encrypt something using blowfish, and then later on you can decrypt it back to plain text.
SHA512 is a hashing algorithm. That means that (in theory) once you hash the input you can't get the original input back again.
They're 2 different things, designed to be used for different tasks. There is no 'correct' answer to "is blowfish better than SHA512?" You might as well ask "are apples better than kangaroos?"
If you want to read some more on the topic here's some links:
Blowfish
SHA512
Blowfish isn't better than MD5 or SHA512, as they serve different purposes. MD5 and SHA512 are hashing algorithms, Blowfish is an encryption algorithm. Two entirely different cryptographic functions.
I would recommend Ulrich Drepper's SHA-256/SHA-512 based crypt implementation.
We ported these algorithms to Java, and you can find a freely licensed version of them at ftp://ftp.arlut.utexas.edu/java_hashes/.
Note that most modern (L)Unices support Drepper's algorithm in their /etc/shadow files.
I am a security analyst and I have been asked this question: Is SHA1(3DES-CBC) a good encryption for storing passwords in a database?
However, to my knowledge I feel use of salt for storing any sensitive information. And I feel CBC mode is vulnerable on certain protocols. And I feel this is the best practice https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet
Please correct my understanding of the above.
However, I am trying to understand the technical implications of SHA1(3DES-CBC) so that I can better explain the issues with its implementation to my team. Please advise me on this.
Fast hashing algorithms like SHA* are never a good choice for hashing passwords; instead you should use a slow key-derivation function with a cost factor, like BCrypt or PBKDF2.
I couldn't find much information about "3DES-CBC" in combination with SHA1, but neither primitive is iterated: SHA1 is a fast hash function and 3DES is a block cipher, so both are cheap to compute.
However, to my knowledge I feel use of salt for storing any sensitive information ...
John Stevens of OWASP put together a good document on server password security and storage. It walks through the attacks and threats, and then adds steps to neutralize the threats. Here are the references to the OWASP material (you only referenced one of them):
Password Storage Cheat Sheet
Secure Password Storage Threat Model
And I feel CBC mode is vulnerable on certain protocols...
I don't believe this is correct. A block cipher operated in CBC mode is a pseudo random function. It possesses the PRP-notion of security. However, it can't be used in a vacuum. That is why you need to understand the material in the two OWASP references.
SHA1(3DES-CBC)...
I'm not sure what the purpose of the composite function is. You'll have to ask the developers what their security goals are, and what threat it neutralizes. Naively, I'm going to say AES/CBC or 3DES/CBC alone should have been sufficient.
You also have the key storage problem to contend with. It's known as the "Unattended Key Storage" problem, and it's a problem without a solution. See Peter Gutmann's Engineering Security.
NO!
If you're storing passwords in a database, you should be using bcrypt or scrypt. bcrypt has been analyzed by numerous cryptographers over the years, and is the de facto password hashing algorithm.
SHA1 is bad because:
It can be run quickly (bad, makes it vulnerable to brute force).
It is susceptible to collision attacks (a sign of structural weakness, although a collision by itself does not reveal a stored password).
It can be easily reversed if you're not using a salt (rainbow tables).
bcrypt is great because:
It's very slow (slows down attackers trying to brute force).
It requires a lot of CPU (this means attackers need many computers, with large CPUs).
It has no known practical collision attacks.
scrypt is just like bcrypt, but also requires a lot of memory to compute a hash, further slowing down attackers. scrypt is relatively new, however, so you might want to stick with bcrypt for now.
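For a sense of what the memory requirement looks like in practice, here is a sketch using hashlib.scrypt from Python's standard library (available when Python is built against OpenSSL 1.1 or later). The parameter values are commonly quoted interactive-login settings and are assumptions, not recommendations from this answer; the password is a placeholder.

```python
import hashlib
import os

# Assumed cost parameters: n = CPU/memory cost, r = block size, p = parallelism.
N, R, P = 2 ** 14, 8, 1

salt = os.urandom(16)
key = hashlib.scrypt(b"example password", salt=salt, n=N, r=R, p=P,
                     maxmem=64 * 1024 * 1024, dklen=32)

# scrypt's main working buffer is about 128 * n * r bytes.
approx_memory_bytes = 128 * N * R
print(f"derived {len(key)} bytes using roughly {approx_memory_bytes // 2**20} MiB")
```

Doubling n doubles both the time and the memory each guess needs, which is what makes large parallel attacks more expensive than with a CPU-only cost factor.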
I know it's not possible to reverse an MD5 hash back to its original value. But what about generating a set of random characters which would give the exact same value when hashed? Is that possible?
Finding a message that matches a given MD5 hash can happen in three ways:
You guess the original message. For passwords and other low-entropy messages this is often relatively easy. That's why we use key-stretching in such situations. For sufficiently complex messages, this becomes infeasible.
You guess about 2^127 times and get a new message fitting that hash. This is currently infeasible.
You exploit a pre-image attack against that specific hash function, obtained by cryptanalyzing it. For MD5 there is one, with a work factor of 2^123, but that's still infeasible.
There is no efficient attack on MD5's pre-image resistance at the moment.
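To put "infeasible" into numbers, here is a back-of-the-envelope calculation for the 2^127 guesses mentioned above. The rate of one trillion MD5 evaluations per second is an assumed figure, deliberately generous to the attacker.

```python
GUESSES = 2 ** 127
HASHES_PER_SECOND = 10 ** 12          # assumed: one trillion MD5 evaluations per second
SECONDS_PER_YEAR = 365 * 24 * 3600

years = GUESSES / HASHES_PER_SECOND / SECONDS_PER_YEAR
print(f"about {years:.1e} years")
```

That comes out on the order of 10^18 years, and the 2^123 pre-image attack only shaves a factor of 16 off it.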
There are efficient collision attacks against MD5, but they only allow an attacker to construct two different messages with the same hash. They do not allow him to construct a message for a given hash.
Yes it is possible to come up with a collision (since you map from a larger space to a smaller this is something that you can assume to happen eventually). Actually MD5 is already considered as "broken" in this respect.
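The "larger space to a smaller" argument is easy to see on a deliberately shortened hash. The sketch below keeps only the first 2 bytes of MD5, so there are just 65536 possible outputs and a collision must appear within 65537 distinct inputs. This is a plain brute-force search on a toy truncation, not the real cryptanalytic attack on full MD5; the function names are made up for the example.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, nbytes: int = 2) -> bytes:
    """First `nbytes` bytes of the MD5 digest (a toy, tiny output space)."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Hash "0", "1", "2", ... until two inputs share a truncated digest."""
    seen = {}
    for i in count():
        message = str(i).encode("ascii")
        tag = truncated_md5(message, nbytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message


first, second = find_collision()
print(first, second, truncated_md5(first).hex())
```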
From wiki:
However, it has since been shown that MD5 is not collision
resistant;[3] as such, MD5 is not suitable for applications like SSL
certificates or digital signatures that rely on this property. In
1996, a flaw was found with the design of MD5, and while it was not a
clearly fatal weakness, cryptographers began recommending the use of
other algorithms, such as SHA-1—which has since been found also to be
vulnerable. In 2004, more serious flaws were discovered in MD5, making
further use of the algorithm for security purposes
questionable—specifically, a group of researchers described how to
create a pair of files that share the same MD5 checksum.[4][5] Further
advances were made in breaking MD5 in 2005, 2006, and 2007.[6] In
December 2008, a group of researchers used this technique to fake SSL
certificate validity,[7][8] and US-CERT now says that MD5 "should be
considered cryptographically broken and unsuitable for further
use."[9] and most U.S. government applications now require the SHA-2
family of hash functions.[10]
In one sense, this is possible. If you have strings that are longer than the hash itself, then you will have collisions, so such a string will exist.
However, finding such a string would be equivalent to reversing the hash, as you would be finding a value that hashes to a particular hash, so it would not be any more feasible than reversing a hash any other way.
For MD5 specifically? Yes.
Several years ago, an article was published on an exploit of the MD5 hash that allowed easy generation of data which, when hashed, gave a desired MD5 hash (well, what they actually discovered was an algorithm to find sets of data with the same hash, but you get how that can be used the other way around). You can read an overview of the principle here. No similar algorithm has been found for SHA-2, although that may change in the future.
Yes, what you're talking about is called a collision. A collision in any hashing mechanism is when two different plaintexts create the same hash after being run through a hashing algorithm.
So passwords should not be stored in plaintext but many do anyway. For the others is there a standard way passwords are stored? I mean a SHA1 hash or MD5 hash and if so what will the salt size be? Is there a better place to ask this?
I am trying to pick the brains of sys admins and consultants working on directory services. I am trying to see if there is a pattern or not.
EDIT: I would like to clarify that I am not trying to learn how to store the passwords better myself but more trying to see how many different ways they are stored and if there is a standard if any.
MD5 has been broken for a while and SHA-1 also has problems.
If you want to store a hash that will be secure for a long time to come, SHA-256 or SHA-512 (part of the SHA-2 family of hashes, designed as secure replacements for SHA-1) are a good choice and somewhere between 128 and 256 bits of salt are standard.
However, the use of plain hashes is not the best way to do this nowadays. Adaptive hashes are specifically designed for this type of storage as the amount of time necessary to compute a result can be made to slow down with additional computations. This is a very important trait to have to prevent brute-force attacks against your stored passwords. A strong, and standard, implementation of an adaptive hash is bcrypt, based on modifications to the Blowfish encryption algorithm to make it suitable for this purpose (which is explained well here).
Passwords should be hashed and the hashes should be stored in the database.
However, SHA* and MD5 are too fast as hashing algorithms to be used for the purpose of hashing passwords.
For hashing passwords, you'd ideally want something much slower which doesn't lend itself well to brute force/rainbow table attacks.
You can certainly hash a password 1000s of times before storing the hash to make it time and computationally intensive, but why bother doing that when you have algorithms like bcrypt that do the job for you?
You should use bcrypt to hash your password. Read more about it at
http://codahale.com/how-to-safely-store-a-password/
In bcrypt, since the salt is stored as part of the hash string, you don't even need two columns 'password_hash' and 'salt' in the table. Just 'password_hash'. The less clutter the better.
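The same single-column idea can be sketched with the standard library. bcrypt itself is not in Python's stdlib, so the example below uses PBKDF2 and a made-up "scheme$iterations$salt$hash" layout purely for illustration; it is not bcrypt's actual string format, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import os


def encode_record(password: str, iterations: int = 200_000) -> str:
    """Pack scheme, cost, salt and hash into one string for one database column."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "$".join([
        "pbkdf2-sha256",                       # hypothetical scheme label
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(derived).decode("ascii"),
    ])


def check_record(password: str, record: str) -> bool:
    """Unpack the stored string, re-derive, and compare in constant time."""
    scheme, iterations, salt_b64, hash_b64 = record.split("$")
    if scheme != "pbkdf2-sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, int(iterations))
    return hmac.compare_digest(candidate, expected)


record = encode_record("s3cret")
print(record.split("$")[:2])            # scheme and cost travel with the hash
print(check_record("s3cret", record))   # True
```

Keeping the cost inside the record also means it can be raised later without a schema change, since each row says how it was produced.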
You can see this question for the answer to how long the salt should be (between 128-256 bits seems to be the consensus).
As far as what algorithm to use, you should definitely use SHA1. MD5 was considered broken long ago even though it is still commonly used (see Wikipedia on MD5).
I'm building a site where security is somewhat important (then again, when is it not important?) and I was looking for the best way to store my passwords. I know that MD5 has issues with collisions as well as SHA-1, so I was looking into storing my passwords via either SHA-256 or SHA-512.
Is it wiser to store a longer hash variant as opposed to a smaller one? (ie 512 vs 256) Does it take significantly more time to crack a SHA-512 encoded password versus a SHA-256 encoded password?
Also, I've read about using "salts" for the passwords. What is this and how does it work? Do I simply store the salt value in another database field? How do I use that as a part of the hash value calculation?
For password storage, you need more than a mere hash function; you need:
an extremely slow hash function (so that brute force attacks are more difficult)
and a salt: a publicly known value, stored along the hash, distinct for each hash password, and entering in the password hashing process. The salt prevents an attacker from efficiently attacking several passwords (e.g. using precomputed hash tables).
So you need bcrypt.
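To see what the salt buys you, here is a small sketch. bcrypt is not in Python's standard library, so PBKDF2 stands in for the slow function; the password, user names, and iteration count are assumptions for the example.

```python
import hashlib
import os


def salted_hash(password: bytes, salt: bytes) -> str:
    """Slow, salted hash (PBKDF2 used here as a stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000).hex()


password = b"letmein"          # two users who happen to pick the same password
salt_alice = os.urandom(16)    # the salt is public and stored next to the hash
salt_bob = os.urandom(16)

hash_alice = salted_hash(password, salt_alice)
hash_bob = salted_hash(password, salt_bob)

print(hash_alice == hash_bob)                           # False
print(salted_hash(password, salt_alice) == hash_alice)  # True
```

Because the two stored values differ, a table precomputed for one salt is useless against the other, and the attacker has to attack each hash separately.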
As for the hash output size: if that size is n bits, then n shall be such that an attacker cannot realistically compute the hash function 2^n times; 80 bits are quite enough for that. An output of 128 bits is thus already overkill. You still would not want to use MD5, because it is way too fast (100000 nested invocations of MD5 might be slow enough, though) and because some structural weaknesses have been found in MD5, which do not directly impact its security for hashing passwords, but are bad public relations nonetheless. Anyway, you should use bcrypt, not a homemade structure.
Some of the answers here are giving you dubious advice. I recommend you head over to the IT Security Stack Exchange and search on "password hashing". You will find lots of advice, and much of it has been carefully vetted by folks on the security stack exchange. Or, you could just listen to @Thomas Pornin, who knows what he is talking about.
Collisions are not relevant in your scenario, so MD5's weaknesses are not relevant. However, the most important thing is to use a hash that takes a long time to compute. Read http://codahale.com/how-to-safely-store-a-password/ and http://www.jasypt.org/howtoencryptuserpasswords.html (even if you're not using Java the techniques are still valid).
I would stay away from MD5 in any case, since there are other hashes that perform just as well.
I believe I can download the code to PHP or Linux or whatever and look directly at the source code for the MD5 function. Could I not then reverse engineer the encryption?
Here's the code - http://dollar.ecom.cmu.edu/sec/cryptosource.htm
It seems like any encryption method would be useless if "the enemy" has the code it was created with. Am I wrong?
That is actually a good question.
MD5 is a hash function -- it "mixes" input data in such a way that it should be unfeasible to do a number of things, including recovering the input given the output (it is not encryption, there is no key and it is not meant to be inverted -- rather the opposite). A handwaving description is that each input bit is injected several times in a large enough internal state, which is mixed such that any difference quickly propagates to the whole state.
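The "any difference quickly propagates" behaviour can be observed directly. The sketch below flips a single bit of an arbitrary example message and counts how many of MD5's 128 output bits change; the message and helper name are made up for the illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"The quick brown fox"
flipped = bytes([original[0] ^ 0x01]) + original[1:]   # flip one input bit

digest_1 = hashlib.md5(original).digest()
digest_2 = hashlib.md5(flipped).digest()

changed = bit_difference(digest_1, digest_2)
print(f"{changed} of {len(digest_1) * 8} output bits differ")
```

For a well-mixed function one expects about half the output bits to change, with no visible relation to which input bit was flipped.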
MD5 has been public since 1992. There is no secret, and has never been any secret, to the design of MD5.
MD5 has been considered cryptographically broken since 2004, the year of publication of the first collision (two distinct input messages which yield the same output); it had been considered "weak" since 1996 (when some structural properties were found, which were believed to ultimately help in building collisions). However, there are other hash functions, which are as public as MD5 is, and for which no weakness is known yet: the SHA-2 family. Newer hash functions are currently being evaluated as part of the SHA-3 competition.
The really troubling part is that there is no known mathematical proof that a hash function may actually exist. A hash function is a publicly described efficient algorithm, which can be embedded as a logic circuit of a finite, fixed and small size. For the practitioners of computational complexity, it is somewhat surprising that it is possible to exhibit a circuit which cannot be inverted. So right now we only have candidates: functions for which nobody has found weaknesses yet, rather than functions for which no weakness exists. On the other hand, the case of MD5 shows that, apparently, getting from known structural weaknesses to actual collisions to attacks takes a substantial amount of time (weaknesses in 1996, collisions in 2004, applied collisions -- to a pair of X.509 certificates -- in 2008), so the current trend is to use algorithm agility: when we use a hash function in a protocol, we also think about how we could transition to another, should the hash function prove to be weak.
It is not an encryption, but a one way hashing mechanism. It digests the string and produces a (hopefully) unique hash.
If it were a reversible encryption, zip and tar.gz formats would be quite verbose. :)
The reason it doesn't help hackers too much (obviously knowing how one is made is beneficial) is that if they find a hashed password for a system, e.g. 2fcab58712467eab4004583eb8fb7f89, they need to know the original string used to create it, and also whether any salt was used. That is because when you log in, for obvious reasons, the password string is hashed with the same method used when it was generated, and then the resulting hash is compared to what is stored.
Also, many developers are migrating to bcrypt, which incorporates a work factor. If the hashing takes 1 second as opposed to .01 second, it greatly slows down generating a rainbow table for your application, and those old PHP sites using md5() become the low-hanging fruit.
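A small sketch of what that means for an attacker who holds a stolen hash: the only option is to guess candidate strings and hash them the same way the site does. The word list, password, salt value, and the salt-then-password layout are all invented for the example.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(text: str, salt: str = "") -> str:
    """MD5 of salt + text, hex-encoded (this layout is an assumption)."""
    return hashlib.md5((salt + text).encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates: Iterable[str],
                      salt: str = "") -> Optional[str]:
    """Hash each candidate the way the site would; return the one that matches."""
    for word in candidates:
        if md5_hex(word, salt) == target_hash:
            return word
    return None


wordlist = ["123456", "password", "letmein", "qwerty"]

unsalted = md5_hex("letmein")
print(dictionary_attack(unsalted, wordlist))             # letmein

salted = md5_hex("letmein", salt="x9!k")
print(dictionary_attack(salted, wordlist))               # None: salt not applied
print(dictionary_attack(salted, wordlist, salt="x9!k"))  # letmein
```

Knowing the algorithm lets the attacker run this loop, but it does not let them skip it, which is why a slow hash and a strong password both matter.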
Further reading on bcrypt.
One of the criteria of good cryptographic operations is that knowledge of the algorithm should not make it easier to break the encryption. So an encryption should not be reversible without knowledge of the algorithm and the key, and a hash function must not be reversible regardless of knowledge of the algorithm (the term used is "computationally infeasible").
MD5 and other hash functions (like SHA-1, SHA-256, etc.) perform a one-way operation on data that creates a digest or "fingerprint" that is usually much smaller than the plaintext. This one-way function cannot be reversed to retrieve the plaintext, even when you know exactly what the function does.
Likewise, knowledge of an encryption algorithm doesn't make it any easier (assuming a good algorithm) to recover plaintext from ciphertext. The reverse process is "computationally infeasible" without knowledge of the encryption key used.