If MD5 is broken, what is a better solution? - security

After reading the topic "Is MD5 really that bad", I was thinking about a better solution for generating hashes. Are there better solutions like Adler, CRC32 or SHA1? Or are they broken as well?

CRC32 is probably the worst thing you could possibly use for passwords (besides maybe CRC16 :). Cyclic redundancy checks are meant to detect whether a message has been damaged through natural causes; it is trivial to generate collisions using nothing more than algebra. SHA-0 and SHA-1 are also broken, although unlike MD5 no one has generated a SHA-1 collision yet; it is believed to be computationally feasible with current technology.
Any member of the SHA-2 family should be used. SHA-256 is good; SHA-512 is probably more than you need. NIST is holding the SHA-3 competition right now, and it will be finalized sometime in 2012. (Skein for the win!)
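To make the "nothing more than algebra" remark about CRC32 concrete, here is a minimal Python sketch using the standard `zlib.crc32`. The three sample messages and the `xor_bytes` helper are made up for illustration. It shows that, for equal-length inputs, the checksum of the XOR of three messages equals the XOR of their checksums. That kind of predictable structure is what makes CRC collisions easy to construct, and it is exactly what a cryptographic hash is designed not to have.

```python
import zlib

def xor_bytes(*parts: bytes) -> bytes:
    """XOR several equal-length byte strings together (illustrative helper)."""
    out = bytearray(len(parts[0]))
    for part in parts:
        if len(part) != len(out):
            raise ValueError("all inputs must have the same length")
        for i, byte in enumerate(part):
            out[i] ^= byte
    return bytes(out)

# Three arbitrary 12-byte messages (assumptions, chosen only for the demo).
a = b"password-one"
b = b"password-two"
c = b"hunter2-xxxx"

combined = xor_bytes(a, b, c)

# The CRC of the combined message can be predicted from the individual CRCs
# without ever running the CRC over `combined`.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(zlib.crc32(combined) == predicted)
```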

If you are looking for a cryptographic hash function, Adler and CRC32 are a really bad idea.
SHA-1 is also broken already, but in a much less dangerous way than MD5. However, this will probably change in the future.
Right now the only sensible choice seems to be to use SHA-256, possibly truncating the digest to the desired length.
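As a rough illustration of the SHA-256-with-truncation suggestion, the sketch below uses Python's standard `hashlib`. The function name `truncated_sha256` and the 16-byte default are assumptions for the example, not something prescribed by the answer.

```python
import hashlib

def truncated_sha256(data: bytes, length: int = 16) -> bytes:
    """Return the first `length` bytes of the SHA-256 digest of `data`."""
    if not 1 <= length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32 bytes")
    return hashlib.sha256(data).digest()[:length]

digest = truncated_sha256(b"hello world")
print(len(digest), digest.hex())
```

Keep in mind that truncation trades away strength: a digest cut to 128 bits offers only about 64 bits of collision resistance.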

SHA1 has some theoretical attacks but AFAIK there is still nothing practical that will let you break it as of yet.
SHA2 seems to hold steady for now.

Related

Do I need MD5 as a companion to SHA-1?

Do I need both MD5 and SHA-1 values to be sure the downloaded file is
a) Untouched by hackers. For example, when I need to download some app's .iso via torrents
and
b) Not corrupted during technical issues? For example, some unstable network connection during download.
Or, probably, SHA-1 value will be enough for both checks?
Also, is SHA-1 (without MD5) enough to be sure that some file downloaded years ago and stored somewhere on my HDD hasn't degraded?
From a security perspective MD5 is utterly broken.
SHA-1 is considered suspicious, and avoided for most uses if at all possible. For new projects: don't use it at all.
SHA-2 (aka SHA-256, SHA-512, etc.) is still widely used for fast hashes.
SHA-3 has been the future since 2012, and nothing is stopping you from using it already. I see little reason not to use it for new projects.
What's the problem with older ones:
Their resistance to finding collisions is below par: This is an attacker creating 2 contents that have the same hash. These are constructed at the same time. This problem is there for MD5 and SHA-1, and it's BAD, but requires the attacker creating both versions (and then they can do a switch at any time they want undetected).
Their resistance to length extension attacks is relatively weak. This is especially true for MD5, but SHA-1 and even SHA-2 to some degree suffer from it.
When is it not a problem: to check that your disk has not produced an error, any hash will do. Even a simple CRC32 will work wonders (and I'd recommend the simpler CRC check). A RAID array is another option, as it can fix errors, not just detect them.
Use both?
Well, finding a collision on one hash and having that same set of plaintexts also produce a collision on another hash is probably more difficult. This approach has been used in the past; the original PGP did something like it. If I'm not mistaken, it calculated a number of things, one of them simply the length (which would prevent the extension attack above).
So yes, it likely adds something, but the way md5 and SHA-1 and SHA-2 work internally is quite similar, and that's the worrisome part: they are too much alike to be sure just how much it adds against a highly sophisticated attacker (think the level of the NSA and their counterparts).
So why not use one of the more modern versions of SHA-2, or even better SHA-3? They have no known weaknesses and have been heavily peer-reviewed. As such, for any commercial-level use they should be more than enough.
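For the download-verification case in the question, a minimal sketch of hashing a file with SHA-2 or SHA-3 might look like the following. It uses only Python's standard `hashlib`; the `file_digest` helper, the chunk size, and the throwaway demo file are assumptions for illustration. In practice the path would point at the downloaded file and the result would be compared with a checksum published by a source you trust.

```python
import hashlib
import os
import tempfile

def file_digest(path: str, algorithm: str = "sha3_256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Demo on a throwaway file; a real check would use the downloaded file's path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example contents")
    demo_path = tmp.name
try:
    sha2 = file_digest(demo_path, "sha256")
    sha3 = file_digest(demo_path, "sha3_256")
finally:
    os.remove(demo_path)

print("SHA-256: ", sha2)
print("SHA3-256:", sha3)
```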
Refs:
https://en.wikipedia.org/wiki/Length_extension_attack
https://en.wikipedia.org/wiki/Collision_attack
https://stackoverflow.com/questions/tagged/sha-3

How is bcrypt more future proof than increasing the number of SHA iterations?

I've been researching bcrypt hashing, and of course one of the large benefits of the scheme is its "adaptiveness". However, how is it any more adaptive than simply increasing the number of iterations you make over a SHA-1 hash? Say, instead of SHA-1 hashing a value 1000 times, you increase it to 10,000 iterations. Isn't this achieving the same goal? What makes bcrypt more adaptive?
Making many iterations with a hash function has a few subtleties, because there must be some kind of "salting" involved, and because existing hash functions are not as "random" as could be hoped for; so care must be taken, in which case you end up with PBKDF2. PBKDF2 was designed for key derivation, which is not exactly the same as password hashing, but it turned out to be quite good at it too.
bcrypt has a (slight) advantage over PBKDF2-with-SHA-1 in that bcrypt is derived from the Blowfish block cipher. The point of having many iterations is to make the password processing slow, and, in particular, slow for the attacker. We tolerate that the function is made slow for the normal, honest systems, because it thwarts extensive password guessing. But an attacker may use hardware which the normal system does not use, e.g. a programmable GPU, which gives quite a boost to computations which fit well on that kind of hardware. Blowfish and bcrypt use RAM-based lookup tables (tables which are modified during the processing); such tables are easy to handle for a general purpose CPU, but quite cumbersome on a GPU; thus, bcrypt somewhat hinders processing enhancement by the attacker with GPU. That's a bonus which makes bcrypt a bit more desirable for a password storage than PBKDF2.
An alternative to both is scrypt. Unlike bcrypt, it doesn't make use of the somewhat unusual blowfish cipher, instead using any standard hash function, and it's specifically designed to be difficult to implement on dedicated hardware, by being both memory- and time-inefficient.
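A minimal sketch of scrypt in Python, assuming an interpreter whose `hashlib` was built against OpenSSL 1.1 or newer (otherwise `hashlib.scrypt` is not available). The cost parameters `n`, `r`, `p` and the helper name `scrypt_hash` are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import os

def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash with scrypt; n, r, p here are illustrative only."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,   # CPU/memory cost; raising it makes each guess slower
        r=8,       # block size; memory use grows with n * r
        p=1,       # parallelization factor
        dklen=32,
    )

salt = os.urandom(16)  # a fresh random salt per password
stored = scrypt_hash("correct horse battery staple", salt)

# Verification recomputes the hash with the stored salt and compares in
# constant time.
candidate = scrypt_hash("correct horse battery staple", salt)
print(hmac.compare_digest(stored, candidate))
```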
Your alternative is a bit underspecified. You didn't say how you combine password and salt into your hashing scheme. Doing this in the wrong way might lead to vulnerabilities. The advantage of bcrypt (and other standard KDFs) is that this is well specified.
If you look at PBKDF2 in the common HMAC-SHA1 mode, it's very similar to what you suggest.
That's essentially it. You can iterate any hash function. Some hash functions are better than others, so choose carefully.
MD5 for example is considered broken these days, and belongs to a category of hash functions which suffer from certain prefix-based attacks and birthday attacks.
bcrypt is a good rule-of-thumb because it gets a few things right (like salt) that you would have to explicitly implement if you used another function.
As noted in another answer the mechanism of iterating a hash function is very important, because it can unexpectedly weaken the algorithm or still fail to prevent some time-memory tradeoff attacks.
This is why PBKDF2 is your friend. It's detailed in RFC 2898. PBKDF2 is also future-proof because it doesn't depend on a specific hash algorithm. For example, you can swap out MD5 for SHA-3 when SHA-3 is finalized by NIST.
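The adaptiveness discussed here can be sketched with the standard `hashlib.pbkdf2_hmac`. The record layout (`hash$iterations$salt$digest`), the helper names, and the default of 10,000 iterations are assumptions for illustration; the count simply mirrors the figure in the question and is not a recommendation. Storing the hash name and iteration count next to each digest is what lets you raise the cost or swap the hash later without breaking existing records.

```python
import hashlib
import hmac
import os

def make_record(password: str, iterations: int = 10_000, hash_name: str = "sha256") -> str:
    """Hash a password and keep the parameters next to the result."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(hash_name, password.encode("utf-8"), salt, iterations)
    return "$".join([hash_name, str(iterations), salt.hex(), digest.hex()])

def verify(password: str, record: str) -> bool:
    """Recompute the hash using the parameters stored in the record."""
    hash_name, iterations, salt_hex, digest_hex = record.split("$")
    digest = hashlib.pbkdf2_hmac(
        hash_name, password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))

record = make_record("s3cret")
print(verify("s3cret", record), verify("wrong", record))

# Later, the same code can issue stronger records without touching old ones.
stronger = make_record("s3cret", iterations=100_000, hash_name="sha512")
print(verify("s3cret", stronger))
```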
Also, a slight catch on future-proofness. Bcrypt will work as long as the passphrase you're protecting is "between 8 and 56 characters." An important catch to keep in mind should your future ever require longer passphrases for some reason.
I believe the "adaptiveness" has nothing to do with the actual encryption but instead that bcrypt is an adaptive hash: over time it can be made slower and slower so it remains resistant to specific brute-force search attacks against the hash and the salt.
(Partly quoted from http://en.wikipedia.org/wiki/Bcrypt)

Password hashing: Is this a way to avoid collisions?

I was thinking about using 2 keys for hashing each user password, obtaining 2 different hashes. This way, it would be (almost?) impossible to find a password that works, other than the actual password.
Is that right? Is it worth it?
An important rule to learn is "never try to invent your own cryptography". You are just wasting time at best and introducing security holes at worst.
If you are unsure whether you are an exception to this rule, then you are not an exception to this rule.
The designers of cryptographic hashes already worried about collisions so you do not have to. Just pick one (SHA-256 is a fine choice) and focus your efforts on the rest of your application.
You might use SHA256 as a hashing algorithm instead. No collisions were found to date, and it's highly unlikely to see any collisions on passwords in the future.
You could just use a longer hash. SHA-512, for example, is 512 bits, and (assuming it's uniform) far, far less likely to clash than SHA-256. But personally, I wouldn't worry about it. Most passwords themselves are less than 32 bytes (256 bits), and so should have an extremely low probability of clashing with SHA-256.
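To put rough numbers on "extremely low probability", here is a small sketch of the standard birthday-bound approximation, assuming the hash output is uniformly distributed. The function name and the sample sizes (a billion stored passwords, and a 32-bit checksum for contrast) are made up for illustration.

```python
import math

def collision_probability(num_items: int, digest_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_items * (num_items - 1) / 2 ** (digest_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

p256 = collision_probability(10**9, 256)   # a billion passwords, SHA-256
p512 = collision_probability(10**9, 512)   # a billion passwords, SHA-512
p32 = collision_probability(100_000, 32)   # 100,000 items, 32-bit checksum

print(f"SHA-256: {p256:.3g}")
print(f"SHA-512: {p512:.3g}")
print(f"32-bit:  {p32:.3g}")
```

Both SHA-2 figures are so small that accidental collisions are not the practical concern; the 32-bit case, by contrast, is more likely than not to collide.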

AES vs Blowfish for file encryption

I want to encrypt a binary file. My goal is to prevent anyone who doesn't have the password from reading the file.
Which is the better solution, AES or Blowfish with the same key length? We can assume that the attacker has great resources (software, knowledge, money) for cracking the file.
Probably AES. Blowfish was the direct predecessor to Twofish. Twofish was Bruce Schneier's entry into the competition that produced AES. It was judged as inferior to an entry named Rijndael, which was what became AES.
Interesting aside: at one point in the competition, all the entrants were asked to give their opinion of how the ciphers ranked. It's probably no surprise that each team picked its own entry as the best -- but every other team picked Rijndael as the second best.
That said, there are some basic differences in the basic goals of Blowfish vs. AES that can (arguably) favor Blowfish in terms of absolute security. In particular, Blowfish attempts to make a brute-force (key-exhaustion) attack difficult by making the initial key setup a fairly slow operation. For a normal user, this is of little consequence (it's still less than a millisecond) but if you're trying out millions of keys per second to break it, the difference is quite substantial.
In the end, I don't see that as a major advantage, however. I'd generally recommend AES. My next choices would probably be Serpent, MARS and Twofish in that order. Blowfish would come somewhere after those (though there are a couple of others that I'd probably recommend ahead of Blowfish).
It is a not-often-acknowledged fact that the block size of a block cipher is also an important security consideration (though nowhere near as important as the key size).
Blowfish (and most other block ciphers of the same era, like 3DES and IDEA) have a 64 bit block size, which is considered insufficient for the large file sizes which are common these days (the larger the file, and the smaller the block size, the higher the probability of a repeated block in the ciphertext - and such repeated blocks are extremely useful in cryptanalysis).
AES, on the other hand, has a 128 bit block size. This consideration alone is justification to use AES instead of Blowfish.
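The block-size argument can be made concrete with birthday-bound arithmetic: with a b-bit block, a repeated ciphertext block becomes likely after roughly 2^(b/2) blocks. The sketch below is only that arithmetic (the function name is an assumption), not a statement about any particular mode of operation.

```python
def bytes_before_likely_repeat(block_bits: int) -> int:
    """Approximate data volume at which a repeated block becomes likely.

    By the birthday bound this is about 2^(block_bits / 2) blocks.
    """
    blocks = 2 ** (block_bits // 2)
    return blocks * (block_bits // 8)

blowfish_limit = bytes_before_likely_repeat(64)   # 2^32 blocks of 8 bytes
aes_limit = bytes_before_likely_repeat(128)       # 2^64 blocks of 16 bytes

print(blowfish_limit // 2**30, "GiB for a 64-bit block")
print(aes_limit // 2**60, "EiB for a 128-bit block")
```

A few tens of gigabytes is well within reach of today's files and backups, whereas the 128-bit figure is far beyond anything a single key will realistically encrypt.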
In terms of the algorithms themselves I would go with AES, for the simple reason that it has been accepted by NIST and will be peer reviewed and cryptanalyzed for years. However, I would suggest that in practical applications, unless you're storing some file that the government wants to keep secret (in which case the NSA would probably supply you with a better algorithm than either AES or Blowfish), using either of these algorithms won't make much of a difference. All the security should be in the key, and both of these algorithms are resistant to brute-force attacks. Blowfish has only been shown to be weak in implementations that don't use the full 16 rounds. And while AES is newer, that fact should make you lean more towards Blowfish (if you were only taking age into consideration). Think of it this way: Blowfish has been around since the 90s and nobody (that we know of) has broken it yet.
Here is what I would pose to you... instead of looking at these two algorithms and trying to choose between the algorithm, why don't you look at your key generation scheme. A potential attacker who wants to decrypt your file is not going to sit there and come up with a theoretical set of keys that can be used and then do a brute force attack that can take months. Instead he is going to exploit something else, such as attacking your server hardware, reverse engineering your assembly to see the key, trying to find some config file that has the key in it, or maybe blackmailing your friend to copy a file from your computer. Those are going to be where you are most vulnerable, not the algorithm.
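On the key-generation side, one small piece that is easy to get right is drawing the key from the operating system's cryptographic random number generator rather than from a guessable source. A minimal sketch with Python's standard `secrets` module follows; the helper name and the 32-byte default are assumptions. Where the key is stored afterwards is the harder problem this answer is pointing at, and the sketch does not address it.

```python
import secrets

def generate_key(num_bytes: int = 32) -> bytes:
    """Return a random key from the OS CSPRNG (32 bytes = 256 bits)."""
    return secrets.token_bytes(num_bytes)

key = generate_key()
print(len(key) * 8, "bit key generated")
```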
AES.
(I also am assuming you mean twofish not the much older and weaker blowfish)
Both (AES & Twofish) are good algorithms. However, even if they were equal, or Twofish was slightly ahead on technical merit, I would STILL choose AES.
Why? Publicity. AES is THE standard for government encryption and thus millions of other entities also use it. A talented cryptanalyst simply gets more "bang for the buck" finding a flaw in AES than in the much less known and used Twofish.
Obscurity provides no protection in encryption. More bodies looking at, studying, probing, and attacking an algorithm is always better. You want the most "vetted" algorithm possible, and right now that is AES. If an algorithm isn't subject to intense and continual scrutiny, you should place lower confidence in its strength. Sure, Twofish hasn't been compromised. Is that because of the strength of the cipher, or simply because not enough people have taken a close look... YET?
The algorithm choice probably doesn't matter that much. I'd use AES since it's been better researched. What's much more important is choosing the right operation mode and key derivation function.
You might want to take a look at the TrueCrypt format specification for inspiration if you want fast random access. If you don't need random access, then XTS isn't the optimal mode, since it has weaknesses other modes don't. And you might want to add some kind of integrity check (or message authentication code) too.
I know this answer violates the terms of your question, but I think the correct answer to your intent is simply this: use whichever algorithm allows you the longest key length, then make sure you choose a really good key. Minor differences in the performance of most well regarded algorithms (cryptographically and chronologically) are overwhelmed by a few extra bits of a key.
Both algorithms (AES and twofish) are considered very secure. This has been widely covered in other answers.
However, since AES is much more widely used now in 2016, it has been specifically hardware-accelerated on several platforms such as ARM and x86. While not significantly faster than Twofish before hardware acceleration, AES is now much faster thanks to the dedicated CPU instructions.

Best general-purpose digest function?

When writing an average new app in 2009, what's the most reasonable digest function to use, in terms of security and performance? (And how can I determine this in the future, as conditions change?)
When similar questions were asked previously, answers have included SHA1, SHA2, SHA-256, SHA-512, MD5, bCrypt, and Blowfish.
I realize that to a great extent, any one of these could work, if used intelligently, but I'd rather not roll a dice and pick one randomly. Thanks.
I'd follow NIST/FIPS guidelines:
March 15, 2006: The SHA-2 family of hash functions (i.e., SHA-224, SHA-256, SHA-384 and SHA-512) may be used by Federal agencies for all applications using secure hash algorithms. Federal agencies should stop using SHA-1 for digital signatures, digital time stamping and other applications that require collision resistance as soon as practical, and must use the SHA-2 family of hash functions for these applications after 2010. After 2010, Federal agencies may use SHA-1 only for the following applications: hash-based message authentication codes (HMACs); key derivation functions (KDFs); and random number generators (RNGs). Regardless of use, NIST encourages application and protocol designers to use the SHA-2 family of hash functions for all new applications and protocols.
You say "digest function"; presumably that means you want to use it to compute digests of "long" messages (not just hashing "short" "messages" like passwords). That means bcrypt and similar choices are out; they're designed to be slow to inhibit brute-force attacks on password databases. MD5 is completely broken, and SHA-0 and SHA-1 are too weakened to be good choices. Blowfish is a block cipher, not a hash function (though you can run it in a mode that produces digests), so it's not such a good choice either.
That leaves several families of hash functions, including SHA-2, HAVAL, RIPEMD, WHIRLPOOL, and others. Of these, the SHA-2 family is the most thoroughly cryptanalyzed, and so it would be my recommendation for general use. I would recommend either SHA2-256 or SHA2-512 for typical applications, since those two sizes are the most common and likely to be supported in the future by SHA-3.
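One way to keep the choice of digest easy to revisit as conditions change is to select the algorithm by name instead of hard-coding it. The sketch below does this with Python's standard `hashlib`; the `digest` helper and the restriction to `hashlib.algorithms_guaranteed` are assumptions made for the example.

```python
import hashlib

def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hash `data` with a named algorithm that every Python build provides."""
    if algorithm not in hashlib.algorithms_guaranteed:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest()

message = b"an arbitrary long message " * 1000

# Each hex character encodes 4 bits, so this recovers the digest size.
sizes = {name: len(digest(message, name)) * 4 for name in ("sha256", "sha512")}
print(sizes)
```

Moving to a different function later then means changing one default (and deciding how to handle digests already stored), not hunting through the codebase.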
It really depends on what you need it for.
If you are in need of actual security, where the ability to find a collision easily would compromise your system, I would use something like SHA-256 or SHA-512 as they come heavily recommended by various agencies.
If you are in need of something that is fast, and can be used to uniquely identify something, but there are no actual security requirements (ie, an attacker wouldn't be able to do anything nasty if they found a collision) then I would use something like MD5.
MD4, MD5, and SHA-1 have been shown to be more easily breakable, in the sense of finding a collision via a birthday attack method, than expected. RIPEMD-160 is well regarded, but at only 160 bits a birthday attack needs only 2^80 operations, so it won't last forever. Whirlpool has excellent characteristics and appears the strongest of the lot, though it doesn't have the same backing as SHA-256 or SHA-512 does - in the sense that if there was a problem with SHA-256 or SHA-512 you'd be more likely to find out about it via proper channels.
