Will application pass FIPS certification if we use md5 only to calculate unique values?

Will application pass FIPS certification if we use md5 only to calculate unique values? - security

Our software has a functionality to calculate unique parts of some data.
To do that we use md5 hash function, store all hashes and eliminate those which are duplicated.
We do not use md5 for passwords hashing or in other security-critical use cases.
Will our software pass FIPS certification if we have only these use cases?
I know that md5 is not FIPS-approved algorithm, but as far as I know it is only critical when there are security risk, for example if someone is using it to hash passwords.

As per FIPS 140-2 compliance, it is strongly suggested to use only FIPS approved algorithms for digests that is listed in FIPS-180. I see you're not using it for passwords but FIPS never said that these algorithms are only meant for password hashes. Hence, MD5 should not be used in the entirety of application.

Related

Hashing algorithm that meet FISMA / other federal informations systems requirements

I work in an organization that must meet FISMA requirements for FIPS-enabled systems. One of the thing that I am trying to do is implement a hash algorithm for our passwords. I have many choice on this: SHA-2, MD5, bcrypt (with Blowfish), RIPE, etc.
Reading through the various NIST publication, there is NOTHING that FISMA stated that I must use a specific algorithm to meet their requirements.
However, FIPS 180-4 specifies WHICH hash algorithm is considered secure according to FISMA, which is SHA-1 to SHA-512/256. NIST SP 800-132 also recommend the use of PBKDS2.
So does this mean that:
a). I HAVE to use SHA for the hash algorithm to pass the FISMA audit / requirements?
...OR...
b). I CAN use any algorithm as long as it is BETTER than SHA? I.e. don't use MD5, but bcrypt or RIPE is OK.

How about both? Hash/salt your passwords using bcrypt to take advantage of its work factor features to provide future-proofing against massively parallel brute force attacks. Then SHA-512 the result from bcrypt and store that.
You get your protection from bcrypt, and you check the FISMA/FIPS box because you're storing a hash generated with their accepted algorithm.
Even if someone could brute-force SHA-512 to find an input that generates the same hash you've got stored in the database, that input isn't a working password that can log a user into the system.

Yes, you have to use SHA. SP 800-53 references FIPS 140-2 all over the place, implying that you must use SHA-256 or SHA-512. (Avoid SHA-1).
It's spelled out clearly in the MEMORANDUM FOR HEADS OF EXECUTIVE DEPARTMENTS AND AGENCIES from the Executive Office of the President:
11. Is use of National Institute of Standards and Technology (NIST) publications
required?
Yes. For non-national security programs and information systems, agencies
must follow NIST standards and guidelines. ...
12. Are NIST guidelines flexible?
Yes. While agencies are required to follow NIST standards and guidelines in
accordance with OMB policy, there is flexibility within NIST’s guidelines
(specifically in the 800-series) in how agencies apply them. However,
Federal Information Processing Standards (FIPS) are mandatory. ...
(And think about it. NIST didn't publish SHA as a standard so that you could go and use something else instead...)
Also, SHA and Bcrypt aren't really directly comparable. SHA is a set of hashing algorithms. Bcrypt is more of a process to produce a hash with the Blowfish algorithm at its core. The FIPS equivalent of Bcrypt is PBKDF2, which uses SHA as its core algorithm.

MD5 hash reversing

I know it's not possible to reverse an MD5 hash back to its original value. But what about generating a set of random characters which would give the exact same value when hashed? Is that possible?

Finding a message that matches a given MD5 hash can happen in three ways:
You guess the original message. For passwords and other low entropy messages this is often relatively easy. That's why we use use key-stretching in such situations. For sufficiently complex messages, this becomes infeasible.
You guess about 2^127 times and get a new message fitting that hash. This is currently infeasible.
You exploit a pre-image attack against that specific hash function, obtained by cryptoanalyzing it. For MD5 there is one, with a workfactor of 2^123, but that's still infeasible.
There is no efficient attack on MD5's pre-image resistance at the moment.
There are efficient collision attacks against MD5, but they only allow an attacker to construct two different messages with the same hash. But it doesn't allow him to construct a message for a given hash.

Yes it is possible to come up with a collision (since you map from a larger space to a smaller this is something that you can assume to happen eventually). Actually MD5 is already considered as "broken" in this respect.
From wiki:
However, it has since been shown that MD5 is not collision
resistant;[3] as such, MD5 is not suitable for applications like SSL
certificates or digital signatures that rely on this property. In
1996, a flaw was found with the design of MD5, and while it was not a
clearly fatal weakness, cryptographers began recommending the use of
other algorithms, such as SHA-1—which has since been found also to be
vulnerable. In 2004, more serious flaws were discovered in MD5, making
further use of the algorithm for security purposes
questionable—specifically, a group of researchers described how to
create a pair of files that share the same MD5 checksum.[4][5] Further
advances were made in breaking MD5 in 2005, 2006, and 2007.[6] In
December 2008, a group of researchers used this technique to fake SSL
certificate validity,[7][8] and US-CERT now says that MD5 "should be
considered cryptographically broken and unsuitable for further
use."[9] and most U.S. government applications now require the SHA-2
family of hash functions.[10]

In one sense, this is possible. If you have strings that are longer than the hash itself, then you will have collisions, so such a string will exist.
However, finding such a string would be equivalent to reversing the hash, as you would be finding a value that hashes to a particular hash, so it would not be any more feasible than reversing a hash any other way.

For MD5 specifically? Yes.
Several years ago, an article was published on an exploit of the MD5 hash that allowed easy generation of data which, when hashed, gave a desired MD5 hash (well, what they actually discovered was an algorithm to find sets of data with the same hash, but you get how that can be used the other way around). You can read an overview of the principle here. No similar algorithm has been found for SHA-2, although that may change in the future.

Yes, what you're talking about is called a collision. A collision in any hashing mechanism is when two different plaintexts create the same hash after being run through a hashing algorithm.

Is It Possible To Reconstruct a Cryptographic Hash's Key

We would like to cryptographically (SHA-256) hash a secret value in our database. Since we want to use this as a way to lookup individual records in our database, we cannot use a different random salt for each encrypted value.
My question is: given unlimited access to our database, and given that the attacker knows at least one secret value and hashed value pair, is it possible for the attacker to reverse engineer the cryptographic key? IE, would the attacker then be able to reverse all hashes and determine all secret values?
It seems like this defeats the entire purpose of a cryptographic hash if it is the case, so perhaps I'm missing something.

There are no published "first pre-image" attacks against SHA-256. Without such an attack to open a shortcut, it is impossible for an attacker to the recover a secret value from its SHA-256 hash.
However, the mention of a "secret key" might indicate some confusion about hashes. Hash algorithms don't use a key. So, if an attacker were able to attack one "secret-value–hash-value" pair, he wouldn't learn a "key" that would enable him to easily invert the rest of the hash values.
When a hash is attacked successfully, it is usually because the original message was from a small space. For example, most passwords are chosen from a relatively short list of real words, perhaps with some simple permutations. So, rather than systematically testing every possible password, the attacker starts with an ordered list of the few billion most common passwords. To avoid this, it's important to choose the "secret value" randomly from a large space.
There are message authentication algorithms that hash a secret key together with some data. These algorithms are used to protect the integrity of the message against tampering. But they don't help thwart pre-image attacks.

In short, yes.

No, a SHA hash is not reversible (at least not easily). When you Hash something if you need to reverse it you need to reconstruct the hash. This is usually done with a private (salt) and public key.
For example, if I'm trying to prevent access based off my user id. I would hash my user id and the salt. Let say MD5 for example. My user id is "12345" and the salt is "abcde"
So I will hash the string "12345_abcde", which return a hash of "7b322f78afeeb81ad92873b776558368"
Now I will pass to the validating application the hash and the public key, "12345" which is the public key and the has.
The validating application, knows the salt, so it hashes the same values. "12345_abcde", which in turn would generate the exact same hash. I then compare the hash i validated with the one passed off and they match. If I had somehow modified the public key without modifying the hash, a different has would have been generated resulting in a mismatch.

Yes it's possible, but not in this lifetime.

Modern brute-force attacks using multiple GPUs could crack this in short order. I recommend you follow the guidelines for password storage for this application. Here are the current password storage guidelines from OWASP. Currently, they recommend a long salt value, and PBKDF2 with 64,000 iterations, which iteratively stretches the key and makes it computationally complex to brute force the input values. Note that this will also make it computationally complex for you to generate your key values, but the idea is that you will be generating keys far less frequently than an attacker would have to. That said, your design requires many more key derivations than a typical password storage/challenge application, so your design may be fatally flawed. Also keep in mind that the iteration count should doubled every 18 months to make the computational complexity follow Moore's Law. This means that your system would need some way of allowing you to rehash these values (possibly by combining hash techniques). Over time, you will find that old HMAC functions are broken by cryptanalysts, and you need to be ready to update your algorithms. For example, a single iteration of MD5 or SHA-1 used to be sufficient, but it is not anymore. There are other HMAC functions that could also suit your needs that wouldn't require PBKDF2 (such as bcrypt or scrypt), but PBKDF2 is currently the industry standard that has received the most scrutiny. One could argue that bcrypt or scrypt would also be suitable, but this is yet another reason why a pluggable scheme should be used to allow you to upgrade HMAC functions over time.

SHA512 vs. Blowfish and Bcrypt [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm looking at hashing algorithms, but couldn't find an answer.
Bcrypt uses Blowfish
Blowfish is better than MD5
Q: but is Blowfish better than SHA512?
Thanks..
Update:
I want to clarify that I understand the difference between hashing and encryption. What prompted me to ask the question this way is this article, where the author refers to bcrypt as "adaptive hashing"
Since bcrypt is based on Blowfish, I was led to think that Blowfish is a hashing algorithm. If it's encryption as answers have pointed out, then seems to me like it shouldn't have a place in this article. What's worse is that he's concluding that bcrypt is the best.
What's also confusing me now is that the phpass class (used for password hashing I believe) uses bcrypt (i.e. blowfish, i.e. encryption). Based on this new info you guys are telling me (blowfish is encryption), this class sounds wrong. Am I missing something?

It should suffice to say whether bcrypt or SHA-512 (in the context of an appropriate algorithm like PBKDF2) is good enough. And the answer is yes, either algorithm is secure enough that a breach will occur through an implementation flaw, not cryptanalysis.
If you insist on knowing which is "better", SHA-512 has had in-depth reviews by NIST and others. It's good, but flaws have been recognized that, while not exploitable now, have led to the the SHA-3 competition for new hash algorithms. Also, keep in mind that the study of hash algorithms is "newer" than that of ciphers, and cryptographers are still learning about them.
Even though bcrypt as a whole hasn't had as much scrutiny as Blowfish itself, I believe that being based on a cipher with a well-understood structure gives it some inherent security that hash-based authentication lacks. Also, it is easier to use common GPUs as a tool for attacking SHA-2–based hashes; because of its memory requirements, optimizing bcrypt requires more specialized hardware like FPGA with some on-board RAM.
Note: bcrypt is an algorithm that uses Blowfish internally. It is not an encryption algorithm itself. It is used to irreversibly obscure passwords, just as hash functions are used to do a "one-way hash".
Cryptographic hash algorithms are designed to be impossible to reverse. In other words, given only the output of a hash function, it should take "forever" to find a message that will produce the same hash output. In fact, it should be computationally infeasible to find any two messages that produce the same hash value. Unlike a cipher, hash functions aren't parameterized with a key; the same input will always produce the same output.
If someone provides a password that hashes to the value stored in the password table, they are authenticated. In particular, because of the irreversibility of the hash function, it's assumed that the user isn't an attacker that got hold of the hash and reversed it to find a working password.
Now consider bcrypt. It uses Blowfish to encrypt a magic string, using a key "derived" from the password. Later, when a user enters a password, the key is derived again, and if the ciphertext produced by encrypting with that key matches the stored ciphertext, the user is authenticated. The ciphertext is stored in the "password" table, but the derived key is never stored.
In order to break the cryptography here, an attacker would have to recover the key from the ciphertext. This is called a "known-plaintext" attack, since the attack knows the magic string that has been encrypted, but not the key used. Blowfish has been studied extensively, and no attacks are yet known that would allow an attacker to find the key with a single known plaintext.
So, just like irreversible algorithms based cryptographic digests, bcrypt produces an irreversible output, from a password, salt, and cost factor. Its strength lies in Blowfish's resistance to known plaintext attacks, which is analogous to a "first pre-image attack" on a digest algorithm. Since it can be used in place of a hash algorithm to protect passwords, bcrypt is confusingly referred to as a "hash" algorithm itself.
Assuming that rainbow tables have been thwarted by proper use of salt, any truly irreversible function reduces the attacker to trial-and-error. And the rate that the attacker can make trials is determined by the speed of that irreversible "hash" algorithm. If a single iteration of a hash function is used, an attacker can make millions of trials per second using equipment that costs on the order of $1000, testing all passwords up to 8 characters long in a few months.
If however, the digest output is "fed back" thousands of times, it will take hundreds of years to test the same set of passwords on that hardware. Bcrypt achieves the same "key strengthening" effect by iterating inside its key derivation routine, and a proper hash-based method like PBKDF2 does the same thing; in this respect, the two methods are similar.
So, my recommendation of bcrypt stems from the assumptions 1) that a Blowfish has had a similar level of scrutiny as the SHA-2 family of hash functions, and 2) that cryptanalytic methods for ciphers are better developed than those for hash functions.

I agree with erickson's answer, with one caveat: for password authentication purposes, bcrypt is far better than a single iteration of SHA-512 - simply because it is far slower. If you don't get why slowness is an advantage in this particular game, read the article you linked to again (scroll down to "Speed is exactly what you don’t want in a password hash function.").
You can of course build a secure password hashing algorithm around SHA-512 by iterating it thousands of times, just like the way PHK's MD5 algorithm works. Ulrich Drepper did exactly this, for glibc's crypt(). There's no particular reason to do this, though, if you already have a tested bcrypt implementation available.

Blowfish is not a hashing algorithm. It's an encryption algorithm. What that means is that you can encrypt something using blowfish, and then later on you can decrypt it back to plain text.
SHA512 is a hashing algorithm. That means that (in theory) once you hash the input you can't get the original input back again.
They're 2 different things, designed to be used for different tasks. There is no 'correct' answer to "is blowfish better than SHA512?" You might as well ask "are apples better than kangaroos?"
If you want to read some more on the topic here's some links:
Blowfish
SHA512

Blowfish isn't better than MD5 or SHA512, as they serve different purposes. MD5 and SHA512 are hashing algorithms, Blowfish is an encryption algorithm. Two entirely different cryptographic functions.

I would recommend Ulrich Drepper's SHA-256/SHA-512 based crypt implementation.
We ported these algorithms to Java, and you can find a freely licensed version of them at ftp://ftp.arlut.utexas.edu/java_hashes/.
Note that most modern (L)Unices support Drepper's algorithm in their /etc/shadow files.

MD5 and sequential number

I have some sequential id which can be easily guessed. If some want to see data related to this id he has to prove his access by token I gave him before.
token = md5(secret_key + md5(id))
Is MD5 good enough for this job?

Technically one does not even need to md5 the id before concatenation to be secure enough salting.
However I would generally suggest using sha-256 or sha-512 unless one has some serious performance concerns (say embedded programming).

It really depends on what you're trying to protect, but probably not. I don't see any reason not to use a stronger hashing function.

Assuming that this is used for authentication I'd use HMAC. See for example FIPS PUB 198. This for example allows you to use a secure hash function (not MD5), truncate the result as described and still get secure tokens.

Don't use MD5. It is broken. I cannot believe VeriSign of all people still use MD5. There are test suites available for determining hash collisions for MD5 for use in breaking MD5 hash comparisons.
Use, at the absolute minimum, SHA-1. I recommend using SHA-5.

If the ID can be easily guessed, this is not really very secure unless the secret key is quite long.
My PC can brute-force a secret_key value based on the MD5 in about a day for a secret_key of 6 characters. People with access to faster/more computers can greatly reduce that time. The time-to-break increases by a factor of 10 for each additional digit in the key. Since the ID can be easily guessed, and therefore it's MD5 value computed, it does not add much to the difficulty of reversing to get the secret_key.

I would recommend using an alternative solution, or (if not acceptable) adding more data to your md5 generation routine. If your secret_key is constant, and I am able to reverse engineer one hash, then I can generate the correct key for any other ID.
If you build something such as a random salt stored with your data plus the current time (if associated with the record you are protecting) into the md5 generation then it will dramatically increase the difficulty of the attack.
See:
http://www.freerainbowtables.com/

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string