Why use Authenticated Encryption instead of hashes?

What is the benefit of using Authenticated Encryption schemes like GCM or EAX compared to simpler methods like CRC or hash functions like SHA?
As far as I understand, these methods basically add a Message Authentication Code (MAC) to the message so it can be validated. But the same thing would seem possible if a CRC or hash value were calculated and appended to the plaintext (a MIC). That way it would also be impossible to tamper with the message, because the hash would no longer match.
The linked Wikipedia article says that MICs don't take a key into account, but I do not understand why that is a problem.

There's conceptually no difference between an Authenticated Encryption scheme (GCM, CCM, EAX, etc.) and providing an HMAC over the encrypted message; the AE algorithms simply constrain and standardize the byte layout (while tending to require less space and time than a serial encrypt-then-HMAC operation).
If you compute an unkeyed digest over the plaintext before encrypting, you do have a tamper-evident algorithm. But computing the digest over the plaintext has two disadvantages compared to computing it over the ciphertext:
If you send the same thing twice you send the same hash, even if your ciphertext is different (due to a different IV or key).
If the ciphertext has been tampered with in an attempt to confuse the decryption routine, you will still process it before discovering the tampering.
Of course, the disadvantage of digesting afterwards is that in your unkeyed approach anyone who tampers with the ciphertext can simply recompute the SHA-256 digest of the modified ciphertext. The solution is not an unkeyed digest but a keyed digest, like HMAC.
The options are:
Encrypt-only: Subject to tampering. Assuming a new IV is used for each message (and ECB isn't used), it does not reveal when a message repeats.
Digest-only: Subject to tampering. Message is plain-text.
MAC-only: Not subject to tampering. Message is plain-text.
Digest-then-Encrypt (DtE - digest is itself encrypted): Ciphertext corruption attacks are possible. Tampering with the plaintext is possible, if it is known. Message reuse is not revealed.
Digest-and-Encrypt (D&E/E&D - digest plaintext, send digest as plaintext): Ciphertext corruption attacks are possible. Tampering with the plaintext is possible, if it is known. Message reuse is revealed via the digest not changing.
Encrypt-then-Digest (EtD): This guards against transmission errors, but since any attacker can just recompute the digest this is the same as encrypt-only.
MAC-then-Encrypt (MtE): Same strengths as DtE, but even if the attacker knew the original plaintext and what they had tampered it into, they cannot alter the MAC (unless the plaintext is being altered to an already-known message+MAC pair).
MAC-and-Encrypt (M&E/E&M): Like D&E this reveals message reuse. Like MtE it is still vulnerable to ciphertext corruption and to a very small set of tampering attacks.
Encrypt-then-MAC (EtM): Any attempt to alter the ciphertext is discovered by the MAC failing to validate, this can be done before processing the ciphertext. Message reuse is not revealed, since the MAC was over the ciphertext.
EtM is the safest approach in the general case. One of the things that an AE algorithm solves is that it takes the question of how to combine a MAC and cipher out of the developer's hands and puts it into the hands of a cryptographer.
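To make the EtM ordering concrete, here is a minimal sketch in Python with the `cryptography` package; the function names and packet layout (IV || ciphertext || tag) are illustrative assumptions, not part of the answer above.

```python
import os
from cryptography.hazmat.primitives import hashes, hmac, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(16)                        # fresh random IV per message
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ct = enc.update(padded) + enc.finalize()
    tag = hmac.HMAC(mac_key, hashes.SHA256())  # keyed digest over IV || ciphertext
    tag.update(iv + ct)
    return iv + ct + tag.finalize()

def verify_then_decrypt(enc_key: bytes, mac_key: bytes, packet: bytes) -> bytes:
    iv, ct, tag = packet[:16], packet[16:-32], packet[-32:]
    check = hmac.HMAC(mac_key, hashes.SHA256())
    check.update(iv + ct)
    check.verify(tag)                          # raises InvalidSignature before any decryption
    dec = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

The EtM property is visible in verify_then_decrypt: the tag covers the IV as well as the ciphertext, and it is checked before a single block is decrypted.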

Related

How to securely encrypt many similar chunks of data with the same key?

I'm writing an application that will require the following security features: when launching the CLI version, you should pass some key to it. Some undefined number of chunks of data of the same size will be generated and will need to be stored remotely. This will be sensitive data. I want it to be encrypted and accessible only by the one key that was passed in initially. My question is: which algorithm will suit me? I read about AES, but it says that
When you perform an encryption operation you initialize your Encryptor with this key, then generate a new, unique Initialization Vector for each record you’re going to encrypt.
which means I'll have to pass a key and an IV, rather than just the key and this IV should be unique for each generated chunk of data (and there is going to be a lot of those).
If the answer is AES, which encryption mode is it?
You can use any modern symmetric algorithm. The amount of data, and how you handle your IVs, doesn't affect the choice: the same considerations apply no matter which symmetric algorithm you pick.
AES-128 is a good choice, as it isn't limited by law in the US and 128 bits is infeasible to brute force. If you aren't in the US, you could use AES-256 if you wanted to, but implementations in Java require additional installations.
You say you are going to generate n chunks of data (or retrieve them, whatever).
You could encrypt them all at once in CBC mode, which keeps AES as a block cipher, and you'll only end up with one IV. You'll need an HMAC here to protect the integrity. This isn't the most modern way, however.
You should use AES in GCM mode as a stream cipher. You'll still have a single IV (nonce), but the ciphertext will also be authenticated.
IVs should be generated randomly and prepended to the ciphertext. You can then retrieve the IV when it is time to decrypt. Remember: IVs aren't secret, they just need to be random!
EDIT: As pointed out below, IVs should be generated using a cryptographically secure random number generator. IVs for CTR-based modes, like GCM, only need to be unique.
In summary, what you are worried about shouldn't be worried about. One key is fine. More than one IV is fine too, but there are ways to do it with just one. You will have to worry about IVs either way. Don't use ECB mode.
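As a rough sketch of that advice (AES-GCM, one key, a fresh random nonce prepended to each chunk) using Python's `cryptography` package; the helper names are made up:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def seal(key: bytes, chunk: bytes) -> bytes:
    nonce = os.urandom(12)                 # 96-bit nonce; must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, chunk, None)

def open_sealed(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, None)   # raises InvalidTag if tampered with

key = AESGCM.generate_key(bit_length=128)  # the single key discussed above
blob = seal(key, b"one chunk of sensitive data")
assert open_sealed(key, blob) == b"one chunk of sensitive data"
```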

Why is encrypting necessary for security after hashing in the UMAC (Universal Message Authentication Code) algorithm?

On the Wikipedia for UMAC, https://en.wikipedia.org/wiki/UMAC, it states:
The resulting digest or fingerprint is then encrypted to hide the identity of the hash function used.
Further, in this paper, http://web.cs.ucdavis.edu/~rogaway/papers/umac-full.pdf, it states:
A message is authenticated by hashing it with the shared hash function and then encrypting the resulting hash (using the encryption key).
My question is, if the set of hash functions H is large enough, and the number of hash buckets |B| is large enough, why do we need to encrypt -- isn't the secret hash secure enough?
For example, take the worst case scenario where every client is sending the same, short content, like "x". If we hash to 32 bytes and our hash depends on a secret 32 byte hash key, and the hashes exhibit uniform properties, how could an attacker ever hope to learn the secret hash key of any individual client, even without encryption?
And, if the attacker doesn't learn the key, how could the attacker ever hope to maliciously alter the message contents?
Thank you!
I don't know much about UMAC specifically, but:
Having a rainbow table for a specific hash function defeats any encryption you have put on the message, so instead of a single attack surface you now have two.
As computational power increases over time, it becomes more and more feasible to recover the plaintext of the message, so PFS (https://en.wikipedia.org/wiki/Forward_secrecy) will never be possible if you leave the MAC unencrypted.
On top of all this, if you can figure out a single plaintext message from a MAC value, you get substantially closer to decrypting the rest of the message by gaining information about the PRNG, the context of the other data, the IV, etc.

AES256 CBC + HMAC SHA256 ensuring confidentiality *and* authentication?

I'm thinking of using AES256 CBC + HMAC SHA-256 as a building block for messages that ensures both confidentiality and authentication.
In particular, consider this scenario:
Alice is in possession of a public key belonging to Bob (the key exchange and algorithm are outside the scope of this question). Alice has an identifying key K, also shared with Bob, that she can use to identify herself. Only Alice and Bob know the key K.
Alice encrypts (nonce || K) using Bob's public key.
Bob decrypts the packet and now has K and the nonce.
Bob uses SHA-256 with SHA256(K || nonce) to yield a K(e) of 256 bits.
Bob uses SHA-256 with SHA256(K || nonce + 1) to yield a K(s) of 256 bits.
Now for every packet Bob wishes to send Alice he performs the following:
Create a new random 128-bit IV.
Encrypt the message using the IV and K(e) as the key.
Create a SHA-256 HMAC with K(s) as the key and (IV || encrypted message) as the data.
Finally, send (IV || HMAC || ciphertext) to Alice.
Alice has also calculated K(e) and K(s), and follows the following procedure when receiving data from Bob:
Split the message into IV, ciphertext and HMAC.
Calculate the HMAC using K(s), IV and ciphertext.
Compare the calculated HMAC with the HMAC sent. If they match, Alice considers the message authenticated as one sent by Bob; otherwise it is discarded.
Alice decrypts the message using K(e).
Does this protocol ensure that Alice only decrypts messages from Bob, assuming that no one other than Bob can read the encrypted message that Alice sends him encrypted using his public key?
I.e., do messages constructed in this manner ensure both confidentiality and authentication?
Note: If the protocol requires Bob to send multiple messages, this scheme needs a slight modification to avoid replay attacks.
P.S. I am aware of AES-GCM/CCM, but this scheme would work with the basic AES, SHA and HMAC algorithms found in most crypto packages. This solution might also be slower, but that too is outside the scope of the question.
Basically you are recreating SSL/TLS. This implies the usual caveats about building your own protocol, and you are warmly encouraged to use TLS with an existing library instead of rewriting your own.
That being said, using AES with CBC for encryption, and HMAC for integrity, is sound. There are combined encryption+integrity modes (that you are aware of), and CBC+HMAC is kind of "old school", but it cannot hurt. You are doing things in the "science-approved" way: encrypt, then MAC the encrypted string (and you do not forget the IV: forgetting the IV is the classical mistake).
Your key derivation may be somewhat weak. It would be fine if SHA-256 behaved like a perfect random oracle, but it is known that SHA-256 does not behave like a random oracle (because of the so-called length-extension attack). This is similar to the reason why HMAC is HMAC, with two nested hash function invocations, instead of simply hashing (once) the concatenation of the MAC key and the data. TLS uses a specific key derivation function (called "the PRF" in the TLS specification) which should avoid any trouble. That function is built over SHA-256 (actually, over HMAC/SHA-256) and can be implemented around any typical SHA-256 implementation.
(I am not saying that I know how to attack your key derivation process; only that this is a tricky thing to make properly, and that its security may be assessed only after years of scrutiny from hundreds of cryptographers. Which is why reusing functions and protocols which have already been thoroughly examined is basically a good idea.)
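For illustration only, a sketch of deriving K(e) and K(s) with HKDF (an HMAC-based key-derivation function from RFC 5869) rather than raw SHA-256 concatenation; the `info` label is an invented example:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_keys(k: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    # One HKDF call yields both keys; the HMAC construction inside
    # HKDF sidesteps SHA-256's length-extension problem.
    okm = HKDF(
        algorithm=hashes.SHA256(),
        length=64,                                  # 2 x 256 bits
        salt=nonce,
        info=b"example: K(e) and K(s) derivation",  # invented label
    ).derive(k)
    return okm[:32], okm[32:]                       # K(e), K(s)
```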
In TLS there are two nonces, called the "client random" and the "server random". In your proposal you only have the "client random". What you lose here, security-wise, is kind of unclear. A cautious strategy would be to include a server random (i.e. another nonce chosen by Bob). The kind of things we want to avoid is when Alice and Bob run the protocol in both directions, and an attacker feeds messages from Alice to Alice herself. Complete analysis of what an attacker could do is complex (it is a whole branch of cryptography); generally speaking, nonces in both directions tend to avoid some issues.
If you send several packets, then you may have issues with lost packets, duplicated packets ("replay attacks"), and packets arriving out of order. In the context of TLS this should not "normally" happen, because TLS is used over a medium which already ensures (under normal conditions, not counting active attacks) that data is transferred in strict order. Thus, TLS includes a sequence number in the data which goes into the MAC. This detects any alteration from an attacker, including replays, lost records and record reordering. If possible, you should also use a sequence number.
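A hedged sketch of folding such a sequence number into the HMAC step of the scheme above (the names are mine):

```python
from cryptography.hazmat.primitives import hashes, hmac

def tag_record(mac_key: bytes, seq: int, iv: bytes, ciphertext: bytes) -> bytes:
    # A monotonically increasing sequence number inside the MAC'd data
    # makes replayed, dropped or reordered records fail verification.
    t = hmac.HMAC(mac_key, hashes.SHA256())
    t.update(seq.to_bytes(8, "big") + iv + ciphertext)
    return t.finalize()
```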
The answer to the question as stated is no, there is no guarantee that Alice only decrypts messages from Bob, but that's only because you didn't stipulate that only Bob knows K. If Alice and Bob are the only two people who know K, then the crux of the question is whether your key generation protocol is sound. (We can ignore the rest, I believe, because you're just using HMAC-SHA256 and AES256 as they are intended to be used.)
The generation protocol isn't bad, but it can be improved. The accepted way to create keys from shared secrets is to use a "key derivation function". These functions use a hash in a similar way to what you have done here, but they are also purposely slow, to inhibit brute-force attacks. PBKDF2 seems to be what you want, as it (a) can derive 512 bits of key data (or more), and (b) can be built from the primitives you have available, namely SHA-256 and HMAC-SHA256.
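A sketch of that PBKDF2 derivation with Python's `cryptography` package; the iteration count and the way the 512 bits are split are illustrative choices, not prescriptions from the answer:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_512_bits(shared_secret: bytes, salt: bytes) -> tuple[bytes, bytes]:
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(),  # PBKDF2 built on HMAC-SHA256
        length=64,                  # 512 bits of key material
        salt=salt,
        iterations=600_000,         # deliberately slow; tune to your hardware
    )
    okm = kdf.derive(shared_secret)
    return okm[:32], okm[32:]       # e.g. one encryption key, one MAC key

salt = os.urandom(16)               # random salt, stored/sent with the ciphertext
```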
If you don't want to use PKI, take a look at TLS-PSK. It would seem to solve the exact problem you are solving yourself. See RFC 4279 (and 5487 for additional ciphersuites).

Does encryption guarantee integrity?

To build a secure system, can we assume that encryption guarantees integrity before we start secure programming?
Does this hold for both symmetric and public-key encryption?
If not, what are the vulnerabilities? Can you give an example?
No. This is easy to see if you consider the one-time pad, a simple (theoretically) perfectly secure system.
If you change any bit of the output, a bit of the clear text will change, and the recipient has no way to detect this.
This is an obvious case, but the same conclusion applies to most encryption systems. They only provide for confidentiality, not integrity.
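The one-time-pad case is easy to demonstrate in a few lines of plain Python, assuming the attacker knows (or guesses) the original plaintext:

```python
import os

msg = b"pay alice 100"
pad = os.urandom(len(msg))                                  # the one-time pad
ct = bytes(m ^ p for m, p in zip(msg, pad))                 # encrypt: XOR with pad

# The attacker never sees the pad, only the ciphertext, yet can
# turn the message into any same-length text of their choosing:
delta = bytes(a ^ b for a, b in zip(b"pay alice 100", b"pay mallory 1"))
forged = bytes(c ^ d for c, d in zip(ct, delta))

# The recipient decrypts the forgery without noticing anything:
assert bytes(c ^ p for c, p in zip(forged, pad)) == b"pay mallory 1"
```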
Thus, you may want to add a digital signature. Interestingly, when using public key cryptography, it is not sufficient to sign then encrypt (SE), or to encrypt then sign (ES). Both of these are vulnerable to replay attacks. You have to either sign-encrypt-sign or encrypt-sign-encrypt to have a generally secure solution. This paper explains why in detail.
If you use SE, the recipient can decrypt the message, then re-encrypt it to a different recipient. This then deceives the new recipient about the sender's intended recipient.
If you use ES, an eavesdropper can remove the signature and add their own. Thus, even though they can't read the message, they can take credit for it, pretending to be the original sender.
In short, the answer is no. Message integrity and secrecy are different goals and require different tools.
Let's take a simple coin flip as an example, where we are betting on the result. The result is a single boolean, and I encrypt it using a stream cipher like RC4, which yields one encrypted bit, and I email it to you. You don't have the key, and I ask you to email me back the answer.
A few attacks can happen in this scenario.
1) An attacker could modify the bit in transit: if it was a 0 there is a 50% chance it will become a 1, and vice versa. This is because RC4 produces a PRNG stream that is XOR'ed with the plaintext to produce the ciphertext, similar to a one-time pad.
2) Another possibility is that I could provide you with a different key to make sure your answer is wrong. This is easy to brute-force: I just keep trying keys until I get the proper bit flip.
A solution is to use a block cipher in CMAC mode. A CMAC is a message authentication code similar to an HMAC, but it uses a block cipher instead of a message digest function. The secret key (K) is the same key that you use to encrypt the message. This adds n+1 blocks to the ciphertext. In my scenario this prevents both attacks 1 and 2. An attacker cannot flip a single bit, because the plaintext is padded; even if the message only takes up one bit, I must transmit a minimum of one block using a block cipher. The additional authentication block prevents me from changing the key, and it also provides integrity against anyone attempting to modify the ciphertext in transit (although this would be very difficult to do in practice, the additional layer of security is useful).
WPA2 uses AES-CMAC for these reasons.
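A minimal AES-CMAC sketch with Python's `cryptography` package (the function name is mine):

```python
from cryptography.hazmat.primitives.cmac import CMAC
from cryptography.hazmat.primitives.ciphers import algorithms

def aes_cmac(key: bytes, message: bytes) -> bytes:
    c = CMAC(algorithms.AES(key))  # a MAC built from the block cipher itself
    c.update(message)
    return c.finalize()            # 128-bit tag appended to the ciphertext

# On receipt, call .verify(tag) on a fresh CMAC object over the same data;
# it performs a constant-time check and raises InvalidSignature on mismatch.
```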
If data integrity is a specific concern to you, you should use a cryptographic hash function, combined with an encryption algorithm.
But it really does come down to using the correct tool for the job. Some encryption algorithms may provide some level of checksum validation built-in, others may not.

authentication token is encrypted but not signed - weakness?

Through the years I've come across this scenario more than once. You have a bunch of user-related data that you want to send from one application to another. The second application is expected to "trust" this "token" and use the data within it. A timestamp is included in the token to prevent a theft/re-use attack. For whatever reason (let's not worry about it here) a custom solution has been chosen rather than an industry standard like SAML.
To me it seems like digitally signing the data is what you want here. If the data needs to be secret, then you can also encrypt it.
But what I see a lot is that developers will use symmetric encryption, e.g. AES. They are assuming that in addition to making the data "secret", the encryption also provides 1) message integrity and 2) trust (authentication of source).
Am I right to suspect that there is an inherent weakness here? At face value it does seem to work, if the symmetric key is managed properly. Lacking that key, I certainly wouldn't know how to modify an encrypted token, or launch some kind of cryptographic attack after intercepting several tokens. But would a more sophisticated attacker be able to exploit something here?
Part of it depends on the encryption mode. If you use ECB (shame on you!) I could swap blocks around, altering the message; see the sketch after this answer. Stack Overflow got hit by this very bug.
Less threatening: without any integrity checking, I could perform a man-in-the-middle attack and swap all sorts of bits around, and you would receive it and attempt to decrypt it. You'd fail, of course, but the attempt may be revealing. There are also side-channel attacks: those by "Bernstein (exploiting a combination of cache and microarchitectural characteristics) and Osvik, Shamir, and Tromer (exploiting cache collisions) rely on gaining statistical data based on a large number of random tests." [1] The footnoted article is by a cryptographer of greater note than I, and he advises reducing the attack surface with a MAC:
if you can make sure that an attacker who doesn't have access to your MAC key can't ever feed evil input to a block of code, however, you dramatically reduce the chance that he will be able to exploit any bugs
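To demonstrate the ECB block swapping mentioned above, a sketch in Python with the `cryptography` package (the 16-byte token fields are invented):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
# Two 16-byte fields: the role block first, the name block second.
token = b"role=user;pad..." + b"name=mallory;pa."
enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ct = enc.update(token) + enc.finalize()

# Without knowing the key, an attacker swaps the two ciphertext blocks:
swapped = ct[16:32] + ct[0:16]
dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
pt = dec.update(swapped) + dec.finalize()
assert pt == b"name=mallory;pa." + b"role=user;pad..."  # decrypts cleanly, reordered
```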
Yup. Encryption alone does not provide authentication. If you want authentication, then you should use a message authentication code such as HMAC, or digital signatures (depending on your requirements).
There are quite a large number of attacks that are possible if messages are just encrypted but not authenticated. Here is just a very simple example. Assume that messages are encrypted using CBC. This mode uses an IV to randomize the ciphertext, so that encrypting the same message twice does not result in the same ciphertext. Now look at what happens during decryption if the attacker modifies just the IV but leaves the remainder of the ciphertext as is. Only the first block of the decrypted message will change. Furthermore, exactly the bits changed in the IV change in the message. Hence the attacker knows exactly what will change when the receiver decrypts the message. If that first block was, for example, a timestamp, and the attacker knows when the original message was sent, then he can easily fix the timestamp to any other time, just by flipping the right bits (sketched below).
Other blocks of the message can also be manipulated, though this is a little trickier. Note also that this is not just a weakness of CBC: other modes like OFB and CFB have similar weaknesses. So expecting encryption alone to provide authentication is a very dangerous assumption.
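Here is the IV manipulation sketched in Python (the timestamp layout is invented; any known 16-byte first block works the same way):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(16), os.urandom(16)
msg = b"ts=1700000000000;"          # the first 16-byte block carries a timestamp
msg += b" " * (32 - len(msg))       # pad to two full blocks for the demo
enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ct = enc.update(msg) + enc.finalize()

# XOR the IV with (old block) XOR (desired block); the ciphertext is untouched:
delta = bytes(a ^ b for a, b in zip(b"ts=1700000000000", b"ts=1111111111111"))
evil_iv = bytes(i ^ d for i, d in zip(iv, delta))

dec = Cipher(algorithms.AES(key), modes.CBC(evil_iv)).decryptor()
assert (dec.update(ct) + dec.finalize())[:16] == b"ts=1111111111111"
```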
A symmetric encryption approach is only as secure as the key. If both systems can be given the key and can hold it securely, this is fine. Public-key cryptography is certainly a more natural fit.
