How to securely encrypt many similar chunks of data with the same key? - security

I'm writing an application that requires the following security features: when launching the CLI version, you pass a key to it. An undefined number of chunks of data, all of the same size, will be generated. They need to be stored remotely. This will be sensitive data. I want it to be encrypted and accessible only with the one key that was passed in initially. My question is, which algorithm will suit me? I read about AES but it says that
When you perform an encryption operation you initialize your Encryptor
with this key, then generate a new, unique Initialization Vector for
each record you’re going to encrypt.
which means I'll have to pass a key and an IV, rather than just the key and this IV should be unique for each generated chunk of data (and there is going to be a lot of those).
If the answer is AES, which encryption mode is it?

You can use any modern symmetric algorithm. The amount of data and the handling of IVs don't affect the choice, because those concerns apply no matter which symmetric algorithm you pick.
AES-128 is a good choice: it isn't limited by law in the US, and a 128-bit key is infeasible to brute force. If you aren't in the US, you could use AES-256 if you wanted to, but some Java runtimes require installing additional policy files to enable it.
You say you are going to generate n many chunks of data (or retrieve, whatever).
You could encrypt them all at once in CBC mode, which keeps AES as a block cipher, and you'll only end up with one IV. You'll need an HMAC here to protect the integrity. This isn't the most modern way, however.
You should use AES in GCM mode, which turns AES into a stream cipher. You'll still have a single IV (nonce), but the ciphertext will also be authenticated.
IVs should be generated randomly and prepended to the ciphertext. You can then retrieve the IV when it is time to decrypt. Remember: IVs aren't secret, they just need to be random!
EDIT: As pointed out below, IVs should be generated using a crypto-secure random number generator. IVs for CTR based modes, like GCM, only need to be unique.
In summary, you don't need to worry about this. One key is fine. More than one IV is fine too, but there are ways to do it with just one. You will have to deal with IVs either way. Don't use ECB mode.
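To make the "random IV, prepended to the ciphertext" advice concrete, here is a minimal sketch of the framing only. The names `new_nonce`, `frame` and `unframe` are hypothetical, the 12-byte length is the nonce size commonly used with GCM, and the ciphertext is a placeholder: the actual AES-GCM call would come from whatever crypto library you use.

```python
import os

NONCE_LEN = 12  # 96-bit nonce, the size commonly used with GCM (assumption)


def new_nonce() -> bytes:
    """Fresh nonce from the OS's cryptographically secure generator."""
    return os.urandom(NONCE_LEN)


def frame(nonce: bytes, ciphertext: bytes) -> bytes:
    """Store the (non-secret) nonce in front of the ciphertext."""
    if len(nonce) != NONCE_LEN:
        raise ValueError("unexpected nonce length")
    return nonce + ciphertext


def unframe(blob: bytes):
    """Split a stored blob back into (nonce, ciphertext) for decryption."""
    if len(blob) < NONCE_LEN:
        raise ValueError("blob too short to contain a nonce")
    return blob[:NONCE_LEN], blob[NONCE_LEN:]


# Usage: one fresh nonce per chunk, same key for every chunk.
placeholder_ciphertext = b"...output of your AES-GCM library..."
stored = frame(new_nonce(), placeholder_ciphertext)
nonce, ciphertext = unframe(stored)
```

Only the key has to be passed on the command line; each stored chunk carries its own nonce.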

Related

Why is encrypting necessary for security after hashing in the UMAC (Universal Message Authentication Code) algorithm?

On the Wikipedia for UMAC, https://en.wikipedia.org/wiki/UMAC, it states:
The resulting digest or fingerprint is then encrypted to hide the
identity of the hash function used.
Further, in this paper, http://web.cs.ucdavis.edu/~rogaway/papers/umac-full.pdf, it states:
A message is authenticated by hashing it with the shared hash function
and then encrypting the resulting hash (using the encryption key).
My question is, if the set of hash functions H is large enough, and the number of hash buckets |B| is large enough, why do we need to encrypt -- isn't the secret hash secure enough?
For example, take the worst case scenario where every client is sending the same, short content, like "x". If we hash to 32 bytes and our hash depends on a secret 32 byte hash key, and the hashes exhibit uniform properties, how could an attacker ever hope to learn the secret hash key of any individual client, even without encryption?
And, if the attacker doesn't learn the key, how could the attacker ever hope to maliciously alter the message contents?
Thank you!
I don't know much about UMAC specifically but:
Having a rainbow table for a specific hash function defeats any encryption you have put on the message so instead of having a single attack surface, you now have two
As computational powers increase with time you will be more and more likely to figure out the plaintext of the message so PFS (https://en.wikipedia.org/wiki/Forward_secrecy) will never be possible if you leave the MAC unencrypted.
On top of all this, if you can figure out a single plaintext message from a MAC value, you exponentially get closer to decrypting the rest of the message by getting some information about the PRNG, context of the other data, IV, etc.

How Do You Ensure Data Security of Small Data?

My Question:
What is the Best Approach to Ensure Data Security of Small Data? Below I present a concern around symmetric and asymmetric encryption. I'm curious if there is a way to do asymmetric encryption on small data with an equivalent of some sort of "salting" to actually make it secure? If so, how do you pick a "salt" and implement it properly? Or is there a better way to handle this?
Explanation of My Concern:
When encrypting something that has "bulk", it seems to me that asymmetric encryption approaches are pretty secure. My concern is with small fields of data, say a credit card number, password, or social security number in a database. There the data being encrypted has a fixed length and format. That being said, a hacker could encrypt every possible social security number (10^9 possibilities) with the public key and compare the results to the values stored in the db. Once they find a match, they know the real number. Similar attacks can be mounted against the other data types. Because of this, I decided to avoid symmetric methods like mysql's AES_ENCRYPT() built-in function, however now I'm questioning asymmetric as well.
How do we properly protect small data?
Salting is normally used for hash algorithms, but I need to be able to get the data back after. I thought about maybe having some "base bulk text", then append the sensitive data to the end. Do the encrypt on that concatenation. Decryption would reverse the process, by decrypting then stripping off the "base bulk text". If the hacker can figure out the base bulk text then I don't see how this would add any additional security.
Picking other data to include in the encryption, to act like a salt derived from other fields in the database (hash values of those fields, or a combination thereof, yield the same issue), also seems vulnerable, as hackers could run through combinations similar to the attack mentioned above to perform a more intelligent form of "brute force". That being said, I'm unsure of how to properly secure the small data, and my searches have not helped me.
What is the best approach to ensure data security of small data?
If you are encrypting with an RSA public key, there is no need to salt the small data. Use OAEP padding. The padding introduces the equivalent of random salt. Try it: encrypt the credit card number twice with the same RSA public key, using OAEP padding, and look at the result. You will see two different values, indistinguishable from random data.
If you are encrypting with an AES symmetric key, then you can use a random IV per value, and store the IV in the clear, publicly, next to the ciphertext. Try encrypting the credit card number twice with AES in CBC mode, for example, with a unique, 16-byte (cryptographically strong) IV each time. You will see two different ciphertexts. Now, assuming a 16-byte AES key, try to brute force those two outputs without using any knowledge of the key. Use just the ciphertexts and the 16-byte IVs, and try to discover the credit card number.
EDIT: It's beyond the scope of the question, but since I mention it in the comment, if a client can send you arbitrary ciphertext to decrypt ("decrypt this credit card info"), you must not let the client see any difference between a padding error on decryption, vs. any other error on decryption. Look up "padding oracle".
If you need to encrypt data, use a symmetric key algorithm; AES is a good choice. Use a mode such as CBC with a random IV; this ensures that encrypting the same data twice produces different output.
Add PKCS#7 née PKCS#5 for padding.
If there is real value in the data hire a cryptographic domain expert to help with the design and later validation.
Asymmetric encryption is most useful for communicating encrypted data between two parties. For example, you have a mobile application that accepts credit card numbers and needs to transmit them to the server for processing. You want the public application (which is inherently insecure) to be able to encrypt the data and only you should be able to decrypt it in your secure environment.
Storage is a completely different matter. You're not communicating anything to or from an insecure party, you are the only one dealing with the data. You don't want to give everyone a way to decrypt things if they breach your storage, you want to make things as difficult as possible. Use a symmetric algorithm for storage and include a unique Initialization Vector with each encrypted value as a hurdle to decryption if the storage is compromised.
PCI-DSS requires that you use Strong Cryptography, which they define as follows.
At the time of publication, examples of industry-tested and accepted standards and algorithms for minimum encryption strength include AES (128 bits and higher), TDES (minimum triple-length keys), RSA (2048 bits and higher), ECC (160 bits and higher), and ElGamal (2048 bits and higher). See NIST Special Publication 800-57 Part 1 (http://csrc.nist.gov/publications/) for more guidance on cryptographic key strengths and algorithms.
Beyond that, they are primarily concerned with key management, and with good reason. Breaching your storage won't help as much as actually having the means to decrypt your data, so ensure that your symmetric key is managed correctly and in accordance with their requirements.
There is also a field of study called Format-preserving encryption which seeks to help legacy systems maintain column-width and data types (a social security number is a 9-digit number even after encryption, etc), while allowing values to be securely encrypted. In this way the encryption can be created at a low level of the legacy system without breaking all of the layers above it which depend on a particular data format.
It is sometimes called "small-space encryption" and the idea is also explained in the paper How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle, which gives an introduction to the topic and presents a specific algorithm devised by the authors. The Wikipedia article mentions many other algorithms with a similar purpose.
If you'd prefer a video explanation of the topic, see The Mix-and-Cut Shuffle: Small Domain Encryption Secure Against N Queries talk from Crypto 2013. It includes graphics detailing how several algorithms work and some early research into the security of such designs.
When I encrypt short messages, I add a relatively long random salt to them before encryption. Edit: others suggest prepending the salt to the payload.
So, for example, if I encrypt the fake credit card number 4242 4242 4242 4242, what I actually encrypt is
tOH_AN2oi4MkLC3lmxxRWaNqh6--m42424242424242424
the first time, and
iQe5xOZPIMjVWfrDDip244ZGhCy2U142424242424242424
the second time, and so forth.
This random salting significantly discourages the lookup table approach you describe. Many operating systems furnish sources of high-quality random numbers, like /dev/random on *nix and the RNGCryptoServiceProvider class on Windows (.NET).
It's still not OK to hold payment card data in that way without defense in depth and PCI data security certification.
Edit: Some encryption schemes handle this salting as part of their normal functioning.

Better practice with PBKDF2, AES, IV and salt

So, I'm encrypting a list of documents with the AES algorithm. I use PBKDF2 to derive the key from the user's password. I have a few questions about storing the data and the IV/salt:
How to store documents:
Encrypt all documents with one AES key, IV and salt
Encrypt each document with one AES key, but separate IV and salt
How to store/retrieve IV and salt:
Get IV from PBKDF2 (like AES key) and no need to store it somewhere
Generate IV before every document encryption and store as plain text
For the salt, I think, there is no other option - I need to store it as plain text anyway
As I understand from that article (http://adamcaudill.com/2013/04/16/1password-pbkdf2-and-implementation-flaws/) and some others:
It's OK to store the IV and salt as plain text, as sometimes an attacker doesn't even need to know them
A different IV can only "distort" the first cipher block (for CBC mode), but not all of them, so it doesn't bring much security to the AES method.
Each document should have its own IV and salt. Since the salt varies, so will the AES key for each document. You should never encrypt two documents with the same key and IV. In the most common mode (CBC), reusing IV+Key leads to some reduction in security. In some modes (CTR), reusing IV+Key destroys the security of the encryption. (The "IV" in CTR is called the "nonce," but it is generally passed to the thing called "IV" in most encryption APIs.)
Typically, you generate the IV randomly, and store it at the start of the file in plain text. If you use PBKDF2 to generate the IV, you need another salt (which you then need to store anyway), so there's not much point to that.
You also need to remember that most common modes of AES (most notably CBC) provide no protection against modification. If someone knows what your plaintext is (or can guess what it might be), they can modify your ciphertext to decrypt to some other value they choose. (This is the actual meaning of "If you have the wrong IV when you decrypt in CBC mode it corrupts the first block" from the article. They say "corrupt" like it means "garbage," but you can actually cause the first block to corrupt in specific ways.)
The way you fix this problem is with either authenticated encryption modes (like CCM or EAX), or you add an HMAC.
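As an illustration of the per-document salt and IV described in this answer, here is a minimal sketch using only Python's standard library. The helper names, the header layout (salt followed by IV) and the iteration count are assumptions chosen for the example; the AES encryption itself and the HMAC or authenticated mode are left to your crypto library.

```python
import hashlib
import os

SALT_LEN = 16
IV_LEN = 16            # AES block size, as used for a CBC IV
ITERATIONS = 200_000   # assumed work factor; tune for your hardware


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit AES key from the password and a per-document salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


def new_document_params(password: str):
    """Fresh random salt and IV for one document, plus the derived key."""
    salt = os.urandom(SALT_LEN)
    iv = os.urandom(IV_LEN)
    return salt, iv, derive_key(password, salt)


def build_header(salt: bytes, iv: bytes) -> bytes:
    """Salt and IV are not secret, so they go in plain text at the file start."""
    return salt + iv


def parse_header(blob: bytes):
    """Recover (salt, iv, remaining ciphertext) from a stored document."""
    if len(blob) < SALT_LEN + IV_LEN:
        raise ValueError("document too short to contain a header")
    salt = blob[:SALT_LEN]
    iv = blob[SALT_LEN:SALT_LEN + IV_LEN]
    return salt, iv, blob[SALT_LEN + IV_LEN:]
```

On decryption, `parse_header` yields the salt, and `derive_key(password, salt)` reproduces the same key, so nothing besides the password has to be remembered.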

RSA: Encrypting message using multiple keys

Is it possible to get additional security by encrypting a message using 2 or more RSA keys?
EDIT: A few clarifications:
The context I am most interested in doing this for is encrypting a randomly generated symmetric key.
I don't want to limit the question to encrypting twice in a row; the purpose is to avoid the high computational cost of large RSA keys. Using less straightforward tactics such as breaking the message into parts and encrypting them separately should be considered as an option.
It should be assumed that getting only part of the message is acceptable.
If you know of any publications where this is discussed specifically by an expert, or algorithms that use multiple RSA keys, then please contribute.
No.
It is not safe to do thought experiments regarding cryptography. You are advised to keep narrowly to the path trodden by the experts.
And when the experts want to protect something better, they use a bigger key-size (at least 2048 bits is required, smaller certificates are insufficient for any peace of mind) or use elliptic curve certificates in preference to RSA.
Incidentally, remember that your message body is typically encrypted with a symmetric cipher and a random key, and that just this random key is encrypted with the public key of the recipient. Double-encrypting this secret key won't make it longer, and won't impact an attacker's ability to brute-force it.
Quantum computing - I mention it only as an exciting aside, you need not factor this into your choice - promises interesting things for key sizes: RSA keys will be wiped out by Shor's algorithm, but symmetric keys will only have their effective length halved by Grover's algorithm (a 128-bit key will be equivalent to a 64-bit one, so will be crackable). There is of course debate about whether such quantum machines can be implemented, etc. :)
No.
If key A is compromised, then a message encrypted with A+B is still protected, but outside that special case you get no additional benefit.
Composing ciphers
Say you have an encryption function E(M, K), where M is the plaintext message and K is the key. Say no known vulnerabilities exist in E.
You generate two completely unrelated keys K1 and K2.
If you compose them in the form E(E(M, K1), K2), it is guaranteed that you cannot lose security this way. If it were possible to lose security by encrypting E(M, K1) again, be it with K2 or any other key, then the cipher is broken, because an attacker could just compute E(E(M, K1), KF), where KF is any key the attacker wishes to choose.
For more info see here.
Encrypting every second block with a different key
The implications here are obvious. Assuming you are using properly composed cryptographic primitives with both encryption function:key combinations, if you encrypt every second block with a different key out of the set of two keys, the attacker can only decrypt the blocks he has the key for.
Yes!
But do not use raw encryption. Use an RSA encryption scheme. Instead of re-encrypting the encrypted message with the second key, which might have a weakening effect (I don't know), use a secret sharing algorithm to split your secret in two. A secret sharing algorithm makes it possible to split a secret into n pieces and ensures that an attacker who manages to get n-1 pieces knows nothing about the secret. So don't simply cut the secret in two.
You can then have more than 2 RSA keys. Another powerful property of secret sharing is that it is possible to spread the secret over n pieces and require only m pieces, with m smaller than n, to recover the secret. This makes the secret recovery more robust to loss of pieces.
Look here for more information on shared secret: http://en.wikipedia.org/wiki/Shared_secret
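A minimal sketch of the simplest form of this idea, assuming an XOR-based split where all n pieces are required (the function names are hypothetical). It does not implement the m-of-n threshold variant mentioned above, which needs a scheme such as Shamir's. Each piece would then be encrypted under a different RSA public key.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, n: int) -> list:
    """Split `secret` into n pieces; any n-1 of them look like random noise."""
    if n < 2:
        raise ValueError("need at least two pieces")
    # n-1 pieces are pure randomness ...
    pieces = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # ... and the last one is the secret XORed with all of them.
    pieces.append(reduce(xor_bytes, pieces, secret))
    return pieces


def combine_pieces(pieces: list) -> bytes:
    """XOR all n pieces together to recover the secret."""
    return reduce(xor_bytes, pieces)


# Usage: split a random 128-bit symmetric key into three pieces.
key = secrets.token_bytes(16)
pieces = split_secret(key, 3)
recovered = combine_pieces(pieces)
```

This contrasts with cutting the key into halves, where each half directly reveals half of the key bits.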
In addition to the answers given, it also simply doesn't work unless you do some patching. Very simply, one of the moduli must be larger than the other. If you perform RSA mod the larger modulus first and mod the smaller last, you lose information and cannot guarantee successful decryption. The obvious patch is to always encrypt with the smaller modulus first. Of course, you have to perform decryption in the opposite order. Another simple patch is to choose moduli that are very close together in size, so that the probability of encountering a ciphertext that cannot be uniquely decrypted is vanishingly small.
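The modulus-ordering problem can be seen with toy numbers. This is a sketch using textbook RSA with tiny, insecure primes chosen only for illustration; the helper names are hypothetical and real RSA would also use padding such as OAEP.

```python
def make_key(p: int, q: int, e: int):
    """Textbook RSA key (n, e, d) from two small primes. Toy sizes only."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return n, e, d


SMALL = make_key(61, 53, 17)  # n = 3233
LARGE = make_key(89, 97, 5)   # n = 8633


def encrypt(m: int, key) -> int:
    n, e, _ = key
    return pow(m, e, n)


def decrypt(c: int, key) -> int:
    n, _, d = key
    return pow(c, d, n)


def round_trip(m: int, first, second) -> int:
    """Encrypt with `first` then `second`; decrypt in the opposite order."""
    c = encrypt(encrypt(m, first), second)
    return decrypt(decrypt(c, second), first)


# Smaller modulus first: the inner ciphertext always fits in the outer modulus.
ok = round_trip(6, SMALL, LARGE)

# Larger modulus first: 6**5 mod 8633 = 7776, which exceeds 3233, so the
# outer encryption reduces it mod 3233 and the original value is lost.
broken = round_trip(6, LARGE, SMALL)
```

Here `ok` equals the original message 6, while `broken` does not.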

authentication token is encrypted but not signed - weakness?

Through the years I've come across this scenario more than once. You have a bunch of user-related data that you want to send from one application to another. The second application is expected to "trust" this "token" and use the data within it. A timestamp is included in the token to prevent a theft/re-use attack. For whatever reason (let's not worry about it here) a custom solution has been chosen rather than an industry standard like SAML.
To me it seems like digitally signing the data is what you want here. If the data needs to be secret, then you can also encrypt it.
But what I see a lot is that developers will use symmetric encryption, e.g. AES. They are assuming that in addition to making the data "secret", the encryption also provides 1) message integrity and 2) trust (authentication of source).
Am I right to suspect that there is an inherent weakness here? At face value it does seem to work, if the symmetric key is managed properly. Lacking that key, I certainly wouldn't know how to modify an encrypted token, or launch some kind of cryptographic attack after intercepting several tokens. But would a more sophisticated attacker be able to exploit something here?
Part of it depends on the Encryption Mode. If you use ECB (shame on you!) I could swap blocks around, altering the message. Stackoverflow got hit by this very bug.
Less threatening - without any integrity checking, I could perform a man-in-the-middle attack and swap all sorts of bits around, and you would receive the result and attempt to decrypt it. You'd fail of course, but the attempt may be revealing. There are side-channel attacks: those by "Bernstein (exploiting a combination of cache and microarchitectural characteristics) and Osvik, Shamir, and Tromer (exploiting cache collisions) rely on gaining statistical data based on a large number of random tests." [1] The footnoted article is by a cryptographer of greater note than I, and he advises reducing the attack surface with a MAC:
if you can make sure that an attacker
who doesn't have access to your MAC
key can't ever feed evil input to a
block of code, however, you
dramatically reduce the chance that he
will be able to exploit any bugs
Yup. Encryption alone does not provide authentication. If you want authentication then you should use a message authentication code such as HMAC, or digital signatures (depending on your requirements).
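A minimal sketch of the HMAC option, using only Python's standard library. The token layout (base64 body, a dot, base64 tag), the `iat` field name and the 300-second lifetime are assumptions made for this example. The token here is authenticated but not encrypted; if the data must also be secret, encrypt it first and compute the MAC over the ciphertext.

```python
import base64
import hashlib
import hmac
import json
import time


def _tag(mac_key: bytes, body: bytes) -> bytes:
    return hmac.new(mac_key, body, hashlib.sha256).digest()


def issue_token(claims: dict, mac_key: bytes, now=None) -> str:
    """Serialize the claims with an issue timestamp and append an HMAC tag."""
    payload = dict(claims, iat=int(time.time() if now is None else now))
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return (base64.urlsafe_b64encode(body).decode("ascii") + "."
            + base64.urlsafe_b64encode(_tag(mac_key, body)).decode("ascii"))


def verify_token(token: str, mac_key: bytes, max_age: int = 300, now=None):
    """Return the claims if the tag is valid and the token is fresh, else None."""
    try:
        body_b64, tag_b64 = token.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        return None  # malformed token
    # Check the MAC before parsing or trusting anything inside the body.
    if not hmac.compare_digest(tag, _tag(mac_key, body)):
        return None
    payload = json.loads(body)
    age = int(time.time() if now is None else now) - payload["iat"]
    if age < 0 or age > max_age:
        return None  # expired, or timestamped in the future
    return payload
```

Because the timestamp is covered by the tag, an attacker without the MAC key cannot change it without the token being rejected.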
There are quite a large number of attacks that are possible if messages are just encrypted, but not authenticated. Here is just one very simple example. Assume that messages are encrypted using CBC. This mode uses an IV to randomize the ciphertext so that encrypting the same message twice does not result in the same ciphertext. Now look at what happens during decryption if the attacker just modifies the IV but leaves the remainder of the ciphertext as is. Only the first block of the decrypted message will change. Furthermore, exactly those bits changed in the IV change in the message. Hence the attacker knows exactly what will change when the receiver decrypts the message. If that first block was, for example, a timestamp and the attacker knows when the original message was sent, then he can easily set the timestamp to any other time, just by flipping the right bits.
Other blocks of the message can also be manipulated, though this is a little trickier. Note also that this is not just a weakness of CBC. Other modes like OFB and CFB have similar weaknesses. So expecting that encryption alone provides authentication is just a very dangerous assumption.
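The IV manipulation described above can be sketched in a few lines. In CBC the first plaintext block is the block-cipher decryption of the first ciphertext block XORed with the IV, so the trick does not depend on the cipher. To keep the example self-contained, a deliberately insecure toy function stands in for AES here; that stand-in, the function names and the timestamps are assumptions made for illustration.

```python
import os

BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES block encryption. NOT secure, illustration only."""
    return xor(block, key)[::-1]


def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    return xor(block[::-1], key)


def cbc_encrypt_first_block(key: bytes, iv: bytes, p1: bytes) -> bytes:
    return toy_block_encrypt(key, xor(p1, iv))


def cbc_decrypt_first_block(key: bytes, iv: bytes, c1: bytes) -> bytes:
    return xor(toy_block_decrypt(key, c1), iv)


def forge_iv(iv: bytes, known_p1: bytes, wanted_p1: bytes) -> bytes:
    """Attacker's step: needs the old IV and the known plaintext, not the key."""
    return xor(iv, xor(known_p1, wanted_p1))


key = os.urandom(BLOCK)             # known only to sender and receiver
iv = os.urandom(BLOCK)              # sent in the clear
original = b"2024-01-01T00:00"      # 16-byte timestamp the attacker can guess
c1 = cbc_encrypt_first_block(key, iv, original)

# The attacker swaps the IV and leaves the ciphertext block untouched.
evil_iv = forge_iv(iv, original, b"2031-12-31T23:59")
received = cbc_decrypt_first_block(key, evil_iv, c1)
```

The receiver decrypts `received` to the attacker's chosen timestamp, which is why a MAC over the IV and ciphertext (or an authenticated mode) is needed.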
A symmetric encryption approach is as secure as the key. If both systems can be given the key and hold the key securely, this is fine. Public key cryptography is certainly a more natural fit.

Resources