Best practice with PBKDF2, AES, IV and salt

So, I'm encrypting a list of documents with the AES algorithm. I use PBKDF2 to derive the key from the user's password. I have a few questions about storing the data and the IV/salt:
How to store documents:
Encrypt all documents with one AES key, IV and salt
Encrypt each document with one AES key, but a separate IV and salt
How to store/retrieve the IV and salt:
Derive the IV from PBKDF2 (like the AES key), so there is no need to store it anywhere
Generate an IV before encrypting each document and store it as plain text
For the salt, I think there is no option: I need to store it as plain text either way
As I understand from this article (http://adamcaudill.com/2013/04/16/1password-pbkdf2-and-implementation-flaws/) and some others:
It's OK to store the IV and salt as plain text, as sometimes an attacker doesn't even need to know them
A different IV only "distorts" the first cipher block (in CBC mode), not all of them, so it doesn't add much security to AES.

Each document should have its own IV and salt. Since the salt varies, so will the AES key for each document. You should never encrypt two documents with the same key and IV. In the most common mode (CBC), reusing IV+Key leads to some reduction in security. In some modes (CTR), reusing IV+Key destroys the security of the encryption. (The "IV" in CTR is called the "nonce," but it is generally passed to the thing called "IV" in most encryption APIs.)
Typically, you generate the IV randomly, and store it at the start of the file in plain text. If you use PBKDF2 to generate the IV, you need another salt (which you then need to store anyway), so there's not much point to that.
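A minimal Python sketch of the per-document approach, assuming a 16-byte random salt, a 16-byte random IV and PBKDF2-HMAC-SHA256 from the standard library. The AES call itself is left out (Python's standard library has no AES), so `ciphertext` stands for whatever bytes your AES library returns; `derive_key`, `new_document_params`, `pack` and `unpack` are hypothetical helper names, and the iteration count is an assumption.

```python
import hashlib
import secrets
from typing import Tuple

# Assumed parameters for this sketch; none of these values come from the answer.
SALT_LEN = 16          # random salt per document
IV_LEN = 16            # one AES block, the IV size for CBC
KEY_LEN = 32           # 256-bit AES key
ITERATIONS = 600_000   # PBKDF2 work factor; tune for your hardware


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a document's AES key from the user's password and that document's salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, KEY_LEN)


def new_document_params(password: str) -> Tuple[bytes, bytes, bytes]:
    """Fresh random salt and IV for one document, plus the key derived from that salt."""
    salt = secrets.token_bytes(SALT_LEN)
    iv = secrets.token_bytes(IV_LEN)
    return salt, iv, derive_key(password, salt)


def pack(salt: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Plain-text header (salt, then IV) followed by the ciphertext."""
    return salt + iv + ciphertext


def unpack(blob: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a stored document back into salt, IV and ciphertext."""
    salt = blob[:SALT_LEN]
    iv = blob[SALT_LEN:SALT_LEN + IV_LEN]
    return salt, iv, blob[SALT_LEN + IV_LEN:]
```

To decrypt, call `unpack`, re-derive the key with `derive_key(password, salt)`, and hand the key and IV to your AES library. Because every document has its own salt, two documents never share a key, even before the IV is considered.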
You also need to remember that most common modes of AES (most notably CBC) provide no protection against modification. If someone knows what your plaintext is (or can guess what it might be), they can modify your ciphertext to decrypt to some other value they choose. (This is the actual meaning of "If you have the wrong IV when you decrypt in CBC mode it corrupts the first block" from the article. They say "corrupt" as if it meant "garbage," but an attacker can actually make the first block change in specific, chosen ways.)
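The XOR algebra behind this can be checked without AES at all. In CBC, the first plaintext block is P1 = D_K(C1) XOR IV, so an attacker who knows P1 can replace it with any block P1' of their choosing by changing only the IV. The Python sketch below uses hypothetical helper names and models D_K(C1) by the value P1 XOR IV that CBC guarantees; the attacker never needs the key.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def forge_iv(iv: bytes, known_block: bytes, wanted_block: bytes) -> bytes:
    """Return an IV that makes the first CBC block decrypt to wanted_block.

    Decryption computes D_K(C1) XOR IV'. With IV' = IV XOR P1 XOR P1',
    that is (P1 XOR IV) XOR IV XOR P1 XOR P1' = P1'.
    """
    return xor(xor(iv, known_block), wanted_block)


iv = bytes(range(16))
p1 = b"PAY $0000100 TO "            # block the attacker knows or guesses
wanted = b"PAY $9999999 TO "        # block the attacker wants instead
block_cipher_output = xor(p1, iv)   # stands in for D_K(C1); the attacker never computes it
forged = forge_iv(iv, p1, wanted)
```

Only the IV changes here. The same trick works on later blocks by editing the previous ciphertext block, at the cost of turning that previous block into garbage, which is why a MAC or an authenticated mode is needed.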
The way you fix this problem is with either authenticated encryption modes (like CCM or EAX), or you add an HMAC.
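A minimal encrypt-then-MAC sketch in Python using only `hmac` and `hashlib`. It assumes a separate MAC key (for example, extra bytes derived alongside the AES key) and treats the IV and ciphertext as opaque bytes from your AES library; `add_mac` and `verify_mac` are hypothetical names. The tag covers the IV as well as the ciphertext, so tampering with the IV is detected too.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def add_mac(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: authenticate IV and ciphertext together, append the tag."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def verify_mac(mac_key: bytes, blob: bytes, iv_len: int = 16) -> bytes:
    """Check the tag in constant time before decrypting; return the ciphertext."""
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data was modified")
    return body[iv_len:]
```

Decryption should only start after `verify_mac` succeeds; an authenticated mode such as CCM, EAX or GCM gives the same guarantee in a single primitive.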

Related

How to decrypt hash sha 256 encrypted strings without knowing the key?

For a private research project, I wonder if it's possible to decrypt SHA256-encrypted strings without having the key, given only examples of encrypted and decrypted strings.
As an example, I have 1000 Strings as decrypted text and I have the 1000 Strings encrypted. Can't this information be used to decrypt those strings?
I should mention that I have no clue at all about cryptography, and I am sorry if my question sounds too basic.
Best regards,
Heini
As an example, I have 1000 Strings as [original] text and I have the 1000 Strings [hash]. Can't this information be used to [identify] those strings?
Sure. Hash each string, and write down "this string went to that hash", or, write it the other direction so you can look up a possible original string from a hash value. You've just created a small rainbow table.
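For illustration, a Python sketch of that table using `hashlib`; `known_strings` and the helper names are hypothetical. It can only "reverse" digests of strings that are already in your list. (Strictly speaking this is a plain lookup table; a rainbow table is a space-saving variant that stores hash chains instead of every pair.)

```python
import hashlib
from typing import Dict, Iterable, Optional


def sha256_hex(s: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def build_lookup_table(candidates: Iterable[str]) -> Dict[str, str]:
    """Map each candidate's digest back to the candidate itself."""
    return {sha256_hex(c): c for c in candidates}


known_strings = ["apple", "banana", "cherry"]  # hypothetical sample data
table = build_lookup_table(known_strings)


def lookup(digest_hex: str) -> Optional[str]:
    """Return the original string if it was in the list, else None."""
    return table.get(digest_hex)
```

Knowing 1000 pairs therefore tells you nothing about the 1001st string: its digest is simply not in the table.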
I wonder if it's possible to decrypt SHA256 encrypted strings without having the key
SHA-256 is a digest algorithm, not an encryption algorithm (as Ebbe M. Pedersen pointed out in a comment). Digest algorithms don't have keys, and are designed to not be reversible (and even though no collisions are currently known for SHA-256, they're guaranteed to exist by the pigeon-hole principle... so there's no one right answer).
Protocols/processes/algorithms utilizing digest algorithms will often add a salt when hashing, but that's different from a key. The purpose of the salt is to "defeat" rainbow tables, since you need a new table for every different salt.
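A Python sketch of that effect, assuming a random 16-byte salt prepended to the input of SHA-256 (one simple convention among several). The candidate list is hypothetical. A table built from unsalted digests no longer matches, and the attacker has to redo the work for this particular salt.

```python
import hashlib
import secrets

common = ["123456", "password", "qwerty"]  # hypothetical candidate passwords

# Attacker's precomputed table of *unsalted* digests.
unsalted_table = {hashlib.sha256(c.encode("utf-8")).hexdigest(): c for c in common}


def salted_sha256(salt: bytes, s: str) -> str:
    """SHA-256 over salt || UTF-8 string."""
    return hashlib.sha256(salt + s.encode("utf-8")).hexdigest()


salt = secrets.token_bytes(16)          # stored next to the digest; not secret
stored = salted_sha256(salt, "password")

# The precomputed table is useless here: every entry must be recomputed with this salt.
salted_table = {salted_sha256(salt, c): c for c in common}
```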

How to securely encrypt many similar chunks of data with the same key?

I'm writing an application that will require the following security features: when launching the CLI version, you should pass some key to it. Some undefined number of chunks of data of the same size will be generated. The data needs to be stored remotely, and it will be sensitive. I want it to be encrypted and accessible only by that one key that was passed to it initially. My question is, which algorithm will suit me? I read about AES but it says that
When you perform an encryption operation you initialize your Encryptor with this key, then generate a new, unique Initialization Vector for each record you’re going to encrypt.
which means I'll have to pass a key and an IV rather than just the key, and this IV should be unique for each generated chunk of data (and there are going to be a lot of those).
If the answer is AES, which encryption mode is it?
You can use any modern symmetric algorithm. The amount of data and how you handle your IVs are irrelevant to that choice, because the same concerns apply no matter which symmetric algorithm you pick.
AES-128 is a good choice, as it isn't limited by law in the US and 128 bits is infeasible to brute force. If you aren't in the US, you could use AES-256 if you wanted to, though older Java releases needed the JCE Unlimited Strength Jurisdiction Policy Files installed for it (since Java 8u161, unlimited strength is enabled by default).
You say you are going to generate n chunks of data (or retrieve them; it makes no difference).
You could encrypt them all at once in CBC mode, which keeps AES as a block cipher, and you'll only end up with one IV. You'll need an HMAC here to protect the integrity. This isn't the most modern way, however.
You should use AES in GCM mode as a stream cipher. You'll still have one single IV (nonce), but the ciphertext will also be authenticated.
IVs should be generated randomly and prepended to the ciphertext. You can then retrieve the IV when it is time to decrypt. Remember: IVs aren't secret, they just need to be random!
EDIT: As pointed out below, IVs should be generated using a cryptographically secure random number generator. IVs for CTR-based modes, like GCM, only need to be unique.
In summary, the things you are worried about aren't actually a problem. One key is fine. More than one IV is fine too, but there are ways to do it with just one. You will have to think about IVs either way. Don't use ECB mode.
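One way to meet "only need to be unique" when every chunk gets its own GCM nonce under the same key is a counter. The Python sketch below assumes 96-bit nonces built from a 4-byte random prefix and an 8-byte counter, loosely following the fixed-field/invocation-field idea in NIST SP 800-38D; the class and function names are hypothetical, and the AES-GCM call itself (not in Python's standard library) is omitted.

```python
import secrets


class NonceSequence:
    """Unique 12-byte GCM nonces for one key: 4-byte random prefix + 8-byte counter."""

    def __init__(self) -> None:
        self.prefix = secrets.token_bytes(4)  # fixed for the lifetime of this sequence
        self.counter = 0

    def next_nonce(self) -> bytes:
        if self.counter >= 2**64:
            raise OverflowError("nonce space exhausted; rotate the key")
        nonce = self.prefix + self.counter.to_bytes(8, "big")
        self.counter += 1
        return nonce


def frame_chunk(nonce: bytes, ciphertext_and_tag: bytes) -> bytes:
    """Prepend the nonce so each stored chunk carries what is needed to decrypt it."""
    return nonce + ciphertext_and_tag
```

The counter must never repeat under the same key. If the process can restart with the same key, persist the counter or rotate the key rather than relying on a fresh 32-bit prefix to avoid collisions.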

Why is encrypting necessary for security after hashing in the UMAC (Universal Message Authentication Code) algorithm?

On the Wikipedia page for UMAC, https://en.wikipedia.org/wiki/UMAC, it states:
The resulting digest or fingerprint is then encrypted to hide the identity of the hash function used.
Further, in this paper, http://web.cs.ucdavis.edu/~rogaway/papers/umac-full.pdf, it states:
A message is authenticated by hashing it with the shared hash function and then encrypting the resulting hash (using the encryption key).
My question is, if the set of hash functions H is large enough, and the number of hash buckets |B| is large enough, why do we need to encrypt -- isn't the secret hash secure enough?
For example, take the worst case scenario where every client is sending the same, short content, like "x". If we hash to 32 bytes and our hash depends on a secret 32 byte hash key, and the hashes exhibit uniform properties, how could an attacker ever hope to learn the secret hash key of any individual client, even without encryption?
And, if the attacker doesn't learn the key, how could the attacker ever hope to maliciously alter the message contents?
Thank you!
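To make the question concrete, here is a toy Python sketch. It uses a simple linear universal hash h_k(m) = k*m mod P, not UMAC's actual NH/UHASH construction, and all names are hypothetical. Universal hash families are designed to make collisions unlikely for messages fixed before the key is chosen; they are not designed to keep the key secret once outputs are seen. With this toy hash, a single unmasked (message, hash) pair gives the key away, while adding a fresh one-time pad (in Carter-Wegman style MACs, derived from a nonce) hides it.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; the field for this toy hash


def toy_hash(k: int, m: int) -> int:
    """Toy universal hash h_k(m) = k*m mod P (linear; NOT UMAC's hash)."""
    return (k * m) % P


def recover_key(m: int, tag: int) -> int:
    """Given one message and its *unmasked* hash, solve k = tag * m^-1 mod P."""
    return (tag * pow(m, -1, P)) % P


def masked_tag(k: int, m: int, pad: int) -> int:
    """Carter-Wegman style tag: hash output plus a fresh one-time pad."""
    return (toy_hash(k, m) + pad) % P
```

With the mask, the same algebra yields k + pad * m^-1 mod P instead of k, which is useless to the attacker as long as each pad is fresh and secret. (`pow` with exponent -1 needs Python 3.8 or later.)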
I don't know much about UMAC specifically but:
Having a rainbow table for a specific hash function defeats any encryption you have put on the message, so instead of having a single attack surface, you now have two.
As computing power increases over time, you will be more and more likely to figure out the plaintext of the message, so PFS (https://en.wikipedia.org/wiki/Forward_secrecy) will never be possible if you leave the MAC unencrypted.
On top of all this, if you can figure out a single plaintext message from a MAC value, you get much closer to decrypting the rest of the message, because you gain some information about the PRNG, the context of the other data, the IV, etc.

Using hash of password to encrypt private key

I am developing a web application in which I need to encrypt sensitive information. My plan is to use AES-256 where the private key is encrypted by a hash of the user's password. I need to store the hash of the password for authentication purposes, but it obviously can't be the same hash used to encrypt the private key. My current thought is to use bcrypt to generate a key to be used to encrypt the private key. For authentication, my thought was to simply hash the password using bcrypt, then hash that hash using bcrypt again, and store that second hash in the database. Since it is one-way, there shouldn't be any way to use the stored hash to decrypt the private key, right? Are there any obvious security issues with doing this that I may be missing?
My other thought was to use two different encryption algorithms, such as using a bcrypt hash to encrypt the private key and storing a SHA-2 hash for authentication purposes.
Thanks for your help.
Don't use the hash to encrypt the AES key. A salted hash should be used only for authentication. When the user logs in, you have their password: use it to encrypt (the first time) and decrypt (later) the AES key, and then forget the password.
I'd recommend using PBKDF2 in this situation. You can use two different salts, one that would derive the symmetric key and the other one would derive the password hash to be stored. The salt should contain a deterministic part distinguishing the two different use cases, as well as a random part - cf. this comment:
Otherwise, the salt should contain data that explicitly distinguishes between different operations and different key lengths, in addition to a random part that is at least eight octets long, and this data should be checked or regenerated by the party receiving the salt. For instance, the salt could have an additional non-random octet that specifies the purpose of the derived key. Alternatively, it could be the encoding of a structure that specifies detailed information about the derived key, such as the encryption or authentication technique and a sequence number among the different keys derived from the password. The particular format of the additional data is left to the application.
A plain, salted SHA-2 probably isn't enough because of the poor entropy of typical passwords, as was mentioned in the comments.
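A Python sketch of that suggestion using the standard library's `hashlib.pbkdf2_hmac`. The one-byte purpose labels, the 16 random salt bytes, the iteration count and the helper names are all assumptions for illustration. The stored authentication hash and the encryption key come from different salts, so the stored value is not itself the key (though, like any password hash, it still allows offline guessing, which the iteration count is there to slow down).

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for your hardware

# Hypothetical purpose octets, per the quoted suggestion of a non-random
# octet that specifies what the derived key is for.
PURPOSE_AUTH = b"\x01"
PURPOSE_ENC = b"\x02"


def make_salt(purpose: bytes) -> bytes:
    """Purpose octet followed by 16 random bytes (the quote asks for at least 8)."""
    return purpose + secrets.token_bytes(16)


def derive(password: str, salt: bytes, length: int = 32) -> bytes:
    """PBKDF2-HMAC-SHA256 over the password with the given salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, length)


def check_password(password: str, auth_salt: bytes, stored_hash: bytes) -> bool:
    """Check the salt's purpose octet, then re-derive and compare in constant time."""
    if auth_salt[:1] != PURPOSE_AUTH:
        return False
    return hmac.compare_digest(derive(password, auth_salt), stored_hash)
```

Per user you would store both salts and the authentication hash; the encryption key is re-derived from the password with the other salt at login and never stored.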
A suggestion: use two different salts. When the user enters their password concatenate it with a random salt and hash it for the password recognition routine. Use a different salt and hash it again for the AES encryption key. Depending on how secure you want things, you can stretch the hashing as well.
Effectively you have:
storedPasswordCheck = SHA256(password + salt1);
AESkey = SHA256(password + salt2);
The AES keys are not stored, of course, but are regenerated from the user's password as needed. You will need two separate salts, preferably at least 128 bits each, stored for each user.

How can bcrypt have built-in salts?

Coda Hale's article "How To Safely Store a Password" claims that:
bcrypt has salts built-in to prevent rainbow table attacks.
He cites this paper, which says that in OpenBSD's implementation of bcrypt:
OpenBSD generates the 128-bit bcrypt salt from an arcfour (arc4random(3)) key stream, seeded with random data the kernel collects from device timings.
I don't understand how this can work. In my conception of a salt:
It needs to be different for each stored password, so that a separate rainbow table would have to be generated for each
It needs to be stored somewhere so that it's repeatable: when a user tries to log in, we take their password attempt, repeat the same salt-and-hash procedure we did when we originally stored their password, and compare
When I'm using Devise (a Rails login manager) with bcrypt, there is no salt column in the database, so I'm confused. If the salt is random and not stored anywhere, how can we reliably repeat the hashing process?
In short, how can bcrypt have built-in salts?
This is bcrypt:
Generate a random salt. A "cost" factor has been pre-configured. Collect a password.
Derive an encryption key from the password using the salt and cost factor. Use it to encrypt a well-known string. Store the cost, salt, and cipher text. Because these three elements have a known length, it's easy to concatenate them and store them in a single field, yet be able to split them apart later.
When someone tries to authenticate, retrieve the stored cost and salt. Derive a key from the input password, cost and salt. Encrypt the same well-known string. If the generated cipher text matches the stored cipher text, the password is a match.
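The "store everything in one field" idea can be sketched with Python's standard library. This is NOT bcrypt: PBKDF2-HMAC-SHA256 stands in for bcrypt's key setup, the `$id$work$salt$hash` layout only imitates bcrypt's format, and `SCHEME_ID`, `hash_password` and `verify_password` are hypothetical. In practice you would call a real bcrypt library, which does this bookkeeping for you.

```python
import base64
import hashlib
import hmac
import secrets

# Stand-in scheme identifier for this sketch (bcrypt itself uses "2a", "2b", ...).
SCHEME_ID = "pbkdf2-demo"


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def hash_password(password: str, iterations: int = 600_000) -> str:
    """Generate a random salt, derive the hash, and store work factor and salt with it."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"${SCHEME_ID}${iterations}${_b64(salt)}${_b64(digest)}"


def verify_password(password: str, stored: str) -> bool:
    """Read the work factor and salt back out of the stored string and repeat the derivation."""
    _, scheme, iterations, salt_b64, hash_b64 = stored.split("$")
    if scheme != SCHEME_ID:
        raise ValueError("unknown scheme: " + scheme)
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(candidate, expected)
```

No separate salt column is needed: the salt travels inside the stored string, and standard Base64 never contains "$", so splitting is unambiguous.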
Bcrypt operates in a very similar manner to more traditional schemes based on algorithms like PBKDF2. The main difference is its use of a derived key to encrypt known plain text; other schemes (reasonably) assume the key derivation function is irreversible, and store the derived key directly.
Stored in the database, a bcrypt "hash" might look something like this:
$2a$10$vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa
This is actually three fields, delimited by "$":
2a identifies the bcrypt algorithm version that was used.
10 is the cost factor; 2^10 iterations of the key derivation function are used (which is not enough, by the way. I'd recommend a cost of 12 or more.)
vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa is the salt and the cipher text, concatenated and encoded in a modified Base-64. The first 22 characters decode to a 16-byte value for the salt. The remaining characters are cipher text to be compared for authentication.
This example is taken from the documentation for Coda Hale's ruby implementation.
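To see the fields concretely, a Python sketch that splits the example string above. It assumes bcrypt's Base64 alphabet `./A-Za-z0-9`, which differs from standard Base64, so the characters are translated before decoding. `parse_bcrypt` and `bcrypt_b64decode` are hypothetical helpers, not part of any bcrypt library.

```python
import base64

# bcrypt's Base64 alphabet versus the standard one, position by position.
BCRYPT_ALPHABET = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
STD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
TO_STD = str.maketrans(BCRYPT_ALPHABET, STD_ALPHABET)


def bcrypt_b64decode(s: str) -> bytes:
    """Decode bcrypt's unpadded Base64 by mapping it onto the standard alphabet."""
    std = s.translate(TO_STD)
    return base64.b64decode(std + "=" * (-len(std) % 4))


def parse_bcrypt(stored: str) -> dict:
    """Split a bcrypt string into version, cost, salt and cipher text."""
    _, version, cost, rest = stored.split("$")
    return {
        "version": version,
        "cost": int(cost),                    # log2 of the key-expansion rounds
        "salt": bcrypt_b64decode(rest[:22]),  # first 22 chars -> 16 bytes
        "ciphertext": rest[22:],              # remaining 31 chars, compared at login
    }
```

For the string above, this yields version "2a", cost 10, a 16-byte salt and a 31-character cipher text.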
I believe that phrase should have been worded as follows:
bcrypt has salts built into the generated hashes to prevent rainbow table attacks.
The bcrypt utility itself does not appear to maintain a list of salts. Rather, salts are generated randomly and appended to the output of the function so that they are remembered later on (according to the Java implementation of bcrypt). Put another way, the "hash" generated by bcrypt is not just the hash. Rather, it is the hash and the salt concatenated.
In simple terms:
Bcrypt does not have a database where it stores the salt.
The salt is added to the hash in Base64 format.
The question is: how does bcrypt verify the password when it has no database?
What bcrypt does is extract the salt (and cost) from the stored password hash, repeat the same computation with the plain password and the extracted salt, and compare the new hash with the old hash to see if they are the same.
To make things even clearer:
Registration/login direction ->
A well-known string is encrypted with a key generated from the cost, the salt and the password. We call that encrypted value the cipher text. Then we attach the salt to this value, encode it using Base64, and attach the cost to it; this is the string bcrypt produces:
$2a$COST$BASE64
This value is what gets stored.
What would an attacker need to do to find the password? (Other direction <-)
If the attacker gets control of the DB, they can easily decode the Base64 value and see the salt. The salt is not secret, though it is random.
Then they will need to "decrypt" the cipher text, which in practice means guessing candidate passwords and repeating the expensive computation for each one until it reproduces the cipher text.
More importantly, there is no plain hashing in this process, but rather CPU-expensive encryption, so rainbow tables are less relevant here.
Let's imagine a table that has one hashed password. If a hacker gets access, they would know the salt, but they would still have to compute the result for a big list of common passwords, comparing after each calculation. This takes time, and they would have cracked only one password.
Imagine a second hashed password in the same table. The salt is visible, but the same calculation needs to happen again to crack this one too, because the salts are different.
If no random salts were used, it would have been much easier. Why? With simple unsalted hashing, we could generate hashes for common passwords a single time (a rainbow table) and then just search for matches between the DB table's hashes and our precomputed hashes to find the plain passwords.

Resources