What are the fundamentals to accomplish data encryption with exactly two keys (which could be password-based), but needing only one (either one) of the two keys to decrypt the data?
For example, data is encrypted with a user's password and his company's password, and then he or his company can decrypt the data. Neither of them knows the other's password. Only one copy of the encrypted data is stored.
I don't mean public/private key. Probably via symmetric key cryptography and maybe it involves something like XORing the keys together to use them for encrypting.
Update: I would also like to find a solution that does not involve storing the keys at all.
The way this is customarily done is to generate a single symmetric key to encrypt the data. Then you encrypt the symmetric key with each recipient's key or password so that they can decrypt it on their own. S/MIME (actually the Cryptographic Message Syntax on which S/MIME is based) uses this technique.
This way, you only have to store one copy of the encrypted message, but multiple copies of its key.
Generally speaking, what you do is encrypt the data with a randomly generated key, and then append versions of that random key that have been encrypted with every known key. So anybody with a valid key can discover the 'real' key that was used to encrypt the data.
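As a rough illustration of the idea above, here is a minimal Python sketch that wraps one random data key under two passwords. The function names, salt size and iteration count are my own assumptions, and the XOR "wrap" is only a stand-in: it gives no integrity check, so a wrong password silently yields a wrong key. A real system would more likely use an authenticated key-wrapping construction from a vetted library.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # assumed work factor; tune for your environment


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def derive_kek(password: str, salt: bytes, length: int) -> bytes:
    """Derive a key-encryption key from a password with PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=length)


def wrap_key(data_key: bytes, password: str):
    """Return (salt, wrapped_key). A fresh salt is used for every slot."""
    salt = secrets.token_bytes(16)
    return salt, xor_bytes(data_key, derive_kek(password, salt, len(data_key)))


def unwrap_key(salt: bytes, wrapped: bytes, password: str) -> bytes:
    """Recover the data key from one slot using that slot's password."""
    return xor_bytes(wrapped, derive_kek(password, salt, len(wrapped)))


# One random data key encrypts the data once; it is stored wrapped twice.
data_key = secrets.token_bytes(32)
user_slot = wrap_key(data_key, "user password")
company_slot = wrap_key(data_key, "company password")
```

Either `unwrap_key(*user_slot, "user password")` or `unwrap_key(*company_slot, "company password")` returns the same data key, and neither password is stored anywhere.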
If I understood you correctly, you have some data that you want to encrypt, and you want to distribute the encryption key split into n 'key pieces' (in your case, 2 pieces).
For that you could use the XOR based splitting, here is how it works:
You provide the required number of pieces, n, and the secret key, K. To generate n pieces of your key, you need to create (n − 1) random numbers: R1, R2, R3, . . . , Rn−1, each the same length as K. For that you should use a cryptographically secure generator such as SecureRandom, so that the values are unpredictable. Then you XOR these n − 1 random numbers together with your key K:
Rn = R1 ⊕ R2 ⊕ R3 ⊕ . . . ⊕ Rn−1 ⊕ K
Now you have your n pieces: R1, R2, R3, …, Rn-1, Rn and you may destroy the K. Those pieces can be spread in your code or sent to users.
To reassemble the key, we XOR all n pieces together:
K = R1 ⊕ R2 ⊕ R3 ⊕ . . . ⊕ Rn−1 ⊕ Rn
With the XOR function (⊕), every piece is required to reconstruct the key. If any bit in any of the pieces is changed, the key is not recoverable.
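The splitting and reassembly described above can be sketched in a few lines of Python. This is not the code from the linked utility; the function names are my own, and Python's `secrets` module stands in for SecureRandom.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n: int) -> list:
    """Split key into n pieces; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two pieces")
    # R1 .. Rn-1 are random and as long as the key.
    pieces = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # Rn = R1 xor R2 xor ... xor Rn-1 xor K
    pieces.append(reduce(xor_bytes, pieces, key))
    return pieces


def join_pieces(pieces: list) -> bytes:
    """K = R1 xor R2 xor ... xor Rn"""
    return reduce(xor_bytes, pieces)
```

Note that this is an all-or-nothing split: with two pieces, both holders must cooperate, so by itself it does not give the "either one of two keys" behaviour asked for in the question.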
For more info and code, you can take a look at the Android Utility I wrote for that purpose:
GitHub Project: https://github.com/aivarsda/Secret-Key-Split-Util
Also you can try the Secret Key Splitter demo app which uses that Utility :
GooglePlay: https://play.google.com/store/apps/details?id=com.aivarsda.keysplitter
I think I thought of a solution that would work:
D = data to encrypt
h1 = hash(userpassword)
h2 = hash(companyPassword)
k = h1 concat h2
E = function to encrypt
//C is the encrypted data
C = E_h1(h2) concat E_h2(h1) concat E_k(D)
Then either person can decrypt the hash of the other person, and then combine them to decrypt the rest of the data.
Perhaps there is a better solution than this though?
In the more general case, a secret (in this application, a decryption key for the data) can be split into shares such that some threshold number of these shares is required to recover the secret. This is known as secret sharing or, with n shares and a threshold of t, a (t,n)-threshold scheme.
One way this can be done is by creating a polynomial of degree t-1 over a finite field, setting the secret as the constant term, and choosing the rest of the coefficients at random. Then, n distinct points on this curve (at nonzero x) are selected and become the shares. Any t of them determine the polynomial, and therefore the secret.
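To make the polynomial construction concrete, here is a small Python sketch of such a threshold scheme over a prime field. The choice of field (the Mersenne prime 2^127 − 1), the use of x = 1..n rather than random x values, and the function names are assumptions for illustration only.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def make_shares(secret: int, t: int, n: int) -> list:
    """Return n points on a random degree-(t-1) polynomial whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x: int) -> int:
        # Horner evaluation modulo PRIME
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def recover(shares: list) -> int:
    """Lagrange interpolation at x = 0 using any t distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

For example, `make_shares(key_as_int, 2, 3)` produces three shares, any two of which can be passed to `recover` to get the key back.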
I would like to use an asymmetric encryption process without making the "public" key really public.
My world contains 3 actors : A, B, C and a public message bank.
I want A to be the only one able to encrypt (and decrypt) short messages in the bank (for example AES keys), and I want A to be able to later choose who gains access to the encrypted data in the bank without the need to re-encrypt or add messages to the bank.
It would go like this:
A generates a key pair with RSA : (p1,p2), p1 being the "private" key (used here only for encryption) and p2 being the "public" key (used here only for decryption)
A encrypts the messages with p1 and stores them in the public bank : A, B and C can access the stored encrypted messages. For now, only A can decrypt the messages.
Later, A wants B (and only B) to be able to read the messages (but not to write more messages in his name in the bank), so he gives p2 to B in some secure way.
Is it a correct use of the RSA algorithm ? Does RSA ensure that B and C will not be able to read M without the "public" key p2?
Why does the vocabulary around asymmetric encryption seem to deter me from doing so? Is there a better way to achieve the same functionality ?
Any advice would be much appreciated
EDIT :
I know that I'm not using the RSA protocol in the usual way (public key encryption/ private key decryption), but the math behind it (as I understand it) should allow working the other way around.
As it stands, I have written a python3 project to encrypt a file (using AES) and a public/private key system (RSA) to encrypt the AES key.
My current predicament is as follows: what is the best approach to get the encrypted AES key to the recipient? My program does NOT depend on the medium used for sending the files; it only ensures that the files are securely encrypted. In other words, once a user chooses the public key of the recipient, there is no peer-to-peer communication.
Is naming the file the RSA encrypted AES key a bad idea ?
I don't have extensive knowledge of cryptography as such, so any suggestions are welcome.
If you know the recipient's public RSA key, you can use RSA-KEM (KEM: Key Encapsulation Mechanism). RSA-KEM for a single recipient with AES-GCM works as follows.
The Sender;
First generate an x in [2..n-1] uniformly at random, where n is the RSA modulus.
Use a Key Derivation Function (KDF) on x,
key= KDF(x)
for AES 128,192, or 256-bit depending on your need. Prefer 256.
Encrypt x with the recipient's public exponent e,
c = x^e mod n
Generate an IV and encrypt the message with AES-GCM,
(IV,ciphertext,tag) = AES-GCM-Enc(IV,message, key)
Send (c,(IV,ciphertext,tag))
The receiver;
To get x, they use their private exponent d,
x = c^d mod n
Use the same KDF on x to derive the same AES key,
key= KDF(x)
Decrypts the message with AES-GCM
message = AES-GCM-Dec(IV,ciphertext,tag, key)
Note 1: This is actually a composition of a KEM and a DEM (data encapsulation mechanism; an authenticated cipher serves as a DEM). This provides the standard of IND-CCA2/NM-CCA2—ciphertext indistinguishability and nonmalleability under adaptive chosen-ciphertext attack. That is the minimum requirement for modern Cryptography.
Note 2: If you want to send the key itself as you described, you will need a padding scheme like OAEP or PKCS#1 v1.5 to prevent the attacks on textbook RSA. RSA-KEM eliminates this need by encrypting a random value that spans the full range of the modulus.
Note 3: The RSA-KEM described above works for the single-recipient case. Using it with the same x for multiple recipients falls to Håstad's broadcast attack. Instead use RSAES-OAEP, which makes it safe to encrypt the same x for different recipients. This is very useful for sending the message to multiple recipients, instead of creating a new x for every recipient and encrypting the message under each derived key (as PGP/GPG does).
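The key-encapsulation half of the steps above can be sketched in Python with the standard library only. Everything here is a toy under stated assumptions: the modulus is built from two small Mersenne primes and is far too short for real use, plain SHA-256 stands in for a proper KDF, and the AES-GCM step is omitted because it needs a third-party library. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
import hashlib
import secrets

# Toy RSA key pair (illustration only, NOT secure: the modulus is only 150 bits).
p, q = 2**61 - 1, 2**89 - 1          # two Mersenne primes
n = p * q                            # public modulus
e = 65537                            # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

MOD_BYTES = (n.bit_length() + 7) // 8


def kdf(x: int) -> bytes:
    """Stand-in KDF: SHA-256 of x encoded at the full modulus length."""
    return hashlib.sha256(x.to_bytes(MOD_BYTES, "big")).digest()


def encapsulate():
    """Sender: pick random x in [2, n-1], return (c, key)."""
    x = 2 + secrets.randbelow(n - 2)
    c = pow(x, e, n)                 # c = x^e mod n
    return c, kdf(x)


def decapsulate(c: int) -> bytes:
    """Receiver: recover x = c^d mod n and derive the same key."""
    return kdf(pow(c, d, n))
```

The sender would then use the returned 32-byte key for AES-GCM and transmit `c` together with the IV, ciphertext and tag.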
Today I was doing some leisurely reading and stumbled upon Section 5.8 (on page 45) of Recommendation for Pair-Wise Key Establishment Schemes Using Discrete Logarithm Cryptography (Revised) (NIST Special Publication 800-56A). I was very confused by this:
"An Approved key derivation function (KDF) shall be used to derive secret keying material from a shared secret. The output from a KDF shall only be used for secret keying material, such as a symmetric key used for data encryption or message integrity, a secret initialization vector, or a master key that will be used to generate other keys (possibly using a different process). Nonsecret keying material (such as a non-secret initialization vector) shall not be generated using the shared secret."
Now I'm no Alan Turing, but I thought that initialization vectors need not be kept secret. Under what circumstances would one want a "secret initialization vector?"
Thomas Pornin says that IVs are public and he seems well-versed in cryptography. Likewise with caf.
An initialization vector need not be secret (it is not a key) but it need not be public either (sender and receiver must know it, but it is not necessary that the Queen of England also knows it).
A typical key establishment protocol will result in both involved parties computing a piece of data which they, but only they, both know. With Diffie-Hellman (or any Elliptic Curve variant thereof), the said shared piece of data has a fixed length and they have no control over its value (they just both get the same seemingly random sequence of bits). In order to use that shared secret for symmetric encryption, they must derive from that shared data a sequence of bits of the appropriate length for whatever symmetric encryption algorithm they are about to use.
In a protocol in which you use a key establishment algorithm to obtain a shared secret between the sender and the receiver, and will use that secret to symmetrically encrypt a message (possibly a very long streamed message), it is possible to use the KDF to produce the key and the IV in one go. This is how it goes in, for instance, SSL: from the shared secret (called "pre-master secret" in the SSL spec) is computed a big block of derived secret data, which is then split into symmetric keys and initialization vectors for both directions of encryption. You could do otherwise and, for instance, generate a random IV and send it along with the encrypted data, instead of using an IV obtained through the KDF (that's how it goes in recent versions of TLS, the successor to SSL). Both strategies are equally valid (TLS uses external random IVs because they want a fresh random IV for each "record" -- a packet of data within a TLS connection -- which is why using the KDF was not deemed appropriate anymore).
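As an illustration of producing the key and the IV "in one go", here is a Python sketch that expands a shared secret into 48 bytes and splits them. It uses an HKDF-style extract-and-expand construction built from HMAC-SHA256, which is my own choice for the example and not the KDF used by SSL or mandated by the NIST document; the salt and info labels are made up.

```python
import hashlib
import hmac


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF-style extract-then-expand using HMAC-SHA256."""
    prk = hmac.new(salt, secret, hashlib.sha256).digest()       # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                    # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_and_iv(shared_secret: bytes):
    """Derive a 32-byte key and a 16-byte (secret) IV from one shared secret."""
    material = hkdf_sha256(shared_secret, b"example-salt", b"key+iv", 48)
    return material[:32], material[32:]
```

Both parties run `derive_key_and_iv` on the same shared secret and obtain the same key and IV without ever transmitting the IV, which is the sense in which the IV is "secret" here.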
Well, consider that if two parties have the same cryptographic function, but don't have the same IV, they won't get the same results. So then, it seems like the proposal there is that the two parties get the same shared secret, and each generate, deterministically, an IV (that will be the same) and then they can communicate. That's just how I read it; but I've not actually read the document, and I'm not completely sure that my description is accurate; but it's how I'd start investigating.
Whether the IV is public or private doesn't matter.
Suppose the IV is known to the attacker. By looking at the encrypted packet/data, knowing the IV but not the encryption key, can he or she guess anything about the input data? (Think about it for a while.)
Let's step back a little and suppose no IV is used in the encryption:
AES(input, K) = E1
The same input will always produce the same ciphertext.
An attacker with some prior knowledge of the input data (e.g. the initial exchange of some protocol) can then recognize repeated plaintexts just by looking at the ciphertext.
This is where the IV helps. It is mixed in with the input value, so the ciphertext changes even for the same input data:
AES(input, IV, K) = E2
Hence the attacker sees different encrypted packets (even for the same input data) and can't guess easily, even knowing the IV.
The starting value of the counter in CTR mode encryption can be thought of as an IV. If you make it secret, you end up with some amount of added security over the security granted by the key length of the cipher you're using. How much extra is hard to say, but not knowing it does increase the work required to figure out how to decrypt a given message.
Best practice is to use unique IVs, but what is unique? Is it unique for each record, or absolutely unique (unique for each field too)?
If it's per field, that sounds awfully complicated. How do you manage the storage of so many IVs if you have 60 fields in each record?
I started an answer a while ago, but suffered a crash that lost what I'd put in. What I said was along the lines of:
It depends...
The key point is that if you ever reuse an IV, you open yourself up to cryptographic attacks that are easier to execute than those when you use a different IV every time. So, for every sequence where you need to start encrypting again, you need a new, unique IV.
You also need to look up cryptographic modes - Wikipedia has an excellent illustration of why you should not use ECB. CTR mode can be very beneficial.
If you are encrypting each record separately, then you need to create and record one IV for the record. If you are encrypting each field separately, then you need to create and record one IV for each field. Storing the IVs can become a significant overhead, especially if you do field-level encryption.
However, you have to decide whether you need the flexibility of field level encryption. You might - it is unlikely, but there might be advantages to using a single key but different IVs for different fields. OTOH, I strongly suspect that it is overkill, not to mention stressing your IV generator (cryptographic random number generator).
If you can afford to do encryption at a page level instead of the row level (assuming rows are smaller than a page), then you may benefit from using one IV per page.
Erickson wrote:
You could do something clever like generating one random value in each record, and using a hash of the field name and the random value to produce an IV for that field.
However, I think a better approach is to store a structure in the field that collects an algorithm identifier, necessary parameters (like IV) for that algorithm, and the ciphertext. This could be stored as a little binary packet, or encoded into some text like Base-85 or Base-64.
And Chris commented:
I am indeed using CBC mode. I thought about an algorithm to do a 1:many so I can store only 1 IV per record. But now I'm considering your idea of storing the IV with the ciphertext. Can you give me some more advice? I'm using PHP + MySQL, and many of the fields are either varchar or text. I don't have much experience with binary in the database; I thought binary was database-unfriendly, so I always base64_encoded when storing binary (like the IV, for example).
To which I would add:
IBM DB2 LUW and Informix Dynamic Server both use a Base-64 encoded scheme for the character output of their ENCRYPT_AES() and related functions, storing the encryption scheme, IV and other information as well as the encrypted data.
I think you should look at CTR mode carefully - as I said before. You could create a 64-bit IV from, say, 48-bits of random data plus a 16-bit counter. You could use the counter part as an index into the record (probably in 16 byte chunks - one crypto block for AES).
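The IV layout suggested above (48 random bits plus a 16-bit counter) could be assembled like this in Python. The function names are hypothetical, and how the resulting 8 bytes are placed into the cipher's full counter block depends on the crypto library you use, so treat this only as a sketch of the layout.

```python
import secrets


def new_record_nonce() -> bytes:
    """48 bits of random data, generated once per record and stored with it."""
    return secrets.token_bytes(6)


def counter_iv(nonce: bytes, index: int) -> bytes:
    """Build the 64-bit IV: 48-bit record nonce followed by a 16-bit big-endian index."""
    if len(nonce) != 6:
        raise ValueError("nonce must be 6 bytes (48 bits)")
    if not 0 <= index < 2**16:
        raise ValueError("index must fit in 16 bits")
    return nonce + index.to_bytes(2, "big")
```

With 16-byte chunks, the 16-bit index covers up to 65536 chunks (1 MiB) per record. Keep in mind that with only 48 random bits, two records are likely to share a nonce once the table holds on the order of 2^24 rows, so a larger random part or a database-assigned unique value may be preferable.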
I'm not familiar with how MySQL stores data at the disk level. However, it is perfectly possible to encrypt the entire record including the representation of NULL (absence of) values.
If you use a single IV for a record, but use a separate CBC encryption for each field, then each field has to be padded to 16 bytes, and you are definitely indulging in 'IV reuse'. I think this is cryptographically unsound. You would be much better off using a single IV for the entire record and either one unit of padding for the record and CBC mode or no padding and CTR mode (since CTR does not require padding - one of its merits; another is that you only use the encryption mode of the cipher for both encrypting and decrypting the data).
Once again, Appendix C of NIST SP 800-38A might be helpful. E.g., according to it, you could generate an IV for the CBC mode simply by encrypting a unique nonce with your encryption key. Even simpler, if you use OFB then the IV just needs to be unique.
There is some confusion about what the real requirements are for good IVs in the CBC mode. Therefore, I think it is helpful to look briefly at some of the reasons behind these requirements.
Let's start by reviewing why IVs are even necessary. IVs randomize the ciphertext. If the same message is encrypted twice with the same key (but different IVs), then the ciphertexts are distinct. An attacker who is given two (equally long) ciphertexts should not be able to determine whether the two ciphertexts encrypt the same plaintext or two different plaintexts. This property is usually called ciphertext indistinguishability.
Obviously this is an important property for encrypting databases, where many short messages are encrypted.
Next, let's look at what can go wrong if the IVs are predictable. Let's take, for example, Erickson's proposal:
"You could do something clever like generating one random value in each record, and using a hash of the field name and the random value to produce an IV for that field."
This is not secure. For simplicity assume that a user Alice has a record in which there exist only two possible values m1 or m2 for a field F. Let Ra be the random value that was used to encrypt Alice's record. Then the ciphertext for the field F would be
EK(hash(F || Ra) xor m).
The random Ra is also stored in the record, since otherwise it wouldn't be possible to decrypt. An attacker Eve, who would like to learn the value of Alice's record can proceed as follows: First, she finds an existing record where she can add a value chosen by her.
Let Re be the random value used for this record and let F' be the field for which Eve can submit her own value v. Since the record already exists, it is possible to predict the IV for the field F', i.e. it is
hash(F' || Re).
Eve can exploit this by selecting her value v as
v = hash(F' || Re) xor hash(F || Ra) xor m1,
let the database encrypt this value, which is
EK(hash(F || Ra) xor m1)
and then compare the result with Alice's record. If the two results match, then she knows that m1 was the value stored in Alice's record; otherwise it will be m2.
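The attack above can be simulated in Python to see why the XOR terms cancel. This is only a model: a truncated HMAC stands in for the block cipher EK (the attack needs nothing more than EK being deterministic), fields are a single 16-byte block, and all names are invented for the demonstration.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for the block cipher EK: deterministic, keyed, one block out."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def field_iv(field: bytes, record_random: bytes) -> bytes:
    """The predictable IV from the proposal: hash(field name || record random)."""
    return hashlib.sha256(field + b"|" + record_random).digest()[:BLOCK]


def encrypt_field(key: bytes, field: bytes, record_random: bytes, value: bytes) -> bytes:
    """Single-block CBC: EK(IV xor value)."""
    return E(key, xor(field_iv(field, record_random), value))


def eve_guess(alice_value: bytes, m1: bytes, m2: bytes) -> bytes:
    key = secrets.token_bytes(32)            # known only to the database
    Ra, Re = secrets.token_bytes(16), secrets.token_bytes(16)
    alice_ct = encrypt_field(key, b"F", Ra, alice_value)
    # Eve knows Ra, Re and the field names, so she can compute both IVs.
    v = xor(xor(field_iv(b"F'", Re), field_iv(b"F", Ra)), m1)
    eve_ct = encrypt_field(key, b"F'", Re, v)  # the database encrypts v for her
    return m1 if eve_ct == alice_ct else m2
```

Calling `eve_guess` with Alice's real value returns that value for either candidate, without Eve ever learning the key.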
You can find variants of this attack by searching for "block-wise adaptive chosen plaintext attack" (e.g. this paper). There is even a variant that worked against TLS.
The attack can be prevented, possibly by encrypting the random value before putting it into the record and deriving the IV by encrypting the result. But again, probably the simplest thing to do is what NIST already proposes: generate a unique nonce for every field that you encrypt (this could simply be a counter), encrypt the nonce with your encryption key and use the result as an IV.
Also note that the attack above is a chosen plaintext attack. Even more damaging attacks are possible if the attacker has the possibility to do chosen ciphertext attacks, i.e. if she can modify your database. Since I don't know how your databases are protected, it is hard to make any claims there.
The requirements for IV uniqueness depend on the "mode" in which the cipher is used.
For CBC, the IV should be unpredictable for a given message.
For CTR, the IV has to be unique, period.
For ECB, of course, there is no IV. If a field is a short, random identifier that fits in a single block, you can use ECB securely.
I think a good approach is to store a structure in the field that collects an algorithm identifier, necessary parameters (like IV) for that algorithm, and the ciphertext. This could be stored as a little binary packet, or encoded into some text like Base-85 or Base-64.
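One possible shape for such a structure, sketched in Python. The byte layout (one byte of algorithm identifier, one byte of IV length, then IV and ciphertext) and the identifier value are invented for this example, and the encryption itself is assumed to happen elsewhere.

```python
import base64

ALG_AES_256_CBC = 1  # hypothetical identifier; define your own registry


def pack_field(alg_id: int, iv: bytes, ciphertext: bytes) -> str:
    """Bundle algorithm id, IV and ciphertext into one Base-64 string for a text column."""
    if not 0 <= alg_id < 256 or len(iv) > 255:
        raise ValueError("alg_id and IV length must each fit in one byte")
    packet = bytes([alg_id, len(iv)]) + iv + ciphertext
    return base64.b64encode(packet).decode("ascii")


def unpack_field(text: str):
    """Inverse of pack_field: return (alg_id, iv, ciphertext)."""
    packet = base64.b64decode(text)
    alg_id, iv_len = packet[0], packet[1]
    iv = packet[2:2 + iv_len]
    return alg_id, iv, packet[2 + iv_len:]
```

Because each stored value carries its own IV and algorithm identifier, fields can be decrypted independently and the algorithm can be changed later without a separate IV column.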
I need to generate random tokens so that when I see them later I can determine absolutely that they were actually generated by me, i.e. it should be near impossible for anyone else to generate fake tokens. It's kind of like serial number generation except I don't need uniqueness. Actually, it's a lot like a digital signature except I am the only one that needs to verify the "signature".
My solution is as follows:
have a secret string S (this is the only data not in the open)
for each token, generate a random string K
token = K + MD5(K + S)
to validate the token is one I generated:
split incoming token into K + H
calculate MD5(K + S), ensure equal to H
It seems to me that it should be impossible for anybody to reliably generate H, given K without S. Is this solution too simplistic?
Check out HMAC.
The solution you presented is on the right track. You're essentially performing challenge-response authentication with yourself. Each token can consist of a non-secret challenge string C, and HMAC(C, K) where K is your server's secret key.
To verify a token, simply recompute the HMAC with the supplied value of C and see if it matches the supplied HMAC value.
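A minimal Python sketch of this token scheme using the standard `hmac` module with SHA-256. The token format (challenge and tag joined by a dot) and the function names are assumptions for illustration; in practice the secret key would be loaded from protected configuration rather than generated at startup.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # assumption: in practice, load a persistent key


def make_token(key: bytes) -> str:
    """Token = random challenge C plus HMAC-SHA256 of C under the secret key."""
    challenge = secrets.token_hex(16)
    tag = hmac.new(key, challenge.encode(), hashlib.sha256).hexdigest()
    return challenge + "." + tag


def verify_token(key: bytes, token: str) -> bool:
    """Recompute the HMAC over the supplied challenge and compare in constant time."""
    challenge, sep, tag = token.partition(".")
    if not sep:
        return False
    expected = hmac.new(key, challenge.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), tag.encode())
```

Using `hmac.compare_digest` instead of `==` avoids leaking, through timing, how many leading characters of a forged tag were correct.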
Also, as Vinko mentioned, you should not use MD5; SHA-256 is a good choice.
That's not too simplistic, that's certainly a valid way to implement a simple digital signature.
Of course, you can't prove to anybody else that you generated the signature without revealing your secret key S, but for that purpose you would want to use a more sophisticated protocol like PKI.
Just to nitpick a bit: you would prove only that whoever has access to S could have generated the token. Another little detail: use a better hash, like SHA-256, because if Mallory is able to generate a collision, she doesn't even need to know S.