Generating a token that I can prove I generated - security

I need to generate random tokens so that when I see them later I can determine absolutely that they were actually generated by me, i.e. it should be near impossible for anyone else to generate fake tokens. It's kind of like serial number generation except I don't need uniqueness. Actually, it's a lot like a digital signature except I am the only one that needs to verify the "signature".
My solution is as follows:
have a secret string S (this is the only data not in the open)
for each token, generate a random string K
token = K + MD5(K + S)
to validate the token is one I generated:
split incoming token into K + H
calculate MD5(K + S), ensure equal to H
It seems to me that it should be impossible for anybody to reliably generate H, given K without S. Is this solution too simplistic?

Check out HMAC.

The solution you presented is on the right track. You're essentially performing challenge-response authentication with yourself. Each token can consist of a non-secret challenge string C, and HMAC(C, K) where K is your server's secret key.
To verify a token, simply recompute the HMAC with the supplied value of C and see if it matches the supplied HMAC value.
Also, as Vinko mentioned, you should not use MD5; SHA-256 is a good choice.
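The scheme described above can be sketched in Python with the standard library. This is a minimal sketch, not a definitive implementation: the names SECRET_KEY, CHALLENGE_BYTES, make_token and verify_token are made up for illustration, the 16-byte challenge length and hex encoding are assumptions, and HMAC-SHA256 is used as suggested.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would be loaded from secure configuration.
SECRET_KEY = secrets.token_bytes(32)

CHALLENGE_BYTES = 16  # assumed length of the random, non-secret part C


def make_token(key: bytes = SECRET_KEY) -> str:
    """Return hex(C || HMAC-SHA256(key, C)) for a fresh random C."""
    challenge = secrets.token_bytes(CHALLENGE_BYTES)
    tag = hmac.new(key, challenge, hashlib.sha256).digest()
    return (challenge + tag).hex()


def verify_token(token: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the HMAC over the supplied C and compare it in constant time."""
    try:
        raw = bytes.fromhex(token)
    except ValueError:
        return False  # not valid hex
    if len(raw) != CHALLENGE_BYTES + hashlib.sha256().digest_size:
        return False  # wrong length, cannot be one of our tokens
    challenge, tag = raw[:CHALLENGE_BYTES], raw[CHALLENGE_BYTES:]
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A token from make_token() passes verify_token(); changing any character of it, or verifying under a different key, should make verification fail.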

That's not too simplistic, that's certainly a valid way to implement a simple digital signature.
Of course, you can't prove to anybody else that you generated the signature without revealing your secret key S, but for that purpose you would want to use a more sophisticated protocol like PKI.

Just to nitpick a bit: you would prove only that whoever has access to S could have generated the token. Another little detail: use a better hash, like SHA-256, because if Mallory is able to generate a collision, she doesn't even need to know S.


What is the difference between AES encryption algorithm and secret key in crypto-js library?

Recently while learning Backend development (Node, Express, MongoDB), I discovered the crypto-js library. According to the docs, we can use the following code to encrypt a message using the AES Encryption -
const ciphertext = CryptoJS.AES.encrypt('my message', 'secret key 123').toString();
However, as I'm new to cryptography, the difference between an encryption algorithm and a secret key is not very clear to me.
I have a message, and I can use the AES encryption algorithm to encrypt/decrypt it. Same way, I can use the AES encryption algorithm to decrypt the message as well. Hence, I don't understand how does a secret key fit in here? How exactly do an encryption algorithm and a secret key work in tandem to secure a message?
I have gone through numerous videos, blogs, StackOverflow posts, etc. on the internet, however, I couldn't understand it completely through all the complex crypto jargon. I do have a faint idea, which I'll describe below with the help of Caesar's cipher.
In Caesar's cipher, what I've understood is that the technique of shifting letters by a certain number (A shifted by 4 places is E) is the encryption algorithm, and the certain number (4) is the secret key.
Can somebody please tell me if I'm correct?
If I'm correct, can you please tell me how exactly this translates in the case of the AES encryption mentioned in the beginning?
If I'm not correct, can anyone explain this with the help of a simple analogy? Please try to minimize the use of crypto jargon, as otherwise I'll get lost again.
The algorithm is a series of steps that happen in processing the data with the secret key to produce the encrypted data.
There are two inputs into the algorithm - the key and the initial data. The algorithm takes those two inputs and produces the encrypted output.
+---------+     +---------+
|   Key   |     |  Data   |
+---------+     +---------+
     \               /
      \             /
       \           /
        \         /
         \       /
       +-----------+
       | Algorithm |
       +-----------+
             |
       +-----------+
       | Encrypted |
       |  Result   |
       +-----------+
The key and the data are separate from the algorithm. If you change either of them, you will get a different encrypted result without changing the algorithm.
In your example of the very simple Caesar Cipher, the algorithm is that each character in the input is going to be replaced by another character (a substitution cipher) that is offset in the alphabet by some amount.
The key would be what the amount is. So, if the key is 1, then a is replaced by b and b is replaced by c and so on. The code for the algorithm can be written to accept the key as an input parameter or function argument and the algorithm code does not have to be rewritten for a different key. The key is applied to the input data by the algorithm programmatically to produce the result. The same algorithm code works with all the different keys you can pass it.
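To make the split between algorithm and key concrete, here is a small Python sketch of the Caesar cipher. The function names and the choice to shift only lowercase ASCII letters are assumptions made for this illustration; the point is only that the key is an argument and the code stays the same.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each lowercase letter forward by `key` places; leave other characters alone."""
    shifted = []
    for ch in plaintext:
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)


def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Decryption is the same algorithm run with the opposite shift."""
    return caesar_encrypt(ciphertext, -key)


print(caesar_encrypt("abc", 1))  # bcd
print(caesar_encrypt("abc", 4))  # efg
```

The same caesar_encrypt code handles key 1, key 4, or any other key; only the argument changes.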
Can somebody please tell me if I'm correct [in understanding the algorithm and key in the Caesar Cipher]?
Yes, your understanding of that is correct.
If I'm correct, can you please tell me how exactly this translates in the case of the AES encryption mentioned in the beginning?
The AES encryption is a much more complicated algorithm that again accepts input data and a key. In this case, the key is a block of data itself, not just a single number. If you want to know more about how it works, you can find many articles on the web about it so it's probably better to read those than try to repeat all that here. Here's one article: What is AES Encryption and How Does It Work?.
Note, you generally do not need to know how a given encryption algorithm works in order to use it successfully. You do need to know how secure it is, what kind of keys are required, what kind of output it generates and how you decrypt it. But, you don't need to know the details of how the algorithm works. And, you need to select the right type of algorithm (for example, symmetric encryption with the same key used for encryption and decryption vs. asymmetric encryption such as public key/private key pairs) because this determines how you generate/manage/share secrets.
In your code example:
const ciphertext = CryptoJS.AES.encrypt('my message', 'secret key 123').toString();
CryptoJS.AES.encrypt is a function that implements the algorithm. It accepts two arguments. The first argument is the data you want encrypted. The second argument is a key string where all the data in the key is used in the encryption and the key will need to be supplied again in order to decrypt the data.
The result of the call to CryptoJS.AES.encrypt() is an object holding the encrypted data; calling .toString() on it, as in the example, produces a Base64-encoded string.

Secret vs. Non-secret Initialization Vector

Today I was doing some leisurely reading and stumbled upon Section 5.8 (on page 45) of Recommendation for Pair-Wise Key Establishment Schemes Using Discrete Logarithm Cryptography (Revised) (NIST Special Publication 800-56A). I was very confused by this:
An Approved key derivation function
(KDF) shall be used to derive secret
keying material from a shared secret.
The output from a KDF shall only be
used for secret keying material, such
as a symmetric key used for data
encryption or message integrity, a
secret initialization vector, or a
master key that will be used to
generate other keys (possibly using a
different process). Nonsecret keying
material (such as a non-secret
initialization vector) shall not be
generated using the shared secret.
Now I'm no Alan Turing, but I thought that initialization vectors need not be kept secret. Under what circumstances would one want a "secret initialization vector?"
Thomas Pornin says that IVs are public and he seems well-versed in cryptography. Likewise with caf.
An initialization vector need not be secret (it is not a key) but it need not be public either (sender and receiver must know it, but it is not necessary that the Queen of England also knows it).
A typical key establishment protocol will result in both involved parties computing a piece of data which they, but only they, both know. With Diffie-Hellman (or any Elliptic Curve variant thereof), the said shared piece of data has a fixed length and they have no control over its value (they just both get the same seemingly random sequence of bits). In order to use that shared secret for symmetric encryption, they must derive that shared data into a sequence of bits of the appropriate length for whatever symmetric encryption algorithm they are about to use.
In a protocol in which you use a key establishment algorithm to obtain a shared secret between the sender and the receiver, and will use that secret to symmetrically encrypt a message (possibly a very long streamed message), it is possible to use the KDF to produce the key and the IV in one go. This is how it goes in, for instance, SSL: from the shared secret (called "pre-master secret" in the SSL spec) is computed a big block of derived secret data, which is then split into symmetric keys and initialization vectors for both directions of encryption. You could do otherwise, and, for instance, generate random IV and send them along with the encrypted data, instead of using an IV obtained through the KDF (that's how it goes in recent versions of TLS, the successor to SSL). Both strategies are equally valid (TLS uses external random IV because they want a fresh random IV for each "record" -- a packet of data within a TLS connection -- which is why using the KDF was not deemed appropriate anymore).
Well, consider that if two parties have the same cryptographic function, but don't have the same IV, they won't get the same results. So then, it seems like the proposal there is that the two parties get the same shared secret, and each generate, deterministically, an IV (that will be the same) and then they can communicate. That's just how I read it; but I've not actually read the document, and I'm not completely sure that my description is accurate; but it's how I'd start investigating.
Whether the IV is public or private doesn't matter.
Suppose the IV is known to the attacker. Looking at the encrypted packet/data,
with knowledge of the IV but no knowledge of the encryption key, can he or she guess anything about the input data? (Think about it for a while.)
Let's go back a step and say that no IV is used in the encryption:
AES (input, K) = E1
The same input will always produce the same encrypted text.
An attacker can try to guess the key "K" by looking at the encrypted text and using some prior knowledge of the input data (e.g. the initial exchange of some protocols).
This is where the IV helps. It is combined with the input value, so the encrypted text changes even for the same input data,
i.e. AES (input, IV, K) = E1
Hence, the attacker sees that the encrypted packets are different (even with the same input data) and can't guess easily (even knowing the IV).
The starting value of the counter in CTR mode encryption can be thought of as an IV. If you make it secret, you end up with some amount of added security over the security granted by the key length of the cipher you're using. How much extra is hard to say, but not knowing it does increase the work required to figure out how to decrypt a given message.

How to store and verify digits chosen at random from a PIN/Password

If I have a user's 6-digit PIN (or n-char string) and I wish to verify, say, 3 digits chosen at random from the PIN (or x chars) as part of a 'login' procedure, how would I store the PIN in a database, or some encrypted/hashed version of the PIN, in such a way that I could verify the user's identity?
Thoughts:
Store the PIN in a reversible (symmetrically or asymmetrically) encrypted manner, decrypt for digit checks.
Store a range of hashed permutations of the PIN against some ID, which links to the 'random digits' selected, e.g.:
ID: 123 = Hash of Digits 1, 2, 3
ID: 416 = Hash of Digits 4, 1, 6
Issues:
Key security: Assume that the key is 'protected' and that the app is not financial nor highly critical, but is 'high-volume'.
Creating a large number of hash permutations is both prohibitively high-storage (16 bytes x several permutations) and time-consuming, and probably overkill.
Are there any other options, issues or refinements?
Yes: I know storing passwords/PINs in a reversible manner is 'contentious' and ideally shouldn't be done.
Update
Just for clarification:
1. Random digits is a scheme I am considering to avoid key-loggers.
2. It is not possible to attempt more than a limited number of retries.
3. Other elements help secure and authenticate access.
As any encryption scheme you use to store the password/pass phrase would be either prohibitively expensive or easily cracked, I am coming down on the side of just storing it in plain text and ensuring that the database and server security is up to scratch.
You could consider some lightweight encryption scheme to hide the passwords from a casual browser of the database, but you have to admit that any scheme will have two basic vulnerabilities. One -- your program will need a password or key which will have to be stored somewhere and will be almost as vulnerable to snooping as the actual passwords stored in plain text, and, Two -- if you have a reasonable number of users then a hacker who has access to the encrypted passwords has lots of clues to aid his brute force attack, and if your site is open to the public he can insert any number of "known texts" into your database.
Since 6C3 is 20 and 10C3 is 120, I'll get a false positive (be authenticated) on 1/6th of my guesses.
This scheme is only slightly better than no authentication at all regardless of how you store the token.
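The two binomial coefficients quoted above can be checked directly. This snippet only reproduces that arithmetic (the variable names are made up); it does not model the login scheme itself or say how the ratio should be interpreted.

```python
from fractions import Fraction
from math import comb

positions = comb(6, 3)    # ways to choose 3 positions out of a 6-digit PIN
digit_sets = comb(10, 3)  # ways to choose 3 distinct digits out of 0-9

ratio = Fraction(positions, digit_sets)
print(positions, digit_sets, ratio)  # 20 120 1/6
```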
I totally agree with msw but that argument is only (or mostly) valid for the six digit scheme. For the n-char approach, the false positive ratio will (sometimes...) be much lower. One improvement would be that the random characters must be entered in the same order as in the password.
Also I think that storing hashed permutations would make it relatively easy to find the key using some brute force approach. For example, testing and combining different combinations of three characters and checking those against the stored hashes. This would defeat the purpose of hashing the key in the first place so you might as well store the key encrypted instead.
Another, totally different argument, is that your users might get very confused by this odd login procedure :)
One possible solution is to use Reed-Solomon (or something like it) to construct an n-of-m scheme: generate a polynomial f(x) of degree n-1, where n is the number of digits needed to log in, and generate the pin digits by evaluating f(x) at x=1..6. The digits combined become your full pin. Any three of these digits can then be used (along with their x coordinate) to interpolate the polynomial constants. If they are equal to your original constants, the digits are correct.
The biggest problem, of course, is to form a field out of numbers 0..9 for polynomial constant arithmetic. Ordinary arithmetic will not cut it in this instance. And my finite field is too rusty to remember if it is possible. If you go 4 bits per digit, you can use GF(2^4) to overcome this deficiency. In addition, it is not possible to select your PIN. It will need to be assigned to you. Finally, assuming you can fix all the problems, there are only 1000 distinct polynomials for a 3 of n scheme, and it is too small for proper security.
Anyhow, I don't think this will be a good method, but I wanted to add some different ideas into the mix.
You say you've other elements for authentication. If you've also passwords, you might do the following:
1. Ask for a password (password is stored as hash only on your side)
2. Check the hash of the entered password against the stored password hash
3. On success, continue, otherwise go back to 1
4. Use the entered (unhashed) password as key for the symmetrically encrypted PIN
5. Ask for some random digits of the PIN
This way the PIN is encrypted, but the key is not stored in plain text on your side. The online portal of my bank seems to do just that (at least I hope that the PIN is encrypted, but from the user's view the login process is like the one described above).
The key is 'protected'
The app is not financial nor highly critical,
The app is 'high-volume'.
Creating a wide-number number of hash permutations is both prohibitively high-storage (16bytes x several permutations) and time-consuming probably overkill
Random digits is a scheme I am considering to avoid key-loggers.
It is not possible to attempt more than a limited number of retries.
Other elements help secure and authenticate access.
You seem to be arguing for storing the PIN in the clear. I say go for it. You're basically describing a challenge-response authentication method, and cleartext storage on the server side is common for that use-case.
Something similar to this is a one-time-pad, or a secret key matrix. The difference is that the user has to keep / have the pad with them to access. The benefit is that as long as you get the key distribution sufficiently secure, you're very safe from keyloggers.
If you want to make it so that exposure of the matrix / pad doesn't cause compromise alone, have the user use a short (3-4 number) PIN with the pad, and keep your sensitive locking mechanism.
Example of a matrix:
  1 2 3 4 5 6 7 8
A ; k j l k a s g
B f q 3 n 0 8 u 0
C 1 2 8 e g u 8 -
A challenge might be: "Enter your PIN, and then the character from square B3 from your matrix."
The response might be:
98763

Initialization vector uniqueness

Best practice is to use unique IVs, but what is unique? Is it unique for each record, or absolutely unique (unique for each field too)?
If it's per field, that sounds awfully complicated. How do you manage the storage of so many IVs if you have 60 fields in each record?
I started an answer a while ago, but suffered a crash that lost what I'd put in. What I said was along the lines of:
It depends...
The key point is that if you ever reuse an IV, you open yourself up to cryptographic attacks that are easier to execute than those when you use a different IV every time. So, for every sequence where you need to start encrypting again, you need a new, unique IV.
You also need to look up cryptographic modes - the Wikipedia has an excellent illustration of why you should not use ECB. CTR mode can be very beneficial.
If you are encrypting each record separately, then you need to create and record one IV for the record. If you are encrypting each field separately, then you need to create and record one IV for each field. Storing the IVs can become a significant overhead, especially if you do field-level encryption.
However, you have to decide whether you need the flexibility of field level encryption. You might - it is unlikely, but there might be advantages to using a single key but different IVs for different fields. OTOH, I strongly suspect that it is overkill, not to mention stressing your IV generator (cryptographic random number generator).
If you can afford to do encryption at a page level instead of the row level (assuming rows are smaller than a page), then you may benefit from using one IV per page.
Erickson wrote:
You could do something clever like generating one random value in each record, and using a hash of the field name and the random value to produce an IV for that field.
However, I think a better approach is to store a structure in the field that collects an algorithm identifier, necessary parameters (like IV) for that parameter, and the ciphertext. This could be stored as a little binary packet, or encoded into some text like Base-85 or Base-64.
And Chris commented:
I am indeed using CBC mode. I thought about an algorithm to do a 1:many so I can store only 1 IV per record. But now I'm considering your idea of storing the IV with the ciphertext. Can you give me some more advice: I'm using PHP + MySQL, and many of the fields are either varchar or text. I don't have much experience with binary in the database, I thought binary was database-unfriendly so I always base64_encoded when storing binary (like the IV for example).
To which I would add:
IBM DB2 LUW and Informix Dynamic Server both use a Base-64 encoded scheme for the character output of their ENCRYPT_AES() and related functions, storing the encryption scheme, IV and other information as well as the encrypted data.
I think you should look at CTR mode carefully - as I said before. You could create a 64-bit IV from, say, 48-bits of random data plus a 16-bit counter. You could use the counter part as an index into the record (probably in 16 byte chunks - one crypto block for AES).
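As a rough sketch of the layout suggested above, the following Python builds a 64-bit value from 48 random bits plus a 16-bit counter. The function names are hypothetical, the big-endian byte order is an assumption, and how this 8-byte value is expanded into the full AES counter block depends on the crypto library, which is not shown here.

```python
import secrets

RANDOM_BITS = 48
COUNTER_BITS = 16


def new_record_prefix() -> bytes:
    """Generate the 48 random bits once per record."""
    return secrets.token_bytes(RANDOM_BITS // 8)


def ctr_iv(prefix: bytes, block_index: int) -> bytes:
    """Concatenate the per-record prefix with a 16-bit block counter (big-endian)."""
    if len(prefix) != RANDOM_BITS // 8:
        raise ValueError("prefix must be 6 bytes")
    if not 0 <= block_index < 2 ** COUNTER_BITS:
        raise ValueError("block index does not fit in 16 bits")
    return prefix + block_index.to_bytes(COUNTER_BITS // 8, "big")


def block_index_for_offset(byte_offset: int, block_size: int = 16) -> int:
    """Map a byte offset within the record to its 16-byte AES block."""
    return byte_offset // block_size
```

With a 16-bit counter and 16-byte blocks, one prefix covers at most 65536 blocks (1 MiB) of record data before a new random prefix is needed.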
I'm not familiar with how MySQL stores data at the disk level. However, it is perfectly possible to encrypt the entire record including the representation of NULL (absence of) values.
If you use a single IV for a record, but use a separate CBC encryption for each field, then each field has to be padded to 16 bytes, and you are definitely indulging in 'IV reuse'. I think this is cryptographically unsound. You would be much better off using a single IV for the entire record and either one unit of padding for the record and CBC mode or no padding and CTR mode (since CTR does not require padding - one of its merits; another is that you only use the encryption mode of the cipher for both encrypting and decrypting the data).
Once again, appendix C of NIST pub 800-38 might be helpful. E.g., according to this you could generate an IV for the CBC mode simply by encrypting a unique nonce with your encryption key. Even simpler, if you use OFB then the IV just needs to be unique.
There is some confusion about what the real requirements are for good IVs in the CBC mode. Therefore, I think it is helpful to look briefly at some of the reasons behind these requirements.
Let's start with reviewing why IVs are even necessary. IVs randomize the ciphertext. If the same message is encrypted twice with the same key (but different IVs), then the ciphertexts are distinct. An attacker who is given two (equally long) ciphertexts should not be able to determine whether the two ciphertexts encrypt the same plaintext or two different plaintexts. This property is usually called ciphertext indistinguishability.
Obviously this is an important property for encrypting databases, where many short messages are encrypted.
Next, let's look at what can go wrong if the IVs are predictable. Let's for example take Erickson's proposal:
"You could do something clever like generating one random value in each record, and using a hash of the field name and the random value to produce an IV for that field."
This is not secure. For simplicity assume that a user Alice has a record in which there exist only two possible values m1 or m2 for a field F. Let Ra be the random value that was used to encrypt Alice's record. Then the ciphertext for the field F would be
EK(hash(F || Ra) xor m).
The random Ra is also stored in the record, since otherwise it wouldn't be possible to decrypt. An attacker Eve, who would like to learn the value of Alice's record can proceed as follows: First, she finds an existing record where she can add a value chosen by her.
Let Re be the random value used for this record and let F' be the field for which Eve can submit her own value v. Since the record already exists, it is possible to predict the IV for the field F', i.e. it is
hash(F' || Re).
Eve can exploit this by selecting her value v as
v = hash(F' || Re) xor hash(F || Ra) xor m1,
let the database encrypt this value, which is
EK(hash(F || Ra) xor m1)
and then compare the result with Alice's record. If the two results match, then she knows that m1 was the value stored in Alice's record; otherwise it will be m2.
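The attack above can be traced in a few lines of Python. This is a sketch under loud assumptions: toy_ek is a stand-in for EK built from HMAC-SHA256 (it is NOT a real block cipher, it is only deterministic and keyed, which is all the comparison needs), hash is SHA-256 truncated to one 16-byte block, and every function name is made up for the illustration.

```python
import hashlib
import hmac

BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def derive_iv(field_name: bytes, record_random: bytes) -> bytes:
    """The predictable IV from the proposal: hash(F || R), truncated to one block."""
    return hashlib.sha256(field_name + record_random).digest()[:BLOCK]


def toy_ek(key: bytes, block: bytes) -> bytes:
    """Stand-in for EK (assumption: NOT a real block cipher, just deterministic and keyed)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def encrypt_field(key: bytes, field_name: bytes, record_random: bytes, value: bytes) -> bytes:
    """First CBC block of a field: EK(IV xor m)."""
    return toy_ek(key, xor(derive_iv(field_name, record_random), value))


def eve_confirms_guess(oracle, alice_ct, f, r_a, f_prime, r_e, guess) -> bool:
    """Eve submits v = hash(F'||Re) xor hash(F||Ra) xor guess and compares ciphertexts."""
    v = xor(xor(derive_iv(f_prime, r_e), derive_iv(f, r_a)), guess)
    return oracle(f_prime, r_e, v) == alice_ct
```

Here oracle stands for the database encrypting a value Eve submits into her own record. Eve never needs the key: the two IVs cancel, so the cipher sees the same input block as it did for Alice's field whenever the guess is right.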
You can find variants of this attack by searching for "block-wise adaptive chosen plaintext attack" (e.g. this paper). There is even a variant that worked against TLS.
The attack can be prevented, possibly by encrypting the random value before putting it into the record and deriving the IV by encrypting the result. But again, probably the simplest thing to do is what NIST already proposes: generate a unique nonce for every field that you encrypt (this could simply be a counter), encrypt the nonce with your encryption key and use the result as an IV.
Also note that the attack above is a chosen plaintext attack. Even more damaging attacks are possible if the attacker can mount chosen ciphertext attacks, i.e. if she can modify your database. Since I don't know how your databases are protected it is hard to make any claims there.
The requirements for IV uniqueness depend on the "mode" in which the cipher is used.
For CBC, the IV should be unpredictable for a given message.
For CTR, the IV has to be unique, period.
For ECB, of course, there is no IV. If a field is a short, random identifier that fits in a single block, you can use ECB securely.
I think a good approach is to store a structure in the field that collects an algorithm identifier, necessary parameters (like IV) for that algorithm, and the ciphertext. This could be stored as a little binary packet, or encoded into some text like Base-85 or Base-64.

Random access encryption with AES In Counter mode using Fortuna PRNG:

I'm building file encryption based on AES that has to be able to work in random-access mode (accessing any part of the file). AES in Counter mode, for example, can be used, but it is well known that we need a unique sequence never used twice.
Is it ok to use a simplified Fortuna PRNG in this case (encrypting a counter with a randomly chosen unique key specific to the particular file)? Are there weak points in this approach?
So encryption/decryption can look like this
Encryption of a block at Offset:
rndsubseq = AESEnc(Offset, FileUniqueKey)
xoredplaintext = plaintext xor rndsubseq
ciphertext = AESEnc(xoredplaintext, PasswordBasedKey)
Decryption of a block at Offset:
rndsubseq = AESEnc(Offset, FileUniqueKey)
xoredplaintext = AESDec(ciphertext, PasswordBasedKey)
plaintext = xoredplaintext xor rndsubseq
One observation. I came to the idea used in Fortuna by myself and, of course, discovered later that it had already been invented. Everywhere I read, the key point about it is security, but there's another good point: it is a great random-access pseudo-random number generator, so to speak (in simplified form). So it is a PRNG that not only produces a very good sequence (I tested it with Ent and Diehard) but also allows access to any sub-sequence if you know the step number. So is it generally OK to use Fortuna as a "random-access" PRNG in security applications?
EDIT:
In other words, what I suggest is to use Fortuna PRNG as a tweak to form a tweakable AES Cipher with random-access ability. I read the work of Liskov, Rivest and Wagner, but could not understand what was the main difference between a cipher in a mode of operation and a tweakable cipher. They said they suggested to bring this approach from high level inside the cipher itself, but for example in my case xoring the plain text with the tweak, is this a tweak or not?
I think you may want to look up how "tweakable block ciphers" work and have a look at how the problem of disc encryption is solved: Disk encryption theory. Encrypting the whole disk is similar to your problem: encryption of each sector must be done independently (you want independent encryption of data at different offsets) and yet the whole thing must be secure. There is a lot of work done on that. Wikipedia seems to give a good overview.
EDITED to add:
Re your edit: Yes, you are trying to make a tweakable block cipher out of AES by XORing the tweak with the plaintext. More concretely, you have Enc(T,K,M) = AES (K, f(T) xor M) where AES(K,...) means AES encryption with the key K and f(T) is some function of the tweak (in your case I guess it's Fortuna). I had a brief look at the paper you mentioned and as far as I can see it's possible to show that this method does not produce a secure tweakable block cipher.
The idea (based on definitions from section 2 of the Liskov, Rivest, Wagner paper) is as follows. We have access to either the encryption oracle or a random permutation and we want to tell which one we are interacting with. We can set the tweak T and the plaintext M and we get back the corresponding ciphertext but we don't know the key which is used. Here is how to figure out if we use the construction AES(K, f(T) xor M).
Pick any two different values T, T', compute f(T), f(T'). Pick any message M and then compute the second message as M' = M xor f(T) xor f(T'). Now ask the encrypting oracle to encrypt M using tweak T and M' using tweak T'. If we deal with the considered construction, the outputs will be identical. If we deal with random permutations, the outputs will be almost certainly (with probability 1-2^-128) different. That is because both inputs to the AES encryptions will be the same, so the ciphertexts will be also identical. This would not be the case when we use random permutations, because the probability that the two outputs are identical is 2^-128. The bottom line is that xoring tweak to the input is probably not a secure method.
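The distinguishing test above can be sketched in Python. The assumptions are significant and worth stating: toy_e is a keyed stand-in for AES(K, ...) built from HMAC-SHA256 (not a real block cipher), f is taken to be a public function of the tweak as the argument requires, and independent_oracle is only a rough model of "a different random permutation per tweak". All names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def f(tweak: bytes) -> bytes:
    """Assumed public tweak function: SHA-256 truncated to one block."""
    return hashlib.sha256(tweak).digest()[:BLOCK]


def toy_e(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES(K, block) (assumption: NOT a real block cipher)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor_tweak_enc(key: bytes, tweak: bytes, message: bytes) -> bytes:
    """The construction under test: Enc(T, K, M) = E(K, f(T) xor M)."""
    return toy_e(key, xor(f(tweak), message))


def independent_oracle(key: bytes, tweak: bytes, message: bytes) -> bytes:
    """Rough model of the ideal case: output depends on tweak and message jointly."""
    data = len(tweak).to_bytes(2, "big") + tweak + message
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def looks_like_xor_tweak(oracle) -> bool:
    """Query (T, M) and (T', M xor f(T) xor f(T')); equal outputs reveal the construction."""
    t1, t2 = b"tweak-1", b"tweak-2"
    m = bytes(BLOCK)
    m_prime = xor(xor(m, f(t1)), f(t2))
    return oracle(t1, m) == oracle(t2, m_prime)
```

Calling looks_like_xor_tweak with an oracle that wraps xor_tweak_enc under any key should return True, while wrapping independent_oracle should return False except with negligible probability.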
The paper gives some examples of what they can prove to be a secure construction. The simplest one seems to be Enc(T,K,M) = AES(K, T xor AES(K, M)). You need two encryptions per block, but they prove the security of this construction. They also mention faster variants, but they require an additional primitive (almost-xor-universal function families).
Even though I think your approach is secure enough, I don't see any benefits over CTR. You have the exact same problem, which is you don't inject true randomness to the ciphertext. The offset is a known systematic input. Even though it's encrypted with a key, it's still not random.
Another issue is how you keep the FileUniqueKey secure. Encrypted with a password? A whole bunch of issues are introduced when you use multiple keys.
Counter mode is accepted practice to encrypt random access files. Even though it has all kinds of vulnerabilities, it's all well studied so the risk is measurable.
