I've been trying to figure out the best way to accomplish the task of encrypting big (several GB) files into the file system for later access.
I've been experimenting with several modes of AES (particularly CBC and GCM) and there are some pros and cons I've found on each approach.
After researching and asking around, I have come to the conclusion that, at least at this moment, using AES+GCM is not feasible for me, mostly because of the issues it has in Java and the fact that I can't use BouncyCastle.
So I am writing this to talk about the protocol I'm going to be implementing to complete the task. Please provide feedback as you see fit.
Encryption
Using AES/CBC/PKCS5Padding with 256 bit keys
The file will be encrypted using a custom CipherOutputStream. This output stream will take care of writing a custom header at the beginning of the file which will consist of at least the following:
First few bytes to easily tell that the file is encrypted
IV
Algorithm, mode and padding used
Size of the key
The length of the header itself
While the file is being encrypted, it will also be digested to calculate its authentication tag.
When the encryption ends, the tag will be appended at the end of the file. The tag is of a known size, so this makes it easy to recover it later.
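To make this concrete, here is a rough sketch of what such an encrypting stream could do. The magic bytes, the exact header layout and the use of HmacSHA256 for the tag are illustrative assumptions on my part, not a final format:

import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.SecureRandom;

// Illustrative sketch only: header layout and magic bytes are placeholders.
final class EncryptingStreamSketch {
    private static final byte[] FILE_MAGIC = {'E', 'N', 'C', '1'};

    static void encrypt(OutputStream out, SecretKey aesKey, SecretKey macKey,
                        InputStream plaintext) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, aesKey, new IvParameterSpec(iv));

        Mac mac = Mac.getInstance("HmacSHA256");   // tag is computed over the ciphertext
        mac.init(macKey);

        // Header: magic bytes, transformation, key size, IV, prefixed by its own length.
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        DataOutputStream h = new DataOutputStream(header);
        h.write(FILE_MAGIC);
        h.writeUTF("AES/CBC/PKCS5Padding");
        h.writeInt(256);                  // key size in bits
        h.writeInt(iv.length);
        h.write(iv);
        byte[] headerBytes = header.toByteArray();

        DataOutputStream d = new DataOutputStream(out);
        d.writeInt(headerBytes.length);   // the length of the header itself
        d.write(headerBytes);
        mac.update(headerBytes);          // authenticate the header as well

        // Body: encrypt and feed every ciphertext byte into the MAC as it is written.
        byte[] buf = new byte[8192];
        int n;
        while ((n = plaintext.read(buf)) != -1) {
            byte[] ct = cipher.update(buf, 0, n);
            if (ct != null) { d.write(ct); mac.update(ct); }
        }
        byte[] last = cipher.doFinal();
        d.write(last);
        mac.update(last);

        d.write(mac.doFinal());           // fixed-size tag appended at the end
        d.flush();
    }
}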
Decryption
A custom CipherInputStream will be used. This stream knows how to read the header.
It will then read the authentication tag, and will digest the whole file (without decrypting it) to validate it has not been tampered with (I haven't actually measured how this will perform; however, it's the only way I can think of to safely start decryption without the risk of finding out too late that the file should not have been decrypted in the first place).
If the validation of the tag is ok, then the header will provide all the information needed to initialize the cipher and make the input stream decrypt the file. Otherwise it will fail.
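A rough sketch of that verify-first pass, assuming the illustrative header layout above and an HMAC-SHA256 tag of known length:

import javax.crypto.Mac;
import javax.crypto.SecretKey;
import java.io.File;
import java.io.RandomAccessFile;
import java.security.MessageDigest;

// Illustrative sketch: verifies the appended tag over header + ciphertext
// before any decryption is attempted.
final class TagVerifierSketch {
    static boolean tagIsValid(File file, SecretKey macKey) throws Exception {
        final int TAG_LEN = 32;                        // HmacSHA256 output size
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long ciphertextEnd = raf.length() - TAG_LEN;

            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(macKey);

            byte[] buf = new byte[8192];
            long remaining = ciphertextEnd;            // header + ciphertext
            while (remaining > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) break;
                mac.update(buf, 0, n);
                remaining -= n;
            }

            byte[] expected = new byte[TAG_LEN];       // stored tag at the end of the file
            raf.seek(ciphertextEnd);
            raf.readFully(expected);

            // constant-time comparison to avoid leaking where the mismatch occurs
            return MessageDigest.isEqual(expected, mac.doFinal());
        }
    }
}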
Is this something that seems ok to you in order to handle encryption/decryption of big files?
Some points:
A) Hashing of the encrypted data, with the hash not encrypted itself.
One of the possible things a malicious human M could do without any hash: overwrite the encrypted file with something else. M doesn't know the key, the plaintext before and/or the plaintext after this action, but he can change the plaintext to something different (usually, it becomes garbage data). Destruction is also a valid purpose for some people.
The "good" user with the key can still decrypt it without problems, but it won't be the original plaintext. So far no problem if it's garbage data, if (and only if) you know for sure what's inside, i.e. how to recognize whether it is unchanged. But do you know that in every case? And there's a small chance that the "garbage" data actually makes sense, but is not the real data anyway.
So, to recognize if the file was changed, you add a SHA hash of the encrypted data.
And if the evil user M overwrites the encrypted file part, he will do what with the hash? Right, he can recalculate it so that it matches the new encrypted data. Once again, you can't recognize changes.
If the plaintext is hashed and then everything is encrypted, it's pretty much impossible to get it right. Remember, M doesn't know the key or anything. M can change the plaintext inside to "something", but can't change the hash to the correct value for this something.
B) CBC
CBC is fine if you decrypt the whole file or nothing every time.
If you want to access parts of it without decrypting the unused parts, look at XTS.
C) Processing twice
It will then read the authentication tag, and will digest the whole file (without decrypting it) to validate it has not been tampered with (I haven't actually measured how this will perform; however, it's the only way I can think of to safely start decryption without the risk of finding out too late that the file should not have been decrypted in the first place).
Depending on how the files are used, this is indeed necessary. Especially if you want to start using the data already during the final pass, before it has finished.
I don't know the details of the Java CipherOutputStream,
but besides that and the mentioned points, it looks fine to me.
Related
I am securing our servers using password-based encryption for JBoss 5.1.0.
I have read parts of the RFC:
https://www.rfc-editor.org/rfc/rfc2898
I have read the JBoss documents several times:
https://docs.jboss.org/jbosssecurity/docs/6.0/security_guide/html/Encrypting_Data_Source_Passwords.html (this is for 6.0, but works with 5.1.0)
Now, let me explain my issue.
In the official JBoss document listed above, they treat "Secured Identity" encryption as if it is secure. Heck, it is in the documents. Worse, I've seen other people ask questions on Stack Overflow on how to use this. This is not secure. To make it secure, one has to write their own encryption class overriding org.jboss.resource.security.SecureIdentityLoginModule.
I was able to prove this by doing a quick google search of "Decrypt Jboss 5.1.0 Password" and the first result was a jar file that decrypted Jboss passwords using the recommended approach in the official Jboss documentation.
Enter Password Based Encryption.
Knowing I've already found a security flaw in the first approach, I am already wary of taking advice from this documentation; if you're wrong once, you're probably wrong twice. However, it seems I don't have a choice: I must use approach 2, Password Based Encryption.
My concern is, the documentation makes me generate a 'master.password' file. I am assuming this is the derived key function mentioned in the RFC. However, I don't know for certain.
All in all, my gut tells me this:
You're making me store a master.password file on my server. The master.password file that contains the derived key function can be used by some code somewhere to simply decrypt my encrypted password. That is because I am specifying the salt and iterations elsewhere in other files.
This whole process seems like a mathematical function. On my end it looks like this:
? = DerivedKeyFunction(Salt, Iterations, Password)
But for the hacker it looks like this:
EncodedPassword = DerivedKeyFunction(Salt, Iterations, ?)
I claim to be neither a cryptographer nor a JBoss expert, but my gut tells me all a hacker needs to do is look at the JBoss source code (which is open source as far as I know) and do a little bit of reverse engineering to get the password using the server.password file.
So my question is: How secure is Password Based Encryption on Jboss (assuming the hacker gained access to the server)? Has anyone actually looked into this?
------------ EDIT -----------------
To clarify:
This is for JBoss to connect to our Database. This is not for an end-user to log into their user account on a web application.
JBoss uses a master.password (or server.password... it's just a filename) which contains some sort of encrypted string. I'm not sure what's in there; it's not well documented (or maybe it is and I just don't understand).
After the configuration is followed, a password is never entered again. I don't see how this is secure. I'm guessing I can somehow use the server.password file created in step 1 to decrypt my database password. Someone just hasn't written a convenient jar file yet. But the code is open source, so I'm guessing the right person knows how to do this very easily.
I am sharing the steps due to the number of terrible setups I've seen people using on stack overflow. The steps are as follows:
From the jboss/common/lib folder, create the server.password file and place it in the server/conf directory:
java -cp jbosssx.jar org.jboss.security.plugins.FilePassword <8Charactersalt> <iterationsMoreThan1000> <aLongRandomPassword> server.password
#outputs server.password file which contains encrypted string.
Encrypt Database Password
java -cp jbosssx.jar org.jboss.security.plugins.PBEUtils <8Charactersalt> <iterationsMoreThan1000> <aLongRandomPassword> <databaseConnectionPassword>
#outputs encrypted DB Password
Remove Username & Password & Update Datasource XML
<security-domain>EncryptedMySqlDbRealm</security-domain>
<depends>jboss.security:service=JaasSecurityDomain,domain=ServerMasterPassword</depends>
Add MBean to Datasource XML. It specifies the server.password file, salt, and iterations.
{CLASS}org.jboss.security.plugins.FilePassword:${jboss.server.home.dir}/conf/server.password
${8Charactersalt}
${iterationsMoreThan1000}
Add Application Policy to Login Config XML. Specify the username, encrypted password, and datasource to encrypt. There is a 1-to-1 mapping between application policies and datasources, so if you have two datasources, it appears you need two application policies as well. Otherwise you get errors starting up JBoss.
${DatabaseUsername}
${EncryptedPassword}
jboss.jca:service=LocalTxCM,name=${DataSourceNameFromDatasourceXML}
jboss.security:service=JaasSecurityDomain,domain=ServerMasterPassword
It sounds like you just want to obfuscate your password. Encrypting it just makes this a circular process: you need to choose a password to encrypt your password that you will use to encrypt your password that you will.....
Simply Base-64 encode it. Or some other type of (non-encrypting) encoding.
First, a caveat: I know nothing of the JBoss system to which you are referring. But I'm fairly sure the system is not asking you to "store a master.password on my server." However, I am familiar enough with encryption to offer this explanation:
You want to store some plain-text data and protect it with a password. So, you ask a user for a password, encrypt the data, and store the encrypted text (call this "Cipher-Text"), then discard the password. When the user wants to retrieve it, you ask for a password and then decrypt the Cipher-Text. If the password is correct you will get back the original Plain-Text.
An encryption (and decryption) process requires a Key. This is a numeric value, not a password. So, you need a way to derive a Key-value from some text Pass-Phrase. The DerivedKeyFunction performs this action. The returned result is not an Encoded Password. It is a Key value that is passed to the encryption/decryption process.
So, you ask the user for a Pass-Phrase, then call DerivedKeyFunction to get a Key-value, then encrypt the Plain-Text into Cipher-Text (using the Key), store the resulting Cipher-Text and then discard the Pass-Phrase and the Key.
And to decrypt, you ask for the Pass-Phrase, re-derive the Key, and then decrypt.
Basically, the DerivedKeyFunction is a Hash function (or process); you use it to "convert" the Pass-Phrase into a numeric value that can be used by the encrypt process.
Now, you will note the other two parameters to the DerivedKeyFunction: "Salt" and "Iterations". These are required to increase the difficulty of an attack on your data. "Iterations" (obviously) specifies the number of times to re-hash the Pass-Phrase. And the "Salt" injects a random number into this iterative process.
Hopefully, you see now that these are two values that you will need to store with the Cipher-Text. Any time you want to derive the Key from your Pass-Phrase, you must do it the same way each time; that means iterating the same number of times and injecting the same "Salt" value.
So, now your process is:
1) Pick a random Salt value (yes, a random number).
2) Decide on an iteration count, let's say for example 100,000. (This should be a big number, so DerivedKeyFunction takes a long time; explanation in a moment.)
3) Ask the user for a Pass-Phrase.
4) Call DerivedKeyFunction handing it the Pass-Phrase, the random Salt value, and 100,000. This returns a Key-value (implied by those 3 parameters).
5) Encrypt Plain-Text into Cipher-Text using the Key-value.
6) Store the Cipher-Text and the Salt value and the iteration count (100,000)
7) Discard the Pass-Phrase and Key-value.
To decrypt:
1) Ask for the Pass-Phrase.
2) Call DerivedKeyFunction handing it the Pass-Phrase, the stored Salt value, and the stored iteration count. If the Pass-Phrase is correct, then the same (correct) Key-value will be returned.
3) Decrypt the Cipher-Text into Plain-Text using the Key-value.
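To make those two flows concrete, here is a minimal Java sketch using PBKDF2 to stand in for the generic DerivedKeyFunction; the algorithm name, salt size and iteration count are illustrative choices, not what JBoss itself necessarily does:

import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

final class PbeSketch {
    static final int ITERATIONS = 100_000;

    // Steps 1-7: derive a key from the pass-phrase, encrypt, and keep only
    // the cipher-text plus the salt and iteration count (and the IV for CBC).
    static byte[][] encrypt(char[] passPhrase, byte[] plainText) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] salt = new byte[16];
        rnd.nextBytes(salt);
        byte[] iv = new byte[16];
        rnd.nextBytes(iv);

        SecretKeySpec key = deriveKey(passPhrase, salt);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] cipherText = cipher.doFinal(plainText);

        return new byte[][] { cipherText, salt, iv };    // store all three
    }

    // Decryption: re-derive the key from the stored salt and iteration count.
    static byte[] decrypt(char[] passPhrase, byte[] cipherText,
                          byte[] salt, byte[] iv) throws Exception {
        SecretKeySpec key = deriveKey(passPhrase, salt);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(cipherText);               // fails if the pass-phrase is wrong
    }

    private static SecretKeySpec deriveKey(char[] passPhrase, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(passPhrase, salt, ITERATIONS, 256);
        byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                                          .generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }
}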
Ok, so why choose a large number for the iteration count? Well, an attacker will simply spin around trying password after password until they successfully decrypt the Cipher-Text. But they have to call DerivedKeyFunction with each attempt they make. And, they must use the correct Salt value and Iteration count that you used when you encrypted the data. And, yes, since you had to store them for your use, an attacker will know what they are; but they still have to call DerivedKeyFunction over and over. So you should see that the higher you make the iteration count, the lower the number of attempts-per-second an attacker can try.
Although you didn't mention it, when using CBC (Cipher-Block-Chain) type encryption algorithms, there is another parameter called the IV or Initialization Vector. This value is an input to the encryption/decryption as a companion to the Key. As it pertains to the above processes, treat it as an extension of the Key: The DerivedKeyFunction that is used should provide both a Key-value and an IV-value. And, as with the Key-value, the IV-value is never stored and is discarded at the same points that the Key-value is discarded.
A system I have been working on for a while requires DPA, and I asked a question about keeping the data passcodes safe. I have since then come up with an idea to fix that, which involves storing the data-decryption password for the database in the database itself, but encrypted with the validated user's password (which is stored as an MD5 key) after a different type of hashing.
The question is that does encrypting the password multiple times with different keys (at least 20 characters long, with possible extension) make it considerably easier to decrypt without prior knowledge or information on the password?
No; in general, a good cipher should have the property that you cannot recover the key even if you know the plaintext. Having the data already encrypted should not have much influence, given a good cipher and a big enough key space.
First off, MD5 is no longer considered a secure hash algorithm. See http://www.kb.cert.org/vuls/id/836068 for details.
Secondly, the encryption key for the data should not be stored in the database itself. It should be stored separately. That way there are at least two things that have to be obtained (the database file and the key) to decrypt the data. If the key is stored in the database itself, it probably wouldn't take long to find it once someone has the database file.
Find a separate method for storing the key. It should either be coded into the application or stored in a file that is obfuscated in some way.
Imagine you have a database full of secret information, e.g. a list of usernames + passwords.
If you want to encrypt this database using an algorithm such as AES-128, how would you encrypt the data?
Encrypt only the secret information fields, e.g. the passwords. Leave the usernames as they are. Output could be: "mike@example.org/AES_ENCRYPTED_PASSWORD;linda@example.org/AES_ENCRYPTED_PASSWORD"
Encrypt the entire database, output would be: "AES_ENCRYPTED_DATA"
The problem I am thinking of: probably, the data is saved in XML format. So a possible attacker could try random passwords using brute force until he finds an XML element in the decrypted data. So it's easier to crack than the first approach. Right?
Or is it safe to just save my data temporarily in XML format and then encrypt the whole XML file using AES?
As I understand the OP, the question is more towards "known plaintext" and the size and redundancy of the message to be encrypted.
So:
YES, it is in most cases "easier" to break an encryption if parts of the plaintext are known, like XML-tags.
YES, it may become easier to break an encryption if more encrypted data is available.
BUT: All common off-the-shelf encryption algorithms, that are not yet considered "broken", are pretty immune against both types of attack.
In theory, it should in fact be safer if only short messages of pretty much random content (such as passwords) were encrypted. If, however, many such messages are encrypted independently, one would have to think about initialization vectors (similar to a "salt") and the like to avoid producing the very patterns one intends to hide.
Conclusion:
Take a "good" algorithm with a good key/password/... and -if feasible- encrypt your "database" as one big plaintext message.
Or is it safe to just [...] encrypt the whole [database] file using AES?
Yes, that's what I would recommend in principle. But be very careful how and where you store your data "temporarily"; a file in a "temp" dir, for instance, may not be as "temporary" as one is tempted to believe.
If you are using the same passphrase the difficulty would be the same whether this is a bunch of usernames/passwords or sets of XML files.
Only encrypt what you must encrypt. Though when it comes to passwords, encryption is not a good idea, as passwords can be recovered (which would disclose them to a would-be attacker).
It is better to store a hash and have a mechanism to generate new passwords if a user can't recall their password.
Better to store the passwords as a hash with salt.
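As a rough illustration of the store-a-salted-hash approach, here is a small Java sketch using PBKDF2 as the slow, salted hash; the class name and parameters are examples, not a prescribed scheme:

import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.MessageDigest;
import java.security.SecureRandom;

// Illustrative sketch: store (salt, hash) per user instead of a recoverable password.
final class PasswordStoreSketch {
    private static final int ITERATIONS = 100_000;
    private static final int HASH_BITS = 256;

    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, HASH_BITS);
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                               .generateSecret(spec).getEncoded();
    }

    // On login: recompute with the stored salt and compare in constant time.
    static boolean verify(char[] attempt, byte[] salt, byte[] storedHash) throws Exception {
        return MessageDigest.isEqual(storedHash, hash(attempt, salt));
    }
}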
Is it possible to reverse a SHA-1?
I'm thinking about using a SHA-1 to create a simple lightweight system to authenticate a small embedded system that communicates over an unencrypted connection.
Let's say that I create a sha1 like this with input from a "secret key" and spice it with a timestamp so that the SHA-1 will change all the time.
sha1("My Secret Key"+"a timestamp")
Then I include this SHA-1 in the communication, and the server can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1.
Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
Note1:
I guess you could brute force in some way, but how much work would that actually be?
Note2:
I don't plan to encrypt any data, I just would like to know who sent it.
No, you cannot reverse SHA-1, that is exactly why it is called a Secure Hash Algorithm.
What you should definitely be doing though, is include the message that is being transmitted into the hash calculation. Otherwise a man-in-the-middle could intercept the message, and use the signature (which only contains the sender's key and the timestamp) to attach it to a fake message (where it would still be valid).
And you should probably be using SHA-256 for new systems now.
sha("My Secret Key"+"a timestamp" + the whole message to be signed)
You also need to additionally transmit the timestamp in the clear, because otherwise you have no way to verify the digest (other than trying a lot of plausible timestamps).
If a brute force attack is feasible depends on the length of your secret key.
The security of your whole system would rely on this shared secret (because both sender and receiver need to know it, but no one else). An attacker would try to go after the key (either by brute-force guessing or by trying to get it from your device) rather than trying to break SHA-1.
SHA-1 is a hash function that was designed to make it impractically difficult to reverse the operation. Such hash functions are often called one-way functions or cryptographic hash functions for this reason.
However, SHA-1's collision resistance was theoretically broken in 2005. This allows finding two different inputs that have the same hash value faster than the generic birthday attack, which has a cost of 2^80 for a 50% probability. In 2017, the collision attack became practical, known as SHAttered.
As of 2015, NIST dropped SHA-1 for signatures. You should consider using something stronger like SHA-256 for new applications.
Jon Callas on SHA-1:
It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off.
The question is actually how to authenticate over an insecure session.
The standard way to do this is to use a message authentication code, e.g. HMAC.
You send the message plaintext as well as an accompanying hash of that message where your secret has been mixed in.
So instead of your:
sha1("My Secret Key"+"a timestamp")
You have:
msg,hmac("My Secret Key",sha(msg+msg_sequence_id))
The message sequence id is a simple counter kept by both parties to track the number of messages they have exchanged in this 'session'; this prevents an attacker from simply replaying previously seen messages.
This is the industry-standard and secure way of authenticating messages, whether they are encrypted or not.
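In Java, that pattern could look roughly like this; the class name is made up and the sequence-counter handling is simplified for illustration:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustrative sketch: authenticate each message with HMAC over (sequence id || message).
final class AuthenticatedChannelSketch {
    private final SecretKeySpec key;
    private long sequenceId = 0;               // both sides keep this counter in sync

    AuthenticatedChannelSketch(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, "HmacSHA256");
    }

    // Sender: tag the message together with the current sequence id.
    byte[] tag(String message) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        mac.update(ByteBuffer.allocate(Long.BYTES).putLong(sequenceId++).array());
        return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
    }

    // Receiver: recompute the tag for the expected sequence id and compare in constant time.
    boolean verify(String message, byte[] receivedTag, long expectedSeq) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        mac.update(ByteBuffer.allocate(Long.BYTES).putLong(expectedSeq).array());
        byte[] expected = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        return MessageDigest.isEqual(expected, receivedTag);
    }
}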
(This is why you can't brute-force the hash:)
A hash is a one-way function, and many inputs produce the same output.
As you know the secret, and you can make a sensible guess as to the range of the timestamp, you could iterate over all those timestamps, compute the hash and compare.
Of course two or more timestamps within the range you examine might 'collide' i.e. although the timestamps are different, they generate the same hash.
So there is, fundamentally, no way to reverse the hash with any certainty.
In mathematical terms, only bijective functions have an inverse function. But hash functions are not injective as there are multiple input values that result in the same output value (collision).
So, no, hash functions can not be reversed. But you can look for such collisions.
Edit
As you want to authenticate the communication between your systems, I would suggest using HMAC. This construction for calculating message authentication codes can use different hash functions. You can use SHA-1, SHA-256 or whatever hash function you want.
And to authenticate the response to a specific request, I would send a nonce along with the request that needs to be used as salt to authenticate the response.
It is not entirely true that you cannot reverse a SHA-1 hash.
You cannot directly reverse one, but it can be done with rainbow tables.
Wikipedia:
A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a plaintext password up to a certain length consisting of a limited set of characters.
Essentially, SHA-1 is only as safe as the strength of the password used. If users have long passwords with obscure combinations of characters, it is very unlikely that existing rainbow tables will have an entry for the hashed string.
You can test your SHA-1 hashes here:
http://sha1.gromweb.com/
There are other rainbow tables on the internet that you can use, so just Google "reverse SHA-1".
Note that the best attacks against MD5 and SHA-1 have been about finding any two arbitrary messages m1 and m2 where h(m1) = h(m2) or finding m2 such that h(m1) = h(m2) and m1 != m2. Finding m1, given h(m1) is still computationally infeasible.
Also, you are using a MAC (message authentication code), so an attacker can't forge a message without knowing the secret, with one caveat: the general MAC construction that you used is susceptible to a length extension attack; an attacker can in some circumstances forge a message m2|m3, h(secret, m2|m3), given m2, h(secret, m2). This is not an issue with just a timestamp, but it is an issue when you compute a MAC over messages of arbitrary length. You could append the secret to the timestamp instead of prepending it, but in general you are better off using HMAC with a SHA-1 digest (HMAC is just a construction and can use MD5 or SHA as the digest algorithm).
Finally, you are signing just the timestamp and not the full request. An active attacker can easily attack the system, especially if you have no replay protection (although even with replay protection, this flaw exists). For example, I can capture the timestamp and HMAC(timestamp with secret) from one message and then use them in my own message, and the server will accept it.
Best to send message, HMAC(message) with sufficiently long secret. The server can be assured of the integrity of the message and authenticity of the client.
You can depending on your threat scenario either add replay protection or note that it is not necessary since a message when replayed in entirety does not cause any problems.
Hashes are dependent on the input, and for the same input will give the same output.
So, in addition to the other answers, please keep the following in mind:
If you start the hash with the password, it is possible to pre-compute rainbow tables, and quickly add plausible timestamp values, which is much harder if you start with the timestamp.
So, rather than use
sha1("My Secret Key"+"a timestamp")
go for
sha1("a timestamp"+"My Secret Key")
I believe the accepted answer is technically right but wrong as it applies to the use case: to create & transmit tamper evident data over public/non-trusted mediums.
Because although it is technically highly-difficult to brute-force or reverse a SHA hash, when you are sending plain text "data & a hash of the data + secret" over the internet, as noted above, it is possible to intelligently get the secret after capturing enough samples of your data. Think about it - your data may be changing, but the secret key remains the same. So every time you send a new data blob out, it's a new sample to run basic cracking algorithms on. With 2 or more samples that contain different data & a hash of the data+secret, you can verify that the secret you determine is correct and not a false positive.
This scenario is similar to how Wifi crackers can crack wifi passwords after they capture enough data packets. After you gather enough data it's trivial to generate the secret key, even though you aren't technically reversing SHA1 or even SHA256. The ONLY way to ensure that your data has not been tampered with, or to verify who you are talking to on the other end, is to encrypt the entire data blob using GPG or the like (public & private keys). Hashing is, by nature, ALWAYS insecure when the data you are hashing is visible.
Practically speaking it really depends on the application and purpose of why you are hashing in the first place. If the level of security required is trivial or say you are inside of a 100% completely trusted network, then perhaps hashing would be a viable option. Hope no one on the network, or any intruder, is interested in your data. Otherwise, as far as I can determine at this time, the only other reliably viable option is key-based encryption. You can either encrypt the entire data blob or just sign it.
Note: This was one of the ways the British were able to crack the Enigma code during WW2, to the Allies' advantage.
Any thoughts on this?
SHA-1 was designed to prevent recovery of the original text from the hash. However, SHA-1 databases exist that allow looking up common passwords by their SHA hash.
Is it possible to reverse a SHA-1?
SHA-1 was meant to be a collision-resistant hash, whose purpose is to make it hard to find distinct messages that have the same hash. It is also designed to be preimage-resistant, that is, it should be hard to find a message having a prescribed hash, and second-preimage-resistant, so that it is hard to find a second message having the same hash as a prescribed message.
SHA-1's collision resistance was practically broken in 2017 by Google's team, and NIST had already removed SHA-1 for signature purposes in 2015.
SHA-1's pre-image resistance, on the other hand, still stands. One should be careful, though: if the input space is small, then finding a pre-image is easy. So your secret should be at least 128 bits.
SHA-1("My Secret Key"+"a timestamp")
This is the prefix-secret construction, which has a known attack: the length extension attack on Merkle-Damgård based hash functions like SHA-1 (it was exploited in practice against the Flickr API). One should not use this construction with SHA-1 or SHA-2. One can use:
HMAC-SHA-256 (HMAC doesn't require collision resistance of the hash function, so SHA-1 and MD5 are technically still fine for HMAC; however, forget about them) to achieve a better security system. HMAC has the cost of a double call of the hash function, which is a drawback for time-critical systems. A note: HMAC is a beast in cryptography.
KMAC is the prefix-secret construction from SHA-3; since SHA-3 is resistant to length extension attacks, this is secure.
Use BLAKE2 with the prefix construction; this is also secure since it, too, resists length extension attacks. BLAKE2 is a really fast hash function, and now there is a parallel version, BLAKE3, too (it needs some time for security analysis). WireGuard uses BLAKE2 as a MAC.
Then I include this SHA-1 in the communication, and the server can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1. Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
You did not define the size of your secret. If your attacker knows the timestamp, then they can search for the secret by brute force. If we consider the collective power of the Bitcoin miners, as of 2022 they reach around 2^93 double SHA-256 evaluations per year. Therefore, you must adjust your security according to your risk. As of 2022, NIST's minimum security level is 112 bits. One should consider 128 bits or above for the secret size.
Note1: I guess you could brute force in some way, but how much work would that actually be?
See the answer above. As a special case, against a possible implementation of Grover's algorithm (a quantum algorithm for finding pre-images), one should use hash functions with an output size larger than 256 bits.
Note2: I don't plan to encrypt any data, I just would like to know who sent it.
This is not the way. Your construction can only work if the secret is mutually shared, e.g. via a DHKE; that is, the secret is known only to the sender and you. Instead of managing this, a better way is to use digital signatures to solve this issue. Besides, one gets non-repudiation, too.
Any hashing algorithm is reversible if applied to strings of max length L; the only question is the value of L. To assess it exactly, you could run the state-of-the-art dehashing utility, hashcat. It is optimized to get the best performance out of your hardware.
That's why you need long passwords, like 12 characters. Here they say that for length 8 the password is dehashed (using brute force) in 24 hours (with 1 GPU). For each extra character, multiply that by the alphabet size (say 50). So for 9 characters you have 50 days, for 10 you have about 6 years, and so on. It's definitely inaccurate, but it gives an idea of what the numbers could be.
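The scaling is just a multiplication by the alphabet size per extra character; as a quick back-of-the-envelope check (the 24-hour baseline for 8 characters and the alphabet of 50 are the assumptions from above):

// Back-of-the-envelope estimate: each extra character multiplies the search
// space (and thus the brute-force time) by the alphabet size.
public class CrackTimeEstimate {
    public static void main(String[] args) {
        double hoursFor8 = 24.0;     // assumed baseline: 8 characters in 24 hours
        int alphabet = 50;           // assumed alphabet size
        double hours = hoursFor8;
        for (int length = 8; length <= 12; length++) {
            System.out.printf("%d chars: ~%.1f days (~%.1f years)%n",
                    length, hours / 24.0, hours / 24.0 / 365.0);
            hours *= alphabet;       // one more character: multiply by the alphabet size
        }
    }
}

For 9 characters this gives 50 days and for 10 characters roughly 6.8 years, which matches the rough figures above.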
Through the years I've come across this scenario more than once. You have a bunch of user-related data that you want to send from one application to another. The second application is expected to "trust" this "token" and use the data within it. A timestamp is included in the token to prevent a theft/re-use attack. For whatever reason (let's not worry about it here) a custom solution has been chosen rather than an industry standard like SAML.
To me it seems like digitally signing the data is what you want here. If the data needs to be secret, then you can also encrypt it.
But what I see a lot is that developers will use symmetric encryption, e.g. AES. They are assuming that in addition to making the data "secret", the encryption also provides 1) message integrity and 2) trust (authentication of source).
Am I right to suspect that there is an inherent weakness here? At face value it does seem to work, if the symmetric key is managed properly. Lacking that key, I certainly wouldn't know how to modify an encrypted token, or launch some kind of cryptographic attack after intercepting several tokens. But would a more sophisticated attacker be able to exploit something here?
Part of it depends on the Encryption Mode. If you use ECB (shame on you!) I could swap blocks around, altering the message. Stackoverflow got hit by this very bug.
Less threatening - without any integrity checking, I could perform a man-in-the-middle attack, and swap all sorts of bits around, and you would receive it and attempt to decrypt it. You'd fail of course, but the attempt may be revealing. There are side-channel attacks by "Bernstein (exploiting a combination of cache and microarchitectural characteristics) and Osvik, Shamir, and Tromer (exploiting cache collisions) rely on gaining statistical data based on a large number of random tests." [1] The footnoted article is by a cryptographer of greater note than I, and he advises reducing the attack surface with a MAC:
if you can make sure that an attacker who doesn't have access to your MAC key can't ever feed evil input to a block of code, however, you dramatically reduce the chance that he will be able to exploit any bugs
Yup. Encryption alone does not provide authentication. If you want authentication then you should use a message authentication code such as HMAC, or digital signatures (depending on your requirements).
There are quite a large number of attacks that are possible if messages are just encrypted, but not authenticated. Here is just a very simple example. Assume that messages are encrypted using CBC. This mode uses an IV to randomize the ciphertext so that encrypting the same message twice does not result in the same ciphertext. Now look what happens during decryption if the attacker just modifies the IV but leaves the remainder of the ciphertext as is. Only the first block of the decrypted message will change. Furthermore, exactly those bits changed in the IV change in the message. Hence the attacker knows exactly what will change when the receiver decrypts the message. If that first block was, for example, a timestamp and the attacker knows when the original message was sent, then he can easily fix the timestamp to any other time, just by flipping the right bits.
Other blocks of the message can also be manipulated, though this is a little trickier. Note also that this is not just a weakness of CBC. Other modes like OFB and CFB have similar weaknesses. So expecting encryption alone to provide authentication is just a very dangerous assumption.
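Here is a small Java demonstration of that IV-flipping effect; the key and message are made up for the example, and the point is that decryption succeeds while exactly the targeted bit of the first block changes:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Demonstration: flipping bit i of the IV flips exactly bit i of the first
// plaintext block after CBC decryption, and nothing notices the tampering.
public class CbcMalleabilityDemo {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        byte[] plaintext = "2024-01-01T00:00Z attack at dawn".getBytes(StandardCharsets.UTF_8);

        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ciphertext = enc.doFinal(plaintext);

        // Attacker flips one bit in the IV; the ciphertext itself is untouched.
        byte[] tamperedIv = Arrays.copyOf(iv, iv.length);
        tamperedIv[0] ^= 0x01;

        Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(tamperedIv));
        byte[] tampered = dec.doFinal(ciphertext);   // decrypts "successfully"

        System.out.println("original : " + new String(plaintext, StandardCharsets.UTF_8));
        System.out.println("tampered : " + new String(tampered, StandardCharsets.UTF_8));
        // Only the first byte differs: '2' (0x32) has become '3' (0x33).
    }
}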
A symmetric encryption approach is as secure as the key. If both systems can be given the key and hold the key securely, this is fine. Public key cryptography is certainly a more natural fit.