I have a large dataset (say 1 GB) composed of many blocks, some around 100 bytes in size, some around a megabyte. Each block is encrypted with AES-GCM under the same 128-bit key (and a different IV, naturally). I have a structure that keeps the offset and length of each encrypted block, along with its IV and GCM tag.
Question: if I encrypt the structure (thus hiding the beginning, length and IV/tag of each encrypted block), will it make my data safer? Or is it OK to leave all thousand(s) of encrypted blocks in the open, for anybody to see where each starts and ends, and what its IV/tag is? The block sizes are fairly standard and don't reveal much about the data. My concern is with direct attacks on the key and data (with thousands of encrypted samples available) - or other indirect attacks.
I believe in the comments you've answered most of your own question. If the question is "do I need to encrypt the structure?" then the next question (as YAHsaves notes) is "is the structure itself sensitive information?" If the answer is no, then that's your answer. To the extent that the structure itself is sensitive, it should be protected.
If there are attacks on your key due to repeated use with unique IVs, that indicates incorrect use of GCM and should be resolved: GCM is designed to support key reuse when used correctly. NIST provides good, explicit guidance on how to design GCM systems in NIST SP 800-38D. In particular, you want to read section 8, especially 8.2.1 on the recommended construction of IVs (and 8.3 if you do not use the recommended IV construction).
Most of NIST's guidance can be summed up as "make sure that Key+IV is never reused, ever, and if you can't 100% guarantee it, then keep the probability of a repeat at or below 2^-32 - no seriously, we aren't kidding, don't reuse Key+IV, not even once."
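For reference, the deterministic construction recommended in 8.2.1 concatenates a fixed field (identifying the encrypting device or context) with an invocation counter. A minimal Python sketch - the field widths here are illustrative choices, not mandated by the standard:

```python
import struct

def make_iv(context_id: int, counter: int) -> bytes:
    """Build a 96-bit GCM IV: a 32-bit fixed field identifying the
    encrypting context, followed by a 64-bit invocation counter.
    As long as the counter never wraps and never repeats for a given
    key, the Key+IV pair is guaranteed unique."""
    if not 0 <= counter < 2**64:
        raise ValueError("counter exhausted - rotate the key")
    return struct.pack(">IQ", context_id, counter)

# One IV per encrypted block, issued from a monotonically increasing counter:
ivs = [make_iv(1, n) for n in range(1000)]
assert len(set(ivs)) == 1000             # no Key+IV reuse
assert all(len(iv) == 12 for iv in ivs)  # 96-bit IVs, GCM's preferred size
```

The point of the construction is that uniqueness is guaranteed by bookkeeping rather than by luck, which is exactly what the 2^-32 requirement is about.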
Looks like I found an additional answer here. It addresses a different question, but applied to mine it means: yes, it's OK to leave thousands of blocks encrypted with the same key in open view. Actually, up to about a billion should be OK - in both the random and deterministic IV modes of AES-GCM.
Note: Though I'm mentioning H2, this may apply to any DBMS,
that allows you to store the whole database in a single file; and
that makes its source code publicly available.
My concern:
Is it possible to break into an encrypted H2 database by doing something like the following?
Store a very large, zeroed-out BLOB, a few hundred KB in size, in some table.
Examine the new H2 database file binary and look for a repeating pattern near page/block boundaries. The page/block size could be obtained from the H2 source code. The repeating pattern so obtained would be the cipher key used to encrypt the H2 database.
Once the cipher key stands exposed, the hacker just needs to be dedicated enough to then further dig into the H2 sources and figure out the exact structure of its tables, columns, and rows. In other words, everything stands exposed from this point on.
I have not personally studied the source code of H2, nor am I a cryptography expert, but here's why I think the above -- or some hack along the above lines -- might work:
For performance reasons, all DBMSes read/write data in chunks (pages or blocks 512 bytes to 8 KB in size), and so would H2.
Since a BLOB several hundred KB in size would far exceed the DBMS's page/block size, one can expect the cryptographic key (generated internally using the user password) to show up as repeating patterns of size less than the page/block size.
A good cryptography algorithm will not be vulnerable to this attack.
The patterns in the plaintext (here the BLOB of zeroes) will be dissipated in the ciphertext. The secret key will not be readily visible in the ciphertext, as patterns or otherwise. A classic technique to achieve that with a block cipher is to make the encryption of each block depend on the ciphertext of the previous block. Here the blocks I'm referring to are the blocks used by the cryptographic algorithm, typically 128 bits in length.
You can, for example, XOR the plaintext block with the result of the previous block's encryption; here is the schema from Wikipedia for CBC mode, which XORs the current block with the result of the previous one prior to encryption.
As you can see, even if you feed all zeroes into every plaintext block, you will end up with a completely random-looking result.
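To make the chaining idea concrete, here is a toy Python sketch. The keyed block transform is a hash-based stand-in (it is not invertible, so it is not a real cipher) - it only illustrates how feeding each ciphertext block into the next destroys repetition in the output:

```python
import hashlib

BLOCK = 16  # bytes, matching AES's 128-bit block size

def toy_block_transform(key: bytes, block: bytes) -> bytes:
    # Stand-in for a block cipher's encryption function (illustration only).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cbc_like_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        # XOR the plaintext block with the previous ciphertext block...
        mixed = bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], prev))
        # ...then run it through the keyed transform and chain it forward.
        prev = toy_block_transform(key, mixed)
        out += prev
    return bytes(out)

# A BLOB of all zeroes still encrypts to 32 distinct-looking blocks:
ct = cbc_like_encrypt(b"k" * 16, b"\x00" * 16, b"\x00" * (16 * 32))
blocks = {ct[i:i + 16] for i in range(0, len(ct), 16)}
assert len(blocks) == 32  # no repeating pattern despite identical input blocks
```

This is exactly why the zeroed-BLOB attack from the question fails against any mode with chaining (or a counter): identical plaintext blocks never produce identical ciphertext blocks.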
These are just examples and the actual confusion mechanism used in H2 might be more complex or involved depending on the algorithm they use.
The file encryption algorithm used in H2 does not use the ECB encryption mode. The file encryption algorithm is, as documented, not vulnerable to this kind of attack. The new storage engine that will be used for future versions of H2 uses the standardized AES XTS algorithm.
I was reading Wikipedia, and it says
Cryptographic hash functions are a third type of cryptographic algorithm.
They take a message of any length as input, and output a short,
fixed length hash which can be used in (for example) a digital signature.
For good hash functions, an attacker cannot find two messages that produce the same hash.
But why? What I understand is that you can put the long Macbeth story into the hash function and get a hash of some fixed length X out of it. Then you can put in the Beowulf story and get another hash, again X long.
So, since this function maps loads of things onto a shorter length, there are bound to be overlaps: I might put the story of The Hobbit into the hash function and get the same output as for Beowulf. OK, but this is inevitable, right(?), since we are producing a shorter output from our input? And even if such a pair is found, why is it a problem?
I can imagine that if I could invert it and get out The Hobbit instead of Beowulf, that would be bad - but why is a collision useful to the attacker?
Yes, of course there will be collisions for the reasons you describe.
I suppose the statement should really be something like this: "For good hash functions, an attacker cannot find two messages that produce the same hash, except by brute-force".
As for the why...
Hash algorithms are often used for authentication. By checking the hash of a message you can be (almost) certain that the message itself hasn't been tampered with. This relies on it being infeasible to find two messages that generate the same hash.
If a hash algorithm allows collisions to be found relatively easily then it becomes useless for authentication because an attacker could then (theoretically) tamper with a message and have the tampered message generate the same hash as the original.
Yes, it's inevitable that there will be collisions when mapping a long message onto a shorter hash, as the hash cannot contain all possible values of the message. For the same reason you cannot 'invert' the hash to uniquely produce either Beowulf or The Hobbit - but if you generated every possible text and filtered out the ones that had your particular hash value, you'd find both texts (amongst billions of others).
The article is saying that it should be hard for an attacker to find or construct a second message that has the same hash value as a first. Cryptographic hash functions are often used as proof that a message hasn't been tampered with - if even a single bit of data flips then the hash value should be completely different.
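Both properties - fixed output length and "a single flipped bit changes everything" - are easy to observe with Python's hashlib (SHA-256 here; the sample text is just a placeholder):

```python
import hashlib

beowulf = b"Hwaet! We Gardena in geardagum, theodcyninga thrym gefrunon..."
tampered = bytes([beowulf[0] ^ 0x01]) + beowulf[1:]   # flip a single bit

h1 = hashlib.sha256(beowulf).hexdigest()
h2 = hashlib.sha256(tampered).hexdigest()

assert len(h1) == len(h2) == 64   # output length is fixed, whatever the input size
assert h1 != h2                   # one flipped bit yields a completely different digest

# Roughly half of the 256 output bits change - the avalanche effect:
diff = bin(int(h1, 16) ^ int(h2, 16)).count("1")
assert 64 < diff < 192
```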
A couple of years back, Dutch researchers demonstrated weaknesses in MD5 by publishing a hash of their "prediction" for the US presidential election. Of course, they had no way of knowing the outcome in advance - but with the computational power of a PS3 they constructed a PDF file for each candidate, each with the same hash value. The implications for MD5 - already on its way down - as a trusted algorithm for digital signatures became even more dire...
Cryptographic hashes are used for authentication. For instance, peer-to-peer protocols rely heavily on them. They use them to make sure that an ill-intentioned peer cannot spoil the download for everyone else by distributing packets that contain garbage. The torrent file that describes a download contains the hashes for each block. With this check in place, the victim peer can find out that he has been handed a corrupted block and download it again from someone else.
The attacker would like to replace Beowulf with The Hobbit to increase Saxon poetry's visibility, but the cryptographic hash used in the protocol won't let him.
If it is easy to find collisions, then the attacker could create malicious data and simply pad it with dummy data until a collision is found. The hash check would then pass for the malicious data. That is why collisions should only be findable by brute force and be as rare as possible.
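How hard that brute force is depends entirely on the digest length. A toy birthday search against a deliberately truncated 24-bit hash finds a collision after only a few thousand tries; the same search against the full 256-bit digest is astronomically infeasible. A sketch using nothing beyond the standard library:

```python
import hashlib
from itertools import count

def weak_hash(msg: bytes) -> bytes:
    # Keep only 24 bits of SHA-256: a collision is expected after ~2^12 messages.
    return hashlib.sha256(msg).digest()[:3]

seen = {}
for n in count():
    msg = b"candidate-%d" % n
    h = weak_hash(msg)
    if h in seen:
        m1, m2 = seen[h], msg   # two different messages, same (weak) hash
        break
    seen[h] = msg

assert m1 != m2
assert weak_hash(m1) == weak_hash(m2)
assert hashlib.sha256(m1).digest() != hashlib.sha256(m2).digest()  # full digests still differ
```

The birthday bound says collisions appear after about 2^(n/2) tries for an n-bit digest, which is why 128-bit digests (MD5) are no longer considered safe while 256-bit ones still are.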
Collisions are also a problem with certificates.
Pretty pointless, but if I encrypt something with my own cipher (I'll assume it is wrong and bad) and then encrypt that with something like AES or another known-good cipher, would that data be safe?
Logically I say yes, because the top layer is secure. Does anyone know for sure?
Maybe.
The situation that you describe has been analyzed by Maurer and Massey in the paper "Cascade ciphers: The importance of being first", published in the Journal of Cryptology, 1993. There the authors show that a cascade of two ciphers cannot be weaker than the first cipher. Note that this result assumes that the ciphers use independent keys.
They also show somewhat surprisingly that the cascade may not be as strong as the second cipher. The example given in the paper is a little academic and it is in fact somewhat hard to come up with realistic examples. Here is a try:
Assume users of a web page are sending encrypted forms containing some sensitive fields to the server. To make sure that the length of the ciphertext does not leak any information, all the choices for the sensitive fields should be formatted such that they have the same length. Encrypting plaintexts of the same length with, say, AES-CBC will always result in ciphertexts of the same length. Now, assume that we insert another encryption step before the AES layer that uses compression. Suddenly the ciphertexts will no longer always have the same length, and the length depends on the choices of the user. Especially if the number of choices is restricted (think voting), this may indeed leak real information.
Of course, in many situations adding another encryption step does not hurt. The conclusion of the paper by Maurer and Massey is just that you shouldn't rely on it.
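A sketch of that length leak in Python (the plaintexts and sizes are made up for illustration). Block-cipher encryption preserves the padded length of its input, so whatever compression does to the length survives into the ciphertext:

```python
import zlib

def padded_len(n: int, block: int = 16) -> int:
    # Ciphertext length of a CBC-style cipher with PKCS#7-like padding:
    # the input length rounded up to the next full block.
    return ((n // block) + 1) * block

vote_a = b"A" * 64          # same length as vote_b...
vote_b = bytes(range(64))   # ...but far less compressible

# Encrypting the equal-length plaintexts directly: identical ciphertext lengths.
assert padded_len(len(vote_a)) == padded_len(len(vote_b))

# Compress-then-encrypt: the ciphertext lengths now differ and leak the choice.
ca, cb = zlib.compress(vote_a), zlib.compress(vote_b)
assert padded_len(len(ca)) != padded_len(len(cb))
```

This is the same class of leak behind the real-world CRIME and BREACH attacks on compressed TLS traffic.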
This is only provably true if the keys used by the AES layer and the other layer are distinct and completely unrelated.
I am asking from a "more secure" perspective. I can imagine scenarios where requiring two private keys for decryption may make this an attractive model. I believe it is not adding any additional security other than having to compromise two different private keys. I think that if it were any more secure, then encrypting one million times would be the best way to secure information.
Update a couple of years later: As Rasmus Faber points out, 3DES encryption was adopted to extend the life of DES encryption, which had widespread adoption. Encrypting twice using the same key suffers from the meet-in-the-middle attack, while encrypting a third time does in fact offer greater security.
I understand that it is more secure provided you use different keys. But don't take my word for it. I'm not a crypto-analyst. I don't even play one on TV.
The reason I understand it to be more secure is that you're using extra information for encoding: both multiple keys and an unknown number of keys (unless you publish the fact that there are two).
Double encryption using the same key makes many codes easier to crack. I've heard this for some codes but I know it to be true for ROT13 :-)
I think the security scheme used by Kerberos is a better one than simple double encryption.
They actually have one master key whose sole purpose is to encrypt the session key and that's all the master key is used for. The session key is what's used to encrypt the real traffic and it has a limited lifetime. This has two advantages.
Evil dudes don't have time to crack the session key since, by the time they've managed to do it, those session keys are no longer in use.
Those same evil dudes don't get an opportunity to crack the master key simply because it's so rarely used (they would need a great many encrypted packets to crack the key).
But, as I said, take that with a big grain of salt. I don't work for the NSA. But then I'd have to tell you that even if I did work for the NSA. Oh, no, you won't crack me that easily, my pretty.
Semi-useful snippet: Kerberos (or Cerberus, depending on your lineage) is the mythological three-headed dog that guards the gates of Hell, a well-chosen mascot for that security protocol. That same dog is called Fluffy in the Harry Potter world (I once had a girlfriend whose massive German Shepherd dog was called Sugar, a similarly misnamed beast).
It is more secure, but not much. The analogy with physical locks is pretty good. By putting two physical locks of the same type on a door, you ensure that a thief who can pick one lock in five minutes now needs to spend ten minutes. But you might be much better off buying a lock that is twice as expensive, which the thief cannot pick at all.
In cryptography it works much the same way: in the general case, you cannot ensure that encrypting twice makes it more than twice as hard to break the encryption. So if NSA normally can decrypt your message in five minutes, with double encryption, they need ten minutes. You would probably be much better off by instead doubling the length of the key, which might make them need 100 years to break the encryption.
In a few cases, it makes sense to repeat the encryption - but you need to work the math with the specific algorithm to prove it. For instance, Triple-DES is basically DES repeated three times with three different keys (except that you encrypt-decrypt-encrypt, instead of just encrypting three times). But this also shows how unintuitive this works, because while Triple-DES triples the number of encryptions, it only has double the effective key-length of the DES algorithm.
Encryption with multiple keys is more secure than encryption with a single key, it's common sense.
My vote is that it is not adding any additional security
No.
other than having to compromise two different private keys.
Yes, but you see, if you encrypt something with two ciphers, each using a different key, and one of the ciphers is found to be weak and crackable, the second cipher must also be weak for the attacker to recover anything.
Double encryption does not increase the security.
There are two modes of using PGP: asymmetric (public key, with a private key to decrypt), and symmetric (with a passphrase). With either mode the message is encrypted with a session key, which is typically a randomly generated 128-bit number. The session key is then encrypted with the passphrase or with the public key.
There are two ways that the message can be decrypted. One is if the session key can be decrypted. That is going to be either a brute-force attack on the passphrase or an adversary that has your private key. The second way is an algorithmic weakness.
If the adversary can get your private key, then if you have two private keys the adversary will get both.
If the adversary can brute-force your passphrase or catch it with a keystroke logger, then the adversary can almost certainly get both of them.
If there is an algorithmic weakness, then it can be exploited twice.
So although it may seem like double encryption helps, in practice it does not help against any realistic threat.
The answer, like most things, is "it depends". In this case, it depends on how the encryption scheme is implemented.
In general, using double encryption with different keys does improve security, but it does not square the security, due to the meet-in-the-middle attack.
Basically, the attacker doesn't HAVE to break all possible combinations of the first key and the second key (squared security). They can break each key in turn (doubled security), which can be done in roughly double the time of breaking a single key.
Doubling the time it takes isn't a significant improvement however, as others have noted. If they can break 1 in 10mins, they can break two in 20mins, which is still totally in the realm of possibility. What you really want is to increase security by orders of magnitude so rather than taking 10mins it takes 1000 years. This is done by choosing a better encryption method, not performing the same one twice.
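A toy meet-in-the-middle attack makes the "double, not squared" point concrete. The cipher below is invented purely for illustration (16-bit keys, a keyed XOR-and-multiply over 32-bit blocks); recovering both keys of the double encryption costs about 2 × 2^16 cipher operations plus a lookup table, not 2^32:

```python
MOD = 2**32
A = 0x9E3779B1                 # odd constant, hence invertible mod 2^32
A_INV = pow(A, -1, MOD)

def enc(k: int, x: int) -> int:    # toy block cipher with a 16-bit key
    return ((x ^ k) * A) % MOD

def dec(k: int, y: int) -> int:
    return ((y * A_INV) % MOD) ^ k

k1, k2 = 0x1234, 0xBEEF                            # the secret key pair
p0, p1 = 0xCAFEBABE, 0x12345678                    # two known plaintexts
c0, c1 = (enc(k2, enc(k1, p)) for p in (p0, p1))   # doubly encrypted

# Meet in the middle: encrypt p0 under every possible first key (2^16 entries)...
middle = {enc(ka, p0): ka for ka in range(2**16)}

# ...then decrypt c0 under every possible second key and look for a match.
recovered = None
for kb in range(2**16):
    m = dec(kb, c0)
    if m in middle and enc(kb, enc(middle[m], p1)) == c1:  # confirm on 2nd pair
        recovered = (middle[m], kb)
        break

assert recovered == (k1, k2)
```

The memory cost of the table is the attack's real price, which is why double DES was judged to give only ~57 bits of security instead of 112.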
The wikipedia article does a good job of explaining it.
Using brute force to break encryption, the only way they know they have the key is when the decrypted document makes sense. When the document is double-encrypted, it still looks like garbage even if you have the right key - hence you don't know you had the right key.
Is this too obvious or am I missing something?
It depends on the situation.
For those who gave poor comparisons like "locks on doors": think twice before you write something. That example is far from the reality of encryption. Mine is way better =)
When you wrap something, you can wrap it with two different things, and it becomes more secure from the outside... true. But imagine that to get to your wrapped sandwich, instead of unwrapping it, you cut through the wrapping material. Double wrapping now makes no sense, you get it?
WinRAR is VERY secure. There's a case where the government couldn't get into files on a laptop a guy was carrying from Canada. He used WinRAR. They tried to make him give them the password, and he took the 5th. It was on appeal for 2 years, and the courts finally said he didn't have to talk (every court said that during this process). I couldn't believe someone would even think he couldn't take the 5th. The government dropped the case when they lost their appeal, because they still hadn't cracked the files.
I'm considering the following: I have some data stream which I'd like to protect as securely as possible -- does it make any sense to apply, let's say, AES with some IV, then Blowfish with some IV, and finally AES again with some IV?
The encryption/decryption process will be hidden (even protected against debugging), so it won't be easy to guess which crypto methods and IVs were used (however, I'm aware that the strength of this crypto chain can't depend on this fact, since every protection against debugging is breakable given enough time).
I have the computing power for this (the amount of data isn't that big), so the only question is whether it's worth implementing. For example, TripleDES worked very similarly, using three keys and an encrypt/decrypt/encrypt scheme, so it probably isn't total nonsense. Another question is how much I decrease security if I use the same IV for the 1st and 3rd passes, or even the same IV for all three?
I welcome any hints on this subject.
I'm not sure about this specific combination, but it's generally a bad idea to mix things like this unless that specific combination has been extensively researched. It's possible the mathematical transformations would actually counteract one another and the end result would be easier to hack. A single pass of either AES or Blowfish should be more than sufficient.
UPDATE: From my comment below…
Using TripleDES as an example: think of how much time and effort from the world's best cryptographers went into creating that combination (note that DoubleDES had a vulnerability), and the best they could do is 112 bits of security despite 192 bits of key.
UPDATE 2: I have to agree with Diomidis that AES is extremely unlikely to be the weak link in your system. Virtually every other aspect of your system is more likely to be compromised than AES.
UPDATE 3: Depending on what you're doing with the stream, you may want to just use TLS (the successor to SSL). I recommend Practical Cryptography for more details—it does a pretty good job of addressing a lot of the concerns you'll need to address. Among other things, it discusses stream ciphers, which may or may not be more appropriate than AES (since AES is a block cipher and you specifically mentioned that you had a data stream to encrypt).
I don't think you have anything to lose by applying one encryption algorithm on top of another that is very different from the first. I would, however, be wary of running a second round of the same algorithm on top of the first, even if you've run another one in between. The interaction between the two runs may open a vulnerability.
Having said that, I think you're agonizing too much over the encryption part. Most exposures of data do not happen by breaking an industry-standard encryption algorithm like AES, but through other weaknesses in the system. I would suggest spending more time looking at key management, the handling of unencrypted data, weaknesses in the algorithm's implementation (the possibility of leaking data or keys), and wider system issues - for instance, what are you doing with data backups?
A hacker will always attack the weakest element in a chain, so it helps little to make a strong element even stronger. Cracking AES with a 128-bit key is already infeasible; the same goes for Blowfish. Choosing even bigger key lengths makes it harder still, but 128-bit AES has never been cracked to date (and probably will not be within the next 10 or 20 years). So this encryption is probably not the weakest element - why make it stronger? It is already strong.
Think about what else might be the weakest element. The IV? Actually, I wouldn't waste too much time on selecting a great IV or hiding it. The weakest element is usually the encryption key. E.g. if you are encrypting data stored to disk, but this data needs to be read by your application, then your application needs to know the IV and the encryption key, hence both of them need to be in the binary. This is actually the weakest element. Even if you take 20 encryption methods and chain them on your data, the IVs and encryption keys of all 20 need to be in the binary, and if a hacker can extract them, the fact that you used 20 instead of 1 encryption method provides zero additional security.
Since I still don't know what the whole process is (who encrypts the data, who decrypts the data, where is the data stored, how is it transported, who needs to know the encryption keys, and so on), it's very hard to say what the weakest element really is, but I doubt that AES or Blowfish encryption itself is your weakest element.
Who are you trying to protect your data from? Your brother, your competitor, your government, or the aliens?
Each of these has different levels at which you could consider the data to be "as secure as possible", within a meaningful budget (of time/cash).
I wouldn't rely on obscuring the algorithms you're using. This kind of "security by obscurity" doesn't work for long. Decompiling the code is one way of revealing the crypto you're using but usually people don't keep secrets like this for long. That's why we have private/public key crypto in the first place.
Also, don't waste time obfuscating the algorithm - apply Kerckhoffs's principle, and remember that AES, in and of itself, is used (and acknowledged to be used) in a large number of places where the data needs to be "secure".
Damien: you're right, I should have written it more clearly. I'm talking about a competitor; it's for commercial use. So there's a meaningful budget available, but I don't want to implement it without being sure I know why I'm doing it :)
Hank: yes, this is what I'm scared of too. The most supportive source for this idea was the mentioned TripleDES. On the other side, when I use one algorithm to encrypt some data and then apply another one, it would be very strange if the 'power' of the whole encryption were less than that of a standalone algorithm. But this doesn't mean it can't be equal... This is the reason why I'm asking for hints; this isn't my area of knowledge...
Diomidis: this is basically my point of view, but my colleague is trying to convince me it really 'boosts' security. My proposal would be to use a stronger encryption key instead of applying one algorithm after another without any deep knowledge of what I'm doing.
@Miro Kropacek - your colleague is trying to add security through voodoo. Instead, try to build something simple that you can analyse for flaws - such as just using AES.
I'm guessing it was he (she?) who suggested enhancing the security through protection from debugging too...
You can't actually make things less secure if you encrypt more than once with distinct IVs and keys, but the gain in security may be much less than you anticipate: In the example of 2DES, the meet-in-the-middle attack means it's only twice as hard to break, rather than squaring the difficulty.
In general, though, it's much safer to stick with a single well-known algorithm and increase the key length if you need more security. Leave composing cryptosystems to the experts (and I don't number myself one of them).
Encrypting twice is more secure than encrypting once, even though this may not be clear at first.
Intuitively, it appears that encrypting twice with the same algorithm gives no extra protection, because an attacker might find a key which decrypts all the way from the final ciphertext back to the plaintext. ... But this is not the case.
E.g. I start with plaintext A and encrypt it with key K1 to get B. Then I encrypt B with key K2 to get C.
Intuitively, it seems reasonable to assume that there may well be a key, K3, which I could use to encrypt A and get C directly. If this is the case, then an attacker using brute force would eventually stumble upon K3 and be able to decrypt C, with the result that the extra encryption step has not added any security.
However, it is highly unlikely that such a key exists (for any modern encryption scheme). (When I say "highly unlikely" here, I mean what a normal person would express using the word "impossible").
Why?
Consider the keys as functions which provide a mapping from plaintext to ciphertext.
If our keys are all KL bits in length, then there are at most 2^KL single-key mappings.
However, if I use 2 keys of KL bits each, that gives me (2^KL)^2 = 2^(2*KL) key pairs.
Since there are far more key pairs than single keys, not all of the two-stage mappings can be equivalent to a single-stage encryption.
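This counting argument can be checked exhaustively on a toy cipher, where each key simply indexes a pseudorandom permutation of an 8-bit block space (a sketch for illustration; the real result of this kind is Campbell and Wiener's 1992 proof that DES is not a group):

```python
import random

BLOCKS = 256   # toy 8-bit block space
KEYS = 64      # toy key space, small enough to enumerate completely

def perm_for_key(k: int) -> tuple:
    # Each key deterministically selects a pseudorandom permutation,
    # standing in for a real block cipher's encryption map.
    rng = random.Random(k)
    p = list(range(BLOCKS))
    rng.shuffle(p)
    return tuple(p)

perms = {k: perm_for_key(k) for k in range(KEYS)}

k1, k2 = 3, 7
composed = tuple(perms[k2][perms[k1][x]] for x in range(BLOCKS))

# No single key K3 reproduces the two-key composition:
assert all(perms[k3] != composed for k3 in range(KEYS))
```

With a real cipher the key space is too large to enumerate, so this becomes a probabilistic statement - but the odds of the composition landing on a single-key mapping are negligible for the same counting reason.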
Another advantage of encrypting twice, if 2 different algorithms are used, is that if a vulnerability is found in one of the algorithms, the other algorithm still provides some security.
As others have noted, brute forcing the key is typically a last resort. An attacker will often try to break the process at some other point (e.g. using social engineering to discover the passphrase).
Another way of increasing security is to simply use a longer key with one encryption algorithm.
...Feel free to correct my maths!
Yes, it can be beneficial, but probably overkill in most situations. Also, as Hank mentions certain combinations can actually weaken your encryption.
TrueCrypt provides a number of combination encryption algorithms like AES-Twofish-Serpent. Of course, there's a performance penalty when using them.
Changing the algorithm does not improve the quality (unless you expect an algorithm to be broken); it's only about the key/block length and some advantage in obfuscation. Doing it several times is interesting, since even if the first key leaked, the resulting data is not distinguishable from random data. There are block sizes that are processed better on a given platform (e.g. register size).
Attacking quality encryption algorithms only works by brute force, and thus depends on the computing power you can spend. This means you can ultimately only increase the average time somebody needs to decrypt it.
If the data is of real value, they'd better not attack the data but the key holder...
I agree with what has been said above. Multiple stages of encryption won't buy you much. If you are using a 'secure' algorithm, then it is practically impossible to break. Use AES in some standard streaming mode. See http://csrc.nist.gov/groups/ST/toolkit/index.html for accepted ciphers and modes. Anything recommended on that site should be sufficiently secure when used properly. If you want to be extra secure, use AES-256, although 128 should still be sufficient anyway. The greatest risks are not attacks against the algorithm itself, but rather attacks against key management, or side-channel attacks (which may or may not be a risk depending on the application and usage). If your application is vulnerable to key-management attacks or side-channel attacks, then it really doesn't matter how many levels of encryption you apply. That is where I would focus your efforts.