A security concern with H2

Note: Though I'm mentioning H2, this may apply to any DBMS,
that allows you to store the whole database in a single file; and
that makes its source-code publicly available.
My concern:
Is it possible to break into an encrypted H2 database by doing something like the following?
Store a very large, zeroed-out BLOB, a few hundred KB in size, in some table.
Examine the new H2 database file binary and look for a repeating pattern near page/block boundaries. The page/block size could be obtained from the H2 source code. The repeating pattern so obtained would be the cipher key used to encrypt the H2 database.
Once the cipher key stands exposed, the hacker just needs to be dedicated enough to then further dig into the H2 sources and figure out the exact structure of its tables, columns, and rows. In other words, everything stands exposed from this point on.
I have not personally studied the source code of H2, nor am I a cryptography expert, but here's why I think the above -- or some hack along the above lines -- might work:
For performance reasons, all DBMSes read/write data in chunks (pages or blocks 512 bytes to 8 KB in size), and so would H2.
Since a BLOB several hundred KB in size would far exceed the DBMS's page/block size, one can expect the cryptographic key (generated internally from the user password) to show up in repeating patterns of sizes less than the page/block size.

A good cryptography algorithm will not be vulnerable to this attack.
The patterns in the plaintext (here the BLOB of zeroes) will be dissipated in the ciphertext. The secret key will not be readily visible in the ciphertext as patterns or otherwise. A classic technique to achieve that when using a block cipher is to make the encryption of a block dependent on the ciphertext of the previous block. Here the blocks I'm referring to are the blocks used in the cryptography algorithm, typically 128 bits of length.
You can, for example, XOR each plaintext block with the ciphertext of the previous block before encrypting it. This is what CBC mode does; Wikipedia's article on block cipher modes of operation has a diagram of it.
With this chaining, even if you feed all zeroes into every plaintext block, you end up with a completely random-looking result.
These are just examples and the actual confusion mechanism used in H2 might be more complex or involved depending on the algorithm they use.
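To make the difference concrete, here is a minimal sketch that encrypts a buffer of zeroes block by block, once independently (ECB) and once with CBC chaining. The block cipher is a toy Feistel construction built from SHA-256 that stands in for AES so the example runs on the standard library alone; it is an assumption for illustration, it is not secure, and it is not what H2 uses.

```python
import hashlib
import os

BLOCK = 16  # bytes, i.e. the 128-bit cipher block size mentioned above


def _round(key: bytes, index: int, half: bytes) -> bytes:
    # Round function: 8 bytes derived from the key, round index and one half-block.
    return hashlib.sha256(key + bytes([index]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for AES, NOT secure."""
    left, right = block[:8], block[8:]
    for index in range(4):
        mixed = bytes(a ^ b for a, b in zip(left, _round(key, index, right)))
        left, right = right, mixed
    return left + right


def ecb(key: bytes, data: bytes) -> bytes:
    # Every block is encrypted on its own, so equal inputs give equal outputs.
    return b"".join(
        toy_encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


def cbc(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_encrypt_block(key, mixed)
        out.append(prev)
    return b"".join(out)


def distinct_blocks(ciphertext: bytes) -> int:
    return len({ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)})


key = os.urandom(16)
zeros = bytes(BLOCK * 64)  # 64 all-zero blocks, like the zeroed BLOB
ecb_ct = ecb(key, zeros)
cbc_ct = cbc(key, os.urandom(BLOCK), zeros)

print("distinct ECB blocks:", distinct_blocks(ecb_ct))  # one repeating block
print("distinct CBC blocks:", distinct_blocks(cbc_ct))
```

Note that even in the ECB case the repeating block is the encryption of zero under the key, not the key itself, so with a sound cipher the pattern leaks structure but does not hand over the key.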

The file encryption algorithm used in H2 does not use the ECB encryption mode. The file encryption algorithm is, as documented, not vulnerable to this kind of attack. The new storage engine that will be used for future versions of H2 uses the standardized AES XTS algorithm.

Related

Encrypting many buffers with same key

I have a large dataset (say 1 GB) comprised of many blocks, some with a size of ~100 bytes, some around a megabyte. Each block is encrypted with AES-GCM, using the same 128-bit key (and a different IV, naturally). I have a structure that keeps the offset and length of each encrypted block, with its IV and GCM tag.
Question: if I encrypt the structure (thus hiding the beginning, length and IV/tag of each encrypted block), will it make my data safer? Or is it OK to leave all the thousand(s) of encrypted blocks in the open, for anybody to see where each starts and ends, and what its IV/tag is? The block size is fairly standard, and doesn't reveal much about the data. My concern is with direct attacks on the key and data (with thousands of encrypted samples available), or other indirect attacks.
I believe in the comments you've answered most of your own question. If the question is "do I need to encrypt the structure?" then the next question (as YAHsaves notes) is "is the structure itself sensitive information?" If the answer is no, then that's your answer. To the extent that the structure itself is sensitive, it should be protected.
If there are attacks on your key due to repeated use with unique IVs, then this indicates incorrect use of GCM, and should be resolved. GCM is designed to support key reuse if used correctly. NIST provides good and explicit guidance on how to design GCM systems in NIST SP 800-38D. In particular, you want to read section 8, and especially 8.2.1 on the recommended construction of IVs (and 8.3 if you do not use the recommended IV construction).
Most of NIST's guidance can be summed up as "make sure that Key+IV is never reused, ever, and if you can't 100% guarantee it, then keep the probability of a repeat below 2^-32 (about a 99.99999998% guarantee), no seriously, we aren't kidding, don't reuse Key+IV, not even once."
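As a rough illustration of the deterministic construction, the sketch below builds 96-bit IVs from a fixed field that identifies the encrypting device or context, followed by an invocation counter. The 32/64-bit split and the class name DeterministicIV are assumptions chosen for the example; check the layout against the document itself before relying on it.

```python
import struct


class DeterministicIV:
    """96-bit IVs laid out as 32-bit fixed field || 64-bit invocation counter."""

    def __init__(self, fixed_field: int, start: int = 0):
        if not 0 <= fixed_field < 2**32:
            raise ValueError("fixed field must fit in 32 bits")
        self._fixed = fixed_field
        self._counter = start

    def next(self) -> bytes:
        if self._counter >= 2**64:
            # Never wrap around: a repeated IV under the same key breaks GCM.
            raise OverflowError("IV space exhausted; rotate the key")
        iv = struct.pack(">IQ", self._fixed, self._counter)  # 4 + 8 = 12 bytes
        self._counter += 1
        return iv


gen = DeterministicIV(fixed_field=7)
ivs = [gen.next() for _ in range(1000)]
print(ivs[0].hex(), ivs[-1].hex())
```

The counter only guarantees uniqueness if it survives restarts, so in practice it has to be persisted (or the key rotated) whenever the process can lose its state.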
Looks like I found an additional answer here. It addresses a different question, but applied to mine, it means: yes, it's OK to leave thousands of blocks, encrypted with the same key, in open view. Actually, up to about a billion should be OK, in both random and deterministic IV modes of AES-GCM.
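For the random-IV case, the "about a billion" figure can be sanity-checked with the usual birthday bound. This is a back-of-the-envelope sketch, assuming uniformly random 96-bit IVs; the function name is made up for the example.

```python
from fractions import Fraction


def iv_collision_bound(n_messages: int, iv_bits: int = 96) -> Fraction:
    """Union bound n(n-1)/2 / 2^bits on the chance that any two random IVs coincide."""
    return Fraction(n_messages * (n_messages - 1), 2 * 2**iv_bits)


limit = Fraction(1, 2**32)  # the collision probability the NIST guidance asks for
for n in (10**3, 10**9, 2**32):
    bound = iv_collision_bound(n)
    print(f"{n:>12} messages: bound {float(bound):.3e}, within limit: {bound <= limit}")
```

With a deterministic (counter-based) IV there is no birthday term at all; the limit is then the size of the counter field.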

How to securely encrypt many similar chunks of data with the same key?

I'm writing an application that will require the following security features: when launching the CLI version, you should pass some key to it. Some undefined number of chunks of data of the same size will be generated. They need to be stored remotely. This will be sensitive data. I want it to be encrypted and accessible only with the one key that was passed in initially. My question is, which algorithm will suit me? I read about AES but it says that
When you perform an encryption operation you initialize your Encryptor
with this key, then generate a new, unique Initialization Vector for
each record you’re going to encrypt.
which means I'll have to pass a key and an IV, rather than just the key and this IV should be unique for each generated chunk of data (and there is going to be a lot of those).
If the answer is AES, which encryption mode is it?
You can use any modern symmetric algorithm. The amount of data and how to handle your IVs is irrelevant because it applies no matter which symmetric algorithm you pick.
AES-128 is a good choice, as it isn't limited by law in the US and 128 bits is infeasible to brute force. If you aren't in the US, you could use AES-256 if you wanted to, but implementations in Java require additional installations.
You say you are going to generate n many chunks of data (or retrieve, whatever).
You could encrypt them all at once in CBC mode, which keeps AES as a block cipher, and you'll only end up with one IV. You'll need an HMAC here to protect the integrity. This isn't the most modern way, however.
You should use AES in GCM mode as a stream cipher. You'll still have one single IV (nonce) but the ciphertext will also be authenticated.
IVs should be generated randomly and prepended to the ciphertext. You can then retrieve the IV when it is time to decrypt. Remember: IVs aren't secret, they just need to be random!
EDIT: As pointed out below, IVs should be generated using a crypto-secure random number generator. IVs for CTR based modes, like GCM, only need to be unique.
In summary, what you are worried about shouldn't be worried about. One key is fine. More than one IV is fine too, but there are ways to do it with just one. You will have to worry about IVs either way. Don't use ECB mode.
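Here is a minimal sketch of the "fresh random nonce, prepended to the ciphertext" layout when each chunk is encrypted separately. To stay within the standard library, the keystream comes from HMAC-SHA256 in counter mode as a stand-in for AES-CTR/GCM; that substitution is an assumption for illustration only, and unlike GCM this sketch produces no authentication tag. In real code the two helper functions would wrap a vetted AES-GCM implementation.

```python
import hashlib
import hmac
import os

NONCE_LEN = 12  # bytes; the same size as the usual 96-bit GCM nonce


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Stand-in for AES in counter mode: HMAC-SHA256(key, nonce || counter) blocks.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_chunk(key: bytes, chunk: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)  # new nonce for every chunk, from the OS CSPRNG
    body = bytes(a ^ b for a, b in zip(chunk, _keystream(key, nonce, len(chunk))))
    return nonce + body  # the nonce is not secret, so it travels with the data


def decrypt_chunk(key: bytes, record: bytes) -> bytes:
    nonce, body = record[:NONCE_LEN], record[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


key = os.urandom(16)
chunk = b"same-sized chunk of sensitive data"
first = encrypt_chunk(key, chunk)
second = encrypt_chunk(key, chunk)
print(first.hex())
print(second.hex())
```

Only the key has to be passed on the command line; each stored record carries its own nonce, and identical chunks still produce different records.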

How Do You Ensure Data Security of Small Data?

My Question:
What is the Best Approach to Ensure Data Security of Small Data? Below I present a concern around symmetric and asymmetric encryption. I'm curious if there is a way to do asymmetric encryption on small data with an equivalent of some sort of "salting" to actually make it secure? If so, how do you pick a "salt" and implement it properly? Or is there a better way to handle this?
Explanation of My Concern:
When encrypting something that has "bulk" it seems to me that asymmetric encryption approaches are pretty secure. My concern arises when I have a small field of data, say a credit card number, password, or social security number, in a database. The data being encrypted is then of fixed length and presentation. That being said, a hacker could attempt to encrypt every possible social security number (10^9 permutations) with the public key and compare the results to the values stored in the db. Once they find a match, they know the real number. Similar attacks can be done for the other data types. Because of this, I decided to avoid symmetric methods like MySQL's built-in AES_ENCRYPT() function; however, now I'm questioning asymmetric as well.
How do we properly protect small data?
Salting is normally used for hash algorithms, but I need to be able to get the data back after. I thought about maybe having some "base bulk text", then append the sensitive data to the end. Do the encrypt on that concatenation. Decryption would reverse the process, by decrypting then stripping off the "base bulk text". If the hacker can figure out the base bulk text then I don't see how this would add any additional security.
Picking other data to include as part of the encryption, to act like a salt value derived from other fields in the database (or hash values of those fields, or a combination thereof, which yields the same issue), also seems vulnerable, as hackers could run through combinations similar to the attack mentioned above to perform a more intelligent form of "brute force". That being said, I'm unsure of how to properly secure the small data, and my searches have not helped me.
What is the best approach to ensure data security of small data?
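The attack described above can be sketched in a few lines. Here deterministic_encrypt is a hypothetical stand-in for any scheme where the same input always gives the same output and the attacker can run the encryption themselves (for example, unpadded public-key encryption); a SHA-256 over a public value plays that role purely for illustration. The domain is cut down from 10^9 to 10^5 values so the example runs quickly.

```python
import hashlib

PUBLIC_KEY = b"known-to-everyone"  # hypothetical: the attacker has this too


def deterministic_encrypt(value: str) -> bytes:
    """Stand-in for deterministic encryption: same input, same output."""
    return hashlib.sha256(PUBLIC_KEY + value.encode()).digest()


def build_lookup(domain):
    # The attacker encrypts every possible value once and indexes by ciphertext.
    return {deterministic_encrypt(value): value for value in domain}


domain = [f"{n:05d}" for n in range(100_000)]  # reduced from 10^9 for speed
stored = deterministic_encrypt("04242")        # what sits in the database column

table = build_lookup(domain)
recovered = table[stored]
print("recovered value:", recovered)
```

The lookup only works because the transformation is deterministic; once every encryption mixes in fresh randomness, the precomputed table no longer matches the stored values.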
If you are encrypting with an RSA public key, there is no need to salt the small data. Use OAEP padding. The padding introduces the equivalent of random salt. Try it: encrypt the credit card number twice with the same RSA public key, using OAEP padding, and look at the result. You will see two different values, indistinguishable from random data.
If you are encrypting with an AES symmetric key, then you can use a random IV per value, and store the IV in the clear, publicly, next to the ciphertext. Try encrypting the credit card number twice with AES in CBC mode, for example, with a unique, 16-byte (cryptographically strong) IV each time. You will see two different ciphertexts. Now, assuming a 16-byte AES key, try to brute force those two outputs, without using any knowledge of the key. Use just the ciphertext and the 16-byte IVs, and try to discover the credit card number.
EDIT: It's beyond the scope of the question, but since I mention it in the comment, if a client can send you arbitrary ciphertext to decrypt ("decrypt this credit card info"), you must not let the client see any difference between a padding error on decryption, vs. any other error on decryption. Look up "padding oracle".
If you need to encrypt data, use a symmetric key algorithm; AES is a good choice. Use a mode such as CBC with a random IV; this will ensure that encrypting the same data produces different output.
Add PKCS#7 née PKCS#5 for padding.
If there is real value in the data hire a cryptographic domain expert to help with the design and later validation.
Asymmetric encryption is most useful for communicating encrypted data between two parties. For example, you have a mobile application that accepts credit card numbers and needs to transmit them to the server for processing. You want the public application (which is inherently insecure) to be able to encrypt the data and only you should be able to decrypt it in your secure environment.
Storage is a completely different matter. You're not communicating anything to or from an insecure party, you are the only one dealing with the data. You don't want to give everyone a way to decrypt things if they breach your storage, you want to make things as difficult as possible. Use a symmetric algorithm for storage and include a unique Initialization Vector with each encrypted value as a hurdle to decryption if the storage is compromised.
PCI-DSS requires that you use Strong Cryptography, which they define as follows.
At the time of publication, examples of industry-tested and accepted standards and algorithms for minimum encryption strength include AES (128 bits and higher), TDES (minimum triple-length keys), RSA (2048 bits and higher), ECC (160 bits and higher), and ElGamal (2048 bits and higher). See NIST Special Publication 800-57 Part 1 (http://csrc.nist.gov/publications/) for more guidance on cryptographic key strengths and algorithms.
Beyond that, they are primarily concerned with key management, and with good reason. Breaching your storage won't help as much as actually having the means to decrypt your data, so ensure that your symmetric key is managed correctly and in accordance with their requirements.
There is also a field of study called Format-preserving encryption which seeks to help legacy systems maintain column-width and data types (a social security number is a 9-digit number even after encryption, etc), while allowing values to be securely encrypted. In this way the encryption can be created at a low level of the legacy system without breaking all of the layers above it which depend on a particular data format.
It is sometimes called "small-space encryption", and the idea is also explained in the paper "How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle", which gives an introduction to the topic and presents a specific algorithm devised by the authors. The Wikipedia article mentions many other algorithms with a similar purpose.
If you'd prefer a video explanation of the topic, see The Mix-and-Cut Shuffle: Small Domain Encryption Secure Against N Queries talk from Crypto 2013. It includes graphics detailing how several algorithms work and some early research into the security of such designs.
When I encrypt short messages, I add a relatively long random salt to them before encryption. Edit: others suggest prepending the salt to the payload.
So, for example, if I encrypt the fake credit card number 4242 4242 4242 4242, what I actually encrypt is
tOH_AN2oi4MkLC3lmxxRWaNqh6--m42424242424242424
the first time, and
iQe5xOZPIMjVWfrDDip244ZGhCy2U142424242424242424
the second time, and so forth.
This random salting significantly discourages the lookup table approach you describe. Many operating systems furnish sources of high-quality random numbers, like /dev/random on *nix and the RNGCryptoServiceProvider class in .NET on Windows.
It's still not OK to hold payment card data in that way without defense in depth and PCI data security certification.
Edit: Some encryption schemes handle this salting as part of their normal functioning.

Is there an algorithm for unique "hashes"

I'm interested in finding an algorithm that can encode a piece of data into a sort of hash (as in that is impossible to convert back into the source data, except by brute force), but also has a unique output for every unique input. The size of the output doesn't matter.
It should be able to hash the same input twice though, and give the same output, so regular encryption with a random, discarded key won't suffice. Nor will regular encryption with a known key, or a salt, because they would be exposed to attackers.
Does such a thing exist?
Can it even theoretically exist, or is the data-destroying part of normal hash algorithms critical for the irreversible characteristic?
What use would something like this be? Well, imagine a browser with a list of websites that should be excluded from the history (like NSFW sites). If this list is saved unencoded or encrypted with a key known on the system, it's readable not just by the browser but also by bosses, wives, etc.
If instead the website addresses are stored hashed, they can't be read, but the browser can check if a site is present in the list.
Using a normal hash function could result in false positives (however unlikely).
I'm not building a browser, I have no plan to actually use the answer. I'm just curious and interested in encryption and such.
Given the definition of a hash;
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value.
no - it's not theoretically possible. A hash value is of a fixed length that is generally smaller than the data it is hashing (unless the data being hashed is less than the fixed length of the hash). They will always lose data, and as such there can always be collisions (a hash function is considered good if the risk of collision is low, and infeasible to compute.)
In theory it's impossible for outputs that are shorter than the input. This trivially follows from the pigeonhole principle.
You could use asymmetric encryption where you threw away the private key. That way it's technically lossless encryption, but nobody will be able to easily reverse it. Note that this is much slower than normal hashing, and the output will be larger than the input.
But the probability of a collision drops exponentially with the hash size. A good 256-bit hash is collision-free for all practical purposes. By that I mean that hashing for billions of years with all the computers in the world will almost certainly not produce a collision.
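The pigeonhole argument is easy to see once the hash is made small enough. The sketch below truncates SHA-256 to a single byte (an artificial choice for the demo, giving only 256 possible outputs) and feeds it 257 distinct inputs, so at least two of them must share an output.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """SHA-256 truncated to one byte: only 256 possible outputs."""
    return hashlib.sha256(data).digest()[0]


def first_collision(inputs):
    seen = {}
    for item in inputs:
        h = tiny_hash(item)
        if h in seen:
            return seen[h], item  # two different inputs, same hash value
        seen[h] = item
    return None


inputs = [str(i).encode() for i in range(257)]  # 257 pigeons, 256 holes
pair = first_collision(inputs)
print(pair, tiny_hash(pair[0]), tiny_hash(pair[1]))
```

With the full 256-bit output the same argument still applies in principle, but the number of inputs needed before a collision becomes likely is far beyond anything that can be computed.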
Your extended question shows two problems.
What use would something like this be? Well, imagine a browser with a list of websites that should be excluded from the history (like NSFW sites). If this list is saved unencoded or encrypted with a key known on the system, it's readable not just by the browser but also by bosses, wives, etc.
If instead the website addresses are stored hashed, they can't be read, but the browser can check if a site is present in the list.
Brute force is trivial in this use case. Just find the list of all domains/the zone file. Wouldn't be surprised if a good list is downloadable somewhere.
Using a normal hash function could result in false positives (however unlikely).
The collision probability of a hash is much lower than the probability of a hardware error (especially since you have no attacker trying to provoke a collision in this scenario).
So my conclusion is to combine a secret with a slow hash.
byte[] secret=DeriveKeyFromPassword(pwd, salt, enough iterations for this to take perhaps a second)
and then for the actual hash use a KDF again combining the secret and the domain name.
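A rough sketch of that two-step idea using Python's standard library: PBKDF2 plays the role of DeriveKeyFromPassword, and a second, cheaper PBKDF2 call combines the secret with the domain name. The function names, the choice of PBKDF2 for both steps and the iteration counts are assumptions for illustration; tune the first count until it takes about a second on your hardware.

```python
import hashlib
import os


def derive_secret(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # Slow step, done once per session: makes guessing the password expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def domain_token(secret: bytes, domain: str, iterations: int = 1_000) -> bytes:
    # Per-domain step: the secret is the key material, the domain acts as the salt.
    return hashlib.pbkdf2_hmac("sha256", secret, domain.lower().encode(), iterations)


salt = os.urandom(16)  # stored next to the list; it does not need to be secret
secret = derive_secret("correct horse battery staple", salt, iterations=10_000)  # low count so the demo is fast

excluded = {domain_token(secret, d) for d in ("example.com", "example.org")}


def is_excluded(domain: str) -> bool:
    return domain_token(secret, domain) in excluded


print(is_excluded("example.com"), is_excluded("example.net"))
```

Without the password, someone holding the stored tokens and a list of all domains cannot test candidates, because every guess needs the secret; with only the password to guess, each attempt costs the full slow derivation.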
Any form of lossless public encryption where you forget the private key.
Well, any lossless compressor with a password would work.
Or you could salt your input with some known (to you) text. This would give you something as long as the input. You could then run some sort of lossless compression on the result, which would make it shorter.
You can find a hash function with a low probability of that happening, but I think all of them are prone to birthday attacks. You can try a function with a large output size to minimize that probability.
Well what about md5 hash? sha1 hash?
I don't think it can exist; if you can put anything into them and get a different result, it couldn't be a fixed length byte array, and it would lose a lot of its usefulness.
Perhaps instead of a hash what you are looking for is reversible encryption? That should be unique. Won't be fast, but it will be unique.

Combination of more than one crypto algorithm

I'm considering the following: I have some data stream which I'd like to protect as secure as possible -- does it make any sense to apply let's say AES with some IV, then Blowfish with some IV and finally again AES with some IV?
The encryption / decryption process will be hidden (even protected against debugging) so it won't be easy to guess which crypto method and what IVs were used (however, I'm aware that the strength of this crypto chain can't depend on this, since every protection against debugging is breakable after some time).
I have the computing power for this (the amount of data isn't that big), so the only question is whether it's worth implementing. For example, TripleDES worked very similarly, using three keys and an encrypt/decrypt/encrypt scheme, so it probably isn't total nonsense. Another question is how much I decrease the security when I use the same IV for the 1st and 3rd parts, or even the same IV for all three parts.
I welcome any hints on this subject
I'm not sure about this specific combination, but it's generally a bad idea to mix things like this unless that specific combination has been extensively researched. It's possible the mathematical transformations would actually counteract one another and the end result would be easier to hack. A single pass of either AES or Blowfish should be more than sufficient.
UPDATE: From my comment below…
Using TripleDES as an example: think of how much time and effort from the world's best cryptographers went into creating that combination (note that DoubleDES had a vulnerability), and the best they could do is 112 bits of security despite 168 bits of key material (192 bits counting the parity bits).
UPDATE 2: I have to agree with Diomidis that AES is extremely unlikely to be the weak link in your system. Virtually every other aspect of your system is more likely to be compromised than AES.
UPDATE 3: Depending on what you're doing with the stream, you may want to just use TLS (the successor to SSL). I recommend Practical Cryptography for more details—it does a pretty good job of addressing a lot of the concerns you'll need to address. Among other things, it discusses stream ciphers, which may or may not be more appropriate than AES (since AES is a block cipher and you specifically mentioned that you had a data stream to encrypt).
I don't think you have anything to lose by applying one encryption algorithm on top of another that is very different from the first one. I would however be wary of running a second round of the same algorithm on top of the first one, even if you've run another one in-between. The interaction between the two runs may open a vulnerability.
Having said that, I think you're agonizing too much over the encryption part. Most exposures of data do not happen by breaking an industry-standard encryption algorithm, like AES, but through other weaknesses in the system. I would suggest spending more time looking at key management, the handling of unencrypted data, weaknesses in the algorithm's implementation (the possibility of leaking data or keys), and wider system issues, for instance, what you are doing with data backups.
A hacker will always attack the weakest element in a chain, so it helps little to make a strong element even stronger. Cracking AES encryption is already infeasible with a 128-bit key length. The same goes for Blowfish. Choosing even bigger key lengths makes it even harder, but 128-bit keys have never been cracked up to now (and probably will not be within the next 10 or 20 years). So this encryption is probably not the weakest element; why make it stronger? It is already strong.
Think about what else might be the weakest element. The IV? Actually I wouldn't waste too much time on selecting a great IV or hiding it. The weakest element is usually the encryption key. E.g. if you are encrypting data stored to disk, but this data needs to be read by your application, your application needs to know the IV and it needs to know the encryption key, hence both of them need to be within the binary. This is actually the weakest element. Even if you take 20 encryption methods and chain them on your data, the IVs and encryption keys of all 20 need to be in the binary, and if a hacker can extract them, the fact that you used 20 encryption methods instead of 1 provides zero additional security.
Since I still don't know what the whole process is (who encrypts the data, who decrypts the data, where is the data stored, how is it transported, who needs to know the encryption keys, and so on), it's very hard to say what the weakest element really is, but I doubt that AES or Blowfish encryption itself is your weakest element.
Who are you trying to protect your data from? Your brother, your competitor, your government, or the aliens?
Each of these has different levels at which you could consider the data to be "as secure as possible", within a meaningful budget (of time/cash)
I wouldn't rely on obscuring the algorithms you're using. This kind of "security by obscurity" doesn't work for long. Decompiling the code is one way of revealing the crypto you're using but usually people don't keep secrets like this for long. That's why we have private/public key crypto in the first place.
Also, don't waste time obfuscating the algorithm. Apply Kerckhoffs's principle, and remember that AES, in and of itself, is used (and acknowledged to be used) in a large number of places where the data needs to be "secure".
Damien: you're right, I should write it more clearly. I'm talking about competitor, it's for commercial use. So there's meaningful budget available but I don't want to implement it without being sure I know why I'm doing it :)
Hank: yes, this is what I'm scared of, too. The strongest support for this idea was the TripleDES example I mentioned. On the other hand, when I use one algorithm to encrypt some data, then apply another one, it would be very strange if the 'power' of the whole encryption were less than that of a standalone algorithm. But this doesn't mean it can't be equal... This is the reason why I'm asking for some hints; this isn't my area of knowledge...
Diomidis: this is basically my point of view but my colleague is trying to convince me it really 'boosts' security. My proposal would be to use stronger encryption key instead of one algorithm after another without any thinking or deep knowledge what I'm doing.
@Miro Kropacek - your colleague is trying to add security through Voodoo. Instead, try to build something simple that you can analyse for flaws, such as just using AES.
I'm guessing it was he (she?) who suggested enhancing the security through protection from debugging too...
You can't actually make things less secure if you encrypt more than once with distinct IVs and keys, but the gain in security may be much less than you anticipate: In the example of 2DES, the meet-in-the-middle attack means it's only twice as hard to break, rather than squaring the difficulty.
In general, though, it's much safer to stick with a single well-known algorithm and increase the key length if you need more security. Leave composing cryptosystems to the experts (and I don't number myself one of them).
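The meet-in-the-middle attack is small enough to demonstrate on a toy cipher. The sketch below uses a made-up 32-bit Feistel cipher with 12-bit keys (an assumption so the search finishes instantly; it is not DES) and recovers both keys of a double encryption from two known plaintext/ciphertext pairs.

```python
import hashlib

KEY_BITS = 12   # toy key size so the whole key space can be enumerated
ROUNDS = 4
MASK16 = 0xFFFF


def _f(key: int, rnd: int, half: int) -> int:
    data = key.to_bytes(2, "big") + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def encrypt(key: int, block: int) -> int:
    left, right = block >> 16, block & MASK16
    for rnd in range(ROUNDS):
        left, right = right, left ^ _f(key, rnd, right)
    return (left << 16) | right


def decrypt(key: int, block: int) -> int:
    left, right = block >> 16, block & MASK16
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _f(key, rnd, left), left
    return (left << 16) | right


def meet_in_the_middle(pairs):
    (p0, c0), rest = pairs[0], pairs[1:]
    # Forward half: encrypt the known plaintext under every possible first key.
    middle = {}
    for k1 in range(2**KEY_BITS):
        middle.setdefault(encrypt(k1, p0), []).append(k1)
    # Backward half: decrypt the ciphertext under every second key and look for
    # a matching middle value; confirm candidates against the remaining pairs.
    found = []
    for k2 in range(2**KEY_BITS):
        for k1 in middle.get(decrypt(k2, c0), []):
            if all(encrypt(k2, encrypt(k1, p)) == c for p, c in rest):
                found.append((k1, k2))
    return found


k1, k2 = 0xA25, 0x17C  # the two secret keys
pairs = [(p, encrypt(k2, encrypt(k1, p))) for p in (0x00000000, 0xDEADBEEF)]
candidates = meet_in_the_middle(pairs)
print(candidates)
```

The search costs about 2 x 2^12 cipher operations plus a table of 2^12 entries, instead of the 2^24 trials that a naive search over both keys would need. That is the sense in which double encryption is only about twice as hard to break, at the price of the memory for the table.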
Encrypting twice is more secure than encrypting once, even though this may not be clear at first.
Intuitively, it appears that encrypting twice with the same algorithm gives no extra protection because an attacker might find a key which decrypts all the way from the final cyphertext back to the plaintext. ... But this is not the case.
E.g. I start with plaintext A and encrypt it with key K1 to get B. Then I encrypt B with key K2 to get C.
Intuitively, it seems reasonable to assume that there may well be a key, K3, which I could use to encrypt A and get C directly. If this is the case, then an attacker using brute force would eventually stumble upon K3 and be able to decrypt C, with the result that the extra encryption step has not added any security.
However, it is highly unlikely that such a key exists (for any modern encryption scheme). (When I say "highly unlikely" here, I mean what a normal person would express using the word "impossible").
Why?
Consider the keys as functions which provide a mapping from plaintext to cyphertext.
If our keys are all KL bits in length, then there are 2^KL such mappings.
However, if I use 2 keys of KL bits each, this gives me (2^KL)^2 mappings.
Not all of these can be equivalent to a single-stage encryption.
Another advantage of encrypting twice, if 2 different algorithms are used, is that if a vulnerability is found in one of the algorithms, the other algorithm still provides some security.
As others have noted, brute forcing the key is typically a last resort. An attacker will often try to break the process at some other point (e.g. using social engineering to discover the passphrase).
Another way of increasing security is to simply use a longer key with one encryption algorithm.
...Feel free to correct my maths!
Yes, it can be beneficial, but probably overkill in most situations. Also, as Hank mentions certain combinations can actually weaken your encryption.
TrueCrypt provides a number of combination encryption algorithms like AES-Twofish-Serpent. Of course, there's a performance penalty when using them.
Changing the algorithm does not improve the quality (unless you expect an algorithm to be broken); it's only about the key/block length and some advantage in obfuscation. Doing it several times is interesting, since even if the first key leaked, the resulting data is not distinguishable from random data. There are block sizes that are processed better on a given platform (e.g. register size).
Attacking quality encryption algorithms only works by brute force and thus depends on the computing power you can spend on it. This means that ultimately you can only increase the probable average time somebody needs to decrypt it.
If the data is of real value, they'd better not attack the data but the key holder...
I agree with what has been said above. Multiple stages of encryption won't buy you much. If you are using a 'secure' algorithm then it is practically impossible to break. Use AES in some standard streaming mode. See http://csrc.nist.gov/groups/ST/toolkit/index.html for accepted ciphers and modes. Anything recommended on that site should be sufficiently secure when used properly. If you want to be extra secure, use AES-256, although 128 should still be sufficient anyway. The greatest risks are not attacks against the algorithm itself, but rather attacks against key management, or side channel attacks (which may or may not be a risk depending on the application and usage). If your application is vulnerable to key management attacks or to side channel attacks then it really doesn't matter how many levels of encryption you apply. This is where I would focus your efforts.

Resources