Can One-Time-Pad key be reused if encrypted plaintext is random?

Can One-Time-Pad key be reused if encrypted plaintext is random? - security

I want to encrypt purely random data with one single key that is shorter than the plaintext.
Should I use AES or another robust encryption algorithm, or can I use OTP, i.e. only xoring (purely random) plaintext with the unique key, block by block?
E.g. data is 1024 bits long and is random. Key is 128-bit long (random too). Is it safe to encrypt data by xoring 8 successive 128-bit blocks with the same key?

Your question asks " Is it safe to encrypt data by xoring 8 successive 128-bit blocks with the same key?"
This is not a One-Time-Pad. A One-Time-Pad is used once and once only. Any compromise of part of the unencrypted data would allow recovery of all or part of the key, and hence recovery of more of the unencrypted data.
A safe encryption scheme is secure against an attacker knowing part or all of the plaintext: a "known plaintext attack". Your scheme is not safe; it is vulnerable to a known plaintext attack.

E.g. data is 1024 bits long and is random. Key is 128-bit long (random
too). Is it safe to encrypt data by xoring 8 successive 128-bit blocks
with the same key?
If your data is random, then the answer is yes.
You can consider your actual data as OTP key here. It's purely random and is used only once, so there's no way to recover either key or data.

If the data itself is random then it is an equally likely to the text space, so any transformation will lead safety. XOR or stream cipher maintains the relation that the blocks have in cipher-text as below..
from Crypto.Cipher import ARC4
key = '1234567812345678'
obj1 = ARC4.new(key)
obj2 = ARC4.new(key)
d1= obj1.encrypt('\x01\x82\x83\x04\x05\x06\x10\x81\x23\x32\x33\x34')
d2= obj2.encrypt('\x81\x02\x83\x84\x85\x86\x90\x01\xa3\xb2\x33\xb4')
print repr(d1)
print repr(d2)
p1='';p2=''
for i in d1:
if ord(i)>=128: p1+=chr(ord(i)-128)
else: p1+=chr(ord(i))
print; print
for i in d2:
if ord(i)>=128: p2+= chr(ord(i)-128)
else: p2+=chr(ord(i))
print p1==p2`
output:
'\xbaq\xba\xd0\x0c\xb7\xce&\xd3\x019\xfb'
':\xf1\xbaP\x8c7N\xa6S\x819{'
True

Related

How does an IV work and what would be the best way to store it?

I want to encrypt and decrypt strings. I'm using Nodejs crypto for this. I've read that when encrypting and decrypting it's highly recommended to use an IV. I want to store the encrypted data inside a MySQL database and decrypt it later when needed. I understand that I need the IV also for the decryption process. But what exactly is an IV and how should I store it? I read something about that an IV does not to be kept secret. Does this mean I can store it right next to the encrypted data it belongs to?

it's highly recommended to use an IV
No, it's required or you'll not get a fully secure ciphertext in most circumstances. At the very minimum, not supplying an IV for the same key and plaintext message will result in identical ciphertext, which will leak information to an adversary. In other words: encryption would be deterministic, and that's not a property that you want from a cipher. For CTR and GCM mode you may well leak all of the plaintext message though...
But what exactly is an IV ... ?
An IV just consists of binary bits. It's size and contents depend on the mode of operation (CBC/CTR/GCM). Generally it needs either to be a nonce or randomized.
CBC mode requires a randomized IV of 16 bytes; generally a cryptographically secure random number generator is used for that.
CTR mode commonly specifies both a nonce and the initial counter value within the IV of 16 bytes. So you already need to put the nonce in the left hand bytes (lowest index). This nonce may be randomized, but then it should be large enough (e.g. 12 bytes) to avoid the birthday problem.
GCM mode requires just a nonce of 12 bytes.
and how should I store it
Anyway you can store the bytes, as long as they can be retrieved or regenerated during decryption. If you need text you may need to encode it using base 64 or hexadecimals (this goes for the ciphertext as well, of course).
I read something about that an IV does not to be kept secret.
That's correct.
Does this mean I can store it right next to the encrypted data it belongs to?
Correct, quite often the IV is simply prefixed to the ciphertext; if you know the block cipher and mode of operation then the size is predetermined after all.

Which of these encryption methods is more secure? Why?

I am writing a program that takes a passphrase from the user and then writes some encrypted data to file. The method that I have come up with so far is as follows:
Generate a 128-bit IV from hashing the filename and the system time, and write this to the beginning of the file.
Generate a 256-bit key from the passphrase using SHA256.
Encrypt the data (beginning with a 32-bit static signature) with this key using AES in CBC mode, and write it to file.
When decrypting, the IV is read, and then the passphrase used to generate the key in the same way, and the first 32-bits are compared against what the signature should be in order to tell if the key is valid.
However I was looking at the AES example provided in PolarSSL (the library I am using to do the hashing and encryption), and they use a much more complex method:
Generate a 128-bit IV from hashing the filename and file size, and write this to the beginning of the file.
Generate a 256-bit key from hashing (SHA256) the passphrase and the IV together 8192 times.
Initialize the HMAC with this key.
Encrypt the data with this key using AES in CBC mode, and write it to file, while updating the HMAC with each encrypted block.
Write the HMAC to the end of the file.
I get the impression that the second method is more secure, but I don't have enough knowledge to back that up, other than that it looks more complicated.
If it is more secure, what are the reasons for this?
Is appending an HMAC to the end of the file more secure than having a signature at the beginning of the encrypted data?
Does hashing 8192 times increase the security?
Note: This is an open source project so whatever method I use, it will be freely available to anyone.

The second option is more secure.
Your method, does not provide any message integrity. This means that an attacker can modify parts of the ciphertext and alter what the plain text decrypts to. So long as they don't modify anything that will alter your 32-bit static signature then you'll trust it. The HMAC on the second method provides message integrity.
By hashing the key 8192 times it adds extra computational steps for someone to try and bruteforce the key. Assume a user will pick a dictionary based password. With your method an attacker must perform SHA256(someguess) and then try and decrypt. However, with the PolarSSL version, they will have to calculate SHA256(SHA256(SHA256...(SHA256(someguess))) for 8192 times. This will only slow an attacker down, but it might be enough (for now).
For what it's worth, please use an existing library. Cryptography is hard and is prone to subtle mistakes.

Source and importance of nonce / IV for protocol using AES-GCM

I am making a protocol that uses packets (i.e., not a stream) encrypted with AES. I've decided on using GCM (based off CTR) because it provides integrated authentication and is part of the NSA's Suite B. The AES keys are negotiated using ECDH, where the public keys are signed by trusted contacts as a part of a web-of-trust using something like ECDSA. I believe that I need a 128-bit nonce / initialization vector for GCM because even though I'm using a 256 bit key for AES, it's always a 128 bit block cipher (right?) I'll be using a 96 bit IV after reading the BC code.
I'm definitely not implementing my own algorithms (just the protocol -- my crypto provider is BouncyCastle), but I still need to know how to use this nonce without shooting myself in the foot. The AES key used in between two people with the same DH keys will remain constant, so I know that the same nonce should not be used for more than one packet.
Could I simply prepend a 96-bit pseudo random number to the packet and have the recipient use this as a nonce? This is peer-to-peer software and packets can be sent by either at any time (e.g., an instant message, file transfer request, etc.) and speed is a big issue so it would be good not to have to use a secure random number source. The nonce doesn't have to be secret at all, right? Or necessarily as random as a "cryptographically secure" PNRG? Wikipedia says that it should be random, or else it is susceptible to a chosen plaintext attack -- but there's a "citation needed" next to both claims and I'm not sure if that's true for block ciphers. Could I actually use a counter that counts the number of packets sent (separate from the counter of the number of 128 bit blocks) with a given AES key, starting at 1? Obviously this would make the nonce predictable. Considering that GCM authenticates as well as encrypts, would this compromise its authentication functionality?

GCM is a block cipher counter mode with authentication. A Counter mode effectively turns a block cipher into a stream cipher, and therefore many of the rules for stream ciphers still apply. Its important to note that the same Key+IV will always produce the same PRNG stream, and reusing this PRNG stream can lead to an attacker obtaining plaintext with a simple XOR. In a protocol the same Key+IV can be used for the life of the session, so long as the mode's counter doesn't wrap (int overflow). For example, a protocol could have two parties and they have a pre-shared secret key, then they could negotiate a new cryptographic Nonce that is used as the IV for each session (Remember nonce means use ONLY ONCE).
If you want to use AES as a block cipher you should look into CMAC Mode or perhaps the OMAC1 variant. With CMAC mode all of the rules for still CBC apply. In this case you would have to make sure that each packet used a unique IV that is also random. However its important to note that reusing an IV doesn't have nearly as dire consequences as reusing PRNG stream.

I'd suggest against making your own security protocol. There are several things you need to consider that even a qualified cryptographer can get it wrong. I'd refer you to the TLS
protocol (RFC5246), and the datagram TLS protocol (RFC 4347). Pick a library and use them.
Concerning your question with IV in GCM mode. I'll tell you how DTLS and TLS do it. They use an explicit nonce, i.e. the message sequence number (64-bits) that is included in every packet, with a secret part that is not transmitted (the upper 32 bits) and is derived from the initial key exchange (check RFC 5288 for more information).

Avoid that repeated same messages look always same after encryption, and can be replayed by an attacker?

I'm looking to authenticate that a particular message is coming from a particular place.
Example: A repeatedly sends the same message to B. Lets say this message is "helloworld" which is encrypted to "asdfqwerty".
How can I ensure that a third party C doesn't learn that B always receives this same encrypted string, and C starts sending "asdfqwerty" to B?
How can I ensure that when B decrypts "asdfqwerty" to "helloworld", it is always receiving this "helloworld" from A?
Thanks for any help.

For the former, you want to use a Mode of Operation for your symmetric cipher that uses an Initialization Vector. The IV ensures that every encrypted message is different, even if it contains the same plaintext.
For the latter, you want to sign your message using the private key of A(lice). If B(ob) has the public key of Alice, he can then verify she really created the message.
Finally, beware of replay attacks, where C(harlie) records a valid message from Alice, and later replays it to Bob. To avoid this, add a nonce and/or a timestamp to your encrypted message (yes, you could make the IV play double-duty as a nonce).

Add random value to the data being encrypted, and whenever it's decrypted, strip it from the original unencrypted data.
You need decent random number generator. I'm sure Google will help you on that.

C noticing that B receives twice the same encrypted message is an issue called traffic analysis and has historically been a heavy concern (but this was in times which predated public key encryption).
Any decent public encryption system includes some random padding. For instance, for RSA as described in PKCS#1, the encrypted message (of length at most 117 bytes for a 1024-bit RSA key) gets a header with at least eight random (non-zero) bytes, and a few extra data which allows the receiver to unambiguously locate the padding bytes, and see where the "real" data begins. The random bytes will be generated anew every time; hence, if A sends twice the same message to B, the encrypted messages will be different, but B will recover the original message twice.
Random padding is required for public key encryption precisely because the public key is public: if encryption was deterministic, then an attacker could "try" potential messages and look for a match (this is exhaustive search on possible messages).
Public key encryption algorithms often have heavy limitations on data size or performance (e.g. with RSA, you have a strict maximum message length, depending on the key size). Thus, it is customary to use a hybrid system: the public key encryption is used to encrypt a symmetric key K (i.e. a bunch of random bytes), and K is used to symmetrically encrypt the data (symmetric encryption is fast and does not have constraints on input message size). In a hybrid system, you generate a new K for every message, so this also gives you the randomness you need to avoid the issue of encrypting several times the same message with a given public key: at the public encryption level, you are actually never encrypting twice the same message (the same key K), even if the data which is symmetrically encrypted with K is the same than in a previous message. This would protect you from traffic analysis even if the public key encryption itself did not include random padding.
When symmetrically encrypting data with a key K, the symmetric encryption should use an "initial value" (IV) which is randomly and uniformly generated; this is integrated in the encryption mode (some modes only need a non-repeating IV without requiring a random uniform generation, but CBC needs random uniform generation). This is a third level of randomness, protecting you against traffic analysis.
When using asymmetric key agreement (static Diffie-Hellman), since are a bit more complex, because a key agreement results in a key K which you do not choose, and which could be the same ever and ever (between given sender and receiver). In that situation, protection against traffic analysis relies on the symmetric encryption IV randomness.
Asymmetric encryption protocols, such as OpenPGP, describe how the symmetric encryption, public key encryption and randomness should all be linked together, ironing out the tricky details. You are warmly encouraged not to reinvent your own protocol: it is difficult to design a secure protocol, mostly because one cannot easily test for the presence or absence of any weakness.

You may want to study block cipher modes of operation. However, the modes are designed to work on a data stream that is sent over a reliable channel. If your messages are sent out of order over an unreliable transport (e.g. UDP packets), I don't think you can use it.

Can you 'convert' ciphertext encrypted in CBC mode to ECB mode with known IV?

In CBC mode, C2 = Ek(C1 ⊕ P2)
C2 = 2nd block of ciphertext
P2 = 2nd block of plaintext
Ek = encryption function
If IV is known (let's say it's set to 0), is there anyway to find the ciphertext block such that C2 = Ek(P2) ?

If the message is only 1 block in size and the IV is null, then both ECB and CBC modes will produce an equivalent cipher text. However, the short answer to your question is NO. In most cases (like WEP and WPA) the IV is known to the attacker and this does not compromise the system. However, make sure that the IV is random when using CBC mode.
In Cipher Block Chaining mode (CBC), each plain text block is xor'ed with the previous cipher text block in the sequence. Thus you cannot decrypt the message in its entirety without following the "chain" or series of decryption routines. Its kind of like a one-time-pad meets a block cipher. The algorithm isn't that complex and here is a graphical representation of the two. Notice that they are nearly identical except CBC mode adds XOR operation for each block.

The simple answer is that if what you ask for was possible in general then CBC would then be equivalent to ECB anyway - there would be no security gained by CBC over ECB.
There is, however, an information leak in CBC. If you happen to observe two identical ciphertexts in CBC messages that were encrypted with the same key, Cn = Cm, then you can calculate Pn ⊕ Pm (it is equal to Cn-1 ⊕ Cm-1). Two identical ciphertexts are only likely in two cases: the first is that the sender re-used an IV, and the second is that you have observed a very large amount of traffic encrypted with the same key (for a 128 bit block cipher, on the order of 16 million TB; for a 64 bit block cipher, on the order of 4 GB).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string