XOR encrypting and decryption key - security

It is possible to get the key used to encrypt a secuence of characters using xor?
Example
Lets say that I Have the following string: 1456, so:
1 - 49 ascii - 00110001 binary
4 - 52 ascii - 00110100 binary
5 - 53 ascii - 00110101 binary
6 - 54 ascii - 00110110 binary
Key: 100
Then I do the following: 1 ^ 100 (talking in binary: 00110001 ^ 01100100), and get the following result: "UPQR", how do I know that I used the key 100 in xor to encrypt "1456" getting "UPQR" as the result.
Thanks in advance!

If you know both the original and the encoded sequence, then for each component it must be that
original[i] ^ encoded[i] == key
If you don't know the original content, then you would have to try with each possible key and see if the results makes any sense (for some definition of sense).

Note Wikipedia's comment about XOR cipher
By itself, using a constant repeating key, a simple XOR cipher can trivially be broken using frequency analysis.
Though if the key is the size of the message (and random, and used only once), you have a one-time pad. That's just unbreakable, period. Though it is too cumbersome for most to use.

If you're in a position of code-breaking, then you should definitely check out Sinkov's Elementary Cryptanlysis and Gaines's Cryptanalysis: A Study of Ciphers and Their Solution. Both of these books go into good depth on key recovery for a Vigenère Cipher, which is fairly similar to a sequential application of XOR operations.

Related

Understanding XTS, why better than CBC with generated IVs?

I tried to understand XTS-mode, so I did some research.
I found the following article (http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/).
This page raised some questions; please help me to answer those:
XTS does the XOR operation twice on the record (once before encryption and once after the encryption). What benefits does the second XOR give?.
As an alternative CBC was mentioned, would CBC be as secure as XTS when the IV would have been generated using the same method? (IV=AES_ECB(IVKEY, block offset)
For the first sector's perspective the only difference is the additional XOR, is it?
First Sector XTS = TWEAK ^ ( AES_ECB(key, plaintext ^ TWEAK) )
First Sector CBC = AES_ECB(key, plaintext ^ IV )
XTS applies the tweak unique to each 32 byte block, no chaining like in CBC. This means that like "CTR", changes can be made bit granular?
CBC's chaining nature combined with the generated IVs using a secret key and the block offset seems to a better deal than XTS? Maybe I'm missing something and most likely it is linked to the second XORing.
I found some answers on Crypto.SE.
Answer to 2)
In CBC mode if an attacker knows the IV and (the key?) then can generate one garbled block and then could control the entire block. (Reference)
Answer to 3)
CTR mode is XORing with the key expanded using the cipher, this makes it possible to flip bits.
Directly applying AES prevents bit flipping in both CBC and XTS. (Reference)
As for 1), I have asked the question on Crypto.SE here.

SHA-1 Digest Reduction

I'm using QR Code barcodes to store UUIDs in my system and I need to check that the barcodes generated are mine and not someone else's. I also need to keep the encoded data short so that the QR Codes remain in the lower version range and remain easy to scan.
My approach is to take the UUID raw value number (a 128-bit value) and a 16 bit checksum and then Base64 encoded that data before converting to a QR code. So far so good, this works perfectly.
To generate the checksum I take the string version of the UUID and combine it with a long secret string and XOR the odd bytes together to produce a SHA-1 hash. But this hash is too long, so I XOR all the old bytes together to produce half the checksum, and likewise with the even bytes to produce the other half.
What worries me is that I have compromised the SHA-1 system needlessly by XORing it down. Would it be better to just take two unmanipulated bytes from somewhere within the result? I accept that a 16-bit checksum won't be as secure as a 160-bit checksum, but that is a price I have to pay for usability with the barcodes. What I really don't want to find is that I've now provided a checksum that is easy to crack as the UUID is transmitted in the clear.
If there is a better way of generating the checksum that would also be a suitable answer to the question. As always many thanks for your time or just reading this, double plus good thanks if you post an answer.
There's no reason to do any XORing. Simply taking the first two bytes will be as (in)secure.
To keep the code version as small as possible, you might want to convert the 144 bit value to a decimal string and encode that. QR Codes have different characters sets and encode numbers efficiently. Base64 can only be encoded as 8 bit values in QR codes so you add 30% right there.

Is there an encryption technique that could turn an 8-digit number into something 10 or 11 digits or less?

Many of the encryption techniques I've seen can easily encrypt a simple 8 digit number like "12345678" but the result is often something like "8745b34097af8bc9de087e98deb8707aac8797d097f" (made up but you get the idea).
Is there a way to encrypt this 8 digit number but have the resulting encrypted value be the same or at least only a slightly longer number? An ideal target would be to end up with a 10 digit number or less. Is this possible while still maintaining a fairly strong encryption?
Update: I didn't make the output clear enough - I am wanting an 8-digit number to turn into an 8-digit number, not 8 bytes.
A lot here is going to depend on how seriously you mean your "public-key-encryption" tag. Do you actually want public key encryption, or are you just taking that possibility into account?
If you're willing to use symmetric encryption, producing 8 bytes of output from 8 bytes of input is pretty easy: just run 3DES in ECB (Electronic Code Book) mode, and that's what you'll get. The main weakness of ECB is that a given input will always produce the same result, so if your inputs might repeat an attacker will be able to see that repetition, and may be able to notice a pattern of "encrypted value X leads to action Y", even if they can't/don't break the encryption itself at all. If you can live with that, 3DES/ECB is probably your answer.
If you can't live with that, 3DES in CFB mode is probably the next best. This will produce 16 bytes of output from 8 bytes of input (note that it's not normally doubling the input size, but adding 8 bytes to the input size).
3DES is hardly what anybody would call a cutting edge algorithm, but I'd say it still qualifies as "fairly strong encryption". Part of its weakness as an algorithm stems from its relatively small block size, but that also minimizes expansion of the output.
Edit: Sorry, I forgot to the public-key possibility. With most public-key cryptography, the smallest result is roughly equal to the key size. With RSA encryption, that'll typically mean a minimum of something like 1024 bits (and often considerably more than that). To keep the result smaller, I'd probably use Elliptical Curve Cryptography, for which a ~200 bit key is reasonably secure against known attacks. This will still be larger than 3DES/CFB, but not outrageously so.
Well you could look a stream cipher which encrypts bytes 1:1. With N bytes input, there are N bytes encrypted/decrypted output. Such ciphers are usually based on an algorithm that creates a stream of random numbers, with the encryption key/IV acting as seed.
For some stream ciphers, look at the eSTREAM candidates. I don't know of any relevant attacks on HC-128 and HC-256, for example.

How should I create my DES key? Why is an 7-character string not enough?

I'm having a bit of difficulty getting an understand of key length requirements in cryptography. I'm currently using DES which I believe is 56 bits... now, by converting an 8 character password to a byte[] my cryptography works. If I use a 7 digit password, it doesn't.
Now, forgive me if I'm wrong, but is that because ASCII characters are 7 bits, therefor 8 * 7 = 56bits?
That just doesn't seem right to me. If I want to use a key, why can I not just pass in a salted hash of my secret key, i.e. an MD5 hash?
I'm sure this is very simple, but I can't get a clear understanding of what's going on.
DES uses a 56-bit key: 8 bytes where one bit in each byte is a parity bit.
In general, however, it is recommended to use an accepted, well-known key derivation algorithm to convert a text password to a symmetric cipher key, regardless of the algorithm.
The PBKDF2 algorithm described in PKCS #5 (RFC 2898) is a widely-used key derivation function that can generate a key of any length. At its heart, PBKDF2 is combining salt and the password through via a hash function to produce the actual key. The hash is repeated many times so that it will be expensive and slow for an attacker to try each entry in her "dictionary" of most common passwords.
The older version, PBKDF1, can generate keys for DES encryption, but DES and PBKDF1 aren't recommended for new applications.
Most platforms with cryptographic support include PKCS #5 key-derivation algorithms in their API.
Each algorithm is designed to accept a certain key length. The key is used as part of the algorithm, and as such, can't be whatever your heart desires.
Common key sizes are:
DES: 56bit key
AES: 128-256bit key (commonly used values are 128, 192 and 256)
RSA (assymetric cryptography): 1024, 2048, 4096 bit key
A number, such as 1234567 is only a 4-byte variable. The key is expected to be a byte array, such as "1234567" (implicitly convertible to one in C) or `{ '1', '2', '3', '4', '5', '6', '7' }.
If you wish to pass the MD5 hash of your salted key to DES, you should use some key compression technique. For instance, you could take the top 7 bytes (somewhat undesirable), or perform DES encryption on the MD5 hash (with a known constant key), and take all but the last byte as the key for DES operation.
edit: The DES I'm talking about here is the implementation per the standard released by NIST. It could so be (as noted above), that your specific API expects different requirements on the length of the key, and derives the final 7-byte key from it.
The key must have size 64-bits but only 56-bits are used from the key. The other 8-bits are parity bits (internal use).
ASCII chars have 8-bit size.
You shouldn't pass you passwords straight into the algorithm. Use for instance the Rfc2898DeriveBytes class that will salt your passwords, too. It will work with any length.
Have a look here for an example.
EDIT: D'Oh - your question is not C# or .Net tagged :/
According to MSDN DES supports a key length of 64 bits.
To avoid this issue and increase the overall security of one's implementation, typically we'll pass some hashed variant of the key to crypto functions, rather than the key itself.
Also, it's good practice to 'salt' the hash with a value which is particular to the operation you are doing and won't change (e.g., internal userid). This assures you that for any two instances of the key, the resulting has will be different.
Once you have your derived key, you can pull off the first n-bites of it as required by your particular crypto function.
DES requires a 64 bit key. 56 bits for data and 8 bits for parity.
Each byte contains a parity bit at the last index. This bit is used to check for errors that may have occurred.
If the sum of the 7 data bits add up to an even number the correct parity bit is 0, for an odd number it's 1.
ASCII chars contain 8 bits, 8 chars can be used as a key if error correction is not necessary. If EC is necessary, use 7 chars and insert parity bits at indices (0 based) 7,15,23,31,39,47,55,63.
sources:
Wikipedia: https://en.m.wikipedia.org/wiki/Data_Encryption_Standard
“The key ostensibly consists of 64 bits; however, only 56 of these are actually used by the algorithm. Eight bits are used solely for checking parity, and are thereafter discarded. Hence the effective key length is 56 bits.”
“The key is nominally stored or transmitted as 8 bytes, each with odd parity. According to ANSI X3.92-1981 (Now, known as ANSI INCITS 92-1981), section 3.5:
One bit in each 8-bit byte of the KEY may be utilized for error detection in key generation, distribution, and storage. Bits 8, 16,..., 64 are for use in ensuring that each byte is of odd parity.”

What's wrong with XOR encryption?

I wrote a short C++ program to do XOR encryption on a file, which I may use for some personal files (if it gets cracked it's no big deal - I'm just protecting against casual viewers). Basically, I take an ASCII password and repeatedly XOR the password with the data in the file.
Now I'm curious, though: if someone wanted to crack this, how would they go about it? Would it take a long time? Does it depend on the length of the password (i.e., what's the big-O)?
The problem with XOR encryption is that for long runs of the same characters, it is very easy to see the password. Such long runs are most commonly spaces in text files. Say your password is 8 chars, and the text file has 16 spaces in some line (for example, in the middle of ASCII-graphics table). If you just XOR that with your password, you'll see that output will have repeating sequences of characters. The attacker would just look for any such, try to guess the character in the original file (space would be the first candidate to try), and derive the length of the password from length of repeating groups.
Binary files can be even worse as they often contain repeating sequences of 0x00 bytes. Obviously, XORing with those is no-op, so your password will be visible in plain text in the output! An example of a very common binary format that has long sequences of nulls is .doc.
I concur with Pavel Minaev's explanation of XOR's weaknesses. For those who are interested, here's a basic overview of the standard algorithm used to break the trivial XOR encryption in a few minutes:
Determine how long the key is. This
is done by XORing the encrypted data
with itself shifted various numbers
of places, and examining how many
bytes are the same.
If the bytes that are equal are
greater than a certain percentage
(6% according to Bruce Schneier's
Applied Cryptography second
edition), then you have shifted the
data by a multiple of the keylength.
By finding the smallest amount of
shifting that results in a large
amount of equal bytes, you find the
keylength.
Shift the cipher text by the
keylength, and XOR against itself.
This removes the key and leaves you
with the plaintext XORed with the
plaintext shifted the length of the
key. There should be enough
plaintext to determine the message
content.
Read more at Encryption Matters, Part 1
XOR encryption can be reasonably* strong if the following conditions are met:
The plain text and the password are about the same length.
The password is not reused for encrypting more than one message.
The password cannot be guessed, IE by dictionary or other mathematical means. In practice this means the bits are randomized.
*Reasonably strong meaning it cannot be broken by trivial, mathematical means, as in GeneQ's post. It is still no stronger than your password.
In addition to the points already mentioned, XOR encryption is completely vulnerable to known-plaintext attacks:
cryptotext = plaintext XOR key
key = cryptotext XOR plaintext = plaintext XOR key XOR plaintext
where XORring the plaintexts cancel each other out, leaving just the key.
Not being vulnerable to known-plaintext attacks is a required but not sufficient property for any "secure" encryption method where the same key is used for more than one plaintext block (i.e. a one-time pad is still secure).
Ways to make XOR work:
Use multiple keys with each key length equal to a prime number but never the same length for keys.
Use the original filename as another key but remember to create a mechanism for retrieving the filename. Then create a new filename with an extension that will let you know it is an encrypted file.
The reason for using multiple keys of prime-number length is that they cause the resulting XOR key to be Key A TIMES Key B in length before it repeats.
Compress any repeating patterns out of the file before it is encrypted.
Generate a random number and XOR this number every X Offset (Remember, this number must also be recreatable. You could use a RANDOM SEED of the Filelength.
After doing all this, if you use 5 keys of length 31 and greater, you would end up with a key length of approximately One Hundred Meg!
For keys, Filename being one (including the full path), STR(Filesize) + STR(Filedate) + STR(Date) + STR(Time), Random Generation Key, Your Full Name, A private key created one time.
A database to store the keys used for each file encrypted but keep the DAT file on a USB memory stick and NOT on the computer.
This should prevent the repeating pattern on files like Pictures and Music but movies, being four gigs in length or more, may still be vulnerable so may need a sixth key.
I personally have the dat file encrypted itself on the memory stick (Dat file for use with Microsoft Access). I used a 3-Key method to encrypt it cause it will never be THAT large, being a directory of the files with the associated keys.
The reason for multiple keys rather than randomly generating one very large key is that primes times primes get large quick and I have some control over the creation of the key and you KNOW that there really is no such thing as a truely random number. If I created one large random number, someone else can generate that same number.
Method to use the keys: Encrypt the file with one key, then the next, then the next till all keys are used. Each key is used over and over again till the entire file is encrypted with that key.
Because the keys are of different length, the overlap of the repeat is different for each key and so creates a derived key the length of Key one time Key two. This logic repeats for the rest of the keys. The reason for Prime numbers is that the repeating would occur on a division of the key length so you want the division to be 1 or the length of the key, hense, prime.
OK, granted, this is more than a simple XOR on the file but the concept is the same.
Lance
I'm just protecting against casual viewers
As long as this assumption holds, your encryption scheme is ok. People who think that Internet Explorer is "teh internets" are not capable of breaking it.
If not, just use some crypto library. There are already many good algorithms like Blowfish or AES for symmetric crypto.
The target of a good encryption is to make it mathematically difficult to decrypt without the key.
This includes the desire to protect the key itself.
The XOR technique is basically a very simple cipher easily broken as described here.
It is important to note that XOR is used within cryptographic algorithms.
These algorithms work on the introduction of mathematical difficulty around it.
Norton's Anti-virus used to use a technique of using the previous unencrypted letter as the key for next letter. That took me an extra half-hour to figure out, if I recall correctly.
If you just want to stop the casual viewer, it's good enough; I've used to hide strings within executables. It won't stand up 10 minutes to anyone who actually tries, however.
That all said, these days there are much better encryption methods readily available, so why not avail yourself of something better. If you are trying to just hide from the "casual" user, even something like gzip would do that job better.
Another trick is to generate a md5() hash for your password. You can make it even more unique by using the length of the protected text as an offset or combining it with your password to provide better distribution for short phrases. And for long phrases, evolve your md5() hash by combining each 16-byte block with the previous hash -- making the entire XOR key "random" and non-repetitive.
RC4 is essentially XOR encryption! As are many stream ciphers - the key is the key (no pun intended!) you must NEVER reuse the key. EVER!
I'm a little late in answering, but since no one has mentioned it yet: this is called a Vigenère cipher.
Wikipedia gives a number of cryptanalysis attacks to break it; even simpler, though, since most file-formats have a fixed header, would be to XOR the plaintext-header with the encrypted-header, giving you the key.
That ">6%" GeneQ mentions is the index of coincidence for English telegraph text - 26 letters, with punctuation and numerals spelled out. The actual value for long texts is 0.0665.
The <4% is the index of coincidence for random text in a 26-character alphabet, which is 1/26, or 0.385.
If you're using a different language or a different alphabet, the specific values will different. If you're using the ASCII character set, Unicode, or binary bytes, the specific values will be very different. But the difference between the IC of plaintext and random text will usually be present. (Compressed binaries may have ICs very close to that of random, and any file encrypted with any modern computer cipher will have an IC that is exactly that of random text.)
Once you've XORed the text against itself, what you have left is equivalent to an autokey cipher. Wikipedia has a good example of breaking such a cipher
http://en.wikipedia.org/wiki/Autokey_cipher
If you want to keep using XOR you could easily hash the password with multiple different salts (a string that you add to a password before hashing) and then combine them to get a larger key.
E.G. use sha3-512 with 64 unique salts, then hash your password with each salt to get a 32768 bit key that you can use to encrypt a 32Kib (Kilibit) (4KiB (kilibyte)) or smaller file. Hashing this many times should be less than a second on a modern CPU.
for something more secure you could try manipulating your key during encryption like AES (Rijndael). AES actually does XOR times and modifies the key each repeat of the key using a switch table. It became an internation standard so its quite secure.

Resources