SHA-1 Digest Reduction - digest

I'm using QR Code barcodes to store UUIDs in my system and I need to check that the barcodes generated are mine and not someone else's. I also need to keep the encoded data short so that the QR Codes remain in the lower version range and remain easy to scan.
My approach is to take the raw UUID value (a 128-bit number) plus a 16-bit checksum, and then Base64 encode that data before converting it to a QR code. So far so good; this works perfectly.
To generate the checksum I take the string version of the UUID, combine it with a long secret string, and take a SHA-1 hash of the result. But this hash is too long, so I XOR all the odd bytes together to produce half the checksum, and likewise with the even bytes to produce the other half.
What worries me is that I have compromised the SHA-1 system needlessly by XORing it down. Would it be better to just take two unmanipulated bytes from somewhere within the result? I accept that a 16-bit checksum won't be as secure as a 160-bit checksum, but that is a price I have to pay for usability with the barcodes. What I really don't want to find is that I've now provided a checksum that is easy to crack as the UUID is transmitted in the clear.
If there is a better way of generating the checksum, that would also be a suitable answer to the question. As always, many thanks for your time in just reading this; double-plus-good thanks if you post an answer.
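For concreteness, a minimal Node.js sketch of the procedure described above (the secret string and function name are illustrative, not the actual code):

const crypto = require('crypto');

const SECRET = 'replace-with-a-long-secret-string'; // assumption: a shared secret only you know

function makePayload(uuidString) {
  // Raw 128-bit UUID value (strip the dashes, parse as hex).
  const uuidBytes = Buffer.from(uuidString.replace(/-/g, ''), 'hex');

  // SHA-1 over the UUID string combined with the secret.
  const digest = crypto.createHash('sha1').update(uuidString + SECRET).digest();

  // Fold the 20-byte digest down to 16 bits: XOR the even-indexed bytes into
  // one half of the checksum and the odd-indexed bytes into the other.
  let odd = 0, even = 0;
  for (let i = 0; i < digest.length; i++) {
    if (i % 2 === 0) even ^= digest[i];
    else odd ^= digest[i];
  }
  const checksum = Buffer.from([even, odd]);

  // 128-bit UUID + 16-bit checksum, Base64 encoded for the QR code.
  return Buffer.concat([uuidBytes, checksum]).toString('base64');
}

console.log(makePayload('123e4567-e89b-12d3-a456-426614174000'));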

There's no reason to do any XORing. Simply taking the first two bytes will be as (in)secure.
To keep the QR code version as small as possible, you might want to convert the 144-bit value to a decimal string and encode that. QR codes have different character sets and encode digits efficiently. Base64 characters can only be stored as 8-bit byte-mode values in a QR code, so you add roughly 30% right there.
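A minimal sketch of that suggestion, assuming Node's built-in crypto and BigInt (the secret and names are illustrative): take the first two digest bytes as the checksum and render the whole 144-bit value as a decimal string so the QR code can use numeric mode.

const crypto = require('crypto');

const SECRET = 'replace-with-a-long-secret-string'; // assumption

function makeNumericPayload(uuidString) {
  const uuidBytes = Buffer.from(uuidString.replace(/-/g, ''), 'hex');
  const digest = crypto.createHash('sha1').update(uuidString + SECRET).digest();

  // The first two bytes of the digest are just as good as any XOR-folded pair.
  const payload = Buffer.concat([uuidBytes, digest.slice(0, 2)]);

  // 144 bits -> decimal string (at most 44 digits), suitable for QR numeric mode.
  return BigInt('0x' + payload.toString('hex')).toString(10);
}

console.log(makeNumericPayload('123e4567-e89b-12d3-a456-426614174000'));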

Related

How to get pieces value from .torrent file

I am trying to build a .torrent file interpreter. The problem is that I can't seem to understand how to go about interpreting the pieces value. I am aware that the pieces key contains a concatenation of the SHA-1 hashes for each piece and that a SHA-1 hash is 20 bytes. A result of this is that the final output should be a multiple of 20 bytes. However, after counting the bytes from the pieces value as a string or in hexadecimal form it still does not satisfy this. How should I interpret the pieces key?
Here we use bencode and bdecode, and the pieces value can be obtained easily. I think you should first read the BEP for more details. You can also look at this and use it as an example.
From looking at a real torrent file, I found that the SHA-1 hashes had to be taken from their hexadecimal string format, but I previously thought that it was wrong because the byte length of the hash was not a multiple of 20. Turns out I forgot to add a leading 0 to hexadecimals that were only 1 character (e.g. a had to be changed to 0a)
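A short sketch of that interpretation, assuming you have already bdecoded the torrent and hold info.pieces as a raw Buffer rather than a hex or UTF-8 string (function names are illustrative): the value is just a concatenation of 20-byte SHA-1 digests, one per piece.

const crypto = require('crypto');

function splitPieceHashes(piecesBuffer) {
  if (piecesBuffer.length % 20 !== 0) {
    throw new Error('pieces length is not a multiple of 20 - was it decoded as text?');
  }
  const hashes = [];
  for (let offset = 0; offset < piecesBuffer.length; offset += 20) {
    hashes.push(piecesBuffer.slice(offset, offset + 20));
  }
  return hashes;
}

// Checking a downloaded piece against its expected 20-byte hash:
function pieceIsValid(pieceData, expectedHash) {
  return crypto.createHash('sha1').update(pieceData).digest().equals(expectedHash);
}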

hashing passwords with pbkdf2 crypto does not work correctly

Password security is not my strong suit. Please help me out.
I use node.js 4.2.3 express 4.13.3. I found some examples to hash and salt passwords with crypto's pbkdf2.
Here is my code.
var crypto = require('crypto'); // node's built-in crypto module

var salt = crypto.randomBytes(10).toString('base64');
console.log("salt > " + salt);
// 'pass' is the plaintext password from the request (not shown here)
crypto.pbkdf2(pass, salt, 10000, 150, 'sha512', function(err, derivedKey) {
    pass = derivedKey.toString('hex');
});
The final derivedKey does not include the salt. What am I missing? Should I join the two strings manually before saving?
Why do some examples use base64 and others hex? To get different string lengths? What is the default, so I can use it?
Why not use base64 for both the salt and the hashed password?
Is the final derivedKey string UTF-8? Or does that only have to do with the database it gets saved in? My database is in UTF-8.
Thanks
Yes, store the salt yourself, separately, unencrypted. Make sure it's randomly generated.
More importantly, you're crippling your PBKDF2 key derivation by asking for 150 bytes (bytes per nodejs.org) of key length - SHA512 is a fantastic choice, but it only provides 64 bytes of native output. To get 10,000 iterations of 150 bytes of output, PBKDF2/RFC2898 is going to execute 30,000 times, while an offline attacker will only need to run 10,000 iterations and match the first 64 bytes (if the first 64 match, then the rest will too); you gave them a 3:1 advantage for free!
Instead, if you're happy with the work factor, you should use 30,000 iterations of 64 bytes of output - you'll spend the same amount of time, no difference, but the attacker now has to do 30,000 iterations too, so you took away their 3:1 advantage!
When you pass the salt to the PBKDF2 function, if you can, just pass in the pure binary. Also, the node.js docs say - reasonably "It is recommended that the salts are random and their lengths are greater than 16 bytes." This means binary 16 bytes, before the base64 or hex or whatever conversion if you want one.
You can save both the salt and the derived key as BINARY of the correct length for the most efficient storage (then you don't have to worry about UTF-x vs. ASCII), or you can convert one or both to Base64 or hexadecimal, and then convert back to binary as required. Base64 vs. hex vs. binary is irrelevant as long as each value is converted back consistently when needed.
I'd also make the number of iterations a stored field, so you can easily increase it in the years to come, and include a field for the "version" of password hashing used, so you can easily change your algorithm in the years to come if need be as well.
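A sketch along those lines (field and function names are illustrative, not a prescribed schema): 64 bytes of SHA-512 output, 30,000 iterations, with the salt, iteration count and a scheme version stored next to the derived key.

const crypto = require('crypto');

const ITERATIONS = 30000;   // work factor; store it so it can be raised later
const KEY_LENGTH = 64;      // native SHA-512 output size in bytes
const VERSION = 1;          // scheme version, in case the algorithm changes

function hashPassword(password, callback) {
  const salt = crypto.randomBytes(16);   // at least 16 random bytes, per the node docs
  crypto.pbkdf2(password, salt, ITERATIONS, KEY_LENGTH, 'sha512', function(err, derivedKey) {
    if (err) return callback(err);
    callback(null, {
      version: VERSION,
      iterations: ITERATIONS,
      salt: salt.toString('hex'),
      hash: derivedKey.toString('hex'),
    });
  });
}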
Encryption works with data, not strings, and this includes the encryption key. PBKDF2 produces a data key, which can be easily converted to a string; this conversion is necessary because many byte values have no corresponding printable character or Unicode code point. Many scripting languages do not handle raw data well, so the data is often converted to Base64 or hexadecimal (hex).
You can use Base64 or hexadecimal for the salt and hashed password; just be consistent across all uses.
The salt and iteration count need to be the same for creating and checking, so you will need to either combine them with the hash or save them separately.
Your code is converting the derived key to hexadecimal; that is fine, and Base64 would also be fine. Again, this is necessary because not every byte value is valid UTF-8.
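And a matching sketch for checking a password later, assuming a stored record shaped like the one in the sketch above: recompute with the stored salt and iteration count, using the same encodings, and compare in constant time.

const crypto = require('crypto');

function verifyPassword(password, stored, callback) {
  const salt = Buffer.from(stored.salt, 'hex');
  const expected = Buffer.from(stored.hash, 'hex');
  crypto.pbkdf2(password, salt, stored.iterations, expected.length, 'sha512', function(err, derivedKey) {
    if (err) return callback(err);
    // timingSafeEqual avoids leaking how many leading bytes matched
    callback(null, crypto.timingSafeEqual(derivedKey, expected));
  });
}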

How to represent 32 bytes of binary data in the smallest possible printable way

I have a sha256 hash of some data that is a product ID for a registration system. I want to give this information to the end user, and I wish it to contain only printable characters (preferably a-z, A-Z and 0-9). I tried regular hex and base64, but they both produce very long results that are not satisfactory. I wish to represent the data in as small a format as possible in alphanumeric characters, but without losing integrity. Note that the data does not need to be converted back, so it can be a one-way process as long as no security is lost.
I am working in C.
Thanks in advance for any help on this!
Kind regards,
Philip Bennefall
32 bytes of data is going to be very difficult to meaningfully provide to a user in a medium that doesn't support cut/paste, however you represent it.
Lessen the amount of data you're using for the product ID and you can use Base-64 and friends.
If Base64 isn't adequate for your 32 bytes, MD5 it down to 16 bytes -- shazam, now it's half as long.
Why, yes it is absurd to hash a 32 byte hash down to 16 bytes, but that's basically what you're asking to do, whether it's 16 or any other number of bytes. You WILL lose information.
Or simply use MD5 to begin with, since it's a smaller hash.
If the user isn't going to key this number in, how important is the representation anyway? All of these long hash dumps are inscrutable. When I see them I just look at the last 3 characters anyway.
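One possible sketch of the truncate-and-encode idea (shown in Node.js even though the question is about C, purely to illustrate the arithmetic; names are illustrative): keep only the first N bytes of the SHA-256 and encode them with a purely alphanumeric base-62 alphabet. Truncation loses collision resistance exactly as the answers above describe, so pick N to match how much integrity you actually need.

const crypto = require('crypto');

const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

function base62(buffer) {
  let n = BigInt('0x' + buffer.toString('hex'));
  let out = '';
  while (n > 0n) {
    out = ALPHABET[Number(n % 62n)] + out;
    n /= 62n;
  }
  return out || '0';
}

function shortProductId(data, bytesToKeep) {
  const digest = crypto.createHash('sha256').update(data).digest();
  return base62(digest.slice(0, bytesToKeep));
}

// 16 bytes -> roughly 22 alphanumeric characters instead of 64 hex digits.
console.log(shortProductId('example product data', 16));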

How safely can I assume unicity of a part of SHA1 hash?

I'm currently using a SHA1 to somewhat shorten an url:
Digest::SHA1.hexdigest("salt-" + url)
How safe is it to use only the first 8 characters of the SHA1 as a unique identifier, like GitHub does for commits apparently?
To calculate the probability of a collision with a given length and the number of hashes that you have, see the birthday problem. I don't know the number of hashes that you are going to have, but here are some examples. 8 hexadecimal characters is 32 bits, so for about 100 hashes the probability of a collision is about 1/1,000,000, for 10,000 hashes it's about 1/100, for 100,000 it's 3/4 etc.
See the table in the Birthday attack article on Wikipedia to find a good hash length that would satisfy your needs. For example if you want the collision to be less likely than 1/1,000,000,000 for a set of more than 100,000 hashes then use 64 bits, or 16 hexadecimal digits.
It all depends on how many hashes you are going to have and what probability of a collision you are willing to accept (because there is always some probability, even if insanely small).
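A small calculator for the figures above, using the usual birthday approximation p ≈ 1 - exp(-n(n-1)/2^(bits+1)):

// Approximate probability of at least one collision among n random values
// drawn from a space of 2^bits possibilities.
function collisionProbability(n, bits) {
  return 1 - Math.exp(-(n * (n - 1)) / Math.pow(2, bits + 1));
}

console.log(collisionProbability(100, 32));     // about 1e-6
console.log(collisionProbability(10000, 32));   // about 0.01
console.log(collisionProbability(100000, 32));  // about 0.7
console.log(collisionProbability(100000, 64));  // about 2.7e-10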
If you're talking about a SHA-1 in hexadecimal, then you're only getting 4 bits per character, for a total of 32 bits. By the birthday bound, you can expect collisions once you have on the order of the square root of that space, i.e. around 65,536 entries. If your URL shortener gets used much, it probably won't take terribly long before you start to see collisions.
As for alternatives, the most obvious is probably to just maintain a counter. Since you need to store a table of URLs to translate your shortened URL back to the original, you basically just store each new URL in your table. If it was already present, you give its existing number. Otherwise, you insert it and give it a new number. Either way, you give that number to the user.
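A toy sketch of that counter approach (in-memory only; a real shortener would back this with a database table): the lookup table you already need also hands out the identifiers, so there is nothing to collide.

const urls = [];        // id -> url
const ids = new Map();  // url -> id

function shorten(url) {
  if (ids.has(url)) return ids.get(url);
  const id = urls.push(url) - 1; // next sequential number
  ids.set(url, id);
  return id;
}

function expand(id) {
  return urls[id];
}

console.log(shorten('https://example.com/a'));  // 0
console.log(shorten('https://example.com/b'));  // 1
console.log(shorten('https://example.com/a'));  // 0 again
console.log(expand(1));                         // https://example.com/b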
It depends on what you are trying to accomplish. The output of SHA1 is effectively random with regards to the input (the output of a good hash function changes in half of its bits based on a one-bit change in the input, and SHA1, while not perfect, is pretty good), and by taking a 32-bit (assuming 8 hex digits) subset of the 160-bit output, you reduce the output space from 2^160 to 2^32 values. All things being equal, which they never are, this would significantly reduce the difficulty of finding a collision.
However, if the hash function's input must be a valid URL, that significantly reduces the number of possible inputs. #rsp points out the birthday problem, but given this, I'm not sure exactly how applicable it is at least in its simple form. Also, it largely assumes that there are no other precautions in place.
I would be more interested in why you are doing this. Is this about URLs that the user will need to remember and type? If so, tacking on a bunch of random hexadecimal digits is probably a bad idea. Is it a URL or URL parameter that will just be passed around programmatically? Then, I wouldn't care much about length. Either way, there are probably better ways to do what you are trying to accomplish.
If you use a binary output for SHA1 and Base64 encode the result, you will get much higher information density per character; you can have the same 8-character names, but rather than only 16^8 (2^32) possibilities, you'll have 64^8 (2^48) possibilities.
Assuming the number of inputs at which the collision probability reaches 50% scales as 1.177*sqrt(N), a Base64-style encoding will tolerate 256 times more inputs than the hex output before reaching a 50% chance of collision.
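A quick illustration of the density difference, assuming Node's crypto (the URL is just an example): eight hex characters carry 32 bits, eight Base64 characters carry 48.

const crypto = require('crypto');

const digest = crypto.createHash('sha1').update('salt-' + 'http://example.com').digest();

console.log(digest.toString('hex').slice(0, 8));     // 8 chars, 32 bits of the digest
console.log(digest.toString('base64').slice(0, 8));  // 8 chars, 48 bits of the digest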

What's wrong with XOR encryption?

I wrote a short C++ program to do XOR encryption on a file, which I may use for some personal files (if it gets cracked it's no big deal - I'm just protecting against casual viewers). Basically, I take an ASCII password and repeatedly XOR the password with the data in the file.
Now I'm curious, though: if someone wanted to crack this, how would they go about it? Would it take a long time? Does it depend on the length of the password (i.e., what's the big-O)?
The problem with XOR encryption is that for long runs of the same characters, it is very easy to see the password. Such long runs are most commonly spaces in text files. Say your password is 8 chars, and the text file has 16 spaces in some line (for example, in the middle of an ASCII-graphics table). If you just XOR that with your password, you'll see that the output will have repeating sequences of characters. The attacker would just look for any such runs, try to guess the character in the original file (space would be the first candidate to try), and derive the length of the password from the length of the repeating groups.
Binary files can be even worse, as they often contain repeating sequences of 0x00 bytes. Obviously, XORing with those is a no-op, so your password will be visible in plain text in the output! An example of a very common binary format that has long sequences of nulls is .doc.
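A small demonstration of that weakness (the password and plaintext are made up): XOR a run of spaces with a repeating 8-character password and the key falls straight out.

const password = Buffer.from('secretpw');          // 8-character key
const plaintext = Buffer.from('                '); // 16 spaces

const ciphertext = Buffer.alloc(plaintext.length);
for (let i = 0; i < plaintext.length; i++) {
  ciphertext[i] = plaintext[i] ^ password[i % password.length];
}
console.log(ciphertext.toString('hex'));           // repeats with the key's period

// Guess that the plaintext was spaces and XOR the guess back in: the key appears.
const recovered = Buffer.alloc(password.length);
for (let i = 0; i < password.length; i++) {
  recovered[i] = ciphertext[i] ^ 0x20;             // 0x20 = ASCII space
}
console.log(recovered.toString());                 // "secretpw"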
I concur with Pavel Minaev's explanation of XOR's weaknesses. For those who are interested, here's a basic overview of the standard algorithm used to break the trivial XOR encryption in a few minutes:
1. Determine how long the key is. This is done by XORing the encrypted data with itself shifted various numbers of places, and examining how many bytes are the same.
2. If the bytes that are equal are greater than a certain percentage (6% according to Bruce Schneier's Applied Cryptography, second edition), then you have shifted the data by a multiple of the keylength. By finding the smallest amount of shifting that results in a large amount of equal bytes, you find the keylength.
3. Shift the cipher text by the keylength, and XOR against itself. This removes the key and leaves you with the plaintext XORed with the plaintext shifted the length of the key. There should be enough plaintext to determine the message content.
Read more at Encryption Matters, Part 1
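A rough sketch of step 1 of that procedure, assuming the ciphertext is available as a Buffer (function name is illustrative): shift the ciphertext against itself and count coinciding bytes; the smallest shift whose rate stands well above the background is (a multiple of) the key length.

function coincidenceRates(ciphertext, maxShift) {
  const rates = [];
  const limit = Math.min(maxShift, ciphertext.length - 1);
  for (let shift = 1; shift <= limit; shift++) {
    let equal = 0;
    for (let i = shift; i < ciphertext.length; i++) {
      if (ciphertext[i] === ciphertext[i - shift]) equal++;
    }
    rates.push({ shift: shift, rate: equal / (ciphertext.length - shift) });
  }
  return rates;
}

// Look for the smallest shift whose rate is several times the background
// level (roughly 1/256, or 0.4%, for uniformly random bytes).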
XOR encryption can be reasonably* strong if the following conditions are met:
The plain text and the password are about the same length.
The password is not reused for encrypting more than one message.
The password cannot be guessed, i.e. by dictionary attacks or other mathematical means. In practice this means the bits are randomized.
*Reasonably strong meaning it cannot be broken by trivial, mathematical means, as in GeneQ's post. It is still no stronger than your password.
In addition to the points already mentioned, XOR encryption is completely vulnerable to known-plaintext attacks:
cryptotext = plaintext XOR key
key = cryptotext XOR plaintext = plaintext XOR key XOR plaintext
where XORing the plaintexts cancels them out, leaving just the key.
Not being vulnerable to known-plaintext attacks is a required (but not sufficient) property for any "secure" encryption method where the same key is used for more than one plaintext block; a one-time pad, which never reuses its key, is unaffected.
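The algebra above in code (the key and plaintext are made up): with a known plaintext block, a single XOR hands the key back.

const key = Buffer.from('secretpw');
const plaintext = Buffer.from('KNOWNHDR');          // e.g. a fixed file header

const cryptotext = Buffer.alloc(plaintext.length);
for (let i = 0; i < plaintext.length; i++) {
  cryptotext[i] = plaintext[i] ^ key[i % key.length];
}

// cryptotext XOR plaintext = key
const recoveredKey = Buffer.alloc(plaintext.length);
for (let i = 0; i < plaintext.length; i++) {
  recoveredKey[i] = cryptotext[i] ^ plaintext[i];
}
console.log(recoveredKey.toString());               // "secretpw"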
Ways to make XOR work:
Use multiple keys with each key length equal to a prime number but never the same length for keys.
Use the original filename as another key but remember to create a mechanism for retrieving the filename. Then create a new filename with an extension that will let you know it is an encrypted file.
The reason for using multiple keys of prime-number length is that they cause the resulting XOR key to be Key A TIMES Key B in length before it repeats.
Compress any repeating patterns out of the file before it is encrypted.
Generate a random number and XOR this number in at every X offset (remember, this number must also be recreatable; you could use a random seed derived from the file length).
After doing all this, if you use 5 keys of length 31 and greater, you would end up with a key length of approximately One Hundred Meg!
For keys, Filename being one (including the full path), STR(Filesize) + STR(Filedate) + STR(Date) + STR(Time), Random Generation Key, Your Full Name, A private key created one time.
A database to store the keys used for each file encrypted but keep the DAT file on a USB memory stick and NOT on the computer.
This should prevent the repeating pattern on files like Pictures and Music but movies, being four gigs in length or more, may still be vulnerable so may need a sixth key.
I personally have the dat file encrypted itself on the memory stick (a dat file for use with Microsoft Access). I used a 3-key method to encrypt it because it will never be THAT large, being a directory of the files with the associated keys.
The reason for multiple keys rather than randomly generating one very large key is that primes times primes get large quickly, I have some control over the creation of the key, and you KNOW that there really is no such thing as a truly random number. If I created one large random number, someone else could generate that same number.
Method to use the keys: Encrypt the file with one key, then the next, then the next till all keys are used. Each key is used over and over again till the entire file is encrypted with that key.
Because the keys are of different lengths, the overlap of the repeat is different for each key and so creates a derived key the length of Key one times Key two. This logic repeats for the rest of the keys. The reason for prime numbers is that any repetition would occur on a divisor of the key length, so you want the only divisors to be 1 or the length of the key itself, hence, prime.
OK, granted, this is more than a simple XOR on the file but the concept is the same.
Lance
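Purely to illustrate the period claim above, not as an endorsement of the scheme: the combined keystream of several repeating keys repeats with the least common multiple of their lengths, which for pairwise coprime (e.g. prime) lengths is simply their product.

function gcd(a, b) { return b === 0 ? a : gcd(b, a % b); }
function lcm(a, b) { return (a / gcd(a, b)) * b; }

const keyLengths = [31, 37, 41, 43, 47];
const period = keyLengths.reduce(lcm, 1);
console.log(period);  // 95041567, roughly the "one hundred meg" mentioned above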
I'm just protecting against casual viewers
As long as this assumption holds, your encryption scheme is ok. People who think that Internet Explorer is "teh internets" are not capable of breaking it.
If not, just use some crypto library. There are already many good algorithms like Blowfish or AES for symmetric crypto.
The goal of good encryption is to make it mathematically difficult to decrypt without the key.
This includes the desire to protect the key itself.
The XOR technique is basically a very simple cipher easily broken as described here.
It is important to note that XOR is used within cryptographic algorithms.
These algorithms work by introducing mathematical difficulty around it.
Norton's Anti-virus used to use a technique of using the previous unencrypted letter as the key for next letter. That took me an extra half-hour to figure out, if I recall correctly.
If you just want to stop the casual viewer, it's good enough; I've used it to hide strings within executables. It won't stand up for 10 minutes against anyone who actually tries, however.
That all said, these days there are much better encryption methods readily available, so why not avail yourself of something better? If you are trying to just hide from the "casual" user, even something like gzip would do that job better.
Another trick is to generate an md5() hash of your password. You can make it even more unique by using the length of the protected text as an offset or combining it with your password to provide better distribution for short phrases. And for long phrases, evolve your md5() hash by combining each 16-byte block with the previous hash -- making the entire XOR key "random" and non-repetitive.
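A hedged sketch of that idea (still homemade crypto, so treat it as obfuscation rather than security; names are illustrative): chain md5() blocks, feeding each previous block back in, so the XOR key material never repeats.

const crypto = require('crypto');

function keystream(password, length) {
  let block = crypto.createHash('md5').update(password).digest();
  const blocks = [block];
  while (blocks.length * 16 < length) {
    // Each new 16-byte block depends on the previous block and the password.
    block = crypto.createHash('md5').update(block).update(password).digest();
    blocks.push(block);
  }
  return Buffer.concat(blocks).slice(0, length);
}

function xorWithKeystream(data, password) {
  const ks = keystream(password, data.length);
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) out[i] = data[i] ^ ks[i];
  return out;  // applying it twice with the same password decrypts
}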
RC4 is essentially XOR encryption! As are many stream ciphers - the key is the key (no pun intended!), and you must NEVER reuse the key. EVER!
I'm a little late in answering, but since no one has mentioned it yet: this is called a Vigenère cipher.
Wikipedia gives a number of cryptanalysis attacks to break it; even simpler, though, since most file-formats have a fixed header, would be to XOR the plaintext-header with the encrypted-header, giving you the key.
That ">6%" GeneQ mentions is the index of coincidence for English telegraph text - 26 letters, with punctuation and numerals spelled out. The actual value for long texts is 0.0665.
The <4% figure is the index of coincidence for random text in a 26-character alphabet, which is 1/26, or about 0.0385.
If you're using a different language or a different alphabet, the specific values will differ. If you're using the ASCII character set, Unicode, or binary bytes, the specific values will be very different. But the difference between the IC of plaintext and random text will usually be present. (Compressed binaries may have ICs very close to that of random, and any file encrypted with any modern computer cipher will have an IC that is exactly that of random text.)
Once you've XORed the text against itself, what you have left is equivalent to an autokey cipher. Wikipedia has a good example of breaking such a cipher
http://en.wikipedia.org/wiki/Autokey_cipher
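For reference, a small sketch of the index-of-coincidence statistic discussed above: the probability that two characters drawn at random from the text are equal.

function indexOfCoincidence(text) {
  const counts = {};
  for (const ch of text) counts[ch] = (counts[ch] || 0) + 1;
  const n = text.length;
  let sum = 0;
  for (const ch in counts) sum += counts[ch] * (counts[ch] - 1);
  return sum / (n * (n - 1));
}

// English prose lands near 0.066; uniformly random letters near 1/26, about 0.038.
console.log(indexOfCoincidence('ATTACKATDAWNATTACKATDUSK'));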
If you want to keep using XOR you could easily hash the password with multiple different salts (a string that you add to a password before hashing) and then combine them to get a larger key.
E.g. use sha3-512 with 64 unique salts, then hash your password with each salt to get a 32768-bit key that you can use to encrypt a 32 Kibit (4 KiB) or smaller file. Hashing this many times should take less than a second on a modern CPU.
For something more secure you could try manipulating your key during encryption the way AES (Rijndael) does: AES XORs the state with a round key many times, deriving a fresh round key for each round from its key schedule and substitution tables. It became an international standard, so it's quite secure.
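A sketch of that multi-salt derivation, with SHA-512 standing in for sha3-512 since older Node builds may not ship SHA-3 (the salts and password here are illustrative):

const crypto = require('crypto');

function deriveLongKey(password, salts) {
  // Hash the password once per salt and concatenate the digests into one long XOR key.
  const parts = salts.map(function(salt) {
    return crypto.createHash('sha512').update(salt).update(password).digest();
  });
  return Buffer.concat(parts);   // 64 salts x 64 bytes = 4096 bytes = 32768 bits
}

const salts = Array.from({ length: 64 }, function(_, i) { return 'salt-' + i; });
const key = deriveLongKey('correct horse battery staple', salts);
console.log(key.length * 8);     // 32768 bits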
