I'm using encryption on my entire HDD (aes 256) and i'm wondering what length password i would need so that the password also is 256 bits. As we all know the password is usually the weak link with encryption so i think this is good thing to know. The password will be made up of letters (capital and small) numbers and punctuation and be random. Thanks.
If the password is truly random (aka non-memorizable), then with the characters described, you are getting about 6 bits of randomness per 8-bit byte of password. Therefore, you need about (256 / 6) = 43 characters in the password to contain about 256 bits of randomness. If the password is memorable, you need many more characters to attain the 256 bits of randomness. Running English text has less than 4 bits of randomness per byte.
You might do better to take a long pass-phrase and generate a 256-bit hash of that (SHA-256, perhaps). Your pass-phrase might be a miniature essay - maybe 80-128 characters long; more would not hurt.
If you're using only letters and numbers, then you've got a total of 26 × 2 + 10 = 62 possible values per character. That's close to 64, so you have just under 6 bits of entropy per character.
If you want 256 bits, then you need about 43 characters from your character set.
Further reading: http://en.wikipedia.org/wiki/Password_strength
Related
I thought, I know the base64 encoding, but I often see so encoded text: iVBORw0KGgoAAAANSUhEUgAAAXwAAAG4C ... YPQr/w8B0CBr+DAkGQAAAABJRU5ErkJggg==. I got mean, It overs by double =. Why does it push second void byte, if 8 bits enough satisfy empty bits of encoded text?
I found. Length of base64-encoded data should be divisible by 4. The remainder of division length by 4 is filled by =-chars. That's because conflict of bit rate. In modern systems it's used 8-bits bytes, but in base64 is used 6-bits chars. So the lowest common multiple is 24 bits or 3 bytes or 4 base64-chars. A lack is filled by =.
A random string should be incompressible.
pi = "31415..."
pi.size # => 10000
XZ.compress(pi).size # => 4540
A random hex string also gets significantly compressed. A random byte string, however, does not get compressed.
The string of pi only contains the bytes 48 through 57. With a prefix code on the integers, this string can be heavily compressed. Essentially, I'm wasting space by representing my 9 different characters in bytes (or 16, in the case of the hex string). Is this what's going on?
Can someone explain to me what the underlying method is, or point me to some sources?
It's a matter of information density. Compression is about removing redundant information.
In the string "314159", each character occupies 8 bits, and can therefore have any of 28 or 256 distinct values, but only 10 of those values are actually used. Even a painfully naive compression scheme could represent the same information using 4 bits per digit; this is known as Binary Coded Decimal. More sophisticated compression schemes can do better than that (a decimal digit is effectively log210, or about 3.32, bits), but at the expense of storing some extra information that allows for decompression.
In a random hexadecimal string, each 8-bit character has 4 meaningful bits, so compression by nearly 50% should be possible. The longer the string, the closer you can get to 50%. If you know in advance that the string contains only hexadecimal digits, you can compress it by exactly 50%, but of course that loses the ability to compress anything else.
In a random byte string, there is no opportunity for compression; you need the entire 8 bits per character to represent each value. If it's truly random, attempting to compress it will probably expand it slightly, since some additional information is needed to indicate that the output is compressed data.
Explaining the details of how compression works is beyond both the scope of this answer and my expertise.
In addition to Keith Thompson's excellent answer, there's another point that's relevant to LZMA (which is the compression algorithm that the XZ format uses). The number pi does not consist of a single repeating string of digits, but neither is it completely random. It does contain substrings of digits which are repeated within the larger sequence. LZMA can detect these and store only a single copy of the repeated substring, reducing the size of the compressed data.
How do we shrink/encode a 20 letter string to 6 letters. I found few algorithms address data compression like RLE, Arithmetic coding, Universal code but none of them guarantees 6 letters.
The original string can contain the characters A-Z (upper case), 0-9 ans a dash.
If your goal is to losslessly compress or hash an random input string of 20 characters (each character could be [A-Z], [0-9] or -) to an output string of 6 characters. It's theoretically impossible.
In information theory, given a discrete random variable X={x|x1,...,xn}, the Shannon entropy H(X) is defined as:
where p(xi) is the probablity of X = xi. In your case, X has 20 of 37 possible characters, so it could be {x|x1,...,xn} where n = 37^20. Supposing the 37 characters have the same probability of being (aka the input string is random), then p(xi) = 1/37^20. So the Shannon entropy of the input is:
. A char in common computer can hold 8 bit, so that 6 chars can hold 48 bit. There's no way to hold 104 bit information by 6 chars. You need at least 15 chars to hold it instead.
If you do allow the loss and have to hash the 20 chars into 6 chars, then your are trying to hash 37^20 values to 128^6 keys. It could be done, but you would got plenty of hash collisions.
In your case, supposing you hash them with the most uniformity (otherwise it would be worse), for each input value, there would be by average of 5.26 other input values sharing the same hash key with it. By a birthday attack, we could expect to find a collision within approximately 200 million trials. It could be done in less than 10 seconds by a common laptop. So I don't think this would be a safe hashing.
However if you insist to do that, you might want to read Hash function algorithms. It lists a lot of algorithms for your choice. Good luck!
Suppose I have a function that can generate random numbers of different lengths. So I can generate a session ID which is 32 bit long and a 512 bit long also.
But I want to know what would be the security impact if I combine 16 32 bit randomly generated ids to create a 512 bit one. Will the security be equal to generating the 512 bit id directly form the algorithm or concatenating is same ?
I am using either mt_crypt or openssl random function in PHP.
Assuming that the random generater you are using is not broken in any way, it does not matter if you generate the 512 bits of random data in one call, or by combine 16 32 bit randomly generated id's.
The outcome would be 512 bits of random data in both cases, with comparable security.
I am trying to learn the basics of compression using only ASCII.
If I am sending an email of strings of lower-case letters. If the file has n
characters each stored as an 8-bit extended ASCII code, then we need 8n bits.
But according to Guiding principle of compression: we discard the unimportant information.
So using that we don't need all ASCII codes to code strings of lowercase letters: they use only 26 characters. We can make our own code with only 5-bit codewords (25 = 32 > 26), code the file using this coding scheme and then decode the email once received.
The size has decreased by 8n - 5n = 3n, i.e. a 37.5% reduction.
But what IF the email was formed with lower-case letters (26), upper-case letters, and extra m characters and they have to be stored efficiently?
If you have n symbols of equal probability, then it is possible to code each symbol using log2(n) bits. This is true even if log2(n) is fractional, using arithmetic or range coding. If you limit it to Huffman (fixed number of bits per symbol) coding, you can get close to log2(n), with still a fractional number of bits per symbol on average.
For example, you can encode ten symbols (e.g. decimal digits) in very close to 3.322 bits per symbol with arithmetic coding. With Huffman coding, you can code six of the symbols with three bits and four of the symbols with four bits, for an average of 3.4 bits per symbol.
The use of shift-up and shift-down operations can be beneficial since in English text you expect to have strings of lower case characters with occasional upper case characters. Now you are getting into both higher order models and unequal frequency distributions.