Modified SHA256 for PRNG seeding - sha

Can I do this
1. Copy SHA hash constants to eight 32bit work variables.
2. Expand message.
3. Mix work variables (SHA inner loop).
4. Output work variables to PRNG state.
instead of the normal procedure (single message block)
1. Pad message block.
2. Copy SHA hash constants to hash.
3. Read hash into eight 32bit work variables.
4. Expand message.
5. Mix work variables (SHA inner loop).
6. Add work variables to hash.
7. Output hash to PRNG state.
if I only want good bit mixing of some input entropy for seeding a non-cryptographic PRNG? Security is completely irrelevant. All I need is to generate a good PRNG state from a time stamp combined with some hardware bits (8 byte time stamp, 56 bytes from hardware).

While I'm not sure about the specifics of how you want to simplify things, generally simplifying a cryptographic algorithm for non-cryptographic purposes is just fine provided you document very clearly that your use of a cryptographic primitive does not imply any cryptographic strength in the resulting code.
Normally you might implement the exact specification so that you can verify it against a third-party reference (ensuring that the code is connected as expected and that no data is discarded, etc.), and then reduce the number of rounds so it goes much faster.
A common motivation for doing this is when you have access to hardware acceleration for a cryptographic algorithm.
However, if your input is a fixed-length 64 bits then sha256 is typically more trouble than it's worth.
You haven't specified the size of the state of the PRNG. If it's bigger than 64 bits then you probably just want to seed a simple 64-bit PRNG with your seed and then use that iteratively to fill in the larger PRNG state buffer. If it's exactly 64-bits then something like the mix function of MurmurHash might be sufficient.

Related

How is SHA unique?

I am trying to understand SHA uniqueness in simple terms.
For example let us assume there are only messages with maximum length of 4 bits (binery) in whole world. Number of possible messages with different lengths is
2 for single bit length
2^2 for double bit length
2^3 for 3 bit length
2^4 for 4 bit length
that would be 2+4+8+16 = 30 (31 if we consider empty message 2^0 = 1)
Lets us consider SHA3(for example) with output length of 3bits (binery), so maximum possible number of digest are 8.
How can a digest be unique if we need to map 30 messages to 8, or why is it hard to find digest collision for 2 unique messages
I'm not sure what you mean by "SHA uniqueness". An SHA value (any version) is not unique, it cannot be, because it maps an infinite number of inputs (an input of any length) to a finite number of outputs.
A cryptographic hash function has three important properties (which make it a crypto hash, over a regular hash):
strong collision resistance: it is very difficult (computationally infeasible, ie. "not practically possible") to find two inputs that produce the same output (even if you can choose both)
weak collision resistance: for a given input, it is computationally infeasible to find another input that gives the same hash value (you can choose one input to match the output of a given input)
preimage resistance: for a hash value, it's computationally infeasible to find an input that produces that output (it's "one-way")
The only problem in your example is the size. With such small numbers it doesn't make sense of course. But if the hash value is say 512 bits, it suddenly gets really time consuming and hence practically impossible to brute force.
"SHA3 which has digest length of 3bits"
I think this question is based on one bit misunderstanding. SHA-3 is a family of hashes that has the same output bit size as SHA-2. SHA-2 has bit sizes 224, 256, 384 or 512 for SHA-224, SHA-256, SHA-384 and SHA-512 respectively.
Of course, SHA-2 already took those identifiers, so SHA-3 will have SHA3-224, SHA3-256, SHA3-384 and SHA3-512. There were some proposals to use a different acronym, but those failed.
Still, SHA-3 hashes have near infinite input, so there will be many hashes that map to the same value. However, since it is not possible reverse any SHA-3 algorithm, it should be impossible to find a collision. That is, unless SHA-3 is broken, as it is not provably secure.
Any SHA3 variant will have digests with more than 100 bits. The terminology has probably confused you, because SHA256 has 256 bits, while SHA3 is considered the third generation of SHA algorithms (and does NOT have 3 bits of lenght).
Generally speaking it's not hard to find a hash collision by brute-forcing (alas, it's time-consuming), what is difficult is producing a collision that is also meaningful in its context. For example, assume you have a source file for an important application, that hashes to a digest. If an attacker tried to alter the source file in a way to introduce a vulnerability, while also hashing to the same digest, he'd have to introduce a lot of random gibberish, making the attack obvious.

How random is crypto#randomBytes?

How random is crypto.randomBytes(20).toString('hex')?
Easy as that, all I need to know.
How random is crypto.randomBytes()? Usually, random enough for whatever purpose you need.
crypto.randomBytes() generates cryptographically secure random data:
crypto.randomBytes(size[, callback])
Generates cryptographically strong pseudo-random data. The size argument is a number indicating the number of bytes to generate.
This means that the random data is secure enough to use for encryption purposes. In fact, the function is just a wrapper around OpenSSL's RAND_bytes() function. This part of their documentation states:
RAND_bytes will fetch cryptographically strong random bytes. Cryptographically strong bytes are suitable for high integrity needs, such as long term key generation. If your generator is using a software algorithm, then the bytes will be pseudo-random (but still cryptographically strong).
Unless you have a hardware random number generator, the bytes will be pseudo-random—generated predictably from a seed value. The seed is generated from an OS-specific source (/dev/urandom on Unix-like systems, CryptGenRandom on Windows). As long as your seed is relatively random and not known to an attacker, the data produced will appear totally random.
If you like, you could perform the test described here:
Given any arbitrary sequence of binary digits it is possible to examine it using statistical techniques. There are various suites of statistical tests available such as STS (Statistical Test Suite) available from NIST's RANDOM NUMBER GENERATION page. This suite provides a number of different tests including:
The Frequency (Monobit) Test: Checks whether the proportion of 0s and 1s in a given sequence are approximately as one would expect
The Runs Test: Tests whether the number of runs of consecutive identical digits of varying lengths within a given sequence is as expected
The Longest Run of Ones in a block: Confirms whether the longest single run of ones within a sequence is as would be expected
That would give you a very good indication on how random your generator is on your system. Rest assured, though, that it's likely to be virtually indistinguishable from a truly random source, so it should be sufficiently random for nearly any application.

Entropy for Nonce creation

In OAuth, a Nonce is used to prevent replay-attacks. In addition to the nonce, a timestamp is also used (and can be considered a second nonce, as, when strictly sticking with the specification, there is no timeframe in which requests are consideed valid - servers MAY, not MUST limit the range).
The question that came into my mind when implementing a OAuth-Client is: Do Nonces have to be cryptographcally secure?
Two points are important to me here:
Is it ok to use /dev/urandom instead of /dev/random and risk predictable values if the system is running low on entropy when many nonces are created in little time?
(For those not familiar with random/urandom: This would have an advantage in performance, as /dev/urandom doesn't block calls when little entropy is available at the cost of security, as, of course, values are less random).
As nonces have to be encoded to be sent if they contain non-ASCII-characters, it's the easiest thing to create them only out of those ASCII chars that can be sent as-is ([0-9A-Za-z_-+~] AFAIR). Of course this limits entropy again, so the nonce has t be longer to be equally strong. In your oppinion, what's a reasonable length for nonce that only consist of those characters and is it worth the advantage of not having to encode?
Normally it is hardly ever useful to use /dev/random instead of /dev/urandom. You can make a point of using it to seed other PRNG's if you don't want to have those PRNG's rely on /dev/urandom. For nonce's, you should certainly be better off using /dev/urandom. Or you can use a well seeded, thread local cryptographically secure PRNG implemented in your app or library of course.
If you want to send a nonce (or most binary data) over ASCII then you can use hexadecimals or base 64. For the best readability of the value itself, use hex, for efficiency use base64. Now by default base 64 uses numbers, upper and lowercase letters, plus the characters +, / and = but if you want to use other values you can always URLencode the base64, replace the characters you do not want, or use one of the variants.

How (if at all) does a predictable random number generator get more secure after SHA-1ing its output?

This article states that
Despite the fact that the Mersenne Twister is an extremely good pseudo-random number generator, it is not cryptographically secure by itself for a very simple reason. It is possible to determine all future states of the generator from the state the generator has at any given time, and either 624 32-bit outputs, or 19,937 one-bit outputs are sufficient to provide that state. Using a cryptographically-secure hash function, such as SHA-1, on the output of the Mersenne Twister has been recommended as one way of obtaining a keystream useful in cryptography.
But there are no references on why digesting the output would make it any more secure. And honestly, I don't see why this should be the case. The Mersenne Twister has a period of 2^19937-1, but I think my reasoning would also apply to any periodic PRNG, e.g. Linear Congruential Generators as well. Due to the properties of a secure one-way function h, one could think of h as an injective function (otherwise we could produce collisions), thus simply mapping the values from its domain into its range in a one-to-one manner.
With this thought in mind I would argue that the hashed values will produce exactly the same periodical behaviour as the original Mersenne Twister did. This means if you observe all values of one period and the values start to recur, then you are perfectly able to predict all future values.
I assume this to be related to the same principle that is applied in password-based encryption (PKCS#5) - because the domain of passwords does not provide enough entropy, simply hashing passwords doesn't add any additional entropy - that's why you need to salt passwords before you hash them. I think that exactly the same principle applies here.
One simple example that finally convinced me: Suppose you have a very bad PRNG that will always produce a "random number" of 1. Then even if SHA-1 would be a perfect one-way function, applying SHA-1 to the output will always yield the same value, thus making the output no less predictable than previously.
Still, I'd like to believe there is some truth to that article, so surely I must have overlooked something. Can you help me out? To a large part, I have left out the seed value from my arguments - maybe this is where the magic happens?
The state of the mersenne twister is defined by the previous n outputs, where n is the degree of recurrence (a constant). As such, if you give the attacker n outputs straight from a mersenne twister, they will immediately be able to predict all future values.
Passing the values through SHA-1 makes it more difficult, as now the attacker must try to reverse the RNG. However, for a 32-bit word size, this is unlikely to be a severe impediment to a determined attacker; they can build a rainbow table or use some other standard approach for reversing SHA-1s, and in the event of collisions, filter candidates by whether they produce the RNG stream observed. As such, the mersenne twister should not be used for cryptographically sensitive applications, SHA-1 masking or no. There are a number of standard CSPRNGs that may be used instead.
An attacker is able to predict the output of MT based on relatively few outputs not because it repeats over such a short period (it doesn't), but because the output leaks information about the internal state of the PRNG. Hashing the output obscures that leaked information. As #bdonlan points out, though, if the output size is small (32 bits, for instance), this doesn't help, as the attacker can easily enumerate all valid plaintexts and precalculate their hashes.
Using more than 32 bits of PRNG output as an input to the hash would make this impractical, but a cryptographically secure PRNG is still a much better choice if you need this property.

Is there an encryption technique that could turn an 8-digit number into something 10 or 11 digits or less?

Many of the encryption techniques I've seen can easily encrypt a simple 8 digit number like "12345678" but the result is often something like "8745b34097af8bc9de087e98deb8707aac8797d097f" (made up but you get the idea).
Is there a way to encrypt this 8 digit number but have the resulting encrypted value be the same or at least only a slightly longer number? An ideal target would be to end up with a 10 digit number or less. Is this possible while still maintaining a fairly strong encryption?
Update: I didn't make the output clear enough - I am wanting an 8-digit number to turn into an 8-digit number, not 8 bytes.
A lot here is going to depend on how seriously you mean your "public-key-encryption" tag. Do you actually want public key encryption, or are you just taking that possibility into account?
If you're willing to use symmetric encryption, producing 8 bytes of output from 8 bytes of input is pretty easy: just run 3DES in ECB (Electronic Code Book) mode, and that's what you'll get. The main weakness of ECB is that a given input will always produce the same result, so if your inputs might repeat an attacker will be able to see that repetition, and may be able to notice a pattern of "encrypted value X leads to action Y", even if they can't/don't break the encryption itself at all. If you can live with that, 3DES/ECB is probably your answer.
If you can't live with that, 3DES in CFB mode is probably the next best. This will produce 16 bytes of output from 8 bytes of input (note that it's not normally doubling the input size, but adding 8 bytes to the input size).
3DES is hardly what anybody would call a cutting edge algorithm, but I'd say it still qualifies as "fairly strong encryption". Part of its weakness as an algorithm stems from its relatively small block size, but that also minimizes expansion of the output.
Edit: Sorry, I forgot to the public-key possibility. With most public-key cryptography, the smallest result is roughly equal to the key size. With RSA encryption, that'll typically mean a minimum of something like 1024 bits (and often considerably more than that). To keep the result smaller, I'd probably use Elliptical Curve Cryptography, for which a ~200 bit key is reasonably secure against known attacks. This will still be larger than 3DES/CFB, but not outrageously so.
Well you could look a stream cipher which encrypts bytes 1:1. With N bytes input, there are N bytes encrypted/decrypted output. Such ciphers are usually based on an algorithm that creates a stream of random numbers, with the encryption key/IV acting as seed.
For some stream ciphers, look at the eSTREAM candidates. I don't know of any relevant attacks on HC-128 and HC-256, for example.

Resources