RSA signature size? - security

I would like to know the length of an RSA signature. Is it always the same as the RSA key size, e.g. if the key size is 1024 bits then the signature is 128 bytes, and if the key size is 512 bits then the signature is 64 bytes? What is the RSA modulus?
Also, what does RSA-SHA1 mean?
Any pointers greatly appreciated.

You are right: the RSA signature size depends on the key size; the signature is equal in length to the modulus in bytes. This means that for an n-bit key, the resulting signature is exactly n bits long. The computed signature value is not necessarily n bits itself, but the result is padded to exactly n bits.
Now here is how this works: The RSA algorithm is based on modular exponentiation. For such a calculation the final result is the remainder of the "normal" result divided by the modulus. Modular arithmetic plays a large role in Number Theory. There the definition for congruence (≡) is
m is congruent to n mod k if k divides m - n
Simple example - let n = 2 and k = 7, then
2 ≡ 2 (mod 7) because: 7 divides 2 - 2
9 ≡ 2 (mod 7) because: 7 divides 9 - 2
16 ≡ 2 (mod 7) because: 7 divides 16 - 2
...
7 actually does divide 0; the definition of divisibility is:
An integer a divides an integer b if there is an integer n with the property that b = na
For a = 7 and b = 0, choose n = 0. This shows that every integer divides 0; it also means that congruence extends to negative numbers (we won't go into details here, as it's not important for RSA).
So the gist is that the congruence principle expands our naive understanding of remainders; the modulus is the "number after mod", in our example 7. As there are infinitely many numbers that are congruent given a modulus, we speak of congruence classes and usually pick one representative (the smallest non-negative congruent integer) for our calculations, just as we intuitively do when talking about the "remainder" of a calculation.
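As a quick sanity check, Python's % operator returns exactly this canonical representative (a trivial sketch using the values from the example above):

```python
# Each member of the congruence class of 2 (mod 7) leaves the same
# canonical remainder; % picks the smallest non-negative representative.
for m in (2, 9, 16, 23):
    assert m % 7 == 2
# Negative numbers belong to congruence classes too: -5 ≡ 2 (mod 7),
# because 7 divides -5 - 2 = -7.
assert -5 % 7 == 2
```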
In RSA, signing a message m means exponentiation with the "private exponent" d, the result r is the smallest integer >0 and smaller than the modulus n so that
m^d ≡ r (mod n)
This implies two things
The length of r (in bits) is bounded by the length of n (in bits)
The length of m (in bits) must be at most the length of n (in bits), too
To make the signature exactly n bits long, some form of padding is applied. Cf. PKCS#1 for valid options.
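The signing arithmetic above can be sketched with a toy key (the numbers below are illustrative only and far too small to be secure; real code should use a vetted library with PKCS#1 padding):

```python
# Toy RSA signing sketch with tiny, insecure parameters (illustration only).
p, q = 61, 53
n = p * q                      # modulus: 3233, a 12-bit toy modulus
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+): e*d ≡ 1 (mod phi)
m = 65                         # "message", already reduced below n
r = pow(m, d, n)               # signature: m^d ≡ r (mod n)
assert pow(r, e, n) == m       # verification undoes the signing exponentiation
# Serialized, the signature is padded to the byte length of the modulus:
sig = r.to_bytes((n.bit_length() + 7) // 8, 'big')
assert len(sig) == 2           # always 2 bytes for this 12-bit toy modulus
```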
The second fact implies that messages larger than n would have to be signed by breaking m into several chunks, each at most the size of n. This is not done in practice because it would be far too slow (modular exponentiation is computationally expensive), so we need another way to "compress" our messages to be smaller than n. For this purpose we use cryptographically secure hash functions such as the SHA-1 you mentioned. Applying SHA-1 to a message m of arbitrary length produces a "hash" that is 20 bytes long, smaller than the typical size of an RSA modulus (common sizes are 1024 or 2048 bits, i.e. 128 or 256 bytes), so the signature calculation can be applied to any arbitrary message.
The cryptographic properties of such a hash function ensure (in theory; signature forgery is a huge topic in the research community) that it is not possible to forge a signature other than by brute force.
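A minimal Python sketch of the hash-then-sign idea (standard library only; SHA-1 is shown because the question mentions it, though it is deprecated for new designs):

```python
import hashlib

# However long the input, SHA-1 compresses it to a fixed 20 bytes,
# which then easily fits below any typical RSA modulus.
msg = b"an arbitrarily long message " * 1000
digest = hashlib.sha1(msg).digest()
assert len(digest) == 20            # 160 bits, always
m = int.from_bytes(digest, 'big')   # the value that gets padded and signed
assert m.bit_length() <= 160        # far below a 1024- or 2048-bit modulus
```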

Related

Getting an integer value from a byte-string in Python3

I am implementing an RSA and AES file encryption program. So far, I have RSA and AES implemented. What I wish to understand, however, is: if my AES implementation uses a 16-byte key (obtained by os.urandom(16)), how could I get an integer value from this to encrypt with RSA?
In essence, if I have a byte string like
b',\x84\x9f\xfc\xdd\xa8A\xa7\xcb\x07v\xc9`\xefu\x81'
How could I obtain an integer from this byte string (AES key) which could subsequently be used for encryption using (RSA)?
Flow of encryption
Encrypt file (AES Key) -> Encrypt AES key (using RSA)
TL;DR use from_bytes and to_bytes to implement OS2IP and I2OSP respectively.
For secure encryption, you don't directly turn the AES key into a number. This is because raw RSA is inherently insecure in many many ways (the list is not complete at the time of writing).
First you need to random-pad your key bytes to obtain a byte array that will represent a number close to the modulus. Then you can perform the byte array conversion to a number, and only then should you perform modular exponentiation. Modular exponentiation will also result in a number, and you need to turn that number into a statically sized byte array with the same size as the modulus (in bytes).
All this is standardized in the PKCS#1 RSA standard. In v2.2 there are two schemes specified, known as PKCS#1 v1.5 padding and OAEP padding. The first one is pretty easy to implement, but is more vulnerable to padding oracle attacks. OAEP is also vulnerable, but less so. You will however need to follow the implementation hints to the detail, especially during unpadding.
To circle back to your question, the number conversions are called the octet string to integer primitive (OS2IP) and the integer to octet string primitive (I2OSP). These are, however, not mathematical operations that you need to perform: they just describe how to encode the number as a statically sized, big-endian, unsigned integer.
Say that keysize is the key size (modulus size) in bits and em is the bytes or bytearray representing the padded key, then you'd just perform:
m = int.from_bytes(em, byteorder='big', signed=False)
for OS2IP where m will be the input for modular exponentiation and back using:
k = (keysize + 8 - 1) // 8  # integer (ceiling) division; '/' would yield a float in Python 3
em = m.to_bytes(k, byteorder='big', signed=False)
for I2OSP.
And you will have to perform the same two operations for decryption...
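Putting the two conversions together, a minimal sketch of OS2IP and I2OSP (the keysize of 2048 bits is just an assumed example; note the integer division, since to_bytes rejects a float length):

```python
def os2ip(em: bytes) -> int:
    """Octet string to integer: big-endian, unsigned."""
    return int.from_bytes(em, byteorder='big', signed=False)

def i2osp(m: int, keysize: int) -> bytes:
    """Integer to octet string, padded to the modulus size in bytes."""
    k = (keysize + 8 - 1) // 8   # ceiling division, NOT '/'
    return m.to_bytes(k, byteorder='big', signed=False)

m = os2ip(b'\x2c\x84\x9f')
assert m == 0x2c849f
# The round trip restores the value, left-padded to 2048/8 = 256 bytes:
assert i2osp(m, 2048) == b'\x00' * 253 + b'\x2c\x84\x9f'
```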
To literally interpret the byte-string as an integer (which you should be able to do; Python integers can be arbitrarily large), you could just sum up the byte values, each shifted left by the appropriate number of bits:
bytestr = b',\x84\x9f\xfc\xdd\xa8A\xa7\xcb\x07v\xc9`\xefu\x81'
le_int = sum(v << (8 * i) for i, v in enumerate(bytestr))
# = sum([44, 33792, 10420224, 4227858432, 949187772416, 184717953466368, 18295873486192640, 12033618204333965312, 3744689046963038978048, 33056565380087516495872, 142653246714526242615328768, 62206486974090358813680992256, 7605903601369376408980219232256, 4847495895272749231323393057357824, 607498732448574832538068070518751232, 171470411456254147604591110776164450304])
# = 172082765352850773589848323076011295788
That would be a little-endian interpretation; a big-endian interpretation would just start reading from the other side, which you could do with reversed():
be_int = sum(v << (8 * i) for i, v in enumerate(reversed(bytestr)))
# = sum([129, 29952, 15663104, 1610612736, 863288426496, 129742372077568, 1970324836974592, 14627691589699371008, 3080606260309495119872, 306953821386526938890240, 203099537695257701350637568, 68396187170517260188176613376, 19965496953594613073573075484672, 3224903126980615597407612954476544, 685383185326597246966025515457052672, 58486031814536298407767510652335161344])
# = 59174659937086426622213974601606591873
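For what it's worth, the two manual sums above compute exactly what int.from_bytes does, so the loop and the built-in can be cross-checked against each other:

```python
bytestr = b',\x84\x9f\xfc\xdd\xa8A\xa7\xcb\x07v\xc9`\xefu\x81'
le_int = sum(v << (8 * i) for i, v in enumerate(bytestr))
be_int = sum(v << (8 * i) for i, v in enumerate(reversed(bytestr)))
# int.from_bytes performs the same positional weighting internally.
assert le_int == int.from_bytes(bytestr, byteorder='little')
assert be_int == int.from_bytes(bytestr, byteorder='big')
```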

In PBKDF2 is INT (i) signed?

Page 11 of RFC 2898 states that for U_1 = PRF (P, S || INT (i)), INT (i) is a four-octet encoding of the integer i, most significant octet first.
Does that mean that i is a signed value and if so what happens on overflow?
Nothing says that it would be signed. The fact that dkLen is capped at (2^32 - 1) * hLen suggests that it's an unsigned integer, and that it cannot roll over from 0xFFFFFFFF (2^32 - 1) to 0x00000000.
Of course, PBKDF2(MD5) wouldn't hit 2^31 until you've asked for 34,359,738,368 bytes. That's an awful lot of bytes.
SHA-1: 42,949,672,960
SHA-2-256 / SHA-3-256: 68,719,476,736
SHA-2-384 / SHA-3-384: 103,079,215,104
SHA-2-512 / SHA-3-512: 137,438,953,472
Since the .NET implementation (Rfc2898DeriveBytes) is an iterative stream, it could be polled for 32 GB via a (long) series of calls. Most platforms expose PBKDF2 as a one-shot, so you'd need to hand them an output buffer of 32 GB (or more) to discover whether they have an error that far out. So even if most platforms get the sign bit wrong... it doesn't really matter.
PBKDF2 is a KDF (key derivation function), so used for deriving keys. AES-256 is 32 bytes, or 48 if you use the same PBKDF2 to generate an IV (which you really shouldn't). Generating a private key for the ECC curve with a 34,093 digit prime is (if I did my math right) 14,157 bytes. Well below the 32GB mark.
i ranges from 1 to l = CEIL (dkLen / hLen), and dkLen and hLen are positive integers. Therefore, i is strictly positive.
You can, however, store i in a signed, 32-bit integer type without any special handling. If i rolls over (increments from 0x7FFFFFFF to 0x80000000), it will continue to be encoded correctly, and continue to increment correctly. With two's complement encoding, bitwise results for addition, subtraction, and multiplication are the same as long as all values are treated as either signed or unsigned.
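A short Python sketch of that last point: packing the same 32-bit pattern as signed or unsigned produces identical octets for INT (i), so the sign interpretation never shows up on the wire:

```python
import struct

i_unsigned = 0x80000000             # just past the "sign bit" boundary
i_signed = i_unsigned - (1 << 32)   # the same bit pattern read as signed int32
# '>I' = big-endian unsigned 32-bit, '>i' = big-endian signed 32-bit:
assert struct.pack('>I', i_unsigned) == struct.pack('>i', i_signed)
assert struct.pack('>I', i_unsigned) == b'\x80\x00\x00\x00'
```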

Selecting parameters for string hashing

I was recently reading an article on string hashing. We can hash a string by converting a string into a polynomial.
H(s1 s2 s3 ... sn) = (s1 + s2·p + s3·p^2 + ··· + sn·p^(n−1)) mod M.
What are the constraints on p and M so that the probability of collision decreases?
A good requirement for a hash function on strings is that it should be difficult to find a pair of different strings, preferably of the same length n, that have equal fingerprints. This excludes the choice of M < n. Indeed, in this case at some point the powers of p corresponding to respective symbols of the string start to repeat.
Similarly, if gcd(M, p) > 1 then powers of p modulo M may repeat for exponents smaller than n. The safest choice is to set p as one of the generators of the group U(Z_M) – the group of all integers relatively prime to M under multiplication modulo M.
I am not able to understand the above constraints. How does selecting M < n or gcd(M, p) > 1 increase collisions? Can somebody explain these two with some examples? I just need a basic understanding of them.
In addition, if anyone can focus on upper and lower bounds of M, it will be more than enough.
The above facts have been taken from the following article: string hashing mit.
The "correct" answers to these questions involve some amount of number theory, but it can often be instructive to look at some extreme cases to see why the constraints might be useful.
For example, let's look at why we want M ≥ n. As an extreme case, let's pick M = 2 and n = 4. Then look at the numbers p^0 mod 2, p^1 mod 2, p^2 mod 2, and p^3 mod 2. Because there are four numbers here and only two possible remainders, by the pigeonhole principle we know that at least two of these numbers must be equal. Let's assume, for simplicity, that p^0 and p^1 are the same. This means that the hash function will return the same hash code for any two strings whose first two characters have been swapped, since those characters are multiplied by the same amount, which isn't a desirable property of a hash function. More generally, the reason why we want M ≥ n is so that the values p^0, p^1, ..., p^(n-1) at least have the possibility of being distinct. If M < n, there will just be too many powers of p for them all to be unique.
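A tiny Python sketch of this pigeonhole effect, with M = 2 and a hypothetical p = 3 chosen purely for illustration:

```python
M, p = 2, 3
# With only two possible remainders, all four powers collide:
assert [pow(p, i, M) for i in range(4)] == [1, 1, 1, 1]

def h(s):
    return sum(ord(c) * pow(p, i, M) for i, c in enumerate(s)) % M

# Swapping characters whose power weights coincide cannot change the hash:
assert h("ab") == h("ba")
```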
Now, let's think about why we want gcd(M, p) = 1. As an extreme case, suppose we pick p such that gcd(M, p) = M (that is, we pick p = M). Then
s0·p^0 + s1·p^1 + s2·p^2 + ... + s(n-1)·p^(n-1) (mod M)
= s0·M^0 + s1·M^1 + s2·M^2 + ... + s(n-1)·M^(n-1) (mod M)
= s0
Oops, that's no good - that makes our hash code exactly equal to the first character of the string. This means that if p isn't coprime with M (that is, if gcd(M, p) ≠ 1), you run the risk of certain characters being "modded out" of the hash code, increasing the collision probability.
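The degenerate case is easy to reproduce in Python (p = M = 101 here is an arbitrary illustrative choice):

```python
def poly_hash(s, p, M):
    return sum(ord(c) * pow(p, i, M) for i, c in enumerate(s)) % M

M = 101
# With p = M, every term after the first vanishes mod M...
assert poly_hash("apple", M, M) == ord("a") % M
# ...so strings that agree only in their first character collide:
assert poly_hash("apple", M, M) == poly_hash("azzzz", M, M)
```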
How does selecting M < n and gcd(M, p) > 1 increase collisions?
In your hash function formula, M might reasonably be used to restrict the hash result to a specific bit-width: e.g. M = 2^16 for a 16-bit hash, M = 2^32 for a 32-bit hash, M = 2^64 for a 64-bit hash. Usually, a mod/% operation is not actually needed in an implementation, as using the desired size of unsigned integer for the hash calculation inherently performs that function.
I don't recommend it, but sometimes you do see people describing hash functions that are so exclusively coupled to the size of a specific hash table that they mod the results directly to the table size.
The text you quote from says:
A good requirement for a hash function on strings is that it should be difficult to find a pair of different strings, preferably of the same length n, that have equal fingerprints. This excludes the choice of M < n.
This seems a little silly in three separate regards. Firstly, it implies that hashing a long passage of text requires a massively long hash value, when practically it's the number of distinct passages of text you need to hash that's best considered when selecting M.
More specifically, if you have V distinct values to hash with a good general-purpose hash function, you'll get dramatically fewer collisions of the hash values if your hash function produces at least V^2 distinct hash values. For example, if you are hashing 1000 values (~2^10), you want M to be at least 1 million (i.e. at least 20-bit hash values, which is fine to round up to 32-bit, but ideally don't settle for 16-bit). Read up on the Birthday Problem for related insights.
Secondly, given n is the number of characters, the number of potential inputs is the number of distinct values any single character can take, raised to the power n. The former is likely somewhere from 26 to 256 values, depending on whether the hash supports only letters, or say alphanumeric input, or standard vs. extended ASCII and control characters etc., or even more for Unicode. The implication in "excludes the choice of M < n" that there is any relevant linear relationship between M and n is bogus; if anything, it's as M drops below the number of distinct potential input values that collisions become increasingly likely, but again it's the actual number of distinct inputs that tends to matter much, much more.
Thirdly, "preferably of the same length n" - why's that important? As far as I can see, it's not.
I've nothing to add to templatetypedef's discussion on gcd.

Does ((a^x) ^ 1/x) == a in Zp? (for Jablon's protocol)

I have to implement Jablon's protocol (paper) but I've been sitting on a bug for two hours.
I'm not very good with math so I don't know if it's my fault in writing it or it just isn't possible. If it isn't possible, I don't see how Jablon's protocol can be implemented since it relies on the fact that ((gP ^ x) ^ yi) ^ (1/x) == gP^yi .
Take the following code. It doesn't work.
BigInteger p = new BigInteger("101");
BigInteger a = new BigInteger("83");
BigInteger x = new BigInteger("13");
BigInteger ax = a.modPow(x, p);
BigInteger xinv = x.modInverse(p);
BigInteger axxinv = ax.modPow(xinv, p);
if (a.equals(axxinv))
    System.out.println("Yay!");
else
    System.out.println("How is this possible?");
Your problem is that you're not calculating 1/x correctly. We need (a^x)^(1/x) to be a. Fermat's Little Theorem tells us that a^(p-1) is 1 mod p. Therefore we want to find y such that x·y is 1 mod p-1, not mod p.
So you want BigInteger xinv = x.modInverse(p.subtract(BigInteger.ONE));.
This will not work if x shares a common factor with p-1. (Your case avoids that.) For that, you need additional theory.
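The same round trip can be checked in Python (pow(x, -1, m) computes a modular inverse on Python 3.8+), using the numbers from the snippet above:

```python
p, a, x = 101, 83, 13
ax = pow(a, x, p)               # a^x mod p
x_inv = pow(x, -1, p - 1)       # the inverse mod p-1, NOT mod p
assert pow(ax, x_inv, p) == a   # (a^x)^(1/x) == a in Z_p
```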
If p is a prime, then r is a primitive root if none of r, r^2, r^3, ..., r^(p-2) are congruent to 1 mod p. There is no simple algorithm to produce a primitive root, but they are common so you usually only need to check a few. (For p=101, the first number I tried, 2, turned out to be a primitive root. 83 is also.) Testing them would seem to be hard, but it isn't so bad since it turns out that (omitting a bunch of theory here) only divisors of p-1 need to be checked. For instance for 101 you only need to check the powers 1, 2, 4, 5, 10, 20, 25 and 50.
Now if r is a primitive root, then every number mod p is some power of r. What power? That's called the discrete logarithm problem, and it is not simple. (Its difficulty underpins well-known cryptosystems such as Diffie-Hellman.) You can do it by brute force: trying exponents 1, 2, 3, ... you eventually find that, for instance, 83 is 2^89 (mod 101).
But once we know that every number from 1 to 100 is 2 to some power, we are armed with a way to calculate roots, because raising a number to the power x just multiplies its exponent by x. And 2^100 is 1 (mod 101), so taking an x-th power amounts to multiplying the exponent by x (mod 100).
So suppose that we want y^13 to be 83. Then y is 2^k for some k such that k·13 ≡ 89 (mod 100). If you play around with the Chinese Remainder Theorem you can realize that k = 53 works. Therefore 2^53 (mod 101) = 93 is the 13th root of 83.
That is harder than what we did before. But suppose that we wanted to take, say, the 5th root of 44 mod 101. We can't use the simple procedure because 5 does not have a multiplicative inverse mod 100. However 44 is 2^15. Therefore 2^3 = 8 is a 5th root. But there are 4 others, namely 2^23, 2^43, 2^63 and 2^83.
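A brute-force Python check of those claims (feasible only because the modulus is tiny):

```python
p = 101
# All fifth roots of 44 modulo 101, by exhaustive search:
roots = {y for y in range(1, p) if pow(y, 5, p) == 44}
# They are exactly the five powers named above:
assert roots == {pow(2, k, p) for k in (3, 23, 43, 63, 83)}
assert 8 in roots               # 2^3 = 8, the root found in the text
```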

How do I convert the first 32 bits of the fractional part of a Float into a Word32?

Say I have a Float. How do I get the first 32 bits of the fractional part of this Float? Specifically, I am trying to get this part of the SHA-256 pseudo-code working (from Wikipedia):
# Note 1: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating
# Note 2: All constants in this pseudo code are in big endian
# Initialize variables
# (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):
h[0..7] := 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
I naively tried floor (((sqrt 2) - 1) * 2^32), and then converting the returned integer to a Word32. This does not at all appear to be the right answer. I figured that by multiplying by 2^32, I was effectively left-shifting it 32 places (after the floor). Obviously, not the case. Anyway, long and the short of it is, how do I generate h[0..7]?
The best way to get h[0..7] is to copy the hex constants from the Wikipedia page. That way you know you will have the correct ones.
But if you really want to compute them:
import Text.Printf (printf)

scaledFrac :: Integer -> Integer
scaledFrac x =
  let s = sqrt (fromIntegral x) :: Double
  in  floor ((s - fromInteger (floor s)) * 2^32)

main :: IO ()
main = sequence_ [ printf "%x\n" (scaledFrac i) | i <- [2,3,5,7,11,13,17,19] ]
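An equivalent computation in Python that sidesteps floating point entirely: math.isqrt(p << 64) equals floor(2^32 · sqrt(p)), so its low 32 bits are exactly the first 32 fractional bits of sqrt(p).

```python
import math

def frac32(p):
    # floor(2^32 * sqrt(p)) via exact integer square root; the low 32 bits
    # are the first 32 fractional bits of sqrt(p).
    return math.isqrt(p << 64) & 0xFFFFFFFF

h = [frac32(p) for p in (2, 3, 5, 7, 11, 13, 17, 19)]
assert h == [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
             0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
```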
