Python hashlib SHA256 with starting value - python-3.x

I want to carry out an HMAC* length extension attack for a university task.
I have been given both an HMAC* and the corresponding message, and I want to append another arbitrary message and recalculate the HMAC without having the key.
According to our lecture, this is possible and a very common attack scenario.
My problem is rather implementation based:
To carry out this attack, I need to replace the default SHA256 starting values (h0 to h7) with the existing HMAC* I already have. As I do not have the key, simply feeding in the original data is not possible.
Is there any way except reimplementing SHA256 that would allow me to replace these starting values in python3?
Clarification
I am given a valid HMAC* h.
Furthermore, there is a message m that has been used (together with a secret key k) to generate h (h = SHA256(k || m)).
My task: I need to find a way to derive another HMAC* h' on the basis of m without knowing k. It turned out that the new message is m' = m + pad(k||m) + a with a randomly chosen a.
Further clarification
*: With "HMAC" I do not refer to the standard of RFC 2104. HMAC in general "is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key." (Wikipedia.org/HMAC).
In this case, the HMAC is calculated as h = SHA256(k || m) where k is the secret key, || is the concatenation and m is the message.

First of all, the SHA256 context includes multiple parameters. In this solution, two of these are relevant: the state, which represents the "progress" of the SHA256 algorithm (the final state is actually the SHA256 hash sum), and the overall message length in bits, which is written at the end of the padding.
Furthermore, SHA256 always uses padding, which means that another sequence of bytes p is implicitly appended to the input data before the final hash is calculated; p depends only on the length of the input. So let's say SHA256(x) = mySHA256(x || p(x)), assuming that mySHA256 does not use padding.
When the given HMAC h was generated using h = SHA256(k || m) = mySHA256(k || m || p), where k is the secret key and m is the message, h represented the final state of the SHA256 context. Additionally, we have an implicit padding p that depends on k || m. Here p depends only on len(k) (and on len(m), which we know) and not on k itself, which means that we can calculate p without knowing the key, as long as we know its length.
As my target will only accept my modified message m' (m with a appended) when I deliver a correct HMAC h' = SHA256(k || m'), I need to focus on that point now. Knowing the original HMAC h, I can set the state of the SHA256 context to h. As I know the message m as well, and I know that the overall message length in bits is (len(k) + len(m) + len(p)) * 8, the overall message length depends only on len(k) (not k!), because len(p) depends only on len(k) and len(m). I iterate through a range of values for len(k), such as 1 to 64, and in each iteration step I insert the current value for len(k). So it is possible to set the overall message length (the second parameter of my SHA256 context), too.
When iterating through all key lengths, there will be one value that matches the length of the key that was actually used. In that case, I have a SHA256 context that exactly equals the context of the original calculation. We can now add our arbitrary data a to the hash calculation and create another HMAC h' that depends on the key k without knowing it: h' = SHA256(k || m || p || a).
But now we have to ensure that this HMAC h' is equal to the one the target calculates from our message m'.
Therefore, we append our padding p to the end of the original message m, followed by our arbitrary message a. Finally we have m' = m || p || a.
As the target knows the secret key in order to validate the input data, it can easily calculate SHA256(k || m') = SHA256(k || m || p || a), and oooops! Indeed, that is the same hash sum as our HMAC h', which we calculated without knowing the secret key k.
Result:
We cannot append a fully arbitrary message, only a message that is fully arbitrary after the padding. As the padding consists mostly of null bytes, it can disturb our attack, but that depends on the individual case. In my case, the null bytes were ignored and I just had one artifact from the overall message length displayed before my inserted message.
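As far as I know, hashlib does not offer a way to set the internal state of a SHA256 object, so the attack described above needs a small custom compression function. The following is a minimal sketch under that assumption, not a vetted implementation: sha256_padding, compress, and extend are hypothetical helper names, and the round constants are derived from the cube roots of the first 64 primes instead of being typed out. hashlib is used only to play the role of the target.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def _icbrt(n):
    """Exact floor of the cube root of n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants: first 32 fractional bits of the cube roots of the primes.
K = [_icbrt(p << 96) & MASK for p in _first_primes(64)]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA256 compression step: 8-word state plus one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_padding(total_len):
    """The padding p that SHA256 appends to an input of total_len bytes."""
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def extend(mac_hex, key_len, message, suffix):
    """Return (m', h') for a guessed key length, without knowing the key."""
    glue = sha256_padding(key_len + len(message))           # p
    state = list(struct.unpack(">8I", bytes.fromhex(mac_hex)))
    processed = key_len + len(message) + len(glue)           # bytes already hashed
    tail = suffix + sha256_padding(processed + len(suffix))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return message + glue + suffix, struct.pack(">8I", *state).hex()

# Demo: the "target" knows the key, the attacker only knows its length.
key, msg = b"secret-key", b"user=alice"
mac = hashlib.sha256(key + msg).hexdigest()
forged_msg, forged_mac = extend(mac, len(key), msg, b"&admin=true")
assert forged_mac == hashlib.sha256(key + forged_msg).hexdigest()
```

Since the key length is normally unknown, the attacker would call extend once per candidate length (for example 1 to 64, as above) and submit each result until the target accepts one.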

Related

Unpadded RSA ciphertext multiplication by 2**e breaks deciphering on a small message _sporadically_

Please help me understand why the snippet below sometimes fails to decrypt the message when run multiple times (it succeeds at best 5 out of 6 times).
It generates a 1024-bit RSA key pair, then encrypts "Hello World!!" in the most naive way possible, doubles the ciphertext, decrypts the doubled ciphertext, and finally divides the result to get the original plaintext. At the decryption step it can be clearly seen (by logging doubled_decr) when it goes wildly off.
As the given plaintext is small, it should recover from doubling well. The bigint-mod-arith package, used for modular exponentiation here, is maintained and has some tests (though really big numbers appear only in the performance section), has been used a number of times, and doesn't seem to be the cause.
import {generateKeyPairSync, privateDecrypt, publicEncrypt, constants} from "node:crypto";
import * as modAr from "bigint-mod-arith";
// "Generate a 1024 bit RSA key pair." https://cryptopals.com/sets/6/challenges/46
const keys = generateKeyPairSync("rsa", {modulusLength: 1024});
let jwk_export = keys.privateKey.export({format: "jwk"});
let pt_test = Buffer.from("Hello World!!");
let ct_test = naiveEncrypt(pt_test, keys.publicKey);
let doubled = bigintFromBuf(ct_test)
* modAr.modPow(2, bigintFromParam(jwk_export.e), bigintFromParam(jwk_export.n));
let doubled_decr = naiveDecrypt(Buffer.from(doubled.toString(16), "hex"), keys.privateKey);
console.debug(pt_test, "plaintext buffer");
console.debug(doubled_decr, "homomorphically doubled buffer (after decryption)");
console.debug(
"_Decrypted doubled buffer divided back by 2 and converted to text_:",
Buffer.from((bigintFromBuf(doubled_decr) / 2n).toString(16), "hex").toString()
)
function bigintFromParam(str) {return bigintFromBuf(Buffer.from(str, "base64url"))}
function bigintFromBuf(buf) {return BigInt("0x" + buf.toString("hex"))}
function naiveEncrypt(message, publicKey) {
const keyParameters = publicKey.export({format: "jwk"});
// console.debug(bigintFromParam(keyParameters.e));
// console.debug(bigintFromParam(keyParameters.n));
return Buffer.from(modAr.modPow(
bigintFromBuf(message),
bigintFromParam(keyParameters.e),
bigintFromParam(keyParameters.n)
).toString(16), "hex");
}
function naiveDecrypt(message, privateKey) {
const keyParameters = privateKey.export({format: "jwk"});
// console.debug(bigintFromParam(keyParameters.d));
console.assert(
bigintFromParam(keyParameters.e) == modAr.modInv(
bigintFromParam(keyParameters.d),
(bigintFromParam(keyParameters.q) - 1n) * (bigintFromParam(keyParameters.p) - 1n)
)
);
return Buffer.from(modAr.modPow(
bigintFromBuf(message),
bigintFromParam(keyParameters.d),
bigintFromParam(keyParameters.n)
).toString(16), "hex");
}
There are two problems, one fundamental and one 'just coding':
First, you need to divide the 'doubled' plaintext by 2 in the modular ring Zn, not the plain integers Z. In general, to divide in Zn we modular-multiply by the modular inverse -- a over b = a*modInv(b,n)%n -- but for the particular case of 2 we can simplify to just a/2 when a is even or (a+n)/2 when a is odd.
Second, when you take bigint.toString(16) the result is variable length depending on the value of the bigint. Since RSA cryptograms are for practical purposes uniform random numbers in [2,n-1], with a 1024-bit key most of the time the result is 128 digits, but about 1/16 of the time it is 127 digits, 1/256 of the time it is 126 digits, etc. If the number of digits is odd, doing Buffer.from(hex,'hex') throws away the last digit and produces a value that is very wrong for your purpose.
In standard RSA we conventionally represent all cryptograms and signatures as a byte string of fixed length equal to the length needed to represent n -- for a 1024-bit key always 128 bytes even if that includes leading zeros. For your hacky scheme, it is sufficient if we have an even number of digits -- 126 is okay, but not 127.
I simplified your code some to make it easier for me to test -- in particular I compute the bigint versions of n,e,d once and reuse them -- but only really changed bigintToBuf and halve= per above, to get the following which works for me (in node>=16 so 'crypto' supports jwk export):
const modAr=require('bigint-mod-arith'); // actually I use the IIFE for reasons but that makes no difference
const jwk=require('crypto').generateKeyPairSync('rsa',{modulusLength:1024}).privateKey.export({format:'jwk'});
const n=bigintFromParam(jwk.n), e=bigintFromParam(jwk.e), d=bigintFromParam(jwk.d);
function bigintFromParam(str){ return bigintFromBuf(Buffer.from(str,'base64url')); }
function bigintFromBuf(buf){ return BigInt('0x'+buf.toString('hex')); }
function bigintToBuf(x){ let t=x.toString(16); return Buffer.from(t.length%2?'0'+t:t,'hex');}
let plain=Buffer.from('Hello world!');
let encr=bigintToBuf(modAr.modPow(bigintFromBuf(plain),e,n));
let double=bigintToBuf(bigintFromBuf(encr)*modAr.modPow(2n,e,n))
// should actually take mod n there to get an in-range ciphertext,
// but the modPow(,,n) in the next step = decrypt fixes that for us
let decr=bigintToBuf(modAr.modPow(bigintFromBuf(double),d,n));
let temp=bigintFromBuf(decr), halve=bigintToBuf(temp%2n? (temp+n)/2n: temp/2n);
console.log(halve.toString());
PS: real RSA implementations, including the 'crypto' module, don't use modPow(,d,n) for decryption, they use the "Chinese Remainder" parameters in the private key to do a more efficient computation instead. See wikipedia for a good explanation.
PPS: just for the record, 1024-bit RSA -- even with decent padding -- has been considered marginal for security since 2013 at latest and mostly prohibited, although there is not yet a reported break in the open community. However, that is offtopic for SO, and your exercise clearly isn't about security.
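The two fixes can also be illustrated outside Node. The following Python sketch is only an illustration under stated assumptions: the toy modulus is built from two Mersenne primes (far too small for real use), and halve_mod and to_fixed_bytes are hypothetical helper names. It halves in Zn instead of Z and serializes every ciphertext to a fixed number of bytes, so leading zeros are never lost.

```python
# Toy textbook RSA (no padding), Python 3.8+ for pow(e, -1, phi).
p, q = 2**61 - 1, 2**89 - 1           # two Mersenne primes, illustration only
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent
KEY_BYTES = (n.bit_length() + 7) // 8  # fixed length of every serialized value

def to_fixed_bytes(x):
    """Serialize with leading zeros kept, unlike a bare hex string."""
    return x.to_bytes(KEY_BYTES, "big")

def halve_mod(x, modulus):
    """Divide by 2 in Z_n for an odd modulus."""
    return x // 2 if x % 2 == 0 else (x + modulus) // 2

def double_and_recover(m):
    c = pow(m, e, n)                               # naive encryption
    doubled = (c * pow(2, e, n)) % n               # ciphertext of 2*m mod n
    wire = to_fixed_bytes(doubled)                 # what would be transmitted
    decrypted = pow(int.from_bytes(wire, "big"), d, n)
    return halve_mod(decrypted, n)

small = int.from_bytes(b"Hello World!!", "big")
assert double_and_recover(small) == small
# A plaintext above n/2 wraps around when doubled, so plain integer
# division would fail here while halving in Z_n still works.
assert double_and_recover(n - 3) == n - 3
```

The second assertion covers the case where the doubled plaintext exceeds n, which is where dividing in the plain integers would give the wrong answer.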

What is wrong with "keccak256(abi.encode(_a ^ _b))" considering merkle tree security?

In this article MERKLE PROOFS FOR OFFLINE DATA INTEGRITY there is a paragraph:
Warning: Cryptography is harder than it looks. The initial version of
this article had the hash function hash(a^b). That was a bad idea
because it meant that if you knew the legitimate values of a and b you
could use b' = a^b^a' to prove any desired a' value. With this
function you'd have to calculate b' such that hash(a') ^ hash(b') is
equal to a known value (the next branch on the way to root), which is
a lot harder.
I can see that this code seems to be insecure:
function pairHash(uint _a, uint _b) internal pure returns(uint) {
return uint(keccak256(abi.encode(_a ^ _b)));
}
Could somebody explain why and provide an example(real code appreciated)? Or maybe just a URL to corresponding article (the closest one I've found is this one)
Thanks in advance
Let's say we have two values a and b and we create the pair as hash(a^b). The issue is that, for every a', we can create the same pair hash by passing two different values a' and b' such that b' = a^b^a'.
The proof follows easily from the fact that the xor operation is commutative and associative, x^x = 0, and x^0 = x:
hash(a'^b') = hash(a'^(a^b^a')) = hash(a'^a'^a^b) = hash(a^b)
The most basic example is a tree with a and b as its only leaves, so the root equals hash(a^b). Let's say the leaves are token amounts for an airdrop. We are supposed to claim the tokens by passing, for example, a as the value and [b] as the merkle proof. However, someone can pass any value a' with proof [a^b^a'], as this still verifies against the merkle root.
(PS: the issue from the codearena audit is a different one)
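The forgery can be reproduced in a few lines. This is a sketch in Python, not Solidity, and it makes one loud assumption: SHA-256 from hashlib stands in for keccak256, because keccak256 is not in the standard library (hashlib's sha3_256 uses different padding and is not the same function). The attack only relies on the xor inside the hash, so the choice of hash does not matter. The names pair_hash and verify are hypothetical.

```python
import hashlib

def pair_hash(a, b):
    """Insecure pair hash: hash(a ^ b), with SHA-256 as a stand-in hash."""
    return hashlib.sha256((a ^ b).to_bytes(32, "big")).digest()

def verify(root, value, proof):
    """Verify a proof for a two-leaf tree: the proof is just the sibling."""
    return pair_hash(value, proof[0]) == root

# Honest tree with two leaves (for example airdrop amounts).
a, b = 100, 250
root = pair_hash(a, b)
assert verify(root, a, [b])

# Forgery: choose any a' and derive the matching sibling b' = a ^ b ^ a'.
a_forged = 10**9
b_forged = a ^ b ^ a_forged
assert verify(root, a_forged, [b_forged])
```

A pair hash that feeds both inputs into the hash separately, for example by hashing their concatenation, does not allow this cancellation.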

BigNumber lib for Node.js gives wrong RSA decryption

I'm trying to learn some cryptography, so I decided to take on the challenge given in the YouTube video How the RSA algorithm works, including how to select d, e, n, p, q, and φ (phi) using Node.js.
The challenge is thus:
Encrypt the secret using m^e mod n = c, and then decrypt it using c^d mod n = m, and then the following data is given:
// Secret Message:
const m = 42;
// Prime Numbers:
const p = 61, q = 53;
// Other numbers:
const e = 17, n = 3233, d = 2753;
Sadly I quickly discovered that Math.pow(c, d) yields Infinity, so I went $ npm install big-number --save and did BigNumber(c).pow(d) instead. This yielded an actual number, but the final result after the modulus operation is still wrong. I get the number 24 when it should be 42, and I just can't for the life of me understand why (though knowing myself it's probably due to something really dumb).
Here are the rather naïve functions I made to solve the problem:
function encryptRSA(m, e, n) {
// return Math.pow(m, e) % n; // yields 1278 and not 2557
return BigNumber(m).pow(e).mod(n);
}
...and...
function decryptRSA(c, d, n) {
// return Math.pow(c, d) % n; // yielded 'Infinity' so dl'd BigNumber
return BigNumber(c).pow(d).mod(n);
}
Question is, why does the last function return 24 and not 42? And how can I fix it?
So it turns out the solution was really dumb, as expected. The number is stored backwards in the object BigNumber.number, though when printed as a string it is shown correctly. I still don't know why that is, so an answer to that would still be pretty interesting.
I would also be extremely grateful for an answer to why the regular modulus operation yields the wrong answer, especially in the first function, though I guess it has to do with physical number length vs memory restrictions...
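On the open question about the regular modulus: 42^17 is larger than 2^53, and an IEEE 754 double (which is what Math.pow returns) cannot represent every integer beyond that point, so the low digits that the modulus depends on are already lost before % n runs. Reducing modulo n after every multiplication keeps all intermediate values small and exact. The sketch below shows this in Python with the numbers from the challenge; mod_pow is a hypothetical helper written for illustration, and Python's built-in three-argument pow does the same job.

```python
def mod_pow(base, exponent, modulus):
    """Square-and-multiply: reduce modulo `modulus` after every step."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                  # this bit of the exponent is set
            result = result * base % modulus
        base = base * base % modulus      # square for the next bit
        exponent >>= 1
    return result

m, e, d, n = 42, 17, 2753, 3233

c = mod_pow(m, e, n)
assert c == 2557                  # the ciphertext expected in the question
assert mod_pow(c, d, n) == 42     # decrypts back to the secret
assert c == pow(m, e, n)          # the built-in agrees

# The unreduced power is already beyond exact double precision.
assert m ** e > 2 ** 53
```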

Letter substitutions termination

Given:
A character string S of length l containing only characters from 'a' to 'z'
A set of ordered substitution rules R (in the form X->Y) where X, Y are single letters from 'a' to 'z' (e.g., 'a' -> 'e' could be a valid rule but 'ce'->'abc' would never be a valid rule)
When a rule r in R is applied on S, all letters of S which are equal to the left side of r are replaced by the letter on the right side of r. If the rule r causes any replacement in S, r is called a triggered rule.
Flowchart (Algorithm) :
(1) Alternately apply all rules in R (following the order of rules in R) on S.
(2) While (there exists any 'triggered rule' DURING (1) ) : repeat (1)
(3) Terminate
The question is: Is there any way to determine if with a given string S and set R, the algorithm would terminate or not (running forever)
Example1 : (manually executed)
S = 'abcdef' R = { 'a'->'b' , 'b' -> 'c' }
(the order is implied by the order of appearance from left to right of each rule)
After running the algorithm on S and R:
(1.1): 'abcdef' --> 'bbcdef' --> 'cccdef'
(2.1): repeat (1) because there are 2 replacements during the (1.1)
(1.2): 'cccdef'
(2.2): continue to (3) because there is no replacement during the (1.2)
(3) : terminate the algorithm
=> The algorithm terminates with the given S and R
Example2:
S = 'abcdef' R = { 'a'->'b' , 'b' -> 'a' }
(the order is implied by the order of appearance from left to right of each rule)
After running the algorithm on S and R:
(1.1): 'abcdef' --> 'bbcdef' --> 'abcdef'
(2.1): repeat (1) because there are 2 replacements during the (1.1)
(1.2): 'abcdef' --> 'bbcdef' --> 'abcdef'
(2.2): repeat (1) because there are 2 replacements during the (1.2)
(1.3): ...... that would be alike (1.1) forever....
The step (3) (terminate) is never reached.
=> The algorithm won't terminate with the given S and R.
I worked on this and found no efficient algorithm for the question "does the algorithm halt".
The first idea that came to my mind was to "find a cycle" of letters which are in triggered rules, but the number of rules may be too large for this idea to be ideal.
The second one is to propose a "threshold" for the number of repeats: if the threshold is exceeded, we conclude the algorithm would not terminate.
The "threshold" could be chosen randomly (as long as it is big enough), but this approach is not really compelling.
I am wondering whether there is any upper bound for the "threshold" which ensures that we always get the right answer.
I came up with threshold = 26, where 26 is the number of letters from 'a' to 'z', but I can't prove that it is true (or not).
(I hope that it would be something like the Bellman-Ford algorithm, which detects a negative cycle in a fixed number of steps.)
How about you? Please help me find the answer (this is not homework).
Thank you for reading.
One simple way to think about solving this is to consider a string of length 1 and see if the problem can loop for any given starting letter. Since the string's length is never changing, and applying a rule applies to each character in S independently, it suffices to consider just a string of length 1.
Now, start with a state diagram with 26 states - 1 for each letter of the alphabet. Now, for your state transitions, consider this process:
Apply the transitions from R one at a time, in order, until you reach the end of R. If no rule fires for a particular state (letter) during this pass, the letter never changes, and you know that the algorithm terminates once you reach it. Otherwise, after applying the entire sequence of R, you will end up with a letter (possibly the same one you started from, as with a -> b, b -> a). This will be your next state.
Note that all state transitions are deterministic because we apply the entire sequence of R, not just the individual transitions. If we applied the individual transitions, we might get confused, because we might have a -> b, b->a, a->c. When looking at the individual operations, we might think there are two possible transitions from a (either to b or to c), but really, considering the entire sequence, we see definitively that a transitions to c.
You will be done creating your state diagram after considering the next state of each starting letter. Creating the entire state diagram in this manner requires 26 * |R| operations. If the string S contains any letter that lies on a loop of the state diagram or leads into one (including a letter that maps back to itself through triggered rules), then the algorithm fails to halt; otherwise it will halt.
Alternatively, you can simply run 26 iterations through the entire sequence of R: if a rule is still triggered in the 26th iteration, the algorithm will never halt.
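The state-diagram idea can be sketched as follows. This is an illustrative Python version, not a reference solution: one_pass and terminates are hypothetical names, rules are given as an ordered list of (left, right) pairs, and a rule whose left and right side are equal is assumed never to count as triggered. Each distinct letter of S is followed through full passes of R until it either reaches a letter that triggers nothing or revisits a letter it has already started a pass from.

```python
def one_pass(letter, rules):
    """Apply every rule once, in order, to a single letter.

    Returns the resulting letter and whether any rule was triggered.
    """
    triggered = False
    for left, right in rules:
        if letter == left and left != right:
            letter, triggered = right, True
    return letter, triggered

def terminates(s, rules):
    """True if the repeated-substitution algorithm halts on string s."""
    for start in set(s):              # letters evolve independently
        letter, seen = start, set()
        while True:
            next_letter, triggered = one_pass(letter, rules)
            if not triggered:
                break                 # quiet letter: nothing fires any more
            if letter in seen:
                return False          # same state reached again: a loop
            seen.add(letter)
            letter = next_letter
    return True

# The two examples from the question.
assert terminates("abcdef", [("a", "b"), ("b", "c")])
assert not terminates("abcdef", [("a", "b"), ("b", "a")])
```

Because there are only 26 letters, the inner loop runs at most 26 times per starting letter, which matches the bound of 26 passes mentioned above.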

Why is this commit that sets the RSA public exponent to 1 problematic?

I saw this commit in SaltStack on Hacker News, but I don't understand exactly what it does or why the original version was a cryptography error. (I also don't know a lot about how the specifics of cryptography work, either.)
- gen = RSA.gen_key(keysize, 1, callback=lambda x, y, z: None)
+ gen = RSA.gen_key(keysize, 65537, callback=lambda x, y, z: None)
Can someone elaborate why the choice of "1" was replaced? And why is "65537" better?
You've essentially asked three questions:
What is this code doing?
Why is 1 bad?
Why was it replaced with 65537?
It sounds like you don't have a lot of cryptography background, so I'll try to fill in some of the gaps there as well.
What is this code doing?
To understand why the original value of 1 was a broken choice, you have to understand a little bit about how RSA works.
RSA is a cryptosystem -- a way of performing key generation, encryption, and decryption -- so that you can send messages securely to other people. RSA is a member of a class called public-key cryptosystems, because the key that you use to encrypt messages is public and can be freely known by everyone. The key you use to decrypt messages enciphered with your public key is secret and known only by you, so we call it a private key.
If you imagine padlocks and keys as the analog to public keys and private keys, you can see how this might work with real-world messages:
Bob gives Alice a padlock (his public key) and keeps the key to the lock (his private key).
Now, if Alice wants to send Bob a message, she puts a message inside a box, puts his padlock on the box, and sends him the box.
Only Bob has the key, so only Bob can unlock the padlock and get inside the box.
To actually generate the key, RSA needs three important numbers:
"N", the product of two very large prime numbers p and q
"e", the "public exponent"
"d", the "private exponent"
A big part of the security of RSA comes from the fact that it should be very difficult to figure out what d is, given N and e. The public key in RSA consists of two numbers: <N,e>, while the private key is <N,d>.
In other words, if I know what Bob's padlock looks like, it should be very difficult to reverse-engineer a key that will open Bob's padlock.
Why is 1 bad?
1 is a bad choice because it makes it very easy to reverse-engineer a key that will open Bob's padlock, which is the opposite of what we want.
The problematic section in full looks like this:
def gen_keys(keydir, keyname, keysize, user=None):
# Generate a keypair for use with salt
# ...
gen = RSA.gen_key(keysize, 1, callback=lambda x, y, z: None)
This is a Python fragment which generates a RSA key with e = 1.
The relationship between N, e, and d is given by:
d*e = 1 mod (p-1)(q-1)
But wait: if you pick e = 1, as SaltStack did, then you have a problem:
d = 1 mod (p-1)(q-1)
Now you have the private key! The security is broken, since you can figure out what d is. So you can decrypt everyone's transmissions -- you've made it so that you can trivially get Bob's key given his padlock. Oops.
It actually gets worse than that. In RSA, encryption means that you have a message m to transmit that you want to encrypt with the public key <N,e>. The enciphered message c is computed as:
c = m^e (mod N)
So, if e = 1, then m^e = m, and you have c = m mod N.
But if m < N, then m mod N is m. So you have:
c = m
The enciphered text is the same as the message text, so no encryption is happening at all! Double oops.
Hopefully it's clear why 1 is a bad choice!
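Both problems are easy to see with numbers. The sketch below is a toy illustration only: it uses the tiny textbook primes 61 and 53 (an assumption for readability, nothing like a real key size), and private_exponent and encrypt are hypothetical helper names. It needs Python 3.8 or newer for the modular inverse via pow.

```python
# Toy RSA parameters, far too small for real use.
p, q = 61, 53
N = p * q
phi = (p - 1) * (q - 1)

def private_exponent(e):
    """Solve d*e = 1 mod (p-1)(q-1) for d."""
    return pow(e, -1, phi)

def encrypt(m, e):
    return pow(m, e, N)

m = 42

# With e = 1 the private exponent is 1 as well, and nothing is encrypted.
assert private_exponent(1) == 1
assert encrypt(m, 1) == m

# With a sensible exponent the ciphertext differs and d is not obvious.
e = 17
d = private_exponent(e)
c = encrypt(m, e)
assert c != m
assert pow(c, d, N) == m
```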
Why is 65537 better?
65537 seems like an unusual, arbitrary choice. You may wonder why, for instance, we couldn't just pick e = 3. The lower e is, the faster encryption becomes, since to encrypt anything we have to execute:
c = m^e (mod N)
and m^e can be a very large number when e is large.
It turns out that 65537 is chosen mostly for compatibility with existing hardware and software, and for a few other reasons. This Cryptography StackExchange answer explains it in good detail.
With a suitable random padding scheme, you can pick almost any odd integer greater than 1 without affecting security, so e = 3 would otherwise be a choice that maximizes performance.
