AES256 CBC + HMAC SHA256 ensuring confidentiality *and* authentication?

AES256 CBC + HMAC SHA256 ensuring confidentiality *and* authentication? - security

I'm thinking of using AES256 CBC + HMAC SHA-256 as a building block for messages that ensures both confidentiality and authentication.
In particular, consider this scenario:
Alice is possession a public key belonging to Bob (the key exchange and algorithm is outside the scope of this question). Alice has an identifying key K, also shared with Bob, that she can use to identify herself with. Only Alice and Bob knows the key K.
Alice encrypts (nonce || K) using Bob's public key.
Bob decrypts the packet and has now has K and nonce.
Bob uses SHA-256 with SHA256(K || nonce) to yield a K(e) of 256 bits.
Bob uses SHA-256 with SHA256(K || nonce + 1) to yield a K(s) of 256 bits.
Now for every packet Bob wishes to send Alice he performs the following:
Create a new random 128 bit IV
Encrypts the message using the IV and K(e) as the key.
Creates a SHA-256 HMAC with K(s) as key and (IV || Encrypted message) as data.
Finally sends (IV || HMAC || Ciphertext) to Alice
Alice has also calculated K(e) and K(s), and follows the following procedure when receiving data from Bob:
Split the message into IV, ciphertext and HMAC.
Calculate the HMAC using K(s), IV and ciphertext.
Compare HMAC with the HMAC sent. If this matches, Alice considers this message authenticated as a message sent by Bob, otherwise it is discarded.
Alice decrypts the message using K(e)
Does this protocol ensure that Alice only decrypts messages from Bob, assuming that no one other than Bob can read the encrypted message that Alice sends him encrypted using his public key?
I.e. does messages constructed in this manner ensure both confidentiality and authentication?
Note: If the protocol requires Bob to send multiple messages, this scheme needs a slight modification to avoid replay attacks.
P.S. I am aware of AES-GCM/CCM, but this scheme would work with the basic AES, SHA and HMAC algorithms that are found in most crypto packages. This solution might also be slower, but that too is out of the scope for the question.

Basically you are recreating SSL/TLS. This implies the usual caveats about building your own protocol, and you are warmly encouraged to use TLS with an existing library instead of rewriting your own.
That being said, using AES with CBC for encryption, and HMAC for integrity, is sound. There are combined encryption+integrity modes (that you are aware of), and CBC+HMAC is kind of "old school", but it cannot hurt. You are doing things in the "science-approved" way: encrypt, then MAC the encrypted string (and you do not forget the IV: forgetting the IV is the classical mistake).
Your key derivation may be somewhat weak. It is perfect if SHA-256 behaves like a perfect random oracle, but it is known that SHA-256 does not behave like a random oracle (because of the so-called length-extension attack). It is similar to the reason why HMAC is HMAC, with two nested hash function invocations, instead of simple hashing (once) the concatenation of the MAC key and the data. TLS uses a specific key derivation function (which is called "the PRF" in the TLS specification) which should avoid any trouble. That function is built over SHA-256 (actually, over HMAC/SHA-256) and can be implemented around any typical SHA-256 implementation.
(I am not saying that I know how to attack your key derivation process; only that this is a tricky thing to make properly, and that its security may be assessed only after years of scrutiny from hundreds of cryptographers. Which is why reusing functions and protocols which have already been thoroughly examined is basically a good idea.)
In TLS there are two nonces, called the "client random" and the "server random". In your proposal you only have the "client random". What you lose here, security-wise, is kind of unclear. A cautious strategy would be to include a server random (i.e. another nonce chosen by Bob). The kind of things we want to avoid is when Alice and Bob run the protocol in both directions, and an attacker feeds messages from Alice to Alice herself. Complete analysis of what an attacker could do is complex (it is a whole branch of cryptography); generally speaking, nonces in both directions tend to avoid some issues.
If you send several packets, then you may have some issues about lost packets, duplicated packets ("replay attacks"), and packets arriving out of order. In the context of TLS, this should not "normally" happen because TLS is used over a medium which already ensures (under normal conditions, not counting active attacks) that data is transferred in strict order. Thus, TLS includes a sequence number into the data which goes in the MAC. This would detect any alteration from an attacker, include replay, lost records and record reordering. If possible, you should also use a sequence number.

The answer to the question as stated is no, there is no guarantee that Alice only decrypts messages from Bob, but that's only because you didn't stipulate that only Bob knows K. If Alice and Bob are the only two people who know K, then the crux of the question is whether your key generation protocol is sound. (We can ignore the rest, I believe, because you're just using HMAC-SHA256 and AES256 as they are intended to be used.)
The generation protocol isn't bad, but it can be improved. The accepted way to create keys from shared secrets is to use a "key derivation function". These functions use a hash in a similar way to what you have done here, but they are also purposely slow to inhibit brute force attacks. PBKDF2 seems to be what you want, as it a) can derive 512 bits of key data (or more), and b) can be made up of the primitives you have available; namely, SHA256 and HMAC-SHA256.

If you don't want to use PKI, take a look at TLS-PSK. It would seem to solve the exact problem you are solving yourself. See RFC 4279 (and 5487 for additional ciphersuites).

Related

Why use Authenticated Encryption instead if hashes?

What is the benefit of using Authenticated Encryption schemes like GCM or EAX compared to simpler methods like CRC or hash functions like SHA?
As far as I understand these methods basically add a Message Authentication Code (MAC) to the message so it can be validated. But the same would be possible if a CRC or hash value would be calculated and appended to the plaintext (MIC). This way it would also not be possible to tamper with the message because the hash would probably not match anymore.
The linked Wikipedia article says MICs don't take the key etc. into account but I do not understand why this is a problem.

There's conceptually no difference between an Authenticated Encryption scheme (GCM, CCM, EAX, etc) and providing an HMAC over the encrypted message, the AE algorithms simply constrain and standardize the byte pattern (while tending to require less space/time than a serial operation of encrypt and HMAC).
If you are computing your unkeyed digest over the plaintext before encrypting you do have a tamper-evident algorithm. But computing the digest over the plaintext has two disadvantages over computing it over the ciphertext:
If you send the same thing twice you send the same hash, even if your ciphertext is different (due to a different IV or key)
If the ciphertext has been tampered with in an attempt to confuse the decryption routine you will still process it before discovering the tamper.
Of course, the disadvantage of digesting after is that in your unkeyed approach anyone who tampers with the ciphertext can simply recompute the SHA-2-256 digest of the ciphertext after the tamper. The solution to that is to not do an unkeyed digest, but to do a keyed digest, like HMAC.
The options are:
Encrypt-only: Subject to tampering. Assuming a new IV is used for each message (and ECB isn't used) does not reveal when a message repeats.
Digest-only: Subject to tampering. Message is plain-text.
MAC-only: Not subject to tampering. Message is plain-text.
Digest-then-Encrypt (DtE - digest is itself encrypted): Ciphertext corruption attacks are possible. Tampering with the plaintext is possible, if it is known. Message reuse is not revealed.
Digest-and-Encrypt (D&E/E&D - digest plaintext, send digest as plaintext): Ciphertext corruption attacks are possible. Tampering with the plaintext is possible, if it is known. Message reuse is revealed via the digest not changing.
Encrypt-then-Digest (EtD): This guards against transmission errors, but since any attacker can just recompute the digest this is the same as encrypt-only.
MAC-then-Encrypt (MtE): Same strengths as DtE, but even if the attacker knew the original plaintext and what they had tampered it to they cannot alter the MAC (unless the plaintext is being altered to an already-known message+MAC).
MAC-and-Encrypt (M&E/E&M): Like D&E this reveals message reuse. Like MtE it is still vulnerable to ciphertext corruption, and a very small set of tampering.
Encrypt-then-MAC (EtM): Any attempt to alter the ciphertext is discovered by the MAC failing to validate, this can be done before processing the ciphertext. Message reuse is not revealed, since the MAC was over the ciphertext.
EtM is the safest approach in the general case. One of the things that an AE algorithm solves is that it takes the question of how to combine a MAC and cipher out of the developer's hands and puts it into the hands of a cryptographer.

Encrypting a string

Is there a way to encrypt a string so there is no reversable effect? Like if you run some algorith 100 times, encrypting a message, you can run it 100 times in reverse and get the right one. If there a technology or method that eliminates such possibility?

There are two broad categories you should look into, depending on your needs:
Cryptographic Hash Functions
Cryptographic hash functions produce fixed-width values based on an arbitrarily long input, in such a way that even very minor changes in the input result in significantly different output. As a rule, they are irreversible (though flaws have been found in some algorithms). This is a good choice if you do not need to be able to recover the value of the string yourself. For example, good username/password verification systems store a hash of the password rather than the password itself, and authenticate by comparing that hash to the hash of the password provided by the user. This way, even if the username/password database is compromised, user passwords are not exposed.
Public-Key Cryptography
In public-key cryptography, a sender uses the intended recipient's "public" key to encrypt a message, and the recipient uses their "private" key to decrypt it. The message cannot be decrypted by the same key that encrypted it, so in that sense the algorithm is not strictly "reversible" (splicing hairs, I know). TLS, SSL, and PGP are all based on this technique, to name a few examples. This is probably your best option if you are transmitting data between two known parties.

Does encryption guarantee integrity?

To build a secure system, can we assume that encryption guarantees integrity is true before starting a secure programming?
Both in symmetric and public-key
encryption, is my question
well-proofed ?
If no, what are the
vulnerabilities, can you give an
example?

No. This is easy to see if you consider the one-time pad, a simple (theoretically) perfectly secure system.
If you change any bit of the output, a bit of the clear text will change, and the recipient has no way to detect this.
This is an obvious case, but the same conclusion applies to most encryption systems. They only provide for confidentiality, not integrity.
Thus, you may want to add a digital signature. Interestingly, when using public key cryptography, it is not sufficient to sign then encrypt (SE), or to encrypt then sign (ES). Both of these are vulnerable to replay attacks. You have to either sign-encrypt-sign or encrypt-sign-encrypt to have a generally secure solution. This paper explains why in detail.
If you use SE, the recipient can decrypt the message, then re-encrypt it to a different recipient. This then deceives the new recipient about the sender's intended recipient.
If you use ES, an eavesdropper can remove the signature and add their own. Thus, even though they can't read the message, they can take credit for it, pretending to be the original sender.

In short the answer is no. Message Integrity and Secrecy are different, and require different tools.
Lets take a simple coin flip into consideration, and in this case we are betting on the results. The result is a simple bool and I encrypt it using a stream cipher like RC4 which yields 1 encrypted bit and I email it to you. You don't have the key, and I ask you to email me back the answer.
A few attacks can happen in this scenario.
1)An attacker could modify the bit in transit, if it was a 0 there is a 50% chance it will become a 1 and the contrary is true. This is because RC4 produces a prng stream that is XOR'ed with the plain text produce the cipher text, similar to a one time pad.
2)Another possibility is that I could provide you with a different key to make sure your answer is wrong. This is easy to brute force, I just just keep trying keys until I get the proper bit flip.
A solution is to use a block cipher is CMAC Mode. A CMAC is a message authentication code similar to an hmac but it uses a block cipher instead of a message digest function. The secret key (K) is the same key that you use to encrypt the message. This adds n+1 blocks to the cipher text. In my scenario this prevents both attacks 1 and 2. An attacker cannot flip a simple bit because the plain text is padded, even if the message only takes up 1 bit i must transmit a minimum of 1 block using a block cipher. The additional authentication block prevents me from chaining the key, and it also provides integrity from anyone attempting to modify the cipher text in transit (although this would be very difficult to do in practice, the additional layer of security is useful).
WPA2 uses AES-CMAC for these reasons.

If data integrity is a specific concern to you, you should use a cryptographic hash function, combined with an an encryption algorithm.
But it really does come down to using the correct tool for the job. Some encryption algorithms may provide some level of checksum validation built-in, others may not.

RSA: Encrypting message using multiple keys

Is it possible to get additional security by encrypting a message using 2 or more RSA keys?
EDIT: A few clarifications:
The context I am most interested in doing this for is encrypting a randomly generated symmetric key.
I don't want to limit the question to encrypting twice in a row; the purpose is to avoid the high computational cost of large RSA keys. Using less straightforward tactics such as breaking the message into parts and encrypting them separately should be considered as an option.
It should be assumed that getting only part of the message is acceptable.
If you know of any publications where this is discussed specifically by an expert, or algorithms that use multiple RSA keys, then please contribute.

No.
It is not safe to do thought experiments regarding cryptography. You are advised to keep narrowly to the path trodden by the experts.
And when the experts want to protect something better, they use a bigger key-size (at least 2048 bits is required, smaller certificates are insufficient for any peace of mind) or use elliptic curve certificates in preference to RSA.
Incidentally, you're remember that your message body is typically encrypted with a symmetric cipher and a random key, and that just this random key is encrypted with the public key of the recipient. Double-encrypting this secret key won't make this secret key longer, and won't impact an attacker's ability to brute-force that.
Quantum cryptography - I mention it only as an exciting aside, you need not factor this into your choice - promises interesting things for the keysizes: the RSA keys will be wiped out by Shor's algorithm, but the symmetric keys (Grover's) will be only half-lengthed (128-bits will be equiv to 64-bits, so will be crackable). There is of course debate about whether such quantum machines can be implemented etc etc :)

No.
If Key A is compromised than encrypted with A+B will protect against the compromise, but outside that special case, you get no additional benefit.

Composing ciphers
Say you have an encryption function E(M, K), where M is the plaintext message and K is the key. Say no known vulnerabilities exist in E.
You generate two completely unrelated keys K1 and K2.
It is guaranteed that if you compose them in the form E(E(M, K1), K2), it is impossible to actually lose security this way. If it was possible to lose security from encrypting E(M, K1), be it with K2 or any other key, the is cipher broken, because an attacker could just do E(E(M, K1), KF) where KF is any key the attacker wishes to choose.
For more info see here.
Encrypting every second block with a different key
The implications here are obvious. Assuming you are using properly composed cryptographic primitives with both encryption function:key combinations, if you encrypt every second block with a different key out of the set of two keys, the attacker can only decrypt the blocks he has the key for.

Yes!
But do not use raw encryption. Use RSA encryption schema. Instead of reencrypting the encrypted message with the second key, which might have weakening effet (I don't know), use the shared secret algorithm to split your secret in two. The shared secret algorithm make it possible to split a secret in n pieces and ensures that if an attacker manages to get n-1 pieces he knows nothing of the secret. So don't simply split the secret in two.
You can then have more then 2 RSA keys. Another powerful property of the shared secret algorithm is that it is possible to spread the secret over n pieces and require only m pieces, with m smaller than n, to recover the secret. This makes the secret recovery more robust to loss of pieces.
Look here for more information on shared secret: http://en.wikipedia.org/wiki/Shared_secret

In additional to the answers given, it also simply doesn't work unless you do some patching. Very simply, one of the moduli must be larger than the other. If you perform RSA mod the larger modulus first and mod the smaller last you lose information and cannot guarantee successful decryption. The obvious patch is to always encrypt with the smaller modulus first. Of course, you have to perform decryption in the opposite order. Another simple patch is choose moduli that a very close together in size, so that the probability that you encounter a ciphertext that cannot be uniquely decrypted is vanishingly small.

Is it possible to reverse a SHA-1 hash?

Is it possible to reverse a SHA-1?
I'm thinking about using a SHA-1 to create a simple lightweight system to authenticate a small embedded system that communicates over an unencrypted connection.
Let's say that I create a sha1 like this with input from a "secret key" and spice it with a timestamp so that the SHA-1 will change all the time.
sha1("My Secret Key"+"a timestamp")
Then I include this SHA-1 in the communication and the server, which can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1.
Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
Note1:
I guess you could brute force in some way, but how much work would that actually be?
Note2:
I don't plan to encrypt any data, I just would like to know who sent it.

No, you cannot reverse SHA-1, that is exactly why it is called a Secure Hash Algorithm.
What you should definitely be doing though, is include the message that is being transmitted into the hash calculation. Otherwise a man-in-the-middle could intercept the message, and use the signature (which only contains the sender's key and the timestamp) to attach it to a fake message (where it would still be valid).
And you should probably be using SHA-256 for new systems now.
sha("My Secret Key"+"a timestamp" + the whole message to be signed)
You also need to additionally transmit the timestamp in the clear, because otherwise you have no way to verify the digest (other than trying a lot of plausible timestamps).
If a brute force attack is feasible depends on the length of your secret key.
The security of your whole system would rely on this shared secret (because both sender and receiver need to know, but no one else). An attacker would try to go after the key (either but brute-force guessing or by trying to get it from your device) rather than trying to break SHA-1.

SHA-1 is a hash function that was designed to make it impractically difficult to reverse the operation. Such hash functions are often called one-way functions or cryptographic hash functions for this reason.
However, SHA-1's collision resistance was theoretically broken in 2005. This allows finding two different input that has the same hash value faster than the generic birthday attack that has 280 cost with 50% probability. In 2017, the collision attack become practicable as known as shattered.
As of 2015, NIST dropped SHA-1 for signatures. You should consider using something stronger like SHA-256 for new applications.
Jon Callas on SHA-1:
It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off.

The question is actually how to authenticate over an insecure session.
The standard why to do this is to use a message digest, e.g. HMAC.
You send the message plaintext as well as an accompanying hash of that message where your secret has been mixed in.
So instead of your:
sha1("My Secret Key"+"a timestamp")
You have:
msg,hmac("My Secret Key",sha(msg+msg_sequence_id))
The message sequence id is a simple counter to keep track by both parties to the number of messages they have exchanged in this 'session' - this prevents an attacker from simply replaying previous-seen messages.
This the industry standard and secure way of authenticating messages, whether they are encrypted or not.
(this is why you can't brute the hash:)
A hash is a one-way function, meaning that many inputs all produce the same output.
As you know the secret, and you can make a sensible guess as to the range of the timestamp, then you could iterate over all those timestamps, compute the hash and compare it.
Of course two or more timestamps within the range you examine might 'collide' i.e. although the timestamps are different, they generate the same hash.
So there is, fundamentally, no way to reverse the hash with any certainty.

In mathematical terms, only bijective functions have an inverse function. But hash functions are not injective as there are multiple input values that result in the same output value (collision).
So, no, hash functions can not be reversed. But you can look for such collisions.
Edit
As you want to authenticate the communication between your systems, I would suggest to use HMAC. This construct to calculate message authenticate codes can use different hash functions. You can use SHA-1, SHA-256 or whatever hash function you want.
And to authenticate the response to a specific request, I would send a nonce along with the request that needs to be used as salt to authenticate the response.

It is not entirely true that you cannot reverse SHA-1 encrypted string.
You cannot directly reverse one, but it can be done with rainbow tables.
Wikipedia:
A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a plaintext password up to a certain length consisting of a limited set of characters.
Essentially, SHA-1 is only as safe as the strength of the password used. If users have long passwords with obscure combinations of characters, it is very unlikely that existing rainbow tables will have a key for the encrypted string.
You can test your encrypted SHA-1 strings here:
http://sha1.gromweb.com/
There are other rainbow tables on the internet that you can use so Google reverse SHA1.

Note that the best attacks against MD5 and SHA-1 have been about finding any two arbitrary messages m1 and m2 where h(m1) = h(m2) or finding m2 such that h(m1) = h(m2) and m1 != m2. Finding m1, given h(m1) is still computationally infeasible.
Also, you are using a MAC (message authentication code), so an attacker can't forget a message without knowing secret with one caveat - the general MAC construction that you used is susceptible to length extension attack - an attacker can in some circumstances forge a message m2|m3, h(secret, m2|m3) given m2, h(secret, m2). This is not an issue with just timestamp but it is an issue when you compute MAC over messages of arbitrary length. You could append the secret to timestamp instead of pre-pending but in general you are better off using HMAC with SHA1 digest (HMAC is just construction and can use MD5 or SHA as digest algorithms).
Finally, you are signing just the timestamp and the not the full request. An active attacker can easily attack the system especially if you have no replay protection (although even with replay protection, this flaw exists). For example, I can capture timestamp, HMAC(timestamp with secret) from one message and then use it in my own message and the server will accept it.
Best to send message, HMAC(message) with sufficiently long secret. The server can be assured of the integrity of the message and authenticity of the client.
You can depending on your threat scenario either add replay protection or note that it is not necessary since a message when replayed in entirety does not cause any problems.

Hashes are dependent on the input, and for the same input will give the same output.
So, in addition to the other answers, please keep the following in mind:
If you start the hash with the password, it is possible to pre-compute rainbow tables, and quickly add plausible timestamp values, which is much harder if you start with the timestamp.
So, rather than use
sha1("My Secret Key"+"a timestamp")
go for
sha1("a timestamp"+"My Secret Key")

I believe the accepted answer is technically right but wrong as it applies to the use case: to create & transmit tamper evident data over public/non-trusted mediums.
Because although it is technically highly-difficult to brute-force or reverse a SHA hash, when you are sending plain text "data & a hash of the data + secret" over the internet, as noted above, it is possible to intelligently get the secret after capturing enough samples of your data. Think about it - your data may be changing, but the secret key remains the same. So every time you send a new data blob out, it's a new sample to run basic cracking algorithms on. With 2 or more samples that contain different data & a hash of the data+secret, you can verify that the secret you determine is correct and not a false positive.
This scenario is similar to how Wifi crackers can crack wifi passwords after they capture enough data packets. After you gather enough data it's trivial to generate the secret key, even though you aren't technically reversing SHA1 or even SHA256. The ONLY way to ensure that your data has not been tampered with, or to verify who you are talking to on the other end, is to encrypt the entire data blob using GPG or the like (public & private keys). Hashing is, by nature, ALWAYS insecure when the data you are hashing is visible.
Practically speaking it really depends on the application and purpose of why you are hashing in the first place. If the level of security required is trivial or say you are inside of a 100% completely trusted network, then perhaps hashing would be a viable option. Hope no one on the network, or any intruder, is interested in your data. Otherwise, as far as I can determine at this time, the only other reliably viable option is key-based encryption. You can either encrypt the entire data blob or just sign it.
Note: This was one of the ways the British were able to crack the Enigma code during WW2, leading to favor the Allies.
Any thoughts on this?

SHA1 was designed to prevent recovery of the original text from the hash. However, SHA1 databases exists, that allow to lookup the common passwords by their SHA hash.

Is it possible to reverse a SHA-1?
SHA-1 was meant to be a collision-resistant hash, whose purpose is to make it hard to find distinct messages that have the same hash. It is also designed to have preimage-resistant, that is it should be hard to find a message having a prescribed hash, and second-preimage-resistant, so that it is hard to find a second message having the same hash as a prescribed message.
SHA-1's collision resistance is broken practically in 2017 by Google's team and NIST already removed the SHA-1 for signature purposes in 2015.
SHA-1 pre-image resistance, on the other hand, still exists. One should be careful about the pre-image resistance, if the input space is short, then finding the pre-image is easy. So, your secret should be at least 128-bit.
SHA-1("My Secret Key"+"a timestamp")
This is the pre-fix secret construction has an attack case known as the length extension attack on the Merkle-Damgard based hash function like SHA-1. Applied to the Flicker. One should not use this with SHA-1 or SHA-2. One can use
HMAC-SHA-256 (HMAC doesn't require the collision resistance of the hash function therefore SHA-1 and MD5 are still fine for HMAC, however, forgot about them) to achieve a better security system. HMAC has a cost of double call of the hash function. That is a weakness for time demanded systems. A note; HMAC is a beast in cryptography.
KMAC is the pre-fix secret construction from SHA-3, since SHA-3 has resistance to length extension attack, this is secure.
Use BLAKE2 with pre-fix construction and this is also secure since it has also resistance to length extension attacks. BLAKE is a really fast hash function, and now it has a parallel version BLAKE3, too (need some time for security analysis). Wireguard uses BLAKE2 as MAC.
Then I include this SHA-1 in the communication and the server, which can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1. Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
You did not define the size of your secret. If your attacker knows the timestamp, then they try to look for it by searching. If we consider the collective power of the Bitcoin miners, as of 2022, they reach around ~293 double SHA-256 in a year. Therefore, you must adjust your security according to your risk. As of 2022, NIST's minimum security is 112-bit. One should consider the above 128-bit for the secret size.
Note1: I guess you could brute force in some way, but how much work would that actually be?
Given the answer above. As a special case, against the possible implementation of Grover's algorithm ( a Quantum algorithm for finding pre-images), one should use hash functions larger than 256 output size.
Note2: I don't plan to encrypt any data, I just would like to know who sent it.
This is not the way. Your construction can only work if the secret is mutually shared like a DHKE. That is the secret only known to party the sender and you. Instead of managing this, a better way is to use digital signatures to solve this issue. Besides, one will get non-repudiation, too.

Any hashing algorithm is reversible, if applied to strings of max length L. The only matter is the value of L. To assess it exactly, you could run the state of art dehashing utility, hashcat. It is optimized to get best performance of your hardware.
That's why you need long passwords, like 12 characters. Here they say for length 8 the password is dehashed (using brute force) in 24 hours (1 GPU involved). For each extra character multiply it by alphabet length (say 50). So for 9 characters you have 50 days, for 10 you have 6 years, and so on. It's definitely inaccurate, but can give us an idea, what the numbers could be.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string