I need some clue to make a keystream based on an input - security

Well, I have a project, but I haven't dived very deep into cryptography. Here is my problem:
I want to generate a keystream based on two inputs: a string and a file size.
For example, if the input string is 'abcd' and the file is 10 kbytes, the generated keystream should be 80 kbytes; if the file is 5 kbytes, then the keystream should be 40 kbytes.
Does anyone know of an algorithm like this so I can learn more about it, or is this impossible / nonexistent?

This answer is very basic cryptography, and may not be secure. I assume this is a learning exercise, not for real use. To learn more, read about stream ciphers, which generate a keystream.
First you need a key. Since the key must be unique for each file, you will need to use a counter. Concatenate your filename, file size and counter into a single string: "abcd-10kbytes-count0000001". Keep track of your counter value and increment it each time you encrypt a file.
You now need to hash your concatenated string. Your system may provide a secure hash such as SHA-256 or SHA-3; if it does, use one of them to produce a 256-bit key. If you have to code the hash yourself, try something much simpler to implement, though insecure, such as an FNV hash with 256-bit output. The hash output is your encryption key. Save it in a secure place; you will need it later for encrypting and decrypting the file. Each file will have a different key.
For the stream cipher itself you can use AES-CTR (AES in counter mode) if it is available on your system. If not, code a much simpler stream cipher, such as RC4, which is obsolete and insecure but very easy to implement.
AES-CTR and RC4 both produce a stream of bytes. To encrypt the file, XOR the file with the keystream, byte by byte. To decrypt it, use the same key to regenerate the keystream and XOR it with the encrypted file, byte by byte.
If you find problems then ask again, showing your code with the problem.
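The steps above (hash a concatenated string into a per-file key, generate an RC4 keystream, XOR) can be sketched in Python. The string format and counter width here are made up for illustration, and RC4 is used only because it is easy to code, not because it is secure:

```python
import hashlib

def derive_key(name: str, size_kb: int, counter: int) -> bytes:
    # Hash "name-Nkbytes-countNNNNNNN" into a 256-bit key, as described above
    material = f"{name}-{size_kb}kbytes-count{counter:07d}".encode()
    return hashlib.sha256(material).digest()

def rc4_keystream(key: bytes, n: int) -> bytes:
    # Key-scheduling algorithm (KSA): permute S based on the key
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): emit n keystream bytes
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

key = derive_key("abcd", 10, 1)
data = b"hello world"
stream = rc4_keystream(key, len(data))
encrypted = bytes(a ^ b for a, b in zip(data, stream))
decrypted = bytes(a ^ b for a, b in zip(encrypted, stream))  # same key, same stream
```

Because XOR is its own inverse, decryption is the exact same operation as encryption with the same keystream.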

Related

How to decrypt SHA-256 encrypted strings without knowing the key?

For a private research project, I wonder if it's possible to decrypt SHA-256 encrypted strings without having the key, given only examples of encrypted and decrypted strings.
As an example, I have 1000 strings as decrypted text and the same 1000 strings encrypted. Can't this information be used to decrypt those strings?
I just want to give notice that I have no clue whatsoever about cryptography, and I am sorry if my question sounds too newbie.
Best regards,
Heini
As an example, I have 1000 Strings as [original] text and I have the 1000 Strings [hash]. Can't this information be used to [identify] those strings?
Sure. Hash each string and write down "this string went to that hash", or write it in the other direction so you can look up a possible original string from a hash value. You've just created a small rainbow table.
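That lookup-in-the-other-direction idea is just a dictionary keyed by hash. A minimal sketch (the sample strings are hypothetical):

```python
import hashlib

def build_lookup(strings):
    # Map each hash back to its original string: a tiny precomputed table
    return {hashlib.sha256(s.encode()).hexdigest(): s for s in strings}

known = ["password", "letmein", "hunter2"]   # hypothetical known plaintexts
table = build_lookup(known)

observed = hashlib.sha256(b"hunter2").hexdigest()
recovered = table.get(observed)   # "hunter2": the hash is "reversed" by lookup
```

Note this only recovers strings that were in the precomputed list; it does not invert the hash itself.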
I wonder if its possible to decrypt SHA256 encrypted strings without having the key
SHA-256 is a digest algorithm, not an encryption algorithm (as Ebbe M. Pedersen pointed out in a comment). Digest algorithms don't have keys, and are designed to not be reversible (and even though no collisions are currently known for SHA-256, they're guaranteed to exist by the pigeon-hole principle... so there's no one right answer).
Protocols/processes/algorithms utilizing digest algorithms will often add a salt when hashing, but that's different from a key. The purpose of the salt is to 'defeat' rainbow tables, since you need a new table for every different salt.
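A minimal sketch of why salts defeat precomputed tables: two users with the same password get different salts, hence different hashes, so one table cannot cover both. (Bare SHA-256 is used only for illustration; real password storage uses a slow KDF such as PBKDF2.)

```python
import hashlib, os

password = b"hunter2"   # hypothetical password shared by two users

# Each user gets their own random salt
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = hashlib.sha256(salt_a + password).hexdigest()
hash_b = hashlib.sha256(salt_b + password).hexdigest()
# hash_a != hash_b (with overwhelming probability), so a single
# precomputed rainbow table matches at most one of them
```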

VBA Excel modify file and keep same checksum

I have seen different ways of creating a checksum for a plain-text file, but I would like to be able to change the file's contents while keeping an already known (preset) checksum, by filling the rest of the file with whatever characters are necessary. I got this idea years ago when I found an app (on an ATARI computer, I think) that could make a disk bootable after changing its ID, provided the checksum of the boot sector was $1234. Is it possible to achieve this in VBA? Thank you.
I would like to be able to change mentioned file contents but in same time to keep already known (set) checksum by filling rest of the file with necessary characters.
You can't do that, at least not with any hashing algorithm worth its salt (crypto pun not intended... I swear!). Well you could, in theory, but then there's no telling how many characters (and how much time and disk space!) you're going to need to add in order to get the hash collision that yields exactly the same hash as the original file.
What you're asking is basically defeating the entire purpose of a checksum.
I don't think that ATARI computer used SHA-1 hashing (160 bits), let alone the SHA-256 or SHA-512 (or 128-bit MD5), or any other algorithm in common use today.
You could implement some of the lower-bitness checksum algorithms, but the smaller the hash, the higher the risk of a hash collision - and the easier it is to get a hash that collides with your checksum value, the more meaningless the checksum is.
By definition, a hashing function isn't reversible, and a salted cryptographic hash will not even produce the same output for two identical inputs unless the same salt is used. I'm not familiar with the checksum algorithm you mention, but if I had to implement one, I would probably go with a high-bitness cryptographic hashing algorithm, in order to reduce the risk of a hash collision down to statistical insignificance.
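To illustrate why the low-bitness case is different: with a simple additive checksum (similar in spirit to the old boot-sector trick), forcing a target value is trivial. A sketch in Python rather than VBA, with a made-up 16-bit additive scheme:

```python
def checksum16(data: bytes) -> int:
    # A low-bitness additive checksum: just the byte sum modulo 2**16
    return sum(data) & 0xFFFF

def pad_to_checksum(data: bytes, target: int) -> bytes:
    # Append filler bytes whose values sum to exactly the missing amount
    deficit = (target - checksum16(data)) & 0xFFFF
    full, rest = divmod(deficit, 0xFF)
    return data + b"\xff" * full + bytes([rest])

patched = pad_to_checksum(b"NEW FILE CONTENTS", 0x1234)
checksum16(patched)  # 0x1234: checksum preserved despite changed contents
```

With a cryptographic hash like SHA-256 there is no such shortcut; you would be searching blindly for a collision.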

How to securely encrypt many similar chunks of data with the same key?

I'm writing an application that will require the following security features: when launching the CLI version, you should pass some key to it. Some undefined number of chunks of data, all of the same size, will be generated and stored remotely. This is sensitive data; I want it to be encrypted and accessible only with the one key that was passed in initially. My question is, which algorithm will suit me? I read about AES, but it says that
When you perform an encryption operation you initialize your Encryptor
with this key, then generate a new, unique Initialization Vector for
each record you’re going to encrypt.
which means I'll have to pass a key and an IV rather than just the key, and this IV should be unique for each generated chunk of data (and there are going to be a lot of those).
If the answer is AES, which encryption mode is it?
You can use any modern symmetric algorithm. The amount of data, and how you handle your IVs, doesn't affect the choice: the same considerations apply no matter which symmetric algorithm you pick.
AES-128 is a good choice, as it isn't limited by law in the US and 128 bits is infeasible to brute-force. If you aren't in the US, you could use AES-256 if you wanted to, but implementations in Java require additional installations (the JCE Unlimited Strength policy files, on older versions).
You say you are going to generate n many chunks of data (or retrieve, whatever).
You could encrypt them all at once in CBC mode, which uses AES as a block cipher, and you'll only end up with one IV. You'll need an HMAC here to protect integrity. This isn't the most modern approach, however.
You should use AES in GCM mode, which operates like a stream cipher. You'll still have one single IV (nonce), but the ciphertext will also be authenticated.
IVs should be generated randomly and prepended to the ciphertext. You can then retrieve the IV when it is time to decrypt. Remember: IVs aren't secret, they just need to be random!
EDIT: As pointed out below, IVs should be generated using a cryptographically secure random number generator. IVs for CTR-based modes, like GCM, only need to be unique.
In summary, what you are worried about isn't worth worrying about. One key is fine. More than one IV is fine too, but there are ways to do it with just one. You will have to handle IVs either way. Don't use ECB mode.
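The prepend-the-nonce pattern looks like this. A sketch in Python (assuming the third-party `cryptography` package is available; the question mentions Java, where `Cipher` with `"AES/GCM/NoPadding"` follows the same pattern):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)                       # fresh 96-bit nonce per chunk
    ct = AESGCM(key).encrypt(nonce, plaintext, None)
    return nonce + ct                            # prepend the nonce; it is not secret

def decrypt_chunk(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:12], blob[12:]             # recover the nonce, then decrypt
    return AESGCM(key).decrypt(nonce, ct, None)  # raises if the data was tampered with

key = AESGCM.generate_key(bit_length=128)
blob = encrypt_chunk(key, b"sensitive chunk")
decrypt_chunk(key, blob)  # b"sensitive chunk"
```

Only the key must be kept secret and reused; each chunk carries its own nonce.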

Review: Protocol for encryption/decryption of big files with authentication

I've been trying to figure out the best way to accomplish the task of encrypting big (several GB) files into the file system for later access.
I've been experimenting with several modes of AES (particularly CBC and GCM) and there are some pros and cons I've found on each approach.
After researching and asking around, I have come to the conclusion that, at least for the moment, using AES+GCM is not feasible for me, mostly because of the issues it has in Java and the fact that I can't use BouncyCastle.
So I am writing this to talk about the protocol I'm going to be implementing to complete the task. Please provide feedback as you see fit.
Encryption
Using AES/CBC/PKCS5Padding with 256 bit keys
The file will be encrypted using a custom CipherOutputStream. This output stream will take care of writing a custom header at the beginning of the file which will consist of at least the following:
A few magic bytes to easily tell that the file is encrypted
IV
Algorithm, mode and padding used
Size of the key
The length of the header itself
While the file is being encrypted, it will be also digested to calculate its authentication tag.
When the encryption ends, the tag will be appended at the end of the file. The tag is of a known size, so this makes it easy to recover later.
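One possible byte layout for the header described above; field order and the magic bytes are illustrative assumptions, and Python is used for brevity even though the post targets Java:

```python
import struct

MAGIC = b"ENC1"   # hypothetical marker bytes identifying an encrypted file

def pack_header(iv: bytes, transform: str, key_bits: int) -> bytes:
    # Header body: IV length + IV, transform-string length + string, key size
    algo = transform.encode()
    body = struct.pack(f">B{len(iv)}sB{len(algo)}sH",
                       len(iv), iv, len(algo), algo, key_bits)
    # Prefix with the magic bytes and the total header length
    return MAGIC + struct.pack(">H", 4 + 2 + len(body)) + body

header = pack_header(b"\x00" * 16, "AES/CBC/PKCS5Padding", 256)
header[:4]   # b"ENC1"
# The header records its own length, so the reader knows where ciphertext starts
```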
Decryption
A custom CipherInputStream will be used. This stream knows how to read the header.
It will then read the authentication tag and digest the whole file (without decrypting it) to validate that it has not been tampered with (I haven't actually measured how this will perform; however, it's the only way I can think of to safely start decryption without the risk of learning too late that the file should not have been decrypted in the first place).
If the validation of the tag is ok, then the header will provide all the information needed to initialize the cipher and make the input stream decrypt the file. Otherwise it will fail.
Is this something that seems ok to you in order to handle encryption/decryption of big files?
Some points:
A) Hashing of the encrypted data, with the hash not encrypted itself.
One of the things a malicious human M could do without any hash: overwrite the encrypted file with something else. M doesn't know the key, or the plaintext before and/or after this action, but he can still change the plaintext to something different (usually, it becomes garbage data). Destruction alone is a valid goal for some attackers.
The "good" user with the key can still decrypt the file without problems, but the result won't be the original plaintext. Garbage data is only harmless if (and only if) you know for sure what's inside, i.e. how to recognize whether it is unchanged. But do you know that in every case? And there's a small chance that the "garbage" data actually makes sense, but is not the real data anyway.
So, to recognize if the file was changed, you add a SHA hash of the encrypted data.
And if the evil user M overwrites the encrypted file part, what will he do with the hash? Right, he can recalculate it so that it matches the new encrypted data. Once again, you can't recognize changes.
If the plaintext is hashed and then everything is encrypted, forging is pretty much impossible. Remember, M doesn't know the key or anything else. M can change the plaintext inside to "something", but can't change the hash to the correct value for that something.
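Point A can be demonstrated directly. The answer proposes encrypting a hash of the plaintext; the sketch below uses HMAC, the conventional keyed construction with the same property, to contrast it with an unkeyed hash over the ciphertext:

```python
import hashlib, hmac, os

ciphertext = os.urandom(32)    # stand-in for the encrypted file
tampered = os.urandom(32)      # attacker M's replacement bytes

# Unkeyed hash of the ciphertext: M simply recomputes it, so the check passes.
forged_hash = hashlib.sha256(tampered).digest()
unkeyed_check = hashlib.sha256(tampered).digest() == forged_hash   # True: undetected

# Keyed MAC: without the key, M cannot produce a valid tag for his data.
key = os.urandom(32)
tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
keyed_check = hmac.compare_digest(
    tag, hmac.new(key, tampered, hashlib.sha256).digest())         # False: detected
```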
B) CBC
CBC is fine if you always decrypt the whole file or nothing.
If you want to access parts of it without decrypting the unused parts, look at XTS.
C) Processing twice
It will then read the authentication tag and digest the whole file (without decrypting it) to validate that it has not been tampered with (I haven't actually measured how this will perform; however, it's the only way I can think of to safely start decryption without the risk of learning too late that the file should not have been decrypted in the first place).
Depending on how the files are used, this is indeed necessary, especially if you want to start using the data before processing has finished.
I don't know the details of the Java CipherOutputStream, but apart from that and the points mentioned above, it looks fine to me.

Is there an algorithm for unique "hashes"

I'm interested in finding an algorithm that can encode a piece of data into a sort of hash (in the sense that it is impossible to convert back into the source data except by brute force), but that also has a unique output for every unique input. The size of the output doesn't matter.
It should hash the same input to the same output every time, so regular encryption with a random, discarded key won't suffice. Nor will regular encryption with a known key, or a salt, because they would be exposed to attackers.
Does such a thing exist?
Can it even theoretically exist, or is the data-destroying part of normal hash algorithms critical to the irreversible characteristic?
What use would something like this be? Well, imagine a browser with a list of websites that should be excluded from the history (like NSFW sites). If this list is saved unencoded or encrypted with a key known on the system, it's readable not just by the browser but also by bosses, wives, etc.
If instead the website addresses are stored hashed, they can't be read, but the browser can check if a site is present in the list.
Using a normal hash function could result in false positives (however unlikely).
I'm not building a browser, I have no plan to actually use the answer. I'm just curious and interested in encryption and such.
Given the definition of a hash;
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value.
no - it's not theoretically possible. A hash value is of a fixed length that is generally smaller than the data being hashed (unless that data is shorter than the hash's fixed length). Hashes will always lose data, and as such there can always be collisions (a hash function is considered good if the risk of collision is low and collisions are infeasible to compute).
In theory it's impossible for outputs that are shorter than the input. This follows trivially from the pigeon-hole principle.
You could use asymmetric encryption where you threw away the private key. That way it's technically lossless encryption, but nobody will be able to easily reverse it. Note that this is much slower than normal hashing, and the output will be larger than the input.
But the probability of collision drops exponentially with the hash size. A good 256-bit hash is collision-free for all practical purposes. By that I mean that hashing for billions of years with all the computers in the world will almost certainly not produce a collision.
Your extended question shows two problems.
What use would something like this be? Well, imagine a browser with a list of websites that should be excluded from the history (like NSFW sites). If this list is saved unencoded or encrypted with a key known on the system, it's readable not just by the browser but also by bosses, wives, etc.
If instead the website addresses are stored hashed, they can't be read, but the browser can check if a site is present in the list.
Brute force is trivial in this use case. Just find a list of all domains, or the zone file. I wouldn't be surprised if a good list is downloadable somewhere.
Using a normal hash function could result in false positives (however unlikely).
The collision probability of a hash is much lower than the probability of a hardware error (especially since no attacker is trying to provoke a collision in this scenario).
So my conclusion is to combine a secret with a slow hash.
byte[] secret=DeriveKeyFromPassword(pwd, salt, enough iterations for this to take perhaps a second)
and then for the actual hash use a KDF again combining the secret and the domain name.
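The secret-plus-slow-hash conclusion can be sketched with Python's standard library (the passphrase and salt values are made up; in practice, tune the PBKDF2 iteration count so derivation takes about a second on your hardware):

```python
import hashlib, hmac

def make_secret(password: bytes, salt: bytes) -> bytes:
    # Slow derivation: the iteration count is the knob that makes brute force costly
    return hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)

def domain_tag(secret: bytes, domain: str) -> str:
    # Keyed hash of the domain: deterministic, but useless without the secret
    return hmac.new(secret, domain.encode(), hashlib.sha256).hexdigest()

secret = make_secret(b"browser passphrase", b"per-install salt")   # made-up inputs
blocked = {domain_tag(secret, d) for d in ("nsfw.example", "other.example")}
domain_tag(secret, "nsfw.example") in blocked    # True: membership still testable
domain_tag(secret, "news.example") in blocked    # False: list stays unreadable
```

Without the secret, an attacker cannot run the trivial brute-force over a downloaded domain list, which was the weakness of the plain-hash scheme.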
Any form of lossless public-key encryption where you forget the private key would work.
Well, any lossless compressor with a password would work.
Or you could salt your input with some known (to you) text. This would give you something as long as the input. You could then run some sort of lossless compression on the result, which would make it shorter.
You can find a hash function with a low probability of that happening, but I think all of them are prone to the birthday attack. You can try to use a function with a large output size to minimize that probability.
Well, what about an MD5 hash? A SHA-1 hash?
I don't think it can exist; if you could put anything into it and always get a different result, the output couldn't be a fixed-length byte array, and it would lose a lot of its usefulness.
Perhaps instead of a hash what you are looking for is reversible encryption? That should be unique. Won't be fast, but it will be unique.
