I am writing a program that takes a passphrase from the user and then writes some encrypted data to a file. The method I have come up with so far is as follows:
Generate a 128-bit IV by hashing the filename and the system time, and write this to the beginning of the file.
Generate a 256-bit key from the passphrase using SHA256.
Encrypt the data (beginning with a 32-bit static signature) with this key using AES in CBC mode, and write it to the file.
When decrypting, the IV is read, the passphrase is used to generate the key in the same way, and the first 32 bits of the decrypted data are compared against the expected signature to tell whether the key is valid.
However, I was looking at the AES example provided with PolarSSL (the library I am using for hashing and encryption), and it uses a much more complex method:
Generate a 128-bit IV by hashing the filename and file size, and write this to the beginning of the file.
Generate a 256-bit key by hashing (SHA256) the passphrase and the IV together 8192 times.
Initialize the HMAC with this key.
Encrypt the data with this key using AES in CBC mode, and write it to the file, while updating the HMAC with each encrypted block.
Write the HMAC to the end of the file.
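A rough sketch of the second method's key derivation step, in Python (standard library only). It assumes the running digest is seeded with the 16-byte IV zero-padded to 32 bytes and that each round hashes the previous digest followed by the passphrase; this is a reading of the description above, not a verified copy of the PolarSSL example, so check the actual source before relying on the byte layout.

```python
import hashlib

def derive_key(passphrase: bytes, iv: bytes, rounds: int = 8192) -> bytes:
    """Derive a 256-bit AES key by chaining SHA-256 `rounds` times.

    Assumed layout (not verified against PolarSSL): the running digest starts
    as the 16-byte IV followed by 16 zero bytes, and each round computes
    SHA-256(previous digest || passphrase).
    """
    if len(iv) != 16:
        raise ValueError("expected a 16-byte IV")
    digest = iv + bytes(16)  # seed: IV followed by 16 zero bytes
    for _ in range(rounds):
        digest = hashlib.sha256(digest + passphrase).digest()
    return digest  # 32 bytes = 256-bit key

key = derive_key(b"correct horse battery staple", bytes(16))
```

Because the IV is mixed into every round, the same passphrase yields a different key for every file.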
I get the impression that the second method is more secure, but I don't have enough knowledge to back that up, other than that it looks more complicated.
If it is more secure, what are the reasons for this?
Is appending an HMAC to the end of the file more secure than having a signature at the beginning of the encrypted data?
Does hashing 8192 times increase the security?
Note: This is an open source project so whatever method I use, it will be freely available to anyone.
The second option is more secure.
Your method does not provide any message integrity. This means that an attacker can modify parts of the ciphertext and alter what the plaintext decrypts to: in CBC mode, for example, flipping a bit in one ciphertext block garbles that block and flips the same bit in the next plaintext block. As long as they don't modify anything that alters your 32-bit static signature, you'll accept the modified data as valid. The HMAC in the second method provides message integrity.
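A minimal encrypt-then-MAC sketch of what the HMAC buys you (Python standard library only). The AES-CBC step itself is left out because Python's standard library has no AES, so `ciphertext` stands for whatever bytes the cipher produced. The names `append_tag` and `verify_tag` are hypothetical; the sketch also authenticates the IV and assumes a MAC key kept separate from the encryption key, a common recommendation that differs from the single key described above.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces 32 bytes

def append_tag(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return IV || ciphertext || HMAC-SHA256(mac_key, IV || ciphertext).

    Assumption: mac_key is separate from the AES key, and ciphertext is
    whatever AES-CBC produced (encryption itself is not shown here).
    """
    body = iv + ciphertext
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()

def verify_tag(mac_key: bytes, blob: bytes) -> bytes:
    """Check the trailing tag and return IV || ciphertext, or raise ValueError."""
    if len(blob) < TAG_LEN:
        raise ValueError("file too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("file was modified or the key is wrong")
    return body  # only now is it safe to decrypt
```

Any single flipped bit, whether in the IV, the ciphertext or the tag, makes `verify_tag` raise before anything is decrypted, which the 32-bit signature cannot guarantee.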
Hashing the key 8192 times adds extra computational work for anyone trying to brute-force it. Assume a user will pick a dictionary-based password. With your method, an attacker must compute SHA256(someguess) and then try to decrypt. With the PolarSSL version, however, they have to compute SHA256(SHA256(...SHA256(someguess)...)) 8192 times over for every guess. This only slows an attacker down, but it might be enough (for now).
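The standard, vetted form of "hash it many times" is a password-based key derivation function such as PBKDF2. A sketch using Python's built-in `hashlib.pbkdf2_hmac`: the random salt plays the role the IV plays in the PolarSSL scheme (the same passphrase gives unrelated keys for different files, so an attacker cannot precompute one dictionary for all of them), and the iteration count of 600,000 is an assumed starting point to tune for your hardware, not a value taken from the text above.

```python
import hashlib
import os

def derive_key_pbkdf2(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """PBKDF2-HMAC-SHA256 producing a 32-byte (256-bit) AES key.

    The 600,000 default is an assumed starting point; raise it as hardware allows.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32)

salt = os.urandom(16)  # random per file, stored in the clear next to the ciphertext
key = derive_key_pbkdf2("correct horse battery staple", salt)
```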
For what it's worth, please use an existing library. Cryptography is hard and is prone to subtle mistakes.
I want to encrypt and decrypt strings. I'm using the Node.js crypto module for this. I've read that when encrypting and decrypting it's highly recommended to use an IV. I want to store the encrypted data in a MySQL database and decrypt it later when needed. I understand that I also need the IV for decryption. But what exactly is an IV, and how should I store it? I read that an IV does not need to be kept secret. Does this mean I can store it right next to the encrypted data it belongs to?
it's highly recommended to use an IV
No, it's required; otherwise you won't get a fully secure ciphertext in most circumstances. At the very minimum, encrypting the same plaintext message with the same key and no IV will result in identical ciphertext, which leaks information to an adversary. In other words, encryption would be deterministic, and that's not a property you want from a cipher. For CTR and GCM modes you may well leak the entire plaintext message.
But what exactly is an IV ... ?
An IV just consists of binary bits. Its size and contents depend on the mode of operation (CBC/CTR/GCM). Generally it needs to be either a nonce or randomized.
CBC mode requires a randomized IV of 16 bytes; generally a cryptographically secure random number generator is used for that.
CTR mode commonly specifies both a nonce and the initial counter value within the 16-byte IV, so you need to put the nonce in the leftmost bytes (lowest indices). This nonce may be randomized, but then it should be large enough (e.g. 12 bytes) to avoid the birthday problem.
GCM mode requires just a nonce of 12 bytes.
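The three IV layouts above, sketched in Python with the standard `secrets` module. The 12-byte nonce plus 4-byte big-endian counter split for CTR and the starting counter value are illustrative assumptions; check what your library (e.g. Node's `crypto`) actually expects.

```python
import secrets

def cbc_iv() -> bytes:
    """CBC: 16 unpredictable bytes (one AES block) from a CSPRNG."""
    return secrets.token_bytes(16)

def ctr_initial_block(nonce: bytes, counter: int = 0) -> bytes:
    """CTR: nonce in the leftmost 12 bytes, big-endian 32-bit counter after it.

    The 12 + 4 split and the starting counter are illustrative assumptions.
    """
    if len(nonce) != 12:
        raise ValueError("this sketch expects a 12-byte nonce")
    return nonce + counter.to_bytes(4, "big")

def gcm_nonce() -> bytes:
    """GCM: a 12-byte (96-bit) nonce that must never repeat under the same key."""
    return secrets.token_bytes(12)
```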
and how should I store it
Any way you can store the bytes is fine, as long as they can be retrieved or regenerated during decryption. If you need text, you may need to encode them using base 64 or hexadecimal (this goes for the ciphertext as well, of course).
I read that an IV does not need to be kept secret.
That's correct.
Does this mean I can store it right next to the encrypted data it belongs to?
Correct, quite often the IV is simply prefixed to the ciphertext; if you know the block cipher and mode of operation then the size is predetermined after all.
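A sketch of the "prefix the IV" layout for a text column, in Python. It assumes a 16-byte CBC IV (use 12 bytes for GCM, which also produces an authentication tag you need to store); `pack` and `unpack` are hypothetical helper names.

```python
import base64

IV_LEN = 16  # AES-CBC block size; a GCM nonce would be 12

def pack(iv: bytes, ciphertext: bytes) -> str:
    """Prefix the IV to the ciphertext and base64-encode it for a text column."""
    if len(iv) != IV_LEN:
        raise ValueError("unexpected IV length")
    return base64.b64encode(iv + ciphertext).decode("ascii")

def unpack(stored: str) -> tuple[bytes, bytes]:
    """Split a stored value back into (iv, ciphertext)."""
    raw = base64.b64decode(stored)
    return raw[:IV_LEN], raw[IV_LEN:]
```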
I have come up with a scenario for securing data. Suppose I have a public encrypted file that anybody can download. But whenever anyone wants to decrypt that data, they need to get a key from the server.
To make sure the key cannot be shared, the key from the server must not be able to decrypt the data directly. Instead, the data must afterwards be decrypted with the client's private key, without the server knowing that private key.
I hope the diagram below explains it clearly.
Is it possible? What algorithm could do this?
Make it so that each time the file is downloaded, a random string is appended to its name. The file is then encrypted with the user's public key, and then symmetrically with a key derived by hashing that same string. For example, a GPG file inside a password-protected ZIP file.
So Alice downloads Financial_Report_201809_d8a1b2e6.pdf.zip while Bob downloads Financial_Report_201809_ff2a91c3.pdf.zip.
If they want to decrypt the file, they need to send the server back the random string, and the server will supply them with the password for the outer ZIP. Then they're left with an encrypted file that only their private key can decode.
Note that once they have decrypted the file, nothing stops them from forwarding the file in the clear to someone else. On the other hand, sharing the encrypted PDF avails them nothing, as they would also need to share their private key.
Also note that since they need to be online to get the outer password, and they're left with a cleartext file at the end, this is (almost) functionally equivalent to the file being downloaded in the clear once user identity has been established.
The main differences are:
the ciphered file (PDF in the above example) might not have been encrypted by the server at all. It might have been supplied by the user, who is then satisfied that only he can read the file back (it makes little sense for anyone else to download it, though).
the file is transmitted very securely. An attacker with full access to the data stream would not be able to decode the file (but this is no more than could be gained by just encrypting with the user's public key; no extra ZIP stage is required).
UPDATE
You want to encrypt the whole file only once (for all users), and then send the same file to Alice and Bob, and have them require two different keys at decryption time. The problem here is that Alice's key will also work on Bob's file, since it is the same file. There's no magic that's going to work here, unless you can hide some detail of the decryption process (e.g. use a program that you control, that can't be debugged, and that will always connect to your server: a proposition that has consistently proven to be a losing one).
If you want to limit the encryption cost, you can send the massive file with both a symmetrically encrypted data payload (always the same) and a very short, asymmetrically encrypted key payload (always different), but still you will be vulnerable to the decrypted key being captured:
[ RSA(ALICE.PUB, "SQUEAMISH OSSIFRAGE") ][ RIJNDAEL("SQUEAMISH OSSIFRAGE", LARGE FILE) ]
In the above scenario, some program has to read the encryption header and decrypt the 'Squeamish Ossifrage' password, then go on decrypting (e.g. playing) the rest of the payload without the password being intercepted. This means that you need to supply the program yourself.
This is functionally equivalent to the program connecting to the server and downloading a "yes" or "no" to the question (appropriately encrypted, signed and secured) "I am Alice's player. Can I decrypt and play 'Never Wanna Give You Up.avi'?" , with no passwords or public keys being known or exchanged apart from the secret shared by Alice's player and the server.
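One way to lay out the `[ RSA(...) ][ RIJNDAEL(...) ]` file so a program can find where the per-user header ends and the shared payload begins. This Python sketch assumes a 4-byte big-endian length prefix; `wrapped_key` stands for the RSA-encrypted password and `payload` for the symmetrically encrypted file, both treated as opaque placeholder bytes since no real encryption is performed here.

```python
import struct

def frame(wrapped_key: bytes, payload: bytes) -> bytes:
    """Per-user header (length-prefixed) followed by the shared encrypted payload."""
    return struct.pack(">I", len(wrapped_key)) + wrapped_key + payload

def split_file(blob: bytes) -> tuple[bytes, bytes]:
    """Return (wrapped_key, payload) from a framed file."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    if 4 + header_len > len(blob):
        raise ValueError("truncated header")
    return blob[4:4 + header_len], blob[4 + header_len:]

# Placeholders: no real RSA or AES is performed in this sketch.
shared_payload = b"<symmetrically encrypted large file>"
alice_file = frame(b"<RSA(ALICE.PUB, password)>", shared_payload)
bob_file = frame(b"<RSA(BOB.PUB, password)>", shared_payload)
```

Only the short header differs per user, which is what keeps the encryption cost low, and also why anyone who captures the decrypted password can read every copy.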
UPDATE II
If the goal is to save encryption resources, the encryption could be done client-side, as hinted in the comments:
the file is encrypted once, with a purpose-generated private key.
the private key is stored inside a binary (we must assume it to be unhackable).
the user has to supply his public key for the decryption to work
the program can verify the public key from a repository (or, alternately, the user can supply the public key to the server, which will generate and send the binary file for download)
the program then runs both the decryption and the re-encryption
the user is left with a file encrypted with his public key, that he alone can decrypt.
UPDATE III
In order for the cleartext file to never be exposed (i.e., it does not matter whether the algorithm gets leaked), you could devise the following scheme. Keep in mind that I'm not a cryptographer and there could be all sorts of side channels left uncovered.
You prepare a conversion table that maps each 16-bit word to another 16-bit word. This is a flavour of symmetric encryption, even though you use two reciprocal matrices for encoding and decoding. Each matrix holds all possible 16-bit words, i.e. 65536 two-byte values, and is therefore 128 KB in size.
You encrypt the file, once, with the encryption matrix. Without the decryption matrix, the file is unusable.
The user has to send you his public key.
You prepare a transmogrification matrix by encrypting each word with that key: at each position, it holds the public-key encryption of the value the decryption matrix holds at that same position.
So, for example, say the first word of the cleartext file is A18B. In the encryption matrix, after the scramble, the A18B-th position will contain, say, 701C, and the decryption matrix will therefore hold A18B in its 701C-th position.
The user has a file starting with 701c... which is of no use.
The user sends you his public key and you run 65536 encryptions on all words from 0000 to ffff. You then determine that the encryption of a18b is 791c. You prepare a re-encoding matrix that has 791c in the 701cth position.
You then send the user this matrix, which is 128 KB in size and holds 791c in its 701c-th position.
The user runs the transmogrification, which is very fast, and is left with a file starting with 791c (the 701c became 791c; I inadvertently chose two similar values in my example, which is of no significance). This value, once decrypted with his private key, will yield a18b, which is the "readable" value.
The user has now a file that's been encrypted by his public key. The a18b value never appeared anywhere.
All that's left is for the user to decrypt the file using his private key with a block size of 16 bits. This operation will run on the client and be quite slow, which is why, usually, a large random symmetric key is RSA-encrypted and used to quickly encrypt the large file symmetrically; the file can then be quickly decrypted once the private key has unlocked the symmetric key.
The user cannot usefully send the 128 KB matrix to anyone, as it is useless without the private key.
(The problem here is still that the user can now decrypt the file with his private key, and send it around, even if it's unwieldy as it's a very large file).
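The table mechanics of UPDATE III, sketched in Python with 16-bit words as integers. `user_map` is a hypothetical stand-in for "encrypt this word with the user's public key": a real RSA encryption of a 16-bit word does not produce a 16-bit word, so any 16-bit permutation is used here just to show that the server-built re-encoding table turns the scrambled file straight into the user's encoding. `random.Random` is not a secure generator and is used only to keep the illustration reproducible.

```python
import random
from collections.abc import Callable

WORDS = 1 << 16  # all 65536 possible 16-bit words

def make_tables(seed: int) -> tuple[list[int], list[int]]:
    """Build the encryption table and its reciprocal decryption table."""
    rng = random.Random(seed)  # reproducible illustration only, NOT a secure key generator
    enc = list(range(WORDS))
    rng.shuffle(enc)           # enc[w] is the scrambled form of word w
    dec = [0] * WORDS
    for w, c in enumerate(enc):
        dec[c] = w             # dec[enc[w]] == w
    return enc, dec

def make_reencoding_table(dec: list[int], user_map: Callable[[int], int]) -> list[int]:
    """Server side: position c holds user_map(dec[c]).

    user_map is a hypothetical 16-bit stand-in for "encrypt with the user's public key".
    """
    return [user_map(dec[c]) for c in range(WORDS)]

def apply_table(table: list[int], words: list[int]) -> list[int]:
    """Client side: a plain table lookup per 16-bit word."""
    return [table[w] for w in words]
```

The client only ever handles the scrambled file and the re-encoding table, never `dec`, which matches the claim that a18b never appears on the client.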
the data must be decrypted with the client's private key after, without server knowing those client's privateKey
So the original file can be decrypted only by a specific client, using their own private key.
There's a commonly used scheme for this, called a hybrid cryptosystem.
The steps are:
The original data are encrypted with a random unique key.
The data encryption key is encrypted with the client's public key (the client's public key needs to be known to the server).
The client uses its private key to decrypt the data encryption key, and then decrypts the file with it.
You can use any asymmetric encryption algorithm.
A public and private key pair is used. The public key is used to encrypt data that can only be decrypted with the private key. There are a lot of resources on this, for example the article from the InfoSec Institute.
There are several proven asymmetric algorithms, such as RSA, DSA and Elliptic Curve Cryptography (used by the Ethereum blockchain); note that DSA (like the ECDSA used by Ethereum) only produces signatures, so for encryption you would use RSA or an elliptic-curve encryption scheme. There are many Python libraries too.
Currently I am using a particular scheme for securing passwords, and I think it has some room for improvement. The implementation is in Java, so I prefer to use SHA-2 512 as the form of encryption.
Currently I have a client-server model, so these things can happen:
The client wants to log in, so he sends his password over the network after a single round of plain SHA-2 512 encryption.
The server has the passwords stored in the database as for example SHA-2_512(SHA-2_512(password) + salt), with the inner SHA-2_512(password) being the 'encrypted' password it receives over the network.
Password checks are done server side, and there is no way anything can leak out from the server; I think the only possible vulnerability would be someone reading out the RAM.
I have these questions:
An attacker usually uses collision attacks when trying to hack a password. But how are collision attacks sufficient? If the password needs to be used for other applications like Outlook.com, Facebook or whatever (which likely use a different salt, as they have nothing to do with my application), how is a collision attack enough? Don't you need the real password?
Does SHA-2 512 already use iteration? And even if so, should I change my method to automatically use a number of iterations, and how many iterations are preferred? I have also read about using a random number of iterations (within a range); how do I store the random factor deterministically?
Should I store system secrets for every iteration in the server code? See http://blog.mozilla.org/webappsec/2011/05/10/sha-512-w-per-user-salts-is-not-enough/ . I could store an array holding a static secret for every iteration, with the nth secret used for the nth iteration. Nobody can know the secrets: they are computed once (I guess by encrypting some random string) and then basically stored in the server's RAM.
Currently I send the typed password from the client to the server as just SHA-2_512(password), should this process be improved, and if so, how? I cannot use salts, because the client does not have the salt available.
Regards.
TLDR: You need to send the password using an encrypted channel, such as TLS. Consider using bcrypt for password hashing.
SHA-2 512 is not an encryption algorithm; it is a message digest (hash) algorithm. An encryption algorithm requires a key and a message to encrypt. It produces ciphertext. The important thing is that an encryption algorithm has a decryption algorithm.
ciphertext = E(key, plaintext);
plaintext = D(key, ciphertext);
A message digest takes a piece of plaintext and produces a message digest. There is no corresponding reverse mechanism to take a message digest and retrieve the original message. There is also no secret key.
digest = hash(plaintext);
If an attacker is able to access a database of hashes, they can try to recover the original passwords by brute force: hashing lots of guesses and comparing each result with the stored hash.
digest1 = hash(guess1);
digest2 = hash(guess2); //repeat with lots of guesses
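The guessing loop above, made concrete in Python for an unsalted SHA-512 hash. The word list is a hypothetical stand-in for the large dictionaries attackers actually use; each guess costs one fast hash, which is exactly why unsalted, single-iteration hashes fall quickly.

```python
import hashlib
from typing import Optional

def dictionary_attack(target_hex: str, wordlist: list[str]) -> Optional[str]:
    """Hash each guess with SHA-512 and compare it against the leaked digest."""
    for guess in wordlist:
        if hashlib.sha512(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess  # found a password that produces the stored digest
    return None

leaked = hashlib.sha512(b"letmein").hexdigest()  # what the attacker copied from the database
```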
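Firstly, sending a hash over a network is not secure. It needs to be sent through some secure communications mechanism such as SSL. If an attacker can intercept the hash, they may be able to work out the original password by brute force; worse, since the server accepts the hash itself as proof of identity, they can simply replay it to log in without ever learning the password.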
A hash collision is not the same as brute forcing the password. A hash collision is caused when two different messages produce the same message digest.
digest1 = hash(plaintext1);
digest2 = hash(plaintext2);
if ( ( plaintext1 != plaintext2 ) && ( digest1 == digest2 ) )
// hash collision
SHA-512 does not have iterations designed to prevent brute-forcing. The SHA family of algorithms is designed to be efficient. The reason for adding iterations when hashing passwords is to increase the time it takes to brute-force a password. The idea is that the cost of performing 100 iterations for one legitimate login attempt is tiny compared to the cost for an attacker who has millions of password guesses, each of which requires 100 iterations. Adding more iterations helps offset improved processor speeds (which would otherwise let an attacker try guesses more quickly).
You should make the number of iterations a configurable limit that is stored against each user. So you store the password hash, salt and iteration count for each user. This means that in the future you can increase the number of iterations to take into account increased hardware power.
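A sketch of storing the hash, salt and iteration count per user. Since bcrypt is not in Python's standard library, this swaps in PBKDF2-HMAC-SHA512 via `hashlib.pbkdf2_hmac`; the default of 210,000 iterations is an assumed value to tune, and `StoredPassword`, `hash_password`, `verify_password` and `needs_upgrade` are hypothetical names.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

@dataclass
class StoredPassword:
    salt: bytes
    iterations: int
    digest: bytes

def hash_password(password: str, iterations: int = 210_000) -> StoredPassword:
    """Hash with a fresh random salt; the default count is an assumed value to tune."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    return StoredPassword(salt, iterations, digest)

def verify_password(password: str, stored: StoredPassword) -> bool:
    """Recompute with this user's own salt and iteration count."""
    candidate = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), stored.salt, stored.iterations
    )
    return hmac.compare_digest(candidate, stored.digest)

def needs_upgrade(stored: StoredPassword, current_iterations: int) -> bool:
    """True if this user's hash predates the current iteration policy."""
    return stored.iterations < current_iterations
```

When `needs_upgrade` returns true after a successful login, you can re-hash the password the user just typed with the new count.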
Sending the SHA-2 512 hash in plaintext is not secure. You should send it within an encrypted channel, such as SSL.
Having said all that, SHA-2 is not designed to be a password hashing algorithm. It is designed for message validation and to be efficient. Consider using a purpose-built password hashing algorithm. One example is bcrypt: it is designed to be computationally expensive and has salt and iterations built in.
I am making a protocol that uses packets (i.e., not a stream) encrypted with AES. I've decided on using GCM (based on CTR) because it provides integrated authentication and is part of the NSA's Suite B. The AES keys are negotiated using ECDH, where the public keys are signed by trusted contacts as part of a web of trust using something like ECDSA. I believe that I need a 128-bit nonce / initialization vector for GCM because, even though I'm using a 256-bit key for AES, it's always a 128-bit block cipher (right?). I'll be using a 96-bit IV after reading the BC code.
I'm definitely not implementing my own algorithms (just the protocol; my crypto provider is BouncyCastle), but I still need to know how to use this nonce without shooting myself in the foot. The AES key used between two people with the same DH keys will remain constant, so I know that the same nonce should not be used for more than one packet.
Could I simply prepend a 96-bit pseudorandom number to the packet and have the recipient use it as the nonce? This is peer-to-peer software, packets can be sent by either side at any time (e.g., an instant message, a file transfer request, etc.), and speed is a big issue, so it would be good not to have to use a secure random number source. The nonce doesn't have to be secret at all, right? Or necessarily as random as a "cryptographically secure" PRNG? Wikipedia says that it should be random, or else it is susceptible to a chosen-plaintext attack, but there's a "citation needed" next to both claims and I'm not sure whether that's true for block ciphers. Could I instead use a counter of the number of packets sent with a given AES key (separate from the counter of 128-bit blocks), starting at 1? Obviously this would make the nonce predictable. Considering that GCM authenticates as well as encrypts, would this compromise its authentication functionality?
GCM is a block cipher counter mode with authentication. A counter mode effectively turns a block cipher into a stream cipher, and therefore many of the rules for stream ciphers still apply. It's important to note that the same Key+IV will always produce the same PRNG stream (keystream), and reusing this keystream can let an attacker obtain plaintext with a simple XOR. In a protocol, the same Key+IV could only be used for the life of a session if the whole session were encrypted as one continuous stream whose counter never wraps (int overflow); with GCM, where each packet is encrypted and authenticated separately, every packet needs its own IV. For example, two parties with a pre-shared secret key could negotiate a new cryptographic nonce for each session and combine it with a per-packet counter (remember, nonce means a number used ONLY ONCE).
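Why keystream reuse is so bad, shown in Python. Since the standard library has no AES, `keystream` is random bytes standing in for the AES-CTR output of a single Key+IV; the point is only that XORing two ciphertexts produced with the same keystream cancels it out.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in for the AES-CTR keystream of one Key+IV (no real AES here).
keystream = secrets.token_bytes(32)

p1 = b"attack at dawn, bring the maps!!"
p2 = b"retreat at noon, burn the maps!!"
c1 = xor(p1, keystream)  # two packets encrypted with the SAME Key+IV
c2 = xor(p2, keystream)

# The keystream cancels out: c1 XOR c2 == p1 XOR p2, with no key needed.
leak = xor(c1, c2)
# Knowing or guessing one plaintext immediately reveals the other.
recovered_p2 = xor(leak, p1)
```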
If you want to use AES in a plain block cipher mode instead, you should look into CBC mode authenticated with CMAC (also known as OMAC1). With CBC, all of the usual rules apply: in this case you would have to make sure that each packet uses a unique IV that is also random. However, it's important to note that reusing a CBC IV doesn't have nearly as dire consequences as reusing a CTR keystream.
I'd advise against designing your own security protocol. There are several things to consider that even a qualified cryptographer can get wrong. I'd refer you to the TLS protocol (RFC 5246) and the Datagram TLS protocol (RFC 4347). Pick a library that implements them and use it.
Concerning your question about the IV in GCM mode, here is how TLS and DTLS do it (see RFC 5288 for more information). The 96-bit nonce consists of a 32-bit implicit part, which is not transmitted but derived from the initial key exchange, followed by a 64-bit explicit part that is included in every packet; in practice this is typically the message sequence number.
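A sketch of that TLS-style nonce construction in Python: a 4-byte implicit part derived from the key exchange, followed by an 8-byte explicit counter sent with each packet. `GcmNonceSequence` and `receiver_nonce` are hypothetical names; in a peer-to-peer protocol where both sides encrypt under the same key, each direction would need its own implicit part (TLS derives separate client and server write IVs) so the two peers never produce the same nonce.

```python
import struct

class GcmNonceSequence:
    """96-bit GCM nonces: 4-byte implicit salt || 8-byte explicit sequence number."""

    def __init__(self, implicit_salt: bytes):
        if len(implicit_salt) != 4:
            raise ValueError("implicit salt must be 4 bytes")
        self.salt = implicit_salt  # derived from the key exchange, never transmitted
        self.seq = 0

    def next_nonce(self) -> tuple[bytes, bytes]:
        """Return (full 12-byte nonce, 8-byte explicit part to send in the packet)."""
        if self.seq >= 2**64:
            raise OverflowError("sequence numbers exhausted; rekey instead of wrapping")
        explicit = struct.pack(">Q", self.seq)
        self.seq += 1
        return self.salt + explicit, explicit

def receiver_nonce(implicit_salt: bytes, explicit: bytes) -> bytes:
    """The receiver rebuilds the nonce from its copy of the salt plus the packet field."""
    return implicit_salt + explicit
```

A predictable counter is fine here: GCM needs nonces that never repeat under one key, not nonces that are secret or random.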
I'm looking to authenticate that a particular message is coming from a particular place.
Example: A repeatedly sends the same message to B. Let's say this message is "helloworld", which is encrypted to "asdfqwerty".
How can I ensure that a third party C doesn't learn that B always receives this same encrypted string, and C starts sending "asdfqwerty" to B?
How can I ensure that when B decrypts "asdfqwerty" to "helloworld", it is always receiving this "helloworld" from A?
Thanks for any help.
For the former, you want to use a Mode of Operation for your symmetric cipher that uses an Initialization Vector. The IV ensures that every encrypted message is different, even if it contains the same plaintext.
For the latter, you want to sign your message using the private key of A(lice). If B(ob) has the public key of Alice, he can then verify she really created the message.
Finally, beware of replay attacks, where C(harlie) records a valid message from Alice, and later replays it to Bob. To avoid this, add a nonce and/or a timestamp to your encrypted message (yes, you could make the IV play double-duty as a nonce).
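A sketch of the nonce-plus-timestamp defence in Python. It swaps the signature for an HMAC under a key shared by A and B, because the standard library has no public-key signatures; with a real signature only A could produce valid messages, whereas with an HMAC either key holder could. Encryption is omitted, the 30-second window is an assumed value, and `seal` and `Receiver` are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

MAX_AGE_SECONDS = 30.0  # assumed freshness window
TAG_LEN = 32            # HMAC-SHA256 output size

def seal(key: bytes, message: str, now: Optional[float] = None) -> bytes:
    """A's side: add a random nonce and a timestamp, then MAC the whole record."""
    record = {
        "msg": message,
        "nonce": secrets.token_hex(16),
        "ts": time.time() if now is None else now,
    }
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return body + hmac.new(key, body, hashlib.sha256).digest()

class Receiver:
    """B's side: reject forged, stale, or replayed records."""

    def __init__(self, key: bytes):
        self.key = key
        self.seen: set[str] = set()  # nonces already accepted

    def accept(self, blob: bytes, now: Optional[float] = None) -> Optional[str]:
        body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # not produced by a key holder, or tampered with
        record = json.loads(body)
        now = time.time() if now is None else now
        if abs(now - record["ts"]) > MAX_AGE_SECONDS:
            return None  # outside the window: possibly an old recording
        if record["nonce"] in self.seen:
            return None  # exact replay of an accepted message
        self.seen.add(record["nonce"])
        return record["msg"]
```

A real receiver would also drop stored nonces once they fall outside the window, so the set doesn't grow forever.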
Add a random value to the data before encrypting it, and strip it off again whenever the data is decrypted.
You need a decent random number generator for this. I'm sure Google will help you with that.
C noticing that B receives the same encrypted message twice is an issue called traffic analysis, and it has historically been a heavy concern (though that was in times which predated public key encryption).
Any decent public key encryption system includes some random padding. For instance, for RSA as described in PKCS#1, the message to encrypt (at most 117 bytes long for a 1024-bit RSA key) gets a header with at least eight random (non-zero) bytes, plus a few extra bytes which allow the receiver to unambiguously locate the padding and see where the "real" data begins. The random bytes are generated anew every time; hence, if A sends the same message to B twice, the encrypted messages will be different, but B will recover the original message both times.
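The padding layer described above, sketched in Python for a 1024-bit key (k = 128 bytes); the RSA exponentiation itself is omitted. This only shows where the fresh random bytes go; real code should leave padding to a vetted library.

```python
import secrets

def pkcs1_v15_pad(message: bytes, k: int = 128) -> bytes:
    """EME-PKCS1-v1_5 layout: 0x00 || 0x02 || PS || 0x00 || M, k = modulus length in bytes."""
    if len(message) > k - 11:
        raise ValueError(f"message too long: at most {k - 11} bytes")
    ps_len = k - 3 - len(message)  # always at least 8
    ps = bytearray()
    while len(ps) < ps_len:
        b = secrets.token_bytes(1)
        if b != b"\x00":           # padding bytes must be non-zero
            ps += b
    return b"\x00\x02" + bytes(ps) + b"\x00" + message

def pkcs1_v15_unpad(em: bytes) -> bytes:
    """Strip the header and random bytes, returning the original message."""
    if em[:2] != b"\x00\x02":
        raise ValueError("bad header")
    sep = em.index(b"\x00", 2)     # first zero byte after the random padding
    if sep < 10:
        raise ValueError("padding string shorter than 8 bytes")
    return em[sep + 1:]
```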
Random padding is required for public key encryption precisely because the public key is public: if encryption was deterministic, then an attacker could "try" potential messages and look for a match (this is exhaustive search on possible messages).
Public key encryption algorithms often have heavy limitations on data size or performance (e.g. with RSA, you have a strict maximum message length, depending on the key size). Thus, it is customary to use a hybrid system: the public key encryption is used to encrypt a symmetric key K (i.e. a bunch of random bytes), and K is used to symmetrically encrypt the data (symmetric encryption is fast and does not have constraints on input message size). In a hybrid system, you generate a new K for every message, so this also gives you the randomness you need to avoid encrypting the same message several times with a given public key: at the public key encryption level, you are actually never encrypting the same message (the same key K) twice, even if the data which is symmetrically encrypted with K is the same as in a previous message. This would protect you from traffic analysis even if the public key encryption itself did not include random padding.
When symmetrically encrypting data with a key K, the symmetric encryption should use an "initial value" (IV) which is randomly and uniformly generated; this is integrated in the encryption mode (some modes only need a non-repeating IV without requiring a random uniform generation, but CBC needs random uniform generation). This is a third level of randomness, protecting you against traffic analysis.
When using asymmetric key agreement (static Diffie-Hellman), things are a bit more complex, because a key agreement results in a key K which you do not choose, and which could be the same every time (between a given sender and receiver). In that situation, protection against traffic analysis relies on the randomness of the symmetric encryption IV.
Asymmetric encryption protocols, such as OpenPGP, describe how the symmetric encryption, public key encryption and randomness should all be linked together, ironing out the tricky details. You are warmly encouraged not to reinvent your own protocol: it is difficult to design a secure protocol, mostly because one cannot easily test for the presence or absence of any weakness.
You may want to study block cipher modes of operation. However, the modes are designed to work on a data stream that is sent over a reliable channel. If your messages are sent out of order over an unreliable transport (e.g. UDP packets), I don't think you can use them.