I want to use OpenSSL for data transmission between a server and a client, using the EVP interface with AES in CBC mode. But when I try to decode the second message on the client, EVP_EncryptFinal_ex returns 0.
My scheme is shown in the picture.
I think this happens because I call EVP_EncryptFinal_ex (and EVP_DecryptFinal_ex) twice for one EVP context. How do I do this correctly?
You cannot call EVP_EncryptUpdate() after calling EVP_EncryptFinal_ex() according to the EVP docs.
If padding is enabled (the default) then EVP_EncryptFinal_ex()
encrypts the "final" data, that is any data that remains in a partial
block. It uses standard block padding (aka PKCS padding) as described
in the NOTES section, below. The encrypted final data is written to
out which should have sufficient space for one cipher block. The
number of bytes written is placed in outl. After this function is
called the encryption operation is finished and no further calls to
EVP_EncryptUpdate() should be made.
Instead, you should set up the cipher context for encryption again by calling EVP_EncryptInit_ex(). Note that, unlike EVP_EncryptInit(), EVP_EncryptInit_ex() lets you keep reusing an existing context without allocating and freeing it on each call.
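The same rule can be sketched in Node.js, whose crypto module is built on OpenSSL. This is an illustration only, not the asker's C code: Node does not expose re-initialisation of a context, so the sketch creates a fresh cipher object for every message, which plays the role of calling EVP_EncryptInit_ex() again. The helper names encryptMessage and decryptMessage, the choice of AES-128-CBC, and the convention of prefixing each message with its IV are assumptions made for the example.

```javascript
var crypto = require('crypto');

// Hypothetical helper: one cipher object and one fresh random IV per message.
function encryptMessage(key, plaintext) {
    var iv = crypto.randomBytes(16);
    var cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
    var body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
    // Assumed wire format: the IV is sent in front of the ciphertext.
    return Buffer.concat([iv, body]);
}

// Hypothetical helper: reads the IV back and uses a new decipher object.
function decryptMessage(key, message) {
    var iv = message.subarray(0, 16);
    var decipher = crypto.createDecipheriv('aes-128-cbc', key, iv);
    return Buffer.concat([decipher.update(message.subarray(16)), decipher.final()]);
}

var key = crypto.randomBytes(16);
var first = encryptMessage(key, Buffer.from('first message'));
var second = encryptMessage(key, Buffer.from('second message'));

console.log(decryptMessage(key, first).toString());
console.log(decryptMessage(key, second).toString());
```

Each message is finalised exactly once, so the second message decrypts just like the first.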
I have read an article about the difference between the update() and doFinal() methods of a cipher.
It was about what happens if we want to encrypt a 4-byte array when the block size of the cipher is, for example, 8 bytes. If we call update() here, it will return null. My question is: what happens if we call doFinal() with a 4-byte array to encrypt and the buffer size is 8 bytes? How many bytes of encrypted data will we receive in return?
update(): feeds data to the cipher, again and again, which lets you encrypt long files and streams.
doFinal(): applies the requested padding scheme to the data, if requested and necessary, then encrypts. ECB and CBC modes require padding, but CTR mode doesn't. If NOPADDING is used, some libraries may pad silently; in others you have to handle the padding yourself.
When you call doFinal() with 4 bytes of data and NOPADDING is not set, the data is padded and then encrypted. With an 8-byte block size and PKCS5 padding, the result is one full block, that is, 8 bytes.
From the Javadoc:
update(byte[] input)
Continues a multiple-part encryption or decryption operation (depending on how this cipher was initialized), processing another data part.
doFinal()
Finishes a multiple-part encryption or decryption operation, depending on how this cipher was initialized.
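A minimal sketch of the 4-byte case, written for Node.js rather than Java. Here cipher.update() stands in for Java's update() and cipher.final() for doFinal(). The sketch assumes AES-128-CBC, whose block size is 16 bytes rather than the 8 bytes in the question, so the padded result is 16 bytes long; with an 8-byte block cipher the same steps would give 8 bytes.

```javascript
var crypto = require('crypto');

var key = crypto.randomBytes(16);
var iv = crypto.randomBytes(16);
var cipher = crypto.createCipheriv('aes-128-cbc', key, iv);

// 4 bytes do not fill a block, so update() has nothing to emit yet.
var fromUpdate = cipher.update(Buffer.from([1, 2, 3, 4]));

// final() pads the 4 buffered bytes up to a full block and encrypts it.
var fromFinal = cipher.final();

console.log(fromUpdate.length, fromFinal.length);
```

Node returns an empty Buffer from update() where Java returns null, but the idea is the same: no output until a full block is available or the operation is finished.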
I have an API route that proxies a file upload from the browser/client to AWS S3.
This API route attempts to stream the file as it is uploaded to avoid buffering the entire contents of the file in memory on the server.
However, the route also attempts to calculate an MD5 checksum of the file's body. As each chunk of the file arrives, the hash.update() method is invoked with the chunk.
http://nodejs.org/api/crypto.html#crypto_hash_update_data_input_encoding
var crypto = require('crypto');
var hash = crypto.createHash('md5');

function write(chunk) {
    // invoked many times as file is uploaded
    hash.update(chunk);
}

function done() {
    // will hash buffer all chunks in memory at this point?
    return hash.digest('hex');
}
Will the instance of Hash buffer all the contents of the file in order to perform the hash calculation (thus defeating the goal of avoiding buffering the entire file's contents in memory)? Or can an MD5 hash be calculated incrementally, without ever having the entire input available to perform the calculation?
MD5 and some other hash functions are based on the Merkle–Damgård construction. It supports the incremental/progressive/streaming hashing of data. After the data is transformed into an internal state (which has a fixed size) a last finalization step is performed to generate the final hash by padding and processing the last block and afterwards by simply returning the final state.
This is probably also why many hashing library functions are designed in such a way with an update and a finalization step.
To answer your question: No, the file content is not kept in a buffer, but is rather transformed into a fixed size internal state.
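A quick way to convince yourself of this is to hash the same bytes once in a single call and once chunk by chunk, then compare the digests. The sketch below does that in Node.js; the 1,000,000-byte random input and the 64 KiB chunk size are arbitrary values chosen for the example.

```javascript
var crypto = require('crypto');

var data = crypto.randomBytes(1000000);

// Hash everything in one call.
var oneShot = crypto.createHash('md5').update(data).digest('hex');

// Hash the same bytes in 64 KiB chunks, as a streaming upload would deliver them.
var hash = crypto.createHash('md5');
for (var offset = 0; offset < data.length; offset += 65536) {
    hash.update(data.subarray(offset, offset + 65536));
}
var chunked = hash.digest('hex');

console.log(oneShot === chunked);
```

Both digests are identical, which is only possible because the hash object carries a fixed-size state forward from chunk to chunk.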
All modern cryptographic hash functions are created in such a way that they can be updated incrementally.
To allow for incremental updates, the input data of the message is first arranged in blocks. These blocks are processed in order. To do this, the implementation usually buffers the input internally until it has a full block, and then processes this block together with the current state to produce a new state, using a so-called compression function. The initial state usually simply consists of predetermined constant values. During the call to digest the last block is padded - usually with bit padding and an encoding of the length of the processed message - and the final state is calculated; this may require an additional block without any message data. A final operation may be performed and finally the resulting hash value is returned.
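The padding arithmetic for MD5 can be written down directly. MD5 appends one marker byte (a single 1 bit followed by zeros), zero fill, and an 8-byte field holding the message length in bits, then rounds up to a whole number of 64-byte blocks. The function name md5PaddedLength is made up for this sketch; it only computes sizes and does not hash anything.

```javascript
// Number of bytes run through the MD5 compression function for a message of
// messageLength bytes: message + 1 marker byte + zero fill + 8-byte length
// field, rounded up to a multiple of the 64-byte block size.
function md5PaddedLength(messageLength) {
    var blockSize = 64;
    return Math.ceil((messageLength + 1 + 8) / blockSize) * blockSize;
}

console.log(md5PaddedLength(55)); // padding still fits in the same block
console.log(md5PaddedLength(56)); // padding spills into an additional block
console.log(md5PaddedLength(64)); // extra block holds no message data at all
```

The last case is the "additional block without any message data": a message that exactly fills a block leaves no room for the padding, so a whole extra block is processed.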
For MD5 the Merkle–Damgård construction is used. This common construction is also used for SHA-1 and SHA-2. SHA-2 is a family of hashes based on the algorithms for SHA-256 (SHA-224) and SHA-512 (SHA-384, SHA-512/224 and SHA-512/256). MD5 in particular uses a block size of 512 bits and an internal state of 128 bits. The internal state after the last block (including padding) is simply output directly without any post-processing for MD5, SHA-1, SHA-256 and SHA-512.
Keccak has been chosen to be SHA-3. It is a construction based on a sponge, which is built around a permutation rather than a compression function. It isn't a Merkle–Damgård hash - which is a big reason why it has been chosen as SHA-3. It still has all the update properties of Merkle–Damgård hashes and has been designed to be compatible with SHA-2. It splits up and buffers blocks just like the previously mentioned hashes, but it has a larger internal state and performs final operations on the output, making it arguably more secure.
So when you were using a hash construction such as MD5 you were unknowingly performing additional buffering. Fortunately, buffering a single block of 512 bits plus 128 bits of state is unlikely to make you run out of memory. The hash implementation certainly does not need to buffer the entire message before the final hash value can be calculated.
Notes:
MD5 and SHA-1 are considered insecure w.r.t. collision resistance and they should preferably not be used anymore, especially when it comes to validating contents;
A "compression function" is a specific cryptographic notion; it is not data compression such as ZIP or anything similar;
There may be specialized, theoretical hashes that calculate the values differently - theoretically speaking there is no requirement to split the input messages into blocks and operate on the blocks sequentially. No worry, those are unlikely to be in the libraries you are using;
Similarly, implementations may decide to buffer more blocks at once, but that is fortunately extremely uncommon as well. Commonly only one block is used as buffer - in some cases it could be more performant to buffer a few blocks instead;
Some low level implementations may require you to supply the blocks yourself for reasons of efficiency.
I am working on an OpenSSL project. While using the encryption and decryption functions under EVP, EVP_Decrypt_Final does not report an error, but after every OP_SIZE there are 8 bytes of extra data in the decrypted file. I tried programs posted on Stack Overflow by various other users, but the error was the same.
Please help :)
The extra 8 bytes of data may be the result of padding. A block cipher encrypts or decrypts one block of fixed size at a time. If a given block is smaller than the block size, it is padded.
It looks like you are using ECB or CBC mode.
You may be encrypting data that spans multiple blocks, in which case you should be familiar with the different block cipher modes.
If you do not want padding, consider encrypting your data using CFB or CTR mode.
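The size difference is easy to observe. The sketch below, written for Node.js, compares ciphertext lengths for a padded block mode and a stream-like mode. It assumes AES-128, so the block is 16 bytes rather than the 8 bytes seen in the question (which would match a 64-bit block cipher), and ciphertextLength is a hypothetical helper written for this example.

```javascript
var crypto = require('crypto');

// Hypothetical helper: encrypts plaintextLength zero bytes with a random key
// and IV, and returns the length of the resulting ciphertext.
function ciphertextLength(algorithm, plaintextLength) {
    var key = crypto.randomBytes(16);
    var iv = crypto.randomBytes(16);
    var cipher = crypto.createCipheriv(algorithm, key, iv);
    var out = Buffer.concat([cipher.update(Buffer.alloc(plaintextLength)), cipher.final()]);
    return out.length;
}

// CBC with the default padding: block-aligned input gains one whole extra block.
console.log(ciphertextLength('aes-128-cbc', 32));

// CTR needs no padding: the output is exactly as long as the input.
console.log(ciphertextLength('aes-128-ctr', 32));
```

Because the default padding always adds at least one byte, input that is already a multiple of the block size grows by a full block each time the operation is finalised.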
I'm currently working on a VoIP project and have a question about the implementation of AES-CBC mode. I know that for text-based instant messaging it's important to generate an IV for every message, so that the first block cannot be guessed when it is repeated during the communication.
But is it useful to do the same with audio data? Since audio data is much more complex than clear text, I'm wondering whether it would be wise to generate an IV for each audio chunk (that would mean a lot of IVs per second, more than 40), or will this just slow everything down for nothing? Or should one IV generated at the start of the conversation be enough?
Thanks in advance,
Nolhian
You do not need to generate new IVs each time.
For example, in SSH and TLS only one IV is used for a whole data session, and rekeying is needed only after some gigabytes of data.
CBC requires a new IV for each message. However nobody said that you had to send a message in one go.
Consider SSL/TLS. The connection begins with a complex procedure (the "handshake") which results in a shared "master key" from which are derived symmetric encryption keys, MAC keys, and IVs. From that point and until the connection end (or new handshake), the complete data sent by the client to the server is, as far as CBC is concerned, one unique big message which uses, quite logically, a unique IV.
In more detail, with CBC each block (of 16 bytes with AES) is first XORed with the previous encrypted block, then is itself encrypted. The IV is needed only for the very first block, since there is no previous block at that point. One way of seeing it is that each encrypted block is the IV for the encryption of what follows. When, as part of the SSL/TLS dialog, the client sends some data (a "record" in SSL speak), it remembers the last encrypted block of that record, to be used as IV for the next record. (This is how SSL 3.0 and TLS 1.0 work; TLS 1.1 and later send an explicit IV with each record.)
In your case, I suppose that you have an audio stream to encrypt. You could handle it as SSL/TLS does, simply chopping the CBC stream between blocks. It has, however, a slight complication: usually, in VoIP protocols, some packets may be lost. If you receive a chunk of CBC-encrypted data and do not have the previous chunk, then you do not know the IV for that chunk (i.e. the last encrypted block of the previous chunk). You are then unable to properly decrypt the first block (16 bytes) of the chunk you receive. Whether recovery from that situation is easy or not depends on what data you are encrypting (in particular, with audio, what kind of compression algorithm you use). If that potential loss is a problem, then a workaround is to include the IV in each chunk: in CBC-speak, the last encrypted block of a chunk (in a packet) is repeated as first encrypted block in the next chunk (in the next packet).
Or, to state it briefly: you need an IV per chunk, but CBC generates these IVs "naturally" because all the IVs (except the very first) are blocks that you just encrypted.
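The chaining property described here can be checked with a short Node.js sketch: encrypting two chunks separately, with the last ciphertext block of the first chunk used as the IV of the second, gives the same bytes as encrypting both chunks as one CBC message. The sketch assumes AES-128-CBC, chunk sizes that are multiples of 16 bytes, and padding switched off; encryptChunk is a hypothetical helper written for this example.

```javascript
var crypto = require('crypto');

// Hypothetical helper: CBC-encrypts a block-aligned chunk without padding.
function encryptChunk(key, iv, chunk) {
    var cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
    cipher.setAutoPadding(false);
    return Buffer.concat([cipher.update(chunk), cipher.final()]);
}

var key = crypto.randomBytes(16);
var iv = crypto.randomBytes(16);
var chunk1 = crypto.randomBytes(32);
var chunk2 = crypto.randomBytes(48);

// One CBC message covering both chunks.
var whole = encryptChunk(key, iv, Buffer.concat([chunk1, chunk2]));

// The same data chunk by chunk: the last ciphertext block of chunk 1
// serves as the IV for chunk 2.
var part1 = encryptChunk(key, iv, chunk1);
var part2 = encryptChunk(key, part1.subarray(part1.length - 16), chunk2);

console.log(whole.equals(Buffer.concat([part1, part2])));
```

A receiver that lost the first chunk would need that last ciphertext block from somewhere, which is why repeating it at the start of the next packet works as a recovery mechanism.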