FIO repeatable buffer fill - linux

Is it possible to have a pseudo-random buffer fill pattern using FIO? ie, the fill pattern for a block would incorporate a seed + block number or offset into a pseudo-random fill generator. This way the entire fill data could be 100% repeatable and verifiable, but more varied than the static pattern provided by --verify=pattern.
My guess at the commands would be something like:
Write pseudo-random data out in verifiable manner
fio --filename=/home/test.bin --direct=1 --rw=write --bs=512 --size=1M --name=verifiable_write --verify=psuedo_rand --verify_psuedo_rand_seed=0xdeadbeef --do_verify=0
Read back pseudo-random data and verify
fio --filename=/home/test.bin --direct=1 --rw=read --bs=512 --size=1M --name=verify_written_data --verify=psuedo_rand --verify_psuedo_rand_seed=0xdeadbeef --do_verify=1
Obviously, I'm making up some options here, but I'm hoping it may get the point across.

(This isn't the right site for this type of question because it's not about programming - Super User or Serverfault look more appropriate)
The fio documentation for buffer_pattern says you can choose a fixed string or number (given in decimal or hex). However look at your examples shows you are doing a verify so the documentation for verify_pattern is relevant. That states you can use %o that sets the block offset. However once you set a fixed pattern that's it - there are no more variables beyond %o. That means with current fio (3.17 at the time of writing) if are choose to use a fixed pattern (e.g. via verify_pattern) there's no way to include seeded random data that can be verified.
If you don't use a fixed pattern and specify verify by checksum then fio will actually use seeded random data but I don't think split verification will check the seed - just that the checksum written into the block matches the data of the rest of the block.
Is it possible to have a pseudo-random buffer fill pattern using FIO?
If the default random buffer fill is OK then yes but if you want to include something like block offset and other additional data alongside that then no at the time of writing (unless you patch the fio source).

Related

How does a hash digest update function work? [duplicate]

I have an API route that proxies a file upload from the browser/client to AWS S3.
This API route attempts to stream the file as it is uploaded to avoid buffering the entire contents of the file in memory on the server.
However, the route also attempts to calculate an MD5 checksum of the file's body. As each part of the file is chunked, the hash.update() method is invoked w/ the chunk.
http://nodejs.org/api/crypto.html#crypto_hash_update_data_input_encoding
var crypto = require('crypto');
var hash = crypto.createHash('md5');
function write (chunk) {
// invoked many times as file is uploaded
hash.update(chunk);
}
function done() {
// will hash buffer all chunks in memory at this point?
hash.digest('hex');
}
Will the instance of Hash buffer all the contents of the file in order to perform the hash calculation (thus defeating the goal of avoiding buffering the entire file's contents in memory)? Or can an MD5 hash be calculated incrementally, without ever having the entire input available to perform the calculation?
MD5 and some other hash functions are based on the Merkle–Damgård construction. It supports the incremental/progressive/streaming hashing of data. After the data is transformed into an internal state (which has a fixed size) a last finalization step is performed to generate the final hash by padding and processing the last block and afterwards by simply returning the final state.
This is probably also why many hashing library functions are designed in such a way with an update and a finalization step.
To answer your question: No, the file content is not kept in a buffer, but is rather transformed into a fixed size internal state.
All modern cryptographic hash functions are created in such a way that they can be updated incrementally.
To allow for incremental updates, the input data of the message is first arranged in blocks. These blocks are processed in order. To do this the implementation usually buffers the input internally until it has a full block, and then processes this block together with the current state to produce a new state, using a so called compression function. The initial state usually simply consists of predetermined constant values. During the call to digest the last block is padded - usually with bit padding and an encoding of the amount of processed bytes - and the final state is calculated; this may require an additional block without any message data. A final operation may be performed and finally the resulting hash value is returned.
For MD5 the Merkle–Damgård construction is used. This common construction is also used for SHA-1 and SHA-2. SHA-2 is a family of hashes based on the algorithms for SHA-256 (SHA-224) and SHA-512 (SHA-384, SHA-512/224 and SHA-512/256). MD5 in particular uses a block size of 512 bits and a internal state of 128 bits. The internal state of the last block (including padding) is simply output directly without any post-processing for MD5, SHA-1, SHA-256 and SHA-512.
Keccak has been chosen to be SHA-3. It is construction based on a sponge, a specific compression function. It isn't a Merkle–Damgård hash - which is a big reason why it has been chosen as SHA-3. It still has all the update properties of Merkle–Damgård hashes and has been designed to be compatible with SHA-2. It splits up and buffers blocks just like the previously mentioned hashes, but it has a larger internal state and performs final operations on the output, making it arguably more secure.
So when you were using a modern hash construction such as MD5 you were unknowingly performing additional buffering. Fortunately, the buffering of a single block of 512 bits + 128 bits for the state size will not likely make you run out of memory. It is certainly not required for the hash implementation to buffer the entire message before the final hash value can be calculated.
Notes:
MD5 and SHA-1 are considered insecure w.r.t. collision resistance and they should preferably not be used anymore, especially when it comes to validating contents;
A "compression function" is a specific cryptographic notion; it is not
LSZIP or anything similar;
There may be specialized, theoretical hashes that perform the calculate the values differently - theoretically speaking there is no requirement to split the input messages into blocks and operate on the blocks sequentially. No worry, those are unlikely to be in the libraries you are using;
Similarly, implementations may decide to buffer more blocks at once, but that is fortunately extremely uncommon as well. Commonly only one block is used as buffer - in some cases it could be more performant to buffer a few blocks instead;
Some low level implementations may require you to supply the blocks yourself for reasons of efficiency.

Variable length messages in Verilog (serial CRC-32)

I'm working with a serial protocol. Messages are of variable length that is known in advance. On both transmission and reception sides, I have the message saved to a shift register that is as long as the longest possible message.
I need to calculate CRC32 of these registers, the same as for Ethernet, as fast as possible. Since messages are variable length (anything from 12 to 64 bits), I chose serial implementation that should run already in parallel with reception/transmission of the message.
I ran into a problem with organization of data before calculation. As specified here , the data needs to be bit-reversed, padded with 32 zeros and complemented before calculation.
Even if I forget the part about running in parallel with receiving or transmitting data, how can I effectively get only my relevant message from max-length register so that I can pad it before calculation? I know that ideas like
newregister[31:0] <= oldregister[X:0] // X is my variable length
don't work. It's also impossible to have the generate for loop clause that I use to bit-reverse the old vector run variable number of times. I could use a counter to serially move data to desired length, but I cannot afford to lose this much time.
Alternatively, is there an operation that would directly give me the padded and complemented result? I do not even have an idea how to start developing such an idea.
Thanks in advance for any insight.
You've misunderstood how to do a serial CRC; the Python question you quote isn't relevant. You only need a 32-bit shift register, with appropriate feedback taps. You'll get a million hits if you do a Google search for "serial crc" or "ethernet crc". There's at least one Xilinx app note that does the whole thing for you. You'll need to be careful to preload the 32-bit register with the correct value, and whether or not you invert the 32-bit data on completion.
EDIT
The first hit on 'xilinx serial crc' is xapp209, which has the basic answer in fig 1. On top of this, you need the taps, the preload value, whether or not to invert the answer, and the value to check against on reception. I'm sure they used to do all this in another app note, but I can't find it at the moment. The basic references are the Ethernet 802.3 spec (3.2.8 Frame check Sequence field, which was p27 in the original book), and the V42 spec (8.1.1.6.2 32-bit frame check sequence, page 311 in the old CCITT Blue Book). Both give the taps. V42 requires a preload to all 1's, invert of completion, and gives the test value on reception. Warren has a (new) chapter in Hacker's Delight, which shows the taps graphically; see his website.
You only need the online generators to check your solution. Be careful, though: they will generally have different preload values, and may or may not invert the result, and may or may not be bit-reversed.
Since X is a viarable, you will need to bit assignments with a for-loop. The for-loop needs to be inside an always block and the for-loop must static unroll (ie the starting index, ending index, and step value must be constants).
for(i=0; i<32; i=i+1) begin
if (i<X)
newregister[i] <= oldregister[i];
else
newregister[i] <= 1'b0; // pad zeros
end

nodejs crypto module, does hash.update() store all input in memory

I have an API route that proxies a file upload from the browser/client to AWS S3.
This API route attempts to stream the file as it is uploaded to avoid buffering the entire contents of the file in memory on the server.
However, the route also attempts to calculate an MD5 checksum of the file's body. As each part of the file is chunked, the hash.update() method is invoked w/ the chunk.
http://nodejs.org/api/crypto.html#crypto_hash_update_data_input_encoding
var crypto = require('crypto');
var hash = crypto.createHash('md5');
function write (chunk) {
// invoked many times as file is uploaded
hash.update(chunk);
}
function done() {
// will hash buffer all chunks in memory at this point?
hash.digest('hex');
}
Will the instance of Hash buffer all the contents of the file in order to perform the hash calculation (thus defeating the goal of avoiding buffering the entire file's contents in memory)? Or can an MD5 hash be calculated incrementally, without ever having the entire input available to perform the calculation?
MD5 and some other hash functions are based on the Merkle–Damgård construction. It supports the incremental/progressive/streaming hashing of data. After the data is transformed into an internal state (which has a fixed size) a last finalization step is performed to generate the final hash by padding and processing the last block and afterwards by simply returning the final state.
This is probably also why many hashing library functions are designed in such a way with an update and a finalization step.
To answer your question: No, the file content is not kept in a buffer, but is rather transformed into a fixed size internal state.
All modern cryptographic hash functions are created in such a way that they can be updated incrementally.
To allow for incremental updates, the input data of the message is first arranged in blocks. These blocks are processed in order. To do this the implementation usually buffers the input internally until it has a full block, and then processes this block together with the current state to produce a new state, using a so called compression function. The initial state usually simply consists of predetermined constant values. During the call to digest the last block is padded - usually with bit padding and an encoding of the amount of processed bytes - and the final state is calculated; this may require an additional block without any message data. A final operation may be performed and finally the resulting hash value is returned.
For MD5 the Merkle–Damgård construction is used. This common construction is also used for SHA-1 and SHA-2. SHA-2 is a family of hashes based on the algorithms for SHA-256 (SHA-224) and SHA-512 (SHA-384, SHA-512/224 and SHA-512/256). MD5 in particular uses a block size of 512 bits and a internal state of 128 bits. The internal state of the last block (including padding) is simply output directly without any post-processing for MD5, SHA-1, SHA-256 and SHA-512.
Keccak has been chosen to be SHA-3. It is construction based on a sponge, a specific compression function. It isn't a Merkle–Damgård hash - which is a big reason why it has been chosen as SHA-3. It still has all the update properties of Merkle–Damgård hashes and has been designed to be compatible with SHA-2. It splits up and buffers blocks just like the previously mentioned hashes, but it has a larger internal state and performs final operations on the output, making it arguably more secure.
So when you were using a modern hash construction such as MD5 you were unknowingly performing additional buffering. Fortunately, the buffering of a single block of 512 bits + 128 bits for the state size will not likely make you run out of memory. It is certainly not required for the hash implementation to buffer the entire message before the final hash value can be calculated.
Notes:
MD5 and SHA-1 are considered insecure w.r.t. collision resistance and they should preferably not be used anymore, especially when it comes to validating contents;
A "compression function" is a specific cryptographic notion; it is not
LSZIP or anything similar;
There may be specialized, theoretical hashes that perform the calculate the values differently - theoretically speaking there is no requirement to split the input messages into blocks and operate on the blocks sequentially. No worry, those are unlikely to be in the libraries you are using;
Similarly, implementations may decide to buffer more blocks at once, but that is fortunately extremely uncommon as well. Commonly only one block is used as buffer - in some cases it could be more performant to buffer a few blocks instead;
Some low level implementations may require you to supply the blocks yourself for reasons of efficiency.

explain me a difference of how MRTG measures incoming data

Everyone knows that MRTG needs at least one value to be passed on it's input.
In per-target options MRTG has 'gauge', 'absolute' and default (with no options) behavior of 'what to do with incoming data'. Or, how to count it.
Lets look at the elementary, yet popular example :
We pass cumulative data from network interface statistics of 'how much packets were recieved by the interface'.
We take it from '/proc/net/dev' or look at 'ifconfig' output for certain network interface. The number of recieved bytes is increasing every time. Its cumulative.
So as i can imagine there could be two types of possible statistics:
1. How fast this value changes upon the time interval. In oher words - activity.
2. Simple, as-is growing graphic that just draw every new value per every minute (or any other time interwal)
First graphic will be saltatory (activity). Second will just grow up every time.
I read twice rrdtool's and MRTG's docs and can't understand which option mentioned above counts what.
I suppose (i am not sure) that 'gauge' draw values as is, without any differentiation calculations (good for measuring how much memory or cpu is used every 5 minutes). And default or 'absolute' behavior tryes to calculate the speed between nearby measures, but what's the differencr between last two?
Can you, guys, explain in a simple manner which behavior stands after which option of three options possible?
Thanks in advance.
MRTG assumes that everything is being measured as a rate (even if it isnt a rate)
Type 'gauge' assumes that you have already calculated the rate; thus, the provided value is stored as-is (after Data Normalisation). This is appropriate for things like CPU usage.
Type 'absolute' assumes the value passed is the count since the last update. Thus, the value is divided by the number of seconds since the last update to get a rate in thingies per second. This is rarely used, and only for certain unusual data sources that reset their value on being read - eg, a script that counts the number of lines in a log file, then truncates the log file.
Type 'counter' (the default) assumes the value passed is a constantly growing count, possibly that wraps around at 16 or 64 bits. The difference between the value and its previous value is divided by the number of seconds since the last update to get a rate in thingies per second. If it sees the value decrease, it will assume a counter wraparound at 16 or 64 bit. This is appropriate for something like network traffic counters, which is why it is the default behaviour (MRTG was originally written for network traffic graphs)
Type 'derive' is like 'counter', but will allow the counter to decrease (resulting in a negative rate). This is not possible directly in MRTG but you can manually create the necessary RRD if you want.
All types subsequently perform Data Normalisation to adjust the timestamp to a multiple of the Interval. This will be more noticeable for Gauge types where the value is small than for counter types where the value is large.
For information on this, see Alex van der Bogaerdt's excellent tutorial

What value for length field for Freescale PowerPC Security Engine 2.0, when using Link Tables?

I am working on the code to use the security engine of my MPC83XX with Openssl.
I can already encrypt/decrypt AES up to 64KByte of data.
The problem comes with data greater than 64KByte since the maximum value of the length-bits is 65535.
I can assume the data is always in one piece on the Ram.
So now I am collecting all the data in a Link Table and use the pointer to the table instead of the pointer to the data and set the J bit to 1.
Now I am not sure what a value I should use for the length-bits since 0 would mean the Dword will be ignored.
The real length of the data is too also big for 16 bit.
http://cache.freescale.com/files/32bit/doc/app_note/AN2755.pdf?fpsp=1
Possible Informations can be found in Chapter 8.
You set LENGTH to the length of the data. See Page 19:
For any sequence of data parcels accessed by a link table or chain of link tables, the combined lengths of the parcels (the sum of their LENGTH and/or EXTENT fields) must equal the combined lengths of the link table memory segments (SEGLEN fields). Otherwise the channel sets the appropriate error bit in the Channel Pointer Status Register...
I'm not sure what mode you're using (and the documentation seems unnecessarily confusing!) but for the usual cipher modes (CBC/CTR/CFB/OFB) the the usual method is simply to chain AES invocations, reusing the same context. You might be able to do this by simply setting "Pointer Dword1" and "Pointer Dword5" to the same thing. There's very little documentation, though; I can't work out where it gets the IV from.

Resources