CRC16 collision (2 CRC values of blocks of different size) - security

The Problem
I have a textfile which contains one string per line (linebreak \r\n). This file is secured using CRC16 in two different ways.
CRC16 of blocks of 4096 bytes
CRC16 of blocks of 32768 bytes
Now I have to modify any of these 4096 byte blocks, so it (the block)
contains a specific string
does not change the size of the textfile
has the same CRC value as the original block (same for the 32k block, that contains this 4k block)
Depart of that limitations I may do any modifications to the block that are required to fullfill it as long as the file itself does not break its format. I think it is the best to use any of the completly filled 4k blocks, not the last block, that could be really short.
The Question
How should I start to solve that problem? The first thing I would come up is some kind of bruteforce but wouldn't it take extremly long to find the changes that will result in both CRC values stay the same? Is there probably a mathematical way to solve that?
It should be done in seconds or max. few minutes.

There are math ways to solve this but I don't know them. I'm proposing a brute-force solution:
A block looks like this:
SSSSSSSMMMMEEEEEEE
Each character represents a byte. S = start bytes, M = bytes you can modify, E = end bytes.
After every byte added to the CRC it has a new internal state. You can reuse the checksum state up to that position that you modify. You only need to recalculate the checksum for the modified bytes and all following bytes. So calculate the CRC for the S-part only once.
You don't need to recompute the following bytes either. You just need to check whether the CRC state is the same or different after the modification you made. If it is the same, the entire block will also be the same. If it is different, the entire block is likely to be different (not guaranteed, but you should abort the trial). So you compute the CRC of just the S+M' part (M' being the modified bytes). If it equals the state of CRC(S+M) you won.
That way you have much less data to go through and a recent desktop or server can do the 2^32 trials required in a few minutes. Use parallelism.

Take a look at spoof.c. That will directly solve your problem for the CRC of the 4K block. However you will need to modify the code to solve the problem simultaneously for both the CRC of the 4K block and the CRC of the enclosing 32K block. It is simply a matter of adding more equations to solve. The code is extremely fast, running in O(log(n)) time, where n is the length of the message.
The basic idea is that you will need to solve 32 linear equations over GF(2) in 32 or more unknowns, where each unknown is a bit location that you are permitting to be changed. It is important to provide more than 32 unknowns with which to solve the problem, since if you pick exactly 32, it is not at all unlikely that you will end up with a singular matrix and no solution. The spoof code will automatically find non-singular choices of 32 unknown bit locations out of the > 32 that you provide.

Related

What is a quick way to check if file contents are null?

I have a rather large file (32 GB) which is an image of an SD card, created using dd.
I suspected that the file is empty (i.e. filled with the null byte \x00) starting from a certain point.
I checked this using python in the following way (where f is an open file handle with the cursor at the last position I could find data at):
for i in xrange(512):
if set(f.read(64*1048576))!=set(['\x00']):
print i
break
This worked well (in fact it revealed some data at the very end of the image), but took >9 minutes.
Has anyone got a better way to do this? There must be a much faster way, I'm sure, but cannot think of one.
Looking at a guide about memory buffers in python here I suspected that the comparator itself was the issue. In most non-typed languages memory copies are not very obvious despite being a killer for performance.
In this case, as Oded R. established, creating a buffer from read and comparing the result with a previously prepared nul filled one is much more efficient.
size = 512
data = bytearray(size)
cmp = bytearray(size)
And when reading:
f = open(FILENAME, 'rb')
f.readinto(data)
Two things that need to be taken into account is:
The size of the compared buffers should be equal, but comparing bigger buffers should be faster until some point (I would expect memory fragmentation to be the main limit)
The last buffer may not be the same size, reading the file into the prepared buffer will keep the tailing zeroes where we want them.
Here the comparison of the two buffers will be quick and there will be no attempts of casting the bytes to string (which we don't need) and since we reuse the same memory all the time, the garbage collector won't have much work either... :)

What does the IS_ALIGNED macro in the linux kernel do?

I've been trying to read the implementation of a kernel module, and I'm stumbling on this piece of code.
unsigned long addr = (unsigned long) buf;
if (!IS_ALIGNED(addr, 1 << 9)) {
DMCRIT("#%s in %s is not sector-aligned. I/O buffer must be sector-aligned.", name, caller);
BUG();
}
The IS_ALIGNED macro is defined in the kernel source as follows:
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
I understand that data has to be aligned along the size of a datatype to work, but I still don't understand what the code does.
It left-shifts 1 by 9, then subtracts by 1, which gives 111111111. Then 111111111 does bitwise-and with x.
Why does this code work? How is this checking for byte alignment?
In systems programming it is common to need a memory address to be aligned to a certain number of bytes -- that is, several lowest-order bits are zero.
Basically, !IS_ALIGNED(addr, 1 << 9) checks whether addr is on a 512-byte (2^9) boundary (the last 9 bits are zero). This is a common requirement when erasing flash locations because flash memory is split into large blocks which must be erased or written as a single unit.
Another application of this I ran into. I was working with a certain DMA controller which has a modulo feature. Basically, that means you can allow it to change only the last several bits of an address (destination address in this case). This is useful for protecting memory from mistakes in the way you use a DMA controller. Problem it, I initially forgot to tell the compiler to align the DMA destination buffer to the modulo value. This caused some incredibly interesting bugs (random variables that have nothing to do with the thing using the DMA controller being overwritten... sometimes).
As far as "how does the macro code work?", if you subtract 1 from a number that ends with all zeroes, you will get a number that ends with all ones. For example, 0b00010000 - 0b1 = 0b00001111. This is a way of creating a binary mask from the integer number of required-alignment bytes. This mask has ones only in the bits we are interested in checking for zero-value. After we AND the address with the mask containing ones in the lowest-order bits we get a 0 if any only if the lowest 9 (in this case) bits are zero.
"Why does it need to be aligned?": This comes down to the internal makeup of flash memory. Erasing and writing flash is a much less straightforward process then reading it, and typically it requires higher-than-logic-level voltages to be supplied to the memory cells. The circuitry required to make write and erase operations possible with a one-byte granularity would waste a great deal of silicon real estate only to be used rarely. Basically, designing a flash chip is a statistics and tradeoff game (like anything else in engineering) and the statistics work out such that writing and erasing in groups gives the best bang for the buck.
At no extra charge, I will tell you that you will be seeing a lot of this type of this type of thing if you are reading driver and kernel code. It may be helpful to familiarize yourself with the contents of this article (or at least keep it around as a reference): https://graphics.stanford.edu/~seander/bithacks.html

How to divide block in piece when they overlap

Some input I'm looking to build a simple minimal bittorrent client.
I reading the protocol spec for a 2-3 days now.
here what my understanding on it thus far . Assuming that torrent has a piece length of 26000 bytes and according to non official spec block size is 16384. Something like this.
Now upon request of a block of piece message would look like this
piece 0
block offset 0
block length 16484
So far so good.
Now, for next block which overlap in piece 0 and 1 what should the request look like
piece 0 ## since the start of byte is in piece 0 use piece 0 instead of piece 1
block offset 16384
block length 16384
Now on the receiving end I need to recreate the piece of 26000 bytes so that I can compare that with pieces (hash) to match the piece for correctness.
Is my understanding correct ?
Also I'm let suppose the piece verification failed and may be it because of the first block i.e Block 0 (which is faulty or corrupt)
then I should requeue Block 0 and Block 1 (which was valid btw and also a part of piece 1) to retransmit again.
And now suddenly the piece and block distribution become a bit complex then what I assume it be. and I hoping there is a simpler solution to this.
Any thought
Will use the more distinct term 'chunk' instead of the ambiguous 'block'.
A torrent is divided into pieces.
A piece is divided into chunks.
A chunk is cut from one piece.
A torrent is divided into pieces when it's created. With the Request message, a piece is in turn further divided into chunks by the downloading BitTorrent client.
How the client cut the chunks out from a piece doesn't matter, as long as no single chunk is larger than 16 KB (16384 bytes).
The simplest and most rational way to divide a piece, is to do it in as few chunks as possible, by dividing it in 16 KB chunks and let the last chunk of the piece be smaller if necessary.
The Request message format: <len=0013><id=6><Piece_index><Chunk_offset><Chunk_length>
<Piece_index > integer specifying the zero-based piece index
<Chunk_offset> integer specifying the zero-based byte offset within the piece
<Chunk_length> integer specifying the requested number of bytes
When requesting a chunk:
the whole chunk must be within the piece specified by the Piece_index,
ie Chunk_offset+Chunk_length must be less or equal to the size of that specific piece*.
the Chunk_length can not be larger than 16 KB (16384 bytes) and must be at least 1 byte
the peer that get the request must have the piece specified by the Piece_index
If any of the conditions is not met, the peer receiving the request will close the connection.
* For all pieces except the very last one that is the 'piece length' defined in the info-dictionary.
The size of the last piece can by calculated as:
size_last_piece = size_of_torrent - (number_of_pieces - 1) * 'piece length'
The maximum block size commonly accepted by clients is 16KiB. Clients are free to make smaller requests.
Pieces are commonly a multiple of 16KiB, but the current spec does not require it (this changes with BEP52) and some people use prime numbers or similar things for fun, so they do exist in the wild.
Blocks only exist in the sense that you need multiple requests to get a complete piece that is larger than 16KiB. In other words, blocks are the same thing as whatever you decide to request. You could request 500 bytes, then 1017 bytes and then 13016 bytes, ... until you got a complete piece. They are arbitrary subdivisions within a piece - there is no overlap - that you need to keep track of between the start of downloading a piece and finishing the piece.
They do not participate in hashing, they do not factor into the HAVE or BITFIELD messages. Only REQUEST, PIECE, CANCEL and REJECT messages concern themselves with blocks. And instead of blocks you could also call them sub-piece offset-length tuples or something to that effect.
Last block in a piece may be smaller than the transfer block size. I.e. 26000 - 16384 = 9616 bytes should be requested in the second PIECE message. As soon as all 26000 bytes have been received, SHA-1 hash should be calculated and compared with the corresponding checksum from the pieces section of metainfo dictionary. If the checksum does not match, you have no means to know which block contained invalid data and should re-download all blocks from this piece.
My advice would be not to depend on some particular partitioning of the piece, because:
1) peers may use a different transfer block size when requesting data
2) SHA-1 algorithm is block-based, and the digester better use a bigger block size (otherwise calculations will take more time)
A proper abstraction for a piece would be a generic data range with the following methods:
read(from:int, length:int):byte[]
write(offset:int, block:byte[]):()
Then you'll be able to read/write arbitrary subranges of data.

Implementing CRC16 in verilog with dynamic data packet length

Thank you for reading this and for all of your help. Anyway...I am trying to implement a crc16 with polynomial x^16 + x^12 + x^5 + 1 in verilog. The problem I have encountered is that I don't get the entire packet of data at one point in time. I get a 32 bit word at a time and the number of words is dynamic but is at least 4 words and can be as high as 16384 words or higher. The time is not much of an issue because I am running on a 150 MHz clk and the input is coming in at most a 33 MHz clk but may be a 10 MHz. This does not really affect me because I am first accepting the data via a FIFO.
I have been trying to develop an FSM but have really hit a roadblock. One idea is for me to wait for all the data and then just input the entire thing as one big data packet; however, this seems really inefficient and I just don't think I need to do this. Plus it could take up valuable resources. Another way I was playing with was to input the first word and do the XOR operation. Then when the input data only has 1 to 2 bits left that are not xored (not sure if that is worded correctly) I would input the next word. Upon the input I would continue to compute the CRC followed by another input until the last word is imputed into the module.
With this method I would need to implement a counter or a shift register in some fashion. Anyway, any help would be nice. This goes into a command parser/packet parser. Thank you so much for your help.
A CRC calculation doesn't need to be done serially 1-bit at a time. You can essentially "unroll" the calculation to come up with the individual equations for each bit of a parallel CRC generator. With that, you can create a CRC generator that processes 32-bits of input data at a time, matching your datapath width. This should simplify your design as well as make it higher performance (processing each bit serially wouldn't meet your throughput requirements anyway, unless you don't mind holding off incoming data while the hw generates the CRC).

xentop VBD_RD & VBD_WR output

I'm writing a perl script that track the output of xentop tool, I'm not sure what is the meaning for VBD_RD & VBD_WR
Following http://support.citrix.com/article/CTX127896
VBD_RD number displays read requests
VBD_WR number displays write requests
Dose anyone know how read & write requests are measured in bytes, kilobytes, megabytes??
Any ideas?
Thank you
As far as I understood, xentop shows you two different measuers (two for read and write).
VBD_RD and VBD_WR's measures are unit. The number of times you tried to access to the block device. This does not say anything about the the number of bytes you have read or written.
The second measure read (VBD_RSECT) and write (VBD_WSECT) are measured in "sectors". You can find the size of the sectors by using xenstore-ls (https://serverfault.com/questions/153196/xen-find-vbd-id-for-physical-disks) (in my case it was 512).
The unit of a sector is in bytes (http://xen.1045712.n5.nabble.com/xen-3-3-testing-blkif-Clarify-units-for-sector-sized-blkif-request-params-td2620172.html).
So if the VBD_WR value is 2, VBD_WSECT value is 10, sector size is 512. We have written 10 * 512 bytes in two different requests (you tried to access to the block device two times but we know nothing about how much bytes were written in each request but we only know the total). To find disk I/O you can periodically check these values and take the derivative between those values.
I suppose the sector size might change for each block device somehow, so it might be worthy to check the xenstore-ls output for each domain but I'm not sure. You can probably define it in the cfg file too.
This is what I found out and understood so far. I hope this helps.

Resources