Reversing the checksum based on the data - protocols

I'm trying to reverse the binary protocol of an embedded device. I was able to capture 4 sample data packets; here they are in hex-encoded form:
5E:A1:00:10:10:00:00:02:01:05:F0:F6:4B:00:01:03
5E:A1:00:10:10:00:00:06:01:93:79:DA:F9:00:01:07
5E:A1:00:10:10:00:00:03:01:C9:B1:F0:81:00:01:04
5E:A1:00:10:10:00:00:04:01:A3:BE:2A:3A:00:01:05
Based on other packets I can assert the following:
First 6 bytes (5E:A1:00:10:10:00) - message header; it's static across all other messages.
Next 2 bytes (00:02, 00:06, 00:03 or 00:04) - the numeric message id, an int16. It differs from message to message.
Next 4 bytes (05:F0:F6:4B, 93:79:DA:F9, C9:B1:F0:81 or A3:BE:2A:3A) - a checksum of the message. It depends on the data and the message number. I verified this by forming packets manually: when I update bytes in the data area of a message, or the message number, but not the checksum, the message gets declined by the remote server.
Everything else is just data of variable length.
My question is: how can I work out the algorithm used for the checksum generation? Is there any software that can be used for this kind of purpose? For example, I input a mask over the data and it tries to guess the algorithm.

If it is a CRC, then reveng may be able to deduce the parameters of the CRC, given enough examples.
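As a rough illustration of what reveng automates, here is a Python sketch that tries a few well-known CRC-32 parameter sets against the captures. Note the assumptions: the checksum is taken to be the 4 bytes at offsets 9..12 of each packet, and to cover all the remaining bytes; both the parameter table and those offsets are guesses, not facts about this protocol.

```python
# Sketch: brute-force a few known CRC-32 parameter sets against the captures.
# The checksum offsets (9..12) and the covered bytes are assumptions.

def crc(data, poly, init, refin, refout, xorout, width=32):
    """Generic bit-by-bit CRC (Ross Williams parameter model)."""
    topbit = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = init
    for byte in data:
        if refin:
            byte = int(f"{byte:08b}"[::-1], 2)   # reflect each input byte
        reg ^= byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) & mask if reg & topbit else (reg << 1) & mask
    if refout:
        reg = int(f"{reg:0{width}b}"[::-1], 2)   # reflect the register
    return reg ^ xorout

# A few common parameter sets: poly, init, refin, refout, xorout
CANDIDATES = {
    "CRC-32":        (0x04C11DB7, 0xFFFFFFFF, True,  True,  0xFFFFFFFF),
    "CRC-32/BZIP2":  (0x04C11DB7, 0xFFFFFFFF, False, False, 0xFFFFFFFF),
    "CRC-32/MPEG-2": (0x04C11DB7, 0xFFFFFFFF, False, False, 0x00000000),
    "CRC-32C":       (0x1EDC6F41, 0xFFFFFFFF, True,  True,  0xFFFFFFFF),
}

packets = [bytes.fromhex(s.replace(":", "")) for s in (
    "5E:A1:00:10:10:00:00:02:01:05:F0:F6:4B:00:01:03",
    "5E:A1:00:10:10:00:00:06:01:93:79:DA:F9:00:01:07",
    "5E:A1:00:10:10:00:00:03:01:C9:B1:F0:81:00:01:04",
    "5E:A1:00:10:10:00:00:04:01:A3:BE:2A:3A:00:01:05",
)]

for name, params in CANDIDATES.items():
    for order in ("big", "little"):
        # checksum assumed at offsets 9..12, covering all other bytes
        if all(crc(p[:9] + p[13:], *params) ==
               int.from_bytes(p[9:13], order) for p in packets):
            print(name, order)   # a consistent match across all samples
```

reveng does the same kind of search far more thoroughly (including unknown polynomials), so if this naive table finds nothing, that doesn't rule out a CRC.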

Unused bytes by protobuf implementation (for limiter implementation)

I need to transfer data over a serial port. In order to ensure integrity of the data, I want a small envelope protocol around each protobuf message. I thought about the following:
message type (1 byte)
message size (2 bytes)
protobuf message (N bytes)
(checksum; optional)
The message type will mostly be a mapping between messages defined in proto files. However, if a message gets corrupted or some bytes are lost, the message size will not be correct and all subsequent bytes cannot be interpreted anymore. One way to solve this would be the introduction of delimiters between messages, but for that I need to choose something that is not used by protobuf. Is there a byte sequence that is never used by any protobuf message?
I also thought about a different way. If the master finds out that packages are corrupted, it should reset the communication to a clean start. For that I want the master to send a RESTART command to the slave. The slave should answer with an ACK and then start sending complete messages again. All bytes received between RESTART and ACK are to be discarded by the master. I want to encode ACK and RESTART as special messages. But with that approach I face the same problem: I need to find byte sequences for ACK and RESTART that are not used by any protobuf messages.
Maybe I am also taking the wrong approach - feel free to suggest other approaches to deal with lost bytes.
Is there a byte sequence that is never used by any protobuf message?
No; it is a binary serializer and can contain arbitrary binary payloads (especially in the bytes type). You cannot use sentinel values. Length prefix is fine (your "message size" header), and a checksum may be a pragmatic option. Alternatively, you could impose an artificial sentinel to follow each message (maybe a guid chosen per-connection as part of the initial handshake), and use that to double-check that everything looks correct.
One way to help recover packet synchronization after a rare problem is to use synchronization words in the beginning of the message, and use the checksum to check for valid messages.
This means that you put a constant value, e.g. 0x12345678, before your message type field. Then if a message fails checksum check, you can recover by finding the next 0x12345678 in your data.
Even though that value could sometimes occur in the middle of the message, it doesn't matter much. The checksum check will very probably catch that there isn't a real message at that position, and you can search forwards until you find the next marker.
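The resync scheme above can be sketched in a few lines of Python. The marker is the example value from the answer; the frame layout (type, 2-byte length, CRC-32 trailer) and the helper names are illustrative choices, not part of protobuf or any standard.

```python
import struct
import zlib

SYNC = b"\x12\x34\x56\x78"   # the example marker value from the answer

def frame(msg_type, payload):
    # marker | type (1) | length (2, big-endian) | payload | CRC-32 of type..payload
    body = struct.pack(">BH", msg_type, len(payload)) + payload
    return SYNC + body + struct.pack(">I", zlib.crc32(body))

def extract(stream):
    """Yield (msg_type, payload) for every frame whose checksum verifies."""
    i = 0
    while (i := stream.find(SYNC, i)) != -1:
        start = i + len(SYNC)
        if start + 3 > len(stream):
            break
        msg_type, length = struct.unpack_from(">BH", stream, start)
        end = start + 3 + length + 4
        if end <= len(stream):
            body = stream[start:start + 3 + length]
            (crc,) = struct.unpack_from(">I", stream, start + 3 + length)
            if crc == zlib.crc32(body):
                yield msg_type, body[3:]
                i = end
                continue
        i += 1   # checksum failed or frame truncated: resync at the next marker
```

A marker that happens to occur inside a payload only costs a failed checksum check at that offset; the scan then moves on to the next marker, exactly as described above.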

What is BitTorrent peer (Deluge) saying?

I'm writing a small app to test out how torrent p2p works and I created a sample torrent and am seeding it from my Deluge client. From my app I'm trying to connect to Deluge and download the file.
The torrent in question is a single-file torrent (file is called A - without any extension), and its data is the ASCII string Test.
Referring to this I was able to submit the initial handshake and also get a valid response back.
Immediately afterwards, Deluge sends even more data. From the 5th byte it looks like a bitfield message, but I'm not sure what to make of it. I read that torrent clients may send a mixture of bitfield and have messages to show which parts of the torrent they possess. (My client isn't sending any bitfield, since it doesn't have any part of the file in question.)
If my understanding is correct, it's stating that the message size is 2: one byte for the identifier plus the payload. If that's the case, why is it sending so much more data, and what is that data supposed to be?
The same thing happens after my app sends an interested message: Deluge responds with a 1-byte unchoke message (but then again appends more data).
And finally, when it actually submits the piece, I'm not sure what to make of the data. The first underlined byte is 84, which corresponds to the letter T, as expected, but I cannot make much sense of the rest of the data.
Note that the linked page does not really specify in what order the clients should send messages once the initial handshake is completed. I just assumed I should send interested and request based on what seemed to make sense to me, but I might be completely off.
I don't think Deluge is sending the additional bytes you're seeing.
If you look at them, you'll notice that all of the extra bytes are bytes that already existed in the handshake message, which should have been the longest message you received so far.
I think you're reading new messages into the same buffer, without zeroing it out or anything, so you're seeing bytes from earlier messages again, following the bytes of the latest message you read.
Consider checking if the network API you're using has a way to check the number of bytes actually received, and only look at that slice of the buffer, not the entire thing.
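The buffer-reuse effect described above can be demonstrated in a few lines of Python (socket.socketpair stands in for the real peer connection; the message contents are made up). The second, shorter message overwrites only the start of the reused buffer, so the tail still holds bytes of the earlier, longer message unless you slice to the count recv_into reports.

```python
import socket

a, b = socket.socketpair()
buf = bytearray(64)            # one buffer reused for every read

b.sendall(b"HANDSHAKE-" + b"x" * 30)     # a long first message (40 bytes)
n = a.recv_into(buf)
first = bytes(buf[:n])

b.sendall(b"\x00\x00\x00\x02\x05\xff")   # a short bitfield-style message
n = a.recv_into(buf)                      # overwrites only the first n bytes
second = bytes(buf[:n])                   # correct: slice to the reported count

# buf[n:] still contains stale handshake bytes from the first read;
# printing the whole buffer would look like "extra data" after the message.
a.close()
b.close()
```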

Confusion around Bitfield Torrent

I'm a bit confused about the bitfield message in BitTorrent. I have noted my confusion in the form of questions below.
Optional vs Required
Bitfield to be sent immediately after the handshaking sequence is
completed
I'm assuming this is compulsory, i.e. after the handshake there must follow a bitfield message. Correct?
When to expect bitfield?
The bitfield message may only be sent immediately after the
handshaking sequence is completed, and before any other messages are
sent
Assuming I read this correctly: although it is an optional message, a peer can still send the bitfield message prior to any other message (like request, choke, unchoke etc.). Correct?
The high bit in the first byte corresponds to piece index 0
If I'm correct, the bitfield represents the state, i.e. whether or not the peer has a given piece.
Assuming that my bitfield is [1,1,1,1,1,1,1,1,1,1 ...], I establish the fact that the peer has the 10th piece missing, and if the bitfield looks like [1,1,0,1,1,1,1,1,1,1 ...] the peer has the 3rd piece missing. Then what does 'the high bit in the first byte corresponds to piece index 0' mean?
Spare bits
Spare bits at the end are set to zero
What does this mean? If I have a bit at the end that is 0, does that not mean the peer is missing that piece? Why are the spare bits needed?
Most important of all: what is the purpose of the bitfield?
My hunch is that the bitfield makes it easier to find the right peer for a piece, by knowing which pieces are available from each peer. But am I correct on this?
@Encombe, here is how my bitfield payload looks:
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFE
I'm assuming this is compulsory, i.e. after the handshake there must follow a bitfield message. Correct?
No, the bitfield message is optional, but if a client sends it, it MUST be the first message after the handshake.
Also, both peers must have sent their complete handshakes (i.e. the handshaking sequence is completed) before either of them starts to send any type of regular message, including the bitfield message.
Assuming I read this correctly: although it is an optional message, a peer can still send the bitfield message prior to any other message (like request, choke, unchoke etc.). Correct?
Yes, see above. If a client sends a bitfield message anywhere else, the connection must be closed.
Assuming that my bitfield is [1,1,1,1,1,1,1,1,1,1 ...], I establish the fact that the peer has the 10th piece missing
No. It's unclear to me whether your numbers are bits (0b1111111111) or bytes (0x01010101010101010101).
If it's bits (0b1111111111): it means the peer has pieces 0 to 9.
If it's bytes (0x01010101010101010101): it means the peer has pieces 7, 15, 23, 31, 39, 47, 55, 63, 71 and 79.
if the bitfield looks like [1,1,0,1,1,1,1,1,1,1 ...] the peer has the 3rd piece missing.
No, pieces are zero-indexed. 0b1101111111 means piece 2 is missing.
Then what does 'the high bit in the first byte corresponds to piece index 0' mean?
It means that the piece with index 0 is represented by the leftmost bit (the most significant bit, in big-endian bit order).
eight bits = one byte
0b10000000 = 0x80
  ^ high bit set, meaning that the client has piece 0
0b00000001 = 0x01
         ^ low bit set, meaning that the client has piece 7
Why are the spare bits needed?
If the number of pieces in the torrent is not evenly divisible by eight, there will be bits left over in the last byte of the bitfield that don't represent any pieces. Those bits must be set to zero.
The size of the bitfield in bytes can be calculated this way:
size_bitfield = math.ceil( number_of_pieces / 8 )
and the number of spare bits is:
spare_bits = 8 * size_bitfield - number_of_pieces
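These two formulas, together with the bit-numbering rule above (high bit of byte 0 is piece 0), can be checked with a short Python sketch; the helper name is mine, not from any spec:

```python
import math

def pieces_from_bitfield(bitfield, number_of_pieces):
    """Piece indices the peer claims to have; high bit of byte 0 is piece 0."""
    return {i for i in range(number_of_pieces)
            if bitfield[i // 8] & (0x80 >> (i % 8))}

number_of_pieces = 10
size_bitfield = math.ceil(number_of_pieces / 8)      # 2 bytes for 10 pieces
spare_bits = 8 * size_bitfield - number_of_pieces    # 6 spare bits, all zero

# 0b1101111111 (piece 2 missing) padded with 6 zero spare bits -> 0xDF 0xC0
print(pieces_from_bitfield(bytes([0b11011111, 0b11000000]), number_of_pieces))
```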
what is the purpose of the bitfield
The purpose is to tell what pieces the client has, so the other peer knows what pieces it can request.

Handling of faulty ISO7816 APDUs

Are there any specifications in the Java Card API, RE or VM specs as to how the card must react to faulty ISO7816-4 APDUs (provided that such malformed APDUs are passed to the card at all)?
Are there different requirements for the APDU handling of applets?
If I were to send e.g. a (faulty) 3-byte long first interindustry APDU to the
card/applet - who should detect/report this error?
Who would detect/report a first interindustry APDU containing a bad LC
length field?
No, there is no generic specification that defines how to handle malformed APDUs.
In general you should always return a status word in a valid ISO 7816-3/4 range; which one depends entirely on the context. On error conditions you should always try to throw an ISOException with a logical status word, and never return a 6F00 status word, which is what is returned if the Applet.process() method exits with an exception other than ISOException. The most common (though not all) ISO status words are defined in the ISO7816 interface.
Unfortunately, ISO 7816-4 only provides some hints regarding which status words may be expected. On the other hand, unless the error is very specific (e.g. incorrect PIN), there is not too much a terminal can do if it receives a status word on a syntactically incorrect APDU (it is unlikely to fix an incorrect APDU command data field). Any specific status words should be defined by higher level protocols. ISO 7816-4 itself can only be used as a (rotten) foundation for other protocols. No clear rules for handling syntactic (wrong length) or semantic (wrong PIN) errors have been defined.
With regard to malformed APDUs: 3-byte APDUs won't be received by the applet. APDUs with an incorrect Lc byte may be received, although it would be more logical if this affected the transport layer in such a way that it either times out waiting for more data or discards the spurious bytes. It cannot hurt to check and return a wrong-length error, but please use the values of APDU.getIncomingLength() or APDU.setIncomingAndReceive() as the final values for Nc if you decide to continue.
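The applet-side checks live in Java Card, but the Lc consistency problem itself can be illustrated host-side. This Python helper is my own construction and covers only case-3 short APDUs (CLA INS P1 P2 Lc data, no Le); it shows the kind of mismatch discussed above:

```python
def check_case3_apdu(apdu):
    """Sanity-check a case-3 short APDU: CLA INS P1 P2 Lc data (no Le)."""
    if len(apdu) < 5:
        return "malformed: shorter than the 4-byte header plus Lc"
    lc, data = apdu[4], apdu[5:]
    if lc == 0 or lc != len(data):
        return f"bad Lc: field says {lc}, {len(data)} data byte(s) present"
    return "ok"

print(check_case3_apdu(bytes.fromhex("00A4040002")))      # Lc=2 but no data
print(check_case3_apdu(bytes.fromhex("00A40400023F00")))  # consistent
```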

To pad or not to pad - creating a communication protocol

I am creating a protocol to have two applications talk over a TCP/IP stream and am figuring out how to design a header for my messages. Using the TCP header as an initial guide, I am wondering if I will need padding. I understand that when we're dealing with a cache, we want to make sure that data being stored fits in a row of cache so that when it is retrieved it is done so efficiently. However, I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
For example: I want to send over a message header consisting of a 3 byte field followed by a 1 byte padding field for 32 bit alignment. Then I will send over the message data.
In this case, the receiver will just take 3 bytes from the stream, throw away the padding byte, and then start reading the message data. As I see it, the receiver will store the 3 bytes and the message data however it sees fit. The whole point of byte alignment is that data can be retrieved in an efficient manner; but if the receiver doesn't care about the padding, how is it retrieved efficiently?
Without the padding, the receiver just takes the 3 header bytes from the stream and then takes the data bytes. Since the receiver stores these bytes however it wants, how does it matter whether or not the padding is done?
Maybe I'm missing the point of padding.
It's slightly hard to extract a question from this post, but given what I've said, you can probably point out my misconceptions.
Please let me know what you guys think.
Thanks,
jbu
If word alignment of the message body is of some use, then by all means, pad the message to avoid other contortions. The padding will be of benefit if most of the message is processed as machine words with decent intensity.
If the message is a stream of bytes, for instance xml, then padding won't do you a whole heck of a lot of good.
As far as actually designing a wire protocol, you should probably consider using a plain text protocol with compression (including the header), which will probably use less bandwidth than any hand-designed binary protocol you could possibly invent.
I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
If I'm a receiver, I might pass a buffer (i.e. an array of bytes) to the protocol driver (i.e. the TCP stack) and say, "give this back to me when there's data in it".
What I (the application) get back, then, is an array of bytes which contains the data. Using C-style tricks like casting, I can treat portions of this array as if they were words and double-words (not just bytes), provided that they're suitably aligned (which is where padding may be required).
Here's an example of a statement which reads a DWORD from an offset in a byte buffer:
DWORD getDword(const byte* buffer)
{
    // we want the DWORD which starts at byte-offset 8
    buffer += 8;
    // dereference as if it were pointing to a DWORD
    // (this would fail on some machines if the pointer
    // weren't pointing to a DWORD-aligned boundary)
    return *((DWORD*)buffer);
}
Here's the corresponding function in Intel assembly; note that it's a single opcode, i.e. quite an efficient way to access the data, more efficient than reading and accumulating separate bytes:
mov eax,DWORD PTR [esi+8]
One reason to consider padding is if you plan to extend your protocol over time: some of the padding can be intentionally set aside for future assignment.
Another reason to consider padding is to save a couple of bits on length fields: if the length is always a multiple of 4 or 8, you save 2 or 3 bits off the length field.
One other good reason that TCP has padding (which probably does not apply to you) is it allows dedicated network processing hardware to easily separate the data from the header. As the data always starts on a 32 bit boundary, it's easier to separate the header from the data when the packet gets routed.
If you have a 3 byte header and align it to 4 bytes, then designate the unused byte as 'reserved for future use' and require the bits to be zero (rejecting messages where they are not as malformed). That leaves you some extensibility. Or you might decide to use the byte as a version number - initially zero, and then incrementing it if (when) you make incompatible changes to the protocol. Don't let the value be 'undefined' and "don't care"; you'll never be able to use it if you start out that way.
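That suggestion can be sketched with Python's struct module; the field names and sizes here are invented for illustration: a 3-byte header (type, flags, channel) padded to 4 with a version byte, followed by a 2-byte big-endian payload length.

```python
import struct

# Hypothetical layout: type, flags, channel (3 bytes), version as the pad
# byte, then a 2-byte big-endian payload length.
HEADER = struct.Struct(">BBBBH")
VERSION = 0   # the reserved/pad byte doubles as a protocol version

def pack_message(msg_type, flags, channel, payload):
    return HEADER.pack(msg_type, flags, channel, VERSION, len(payload)) + payload

def unpack_message(data):
    msg_type, flags, channel, version, length = HEADER.unpack_from(data)
    if version != VERSION:
        raise ValueError("incompatible protocol version")  # reject, don't guess
    body = data[HEADER.size:HEADER.size + length]
    return msg_type, flags, channel, body
```

Because the receiver rejects any nonzero version byte today, a future revision can bump it and know that old receivers will fail loudly instead of misparsing the stream.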