BitTorrent p2p - What to do with very large blocks of data read from a peer's piece message (larger than piece-length)?

I am writing a BitTorrent client, and the application receives large blocks of data after requesting pieces from other peers. Sometimes the blocks are larger than the torrent's piece-length.
For example, with a torrent piece-length of 524288 bytes, some piece requests result in responses 1940718596 bytes long.
The message also seems valid, since the length encoded in its first four bytes is that same large number.
Question: What should I do with that data? Should I ignore the excess bytes (after piece-length), or should I write the data into the corresponding files? The latter is concerning because it might overwrite the next pieces!

The largest chunk of a piece the protocol allows in a piece message is 16 KB (16384 bytes). So if a peer sends a piece message 1940718596 bytes (1.8 GB) long, the correct response is to disconnect from it.
Also, if a peer sends a piece message that doesn't correspond to a request message you sent earlier, you should disconnect from it as well.
A peer that receives a request message asking for more than a 16 KB chunk should also disconnect the requester. Requesting a whole piece in a single request message is NOT allowed.
A request message that extends past the end of the piece is, of course, also NOT allowed.
While it's possible that you will encounter other peers that don't follow the protocol, the most likely explanation when writing a new client is that the error is on your side.
The most important tool you can use is Wireshark. Look at how other clients behave and compare with yours.
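The checks described in this answer could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `check_piece_message`, `ProtocolError` and the `outstanding` set of `(index, begin, length)` tuples are hypothetical, and the function assumes the length prefix and message body have already been read from the stream. In a real client the length check would run before the body is read at all.

```python
import struct

MAX_BLOCK = 16 * 1024  # 16384 bytes, the limit quoted in the answer
PIECE_ID = 7           # message id of a piece message


class ProtocolError(Exception):
    """Signals that the connection to this peer should be dropped."""


def check_piece_message(length_prefix, body, outstanding, piece_length):
    """Validate one piece message before its block is written anywhere.

    length_prefix: the 4-byte big-endian length read from the stream
    body:          the bytes after the prefix (id, index, begin, block)
    outstanding:   set of (index, begin, length) tuples we have requested
    """
    # id (1 byte) + index (4) + begin (4) + at most one 16 KB block
    if length_prefix > 9 + MAX_BLOCK:
        raise ProtocolError("length prefix %d exceeds the block limit" % length_prefix)
    if length_prefix < 9 or len(body) != length_prefix:
        raise ProtocolError("body does not match the length prefix")
    msg_id, index, begin = struct.unpack(">BII", body[:9])
    if msg_id != PIECE_ID:
        raise ProtocolError("not a piece message")
    block = body[9:]
    if begin + len(block) > piece_length:
        raise ProtocolError("block runs past the end of the piece")
    key = (index, begin, len(block))
    if key not in outstanding:
        raise ProtocolError("block was never requested")
    outstanding.remove(key)
    return index, begin, block
```

With these checks in place, a length prefix of 1940718596 is rejected immediately instead of being treated as data to write to disk.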

Related

How to prepare the "correct" buffer size when receiving netlink responses?

While implementing a netlink (rtnetlink) module, I ran into a problem:
As with UDP, part of the message (packet) is lost when the receive buffer is not big enough (e.g. when the buffer is 1024 bytes and you received 1024 bytes).
So I wondered how to prepare the "correct" receive buffer.
Initially I had the idea to MSG_PEEK just the nlmsghdr, extract the message size from it, and then do the final receive.
As a safety measure I allocated one byte more, just to be able to complain if the received data filled the whole buffer.
Unfortunately it did!
Is there an algorithm that will work, not needing a ridiculously huge receive buffer?
Example:
nlmsg_len was 1276, so I tried to receive 1277 bytes, and I did.
So on the next attempt I blindly added 2000 bytes to the receive buffer, and the result was 2552 bytes long (1276 bytes longer than expected).
As said above I think I cannot "continue" to receive a longer message using multiple recvs, so I must receive all at once.
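One possible reading of the numbers in the example, offered as an assumption rather than something the question confirms: 2552 is exactly 2 × 1276, and a single netlink datagram can carry several messages back to back, each with its own nlmsghdr. If that is what happened here, nlmsg_len of the first header describes only the first message, not the whole datagram. The sketch below walks such a buffer in plain Python; `split_netlink_datagram` is a hypothetical helper, and the header layout (length, type, flags, sequence, pid in native byte order, messages aligned to 4 bytes) follows struct nlmsghdr.

```python
import struct

# nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16), nlmsg_seq (u32), nlmsg_pid (u32)
NLMSG_HDR = struct.Struct("=IHHII")


def nlmsg_align(n):
    """Round n up to the 4-byte boundary used between netlink messages."""
    return (n + 3) & ~3


def split_netlink_datagram(data):
    """Return (type, payload) for each complete message in one datagram."""
    messages = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # truncated or malformed: stop rather than guess
        payload = data[offset + NLMSG_HDR.size:offset + length]
        messages.append((msg_type, payload))
        offset += nlmsg_align(length)
    return messages
```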

Node js peerwire protocol implementation

While implementing the BitTorrent protocol to communicate with peers and get pieces, I ran into a problem with some incoming peer messages:
The buffer of such a message contains about 200 "255" values followed by about 200 seemingly random numbers. The problem is that I can't find a definition for such a payload in the specification. The message type is given by the first or fourth byte of the buffer; either way, in my case both are equal to 255, and there is no such message type (the available types are: 1-8, 16, 21-23).
Array representation of the buffer:
[255,255,255,255,255,239,254,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,255,255,255,223,255,255,255,255,255,255,255,255,255,255,255,255,255,254,255,255,255,239,255,254,237,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,239,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,251,255,255,255,255,255,255,255,255,255,255,255,255,253,191,255,255,255,255,255,255,253,255,255,255,255,255,255,255,255,255,255,255,249,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,254,191,255,127,255,247,255,255,255,255,255,255,255,255,255,255,255,255,255,0,0,0,5,4,0,0,3,11,0,0,0,5,4,0,0,5,196,0,0,0,5,4,0,0,1,186,0,0,0,5,4,0,0,2,102,0,0,0,5,4,0,0,2,95,0,0,0,5,4,0,0,6,7,0,0,0,5,4,0,0,4,30,0,0,0,5,4,0,0,4,190,0,0,0,5,4,0,0,4,189,0,0,0,5,4,0,0,2,47,0,0,0,5,4,0,0,1,19,0,0,0,5,4,0,0,0,28,0,0,0,5,4,0,0,0,223,0,0,0,5,4,0,0,2,75,0,0,0,5,4,0,0,4,33,0,0,0,5,4,0,0,1,31,0,0,0,5,4,0,0,1,100,0,0,0,5,4,0,0,6,24,0,0,0,5,4,0,0,3,181,0,0,0,5,4,0,0,4,94,0,0,0,5,4,0,0,2,99,0,0,0,5,4,0,0,6,44,0,0,0,5,4,0,0,0,74,0,0,0,5,4,0,0,6,9,0,0,0,1,1]
What you have is the tail of a bitfield message (its length prefix, type byte and probably some of its data are missing), followed by 24 have messages and one unchoke message:
255,255,255,255,255,239,254,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,255,255,255,223,255,255,255,255,255,255,255,255,255,255,255,255,255,254,255,255,255,239,255,254,237,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,239,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,251,255,255,255,255,255,255,255,255,255,255,255,255,253,191,255,255,255,255,255,255,253,255,255,255,255,255,255,255,255,255,255,255,249,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,254,191,255,127,255,247,255,255,255,255,255,255,255,255,255,255,255,255,255,
0,0,0,5,4,0,0,3,11,
0,0,0,5,4,0,0,5,196,
0,0,0,5,4,0,0,1,186,
0,0,0,5,4,0,0,2,102,
0,0,0,5,4,0,0,2,95,
0,0,0,5,4,0,0,6,7,
0,0,0,5,4,0,0,4,30,
0,0,0,5,4,0,0,4,190,
0,0,0,5,4,0,0,4,189,
0,0,0,5,4,0,0,2,47,
0,0,0,5,4,0,0,1,19,
0,0,0,5,4,0,0,0,28,
0,0,0,5,4,0,0,0,223,
0,0,0,5,4,0,0,2,75,
0,0,0,5,4,0,0,4,33,
0,0,0,5,4,0,0,1,31,
0,0,0,5,4,0,0,1,100,
0,0,0,5,4,0,0,6,24,
0,0,0,5,4,0,0,3,181,
0,0,0,5,4,0,0,4,94,
0,0,0,5,4,0,0,2,99,
0,0,0,5,4,0,0,6,44,
0,0,0,5,4,0,0,0,74,
0,0,0,5,4,0,0,6,9,
0,0,0,1,1
A BitTorrent peer-to-peer connection consists of two unidirectional byte streams, one in each direction. When reading the data stream from the receive buffer, don't expect to get exactly one complete message per read. You must split the stream into messages yourself. Also, be prepared for the remote peer to start sending messages immediately after the end of the handshake.
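Splitting the stream into messages could look roughly like this. It is a sketch under a few assumptions: the handshake has already been consumed, each message is a 4-byte big-endian length prefix followed by a one-byte id and a payload, and `MessageSplitter` with its `max_length` sanity cap is a hypothetical name and value, not something taken from the specification.

```python
import struct


class MessageSplitter:
    """Accumulates received bytes and returns complete (id, payload) messages."""

    def __init__(self, max_length=1 << 20):
        self.buffer = bytearray()
        self.max_length = max_length  # assumed sanity cap on the length prefix

    def feed(self, chunk):
        """Add freshly received bytes; return every message completed so far."""
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= 4:
            (length,) = struct.unpack_from(">I", self.buffer, 0)
            if length > self.max_length:
                raise ValueError("implausible length prefix %d" % length)
            if len(self.buffer) < 4 + length:
                break  # message incomplete: wait for the next read
            body = bytes(self.buffer[4:4 + length])
            del self.buffer[:4 + length]
            if length == 0:
                messages.append((None, b""))  # keep-alive has no id
            else:
                messages.append((body[0], body[1:]))
        return messages
```

Each call to `feed` takes whatever a single read delivered, whether that is half a message or twenty of them, and leftover bytes stay buffered until the rest arrives.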

Unused bytes in protobuf encoding (for delimiter implementation)

I need to transfer data over a serial port. In order to ensure integrity of the data, I want a small envelope protocol around each protobuf message. I thought about the following:
message type (1 byte)
message size (2 bytes)
protobuf message (N bytes)
(checksum; optional)
The message type will mostly be a mapping between messages defined in proto files. However, if a message gets corrupted or some bytes are lost, the message size will not be correct and none of the subsequent bytes can be interpreted anymore. One way to solve this would be to introduce delimiters between messages, but for that I need to choose something that is not used by protobuf. Is there a byte sequence that is never used by any protobuf message?
I also thought about a different way. If the master finds out that packages are corrupted, it should reset the communication to a clean start. For that I want the master to send a RESTART command to the slave. The slave should answer with an ACK and then start sending complete messages again. All bytes received between RESTART and ACK are to be discarded by the master. I want to encode ACK and RESTART as special messages. But with that approach I face the same problem: I need to find byte sequences for ACK and RESTART that are not used by any protobuf messages.
Maybe I am also taking the wrong approach - feel free to suggest other approaches to deal with lost bytes.
Is there a byte sequence that is never used by any protobuf message?
No; it is a binary serializer and can contain arbitrary binary payloads (especially in the bytes type). You cannot use sentinel values. Length prefix is fine (your "message size" header), and a checksum may be a pragmatic option. Alternatively, you could impose an artificial sentinel to follow each message (maybe a guid chosen per-connection as part of the initial handshake), and use that to double-check that everything looks correct.
One way to help recover packet synchronization after a rare problem is to use synchronization words in the beginning of the message, and use the checksum to check for valid messages.
This means that you put a constant value, e.g. 0x12345678, before your message type field. Then if a message fails checksum check, you can recover by finding the next 0x12345678 in your data.
Even though that value could sometimes occur in the middle of the message, it doesn't matter much. The checksum check will very probably catch that there isn't a real message at that position, and you can search forwards until you find the next marker.
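The synchronization-word approach could be sketched as follows. The frame layout combines the envelope from the question (1-byte type, 2-byte size, payload, checksum) with the 0x12345678 marker from this answer; the choice of CRC-32 via `zlib.crc32`, the big-endian field order, and the names `encode_frame` and `decode_frames` are assumptions made for illustration.

```python
import struct
import zlib

SYNC = struct.pack(">I", 0x12345678)  # marker value taken from the answer


def encode_frame(msg_type, payload):
    """Build sync word + type (1 byte) + size (2 bytes) + payload + CRC-32."""
    header = struct.pack(">BH", msg_type, len(payload))
    crc = zlib.crc32(header + payload)
    return SYNC + header + payload + struct.pack(">I", crc)


def decode_frames(buffer):
    """Extract valid frames from a bytearray, consuming processed bytes.

    Returns a list of (msg_type, payload). Bytes that may belong to a frame
    still in flight are left in the buffer for the next call.
    """
    frames = []
    while True:
        start = buffer.find(SYNC)
        if start < 0:
            # Keep the last 3 bytes: they could be the start of a sync word.
            del buffer[:max(0, len(buffer) - 3)]
            return frames
        del buffer[:start]  # drop garbage in front of the marker
        if len(buffer) < 7:
            return frames  # header not complete yet
        msg_type, size = struct.unpack_from(">BH", buffer, 4)
        end = 7 + size + 4
        if len(buffer) < end:
            return frames  # wait for the rest of the frame
        (crc,) = struct.unpack_from(">I", buffer, 7 + size)
        if zlib.crc32(bytes(buffer[4:7 + size])) == crc:
            frames.append((msg_type, bytes(buffer[7:7 + size])))
            del buffer[:end]
        else:
            del buffer[:1]  # false or damaged marker: skip it and search again
```

One trade-off of this sketch: a false marker followed by a large bogus size makes the decoder wait for that many bytes before the checksum can expose it, so a cap on the size field may be worth adding.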

What is BitTorrent peer (Deluge) saying?

I'm writing a small app to test out how torrent p2p works and I created a sample torrent and am seeding it from my Deluge client. From my app I'm trying to connect to Deluge and download the file.
The torrent in question is a single-file torrent (file is called A - without any extension), and its data is the ASCII string Test.
Referring to this I was able to submit the initial handshake and also get a valid response back.
Immediately afterwards Deluge is sending even more data. From the 5th byte it would seem like it is a bitfield message, but I'm not sure what to make of it. I read that torrent clients may send a mixture of Bitfield and Have messages to show which parts of the torrent they possess. (My client isn't sending any bitfield, since it is assuming not to have any part of the file in question).
If my understanding is correct, it's stating that the message size is 2: one for identifier + payload. If that's the case why is it sending so much more data, and what's that supposed to be?
Same thing happens after my app sends an interested command. Deluge responds with a 1-byte message of unchoke (but then again appends more data).
And finally when it actually submits the piece, I'm not sure what to make of the data. The first underlined byte is 84 which corresponds to the letter T, as expected, but I cannot make much more sense of the rest of the data.
Note that the link in question does not really specify how the clients should supply messages in order once the initial handshake is completed. I just assumed to send interested and request based on what seemed to make sense to me, but I might be completely off.
I don't think Deluge is sending the additional bytes you're seeing.
If you look at them, you'll notice that all of the extra bytes are bytes that already existed in the handshake message, which should have been the longest message you received so far.
I think you're reading new messages into the same buffer, without zeroing it out or anything, so you're seeing bytes from earlier messages again, following the bytes of the latest message you read.
Consider checking if the network API you're using has a way to check the number of bytes actually received, and only look at that slice of the buffer, not the entire thing.
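The effect can be reproduced with a reused buffer, shown here as a small Python sketch (the asker's language and network API are not stated, so this is only an illustration). A local `socket.socketpair()` stands in for the connection to Deluge, and the 68-byte first message stands in for the handshake.

```python
import socket

# A connected pair of local sockets stands in for the peer connection.
peer, client = socket.socketpair()

buffer = bytearray(68)  # one receive buffer, reused for every read

peer.sendall(b"H" * 68)  # long first message (handshake-sized)
received = 0
while received < 68:     # a stream may deliver it in several reads
    received += client.recv_into(memoryview(buffer)[received:])

peer.sendall(b"\x00\x00\x00\x01\x01")  # short second message: unchoke
count = client.recv_into(buffer)       # number of bytes actually received

whole_buffer = bytes(buffer)   # fresh bytes followed by stale 'H' bytes
fresh = bytes(buffer[:count])  # only what this read delivered

peer.close()
client.close()
```

Inspecting `whole_buffer` reproduces the symptom from the question, with leftovers of the earlier message trailing the new one, while `fresh` contains only the bytes of the latest read.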

Nodejs UDP socket send with array larger than MTU size

I am currently using Node.js to send UDP packets to an Arduino with a Wiznet820io. I have successfully managed to send smallish byte arrays (500 bytes in length); however, when I try to send a byte array greater than 1470 bytes, I get nothing on the Arduino. I did some research and determined that it was silently failing due to the MTU size restriction.
So I attempted to split the data array into multiple arrays not exceeding 1470 bytes and send them via a basic for loop. However, when doing this I noticed that only the first packet would get sent unless I waited ~10 ms before sending the next packet. I believe this is due to the send function attempting to send the data before the previous data has been sent; however, I may be wrong in my understanding. Having the delay significantly reduces the speed of the server, which is an issue as I am attempting to stream video.
Is there a proper way to split and send packets over UDP using dgram.send? Am I correct in my understanding of why only the first packet was received? I am new to sockets so any help would be awesome :D
Cheers
Steve
