While implementing the BitTorrent protocol to communicate with peers and get pieces, I ran into a problem with some incoming peer messages:
The buffer of such a message contains about 200 bytes with the value 255, followed by about 200 seemingly random numbers. The problem is that I can't find a definition for such a payload in the specification. The type of a message is given by the first or fourth byte in the buffer; in my situation both of them are equal to 255, and there is no such message type (available types are: 1-8, 16, 21-23).
Array representation of the buffer:
[255,255,255,255,255,239,254,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,255,255,255,223,255,255,255,255,255,255,255,255,255,255,255,255,255,254,255,255,255,239,255,254,237,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,239,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,251,255,255,255,255,255,255,255,255,255,255,255,255,253,191,255,255,255,255,255,255,253,255,255,255,255,255,255,255,255,255,255,255,249,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,254,191,255,127,255,247,255,255,255,255,255,255,255,255,255,255,255,255,255,0,0,0,5,4,0,0,3,11,0,0,0,5,4,0,0,5,196,0,0,0,5,4,0,0,1,186,0,0,0,5,4,0,0,2,102,0,0,0,5,4,0,0,2,95,0,0,0,5,4,0,0,6,7,0,0,0,5,4,0,0,4,30,0,0,0,5,4,0,0,4,190,0,0,0,5,4,0,0,4,189,0,0,0,5,4,0,0,2,47,0,0,0,5,4,0,0,1,19,0,0,0,5,4,0,0,0,28,0,0,0,5,4,0,0,0,223,0,0,0,5,4,0,0,2,75,0,0,0,5,4,0,0,4,33,0,0,0,5,4,0,0,1,31,0,0,0,5,4,0,0,1,100,0,0,0,5,4,0,0,6,24,0,0,0,5,4,0,0,3,181,0,0,0,5,4,0,0,4,94,0,0,0,5,4,0,0,2,99,0,0,0,5,4,0,0,6,44,0,0,0,5,4,0,0,0,74,0,0,0,5,4,0,0,6,9,0,0,0,1,1]
What you have is a bitfield message that is missing its beginning (the length prefix, the type byte and probably some of the data), followed by 24 have messages and one unchoke message.
255,255,255,255,255,239,254,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,255,255,255,223,255,255,255,255,255,255,255,255,255,255,255,255,255,254,255,255,255,239,255,254,237,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,239,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,251,255,255,255,255,255,255,255,255,255,255,255,255,253,191,255,255,255,255,255,255,253,255,255,255,255,255,255,255,255,255,255,255,249,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,247,255,255,255,255,255,255,255,254,191,255,127,255,247,255,255,255,255,255,255,255,255,255,255,255,255,255,
0,0,0,5,4,0,0,3,11,
0,0,0,5,4,0,0,5,196,
0,0,0,5,4,0,0,1,186,
0,0,0,5,4,0,0,2,102,
0,0,0,5,4,0,0,2,95,
0,0,0,5,4,0,0,6,7,
0,0,0,5,4,0,0,4,30,
0,0,0,5,4,0,0,4,190,
0,0,0,5,4,0,0,4,189,
0,0,0,5,4,0,0,2,47,
0,0,0,5,4,0,0,1,19,
0,0,0,5,4,0,0,0,28,
0,0,0,5,4,0,0,0,223,
0,0,0,5,4,0,0,2,75,
0,0,0,5,4,0,0,4,33,
0,0,0,5,4,0,0,1,31,
0,0,0,5,4,0,0,1,100,
0,0,0,5,4,0,0,6,24,
0,0,0,5,4,0,0,3,181,
0,0,0,5,4,0,0,4,94,
0,0,0,5,4,0,0,2,99,
0,0,0,5,4,0,0,6,44,
0,0,0,5,4,0,0,0,74,
0,0,0,5,4,0,0,6,9,
0,0,0,1,1
A BitTorrent peer-to-peer connection consists of two unidirectional byte streams, one in each direction. When reading the data stream from the receive buffer, don't expect to get exactly one complete message per read. You must split the stream into messages yourself. Also, be prepared for the remote peer to start sending messages immediately after the end of the handshake.
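As a rough illustration of splitting the stream yourself, here is a minimal Python sketch. The class name `MessageSplitter` and its `feed` method are made up for this example. It assumes the handshake has already been consumed and that every later message starts with a 4-byte big-endian length prefix followed by a 1-byte message id.

```python
import struct


class MessageSplitter:
    """Accumulates raw bytes and returns complete length-prefixed messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Add newly received bytes; return a list of (msg_id, payload) tuples.

        msg_id is None for a keep-alive (length prefix of zero).
        """
        self._buf += data
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from(">I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # message incomplete, wait for more bytes
            body = bytes(self._buf[4:4 + length])
            del self._buf[:4 + length]
            if length == 0:
                messages.append((None, b""))
            else:
                messages.append((body[0], body[1:]))
        return messages


# Two have messages and one unchoke taken from the dump above,
# delivered in two arbitrary chunks.
stream = bytes([0, 0, 0, 5, 4, 0, 0, 3, 11,
                0, 0, 0, 5, 4, 0, 0, 5, 196,
                0, 0, 0, 1, 1])
splitter = MessageSplitter()
first = splitter.feed(stream[:7])   # not even one full message yet
rest = splitter.feed(stream[7:])    # the remaining bytes complete all three
```

A real client would probably also reject absurdly large length prefixes before waiting for the data to arrive.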
Some sources say that recv should be given the maximum possible length of a message, like recv(1024):
message0 = str(client.recv(1024).decode('utf-8'))
But other sources say that it should be given the total number of bytes of the message being received. If the message is "hello":
message0 = str(client.recv(5).decode('utf-8'))
What is the correct way of using recv()?
Some sources say ... But other sources say ... message ...
Both sources are wrong.
The argument to recv is the maximum number of bytes one wants to read at once.
With a UDP socket this is the size of the message one wants to read, or larger; a single recv will only return a single message anyway. If the given size is smaller than the message, the message will be truncated and the rest will be discarded.
With a TCP socket (the case you ask about) there is no concept of a message in the first place, since TCP is a byte stream only. recv will simply return the bytes available for reading, up to the given size. Specifically, a single recv in a TCP receiver does not need to match a single send in the sender. It might match, and it often will if the amount of data is small, but there is no guarantee and one should never rely on it.
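One common way to cope with this on a TCP socket is to loop until the expected number of bytes has arrived. The helper name `recv_exact` below is made up for this sketch, and `socket.socketpair()` only stands in for a real connection:

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("connection closed before %d bytes arrived" % n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


a, b = socket.socketpair()
a.sendall(b"hel")                      # two sends on the sending side ...
a.sendall(b"lo")
message = recv_exact(b, 5)             # ... read back as one 5-byte unit
a.close()
b.close()
```

This only works when the receiver knows how many bytes to expect, for example from a fixed-size header or a length prefix.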
... message0 = str(client.recv(5).decode('utf-8'))
Note that calling decode('utf-8') directly on the data returned by recv is a bad idea. One first needs to be sure that all the expected data has been read, and only then call decode('utf-8'). If only part of the data has been read, the end of the read data could be in the middle of a character, since a single character in UTF-8 might be encoded in multiple bytes (everything except ASCII characters). If decode('utf-8') is called on data that ends with an incompletely encoded character, it will raise an exception and your code will break.
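The failure mode is easy to reproduce without a socket. In this small sketch the cut-off point is chosen by hand to land inside a two-byte character:

```python
data = "na\u00efve".encode("utf-8")   # the character U+00EF takes two bytes: 0xC3 0xAF
partial = data[:3]                    # ends in the middle of that character

try:
    partial.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True                     # decoding the truncated data raises

text = data.decode("utf-8")           # decoding the complete data works
```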
I am writing a BitTorrent client, where the application is receiving large blocks of data after requesting pieces from other peers. Sometimes the blocks are larger than piece-length of the torrent.
For example, where the torrent's piece-length is 524288 bytes, some piece requests result in responses that are 1940718596 bytes long.
Also, the message seems valid, as the length encoded in the first four bytes happens to be the same (that large number).
Question: What should I do with that data? Should I ignore the excess bytes (after piece-length)? Or should I write the data into the corresponding files? That is concerning, because it might overwrite the next pieces!
The largest chunk of a piece the protocol allows in a piece message is 16 KB (16384 bytes). So if a peer sent a 1940718596 bytes (1.8 GB) long piece message, the correct response is to disconnect from it.
Also, if a peer sends a piece message that doesn't correspond to a request message you have sent earlier, you shall also disconnect from it.
A peer that receives a request message asking for more than a 16 KB chunk, shall also disconnect the requester. Requesting a whole piece in a single request message is NOT allowed.
A request message that goes outside the end of the piece, is of course, also NOT allowed.
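The rules above could be checked along these lines. This is only a sketch: the function names and the `pending` set are invented for the example, and `piece_size` is assumed to be the real size of the piece in question (the last piece of a torrent may be shorter than piece-length).

```python
MAX_BLOCK_LEN = 16384  # 16 KB, the limit quoted above


def is_valid_request(begin, length, piece_size):
    """Return True if a request stays within the block limit and inside the piece."""
    if length <= 0 or length > MAX_BLOCK_LEN:
        return False
    if begin < 0 or begin + length > piece_size:
        return False
    return True


def is_expected_block(index, begin, block_len, pending):
    """Return True if a received block matches a request that was actually sent.

    pending is a set of (index, begin, length) tuples for outstanding requests.
    """
    return (index, begin, block_len) in pending


pending = {(3, 0, 16384)}
ok = is_expected_block(3, 0, 16384, pending)
bogus = is_expected_block(3, 0, 1940718596, pending)
```

When either check fails, the answer above says to disconnect from the peer.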
While it's possible that you will encounter other peers that don't follow the protocol, the most likely explanation when writing a new client is that the error is on your side.
The most important tool you can use is Wireshark. Look at how other clients behave and compare with yours.
I need to transfer data over a serial port. In order to ensure integrity of the data, I want a small envelope protocol around each protobuf message. I thought about the following:
message type (1 byte)
message size (2 bytes)
protobuf message (N bytes)
(checksum; optional)
The message type will mostly be a mapping between messages defined in proto files. However, if a message gets corrupted or some bytes are lost, the message size will not be correct and all subsequent bytes can no longer be interpreted. One way to solve this would be to introduce delimiters between messages, but for that I need to choose something that is not used by protobuf. Is there a byte sequence that is never used by any protobuf message?
I also thought about a different way. If the master finds out that packages are corrupted, it should reset the communication to a clean start. For that I want the master to send a RESTART command to the slave. The slave should answer with an ACK and then start sending complete messages again. All bytes received between RESTART and ACK are to be discarded by the master. I want to encode ACK and RESTART as special messages. But with that approach I face the same problem: I need to find byte sequences for ACK and RESTART that are not used by any protobuf messages.
Maybe I am also taking the wrong approach - feel free to suggest other approaches to deal with lost bytes.
Is there a byte sequence that is never used by any protobuf message?
No; it is a binary serializer and can contain arbitrary binary payloads (especially in the bytes type). You cannot use sentinel values. Length prefix is fine (your "message size" header), and a checksum may be a pragmatic option. Alternatively, you could impose an artificial sentinel to follow each message (maybe a guid chosen per-connection as part of the initial handshake), and use that to double-check that everything looks correct.
One way to help recover packet synchronization after a rare problem is to use synchronization words in the beginning of the message, and use the checksum to check for valid messages.
This means that you put a constant value, e.g. 0x12345678, before your message type field. Then if a message fails checksum check, you can recover by finding the next 0x12345678 in your data.
Even though that value could sometimes occur in the middle of the message, it doesn't matter much. The checksum check will very probably catch that there isn't a real message at that position, and you can search forwards until you find the next marker.
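A possible shape for this scheme is sketched below. The frame layout (marker, 1-byte type, 2-byte big-endian size, payload, CRC-32 from `zlib`) and the function names are assumptions made for illustration. The marker value is the example constant from the text.

```python
import struct
import zlib

SYNC = bytes.fromhex("12345678")   # example marker value from the text
HEADER = struct.Struct(">BH")      # message type (1 byte), payload size (2 bytes)


def encode_frame(msg_type, payload):
    """Build marker + header + payload + CRC-32 over header and payload."""
    body = HEADER.pack(msg_type, len(payload)) + payload
    return SYNC + body + struct.pack(">I", zlib.crc32(body))


def extract_frames(buf):
    """Remove and return all valid (msg_type, payload) frames from bytearray buf.

    Bytes that do not belong to a valid frame are skipped; an incomplete
    frame at the end is left in buf until more data arrives.
    """
    frames = []
    while True:
        start = buf.find(SYNC)
        if start < 0:
            # keep the last 3 bytes: they could be the start of a marker
            del buf[:max(0, len(buf) - (len(SYNC) - 1))]
            return frames
        del buf[:start]                       # drop garbage before the marker
        if len(buf) < len(SYNC) + HEADER.size:
            return frames                     # header not complete yet
        msg_type, size = HEADER.unpack_from(buf, len(SYNC))
        end = len(SYNC) + HEADER.size + size + 4
        if len(buf) < end:
            return frames                     # frame not complete yet
        body = bytes(buf[len(SYNC):end - 4])
        (crc,) = struct.unpack_from(">I", buf, end - 4)
        if zlib.crc32(body) == crc:
            frames.append((msg_type, body[HEADER.size:]))
            del buf[:end]
        else:
            del buf[:1]                       # not a real frame: search onwards


good1 = encode_frame(1, b"abc")
bad = bytearray(encode_frame(2, b"xyz"))
bad[-5] ^= 0xFF                               # corrupt the last payload byte
good2 = encode_frame(3, b"")
rx = bytearray(b"\x00\xff" + good1 + bytes(bad) + good2)
frames = extract_frames(rx)
```

One limitation of this sketch: a corrupted size field can make the parser wait for up to 64 KB before it notices the bad checksum. Capping the accepted payload size would shorten that wait.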
I'm writing a small app to test out how torrent p2p works and I created a sample torrent and am seeding it from my Deluge client. From my app I'm trying to connect to Deluge and download the file.
The torrent in question is a single-file torrent (file is called A - without any extension), and its data is the ASCII string Test.
Referring to this I was able to submit the initial handshake and also get a valid response back.
Immediately afterwards Deluge is sending even more data. From the 5th byte it would seem like it is a bitfield message, but I'm not sure what to make of it. I read that torrent clients may send a mixture of Bitfield and Have messages to show which parts of the torrent they possess. (My client isn't sending any bitfield, since it assumes it doesn't have any part of the file in question.)
If my understanding is correct, it's stating that the message size is 2: one for identifier + payload. If that's the case why is it sending so much more data, and what's that supposed to be?
Same thing happens after my app sends an interested command. Deluge responds with a 1-byte message of unchoke (but then again appends more data).
And finally when it actually submits the piece, I'm not sure what to make of the data. The first underlined byte is 84 which corresponds to the letter T, as expected, but I cannot make much more sense of the rest of the data.
Note that the link in question does not really specify how the clients should supply messages in order once the initial handshake is completed. I just assumed to send interested and request based on what seemed to make sense to me, but I might be completely off.
I don't think Deluge is sending the additional bytes you're seeing.
If you look at them, you'll notice that all of the extra bytes are bytes that already existed in the handshake message, which should have been the longest message you received so far.
I think you're reading new messages into the same buffer, without zeroing it out or anything, so you're seeing bytes from earlier messages again, following the bytes of the latest message you read.
Consider checking if the network API you're using has a way to check the number of bytes actually received, and only look at that slice of the buffer, not the entire thing.
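In Python, for instance, `socket.recv_into` returns the number of bytes written into the buffer. The sketch below pre-fills a reused buffer to imitate an earlier 68-byte read, with `socket.socketpair()` standing in for the peer connection:

```python
import socket

a, b = socket.socketpair()
buf = bytearray(b"H" * 68)            # reused buffer still holding an earlier 68-byte message

a.sendall(bytes([0, 0, 0, 1, 1]))     # a 5-byte unchoke message
n = b.recv_into(buf)                  # number of bytes actually received
message = bytes(buf[:n])              # only this slice is new data
stale = bytes(buf[n:])                # leftovers from the earlier read
a.close()
b.close()
```

Everything after the first `n` bytes is old content and should not be parsed as part of the new message.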
I'm a bit confused about the bitfield message in BitTorrent. I have noted my confusion in the form of questions below.
Optional vs Required
Bitfield to be sent immediately after the handshaking sequence is completed
I'm assuming this is compulsory i.e after handshake there must follow a bitfield message. Correct?
When to expect bitfield?
The bitfield message may only be sent immediately after the handshaking sequence is completed, and before any other messages are sent
Assuming I read this correctly: although it is an optional message, a peer can still send the bitfield message before any other message (like request, choke, unchoke etc.). Correct?
The high bit in the first byte corresponds to piece index 0
If I'm correct, the bitfield represents the state, i.e. whether or not the peer has a given piece.
Assuming that my bitfield is [1,1,1,1,1,1,1,1,1,1 ..]. I establish the fact that the peer has 10th piece missing and if the bitfield look like this [1,1,0,1,1,1,1,1,1,1 ..] the peer has a 3rd piece missing. Then what does "the high bit in the first byte corresponds to piece index 0" mean?
Spare bits
Spare bits at the end are set to zero
What does this mean? If a bit at the end is 0, doesn't that mean that the peer is missing that piece? Why is the spare bit used?
Most important of all: what is the purpose of the bitfield?
My hunch is that the bitfield makes it easier to find the right peer for a piece, by knowing which pieces each peer has available. Am I correct on this?
@Encombe
Here is how my bitfield payload looks:
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFE
I'm assuming this is compulsory i.e after handshake there must follow a bitfield message. Correct?
No, the bitfield message is optional, but if a client sends it, it MUST be the first message after the handshake.
Also, both peers must have sent their complete handshakes (i.e. the handshaking sequence is completed) before either of them starts to send any type of regular message, including the bitfield message.
Assuming I read this correctly: although it is an optional message, a peer can still send the bitfield message before any other message (like request, choke, unchoke etc.). Correct?
Yes, see above. If a client sends a bitfield message anywhere else the connection must be closed.
Assuming that my bitfield is [1,1,1,1,1,1,1,1,1,1 ..]. I establish the fact that the peer has 10th piece missing
No. It's unclear to me if your numbers are bits (0b1111111111) or bytes (0x01010101010101010101).
If it's bits (0b1111111111): It means the peer has pieces 0 to 9
If it's bytes (0x01010101010101010101): It means the peer has pieces 7, 15, 23, 31, 39, 47, 55, 63, 71 and 79
if the bitfield look like this [1,1,0,1,1,1,1,1,1,1 ..] the peer has a 3rd piece missing.
No, pieces are zero indexed. 0b1101111111: means piece 2 is missing.
Then what does "the high bit in the first byte corresponds to piece index 0" mean?
It means that the piece with index 0 is represented by the leftmost bit. (Most significant bit in big-endian.)
. eight bits = one byte
. 0b10000000 = 0x80
.   ^ high bit set, meaning that the client has piece 0
. 0b00000001 = 0x01
.          ^ low bit set, meaning that the client has piece 7
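The bit numbering described above can be turned into a small lookup function. This is a sketch, and the name `has_piece` is made up for the example:

```python
def has_piece(bitfield, index):
    """Return True if the bit for piece `index` is set in the bitfield payload."""
    byte_index, bit_offset = divmod(index, 8)
    # 0x80 is the high bit; shift it right to reach the bit for this piece
    return bool(bitfield[byte_index] & (0x80 >> bit_offset))


# 0b1101111111 padded with spare zero bits to two bytes: 11011111 11000000
example = bytes([0b11011111, 0b11000000])
missing = [i for i in range(10) if not has_piece(example, i)]
```

Here `missing` lists the zero-based indexes of the pieces the peer lacks among the first ten.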
why is the spare bit used
If the number of pieces in the torrent is not evenly divisible by eight; there will be bits over, that don't represent any pieces, in the last byte of the bitfield. Those bits must be set to zero.
The size of the bitfield in bytes can be calculated this way:
size_bitfield = math.ceil( number_of_pieces / 8 )
and the number of spare bits is:
spare_bits = 8 * size_bitfield - number_of_pieces
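Putting the two formulas together, a receiver could check an incoming bitfield like this. It is a sketch only, and the function names are invented for the example:

```python
import math


def bitfield_size(number_of_pieces):
    """Number of bytes needed for a bitfield covering number_of_pieces pieces."""
    return math.ceil(number_of_pieces / 8)


def spare_bits_are_zero(bitfield, number_of_pieces):
    """Return True if the bitfield has the expected length and its spare bits are zero."""
    if len(bitfield) != bitfield_size(number_of_pieces):
        return False
    spare_bits = 8 * len(bitfield) - number_of_pieces
    if spare_bits == 0:
        return True
    mask = (1 << spare_bits) - 1        # the lowest bits of the last byte
    return (bitfield[-1] & mask) == 0


# 10 pieces need 2 bytes, leaving 6 spare bits in the last byte
valid = spare_bits_are_zero(bytes([0xFF, 0xC0]), 10)
invalid = spare_bits_are_zero(bytes([0xFF, 0xC1]), 10)
```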
what is the purpose of the bitfield
The purpose is to tell which pieces the client has, so the other peer knows which pieces it can request.