Understanding BitTorrent Tracker Request

I'm reading about the BitTorrent request parameters that need to be sent to the tracker's announce URL, over here.
Question 1:
The left parameter:
left: The number of bytes this client still has to download, in base ten ASCII. Clarification: the number of bytes needed to download to be 100% complete and get all the included files in the torrent.
BEP 3 also states:
The number of bytes this peer still has to download, encoded in base
ten ascii. Note that this can't be computed from downloaded and the
file length since it might be a resume, and there's a chance that some
of the downloaded data failed an integrity check and had to be
re-downloaded.
Now, if I'm starting my torrent download (or at any later point), what value should I give to left?
Question 2:
Nowhere in the spec could I find how often the client should re-announce to get an updated list of peers.
Any word on that?
I later found the answer to this in the interval and min interval keys of the tracker response.
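For reference, after bencode-decoding, a tracker response is a dictionary; the shape below is only an illustration with made-up values, where interval is defined by BEP 3 and min interval is a widely used but less formally specified extension:

    # Illustration only: a decoded announce response, with made-up values.
    # "interval" is from BEP 3; "min interval", "complete" and "incomplete"
    # are common extensions rather than part of the original spec.
    decoded_response = {
        b"interval": 1800,       # seconds to wait between regular announces
        b"min interval": 900,    # do not re-announce more often than this
        b"complete": 42,         # number of seeders
        b"incomplete": 7,        # number of leechers
        b"peers": b"...",        # compact peer string (BEP 23) or list of dicts (BEP 3)
    }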

Answer 1:
The left= value you send in an announce is the number of bytes you still need to download to get all the pieces in the torrent, regardless of whether you intend to download all of the files in it or not.
So if you start downloading a torrent from scratch, where all the files in it have a total size of 1 234 567 890 bytes:
You send in the first announce: left=1234567890, downloaded=0 and uploaded=0
Even if you only want to download 567 890 123 bytes from that torrent:
You still send in the first announce: left=1234567890, downloaded=0 and uploaded=0
Then, by the time of the second announce, you have managed to download 234 524 288 bytes without any hash failures and uploaded 87 654 400 bytes to other peers;
You send in the second announce:
left=1000043602, downloaded=234524288 and uploaded=87654400
By the third announce, you have downloaded a further 258 786 432 bytes that passed the hash check and uploaded a further 98 762 752 bytes to other peers, but this time there were 3 hash failures (piece size: 262 144 bytes);
You send in the third announce:
left=741257170, downloaded=493310720, uploaded=186417152 and corrupt=786432
(A client that doesn't send corrupt would instead send downloaded=494097152.)
Fourth announce: [TODO]
Caveat: other than the official BEP 3, how this is done is mostly undocumented convention, and the answer above is based on best-effort inspection of common clients with Wireshark.
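To make the bookkeeping above concrete, here is a minimal sketch (a hypothetical helper, not taken from any real client) that tracks the counters from the example and derives the announce parameters from them:

    # Minimal sketch of the announce counters from the example above.
    # The numbers and piece size are the example's own; the helper itself
    # is hypothetical, not taken from any real client.
    total_size = 1_234_567_890   # sum of all file lengths in the torrent
    piece_size = 262_144

    downloaded = 0   # bytes downloaded that passed the hash check
    uploaded = 0     # bytes sent to other peers
    corrupt = 0      # bytes that failed the hash check (optional parameter)

    def announce_params():
        return {
            "left": total_size - downloaded,   # always relative to the whole torrent
            "downloaded": downloaded,
            "uploaded": uploaded,
            "corrupt": corrupt,
        }

    # First announce, starting from scratch:
    print(announce_params())   # left=1234567890, downloaded=0, uploaded=0

    # Second announce: 234 524 288 good bytes down, 87 654 400 up.
    downloaded += 234_524_288
    uploaded += 87_654_400
    print(announce_params())   # left=1000043602, downloaded=234524288, uploaded=87654400

    # Third announce: 258 786 432 more good bytes, 98 762 752 more up,
    # plus 3 whole pieces that failed the hash check and must be re-downloaded.
    downloaded += 258_786_432
    uploaded += 98_762_752
    corrupt += 3 * piece_size
    print(announce_params())   # left=741257170, downloaded=493310720,
                               # uploaded=186417152, corrupt=786432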

Related

Specification of client uploads in (Tight)VNC/RFB?

I'm looking to understand how file transfers work in VNC/TightVNC/RFB.
In https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst#serverinit I see there is mention of certain client messages that look relevant if using the Tight Security Type, e.g.
132 "TGHT" "FTC_UPRQ" File Upload Request
133 "TGHT" "FTC_UPDT" File Upload Data
But I don't see any detail on how these messages are used in the protocol.
At https://www.tightvnc.com/ there is lots of information on usage, but so far I haven't found anything about the protocol itself.
How do the file transfers work? As in, what are the low-level details of the messages sent in both directions to initiate and complete an upload from the client to the server?
(Ultimately I am looking to implement this, say in NoVNC, but I'm quite a few steps away from any coding at this point)
Looking in the source for UltraVNC, there is another protocol, based on a message type of 7, to initiate a file transfer. That message type is listed in the RFB specification, although no detail is given beyond noting that type 7 relates to "FileTransfer".
This is a very partial answer from looking at the code of
https://github.com/LibVNC/libvncserver/blob/5deb43e2837e05e5e24bd2bfb458ae6ba930bdaa/libvncserver/tightvnc-filetransfer/rfbtightserver.c
https://github.com/LibVNC/libvncserver/blob/5deb43e2837e05e5e24bd2bfb458ae6ba930bdaa/libvncserver/tightvnc-filetransfer/handlefiletransferrequest.c
https://github.com/TurboVNC/tightvnc/blob/a235bae328c12fd1c3aed6f3f034a37a6ffbbd22/vnc_winsrc/vncviewer/FileTransfer.cpp
I think that an upload is initiated by the client (a sketch of serializing this message follows, after the list):
client -> server, 1 byte = 132: messages type of a file upload request
client -> server, 1 byte: compression level, where 0 is not compressed, and I don't think libvnc supports anything other than 0(?)
client -> server, 2 bytes big endian integer: the length of the file name
client -> server, 4 bytes big endian: the "position" - not sure what this is, but I suspect libvnc either ignores it, or there is a bug in libvnc where, on little endian systems (e.g. Intel), this might break in some situations if it isn't zero, since there seems to be some code that assumes it's 2 bytes: https://github.com/LibVNC/libvncserver/blob/5deb43e2837e05e5e24bd2bfb458ae6ba930bdaa/libvncserver/tightvnc-filetransfer/handlefiletransferrequest.c#L401.
When uploading, TightVNC's "legacy" code also seems to set this as zero https://github.com/TurboVNC/tightvnc/blob/a235bae328c12fd1c3aed6f3f034a37a6ffbbd22/vnc_winsrc/vncviewer/FileTransfer.cpp#L552
client -> server, "length of the file name" bytes: the file name itself
I'm not sure how the server responds to say "yes, that's fine". See below for how the server can say "that's not fine" and abort
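If that layout is right, serializing the request could look roughly like this (a sketch under the assumptions above, not a confirmed wire format; the UTF-8 filename encoding in particular is a guess):

    import struct

    def build_upload_request(filename, position=0, compression_level=0):
        # Sketch only, following the field layout guessed above from the libvnc
        # sources; nothing here is confirmed by a formal spec.
        name = filename.encode("utf-8")             # encoding is an assumption
        return (
            struct.pack("!B", 132)                  # message type: File Upload Request
            + struct.pack("!B", compression_level)  # 0 = uncompressed (only value libvnc seems to handle)
            + struct.pack("!H", len(name))          # 2 bytes, big endian: filename length
            + struct.pack("!I", position)           # 4 bytes, big endian: "position" (meaning unclear; 0 in practice)
            + name
        )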
And then I think that uploads are done in chunks of at most 64 KB (the byte counts are limited to 16 bits). So each chunk is:
client -> server, 1 byte = 133: message type of file upload data request
client -> server, 1 byte: compression level, where 0 is not compressed, and I don't think libvnc supports anything other than 0(?)
client -> server, 2 bytes big endian: the uncompressed(?) size of the upload data
client -> server, 2 bytes big endian: the compressed size of the upload data. I think for libvnc since compression isn't supported, this has to equal the uncompressed size
client -> server, "compressed size" bytes: the data of the current chunk the file itself
not sure how the server acknowledges that this is all fine
Then, once all the data has been uploaded, there is a final empty chunk followed by the modification/access time of the file (both the data chunk and this terminator are sketched after this list):
client -> server, 1 byte = 133: message type of file upload data request
client -> server, 1 byte: compression level, where 0 is not compressed, and I don't think libvnc supports anything other than 0(?)
client -> server, 2 bytes = 0: the uncompressed(?) size of the upload data
client -> server, 2 bytes = 0: the compressed size of the upload data
client -> server, 4 bytes: the modification and access time of the file. libvnc sets both to be the same, and interestingly there doesn't seem to be a conversion from the endianness of the messages to the endianness of the system.
and as per the other parts of upload, not sure how the server acknowledges that this has been successful
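Under the same assumptions, the data chunks and the terminating chunk might be built like this (again a sketch of the guessed layout, not a confirmed protocol):

    import struct
    import time

    def build_upload_data_chunk(chunk, compression_level=0):
        # One File Upload Data message as described above. With compression
        # level 0, the compressed and uncompressed sizes are simply equal.
        if len(chunk) > 0xFFFF:
            raise ValueError("both size fields are 16 bits, so chunks max out at 65535 bytes")
        return struct.pack("!BBHH", 133, compression_level, len(chunk), len(chunk)) + chunk

    def build_upload_end(mtime=None):
        # Terminating chunk: both sizes zero, followed by the file's
        # modification/access time (libvnc appears to use one value for both).
        if mtime is None:
            mtime = int(time.time())
        return struct.pack("!BBHHI", 133, 0, 0, 0, mtime)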
If the client wants to cancel the upload:
client -> server, 1 byte = 135: message type of file upload failed
client -> server, 1 byte: unused(?)
client -> server, 1 byte: length of reason
client -> server, "length of reason" bytes: human readable reason of why the upload was cancelled
If the server wants to fail the upload:
server -> client, 1 byte = 132: message type of file upload cancel
server -> client, 1 byte: unused(?)
server -> client, 1 byte: length of reason
server -> client, "length of reason" bytes: human readable reason of why the upload failed
It does seem odd that there doesn't seem to be a way for the server to acknowledge any sort of success, so the client can't really give the user a "yes, it's worked!" sign. At least, not one with high confidence that everything has indeed worked.
It also does look like at most 1 upload at a time is possible: there is no ID or anything like that to distinguish multiple files being uploaded at the same time. Although given this would all be over the same TCP connection (typically) there would probably not be any speed benefit of this.
Looking in the source for TightVNC, it looks like (confusingly) TightVNC itself doesn't seem to support the 132 "TGHT" / 133 "TGHT" messages.
Instead, it has a sub-protocol based on messages of type 252 (0xFC). In this sub-protocol, the message types are 4 byte integers: 252 in the first byte, and then 3 more, as per the comment in FTMessage.h
read first byte (UINT8) as message id, but if first byte is equal to 0xFC then it's TightVNC extension message and must read next 3 bytes and create UINT32 message id for processing.
At first glance, on a high level, it looks similar to the one in libvnc, but it does appear to include more server acknowledgements. For example, in response to a request to start an upload, the server will reply with a message of type 0xFC000107 to say "yes, that's fine" (I think).
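That id-reading rule can be expressed as a small sketch (Python used only for illustration; the socket handling is simplified and short reads are ignored):

    import struct

    def read_message_id(sock):
        # The rule quoted from FTMessage.h: a single-byte message id, unless the
        # first byte is 0xFC, in which case three more bytes complete a 32-bit
        # id such as 0xFC000107.
        first = sock.recv(1)
        if first[0] != 0xFC:
            return first[0]
        return struct.unpack("!I", first + sock.recv(3))[0]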

Strange behavior of OTP gen_tcp with settings {packet,4} and using NodeJS "frame-stream'' for TCP communication

I've been struggling for a while to get my messages framed correctly between my NodeJS server and my erlang gen_tcp server. I was using {packet,line} successfully until I had to send large data messages and needed to switch to message size framing.
I set gen_tcp to {packet,2}
and I'm using the library from:
https://github.com/davedoesdev/frame-stream
for the NodeJS tcp decode side. It is ALSO set to packet size option 2
and I have tried packet size option 4.
I saw that for any message up to 127 characters long this setup works well, but any message longer than that has a problem.
I ran a test by sending longer and longer messages from gen_tcp and then reading out the first four bytes received on the NodeJS side:
on message 127:
HEADER: 0 0 0 127
Frame length 127
on message 128:
HEADER: 0 0 0 239 <----- This should be 128
Frame length 239 <----- This should be 128
Theories:
Some character encoding mismatch since it's on the number 128 (likely?)
Some error in either gen_tcp or the library (highly unlikely?)
Voodoo magic curse that makes me work on human-rights day (most likely)
Data from wireshark shows the following:
The header bytes are encoded properly by gen_tcp past 128 characters since the hex values proceed as follows:
[00][7e][...] (126 length)
[00][7f][...] (127 length)
[00][80][...] (128 length)
[00][81][...] (129 length)
So it must be that the error lies in how the library on the NodeJS side calls Node's readUInt16BE(0) or readUInt32BE(0) functions. But I checked the endianness and both are big-endian.
If the header bytes are [A,B] then, in binary, this error occurs after
[00000000 01111111]
In other words, readUInt16BE(0) reads [00000000 10000000] as 0xef? Which is not even an endianness option...?
Thank you for any help in how to solve this.
Kind Regards
Dale
I figured it out: the problem was caused by setting the socket to receive with UTF-8 encoding, which only passes ASCII bytes (up to 127) through unchanged. A header byte of 0x80 or above is not valid UTF-8 on its own and gets replaced with U+FFFD, whose encoding begins with 0xEF - which is exactly the 239 seen above.
Don't do this: socket.setEncoding('utf8')
It seems obvious now, but that one line of code is hard to spot.
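For completeness, a minimal sketch (in Python rather than Node, and not the frame-stream library itself) of the framing rule: the length prefix has to be read from raw bytes and must never pass through a text decoder.

    import struct

    def read_frame(sock):
        # Conceptual {packet, 4}-style framing. Decoding the stream as text
        # first mangles any byte >= 0x80, which is exactly why frames broke
        # once the length reached 128.
        header = sock.recv(4)                    # raw bytes, never text-decoded
        (length,) = struct.unpack("!I", header)  # big-endian unsigned 32-bit length
        return sock.recv(length)                 # payload (short reads ignored for brevity)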

BitTorrent p2p - What to do with very large blocks of data, read from piece message from a peer (larger than piece-length)?

I am writing a BitTorrent client, and the application is receiving large blocks of data after requesting pieces from other peers. Sometimes the blocks are larger than the piece length of the torrent.
For example, with a torrent piece length of 524288 bytes, some piece requests result in responses 1940718596 bytes long.
The message also seems valid, as the length encoded in its first four bytes is the same (that large number).
Question: what should I do with that data? Should I ignore the excess bytes (everything after piece length)? Or should I write the data into the corresponding files? That is concerning, because it might overwrite the next pieces!
The largest chunk of a piece the protocol allows in a piece message is 16 KB (16384 bytes). So if a peer sends a piece message 1940718596 bytes (1.8 GB) long, the correct response is to disconnect from it.
Also, if a peer sends a piece message that doesn't correspond to a request message you sent earlier, you should disconnect from it as well.
A peer that receives a request message asking for more than a 16 KB chunk should likewise disconnect the requester. Requesting a whole piece in a single request message is NOT allowed.
A request message that runs past the end of the piece is, of course, also NOT allowed.
While it's possible that you will encounter other peers that don't follow the protocol, the most likely explanation when writing a new client is that the error is on your own side.
The most important tool you can use is Wireshark. Look at how other clients behave and compare with yours.
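As a rough sketch of those checks (field names follow the usual wire-protocol terminology for a piece message; the bookkeeping structure is hypothetical):

    MAX_BLOCK_LEN = 16384  # 16 KB: the largest block a request/piece message should carry

    def validate_piece_message(index, begin, block, piece_length, outstanding_requests):
        # Sanity checks described above. outstanding_requests is a hypothetical
        # set of (index, begin, length) tuples for requests you have sent.
        if len(block) > MAX_BLOCK_LEN:
            return False    # oversized block: disconnect the peer
        if (index, begin, len(block)) not in outstanding_requests:
            return False    # a block we never asked for: disconnect the peer
        if begin + len(block) > piece_length:
            return False    # runs past the end of the piece
        return True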

Reversing the checksum based on the data

I'm trying to reverse-engineer the binary protocol of an embedded device. I was able to capture 4 sample data packets; here they are in hex-encoded form:
5E:A1:00:10:10:00:00:02:01:05:F0:F6:4B:00:01:03
5E:A1:00:10:10:00:00:06:01:93:79:DA:F9:00:01:07
5E:A1:00:10:10:00:00:03:01:C9:B1:F0:81:00:01:04
5E:A1:00:10:10:00:00:04:01:A3:BE:2A:3A:00:01:05
Based on other packets I can assert the following:
First 6 bytes (5E:A1:00:10:10:00) - the message header; it is static across all other messages.
Next 2 bytes (00:02, 00:06, 00:03 or 00:04) - the numeric message id, an int16. It differs from message to message.
Next 4 bytes (05:F0:F6:4B, 93:79:DA:F9, C9:B1:F0:81 or A3:BE:2A:3A) - a checksum of the message. It depends on the data and on the message number; I verified this by forming packets manually: when I update bytes in the data area or the message number, but not the checksum, the message gets rejected by the remote server.
Everything else is just data of variable length.
My question is: how can I work out the algorithm used to generate the checksum? Is there any software that can be used for this kind of purpose? For example, I input a mask over the data and it tries to guess the algorithm.
If it is a CRC, then reveng may be able to deduce the parameters of the CRC, given enough examples.
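Along the same lines, if it happens to be a standard 32-bit CRC, a quick (and very much hedged) way to test a handful of common parameter sets is to run them over the sample packets; which bytes the checksum actually covers is unknown, so the slice below is only a guess, and a negative result proves nothing:

    import crcmod.predefined

    # Hedged sketch: try a few predefined 32-bit CRC parameter sets from crcmod
    # against each sample packet, assuming the checksum covers everything
    # except its own 4 bytes. The covered slice, candidate list and stored
    # endianness are all assumptions.
    samples = [
        "5EA100101000000201" "05F0F64B" "000103",
        "5EA100101000000601" "9379DAF9" "000107",
        "5EA100101000000301" "C9B1F081" "000104",
        "5EA100101000000401" "A3BE2A3A" "000105",
    ]
    candidates = ["crc-32", "crc-32c", "crc-32-bzip2", "crc-32-mpeg", "posix", "jamcrc", "xfer"]

    for hexstr in samples:
        packet = bytes.fromhex(hexstr)
        covered = packet[:9] + packet[13:]   # guess: everything but the checksum field
        expected = packet[9:13]
        for name in candidates:
            value = crcmod.predefined.mkCrcFun(name)(covered)
            for byte_order in ("big", "little"):
                if value.to_bytes(4, byte_order) == expected:
                    print(name, byte_order, hexstr)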

How many bytes of memory is a tweet?

140 characters. How much memory would that take up?
I'm trying to calculate how many tweets my EC2 Large instance Mongo DB can hold.
Twitter uses UTF-8 encoded messages.
UTF-8 code points can be up to four octets long, making the maximum message size 140 x 4 = 560 8-bit bytes.
This is, of course, just for the raw messages, excluding storage overhead, indexing and other storage-related padding.
Edit: Twitter successfully let me post the message:
™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™
Yes, that's 140 trademark symbols, which are three octets each in UTF-8.
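A quick sanity check of that arithmetic (plain Python, nothing assumed beyond standard UTF-8):

    # '™' (U+2122) encodes to three octets in UTF-8, so 140 of them need 420
    # bytes, and the worst case for 140 code points is 140 * 4 = 560 bytes.
    tweet = "™" * 140
    print(len(tweet))                  # 140 characters
    print(len(tweet.encode("utf-8")))  # 420 bytes for the all-trademark tweet
    print(140 * 4)                     # 560 bytes, the 4-octet worst case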
Back in September, an engineer at Twitter gave a presentation that suggested it's about 200 bytes per tweet.
Of course you still have to account for overhead for your own metadata and the database itself, but 200 bytes/record is probably a good place to start.
Typically it's two bytes per character if you're storing the text as UTF-16, so that would mean 280 bytes max per tweet.
Probably 284 bytes in memory (4-byte length prefix + length x 2). Inside the DB I cannot say, but probably 280 if the DB stores two bytes per character; you could add some bytes of overhead for metadata etc.
Potentially of interest:
http://mehack.com/map-of-a-twitter-status-object
Anatomy of a Twitter Status Object
Also more about twitter character encoding:
http://dev.twitter.com/pages/counting_characters
It's technically stored as UTF-8, and in reality the slide deck from a Twitter engineer here http://www.slideshare.net/raffikrikorian/twitter-by-the-numbers gives the real figure:
140 characters, ~200 bytes

Resources