How to reconstruct files transferred using uTP? - bittorrent

I have read in an article that BitTorrent uses the uTorrent Transport Protocol (uTP). Also, as far as I understand, if I am downloading a file using BitTorrent, the different pieces can come from different peers. All these packets have the same connection-id. But how can I work out the order in which these packets arrived?
For example, let P1, P2 and P3 be the peers from which I can get my file, and D1 be my system. Then the first portion of the file came from P2, the second from P1 and the third from P3. Is there any way to find which part came from which system so that I can reconstruct the file from the captured packets?
Thank You.

The order of the individual uTP packets doesn't matter. The uTP protocol takes care of reconstructing the order of the transported stream.
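It's not necessary to know which system a torrent 'piece' message originated from in order to reconstruct a file. Each 'piece' message in the BitTorrent peer protocol carries the piece index and the byte offset within that piece, and the torrent's metainfo gives the piece length, the piece hashes and the file layout, so together they are enough to create the intended files within a torrent.
To avoid confusion, I think you will benefit from knowing that uTP is a level of abstraction below the peer protocol in use with each peer: it only delivers the byte stream, and the peer protocol messages travel inside that stream.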
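As a rough illustration of the point above, here is a minimal Python sketch that places blocks from 'piece' messages by index and offset, ignoring which peer sent them. It assumes the (index, begin, data) triples have already been extracted from the reassembled peer-protocol streams, and that piece_length and the concatenated SHA-1 'pieces' string come from the torrent's metainfo. The function names are made up for this example.

```python
import hashlib

def assemble(piece_length, total_length, blocks):
    """Place each block at index * piece_length + begin.

    blocks: iterable of (index, begin, data) taken from 'piece' messages,
    in any order and from any peer.
    """
    out = bytearray(total_length)
    for index, begin, data in blocks:
        offset = index * piece_length + begin
        out[offset:offset + len(data)] = data
    return bytes(out)

def verify_pieces(data, piece_length, pieces):
    """Check every piece against the 20-byte SHA-1 digests from the metainfo."""
    for i in range(0, len(pieces) // 20):
        piece = data[i * piece_length:(i + 1) * piece_length]
        if hashlib.sha1(piece).digest() != pieces[i * 20:(i + 1) * 20]:
            return False
    return True
```

For a multi-file torrent the assembled byte range would then be split according to the file lengths listed in the metainfo.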

Related

IPFS search file mechanism

I am using IPFS (InterPlanetary File System) to store documents/files in a decentralized manner.
In order to search a file from the network, is there a record of all the hashes on the network(like leeches)?
How does my request travel through the network?
Apologies, but it's unclear to me if you intend to search the contents of files on the network or to just search for files on the network. I'm going to assume the latter, please correct me if that's wrong.
What follows is a bit of an oversimplification, but here it goes:
In order to search a file from the network, is there a record of all the hashes on the network(like leeches)?
There is not a single record, no. Instead, each of the ipfs nodes that makes up the network holds a piece of the total record. When you add a block to your node, the node will announce to the network that it will provide that block if asked to. The process of announcing means letting a number of other ipfs nodes in the network know that you have that block. Essentially, your node asks its peers who ask their peers, and so on, until you find some nodes with ids that are near the hash of the block. Near could be measured using something simple like xor.
The important thing to understand is that, given the hash for a block, your node finds other ipfs nodes in the network that have ids which are similar to the hash of the block, and tells them "if anyone asks, I have the block with this hash". This is important because someone who wants to go find the content for the same hash can use the same process to find nodes that have been told where the hash can be retrieved from.
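The "nearness" idea above can be sketched in a few lines of Python. This is only an illustration of XOR distance on small integers. Real IPFS nodes use much longer ids and an iterative lookup across peers, and the function names here are made up for the example.

```python
def xor_distance(a, b):
    """Distance between two ids, interpreted as unsigned integers."""
    return a ^ b

def closest_nodes(key, node_ids, k=3):
    """Return the k node ids nearest to key under XOR distance."""
    return sorted(node_ids, key=lambda node: xor_distance(node, key))[:k]

# A provider would announce "I have this block" to these nodes, and a
# searcher would ask the same nodes who the providers are.
nodes = [0b1000, 0b0010, 0b1011, 0b1110]
print(closest_nodes(0b1010, nodes, k=2))
```

Because both the announcer and the searcher rank nodes the same way, they end up talking to the same small set of nodes without any central index.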
How does my request travel through the network?
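Basically the reverse of the above: your node looks for the nodes whose ids are near the hash, asks them who has announced that block, and then retrieves the block from one of those providers.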
You can read more about ipfs content routing in the following:
https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf
https://discuss.ipfs.io/t/how-does-resolution-and-routing-work-with-ipfs/365/3

WinSock: Check if transferred file is complete

I'm planning to make a file transport using sockets(TCP) on Windows with C++. Hence it would be quite convenient to see if the transferred file has been received completely and correctly. What would be the best (and maybe also the easiest) way to do this?
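Once all of the bytes have been sent and received, they have arrived correctly and in order; TCP ensures that: https://en.wikipedia.org/wiki/Transmission_Control_Protocol
The receiver does still need to know how many bytes to expect (for example, by having the sender transmit the file size first), so that it can tell a complete file from a connection that closed early.
Do not try to recalculate checksums of individual packets or anything like that; you will only introduce errors. TCP is reliable: every transfer is automatically segmented, each segment carries a checksum, and the stream is reassembled in order, with lost or corrupted segments retransmitted. That's a pretty reliable transport protocol right there, and it works out of the box.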
If you're really paranoid or simply need to create digital proof of transmission you need to choose another protocol entirely - something like SCTP, perhaps
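One common way to check completeness on top of TCP is to send the size before the data and have the receiver read exactly that many bytes. The sketch below shows the idea in Python rather than WinSock/C++, purely to keep it short. The 8-byte length prefix and the function names are assumptions made for this example, not part of any standard.

```python
import socket
import struct

def send_file_bytes(sock, data):
    """Send an 8-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!Q", len(data)))
    sock.sendall(data)

def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (len(buf), n))
        buf.extend(chunk)
    return bytes(buf)

def recv_file_bytes(sock):
    """Read the length prefix, then the announced number of bytes."""
    (size,) = struct.unpack("!Q", recv_exact(sock, 8))
    return recv_exact(sock, size)
```

The same loop applies in C++ with recv(): keep reading until the announced number of bytes has arrived, and treat a return value of 0 before that point as an incomplete transfer.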

BitTorrent Optimistic Unchoke/Bandwidth probing

While thinking about how BitTorrent works, a few questions come to my mind. Would appreciate if somebody can share a few possible responses.
Suppose a BitTorrent client gets 50 peers from the tracker and then establishes connections with 20 of them to form the peer set. Is this peer set selected randomly or based on bandwidth? (I understand that the peers which will be unchoked are selected based on their offered bandwidth.) Subsequently, how is this bandwidth determined for each connection? (A ping can give us the latency but not the bandwidth, I assume.)
The optimistic unchoke leads to the problem of free-riders in the system. Considering that an unchoke might not always result in better peers, why is it not possible to discard this policy altogether? (I assume this policy helps peers with low bandwidth to get their requests fulfilled. Why can't BitTorrent adopt a policy of probing the bandwidth of the optimistic peer without sending data packets, and have another connection, maybe a 5th one, for low-bandwidth peers so that they don't starve? This 5th channel would transmit at only a fraction of the bandwidth of the other 4 channels.) This may at least discourage free-riding?
Traditionally, the peers are selected at random. Some clients may have had weak biases based on previous interactions with the peers or CIDR distance. However, there is a recent proposal (which uTorrent and libtorrent implement) for a consistent but uniformly distributed peer selection/priority algorithm. For more information, see this blog post.
The unchoke algorithm is triggered every 15 seconds. The peers are then sorted by the number of bytes they sent during the last 15 seconds. The ones sending the most are then unchoked, and the rest are choked. So the download rate used is the 15-second average.
If you don't optimistically unchoke peers, there's no way for you to prove to them that you are better than the other peers in their unchoke set, and they will never unchoke you back. Without optimistic unchokes (also assuming you don't have the allow-fast extension), there is no way to start a download. When you first join, you won't have any pieces, so you can't trade for the first piece; you have to rely on being optimistically unchoked. Estimating someone's bandwidth without sending bulk data is hard and probably unreliable. Even if you got a good estimate of someone's capacity, that wouldn't necessarily mean that capacity was available to you. The current mechanism is very robust in that it doesn't need to make assumptions about the network equipment between the peers (as packet-train bandwidth estimation does) and it looks at actual data.
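The unchoke round described above can be sketched roughly as follows. The number of regular slots (4 here, matching the number in the question), picking the optimistic peer uniformly at random, and the function name are assumptions made for this illustration. Real clients differ in the details.

```python
import random

def choose_unchoked(bytes_received, slots=4, rng=random):
    """Pick peers to unchoke for the next round.

    bytes_received: dict mapping peer -> bytes that peer sent us during
    the last round (15 seconds in the description above).
    Returns (regular, optimistic): the top senders, plus one peer chosen
    at random from the rest so that newcomers get a chance to prove
    themselves.
    """
    ranked = sorted(bytes_received, key=bytes_received.get, reverse=True)
    regular = ranked[:slots]
    choked = ranked[slots:]
    optimistic = rng.choice(choked) if choked else None
    return regular, optimistic

regular, optimistic = choose_unchoked(
    {"a": 900, "b": 500, "c": 100, "d": 0}, slots=2)
print(regular, optimistic)
```

Note that the ranking only uses data that was actually transferred, which is why no separate bandwidth probe is needed.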

Packet sniffing with Channel hopping in linux

I want to scan the WiFi on b/g interface, and I want to sniff packets on each channel, by spending 100 ms on each channel. One of the biggest requirements I have is not to store the packets I get (because of less disk space), my application will parse the packets, retrieve Tx MAC and RSSI, and would construct the list (MAC, Avg RSSI, #Records) at the end of every minute, and then clear this list and start over again.
I've figured out two ways to do channel hop on linux:
Option 1: Use the wi_set_channel(struct wif *, channel number) call in C, and write the code in C to sniff all the packets
Option 2: Use linux command iw dev wlan0 set channel 4, and use any language like python+scapy OR C to sniff the packets
I'd like to know which is more efficient of the two, if at all, so that the delay/wait for WiFi interface to switch to a different channel is minimal. I suspect that this delay would mean loss of packet while the switch to a different channel happens, is that the case?
I would also like to know some of the other ways to solve this problem in linux.
The answer to your first question is straightforward: use Option 1 and have two threads doing the work, one thread populating an in-memory circular buffer with packets collected from the channels and a second thread processing them in sequence. You can choose the best packet-discarding algorithm depending on the measured performance of the processing thread and any other relevant factors.
As for the second question, I would go with the above because it gives you complete control over how you tune the algorithm, rather than depending on canned processing tools.
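The two-thread layout suggested above could look roughly like this. The sketch is in Python to keep it short (the answer recommends C), the capture side is replaced by a plain list of (MAC, RSSI) records, and the "drop the oldest record when full" policy is just one possible discarding choice. All names are made up for the example.

```python
import threading
from collections import defaultdict, deque

class PacketRing:
    """Bounded in-memory buffer; when full, the oldest record is dropped."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)
        self._cond = threading.Condition()
        self._closed = False

    def put(self, record):
        with self._cond:
            self._buf.append(record)
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        """Return the next record, or None once closed and drained."""
        with self._cond:
            while not self._buf and not self._closed:
                self._cond.wait()
            return self._buf.popleft() if self._buf else None

def summarize(ring):
    """Consumer: build {mac: (average RSSI, record count)}."""
    stats = defaultdict(lambda: [0, 0])  # mac -> [rssi sum, count]
    while True:
        record = ring.get()
        if record is None:
            break
        mac, rssi = record
        stats[mac][0] += rssi
        stats[mac][1] += 1
    return {mac: (total / count, count) for mac, (total, count) in stats.items()}

def run_demo(records, capacity=1024):
    """Producer feeds records while a second thread summarizes them."""
    ring = PacketRing(capacity)
    result = {}
    worker = threading.Thread(target=lambda: result.update(summarize(ring)))
    worker.start()
    for record in records:
        ring.put(record)
    ring.close()
    worker.join()
    return result
```

In a real sniffer the producer thread would also be the one hopping channels, and the summary would be emitted and cleared once per minute instead of once at the end.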

BitTorrent: Why is the value of the peers field binary, not a bencoded list?

I'm trying to implement BitTorrent in C. First of all, before writing any code, I tried using a web browser to send the following message (URL) to the tracker server.
You may try this URL.
http://torrent.ubuntu.com:6969/announce?info_hash=%9ea%80%ed%e7/%c4%ae%c8%de%8c%b0C%81c%fbq%3cJ%22&peer_id=M7-3-5--%eck%a8%2a%7f%e6%3ah%84%f2%9d%c5&port=43611&uploaded=0&downloaded=0&left=0&corrupt=0&key=00BA7F86&event=started&numwant=4&compact=0&no_peer_id=0
I have downloaded the torrent file from this link which is named xubuntu-13.04-desktop-i386.iso and has 9e6180ede72fc4aec8de8cb0438163fb713c4a22 as SHA-1 value.
However, after sending above request, I get
HTTP/1.0 200 OK
d8:completei357e10:incompletei8e8:intervali1800e5:peers24:l\262j"\310Հp\226\310\325G?\205^%!\221x \364\367\357e
But the BitTorrent specification says
peers : The value is a list of dictionaries, each with the following keys
- peer id: peer's self-selected ID, as described above for the tracker request (string)
- ip: peer's IP address (either IPv6 or IPv4) or DNS name (string)
- port: peer's port number (integer)
Why is the value of the peers field binary and not a bencoded list?
Thank you in advance.
The peers value may instead be a string whose length is a multiple of 6 bytes. In each 6-byte entry, the first 4 bytes are the IP address and the last 2 bytes are the port number, all in network (big-endian) byte order.
https://wiki.theory.org/BitTorrentSpecification#Tracker_HTTP.2FHTTPS_Protocol
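A minimal Python sketch of decoding that compact form, following the 4-byte address plus 2-byte port layout described above. The function name and the sample bytes are made up for illustration, and the sample is not the tracker response from the question.

```python
import ipaddress
import struct

def parse_compact_peers(blob):
    """Split a compact 'peers' string into (ip, port) tuples."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peers length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # "!4sH": network byte order, 4 raw address bytes, unsigned 16-bit port
        raw_ip, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(raw_ip)), port))
    return peers

sample = bytes([192, 0, 2, 1, 0x1A, 0xE1, 10, 0, 0, 5, 0, 80])
print(parse_compact_peers(sample))
```

The same parsing is a short loop in C as well: copy 4 bytes into an in_addr and convert the 2 port bytes with ntohs().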
The format you refer to was used in the early days of BitTorrent. However, as some trackers became increasingly popular without being scaled out significantly in terms of capacity, the size of the tracker responses became significant. Two measures to deal with this were for clients to accept gzipped HTTP responses and compact peer responses (the latter is by far the most popular format among trackers these days). The compact peer response carries the same amount of information in a significantly smaller response. It's defined in BEP 23.
However, even though the responses are relatively small now, the TCP handshake and teardown still impose a significant cost. This is why many trackers are moving over to UDP (BEP 15).

Resources