I'm trying to understand how BitTorrent works internally so I can implement something similar. However, I still don't understand how piece selection and transfer work. How does each peer know which peer has which piece of the file? Does each peer have to keep asking the others whether they have the piece it wants?
I have also heard about the "rarest piece first" approach, which tries to fetch the rarest pieces first. How does a peer get this information without wasting network traffic?
Thank you.
This is done with two messages: bitfield and have.
After two peers perform a handshake, each can optionally send a bitfield as its first message, which compactly represents all of the pieces it has completed.
Once a peer successfully downloads and verifies a piece, it sends a have message to all of its connected peers, letting them know which piece it just completed.
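As a rough illustration of how these two messages are enough for rarest-first selection, here is a minimal Python sketch. The function names (parse_bitfield, rarest_missing_piece) and the peer labels are made up for this example, and the tie-break on the lowest index is only there to keep the sketch deterministic; real clients typically pick randomly among the rarest pieces.

```python
from collections import Counter

def parse_bitfield(payload, num_pieces):
    """Return the piece indices whose bit is set.

    The high bit of the first byte corresponds to piece 0.
    """
    have = set()
    for index in range(num_pieces):
        if payload[index // 8] & (0x80 >> (index % 8)):
            have.add(index)
    return have

def rarest_missing_piece(my_pieces, peer_pieces):
    """Pick the piece we lack that the fewest connected peers advertise."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - my_pieces)
    if not counts:
        return None  # no connected peer has anything we need
    # Assumption: break ties on the lower index (real clients randomize).
    return min(counts, key=lambda i: (counts[i], i))

# Availability learned from each peer's bitfield message (hypothetical peers).
peers = {
    "peer_a": parse_bitfield(bytes([0b11100000]), 4),  # pieces 0, 1, 2
    "peer_b": parse_bitfield(bytes([0b10100000]), 4),  # pieces 0, 2
}
peers["peer_b"].add(3)  # peer_b later sends a have message for piece 3

choice = rarest_missing_piece({0}, peers)  # we already hold piece 0
```

No extra polling is needed: the availability map is built once from the bitfield and then kept current by the have messages each peer sends anyway.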
I am following the Kristen Widman tutorial for writing a BitTorrent client, as well as following the Wiki, however I have a point of confusion that I haven't been able to clear up from online resources.
So far I have been able to send a tracker request and receive a list of peers, whose IP addresses and ports I've stored like this:
[{'IP': IPv4Address('76.126.244.88'), 'port': 6881}, ... ]
Kristen suggests in her tutorial that I try to connect to a single peer first. I suppose I'd try to connect with the first peer in the list first, so far so good.
With regards to each peer, however, there are certain states like 'am_choking' , 'peer_choking', 'am_interested', 'peer_interested'. For each peer in my list, do I need to keep track of each of these states?
Yes, connections to each peer need to maintain an independent set of states to remember what the latest status sent by the remote is.
This is explicitly mentioned in the official BitTorrent specification:
Connections contain two bits of state on either end: choked or not, and interested or not.
and on the wiki too:
A client must maintain state information for each connection that it has with a remote peer:
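A minimal sketch of what that per-connection state could look like in Python. The class name PeerConnection and the helper can_request are made up for this example; the initial values follow the specification's rule that connections start out choked and not interested, and the message IDs 0 to 3 are choke, unchoke, interested and not interested.

```python
from dataclasses import dataclass

@dataclass
class PeerConnection:
    ip: str
    port: int
    # Connections start out choked and not interested, in both directions.
    am_choking: bool = True
    am_interested: bool = False
    peer_choking: bool = True
    peer_interested: bool = False

    def handle_message(self, message_id):
        """Update the remote-side flags from a state message ID."""
        if message_id == 0:      # choke
            self.peer_choking = True
        elif message_id == 1:    # unchoke
            self.peer_choking = False
        elif message_id == 2:    # interested
            self.peer_interested = True
        elif message_id == 3:    # not interested
            self.peer_interested = False

    def can_request(self):
        """We may request blocks only if we are interested and not choked."""
        return self.am_interested and not self.peer_choking

# One independent state object per peer, keyed by (ip, port).
connections = {
    ("76.126.244.88", 6881): PeerConnection("76.126.244.88", 6881),
}
conn = connections[("76.126.244.88", 6881)]
conn.am_interested = True   # set after we send an interested message
conn.handle_message(1)      # the peer unchokes us
```

Keeping one object per connection means an unchoke from one peer never affects what you believe about another.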
My understanding is that IPFS and the BitTorrent Mainline DHT are built on top of a distributed hash table (Kademlia).
They use the file hash as the Kademlia key to find a list of peers that might have the file.
1- What I don't understand is, if this is all decentralized, who removes from the DHT the peers that no longer host a file's content?
2- What prevents someone from storing a large amount of data for free inside the DHT?
3- What prevents someone from disrupting the network by adding a large number of invalid peers for a popular file?
4- What prevents a bad actor from joining the DHT ring and not following the routing protocol, thus preventing discovery messages from reaching the correct nodes?
Not sure why this was downvoted. These are excellent questions.
1- What I don't understand is, if this is all decentralized, who removes from the DHT the peers that no longer host a file's content?
I think that DHT entries are regularly re-broadcast. So if a peer goes away, its DHT entries will no longer be broadcast and the network will forget about the data it provides unless some other node has it.
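To make the idea concrete, here is a small Python sketch of a store that forgets announcements unless they are refreshed. It is not taken from IPFS or any BitTorrent client: the class name ExpiringPeerStore, the 1800-second lifetime and the explicit now argument (used instead of a real clock so the example is deterministic) are all assumptions made for illustration.

```python
class ExpiringPeerStore:
    """Toy key -> peers table where each announcement has a limited lifetime."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> {peer: time of last announcement}

    def announce(self, key, peer, now):
        """Record (or refresh) a peer's claim to hold the content for key."""
        self._entries.setdefault(key, {})[peer] = now

    def get_peers(self, key, now):
        """Return peers whose announcement is still fresh, dropping the rest."""
        peers = self._entries.get(key, {})
        fresh = {p: t for p, t in peers.items() if now - t < self.ttl}
        if fresh:
            self._entries[key] = fresh
        else:
            self._entries.pop(key, None)
        return sorted(fresh)

store = ExpiringPeerStore(ttl_seconds=1800)  # hypothetical lifetime
store.announce("infohash", "10.0.0.1:6881", now=0)
store.announce("infohash", "10.0.0.2:6881", now=0)
store.announce("infohash", "10.0.0.1:6881", now=1200)  # first peer re-announces
alive = store.get_peers("infohash", now=2000)           # second peer has expired
```

Nobody has to delete a departed peer explicitly: it simply stops refreshing, and the nodes holding its entry let it lapse.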
2- What prevents someone from storing a large amount of data for free inside the DHT?
Unless you re-publish or somebody else is interested in the data, it will vanish. The amount of data that you can store directly in a DHT entry is limited. So you can make other nodes store some of your data by putting data directly into DHT entries, but the effort outweighs the benefits.
3- What prevents someone from disrupting the network by adding a large number of invalid peers for a popular file?
I think there are some mechanisms envisioned in IPFS to protect the DHT against attacks. However, I don't think the current implementation is all that sophisticated. I don't think that current IPFS would deal well with a large-scale DDoS attack.
4- What prevents a bad actor from joining the DHT ring and not following the routing protocol, thus preventing discovery messages from reaching the correct nodes?
I think a single node would be insufficient to do much damage, because a node will ask multiple peers. You would have to have multiple nodes to do significant damage.
But IPFS as it is now would not survive a sophisticated attack by state actors.
I have a rather theoretical question for the developers about the scheme used in Hyperledger Fabric.
Suppose a chaincode has been written by a malicious node and is endorsed only by some of the malicious nodes. Then, if a client's transaction triggers this code, the malicious nodes could manipulate their responses. Let us assume that all the responses are 1 when they should be 0. If the client somehow accepts these responses, they will be sent through the rest of the transaction flow and eventually end up in the ledger. Hence, this incorrect result will be in the ledger.
Could this ever happen? Or did I misunderstand some part?
For a theoretical question, the theoretical answer is no, it wouldn't happen. In a blockchain network, all peers on a channel that are involved in a transaction need to have the same chaincode.
Also, even though a peer endorses a transaction and sends a successful response to the client, when the client submits these responses for the transaction to be committed, it is validated by all peers involved in that transaction before being committed. Basically, all parties agree at once that this is correct.
If one node has a different value than all the other nodes do, then something is wrong.
You can follow this simplified explanation of the transaction flow here: http://hyperledger-fabric.readthedocs.io/en/latest/txflow.html. See step number 5 in particular.
When a peer connects to another peer that has all the pieces, how does the connecting peer know that what it connects to is actually a seed (and has all of the pieces)? Are there messages sent between them?
In clients like uTorrent, the peer seems to be aware of the download progress of each peer it has connected to as well.
How does it know all that? Does a peer derive that another peer is a seed if progress is 100% or are there actually specific messages for that? Which parts of the protocol deal with all this?
A peer knows if another peer is a seed if the other peer either:
sends a fully complete bitfield indicating that it has all the pieces in the torrent. - BEP3
sends an incomplete bitfield and then have messages for all the remaining pieces it didn't have at the beginning. (This can happen either because it is still downloading and then completes the torrent, or because it sends a lazy bitfield.) - BEP3
sends a have all message according to the Fast Extension - BEP6
sends upload_only=1 according to the Extension for Partial Seeds - BEP21
A partial seed is a peer that has downloaded only parts of the torrent, doesn't want to download any more, and is seeding what it has.
A peer reports its progress by continuously sending have messages.
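The rules above can be sketched as a small per-peer tracker in Python. The class and method names (RemotePeerProgress, on_bitfield and so on) are invented for this example and do not come from any particular client; parsing of the actual wire messages and of the BEP21 extension handshake is left out.

```python
class RemotePeerProgress:
    """Tracks which pieces a remote peer claims to have."""

    def __init__(self, num_pieces):
        self.num_pieces = num_pieces
        self.pieces = set()
        self.upload_only = False  # set when the peer signals upload_only=1

    def on_bitfield(self, payload):
        # The high bit of the first byte corresponds to piece 0.
        self.pieces = {
            i for i in range(self.num_pieces)
            if payload[i // 8] & (0x80 >> (i % 8))
        }

    def on_have(self, index):
        self.pieces.add(index)

    def on_have_all(self):
        # Fast Extension shortcut for a complete bitfield.
        self.pieces = set(range(self.num_pieces))

    @property
    def progress(self):
        return len(self.pieces) / self.num_pieces

    @property
    def is_seed(self):
        return len(self.pieces) == self.num_pieces

peer = RemotePeerProgress(num_pieces=3)
peer.on_bitfield(bytes([0b11000000]))  # pieces 0 and 1
before = (peer.progress, peer.is_seed)
peer.on_have(2)                        # the last missing piece arrives
after = (peer.progress, peer.is_seed)
```

A progress column like the one in uTorrent can be derived the same way: it is just the fraction of pieces the peer has announced so far.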
This part of the protocol is called the Peer Wire Protocol.
As you can see in the spec, clients are supposed to exchange the bitfield message to tell the other which pieces they currently have. Regular have messages later update this, when a peer receives further pieces (that's the straightforward description anyway, the reality is messier, more on that later).
This is modified by the widely supported Fast Extension, in which peers can compress fully complete and fully empty bitfield messages to have all and have none.
It is also modified by Superseeding, in which seeds lie about the pieces they have in order to seed the initial swarm more efficiently. And in general peers could always lie, in particular they can pretend not to have pieces which they really do, and you'd never know for sure.
Which brings me back to the messier reality. Peers may choose not to send the have x to you if you told them you have x, because it won't make any difference in whether or not you will request x from them (you won't, because you already have it). On the other hand, that is bad for some optimizations such as prioritizing the upload of rare pieces and, in particular, Superseeding.
According to the BitTorrent protocol specification:
The peer protocol refers to pieces of the file by index as described
in the metainfo file, starting at zero. When a peer finishes
downloading a piece and checks that the hash matches, it announces
that it has that piece to all of its peers.
Then, yes, messages are exchanged by peers so they may know what is available for download. The protocol "part" that deals with this is the Peer Protocol.
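For reference, the announcement itself is tiny. The sketch below builds and parses a have message in Python using the wire layout from the specification: a 4-byte big-endian length prefix of 5, the message ID 4, and a 4-byte big-endian piece index. The function names encode_have and decode_have are made up for this example.

```python
import struct

HAVE_ID = 4  # message ID of "have" in the peer protocol

def encode_have(piece_index):
    """Build a have message: length prefix (5), message ID, piece index."""
    return struct.pack(">IBI", 5, HAVE_ID, piece_index)

def decode_have(message):
    """Return the piece index from a complete have message."""
    length, message_id, piece_index = struct.unpack(">IBI", message)
    if length != 5 or message_id != HAVE_ID:
        raise ValueError("not a have message")
    return piece_index

wire = encode_have(7)
index = decode_have(wire)
```

At nine bytes per completed piece per connection, keeping every peer informed costs very little traffic.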
What mechanisms already exist for designing a P2P architecture in which the different nodes work separately in order to split a task (say, distributed rendering of a 3D image), but, unlike with torrents, don't get to see or hijack the contents of the packets being transmitted? Only the original task requester is entitled to view the results of the complete task.
Any working implementations on that principle already?
EDIT: Maybe I formulated the question wrongly. The idea is that even when they are able to work on the contents of the separate packets being sent, the separate nodes never get the chance to assemble the whole picture. Only the one requesting the task is supposed to do this.
If you have direct P2P connections (no "promiscuous" or "multicasting" sort of mode), the receiving peers should only "see" the data sent to them, nothing else.
If you have relay servers on the way and you are worried that they can sniff the data, I believe encryption is the way to go.
What we do is have peer A transmit data to peer B in an S/MIME envelope: the content is signed with the private key of peer A and encrypted with the public key of peer B.
Only peer B can decrypt the data, and it is guaranteed that peer A actually sent the data.
This whole process is costly in both CPU time and bandwidth and may not be appropriate for your application. It also requires some form of key management infrastructure: for instance, peers need to subscribe to a community which issues certificates.
But the basic idea is there: asymmetric encryption with a key pair, or encryption with a shared secret.