How do torrent clients ensure changes made to a file don't affect file transmission? - p2p

Let's say there are two peers, A and B. In oversimplified terms, when A requests a file from B, how does the torrent client on B ensure that changes the user makes to the file don't affect the transmission? Does it copy the file to a temp folder before it starts sending chunks to the receiving peer, or does it take an OS-level write lock of some sort and retain it until the transmission is completed?

Files are hashed at creation time and the hashes are stored in the torrent file. The hashes are checked at download time.
You can find more details in the core BitTorrent protocol specification.
What happens when a hash mismatch is detected is up to the implementation. Common approaches are to attempt to download the failed piece from a different peer and report hash failure statistics to the user. Optionally the sending peer may also check its own data and report that to its user, but the main intent of the specification is for the receiver to verify the data since it cannot trust the sender anyway.
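The receiver-side check described above can be sketched roughly as follows, assuming SHA-1 piece hashes as in the core specification. The helper names (piece_hashes, verify_piece) and the tiny piece length are made up for illustration and are not taken from any real client.

```python
import hashlib

PIECE_LENGTH = 16  # illustrative only; real torrents use much larger pieces


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Hash each fixed-size piece, as done once when the torrent is created."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, payload: bytes, expected: list[bytes]) -> bool:
    """Receiver-side check: accept a piece only if its hash matches the metainfo."""
    return hashlib.sha1(payload).digest() == expected[index]


original = b"The quick brown fox jumps over the lazy dog."
expected = piece_hashes(original)  # what the torrent file would carry

# The user on the sending side edits the file after the torrent was created.
modified = original.replace(b"quick", b"QUICK")

good_piece = original[0:PIECE_LENGTH]
bad_piece = modified[0:PIECE_LENGTH]

print(verify_piece(0, good_piece, expected))  # piece from the unmodified file
print(verify_piece(0, bad_piece, expected))   # piece from the edited file
```

The sender needs neither a copy nor a lock here: a piece read from the edited file simply fails the check on the receiving side and can be requested again from another peer.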

Related

Is it possible to get a file on my host in order to be used by a chaincode in Hyperledger Fabric?

I'm working on Hyperledger Fabric and I'd like to know whether it is possible to save a text file on my host that the chaincode can then retrieve and use. That way the data would survive restarts, and I could use it whenever it is required.
You can keep data in a file on local disk and read it in chaincode, but remember that chaincode executes on every peer, so the file would be read by each peer, not just once. Imagine you have 2 orgs and 3 peers per org, for 6 peers in total. If you wrote code to read the file, it would be read once by each peer, i.e. six times (assuming all peers are on the same box).
Static data that could be read from a file is best fed to the chaincode as parameters, so that the file is not read multiple times. Repeated reads can easily cause resource contention (the disk chokes and overall system performance goes down).

How does a peer know another peer is a seed?

When a peer connects to another peer that has all the pieces, how does the connecting peer know that what it connects to is actually a seed (and has all of the pieces)? Are there messages sent between them?
In clients like uTorrent, the peer seems to be aware of the download progress of each peer it has connected to as well.
How does it know all that? Does a peer derive that another peer is a seed if progress is 100% or are there actually specific messages for that? Which parts of the protocol deal with all this?
A peer knows if another peer is a seed if the other peer either:
sends a fully complete bitfield indicating that it has all the pieces in the torrent. - BEP3
sends an incomplete bitfield and then have messages for all the remaining pieces it didn't have at the start. (Either it keeps downloading and completes the torrent, or it sent a lazy bitfield.) - BEP3
sends a have all message according to the Fast Extension - BEP6
sends upload_only=1 according to the Extension for Partial Seeds - BEP21
A partial seed is a peer that has downloaded only parts of the torrent, doesn't want to download any more, and is seeding what it has.
A peer reports its progress by continuously sending have messages.
This part of the protocol is called the Peer Wire Protocol.
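As a rough sketch of how a client might track this, the class below keeps one flag per piece for a remote peer and updates it from bitfield, have and have all messages. PeerState and its method names are hypothetical; the only protocol detail assumed is the BEP3 bit order (the high bit of the first byte is piece 0).

```python
def parse_bitfield(payload: bytes, num_pieces: int) -> list[bool]:
    """Decode a bitfield payload; the high bit of the first byte is piece 0."""
    return [
        bool(payload[i // 8] & (0x80 >> (i % 8)))
        for i in range(num_pieces)
    ]


class PeerState:
    """Hypothetical per-peer bookkeeping, not taken from any real client."""

    def __init__(self, num_pieces: int) -> None:
        self.has = [False] * num_pieces

    def on_bitfield(self, payload: bytes) -> None:
        self.has = parse_bitfield(payload, len(self.has))

    def on_have(self, index: int) -> None:
        self.has[index] = True

    def on_have_all(self) -> None:  # Fast Extension (BEP6)
        self.has = [True] * len(self.has)

    @property
    def progress(self) -> float:
        """Fraction of pieces the remote peer has announced so far."""
        return sum(self.has) / len(self.has)

    @property
    def is_seed(self) -> bool:
        return all(self.has)


peer = PeerState(num_pieces=10)
peer.on_bitfield(bytes([0b11111111, 0b00000000]))  # pieces 0-7 announced
peer.on_have(8)
peer.on_have(9)
print(peer.progress, peer.is_seed)
```

Note that this only reflects what the remote peer has chosen to announce, so the derived progress is a lower bound rather than a guarantee.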
As you can see in the spec, clients are supposed to exchange the bitfield message to tell each other which pieces they currently have. Regular have messages update this later, when a peer receives further pieces (that's the straightforward description anyway; the reality is messier, more on that later).
This is modified by the widely supported Fast Extension, in which peers can compress fully complete and fully empty bitfield messages to have all and have none.
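To make the size difference concrete, here is a minimal sketch of the wire encoding, assuming the usual framing of a 4-byte big-endian length prefix followed by a 1-byte message ID (have = 4 and bitfield = 5 in BEP3, have all = 0x0E and have none = 0x0F in BEP6). The function names are my own, not from any library.

```python
import struct

MSG_HAVE = 4          # BEP3
MSG_BITFIELD = 5      # BEP3
MSG_HAVE_ALL = 0x0E   # BEP6 (Fast Extension)
MSG_HAVE_NONE = 0x0F  # BEP6 (Fast Extension)


def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Frame a message as <4-byte length><1-byte id><payload>."""
    return struct.pack(">IB", len(payload) + 1, msg_id) + payload


def encode_bitfield(has: list[bool]) -> bytes:
    """Pack one bit per piece, high bit first, padding the last byte with zeros."""
    bits = bytearray((len(has) + 7) // 8)
    for i, present in enumerate(has):
        if present:
            bits[i // 8] |= 0x80 >> (i % 8)
    return encode_message(MSG_BITFIELD, bytes(bits))


def announce_pieces(has: list[bool], fast_extension: bool) -> bytes:
    """Pick the most compact initial announcement that both sides support."""
    if fast_extension and all(has):
        return encode_message(MSG_HAVE_ALL)
    if fast_extension and not any(has):
        return encode_message(MSG_HAVE_NONE)
    return encode_bitfield(has)


seed = [True] * 1000
print(len(announce_pieces(seed, fast_extension=False)))  # full bitfield
print(len(announce_pieces(seed, fast_extension=True)))   # have all
```

For a 1000-piece torrent the plain bitfield costs 130 bytes on the wire, while have all is always 5 bytes regardless of torrent size.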
It is also modified by Superseeding, in which seeds lie about the pieces they have in order to seed the initial swarm more efficiently. And in general peers could always lie, in particular they can pretend not to have pieces which they really do, and you'd never know for sure.
Which brings me back to the messier reality. Peers may choose not to send the have x to you if you told them you have x, because it won't make any difference in whether or not you will request x from them (you won't, because you already have it). On the other hand, that is bad for some optimizations such as prioritizing the upload of rare pieces and, in particular, Superseeding.
According to the BitTorrent protocol specification:
The peer protocol refers to pieces of the file by index as described
in the metainfo file, starting at zero. When a peer finishes
downloading a piece and checks that the hash matches, it announces
that it has that piece to all of its peers.
Then, yes, messages are exchanged by peers so they may know what is available for download. The protocol "part" that deals with this is the Peer Protocol.

How do peers in BitTorrent select and transfer file pieces?

I'm trying to understand the process inside BitTorrent and implement something similar. However, I still don't understand how piece selection and transfer work. How does each peer know which peer has which file piece? Does each peer need to keep asking the others whether they have the requested piece?
I have also heard about the "rarest piece first" approach, which tries to fetch rarer pieces first. How can a peer get this information without wasting network traffic?
Thank you.
This is done with two messages: bitfield and have.
After two peers perform a handshake, they can optionally send bitfields as their first messages, which compactly represent all of the pieces they have completed.
Once a peer successfully downloads and verifies a piece, it broadcasts a have message to all of its connected peers, letting them know which piece it just completed.
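Since every peer already receives these two messages, rarest-first needs no extra traffic: it can be computed from local bookkeeping alone. A minimal sketch, assuming a hypothetical Availability class (the names and the random tie-breaking are my own choices, not something the specification mandates):

```python
import random
from collections import Counter
from typing import Optional


class Availability:
    """Hypothetical swarm view built only from bitfield and have messages."""

    def __init__(self) -> None:
        self.peers: dict[str, set[int]] = {}

    def on_bitfield(self, peer: str, pieces: set[int]) -> None:
        self.peers[peer] = set(pieces)

    def on_have(self, peer: str, index: int) -> None:
        self.peers.setdefault(peer, set()).add(index)

    def rarest_first(self, mine: set[int]) -> Optional[tuple[int, list[str]]]:
        """Return (piece index, peers holding it) for the rarest piece we lack."""
        counts = Counter(i for pieces in self.peers.values() for i in pieces)
        wanted = [i for i in counts if i not in mine]
        if not wanted:
            return None
        fewest = min(counts[i] for i in wanted)
        # Break ties randomly so that peers don't all chase the same piece.
        index = random.choice([i for i in wanted if counts[i] == fewest])
        holders = sorted(p for p, pieces in self.peers.items() if index in pieces)
        return index, holders


swarm = Availability()
swarm.on_bitfield("A", {0, 1, 2})
swarm.on_bitfield("B", {0, 1})
swarm.on_have("B", 3)
print(swarm.rarest_first(mine={2}))
```

Here pieces 0 and 1 are held by two peers and piece 3 by only one, so with piece 2 already downloaded the sketch selects piece 3 from peer B.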

Securing a p2p network, so that intermediate nodes do not get to access the contents of the packets being transmitted

What mechanisms already exist for designing a P2P architecture in which the different nodes work separately in order to split a task (say, distributed rendering of a 3D image) but, unlike with torrents, don't get to see or hijack the contents of the packets being transmitted? Only the original task requester is entitled to view the results of the complete task.
Any working implementations on that principle already?
EDIT: Maybe I formulated the question wrongly. The idea is that even when they are able to work on the contents of the separate packets being sent, the separate nodes never get the chance to assemble the whole picture. Only the one requesting the task is supposed to do this.
If you have direct P2P connections (no "promiscuous" or "multicasting" sort of mode), the receiving peers should only "see" the data sent to them, nothing else.
If you have relay servers on the way and you are worried that they can sniff the data, I believe encryption is the way to go.
What we do is have peer A transmit data to peer B in an S/MIME envelope: the content is signed with the private key of peer A and encrypted with the public key of peer B.
Only peer B can decrypt the data and is guaranteed that peer A actually sent the data.
This whole process is costly in both CPU and bytes and may not be appropriate for your application. It also requires some form of key management infrastructure: for instance, peers need to subscribe to a community that issues certificates.
But the basic idea is there: asymmetric encryption with a key pair, or shared-secret encryption.

DHT in P2P systems

In a P2P system, what is the difference between:
1. I send a query message to a known node and the node sends back a response (that is, I explicitly contact the node by sending it a message to ask it something).
2. There is a DHT that contains information about nodes and their resources (each record contains a key representing the IP address of a node, and a list of its available resources). If I have access to this DHT (maybe I am a member) and I know the key or identifier of a given node: first, can I look directly at that node's record without sending it a message or query (that is, contact it implicitly)? Second, if yes, how? How is the DHT represented physically, and how does a node update its information?
In case 1, if you are sure the remote node has the resource, then the DHT is useless.
In case 2, the DHT helps you locate resources. Yes, you can look at the DHT record (if you have one) for the remote node. It will give you an indication of whether the resource might be available on that node.
Typically, DHTs are in-memory tables, or tables stored in a small local database. There are many ways to push the info to remote nodes; a common one is to push it to random nodes.

Resources