How does a BitTorrent tracker work? - bittorrent

I read the official documentation here and the Wikipedia article about the BitTorrent client here, but couldn't find how the tracker exactly works. As per my understanding, the tracker should know which peer has which pieces of some file. For example, client 1 has 5 pieces of file 1 and 10 pieces of file 2. In the official documentation I see the tracker has fields like peer_id, ip, port, uploaded, downloaded, left and event, but I don't see where the information about which file we are tracking is. For example, if I ask the tracker "hey, I need pieces for linux.torrent", how would the tracker answer me?

A tracker's job, when a peer announces to a specific swarm (identified by the info_hash), is to register that peer as active on that swarm and then send that peer a list of other peers active on the same swarm.
A tracker does NOT keep track of which pieces or files a peer has.
I recommend reading the unofficial protocol specification: https://wiki.theory.org/index.php/BitTorrentSpecification
It's a bit easier to comprehend than the terse and dense BEP-3.
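So the answer to the question above is: the torrent is identified by the info_hash in the announce request. Here is a minimal sketch of an HTTP announce, assuming a hypothetical tracker URL; note there is no file or piece information anywhere in it:

```python
# Minimal HTTP tracker announce sketch. The info_hash (SHA-1 of the
# torrent's bencoded "info" dictionary) tells the tracker WHICH torrent
# (swarm) we mean; which pieces anyone has is never sent to the tracker.
import urllib.parse
import urllib.request

def announce(tracker_url: str, info_hash: bytes, peer_id: bytes, port: int) -> bytes:
    assert len(info_hash) == 20 and len(peer_id) == 20
    params = {
        "info_hash": info_hash,   # identifies the swarm (the torrent)
        "peer_id": peer_id,       # identifies this client
        "port": port,             # where this client listens
        "uploaded": 0,
        "downloaded": 0,
        "left": 0,
        "event": "started",
    }
    url = tracker_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        # The response is a bencoded dictionary whose "peers" entry lists
        # IP/port pairs active on that swarm.
        return resp.read()
```

Which pieces each returned peer has is only learned later, peer-to-peer, over the peer wire protocol.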

In the official documentation there is a BEP for the UDP tracker protocol here.

Related

BitTorrent: maintaining states among multiple peers

I am following the Kristen Widman tutorial for writing a BitTorrent client, as well as following the wiki; however, I have a point of confusion that I haven't been able to clear up from online resources.
So far I have been able to send a tracker request and receive a list of peers, whose IP addresses and ports I've stored like this:
[{'IP': IPv4Address('76.126.244.88'), 'port': 6881}, ... ]
Kristen suggests in her tutorial that I try to connect to a single peer first. I suppose I'd try to connect with the first peer in the list first, so far so good.
With regard to each peer, however, there are certain states like 'am_choking', 'peer_choking', 'am_interested', 'peer_interested'. For each peer in my list, do I need to keep track of each of these states?
Yes, the connection to each peer needs to maintain an independent set of states to remember the latest status sent by the remote peer.
This is explicitly mentioned in the official BitTorrent specification:
Connections contain two bits of state on either end: choked or not, and interested or not.
and on the wiki too:
A client must maintain state information for each connection that it has with a remote peer: am_choking, am_interested, peer_choking, peer_interested.
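Concretely, that per-connection record might look like this (a minimal sketch; per BEP-3, every connection starts out choked and not interested):

```python
# One state object per remote peer connection, initialized per BEP-3.
from dataclasses import dataclass, field

@dataclass
class PeerConnection:
    ip: str
    port: int
    am_choking: bool = True        # we are choking the remote peer
    am_interested: bool = False    # we want pieces the remote peer has
    peer_choking: bool = True      # the remote peer is choking us
    peer_interested: bool = False  # the remote peer wants our pieces
    pieces: set = field(default_factory=set)  # piece indices the peer has

# One instance per entry in the peer list from the tracker:
peers = [PeerConnection("76.126.244.88", 6881)]
```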

What difference do anchor peers make in a Fabric network?

I removed the anchor peer definition in the configtx.yaml file, and I also didn't update anchor peers on my channel for any participating orgs. Surprisingly, the network works fine and all transactions are going through.
I made the changes in the fabric-samples/first-network folder of Fabric's official GitHub repo. I understand that anchor peers are used for gossip communication and peer discovery, yet having no anchor peers in the network made no visible difference compared to when we define them. I was hoping to see some errors, but none came. How do I gauge the difference between the two cases?
You need anchor peers to allow cross-organization communication, that is, to make peers from different organization domains able to connect to each other. Normally, within each organization, gossip elects a peer to serve as a leader that pulls blocks from the ordering service and gossips them around. Therefore, if no anchor peers are configured, most likely you won't see any difference.
Now, to the question of why you need them at all. Here are two reasons:
You need cross-organization communication during state transfer or replication of missing blocks, for example where one org is partitioned from the ordering service but can still reach the other organization.
The second use case is more complex. You need cross-organization communication for private data, since private data is distributed off-chain, e.g. via gossip. You need to be able to push pieces of private data during endorsement, and pull missing private data during commit.
Hence, unless you encounter either of these two scenarios, you won't feel any difference with or without anchor peers configured.
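For reference, this is roughly what the removed section looks like in configtx.yaml (host and org names follow the first-network sample; adjust to your own network):

```yaml
Organizations:
  - &Org1
    Name: Org1MSP
    # ... MSP settings elided ...
    AnchorPeers:
      # The anchor peer is the externally advertised entry point that
      # peers from OTHER organizations use to bootstrap gossip.
      - Host: peer0.org1.example.com
        Port: 7051
```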

How do the distributed hash tables in IPFS and BitTorrent prevent abuse?

My understanding is that IPFS and the BitTorrent Mainline DHT are built on top of a distributed hash table (Kademlia).
They use the file hash as the Kademlia key to find a list of peers that might have the file.
1- What I don't understand is, if this is all decentralized, who removes from the DHT the peers that no longer host a file's content?
2- What prevents someone from storing a large amount of data for free inside the DHT?
3- What prevents someone from disrupting the network by adding a large number of invalid peers for a popular file?
4- What prevents a bad actor from joining the DHT ring and not following the routing protocol, thus preventing discovery messages from reaching the correct nodes?
Not sure why this was downvoted. These are excellent questions.
1- What I don't understand is, if this is all decentralized, who removes from the DHT the peers that no longer host a file's content?
I think that DHT entries are regularly re-broadcast. So if a peer goes away, its DHT entries will no longer be broadcast and the network will forget about the data it provides unless some other node has it.
2- What prevents someone from storing a large amount of data for free inside the DHT?
Unless you re-publish or somebody else is interested in the data, it will vanish. The amount of data that you can store directly in a DHT entry is limited. So you can make other nodes store some of your data by putting data directly into DHT entries, but the effort outweighs the benefits.
3- What prevents someone from disrupting the network by adding a large number of invalid peers for a popular file?
I think there are some mechanisms envisioned in IPFS to protect the DHT against attacks. However, I don't think the current implementation is all that sophisticated, and I don't think that current IPFS would deal well with a large-scale DDoS attack.
4- What prevents a bad actor from joining the DHT ring and not following the routing protocol, thus preventing discovery messages from reaching the correct nodes?
I think a single node would be insufficient to do much damage, because a node will ask multiple peers. You would have to have multiple nodes to do significant damage.
But IPFS as it is now would not survive a sophisticated attack by state actors.
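To make the first two answers concrete, here is a toy sketch of BitTorrent-style DHT storage where announced entries simply expire unless re-announced. The TTL and all names are assumptions for illustration; real implementations choose their own expiry policies:

```python
# Toy DHT storage node: announces expire unless the announcer re-announces,
# so peers that go offline are forgotten automatically.
import time

ENTRY_TTL = 30 * 60  # assumed time-to-live in seconds

class DhtNode:
    def __init__(self):
        # infohash -> {(ip, port): last_announce_time}
        self.storage = {}

    def announce_peer(self, infohash: bytes, ip: str, port: int) -> None:
        self.storage.setdefault(infohash, {})[(ip, port)] = time.time()

    def get_peers(self, infohash: bytes) -> list:
        now = time.time()
        peers = self.storage.get(infohash, {})
        # Prune expired entries: a peer that stopped re-announcing
        # (e.g. it went away) simply ages out of the table.
        live = {p: t for p, t in peers.items() if now - t < ENTRY_TTL}
        self.storage[infohash] = live
        return list(live)
```

The same expiry also bounds free-riding storage (question 2): anything you stuff into the DHT vanishes unless you keep republishing it.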

How does a peer know another peer is a seed?

When a peer connects to another peer that has all the pieces, how does the connecting peer know that what it connects to is actually a seed (and has all of the pieces)? Are there messages sent between them?
In clients like uTorrent, the peer seems to be aware of the download progress of each peer it has connected to as well.
How does it know all that? Does a peer derive that another peer is a seed if progress is 100% or are there actually specific messages for that? Which parts of the protocol deal with all this?
A peer knows that another peer is a seed if the other peer either:
sends a fully complete bitfield indicating that it has all the pieces in the torrent. - BEP3
sends an incomplete bitfield and then have messages for all the pieces it didn't have from the beginning. (This can either be because it is continuously downloading and completes the torrent, or because it sends a lazy bitfield.) - BEP3
sends a have all message according to the Fast Extension - BEP6
sends upload_only=1 according to the Extension for Partial Seeds - BEP21
A partial seed is a peer that has only downloaded parts of the torrent, doesn't want to download any more, and is seeding what it has.
A peer reports its progress by continuously sending have messages.
This part of the protocol is called the Peer Wire Protocol.
As you can see in the spec, clients are supposed to exchange the bitfield message to tell the other which pieces they currently have. Regular have messages later update this, when a peer receives further pieces (that's the straightforward description anyway, the reality is messier, more on that later).
This is modified by the widely supported Fast Extension, in which peers can compress fully complete and fully empty bitfield messages to have all and have none.
It is also modified by superseeding, in which seeds lie about the pieces they have in order to seed the initial swarm more efficiently. And in general, peers can always lie; in particular, they can pretend not to have pieces which they really do have, and you'd never know for sure.
Which brings me back to the messier reality. Peers may choose not to send the have x message to you if you told them you have x, because it won't make any difference to whether you will request x from them (you won't, because you already have it). On the other hand, that is bad for some optimizations, such as prioritizing the upload of rare pieces and, in particular, superseeding.
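As a rough sketch (the class and names are mine, not from any spec), a client can track this per remote peer by combining the initial bitfield with subsequent have messages:

```python
# Infer "this peer is a seed" from bitfield + have messages, given the
# torrent's piece count. Caveat from above: peers can lie (superseeding).

def parse_bitfield(payload: bytes, num_pieces: int) -> set:
    """Return the piece indices set in a bitfield message (BEP-3:
    piece 0 is the high bit of the first byte)."""
    have = set()
    for i in range(num_pieces):
        byte, bit = divmod(i, 8)
        if payload[byte] & (0x80 >> bit):
            have.add(i)
    return have

class RemotePeer:
    def __init__(self, num_pieces: int):
        self.num_pieces = num_pieces
        self.pieces = set()

    def on_bitfield(self, payload: bytes) -> None:
        self.pieces = parse_bitfield(payload, self.num_pieces)

    def on_have(self, index: int) -> None:
        # Each 'have' message announces one newly completed piece;
        # this is also how clients display per-peer progress.
        self.pieces.add(index)

    def is_seed(self) -> bool:
        return len(self.pieces) == self.num_pieces
```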
According to the BitTorrent protocol specification:
The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer finishes downloading a piece and checks that the hash matches, it announces that it has that piece to all of its peers.
Then, yes, messages are exchanged by peers so they may know what is available for download. The protocol "part" that deals with this is the Peer Protocol.

Peer to Peer: Methods of Finding Peers

Are there any known methods of finding peers without using a dedicated central server?
i.e.: if I have peers which are disconnecting and reconnecting to the internet, getting a new IP address each time, and I want to connect to them without setting up a dedicated server to register with.
I was thinking about using the peers' email addresses to send a manifest of connected peers periodically, with some sort of timecode, negating the need for a dedicated server. This would be a fallback if none of the peers could be reached after trying all the previously known peer addresses. But existing models of finding peers would be preferable.
There's no way around having to know at least one initial peer to discover more.
Fully P2P protocols, such as Gnutella or Gnutella2, or the simpler Overnet (made famous by Storm Worm), are based on each client having a start-up list of a few peers. These can come off a web-based automated tracker for example. The client will discover the whole network or portions of it by asking other peers for more addresses, for example when delegating a file search.
If you truly can't have any kind of centralized resource, the best you can do is find the first peer through broadcast messages and, ultimately, IP address scanning. The first approach is well-meaning, but in at least 98% of cases won't yield any results. The latter approach, of course, is abusing the internet, as well as illegal in most countries.
I really would rethink having some kind of a central tracker. It can be something as simple as a PHP script on a webserver (the gnutella network, today, is held up by ten-twenty such scripts, hosted by people who don't even know each other). And this sure is more lightweight than email (which, due to spam filters at the very least, would not work anyway).
In the limited case of peers within an intranet, it is possible to send a broadcast UDP message to a known port asking for peers to report back.
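A minimal sketch of that intranet case (the port and the two-message exchange are made up; responders would bind the same port and answer each query with a reply packet):

```python
# Broadcast "who's there?" on the local network and collect replies.
import socket

DISCOVERY_PORT = 50000  # assumed, any agreed-upon port works

def ask_for_peers(timeout: float = 2.0) -> list:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.settimeout(timeout)
    s.sendto(b"PEERS?", ("255.255.255.255", DISCOVERY_PORT))
    peers = []
    try:
        while True:
            data, addr = s.recvfrom(1024)
            if data == b"PEER!":
                peers.append(addr)   # (ip, port) of a responding peer
    except socket.timeout:
        pass
    return peers
```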
The BitcoinQT client uses a variety of methods to find nodes, some of them might be useful to you.
Satoshi Client Node Discovery
IRC is no longer used, but might be the easiest to implement:
As of version 0.6.x the Bitcoin client no longer uses IRC bootstrapping by default, and as of version 0.8.2 support for IRC bootstrapping has been removed completely. This documentation below is accurate for most prior versions.
In addition to learning and sharing its own address, the node learned about other node addresses via an IRC channel. See irc.cpp.
After learning its own address, a node encoded its own address into a string to be used as a nickname. Then, it randomly joined an IRC channel named between #bitcoin00 and #bitcoin99. Then it issued a WHO command. The thread read the lines as they appeared in the channel and decoded the IP addresses of other nodes in the channel. It did this in a loop, forever, until the node was shut down.
When the client discovered an address from IRC, it set the timestamp on the address to the current time, but it used a "penalty" of 51 minutes, which means it looked like it was actually seen almost an hour earlier.
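A toy version of that flow might look like this (the server, channel scheme, and nickname encoding are illustrative, not Bitcoin's actual ones):

```python
# IRC-style bootstrapping: join a channel, issue WHO, collect nicknames
# (which, in the real scheme, encode peer addresses).
import random
import socket

def irc_bootstrap(server: str = "irc.example.net", port: int = 6667) -> None:
    chan = "#bootstrap%02d" % random.randint(0, 99)   # cf. #bitcoin00..99
    nick = "peer%08x" % random.getrandbits(32)        # would encode our IP
    sock = socket.create_connection((server, port))
    f = sock.makefile("rw", encoding="utf-8", newline="")
    f.write(f"NICK {nick}\r\nUSER {nick} 0 * :{nick}\r\n")
    f.write(f"JOIN {chan}\r\nWHO {chan}\r\n")
    f.flush()
    for line in f:
        if line.startswith("PING"):                   # keep the link alive
            f.write("PONG " + line.split()[1] + "\r\n")
            f.flush()
        elif " 352 " in line:                         # RFC 1459 WHO reply
            print("saw peer nickname:", line.split()[7])
        elif " 315 " in line:                         # end of WHO list
            break
```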
Take advantage of any existing forum where data can be posted. Think a secret IRC channel, embedding data in photos and posting to photo-sharing sites (4chan?), any site that would allow your application to log in and post data without CAPTCHA logins, etc.
http://chatzilla.hacksrus.com/faq/#password
Another strategy might be to embed messages in digital currency transactions. Pick a cheap coin that's likely to hang around ... DOGE or MOON coin, maybe. Build wallet functionality into your app, such that you can post micro-transactions back and forth between addresses that your app controls. There would still be a miner's fee, but this is only fractions of pennies. Even if they later prohibit adding metadata to transactions, you could make a transaction equivalent to your IP address in MOON, and use vanity addresses in MOON coin for your app, such that when a new node comes online it knows what to search the blockchain for -- 2daMOON%bootStr#pM3. SEND - 104.003021133 MOON, IP = 104.3.21.133. Not an expensive proposition.
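A toy encoder/decoder for that amount-based scheme (purely illustrative): the first octet becomes the whole part of the amount, and the remaining three octets become three zero-padded digit groups after the decimal point:

```python
# Hide an IPv4 address in a coin amount, matching "104.003021133 MOON".
def ip_to_amount(ip: str) -> str:
    a, b, c, d = (int(x) for x in ip.split("."))
    return f"{a}.{b:03d}{c:03d}{d:03d}"

def amount_to_ip(amount: str) -> str:
    whole, frac = amount.split(".")
    return ".".join([whole] + [str(int(frac[i:i + 3])) for i in range(0, 9, 3)])

assert ip_to_amount("104.3.21.133") == "104.003021133"
assert amount_to_ip("104.003021133") == "104.3.21.133"
```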
Old question, but I've been thinking about this problem myself, so I'll add my 2 cents. In short, a central server is not required if a node is aware of at least one valid peer. New nodes must be added to the network by any current member (e.g. invited, or a node spawns another node, depending on your application).
Assuming that:
agents keep track of peers; the size of this address book and how entries are managed will depend on the nature of the system, e.g. how long peers remain connected and whether peers use stable addresses
agents share peer information with other peers
at least some agents remain available for long periods relative to how frequently a node connects to the network to update its address book (or nodes have stable addresses)
in addition to peer addresses, availability information is also tracked (many options here depending on your system; examples include whether the peer has a stable address, when it was last seen, some availability metric, content/service type information, and an address valid-until time if known)
new agents are initialized with at least one valid peer (it doesn't have to be a central node; any valid node will do)
trust mechanisms will be required if malicious peers are a possibility
When a peer comes online, it queries the peers in its peer table to discover which are active, and perhaps removes expired dynamic addresses. Nodes exchange peer information and may become linked themselves. This peer discovery/exchange may continue for a certain number of hops, or via random walk, until the peer list is of sufficient size and/or quality.
A few more details:
Nodes connect and share peer information with a frequency related to how often node addresses change, so the address book doesn't become stale and a node doesn't end up disconnected because none of its former peers are reachable at their last known addresses
Nodes may need to limit the number of peers they accept, to avoid a tendency towards centralization around the most stable nodes.
Nodes should be selective about the peers they keep, i.e. ones with which they are more likely to exchange data (e.g. weighted by history)
Node links may be asymmetric or symmetric depending on the application
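A sketch of such an address book (the constants and names are assumptions, not from any particular protocol):

```python
# Address book with last-seen timestamps, gossip merge, and pruning.
import time

MAX_PEERS = 1000          # cap to avoid centralizing around stable nodes
STALE_AFTER = 7 * 86400   # drop peers not seen for a week

class AddressBook:
    def __init__(self):
        self.peers = {}   # (ip, port) -> last_seen timestamp

    def saw(self, addr) -> None:
        self.peers[addr] = time.time()

    def merge(self, gossiped: dict) -> None:
        """Fold in entries learned from another peer, keeping the newest."""
        for addr, seen in gossiped.items():
            if seen > self.peers.get(addr, 0):
                self.peers[addr] = seen
        self.prune()

    def prune(self) -> None:
        cutoff = time.time() - STALE_AFTER
        self.peers = {a: t for a, t in self.peers.items() if t >= cutoff}
        # If still over the cap, keep the most recently seen entries.
        if len(self.peers) > MAX_PEERS:
            keep = sorted(self.peers.items(), key=lambda kv: -kv[1])[:MAX_PEERS]
            self.peers = dict(keep)
```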
Three ways, off the top of my head, though you're always going to need some central server to start the connection unless you went with option 3.
1. A central server that maintains a known list of peers, with keep-alives.
2. One or more central servers that maintain some common resource peers can use to discover one another, but once connected the peers no longer need the central server as long as they remain connected (something like BitTorrent); peered connections can be chained as well.
3. Port/IP scanning (strongly not recommended).
In your example, you'd still have some kind of central server where the peers would be registered; the protocol is the only difference.
To put it simply: no, there is no way to do this without a central server.
If you want to do this, you simply need one or more central servers, whether reached by dynamic DNS or not. The clients need a method to discover where they should connect, and the only truly sensible way to do this is with your own server; in the simplest scenario it only needs to send an IP address in response.
Virtual servers can be had for around $15/month, which IMO is considerably cheaper than trying to use or abuse someone else's bandwidth.
[Edit].
To put it simply, there is another way, as follows.
Upon reflection, I think what I'd do is designate a set of peers as cluster controllers and use a dynamic DNS service to allow other peers to discover the cluster controllers.
Choose a dynamic DNS provider; I'll call the host name myc.ath.cx (I use http://www.dyndns.com/).
Each peer has to be capable of becoming a cluster controller. A cluster controller will contain a list of all the other peers connected.
When a peer is started, it looks up myc.ath.cx and attempts to connect. If a connection cannot be made within a period, say 30 seconds, it takes over the registration of the DNS entry.
Any peer wishing to discover other peers can simply query myc.ath.cx and a list will be provided.
All peers are responsible for periodically downloading the list of peers, in case they need to become the cluster controller.
The cluster controller will periodically query the DNS entry; if it has changed from its own IP address, then it knows that it is no longer the cluster controller, so it will contact the cluster controller that currently holds the DNS entry and provide its list of known hosts.
The cluster controller will periodically contact hosts on the list to ensure that they are still valid.
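A sketch of the takeover logic described above. The dynamic-DNS update and the peer-list hand-off are provider/protocol specific, so they appear here as hypothetical stubs:

```python
# Cluster-controller election via a dynamic DNS entry.
import socket

DNS_NAME = "myc.ath.cx"   # the dynamic DNS entry from the scheme above
PORT = 9999               # assumed controller port
CONNECT_TIMEOUT = 30      # seconds, per the scheme above

def update_dyndns_entry(name: str, ip: str) -> None:
    """Hypothetical stub: call your dynamic DNS provider's update API."""

def send_host_list_to(ip: str) -> None:
    """Hypothetical stub: hand our known-hosts list to the new controller."""

def start_peer(my_ip: str):
    try:
        addr = socket.gethostbyname(DNS_NAME)
        sock = socket.create_connection((addr, PORT), timeout=CONNECT_TIMEOUT)
        return "member", sock        # a controller answered; stay a member
    except OSError:
        # Nobody answered within the window: claim the DNS entry ourselves.
        update_dyndns_entry(DNS_NAME, my_ip)
        return "controller", None

def controller_heartbeat(my_ip: str) -> None:
    # Re-check the DNS record; if it no longer points at us, someone else
    # took over, so hand them our list of known hosts.
    current = socket.gethostbyname(DNS_NAME)
    if current != my_ip:
        send_host_list_to(current)
```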
Your method of sending email does use a dedicated server, though; the peer's email server, to be precise.
Roughly, I don't think it's possible without using some sort of dedicated storage or server (which the email approach does, albeit obliquely), UNLESS you are able to characterize the connectivity to the internet that your peers are using.
Basically, if you have a set of X peers that connect for Y amount of time and are then off the grid for Z amount of time, you can construct a probability estimate of how likely it is that the set of peers you last contacted is still available; where that probability approaches 1 (for a given X, Y, and Z), you can most likely sustain a peer-to-peer network without using storage.
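One way to make that concrete (assuming, as a simplification, that each peer is independently online a fraction p = Y/(Y+Z) of the time):

$$P(\text{at least one known peer is reachable}) = 1 - (1 - p)^{X}, \qquad p = \frac{Y}{Y+Z}$$

which approaches 1 as X grows, matching the intuition above.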
Possibly more in the spirit: instead of having a "dedicated central server", use a simple, free online service to specify a peer list. Set up a Yahoo group or something like that; clients can automatically look it up and get a peer address from which to query a set of peers; the client can be coded with the authentication to post to the group, and can post its IP address periodically so that others can request the set of known active peers.
If you want to get really tricky, you can start using basically steganographic methods to hide peer location information. I.e., do a Google search for "blah"; find the first site listed in the results that has an unprotected (no CAPTCHA) message board; find the third (or whatever) post that starts with "Indubitably" (or whatever), and find the header of the first message there, and there's the IP address of a peer. If that doesn't work, go down the list of search terms to the next one.
But that's sneaky. :-)
Could you re-use an existing dedicated server for the purpose?
I am thinking in particular of registering each of the peers with a Dynamic DNS, but if you were willing to get a bit uglier, sharing access to a known Hotmail account or Google Doc or the like.
You can either use a central directory or some sort of broadcast protocol for service discovery. Assuming that you could get them indexed by Google, you could conceive of a system whereby each peer runs a web site with some unique, rare words contained on a specific page. You could then use Google search results based on these words to identify potential peers. This would essentially be a (noisy and slow) internet broadcast.
If the page structure was a well known pattern or contained identifiable connection information for that peer, it would be easy to distinguish them in the search results. Using such a public directory leaves you open to compromised nodes in the network that is formed, but this is pretty much true of any P2P network absent some security mechanism.
Getting the web sites crawled and highly ranked by Google (or some other search engine) for your particular arcane set of search terms would be the trick. I can think of a couple of ways, but they aren't ones that I would use. For a legitimate service, I'd rather spend the money or find a free web site that could function as a directory.
What about another P2P system built specifically to track online peers of other P2P systems?
Then we reduce the problem of finding peers for any new P2P system to simply finding peers for the 'main' P2P system, which will give you the addresses of online peers for the system you're interested in using...
This is a typical use case for a distributed hash table algorithm. I'd suggest looking at something like Pastry. It uses an overlay network (an application-layer network) on top of other layers.
Each node has a GUID which is used to route requests across the peer network.
If you're looking for an already established central server, then see the metaserver entry on the page here:
http://martindevans.appspot.com/
You can register peers on there and then other peers can find them. Obviously this is a central server, but it requires no maintenance on your part.
