Assume there is a P2P file sharing system which has no trackers but only a DHT.
How to know the number of all active peers uploading/downloading a specific file?
Is it just keep querying the DHT by get_peers to get new peers? Are there any better solution?
The distributed part in the DHT makes it hard to get the exact number of peers in a swarm from it. Technically it's also unneeded and not a very useful number, as its only necessary to get in contact with only one other peer in the swarm and then the PeerEXchange extension will give plenty more peers in a more efficient way than the DHT.
Some clients also support the BEP33 DHT scrape extension that can give a approximate number of peers registered in the DHT with a max capacity of ca 6000.
Unfortunately it's badly designed and has a vulnerability making it the currently most potent vector for UDP amplification attacks using the BitTorrent protocol. It has a BAF (Bandwidth Amplification Factor) of 13.4 The attack is called Distributed Reflective Denial of Service (DRDoS) and is decribed in this paper. If this vector starts to get used it may be necessary to speedly remove this extension from the protocol.
Related
Suppose that the client keeps a hardcoded table of some known, famous nodes in order to start the connection with the network. When that node initially connects to the entry point, it must then proceed to find its own peers. How is that done?
The node obviously can't just ask the initial nodes for their neighbors.
https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery explains the process pretty well.
The gist is:
A central service to identify self's public IP address.
Hardcoded IRC channel, DNS domains or IP addresses to discover initial nodes.
Followed by each node regularly sharing their peers using specific advertisement messages.
I'm in particular wondering if something complex such as kademlia is required/used, or if we could have it as simple as randomly asking nodes neighbors and then neighbors neighbors for a while.
Both exist, the commonly used terms are structured and unstructured overlay network. The latter usually try to be a small world network.
Whether a simple network is sufficient depends on what you want to do with it. They generally work well enough for simply propagating updates to everyone or forming separate networks for each topic of interest. More complex things like address-based lookups on the other hand will benefit from structued overlays like kademlia.
Hi I want to make an application that if given a torrent file (or hash) can give the number of peers without being active (i.e not responsible) in the process that allow the sharing of a file (for legal reason obviously). whether by being a "passive" (passive as define previously) tracker or a bittorrent client that counts "All time" peers (i.e. number of download for a torrent). Can it be done? I know some trackers keep track of download but I don't know if those who "seem not to" actually do as well. I look for something that can track the number of unique-ip transfers from when the torrent was added to the tracking system or something that count download (complete).
It's not possible to determine all peers just from a tracker. There can be multiple trackers for each torrent, and they may not store complete, fresh, or even truthful information. Additionally there's no obligation for peers to be honest with their trackers. There are also alternatives to centralized trackers, such as DHT and PEX. There's no guarantee that all peers are participating in the same DHT network. Peers might even establish disjoint PEX communities.
In short, you might make a best effort attempt at determining the total swarm participation for a particular torrent by checking trackers and querying DHT. But to be as thorough as the technology will allow, you'd actually have to participate in the swarm with all manner of transports and protocol extensions currently in use such as uTP and encryption, and scrape each peer for further peers and download states. Of course the BitTorrent community is familiar with such attempts to scrape data, and there a lot of security measures in place to prevent exploitation in this way. Examples include IP blocklists, and heuristics on peer behaviour.
This question already has answers here:
Peer to Peer: Methods of Finding Peers
(13 answers)
Closed 7 years ago.
Simple question. How does client connect to a p2p system if it does not know any already connected nodes? Is it event possible? Like in torrent magnet links, or pastry?
Yes, it is possible. One technique is to use a well-known URL, where the peer can fetch a list of IP addresses of (bootstrap/central) peers currently reachable online. Another technique is to send a multicast message on the LAN and hope that another peer on the same LAN is already connected to the P2P network. Then, addresses of other peers can be fetched from it.
There is an academic paper on this subject which is quite interesting. Addressing the P2P bootstrap problem for small overlay networks 2010 by David Isaac Wolinsky, Pierre St. Juste, P. Oscar Boykin, and Renato Figueiredo.
It explores using already existing P2P-services (like XMPP, BitTorrent, Gnutella, and Brunet) for bootstrapping smaller, private overlays.
This is the abstract:
"P2P overlays provide a framework for building distributed applications consisting of few to many resources with features including self-configuration, scalability, and resilience to node failures. Such systems have been successfully adopted in large-scale services for content delivery networks, file sharing, and data storage. In small-scale systems, they can be useful to address privacy concerns and for network applications that lack dedicated servers. The bootstrap problem, finding an existing peer in the overlay, remains a challenge to enabling these services for small-scale P2P systems. In large networks, the solution to the bootstrap problem has been the use of dedicated services, though creating and maintaining these systems requires expertise and resources, which constrain their usefulness and make them unappealing for small-scale systems. This paper surveys and summarizes requirements that allow peers potentially constrained by network connectivity to bootstrap small-scale overlays through the use of existing public overlays. In order to support bootstrapping, a public overlay must support the following requirements: a method for reflection in order to obtain publicly reachable addresses, so peers behind network address translators and firewalls can receive incoming connection requests; communication relaying to share public addresses and communicate when direct communication is not feasible; and rendezvous for discovering remote peers, when the overlay lacks stable membership. After presenting a survey of various public overlays, we identify two overlays that match the requirements: XMPP overlays, such as Google Talk and Live Journal Talk, and Brunet, a structured overlay based upon Symphony. We present qualitative experiences with prototypes that demonstrate the ability to bootstrap small-scale private structured overlays from public Brunet or XMPP infrastructures."
I’m looking at creating a P2P system. During initial research, I’m reading from Peer-to-Peer – Harnessing the Power of Disruptive Technologies. That book states “a fully decentralized approach to instant messaging would not work on today's Internet.” Mostly blaming firewalls and NATs. The copyright is 2001. Is this information old or still correct?
It's still largely correct. Most users still are behind firewalls or home routers that block incoming connections. Those can be opened easier today than in 2001 (using uPnP for example, requiring little user interaction and knowledge) but most commercial end-user-targeting applications - phone (Skype, VoIP), chat (the various Messengers), remote control - are centralized solutions to circumvent firewall problems.
I would say that it is just plain wrong, both now and then. Yes, you will have many nodes that will be firewalled, however, you will also have a significant number who are not. So, if end-to-end encryption is used to protect the traffic from snooping, then you can use non-firewalled clients to act as intermediaries between two firewalled clients that want to chat.
You will need to take care, however, to spread the load around, so that a few unfirewalled clients aren't given too much load.
Skype uses a similar idea. They even allow file transfers through intermediaries, though they limit the through-put so as not to over load the middle-men.
That being said, now in 2010, it is a lot easier to punch holes in firewalls than it was in 2001, as most routers will allow you to automate the opening of ports via UPNP, so you are likely to have a larger pool of unfirewalled clients to work with.
Firewalls and NATs still commonly disrupt direct peer-to-peer communication between home-based PCs (and also between home-based PCs and corporate desktops).
They can be configured to allow particular peer-to-peer protocols, but that remains a stumbling block for most unsavvy users.
I think the original statement is no longer correct. But the field of Decentralized Computing is still in its infancy, with little serious contenders.
Read this interesting post on ZeroTier (thanks to #joehand): The State of NAT Traversal:
NAT is Traversable
In reading the Internet chatter on this subject I've been shocked by how many people don't really understand this, hence the reason this post was written. Lots of people think NAT is a show-stopper for peer to peer communication, but it isn't. More than 90% of NATs can be traversed, with most being traversable in reliable and deterministic ways.
At the end of the day anywhere from 4% (our numbers) to 8% (an older number from Google) of all traffic over a peer to peer network must be relayed to provide reliable service. Providing relaying for that small a number is fairly inexpensive, making reliable and scalable P2P networking that always works quite achievable.
I personally know of Dat Project, a decentralized data sharing toolkit (based on their hypercore protocol for P2P streaming).
From their Dat - Distributed Dataset Synchronization And Versioning paper:
Peer Connections
After the discovery phase, Dat should have a list of
potential data sources to try and contact. Dat uses
either TCP, UTP, or HTTP. UTP is designed to not
take up all available bandwidth on a network (e.g. so
that other people sharing wifi can still use the Inter-
net), and is still based on UDP so works with NAT
traversal techniques like UDP hole punching.
HTTP is supported for compatibility with static file servers and
web browser clients. Note that these are the protocols
we support in the reference Dat implementation, but
the Dat protocol itself is transport agnostic.
Furthermore you can use it with Bittorrent DHT. The paper also contains some references to other technologies that inspired Dat.
For implementation of peer discovery, see: discovery-channel
Then there is also IPFS, or 'The Interplanetary File System' which is currently best positioned to become a standard.
They have extensive documentation on their use of DHT and NAT traversal to achieve decentralized P2P.
The session messenger seem to have solved the issue with a truly decentralized p2p messenger by using a incentivized mixnet to relay and store messages. Its a fork of the Signal messenger with a mixnet added in. https://getsession.org -- whitepaper: https://getsession.org/wp-content/uploads/2020/02/Session-Whitepaper.pdf
It's very old and not correct. I believe there is a product out called Tribler (news article) which enables BitTorrent to function in a fully decentralized way.
If you want to go back a few years (even before that document) you could look at Windows. Windows networking used to function in a fully decentralized way. In some cases it still does.
UPNP is also decentralized in how it determines available devices on your local network.
In order to be decentralized you need to have a way to locate other peers. This can be done proactively by scanning the network (time consuming) or by having some means of the clients announcing that they are available.
The announcements can be simple UDP packets that get broadcast every so often to the subnet which other peers listen for. Another mechanism is broadcasting to IIRC channels (most common for command and control of botnets), etc. You might even use twitter or similar services. Use your imagination here.
Firewalls don't really play a part because they almost always leave open a few ports, such as 80 (http). Obviously you couldn't browse the network if that was closed. Now if the firewall is configured to only allow connections that originated from internal clients, then you'd have a little more work to do. But not much.
NATs are also not a concern for similiar issues.
I want to collect statistics from the spreading of a file in a new bittorrent swarm without actually downloading anything (or as little as possible). I need to know which peer has which pieces (to make file based statistics) knowing the number of seeders and leechers or percentages is not enough. Later when there are many peers I need to download the data to determine what it is. This part can be done with a regular torrent client.
I do not plan to implement the protocol myself so I looked at 2 implementations libtorrent and ktorrent's libbtcore. Neither is capable of collecting data while not downloading there are simply no connected peers when there is nothing to download. Libtorrent is simpler but ktorrent looks better commented.
I see 3 possibilities:
Use some application exactly for this. Are there any?
Modify a torrent implementation to do what I want. Is anyone familiar with them? Where to start?
Implement a small subset of the protocol. Just periodically ask the peers what they have. Is this feasible or would the program need to support almost the full protocol?
What do you recommend?
This is an old question, but perhaps this answer might be useful for others.
Use some application exactly for this. Are there any?
Not that I know of.
Modify a torrent implementation to do what I want. Is anyone familiar with them? Where to start?
I'm only familiar with the BitTornado core (that is used in e.g. ABC). It is written in Python, but it's an architectural mess.
However, you could just take any implementation and start stripping it from unnecessary functionality.
Implement a small subset of the protocol. Just periodically ask the peers what they have. Is this feasible or would the program need to support almost the full protocol?
Note that you cannot "ask" a peer what they have. The other peer informs you whenever it wants about the pieces it has (so it's push instead of pull). After the BitTorrent handshake, a peer may send a bitfield of pieces it has. Afterwards it may send HAVE messages informing you it has acquired a new piece. Also note that peers may lie about the pieces they have. Examples include superseeding peers and freeriding clients like BitThief.
If you want to implement a small subset of the protocol, you'd need at the bare minimum implement the BitTorrent handshake message and preferably the extended handshake message. The latter allows you to receive (and send) uTorrent PEX messages. PEX is useful to quickly discover other peers in the swarm.
For your statistics gathering purposes, you additionally need to support the bitfield and HAVE messages.