Hi, I want to make an application that, given a torrent file (or info hash), reports the number of peers without taking an active part in sharing the file (i.e. without being responsible for it, for obvious legal reasons). It could be a "passive" tracker (passive as defined above) or a BitTorrent client that counts "all-time" peers (i.e. the number of downloads for a torrent). Can it be done? I know some trackers keep track of downloads, but I don't know whether the ones that seem not to actually do so as well. I'm looking for something that can track the number of unique-IP transfers since the torrent was added to the tracking system, or something that counts completed downloads.
It's not possible to determine all peers just from a tracker. There can be multiple trackers for each torrent, and they may not store complete, fresh, or even truthful information. Additionally there's no obligation for peers to be honest with their trackers. There are also alternatives to centralized trackers, such as DHT and PEX. There's no guarantee that all peers are participating in the same DHT network. Peers might even establish disjoint PEX communities.
In short, you might make a best-effort attempt at determining the total swarm participation for a particular torrent by checking trackers and querying the DHT. But to be as thorough as the technology will allow, you'd actually have to participate in the swarm with all manner of transports and protocol extensions currently in use, such as uTP and encryption, and scrape each peer for further peers and download states. Of course, the BitTorrent community is familiar with such attempts to scrape data, and there are a lot of security measures in place to prevent exploitation in this way. Examples include IP blocklists and heuristics on peer behaviour.
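To illustrate the tracker half of that best-effort approach: many HTTP trackers answer a "scrape" request with a bencoded dictionary of per-torrent counters (complete, incomplete, downloaded). The sketch below only decodes such a reply. The function names and the sample reply are made up for illustration, and the numbers are whatever the tracker chooses to report, so treat them as a lower-quality hint rather than a true count.

```python
def bdecode(data, i=0):
    """Decode one bencoded value at offset i; return (value, next_offset)."""
    marker = data[i:i + 1]
    if marker == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if marker == b"l":                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if marker == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)             # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def scrape_counters(reply, info_hash):
    """Pull the counters for one torrent out of a tracker scrape reply."""
    decoded, _ = bdecode(reply)
    stats = decoded[b"files"][info_hash]
    return {
        "seeders": stats.get(b"complete"),
        "leechers": stats.get(b"incomplete"),
        "completed_downloads": stats.get(b"downloaded"),
    }
```

Repeating this for every tracker listed in the torrent and comparing the answers gives a rough picture, with all the caveats above about stale or untruthful data.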
Background
I'm trying to add some active trackers to the transmission daemon to speed it up, as I did before when using aria2.
But all the resources I found explain how to add trackers to an individual torrent.
Question
So I'm wondering what a tracker is tied to: the torrent file or the downloader? If it is the torrent file, how do I add trackers in aria2? The only way I can imagine is that aria2 automatically adds trackers to each added torrent.
BTW, how do I add default trackers in the transmission daemon, just like in aria2?
Trackers can be centralized servers from which you request a list of peers.
The torrent file and the download don't go through the tracking server; the tracking server simply tells you whom you can ask for pieces of the file.
Adding more trackers won't by itself make the download faster; it just gives you a wider pool of peers to pick from. If you want the download to be faster, you'll have to increase the number of peers you are downloading from. (I think tracking servers usually return around 80 peers at a time in any case.)
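As a small sketch of what "a wider pool of peers" means in practice: assuming each tracker returns its peers in the common compact form (6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte big-endian port), the lists from several trackers can be parsed and merged like this. The function names are hypothetical.

```python
import socket
import struct


def parse_compact_peers(blob):
    """Split a compact peer string into (ip, port) tuples."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[offset:offset + 4])
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])
        peers.append((ip, port))
    return peers


def merge_peers(*peer_lists):
    """Union of the peers reported by several trackers, duplicates removed."""
    return set().union(*peer_lists)
```

Two trackers that report overlapping peers add less to the pool than their raw list sizes suggest, which is why extra trackers help only up to a point.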
There are also decentralized means of doing this, using a DHT (distributed hash table).
Assume there is a P2P file sharing system which has no trackers but only a DHT.
How to know the number of all active peers uploading/downloading a specific file?
Is it just a matter of repeatedly querying the DHT with get_peers to get new peers? Is there a better solution?
The distributed nature of the DHT makes it hard to get the exact number of peers in a swarm from it. Technically it's also unneeded and not a very useful number, as it's only necessary to get in contact with one other peer in the swarm; the Peer Exchange (PEX) extension will then give plenty more peers in a more efficient way than the DHT.
Some clients also support the BEP33 DHT scrape extension, which can give an approximate number of peers registered in the DHT, with a max capacity of ca. 6000.
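For context on why the number is only approximate: a DHT scrape reply carries Bloom filters rather than peer lists, and the count is estimated from how many bits are still zero. The sketch below uses the standard Bloom-filter cardinality estimate. The filter size (256 bytes) and the two hash functions are my recollection of BEP33, so treat both constants as assumptions and check them against the spec.

```python
import math

M_BITS = 256 * 8   # assumed filter size in bits
K_HASHES = 2       # assumed number of hash functions per inserted item


def estimate_count(bloom):
    """Estimate how many distinct items were inserted into a Bloom filter."""
    if len(bloom) != M_BITS // 8:
        raise ValueError("unexpected filter size")
    set_bits = sum(bin(byte).count("1") for byte in bloom)
    zero_bits = M_BITS - set_bits
    if zero_bits == M_BITS:
        return 0.0                       # nothing inserted
    zero_bits = max(zero_bits, 1)        # saturated filter: avoid log(0)
    return math.log(zero_bits / M_BITS) / (K_HASHES * math.log(1 - 1 / M_BITS))
```

Once most bits are set, the estimate stops being meaningful, which is where the capacity limit comes from.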
Unfortunately, it's badly designed and has a vulnerability that makes it currently the most potent vector for UDP amplification attacks using the BitTorrent protocol. It has a BAF (Bandwidth Amplification Factor) of 13.4. The attack is called Distributed Reflective Denial of Service (DRDoS) and is described in this paper. If this vector starts to get used, it may be necessary to speedily remove this extension from the protocol.
I know the concept of building a simple P2P network without any server. My problem is with securing the network. The network should have some administrative nodes. So there are two kinds of nodes:
Nodes with privileges
Nodes without privileges
The first question is: Can I assign some nodes more rights than others, like the privileges to send a broadcast message?
How can I secure the network against modified nodes that are trying to gain privileges?
I'm really interested in answers and resources that can help me. It is important to me to understand this, and I'm happy to add further information if anything is unclear.
You seem lost, and I used to do research in this area, so I'll take a shot. I feel this question is borderline off-topic, but I tend to err toward leaving things open.
See Chord, CAN, Tapestry, and Pastry for examples of P2P networks as well as pseudocode. These works are all based on distributed hash tables (DHTs) and have been around for over 10 years now. Many of them have open source implementations you can use.
As for "privileged nodes", your question contradicts itself. You want a P2P network, but you also want nodes with more rights than others. By definition, your network is no longer P2P because peers are no longer equally privileged.
Your question points to trust within P2P networks, a problem that academics have focused on since the introduction of DHTs. I feel that no satisfactory answer has been found yet that solves all problems in all cases. Here are a few approaches which will help you:
(1) Bitcoin addresses malicious users by forcing all users within the network to perform computationally intensive work. For any member to forge bitcoins, they would need more computational power than everyone else in the network, to prove they had done more work than everyone else.
(2) Give privileges based on reputation. You can calculate reputation in any number of ways. One simple example - for each transaction in your system (file sent, database look up, piece of work done), the requester sends a signed acknowledgement (using private/public keys) to the sender. Each peer can then present the accumulation of their signed acknowledgements to any other peer. Any peer who has accumulated N acknowledgements (you determine N) has more privileges.
(3) Own a central server that hands out privileges. This one is the simplest and you get to determine what trust means for you. You're handing it out.
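Approach (1) can be illustrated with a hashcash-style sketch: a node must find a nonce whose hash has a given number of leading zero bits, which is expensive to produce but cheap for anyone else to check. This is a simplified illustration of the idea, not Bitcoin's actual scheme; the function names and the 8-byte nonce encoding are assumptions.

```python
import hashlib


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(payload, difficulty_bits):
    """Brute-force a nonce so that SHA-256(payload + nonce) meets the difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1


def verify(payload, nonce, difficulty_bits):
    """Check a claimed proof with a single hash."""
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

Each extra bit of difficulty roughly doubles the expected work in solve, while verify stays a single hash. That asymmetry is what makes faking work costly for a modified node.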
That's the skinny version - good luck.
I'm guessing that the administrative nodes are different from normal nodes by being able to tell other nodes what to do (and the regular nodes should obey).
You have to give the admin nodes some way to prove themselves that can be verified by other nodes but not forged by them (like a policeman's ID). The most standard way I can think of is by using TLS certificates.
In (very) short, you create pairs of files, each consisting of a key and a certificate. The key is secret and belongs to one identity, and the certificate is public.
You create a CA certificate, and distribute it to all of your nodes.
Using that CA, you create "administrative node" certificates, one for each administrative node.
When issuing a command, an administrative node presents its certificate to the "regular" node. The regular node, using the CA certificate you provided beforehand, can make sure the administrative node is genuine (because the certificate was actually signed by the CA), and it's OK to do as it asks.
Pros:
TLS/SSL is used by many other products to create a secure tunnel, preventing "man in the middle" attacks and impersonations
There are ready-to-use libraries and sample projects for TLS/SSL in practically every language, from .net to C.
There are revocation lists, to "cancel" certificates that have been stolen (although you'll have to find a way to distribute these)
Certificate verification is offline - a node needs no external resources (except for the CA certificate) for verification
Cons:
Since SSL/TLS is a widely-used system, there are many tools to exploit misconfigured / old clients / servers
There are some exploits found in such libraries (e.g. "heartbleed"), so you might need to patch your software a lot.
This solution still requires some serious coding, but it's usually better to rely on an existing and proven system than to go around inventing your own.
I’m looking at creating a P2P system. During initial research, I’m reading from Peer-to-Peer – Harnessing the Power of Disruptive Technologies. That book states “a fully decentralized approach to instant messaging would not work on today's Internet.” Mostly blaming firewalls and NATs. The copyright is 2001. Is this information old or still correct?
It's still largely correct. Most users are still behind firewalls or home routers that block incoming connections. Those can be opened more easily today than in 2001 (using UPnP, for example, which requires little user interaction and knowledge), but most commercial end-user applications, such as phone (Skype, VoIP), chat (the various messengers), and remote control, are centralized solutions designed to circumvent firewall problems.
I would say that it is just plain wrong, both now and then. Yes, you will have many nodes that will be firewalled, however, you will also have a significant number who are not. So, if end-to-end encryption is used to protect the traffic from snooping, then you can use non-firewalled clients to act as intermediaries between two firewalled clients that want to chat.
You will need to take care, however, to spread the load around, so that a few unfirewalled clients aren't given too much load.
Skype uses a similar idea. They even allow file transfers through intermediaries, though they limit the throughput so as not to overload the middlemen.
That being said, now in 2010, it is a lot easier to punch holes in firewalls than it was in 2001, as most routers will allow you to automate the opening of ports via UPnP, so you are likely to have a larger pool of unfirewalled clients to work with.
Firewalls and NATs still commonly disrupt direct peer-to-peer communication between home-based PCs (and also between home-based PCs and corporate desktops).
They can be configured to allow particular peer-to-peer protocols, but that remains a stumbling block for most unsavvy users.
I think the original statement is no longer correct. But the field of decentralized computing is still in its infancy, with few serious contenders.
Read this interesting post on ZeroTier (thanks to #joehand): The State of NAT Traversal:
NAT is Traversable
In reading the Internet chatter on this subject I've been shocked by how many people don't really understand this, hence the reason this post was written. Lots of people think NAT is a show-stopper for peer to peer communication, but it isn't. More than 90% of NATs can be traversed, with most being traversable in reliable and deterministic ways.
At the end of the day anywhere from 4% (our numbers) to 8% (an older number from Google) of all traffic over a peer to peer network must be relayed to provide reliable service. Providing relaying for that small a number is fairly inexpensive, making reliable and scalable P2P networking that always works quite achievable.
I personally know of Dat Project, a decentralized data sharing toolkit (based on their hypercore protocol for P2P streaming).
From their Dat - Distributed Dataset Synchronization And Versioning paper:
Peer Connections
After the discovery phase, Dat should have a list of potential data sources to try and contact. Dat uses either TCP, UTP, or HTTP. UTP is designed to not take up all available bandwidth on a network (e.g. so that other people sharing wifi can still use the Internet), and is still based on UDP so works with NAT traversal techniques like UDP hole punching. HTTP is supported for compatibility with static file servers and web browser clients. Note that these are the protocols we support in the reference Dat implementation, but the Dat protocol itself is transport agnostic.
Furthermore, you can use it with the BitTorrent DHT. The paper also contains some references to other technologies that inspired Dat.
For implementation of peer discovery, see: discovery-channel
Then there is also IPFS, or 'The Interplanetary File System' which is currently best positioned to become a standard.
They have extensive documentation on their use of DHT and NAT traversal to achieve decentralized P2P.
The Session messenger seems to have solved the issue with a truly decentralized P2P messenger by using an incentivized mixnet to relay and store messages. It's a fork of the Signal messenger with a mixnet added in. https://getsession.org -- whitepaper: https://getsession.org/wp-content/uploads/2020/02/Session-Whitepaper.pdf
It's very old and not correct. I believe there is a product out called Tribler (news article) which enables BitTorrent to function in a fully decentralized way.
If you want to go back a few years (even before that document) you could look at Windows. Windows networking used to function in a fully decentralized way. In some cases it still does.
UPnP is also decentralized in how it determines available devices on your local network.
In order to be decentralized, you need to have a way to locate other peers. This can be done proactively by scanning the network (time-consuming) or by having some means for the clients to announce that they are available.
The announcements can be simple UDP packets that get broadcast every so often to the subnet, which other peers listen for. Another mechanism is broadcasting to IRC channels (most common for command and control of botnets), etc. You might even use Twitter or similar services. Use your imagination here.
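A minimal sketch of the UDP announcement idea, assuming a made-up JSON message format and an arbitrary port (both hypothetical, chosen only for illustration). A real system would repeat the broadcast on a timer and keep a table of recently seen peers.

```python
import json
import socket

ANNOUNCE_PORT = 50000  # hypothetical port, pick your own


def build_announce(peer_id, service_port):
    """Encode an 'I am here' message."""
    return json.dumps({"peer_id": peer_id, "port": service_port}).encode("utf-8")


def parse_announce(datagram):
    """Return (peer_id, port), or None if the datagram is not a valid announce."""
    try:
        msg = json.loads(datagram.decode("utf-8"))
        return str(msg["peer_id"]), int(msg["port"])
    except (ValueError, KeyError, TypeError):
        return None


def broadcast_announce(peer_id, service_port):
    """Send one announce to everyone on the local subnet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_announce(peer_id, service_port),
                    ("255.255.255.255", ANNOUNCE_PORT))


def listen_once(timeout=5.0):
    """Wait for one announce; return (host, peer_id, port) or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", ANNOUNCE_PORT))
        sock.settimeout(timeout)
        try:
            data, (host, _) = sock.recvfrom(1024)
        except socket.timeout:
            return None
        parsed = parse_announce(data)
        return None if parsed is None else (host, *parsed)
```

Note that broadcasts stay within the local subnet, so this only finds peers on the same network segment; anything wider needs one of the other mechanisms mentioned above.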
Firewalls don't really play a part because they almost always leave open a few ports, such as 80 (http). Obviously you couldn't browse the network if that was closed. Now if the firewall is configured to only allow connections that originated from internal clients, then you'd have a little more work to do. But not much.
NATs are also not a concern, for similar reasons.
I want to collect statistics on how a file spreads in a new BitTorrent swarm without actually downloading anything (or as little as possible). I need to know which peer has which pieces (to make file-based statistics); knowing the number of seeders and leechers, or percentages, is not enough. Later, when there are many peers, I need to download the data to determine what it is. This part can be done with a regular torrent client.
I do not plan to implement the protocol myself, so I looked at two implementations: libtorrent and KTorrent's libbtcore. Neither is capable of collecting data while not downloading; there are simply no connected peers when there is nothing to download. libtorrent is simpler, but KTorrent's code looks better commented.
I see 3 possibilities:
Use some application exactly for this. Are there any?
Modify a torrent implementation to do what I want. Is anyone familiar with them? Where to start?
Implement a small subset of the protocol. Just periodically ask the peers what they have. Is this feasible or would the program need to support almost the full protocol?
What do you recommend?
This is an old question, but perhaps this answer might be useful for others.
Use some application exactly for this. Are there any?
Not that I know of.
Modify a torrent implementation to do what I want. Is anyone familiar with them? Where to start?
I'm only familiar with the BitTornado core (that is used in e.g. ABC). It is written in Python, but it's an architectural mess.
However, you could just take any implementation and start stripping it of unnecessary functionality.
Implement a small subset of the protocol. Just periodically ask the peers what they have. Is this feasible or would the program need to support almost the full protocol?
Note that you cannot "ask" a peer what they have. The other peer informs you whenever it wants about the pieces it has (so it's push instead of pull). After the BitTorrent handshake, a peer may send a bitfield of pieces it has. Afterwards it may send HAVE messages informing you it has acquired a new piece. Also note that peers may lie about the pieces they have. Examples include superseeding peers and freeriding clients like BitThief.
If you want to implement a small subset of the protocol, you'd need at the bare minimum to implement the BitTorrent handshake message and preferably the extended handshake message. The latter allows you to receive (and send) uTorrent PEX messages. PEX is useful to quickly discover other peers in the swarm.
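As a rough sketch of that bare minimum: the initial handshake is a fixed 68-byte message, and one bit in its reserved field advertises support for the extension protocol. The code below covers only that first message. The extended handshake itself is a bencoded dictionary sent afterwards and is not shown, and the function names are hypothetical.

```python
PSTR = b"BitTorrent protocol"


def build_handshake(info_hash, peer_id, extended=True):
    """Build the 68-byte handshake: pstrlen, pstr, 8 reserved bytes, info hash, peer id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    reserved = bytearray(8)
    if extended:
        reserved[5] |= 0x10   # advertise the extension protocol (needed for PEX)
    return bytes([len(PSTR)]) + PSTR + bytes(reserved) + info_hash + peer_id


def parse_handshake(data):
    """Validate a peer's handshake and pull out the fields of interest."""
    if len(data) != 68 or data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    reserved = data[20:28]
    return {
        "supports_extended": bool(reserved[5] & 0x10),
        "info_hash": data[28:48],
        "peer_id": data[48:68],
    }
```

If the info hash in the reply does not match the one you sent, the connection should be dropped.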
For your statistics gathering purposes, you additionally need to support the bitfield and HAVE messages.
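A minimal sketch of that statistics side, assuming the handshake has already been exchanged and raw bytes from the peer are being collected in a buffer. Messages are length-prefixed; bitfield has message ID 5 and HAVE has ID 4. The class and function names are hypothetical, and as noted above, what a peer claims to have may not be true.

```python
import struct

MSG_HAVE = 4
MSG_BITFIELD = 5


def split_messages(buffer):
    """Return ([(msg_id, payload), ...], leftover_bytes); keep-alives are skipped."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break                            # message not fully received yet
        body = buffer[offset + 4:offset + 4 + length]
        offset += 4 + length
        if length:                           # length 0 is a keep-alive
            messages.append((body[0], body[1:]))
    return messages, buffer[offset:]


class PeerPieces:
    """Tracks which pieces one peer claims to have."""

    def __init__(self, num_pieces):
        self.num_pieces = num_pieces
        self.pieces = set()

    def handle(self, msg_id, payload):
        if msg_id == MSG_BITFIELD:
            if len(payload) != (self.num_pieces + 7) // 8:
                raise ValueError("bitfield has the wrong length")
            # The high bit of the first byte corresponds to piece 0.
            self.pieces = {
                i for i in range(self.num_pieces)
                if payload[i // 8] & (0x80 >> (i % 8))
            }
        elif msg_id == MSG_HAVE:
            (index,) = struct.unpack("!I", payload)
            if index < self.num_pieces:
                self.pieces.add(index)
```

Keeping one PeerPieces object per connected peer and snapshotting the sets periodically gives the per-peer, per-piece data the question asks for, without requesting any blocks.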