How a peer on private lan is connected to another peer on another private lan in bittorrent

How a peer on private lan is connected to another peer on another private lan in bittorrent - p2p

How the peaces of data are being transferred to a client behind the router (private lan)
I know that every peer is connected to a tracker (Public IP) initially but tracker doesn't involve in transfer of data.
I've read many answers on stackoverflow and superuser about bittorrent working but I didn't get the answer.
I would really appreciate if someone explains it in detail similar to this answer https://superuser.com/questions/401802/how-do-ip-answer-packets-reach-their-destination-inside-of-a-private-lan
I'm curious to know whats happening on the router for torrent traffic.

Related

Torrent Protocol find peer

I am studying bit torrent protocol. I have confusion regarding torrent protocol. Suppose I have a router with static IP and two clients are connected to that router that are C1 and C2. One of that client say C1 is acting as a seed. Now how will the client downloading the file will know that C1 is seeding taking consideration the fact the only thing know to the outer-network is static IP of the router.
Is there is any way through which torrent can identify the client C1 ??? Please explain that.

a) they can find each other via local service discovery, it's widely deployed but currently lacking a specification
b) they can talk to each other through their respective public socket addresses as discovered through other peer discovery mechanisms if the router supports hairpin NAT routing
Update: Now there's a spec for LSD.

STUN, TURN and ICE

I have two PC behind the different NAT. I want to access the first PC from the second PC.
While searching over the internet, I have found the concepts of STUN, TURN and ICE.
If i run the stun client on one PC, I can find the ip and port of my PC from public internet perspective. but after that also how can i connect to other PC. How other PC will know my ip and port which is public ip and external port of my router?
Both PCs are behind Restricted cone NAT, so i think i don't need turn server functionality.
I have one PC with public IP which i can run as a STUN server.
I have read the specifications of STUN and ICE, But i am not getting the clear idea. There they are telling that there is some signalling mechanism available through which other host can know my ip and port number.
But, Exactly how?
If there is one stun server and multiple clients, then how can be decided which client need to connect with which client as no client knows about othet clients addresses?

the short answer is, the STUN server helps the peer identify his public IP, so it does not matter how many clients are try to talk to each other, it does not care, its only job is to supply a peer with his public IP.
from what I understand, the STUN server helps you (the browser client) identify its public IP, which you would set in your SDP, and you pass this to the signalling server,
which would forward it to the other peer, similarly, the other peer would also transmit to you his sdp (the offer and answer).

How do you create a peer to peer connection without port forwarding or a centeralized server?

I recall reading an article about a proposed way to do this. If I recall correctly, the researchers successfully created a connection to a client on another network without port forwarding by sending HTTP packets to each other (Alice pretends that Bob is an HTTP web server while Bob pretends Alice is a web server).
I'm not sure if that makes sense, but does anyone know where I can find the article or does anyone have any other ideas how to connect two clients together without a central server or port forwarding?
Is it even possible?
Edit: I would know the IPs of both computers and port the program listens on.

It is possible. I see at least 2 parts to your question. (It is not going to be HTTP packet. It is a lot more complex than that.)
First off, I believe you might be talking about a concept called decentralized P2P network. The main idea behind a decentralized peer-to-peer network is the fact that nodes conjoint in such a network will not require central server or group of servers.
As you might already know, most common centralized peer-to-peer networks require such centralized system to exchange and maintain interconnectivity among nodes. The basic concept is such, a new node will connect to one of the main servers to retrieve information about other nodes on the network to maintain its connectivity and availability. The central system gets maintained through servers constantly synchronizing network state, relevant information, and central coordination among each other.
Decentralized network, on the other hand, does not have any structure or predetermined core. This peer-to-peer model is also called unstructured P2P networks. Any new node will copy or inherit original links from the "parent" node and will form its own list over time. There are several categories of decentralization of such unstructured networks.
Interestingly enough, the absence of central command and control system makes it solution of choice for modern malware botnets. A great example could be Storm botnet, which employed so-called Passive P2P Monitor (PPM). PPM was able to locate the infected hosts and build peer list regardless whether or not infected hosts are behind a firewall or NAT. Wikipedia's article Storm botnet is an interesting read. There is also great collaborative study called Towards Complete Node Enumeration in a Peer-to-Peer Botnet, which provides excellent conceptual analysis and techniques employed by Storm botnet network.
Second of all, you might be talking about UDP hole punching. This is a technique or algorithm used to maintain connectivity between 2 hosts behind NATed router/gateway using 3rd comment host by means of a third rendezvous server.
There is a great paper by Bryan Ford, Pyda Srisuresh, and Dan Kegel called Peer-to-Peer Communication Across Network Address Translators.

As answered, a peer-to-peer connection requires establishment of a connection between two (presumably) residential computers, which will necessitate punching holes through both of their firewalls. For a concrete example of hole punching, see pwnat: "The only tool to punch holes through firewalls/NATs without a third party". The process, put simply, goes like this:
The "server" (who doesn't know the client's IP address, but the client knows the server's) pings a very specific ICMP Echo Request packet to 1.2.3.4 every 30 seconds. The NAT, during translation, takes note of this packet in case it gets a response.
The client sends an ICMP Time Exceeded packet to the server, which is a type of packet that usually contains the packet that failed to deliver. The client, knowing in advance the exact packet that the server has been sending to 1.2.3.4, embeds that whole packet in the Data field.
The NAT recognizes the Echo Request packet and happily relays the whole Time Exceeded packet, source IP and all, to the correct user, i.e. the server. Voila, now the server knows the client's IP and port number.
Now that the server knows the address, it begins to continually send UDP packets to the client, despite the fact that the client's NAT did not expect them and will therefore ignore them all.
The client begins sending UDP packets to the server, which will be recognized by the server's NAT as a response to the server's packets and route them appropriately.
Now that the client is sending UDP packets to the server, the server's stream of UDP packets starts getting properly routed by the client's NAT.
And, in 6 easy steps, you have established a UDP connection between a client and a server penetrating two residential firewalls. Take that, ISP!

Where the p2p software connect to initially?

In BitTorrent, client connects to the tracker specified in .torrent file. Tracker is a kind of centralized server and it is the starting point. So BitTorrent is not pure p2p.
If we want to develop pure p2p system, we should design routing overlay network. All nodes will have routing table like routers do. But even in routing overlay network, each node should know at least one existing node(GUID, IP address) initially. So how can we determine this? Should we keep 'one existing node to connect initially' forever like fixed centralized server? If so, I think this is not fully decentralized method.

The solution you describe (defining a central peer wit well-know ip address) is not the only one.
The other solution is to post an html page (or json file) in a well-know URL on the net, and make sure this item contains an updated list of peers this peer can initialy connect. This list can be updated if a peer goes down. Eventually, you can use multiple URLs.
Pure P2P system is a theoretical concept which cannot be implemented fully in reality.

You could use an anycast. So the first other client will answer and may send such an initial "client list". Where your client can connect to them to get more lists.
Classically I would implement a multicast to a adress and wait for an answer of other clients.

Firstly, a true peer to peer network is not necessarily decentralized. Secondly, decentralization does not necessarily mean that the network doesn't make use of secondary services which may themselves be centralized.
The main question in both of these issues is wheather the primary resources of the network solution are distributed through correlated peers.
For example, peer-to-peer video conferencing may use a central contacts service but still be pure peer-to-peer as long as the peers resolve such issues before entering a true peer-to-peer scope. This would also be decentralized.
What it comes down to is what your trying to solve by using peer-to-peer. A video conference is a video conference - it starts with a video being recorded on one peer and ends with the video being viewed on another. As long as each byte of this data is transferred directly between the peers (even if there's hundreds of peers in the conference, and regardless of how these peers found each other) it is a true peer-to-peer video conference.
Note that the video peers will still be in your typical ring, and that the contacts lists may still use the node key for location information rather than an IP. This will still be a network overlay as it will still be built over IP, replacing it's addressing scheme on the peer level to facilitate true peer-to-peer networking.
What it realy comes down to is the concepts of a network connection. IP just pushes packets to unspecified routers until it gets to a specific address. 'connections' between each endpoint only exist within the higher software levels (including when dealing with TCP/IP). A connection is just the data used within software to understand who is who and how each point can handle data etc. Peer-to-peer network overlays effectively distributes this data, eliminating the need for each peer to create massive amounts of connections to communicate on a massive scope. Decentralization is not required for this (as long as peer to peer communication is not centralized), and a secondary service within the system wont necessarily limit a network's scope or otherwise centralize actual peer-to-peer networking.
So to answer your question, it doesn't matter where it initially connects in order to be considered peer-to-peer, and different peer-to-peer services will handle this based on their service design.

Ok so I am thinking about writing a p2p protocol for a number of different services for an AI project, and I thought I'd come here to see if i could get some ideas on how to get the initial connection.
I have come across several ways to establish the initial connection:
1) You have a static ip address on the internet that distributes information on other peers. This isn't good, because:
a) it's a single point of failure, the service could go offline, preventing any new connections from creating an initial connection to peers,
b) the IP address could change. This could be mitigated by using a domain name which is maintained to point to the current ip address of the location of a service which provides data on peers, however this can be subverted in theory by hackers by spoofing or arp poisoning, dns attacks etc.
2) You could force the user to provide an initial ip address or hostname for another peer, and it's up to the user to find the hostname / ip address / port number. This is good, but if someone posts disinformation or they cannot find a peer on google or some other search site then obviously it's fallable.
3) You could leave it to the peer to publish their own existence in a central location - for example a group of IRC channels or a group of websites. Again, unless it's going through a central trusted domain, it's hard to determine the authenticity of the peer.
4) You could use some kind of nmap style discovery algorithm that searches through subnets for appropriate protocols. The problem with this approach is it's slow and it's likely to attract attention from things like firewalls etc.
5) This is a variation of 3) you could allow the peers to advertise their own information on a website, then instead of having to look for the information in a suitable location (a specific website, or group of websites), you can let google's search algorithm find it, and do the discovery for you, however you could imagine that this may take a few days for google to cache the website data with information on the peers. Also again, it would be upto you to provide some way to verify the authenticity of the advertised data.
6) If you are interested in an exclusive p2p network that locks out certain people (for example, you might want to have a file sharing network and you don't want law enforcement to be able to access it, or MPIAA), then you could use 2) and then have a referral system where you require that the initial connection provide the referrer's ip address, and then the service could connect to the referrers ip address and ask the referrer if they did indeed refer the referee.
That's all I can think of currently, but if anyone comes up with any other ways to do this, i would be very interested.

How does a DHT in a Bittorent client get "bootstrapped"?

If I have a torrent w/o any trackers in it, and I just started a bittorent client so I have no peers yet...how do I know who to first connect with in the DHT? It seems like I would have to know at least ONE node in the DHT to get started....

The mainline DHT bootstrap nodes are router.utorrent.com and a CNAME to it, router.bittorrent.com. Port 6881.

When a BitTorrent client connects to DHT, there is an initial place that it goes to find peers. With the original BitTorrent client, there was a url to bitorrent.com that would help get things started. I tried looking up the reference but I couldn't find it. Once you've established connections with other clients, then you can do an announce on the DHT network to find peers for the torrent you're looking for.
Here's a link to the BitTorrent specs that discuss DHT.
A trackerless torrent dictionary does
not have an "announce" key. Instead, a
trackerless torrent has a "nodes" key.
This key should be set to the K
closest nodes in the torrent
generating client's routing table.
Alternatively, the key could be set to
a known good node such as one operated
by the person generating the torrent.
Please do not automatically add
"router.bittorrent.com" to torrent
files or automatically add this node
to clients routing tables.

the graph at the bottom of this DHT monitoring project site shows
dht.transmissionbt.com
router.utorrent.com
router.bittorrent.com
as bootstrapping peers

In BiTTorrent, you have three main options:
Torrent File: some torrent files can embed nodes for you to link into the DHT with (in fact, it's recommended when making a torrent file)
Hardcoding: Some torrent clients hard code a few bootstrap nodes (like the ones mentioned by stk). These are usually run by companies and organizations with long-running servers.
PEX / Peer Conversations: You can usually ask for DHT nodes from the people you are downloading other torrents from (if your clients understand eachothers language. ie some versions are incompatible).

Transmission uses a hardcoded bootstrap node for dht if there is no other way to get peers:
bootstrap_from_name( "dht.transmissionbt.com", 6881, bootstrap_af(session) );
I guess each torrent client uses their own bootstrap node.

For the record, Deluge also uses hardcoded boostrap nodes:
dht_bootstraps = set(
lt_bootstraps.split(',')
+ [
'router.bittorrent.com:6881',
'router.utorrent.com:6881',
'router.bitcomet.com:6881',
'dht.transmissionbt.com:6881',
'dht.aelitis.com:6881',
]
)

A client can learn about other DHT-capable peers through it's interactions with them. A peer's support for DHT is advertised in it's Handshake. Once a client discovers at least one good, well-connected DHT peer, it can navigate the DHT to find more and closer DHT peers. It will remember these peers, called nodes in DHT-speak, between restarts of the software and maintain/update the list continuously while it is running. In the worse case where a client knows of no good DHT-capable peers, it will require you to download a tracker-based torrent so it can hopefully contact a few good DHT-capable peers it learns about through the tracker.
Update:
For it's initial list of DHT peers, as #Seppo points out, a torrent client can use one or more hard-coded DNS names to find the addresses for well-known peers, and it may also include a hard-coded list of peers as a final fallback as well. One limitation of DNS, however, it no port information is provided so a default port of 6881 is generally assumed whereas other means support peers operating on different ports.

Here are the primary nodes I've come across.
dht.transmissionbt.com 6881
router.bittorrent.com 6881
router.bitcomet.com 6881
dht.aelitis.com 6881
bootstrap.jami.net 4222

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string