If firewalls don't accept incoming connections by default how do p2p networks work? - p2p

If firewalls block all incoming connections by default how do p2p technologies work? Like torrents, how do you connect to everyone who shares a file and get the information from them? Does it go through a relay server or something?

I'm not up on everything about BitTorrent, as I am about general P2P connectivity techniques. Typically clients in a P2P network rendezvous on a common signaling server (e.g. SIP, XMPP, tracking server, web site) to exchange IP addresses,other meta data, and messages to bootstrap direct connections. Then use any of the following techniques below to get a communications session going:
Both sides attempt to connect to each other simultaneously - in case one side can't accept incoming connections, but is allowed to make outbound connections. Such is the case for the firewall scenario.
Hole punching (used in conjunction with above). Relays are not required per se, but do help insure connectivity when both peers are behind network devices that are difficult to traverse. There's both UDP Hole Punching as well as TCP Hole Punching techniques. More info here.
Relays, including TURN servers, can be deployed into a P2P network when direct connectivity is not possible. All your favorite video call applications deploy relays for these scenarios, but do their best to get peers directly connected to avoid the cost of relaying.
Bing for the following topics: STUN, TURN, ICE (Interactive Connectivity Establishment), libjingle, pjnath, libnice.

If a firewall (as opposed to a NAT) intends to block incoming connections (which I believe they normally do), there's nothing you can do about it. You can try all the hole punching you like. This is achieved by just implementing a symmetric NAT, i.e. each pin-hole is open specifically for a ip,port-quadruple (source, destination port and IP).
BitTorrent is not likely to stop working just because you can't receive incoming connections, it will just be performing slightly worse. Chances are that there are at least some people on the swarm that can receive connections, and you just connect to them.
This is an old explanation of how being firewalled mostly just means worse performance.
EDIT:
The short answer to the "why" is that the majority of peers are not behind firewalls (just NATs).

For posterity:
Short answer is, it's done via https://en.wikipedia.org/wiki/Hole_punching_(networking)
When you connect to google.com from your machine (which most likely sits behind a router and NAT) your machines IP address and NAT'ed port number get sends to google servers. Google uses these connection info to send search results to your machine.
You can think of google's servers similar to one node in the p2p network. Google were able to 'reach' you from their servers even though your router doesn't accept incoming connection. It's because they didn't initiate the connection (they don't know your ip:NATedport combination to initiate a connection). But your machine can make outbound connections and send/publish your ip:NATedport combo to outside world.
So both nodes in the p2p send their ip:NATedport combo to a third party site and exchange this info and start connecting (whoever got the info first) to one another.

LetF = # firewalled nodes,
O = # nodes with open incoming port,
T = F + O = total # of nodes,
P = O / T = fraction of total nodes that have open ports,
Cf = max # connections each firewalled node can have, and
Co = max # connections each open node can have.
Then,
Cf = O, ∵ firewalled nodes can only connect to open nodes
Co = T - 1, ∵ open nodes can connect to all other nodes (and once all the firewalled nodes have first connected to them)so
Co / Cf = (T - 1) / O = 1 / P - 1 / O.
If T is large,
Co / Cf ≈ 1 / P.
So, for example, say half the nodes have open incoming ports: P = ½. Then,
Co / Cf ≈ 2,
which means that open nodes have ~2× the number of connections as firewalled nodes (meaning they are overburdened, but also meaning they have more visibility).
Thus, it is better for particular nodes and for the entire network to have P = 1; all nodes will be equally visible and equally burdened.
If P = 0, the network won't work at all ∵ no one will be able to initiate connections to other peers.

Related

Explaining NAT Tranversal C++?

I have created an P2P application which is FULLY decentralized and is using a Kademlia algorithm to make it so. This has been tested on the local network and it completely successful.
I heard about UDP Hole Punching, however hole punching requires the peer to know about the clients IP and vice versa, however as said due to design it is impossible. As it would require each peer to keep on asking the supernode\server for new arrivals and keep them assigned and so naturally it is not too dependable especially if the supernode is down.
Due to the design of the algorithm 1 peer does not know the other peers IP address so I need 1 peer to completely open a port up for PUBLIC to be able to connect to, how can I do this, in Windows? could someone perhaps also give me suitable links that might give me a direction?
It would be preferable (but not absolute) if they use c++\c as example
I think you will have to change your architecture a little. There is no other way for NAT traversal unless you configure your NAT for port forwarding (I think you don't want this). You might need to implement three layers:
Lower layer 1: it knows about IP addresses and ports and can solve problems like hole punching or dealing with servers (which could be down but first you have no choice and second you can add alternatives for connectivity).
Layer 2: implements special naming, addressing and location services for your solution (instead of using IP addresses).
Upper layer 3: implements your p2p solution using lower layer naming and location services.
First of all, you need to examine your design. If it needs 100% connectivity between all nodes (without relays) it's probably going to fail under IPv4 since not all NATs are traversable. And possibly under IPv6 due to stateful firewalls.
Now, for the nat traversal: A solution for DHT-assisted NAT traversal is for NATed nodes to have a rendezvous node.
To keep the UDP NAT mappings open it has to regularly ping that rendezvous node.
Additionally it has to announce the address of the rendezvous point on the DHT, e.g. under hash("rendezvous" + node ID) or simply on its reachable neighbor nodes.
The rendezvous node can then act as coordination point for hole punching.
This does not require any special "supernode", just other (possibly multiple, thus eliminating the SPOF) nodes in the network that are not NATed and can assist.
Additional mechanisms such as UPnP IGP, NAT-PMP, PCP and ultimately instructing users to forward the needed ports can also help to reduce the need for nat traversal.
Due to the design of the algorithm 1 peer does not know the other peers IP address so I need 1 peer to completely open a port up for PUBLIC to be able to connect to
First of all this will only work for Full cone NAT. For other types of NAT that public IP:Port you open will only work for specific destination. In your case you don't know the destination so it's impossible.
In case of full cone NAT, you send a packet to a random address with low TTL value so that the packet drops in the middle and doesn't reach that address. If it reaches than that address's NAT might block you. If you do this then a port will be opened for anybody to send you a packet. You need to keep sending packets after some short interval for that port to be remained open. Here is a problem that you can't choose which port to open in the NAT. The NAT will assign you a free port on its own.
Finally I don't see any point doing any of the above if peers can't exchange their IP information with each other. You should use a signalling protocol like SIP or XMPP to exchange IP information between peers.
To learn more about NATs please read this answer.

How do you create a peer to peer connection without port forwarding or a centeralized server?

I recall reading an article about a proposed way to do this. If I recall correctly, the researchers successfully created a connection to a client on another network without port forwarding by sending HTTP packets to each other (Alice pretends that Bob is an HTTP web server while Bob pretends Alice is a web server).
I'm not sure if that makes sense, but does anyone know where I can find the article or does anyone have any other ideas how to connect two clients together without a central server or port forwarding?
Is it even possible?
Edit: I would know the IPs of both computers and port the program listens on.
It is possible. I see at least 2 parts to your question. (It is not going to be HTTP packet. It is a lot more complex than that.)
First off, I believe you might be talking about a concept called decentralized P2P network. The main idea behind a decentralized peer-to-peer network is the fact that nodes conjoint in such a network will not require central server or group of servers.
As you might already know, most common centralized peer-to-peer networks require such centralized system to exchange and maintain interconnectivity among nodes. The basic concept is such, a new node will connect to one of the main servers to retrieve information about other nodes on the network to maintain its connectivity and availability. The central system gets maintained through servers constantly synchronizing network state, relevant information, and central coordination among each other.
Decentralized network, on the other hand, does not have any structure or predetermined core. This peer-to-peer model is also called unstructured P2P networks. Any new node will copy or inherit original links from the "parent" node and will form its own list over time. There are several categories of decentralization of such unstructured networks.
Interestingly enough, the absence of central command and control system makes it solution of choice for modern malware botnets. A great example could be Storm botnet, which employed so-called Passive P2P Monitor (PPM). PPM was able to locate the infected hosts and build peer list regardless whether or not infected hosts are behind a firewall or NAT. Wikipedia's article Storm botnet is an interesting read. There is also great collaborative study called Towards Complete Node Enumeration in a Peer-to-Peer Botnet, which provides excellent conceptual analysis and techniques employed by Storm botnet network.
Second of all, you might be talking about UDP hole punching. This is a technique or algorithm used to maintain connectivity between 2 hosts behind NATed router/gateway using 3rd comment host by means of a third rendezvous server.
There is a great paper by Bryan Ford, Pyda Srisuresh, and Dan Kegel called Peer-to-Peer Communication Across Network Address Translators.
As answered, a peer-to-peer connection requires establishment of a connection between two (presumably) residential computers, which will necessitate punching holes through both of their firewalls. For a concrete example of hole punching, see pwnat: "The only tool to punch holes through firewalls/NATs without a third party". The process, put simply, goes like this:
The "server" (who doesn't know the client's IP address, but the client knows the server's) pings a very specific ICMP Echo Request packet to 1.2.3.4 every 30 seconds. The NAT, during translation, takes note of this packet in case it gets a response.
The client sends an ICMP Time Exceeded packet to the server, which is a type of packet that usually contains the packet that failed to deliver. The client, knowing in advance the exact packet that the server has been sending to 1.2.3.4, embeds that whole packet in the Data field.
The NAT recognizes the Echo Request packet and happily relays the whole Time Exceeded packet, source IP and all, to the correct user, i.e. the server. Voila, now the server knows the client's IP and port number.
Now that the server knows the address, it begins to continually send UDP packets to the client, despite the fact that the client's NAT did not expect them and will therefore ignore them all.
The client begins sending UDP packets to the server, which will be recognized by the server's NAT as a response to the server's packets and route them appropriately.
Now that the client is sending UDP packets to the server, the server's stream of UDP packets starts getting properly routed by the client's NAT.
And, in 6 easy steps, you have established a UDP connection between a client and a server penetrating two residential firewalls. Take that, ISP!

Where the p2p software connect to initially?

In BitTorrent, client connects to the tracker specified in .torrent file. Tracker is a kind of centralized server and it is the starting point. So BitTorrent is not pure p2p.
If we want to develop pure p2p system, we should design routing overlay network. All nodes will have routing table like routers do. But even in routing overlay network, each node should know at least one existing node(GUID, IP address) initially. So how can we determine this? Should we keep 'one existing node to connect initially' forever like fixed centralized server? If so, I think this is not fully decentralized method.
The solution you describe (defining a central peer wit well-know ip address) is not the only one.
The other solution is to post an html page (or json file) in a well-know URL on the net, and make sure this item contains an updated list of peers this peer can initialy connect. This list can be updated if a peer goes down. Eventually, you can use multiple URLs.
Pure P2P system is a theoretical concept which cannot be implemented fully in reality.
You could use an anycast. So the first other client will answer and may send such an initial "client list". Where your client can connect to them to get more lists.
Classically I would implement a multicast to a adress and wait for an answer of other clients.
Firstly, a true peer to peer network is not necessarily decentralized. Secondly, decentralization does not necessarily mean that the network doesn't make use of secondary services which may themselves be centralized.
The main question in both of these issues is wheather the primary resources of the network solution are distributed through correlated peers.
For example, peer-to-peer video conferencing may use a central contacts service but still be pure peer-to-peer as long as the peers resolve such issues before entering a true peer-to-peer scope. This would also be decentralized.
What it comes down to is what your trying to solve by using peer-to-peer. A video conference is a video conference - it starts with a video being recorded on one peer and ends with the video being viewed on another. As long as each byte of this data is transferred directly between the peers (even if there's hundreds of peers in the conference, and regardless of how these peers found each other) it is a true peer-to-peer video conference.
Note that the video peers will still be in your typical ring, and that the contacts lists may still use the node key for location information rather than an IP. This will still be a network overlay as it will still be built over IP, replacing it's addressing scheme on the peer level to facilitate true peer-to-peer networking.
What it realy comes down to is the concepts of a network connection. IP just pushes packets to unspecified routers until it gets to a specific address. 'connections' between each endpoint only exist within the higher software levels (including when dealing with TCP/IP). A connection is just the data used within software to understand who is who and how each point can handle data etc. Peer-to-peer network overlays effectively distributes this data, eliminating the need for each peer to create massive amounts of connections to communicate on a massive scope. Decentralization is not required for this (as long as peer to peer communication is not centralized), and a secondary service within the system wont necessarily limit a network's scope or otherwise centralize actual peer-to-peer networking.
So to answer your question, it doesn't matter where it initially connects in order to be considered peer-to-peer, and different peer-to-peer services will handle this based on their service design.
Ok so I am thinking about writing a p2p protocol for a number of different services for an AI project, and I thought I'd come here to see if i could get some ideas on how to get the initial connection.
I have come across several ways to establish the initial connection:
1) You have a static ip address on the internet that distributes information on other peers. This isn't good, because:
a) it's a single point of failure, the service could go offline, preventing any new connections from creating an initial connection to peers,
b) the IP address could change. This could be mitigated by using a domain name which is maintained to point to the current ip address of the location of a service which provides data on peers, however this can be subverted in theory by hackers by spoofing or arp poisoning, dns attacks etc.
2) You could force the user to provide an initial ip address or hostname for another peer, and it's up to the user to find the hostname / ip address / port number. This is good, but if someone posts disinformation or they cannot find a peer on google or some other search site then obviously it's fallable.
3) You could leave it to the peer to publish their own existence in a central location - for example a group of IRC channels or a group of websites. Again, unless it's going through a central trusted domain, it's hard to determine the authenticity of the peer.
4) You could use some kind of nmap style discovery algorithm that searches through subnets for appropriate protocols. The problem with this approach is it's slow and it's likely to attract attention from things like firewalls etc.
5) This is a variation of 3) you could allow the peers to advertise their own information on a website, then instead of having to look for the information in a suitable location (a specific website, or group of websites), you can let google's search algorithm find it, and do the discovery for you, however you could imagine that this may take a few days for google to cache the website data with information on the peers. Also again, it would be upto you to provide some way to verify the authenticity of the advertised data.
6) If you are interested in an exclusive p2p network that locks out certain people (for example, you might want to have a file sharing network and you don't want law enforcement to be able to access it, or MPIAA), then you could use 2) and then have a referral system where you require that the initial connection provide the referrer's ip address, and then the service could connect to the referrers ip address and ask the referrer if they did indeed refer the referee.
That's all I can think of currently, but if anyone comes up with any other ways to do this, i would be very interested.

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts in a few seconds time frame.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that have to handle many short TCP connections left in a TIME_WAIT state.
As described here, The TCP_TW_RECYCLE could cause some problems when using load balancers...
EDIT (to add some warnings ;) ):
as mentionned in comment by #raittes, the "problems when using load balancers" is about public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
Modifying kernel options must be seen as a last-resort option, and shall generally be avoided unless you know what you are doing... if that were the case you would not be asking for help over here. Hence, I would advise against doing that.
The most suitable piece of advice I can provide is pointing out the part describing what a network connection is: quadruplets (client address, client port, server address, server port).
If you can make the available ports pool bigger, you will be able to accept more concurrent connections:
Client address & client ports you cannot multiply (out of your control)
Server ports: you can only change by tweaking a kernel parameter: less critical than changing TCP buckets or reuse, if you know how much ports you need to leave available for other processes on your system
Server addresses: adding addresses to your host and balancing traffic on them:
behind L4 systems already sized for your load or directly
resolving your domain name to multiple IP addresses (and hoping the load will be shared across addresses through DNS for instance)
According to the VMWare document, the main difference is TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps, however it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure. This allows a new client inbound connection from the source IP to the server. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the Time-Wait state has aged out in its entirety.

NAT, P2P and Multiplayer

How can an application be designed such that two peers can communicate directly with each other (assuming both know each other's IPs), but without outgoing connections? That's, no ports will be opened. Bitorrent for example does it, but multiplayer games (as far as I know) require port forwarding.
I'm not sure what you mean by No Outgoing Connections, I'm going to assume like everyone else you meant no Incoming Connections (they are behind a NAT/FW/etc).
The most common one mentioned so far is UPNP, which in this context is a protocol that allows you as a computer to talk to the Gateway and say forward me this port because I want someone on the outside to be able to talk to me. UPNP is also designed for other things, but this is the common thing for home networking (Actually it's one of many definitions).
There are also more common and slightly more reliable ways if you don't own the network. The most common is called STUN but if I recall correctly there are a few variants. Basically you use a third party server that allows incoming connections to try and coordinate a communication channel. Basically, what you do is send a UDP packet to you're peer, which will open up you're NAT for a response, but gets dropped on you're peer's NAT (since no forwarding rule exists yet). Through the connection to the intermediary, they are then told to do the same, which now opens up their NAT, and matches the existing rule in you're NAT. Now the communications can proceed. Their is a variant of this which will allow a TCP/IP connection as well by sending SYN and SYN-ACK messages with some coordination.
The Wikipedia articles I've linked to has links to the relevant rfc's for these protocols on precisely how they work. Essentially it comes down to, there isn't an easy answer, as this is a very network centric problem.
You need a "meeting point" in the network somewhere: the participants "meet" at a "gateway" of some sort and the said "gateway function" takes care of the forwarding.
At least that's one way of doing it: I won't try to comment on the details of Bittorrent... I am sure you can google for links.
UPNP dealt with this mostly in the recent years, but the need to open ports is because the application has been coded to listen on a specific port for a response.
Ports beneath 1024 are called "registered" because they've been assigned a port number because a company paid for it. This doesn't mean you couldn't use port 53 for a webserver or SSH, just that most will assume when they see it that they are dealing with DNS. Ports above 1024 are unregistered, so there's no association - your web browser, be it Internet Explorer/Firefox/etc, is using an unregistered port to send the request to the StackOverflow webserver(s) on port 80. You can use:
netstat -a
..on windows hosts to see what network connections are currently established, including the port involved.
UPNP can be used to negotiate with the router to open and forward a port to your application. Even bit-torrent needs at least one of the peers to have an open port to enable p2p connections. There is no need for both peers to have an open port however, since they both communicate with the same server (tracker) that lets them negotiate and determine who has an open port.
An alternative is an echo-server / relay-server somewhere on the internet that both peers trust, and have that relay all the traffic.
The "problem" with this solution is that the echo-server needs to have lots of bandwidth to accomodate all connected peers since it relays all the traffic rather than establish p2p connections.
Check out EchoWare: http://www.echogent.com/tech.htm

Resources