Maintaning more than 65535 connections on single IP - linux

Reading the following article: 10M concurrent websockets
So, there are 1000 websocket servers listening on ports 10000-11000. When a connection is made to one of these servers, I assume they continue communication from a random established TCP connection with random ports. So, as one IP is used, and there are 64K ports, how can one maintain 10M connections? Are connections identified by IP-Port pairs? Can two different connections from different IPs to same port be established? How does this work under the hood?

When a connection is made to one of these servers, I assume they continue communication from a random established TCP connection with random ports.
Wrong assumption. They communicate with the clients using the same local port number they are listening on.
So, as one IP is used, and there are 64K ports, how can one maintain 10M connections?
Not a problem.
Are connections identified by IP-Port pairs?
Yes.
Can two different connections from different IPs to same port be established?
Yes.
How does this work under the hood?
See above. IP:port pairs. You answered your own question.

Sorry for totally changing my answer.
Linux can easily support millions of open sockets if the machine has enough memory and processing power. The TCP/IP stack allows this because the socket the OS targets for a given TCP packet is determined by the source and destination IP and port tuple.
The server implementing the websocket protocol need only listen to a single TCP socket, often defined by the HTTP or HTTPS port number, but not in this example. As part of standard TCP handshaking, the server OS and application open a unique socket for the TCP connection to the new client when the HTTP request which is a websocket request is received. The websocket package takes care of upgrading the protocol used on this new socket from standard HTTP to websocket.
In the example, a goroutine is started for each websocket socket.
The client side, the side initiating the TCP connections, is limited by the number of ephemeral ports its OS can open for a given destination host and port. Honestly, I don't know if this is a limitation of the client OS or the TCP/IP specification itself.

I think the part you are missing is a TCP connection is actually two pairs of IP:PORT.
One for the server, one for the client.
The listening side of a tcp socket is generally always the same IP/Port pair.
Example: net.Listen("tcp", ":8080") is listening on port 8080 (on all interfaces in this case)
The connecting (client) side is usually uses a single outgoing IP along with a random port.
Example: net.Dial("tcp","server:8080) Selects a random available ephemeral port and then attempts to connect to server:8080.
So, in the above example, that connection is: client.ip:32768 -> server.ip:8080 (where 32768 is the ephemeral port selected)
the two pairs combined make a unique connection.
The server side can take as many connections from a single client as there are available (client side) ports. It can also take as many clients are there are IP addresses.
Think of it as, for one listening socket, you can theoretically have 2^16(ports) * 2^32(ipv4 addrs) connections.
In reality, there are reserved IPs, ports, memory limitations, etc so the number is far smaller.
For exmaple, the ephemeral port range on Linux is 32768 - 61000. Which means I'll start getting errors if I net.Dial("tcp", "server:8080") more than 28232 times as I will have exhausted my ephemeral port range for the given server address. But if the server is listening on 2 separate ports, I can do 28232 to the first port, and another 28232 to the second port.
When you see people do the 10MM connection tests, they have to use multiple client IPs or multiple server IPs/Ports to achieve this (or a combo of both to get 10MM unique client:ip/server:ip pairs)

Related

Do all the sockets in a namespace connect to the same port on the server in socket.io?

I thought when a server is started, it creates a specific number of TCP ports on a computer. so whenever a new connection comes in, it assigns a port to that client ('connection'). Recently I opened tutorialsPoint website 'https://www.tutorialspoint.com/socket.io/socket.io_namespaces.htm' and in there is written:
"Socket.IO allows you to “namespace” your sockets, which essentially means assigning different endpoints or paths. This is a useful feature to minimize the number of resources (TCP connections) and at the same time separate concerns within your application by introducing separation between communication channels. Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server".
This part i did not understand: "Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server". My question is how can all the connections share a single port on the web-server.
Any help will be highly appreciated.
Do all the sockets in a namespace connect to the same port on the server in socket.io?
Yes, they do.
First off socket.io is built on the underlying webSocket protocol. A webSocket connection starts with an http connection which is built on top of a TCP connection and then the two sides agree to "upgrade" the protocol to start talking the webSocket protocol instead of the http protocol.
So, when a socket.io connection comes in, it's initially an http connection.
Second, any TCP server is listening for inbound connections on a known port. The client must know what that port is and the client attempts to connect to the combination of IP address and port. A regular TCP server using only one network adapter will just be listening on that one port. All inbound client connections will arrive on that one port.
I thought when a server is started, it creates a specific number of TCP ports on a computer. so whenever a new connection comes in, it assigns a port to that client ('connection').
That's not how it works. A listening server creates a passive socket listening for inbound connections on one specific port. When a TCP client initiates an outbound connection, that client picks a dynamically selected port number for that outbound connection (that is unique for that client and not currently in use). This source port number is typically not visible in TCP, http, webSocket or socket.io programming (though you can see what is is if you want - you just don't have to use it yourself at the level we usually program at). It's part of the TCP plumbing that helps packets get delivered to the right socket. So, at that point it has a source IP address and a source port number. It then attempts to connect to a target IP address on a target port.
That unique combination of those four parameters:
source IP
source port (dynamically assigned on the client)
target IP (known in advance by the client)
target port (known in advance by the client)
defines a unique TCP connection. No two TCP connections will have the same four parameters. If the same client makes another TCP connection to the same target IP and port, it will be assigned a different source port number and thus it will be a different unique combination.
There's one little (somewhat confusing) aspect here that I'll make you aware of, but not try to overly explain or confuse things by. Many clients are actually on a private network and have a private IP address. That private IP address is not what the server actually sees as the source of the connection. At some point the connection goes through a gateway that connects the private network to a public network. This gateway will do NAT (network address translation). It will swap the private source IP/port for a public source IP/port that corresponds to the gateway itself. It remembers what it swapped so that when packets come back the other directly, it can swap it back. So, the target server actually believes it's communicating with the gateway, but anything the target sends to the gateway is "forwarded" onto the private IP address/port of the original sender. So, you don't really need to understand the details of the gateway except that it's serves as a broker between the private IP address of some computer on a private network and some computer on the public internet that you are trying to connect to. It does what's called "network address translation" to make this all work. For the rest of the discussion, you should forget about this and just pretend that both source and target are both on the public internet with public IP addresses (even though that is almost never the actual case, but the gateway makes it just work as if they were).
"Socket.IO allows you to “namespace” your sockets, which essentially means assigning different endpoints or paths. This is a useful feature to minimize the number of resources (TCP connections) and at the same time separate concerns within your application by introducing separation between communication channels. Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server".
In socket.io, when you connect on a namespace, you are creating a new underlying webSocket connection to the same target IP/port. A server can have many inbound connections to the same IP/port. Each is given it's own TCP socket and the four parameters mentioned above uniquely define each one. When an inbound network packet arrives at the lowest level, TCP can tell which source IP and source port it came from and which target IP/port is was sent and that allows the TCP driver to figure out which socket that packet belongs to so that the packet can be delivered to the code that is monitoring that specific socket.
This part i did not understand: "Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server". My question is how can all the connections share a single port on the web-server.
To use a namespace in socket.io, you make a new socket.io connection to that specific namespace. You don't use multiple namespaces on a single socket.io connection. But, a namespace operates at a higher level than the TCP or webSocket connection logic. It rides on top of that in the application layer. So, all namespace connections, no matter which namespace you are using, connect to the same server on the same IP and same port. Once the connection has been established, socket.io sends some data that it would like a "logical" connection on this namespace and then the receiving socket.io code is informed that the new connection belongs in this namespace.
Here's a useful article to read on the topic: Understanding socket and port in TCP.

Understanding Client Server Connections [duplicate]

This question already has answers here:
Does the port change when a server accepts a TCP connection?
(3 answers)
Closed 4 years ago.
I understand the basics of how ports work. However, what I don't get is how multiple clients can simultaneously connect to say port 80. I know each client has a unique (for their machine) port. Does the server reply back from an available port to the client, and simply state the reply came from 80? How does this work?
First off, a "port" is just a number. All a "connection to a port" really represents is a packet which has that number specified in its "destination port" header field.
Now, there are two answers to your question, one for stateful protocols and one for stateless protocols.
For a stateless protocol (ie UDP), there is no problem because "connections" don't exist - multiple people can send packets to the same port, and their packets will arrive in whatever sequence. Nobody is ever in the "connected" state.
For a stateful protocol (like TCP), a connection is identified by a 4-tuple consisting of source and destination ports and source and destination IP addresses. So, if two different machines connect to the same port on a third machine, there are two distinct connections because the source IPs differ. If the same machine (or two behind NAT or otherwise sharing the same IP address) connects twice to a single remote end, the connections are differentiated by source port (which is generally a random high-numbered port).
Simply, if I connect to the same web server twice from my client, the two connections will have different source ports from my perspective and destination ports from the web server's. So there is no ambiguity, even though both connections have the same source and destination IP addresses.
Ports are a way to multiplex IP addresses so that different applications can listen on the same IP address/protocol pair. Unless an application defines its own higher-level protocol, there is no way to multiplex a port. If two connections using the same protocol simultaneously have identical source and destination IPs and identical source and destination ports, they must be the same connection.
Important:
I'm sorry to say that the response from "Borealid" is imprecise and somewhat incorrect - firstly there is no relation to statefulness or statelessness to answer this question, and most importantly the definition of the tuple for a socket is incorrect.
First remember below two rules:
Primary key of a socket: A socket is identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL} not by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT} - Protocol is an important part of a socket's definition.
OS Process & Socket mapping: A process can be associated with (can open/can listen to) multiple sockets which might be obvious to many readers.
Example 1: Two clients connecting to same server port means: socket1 {SRC-A, 100, DEST-X,80, TCP} and socket2{SRC-B, 100, DEST-X,80, TCP}. This means host A connects to server X's port 80 and another host B also connects to the same server X to the same port 80. Now, how the server handles these two sockets depends on if the server is single-threaded or multiple-threaded (I'll explain this later). What is important is that one server can listen to multiple sockets simultaneously.
To answer the original question of the post:
Irrespective of stateful or stateless protocols, two clients can connect to the same server port because for each client we can assign a different socket (as the client IP will definitely differ). The same client can also have two sockets connecting to the same server port - since such sockets differ by SRC-PORT. With all fairness, "Borealid" essentially mentioned the same correct answer but the reference to state-less/full was kind of unnecessary/confusing.
To answer the second part of the question on how a server knows which socket to answer. First understand that for a single server process that is listening to the same port, there could be more than one socket (maybe from the same client or from different clients). Now as long as a server knows which request is associated with which socket, it can always respond to the appropriate client using the same socket. Thus a server never needs to open another port in its own node than the original one on which the client initially tried to connect. If any server allocates different server ports after a socket is bound, then in my opinion the server is wasting its resource and it must be needing the client to connect again to the new port assigned.
A bit more for completeness:
Example 2: It's a very interesting question: "can two different processes on a server listen to the same port". If you do not consider protocol as one of the parameters defining sockets then the answer is no. This is so because we can say that in such a case, a single client trying to connect to a server port will not have any mechanism to mention which of the two listening processes the client intends to connect to. This is the same theme asserted by rule (2). However, this is the WRONG answer because 'protocol' is also a part of the socket definition. Thus two processes in the same node can listen to the same port only if they are using different protocols. For example, two unrelated clients (say one is using TCP and another is using UDP) can connect and communicate to the same server node and to the same port but they must be served by two different server processes.
Server Types - single & multiple:
When a server processes listening to a port that means multiple sockets can simultaneously connect and communicate with the same server process. If a server uses only a single child process to serve all the sockets then the server is called single-process/threaded and if the server uses many sub-processes to serve each socket by one sub-process then the server is called a multi-process/threaded server. Note that irrespective of the server's type a server can/should always use the same initial socket to respond back (no need to allocate another server port).
Suggested Books and the rest of the two volumes if you can.
A Note on Parent/Child Process (in response to query/comment of 'Ioan Alexandru Cucu')
Wherever I mentioned any concept in relation to two processes say A and B, consider that they are not related by the parent-child relationship. OS's (especially UNIX) by design allows a child process to inherit all File-descriptors (FD) from parents. Thus all the sockets (in UNIX like OS are also part of FD) that process A listening to can be listened to by many more processes A1, A2, .. as long as they are related by parent-child relation to A. But an independent process B (i.e. having no parent-child relation to A) cannot listen to the same socket. In addition, also note that this rule of disallowing two independent processes to listen to the same socket lies on an OS (or its network libraries), and by far it's obeyed by most OS's. However, one can create own OS which can very well violate this restriction.
TCP / HTTP Listening On Ports: How Can Many Users Share the Same Port
So, what happens when a server listen for incoming connections on a TCP port? For example, let's say you have a web-server on port 80. Let's assume that your computer has the public IP address of 24.14.181.229 and the person that tries to connect to you has IP address 10.1.2.3. This person can connect to you by opening a TCP socket to 24.14.181.229:80. Simple enough.
Intuitively (and wrongly), most people assume that it looks something like this:
Local Computer | Remote Computer
--------------------------------
<local_ip>:80 | <foreign_ip>:80
^^ not actually what happens, but this is the conceptual model a lot of people have in mind.
This is intuitive, because from the standpoint of the client, he has an IP address, and connects to a server at IP:PORT. Since the client connects to port 80, then his port must be 80 too? This is a sensible thing to think, but actually not what happens. If that were to be correct, we could only serve one user per foreign IP address. Once a remote computer connects, then he would hog the port 80 to port 80 connection, and no one else could connect.
Three things must be understood:
1.) On a server, a process is listening on a port. Once it gets a connection, it hands it off to another thread. The communication never hogs the listening port.
2.) Connections are uniquely identified by the OS by the following 5-tuple: (local-IP, local-port, remote-IP, remote-port, protocol). If any element in the tuple is different, then this is a completely independent connection.
3.) When a client connects to a server, it picks a random, unused high-order source port. This way, a single client can have up to ~64k connections to the server for the same destination port.
So, this is really what gets created when a client connects to a server:
Local Computer | Remote Computer | Role
-----------------------------------------------------------
0.0.0.0:80 | <none> | LISTENING
127.0.0.1:80 | 10.1.2.3:<random_port> | ESTABLISHED
Looking at What Actually Happens
First, let's use netstat to see what is happening on this computer. We will use port 500 instead of 80 (because a whole bunch of stuff is happening on port 80 as it is a common port, but functionally it does not make a difference).
netstat -atnp | grep -i ":500 "
As expected, the output is blank. Now let's start a web server:
sudo python3 -m http.server 500
Now, here is the output of running netstat again:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
So now there is one process that is actively listening (State: LISTEN) on port 500. The local address is 0.0.0.0, which is code for "listening for all". An easy mistake to make is to listen on address 127.0.0.1, which will only accept connections from the current computer. So this is not a connection, this just means that a process requested to bind() to port IP, and that process is responsible for handling all connections to that port. This hints to the limitation that there can only be one process per computer listening on a port (there are ways to get around that using multiplexing, but this is a much more complicated topic). If a web-server is listening on port 80, it cannot share that port with other web-servers.
So now, let's connect a user to our machine:
quicknet -m tcp -t localhost:500 -p Test payload.
This is a simple script (https://github.com/grokit/dcore/tree/master/apps/quicknet) that opens a TCP socket, sends the payload ("Test payload." in this case), waits a few seconds and disconnects. Doing netstat again while this is happening displays the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:54240 ESTABLISHED -
If you connect with another client and do netstat again, you will see the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:26813 ESTABLISHED -
... that is, the client used another random port for the connection. So there is never confusion between the IP addresses.
Normally, for every connecting client the server forks a child process that communicates with the client (TCP). The parent server hands off to the child process an established socket that communicates back to the client.
When you send the data to a socket from your child server, the TCP stack in the OS creates a packet going back to the client and sets the "from port" to 80.
Multiple clients can connect to the same port (say 80) on the server because on the server side, after creating a socket and binding (setting local IP and port) listen is called on the socket which tells the OS to accept incoming connections.
When a client tries to connect to server on port 80, the accept call is invoked on the server socket. This creates a new socket for the client trying to connect and similarly new sockets will be created for subsequent clients using same port 80.
Words in italics are system calls.
Ref
http://www.scs.stanford.edu/07wi-cs244b/refs/net2.pdf

A client connected to two servers with the same port

According to TCP/IP specification I consider it impossible to ESTABLISH two connections with the same port from the client side. BUT IT DIT!
The matchine 172.22.3.137 acts as client and left ones are servers. So does this mean it is possible for a client to connect to multiple servers with identical port?
Any ideas?
According to the TCP specification, a connection is identified by four numbers: client port, client address, server port, server address.
It is entirely possible for client ports to be reused, otherwise you could have only 64k connections from any machine.
What is not possible, is to connect from the same client port to the same server (address and port), this would make the two connections indistinguishable.
Have you checked that when second connection was setuo first one still exist ??
Check am sure first one must have terminated as if not then your machine willnot send any ack and three way handshake will not complete.

TCP/IP basics: Destination port relevance

Ok this is kind of embarassing but I just have a rather "noob" question.
In a client server TCP communications, where my system is a client accessing a remote server at say Port XX, isnt the client opening a random port YY in its system to talk to remote port XX?
So when we code we do specify the destination port XX right?
For the client, the port YY itself is chosen when the socket is created, isnt it?
Is there anyway I could monitor/restrict/control any client talking to a particular server?(like say clients talking to servers at specific serving ports??)
Is there any IPTABLE rule or some firewall rule restricting the client?
Can this be done at all??
Are destination ports saved in the socket structures? If so where??
Thanks!
First, server side creates a listening socket, with the chain of socket(2), bind(2), and listen(2) calls, then waits for incoming client connection requests with the accept(2) call. Once a client connects (socket(2) and then connect(2) on the client side) and the TCP/IP stacks of the client and the server machines complete the three way handshake, the accept(2) returns new socket descriptor - that's the server's end of the connected socket. Both bind(2) on the server side, and connect(2) on the client side take server's address and port.
Now, the full TCP connection is described by four numbers - server address, server port, client address, and client port. The first two must obviously be known to the client prior to the connection attempt (otherwise, where do we go?). The client address and port, while could be specified explicitly with the bind(2), are usually assigned dynamically - the address is the IP address of the outgoing network interface, as determined by the routing table, and the port selected out of range of ephemeral ports.
The netstat(8) command shows you established connections. Adding -a flag lets you see listening sockets, -n flag disables DNS and service resolution, so you just see numeric addresses and ports.
Linux iptables(8) allows you to restrict where clients are allowed to connect to. You can restrict based on source and destination ports, addresses, and more.
You can get socket local binding with getsockname(2) call, remote binding is given by getpeername(2).
Hope this makes it a bit more clear.
Yes you can create a firewall rule to prevent outbound TCP connections to port XX. For example, some organizations prevent outbound TCP port 25, to prevent spam being sent from network PCs to remote SMTP servers.

Remote port blocking in firewalls?

some guys use a firewall on their laptops which not only blocks their own local incoming ports (except those they need for their application) but also blocks messages unless they are issued from a distinct port number. We're talking about a local UDP server which is listening to UDP broadcasts.
The problem is that the remote client uses a random port, say 1024, which is blocked unless they tell the firewall to accept it.
What puzzles me is that as far as I know from using sockets in my programs is that usually the client gets its port number from the OS, whereas only when you have a server, you bind your socket to a distinct port, right?
In my literature and in tutorials and code snippets in the web I haven't found any clue that clients should be using fixed port numbers at all.
So how is this in reality? Am I probably missing a point?
Are there client applications around using fixed ports?
Is is actually useful to block remote ports with a firewall?
And if yes, what level of added security does this give to you?
Thanks for enlightenment in beforehand...
Although the default API's allow the network stack to select a local port for client connections, clients may specify a fixed port for various reasons.
Some specifications (FTP) specify a fixed port for clients. Most servers don't care if clients get this correct.
Some clients use a fixed pool of ports for egress from a LAN to the Internet. This allows firewall rules to more completely lock down outbound traffic.
Source ports are sometimes uses as a weak type of "security through obscurity".
You always get a random address and/or port when not explicitly having bound to one before sending.
Daemons are usually bound to a fixed port, so that:
you can actually contact them without having to try all possible ports or utilize a secondary resolver (remember the SUNRPC portmapping crap?)
and because a TCP socket is not allowed to listen() if it has not bound to a port, IIRC.
Are there client applications around using fixed ports?
Some can be configured so, like BIND9.
useful to block remote ports with a firewall?
No, because your peer may choose any port of his. Block him and you'll lose a customer, so to speak.

Resources