Node.js TCP connection: client port assignment

I created a TCP connection between a client and a server using Node.js (the net module). The server is listening on a predefined port and the client connects to that port.
As far as I understand, the port for the client is dynamically assigned by Node? Is that correct?
What kind of algorithm does Node use to assign a "random" port for the client? How does this work, and is it determined by Node or by the OS?
Is it possible to define a static port that the client is going to use? Is it possible to define a range of ports for the client to use?
NOTE: I think I found a discussion/question on a similar subject on Stack Overflow before, but I cannot find it anymore. I would appreciate it if you could share any reliable resources on this subject.

The source port number is usually pretty much irrelevant to your programming unless you have a router or firewall that is somehow restrictive in that regard. It is merely used by the underlying TCP infrastructure to keep track of different TCP connections.
From this article:
A TCP/IP connection is identified by a four element tuple: {source IP, source port, destination IP, destination port}. To establish a TCP/IP connection only a destination IP and port number are needed, the operating system automatically selects source IP and port.
The above referenced article describes how Linux selects the source port number.
As to your particular questions:
What kind of algorithm node is using to assign "random" port for the client? How this works, is this determined by node or by OS?
It is determined by the OS. The source port number is selected by the originating host's TCP stack when the connection is initiated, before the connection ever reaches the node.js server.
Some other reference articles:
Does the TCP source port have to be unique per host?
how can an application use port 80/HTTP without conflicting with browsers?
Note: there is no security reason I'm aware of for a firewall to limit the source port number or block certain source port numbers. They are a TCP bookkeeping number only, not related at all to security or the type of service being used. This is different from the destination port, which usually corresponds directly to the type of service being used (e.g. 80 is HTTP, 25 is SMTP, 143 is IMAP, etc.). When you make a TCP connection to a different host, you specify the host address and the destination port number. You don't specify the source port number.

The selected answer provides a lot of info, but does not deal with the underlying problem. Node does not appear to allow https.request to specify a port for the client. There are localAddress and localPort options, but they appear to be broken.
I've opened a new question on this issue. Hopefully someone will answer with something other than "just don't do that."
Is there a way to set the source port for a node js https request?
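For plain TCP sockets from the `net` module, `net.connect()` does accept documented `localAddress` and `localPort` options, which is one way to pin the client's source port. Node has no built-in "range" option, so the sketch below (with a hypothetical helper `connectFromPortRange` and an arbitrary example range) simply walks the range and moves on when binding fails. It does not test the `https.request` behaviour described above.

```javascript
const net = require('net');

// Hypothetical helper: connect from the first usable source port in
// [low, high] via net.connect()'s localPort option. Walking the range is
// our own logic, not a built-in Node feature.
function connectFromPortRange(host, port, low, high) {
  return new Promise((resolve, reject) => {
    const attempt = (localPort) => {
      if (localPort > high) {
        reject(new Error(`no free source port in ${low}-${high}`));
        return;
      }
      const socket = net.connect({ host, port, localPort });
      const onError = (err) => {
        socket.destroy();
        // Port taken, or the same four-tuple still lingering: try the next one.
        if (err.code === 'EADDRINUSE' || err.code === 'EADDRNOTAVAIL') {
          attempt(localPort + 1);
        } else {
          reject(err); // e.g. ECONNREFUSED: no point trying other ports
        }
      };
      socket.once('error', onError);
      socket.once('connect', () => {
        // Hand the socket over; the caller should attach its own 'error' handler.
        socket.removeListener('error', onError);
        resolve(socket);
      });
    };
    attempt(low);
  });
}
```

Pinning source ports gives up some of what the OS normally handles for you: on Linux, reusing a source port whose previous connection to the same destination is still lingering can fail with `EADDRINUSE` or `EADDRNOTAVAIL`, which is why the loop treats both as "try the next port".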

Related

What port number shall I use?

I am writing an app that is to be running on a Windows PC. I need to create a server socket listening on 127.0.0.1, and another client socket which is to connect with this server socket.
Since the data exchange between the two sockets is within the same machine and there is no client connecting from outside of the machine, what port to use is insignificant, as long as the two sockets use the same port number.
So, how do I decide which port number to use? Shall it be a hard-coded port number such as 49500? What if another unrelated app on this machine happens to use this port number? Or shall I get the list of all used ports and programmatically pick an unused port?
Just want to know what is the best approach. Thanks.
Ports 0 to 1023 (the well-known ports) are generally controlled, so you should bind your socket to a higher port number. Within the higher range, ports 1024 to 49151 (the registered ports) can be registered so that others are informed and avoid using them.
If you want to avoid conflicts, you can check which registered ports are in use on your machine and pick a port number for your socket that is free. Ports above that range (49152 to 65535) are dynamic/private ports that are never assigned or registered at all.
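The three ranges above can be captured in a small helper; `classifyPort` is a hypothetical name, and the boundaries are the IANA ranges quoted in the answer.

```javascript
// Classify a port number into the IANA ranges described above.
function classifyPort(port) {
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new RangeError(`not a valid port number: ${port}`);
  }
  if (port <= 1023) return 'well-known';
  if (port <= 49151) return 'registered';
  return 'dynamic/private';
}

console.log(classifyPort(49500)); // dynamic/private
```

One caveat for the asker's example of 49500: Windows (Vista and later) uses 49152-65535 as its default dynamic range for outgoing connections, so a hard-coded port in that range can occasionally already be held as some other program's ephemeral source port.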
Generally, it is not common to worry about that. For example, two major applications, VMware and the Apache web server, operate on the same port number (443), and if you want to use VMware Workstation and XAMPP (which works with Apache), you simply have to make one of them listen on another port, and it's not a big deal. So in my opinion the best practice is to let your users change this via a config file or something similar.
For further information, you can search Google. For instance, this link might be useful:
https://www.sciencedirect.com/topics/computer-science/registered-port#:~:text=Well%2Dknown%20ports%E2%80%94Ports%20in,1023%20are%20assigned%20and%20controlled.&text=Registered%20ports%E2%80%94Ports%20in%20the,be%20registered%20to%20prevent%20duplication.&text=Dynamic%20ports%E2%80%94Ports%20in%20the,assigned%2C%20controlled%2C%20or%20registered.

Do all the sockets in a namespace connect to the same port on the server in socket.io?

I thought that when a server is started, it creates a specific number of TCP ports on the computer, so that whenever a new connection comes in, it assigns a port to that client ('connection'). Recently I opened the TutorialsPoint website 'https://www.tutorialspoint.com/socket.io/socket.io_namespaces.htm' and it says:
"Socket.IO allows you to “namespace” your sockets, which essentially means assigning different endpoints or paths. This is a useful feature to minimize the number of resources (TCP connections) and at the same time separate concerns within your application by introducing separation between communication channels. Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server".
This part I did not understand: "Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server". My question is: how can all the connections share a single port on the web server?
Any help will be highly appreciated.
Do all the sockets in a namespace connect to the same port on the server in socket.io?
Yes, they do.
First off, socket.io is built on the underlying webSocket protocol. A webSocket connection starts with an http connection, which is built on top of a TCP connection, and then the two sides agree to "upgrade" the protocol to start talking the webSocket protocol instead of the http protocol.
So, when a socket.io connection comes in, it's initially an http connection.
Second, any TCP server is listening for inbound connections on a known port. The client must know what that port is and the client attempts to connect to the combination of IP address and port. A regular TCP server using only one network adapter will just be listening on that one port. All inbound client connections will arrive on that one port.
I thought when a server is started, it creates a specific number of TCP ports on a computer. so whenever a new connection comes in, it assigns a port to that client ('connection').
That's not how it works. A listening server creates a passive socket listening for inbound connections on one specific port. When a TCP client initiates an outbound connection, that client picks a dynamically selected port number for that outbound connection (one that is unique for that client and not currently in use). This source port number is typically not visible in TCP, http, webSocket or socket.io programming (though you can see what it is if you want; you just don't have to use it yourself at the level we usually program at). It's part of the TCP plumbing that helps packets get delivered to the right socket. So, at that point it has a source IP address and a source port number. It then attempts to connect to a target IP address on a target port.
That unique combination of those four parameters:
source IP
source port (dynamically assigned on the client)
target IP (known in advance by the client)
target port (known in advance by the client)
defines a unique TCP connection. No two TCP connections will have the same four parameters. If the same client makes another TCP connection to the same target IP and port, it will be assigned a different source port number and thus it will be a different unique combination.
There's one little (somewhat confusing) aspect here that I'll make you aware of, without trying to over-explain it. Many clients are actually on a private network and have a private IP address. That private IP address is not what the server actually sees as the source of the connection. At some point the connection goes through a gateway that connects the private network to a public network. This gateway does NAT (network address translation): it swaps the private source IP/port for a public source IP/port that belongs to the gateway itself, and it remembers what it swapped so that when packets come back in the other direction, it can swap them back. So, the target server actually believes it's communicating with the gateway, but anything the target sends to the gateway is "forwarded" on to the private IP address/port of the original sender. You don't really need to understand the details of the gateway except that it serves as a broker between a computer on a private network and the computer on the public internet that it is trying to connect to. For the rest of the discussion, you can forget about this and just pretend that both source and target are on the public internet with public IP addresses (even though that is almost never the actual case, the gateway makes it work as if they were).
"Socket.IO allows you to “namespace” your sockets, which essentially means assigning different endpoints or paths. This is a useful feature to minimize the number of resources (TCP connections) and at the same time separate concerns within your application by introducing separation between communication channels. Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server".
In socket.io, every namespace connection goes to the same target IP/port. (By default, the socket.io client actually multiplexes namespaces that point at the same server over a single underlying connection, which is what the tutorial means; but even when separate webSocket connections are made, for example with the forceNew option, they all go to that same IP/port.) A server can have many inbound connections to the same IP/port. Each is given its own TCP socket, and the four parameters mentioned above uniquely define each one. When an inbound network packet arrives at the lowest level, TCP can tell which source IP and source port it came from and which target IP/port it was sent to, and that allows the TCP driver to figure out which socket the packet belongs to so that it can be delivered to the code that is monitoring that specific socket.
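The lookup described here can be modelled with a map keyed by the four-tuple. This is a toy illustration, not how any real kernel is written (real stacks use tuned hash tables and also fall back to a listening socket for new connection requests); `DemuxTable` and its methods are hypothetical names.

```javascript
// Toy model of TCP demultiplexing: an established connection is found by
// its exact four-tuple. Not real kernel code.
class DemuxTable {
  constructor() {
    this.connections = new Map();
  }

  static key(srcIp, srcPort, dstIp, dstPort) {
    return `${srcIp}:${srcPort}|${dstIp}:${dstPort}`;
  }

  // Register an established connection; the same four-tuple cannot exist twice.
  register(srcIp, srcPort, dstIp, dstPort, handler) {
    const k = DemuxTable.key(srcIp, srcPort, dstIp, dstPort);
    if (this.connections.has(k)) throw new Error(`duplicate connection ${k}`);
    this.connections.set(k, handler);
  }

  // Hand an incoming segment to the handler registered for its four-tuple.
  deliver(segment) {
    const { srcIp, srcPort, dstIp, dstPort, payload } = segment;
    const handler = this.connections.get(DemuxTable.key(srcIp, srcPort, dstIp, dstPort));
    if (!handler) return false; // a real stack would try a listening socket or reply with RST
    handler(payload);
    return true;
  }
}
```

Two connections from the same client IP to the same server port (say 443) can both be registered as long as their source ports differ, and a segment is routed to exactly one of them.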
This part I did not understand: "Multiple namespaces actually share the same WebSockets connection thus saving us socket ports on the server". My question is: how can all the connections share a single port on the web server?
To use a namespace in socket.io, you make a new socket.io connection to that specific namespace. You don't use multiple namespaces on a single socket.io connection. But, a namespace operates at a higher level than the TCP or webSocket connection logic. It rides on top of that in the application layer. So, all namespace connections, no matter which namespace you are using, connect to the same server on the same IP and same port. Once the connection has been established, socket.io sends some data that it would like a "logical" connection on this namespace and then the receiving socket.io code is informed that the new connection belongs in this namespace.
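The "logical connection" idea can be illustrated with a toy multiplexer that tags every frame with its namespace, so several namespaces share one underlying connection. This is not socket.io's real wire protocol or API; `Multiplexer`, `send`, `on` and `receive` are names made up for the sketch, and the `writeFrame` callback stands in for the shared connection.

```javascript
// Toy multiplexer: several logical namespaces share ONE underlying connection
// by tagging each frame with the namespace it belongs to. Not socket.io's
// real protocol; writeFrame stands in for writing to the shared connection.
class Multiplexer {
  constructor(writeFrame) {
    this.writeFrame = writeFrame;
    this.handlers = new Map();
  }

  // Register a handler for messages arriving on one namespace.
  on(nsp, handler) {
    this.handlers.set(nsp, handler);
  }

  // Send data on a namespace: every namespace goes out over the same connection.
  send(nsp, data) {
    this.writeFrame(JSON.stringify({ nsp, data }));
  }

  // Called for each frame read from the shared connection; routes it by namespace.
  receive(frame) {
    const { nsp, data } = JSON.parse(frame);
    const handler = this.handlers.get(nsp);
    if (handler) handler(data);
  }
}
```

Wiring two instances together (each one's `writeFrame` calling the other's `receive`) gives one "connection" carrying, for example, both `/admin` and `/chat` traffic, with the routing done entirely at the application layer.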
Here's a useful article to read on the topic: Understanding socket and port in TCP.

Can TCPv4 source and destination ports conflict with each other? Or do source and destination ports live in their own address spaces?

Let me be more specific about my question with an example: Let's say that I have a slew of little servers that all start up on different ports using TCPv4. These ports are going to be destination ports, of course. Let's further assume that these little servers don't just start up at boot time like a typical server, but rather they churn dynamically based on demand. They start up when needed, and may shut themselves down for a while, and then later start up again.
Now let's say that on this same computer, we also have lots of client processes making requests to server processes on other computers via TCPv4. When a client makes such a request, it is assigned a source port by the OS.
Let's say for the sake of this example that a client processes makes a web request to a RESTful server running on a different computer. Let's also say that the source port assigned by the OS to this request is port 7777.
For this example let's also say that while the above request is still occurring, one of our little servers wants to start up, and it wants to start up on destination port 7777.
My question is will this cause a conflict? I.e., will the server get an error because port 7777 is already in use? Or will everything be fine because these two different kinds of ports live in different address spaces that cannot conflict with each other?
One reason I'm worried about the potential for conflict here is that I've seen web pages that say that "ephemeral source port selection" is typically done in a port number range that begins at a relatively high number. Here is such a web page:
https://www.cymru.com/jtk/misc/ephemeralports.html
A natural assumption for why source ports would begin at high numbers, rather than just starting at 1, is to avoid conflict with the destination ports used by server processes. Though I haven't yet seen anything that explicitly comes out and says that this is the case.
P.S. There is, of course, a potential distinction between what the TCPv4 protocol spec has to say on this issue, and what OSes actually do. E.g., perhaps the protocol is agnostic, but OSes tend to only use a single address space? Or perhaps different OSes treat the issue differently?
Personally, I'm most interested at the moment in what Linux would do.
The TCP specification says that connections are identified by the tuple:
{local addr, local port, remote addr, remote port}
Based on this, there theoretically shouldn't be a conflict between a local port used in an existing connection and trying to bind that same port for a server to listen on, since the listening socket doesn't have a remote address/port (these are represented as wildcards in the spec).
However, most TCP implementations, including the Unix sockets API, are more strict than this. If the local port is already in use in any existing socket, you won't be able to bind it, you'll get the error EADDRINUSE. A special exception is made if the existing sockets are all in TIME_WAIT state and the new socket has the SO_REUSEADDR socket option; this is used to allow a server to restart while the sockets left over from a previous process are still waiting to time out.
For this reason, the port space is generally divided into ranges with different uses. When a socket doesn't bind a local port (either because it just called connect() without calling bind(), or because it passed port 0 to bind()), the port is chosen from the ephemeral range, which usually consists of high-numbered ports. Servers, on the other hand, are expected to bind to low-numbered ports. If network applications follow this convention, you should not run into conflicts.

Maintaning more than 65535 connections on single IP

Reading the following article: 10M concurrent websockets
So, there are 1000 websocket servers listening on ports 10000-11000. When a connection is made to one of these servers, I assume they continue communication from a random established TCP connection with random ports. So, as one IP is used, and there are 64K ports, how can one maintain 10M connections? Are connections identified by IP-Port pairs? Can two different connections from different IPs to same port be established? How does this work under the hood?
When a connection is made to one of these servers, I assume they continue communication from a random established TCP connection with random ports.
Wrong assumption. They communicate with the clients using the same local port number they are listening on.
So, as one IP is used, and there are 64K ports, how can one maintain 10M connections?
Not a problem.
Are connections identified by IP-Port pairs?
Yes.
Can two different connections from different IPs to same port be established?
Yes.
How does this work under the hood?
See above. IP:port pairs. You answered your own question.
Sorry for totally changing my answer.
Linux can easily support millions of open sockets if the machine has enough memory and processing power. The TCP/IP stack allows this because the socket the OS targets for a given TCP packet is determined by the source and destination IP and port tuple.
The server implementing the websocket protocol need only listen on a single TCP socket, often on the HTTP or HTTPS port number, though not in this example. When a new client completes the standard TCP handshake, the server OS creates a new, unique socket for that connection, which the application accepts. The websocket request then arrives as an ordinary HTTP request on that socket, and the websocket package takes care of upgrading the protocol used on the socket from standard HTTP to websocket.
In the example, a goroutine is started for each websocket socket.
The client side, the side initiating the TCP connections, is limited by the number of ephemeral ports its OS can open for a given destination host and port. Honestly, I don't know if this is a limitation of the client OS or the TCP/IP specification itself.
I think the part you are missing is a TCP connection is actually two pairs of IP:PORT.
One for the server, one for the client.
The listening side of a tcp socket is generally always the same IP/Port pair.
Example: net.Listen("tcp", ":8080") is listening on port 8080 (on all interfaces in this case)
The connecting (client) side usually uses a single outgoing IP along with a random port.
Example: net.Dial("tcp", "server:8080") selects a random available ephemeral port and then attempts to connect to server:8080.
So, in the above example, that connection is: client.ip:32768 -> server.ip:8080 (where 32768 is the ephemeral port selected)
The two pairs combined make a unique connection.
The server side can take as many connections from a single client as there are available (client-side) ports. It can also take as many clients as there are IP addresses.
Think of it as, for one listening socket, you can theoretically have 2^16(ports) * 2^32(ipv4 addrs) connections.
In reality, there are reserved IPs, ports, memory limitations, etc so the number is far smaller.
For example, the default ephemeral port range on current Linux kernels is 32768-60999 (older kernels used 32768-61000). That means I'll start getting errors if I net.Dial("tcp", "server:8080") more than 28232 times, as I will have exhausted my ephemeral port range for the given server address. But if the server is listening on 2 separate ports, I can make 28232 connections to the first port and another 28232 to the second.
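The arithmetic can be written out as follows; the helper names are invented, and the per-destination multiplication models the answer's statement that each server ip:port gets its own full set of source ports. With the range 32768-60999 (the default on current kernels), counting inclusively gives 28232 ports, which matches the figure in the answer.

```javascript
// Number of source ports in an inclusive [low, high] range.
function ephemeralCount(low, high) {
  return high - low + 1;
}

// Upper bound on simultaneous connections from ONE client IP, assuming each
// distinct server ip:port can use the whole ephemeral range independently.
function maxClientConnections(low, high, serverEndpoints) {
  return ephemeralCount(low, high) * serverEndpoints;
}

console.log(ephemeralCount(32768, 60999));          // 28232
console.log(maxClientConnections(32768, 60999, 2)); // 56464
```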
When you see people do the 10MM connection tests, they have to use multiple client IPs or multiple server IPs/ports to achieve this (or a combination of both, to get 10MM unique client ip:port / server ip:port pairs).
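To put a number on "multiple client IPs or multiple server IPs/ports": with about 28232 usable source ports per combination, reaching 10 million connections needs at least ceil(10,000,000 / 28232) = 355 distinct client-IP/server-endpoint combinations. A sketch of that calculation (the function name is hypothetical):

```javascript
// Minimum number of distinct (client IP, server ip:port) combinations needed
// to hold `target` connections when each combination offers `portsPerCombo`
// usable source ports.
function combinationsNeeded(target, portsPerCombo) {
  return Math.ceil(target / portsPerCombo);
}

console.log(combinationsNeeded(10_000_000, 28232)); // 355
```

By this count alone, the article's 1000 listening ports would provide enough combinations even for a single client IP; in practice, memory and file-descriptor limits are usually what run out first.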

Remote port blocking in firewalls?

Some people use a firewall on their laptops which not only blocks their own local incoming ports (except those they need for their application) but also blocks messages unless they are sent from a specific port number. We're talking about a local UDP server which is listening to UDP broadcasts.
The problem is that the remote client uses a random port, say 1024, which is blocked unless they tell the firewall to accept it.
What puzzles me is that as far as I know from using sockets in my programs is that usually the client gets its port number from the OS, whereas only when you have a server, you bind your socket to a distinct port, right?
In my literature, and in tutorials and code snippets on the web, I haven't found any clue that clients should be using fixed port numbers at all.
So how is this in reality? Am I probably missing a point?
Are there client applications around using fixed ports?
Is it actually useful to block remote ports with a firewall?
And if yes, what level of added security does this give to you?
Thanks in advance for any enlightenment...
Although the default API's allow the network stack to select a local port for client connections, clients may specify a fixed port for various reasons.
Some specifications (FTP) specify a fixed port for clients. Most servers don't care if clients get this correct.
Some clients use a fixed pool of ports for egress from a LAN to the Internet. This allows firewall rules to more completely lock down outbound traffic.
Source ports are sometimes used as a weak type of "security through obscurity".
You always get a random address and/or port when not explicitly having bound to one before sending.
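The same applies to the UDP case in the question: a sender that never calls bind() gets an OS-chosen port on its first send, and the receiver only learns it from the sender information attached to each datagram. This Node `dgram` sketch (the helper name `sendAndObserve` and the port 45123 are arbitrary choices for the example) shows both the implicit case and an explicitly bound source port, as reported in `rinfo.port`.

```javascript
const dgram = require('dgram');

// Hypothetical helper: send one datagram to a local receiver and report the
// source port the receiver saw. Pass clientPort to bind the sender explicitly;
// leave it undefined to let the OS pick one on the first send().
function sendAndObserve(clientPort) {
  return new Promise((resolve, reject) => {
    const receiver = dgram.createSocket('udp4');
    const sender = dgram.createSocket('udp4');
    receiver.once('error', reject);
    sender.once('error', reject);

    receiver.once('message', (msg, rinfo) => {
      // rinfo.port is the sender's source port as it arrived.
      resolve({ sourcePort: rinfo.port, boundPort: sender.address().port });
      receiver.close();
      sender.close();
    });

    receiver.bind(0, '127.0.0.1', () => {
      const { port } = receiver.address();
      const send = () => sender.send('ping', port, '127.0.0.1');
      if (clientPort === undefined) {
        send(); // implicit bind to an OS-chosen port
      } else {
        sender.bind(clientPort, '127.0.0.1', send); // fixed source port
      }
    });
  });
}
```

A firewall rule that only admits datagrams from one source port would therefore only work with senders written like the second case.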
Daemons are usually bound to a fixed port, so that:
you can actually contact them without having to try all possible ports or utilize a secondary resolver (remember the SUNRPC portmapping crap?)
and because a socket that calls listen() without bind() just gets an arbitrary ephemeral port from the OS (at least on Linux), which clients would have no way of knowing.
Are there client applications around using fixed ports?
Some can be configured so, like BIND9.
useful to block remote ports with a firewall?
No, because your peer may choose any port of his. Block him and you'll lose a customer, so to speak.
