How to confirm several TCP connections come from one computer? - node.js

I have a client/server model in which each client uses 3 or more TCP connections to one server, for example one connection for binary data, one for text messages, and one for control.
However, I have no idea how to "group" these 3 connections as one client.
I've tried using the IP address to identify them, but different clients may share the same IP address.
Though I'm using node.js, I think this is a common problem that may appear in any language/implementation.

The only thing you can do is handle it at the application layer. Send some data that the clients have to return to you in some form on each connection. (Look at the SSL handshake process. Maybe just use that!)
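For instance, here is a minimal sketch of that idea, assuming the client opens one connection first, receives a server-generated session token, and then echoes that token as the first line on its other connections (the HELLO greeting and newline framing are invented for illustration):

```js
const net = require('net');
const crypto = require('crypto');

// token -> sockets belonging to one logical client
const sessions = new Map();

const server = net.createServer((socket) => {
  socket.once('data', (chunk) => {
    const line = chunk.toString().trim(); // assumes the first chunk holds one full line
    if (line === 'HELLO') {
      // First connection from a client: issue a fresh session token.
      const token = crypto.randomBytes(16).toString('hex');
      sessions.set(token, [socket]);
      socket.write(token + '\n');
    } else if (sessions.has(line)) {
      // Follow-up connection: the client echoed its token,
      // so group this socket with the rest of its session.
      sessions.get(line).push(socket);
    } else {
      socket.destroy(); // neither greeting nor known token
    }
  });
});

server.listen(9000);
```

The client can then label which grouped socket is for binary data, text, or control itself, e.g. with a second line naming the channel.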

Related

TCP hole punching in Node without a server

I'm trying to follow the code given here to implement NAT hole punching in Node.js. I'd like to know if the server is strictly necessary. Having read about hole punching, I am under the impression that the purpose of the server is to allow the clients to exchange some information (including but not limited to their addresses and ports they want to communicate on) so that they can proceed to talk directly. Assuming the clients already had each other's information (again, including but not limited to their addresses and ports), would the server still be necessary? If so why and if not, how could this be implemented?
For instance, say one were to build an application where client_A prints out all information that would have been transmitted to the server for user_A to read, who then sends this to user_B, who then submits this info to client_B (this could be done via email for example). Wouldn't this avoid the need for a server?
Here is another explanation of why I think it might be possible to remove the server in the middle:
In NAT hole punching (assuming I understand it correctly), the communications begin when client_A sends a message to the server. The message contains some information that the server then passes on to client_B when client_B contacts the server. After this point, client_A and client_B are able to communicate directly without the need for the server. I am under the impression that once a direct connection between client_A and client_B has been established, the server could go offline and the two clients would still be able to communicate directly with one another. If this is the case, then I would imagine that any information that is being used to maintain this connection (be that addresses, ports, or any other kind of info) could be exchanged through any other channel (e.g. email, a handwritten letter, a voice call, etc.) at the beginning of the protocol, and then the connection could be established without ever needing the server.
Regarding 'tricking' the router
As manishig pointed out to me in a comment (thanks), NAT hole punching also requires tricking the router. If I understand correctly (please correct me if not), the router is tricked by having it store the info for directing incoming packets from the server to client_A; however, these packets are actually coming from client_B after the initial phase of the protocol. If this is a correct description of the problem, is there a way to trick the router that doesn't require using a server?
There are ways to communicate between two remote computers over the internet without an intermediate server, but IMO it is not the preferred way.
Why is an intermediate server needed?
If client_A and client_B are both in the same LAN (e.g. your home/office network) you can make sure (by configuring the clients and/or the router) that they have static IP addresses on that LAN, and they can just talk freely.
E.g. if client_A is listening on port 8080, client_B can create a connection to client_A's IP on port 8080.
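In node.js that direct case is just a plain listener plus a plain client (the LAN address below is made up):

```js
const net = require('net');

// client_A: listen on port 8080
net.createServer((socket) => {
  socket.end('hello from client_A\n');
}).listen(8080);

// client_B: connect straight to client_A's LAN address
const conn = net.connect(8080, '192.168.1.10');
conn.on('data', (data) => console.log(data.toString()));
```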
Over the internet, any packet sent usually passes through NAT at least twice: once when leaving your LAN (e.g. your home/office router) and at least once at an ISP endpoint. This means you have no control over the public IP and port assigned to your packet.
Not only do you have no control over your packet's assigned public IP and port, these are also not static. They won't change while you have an active TCP connection, but you have no other guarantee from your ISP regarding your assigned public IP and port.
The intermediate server's purpose is to dynamically update each client with its peer's info and to keep the TCP connection open, so that peer-to-peer communication remains available.
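A bare-bones sketch of such a rendezvous server: it pairs two incoming peers and tells each one the public endpoint the other appeared from (the pair-the-first-two logic and the JSON framing are simplifying assumptions):

```js
const net = require('net');

let waiting = null; // first peer, parked until a second one arrives

net.createServer((socket) => {
  if (!waiting) {
    waiting = socket;
    return;
  }
  // Tell each peer the public IP/port the NAT assigned to the other.
  const a = waiting, b = socket;
  a.write(JSON.stringify({ ip: b.remoteAddress, port: b.remotePort }) + '\n');
  b.write(JSON.stringify({ ip: a.remoteAddress, port: a.remotePort }) + '\n');
  waiting = null;
}).listen(5000);
```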
Alternative solution to an intermediate server (not recommended)
If you want your clients to communicate without an intermediate server, you can buy a public static IP from your ISP (if they support it), and then there are ways (with some configuration) to give one of your clients a public static IP and port that the other client can connect to.
But I wouldn't recommend it, since it requires some IT know-how and carries security risks.
Also, if both clients are portable and connect to different networks all the time, it isn't a viable solution.

How to identify and authenticate different TCP clients (with dynamic IPs) in a Node.js TCP server framework?

I have a Node.js TCP server framework which acts as a central TCP server connecting to many TCP clients, which are sensors gathering and sending data. This is essentially machine-to-machine communication where the TCP client establishes a connection and starts sending data. The server has to authenticate the client and then process the data.
What I want to do:
Authenticate each client by identifying them and making sure they are in the users list in the database.
I can identify each incoming client by its IP and port. However, each client has a dynamic IP. This means that I cannot rely on it to compare against a list in my DB.
My goal here is to make sure that each connection is valid and is part of my user database. I was thinking along the lines of implementing an 'application layer' where both the sensor (client) and the TCP server know a string which they match. When the client sets up a connection, it sends this string, and the server compares it against a list in the database. This way the client emits a 'keyword' each time it establishes a connection.
If this is a viable method, how can I use the node.js 'net' module to emit a keyword only when the connection is established? I don't see any such provision.
Also, is there a better way to identify clients in machine-to-machine connections? Any pointers will be helpful.
Node-RSA will do exactly this for you.
Otherwise,
use the socket.write function to send data. If another Node app is listening on the other side, it can use the socket's data event to obtain what the sender has written. If you only want certain machines to connect to yours and their IP addresses aren't static, I'd indeed recommend using a secret key. Beware that this is not completely safe, as people could perform a brute-force attack. You can minimize the chance of someone actually guessing your secret key by, you guessed it, using a key which is a hashed string. The string could be anything, as long as both of you have it. It's even better to create and share the same function which alters the string every once in a while, for a little more security.
Please note that this isn't as secure as the first option but it is definitely still something.
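A sketch of that secret-key idea over the net module; writing the hashed keyword inside the connect callback is also the answer to "emit a keyword only when the connection is established". The secret, port, and hostname are placeholders, and note that a static hash can be replayed by anyone who captures it once, so treat this as a starting point rather than real security:

```js
const net = require('net');
const crypto = require('crypto');

const SECRET = process.env.SENSOR_SECRET || 'change-me'; // shared out-of-band
const keyword = crypto.createHash('sha256').update(SECRET).digest('hex');

// Server: accept data only after the first line matches the hashed keyword.
const server = net.createServer((socket) => {
  socket.once('data', (chunk) => {
    if (chunk.toString().trim() === keyword) {
      socket.write('OK\n');
      socket.on('data', (data) => { /* process sensor readings */ });
    } else {
      socket.destroy(); // wrong key / not in the user list
    }
  });
});
server.listen(7000);

// Sensor: the connect callback fires exactly when the connection
// is established, so emit the keyword there.
const client = net.connect(7000, 'server.example.com', () => {
  client.write(keyword + '\n');
});
```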

Permanent TCP connection for administration use

I am facing the following situation:
I have several devices (embedded devices running Arch Linux) and I would like to have administration access to each device at any time. The problem is that the devices are behind a NAT, so establishing a connection from a server to a device is not possible. How could I achieve this?
I thought I could write a simple service running on the device that opens a connection to a server at startup. This TCP connection remains open and can be used by the server to administrate the device. But is it a good idea to keep TCP connections open for a long time? If I have a lot of devices, for example 1000, will I have a problem on the server side with 1000 open TCP connections?
Is there maybe another way?
Thanks a lot!
But is it a good idea to keep TCP connections open for a long time?
It's not necessarily a bad idea; although in practice the connections will fail from time to time (e.g. due to network reconfiguration, temporary network outages, etc.), so your clients should contain logic to reconnect automatically when this happens. Also note that TCP will usually not detect it when a completely idle TCP connection no longer has connectivity, so to avoid "zombie connections" that aren't actually connected, you may want to either enable SO_KEEPALIVE, or have your clients and/or server send the (very occasional) bit of dummy data on the socket, just to goose the TCP stack into checking whether connectivity still exists on the socket.
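In node.js, both suggestions (keepalive plus automatic reconnection) come down to a few lines; the hostname and the 60-second/5-second delays below are arbitrary choices:

```js
const net = require('net');

function connect() {
  const socket = net.connect(8080, 'admin.example.com', () => {
    // Ask the kernel to probe after 60s of idleness, so dead peers
    // are noticed instead of lingering as zombie connections.
    socket.setKeepAlive(true, 60000);
  });
  socket.on('error', () => { /* 'close' fires afterwards */ });
  // Reconnect automatically whenever the connection drops.
  socket.on('close', () => setTimeout(connect, 5000));
}

connect();
```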
If i have a lot of devices, for example 1000, will i have a problem on the server side with 1000 open TCP connections?
Scaling is definitely an issue you'll need to think about. For example, select() is typically implemented to only handle up to a fixed number of connections (often 1024), or if your server is using the thread-per-connection model, you'd find that a process with 1000+ threads is not very efficient. Check out the c10k problem article for lots of interesting details about various approaches and how well they scale up (or don't).
Is there maybe another way?
If you don't need immediate access to the clients, you could always have them check in periodically instead (e.g. once every 5 minutes); or you could have them occasionally send a UDP packet to the server instead of keeping a TCP connection all the time, just to let the server know their presence, and have the server indicate to them somehow (e.g. by updating a well-known web page that the clients check from time to time) when it wanted one of them to open a full TCP connection. Or maybe just use multiple servers to share the load...
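The UDP variant could be as small as a periodic heartbeat from each device; the identifier, port, and hostname here are hypothetical:

```js
const dgram = require('dgram');

const sock = dgram.createSocket('udp4');
const DEVICE_ID = 'device-0042'; // hypothetical identifier

// Announce presence every 5 minutes instead of holding a TCP
// connection open; the server records the packet's source address.
setInterval(() => {
  sock.send(Buffer.from(DEVICE_ID), 41234, 'admin.example.com');
}, 5 * 60 * 1000);
```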
The only limit I know of is imposed by state tracking in the iptables code. Check the value of net.ipv4.netfilter.ip_conntrack_max on both sides if you're using this to make sure you have enough headroom for other activities.
If you set the socket option SO_KEEPALIVE before the connect() call, the kernel will send TCP keepalives to make sure the far end is still there. This will mean that connections won't linger forever in the event of a reboot.

how to efficiently transfer file between 2 node.js instances?

I'm developing a chat application using app.js, which is a webkit+node.js framework.
So I have node.js plus a bridged web browser environment on both sides.
I want to make a file transfer feature somewhat similar to Skype's.
So, the initial idea is to:
1. Connect clients to the main server.
2. Each client gets the IP of the opposite one.
3. Start a socket or websocket server on both clients and connect to each other.
4. The sender reads the file and transmits it to the receiver.
Questions are:
1. I'm not really sure that one client can "see" the other.
2. A file is binary data, but websockets are made for text messages, so I need some kind of encoding/decoding. I thought about Base64, but it has about 33% overhead. So I need something more efficient (base 128?).
3. If it is not efficient to use websockets, should I use TCP sockets instead? What problems can appear if I decide to use them?
Yeah, I know about node2node and BinaryJS, I just don't know whether I should use them or not. And I really want to do something myself.
OK, with your communication looking like this:
(C->N)<->N<->(N->C)
(...) is installed on one client's machine. N's are node servers, C's are web clients.
This is out of your control. Some file-sharing apps send test packets from the central server to clients to check whether ports are open, NAT rules are configured correctly, etc. Your clients will start their own servers on some port; your master server can potentially create a test connection to these servers to see whether they're started correctly and open to the web, BEFORE telling other clients that they can send files.
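Such a test connection from the master server might look like this; host and port would come from whatever endpoint the client advertised:

```js
const net = require('net');

// Resolves true if the client's advertised endpoint accepts a
// TCP connection within the timeout, false otherwise.
function isReachable(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const probe = net.connect({ host, port });
    probe.setTimeout(timeoutMs);
    probe.on('connect', () => { probe.destroy(); resolve(true); });
    probe.on('timeout', () => { probe.destroy(); resolve(false); });
    probe.on('error', () => resolve(false));
  });
}
```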
Websockets are great for status messages from your servers to the web GUIs and for general client-to-client communication. For the actual file transfers I would use TCP sockets, see the next point. On the other hand, Base64 encoding is really not a slow process; play with it and benchmark its performance, then decide with some data to back up your decision.
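If you want data to back that up, a quick benchmark like the following shows both the encoding speed and the fixed 4/3 size expansion of Base64 (the 10 MB size is arbitrary; results vary by machine):

```js
const crypto = require('crypto');

const data = crypto.randomBytes(10 * 1024 * 1024); // 10 MB of binary data

console.time('base64 encode');
const encoded = data.toString('base64');
console.timeEnd('base64 encode');

// Base64 emits 4 output bytes per 3 input bytes: ~33% overhead.
console.log('overhead:',
  ((encoded.length / data.length - 1) * 100).toFixed(1) + '%');
```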
You could use a combination: websockets from your servers to the web GUIs, but TCP communication between the servers themselves. TCP servers (and streams) aren't hard to set up in Node, I see no disadvantages. It might actually be less complicated than installing node2node on those servers, since TCP is already built-in.
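A minimal sketch of that server-to-server transfer with plain TCP streams; no Base64 step is needed because TCP carries raw binary as-is (file names, port, and host are placeholders):

```js
const net = require('net');
const fs = require('fs');

// Receiving side: write whatever arrives straight to disk.
net.createServer((socket) => {
  socket.pipe(fs.createWriteStream('incoming.bin'));
}).listen(9999);

// Sending side: stream the file over the socket and close when done.
const socket = net.connect(9999, 'peer.example.com', () => {
  fs.createReadStream('outgoing.bin').pipe(socket);
});
```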

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts in a few seconds time frame.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that has to handle many short TCP connections left in a TIME_WAIT state.
As described here, TCP_TW_RECYCLE could cause some problems when using load balancers...
EDIT (to add some warnings ;) ):
as mentioned in a comment by @raittes, the "problems when using load balancers" part is about public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
Modifying kernel options must be seen as a last resort, and should generally be avoided unless you know what you are doing... and if that were the case, you would not be asking for help here. Hence, I would advise against doing that.
The most suitable piece of advice I can provide is pointing out the part describing what a network connection is: quadruplets (client address, client port, server address, server port).
If you can make the available ports pool bigger, you will be able to accept more concurrent connections:
Client address & client ports you cannot multiply (out of your control)
Server ports: you can only change these by tweaking a kernel parameter (less critical than changing TCP buckets or reuse), provided you know how many ports you need to leave available for other processes on your system
Server addresses: adding addresses to your host and balancing traffic on them (see the sketch after this list):
behind L4 systems already sized for your load, or directly
resolving your domain name to multiple IP addresses (and hoping the load will be shared across addresses through DNS, for instance)
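And since the original problem is running out of local ports on outbound connections, the same quadruplet logic applies on your side: if the host has several local addresses, spreading outbound connections across them with the net module's localAddress option multiplies the available ephemeral-port pool (the address list here is hypothetical):

```js
const net = require('net');

// Local addresses assigned to this host (hypothetical).
const localAddresses = ['203.0.113.10', '203.0.113.11'];
let next = 0;

// Round-robin the source address: each one has its own pool of
// ephemeral ports, and its own TIME_WAIT buckets.
function connectSpread(port, host) {
  const localAddress = localAddresses[next++ % localAddresses.length];
  return net.connect({ port, host, localAddress });
}
```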
According to the VMware document, the main difference is that TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps, however it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure. This allows a new client inbound connection from the source IP to the server. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the Time-Wait state has aged out in its entirety.
