Permanent TCP connection for administration use - Linux

I am facing the following situation:
I have several embedded devices running Arch Linux, and I would like to have administrative access to each device at any time. The problem is that the devices are behind NAT, so establishing a connection from a server to a device is not possible. How could I achieve this?
I thought I could write a simple service running on each device that opens a connection to a server at startup. This TCP connection remains open and can be used from the server to administer the device. But is it a good idea to keep TCP connections open for a long time? If I have a lot of devices, for example 1000, will I have a problem on the server side with 1000 open TCP connections?
Is there maybe another way?
Thanks a lot!

But is it a good idea to keep TCP connections open for a long time?
It's not necessarily a bad idea, although in practice the connections will fail from time to time (e.g. due to network reconfiguration, temporary network outages, etc), so your clients should contain logic to reconnect automatically when this happens. Also note that TCP will usually not detect when a completely idle TCP connection no longer has connectivity, so to avoid "zombie connections" that aren't actually connected, you may want to either enable SO_KEEPALIVE, or have your clients and/or server send the (very occasional) bit of dummy data on the socket just to goose the TCP stack into checking whether connectivity still exists on the socket.
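For instance, here is a minimal sketch of enabling keepalives on the device side. The timing values are illustrative, not from this answer, and TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux-specific options:

#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT */
#include <sys/socket.h>

int sock = socket(AF_INET, SOCK_STREAM, 0);

/* Ask the kernel to probe idle connections for us. */
int on = 1;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

/* Linux-specific tuning: start probing after 60 s of idleness, probe
   every 10 s, and declare the peer dead after 5 unanswered probes. */
int idle = 60, intvl = 10, cnt = 5;
setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));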
If i have a lot of devices, for example 1000, will i have a problem on the server side with 1000 open TCP connections?
Scaling is definitely an issue you'll need to think about. For example, select() is typically limited to a fixed number of file descriptors (FD_SETSIZE, often 1024), and if your server uses the thread-per-connection model, you'd find that a process with 1000+ threads is not very efficient. Check out the c10k problem article for lots of interesting details about various approaches and how well they scale up (or don't).
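On Linux, the usual way around both limits is an epoll-based event loop. A bare sketch, assuming listen_fd is an already-bound, listening socket (error handling omitted):

#include <sys/epoll.h>
#include <sys/socket.h>

int epfd = epoll_create1(0);

struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

struct epoll_event events[64];
for (;;) {
    int n = epoll_wait(epfd, events, 64, -1);
    for (int i = 0; i < n; i++) {
        if (events[i].data.fd == listen_fd) {
            /* A device has dialed in: accept and watch its socket too. */
            int dev = accept(listen_fd, NULL, NULL);
            struct epoll_event dev_ev = { .events = EPOLLIN, .data.fd = dev };
            epoll_ctl(epfd, EPOLL_CTL_ADD, dev, &dev_ev);
        } else {
            /* Administration traffic from an already-connected device. */
            /* handle_device(events[i].data.fd); */
        }
    }
}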
Is there maybe another way?
If you don't need immediate access to the clients, you could always have them check in periodically instead (e.g. once every 5 minutes); or you could have them occasionally send a UDP packet to the server instead of keeping a TCP connection all the time, just to let the server know their presence, and have the server indicate to them somehow (e.g. by updating a well-known web page that the clients check from time to time) when it wanted one of them to open a full TCP connection. Or maybe just use multiple servers to share the load...
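As a sketch of the UDP presence idea, where each device periodically tells the server it's alive (the server address, port, and device id below are placeholders):

#include <arpa/inet.h>    /* inet_pton() */
#include <netinet/in.h>   /* struct sockaddr_in, htons() */
#include <sys/socket.h>
#include <unistd.h>       /* sleep() */

int sock = socket(AF_INET, SOCK_DGRAM, 0);

struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(9999) };
inet_pton(AF_INET, "203.0.113.10", &srv.sin_addr);  /* placeholder server IP */

for (;;) {
    /* Tiny "I'm alive" datagram; a real one would carry a device id. */
    const char msg[] = "device-0001 alive";
    sendto(sock, msg, sizeof(msg), 0, (struct sockaddr *)&srv, sizeof(srv));
    sleep(300);  /* check in every 5 minutes, as suggested above */
}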

The only limit I know of is imposed by state tracking in the iptables code. Check the value of net.ipv4.netfilter.ip_conntrack_max on both sides if you're using this to make sure you have enough headroom for other activities.
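If you want to check that value programmatically rather than via sysctl, it is also exposed under /proc. A small sketch (on newer kernels the file lives at /proc/sys/net/netfilter/nf_conntrack_max instead):

#include <stdio.h>

long conntrack_max = 0;
FILE *f = fopen("/proc/sys/net/ipv4/netfilter/ip_conntrack_max", "r");
if (f) {
    fscanf(f, "%ld", &conntrack_max);
    fclose(f);
    printf("conntrack table limit: %ld entries\n", conntrack_max);
}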
If you set the socket option SO_KEEPALIVE before the connect() call, the kernel will send TCP keepalives to make sure the far end is still there. This will mean that connections won't linger forever in the event of a reboot.

Related

Linux refuses to open a listening port from localhost

I have a problem opening a listening port from localhost on a heavily loaded production system.
Sometimes requests to my port 44000 fail. When that happens, telnet to the port gets no response, and I would like to understand what happens underneath. Is the application listening on the port failing to respond to the request, or is it some problem on the kernel side, or with the number of open files?
I would be thankful if someone could explain the operations underneath opening a socket.
Let me clarify more. I have a Java process which accepts stateful connections from 12 different servers; the requests are stateful SOAP messages. This service had been running for a year without this problem. Recently we have been facing a problem where connections from the sources to my server on port 44000 are sometimes not possible. During that time, telnet to the service is not possible even from the local server, but all other ports respond fine. They are all running as the same user, and the number of allowed open files is much bigger than the total in use (lsof | wc -l).
As I understand it, there is a mechanism in the application that limits the number of connections from a source to 450 concurrent sessions, and the problem most likely occurs when I am at the maximum number of connections (but not all the time).
My application vendor doesn't accept that this problem is on their side and points to the OS / network / hardware configuration. To be honest, when I restarted the network service the problem was immediately solved for this particular port. Any ideas?
Here's a quick overview of the steps needed to set up a server-side TCP socket in Linux:
socket() creates a new socket and allocates system resources to it (*)
bind() associates a socket with an address
listen() causes a bound socket to enter a listening state
accept() accepts an incoming connection attempt, and creates a new socket for that connection. (*)
(It's explained quite clearly and in more detail on Wikipedia.)
(*): These operations allocate an entry in the file descriptor table and will fail if it's full. However, most applications fork, and there shouldn't be issues unless the number of concurrent connections you are handling is in the thousands (see the C10K problem).
If a call fails for this or any other reason, errno will be set to report the error condition (e.g., to EMFILE if the descriptor table is full). Most applications will report the error somewhere.
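Put together, those four calls look something like this; a bare-bones sketch with most error handling omitted (port 44000 taken from your question):

#include <netinet/in.h>   /* sockaddr_in, htons(), htonl() */
#include <stdio.h>
#include <sys/socket.h>

int srv = socket(AF_INET, SOCK_STREAM, 0);          /* allocate the socket */

struct sockaddr_in addr = {
    .sin_family      = AF_INET,
    .sin_addr.s_addr = htonl(INADDR_ANY),
    .sin_port        = htons(44000),                /* port from the question */
};
bind(srv, (struct sockaddr *)&addr, sizeof(addr));  /* associate with an address */
listen(srv, 128);                                   /* enter listening state */

for (;;) {
    int client = accept(srv, NULL, NULL);           /* one new fd per connection */
    if (client < 0)
        perror("accept");  /* e.g. EMFILE when the descriptor table is full */
    /* ... hand `client` off to a child/worker ... */
}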
Back to your application, there are multiple reasons that could explain why it isn't responding. Without providing more information about what kind of service you are trying to set up, we can only guess. Try testing if you can telnet consistently, and see if the server is overburdened.
Cheers!
Your description leaves room for interpretation, but as discussed above, maybe your problem is that your terminated application is trying to re-use the same socket port while it is still in the TIME_WAIT state.
You can set your socket options to reuse the same address (and port) this way:
#include <netinet/in.h>   /* AF_INET */
#include <sys/socket.h>   /* socket(), setsockopt(), SO_REUSEADDR */

int i = 1;
int srv_sock = socket(AF_INET, SOCK_STREAM, 0);
/* Set before bind(): allows re-binding a port lingering in TIME_WAIT. */
setsockopt(srv_sock, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i));
Basically, you are telling the OS that the same socket address & port combination can be re-used without waiting out the MSL (Maximum Segment Lifetime) timeout. This timeout can be several minutes.
This does not permit re-using a socket that is still in use; it only applies to the TIME_WAIT state. There is some minor possibility of receiving stray data from previous connections, though, but you can (and should anyway) program your application protocol to cope with unintelligible data.
More information for example here: http://www.unixguide.net/network/socketfaq/4.5.shtml
Starting the TCP server with sudo may solve it; otherwise, edit the firewall rules (if you are connecting over the LAN).
Try scanning the ports with nmap (e.g. a TCP SYN scan) or similar, to see whether the port is open to any protocol (network security may drop pings etc. so hosts don't show up). If the port isn't responsive, check the privileges the program runs with and check the firewall rules; the port may be on but unreachable.
Since you are talking about an enterprise network, I assume you are in a LAN environment, so you are testing against localhost but need it to work across the LAN.
Anyway, if you just need to open a localhost port, check privileges and routing; try a traceroute and see what happens.
Also check whether the port is used by a higher-privileged service or daemon.
I see now that this is a 2014 post, so this may be moot by now. Happy coding!

Dropping of connections with tcp_tw_recycle

Summary of the problem
We have a setup with a lot (800 to 2,400 per second) of incoming connections to a Linux box, and there is a NAT device between the clients and the server.
As a result, many TIME_WAIT sockets are left in the system.
To overcome that, we set tcp_tw_recycle to 1, but that led to incoming connections being dropped.
After browsing the net, we found references explaining why frames get dropped when tcp_tw_recycle is combined with a NAT device.
Resolution tried
We then tried setting tcp_tw_reuse to 1; it worked fine without any issues with the same setup and configuration.
But the documentation says that tcp_tw_recycle and tcp_tw_reuse should not be used when connections go through TCP-state-aware nodes such as firewalls, NAT devices, or load balancers, as frames may be dropped. The more connections there are, the more likely you are to see this issue.
Queries
1) Can tcp_tw_reuse be used in this type of scenario?
2) If not, which part of the Linux code prevents tcp_tw_reuse from being used in such a scenario?
3) Generally, what is the difference between tcp_tw_recycle and tcp_tw_reuse?
By default, when both tcp_tw_reuse and tcp_tw_recycle are disabled, the kernel will make sure that sockets in TIME_WAIT state will remain in that state long enough -- long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.
When you enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps (a.k.a. PAWS, for Protection Against Wrapped Sequence Numbers), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
When you enable tcp_tw_recycle, the kernel becomes much more aggressive, and will make assumptions on the timestamps used by remote hosts. It will track the last timestamp used by each remote host having a connection in the TIME_WAIT state, and allow a socket to be re-used if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e. warps back in time), the SYN packet will be silently dropped, and the connection won't establish (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.
Now, timestamps should never go back in time; unless:
the host is rebooted (but then, by the time it comes back up, TIME_WAIT socket will probably have expired, so it will be a non issue);
the IP address is quickly reused by something else (TIME_WAIT connections will stay a bit, but other connections will probably be struck by TCP RST and that will free up some space);
network address translation (or a smarty-pants firewall) is involved in the middle of the connection.
In the latter case, you can have multiple hosts behind the same IP address, and therefore different sequences of timestamps (or said timestamps are randomized at each connection by the firewall). In that case, some hosts will be randomly unable to connect, because they are mapped to a port for which the TIME_WAIT bucket of the server has a newer timestamp. That's why the docs tell you that NAT devices or load balancers may start dropping frames because of this setting.
Some people recommend leaving tcp_tw_recycle alone, but enabling tcp_tw_reuse and lowering tcp_fin_timeout. I concur :-)
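In sysctl.conf terms, that recommendation would look something like this (the tcp_fin_timeout value is illustrative; the Linux default is 60):

# /etc/sysctl.conf (illustrative values)
net.ipv4.tcp_tw_reuse = 1      # needs TCP timestamps enabled on both ends
net.ipv4.tcp_fin_timeout = 30  # lowered from the default of 60
# net.ipv4.tcp_tw_recycle left at 0 (removed entirely in Linux 4.12)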

TCP Servers: Drop Connection, instead of resetting or responding?

Is it possible in Node.JS to "drop" a connection in such a way that
The client never receives a response (200, 404 or otherwise)
The client is never notified that the connection is terminated (never receives connection reset or end of stream)
The server's resources are released (the server should not attempt to maintain the connection in any way)
I am specifically asking about Node.JS HTTP servers (which are really just complex TCP servers) on Solaris, but if there are cases on other OSes (Windows, Linux) or in other programming languages (C/C++, Java) that permit this, I am interested.
Why do I want this?
To annoy or slow down (possibly single-threaded) robots such as phpMyAdmin Probe.
I know this is not really something that matters, but these types of questions can better help me learn the boundaries of my programs.
I am aware that the client host is likely to re-transmit the packets of the connection since I am never sending reset.
These are not possible in a generic TCP stack (vs. your own custom TCP stack). The reasons are:
Closing a socket sends a RST
Even if you avoid sending a RST, the client continues to think the connection is open while the server has closed the connection. If the client sends any packet on this connection, the server is going to send a RST.
You may want to explore firewalling these robots and block / rate limit their IP addresses with something like iptables (linux) or the equivalent on solaris.
Closing a connection should NOT normally send an RST; there is a FIN-based four-way teardown process.

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts within a few seconds.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that have to handle many short TCP connections left in a TIME_WAIT state.
As described here, TCP_TW_RECYCLE could cause some problems when using load balancers...
EDIT (to add some warnings ;) ):
As mentioned in the comment by @raittes, the "problems when using load balancers" concern public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
Modifying kernel options must be seen as a last-resort option, and shall generally be avoided unless you know what you are doing... if that were the case you would not be asking for help over here. Hence, I would advise against doing that.
The most suitable piece of advice I can provide is pointing out the part describing what a network connection is: quadruplets (client address, client port, server address, server port).
If you can make the available ports pool bigger, you will be able to accept more concurrent connections:
Client addresses & client ports: you cannot multiply these (they are out of your control)
Server ports: you can only change these by tweaking a kernel parameter; this is less critical than changing TCP buckets or reuse, provided you know how many ports you need to leave available for other processes on your system
Server addresses: you can add addresses to your host and balance traffic across them:
either behind L4 systems already sized for your load, or directly
by resolving your domain name to multiple IP addresses (and hoping the load will be shared across addresses, through DNS for instance)
According to the VMWare document, the main difference is TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps; however, it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure, since it allows a new inbound client connection from the same source IP. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the TIME_WAIT state has aged out in its entirety.

What is the meaning of SO_REUSEADDR (setsockopt option) - Linux? [duplicate]

From the man page:
SO_REUSEADDR: Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value. This is a Boolean option.
When should I use it? What does "reuse of local addresses" give me?
TCP's primary design goal is to allow reliable data communication in the face of packet loss, packet reordering, and — key, here — packet duplication.
It's fairly obvious how a TCP/IP network stack deals with all this while the connection is up, but there's an edge case that happens just after the connection closes. What happens if a packet sent right at the end of the conversation is duplicated and delayed, such that the 4-way shutdown packets get to the receiver before the delayed packet? The stack dutifully closes down its connection. Then later, the delayed duplicate packet shows up. What should the stack do?
More importantly, what should it do if a program with open sockets on a given IP address + TCP port combo closes its sockets, and then a brief time later, a program comes along and wants to listen on that same IP address and TCP port number? (Typical case: A program is killed and is quickly restarted.)
There are a couple of choices:
Disallow reuse of that IP/port combo for at least 2 times the maximum time a packet could be in flight. In TCP, this is usually called the 2×MSL delay. You sometimes also see 2×RTT, which is roughly equivalent.
This is the default behavior of all common TCP/IP stacks. 2×MSL is typically between 30 and 120 seconds, and it shows up in netstat output as the TIME_WAIT period. After that time, the stack assumes that any rogue packets have been dropped en route due to expired TTLs, so that socket leaves the TIME_WAIT state, allowing that IP/port combo to be reused.
Allow the new program to re-bind to that IP/port combo. In stacks with BSD sockets interfaces — essentially all Unixes and Unix-like systems, plus Windows via Winsock — you have to ask for this behavior by setting the SO_REUSEADDR option via setsockopt() before you call bind().
SO_REUSEADDR is most commonly set in network server programs, since a common usage pattern is to make a configuration change, then be required to restart that program to make the change take effect. Without SO_REUSEADDR, the bind() call in the restarted program's new instance will fail if there were connections open to the previous instance when you killed it. Those connections will hold the TCP port in the TIME_WAIT state for 30-120 seconds, so you fall into case 1 above.
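Sketched out, that usage pattern looks like this (the port is a placeholder):

#include <netinet/in.h>
#include <sys/socket.h>

int fd = socket(AF_INET, SOCK_STREAM, 0);

/* Must be set between socket() and bind(): lets the restarted server
   bind even while the old instance's connections sit in TIME_WAIT. */
int yes = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

struct sockaddr_in addr = {
    .sin_family      = AF_INET,
    .sin_addr.s_addr = htonl(INADDR_ANY),
    .sin_port        = htons(8080),   /* placeholder port */
};
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
listen(fd, 128);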
The risk in setting SO_REUSEADDR is that it creates an ambiguity: the metadata in a TCP packet's headers isn't sufficiently unique that the stack can reliably tell whether the packet is stale and so should be dropped rather than be delivered to the new listener's socket because it was clearly intended for a now-dead listener.
If you don't see why that is true, here's all the listening machine's TCP/IP stack has to work with per-connection to make that decision:
Local IP: Not unique per-conn. In fact, our problem definition here says we're reusing the local IP, on purpose.
Local TCP port: Ditto.
Remote IP: The machine causing the ambiguity could re-connect, so that doesn't help disambiguate the packet's proper destination.
Remote port: In well-behaved network stacks, the remote port of an outgoing connection isn't reused quickly, but it's only 16 bits, so you've got 30-120 seconds to force the stack to get through a few tens of thousands of choices and reuse the port. Computers could do work that fast back in the 1960s.
If your answer to that is that the remote stack should do something like TIME_WAIT on its side to disallow ephemeral TCP port reuse, that solution assumes that the remote host is benign. A malicious actor is free to reuse that remote port.
I suppose the listener's stack could choose to strictly disallow connections from the TCP 4-tuple only, so that during the TIME_WAIT state a given remote host is prevented from reconnecting with the same remote ephemeral port, but I'm not aware of any TCP stack with that particular refinement.
Local and remote TCP sequence numbers: These are also not sufficiently unique that a new remote program couldn't come up with the same values.
If we were re-designing TCP today, I think we'd integrate TLS or something like it as a non-optional feature, one effect of which is to make this sort of inadvertent and malicious connection hijacking impossible. That requires adding large fields (128 bits and up) which wasn't at all practical back in 1981, when the document for the current version of TCP (RFC 793) was published.
Without such hardening, the ambiguity created by allowing re-binding during TIME_WAIT means you can either a) have stale data intended for the old listener be misdelivered to a socket belonging to the new listener, thereby either breaking the listener's protocol or incorrectly injecting stale data into the connection; or b) new data for the new listener's socket mistakenly assigned to the old listener's socket and thus inadvertently dropped.
The safe thing to do is wait out the TIME_WAIT period.
Ultimately, it comes down to a choice of costs: wait out the TIME_WAIT period or take on the risk of unwanted data loss or inadvertent data injection.
Many server programs take this risk, deciding that it's better to get the server back up immediately so as to not miss any more incoming connections than necessary.
This is not a universal choice. Many programs — even server programs requiring a restart to apply a settings change — choose instead to leave SO_REUSEADDR alone. The programmer may know these risks and is choosing to leave the default alone, or they may be ignorant of the issues but are getting the benefit of a wise default.
Some network programs offer the user a choice among the configuration options, fobbing the responsibility off on the end user or sysadmin.
SO_REUSEADDR allows your server to bind to an address which is in a TIME_WAIT state.
This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port.
From unixguide.net
When you create a socket, you don't really own it. The OS (TCP stack) creates it for you and gives you a handle (file descriptor) to access it. When your socket is closed, it takes time for the OS to "fully close it" while it goes through several states. As EJP mentioned in the comments, the longest delay is usually from the TIME_WAIT state. This extra delay is required to handle edge cases at the very end of the termination sequence and to make sure the last termination acknowledgement either got through or the other side reset itself because of a timeout. Here you can find some extra considerations about this state. The main considerations are as follows:
Remember that TCP guarantees all data transmitted will be delivered, if at all possible. When you close a socket, the server goes into a TIME_WAIT state, just to be really really sure that all the data has gone through. When a socket is closed, both sides agree by sending messages to each other that they will send no more data. This, it seemed to me, was good enough, and after the handshaking is done, the socket should be closed. The problem is two-fold. First, there is no way to be sure that the last ack was communicated successfully. Second, there may be "wandering duplicates" left on the net that must be dealt with if they are delivered.
If you try to create multiple sockets with the same ip:port pair in quick succession, you get the "address already in use" error because the earlier socket will not have been fully released. Using SO_REUSEADDR will get rid of this error, as it overrides the check against any previous instance.
