Trouble on state FIN_WAIT_1 - linux

recently i've got some port holding on FIN_WAIT_1 state till two days later. The target port is used by one server process ever and client connect to the server process through this port.
The situation is we stopped the server process, and obviously some client is still connecting with the server at that moment. From my understanding, the server process sends a FIN package to client and waiting for the ACK package back. Unfortunately, that ACK package seems not like coming to server side till two days later.
my question is there any config like timeout to FIN_WAIT_1 status. i walked through the internet searching but found nothing there. Please help tell me if you have any experience with this.
BTW, the server process has already gone while the FIN_WAIT_1 happening to the port.
Thanks in advance

The FIN_WAIT_1 state is waiting for the peer to ACK the FIN that this end has just sent. That outgoing FIN is subject to all the normal TCP retry and timeout processing, so if the other end has completely disappeared and never responds, TCP should time out the connection and reset it as a matter of course. That's why you couldn't find a specific FIN_WAIT_1 timeout: there isn't one, just the normal TCP write timers.
All that should have happened within ten minutes or so.
If the state persists and it causes other problems I don't think you don't have much option but to reboot.

Are you sure it's the same ports stuck in FIN_WAIT? It could be due to a load balancer or NAT device that is dropping the connections after an inactivity timeout and silently discarding any further packets, which is the default behavior on some devices.

Related

Can TCP RST packet reduce the connection timeout?

As a way to learn how raw sockets work, I programmed a dummy firewall which drops the packets based on the TCP destination port. It is working but the problem is that the client retries for quite some time until the time out is finally reached.
I was wondering if perhaps the client retries for so long because it does not receive any answer. In that case, would it help if the firewall replies with a TCP RST to the TCP SYNC messages from the client? If not, is there any way to force the client to stop retrying (not reducing the timeout time in the Linux but more, getting a specific answer to its packets which will make the client stop)?
You can think of your firewall as the same case as if the port were closed on the host OS. What would the host OS's TCP/IP stack do?
RFC 793 (original TCP RFC) has the following to say about this case:
If the connection does not exist (CLOSED) then a reset is sent
in response to any incoming segment except another reset. In
particular, SYNs addressed to a non-existent connection are rejected
by this means.
You should read the TCP RFCs and make sure your TCP RST packet conforms to the requirements for this case. A malformed RST will be ignored by the client.
RFC 1122 also indicates that ICMP Destination Unreachable containing codes 2-4 should cause an abort in the connection. It's important to note the codes because 0, 1, and 5 are listed as a MUST NOT for aborting the connection
Destination Unreachable -- codes 2-4
These are hard error conditions, so TCP SHOULD abort
the connection.
Your firewall is behaving correctly. It is a cardinal principle of information scurity not to disclose any information to the attacker. Sending an RST would disclose that the host exists.
There were firewalls that did that 15-20 years ago but there were frowned on. Nowadays they behave like yours: they just drop the packet and do nothing in reply.
It is normal for the client to retry a few times before giving up if there is no response, but contrary to what you have been told in comments, a client will give up immediately with 'connection refused' if it receives an RST. It only retries if there is no response at all.

Permanent TCP connection for administration use

I am facing the following situation:
I have several devices (embedded devices running ARCH Linux) and i would like to have administration access to each device at any time. The problem is the devices are behind a NAT, so establishing a connection from a server to a device is not possible. How could i achieve this?
I thought i could write a simple service running on the device that opens a connection to a server at startup. This TCP connection remains open and can be used from the server to administrate the device. But is it a good idea to keep TCP connections open for a long time? If i have a lot of devices, for example 1000, will i have a problem on the server side with 1000 open TCP connections?
Is there maybe another way?
Thanks a lot!
But is it a good idea to keep TCP connections open for a long time?
It's not necessarily a bad idea; although in practice the connections will fail from time to time (e.g. due to network reconfiguration, temporary network outages, etc), so your clients should contain logic to reconnect automatically when this happens. Also note that TCP will not usually not detect it when a completely-idle TCP connection no longer has connectivity, so to avoid "zombie connections" that aren't actually connected, you may want to either enable SO_KEEPALIVE, or have your clients and/or server send the (very occasional) bit of dummy data on the socket just to goose the TCP stack into checking whether connectivity still exists on the socket.
If i have a lot of devices, for example 1000, will i have a problem on the server side with 1000 open TCP connections?
Scaling is definitely an issue you'll need to think about. For example, select() is typically implemented to only handle up to a fixed number of connections (often 1024), or if your server is using the thread-per-connection model, you'd find that a process with 1000+ threads is not very efficient. Check out the c10k problem article for lots of interesting details about various approaches and how well they scale up (or don't).
Is there maybe another way?
If you don't need immediate access to the clients, you could always have them check in periodically instead (e.g. once every 5 minutes); or you could have them occasionally send a UDP packet to the server instead of keeping a TCP connection all the time, just to let the server know their presence, and have the server indicate to them somehow (e.g. by updating a well-known web page that the clients check from time to time) when it wanted one of them to open a full TCP connection. Or maybe just use multiple servers to share the load...
The only limit I know of is imposed by state tracking in the iptables code. Check the value of net.ipv4.netfilter.ip_conntrack_max on both sides if you're using this to make sure you have enough headroom for other activities.
If you set the socket option SO_KEEPALIVE before the connect() call, the kernel will send TCP keepalives to make sure the far end is still there. This will mean that connections won't linger forever in the event of a reboot.

linux refuse to open listening port from localhost

I have problem to open a listening port from localhost in a heavy loaded production system.
Sometimes some request to my port 44000 failed. During that time , I checked the telnet to the port with no response, I'm wonder to know the underneath operations takes there. Is the application that is listening to the port is failing to response to the request or it is some problem in kernel side or number of open files.
I would be thankful if someone could explain the underneath operation to opening a socket.
Let me clarify more. I have a java process which accept state full connection from 12 different server.requests are statefull SOAP message . this service is running for one year without this problem. Recently we are facing a problem that sometimes connection from source is not possible to my server in port 44000. As I checked During that time telnet to the service is not possible even from local server. But all other ports are responding good. they all are running with same user and number of allowed open files are much more bigger than this all (lsof | wc -l )
As I understood there is a mechanism in application that limits the number of connection from source to 450 concurrent session, And the problem will likely takes when I'm facing with maximum number of connection (but not all the time)
My application vendor doesn't accept that this problem is from his side and points to os / network / hardware configuration. To be honest I restarted the network service and the problem solved immediately for this special port. Any idea please???
Here's a quick overview of the steps needed to set up a server-side TCP socket in Linux:
socket() creates a new socket and allocates system resources to it (*)
bind() associates a socket with an address
listen() causes a bound socket to enter a listening state
accept() accepts a received incoming attempt, and creates a new socket for this connection. (*)
(It's explained quite clearly and in more detail on wikipedia).
(*): These operations allocate an entry in the file descriptor table and will fail if it's full. However, most applications fork and there shouldn't be issues unless the number of concurrent connections you are handling is in the thousands (see, the C10K problem).
If a call fails for this or any other reason, errno will be set to report the error condition (e.g., to EMFILE if the descriptor table is full). Most applications will report the error somewhere.
Back to your application, there are multiple reasons that could explain why it isn't responding. Without providing more information about what kind of service you are trying to set up, we can only guess. Try testing if you can telnet consistently, and see if the server is overburdened.
Cheers!
Your description leaves room for interpretation, but as we talked above, maybe your problem is that your terminated application is trying to re-use the same socket port, but it is still in TIME_WAIT state.
You can set your socket options to reuse the same address (and port) by this way:
int srv_sock;
int i = 1;
srv_sock = socket(AF_INET, SOCK_STREAM, 0);
setsockopt(srv_sock, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i));
Basically, you are telling the OS that the same socket address & port combination can be re-used, without waiting the MSL (Maximum Segment Life) timeout. This timeout can be several minutes.
This does not permit to re-use the socket when it is still in use, it only applies to the TIME_WAIT state. Apparently there is some minor possibility of data coming from previous transactions, though. But, you can (and should anyway) program your application protocol to take care of unintelligible data.
More information for example here: http://www.unixguide.net/network/socketfaq/4.5.shtml
Start TCP server with sudo will solve or, in case, edit firewalls rules (if you are connecting in LAN).
Try to scan ports with nmap (like with TCP Sync Handshake), or similar, to see if the port is opened to any protocol (maybe network security trunkates pings ecc.. to don't show hosts up). If the port isn't responsive, check privileges used by the program, check firewalls rules maybe the port is on but you can't get to it.
Mh I mean.. you are talking about enterprise network so I'm supposing you are on a LAN environment so you are just trying to localhost but you need it to work on LAN.
Anyway if you just need to open localhost port check privileges and routing, try to "tracert" and see what happens and so on...
Oh and check if port is used by a higher privilege service or deamon.
Anyway I see now that this is a 2014 post, np gg nice coding byebye

TCP Servers: Drop Connection, instead of resetting or responding?

Is it possible in Node.JS to "drop" a connection in such a way that
The client never receives a response (200, 404 or otherwise)
The client is never notified that the connection is terminated (never receives connection reset or end of stream)
The server's resources are released (the server should not attempt to maintain the connection in any way)
I am specifically asking about Node.JS HTTP Servers (which are really just complex TCP servers) on Solaris., but if there are cases on other OSes (Windows, Linux) or programming languages (C/C++, Java) that permit this, I am interested.
Why do I want this?
To annoy or slow down (possibly single-threaded) robots such as phpMyAdmin Probe.
I know this is not really something that matters, but these types of questions can better help me learn the boundaries of my programs.
I am aware that the client host is likely to re-transmit the packets of the connection since I am never sending reset.
These are not possible in a generic TCP stack (vs. your own custom TCP stack). The reasons are:
Closing a socket sends a RST
Even if you avoid sending a RST, the client continues to think the connection is open while the server has closed the connection. If the client sends any packet on this connection, the server is going to send a RST.
You may want to explore firewalling these robots and block / rate limit their IP addresses with something like iptables (linux) or the equivalent on solaris.
closing a connection should NOT send an RST. There is a 3 way tear down process.

Tcp connections hang on CLOSE_WAIT status

Client close the socket first, when there is not much data from server, tcp connection shutdown is okay like:
FIN -->
<-- ACK
<-- FIN, ACK
ACK -->
When the server is busying sending data:
FIN -->
<-- ACK,PSH
RST -->
And the server connection comes to CLOSE_WAIT state and hang on there for a long time.
What's the problem here? client related or server related? This happens on Redhat5 for local sockets.
This article talk about why "RST" is sent, but I do not know why the server connection stuck on CLOSE_WAIT, and do not send a FIN out.
[EDIT]I ignored the most important information, this happens on qemu's slirp network emulation. It seems to be a problem of slirp bug for dealing with close connection.
This means that there is unread data left in in the stream, that the client hasn't finished reading.
You can force it off by using the SO_LINGER option. Here's relevant documentation for Linux (also see the option itself, here), and [here's the matching function2] for Win32.
It's the server side that is remaining open, so it's on the server side you can try disabling SO_LINGER.
It may mean that the server hasn't closed the socket. You can easily tell this by using "lsof" to list the file descriptors open by that process which will include TCP sockets. The fix is to have the process always close the socket when it's finished (even in error cases etc)
This a known defect for qemu.

Resources