Linux - Too many closed connections

I'm writing an application that opens about 1800 connections/minute from a single Linux machine using Netty (async NIO). A connection lives for a few seconds and is then closed, or it times out after 20 seconds if no answer is received. In addition, the read/write timeout is 30 seconds and the request header contains connection=close.
After a while (2-3 hours) I get a lot of exceptions in the logs because Netty is unable to create new connections due to a lack of resources.
I increased the max number of open files in limits.conf as:
root hard nofile 200000
root soft nofile 200000
Here is the output of netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n:
1 established)
1 FIN_WAIT2
1 Foreign
2 TIME_WAIT
6 LISTEN
739 SYN_SENT
6439 LAST_ACK
6705 CLOSE_WAIT
12484 ESTABLISHED
This is the output of the ss -s command:
Total: 194738 (kernel 194975)
TCP: 201128 (estab 13052, closed 174321, orphaned 6477, synrecv 0, timewait 3/0), ports 0
Transport Total IP IPv6
* 194975 - -
RAW 0 0 0
UDP 17 12 5
TCP 26807 8 26799
INET 26824 20 26804
FRAG 0 0 0
Also ls -l /proc/2448/fd | wc -l gives about 199K.
That said, the questions are about the closed connections reported in the ss -s command output:
1) What are they exactly?
2) Why do they keep dangling without being destroyed?
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?

1) What are they exactly?
They are sockets that were either never connected or were disconnected and weren't closed.
In Linux, an outgoing TCP socket goes through the following stages (roughly):
1. You create the socket (unconnected), and the kernel allocates a file descriptor for it.
2. You connect() it to the remote side, establishing a network connection.
3. You do data transfer (read/write).
4. When you are done with reading and writing, you shutdown() the socket for both reading and writing, closing the network connection.
5. You close() the socket, and the kernel frees the file descriptor.
So those 174K connections that ss reports as closed are sockets that either never got past stage 1 (maybe connect() failed or was never called at all) or went through stage 4 but not stage 5. Effectively, they are sockets with underlying open file descriptors but without any network binding (so the netstat / ss listings don't show them).
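To confirm whether close() is actually being issued for those sockets, strace can trace just the relevant syscalls on the running process (a hedged sketch; replace <pid> with your process ID, 2448 in the question above):
strace -f -e trace=socket,connect,shutdown,close -p <pid>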
2) Why do they keep dangling without being destroyed?
Because nobody called close() on them. I would call it a "file descriptor leak" or a "socket descriptor leak".
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
From the Linux point of view, no. You have to explicitly call close() on them (or terminate the process that owns them so the kernel knows they aren't used anymore).
From the Netty/Java point of view, maybe, I don't know.
Essentially, it's a bug in your code, or in Netty code (less likely), or in JRE code (much less likely). You are not releasing the resources when you should. If you show the code, maybe somebody can spot the error.
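In the meantime, a quick way to confirm the leak is to watch the process's descriptor count grow over time (a minimal sketch; 2448 is the PID used in the question):
watch -n 10 'ls /proc/2448/fd | wc -l'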

As Roman correctly pointed out, closed connections do exist and are sockets which have never been closed properly.
In my case I had some clues about what was going wrong, which I report below:
1) ss -s showed strange values, in particular a lot of closed connections.
2) ls -l /proc/pid/fd | wc -l showed a lot of open descriptors.
3) The numbers in netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n did not match the previous ones.
4) sudo lsof -n -p pid (Roman's suggestion) showed a lot of entries with "can't identify protocol".
Looking around on the web I found an interesting post (https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/) which explains what point 4 might really mean and why the netstat numbers do not match (see also https://serverfault.com/questions/153983/sockets-found-by-lsof-but-not-by-netstat).
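A quick way to count just those leaked entries (a sketch, assuming the same lsof invocation; pid is a placeholder):
sudo lsof -n -p pid | grep -c "can't identify protocol"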
I was quite surprised, since I was using Netty 4.1.x (with Spring) with a common pattern in which every connection was supposed to be properly closed, so it took me a few days to understand what was really wrong.
The subtle problem was in the Netty I/O thread, where the message body was copied and put into a blocking queue (as part of my code). When the queue was full, this slowed things down, introducing latency and causing connection timeouts that my end did not detect and, consequently, the FD leak.
My solution was to introduce a sort of pooled queue that prevents new Netty requests when the queue is full.

You still haven't provided the exact error message I asked for, but as far as I can see the question should be about the six and a half thousand connections in CLOSE_WAIT state, not 'those closed connections'.
You're not closing sockets that the peer has disconnected.
That said, the questions are about those closed connections.
What closed connections? Your netstat display doesn't show any closed connections. And there is no evidence that your resource exhaustion problem has anything to do with closed connections.
1) What are they exactly?
They aren't.
2) Why do they keep dangling without being destroyed?
They don't.
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
As they don't exist, the question is meaningless.

Related

Dropping packets with netcat using a UDP transfer?

I'm working on sending large data files between two Linux computers via a 10 Gigabit Ethernet cable and netcat with a UDP transfer, but seem to be having issues.
After running several tests, I've come to the conclusion that netcat is the issue. I've tested the UDP transfer using UDT, Tsunami-UDP, and a Python UDT transfer as well, none of which had any packet loss issues.
On the server side, we've been doing:
cat "bigfile.txt" | pv | nc -u IP PORT
then on the client side, we've been doing:
nc -u -l PORT > "outputFile.txt"
A few things that we've noticed:
On one of the computers, regardless of whether it's the client or server, it just "hangs". That is to say, even once the transfer is complete, Linux doesn't kill the process and move to the next line in the terminal.
If we run pipe view on the receiving side as well, the incoming data rate is significantly lower than what the sending side thinks it's sending.
Running Wireshark doesn't show any packet loss.
Running the system performance monitor in Linux shows that the incoming data rate (for the receiving side) is the same as the outgoing data rate from the sending side. This is in contrast to what pipe view thinks (see #2)
We're not sure where the issue is with netcat, and if there is a way around it. Any help/insights would be greatly appreciated.
Also, for what it's worth, using netcat with a TCP transfer works fine. And, I do understand that UDP isn't known for reliability, and that packet loss should be expected, but it's the protocol we must use.
Thanks
It could well be that the sending instance is sending the data too fast for the receiving instance. Note that this can occur even if you see no drops on the receiving NIC (as you seem to be saying), because the loss can occur at OS level instead. Your OS could have its UDP buffers overflowing. Run this command:
watch -d "cat /proc/net/snmp | grep -w Udp"
To see if your RcvbufErrors field is non-zero and/or growing while your file transfer is going on.
This answer (How to send only one UDP packet with netcat?) says that nc sends one packet per line. Assuming that's true, this could lead to a significantly higher number of packets than your other transfer mechanisms. Presumably, as @Smeeheey suggested, you're running out of receive buffers on the receiving end.
To cause your sending end to exit, you can add -q 1 to the command line (exit 1 second after seeing end of file).
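For example, the sending side shown earlier would become (a sketch, assuming a netcat build that supports -q, with the same IP/PORT placeholders):
cat "bigfile.txt" | pv | nc -q 1 -u IP PORT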
But there's no way that the receiving end nc can know when the transfer is complete. This is why these other mechanisms are "protocols" -- they have mechanisms built into them to communicate the bounds of a file. Raw UDP has no concept of end of file.
Tuning the Linux networking stack is a bit complicated, as there are many components to tune to figure out where data is being dropped.
If possible/feasible, I'd recommend that you start by monitoring packet drops throughout the entire network stack. Once you've done that, you can determine where exactly packets are being dropped and then adjust tuning parameters as needed. There are a lot of different files to measure with lots of different fields. I wrote a detailed blog post about monitoring and tuning each component of the Linux networking stack from top to bottom. It's a bit difficult to summarize all the information there, but take a look, I think it can help guide you.
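As a starting point, these are common places to look for drop counters while a transfer is running (a sketch; eth0 is a placeholder for your interface, and exact counter names vary between drivers):
netstat -s -u                      # protocol-level UDP counters (receive buffer errors, etc.)
cat /proc/net/snmp | grep -w Udp   # the same counters in raw form
ethtool -S eth0 | grep -i drop     # NIC/driver-level drop counters
cat /proc/net/softnet_stat         # per-CPU backlog drops (second column)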

Get network connections from /proc/net/sockstat

I found this information in /proc, which displays socket statistics:
$ cat /proc/net/sockstat
sockets: used 8278
TCP: inuse 1090 orphan 2 tw 18 alloc 1380 mem 851
UDP: inuse 6574
RAW: inuse 1
FRAG: inuse 0 memory 0
Can you help me find out what these values mean? Also, are these values reliable enough, or do I need to look for this information somewhere else?
Is there another way to find information about the TCP/UDP connections in Linux?
Can you help me find out what these values mean?
As per the code here, the values are the number of sockets in use (TCP / UDP) and the number of orphan TCP sockets (sockets that applications have no more handles to; they have already called close()). For TCP tw I am not sure, but based on the structure name (tcp_death_row), those are the sockets to be definitively destroyed in the near future. sockets represents the number of allocated sockets (as I understand it, this covers TCP sockets in different states), and mem is the number of pages allocated by TCP sockets (memory usage).
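If you want to watch those counters change over time, something like this works (a sketch):
watch -n 2 cat /proc/net/sockstat
awk '/^TCP:/ {print "orphan sockets:", $5}' /proc/net/sockstat   # pull out a single field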
This article has some discussions around this topic.
In my understanding, /proc/net/sockstat is the most reliable place to look for that information. I often use it myself, and when a single server had to manage 1MM simultaneous connections it was the only place where I could reliably count that information.
You can use the netstat command, which itself reads the /proc filesystem but prints the information in a form more readable for humans.
If you want to display the current tcp connections for example, you can issue the following command:
netstat -t
Check man netstat for the numerous options.
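A few common variants (a sketch; ss comes from the iproute2 package and reads the same kernel sources):
netstat -tn                  # current TCP connections, numeric addresses
ss -tn state established     # the same via ss, usually faster on busy machines
ss -s                        # summary counters, similar to /proc/net/sockstat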

How do I have my server automatically shut down if a UDP port has not been active for a certain amount of time?

I suppose this may be an odd question, but I have a small EC2-instance that costs quite a large sum of money every month. It's charged hourly though, so I only turn on this particular instance when I need it, and power it off when I'm done.
The purpose of this instance is for hosting a Counter-Strike: Global Offensive dedicated server which I only power on when I have a scrim to play.
Instead of forgetting to turn it off and being charged a lot, or having an unintelligent start-up script that asks the instance to power-off after 3 hours, I was thinking of a more intelligent design.
Here's my idea: the instance intelligently powers itself off when it senses it is no longer in use, by determining whether a certain amount of network activity on UDP 27015 has not been recorded over the last 10 minutes, trying 3 times before powering off.
That way I can power-on, play the match, and not worry about powering off the server :-)
It sounds cool in my head. The question is how I go about solving the task. I imagine a bash-script executed every 10 minutes with the help of cron.
If I'm not being entirely crazy here, could a bash-script suggestion possibly be offered? Or maybe a better solution to this quest I'm on to save $$ by having the server power itself off when it senses it is no longer in use!
I'm not too familiar with EC2 instances, but if they are running some form of linux... Under Fedora I can use ifconfig to see how much data has been received/transmitted across the network interface. It's not just the single port but all ports on that interface... Would that number suffice for you? Ought to be pretty trivial to monitor it every few minutes and see when the load drops off...
Possibly a simple script to start with that is started when the EC2 instance is brought up and just logs the data. An hour after your game you can grab the log, manually shut down, and review it at your leisure to see if this will work. (It's amazing how many things use the network sometimes...)
Afterthought: Perhaps tcpdump would be better? Will it work with UDP port 27015? You might need some way to time it out, like running it as a background process, possibly with the -c option, sleeping for a while, and then killing the tcpdump process if it's still running. You may need to pipe through wc -l or just grep the final packets grabbed line. Caveat: tcpdump may need to be run as root.
E.g. /usr/sbin/tcpdump -n -nn -q -c 100 -i eth0 port 27015
Further afterthought:
#!/bin/bash --norc
# Capture traffic on port 27015 for 30 seconds; tcpdump's own messages
# (including its "N packets captured" summary) go to stderr, which we keep in ./logfile.
/usr/sbin/tcpdump -n -nn -q -i eth0 port 27015 2>./logfile 1>/dev/null &
TCPDUMP_PID=$!
echo "sleeping... pid=$TCPDUMP_PID"
sleep 30
echo "wake up"
# Stopping tcpdump makes it print its capture statistics (to stderr, i.e. the logfile).
kill $TCPDUMP_PID
sleep 2
cat ./logfile
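Putting those pieces together, the cron-driven check asked about might look roughly like this (a sketch only, not a tested solution; the interface eth0, UDP port 27015, the /var/tmp/idle_count state file and the 60-second capture window are all assumptions to adjust, it would be run from root's cron, e.g. */10 * * * * /usr/local/bin/idle-check.sh, and on EC2 the effect of shutdown depends on the instance's configured shutdown behaviour):
#!/bin/bash
# Idle check: if no UDP 27015 traffic is seen in three consecutive runs, power off.
IFACE=eth0
PORT=27015
STATE=/var/tmp/idle_count

# Listen for up to 60 seconds, stopping early after 10 packets; count the packet lines.
PKTS=$(timeout 60 /usr/sbin/tcpdump -n -q -c 10 -i "$IFACE" "udp port $PORT" 2>/dev/null | wc -l)

if [ "$PKTS" -gt 0 ]; then
    echo 0 > "$STATE"                       # traffic seen: reset the idle counter
else
    COUNT=$(( $(cat "$STATE" 2>/dev/null || echo 0) + 1 ))
    echo "$COUNT" > "$STATE"
    if [ "$COUNT" -ge 3 ]; then
        /sbin/shutdown -h now               # third idle check in a row: power off
    fi
fi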

Node.js struggling with lots of concurrent connections

I'm working on a somewhat unusual application where 10k clients are precisely timed to all try to submit data at once, every 3 mins or so. This 'ab' command fairly accurately simulates one barrage in the real world:
ab -c 10000 -n 10000 -r "http://example.com/submit?data=foo"
I'm using Node.js on Ubuntu 12.4 on a rackspacecloud VPS instance to collect these submissions, however, I'm seeing some very odd behavior from Node, even when I remove all my business logic and turn the http request into a no-op.
When the test gets about 90% done, it hangs for a long period of time. Strangely, this happens consistently at 90% - for c=n=10k, at 9000; for c=n=5k, at 4500; for c=n=2k, at 1800. The test actually completes eventually, often with no errors. But both ab and node logs show continuous processing up till around 80-90% of the test run, then a long pause before completing.
When node is processing requests normally, CPU usage is typically around 50-70%. During the hang period, CPU goes up to 100%. Sometimes it stays near 0. Between the erratic CPU response and the fact that it seems unrelated to the actual number of connections (only the % complete), I do not suspect the garbage collector.
I've tried this running 'ab' on localhost and on a remote server - same effect.
I suspect something related to the TCP stack, possibly involving closing connections, but none of my configuration changes have helped. My changes:
ulimit -n 999999
When I listen(), I set the backlog to 10000
Sysctl changes are:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_orphans = 20000
net.ipv4.tcp_max_syn_backlog = 10000
net.core.somaxconn = 10000
net.core.netdev_max_backlog = 10000
I have also noticed that I tend to get this msg in the kernel logs:
TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters.
I'm puzzled by this msg since the TCP backlog queue should be deep enough to never overflow. If I disable SYN cookies, the "Sending cookies" message changes to "Dropping connections".
I speculate that this is some sort of linux TCP stack tuning problem and I've read just about everything I could find on the net. Nothing I have tried seems to matter. Any advice?
Update: Tried with tcp_max_syn_backlog, somaxconn, netdev_max_backlog, and the listen() backlog param set to 50k with no change in behavior. Still produces the SYN flood warning, too.
Are you running ab on the same machine running node? If not, do you have a 1G or 10G NIC? If you are, then aren't you really trying to process 20,000 open connections?
Also, if you are changing net.core.somaxconn to 10,000, do you have absolutely no other sockets open on that machine? If you do, then 10,000 is not high enough.
Have you tried using the Node.js cluster module to spread the open connections across multiple processes?
I think you might find this blog post, and also the previous ones, useful:
http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/
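Regardless of where the bottleneck turns out to be, one way to confirm whether the listen queue itself is overflowing, rather than inferring it from the SYN-flood message (a sketch; port 80 matches the kernel log line in the question):
ss -lnt 'sport = :80'          # for LISTEN sockets, Send-Q is the configured backlog, Recv-Q the current queue depth
netstat -s | grep -i listen    # e.g. "times the listen queue of a socket overflowed"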

Increasing the maximum number of TCP/IP connections in Linux

I am programming a server and it seems like my number of connections is being limited since my bandwidth isn't being saturated even when I've set the number of connections to "unlimited".
How can I increase or eliminate a maximum number of connections that my Ubuntu Linux box can open at a time? Does the OS limit this, or is it the router or the ISP? Or is it something else?
The maximum number of connections is impacted by certain limits on both the client and server sides, albeit a little differently.
On the client side:
Increase the ephemeral port range, and decrease the tcp_fin_timeout.
To find out the default values:
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_fin_timeout
The ephemeral port range defines the maximum number of outbound sockets a host can create from a particular IP address. The fin_timeout defines the minimum time these sockets will stay in the TIME_WAIT state (unusable after being used once).
Usual system defaults are:
net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_fin_timeout = 60
This basically means your system cannot consistently guarantee more than (61000 - 32768) / 60 ≈ 470 sockets per second. If you are not happy with that, you could begin by increasing the port_range. Setting the range to 15000 61000 is pretty common these days. You could further increase availability by decreasing the fin_timeout. Suppose you do both: you should see over 1500 outbound connections per second, more readily ((61000 - 15000) / 30 ≈ 1533).
To change the values:
sysctl net.ipv4.ip_local_port_range="15000 61000"
sysctl net.ipv4.tcp_fin_timeout=30
The above should not be interpreted as factors limiting the system's capability for making outbound connections per second; rather, these factors affect the system's ability to handle concurrent connections in a sustainable manner over long periods of "activity."
Default Sysctl values on a typical Linux box for tcp_tw_recycle & tcp_tw_reuse would be
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_tw_reuse=0
These do not allow a connection from a "used" socket (in wait state) and force the sockets to last the complete time_wait cycle. I recommend setting:
sysctl net.ipv4.tcp_tw_recycle=1
sysctl net.ipv4.tcp_tw_reuse=1
This allows fast cycling of sockets in the time_wait state and re-using them. But before you make this change, make sure it does not conflict with the protocols you would use for the application that needs these sockets. Make sure to read the post "Coping with the TCP TIME-WAIT" by Vincent Bernat to understand the implications. The net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers, as it won't handle connections from two different computers behind the same NAT device, which is a problem that is hard to detect and waiting to bite you. Note that net.ipv4.tcp_tw_recycle has been removed as of Linux 4.12.
On the server side:
The net.core.somaxconn value has an important role. It limits the maximum number of requests queued to a listen socket. If you are sure of your server application's capability, bump it up from the default of 128 to something like 1024. Now you can take advantage of this increase by modifying the listen backlog variable in your application's listen call to an equal or higher integer.
sysctl net.core.somaxconn=1024
The txqueuelen parameter of your Ethernet cards also has a role to play. The default value is 1000, so bump it up to 5000 or even more if your system can handle it.
ifconfig eth0 txqueuelen 5000
echo "/sbin/ifconfig eth0 txqueuelen 5000" >> /etc/rc.local
Similarly bump up the values for net.core.netdev_max_backlog and net.ipv4.tcp_max_syn_backlog. Their default values are 1000 and 1024 respectively.
sysctl net.core.netdev_max_backlog=2000
sysctl net.ipv4.tcp_max_syn_backlog=2048
Now remember to start both your client- and server-side applications after increasing the FD ulimits in the shell.
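For example (a sketch; the value and the user name are placeholders to adjust):
ulimit -n 100000      # raise the soft limit in the current shell before launching the app
# For a persistent limit, add lines like these to /etc/security/limits.conf:
#   myuser soft nofile 100000
#   myuser hard nofile 100000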
Besides the above, one more popular technique used by programmers is to reduce the number of TCP write calls. My own preference is to use a buffer into which I push the data I wish to send to the client, and then at appropriate points I write the buffered data out to the actual socket. This technique lets me use large data packets, reduces fragmentation, and reduces my CPU utilization both in user land and at the kernel level.
There are a couple of variables to set the max number of connections. Most likely, you're running out of file descriptors first. Check ulimit -n. After that, there are settings in /proc, but those default to the tens of thousands.
More importantly, it sounds like you're doing something wrong. A single TCP connection ought to be able to use all of the bandwidth between two parties; if it isn't, work through the following checks (quick command versions are sketched after the list):
Check if your TCP window setting is large enough. Linux defaults are good for everything except really fast internet links (hundreds of Mbps) or fast satellite links. What is your bandwidth*delay product?
Check for packet loss using ping with large packets (ping -s 1472 ...)
Check for rate limiting. On Linux, this is configured with tc
Confirm that the bandwidth you think exists actually exists using e.g., iperf
Confirm that your protocol is sane. Remember latency.
If this is a gigabit+ LAN, can you use jumbo packets? Are you?
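Quick command versions of those checks (a sketch; remote-host and eth0 are placeholders):
ping -M do -s 1472 remote-host             # loss / path-MTU check with near-full-size packets
tc qdisc show dev eth0                     # any shaping or rate limiting configured?
iperf3 -c remote-host                      # measure the bandwidth that actually exists
ip link show eth0 | grep -o 'mtu [0-9]*'   # are jumbo frames (e.g. MTU 9000) in use?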
Possibly I have misunderstood. Maybe you're doing something like Bittorrent, where you need lots of connections. If so, you need to figure out how many connections you're actually using (try netstat or lsof). If that number is substantial, you might:
Have a lot of bandwidth, e.g., 100mbps+. In this case, you may actually need to up the ulimit -n. Still, ~1000 connections (default on my system) is quite a few.
Have network problems which are slowing down your connections (e.g., packet loss)
Have something else slowing you down, e.g., IO bandwidth, especially if you're seeking. Have you checked iostat -x?
Also, if you are using a consumer-grade NAT router (Linksys, Netgear, DLink, etc.), beware that you may exceed its abilities with thousands of connections.
I hope this provides some help. You're really asking a networking question.
To improve upon the answer given by @derobert:
You can determine what your OS connection limit is by catting nf_conntrack_max. For example:
cat /proc/sys/net/netfilter/nf_conntrack_max
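You can compare that limit with the number of connections currently being tracked (these files exist only when the conntrack module is loaded):
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max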
You can use the following script to count the number of TCP connections to a given range of tcp ports. By default 1-65535.
This will confirm whether or not you are maxing out your OS connection limit.
Here's the script.
#!/bin/sh
# Count TCP connections in ESTABLISHED or TIME_WAIT state, skipping loopback
# (127.0.0.1) entries, whose port falls inside the start..end range passed to awk.
OS=$(uname)
case "$OS" in
  'SunOS')
    AWK=/usr/bin/nawk
    ;;
  'Linux')
    AWK=/bin/awk
    ;;
  'AIX')
    AWK=/usr/bin/awk
    ;;
esac
netstat -an | $AWK -v start=1 -v end=65535 ' $NF ~ /TIME_WAIT|ESTABLISHED/ && $4 !~ /127\.0\.0\.1/ {
    # The address column differs between platforms ($1 vs $4); d selects which
    # element of the split address is compared against the port range.
    if ($1 ~ /\./)
        {sip=$1}
    else {sip=$4}
    if ( sip ~ /:/ )
        {d=2}
    else {d=5}
    split( sip, a, /:|\./ )
    if ( a[d] >= start && a[d] <= end ) {
        ++connections;
    }
}
END {print connections}'
At the application level, here are some things a developer can do:
From the server side:
Check whether your load balancer (if you have one) works correctly.
Turn slow TCP timeouts into fast, immediate 503 responses; if your load balancer works correctly, it should pick a working resource to serve, and that is better than hanging there with unexpected error messages.
E.g. if you are using a Node server, you can use toobusy from npm.
Implement something like:
var toobusy = require('toobusy');
app.use(function(req, res, next) {
  if (toobusy()) res.send(503, "I'm busy right now, sorry.");
  else next();
});
Why 503? Here are some good insights for overload:
http://ferd.ca/queues-don-t-fix-overload.html
We can do some work on the client side too:
Try to group calls into batches, reducing the traffic and the total number of requests between client and server.
Try to build a caching mid-layer to handle unnecessary duplicate requests.
I was trying to resolve this in 2022 on load balancers, and one way I found is to attach another IPv4 address (or eventually an IPv6 one) to the NIC, so the limit is now doubled. Of course, you need to configure the second IP in the service that is trying to connect to the machine (in my case, another DNS entry).
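For reference, attaching a second address to the NIC looks something like this (a sketch; the address is a placeholder from the documentation range, and you would normally make it persistent via your distribution's network configuration):
ip addr add 203.0.113.25/24 dev eth0    # add a second IPv4 address to the interface
ip addr show dev eth0                   # verify both addresses are present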
