Get network connections from /proc/net/sockstat - linux

I found this information in /proc, which displays socket statistics:
$ cat /proc/net/sockstat
sockets: used 8278
TCP: inuse 1090 orphan 2 tw 18 alloc 1380 mem 851
UDP: inuse 6574
RAW: inuse 1
FRAG: inuse 0 memory 0
Can you help me figure out what these values mean? Also, are these values reliable enough, or do I need to look for this information somewhere else?
Is there another way to find information about TCP/UDP connections in Linux?

Can you help me figure out what these values mean?
As per the code here, inuse is the number of sockets in use (TCP / UDP), and orphan is the number of orphan TCP sockets (sockets that applications no longer hold a handle to because they have already called close()). I am not sure about TCP tw, but based on the structure name (tcp_death_row), those appear to be sockets that will be definitively destroyed in the near future. alloc represents the number of allocated sockets (as I understand it, this covers TCP sockets in different states), and mem is the number of pages allocated by TCP sockets (memory usage).
This article has some discussion around this topic.
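If you want to read these counters from a program rather than eyeballing the cat output, here is a minimal C sketch (error handling mostly omitted, field names taken from the output above) that pulls the TCP line apart:

/* Sketch: parse the TCP line of /proc/net/sockstat and print the counters. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/net/sockstat", "r");
    if (!f)
        return 1;

    char line[256];
    long inuse, orphan, tw, alloc, mem;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "TCP: inuse %ld orphan %ld tw %ld alloc %ld mem %ld",
                   &inuse, &orphan, &tw, &alloc, &mem) == 5)
            printf("inuse=%ld orphan=%ld tw=%ld alloc=%ld mem_pages=%ld\n",
                   inuse, orphan, tw, alloc, mem);
    }
    fclose(f);
    return 0;
}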

In my understanding, /proc/net/sockstat is the most reliable place to look for that information. I often use it myself, and when getting a single server to manage 1MM simultaneous connections, it was the only place where I could reliably count that information.

You can use the netstat command, which itself utilizes the /proc filesystem but prints the information in a form more readable for humans.
If you want to display the current TCP connections, for example, you can issue the following command:
netstat -t
Check man netstat for the numerous options.

Related

Linux - Too many closed connections

I'm coding an application that opens 1800 connections/minute on a single Linux machine using Netty (async NIO). A connection lives for a few seconds and is then closed, or it times out after 20 seconds if no answer is received. In addition, the read/write timeout is 30 seconds and the request header contains connection=close.
After a while (2-3 hours) I get a lot of exceptions in the logs because Netty is unable to create new connections due to a lack of resources.
I increased the max number of open files in limits.conf as:
root hard nofile 200000
root soft nofile 200000
Here is the output of netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n:
1 established)
1 FIN_WAIT2
1 Foreign
2 TIME_WAIT
6 LISTEN
739 SYN_SENT
6439 LAST_ACK
6705 CLOSE_WAIT
12484 ESTABLISHED
This is the output of the ss -s command:
Total: 194738 (kernel 194975)
TCP: 201128 (estab 13052, closed 174321, orphaned 6477, synrecv 0, timewait 3/0), ports 0
Transport Total IP IPv6
* 194975 - -
RAW 0 0 0
UDP 17 12 5
TCP 26807 8 26799
INET 26824 20 26804
FRAG 0 0 0
Also ls -l /proc/2448/fd | wc -l gives about 199K.
That said, the questions are about the closed connections reported in the ss -s command output:
1) What are they exactly?
2) Why do they keep dangling without being destroyed?
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
1) What are they exactly?
They are sockets that were either never connected or were disconnected and weren't closed.
In Linux, an outgoing TCP socket goes through the following stages (roughly):
You create the socket (unconnected), and kernel allocates a file descriptor for it.
You connect() it to the remote side, establishing a network connection.
You do data transfer (read/write).
When you are done with reading/writing, you shutdown() the socket for both reading and writing, closing the network connection.
You close() the socket, and kernel frees the file descriptor.
So those 174K connections ss reports as closed are sockets that either never got past stage 1 (maybe connect() failed or was never even called) or went through stage 4 but not stage 5. Effectively, they are sockets with underlying open file descriptors, but without any network binding (which is why the netstat / ss listings don't show them).
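A minimal C sketch of those stages (the address and port are placeholders, error handling omitted); the point is that a socket which never reaches stage 5 still pins an open file descriptor even though it no longer shows up in the netstat / ss state listings:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* 1. Create the socket: the kernel allocates a file descriptor. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* 2. Connect it to the remote side, establishing a network connection. */
    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(80);                        /* placeholder port */
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* placeholder address */
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));

    /* 3. Transfer data with read()/write(). */

    /* 4. Shut the socket down for reading and writing: the network
     *    connection is closed, but the file descriptor still exists. */
    shutdown(fd, SHUT_RDWR);

    /* 5. close() frees the file descriptor; skipping this step produces
     *    exactly the "closed" sockets that ss -s counts. */
    close(fd);
    return 0;
}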
2) Why do they keep dangling without being destroyed?
Because nobody called close() on them. I would call it a "file descriptor leak" or a "socket descriptor leak".
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
From the Linux point of view, no. You have to explicitly call close() on them (or terminate the process that owns them so the kernel knows they aren't used anymore).
From the Netty/Java point of view, maybe, I don't know.
Essentially, it's a bug in your code, or in Netty code (less likely), or in JRE code (much less likely). You are not releasing the resources when you should. If you show the code, maybe somebody can spot the error.
As Roman correctly pointed out, closed connections do exist and are sockets which have never been closed properly.
In my case, I had some clues about what was going wrong, which I report below:
1) ss -s showed strange values, in particular a lot of closed connections
2) ls -l /proc/pid/fd | wc -l showed a lot of open descriptors
3) The numbers in netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n did not match the previous ones
4) sudo lsof -n -p pid (Roman's suggestion) showed a lot of entries with "can't identify protocol".
Looking around on the web, I found an interesting post (https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/) which explains what point 4 might really mean and why the netstat numbers do not match (see also https://serverfault.com/questions/153983/sockets-found-by-lsof-but-not-by-netstat).
I was quite surprised, since I used Netty 4.1.x (with Spring) with a common pattern in which every connection was supposed to be properly closed, so I spent a few days before understanding what was really wrong.
The subtle problem was in the Netty IO thread, where the message body was copied and put into a blocking queue (as part of my code). When the queue was full, that slowed things down, introducing latency and causing connection timeouts that were not detected on my end and, consequently, the leak of FDs.
My solution was to introduce a sort of pooled (bounded) queue that prevents new Netty requests when the queue is full.
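The general idea, shown here as a C sketch rather than the actual Netty/Java code (names and the capacity are purely illustrative): a non-blocking offer that rejects new work when the queue is full, so the IO path never stalls waiting on the queue:

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 1024              /* illustrative capacity */

static void *slots[QUEUE_CAP];
static size_t head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Try to enqueue a message; return false instead of blocking when full,
 * so the caller can reject the request immediately. */
bool queue_offer(void *msg) {
    bool accepted = false;
    pthread_mutex_lock(&lock);
    if (count < QUEUE_CAP) {
        slots[tail] = msg;
        tail = (tail + 1) % QUEUE_CAP;
        count++;
        accepted = true;
    }
    pthread_mutex_unlock(&lock);
    return accepted;
}

/* Consumer side: dequeue one message, or NULL when the queue is empty. */
void *queue_poll(void) {
    void *msg = NULL;
    pthread_mutex_lock(&lock);
    if (count > 0) {
        msg = slots[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
    }
    pthread_mutex_unlock(&lock);
    return msg;
}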
You still haven't provided the exact error message I asked for, but as far as I can see the question should be about the six and a half thousand connections in CLOSE_WAIT state, not 'those closed connections'.
You're not closing sockets that the peer has disconnected.
That said, the questions are about those closed connections.
What closed connections? Your netstat display doesn't show any closed connections. And there is no evidence that your resource exhaustion problem has anything to do with closed connections.
1) What are they exactly?
They aren't.
2) Why do they keep dangling without being destroyed?
They don't.
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
As they don't exist, the question is meaningless.

How to set the maximum TCP receive window size in Linux?

I want to limit the rate of every TCP connection. Can I set the maximum TCP receive window size in Linux?
With iptables + tc I can only limit IP packets. The parameters net.core.rmem_max and net.core.wmem_max did not work well.
man tcp:
Linux supports RFC 1323 TCP high performance extensions. These include Protection Against Wrapped Sequence Numbers (PAWS), Window Scaling and Timestamps. Window scaling allows the use of large (> 64K) TCP windows in order to support links with high latency or bandwidth. To make use of them, the send and receive buffer sizes must be increased. They can be set globally with the /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem files, or on individual sockets by using the SO_SNDBUF and SO_RCVBUF socket options with the setsockopt(2) call.
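So one way to clamp the receive window of a single connection is to shrink its receive buffer with SO_RCVBUF before connecting; note that this also disables receive-buffer autotuning for that socket. A minimal C sketch (the 64 KB size, address, and port are only examples):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Cap the receive buffer (bytes). Must be done before connect() so the
     * window scale is negotiated from this size. */
    int rcvbuf = 64 * 1024;
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(80);                         /* example port */
    inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr);  /* example address */
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));

    /* ... read()/write() as usual; the advertised window stays bounded ... */
    close(fd);
    return 0;
}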

How to programmatically increase the per-socket buffer for UDP sockets on Linux?

I'm trying to understand the correct way to increase the socket buffer size on Linux for our streaming network application. The application receives variable bitrate data streamed to it on a number of UDP sockets. The volume of data is substantially higher at the start of the stream and I've used:
# sar -n UDP 1 200
to show that the UDP stack is discarding packets and
# ss -un -pa
to show that each socket's Recv-Q length grows to nearly the limit (124928, from sysctl net.core.rmem_default) before packets are discarded. This implies that the application simply can't keep up with the start of the stream. After discarding enough initial packets, the data rate slows down and the application catches up. Recv-Q trends towards 0 and remains there for the duration.
I'm able to address the packet loss by substantially increasing the rmem_default value which increases the socket buffer size and gives the application time to recover from the large initial bursts. My understanding is that this changes the default allocation for all sockets on the system. I'd rather just increase the allocation for the specific UDP sockets and not modify the global default.
My initial strategy was to modify rmem_max and to use setsockopt(SO_RCVBUF) on each individual socket. However, this question makes me concerned about disabling Linux autotuning for all sockets and not just UDP.
udp(7) describes the udp_mem setting, but I'm confused about how these values interact with the rmem_default and rmem_max values. The language it uses is "all sockets", so my suspicion is that these settings apply to the complete UDP stack and not to individual UDP sockets.
Is udp_rmem_min the setting I'm looking for? It seems to apply to individual sockets, but globally to all UDP sockets on the system.
Is there a way to safely increase the socket buffer length for the specific UDP ports used in my application without modifying any global settings?
Thanks.
Jim Gettys is armed and coming for you. Don't go to sleep.
The solution to network packet floods is almost never to increase buffering. Why is your protocol's queueing strategy not backing off? Why can't you just use TCP if you're trying to send so much data in a stream (which is what TCP was designed for)?

What are the units of UDP buffers, and where are docs for sysctl params?

I'm running x86_64 RedHat 5.3 (kernel 2.6.18) and looking specifically at net.core.rmem_max from sysctl -a in the context of trying to set UDP buffers. The receiver application misses packets sometimes, but I think the buffer is already plenty large, depending upon what it means:
What are the units of this setting -- bits, bytes, packets, or pages? If bits or bytes, is it the datagram payload (such as 100 bytes) or the network MTU size (~1500 bytes)? If pages, what's the page size in bytes?
And is this the max per system, per physical device (NIC), per virtual device (VLAN), per process, per thread, per socket, or per multicast group?
For example, suppose my data is 100 bytes per message, and each network packet holds 2 messages, and I want to be able to buffer 50,000 messages per socket, and I open 3 sockets per thread on each of 4 threads. How big should net.core.rmem_max be? Likewise, when I set socket options inside the application, are the units payload bytes, so 5000000 on each socket in this case?
Finally, in general, how would I find details of the units for the parameters I see via sysctl -a? I have similar units and per-X questions about other parameters such as net.core.netdev_max_backlog and net.ipv4.igmp_max_memberships.
Thank you.
You'd look at these docs. That said, many of these parameters really are quite poorly documented, so do expect to do some googling to dig out the gory details from blogs and mailing lists.
rmem_max is the per-socket maximum buffer, in bytes. Digging around, this appears to be the memory where whole packets are received, so the size has to include the IP/UDP headers as well - though this area is quite fuzzy to me.
Keep in mind, though, that UDP is unreliable. There are a lot of sources of loss, not least in between switches and routers - these have buffers as well.
It is fully documented in the socket(7) man page (it is in bytes).
Moreover, the limit may be set on a per-socket basis with SO_RCVBUF (as documented in the same page).
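As a quick illustration, here is a small C sketch (the 4 MiB figure is arbitrary) that sets SO_RCVBUF on a UDP socket and reads it back: the value is in bytes, the request is capped by net.core.rmem_max, and getsockopt() reports roughly double what was asked for because the kernel charges its bookkeeping overhead to the same buffer:

#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    int requested = 4 * 1024 * 1024;   /* 4 MiB, in bytes (arbitrary example) */
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

    int effective = 0;
    socklen_t len = sizeof(effective);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len);
    printf("receive buffer: %d bytes\n", effective);  /* bytes, not packets */

    close(fd);
    return 0;
}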
Read the socket(7), ip(7) and udp(7) man pages for information on how these things actually work. The sysctls are documented there.

Increasing the maximum number of TCP/IP connections in Linux

I am programming a server and it seems like my number of connections is being limited since my bandwidth isn't being saturated even when I've set the number of connections to "unlimited".
How can I increase or eliminate a maximum number of connections that my Ubuntu Linux box can open at a time? Does the OS limit this, or is it the router or the ISP? Or is it something else?
The maximum number of connections is impacted by certain limits on both the client and server sides, albeit a little differently.
On the client side:
Increase the ephemeral port range, and decrease the tcp_fin_timeout
To find out the default values:
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_fin_timeout
The ephemeral port range defines the maximum number of outbound sockets a host can create from a particular IP address. The fin_timeout defines the minimum time these sockets will stay in the TIME_WAIT state (unusable after being used once).
Usual system defaults are:
net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_fin_timeout = 60
This basically means your system cannot consistently guarantee more than (61000 - 32768) / 60 = 470 sockets per second. If you are not happy with that, you could begin by increasing the port_range. Setting the range to 15000 61000 is pretty common these days. You could further increase availability by decreasing the fin_timeout. If you do both, you should more readily see over 1500 outbound connections per second.
To change the values:
sysctl net.ipv4.ip_local_port_range="15000 61000"
sysctl net.ipv4.tcp_fin_timeout=30
The above should not be interpreted as the factors limiting the system's capability for making outbound connections per second; rather, these factors affect the system's ability to handle concurrent connections in a sustainable manner over long periods of "activity".
Default Sysctl values on a typical Linux box for tcp_tw_recycle & tcp_tw_reuse would be
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_tw_reuse=0
These do not allow a connection from a "used" socket (one in a wait state) and force the sockets to last the complete TIME_WAIT cycle. I recommend setting:
sysctl net.ipv4.tcp_tw_recycle=1
sysctl net.ipv4.tcp_tw_reuse=1
This allows fast cycling of sockets in the TIME_WAIT state and reusing them. But before you make this change, make sure it does not conflict with the protocols used by the application that needs these sockets. Make sure to read the post "Coping with the TCP TIME-WAIT" by Vincent Bernat to understand the implications. The net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers, as it won't handle connections from two different computers behind the same NAT device, which is a problem that is hard to detect and waiting to bite you. Note that net.ipv4.tcp_tw_recycle has been removed as of Linux 4.12.
On the Server Side:
The net.core.somaxconn value has an important role. It limits the maximum number of requests queued to a listen socket. If you are sure of your server application's capability, bump it up from the default of 128 to something like 1024. Now you can take advantage of this increase by modifying the listen backlog variable in your application's listen call to an equal or higher integer.
sysctl net.core.somaxconn=1024
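To actually benefit from a larger somaxconn, the application has to ask for a matching backlog in its listen() call (the kernel uses the smaller of the two values). A minimal C sketch, with port 8080 as an arbitrary example:

#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                   /* example port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Request a backlog of 1024; the kernel silently caps it at
     * net.core.somaxconn, so raise the sysctl first. */
    listen(fd, 1024);

    /* ... accept() loop ... */
    close(fd);
    return 0;
}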
The txqueuelen parameter of your Ethernet cards also has a role to play. The default value is 1000, so bump it up to 5000 or even more if your system can handle it.
ifconfig eth0 txqueuelen 5000
echo "/sbin/ifconfig eth0 txqueuelen 5000" >> /etc/rc.local
Similarly bump up the values for net.core.netdev_max_backlog and net.ipv4.tcp_max_syn_backlog. Their default values are 1000 and 1024 respectively.
sysctl net.core.netdev_max_backlog=2000
sysctl net.ipv4.tcp_max_syn_backlog=2048
Now remember to start both your client- and server-side applications after increasing the FD ulimits in the shell.
Besides the above, one more popular technique used by programmers is to reduce the number of TCP write calls. My own preference is to use a buffer into which I push the data I wish to send to the client, and then at appropriate points write the buffered data out to the actual socket. This technique allows me to use larger data packets, reduces fragmentation, and reduces my CPU utilization both in user land and at the kernel level.
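A minimal C sketch of that buffering idea (the buffer size and flush policy are purely illustrative): accumulate small pieces in user space and hand them to the kernel with a single write() instead of many:

#include <string.h>
#include <unistd.h>

#define OUTBUF_SIZE 8192            /* illustrative buffer size */

static char outbuf[OUTBUF_SIZE];
static size_t outlen;

/* Flush whatever has accumulated with as few write() calls as possible. */
static void flush_out(int fd) {
    size_t off = 0;
    while (off < outlen) {
        ssize_t n = write(fd, outbuf + off, outlen - off);
        if (n <= 0)
            break;                  /* real code would handle errors/EINTR */
        off += (size_t)n;
    }
    outlen = 0;
}

/* Queue a small piece of data (len must be <= OUTBUF_SIZE); the kernel is
 * only hit when the buffer fills or when the caller flushes explicitly. */
static void buffered_send(int fd, const void *data, size_t len) {
    if (outlen + len > OUTBUF_SIZE)
        flush_out(fd);
    memcpy(outbuf + outlen, data, len);
    outlen += len;
}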
There are a couple of variables that set the max number of connections. Most likely, you're running out of file descriptors first. Check ulimit -n. After that, there are settings in /proc, but those default to the tens of thousands.
More importantly, it sounds like you're doing something wrong. A single TCP connection ought to be able to use all of the bandwidth between two parties; if it isn't:
Check if your TCP window setting is large enough. Linux defaults are good for everything except really fast Internet links (hundreds of Mbps) or fast satellite links. What is your bandwidth*delay product? (See the quick calculation after this list.)
Check for packet loss using ping with large packets (ping -s 1472 ...)
Check for rate limiting. On Linux, this is configured with tc
Confirm that the bandwidth you think exists actually exists using e.g., iperf
Confirm that your protocol is sane. Remember latency.
If this is a gigabit+ LAN, can you use jumbo packets? Are you?
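As a quick back-of-the-envelope check for the window question above, here is the bandwidth*delay arithmetic as a tiny C program (the 100 Mbit/s and 50 ms figures are just example numbers):

#include <stdio.h>

int main(void) {
    double bandwidth_bps = 100e6;   /* example link speed: 100 Mbit/s */
    double rtt_s = 0.050;           /* example round-trip time: 50 ms */

    /* The TCP window must hold at least bandwidth * delay bytes for one
     * connection to keep the pipe full. */
    double bdp_bytes = bandwidth_bps * rtt_s / 8.0;
    printf("bandwidth*delay product: %.0f bytes (~%.0f KiB)\n",
           bdp_bytes, bdp_bytes / 1024.0);   /* ~625000 bytes, ~610 KiB here */
    return 0;
}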
Possibly I have misunderstood. Maybe you're doing something like BitTorrent, where you need lots of connections. If so, you need to figure out how many connections you're actually using (try netstat or lsof). If that number is substantial, you might:
Have a lot of bandwidth, e.g., 100 Mbps+. In this case, you may actually need to up the ulimit -n. Still, ~1000 connections (the default on my system) is quite a few.
Have network problems which are slowing down your connections (e.g., packet loss)
Have something else slowing you down, e.g., IO bandwidth, especially if you're seeking. Have you checked iostat -x?
Also, if you are using a consumer-grade NAT router (Linksys, Netgear, DLink, etc.), beware that you may exceed its abilities with thousands of connections.
I hope this provides some help. You're really asking a networking question.
To improve upon the answer given by derobert:
You can determine your OS connection limit by catting nf_conntrack_max. For example:
cat /proc/sys/net/netfilter/nf_conntrack_max
You can use the following script to count the number of TCP connections to a given range of TCP ports, 1-65535 by default.
This will confirm whether or not you are maxing out your OS connection limit.
Here's the script.
#!/bin/sh
# Count TCP connections (TIME_WAIT or ESTABLISHED) whose local port falls
# in the range [start, end]; loopback traffic is excluded.
OS=$(uname)
case "$OS" in
    'SunOS')
        AWK=/usr/bin/nawk
        ;;
    'Linux')
        AWK=/bin/awk
        ;;
    'AIX')
        AWK=/usr/bin/awk
        ;;
    *)
        AWK=awk
        ;;
esac

netstat -an | $AWK -v start=1 -v end=65535 '
$NF ~ /TIME_WAIT|ESTABLISHED/ && $4 !~ /127\.0\.0\.1/ {
    # Pick the local-address column: $1 on Solaris-style output, $4 on Linux.
    if ($1 ~ /\./)
        sip = $1
    else
        sip = $4

    # The port is the last component, whether the separator is ":" or ".".
    n = split(sip, a, /:|\./)
    if (a[n] >= start && a[n] <= end)
        ++connections
}
END { print connections }'
At the application level, here are some things a developer can do:
From the server side:
Check whether your load balancer (if you have one) works correctly.
Turn slow TCP timeouts into fast, immediate 503 responses; if your load balancer works correctly, it should pick a working resource to serve, which is better than hanging there with unexpected error messages.
E.g.: if you are using a Node server, you can use toobusy from npm.
The implementation is something like:
var toobusy = require('toobusy');
app.use(function(req, res, next) {
  if (toobusy()) res.send(503, "I'm busy right now, sorry.");
  else next();
});
Why 503? Here are some good insights on overload:
http://ferd.ca/queues-don-t-fix-overload.html
We can do some work on the client side too:
Try to group calls into batches, reducing the traffic and the total number of requests between client and server.
Try to build a caching middle layer to handle unnecessary duplicate requests.
I'm trying to resolve this in 2022 on load balancers, and one way I found is to attach another IPv4 address (or eventually an IPv6 one) to the NIC, so the limit is now doubled. Of course, you need to configure the second IP for the service that is trying to connect to the machine (in my case, another DNS entry).
