I wrote a trivial node.js client/server pair to test local limits on concurrent connections. No data is sent between them: 10,000 clients connect and wait.
Each time I run the test, I spawn a server and 10 clients that create 1000 connections each.
It takes a little over 2 seconds to reach ~8000 concurrent connections. Then it stops. No errors are raised (the 'error' callbacks don't fire, and 'close' doesn't fire either). The remaining connections simply "block", with no result and no timeout.
I've already raised the max file descriptor limit (ulimit -n), and allowed more read/write memory to be consumed by the TCP stack via sysctl (net.ipv4.tcp_rmem and wmem).
What's the cap I'm hitting? How can I lift it?
-- EDIT --
Server program, with logging code stripped:
net = require 'net'

clients = []

server = net.createServer()
server.on 'connection', (socket) ->
  clients.push socket   # keep a reference so the socket stays open

server.listen 5050
Client (this runs n times):
net = require 'net'

num_sockets = 1000   # 1000 connections per client process
sockets = []

for [1..num_sockets]
  socket = new net.Socket
  sockets.push socket
  socket.connect 5050
These are the system limits:
sysctl -w net.ipv4.ip_local_port_range="500 65535"
sysctl -w net.ipv4.tcp_tw_recycle="1"
sysctl -w net.ipv4.tcp_tw_reuse="1"
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
sysctl -w fs.file-max="655300"
sysctl -w fs.nr_open="3000000"
ulimit -n 2000000
Related
I'm writing an application that opens 1800 connections/minute from a single Linux machine using Netty (async NIO). A connection lives for a few seconds and is then closed, or it times out after 20 secs if no answer is received. Additionally, the read/write timeout is 30 secs and the request header contains connection=close.
After a while (2-3 hours) I get a lot of exceptions in the logs because Netty is unable to create new connections due to a lack of resources.
I increased the max number of open files in limits.conf as:
root hard nofile 200000
root soft nofile 200000
Here is the output of netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n:
1 established)
1 FIN_WAIT2
1 Foreign
2 TIME_WAIT
6 LISTEN
739 SYN_SENT
6439 LAST_ACK
6705 CLOSE_WAIT
12484 ESTABLISHED
This is the output of the ss -s command:
Total: 194738 (kernel 194975)
TCP: 201128 (estab 13052, closed 174321, orphaned 6477, synrecv 0, timewait 3/0), ports 0
Transport Total IP IPv6
* 194975 - -
RAW 0 0 0
UDP 17 12 5
TCP 26807 8 26799
INET 26824 20 26804
FRAG 0 0 0
Also ls -l /proc/2448/fd | wc -l gives about 199K.
That said, the questions are about the closed connections reported in the ss -s command output:
1) What are they exactly?
2) Why do they keep dangling without being destroyed?
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
1) What are they exactly?
They are sockets that were either never connected or were disconnected and weren't closed.
In Linux, an outgoing TCP socket goes through the following stages (roughly):
You create the socket (unconnected), and the kernel allocates a file descriptor for it.
You connect() it to the remote side, establishing a network connection.
You do data transfer (read/write).
When you are done reading and writing, you shutdown() the socket for both reading and writing, closing the network connection.
You close() the socket, and the kernel frees the file descriptor.
So those 174K connections ss reports as closed are sockets that either never got past stage 1 (maybe connect() failed or was never even called) or went through stage 4 but not stage 5. Effectively, they are sockets with underlying open file descriptors, but without any network binding (so the netstat / ss listing doesn't show them).
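To make the five stages concrete, here is a minimal Python sketch (example.com:80 is only a placeholder endpoint); the leak discussed below corresponds to skipping the last step:

import socket

# Stage 1: create the socket; the kernel allocates a file descriptor for it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Stage 2: connect to the remote side, establishing a network connection.
    sock.connect(("example.com", 80))
    # Stage 3: data transfer.
    sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    sock.recv(4096)
    # Stage 4: shut down both directions, closing the network connection.
    sock.shutdown(socket.SHUT_RDWR)
finally:
    # Stage 5: close() releases the file descriptor. Skipping this is the
    # "socket descriptor leak": the fd stays open with no network binding.
    sock.close()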
2) Why do they keep dangling without being destroyed?
Because nobody called close() on them. I would call it a "file descriptor leak" or a "socket descriptor leak".
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
From the Linux point of view, no. You have to explicitly call close() on them (or terminate the process that owns them so the kernel knows they aren't used anymore).
From the Netty/Java point of view, maybe, I don't know.
Essentially, it's a bug in your code, or in Netty code (less likely), or in JRE code (much less likely). You are not releasing the resources when you should. If you show the code, maybe somebody can spot the error.
As Roman correctly pointed out, closed connections do exist and are sockets which have never been closed properly.
In my case I had some clues about what was going wrong which I report below:
1) ss -s showed strange values, in particular a lot of closed connections
2) ls -l /proc/pid/fd | wc -l showed a lot of open descriptors
3) The numbers in netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n did not match the previous ones
4) sudo lsof -n -p pid (Roman's suggestion) showed a lot of entries with "can't identify protocol"
Looking around on the web I found an interesting post (https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/) which explains what point 4 really means and why the netstat numbers do not match (see also https://serverfault.com/questions/153983/sockets-found-by-lsof-but-not-by-netstat).
I was quite surprised, since I used netty 4.1.x (with Spring) with a common pattern where every connection was supposed to be properly closed, so I spent a few days before understanding what was really wrong.
The subtle problem was in the Netty IO thread, where the message body was copied and put into a blocking queue (as part of my code). When the queue was full, this slowed things down, introduced latency, and caused connection timeouts that my end did not detect, and consequently the FD leak.
My solution was to introduce a sort of pooled queue that prevents new Netty requests when the queue is full.
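The actual fix lives in Netty/Java code that isn't shown here; purely as an illustration of the bounded-queue idea, here is a minimal Python sketch (the names work_queue, accept and worker_loop are made up for this example) where the accepting side sheds load instead of blocking:

import queue

# Bounded hand-off between the accepting (IO) side and slower workers.
work_queue = queue.Queue(maxsize=1000)

def accept(message):
    # Called from the IO path: never block; shed load instead.
    try:
        work_queue.put_nowait(message)   # enqueue only if there is room
        return True
    except queue.Full:
        return False                     # caller should reject/close the request right away

def worker_loop(process):
    # Workers drain the queue at their own pace.
    while True:
        message = work_queue.get()
        process(message)
        work_queue.task_done()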
You still haven't provided the exact error message I asked for, but as far as I can see the question should be about the six and a half thousand connections in CLOSE_WAIT state, not 'those closed connections'.
You're not closing sockets that the peer has disconnected.
That said, the questions are about those closed connections.
What closed connections? Your netstat display doesn't show any closed connections. And there is no evidence that your resource exhaustion problem has anything to do with closed connections.
1) What are they exactly?
They aren't.
2) Why do they keep dangling without being destroyed?
They don't.
3) Is there any setting (timeout or whatever) which can help to keep them under a reasonable limit?
As they don't exist, the question is meaningless.
Is it possible to filter tcpdump (live, or after creating a dump) based on TCP connection time (connection duration)?
I'm recording http json rpc traffic.
I want to record only connections that are longer than, let's say, 1000 ms.
In Wireshark there is a tool under Menu -> Statistics -> Conversations (TCP tab), where I can sort by "Duration". But I want to record (or filter) long-lived connections beforehand (not in Wireshark).
In pseudo commands I want to do something like this:
tcpdump -i eth0 port 80 and connectionTime>1000ms -w data.pcap
or after recording:
cat data.pcap | SOMETOOL -connectionTime>1000ms > dataLongConnections.pcap
SOMETOOL must export the filtered data to a format that Wireshark will understand.
Because after filtering I want to analyze that data in Wireshark.
How I can do this?
SplitCap might work for you. It takes a PCAP as input and outputs separate PCAPs for each TCP/UDP session. After the split you could pick out the interesting output PCAPs to keep.
You need to consider your traffic at flow level instead of packet level.
If you worked with NetFlow you could use flow-tools and flow-nfilter to filter flows by duration. So you could convert your pcap to NetFlow and later filter it.
The drawback is that the output is NetFlow, not PCAP. For building some stats it is sufficient, but not for inspecting individual packets.
You can also build your own tool with libpcap in C (the hard way) or scapy in Python (the easier way). The latter option shouldn't be too difficult (provided you work with Python).
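As a rough sketch of the scapy route (not a drop-in tool; the file names are placeholders and the whole capture is loaded into memory), the following keeps only TCP conversations whose first-to-last packet span exceeds 1000 ms and writes them to a PCAP that Wireshark can open:

from collections import defaultdict
from scapy.all import rdpcap, wrpcap, IP, TCP

packets = rdpcap("data.pcap")            # placeholder input capture
flows = defaultdict(list)

# Group packets by a normalised TCP 4-tuple so both directions land in the same flow.
for pkt in packets:
    if IP in pkt and TCP in pkt:
        a = (pkt[IP].src, pkt[TCP].sport)
        b = (pkt[IP].dst, pkt[TCP].dport)
        flows[tuple(sorted((a, b)))].append(pkt)

# Keep flows whose first-to-last packet span is longer than 1000 ms.
kept = []
for flow_packets in flows.values():
    duration = float(flow_packets[-1].time) - float(flow_packets[0].time)
    if duration > 1.0:
        kept.extend(flow_packets)

# Write a capture Wireshark can open, preserving the original timestamps.
wrpcap("dataLongConnections.pcap", sorted(kept, key=lambda p: float(p.time)))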
I'm working on a somewhat unusual application where 10k clients are precisely timed to all try to submit data at once, every 3 mins or so. This 'ab' command fairly accurately simulates one barrage in the real world:
ab -c 10000 -n 10000 -r "http://example.com/submit?data=foo"
I'm using Node.js on Ubuntu 12.04 on a Rackspace Cloud VPS instance to collect these submissions. However, I'm seeing some very odd behavior from Node, even when I remove all my business logic and turn the HTTP request handler into a no-op.
When the test gets about 90% done, it hangs for a long period of time. Strangely, this happens consistently at 90% - for c=n=10k, at 9000; for c=n=5k, at 4500; for c=n=2k, at 1800. The test actually completes eventually, often with no errors. But both ab and node logs show continuous processing up till around 80-90% of the test run, then a long pause before completing.
When node is processing requests normally, CPU usage is typically around 50-70%. During the hang period, CPU goes up to 100%. Sometimes it stays near 0. Between the erratic CPU response and the fact that it seems unrelated to the actual number of connections (only the % complete), I do not suspect the garbage collector.
I've tried this running 'ab' on localhost and on a remote server - same effect.
I suspect something related to the TCP stack, possibly involving closing connections, but none of my configuration changes have helped. My changes:
ulimit -n 999999
When I listen(), I set the backlog to 10000
Sysctl changes are:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_orphans = 20000
net.ipv4.tcp_max_syn_backlog = 10000
net.core.somaxconn = 10000
net.core.netdev_max_backlog = 10000
I have also noticed that I tend to get this msg in the kernel logs:
TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters.
I'm puzzled by this msg since the TCP backlog queue should be deep enough to never overflow. If I disable syn cookies the "Sending cookies" goes to "Dropping connections".
I speculate that this is some sort of linux TCP stack tuning problem and I've read just about everything I could find on the net. Nothing I have tried seems to matter. Any advice?
Update: Tried with tcp_max_syn_backlog, somaxconn, netdev_max_backlog, and the listen() backlog param set to 50k with no change in behavior. Still produces the SYN flood warning, too.
Are you running ab on the same machine running node? If not, do you have a 1G or 10G NIC? If you are, then aren't you really trying to process 20,000 open connections?
Also, if you are changing net.core.somaxconn to 10,000, do you have absolutely no other sockets open on that machine? If you do, then 10,000 is not high enough.
Have you tried using the Node.js cluster module to spread the open connections across multiple processes?
I think you might find this blog post, and also the previous ones, useful:
http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/
I am programming a server and it seems like my number of connections is being limited since my bandwidth isn't being saturated even when I've set the number of connections to "unlimited".
How can I increase or eliminate a maximum number of connections that my Ubuntu Linux box can open at a time? Does the OS limit this, or is it the router or the ISP? Or is it something else?
The maximum number of connections is impacted by certain limits on both the client and server sides, albeit a little differently.
On the client side:
Increase the ephemeral port range, and decrease the tcp_fin_timeout
To find out the default values:
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_fin_timeout
The ephemeral port range defines the maximum number of outbound sockets a host can create from a particular IP address. The fin_timeout defines the minimum time these sockets will stay in TIME_WAIT state (unusable after being used once).
Usual system defaults are:
net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_fin_timeout = 60
This basically means your system cannot consistently guarantee more than (61000 - 32768) / 60 ≈ 470 sockets per second. If you are not happy with that, you could begin by increasing the port range. Setting the range to 15000 61000 is pretty common these days. You could further increase the availability by decreasing the fin_timeout. If you do both, you should more readily see over 1500 outbound connections per second.
To change the values:
sysctl net.ipv4.ip_local_port_range="15000 61000"
sysctl net.ipv4.tcp_fin_timeout=30
The above should not be interpreted as the factors limiting the system's capability for making outbound connections per second; rather, these factors affect the system's ability to handle concurrent connections in a sustainable manner over long periods of "activity."
Default Sysctl values on a typical Linux box for tcp_tw_recycle & tcp_tw_reuse would be
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_tw_reuse=0
These do not allow a connection from a "used" socket (in wait state) and force the sockets to last the complete time_wait cycle. I recommend setting:
sysctl net.ipv4.tcp_tw_recycle=1
sysctl net.ipv4.tcp_tw_reuse=1
This allows fast cycling of sockets in time_wait state and re-using them. But before you do this change make sure that this does not conflict with the protocols that you would use for the application that needs these sockets. Make sure to read post "Coping with the TCP TIME-WAIT" from Vincent Bernat to understand the implications. The net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers as it won’t handle connections from two different computers behind the same NAT device, which is a problem hard to detect and waiting to bite you. Note that net.ipv4.tcp_tw_recycle has been removed from Linux 4.12.
On the Server Side:
The net.core.somaxconn value has an important role. It limits the maximum number of requests queued to a listen socket. If you are sure of your server application's capability, bump it up from the default of 128 to something like 1024. Now you can take advantage of this increase by setting the listen backlog variable in your application's listen call to an equal or higher integer.
sysctl net.core.somaxconn=1024
txqueuelen parameter of your ethernet cards also have a role to play. Default values are 1000, so bump them up to 5000 or even more if your system can handle it.
ifconfig eth0 txqueuelen 5000
echo "/sbin/ifconfig eth0 txqueuelen 5000" >> /etc/rc.local
Similarly bump up the values for net.core.netdev_max_backlog and net.ipv4.tcp_max_syn_backlog. Their default values are 1000 and 1024 respectively.
sysctl net.core.netdev_max_backlog=2000
sysctl net.ipv4.tcp_max_syn_backlog=2048
Now remember to start both your client- and server-side applications after increasing the FD ulimits in the shell.
Besides the above, one more popular technique used by programmers is to reduce the number of TCP write calls. My own preference is to use a buffer into which I push the data I wish to send to the client, and then at appropriate points I write the buffered data out to the actual socket. This technique allows me to use large data packets, reduces fragmentation, and lowers my CPU utilization both in user land and at the kernel level.
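As an illustration of that buffering technique (a sketch only; the class name and the 64 KiB threshold are arbitrary choices for this example):

import socket

class BufferedWriter:
    # Accumulate small writes and push them to the socket in one call.
    def __init__(self, sock: socket.socket, limit: int = 64 * 1024):
        self._sock = sock
        self._buf = bytearray()
        self._limit = limit

    def write(self, data: bytes) -> None:
        self._buf += data
        if len(self._buf) >= self._limit:   # flush once enough data has piled up
            self.flush()

    def flush(self) -> None:
        if self._buf:
            self._sock.sendall(self._buf)   # one send instead of many tiny ones
            self._buf.clear()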
There are a couple of variables to set the max number of connections. Most likely, you're running out of file descriptors first. Check ulimit -n. After that, there are settings in /proc, but those default to the tens of thousands.
More importantly, it sounds like you're doing something wrong. A single TCP connection ought to be able to use all of the bandwidth between two parties; if it isn't:
Check if your TCP window setting is large enough. Linux defaults are good for everything except really fast internet links (hundreds of Mbps) or high-latency satellite links. What is your bandwidth*delay product? (See the worked example after this list.)
Check for packet loss using ping with large packets (ping -s 1472 ...)
Check for rate limiting. On Linux, this is configured with tc
Confirm that the bandwidth you think exists actually exists using e.g., iperf
Confirm that your protocol is sane. Remember latency.
If this is a gigabit+ LAN, can you use jumbo packets? Are you?
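For the bandwidth*delay product check mentioned in the first bullet, here is a small worked example; the 100 Mbit/s and 40 ms figures are illustrative, not measurements:

# Bandwidth*delay product: how much data must be "in flight" to fill the pipe.
bandwidth_bits_per_s = 100_000_000   # assumed 100 Mbit/s link
rtt_s = 0.040                        # assumed 40 ms round-trip time

bdp_bytes = bandwidth_bits_per_s / 8 * rtt_s
print("BDP = %.0f KiB" % (bdp_bytes / 1024))   # about 488 KiB

# A single TCP connection can only saturate the link if the window (and thus
# the socket buffer, e.g. the max value in net.ipv4.tcp_rmem) is at least this large.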
Possibly I have misunderstood. Maybe you're doing something like Bittorrent, where you need lots of connections. If so, you need to figure out how many connections you're actually using (try netstat or lsof). If that number is substantial, you might:
Have a lot of bandwidth, e.g., 100mbps+. In this case, you may actually need to up the ulimit -n. Still, ~1000 connections (default on my system) is quite a few.
Have network problems which are slowing down your connections (e.g., packet loss)
Have something else slowing you down, e.g., IO bandwidth, especially if you're seeking. Have you checked iostat -x?
Also, if you are using a consumer-grade NAT router (Linksys, Netgear, DLink, etc.), beware that you may exceed its abilities with thousands of connections.
I hope this provides some help. You're really asking a networking question.
To improve upon the answer given by @derobert:
You can determine what your OS connection limit is by catting nf_conntrack_max. For example:
cat /proc/sys/net/netfilter/nf_conntrack_max
You can use the following script to count the number of TCP connections to a given range of tcp ports. By default 1-65535.
This will confirm whether or not you are maxing out your OS connection limit.
Here's the script.
#!/bin/sh
# Pick the right awk for the platform.
OS=$(uname)
case "$OS" in
    'SunOS')
        AWK=/usr/bin/nawk
        ;;
    'Linux')
        AWK=/bin/awk
        ;;
    'AIX')
        AWK=/usr/bin/awk
        ;;
esac

# Count ESTABLISHED/TIME_WAIT connections (excluding loopback) whose port
# falls within the start..end range.
netstat -an | $AWK -v start=1 -v end=65535 ' $NF ~ /TIME_WAIT|ESTABLISHED/ && $4 !~ /127\.0\.0\.1/ {
    if ($1 ~ /\./)
        {sip=$1}
    else {sip=$4}

    if ( sip ~ /:/ )
        {d=2}
    else {d=5}

    split( sip, a, /:|\./ )

    if ( a[d] >= start && a[d] <= end ) {
        ++connections;
    }
}
END {print connections}'
At the application level, here are some things a developer can do:
From the server side:
Check whether your load balancer (if you have one) works correctly.
Turn slow TCP timeouts into fast, immediate 503 responses; if your load balancer works correctly, it should pick a working resource to serve, which is better than hanging there with unexpected error messages.
E.g., if you are using a Node server, you can use toobusy from npm.
The implementation would be something like:
var toobusy = require('toobusy');

app.use(function(req, res, next) {
  if (toobusy()) res.send(503, "I'm busy right now, sorry.");
  else next();
});
Why 503? Here are some good insights for overload:
http://ferd.ca/queues-don-t-fix-overload.html
We can do some work on the client side too:
Try to group calls into batches, reducing the traffic and the total number of requests between client and server (see the sketch after this list).
Try to build a caching mid-layer to absorb unnecessary duplicate requests.
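As a sketch of the batching idea from the first bullet above (names like RequestBatcher and send_batch are invented for this example, and the size/delay thresholds are arbitrary):

import threading

class RequestBatcher:
    # Collect individual calls and send them to the server as one batch.
    def __init__(self, send_batch, max_size=50, max_delay=0.1):
        self._send_batch = send_batch   # callable performing one request for many items
        self._max_size = max_size
        self._max_delay = max_delay
        self._items = []
        self._lock = threading.Lock()
        self._timer = None

    def add(self, item):
        with self._lock:
            self._items.append(item)
            if len(self._items) >= self._max_size:
                self._flush_locked()            # batch is full: send now
            elif self._timer is None:
                self._timer = threading.Timer(self._max_delay, self.flush)
                self._timer.start()             # otherwise send after a short delay

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if self._items:
            batch, self._items = self._items, []
            self._send_batch(batch)             # one request instead of len(batch) requests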
I'm trying to resolve this in 2022 on load balancers, and one way I found is to attach another IPv4 address (or eventually an IPv6 address) to the NIC, so the limit is now doubled. Of course, you need to configure the second IP for the service that is trying to connect to the machine (in my case, another DNS entry).
We have a home-brewed XMPP server, and I was asked what our server's MSL (Maximum Segment Lifetime) is.
What does it mean and how can I obtain it? Is it something in the Linux /proc TCP settings?
The MSL (Maximum Segment Lifetime) is the longest time (in seconds) that a TCP segment is expected to exist in the network. It most notably comes into play during the closing of a TCP connection -- the side that closes first sits in the TIME_WAIT state for 2 MSLs (conceptually a round trip to the end of the internet and back) to absorb any late packets before moving to CLOSED. During this time, the machine is holding resources for the mostly-closed connection. If a server is busy, the resources held this way can become an issue. One "fix" is to lower the MSL so that they are released sooner. Generally this works OK, but occasionally it can cause confusing failure scenarios.
On Linux (RHEL anyway, which is what I am familiar with), the "variable" /proc/sys/net/ipv4/tcp_fin_timeout is the 2*MSL value. It is normally 60 (seconds).
To see it, do:
cat /proc/sys/net/ipv4/tcp_fin_timeout
To change it, do something like:
echo 5 > /proc/sys/net/ipv4/tcp_fin_timeout
Here is a TCP STATE DIAGRAM. You can find the wait in question at the bottom.
You can also see a countdown timer for sockets using -o in netstat or ss, which helps show concrete numbers about how long things will wait. For instance, TIME_WAIT does NOT use tcp_fin_timeout (it is based on TCP_TIMEWAIT_LEN which is usually hardcoded to 60s).
cat /proc/sys/net/ipv4/tcp_fin_timeout
3
# See countdown timer for all TIME_WAIT sockets in 192.168.0.0-255
ss --numeric -o state time-wait dst 192.168.0.0/24
Netid  Recv-Q  Send-Q  Local Address:Port      Peer Address:Port
tcp 0 0 192.168.100.1:57516 192.168.0.10:80 timer:(timewait,55sec,0)
tcp 0 0 192.168.100.1:57356 192.168.0.10:80 timer:(timewait,25sec,0)
tcp 0 0 192.168.100.1:57334 192.168.0.10:80 timer:(timewait,22sec,0)
tcp 0 0 192.168.100.1:57282 192.168.0.10:80 timer:(timewait,12sec,0)
tcp 0 0 192.168.100.1:57418 192.168.0.10:80 timer:(timewait,38sec,0)
tcp 0 0 192.168.100.1:57458 192.168.0.10:80 timer:(timewait,46sec,0)
tcp 0 0 192.168.100.1:57252 192.168.0.10:80 timer:(timewait,7.436ms,0)
tcp 0 0 192.168.100.1:57244 192.168.0.10:80 timer:(timewait,6.536ms,0)
This looks like it can answer your question:
http://seer.support.veritas.com/docs/264886.htm
I suggest that you ask why someone asked you this and find out how that applies to XMPP.
TCP/IP Illustrated volume 1 is online and describes 2MSL in more detail: Here
MSL is also described in the TCP RFC 793, as mentioned on Wikipedia.