I have a small web application which opens a TCP socket connection, issues a command, reads the response and then closes the connection for every request to a particular REST endpoint.
I've started load testing the endpoint using Apache JMeter and am noticing that, after running for some time, I start seeing errors like "Cannot assign requested address". The code opening this connection is:
def lookup(word: String): Option[String] = {
  try {
    val socket = new Socket(InetAddress.getByName("localhost"), 2222)
    val out = new PrintStream(socket.getOutputStream)
    val reader = new BufferedReader(new InputStreamReader(socket.getInputStream, "utf8"))
    out.println("lookup " + word)
    out.flush()
    var curr = reader.readLine()
    var response = ""
    while (!curr.contains("SUCC") && !curr.contains("FAIL")) {
      response += curr + "\n"
      curr = reader.readLine()
    }
    socket.close()
    curr match {
      case code if code.contains(SUCCESS_CODE) => Some(response)
      case _ => None
    }
  }
  catch {
    case e: Exception => println("Got an exception " + e.getMessage); None
  }
}
When I run netstat I also see a lot of connections in the TIME_WAIT state, which implies to me that I am running out of ports in the ephemeral port range.
tcp6 0 0 localhost:54646 localhost:2222 TIME_WAIT
tcp6 0 0 localhost:54638 localhost:2222 TIME_WAIT
tcp6 0 0 localhost:54790 localhost:2222 TIME_WAIT
tcp6 0 0 localhost:54882 localhost:2222 TIME_WAIT
I am wondering what the best solution is for this issue. My current thought is to create a connection pool where connections to this service running on port 2222 can be reused by different HTTP requests, rather than creating a new connection for each one. Is this a sensible way of fixing the issue and making the application scale better? It seems like a lot of overhead to introduce and it definitely makes my application more complicated.
Are there any other solutions I'm not seeing for helping this application scale and overcoming this port issue? My web application is running in an Ubuntu Linux VM.
Yes, creating a connection pool is a good solution. However, an easier solution is to have the server close connections, rather than the client. In this case, the server's sockets, rather than the client's, would end up in TIME_WAIT state, so the client would not run out of ports. On the server side, connections in TIME_WAIT state would not make the server run out of ports because they all use the same local port.
To make sure that the connection is closed by the server, you need to read from the socket (on the client) until you reach the end-of-file condition. At that point it is safe to close the socket on the client side, because the server has already closed it. Of course, you also need to make sure that the server will actually close the socket, rather than, say, wait for another request.
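For illustration, here is a minimal sketch of what the client could look like under that approach, based on the lookup method from the question (the name lookupUntilEof is made up, and the SUCC/FAIL handling is omitted; it assumes the server closes the connection once it has written its full response):

import java.io.{BufferedReader, InputStreamReader, PrintStream}
import java.net.{InetAddress, Socket}

// Sketch only: assumes the server closes the connection after writing its
// final status line, so readLine() eventually returns null (end-of-file).
def lookupUntilEof(word: String): String = {
  val socket = new Socket(InetAddress.getByName("localhost"), 2222)
  try {
    val out = new PrintStream(socket.getOutputStream)
    val reader = new BufferedReader(new InputStreamReader(socket.getInputStream, "utf8"))
    out.println("lookup " + word)
    out.flush()

    val response = new StringBuilder
    var line = reader.readLine()
    while (line != null) {                 // null means the server has closed its end
      response.append(line).append("\n")
      line = reader.readLine()
    }
    response.toString
  } finally {
    // The server closed first, so the TIME_WAIT entry ends up on the server
    // side rather than on the client.
    socket.close()
  }
}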
Alternatively, if you have root access, there are some sysctl options that you can tweak:
net.ipv4.ip_local_port_range – range of ephemeral ports. Increase it to make more ports available for outgoing connections.
net.ipv4.tcp_tw_recycle – enable faster recycling of connections in TIME_WAIT state.
net.ipv4.tcp_tw_reuse – enable reuse of connections in TIME_WAIT state. Not recommended.
See man pages ip(7) and tcp(7) for more information.
Connection-pooling will solve this problem for you.
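To give a rough idea of what that means in practice, here is a minimal sketch of a fixed-size pool on the JVM (the name SocketPool and the pool size are made up; it assumes the service on port 2222 keeps a connection open across multiple requests, and a real pool would also need health checks, timeouts and reconnection logic):

import java.net.{InetAddress, Socket}
import java.util.concurrent.ArrayBlockingQueue

// Minimal sketch of a fixed-size socket pool: connections are created up
// front and handed out to callers, who return them instead of closing them.
class SocketPool(host: String, port: Int, size: Int) {
  private val pool = new ArrayBlockingQueue[Socket](size)
  (1 to size).foreach(_ => pool.put(new Socket(InetAddress.getByName(host), port)))

  // Blocks until a connection is free, runs f, then puts the socket back.
  def withSocket[A](f: Socket => A): A = {
    val socket = pool.take()
    try f(socket)
    finally pool.put(socket)   // re-use the connection instead of closing it
  }
}

Usage would be along the lines of sharing one pool across all HTTP requests, e.g. val pool = new SocketPool("localhost", 2222, 8) and then pool.withSocket { s => /* write "lookup ...", read the response */ }.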
I want to point out that net.ipv4.tcp_tw_recycle has been completely removed in Linux 4.12, per commit 95a22caee396:
Linux will randomize timestamp offsets for each connection, making this option completely broken, with or without NAT.
When the remote host is a NAT device, the condition on timestamps will forbid all of the hosts behind the NAT device except one from connecting during one minute, because they do not share the same timestamp clock. If in doubt, it is far better to disable this option, since it leads to problems that are difficult to detect and difficult to diagnose.
I'm using Nginx and it seems that TCP sockets are not properly released by Nginx. Clients which connect to my Nginx go through a proxy, so the same 4-tuple (source IP, source port, destination IP, destination port) can be re-used within a very short period (less than 1 minute). When this occurs, Nginx seems to get lost.
Here is what I can see in a tcpdump trace :
- FIN,ACK initiated by Nginx to close the session
- ACK from the client
- FIN,ACK from the client
- ACK from the server
If the client tries to reconnect very rapidly (in less than 1 minute) with the same 4-tuple, it fails. The client sends a SYN TCP packet but Nginx replies with an ACK containing an unknown sequence number (the sequence number is very high and bears no relation to the previous TCP session).
If the same 4-tuplet is re-used after more than 1 minute, there is no problem.
Thanks in advance to anyone who has an idea how to solve this problem.
Aurélien
I am not familiar with Nginx, but in general, TCP sockets can remain in the TIME_WAIT state for up to several minutes after being closed, in order to catch stray out-of-order packets. The 4-tuple cannot be reused until the TIME_WAIT state expires.
See:
What are the CLOSE_WAIT and TIME_WAIT states?
The TIME-WAIT state in TCP and Its Effect on Busy Servers
arangod runs for some time without any problems, but at some point no more connections can be made.
arangosh then shows the following error message:
Error message 'Could not connect to 'tcp://127.0.0.1:8529' 'connect() failed with #99 - Cannot assign requested address''
In the log file arangod still writes more trace information.
After restarting arangod it runs without problems again, until the problem suddenly reoccurs.
Why is this happening?
Since this question was sort of answered by time, I'll use this answer to elaborate on how to dig into such a situation and get a useful analysis of which operating system parameters to look at. I'll base this on Linux targets.
First we need to find out what's currently going on, using the netstat tool as the root user (we only care about TCP ports):
netstat -alnpt
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
...
tcp 0 0 0.0.0.0:8529 0.0.0.0:* LISTEN 3478/arangod
tcp 0 0 127.0.0.1:45218 127.0.0.1:8529 ESTABLISHED 6902/arangosh
tcp 1 0 127.0.0.1:46985 127.0.0.1:8529 CLOSE_WAIT 485/arangosh
We see an overview of the three possible connection states:
LISTEN: These are daemon processes offering TCP services to remote ends, in this case the arangod process with its server socket. It binds port 8529 on all available IPv4 addresses of the system (0.0.0.0) and accepts connections from any remote location (0.0.0.0:*).
ESTABLISHED: This is an active TCP connection, in this case between arangosh and arangod; arangosh has its client port (45218) in the higher range and connects to arangod on port 8529.
CLOSE_WAIT: This is a connection in a termination state: the remote end has already closed its side, and the local process has not yet closed its own socket. It's normal to have a few of them. (Connections that the local side closes first instead pass through TIME_WAIT, which the TCP stack keeps around for a while so it knows where to sort stray TCP packets that may have been sent but did not arrive on time.)
As you can see, TCP ports are 16-bit unsigned integers, ranging from 0 to 65535. Server sockets start from the lower end, and most operating systems require processes to run as root in order to bind ports below 1024. Client sockets start from the upper end and range down to a specified limit on the client. Since multiple clients can connect to one server, it is usually the client-side ports that wear out, even though the server port range looks narrow. If the client frequently closes and reopens connections, you may see many sockets lingering in TIME_WAIT (or CLOSE_WAIT) state; as many discussions across the net hint, these are the symptoms of a system eventually running out of resources. In general, the solution to this problem is to re-use existing connections through the keep-alive feature.
The Solaris ndd command explains thoroughly which parameters it may modify in the Solaris kernel and with which consequences; the terms explained there are fairly generic to TCP sockets and can be found on many other operating systems in other forms - on Linux, which we focus on here, through the /proc/sys/net filesystem.
Some valuable switches there are:
ipv4/ip_local_port_range This is the range for local (client) ports. You can try to narrow it and use arangob --keep-alive false to explore what happens when your system runs out of these ports.
time wait (often shortened to tw) is the group of settings that controls what the TCP stack should do with already closed sockets in the TIME_WAIT state. The Linux kernel can do a trick here: it can re-use connections in that state for new connections. Vincent Bernat explains very nicely which screws to turn and what the different kernel parameters mean.
So once you have decided to change some of the values in /proc so that your host scales better to the given situation, you need to make them persistent across reboots, since /proc is volatile and won't remember values after a reboot.
Most Linux systems therefore offer /etc/sysctl.conf (or /etc/sysctl.d/); it maps slashes in the proc filesystem to dots, so /proc/sys/net/ipv4/tcp_tw_reuse translates into net.ipv4.tcp_tw_reuse.
I have a problem opening a listening port on localhost in a heavily loaded production system.
Sometimes requests to my port 44000 fail. During that time I tried telnet to the port and got no response, and I would like to understand what happens underneath. Is the application that is listening on the port failing to respond to the request, or is it a problem on the kernel side or with the number of open files?
I would be thankful if someone could explain the operations underneath opening a socket.
Let me clarify more. I have a Java process which accepts stateful connections from 12 different servers; the requests are stateful SOAP messages. This service has been running for a year without this problem. Recently we are facing a problem where a connection from the source to my server on port 44000 is sometimes not possible. As I checked, during that time telnet to the service is not possible even from the local server, but all other ports respond fine. They all run as the same user, and the number of allowed open files is much bigger than the total currently in use (lsof | wc -l).
As I understand it, there is a mechanism in the application that limits the number of connections from a source to 450 concurrent sessions, and the problem seems likely to occur when I hit the maximum number of connections (but not every time).
My application vendor doesn't accept that this problem is on his side and points to the OS/network/hardware configuration. To be honest, I restarted the network service and the problem was immediately solved for this particular port. Any ideas?
Here's a quick overview of the steps needed to set up a server-side TCP socket in Linux:
socket() creates a new socket and allocates system resources to it (*)
bind() associates a socket with an address
listen() causes a bound socket to enter a listening state
accept() accepts a received incoming attempt, and creates a new socket for this connection. (*)
(It's explained quite clearly and in more detail on Wikipedia.)
(*): These operations allocate an entry in the file descriptor table and will fail if it's full. However, most applications fork and there shouldn't be issues unless the number of concurrent connections you are handling is in the thousands (see the C10K problem).
If a call fails for this or any other reason, errno will be set to report the error condition (e.g., to EMFILE if the descriptor table is full). Most applications will report the error somewhere.
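As a point of reference, the same sequence looks roughly like this from the JVM (a sketch only; port 44000 is taken from your question and the backlog of 128 is a placeholder): java.net.ServerSocket wraps socket()/bind()/listen(), and accept() hands back a new socket per client.

import java.net.{InetSocketAddress, ServerSocket}

val server = new ServerSocket()                   // socket(): allocate the descriptor
server.bind(new InetSocketAddress(44000), 128)    // bind() + listen() with a backlog of 128
while (true) {
  val client = server.accept()                    // accept(): one new socket per connection
  new Thread(() => {
    try {
      // ... read the request and write a response ...
    } finally client.close()
  }).start()
}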
Back to your application, there are multiple reasons that could explain why it isn't responding. Without providing more information about what kind of service you are trying to set up, we can only guess. Try testing if you can telnet consistently, and see if the server is overburdened.
Cheers!
Your description leaves room for interpretation, but as discussed above, maybe your problem is that your application, after being terminated and restarted, is trying to re-use the same socket port while it is still in the TIME_WAIT state.
You can set your socket options to reuse the same address (and port) this way:
#include <sys/socket.h>   /* socket(), setsockopt(), SO_REUSEADDR */

int srv_sock;
int i = 1;                /* non-zero enables the option */
srv_sock = socket(AF_INET, SOCK_STREAM, 0);
setsockopt(srv_sock, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i));
Basically, you are telling the OS that the same socket address and port combination can be re-used without waiting out the TIME_WAIT period (twice the Maximum Segment Lifetime, MSL), which can be several minutes.
This does not permit re-using the socket while it is still in use; it only applies to the TIME_WAIT state. Apparently there is some small possibility of data arriving from previous connections, though, but you can (and should anyway) design your application protocol to cope with unintelligible data.
More information for example here: http://www.unixguide.net/network/socketfaq/4.5.shtml
Starting the TCP server with sudo may solve it, or, if necessary, edit the firewall rules (if you are connecting over the LAN).
Try scanning the ports with nmap (e.g. a TCP SYN scan) or a similar tool to see if the port is open to any protocol (maybe network security drops pings etc. so hosts don't show up). If the port isn't responsive, check the privileges used by the program and check the firewall rules; maybe the port is open but you can't get to it.
I mean, you are talking about an enterprise network, so I'm assuming you are in a LAN environment and are just testing against localhost but need it to work on the LAN.
Anyway, if you just need to open a localhost port, check privileges and routing, try a "tracert" and see what happens, and so on.
Oh, and check whether the port is used by a higher-privilege service or daemon.
Anyway, I see now that this is a 2014 post. Happy coding!
I have a listening socket on port 80 on Ubuntu Linux.
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 12248/nginx
Is there any way to get backlog value of that socket (backlog value that was sent to listen() call)?
I know that I could look at the nginx configuration, but the configuration file could have been changed without reloading nginx, so the backlog argument in the configuration and in the actual listen() call could be different.
ss -lt gives this value in the Send-Q column.
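To make the relationship concrete, here is a small JVM sketch (the port 8080, the backlog of 128 and the name BacklogDemo are arbitrary, not taken from your nginx setup) showing where the backlog comes from and how ss reports it:

import java.net.ServerSocket

object BacklogDemo extends App {
  // The second constructor argument is the backlog handed to listen().
  val server = new ServerSocket(8080, 128)
  println(s"Listening on port ${server.getLocalPort} with a requested backlog of 128")
  // While this runs, `ss -ltn` should show the effective backlog (128, possibly
  // capped by net.core.somaxconn) in the Send-Q column for this socket.
  Thread.sleep(60000)
  server.close()
}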
I'd use the current backlog information to manage the number of connections received, because I can respond to the incoming connections and tell the sender to modify its connection interval, thus reducing (or increasing) load. I cannot control how many incoming connections I get, but I can control how frequently they connect, hence keeping the backlog down and preventing timeouts on incoming connections.
In my case this happens to be a feature of the firmware at the connection source, so it might be unique to my situation and not relevant to others.
There is no standard TCP API for getting the backlog. There is also no reason to need it: you created the socket and you put it into the listen state, so you should know what backlog you specified. The system is entitled to adjust it up or down, but even then there is nothing useful you can do with that information in your application.
I am seeing weird behavior on Linux where the remote end and the local end of a connection both show the same IP and port combination. Here is the netstat output:
netstat -anp | grep 6102
tcp 0 0 139.185.44.123:61020 0.0.0.0:* LISTEN 3361/a.out
tcp 0 0 139.185.44.123:61021 139.185.44.123:61021 ESTABLISHED 3361/a.out
Can anyone tell me if this is even possible? If yes, what could be the scenario?
A connection is identified by a 4-tuple ((source ip, source port), (target ip, target port)), and the source and target ports could conceivably be the same without any issues. This connection could even be established by one process, which would lead to the output you're seeing.
However, I just got bitten by a nasty bug where a client socket would try to connect to a server socket with a port number in the ephemeral port range on the same machine. The connect operation would be retried until it succeeded.
The retry feature was the issue: if the server application wasn't running AND the source port that got picked at random was the same as the target port (which is possible because the target port was in the ephemeral range), the client socket would CONNECT TO ITSELF (which was wreaking havoc on the internal logic of the client application, you can imagine).
Since the client was performing retries as quickly as possible, the 1-in-30,000 odds of this happening were hit quickly enough.
The following Python script reproduces it:
import socket

host = 'localhost'
port = 35911   # an arbitrary port in the ephemeral range
ctr = 0

while True:
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(5)
        ctr += 1
        s.connect((host, port))
        print("Connected to self after", ctr, "tries")
        break
    except socket.error as e:
        print(e)
        s.close()
        # Retry
Here is an imaginable scenario. The caller of connect could call bind before calling connect. The caller of connect could cooperate with the caller of listen, and intentionally bind to a port number 1 higher than the listening port number.
If the kernel proceeds to reuse the socket number for the listener, I'd call that a kernel bug.
When multi-threaded server software accepts a connection, it usually creates another socket, which communicates with the newly connected client in a separate thread, while the original server socket keeps listening for new clients in the original thread. In such cases the local ports of both sockets are equal, so there's no problem.
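A small sketch of that situation, using the listening port 61020 from your netstat output purely as an example: every socket returned by accept() reports the same local port as the listening socket, and only the remote (address, port) pair differs per client.

import java.net.ServerSocket

val server = new ServerSocket(61020)
while (true) {
  val client = server.accept()
  // The accepted socket shares local port 61020 with the listening socket;
  // clients are distinguished only by their remote (address, port) pair.
  println(s"listening port = ${server.getLocalPort}, " +
          s"accepted local port = ${client.getLocalPort}, " +
          s"remote = ${client.getRemoteSocketAddress}")
  client.close()
}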
It's a slightly odd case, called a TCP "active/active open". A socket has been opened, bound to a local port and then connected to itself, by using connect() with its own address as the destination.
Nothing weird about that. It has a 1 in 63k chance of happening. What you won't see is two such ESTABLISHED connections: that's impossible by the rules of TCP.