SYN packets dropped occasionally on Linux

We're running Debian with a 2.6.16 kernel and iptables enabled. The system runs a custom-made HTTP proxy, which is subjected to a mild load (it works fine under the same load at other sites). The setup comprises 4 servers behind a load balancer with a virtual IP, which in turn sits behind an array of 4 ISA 2004 machines, so the basic topology is:
Client -> ISA [1-4] -> Load Balancer -> Our Proxy [1-4] -> The Internet
Occasionally, an ISA will send us a SYN packet to which no SYN-ACK is sent. It tries again after 3 seconds, and a third time after another 6 seconds, after which it reports the proxy down and switches to a direct connection. During this time, meaning before, between, and after those 3 SYNs, other SYNs from the same ISA arrive and are answered successfully.
A very similar problem is reported by others (with no solution, however):
All of the reports come from CentOS, a flavor of Linux whose peculiarity is having iptables enabled by default.
http://www.linuxhelpforum.com/showthread.php?t=931912&mode=linear
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=16147
Almost the same, but a bit different:
http://www.linuxquestions.org/questions/linux-networking-3/tcp-handshake-fails-synack-ignored-by-system.-637171/
Also seems to be relevant:
http://groups.google.com/group/comp.os.linux.networking/browse_thread/thread/b1c000e2d65e0034
I suspect iptables is the culprit, but any additional feedback is welcome.

Look at the second parameter to the listen call, as mentioned in the first link you posted. It's the maximum number of pending (not yet accepted) connections. According to the listen(2) man page, if the protocol supports retransmission (TCP does), the connection request will be dropped when the queue is full, in the expectation that a later retransmission will create the connection once there is enough space in the queue again.
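For illustration, here is a minimal Python sketch (hypothetical port and backlog, not the asker's proxy) showing where that second parameter comes in and why a full queue produces exactly the symptom described:
import socket

# The second argument to listen() caps the queue of connections that
# have not been accept()ed yet. With a small backlog and a slow accept
# loop, a burst of SYNs overflows the queue; the kernel then drops the
# excess SYNs silently and relies on the client's retransmission,
# which matches the behavior seen from the ISA boxes.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))   # hypothetical port
srv.listen(2)                 # deliberately tiny backlog

while True:
    conn, addr = srv.accept()
    conn.close()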

Indeed, iptables turned out to be the culprit, via the rule that dropped INVALID packets. We still do not know for sure what made iptables consider those SYNs invalid (certainly not TIME_WAIT, since we had no traffic with the same source ports for at least 30 minutes prior to the drops).

Related

Linux refuses to open listening port from localhost

I have a problem opening a listening port from localhost in a heavily loaded production system.
Sometimes requests to my port 44000 fail. During that time, telnet to the port gets no response, and I would like to understand what happens underneath. Is the application listening on the port failing to respond, or is it a problem on the kernel side, or with the number of open files?
I would be thankful if someone could explain the underlying operations involved in opening a socket.
Let me clarify. I have a Java process which accepts stateful connections from 12 different servers; the requests are stateful SOAP messages. The service ran for a year without this problem. Recently we face a problem where connections from a source to my server on port 44000 are sometimes not possible. As I checked, during that time telnet to the service is not possible even from the local server, but all other ports respond fine. They all run as the same user, and the number of allowed open files is much larger than what is actually in use (lsof | wc -l).
As I understand it, there is a mechanism in the application that limits the number of connections from a source to 450 concurrent sessions, and the problem most likely occurs when I hit the maximum number of connections (but not every time).
My application vendor doesn't accept that this problem is on their side and points to OS / network / hardware configuration. To be honest, I restarted the network service and the problem was solved immediately for this particular port. Any ideas?
Here's a quick overview of the steps needed to set up a server-side TCP socket in Linux:
socket() creates a new socket and allocates system resources to it (*)
bind() associates a socket with an address
listen() causes a bound socket to enter a listening state
accept() accepts a received incoming connection attempt, and creates a new socket for this connection. (*)
(It's explained quite clearly and in more detail on Wikipedia.)
(*): These operations allocate an entry in the file descriptor table and will fail if it's full. However, most applications fork, and there shouldn't be issues unless the number of concurrent connections you are handling is in the thousands (see the C10K problem).
If a call fails for this or any other reason, errno will be set to report the error condition (e.g., to EMFILE if the descriptor table is full). Most applications will report the error somewhere.
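As a sketch of those four steps and the error reporting (in Python rather than the asker's Java, with port 44000 borrowed from the question):
import errno
import socket

# socket() -> bind() -> listen() -> accept(), with the EMFILE case
# from the explanation above made explicit.
try:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # allocates a descriptor
    srv.bind(("0.0.0.0", 44000))  # associate the socket with an address
    srv.listen(128)               # enter the listening state
    conn, peer = srv.accept()     # allocates a second descriptor per connection
except OSError as e:
    if e.errno == errno.EMFILE:
        print("file descriptor table is full")
    else:
        raise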
Back to your application, there are multiple reasons that could explain why it isn't responding. Without more information about the kind of service you are trying to set up, we can only guess. Try testing whether you can telnet consistently, and see if the server is overburdened.
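If it helps, a quick hedged probe along the lines of "telnet consistently" (same hypothetical host/port as above):
import socket
import time

# Attempt 100 connections to port 44000 and report any failure with
# its error; intermittent failures under load point at a full accept
# queue or descriptor exhaustion rather than a dead listener.
for i in range(100):
    try:
        s = socket.create_connection(("127.0.0.1", 44000), timeout=2)
        s.close()
    except OSError as e:
        print("attempt", i, "failed:", e)
    time.sleep(0.1)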
Cheers!
Your description leaves room for interpretation, but as discussed above, maybe your problem is that your terminated application is trying to re-use the same socket port while it is still in the TIME_WAIT state.
You can set your socket options to reuse the same address (and port) by this way:
#include <sys/socket.h>   /* socket(), setsockopt(), SOL_SOCKET, SO_REUSEADDR */
#include <netinet/in.h>   /* AF_INET */

int srv_sock;
int i = 1;
srv_sock = socket(AF_INET, SOCK_STREAM, 0);
/* Must be set before bind() to take effect. */
setsockopt(srv_sock, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i));
Basically, you are telling the OS that the same socket address & port combination can be re-used without waiting out the MSL (Maximum Segment Lifetime) timeout. This timeout can be several minutes.
This does not permit re-using the socket while it is still in actual use; it only applies to the TIME_WAIT state. Apparently there is some minor possibility of receiving data from previous connections, though. But you can (and should anyway) design your application protocol to cope with unintelligible data.
More information for example here: http://www.unixguide.net/network/socketfaq/4.5.shtml
Starting the TCP server with sudo may solve it; otherwise, edit the firewall rules (if you are connecting over the LAN).
Try scanning the ports with nmap (e.g. with a TCP SYN scan) or similar, to see if the port is open to any protocol (network security may drop pings etc. so that hosts don't show as up). If the port isn't responsive, check the privileges used by the program and check the firewall rules; maybe the port is up but you can't get to it.
I mean: you are talking about an enterprise network, so I'm supposing you are in a LAN environment; you are testing against localhost, but you need it to work on the LAN.
Anyway, if you just need to open a localhost port, check privileges and routing; try traceroute and see what happens, and so on...
Also check whether the port is used by a higher-privilege service or daemon.
I see now that this is a 2014 post; happy coding anyway.

Networking with Python: No response from IP Phone

I'm an Automation Developer and lately I've taken it upon myself to control an IP Phone on my desk (Cisco 7940).
I have a third party application that can control the IP phone with SCCP (Skinny) packets. Through Wireshark, I see that the application will send 4 unique SCCP packets and then receives a TCP ACK message.
SCCP is not very well known, but it looks like this:
Ethernet( IP( TCP( SCCP( ))))
Using a Python packet builder, Scapy, I've been able to send the same 4 packets to the IP Phone; however, I never get the ACK. In my packets, I have correctly set the sequence, port and acknowledgment values in the TCP header. The ID field in the IP header is also correct.
The only thing I can imagine being wrong is that it takes Python a little more than a full second to send the four packets, whereas the application takes significantly less time. I've tried raising the priority of the Python shell with no luck.
Does anyone have an idea why I may not be receiving the ACK back?
This website may be helpful in debugging why you aren't seeing the traffic you expect on your machine, and in taking steps to modify your environment to produce the desired output.
Normally, the Linux kernel takes care of setting up, sending and receiving network traffic. It automatically sets appropriate header values and even knows how to complete a TCP 3-way handshake. Using the kernel services in this way is using a "cooked" socket.
Scapy does not use these kernel services. It creates a "raw" socket. The entire TCP/IP stack of the OS is circumvented. Because of this, Scapy gives us complete control over the traffic. Traffic to and from Scapy will not be filtered by iptables. Also, we will have to take care of the TCP 3-way handshake ourselves.
http://www.packetlevel.ch/html/scapy/scapy3way.html
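To make that concrete, here is a hedged Scapy sketch of doing the 3-way handshake yourself (placeholder address, ports and sequence number, not the asker's actual SCCP payloads):
from scapy.all import IP, TCP, send, sr1

dst = "192.0.2.10"            # placeholder for the phone's address
sport, dport = 50000, 2000    # 2000/tcp is the usual SCCP port

# Send the SYN and wait for the phone's SYN-ACK.
syn = IP(dst=dst) / TCP(sport=sport, dport=dport, flags="S", seq=1000)
synack = sr1(syn, timeout=2)
if synack is None:
    raise SystemExit("no SYN-ACK (port closed, or the kernel reset it?)")

# Complete the handshake; every later segment must carry correct
# seq/ack values or the phone will never ACK the SCCP payload.
ack = IP(dst=dst) / TCP(sport=sport, dport=dport, flags="A",
                        seq=synack.ack, ack=synack.seq + 1)
send(ack)

# Caveat: the local kernel knows nothing about this raw-socket
# connection and may answer the phone's SYN-ACK with a RST of its own;
# a common workaround is an iptables rule dropping the kernel's
# outgoing RSTs for this port.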

Dropping of connections with tcp_tw_recycle

Summary of the problem
We have a setup in which a lot (800 to 2400 per second) of incoming connections reach a Linux box, and there is a NAT device between the clients and the server.
As a result, a great many TIME_WAIT sockets are left in the system.
To overcome that we set tcp_tw_recycle to 1, but that led to incoming connections being dropped.
After browsing the net we found references explaining why frames are dropped when tcp_tw_recycle is combined with a NAT device.
Resolution tried
We then tried setting tcp_tw_reuse to 1; it worked fine, without any issues, with the same setup and configuration.
But the documentation says that tcp_tw_recycle and tcp_tw_reuse should not be used when connections go through TCP-state-aware nodes, such as firewalls, NAT devices or load balancers, as frames may get dropped. The more connections there are, the more likely you are to see this issue.
Queries
1) Can tcp_tw_reuse be used in this type of scenario?
2) If not, which part of the Linux code prevents tcp_tw_reuse from being used in such a scenario?
3) Generally, what is the difference between tcp_tw_recycle and tcp_tw_reuse?
By default, when both tcp_tw_reuse and tcp_tw_recycle are disabled, the kernel will make sure that sockets in TIME_WAIT state will remain in that state long enough -- long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.
When you enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps (a.k.a. PAWS, for Protection Against Wrapped Sequence Numbers), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
When you enable tcp_tw_recycle, the kernel becomes much more aggressive, and will make assumptions on the timestamps used by remote hosts. It will track the last timestamp used by each remote host having a connection in TIME_WAIT state, and allow a socket to be re-used if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e., warps back in time), the SYN packet will be silently dropped, and the connection won't establish (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.
Now, timestamps should never go back in time; unless:
the host is rebooted (but then, by the time it comes back up, TIME_WAIT socket will probably have expired, so it will be a non issue);
the IP address is quickly reused by something else (TIME_WAIT connections will stay a bit, but other connections will probably be struck by TCP RST and that will free up some space);
network address translation (or a smarty-pants firewall) is involved in the middle of the connection.
In the latter case, you can have multiple hosts behind the same IP address, and therefore different sequences of timestamps (or said timestamps are randomized at each connection by the firewall). In that case, some hosts will randomly be unable to connect, because they are mapped to a port for which the TIME_WAIT bucket of the server has a newer timestamp. That's why the docs tell you that "NAT devices or load balancers may start dropping frames because of the setting".
Some people recommend to leave tcp_tw_recycle alone, but enable tcp_tw_reuse and lower tcp_fin_timeout. I concur :-)
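As a minimal sketch of that recommendation (the procfs paths are the standard locations; the values are illustrative, and writing them requires root):
# Leave tcp_tw_recycle alone; enable tcp_tw_reuse and lower
# tcp_fin_timeout, per the advice above.
def set_sysctl(name, value):
    with open("/proc/sys/net/ipv4/" + name, "w") as f:
        f.write(str(value))

set_sysctl("tcp_tw_reuse", 1)      # reuse TIME_WAIT sockets for new outgoing connections
set_sysctl("tcp_fin_timeout", 30)  # shorten the FIN-WAIT-2 timeout (default 60s)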

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts within a few seconds.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that has to handle many short TCP connections left in a TIME_WAIT state.
As described here, TCP_TW_RECYCLE could cause some problems when using load balancers...
EDIT (to add some warnings ;) ):
As mentioned in the comment by @raittes, the "problems when using load balancers" concern public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
Modifying kernel options must be seen as a last resort, and should generally be avoided unless you know what you are doing... and if that were the case, you would not be asking for help here. Hence, I would advise against it.
The most suitable piece of advice I can provide is pointing out the part describing what a network connection is: quadruplets (client address, client port, server address, server port).
If you can make the available ports pool bigger, you will be able to accept more concurrent connections (see the sketch after this list):
Client address & client ports you cannot multiply (out of your control)
Server ports: you can change these only by tweaking a kernel parameter; this is less critical than changing TCP buckets or reuse, provided you know how many ports to leave available for other processes on your system
Server addresses: adding addresses to your host and balancing traffic on them:
behind L4 systems already sized for your load, or
directly resolving your domain name to multiple IP addresses (hoping the load will be shared across addresses, through DNS for instance)
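To illustrate the quadruplet point, here is a small loopback sketch (hypothetical, not a sizing tool): ten connections share one server port and differ only in the client's ephemeral port.
import socket

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # let the kernel pick a free port
srv.listen(64)
port = srv.getsockname()[1]

clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(10)]
accepted = [srv.accept() for _ in range(10)]

# Same server address and port for all ten connections; the kernel
# tells them apart by the client side of the quadruplet.
print({c.getsockname()[1] for c in clients})  # ten distinct ephemeral ports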
According to the VMware document, the main difference is that TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps, however it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure. This allows a new client inbound connection from the source IP to the server. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the Time-Wait state has aged out in its entirety.

Intercept traffic above the transport layer

Firstly, I'm relatively new to network programming. I want to intercept and delay HTTP traffic before it gets to the server application. I've delved into libnetfilter_queue, which gives me all the information I need to delay suitably, but at too low a level. I can delay traffic there, but unless I accept the IP datagrams almost immediately (i.e., send them up the stack exactly when I want to delay them), they will get resent (when no ACK arrives), which isn't what I want.
I don't want or need to have to deal with TCP, just the payloads it delivers. So my question is how do I intercept traffic on a particular port before it reaches its destination, but after TCP has acknowledged and checked it?
Thanks
Edit: Hopefully it's obvious from the tag and libnetfilter_queue - this is for Linux
Hijack the connections through an HTTP proxy. Google up a good way to do this if you can't just set HTTP_PROXY on the client, or set up your filter running with the IP and port number of the current server, moving the real server to another IP.
So the actual TCP connections are between the client and you, then from you to the server. Then you don't have to deal with ACKs, because TCP always sees mission accomplished.
edit: I see the comments on the original already came up with this idea using iptables to redirect the traffic through your transparent proxy process on the same machine.
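A minimal sketch of that proxy idea (hypothetical listen port, upstream address and delay): the client completes TCP with us, we complete TCP with the server, and the delay is applied purely to the relayed payload.
import socket
import threading
import time

LISTEN_PORT = 8080               # hypothetical
SERVER_ADDR = ("192.0.2.1", 80)  # the real server, moved or reached directly
DELAY = 0.5                      # seconds to hold client->server payload

def relay(src, dst, delay=0.0):
    # Both TCP connections ACK on their own; the sleep delays only the
    # application payload, which is exactly what is wanted here.
    while True:
        data = src.recv(4096)
        if not data:
            dst.close()
            return
        time.sleep(delay)
        dst.sendall(data)

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("", LISTEN_PORT))
srv.listen(16)
while True:
    client, _ = srv.accept()
    upstream = socket.create_connection(SERVER_ADDR)
    threading.Thread(target=relay, args=(client, upstream, DELAY), daemon=True).start()
    threading.Thread(target=relay, args=(upstream, client), daemon=True).start()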
Well, I've done what I suggested in my comment, and it works, even if it did feel like a long-winded way of doing it.
The (or a) problem is that the web server now, understandably, thinks that every request comes from localhost. Really I would like this delay to be transparent to both client and server (except in time of course!). Is there anything I can do about this?
If not, what are the implications? Each HTTP session happens through a different port; is that enough for them to be separated completely, as they should be? Presumably so, considering it works behind a NAT, where the address for many sessions is the same.
