TCP — delivery of packets sent before client closes connection - linux

Let’s say a client opens a TCP connection to a server.
Say the client sends 100 packets.
10 of them reached the server and were picked up by the application.
50 of them reached the server but have not yet been picked up by the application.
40 are still sitting in the client’s socket buffer because the server’s receive window is full.
Let’s say the client now closes the connection.
Question —
Does the application get the 50 packets before it is told that the connection is closed?
Does the client kernel send the remaining 40 packets to the server before it sends the FIN packet?
Now to complicate matters, if there is a lot of packet loss, what happens to the remaining 40 packets and the FIN? Does the connection still get closed?

Does the application get the 50 packets before it is told that the connection is closed?
It does.
Does the client kernel send the remaining 40 packets to the server before it sends the FIN packet?
It does.
Now to complicate matters, if there is a lot of packet loss, what happens to the remaining 40 packets and the FIN? Does the connection still get closed?
The kernel will keep trying to send the outstanding data. The fact that you closed the socket doesn't change things, unless you altered socket options to change this behaviour.
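The socket option this alludes to is SO_LINGER: with l_onoff=1 and l_linger=0, close() aborts the connection instead, discarding whatever is still queued and sending an RST rather than a FIN. A minimal sketch, assuming fd is an already-connected TCP socket (the abort_connection helper is only illustrative):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Abortive close: discard unsent data and send RST instead of FIN. */
    void abort_connection(int fd)
    {
        struct linger lin = { .l_onoff = 1, .l_linger = 0 };
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
        close(fd);  /* queued data is dropped; the peer sees ECONNRESET */
    }

Without this, the default close() behaviour is exactly as described above: the FIN is queued behind the unsent data and the kernel keeps retransmitting until everything, including the FIN, is acknowledged or the retransmission limit is reached.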

Related

When is a TCP connection considered idle?

I have a requirement to enable TCP keepalive on any connections and now I am struggling with the results from our test case. I think this is because I do not really understand when the first keepalive probe is sent. I read the following in the documentation for tcp_keepalive_time on Linux:
the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
Some other sources state that this is the time a connection is idle, but they do not further define what this means. I also looked into Stevens to find a more formal definition of this, because I am wondering what "the last data packet sent" actually means when considering retransmissions.
In my test case, I have a connection where data is only sent from a server to a client at rather high rates. To test keepalive, we unplugged the cable on the client's NIC. I can now see that the network stack tries to send the data and enters the retransmission state, but no keep alive probe is sent. Is it correct that keep alive probes are not sent during retransmission?
I have a connection where data is only sent from a server to a client
at rather high rates.
Then you'll never see keepalives. Keepalives are sent when there is "silence on the wire". RFC1122 has some explanation re keepalives.
A "keep-alive" mechanism periodically probes the other end of a
connection when the connection is otherwise idle, even when there is
no data to be sent
Back to your question:
Some other sources state that this is the time a connection is idle,
but they do not further define what this means.
This is how long TCP will wait before poking the peer "hoy! still alive?".
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
In other words, you've been using a TCP connection and it has been great. However, for the past 2 hours there hasn't been anything to send. Is it reasonable to assume the connection is still alive? Is it reasonable to assume all the middleboxes in the middle still have state about your connection? Opinions vary and keepalives aren't part of RFC793.
The TCP specification does not include a keep-alive mechanism because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth ("if no one is using the connection, who cares if it is still good?")
To test keepalive, we unplugged the cable on the client's NIC.
This isn't testing keepalive. This is testing your TCP's retransmit strategy, i.e. how many times and how often TCP will try to get your message across. On a Linux box this (likely) ends up testing net.ipv4.tcp_retries2:
How many times to retry before killing alive TCP connection. RFC 1122 says that the limit should be longer than 100 sec. It is too small number. Default value 15 corresponds to 13-30min depending on RTO.
But RFC5482 - TCP User Timeout Option provides more ways to influence it.
The TCP user timeout controls how long transmitted data may remain
unacknowledged before a connection is forcefully closed.
Back to the question:
Is it correct that keepalive probes are not sent during retransmission?
It makes sense: TCP is already trying to elicit a response from the other peer, an empty keepalive would be superfluous.
Linux-specific (2.4+) options to influence keepalive
TCP_KEEPCNT The maximum number of keepalive probes TCP should send before dropping the connection.
TCP_KEEPIDLE The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the socket option SO_KEEPALIVE has been set on this socket.
TCP_KEEPINTVL The time (in seconds) between individual keepalive probes.
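A minimal sketch of setting these per socket with setsockopt(); the enable_keepalive helper and the values (probe after 60 s idle, every 10 s, give up after 5 unanswered probes) are purely illustrative, not defaults:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Enable keepalive on one connection and override the system-wide
     * tcp_keepalive_* defaults for this socket only. */
    void enable_keepalive(int fd)
    {
        int on = 1, idle = 60, intvl = 10, cnt = 5;

        setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
    }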
Linux-specific (2.6.37+) option to influence TCP User Timeout
TCP_USER_TIMEOUT The maximum amount of time in milliseconds that transmitted data may remain unacknowledged before TCP will forcibly close the connection.
So for example your application could use this option to determine how long the connection survives when there is no connectivity (similar to your NIC-unplugging example). E.g. if you have reason to believe the client will come back (perhaps they closed the laptop lid? spotty wireless access?) you can specify a timeout of 12 hours and when they do come back the connection will still function.
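A sketch of setting that 12-hour timeout per socket; the set_user_timeout helper is illustrative, and on older libc versions the TCP_USER_TIMEOUT constant may only be available via linux/tcp.h rather than netinet/tcp.h:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Give up on a connection whose data stays unacknowledged for 12 hours. */
    void set_user_timeout(int fd)
    {
        unsigned int timeout_ms = 12u * 60 * 60 * 1000;  /* milliseconds */
        setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms, sizeof(timeout_ms));
    }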

TCP retransmission timer overrides/kills TCP keepalive timer, delaying disconnect discovery

Machine - linux, 3.10.19 kernel
This is in a large distributed system, there are several servers and clients (on same as well as different nodes/machines) having TCP connections with each other.
Test case:
The node/machine running the client program is switched off (on purpose, as the test case), and the only way for the server to know about this disconnection is via the keepalive timer (idle time=40 sec, 4 probes, probe time=10 sec).
Good case:
This works fine in most cases: the server learns that the client has gone down within [40,70] sec.
Bad case:
But I am hitting another unique situation: while the keepalive timer is running, the server tries sending some data to the client, and this in turn starts the TCP retransmission timer, which overrides/kills the keepalive timer. It takes ~15 min for the retransmission timer to detect that the other end is not there anymore.
15 min is a lot of time for the server to realize this. I am looking for ways others handle such a situation. Do I need to tweak my retransmission timer values?
Thanks!
There is a completely separate configuration for retransmission timeouts.
From Linux's tcp.7 man page:
tcp_retries2 (integer; default: 15; since Linux 2.2)
The maximum number of times a TCP packet is retransmitted in
established state before giving up. The default value is 15, which
corresponds to a duration of approximately between 13 to 30 minutes,
depending on the retransmission timeout. The RFC 1122 specified
minimum limit of 100 seconds is typically deemed too short.
This is likely the value you'll want to adjust to change how long it takes to detect if your connection has vanished.
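System-wide it is a sysctl, so it is normally changed with sysctl -w net.ipv4.tcp_retries2=8 (or an /etc/sysctl.conf entry) rather than from application code; the value 8 roughly matches the RFC 1122 minimum of about 100 seconds. A sketch of doing the same thing programmatically, illustrative only, since it needs root and affects every TCP connection on the machine:

    #include <stdio.h>

    /* Lower the established-state retransmission retry limit (system-wide). */
    int set_tcp_retries2(int retries)
    {
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_retries2", "w");
        if (!f)
            return -1;
        fprintf(f, "%d\n", retries);
        return fclose(f);
    }

An alternative that only affects your own sockets is the TCP_USER_TIMEOUT option shown earlier, which bounds how long unacknowledged data may linger before the connection is dropped.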
I have the same issue with a linux kernel release 4.3.0-1-amd64:
I used a server and a client, connected to the same switch.
The TCP keep-alive mechanism works correctly for client and server in the following cases:
When no message is sent between the cable disconnection and the socket disconnection (by the TCP keep-alive mechanism).
When the cable is disconnected between the client/server and the switch (which sets the link state to down), even if the client/server application tries to send a message.
When the wire is unplugged on the other side of the switch, TCP keep-alive frames are transmitted until an application message is sent. Then TCP retransmission frames are sent and TCP keep-alive frames stop being sent, which prevents the socket from being closed.

Overhead of Idle WebSockets

Say I have a WebSocket that may receive an event at any time but is mostly idle. How much bandwidth will be consumed after the initial connection in order to keep it alive?
For what it's worth, the server is NodeJS using ws, and the client is using QtWebSockets.
Thanks!
Once established (meaning the three-way handshake completes), a raw TCP connection uses zero bandwidth unless:
You send or receive data
TCP keepalives are explicitly enabled
A server or client can enable TCP keepalives. A keepalive is a zero-length packet sent with the ACK flag set, and is only 54 bytes sent on the wire plus another 54 for the response. By default, TCP keepalives are sent every two hours. In other words, completely negligible.
WebSockets also have their own keepalive mechanism (to deal with proxies). The server or the client may send a PING frame, and the other end MUST respond with a PONG frame. While there is no browser-side JS API to send PINGs, a node server may send them and a compliant browser will automatically respond. (QtWebSockets does have an API to send PINGs.) This does not happen by default; you must do it manually. WebSocket PING and PONG frames are at least 7 bytes and at most 131 bytes each (plus 54 bytes of TCP/IP overhead). Thus a single PING/PONG costs between 122 and 370 bytes.
ws does not do any keepalives automatically, nor does QtWebSockets. So to answer your question, a default configuration does use zero bandwidth to maintain the TCP connection.
However...
Keepalives are important because intermediate devices (i.e. NAT routers) will drop inactive TCP connections. Depending on the networks between your server and your clients, this means that if you don't have any keepalives, your clients will lose their connections (possibly without knowing it, which is bad) and will have to re-establish the WebSocket session. This will likely be far more expensive in terms of bandwidth than enabling sane keepalives.
A PING/PONG every 5 minutes costs 1.5 - 4.4 kB per hour (per client).
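That figure is just the frame sizes above multiplied out: 12 exchanges per hour × 122 bytes ≈ 1.5 kB at the minimum frame size, and 12 × 370 bytes ≈ 4.4 kB at the maximum.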
Note: socket.io has its own keepalive mechanism (separate from WebSockets') that is enabled by default. The sio keepalive is managed by the sio library and happens over the WebSocket data channel (as opposed to WebSocket PING/PONG, which are control frames). The socket.io server sends an 8 byte (+overhead) keepalive message every 30 seconds, which amounts to about 15 kB per hour.

Linux application doesn't ACK the FIN packet and retransmission occurs

I have a server running on Linux (kernel 3.0.35) and a client running on Windows 7. Every time the client connects to the server, a TCP retransmission occurs:
Here is the traffic from wireshark:
http://oi39.tinypic.com/ngsnyb.jpg
What I think happens is this:
The client (192.168.1.100) connects to the server (192.168.1.103) and the connection is successful. At some point the client decides to close the connection (FIN, ACK), but the server doesn't ACK the FIN.
Then the client starts a new connection; that connection is ACKed and is successful. In the meantime the Windows kernel continues to retransmit the FIN, ACK packet and finally decides to do a reset.
At the moment the second connection is established, I don't receive the data that the client is sending (the packet with a 16-byte payload) on the server side; I receive these bytes only after the RST packet.
On the server side I'm using the poll() function to check for POLLIN events, and I'm not notified of any data until the RST packet arrives.
So does anyone know why this situation is happening?
Your data bytes are not sent on that 52687 connection but rather the following 52690 connection. I'd guess that the server app is accepting only one connection at a time (the kernel will accept them in advance and just hold the data) and thus not seeing data from the second connection until the first connection is dead and it moves on to the next.
That doesn't explain why your FIN is not being ACK'd. It should. Perhaps there is something in the kernel that doesn't like open-then-close-with-no-data connections? Maybe some kind of attack mitigation? Firewall rules?
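If the single-connection theory is right, the usual fix is to put the listening socket into the same poll() set as the accepted connections, so the server notices data on a new connection while an old one is still open. An illustrative sketch only (the serve helper, the fd limit and the buffer handling are placeholders, not the asker's actual code):

    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_FDS 64

    /* Poll the listening socket together with every accepted connection. */
    void serve(int listen_fd)
    {
        struct pollfd fds[MAX_FDS];
        int nfds = 1;
        char buf[4096];

        fds[0].fd = listen_fd;
        fds[0].events = POLLIN;

        for (;;) {
            if (poll(fds, nfds, -1) < 0)
                break;

            /* New connection ready: accept it and add it to the poll set. */
            if ((fds[0].revents & POLLIN) && nfds < MAX_FDS) {
                int conn = accept(listen_fd, NULL, NULL);
                if (conn >= 0) {
                    fds[nfds].fd = conn;
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            }

            /* Service every connection with pending data or a pending close. */
            for (int i = 1; i < nfds; i++) {
                if (!(fds[i].revents & (POLLIN | POLLHUP)))
                    continue;
                ssize_t n = read(fds[i].fd, buf, sizeof(buf));
                if (n <= 0) {              /* peer closed or error */
                    close(fds[i].fd);
                    fds[i] = fds[--nfds];  /* compact the array; revisit this slot */
                    i--;
                } /* else: process n bytes from buf */
            }
        }
    }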

Tcp connections hang on CLOSE_WAIT status

The client closes the socket first. When there is not much data from the server, the TCP connection shutdown proceeds normally:
FIN -->
<-- ACK
<-- FIN, ACK
ACK -->
When the server is busy sending data:
FIN -->
<-- ACK,PSH
RST -->
And the server connection goes to the CLOSE_WAIT state and hangs there for a long time.
What's the problem here? Is it client related or server related? This happens on RedHat 5 for local sockets.
This article talks about why the RST is sent, but I do not know why the server connection is stuck in CLOSE_WAIT and does not send a FIN out.
[EDIT] I left out the most important information: this happens on qemu's slirp network emulation. It seems to be a slirp bug in handling connection close.
This means that there is unread data left in the stream that the client hasn't finished reading.
You can force it off by using the SO_LINGER option; it is documented in the Linux socket(7) man page, and Win32 has a matching setsockopt() option.
It's the server side that is remaining open, so it's on the server side that you can try disabling SO_LINGER.
It may mean that the server hasn't closed the socket. You can easily tell this by using "lsof" to list the file descriptors open by that process which will include TCP sockets. The fix is to have the process always close the socket when it's finished (even in error cases etc)
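Concretely, CLOSE_WAIT means the kernel has received and acknowledged the peer's FIN and is now waiting for the application to call close(); the connection stays parked there until it does. A minimal sketch of that pattern (the handle_connection helper is only illustrative): treat recv() returning 0 as end-of-stream and close immediately, which sends the server's own FIN and lets the connection leave CLOSE_WAIT:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Drain a connection and close it as soon as the peer's FIN is seen. */
    void handle_connection(int fd)
    {
        char buf[4096];
        ssize_t n;

        while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
            /* ...process n bytes of request data... */
        }
        close(fd);  /* sends our FIN; the socket leaves CLOSE_WAIT */
    }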
This is a known defect in qemu.
