TCP keep-alive parameters not being honoured - linux

I am experimenting with TCP keep-alive on my Linux box and have written the following small server:
#include <iostream>
#include <cstring>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   // inet_pton
#include <netinet/tcp.h> // TCP_KEEPIDLE, TCP_KEEPCNT, TCP_KEEPINTVL
#include <netdb.h>       // addrinfo stuff

using namespace std;

typedef int SOCKET;

int main(int argc, char *argv[])
{
    struct sockaddr_in sockaddr_IPv4;
    memset(&sockaddr_IPv4, 0, sizeof(struct sockaddr_in));
    sockaddr_IPv4.sin_family = AF_INET;
    sockaddr_IPv4.sin_port = htons(58080);
    if (inet_pton(AF_INET, "10.6.186.24", &sockaddr_IPv4.sin_addr) != 1)
        return -1;

    SOCKET serverSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (serverSock == -1)
        return -1;

    if (bind(serverSock, (sockaddr*)&sockaddr_IPv4, sizeof(sockaddr_IPv4)) != 0 || listen(serverSock, SOMAXCONN) != 0)
    {
        cout << "Failed to setup listening socket!\n";
        return -1;
    }

    SOCKET clientSock = accept(serverSock, 0, 0);
    if (clientSock == -1)
        return -1;

    // Enable keep-alive on the client socket
    const int nVal = 1;
    if (setsockopt(clientSock, SOL_SOCKET, SO_KEEPALIVE, &nVal, sizeof(nVal)) < 0)
    {
        cout << "Failed to set keep-alive!\n";
        return -1;
    }

    // Get the keep-alive options that will be used on the client socket
    int nProbes, nTime, nInterval;
    socklen_t nOptLen = sizeof(int);
    bool bError = false;
    if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPIDLE, &nTime, &nOptLen) < 0) { bError = true; }
    nOptLen = sizeof(int);
    if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPCNT, &nProbes, &nOptLen) < 0) { bError = true; }
    nOptLen = sizeof(int);
    if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPINTVL, &nInterval, &nOptLen) < 0) { bError = true; }

    cout << "Keep alive settings are: time: " << nTime << ", interval: " << nInterval << ", number of probes: " << nProbes << "\n";

    if (bError)
    {
        // Failed to retrieve values
        cout << "Failed to get keep-alive options!\n";
        return -1;
    }

    // Block until the peer closes the connection (or the connection is aborted)
    int nRead = 0;
    char buf[128];
    do
    {
        nRead = recv(clientSock, buf, 128, 0);
    } while (nRead > 0);

    return 0;
}
I then adjusted the system-wide TCP keep alive settings to be as follows:
# cat /proc/sys/net/ipv4/tcp_keepalive_time
20
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
30
I then connected to my server from Windows, and ran a Wireshark trace to see the keep-alive packets. The image below shows the result.
This confused me, since I now understand the keep-alive interval to only come into play if no ACK is received in response to the original keep-alive packet (see my other question here). So I would expect all subsequent packets, not just the first one, to be sent consistently at 20-second intervals, rather than the 30 we see.
I then adjusted the system wide settings as follows:
# cat /proc/sys/net/ipv4/tcp_keepalive_time
30
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
20
This time when I connect, I see the following in my Wireshark trace:
Now we see that the first keep-alive packet is sent after 30 seconds, but each one thereafter is also sent at 30-second intervals, not the 20 that the previous run would suggest!
Can someone please explain this inconsistent behaviour?

Roughly speaking, this is how it is supposed to work: a keepalive message is sent every tcp_keepalive_time seconds. If an ACK is not received, the connection is then probed every tcp_keepalive_intvl seconds. If an ACK is still not received after tcp_keepalive_probes probes, the connection is aborted. Thus, a connection is aborted after at most
tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
seconds without a response. See this kernel documentation.
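With the usual Linux defaults (tcp_keepalive_time = 7200, tcp_keepalive_probes = 9, tcp_keepalive_intvl = 75), for example, that works out to 7200 + 9 × 75 = 7875 seconds, i.e. roughly two hours and eleven minutes before an unresponsive peer is detected.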
We can easily watch this work using netcat keepalive, a version of netcat that allows us to set TCP keepalive parameters (the sysctl keepalive parameters are the defaults, but they can be overridden on a per-socket basis in the tcp_sock struct).
First start up a server listening on port 8888, with keepalive_time set to 5 seconds, keepalive_intvl set to 1 second, and keepalive_probes set to 4.
$ ./nckl-linux -K -O 5 -I 1 -P 4 -l 8888 >/dev/null &
Next, let's use iptables to introduce loss for ACK packets sent to the server:
$ sudo iptables -A OUTPUT -p tcp --dport 8888 \
> --tcp-flags SYN,ACK,RST,FIN ACK \
> -m statistic --mode random --probability 0.5 \
> -j DROP
This will cause packets that are sent to TCP port 8888 with just the ACK flag set to be dropped with probability 0.5.
Now let's connect and watch with the vanilla netcat (which will use the sysctl keepalive values):
$ nc localhost 8888
Here is the capture:
As you can see, it waits 5 seconds after receiving an ACK before sending another keepalive message. If it doesn't receive an ACK within 1 second, it sends another probe, and if it doesn't receive an ACK after 4 probes, it aborts the connection. This is exactly how keepalive is supposed to work.
So let's try to reproduce what you were seeing. Let's delete the iptables rule (no loss), start a new server with tcp_keepalive_time set to 1 second, and tcp_keepalive_intvl set to 5 seconds, and then connect with a client. Here is the result:
Interestingly, we see the same behavior you did: after the first ACK, it waits 1 second to send a keepalive message, and thereafter every 5 seconds.
Let's add the iptables rule back in to introduce loss to see what time it actually waits to send another probe if it doesn't get an ACK (using -K -O 1 -I 5 -P 4 on the server):
Again, it waits 1 second from the first ACK to send a keepalive message, but thereafter it waits 5 seconds whether it sees an ACK or not, as if keepalive_time and keepalive_intvl are both set to 5.
In order to understand this behavior, we need to take a look at the Linux kernel's TCP implementation. Let's first look at tcp_finish_connect:
if (sock_flag(sk, SOCK_KEEPOPEN))
inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
When the TCP connection is established, the keepalive timer is effectively set to tcp_keepalive_time, which is 1 second in our case.
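For reference, keepalive_time_when and keepalive_intvl_when simply return the per-socket value if one was set with setsockopt, and otherwise fall back to the sysctl default. Roughly paraphrased from include/net/tcp.h (the exact form varies by kernel version):

static inline int keepalive_intvl_when(const struct tcp_sock *tp)
{
        return tp->keepalive_intvl ? : sysctl_tcp_keepalive_intvl;
}

static inline int keepalive_time_when(const struct tcp_sock *tp)
{
        return tp->keepalive_time ? : sysctl_tcp_keepalive_time;
}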
Next, let's take a look at how the timer is processed in tcp_keepalive_timer:
elapsed = keepalive_time_elapsed(tp);

if (elapsed >= keepalive_time_when(tp)) {
        /* If the TCP_USER_TIMEOUT option is enabled, use that
         * to determine when to timeout instead.
         */
        if ((icsk->icsk_user_timeout != 0 &&
            elapsed >= icsk->icsk_user_timeout &&
            icsk->icsk_probes_out > 0) ||
            (icsk->icsk_user_timeout == 0 &&
            icsk->icsk_probes_out >= keepalive_probes(tp))) {
                tcp_send_active_reset(sk, GFP_ATOMIC);
                tcp_write_err(sk);
                goto out;
        }
        if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
                icsk->icsk_probes_out++;
                elapsed = keepalive_intvl_when(tp);
        } else {
                /* If keepalive was lost due to local congestion,
                 * try harder.
                 */
                elapsed = TCP_RESOURCE_PROBE_INTERVAL;
        }
} else {
        /* It is tp->rcv_tstamp + keepalive_time_when(tp) */
        elapsed = keepalive_time_when(tp) - elapsed;
}

sk_mem_reclaim(sk);

resched:
        inet_csk_reset_keepalive_timer (sk, elapsed);
        goto out;
When keepalive_time_when is greater than keepalive_intvl_when, this code works as expected. However, when it is not, you see the behavior you observed.
When the initial timer (set when the TCP connection is established) expires after 1 second, we will extend the timer until elapsed is greater than keepalive_time_when. At that point we will send a probe, and will set the timer to keepalive_intvl_when, which is 5 seconds. When this timer expires, if nothing has been received for the last 1 second (keepalive_time_when), we will send a probe, and then set the timer again to keepalive_intvl_when, and wake up in another 5 seconds, and so on.
However, if we have received something within keepalive_time_when when the timer expires, it will use keepalive_time_when to reschedule the timer for 1 second since the last time we received anything.
So, to answer your question: the Linux implementation of TCP keepalive assumes that keepalive_intvl is less than keepalive_time, but nevertheless works "sensibly."
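In practice, then, if you override the parameters per socket, keep TCP_KEEPINTVL smaller than TCP_KEEPIDLE. A minimal sketch (not from the question; sock is assumed to be an already created TCP socket, and error handling is reduced to a return code):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int enable_keepalive(int sock)
{
    /* idle > intvl, matching the assumption the kernel code makes */
    int on = 1, idle = 30, intvl = 5, cnt = 4;

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}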

Related

AF-XDP: Is there a bug regarding small packets?

Is there a known (or maybe unknown) bug regarding the size of packets in the AF-XDP socket framework (+ libbpf)?
I am experiencing a strange packet loss for my application:
IPv4/UDP/RTP packet stream with all packets being the same size (1442 bytes): no packet loss
IPv4/UDP/RTP packet stream where pretty much all packets are the same size (1492 bytes) except a special "marker" packet (only 357 bytes but they are also IPv4/UDP-packets): all marker packets get lost
I added a bpf_printk statement in my XDP kernel program:
const int len = bpf_ntohs(iph->tot_len);
if(len < 400) {
bpf_printk("FOUND PACKET LEN < 400: %d.\n", len);
}
This output is never observed via sudo cat /sys/kernel/debug/tracing/trace_pipe. So these small RTP marker packets aren't even received by my kernel filter; no wonder I don't receive them in userspace.
ethtool -S <if> shows me this counter: rx_256_to_511_bytes_phy. It is increasing at roughly the rate at which marker packets should arrive (about 30/s). So my NIC does receive the packets, but my XDP program doesn't - why?
Any idea what could be the cause of this problem?
First, bpf_printk() doesn't always work for me. You may want to take a look at this snippet (kernel-space code):
// Nicer way to call bpf_trace_printk()
#define bpf_custom_printk(fmt, ...) \
({ \
char ____fmt[] = fmt; \
bpf_trace_printk(____fmt, sizeof(____fmt), \
##__VA_ARGS__); \
})
// print:
bpf_custom_printk("This year is %d\n", 2020);
// output: sudo cat /sys/kernel/debug/tracing/trace_pipe
Second: maybe the packet entered a different NIC queue. You may want to use vanilla code from xdp-tutorial and add the kernel tracing from the snippet above to print the size of the packet, then compile and run the example program with -q 1 for queue number 1, for example.
A way to get the size of a packet:
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
size_t size_pkt = data_end - data; // note: data_end - data, not data - data_end
bpf_custom_printk("Packet size %d\n", (int)size_pkt);
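To check the second point (which RX queue the packet actually arrived on), the queue index is exposed in the XDP context as rx_queue_index on reasonably recent kernels. A small sketch using the same bpf_custom_printk macro as above (ctx is the struct xdp_md * argument of the XDP program):

void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
if (data_end - data < 400)
    bpf_custom_printk("small pkt, len=%d, rxq=%d\n",
                      (int)(data_end - data), ctx->rx_queue_index);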

tcp keepalive - Protocol not available?

I'm trying to set TCP keepalive, but in doing so I'm seeing the error "Protocol not available":
int rc = setsockopt(s, SOL_SOCKET, TCP_KEEPIDLE, &keepalive_idle, sizeof(keepalive_idle));
if (rc < 0)
printf("error setting keepalive_idle: %s\n", strerror(errno));
I'm able to turn on keepalive and set the keepalive interval and count, but keepalive idle (which is the keepalive time) throws that error, and I never see any keepalive packets being transmitted or received, either with Wireshark using the filter tcp.analysis.keep_alive or with tcpdump:
sudo tcpdump -vv "tcp[tcpflags] == tcp-ack and less 1"
Is there a kernel module that needs to be loaded or something? Or are you no longer able to override the global KEEPIDLE time?
By the way, here is the output of
matt@devpc:~$ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
In an application that I coded, the following works:
setsockopt(*sfd, SOL_SOCKET, SO_KEEPALIVE,(char *)&enable_keepalive, sizeof(enable_keepalive));
setsockopt(*sfd, IPPROTO_TCP, TCP_KEEPCNT, (char *)&num_keepalive_strobes, sizeof(num_keepalive_strobes));
setsockopt(*sfd, IPPROTO_TCP, TCP_KEEPIDLE, (char *)&keepalive_idle_time_secs, sizeof(keepalive_idle_time_secs));
setsockopt(*sfd, IPPROTO_TCP, TCP_KEEPINTVL, (char *)&keepalive_strobe_interval_secs, sizeof(keepalive_strobe_interval_secs));
Try changing SOL_SOCKET to IPPROTO_TCP for TCP_KEEPIDLE.
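Applied to the snippet from the question, that would look like this (keepalive_idle being the idle time in seconds, as before):

int rc = setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE,
                    &keepalive_idle, sizeof(keepalive_idle));
if (rc < 0)
    printf("error setting keepalive_idle: %s\n", strerror(errno));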
There is a very handy lib that can help you, called libkeepalive: http://libkeepalive.sourceforge.net/
It can be used with LD_PRELOAD in order to enable and control keep-alive on all TCP sockets. You can also override keep-alive settings with environment variables.
I tried to run a tcp server with it:
KEEPIDLE=5 KEEPINTVL=5 KEEPCNT=100 LD_PRELOAD=/usr/lib/libkeepalive.so nc -l -p 4242
Then I connected a client:
nc 127.0.0.1 4242
And I visualized the traffic with Wireshark: the keep-alive packets began exactly after 5 seconds of inactivity (my system-wide setting is 75). So it is indeed possible to override the system settings.
Here is how libkeepalive sets TCP_KEEPIDLE:
if((env = getenv("KEEPIDLE")) && ((optval = atoi(env)) >= 0)) {
setsockopt(s, SOL_TCP, TCP_KEEPIDLE, &optval, sizeof(optval));
}
Looks like they use SOL_TCP (which on Linux is the same as IPPROTO_TCP) instead of SOL_SOCKET.

Linux Serial Port Blocked using termios.h configuration

I'm writing an embedded Linux application that (1) opens a serial connection to another device, (2) sends a known command, (3) checks port for incoming characters (response) until expected response phrase or character is detected, (4) repeats step 2 and 3 until a series of commands are sent and responses received, (5) then closes the port.
My app would go through several cycles of the above sequence, and every now and then, while it was waiting for a response (reading), the communication would suddenly stop and my software would fault out because of my built-in timeout logic.
Is there anything in my port configuration that would cause the port to be blocked due to specific byte sent (possibly due to electrical noise)?
Here is how I'm opening my ports (showing configurations via termios.h):
struct termios options;
fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY | O_NONBLOCK);
if (fd == -1) {
debug() << "Port open failed!"
return FAIL;
}
debug() << "Port Opened Successful"
fcntl(fd, F_SETFL, 0); // This setting interacts with VMIN and VTIME below
// Get options
tcgetattr(fd, &options);
// Adjust Com port options
options.c_cflag |= (CLOCAL | CREAD); // Program will not "own" port, enable reading on port
options.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG); // Sets RAW input mode (does not treat input as a line of text with CR/LF ending)
options.c_oflag &= ~ OPOST; // Sets RAW output mode (avoids newline mapping to CR+LF characters)
options.c_iflag &= ~(IXON | IXOFF | IXANY); // Turns off SW flow control
options.c_cc[VMIN] = 0;
options.c_cc[VTIME] = 10;
// Set options
tcsetattr(fd, TCSANOW, &options);
//return fd;
return SUCCEED;
I can't figure out why the communication all of a sudden just freezes up, and why the problem goes away when I cycle power to my device. Thanks all!
More info - here are my read and write functions:
int Comm::Receive(unsigned char* rBuf)
{
    int bytes;
    ioctl(fd, FIONREAD, &bytes);
    if (bytes >= 1)
    {
        bytes = read(fd, rBuf, 1);
        if (bytes < 0)
            return READ_ERR;
        return SUCCEED;
    }
    else
        return NO_DATA_AVAILABLE;
}

int Comm::Send(int xCt, unsigned char* xBuf)
{
    int bytes;
    if (fd == -1)
        return FAIL;
    bytes = write(fd, xBuf, xCt);
    if (bytes != xCt)
        return FAIL;
    else
        return SUCCEED;
}
Welcome to the joys of serial ports...
Thought 1: wrap your read calls with a select() (see the sketch after these thoughts)
Thought 2: Unset the ICANON flag in tcsetattr, and set a VTIME attribute for a deliberate timeout (and then, obviously, handle it)
Thought 3: Nothing about serial comms ever works perfectly.
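For Thought 1, here is a minimal sketch of a read wrapped in select() with a one-second timeout (fd is assumed to be the already opened serial port descriptor):

#include <sys/select.h>
#include <unistd.h>

// Returns the number of bytes read, 0 on timeout, -1 on error.
ssize_t read_with_timeout(int fd, void *buf, size_t len)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 }; // 1-second timeout

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;              // 0 = timeout, -1 = error (check errno)
    return read(fd, buf, len);  // data is available, so this read will not block
}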
I also had a similar problem sending commands to a device and reading its responses. Please refer to the SO post below; the answer there solved my problem.
In these cases, we have to pay attention to the protocol used for device communication (send and receive). If we can send commands successfully but receive no response (or only noise) from the device, it implies there is something wrong with the data packet that was sent. First check the protocol specification, then create a byte array for a simple command (like making a beep sound) and send it.
Send data to a barcode scanner over RS232 serial port
I may be able to help further if you post your complete source code with the output.
Enjoy the code. Thanks.

Reduce TCP maximum segment size (MSS) in Linux on a socket

In a special application in which our server needs to update the firmware of low-on-resource sensor/tracking devices, we encountered a problem in which data is sometimes lost in the remote devices (clients) receiving packets of the new firmware. The connection is TCP/IP over a GPRS network. The devices use the SIM900 GSM chip as a network interface.
The problem possibly comes from the device receiving too much data. We tried reducing the traffic by sending packets less frequently, but sometimes the error still occurred.
We contacted the local retailer of the SIM900 chip, who is also responsible for technical support and possibly for contacting the Chinese manufacturer (SIMCom) of the chip. They said that we should first try to reduce the TCP MSS (Maximum Segment Size) of our connection.
In our server I did the following:
static int
create_master_socket(unsigned short master_port) {
    static struct sockaddr_in master_address;
    int master_socket = socket(AF_INET, SOCK_STREAM, 0);
    if (master_socket < 0) {
        perror("socket");
        throw runtime_error("Failed to create master socket.");
    }
    int tr = 1;
    if (setsockopt(master_socket, SOL_SOCKET, SO_REUSEADDR, &tr, sizeof(int)) == -1) {
        perror("setsockopt");
        throw runtime_error("Failed to set SO_REUSEADDR on master socket");
    }
    master_address.sin_family = AF_INET;
    master_address.sin_addr.s_addr = INADDR_ANY;
    master_address.sin_port = htons(master_port);
    uint16_t tcp_maxseg;
    socklen_t tcp_maxseg_len = sizeof(tcp_maxseg);
    if (getsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG, &tcp_maxseg, &tcp_maxseg_len)) {
        log_error << "Failed to get TCP_MAXSEG for master socket. Reason: " << errno;
        perror("getsockopt");
    } else {
        log_info << "TCP_MAXSEG: " << tcp_maxseg;
    }
    tcp_maxseg = 256;
    if (setsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG, &tcp_maxseg, tcp_maxseg_len)) {
        log_error << "Failed to set TCP_MAXSEG for master socket. Reason: " << errno;
        perror("setsockopt");
    } else {
        log_info << "TCP_MAXSEG: " << tcp_maxseg;
    }
    if (getsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG, &tcp_maxseg, &tcp_maxseg_len)) {
        log_error << "Failed to get TCP_MAXSEG for master socket. Reason: " << errno;
        perror("getsockopt");
    } else {
        log_info << "TCP_MAXSEG: " << tcp_maxseg;
    }
    if (bind(master_socket, (struct sockaddr*)&master_address,
             sizeof(master_address))) {
        perror("bind");
        close(master_socket);
        throw runtime_error("Failed to bind master_socket to port");
    }
    return master_socket;
}
Running the above code results in:
I0807 ... main.cpp:267] TCP_MAXSEG: 536
E0807 ... main.cpp:271] Failed to set TCP_MAXSEG for master socket. Reason: 22 setsockopt: Invalid argument
I0807 ... main.cpp:280] TCP_MAXSEG: 536
As you can see, the problem is in the second line of the output: setsockopt returns "Invalid argument".
Why does this happen? I read about some constraints on setting TCP_MAXSEG, but I did not find any report of behaviour like this.
Thanks,
Dennis
In addition to xaxxon's answer, I just wanted to note my experience with trying to force Linux to send TCP segments no larger than a certain size (lower than what they normally are):
The easiest way I found to do so, was to use iptables:
sudo iptables -A INPUT -p tcp --tcp-flags SYN,RST SYN --destination 1.1.1.1 -j TCPMSS --set-mss 200
This overwrites the remote incoming SYN/ACK packet on an outbound connection, and forces the MSS to a specific value.
Note 1: You do not see this in Wireshark, since Wireshark captures before this happens.
Note 2: iptables does not allow you to increase the MSS, only to lower it.
Alternatively, I also tried setting the socket option TCP_MAXSEG, as Dennis had done. After taking the fix from xaxxon's answer, this also worked.
Note: You should read the MSS value after the connection has been set up. Otherwise it returns the default value, which put me (and Dennis) on the wrong track.
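For example, querying TCP_MAXSEG on the connected socket rather than the listening one (a sketch; connfd stands for the descriptor returned by accept() or an already connected client socket):

int mss = 0;
socklen_t len = sizeof(mss);  /* plain int, see the next answer */
if (getsockopt(connfd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == 0)
    printf("negotiated MSS: %d\n", mss);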
Now finally, I also ran into a number of other things:
I ran into TCP-offloading issues: despite my MSS being set correctly, the frames being sent were still shown by Wireshark as too big. You can disable this feature with sudo ethtool -K eth0 tx off sg off tso off. This took me a long time to figure out.
TCP has lots of fancy features like Path MTU Discovery, which actually tries to dynamically increase the MSS. Fun and cool, but obviously confusing. I did not run into issues with it in my tests, though.
Hope this helps someone trying to do the same thing one day.
The setsockopt man page says: "Unless otherwise noted, optval is a pointer to an int." But you're passing a uint16_t, and I don't see anything saying that this particular parameter isn't an int.
edit: Yeah, here is the source code and you can see:
637 if (optlen < sizeof(int))
638 return -EINVAL;
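So passing a plain int (so that optlen is at least sizeof(int)) should make the call go through, e.g. (a sketch based on the code in the question):

int tcp_maxseg = 256;  /* int instead of uint16_t */
socklen_t tcp_maxseg_len = sizeof(tcp_maxseg);
if (setsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG,
               &tcp_maxseg, tcp_maxseg_len))
    perror("setsockopt");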

Linux serial port buffer not empty when opening device

I have a system where I am seeing strange, unexpected behavior from the serial ports. I've previously seen this on occasion with USB-to-serial adapters, but now I'm seeing it on native serial ports as well, with much greater frequency.
The system is set up to run automated tests and will first perform some tasks that cause a large amount of data to be outputted from the serial device while I do not have the ports open. The device will also reset itself. Only the tx/rx lines are connected. There is no flow control.
After these tasks complete, the testware opens the serial ports and immediately fails because it gets unexpected responses. When I reproduce this, I found that if I open the serial port in a terminal program, I see several kilobytes of old data (that appears to have been sent when the port was closed) immediately flushed out. Once I close this program, I can then run the tests as expected.
What could cause this to happen? How does Linux handle buffering the serial port when the device is closed? If I opened a device, made it send output, and then closed it without reading from it, would this cause the same problem?
The Linux terminal driver buffers input even if it is not opened. This can be a useful feature, especially if the speed/parity/etc. are set appropriately.
To replicate the behavior of lesser operating systems, read all pending input from the port as soon as it is open:
...
int fd = open ("/dev/ttyS0", O_RDWR | O_NOCTTY | O_SYNC);
if (fd < 0)
    exit (1);

set_blocking (fd, 0);   // disable reads blocked when no input ready

char buf [10000];
int n;
do {
    n = read (fd, buf, sizeof buf);
} while (n > 0);

set_blocking (fd, 1);   // enable read blocking (if desired)

...                     // now there is no pending input

void set_blocking (int fd, int should_block)
{
    struct termios tty;
    memset (&tty, 0, sizeof tty);
    if (tcgetattr (fd, &tty) != 0)
    {
        error ("error %d getting term settings set_blocking", errno);
        return;
    }

    tty.c_cc[VMIN] = should_block ? 1 : 0;
    tty.c_cc[VTIME] = should_block ? 5 : 0; // 0.5 seconds read timeout

    if (tcsetattr (fd, TCSANOW, &tty) != 0)
        error ("error setting term %sblocking", should_block ? "" : "no");
}
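Alternatively, if the stale data is simply unwanted, it can be discarded right after opening the port with tcflush(), which avoids the read loop (assuming the same fd as above):

#include <termios.h>

// Discard input that was received but not yet read.
tcflush(fd, TCIFLUSH);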
