tcp keepalive - Protocol not available? - linux

I'm trying to set TCP keepalive, but when I do I get the error
"Protocol not available":
int rc = setsockopt(s, SOL_SOCKET, TCP_KEEPIDLE, &keepalive_idle, sizeof(keepalive_idle));
if (rc < 0)
printf("error setting keepalive_idle: %s\n", strerror(errno));
I'm able to turn on keepalive and set the keepalive interval and count, but setting the keepalive idle time (the keepalive time) fails with that error. I also never see any keepalive packets being transmitted or received, either in Wireshark with the filter tcp.analysis.keep_alive or with tcpdump:
sudo tcpdump -vv "tcp[tcpflags] == tcp-ack and less 1"
Is there a kernel module that needs to be loaded or something? Or is it no longer possible to override the global KEEPIDLE time?
By the way, here is the output of sysctl:
matt#devpc:~/ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75

In an application that I coded, the following works:
setsockopt(*sfd, SOL_SOCKET, SO_KEEPALIVE,(char *)&enable_keepalive, sizeof(enable_keepalive));
setsockopt(*sfd, IPPROTO_TCP, TCP_KEEPCNT, (char *)&num_keepalive_strobes, sizeof(num_keepalive_strobes));
setsockopt(*sfd, IPPROTO_TCP, TCP_KEEPIDLE, (char *)&keepalive_idle_time_secs, sizeof(keepalive_idle_time_secs));
setsockopt(*sfd, IPPROTO_TCP, TCP_KEEPINTVL, (char *)&keepalive_strobe_interval_secs, sizeof(keepalive_strobe_interval_secs));
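Try changing SOL_SOCKET to IPPROTO_TCP for TCP_KEEPIDLE. Only SO_KEEPALIVE itself is set at the SOL_SOCKET level; the TCP_KEEP* options are set at the TCP level.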
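As a minimal sketch of the point above: option numbers are only meaningful within their level, so passing a TCP_* option with SOL_SOCKET either fails or silently sets an unrelated socket option. The helper names enable_keepalive and get_keepidle are made up for this illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on fd and set the three per-socket timers.
 * Returns 0 on success, or -1 with errno set by the failing call. */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    /* SO_KEEPALIVE is a generic socket option: level SOL_SOCKET. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

    /* The TCP_KEEP* options belong to TCP: level IPPROTO_TCP. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}

/* Read back the per-socket idle time in seconds, or -1 on error. */
int get_keepidle(int fd)
{
    int idle = 0;
    socklen_t len = sizeof(idle);

    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, &len) < 0)
        return -1;
    return idle;
}
```

Reading the value back with getsockopt, as get_keepidle does, is a cheap way to confirm that the override actually took effect on that socket.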

There is a very handy library that can help you, called libkeepalive: http://libkeepalive.sourceforge.net/
It can be used with LD_PRELOAD in order to enable and control keep-alive on all TCP sockets. You can also override keep-alive settings with environment variables.
I tried to run a tcp server with it:
KEEPIDLE=5 KEEPINTVL=5 KEEPCNT=100 LD_PRELOAD=/usr/lib/libkeepalive.so nc -l -p 4242
Then I connected a client:
nc 127.0.0.1 4242
And I visualized the traffic with Wireshark: the keep-alive packets began exactly after 5 seconds of inactivity (my system-wide setting is 75). So it is possible to override the system settings.
Here is how libkeepalive sets TCP_KEEPIDLE:
if((env = getenv("KEEPIDLE")) && ((optval = atoi(env)) >= 0)) {
setsockopt(s, SOL_TCP, TCP_KEEPIDLE, &optval, sizeof(optval));
}
It looks like they use SOL_TCP instead of SOL_SOCKET. (On Linux, SOL_TCP and IPPROTO_TCP have the same value.)

Related

EADDRNOTAVAIL even after using IP_FREEBIND?

I was under the impression that under Linux you could bind to a non-local address as long as you set the IP_FREEBIND socket option, but that's not the behavior I'm seeing:
$ sudo strace -e 'trace=%network' ...
...
socket(AF_INET, SOCK_RAW, IPPROTO_UDP) = 5
setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_NO_CHECK, [1], 4) = 0
setsockopt(5, SOL_IP, IP_HDRINCL, [1], 4) = 0
setsockopt(5, SOL_IP, IP_FREEBIND, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(abcd), sin_addr=inet_addr("w.x.y.z")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
...
I also set the ip_nonlocal_bind setting, just to be certain, and I get the same results.
$ sysctl net.ipv4.ip_nonlocal_bind
net.ipv4.ip_nonlocal_bind = 1
Unfortunately, it seems that it is not possible to bind a raw IP socket to a non-local, non-broadcast and non-multicast address, regardless of IP_FREEBIND. Since I see inet_addr("w.x.y.z") in your strace output, I assume that this is exactly what you're trying to do and w.x.y.z is a non-local unicast address, thus your bind syscall fails.
This seems in accordance with man 7 raw:
A raw socket can be bound to a specific local address using the
bind(2) call. If it isn't bound, all packets with the specified
IP protocol are received. In addition, a raw socket can be bound
to a specific network device using SO_BINDTODEVICE; see socket(7).
Indeed, looking at the kernel source code, in raw_bind() we can see the following check:
ret = -EADDRNOTAVAIL;
if (addr->sin_addr.s_addr && chk_addr_ret != RTN_LOCAL &&
chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST)
goto out;
Also, note that .sin_port must be 0. The .sin_port field for raw sockets is used to select a sending/receiving IP protocol (not a port, since we are at layer 3, where ports do not exist). As the manual states, from Linux 2.2 onwards you can no longer select a sending protocol through .sin_port; the sending protocol is the one set when creating the socket.
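For contrast, here is a rough sketch of where IP_FREEBIND does help: an ordinary TCP socket. The helper name try_bind is hypothetical, and the address 192.0.2.1 mentioned below is just a documentation address assumed not to be configured on the host.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind a TCP socket to ip (port 0 lets the kernel pick a port).
 * If freebind is nonzero, IP_FREEBIND is enabled before binding.
 * Returns 0 on success or a negative errno value. */
int try_bind(const char *ip, int freebind)
{
    struct sockaddr_in addr;
    int on = 1;
    int rc = 0;
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -EINVAL;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    if (freebind &&
        setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &on, sizeof(on)) < 0)
        rc = -errno;
    if (rc == 0 && bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        rc = -errno;

    close(fd);
    return rc;
}
```

On a typical host with net.ipv4.ip_nonlocal_bind = 0, I would expect try_bind("192.0.2.1", 0) to return -EADDRNOTAVAIL and try_bind("192.0.2.1", 1) to succeed. The raw_bind() check quoted above has no equivalent escape hatch in that kernel version, which is why the raw socket still fails.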

TCP keep-alive parameters not being honoured

I am experimenting with TCP keep alive on my Linux box, and have written the following small server:
#include <iostream>
#include <cstring>
#include <netinet/in.h>
#include <arpa/inet.h> // inet_ntop
#include <netinet/tcp.h>
#include <netdb.h> // addrinfo stuff
using namespace std;
typedef int SOCKET;
int main(int argc, char *argv [])
{
struct sockaddr_in sockaddr_IPv4;
memset(&sockaddr_IPv4, 0, sizeof(struct sockaddr_in));
sockaddr_IPv4.sin_family = AF_INET;
sockaddr_IPv4.sin_port = htons(58080);
if (inet_pton(AF_INET, "10.6.186.24", &sockaddr_IPv4.sin_addr) != 1)
return -1;
SOCKET serverSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (bind(serverSock, (sockaddr*)&sockaddr_IPv4, sizeof(sockaddr_IPv4)) != 0 || listen(serverSock, SOMAXCONN) != 0)
{
cout << "Failed to set up listening socket!\n";
return -1; // do not fall through to accept() on an unusable socket
}
SOCKET clientSock = accept(serverSock, 0, 0);
if (clientSock == -1)
return -1;
// Enable keep-alive on the client socket
const int nVal = 1;
if (setsockopt(clientSock, SOL_SOCKET, SO_KEEPALIVE, &nVal, sizeof(nVal)) < 0)
{
cout << "Failed to set keep-alive!\n";
return -1;
}
// Get the keep-alive options that will be used on the client socket
int nProbes, nTime, nInterval;
socklen_t nOptLen = sizeof(int);
bool bError = false;
if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPIDLE, &nTime, &nOptLen) < 0) { bError = true; }
nOptLen = sizeof(int);
if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPCNT, &nProbes, &nOptLen) < 0) {bError = true; }
nOptLen = sizeof(int);
if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPINTVL, &nInterval, &nOptLen) < 0) { bError = true; }
cout << "Keep alive settings are: time: " << nTime << ", interval: " << nInterval << ", number of probes: " << nProbes << "\n";
if (bError)
{
// Failed to retrieve values
cout << "Failed to get keep-alive options!\n";
return -1;
}
int nRead = 0;
char buf[128];
do
{
nRead = recv(clientSock, buf, 128, 0);
} while (nRead > 0); // stop on EOF (0) or on error (-1)
return 0;
}
I then adjusted the system-wide TCP keep alive settings to be as follows:
# cat /proc/sys/net/ipv4/tcp_keepalive_time
20
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
30
I then connected to my server from Windows, and ran a Wireshark trace to see the keep-alive packets. The image below shows the result.
This confused me, since I now understand the keep-alive interval to only come into play if no ACK is received in response to the original keep alive packet (see my other question here). So I would expect the subsequent packets to be consistently sent at 20 second intervals (not 30, which is what we see), not just the first one.
I then adjusted the system wide settings as follows:
# cat /proc/sys/net/ipv4/tcp_keepalive_time
30
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
20
This time when I connect, I see the following in my Wireshark trace:
Now we see that the first keep-alive packet is sent after 30 seconds, but each one thereafter is also sent at 30 seconds, not the 20 as would be suggested by the previous run!
Can someone please explain this inconsistent behaviour?
Roughly speaking, it is supposed to work like this: a keepalive probe is sent once the connection has been idle for tcp_keepalive_time seconds. If an ACK is not received, it then probes every tcp_keepalive_intvl seconds. If an ACK has still not been received after tcp_keepalive_probes probes, the connection is aborted. Thus, a connection will be aborted after at most
tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
seconds without a response. See this kernel documentation.
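As a quick worked example of that bound, with the common defaults of 7200, 9 and 75 it comes to 7200 + 9 * 75 = 7875 seconds, or a little over two hours and eleven minutes. The sketch below computes the same bound, either from explicit numbers or from the values in effect on a given socket; both function names are made up for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Upper bound, in seconds, before an unresponsive peer is dropped. */
long keepalive_worst_case(long time_s, long probes, long intvl_s)
{
    return time_s + probes * intvl_s;
}

/* Same bound, using the values currently in effect on a TCP socket.
 * Returns -1 if any of the options cannot be read. */
long socket_keepalive_worst_case(int fd)
{
    int time_s, probes, intvl_s;
    socklen_t len;

    len = sizeof(time_s);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &time_s, &len) < 0)
        return -1;
    len = sizeof(probes);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, &len) < 0)
        return -1;
    len = sizeof(intvl_s);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, &len) < 0)
        return -1;

    return keepalive_worst_case(time_s, probes, intvl_s);
}
```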
We can easily watch this work using netcat keepalive, a version of netcat that allows us to set TCP keepalive parameters. (The sysctl keepalive parameters are the defaults, but they can be overridden on a per-socket basis in the tcp_sock struct.)
First start up a server listening on port 8888 with keepalive_time set to 5 seconds, keepalive_intvl set to 1 second, and keepalive_probes set to 4.
$ ./nckl-linux -K -O 5 -I 1 -P 4 -l 8888 >/dev/null &
Next, let's use iptables to introduce loss for ACK packets sent to the server:
$ sudo iptables -A OUTPUT -p tcp --dport 8888 \
> --tcp-flags SYN,ACK,RST,FIN ACK \
> -m statistic --mode random --probability 0.5 \
> -j DROP
This will cause packets that are sent to TCP port 8888 with just the ACK flag set to be dropped with probability 0.5.
Now let's connect and watch with the vanilla netcat (which will use the sysctl keepalive values):
$ nc localhost 8888
Here is the capture:
As you can see, it waits 5 seconds after receiving an ACK before sending another keepalive message. If it doesn't receive an ACK within 1 second, it sends another probe, and if it doesn't receive an ACK after 4 probes, it aborts the connection. This is exactly how keepalive is supposed to work.
So let's try to reproduce what you were seeing. Let's delete the iptables rule (no loss), start a new server with tcp_keepalive_time set to 1 second, and tcp_keepalive_intvl set to 5 seconds, and then connect with a client. Here is the result:
Interestingly, we see the same behavior you did: after the first ACK, it waits 1 second to send a keepalive message, and thereafter every 5 seconds.
Let's add the iptables rule back in to introduce loss to see what time it actually waits to send another probe if it doesn't get an ACK (using -K -O 1 -I 5 -P 4 on the server):
Again, it waits 1 second from the first ACK to send a keepalive message, but thereafter it waits 5 seconds whether it sees an ACK or not, as if keepalive_time and keepalive_intvl are both set to 5.
In order to understand this behavior, we will need to take a look at the linux kernel TCP implementation. Let's first look at tcp_finish_connect:
if (sock_flag(sk, SOCK_KEEPOPEN))
inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
When the TCP connection is established, the keepalive timer is effectively set to tcp_keepalive_time, which is 1 second in our case.
Next, let's take a look at how the timer is processed in tcp_keepalive_timer:
elapsed = keepalive_time_elapsed(tp);
if (elapsed >= keepalive_time_when(tp)) {
/* If the TCP_USER_TIMEOUT option is enabled, use that
* to determine when to timeout instead.
*/
if ((icsk->icsk_user_timeout != 0 &&
elapsed >= icsk->icsk_user_timeout &&
icsk->icsk_probes_out > 0) ||
(icsk->icsk_user_timeout == 0 &&
icsk->icsk_probes_out >= keepalive_probes(tp))) {
tcp_send_active_reset(sk, GFP_ATOMIC);
tcp_write_err(sk);
goto out;
}
if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
icsk->icsk_probes_out++;
elapsed = keepalive_intvl_when(tp);
} else {
/* If keepalive was lost due to local congestion,
* try harder.
*/
elapsed = TCP_RESOURCE_PROBE_INTERVAL;
}
} else {
/* It is tp->rcv_tstamp + keepalive_time_when(tp) */
elapsed = keepalive_time_when(tp) - elapsed;
}
sk_mem_reclaim(sk);
resched:
inet_csk_reset_keepalive_timer (sk, elapsed);
goto out;
When keepalive_time_when is greater than keepalive_intvl_when, this code works as expected. However, when it is not, you see the behavior you observed.
When the initial timer (set when the TCP connection is established) expires after 1 second, we will extend the timer until elapsed is greater than keepalive_time_when. At that point we will send a probe, and will set the timer to keepalive_intvl_when, which is 5 seconds. When this timer expires, if nothing has been received for the last 1 second (keepalive_time_when), we will send a probe, and then set the timer again to keepalive_intvl_when, and wake up in another 5 seconds, and so on.
However, if we have received something within keepalive_time_when when the timer expires, it will use keepalive_time_when to reschedule the timer for 1 second since the last time we received anything.
So, to answer your question, the Linux implementation of TCP keepalive assumes that keepalive_intvl is less than keepalive_time, but nevertheless works "sensibly."
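To make the rescheduling logic concrete, here is a small user-space model of it. This is not kernel code: it assumes an otherwise idle connection whose peer ACKs every probe immediately, and it leaves out the probe-count and TCP_USER_TIMEOUT branches. The function name simulate_keepalive is made up for this illustration.

```c
#include <stdio.h>

/* Fill probe_times[0..n-1] with the times, in seconds since the
 * connection was established, at which keepalive probes are sent.
 * Both keepalive_time and keepalive_intvl must be positive. */
void simulate_keepalive(int keepalive_time, int keepalive_intvl,
                        int n, int *probe_times)
{
    int now = keepalive_time; /* first timer is armed at connect time */
    int last_rcv = 0;         /* when anything was last received */
    int sent = 0;

    while (sent < n) {
        int elapsed = now - last_rcv;
        int next;

        if (elapsed >= keepalive_time) {
            probe_times[sent++] = now; /* timer fired: send a probe */
            last_rcv = now;            /* the peer ACKs it right away */
            next = keepalive_intvl;    /* re-arm with the interval */
        } else {
            /* Heard from the peer recently: wait out the remainder. */
            next = keepalive_time - elapsed;
        }
        now += next;
    }
}

/* Print the first n probe times for one parameter pair. */
void print_probe_times(int keepalive_time, int keepalive_intvl, int n)
{
    int times[16];
    int i;

    if (n > 16)
        n = 16;
    simulate_keepalive(keepalive_time, keepalive_intvl, n, times);
    for (i = 0; i < n; i++)
        printf("%d ", times[i]);
    printf("\n");
}
```

With (20, 30) the model yields probes at 20, 50 and 80 seconds, and with (30, 20) at 30, 60 and 90 seconds, which matches both traces described in the question. With (1, 5) it yields 1, 6 and 11, as in the experiment above.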

Why does the TCP_NODELAY option have no effect on Linux 3.2.40?

Why can't the following code turn off Nagle's algorithm?
int on = 1;
int ret = 0;
ret = setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
My program runs on Linux kernel 3.2.40 (possibly modified).
Thanks!
The following picture shows a packet captured by Wireshark; please focus on the red rectangle:
The strings BP01_2_S_1, BP01_3_S_1 and BP01_4_S_1 (sorry, I can't include the angle brackets)
should be carried in 3 separate TCP packets (I call `send' 3 times), but they are all in the same packet, which is not what I want.

Reduce TCP maximum segment size (MSS) in Linux on a socket

In a special application in which our server needs to update the firmware of low-on-resource sensor/tracking devices, we encountered a problem in which data is sometimes lost in the remote devices (clients) receiving packets of the new firmware. The connection is TCP/IP over a GPRS network. The devices use a SIM900 GSM chip as a network interface.
The problems possibly arise because the device receives too much data. We tried reducing the traffic by sending packets less frequently, but sometimes the error still occurred.
We contacted the local retailer of the SIM900 chip, who is also responsible for giving technical support and possibly contacting the Chinese manufacturer (SIMCom) of the chip. They said that we should first try to reduce the TCP MSS (Maximum Segment Size) of our connection.
In our server I did the following:
static int
create_master_socket(unsigned short master_port) {
static struct sockaddr_in master_address;
int master_socket = socket(AF_INET,SOCK_STREAM,0);
if(master_socket == -1) { // socket() returns -1 on failure, not 0
perror("socket");
throw runtime_error("Failed to create master socket.");
}
int tr=1;
if(setsockopt(master_socket,SOL_SOCKET,SO_REUSEADDR,&tr,sizeof(int))==-1) {
perror("setsockopt");
throw runtime_error("Failed to set SO_REUSEADDR on master socket");
}
master_address.sin_family = AF_INET;
master_address.sin_addr.s_addr = INADDR_ANY;
master_address.sin_port = htons(master_port);
uint16_t tcp_maxseg;
socklen_t tcp_maxseg_len = sizeof(tcp_maxseg);
if(getsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG, &tcp_maxseg, &tcp_maxseg_len)) {
log_error << "Failed to get TCP_MAXSEG for master socket. Reason: " << errno;
perror("getsockopt");
} else {
log_info << "TCP_MAXSEG: " << tcp_maxseg;
}
tcp_maxseg = 256;
if(setsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG, &tcp_maxseg, tcp_maxseg_len)) {
log_error << "Failed to set TCP_MAXSEG for master socket. Reason: " << errno;
perror("setsockopt");
} else {
log_info << "TCP_MAXSEG: " << tcp_maxseg;
}
if(getsockopt(master_socket, IPPROTO_TCP, TCP_MAXSEG, &tcp_maxseg, &tcp_maxseg_len)) {
log_error << "Failed to get TCP_MAXSEG for master socket. Reason: " << errno;
perror("getsockopt");
} else {
log_info << "TCP_MAXSEG: " << tcp_maxseg;
}
if(bind(master_socket, (struct sockaddr*)&master_address,
sizeof(master_address))) {
perror("bind");
close(master_socket);
throw runtime_error("Failed to bind master_socket to port");
}
return master_socket;
}
Running the above code results in:
I0807 ... main.cpp:267] TCP_MAXSEG: 536
E0807 ... main.cpp:271] Failed to set TCP_MAXSEG for master socket. Reason: 22 setsockopt: Invalid argument
I0807 ... main.cpp:280] TCP_MAXSEG: 536
As you can see, the problem is in the second line of the output: setsockopt returns "Invalid argument".
Why does this happen? I read about some constraints in setting TCP_MAXSEG but I did not encounter any report on such a behaviour as this.
Thanks,
Dennis
In addition to xaxxon's answer, I just wanted to note my experience with trying to force Linux to send TCP segments no larger than a certain size (lower than they normally are):
The easiest way I found to do so was to use iptables:
sudo iptables -A INPUT -p tcp --tcp-flags SYN,RST SYN --destination 1.1.1.1 -j TCPMSS --set-mss 200
This rewrites the incoming SYN/ACK packet from the remote host on an outbound connection and forces the MSS to a specific value.
Note 1: You do not see this in Wireshark, since Wireshark captures the packet before this happens.
Note 2: iptables does not allow you to increase the MSS, only to lower it.
Alternatively, I also tried setting the socket option TCP_MAXSEG, like Dennis had done. After applying the fix from xaxxon, this also worked.
Note: You should read the MSS value after the connection has been set up. Otherwise it returns the default value, which put me (and Dennis) on the wrong track.
Finally, I also ran into a number of other things:
I ran into TCP offloading issues, where despite my MSS being set correctly, the frames being sent were still shown by Wireshark as too big. You can disable this feature with: sudo ethtool -K eth0 tx off sg off tso off. This took me a long time to figure out.
TCP has lots of fancy features such as path MTU discovery, which try to dynamically increase the MSS. Fun and cool, but obviously confusing. I did not have issues with it in my tests, though.
Hope this helps someone trying to do the same thing one day.
Unless otherwise noted, optval is a pointer to an int.
But you're using a uint16_t. I don't see anything saying that this parameter isn't an int.
Edit: Yeah, here is the source code, and you can see:
if (optlen < sizeof(int))
return -EINVAL;
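Applied to the question's code, the fix is simply to pass an int. A minimal sketch, with the hypothetical helper names set_tcp_mss and get_tcp_mss:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Request a maximum segment size on a TCP socket. The option value
 * must be an int; a shorter type is rejected with EINVAL.
 * Returns 0 on success, or -1 with errno set. */
int set_tcp_mss(int fd, int mss)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}

/* Read the current TCP_MAXSEG value, or -1 on error. */
int get_tcp_mss(int fd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);

    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        return -1;
    return mss;
}
```

Call set_tcp_mss before connect() or listen() so that the value is in place when the connection is negotiated.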

Crafting an ICMP packet inside a Linux kernel Module

I'm trying to experiment with the ICMP protocol and have created a Linux kernel module that analyses ICMP packets (it processes a packet only if the ICMP code field is a magic number). To test this module, I have to create an ICMP packet and send it to the host where the analysing module is running. In fact, it would be nice if I could implement this in the kernel itself (as a module). I am looking for something like a packet crafter in the kernel. I googled it and found a lot of articles explaining the lifetime of a packet, rather than tutorials on creating one. User-space packet crafters would be my last resort, and only those that are highly flexible, where I'd be able to set the ICMP code and so on. And I'm not wary of kernel panics :-) Any packet crafting ideas are welcome.
Sir, I strongly advise you against using a kernel module to build ICMP packets.
You can use user-space raw sockets to craft ICMP packets, and even build the IP header itself byte by byte.
So you can get as flexible as you like that way.
Please take a look at this:
ip = (struct iphdr*) packet;
icmp = (struct icmphdr*) (packet + sizeof(struct iphdr));
/*
* here the ip packet is set up except checksum
*/
ip->ihl = 5;
ip->version = 4;
ip->tos = 0;
ip->tot_len = sizeof(struct iphdr) + sizeof(struct icmphdr);
ip->id = htons(random());
ip->ttl = 255;
ip->protocol = IPPROTO_ICMP;
ip->saddr = inet_addr(src_addr);
ip->daddr = inet_addr(dst_addr);
if ((sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)) == -1)
{
perror("socket");
exit(EXIT_FAILURE);
}
/*
* IP_HDRINCL must be set on the socket so that
* the kernel does not attempt to automatically add
* a default ip header to the packet
*/
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &optval, sizeof(int));
/*
* here the icmp packet is created
* also the ip checksum is generated
*/
icmp->type = ICMP_ECHO;
icmp->code = 0;
icmp->un.echo.id = 0;
icmp->un.echo.sequence = 0;
icmp->checksum = 0;
icmp-> checksum = in_cksum((unsigned short *)icmp, sizeof(struct icmphdr));
ip->check = in_cksum((unsigned short *)ip, sizeof(struct iphdr));
If this part of code looks flexible enough, then read about raw sockets :D maybe they're the easiest and safest answer to your need.
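The snippet above calls in_cksum without showing it. Here is one possible implementation of the standard Internet checksum (the ones' complement sum described in RFC 1071); it is my own sketch rather than the helper the answer's author used. The result is meant to be stored into the header field directly, as the snippet does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum over len bytes starting at data.
 * Words are summed in memory order, so the result can be stored
 * straight into the checksum field without byte swapping. */
uint16_t in_cksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        uint16_t word;
        memcpy(&word, p, sizeof(word)); /* avoids alignment problems */
        sum += word;
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        uint16_t word = 0;
        memcpy(&word, p, 1);            /* odd trailing byte, zero padded */
        sum += word;
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A handy sanity check: once the checksum has been stored in the header, running in_cksum over the same bytes again should return 0.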
Please check the following links for further info
http://courses.cs.vt.edu/~cs4254/fall04/slides/raw_6.pdf
http://www.cs.binghamton.edu/~steflik/cs455/rawip.txt
http://cboard.cprogramming.com/networking-device-communication/107801-linux-raw-socket-programming.html a very nice topic, pretty useful imo
You can try libcrafter for packet crafting in user space. It is very easy to use! The library is able to craft or decode packets of most common network protocols, send them on the wire, capture them and match requests and replies.
For example, the following code crafts and sends an ICMP packet:
string MyIP = GetMyIP("eth0");
/* Create an IP header */
IP ip_header;
/* Set the Source and Destination IP address */
ip_header.SetSourceIP(MyIP);
ip_header.SetDestinationIP("1.2.3.4");
/* Create an ICMP header */
ICMP icmp_header;
icmp_header.SetType(ICMP::EchoRequest);
icmp_header.SetIdentifier(RNG16());
/* Create a packet... */
Packet packet = ip_header / icmp_header;
packet.Send();
Why do you want to craft an ICMP packet in kernel space? Just for fun? :-p
The Linux kernel includes a packet generator tool, pktgen, for testing the network with pre-configured packets. The source code for this module resides in net/core/pktgen.c.
