Extract TCP round trip time (RTT) estimations on Linux

I have an Apache server running on Ubuntu. A client connects and downloads an image. I need to extract RTT estimations for the underlying TCP connection. Is there a way to do this? Maybe something like running my TCP stack in debug mode to have it log this info somewhere?
Note that I don't want to run tcpdump and extract RTTs from the recorded trace! I need the TCP stack's own RTT estimations (apparently this is part of the info you can get with the TCP_INFO socket option). Basically I need something like tcpprobe (a kprobe-based module) to insert a hook and record the estimated RTT of the TCP connection on every incoming packet (or on every change).
UPDATE:
I found a solution: RTT, congestion window and more can be logged using tcpprobe. I posted an answer below.

This can be done without any additional kernel modules using the ss command (part of the iproute package), which can provide detailed information on open sockets. It won't show a value for every packet, but most of this info is calculated over a number of packets anyway. For example, to list the currently open TCP sockets (the -t option) together with the internal TCP info (-i), including the congestion control algorithm, RTT, cwnd, etc.:
ss -ti
Here's some example output:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.56.102:ssh 192.168.56.1:46327
cubic wscale:6,7 rto:201 rtt:0.242/0.055 ato:40 mss:1448 rcvmss:1392
advmss:1448 cwnd:10 bytes_acked:33169 bytes_received:6069 segs_out:134
segs_in:214 send 478.7Mbps lastsnd:5 lastrcv:6 lastack:5
pacing_rate 955.4Mbps rcv_rtt:3 rcv_space:28960
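Since the question mentions the TCP_INFO socket option: if you control (or can wrap) the application, the same estimator values that ss prints can be read directly with getsockopt. A minimal sketch, assuming sockfd is an already connected TCP socket (on Linux, tcpi_rtt and tcpi_rttvar are in microseconds):
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* struct tcp_info, TCP_INFO */

/* Print the kernel's current RTT estimate for an existing,
 * connected TCP socket. sockfd is assumed to be such a socket. */
static int print_rtt(int sockfd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0) {
        perror("getsockopt(TCP_INFO)");
        return -1;
    }
    printf("srtt = %u us, rttvar = %u us, cwnd = %u segments\n",
           info.tcpi_rtt, info.tcpi_rttvar, info.tcpi_snd_cwnd);
    return 0;
}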

This can be done using tcpprobe, a kernel module that inserts a hook into the TCP receive processing path using kprobe and records the state of a TCP connection in response to incoming packets.
Let's say you want to probe the TCP connections on port 443; you need to do the following:
sudo modprobe tcp_probe port=443 full=1
sudo chmod 444 /proc/net/tcpprobe
cat /proc/net/tcpprobe > /tmp/output.out &
pid=$!
full=1: log on every ACK packet received
full=0: log only on cwnd (congestion window) changes (if you use this, your output might be empty)
Now $pid is the process that is logging the probe output. To stop, simply kill this process:
kill $pid
The format of output.out (according to the source at line 198):
[time] [src] [dst] [length] [snd_nxt] [snd_una] [snd_cwnd] [ssthresh] [snd_wnd] [srtt] [rcv_wnd]
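If you would rather consume that stream from a program than with cat, here is a rough C sketch that parses the columns listed above and prints the srtt field (the whitespace-separated layout and field order are assumptions based on the format above; check them against your kernel's tcp_probe.c):
#include <stdio.h>

/* Rough sketch: read /proc/net/tcpprobe (the tcp_probe module must
 * already be loaded as shown above) and print time, destination, srtt. */
int main(void)
{
    double t;
    char src[64], dst[64];
    unsigned len, cwnd, ssthresh, snd_wnd, srtt, rcv_wnd;
    unsigned long snd_nxt, snd_una;
    FILE *f = fopen("/proc/net/tcpprobe", "r");

    if (!f) { perror("fopen"); return 1; }
    while (fscanf(f, "%lf %63s %63s %u %lx %lx %u %u %u %u %u",
                  &t, src, dst, &len, &snd_nxt, &snd_una,
                  &cwnd, &ssthresh, &snd_wnd, &srtt, &rcv_wnd) == 11) {
        printf("%.6f %s srtt=%u\n", t, dst, srtt);
    }
    fclose(f);
    return 0;
}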

Related

Linux: Read data from serial port with one process and write to it with another

I've encountered a problem using a serial GPS/GNSS device on a Raspberry Pi. The device in question is a u-blox GNSS receiver symlinked to /dev/gps.
I'm trying to log the output data from this device and simultaneously send correction data to it.
To be more specific, I use RTKLIB's (http://www.rtklib.com/) str2str tool to send NTRIP/RTCM correction data to the GNSS receiver in order to get better position estimates using DGNSS/RTK.
The receiver's output data is logged by a Python script based on the GPS daemon (gpsd).
However, I guess the main issue is related to serial port control.
When I run the writing process (str2str) first and afterwards any reading process (my Python script, gpsd frontends such as cgps, or cat), the reading process will output data for a few seconds and then freeze. It doesn't matter which tool I use for reading the data.
I found this question: https://superuser.com/questions/488908/sharing-a-serial-port-between-two-processes. Therefore I made sure that the processes got rw access to the device and even tried running them as superuser. Furthermore I stumbled upon socat and virtual serial ports, but didn't find any use for it. (Virtual Serial Port for Linux)
Is there any way to read data from a serial port with one process and write to it with another? The only solution I know of right now would be to rewrite the reading and writing processes in Python using pySerial. That would mean only one process accesses the serial device, but it would also mean plenty of work.
Finally I found a solution using a construction somewhat similar to this: https://serverfault.com/questions/453032/socat-to-share-a-serial-link-between-multiple-processes
A first socat instance (A) gets GNSS correction data from a TCP connection and pipes it to socat B. Socat B manages the connection to the serial device and pipes the output data to another socat instance C, which lets other processes such as gpsd connect and get the receiver's output from a TCP port.
In total, this looks like:
socat -d -d -d -u -lpA TCP4:127.0.0.1:10030 - 2>>log.txt |
socat -d -d -d -t3 -lpB - /dev/gps,raw 2>>log.txt |
socat -d -d -d -u -lpC - TCP4-LISTEN:10031,forever,reuseaddr,fork 2>>log.txt
With only one process managing the serial connection, it doesn't block anymore.

Fetching the TCP RTT in Linux

I need to fetch the RTT for a TCP flow.
I have looked into the proc file system but was not able to get the RTT value for TCP. If anyone has any idea which file the RTT is stored in, please share.
Thanks in advance.
Maybe the ss (socket statistics) utility, available in the iproute utils, can help you with this.
# ss -i 'src 1.1.1.1:1234 and dst 2.2.2.2:1234'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 1.1.1.1:1234 2.2.2.2:1234
reno wscale:2,7 rto:3380 rtt:855/602.5 ato:40 ssthresh:2 send 27.3Kbps rcv_space:5840
If you want more information about what the rtt field is (it is reported as srtt/rttvar in milliseconds), I think it is best to take a look at ss.c.
You can do this using tcpprobe (it inserts a hook into the TCP receive processing path using kprobe and records the state of a TCP connection in response to incoming packets).
Explained here: Extract TCP round trip time (RTT) estimations on Linux
It is also possible to print the cached RTTs (and rttvar, cwnd) for previous destinations using the ip command:
sudo ip tcp_metrics

How to programmatically increase the per-socket buffer for UDP sockets on Linux?

I'm trying to understand the correct way to increase the socket buffer size on Linux for our streaming network application. The application receives variable bitrate data streamed to it on a number of UDP sockets. The volume of data is substantially higher at the start of the stream and I've used:
# sar -n UDP 1 200
to show that the UDP stack is discarding packets and
# ss -un -pa
to show that each socket's Recv-Q length grows to nearly the limit (124928, from sysctl net.core.rmem_default) before packets are discarded. This implies that the application simply can't keep up with the start of the stream. After enough initial packets have been discarded, the data rate slows down and the application catches up; Recv-Q trends towards 0 and remains there for the duration.
I'm able to address the packet loss by substantially increasing the rmem_default value which increases the socket buffer size and gives the application time to recover from the large initial bursts. My understanding is that this changes the default allocation for all sockets on the system. I'd rather just increase the allocation for the specific UDP sockets and not modify the global default.
My initial strategy was to modify rmem_max and to use setsockopt(SO_RCVBUF) on each individual socket. However, this question makes me concerned about disabling Linux autotuning for all sockets and not just UDP.
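For reference, the per-socket call I have in mind looks roughly like this (just a sketch; per socket(7), the kernel doubles the requested value for bookkeeping and clamps it to net.core.rmem_max):
#include <stdio.h>
#include <sys/socket.h>

/* Sketch: request a larger receive buffer for one UDP socket and
 * read back what the kernel actually granted. */
static int bump_rcvbuf(int sock, int bytes)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0) {
        perror("setsockopt(SO_RCVBUF)");
        return -1;
    }
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
        printf("receive buffer is now %d bytes\n", actual);
    return 0;
}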
udp(7) describes the udp_mem setting, but I'm confused about how these values interact with the rmem_default and rmem_max values. The language it uses is "all sockets", so my suspicion is that these settings apply to the complete UDP stack and not to individual UDP sockets.
Is udp_rmem_min the setting I'm looking for? It seems to apply to individual sockets but global to all UDP sockets on the system.
Is there a way to safely increase the socket buffer length for the specific UDP ports used in my application without modifying any global settings?
Thanks.
Jim Gettys is armed and coming for you. Don't go to sleep.
The solution to network packet floods is almost never to increase buffering. Why is your protocol's queueing strategy not backing off? Why can't you just use TCP if you're trying to send so much data in a stream (which is what TCP was designed for)?

How do I get min/avg/var rtt for all TCP connections in Linux?

I'm trying to implement software on Linux that tracks open TCP connections and classifies them based on TCP round trip time estimates. I'm looking for information similar to what the program nettop shows on MacOS X.
$ nettop -m tcp
It shows a list of open connections by the process that owns them. It includes the current round trip time min, mean and variance estimates for each connection.
For a program's own connections this could be done along the lines of http://linuxgazette.net/136/pfeiffer.html, but I'm looking for something like nettop that shows the information for all connections on the machine. On OS X this does not require root access, but it is fine if the answer does.
I'd prefer a Python compatible version but if not available, I can live with C. If there is an existing command-line utility like nettop for Linux, that's also interesting.
Related:
Wikipedia: Karn's algorithm
Some of this information is available in the command:
ss -i -t
If you want to do this with your own code you can look at the output of libpcap or tcpdump and compare the timestamp on packets with corresponding sequence and ack numbers and average those out for the last few seconds.
12:19:39.331248 IP 10.0.60.243.ssh > 192.168.50.22.21950: P 11952:12180(228)
12:19:39.331388 IP 10.0.60.243.ssh > 192.168.50.22.21950: P 12328:12476(148)
12:19:39.380981 IP 192.168.50.22.21950 > 10.0.60.243.ssh: . ack 11952 win 65535
12:19:39.381039 IP 10.0.60.243.ssh > 192.168.50.22.21950: P 12624:12772(148)
12:19:39.381054 IP 192.168.50.22.21950 > 10.0.60.243.ssh: . ack 12328 win 65159
12:19:39.381058 IP 192.168.50.22.21950 > 10.0.60.243.ssh: . ack 12624 win 64863
This would average out to an RTT of about 50 ms.
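For what it is worth, here is a rough C sketch of the libpcap matching idea described above: it assumes Ethernet framing and plain IPv4, keeps a small table of in-flight segments, and prints one RTT sample whenever an ACK exactly covers a remembered segment. Per-connection bookkeeping, SACK, retransmissions and Karn's algorithm are deliberately left out.
/* rtt_sniff.c - rough RTT sampler: match data segments to the ACKs that
 * cover them. Compile with: gcc rtt_sniff.c -o rtt_sniff -lpcap
 * Assumes Ethernet framing and IPv4; run as root. */
#include <pcap.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>
#include <arpa/inet.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

#define SLOTS 4096

struct sent { uint32_t ack_expected; struct timeval ts; int used; };
static struct sent table[SLOTS];

static void handler(u_char *user, const struct pcap_pkthdr *h, const u_char *pkt)
{
    const struct iphdr *iph = (const struct iphdr *)(pkt + 14);      /* skip Ethernet header */
    const struct tcphdr *th = (const struct tcphdr *)((const u_char *)iph + iph->ihl * 4);
    int payload = ntohs(iph->tot_len) - iph->ihl * 4 - th->doff * 4;

    if (payload > 0) {
        /* data segment: remember when it was sent and which ACK number will cover it */
        uint32_t expect = ntohl(th->seq) + (uint32_t)payload;
        struct sent *s = &table[expect % SLOTS];
        s->ack_expected = expect;
        s->ts = h->ts;
        s->used = 1;
    } else if (th->ack) {
        /* pure ACK: if it exactly covers a remembered segment, the gap is one RTT sample */
        uint32_t a = ntohl(th->ack_seq);
        struct sent *s = &table[a % SLOTS];
        if (s->used && s->ack_expected == a) {
            double ms = (h->ts.tv_sec - s->ts.tv_sec) * 1000.0 +
                        (h->ts.tv_usec - s->ts.tv_usec) / 1000.0;
            printf("rtt sample: %.1f ms\n", ms);
            s->used = 0;
        }
    }
    (void)user;
}

int main(int argc, char **argv)
{
    char err[PCAP_ERRBUF_SIZE];
    const char *dev = argc > 1 ? argv[1] : "eth0";   /* interface name is just a guess */
    struct bpf_program prog;
    pcap_t *pc = pcap_open_live(dev, 128, 0, 1000, err);

    if (!pc) { fprintf(stderr, "%s\n", err); return 1; }
    pcap_compile(pc, &prog, "tcp", 1, PCAP_NETMASK_UNKNOWN);
    pcap_setfilter(pc, &prog);
    pcap_loop(pc, -1, handler, NULL);
    return 0;
}
Narrowing the BPF filter (for example "tcp and host 192.168.50.22") and keying the table per connection would get you closer to the per-connection min/avg/var that nettop shows.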

Linux app sends UDP without socket

Fellow coders,
I'm monitoring my outgoing traffic using the libnetfilter_queue module and an iptables rule:
iptables -I OUTPUT 1 -p all -j NFQUEUE --queue-num 11220
A certain app called Jitsi (which runs on Java) is exhibiting a strange behaviour I haven't encountered before:
My monitoring program, which processes NFQUEUE packets, clearly shows that UDP packets are being sent out,
yet when I look into:
"/proc/net/udp" and "/proc/net/udp6" they are empty, moreover "/proc/net/protocols" has a column "sockets" for UDP and it is 0.
But the UDP packets keep getting sent.
Then after a minute or so, "/proc/net/udp" and "/proc/net/protocols" begin to show the correct information about UDP packets.
And again after a while there is no information in them while the UDP packets are being sent.
My only conclusion is that somehow it is possible for an application to send UDP packets without creating a socket, and/or it is possible to create a socket, then delete it (so that the kernel thinks there are none) and still use some obscure method to send packets out.
Could somebody with ideas about such behaviour lend a hand, please?
Two ideas:
Try running the app through strace and take a look at that output.
You could also try to run it through systemtap with a filter for the socket operations.
From that link:
probe kernel.function("*@net/socket.c").call {
    printf("%s -> %s\n", thread_indent(1), probefunc())
}
probe kernel.function("*@net/socket.c").return {
    printf("%s <- %s\n", thread_indent(-1), probefunc())
}
Thank you, Paul Rubel, for the hint in the right direction. strace showed that the Java app was using IPv6 sockets. I had a closer look at /proc/net/udp6 and there those sockets were. I probably took too cursory a look the first time around, chiefly because I didn't expect to find them there. This is the first time I have stumbled upon IPv4 packets over IPv6 sockets, but that is what Java does.
Cheers.
