Determine TCP payload activity/statistics

Determine TCP payload activity/statistics - linux

I'd like to lookup a counter of the TCP payload activity (total bytes received) either for a given file descriptor or a given interface. Preferably the given file descriptor, but for the interface would be sufficient. Ideally I'd really like to know about any bytes that have been ack-ed, even ones which I have not read into userspace (yet?).
I've seen the TCP_INFO feature of getsockopt() but none of the fields appear to store "Total bytes received" or "total bytes transmitted (acked, e.g.)" so far as I can tell.
I've also seen the netlink IFLA_STATS+RTNL_TC_BYTES and the SIOCETHTOOL+ETHTOOL_GSTATS ioctl() (rx_bytes field) for the interfaces, and those are great, but I don't think they'll be able to discriminate between the overhead/headers of the other layers and the actual payload bytes.
procfs has /proc/net/tcp but this doesn't seem to contain what I'm looking for either.
Is there any way to get this particular data?
EDIT: promiscuous mode has an unbearable impact on throughput, so I can't leverage anything that uses it. Not to mention that implementing large parts of the IP stack to determine which packets are appropriate is beyond my intended scope for this solution.
The goal is to have an overarching/no-trust/second-guess of what values I store from recvmsg().
The Right Thing™ to do is to keep track of those values correctly, but it would be valuable to have a simple "Hey OS? How many bytes have I really received on this socket?"

One could also use ioctl call with SIOCINQ to get the amount of queued unread data in the receive buffer. Here is usage from the man page: http://man7.org/linux/man-pages/man7/tcp.7.html
int value;
error = ioctl(tcp_socket_fd, SIOCINQ, &value);
For interface TCP stats, we can use " netstat -i -p tcp" to find stats on a per-interface basis.

Do you want this for diagnosis, or for development?
If diagnosis, tcpdump can tell you exactly what's happening on the network, filtered by the port and host details.
If for development, perhaps a bit more information about what you're trying to achieve would help...
ifconfig gives RX and TX totals.
ifconfig gets these details from /proc/net/dev (as you can see via strace ifconfig).
There are also the Send/Receive-Q values given by netstat -t, if that's closer to what you want.

Perhaps the statistics in /proc/net/dev can help. I am not familiar with counting payload versus full packets including headers, so that makes the question harder to answer.
As for statistics on individual file descriptors, I am not aware of any standard means to get that information.
If it's possible to control startup of the programs for which the statistics are needed, it is possible to use an "interceptor" library which implements its own read(), write(), sendto(), and recvfrom() calls, passthrough the calls to the standard C library (or directly to system call), keep counters of the activity, and find a way to publish those values.

In case you don't want to just count total RX/TX per interface (which is already available in ifconfig/iproute2 tools)...
If you look into /proc a bit more, you can get somewhat more information. More specifically /proc/<pid>/net/dev.
Sample output:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
eth0: 12106810846 8527175 0 15842 0 0 0 682866 198923814 1503063 0 0 0 0 0 0
lo: 270255057 3992930 0 0 0 0 0 0 270255057 3992930 0 0 0 0 0 0
sit0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
If you start looking, the information is coming from net/core/net-procfs.c from Linux kernel (procfs just uses this info). All of this of course means you need specific process to track.
You can either peruse information available in /proc or if you need something more stable, then duplicating net-procfs functionality specifically for your application might make sense.

Related

Need to measure pairwire (ip based) bandwidth used over time when using TC

I need to measure the datarates of packets between multiple servers. I need pairwise bandwidths between the servers (if possible even the ports), not the overall datarate per interface on each server.
Example output
Timestamp
Server A to B
Server B to A
Server A to C
Server C to A
0
1
2
1
5
1
5
3
7
1
What I tried or thought of
tcpdump - I was capturing all the packets and looking at ip.len for getting the datarates. It worked quite well till I started testing along with TC.
Turns out tcpdump captures packets at a lower layer than TC. So, the bandwidths I measure using this can't see the limit set by TC.
netstat - I tried using this by greping the output and look at Recv-Q and Send-Q columns. But later I found out that it reports the bytes that have been received and are buffered, waiting for the local process that is using this connection to read and consume them. I won't be able to use them to get bandwidth being used.
iftop - Amazing GUI and has all the things I need. But no way to get the output in a good way to process. Might also overwhelm the storage because of the amount of extra text it stores along with.
bwm-ng - Gives overall datarate per interface on each server but not pairwise.
Please let me know if there are any other ways to achieve what I need.
Thanks in advance for your help.

Get network connections from /proc/net/sockstat

I founds this information in /proc which displays sockets:
$ cat /proc/net/sockstat
sockets: used 8278
TCP: inuse 1090 orphan 2 tw 18 alloc 1380 mem 851
UDP: inuse 6574
RAW: inuse 1
FRAG: inuse 0 memory 0
Can you help me to find what these values means? Also are these values enough reliable or I need to search for it somewhere else?
Is these other way to find information about the TCP/UDP connections in Linux?

Can you help me to find what these values means?
As per code here the values are number of sockets in use (TCP / UDP), number of orphan TCP sockets (socket that applications have no more handles to, they already called close()). TCP tw I am not sure but, based on structure name (tcp_death_row), those are the sockets to be definitively destroyed in near future? sockets represent the number of allocated sockets (as per my understand, contemplates TCP sockets in different states) and mem is number of pages allocated by TCP sockets (memory usage).
This article has some discussions around this topic.

In my understanding the /proc/net/sockstat is The most reliable place to look for that information. I often use it myself, and to have one single server to manage 1MM simultaneous connections that was the only place I could reliably count that information.

You can use the netstat command which itself utilizes the /proc filesystem but prints information more readable for humans.
If you want to display the current tcp connections for example, you can issue the following command:
netstat -t
Check man netstat for the numerous options.

UDP Packet drop - INErrors Vs .RcvbufErrors

I wrote a simple UDP Server program to understand more about possible network bottlenecks.
UDP Server: Creates a UDP socket, binds it to a specified port and addr, and adds the socket file descriptor to epoll interest list. Then its epoll waits for incoming packet. On reception of incoming packet(EPOLLIN), its reads the packet and just prints the received packet length. Pretty simple, right :)
UDP Client: I used hping as shown below:
hping3 192.168.1.2 --udp -p 9996 --flood -d 100
When I send udp packets at 100 packets per second, I dont find any UDP packet loss. But when I flood udp packets (as shown in above command), I see significant packet loss.
Test1:
When 26356 packets are flooded from UDP client, my sample program receives ONLY 12127 packets and the remaining 14230 packets is getting dropped by kernel as shown in /proc/net/snmp output.
cat /proc/net/snmp | grep Udp:
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 12372 0 14230 218 14230 0
For Test1 packet loss percentage is ~53%.
I verified there is NOT much loss at hardware level using "ethtool -S ethX" command both on client side and server side, while at the appln level I see a loss of 53% as said above.
Hence to reduce packet loss I tried these:
- Increased the priority of my sample program using renice command.
- Increased Receive Buffer size (both at system level and process level)
Bump up the priority to -20:
renice -20 2022
2022 (process ID) old priority 0, new priority -20
Bump up the receive buf size to 16MB:
At Process Level:
int sockbufsize = 16777216;
setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,(char *)&sockbufsize, (int)sizeof(sockbufsize))
At Kernel Level:
cat /proc/sys/net/core/rmem_default
16777216
cat /proc/sys/net/core/rmem_max
16777216
After these changes, performed Test2.
Test2:
When 1985076 packets are flooded from UDP client, my sample program receives 1848791 packets and the remaining 136286 packets is getting dropped by kernel as shown in /proc/net/snmp output.
cat /proc/net/snmp | grep Udp:
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1849064 0 136286 236 0 0
For Test2 packet loss percentage is 6%.
Packet loss is reduced significantly. But I have the following questions:
Can the packet loss be further reduced?!? I know I am greedy here :) But I am just trying to find out if its possible to reduce packet loss further.
Unlike Test1, in Test2 InErrors doesnt match RcvbufErrors and RcvbufErrors is always zero. Can someone explain the reason behind it, please?!? What exactly is the difference between InErrors and RcvbufErrors. I understand RcvbufErrors but NOT InErrors.
Thanks for your help and time!!!

Tuning the Linux kernel's networking stack to reduce packet drops is a bit involved as there are a lot of tuning options from the driver all the way up through the networking stack.
I wrote a long blog post explaining all the tuning parameters from top to bottom and explaining what each of the fields in /proc/net/snmp mean so you can figure out why those errors are happening. Take a look, I think it should help you get your network drops down to 0.

If there aren't drops at hardware level then should be mostly a question of memory, you should be able to tweak the kernel configuration parameters to reach 0 drops (obviously you need a reasonable balanced hardware for the network traffic you're recv'ing).
I think you're missing netdev_max_backlog which is important for incoming packets:
Maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them.

InErrors is composed of:
corrupted packets (incorrect headers or checksum)
full RCV buffer size
So my guess is you have fixed the buffer overflow problem (RcvbufErrors is 0) and what is left are packets with incorrect checksums.

UDP IP Fragmentation and MTU

I'm trying to understand some behavior I'm seeing in the context of sending UDP packets.
I have two little Java programs: one that transmits UDP packets, and the other that receives them. I'm running them locally on my network between two computers that are connected via a single switch.
The MTU setting (reported by /sbin/ifconfig) is 1500 on both network adapters.
If I send packets with a size < 1500, I receive them. Expected.
If I send packets with 1500 < size < 24258 I receive them. Expected. I have confirmed via wireshark that the IP layer is fragmenting them.
If I send packets with size > 24258, they are lost. Not Expected. When I run wireshark on the receiving side, I don't see any of these packets.
I was able to see similar behavior with ping -s.
ping -s 24258 hostA works but
ping -s 24259 hostA fails.
Does anyone understand what may be happening, or have ideas of what I should be looking for?
Both computers are running CentOS 5 64-bit. I'm using a 1.6 JDK, but I don't really think it's a programming problem, it's a networking or maybe OS problem.

Implementations of the IP protocol are not required to be capable of handling arbitrarily large packets. In theory, the maximum possible IP packet size is 65,535 octets, but the standard only requires that implementations support at least 576 octets.
It would appear that your host's implementation supports a maximum size much greater than 576, but still significantly smaller than the maximum theoretical size of 65,535. (I don't think the switch should be a problem, because it shouldn't need to do any defragmentation -- it's not even operating at the IP layer).
The IP standard further recommends that hosts not send packets larger than 576 bytes, unless they are certain that the receiving host can handle the larger packet size. You should maybe consider whether or not it would be better for your program to send a smaller packet size. 24,529 seems awfully large to me. I think there may be a possibility that a lot of hosts won't handle packets that large.
Note that these packet size limits are entirely separate from MTU (the maximum frame size supported by the data link layer protocol).

I found the following which may be of interest:
Determine the maximum size of a UDP datagram packet on Linux
Set the DF bit in the IP header and send continually larger packets to determine at what point a packet is fragmented as per Path MTU Discovery. Packet fragmentation should then result in a ICMP type 3 packet with code 4 indicating that the packet was too large to be sent without being fragmented.
Dan's answer is useful but note that after headers you're really limited to 65507 bytes.

What are the units of UDP buffers, and where are docs for sysctl params?

I'm running x86_64 RedHat 5.3 (kernel 2.6.18) and looking specifically at net.core.rmem_max from sysctl -a in the context of trying to set UDP buffers. The receiver application misses packets sometimes, but I think the buffer is already plenty large, depending upon what it means:
What are the units of this setting -- bits, bytes, packets, or pages? If bits or bytes, is it from the datagram/ payload (such as 100 bytes) or the network MTU size (~1500 bytes)? If pages, what's the page size in bytes?
And is this the max per system, per physical device (NIC), per virtual device (VLAN), per process, per thread, per socket/ per multicast group?
For example, suppose my data is 100 bytes per message, and each network packet holds 2 messages, and I want to be able to buffer 50,000 messages per socket, and I open 3 sockets per thread on each of 4 threads. How big should net.core.rmem_max be? Likewise, when I set socket options inside the application, are the units payload bytes, so 5000000 on each socket in this case?
Finally, in general how would I find details of the units for the parameters I see via sysctl -a? I have similar units and per X questions about other parameters such as net.core.netdev_max_backlog and net.ipv4.igmp_max_memberships.
Thank you.

You'd look at these docs. That said, many of these parameters really are quite poorly documented, so do expect do do som googling to dig out the gory details from blogs and mailinglists.
rmem_max is the per socket maximum buffer, in bytes. Digging around, this appears to be the memory where whole packets are received, so the size have to include the sizes of whatever/ip/udp headers as well - though this area is quite fuzzy to me.
Keep in mind though, UDP is unreliable. There's a lot of sources for loss, not the least inbetween switches and routers - these have buffers as well.

It is fully documented in the socket(7) man page (it is in bytes).
Moreover, the limit may be set on a per-socket basis with SO_RCVBUF (as documented in the same page).
Read the socket(7), ip(7) and udp(7) man pages for information on how these things actually work. The sysctls are documented there.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string