As far as I know, there is a ring buffer for the Network Interface Card, and if that buffer overflows, incoming packets are dropped until the kernel drains packets and frees up space in the buffer.
My question is: how do I detect such a NIC ring buffer overflow on Linux?
And how can I simulate such a ring buffer overflow on Linux? Modifying /proc is acceptable if necessary.
Update, Feb 2, 2016:
I will accept John Zwinck's explanation as the answer, but if anyone knows how to simulate the ring buffer overflow, please let me know as well. Thanks in advance.
1) You can detect buffer overflow by examining the NIC statistics: ethtool -S eno1. There will be a lot of information in the output, and the field names are driver-dependent. For example, in the tg3 driver the field "rx_discards" is the one you are looking for: it stores the number of packets that were dropped because the buffer was full.
2) When I need to make packets drop, I set the buffer size to a very small value (2, for example): ethtool -G eno1 rx 2, and then load the network card with netperf.
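If you want to watch for drops programmatically rather than eyeballing the ethtool output, one option is to poll the kernel's generic per-interface drop counter under /sys/class/net/<iface>/statistics. A minimal sketch in C, assuming the interface name eth0; note that driver-specific counters such as tg3's rx_discards are only visible through ethtool -S, and the generic rx_dropped counter may not capture every kind of ring overflow on every driver:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Read the kernel's generic RX drop counter for one interface. */
    static unsigned long long read_rx_dropped(const char *ifname)
    {
        char path[256];
        unsigned long long value = 0;

        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/statistics/rx_dropped", ifname);

        FILE *f = fopen(path, "r");
        if (!f) {
            perror("fopen");
            exit(1);
        }
        if (fscanf(f, "%llu", &value) != 1)
            value = 0;
        fclose(f);
        return value;
    }

    int main(void)
    {
        const char *ifname = "eth0";          /* adjust to your interface */
        unsigned long long prev = read_rx_dropped(ifname);

        for (;;) {
            sleep(1);
            unsigned long long now = read_rx_dropped(ifname);
            if (now != prev)
                printf("%s: %llu new RX drops\n", ifname, now - prev);
            prev = now;
        }
    }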
I don't think you can (portably) detect or simulate this. There may be ways to do it with a specific NIC driver, but you'd have to specify exactly what you're using, and I suspect that for consumer-grade products it won't be possible. You can, however, measure and adjust the size of the ring buffers using ethtool -g, which is explained here: http://www.scottalanmiller.com/linux/2011/06/20/working-with-nic-ring-buffers/
I tried to fill the buffer with sudo ethtool -C eth0 rx-usecs 10000 and sudo ethtool -G eth0 rx 48. The first command sets how many microseconds to delay an RX interrupt after a packet arrives, and the second command sets the RX ring buffer size. Then I opened some websites and watched some videos; the ring buffer filled up and the dropped packet count increased.
It also seems that the minimum ring buffer size is 48: no matter how small a value I set, ethtool -g eth0 always shows that the current buffer size is 48.
Related
I'm working on control system stuff, where the NIC interrupt is used as a trigger. This works very well for cycle times greater than 3 microseconds. Now I want to run some performance tests and measure the transmission time, or rather the shortest time between two interrupts.
The sender sends 60-byte packets as fast as possible. The receiver should generate one interrupt per packet. I'm testing with 256 packets, the size of the Rx descriptor ring. The packet data is not handled during the test; only the interrupt is of interest.
The trouble is that reception is very fast, down to less than 1 microsecond between two interrupts, but only for around 70 interrupts/descriptors. Then the NIC sets the RDU (Rx Descriptor Unavailable) bit and stops receiving before reaching the end of the ring. The confusing thing is that when I increase the size of the Rx descriptor ring, to 2048 for example, the number of interrupts increases too (to around 800). I don't understand this behavior; I thought it should stop again after 70 interrupts.
It seems to be a timing problem, but why? I'm overlooking something, but what? Can somebody help me?
Thanks in advance!
What I think is that, due to the high RX packet rate, some of your receive interrupts are being missed. Don't count interrupts to see how many packets were received; rely on the "own" bit of the receive descriptors instead.
Receive Descriptor Unavailable will be set only when you reach the end of the ring, unless you have made an error in programming the RX descriptors (e.g. forgot to set the ownership bit).
So if your RX ring has 256 descriptors, I think you should receive 256 packets without recycling RX descriptors.
If you are unsure whether you are reaching the end of the ring or not, try setting the interrupt-on-completion bit of only the last RX descriptor. That way you receive only one interrupt, at the end of the ring.
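To illustrate the idea of counting packets by ownership bit rather than by interrupt, here is a minimal sketch. The descriptor layout, the OWN bit position, and the ring size are hypothetical; the real definitions come from your NIC's datasheet and driver.

    #include <stddef.h>
    #include <stdint.h>

    #define RING_SIZE 256
    #define DESC_OWN  (1u << 31)   /* hypothetical: hardware owns the descriptor */

    /* Hypothetical RX descriptor layout; the real layout is NIC-specific. */
    struct rx_desc {
        volatile uint32_t status;      /* OWN bit plus length/error flags */
        volatile uint32_t length;
        volatile uint64_t buffer_addr;
    };

    /* Count how many descriptors the NIC has handed back to software,
     * i.e. descriptors whose OWN bit the hardware has cleared. */
    static size_t count_received(struct rx_desc *ring)
    {
        size_t received = 0;

        for (size_t i = 0; i < RING_SIZE; i++) {
            if (!(ring[i].status & DESC_OWN))
                received++;
        }
        return received;
    }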
I am calling the recvfrom API on a valid address, trying to read data of size 9600 bytes; the buffer I have provided is 12 KB in size. I am not even getting select read events.
Even though the recommended MTU size is 1.5 KB, I am able to send and receive packets of 4 KB.
I am using the Android NDK (Linux) for development.
Please help. Is there a socket option I have to set to read large buffers?
If you send a packet larger than the MTU, it will be fragmented. That is, it'll be broken up into smaller pieces, each of which fits within the MTU. The problem with this is that if even one of those pieces is lost (quite likely on a cellular connection...), the entire packet effectively disappears.
To determine whether this is the case, you'll need to use a packet sniffer on one (or both) ends of the connection. Wireshark is a good choice on the PC end, or tcpdump on the Android side (you'll need root). Keep in mind that home routers may reassemble fragmented packets; this means that if you're sniffing packets from inside a home router/firewall, you might not see any fragments arrive until all of them have arrived at the router (and obviously, if some are getting lost, this won't happen).
A better option would be to simply ensure that you're always sending packets smaller than the MTU. Fragmentation is almost never the right thing to be doing. Keep in mind that the MTU may vary at various hops along the path between server and client; you can either use the common choice of a bit less than 1500 (1400 ought to be safe), or try to probe for it by setting the MTU discovery flag on your UDP packets (via IP_MTU_DISCOVER) and always sending less than the value returned by getsockopt's IP_MTU option (including on retransmits!).
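As a rough sketch of that second approach on Linux (real code needs error handling and should re-read IP_MTU as ICMP feedback updates the cached path MTU; the peer address and port below are placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>     /* IP_MTU_DISCOVER, IP_PMTUDISC_DO, IP_MTU (Linux) */
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        /* Ask the kernel to set DF and do path MTU discovery on this socket. */
        int pmtu_opt = IP_PMTUDISC_DO;
        setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu_opt, sizeof(pmtu_opt));

        /* IP_MTU is only meaningful on a connected socket. Placeholder peer. */
        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(9999);
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);
        connect(fd, (struct sockaddr *)&peer, sizeof(peer));

        /* Currently known path MTU toward the peer (updates as ICMP arrives). */
        int mtu = 0;
        socklen_t len = sizeof(mtu);
        if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == 0)
            printf("path MTU: %d bytes (UDP payload limit: %d)\n",
                   mtu, mtu - 28);   /* minus 20-byte IP + 8-byte UDP header */
        return 0;
    }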
I would like to have UDP packets copied directly from the Ethernet adapter into my userspace buffer.
Some details on my setup:
I am receiving data from a pair of gigabit Ethernet cameras. Combined, I am receiving 28,800 UDP packets per second (1 packet per line * 30 FPS * 2 cameras * 480 lines). There is no way for me to switch to jumbo frames, and I am already looking into tuning driver-level interrupts for reduced CPU utilization. What I am after here is reducing the number of times this ~40 MB/s data stream gets copied.
This is the best source I have found on this, but I was hoping there was a more complete reference or proof that such an approach worked out in practice.
This article may be useful:
http://yusufonlinux.blogspot.com/2010/11/data-link-access-and-zero-copy.html
Your best avenues are recvmmsg and increasing RX interrupt coalescing.
http://lwn.net/Articles/334532/
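For what it's worth, here is a rough sketch of batching receives with recvmmsg so that one system call drains many datagrams at once (the port and batch size are arbitrary example values, not from the question):

    #define _GNU_SOURCE        /* recvmmsg is a GNU/Linux extension */
    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    #define BATCH    64        /* datagrams drained per syscall */
    #define PKT_SIZE 2048      /* large enough for one camera line */

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5000);           /* arbitrary example port */
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        static char bufs[BATCH][PKT_SIZE];
        struct iovec iov[BATCH];
        struct mmsghdr msgs[BATCH];
        memset(msgs, 0, sizeof(msgs));

        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = PKT_SIZE;
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        for (;;) {
            /* One syscall can return up to BATCH datagrams. */
            int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
            if (n < 0) {
                perror("recvmmsg");
                break;
            }
            for (int i = 0; i < n; i++) {
                /* msgs[i].msg_len is the size of the i-th datagram. */
                printf("datagram %d: %u bytes\n", i, msgs[i].msg_len);
            }
        }
        return 0;
    }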
You can move lower in the stack and capture packets the way Wireshark/tcpdump do, but then you have to decode everything yourself, which makes it futile to attempt any serious processing on top of it.
At only 30,000 packets per second I wouldn't worry too much about copying packets; those problems arise when dealing with 3,000,000 messages per second.
I'm trying to understand some behavior I'm seeing in the context of sending UDP packets.
I have two little Java programs: one that transmits UDP packets, and the other that receives them. I'm running them locally on my network between two computers that are connected via a single switch.
The MTU setting (reported by /sbin/ifconfig) is 1500 on both network adapters.
If I send packets with a size < 1500, I receive them. Expected.
If I send packets with 1500 < size < 24258, I receive them. Expected. I have confirmed via Wireshark that the IP layer is fragmenting them.
If I send packets with size > 24258, they are lost. Not Expected. When I run wireshark on the receiving side, I don't see any of these packets.
I was able to see similar behavior with ping -s.
ping -s 24258 hostA works but
ping -s 24259 hostA fails.
Does anyone understand what may be happening, or have ideas of what I should be looking for?
Both computers are running CentOS 5 64-bit. I'm using a 1.6 JDK, but I don't really think it's a programming problem; it's a networking or maybe OS problem.
Implementations of the IP protocol are not required to be capable of handling arbitrarily large packets. In theory, the maximum possible IP packet size is 65,535 octets, but the standard only requires that implementations support at least 576 octets.
It would appear that your host's implementation supports a maximum size much greater than 576, but still significantly smaller than the maximum theoretical size of 65,535. (I don't think the switch should be a problem, because it shouldn't need to do any reassembly -- it's not even operating at the IP layer.)
The IP standard further recommends that hosts not send packets larger than 576 bytes unless they are certain that the receiving host can handle the larger packet size. You should consider whether it would be better for your program to send a smaller packet size. A datagram of around 24 KB seems awfully large to me; I think there is a real possibility that many hosts won't handle packets that large.
Note that these packet size limits are entirely separate from MTU (the maximum frame size supported by the data link layer protocol).
I found the following which may be of interest:
Determine the maximum size of a UDP datagram packet on Linux
Set the DF bit in the IP header and send progressively larger packets to determine at what point a packet would need to be fragmented, as per Path MTU Discovery. An over-sized packet should then result in an ICMP type 3 packet with code 4, indicating that the packet was too large to be sent without being fragmented.
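A rough sketch of that probing approach on Linux, using IP_PMTUDISC_DO to set DF (the peer address and port are placeholders; in practice the kernel may only learn a smaller path MTU after an ICMP "fragmentation needed" message arrives, so a real probe needs retries):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        /* Set DF on outgoing packets; over-sized sends now fail with
         * EMSGSIZE instead of being fragmented. */
        int opt = IP_PMTUDISC_DO;
        setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &opt, sizeof(opt));

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(9);                 /* discard port, placeholder */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);
        connect(fd, (struct sockaddr *)&peer, sizeof(peer));

        static char payload[65536];
        for (int size = 1200; size <= 9000; size += 100) {
            if (send(fd, payload, size, 0) < 0 && errno == EMSGSIZE) {
                printf("payload of %d bytes exceeds the known path MTU\n", size);
                break;
            }
        }
        return 0;
    }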
Dan's answer is useful, but note that after headers you're really limited to 65,507 bytes (65,535 minus the 20-byte IP header and the 8-byte UDP header).
I'm running x86_64 RedHat 5.3 (kernel 2.6.18) and looking specifically at net.core.rmem_max from sysctl -a in the context of trying to set UDP buffers. The receiver application misses packets sometimes, but I think the buffer is already plenty large, depending upon what it means:
What are the units of this setting -- bits, bytes, packets, or pages? If bits or bytes, is it measured against the datagram payload (such as 100 bytes) or the network MTU size (~1500 bytes)? If pages, what's the page size in bytes?
And is this the maximum per system, per physical device (NIC), per virtual device (VLAN), per process, per thread, per socket, or per multicast group?
For example, suppose my data is 100 bytes per message, and each network packet holds 2 messages, and I want to be able to buffer 50,000 messages per socket, and I open 3 sockets per thread on each of 4 threads. How big should net.core.rmem_max be? Likewise, when I set socket options inside the application, are the units payload bytes, so 5000000 on each socket in this case?
Finally, in general how would I find details of the units for the parameters I see via sysctl -a? I have similar units and per X questions about other parameters such as net.core.netdev_max_backlog and net.ipv4.igmp_max_memberships.
Thank you.
You'd look at these docs. That said, many of these parameters really are quite poorly documented, so expect to do some googling to dig the gory details out of blogs and mailing lists.
rmem_max is the per-socket maximum receive buffer, in bytes. Digging around, this appears to be the memory into which whole packets are received, so the size has to account for the link/IP/UDP headers as well - though this area is quite fuzzy to me.
Keep in mind, though, that UDP is unreliable. There are a lot of sources of loss, not least the switches and routers in between - these have buffers as well.
It is fully documented in the socket(7) man page (it is in bytes).
Moreover, the limit may be set on a per-socket basis with SO_RCVBUF (as documented in the same page).
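A quick sketch of setting and reading it back on a UDP socket (the 4 MB figure is just an example value):

    #include <stdio.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        /* Request a 4 MB receive buffer; the kernel caps this at
         * net.core.rmem_max (unless privileged SO_RCVBUFFORCE is used). */
        int requested = 4 * 1024 * 1024;
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

        /* Read back what was actually granted; the kernel doubles the value
         * to account for bookkeeping overhead (see socket(7)). */
        int granted = 0;
        socklen_t len = sizeof(granted);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len);
        printf("requested %d bytes, granted %d bytes\n", requested, granted);
        return 0;
    }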
Read the socket(7), ip(7) and udp(7) man pages for information on how these things actually work. The sysctls are documented there.