Linux kernel and realtek rtl8139 driver - linux

I'm trying to write a driver for the rtl8139 for Linux 2.6 from scratch. I've already written the TX path, but I have some problems with RX.
I put the receiver into promiscuous mode and I am receiving RX IRQs. I set RBSTART to the physical address of memory allocated with kmalloc.
I don't know how to find out how many packets have been received and how long they are.
I thought the ERBCR, CAPR, and CBR registers would tell me, but they all read 0.
Am I doing something wrong? How can I find out anything about the received packets?

I'll answer my own question.
The received packets are stored starting at RBSTART. The first two bytes of each received packet are status bytes, and the next two are the length of the frame, including the 4 bytes of CRC.
Maybe someone will find this info helpful.
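The header layout described above can be sketched as a small parser. This is only an illustration that runs on a plain byte array standing in for the receive buffer. The struct, the function name rtl8139_parse_rx, and the constants are made up for the example. The little-endian byte order and the ROK bit position (bit 0 of the status word) are assumptions taken from my reading of the RTL8139 datasheet, and the "round up to 4 bytes" step for locating the next packet is an assumption based on how the mainline driver advances its read pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_STAT_ROK 0x0001u   /* assumed: bit 0 of the status word = receive OK */
#define RX_HDR_LEN  4u        /* 2 status bytes + 2 length bytes */
#define ETH_CRC_LEN 4u        /* the length field includes the CRC */

struct rx_info {
    uint16_t status;     /* raw status word */
    uint16_t frame_len;  /* frame length without the CRC */
    size_t   data_off;   /* ring offset of the first frame byte */
    size_t   next_off;   /* ring offset of the next packet header */
};

/* Parse the packet header at ring offset `off`.
 * Returns 0 on success, -1 if the header does not look like a good packet. */
static int rtl8139_parse_rx(const uint8_t *ring, size_t ring_len,
                            size_t off, struct rx_info *out)
{
    assert(ring != NULL && out != NULL && ring_len >= RX_HDR_LEN);

    /* Both header fields are read as 16-bit little-endian values. */
    uint16_t status = (uint16_t)(ring[off % ring_len] |
                                 (ring[(off + 1) % ring_len] << 8));
    uint16_t len = (uint16_t)(ring[(off + 2) % ring_len] |
                              (ring[(off + 3) % ring_len] << 8));

    if (!(status & RX_STAT_ROK) || len < ETH_CRC_LEN)
        return -1;

    out->status = status;
    out->frame_len = (uint16_t)(len - ETH_CRC_LEN);
    out->data_off = (off + RX_HDR_LEN) % ring_len;
    /* Assumed: the next header starts at the next 4-byte boundary. */
    out->next_off = ((off + RX_HDR_LEN + len + 3) & ~(size_t)3) % ring_len;
    return 0;
}
```

In a real driver the loop would start at offset 0 (RBSTART), call something like this repeatedly while the chip reports the buffer as non-empty, and follow next_off from one packet to the next.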

When a packet is received, the data from the line is stored in the receive FIFO. When the Early Receive Threshold is met, the data is moved from the FIFO to the receive buffer.
So, once you get an interrupt, check the Interrupt Status Register for ROK. Then check the Early Rx Status Register, which gives you the status of the received packet. If EROK is set, check the receive buffer status for ROK. Check for any errors in the ISR and ERSR. Also check your Rx Configuration Register for the Rx FIFO threshold and Rx buffer length settings.

Related

When using recv(n), with n greater than the MTU, are you guaranteed to read at least a whole layer 2 frame?

I was wondering: imagine there is no data to read from a TCP socket, and then a whole (full) frame of 1492 bytes arrives. In your code (C or any language supporting TCP) you call, say, recv with a 4096-byte buffer. Will the OS guarantee that recv reads the whole 1492 bytes, or is it possible that loading the frame into memory and the recv are "interleaved", so that recv may get less?
TCP is a stream-oriented protocol. Data is received in order, but you must not make any assumption about how many times you have to call recv before you have received all your data.
It is up to your application to repeat the calls to recv until you know you have received what you need.
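Repeating the call until enough data has arrived can be sketched as follows. This is a minimal example, not a complete framing layer: the helper name recv_all is made up, and it assumes the application already knows how many bytes it expects (for instance from a length prefix).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly `want` bytes from a stream socket unless the peer closes first.
 * Returns the number of bytes read (less than `want` only on EOF), or -1 on error. */
static ssize_t recv_all(int fd, void *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = recv(fd, (char *)buf + got, want - got, 0);
        if (n == 0)
            break;              /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, just retry */
            return -1;
        }
        got += (size_t)n;       /* any amount from 1 to want - got is possible */
    }
    return (ssize_t)got;
}
```

The caller never learns how the bytes were split into segments on the wire, which is the point: only the total count matters.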
(1) TCP is a stream-oriented protocol. This means that it accepts a stream of data from the upper layer on the sender and returns the stream of data to the upper layer on the receiver. TCP itself receives packets from the IP layer and then reconstructs the stream. That is, at some point packets cease to exist. In theory it is possible that, somewhere during this reconstruction, only half of an incoming packet has been copied into the buffer, but it seems pretty unlikely to me that this would happen.
Now, the Linux man page states:
The receive calls normally return any data available up to the requested amount,
I would interpret it as "if one packet has arrived (correctly, in order, etc), you will get the whole packet worth of data". But there is no guarantee.
On the other hand, the Windows docs state:
recv will return as much data as is currently available—up to the size of the buffer specified.
Which sounds more like a guarantee.
Note, however, that the data will only be returned if the packet is received correctly, and it is next in-order packet (with next expected sequence numbers).
(2) Now, the TCP layer works on complete packets. It is actually impossible for it to do interleaving or anything like it. Ethernet has a checksum, which cannot be computed unless the packet was received completely. Packets with an incorrect Ethernet checksum should be filtered out by the network card. TCP also has a checksum which requires all packet data to compute. So, if the network card has passed the packet to your OS, then the data should be available.
(3) I don't think you can assume that if the packet is received, it is immediately available. A pretty common feature of network cards is receive offload (large receive offload, the receive-side counterpart of TCP segmentation offload), which reconstructs part of the stream and results in the network card passing up one TCP packet that was reconstructed from multiple TCP packets. There are other mechanisms that reduce the number of interrupts, which more or less result in several packets arriving at once. So the more likely situation is that you will see some delay and then receive data from several packets at once.
The point is, the opposite of what you described is likely to happen. However, I still would not write an application that makes any assumptions about how large a chunk of data is available at a time. This negates the concept of a stream.

RealTek 8168 (r8169 linux driver) - Rx descriptor ring confusion

I'm working on a control system in which the NIC interrupt is used as a trigger. This works very well for cycle times greater than 3 microseconds. Now I want to run some performance tests and measure the transmission time, that is, the shortest time between two interrupts.
The sender sends 60-byte packets as fast as possible. The receiver should generate one interrupt per packet. I'm testing with 256 packets, the size of the Rx descriptor ring. The packet data is not handled during the test; only the interrupt matters.
The trouble is that reception is very fast, down to less than 1 microsecond between two interrupts, but only for around 70 interrupts/descriptors. Then the NIC sets the RDU (Rx Descriptor Unavailable) bit and stops receiving before reaching the end of the ring. The confusing thing is that when I increase the size of the Rx descriptor ring (to 2048, for example), the number of interrupts increases too (to around 800). I don't understand this behavior. I expected it to stop again after 70 interrupts.
It seems to be a timing problem, but why? I'm overlooking something, but what? Can somebody help me?
Thanks in advance!
I think that, because of the high RX packet rate, you are missing receive interrupts. Don't count interrupts to see how many packets were received. Rely on the "own" bit of the receive descriptors.
Receive Descriptor Unavailable will be set only when you reach the end of the ring, unless you have made an error in programming the RX descriptors (e.g. forgot to set the ownership bit).
So if your RX ring has 256 descriptors, I think you should receive 256 packets without recycling RX descriptors.
If you are in doubt whether you are reaching the end of the ring, try setting the interrupt-on-completion bit on only the last RX descriptor. That way you receive only one interrupt, at the end of the ring.
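Counting by ownership bit instead of by interrupt could look roughly like this. The sketch runs on an ordinary array that stands in for the descriptor ring. The struct layout, the name count_completed, and the OWN bit being bit 31 of the first descriptor word are assumptions modelled on the r8169 driver; check them against your chip's datasheet. A real driver would also need endianness conversion and memory barriers, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_OWN (UINT32_C(1) << 31)  /* assumed: set while the NIC owns the descriptor */

/* Hypothetical descriptor layout, modelled on the r8169 driver's RxDesc. */
struct rx_desc {
    uint32_t opts1;  /* ownership bit, flags, frame length */
    uint32_t opts2;
    uint64_t addr;   /* DMA address of the packet buffer */
};

/* Walk the ring from `start` and count the descriptors the NIC has handed
 * back to the host (OWN cleared). Stops at the first NIC-owned descriptor. */
static size_t count_completed(const volatile struct rx_desc *ring,
                              size_t ring_size, size_t start)
{
    size_t done = 0;

    assert(ring != NULL && ring_size > 0);
    while (done < ring_size) {
        size_t i = (start + done) % ring_size;
        if (ring[i].opts1 & DESC_OWN)
            break;   /* still owned by the NIC: nothing received here yet */
        done++;
    }
    return done;
}
```

Called once after the test run, this gives the number of packets actually received, even if several of them were signalled by a single interrupt.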

UDP tuning linux

I have a C application which transmits a UDP stream. It works well on most servers, but it misbehaves on a few.
I have a 100 Mbps network connection, say eth1, on the server. Over this network I usually transmit (TX) around 10-30 Mbps of UDP streams, and the connection carries around 100-300 Kbps of RX to the server. The server has another network connection, say eth0, from which the C application receives UDP streams and forwards them to the 100 Mbps connection, eth1.
My application uses the blocking sendto() function to transmit UDP packets on eth1. Packets are of variable length, from 17 bytes to a maximum of 1333 bytes, but most of the time they are larger than 1000 bytes.
The problem is: sometimes the sendto function blocks on eth1 for a very long time, around 1 second. This happens once every 30 seconds to 3 minutes. While sendto blocks, the kernel buffers a lot of UDP packets from eth0 in the UDP receive buffer, from which the C application reads. Once sendto returns from the long blocking call on eth1, the C application has a lot of buffered packets from eth0 to transmit, and it sends them all with the next sendto calls. This creates a spike in the rate at the other endpoint, which receives the UDP stream from eth1, and gives a Z-shaped rate graph there. This Z-shaped spike in the rate is my problem.
I have tried increasing wmem_default from around 131 KB to 5 MB in the kernel settings to overcome the spike. This resolves the spike: I no longer get the Z-shaped spike in the rate at the other endpoint. But I got a new issue: I get a lot of packet loss in place of the spike. I think it may be because the send buffer of eth1 accumulates a lot of packets while sending the current packet on eth1 takes a long time (which may be why sendto blocks for so long). When the NIC then sends all the accumulated packets from the send buffer within a short time, this may cause network congestion, and I get a lot of packet loss instead of the spike.
So this is the second problem. But I think the root cause is: why does the NIC sometimes pause for a long time while sending traffic, once every 30 seconds to 3 minutes?
Maybe I need to look at the TX ring buffer of the eth1 driver? When the socket send buffer fills up because the NIC is not transmitting everything in time (due to random long TX pauses), the next call to sendto blocks waiting for room in the socket send buffer. Does it also block waiting for room in the driver's TX ring buffer?
Please don't tell me that UDP is unreliable and that we can't control packet loss. I know that it is unreliable and that UDP packets can be lost. But I am sure we can still do something to minimize packet loss.
EDIT
I have tried increasing wmem_default from around 131 KB to 5 MB in the kernel settings to overcome the spike. I have also removed the blocking sendto call. Now I call sendto(sockfd, buf, len, MSG_DONTWAIT, dest_addr, addrlen); with a large send buffer set through wmem_default. Thanks to the large send buffer I am not getting any EAGAIN or EWOULDBLOCK errors from sendto, but I am still losing packets in place of the spike.
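A non-blocking send of this kind can be wrapped so that the caller sees explicitly when the socket send buffer is full and can decide whether to drop, queue, or pace the datagram. This is only a sketch; the function name send_nowait and its return convention are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Try to queue one datagram without blocking.
 * Returns 1 if the kernel accepted it, 0 if the send buffer is full right now
 * (EAGAIN/EWOULDBLOCK), and -1 on any other error (errno is left set). */
static int send_nowait(int fd, const void *buf, size_t len,
                       const struct sockaddr *dest, socklen_t destlen)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = sendto(fd, buf, len, MSG_DONTWAIT, dest, destlen);
        if (n >= 0)
            return 1;
        if (errno == EINTR)
            continue;                 /* interrupted, retry the same datagram */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                 /* no room: caller decides what to do */
        return -1;
    }
}
```

A return value of 0 is the place where an application could count or deliberately drop packets, rather than letting them pile up and leave as one burst.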
EDIT
With the non-blocking sendto call and the huge wmem_default, and with no EAGAIN or EWOULDBLOCK errors from sendto, the spikes are gone because packets no longer accumulate in the receive buffer of eth0. I think this is a possible solution from the application side. But the main problem remains: why does the NIC slow down every few moments? What are the possible reasons? When it resumes from a long TX pause, it may have a lot of packets accumulated in the send buffer, which are then sent as a burst and congest the network, causing a lot of packet loss.
More update
I use this same C application to transmit locally on the machine (127.0.0.1), and I never get any spikes or packet loss locally.
The problem is: sometimes the sendto function blocks on eth1 for a very long time, around 1 second.
Blocking sendto may block, surprisingly.
The problem is: sometimes the sendto function blocks on eth1 for a very long time, around 1 second.
It could be that the IP stack is performing path MTU discovery:
While MTU discovery is in progress, initial packets from datagram sockets may be dropped. Applications using UDP should be aware of this and not take it into account for their packet retransmit strategy.
I have tried increasing wmem_default from around 131 KB to 5 MB in the kernel settings to overcome the spike.
Be careful with increasing buffer sizes. After a certain limit increasing buffer sizes only increases the amount of queuing and hence delay, leading to the infamous bufferbloat.
You may also play around with the NIC queuing disciplines; they are responsible for dropping outgoing packets.

nonblocking send()/write() and pending data dealing

When the send (or write) buffer is almost full, say there are only 500 bytes of space left, and I have a NONBLOCKING fd and do
n = send(fd, buf, 1000, 0);
I will get n < 0, and I can get an EWOULDBLOCK or EAGAIN error. My questions are:
1. Here, does send write 500 bytes into the send buffer, or 0 bytes?
2. If 500 bytes are written to the buffer and the fd is a UDP socket, is the datagram then split into 2 parts?
3. I need to use the fd to send many datagrams. If the send buffer is full at some point (i.e. I get an EWOULDBLOCK or EAGAIN error), I need to keep a pending list of datagrams (a FIFO queue). Every time I want to send a datagram, I have to check whether the pending list is empty, and if it is not, send the datagrams in the pending list first. This design seems a bit troublesome to me. It amounts to extending the kernel send buffer (by the way, is it in the kernel?) with a userspace pending list. Is there a better solution?
thanks!
The below only applies to UDP (and only in linux, though others are similar), but that seems to be what you're asking about.
Setting non-blocking mode on a UDP socket is completely irrelevant (for sending) as a send will never block -- it immediately sends the packet, without any buffering.
It IS possible (if the machine is very busy) for there to be a buffer space problem (running out of transient packet buffers for packet processing), but in that case the call will return ENOBUFS, regardless of whether the fd is blocking or non-blocking. This should be extremely rare.
There's a potential problem if you're generating packets faster than the network can take them (fairly easy to do on a fast machine and a 10Mbit ethernet port), in which case the kernel will start dropping the outgoing packets. Unfortunately there's no easy way to detect when this happens (you can check the interface for TX dropped packets, but that won't tell you which packets were dropped).
It's also possible to have a problem if you use the UDP_CORK socket option, which buffers data written to the socket instead of sending a packet, and only sends a single packet when the CORK option is unset. In this case, if the buffer grows too big you'll get EMSGSIZE (and again, the NONBLOCKING setting is irrelevant).
If you are talking about UDP, you are completely off point here - for UDP the value of the SO_SNDBUF socket option puts a limit on the size of a datagram you can send. In other words, there's no real per-socket send buffer (though data is still queued in the kernel to be sent out by appropriate network controller). You would get EMSGSIZE if you try to send(2) more in one shot.
For TCP, though, you would only get EWOULDBLOCK when there is no space in the send buffer at all, i.e. no data has been copied from user space to the kernel. Otherwise send(2)'s return value tells you exactly how many bytes have been copied.
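For the TCP case just described, a non-blocking write loop usually tracks how many bytes the kernel has taken so far and stops at the first EWOULDBLOCK. Here is a minimal sketch under that assumption; the name send_some is made up, and MSG_DONTWAIT and MSG_NOSIGNAL are used as Linux-specific conveniences so the example does not depend on the fd's O_NONBLOCK flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Push as much of buf as the kernel will take right now on a stream socket.
 * Returns the number of bytes copied into the send buffer (0..len),
 * or -1 on a real error. The caller keeps buf + result as pending data. */
static ssize_t send_some(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = send(fd, buf + done, len - done,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            done += (size_t)n;        /* partial writes are normal for TCP */
            continue;
        }
        if (n == 0)
            break;
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                    /* buffer full: retry when writable */
        return -1;
    }
    return (ssize_t)done;
}
```

The remaining bytes would typically be retried once poll() or select() reports the socket writable again, so the only userspace state needed is one buffer and an offset.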

What are meanings of fields in /proc/net/dev?

The Linux file /proc/net/dev reads like this:
[me@host ~]$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
What do the fields drop and errs mean?
Are some errs packets also counted among the drop packets?
Why is a packet counted under errs? Is it because of a checksum error?
Why is a packet dropped? Is it because the system does not have enough buffer space, or because there is a burst on the NIC?
Do the two fields take into account packets that are destined for another host (e.g. when the NIC is working in promiscuous mode)?
You can have a look at net/core/dev.c in the source tree to see what they mean:
    seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
                    "%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
               dev->name,
               stats->rx_bytes,
               stats->rx_packets,
               stats->rx_errors,
               stats->rx_dropped + stats->rx_missed_errors,
               stats->rx_fifo_errors,
               stats->rx_length_errors + stats->rx_over_errors +
                   stats->rx_crc_errors + stats->rx_frame_errors,
               stats->rx_compressed,
               stats->multicast,
               stats->tx_bytes,
               stats->tx_packets,
               stats->tx_errors,
               stats->tx_dropped,
               stats->tx_fifo_errors,
               stats->collisions,
               stats->tx_carrier_errors + stats->tx_aborted_errors +
                   stats->tx_window_errors + stats->tx_heartbeat_errors,
               stats->tx_compressed);
So:
receive errs counts any kind of invalid packet, e.g. invalid length or invalid checksum
the transmit carrier column is the sum of
    carrier errors
    aborted errors
    window errors
    heartbeat errors
(whatever they all mean)
And yes, I think drop counts packets that the device dropped because it ran out of buffer space. On the receive side, the code above also adds rx_missed_errors to that column.
According to http://www.onlamp.com/pub/a/linux/2000/11/16/LinuxAdmin.html, the meanings of each of the columns are:
bytes       The total number of bytes of data transmitted or received by the interface.
packets     The total number of packets of data transmitted or received by the interface.
errs        The total number of transmit or receive errors detected by the device driver.
drop        The total number of packets dropped by the device driver.
fifo        The number of FIFO buffer errors.
frame       The number of packet framing errors.
colls       The number of collisions detected on the interface.
compressed  The number of compressed packets transmitted or received by the device driver. (This appears to be unused in the 2.2.15 kernel.)
carrier     The number of carrier losses detected by the device driver.
multicast   The number of multicast frames transmitted or received by the device driver.
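To look at these columns from a program, one data line of /proc/net/dev can be split into its 8 receive and 8 transmit counters. This is a rough sketch: the struct, the enum names, and parse_netdev_line are invented for the example, and it assumes the 16-column layout shown in the header above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Column indexes shared by the receive and transmit halves. */
enum { COL_BYTES, COL_PACKETS, COL_ERRS, COL_DROP, COL_FIFO };

struct netdev_counters {
    char name[32];
    unsigned long long rx[8];  /* bytes packets errs drop fifo frame compressed multicast */
    unsigned long long tx[8];  /* bytes packets errs drop fifo colls carrier compressed */
};

/* Parse one interface line such as "  eth0: 1000 10 ...".
 * Returns 0 on success, -1 for header lines or malformed input. */
static int parse_netdev_line(const char *line, struct netdev_counters *c)
{
    assert(line != NULL && c != NULL);

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                       /* the two header lines have no colon */

    while (*line == ' ')
        line++;                          /* interface names are right-aligned */
    size_t n = (size_t)(colon - line);
    if (n == 0 || n >= sizeof c->name)
        return -1;
    memcpy(c->name, line, n);
    c->name[n] = '\0';

    int got = sscanf(colon + 1,
                     "%llu %llu %llu %llu %llu %llu %llu %llu "
                     "%llu %llu %llu %llu %llu %llu %llu %llu",
                     &c->rx[0], &c->rx[1], &c->rx[2], &c->rx[3],
                     &c->rx[4], &c->rx[5], &c->rx[6], &c->rx[7],
                     &c->tx[0], &c->tx[1], &c->tx[2], &c->tx[3],
                     &c->tx[4], &c->tx[5], &c->tx[6], &c->tx[7]);
    return got == 16 ? 0 : -1;
}
```

Reading the file with fgets() and feeding each line to this function would let you watch, for example, rx[COL_ERRS] and rx[COL_DROP] change independently over time.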
Since no one has answered for almost six months, I feel free to speculate:
I don't think errs and drop overlap. I also think that errs are checksum errors or other bad data in a received packet (e.g. not enough data to constitute a whole packet). Further, I believe drop only applies to outgoing packets: how would the system know about packets dropped somewhere else?

Resources