Receive TCP ACKs on application level [duplicate]

Receive TCP ACKs on application level [duplicate] - linux

Linux has ioctl SIOCOUTQ described in man-page tcp(7) that returns amount of unsent data in socket buffers. If I understand kernel code right, all the non-ACKed data is counted as "unsent". The ioctl is available at least since 2.4.x.
Is there anything alike for {Free,Net,Open,*}BSD, Solaris, Windows?

There are (at least) two different pieces of information you might want: the amount of data that hasn't been sent yet, and the amount of data that's been sent-but-not-ACK-ed.
On Linux: SIOCOUTQ is documented to give the amount of unsent data, but actually gives the sum of (unsent data + sent-but-not-ACK-ed data). A recent patch (Feb 2016) made it possible to get the actual unsent data from the tcpi_notsent_bytes field in the TCP_INFO struct.
On macOS and iOS: getsockopt(fd, SOL_SOCKET, SO_NWRITE, ...) is just like SIOCOUTQ: it's documented to give the amount of unsent data, but actually gives the sum of (unsent data + sent-but-not-ACK-ed data). I don't know any way to get more fine-grained information.
On Windows: GetPerTcpConnectionEStats with the TcpConnectionEstatsSendBuff option gives you both unsent data and sent-but-not-ACK-ed data as two separate numbers.
I don't know how to get this information on other operating systems.

Since TCP/IP is implemented as a stream device, it might be possible to take a kernel dive and get the queue->q_count (number of bytes on the queue).

Related

The dmaengine in Linux - how to check the number of actually transferred bytes in a single SG transfer?

The dmaengine in Linux significantly simplifies writing of drivers for devices using DMA, especially if they support and use scatter-gather (SG) transfers.
However, there is a problem in case if the length of such transfer is not known a priori. This situation is quite common. It may happen in case of USB transfers, or in case of AXI Stream transfers. In the case of an AXI Stream connected device, the transfer should be terminated by setting the tlast signal to '1'. The driver should prepare the buffer for the longest possible transfer, but after the transfer is complete, it should be able to find the number of actually transferred bytes. Unfortunately it seems, that there is no documented way to read the length of the completed transfer.
The single transfer is represented with an SG-table, that is later converted into the dma_async_tx_descriptor, using the dmaengine_prep_slave_sg function , submitted to the DMA layer (e.g. like here), and finally scheduled for execution via dma_async_issue_pending. After that, the transfer descriptor may be accessed only via a dma_cookie returned by the dmaengine_prep_slave_sg function.
The problem with determining the actual length of the transfer was reported for the USB transfers, and a patch adding the transferred field to the dma_async_tx_descriptor was proposed. However that proposal was rejected, and after a discussion a solution based on using the dmaengine_tx_status, and checking the residue field in the returned dma_tx_state structure.
Unfortunately, it seems that the proposed solution does not work neither in the 4.4 kernel (used for Xilinx SoC devices) nor in the newest 4.7 kernel.
The residue field is set to 0 in case of completed transfers regardless of the actual number of transferred bytes.
So my question is: How can I reliably determine the actual number of transferred bytes in the completed SG DMA transfer in a dmaengine compatible driver?
PS. That question with more Xilinx-specific details related to the AXI DMA IP core has been also asked on the Xilinx forum.

Finding out the number of dropped packets in raw sockets

I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds to capture all packets on the socket. I am worried that the recieve buffer for this socket occainally gets full (due to traffic bursts) and some packets are dropped.
How do I know if packets were dropped due to lack of space in the
socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes that are currently stored in the revice buffer for that socket, but I can not tell if any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.

I was having a similar problem as you. I knew that tcpdump was able to to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the code of tcpdump, I noticed that it is not generating those statistic by itself, but that it is using the libpcap library to get those statistics. The libpcap is on the other hand getting those statistics by accessing the if_packet.h header and calling the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
I had to interact somehow with the linux header files from my Pyhton script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap which is pypcap to get those information.
Since I had no clue how to do the first thing, I implemented the second option. Here is an example how to get packet statistics using pypcap and how to get the packet data using dpkg:
import pcap
import dpkt
import socket
pc=pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)
def packet_handler(ts,pkt):
#printing packet statistic (packets received, packets dropped, packets dropped by interface
print pc.stats()
#example packet parsing using dpkt
eth=dpkt.ethernet.Ethernet(pkt)
if eth.type != dpkt.ethernet.ETH_TYPE_IP:
return
ip =eth.data
layer4=ip.data
ipsrc=socket.inet_ntoa(ip.src)
ipdst=socket.inet_ntoa(ip.dst)
pc.loop(0,packet_handler)
tpacket_stats structure is defined in linux/packet.h header file
Create variable using the tpacket_stats structre and pass it to getSockOpt with PACKET_STATISTICS SOL_SOCKET options will give packets received and dropped count.
-- some times drop can be due to buffer size
-- so if you want to decrease the drop count check increasing the buffersize using setsockopt function

First off, switch your operating system.
You need a reliable, network oriented operating system. Not some pink fluffy "ease of use" with "security" functionality enabled. NetBSD or Gentoo/ArchLinux (the bare installations, not the GUI kitted ones).
Start a simultaneous tcpdump on a network tap and capture the traffic you're supposed to receive along side of your program and compare the results.
There's no efficient way to check if you've received all the packets you intended to on the receiving end since the packets might be dropped on a lower level than you anticipate.
Also this is a question for Unix # StackOverflow, there's no programming here what I can see, at least there's no code.
The only certain way to verify packet drops is to have a much more beefy sender (perhaps a farm of machines that send packets) to a single client, record every packet sent to your reciever. Have the statistical data analyzed and compared against your senders and see how much you dropped.
The cheaper way is to buy a network tap or even more ad-hoc enable port mirroring in your switch if possible. This enables you to dump as much traffic as possible into a second machine.
This will give you a more accurate result because your application machine will be busy as it is taking care of incoming traffic and processing it.
Further more, this is why network taps are effective because they split the communication up into two channels, the receiving and sending directions of your traffic if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port, you get a more accurate traffic mirroring).
So either use port mirroring
Or you buy one of these:

Designing a Linux char device driver so multiple processes can read

I notice that for serial devices, e.g. /dev/ttyUSB0, multiple processes can open the device but only one process gets the bytes (whichever reads them first).
However, for the Linux input API, e.g. /dev/input/event0, multiple processes can open the device, and all of the processes are able to read the input events.
My current goal:
I'd like to write a driver for several multi-position switches (e.g. like a slider switch with 3 or 4 possible positions), where apps can get a notification of any switch position changes. Ideally I'd like to use the Linux input API, however it seems that the Linux input API has no support for the concept of multi-position switches. So I'm looking at making a custom driver with similar capabilities to the Linux input API.
Two questions:
From a driver design point-of-view, why is there that difference in behaviour between Linux input API and Linux serial devices? I reckon it could be useful for multiple processes to all be able to open one serial port and all listen to incoming bytes.
What is a good way to write a Linux character device driver so that it's like the Linux input API, so multiple processes can open the device and read all the data?

The distinction is partly historical and partly due to the different expectation models.
The event subsystem is designed for unidirectional notification of simple events from multiple writers into the system with very little (or no) configuration options.
The tty subsystem is intended for bidirectional end-to-end communication of potentially large amounts of data and provides a reasonably flexible (albeit fairly baroque) configuration mechanism.
Historically, the tty subsystem was the main mechanism of communicating with the system: you plug your "teletype" into a serial port and bits went in and out. Different teletypes from different vendors used different protocols and thus the termios interface was born. To make the system perform well in a multi-user context, buffering was added in the kernel (and made configurable). The expectation model of the tty subsystem is that of a point-to-point link between moderately intelligent endpoints who will agree on what the data passing between them will look like.
While there are circumstances where "single writer, multiple readers" would make sense in the tty subsystem (GPS receiver connected to a serial port, continually reporting its position, for instance), that's not the main purpose of the system. But you can easily accomplish this "multiple readers" in userspace.
The event system on the other hand, is basically an interrupt mechanism intended for things like mice and keyboards. Unlike teletypes, input devices are unidirectional and provide little or no control over the data they produce. There is also little point in buffering the data. Nobody is going to be interested in where the mouse moved ten minutes ago.
I hope that answers your first question.
For your second question: "it depends". What do you want to accomplish? And what is the "longevity" of the data? You also have to ask yourself whether it makes sense to put the complexity in the kernel or if it wouldn't be better to put it in userspace.
Getting data out to multiple readers isn't particularly difficult. You could create a receive buffer per reader and fill each of them as the data comes in. Things get a little more interesting if the data comes in faster than the readers can consume it, but even that is mostly a solved problem. Look at the network stack for inspiration!
If your device is simple and just produces events, maybe you just want to be an input driver?
Your second question is a lot more difficult to answer without knowing more about what you want to accomplish.
Update after you added your specific goal:
When I do position switches, I usually just create a character device and implement poll and read. If you want to be fancy and have a lot of switches, you could do mmap but I wouldn't bother.
Userspace just opens your /dev/foo and reads the current state and starts polling. When your switches change state, you just wake up the readers and they they'll read again. All your readers will wake up, they'll all read the new state and everyone will be happy.
Be careful to only wake up readers when your switches are 'settled'. Many position switches are very noisy and they'll bounce around a fair bit.
In other words: I would ignore the input system altogether for this. As you surmise, position switches are not really "inputs".

How a character device handles these kinds of semantics is completely up to the driver to define and implement.
It would certainly be possible, for example, to implement a driver for a serial device that will deliver all read data to every process that has the character driver open. And it would also be possible to implement an input device driver that delivers events to only one process, whichever one is queued up to receive the latest event. It's all a matter of coding the appropriate implementation.
The difference is that it all comes down to a simple question: "what makes sense". For a serial device, it's been decided that it makes more sense to handle any read data by a single process. For an input device, it's been decided that it makes more sense to deliver all input events to every process that has the input device open. It would be reasonable to expect that, for example, one process might only care about a particular input event, say pointer button #3 pressed, while another process wants to process all pointer motion events. So, in this situation, it might make more sense to distribute all input events to all concerned parties.
I am ignoring some side issues, for simplicity, like in the situation of serial data being delivered to all reading processes what should happen when one of them stops reading from the device. That's also something that would be factored in, when deciding how to implement the semantics of a particular device.

What is a good way to write a Linux character device driver so that it's like the Linux input API, so multiple processes can open the device and read all the data?
See the .open member of struct file_operations for the char device. Whenever userspace opens the device, then the .open function is called. It can add the open file to a list of open files for the device (and then .release removes it).
The char device data struct should most likely use a kernel struct list_head to keep a list of open files:
struct my_dev_data {
...
struct cdev cdev;
struct list_head file_open_list;
...
}
Data for each file:
struct file_data {
struct my_dev_data *dev_data;
struct list_head file_open_list;
...
}
In the .open function, add the open file to dev_data->file_open_list (use a mutex to protect these operations as needed):
static int my_dev_input_open(struct inode * inode, struct file * filp)
{
struct my_dev_data *dev_data;
dev_data = container_of(inode->i_cdev, struct my_dev_data, cdev);
...
/* Allocate memory for file data and channel data */
file_data = devm_kzalloc(&dev_data->pdev->dev,
sizeof(struct file_data), GFP_KERNEL);
...
/* Add open file data to list */
INIT_LIST_HEAD(&file_data->file_open_list);
list_add(&file_data->file_open_list, &dev_data->file_open_list);
...
file_data->dev_data = dev_data;
filp->private_data = file_data;
}
The .release function should remove the open file from dev_data->file_open_list, and release the memory of file_data.
Now that the .open and .release functions maintain the list of open files, it is possible for all open files to read data. Two possible strategies:
A separate read buffer for each open file. When data is received, it is copied into the buffers of all open files.
A single read buffer for the device, but a separate read pointer for each open file. Data can be freed from the buffer once it has been read through all open files.

Serial to input/event
You could try to look into serial mouse driver source code.
This seem to be what you're searching for: from a TTYSx build a input/event device.
Simplier: creating a server, instead of a driver.
Historically, the 1st character device I remember is /dev/lp0.
To be able to write on it from many source, without overlap or other conflict,
a LPR server was wrotten.
To share a device, you have to
open this device in exclusive (rw) mode.
Create a socket (un*x or TCP) to listen on
redirect request from socket's clients to the device and maybe
store device status (from reading device's answers)
send device status to socket's clients when required.

How do I get amount of queued data for UDP socket?

To see how well I'm doing in processing incoming data, I'd like to measure the queue length at my TCP and UDP sockets.
I know that I can get the queue size via SO_RCVBUF socket option, and that ioctl(<sockfd>, SIOCINQ, &<some_int>) tells me the information for TCP sockets. But for UDP the SIOCINQ/FIONREAD ioctl returns only the size of next pending datagram. Is there a way how to get queue size for UDP, without having to parse system tables such as /proc/net/udp?

FWIW, I did some experiments to map out the behavior of FIONREAD on different platforms.
Platforms where FIONREAD returns all the data pending in a SOCK_DGRAM socket:
Mac OS X, NetBSD, FreeBSD, Solaris, HP-UX, AIX, Windows
Platforms where FIONREAD returns only the bytes for the first pending datagram:
Linux
It might also be worth noting that some implementations include headers or other overhead bytes in the count, while others only count the payload bytes. Linux appears to return the payload size, not including IP headers.

As ldx mentioned, it is not supported through ioctl or getsockopt.
It seems to me that the current implementation of SIOCINQ was aimed to determine how much buffer is needed to read the entire waiting buffer (but I guess it is not so useful for that, as it can change between the read of it to the actual buffer read).
There are many other telemetries which are not supported though such system calls, I guess there is no real need in normal production usage.
You can check the drops/errors through "netstat -su" , or better using SNMP (udpInErrors) if you just want to monitor the machine state.
BTW: You always have the option to hack in the Kernel code and add this value (or others).

Serial port : not able to write big chunk of data

I am trying to send text data from one PC to other using Serial cable. One of the PC is running linux and I am sending data from it using write(2) system call. The log size is approx 65K bytes but the write(2) system call returns some 4K bytes (i.e. this much amount of data is getting transferred). I tried breaking the data in chunks of 4K but write(2) returns -1.
My question is that "Is there any buffer limit for writing data on serial port? or can I send data of any size?. Also do I need to continously read data from other PC as I write 4K chunk of data"
Do I need to do any special configuration in termios structure for sending (huge) data?

The transmit buffer is one page (took a look at Linux 2.6.18 sources) - which is 4K in most (if not all) cases.
The other end must read (don't know the size of the receive buffer), but more importantly you should not write faster than the serial port can transmit, if you are using 115200 bps 8-N-1 you can write the 4K chunk approximately 3 times a second. (115200 / 9 / 4096 = 3.125)

Yes, there is a buffer limit - but when you reach that limit, the write() should block.
When write() returns -1, what is errno set to?

Make sure that the receiver is reading.
You should update the current position it your buffer from the write(), and continue the next write from there. (Applies to all writes(), regardless if the fd is a serial port, tcp socket or a file.)
If you get an error back for subsequent writes. Judging by the manpage, its safe to retry the writes for the following errnos: EAGAIN, EINTR, and probably ENOSPC. Use perror() to see what you get. (..and post it, I am curious.)
EFBIG would seem to indicate that you are trying to write using a buffer (or rather count) that is too large, but that is probably much larger than 64k.
If the internal buffer is filled up, because you are writing to fast, try to (nano)sleep a little between the writes. There are several clever ways of doing this (like tcp does), but if the rate is known, just write at a fixed rate.
If you think the receiver is actually reading, but not much happens, have a look at the serial ports flow-control options and if the cable is wired for DTS/RTS.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string