Finding out the number of dropped packets in raw sockets

Finding out the number of dropped packets in raw sockets - linux

I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds to capture all packets on the socket. I am worried that the recieve buffer for this socket occainally gets full (due to traffic bursts) and some packets are dropped.
How do I know if packets were dropped due to lack of space in the
socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes that are currently stored in the revice buffer for that socket, but I can not tell if any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.

I was having a similar problem as you. I knew that tcpdump was able to to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the code of tcpdump, I noticed that it is not generating those statistic by itself, but that it is using the libpcap library to get those statistics. The libpcap is on the other hand getting those statistics by accessing the if_packet.h header and calling the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
I had to interact somehow with the linux header files from my Pyhton script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap which is pypcap to get those information.
Since I had no clue how to do the first thing, I implemented the second option. Here is an example how to get packet statistics using pypcap and how to get the packet data using dpkg:
import pcap
import dpkt
import socket
pc=pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)
def packet_handler(ts,pkt):
#printing packet statistic (packets received, packets dropped, packets dropped by interface
print pc.stats()
#example packet parsing using dpkt
eth=dpkt.ethernet.Ethernet(pkt)
if eth.type != dpkt.ethernet.ETH_TYPE_IP:
return
ip =eth.data
layer4=ip.data
ipsrc=socket.inet_ntoa(ip.src)
ipdst=socket.inet_ntoa(ip.dst)
pc.loop(0,packet_handler)
tpacket_stats structure is defined in linux/packet.h header file
Create variable using the tpacket_stats structre and pass it to getSockOpt with PACKET_STATISTICS SOL_SOCKET options will give packets received and dropped count.
-- some times drop can be due to buffer size
-- so if you want to decrease the drop count check increasing the buffersize using setsockopt function

First off, switch your operating system.
You need a reliable, network oriented operating system. Not some pink fluffy "ease of use" with "security" functionality enabled. NetBSD or Gentoo/ArchLinux (the bare installations, not the GUI kitted ones).
Start a simultaneous tcpdump on a network tap and capture the traffic you're supposed to receive along side of your program and compare the results.
There's no efficient way to check if you've received all the packets you intended to on the receiving end since the packets might be dropped on a lower level than you anticipate.
Also this is a question for Unix # StackOverflow, there's no programming here what I can see, at least there's no code.
The only certain way to verify packet drops is to have a much more beefy sender (perhaps a farm of machines that send packets) to a single client, record every packet sent to your reciever. Have the statistical data analyzed and compared against your senders and see how much you dropped.
The cheaper way is to buy a network tap or even more ad-hoc enable port mirroring in your switch if possible. This enables you to dump as much traffic as possible into a second machine.
This will give you a more accurate result because your application machine will be busy as it is taking care of incoming traffic and processing it.
Further more, this is why network taps are effective because they split the communication up into two channels, the receiving and sending directions of your traffic if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port, you get a more accurate traffic mirroring).
So either use port mirroring
Or you buy one of these:

Related

Dropping packets with netcat using a UDP transfer?

I'm working on sending large data files between two Linux computers via a 10 Gigabit Ethernet cable and netcat with a UDP transfer, but seem to be having issues.
After running several tests, I've come to the conclusion that netcat is the issue. I've tested the UDP transfer using [UDT][1], [Tsunami-UDP]2, and a Python UDT transfer as well, and all of which have not had any packet loss issues.
On the server side, we've been doing:
cat "bigfile.txt" | pv | nc -u IP PORT
then on the client side, we've been doing:
nc -u -l PORT > "outputFile.txt"
A few things that we've noticed:
On one of the computers, regardless of whether it's the client or server, it just "hangs". That is to say, even once the transfer is complete, Linux doesn't kill the process and move to the next line in the terminal.
If we run pipe view on the receiving side as well, the incoming data rate is significantly lower than what the sending side thinks it's sending.
Running Wireshark doesn't show any packet loss.
Running the system performance monitor in Linux shows that the incoming data rate (for the receiving side) is the same as the outgoing data rate from the sending side. This is in contrast to what pipe view thinks (see #2)
We're not sure where the issue is with netcat, and if there is a way around it. Any help/insights would be greatly appreciated.
Also, for what it's worth, using netcat with a TCP transfer works fine. And, I do understand that UDP isn't known for reliability, and that packet loss should be expected, but it's the protocol we must use.
Thanks

It could well be that the sending instance is sending the data too fast for the receiving instance. Note that this can occur even if you see no drops on the receiving NIC (as you seem to be saying), because the loss can occur at OS level instead. Your OS could have its UDP buffers overflowing. Run this command:
watch -d "cat /proc/net/snmp | grep -w Udp"
To see if your RcvbufErrors field is non-zero and/or growing while your file transfer is going on.

This answer (How to send only one UDP packet with netcat?) says that nc sends one packet per line. Assuming that's true, this could lead to a significantly higher number of packets than your other transfer mechanisms. Presumably, as #Smeeheey suggested, you're running out of receive buffers on the receiving end.
To cause your sending end to exit, you can add -q 1 to the command line (exit 1 second after seeing end of file).
But there's no way that the the receiving end nc can know when the transfer is complete. This is why these other mechanisms are "protocols" -- they have mechanisms built into them to communicate the bounds of a file. Raw UDP has no concept of end of file.

Tuning the Linux networking stack is a bit complicated, as there are many components to tune to figure out where data is being dropped.
If possible/feasible, I'd recommend that you start by monitoring packet drops throughout the entire network stack. Once you've done that, you can determine where exactly packets are being dropped and then adjust tuning parameters as needed. There are a lot of different files to measure with lots of different fields. I wrote a detailed blog post about monitoring and tuning each component of the Linux networking stack from top to bottom. It's a bit difficult to summarize all the information there, but take a look, I think it can help guide you.

Packet sniffing with Channel hopping in linux

I want to scan the WiFi on b/g interface, and I want to sniff packets on each channel, by spending 100 ms on each channel. One of the biggest requirements I have is not to store the packets I get (because of less disk space), my application will parse the packets, retrieve Tx MAC and RSSI, and would construct the list (MAC, Avg RSSI, #Records) at the end of every minute, and then clear this list and start over again.
I've figured out two ways to do channel hop on linux:
Option 1: Use wi_set_channel(struct wif *, channel number) system call in C, and write the code in C to sniff all the packets
Option 2: Use linux command iw dev wlan0 set channel 4, and use any language like python+scapy OR C to sniff the packets
I'd like to know which is more efficient of the two, if at all, so that the delay/wait for WiFi interface to switch to a different channel is minimal. I suspect that this delay would mean loss of packet while the switch to a different channel happens, is that the case?
I would also like to know some of the other ways to solve this problem in linux.

Answer to your first question us straight forward, use Option1 and have two threads doing the work - one thread populating an in-memory circular buffer with packets collected from channels and second thread processing them in sequence. You can determine best packet discarding algo depending on the measured performance of processing thread and other factors if any.
As for the second question, I would go with the above for being in complete control on exactly how you can tune the algorithm rather than depending on canned processing tools.

Capturing performance with pcap vs raw socket

When capturing network traffic for debugging, there seem to be two common approaches:
Use a raw socket.
Use libpcap.
Performance-wise, is there much difference between these two approaches? libpcap seems a nice compatible way to listen to a real network connection or to replay some canned data, but does that feature set come with a performance hit?

The answer is intended to explain more about the libpcap.
libpcap uses the PF_PACKET to capture packets on an interface. Refer to the following link.
https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt
From the above link
In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet, it requires two if you want to get packet's timestamp
(like libpcap always does).
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way reading packets just needs to wait for them,
most of the time there is no need to issue a single system call. Concerning
transmission, multiple packets can be sent through one system call to get the
highest bandwidth. By using a shared buffer between the kernel and the user
also has the benefit of minimizing packet copies.
performance improvement may vary depending on PF_PACKET implementation is used.
From https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt -
It is said that TPACKET_V3 brings the following benefits:
*) ~15 - 20% reduction in CPU-usage
*) ~20% increase in packet capture rate
The downside of using libpcap -
If an application needs to hold the packet then it may need to make
a copy of the incoming packet.
Refer to manpage of pcap_next_ex.
pcap_next_ex() reads the next packet and returns a success/failure indication. If the packet was read without problems, the pointer
pointed to by the pkt_header argument is set to point to the
pcap_pkthdr struct for the packet, and the pointer pointed to by the
pkt_data argument is set to point to the data in the packet. The
struct pcap_pkthdr and the packet data are not to be freed by the
caller, and are not guaranteed to be valid after the next call to
pcap_next_ex(), pcap_next(), pcap_loop(), or pcap_dispatch(); if the
code needs them to remain valid, it must make a copy of them.
Performance penalty if application only interested in incoming
packets.
PF_PACKET works as taps in the kernel i.e. all the incoming and outgoing packets are delivered to PF_SOCKET. Which results in an expensive call to packet_rcv for all the outgoing packets. Since libpcap uses the PF_PACKET, so libpcap can capture all the incoming as well outgoing packets.
if application is only interested in incoming packets then outgoing packets can be discarded by setting pcap_setdirection on the libpcap handle. libpcap internally discards the outgoing packets by checking the flags on the packet metadata.
So in essence, outgoing packets are still seen by the libpcap but only to be discarded later. This is performance penalty for the application which is interested in incoming packets only.

Raw packet works on IP level (OSI layer 3), pcap on data link layer (OSI layer 2). So its less a performance issue and more a question of what you want to capture. If performance is your main issue search for PF_RING etc, that's what current IDS use for capturing.
Edit: raw packets can be either IP level (AF_INET) or data link layer (AF_PACKET), pcap might actually use raw sockets, see Does libpcap use raw sockets underneath them?

Alternative to pcap (Linux)

On a Linux router I wrote a C-program which uses pcap to get the IP header, and length of the packet. In that way I am able to gather statistics and measure bandwidth based on IP. Pretty neat. :-)
Now the traffic and number of users has grown, and the old program starts to struggle. That is, the router struggles to cope with the massive amount of packets. It's over 50000 packets per second all in all in "prime time".
The program itself is pretty optimized. I don't want to show off, but I believe it's as good as it can get. It reads the IP header, and the packet length. It then converts the IP to a index (just a simple subtract), and the length of the packet is stored (accumulated) in an array. Every now and then (actually a SIGALRM) it stores the array in a MySQL database.
My question is: Is there any other way to tap into an ethernet device to get the bit-stream "cheaper" than pcap?
I can of course modify the ethernet driver to include single IP statistics gathering, but that seems a little overkill.
Basically my program is a 'tcpdump' on a busy eth0 and that will eventually kill my router.

Have you considered PF_RING? It's still the pcap-like world, but on steroids - thanks to the zero-copy mechanism:
As you see, there is a kernel module that provides low-level packet copying into the PF_RING buffer, and there is the userland part that allows to access this buffer.
Who needs PF_RING?
Basically everyone who has to handle many packets per second. The term ‘many’ changes according to the hardware you use for traffic analysis. It can range from 80k pkt/sec on a 1,2GHz ARM to 14M pkt/sec and above on a low-end 2,5GHz Xeon. PF_RING not only enables you to capture packets faster, it also captures packets more efficiently preserving CPU cycles....

I highly recommend you to use PF_RING ZC. It could be found under /userland/examples_zc. it is part of pf_ring.
you can handle and capture tens of Gbps traffics in line rate by pf_ring zc.

I need a TCP option (ioctl) to send data immediately

I've got an unusual situation: I'm using a Linux system in an embedded situation (Intel box, currently using a 2.6.20 kernel.) which has to communicate with an embedded system that has a partially broken TCP implementation. As near as I can tell right now they expect each message from us to come in a separate Ethernet frame! They seem to have problems when messages are split across Ethernet frames.
We are on the local network with the device, and there are no routers between us (just a switch).
We are, of course, trying to force them to fix their system, but that may not end up being feasible.
I've already set TCP_NODELAY on my sockets (I connect to them), but that only helps if I don't try to send more than one message at a time. If I have several outgoing messages in a row, those messages tend to end up in one or two Ethernet frames, which causes trouble on the other system.
I can generally avoid the problem by using a timer to avoid sending messages too close together, but that obviously limits our throughput. Further, if I turn the time down too low, I risk network congestion holding up packet transmits and ending up allowing more than one of my messages into the same packet.
Is there any way that I can tell whether the driver has data queued or not? Is there some way I can force the driver to send independent write calls in independent transport layer packets? I've had a look through the socket(7) and tcp(7) man pages and I didn't find anything. It may just be that I don't know what I'm looking for.
Obviously, UDP would be one way out, but again, I don't think we can make the other end change anything much at this point.
Any help greatly appreciated.

IIUC, setting the TCP_NODELAY option should flush all packets (i.e. tcp.c implements setting of NODELAY with a call to tcp_push_pending_frames). So if you set the socket option after each send call, you should get what you want.

You cannot work around a problem unless you're sure what the problem is.
If they've done the newbie mistake of assuming that recv() receives exactly one message then I don't see a way to solve it completely. Sending only one message per Ethernet frame is one thing, but if multiple Ethernet frames arrive before the receiver calls recv() it will still get multiple messages in one call.
Network congestion makes it practically impossible to prevent this (while maintaining decent throughput) even if they can tell you how often they call recv().

Maybe, set TCP_NODELAY and set your MTU low enough so that there would be at most 1 message per frame? Oh, and add "dont-fragment" flag on outgoing packets

Have you tried opening a new socket for each message and closing it immediately? The overhead may be nauseating,but this should delimit your messages.

In the worst case scenario you could go one level lower (raw sockets), where you have better control over the packets sent, but then you'd have to deal with all the nitty-gritty of TCP.

Maybe you could try putting the tcp stack into low-latency mode:
echo 1 > /proc/sys/net/ipv4/tcp_low_latency
That should favor emitting packets as quickly as possible over combining data. Read the man on tcp(7) for more information.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string