libpcap setfilter() function and packet loss - linux

this is my first question here #stackoverflow.
I'm writing a monitoring tool for some VoIP production servers, particularly a sniff tool that allows to capture all traffic (VoIP calls) that match a given pattern using pcap library in Perl.
I cannot use poor selective filters like e.g. "udp" and then do all the filtering in my app's code, because that would involve too much traffic and the kernel wouldn't cope reporting packet loss.
What I do then is to iteratively build the more selective filter possible during the capture. At the beginning I capture only (all) SIP signalling traffic and IP fragments (the pattern match has to be done at application level in any case) then when I find some information about RTP into SIP packets, I add 'or' clauses to the actual filter-string with specific IP and PORT and re-set the filter with setfilter().
So basically something like this:
Initial filter : "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0)" -> captures all SIP traffic and IP fragments
Updated filter : "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0) or (host IP and port PORT)" -> Captures also the RTP on specific IP,PORT
Updated filter : "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0) or (host IP and port PORT) or (host IP2 and port PORT2)" -> Captures a second RTP stream as well
And so on.
This works quite well, as I'm able to get the 'real' packet loss of RTP streams for monitoring purposes, whereas with a poor selective filter version of my tool, the RTP packet loss percentage wasn't reliable because there was some packets missing due to packet drop by kernel.
But let's get to the drawback of this approach.
Calling setfilter() while capturing involves the fact that libpcap drops packets received "while changing the filter" as stated in code comments for function set_kernel_filter() into pcap-linux.c (checked libpcap version 0.9 and 1.1).
So what happens is that when I call setfilter() and some packets arrive IP-fragmented, I do loose some fragments, and this is not reported by libpcap statistics at the end: I spotted it digging into traces.
Now, I understand the reason why this action is done by libpcap, but in my case I definitely need not to have any packet drop (I don't care about getting some unrelated traffic).
Would you have any idea on how to solve this problem that is not modifying libpcap's code?

What about starting up a new process with the more specific filter. You could have two parallel pcap captures going at once. After some time (or checking that both received the same packets) you could stop the original.

Can you just capture all RTP traffic?
From capture filters the suggestion for RTP traffic is:
udp[1] & 1 != 1 && udp[3] & 1 != 1 && udp[8] & 0x80 == 0x80 && length < 250
As the link points out you will get a few false positives where DNS and possibly other UDP packets occassionally contain the header byte, 0x80, used by RTP packets, however the number should be negligible and not enough to cause kernel drops.

Round hole, square peg.
You have a tool that doesn't quite fit your need.
Another option is to do a first-level filter (as above, that captures a lot more than wanted) and pipe it into an another tool that implements the finer filter you want (down to the per-call case). If that first-level filter is too much for the kernel due to heavy RTP traffic, then you may need to do something else like keep a stable of processes to capture individual calls (so you're not changing the filter on the "main" process; it's simply instructing the others how to set their filters.)
Yes, this may mean merging captures, either on the fly (pass them all to a "save the capture" process) or after the fact.
You do realize that you may well miss RTP packets anyways if you don't install your filters fast. Don't forget that RTP packets could come in for the originator before the 200 OK comes in (or right together), and they may go back to the answerer before the ACK (or on top of it). Also don't forget INVITE with no SDP (offer in the 200 OK, answer in ACK). Etc, etc. :-)

Related

tcpreplay is sending packets out of order?

When I use 'tcpreplay' to send packets to my switch, I found that packets are out of order. For example, using tcpreplay -i eth1 test.pcap, I get:
I send packets like **[1,2,3,4,5,……]**,
but switch received **[1,3,4,2,5,……]**.
Does this problem look familiar? How did you solve it?
When you say switch received a different packet order- how are you determining this is the case? I ask because if you are sniffing on the switch port that would seem like a valid way to check for this, but if you're using a SPAN port then yeah, switches can re-order frames in my experience so I'm not that surprised.
When you run tcpdump on the tcpreplay box, which order does it show the packets being sent? Also, is there another switch in between? Because a lot of switches use a "store and forward" approach which can reorder frames (this is also why SPAN ports tend to re-order).
Lastly, tcpreplay always sends packets in order to the kernel/NIC driver/NIC because it processes the pcap file sequentially. If your computer is actually sending frames out of order, then that is happening either in the kernel, NIC driver or NIC hardware/firmware.

Finding out the number of dropped packets in raw sockets

I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds to capture all packets on the socket. I am worried that the recieve buffer for this socket occainally gets full (due to traffic bursts) and some packets are dropped.
How do I know if packets were dropped due to lack of space in the
socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes that are currently stored in the revice buffer for that socket, but I can not tell if any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.
I was having a similar problem as you. I knew that tcpdump was able to to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the code of tcpdump, I noticed that it is not generating those statistic by itself, but that it is using the libpcap library to get those statistics. The libpcap is on the other hand getting those statistics by accessing the if_packet.h header and calling the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
I had to interact somehow with the linux header files from my Pyhton script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap which is pypcap to get those information.
Since I had no clue how to do the first thing, I implemented the second option. Here is an example how to get packet statistics using pypcap and how to get the packet data using dpkg:
import pcap
import dpkt
import socket
pc=pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)
def packet_handler(ts,pkt):
#printing packet statistic (packets received, packets dropped, packets dropped by interface
print pc.stats()
#example packet parsing using dpkt
eth=dpkt.ethernet.Ethernet(pkt)
if eth.type != dpkt.ethernet.ETH_TYPE_IP:
return
ip =eth.data
layer4=ip.data
ipsrc=socket.inet_ntoa(ip.src)
ipdst=socket.inet_ntoa(ip.dst)
pc.loop(0,packet_handler)
tpacket_stats structure is defined in linux/packet.h header file
Create variable using the tpacket_stats structre and pass it to getSockOpt with PACKET_STATISTICS SOL_SOCKET options will give packets received and dropped count.
-- some times drop can be due to buffer size
-- so if you want to decrease the drop count check increasing the buffersize using setsockopt function
First off, switch your operating system.
You need a reliable, network oriented operating system. Not some pink fluffy "ease of use" with "security" functionality enabled. NetBSD or Gentoo/ArchLinux (the bare installations, not the GUI kitted ones).
Start a simultaneous tcpdump on a network tap and capture the traffic you're supposed to receive along side of your program and compare the results.
There's no efficient way to check if you've received all the packets you intended to on the receiving end since the packets might be dropped on a lower level than you anticipate.
Also this is a question for Unix # StackOverflow, there's no programming here what I can see, at least there's no code.
The only certain way to verify packet drops is to have a much more beefy sender (perhaps a farm of machines that send packets) to a single client, record every packet sent to your reciever. Have the statistical data analyzed and compared against your senders and see how much you dropped.
The cheaper way is to buy a network tap or even more ad-hoc enable port mirroring in your switch if possible. This enables you to dump as much traffic as possible into a second machine.
This will give you a more accurate result because your application machine will be busy as it is taking care of incoming traffic and processing it.
Further more, this is why network taps are effective because they split the communication up into two channels, the receiving and sending directions of your traffic if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port, you get a more accurate traffic mirroring).
So either use port mirroring
Or you buy one of these:

Capturing performance with pcap vs raw socket

When capturing network traffic for debugging, there seem to be two common approaches:
Use a raw socket.
Use libpcap.
Performance-wise, is there much difference between these two approaches? libpcap seems a nice compatible way to listen to a real network connection or to replay some canned data, but does that feature set come with a performance hit?
The answer is intended to explain more about the libpcap.
libpcap uses the PF_PACKET to capture packets on an interface. Refer to the following link.
https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt
From the above link
In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet, it requires two if you want to get packet's timestamp
(like libpcap always does).
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size 
configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way reading packets just needs to wait for them,
most of the time there is no need to issue a single system call. Concerning
transmission, multiple packets can be sent through one system call to get the
highest bandwidth. By using a shared buffer between the kernel and the user
also has the benefit of minimizing packet copies.
performance improvement may vary depending on PF_PACKET implementation is used. 
From https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt -
It is said that TPACKET_V3 brings the following benefits:
 *) ~15 - 20% reduction in CPU-usage
 *) ~20% increase in packet capture rate
The downside of using libpcap -
If an application needs to hold the packet then it may need to make
a copy of the incoming packet.
Refer to manpage of pcap_next_ex.
pcap_next_ex() reads the next packet and returns a success/failure indication. If the packet was read without problems, the pointer
pointed to by the pkt_header argument is set to point to the
pcap_pkthdr struct for the packet, and the pointer pointed to by the
pkt_data argument is set to point to the data in the packet. The
struct pcap_pkthdr and the packet data are not to be freed by the
caller, and are not guaranteed to be valid after the next call to
pcap_next_ex(), pcap_next(), pcap_loop(), or pcap_dispatch(); if the
code needs them to remain valid, it must make a copy of them.
Performance penalty if application only interested in incoming
packets.
PF_PACKET works as taps in the kernel i.e. all the incoming and outgoing packets are delivered to PF_SOCKET.  Which results in an expensive call to packet_rcv for all the outgoing packets.  Since libpcap uses the PF_PACKET, so libpcap can capture all the incoming as well outgoing packets.
if application is only interested in incoming packets then outgoing packets can be discarded by setting pcap_setdirection on the libpcap handle. libpcap internally discards the outgoing packets by checking the flags on the packet metadata.
So in essence, outgoing packets are still seen by the libpcap but only to be discarded later. This is performance penalty for the application which is interested in incoming packets only.
Raw packet works on IP level (OSI layer 3), pcap on data link layer (OSI layer 2). So its less a performance issue and more a question of what you want to capture. If performance is your main issue search for PF_RING etc, that's what current IDS use for capturing.
Edit: raw packets can be either IP level (AF_INET) or data link layer (AF_PACKET), pcap might actually use raw sockets, see Does libpcap use raw sockets underneath them?

Alternative to pcap (Linux)

On a Linux router I wrote a C-program which uses pcap to get the IP header, and length of the packet. In that way I am able to gather statistics and measure bandwidth based on IP. Pretty neat. :-)
Now the traffic and number of users has grown, and the old program starts to struggle. That is, the router struggles to cope with the massive amount of packets. It's over 50000 packets per second all in all in "prime time".
The program itself is pretty optimized. I don't want to show off, but I believe it's as good as it can get. It reads the IP header, and the packet length. It then converts the IP to a index (just a simple subtract), and the length of the packet is stored (accumulated) in an array. Every now and then (actually a SIGALRM) it stores the array in a MySQL database.
My question is: Is there any other way to tap into an ethernet device to get the bit-stream "cheaper" than pcap?
I can of course modify the ethernet driver to include single IP statistics gathering, but that seems a little overkill.
Basically my program is a 'tcpdump' on a busy eth0 and that will eventually kill my router.
Have you considered PF_RING? It's still the pcap-like world, but on steroids - thanks to the zero-copy mechanism:
As you see, there is a kernel module that provides low-level packet copying into the PF_RING buffer, and there is the userland part that allows to access this buffer.
Who needs PF_RING?
Basically everyone who has to handle many packets per second. The term ‘many’ changes according to the hardware you use for traffic analysis. It can range from 80k pkt/sec on a 1,2GHz ARM to 14M pkt/sec and above on a low-end 2,5GHz Xeon. PF_RING not only enables you to capture packets faster, it also captures packets more efficiently preserving CPU cycles....
I highly recommend you to use PF_RING ZC. It could be found under /userland/examples_zc. it is part of pf_ring.
you can handle and capture tens of Gbps traffics in line rate by pf_ring zc.

TCP handshake with SOCK_RAW socket

Ok, I realize this situation is somewhat unusual, but I need to establish a TCP connection (the 3-way handshake) using only raw sockets (in C, in linux) -- i.e. I need to construct the IP headers and TCP headers myself. I'm writing a server (so I have to first respond to the incoming SYN packet), and for whatever reason I can't seem to get it right. Yes, I realize that a SOCK_STREAM will handle this for me, but for reasons I don't want to go into that isn't an option.
The tutorials I've found online on using raw sockets all describe how to build a SYN flooder, but this is somewhat easier than actually establishing a TCP connection, since you don't have to construct a response based on the original packet. I've gotten the SYN flooder examples working, and I can read the incoming SYN packet just fine from the raw socket, but I'm still having trouble creating a valid SYN/ACK response to an incoming SYN from the client.
So, does anyone know a good tutorial on using raw sockets that goes beyond creating a SYN flooder, or does anyone have some code that could do this (using SOCK_RAW, and not SOCK_STREAM)? I would be very grateful.
MarkR is absolutely right -- the problem is that the kernel is sending reset packets in response to the initial packet because it thinks the port is closed. The kernel is beating me to the response and the connection dies. I was using tcpdump to monitor the connection already -- I should have been more observant and noticed that there were TWO replies one of which was a reset that was screwing things up, as well as the response my program created. D'OH!
The solution that seems to work best is to use an iptables rule, as suggested by MarkR, to block the outbound packets. However, there's an easier way to do it than using the mark option, as suggested. I just match whether the reset TCP flag is set. During the course of a normal connection this is unlikely to be needed, and it doesn't really matter to my application if I block all outbound reset packets from the port being used. This effectively blocks the kernel's unwanted response, but not my own packets. If the port my program is listening on is 9999 then the iptables rule looks like this:
iptables -t filter -I OUTPUT -p tcp --sport 9999 --tcp-flags RST RST -j DROP
You want to implement part of a TCP stack in userspace... this is ok, some other apps do this.
One problem you will come across is that the kernel will be sending out (generally negative, unhelpful) replies to incoming packets. This is going to screw up any communication you attempt to initiate.
One way to avoid this is to use an IP address and interface that the kernel does not have its own IP stack using- which is fine but you will need to deal with link-layer stuff (specifically, arp) yourself. That would require a socket lower than IPPROTO_IP, SOCK_RAW - you need a packet socket (I think).
It may also be possible to block the kernel's responses using an iptables rule- but I rather suspect that the rules will apply to your own packets as well somehow, unless you can manage to get them treated differently (perhaps applying a netfilter "mark" to your own packets?)
Read the man pages
socket(7)
ip(7)
packet(7)
Which explain about various options and ioctls which apply to types of sockets.
Of course you'll need a tool like Wireshark to inspect what's going on. You will need several machines to test this, I recommend using vmware (or similar) to reduce the amount of hardware required.
Sorry I can't recommend a specific tutorial.
Good luck.
I realise that this is an old thread, but here's a tutorial that goes beyond the normal SYN flooders: http://www.enderunix.org/docs/en/rawipspoof/
Hope it might be of help to someone.
I can't help you out on any tutorials.
But I can give you some advice on the tools that you could use to assist in debugging.
First off, as bmdhacks has suggested, get yourself a copy of wireshark (or tcpdump - but wireshark is easier to use). Capture a good handshake. Make sure that you save this.
Capture one of your handshakes that fails. Wireshark has quite good packet parsing and error checking, so if there's a straightforward error it will probably tell you.
Next, get yourself a copy of tcpreplay. This should also include a tool called "tcprewrite".
tcprewrite will allow you to split your previously saved capture files into two - one for each side of the handshake.
You can then use tcpreplay to play back one side of the handshake so you have a consistent set of packets to play with.
Then you use wireshark (again) to check your responses.
I don't have a tutorial, but I recently used Wireshark to good effect to debug some raw sockets programming I was doing. If you capture the packets you're sending, wireshark will do a good job of showing you if they're malformed or not. It's useful for comparing to a normal connection too.
There are structures for IP and TCP headers declared in netinet/ip.h & netinet/tcp.h respectively. You may want to look at the other headers in this directory for extra macros & stuff that may be of use.
You send a packet with the SYN flag set and a random sequence number (x). You should receive a SYN+ACK from the other side. This packet will have an acknowledgement number (y) that indicates the next sequence number the other side is expecting to receive as well as another sequence number (z). You send back an ACK packet that has sequence number x+1 and ack number z+1 to complete the connection.
You also need to make sure you calculate appropriate TCP/IP checksums & fill out the remainder of the header for the packets you send. Also, don't forget about things like host & network byte order.
TCP is defined in RFC 793, available here: http://www.faqs.org/rfcs/rfc793.html
Depending on what you're trying to do it may be easier to get existing software to handle the TCP handshaking for you.
One open source IP stack is lwIP (http://savannah.nongnu.org/projects/lwip/) which provides a full tcp/ip stack. It is very possible to get it running in user mode using either SOCK_RAW or pcap.
if you are using raw sockets, if you send using different source mac address to the actual one, linux will ignore the response packet and not send an rst.

Resources