I understand this question has been discussed many times: Should I use libpcap or PF_PACKET (the data link socket) to capture packets?
Based on my research, libpcap is suggested over PF_PACKET almost everywhere, mainly due to its portability.
However, for my current project (which is used in a production system), portability is not a concern at all; all I care about is performance (speed, packet loss ratio). My program runs on CentOS 5.10 (kernel 2.6.18).
As far as I know, libpcap puts a timestamp on each packet. Does this cause a big performance loss?
Are there other factors that make libpcap unsuitable in a high-speed network?
As far as I know, libpcap puts a timestamp on each packet.
No, libpcap gets the timestamp for each packet from the OS packet capture mechanism it uses, which, on Linux, is PF_PACKET sockets.
The Linux kernel time stamps incoming packets. There are multiple ways of reading from PF_PACKET sockets:
regular socket receives, for which you can get a time stamp with an explicit ioctl (so you can avoid fetching it to userland, but you can't avoid the kernel time stamping the packet in the first place; libpcap, when using regular socket receives, always asks for the time stamp; see the sketch below);
memory-mapped access, which always supplies the time stamp.
Libpcap uses memory-mapped access whenever it's available; if you care about capture performance, you probably want to do so as well. It's not easy to use, however.
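To make the "explicit ioctl" path above concrete, here is a minimal sketch (not libpcap's actual code) of reading one frame from a PF_PACKET socket and then asking the kernel for that packet's time stamp with SIOCGSTAMP. It needs root or CAP_NET_RAW, and error handling is pared down:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <sys/time.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/sockios.h>

    int main(void)
    {
        /* Raw packet socket seeing every protocol; needs root/CAP_NET_RAW. */
        int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return 1; }

        unsigned char buf[65536];
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n < 0) { perror("recv"); return 1; }

        /* SIOCGSTAMP asks the kernel for the time stamp of the last packet read. */
        struct timeval tv;
        if (ioctl(fd, SIOCGSTAMP, &tv) == 0)
            printf("%zd bytes, stamped %ld.%06ld\n",
                   n, (long)tv.tv_sec, (long)tv.tv_usec);

        return 0;
    }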
I am trying to write some sort of very basic packet filter in Linux (Ubuntu) user space.
Is it possible to drop packets in user space via a C program using a raw socket (AF_PACKET), without any kernel intervention (such as writing a kernel module) or netfilter?
Thanks a lot
Tali
It is possible (assuming I understand what you're asking). There are a number of "zero-copy" driver implementations that allow user-space to obtain a large memory-mapped buffer into which (/ from which) packets are directly DMA'd.
That pretty much precludes having the kernel process those same packets though (possible but very difficult to properly coordinate user-space packet sniffing with kernel processing of the same packets). But it's fine if you're creating your own IDS/IPS or whatever and don't need to "terminate" connections on the local machine.
It would definitely not be the standard AF_PACKET; you have to either create your own or use an existing implementation: look into netmap, DPDK, and PF_RING (maybe PF_RING/ZC? not sure). I worked on a couple of proprietary implementations in a previous career so I know it's possible.
The basic idea is to either (1) completely duplicate everything the driver is responsible for -- that is, move the driver implementation completely into user space (DPDK basically does this). This is straightforward on paper, but it is a lot of work and makes the driver pretty much fully custom.
Or (2) modify driver source so that key network buffer allocation requests get satisfied with an address that is also mmap'd by the user-space process. You then have the problem of communicating buffer life-cycles / reference counts between user-space and kernel. That's very messy but can be done and is probably less work overall. (I dunno -- there may be a way to automate this latter method if you're clever enough -- I haven't been in this space in some years.)
Whichever way you go, there are several pieces you need to put together to do this right. For example, if you want really high performance, you'll need to use the adapter's "RSS" type mechanisms to split the traffic into multiple queues and pin each to a particular CPU -- then make sure the corresponding application components are pinned to the same CPU.
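As a rough illustration of the pinning part only (not the RSS/queue setup; the CPU number is just an example), a minimal sketch:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling process/thread to one CPU; pair this with the NIC
     * queue's IRQ affinity.  The CPU number here is only an example. */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0 /* self */, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        return pin_to_cpu(2) == 0 ? 0 : 1;
    }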
All that being said, unless your need is pretty severe, you're best staying with plain old AF_PACKET.
You can use iptables rules to drop packets matching given criteria, but dropping them with packet filters is not possible, because the packet filters only get a copy of the packet while the original packet flows through the usual path.
I’m working on a packet reshaping project in Linux using the BeagleBone Black. Basically, packets are received on one VLAN, modified, and then sent out on a different VLAN. This process is bidirectional - the VLANs are not designated as being input-only or output-only. It’s similar to a network bridge, but packets are altered (sometimes fairly significantly) in transit.
I’ve tried two different methods for accomplishing this:
1. Creating a user-space application that opens raw sockets on both interfaces. All packet processing (including bridging) is handled in the application.
2. Setting up a software bridge (using the kernel bridge module) and adding a kernel module that installs a netfilter hook in post-routing (NF_BR_POST_ROUTING; see the sketch below). All packet processing is handled in the kernel.
The second option appears to be around 4 times faster than the first option. I’d like to understand more about why this is. I’ve tried brainstorming a bit and wondered if there is a substantial performance hit in rapidly switching between kernel and user space, or maybe something about the socket interface is inherently slow?
I think the user application is fairly optimized (for example, I’m using PACKET_MMAP), but it’s possible that it could be optimized further. I ran perf on the application and noticed that it was spending a good deal of time (35%) in v7_flush_kern_dcache_area, so perhaps this is a likely candidate. If there are any other suggestions on common ways to optimize packet processing, I can give them a try.
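For reference, a minimal sketch of the hook registration behind the second method, with the actual packet rewriting left out. It assumes a kernel recent enough to provide nf_register_net_hook(); older kernels use nf_register_hook() with a different hook signature:

    #include <linux/module.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_bridge.h>
    #include <net/net_namespace.h>

    /* Accept-everything placeholder; the real module would rewrite skb here. */
    static unsigned int reshape_hook(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
    {
        return NF_ACCEPT;
    }

    static struct nf_hook_ops reshape_ops = {
        .hook     = reshape_hook,
        .pf       = NFPROTO_BRIDGE,
        .hooknum  = NF_BR_POST_ROUTING,
        .priority = 0,
    };

    static int __init reshape_init(void)
    {
        return nf_register_net_hook(&init_net, &reshape_ops);
    }

    static void __exit reshape_exit(void)
    {
        nf_unregister_net_hook(&init_net, &reshape_ops);
    }

    module_init(reshape_init);
    module_exit(reshape_exit);
    MODULE_LICENSE("GPL");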
Context switches are expensive, and kernel-to-user-space switches imply a context switch. You can see this article for exact numbers, but the stated durations are all on the order of microseconds.
You can also use lmbench to benchmark the real cost of context switches on your particular cpu.
The performance of the user-space application also depends on the syscall used to monitor the sockets. When you need to handle a lot of sockets, epoll() is the fastest; select() performs very poorly with many sockets.
See this post explaining it:
Why is epoll faster than select?
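For illustration, a minimal epoll() sketch that waits for a single already-open descriptor to become readable; with many sockets you would simply add more descriptors to the same epoll instance:

    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    /* Wait (indefinitely) until fd is readable.  Returns the epoll_wait()
     * result: 1 on readiness, -1 on error. */
    static int wait_readable(int fd)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0) { perror("epoll_create1"); return -1; }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            perror("epoll_ctl");
            close(epfd);
            return -1;
        }

        struct epoll_event ready;
        int n = epoll_wait(epfd, &ready, 1, -1 /* no timeout */);
        close(epfd);
        return n;
    }

    int main(void)
    {
        /* Example: wait until stdin (fd 0) has data. */
        return wait_readable(0) == 1 ? 0 : 1;
    }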
I've just read in these answers about two options for developing packet filters in Linux.
The first is using iptables and netfilter, probably with NFQUEUE and the libnetfilter_queue library.
The second is using BPF (the Berkeley Packet Filter), which, from a quick reading, seems to have similar capabilities for filtering purposes.
So, which of these alternatives is a better way to create a packet filter? What are the differences? My software is going to run as a gateway proxy, or "man-in-the-middle", that should receive a packet from one computer (with a destination address of another machine, not the filter's local address) and send it out after some filtering.
Thanks a lot!
Though my understanding is limited to the theoretical, I've done some reading while debugging the Kubernetes networking implementation and can thus take a stab at answering this.
Broadly, both netfilter and eBPF (the successor to BPF) implement a virtual machine that executes some logic while processing packets. netfilter's implementation appears to strive for compatibility with iptables' previous implementation, being essentially a more performant successor to iptables.
However, there are still performance problems when using iptables -- particularly when there are large sets of iptables rules. The way eBPF is structured can alleviate some of these performance problems; specifically:
eBPF can be offloaded to a "smart nic"
eBPF can be structured to lookup rules more efficiently
Though it was initially used for network processing, eBPF is also being used for kernel instrumentation (sysdig, iovisor). It has a far larger set of use cases, but because of this, likely has a much steeper learning curve.
So, in summary:
Use what you're familiar with; if you hit performance problems, then look at eBPF.
Relevant:
https://cilium.io/blog/2018/11/20/fb-bpf-firewall/
https://www.youtube.com/watch?v=4-pawkiazEg
https://ferrisellis.com/posts/ebpf_past_present_future/
https://lwn.net/Articles/747551/
Notes:
eBPF is the successor to cBPF, and has replaced it in the kernel
I refer to eBPF explicitly here out of habit
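To give a feel for the "in-kernel filter returns a verdict" idea, here is a minimal classic-BPF (cBPF, not eBPF or NFQUEUE) sketch that attaches a hand-written filter to a raw socket with SO_ATTACH_FILTER; the filter itself (accept IPv4 frames, drop everything else) is purely illustrative:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/filter.h>

    int main(void)
    {
        /* Raw packet socket seeing all protocols; requires root/CAP_NET_RAW. */
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return 1; }

        /* cBPF program: accept IPv4 frames (EtherType 0x0800), drop the rest. */
        struct sock_filter code[] = {
            BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),            /* load EtherType  */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1),  /* IPv4?           */
            BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF),                /* accept packet   */
            BPF_STMT(BPF_RET | BPF_K, 0),                         /* drop            */
        };
        struct sock_fprog prog = {
            .len    = sizeof(code) / sizeof(code[0]),
            .filter = code,
        };

        if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
            perror("SO_ATTACH_FILTER");
            return 1;
        }
        /* recv() on fd now only returns frames the filter accepted. */
        return 0;
    }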
I'm capturing network packets (a transport stream) along with their arrival times using the WinPcap library, but I'm facing some issues. Whenever I play audio on my machine or copy a large file from the network, the timing information of my captured packets gets distorted. Some packets' timestamps are very close to each other while others are a bit far apart. Is there any solution (software/hardware) to rectify this? I need accurate timestamping of network packets.
You could raise the process priority of the capture application to High using the Task Manager.
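If you would rather do that from the capture program itself rather than through Task Manager, a bare sketch (assuming you can change the capture program's source) that sets the same "High" priority class:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* HIGH_PRIORITY_CLASS is the level Task Manager's "High" sets. */
        if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
            fprintf(stderr, "SetPriorityClass failed: %lu\n", GetLastError());

        /* ... start the WinPcap capture loop here ... */
        return 0;
    }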
But you really need to consider what you are trying to achieve and why. Do you want to know when the packet arrives at the NIC, when it is processed by the kernel, when the kernel places it in the capture program's socket buffer, when the capture program reads it out of its buffer, when the kernel places it in some other program's socket buffer, or when some other program reads it from its socket buffer?
All those time stamps are different, and when the system is under load the differences will necessarily become larger. Timing information from the capture program will most likely reflect the time when the capture program read the packet out of its own socket buffer. Increasing the capture application's process priority will make that happen more smoothly, but it will make the handling of packets by any other applications less reliable.
At present we are using Fedora Core 3 for a system we are working on. This system needs to communicate via serial, and the timing of communications is critical. At present it seems that the serial driver has delays in pushing the data from the 4 KB FIFO into the 16-byte hardware UART.
Is there any way to force Linux to treat this action with a higher priority?
Try using setserial to set the low_latency option.
By default serial ports are optimised for throughput, not latency; I think this option lets you change that.
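The programmatic equivalent, if you would rather not shell out to setserial, is toggling the ASYNC_LOW_LATENCY flag with the TIOCGSERIAL/TIOCSSERIAL ioctls; a rough sketch (the device path is just an example):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/serial.h>

    int main(void)
    {
        /* "/dev/ttyS0" is only an example device path. */
        int fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);
        if (fd < 0) { perror("open"); return 1; }

        struct serial_struct ss;
        if (ioctl(fd, TIOCGSERIAL, &ss) != 0) { perror("TIOCGSERIAL"); return 1; }

        ss.flags |= ASYNC_LOW_LATENCY;   /* same effect as `setserial ... low_latency` */
        if (ioctl(fd, TIOCSSERIAL, &ss) != 0) { perror("TIOCSSERIAL"); return 1; }

        /* normal read()/write() on fd from here on */
        return 0;
    }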
If you have a hard real-time processing requirement, you may be better off using a distribution that's built with that in mind, for example RTLinux.
Consider getting the device vendor to change the protocol to something less stupid where timing doesn't matter.
Having a timing-critical serial protocol, or indeed one which requires you to acknowledge one message before sending the next, is really stupid.
RS-232-style serial ports are really slow, and anything which makes them worse is a bad idea.
I wrote a program to control a device which had a stupid protocol - each byte of data was individually acknowledged (duuh!) and the next one wasn't sent until the ack arrived - which meant that the data transfer rate was a tiny fraction of what it should have been.
Look at the zmodem protocol, for example, which is less stupid.
Better still, get the vendor to enter the 1990s and use USB.