I am trying to modify the source and destination address in the IP header manually in kernel when sending the packet. After that, I need to recalculate both IP checksum and TCP checksum.
I am doing it in the following way.
iph = ip_hdr(skb);
iph->saddr = mysaddr;
iph->daddr = mydaddr;
tcph= tcp_hdr(skb);
__tcp_v4_send_check(skb, iph->saddr, iph->daddr);
iph->tot_len = htons(skb->len);
ip_send_check(iph);
But at the receiver, the checksum always fails at TCP layer while it can pass IP layer.
I did much debugs, and found that during normal process, when the packet arrives, the skb->ip_summed is generally CHECKSUM_UNNECESSARY, but if I do the modification at the sender, then this value will be CHECKSUM_NONE when arriving at the receiver.
Can anybody give me some suggestion? Thanks.
As for CHECKSUM_UNNECESSARY and CHECKSUM_NONE, too much details about them, can not describe them in simple word, better read:
<<Understanding.Linux.Network.Internals >> / 19.1.1.2. sk_buff structure.
Post my guess here (hope it helpful):
Since at sender side, you see generally CHECKSUM_UNNECESSARY, this might means your network card is probably able to do checksum verification and computing checksum in hardware.
By default, if TCP stack seeing your network card has this hardware capability, it would not bother computing checksum itself in software. But set CHECKSUM_PARTIAL in egress packet, so that network interface driver would instruct its hardware to computing checksum.
In your case, you compute the checksum by yourself, and should set CHECKSUM_NONE, so network interface would not compute the checksum again for you ( if doing so, break your checksum of modified packet ).
you can just use any packet capture tool (like wireshark) with a hub connecting between sender and receiver, to see if the checksum in captured packets are broken.
Related
The sk_buff has a member ip_summed commented with
#ip_summed: Driver fed us an IP checksum
It seems to indicate the checksum status of IP/L3 layer from its name.
But according to it's possible value such as:
CHECKSUM_UNNECESSARY
The hardware you’re dealing with doesn’t calculate the full checksum (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY if their checksums are okay. sk_buff.csum is still undefined in this case though. A driver or device must never modify the checksum field in the packet even if checksum is verified.
CHECKSUM_UNNECESSARY is applicable to following protocols:
TCP: IPv6 and IPv4.
UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a zero UDP checksum for either IPv4 or IPv6, the networking stack may perform further validation in this case.
GRE: only if the checksum is present in the header.
SCTP: indicates the CRC in SCTP header has been validated.
FCOE: indicates the CRC in FC frame has been validated.
in https://docs.kernel.org/networking/skbuff.html it seems to indicate the checksum status of L4 layer.
Besides, in ip_rcv() it seems the ip_summed is not used before calculating the checksum of IP header.
So is there any reason the member is named as ip_summed rather than l4_summed (maybe)?
We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).
if (skb->ip_summed == CHECKSUM_COMPLETE)
skb->csum = csum_block_sub(skb->csum,
csum_partial(start, len, 0), off);
else if (skb->ip_summed == CHECKSUM_PARTIAL &&
skb_checksum_start_offset(skb) < 0)
skb->ip_summed = CHECKSUM_NONE;
As it is implied by this question, it seems that checksum is calculated and verified by ethernet hardware, so it seems highly unlikely that it must be generated by software when sending frames using an AF_PACKET socket, as seem here and here. Also, I don't think it can be received from the socket nor by any simple mean, since even Wireshark doesn't display it.
So, can anyone confirm this? Do I really need to send the checksum myself as shown in the last two links? Will checksum be created and checked automatically by the ethernet adaptor?
No, you do not need to include the CRC.
When using a packet socket in Linux using socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL) ), you must provide the layer 2 header when sending. This is defined by struct ether_header in netinet/if_ether.h and includes the destination host, source host, and type. The frame check sequence is not included, nor is the preamble, start of frame delimiter, or trailer. These are added by the hardware.
On Linux, if you mention socket(AF_PACKET, SOCK_RAW, htobe16(ETH_P_ALL)) similar case, you don't need to calculate ethernet checksum, NIC hardware/driver will do it for you. That means you need to offer whole data link layer frame except checksum before send it to raw socket.
I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds to capture all packets on the socket. I am worried that the recieve buffer for this socket occainally gets full (due to traffic bursts) and some packets are dropped.
How do I know if packets were dropped due to lack of space in the
socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes that are currently stored in the revice buffer for that socket, but I can not tell if any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.
I was having a similar problem as you. I knew that tcpdump was able to to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the code of tcpdump, I noticed that it is not generating those statistic by itself, but that it is using the libpcap library to get those statistics. The libpcap is on the other hand getting those statistics by accessing the if_packet.h header and calling the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
I had to interact somehow with the linux header files from my Pyhton script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap which is pypcap to get those information.
Since I had no clue how to do the first thing, I implemented the second option. Here is an example how to get packet statistics using pypcap and how to get the packet data using dpkg:
import pcap
import dpkt
import socket
pc=pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)
def packet_handler(ts,pkt):
#printing packet statistic (packets received, packets dropped, packets dropped by interface
print pc.stats()
#example packet parsing using dpkt
eth=dpkt.ethernet.Ethernet(pkt)
if eth.type != dpkt.ethernet.ETH_TYPE_IP:
return
ip =eth.data
layer4=ip.data
ipsrc=socket.inet_ntoa(ip.src)
ipdst=socket.inet_ntoa(ip.dst)
pc.loop(0,packet_handler)
tpacket_stats structure is defined in linux/packet.h header file
Create variable using the tpacket_stats structre and pass it to getSockOpt with PACKET_STATISTICS SOL_SOCKET options will give packets received and dropped count.
-- some times drop can be due to buffer size
-- so if you want to decrease the drop count check increasing the buffersize using setsockopt function
First off, switch your operating system.
You need a reliable, network oriented operating system. Not some pink fluffy "ease of use" with "security" functionality enabled. NetBSD or Gentoo/ArchLinux (the bare installations, not the GUI kitted ones).
Start a simultaneous tcpdump on a network tap and capture the traffic you're supposed to receive along side of your program and compare the results.
There's no efficient way to check if you've received all the packets you intended to on the receiving end since the packets might be dropped on a lower level than you anticipate.
Also this is a question for Unix # StackOverflow, there's no programming here what I can see, at least there's no code.
The only certain way to verify packet drops is to have a much more beefy sender (perhaps a farm of machines that send packets) to a single client, record every packet sent to your reciever. Have the statistical data analyzed and compared against your senders and see how much you dropped.
The cheaper way is to buy a network tap or even more ad-hoc enable port mirroring in your switch if possible. This enables you to dump as much traffic as possible into a second machine.
This will give you a more accurate result because your application machine will be busy as it is taking care of incoming traffic and processing it.
Further more, this is why network taps are effective because they split the communication up into two channels, the receiving and sending directions of your traffic if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port, you get a more accurate traffic mirroring).
So either use port mirroring
Or you buy one of these:
When capturing network traffic for debugging, there seem to be two common approaches:
Use a raw socket.
Use libpcap.
Performance-wise, is there much difference between these two approaches? libpcap seems a nice compatible way to listen to a real network connection or to replay some canned data, but does that feature set come with a performance hit?
The answer is intended to explain more about the libpcap.
libpcap uses the PF_PACKET to capture packets on an interface. Refer to the following link.
https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt
From the above link
In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet, it requires two if you want to get packet's timestamp
(like libpcap always does).
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way reading packets just needs to wait for them,
most of the time there is no need to issue a single system call. Concerning
transmission, multiple packets can be sent through one system call to get the
highest bandwidth. By using a shared buffer between the kernel and the user
also has the benefit of minimizing packet copies.
performance improvement may vary depending on PF_PACKET implementation is used.
From https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt -
It is said that TPACKET_V3 brings the following benefits:
*) ~15 - 20% reduction in CPU-usage
*) ~20% increase in packet capture rate
The downside of using libpcap -
If an application needs to hold the packet then it may need to make
a copy of the incoming packet.
Refer to manpage of pcap_next_ex.
pcap_next_ex() reads the next packet and returns a success/failure indication. If the packet was read without problems, the pointer
pointed to by the pkt_header argument is set to point to the
pcap_pkthdr struct for the packet, and the pointer pointed to by the
pkt_data argument is set to point to the data in the packet. The
struct pcap_pkthdr and the packet data are not to be freed by the
caller, and are not guaranteed to be valid after the next call to
pcap_next_ex(), pcap_next(), pcap_loop(), or pcap_dispatch(); if the
code needs them to remain valid, it must make a copy of them.
Performance penalty if application only interested in incoming
packets.
PF_PACKET works as taps in the kernel i.e. all the incoming and outgoing packets are delivered to PF_SOCKET. Which results in an expensive call to packet_rcv for all the outgoing packets. Since libpcap uses the PF_PACKET, so libpcap can capture all the incoming as well outgoing packets.
if application is only interested in incoming packets then outgoing packets can be discarded by setting pcap_setdirection on the libpcap handle. libpcap internally discards the outgoing packets by checking the flags on the packet metadata.
So in essence, outgoing packets are still seen by the libpcap but only to be discarded later. This is performance penalty for the application which is interested in incoming packets only.
Raw packet works on IP level (OSI layer 3), pcap on data link layer (OSI layer 2). So its less a performance issue and more a question of what you want to capture. If performance is your main issue search for PF_RING etc, that's what current IDS use for capturing.
Edit: raw packets can be either IP level (AF_INET) or data link layer (AF_PACKET), pcap might actually use raw sockets, see Does libpcap use raw sockets underneath them?
As it is implied by this question, it seems that checksum is calculated and verified by ethernet hardware, so it seems highly unlikely that it must be generated by software when sending frames using an AF_PACKET socket, as seem here and here. Also, I don't think it can be received from the socket nor by any simple mean, since even Wireshark doesn't display it.
So, can anyone confirm this? Do I really need to send the checksum myself as shown in the last two links? Will checksum be created and checked automatically by the ethernet adaptor?
No, you do not need to include the CRC.
When using a packet socket in Linux using socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL) ), you must provide the layer 2 header when sending. This is defined by struct ether_header in netinet/if_ether.h and includes the destination host, source host, and type. The frame check sequence is not included, nor is the preamble, start of frame delimiter, or trailer. These are added by the hardware.
On Linux, if you mention socket(AF_PACKET, SOCK_RAW, htobe16(ETH_P_ALL)) similar case, you don't need to calculate ethernet checksum, NIC hardware/driver will do it for you. That means you need to offer whole data link layer frame except checksum before send it to raw socket.