What exactly does ip_summed mean in an skb?

The sk_buff has a member ip_summed, commented with
@ip_summed: Driver fed us an IP checksum
From its name, it seems to indicate the checksum status of the IP/L3 layer.
But according to its possible values, such as:
CHECKSUM_UNNECESSARY
The hardware you’re dealing with doesn’t calculate the full checksum (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY if their checksums are okay. sk_buff.csum is still undefined in this case though. A driver or device must never modify the checksum field in the packet even if checksum is verified.
CHECKSUM_UNNECESSARY is applicable to following protocols:
TCP: IPv6 and IPv4.
UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a zero UDP checksum for either IPv4 or IPv6, the networking stack may perform further validation in this case.
GRE: only if the checksum is present in the header.
SCTP: indicates the CRC in SCTP header has been validated.
FCOE: indicates the CRC in FC frame has been validated.
(from https://docs.kernel.org/networking/skbuff.html), it seems to indicate the checksum status of the L4 layer.
Besides, ip_rcv() does not seem to use ip_summed before it calculates the checksum of the IP header.
So is there any reason the member is named ip_summed rather than, say, l4_summed?

We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).
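/* Bytes [start, start + len) are being removed from the front of the data. */
if (skb->ip_summed == CHECKSUM_COMPLETE)
	/* skb->csum covered those bytes, so subtract their sum from it. */
	skb->csum = csum_block_sub(skb->csum,
				   csum_partial(start, len, 0), off);
else if (skb->ip_summed == CHECKSUM_PARTIAL &&
	 skb_checksum_start_offset(skb) < 0)
	/* The checksum start now lies before the data, so the partial
	 * checksum can no longer be completed. */
	skb->ip_summed = CHECKSUM_NONE;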
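
The CHECKSUM_COMPLETE branch above relies on a property of the ones-complement sum: the contribution of a block of bytes can be removed from a running sum without re-reading the remaining data. The following is a minimal userspace sketch of that arithmetic, not kernel code. The Python helpers only borrow the kernel names csum_partial and csum_sub for readability, and the sketch assumes the removed block starts at an even offset (the case where the off argument of csum_block_sub needs no byte swap).

```python
def csum_partial(data: bytes, initial: int = 0) -> int:
    """Folded 16-bit ones-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_sub(csum: int, part: int) -> int:
    """Remove `part` from `csum` by adding its ones-complement."""
    return csum_partial(b"", csum + (~part & 0xFFFF))


# Hypothetical packet: a 14-byte header followed by a 12-byte payload.
header = bytes(range(14))
payload = b"hello world!"

full = csum_partial(header + payload)            # sum over the whole packet
after_pull = csum_sub(full, csum_partial(header))  # header "pulled" off
```

Here after_pull equals csum_partial(payload), which is the value skb->csum has to hold once the header bytes are no longer part of the data.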

Related

When working with raw sockets (Layer 2), does the kernel generate the frame check sequence (FCS), or do I need to generate it and append it myself?

As implied by this question, the checksum seems to be calculated and verified by the Ethernet hardware, so it seems highly unlikely that it must be generated by software when sending frames using an AF_PACKET socket, as seen here and here. Also, I don't think it can be received from the socket or by any simple means, since even Wireshark doesn't display it.
So, can anyone confirm this? Do I really need to send the checksum myself as shown in the last two links? Will the checksum be created and checked automatically by the Ethernet adapter?
No, you do not need to include the CRC.
When using a packet socket in Linux using socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL) ), you must provide the layer 2 header when sending. This is defined by struct ether_header in netinet/if_ether.h and includes the destination host, source host, and type. The frame check sequence is not included, nor is the preamble, start of frame delimiter, or trailer. These are added by the hardware.
On Linux, with a socket opened as socket(AF_PACKET, SOCK_RAW, htobe16(ETH_P_ALL)) or similar, you don't need to calculate the Ethernet checksum; the NIC hardware/driver will do it for you. That means you must supply the whole data link layer frame, except the checksum, before sending it to the raw socket.
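
As a rough illustration of the answers above, here is a minimal Python sketch that builds a frame consisting only of the 14-byte Ethernet header and the payload, with no FCS appended. The helper names build_frame and send_frame, the interface name, the MAC addresses and the EtherType 0x88B5 (an IEEE local experimental value) are all assumptions made for the example. Sending needs Linux and CAP_NET_RAW, so send_frame is defined but not called.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # value of ETH_P_ALL in <linux/if_ether.h>


def build_frame(dst_mac: bytes, src_mac: bytes, ethertype: int,
                payload: bytes) -> bytes:
    """Destination, source and type, then the payload. No FCS is appended."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    return header + payload


def send_frame(ifname: str, frame: bytes) -> int:
    """Send a prebuilt frame on ifname (Linux only, needs CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ALL)) as s:
        s.bind((ifname, 0))
        return s.send(frame)


# Hypothetical addresses and payload, for illustration only.
frame = build_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01",
                    0x88B5, b"example payload")
# send_frame("eth0", frame)  # "eth0" is an assumed interface name
```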

Specify MTU value

I'm trying to pentest some IPSEC implementation for a uni project, and following this guide I'm stuck at:
Step 1 (common): Forging an ICMP PTB packet from the untrusted network The attacker first has to forge an appropriate ICMP PTB packet (a single packet is sufficient). This is done by eavesdropping a valid packet from the IPsec tunnel on the untrusted network. Then the attacker forges an ICMP PTB packet, specifying a very small MTU value equal or smaller than 576 with IPv4 (resp. 1280 with IPv6). The attacker can use 0 for instance. This packet spoofs the IP address of a router of the untrusted network (in case the source IP address is checked), and in order to bypass the IPsec protection mechanism against blind attacks, it includes as a payload a part of the outer IP packet that has just been eavesdropped. This is the only packet an attacker needs to send. None of the following steps involve the attacker.
I know what MTU is, but what does the bold statement mean?
How do I set the MTU size of a packet with scapy?
Does it mean that I have to make the IP packet smaller than 576 bytes?
It's already 140 B; at least that's what the len command shows.
There's something that I didn't get right. Maybe I have to set the fragmentation?
I know nothing about the subject, but some quick searching seems to indicate that it's referring to an IPv6 ICMP packet with a type of 2 ("packet too big").
Then from some poking around scapy, this appears to be how you'd create one:
from scapy.layers.inet6 import ICMPv6PacketTooBig
icmp_ptb = ICMPv6PacketTooBig(mtu=0)
Of course though, you'll need to do some testing to verify this.
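
The MTU in the quoted passage is a field carried inside the ICMP message, not the size of the forged packet itself. For IPv4, the equivalent of the ICMPv6 message above is Destination Unreachable (type 3) with code 4, whose header carries a 16-bit Next-Hop MTU field (RFC 1191). The following is a minimal stdlib sketch of that layout, without scapy. The helper names are made up for illustration, and the quoted bytes are a zero-filled placeholder for the eavesdropped outer IP header plus leading payload bytes.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """Internet checksum: inverted 16-bit ones-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_ptb(next_hop_mtu: int, quoted: bytes) -> bytes:
    """ICMPv4 type 3, code 4: type, code, checksum, unused, next-hop MTU."""
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    csum = inet_checksum(header + quoted)
    return struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu) + quoted


# Placeholder for the quoted part of the original packet (assumption:
# 20-byte IP header plus 8 payload bytes, all zero here).
quoted = bytes(28)
msg = build_icmp_ptb(0, quoted)
mtu_field = struct.unpack("!H", msg[6:8])[0]
```

The len of the packet reported by scapy is therefore unrelated to the value the guide asks for; only the MTU field (mtu_field here, mtu= in the scapy call above) matters.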

Update TCP/IP checksum in headers

I am trying to modify the source and destination addresses in the IP header manually in the kernel when sending a packet. After that, I need to recalculate both the IP checksum and the TCP checksum.
I am doing it in the following way:
iph = ip_hdr(skb);
iph->saddr = mysaddr;
iph->daddr = mydaddr;
tcph = tcp_hdr(skb);
__tcp_v4_send_check(skb, iph->saddr, iph->daddr);
iph->tot_len = htons(skb->len);
ip_send_check(iph);
But at the receiver, the checksum always fails at the TCP layer, while it passes at the IP layer.
I did a lot of debugging and found that in the normal case, when the packet arrives, skb->ip_summed is generally CHECKSUM_UNNECESSARY, but if I do the modification at the sender, this value is CHECKSUM_NONE when the packet arrives at the receiver.
Can anybody give me some suggestions? Thanks.
There are too many details about CHECKSUM_UNNECESSARY and CHECKSUM_NONE to describe them in a few words; it is better to read:
<<Understanding Linux Network Internals>> / 19.1.1.2. sk_buff structure.
Here is my guess (hope it helps):
Since you generally see CHECKSUM_UNNECESSARY on packets that arrive, your network card is probably able to verify and compute checksums in hardware.
By default, if the TCP stack sees that your network card has this hardware capability, it does not bother computing the checksum itself in software. Instead it sets CHECKSUM_PARTIAL on the egress packet, so that the network interface driver instructs its hardware to compute the checksum.
In your case, you compute the checksum yourself, so you should set CHECKSUM_NONE so that the network interface does not compute the checksum again for you (doing so would break the checksum of your modified packet).
You can use any packet capture tool (like Wireshark) with a hub connected between sender and receiver to see whether the checksums in the captured packets are broken.
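
One reason the TCP checksum has to be redone after rewriting saddr and daddr is that it covers a pseudo-header containing both IP addresses. The sketch below shows this in userspace Python rather than kernel C. The helper names, addresses, ports and header fields are hypothetical, and it computes the full checksum in software, which corresponds to the CHECKSUM_NONE case described above.

```python
import socket
import struct


def ones_sum(data: bytes) -> int:
    """Folded 16-bit ones-complement sum (not inverted)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(saddr: str, daddr: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus TCP header and data."""
    pseudo = (socket.inet_aton(saddr) + socket.inet_aton(daddr)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return ~ones_sum(pseudo + segment) & 0xFFFF


def with_checksum(saddr: str, daddr: str, segment: bytes) -> bytes:
    """Return the segment with its checksum field (bytes 16..17) filled in."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    csum = tcp_checksum(saddr, daddr, zeroed)
    return zeroed[:16] + struct.pack("!H", csum) + zeroed[18:]


# Hypothetical 20-byte SYN header: ports, seq, ack, data offset 5, flags,
# window, checksum (0 for now), urgent pointer.
raw = struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, 5 << 4, 0x02, 65535, 0, 0)
seg = with_checksum("10.0.0.1", "10.0.0.2", raw)

valid_for_original = tcp_checksum("10.0.0.1", "10.0.0.2", seg) == 0
valid_after_rewrite = tcp_checksum("192.168.1.1", "10.0.0.2", seg) == 0
```

A receiver recomputes the sum over the pseudo-header it sees, so a segment whose checksum was computed for the old addresses no longer verifies once the addresses change (valid_after_rewrite is False here).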

libpcap setfilter() function and packet loss

This is my first question here on Stack Overflow.
I'm writing a monitoring tool for some VoIP production servers, specifically a sniffing tool that captures all traffic (VoIP calls) matching a given pattern, using the pcap library in Perl.
I cannot use poorly selective filters such as "udp" and then do all the filtering in my app's code, because that would involve too much traffic and the kernel wouldn't cope, reporting packet loss.
What I do instead is iteratively build the most selective filter possible during the capture. At the beginning I capture only (all) SIP signalling traffic and IP fragments (the pattern match has to be done at application level in any case). Then, when I find some information about RTP in the SIP packets, I add 'or' clauses with the specific IP and PORT to the current filter string and re-set the filter with setfilter().
So basically something like this:
Initial filter : "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0)" -> captures all SIP traffic and IP fragments
Updated filter : "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0) or (host IP and port PORT)" -> Captures also the RTP on specific IP,PORT
Updated filter : "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0) or (host IP and port PORT) or (host IP2 and port PORT2)" -> Captures a second RTP stream as well
And so on.
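
The incremental construction described above can be sketched as a small function that rebuilds the filter string from the set of known RTP streams. This is a Python illustration only (the original tool is written in Perl); build_filter and the sample addresses are hypothetical, and the resulting string would still have to be compiled and applied through the pcap binding's setfilter().

```python
BASE_FILTER = "(udp and port 5060) or (udp and ip[6:2] & 0x1fff != 0)"


def build_filter(streams):
    """Return the base SIP/fragment filter plus one clause per RTP stream.

    streams is an iterable of (ip, port) pairs learned from SIP/SDP.
    """
    clauses = [BASE_FILTER]
    for ip, port in streams:
        clauses.append(f"(host {ip} and port {port})")
    return " or ".join(clauses)


# Hypothetical streams, using documentation addresses.
streams = [("192.0.2.10", 16384), ("192.0.2.20", 16390)]
current_filter = build_filter(streams)
```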
This works quite well, as I'm able to get the 'real' packet loss of RTP streams for monitoring purposes, whereas with a poorly selective filter version of my tool, the RTP packet loss percentage wasn't reliable because some packets were missing due to packet drops by the kernel.
But let's get to the drawback of this approach.
Calling setfilter() while capturing means that libpcap drops packets received "while changing the filter", as stated in the code comments for the function set_kernel_filter() in pcap-linux.c (checked libpcap versions 0.9 and 1.1).
So what happens is that when I call setfilter() and some packets arrive IP-fragmented, I lose some fragments, and this is not reported by the libpcap statistics at the end: I spotted it by digging into traces.
Now, I understand why libpcap does this, but in my case I definitely need to avoid any packet drop (I don't care about getting some unrelated traffic).
Would you have any idea how to solve this problem without modifying libpcap's code?
What about starting up a new process with the more specific filter? You could have two parallel pcap captures going at once. After some time (or after checking that both received the same packets) you could stop the original.
Can you just capture all RTP traffic?
From capture filters the suggestion for RTP traffic is:
udp[1] & 1 != 1 && udp[3] & 1 != 1 && udp[8] & 0x80 == 0x80 && length < 250
As the link points out, you will get a few false positives where DNS and possibly other UDP packets occasionally contain the header byte, 0x80, used by RTP packets. However, the number should be negligible and not enough to cause kernel drops.
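
To make the conditions in that filter explicit, here is a minimal Python sketch that applies the same four tests to a UDP datagram (header plus payload). The function name looks_like_rtp and the sample datagrams are made up for illustration, and packet_length is assumed to be the length of the whole packet, which is what the length keyword compares.

```python
import struct


def looks_like_rtp(udp_datagram: bytes, packet_length: int) -> bool:
    """Mirror of: udp[1] & 1 != 1 && udp[3] & 1 != 1
                  && udp[8] & 0x80 == 0x80 && length < 250"""
    if len(udp_datagram) < 9:
        return False  # no payload byte to inspect
    return (
        (udp_datagram[1] & 1) != 1             # source port is even
        and (udp_datagram[3] & 1) != 1         # destination port is even
        and (udp_datagram[8] & 0x80) == 0x80   # top bit of first payload byte
        and packet_length < 250
    )


# Hypothetical datagrams: a 12-byte RTP-like payload starting with 0x80.
payload = b"\x80" + bytes(11)
rtp_like = struct.pack("!HHHH", 16384, 16386, 8 + len(payload), 0) + payload
odd_port = struct.pack("!HHHH", 5061, 16386, 8 + len(payload), 0) + payload

is_rtp = looks_like_rtp(rtp_like, 20 + len(rtp_like))
is_odd = looks_like_rtp(odd_port, 20 + len(odd_port))
```

Any UDP payload whose first byte happens to have the top bit set and whose ports are both even passes the same test, which is where the false positives come from.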
Round hole, square peg.
You have a tool that doesn't quite fit your need.
Another option is to do a first-level filter (as above, one that captures a lot more than wanted) and pipe it into another tool that implements the finer filter you want (down to the per-call case). If that first-level filter is too much for the kernel due to heavy RTP traffic, then you may need to do something else, like keeping a stable of processes to capture individual calls (so you're not changing the filter on the "main" process; it simply instructs the others how to set their filters).
Yes, this may mean merging captures, either on the fly (pass them all to a "save the capture" process) or after the fact.
You do realize that you may well miss RTP packets anyways if you don't install your filters fast. Don't forget that RTP packets could come in for the originator before the 200 OK comes in (or right together), and they may go back to the answerer before the ACK (or on top of it). Also don't forget INVITE with no SDP (offer in the 200 OK, answer in ACK). Etc, etc. :-)
