netfilter-like kernel module to get source and destination address - linux

I read this guide to write a kernel module to do simple network filtering.
First, I have no idea what the text below means. What is the difference between an inbound and an outbound data packet (from the transport layer's point of view)?
When a packet goes in from wire, it travels from physical layer, data
link layer, network layer upwards, therefore it might not go through
the functions defined in netfilter for skb_transport_header to work.
Second, I hate magic numbers, and I want to replace the 20 (the length of a typical IP header) with a function or macro from the Linux kernel sources.
Any help will be appreciated.

This article is a little outdated now. The text that you don't understand only applies to kernel versions below 3.11.
For new kernels (>= 3.11)
If you are sure that your code will only be used with kernels >= 3.11, you can use the following code for both input and output packets:
udp_header = (struct udphdr *)skb_transport_header(skb);
Or, more elegantly:
udp_header = udp_hdr(skb);
This works because the transport header is already set up for you in ip_rcv():
skb->transport_header = skb->network_header + iph->ihl*4;
This change was brought by this commit.
For old kernels (< 3.11)
Outgoing packets (NF_INET_POST_ROUTING)
In this case the .transport_header field is set up correctly in the sk_buff, so it points to the actual transport layer header (UDP/TCP). So you can use code like this:
udp_header = (struct udphdr *)skb_transport_header(skb);
or better looking (but actually the same):
udp_header = udp_hdr(skb);
Incoming packets (NF_INET_PRE_ROUTING)
This is the tricky part.
In this case the .transport_header field is not set to the actual transport layer header (UDP or TCP) in the sk_buff structure (that you get in your netfilter hook function). Instead, .transport_header points to the IP header (which is the network layer header).
So you need to calculate the address of the transport header yourself. To do so you need to skip the IP header (i.e. add the IP header length to your .transport_header address). That's why you see the following code in the article:
udp_header = (struct udphdr *)(skb_transport_header(skb) + 20);
So 20 here is just the length of an IP header without options.
It can be done more elegantly this way:
struct iphdr *iph;
struct udphdr *udph;
iph = ip_hdr(skb);
/* If transport header is not set for this kernel version */
if (skb_transport_header(skb) == (unsigned char *)iph)
udph = (struct udphdr *)((unsigned char *)iph + (iph->ihl * 4)); /* skip IP header */
else
udph = udp_hdr(skb);
In this code we use the actual IP header size (which is iph->ihl * 4, in bytes) instead of the magic number 20.
Another magic number in the article is the 17 in the following code:
if (ip_header->protocol == 17) {
Here you should use IPPROTO_UDP instead of 17:
#include <linux/udp.h>
if (ip_header->protocol == IPPROTO_UDP) {
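The header-length arithmetic above can also be tried outside the kernel. What follows is a minimal user-space sketch, not module code: it reads the IHL nibble and the protocol byte from a raw IPv4 packet held in a byte buffer and locates the UDP header at ihl * 4 bytes, which is the same calculation as iph->ihl * 4. The function name udp_ports_from_ipv4 and the two length macros are made up for this illustration.

```c
#include <netinet/in.h>  /* IPPROTO_UDP */
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20  /* IPv4 header without options (illustrative name) */
#define UDP_HDR_LEN       8  /* fixed UDP header size (illustrative name) */

/*
 * Hypothetical helper, not a kernel API: extract the UDP source and
 * destination ports from a raw IPv4 packet stored in a byte buffer.
 * Returns 0 on success, -1 if the buffer is not a usable IPv4/UDP packet.
 */
int udp_ports_from_ipv4(const uint8_t *pkt, size_t len,
                        uint16_t *sport, uint16_t *dport)
{
    size_t ihl_bytes;
    const uint8_t *udp;

    if (len < IPV4_MIN_HDR_LEN || (pkt[0] >> 4) != 4)
        return -1;                      /* too short, or not IPv4 */

    /* IHL is the low nibble of byte 0 and counts 32-bit words. */
    ihl_bytes = (size_t)(pkt[0] & 0x0F) * 4;
    if (ihl_bytes < IPV4_MIN_HDR_LEN || len < ihl_bytes + UDP_HDR_LEN)
        return -1;

    if (pkt[9] != IPPROTO_UDP)          /* protocol field is byte 9 */
        return -1;

    /* Skip the IP header, options included, to reach the UDP header. */
    udp = pkt + ihl_bytes;
    *sport = (uint16_t)((udp[0] << 8) | udp[1]);  /* network byte order */
    *dport = (uint16_t)((udp[2] << 8) | udp[3]);
    return 0;
}
```

A packet whose IP header carries options (IHL greater than 5) is the case where a hard-coded 20 would read the ports from the wrong offset.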
Netfilter input/output packets explanation
If you need a reference on the difference between incoming and outgoing packets in netfilter, see the picture below.
Details:
[1]: Some useful code from GitHub
[2]: "Linux Kernel Networking: Implementation and Theory" by Rami Rosen
[3]: This answer may be also useful

Related

netfilter hook is not retrieving complete packet

I'm writing a netfilter module that deeply inspects packets. However, during tests I found that the netfilter module is not receiving the packet in full.
To verify this, I wrote the following code to dump packets received on port 80 and write the result to the dmesg buffer:
const struct iphdr *ip_header = ip_hdr(skb);
if (ip_header->protocol == IPPROTO_TCP)
{
const struct tcphdr *tcp_header = tcp_hdr(skb);
if (ntohs(tcp_header->dest) != 80)
{
return NF_ACCEPT;
}
buff = (char *)kzalloc(skb->len * 10, GFP_KERNEL);
if (buff != NULL)
{
int pos = 0, i = 0;
for (i = 0; i < skb->len; i ++)
{
pos += sprintf(buff + pos, "%02X", skb->data[i] & 0xFF);
}
pr_info("(%pI4):%d --> (%pI4):%d, len=%d, data=%s\n",
&ip_header->saddr,
ntohs(tcp_header->source),
&ip_header->daddr,
ntohs(tcp_header->dest),
skb->len,
buff
);
kfree (buff);
}
}
In a virtual machine running locally, I can retrieve the full HTTP request. On Alibaba Cloud and some other OpenStack-based VPS providers, the packet is cut off in the middle.
To verify this, I executed curl http://VPS_IP on another VPS and got the following output in the dmesg buffer:
[ 1163.370483] (XXXX):5007 --> (XXXX):80, len=237, data=451600ED000040003106E3983D87A950AC11D273138F00505A468086B44CE19E80180804269300000101080A1D07500A000D2D90474554202F20485454502F312E310D0A486F73743A2033392E3130372E32342E37370D0A4163636570743A202A2F2A0D0A557365722D4167656E743A204D012000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000001E798090F5FFFF8C0000007B00000000E0678090F5FFFF823000003E00000040AE798090F5FFFF8C0000003E000000000000000000000000000000000000000000000000000000000000
When decoded, the result is like this
It's totally weird: everything after User-Agent: M is "gone" or zeroed. Although skb->len is 237, half of the packet is missing.
Any ideas? I tried both PRE_ROUTING and LOCAL_IN, with no change.
It appears that sometimes you are getting a linear skb, and sometimes your skb is not linear. In the latter case you are not reading the full data contents of an skb.
If skb->data_len is zero, then your skb is linear and the full data contents of the skb are in skb->data. If skb->data_len is not zero, then your skb is not linear, and skb->data contains just the first (linear) part of the data. The length of this area is skb->len - skb->data_len. The skb_headlen() helper function calculates that for convenience. The skb_is_nonlinear() helper function tells whether an skb is linear or not.
The rest of the data can be in paged fragments, and in skb fragments, in this order.
skb_shinfo(skb)->nr_frags tells the number of paged fragments. Each paged fragment is described by a data structure in the array skb_shinfo(skb)->frags[0..skb_shinfo(skb)->nr_frags - 1]. The skb_frag_size() and skb_frag_address() helper functions help in dealing with this data. They accept the address of the structure that describes a paged fragment. There are other useful helper functions depending on your kernel version.
If the total size of data in paged fragments is less than skb->data_len, then the rest of the data is in skb fragments. This is the list of skbs attached to this skb at skb_shinfo(skb)->frag_list (see skb_walk_frags() in the kernel).
Please note that there may be no data in the linear part and/or no data in the paged fragments. You just need to process the data piece by piece in the order just described.
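In kernel code, the usual way to avoid walking these pieces by hand is skb_copy_bits(), which copies a byte range out of an skb regardless of where the bytes are stored. To make the order of the walk concrete, here is a user-space model of it. The struct and function names (model_skb, model_frag, model_copy_all) are invented for this sketch and only mirror the layout described above; they are not kernel types.

```c
#include <stddef.h>
#include <string.h>

/* Model of one paged fragment: an address and a size. */
struct model_frag {
    const unsigned char *addr;
    size_t size;
};

/* Model of an skb: linear part, paged fragments, then a chain of skbs. */
struct model_skb {
    const unsigned char *data;          /* linear part */
    size_t headlen;                     /* bytes in the linear part */
    const struct model_frag *frags;     /* paged fragments */
    size_t nr_frags;
    const struct model_skb *frag_list;  /* first skb of the attached chain */
    const struct model_skb *next;       /* next skb within a chain */
};

/* Copy one piece into out[pos..], never writing past cap bytes. */
static size_t copy_piece(unsigned char *out, size_t cap, size_t pos,
                         const unsigned char *src, size_t size)
{
    size_t room = cap - pos;
    size_t n = size < room ? size : room;

    if (n > 0)
        memcpy(out + pos, src, n);
    return n;
}

/* Gather all data in order: linear part, paged fragments, frag_list. */
size_t model_copy_all(const struct model_skb *skb,
                      unsigned char *out, size_t cap)
{
    size_t pos = 0;
    size_t i;
    const struct model_skb *child;

    pos += copy_piece(out, cap, pos, skb->data, skb->headlen);

    for (i = 0; i < skb->nr_frags; i++)
        pos += copy_piece(out, cap, pos,
                          skb->frags[i].addr, skb->frags[i].size);

    for (child = skb->frag_list; child != NULL; child = child->next)
        pos += model_copy_all(child, out + pos, cap - pos);

    return pos;
}
```

The dump loop in the question reads only the equivalent of the first copy_piece() call, which is why the tail of the request appears as zeros or garbage.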

Use setsockopt() to modify kernel structures in TCP?

Is it possible to modify a single member of a kernel struct in TCP? I want to be able to use setsockopt() to update a member of the tcp_info struct in TCP.
I've tried the following:
struct tcp_info info;
unsigned int optlen = sizeof(struct tcp_info);
if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &info, &optlen) < 0)
printf("Can't get data from getsockopt.\n");
info.retransmits += 10; // random member of tcp_info - as example
if (setsockopt(sock, IPPROTO_TCP, TCP_INFO, (char *) &info, optlen) < 0)
printf("Can't set data with setsockopt.\n");
The call to setsockopt() fails (returns a negative value).
The way I'm trying to solve it (above), even if it had worked, doesn't seem optimal. Is it possible to modify a member's value in a struct without having to fetch and update the entire struct (all of its members)?
You may not set arbitrary values with setsockopt(). It has a finite list of options you may set.
I'll use the FreeBSD kernel in this example, but all of this is similar if not identical in Linux. I will jump to FreeBSD's sosetopt() function in sys/kern/uipc_socket.c.
The only valid options you may set are:
SO_ACCEPTFILTER, SO_LINGER, SO_DEBUG, SO_KEEPALIVE, SO_DONTROUTE, SO_USELOOPBACK, SO_BROADCAST, SO_REUSEADDR, SO_REUSEPORT, SO_REUSEPORT_LB, SO_OOBINLINE, SO_TIMESTAMP, SO_BINTIME, SO_NOSIGPIPE, SO_NO_DDP, SO_NO_OFFLOAD, SO_RERROR, SO_SETFIB, SO_USER_COOKIE, SO_SNDBUF, SO_RCVBUF, SO_SNDLOWAT, SO_RCVLOWAT, SO_SNDTIMEO, SO_RCVTIMEO, SO_LABEL, SO_TS_CLOCK, and SO_MAX_PACING_RATE.
That list contains a number of status flags that enable or disable features. Only a few options allow setting numerical values:
SO_USER_COOKIE - set a user-specified metadata value to a socket.
SO_SNDBUF/SO_RCVBUF - set the allocated buffer sizes for sending and receiving.
SO_SNDLOWAT/SO_RCVLOWAT - set a minimum amount of data to be sent/received per call.
SO_SNDTIMEO/SO_RCVTIMEO - set a timeout for sending/receiving calls.
SO_MAX_PACING_RATE - Instructs the network adapter to limit the transfer rate.
None of these write values directly to kernel structures. To accomplish something of the sort you have requested, you will need to modify the kernel. Your other question addresses that objective.

skb->priority and IP::tos and ping -Q

I'm developing a network driver and I'm trying to assign packets to different queues based on the ip::tos value. For testing purposes I'm running:
ping -Q 1 10.0.0.2
to set the ip::tos value to 1. The problem is that on the system where I run the ping command, the outgoing skb has skb->priority == 0, but I think it should be 1.
I assumed that setting "-Q 1" would set skb->priority to 1, but it doesn't.
Does anyone know why?
First of all, there is no direct mapping between the skb->priority and the IP TOS field. It is done like so in the linux kernel:
sk->sk_priority = rt_tos2priority(val)
...
static inline char rt_tos2priority(u8 tos)
{
return ip_tos2prio[IPTOS_TOS(tos)>>1];
}
(and the ip_tos2prio table can be found in ipv4/route.c).
It seems to me you'll have to set the "TOS" byte to at least 8 to get skb->priority to anything other than 0: IPTOS_TOS() masks the byte with 0x1E, so the table index is (tos & 0x1E) >> 1, and indexes 0 through 3 all map to priority 0.
ping -Q 1 sets the whole TOS byte to 1. Note that TOS is deprecated in favor of DSCP. The 2 low-order bits are used for ECN, while the upper 6 bits are used for the DSCP value (the "priority").
So you would start at 4 to get a DSCP value of 1, but according to the above table you need at least 8 to get a non-zero skb->priority as well, as in ping -Q 8 10.0.0.2
However, I'm not sure that will set the skb->priority in all cases. If the ping tool crafts packets using raw sockets, it might bypass setting the skb->priority.
Still, skb->priority for locally generated packets will be set if one does e.g.
int tos = 8;
setsockopt(sock_fd, IPPROTO_IP, IP_TOS,
&tos, sizeof(tos));
So you might need to cook up a small sample program that does the above before sending packets.
The above answer is on the right track. Let's complete it here:
static inline char rt_tos2priority(u8 tos)
{
return ip_tos2prio[IPTOS_TOS(tos)>>1];
}
where IPTOS_TOS is a macro that ANDs the "tos" value with 0x1E.
So, if you give a TOS of 0xFF, the above return statement reduces to
return ip_tos2prio[(0x1E & 0xFF)>>1];
Evaluating further, (0x1E & 0xFF) is equal to 0x1E,
and (0x1E >> 1) gives 0x0F, which is 15 in decimal,
so the return statement is equivalent to
return ip_tos2prio[15];
Now "ip_tos2prio" is a predefined array, like this
const __u8 ip_tos2prio[16]={0,0,0,0,2,2,2,2,6,6,6,6,4,4,4,4};
where each distinct value has a meaning: 0->BESTEFFORT, 2->BULK, 4->INTERACTIVE BULK, 6->INTERACTIVE.
Getting back to the return statement, it returns the element at index 15 (the last entry) of the ip_tos2prio array, which is 4.
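The lookup is easy to reproduce in user space to check which -Q values lead to which priority. This is a model of the kernel logic, not the kernel code itself: the names model_tos2priority, model_ip_tos2prio and MODEL_IPTOS_TOS_MASK are made up, and the table holds the numeric values of the four traffic classes listed above, four entries per class.

```c
#include <stdint.h>

#define MODEL_IPTOS_TOS_MASK 0x1E  /* bits 1..4 of the TOS byte */

/* Numeric class values: 0 best effort, 2 bulk, 6 interactive,
 * 4 interactive bulk. Sixteen entries, indexed by the masked TOS >> 1. */
static const uint8_t model_ip_tos2prio[16] = {
    0, 0, 0, 0,
    2, 2, 2, 2,
    6, 6, 6, 6,
    4, 4, 4, 4
};

/* User-space model of the TOS-to-priority mapping described above. */
uint8_t model_tos2priority(uint8_t tos)
{
    return model_ip_tos2prio[(tos & MODEL_IPTOS_TOS_MASK) >> 1];
}
```

With this model, a TOS byte of 1 maps to index 0 and therefore priority 0, which matches what the question observed with ping -Q 1. A TOS byte of 8 maps to index 4 (priority 2), and 0xFF maps to index 15 (priority 4).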

Where are ioctl parameters (such as 0x1268 / BLKSSZGET) actually specified?

I am looking for a definitive specification describing the expected arguments and behavior of ioctl 0x1268 (BLKSSZGET).
This number is declared in many places (none of which contain a definitive reference source), such as linux/fs.h, but I can find no specification for it.
Surely, somebody at some point in the past decided that 0x1268 would get the physical sector size of a device and documented that somewhere. Where does this information come from and where can I find it?
Edit: I am not asking what BLKSSZGET does in general, nor am I asking what header it is defined in. I am looking for a definitive, standardized source that states what argument types it should take and what its behavior should be for any driver that implements it.
Specifically, I am asking because there appears to be a bug in blkdiscard in util-linux 2.23 (and 2.24) where the sector size is queried into a uint64_t, but the high 32 bits are untouched since BLKSSZGET appears to expect a 32-bit integer, and this leads to an incorrect sector size, incorrect alignment calculations, and failures in blkdiscard when it should succeed. So before I submit a patch, I need to determine, with absolute certainty, if the problem is that blkdiscard should be using a 32-bit integer, or if the driver implementation in my kernel should be using a 64-bit integer.
Edit 2: Since we're on the topic, the proposed patch presuming blkdiscard is incorrect is:
--- sys-utils/blkdiscard.c-2.23 2013-11-01 18:28:19.270004947 -0400
+++ sys-utils/blkdiscard.c 2013-11-01 18:29:07.334002382 -0400
@@ -71,7 +71,8 @@
{
char *path;
int c, fd, verbose = 0, secure = 0;
- uint64_t end, blksize, secsize, range[2];
+ uint64_t end, blksize, range[2];
+ uint32_t secsize;
struct stat sb;
static const struct option longopts[] = {
@@ -146,8 +147,8 @@
err(EXIT_FAILURE, _("%s: BLKSSZGET ioctl failed"), path);
/* align range to the sector size */
- range[0] = (range[0] + secsize - 1) & ~(secsize - 1);
- range[1] &= ~(secsize - 1);
+ range[0] = (range[0] + (uint64_t)secsize - 1) & ~((uint64_t)secsize - 1);
+ range[1] &= ~((uint64_t)secsize - 1);
/* is the range end behind the end of the device ?*/
end = range[0] + range[1];
Applied to e.g. https://www.kernel.org/pub/linux/utils/util-linux/v2.23/.
The answer to "where is this specified?" does seem to be the kernel source.
I asked the question on the kernel mailing list here: https://lkml.org/lkml/2013/11/1/620
In response, Theodore Ts'o wrote (note: he mistakenly identified sys-utils/blkdiscard.c in his list but it's inconsequential):
BLKSSZGET returns an int. If you look at the sources of util-linux
v2.23, you'll see it passes an int to BLKSSZGET in
sys-utils/blkdiscard.c
lib/blkdev.c
E2fsprogs also expects BLKSSZGET to return an int, and if you look at
the kernel sources, it very clearly returns an int.
The one place it doesn't is in sys-utils/blkdiscard.c, where as you
have noted, it is passing in a uint64 to BLKSSZGET. This looks like
it's a bug in sys-util/blkdiscard.c.
He then went on to submit a patch¹ to blkdiscard at util-linux:
--- a/sys-utils/blkdiscard.c
+++ b/sys-utils/blkdiscard.c
@@ -70,8 +70,8 @@ static void __attribute__((__noreturn__)) usage(FILE *out)
int main(int argc, char **argv)
{
char *path;
- int c, fd, verbose = 0, secure = 0;
- uint64_t end, blksize, secsize, range[2];
+ int c, fd, verbose = 0, secure = 0, secsize;
+ uint64_t end, blksize, range[2];
struct stat sb;
static const struct option longopts[] = {
I had been hesitant to mention the blkdiscard tool in both my mailing list post and the original version of this SO question specifically for this reason: I know what's in my kernel's source, it's already easy enough to modify blkdiscard to agree with the source, and this ended up distracting from the real question of "where is this documented?".
So, as for the specifics, somebody more official than me has also stated that the BLKSSZGET ioctl takes an int, but the general question regarding documentation remained. I then followed up with https://lkml.org/lkml/2013/11/3/125 and received another reply from Theodore Ts'o (wiki for credibility) answering the question. He wrote:
> There was a bigger question hidden behind the context there that I'm
> still wondering about: Are these ioctl interfaces specified and
> documented somewhere? From what I've seen, and from your response, the
> implication is that the kernel source *is* the specification, and not
> document exists that the kernel is expected to comply with; is this
> the case?
The kernel source is the specification. Some of these ioctl are
documented as part of the linux man pages, for which the project home
page is here:
https://www.kernel.org/doc/man-pages/
However, these document existing practice; if there is a discrepancy
between what is in the kernel has implemented and the Linux man pages,
it is the Linux man pages which are buggy and which will be changed.
That is man pages are descriptive, not perscriptive.
I also asked about the use of "int" in general for public kernel APIs, his response is there although that is off-topic here.
Answer: So, there you have it, the final answer is: The ioctl interfaces are specified by the kernel source itself; there is no document that the kernel adheres to. There is documentation to describe the kernel's implementations of various ioctls, but if there is a mismatch, it is an error in the documentation, not in the kernel.
¹ With all the above in mind, I want to point out that an important difference in the patch Theodore Ts'o submitted, compared to mine, is the use of "int" rather than "uint32_t" -- BLKSSZGET, as per kernel source, does indeed expect an argument that is whatever size "int" is on the platform, not a forced 32-bit value.
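For reference, a caller that follows this conclusion could look like the sketch below: the ioctl receives a pointer to a plain int, and the value is widened to 64 bits only afterwards, for the alignment arithmetic. The function names query_sector_size and align_range are invented for this example and do not come from util-linux. The masking assumes the sector size is a power of two, as the original blkdiscard code does.

```c
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Query the sector size of the block device at path.
 * The kernel writes an int, so the argument is a pointer to int.
 * Returns 0 on success, -1 if the device cannot be opened or queried.
 */
int query_sector_size(const char *path, int *secsize)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = ioctl(fd, BLKSSZGET, secsize);
    close(fd);
    return rc < 0 ? -1 : 0;
}

/*
 * Align range[0] (offset) up and range[1] (length) down to the sector
 * size. The int is widened to 64 bits before the mask is built, so the
 * mask covers the full width of the range values.
 */
void align_range(uint64_t range[2], int secsize)
{
    uint64_t mask = (uint64_t)secsize - 1;

    range[0] = (range[0] + mask) & ~mask;
    range[1] &= ~mask;
}
```

Passing the address of a uint64_t instead would leave its upper bytes uninitialised on platforms where int is 32 bits wide, which is the failure described in the question.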

Crafting an ICMP packet inside a Linux kernel Module

I'm trying to experiment with the ICMP protocol and have created a kernel module for Linux that analyses ICMP packets (it processes a packet only if the ICMP code field is a magic number). Now, to test this module, I have to create an ICMP packet and send it to the host where the analysing module is running. In fact, it would be nice if I could implement this in the kernel itself (as a module). I am looking for something like a packet crafter in the kernel. I googled it and found a lot of articles explaining the lifetime of a packet rather than tutorials on creating one. User-space packet crafters would be my last resort, and only those that are highly flexible, where I'll be able to set the ICMP code etc. And I'm not wary of kernel panics :-) Any packet crafting ideas are welcome.
Sir, I strongly advise you against using a kernel module to build ICMP packets.
You can use user-space raw sockets to craft ICMP packets, and even build the IP header itself byte by byte.
So you can get as flexible as you need with that.
Please take a look at this:
ip = (struct iphdr*) packet;
icmp = (struct icmphdr*) (packet + sizeof(struct iphdr));
/*
* here the ip packet is set up except checksum
*/
ip->ihl = 5;
ip->version = 4;
ip->tos = 0;
ip->tot_len = sizeof(struct iphdr) + sizeof(struct icmphdr);
ip->id = htons(random());
ip->ttl = 255;
ip->protocol = IPPROTO_ICMP;
ip->saddr = inet_addr(src_addr);
ip->daddr = inet_addr(dst_addr);
if ((sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)) == -1)
{
perror("socket");
exit(EXIT_FAILURE);
}
/*
* IP_HDRINCL must be set on the socket so that
* the kernel does not attempt to automatically add
* a default ip header to the packet
*/
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &optval, sizeof(int));
/*
* here the icmp packet is created
* also the ip checksum is generated
*/
icmp->type = ICMP_ECHO;
icmp->code = 0;
icmp->un.echo.id = 0;
icmp->un.echo.sequence = 0;
icmp->checksum = 0;
icmp->checksum = in_cksum((unsigned short *)icmp, sizeof(struct icmphdr));
ip->check = in_cksum((unsigned short *)ip, sizeof(struct iphdr));
If this part of code looks flexible enough, then read about raw sockets :D maybe they're the easiest and safest answer to your need.
Please check the following links for further info
http://courses.cs.vt.edu/~cs4254/fall04/slides/raw_6.pdf
http://www.cs.binghamton.edu/~steflik/cs455/rawip.txt
http://cboard.cprogramming.com/networking-device-communication/107801-linux-raw-socket-programming.html a very nice topic, pretty useful imo
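The raw-socket snippet above calls in_cksum() without showing it. A common way to write such a helper is the Internet checksum from RFC 1071: the one's-complement sum of all 16-bit words, folded to 16 bits and inverted. The version below is a sketch under that assumption; the name inet_checksum is chosen here and is not taken from the snippet's original source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * RFC 1071 style Internet checksum over len bytes.
 * Words are read in native byte order and the result is meant to be
 * stored into the header as-is; the one's-complement sum makes this
 * work on both little- and big-endian hosts.
 * Intended for packet-sized inputs (up to 65535 bytes), for which the
 * 32-bit accumulator cannot overflow.
 */
uint16_t inet_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t sum = 0;
    uint16_t word;

    while (len >= 2) {
        memcpy(&word, p, 2);   /* avoids unaligned 16-bit loads */
        sum += word;
        p += 2;
        len -= 2;
    }

    if (len == 1) {            /* odd trailing byte, padded with zero */
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

As in the snippet, the checksum field must be zero while the sum is computed. A receiver can verify a header by running the same function over it with the checksum in place and checking that the result is 0.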
You can try libcrafter for packet crafting in user space. It is very easy to use! The library is able to craft or decode packets of most common network protocols, send them on the wire, capture them and match requests and replies.
For example, the following code crafts and sends an ICMP packet:
string MyIP = GetMyIP("eth0");
/* Create an IP header */
IP ip_header;
/* Set the Source and Destination IP address */
ip_header.SetSourceIP(MyIP);
ip_header.SetDestinationIP("1.2.3.4");
/* Create an ICMP header */
ICMP icmp_header;
icmp_header.SetType(ICMP::EchoRequest);
icmp_header.SetIdentifier(RNG16());
/* Create a packet... */
Packet packet = ip_header / icmp_header;
packet.Send();
Why do you want to craft an ICMP packet in kernel space? Just for fun? :-p
The Linux kernel includes a packet generator tool, pktgen, for testing the network with pre-configured packets. The source code for this module resides in net/core/pktgen.c.

Resources