I'm trying to write a virtual netdevice driver for Linux kernel 3.3.2. Some features of my driver need the routing information when transmitting packets, so I use skb_dst(struct sk_buff *) to get the dst_entry pointer. But whatever I do, and wherever I ping, skb_dst() always returns NULL. I don't know why, and this bug has confused me for more than a week. Can anyone help me?
I've found the reason! It's because of a flag that was added to the kernel: IFF_XMIT_DST_RELEASE. While this flag is set in a device's priv_flags, the kernel drops the routing information before handing the sk_buff to the device, so the flag has to be cleared for skb_dst() to return the dst_entry. Thanks to Kristof Provost for his reply all the same, and sorry for closing the question so late.
Ping uses RAW sockets. They probably bypass part of the routing infrastructure.
Try looking at raw_send_hdrinc and raw_sendmsg in net/ipv4/raw.c.
To be clear, add dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; to the device's setup function.
I have two questions about the code in the Linux networking stack:
I see that the eth_header_ops structure is used to add an Ethernet header to an IP packet. But I am unable to find how the functions inside it are invoked, and which function is supposed to do what. What is the code flow for this?
Similarly, when is the Ethernet header removed from an incoming frame? Could you show the path from the NIC driver to the place where the header is actually removed?
Thank you.
I think this is done as part of ip_finish_output2(), but I would really like some experts to shed more light on the TX and RX flow with respect to Ethernet header manipulation.
According to this, Wireshark is able to get a packet before it is dropped (which is why I cannot get such packets myself). I'm still wondering where exactly in the Linux kernel Wireshark fetches the packets.
The answer says: "On UN*Xes, it uses libpcap, which, on Linux, uses AF_PACKET sockets." Does anyone have a more concrete example of using AF_PACKET sockets? If I understand Wireshark correctly, the network interface card (NIC) makes a copy of all incoming packets and sends it to a filter (Berkeley Packet Filter) defined by the user. But where does this happen? Or is my understanding wrong, and am I missing something here?
Thanks in advance!
But where does this happen?
If I understood you correctly, you want to know where such a socket is initialized.
There is the pcap_create function, which tries to determine the type of the source interface, creates a duplicate of it, and activates it.
For network interfaces, see pcap_create_interface function => pcap_create_common function => pcap_activate_linux function.
All initialization happens in pcap_activate_linux => activate_new function => iface_bind function
(copy the descriptor of the device with handlep->device = strdup(device);,
create the socket with socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL)),
bind the socket to the device with bind(fd, (struct sockaddr *) &sll, sizeof(sll))).
For more information, read the comments in the source files of the functions mentioned; they are very detailed.
After initialization, all the work happens in a group of functions such as pcap_read_linux.
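The socket and bind steps listed above can be sketched outside libpcap as follows. This is a minimal illustration, not libpcap's actual code: the helper names make_capture_addr and open_capture_socket are made up for this example, and the socket type is left as a parameter (SOCK_RAW keeps the link-layer header, while SOCK_DGRAM, as quoted above, delivers packets without it). Opening a packet socket normally requires root or CAP_NET_RAW.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>        /* htons */
#include <net/if.h>           /* if_nametoindex */
#include <sys/socket.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll */

/* Hypothetical helper: build the link-layer address that binds a packet
 * socket to one interface and asks for every protocol (ETH_P_ALL). */
struct sockaddr_ll make_capture_addr(unsigned int ifindex)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex = (int)ifindex;
    return sll;
}

/* Hypothetical helper: open a packet socket of the given type and bind it
 * to the named interface. Returns the fd, or -1 on failure (unknown
 * interface, or missing CAP_NET_RAW, in which case errno is EPERM). */
int open_capture_socket(const char *ifname, int sock_type)
{
    struct sockaddr_ll sll;
    unsigned int ifindex;
    int fd;

    assert(ifname != NULL);

    ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;                      /* no such interface */

    fd = socket(AF_PACKET, sock_type, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    sll = make_capture_addr(ifindex);
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would then do something like fd = open_capture_socket("eth0", SOCK_RAW) and read frames from fd with recv(); the interface name here is only an example.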
On Linux, you should be able to simply use tcpdump (which uses the libpcap library) to do this. The output can go to a file or to STDOUT, and you specify the filter at the end of the tcpdump command.
I want to do some testing by sending layer 2 frames with a wrong FCS/CRC.
I've looked at scapy/mz/nemesis, but it seems none of them can do this.
Is it possible to do this on a regular Linux NIC? Or is the FCS/CRC automatically appended by hardware, so that we cannot do anything about it?
I have a specific machine that is supposed to detect all incoming packets before dropping them, so I want to test whether it really works like that.
No, you cannot, as far as my experience with most NICs goes. You can, however, disable automatic checksum calculation on the RX side, manipulate it at the buffer descriptor layer, and hand it to the stack.
Googled it for you. These guys say interesting things. Take a look.
http://dev.inversepath.com/download/802.3/whitepaper.txt
Yes, you can. I've found another discussion on this here: How do you send an Ethernet frame with a corrupt FCS?
There is a link going to a working example (http://markmail.org/thread/eoquixklsjgvvaom). I've tried that and it's working (on igb and e1000 Eth cards).
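To send a deliberately wrong FCS you first need to know what the correct one would be. The sketch below computes the standard Ethernet CRC-32 over a frame and appends either the correct value or one with a flipped bit. It does not reproduce the linked example; the function names eth_fcs and append_fcs are made up here, and whether a given NIC and driver will actually transmit a caller-supplied FCS instead of its own is hardware dependent, as the answers above suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet FCS: reflected polynomial
 * 0xEDB88320, initial value 0xFFFFFFFF, final complement. */
uint32_t eth_fcs(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    assert(data != NULL);
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical helper: append the 4-byte FCS, least significant byte first,
 * to a frame of `len` bytes. The caller must leave 4 spare bytes in the
 * buffer. With corrupt != 0, one bit of the FCS is flipped on purpose.
 * Returns the new frame length. */
size_t append_fcs(uint8_t *frame, size_t len, int corrupt)
{
    uint32_t fcs = eth_fcs(frame, len);
    int i;

    if (corrupt)
        fcs ^= 0x1u;                    /* deliberately wrong FCS */
    for (i = 0; i < 4; i++)
        frame[len + i] = (uint8_t)(fcs >> (8 * i));
    return len + 4;
}
```

A receiver-side check follows from the same function: running eth_fcs over a frame including a correct FCS always yields the constant 0x2144DF1C, so any other result means the frame is corrupt.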
As implied by this question, the checksum seems to be calculated and verified by the Ethernet hardware, so it seems highly unlikely that it must be generated by software when sending frames through an AF_PACKET socket, as seen here and here. Also, I don't think it can be received from the socket or by any simple means, since even Wireshark doesn't display it.
So, can anyone confirm this? Do I really need to send the checksum myself as shown in the last two links? Will the checksum be created and checked automatically by the Ethernet adapter?
No, you do not need to include the CRC.
When using a packet socket on Linux created with socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)), you must provide the layer 2 header when sending. This is defined by struct ether_header in netinet/if_ether.h and includes the destination host, source host, and type. The frame check sequence is not included, nor is the preamble, start of frame delimiter, or trailer. These are added by the hardware.
On Linux, in the similar case of socket(AF_PACKET, SOCK_RAW, htobe16(ETH_P_ALL)), you don't need to calculate the Ethernet checksum; the NIC hardware/driver will do it for you. That means you need to provide the whole data link layer frame, except the checksum, before sending it to the raw socket.
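As an illustration of what "the whole frame except the checksum" means, the following sketch fills a buffer with a struct ether_header followed by the payload and stops there: no FCS, no preamble. The function name build_eth_frame is made up for this example, and the header is taken from net/ethernet.h, which netinet/if_ether.h pulls in on glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>      /* htons */
#include <net/ethernet.h>   /* struct ether_header, ETHER_ADDR_LEN, ETHERTYPE_IP */

/* Hypothetical helper: write destination MAC, source MAC, EtherType and
 * payload into buf. Nothing is appended after the payload; the FCS is left
 * to the hardware. Returns the frame length, or 0 if buf is too small. */
size_t build_eth_frame(uint8_t *buf, size_t buf_len,
                       const uint8_t dst[ETHER_ADDR_LEN],
                       const uint8_t src[ETHER_ADDR_LEN],
                       uint16_t ether_type,
                       const void *payload, size_t payload_len)
{
    struct ether_header hdr;
    size_t total = sizeof(hdr) + payload_len;

    assert(buf != NULL && dst != NULL && src != NULL && payload != NULL);
    if (total > buf_len)
        return 0;

    memcpy(hdr.ether_dhost, dst, ETHER_ADDR_LEN);
    memcpy(hdr.ether_shost, src, ETHER_ADDR_LEN);
    hdr.ether_type = htons(ether_type);     /* network byte order on the wire */

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), payload, payload_len);
    return total;
}
```

The returned length is what would be passed to send() or sendto() on the packet socket, together with the buffer.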
I want to write a Linux 2.6 netfilter module that inspects incoming IP packet information, such as the destination and source IP. It should then pass this information to a user space application (e.g. a socket app), which handles it as soon as the packet reaches the hooks.
I want to try two ways:
1 . Inside the netfilter module, keep a FIFO queue; every time a packet reaches the hook, put the packet information into the FIFO. Expose it through the /proc/example file: every time the user space app reads /proc/example, it gets one packet's information from the head of the FIFO.
I'm new to kernel programming, and this program has crashed my kernel several times. -_-!
I wonder whether this approach is feasible?
2 . Inside the netfilter module, create a char device, and have the user space app read packet information from it. But I still don't know how to make sure the app gets the packet as soon as possible. Is there any way for the kernel to send some signal to notify the user space app when a packet reaches the netfilter hook, so that the app can come and pick up the packet information?
Thank you very much.
My way of doing this:
Create a kernel module to get hooked to your network activity.
Then use Netlink, which lets the kernel talk to user space, to pass the data (IPC).
Option 1 is possible and workable; what is the problem? However, I usually use netlink to communicate between user space and kernel space:
netlink_kernel_create
netlink_kernel_release
nl_sk = netlink_kernel_create(&init_net, 17, 0, recv_cmd, NULL, THIS_MODULE);
netlink_kernel_release(nl_sk);
netlink_unicast
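The calls above are the kernel side. On the user space side, the application opens an AF_NETLINK socket with the same protocol number and exchanges messages that start with a struct nlmsghdr. The sketch below only shows how such a message and the local address are laid out; the helper names, and the idea of using the payload for packet information, are assumptions for illustration, and the protocol number 17 is simply the one from the netlink_kernel_create call above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Protocol number taken from the netlink_kernel_create() call above. */
#define NETLINK_EXAMPLE 17

/* Hypothetical helper: local address for bind() on a socket created with
 * socket(AF_NETLINK, SOCK_RAW, NETLINK_EXAMPLE). port_id is usually getpid(). */
struct sockaddr_nl make_nl_addr(uint32_t port_id)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = port_id;
    return addr;
}

/* Hypothetical helper: lay out one netlink message (header + payload) in
 * buf, which must be suitably aligned for struct nlmsghdr. Returns the
 * aligned size to send, or 0 if buf is too small. */
size_t build_nlmsg(void *buf, size_t buf_len, uint32_t port_id,
                   const void *payload, size_t payload_len)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
    size_t space = NLMSG_SPACE(payload_len);   /* header + payload, padded */

    assert(buf != NULL && payload != NULL);
    if (space > buf_len)
        return 0;

    memset(buf, 0, space);
    nlh->nlmsg_len = (uint32_t)NLMSG_LENGTH(payload_len); /* unpadded length */
    nlh->nlmsg_pid = port_id;                             /* sender's port id */
    memcpy(NLMSG_DATA(nlh), payload, payload_len);
    return space;
}
```

The kernel module's receive callback (recv_cmd in the call above) would see the same header and payload in the skb it is handed, and can reply to nlmsg_pid with netlink_unicast.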
What do you mean by as soon as possible? Do you have some actual hard/soft realtime requirements?
If you select option 2 you should be able to get new data rather quickly by opening the character device non-blocking and select()ing on the read fd. I've done something similar with a kernel level socket, where socket data was presented to a user level process via a character driver. I saw very little latency as long as I serviced the socket in a timely manner. The driver was used in an environment that had some soft realtime requirements and we didn't have any problem meeting those requirements with run-of-the-mill kernel facilities.
Have a look at Linux Device Drivers, 3rd Edition.
I'm not sure about the first method, but using the second, your user space app could use a select() call with the char device as the only target - as soon as select() returns, you'll have data to read. Just make sure to read it all, and not just assume one packet's worth of data.
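The select() approach described in these answers might look roughly like this in the user space app. It is a sketch under assumptions: the helper names are made up, the fd would come from opening the char device (the device path is whatever the module registers), and the driver is assumed to implement poll so that select() works on it. The drain loop reflects the advice to read everything that is available instead of assuming a single packet's worth of data.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Put the descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait until fd is readable. Returns 1 if readable, 0 on timeout,
 * -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Read everything currently available from a non-blocking fd, up to cap
 * bytes. Returns the number of bytes read, or -1 on a real error. */
ssize_t drain_fd(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);

        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                              /* end of file */
        if (errno == EINTR)
            continue;                           /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                              /* nothing left for now */
        return -1;
    }
    return (ssize_t)total;
}
```

The main loop of the app would call wait_readable() and, each time it returns 1, call drain_fd() and process however many packet records came back.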