directions about customized Layer 2 implementation in linux

I have some machines running on the same network. One node is the control node, which distributes traffic arriving at it to the other nodes. I want to insert a custom protocol header between the MAC header and the IP (or other protocol) payload of packets arriving at the control node.
The control node receives a packet like this:
------------------------------------------------
| Layer 2 | IP(or whatever protocol) | Payload |
------------------------------------------------
This packet should be distributed to the other nodes like this:
----------------------------------------------------------------
| Layer 2 | Custom Header | IP(or whatever protocol) | Payload |
----------------------------------------------------------------
I would like some directions on how to do this. Is there an existing solution I can use, or do I have to hack the kernel from scratch? A similar approach is L2TP, but that runs over the IP layer, so I don't want that.
I also want this communication to appear as a separate interface in Linux, like tun0, apart from the physical eth0 interface.
Any help or ideas would be highly appreciated.
I don't know which Stack Exchange site this question belongs on, so directions to the correct site are also appreciated.

Your case is very similar to VLAN, where the VLAN header also sits between the L2 header and the IP header. Take a look at the VLAN code, especially net/8021q/vlan_dev.c.
The key point is that you need to construct your own L2 header, so you need to register your own header_ops, as VLAN does:
static const struct header_ops vlan_header_ops = {
    .create = vlan_dev_hard_header,
    .rebuild = vlan_dev_rebuild_header,
    .parse = eth_header_parse,
};
and register it during initialization:
dev->header_ops = &vlan_header_ops;
dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
The ->create() function pointer here is used to create the custom header:
static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
                                unsigned short type,
                                const void *daddr, const void *saddr,
                                unsigned int len)
{
    struct vlan_hdr *vhdr;
    unsigned int vhdrlen = 0;
    u16 vlan_tci = 0;
    int rc;
    if (!(vlan_dev_priv(dev)->flags & VLAN_FLAG_REORDER_HDR)) {
        vhdr = (struct vlan_hdr *) skb_push(skb, VLAN_HLEN);
        vlan_tci = vlan_dev_priv(dev)->vlan_id;
        vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
        vhdr->h_vlan_TCI = htons(vlan_tci);
        /*
         * Set the protocol type. For a packet of type ETH_P_802_3/2 we
         * put the length in here instead.
         */
        if (type != ETH_P_802_3 && type != ETH_P_802_2)
            vhdr->h_vlan_encapsulated_proto = htons(type);
        else
            vhdr->h_vlan_encapsulated_proto = htons(len);
        skb->protocol = htons(ETH_P_8021Q);
        type = ETH_P_8021Q;
        vhdrlen = VLAN_HLEN;
    }
    /* Before delegating work to the lower layer, enter our MAC-address */
    if (saddr == NULL)
        saddr = dev->dev_addr;
    /* Now make the underlying real hard header */
    dev = vlan_dev_priv(dev)->real_dev;
    rc = dev_hard_header(skb, dev, type, daddr, saddr, len + vhdrlen);
    if (rc > 0)
        rc += vhdrlen;
    return rc;
}
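To see what the resulting frame layout looks like without touching the kernel, here is a minimal userspace sketch that does the same kind of insertion on a raw byte buffer. Everything in it is an assumption for illustration: the struct custom_hdr layout, the custom_encap name, and the choice of 0x88B5 (one of the EtherTypes set aside for local experimental use) as the outer EtherType. It is not the VLAN code and not a driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN      14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define CUSTOM_ETHERTYPE 0x88B5  /* assumption: local experimental EtherType */

/* Hypothetical 4-byte custom header, modelled on struct vlan_hdr. */
struct custom_hdr {
    uint16_t tag;                /* assumption: any 16-bit field you need */
    uint16_t encapsulated_proto; /* original EtherType, network byte order */
};

/*
 * Copy the frame `in` to `out`, inserting a custom header right after the
 * Ethernet header. Returns the new frame length, or 0 on error.
 */
static size_t custom_encap(const uint8_t *in, size_t in_len,
                           uint8_t *out, size_t out_cap, uint16_t tag)
{
    const size_t hlen = sizeof(struct custom_hdr);

    if (in_len < ETH_HDR_LEN || out_cap < in_len + hlen)
        return 0;

    /* Keep the destination and source MAC addresses. */
    memcpy(out, in, 12);
    /* The outer EtherType now announces the custom header. */
    out[12] = CUSTOM_ETHERTYPE >> 8;
    out[13] = CUSTOM_ETHERTYPE & 0xff;
    /* Custom header: tag first, then the original EtherType. */
    out[14] = (uint8_t)(tag >> 8);
    out[15] = (uint8_t)(tag & 0xff);
    out[16] = in[12];
    out[17] = in[13];
    /* The original payload follows unchanged. */
    memcpy(out + ETH_HDR_LEN + hlen, in + ETH_HDR_LEN, in_len - ETH_HDR_LEN);
    return in_len + hlen;
}
```

In a real driver the same bytes would be written by your ->create() callback via skb_push(), and the receiving nodes would need a handler registered for the chosen EtherType.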

Related

Linux skb packet count header and metadata

I could not find any information about this header at the end of the skb, or about the metadata it points to.
It seems to be user controlled, so it should be checked for bounds:
static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
{
    struct sk_buff *ax_skb;
    int pkt_cnt;
    u32 rx_hdr;
    u16 hdr_off;
    u32 *pkt_hdr;
    /* At the end of the SKB, there's a header telling us how many packets
     * are bundled into this buffer and where we can find an array of
     * per-packet metadata (which contains elements encoded into u16).
     */
    if (skb->len < 4)
        return 0;
    skb_trim(skb, skb->len - 4);
    rx_hdr = get_unaligned_le32(skb_tail_pointer(skb));
    pkt_cnt = (u16)rx_hdr;
    hdr_off = (u16)(rx_hdr >> 16);
    if (pkt_cnt == 0)
        return 0;
    /* Make sure that the bounds of the metadata array are inside the SKB
     * (and in front of the counter at the end).
     */
    if (pkt_cnt * 2 + hdr_off > skb->len)
        return 0;
Can somebody point to code in the kernel or to references describing it?

USB Ethernet devices don't have hardware framing support, so they use their own framing schemes that insert extra bytes into the packet. The tx_fixup and rx_fixup hooks are provided to handle that. Some of the schemes are described here: http://www.linux-usb.org/usbnet
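As a rough illustration of the trailer handling in the quoted ax88179_rx_fixup excerpt, the following userspace sketch decodes the same 4-byte little-endian trailer from a plain byte buffer and applies the same bounds checks. The names read_le32 and parse_rx_trailer are made up for this example, and nothing beyond what the quoted excerpt shows is assumed about the device format.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the 4-byte trailer at the end of `buf`.
 * Returns 1 and fills in the packet count and the offset of the per-packet
 * metadata array, or returns 0 if the buffer fails the bounds checks.
 */
static int parse_rx_trailer(const uint8_t *buf, size_t len,
                            uint16_t *pkt_cnt, uint16_t *hdr_off)
{
    uint32_t rx_hdr;
    size_t body_len;

    if (len < 4)
        return 0;
    body_len = len - 4;                   /* same effect as skb_trim() */
    rx_hdr = read_le32(buf + body_len);
    *pkt_cnt = (uint16_t)rx_hdr;          /* low 16 bits */
    *hdr_off = (uint16_t)(rx_hdr >> 16);  /* high 16 bits */

    if (*pkt_cnt == 0)
        return 0;
    /* The metadata array must lie inside the body, before the trailer. */
    if ((size_t)*pkt_cnt * 2 + *hdr_off > body_len)
        return 0;
    return 1;
}
```

Both values come straight from the device, which is why the excerpt (and this sketch) refuses the buffer when the metadata array would extend past the trimmed length.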

ebpf packet filter on payload matching

I am new to eBPF and XDP and want to learn them. My question is: how can I use an eBPF filter to filter packets based on a specific payload? For example, if the data (payload) of the packet is 1234, it passes to the network stack; otherwise the packet is blocked. I have already managed to get the payload length. Matching on the payload length works fine, but when I start matching the payload characters I get an error. Here is my code:
int ret_val;
unsigned long payload_offset;
unsigned long payload_size;
const char *payload = "test";
struct ethhdr *eth = data;
if ((void*)eth + sizeof(*eth) <= data_end) {
    struct iphdr *ip = data + sizeof(*eth);
    if ((void*)ip + sizeof(*ip) <= data_end) {
        if (ip->protocol == IPPROTO_UDP ) {
            struct udphdr *udp = (void*)ip + sizeof(*ip);
            if ((void*)udp + sizeof(*udp) <= data_end) {
                if (udp->dest == ntohs(5005)) {
                    payload_offset = sizeof(struct udphdr);
                    payload_size = ntohs(udp->len) - sizeof(struct udphdr);
                    unsigned char *s = (unsigned char *)&payload_size;
                    if (ret_val == __builtin_memcmp(s,payload,4) == 0) {
                        return XDP_DROP;
                    }
                }
            }
        }
    }
}
The error is gone, but I am still unable to compare the payload. I am sending the UDP message from Python socket code. If I compare the payload length, it works fine.

What did you try? You should probably read a bit more about eBPF to try to understand how to process packets, the basic example you give does not sound too complicated.
Basically you would have to parse the headers to see where your payload begins. Simple BPF parsing examples might help you understand the principles:
Start from beginning of header (e.g. Ethernet at first)
Check packet is long enough to hold the header (or you would risk an out-of-bound access when trying to access the upper layers otherwise)
Add header length to get the offset of your next header (e.g. IPv4, then e.g. TCP...)
Rinse and repeat.
In your case you would process all headers until you get the offset of the data payload. Note that this is trivial if the traffic you try to match always has the same headers (e.g. always IPv4 and UDP), but you get more cases to sort out if there is a mix (IPv4 + IPv6, encapsulation, IPv4 options...).
Once you have the offset for your data, just compare data at this offset to your pattern (that you may hardcode in the BPF program or get from a BPF map, depending on your use case). Note that you do not have access to strcmp(), but __builtin_memcmp() is available if you need to compare more than 64 bits.
(All the above applying of course to a C program that you would compile into an object file containing eBPF instructions with the LLVM back-end.)
If you were to search for a string at an arbitrary offset in the payload, know that eBPF now supports (bounded) loops since kernel 5.3 (if I remember correctly).
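To make the bounded-loop idea concrete, here is a sketch in plain userspace C of searching for a pattern at an arbitrary offset. The MAX_SCAN limit and the find_pattern name are assumptions for this example. The structure (a constant loop bound plus an explicit bounds check before every read) is the shape an eBPF program would need, but whether a given kernel's verifier accepts a particular loop also depends on the kernel and compiler versions, so treat this as an illustration rather than a drop-in XDP function.

```c
#include <stddef.h>

#define MAX_SCAN 64  /* assumption: upper bound on the offsets to try */

/*
 * Return the offset of `pat` inside [data, data_end), or -1 if it is not
 * found within the first MAX_SCAN offsets. Every read is preceded by an
 * explicit bounds check against data_end.
 */
static int find_pattern(const unsigned char *data,
                        const unsigned char *data_end,
                        const unsigned char *pat, size_t pat_len)
{
    size_t off, i;

    for (off = 0; off < MAX_SCAN; off++) {
        /* Stop as soon as the pattern no longer fits in the packet. */
        if (data + off + pat_len > data_end)
            return -1;
        for (i = 0; i < pat_len; i++)
            if (data[off + i] != pat[i])
                break;
        if (i == pat_len)
            return (int)off;
    }
    return -1;
}
```

Here data would be the start of the payload (after the header parsing described above) and data_end the end of the packet.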

Your edit is pretty much a new question, so here is an updated answer. Please consider opening a new question instead in the future.
There are a number of things that are wrong in your program. In particular:
1| payload_offset = sizeof(struct udphdr);
2| payload_size = ntohs(udp->len) - sizeof(struct udphdr);
3| unsigned char *s = (unsigned char *)&payload_size;
4|
5| if (ret_val == __builtin_memcmp(s, payload, 4) == 0) {
6| return XDP_DROP;
7| }
On line 1, your payload_offset variable is not an offset, it just contains the length of the UDP header. You would need to add that to the start of the UDP header to get the actual payload offset.
Line 2 is fine.
Line 3 does not make any sense! You make s (that you later compare to your pattern) point towards the size of the payload? (a.k.a “I told you so in the comments! :)”). Instead, it should point to... the beginning of the payload, maybe? So, basically, data + payload_offset (once offset is fixed).
Between lines 3 and 5, the check on payload length is missing. When you try to access your payload in s (__builtin_memcmp(s, payload, 4)), you try to compare four bytes of packet data; you must ensure that the packet is long enough to read those four bytes (just as you checked the length each time before you read from an Ethernet, IP or UDP header field).
While at it, we can also check that the length of the payload is equal to the length of the pattern to match, and exit if they differ without having to compare the bytes.
Line 5 has a == instead of =, as discussed in the comments. Easy to fix. However, I had no luck with __builtin_memcmp() for your program: it seems LLVM does not want to inline it and turns it into a failing function call. Never mind, we can work without it. For your example, you can cast to int and compare the four-byte values directly. For longer patterns, and for recent kernels (or by unrolling if the pattern size is fixed), we can use bounded loops.
Here is an amended version of your program that works on my setup.
#include <arpa/inet.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
int xdp_func(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    char match_pattern[] = "test";
    unsigned int payload_size, i;
    struct ethhdr *eth = data;
    unsigned char *payload;
    struct udphdr *udp;
    struct iphdr *ip;
    if ((void *)eth + sizeof(*eth) > data_end)
        return XDP_PASS;
    ip = data + sizeof(*eth);
    if ((void *)ip + sizeof(*ip) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;
    udp = (void *)ip + sizeof(*ip);
    if ((void *)udp + sizeof(*udp) > data_end)
        return XDP_PASS;
    // Port in the packet is in network byte order, so convert the constant.
    if (udp->dest != htons(5005))
        return XDP_PASS;
    payload_size = ntohs(udp->len) - sizeof(*udp);
    // Here we use "size - 1" to account for the final '\0' in "test".
    // This '\0' may or may not be in your payload, adjust if necessary.
    if (payload_size != sizeof(match_pattern) - 1)
        return XDP_PASS;
    // Point to start of payload.
    payload = (unsigned char *)udp + sizeof(*udp);
    if ((void *)payload + payload_size > data_end)
        return XDP_PASS;
    // Compare each byte, exit if a difference is found.
    for (i = 0; i < payload_size; i++)
        if (payload[i] != match_pattern[i])
            return XDP_PASS;
    // Same payload, drop.
    return XDP_DROP;
}
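The answer also mentions that, for a four-byte pattern, the values can be compared directly instead of byte by byte. A minimal sketch of that alternative, written as ordinary userspace C, could look like this. The match4 name is made up here, and memcpy() is used in place of a pointer cast so the read stays well defined on unaligned data; inside an eBPF program you would typically use __builtin_memcpy() instead.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the four bytes at `payload` equal the four bytes of `pattern`.
 * The caller must already have checked that payload + 4 does not run past
 * the end of the packet.
 */
static int match4(const unsigned char *payload, const char pattern[4])
{
    uint32_t a, b;

    memcpy(&a, payload, sizeof(a));
    memcpy(&b, pattern, sizeof(b));
    return a == b;
}
```

Because both sides are copied from memory in the same byte order, no byte swapping is needed for an equality test.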

ff_replay substructure in ff_effect empty

I am developing a force feedback driver (Linux) for a gamepad that is not yet supported.
Whenever an application in userspace requests an FF effect (e.g. rumbling), a function in my driver is called:
static int foo_ff_play(struct input_dev *dev, void *data, struct ff_effect *effect)
This is set by the following code inside my init function:
input_set_capability(dev, EV_FF, FF_RUMBLE);
input_ff_create_memless(dev, NULL, foo_ff_play);
I'm accessing the ff_effect struct (which is passed to my foo_ff_play function) like this:
static int foo_ff_play(struct input_dev *dev, void *data, struct ff_effect *effect)
{
    u16 length;
    length = effect->replay.length;
    printk(KERN_DEBUG "length: %i", length);
    return 0;
}
The problem is that the reported length (in ff_effect->replay) is always zero.
That's confusing, since I am running fftest on my device, and fftest definitely sets the length attribute: https://github.com/flosse/linuxconsole/blob/master/utils/fftest.c (line 308)
/* a strong rumbling effect */
effects[4].type = FF_RUMBLE;
effects[4].id = -1;
effects[4].u.rumble.strong_magnitude = 0x8000;
effects[4].u.rumble.weak_magnitude = 0;
effects[4].replay.length = 5000;
effects[4].replay.delay = 1000;
Does this have something to do with the "memlessness"? Why does the data in ff_replay seem to be zero if it isn't?
Thank you in advance

Why is the replay struct empty?
Taking a look at https://elixir.free-electrons.com/linux/v4.4/source/drivers/input/ff-memless.c#L406 we find:
static void ml_play_effects(struct ml_device *ml)
{
    struct ff_effect effect;
    DECLARE_BITMAP(handled_bm, FF_MEMLESS_EFFECTS);
    memset(handled_bm, 0, sizeof(handled_bm));
    while (ml_get_combo_effect(ml, handled_bm, &effect))
        ml->play_effect(ml->dev, ml->private, &effect);
    ml_schedule_timer(ml);
}
ml_get_combo_effect sets the effect by calling ml_combine_effects, but ml_combine_effects simply does not copy replay.length into the ff_effect struct that is passed to our foo_ff_play (at least not if the effect type is FF_RUMBLE): https://elixir.free-electrons.com/linux/v4.4/source/drivers/input/ff-memless.c#L286
That's why we cannot read the ff_replay data in our foo_ff_play function.
Okay, replay is empty. How can we determine how long we have to play the effect (e.g. FF_RUMBLE), then?
It looks like the replay structure is something we do not even need to care about. Yes, fftest sets the length and then uploads the effect to the driver, but if we take a look at ml_ff_upload (https://elixir.free-electrons.com/linux/v4.4/source/drivers/input/ff-memless.c#L481), we can see the following:
if (test_bit(FF_EFFECT_STARTED, &state->flags)) {
    __clear_bit(FF_EFFECT_PLAYING, &state->flags);
    state->play_at = jiffies +
            msecs_to_jiffies(state->effect->replay.delay);
    state->stop_at = state->play_at +
            msecs_to_jiffies(state->effect->replay.length);
    state->adj_at = state->play_at;
    ml_schedule_timer(ml);
}
That means that the duration is already handled by the input-subsystem. It starts the effect and also stops it as needed.
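The arithmetic in that snippet is simple enough to reproduce outside the kernel. The sketch below mirrors the play_at / stop_at computation using plain milliseconds instead of jiffies; struct effect_timing and schedule_effect are hypothetical names for this example, not kernel symbols.

```c
#include <stdint.h>

/* Hypothetical stand-in for the timing fields of the effect state, in ms. */
struct effect_timing {
    uint64_t play_at;  /* when the effect starts */
    uint64_t stop_at;  /* when the effect stops */
};

/* Mirror of the play_at / stop_at computation, using milliseconds. */
static struct effect_timing schedule_effect(uint64_t now_ms,
                                            uint16_t delay_ms,
                                            uint16_t length_ms)
{
    struct effect_timing t;

    t.play_at = now_ms + delay_ms;       /* replay.delay postpones the start */
    t.stop_at = t.play_at + length_ms;   /* replay.length sets the duration */
    return t;
}
```

With the fftest values from the question (delay 1000, length 5000) and a current time of 0, this gives play_at = 1000 and stop_at = 6000.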
Furthermore we can see at https://elixir.free-electrons.com/linux/v4.4/source/include/uapi/linux/input.h#L279 that
/*
* All duration values are expressed in ms. Values above 32767 ms (0x7fff)
* should not be used and have unspecified results.
*/
That means that we have to make our effect play for at least 32767 ms. Everything else (stopping the effect earlier) is up to the scheduler, which is not our part :D

Sending UDP packets from the Linux Kernel

Even if a similar topic already exists, I noticed that it dates back two years, thus I guess it's more appropriate to open a fresh one...
I'm trying to figure out how to send UDP packets from the Linux Kernel (3.3.4), in order to monitor the behavior of the random number generator (/drivers/char/random.c). So far, I've managed to monitor a few things owing to the sock_create and sock_sendmsg functions. You can find the typical piece of code I use at the end of this message. (You might also want to download the complete modified random.c file here.)
By inserting this code inside the appropriate random.c functions, I'm able to send a UDP packet for each access to /dev/random and /dev/urandom, and for each keyboard/mouse event used by the random number generator to harvest entropy. However, it doesn't work at all when I try to monitor the disk events: it generates a kernel panic during boot.
Consequently, here's my main question: Do you have any idea why my code causes so much trouble when inserted in the disk events function (add_disk_randomness)?
Alternatively, I've read about the netpoll API, which is supposed to handle this kind of UDP-in-kernel problem. Unfortunately, I haven't found any relevant documentation apart from a quite interesting but outdated Red Hat presentation from 2005. Do you think I should use this API instead? If so, do you have an example?
Any help would be appreciated.
Thanks in advance.
PS: It's my first question here, so please don't hesitate to tell me if I'm doing something wrong, I'll keep it in mind for future :)
#include <linux/net.h>
#include <linux/in.h>
#include <linux/netpoll.h>
#define MESSAGE_SIZE 1024
#define INADDR_SEND ((unsigned long int)0x0a00020f) //10.0.2.15
static bool sock_init;
static struct socket *sock;
static struct sockaddr_in sin;
static struct msghdr msg;
static struct iovec iov;
[...]
int error, len;
mm_segment_t old_fs;
char message[MESSAGE_SIZE];
if (sock_init == false)
{
/* Creating socket */
error = sock_create(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
if (error<0)
printk(KERN_DEBUG "Can't create socket. Error %d\n",error);
/* Connecting the socket */
sin.sin_family = AF_INET;
sin.sin_port = htons(1764);
sin.sin_addr.s_addr = htonl(INADDR_SEND);
error = sock->ops->connect(sock, (struct sockaddr *)&sin, sizeof(struct sockaddr), 0);
if (error<0)
printk(KERN_DEBUG "Can't connect socket. Error %d\n",error);
/* Preparing message header */
msg.msg_flags = 0;
msg.msg_name = &sin;
msg.msg_namelen = sizeof(struct sockaddr_in);
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_iov = &iov;
msg.msg_control = NULL;
sock_init = true;
}
/* Sending a message */
sprintf(message,"EXTRACT / Time: %llu / InputPool: %4d / BlockingPool: %4d / NonblockingPool: %4d / Request: %4d\n",
get_cycles(),
input_pool.entropy_count,
blocking_pool.entropy_count,
nonblocking_pool.entropy_count,
nbytes*8);
iov.iov_base = message;
len = strlen(message);
iov.iov_len = len;
msg.msg_iovlen = len;
old_fs = get_fs();
set_fs(KERNEL_DS);
error = sock_sendmsg(sock,&msg,len);
set_fs(old_fs);

I solved my problem a few months ago. Here's the solution I used.
The standard packet-sending API (sock_create, connect, ...) cannot be used in some contexts (such as interrupt handlers). Using it in the wrong place leads to a kernel panic.
The netpoll API is more low-level and works in every context. However, several conditions apply:
Ethernet devices
IP network
UDP only (no TCP)
Different computers for sending and receiving packets (You can't send to yourself.)
Make sure to respect them, because you won't get any error message if there's a problem. It will just fail silently :) Here's a bit of code.
Declaration
#include <linux/netpoll.h>
#define MESSAGE_SIZE 1024
#define INADDR_LOCAL ((unsigned long int)0xc0a80a54) //192.168.10.84
#define INADDR_SEND ((unsigned long int)0xc0a80a55) //192.168.10.85
static struct netpoll* np = NULL;
static struct netpoll np_t;
Initialization
np_t.name = "LRNG";
strlcpy(np_t.dev_name, "eth0", IFNAMSIZ);
np_t.local_ip = htonl(INADDR_LOCAL);
np_t.remote_ip = htonl(INADDR_SEND);
np_t.local_port = 6665;
np_t.remote_port = 6666;
memset(np_t.remote_mac, 0xff, ETH_ALEN);
netpoll_print_options(&np_t);
netpoll_setup(&np_t);
np = &np_t;
Use
char message[MESSAGE_SIZE];
sprintf(message,"%d\n",42);
int len = strlen(message);
netpoll_send_udp(np,message,len);
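The INADDR_LOCAL and INADDR_SEND constants above are IPv4 addresses written as host-order 32-bit values, which is why they go through htonl() before being stored. If the hex form is hard to read, a small helper makes the mapping explicit. The ipv4_host_order name is made up for this sketch and is not part of netpoll.

```c
#include <stdint.h>

/* Hypothetical helper: pack four octets into a host-order IPv4 value. */
static uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```

For instance, ipv4_host_order(192, 168, 10, 84) yields 0xc0a80a54, the value used for INADDR_LOCAL.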
Hope it can help someone.

A panic during boot might be caused by trying to use something that hasn't been initialized yet. Looking at the stack trace might help you figure out what actually happened.
As for your problem, I think you are trying to do a simple thing, so why not stick with simple tools? ;) printks might indeed be a bad idea, but give trace_printk a go. trace_printk is part of the Ftrace infrastructure.
The section "Using trace_printk()" in the following article should teach you everything you need to know:
http://lwn.net/Articles/365835/

How to implement ACTION (move/rename, set permissions) operation in J2ME Bluetooth OBEX?

The Bluetooth FTP specification says I need to use the ACTION operation, here's a page
But ClientSession provides only GET and PUT operations, and nothing is mentioned in the javadocs.
Here's how the create file operation looks; it's pretty easy:
public void create() throws IOException {
    HeaderSet hs = cs.createHeaderSet();
    hs.setHeader(HeaderSet.NAME, file);
    op = cs.put(hs);
    OutputStream os = op.openOutputStream();
    os.close();
    op.close();
}
Question 1: How do I implement ACTION operation with custom headers to perform move/rename and set permissions? It should be possible without JSR82 OBEX API. Please help me to do this.
Question 2:
Did I understand how to set permissions?
According to OBEX_Errata Compiled For 1.3.pdf (thanks alanjmcf!)
So, to set read-only, I should do the following:
int a = 0;
//byte 0 //zero
//byte 1 //user
//byte 2 //group
//byte 3 //other
//set read for user
a |= (1 << 7); //8th bit - byte 1, bit 0 -> set to 1
// a = 10000000
//for group
a |= (1 << 15); //16th bit - byte 2, bit 0 -> set to 1
// a = 1000000010000000
//for other
a |= (1 << 23); //24th bit - byte 3, bit 0 -> set to 1
// a = 100000001000000010000000
//or simply
private static final int READ = 8421504; //1000,0000,1000,0000,1000,0000
int value = 0 | READ;
//========== calculate write constant =========
a = 0;
a |= (1 << 8); //write user
a |= (1 << 16); //write group
a |= (1 << 24); //write other
// a = 1000000010000000100000000
private static final int WRITE = 16843008; // 1,0000,0001,0000,0001,0000,0000
//========= calculate delete constant ==========
a = 0;
a |= (1 << 9); //delete user
a |= (1 << 17); //delete group
a |= (1 << 25); //delete other
//a = 10000000100000001000000000
private static final int DELETE = 33686016; //10,0000,0010,0000,0010,0000,0000
//========= calculate modify constant ==========
a = 0;
a |= (1 << (7 + 7)); //modify user
a |= (1 << (15 + 7)); //modify group
a |= (1 << (23 + 7)); //modify other
//a = 1000000010000000100000000000000
private static final int MODIFY = 1077952512; // 100,0000,0100,0000,0100,0000,0000,0000
// now, if i want to set read-write-delete-modify, I will do the following:
int rwdm = 0 | READ | WRITE | DELETE | MODIFY;
// and put the value to the header... am I right?
If that's right, the only remaining problem is question 1: how do I perform the ACTION operation, and how do I set the headers?

Note that the text you quote from the Bluetooth FTP specification mentions three headers: ActionId, Name, DestName. So you need to add one Name header and one DestName header. JSR-82 apparently doesn't define a constant for that (new) header, so quoting from the OBEX specification:
MODIFICATION
2.1 OBEX Headers
HI identifier | Header name | Description
0x94 | Action Id | Specifies the action to be performed (used in ACTION operation)
0x15 | DestName | The destination object name (used in certain ACTION operations)
0xD6 | Permissions | 4 byte bit mask for setting permissions
0x17 to 0x2F | Reserved for future use | This range includes all combinations of the upper 2 bits
So create the following etc. (My Java's a bit rusty)
static final int DEST_NAME = 0x15;
And use that in your code.
[ADD] All of these operations (move/rename, set permissions) use the ACTION operation! :-) That is, use the OBEX opcode ACTION instead of PUT or GET etc. The value of opcode ACTION is 0x86.
I'm reading this from "OBEX_Errata Compiled For 1.3.pdf". The IrDA used to charge for specifications but now seems to provide them on request (http://www.irda.org). Ask for a copy of the latest OBEX specs (1.5 IIRC). I've done so myself but have not yet got a response. Or you could try googling for, say, "move/rename object action" to get that '1.3 Errata' PDF.
Anyway, if Java prevents you from using new opcodes (only allowing GET and PUT) and also prevents you from using new HeaderId values, then you can't proceed anyway. :-( (There's no reason for them to do that, as the HeaderId encodes the data type it contains.)
After having another look at the Java API, I can't see any way of sending an arbitrary command over ClientSession. You'd have to manually build the packets, connect to the OBEX service, and then send and receive packets over that connection. It isn't too difficult to build the packets...
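To give an idea of what building such a packet by hand involves, here is a rough sketch that assembles the bytes of an ACTION request. It is written in C purely to show the byte layout, and the same logic ports directly to a Java byte array. It relies on the general OBEX encoding rules (a one-byte opcode, a two-byte big-endian packet length, and header encodings selected by the upper two bits of the header ID) plus the header IDs quoted above. The put_text and build_action names, the header order, the ASCII-only names, and the omission of a Connection Id header are all assumptions for this sketch, so check them against the specification before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBEX_OP_ACTION 0x86  /* opcode quoted above */
#define OBEX_HI_NAME   0x01  /* Name header (Unicode text) */
#define OBEX_HI_DEST   0x15  /* DestName header (Unicode text) */
#define OBEX_HI_ACTION 0x94  /* Action Id header (1-byte value) */

/* Size of a Unicode text header holding an ASCII string of length n. */
static size_t text_hdr_len(size_t n)
{
    return 3 + 2 * (n + 1);  /* HI + 2-byte length + UTF-16 text + NUL */
}

/* Append a text header: HI, big-endian length, UTF-16BE text, NUL. */
static size_t put_text(uint8_t *p, uint8_t hi, const char *ascii)
{
    size_t n = strlen(ascii), len = text_hdr_len(n), i;

    p[0] = hi;
    p[1] = (uint8_t)(len >> 8);
    p[2] = (uint8_t)len;
    for (i = 0; i <= n; i++) {           /* <= n also writes the NUL */
        p[3 + 2 * i] = 0;                /* high byte (ASCII only) */
        p[4 + 2 * i] = (uint8_t)ascii[i];
    }
    return len;
}

/*
 * Build an ACTION request with Action Id, Name and DestName headers.
 * action_id is supplied by the caller (see the spec for its values).
 * Returns the packet length, or 0 if it does not fit in buf.
 */
static size_t build_action(uint8_t *buf, size_t cap, uint8_t action_id,
                           const char *name, const char *dest)
{
    size_t need = 3 + 2 + text_hdr_len(strlen(name)) +
                  text_hdr_len(strlen(dest));
    size_t off = 3;                      /* skip opcode + packet length */

    if (need > cap || need > 0xFFFF)
        return 0;
    buf[off++] = OBEX_HI_ACTION;
    buf[off++] = action_id;
    off += put_text(buf + off, OBEX_HI_NAME, name);
    off += put_text(buf + off, OBEX_HI_DEST, dest);
    buf[0] = OBEX_OP_ACTION;
    buf[1] = (uint8_t)(off >> 8);        /* total length, big-endian */
    buf[2] = (uint8_t)off;
    return off;
}
```

A Permissions header (0xD6) would be appended the same way as the Action Id header, except that its value is four bytes long instead of one.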
