eBPF Maps : How to get MAP FD from another user space program - linux

I have attached eBPF XDP program(port_filter_kern.c) in my network Interface.
port_filter_kern.c - It will drop the incoming traffic which comes to the specific ports(port numbers are present in "port_map" as key).
port_filter_user.c - It will load my eBPF program to the given interface and update the eBPF map "port_map" after reading the text file(which has port numbers).
map_fd = bpf_object__find_map_fd_by_name(obj, "port_map");
printf("map_fd %d ",map_fd); //to see the map fd integer
int result = bpf_map_update_elem(map_fd, &portkey, &value, BPF_ANY);
Now, I want to access the same map "port_map" using another user space program (port_filter_runtime.c) which will get the port numbers from user/text file during run time and need to update the same map "port_map", to drop the incoming traffic which comes to the newly given port number.
I have tried below ways to find same map FD. I didnt get correct FD,(verified through the FD, which is printed in first user space program port_filter_user.c).
struct bpf_object *obj = bpf_object__open_file("port_filter_kern.o", NULL);
struct bpf_map *map = bpf_object__find_map_by_name(obj, "port_map");
int map_fd = bpf_map__fd(map);
printf("map_fd %d ",map_fd); //to see the map fd integer
and tried with below code also,
struct bpf_object *obj = bpf_object__open_file("port_filter_kern.o", NULL);
int map_fd = bpf_object__find_map_fd_by_name(obj, "port_map");
printf("map_fd %d ",map_fd); //to see the map fd integer
If I gets the same map FD, I can use that to update my map.
Any guidance? Thanks in Advance...

The file descriptor is an integer value that only makes sense in the context of its process. You cannot just share the value with any process and expect that it will point to the same resource.
Typically you would share a reference between processes by pinning the map (in the user space program that created it) to the bpffs (/sys/fs/bpf/), then retrieving the file descriptor in the other program from the pinned path with a bpf() syscall (see for example int bpf_obj_pin(int fd, const char *pathname), and then int bpf_obj_get(const char *pathname) from libbpf).
Once you have the file descriptor in your second process, you can assign it to the map in the struct bpf_object with libbpf's bpf_map__reuse_fd().

Related

SPI linux driver

I am trying to learn how to write a basic SPI driver and below is the probe function that I wrote.
What I am trying to do here is setup the spi device for fram(datasheet) and use the spi_sync_transfer()api description to get the manufacturer's id from the chip.
When I execute this code, I can see the data on the SPI bus using logic analyzer but I am unable to read it using the rx buffer. Am I missing something here? Could someone please help me with this?
static int fram_probe(struct spi_device *spi)
{
int err;
unsigned char ch16[] = {0x9F,0x00,0x00,0x00};// 0x9F => 10011111
unsigned char rx16[] = {0x00,0x00,0x00,0x00};
printk("[FRAM DRIVER] fram_probe called \n");
spi->max_speed_hz = 1000000;
spi->bits_per_word = 8;
spi->mode = (3);
err = spi_setup(spi);
if (err < 0) {
printk("[FRAM DRIVER::fram_probe spi_setup failed!\n");
return err;
}
printk("[FRAM DRIVER] spi_setup ok, cs: %d\n", spi->chip_select);
spi_element[0].tx_buf = ch16;
spi_element[1].rx_buf = rx16;
err = spi_sync_transfer(spi, spi_element, ARRAY_SIZE(spi_element)/2);
printk("rx16=%x %x %x %x\n",rx16[0],rx16[1],rx16[2],rx16[3]);
if (err < 0) {
printk("[FRAM DRIVER]::fram_probe spi_sync_transfer failed!\n");
return err;
}
return 0;
}
spi_element is not declared in this example. You should show that and also how all elements of that are array are filled. But just from the code that's there I see a couple mistakes.
You need to set the len parameter of spi_transfer. You've assigned the TX or RX buffer to ch16 or rx16 but not set the length of the buffer in either case.
You should zero out all the fields not used in the spi_transfer.
If you set the length to four, you would not be sending the proper command according to the datasheet. RDID expects a one byte command after which will follow four bytes of output data. You are writing a four byte command in your first transfer and then reading four bytes of data. The tx_buf in the first transfer should just be one byte.
And finally the number of transfers specified as the last argument to spi_sync_transfer() is incorrect. It should be 2 in this case because you have defined two, spi_element[0] and spi_element[1]. You could use ARRAY_SIZE() if spi_element was declared for the purpose of this message and you want to sent all transfers in the array.
Consider this as a way to better fill in the spi_transfers. It will take care of zeroing out fields that are not used, defines the transfers in a easy to see way, and changing the buffer sizes or the number of transfers is automatically accounted for in remaining code.
const char ch16[] = { 0x8f };
char rx16[4];
struct spi_transfer rdid[] = {
{ .tx_buf = ch16, .len = sizeof(ch16) },
{ .rx_buf = rx16, .len = sizeof(rx16) },
};
spi_transfer(spi, rdid, ARRAY_SIZE(rdid));
Since you have a scope, be sure to check that this operation happens under a single chip select pulse. I have found more than one Linux SPI driver to have a bug that pulses chip select when it should not. In some cases switching from TX to RX (like done above) will trigger a CS pulse. In other cases a CS pulse is generated for every word (8 bits here) of data.
Another thing you should change is use dev_info(&spi->dev, "device version %d", id)' and also dev_err() to print messages. This inserts the device name in a standard way instead of your hard-coded non-standard and inconsistent "[FRAME DRIVER]::" text. And sets the level of the message as appropriate.
Also, consider supporting device tree in your driver to read device properties. Then you can do things like change the SPI bus frequency for this device without rebuilding the kernel driver.

ping pong DMA in kernel

I am using a SoC to DMA data from the programmable logic side (PL) to the processor side (PS). My overall scheme is to allocate two separate dma buffers and ping-pong data to these buffers to the user-space application can access data without collisions. I have successfully done this using single dma transactions in a loop (in a kthread) with each transaction either on the first or second buffer. I was able to notify a poll() method of either the first or second buffer.
Now I would like to investigate scatter-gather and/or cyclic DMA.
struct scatterlist sg[2]
struct dma_async_tx_descriptor *chan_desc;
struct dma_slave_config config;
sg_init_table(sg, ARRAY_SIZE(sg));
addr_0 = (unsigned int *)dma_alloc_coherent(NULL, dma_size, &handle_0, GFP_KERNEL);
addr_1 = (unsigned int *)dma_alloc_coherent(NULL, dma_size, &handle_1, GFP_KERNEL);
sg_dma_address(&sg[0]) = handle_0;
sg_dma_len(&sg[0]) = dma_size;
sg_dma_address(&sg[1]) = handle_1;
sg_dma_len(&sg[1]) = dma_size;
memset(&config, 0, sizeof(config));
config.direction = DMA_DEV_TO_MEM;
config.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
config.src_maxburst = DMA_MAX_BURST_SIZE / DMA_SLAVE_BUSWIDTH_4_BYTES;
dmaengine_slave_config(dma_dev->chan, &config);
chan_desc = dmaengine_prep_slave_sg(dma_dev->chan, &sg[0], 2, DMA_DEV_TO_MEM, DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
chan_desc->callback = sync_callback;
chan_desc->callback_param = dma_dev->cmp;
init_completion(dma_dev->cmp);
dmaengine_submit(chan_desc);
dma_async_issue_pending(dma_dev->chan);
dma_free_coherent(NULL, dma_size, addr_0, handle_0);
dma_free_coherent(NULL, dma_size, addr_1, handle_1);
This works fine for a single run through the scatterlist and then calls the call_back at sync_callback. My thought was to chain the scatterlist in a loop but I won't get the callback.
IS THERE A WAY to have a callback for each descriptor in the scatterlist? I wondered if I could use dmaengine_prep_slave_cyclic (calls callback after every transaction) but it looks to me like this is for a single buffer when reviewing dmaengine.h. Looking at DMA Engine API Guide it looks like there is another option dmaengine_prep_interleaved_dma using a dma_interleaved_template that sounds interesting but hard to find info about.
In the end I just want to signal the user space in some manner as to what buffer is ready.

How to use proc_pid_cmdline in kernel module

I am writing a kernel module to get the list of pids with their complete process name. The proc_pid_cmdline() gives the complete process name;using same function /proc/*/cmdline gets the complete process name. (struct task_struct) -> comm gives hint of what process it is, but not the complete path.
I have included the function name, but it gives error because it does not know where to find the function.
How to use proc_pid_cmdline() in a module ?
You are not supposed to call proc_pid_cmdline().
It is a non-public function in fs/proc/base.c:
static int proc_pid_cmdline(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
However, what it does is simple:
get_cmdline(task, m->buf, PAGE_SIZE);
That is not likely to return the full path though and it will not be possible to determine the full path in every case. The arg[0] value may be overwritten, the file could be deleted or moved, etc. A process may exec() in a way which obscures the original command line, and all kinds of other maladies.
A scan of my Fedora 20 system /proc/*/cmdline turns up all kinds of less-than-useful results:
-F
BUG:
WARNING: at
WARNING: CPU:
INFO: possible recursive locking detecte
ernel BUG at
list_del corruption
list_add corruption
do_IRQ: stack overflow:
ear stack overflow (cur:
eneral protection fault
nable to handle kernel
ouble fault:
RTNL: assertion failed
eek! page_mapcount(page) went negative!
adness at
NETDEV WATCHDOG
ysctl table check failed
: nobody cared
IRQ handler type mismatch
Machine Check Exception:
Machine check events logged
divide error:
bounds:
coprocessor segment overrun:
invalid TSS:
segment not present:
invalid opcode:
alignment check:
stack segment:
fpu exception:
simd exception:
iret exception:
/var/log/messages
--
/usr/bin/abrt-dump-oops
-xtD
I have managed to solve a version of this problem. I wanted to access the cmdline of all PIDs but within the kernel itself (as opposed to a kernel module as the question states), but perhaps these principles can be applied to kernel modules as well?
What I did was, I added the following function to fs/proc/base.c
int proc_get_cmdline(struct task_struct *task, char * buffer) {
int i;
int ret = proc_pid_cmdline(task, buffer);
for(i = 0; i < ret - 1; i++) {
if(buffer[i] == '\0')
buffer[i] = ' ';
}
return 0;
}
I then added the declaration in include/linux/proc_fs.h
int proc_get_cmdline(struct task_struct *, char *);
At this point, I could access the cmdline of all processes within the kernel.
To access the task_struct, perhaps you could refer to kernel: efficient way to find task_struct by pid?.
Once you have the task_struct, you should be able to do something like:
char cmdline[256];
proc_get_cmdline(task, cmdline);
if(strlen(cmdline) > 0)
printk(" cmdline :%s\n", cmdline);
else
printk(" cmdline :%s\n", task->comm);
I was able to obtain the commandline of all processes this way.
To get the full path of the binary behind a process.
char * exepathp;
struct file * exe_file;
struct mm_struct *mm;
char exe_path [1000];
//straight up stolen from get_mm_exe_file
mm = get_task_mm(current);
down_read(&mm->mmap_sem); //lock read
exe_file = mm->exe_file;
if (exe_file) get_file(exe_file);
up_read(&mm->mmap_sem); //unlock read
//reduce exe path to a string
exepathp = d_path( &(exe_file->f_path), exe_path, 1000*sizeof(char) );
Where current is the task struct for the process you are interested in. The variable exepathp gets the string of the full path. This is slightly different than the process cmd, this is the path of binary which was loaded to start the process. Combining this path with the process cmd should give you the full path.

Can I pass parameter to driver during INSMOD or MODPROBE?

I currently worte a USB device driver in which I created a Kthread from probe() function. The general kthread_create() function creates thread on the CPU which is least busy.
What I want to do is create kthread on a particular CPU (kthread_create_on_cpu()), so that I can assign seperate core to device threads dealing with output devices.
How can I pass the CPU number to module when the module/driver is being loaded.
Either I can use a global variable which will be set once when the system boots up and will be read by drivers OR pass CPU number to module while it's being loaded.
Please suggest which method will be more feasible to use and implement.
Thanks and Regards,
Mitesh G
You can pass command line arguments. For this you have to add module_param or module_param_array in module.
Add these lines in your modules of course according to your requirements
int myintdata = 100;
module_param(myintdata, int, 0);
char mychardata = 'A';
module_param(mychardata, char, 0);
int myarray[2];
module_param_array(myarray, int, NULL, 0);
static char *name;
module_param(name, charp, 0); // here you have to mention charp as data type
or module_param_string(name, string, len, perm); for String
while inserting modules
insmod module_name.ko myintdata=5 mychardata = 'X' name= "xyz" myarray =99,100
`

fuse: Setting offsets for the filler function in readdir

I am implementing a virtual filesystem using the fuse, and need some understanding regarding the offset parameter in readdir.
Earlier we were ignoring the offset and passing 0 in the filler function, in which case the kernel should take care.
Our filesystem database, is storing: directory name, filelength, inode number and parent inode number.
How do i calculate get the offset?
Then is the offset of each components, equal to their size sorted in incremental form of their inode number? What happens is there is a directory inside a directory, is the offset in that case equal to the sum of the files inside?
Example: in case the dir listing is - a.txt b.txt c.txt
And inode number of a.txt=3, b.txt=5, c.txt=7
Offset of a.txt= directory offset
Offset of b.txt=dir offset + size of a.txt
Offset of c.txt=dir offset + size of b.txt
Is the above assumption correct?
P.S: Here are the callbacks of fuse
The selected answer is not correct
Despite the lack of upvotes on this answer, this is the correct answer. Cracking into the format of the void buffer should be discouraged, and that's the intent behind declaring such things void in C code - you shouldn't write code that assumes knowledge of the format of the data behind void pointers, use whatever API is provided properly instead.
The code below is very simple and straightforward, as it should be. No knowledge of the format of the Fuse buffer is required.
Fictitious API
This is a contrived example of what some device's API could look
like. This is not part of Fuse.
// get_some_file_names() -
// returns a struct with buffers holding the names of files.
// PARAMETERS
// * path - A path of some sort that the fictitious device groks.
// * offset - Where in the list of file names to start.
// RETURNS
// * A name_list, it has some char buffers holding the file names
// and a couple other auxiliary vars.
//
name_list *get_some_file_names(char *path, size_t offset);
Listing the files in parts
Here's a Fuse callback that can be registered with the Fuse system to
list the filenames provided by get_some_file_names(). It's arbitrarily named readdir_callback() so its purpose is obvious.
int readdir_callback( char *path,
void *buf, // This is meant to be "opaque".
fuse_fill_dir_t *filler, // filler takes care of buf.
off_t off, // Last value given to filler.
struct fuse_file_info *fi )
{
// Call the fictitious API to get a list of file names.
name_list *list = get_some_file_names(path, off);
for (int i = 0; i < list->length; i++)
{
// Feed the file names to filler() one at a time.
if (filler(buf, list->names[i], NULL, off + i + 1))
{
break; // filler() returned 1, requesting a break.
}
incr_num_files_listed(list);
}
if (all_files_listed(list))
{
return 1; // Tell Fuse we're done.
}
return 0;
}
The off (offset) value is not used by the filler function to fill its opaque buffer, buf. The off value is, however, meaningful to the callback as an offset base as it provides file names to filler(). Whatever value was last passed to filler() is what gets passed back to readdir_callback() on its next invocation. filler()
itself only cares whether the off value is 0 or not-0.
Indicating "I'm done listing!" to Fuse
To signal to the Fuse system that your readdir_callback() is done listing file names in parts (when the last of the list of names has been given to filler()), simply return 1 from it.
How off Is Used
The off, offset, parameter should be non-0 to perform the partial listings. That's its only requirement as far as filler() is concerned. If off is 0, that indicates to Fuse that you're going to do a full listing in one shot (see below).
Although filler() doesn't care what the off value is beyond it being non-0, the value can still be meaningfully used. The code above is using the index of the next item in its own file list as its value. Fuse will keep passing the last off value it received back to the read dir callback on each invocation until the listing is complete (when readdir_callback() returns 1).
Listing the files all at once
int readdir_callback( char *path,
void *buf,
fuse_fill_dir_t *filler,
off_t off,
struct fuse_file_info *fi )
{
name_list *list = get_all_file_names(path);
for (int i = 0; i < list->length; i++)
{
filler(buf, list->names[i], NULL, 0);
}
return 0;
}
Listing all the files in one shot, as above, is simpler - but not by much. Note that off is 0 for the full listing. One may wonder, 'why even bother with the first approach of reading the folder contents in parts?'
The in-parts strategy is useful where a set number of buffers for file names is allocated, and the number of files within folders may exceed this number. For instance, the implementation of name_list above may only have 8 allocated buffers (char names[8][256]). Also, buf may fill up and filler() start returning 1 if too many names are given at once. The first approach avoids this.
The offset passed to the filler function is the offset of the next item in the directory. You can have the entries in the directory in any order you want. If you don't want to return an entire directory at once, you need to use the offset to determine what gets asked for and stored. The order of items in the directory is up to you, and doesn't matter what order the names or inodes or anything else is.
Specifically, in the readdir call, you are passed an offset. You want to start calling the filler function with entries that will be at this callback or later. In the simplest case, the length of each entry is 24 bytes + strlen(name of entry), rounded up to the nearest multiple of 8 bytes. However, see the fuse source code at http://sourceforge.net/projects/fuse/ for when this might not be the case.
I have a simple example, where I have a loop (pseudo c-code) in my readdir function:
int my_readdir(const char *path, void *buf, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi)
{
(a bunch of prep work has been omitted)
struct stat st;
int off, nextoff=0, lenentry, i;
char namebuf[(long enough for any one name)];
for (i=0; i<NumDirectoryEntries; i++)
{
(fill st with the stat information, including inode, etc.)
(fill namebuf with the name of the directory entry)
lenentry = ((24+strlen(namebuf)+7)&~7);
off = nextoff; /* offset of this entry */
nextoff += lenentry;
/* Skip this entry if we weren't asked for it */
if (off<offset)
continue;
/* Add this to our response until we are asked to stop */
if (filler(buf, namebuf, &st, nextoff))
break;
}
/* All done because we were asked to stop or because we finished */
return 0;
}
I tested this within my own code (I had never used the offset before), and it works fine.

Resources