Why is eBPF said to be safer than LKM? - security

When the advantages of eBPF are discussed, it is always described as safer than an LKM.
From the documentation I have read, eBPF ensures safety by verifying the code before it is loaded.
This is the checklist of things the verifier looks for:
loops
out of range jumps
unreachable instructions
invalid instructions
uninitialized register access
uninitialized stack access
misaligned stack access
out of range stack access
invalid calling convention
I understand most of these checks, but are they the only reasons an LKM can cause a kernel panic? And is performing these checks enough to ensure safety?
I have 120000 servers in production, and this question is the only thing preventing me from migrating from a traditional HIDS to an eBPF HIDS. If it caused a kernel panic on a large scale, even once, our business would be over.

Yes, as far as I know, the BPF verifier is meant to prevent any sort of kernel crash. That however doesn't mean you can't break things unintentionally in production. You could for example freeze your system by attaching BPF programs to all kernel functions or lose all connectivity by dropping all received packets. In those cases, the verifier has no way to know that you didn't mean to perform those actions; it won't stop you.
That being said, any sort of verification is better than no verification as in traditional kernel modules. With kernel modules, not only can you shoot yourself in the foot as I've described above, but you could also crash the whole system because of a subtle bug somewhere in the code.
Regardless of what you're using, you should obviously test it extensively before deploying to production.
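To make a few items from the checklist in the question concrete, here is a toy sketch of a load-time check. It is not the real BPF verifier and does not use the real BPF instruction set: the instruction format, the names (toy_insn, toy_verify) and the four-register machine are all made up for illustration. It only shows how backward jumps, out-of-range jumps, invalid instructions and uninitialized register reads can be rejected before any code runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction set, NOT real BPF. */
enum toy_op { TOY_MOV_IMM, TOY_ADD, TOY_JMP, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int dst;  /* destination register index */
    int src;  /* source register index (TOY_ADD only) */
    int off;  /* TOY_JMP only: offset relative to the next instruction */
};

#define TOY_NREGS 4

static bool valid_reg(int r)
{
    return r >= 0 && r < TOY_NREGS;
}

/* Walks the program once and returns true only if every check passes. */
bool toy_verify(const struct toy_insn *prog, size_t len)
{
    bool init[TOY_NREGS] = { false };
    size_t pc = 0;

    while (pc < len) {
        const struct toy_insn *in = &prog[pc];

        switch (in->op) {
        case TOY_MOV_IMM:
            if (!valid_reg(in->dst))
                return false;
            init[in->dst] = true;      /* register now holds a known value */
            pc++;
            break;
        case TOY_ADD:
            if (!valid_reg(in->dst) || !valid_reg(in->src))
                return false;
            if (!init[in->dst] || !init[in->src])
                return false;          /* uninitialized register access */
            pc++;
            break;
        case TOY_JMP:
            if (in->off < 0)
                return false;          /* backward jump: could form a loop */
            if ((size_t)in->off >= len - pc - 1)
                return false;          /* out-of-range jump */
            pc += 1 + (size_t)in->off;
            break;
        case TOY_EXIT:
            return init[0];            /* toy rule: r0 must hold the result */
        default:
            return false;              /* invalid instruction */
        }
    }
    return false;                      /* ran off the end without TOY_EXIT */
}
```

Because the toy only has unconditional forward jumps, a single pass is enough. A loadable kernel module goes through no comparable check before it starts running in the kernel.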

Related

2 questions regarding ASLR

I've been reading about ASLR and I have a couple of questions. I have little programming experience but I am interested in the theory behind it.
I understand that it randomizes where DLLs, stacks and heaps are in the virtual address space so that malicious code doesn't know their location, but how does the actual program know their location when it needs them?
If the legitimate process can locate them, what stops the malicious code doing the same?
and finally, is the malicious code that ASLR tries to prevent running in the user space of the process it is attacking?
Thanks
As background, ASLR is intended to complicate code injection attacks where the attacker tries to utilize your overflow bug to trick your application into running the attacker's code. For example, in a successful stack buffer overflow attack the attacker pushes their code onto the stack and modifies the call frame's return pointer to point to the on-stack code.
Most code injection attacks require the attacker to know the absolute address of some part of your process's memory layout. For stack buffer overflow attacks, they need to know the address of the stack frame of the vulnerable function call so they can set the function's return pointer to point to the stack. For other attacks this could be the address of heap variables, exception tables, etc.
One more important background fact: unlike programming languages, machine code has absolute addresses in it. While your program may call function foo(), the machine code will call address 0x12345678.
but how does the actual program know their location when it needs them?
This is established by the dynamic linker and other operating system features that are responsible for converting your on-disk executable into an in-memory process. This involves replacing references to foo with references to 0x12345678.
If the legitimate process can locate them, what stops the malicious code doing the same?
The legitimate process knows where the addresses are because the dynamic linker creates the process such that the actual addresses are hard-wired into the process. So the process isn't locating them, per se. By the time the process is started, the addresses are all calculated and inserted into the code. An attacker can't utilize this because their code is not modified by the dynamic linker.
Consider the scenario where an attacker has a copy of the same executable that they are trying to attack. They can run the executable on their machine, examine it, and find all of the relevant addresses. Without ASLR, these addresses have a good chance of being the same on your machine when you're running the executable. ASLR randomizes these addresses meaning that the attacker can't (easily) find the addresses.
and finally, is the malicious code that ASLR tries to prevent running in the user space of the process it is attacking?
Unless there's a kernel injection vulnerability (which would likely be very bad and result in patches by your OS vendor), yes, it's running in the user space. More specifically, it will likely be located on the stack or the heap, as this is where user input is stored. Using data execution prevention will also help to prevent successful injection attacks.
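The effect of the randomization can be observed with a small C sketch that records one code address, one stack address and one heap address. The names (layout_sample, sample_layout, print_layout) are made up for this illustration, and it assumes a POSIX-like system where a function pointer can be converted to an integer. Running the resulting program several times with ASLR enabled should typically print different stack and heap addresses on each run; the code address usually only changes if the executable was built as position-independent.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One address from each region an attacker might want to guess. */
struct layout_sample {
    uintptr_t code;   /* address of a function in the executable */
    uintptr_t stack;  /* address of a local variable */
    uintptr_t heap;   /* address returned by malloc */
};

/* Fills *out with sample addresses; returns 0 on success, -1 on failure. */
int sample_layout(struct layout_sample *out)
{
    int on_stack = 0;
    void *on_heap = malloc(16);

    if (on_heap == NULL)
        return -1;

    out->code = (uintptr_t)&sample_layout;
    out->stack = (uintptr_t)&on_stack;
    out->heap = (uintptr_t)on_heap;

    free(on_heap);
    return 0;
}

/* Prints the sample so that runs can be compared by eye. */
void print_layout(const struct layout_sample *s)
{
    printf("code:  0x%" PRIxPTR "\n", s->code);
    printf("stack: 0x%" PRIxPTR "\n", s->stack);
    printf("heap:  0x%" PRIxPTR "\n", s->heap);
}
```

A main() that calls sample_layout() followed by print_layout() is enough to compare runs.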

Relevant debug data for a Linux target

For an embedded ARM system running in the field, there is a need to retrieve relevant debug information when a user-space application crash occurs. This information will be stored in non-volatile memory so it can be retrieved at a later time. All of it must be stored during runtime, without using third-party applications, due to memory consumption concerns.
So far I have thought of following:
Signal ID and corresponding PC / memory addresses in case a kernel SIG occurs;
Process ID;
What other information do you think is relevant in order to identify the cause of the problem and allow fast debugging afterwards?
Thank you!
Usually, to be able to understand an issue, you'll need every register (from r0 to r15), the CPSR, and the top of the stack (to be able to determine what happened before the crash). Please also note that when your program is interrupted for an invalid operation (jump to an invalid address, ...), the processor switches to an exception mode, whereas you need to dump the registers and stack in the context of your process.
To be able to investigate, using those data, you also must keep the ELF files (with debug information, if possible) from your build, to be able to interpret the content of your registers and stack.
In the end, the more information you keep, the easier the debugging is, but it may be expensive to keep every memory section used by your program at the time of the failure (as a matter of fact, I've never done this).
In postmortem analysis, you will face some limits :
Dynamically linked libraries : if your crash occurs in a dynamically loaded and linked code, you will also need the lib binary you are using on your target.
Memory corruption : memory corruption usually results in random data being called as code. On ARM with Linux, this will probably lead to a segfault, as you can't jump into another process's memory area and your data will probably be marked as "never execute". Nevertheless, by the time the crash happens, you may have already corrupted the data that could have allowed you to identify the source of the corruption. Postmortem analysis isn't always able to identify the cause of the failure.
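One way to capture the signal number, faulting address and process ID at crash time without a third-party tool is a signal handler installed with SA_SIGINFO. The following is a minimal sketch, not a complete solution: crash_record, fill_crash_record and install_crash_handler are names made up for this example, the log path is whatever file sits on your non-volatile storage, and the register dump is left out because the layout of the ucontext argument is architecture-specific.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fixed-size record appended to the log on every fatal signal. */
struct crash_record {
    int32_t signo;        /* signal number, e.g. SIGSEGV */
    int32_t code;         /* si_code: why the signal was raised */
    uint64_t fault_addr;  /* si_addr: meaningful for SIGSEGV/SIGBUS/SIGILL/SIGFPE */
    int32_t pid;          /* process that crashed */
};

static int crash_fd = -1; /* opened in advance so the handler never calls open() */

/* Copies the interesting siginfo fields into a record. */
void fill_crash_record(struct crash_record *rec, const siginfo_t *info)
{
    memset(rec, 0, sizeof(*rec));
    rec->signo = info->si_signo;
    rec->code = info->si_code;
    rec->fault_addr = (uint64_t)(uintptr_t)info->si_addr;
    rec->pid = (int32_t)getpid();
}

/* Handler: write the record with write(2), then terminate the process. */
static void on_fatal_signal(int signo, siginfo_t *info, void *ucontext)
{
    struct crash_record rec;

    (void)ucontext; /* holds the saved registers; layout depends on the architecture */
    fill_crash_record(&rec, info);
    if (crash_fd >= 0) {
        ssize_t n = write(crash_fd, &rec, sizeof(rec));
        (void)n;
    }
    _exit(128 + signo);
}

/* Opens the log file and installs the handler; returns 0 on success, -1 on error. */
int install_crash_handler(const char *path)
{
    static const int fatal[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    struct sigaction sa;
    size_t i;

    crash_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (crash_fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fatal_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof(fatal) / sizeof(fatal[0]); i++) {
        if (sigaction(fatal[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

The handler runs on the crashing thread's stack, so a crash caused by stack exhaustion would additionally need an alternate signal stack (sigaltstack), which this sketch does not set up.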

Linux System Calls & Kernel Mode

I understand that system calls exist to provide access to capabilities that are disallowed in user space, such as accessing a HDD using the read() system call. I also understand that these are abstracted by a user-mode layer in the form of library calls such as fread(), to provide compatibility across hardware.
So from the application developer's point of view, we have something like:
//library //syscall //k_driver //device_driver
fread() -> read() -> k_read() -> d_read()
My question is; what is stopping me inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the CPU should behave in the same way? I have not tried it, but I assume that this does not work for some reason I am missing. Otherwise any application could get arbitrary kernel mode operation.
TL;DR: What allows system calls to 'enter' kernel mode that is not copy-able by an application?
System calls do not enter the kernel themselves. More precisely, the read function you call is still, as far as your application is concerned, a library call. What read(2) does internally is invoke the actual system call using a software interrupt or a dedicated instruction such as syscall, depending on the CPU architecture and OS.
This is the only way for userland code to have privileged code executed, but it is an indirect way. The userland and kernel code execute in different contexts.
That means you cannot add the kernel source code to your userland code and expect it to do anything useful but crash. In particular, the kernel code has access to the physical memory addresses required to interact with the hardware. Userland code is limited to a virtual memory space that does not have this capability. Also, the instructions userland code is allowed to execute are a subset of the ones the CPU supports. Several I/O, interrupt and virtualization related instructions are examples of prohibited code. They are known as privileged instructions and require the CPU to be in a lower ring or in supervisor mode, depending on the CPU architecture.
You could inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the system call overhead (context switches back and forth, in-kernel checks, ...), not to mention the time the system call itself takes, makes any gain from inlining disappear in the noise (if there is any gain at all: more code means the cache is less effective, and performance suffers). Trust the libc/kernel folks to have studied the matter and done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.
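As a small illustration of going through syscall(2) directly instead of the usual wrapper, here is a sketch for Linux using getpid, chosen only because it takes no arguments. The two helper names are made up for this example. Both paths end up executing the same kernel code; the user-space side merely asks for it by number.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The usual route: call the libc wrapper. */
pid_t getpid_via_wrapper(void)
{
    return getpid();
}

/* The direct route: pass the system call number to syscall(2). */
pid_t getpid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Neither version gives the application kernel privileges: the switch to kernel mode happens inside the trap instruction, and the code that runs afterwards is the kernel's, not the caller's.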

pinning a pthread to a single core

I am trying to measure the performance of some library calls. My primary measurement tool is the rdtsc call. After doing some reading I realize that I need to disable preemption and interrupts in order to get the most accurate readings. Can someone help me figure out how to do these? I know that pthreads have a 'set affinity' mechanism. Is that enough to get the job done?
I also read somewhere that I can make calls into the kernel of the sort
preempt_disable()
raw_local_irq_save(...)
Is there any benefit to using one approach over the other? I tried the latter approach and got this error.
error: 'preempt_disable' was not declared in this scope
which can be fixed by including linux/preempt.h but the compiler still complains.
linux/preempt.h: No such file or directory
Obviously I have not done any kernel hacking, and I could not find this file anywhere on my system. I am really hoping I won't have to install a new Linux kernel. :)
Thanks for your input.
Pinning a pthread to a single CPU can be done using pthread_setaffinity_np
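A minimal sketch of that call on Linux with glibc follows. The helper names (first_allowed_cpu, pin_self_to_cpu) are made up for this example; the first one exists only because CPU 0 is not guaranteed to be in the set of CPUs the process is allowed to run on.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Returns the lowest CPU number the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}

/* Restricts the calling thread to one CPU; returns 0 or an error number. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Pinning only keeps the thread from migrating between cores. Other threads and interrupts can still run on the same core, so it does not by itself disable preemption.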
But what you ultimately want to achieve is not so simple. I'll explain why.
preempt.h is part of the Linux kernel source. It's located here. You need to have the kernel sources available. In any case, you need to write a kernel module to access it; you cannot use it from user space. Learn how to write a kernel module here. The same applies to preempt_disable and the other interrupt-disabling kernel functions.
Now the point is that pthreads live in user space, while your preemption-disabling function lives in kernel space. How do the two interact?
Either you write a new system call of your own that does the preemption and interrupt disabling and call it from user space, or you resort to other kernel/user-space interfaces such as procfs, sysfs, ioctl, etc.
But I am really skeptical as to how all these will help you to benchmark library functions. You may want to have a look at how performance is typically measured using rdtsc

How early can I call kalloc in an arm linux kernel?

I would like to dynamically allocate memory from the machine_init function in my arm linux kernel. However, my tests indicate that calling kalloc sometimes results in a complete failure of the system to boot.
My debugging tools are very limited so I can't give much more information regarding the failure.
Simply put, is it legal to call kalloc from a machine_init function in ARM linux, and, if not, is there an alternative?
I understand that in most cases it is wrong-headed to be allocating memory this early in the boot process (this kind of work should be done by the device drivers); however, I am convinced that my particular project requires it.
I can't see where machine_init is called from, but I can't help thinking you're trying to do the wrong thing.
Device drivers and other subsystems have their own init time, trying to do things very early on is usually a mistake (because something required isn't started yet). You can definitely call kmalloc during the initialisation of a device driver (at least, most. Maybe the console driver is different).
In any case, the fact that you're on ARM suggests that it's an embedded system, so you're unlikely to have to deal with a lot of different hardware. Can't you just statically allocate an array with as many elements as could possibly be required (and give an error if that is exceeded)?
kmalloc is a kernel API on top of the slab/slob/slub memory framework. Once whichever of these frameworks the kernel uses has been initialized, kmalloc works fine. Make sure your call happens after the slab/slob/slub initialization.
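The static-array approach could look like the sketch below. It is plain C rather than kernel code, and MAX_ITEMS, struct item and pool_alloc are placeholder names invented for the example; the real element type and upper bound depend on your board.

```c
#include <stddef.h>

#define MAX_ITEMS 8  /* hypothetical upper bound on what could be required */

struct item {
    int id;          /* hypothetical payload */
};

/* Storage is reserved at link time, so no allocator has to be running yet. */
static struct item pool[MAX_ITEMS];
static size_t pool_used;

/* Hands out the next free element, or NULL once the bound is exceeded. */
struct item *pool_alloc(void)
{
    if (pool_used >= MAX_ITEMS)
        return NULL;
    return &pool[pool_used++];
}
```

The caller reports an error when pool_alloc() returns NULL. Elements are never freed in this sketch, which is usually acceptable for objects created once during boot.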
cheers
