Null pointer dereference in User space and kernel space

What happens if we dereference a null pointer in user space and in kernel space?
From my understanding, the behaviour depends on the compiler, the architecture, etc.
In general, though, every user space program gets its own virtual address space, and paging translates virtual addresses to physical addresses using page tables.
So if we dereference a null pointer in user space, that address is invalid: the CPU traps into the kernel, and the kernel's fault handler reports a page fault error or a "Segmentation fault" for the process.
In kernel space:
If we dereference a NULL pointer, the system may crash, or the kernel may not be able to return from that call.
Is my understanding correct? If any information is missing, please explain.
Ref: I based this on "What happens in OS when we dereference a NULL pointer in C?"

On Linux, the page at virtual address 0 is normally not accessible in any process (it is left unmapped, and vm.mmap_min_addr stops unprivileged code from mapping it). When you try to access that page, you get a page fault. The kernel routine that handles it sends a SIGSEGV signal to your process. If you have no handler for SIGSEGV registered, the process is terminated (dumping core if core dumps are enabled) and you see your "Segmentation fault" message.
Kernel side, things are a bit different. After all, the kernel is supposed to be robust:
If the dereference happens and recovery is possible (e.g. your trackpad driver did the offence), a kernel oops is generated. The kernel continues running (for now).
If the dereference occurs so that no recovery is possible, the Oops leads to a kernel panic. Reboot necessary.
If for some reason, there is data mapped at page zero, you will corrupt memory. Which could lead to a panic down the way, go unnoticed or even be abused for a privilege escalation attack.
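The user-space half of this can be observed directly. Below is a minimal sketch, assuming a Linux system with the default setup where page 0 is not accessible; the helper name signal_from_null_deref is made up for illustration. It forks a child that writes through a NULL pointer and lets the parent report which signal ended it.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes through a NULL pointer and
 * return the number of the signal that killed it (or -1 on error). */
int signal_from_null_deref(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The volatile qualifiers keep the compiler from folding away the
         * pointer value or the store. */
        volatile int *volatile p = NULL;
        *p = 1;      /* page fault: page 0 is not accessible */
        _exit(0);    /* not reached if the signal is delivered */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* With no SIGSEGV handler installed, the child dies from the signal. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Comparing the return value against SIGSEGV in the parent confirms that the fault was turned into a signal for the offending process only; the parent keeps running.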

Related

Linux - Accessing mmap()ed memory from Thread from Userspace in Kernel Space

Mapped this memory in my Thread in Userspace:
b7fd0000-b7fd1000 rwxp 00000000 00:00 0
Thread is running (endless loop)
Made a breakpoint in the Kernel and trying to access it:
Thread 466 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 3908]
0xc10d4060 in kgdb_breakpoint ()
(gdb) x/01i 0xb7fd0000
0xb7fd0000: Cannot access memory at address 0xb7fd0000
But it is not accessible.
How can I access 0xb7fd0000 from kernel space? What address will it be under?
Is it even possible?
Thanks,

The address the memory will appear under depends on which user space context is currently mapped.
The way this works is, some of the virtual addresses are reserved to the kernel, and these are the same in all contexts. This is why you can set a break point on a kernel address without worrying about which user space process is currently mapped.
For user space, this is not the case. Each time a new process is mapped, the virtual addresses for user space change completely.
This is, likely, an X-Y problem. You're trying to do something, and you think that a kernel level break point is how you want to achieve it.
Taking a guess, you want your kernel driver to do something to communicate with your user space thread. If that's the case, your best bet is to export a character device, and have the userspace open it and mmap from there (rather than just an anonymous mmap). You can then control which memory it receives, and thus also map it to the kernel address space, where pointers are stable.

I'm not an expert in kgdb, but it is worth checking that current->pid, at the time the breakpoint is hit, contains the pid of the process you're tracking. Although a location inside kernel space has been hit, the user space process that is currently mapped may not be the one you are interested in. Also, to guard against the possibility of pages being swapped out, it might be safer to lock the mapped pages using mlock.

There are at least two things to pay attention to.
When to break. If you simply hit Ctrl-C in the kernel debugger, you don't know what the userspace context is going to be. It could be any userspace process. You want the kernel debugger to pause and give you control when the userspace context is that of the process you are interested in. One way to do this is as follows: if you have the ability to recompile the kernel, add a new system call. Invoke this system call from the userspace process after the region is mmap'd in. Start debugging the kernel after placing a breakpoint on the newly added system call. When the breakpoint is hit, you know for a fact that the userspace context is that of the process you are interested in.
Virtual memory late binding. Even if you follow the steps in the first point, you will still have trouble accessing the contents of the buffer in userspace if you have not read or written anything at that location, because no physical page backs it yet. Make sure that after mmap'ing the region, you either read or write to the mmap'd location before invoking your newly added system call.
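The late-binding point can be seen from user space with mincore(), which reports whether a page is resident. This is only a sketch: the function names are hypothetical, and the "not resident before the first touch" result is what a typical Linux kernel reports for a fresh anonymous mapping, not something every environment guarantees.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page at addr is resident in RAM, 0 if not, -1 on error. */
static int page_resident(void *addr, size_t page_size)
{
    unsigned char vec = 0;
    if (mincore(addr, page_size, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Hypothetical demo: map one anonymous page and record its residency
 * before and after the first write. Returns 0 on success, -1 on error. */
int late_binding_demo(int *before, int *after)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page_size = (size_t)ps;

    char *buf = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    *before = page_resident(buf, page_size); /* mapping exists, no page yet */
    buf[0] = 1;                              /* first touch triggers the fault */
    *after = page_resident(buf, page_size);  /* now backed by a physical page */

    munmap(buf, page_size);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Touching the buffer (and optionally calling mlock() on it) before entering the kernel is what makes the contents visible to a debugger stopped in that process's context.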

Kernel Panic due to bad paging request

What can be the reasons for the kernel to panic due to
Unable to handle kernel paging request at virtual address 0x00000024 epc=0x9caf9876 ra=0x9432adfc
Address not yet dynamically allocated
No corresponding virtual address entry in page table
What else?
Correct me if I am wrong.

virtual address 0x00000024
Surely that's a NULL pointer dereference? Accessing p->field, where p == NULL and offsetof(typeof(*p), field) == 0x24.
EDIT: ah, note this doesn't explain a full panic. Most frequently, a NULL pointer dereference would take down one task, log an "Oops" and a backtrace, and let you try to shut down. With a panic, all you can do is hit the hard reboot button.
If you had a NULL-pointer dereference inside the MM, maybe that would be a reason for a full panic. I think the surrounding messages would let you determine whether that was the case.
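To make the arithmetic concrete, here is a small sketch with an invented structure (struct device_state and its members are hypothetical, chosen only so that the member lands at offset 0x24). It computes the address that p->field would touch without actually dereferencing p.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure laid out so that `field` sits 0x24 bytes in. */
struct device_state {
    char header[0x24];
    int  field;
};

/* Address that an access to p->field would touch, computed as
 * base address + member offset, without dereferencing p. */
uintptr_t field_address(const struct device_state *p)
{
    return (uintptr_t)p + offsetof(struct device_state, field);
}
```

With p == NULL, the result is the small address 0x24, which is why a faulting address just above zero in an oops message usually points to a member access through a NULL structure pointer.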

For arm Linux, could threads in user space access virtual address of Kernel space?

Virtual memory is split into two parts. Traditionally, 0~3GB is for user space and 3GB~4GB is for kernel space.
My question:
Could a thread in user space access the memory of kernel space?
According to the ARM datasheet, access permissions are controlled by the domain access control register. But in the kernel source code, the domain value in the page table entries for user space virtual memory is the same as in the kernel space page table entries.

In fact, your application might access page 0xFFFF0000, as it contains the swi-handler and a couple of other userspace-helpers. So no, the 3/1 split is nothing magical, it's just very easy for the kernel to manage.
Usually the kernel will setup all memory above 3GB to be only accessible by the kernel-domain itself. If a driver needs to share memory between user and kernel-space it will usually provide an mmap interface, which then creates an aliased mapping, so you have two virtual addresses for the same physical address. This only works reliably on VIPT-Cache systems or with a LOT of careful explicit cache flushing. If you don't want this you CAN hack the kernel to make a chunk of memory ABOVE the 3G-split accessible to userspace. But then all userspace applications will share this memory. I've done this once for a special application on a armv5-system.

Userspace code getting Kernel memory? The only kernel that ever allowed that was DOS and its archaic friends.
But back to the question, look at this example C code:
char c=42;
*c=42;
We take one byte (a char) and assign it the numeric value 42. We then dereference this non-pointer, which will probably try to access the byte at virtual address 42, which is almost definitely not your memory and, for the sake of this example, stands in for kernel memory. Guess what happens when you run this (if you manage to hold the compiler at gunpoint):
Segmentation fault
Linux has memory protection like any modern operating system. If you try to access memory that is not mapped into your process, your process is terminated before the access can complete (debuggers are a special case that goes through the kernel). Even if that memory belonged to another userland process, you would still get terminated. As far as I know, even root programs can't directly dereference another program's memory or kernel memory. The only way to access kernel memory is to be part of the kernel, or to go indirectly through the kernel's cooperation.

How does copy_from_user from the Linux kernel work internally?

How exactly does the copy_from_user() function work internally? Does it use any buffers or is there any memory mapping done, considering the fact that kernel does have the privilege to access the user memory space?

The implementation of copy_from_user() is highly dependent on the architecture.
On x86 and x86-64, it simply does a direct read from the userspace address and a write to the kernelspace address, while temporarily disabling SMAP (Supervisor Mode Access Prevention) if it is configured. The tricky part is that the copy_from_user() code is placed into a special region so that the page fault handler can recognise when a fault occurs within it. A memory protection fault that occurs in copy_from_user() doesn't kill the process like it would if it were triggered by any other process-context code, or panic the kernel like it would if it occurred in interrupt context. It simply resumes execution in a code path which returns -EFAULT to the caller.

regarding "how bout copy_to_user since the kernel is passing on the kernel space address,how can a user space process access it"
A user space process can attempt to access any address. However, if the address is not mapped in that process's user space (i.e. in the page tables of that process), or if there is a problem with the access, like a write attempt to a read-only location, then a page fault is generated. Note that, at least on x86 with the classic 32-bit 3G/1G split, every process has the kernel space mapped in the upper 1 gigabyte of its virtual address space, while the lower 3 gigabytes of the 4GB total address space are used for the process text (i.e. code) and data.
A copy to or from user space is executed by the kernel code that is executing on behalf of the process and actually it's the memory mapping (i.e. page tables) of that process that are in-use during the copy. This takes place while execution is in kernel mode - i.e. privileged/supervisor mode in x86 language.
Assuming the user-space code has passed a legitimate target location (i.e. an address properly mapped in that process's address space) to have data copied to, copy_to_user, run from kernel context, can write to that address or region without problems. After control returns to the user, user space can read from this location, which the process itself set up to start with.
More interesting details can be found in chapters 9 and 10 of Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet and Marco Cesati. In particular, access_ok() is a necessary but not sufficient validity check. The user can still pass addresses that do not belong to the process address space. In this case, a page fault exception will occur while the kernel code is executing the copy. The most interesting part is how the kernel page fault handler determines that the page fault in such a case is not due to a bug in the kernel code but rather to a bad address from the user (especially if the kernel code in question comes from a loaded kernel module).
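The effect of that fault handling is visible from user space: a system call that is handed a bad buffer fails with EFAULT instead of crashing anything. The sketch below assumes Linux and a pipe as the target; the helper name and the address 16 are arbitrary choices for illustration (the address lies inside the user range, so a range check alone would accept it, but nothing is mapped there).

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pass an unmapped user address to write() and return
 * the resulting errno (0 if the call unexpectedly succeeded, -1 on setup
 * failure). */
int errno_from_bad_user_pointer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Read through a volatile so the compiler cannot reason about the
     * bogus address. */
    void *volatile bad_addr = (void *)16;

    errno = 0;
    ssize_t n = write(fds[1], bad_addr, 8); /* kernel faults while copying */
    int saved = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

The calling process survives the call and simply sees the error code, which matches the description of the fault being resolved inside the kernel's copy routine.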

The best answer gets one thing wrong: copy_(from|to)_user can't be used in interrupt context because they may sleep. They can only be used in process context.
The process's page tables include all the information the kernel needs to access user memory, so the kernel could access a user space address directly if we could be sure that the addressed page is in memory. We use the copy_(from|to)_user functions because they check this for us, and if the addressed user space page is not resident, the resulting fault is handled and the page is brought in for us.

copy_from_user() works with two buffers in different address spaces:
The user-space buffer in user virtual address space.
The kernel-space buffer in kernel virtual address space.
When copy_from_user() is called (it is a kernel function, not a system call), data is copied from the user buffer to the kernel buffer.
Part of a character device driver (the write operation) that uses copy_from_user() is given below:
ssize_t cdev_fops_write(struct file *flip, const char __user *ubuf,
                        size_t count, loff_t *f_pos)
{
        unsigned int kbuf;      /* kernel-side destination buffer */

        if (count < sizeof(kbuf))
                return -EINVAL;
        /* copy_from_user() returns the number of bytes NOT copied */
        if (copy_from_user(&kbuf, ubuf, sizeof(kbuf)))
                return -EFAULT;
        printk(KERN_INFO "Data: %u\n", kbuf);
        return sizeof(kbuf);    /* bytes consumed from the user buffer */
}

Debugging SIGBUS on x86 Linux

What can cause SIGBUS (bus error) on a generic x86 userland application in Linux? All of the discussion I've been able to find online is regarding memory alignment errors, which from what I understand doesn't really apply to x86.
(My code is running on a Geode, in case there are any relevant processor-specific quirks there.)

SIGBUS can happen in Linux for quite a few reasons other than memory alignment faults - for example, if you attempt to access an mmap region beyond the end of the mapped file.
Are you using anything like mmap, shared memory regions, or similar?
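The mmap case is easy to reproduce. The following is a minimal sketch, assuming Linux and a writable /tmp; the helper name and temporary file template are made up. It maps one page of an empty file and lets a child process read from it, so the parent can report the signal.

```c
#define _DEFAULT_SOURCE 1
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a page that lies beyond the end of an empty
 * file, touch it in a child, and return the signal that killed the child
 * (or -1 on error). */
int signal_from_mmap_past_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the empty file lives on through fd */

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, (size_t)ps, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* The mapping is valid, but no file data backs this page. */
        volatile char c = *(volatile char *)map;
        (void)c;
        _exit(0);
    }
    int status = 0;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFSIGNALED(status))
        result = WTERMSIG(status);

    munmap(map, (size_t)ps);
    close(fd);
    return result;
}
```

The mmap() call itself succeeds; the error only appears on first access, which is why this kind of SIGBUS tends to show up far from the code that created the mapping.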

You can get a SIGBUS from an unaligned access if you turn on the unaligned access trap, but normally that's off on an x86. You can also get it from accessing a memory mapped device if there's an error of some kind.
Your best bet is using a debugger to identify the faulting instruction (SIGBUS is synchronous), and trying to see what it was trying to do.

SIGBUS on x86 (including x86_64) Linux is a rare beast. It may result from an attempt to access memory past the end of an mmapped file, or from some other situations described by POSIX.
But from hardware faults it's not easy to get SIGBUS. Namely, unaligned access from any instruction — be it SIMD or not — usually results in SIGSEGV. Stack overflows result in SIGSEGV. Even accesses to addresses not in canonical form result in SIGSEGV. All this due to #GP being raised, which almost always maps to SIGSEGV.
Now, here're some ways to get SIGBUS due to a CPU exception:
Enable AC bit in EFLAGS, then do unaligned access by any memory read or write instruction. See this discussion for details.
Do canonical violation via a stack pointer register (rsp or rbp), generating #SS. Here's an example for GCC (compile with gcc test.c -o test -masm=intel):
int main()
{
    __asm__("mov rbp,0x400000000000000\n"
            "mov rax,[rbp]\n"
            "ud2\n");
}

Oh yes there's one more weird way to get SIGBUS.
If the kernel fails to page in a code page due to memory pressure (OOM killer must be disabled) or failed IO request, SIGBUS.

This was briefly mentioned above as a "failed IO request", but I'll expand upon it a bit.
A frequent case is when you lazily grow a file using ftruncate, map it into memory, start writing data and then run out of space in your filesystem. Physical space for mapped file is allocated on page faults, if there's none left then process receives a SIGBUS.
If you need your application to correctly recover from this error, it makes sense to explicitly reserve space prior to mmap using fallocate. Handling ENOSPC in errno after fallocate call is much simpler than dealing with signals, especially in a multi-threaded application.
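A sketch of the reserve-then-map approach follows. It swaps in posix_fallocate(), the portable counterpart of fallocate, which returns the error number directly instead of setting errno; the helper name, file mode, and error-reporting convention are assumptions made for this example.

```c
#define _DEFAULT_SOURCE 1
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or open) path, reserve len bytes of storage
 * for it, then map it read/write. Returns the mapping, or NULL with *err
 * set to an errno value such as ENOSPC. */
void *create_reserved_mapping(const char *path, size_t len, int *err)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        *err = errno;
        return NULL;
    }

    /* Reserve the blocks up front: running out of space is reported here
     * as an ordinary error code, not later as SIGBUS on a page fault. */
    int rc = posix_fallocate(fd, 0, (off_t)len);
    if (rc != 0) {
        close(fd);
        *err = rc;
        return NULL;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);                    /* the mapping stays valid after close */
    if (map == MAP_FAILED) {
        *err = saved;
        return NULL;
    }
    *err = 0;
    return map;
}
```

A caller checks for NULL and inspects *err, for example treating ENOSPC as "disk full, try a smaller file", and only writes through the mapping once the reservation has succeeded.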

You may see SIGBUS when you're running the binary off NFS (network file system) and the file is changed. See https://rachelbythebay.com/w/2018/03/15/core/.

If you request a mapping backed by hugepages with mmap and the MAP_HUGETLB flag, you can get SIGBUS if the kernel runs out of allocated huge pages and thus cannot handle a page fault.
In this case, you'll need to raise the number of allocated huge pages via
/sys/kernel/mm/hugepages/hugepages-<size>/nr_hugepages or
/sys/devices/system/node/nodeX/hugepages/hugepages-<size>/nr_hugepages on NUMA systems.

A common cause of a bus error on x86 Linux is attempting to dereference something that is not really a pointer, or is a wild pointer. For example, failing to initialize a pointer, or assigning an arbitrary integer to a pointer and then attempting to dereference it will normally produce either a segmentation fault or a bus error.
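Alignment does apply to x86. Even though memory on an x86 is byte-addressable (so you can have a char pointer to any address), a pointer to, for example, a 4-byte integer should be aligned: in C, dereferencing a misaligned pointer is undefined behaviour, even though x86 hardware tolerates most unaligned accesses by default.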
You should run your program in gdb and determine which pointer access is generating the bus error to diagnose the issue.

It's a bit off the beaten path, but you can get SIGBUS from an unaligned SSE2 (m128) load.
