Kernel Panic due to bad paging request - linux

What can be the reasons for the kernel to panic due to
Unable to handle kernel paging request at virtual address 0x00000024 epc=0x9caf9876 ra=0x9432adfc
Address not yet dynamically allocated
No corresponding virtual address entry in page table
What else?
Correct me if am wrong.

virtual address 0x00000024
Surely that's a NULL pointer dereference? Accessing p->field, where p == NULL and offsetof(typeof(p), field) == 0x24.
EDIT: ah, note this doesn't explain a full panic. Most frequently, a NULL pointer dereference would take down one task, log "OOPS" and a bracktrace, and let you try to shut down. With a panic, all you could do is hit the hard reboot button.
If you had a NULL-pointer dereference inside the MM, maybe that would be a reason for a full panic. I think the surrounding messages would let you determine whether that was the case.

Related

write access for kernel into a readonly segment

Until now I thought the kernel has the permissions to write in readonly segments. But this code has brought a lot of questions
int main() {
char *x = "Hello World";
int status = pipe((int*)x);
perror("Error");
}
The output of the code is
Error : Bad Address
What my argument is, "Since the pipe function executes in kernel mode the ro segment must be writable by kernel". Which doesn't seem to be the case here. Now my questions are
How kernel protects the memory segments which are readonly?
Or am I assuming wrong about the kernel's capabilities?
Much like the user space, the kernel's address space is subject to whether a particular virtual address (also called a logical address) is mapped as readable, writable and executable. Unlike the user space though, the kernel has the free rein to map a group of virtual addresses with a page and change the page permission attributes. However, just because the kernel has the ability to map a page as writeable, does not mean the address stored in char*x was paged in the kernel's address space as writable, or even paged at all, at the time of the pipe call.
The way the kernel protects regions of memory is with a piece of hardware called a memory management unit (MMU). The MMU is what performs the mapping of virtual to physical addresses and enforces permissions in those regions. The kernel is more or less given free rein to configure the MMU. Unlike kernel space, user space code should be unable to access the MMU. Since the user space can not access the MMU, it can not change the page table's mappings or the permission attributes of a page. This effectively means that user space has to use the address space mapping and the permissions set by the kernel.
I don't understand where the "kernel can write to ro pages" assertion comes from. If the kernel wants to it can remap memory however it sees fit of course, but why would it do that for this case?
I presume you are running on x86. On this arch the kernel splits the address space into 2 parts (user/kernel). When you switch to the kernel, userspace is still mapped So in particular when the kernel wants tries to write to the provided address, it hits the same mapping your userspace process would. Since the mapping does not allow write access, the operation fails.
For the sake of argument let's say this would not hold true. That is, whatever read-only mapping is in userspace, the kernel will write to it anyway and that will work. Well, that would be an instant security problem - consider a file you can only read/exec, like the glibc. it is mapped read-only/exec. And now you make the kernel write to area, effectively changing the file for everyone. So why not in particular do read(evilfd, address_of_libc, sizeo_of_libc); and bam, you just managed to overwrite the entire lib with data of your choice.

Null pointer dereference in User space and kernel space

what will happen if we dereference the null pointer in user space and kernel space?
From my understanding the behaviour is based on compiler,architecture,etc.
but in general for every user space program allocated with virtual memory and the paging is used to translate the virtual address to physical address using page tables.
so if we are dereferencing null pointer in user space,that address is invalid so the context switch will happen and in kernel based on the interrupt for this null pointer dereference 'Segmentation fault will come or page fault error will come'.
In Kernel space:
If we dereference the NULL pointer there is a possibility of crashing the system or kernel may not able to return from that call.
Is my understanding correct?or any other informations missing means please explain.
Ref:I have understood from this "What happens in OS when we dereference a NULL pointer in C?"
The kernel maps the page at virtual address 0 into all processes with no permission bits set. When you try to access that page, you get a page fault. The kernel routine that handles it issues a SIGSEGV signal to your process. If you have no handler for SIGSEGV registered, core is dumped and you see your "Segmentation fault" message.
Kernel side, things are a bit different. After all, the kernel is supposed to be robust:
If the dereference happens and recovery is possible (e.g. your trackpad driver did the offence), a kernel oops is generated. The kernel continues running (for now).
If the dereference occurs so that no recovery is possible, the Oops leads to a kernel panic. Reboot necessary.
If for some reason, there is data mapped at page zero, you will corrupt memory. Which could lead to a panic down the way, go unnoticed or even be abused for a privilege escalation attack.

x86_64 Linux 3.0: Invalid Memory Addresses

Processes on Linux 3.0 on x86_64 architecture have a 64-bit virtual address space.
It is clear that 0 is guaranteed to be an invalid memory address [see definition below] in this address space, as this is used to indicate a NULL pointer.
What other 64-bit numbers (if any) are guaranteed never to be valid memory addresses, and why?
For example, can 1 ever be a valid address? What about 2^64-1?
Definition: What do you mean "guaranteed to be an invalid memory address" ?
void deref_and_assign(uint64_t i)
{
char* p = (char*) i;
*p = 42;
}
For the purposes of this question a guaranteed invalid memory reference means that the function deref_and_assign will always raise a SIGSEGV.
On x86/64 if page translation enabled and the memory at virtual address 0 isn't accessible (because of the way physical memory is mapped into the virtual address space), 1 ... 4095 won't be accessible either because all these 4096 addresses correspond to a single page of memory and it can only be available or unavailable as a whole. It is a good idea to never map memory at virtual address 0. Not mapping it will help to catch many NULL pointer dereferences. The CPU here will generate a page fault (aka #PF) on unmapped locations or locations requiring higher privilege than the currently executing code.
In 64-bit mode the CPU may implement fewer (48+) than 64 virtual address bits and 64-bit addresses must contain either all zeroes or all ones in the bits that aren't implemented (the value, 0 or 1, must be the same as the value of the most significant implemented address bit, all of which can be interpreted as address sign-extension). Such addresses are called canonical. If you try to read or write memory using a non-canonical address, you'll get a general protection fault (AKA #GP).
So, depending on the OS (effectively, on its memory layout) and actual CPU you may come up with ranges of "invalid" memory addresses. If you try to read/write the kernel's memory from a user mode application, you'll get #PF. If you try to read/write unmapped memory (e.g. at address 0 through 4095), you'll get #PF. If you try to read/write at a non-canonical address, you'll get a #GP.
Is that the kind of thing you're looking for?
You checked that a Linux process cannot mmap with MAP_FIXED a segment starting at (void*)0.
So for practical purposes you can safely assume that the very first page 0 - 0xfff is never mmap-ed (the 4Kb size of pages is processor and system dependent, but it is very often 4kb). Then you can assume that dereferencing pointers (from inside a Linux application) inside this first page gives SIGSEGV
Likewise for the last page ending at 0xffffffffffffffff (ie 2^64-1)

How does copy_from_user from the Linux kernel work internally?

How exactly does the copy_from_user() function work internally? Does it use any buffers or is there any memory mapping done, considering the fact that kernel does have the privilege to access the user memory space?
The implementation of copy_from_user() is highly dependent on the architecture.
On x86 and x86-64, it simply does a direct read from the userspace address and write to the kernelspace address, while temporarily disabling SMAP (Supervisor Mode Access Prevention) if it is configured. The tricky part of it is that the copy_from_user() code is placed into a special region so that the page fault handler can recognise when a fault occurs within it. A memory protection fault that occurs in copy_from_user() doesn't kill the process like it would if it is triggered by any other process-context code, or panic the kernel like it would if it occured in interrupt context - it simply resumes execution in a code path which returns -EFAULT to the caller.
regarding "how bout copy_to_user since the kernel is passing on the kernel space address,how can a user space process access it"
A user space process can attempt to access any address. However, if the address is not mapped in that process user space (i.e. in the page tables of that process) or if there is a problem with the access like a write attempt to a read-only location, then a page fault is generated. Note that at least on the x86, every process has all the kernel space mapped in the lowest 1 gigabyte of that process's virtual address space, while the 3 upper gigabytes of the 4GB total address space (I'm using here the 32-bit classic case) are used for the process text (i.e. code) and data.
A copy to or from user space is executed by the kernel code that is executing on behalf of the process and actually it's the memory mapping (i.e. page tables) of that process that are in-use during the copy. This takes place while execution is in kernel mode - i.e. privileged/supervisor mode in x86 language.
Assuming the user-space code has passed a legitimate target location (i.e. an address properly mapped in that process address space) to have data copied to, copy_to_user, run from kernel context would be able to normally write to that address/region w/out problems and after the control returns to the user, user space also can read from this location setup by the process itself to start with.
More interesting details can be found in chapters 9 and 10 of Understanding the Linux Kernel, 3rd Edition, By Daniel P. Bovet, Marco Cesati. In particular, access_ok() is a necessary but not sufficient validity check. The user can still pass addresses not belong to the process address space. In this case, a Page Fault exception will occur while the kernel code is executing the copy. The most interesting part is how the kernel page fault handler determines that the page fault in such case is not due to a bug in the kernel code but rather a bad address from the user (especially if the kernel code in question is from a kernel module loaded).
The best answer has something wrong, copy_(from|to)_user can't be used in interrupt context, they may sleep, copy_(from|to)_user function can only be used in process context,
the process's page table include all the information that kernel need to access it, so kernel can direct access the user space address if we can make sure the page addressed is in memory, use copy_(from|to)_user function, because they can check it for us and if the user space addressed page is not resident, it will fix it for us directly.
The implementation of copy_from_user() system call is done using two buffers from different address spaces:
The user-space buffer in user virtual address space.
The kernel-space buffer in kernel virtual address space.
When the copy_from_user() system call is invoked, data is copied from user buffer to kernel buffer.
A part (write operation) of character device driver code where copy_from_user() is used is given below:
ssize_t cdev_fops_write(struct file *flip, const char __user *ubuf,
size_t count, loff_t *f_pos)
{
unsigned int *kbuf;
copy_from_user(kbuf, ubuf, count);
printk(KERN_INFO "Data: %d",*kbuf);
}

Linux Kernel: copy_from_user - struct with pointers

I've implemented some kind of character device and I need help with copy_ from_user function.
I've a structure:
struct my_struct{
int a;
int *b;
};
I initialize it in user space and pass pointer to my_struct to my char device using 'write' function. In Kernel's Space character device 'write' function I cast it from a *char to this kind of structure. I alloc some memory for a struct using kmalloc and do copy_from_user into it.
It's fine for simple 'int a', but it copies only pointer (address) of b value, not value pointed by b, so I'm now in Kernel Space and I'm working with a pointer that points to a user space memory. Is this incorrect and I shouldn't access to user space pointer directly and I have to copy_from_user every single pointer in my struct and then copy back every pointer in "read" function using copy_to_user function?
You must always use copy_from_user and similar to access user space memory from kernel space, regardless of how you got the pointer. Since b is a pointer to user space memory, you must use copy_from_user to access it.
These functions do two important additional tasks:
They make sure the pointer points into user space and not kernel space. Without this check, user space programs might be able to read or write to kernel memory, bypassing normal security.
They handle page faults correctly. Normally a page fault in kernel mode will result in an OOPS or panic - the copy_*_user family of functions have a special override that tells the PF handler that all is well, and the fault should be handled normally; and in the event that the fault cannot be satisfied by IO (ie, what would normally cause a SIGSEGV or SIGBUS), return an error code instead so their caller can do any necessary cleanup before returning to userspace with -EFAULT.
You are correct in your surmising. If you need to access the value *b, you will need to use copy_from_user (and copy_to_user to update it back in the user process).

Resources