I'm working on an old Linux operating system which has one kernel for all processes (it is basically an exokernel type).
While implementing debugging features from user space, I would like to disassemble other processes' instructions. Therefore, I have created a system call which takes a virtual address in the target process and prints the value stored there (so I can disassemble the bytes).
My idea was to switch to the target's pgdir, do a page walk and then access the data through the resulting physical address as a pointer. I get a kernel panic while trying to access the latter.
If I switch to the target process and then access the virtual address directly (without the page walk), the bytes of the instruction are printed without any problem (with printf("%04x", *va) for example).
My question is: why does the virtual address contain the actual instruction but the physical address doesn't (and why does it panic)?
Thank you!
Note: This is an XY-answer ... I'm aware I'm not answering your question ('how to twiddle with hardware MMU setup to read ... memory somewhere') but I'm suggesting a solution to your stated problem (how to read from another process' address space).
Linux provides a facility to do what you ask for - read memory from another process' address space - via ptrace(). From the ptrace(2) man page:
PTRACE_PEEKTEXT, PTRACE_PEEKDATA
Read a word at the address addr in the tracee's memory, returning
the word as the result of the ptrace() call. Linux does not have
separate text and data address spaces, so these two requests are
currently equivalent. (data is ignored; but see NOTES.)
See https://stackoverflow.com/search?q=ptrace+PTRACE_PEEKDATA for some references.
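As a rough illustration of the PTRACE_PEEKDATA request quoted above, here is a minimal sketch in which a parent reads one word from a child it traces. The names peek_child_word and secret_word are made up for this example, and it assumes the environment allows a child to call PTRACE_TRACEME. Because the child is created with fork(), the variable sits at the same virtual address in both processes, which keeps the sketch short.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical marker value; after fork() the child has a copy of it
 * at the same virtual address. */
static volatile long secret_word = 0x1234abcdL;

/* Reads one word from a traced child's memory.
 * Returns 0 and stores the word in *out on success, -1 on failure. */
int peek_child_word(long *out)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced by the parent, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, WUNTRACED) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;              /* child exited early and was reaped */

    /* PEEKDATA returns the word itself, so -1 is ambiguous:
     * clear errno first and check it afterwards. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, child, (void *)&secret_word, NULL);
    int failed = (word == -1 && errno != 0);

    kill(child, SIGKILL);
    waitpid(child, NULL, 0);

    if (failed)
        return -1;
    *out = word;
    return 0;
}
```

For an unrelated process you would use PTRACE_ATTACH (or PTRACE_SEIZE) with its pid instead of PTRACE_TRACEME, and take the address from your disassembly target rather than from a shared variable.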
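Until now I thought the kernel has permission to write to read-only segments. But this code has raised a lot of questions:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char *x = "Hello World";      /* string literal: lives in a read-only mapping */
    int status = pipe((int *)x);  /* the kernel has to write two fds to this address */
    perror("Error");
    return status == 0 ? 0 : 1;
}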
The output of the code is
Error: Bad address
My argument is: "Since the pipe function executes in kernel mode, the ro segment must be writable by the kernel". That doesn't seem to be the case here. Now my questions are:
How does the kernel protect memory segments that are read-only?
Or am I assuming wrong about the kernel's capabilities?
Much like user space, the kernel's address space is subject to whether a particular virtual address (also called a logical address) is mapped as readable, writable and executable. Unlike user space, though, the kernel has free rein to map a group of virtual addresses to a page and to change the page's permission attributes. However, just because the kernel has the ability to map a page as writable does not mean that the address stored in char *x was mapped as writable in the kernel's address space, or even mapped at all, at the time of the pipe call.
The way the kernel protects regions of memory is with a piece of hardware called a memory management unit (MMU). The MMU performs the mapping of virtual to physical addresses and enforces permissions on those regions. The kernel is more or less given free rein to configure the MMU. User-space code, by contrast, cannot configure the MMU, so it cannot change the page table mappings or the permission attributes of a page. This effectively means that user space has to use the address space mapping and the permissions set by the kernel.
I don't understand where the "kernel can write to ro pages" assertion comes from. If the kernel wants to it can remap memory however it sees fit of course, but why would it do that for this case?
I presume you are running on x86. On this architecture the kernel splits the address space into two parts (user/kernel). When you switch to the kernel, user space is still mapped. So, in particular, when the kernel tries to write to the provided address, it hits the same mapping your user-space process would. Since the mapping does not allow write access, the operation fails.
For the sake of argument, let's say this did not hold true. That is, whatever read-only mapping is in user space, the kernel will write to it anyway and that will work. Well, that would be an instant security problem: consider a file you can only read/exec, like glibc. It is mapped read-only/exec. Now you make the kernel write to that area, effectively changing the file for everyone. So why not do read(evilfd, address_of_libc, size_of_libc); and bam, you just managed to overwrite the entire lib with data of your choice.
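The behaviour discussed in this thread can be checked from user space. The sketch below asks the kernel to copy a few bytes out of a pipe into a caller-supplied address and reports the errno value it gets back. The function name kernel_write_into is made up for this example. With a normal writable buffer the call should succeed; with the address of a string literal, read() is expected to fail with EFAULT, just like pipe() did in the question.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Asks the kernel to write up to `len` bytes from a pipe into `dst`.
 * Returns 0 on success, otherwise an errno value. */
int kernel_write_into(void *dst, size_t len)
{
    static const char payload[] = "XXXXXXXX";
    int fds[2];
    int err = 0;

    if (len > sizeof payload)
        len = sizeof payload;
    if (pipe(fds) < 0)
        return errno;

    if (write(fds[1], payload, len) != (ssize_t)len)
        err = EIO;                      /* could not queue the payload */
    else if (read(fds[0], dst, len) < 0)
        err = errno;                    /* EFAULT if dst is not writable */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

The kernel never touches the literal: its copy to user space goes through the same page permissions the process itself is subject to, and the failed access is turned into an error return.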
Suppose a 32-bit user program is running on a 64-bit Linux kernel and wants to pass a pointer to data in user space to kernel code, and the same structure is defined in both user space and kernel space.
Will the kernel-space code be able to decode the data correctly? If yes, how is it done?
Yes. The 32bit addresses that you use (or any addresses that you use, it's the same in 64bits) are virtual addresses. In other words, any kind of address that you use and pass to anyone (including the kernel) is a "fantasy" thing, it does not correspond to real addresses in any obvious way. You don't know anything but virtual addresses.
In order to make this work, the kernel (usually with help from the MMU) routinely translates virtual addresses to physical addresses. For that, every process has a table with all pages that are valid for this process (managed by the kernel).
The kernel maps and remaps virtual addresses to existing or non-existing locations at pretty much every page fault (so basically, all the time).
The kernel can consequently of course do any translations that may be necessary for any pointer you pass it, whenever that is the case.
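One detail is worth separating from address translation: a structure that contains pointers has a different layout in a 32-bit program than in the 64-bit kernel, so the kernel cannot simply reinterpret the bytes with its own definition. The kernel handles this in its compat layer (compat_uptr_t and compat_ptr() are the kernel's names for the 32-bit pointer type and its widening helper). What follows is only a user-space sketch of the idea; struct request_compat, struct request_native and widen_request() are hypothetical names invented for illustration.

```c
#include <stdint.h>

/* Layout as a 32-bit program would see it: the pointer is 4 bytes. */
struct request_compat {
    uint32_t buf;   /* 32-bit user pointer, stored as an integer */
    uint32_t len;
};

/* Layout as native code sees it: the pointer has native width. */
struct request_native {
    void    *buf;
    uint32_t len;
};

/* Converts field by field instead of casting the whole structure,
 * zero-extending the 32-bit pointer to native width. */
struct request_native widen_request(struct request_compat c)
{
    struct request_native n;
    n.buf = (void *)(uintptr_t)c.buf;
    n.len = c.len;
    return n;
}
```

Structures made only of fixed-width integer fields with identical alignment need no such conversion, which is why many kernel interfaces avoid pointers and long in shared structures.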
In Linux, because the bases of the segments are all 0, the logical address coincides with the linear address (book: "Understanding the Linux Kernel"). I think the logical addresses of different processes may be the same, so the linear addresses of different processes may be the same and, as each process sees 4GB, each process will have its own linear address space (a local address space). But some other articles say there is a large linear address space shared by all processes, and the segment mechanism is used to map different processes into different parts of the linear address space. That sounds like a global linear address space with wider address bits. Where am I wrong? Or are they used on different architectures?
Each Linux process has its own address space; it is virtual memory. Different processes have different address spaces (but all the threads inside a process share the same address space).
You can get the memory map of process 1234 on Linux by reading /proc/1234/maps, or from inside the process by reading /proc/self/maps.
Try the following commands
cat /proc/$$/maps
cat /proc/self/maps
and think about their output; the first command shows the memory map of your shell; the second one shows the memory map of the process running cat
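The same information can be read programmatically. As a small sketch (the function name mapping_perms is made up here), the code below scans /proc/self/maps for the mapping that contains a given address and returns its permission string, for example "rw-p" for the stack or "r--p" for read-only data. It assumes each line starts with the usual "start-end perms" fields.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Finds the mapping of the calling process that contains `addr` and
 * copies its 4-character permission string (plus NUL) into `perms`.
 * Returns 0 if a mapping was found, -1 otherwise. */
int mapping_perms(const void *addr, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char p[5];

        /* Each line begins with: start-end perms ... */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) != 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            memcpy(perms, p, sizeof p);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Passing the address of a local variable and the address of a string literal shows two different mappings with different write permissions inside one address space.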
The address space is set with execve(2) at program startup and changed with the mmap(2) and related syscalls.
An application interacts with the kernel only through syscalls. The kernel has a "different" address space, which you should not care about (unless you are coding inside the kernel).
Read also a good book like Advanced Unix Programming and/or Advanced Linux Programming
See also this explanation on syscalls.
Notice that segmented addressing is specific to i386 and is obsolete: most systems don't use it anymore. It has almost completely disappeared in the 64-bit mode of x86-64 (only the FS and GS bases remain in use). All Linux systems use a flat memory model.
Please read carefully all the references.
Intel CPUs support three kinds of addresses:
logical address --(segment unit)---> linear address ---(paging unit)---> physical address
As you know, all kernel and user code accesses data or text through virtual addresses (logical addresses, in CPU terms). The segmentation unit translates a logical address into a linear address by adding the segment base to it.
Linux does not use segmentation for address translation; segments are only used for permission control, and the kernel configures each segment's base to zero. That is why you never see a separate linear address in the kernel: the virtual address is numerically the linear address, and the kernel feeds it directly to the paging unit.
After getting the linear address, the MMU's paging unit reads the CR3 register to get the base of the page table hierarchy and walks it to generate the physical address.
As with the CPU cache, the paging unit also has a per-core cache, the TLB, to speed up address translations that would otherwise require memory accesses.
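To make the two translation stages concrete, here is a small sketch of the arithmetic involved. It assumes classic 32-bit x86 paging with 4 KiB pages and two levels (no PAE), which is only one of the paging modes the manual describes. The names logical_to_linear, split_linear and struct linear_parts are invented for this example.

```c
#include <stdint.h>

/* The three fields the paging unit extracts from a 32-bit linear
 * address (assumption: 4 KiB pages, two-level paging, no PAE). */
struct linear_parts {
    uint32_t dir_index;    /* bits 31..22: index into the page directory */
    uint32_t table_index;  /* bits 21..12: index into the page table */
    uint32_t offset;       /* bits 11..0: offset inside the page */
};

/* Segmentation stage: linear = segment base + logical offset.
 * With the flat segments Linux uses (base 0) this is the identity. */
uint32_t logical_to_linear(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}

/* Paging stage, first step: split the linear address into indexes. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3ffu;
    p.offset      = linear & 0xfffu;
    return p;
}
```

For the linear address 0xC0101234 this gives directory index 0x300, table index 0x101 and page offset 0x234. The paging unit uses the first index in the directory that CR3 points to, the second in the page table found there, and adds the offset to the physical frame address stored in that entry.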
Reference:
Intel 64 and IA-32 Architectures Software Developer's Manual
I am changing the Linux kernel scheduler to write the pid of the next process to a known physical memory location. mmap is used for user-space programs, and I read that ioremap marks the page as non-cacheable, which would slow down the execution of the program. I would like a fast way to write to a known physical memory location. phys_to_virt is the option that I think is feasible. Any ideas for a different technique?
PS: I am running this Linux kernel on top of qemu. The physical address will be used by qemu to read information sent by the guest kernel. Writing to a known I/O port is not feasible, since the device code backing this I/O device would be called every time there is an access to the device.
EDIT: I want the physical address location of the pid to be safe. How can I make sure that a physical address that the kernel is using is not being assigned to any process? As far as I know, ioremap would mark the page as non-cacheable and would hence not be of great use.
The simplest way to do this would be to call kmalloc() to get some memory in the kernel. Then you can get the physical address of the pointer it returns by passing it to virt_to_phys(). This is a total hack, but for your case of debugging / tracing under qemu, it should work fine.
EDIT: I misunderstood the question. If you want to use a specific physical address, there are a couple of things you could do. Maybe the cleanest thing to do would be to modify the e820 map that qemu passes in to mark the RAM page as reserved, and then the kernel won't use it. (ie the same way that ACPI tables are passed in).
If you don't want to modify qemu, you could also modify the early kernel startup (around arch/x86/kernel/setup.c probably) to do reserve_bootmem() on the specific physical page you want to protect from being used.
To actually use the specified physical address, you can just use ioremap_cache() the same way the ACPI drivers access their tables.
It seems I misunderstood the cache coherency between VM and host part, here is an updated answer.
What you want is "virtual address in VM" <-> "virtual or physical address in QEMU address space".
Then you can either kmalloc it, but it may vary from instance to instance,
or simply declare a global variable in the kernel.
Then virt_to_phys would give you access to the physical address in VM space, and I suppose you can translate this into the QEMU address space. What do you mean by "a physical address that the kernel is using is not assigned to any process"? Are you afraid the page containing your variable might be swapped out? kmalloc'ed memory is not swappable.
Original (and wrong) answer
If the address where you want to write is in its own page, I can't see how an ioremap
of this page would slow down code executing in a different page.
You need a cache flush anyway, and without SSE, I can't see how you can bypass the cache if the MMU and cache are on. I can see only these two options:
ioremap and declare a particular page non cacheable
use a "normal" address, and manually do a cache flush each time you write.
How exactly does the copy_from_user() function work internally? Does it use any buffers or is there any memory mapping done, considering the fact that kernel does have the privilege to access the user memory space?
The implementation of copy_from_user() is highly dependent on the architecture.
On x86 and x86-64, it simply does a direct read from the userspace address and write to the kernelspace address, while temporarily disabling SMAP (Supervisor Mode Access Prevention) if it is configured. The tricky part of it is that the copy_from_user() code is placed into a special region so that the page fault handler can recognise when a fault occurs within it. A memory protection fault that occurs in copy_from_user() doesn't kill the process like it would if it were triggered by any other process-context code, or panic the kernel like it would if it occurred in interrupt context - it simply resumes execution in a code path which returns -EFAULT to the caller.
regarding "how bout copy_to_user since the kernel is passing on the kernel space address,how can a user space process access it"
A user space process can attempt to access any address. However, if the address is not mapped in that process's user space (i.e. in the page tables of that process) or if there is a problem with the access, like a write attempt to a read-only location, then a page fault is generated. Note that, at least on x86, every process has all of kernel space mapped in the upper 1 gigabyte of its virtual address space, while the lower 3 gigabytes of the 4GB total address space (I'm using the classic 32-bit case here) are used for the process text (i.e. code) and data.
A copy to or from user space is executed by the kernel code that is executing on behalf of the process and actually it's the memory mapping (i.e. page tables) of that process that are in-use during the copy. This takes place while execution is in kernel mode - i.e. privileged/supervisor mode in x86 language.
Assuming the user-space code has passed a legitimate target location (i.e. an address properly mapped in that process's address space) to have data copied to, copy_to_user, run from kernel context, would normally be able to write to that address/region without problems. After control returns to the user, user space can also read from this location, which was set up by the process itself to start with.
More interesting details can be found in chapters 9 and 10 of Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet and Marco Cesati. In particular, access_ok() is a necessary but not sufficient validity check. The user can still pass addresses that do not belong to the process address space. In this case, a page fault exception will occur while the kernel code is executing the copy. The most interesting part is how the kernel page fault handler determines that the page fault in such a case is not due to a bug in the kernel code but rather to a bad address from the user (especially if the kernel code in question is from a loaded kernel module).
The best answer has something wrong: copy_(from|to)_user can't be used in interrupt context because they may sleep. The copy_(from|to)_user functions can only be used in process context.
The process's page table includes all the information that the kernel needs to access the memory, so the kernel could access the user-space address directly if we could make sure the addressed page is in memory. We use the copy_(from|to)_user functions because they handle this for us: if the addressed user-space page is not resident, the resulting page fault is handled and the page is brought in.
copy_from_user() is a kernel function, not a system call. It works with two buffers in different address spaces:
The user-space buffer in user virtual address space.
The kernel-space buffer in kernel virtual address space.
When copy_from_user() is called, data is copied from the user buffer to the kernel buffer.
Part of a character device driver (the write operation) that uses copy_from_user() is given below:
ssize_t cdev_fops_write(struct file *flip, const char __user *ubuf,
                        size_t count, loff_t *f_pos)
{
    unsigned int kbuf;  /* kernel-side buffer on the stack */

    if (count < sizeof(kbuf))
        return -EINVAL;
    /* copy_from_user() returns the number of bytes it could NOT copy */
    if (copy_from_user(&kbuf, ubuf, sizeof(kbuf)))
        return -EFAULT;
    printk(KERN_INFO "Data: %u\n", kbuf);
    return sizeof(kbuf);
}