Physical address of kernel code - linux

This post says kernel pages are unswappable.
What I wonder is: is the physical address of kernel code always calculable from the virtual address when I disable KASLR (specifically on x86_64)?
Let's assume CONFIG_PHYSICAL_START is 0x1000000 and that the _text section starts at virtual address 0xffffffff81000000.
I think the physical address of any kernel code is then (vaddr - 0xffffffff81000000 + 0x1000000).
Is this always true? If not, is it true when I use defconfig (apart from disabling KASLR)?
Updated:
I'm modifying QEMU itself for research purposes. I have to read guest kernel code instructions. I only use a vmlinux image (meaning I don't load additional modules).
I have the virtual address and tried to read memory using it.
For some reason I failed to read memory with the virtual address, but I succeeded with a physical address (calculated by hand).
So if I can calculate the physical address as above, this could be a shortcut (even if it isn't a good idea).
I know the monitor and gdbserver exist and work well, but I'm not sure they are options here.
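Since you're already modifying QEMU, here is a minimal sketch of that shortcut, assuming the offsets from the question and KASLR disabled. read_kernel_insn() is a made-up helper; cpu_physical_memory_read() is QEMU's helper for reading guest-physical memory:

#include "qemu/osdep.h"
#include "exec/cpu-common.h"

#define TEXT_VIRT 0xffffffff81000000ULL  /* _text virtual base, per the question */
#define TEXT_PHYS 0x1000000ULL           /* CONFIG_PHYSICAL_START */

static void read_kernel_insn(uint64_t vaddr, uint8_t *buf, size_t len)
{
    /* Valid only without KASLR: kernel text is a fixed offset from its load address. */
    hwaddr paddr = vaddr - TEXT_VIRT + TEXT_PHYS;
    cpu_physical_memory_read(paddr, buf, len);
}

Note this bakes in the no-KASLR assumption; with KASLR the runtime offset would have to be discovered first.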

Related

Logical and physical address with MMU on?

I'm having some doubts about how kernel physical and logical addresses are handled by the MMU. I'll try to explain my question with an example.
Let's assume we are on an ARM architecture.
The system starts with the MMU off, so all addresses that pass through the CPU are physical. Before we enable the MMU, we build a page table which maps every physical address to the virtual address physical address + 0xC0000000. After this we turn on the MMU. All of this is clear. But now the questions start:
Since we are on a pipelined architecture, let's say the next instruction is a load from address 0x8000. To my knowledge we should get a page fault here, since the MMU doesn't find this address anywhere in the page table, so a page fault is raised to handle the situation. But even if we've set up the interrupt vector, inside it there's a branch to another physical address, which the MMU doesn't find either, and we unavoidably fall into an endless loop. What am I missing?
What you're missing is that all virtual addresses in use must be mapped, even if they resolve to the same physical memory regions. Let's elaborate.
Say there is startup code (for primary initialisation and enabling the MMU) and other main code (meant to run only with the MMU enabled), with the following layout:
startup - phys: 0x4000-0x7FFF, virt: 0x4000-0x7FFF
main - phys: 0x8000-0xBFFF, virt: 0xC0008000-0xC000BFFF
Such a layout is the linker's job, and your job is to feed it the right script. The CPU entry point in this example is assumed to be 0x4000; that is pretty much a hardware-specific address.
The CPU starts with the MMU disabled and PC=0x4000, and continues until, say, 0x4800, where the MMU is enabled.
So before enabling the MMU there have to be mappings for 0x4000-0x7FFF (startup) and 0xC0008000-0xC000BFFF (all other code).
Now the MMU is enabled. PC holds 0x4804 (assuming a 32-bit CPU). 0x4804 is a virtual address now, and it is mapped to physical address 0x4804. The CPU proceeds with 0x4804, 0x4808, 0x480C, etc.
At some point you should jump into main. For ARM that would be something like:
    ldr r0, =0xC0008000
    bx  r0
Note that the virtual address of the main entry point is used, so after branching PC=0xC0008000, which resolves to 0x8000 in physical memory.
The 0x4000-0x7FFF mapping can be removed after that.
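To make those two mappings concrete, here is a hedged C sketch of filling in such a boot-time table with ARM short-descriptor 1 MiB section entries. The attribute bits and the function names are illustrative, not taken from any particular kernel:

#include <stdint.h>

#define SECT_ATTRS 0x00000C0Eu  /* section descriptor, AP=full access, cacheable, bufferable (illustrative) */

/* First-level table: 4096 entries of 1 MiB each, 16 KiB aligned as ARM requires. */
static uint32_t l1_table[4096] __attribute__((aligned(16384)));

static void map_section(uint32_t virt, uint32_t phys)
{
    /* One L1 entry covers 1 MiB; index by the top 12 bits of the virtual address. */
    l1_table[virt >> 20] = (phys & 0xFFF00000u) | SECT_ATTRS;
}

void setup_boot_mappings(void)
{
    /* Identity map covering the startup code: 0x4000-0x7FFF lies in section 0. */
    map_section(0x00000000u, 0x00000000u);
    /* Kernel map for main: 0xC0008000 -> 0x8000, both inside section 0 of their ranges. */
    map_section(0xC0000000u, 0x00000000u);
}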

How exactly do kernel virtual addresses get translated to physical RAM?

On the surface, this appears to be a silly question. Some patience please.. :-)
I'm structuring this question into 2 parts:
Part 1:
I fully understand that platform RAM is mapped into the kernel segment; especially on 64-bit systems this works well. So each kernel virtual address is indeed just an offset from physical memory (DRAM).
Also, it's my understanding that since Linux is a modern virtual memory OS, (pretty much) all addresses are treated as virtual addresses and must "go" via hardware - the TLB/MMU - at runtime, getting translated via the kernel paging tables. Again, easy to understand for user-mode processes.
HOWEVER, what about kernel virtual addresses? For efficiency, would it not be simpler to direct-map these (and an identity mapping is indeed set up from PAGE_OFFSET onwards)? But still, at runtime, must the kernel virtual address go via the TLB/MMU and get translated? Is this actually the case? Or is kernel virtual address translation just an offset calculation? (But how can that be, since we must go via the hardware TLB/MMU?) As a simple example, let's consider:
char *kptr = kmalloc(1024, GFP_KERNEL);
Now kptr is a kernel virtual address.
I understand that virt_to_phys() can perform the offset calculation and return the physical DRAM address.
But here's the actual question: it can't be done in this manner via software - that would be pathetically slow! So, back to my earlier point: it would have to be translated via hardware (the TLB/MMU).
Is this actually the case?
Part 2:
Okay, let's say this is the case and we do use paging in the kernel to do this; then we must of course set up kernel paging tables. I understand they are rooted at swapper_pg_dir.
(I also understand that vmalloc(), unlike kmalloc(), is a special case: it's a pure virtual region that gets backed by physical frames only on page fault.)
If (in Part 1) we conclude that kernel virtual address translation is done via kernel paging tables, then how exactly does the kernel paging table (swapper_pg_dir) get "attached" or "mapped" to a user-mode process? Presumably this happens in the context-switch code? How? Where?
E.g.: on x86_64, two processes A and B are alive, 1 CPU.
A is running, so its higher-canonical addresses 0xFFFF800000000000 through 0xFFFFFFFFFFFFFFFF "map" to the kernel segment, and its lower-canonical addresses 0x0 through 0x00007FFFFFFFFFFF map to its private userspace.
Now, if we context-switch A->B, process B's lower-canonical region is unique, but it must "map" to the same kernel of course!
How exactly does this happen? How do we "auto" refer to the kernel paging table when in kernel mode? Or is that a wrong statement?
Thanks for your patience, would really appreciate a well thought out answer!
First a bit of background.
This is an area where there is a lot of potential variation between
architectures; however, the original poster has indicated he is mainly
interested in x86 and ARM, which share several characteristics:
no hardware segments or similar partitioning of the virtual address space (when used by Linux)
hardware page table walk
multiple page sizes
physically tagged caches (at least on modern ARMs)
So if we restrict ourselves to those systems it keeps things simpler.
Once the MMU is enabled, it is never normally turned off. So all CPU
addresses are virtual, and will be translated to physical addresses
using the MMU. The MMU will first look up the virtual address in the
TLB, and only if it doesn't find it in the TLB will it refer to the
page table - the TLB is a cache of the page table - and so we can
ignore the TLB for this discussion.
The page table
describes the entire virtual 32 or 64 bit address space, and includes
information like:
whether the virtual address is valid
which mode(s) the processor must be in for it to be valid
special attributes for things like memory mapped hardware registers
and the physical address to use
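For concreteness, here is a hedged sketch of where that information lives in an x86-64 4 KiB page-table entry. The bit positions are the architectural ones, but the helper names are invented for illustration:

#include <stdbool.h>
#include <stdint.h>

static bool pte_valid(uint64_t pte)          { return pte & (1ULL << 0); }  /* present bit: is the address valid? */
static bool pte_user_ok(uint64_t pte)        { return pte & (1ULL << 2); }  /* U/S bit: accessible from user mode? */
static bool pte_cache_disabled(uint64_t pte) { return pte & (1ULL << 4); }  /* PCD bit: set for e.g. memory-mapped registers */

static uint64_t pte_phys(uint64_t pte)
{
    return pte & 0x000FFFFFFFFFF000ULL;  /* the physical address to use */
}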
Linux divides the virtual address space into two: the lower portion is
used for user processes, and there is a different virtual to physical
mapping for each process. The upper portion is used for the kernel,
and the mapping is the same even when switching between different user
processes. This keeps things simple, as an address is unambiguously in
user or kernel space, the page table doesn't need to be changed when
entering or leaving the kernel, and the kernel can simply dereference
pointers into user space for the
current user process. Typically on 32-bit processors the split is 3G
user/1G kernel, although this can vary. Pages for the kernel portion
of the address space will be marked as accessible only when the processor
is in kernel mode to prevent them being accessible to user processes.
The portion of the kernel address space which is identity mapped to RAM
(kernel logical addresses) will be mapped using big pages when possible,
which may allow the page table to be smaller but more importantly
reduces the number of TLB misses.
When the kernel starts it creates a single page table for itself
(swapper_pg_dir) which just describes the kernel portion of the
virtual address space and with no mappings for the user portion of the
address space. Then every time a user process is created a new page
table will be generated for that process; the portion which describes
kernel memory will be the same in each of these page tables. This could be
done by copying all of the relevant portion of swapper_pg_dir, but
because page tables are normally tree structures, the kernel is
frequently able to graft the portion of the tree which describes the
kernel address space from swapper_pg_dir into the page tables for each
user process by just copying a few entries in the upper layer of the
page table structure. As well as being more efficient in memory (and possibly
cache) usage, it makes it easier to keep the mappings consistent. This
is one of the reasons why the split between kernel and user virtual
address spaces can only occur at certain addresses.
To see how this is done for a particular architecture look at the
implementation of pgd_alloc(). For example ARM
(arch/arm/mm/pgd.c) uses:
pgd_t *pgd_alloc(struct mm_struct *mm)
{
    ...
    init_pgd = pgd_offset_k(0);

    memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
           (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
    ...
}
or
x86 (arch/x86/mm/pgtable.c) pgd_alloc() calls pgd_ctor():
static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
{
    /* If the pgd points to a shared pagetable level (either the
       ptes in non-PAE, or shared PMD in PAE), then just copy the
       references from swapper_pg_dir. */
    ...
    clone_pgd_range(pgd + KERNEL_PGD_BOUNDARY,
                    swapper_pg_dir + KERNEL_PGD_BOUNDARY,
                    KERNEL_PGD_PTRS);
    ...
}
So, back to the original questions:
Part 1: Are kernel virtual addresses really translated by the TLB/MMU?
Yes.
Part 2: How is swapper_pg_dir "attached" to a user-mode process?
All page tables (whether swapper_pg_dir or those for user processes)
have the same mappings for the portion used for kernel virtual
addresses. So as the kernel context switches between user processes,
changing the current page table, the mappings for the kernel portion
of the address space remain the same.
The kernel address space is mapped into a portion of each process's address space; for example, with a 3:1 split it starts at address 0xC0000000. If user code tries to access this region, it generates a page fault; the region is guarded by the kernel.
The kernel address space is divided into 2 parts, the logical address space and the virtual address space, with the boundary defined by the constant VMALLOC_START. The CPU uses the MMU all the time, in user space and in kernel space (it can't be switched on/off per mode).
The kernel virtual address space is mapped the same way as user-space mappings. The logical address space is contiguous and is simple to translate to physical, so this can be done on demand using the MMU fault exception: the kernel tries to access an address, the MMU generates a fault, the fault handler maps the page using the macros __pa and __va and moves the CPU's pc register back to the instruction that faulted, and now everything is OK. This process is actually platform dependent, and on some hardware architectures the kernel is mapped the same way as user space (because the kernel doesn't use a lot of memory).
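As a minimal sketch of the offset arithmetic behind __pa()/__va() on such a 3:1 split (assuming PAGE_OFFSET is 0xC0000000 as in this example; the real macros live in the architecture's asm/page.h, these only mirror the math):

#define PAGE_OFFSET 0xC0000000UL

/* Kernel logical address -> physical address: a fixed offset, no table walk needed. */
#define my_pa(vaddr) ((unsigned long)(vaddr) - PAGE_OFFSET)

/* Physical address -> kernel logical address. */
#define my_va(paddr) ((void *)((unsigned long)(paddr) + PAGE_OFFSET))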

Processors and virtual/physical addresses

In a nutshell, as I understand memory management, the processor produces virtual addresses. These addresses are translated to the corresponding physical addresses by the MMU, using a per-process address table (with TLBs and page faults in between, as and when needed).
My question is: does the processor always produce virtual addresses? In terms of address spaces (user/kernel), processor modes (user/kernel) and contexts (process/system), at which times does the processor produce physical addresses?
Memory typically knows nothing about virtual addresses or segments, which are CPU concepts, it is just memory, a collection of addressable and readable/writable bits. The processor talks to memory using physical addresses. Many simple processors (especially old ones or for special embedded uses) have no MMUs, virtual addresses or privileged modes. Those that have MMUs and virtual addresses normally start either with those disabled or they at first use fixed mapping because otherwise nothing would be able to work if there's no mapping at all.
So, physical addresses are always in use, while for virtual addresses it depends on the CPU and the software in use.
The processor is unaware of whether an address is physical or virtual; it is the job of the MMU to do the translation.
The processor has to place the address on its address bus, and the path then depends on whether the MMU is enabled or disabled: if the MMU is enabled, the address goes through MMU translation and the resulting physical address is placed on the bus; if the MMU is disabled, the address generated by the instruction is placed on the bus as-is.
So when the MMU is disabled it is the programmer's responsibility that all address accesses be physical addresses; otherwise there will be an exception or abort in the system.
Generally the CPU ONLY knows "virtual addresses". I.e., when you do any assembly programming, in any "load regs, *(memory ptr)" or similar operation, the addresses are virtual addresses.
The following diagram illustrates the concept well:
http://slideplayer.com/slide/4394245/ (page 7)
Any address coming out of the CPU is virtual. But if the processor has an MMU, the MMU will intercept the address and convert it to a physical address (through the page table mechanism) before putting it on the memory bus. So if you sniff the memory bus, you will see physical addresses.
References:
https://www.quora.com/What-is-the-return-address-of-kmalloc-Physical-or-Virtual
Yes: when a system runs with virtual memory, instructions/programs produce only virtual addresses. Even at boot time, the addresses generated are not the same as the physical ones, despite page tables not yet existing.
But the processor needs physical addresses to access memory, so there is a mechanism to produce physical addresses from virtual addresses, called address translation, done through page tables. Whatever you see in kernel mode/user mode, register values are all virtual addresses; but when the processor actually performs the access, it always uses physical addresses.
My question is: does the processor always produce virtual addresses? In terms of address spaces (user/kernel), processor modes (user/kernel) and contexts (process/system), at which times does the processor produce physical addresses?
An x86 CPU operates with 3 different "kinds" of addresses:
physical: the actual addresses used to select a byte in memory. Used for things like segment base addresses, the interrupt descriptor table, and the global descriptor table.
logical: the addresses you use most of the time when paging is disabled. They're converted to physical addresses by adding the respective segment's (physical) base address.
virtual: the addresses used by most instructions when paging is enabled. These are translated into physical addresses using page directories and tables.
Which kind of address is used doesn't depend on the current privilege level (CPL; "system mode", "user mode" or in between). It depends on the state of the processor (paging enabled or not) and the actual instruction (lidt, for example). Though I guess it's safe to assume there won't be physical addressing in "user mode" (CPL > 0), since instructions using physical addresses are usually privileged.
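A toy numeric walk-through of those two translation steps (segmentation, then paging); every value here is made up purely for illustration:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Segmentation step: logical (segment:offset) -> linear address. */
    uint32_t seg_base = 0x00010000;         /* hypothetical segment base */
    uint32_t offset   = 0x00002345;         /* offset used by the instruction */
    uint32_t linear   = seg_base + offset;  /* 0x00012345 */

    /* Paging step (only with paging enabled): pretend the page tables map
       this 4 KiB page to physical frame 0x00005000. */
    uint32_t frame    = 0x00005000;
    uint32_t physical = frame | (linear & 0xFFF);  /* 0x00005345 */

    printf("linear %#010x -> physical %#010x\n", linear, physical);
    return 0;
}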

Register value debugging

I have an objdump of the crashing function. I found that the crash is due to a bad memory access; the offending memory address is in the MIPS register a0. Is there a way to track how the register got this address, other than walking back through the objdump step by step (a0 got it from s3, and so on)?
And I have one more question.
How is paging done in the kernel? I thought there was no concept of virtual addresses in the kernel, since everything is already in memory. I ask because my crash report contains something called BadVA (is it "Bad Virtual Address"?) holding a bad address.
Here is the crash report
Cpu 0
Register dump:
Status: 10000302 KERNEL EXL
Cause : 00803c08 TLBL
BadVA : fdca9b68
PrId : 01019378
The proximate cause of the crash is that no TLB entry matches the virtual address in BadVA, and that this happened while the CPU is in exception mode.
The BadVA address (fdca9b68) is in the KSEG2 region of the virtual address space. This region is used for mapped addresses in the MIPS Linux Kernel (typically used for Kernel Modules). I would suspect a bug in a kernel module.
If you want to understand what is going on in a MIPS CPU you should buy and study See MIPS Run.

Write to a cacheable physical address in linux kernel without using ioremap or mmap

I am changing the Linux kernel scheduler to print the pid of the next process to a known physical memory location. mmap is for userspace programs, and I read that ioremap marks the page as non-cacheable, which would slow down the program's execution. I would like a fast way to write to a known physical memory location. phys_to_virt is the option I think is feasible. Any ideas for a different technique?
PS: I am running this Linux kernel on top of QEMU. The physical address will be used by QEMU to read information sent by the guest kernel. Writing to a known I/O port is not feasible, since the device code backing that I/O device would be invoked on every access.
EDIT: I want the physical address location of the pid to be safe. How can I make sure that a physical address the kernel is using is not assigned to any process? As far as my knowledge goes, ioremap would mark the page as non-cacheable and would hence not be of great use.
The simplest way to do this would be to use kmalloc() to get some memory in the kernel. Then you can get the physical address of the pointer it returns by passing it to virt_to_phys(). This is a total hack, but for your case of debugging / tracing under QEMU, it should work fine.
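A hedged sketch of that suggestion (pid_slot and trace_slot_init() are invented names; kmalloc(), virt_to_phys() and the %pa format specifier are real kernel APIs):

#include <linux/init.h>
#include <linux/io.h>      /* virt_to_phys() */
#include <linux/printk.h>
#include <linux/slab.h>

static u32 *pid_slot;

static int __init trace_slot_init(void)
{
    phys_addr_t phys;

    pid_slot = kmalloc(sizeof(*pid_slot), GFP_KERNEL);
    if (!pid_slot)
        return -ENOMEM;

    phys = virt_to_phys(pid_slot);
    /* Report the physical address once so the modified QEMU can find the slot. */
    pr_info("pid slot at phys %pa\n", &phys);
    return 0;
}

The scheduler hook then just stores next->pid into *pid_slot; kmalloced memory is not swappable, so the physical page stays put.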
EDIT: I misunderstood the question. If you want to use a specific physical address, there are a couple of things you could do. Maybe the cleanest thing to do would be to modify the e820 map that qemu passes in to mark the RAM page as reserved, and then the kernel won't use it (i.e. the same way that ACPI tables are passed in).
If you don't want to modify qemu, you could also modify the early kernel startup (around arch/x86/kernel/setup.c probably) to do reserve_bootmem() on the specific physical page you want to protect from being used.
To actually use the specified physical address, you can just use ioremap_cache() the same way the ACPI drivers access their tables.
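A hedged sketch of that combination (TRACE_PHYS is a made-up address standing in for whichever page you reserved; ioremap_cache() is the same kernel API the ACPI drivers use):

#include <linux/io.h>
#include <linux/mm.h>

#define TRACE_PHYS 0x0f000000UL  /* hypothetical physical page, reserved via e820 or reserve_bootmem() */

static void __iomem *trace_page;

static int trace_map(void)
{
    /* Cacheable mapping of the reserved page, unlike plain ioremap(). */
    trace_page = ioremap_cache(TRACE_PHYS, PAGE_SIZE);
    return trace_page ? 0 : -ENOMEM;
}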
It seems I misunderstood the cache coherency between the VM and the host; here is an updated answer.
What you want is a "virtual address in the VM" <-> "virtual or physical address in the QEMU address space" correspondence.
You can either kmalloc it, though the address may vary from instance to instance, or simply declare a global variable in the kernel.
Then virt_to_phys will give you the physical address in VM space, and I suppose you can translate this into a QEMU address-space address. What do you mean by "a physical address that the kernel is using is not assigned to any process"? Are you afraid the page containing your variable might be swapped out? kmalloced memory is not swappable.
Original (and wrong) answer
If the address where you want to write is in its own page, I can't see how an ioremap of this page would slow down code executing in a different page.
You need a cache flush anyway, and without SSE I can't see how you can bypass the cache when the MMU and cache are on. I can see only these two options:
ioremap and declare that particular page non-cacheable
use a "normal" address, and manually do a cache flush each time you write (sketched below)
