I am working on a small embedded system. When my Linux boots into user space, I know where my devices are in physical memory, and I want to map them into user-space virtual addresses. Currently I do this through a kernel module: I use vmalloc/kmalloc (depending on the size) and then call ioremap_page_range on the returned virtual addresses to map my physical addresses. I don't think that is the correct way to go about it. First of all, I am allocating memory and then asking the kernel to remap that virtual address range to a different physical address range. (The initial physical->virtual mapping made by vmalloc/kmalloc is useless, as I don't care about those physical pages. This is definitely not good.)
Instead of this, is there a better way to map known physical memory into a user-space process? (I know that no one other than my user-space process is going to touch that memory.)
Thanks
What you are trying to do is access what is called I/O memory. I can only encourage you to read the Linux Device Drivers (LDD) book, more specifically chapter 9.
To "allocate" such an area, you need to call
    struct resource *request_mem_region(unsigned long start, unsigned long len, char *name)
Before your driver can access it, you have to assign it a virtual address. This is done with a call to
    void *ioremap(unsigned long phys_addr, unsigned long size)
To ensure that your driver will then work on different architectures/platforms, be sure to use the accessor functions for such areas (ioread8/16/32, iowrite8/16/32 and all of their variants).
In a kernel module, remap_pfn_range() can be used to map a range of physical addresses into a process's virtual address space. The following link will be helpful:
How remap_pfn_range remaps kernel memory to user space?
In a kernel module, remap_pfn_range() can be used to map physical addresses into a process's virtual address space. When you don't have an actual device you can:
1) create a virtual device and,
2) use mmap on that virtual device to access the very same kernel memory through the remap_pfn_range mapping in that process's virtual address space.
3) Usually, in dedicated environments, you may additionally want to pin those physical pages lest they be taken away from your process.
4) You can also share these physical addresses between different processes, but you will need to handle synchronization independently through other IPC mechanisms, since the region will appear at a different address in each process.
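For the user-space side of the mmap approach, here is a minimal sketch. It assumes the driver (or a node such as /dev/mem) implements mmap and interprets the file offset as the physical address; map_device is a hypothetical helper name, not a library function. The only subtle point it shows is that mmap needs a page-aligned offset, so the address is rounded down and the returned pointer is adjusted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of any library.
 * Maps `len` bytes that start at byte offset `phys` of the file `path`
 * (for example a device node whose driver implements mmap).
 * mmap() needs a page-aligned offset, so `phys` is rounded down and the
 * returned pointer is advanced by the remainder. On success, *base and
 * *maplen hold the values to pass to munmap() later. Returns NULL on error. */
volatile uint8_t *map_device(const char *path, off_t phys, size_t len,
                             void **base, size_t *maplen)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = phys & ~(page - 1);
    size_t delta = (size_t)(phys - aligned);

    void *p = mmap(NULL, len + delta, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, aligned);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *base = p;
    *maplen = len + delta;
    return (volatile uint8_t *)p + delta;
}
```

The pointer is volatile because device registers can change behind the compiler's back; for plain shared memory that qualifier could be dropped.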
I have started to learn about Virtual Address Space (VAS) and I have few questions:
How much of VAS is created for each process depending on the architecture (32-bit and 64-bit)?
Is VAS for each process created on hard disk? If so, what happens if there is not enough space?
What is the difference between VAS and Virtual Memory (VM)?
Virtual address versus physical address
During the execution of your program, the variables (integers, arrays, strings, etc.) are stored somewhere in the main memory of your computer (RAM). Some programming languages (like C or C++) allow you to obtain the memory address at which a given variable is stored (with the & operator), and to manipulate that address (add to it, subtract from it, print it, etc.).
Here is a C program that prints the memory address of a variable:
    #include <stdio.h>

    int main(void) {
        int variable = 1234;
        void *address = &variable;
        printf("Memory address of variable: %p\n", address);
        return 0;
    }
Output:
Memory address of variable: 0x7ffc9e9662a4
Now, if you compile and execute this program on a typical desktop computer, with a typical operating system (like GNU/Linux or Windows), the memory address that is printed by this program is not the hardware address at which the data 1234 is actually located in the memory chip. This may be surprising, but there is a level of indirection between the addresses used by your program and the hardware addresses.
Virtual address space on 64-bit computers
On a 64-bit computer, a memory address manipulated by your program is an integer between 0 and 18446744073709551615 inclusive. Such an address is called a virtual memory address. The range of those addresses is called the virtual address space of the process. You can ask the operating system to map a range of virtual memory addresses to the physical memory of your computer, so that when you try to read or write bytes at those addresses, your program doesn't crash for accessing unmapped virtual memory addresses.
Typically, on x86-64 computers, only 2^48 virtual memory addresses can be successfully mapped to physical memory, because 256 TiB of usable virtual address space is considered sufficient. In the future, processor manufacturers may raise or remove this limit if there is a need for it.
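On x86-64 those 2^48 usable addresses are the "canonical" ones: bits 63 to 47 must all be equal, which splits the usable range into a low half and a high half. A small sketch of that check follows; TOY_VA_BITS and is_canonical are names made up for illustration, and 48 implemented bits is an assumption that does not hold on every CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual-address bits, as on classic x86-64. */
#define TOY_VA_BITS 48

static_assert(TOY_VA_BITS >= 2 && TOY_VA_BITS <= 64, "shift amounts below must stay valid");

/* An address is canonical when bits 63..47 are all copies of bit 47,
 * i.e. the 17 topmost bits are either all zeros or all ones. */
bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (TOY_VA_BITS - 1);                        /* bits 63..47 */
    uint64_t all_ones = (UINT64_C(1) << (64 - TOY_VA_BITS + 1)) - 1; /* 0x1ffff */
    return top == 0 || top == all_ones;
}
```

With these assumptions, 0x00007fffffffffff is the last address of the low half and 0xffff800000000000 is the first address of the high half; everything in between is rejected.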
Virtual address space on 32-bit computers
On 32-bit computers, there are 2^32 virtual memory addresses. On those computers, a memory address manipulated by your program is an integer between 0 and 4294967295 inclusive.
On x86 32-bit computers, there is usually no restriction on the range of virtual memory addresses that can be mapped to physical memory addresses.
Mapping a range of virtual memory addresses
On GNU/Linux, you can request a mapping by calling the function mmap(). On Windows, you can request a mapping by calling the function VirtualAlloc(). Those functions take the size of the mapping as argument, and return the first virtual address that is now backed by actual physical memory. Those functions can fail to create a new mapping if the physical memory is already completely used by other processes. And again, if you try to access (read or write) the content of a virtual memory address that is outside an area mapped by mmap() or VirtualAlloc(), the operating system will terminate your program (by sending a segmentation fault signal).
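As an illustration of the mmap() route, here is a minimal sketch for GNU/Linux that requests an anonymous (not file-backed) private mapping and releases it again. The helper names map_anonymous and unmap_anonymous are made up for this example.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for `len` bytes of fresh, zero-filled, private memory.
 * Returns NULL if the mapping cannot be created. */
unsigned char *map_anonymous(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give the range back; touching it afterwards would fault. */
int unmap_anonymous(unsigned char *p, size_t len)
{
    return munmap(p, len);
}
```

Note that mmap() reports failure with MAP_FAILED rather than NULL, which is why the helper converts the result.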
On GNU/Linux, a process can examine the mappings created in its virtual address space just by reading the file /proc/self/maps. You can learn a lot by reading the output of the command cat /proc/self/maps.
Hard disk drive
On a typical computer, the main memory is a semiconductor memory, and the hard disk drive is only a secondary storage device.
On a typical operating system, a range of virtual memory addresses can only be mapped to the main memory (which is usually a semiconductor memory device). Such a range cannot be directly mapped to a secondary storage device (usually a hard disk drive) without using the main memory as intermediary.
On an n-bit machine, the VAS is 2^n bytes large. So, on a 32-bit machine, the VAS is 2^32 bytes = 4 GiB large.
Virtual memory is not created on disk. In fact, the existence of a disk is not needed for implementing virtual memory. Most implementations of virtual memory are paged. So, when a 4 GiB VAS is created, only the pages that are needed are mapped into that VAS. For example, suppose a process only uses 16 pages of memory on a 32-bit system with 4k-sized pages. Despite having a 4 GiB VAS, only 16 * 4k = 2^16 bytes of memory are mapped into the VAS. The rest of the memory is unmapped. If the CPU tries to access this unmapped memory, a segmentation fault will occur. If a process wants to map memory at this address, then (in a POSIX-compliant OS) it can request the mapping from the OS using mmap(2). This will make a lot more sense once you learn about page tables.
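The page arithmetic in that example can be written down directly. This is only a sketch for the 32-bit, 4 KiB-page case described above; the TOY_ names and helper functions are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u                             /* 4 KiB pages, as in the example */
#define TOY_PAGE_SIZE  (UINT32_C(1) << TOY_PAGE_SHIFT) /* 4096 bytes */

/* 16 mapped pages of 4 KiB each are 2^16 bytes. */
static_assert(16 * TOY_PAGE_SIZE == (UINT32_C(1) << 16), "16 * 4k == 2^16");

/* Which page a virtual address falls into. */
uint32_t page_number(uint32_t vaddr) { return vaddr >> TOY_PAGE_SHIFT; }

/* Byte position of the address inside its page. */
uint32_t page_offset(uint32_t vaddr) { return vaddr & (TOY_PAGE_SIZE - 1); }

/* Number of pages in a full 32-bit VAS: 2^32 / 2^12 = 2^20. */
uint32_t pages_in_vas(void) { return (uint32_t)(UINT64_C(1) << (32 - TOY_PAGE_SHIFT)); }
```

So the process in the example uses 16 of the 1048576 pages its address space could hold.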
Virtual memory is a concept. A virtual address space is an entity that stems from the concept of virtual memory. These terms go hand in hand, but refer to different things.
I will list a couple of caveats.
Caveat 1.1
I am not aware of any 64-bit processor that truly supports a 64-bit VAS. The addresses themselves are 64 bits wide, but a certain number of upper bits are ignored. AMD's first implementation of x86_64 only supported 48-bit addresses. The upper 16 bits of an address were effectively ignored. In such a system, the addresses are 64 bits wide, but the real size of the VAS is limited to 2^48 bytes. Subsequent architectures supported 57-bit addresses.
Caveat 1.2
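If a processor supports PAE, then the physical address space on an n-bit machine may be larger than 2^n bytes. This is how 32-bit processors can address more than 4 GiB of physical memory, although the VAS of each individual process is still limited to 4 GiB.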
Caveat 2.1
Not really a caveat, but this is related to your question. You asked what happens when there isn't enough space on disk to create a VAS. As I mentioned in the main answer, the VAS is not created on disk. However, any computer only has a finite amount of physical memory. What happens when a process requests a page be mapped, but there is no physical memory available? There are several ways to handle this:
Swapping is done by temporarily moving a page that is mapped in virtual memory to disk. The entire contents of the page are copied to disk. Then, the process that requested the page has the physical page mapped into their memory. Eventually, the old page may be requested. If this occurs, then the OS copies the page from disk and remaps it into the corresponding VAS. This is what Linux and most modern operating systems do.
The process is simply told there is no memory available, for example, through an error number like ENOMEM.
The process is blocked until memory is available. I haven't seen this in practice.
Swapping implies the use of a disk, but virtual memory does not imply the use of swapping, hence a disk is not necessary for virtual memory.
Virtual Address Space - wikipedia
When a new application on a 32-bit OS is executed, the process has a 4 GiB VAS: each one of the memory addresses (from 0 to 2^32 − 1) in that space can have a single byte as a value. Initially, none of them have values.
For an n-bit OS, these n address lines allow an address space of up to 2^n addresses, i.e., 0 to 2^n - 1. This would mean 16 EiB for a 64-bit OS. (Though in actual implementations, less space is used as this much space is unnecessary.)
CPU Cache - wikipedia
Most general purpose CPUs implement some form of virtual memory. To summarize, either each program running on the machine sees its own simplified address space, which contains code and data for that program only, or all programs run in a common virtual address space. A program executes by calculating, comparing, reading and writing to addresses of its virtual address space, rather than addresses of physical address space, making programs simpler and thus easier to write.
For example, in C++, program memory is divided in stack, heap, data, code. I'm not sure if analogy is correct (may be), but it somewhat presents an insight if you're aware.
Virtual memory - wikipedia
In computing, virtual memory is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large (main) memory".
The computer's operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory.
Address translation hardware in the CPU, often referred to as a memory management unit (MMU), automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.
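To make the translation step concrete, here is a toy model of what an MMU does with a single-level page table. It is a deliberate simplification (real hardware uses multi-level tables and a TLB), and every name in it is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u
#define TOY_PAGE_MASK  ((UINT32_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_PTE_VALID  UINT32_C(1) /* bit 0 of an entry: mapping present */

/* Toy single-level page table: entry i describes virtual page i.
 * Bits 31..12 hold the physical frame address, bit 0 the valid flag.
 * Returns false where a real MMU would raise a page fault. */
bool toy_translate(const uint32_t *table, size_t entries,
                   uint32_t vaddr, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);
    uint32_t vpn = vaddr >> TOY_PAGE_SHIFT; /* virtual page number */
    if (vpn >= entries || !(table[vpn] & TOY_PTE_VALID))
        return false;
    /* keep the frame bits from the entry, the offset bits from the address */
    *paddr = (table[vpn] & ~TOY_PAGE_MASK) | (vaddr & TOY_PAGE_MASK);
    return true;
}
```

If two entries of the table hold the same frame address, two different virtual addresses translate to the same physical byte, which is how shared mappings work.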
If you know about computer architecture (which I'm sure you do from the question), it'd be clarified by now.
Still, for anyone in general, I'm giving a bit of explanation.
Think of addresses as pointers in C++. If you don't know C++, the closest analogy would be array/list indices in any language. Addresses point to memory locations, just like pointers point to variables. The actual data is stored in the variable. To get the variable's data using a pointer/index, you provide the address from which the data is to be read. In physical memory there is no such thing as a variable. There is memory, and the address of each location through which it is accessed.
The real memory is physical memory, that is, the RAM. It is accessed with physical addresses, which are unique for each byte.
Accessing physical memory directly with physical addresses would be cumbersome. Thus the addresses are simplified by the OS to virtual addresses. These addresses may or may not be unique (these aren't physical addresses, remember). Thus, multiple virtual addresses may point to the same location.
Virtual memory does not actually exist; it is just a concept: physical memory simplified by means of virtual addresses, to give the user the illusion of a space in which the next memory location is stored at the next address (virtual address, to be precise).
Since multiple virtual addresses can be mapped by the MMU to the same physical address, and thus point to the same physical memory location, the virtual memory size can be made to exceed the physical memory size (virtually). But effectively, the memory size would still be the same as the physical one.
Thus, to access a memory data, Virtual Addresses are specified by user/program to OS, which are converted to Physical Addresses by memory management unit (mmu) and then applied to the address lines of the computer architecture (electronics spotted!!), which yields the data at the corresponding physical location. And this concept is called Virtual Memory.
-Himanshu
what is kernel mapping? What are permanent mapping and temporary mapping. What is a window in this context? I went through code and explanation of this but could not understand this
I'm assuming you're talking about memory mapping in linux kernel.
Memory mapping is the process of mapping kernel address space directly into a user process's address space.
Types of addresses :
User virtual address : These are the regular addresses seen by user-space programs
Physical addresses : The addresses used between the processor and the system’s memory.
Bus addresses : The addresses used between peripheral buses and memory. Often, they are the same as the physical addresses used by the processor, but that is not necessarily the case.
Kernel logical addresses : These make up the normal address space of the kernel.
Kernel virtual addresses : Kernel virtual addresses are similar to logical addresses in that they are a mapping from a kernel-space address to a physical address.
High and Low Memory :
Low memory : Memory for which logical addresses exist in kernel space. On almost every system you will likely encounter, all memory is low memory.
High memory : Memory for which logical addresses do not exist, because it is beyond the address range set aside for kernel virtual addresses. This means the kernel needs to start using temporary mappings of the pieces of physical memory that it wants to access.
The kernel splits the virtual address space into two parts: user address space and kernel address space. The kernel's code and data structures must fit into the kernel part, but the biggest consumer of kernel address space is virtual mappings for physical memory, because the kernel needs its own virtual address for any memory it must touch directly. So the maximum amount of physical memory that the kernel could handle was the amount that could be mapped into the kernel's portion of the virtual address space, minus the space used by kernel code.
Temporary mapping : When a mapping must be created but the current context cannot sleep, the kernel provides temporary mappings (also called atomic mappings). The kernel can atomically map a high memory page into one of the reserved mappings (which can hold temporary mappings). Consequently, a temporary mapping can be used in places that cannot sleep, such as interrupt handlers, because obtaining the mapping never blocks.
Ref :
kernel.org/doc/Documentation/vm/highmem.txt
static.lwn.net/images/pdf/LDD3/ch15.pdf
man mmap
notes.shichao.io/lkd/ch12/
A full answer would be very long; for details refer (for example) to Linux Kernel Addressing or Understanding the Linux Kernel (pages 306-). These concepts are related to the way address spaces are organized in Linux: firstly, how kernel space is mapped into user space (mapping the kernel onto user space simplifies switching between user and kernel mode) and, secondly, the way physical memory is mapped onto kernel space (because the kernel has to manage physical memory).
Beware that this is of no concern in modern 64bit architectures.
On the surface, this appears to be a silly question. Some patience please.. :-)
Am structuring this qs into 2 parts:
Part 1:
I fully understand that platform RAM is mapped into the kernel segment; esp on 64-bit systems this will work well. So each kernel virtual address is indeed just an offset from physical memory (DRAM).
Also, it's my understanding that as Linux is a modern virtual memory OS, (pretty much) all addresses are treated as virtual addresses and must "go" via hardware - the TLB/MMU - at runtime and then get translated by the TLB/MMU via kernel paging tables. Again, easy to understand for user-mode processes.
HOWEVER, what about kernel virtual addresses? For efficiency, would it not be simpler to direct-map these (and an identity mapping is indeed setup from PAGE_OFFSET onwards). But still, at runtime, the kernel virtual address must go via the TLB/MMU and get translated right??? Is this actually the case? Or is kernel virtual addr translation just an offset calculation?? (But how can that be, as we must go via hardware TLB/MMU?). As a simple example, lets consider:
char *kptr = kmalloc(1024, GFP_KERNEL);
Now kptr is a kernel virtual address.
I understand that virt_to_phys() can perform the offset calculation and return the physical DRAM address.
But, here's the Actual Question: it can't be done in this manner via software - that would be pathetically slow! So, back to my earlier point: it would have to be translated via hardware (TLB/MMU).
Is this actually the case??
Part 2:
Okay, lets say this is the case, and we do use paging in the kernel to do this, we must of course setup kernel paging tables; I understand it's rooted at swapper_pg_dir.
(I also understand that vmalloc() unlike kmalloc() is a special case- it's a pure virtual region that gets backed by physical frames only on page fault).
If (in Part 1) we do conclude that kernel virtual address translation is done via kernel paging tables, then how exactly does the kernel paging table (swapper_pg_dir) get "attached" or "mapped" to a user-mode process?? This should happen in the context-switch code? How? Where?
Eg.
On an x86_64, 2 processes A and B are alive, 1 cpu.
A is running, so its higher-canonical addresses
0xFFFF8000 00000000 through 0xFFFFFFFF FFFFFFFF "map" to the kernel segment, and its lower-canonical addresses
0x0 through 0x00007FFF FFFFFFFF map to its private userspace.
Now, if we context-switch A->B, process B's lower-canonical region is unique, but
it must "map" to the same kernel of course!
How exactly does this happen? How do we "auto" refer to the kernel paging table when
in kernel mode? Or is that a wrong statement?
Thanks for your patience, would really appreciate a well thought out answer!
First a bit of background.
This is an area where there is a lot of potential variation between
architectures, however the original poster has indicated he is mainly
interested in x86 and ARM, which share several characteristics:
no hardware segments or similar partitioning of the virtual address space (when used by Linux)
hardware page table walk
multiple page sizes
physically tagged caches (at least on modern ARMs)
So if we restrict ourselves to those systems it keeps things simpler.
Once the MMU is enabled, it is never normally turned off. So all CPU
addresses are virtual, and will be translated to physical addresses
using the MMU. The MMU will first look up the virtual address in the
TLB, and only if it doesn't find it in the TLB will it refer to the
page table - the TLB is a cache of the page table - and so we can
ignore the TLB for this discussion.
The page table
describes the entire virtual 32 or 64 bit address space, and includes
information like:
whether the virtual address is valid
which mode(s) the processor must be in for it to be valid
special attributes for things like memory mapped hardware registers
and the physical address to use
Linux divides the virtual address space into two: the lower portion is
used for user processes, and there is a different virtual to physical
mapping for each process. The upper portion is used for the kernel,
and the mapping is the same even when switching between different user
processes. This keeps things simple, as an address is unambiguously in
user or kernel space, the page table doesn't need to be changed when
entering or leaving the kernel, and the kernel can simply dereference
pointers into user space for the
current user process. Typically on 32bit processors the split is 3G
user/1G kernel, although this can vary. Pages for the kernel portion
of the address space will be marked as accessible only when the processor
is in kernel mode to prevent them being accessible to user processes.
The portion of the kernel address space which is identity mapped to RAM
(kernel logical addresses) will be mapped using big pages when possible,
which may allow the page table to be smaller but more importantly
reduces the number of TLB misses.
When the kernel starts it creates a single page table for itself
(swapper_pg_dir) which just describes the kernel portion of the
virtual address space and with no mappings for the user portion of the
address space. Then every time a user process is created a new page
table will be generated for that process, the portion which describes
kernel memory will be the same in each of these page tables. This could be
done by copying all of the relevant portion of swapper_pg_dir, but
because page tables are normally tree structures, the kernel is
frequently able to graft the portion of the tree which describes the
kernel address space from swapper_pg_dir into the page tables for each
user process by just copying a few entries in the upper layer of the
page table structure. As well as being more efficient in memory (and possibly
cache) usage, it makes it easier to keep the mappings consistent. This
is one of the reasons why the split between kernel and user virtual
address spaces can only occur at certain addresses.
To see how this is done for a particular architecture look at the
implementation of pgd_alloc(). For example ARM
(arch/arm/mm/pgd.c) uses:
    pgd_t *pgd_alloc(struct mm_struct *mm)
    {
        ...
        init_pgd = pgd_offset_k(0);
        memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
               (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
        ...
    }
or
x86 (arch/x86/mm/pgtable.c) pgd_alloc() calls pgd_ctor():
    static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
    {
        /* If the pgd points to a shared pagetable level (either the
           ptes in non-PAE, or shared PMD in PAE), then just copy the
           references from swapper_pg_dir. */
        ...
        clone_pgd_range(pgd + KERNEL_PGD_BOUNDARY,
                        swapper_pg_dir + KERNEL_PGD_BOUNDARY,
                        KERNEL_PGD_PTRS);
        ...
    }
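The grafting idea behind both snippets can be tried out in user space with a toy model. This is only a sketch: the table is a plain array, the 1024-entry layout with a 3G/1G split is an assumption borrowed from non-PAE 32-bit x86, and all toy_ names are invented here rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PTRS_PER_PGD      1024u /* top-level entries, 4 MiB of address space each */
#define TOY_USER_PTRS_PER_PGD 768u  /* 768 * 4 MiB = 3 GiB of user space */

typedef uint32_t toy_pgd_t;

/* Build the top-level table for a new process: user entries start empty,
 * kernel entries are copied from the kernel's reference table, so both
 * tables refer to the same lower-level kernel page tables. */
void toy_pgd_init(toy_pgd_t *new_pgd, const toy_pgd_t *init_pgd)
{
    assert(new_pgd != NULL && init_pgd != NULL);
    memset(new_pgd, 0, TOY_USER_PTRS_PER_PGD * sizeof(toy_pgd_t));
    memcpy(new_pgd + TOY_USER_PTRS_PER_PGD,
           init_pgd + TOY_USER_PTRS_PER_PGD,
           (TOY_PTRS_PER_PGD - TOY_USER_PTRS_PER_PGD) * sizeof(toy_pgd_t));
}
```

Only 256 entries are copied per process, no matter how much kernel memory sits behind them, which is why the split has to fall on a top-level entry boundary.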
So, back to the original questions:
Part 1: Are kernel virtual addresses really translated by the TLB/MMU?
Yes.
Part 2: How is swapper_pg_dir "attached" to a user-mode process?
All page tables (whether swapper_pg_dir or those for user processes)
have the same mappings for the portion used for kernel virtual
addresses. So as the kernel context switches between user processes,
changing the current page table, the mappings for the kernel portion
of the address space remain the same.
The kernel address space is mapped into a section of each process's address space, for example, with a 3:1 split, above address 0xC0000000. If user code tries to access this address range, it generates a page fault, so the range is guarded by the kernel.
The kernel address space is divided into 2 parts, the logical address space and the virtual address space. The boundary is defined by the constant VMALLOC_START. The CPU uses the MMU all the time, in user space and in kernel space (it can't be switched on/off).
The kernel virtual address space is mapped the same way as user space. The logical address space is contiguous and simple to translate to physical addresses, so it can be mapped on demand using the MMU fault exception. That is, the kernel tries to access an address, the MMU generates a fault, the fault handler maps the page using the macros __pa and __va and sets the CPU pc register back to the instruction that faulted, and then everything is OK. This process is actually platform dependent, and on some hardware architectures it is mapped the same way as user space (because the kernel doesn't use a lot of memory).
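For logical addresses, the __pa/__va conversion is plain offset arithmetic. A user-space sketch of that arithmetic, assuming a 32-bit 3:1 split with the kernel at 0xC0000000; the toy_ functions are illustrative stand-ins, not the kernel macros, and in the kernel this shortcut is only valid for logical addresses, not for vmalloc addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 3:1 split, kernel logical addresses start here. */
#define TOY_PAGE_OFFSET UINT32_C(0xC0000000)

/* True for addresses in the kernel part of the split. */
bool toy_is_kernel_address(uint32_t vaddr) { return vaddr >= TOY_PAGE_OFFSET; }

/* Logical (virtual) address to physical address: subtract the offset. */
uint32_t toy_pa(uint32_t vaddr)
{
    assert(toy_is_kernel_address(vaddr));
    return vaddr - TOY_PAGE_OFFSET;
}

/* Physical address to logical (virtual) address: add the offset. */
uint32_t toy_va(uint32_t paddr)
{
    assert(paddr < UINT32_C(0x40000000)); /* only 1 GiB fits above the split */
    return paddr + TOY_PAGE_OFFSET;
}
```

The CPU itself never performs this subtraction; it always walks the page tables, which simply happen to contain a mapping with this constant offset.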
I am writing a Linux kernel module that needs to map a specific physical address to a specific virtual one, and I just can't find a way to do it.
OK, this is my current solution. To map phys_addr to virt_addr I use this code:
    /* struct page of the physical frame that should back the mapping */
    page = pfn_to_page(phys_addr >> PAGE_SHIFT);
    /* look up the pte for the virtual address; returns with ptl held */
    pte = get_locked_pte(&init_mm, virt_addr, &ptl);
    /* mk_pte expects a pgprot_t, not VM_* flags */
    set_pte_at(&init_mm, virt_addr, pte, mk_pte(page, PAGE_KERNEL_EXEC));
    spin_unlock(ptl);
    flush_tlb_all();
Some explanations: I use the pfn_to_page func to get the page struct corresponding to the page frame of my phys_addr. I get the page table entry (pte) with the get_locked_pte func, which needs the virtual address corresponding to the wanted pte and a spinlock pointer (ptl) that it fills in and returns locked. Then I actually map the page using the set_pte_at func and the mk_pte macro (with a pgprot_t protection value), unlock the spinlock and flush the TLB.
This solution seems to work pretty well, though it does not survive a context switch.
Could you tell us the kernel version and CPU architecture/type you are using? Generally speaking, if the specific virtual address you want to map to does not overlap with the kernel virtual address range (such as 0xC0000000), and if the physical address your device uses does not overlap with the system memory physical address range, you can use low-level functions to set up MMU TLB entries that map the specific physical address to a specific virtual one during kernel boot (if there is no such function, you can set up the MMU TLB entries directly in assembly language). I can provide one example based on kernel version 2.6.10 and a Freescale PowerPC CPU, where the function io_block_mapping does what you want.
I am changing the Linux kernel scheduler to print the pid of the next process to a known physical memory location. mmap is used for user-space programs, while I read that ioremap marks the page as non-cacheable, which would slow down the execution of the program. I would like a fast way to write to a known physical memory location. phys_to_virt is the option that I think is feasible. Any ideas for a different technique?
PS: I am running this Linux kernel on top of qemu. The physical address will be used by qemu to read information sent by the guest kernel. Writing to a known I/O port is not feasible, since the device code backing this I/O device will be called every time there is an access to the device.
EDIT: I want the physical address location of the pid to be safe. How can I make sure that a physical address that the kernel is using is not assigned to any process? As far as my knowledge goes, ioremap would mark the page as non-cacheable and would hence not be of great use.
The simplest way to do this would be to do kmalloc() to get some memory in the kernel. Then you can get the physical address of the pointer that returns by passing it to virt_to_phys(). This is a total hack but for your case of debugging / tracing under qemu, it should work fine.
EDIT: I misunderstood the question. If you want to use a specific physical address, there are a couple of things you could do. Maybe the cleanest thing to do would be to modify the e820 map that qemu passes in to mark the RAM page as reserved, and then the kernel won't use it. (ie the same way that ACPI tables are passed in).
If you don't want to modify qemu, you could also modify the early kernel startup (around arch/x86/kernel/setup.c probably) to do reserve_bootmem() on the specific physical page you want to protect from being used.
To actually use the specified physical address, you can just use ioremap_cache() the same way the ACPI drivers access their tables.
It seems I misunderstood the cache coherency between VM and host part, here is an updated answer.
What you want is "virtual address in VM" <-> "virtual or physical address in QEMU address space".
Then you can either kmalloc it, but the address may vary from instance to instance,
or simply declare a global variable in the kernel.
Then virt_to_phys would give you the physical address in VM space, and I suppose you can translate this into the QEMU address space. What do you mean by "a physical address that the kernel is using is not assigned to any process"? Are you afraid the page containing your variable might be swapped out? kmalloc'ed memory is not swappable.
Original (and wrong) answer
If the address where you want to write is in its own page, I can't see how an ioremap
of this page would slow down code executing in a different page.
You need a cache flush anyway, and without SSE, I can't see how you can bypass the cache if the MMU and cache are on. I can see only these two options:
ioremap and declare a particular page non-cacheable
use a "normal" address, and manually do a cache flush each time you write.