Linux Page Table Management and MMU - linux

I have a question about relationship between linux kernel and MMU.
I now got a point that the linux kernel manages page table between virtual memory addresses and physical memory addresses.
At the same time there is MMU in x86 architecture which manages page table between virtual memory addresses and physical memory addresses.
If MMU presents near CPU, does kernel still need to take care of page table?
This question may be stupid, but the other question is, if MMU takes care of memory space, who manages high memory and low memory? I believe kernel will receive size of virtual memory from MMU (4GB in 32bit) then kernel will distinguish between userspace and kernel space in virtual address.
Am I correct? or completely wrong?
Thanks a lot in advance!

The OS and MMU page management responsibilities are 2 sides of the same mechanism, that lives on the boundary between architecture and micro-architecture.
The first side defines the "contract" between the hardware and the software that runs over it (in this case - the OS) - if you want to use virtual memory, you need build and maintain a page table as described in that contract.
The MMU side, on the other hand, is a hardware unit that's responsible for performing the HW tasks of the address translation. This may or may not include hardware optimizations, these are usually hidden and may be implemented in various ways to run under the hood, as long as it maintains the hardware side of the contract.
In theory, the MMU may decide to issue a set of memory accesses for each translation (a page walk), in order to achieve the required behavior. However, since it's a performance critical element, most MMUs optimize this by caching the results of previous page walks inside the TLB, just like a cache stores the results of previous accesses (actually, on some implementations, the caches themselves may also store some of the accesses to the page table since it usually resides in cacheable memory). The MMU can manage multiple TLBs (most implementations separate the ones for data and code pages, and some have 2nd level TLBs), and provide the translation from there without you noticing that except for the faster access time.
It should also be noted that the hardware must guard against many corner cases that can harm the coherency of such TLB "caching" of previous translations, for example page aliasing or remaps during usage. On some machines, the nastier cases even require a massive flush flow called TLB shootdown.

Related

Is memory allocated with "ftruncate" is physically contiguous? [duplicate]

Is there a way to allocate contiguous physical memory from userspace in linux? At least few guaranteed contiguous memory pages. One huge page isn't the answer.
No. There is not. You do need to do this from Kernel space.
If you say "we need to do this from User Space" - without anything going on in kernel-space it makes little sense - because a user space program has no way of controlling or even knowing if the underlying memory is contiguous or not.
The only reason where you would need to do this - is if you were working in-conjunction with a piece of hardware, or some other low-level (i.e. Kernel) service that needed this requirement. So again, you would have to deal with it at that level.
So the answer isn't just "you can't" - but "you should never need to".
I have written such memory managers that do allow me to do this - but it was always because of some underlying issue at the kernel level, which had to be addressed at the kernel level. Generally because some other agent on the bus (PCI card, BIOS or even another computer over RDMA interface) had the physical contiguous memory requirement. Again, all of this had to be addressed in kernel space.
When you talk about "cache lines" - you don't need to worry. You can be assured that each page of your user-space memory is contiguous, and each page is much larger than a cache-line (no matter what architecture you're talking about).
Yes, if all you need is a few pages, this may indeed be possible.
The file /proc/[pid]/pagemap now allows programs to inspect the mapping of their virtual memory to physical memory.
While you cannot explicitly modify the mapping, you can just allocate a virtual page, lock it into memory via a call to mlock, record its physical address via a lookup into /proc/self/pagemap, and repeat until you just happen to get enough blocks touching eachother to create a large enough contiguous block. Then unlock and free your excess blocks.
It's hackish, clunky and potentially slow, but it's worth a try. On the other hand, there's a decently large chance that this isn't actually what you really need.
DPDK library's memory allocator uses approach #Wallacoloo described. eal_memory.c. The code is BSD licensed.
if specific device driver exports dma buffer which is physical contiguous, user space can access through dma buf apis
so user task can access but not allocate directly
that is because physically contiguous constraints are not from user aplications but only from device
so only device drivers should care.

In Linux, physical memory pages belong to the kernel data segment are swappable or not?

I'm asking because I remember that all physical pages belong to the kernel are pinned in memory and thus are unswappable, like what is said here: http://www.cse.psu.edu/~axs53/spring01/linux/memory.ppt
However, I'm reading a research paper and feel confused as it says,
"(physical) pages frequently move between the kernel data segment and user space."
It also mentions that, in contrast, physical pages do not move between the kernel code segment and user space.
I think if a physical page sometimes belongs to the kernel data segment and sometimes belongs to user space, it must mean that physical pages belong to the kernel data segment are swappable, which is against my current understanding.
So, physical pages belong to the kernel data segment are swappable? unswappable?
P.S. The research paper is available here:
https://www.cs.cmu.edu/~arvinds/pubs/secvisor.pdf
Please search "move between" and you will find it.
P.S. again, a virtual memory area ranging from [3G + 896M] to 4G belongs to the kernel and is used for mapping physical pages in ZONE_HIGHMEM (x86 32-bit Linux, 3G + 1G setting). In such a case, the kernel may first map some virtual pages in the area to the physical pages that host the current process's page table, modify some page table entries, and unmap the virtual pages. This way, the physical pages may sometimes belong to the kernel and sometimes belong to user space, because they do not belong to the kernel after the unmapping and thus become swappable. Is this the reason?
tl;dr - the memory pools and swapping are different concepts. You can not make any deductions from one about the other.
kmalloc() and other kernel data allocation come from slab/slub, etc. The same place that the kernel gets data for user-space. Ergo pages frequently move between the kernel data segment and user space. This is correct. It doesn't say anything about swapping. That is a separate issue and you can not deduce anything.
The kernel code is typically populated at boot and marked read-only and never changes after that. Ergo physical pages do not move between the kernel code segment and user space.
Why do you think because something comes from the same pool, it is the same? The network sockets also come from the same memory pool. It is a seperation of concern. The linux-mm (memory management system) handles swap. A page can be pinned (unswappable). The check for static kernel memory (this may include .bss and .data) is a simple range check. The memory is normally pinned and marked unswappable at the linux-mm layer. The user data (whos allocation come from the same pool) can be marked as swappable by the linux-mm. For instance, even without swap, user-space text is still swappable because it is backed by an inode. Caching is much simpler for read-only data. If data is swapped, it is marked as such in the MMU tables and a fault handler must distinguish between swap and a SIGBUS; which is part of the linux-mm.
There are also versions of Linux with no-mm (or no MMU) and these will never swap anything. In theory someone might be able to swap kernel data; but the why is it in the kernel? The Linux way would be to use a module and only load them as needed. Certainly, the linux-mm data is kernel data and hopefully, you can see a problem with swapping that.
The problem with conceptual questions like this,
It can differ with Linux versions.
It can differ with Linux configurations.
The advice can change as Linux evolves.
For certain, the linux-mm code can not be swappable, nor any interrupt handler. It is possible that at some point in time, kernel code and/or data could be swapped. I don't think that this is ever the current case outside of module loading/unloading (and it is rather pedantic/esoteric as to whether you call this swapping or not).
I think if a physical page sometimes belongs to the kernel data segment and sometimes belongs to user space, it must mean that physical pages belong to the kernel data segment are swappable, which is against my current understanding.
there is no connection between swappable memory and page movement between user space and kernel space. whether a page can be swapped or not depends totally on whether it is pinned or not. Pinned pages are not swapped so their mapping is considered permanent.
So, physical pages belong to the kernel data segment are swappable? unswappable?
usually pages used by kernel are pinned and so are meant not to be swappable.
However, I'm reading a research paper and feel confused as it says, "(physical) pages frequently move between the kernel data segment and user space."
Could you please give a link of this research papaer?
As far as I known, (just from UNIX lectures and labs at school) the pages for kernel space has been allocated for kernel, with a simple, fixed mapping algorithm, and they are all pinned. After kernel turn on the paging mode, (bits operation of CR0&CR3 for x86) there will be the first user mode process, and the pages which has been allocated for kernel will not be in the available set of pages for user space.

Large physically contiguous memory area

For my M.Sc. thesis, I have to reverse-engineer the hash function Intel uses inside its CPUs to spread data among Last Level Cache slices in Sandy Bridge and newer generations. To this aim, I am developing an application in Linux, which needs a physically contiguous memory area in order to make my tests. The idea is to read data from this area, so that they are cached, probe if older data have been evicted (through delay measures or LLC miss counters) in order to find colliding memory addresses and finally discover the hash function by comparing these colliding addresses.
The same procedure has already been used in Windows by a researcher, and proved to work.
To do this, I need to allocate an area that must be large (64 MB or more) and fully cachable, so without DMA-friendly options in TLB. How can I perform this allocation?
To have a full control over the allocation (i.e., for it to be really physically contiguous), my idea was to write a Linux module, export a device and mmap() it from userspace, but I do not know how to allocate so much contiguous memory inside the kernel.
I heard about Linux Contiguous Memory Allocator (CMA), but I don't know how it works
Applications don't see physical memory, a process have some address space in virtual memory. Read about the MMU (what is contiguous in virtual space might not really be physically contiguous and vice versa)
You might perhaps want to lock some memory using mlock(2)
But your application will be scheduled, and other processes (or scheduled tasks) would dirty your CPU cache. See also sched_setaffinity(2)
(and even kernel code might be perhaps preempted)
This page on Kernel Newbies, has some ideas about memory allocation. But the max for get_free_pages looks like 8MiB. (Perhaps that's a compile-time constraint?)
Since this would be all-custom, you could explore the mem= boot parameter of the linux kernel. This will limit the amount of memory used, and you can party all over the remaining memory without anyone knowing. Heck, if you boot up a busybox system, you could probably do mem=32M, but even mem=256M should work if you're not booting a GUI.
You will also want to look into the Offline Scheduler (and here). It "unplugs" the CPU from Linux so you can have full control over ALL code running on it. (Some parts of this are already in the mainline kernel, and maybe all of it is.)

What does ASLR(address space layout randomiztion) do?

I read that it is a security measure to protect against common attacks. The idea is that it keeps randomizing the virtual memory space which I believe will require periodic updates to the page table and the TLB? Am I correct?
My other question is, does it, at all, randomize the physical location of pages in the physical memory? Because I have been looking into the behavior of the physical memory under and without ASLR and the behavior is different.
ASLR is a security feature that's supposed to randomise the virtual address space of a program on each run. It doesn't hot-swap the virtual address space of a program while it's running (that would be a disaster).
The operating system will usually have to make periodic updates to the page table simply as part of scheduling. Whether ASLR is enabled is largely irrelevant to page table updates.
As to your other question, the layout of physical memory is essentially random whether you have ASLR enabled or not under Linux. Pages get swapped in and out of memory very often and, furthermore, the physical memory layout can quickly become fragmented. The only semi-predictable parts of physical memory would probably be memory reserved for DMA.
I'm still unsure how you managed to "look into the behaviour of physical memory" and conclude that the behaviour is significantly different between ASLR enabled and not.

single common address space for all tasks

How to give single common address space for all tasks. IF its happening like this can we avoid virtual to physical memory mapping.
I f all task sharing common address space then how can we avoid virtual to physical memory mapping.
There are a few modern (research) OS's that do this, like Singularity and there are performance benefits, primarily because it no longer needs to do context changes and the file/symbol loader no longer needs to do address translation for global caches and kernel functions.
You do need to be a bit more specific about what you're looking for, tho'. You tagged your post as OSX and Linux, both of which require virtual memory. When running on systems without a MMU (and thus no virtual memory) it emulates it, which I'm fairly certain you can't circumvent. I'm not an expert by any means.
uClinux is an implementation of Linux that runs on processors that lack an MMU (such as ARM7), so by definition must have a single address space for all tasks.
So one answer to "how" is "use uClinux".
You tagged this VxWorks, and there is another answer; VxWorks supports a flat memory. In fact when I last used it the MMU protection was an (expensive) add on. Many other RTOS designed for micro controllers similarly do not support an MMU, such as eCOS, and FreeRTOS.
Of RTOS's that do support an MMU, QNX is probably amongst the most robust and mature, while still maintaining high performance.
I'm not sure why you would want to disable virtual memory mapping - it's a built in function of the cpu, and pretty much essential when running an OS to properly isolate processes from each other.
Most operating systems allow you to disable virtual memory, so that your memory capacity is limited by physical memory. However, A processes address space is still virtual, and virtual to physical mapping is still happening.
A way to get what you want is to run an operating system that executes in Real Mode, such as DOS or Windows 3.0, or write your own.
The advantages of virtual memory far outweigh the disadvantages. Why do you want to avoid virtual memory.
This is how some older operating systems and even how some modern operating systems that lack VM still work. It has many disadvantages for things like desktop and server applications but it can be useful in an embedded and/or real-time context, or where you have minimal hardware.
The VxWorks AE(Advanced Edition supports) deviates from the concept of Common address space for all tasks.So it can effectively be used in both systems with MMU and without MMU .The common address space for all tasks is called flat memory model and the separate address space for different tasks is called over lapped memory model or segmented memory model.You should not confuse the memory model with the memory lay out as seen in object files which divides data in to Code Segment ,Data Segment ,BSS etc .Both are entirely different things :).
This link in stack overflow will help better
Difference between flat memory model and protected memory model?

Resources