How and when are different memory maps created? - linux

An excerpt from one of the books I'm reading says:
The processor memory map: This is the first memory map that needs to be created. It explains the CPU’s memory management policies such
as how the CPU handles the different address spaces (user mode, kernel
mode), what are the caching policies for the various memory regions,
and so on.
The board memory map: Once there is an idea of how the processor sees the various memory areas, the next step is to fit the various
onboard devices into the processor memory areas. This requires an
understanding of the various onboard devices and the bus controllers.
The software memory map: Next a portion of the memory needs to be given for the various software components such as the boot loader and
the Linux kernel. The Linux kernel sets up its own memory map and
decides where the various kernel sections such as code and heap will
When are these memory maps created? Is it something hard-coded before the compilation phase, or is it decided by some run-time task?
Are there standards for mapping the processor address space to the various devices, or is it the user's choice?
The book btw.

When are these memory maps created? Like say, is it something hard
coded and before the compilation phase or is it decided by some run
time task?
Setting this up would typically be the first thing the kernel does when it starts. It can be hard-coded, or it can be decided before the kernel starts, i.e. in the bootloader.
Are there some standards on mapping the processor address space to
various devices or is it the user's choice?
It is really up to you as the designer. Many would probably choose a configuration similar to that of another operating system, such as Linux.


Large physically contiguous memory area

For my M.Sc. thesis, I have to reverse-engineer the hash function Intel uses inside its CPUs to spread data among Last Level Cache slices in Sandy Bridge and newer generations. To this end, I am developing an application in Linux, which needs a physically contiguous memory area to run my tests. The idea is to read data from this area so that it is cached, probe whether older data has been evicted (through timing measurements or LLC miss counters) in order to find colliding memory addresses, and finally discover the hash function by comparing these colliding addresses.
The same procedure has already been used in Windows by a researcher, and proved to work.
To do this, I need to allocate an area that must be large (64 MB or more) and fully cacheable, so without DMA-friendly options in the TLB. How can I perform this allocation?
To have full control over the allocation (i.e., for it to be really physically contiguous), my idea was to write a Linux module, export a device and mmap() it from userspace, but I do not know how to allocate so much contiguous memory inside the kernel.
I heard about the Linux Contiguous Memory Allocator (CMA), but I don't know how it works.
Applications don't see physical memory; a process has an address space in virtual memory. Read about the MMU: what is contiguous in virtual space might not be physically contiguous, and vice versa.
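One way to check from userspace whether a virtually contiguous buffer happens to be physically contiguous is to read /proc/<pid>/pagemap, which exposes one 64-bit entry per virtual page (bit 63 = page present, bits 0-54 = page frame number). The sketch below is only an illustration: the helper names are made up for this example, a 4 KiB page size is assumed, and on recent kernels the frame number reads as zero unless the reader has CAP_SYS_ADMIN, so it is only meaningful when run as root.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8          # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1         # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63            # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return the page frame number of a present page, else None."""
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def is_physically_contiguous(pfns):
    """True if every frame directly follows the previous one."""
    if not pfns or any(p is None for p in pfns):
        return False
    return all(b == a + 1 for a, b in zip(pfns, pfns[1:]))


def read_pfns(pid, vaddr, n_pages, page_size=4096):
    """Read the frame numbers backing n_pages pages starting at vaddr.

    Hypothetical helper: page_size is an assumption, and real frame
    numbers are only visible to a privileged reader.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * PAGEMAP_ENTRY_BYTES)
        raw = f.read(n_pages * PAGEMAP_ENTRY_BYTES)
    entries = struct.unpack(f"={len(raw) // PAGEMAP_ENTRY_BYTES}Q", raw)
    return [decode_pagemap_entry(e) for e in entries]
```

Calling is_physically_contiguous(read_pfns(pid, vaddr, n)) on a buffer that has already been touched (so its pages are actually present) shows whether the frames form one run.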
You might perhaps want to lock some memory using mlock(2)
But your application will be scheduled, and other processes (or scheduled tasks) will dirty your CPU cache. See also sched_setaffinity(2).
(Even kernel code might be preempted.)
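As a rough illustration of the sched_setaffinity(2) suggestion, Python exposes the same call on Linux as os.sched_setaffinity. The function name below is made up for this sketch. Note that pinning only restricts where this process may run; it does not stop other tasks from being scheduled on the same CPU, so the cache can still be disturbed.

```python
import os


def pin_to_one_cpu(pid=0):
    """Pin pid (0 = the calling process) to its lowest-numbered allowed CPU.

    Returns the chosen CPU number, or None if the call is unavailable
    (non-Linux platform) or refused by the system.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        allowed = os.sched_getaffinity(pid)
        cpu = min(allowed)
        os.sched_setaffinity(pid, {cpu})
    except OSError:
        return None
    return cpu
```

After a successful call, os.sched_getaffinity(0) reports a set containing only the chosen CPU.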
This page on Kernel Newbies has some ideas about memory allocation, but the maximum for get_free_pages looks like 8 MiB. (Perhaps that's a compile-time constraint?)
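The size cap comes from the way the page allocator hands out blocks of 2^order pages: the largest block is fixed by the highest order the kernel was built to support. The arithmetic can be sketched as follows, assuming 4 KiB pages. Both function names are invented for this example, and the highest order is passed in as a parameter rather than taken from any particular kernel configuration.

```python
def order_for(size_bytes, page_size=4096):
    """Smallest order n such that page_size * 2**n >= size_bytes."""
    pages = -(-size_bytes // page_size)      # ceiling division
    return max(pages - 1, 0).bit_length()


def largest_block(highest_order, page_size=4096):
    """Size in bytes of one block of the given order."""
    return page_size << highest_order


MIB = 1024 * 1024
print(order_for(8 * MIB))    # order needed for an 8 MiB block
print(order_for(64 * MIB))   # order needed for the 64 MiB asked about
```

With these assumptions an 8 MiB block is order 11 and a 64 MiB block is order 14, which is why a single page-allocator call cannot cover the requested area if the limit really is around 8 MiB.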
Since this would be all-custom, you could explore the mem= boot parameter of the Linux kernel. This limits the amount of memory the kernel uses, and you can party all over the remaining memory without anyone knowing. Heck, if you boot up a busybox system, you could probably do mem=32M, but even mem=256M should work if you're not booting a GUI.
You will also want to look into the Offline Scheduler (and here). It "unplugs" the CPU from Linux so you can have full control over ALL code running on it. (Some parts of this are already in the mainline kernel, and maybe all of it is.)

Linux Page Table Management and MMU

I have a question about the relationship between the Linux kernel and the MMU.
I now understand that the Linux kernel manages the page table that maps virtual memory addresses to physical memory addresses.
At the same time, the x86 architecture has an MMU, which also manages the page table between virtual and physical memory addresses.
If an MMU is present next to the CPU, does the kernel still need to take care of the page table?
This question may be stupid, but my other question is: if the MMU takes care of the memory space, who manages high memory and low memory? I believe the kernel receives the size of virtual memory from the MMU (4 GB on 32-bit), and then the kernel distinguishes between user space and kernel space in the virtual address range.
Am I correct, or completely wrong?
Thanks a lot in advance!
The OS and MMU page management responsibilities are two sides of the same mechanism, which lives on the boundary between architecture and micro-architecture.
The first side defines the "contract" between the hardware and the software that runs over it (in this case, the OS): if you want to use virtual memory, you need to build and maintain a page table as described in that contract.
The MMU side, on the other hand, is a hardware unit that's responsible for performing the HW tasks of the address translation. This may or may not include hardware optimizations. These are usually hidden and may be implemented in various ways under the hood, as long as the MMU maintains the hardware side of the contract.
In theory, the MMU may decide to issue a set of memory accesses for each translation (a page walk), in order to achieve the required behavior. However, since it's a performance critical element, most MMUs optimize this by caching the results of previous page walks inside the TLB, just like a cache stores the results of previous accesses (actually, on some implementations, the caches themselves may also store some of the accesses to the page table since it usually resides in cacheable memory). The MMU can manage multiple TLBs (most implementations separate the ones for data and code pages, and some have 2nd level TLBs), and provide the translation from there without you noticing that except for the faster access time.
It should also be noted that the hardware must guard against many corner cases that can harm the coherency of such TLB "caching" of previous translations, for example page aliasing or remaps during usage. On some machines, the nastier cases even require a massive flush flow called TLB shootdown.
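The split of work described above can be illustrated with a toy model. The "OS" builds the page table (here a nested dict), and the "MMU" walks it on a TLB miss and caches the result. This is only a sketch under stated assumptions: it uses the classic 32-bit x86 layout without PAE (10-bit directory index, 10-bit table index, 12-bit offset), ignores permission bits entirely, and the class and method names are invented for the example.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
INDEX_BITS = 10                      # 1024 entries per table level
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class ToyMMU:
    def __init__(self, page_directory):
        # Built by the "OS": {dir_index: {table_index: frame_number}}
        self.page_directory = page_directory
        self.tlb = {}                # virtual page number -> frame number
        self.walks = 0               # number of page walks performed

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        frame = self.tlb.get(vpn)
        if frame is None:            # TLB miss: walk the page table
            frame = self._walk(vpn)
            self.tlb[vpn] = frame
        return (frame << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

    def _walk(self, vpn):
        self.walks += 1
        dir_index = (vpn >> INDEX_BITS) & INDEX_MASK
        table_index = vpn & INDEX_MASK
        try:
            return self.page_directory[dir_index][table_index]
        except KeyError:
            raise LookupError(f"page fault at vpn {vpn:#x}") from None

    def flush(self):
        """What the OS must request after changing a mapping."""
        self.tlb.clear()


mmu = ToyMMU({1: {2: 0x80}})         # one mapped page
print(hex(mmu.translate(0x402034)))  # directory 1, table 2, offset 0x34
```

A second translate() of the same page is served from the TLB without another walk, and flush() stands in for the invalidation the OS has to trigger when it remaps a page.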

Why does the kernel need virtual addressing?

In Linux each process has its own virtual address space (e.g. 4 GB on a 32-bit system, of which 3 GB is reserved for the process and 1 GB for the kernel). This virtual addressing mechanism helps isolate the address space of each process. This is understandable for processes, since there are many of them. But since there is only one kernel, why do we need virtual addressing for the kernel?
The reason the kernel is "virtual" is not to deal with paging as such; it is because the processor can only run in one mode at a time. Once you turn on paged memory mapping (bit 31 in CR0 on x86), the processor expects ALL memory accesses to go through the page-mapping mechanism. So, since we do want to access the kernel even after we have enabled paging (virtual memory), it needs to exist somewhere in the virtual space.
The "reserving" of memory is more about having an easy way to determine whether an address is kernel or user space than anything else. It would be perfectly possible to put a little bit of kernel at address 12345-34121, another bit of kernel at 101900-102400 and some other bit of kernel at 40000000-40001000. But it would make life difficult for every aspect of the kernel and userspace: there would be gaps/holes to deal with [there already are such holes/gaps, but having more wouldn't exactly help things]. Setting a fixed limit of "userspace is from here to here, kernel is from end of userspace to X" makes life much easier in that respect. We can just say kernel = 0; if (address > max_userspace) kernel=1; in some code.
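The one-comparison check from the paragraph above can be written out as a small runnable sketch. The boundary value 0xC0000000 is an assumption that corresponds to the 3 GB / 1 GB split mentioned in the question, and the names are made up for the example.

```python
PAGE_OFFSET = 0xC0000000        # assumed start of kernel space (3 GB mark)
ADDRESS_SPACE = 1 << 32         # size of a 32-bit virtual address space


def is_kernel_address(addr):
    """Classify a 32-bit virtual address with a single comparison."""
    if not 0 <= addr < ADDRESS_SPACE:
        raise ValueError("not a 32-bit address")
    return addr >= PAGE_OFFSET


print(is_kernel_address(0x08048000))   # False: a typical user-space address
print(is_kernel_address(0xC0100000))   # True: above the split
```

With scattered kernel regions the same test would need a range lookup instead of one comparison, which is the complication the fixed split avoids.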
Of course, the kernel only takes up as much PHYSICAL memory as it will actually use, so the common thinking that "it's a waste to take up a whole gigabyte for the kernel" is wrong. The kernel itself is a few megabytes (a dozen or so for a very "big" kernel). The loaded modules can easily add up to several more megabytes, and graphics drivers from ATI and nVidia easily add another few megabytes just for the kernel module itself. The kernel also uses some memory to store "kernel data", such as tasks, queues, semaphores, files and other "stuff" the kernel has to deal with. A few megabytes are used for this as well.
Virtual Memory Management (VMM) is the feature of Linux that enables multitasking without a fixed limit on the number of tasks or on the amount of memory used by each task. The Linux memory manager subsystem (along with the MMU hardware) provides VMM support, where memory or memory-mapped devices are accessed through virtual addresses. Within Linux everything, both kernel and user components, works with virtual addresses except when dealing with real hardware. That is where the memory manager steps in, performs the virtual-to-physical address translation and points to the physical memory or device location.
A process is an abstract entity, defined by the kernel, to which system resources are allocated in order to execute a program. In Linux process management the kernel is an integrated part of a process's memory map. A process has two main regions, like two faces of one coin:
User Space view - contains the user program sections (code, data, stack, heap, etc.) used by the process
Kernel Space view - contains the kernel data structures that maintain information (PID, states, FDs, resource usage, etc.) about the process
Every process in a Linux system has a unique and separate User Space region. This feature of Linux VMM isolates each process's program sections from one another. But all processes in the system share the common Kernel Space region. When a process needs a service from the kernel, it must execute the kernel code in this region; in other words, the kernel acts on behalf of the user process's request.

single common address space for all tasks

How can I give all tasks a single common address space? If that is done, can we avoid virtual-to-physical memory mapping?
If all tasks share a common address space, how can we then avoid virtual-to-physical memory mapping?
There are a few modern (research) OS's that do this, like Singularity and there are performance benefits, primarily because it no longer needs to do context changes and the file/symbol loader no longer needs to do address translation for global caches and kernel functions.
You do need to be a bit more specific about what you're looking for, tho'. You tagged your post as OSX and Linux, both of which require virtual memory. When running on systems without an MMU (and thus no virtual memory) it emulates it, which I'm fairly certain you can't circumvent. I'm not an expert by any means.
uClinux is an implementation of Linux that runs on processors that lack an MMU (such as ARM7), so by definition must have a single address space for all tasks.
So one answer to "how" is "use uClinux".
You tagged this VxWorks, and there is another answer: VxWorks supports a flat memory model. In fact, when I last used it, MMU protection was an (expensive) add-on. Many other RTOSes designed for microcontrollers similarly do not support an MMU, such as eCos and FreeRTOS.
Of RTOS's that do support an MMU, QNX is probably amongst the most robust and mature, while still maintaining high performance.
I'm not sure why you would want to disable virtual memory mapping. It's a built-in function of the CPU, and pretty much essential when running an OS to properly isolate processes from each other.
Most operating systems allow you to disable virtual memory, so that your memory capacity is limited by physical memory. However, a process's address space is still virtual, and virtual-to-physical mapping still happens.
A way to get what you want is to run an operating system that executes in Real Mode, such as DOS or Windows 3.0, or write your own.
The advantages of virtual memory far outweigh the disadvantages. Why do you want to avoid virtual memory?
This is how some older operating systems and even how some modern operating systems that lack VM still work. It has many disadvantages for things like desktop and server applications but it can be useful in an embedded and/or real-time context, or where you have minimal hardware.
VxWorks AE (Advanced Edition) deviates from the concept of a common address space for all tasks, so it can be used effectively on systems both with and without an MMU. The common address space for all tasks is called the flat memory model, and separate address spaces for different tasks are called the overlapped memory model or segmented memory model. You should not confuse the memory model with the memory layout as seen in object files, which divides data into code segment, data segment, BSS, etc. The two are entirely different things :).
This Stack Overflow question explains it better:
Difference between flat memory model and protected memory model?

Dynamic memory management under Linux

I know that under Windows there are API functions like GlobalAlloc() and such, which allocate memory and return a handle. This handle can then be locked to obtain a pointer, and unlocked again. While it is unlocked, the system can move this piece of memory around when it runs low on space, optimising memory usage.
My question is: is there something similar under Linux, and if not, how does Linux optimize its memory usage?
Those Windows functions come from a time when all programs were running in the same address space in real mode. Linux, and modern versions of Windows, run programs in separate address spaces, so they can move them about in RAM by remapping what physical address a particular virtual address resolves to in the page tables. No need to burden the programmer with such low level details.
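The remapping idea can be shown with a toy model: the program keeps using the same virtual page number while the "OS" moves the backing data to a different physical frame and updates the page table entry. This is a simplified sketch, not how any real kernel stores its tables. The class and method names are invented, and "RAM" is just a dict from frame number to bytes.

```python
class ToyAddressSpace:
    def __init__(self):
        self.page_table = {}     # virtual page number -> physical frame
        self.ram = {}            # physical frame -> page contents

    def map_page(self, vpn, frame, data):
        self.page_table[vpn] = frame
        self.ram[frame] = data

    def read(self, vpn):
        """What the program sees: contents addressed by virtual page."""
        return self.ram[self.page_table[vpn]]

    def migrate(self, vpn, new_frame):
        """Move a page to another frame; the virtual address is unchanged."""
        if new_frame in self.ram:
            raise ValueError("target frame already in use")
        old_frame = self.page_table[vpn]
        self.ram[new_frame] = self.ram.pop(old_frame)
        self.page_table[vpn] = new_frame


space = ToyAddressSpace()
space.map_page(vpn=7, frame=100, data=b"hello")
space.migrate(vpn=7, new_frame=200)
print(space.read(7))             # still b"hello", now from frame 200
```

The program never needs a lock/unlock step because it never holds a physical address in the first place.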
Even on Windows, it's no longer necessary to use such functions except when interacting with a small number of old APIs. I believe Raymond Chen's blog and book have some discussions of the topic if you are interested in more detail. E.g., here's part 4 of a series on the history of GlobalLock.
I'm not sure what the Linux equivalent is, but in AT&T UNIX there are "scatter gather" memory management functions in the memory manager of the core OS. In a virtual memory operating environment there are no absolute addresses, so applications don't have an equivalent function. The executable object loader (which loads an executable file into memory, where it becomes a process) uses addresses provided by the memory manager, which are tracked in virtual memory blocks maintained in its page table (which contains the physical memory addresses). The bottom line is that your application's physical memory layout is very unlikely to be linear or directly accessible.
