Xen E820 memory map for domU kernels - linux

How does Xen handle the E820 memory map for domU kernels? In my specific problem I am trying to map Non-Volatile RAM into domU kernels. The dom0 memory map contains the following relevant entries.
100000000-17fffffff : System RAM (4GB to 6GB)
180000000-37fffffff : reserved (6GB to 14GB)
The second line corresponds to the NVRAM, a region from 6GB to 14GB in the dom0 kernel. How can I map this NVRAM region into a domU kernel, which does not map this region at all?
Ultimately I want the NVRAM region to be available in other domU VMs, so any solutions or advice would be highly helpful.
P.S.: If I attempt to write to this region from the domU kernel, will Xen intercept the write operation? It is just a memory write, which should not be a problem, but it might appear to Xen as a hardware access.

Guest domains in Xen come in two different models on x86:
1. Hardware Virtual Machine (HVM): takes advantage of the Intel VT or AMD SVM extensions to enable full virtualization on the x86 platform.
2. Paravirtualized (PV): this mode requires modifications to the operating system source code to work around the x86 virtualization problems and also to boost performance.
These two models handle the E820 memory map differently. The E820 map basically gives an OS the physical address space it can operate on, along with the location of I/O devices. In PV mode, I/O devices are made available through Xenstore; the domain builder only provides a console device to the PV guest at boot, and all other I/O devices have to be mapped by the guest itself. A PV guest starts execution in protected mode rather than real mode on x86. The domain builder maps the start_info pages into the guest domain's physical address space; these pages contain most of the information needed to initialize a kernel, such as the number of available pages, the number of CPUs, console information, the Xenstore interface, and so on. In this context the E820 memory map would just consist of the number of available memory pages, because no BIOS is emulated and the I/O device information is provided separately through Xenstore.
On the other hand, for an HVM guest the BIOS and other devices have to be emulated by Xen. This mode has to support unmodified operating systems, so the PV approach cannot be used. BIOS emulation is done with code borrowed from Bochs, while devices are emulated using QEMU code. Here the OS is given an E820 memory map built by the domain builder; the HVM domain builder typically passes the memory layout information to the Bochs BIOS emulator, which then performs the required setup.
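For reference, each E820 entry is just a (base address, length, type) triple. Below is a minimal sketch of walking such a table in C; the table contents are modeled on the dom0 map quoted in the question, and the entry layout follows the standard BIOS/ACPI convention rather than any Xen-specific structure:

    #include <stdint.h>
    #include <stdio.h>

    /* One E820 entry: the layout the BIOS (or, for HVM guests, the emulated
     * BIOS populated by the domain builder) reports to the OS. */
    struct e820_entry {
        uint64_t base;    /* start physical address */
        uint64_t length;  /* size of the region in bytes */
        uint32_t type;    /* 1 = usable RAM, 2 = reserved, ... */
    };

    static void dump_e820(const struct e820_entry *map, int nr_entries)
    {
        for (int i = 0; i < nr_entries; i++)
            printf("%016llx-%016llx : %s\n",
                   (unsigned long long)map[i].base,
                   (unsigned long long)(map[i].base + map[i].length - 1),
                   map[i].type == 1 ? "System RAM" : "reserved/other");
    }

    int main(void)
    {
        /* Example values taken from the dom0 map in the question. */
        struct e820_entry map[] = {
            { 0x100000000ULL, 0x080000000ULL, 1 },  /* 4GB-6GB   System RAM */
            { 0x180000000ULL, 0x200000000ULL, 2 },  /* 6GB-14GB  reserved (NVRAM) */
        };
        dump_e820(map, 2);
        return 0;
    }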
To get hold of the NVRAM pages you will have to build a separate MMU for the NVRAM. This MMU should handle all the NVM pages and allocate/free them on demand, just like the RAM pages. It is a lot of work.
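A toy sketch of what treating the NVM pages "just like RAM pages" could look like, assuming the 6GB-14GB range from the question and ignoring all the real integration work in Xen (p2m updates, grant tables, toolstack support) that actually makes this hard:

    #include <stdint.h>

    /* Toy page-granularity allocator over the NVRAM range from the question
     * (0x180000000-0x37fffffff, i.e. 8 GB of 4 KiB pages). */
    #define NVRAM_BASE   0x180000000ULL
    #define NVRAM_SIZE   0x200000000ULL
    #define PAGE_SIZE    4096ULL
    #define NVRAM_PAGES  (NVRAM_SIZE / PAGE_SIZE)

    static uint8_t nvram_bitmap[NVRAM_PAGES / 8];   /* 1 bit per NVM page */

    /* Find a free NVM page, mark it used, return its physical address (0 if full). */
    uint64_t nvram_alloc_page(void)
    {
        for (uint64_t i = 0; i < NVRAM_PAGES; i++) {
            if (!(nvram_bitmap[i / 8] & (1 << (i % 8)))) {
                nvram_bitmap[i / 8] |= (1 << (i % 8));
                return NVRAM_BASE + i * PAGE_SIZE;
            }
        }
        return 0;
    }

    /* Return a previously allocated NVM page to the pool. */
    void nvram_free_page(uint64_t paddr)
    {
        uint64_t i = (paddr - NVRAM_BASE) / PAGE_SIZE;
        nvram_bitmap[i / 8] &= ~(1 << (i % 8));
    }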

Related

Is it possible to partially virtualize the physical address space?

I'm currently working on systems that include embedded Linux and FPGAs. We have various IP cores that support the AXI bus. To communicate with the IP cores in the PL (programmable logic), we need to map them into the address space of the PS (processing system). For example, for the widely used Zynq PS the address space is as follows (UG585, Section 4.1: Address Map):
0x0000_0000 to 0x7FFF_FFFF: Mapped to the physical memory. Either external DDR or on-chip memory
0x8000_0000 to 0xBFFF_FFFF: Mapped to the PL explained above
0xE000_0000 to 0xFFFF_FFFF: Various other devices on the chip
As you can see, only the first 1GB of the address space is reserved for physical memory, and the rest is occupied by devices either in the PL or the PS. So, if possible, virtualization could be applied only to that first 1GB, allowing faster access to the on-chip devices by skipping the MMU.
I know that by doing such a modification we would allow any process to access the physical devices of the system without any privilege checks. So, the questions are:
Is it possible to partially virtualize the physical address space in Linux or any other OS?
If it is possible, would it be rational to do it?
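For reference, the usual MMU-mediated path to such a PL region from Linux userspace is to mmap /dev/mem at the physical base address. A minimal sketch, assuming the IP core sits at the start of the 0x8000_0000 PL window; the register offset and value written are arbitrary examples:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define PL_BASE   0x80000000UL   /* AXI window to the PL, per the Zynq map above */
    #define MAP_SIZE  0x1000UL       /* one page covering the IP core's registers */

    int main(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        /* Map the physical PL window into this process's virtual address space. */
        volatile uint32_t *regs = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, PL_BASE);
        if (regs == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hypothetical register access: offset 0 of the IP core. */
        regs[0] = 0x1;
        printf("reg[0] = 0x%08x\n", regs[0]);

        munmap((void *)regs, MAP_SIZE);
        close(fd);
        return 0;
    }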

Virtual Memory and Virtual Address in Linux

I am currently studying virtual memory in operating systems and I have a few questions.
Is a swap partition or swap file the same thing as virtual memory in Linux?
If yes, then if I have no swapping enabled on my Linux system, does that mean my system has no virtual memory?
I have also read that virtual memory makes a system more secure, because with virtual memory the CPU generates virtual addresses which are then translated to actual physical addresses by the MMU, so no process can directly interact with the actual physical memory. So if I just enable swapping on my Linux system, will my CPU start generating virtual addresses, given that it is currently generating physical addresses directly since I have no swap partition?
How does the CPU know whether virtual memory is present or not?
Having no swap file/partition doesn't imply that you don't have virtual memory. Modern operating systems always use paging/virtual memory, no matter what.
Is a swap partition or swap file the same thing as virtual memory in Linux?
No, a swap file and virtual memory are not the same thing in any OS. Virtual memory just means that all memory accesses are translated by the MMU using the page tables. Modern OSes always use paging.
If yes, then if I have no swapping enabled on my Linux system, does that mean my system has no virtual memory?
Your system certainly has virtual memory. To use long mode (64-bit mode), the OS must enable paging, and I doubt that you have a system old enough not to use paging. Page swapping to the hard disk is not virtual memory. It is more like a feature of virtual memory that can be used to extend physical memory, because a page which isn't required immediately can be swapped out to the hard disk temporarily.
I have also read that virtual memory makes a system more secure, because with virtual memory the CPU generates virtual addresses which are then translated to actual physical addresses by the MMU, so no process can directly interact with the actual physical memory. So if I just enable swapping on my Linux system, will my CPU start generating virtual addresses, given that it is currently generating physical addresses directly since I have no swap partition?
Your computer certainly has paging/virtual memory enabled. Having no swap partition doesn't mean that you don't have virtual memory. Paging can also be used to avoid fragmentation of RAM and for security. You are right that paging secures your system, because the page tables prevent a process from accessing the memory of another process. Page tables also carry privilege information on a per-page basis, which allows kernel-mode and user-mode code to be distinguished.
How does the CPU know whether virtual memory is present or not?
The OS just enables paging by setting a bit in a control register. From then on, the CPU blindly translates every memory access using the MMU.
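As a rough illustration, here is what that control-register step looks like for 32-bit x86 in kernel code, assuming the page tables have already been built (on x86-64, CR4.PAE and EFER.LME must also be set before CR0.PG). This is a sketch of privileged code, not something you can run from userspace:

    /* Sketch: turning on paging on 32-bit x86. Runs in kernel code. */
    static inline void enable_paging(unsigned long page_directory_phys)
    {
        unsigned long cr0;

        /* CR3 holds the physical address of the top-level page table. */
        asm volatile("mov %0, %%cr3" :: "r"(page_directory_phys) : "memory");

        /* Setting bit 31 (PG) of CR0 enables translation: from the next
         * instruction on, every memory access goes through the MMU. */
        asm volatile("mov %%cr0, %0" : "=r"(cr0));
        cr0 |= 0x80000000UL;
        asm volatile("mov %0, %%cr0" :: "r"(cr0) : "memory");
    }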
No. A swap file is not the same as virtual memory.
Once the firmware/kernel sets up the necessary registers and/or in-memory data structures and switches the processor mode, virtual memory mappings are used for accessing the physical memory.
Yes, the inability of processes to refer to memory locations without a mapping allows the kernel to employ isolation and access control mechanisms.
Through active mappings, different virtual addresses can map to the same physical memory region at different times. The kernel can maintain the illusion that a larger amount of memory is available than the capacity of the actual physical memory, where only a subset of the virtual memory resides in physical memory at any given time. The rest is stored in the swap file.
Accesses to virtual addresses where the corresponding data is currently in the swap file are trapped by the kernel (via a page fault) and might lead to the kernel swapping the data in, and swapping some other data from physical memory out.
If you disable the swap file, the kernel has no place to store the swapped-out data. This reduces the amount of virtual memory available.
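A small illustration of the "larger than physical memory" point: an anonymous mapping reserves virtual address space up front, but physical pages are only allocated when they are first touched, and can later be swapped out if swap space exists. The 1 GiB size is just an arbitrary example:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1UL << 30;               /* 1 GiB of virtual address space */

        /* Reserve virtual memory; no physical RAM is committed yet. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Touching pages triggers page faults; only these pages get real RAM. */
        memset(p, 0xAB, 4096 * 16);

        printf("mapped %zu bytes at %p, touched 16 pages\n", len, (void *)p);
        munmap(p, len);
        return 0;
    }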

What role do QEMU and KVM play in virtual machine I/O?

I find the boundary between QEMU and KVM very blurred. Some say a virtual machine is a QEMU process, while others say it is a KVM process. Which is it exactly?
And what roles do QEMU and KVM play in virtual machine I/O? For example, when a VM does PIO/MMIO, is it QEMU or KVM that traps it and turns it into a hardware operation? Or are both responsible?
KVM: the code in the Linux kernel which provides a friendly interface to userspace for using the CPU virtualization. This includes functions that userspace can call for "create CPU", "run CPU", etc. For a full virtual machine, you need to have some userspace code which can use this. This is usually either QEMU, or the simpler "kvmtool"; some large cloud providers have their own custom userspace code instead.
QEMU: emulates the hardware of a virtual machine, including disks, memory, CPUs, serial ports, graphics, and other devices. It also provides mechanisms (a UI, and some programmable interfaces) for doing things like starting, stopping, and migrating. QEMU supports several different 'accelerator' modes for how it handles the CPU emulation:
TCG: pure emulation -- works anywhere, but very slow
KVM: uses the Linux kernel's KVM APIs to allow running guest code using host CPU hardware virtualization support
hax: similar to KVM, but using the Intel HAXM code, which will work on Windows hosts
From an implementation point of view the boundary between KVM and QEMU is very clear -- KVM is a part of the host kernel, whereas QEMU is a separate userspace program. For a user, you typically don't have to care where the boundary is, because that's an implementation detail.
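A minimal sketch of what that userspace side looks like against the /dev/kvm interface; error handling, guest memory setup, and register setup are omitted:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm  = open("/dev/kvm", O_RDWR | O_CLOEXEC);  /* the KVM device     */
        int vm   = ioctl(kvm, KVM_CREATE_VM, 0);          /* "create a VM"      */
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);         /* "create a CPU"     */

        /* In a real VMM: KVM_SET_USER_MEMORY_REGION to give the guest RAM,
         * KVM_SET_REGS/KVM_SET_SREGS to set initial register state, then
         * KVM_RUN in a loop (see the MMIO example further down). */
        (void)vcpu;
        return 0;
    }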
To answer your question about what happens for MMIO:
the guest makes an MMIO access
this is trapped to the host kernel by the hardware
the host kernel (KVM) might then emulate this MMIO access itself, because a few devices are implemented in the kernel for performance reasons; this usually applies only to the interrupt controller and perhaps the IOMMU
otherwise, KVM reports the MMIO access back to userspace (ie to QEMU, kvmtool, etc)
userspace then can handle the access, using its device emulation code
userspace then returns the result (eg the data to return for a read) to the kernel
the kernel updates the vcpu's register state as required to complete emulation of the instruction
the kernel then resumes execution of the VM at the following instruction
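Steps 4-6 above, sketched as the vCPU run loop of a minimal userspace VMM; handle_mmio_read and handle_mmio_write are hypothetical stand-ins for the device emulation code:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Hypothetical device-model hooks (not part of the KVM API). */
    void handle_mmio_write(__u64 addr, const __u8 *data, __u32 len);
    void handle_mmio_read(__u64 addr, __u8 *data, __u32 len);

    /* vcpu_fd is a vCPU file descriptor, run is its mmap'ed struct kvm_run. */
    void vcpu_loop(int vcpu_fd, struct kvm_run *run)
    {
        for (;;) {
            ioctl(vcpu_fd, KVM_RUN, 0);          /* run the guest until it exits */

            switch (run->exit_reason) {
            case KVM_EXIT_MMIO:
                if (run->mmio.is_write)
                    /* Guest wrote to device memory: pass data to the device model. */
                    handle_mmio_write(run->mmio.phys_addr,
                                      run->mmio.data, run->mmio.len);
                else
                    /* Guest read: the device model fills run->mmio.data, and KVM
                     * copies it into the guest's register on the next KVM_RUN. */
                    handle_mmio_read(run->mmio.phys_addr,
                                     run->mmio.data, run->mmio.len);
                break;
            default:
                return;                          /* other exit reasons not handled here */
            }
        }
    }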

Application processor memory map

What information is contained in the memory map of an application processor? Does it tell which subsystem can access which area of RAM, or does it mean that when the CPU accesses an address, the memory map determines whether it is a RAM address or a device address? I am referring to this documentation:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0515b/CIHIJJJA.html.
Here 0x00_0000_0000 to 0x00_0800_0000 is mapped to the boot region; what does that imply?
The style of memory map diagram you've linked to shows how the processor and peripherals will decode physical memory addresses. This is a normal diagram for any System-on-Chip device, though the precise layout will vary. The linked page actually lists which units of the SoC use this memory map for their address decoding, and it includes the ARM and the Mali graphics processor. In a Linux system, much of this information will be passed to the kernel in the device tree. It's important to remember that this tells us nothing about how the operating system chooses to organise the virtual memory addresses.
Interesting regions of this are:
DRAM - these addresses will be passed to the DRAM controller. There is no guarantee that the specific board being used has DRAM at all of that address space. The boot firmware will set up the DRAM controller and pass those details to the operating system.
PCIe - these addresses will be mapped to the PCIe controller, and ultimately to transfers on the PCIe links.
The boot region on this chip by default contains an on-chip boot ROM and working space. On this particular chip there's added complexity caused by ARM's TrustZone security architecture, which means that application code loaded after boot may not have access to this region. On the development board it should be possible to override this mapping and boot from external devices.
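On a running Linux system you can see the physical map the kernel ended up with (assembled from the device tree and firmware) by reading /proc/iomem, for example with a trivial dump like this (full addresses usually require root):

    #include <stdio.h>

    /* Print the kernel's view of the physical address map.
     * /proc/iomem lists physical ranges and what claimed them
     * (System RAM, PCIe windows, on-chip peripherals, ...). */
    int main(void)
    {
        FILE *f = fopen("/proc/iomem", "r");
        char line[256];

        if (!f) {
            perror("fopen");
            return 1;
        }
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);
        fclose(f);
        return 0;
    }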
The memory map describes the layout of the memory of your device.
It tells your OS where it can place data and how that memory is accessed, as some areas may only be accessible in a privileged state.
Your boot image will be placed in the boot area. This defines, among other things, your entry point.

Windows Program Memory Vs Linux Program Memory

Linux creates virtual memory pages for every program to use, and the OS handles mapping the virtual addresses to genuine hardware addresses, correct?
But how does Windows do this? Do Windows programs actually have memory that translates to real hardware addresses? I'm also aware that Windows can use hard disk space when RAM is overused, and this process is again called virtual memory, but I believe this is an entirely different concept?
Windows and Linux (at least on Intel 32/64 bit systems) both implement virtual memory using the same mechanism: hardware supported page tables. The OS and the hardware cooperate together to do the address mapping.
The entire concept of separating the logical addresses a program uses from the physical addresses is what is called virtual memory. The use of the hard disk as a backing store is an implementation of virtual memory that uses a swap file to increase the amount of virtual memory to an amount greater than the physical memory installed in the system.
Virtual memory is a pretty deep and wide subject. Maybe start with the Wikipedia article on memory management and then hit the googles for a deeper understanding.
