How do VMs virtualize hardware I/O?

Suppose I have a machine running Mac OS X, which is running VMware, which is running Ubuntu, which is running the canonical helloworld.c in a shell. What is the high-level sequence of events that occurs between me pressing Enter and Hello World! popping up on my screen?
I can understand that everything sitting above Ubuntu is oblivious to the virtualization. I can also somewhat understand that, from the point of view of Mac OS X, VMware is just another program; nothing special there. However, I don't understand how Ubuntu thinks it's interacting with hardware, especially if it's not actually running in kernel mode.
I'm just learning about operating systems, so I may not understand the full picture. Is there an additional software/firmware layer underneath the OS which the hypervisor emulates?

What 'Ubuntu' (or any other application) is, is a set of bytes that represent either instructions (opcodes along with their arguments) or data.
The instructions are decoded and executed by the CPU. The data is mostly read into memory (let's say a group of constants, static variables, etc.).
VMware is basically a virtual computer hardware platform (here, a virtualization of the x86 platform). This means that it takes the bytes of whatever runs inside it (a raw binary, a PE or ELF executable, whatever) and behaves towards them like an x86 machine. In practice most guest instructions run directly on the real CPU, and only the privileged ones and device accesses are intercepted and emulated in software. If done properly, this is indistinguishable from real hardware to anything running inside it.
This isn't abstraction: it doesn't hide the way of communicating with the hardware behind some higher-level access method (like the Linux filesystem, for example). It just tries to act like an x86 machine as well as it can. An abstraction would be a clearly visible layer.
As an example - C is an abstraction over ASM/machine language, you can tell the difference between them quite clearly.
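To make the "act like a CPU" idea concrete, here is a minimal sketch of a fetch/decode/execute loop in which a device write by the guest ends up in ordinary host code. Everything in it is hypothetical: the three opcodes, the `VirtualSerialPort` class and the port numbering are invented for illustration, and this is not how VMware is actually implemented.

```python
# Toy model only: a made-up 3-instruction machine, not real x86.
LOAD, OUT, HALT = 0x01, 0x02, 0xFF   # hypothetical opcodes


class VirtualSerialPort:
    """Emulated device: the guest believes it is writing to hardware."""

    def __init__(self):
        self.received = []

    def write(self, value):
        self.received.append(chr(value))


def run_guest(program, devices):
    """Fetch, decode and execute guest bytes one instruction at a time."""
    acc = 0   # the guest's single register
    pc = 0    # the guest's program counter
    while True:
        opcode = program[pc]
        if opcode == LOAD:             # LOAD imm8: put a constant in acc
            acc = program[pc + 1]
            pc += 2
        elif opcode == OUT:            # OUT port8: send acc to a device
            port = program[pc + 1]
            devices[port].write(acc)   # the "hardware" is just host code
            pc += 2
        elif opcode == HALT:
            return
        else:
            raise ValueError(f"illegal opcode {opcode:#x} at {pc}")


def assemble_print(text, port):
    """Build a guest program that writes each character to the given port."""
    program = []
    for ch in text:
        program += [LOAD, ord(ch), OUT, port]
    return bytes(program + [HALT])


serial = VirtualSerialPort()
run_guest(assemble_print("Hi", port=0), {0: serial})
output = "".join(serial.received)
print(output)
```

The guest program never knows that port 0 is a Python object; from its side it simply executed an output instruction, which is the sense in which the guest OS "thinks" it is talking to hardware.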

Related

Xen binary rewriting method

In full virtualization, what is the CPL of the guest OS?
In paravirtualization, the CPL of the guest OS is 1 (ring 1). Is it the same in full virtualization?
I have also heard that some x86 privileged instructions are not easily handled, and so a "binary rewriting" method is required. How does this "binary rewriting" happen?
I understand that in virtualization the CPU is not emulated. So how can the hypervisor change the binary instruction codes before the CPU executes them? Does it predict the next instruction in memory and update the memory contents before the CPU gets there?
If this is true, I think the hypervisor code (performing binary rewriting) needs to intercept the CPU every time before some instruction of the guest OS is executed. I think this is absurd.
A specific explanation will be appreciated.
Thank you in advance!
If by full virtualization you mean hardware-supported virtualization, then the CPL of the guest is the same as if it were running on bare metal.
Xen never rewrites the binary.
This is something that VMware does (as far as I understand). To the best of my understanding (but I have never seen the VMware source code), the method consists of basically doing runtime patching of code that needs to run differently. Typically, this involves replacing an existing opcode with something else: either something that causes a trap to the hypervisor, or a replacement set of code that "does the right thing". As I understand it, the hypervisor in VMware "learns" the code by single-stepping through a block, and either applies binary patches or marks the section as "clear" (doesn't need changing). The next time this code gets executed, it has already been patched or is clear, so it can run at "full speed".
In Xen, when using paravirtualization (ring compression), the code in the OS has been modified to be aware of the virtualized environment, and as such is "trusted" to understand certain things. But the hypervisor will still trap, for example, writes to the page table (otherwise someone could write a malicious kernel module that modifies the page table to map in another guest's memory, or some such).
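The "scan a block once, then run the patched copy at full speed" idea can be sketched as a translation cache. This is only a toy under stated assumptions: the instruction names, the `SENSITIVE` set and the block layout are made up, and a real translator works on x86 machine code rather than Python lists.

```python
# Toy sketch: hypothetical instruction names, not real x86 encodings.
NOP, ADD, CLI, TRAP = "nop", "add", "cli", "trap"
SENSITIVE = {CLI}          # instructions the guest must not run directly

translation_cache = {}     # block start address -> patched block
stats = {"scans": 0}       # how many times a block had to be scanned


def translate(block):
    """Replace each sensitive instruction with a trap to the hypervisor."""
    return [(TRAP, op) if op in SENSITIVE else (op, None) for op in block]


def execute_block(address, guest_memory, trapped):
    if address not in translation_cache:       # slow path: first visit only
        stats["scans"] += 1
        translation_cache[address] = translate(guest_memory[address])
    for op, original in translation_cache[address]:   # fast path
        if op == TRAP:
            trapped.append(original)   # the hypervisor would emulate it here
        # every other instruction would run unmodified on the real CPU


guest_memory = {0x1000: [NOP, CLI, ADD], 0x2000: [ADD, ADD]}
trapped = []
for _ in range(3):
    execute_block(0x1000, guest_memory, trapped)
execute_block(0x2000, guest_memory, trapped)
print(stats["scans"], trapped)
```

The block at 0x1000 is scanned once even though it runs three times, and the block at 0x2000 contains nothing sensitive, so its cached copy is effectively marked "clear".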
The HVM method does intercept CERTAIN instructions - but the rest of the code runs at normal full speed, thanks to the hardware support in modern processors, such as SVM in AMD and VMX in Intel processors. ARM has a similar technology in the latest models of their processors, but I'm not sure what the name of it is.
I'm not sure if I've answered quite all of your questions, if I've missed something, or it's not clear enough, feel free to ask...

Monitoring the instructions of a running program in ubuntu?

I'm a little stuck here.
The idea is that I'd like to get a file of every instruction run by a program during its execution. I'd like to do it with just the executable in hand (no source) and be able to determine what operation is occurring on what address, and when.
For example, I'd like to be able to run it on Google Chrome, Firefox, etc.
I want to use this for a performance prediction system I'm working on. I figure that if I can obtain each instruction in the order it is executed on system 1, I can attempt to simulate/model the run time of an identical program on system 2, because I'll be able to predict (although I know not with 100% accuracy) L1/L2 cache misses, L1/L2 cache hits, TLB hits/misses, page faults, time taken on floating-point multiplication operations, etc.
I'd like to try to do this on two different systems:
System 1: Ubuntu 10.10 on Intel Core 2 Duo CPU
System 2: Ubuntu 12.04 on system with 2x AMD Sixteen Core Opteron model 6274
(I can definitely change the OSes as necessary, but would prefer to stay with Ubuntu, if possible)
Is this possible / how could I go about doing it? I know with debuggers, you can use them to step through everything, but I don't have the source available.
I think you can use qemu (or even bochs) or valgrind to monitor every executed instruction. qemu and valgrind are binary translation tools (bochs, by contrast, is an interpreter of x86 code). There is a valgrind tool called cachegrind (plus the kcachegrind GUI), which can emulate a cache by instrumenting every memory access and simulating an L1/L2 cache model (sizes can be configured via command-line options).
To get deeper (into the pipeline), you may want to look at the free PTLsim (http://www.ptlsim.org/).
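To illustrate what "simulating a cache model over every memory access" means, here is a minimal sketch of a set-associative LRU cache driven by an address trace. It is not cachegrind's implementation; the class name, the cache parameters (32 KiB, 64-byte lines, 8 ways) and the synthetic trace are assumptions chosen for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """LRU set-associative cache model driven by a memory-address trace."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = True
        return False


# Hypothetical L1 data cache: 32 KiB, 64-byte lines, 8-way.
l1 = SetAssociativeCache(32 * 1024, 64, 8)
# Synthetic trace: 64 sequential 4-byte loads starting at a line boundary.
trace = [0x1000 + 4 * i for i in range(64)]
for address in trace:
    l1.access(address)
print(l1.hits, l1.misses)
```

The 64 loads cover 256 bytes, which is four cache lines, so the model reports four cold misses and hits for everything else. A real trace from an instrumentation tool would replace the synthetic `trace` list.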

Address space identifiers using qemu for i386 linux kernel

Friends, I am working on an in-house architectural simulator which is used to simulate the timing-effect of a code running on different architectural parameters like core, memory hierarchy and interconnects.
I am working on a module that takes the actual trace of a running program from an emulator like "PinTool" or "qemu-linux-user" and feeds this trace to the simulator.
Till now my approach was like this :
1) take objdump of a binary executable and parse this information.
2) Now the emulator has to just feed me an instruction-pointer and other info like load-address/store-address.
Such approaches work only if the program content is known.
But now I have been trying to take traces of an executable running on top of a standard Linux kernel. The problem now is that the base kernel image does not contain the code for LKMs (Loadable Kernel Modules). Also, the daemons are not known when starting a kernel.
So, my approach to this solution is :
1) use qemu to emulate a machine.
2) When an instruction is encountered for the first time, I will parse it and save this info. for later.
3) create a helper function which sends the ip, load/store address when an instruction is executed.
I am stuck in step 2. How do I differentiate between different processes from qemu, which is just an emulator and does not know anything about the guest OS?
I can modify the scheduler of the guest OS but I am really not able to figure out the way forward.
Sorry if the question is very lengthy. I know I could have abstracted some part but felt that some part of it gives an explanation of the context of the problem.
In the first case, using qemu-linux-user to perform user mode emulation of a single program, the task is quite easy because the memory is linear and there is no virtual memory involved in the emulator. The second case of whole system emulation is a lot more complex, because you basically have to parse the addresses out of the kernel structures.
If you can get the virtual addresses directly out of QEMU, your job is a bit easier; then you just need to identify the process and everything else functions just like in the single-process case. You might be able to get the PID by faking a system call to getpid().
Otherwise, this all seems quite similar to debugging a system from a physical memory dump. There are some tools for this task. They are probably too slow to run for every instruction, but you can look for hints there.
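Once a process can be identified, the "parse an instruction the first time it is seen" step from the question can be keyed on the pair (process identity, instruction pointer) instead of the instruction pointer alone. The sketch below assumes the emulator can hand over some address-space identifier per instruction (a PID, or on x86 the page-table base held in CR3); how to obtain it is not shown, and `decode` and `on_instruction` are hypothetical names, not QEMU APIs.

```python
decoded = {}   # (asid, ip) -> decoded instruction info, filled on first sight


def decode(raw_bytes):
    """Hypothetical stand-in for a real instruction decoder."""
    return {"length": len(raw_bytes), "hex": raw_bytes.hex()}


def on_instruction(asid, ip, raw_bytes, trace):
    """Helper called once per executed instruction (assumed emulator hook)."""
    key = (asid, ip)
    if key not in decoded:             # first time in this address space
        decoded[key] = decode(raw_bytes)
    trace.append(key)                  # the simulator only needs the key


trace = []
# Two processes execute different code at the same virtual address.
on_instruction(asid=0x1000, ip=0x400000, raw_bytes=b"\x90", trace=trace)
on_instruction(asid=0x2000, ip=0x400000, raw_bytes=b"\x31\xc0", trace=trace)
on_instruction(asid=0x1000, ip=0x400000, raw_bytes=b"\x90", trace=trace)
print(len(decoded), len(trace))
```

Keying on the address space keeps two processes that map different code at the same virtual address from overwriting each other's decoded entries.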

Would executable files be Machine Code - made for the hardware?

Here is a quote from Wikipedia:
"In computing, an executable file causes a computer 'to perform indicated tasks according to encoded instructions'" (machine code?)
"Modern operating systems retain control over the computer's resources, requiring that individual programs make system calls to access privileged resources. Since each operating system family features its own system call architecture, executable files are generally tied to specific operating systems."
Well, this is my perspective.
Executables cannot be machine code, as they need to talk to the OS for hardware services (system calls). Hence an executable is not yet "machine code". Perhaps some part of the code is actual machine code and some parts are just meant to call the machine code embedded in the operating system? Overall it contains some chunks of machine code and some chunks of code to call the operating system.
Edited after Damon's answer:
In the end, the OS is a set of machine code. Basically the OS would be doing the job of copying the user's machine code (created by the C compiler) into memory, and then, if an instruction is a system call, control transfers to the OS memory region for handling it. Now the question is: what machine code generated from C can do this part, i.e. ask to transfer control to the OS? I suppose it's system calls at a higher abstraction, but how does it work under the hood?
I get the feeling it's similar to a chicken-and-egg problem: C creates the OS, and C uses the OS. I can't find out exactly how the process goes.
Can anyone solve the puzzle for me?
One thing does not exclude the other. Executables are (unless they are some form of bytecode running in a virtual machine) machine code. However, there are different kinds of instructions, some of which are not usable at certain privilege levels.
That is where the operating system comes in: it is "machine code" that runs at the highest privilege level, working as arbiter for the "important" parts and tasks, such as deciding who gets CPU time and what value goes into some hardware register.
(originally comment, made an answer by request)
EDIT: About your extended question, this works approximately as follows. When the computer is turned on, the processor runs at its highest privilege level. In this "mode", the BIOS, the boot loader, and the operating system can do just what they want. This sounds great, but you don't want any kind of code being able to do just whatever it wants.
For example, the code can tell the MMU which memory pages are allowed to be read or written to, and which ones are not. Or, it can define what address is called if "something special" such as a trap or interrupt happens. Or, it can directly write to some special memory addresses that map ports of some devices (disk, network, whatever).
Eventually, the OS switches to "unprivileged" mode and calls some non-OS code. When a trap or interrupt happens, execution is interrupted and continues elsewhere (as specified by the OS previously), and the privilege level is upped again. Once the interrupt has been dealt with, privilege is taken away, and user code is called again.
If a user program needs the OS to do something "OS like", it sets up parameters according to an agreed scheme (for example in some particular registers) and executes a trap instruction.
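The "set up parameters in registers, then execute a trap instruction" convention can be modelled with a toy CPU. This is a sketch under assumptions, not any real ABI: the register names `r0`/`r1`, the call number `SYS_WRITE` and the `ToyCPU` class are all invented for illustration.

```python
class PrivilegeViolation(Exception):
    """Raised when user-mode code touches something reserved for the kernel."""


SYS_WRITE = 1   # hypothetical system call number


class ToyCPU:
    def __init__(self):
        self.privileged = False              # start in user mode
        self.regs = {"r0": 0, "r1": None}    # hypothetical registers
        self.console = []                    # stands in for a device

    def write_console(self, text):
        if not self.privileged:
            raise PrivilegeViolation("console access from user mode")
        self.console.append(text)

    def trap(self):
        """Trap instruction: raise privilege, run the kernel, drop privilege."""
        self.privileged = True
        try:
            kernel_trap_handler(self)
        finally:
            self.privileged = False


def kernel_trap_handler(cpu):
    """Kernel side: look at the agreed registers and do the requested work."""
    if cpu.regs["r0"] == SYS_WRITE:
        cpu.write_console(cpu.regs["r1"])
    else:
        raise ValueError(f"unknown system call {cpu.regs['r0']}")


def user_program(cpu):
    cpu.regs["r0"] = SYS_WRITE         # which service is wanted
    cpu.regs["r1"] = "Hello World!"    # its argument
    cpu.trap()


cpu = ToyCPU()
user_program(cpu)
try:
    cpu.write_console("sneaky")        # bypassing the kernel must fail
    direct_access_blocked = False
except PrivilegeViolation:
    direct_access_blocked = True
print(cpu.console, direct_access_blocked)
```

The user program itself is "just code"; the only special thing it does is the trap, which is the single instruction that hands control to the kernel.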
This is, for example, how things like multithreading or virtual memory are implemented. At regular intervals, a timer fires off an interrupt, which stops execution of "normal" code and calls some code in the kernel (in privileged mode). That code then decides which user process control should be returned to, according to some kind of priority scheme. Those are the "CPU time slices" that are handed out.
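The time-slice mechanism can be sketched as a small simulation. Note that it swaps in plain round-robin in place of "some kind of priority scheme" to keep the example short; the job names, work units and slice length are hypothetical.

```python
from collections import deque


def run_round_robin(jobs, time_slice):
    """Simulate timer-driven scheduling.

    jobs maps a process name to its remaining work units. Returns the list of
    (name, units_run) slices in the order they were handed out.
    """
    ready = deque(jobs.items())
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(time_slice, remaining)   # run until the timer fires or done
        schedule.append((name, ran))
        remaining -= ran
        if remaining > 0:
            ready.append((name, remaining))   # preempted: back of the queue
    return schedule


schedule = run_round_robin({"A": 5, "B": 2, "C": 3}, time_slice=2)
print(schedule)
```

Each tuple in the result corresponds to one interval between timer interrupts; a process that still has work left is put back in the queue rather than being allowed to keep the CPU.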
If some process reads from or writes to a page that it isn't allowed, a trap is generated by the MMU. The OS then looks at what happened and where, and decides whether to load some data from disk into some memory region (and possibly purge something else) and change the process' mappings, or whether to kill the process with a "segmentation fault" error.
Of course, in reality it is a million times more complicated, but in principle that's about how it works.
It does not really matter whether the OS or the programs were originally written in C or with an assembler. To the processor, it's just a sequence of machine instructions. Even a python or perl script is "just machine instructions" in the end, only with a detour via the interpreter.

Protected Mode Keyboard Access on x86 Assembly

I'm working on keyboard input for a very basic kernel that I'm developing and I'm completely stuck. I can't seem to find any information online that can show me the information I need to know.
My kernel is running in protected mode right now, so I can't use the real mode keyboard routines without jumping into real mode and back, which I'm trying to avoid. I want to be able to access my keyboard from protected mode. Does anyone know how to do this? The only thing I have found so far is that it involves talking to the controller directly using in/out ports, but beyond that I'm stumped. This, of course, is not something that comes up very often. Normally, assembly tutorials assume you're running an operating system underneath.
I'm very new to the x86 assembly, so I'm just looking for some good resources for working with the standard hardware from protected mode. I'm compiling the Assembly source code with NASM and linking it to the C source code compiled with DJGPP. Any suggestions?
The MIT operating systems class has lots of good references. In particular, check out Adam Chapweske's resources on keyboard and mouse programming.
In short, yes, you will be using the raw in/out ports, which requires either running in kernel mode, or having the I/O permission bits (IOPL) set in the EFLAGS register. See this page for more details on I/O permissions.
You work with standard legacy hardware the same way in real and protected modes. In this case, you want to talk with the 8042 at I/O ports 0x60 to 0x6f (in practice, the data port 0x60 and the status/command port 0x64), which in turn will talk to the controller within the keyboard at the other end of the wire.
A quick Google search found me an interesting resource at http://heim.ifi.uio.no/~stanisls/helppc/8042.html (for the 8042) and http://heim.ifi.uio.no/~stanisls/helppc/keyboard_commands.html (for the keyboard).
In case you are not used to it, you talk with components at I/O ports via the IN (read) and OUT (write) opcodes, which receive the I/O port number (a 16-bit value) and the value to be read or written (either 8, 16, or 32 bits). Note that the size read or written is important! Writing 16 bits to something which is expecting 8 bits (or vice versa) is a recipe for disaster. Get used to these opcodes, since you will be using them a lot (it is the only way to talk to some peripherals, including several essential ones; other peripherals use memory-mapped I/O (MMIO) or bus-mastering DMA).
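The polling logic for the 8042 can be sketched independently of the actual IN instruction. The model below assumes a caller-supplied `inb(port)` function (hypothetical; in a kernel it would wrap the 8-bit IN opcode) and uses a fake controller to stand in for the hardware. It covers only a few scancode set 1 make codes and ignores extended (0xE0-prefixed) keys.

```python
DATA_PORT = 0x60             # read scancodes here
STATUS_PORT = 0x64           # read controller status here
OUTPUT_BUFFER_FULL = 0x01    # status bit 0: a byte is waiting at DATA_PORT

# Partial scancode set 1 table (make codes only).
SCANCODES = {0x1E: "a", 0x30: "b", 0x2E: "c", 0x1C: "enter"}


def read_key_event(inb):
    """Busy-wait for one scancode and decode it.

    inb is a hypothetical callable inb(port) -> int doing an 8-bit port read.
    Returns (key_name, released).
    """
    while not (inb(STATUS_PORT) & OUTPUT_BUFFER_FULL):
        pass                              # nothing to read yet, keep polling
    code = inb(DATA_PORT)
    released = bool(code & 0x80)          # break code = make code | 0x80
    key = SCANCODES.get(code & 0x7F, "unknown")
    return key, released


class FakeController:
    """Stands in for the real 8042 so the logic can run on a normal host."""

    def __init__(self, pending):
        self.pending = list(pending)

    def inb(self, port):
        if port == STATUS_PORT:
            return OUTPUT_BUFFER_FULL if self.pending else 0
        if port == DATA_PORT:
            return self.pending.pop(0)
        raise ValueError(f"unexpected port {port:#x}")


fake = FakeController([0x1E, 0x9E])       # press 'a', then release 'a'
events = [read_key_event(fake.inb) for _ in range(2)]
print(events)
```

A real kernel would normally handle IRQ 1 instead of busy-waiting, but the status check and the make/break decoding stay the same.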
The 8042 PS/2 Controller looks like the simplest possibility.
The oszur11 OS tutorial contains a working example under https://sourceforge.net/p/oszur11/code/ci/master/tree/Chapter_06_Shell/04_Makepp/arch/i386/arch/devices/i8042.c
Just:
sudo apt-get install build-essential qemu
sudo ln -s /usr/bin/qemu-system-i386 /usr/bin/qemu
git clone git://git.code.sf.net/p/oszur11/code oszur11
cd oszur11/Chapter_06_Shell/04_Makepp
make qemu
Tested on Ubuntu 14.04 AMD64.
My GitHub mirror (upstream inactive): https://github.com/cirosantilli/oszur11-operating-system-examples
Not reproducing it here because the code is too long; will update if I manage to isolate the keyboard part in a minimal example.

Resources