How does linux protect memory? - linux

I'm interested in how linux runs in protected mode from an assembly point of view. Which registers and interrupts are used when it comes to putting the cpu in protected mode for an i386:0x86_64 machine? I understand how memory managment works when I look at the c source of functions like mmap and mprotect, however whats keeping me from taking over with assembly? Where can I get more info on this?

I believe you're looking for arch/x86/mm/ -- arch/x86/mm/init.c sets up the page tables for the correct architecture (ia32 or AMD64) and takes into account the processor features available (PSE, PGE, etc.).

It bears stressing: This is a function of the processor. Linux tells the processor what to protect, and the processor does it.
AFA the system call interface, have a glance at http://stromberg.dnsalias.org/~strombrg/pbmonherc.html from back before the C library had mmap, but after the Linux kernel did. See file mmap.c.

Related

Why did bootsect move itself to 0x90000 in linux(x86)?

I'm studying the process of x86-system booting
and Here is the booting flow:
BIOS load the bootsect from disk MBR to 0x7c00 memory address
boosect copy itself to 0x90000 memory address and jump to 0x90000.
boosect load setup from disk to 0x90200 memory address.
Get some system peripheral device parameters (video, root disk, keyboard,…,etc.) and jump to 0x90200.
Switch system into protected mode move kernel from 0x10000(64K) to 0x0000
Jump to 0x0000 and execute head.s for kernel boot
My question is that why we need to move bootsect itself to 0x90000 first?
Why can't we just move setup and system?
Thanks.
It was (and still is) a good practice to "shadow copy" your bootloader and jump to it. This practice began early when the typical boot loader was limited to the size of a single segment on an x86 processor and a single read sector from disk. Once interrogating the hardware a boot loader could do more advanced work, like install system files (calls, hooks, TSRs, etc), be taken over by viruses, or initialize protected mode and start performing hardware paging of applications, etc.
The origin of the "behavior" predates Linux, you should find that this behavior was common to x86 bootloaders. Possibly any computer based on the IBM PC.
The code presently in Linux was probably derived from this:
Fx. https://stuff.mit.edu/afs/sipb/user/warlord/C/memtest86/bootsect.s
In which case the choice to relocate to 0x90000 is likely arbitrary, the goal was to move the loader out of the default location into a location of its own choice where it wouldn't be tampered with by programs which might allocate from "low mem" (in effect: as a matter of practice.)
I would like to see a definite reason myself :) pretty sure it's really just a remnant of a time when the x86 platform was a DOS platform, and as the hardware evolved new tricks were employed to remain backward compatible with "unfriendly" lowmem code.
I believe that moving the boot sector out of the way was mostly a matter of convenience - there is no hard technical reason that it could not be done otherwise.
That said, 0x7c00 lies less than 32KiB from the start of the memory. 32KiB is often not enough for the setup stage of the kernel, let alone the kernel itself. 0x90000 is well under the area that is reserved by the PC BIOS, while also leaving enough space for the kernel.
In any case, the process you are referring to has not been used by the Linux kernel for several years. The addresses you mentioned are used by versions of the Linux Boot Protocol before v2.02, which was first used with linux-2.4.0. I think that the kernel itself stopped being directly bootable with linux-2.6.0 or so. The arch/i386/boot/bootsect.S file of that version would output a message to that effect when someone attempted to boot the kernel directly.
These days the kernel is usually loaded by a separate bootloader, which is free to use whatever approach it wishes as long as it complies with the boot protocol. The bootloader may have several stages and may even do kernel-y things, such as switching to protected mode itself.

Linux kernel assembly and logic

My question is somewhat weird but I will do my best to explain.
Looking at the languages the linux kernel has, I got C and assembly even though I read a text that said [quote] Second iteration of Unix is written completely in C [/quote]
I thought that was misleading but when I said that kernel has assembly code I got 2 questions of the start
What assembly files are in the kernel and what's their use?
Assembly is architecture dependant so how can linux be installed on more than one CPU architecture
And if linux kernel is truly written completely in C than how can it get GCC needed for compiling?
I did a complete find / -name *.s
and just got one assembly file (asm-offset.s) somewhere in the /usr/src/linux-headers-`uname -r/
Somehow I don't think that is helping with the GCC working, so how can linux work without assembly or if it uses assembly where is it and how can it be stable when it depends on the arch.
Thanks in advance
1. Why assembly is used?
Because there are certain things then can be done only in assembly and because assembly results in a faster code. For eg, "you can get access to unusual programming modes of your processor (e.g. 16 bit mode to interface startup, firmware, or legacy code on Intel PCs)".
Read here for more reasons.
2. What assembly file are used?
From: https://www.kernel.org/doc/Documentation/arm/README
"The initial entry into the kernel is via head.S, which uses machine
independent code. The machine is selected by the value of 'r1' on
entry, which must be kept unique."
From https://www.ibm.com/developerworks/library/l-linuxboot/
"When the bzImage (for an i386 image) is invoked, you begin at ./arch/i386/boot/head.S in the start assembly routine (see Figure 3 for the major flow). This routine does some basic hardware setup and invokes the startup_32 routine in ./arch/i386/boot/compressed/head.S. This routine sets up a basic environment (stack, etc.) and clears the Block Started by Symbol (BSS). The kernel is then decompressed through a call to a C function called decompress_kernel (located in ./arch/i386/boot/compressed/misc.c). When the kernel is decompressed into memory, it is called. This is yet another startup_32 function, but this function is in ./arch/i386/kernel/head.S."
Apart from these assembly files, lot of linux kernel code has usage of inline assembly.
3. Architecture dependence?
And you are right about it being architecture dependent, that's why the linux kernel code is ported to different architecture.
Linux porting guide
List of supported arch
Things written mainly in assembly in Linux:
Boot code: boots up the machine and sets it up in a state in which it can start executing C code (e.g: on some processors you may need to manually initialize caches and TLBs, on x86 you have to switch to protected mode, ...)
Interrupts/Exceptions/Traps entry points/returns: there you need to do very processor-specific things, e.g: saving registers and reenabling interrupts, and eventually restoring registers and properly returning to user mode. Some exceptions may be handled entirely in assembly.
Instruction emulation: some CPU models may not support certain instructions, may not support unaligned data access, or may not have an FPU. An option is using emulation when getting the corresponding exception.
VDSO: the VDSO is a virtual library that the kernel maps into userspace. It allows e.g: selecting the optimal syscall sequence for the current CPU (on x86 use sysenter/syscall instead of int 0x80 if available), and implementing certain system calls without requiring a context switch (e.g: gettimeofday()).
Atomic operations and locks: Maybe in a future some of these could be written using C11 support for atomic operations.
Copying memory from/to user mode: Besides using an optimized copy, these check for out-of-bounds access.
Optimized routines: the kernel has optimized version of some routines, e.g: crypto routines, memset, clear_page, csum_copy (checksum and copy to another place IP data in one pass), ...
Support for suspend/resume and other ACPI/EFI/firmware thingies
BPF JIT: newer kernels include a JIT compiler for BPF expressions (used for example by tcpdump, secmode mode 2, ...)
...
To support different architectures, Linux has assembly code (re-)written for each architecture it supports (and sometimes, there are several implementations of some code for different platforms using the same CPU architecture). Just look at all the subdirectories under arch/
Assembly is needed for a couple of reasons.
There are many instructions that are needed for the operation of an operating system that have no C equivalent, at least on most processors. A good example on Intel x86/64 processors is the iret instruciton, which returns from hardware/software interrupts. These interrupts are key to handling hardware events (like a keyboard press) and system calls from programs on older processors.
A computer does not start up in a state that is immediately ready for execution of C code. For an Intel example, when execution gets to the startup routine the processor may not be in 32-bit mode (or 64-bit mode), and the stack required by C also may not be ready. There are some other features present in some processors (like paging) which need to be turned on from assembly as well.
However, most of the Linux kernel is written in C, which interfaces with some platform specific C/assembly code through standardized interfaces. By separating the parts in this way, most of the logic of the Linux kernel can be shared between platforms. The build system simply compiles the platform independent and dependent parts together for specific platforms, which results in different executable kernel files for different platforms (and kernel configurations for that matter).
Assembly code in the kernel is generally used for low-level hardware interaction that can't be done directly from C. They're like a platform- specific foundation that's used by higher-level parts of the kernel that are written in C.
The kernel source tree contains assembly code for a variety of systems. When you compile a kernel for a particular type of system (such as an x86 PC), only the appropriate assembly code for that platform is included in the build process.
Linux is not the second version of Unix (or Unix in general). It is Unix compatible, but Unix and Linux have separate histories and, in terms of code base (of their kernels), are completely separate. Linus Torvald's idea was to write an open source Unix.
Some of the lower level things like some of the architecture dependent parts of memory management are done in assembly. The old (but still available) Linux kernel API for x86, int 0x80, is implemented in assembly. There are probably other places in the kernel that are implemented in assembly, but I don't know any others.
When you compile the kernel, you select an architecture to target. Depending on the target, the right assembly files for that architecture are included in the build.
The reason you don't find anything is because you're searching the headers, not the sources. Download a tar ball from kernel.org and search that.

Why do we need a bootloader in an embedded device?

I'm working with ELinux kernel on ARM cortex-A8.
I know how the bootloader works and what job it's doing. But i've got a question - why do we need bootloader, why was the bootloader born?
Why we can't directly load the kernel into RAM from flash memory without bootloader? If we load it what will happen? In fact, processor will not support it, but why are we following the procedure?
In the context of Linux, the boot loader is responsible for some predefined tasks. As this question is arm tagged, I think that ARM booting might be a useful resource. Specifically, the boot loader was/is responsible for setting up an ATAG list that describing the amount of RAM, a kernel command line, and other parameters. One of the most important parameters is the machine type. With device trees, an entire description of the board is passed. This makes a stock ARM Linux impossible to boot with out some code to setup the parameters as described.
The parameters allows one generic Linux to support multiple devices. For instance, an ARM Debian kernel can support hundreds of different board types. Uboot or other boot loader can dynamically determine this information or it can be hard coded for the board.
You might also like to look at bootloader info page here at stack overflow.
A basic system might be able to setup ATAGS and copy NOR flash to SRAM. However, it is usually a little more complex than this. Linux needs RAM setup, so you may have to initialize an SDRAM controller. If you use NAND flash, you have to handle bad blocks and the copy may be a little more complex than memcpy().
Linux often has some latent driver bugs where a driver will assume that a clock is initialized. For instance if Uboot always initializes an Ethernet clock for a particular machine, the Linux Ethernet driver may have neglected to setup this clock. This can be especially true with clock trees.
Some systems require boot image formats that are not supported by Linux; for example a special header which can initialize hardware immediately; like configuring the devices to read initial code from. Additionally, often there is hardware that should be configured immediately; a boot loader can do this quickly whereas the normal structure of Linux may delay this significantly resulting in I/O conflicts, etc.
From a pragmatic perspective, it is simpler to use a boot loader. However, there is nothing to prevent you from altering Linux's source to boot directly from it; although it maybe like pasting the boot loader code directly to the start of Linux.
See Also: Coreboot, Uboot, and Wikipedia's comparison. Barebox is a lesser known, but well structured and modern boot loader for the ARM. RedBoot is also used in some ARM systems; RedBoot partitions are supported in the kernel tree.
A boot loader is a computer program that loads the main operating system or runtime environment for the computer after completion of the self-tests.
^ From Wikipedia Article
So basically bootloader is doing just what you wanted - copying data from flash into operating memory. It's really that simple.
If you want to know more about boostrapping the OS, I highly recommend you read the linked article. Boot phase consists, apart from tests, also of checking peripherals and some other things. Skipping them makes sense only on very simple embedded devices, and that's why their bootloaders are even simpler:
Some embedded systems do not require a noticeable boot sequence to begin functioning and when turned on may simply run operational programs that are stored in ROM.
The same source
The primary bootloader is usually built in into the silicon and performs the load of the first USER code that will be run in the system.
The bootloader exists because there is no standardized protocol for loading the first code, since it is chip dependent. Sometimes the code can be loaded through a serial port, a flash memory, or even a hard drive. It is bootloader function to locate it.
Once the user code is loaded and running, the bootloader is no longer used and the correctness of the system execution is user responsibility.
In the embedded linux chain, the primary bootloader will setup and run the Uboot. Then Uboot will find the linux kernel and load it.
Why we can't directly load the kernel into RAM from flash memory without bootloader? If we load it what will happen? In fact, processor will not support it, but why are we following the procedure?
Bartek, Artless, and Felipe all give parts of the picture.
Every embedded processor type (E.G. 386EX, Coretex-A53, EM5200) will do something automatically when it is reset or powered on. Sometimes that something is different depending on whether the power is cycled or the device is reset. Some embedded processors allow you to change that something based on voltages applied to different pins when the device is powered or reset.
Regardless, there is a limited amount of something that a processor can do, because of the physical space on-processor required to define that something, whether it is on-chip FLASH, instruction micro-code, or some other mechanism.
This limit means that the something is
fixed purpose, does one thing as quickly as possible.
limited in scope and capability, typically loading a small block of code (often a few kilobytes or less) into a fixed memory location and executing from the start of the loaded code.
unmodifiable.
So what a processor does in response to reset or power-cycle cannot be changed, and cannot do very much, and we don't want it to automatically copy hundreds of megabytes or gigabytes into memory which may not exist or may not be initialized, and which could take a looooong time.
So....
We set up a small program which is smaller than the smallest size permitted across all of the devices we are going to use. That program is stored wherever the something needs it to be.
Sometimes the small program is U-Boot. Sometimes even U-Boot is too big for initial load, so the small program then in turn loads U-Boot.
The point is that whatever gets loaded by the something, is modifiable as needed for a particular system. If it is U-Boot, great, if not, it knows where to load the main operating system or where to load U-Boot (or some other bootloader).
U-Boot (speaking of bootloaders in general) then configures a minimal set of devices, memory, chip settings, etc., to enable the main OS to be loaded and started. The main OS init takes care of any additional configuration or initialization.
So the sequence is:
Processor power-on or reset
Something loads initial boot code (or U-Boot style embedded bootloader)
Initial boot code (may not be needed)
U-Boot (or other general embedded bootloader)
Linux init
The kernel requires the hardware on which you are working to be in a particular state. All the hardware you used needs to be checked for its state and initialized for its further operation. This is one of the main reasons to use a boot loader in an embedded (or any other environment), apart from its use to load a kernel image into the RAM.
When you turn on a system, the RAM is also not in a useful state (fully initialized to use) for us to load kernel into it. Therefore, we cannot load a kernel directly (to answer your question)and thus arises the need for a construct to initialize it.
Apart from what is stated in all the other answers - which is correct - in some cases the system has to go through different execution modes, take as example TrustZone for secure ARM chips. It is possible to still consider it as sort of HW initialization, but what makes it peculiar is the fact that there are additional limitations (ex: memory available) that make it impractical, if not impossible, to do everything in a single binary, thus multiple stages of bootloader are available.
Furthermore, for security reason, each of them is signed and can perform its job only if it meets the security requirements.

Address space identifiers using qemu for i386 linux kernel

Friends, I am working on an in-house architectural simulator which is used to simulate the timing-effect of a code running on different architectural parameters like core, memory hierarchy and interconnects.
I am working on a module takes the actual trace of a running program from an emulator like "PinTool" and "qemu-linux-user" and feed this trace to the simulator.
Till now my approach was like this :
1) take objdump of a binary executable and parse this information.
2) Now the emulator has to just feed me an instruction-pointer and other info like load-address/store-address.
Such approaches work only if the program content is known.
But now I have been trying to take traces of an executable running on top of a standard linux-kernel. The problem now is that the base kernel image does not contain the code for LKM(Loadable Kernel Modules). Also the daemons are not known when starting a kernel.
So, my approach to this solution is :
1) use qemu to emulate a machine.
2) When an instruction is encountered for the first time, I will parse it and save this info. for later.
3) create a helper function which sends the ip, load/store address when an instruction is executed.
i am stuck in step2. how do i differentiate between different processes from qemu which is just an emulator and does not know anything about the guest OS ??
I can modify the scheduler of the guest OS but I am really not able to figure out the way forward.
Sorry if the question is very lengthy. I know I could have abstracted some part but felt that some part of it gives an explanation of the context of the problem.
In the first case, using qemu-linux-user to perform user mode emulation of a single program, the task is quite easy because the memory is linear and there is no virtual memory involved in the emulator. The second case of whole system emulation is a lot more complex, because you basically have to parse the addresses out of the kernel structures.
If you can get the virtual addresses directly out of QEmu, your job is a bit easier; then you just need to identify the process and everything else functions just like in the single-process case. You might be able to get the PID by faking a system call to get_pid().
Otherwise, this all seems quite a bit similar to debugging a system from a physical memory dump. There are some tools for this task. They are probably too slow to run for every instruction, though, but you can look for hints there.

Arguments to kernel

Is there anything that the kernel need to get from the boot loader.Usually the kernel is capable of bringing up a system from scratch, so why does it require anything from boot-loader?
I have seen boot messages from kernel like this.
"Fetching vars from bootloader... OK"
So what exactly are the variables being passed?
Also how are the variables being passed from the boot-loader? Is it through stack?
The kernel accept so called command-line options, that are text based. This is very useful, because you can do a lot of thing without having to recompile your kernel. As for the argument passing, it is architecture dependent. On ARM it is done through a pointer to a location in memory, or a fixed location in memory.
Here is how it is done on ARM.
Usually a kernel is not capable of booting the machine from scratch. May be from the bios, but then it is not from scratch. It needs some initialisation, this is the job of the bootloader.
There are some parametres that the Linux kernel accepts from the bootloader, of which what I can remember now is the vga parametre. For example:
kernel /vmlinuz-2.6.30 root=/dev/disk/by-uuid/3999cb7d-8e1e-4daf-9cce-3f49a02b00f2 ro vga=0x318
Have a look at 10 boot time parameters you should know about the Linux kernel which explains some of the common parametres.
For the Linux kernel, there are several things the bootloader has to tell the kernel. It includes things like the kernel command line (as several other people already mentioned), where in the memory the initrd has been loaded and its size, if an initrd is being used (the kernel cannot load it by itself; often when using an initrd, the modules needed to acess storage devices are within the initrd, and it can also have to do some quite complex setup before being able to access the storage), and several assorted odds and ends.
See Documentation/x86/boot.txt (link to 2.6.30's version) for more detail for the traditional x86 architecture (both 32-bit and 64-bit), including how these variables are passed to the kernel setup code.
The bootloader doesn't use a stack to pass arguments to the kernel. At least in the case of Linux, there is a rather complex memory structure that the bootloader fills in that the kernel knows how to parse. This is how the bootloader points the kernel to its command line. See Documentaion/x86/boot.txt for more info.
Linux accepts variables from the boot loader to allow certain options to be used. I know that one of the things you can do is make it so that you don't have to log-in (recovery mode) and there are several other options. It mainly just allows fixes to be done if there's an issue with something or for password changing. This is how the Ubuntu Live-CD boots Linux if you select to use another option.
Normally the parameters called command line parameters, which is passed to kernel module from boot loader. Bootloader use many of the BIOS interrupts to detect,
memory
HDD
Processor
Keyboard
Screen
Mouse
ETC...
and all harwares details are going to be detected at boot time, that is in real mode, then pass this parameters to Kernel.

Resources