Where can I find the Process Control Block (PCB) and GDTR/LDTR contents using GDB and QEMU? - linux

I have a bare-bones Linux kernel with a Buildroot setup for debugging with QEMU and GDB. I am using the x86_64 architecture.
I want to check how memory protection works for each process. So basically, I need to find the base and limit values that govern access to physical memory.
If I understood correctly, the GDTR register in the x86 architecture "holds the base address (32 bits in protected mode; 64 bits in IA-32e mode) and the 16-bit table limit for the GDT." If not, please let me know where such information is held.
I tried using i r in GDB, but the output does not show the GDTR/LDTR contents. I read somewhere that another method can be used from inside the kernel to display the contents of these registers.
I also need to check the PCB (Process Control Block) contents, but I can't seem to find a way to do so. I read somewhere that a memory dump taken in the kernel contains the PCB contents, but I can't figure out how to get one.
So, how can I check the contents of the PCB and GDTR/LDTR from GDB?
The setup is a simple QEMU instance that boots the Linux kernel with Buildroot; GDB connects with target remote :1234, and the guest runs a simple C program that calls fork and exec.
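For reference, once the raw bytes are available (for instance from a memory dump), the layout quoted above can be unpacked mechanically. The following is a minimal Python sketch, not a GDB or QEMU feature: decode_gdtr and decode_descriptor are hypothetical helper names, and the sample descriptor value is only assumed to be a typical flat 64-bit code segment. It assumes the 10-byte IA-32e GDTR image (16-bit limit followed by a 64-bit base) and the standard 8-byte segment descriptor format.

```python
import struct


def decode_gdtr(raw: bytes) -> dict:
    """Unpack a 10-byte IA-32e GDTR image: 16-bit limit, then 64-bit base."""
    limit, base = struct.unpack("<HQ", raw)
    # The limit is the offset of the last valid byte, so the table holds
    # (limit + 1) / 8 eight-byte descriptor slots.
    return {"base": base, "limit": limit, "slots": (limit + 1) // 8}


def decode_descriptor(desc: int) -> dict:
    """Split an 8-byte code/data segment descriptor into its fields."""
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    access = (desc >> 40) & 0xFF
    flags = (desc >> 52) & 0xF
    granular = bool(flags & 0x8)
    if granular:
        # With G=1 the limit counts 4 KiB pages instead of bytes.
        limit = (limit << 12) | 0xFFF
    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        "long_mode": bool(flags & 0x2),
        "granular": granular,
    }


if __name__ == "__main__":
    # Assumed sample value: a flat ring-0 64-bit code segment descriptor.
    print(decode_descriptor(0x00AF9B000000FFFF))
```

Note that in 64-bit mode the processor largely ignores the base and limit of code and data segments, so for per-process memory protection on x86_64 the page tables are the more relevant structure.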

Related

ARM64 Kernel Mode Linux: Minimal & maintainable modification for Glibc & Kernel

I'm looking for a way to achieve Kernel Mode Linux without modifying Glibc.
The project is called "Kernel Mode Linux on aarch64". It makes specified processes, not all processes, execute in kernel mode (e.g. programs in /trusted/), which speeds up system call invocation. The background research comes from Toshiyuki Maeda's website and sonicyang/KML.
If a user program executes in kernel mode, it can call the syscall functions directly (monolithic kernel). However, the syscall path is hard-coded in arm64 glibc: every syscall eventually executes "svc 0", which causes an "Instruction Abort" exception (see # define INTERNAL_SYSCALL_RAW(name, nr, args...) \ in sysdeps/unix/sysv/linux/aarch64/sysdep.h). Of course, there is the vDSO (vsyscall) route, but the current implementation doesn't give most syscall functions the option of going the vsyscall way.
Given this, I have two modification plans, but both are missing a critical step.
1. Modify INTERNAL_SYSCALL_RAW in glibc to multiplex between syscall and dl-call (or vsyscall). How can I determine whether the process is in kernel mode or user mode without heavy overhead? (mrs x0, CurrentEL isn't allowed in EL0.)
2. Replace svc 0 with bl dl-call when the ELF binary loader loads the program. The program is loaded by the ELF loader and we set it to kernel mode, no problem; but libc.so is a dynamically linked library, kept as a single copy in a VMA that other, normal user programs use too. How can I deal with this situation? Static linking would work, but the resulting size is really not acceptable.
My understanding is limited, so any practical idea is welcome.
After some research, option 1 could work well as long as I compile a customized glibc. A program that runs in kernel mode must link against the customized glibc, so the system's glibc is not affected.

How can we tell an instruction is from application code or library code on Linux x86_64

I wanted to know whether an instruction is from the application itself or from the library code.
I observed that some application code/data is located at about 0x000055xxxx, while libraries and mmapped regions are by default located at 0x00007fcxxxx. Can I use, for example, 0x00007f00...00 as a boundary to tell whether an instruction is from the application itself or from a library?
How can I configure this boundary in Linux kernel?
Updated.
Can I prevent (or detect) a syscall instruction being issued from application code (only allow it to go through libc)? Maybe we could do a binary scan, but because instructions have variable length, it's hard to rule out unintended syscall instructions.
Do it the other way. You need to learn a lot.
First, read a lot more about operating systems. So read the Operating Systems: Three Easy Pieces textbook.
Then, learn more about ASLR.
Read also Drepper's How to write shared libraries and Levine's Linkers and loaders book.
You want to use pmap(1) and proc(5).
You probably want to parse the /proc/self/maps pseudo-file from inside your program. Or use dladdr(3).
To get some insight, run cat /proc/$$/maps and cat /proc/self/maps in a Linux terminal
I wanted to know whether an instruction is from userspace or from library code.
You are confused: both library code and main executable code are userspace.
On Linux x86_64, you can distinguish kernel addresses from userspace addresses, because the kernel addresses are in the FFFF8000'00000000 through FFFFFFFF'FFFFFFFF range on current (48-bit) implementations. See canonical form address description here.
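The range check described above can be sketched in a few lines of Python. This is only an illustration under the stated 48-bit assumption; is_canonical and is_kernel_address are hypothetical helper names, not part of any library.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def is_kernel_address(addr: int, va_bits: int = 48) -> bool:
    """True for canonical addresses in the upper (sign-extended) half."""
    return is_canonical(addr, va_bits) and (addr >> 63) == 1


if __name__ == "__main__":
    for a in (0x000055D0C8A00000, 0x00007FFFFFFFFFFF, 0xFFFF800000000000):
        print(hex(a), "kernel" if is_kernel_address(a) else "user")
```

This only separates kernel addresses from userspace addresses; it says nothing about whether a userspace address belongs to the executable or to a library.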
I observed that some application code/data is located at about 0x000055xxxx, while libraries and mmapped regions are by default located at 0x00007fcxxxx. Can I use, for example, 0x00007f00...00 as a boundary to tell whether an instruction is from the application itself or from a library?
No, in general you can't. An application can be linked to load anywhere within canonical address space (though most applications aren't).
As Basile Starynkevitch already answered, you'll need to parse /proc/$pid/maps, or know what address the executable is linked to load at (for non-PIE binary).
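As a rough illustration of the maps-parsing approach, here is a minimal Python sketch. parse_maps and owner_of are hypothetical helper names, and the SAMPLE text is made-up data in the /proc/<pid>/maps line format (address range, permissions, offset, device, inode, optional path); real addresses change from run to run because of ASLR.

```python
def parse_maps(text: str) -> list:
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        # Anonymous mappings have no sixth field.
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions


def owner_of(addr: int, regions: list):
    """Return the backing path of the mapping containing addr, or None."""
    for start, end, _perms, path in regions:
        if start <= addr < end:
            return path
    return None


# Made-up sample in the maps format; not taken from a real process.
SAMPLE = """\
55d0c8a00000-55d0c8a01000 r-xp 00000000 08:01 1234      /usr/bin/app
7f3a1c000000-7f3a1c1b0000 r-xp 00000000 08:01 5678      /usr/lib/libc.so.6
7ffd5e000000-7ffd5e021000 rw-p 00000000 00:00 0         [stack]
"""

if __name__ == "__main__":
    regions = parse_maps(SAMPLE)
    print(owner_of(0x7F3A1C000010, regions))
```

On a live system the same functions could be fed the contents of /proc/self/maps or /proc/$pid/maps; comparing the returned path with the path of the main executable then tells application code from library code.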

How does linux protect memory?

I'm interested in how Linux runs in protected mode from an assembly point of view. Which registers and interrupts are used to put the CPU in protected mode on an i386/x86_64 machine? I understand how memory management works when I look at the C source of functions like mmap and mprotect, but what's keeping me from taking over with assembly? Where can I get more info on this?
I believe you're looking for arch/x86/mm/ -- arch/x86/mm/init.c sets up the page tables for the correct architecture (ia32 or AMD64) and takes into account the processor features available (PSE, PGE, etc.).
It bears stressing: This is a function of the processor. Linux tells the processor what to protect, and the processor does it.
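To make "Linux tells the processor what to protect" a little more concrete, here is a simplified Python sketch of the permission bits in a single x86-64 page table entry. decode_pte and access_allowed are hypothetical helper names and the entry values are made up. The check is deliberately reduced: it assumes CR0.WP is set and ignores that real hardware combines the flags of every paging level and applies further features such as SMEP/SMAP.

```python
PHYS_MASK = 0x000FFFFFFFFFF000  # bits 12..51: physical frame address


def decode_pte(entry: int) -> dict:
    """Extract the main protection bits from an x86-64 page table entry."""
    return {
        "present": bool(entry & (1 << 0)),      # P
        "writable": bool(entry & (1 << 1)),     # R/W
        "user": bool(entry & (1 << 2)),         # U/S
        "no_execute": bool(entry & (1 << 63)),  # NX
        "frame": entry & PHYS_MASK,
    }


def access_allowed(entry: int, *, user_mode: bool, write: bool) -> bool:
    """Simplified single-level check; a False result models a page fault."""
    pte = decode_pte(entry)
    if not pte["present"]:
        return False
    if user_mode and not pte["user"]:
        return False
    if write and not pte["writable"]:
        return False
    return True


if __name__ == "__main__":
    user_page = 0x8000000012345067    # made-up: user, writable, NX
    kernel_page = 0x8000000000ABC001  # made-up: supervisor-only, read-only
    print(decode_pte(user_page))
    print(access_allowed(kernel_page, user_mode=True, write=False))
```

Userspace assembly cannot get around this because the page tables themselves and the register that points to them (CR3) are only accessible from ring 0.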
As for the system call interface, have a glance at http://stromberg.dnsalias.org/~strombrg/pbmonherc.html from back before the C library had mmap, but after the Linux kernel did. See file mmap.c.

Arguments to kernel

Is there anything that the kernel needs to get from the boot loader? Usually the kernel is capable of bringing up a system from scratch, so why does it require anything from the boot loader?
I have seen boot messages from kernel like this.
"Fetching vars from bootloader... OK"
So what exactly are the variables being passed?
Also, how are the variables passed from the boot loader? Is it through the stack?
The kernel accepts so-called command-line options, which are text based. This is very useful, because you can do a lot of things without having to recompile your kernel. As for the argument passing, it is architecture dependent. On ARM it is done through a pointer to a location in memory, or a fixed location in memory.
Here is how it is done on ARM.
Usually a kernel is not capable of booting the machine from scratch. Maybe from the BIOS, but then it is not from scratch. It needs some initialisation; this is the job of the bootloader.
There are some parameters that the Linux kernel accepts from the bootloader; the one I can remember now is the vga parameter. For example:
kernel /vmlinuz-2.6.30 root=/dev/disk/by-uuid/3999cb7d-8e1e-4daf-9cce-3f49a02b00f2 ro vga=0x318
Have a look at 10 boot time parameters you should know about the Linux kernel which explains some of the common parameters.
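Because the command line is plain text, it is easy to inspect. The sketch below splits the options from the example above (everything after the kernel image path) into key/value pairs. parse_cmdline is a hypothetical helper, and the splitting is a simplification: it does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Options taken from the example line above, without the image path.
CMDLINE = "root=/dev/disk/by-uuid/3999cb7d-8e1e-4daf-9cce-3f49a02b00f2 ro vga=0x318"

if __name__ == "__main__":
    for name, value in parse_cmdline(CMDLINE).items():
        print(name, "=", value)
```

On a running Linux system, the command line the kernel actually received can be read from /proc/cmdline and passed to the same function.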
For the Linux kernel, there are several things the bootloader has to tell the kernel. It includes things like the kernel command line (as several other people already mentioned), where in the memory the initrd has been loaded and its size, if an initrd is being used (the kernel cannot load it by itself; often when using an initrd, the modules needed to access storage devices are within the initrd, and it may also have to do some quite complex setup before being able to access the storage), and several assorted odds and ends.
See Documentation/x86/boot.txt (link to 2.6.30's version) for more detail for the traditional x86 architecture (both 32-bit and 64-bit), including how these variables are passed to the kernel setup code.
The bootloader doesn't use a stack to pass arguments to the kernel. At least in the case of Linux, there is a rather complex memory structure that the bootloader fills in and that the kernel knows how to parse. This is how the bootloader points the kernel to its command line. See Documentation/x86/boot.txt for more info.
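As a rough picture of that memory structure on x86, the Python sketch below reads a few fields of the setup header at the offsets listed in the boot protocol document (setup_sects at 0x1F1, the "HdrS" magic at 0x202, version at 0x206, ramdisk_image at 0x218, ramdisk_size at 0x21C, cmd_line_ptr at 0x228). parse_setup_header and fake_header are hypothetical helpers, and all values in the synthetic buffer are made up; in a kernel image on disk the bootloader-written fields are normally still zero.

```python
import struct


def parse_setup_header(image: bytes) -> dict:
    """Read a few x86 boot protocol setup header fields from a raw buffer."""
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("missing HdrS magic: not a protocol 2.00+ header")
    (setup_sects,) = struct.unpack_from("<B", image, 0x1F1)
    (version,) = struct.unpack_from("<H", image, 0x206)
    ramdisk_image, ramdisk_size = struct.unpack_from("<II", image, 0x218)
    (cmd_line_ptr,) = struct.unpack_from("<I", image, 0x228)
    return {
        "setup_sects": setup_sects,
        "protocol": f"{version >> 8}.{version & 0xFF:02d}",
        "ramdisk_image": ramdisk_image,  # where the initrd was loaded
        "ramdisk_size": ramdisk_size,    # its size in bytes
        "cmd_line_ptr": cmd_line_ptr,    # address of the command line string
    }


def fake_header() -> bytes:
    """Build a synthetic header with made-up values, for illustration only."""
    buf = bytearray(0x230)
    struct.pack_into("<B", buf, 0x1F1, 30)
    buf[0x202:0x206] = b"HdrS"
    struct.pack_into("<H", buf, 0x206, 0x020A)
    struct.pack_into("<II", buf, 0x218, 0x7F000000, 0x00800000)
    struct.pack_into("<I", buf, 0x228, 0x00099000)
    return bytes(buf)


if __name__ == "__main__":
    print(parse_setup_header(fake_header()))
```

The three bootloader-written fields shown here correspond to the items mentioned above: the initrd location, the initrd size, and the pointer to the kernel command line.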
Linux accepts variables from the boot loader to allow certain options to be used. I know that one of the things you can do is make it so that you don't have to log-in (recovery mode) and there are several other options. It mainly just allows fixes to be done if there's an issue with something or for password changing. This is how the Ubuntu Live-CD boots Linux if you select to use another option.
Normally the parameters are called command-line parameters, and they are passed to the kernel by the boot loader. The bootloader uses many of the BIOS interrupts to detect:
memory
HDD
Processor
Keyboard
Screen
Mouse
etc.
All hardware details are detected at boot time, that is, in real mode, and these parameters are then passed to the kernel.

Is it possible to shutdown linux kernel and resume in Real Mode?

Let's say I'd like to start a small Linux distro before my ordinary operating system starts.
1. The BIOS loads and executes the MBR.
2. The MBR locates the active partition, which is my Linux partition.
3. Linux starts and I perform what I need to do.
4. Linux shuts down and I switch to Real Mode again.
5. The original partition boot sector is loaded and my ordinary OS starts.
AFAIK, step 4 will be the difficult task: restoring the state of all devices to what it was before Linux. Will INT13h be functional? Do I need to restore the Interrupt Vector Table? To mention a few.
Has this been done in any existing project perhaps?
Linux does not normally support this, particularly since it reinitializes hardware in a way that the BIOS and DOS programs may not expect. However, there is some infrastructure to switch back to real mode in specific cases - particularly, for a reboot (see machine_real_restart in arch/x86/kernel/reboot.c) - and has code to reinitialize hardware for kexec or suspend. I suspect you might be able to do something with a combination of these - but I don't know if the result will truly match what DOS or Windows would expect to see on reboot.
A much easier plan would be to use a chainloading bootloader that can be set to boot in a particular configuration once, like GRUB. You could invoke grub-set-default, then reboot. When GRUB comes up, it would then pass control off to Windows. By then setting the fallback OS to the Linux partition, control would return to Linux on the next boot.
Yet another option may be to use Coreboot, but I'm not sure if this is production-ready for booting Windows yet.
I haven't tried this so I don't know if it would work, but here goes:
There is an option in the header of a bzImage format kernel file that specifies the address of real mode code to execute before the protected mode code starts. You could create a minimal bzImage-compliant file which has no actual kernel, but which has real mode code to load your MBR using INT 0x13 to 0x7c00 and jmp into it like the BIOS does.
If you use kexec to load the bzImage using the "-t bzImage-x86 --real-mode" options, it should reset the PE bit in CR0 to drop to real mode (as bdonlan mentioned above) and execute the code pointed to by the bzImage header option.
The bzImage header option is called realmode_swtch and is documented in /usr/src/linux/Documentation/x86/boot.txt; the header format code is in /usr/src/linux/arch/x86/boot/header.S
Have you looked into kexec?
