How to disable memory zeroing in Linux kernel - linux

I 'm trying to disable the Linux kernel's memory zeroing mechanism to investigate its impact. I 'm aware that this is a security feature to protect private data from other processes, however, I consider my set up to be in a safe environment where no malicious users exist (just for experimental purposes).
I downloaded the kernel version 5.2.3 and tried first to “grep“ all “memsets“ from the Memory Management directory (mm) and I started disabling those that looked suspicious. After I recompiled the kernel, updated the grub, and executed a C program that allocates memory, checks if a non-null character exists (exits if it finds one or continues if not), write random information to it, and after frees it.
This program also spawns processes that are doing the same thing.
However, I still cannot achieve it.
What I have commented till now are “memsets“ from mm/page_alloc.c, include/linux/mm.h, arch/sh/include/asm/page.h.

Related

How can a program change a directory without using chdir()?

I can find a lot of documentation on using chdir() to change a directory in a program (a command shell, for instance). I was wondering if it is possible to somehow do the same thing without the use of chdir(). Yet, I can't find any documentation or examples of code where a person is changing directories without using chdir() to some capacity. Is this possible?
In Linux, chdir() is a syscall. That means it's not something a program does in its own memory, but it's a request for the OS kernel to do something on the program's behalf.
Granted, it's one of two syscalls that can change directories -- the other one is fchdir(). Theoretically you could use the other one, though whether that's what your professor actually wants is very much open to interpretation.
In terms of why chdir() and fchdir() can't be reimplemented by an application but need to be leveraged: The current working directory is among the process state maintained by the kernel on a program's behalf; the program itself can't access kernel memory without asking the kernel to operate on its behalf.
Things are syscalls because they need to be syscalls -- if something could be done in-process, it would be done that way (crossing the boundary between userspace and kernelspace involves a context-switch penalty; it's not without performance impact). In this case, letting the kernel do accurate bookkeeping as to what a process's working directory is ensures that the working directory is maintained when a new executable is loaded (with execve()), and helps to ensure the integrity of the kernel's records (making sure a program can't pretend to have its current working directory be a directory it doesn't actually have access to).

Linux memory/disk behavior during ELF execution

I have a series of executables running on a cluster that read in some small config files at execution start, and then do a lot of processing for several hours, and then write out some data and exit. Our sysadmin is trying to tell me that our fileserver is really slow because his analysis is showing that the cluster nodes are spending all their time using NFS disk I/O reading in ELF executables for execution, long after they've been spawned (note: our executables are only a few MB in size). This doesn't sound right to me, as I was under the impression that the dynamic linker loaded the entire executable into memory at runtime and then operated out of memory. I know that the kernel leaves an open file descriptor to the executable while it's running, but I didn't think it was continually reading from it.
My question is, is my understanding of how executables are loaded flawed? I find it hard to believe that the kernel is constantly doing file reads on the executable to fetch instructions, as this would be terribly slow (even with caching) because branch predictions are hardly reliable, so you'd be spending forever reading the executable from disk if your binary performed frequent jumps.
I was under the impression that the dynamic linker loaded the entire executable into memory at runtime and then operated out of memory.
Your impression is incorrect.
First, a minor inaccuracy: while the dynamic linker is responsible for loading shared libraries, the main executable itself is loaded by the kernel before dynamic loader is started.
Second, most current systems use demand paging. The files are mmaped, but the code isn't actually loaded into memory until that code is accessed (i.e. tries to execute). If you never execute some parts of the program, these parts are never loaded into memory at all.
I find it hard to believe that the kernel is constantly doing file reads on the executable to fetch instructions
It doesn't constantly do that. It typically loads the code into memory and the code stays there.
It is possible for the code to be discarded from memory (which would require reloading it again if it executes again) on a system that doesn't have enough memory (this is called thrashing).
because branch predictions are hardly reliable,
Branch prediction
has approximately nothing to do with your problem, and
is exceedingly good on modern CPUs.

Why Kernel Can't Handle Crash Gracefully

For user mode application, incorrect page access doesn't create a lot of trouble other than the application crash and the application crash can be done gracefully by exception handling. Why can't we do the same for kernel crash. So when a kernel module tries to access some invalid address, there is a page fault and the kernel crash. Why it can't be handled gracefully like unloading the faulty module.
More specifically I'm interested to know if it is completely impossible or possible. I am not inclined to know the difficulties it might pose in using the system. I understand a driver crash will result unusable device and I'm okay with that. The only thing is whether it is possible to gracefully unload a faulty driver.
As other answer explains very well why it's not feasible to recover from the kernel crashes, I'll try to tell something else.
There is a lot of research in this area, most notably from prof. Andy Tanenbaum with his MINIX. While the kernel crash is still fatal for the MINIX, MINIX kernel is very simple (micro-kernel) narrowing the space for bugs and inside it most other stuff (including drivers) is running as a user-mode process. So, in case of network driver failure, as they are running in the separate address space, all kernel needs to do is to attempt to restart the driver.
Of course, there are areas where you can't recover (or still can't recover), like in case of the file system crash (see the recent discussion here).
There are several good papers on this topic such as http://pages.cs.wisc.edu/~swami/papers/thesis.pdf and I would highly recommend watching Tanenbaum's videos such this one (title is "MINIX 3: A Reliable and Secure Operating System" in case it ever goes offline).
I think this addresses your comment:
We should be able to unload the faulty module. Why can't we? That is my question. Is it a design choice for security or its not possible at all. If it is a design choice, what factors forced us to make that choice
You could live without screen if graphics driver module crashes. However, we can't unload the faulty module and continue because if it crashed and it runs in the same address space as kernel, you don't know if it poisoned the kernel memory - security is the prime factor here.
That's kind of like saying "if you wrap all your Java code in a try/catch block, you've eliminated all bugs!"
There are a number of "errors" that are caught, e.g. kalloc returns NULL if it's out of memory, USB code returns errors if there's no USB, etc. But there's no try/catch for the entire operating system, because not all bugs can be repaired.
As pointed out, what happens if your filesystem module crashes? Keep running without files? How about your ethernet driver? Now your box is cut off from the internet and you can't even ssh into it anymore, but doesn't even have the decency to reboot.
So even though it may be possible for the kernel to not "crash" when a module crashes, the state of the kernel could be arbitrarily broken. The kernel could stay alive without a screen, filesystem or internet connection, but what kind of existence is that?
The kernel modules and the kernel itself share the same address space. There is simply no protection if a modules starts to misbehave and overwrite memory from another subsystem.
So, when a driver crashes, it may or may not stay local to that driver. If you are lucky, you still have a somewhat functional kernel and can continue to work.
That doesn't happen with userspace because the address space for each process is separate and so it is possible to catch erroneous memory access and stop the process (this is a SEGFAULT).

Cyclictest for RT patched Linux Kernel

Hello I patched the Linux kernel with the RT-Patch and tested it with the Cyclinctest which monitors latencies. The Kernel isn't doing good and not better than the vanilla kernel.
https://rt.wiki.kernel.org/index.php/Cyclictest
I checked the uname for RT, which looks fine.
So I checked the requirements for the cyclinctest and it states that I have to make sure that the following is configured within the kernel config:
CONFIG_PREEMPT_RT=y
CONFIG_WAKEUP_TIMING=y
CONFIG_LATENCY_TRACE=y
CONFIG_CRITICAL_PREEMPT_TIMING=y
CONFIG_CRITICAL_IRQSOFF_TIMING=y
The Problem now arising is that the config doesn't contain such entries. Maybe there are old and the they may be renamed in the new patch versions (3.8.14)?
I found options like:
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT_BASE=y
CONFIG_HIGH_RES_TIMERS=y
Is that enought in the 3.x kernel to provide the required from above? Anyone a hint?
There's a lot that must be done to get hard realtime performance under PREEMPT_RT. Here are the things I am aware of. Entries marked with an asterisk apply to your current position.
Patch the kernel with PREEMPT_RT (as you already did), and enable CONFIG_PREEMPT_RT_FULL (which used to be called CONFIG_PREEMPT_RT, as you correctly derived).
Disable processor frequency scaling (either by removing it from the kernel configuration or by changing the governor or its settings). (*)
Reasoning: Changing a core's frequency takes a while, during which the core does no useful work. This causes high latencies.
To remove this, look under the ACPI options in the kernel settings.
If you don't want to remove this capability from the kernel, you can set the cpufreq governor to "performance" to lock it into its highest frequency.
Disable deep CPU sleep states
Reasoning: Like switching frequencies, Waking the CPU from a deep sleep can take a while.
Cyclictest does this for you (look up /dev/cpu_dma_latency to see how to do it in your application).
Alternatively, you can disable the "cpuidle" infrastructure in the kernel to prevent this from ever occurring.
Set a high priority for the realtime thread, above 50 (preferably 99) (*)
Reasoning: You need to place your priority above the majority of the kernel -- much of a PREEMPT_RT kernel (including IRQs) runs at a priority of 50.
For cyclictest, you can do this with the "-p#" option, e.g. "-p99".
Your application's memory must be locked. (*)
Reasoning: If your application's memory isn't locked, then the kernel may need to re-map some of your application's address space during execution, triggering high latencies.
For cyclictest, this may be done with the "-m" option.
To do this in your own application, see the RT_PREEMPT howto.
You must unload the nvidia, nouveau, and i915 modules if they are loaded (or not build them in the first place) (*)
Reasoning: These are known to cause high latencies. Hopefully you don't need them on a realtime system :P
Your realtime task must be coded to be realtime
For example, you cannot do file access or dynamic memory allocation via malloc(). Many system calls are off-limits (it's hard to find which ones are acceptable, IMO).
cyclictest is mostly already coded for realtime operation, as are many realtime audio applications. You do need to run it with the "-n" flag, however, or it will not use a realtime-safe sleep call.
The actual execution of cyclictest should have at least the following set of parameters:
sudo cyclictest -p99 -m -n

Why do we need a bootloader in an embedded device?

I'm working with ELinux kernel on ARM cortex-A8.
I know how the bootloader works and what job it's doing. But i've got a question - why do we need bootloader, why was the bootloader born?
Why we can't directly load the kernel into RAM from flash memory without bootloader? If we load it what will happen? In fact, processor will not support it, but why are we following the procedure?
In the context of Linux, the boot loader is responsible for some predefined tasks. As this question is arm tagged, I think that ARM booting might be a useful resource. Specifically, the boot loader was/is responsible for setting up an ATAG list that describing the amount of RAM, a kernel command line, and other parameters. One of the most important parameters is the machine type. With device trees, an entire description of the board is passed. This makes a stock ARM Linux impossible to boot with out some code to setup the parameters as described.
The parameters allows one generic Linux to support multiple devices. For instance, an ARM Debian kernel can support hundreds of different board types. Uboot or other boot loader can dynamically determine this information or it can be hard coded for the board.
You might also like to look at bootloader info page here at stack overflow.
A basic system might be able to setup ATAGS and copy NOR flash to SRAM. However, it is usually a little more complex than this. Linux needs RAM setup, so you may have to initialize an SDRAM controller. If you use NAND flash, you have to handle bad blocks and the copy may be a little more complex than memcpy().
Linux often has some latent driver bugs where a driver will assume that a clock is initialized. For instance if Uboot always initializes an Ethernet clock for a particular machine, the Linux Ethernet driver may have neglected to setup this clock. This can be especially true with clock trees.
Some systems require boot image formats that are not supported by Linux; for example a special header which can initialize hardware immediately; like configuring the devices to read initial code from. Additionally, often there is hardware that should be configured immediately; a boot loader can do this quickly whereas the normal structure of Linux may delay this significantly resulting in I/O conflicts, etc.
From a pragmatic perspective, it is simpler to use a boot loader. However, there is nothing to prevent you from altering Linux's source to boot directly from it; although it maybe like pasting the boot loader code directly to the start of Linux.
See Also: Coreboot, Uboot, and Wikipedia's comparison. Barebox is a lesser known, but well structured and modern boot loader for the ARM. RedBoot is also used in some ARM systems; RedBoot partitions are supported in the kernel tree.
A boot loader is a computer program that loads the main operating system or runtime environment for the computer after completion of the self-tests.
^ From Wikipedia Article
So basically bootloader is doing just what you wanted - copying data from flash into operating memory. It's really that simple.
If you want to know more about boostrapping the OS, I highly recommend you read the linked article. Boot phase consists, apart from tests, also of checking peripherals and some other things. Skipping them makes sense only on very simple embedded devices, and that's why their bootloaders are even simpler:
Some embedded systems do not require a noticeable boot sequence to begin functioning and when turned on may simply run operational programs that are stored in ROM.
The same source
The primary bootloader is usually built in into the silicon and performs the load of the first USER code that will be run in the system.
The bootloader exists because there is no standardized protocol for loading the first code, since it is chip dependent. Sometimes the code can be loaded through a serial port, a flash memory, or even a hard drive. It is bootloader function to locate it.
Once the user code is loaded and running, the bootloader is no longer used and the correctness of the system execution is user responsibility.
In the embedded linux chain, the primary bootloader will setup and run the Uboot. Then Uboot will find the linux kernel and load it.
Why we can't directly load the kernel into RAM from flash memory without bootloader? If we load it what will happen? In fact, processor will not support it, but why are we following the procedure?
Bartek, Artless, and Felipe all give parts of the picture.
Every embedded processor type (E.G. 386EX, Coretex-A53, EM5200) will do something automatically when it is reset or powered on. Sometimes that something is different depending on whether the power is cycled or the device is reset. Some embedded processors allow you to change that something based on voltages applied to different pins when the device is powered or reset.
Regardless, there is a limited amount of something that a processor can do, because of the physical space on-processor required to define that something, whether it is on-chip FLASH, instruction micro-code, or some other mechanism.
This limit means that the something is
fixed purpose, does one thing as quickly as possible.
limited in scope and capability, typically loading a small block of code (often a few kilobytes or less) into a fixed memory location and executing from the start of the loaded code.
unmodifiable.
So what a processor does in response to reset or power-cycle cannot be changed, and cannot do very much, and we don't want it to automatically copy hundreds of megabytes or gigabytes into memory which may not exist or may not be initialized, and which could take a looooong time.
So....
We set up a small program which is smaller than the smallest size permitted across all of the devices we are going to use. That program is stored wherever the something needs it to be.
Sometimes the small program is U-Boot. Sometimes even U-Boot is too big for initial load, so the small program then in turn loads U-Boot.
The point is that whatever gets loaded by the something, is modifiable as needed for a particular system. If it is U-Boot, great, if not, it knows where to load the main operating system or where to load U-Boot (or some other bootloader).
U-Boot (speaking of bootloaders in general) then configures a minimal set of devices, memory, chip settings, etc., to enable the main OS to be loaded and started. The main OS init takes care of any additional configuration or initialization.
So the sequence is:
Processor power-on or reset
Something loads initial boot code (or U-Boot style embedded bootloader)
Initial boot code (may not be needed)
U-Boot (or other general embedded bootloader)
Linux init
The kernel requires the hardware on which you are working to be in a particular state. All the hardware you used needs to be checked for its state and initialized for its further operation. This is one of the main reasons to use a boot loader in an embedded (or any other environment), apart from its use to load a kernel image into the RAM.
When you turn on a system, the RAM is also not in a useful state (fully initialized to use) for us to load kernel into it. Therefore, we cannot load a kernel directly (to answer your question)and thus arises the need for a construct to initialize it.
Apart from what is stated in all the other answers - which is correct - in some cases the system has to go through different execution modes, take as example TrustZone for secure ARM chips. It is possible to still consider it as sort of HW initialization, but what makes it peculiar is the fact that there are additional limitations (ex: memory available) that make it impractical, if not impossible, to do everything in a single binary, thus multiple stages of bootloader are available.
Furthermore, for security reason, each of them is signed and can perform its job only if it meets the security requirements.

Resources