Is there any relationship between Secure Boot and Kernel Lockdown?

As far as I have googled so far, the two features seem independent.
Secure Boot depends on the kernel signature: the bootloader checks the signature of the kernel (or single-image application) and, if it is valid, calls the kernel's start function.
Lockdown is another feature: "The lockdown code is intended to allow for kernels to be locked down early in boot - sufficiently early that we don't have the ability to kmalloc() yet." It disallows even a privileged user from accessing confidential data present in kernel memory.
Lockdown comes in via boot parameters or sysfs control after the kernel has been authenticated as valid.
Is my understanding correct?
So with Secure Boot disabled, the lockdown feature should still work.

Yes, I'd say that your understanding is correct.
Secure boot is a security feature implemented in hardware (i.e. directly in your CPU, though it could also be implemented in UEFI firmware). It's a verification mechanism that runs as the very first thing when powering on the computer. Some known public keys are stored in hardware and are used to verify the signature of the bootloader before running it. The process can then be repeated through the multiple stages of the boot process, where each stage verifies the following one until the OS is started. To know more, take a look at this page of the Debian documentation about secure boot.
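The staged "each stage verifies the next" structure can be sketched as a toy model. Note the simplification: bare SHA-256 digests stand in for the asymmetric signature checks that real secure boot performs against keys stored in hardware, and the image names are invented for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Toy boot images; in reality these would be the bootloader, kernel, etc.
bootloader = b"bootloader image"
kernel = b"kernel image"

# Each stage carries the expected digest of the stage it launches:
# the bootloader's digest is burned into hardware/ROM, the kernel's
# digest travels inside the (already verified) bootloader.
trusted_digests = {
    "bootloader": digest(bootloader),
    "kernel": digest(kernel),
}

def verify_chain(stages):
    """Walk the boot chain, refusing to continue if any stage's image
    does not match the digest recorded by the previous trusted stage."""
    for name, image in stages:
        if digest(image) != trusted_digests[name]:
            return False  # halt the boot here
    return True

assert verify_chain([("bootloader", bootloader), ("kernel", kernel)])
assert not verify_chain([("bootloader", bootloader), ("kernel", b"tampered")])
```

A tampered image anywhere in the chain makes verification fail at that stage, so nothing after it ever runs; that is the whole promise of the mechanism.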
Kernel lockdown is a security feature of the Linux kernel, which was recently introduced in version 5.4 as an optional security module. As mentioned in this interesting article from LWN, the purpose of kernel lockdown is to enforce a distinction between running as root and the ability to run code in kernel mode. Depending on the configuration, kernel lockdown can disable features of the kernel that allow modifications of the running kernel or the extraction of confidential information from userspace. Take a look at the relevant commit message that introduced the feature to Linux.
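On a kernel built with the lockdown module, the current state can be read from /sys/kernel/security/lockdown, which lists the available modes and marks the active one in square brackets (e.g. "none [integrity] confidentiality"). A small sketch of parsing that format:

```python
def lockdown_mode(sysfs_text: str) -> str:
    """Return the active lockdown mode from the contents of
    /sys/kernel/security/lockdown, where the active mode is the
    token enclosed in square brackets."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active lockdown mode found")

assert lockdown_mode("[none] integrity confidentiality") == "none"
assert lockdown_mode("none [integrity] confidentiality") == "integrity"
```

On a real system you would read the file itself; the three modes shown (none, integrity, confidentiality) are the ones defined by the lockdown LSM.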
The relationship between secure boot and kernel lockdown can be explained by this very important consideration (from the same LWN article linked above):
Proponents of UEFI secure boot maintain that this separation [i.e. kernel lockdown] is necessary; otherwise the promise of secure boot (that the system will only run trusted code in kernel mode) cannot be kept. Closing off the paths by which a privileged attacker could run arbitrary code in kernel mode requires disabling a number of features in the kernel.
In other words, one could argue that secure boot is useless if the kernel that is verified and run can then be modified by userland processes. Indeed, without proper kernel hardening, a potential attacker could exploit privileged userland processes to alter the running kernel and therefore tear down the whole security model of the system.
On the other hand, one could as easily argue that kernel lockdown is useless without secure boot since a potential attacker could compromise the boot chain (e.g. modifying the bootloader or the kernel image on the disk) and make the machine run a modified kernel on the next boot.
Nonetheless, the two features are independent of each other. There are still valid use cases for secure boot without kernel lockdown, and vice versa, kernel lockdown without secure boot. It all ultimately depends on what your threat model is.

Related

TrustZone vs ROM as root-of-trust in Secure Boot

A lot of the literature that I stumbled upon referred to TrustZone as a mechanism that facilitates Secure Boot (as can be seen here, and in a lot of other places).
To my knowledge, Secure Boot operates this way:
"Root-of-Trust verifies img1 verifies img2 ..."
So in case the chip is booting from a ROM that verifies the first image which resides in a flash memory, what added value do I get by using TrustZone?
It seems to me that TrustZone cannot provide Secure Boot if there is no ROM Root-of-Trust to the system, because it can only isolate RAM memory and not flash, so during run-time, if the non-trusted OS is compromised, it has no way of protecting its own flash from being rewritten.
Am I missing something here?
So in case the chip is booting from a ROM that verifies the first image which resides in a flash memory, what added value do I get by using TrustZone?
Secure boot and TrustZone are separate features/functions. They often work together. Things will always depend on your threat model and system design/requirements, e.g. whether an attacker has physical access to the device.
If you have an image in flash and someone can rewrite the flash, it may be the case that the system is 'OK' if the boot fails, i.e. someone cannot reprogram the flash and have a user think the software is legitimate. In this case, you can allow the untrusted OS access to the flash. If the image is rewritten, secure boot will fail and an attacker cannot present a trojan image.
Am I missing something here?
If your system fails when someone can stop it from booting, then you need to assign the flash controller to secure memory and only allow access to the flash via controlled channels between worlds. With this design/requirement, secure boot might not really do much, as you are trying to construct the system so that it never runs unauthorized software.
This is probably next to impossible if an attacker has physical access. They can disassemble the device and reprogram the flash by removing the chip, programming it externally, and reinstalling it. Also, an attacker can swap the device with some mocked-up trojan device that doesn't even have the same CPU, but only the same external appearance and similar behaviour.
If the first case is acceptable (rogue code can reprogram the flash, but the result won't boot), then you have designs/requirements where in-memory code must not be able to compromise the functionality of the running system; e.g. you may not want this code to grab passwords. So TrustZone and secure boot work together in a lot of cases. It is entirely possible to find some model that works with only one. It is probably more common that you need both and don't understand all the threats.
Pretty sure TrustZone can isolate flash, depending on the vendor's implementation of the Secure Configuration Register (SCR).
Note this is with regards to TrustZone-M (TrustZone for the Cortex-M architecture), which may not be what you are looking for.

What is cost of context switching to secure mode (arm trustzone)

I am trying to understand the cost of switching back and forth between trusted (secure) and non-secure modes in arm.
What exactly needs to happen when moving from non-secure to secure world? I know the ns bit needs to be set (based on some special instruction?), the page tables need to be flushed and updated (?), the processor caches flushed and updated. Anything else that needs to happen?
Processor caches: Are the caches segmented and shared, or is the whole cache used by each mode? That determines the cost of the switch.
RAM: This must be 'partitioned' and used by both modes. So addressing is just an offset into the 'partition'. Is this right?
What is different about this from a user space to kernel mode switch or a process to process switch in user space?
Is there anything in moving from non-secure to secure modes that would make it more expensive than the regular process context switch?
Are there any articles that explain what exactly happens?
EDIT: Based on a reply below, I am looking to understand what exactly happens when a process switches from non-secure mode to a secure mode (trust zone) on an arm processor.
What exactly needs to happen when moving from non-secure to secure world?
TL;DR: the minimum is to save/restore all CPU registers that are needed by the secure world and change the NS bit. Normally R0-R14, as well as the current mode and the banked LRs and SPs (aborts, interrupts, etc.), are in this register group. Everything else depends on your security model.
First off, there are many different models that can be used with TrustZone; TrustZone is a tool, not a solution. The most basic model is a library with an API where some secure data is stored (e.g. decryption keys) and processed on behalf of an external source (some DRM download from the 'normal world' space). I assume you don't mean this.
An OS can be pre-emptible or non-pre-emptible. If you have an OS in each world, then how control is relinquished, how resources are shared, and how security assets are protected all come into play on a world switch.
In many cases, the caches and TLB are world-aware. Devices may also be world-aware and designed with the intent that context is built into the device. That is not to say that some systems might not still leak information in some way.
Meltdown (2017)
Spectre (2017)
Hyperthreading exploit (2004)
If you are really concerned about this type of attack, it may be appropriate to mark the secure world memory that needs to be protected as non-cached. In many ARM systems, the L1/L2 caches and the TLB are unified between worlds and can provide a side-channel attack surface.
TrustZone as implemented on many ARM devices comes with a GIC which can run FIQ in the secure world, and masking of FIQ can be prevented from the normal world. Many GIC features are banked between worlds, allowing both OSes to use it without 'context switch' information; i.e. the NS bit automatically selects which GIC features are accessed based on its state (so the context is stored in the device). Many other vendor-specific devices are designed to behave this way.
If both worlds use NEON/VFP, then you need to save/restore these registers on a world switch as well. For pre-emption, you may need to hook into the secure OS scheduler to allow a normal world interrupt to pre-empt the secure world main line (obviously this depends on the assets you are trying to protect; if you allow this, the secure main line has a DoS vector).
If there are glitches in devices, then you may need to save/restore device state. If the normal world is restricted from using FIQ mode, you still need to at least clear SP_fiq and LR_fiq when going to the normal world (and restore the secure values in the other direction). Some of these registers are difficult to save/restore, as you must switch modes, which can itself be a security risk if care is not taken.
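The minimum register shuffle described above can be sketched as a simulation. Register names follow ARMv7 conventions, but this is a conceptual model, not real monitor code, which does this in assembly from Monitor mode:

```python
# The monitor must save every register the other world could observe or
# clobber, restore the target world's copy, and flip the NS bit.
BANKED = ["r0-r12", "sp_usr", "lr_usr", "sp_svc", "lr_svc",
          "sp_irq", "lr_irq", "sp_fiq", "lr_fiq", "spsr", "cpsr"]

def world_switch(cpu: dict, saved_ctx: dict) -> dict:
    """Swap the live register file with the other world's saved context
    and flip the NS bit; returns the newly saved (outgoing) context."""
    outgoing = {reg: cpu[reg] for reg in BANKED}
    for reg in BANKED:
        cpu[reg] = saved_ctx[reg]
    cpu["ns"] = not cpu["ns"]
    return outgoing

# Start in the normal world with the secure world's context parked aside.
cpu = {reg: f"normal_{reg}" for reg in BANKED}
cpu["ns"] = True
secure_ctx = {reg: f"secure_{reg}" for reg in BANKED}

normal_ctx = world_switch(cpu, secure_ctx)   # SMC into the secure world
assert cpu["ns"] is False and cpu["sp_fiq"] == "secure_sp_fiq"
world_switch(cpu, normal_ctx)                # and back again
assert cpu["ns"] is True and cpu["lr_svc"] == "normal_lr_svc"
```

The point the model makes is that a world switch touches every banked register across all modes, which is strictly more than an ordinary mode switch saves.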
RAM: This must be 'partitioned' and used by both modes. So addressing is just an offset into the 'partition'. Is this right?
The secure boot code will partition memory based on the NS bit. Physical memory will be visible or not based on the partition manager device logic, which can often be locked at boot; i.e. if non-visible, an access is a bus error, just like any non-existent memory. There is no 'switch' besides the NS bit.
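That partitioning behaviour can be sketched as a toy model of a TrustZone address-space controller. The address ranges and region layout here are invented for illustration:

```python
# Each physical region is marked secure or non-secure at boot; a
# non-secure (NS=1) bus access to a secure region faults exactly like
# a read of nonexistent memory.
REGIONS = [
    (0x0000_0000, 0x1000_0000, "secure"),      # e.g. secure RAM + ROM
    (0x1000_0000, 0x8000_0000, "non-secure"),  # normal-world RAM
]

def bus_access(addr: int, ns_bit: int) -> bool:
    """Return True if the access succeeds, False for a bus error."""
    for start, end, attr in REGIONS:
        if start <= addr < end:
            # In this model, secure (NS=0) masters can touch everything;
            # non-secure masters can only touch non-secure regions.
            return attr == "non-secure" or ns_bit == 0
    return False  # unmapped address: always a bus error

assert bus_access(0x0800_0000, ns_bit=0)       # secure world reads secure RAM
assert not bus_access(0x0800_0000, ns_bit=1)   # normal world gets a bus error
assert bus_access(0x2000_0000, ns_bit=1)       # normal-world RAM is fine
```

There is no address offsetting or remapping involved: the same physical address either responds or faults, depending only on the NS bit of the access.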
Is there anything in moving from non-secure to secure modes that would make it more expensive than the regular process context switch?
Yes. A normal switch applies only to a 'mode'; a world encompasses all ARM modes, so all banked registers must be switched. Depending on the system, the TLB and cache would not normally need to be switched.
Related:
How to introspect normal world
TrustZone monitor mode switch design
Preventing memory access from the normal world
How is a TrustZone OS secure?
TrustZone scheduler in secure/non-secure OS
IMX53 and TrustZone
ARM Trusted firmware on github
TrustZone Whitepaper

Is it possible to circumvent OS security by not using the supplied System Calls?

I understand that an operating system enforces security policies on users when they use the system and filesystem, via the system calls supplied by said OS.
Is it possible to circumvent this security by issuing your own hardware instructions instead of making use of the supplied system call interface of the OS? Even writing a single bit to a file that you normally have no access to would be enough.
First, for simplicity, I'll consider the OS and the kernel to be the same thing.
A CPU can be in different modes when executing code.
Let's say a hypothetical CPU has just two modes of execution (Supervisor and User).
When in Supervisor mode, you are allowed to execute any instruction, and you have full access to the hardware resources.
When in User mode, there is a subset of instructions you don't have access to, such as instructions to deal with hardware or to change the CPU mode. Trying to execute one of those instructions will cause the OS to be notified that your application is misbehaving, and it will be terminated. This notification is done through interrupts. Also, when in User mode, you only have access to a portion of the memory, so your application can't even touch memory it is not supposed to.
Now, the trick for this to work is that while in Supervisor Mode, you can switch to User Mode, since it's a less privileged mode, but while in User Mode, you can't go back to Supervisor Mode, since the instructions for that are not permitted anymore.
The only way to go back to Supervisor mode is through system calls, or interrupts. That enables the OS to have full control of the hardware.
A possible example how everything fits together for this hypothetical CPU:
The CPU boots in Supervisor mode
Since the CPU starts in Supervisor Mode, the first thing to run has access to the full system. This is the OS.
The OS sets up the hardware any way it wants: memory protections, etc.
The OS launches any application you want after configuring permissions for that application. Launching the application switches to User Mode.
The application is running and only has access to the resources the OS allowed when launching it. Any access to hardware resources needs to go through system calls.
I've only explained the flow for a single application.
As a bonus to help you understand how this fits together with several applications running, a simplified view of how preemptive multitasking works:
In a real-world situation, the OS will set up a hardware timer before launching any applications.
When this timer expires, it causes the CPU to interrupt whatever it was doing (e.g. running an application), switch to Supervisor Mode, and execute code at a predetermined location which belongs to the OS and which applications don't have access to.
Since we're back in Supervisor Mode and running OS code, the OS now picks the next application to run, sets up any required permissions, switches to User Mode, and resumes that application.
These timer interrupts are how you get the illusion of multitasking: the OS keeps switching between applications quickly.
The bottom line here is that unless there are bugs in the OS (or the hardware design), the only way an application can go from User Mode to Supervisor Mode is through the OS itself with a System Call.
This is the mechanism I use in my hobby project (a virtual computer) https://github.com/ruifig/G4DevKit.
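The two-mode hypothetical CPU described above can be sketched in a few lines. All instruction names and the trap mechanism here are invented for illustration; real hardware delivers the trap as an exception that the OS handles:

```python
class TrapError(Exception):
    """Raised when user-mode code attempts a privileged operation."""

class ToyCPU:
    # Instructions only Supervisor mode may execute.
    PRIVILEGED = {"drop_to_user", "program_timer", "touch_hardware"}

    def __init__(self):
        self.mode = "supervisor"  # the CPU boots in Supervisor mode

    def execute(self, instruction):
        if instruction in self.PRIVILEGED and self.mode == "user":
            # Real hardware raises an exception here and the OS decides
            # what to do (usually: terminate the application).
            raise TrapError(instruction)
        if instruction == "drop_to_user":
            self.mode = "user"    # dropping privilege is always allowed

    def syscall(self, request):
        # The one sanctioned way back to Supervisor mode: control lands
        # at an OS-defined entry point, not wherever the caller chooses.
        self.mode = "supervisor"
        result = f"os handled {request}"
        self.mode = "user"        # return to the application
        return result

cpu = ToyCPU()
cpu.execute("program_timer")      # OS setup: fine in Supervisor mode
cpu.execute("drop_to_user")       # launch the application in User mode
assert cpu.syscall("write") == "os handled write"
try:
    cpu.execute("touch_hardware") # the application misbehaves...
    assert False
except TrapError:
    pass                          # ...and would be terminated by the OS
```

The asymmetry is the whole trick: `drop_to_user` is reachable from Supervisor mode, but nothing in User mode can flip the mode back except `syscall`, whose entry point the OS controls.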
HW devices are connected to the CPU through a bus, and the CPU communicates with them either via in/out instructions that read/write values at I/O ports (not used much with current hardware; in the early age of home computers this was the common way), or by having part of the device memory "mapped" into the CPU address space, with the CPU controlling the device by writing values at defined locations in that shared memory.
None of this is accessible from the "user level" context where common applications are executed by the OS (so an application trying to write to that shared device memory would crash on an illegal memory access; actually that piece of memory is usually not even mapped into user space, i.e. it doesn't exist from the user application's point of view). Direct in/out instructions are blocked at the CPU level too.
The device is controlled by driver code. That code either runs in a specially configured user-level context which has the particular ports and memory mapped (the micro-kernel model, where drivers are not part of the kernel, as in MINIX). This architecture is more robust (a crash in a driver can't take down the kernel; the kernel can isolate a problematic driver and restart it, or just kill it completely), but the context switches between kernel and user level are a costly operation, so data throughput suffers a bit.
Or the device driver code runs at kernel level (the monolithic kernel model, like Linux), so any vulnerability in driver code can attack the kernel directly (still not trivial, but a lot easier than trying to tunnel out of the user context through some kernel bug). But the overall performance of I/O is better (especially with devices like graphics cards or RAID disk clusters, where the data bandwidth goes into GiBs per second). For example, this is the reason why early USB drivers were such a huge security risk: they tended to be quite buggy, so a specially crafted USB device could execute rogue code in a kernel-level context.
So, as Hyd already answered, under ordinary circumstances, when everything works as it should, a user-level application should not be able to emit a single bit outside of its user sandbox, and suspicious behaviour outside of system calls will either be ignored or crash the app.
If you find a way to break this rule, it's a security vulnerability, and those usually get patched ASAP once the OS vendor is notified.
Although some current problems are difficult to patch. For example, "row hammering" of current DRAM chips can't be fixed at the SW (OS) or CPU (configuration/firmware flash) level at all! Most current PC hardware is vulnerable to this kind of attack.
Or, in the mobile world, devices use radio chips based on legacy designs, with closed-source firmware developed years ago, so if you have enough resources to pay for research on these, it's very likely you could seize any particular device by having a fake BTS station send malicious radio signals to the target device.
Etc. It's a constant war: vendors and security researchers try to patch all vulnerabilities, while attackers try to find, ideally, a zero-day exploit, or at least pick off users who don't patch their devices/software fast enough against known bugs.
Not normally. If it is possible, it is because of an operating system software error. If such an error is discovered, it is fixed fast, as it is considered a software vulnerability, which equals bad news.
"System" calls execute at a higher processor level than the application: generally kernel mode (but system systems have multiple system level modes).
What you see as a "system" call is actually just a wrapper that sets up registers then triggers a Change Mode Exception of some kind (the method is system specific). The system exception hander dispatches to the appropriate system server.
You cannot just write your own function and do bad things. True, sometimes people find bugs that allow circumventing the system protections. As a general principle, you cannot access devices unless you do it through the system services.

uclinux and necessity for device drivers

Normally, MMU-less systems don't have an MPU (memory protection unit) either, and there's also no distinction between user and kernel modes. In such a case, assuming we have an MMU-less system with some piece of hardware mapped into the CPU address space, does it really make sense to have device drivers in the kernel if all the hardware resources can be accessed from userspace?
Does kernel code have more control over memory than user code?
Yes. On MMU-less platforms that host uClinux, it makes sense to do everything as if you had a normal embedded Linux environment. It is a cleaner design to have user applications and services go through their normal interfaces (syscalls, etc.) and have the OS route those kernel requests through to device drivers, file systems, the network stack, etc.
Although the kernel does not have more control over the hardware in these circumstances, the actual hardware should only be touched by system software running in the kernel. Not limiting access to the hardware would make debugging things like system resets and memory corruption virtually impossible. This practice also makes your design more portable.
Exceptions may be for user mode debugging binaries that are only used in-house for platform bring-up and diagnostics.
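The value of routing hardware access through a driver even when nothing enforces it can be sketched with a toy model. The "hardware" here is just a flat register map that any code could poke directly (as on an MMU-less system); the device, register names, and control values are all invented for illustration:

```python
# Going through the driver still buys validation, logging, and a single
# portable interface, even though direct pokes are physically possible.
device_regs = {"UART_DATA": 0, "UART_CTRL": 0}

class UartDriver:
    VALID_CTRL = {0x0, 0x1, 0x3}  # hypothetical legal control values

    def __init__(self, regs, log):
        self.regs, self.log = regs, log

    def write_byte(self, byte):
        if not 0 <= byte <= 0xFF:
            raise ValueError("not a byte")
        self.regs["UART_DATA"] = byte
        self.log.append(("tx", byte))   # every access leaves a trace

    def configure(self, ctrl):
        if ctrl not in self.VALID_CTRL:
            # A stray direct write would silently corrupt the device;
            # the driver turns it into a debuggable error instead.
            raise ValueError("invalid control value")
        self.regs["UART_CTRL"] = ctrl

log = []
uart = UartDriver(device_regs, log)
uart.configure(0x1)
uart.write_byte(0x41)
assert device_regs == {"UART_DATA": 0x41, "UART_CTRL": 0x1}
assert log == [("tx", 0x41)]
```

When a reset or memory corruption happens, the driver's log and validation are what make it debuggable; that is the practical argument for keeping drivers in the kernel even without hardware protection.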

Can containers be implemented purely in userspace?

There are a bunch of container mechanisms for Linux now: LXC, Docker, lmctfy, OpenVZ, Linux-VServer, etc. All of these either involve kernel patches or recently added Linux features like cgroups and seccomp.
I'm wondering if it would be possible to implement similar (OS-level) virtualization purely in userspace.
There's already a precedent for this - User Mode Linux. However, it also requires special kernel features to be reasonably fast and secure. Also, it is literally a Linux kernel running in userspace, which makes networking setup rather difficult.
I'm thinking more along the lines of a process that would act as an intermediary between spawned programs and the Linux kernel. You would start the process with the programs to spawn as arguments; it would track system calls they made, and block or redirect attempts to access the real root filesystem, real network devices, etc. without itself relying on special kernel features.
Is such a thing possible to implement securely, and in a way that could be invoked effectively by a limited user (i.e. not privileged like chroot)?
In summary: would a pure userspace implementation of something like LXC be possible? If yes, what would the penalties be for doing it in userspace? If no, why not?
Surprisingly it turns out the answer is "yes": this is what systrace and sysjail do.
http://sysjail.bsd.lv/
And they are also inherently insecure on modern operating systems.
http://www.watson.org/~robert/2007woot/
So if you want proper sandboxing, it has to be done in kernel space.
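The "intermediary process" idea from the question can be illustrated at the library level: filter requests against a policy before delegating to the real primitive. This is only a conceptual sketch (the policy path is invented); real systrace/sysjail intercepted at the actual syscall boundary via kernel tracing facilities, and the paper linked above explains why even that is racy. A pure userspace wrapper like this one is weaker still: it is cooperative only, since any code that issues syscalls directly bypasses it entirely.

```python
import builtins

# Policy: only paths under this (hypothetical) prefix may be opened.
ALLOWED_PREFIXES = ("/tmp/sandbox/",)
real_open = builtins.open  # keep a handle to the real primitive

def sandboxed_open(path, *args, **kwargs):
    """Delegate to the real open() only if the path passes policy."""
    if not str(path).startswith(ALLOWED_PREFIXES):
        raise PermissionError(f"sandbox policy forbids {path}")
    return real_open(path, *args, **kwargs)

# Blocked by policy -- but only for callers that use the wrapper.
try:
    sandboxed_open("/etc/passwd")
    assert False
except PermissionError:
    pass
```

This makes the answer's conclusion concrete: because nothing forces a process to go through the wrapper (and because path checks like this are subject to check-versus-use races), enforcement has to live in kernel space.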
