What is the cost of context switching to secure mode (ARM TrustZone)?

I am trying to understand the cost of switching back and forth between the trusted (secure) and non-secure modes on ARM.
What exactly needs to happen when moving from the non-secure to the secure world? I know the NS bit needs to be set (via some special instruction?), the page tables need to be flushed and updated (?), and the processor caches flushed and updated. Anything else that needs to happen?
Processor caches: Are the caches segmented and shared, or is the whole cache used by each mode? That determines the cost of the switch.
RAM: This must be 'partitioned' and used by both modes. So addressing is just an offset into the 'partition'. Is this right?
What is different about this from a user space to kernel mode switch or a process to process switch in user space?
Is there anything in moving from non-secure to secure modes that would make it more expensive than the regular process context switch?
Are there any articles that explain what exactly happens?
EDIT: Based on a reply below, I am looking to understand what exactly happens when a process switches from non-secure mode to secure mode (TrustZone) on an ARM processor.

What exactly needs to happen when moving from non-secure to secure world?
TL;DR: the minimum is to save/restore all CPU registers that are needed by the secure world and change the NS bit. Normally R0-R14, the current mode, and the banked LRs and SPs (aborts, interrupts, etc.) are in this register group. Everything else depends on your security model.
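As a very rough sketch of the state involved on an ARMv7-style monitor (the struct layout and the save_context/restore_context helpers below are illustrative only; real monitors such as ARM Trusted Firmware do this in hand-written assembly):

    #include <stdint.h>

    /* Illustrative picture of what a monitor saves/restores per world. */
    struct world_context {
        uint32_t r[15];        /* r0-r14 of the interrupted world          */
        uint32_t spsr;         /* its CPSR (mode, flags)                   */
        uint32_t sp_banked[5]; /* SP for the svc, abt, und, irq, fiq modes */
        uint32_t lr_banked[5]; /* LR for the svc, abt, und, irq, fiq modes */
        /* plus NEON/VFP registers if both worlds use them */
    };

    static inline uint32_t read_scr(void)
    {
        uint32_t scr;
        __asm__ volatile("mrc p15, 0, %0, c1, c1, 0" : "=r"(scr));
        return scr;
    }

    static inline void write_scr(uint32_t scr)
    {
        __asm__ volatile("mcr p15, 0, %0, c1, c1, 0" :: "r"(scr));
    }

    /* Hypothetical monitor-mode handler: park the outgoing world, flip the
     * NS bit (bit 0 of SCR), bring in the other world. save_context() and
     * restore_context() stand in for the register shuffling listed above. */
    extern void save_context(struct world_context *c);
    extern void restore_context(const struct world_context *c);

    void monitor_world_switch(struct world_context *out, struct world_context *in)
    {
        save_context(out);
        write_scr(read_scr() ^ 1u);   /* SCR.NS selects the world to return to */
        restore_context(in);
    }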
First off, there are many different models that can be used with TrustZone; TrustZone is a tool, not a solution. The most basic model is a library with an API where some secure data is stored (e.g. decryption keys) to be processed on behalf of an external source (some DRM download from the 'normal world'). I assume you don't mean this.
An OS can be pre-emptible or non-pre-emptible. If you have an OS in each world, then how control is relinquished, how resources are shared and how security assets are protected all come into play on a world switch.
In many cases, the caches and TLB are world-aware. Devices may also be world-aware and designed with the intent that context is built into the device. This is not to say that a given system cannot still leak information in some way; see:
Meltdown (2017)
Spectre (2017)
Hyperthreading exploit (2004)
If you are really concerned about this type of attack, it may be appropriate to mark the secure world memory that needs protection as non-cached. In many ARM systems the L1/L2 caches and the TLB are unified between worlds and can provide a side-channel attack vector.
TrustZone as implemented on many ARM devices comes with a GIC which can run FIQ in the secure world, and masking of FIQ can be prevented in the normal world. Many GIC features are banked between worlds, allowing both OSes to use it without 'context switch' information; i.e. the GIC features that are accessed change automatically based on the state of the NS bit (so the context is stored in the device). Many other vendor-specific devices are designed to behave this way.
If both worlds use NEON/VFP, then you need to save/restore these registers on a world switch as well. For pre-emption you may need to hook into the secure OS scheduler to allow a normal world interrupt to pre-empt the secure world mainline (obviously this depends on the assets you are trying to protect; if you allow this, the secure mainline has a DoS vector).
If devices have glitches, then you may need to save/restore device state. Even if the normal world is restricted from using FIQ mode, you still need to at least clear SP_fiq and LR_fiq when going to the normal world (and restore the secure values the other way). Some of these registers are difficult to save/restore as you must switch modes, which can itself be a security risk if care is not taken.
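For example, scrubbing the FIQ bank from (secure) monitor code before returning to the normal world might look something like this sketch (ARMv7 mode numbers; whether this alone is sufficient depends on the rest of your monitor):

    /* Sketch: clear SP_fiq/LR_fiq so the normal world cannot see secure
     * values left in them. Assumes we are in monitor mode, which may hop
     * through the other modes; 0x11 = FIQ mode, 0x16 = monitor mode. */
    static inline void scrub_fiq_bank(void)
    {
        __asm__ volatile(
            "cps #0x11        \n"   /* switch to FIQ mode   */
            "mov sp, #0       \n"   /* clear SP_fiq         */
            "mov lr, #0       \n"   /* clear LR_fiq         */
            "cps #0x16        \n"   /* back to monitor mode */
            ::: "memory");
    }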
RAM: This must be 'partitioned' and used by both modes. So addressing is just an offset into the 'partition'. Is this right?
Secure boot will partition memory based on the NS bit. The physical memory will be visible or not based on the partition manager device logic, which can often be locked at boot; i.e. if it is not visible, an access gives a bus error just like any non-existent memory. There is no 'switch' besides the NS bit.
Is there anything in moving from non-secure to secure modes that would make it more expensive than the regular process context switch?
Yes. A normal switch is only for a 'mode'; a world switch covers all ARM modes, so all the banked registers must be switched. Depending on the system, the TLB and cache would not normally need to be switched.
Related:
How to introspect normal world
TrustZone monitor mode switch design
Preventing memory access from the normal world
How is a TrustZone OS secure?
TrustZone scheduler in secure/non-secure OS
IMX53 and TrustZone
ARM Trusted firmware on github
TrustZone Whitepaper

Related

Is there any relationship between Secure Boot and Kernel Lockdown?

As far as I have googled so far, the two features seem independent.
Secure Boot depends on the kernel signature: the bootloader checks the signature of the kernel (or single-image application) and, if it is valid, calls the kernel start function.
Lockdown is another feature, where "the lockdown code is intended to allow for kernels to be locked down early in boot - sufficiently early that we don't have the ability to kmalloc() yet", disallowing even a privileged user from accessing confidential data present in kernel memory.
Lockdown comes in via boot parameters or sysfs control after the kernel has been authenticated as valid.
Is my understanding correct?
So with Secure Boot disabled, the lockdown feature should still work.
Yes, I'd say that your understanding is correct.
Secure boot is a security feature implemented in hardware (i.e. directly in your CPU, though it could also be implemented in UEFI firmware). It's a mechanism of verification that is done as the very first thing when powering on the computer. Some known public keys are stored in hardware and are used to verify the signature of the bootloader before running it. The process can be then repeated through the multiple stages of the boot process where each stage verifies the following one until the OS is started. To know more, take a look at this page of the Debian documentation about secure boot.
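A rough sketch of that chain-of-trust idea, with entirely hypothetical function names (the real logic lives in the boot ROM or UEFI firmware, not in C you would write yourself):

    #include <stdbool.h>
    #include <stddef.h>

    /* One boot stage: an image plus its detached signature. */
    struct boot_stage {
        const void *image;
        size_t      image_len;
        const void *signature;
        size_t      sig_len;
    };

    /* Placeholders for whatever the firmware actually provides; the point
     * is only the flow: each stage verifies the next one against a baked-in
     * public key before handing over control. */
    extern bool verify_signature(const void *image, size_t image_len,
                                 const void *sig, size_t sig_len,
                                 const void *trusted_pubkey);
    extern void jump_to(const void *image);      /* never returns */
    extern void halt_boot(const char *reason);   /* never returns */

    void load_next_stage(const struct boot_stage *next, const void *rom_pubkey)
    {
        if (!verify_signature(next->image, next->image_len,
                              next->signature, next->sig_len, rom_pubkey))
            halt_boot("signature check failed"); /* refuse to run untrusted code */

        jump_to(next->image);                    /* e.g. bootloader -> kernel */
    }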
Kernel lockdown is a security feature of the Linux kernel, which was recently introduced in version 5.4 as an optional security module. As mentioned in this interesting article from LWN, the purpose of kernel lockdown is to enforce a distinction between running as root and the ability to run code in kernel mode. Depending on the configuration, kernel lockdown can disable features of the kernel that allow modifications of the running kernel or the extraction of confidential information from userspace. Take a look at the relevant commit message that introduced the feature to Linux.
The relationship between secure boot and kernel lockdown can be explained by this very important consideration (from the same LWN article linked above):
Proponents of UEFI secure boot maintain that this separation [i.e. kernel lockdown] is necessary; otherwise the promise of secure boot (that the system will only run trusted code in kernel mode) cannot be kept. Closing off the paths by which a privileged attacker could run arbitrary code in kernel mode requires disabling a number of features in the kernel.
In other words, one could argue that secure boot is useless if the kernel that is verified and run can then be modified by userland processes. Indeed, without proper kernel hardening, a potential attacker could exploit privileged userland processes to alter the running kernel and therefore tear down the whole security model of the system.
On the other hand, one could as easily argue that kernel lockdown is useless without secure boot since a potential attacker could compromise the boot chain (e.g. modifying the bootloader or the kernel image on the disk) and make the machine run a modified kernel on the next boot.
Nonetheless, the two features are independent of each other. There are still valid use cases of secure boot without kernel lockdown, and vice versa kernel lockdown without secure boot. It all ultimately depends on what your threat model is.

How is an ARM TrustZone secure OS secure?

I am trying to read the TrustZone white paper but it is really difficult to understand some of the basic stuff. I have some questions about it. They may be simple questions but I am a beginner in this field:
What makes the secure world really "secure"? I mean, why might the normal world be tampered with but not the secure world?
Who can change the secure OS? I mean, like adding a "service"? Can, for example, an application developer for a mobile payment application add a service to the secure OS to work with his app? If yes, then how can any developer add to the secure OS and it still be secure?
What prevents a malicious application from the normal OS from raising an SMC exception and transferring to the secure OS? (answered)
The idea of a secure world is to keep the code executing there as small and as simple as possible - the bare minimum to fulfil its duties (usually controlling access to some resource like encryption keys or hardware or facilitating some secure functions like encryption/decryption).
Because the amount of code in the secure world is small, it can be audited easily and there's reduced surface area for bugs to be introduced. However, it does not mean that the secure world is automatically 'secure'. If there is a vulnerability in the secure world code, it can be exploited just like any other security vulnerability.
Contrast this with code executing in the normal world. For example, the Linux kernel is much more complex and much harder to audit. There are plenty of examples of kernel vulnerabilities and exploits that allow malicious code to take over the kernel.
To illustrate this point, let's suppose you have a system where users can pay money via some challenge-response transaction system. When they want to make a transaction, the device must wait for the user to press a physical button before it signs the transaction with a cryptographic key and authorises the payment.
But what if some malicious code exploited a kernel bug and is able to run arbitrary code in kernel mode? Normally this means total defeat. The malware is able to bypass all control mechanisms and read out the signing keys. Now the malware can make payments to anyone it wants without even needing the user to press a button.
What if there was a way that allows for signing transactions without the Linux kernel knowing the actual key? Enter the secure world system.
We can have a small secure world OS with the sole purpose of signing transactions and holding onto the signing key. However, it will refuse to sign a transaction unless the user presses a special button. It's a very small OS (in the kilobytes) and you've hired people to audit it. For all intents and purposes, there are no bugs or security vulnerabilities in the secure world OS.
When the normal world OS (e.g. Linux) needs to sign a transaction, it makes an SMC call to transfer control to the secure world (note, the normal world is not allowed to modify/read the secure world at all) with the transaction it wants to sign. The secure world OS will wait for a button press from the user, sign the transaction, then transfer control back to the normal world.
Now, imagine the same situation where malware has taken over the Linux kernel. The malware now can't read the signing key because it's in the secure world. The malware can't sign transactions without the user's consent since the secure world OS will refuse to sign a transaction unless the user presses the button.
This kind of use case is what the secure world is designed for. The whole idea is the hardware enforced separation between the secure and normal world. From the normal world, there is no way to directly tamper with the secure world because the hardware guarantees that.
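To make the flow concrete, the normal-world half of such a request might look roughly like this on AArch64 (kernel-side code, since SMC is a privileged instruction; the function ID and its semantics are invented for the example, while real IDs are vendor-defined under the SMC Calling Convention):

    #include <stdint.h>

    /* Hypothetical function ID for "sign this transaction". */
    #define SMC_SIGN_TRANSACTION 0x83000001u

    /* Minimal SMC stub: function ID in x0, arguments in x1/x2, result back
     * in x0. A production stub (e.g. Linux's arm_smccc_smc) handles the
     * convention's full register-preservation rules. */
    static uint64_t smc_call(uint64_t fid, uint64_t arg1, uint64_t arg2)
    {
        register uint64_t x0 __asm__("x0") = fid;
        register uint64_t x1 __asm__("x1") = arg1;
        register uint64_t x2 __asm__("x2") = arg2;

        __asm__ volatile("smc #0"            /* trap into the secure monitor */
                         : "+r"(x0)
                         : "r"(x1), "r"(x2)
                         : "x3", "memory");
        return x0;
    }

    /* Ask the secure world to sign the transaction at a physical address it
     * is allowed to read. This blocks until the secure OS has seen the
     * button press (or refused), then returns its status code. */
    int sign_transaction(uint64_t tx_paddr, uint64_t tx_len)
    {
        return (int)smc_call(SMC_SIGN_TRANSACTION, tx_paddr, tx_len);
    }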
I haven't worked with TrustZone in particular but I imagine once the secure world OS has booted, there is no way to directly modify it. I don't think application developers should be able to 'add' services to the secure world OS since that would defeat the purpose of it. I haven't seen any vendors allowing third parties to add code to their secure world OS.
As for your last question, I've already answered it in an answer here. SMC exceptions are how you request a service from the secure world OS - they're basically system calls, but for the secure world OS. What would malicious code gain by transferring control to the secure world?
You cannot modify/read the secure world from the normal world
When you transfer control to the secure world, you lose control in the normal world
What makes the secure world really "secure"? I mean, why might the normal world be tampered with but not the secure world?
The security system designer makes it secure. TrustZone is a tool. It provides a way to partition PHYSICAL memory. This can prevent a DMA attack. TrustZone generally supports lock-at-boot features, so once a physical mapping is complete (secure/normal world permissions), it cannot be changed. TrustZone gives tools to partition interrupts as well as to boot securely.
It is important to note that the secure world is a technical term. It is just a different state than the normal world. The name 'secure world' doesn't make it secure! The system designer must make it so. It is highly dependent on what the secure assets are. TrustZone only gives tools to partition things so that normal world access can be prevented.
Conceptually there are two types of TrustZone secure world code.
A library - here there will not usually be interrupts used in the secure world. The secure API is a magic eight ball: you can ask it a question and it will give you an answer. For instance, it is possible that some digital rights management system might use this approach. The secret keys will be hidden and inaccessible to the normal world.
A secure OS - here the secure world will have interrupts. This is more complex as interrupts imply some sort of pre-emption. The secure OS may or may not use the MMU. Usually the MMU is needed if the system cache will be used.
Those are two big differences between final TrustZone solutions. They depend on the system design and what the end application is. TrustZone is ONLY part of the tools for trying to achieve this.
Who can change the secure OS? I mean, like adding a "service"? Can, for example, an application developer for a mobile payment application add a service to the secure OS to work with his app? If yes, then how can any developer add to the secure OS and it still be secure?
This is not defined by TrustZone. It is up to the SOC vendor (the people who license from ARM and build the CPU) to provide a secure boot technology. The secure OS might be in ROM and not changeable, for instance. Another method is that the secure code is digitally signed. In this case, there is probably an on-chip secure ROM that verifies the digital signature. The SOC vendor will provide (usually under NDA) information and techniques for secure boot. It usually depends on their target market. For instance, physical tamper protection and encrypt/decrypt hardware may also be included in the SOC.
The on-chip ROM (only programmed by the SOC vendor) usually has mechanisms to boot from different sources (NAND flash, serial NOR flash, eMMC, ROM, Ethernet, etc.). Typically it will have some one-time fusable memory (EPROM) that a device/solution vendor (the people who make things secure for the application) can program to configure the system.
Other features are secure debug, secure JTAG, etc. Obviously all of these are possible attack vectors.

Is it possible to circumvent OS security by not using the supplied System Calls?

I understand that an operating system enforces security policies on users when they use the system and filesystem via the system calls supplied by said OS.
Is it possible to circumvent this security by issuing your own hardware instructions instead of making use of the OS's supplied system call interface? Even writing a single bit to a file that you normally have no access to would be enough.
First, for simplicity, I'm considering the OS and Kernel are the same thing.
A CPU can be in different modes when executing code.
Let's say a hypothetical CPU has just two modes of execution (Supervisor and User).
When in Supervisor mode, you are allowed to execute any instructions, and you have full access to the hardware resources.
When in User mode, there is a subset of instructions you don't have access to, such as instructions to deal with hardware or to change the CPU mode. Trying to execute one of those instructions will cause the OS to be notified that your application is misbehaving, and it will be terminated. This notification is done through interrupts. Also, when in User mode, you only have access to a portion of the memory, so your application can't even touch memory it is not supposed to.
Now, the trick for this to work is that while in Supervisor Mode, you can switch to User Mode, since it's a less privileged mode, but while in User Mode, you can't go back to Supervisor Mode, since the instructions for that are not permitted anymore.
The only way to go back to Supervisor mode is through system calls, or interrupts. That enables the OS to have full control of the hardware.
A possible example of how everything fits together for this hypothetical CPU:
The CPU boots in Supervisor mode
Since the CPU starts in Supervisor Mode, the first thing to run has access to the full system. This is the OS.
The OS sets up the hardware any way it wants: memory protections, etc.
The OS launches any application you want after configuring permissions for that application. Launching the application switches to User Mode.
The application is running, and only has access to the resources the OS allowed when launching it. Any access to hardware resources need to go through System Calls.
I've only explained the flow for a single application.
As a bonus to help you understand how this fits together with several applications running, a simplified view of how preemptive multitasking works:
In a real-world situation, the OS will set up a hardware timer before launching any applications.
When this timer expires, it causes the CPU to interrupt whatever it was doing (e.g. running an application), switch to Supervisor Mode and execute code at a predetermined location, which belongs to the OS and which applications don't have access to.
Since we're back in Supervisor Mode and running OS code, the OS now picks the next application to run, sets up any required permissions, switches to User Mode and resumes that application.
These timer interrupts are how you get the illusion of multitasking. The OS keeps switching between applications quickly.
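A very rough sketch of that path, with entirely made-up helper names:

    /* Simplified sketch of the timer-driven preemption described above.
     * All helpers are illustrative; real kernels do this in
     * architecture-specific code. */
    struct task;                                        /* per-application state   */

    extern struct task *current_task;
    extern void save_user_context(struct task *t);      /* registers of the app    */
    extern struct task *pick_next_task(void);           /* scheduling policy       */
    extern void program_timer_ticks(unsigned ticks);    /* arm the next interrupt  */
    extern void restore_user_context(struct task *t);   /* drops back to User Mode */

    void timer_interrupt_handler(void)  /* the CPU entered Supervisor Mode for us */
    {
        save_user_context(current_task);
        current_task = pick_next_task();
        program_timer_ticks(10);        /* next time slice */
        restore_user_context(current_task);
    }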
The bottom line here is that unless there are bugs in the OS (or the hardware design), the only way an application can go from User Mode to Supervisor Mode is through the OS itself with a System Call.
This is the mechanism I use in my hobby project (a virtual computer) https://github.com/ruifig/G4DevKit.
HW devices are connected to the CPU through a bus, and the CPU communicates with them either via in/out instructions that read/write values at I/O ports (not used much with current HW; in the early age of home computers this was the common way), or by having part of the device's memory "mapped" into the CPU address space, so the CPU controls the device by writing values to defined locations in that shared memory.
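As a small illustration of the memory-mapped variant (the device and its addresses are invented; real ones come from the SoC's memory map):

    #include <stdint.h>

    /* Hypothetical UART whose registers are mapped at physical address
     * 0x40001000: storing a byte to the data register makes the device
     * transmit it. Only code that has this region mapped (the kernel, or a
     * specially configured driver context) can do this. */
    #define UART_BASE     0x40001000u
    #define UART_DATA     (*(volatile uint8_t *)(UART_BASE + 0x00))
    #define UART_STATUS   (*(volatile uint8_t *)(UART_BASE + 0x04))
    #define UART_TX_READY 0x01u

    static void uart_putc(char c)
    {
        while (!(UART_STATUS & UART_TX_READY))
            ;                        /* spin until the device can accept data */
        UART_DATA = (uint8_t)c;      /* a plain store to the mapped register  */
    }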
None of this should be accessible in the "user level" context where common applications are executed by the OS (so an application trying to write to that shared device memory would crash on an illegal memory access; actually, that piece of memory is usually not even mapped into user space, i.e. it does not exist from the user application's point of view). Direct in/out instructions are blocked at the CPU level too.
The device is controlled by the driver code, which is either run in a specially configured user-level context that has the particular ports and memory mapped (the micro-kernel model, where drivers are not part of the kernel, as in the MINIX OS). This architecture is more robust (a crash in a driver can't take down the kernel; the kernel can isolate a problematic driver and restart it, or just kill it completely), but the context switches between kernel and user level are a very costly operation, so data throughput suffers a bit.
Or the device driver code runs at kernel level (the monolithic kernel model, like Linux), so any vulnerability in driver code can attack the kernel directly (still not trivial, but a lot easier than trying to tunnel out of the user context through some kernel bug). But the overall I/O performance is better (especially with devices like graphics cards or RAID disk clusters, where the data bandwidth goes into GiBs per second). For example, this is the reason why early USB drivers were such a huge security risk: they tended to be quite buggy, so a specially crafted USB device could execute rogue code in a kernel-level context.
So, as Hyd already answered, under ordinary circumstances, when everything works as it should, a user-level application should not be able to emit a single bit outside of its user sandbox, and suspicious behaviour outside of system calls will either be ignored or crash the app.
If you find a way to break this rule, it's a security vulnerability, and those usually get patched ASAP once the OS vendor is notified about them.
Some of the current problems are difficult to patch, though. For example, "row hammering" of current DRAM chips can't be fixed at the SW (OS) or CPU (configuration/firmware flash) level at all! Most current PC HW is vulnerable to this kind of attack.
Or, in the mobile world, the devices use radio chips based on legacy designs, with closed-source firmware developed years ago, so if you have enough resources to pay for research on these, it's very likely you would be able to seize any particular device by having a fake BTS station send a malicious radio signal to the target device.
Etc... it's a constant war between vendors and security researchers trying to patch all vulnerabilities, and attackers trying to find, ideally, a zero-day exploit, or at least to pick off users who don't patch their devices/SW with known bugs fast enough.
Not normally. If it is possible, it is because of an operating system software error. If the software error is discovered, it is fixed fast, as it is considered to be a software vulnerability, which equals bad news.
"System" calls execute at a higher processor level than the application: generally kernel mode (but system systems have multiple system level modes).
What you see as a "system" call is actually just a wrapper that sets up registers and then triggers a Change Mode Exception of some kind (the method is system specific). The system exception handler dispatches to the appropriate system server.
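As a concrete sketch, on AArch64 Linux a write() call ultimately boils down to loading the system call number and arguments into registers and executing svc, which raises exactly that kind of change-mode exception (64 is __NR_write in the AArch64 syscall table):

    #include <stdint.h>
    #include <stddef.h>

    /* Minimal "system call wrapper": number in x8, arguments in x0-x2,
     * then 'svc #0' traps into kernel mode. The kernel's exception handler
     * dispatches to the write() implementation and returns here. */
    static long raw_write(int fd, const void *buf, size_t count)
    {
        register long x8 __asm__("x8") = 64;      /* __NR_write       */
        register long x0 __asm__("x0") = fd;
        register long x1 __asm__("x1") = (long)buf;
        register long x2 __asm__("x2") = (long)count;

        __asm__ volatile("svc #0"                 /* change-mode trap */
                         : "+r"(x0)
                         : "r"(x8), "r"(x1), "r"(x2)
                         : "memory");
        return x0;                                /* result or -errno */
    }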
You cannot just write your own function and do bad things. True, sometimes people find bugs that allow circumventing the system protections. As a general principle, you cannot access devices unless you do it through the system services.

TrustZone Memory Partitioning

I am reading about ARM Trustzone at this link. I understand that using TrustZone, one can partition the memory into secure and non-secure regions. Vendors may use this to run a secure OS.
What I am curious about is what granularity is supported for this partitioning. Is it just that there can be a block of memory marked "secure", and there can be only one such block of memory per OS? Does TrustZone have the capacity to partition memory for individual processes?
Let's say I have a .so file (hypothetical example) for a Linux application. Could it be possible that the same code is marked secure at virtual addresses 0x1000 to 0x2000 in process A, while in process B it is marked secure at virtual addresses 0x5000 to 0x6000?
TrustZone partitioning happens at the physical memory level, so the process-level parts of your question don't really apply. Note that Linux as the non-secure OS can't even see secure memory, so having virtual mappings for inaccessible addresses would be of little use; however the secure OS does have the ability to map both secure and non-secure physical addresses by virtue of the NS bit in its page table entries.
As for how that physical partitioning goes, it depends on the implementation. The TZC-380 your link refers to supports 2-16 regions with a minimum 32KB granularity; its successor the TZC-400 has 9 regions, and goes all the way down to 4KB granularity. Other implementations may be different still, although granularity below 4KB is unlikely since that would be pretty much unusable for the CPU with its MMU on. Also, there are usually some things in a system which are going to be hardwired to the secure memory map only (the TZC's programming interface, for one), and that often includes some dedicated secure SRAM.

What is partition checker in ARM Secure Mode

As per this link
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0333h/Chdfjdgi.html
under
System boot sequence
...
Program the partition checker to allocate physical memory available to the Non-secure OS.
What is the partition checker? Is it a subsystem which has registers, what is its programming model ?
What is the partition checker?
It is outside of the TrustZone specification for the CPU. However, in a nutshell, it partitions or divides memory spaces into regions with different permitted accesses. If an access is not permitted, it throws an external bus error.
Is it a subsystem which has registers, what is its programming model?
Typically, it is a bunch of registers. It may be multiple register files. For instance, an APB (peripheral bus), an AHB (older ARM bus) and a newer AXI (TrustZone-aware bus) may all be present in one system. There may even be multiple APB buses, etc.
From the same page,
The principle of TrustZone memory management is to partition the physical memory into Secure and Non-secure regions.
It should be added that partitioning the masters as secure and non-secure is also important. The partitioning is outside the ARM CPU TrustZone specification; it is part of the BUS architecture. It is up to a bus controller/structure to implement this. The bus controller has both masters (CPUs, DMA peripherals, etc) and slaves (memory devices, register interfaces, etc) connected.
Partitioning in the context of the ARM TrustZone document is a little nebulous, as it is up to each SOC and the bus controllers (and hierarchy) to implement the details. As above, it partitions or divides memory spaces into regions with different permitted accesses. This is just like supervisor versus user access on traditional ARM (AMBA) AHB buses. The AXI interface adds an NS bit.
Here are possible combinations for a bus controller to support.
             | Read   | Write
-------------+--------+-------
Normal User  | yes/no | yes/no
Normal Super | yes/no | yes/no
Secure User  | yes/no | yes/no
Secure Super | yes/no | yes/no
The SCR NS bit will dynamically determine whether the 'NS' bit is set on bus accesses. This is a TrustZone difference. For the super and user, there is a traditional HPROT bit. As well, each master will assert a WRITE/~READ signal (maybe the polarity is different, but we are software not hardware).
A DMA master (Ethernet, USB, etc.) may also send out requests on a bus. Typically, these are set up and locked at boot time. If your secure world uses the Ethernet, then it is probably a secure DMA master so it can access secure memory. The Ethernet chip also typically has a slave register interface; it must be marked (or partitioned) as secure. If the normal world accesses the Ethernet register file, then a bus error is thrown. A vendor may also make DMA peripherals that dynamically set the NS bit depending on the command structure. The CAAM is a crypto engine that can set up job descriptors to handle both normal and secure accesses, as an example of a DMA master which does both.
A CPU (say Cortex-M4 or Cortex-R) may also be globally secure or normal. Only the Cortex-A series (and ARMv6) with full TrustZone will dynamically toggle the NS bit allowing the CPU to be both secure and normal, depending on context.
Slave peripherals may be partitioned. For example, the first 10MB of SDRAM may be normal and secure read/write for inter-world communication. The next 54MB may be read/write for the normal world only. Then a final 64MB may be read/write secure for the secure world. Typically, register interfaces for peripherals are an all-or-nothing setup.
These are all outside the scope of an MMU and deal only with physical addresses. If the SOC locks them after boot, it is impossible for anyone to change the mapping. If the secure world code is read-only, it may be more difficult to engineer an exploit.
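As an illustration only, programming such a partition checker for the SDRAM example above might look like this; the register layout is entirely made up (real parts such as the TZC-380/400 have their own programming models):

    #include <stdint.h>

    /* Hypothetical per-region register block of a partition checker. */
    struct pc_region {
        volatile uint32_t base;   /* physical base address          */
        volatile uint32_t size;   /* region size in bytes           */
        volatile uint32_t perm;   /* who may read/write this region */
    };

    #define PC_NS_READ   (1u << 0)
    #define PC_NS_WRITE  (1u << 1)
    #define PC_S_READ    (1u << 2)
    #define PC_S_WRITE   (1u << 3)
    #define PC_LOCK      (1u << 31)  /* no further changes until reset */

    static void partition_sdram(struct pc_region *r)  /* checker's register file */
    {
        /* 10MB inter-world buffer: both worlds may read/write. */
        r[0].base = 0x80000000u; r[0].size = 10u << 20;
        r[0].perm = PC_NS_READ | PC_NS_WRITE | PC_S_READ | PC_S_WRITE;

        /* 54MB for the normal world only. */
        r[1].base = 0x80A00000u; r[1].size = 54u << 20;
        r[1].perm = PC_NS_READ | PC_NS_WRITE;

        /* 64MB for the secure world only; a normal world access is a bus error. */
        r[2].base = 0x84000000u; r[2].size = 64u << 20;
        r[2].perm = PC_S_READ | PC_S_WRITE;

        /* Lock the mapping so it cannot be changed until the next reset. */
        r[0].perm |= PC_LOCK; r[1].perm |= PC_LOCK; r[2].perm |= PC_LOCK;
    }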
Typically, all APB buses are layered on an AHB bus, which connects to an AXI main bus like a tree. The AXI bus is the default for a Cortex-A. Each bus will have a list of slaves and masters and will support various yes/no configurations, which may be a subset of the table above; i.e. it may not care about read/write or super/user or some other permutations. It will be different for each ARM system. In some cases, a vendor may not even support it; then it may be more difficult to make the system secure or even to use TrustZone. See: Handling ARM TrustZones, where some of the bus issues are touched on in less detail.
See: TrustZone versus Hypervisor which gives some more details.

Resources