Embedded Linux - Booting phases - linux

I would like to systematize my U-Boot/Linux knowledge. Is it true that a minimum of two bootloader phases is needed on every embedded platform, or can the following process vary?
The 1st-stage bootloader (can be U-Boot) is stored in the processor's internal ROM and can't be updated. It runs from internal cache memory. This bootloader needs to (at least): initialize RAM, initialize external flash, initialize the serial console, then read and run the 2nd-stage bootloader.
The 2nd-stage bootloader (can be U-Boot) is stored in RW flash memory. It handles Ethernet, flash RW functions, etc. This U-Boot can be customized and overwritten. Its main task is to load the Linux kernel into RAM and run it.
Linux kernel startup.
Is the 1st-stage bootloader always read-only?

Where and how that first bootloader lives is heavily system dependent. You might have some sort of USB-bootable device that enumerates and downloads firmware to RAM all in hardware, and then the processor boots from that RAM.
Normally, yes, the first boot is from some sort of flash. It is a good idea to keep that first bootloader uber-simple, essentially 100% bug free, durable and reliable, with perhaps a serial or other way to get in so that you can use it to replace the second/real bootloader.
Ideally the second bootloader wants to be flash as well, the second bootloader would want to do the bulk of the work, initializing ddr, setting up ethernet if it wants to have some sort of ethernet based debugging or transferring of files, bootp, etc. Being significantly larger and more complicated it is expected to both have bugs and need to be upgraded more often than the primary bootloader. The primary is hopefully protected from being overwritten, so that you can comfortably replace the second bootloader without bricking the system.
Do all systems use the above? No, some/many may use only a single bootloader, perhaps with a pause very early so that a keystroke on a serial port can interrupt the bootloader and take you to a place where you can re-load it. That allows bootloader development with fewer chances of bricking, though there is still a chance if you mess up the first bit up to and including the keystroke/serial flash-loader part. Here again, that serial loader is not always present; it is just a convenience for the bootloader developers. Often the fallback will be JTAG, a removable PROM, or some other in-system way to get in and reprogram the PROM when you brick it (sometimes the same way you program it the first time in-system when the board is produced; some designs are brickable to save on cost and use pre-programmed flashes during manufacturing so that the first boot works).
A Linux bootloader does not require any/all of this; a very, very minimal one just sets up RAM, preps the command line, ATAGs or device tree, and branches to Linux.
It is a loaded question as the answer is heavily dependent on your system, processor, design engineers (including you). Traditionally processors boot from flash and the bootloader gets memory and some other things up so the next bit of code can run. That next bit of code can come from many places, usb, disk, flash/rom, ethernet/bootp/tftp, pcie, mdio, spi, i2c, etc. And there can be as many layers between power on reset and linux starting as the design desires or requires.

The first-stage bootloader doesn't have to be read-only - but putting a read-only bootloader in ROM with some recovery mode is helpful in case you corrupt the read-write parts of flash; otherwise you'll need to physically attach a programmer to the flash chip in order to recover.

If you are using U-Boot, the 2nd-stage bootloader can be skipped to speed up the boot time. In other words, the first-stage bootloader (the SPL) loads the Linux kernel directly, skipping the second-stage bootloader (u-boot proper). In U-Boot, this is called Falcon Mode.

Related

How the u-boot start instruction is found by the ROM code

I am trying to understand ARM Linux Boot Process.
These are the things I understood:
When the reset button is pressed on any processor, it jumps to the reset vector or address; in the case of ARM it is either 0x00000000 or 0xFFFF0000.
This location contains the start-up code, also called the ROM code or Boot ROM code.
My query is: how does this Boot ROM code get the address of u-boot's first instruction?
It depends on the SoC, and the scheme used for booting will differ from one SoC to another. It is usually documented in the SoC's reference manual, which describes the various conventions (where to read u-boot from, specific addresses) the u-boot port specific to this SoC should follow in order for the code in ROM to be able to load u-boot, and ultimately transfer control to it.
This code in the ROM could do something like:
- If pin x is 0, read 64 KiB from the first sector of the eMMC into the on-chip static RAM (OCRAM), then transfer control to the code located at, say, offset 256 of the OCRAM.
- If pin x is 1, configure the UART for 19200 baud, 8 data bits, no parity, 1 stop bit, attempt to read 64 KiB from the serial port using the X-MODEM protocol into the OCRAM, then transfer control to the code located at offset 256 of the OCRAM.
This code, which is often named the Secondary Program Loader (SPL), would then be responsible for, say, configuring the SDRAM controller, reading the non-SPL part of u-boot into the beginning of the SDRAM, then jumping to a specific address in the SDRAM. The SPL for a given SoC should be small enough to fit in the SoC's on-chip RAM. The ROM code would be the Primary Boot Loader in this scenario.
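As a rough illustration of that division of labour, here is a minimal SPL-style sketch for a hypothetical SoC; every helper, address and size below is a made-up placeholder, not any real vendor API:

```c
/* Hypothetical SPL sketch: the ROM code has already copied this into
 * on-chip SRAM and jumped to it.  All helpers and addresses are made-up
 * placeholders for SoC-specific code. */
#include <stdint.h>

#define UBOOT_LOAD_ADDR  0x80000000u   /* start of SDRAM (example value)    */
#define UBOOT_FLASH_OFF  0x20000u      /* where full u-boot sits in flash   */
#define UBOOT_SIZE       0x80000u      /* bytes to copy (example value)     */

void clocks_init(void);                /* PLLs and dividers                 */
void console_init(void);               /* early UART for messages           */
void sdram_init(void);                 /* SDRAM/DDR controller setup        */
void flash_read(uint32_t off, void *dst, uint32_t len);

void spl_main(void)
{
    clocks_init();
    console_init();
    sdram_init();                      /* DRAM is usable from here on       */

    /* Copy the full bootloader out of flash into DRAM and jump to it. */
    flash_read(UBOOT_FLASH_OFF, (void *)UBOOT_LOAD_ADDR, UBOOT_SIZE);

    void (*uboot_entry)(void) = (void (*)(void))UBOOT_LOAD_ADDR;
    uboot_entry();                     /* never returns                     */
}
```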
In the case of the TI AM335x Cortex-A8 SoCs for example, section 26.1.6 of the Technical Reference Manual, and more specifically figure 26-10, explains the boot process. Some input pins may be used by the ROM code to direct the boot process - see SYSBOOT Configuration Pins in table 26-7. See The AM335x U-Boot User's Guide for more u-boot specific, AM335x-related information.
ARM doesn't make chips; it makes IP that chip vendors purchase. It is one block in their chip design, and usually they have many other blocks: USB controllers (likely purchased IP), a PCIe controller (likely purchased IP), DDR, Ethernet, SATA, eMMC/SD, etc., plus glue logic, plus whatever secret sauce they add to make it different and more interesting to the consumer than the competition.
The address space, particularly for the full-sized ARMs, is fairly wide open, so even if they use the same USB IP as everyone else, that doesn't mean it is at the same address as everyone else.
There is no reason to assume that all chips with a Cortex-A are centered around the Cortex-A; the Cortex-A may just be there to boot and manage the real thing the chip was made for. The chips you are asking about, though, are most likely centered around the ARM processor: the purpose of the chip is to make a "CPU" that is ARM based. What we have seen in that market is a need to support various non-volatile storage solutions. Some may wish to be RAM heavy and don't care that a slow SPI flash is used to get the kernel and root file system over, with everything at runtime being RAM based, including the file system. Some may wish to support traditional hard drives as well as RAM, with the file system on something SATA, for example, spinning media or SSD. Some may wish to use eMMC, SD, etc. With the very high cost of chip production, it does not make sense to make one chip for each combination; rather, make one chip that supports many combinations. You use several "strap" pins (which are not pins but balls/pads on the BGA) that the customer ties to ground or "high" (whatever the definition of that voltage is) so that when the chip comes out of reset (whichever of the reset pins for that product are documented as sampling the strap pins) those strap pins tell the "processor" (the chip as an entity) how you desire it to boot: I want you to first look for an SD card on this SPI bus; if nothing is there, look for a SATA drive on this interface; if nothing is there, please fall into the X-MODEM bootloader on UART0.
This leads into Frant's excellent answer. What IP is in the chip, what non-volatile storage is possibly supported, and what possible ways of loading a bootloader the "chip" itself supports are very much chip specific. It is not just that Broadcom does it this way and TI does it another way; a specific chip or family of chips within a vendor's possibly vast array of products has its own scheme, and there is no reason to assume any two products from a vendor work the same way. You read the documentation for each one you are interested in. Certainly don't assume any two vendors' details are remotely the same, even though it is very likely that they have purchased similar IP for certain technologies (maybe everyone uses the same USB IP for example, or most USB IP conforms to a common set of registers, or maybe not...).
I have not even gotten to the ARM core yet; in these designs you could likely change your mind, pull the ARM out, put a MIPS in, and sell that as a product...
Now, does it make sense to, say, write logic that reads an SPI flash, loads the contents of that flash into an internal SRAM, and then for that boot mode puts that SRAM at the ARM processor's address zero and resets the ARM? Yes, doing that in logic only is not a horrible idea. But does it make sense, for example, to have logic dig through a file system on a SATA drive to find some bootloader? Maybe not so much. Possible, sure, but maybe your product will be viable longer if you instead put a ROM in the product that can be aimed at the ARM's address zero. The ARM boots that; the ARM code in that ROM reads the straps, decides what the boot media is, spins up that peripheral (SATA, eMMC, SPI, etc.), wades through the filesystem looking for a filename, copies that file to SRAM, re-maps the ARM address space (not using an MMU but using the logic in the chip) and fakes a reset by branching to address zero. (The ROM is mapped in at least two places, address zero and some other address, so that it can branch to the other address space, allowing address zero to be remapped and reused.) That way, if down the road you find a bug, all you have to do is change the image burned into the ROM before you deliver the chips, rather than spin the chip to change the transistors and/or the wiring of the transistors (pennies and days/weeks vs millions of dollars and months). So you may actually never see, or write, the code that the ARM processor boots into on reset; the reset line to the ARM core is something you might never have any physical or software access to.
THEN, depending on the myriad of boot options for this or any of the many chip offerings, the next step is very much specific to that chip and possibly that boot mode. You, as owner of all the boot code for that board-level product, may have to, per the chip and board design, bring up DDR, bring up PCIe, bring up USB. Or perhaps some chip-vendor logic/code has done some of that for you (unlikely, but maybe for specific boot cases). Now, you have these generic and popular "boot loaders" like u-boot; as the software designer and implementer you may choose to have code that precedes u-boot and does a fair amount of the work, because maybe porting u-boot is a PITA, maybe not. Also note that u-boot is in no way required for Linux. Booting Linux is easy; u-boot is a monstrosity, a beast of its own, and the easiest way to boot Linux is to not bother porting u-boot. What u-boot gives you is an already-written bootloader, and it is an argument that can go either way whether it is cheaper to port u-boot or to roll your own (or port one of u-boot's competitors). It depends on the boot options you want: if you want bootp/tftp or anything involving a network stack, well, that's a task, although there are off-the-shelf solutions. If you want to access a file system on some media, that is another strong argument to just use u-boot. But if you don't need all of that, then maybe you don't need u-boot.
You have to walk the list of things that need to happen before Linux boots. The chips tend not to have enough on-chip RAM to hold the Linux kernel image and the root file system, so you need to get DDR up before Linux. You probably need to get PCIe up and enumerated, and maybe USB (I have not looked at that), but Ethernet, as an example, can be brought up by the Linux driver for that peripheral.
The requirements to "boot" linux on arm ports of linux and probably others are relatively simple. you copy the linux kernel to some space in memory ideally aligned or at an agreed offset from an aligned address (say 0x10001000 for example, just pulling that out of the air), you then provide a table of information, how much ram there is, the ascii kernel boot string, and these days the device tree information. you branch to the linux kernel with one of the registers say r0 pointed at this table (google ATAG arm linux or some such combination of words). thats it booting linux using a not that old kernel is setting a few dozen bytes in ram, copy the kernel to ram, and branch to it, a few dozen lines of code, no need for the u-boot monstrosity. Now it is more than a few dozen bytes but it is still a table generated independent of u-boot, place that in ram, place the kernel in ram set one or more registers to point at the table, branch to the address where the kernel lives "booting linux" is complete or the linux bootloader is complete.
You still have to port Linux, which is a task that requires lots of trial and error and eventually years of experience, particularly since Linux is a constantly evolving beast in and of itself.
How do you get to the u-boot code? You may have some pre-u-boot code you have to write that finds u-boot, copies it to RAM, then branches to it. Or the chip vendor may have solved this for you, and "all you have to do" is put u-boot where they tell you for the media of choice; then either u-boot is placed at address zero in the ARM memory space in some internal SRAM, or u-boot is placed at some non-zero address in the ARM memory space and some magic (a ROM-based bootloader in the chip) causes your u-boot code to execute from that address.
One I messed with recently is the TI chip used on the various BeagleBoards: Black, Green, White, Pocket, etc. In one of its boot modes it looks at a specific offset on the SD card (not part of a file system, a specific logical block if you will, basically a specific offset in the SD card address space) for a table. That table includes where in the "processor's" address space you want the attached "bootloader" to be copied to, whether it is compressed, etc. You make your bootloader (roll your own or build a u-boot port), and you build the correct table per the documentation with a destination address, how much data, possibly a CRC/checksum, whatever the docs say. The "chip" magically (probably software, but it might be pure logic) copies that over and causes the ARM to start executing at that address (likely ARM software that simply branches there). And that is how YOU get u-boot running on that product line with that boot option.
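For a feel of what such a table can look like, here is a hedged sketch loosely modelled on that SoC family's raw image header; the exact layout and any extra fields live in the TRM, so treat this struct as illustrative only:

```c
#include <stdint.h>

/* Illustrative only: the ROM reads a small header in front of the raw image
 * on the SD card, then copies image_size bytes that follow it to
 * load_address (typically on-chip SRAM) and branches there. */
struct boot_image_header {
    uint32_t image_size;    /* size in bytes of the payload after the header */
    uint32_t load_address;  /* destination address the ROM copies it to      */
};
```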
The SAME product line has other strap options and other SD-card boot options to get a u-boot loaded and running.
Other products from other vendors have different solutions.
The Broadcom chip in the Raspberry Pi is a totally different beast, or at least is used totally differently. It has a Broadcom (invented or purchased) GPU in it. That GPU boots some ROM-based code that knows how to find its own first-stage bootloader on an SD card. That first-stage loader does things like initialize DDR; there isn't PCIe so that doesn't have to happen, and I don't think the GPU cares about USB, so that doesn't have to get enumerated either. But it does search out a second-stage bootloader of GPU code, which is really an RTOS it is loading: the code the GPU uses to do its graphics features to offload the burden on the ARM. In addition, that software also looks for a third file on the flash (and a fourth and an nth); let's just go with the third, kernel.img, which it copies to RAM (the DDR is shared between the GPU and the ARM, but with different addressing schemes) at an agreed offset (0x8000 if kernel.img is used without config.txt adjustments). The GPU then writes a bootstrap program and ATAGs into the ARM's memory at address zero and releases reset on the ARM core(s). The GPU is the bootloader, with relatively limited options, but for that platform design/solution there is one media option, a removable SD card, and which operating system, etc. you run on the ARM is whatever is on that SD card.
I think you will find lots of straps driving multiple possible non-volatile storage media peripherals to be the more common solution. Whether any one of these boot options for a particular SoC can take u-boot (or your bootloader of choice, or one you write yourself) directly, or whether a pre-u-boot program is required for any number of reasons (the on-chip SRAM is too small for a full u-boot, let's say, for the sake of argument), is specific to that boot option for that chip from that vendor, and it is documented somewhere; although if you are not part of the company making the board that signed the NDA with that chip vendor, you may not get to see that documentation. And/or, as you may know or will learn, that doesn't mean the documentation is good or makes sense. Some companies or products do a bad job, some do a good job, and most are somewhere in between. If you are paying them for parts and have an NDA, you at least have the option of getting or buying tech support and can ask direct questions (again, the answers may not be useful; it depends on the company and their support).
Just because there is an ARM inside means next to nothing. ARM sells processor cores as IP, not chips; depending on the SoC design it may be easy or moderately painful, but possible, to pull the ARM out and put some other purchased IP (like MIPS) or free IP (like RISC-V) in there and re-use the rest of your already-tested design/glue. Trying to generalize ARM-based processors is like trying to generalize automobiles with V-6 engines: if I have a vehicle with a V-6 engine, does that mean the headlight controls are always on the dash to the left of the steering column? Nope!

I/O Memory Mapping

I am reviewing the essentials of I/O, and while I think I understand most of what's going on, I'm still confused as to how either physical addresses or separate ports are mapped to individual devices. Does the computer poll the bus on system boot, assigning addresses to devices one by one, or are there fixed addresses that are loaded into memory somewhere? If this is done via the BIOS, how is this memory layout information relayed to the operating system?
Thanks for your help!
(this question has been asked and answered before, you should search first)
It depends on the platform; you were not specific.
On some systems, some peripherals are hardcoded by the chip/system designers.
For PCI(e), as defined by that standard, you enumerate the bus(es) searching for attached peripherals, and those peripherals' configuration spaces (which are defined by the peripheral vendor per their needs) indicate how many address windows they need and how big. For an x86 PC, the BIOS does this enumeration, not the operating system; for other platforms it depends on that platform, and it may be the bootloader or the operating system. But someone has to take the available space (essentially hardcoded for that platform, knowing the platform and what is used already) and divide it up. For x86 it used to be just one gig that was divided up in the 32-bit days, and that still happens on some systems, but for 64-bit systems the BIOSes open that up to two gig for everything and can place it in a high address space to avoid RAM (ever wonder why your 32-bit system with 4 gig of DRAM only had 3 gig usable?). Naturally, a flat memory space is only an illusion: the windows asked for by the PCI peripherals can be small windows into their space (video cards with lots of RAM, for example). You use the CSRs to move the window about, kind of like standing in your house looking out a small window and physically moving side to side to see more stuff through the window, but only a window's worth at any one time.
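As a concrete taste of what that enumeration involves, here is a hedged sketch of how an enumerator asks one 32-bit memory BAR how much address space it wants; the config-space access helpers are hypothetical stand-ins for whatever mechanism the platform uses (I/O ports 0xCF8/0xCFC, memory-mapped ECAM, ...):

```c
#include <stdint.h>

/* Hypothetical config-space accessors; the real mechanism is platform
 * specific (legacy x86 I/O ports, ECAM, firmware services, ...). */
uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off);
void     pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off,
                         uint32_t val);

/* Returns the size in bytes requested by 32-bit memory BAR `bar` (0..5). */
uint32_t pci_bar_size(uint8_t bus, uint8_t dev, uint8_t fn, int bar)
{
    uint8_t  off  = 0x10 + 4 * bar;            /* BAR0 lives at offset 0x10 */
    uint32_t orig = pci_cfg_read32(bus, dev, fn, off);

    pci_cfg_write32(bus, dev, fn, off, 0xFFFFFFFF);   /* probe the BAR      */
    uint32_t probe = pci_cfg_read32(bus, dev, fn, off);
    pci_cfg_write32(bus, dev, fn, off, orig);         /* restore original   */

    probe &= ~0xFu;            /* mask off the memory-type/prefetch bits    */
    return ~probe + 1;         /* size = two's complement of the mask       */
}
```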
The same goes for USB: it is enumerated, the busses are searched and the peripherals answer. With USB, though, it doesn't map into the address space of the host.
How the operating system finds this information is heavily dependent on the type of system. With a BIOS on an x86 there is a known way to get that info; I think you can also get at the same info in DOS (yes, DOS is still heavily used). For non-PCIe or non-USB devices the operating system drivers have to find the peripherals, or just know where they are if the platform is consistent (the addresses of the serial ports in a PC), or have a way of finding them without harming other devices or crashing. There are also cases where the operating system itself did the enumeration, or the bootloader, if that is where enumeration happened. But each combination of bootloader and operating system on top of various platforms may have its own different solution; there is no reason to expect them to be the same.
Okay, you did say BIOS and have a bios tag, implying x86 systems. The BIOS does PCI/PCIe enumeration at boot time; if you don't set up your BIOS to know that your operating system is 64-bit, it may take a gig out of your lower 4-gig space for the PCIe devices (and if you set it for 64-bit but install a 32-bit operating system, then you are in trouble there for other reasons). I don't remember the details, but I would assume there are BIOS calls the operating system can use to find out what the BIOS has done; it should not be hard at all to find this information. Anything not discoverable in this way is likely legacy and hardcoded, or uses legacy techniques for being discoverable (ISA-bus-style searching for a BIOS across a range of addresses, etc.). The PCIe/USB vendor and product ID information tells the drivers what is there, and from that they have hardcoded offsets into those spaces to complete the addresses needed to communicate with the peripherals.

Does an initrd really reduce the kernel image size in the case of bootpImage?

According to Wikipedia, in the article about initrd:
"Many Linux distributions ship a single, generic kernel image - one that the distribution's developers intend will boot on as wide a variety of hardware as possible. The device drivers for this generic kernel image are included as loadable modules because statically compiling many drivers into one kernel causes the kernel image to be much larger, perhaps too large to boot on computers with limited memory. This then raises the problem of detecting and loading the modules necessary to mount the root file system at boot time, or for that matter, deducing where or what the root file system is.
To avoid having to hardcode handling for so many special cases into the kernel, an initial boot stage with a temporary root file-system — now dubbed early user space — is used. This root file-system can contain user-space helpers which do the hardware detection, module loading and device discovery necessary to get the real root file-system mounted. "
My question is: if we add the modules etc. needed to mount the actual filesystem to the initrd rather than to the kernel image itself, in order to save space, then what do we achieve in the case of bootpImage, where both the kernel and the initrd are combined to form a single image? The size of that combined image would increase even when using an initrd.
Can someone clarify?
Define "the size of the kernel".
Yes, if you have a minimal kernel image plus an initrd full of hundreds of modules, it will probably take up more total storage space than the equivalent kernel with everything compiled in, what with all the module headers and such. However, once it's booted, determined what hardware it's on, loaded a few modules and thrown all the rest away (the init in initrd), it will take up considerably less memory. The all-built-in kernel image on the other hand, once booted, is still just as big in memory as on disk, wasting space with all that unneeded driver code.
Storage is almost always considerably cheaper and more abundant than RAM, so optimising for storage space at the cost of reducing available memory once the system is running would generally be a bit silly. Even for network booting, sacrificing runtime capability for total image size for the sake of booting slightly faster makes little sense. The few kinds of systems where such considerations might have any merit almost certainly wouldn't be using generic multiplatform kernels in the first place.
There are several aspects to size, and this may be confusing.
Binary size on disk/network
Boot time size
Run time size
tl;dr: Using an initrd with modules gives a generic image a minimal run-time memory footprint with current (3.17) Linux kernel code.
My question is: if we add the modules etc. needed to mount the actual filesystem to the initrd rather than to the kernel image itself, in order to save space, then what do we achieve in the case of bootpImage, where both the kernel and the initrd are combined to form a single image? The size of that combined image would increase even when using an initrd.
You are correct in that the same amount of data will be transferred no matter which mechanism you choose. In fact, the initrd with module loading will be bigger than a fully statically linked kernel, and the boot time will be slower. Sounds bad.
A customized kernel which is specifically built for the device and contains no extra hardware drivers nor module support is always the best. The Debian handbook chapter on kernel compilation gives two reasons a user may want to build a custom kernel:
Limit the risk of security problems via feature minimization.
Optimize memory consumption.
The second reason is often the most critical: minimizing the amount of memory that a running kernel consumes. The initrd (or initramfs) is a binary disk image that is loaded as a RAM disk. It is all user code, with the single task of probing the devices and using module loading to get the correct drivers for the system. After this job is done, it mounts the real boot device or the normal root file system. When this happens, the initrd image is discarded.
The initrd does not consume run-time memory. You get both a generic image and one that has a fairly minimal run time footprint.
I will say that the efforts made by the distro people have on occasion created performance issues. Typically, ARM drivers used to be compiled for only one SoC: although the source supported an SoC family, only one member could be selected through build-time conditions. In more recent kernels the ARM drivers always support the whole SoC family. The memory overhead is minimal. However, using a function pointer for a low-level driver transfer function can limit the bandwidth of the controller.
The cache-flush routines have an option for multi-cache support. The function pointers force the compiler to spill registers. However, if you compile for a specific cache type, the compiler can inline the functions, which often generates much better and smaller code. Most drivers do not have this type of infrastructure, but you will have better run-time behavior if you compile a monolithic kernel that is tuned for your CPU, because several critical kernel functions will use inlined functions.
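A rough sketch of that trade-off (names loosely modelled on the ARM kernel's multi-cache support, not copied from it):

```c
/* Multi-cache build: the flush routine is picked at boot and reached
 * through a function pointer, so the compiler can neither inline it nor
 * keep values in registers across the call. */
struct cpu_cache_fns {
    void (*flush_kern_all)(void);
};
extern struct cpu_cache_fns cpu_cache;

static inline void flush_cache_generic(void)
{
    cpu_cache.flush_kern_all();        /* indirect call                      */
}

/* Single-cache build: the exact routine is known at compile time, so the
 * call is direct and a candidate for inlining and other optimization. */
void v7_flush_kern_cache_all(void);

static inline void flush_cache_tuned(void)
{
    v7_flush_kern_cache_all();         /* direct call                        */
}
```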
Drivers will not usually be faster when compiled into the kernel. Many systems support hot-plug via USB, PCMCIA, SDIO, etc.; these systems have a memory advantage with module loading as well.

Difference between user-space driver and kernel driver [duplicate]

I have been reading "Linux Device Drivers" by Jonathan Corbet. I have some questions that I want to know:
What are the main differences between a user-space driver and a kernel driver?
What are the limitations of both of them?
Why are user-space drivers commonly used and preferred over kernel drivers nowadays?
What are the main differences between a user-space driver and a kernel driver?
User space drivers run in user space. Kernel drivers run in kernel space.
What are the limitations of both of them?
The kernel driver can do anything the kernel can, so you could say it has no limitations. But kernel drivers are much harder to "prove correct" and debug. It's all too easy to introduce race conditions, or use a kernel function in the wrong context or with the wrong locking. Things will appear to work for a while, but cause problems (including crashing the whole system) down the road. Drivers must also be wary when reading all input (both from the device and from userspace) because invalid data can sometimes cause crashes.
A user-space driver usually needs a small shim in the kernel to do its bidding. Usually, that shim provides a simpler API. For example, the FUSE layer lets people write file systems in any language; they can be mounted, read/written, then unmounted. The shim must also protect the kernel against all invalid input.
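Another such shim is the UIO framework. Here is a hedged sketch of the user-space side; /dev/uio0 and the register at offset 0 are hypothetical, and a real device needs a matching kernel-side UIO binding (uio_pdrv_genirq or similar):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Map the first register window exposed by the UIO kernel shim. */
    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    printf("register 0: 0x%08x\n", regs[0]);      /* example device access  */

    /* Block until the device raises an interrupt; UIO returns an IRQ count. */
    uint32_t irq_count;
    if (read(fd, &irq_count, sizeof(irq_count)) == sizeof(irq_count))
        printf("interrupts so far: %u\n", irq_count);

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}
```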
User-space drivers have lots of limitations. For example, the kernel reserves some memory for use during emergencies, but that is not available to user-space. During memory pressure, the kernel will kill random user-space programs, but never kill kernel threads. User-space programs may be swapped out, which could lead to your device being unavailable for several seconds. (Kernel code cannot be swapped out.) Running code in user-space requires several context switches. These waste a "lot" of CPU time. If your device is a 300-baud modem, nobody will notice. But if it's a gigabit Ethernet card, and every packet has to go to your user-space driver before it gets to the real user, the system will have major bottlenecks.
User space programs are also "harder" to use because you have to install that user-space software, which often has many library dependencies. Kernel modules "just work".
Why are user-space drivers commonly used and preferred over kernel drivers nowadays?
The question is "Does this complexity really need to be in the kernel?"
I used to work for a company that made USB dongles that talked a particular protocol. We could have written a full kernel driver, but instead just wrote our program on top of libUSB.
The advantages: The program was portable between Linux, Mac, Win. No worrying about our code vs the GPL.
The disadvantages: If the device needed to send data to the PC and get a response quickly, there was no guarantee that would happen. For example, if we needed a real-time control loop on the PC, it would be harder to have bounded response times. (Maybe not entirely impossible on Linux.)
If there is a way to do it in userspace, I would try that first. Only if there are significant performance bottlenecks, or significant complexity in keeping it in userspace would you move it. Even then, consider the "shim" approach, and/or the "emulator" approach (where your kernel module makes your device look like a serial port or a block device.)
On the other hand, if there are already several kernel modules similar to what you want, then start there.

Multi threaded BIOS

I would like to know why the BIOS is single-threaded even though we have 4 cores/8 cores. The latest UEFI technology allows GUI utilities. Is there any specific reason for not implementing a multi-threaded BIOS?
The simple answer is: Diminishing Returns
On most PCs, the BIOS/UEFI part of the boot sequence only takes ~5 seconds (not counting HDD spin-up latency). To most people, that is fast enough. (If you want faster, put your PC to sleep instead of turning it off.)
Keep in mind that many of the tasks done in the BIOS cannot be parallelized. The memory controller has to be initialized first. The PCI/PCIe busses must be enumerated before you can check any of the downstream devices (USB, SATA, video, etc.). You can't boot until your disks have spun up.
There are a few initialization items that are time-consuming, and could be done in parallel.
IDE/SATA - Usually takes a while due to mechanical disk latencies.
USB - Some USB devices need 100s of msec after power is applied to come to life.
Video (and any other third-party BIOS extensions) - It takes a while to communicate with the displays and sync up.
Those tasks could be done in parallel, which might speed up your PC's boot time. Keep in mind that to get there, you need to write a kernel and task scheduler. In a legacy BIOS (pure x86 assembler), this would not be pretty. In UEFI (which is mostly C source), this is a little more feasible. However, it still requires a non-trivial engineering effort for a minor gain (maybe 1-2 seconds of boot time).
Phoenix has tried to introduce a multi-threaded BIOS initialization before. As far as I know, it never took off.
Because there is no need. The BIOS does not do heavy computations. It does some coordination and then exits (forever).
UEFI does not describe any multiprocessing functionality. However, the PI specification (also produced by the UEFI Forum) does, and EDK2 provides the EFI_MP_SERVICES_PROTOCOL (currently for IA32/X64 only).
It is not exactly pthreads, but it does let you schedule tasks to run on the Application Processors while the Bootstrap Processor keeps providing the single-threaded UEFI instance.
The interface for the DXE phase is described in Volume 3 of the v1.5 PI specification, in the MP Services Protocol section (13.4).
Functionality available during PEI is described in Volume 2, EFI MP Services PPI (8.3.9).
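A hedged, EDK2-flavoured sketch of what dispatching work to the APs looks like during DXE (error handling trimmed; the exact prototypes live in the PI spec and EDK2's Protocol/MpService.h):

```c
#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Protocol/MpService.h>

/* Runs on each Application Processor, e.g. per-core init or a test slice. */
STATIC VOID EFIAPI ApTask(IN OUT VOID *Buffer)
{
}

EFI_STATUS EFIAPI RunOnAllAps(VOID)
{
  EFI_MP_SERVICES_PROTOCOL  *Mp;
  EFI_STATUS                Status;

  Status = gBS->LocateProtocol(&gEfiMpServiceProtocolGuid, NULL, (VOID **)&Mp);
  if (EFI_ERROR(Status)) {
    return Status;
  }

  /* Blocking dispatch: the BSP waits here until every AP has run ApTask. */
  return Mp->StartupAllAPs(Mp, ApTask, FALSE, NULL, 0, NULL, NULL);
}
```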

Resources