I am trying to understand ARM Linux Boot Process.
These are the things I understood:
When reset button is pressed in any processor, it jumps to the reset vector or address, in case of ARM it is either 0x00 or 0xFFFF0000.
This location contains the start up code or ROM Code or Boot ROM Code
My query is how this Boot ROM Code gets the address of u-boot first instruction ?
It depends on the SoC, and the scheme used for booting will differ from one SoC to the other. It is usually documented in the SoC's reference manual, and it does describe the various conventions (where to read u-boot from, specific addresses) the u-boot port specific to this SoC should follow in order to the code in ROM to be able to load u-boot, and ultimately transfer control to u-boot.
This code in the ROM could do something like: - If pin x is 0, read 64KiB from first sector of the eMMC into the On Chip Static RAM, then transfer control to the code located at offset 256 of the OCRAM for example. - If pin x is 1, configure the UART for 19200 bauds, 8 bits parity, no stop bits, attempt to read 64KiB from the serial port using the X-MODEM protocol into the OCRAM, then transfer control to the code located at offset 256 of the OCRAM.This code, which is often named the Secondary Program Loader (SPL) would then be responsible for, say, configuring the SDRAM controller, then read the non-SPL part of u-boot into at the beginnning of the SDRAM, then jump to a specific address in the SDRAM. The SPL for a given SoC should be small enough to fit in the SoC On Chip RAM. The ROM code would be the Primary Boot Loader in this scenario.
In the case of the TI AM335x Cortex-A8 SoCs for example, section 26.1.6 of the Technical Reference Manual, and more specifically figure 26-10, explains the boot process. Some input pins may be used by the ROM code to direct the boot process - see SYSBOOT Configuration Pins in table 26-7. See The AM335x U-Boot User's Guide for more u-boot specific, AM335x-related information.
ARM doesnt make chips it makes IP that chip vendors purchase. It is one block in their chip design, usually they have many other blocks, usb controllers (likely purchased ip), pcie controller (likely purchased ip), ddr, ethernet, sata, emmc/sd, etc. Plus glue logic, plus whatever their secret sauce is they add to this to make it different and more interesting to the consumer than the competition.
The address space, particularly for the full sized arms is fairly wide open, so even if they use the same usb ip as everyone else, doesnt mean it is at the same address as everyone else.
There is no reason to assume that all chips with a cortex-a are centered around the cortex-a, the cortex-a may just be there to boot and manage the real thing the chip was made for. The chips you are asking about are most likely centered around the ARM processor the purpose of the chip is to make a "CPU" that is ARM based. What we have seen in that market is a need to support various non-volatile storage solutions. Some may wish to be ram heavy and dont care that a slow spi flash is used to get the kernel and root file system over and runtime everything is ram based including the file system. Some may wish to support traditional hard drives as well as ram, the file system being on something sata for example, spinning media or ssd. Some may wish to use eMMC, SD, etc. With the very high cost of chip production, it does not make sense to make one chip for each combination, rather make one chip that supports many combinations. You use several "strap" pins (which are not pins but balls/pads on the BGA) that the customer ties to ground or "high" whatever the definition of that voltage is so that when the chip comes out of reset (whichever of the reset pins for that product are documented as sampling the strap pins) those strap pins tell the "processor" (chip as an entity) how you desire it to boot. I want to you to first look for an sd card on this spi bus if nothing there then look for a sata drive on this interface if nothing there then please fall into the xmodem bootloader on uart0.
This leads into Frant's excellent answer. What IP is in the chip, what possible non-volatile storage is supported and what possible solutions of loading a bootloader if the "chip" itself supports it are very much chip specific not just broadcom does it this way and ti does it another but a specific chip or family of chips within their possibly vast array of products, no reason to assume any two products from a vendor work the same way, you read the documentation for each one you are interested in. Certainly dont assume any two vendors details are remotely the same, it is very likely that they have purchased similar ip for certain technologies (maybe everyone uses the same usb ip for example, or most usb ip conforms to a common set of registers, or maybe not...).
I have not gotten to the arm core, you could in these designs likely change your mind and pull the arm out and put a mips in and sell that as a product...
Now does it make sense to say write logic to read a spi flash that loads the contents of that flash into an internal sram and then for that boot mode put that sram at the arm processors address zero then reset the arm? yes that is not a horrible idea to do that in logic only. But does it make sense for example to have logic dig through a file system of a sata drive to find some bootloader? Maybe not so much, possible sure, but maybe your product will be viable longer if you instead put a rom in the product that can be aimed at the arm address zero the arm boots that, the arm code in that rom reads the straps, makes the decision as to what the boot media is spins up that peripheral (sata, emmc, spi, etc) wades through the filesystem looking for a filename, copies that file to sram, re-maps the arm address space (not using an mmu but using the logic in the chip) and fakes a reset to that by branching to address zero. (the rom is mapped in two places at least address zero then some other address so that it can branch to the other address space allowing address zero to be remapped and reused). so that if down the road you find a bug all you hve to do is change the image burned into the rom before you deliver the chips, rather than spin the chip to change the transistors and/or wiring of the transistors (pennies and days/weeks vs millions of dollars and months). so you may actually never see or be the code that the arm processor boots into on reset. The reset line to the arm core you might never have any physical nor software access to.
THEN depending on the myriad of boot options for this or any of the many chip offerings, the next step is very much specific to that chip and possibly that boot mode. You as owning all the bootcode for that board level product may have to per the chip and board design, bring up ddr, bring up pcie, bring up usb. Or perhaps some chip vendor logic/code has done some of that for you (unlikely, but maybe for specific boot cases). Now you have these generic and popular "boot loaders" like u-boot, you as the software designer and implementer may choose to have code that preceeds u-boot that does a fair amount of the work because maybe porting u-boot is a PITA, maybe not. Also note u-boot is in no way required for linux, booting linux is easy, u-boot is a monstrosity, a beast of its own, the easiest way to boot linux is to not bother porting u-boot. what u-boot gives is an already written bootloader, it is an argument that can go either way is it cheaper to port u-boot or is it cheaper to just roll your own (or port one of the competitors to u-boot)? Depends on the boot options you want, if you want bootp/tftp or anything network stack, well thats a task although there are off the shelf solutions. If you want to access a file system on some media, well that is another strong argument to just use u-boot. but if you dont need all of that, then maybe you dont need u-boot.
You have to walk the list of things that need to happen before linux boots, the chips tend to not have enough on chip ram to hold the linux kernel image and the root file system, so you need to get ddr up before linux. you probably need to get pcie up and enumerated and maybe usb I have not looked at that. but ethernet that can be brought up by the linux driver for that peripheral as an example.
The requirements to "boot" linux on arm ports of linux and probably others are relatively simple. you copy the linux kernel to some space in memory ideally aligned or at an agreed offset from an aligned address (say 0x10001000 for example, just pulling that out of the air), you then provide a table of information, how much ram there is, the ascii kernel boot string, and these days the device tree information. you branch to the linux kernel with one of the registers say r0 pointed at this table (google ATAG arm linux or some such combination of words). thats it booting linux using a not that old kernel is setting a few dozen bytes in ram, copy the kernel to ram, and branch to it, a few dozen lines of code, no need for the u-boot monstrosity. Now it is more than a few dozen bytes but it is still a table generated independent of u-boot, place that in ram, place the kernel in ram set one or more registers to point at the table, branch to the address where the kernel lives "booting linux" is complete or the linux bootloader is complete.
you still have to port linux which is a task that requires lots of trial and error and eventually years of experience. particularly since linux is a constantly evolving beast in and of itself.
How do you get to u-boot code? you may have some pre-u-boot code you have to write to find u-boot and copy it to ram then branch to it. the chip vendor may have solved this for you and "all you have to do" is put u-boot where they tell you for the media choice, and then u-boot is placed at address zero in the arm memory space for some internal sram, or u-boot is placed at some non-zero address in the arm memory space and some magic (a rom based bootloader in the chip) causes your u-boot code to execute from that address.
One I messed with recently is the ti chip used on various beagle boards, black, green, white, pocket, etc...One of the boot modes it looks at a specific offset on the sd card (not part of a file system yet, a specific logical block if you will or basically specific offset in the sd card address space) for a table, that table includes where in the "processors" address space you want the attached "bootloader" to be copied to, is it compressed, etc. you make your bootloader (roll your own or build a u-boot port) you build the correct table per the documentation with a destination address how much data, possibly a crc/checksum, whatever the docs say. the "chip" magically (probably software but might be pure logic) copies that over and causes the arm to start executing at that address (likely arm software that simply branches there). And that is how YOU get u-boot running on that product line with that boot option.
The SAME product line has other strap options, has other sd-card boot options to get a u-boot loaded and running.
Other products from other vendors have different solutions.
The broadcom chip in the raspberry pi, totally different beast, or at least how it is used. It has a broadcom (invented or purchased) gpu in it, that gpu boots some rom based code that knows how to find its own first stage bootloader on an sd card, that first stage loader does things like initialize DDR, there isnt pcie so that doesnt have to happen and I dont think the gpu cares about usb so that doesnt have to get enumerated either. but it does search out a second stage bootloader of gpu code, which is really an RTOS that it is loading, the code the GPU uses to do its graphics features to offload the burden on the ARM. In addition to that that software also looks for a third file on the flash (and fourth and nth) lets just go with third kernel.img which it copies to ram (the ddr is shared between the gpu and the arm but with different addressing schemes) at an agreed offset (0x8000 if kernel.img is used without config.txt adjustments to that) the gpu then writes a bootstrap program and ATAGs into arms memory at address zero and then releases reset on the ARM core(s). The GPU is the bootloader, with relatively limited options, but for that platform design/solution one media option, a removable sd card, what operating system, etc you run on the arm is whatever is on that sd card.
I think you will find the lots of straps driving multiple possible non-volatile storage media peripherals being the more common solution. Whether or not one or any of these boot options for a particular SOC can take u-boot (or choose your bootloader or write your own) directly or of a pre-u-boot program is required for any number of reasons (on chip sram is too small for a full u-boot lets say for example, sake of argument) is specific to that boot option for that chip from that vendor and is documented somewhere although if you are not part of the company making the board that signed the NDA with that chip vendor you may not get to see that documentation. And/or as you may know or will learn, that doesnt mean the documentation is good or makes sense. Some companies or products do a bad job, some do a good job and most are somewhere in between. If you are paying them for parts and have an NDA you at least have the option of getting or buying tech support and can ask direct questions (again the answers may not be useful, depends on the company and their support).
Just because there is an ARM inside means next to nothing. ARM makes processor cores not IP, depending on the SOC design it may be easy or moderatly painful but possible to pull the arm out and put some other purchased ip (like MIPS) in there or free ip (like risc-v) and re-use the rest of your already tested design/glue. Trying to generalize ARM based processors is like trying to generalize automobiles with V-6 engines, if I have a vehicle with a V-6 engine does that mean the headlight controls are always on the dash to the left of the steering column? Nope!
Related
Background:
I have a PCI card, which is basically a clock. It gets the time by GPS and saves the current time in a certain register.
Goal:
I want to read a limited number of registers/bytes (for example the current time) over and over again, with the lowest possible latency. (The clock provides very high precision and I think I will loose precision the higher the latency is.). The operating system is RedHat. The programming language is C/C++. I also want to write to the device memory, whereby latency is not an issue.
Possible Ways to go:
I see these ways. If you see another, please tell me:
Writing a Linux kernel module driver, which creates a character device (or one character device for each register to read). Then a user space application can do a "read" on the /dev/ file(s).
DMA
mmap the sysfs resourceX file to user space by a user space application (systemcall). (like here for example)
Write a Linux kernel module driver which implements a mmap file operation.
Questions:
Which is the way with the lowest latency when it comes to the actual reading of the register? I am aware that mmap causes a lot of overhead in the kernel, but as far as I understand that is only for initialisation.
Is way 3 a legit way to go? It looks like a hack to me. How can I determine the /sys/ path automatically from the application?
Is there a difference between way 3 and 4? I am new to PCI driver programming and I think I didn't really understand how way 4 works. I read this (and other chapters of that book), but maybe you can give me a hint or an example. I would appreciate that.
Method 3 or 4 should work fine. There’s no difference between them with respect to latency. Latency would be in the order of 100 ns.
Method 4 would be needed if you need to initialize the device, or control which applications are allowed to access it, or enforce one reader at a time, etc. Method 3 does seem like a bit of a hack because it skips all of this. But it is simpler if you don’t need such things.
A character device is definitely higher latency, because it requires a kernel transition each time the device is read.
The latency of a DMA method depends entirely on how frequently the device writes the time to memory. It is lower latency for the CPU to access memory than MMIO, but if the device only does DMA once a millisecond, then that would be your latency. Also, that method generates a lot of useless DMA traffic, since the CPU would read the value far less often than it is written.
Adding to #prl's answer...
Method 3 seems perfectly legit to me. That's what it's for. You may want to take a look at the kernel documentation file: https://www.kernel.org/doc/Documentation/filesystems/sysfs-pci.txt
You can also use the /sys filesystem to find your device. First, note the vendor ID and device ID for your clock card (and subsystem vendor / device if necessary), then you can easily walk the /sys/devices hierarchy, looking for a matching device (using the vendor, device, etc. special files). Once you've found it, you presumably know which resourceN file to open from the device's data sheet, then mmap it at the appropriate offset and you're done.
That all assumes that your device is configured and enabled already. Typically a PCI device is not enabled to do anything when the system boots. Some driver needs to claim the device, and initialize / configure it. Once that is done, if the time is accessible just by reading a register or two, you can can go with method 3. (I'm not sure: it may be possible for a PCI device to be self-initializing but I've never seen one. I think probably something needs to enable its memory space at the very least. Likely that could be done from user-space if the setup is small enough / simple enough.)
The primary difference with method 4 is that the driver controlling the device would provide support for allowing the area to be mmap'd explicitly. For the user-space application, there is little difference between the two methods aside from the device name used. For method 4, the driver's probably going to provide a symbolic device name /dev/clock0 or something like that for use by the user-space application (and presumably the application then doesn't need to go find the device, it would just know the device file name to open).
From user-space, you will do the mmap operation in much the same way with either method. In method 4, the driver internally supplies the physical address to map -- and possibly the offset -- instead of the generic PCI subsystem doing so, but either way, it's just open + mmap.
Linux driver programming is not terribly difficult, but there's a significant learning curve there if you haven't done it before, so I definitely wouldn't go with method 4 unless there were a real need to do so.
We have a new batch of DDR3 IC used in our mass production custom ARM board, they differ from the old one in term of some memory parameters and most notably the data rate (1600 VS 1866 MT/s).
The funny part is the old bootloader is still bootable on new board, and we have run memory test and our application for more than 72 hours without error, but we're not sure whether the different timing parameters will have any effects in the long run.
So is there any way to differentiate them programmatically? Or what is the best way to program different bootloader apart from inspecting the DDR3 part number manually?
If you have DDR3 DIMM module with a SPD EEPROM, then the EEPROM can be queried over I2C to get the memory's timing parameters.
But likely you have just raw DRAM chips on your board. I know of no way to query a DDR3 DRAM chip for any information such as a part number or timing parameters.
Usually DRAM parameters are hard-coded into the bootloader on embedded ARM devices. How exactly varies greatly between SoCs. Perhaps you can use something like GPIO lines that you have strapped to different values to figure out what version board is running, and then program the proper DRAM parameters based on that? Usually there are a few GPIO lines that are very easy to read and can fit in code that runs before DRAM is configured by the bootloader.
I am reviewing the essentials of I/O, and while I think I understand most of what's going on, I'm still confused as to how either physical addresses or separate ports are mapped to individual devices. Does the computer poll the bus on system boot, assigning addresses to devices one by one, or are there fixed addresses that are loaded into memory somewhere? If this is done via the BIOS, how is this memory layout information relayed to the operating system?
Thanks for your help!
(this question has been asked and answered before, you should search first)
depends on the platform, you were not specific
some systems, some peripherals in those systems, are hardcoded by the chip/system designers.
for pci(e), as defined by that, you enumerate the bus(es) searching for attached peripherals, and those peripherals configuration spaces (which are defined by the peripheral vendor per their needs) indicate how many and how big they need. For an x86 pc, the bios does this enumeration not the operating system. for other platforms it depends on that platform it may be the bootloader or operating system. but someone has to take the available space (basically hardcoded essentially for that platform knowing the platform and what is used already) and divide it up. for x86 it used to be just one gig that was divided up in the 32 bit days, and still happens on some systems, but for 64 bit systems the bioses open that up to 2gig for everyting, and can place that in a high address space to avoid ram (ever wonder why your 32 bit system with 4gig of dram only had 3gig usable?). naturally a flat memory space is only an illusion, the windows asked for by the pci peripherals can be small windows into their space, video cards with lots of ram for example. you use the csrs to move the window about, kind of like standing in your house looking out a small window and physically moving side to side to see more stuff through the window, but only the size of the window at any one time.
same goes for usb, it is enumerated, the busses are searched and the peripherals answer. with usb though it doesnt map into the address space of the host.
how the operating system finds this information is heavily dependent on the type of system. with bios on an x86 there is a known way to get that info, I think you can also get at the same info in dos (yes dos is still heavily used). for non pcie or usb the operating system drivers have to find the peripherals or just know, if the platform is consistent (address of the serial ports in a pc) or have a way of finding them without harming other devices or crashing. there are the cases where the operating system itself did the enumeration. or the bootloader if that is the place where enumeration happened. but each combination of bootloaders and operating systems on top of various platforms may each have their own different solution, no reason to expect them to be the same.
okay you did say bios and have a bios tag, implying x86 systems. the bios does pci/pcie enumeration at boot time, if you dont setup your bios to know that your operating system is 64 bit it may take a gig out of your lower 4Gig space for the pcie devices (and if you set for 64 bit but install a 32 bit operating system, then you are in trouble there for other reasons). I dont remember, but would assume there are bios calls the operating system can use to find out what the bios had done, should not be hard at all to find this information. Anything not discoverable in this way is likely legacy and hardcoded or uses legacy techniques for being discoverable (isa bus style search for a bios across a range of addresses, etc). the pcie/usb vendor and product id information tell the drivers what is there and from that they have hardcoded offsets into those spaces to complete the addresses needed to communicate with the peripherals.
I want to know how an OS kernel defines its own inputs and outputs to make the computer run. Of course you need the right hardware for it to work, but how can you simply just make some variable and call it USB_PORT_1 or something? Is it having to do with firmware as well? Assigning arbitrary values do nothing on its own, so there must be something I am missing between the interaction of the hardware and software when you plug in a 1 terabyte HDD into your USB 3.0 Slot that is marked by the kernel as USB3_PORT_0. At this point there is obviously some stuff going on in the firmware, so what is it?
Reason: I'm making one.
To truely understand the interaction between hardware and software you have to understand how things work at a low level. What is a variable? In a programming language, variables may be assigned values that can later be modified, etc. But where is this stored physically in the machine? The truth is that it can be stored in several places. It could be in one of the processor's registers, it could be in RAM, or it could be in a completely difference place entirely.
When the kernel wishes to communicate with hardware, sometimes it may go through what you call firmware but for the most part it doesn't have to. Hardware exposes itself to the kernel in a variety of ways but the simplest way to think about it is as RAM. RAM is accessable with an address, so 0x1000 is a memory address somewhere in RAM. Speaking generallly, There is no reason that any particular address has to be mapped to RAM. Suppose I have a USB controller. I can map some address (lets call it 0xDEADBEEF) to this memory controller. So, if I read from 0xDEADBEEF it might tell me how many devices are connected to the system. Another adjacent address may tell me which port, etc etc. Every device does this differently, so we have device drivers that tell the kernel how the device is accessed, and then the kernel doesn't have to wory about specific memory addresses or anything, it simply abstracts everything into something called "USB3_PORT_0." The kernel and software simply use this to refer to the device, and the device driver translates that into a set of accesses via memory with interrupts, etc.
It is impossible for me to enumerate the number of ways harware and software can interact, however this should give you an idea of how it is done.
I would like to systematize my U-Boot/linux knowledge. Is it true that minimum 2 bootloader phases are needed in each embedded platform? Or can following process vary?
1st-stage bootloader (can be U-Boot) is stored in internal the processor's ROM and can't be updated. It will run from internal cache memory. This U-Boot needs to (at least): initialize RAM, initialize external flash, initialize serial console, read and run 2nd-stage bootloader.
2nd-stage bootloader (can be U-Boot) is stored in RW flash memory. It will handle ethernet, flash RW functions, etc. This U-Boot can be customized and overwritten. Main task is to load linux kernel into RAM and run it.
linux kernel startup.
Is 1st-stage bootloader always Read-Only?
Where, how that first bootloader is is heavily system dependent. You might have some sort of usb bootable device that enumerates and downloads firmware to ram all in hardware then the processor boots from that ram.
Normally yes the first boot is some sort of flash. It is a good idea to have that first bootloader uber simple, essentially 100% bug free and durable and reliable with perhaps a serial or other way to get in so that you can use it to replace the second/real bootloader.
Ideally the second bootloader wants to be flash as well, the second bootloader would want to do the bulk of the work, initializing ddr, setting up ethernet if it wants to have some sort of ethernet based debugging or transferring of files, bootp, etc. Being significantly larger and more complicated it is expected to both have bugs and need to be upgraded more often than the primary bootloader. The primary is hopefully protected from being overwritten, so that you can comfortably replace the second bootloader without bricking the system.
Do all systems use the above? No, some/many may only use a single bootloader, with there perhaps being a pause very early so that a keystroke on a serial port can interrupt the bootloader taking you to a place where you can re-load the bootloader. Allowing for bootloader development with fewer chances at bricking but still a chance if you mess up that first bit before and including the keystroke and serial flash loader thing. Here again that serial loader thing is not always present, just a convenience for the bootloader developers. Often the fallback will be jtag, or a removable prom or some other system way to get in and reprogram the prom when you brick it (also, sometimes the same way you program it the first time in system when the board is produced, some designs are brickable to save on cost and use pre-programmed flashes during manufacturing so the first boot works).
A linux bootloader does not require any/all of this, a very very minimal, setup ram, prep the command line or atags or whatever and branch to linux.
It is a loaded question as the answer is heavily dependent on your system, processor, design engineers (including you). Traditionally processors boot from flash and the bootloader gets memory and some other things up so the next bit of code can run. That next bit of code can come from many places, usb, disk, flash/rom, ethernet/bootp/tftp, pcie, mdio, spi, i2c, etc. And there can be as many layers between power on reset and linux starting as the design desires or requires.
The first-stage bootloader doesn't have to be read-only - but putting a read-only bootloader in ROM with some recovery mode is helpful in case you corrupt the read-write parts of flash; otherwise you'll need to physically attach a programmer to the flash chip in order to recover.
If you are using U-Boot, the 2nd stage bootloader can be skipped to speed up the boot time. In other words, the first stage bootloader (SPL) will load the Linux kernel directly, skipping the second stage bootloader (u-boot). In U-Boot, this is called the Falcon Mode.