We have a new batch of DDR3 IC used in our mass production custom ARM board, they differ from the old one in term of some memory parameters and most notably the data rate (1600 VS 1866 MT/s).
The funny part is the old bootloader is still bootable on new board, and we have run memory test and our application for more than 72 hours without error, but we're not sure whether the different timing parameters will have any effects in the long run.
So is there any way to differentiate them programmatically? Or what is the best way to program different bootloader apart from inspecting the DDR3 part number manually?
If you have DDR3 DIMM module with a SPD EEPROM, then the EEPROM can be queried over I2C to get the memory's timing parameters.
But likely you have just raw DRAM chips on your board. I know of no way to query a DDR3 DRAM chip for any information such as a part number or timing parameters.
Usually DRAM parameters are hard-coded into the bootloader on embedded ARM devices. How exactly varies greatly between SoCs. Perhaps you can use something like GPIO lines that you have strapped to different values to figure out what version board is running, and then program the proper DRAM parameters based on that? Usually there are a few GPIO lines that are very easy to read and can fit in code that runs before DRAM is configured by the bootloader.
Related
I am trying to understand ARM Linux Boot Process.
These are the things I understood:
When reset button is pressed in any processor, it jumps to the reset vector or address, in case of ARM it is either 0x00 or 0xFFFF0000.
This location contains the start up code or ROM Code or Boot ROM Code
My query is how this Boot ROM Code gets the address of u-boot first instruction ?
It depends on the SoC, and the scheme used for booting will differ from one SoC to the other. It is usually documented in the SoC's reference manual, and it does describe the various conventions (where to read u-boot from, specific addresses) the u-boot port specific to this SoC should follow in order to the code in ROM to be able to load u-boot, and ultimately transfer control to u-boot.
This code in the ROM could do something like: - If pin x is 0, read 64KiB from first sector of the eMMC into the On Chip Static RAM, then transfer control to the code located at offset 256 of the OCRAM for example. - If pin x is 1, configure the UART for 19200 bauds, 8 bits parity, no stop bits, attempt to read 64KiB from the serial port using the X-MODEM protocol into the OCRAM, then transfer control to the code located at offset 256 of the OCRAM.This code, which is often named the Secondary Program Loader (SPL) would then be responsible for, say, configuring the SDRAM controller, then read the non-SPL part of u-boot into at the beginnning of the SDRAM, then jump to a specific address in the SDRAM. The SPL for a given SoC should be small enough to fit in the SoC On Chip RAM. The ROM code would be the Primary Boot Loader in this scenario.
In the case of the TI AM335x Cortex-A8 SoCs for example, section 26.1.6 of the Technical Reference Manual, and more specifically figure 26-10, explains the boot process. Some input pins may be used by the ROM code to direct the boot process - see SYSBOOT Configuration Pins in table 26-7. See The AM335x U-Boot User's Guide for more u-boot specific, AM335x-related information.
ARM doesnt make chips it makes IP that chip vendors purchase. It is one block in their chip design, usually they have many other blocks, usb controllers (likely purchased ip), pcie controller (likely purchased ip), ddr, ethernet, sata, emmc/sd, etc. Plus glue logic, plus whatever their secret sauce is they add to this to make it different and more interesting to the consumer than the competition.
The address space, particularly for the full sized arms is fairly wide open, so even if they use the same usb ip as everyone else, doesnt mean it is at the same address as everyone else.
There is no reason to assume that all chips with a cortex-a are centered around the cortex-a, the cortex-a may just be there to boot and manage the real thing the chip was made for. The chips you are asking about are most likely centered around the ARM processor the purpose of the chip is to make a "CPU" that is ARM based. What we have seen in that market is a need to support various non-volatile storage solutions. Some may wish to be ram heavy and dont care that a slow spi flash is used to get the kernel and root file system over and runtime everything is ram based including the file system. Some may wish to support traditional hard drives as well as ram, the file system being on something sata for example, spinning media or ssd. Some may wish to use eMMC, SD, etc. With the very high cost of chip production, it does not make sense to make one chip for each combination, rather make one chip that supports many combinations. You use several "strap" pins (which are not pins but balls/pads on the BGA) that the customer ties to ground or "high" whatever the definition of that voltage is so that when the chip comes out of reset (whichever of the reset pins for that product are documented as sampling the strap pins) those strap pins tell the "processor" (chip as an entity) how you desire it to boot. I want to you to first look for an sd card on this spi bus if nothing there then look for a sata drive on this interface if nothing there then please fall into the xmodem bootloader on uart0.
This leads into Frant's excellent answer. What IP is in the chip, what possible non-volatile storage is supported and what possible solutions of loading a bootloader if the "chip" itself supports it are very much chip specific not just broadcom does it this way and ti does it another but a specific chip or family of chips within their possibly vast array of products, no reason to assume any two products from a vendor work the same way, you read the documentation for each one you are interested in. Certainly dont assume any two vendors details are remotely the same, it is very likely that they have purchased similar ip for certain technologies (maybe everyone uses the same usb ip for example, or most usb ip conforms to a common set of registers, or maybe not...).
I have not gotten to the arm core, you could in these designs likely change your mind and pull the arm out and put a mips in and sell that as a product...
Now does it make sense to say write logic to read a spi flash that loads the contents of that flash into an internal sram and then for that boot mode put that sram at the arm processors address zero then reset the arm? yes that is not a horrible idea to do that in logic only. But does it make sense for example to have logic dig through a file system of a sata drive to find some bootloader? Maybe not so much, possible sure, but maybe your product will be viable longer if you instead put a rom in the product that can be aimed at the arm address zero the arm boots that, the arm code in that rom reads the straps, makes the decision as to what the boot media is spins up that peripheral (sata, emmc, spi, etc) wades through the filesystem looking for a filename, copies that file to sram, re-maps the arm address space (not using an mmu but using the logic in the chip) and fakes a reset to that by branching to address zero. (the rom is mapped in two places at least address zero then some other address so that it can branch to the other address space allowing address zero to be remapped and reused). so that if down the road you find a bug all you hve to do is change the image burned into the rom before you deliver the chips, rather than spin the chip to change the transistors and/or wiring of the transistors (pennies and days/weeks vs millions of dollars and months). so you may actually never see or be the code that the arm processor boots into on reset. The reset line to the arm core you might never have any physical nor software access to.
THEN depending on the myriad of boot options for this or any of the many chip offerings, the next step is very much specific to that chip and possibly that boot mode. You as owning all the bootcode for that board level product may have to per the chip and board design, bring up ddr, bring up pcie, bring up usb. Or perhaps some chip vendor logic/code has done some of that for you (unlikely, but maybe for specific boot cases). Now you have these generic and popular "boot loaders" like u-boot, you as the software designer and implementer may choose to have code that preceeds u-boot that does a fair amount of the work because maybe porting u-boot is a PITA, maybe not. Also note u-boot is in no way required for linux, booting linux is easy, u-boot is a monstrosity, a beast of its own, the easiest way to boot linux is to not bother porting u-boot. what u-boot gives is an already written bootloader, it is an argument that can go either way is it cheaper to port u-boot or is it cheaper to just roll your own (or port one of the competitors to u-boot)? Depends on the boot options you want, if you want bootp/tftp or anything network stack, well thats a task although there are off the shelf solutions. If you want to access a file system on some media, well that is another strong argument to just use u-boot. but if you dont need all of that, then maybe you dont need u-boot.
You have to walk the list of things that need to happen before linux boots, the chips tend to not have enough on chip ram to hold the linux kernel image and the root file system, so you need to get ddr up before linux. you probably need to get pcie up and enumerated and maybe usb I have not looked at that. but ethernet that can be brought up by the linux driver for that peripheral as an example.
The requirements to "boot" linux on arm ports of linux and probably others are relatively simple. you copy the linux kernel to some space in memory ideally aligned or at an agreed offset from an aligned address (say 0x10001000 for example, just pulling that out of the air), you then provide a table of information, how much ram there is, the ascii kernel boot string, and these days the device tree information. you branch to the linux kernel with one of the registers say r0 pointed at this table (google ATAG arm linux or some such combination of words). thats it booting linux using a not that old kernel is setting a few dozen bytes in ram, copy the kernel to ram, and branch to it, a few dozen lines of code, no need for the u-boot monstrosity. Now it is more than a few dozen bytes but it is still a table generated independent of u-boot, place that in ram, place the kernel in ram set one or more registers to point at the table, branch to the address where the kernel lives "booting linux" is complete or the linux bootloader is complete.
you still have to port linux which is a task that requires lots of trial and error and eventually years of experience. particularly since linux is a constantly evolving beast in and of itself.
How do you get to u-boot code? you may have some pre-u-boot code you have to write to find u-boot and copy it to ram then branch to it. the chip vendor may have solved this for you and "all you have to do" is put u-boot where they tell you for the media choice, and then u-boot is placed at address zero in the arm memory space for some internal sram, or u-boot is placed at some non-zero address in the arm memory space and some magic (a rom based bootloader in the chip) causes your u-boot code to execute from that address.
One I messed with recently is the ti chip used on various beagle boards, black, green, white, pocket, etc...One of the boot modes it looks at a specific offset on the sd card (not part of a file system yet, a specific logical block if you will or basically specific offset in the sd card address space) for a table, that table includes where in the "processors" address space you want the attached "bootloader" to be copied to, is it compressed, etc. you make your bootloader (roll your own or build a u-boot port) you build the correct table per the documentation with a destination address how much data, possibly a crc/checksum, whatever the docs say. the "chip" magically (probably software but might be pure logic) copies that over and causes the arm to start executing at that address (likely arm software that simply branches there). And that is how YOU get u-boot running on that product line with that boot option.
The SAME product line has other strap options, has other sd-card boot options to get a u-boot loaded and running.
Other products from other vendors have different solutions.
The broadcom chip in the raspberry pi, totally different beast, or at least how it is used. It has a broadcom (invented or purchased) gpu in it, that gpu boots some rom based code that knows how to find its own first stage bootloader on an sd card, that first stage loader does things like initialize DDR, there isnt pcie so that doesnt have to happen and I dont think the gpu cares about usb so that doesnt have to get enumerated either. but it does search out a second stage bootloader of gpu code, which is really an RTOS that it is loading, the code the GPU uses to do its graphics features to offload the burden on the ARM. In addition to that that software also looks for a third file on the flash (and fourth and nth) lets just go with third kernel.img which it copies to ram (the ddr is shared between the gpu and the arm but with different addressing schemes) at an agreed offset (0x8000 if kernel.img is used without config.txt adjustments to that) the gpu then writes a bootstrap program and ATAGs into arms memory at address zero and then releases reset on the ARM core(s). The GPU is the bootloader, with relatively limited options, but for that platform design/solution one media option, a removable sd card, what operating system, etc you run on the arm is whatever is on that sd card.
I think you will find the lots of straps driving multiple possible non-volatile storage media peripherals being the more common solution. Whether or not one or any of these boot options for a particular SOC can take u-boot (or choose your bootloader or write your own) directly or of a pre-u-boot program is required for any number of reasons (on chip sram is too small for a full u-boot lets say for example, sake of argument) is specific to that boot option for that chip from that vendor and is documented somewhere although if you are not part of the company making the board that signed the NDA with that chip vendor you may not get to see that documentation. And/or as you may know or will learn, that doesnt mean the documentation is good or makes sense. Some companies or products do a bad job, some do a good job and most are somewhere in between. If you are paying them for parts and have an NDA you at least have the option of getting or buying tech support and can ask direct questions (again the answers may not be useful, depends on the company and their support).
Just because there is an ARM inside means next to nothing. ARM makes processor cores not IP, depending on the SOC design it may be easy or moderatly painful but possible to pull the arm out and put some other purchased ip (like MIPS) in there or free ip (like risc-v) and re-use the rest of your already tested design/glue. Trying to generalize ARM based processors is like trying to generalize automobiles with V-6 engines, if I have a vehicle with a V-6 engine does that mean the headlight controls are always on the dash to the left of the steering column? Nope!
I am reviewing the essentials of I/O, and while I think I understand most of what's going on, I'm still confused as to how either physical addresses or separate ports are mapped to individual devices. Does the computer poll the bus on system boot, assigning addresses to devices one by one, or are there fixed addresses that are loaded into memory somewhere? If this is done via the BIOS, how is this memory layout information relayed to the operating system?
Thanks for your help!
(this question has been asked and answered before, you should search first)
depends on the platform, you were not specific
some systems, some peripherals in those systems, are hardcoded by the chip/system designers.
for pci(e), as defined by that, you enumerate the bus(es) searching for attached peripherals, and those peripherals configuration spaces (which are defined by the peripheral vendor per their needs) indicate how many and how big they need. For an x86 pc, the bios does this enumeration not the operating system. for other platforms it depends on that platform it may be the bootloader or operating system. but someone has to take the available space (basically hardcoded essentially for that platform knowing the platform and what is used already) and divide it up. for x86 it used to be just one gig that was divided up in the 32 bit days, and still happens on some systems, but for 64 bit systems the bioses open that up to 2gig for everyting, and can place that in a high address space to avoid ram (ever wonder why your 32 bit system with 4gig of dram only had 3gig usable?). naturally a flat memory space is only an illusion, the windows asked for by the pci peripherals can be small windows into their space, video cards with lots of ram for example. you use the csrs to move the window about, kind of like standing in your house looking out a small window and physically moving side to side to see more stuff through the window, but only the size of the window at any one time.
same goes for usb, it is enumerated, the busses are searched and the peripherals answer. with usb though it doesnt map into the address space of the host.
how the operating system finds this information is heavily dependent on the type of system. with bios on an x86 there is a known way to get that info, I think you can also get at the same info in dos (yes dos is still heavily used). for non pcie or usb the operating system drivers have to find the peripherals or just know, if the platform is consistent (address of the serial ports in a pc) or have a way of finding them without harming other devices or crashing. there are the cases where the operating system itself did the enumeration. or the bootloader if that is the place where enumeration happened. but each combination of bootloaders and operating systems on top of various platforms may each have their own different solution, no reason to expect them to be the same.
okay you did say bios and have a bios tag, implying x86 systems. the bios does pci/pcie enumeration at boot time, if you dont setup your bios to know that your operating system is 64 bit it may take a gig out of your lower 4Gig space for the pcie devices (and if you set for 64 bit but install a 32 bit operating system, then you are in trouble there for other reasons). I dont remember, but would assume there are bios calls the operating system can use to find out what the bios had done, should not be hard at all to find this information. Anything not discoverable in this way is likely legacy and hardcoded or uses legacy techniques for being discoverable (isa bus style search for a bios across a range of addresses, etc). the pcie/usb vendor and product id information tell the drivers what is there and from that they have hardcoded offsets into those spaces to complete the addresses needed to communicate with the peripherals.
I am running linux on an MX28 (ARMv5), and am using a GPIO line to talk to a device. Unfortunately, the device has some special timing requirements. A low on the GPIO line cannot last longer than 7us, highs have no special timing requirements. The code is implemented as a kernel device driver, and toggles the GPIO with direct register writes rather than going through the kernel GPIO api. For testing, I am just generating 3 pulses. The process is as follows, all in one function so it should fit in the instruction cache:
set gpio high
Save Flags & Disable Interrupts
gpio low
pause
gpio high
repeat 2x more
Restore Flags/Reenable Interrups
Here's the output of a logic analyzer tied to the GPIO.
Most of the time it works just great, and the pulses last just under 1us. However, about 10% of the lows last for many, many microseconds. Even though interrupts are disabled, something is causing the flow of the code to be interrupted.
I am at a loss. RT Linux would likely not help here, because the problem is not latency, it appears to be something happening during the low, even though nothing should interrupt it with the IRQs disabled. Any suggestions would be greatly, greatly appreciated.
The ARM cache on an IMX25 (ARM926) is 16K Code, 16K Data L1 with a 32byte length or eight instructions. With the DDR-SDRAM controller running at 133Mhz and a 16bit bus the transfer rate is about 300MB/s. A cache fill should only take about 100nS, not 9uS; this is about 100 times too long.
However, you have four other issues with Linux.
TLB misses and a page table walk.
Data aborts.
DMA masters stealing.
FIQ interrupts.
It is unlikely that the LCD master is stealing enough bandwidth, unless you have a huge display. Is your display larger than 1/4VGA? If not, this is only 10% of the memory bandwidth and this will pipeline with the processor. Do you have either Ethernet or USB active? These peripherals are higher data rate and could cause this type of contention with SDRAM.
All of these issues maybe avoided by writing your toggler PC relative and copying it to the IRAM. See: iram_alloc.c; this file should be portable to older versions of Linux. The XBAR switch allows fetches from SDRAM and IRAM simultaneously. The IRAM can still be a target of other DMA masters. If you are really pressed, move the code to the ETB buffers which no other master in the system can access.
The TLB miss can actually be quite steep as it may need to run several single beat SDRAM cycles; still this should be under 1uS. You have not posted code, so it is possible that a variable and/or other is causing a data fault which is not maskable.
If you have any drivers using the FIQ, they may still be running even though you have masked the normal IRQ interrupts. For instance, the ALSA driver for this system normally uses the FIQ.
Both the ETB and the IRAM are 32-bit data paths and low wait state. Either one will probably give better response than the DDR-SDRAM.
We have achieved sub micro-second response by using a FIQ and IRAM to toggle GPIOs on an IMX258 with another protocol using bit banging.
One possible workaround to the problem Chris mentioned (in addition to problems with paging of kernel module code) is to use a PWM peripheral where the duration of the pulse is pre-programmed and the timing is implemented in hardware.
Fancy processors with caches are not suitable for hard realtime work. Execution time varies if cache misses are non-deterministic (and designs where cache misses are completely deterministic aren't complicated enough to justify a fancy processor).
You can try to avoid memory controller latency during critical sections by aligning the critical section so that it doesn't straddle cache lines. Or prefetch the code you will need. But this is going to be very non-portable and create a nightmare for future maintenance. And still doesn't protect the access to memory-mapped GPIO from bus contention.
I am seeking help, most importantly from VMEbus experts.
I am working on a project that aims to setup a communication channel from a real-time powerpc controller (Emerson MVME4100), running vxWorks 6.8, to a Linux Intel computer (Xembedded XVME6300), running Debian 6 with kernel 2.6.32.
This channel runs over VME bus; both computers are in a VME enclosure and both use the Tundra Tsi148 chipset. The Intel computer is explicitly configured as the system controller, the real-time computer is explicitly not.
Setup:
For the Intel computer I wrote a custom driver that creates a 4MB kernel buffer, and shares it over the VME bus by means of a slave window;
For the real-time computer I setup a DMA transfer to repeatedly forward blocks of exactly 48640 bytes; filled with bytes of test data (zeros, ones, twos, etc), in quick succession (once every 32 milliseconds, if possible)
For the Intel computer I read the kernel buffer from the driver, to see whether the data arrives correctly, with a hand-started Python program.
Expectation:
I am expecting to see the same data (zeros, ones etc) from the Python program.
I am expecting transmission times roughly corresponding to the chosen bus speed (typically 290 us or 145 us, depending on bus speed), plus a reasonable DMA setup overhead (up to 10us? I am willing to accept larger numbers, say hundreds of usecs, if that is what the bus normally needs)
Result:
Sometimes data does not arrive at all, and "transmission" time is ~2000 us
Sometimes data arrives reliably, but transmission time is ~98270us, or 98470us, depending on the chosen bus speed.
Questions:
How could I make the transmission reliable and bring down these aweful latencies?
What general direction should I search next?
(I would like to tag with VMEbus if I could)
Many thanks
My comments on the question describe how I got the bus working:
- ensure 2eSST320 on both sides of the bus
- ensure that the DMA transaction used a valid block size (the largest valid was 4096 bytes)
I achieved an effective speed of 150MBytes/s (the bus can achieve 320MBytes/s but the tsi148 chip is known for causing significant overhead). This is good enough for me.
I would like to systematize my U-Boot/linux knowledge. Is it true that minimum 2 bootloader phases are needed in each embedded platform? Or can following process vary?
1st-stage bootloader (can be U-Boot) is stored in internal the processor's ROM and can't be updated. It will run from internal cache memory. This U-Boot needs to (at least): initialize RAM, initialize external flash, initialize serial console, read and run 2nd-stage bootloader.
2nd-stage bootloader (can be U-Boot) is stored in RW flash memory. It will handle ethernet, flash RW functions, etc. This U-Boot can be customized and overwritten. Main task is to load linux kernel into RAM and run it.
linux kernel startup.
Is 1st-stage bootloader always Read-Only?
Where, how that first bootloader is is heavily system dependent. You might have some sort of usb bootable device that enumerates and downloads firmware to ram all in hardware then the processor boots from that ram.
Normally yes the first boot is some sort of flash. It is a good idea to have that first bootloader uber simple, essentially 100% bug free and durable and reliable with perhaps a serial or other way to get in so that you can use it to replace the second/real bootloader.
Ideally the second bootloader wants to be flash as well, the second bootloader would want to do the bulk of the work, initializing ddr, setting up ethernet if it wants to have some sort of ethernet based debugging or transferring of files, bootp, etc. Being significantly larger and more complicated it is expected to both have bugs and need to be upgraded more often than the primary bootloader. The primary is hopefully protected from being overwritten, so that you can comfortably replace the second bootloader without bricking the system.
Do all systems use the above? No, some/many may only use a single bootloader, with there perhaps being a pause very early so that a keystroke on a serial port can interrupt the bootloader taking you to a place where you can re-load the bootloader. Allowing for bootloader development with fewer chances at bricking but still a chance if you mess up that first bit before and including the keystroke and serial flash loader thing. Here again that serial loader thing is not always present, just a convenience for the bootloader developers. Often the fallback will be jtag, or a removable prom or some other system way to get in and reprogram the prom when you brick it (also, sometimes the same way you program it the first time in system when the board is produced, some designs are brickable to save on cost and use pre-programmed flashes during manufacturing so the first boot works).
A linux bootloader does not require any/all of this, a very very minimal, setup ram, prep the command line or atags or whatever and branch to linux.
It is a loaded question as the answer is heavily dependent on your system, processor, design engineers (including you). Traditionally processors boot from flash and the bootloader gets memory and some other things up so the next bit of code can run. That next bit of code can come from many places, usb, disk, flash/rom, ethernet/bootp/tftp, pcie, mdio, spi, i2c, etc. And there can be as many layers between power on reset and linux starting as the design desires or requires.
The first-stage bootloader doesn't have to be read-only - but putting a read-only bootloader in ROM with some recovery mode is helpful in case you corrupt the read-write parts of flash; otherwise you'll need to physically attach a programmer to the flash chip in order to recover.
If you are using U-Boot, the 2nd stage bootloader can be skipped to speed up the boot time. In other words, the first stage bootloader (SPL) will load the Linux kernel directly, skipping the second stage bootloader (u-boot). In U-Boot, this is called the Falcon Mode.