u-boot hangs on soft reboot - linux

I have a subtle issue: when I put my ARM device (U-Boot + Linux) through a soft-reboot stress test, it hangs after 100+ cycles. The serial output captured in the failing case is:
...
g_txrx_mode=1
g_profileid=1
id=0x1F11 board_type=0x0004 HAS_POE_SUPPORT=1
Not POE
read_rbf_header_from_ext4 - filename = e30.core.rbf filesize = 7317252
cff_from_mmc_ext4:writing e30.core.rbf length 13 num_files 0
Full Configuration Succeeded.
crestron_load_rbf: use core e30.core.rbf length 13 rval 1
Booting from primary
Writing to MMC(0)... done
dram_init: id 1f11 (id & 0x0001) 1 has_dsp/has_dante0
DDRCAL: Success
INFO : Skip relocation as SDRAM is non secure memory
Reserving 2048 Bytes for IRQ stack at: ffe2f708
DRAM : 512 MiB
On a successful reboot, the next lines printed are:
WARNING: Caches not enabled
MMC: In: serial
Out: serial
Err: serial
It seems to fail between 'skip_relocation()' and 'enable_caches()'. But why only after 100+ attempts? Could it be a memory issue, such as memory timing? And how can I debug it?

Related

Why is my eeprom loop device spitting kernel errors?

I'm working on an embedded Linux project using PetaLinux, running kernel 5.4.0-xilinx-v2020.1. We have multiple apps that each need to write about 200 bytes of config to a file every second, for years at a time, so we chose to include an 8 kB FRAM on the board. Neither the number of apps nor the size of the data each one writes is fixed, so a filesystem seems like the easiest, most user-friendly and most transparent solution.
I've found the littlefs-fuse project and tested it on my workstation with an 8k loop device; everything works as expected.
I'm currently in the integration phase and I'm having some issues creating the littlefs filesystem on the FRAM. Here's how I'm hoping things would work:
1. The FRAM is set up in the device tree as a 24c64-eeprom-compatible i2c device
2. Linux loads the at24 driver as expected and creates the /sys/bus/i2c/devices/x-xxxx/ folder
3. I can read and write to the /sys/bus/i2c/devices/x-xxxx/eeprom file using bash commands
4. I create a loop device using the eeprom file
5. I use the littlefs-fuse project to create my filesystem and mount it
I'm currently stuck at the end of step 4: I created a loop device from the eeprom file using losetup -fP /sys/bus/i2c/devices/x-xxxx/eeprom, which completes without any error, but I can't access any data on it.
Here's the error message I get when I try to cat the loop device:
# cat /dev/loop0
cat: read error: Input/output error
And here's what I can find with dmesg concerning the error:
[11950.780699] print_req_error: 5 callbacks suppressed
[11950.780706] blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[11950.795985] blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[11950.805985] buffer_io_error: 3 callbacks suppressed
[11950.805990] Buffer I/O error on dev loop0, logical block 0, async page read
[11950.817854] blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[11950.827846] Buffer I/O error on dev loop0, logical block 0, async page read
It's completely possible I'm trying to fit a square peg in a round hole here, but as far as I know any file is supposed to work with a loop device, so I'm confused as to why it's not working as I expect.

The beginning and end address of the flash memory

I am trying to run Linux on an Arduino Yun board, which contains an Atheros AR9331 chipset.
These are the steps I am performing in U-Boot:
1- Download the kernel:
ar7240> tftp 0x80060000 openwrt-ar71xx-generic-uImage-lzma.bin;
Load address: 0x80060000
Loading: #################################################################
#################################################################
#################################################################
#################################################################
######################
done
Bytes transferred = 1441863 (160047 hex)
2- Erase Flash in order to copy the kernel:
ar7240> erase 0x9fEa0000 +0x160047
Error: end address (0xa0000046) not in flash!
Bad address format
This is the problem: it seems that 0x9fEa0000 + 0x160047 extends past the end of the flash.
So my questions are:
1- How can I find out the total amount of memory reserved for the flash in U-Boot (the addresses where it starts and ends)? I am thinking about replacing 0x9fEa0000 with a lower address, but I'm afraid of damaging something else.
This is the output of the help:
ar7240> help
? - alias for 'help'
boot - boot default, i.e., run 'bootcmd'
bootd - boot default, i.e., run 'bootcmd'
bootm - boot application image from memory
cp - memory copy
erase - erase FLASH memory
help - print online help
md - memory display
mm - memory modify (auto-incrementing)
mtest - simple RAM test
mw - memory write (fill)
nm - memory modify (constant address)
ping - send ICMP ECHO_REQUEST to network host
printenv- print environment variables
progmac - Set ethernet MAC addresses
reset - Perform RESET of the CPU
run - run commands in an environment variable
setenv - set environment variables
tftpboot- boot image via network using TFTP protocol
version - print monitor version
2- Is there someone experienced with the Atheros AR9331 chipset who can help me find the flash mapping (where it starts and ends) from the datasheet?
You can determine the flash layout from the kernel boot command line. Either run the printenv command in u-boot or boot into the existing kernel and look through the boot log. You need to find something like the following:
(There are plenty of guides on the internet; I took this one from https://finninday.net/wiki/index.php/Arduino_yun, and your board may or may not be the same.)
linino> printenv
bootargs=console=ttyATH0,115200 board=linino-yun mem=64M rootfstype=squashfs,jffs2 noinitrd mtdparts=spi0.0:256k(u-boot)ro,64k(u-boot-env)ro,14656k(rootfs),1280k(kernel),64k(nvram),64k(art),15936k#0x50000(firmware)
bootcmd=bootm 0x9fea0000
This means there are the following partitions:
u-boot             0k     to 256k    (0x0      - 0x40000)
u-boot-env         256k   to 320k    (0x40000  - 0x50000)
rootfs (squashfs)  320k   to 14976k  (0x50000  - 0xea0000)
kernel             14976k to 16256k  (0xea0000 - 0xfe0000)
nvram              16256k to 16320k  (0xfe0000 - 0xff0000)
art                16320k to 16384k  (0xff0000 - 0x1000000)
The kernel partition is only 1280k (0x140000 bytes), while your new image is 0x160047 bytes, so writing it at 0x9fea0000 runs past the end of the flash. The rootfs partition is 14M, which is much larger than the rootfs image file (less than 8MB), so in theory you can move the kernel image to a lower address. For this you will need to modify the kernel boot line in the U-Boot environment block (the rootfs and kernel partition sizes) and the bootcmd parameter, so that U-Boot knows where the new kernel is located.
Flash is mapped to 0x9f000000 so the value in the bootcmd should be 0x9f000000 + the offset of the kernel in bytes.
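The address arithmetic above can be checked with a short script. This is only a sketch: the partition sizes are copied from the bootargs quoted in this answer (your board may differ), the image size is the "Bytes transferred" value from the question, and the names PARTS_KIB, layout and fits are made up for illustration.

```python
FLASH_BASE = 0x9F000000  # CPU address where the flash is mapped (from the answer)

# (name, size in KiB), in the order listed in mtdparts
PARTS_KIB = [
    ("u-boot", 256), ("u-boot-env", 64), ("rootfs", 14656),
    ("kernel", 1280), ("nvram", 64), ("art", 64),
]


def layout(parts_kib):
    """Return {name: (start, end)} as byte offsets into flash, end exclusive."""
    table, offset = {}, 0
    for name, kib in parts_kib:
        size = kib * 1024
        table[name] = (offset, offset + size)
        offset += size
    return table


def fits(table, name, image_size):
    """True if an image of image_size bytes fits inside the named partition."""
    start, end = table[name]
    return image_size <= end - start


table = layout(PARTS_KIB)
flash_size = max(end for _, end in table.values())
image_size = 0x160047  # "Bytes transferred" from the tftp step

# Print each partition as an absolute (mapped) address range.
for name, (start, end) in table.items():
    print(f"{name:<11} 0x{FLASH_BASE + start:08x} - 0x{FLASH_BASE + end - 1:08x}")

kernel_start, kernel_end = table["kernel"]
print(f"flash ends at      0x{FLASH_BASE + flash_size - 1:08x}")
print(f"erase would end at 0x{FLASH_BASE + kernel_start + image_size - 1:08x}")
print("image fits in kernel partition:", fits(table, "kernel", image_size))
```

With these numbers the computed erase end is 0xa0000046, the same address U-Boot reports in its "not in flash" error, and the kernel partition start works out to the 0x9fea0000 used in bootcmd.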
What I am not sure about is if there is an overlay filesystem for any persistent changes to the flash. Can you boot into the existing system and post the output of df -h and cat /proc/mounts?

DMA error on start without AC

I installed Funtoo on a Surface Pro 2. Everything works well except when the tablet is booted on battery power. In that case I get the error below every 20 seconds, and the tablet doesn't react to key presses or touch and doesn't log anything. fsck says there are no errors on the disk.
EH complete
exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
SError: {.PHYRdyChg CommWake }
failed command: WRITE DMA
cmd ca/00:20:f0:0f:c4/00::00:00:00:00:e3 tag 15 dma 16384 out
res 40/00:00:00:00:00:/00:00:00:00:00/00 Emask 0x4 (timeout)
status: {DRDY}
Kernel: sys-kernel/debian-sources 4.8.15
This looks like a power-saving issue.
If you have TLP installed, try disabling its SATA link power management. In /usr/sbin/tlp, comment out this line:
# set_sata_link_power $1
You can find more info in the following discussion
(I know this discussion is Mac-related, but it may be useful for finding a solution for your setup.)
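For reference, the SErr value in the log can be decoded by hand. The sketch below assumes the standard layout of the diagnostic half (bits 16 to 26) of the SATA SError register and uses the short labels as I recall them from the kernel's libata messages; check both against your kernel source before relying on them. The names SERR_DIAG_BITS and decode_serror are hypothetical.

```python
# Diagnostic bits of the SATA SError register (assumed layout, see note above).
SERR_DIAG_BITS = {
    16: "PHYRdyChg",   # PHY ready state changed
    17: "PHYInt",      # PHY internal error
    18: "CommWake",    # COMWAKE signal detected
    19: "10B8B",       # 10b-to-8b decode error
    20: "Dispar",      # disparity error
    21: "BadCRC",      # CRC error
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unrecognized FIS type
    26: "DevExch",     # device exchanged
}


def decode_serror(value):
    """Return the labels of all diagnostic bits set in an SError value."""
    return [name for bit, name in sorted(SERR_DIAG_BITS.items())
            if (value >> bit) & 1]


print(decode_serror(0x50000))
```

For 0x50000 this yields the same two flags the kernel printed, PHYRdyChg and CommWake. Both are link-level events rather than data errors, which is at least consistent with the link power management theory.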

Understanding Linux load address for U-Boot process

I'm trying to understand embedded Linux principles and can't make sense of the addresses in the U-Boot output.
For example, I have a UDOO board based on the i.MX6 Quad processor, and I got the following output from U-Boot:
U-Boot 2013.10-rc3 (Jan 20 2014 - 13:33:34)
CPU: Freescale i.MX6Q rev1.2 at 792 MHz
Reset cause: POR
Board: UDOO
DRAM: 1 GiB
MMC: FSL_SDHC: 0
No panel detected: default to LDB-WVGA
Display: LDB-WVGA (800x480)
In: serial
Out: serial
Err: serial
Net: using phy at 6
FEC [PRIME]
Warning: FEC MAC addresses don't match:
Address in SROM is 00:c0:08:88:a5:e6
Address in environment is 00:c0:08:88:9c:ce
Hit any key to stop autoboot: 0
Booting from mmc ...
4788388 bytes read in 303 ms (15.1 MiB/s)
## Booting kernel from Legacy Image at 12000000 ...
Image Name: Linux-3.0.35
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 4788324 Bytes = 4.6 MiB
Load Address: 10008000
Entry Point: 10008000
Verifying Checksum ... OK
Loading Kernel Image ... OK
Starting kernel ...
I don't understand the Load Address value, 0x10008000. According to the documentation for this particular processor, main memory is mapped in the address range 0x10000000 - 0xffffffff. But what is the 0x8000 offset? I can't figure out the reason for this value.
I also don't understand the address 0x12000000, where the kernel image is loaded from. Is there a memory region mapped for the SD card?
Please, can you give me some explanation of these addresses or, even better, some references to resources about this topic? My goal is to learn how to port U-Boot and the Linux kernel to other boards.
Thank you!
If you check the U-Boot environment variables, you will find that the kernel image is copied from the boot device to a RAM location (here, 12000000) by a command such as fatload.
That is not the LOADADDRESS. You specify LOADADDRESS on the command line when building the kernel; it is usually at a 32K offset from the start of RAM in the processor's physical address space.
Your RAM is mapped at 10000000 and the kernel LOADADDRESS is 10008000 (a 32K offset). The bootm command moves the kernel image from 12000000 to 10008000 (decompressing it first if the image is compressed; this one is marked uncompressed) and then jumps to the kernel entry point.
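The Load Address and Entry Point that bootm prints come from the 64-byte header at the start of a legacy uImage. As a rough illustration, the sketch below decodes that header from a synthetic sample built with the values in the boot log. The checksums and timestamp are zeroed, so this is not the real image, and the function name parse_legacy_header is made up here.

```python
import struct

IH_MAGIC = 0x27051956                 # legacy uImage magic number
HEADER = struct.Struct(">7I4B32s")    # 64-byte header, all fields big-endian


def parse_legacy_header(raw):
    """Decode the 64-byte legacy uImage header at the start of raw."""
    (magic, _hcrc, _time, size, load, entry, _dcrc,
     _os, _arch, _type, comp, name) = HEADER.unpack(raw[:HEADER.size])
    if magic != IH_MAGIC:
        raise ValueError("not a legacy uImage header")
    return {
        "name": name.rstrip(b"\0").decode("ascii", "replace"),
        "data_size": size,
        "load": load,
        "entry": entry,
        "compressed": comp != 0,
    }


# Synthetic header with the values from the boot log (checksums/time zeroed).
# 5, 2, 2, 0 = OS Linux, arch ARM, type kernel, compression none.
sample = HEADER.pack(IH_MAGIC, 0, 0, 4788324, 0x10008000, 0x10008000, 0,
                     5, 2, 2, 0, b"Linux-3.0.35")
info = parse_legacy_header(sample)

RAM_BASE = 0x10000000  # start of RAM on this SoC, per the question
print(info["name"], hex(info["load"]), "offset", hex(info["load"] - RAM_BASE))
print("header + data =", HEADER.size + info["data_size"], "bytes")
```

Note that the header size plus the data size equals the 4788388 bytes that U-Boot reports reading from the card, so 0x12000000 is simply the RAM address where the whole file, header included, was placed before bootm processed it.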
Check out the include/configs folder. It contains all the board definitions.
i.MX uboot include/configs
To port U-Boot to another board, start from a very similar board and modify from there.

ALIX 2D13, linux kernel error "serial number revalidation" using Compact Flash and Harddisk

I'm building a Linux-based firmware for an ALIX 2D13 using crosstool-NG for the toolchain, Buildroot for the root filesystem, and a vanilla kernel.
I need to use both a CompactFlash card and a hard disk, but when I connect them to the ALIX I get a really weird error:
[ 1.072380] ata1.00: CFA: CF Card, Ver2.34, max UDMA/100
[ 1.077738] ata1.00: 7880544 sectors, multi 0: LBA
[ 1.082670] ata1.00: limited to UDMA/33 due to 40-wire cable
[ 1.096260] ata1.00: serial number mismatch '6EB10703040700582043' != '6EB1p703040700582043'
[ 1.104738] ata1.00: revalidation failed (errno=-19)
[ 1.109740] ata1.00: limiting speed to UDMA/33:PIO3
.
.
.
[ 6.209775] ata1.00: serial number mismatch '6EB10703040700582043' != '6EB1p703040700582043'
[ 6.218324] ata1.00: revalidation failed (errno=-19)
[ 6.222235] ata1.00: disabled
Everything works fine if I detach the hard disk from the ALIX. The hdparm output is:
Model=CF Card , FwRev=Ver2.34 , SerialNo=6EB10703040700582043
Config={ HardSect NotMFM Fixed DTR>10Mbs }
RawCHS=7818/16/63, TrkSize=32256, SectSize=512, ECCbytes=4
BuffType=(2) DualPort, BuffSize=1kB, MaxMultSect=1, MultSect=?1?
CurCHS=7818/16/63, CurSects=7880544, LBA=yes, LBAsects=7880544
IORDY=yes, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 *udma2
AdvancedPM=no
And alix configuration is
(C) CHS mode L LBA mode W HDD wait V HDD slave U UDMA enable
I tried using pata_amd and pata_cs5536 but the result is the same.
Full kernel output is here
http://pastebin.com/7wcvEdRG
You have a hardware problem or serious misconfiguration, where bits are getting scrambled between the device and host.
When the kernel tries to read the serial number from the drive (using an ATA identify device command), one of the bytes gets a bit flipped. Note that the bad character 'p' (0x70) is only one bit different from '0' (0x30).
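The single-bit difference is easy to confirm. The following sketch compares the two serial strings from the kernel log character by character and reports which bits differ; the helper name diff_chars is just for illustration.

```python
def diff_chars(expected, got):
    """List (index, expected_char, got_char, xor_of_codes) for each mismatch."""
    return [(i, a, b, ord(a) ^ ord(b))
            for i, (a, b) in enumerate(zip(expected, got)) if a != b]


good = "6EB10703040700582043"   # matches the serial reported by hdparm
bad = "6EB1p703040700582043"    # the other value in the mismatch message

for i, a, b, xor in diff_chars(good, bad):
    flipped = [bit for bit in range(8) if (xor >> bit) & 1]
    print(f"char {i}: {a!r} (0x{ord(a):02x}) vs {b!r} (0x{ord(b):02x}), "
          f"xor 0x{xor:02x}, flipped bits {flipped}")
```

Only one character differs, and the XOR of the two codes is 0x40, i.e. a single flipped bit (bit 6).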
It's likely that scrambled identify data is the least of your problems; read/write data is probably unreliable as well.
If this only happens when you have two devices attached to the same ribbon cable, one of two things is true:
The two devices are causing signal integrity problems when sharing the bus. Check all of your jumper settings. If everything else is correct, either find another device, a better cable, or give up and don't put them on the same cable.
Your kernel is misconfiguring the ATA controller when both devices are present.

Resources