Load binary into stm32f103c8t6 with OpenOCD and arm-none-eabi-gdb - rust

I tried to load a binary compiled from Rust code, but it doesn't work.
First, I downloaded Rust code from https://github.com/rust-embedded/discovery.
Then, I built it.
# I am in the `src/05-led-roulette` directory
rustup target add thumbv7m-none-eabi
cargo build --target thumbv7m-none-eabi
It compiled successfully.
After that, I successfully connected to the stm32f103c8t6 using OpenOCD.
Then I ran this command:
arm-none-eabi-gdb -q target/thumbv7m-none-eabi/debug/led-roulette
But it seemed like it didn't finish reading.
Reading symbols from target/thumbv7m-none-eabi/debug/led-roulette...
(gdb)
(not done?!)
After that, I tried the load command, but it returned the following output.
Start address 0x0, load size 0
Transfer rate: 0 bits in <1 sec.
I have no idea about why it doesn't work.
Please help me.

First see whether your binary is good, then try telnet, then gdb. Rust also multiplies the number of things that can go wrong, so start with something simple:
so.s
.thumb
.globl _start
_start:
.word 0x20001000   @ initial stack pointer
.word reset        @ reset vector (Thumb bit gets set for us)
.thumb_func
reset:
ldr r0,some_addr
ldr r1,[r0]        @ read the word at 0x20000000
add r1,r1,#1       @ increment it
str r1,[r0]        @ write it back
b .                @ park here forever
.align
some_addr: .word 0x20000000
Build it, then disassemble:
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld -Ttext=0x08000000 so.o -o so.elf
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <reset>:
8000008: 4802 ldr r0, [pc, #8] ; (8000014 <some_addr>)
800000a: 6801 ldr r1, [r0, #0]
800000c: 3101 adds r1, #1
800000e: 6001 str r1, [r0, #0]
8000010: e7fe b.n 8000010 <reset+0x8>
8000012: 46c0 nop ; (mov r8, r8)
08000014 <some_addr>:
8000014: 20000000 andcs r0, r0, r0
For small programs (read the ST documentation) the code can be based at address 0x08000000 or 0x00000000 for this part; 0x08000000 is preferred. The vector table must come first. In this case ignore the disassembled instructions and just look at the values:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
The 0x08000009 is the reset handler address ORed with one (the Thumb bit): 0x08000008 | 1 is 0x08000009. So the part will at least boot and try to fetch code without a fault.
This code simply reads the word at address 0x20000000 and increments it. SRAM is not affected by a reset, so we can keep resetting and watching that value increment.
Use whatever configs and interface you have. I combine the OpenOCD config for the ST part into a single file and carry that with the project, along with configs for the various interfaces (ST-Links of different versions and a J-Link).
openocd -f jlink.cfg -f target.cfg
Open On-Chip Debugger 0.9.0 (2019-04-28-23:34)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : JLink SWD mode enabled
swd
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
none separate
cortex_m reset_config sysresetreq
Info : J-Link ARM-OB STM32 compiled Jun 30 2009 11:14:15
Info : J-Link caps 0x88ea5833
Info : J-Link hw version 70000
Info : J-Link hw type J-Link
Info : J-Link max mem block 15344
Info : J-Link configuration
Info : USB-Address: 0x0
Info : Kickstart power on JTAG-pin 19: 0x0
Info : Vref = 3.300 TCK = 1 TDI = 1 TDO = 1 TMS = 1 SRST = 1 TRST = 1
Info : J-Link JTAG Interface ready
Info : clock speed 1000 kHz
Info : SWD IDCODE 0x1ba01477
Info : stm32f1x.cpu: hardware has 6 breakpoints, 4 watchpoints
If you don't see the breakpoints/watchpoints line, or if OpenOCD returns to the console, it didn't work.
In another window
telnet localhost 4444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
>
Now let's stop the chip and write our program. The xPSR, pc, etc. values may differ from mine depending on what you had running.
> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000010 msp: 0x20001000
> flash write_image erase so.elf
auto erase enabled
device id = 0x20036410
flash size = 64kbytes
wrote 1024 bytes from file so.elf in 0.437883s (2.284 KiB/s)
Let's read it back and check that it is there; the words should match the disassembly.
> mdw 0x08000000 20
0x08000000: 20001000 08000009 5000f04f 31016801 e7fe6001 ffffffff ffffffff ffffffff
0x08000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
0x08000040: ffffffff ffffffff ffffffff ffffffff
Assume the current contents of 0x20000000 are random garbage; that is fine so long as we see the value increment.
> mdw 0x20000000
0x20000000: 2e006816
> reset
> halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000012 msp: 0x20001000
> mdw 0x20000000
0x20000000: 2e006817
So the value incremented. If you do a reset, then a halt (not a reset halt as one command), then dump that memory location, it should keep incrementing every time.
Now you can choose to take the gdb path with this binary (I don't have a use for gdb so don't have one installed), or examine your Rust binary, starting with the vector table (for example with arm-none-eabi-objdump -s -j .vector_table) to see that it is correct. Without at least a correct reset vector you will fault and not run any code on the processor. You can flash it using telnet, or you can try gdb.
If gdb is having a problem with the file then perhaps you are using the wrong file, or the file is incorrectly built. Did you try a simple program in that repository? Can you make a minimal program from that repository: an empty entry function, an infinite loop, or a counter that counts forever? (A sketch of such a program follows below.)
Is this truly a gdb problem? Is this an openocd problem? Is this a Rust tools problem? Is this a Rust binary problem? Is this a bug in the docs so you are pointing gdb at the wrong file? If the above works then openocd works, binutils at least works, and the debugger/hardware works; that eliminates those, and the question becomes whether this is a Rust thing, a gdb thing, a wrong-file thing, or something else.
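As a rough sketch of such a minimal Rust test program (this is not the discovery book's own code; it assumes the cortex-m-rt and panic-halt crates are in Cargo.toml along with the usual memory.x/link.x setup), here is a counter in the spirit of the assembly example above:
#![no_std]
#![no_main]

use core::ptr;
use cortex_m_rt::entry; // provides the #[entry] attribute
use panic_halt as _;    // a no_std program needs a panic handler

#[entry]
fn main() -> ! {
    // Same idea as the assembly example: keep incrementing the word at
    // 0x20000000 so it can be dumped from the OpenOCD telnet console with mdw.
    let counter = 0x2000_0000 as *mut u32;
    loop {
        unsafe {
            let v = ptr::read_volatile(counter);
            ptr::write_volatile(counter, v.wrapping_add(1));
        }
    }
}
If even a program this small fails to flash or run, the problem is in the toolchain or configuration rather than in your application code.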

After connecting OpenOCD to the board, don't forget to connect the debugger (arm-none-eabi-gdb) to OpenOCD.
> arm-none-eabi-gdb -se target/thumbv7em-none-eabi/release/your_binary
(gdb) target remote localhost:3333
If all is OK, in the terminal where OpenOCD is running you will see the message:
accepting 'gdb' connection on tcp/3333
and you should be able to start debugging.
To optimize connection setup you may create/update the .gdbinit file with the content:
target remote localhost:3333
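If you like, the same file can also flash the program as soon as gdb starts; a sketch (verify the commands against your own OpenOCD setup):
target remote localhost:3333
monitor reset halt
load
Here monitor forwards the command to OpenOCD, and load flashes the sections of the ELF you passed to arm-none-eabi-gdb.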

Related

Qemu AARCH64 of Raspi3 and smp 4, all cores are suspended except core/thread 0

I am trying to enable multicore on QEMU's raspi3 machine; I did something similar on QEMU for RISC-V. But when I run the QEMU command, all cores other than core 0 appear to be suspended, as shown below.
The command I am using is
qemu:
qemu-system-aarch64 \
-M raspi3b \
-m 1024M \
-smp 4 \
-S -s \
-kernel kernel8.img \
-serial stdio
The initial code I have is like this:
_start:
//single core: only hart 0 jump to kernel
mrs x0, mpidr_el1
and x0, x0, #0b11
cbz x0, _core_0
//cbz x0, 1f
other_cores:
wfe
b other_cores
_core_0:
mrs x0, CurrentEL
and x0, x0, #0b1100
cmp x0, #0b1000
......
(gdb) info threads
Id Target Id Frame
1 Thread 1.1 (CPU#0 [running]) main () at main.c:44
2 Thread 1.2 (CPU#1 [running]) 0x000000000000030c in ?? ()
3 Thread 1.3 (CPU#2 [running]) 0x000000000000030c in ?? ()
4 Thread 1.4 (CPU#3 [running]) 0x000000000000030c in ?? ()
qemu-system-aarch64 -version
QEMU emulator version 6.2.0
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
Could you give me some guidance on how to enable multicore on QEMU?
Thanks in advance.
Multicore is enabled, but since you have passed your guest image with '-kernel', QEMU is booting it the way the Linux kernel wants to be booted: only a single core starts out running the guest image code, and the other three sit in a spinloop in a bit of boot code that QEMU provides, waiting for the kernel to poke known addresses in RAM with the addresses the secondary cores should jump to.
You need to either (a) look up the Linux kernel boot protocol for the Raspberry Pi and have your guest code do the right thing to make the secondary cores leave the spinloop, or (b) use one of the other ways to get QEMU to load guest code (e.g. the generic loader or the -bios option), in which case QEMU will just start all four cores running your code.
Generally speaking, SMP boot protocol for different architectures can be pretty significantly different, and sometimes it differs across machine types too, so you should check documentation for the hardware you're emulating.
(Consider also whether 'raspi3' is really the board type you want to use.)
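As a rough illustration of option (a), in Rust and under the assumption that the stock Pi 3 spin-table convention applies (cores 1-3 poll the mailbox words at 0xe0, 0xe8 and 0xf0, the cpu-release-addr values from the Pi 3 device tree; verify these against the boot stub you are actually using), waking the secondaries could look something like this:
// Hypothetical helper: write the secondary-core entry point into the
// spin-table mailboxes and wake the cores with SEV.
unsafe fn release_secondary_cores(entry: u64) {
    for mailbox in [0xe0_u64, 0xe8, 0xf0] {
        core::ptr::write_volatile(mailbox as *mut u64, entry);
    }
    // Make the stores visible to the other cores, then wake them out of WFE.
    core::arch::asm!("dsb sy", "sev");
}
Whether QEMU's built-in raspi3 boot code uses exactly these addresses is an assumption here; the boot protocol documentation mentioned above is the authoritative reference.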

replace function offset with source code line number in output of "adb dumpsys meminfo --unreachable"

To find memory leaks in the native C code of my app, I run the following command:
adb shell dumpsys meminfo --unreachable $(adb shell pidof com.mycompany.myapp)
I get the following output
6 bytes unreachable at ee1551b0
and 654 similar unreachable bytes in 109 allocations
contents:
ee1551b0: 31 2e 30 2e 30 00 1.0.0.
#00 pc 000421e4 /apex/com.android.runtime/lib/bionic/libc.so (malloc +68)
#01 pc 00126670 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction7 +80)
#02 pc 0011d2b9 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so
#03 pc 0011d025 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction6 +165)
#04 pc 0011e82c /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction5 +892)
#05 pc 00120ea7 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so
#06 pc 0012091c /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so
#07 pc 00120144 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so
#08 pc 0011eebe /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so
#09 pc 0011e6d4 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction4 +548)
#10 pc 00121924 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction3 +68)
#11 pc 00121743 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction2 +99)
#12 pc 001221c9 /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (myCfunction1 +409)
#13 pc 00121f4d /data/app/~~XX==/com.mycompany.myapp-W6-xx==/lib/x86/mylib.so (Java_com_mycompany_myapp_myjavaclass_myjavamethod +333)
I'm looking for a simple way to replace the function offsets (+333, +409, +99, ...) with source code line numbers.
I do this kind of thing by using ndk-stack on tombstone files. Perhaps there is a tool able to decipher the output of 'dumpsys meminfo --unreachable'.
I've found a simple solution.
I attach the debugger to the process, set a breakpoint, and switch to the LLDB console of Android Studio.
To set a breakpoint at a given offset past a function, I enter the command given here:
https://stackoverflow.com/a/23250892/9459302
For instance
b -a (void*(*)(void*, size_t))myCfunction1 + 400
where (void*(*)(void*, size_t)) is the prototype of myCfunction1 expressed as a function-pointer cast.

Rust embedded application is not linked correctly under AArch64 system

I'm trying to compile and debug an embedded Rust application for an stm32f0, using an ARM system as host.
The application already compiles and works under an Intel installation.
I am running on a Pinebook Pro, powered by a Quad Cortex-A53, 64-bit CPU. The OS is a 64-bit version of Debian:
$ uname -a
Linux pinebook 4.4.196 #1 SMP Tue Oct 15 16:54:21 EDT 2019 aarch64 GNU/Linux
I installed rust and cargo with rustup for AArch64 (channel stable):
$ rustc --version
rustc 1.39.0 (4560ea788 2019-11-04)
$ cargo --version
cargo 1.39.0 (1c6ec66d5 2019-09-30)
As per this issue I found out that rust-lld is not distributed in binary form for ARM systems, so I had to compile it from sources:
$ ld.lld --version
LLD 10.0.0 (https://github.com/llvm/llvm-project.git 1c247dd028b368875bc36cd2a9ccc7fd90507776) (compatible with GNU linkers)
Now the compilation process completes without issues:
export RUSTFLAGS="-C linker=ld.lld"
cargo build
However the resulting elf file seems to be linked incorrectly: trying to load it with gdb through openocd results in some kind of silent failure:
(gdb) target remote :3333
Remote debugging using :3333
0x00000000 in ?? ()
(gdb) load
Start address 0x0, load size 0
Transfer rate: 0 bits in <1 sec.
(gdb)
The load size is zero, so no program is flashed. In contrast, when using the ELF compiled on my Intel system (with OpenOCD still running on the ARM laptop) everything works as expected:
(gdb) target remote 192.168.1.153:3333
Remote debugging using 192.168.1.153:3333
0x00000000 in ?? ()
(gdb) load
Loading section .vector_table, size 0xc0 lma 0x8000000
Loading section .text, size 0x686e lma 0x80000c0
Loading section .rodata, size 0x4a0 lma 0x8006940
Start address 0x8005b58, load size 28110
Transfer rate: 19 KB/sec, 7027 bytes/write.
(gdb)
It would seem the ELF is not linked correctly. Running readelf -l shows that on my ARM system the entry point is 0x0, which is wrong for the stm32f0.
This is readelf on my ARM laptop:
Elf file type is EXEC (Executable file)
Entry point 0x0
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00010034 0x00010034 0x00060 0x00060 R 0x4
LOAD 0x000000 0x00010000 0x00010000 0x00094 0x00094 R 0x1000
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0
Section to Segment mapping:
Segment Sections...
00
01
02
While this is from the elf that works, compiled under my Intel system:
Elf file type is EXEC (Executable file)
Entry point 0x8005b59
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x08000000 0x08000000 0x06de0 0x06de0 R E 0x1000
LOAD 0x007de0 0x20000000 0x08006de0 0x00000 0x00028 RW 0x1000
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0
Section to Segment mapping:
Segment Sections...
00 .vector_table .text .rodata
01 .data .bss
02
I'm not sure if this has something to do with the target architecture being the same as the host (and thus picking up the Linux userland linking), or simply because ARM systems are less well supported.
Can anyone point me in the right direction?
I was correct in assuming there was a problem with the linker, and there are a couple of solutions.
Since two years ago, Rust has used LLD as the default linker for the ARM architecture (https://rust-embedded.github.io/blog/2018-08-2x-psa-cortex-m-breakage/). Unfortunately rust-lld itself is not distributed in binary form for ARM platforms (ironic, isn't it?), so I had to compile it from source and specify it on the command line.
Exporting the RUSTFLAGS variable works, but it overrides the default flags defined in .cargo/config, which also include the directive for the linker script (-C link-arg=-Tlink.x). In short, I was convinced I was using the correct linker script because it was listed in .cargo/config, but the RUSTFLAGS environment variable was discarding it.
The solution is to either:
include the linker script explicitly when exporting RUSTFLAGS:
export RUSTFLAGS="-C linker=ld.lld -C link-arg=-Tlink.x"
specify "-C", "linker=ld.lld" as a rustflag in the .cargo/config file alongside the other options (a sketch of what this can look like follows below), or
enable the old linker (arm-none-eabi-ld, invoked through gcc), which is more easily obtainable, by uncommenting the following line in .cargo/config: "-C", "linker=arm-none-eabi-gcc"
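For reference, the .cargo/config route might look roughly like this (a sketch, not the project's actual file; it assumes the thumbv6m-none-eabi target normally used for the stm32f0 and the ld.lld binary built above, so adjust names to your setup):
[build]
target = "thumbv6m-none-eabi"

[target.thumbv6m-none-eabi]
rustflags = [
  "-C", "linker=ld.lld",      # the LLD binary compiled from source
  "-C", "link-arg=-Tlink.x",  # keep the cortex-m-rt linker script
]
Keeping both flags in one place avoids the RUSTFLAGS override problem described above.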
You have to create a raw binary image by using a linker script dedicated to that board.
https://github.com/szczys/stm32f0-discovery-basic-template/tree/master/Device/ldscripts

SIGSEGV when executing 32 bit binary on 64 bit Linux

You may have heard of StoneKnifeForth, a project by kragen: https://github.com/kragen/stoneknifeforth. It's a Python program that acts as a small Forth interpreter and a Forth program that acts as a Forth compiler. Therefore you can build a Forth compiler binary using those two in unison.
After porting StoneKnifeForth to C++ (https://github.com/tekknolagi/stoneknifecpp), I noticed that all binaries produced by StoneKnifeForth (either variety) segfault on 64 bit Linux. That is, if you clone stoneknifecpp and run:
make
./l01compiler # produced by the Forth program
You'll get the following:
willow% ./l01compiler
[1] 31614 segmentation fault ./l01compiler
This isn't a very interesting error message, obviously, so I thought I would strace it:
willow% strace ./l01compiler
execve("./l01compiler", ["./l01compiler"], [/* 110 vars */]) = -1 EPERM (Operation not permitted)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
[1] 31615 segmentation fault (core dumped) strace ./l01compiler
And got... somewhat more information. It looks like the ELF header is wrong somehow, except for the following two interesting tidbits:
It runs fine under 32 bit qemu
It runs fine if I sudo ./l01compiler
I'm a bit at a loss as to why this is, even after searching the internet for possible differences in ELF header handling between 32-bit and 64-bit Linux kernels, etc. If anyone has any information, I would be delighted.
I have attached the header below:
willow% readelf -h l01compiler
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x1e39
Start of program headers: 52 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 0
Section header string table index: 0
Big thanks to Tom Hebb (tchebb) for confirming my suspicions and figuring this out. As can be seen in this commit, the problem was that the origin address was simply too low. It's not a 32-bit vs 64-bit issue, but an older kernel vs newer kernel one.
Newer kernels have increased the vm.mmap_min_addr sysctl parameter, so the old origin address now falls below the limit and the kernel refuses to map the program at all. That explains why sudo worked. As Tom explained, "Unless you invoke qemu with KVM support, qemu is an emulator not a hypervisor, so it simulates the entire address space and virtual memory subsystem in software and presumably doesn't impose any load address restrictions."
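To see the restriction in isolation, here is a small hypothetical Rust demo (it assumes the libc crate as a dependency) that tries a fixed mapping below the common vm.mmap_min_addr default of 0x10000; run as a normal user it should fail with EPERM, the same check that stopped the executable from loading:
// Sketch: attempt a MAP_FIXED mapping at a low address.
fn main() {
    let addr = 0x1000 as *mut libc::c_void; // below the usual mmap_min_addr
    let res = unsafe {
        libc::mmap(
            addr,
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_FIXED,
            -1,
            0,
        )
    };
    if res == libc::MAP_FAILED {
        // Expected for an unprivileged user on a default kernel.
        eprintln!("mmap at {:p} failed: {}", addr, std::io::Error::last_os_error());
    } else {
        println!("mapped at {:p} (root, or mmap_min_addr lowered?)", res);
    }
}
Running it with sudo, or after lowering vm.mmap_min_addr, should let the mapping succeed, mirroring the behaviour seen with the compiled Forth binary.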

Compiling driver for ARMv7 vs ARMv5

I've managed to compile a driver for an ARM-based device, but the driver crashes when I try to load it.
Here is the output from /proc/cpuinfo:
Processor : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 999.42
Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Here is the uname -r output
2.6.37
modinfo driver.ko
filename: cp210x.ko
description: Silicon Labs CP210x RS232 serial adaptor driver
license: GPL
vermagic: 2.6.37 mod_unload ARMv7
vermagic: 2.6.37 mod_unload modversions ARMv5
parm: debug:Enable verbose debugging messages
As you can see, I've added an extra vermagic line (2.6.37 mod_unload ARMv7) so it will match the target system.
So if I understand this correctly, I've compiled this module for an ARMv5 CPU, while the target is ARMv7. Could this be the cause of the driver crashing?
The device already has this driver, but it is embedded in another driver package from the hardware vendor. That package also loads some drivers we cannot use, so we do not load the package, but I guess this indicates that the driver should work on this hardware somehow.
Here is the crash log:
modprobe cp210x.ko
Unable to handle kernel NULL pointer dereference at virtual address 0000000a
pgd = ca1fc000
[0000000a] *pgd=870dd031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in: dahdi_dummy dahdi cmemk syslink ipt_MASQUERADE nf_nat iptable_filter ip_tables ipt_LOG xt_state nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xt_recent xt_mac xt_limit work_led reset_button ipv6
CPU: 0 Not tainted (2.6.37 #1)
PC is at sys_init_module+0xfe0/0x1460
LR is at sys_init_module+0xe7c/0x1460
pc : [<c00836e8>] lr : [<c0083584>] psr: 20000013
sp : cc5e9ed0 ip : bf3828dc fp : cc5e8000
r10: bf385ca8 r9 : cf3bcb4e r8 : 000000c5
r7 : 00000027 r6 : bf382544 r5 : bf38266c r4 : bf385ca8
r3 : 00000000 r2 : c7c9f000 r1 : 0000000a r0 : 0000000a
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: 8a1fc019 DAC: 00000015
Process modprobe (pid: 2676, stack limit = 0xcc5e82e8)
Stack: (0xcc5e9ed0 to 0xcc5ea000)
9ec0: bf382544 00000001 000ac048 bf382550
9ee0: 000000c5 cf3bd5a4 cf3b8000 000055f4 cf3bd20c cf3bd128 cf3bc2a0 c7c9f000
9f00: 0000266c 000028dc 00000000 00000000 00000017 00000018 00000010 0000000d
9f20: 00000009 00000000 6e72656b 00006c65 00000000 00000000 00000000 00000000
9f40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9f60: 00000000 00000000 c9a19540 00000000 ca2403c0 00000006 c9a19540 00000000
9f80: ca2403c0 000055f4 00000000 00000006 00000080 c0037c28 cc5e8000 00000000
9fa0: 00000001 c0037a80 000055f4 00000000 000ac998 000055f4 000ac048 000ac978
9fc0: 000055f4 00000000 00000006 00000080 000ac008 000ac028 000ac998 00000001
9fe0: bebaf968 bebaf958 00017764 40214740 60000010 000ac998 c1e38bcc 03de8ad9
[<c00836e8>] (sys_init_module+0xfe0/0x1460) from [<c0037a80>] (ret_fast_syscall+0x0/0x30)
Code: e7923103 e1a03133 e3130001 15963128 (17d33000)
---[ end trace 6e8943127db36208 ]---
Segmentation fault
I had to change the cp210x.c file and comment out any use of the mutex; this was the only place:
static void cp210x_close(struct usb_serial_port *port)
{
dbg("%s - port %d", __func__, port->number);
usb_serial_generic_close(port);
/* mutex_lock(&port->serial->disc_mutex);*/
if (!port->serial->disconnected)
cp210x_set_config_single(port, CP210X_IFC_ENABLE, UART_DISABLE);
/* mutex_unlock(&port->serial->disc_mutex);*/
}
Are you trying to load a kernel module that was compiled for one kernel into another kernel? Linux modules (what you call drivers) are only supposed to be loaded into the kernel that they were compiled for. Even the same version of the kernel with different configuration or compiler settings could render the module incompatible. So playing with version magic is very dangerous.
The reason your driver is crashing is that it is accessing kernel data structures using an incorrect layout, so it is not actually reading the fields it thinks it is reading.
Changing the architecture from ARMv7 to ARMv5 is a drastic configuration change that will completely alter the memory layout of kernel data structures.
Unlike some other operating systems like Windows, Linux does not have an abstraction layer or fixed memory layouts that let you load the same loadable module into different versions of the kernel.
