Segmentation fault when running binaries compiled with riscv64-unknown-linux-gnu-gcc in Spike

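#include <stdio.h>
int main()
{
    int src = 5;
    int dst = 0;
    asm ("mv %0, %1" : "=r"(dst) : "r"(src));   /* "=r": the output must be a register */
    asm volatile ("mv a0, a1" ::: "a0");         /* declare that a0 is overwritten */
    printf(" %d\n", dst);
    return 0;
}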
prashantravi@ubuntu:~/rocket-chip$ riscv64-unknown-linux-gnu-gcc -o asm_test asm_test.c
prashantravi@ubuntu:~/rocket-chip$ spike riscv/bin/pk asm_test
z 0000000000000000 ra 0000000000000000 sp 00000000fefffb50 gp 0000000000801fb8
tp 0000000000000000 t0 0000000000000000 t1 0000000000000008 t2 00000000008012e0
s0 0000000000000000 s1 0000000000000000 a0 0000000000800430 a1 0000000000000001
a2 00000000fefffb58 a3 0000000000800484 a4 0000000000800514 a5 0000000000000000
a6 00000000fefffb50 a7 0000000000000000 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 ffffffffffffffff t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc fffffffffffffffe va fffffffffffffffe insn ffffffff sr 8000000000003008
User fetch segfault # 0xfffffffffffffffe
I get the above error when I run programs compiled with riscv64-unknown-linux-gnu-gcc in Spike.
The same code runs correctly when compiled with riscv64-unknown-elf-gcc.

You can't run dynamically linked programs on the proxy kernel.
You must statically link your programs if you are going to run them on the proxy kernel. riscv64-unknown-elf-gcc does this by default. If you use riscv64-unknown-linux-gnu-gcc instead, you must either pass -static or run the program on the Linux kernel.
$ riscv64-unknown-elf-gcc -o asm_test asm_test.c [or...]
$ riscv64-unknown-linux-gnu-gcc -static -o asm_test asm_test.c
$ spike pk asm_test
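Because pk cannot load a dynamic linker, it can help to check whether a binary requests one before running it. A dynamically linked ELF carries a PT_INTERP program header (readelf -l lists it as INTERP); a -static build does not. The following is a minimal sketch of a hypothetical helper that scans an in-memory ELF image for that header. It assumes glibc's <elf.h>, a 64-bit ELF, and a host that shares the target's little-endian byte order (as an x86-64 host does for riscv64).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 64-bit ELF image in buf has a
 * PT_INTERP program header (it asks for a dynamic linker), 0 if it has
 * none, and -1 if the image is too short or not a 64-bit ELF. */
static int elf64_needs_interp(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_phnum > 0 && (eh.e_phoff > len || eh.e_phentsize < sizeof(Elf64_Phdr)))
        return -1;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + i * eh.e_phentsize;
        Elf64_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return -1; /* program header table runs past the buffer */
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == PT_INTERP)
            return 1;
    }
    return 0;
}
```

To check a file, read it completely into a buffer and pass it to elf64_needs_interp; a result of 1 means the binary should be relinked with -static before running it on pk.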
In more detail, how I debugged this before I remembered the above limitation:
By running $ spike -d pk asm_test 2> output.txt, we can see the trace of the program:
374618 : core 0: 0x0000000000800320 (0x00002197) auipc gp, 0x2
374619 : core 0: 0x0000000000800324 (0xc9818193) addi gp, gp, -872
374620 : core 0: 0x0000000000800328 (0x00050793) mv a5, a0
374621 : core 0: 0x000000000080032c (0x00000517) auipc a0, 0x0
374622 : core 0: 0x0000000000800330 (0x10450513) addi a0, a0, 260
374623 : core 0: 0x0000000000800334 (0x00013583) ld a1, 0(sp)
374624 : core 0: 0x0000000000800338 (0x00810613) addi a2, sp, 8
374625 : core 0: 0x000000000080033c (0xff017113) andi sp, sp, -16
374626 : core 0: 0x0000000000800340 (0x00000697) auipc a3, 0x0
374627 : core 0: 0x0000000000800344 (0x14468693) addi a3, a3, 324
374628 : core 0: 0x0000000000800348 (0x00000717) auipc a4, 0x0
374629 : core 0: 0x000000000080034c (0x1cc70713) addi a4, a4, 460
374630 : core 0: 0x0000000000800350 (0x00010813) mv a6, sp
374631 : core 0: 0x0000000000800354 (0xfbdff06f) j pc - 0x44
374632 : core 0: 0x0000000000800310 (0x00001e17) auipc t3, 0x1
374633 : core 0: 0x0000000000800314 (0x498e3e03) ld t3, 1176(t3)
374634 : core 0: 0x0000000000800318 (0x000e0367) jalr t1, t3, 0
374635 : core 0: 0x00000000008002e0 (0x00001397) auipc t2, 0x1
374636 : core 0: 0x00000000008002e4 (0x41c30333) sub t1, t1, t3
374637 : core 0: 0x00000000008002e8 (0x4b03be03) ld t3, 1200(t2)
374638 : core 0: 0x00000000008002ec (0xfd430313) addi t1, t1, -44
374639 : core 0: 0x00000000008002f0 (0x4b038293) addi t0, t2, 1200
374640 : core 0: 0x00000000008002f4 (0x00135313) srli t1, t1, 1
374641 : core 0: 0x00000000008002f8 (0x0082b283) ld t0, 8(t0)
374642 : core 0: 0x00000000008002fc (0x000e0067) jr t3
374643 : core 0: exception trap_instruction_access_fault, epc 0xfffffffffffffffe
374644 : core 0: 0x0000000000000100 (0x34011173) csrrw sp, mscratch, sp
374645 : core 0: 0x0000000000000104 (0x04a13823) sd a0, 80(sp)
374646 : core 0: 0x0000000000000108 (0x04b13c23) sd a1, 88(sp)
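If you objdump asm_test, you'll see that execution goes from _start to __libc_start_main@plt (0x800310) and then to _PROCEDURE_LINKAGE_TABLE_ (0x8002e0).
From there, it executes jr t3, where t3 was just loaded from the GOT. That GOT slot is normally filled in by the dynamic linker, which pk does not provide, so t3 still holds 0xffffffffffffffff (see the register dump). Since JALR clears the low bit of its target, the jump lands at 0xfffffffffffffffe, an invalid fetch address, and the trace reports an instruction access fault. Hence the crash.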
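The faulting address can be checked by hand: jr t3 is jalr x0, 0(t3), and JALR computes rs1 plus the sign-extended 12-bit offset, then clears bit 0. A small C model of that rule (jalr_target is a hypothetical helper, not a library function):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the RISC-V JALR target computation:
 * (rs1 + sign-extended 12-bit immediate) with bit 0 cleared. */
static uint64_t jalr_target(uint64_t rs1, int32_t imm12)
{
    return (rs1 + (uint64_t)(int64_t)imm12) & ~(uint64_t)1;
}
```

Feeding it the t3 value from the register dump, 0xffffffffffffffff with offset 0, gives 0xfffffffffffffffe, the pc reported in the segfault.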

Related

How to use .rela.dyn to trigger an ELF loader on AArch64?

I have implemented a packer for x86_64 shared libraries.
Briefly:
a simple xor-loader is injected into the shared library (by creating a new section);
.rela.dyn acts as an entry point for the shared library;
a .rela.dyn entry is patched so that it points to the address of the loader;
once the shared library is called, the xor-loader is triggered and decrypts the .text section.
The mechanism works fine for the x86_64 shared library.
The .rela.dyn trick is borrowed from
https://github.com/0xN3utr0n/Noteme/blob/master/injection.c
However, this mechanism failed on aarch64.
I found that:
If I xor the .text section, the xor-loader is bypassed.
I get "Illegal instruction (core dumped)", since the .text section has not been decrypted by the xor-loader (confirmed by inspecting .text in gdb).
I objdump-ed the loader, and it is intact.
If I don't xor the .text section, the xor-loader is called and works normally (but the decryption result is wrong, since .text was not xor-ed beforehand).
Question:
What could have gone wrong on AArch64?
The readelf output for the AArch64 shared library is shown below.
libtest.so is the library before packing,
while libtest_packed.so is the library after packing.
Here is the output of readelf --relocs libtest.so:
Relocation section '.rela.dyn' at offset 0x550 contains 7 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000010df0 000000000403 R_AARCH64_RELATIV 780
000000010df8 000000000403 R_AARCH64_RELATIV 738
000000011018 000000000403 R_AARCH64_RELATIV 11018
000000010fc8 000300000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_deregisterTMClone + 0
000000010fd0 000400000401 R_AARCH64_GLOB_DA 0000000000000000 __cxa_finalize@GLIBC_2.17 + 0
000000010fd8 000500000401 R_AARCH64_GLOB_DA 0000000000000000 __gmon_start__ + 0
000000010fe0 000700000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_registerTMCloneTa + 0
The symbols corresponding to the first three entries are:
0000000000000780 t frame_dummy
0000000000000738 t __do_global_dtors_aux
0000000000011018 d __dso_handle
Here is the output of readelf --relocs libtest_packed.so:
Relocation section '.rela.dyn' at offset 0x550 contains 7 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000010df0 000000000403 R_AARCH64_RELATIV 11028
000000010df8 000000000403 R_AARCH64_RELATIV 738
000000011018 000000000403 R_AARCH64_RELATIV 11018
000000010fc8 000300000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_deregisterTMClone + 0
000000010fd0 000400000401 R_AARCH64_GLOB_DA 0000000000000000 __cxa_finalize@GLIBC_2.17 + 0
000000010fd8 000500000401 R_AARCH64_GLOB_DA 0000000000000000 __gmon_start__ + 0
000000010fe0 000700000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_registerTMCloneTa + 0
As you can see, the addend of the first entry (originally 0x780, frame_dummy) has been overwritten with the address of the loader (0x11028).
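A sketch of how a loader treats these entries may help when comparing the two dumps. For R_AARCH64_RELATIVE, the dynamic loader stores load base + addend at load base + r_offset, so changing the first addend from 0x780 to 0x11028 redirects the pointer at 0x10df0 to the injected loader. Whether that pointer is ever called depends on the section containing 0x10df0; since it originally pointed at frame_dummy it is probably .init_array, but that is an assumption to confirm with readelf -S. The helper apply_relative is hypothetical: it works on a flat buffer standing in for the mapped library and models only the relocation formula, not glibc's loader. It assumes glibc's <elf.h> and little-endian host and target.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of R_AARCH64_RELATIVE processing:
 *   *(uint64_t *)(base + r_offset) = base + r_addend
 * image[0] stands for the library's virtual address 0. Entries of other
 * types, or whose slot falls outside the buffer, are skipped.
 * Returns the number of relocations applied. */
static size_t apply_relative(unsigned char *image, size_t image_len, uint64_t load_base,
                             const Elf64_Rela *rela, size_t count)
{
    size_t applied = 0;
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_AARCH64_RELATIVE)
            continue;
        uint64_t value = load_base + (uint64_t)rela[i].r_addend;
        if (rela[i].r_offset > image_len || image_len - rela[i].r_offset < sizeof value)
            continue;
        memcpy(image + rela[i].r_offset, &value, sizeof value); /* host byte order */
        applied++;
    }
    return applied;
}
```

Running it over the packed entries shows the slot at 0x10df0 ending up as base + 0x11028, so the patch itself is applied the same way on both architectures; the difference is more likely in when, or whether, that slot is used.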

RISC-V interrupts, setting up MTIMECMP

I am trying to write a program in RISC-V assembly for the HiFive1 board that wakes up on a timer interrupt.
This is my interrupt setup routine:
.section .text
.align 2
.globl setupINTERRUPT
.equ MTIMECMP, 0x2004000
setupINTERRUPT:
    addi sp, sp, -16       # allocate a stack frame (moves the stack pointer down by 16 bytes)
    sw   ra, 12(sp)        # save return address on stack
    li   t0, 0x8           # time interval at which to trigger the interrupt
    li   t1, MTIMECMP      # MTIMECMP register of the CLINT memory map
    sw   t0, 0(t1)         # store the interval in MTIMECMP memory location
    li   t0, 0x800         # make a mask for 3rd bit
    csrrs t1, mstatus, t0  # use CSR READ/SET instruction to set 3rd bit using previously defined mask
    li   t0, 0x3           # make a mask for 0th and 1st bit
    csrrc t1, mtvec, t0    # use CSR READ/CLEAR instruction to clear 0th and 1st bit
    li   t0, 0x80          # make a mask for 7th bit
    csrrs t1, mie, t0      # set 7th bit for MACHINE TIMER INTERRUPT ENABLE
    lw   ra, 12(sp)        # restore the return address
    addi sp, sp, 16        # deallocate stack frame
    ret
I am not sure I am setting MTIMECMP correctly; I know it is a 64-bit memory location.
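For reference, a hedged C sketch of handling the 64-bit registers from an RV32 core. mtimecmp holds an absolute compare value, and the timer interrupt is pending whenever mtime >= mtimecmp, so it is normally programmed as "current mtime + interval"; storing just 0x8 into the low word would leave it pending almost at once, which could explain WFI returning immediately. The write order follows the sequence recommended in the RISC-V privileged specification. The helper names are hypothetical, and on the HiFive1 the pointers would be the CLINT's mtimecmp (0x02004000, as in the code above) and mtime (assumed to be 0x0200BFF8; check the FE310 manual), each split into 32-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit counter through two 32-bit halves without tearing:
 * retry if the high half changed while the low half was being read. */
static uint64_t read_mtime64(volatile const uint32_t *lo, volatile const uint32_t *hi)
{
    uint32_t h, l;
    do {
        h = *hi;
        l = *lo;
    } while (*hi != h);
    return ((uint64_t)h << 32) | l;
}

/* Write a 64-bit mtimecmp through two 32-bit halves so that no
 * intermediate value is smaller than both the old and new values. */
static void write_mtimecmp64(volatile uint32_t *lo, volatile uint32_t *hi, uint64_t value)
{
    *lo = 0xFFFFFFFFu;              /* no smaller than the old value */
    *hi = (uint32_t)(value >> 32);  /* no smaller than the new value */
    *lo = (uint32_t)value;          /* final value */
}
```

A timer interval would then be armed with write_mtimecmp64(cmp_lo, cmp_hi, read_mtime64(mtime_lo, mtime_hi) + ticks).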
I am trying to use this interrupt as a delay timer for a blinking LED (just to make sure the interrupt works before I move on to writing a handler).
Here is my setLED routine (note that all the GPIO register setup was done earlier and is known to work). I have a WFI instruction at the start of both the ON and OFF paths. The LED doesn't light up, even though it does in debug mode. I think that when it runs normally it skips the WFI instruction, as if the interrupt were already asserted.
.section .text
.align 2
.globl setLED
#include "memoryMap.inc"
#include "GPIO.inc"
.equ NOERROR, 0x0
.equ ERROR, 0x1
.equ LEDON, 0x1
# which LED to set comes into register a0
# desired On/Off state comes into a1
setLED:
    addi sp, sp, -16              # allocate a stack frame (moves the stack pointer down by 16 bytes)
    sw   ra, 12(sp)               # save return address on stack
    li   t0, GPIO_CTRL_ADDR       # load GPIO address
    lw   t1, GPIO_OUTPUT_VAL(t0)  # get the current value of the pins
    beqz a1, ledOff               # branch to turn off the LED if a1 requests it
    li   t2, LEDON                # load the value of LEDON into a temp register
    beq  a1, t2, ledOn            # branch if on requested
    li   a0, ERROR                # we got a bad status request, return an error
    j    exit
ledOn:
    wfi
    xor  t1, t1, a0               # xor so that only the requested LED changes
    sw   t1, GPIO_OUTPUT_VAL(t0)  # write the new output value to GPIO out
    li   a0, NOERROR              # no error
    j    exit
ledOff:
    wfi
    not  a0, a0                   # invert everything so that all bits are one except the LED we are turning off
    and  t1, t1, a0               # and a0 and t1 to clear the LED we want to turn off
    sw   t1, GPIO_OUTPUT_VAL(t0)  # write the new output value
    li   a0, NOERROR
exit:
    lw   ra, 12(sp)               # restore the return address
    addi sp, sp, 16               # deallocate stack frame
    ret
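One detail in setLED worth noting: the ledOn path uses xor, which toggles the LED bit rather than forcing it on, so if the bit is already 1 it turns the LED off; the ledOff path inverts the mask and ANDs, which always clears it. The three bit operations as C helpers (hypothetical names, mask assumed to have a single LED bit set):

```c
#include <assert.h>
#include <stdint.h>

/* Force the LED bit(s) in mask to 1. */
static uint32_t led_set(uint32_t out, uint32_t mask)    { return out | mask; }
/* Force the LED bit(s) in mask to 0 (what ledOff does). */
static uint32_t led_clear(uint32_t out, uint32_t mask)  { return out & ~mask; }
/* Flip the LED bit(s) in mask (what ledOn's xor does). */
static uint32_t led_toggle(uint32_t out, uint32_t mask) { return out ^ mask; }
```

If the intent of ledOn is "turn on", the or form (led_set) is the one that matches; the xor form is what a blink loop would use.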

Unable to get the RoCC accelerator built with the default Accumulator example for the ZedBoard

I tried building the default accumulator RoCC example for the ZedBoard, but I get an "illegal instruction" error.
I tried the following config in the configs.scala file:
class WithAccumRocc extends Config(
  (pname, site, here) => pname match {
    case RoccNMemChannels => 1
    case RoccMaxTaggedMemXacts => 0
    case BuildRoCC => {
      Some((p: Parameters) =>
        Module(new AccumulatorExample()(p.alterPartial({ case CoreName => "AccumRocc" }))))
    }
  }
)
class WithRoCCConfig extends Config(new WithAccumRocc ++ new DefaultFPGAConfig)
The bitstream was generated successfully, but when I ran the dummy_rocc_test binary built from the example in riscv-isa-sim, I got the following error on the ZedBoard.
root@zynq:~# ./fesvr-zynq pk /sdcard/Custom\ elfs/dummy_rocc
z 0000000000000000 ra 0000000000010044 sp 000000000feffb10 gp 0000000000017880
tp 0000000000000000 t0 0000000000017178 t1 0000000000017178 t2 0000000000000000
s0 000000000feffb40 s1 0000000000000000 a0 0000000000000001 a1 000000000feffb48
a2 0000000000000000 a3 0000000000000000 a4 0000000000000000 a5 000000000000007b
a6 0000000000000000 a7 0000000000000001 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010168 va 0000000000010168 insn 0027e00b sr 8000000000003008
An illegal instruction was executed!
Any help here would be greatly appreciated.
P.S.: The dummy_rocc_test example works fine with Spike; it was compiled with riscv64-unknown-elf-gcc.
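The faulting word in the dump, insn 0027e00b, can be decoded to confirm that the trap comes from the RoCC instruction itself: opcode 0x0b is the custom-0 major opcode, rs1 is x15 (a5, which the dump shows holding 0x7b = 123, the value of x), rs2 is 2 and funct7 is 0, matching custom0 x0, %0, 2, 0. A small R-type field decoder (decode_rtype is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an R-type RISC-V instruction word, the format custom-0 uses. */
struct rtype {
    unsigned opcode, rd, funct3, rs1, rs2, funct7;
};

static struct rtype decode_rtype(uint32_t insn)
{
    struct rtype r = {
        .opcode = insn & 0x7f,          /* bits 6:0   */
        .rd     = (insn >> 7) & 0x1f,   /* bits 11:7  */
        .funct3 = (insn >> 12) & 0x7,   /* bits 14:12 */
        .rs1    = (insn >> 15) & 0x1f,  /* bits 19:15 */
        .rs2    = (insn >> 20) & 0x1f,  /* bits 24:20 */
        .funct7 = insn >> 25,           /* bits 31:25 */
    };
    return r;
}
```

In Rocket's RoCC convention, funct3 carries the xd/xs1/xs2 flags, so a core whose bitstream does not include the accelerator raises an illegal-instruction exception on exactly this word.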
This issue has been resolved: after the config change, Rocket must be rebuilt with the new FPGA configuration, and the Vivado project must also be updated accordingly to generate the new bitstream.

How come the Linux kernel interferes with the execution of a RISC-V custom0 instruction on the ZedBoard?

dummy_rocc is a simple built-in RoCC accelerator example in the RISC-V tools that defines several custom0 instructions. After setting up dummy_rocc (either on the Spike ISA simulator or on Rocket on the FPGA, each in its own way), we use dummy_rocc_test, a user-space test program, to verify that the dummy_rocc accelerator works correctly. There are two ways to run dummy_rocc_test: on pk (the proxy kernel) or on Linux.
When I set up dummy_rocc on the Spike ISA simulator, dummy_rocc_test worked well on both pk and Linux.
Now I have replaced Spike with Rocket on the ZedBoard FPGA. Execution on pk succeeds:
root@zynq:~# ./fesvr-zynq pk /nfs/copy_to_rootfs/work/dummy_rocc_test
begin
after asm code
load x into accumulator 2 (funct=0)
read it back into z (funct=1) to verify it
accumulate 456 into it (funct=3)
verify it
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
success!
The execution on Linux, however, fails:
./fesvr-zynq +disk=/nfs/root.bin bbl /nfs/fpga-zynq/zedboard/fpga-images-zedboard/riscv/vmlinux
..................................Booting RISC-V Linux.........................................
/ # ./work/dummy_rocc_test
begin
after asm code
[ 0.400000] dummy_rocc_test[23]: unhandled signal 4 code 0x30001 at 0x0000000000800500 in ]
[ 0.400000] CPU: 0 PID: 23 Comm: dummy_rocc_test Not tainted 3.14.33-g043bb5d #1
[ 0.400000] task: ffffffff8fa3f500 ti: ffffffff8fb76000 task.ti: ffffffff8fb76000
[ 0.400000] sepc: 0000000000800500 ra : 00000000008004fc sp : 0000003fff943c70
[ 0.400000] gp : 0000000000882198 tp : 0000000000884700 t0 : 0000000000000000
[ 0.400000] t1 : 000000000080adc8 t2 : 8101010101010100 s0 : 0000003fff943ca0
[ 0.400000] s1 : 0000000000800d5c a0 : 000000000000000f a1 : 0000002000002000
[ 0.400000] a2 : 000000000000000f a3 : 000000000085cee8 a4 : 0000000000000001
[ 0.400000] a5 : 000000000000007b a6 : 0000000000000008 a7 : 0000000000000040
[ 0.400000] s2 : 0000000000000000 s3 : 00000000008a2668 s4 : 00000000008d8d98
[ 0.400000] s5 : 00000000008d7770 s6 : 0000000000000008 s7 : 00000000008d6000
[ 0.400000] s8 : 00000000008d8d60 s9 : 0000000000000000 s10: 00000000008a32b8
[ 0.400000] s11: ffffffffffffffff t3 : 000000000000000b t4 : 000000006ffffdff
[ 0.400000] t5 : 000000000000000a t6 : 000000006ffffeff
[ 0.400000] sstatus: 8000000000003008 sbadaddr: 0000000000800500 scause: 0000000000000002
Illegal instruction
A screenshot shows that the "signal 4" is caused by a custom0 instruction.
[Screenshot: readelf output of dummy_rocc_test]
So my question is: how come the Linux kernel interferes with the execution of the RISC-V custom0 instruction on the ZedBoard?
The source code of dummy_rocc_test is provided as reference:
// The following is a RISC-V program to test the functionality of the
// dummy RoCC accelerator.
// Compile with riscv64-unknown-elf-gcc dummy_rocc_test.c
// Run with spike --extension=dummy_rocc pk a.out
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int main() {
    printf("begin\n");
    uint64_t x = 123, y = 456, z = 0;
    // load x into accumulator 2 (funct=0)
    // asm code
    asm volatile ("addi a1, a1, 2");
    /// printf again
    printf("after asm code\n");
    asm volatile ("custom0 x0, %0, 2, 0" : : "r"(x));
    printf("load x into accumulator 2 (funct=0)\n");
    // read it back into z (funct=1) to verify it
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("read it back into z (funct=1) to verify it\n");
    assert(z == x);
    // accumulate 456 into it (funct=3)
    asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
    printf("accumulate 456 into it (funct=3)\n");
    // verify it
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("verify it\n");
    assert(z == x+y);
    // do it all again, but initialize acc2 via memory this time (funct=2)
    asm volatile ("custom0 x0, %0, 2, 2" : : "r"(&x));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    assert(z == x+y);
    printf("success!\n");
}
"Illegal instruction" means your processor threw an illegal instruction exception.
Since custom0 is not something Linux knows how to execute in software (it's customizable, after all!), the kernel cannot emulate it; instead it delivers SIGILL (signal 4) to your process, which is the error you saw.
The question I have for you is "Did you implement the custom0 instruction in the processor? Is it enabled? Did the program execute your custom0 instruction properly and return the correct answer when you used the proxy-kernel?"
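Since the kernel's response is to deliver SIGILL to the process, a user program can probe whether the accelerator instruction is usable before relying on it. A minimal POSIX sketch, assuming the custom0 instruction is wrapped in its own function; the helper names are hypothetical, and demo_raise_sigill only stands in for the trapping instruction so the sketch runs on any host.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: jump back into runs_without_sigill(). */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Call fn(); return 1 if it completed, 0 if it raised SIGILL.
 * The previous SIGILL disposition is restored afterwards. */
static int runs_without_sigill(void (*fn)(void))
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    int ok;
    if (sigsetjmp(probe_env, 1) == 0) { /* 1: also restore the signal mask */
        fn();
        ok = 1;
    } else {
        ok = 0;
    }
    sigaction(SIGILL, &old, NULL);
    return ok;
}

/* Stand-ins for host testing; on the board, fn would execute one custom0. */
static void demo_noop(void) {}
static void demo_raise_sigill(void) { raise(SIGILL); }
```

On the ZedBoard under Linux, fn would contain one of the test's custom0 asm statements; a result of 0 there while pk succeeds would point to the environment (kernel or bitstream) rather than to the test program.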

Accessing an external bus in kernel space on an ARM-based board

I'm trying to write an LCD display driver on an ARM based board.
The LCD controller is plugged on the external memory bus.
So I try to convert the physical addresses of the controller's registers to virtual ones.
I use the following code to do that:
#define AT91_VA_BASE_CS2 phys_to_virt(0x50000000)
static inline unsigned char at91_CS2_read(unsigned int reg)
{
    void __iomem *CS2_base = (void __iomem *)AT91_VA_BASE_CS2;
    return __raw_readb(CS2_base + reg);
}
static inline void at91_CS2_write(unsigned int reg, unsigned char value)
{
    void __iomem *CS2_base = (void __iomem *)AT91_VA_BASE_CS2;
    __raw_writeb(value, CS2_base + reg);
}
void write_lcd_port(int mode, unsigned char cmd_dat)
{
    while ((read_lcd_port() & 0x03) != 0x03) {
        /* wait while LCD is busy!!! */
    } /* endwhile */
    /* Send Command */
    if (mode == 1)
    {
        at91_CS2_write(4, cmd_dat);
    }
    /* Send Data */
    if (mode == 0)
    {
        at91_CS2_write(0, cmd_dat);
    }
}
I get the following message:
Unable to handle kernel paging request at virtual address 4f000004
pgd = c39bc000
[4f000004] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in: module_complet dm9000 at91_wdt vfat fat jffs2 nls_iso8859_1 nls_cp437 nls_base usb_storage sd_mod sg scsie
CPU: 0
PC is at read_lcd_port+0x1c/0x38 [module_complet]
LR is at 0x1
pc : [<bf0a21b8>] lr : [<00000001>] Tainted: P
sp : c380bf1c ip : 60000093 fp : c380bf2c
r10: 0003a804 r9 : c380a000 r8 : c001de64
r7 : 00000000 r6 : fefff000 r5 : 0000009c r4 : 00000001
r3 : 4f000000 r2 : 00000000 r1 : 00001438 r0 : bf0a25cc
Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user
Control: C000717F Table: 239BC000 DAC: 00000015
Process insmod (pid: 903, stack limit = 0xc380a198)
Stack: (0xc380bf1c to 0xc380c000)
bf00: 00000001
bf20: c380bf44 c380bf30 bf0a21f4 bf0a21ac 00000000 fefa0000 c380bf54 c380bf48
bf40: bf0a2288 bf0a21e4 c380bf64 c380bf58 bf0a246c bf0a2280 c380bf84 c380bf68
bf60: bf0a4058 bf0a2464 40017000 c01c89a0 bf0a2d80 c01c8990 c380bfa4 c380bf88
bf80: c004cd20 bf0a4010 00000003 00000000 0000000c 00000080 00000000 c380bfa8
bfa0: c001dcc0 c004cbc8 00000000 0000000c 00900080 40017000 0000162e 00041050
bfc0: 00000003 00000000 0000000c bea0fde4 bea0fec4 00000002 0003a804 00000000
bfe0: bea0fd10 bea0fd04 0001b290 400d1d20 60000010 00900080 20002031 20002431
Backtrace:
[<bf0a219c>] (read_lcd_port+0x0/0x38 [module_complet]) from [<bf0a21f4>] (write_lcd_port+0x20/0x80 [module_complet])
r4 = 00000001
[<bf0a21d4>] (write_lcd_port+0x0/0x80 [module_complet]) from [<bf0a2288>] (wr_cmd+0x18/0x1c [module_complet])
r5 = FEFA0000 r4 = 00000000
[<bf0a2270>] (wr_cmd+0x0/0x1c [module_complet]) from [<bf0a246c>] (lcd_init+0x18/0x80 [module_complet])
[<bf0a2454>] (lcd_init+0x0/0x80 [module_complet]) from [<bf0a4058>] (mon_module_init+0x58/0xcc [module_complet])
[<bf0a4000>] (mon_module_init+0x0/0xcc [module_complet]) from [<c004cd20>] (sys_init_module+0x168/0x2c8)
r6 = C01C8990 r5 = BF0A2D80 r4 = C01C89A0
[<c004cbb8>] (sys_init_module+0x0/0x2c8) from [<c001dcc0>] (ret_fast_syscall+0x0/0x2c)
r7 = 00000080 r6 = 0000000C r5 = 00000000 r4 = 00000003
Code: e59f001c eb3e43c2 e3a0344f e59f0014 (e5d34004)
Segmentation fault
Note that this method works for internal peripherals (such as timers), so in some cases phys_to_virt does work.
I think that no page is allocated at address 0x50000000. How can I allocate a page at this specific address?
I found functions like kmap, but they seem very complicated and I don't know how to use them.
The best way to access memory-mapped peripherals is with the kernel's ioremap and friends.
First, declare that you want to use a specific region of memory for your peripheral:
struct resource *res = request_mem_region(0x50000000, region_size, "at91");
When you unload your driver, you will want to free that memory region.
release_mem_region(0x50000000, region_size);
Now, you can remap the I/O region before use.
void __iomem *ptr = ioremap(0x50000000, region_size);
If you want to prevent caching of these registers, use ioremap_nocache instead. You can also only remap a subregion of your device's memory space if you're only using that part.
Now that you have the iomapped region, you can do I/O on that memory.
iowrite8(value, ptr + reg);
unsigned int val = ioread8(ptr + reg);
Once you're done reading from and writing to that region of memory, you can unmap it.
iounmap(ptr);
I hope this helps. I would recommend reading (or at least using as a reference) Linux Device Drivers, 3rd Edition, which can be read online for free.
