Unable to get the RoCC accelerator built with the default Accumulator example for zed board - riscv

Tried building the RoCC accelerator default accumulator example for zed board, but getting an error of "illegal instruction"
I tried the below config in the configs.scala file:-
class WithAccumRocc extends Config(
(pname,site,here) => pname match {
case RoccNMemChannels => 1
case RoccMaxTaggedMemXacts => 0
case BuildRoCC => {
Some((p: Parameters) =>
Module(new AccumulatorExample()(p.alterPartial({ case CoreName => "AccumRocc" }))))
}
}
)
class WithRoCCConfig extends Config(new WithAccumRocc ++ new DefaultFPGAConfig)
The bitstream was generated successfully but when i ran the dummy_rocc_test binary generated from the example given in riscv-isa-sim, i got the following error on the zed board.
root#zynq:~# ./fesvr-zynq pk /sdcard/Custom\ elfs/dummy_rocc
z 0000000000000000 ra 0000000000010044 sp 000000000feffb10 gp 0000000000017880
tp 0000000000000000 t0 0000000000017178 t1 0000000000017178 t2 0000000000000000
s0 000000000feffb40 s1 0000000000000000 a0 0000000000000001 a1 000000000feffb48
a2 0000000000000000 a3 0000000000000000 a4 0000000000000000 a5 000000000000007b
a6 0000000000000000 a7 0000000000000001 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010168 va 0000000000010168 insn 0027e00b sr 8000000000003008
An illegal instruction was executed!
Any help here would be greatly appreciated.
P.S. :- The dummy_rocc_test example is working fine with spike and has been compiled with riscv64-unknown-elf-gcc

Hello guys this issue has been resolved, once the change was made the rocket needs to be built with the new pga configuration and the vivid project must also be updated accordingly in order to generate the new bitstream.

Related

how to make use the .rela.dyn to trigger elf loader for aarch64?

I have implemented packer of x86_64 shared library.
Briefly,
a simple xor-loader is injected to a shared library (by creating a new section)
the rela.dyn act as an entrypoint for the shared library
the rela.dyn entry is patched such that it points to the address of the loader.
once the shared library is called, the xor-loader is triggered and decrypt the .text section.
The mechanism works fine for the x86_64 shared library.
The rela.dyn tricks is borrowed from
https://github.com/0xN3utr0n/Noteme/blob/master/injection.c
However, this mechanism failed on aarch64.
I found that
if I xor-ed the .text section, the xor-loader is bypassed
I got Illegal instruction (core dumped), since the .text section has not been decrypted by the xor-loader. (confirmed by inspecting the .text section by gdb)
I have the loader objdump-ed, the loader is intact.
if I don't xor the .text section, the xor-loader is called and work normally. (But the decryption is wrong, since the .text section has not been xor-ed beforehand).
Question:
What could have possible went wrong in aarch64 ?
The result of readelf for aarch64 shared library is provided below.
libtest.so is the library before packing.
While libtest_packed.so is the library after packing.
Here is the result of readelf --relocs libtest.so
Relocation section '.rela.dyn' at offset 0x550 contains 7 entries:
000000010df0 000000000403 R_AARCH64_RELATIV 780
000000010df8 000000000403 R_AARCH64_RELATIV 738
000000011018 000000000403 R_AARCH64_RELATIV 11018
000000010fc8 000300000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_deregisterTMClone + 0
000000010fd0 000400000401 R_AARCH64_GLOB_DA 0000000000000000 __cxa_finalize#GLIBC_2.17 + 0
000000010fd8 000500000401 R_AARCH64_GLOB_DA 0000000000000000 __gmon_start__ + 0
000000010fe0 000700000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_registerTMCloneTa + 0
the functions corresponding to the first 3 entries are:
0000000000000780 t frame_dummy
0000000000000738 t __do_global_dtors_aux
000000000011018 d __dso_handle
here is the result of readelf --relocs libtest_packed.so
Relocation section '.rela.dyn' at offset 0x550 contains 7 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000010df0 000000000403 R_AARCH64_RELATIV 11028
000000010df8 000000000403 R_AARCH64_RELATIV 738
000000011018 000000000403 R_AARCH64_RELATIV 11018
000000010fc8 000300000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_deregisterTMClone + 0
000000010fd0 000400000401 R_AARCH64_GLOB_DA 0000000000000000 __cxa_finalize#GLIBC_2.17 + 0
000000010fd8 000500000401 R_AARCH64_GLOB_DA 0000000000000000 __gmon_start__ + 0
000000010fe0 000700000401 R_AARCH64_GLOB_DA 0000000000000000 _ITM_registerTMCloneTa + 0
As you can see, the first entry is overwritten by the address of the loader.

usb_control_msg returns -EAGAIN

I have a Customized board which interfaces over USB.. I am writing USB Linux driver.
Everything is working fine when I test it on my Virtual Machine.. But when I switch to the real hardware and use Yocto on hardware and run the following code.. It fails with -EAGAIN..
retval = usb_control_msg(serial->dev,
usb_rcvctrlpipe(serial->dev, 0),
CP210X_GET_MDMSTS,
USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
0,
i,
(void *)&modem_status,
1,
USB_CTRL_SET_TIMEOUT);
retval returns with -11. I am not sure why this is not happening on the Virtual Machine, as the only difference is that the PIC Board gets connected to X86 Customized board and runs linux..
dmesg output:
transfer buffer not dma capable
------------[ cut here ]------------
WARNING: CPU: 1 PID: 754 at /usr/src/kernel/drivers/usb/core/hcd.c:1595 usb_hcd_map_urb_for_dma+0x3e6/0x5b0
Modules linked in: cp2108(O)
CPU: 1 PID: 754 Comm: test_quad Tainted: G O 4.14.68-intel-pk-standard #1
task: ffff9a33b7d4a4c0 task.stack: ffff9ce5c0130000
RIP: 0010:usb_hcd_map_urb_for_dma+0x3e6/0x5b0
RSP: 0018:ffff9ce5c0133b08 EFLAGS: 00010296
RAX: 000000000000001f RBX: ffff9a33b7d89780 RCX: 0000000000000000
RDX: ffff9a33bfc9d680 RSI: ffff9a33bfc95598 RDI: ffff9a33bfc95598
RBP: ffff9ce5c0133b28 R08: 0000000000000001 R09: 0000000000000328
R10: ffff9a33ba840068 R11: 0000000000000000 R12: ffff9a33ba2ea000
R13: 00000000fffffff5 R14: 0000000001400000 R15: 0000000000000200
FS: 00007fac9eeed4c0(0000) GS:ffff9a33bfc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003c334e3cb0 CR3: 0000000179498000 CR4: 00000000003406e0
Call Trace:
usb_hcd_submit_urb+0x420/0xa00
? tty_port_open+0xa7/0xd0
? tty_ldisc_unlock+0x1a/0x20
? tty_open_proc_set_tty+0x7f/0xb0
? tty_unlock+0x29/0x40
? tty_open+0x38e/0x450
usb_submit_urb+0x364/0x550
usb_start_wait_urb+0x5f/0xe0
usb_control_msg+0xc5/0x110
cp210x_ioctl+0x4d2/0xe20 [cp2108]
? filemap_map_pages+0x129/0x290
? do_filp_open+0xa0/0xf0
serial_ioctl+0x46/0x50
tty_ioctl+0xe7/0x870
do_vfs_ioctl+0x99/0x5e0
? putname+0x4c/0x60
SyS_ioctl+0x79/0x90
Can you guys please provide a hint for me to try.. Appreciate your time and efforts.
I got the solution Kernel >= 4.9 no longer accepts any statically allocated buffer.
Modified the code to use dynamic memory and it worked.
modem_status = kmalloc(sizeof(unsigned long), GFP_KERNEL);
if (!modem_status)
return -ENOMEM;
for (i = 0; i < MAX_CP210x_INTERFACE_NUM; i++) {
retval = usb_control_msg(serial->dev,
usb_rcvctrlpipe(serial->dev, 0),
CP210X_GET_MDMSTS,
USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
0,
i,
(void *)modem_status,
1,
USB_CTRL_SET_TIMEOUT);

How to send a UDP packet from inside linux kernel

I'm modifying the UDP protocol such that when connect() is called on a UDP socket, in addition to finding the route, a "Hello" packet is also sent to the destination.
From the UDP proto structure, I figured out that the function ip4_datagram_connect does the job of finding the route to the destination. Now at the end of this function, I need to send the Hello packet.
I don't think I can use udp_sendmsg() as it's used for copying and sending data from the userspace.
I think udp_send_skb() should be used to sent the hello. My problem is I don't know how to create an appropriate skbuff to store the Hello message (it should be a proper udp datagram) to be passed on to udp_send_skb(). I've tried this
int quic_connect(struct sock *sk, struct flowi4 *fl4, struct rtable *rt){
struct sk_buff *skb;
char *hello;
int err = 0, exthdrlen, hh_len, datalen, trailerlen;
char *data;
hh_len = LL_RESERVED_SPACE(rt->dst.dev);
exthdrlen = rt->dst.header_len;
trailerlen = rt->dst.trailer_len;
datalen = 200;
//Create a buffer to be send without fragmentation
skb = sock_alloc_send_skb(sk,
exthdrlen + datalen + hh_len + trailerlen + 15,
MSG_DONTWAIT, &err);
if (skb == NULL)
goto out;
skb->ip_summed = CHECKSUM_PARTIAL; // Use hardware checksum
skb->csum = 0;
skb_reserve(skb, hh_len);
skb_shinfo(skb)->tx_flags = 1; //Time stamp the packet
/*
* Find where to start putting bytes.
*/
data = skb_put(skb, datalen + exthdrlen);
skb_set_network_header(skb, exthdrlen);
skb->transport_header = (skb->network_header +
sizeof(struct iphdr));
err = udp_send_skb(skb, fl4);
However, this gives me errors in the kernel log
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81686555>] __ip_local_out+0x45/0x80
PGD 4f4dd067 PUD 4f4df067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 3019 Comm: client Not tainted 3.13.11-ckt39-test006 #28
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff8800598df6b0 ti: ffff880047022000 task.ti: ffff880047022000
RIP: 0010:[<ffffffff81686555>] [<ffffffff81686555>] __ip_local_out+0x45/0x80
RSP: 0018:ffff880047023d78 EFLAGS: 00010287
RAX: 0000000000000001 RBX: ffff880047008a00 RCX: 0000000020000000
RDX: 0000000000000000 RSI: ffff880047008a00 RDI: ffff8800666fde40
RBP: ffff880047023d88 R08: 0000000000003200 R09: 0000000000000001
R10: 0000000000000000 R11: 00000000000001f9 R12: ffff880047008a00
R13: ffff8800666fde80 R14: ffff880059aec380 R15: ffff880059aec690
FS: 00007f5508b04740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 000000004f561000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff880047023d80 ffff880047008a00 ffff880047023da0 ffffffff8168659d
ffffffff81c8f8c0 ffff880047023db8 ffffffff81687810 0000000000000000
ffff880047023df8 ffffffff816ac6be 0000000000000020 ffff880047008a00
Call Trace:
[<ffffffff8168659d>] ip_local_out+0xd/0x30
[<ffffffff81687810>] ip_send_skb+0x10/0x40
[<ffffffff816ac6be>] udp_send_skb+0x14e/0x3d0
[<ffffffff816b0e9e>] quic_connect+0x6e/0x80
[<ffffffff816aa3ff>] __ip4_datagram_connect+0x2bf/0x2d0
[<ffffffff816aa437>] ip4_datagram_connect+0x27/0x40
[<ffffffff816b8748>] inet_dgram_connect+0x38/0x80
[<ffffffff8161fd97>] SYSC_connect+0xc7/0x100
[<ffffffff817ed471>] ? __schedule+0x341/0x8c0
[<ffffffff816206e9>] SyS_connect+0x9/0x10
[<ffffffff817f8d42>] system_call_fastpath+0x16/0x1b
Code: c8 00 00 00 66 c1 c0 08 66 89 47 02 e8 d5 e0 ff ff 48 8b 53 58 b8 01 00 00 00 48 83 e2 fe 48 81 3d 9d 0e 64 00 f0 73 cc 81 74 26 <4c> 8b 42 18 49 c7 c1 f0 45 68 81 c7 04 24 00 00 00 80 31 c9 48
RIP [<ffffffff81686555>] __ip_local_out+0x45/0x80
RSP <ffff880047023d78>
CR2: 0000000000000018
---[ end trace 474c5db1b9b19a03 ]---
So my question is, what else do I need to fill in my skbuff before it can be handled properly by udp_send_skb. Or am I missing something else here?
There is a bug in your code.
if (skb_tailroom(hbuff) > 30) {
printk(" Enough room for QUIC connect message\n");
hello = kmalloc(30, GFP_ATOMIC); //You allocate slub memory
hello = "Hello from QUIC connect"; //You let 'hello' point to a string,
//which is stored somewhere else.
//At this point, your slub memory
//allocated is lost.
memcpy(__skb_put(hbuff, 30), hello, 30);
kfree(hello); //You try to free the memory pointed by
//hello as slub memory, I think this is
// why you get mm/slub.c bug message.
} else
You can change your code like this:
if (skb_tailroom(hbuff) > 30) {
printk(" Enough room for QUIC connect message\n");
memcpy(__skb_put(hbuff, 30), "Hello from QUIC connect", 30);
} else
I just used the functions ip_make_skb followed by ip_send_skb. Since ip_make_skb is used to copy data from user space and I didn't need to do that, I just used a dummy pointer and provided the length to be copied as Zero (+sizeof udphdr, as required by this function). It's a dirty way to do it, but it works for now.
My code:
int quic_connect(struct sock *sk, struct flowi4 *fl4, struct rtable *rt, int oif){
struct sk_buff *skb;
struct ipcm_cookie ipc;
void *hello = "Hello";
int err = 0;
ipc.opt = NULL;
ipc.tx_flags = 1;
ipc.ttl = 0;
ipc.tos = -1;
ipc.oif = oif;
ipc.addr = fl4->daddr;
skb = ip_make_skb(sk, fl4, ip_generic_getfrag, hello, sizeof(struct udphdr),
sizeof(struct udphdr), &ipc, &rt,
0);
err = PTR_ERR(skb);
if (!IS_ERR_OR_NULL(skb)){
err = udp_send_skb(skb, fl4);
}
return err;
}

Segmentation fault when running binaries compiled using riscv64-unknown-linux-gnu-gcc in spike

#include<stdio.h>
int main()
{
int src = 5;
int dst = 0;
asm ("mv %0,%1":"=X"(dst):"r"(src));
asm("mv a0,a1");
printf(" %d\n", dst);
return 0;
}
prashantravi#ubuntu:~/rocket-chip$ riscv64-unknown-linux-gnu-gcc -o asm_test asm_test.c
prashantravi#ubuntu:~/rocket-chip$ spike riscv/bin/pk asm_test
z 0000000000000000 ra 0000000000000000 sp 00000000fefffb50 gp 0000000000801fb8
tp 0000000000000000 t0 0000000000000000 t1 0000000000000008 t2 00000000008012e0
s0 0000000000000000 s1 0000000000000000 a0 0000000000800430 a1 0000000000000001
a2 00000000fefffb58 a3 0000000000800484 a4 0000000000800514 a5 0000000000000000
a6 00000000fefffb50 a7 0000000000000000 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 ffffffffffffffff t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc fffffffffffffffe va fffffffffffffffe insn ffffffff sr 8000000000003008
User fetch segfault # 0xfffffffffffffffe
I am getting the above error when i am compiling programs using riscv64-unknown-linux-gnu-gcc in spike.
The same code executes perfectly when run with riscv64-unknown-elf gcc
You can't run dynamically linked programs on the proxy-kernel.
You must statically link your programs if you are going to run them on the proxy-kernel. This is performed by default using the riscv64-unknown-elf-gcc compiler. If you are going to use the riscv64-unknown-linux-gnu-gcc compiler, you must either pass -static or you must run it on the Linux kernel.
$ riscv64-unknown-elf-gcc -o asm_test asm_test.c [or...]
$ riscv64-unknown-linux-gnu-gcc -static -o asm_test asm_test.c
$ spike pk asm_test
In more detail, how I debugged this before I remembered the above limitation:
By running $ spike -d pk asm_test 2> output.txt, we can see the trace of the program:
<snippet>
374618 : core 0: 0x0000000000800320 (0x00002197) auipc gp, 0x2
374619 : core 0: 0x0000000000800324 (0xc9818193) addi gp, gp, -872
374620 : core 0: 0x0000000000800328 (0x00050793) mv a5, a0
374621 : core 0: 0x000000000080032c (0x00000517) auipc a0, 0x0
374622 : core 0: 0x0000000000800330 (0x10450513) addi a0, a0, 260
374623 : core 0: 0x0000000000800334 (0x00013583) ld a1, 0(sp)
374624 : core 0: 0x0000000000800338 (0x00810613) addi a2, sp, 8
374625 : core 0: 0x000000000080033c (0xff017113) andi sp, sp, -16
374626 : core 0: 0x0000000000800340 (0x00000697) auipc a3, 0x0
374627 : core 0: 0x0000000000800344 (0x14468693) addi a3, a3, 324
374628 : core 0: 0x0000000000800348 (0x00000717) auipc a4, 0x0
374629 : core 0: 0x000000000080034c (0x1cc70713) addi a4, a4, 460
374630 : core 0: 0x0000000000800350 (0x00010813) mv a6, sp
374631 : core 0: 0x0000000000800354 (0xfbdff06f) j pc - 0x44
374632 : core 0: 0x0000000000800310 (0x00001e17) auipc t3, 0x1
374633 : core 0: 0x0000000000800314 (0x498e3e03) ld t3, 1176(t3)
374634 : core 0: 0x0000000000800318 (0x000e0367) jalr t1, t3, 0
374635 : core 0: 0x00000000008002e0 (0x00001397) auipc t2, 0x1
374636 : core 0: 0x00000000008002e4 (0x41c30333) sub t1, t1, t3
374637 : core 0: 0x00000000008002e8 (0x4b03be03) ld t3, 1200(t2)
374638 : core 0: 0x00000000008002ec (0xfd430313) addi t1, t1, -44
374639 : core 0: 0x00000000008002f0 (0x4b038293) addi t0, t2, 1200
374640 : core 0: 0x00000000008002f4 (0x00135313) srli t1, t1, 1
374641 : core 0: 0x00000000008002f8 (0x0082b283) ld t0, 8(t0)
374642 : core 0: 0x00000000008002fc (0x000e0067) jr t3
374643 : core 0: exception trap_instruction_access_fault, epc 0xfffffffffffffffe
374644 core 0: 0x0000000000000100 (0x34011173) csrrw sp, mscratch, sp
374645 : core 0: 0x0000000000000104 (0x04a13823) sd a0, 80(sp)
374646 : core 0: 0x0000000000000108 (0x04b13c23) sd a1, 88(sp)
If you objdump asm_test, you'll see that it's in _start, then __libc_start_main, then __libc_start_main#plt (0x800310), and then _PROCEDURE_LINKAGE_TABLE_ (0x8002e0).
From there, it attempts a jr, which jumps to 0xfffffffffffffffe, which is a misaligned fetch address. Hence the crash.

How come Linux kernel interferes the execution of RISC-V custom0 instruction on Zedboard?

dummy_rocc is a naive built-in RoCC accelerator example in RISCV tools, where several custom0 instructions are defined. After setup dummy_rocc (either on Spike ISA simulator or on Rocket-FPGA, differently), we use dummy_rocc_test -- a user program testcase to verify the correctness of the dummy_rocc accelerator. We have two ways to run dummy_rocc_test, either on pk (proxy kernel) or on Linux.
I once setup dummy_rocc on Spike ISA simulator, the dummy_rocc_test worked well either on pk or on Linux.
Now I replace Spike with Rocket-FPGA on Zedboard. While the execution on pk succeeds:
root#zynq:~# ./fesvr-zynq pk /nfs/copy_to_rootfs/work/dummy_rocc_test
begin
after asm code
load x into accumulator 2 (funct=0)
read it back into z (funct=1) to verify it
accumulate 456 into it (funct=3)
verify it
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
success!
the execution on Linux fails:
./fesvr-zynq +disk=/nfs/root.bin bbl /nfs/fpga-zynq/zedboard/fpga-images-zedboard/riscv/vmlinux
..................................Booting RISC-V Linux.........................................
/ # ./work/dummy_rocc_test
begin
after asm code
[ 0.400000] dummy_rocc_test[23]: unhandled signal 4 code 0x30001 at 0x0000000000800500 in ]
[ 0.400000] CPU: 0 PID: 23 Comm: dummy_rocc_test Not tainted 3.14.33-g043bb5d #1
[ 0.400000] task: ffffffff8fa3f500 ti: ffffffff8fb76000 task.ti: ffffffff8fb76000
[ 0.400000] sepc: 0000000000800500 ra : 00000000008004fc sp : 0000003fff943c70
[ 0.400000] gp : 0000000000882198 tp : 0000000000884700 t0 : 0000000000000000
[ 0.400000] t1 : 000000000080adc8 t2 : 8101010101010100 s0 : 0000003fff943ca0
[ 0.400000] s1 : 0000000000800d5c a0 : 000000000000000f a1 : 0000002000002000
[ 0.400000] a2 : 000000000000000f a3 : 000000000085cee8 a4 : 0000000000000001
[ 0.400000] a5 : 000000000000007b a6 : 0000000000000008 a7 : 0000000000000040
[ 0.400000] s2 : 0000000000000000 s3 : 00000000008a2668 s4 : 00000000008d8d98
[ 0.400000] s5 : 00000000008d7770 s6 : 0000000000000008 s7 : 00000000008d6000
[ 0.400000] s8 : 00000000008d8d60 s9 : 0000000000000000 s10: 00000000008a32b8
[ 0.400000] s11: ffffffffffffffff t3 : 000000000000000b t4 : 000000006ffffdff
[ 0.400000] t5 : 000000000000000a t6 : 000000006ffffeff
[ 0.400000] sstatus: 8000000000003008 sbadaddr: 0000000000800500 scause: 0000000000000002
Illegal instruction
A screenshot shows that the "signal 4" is caused by a custom0 instruction.
readelf screenshot of dummy_rocc_test
So my problem is "How come Linux kernel interferes the execution of RISC-V custom0 instruction on Zedboard? "
The source code of dummy_rocc_test is provided as reference:
// The following is a RISC-V program to test the functionality of the
// dummy RoCC accelerator.
// Compile with riscv64-unknown-elf-gcc dummy_rocc_test.c
// Run with spike --extension=dummy_rocc pk a.out
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int main() {
printf("begin\n");
uint64_t x = 123, y = 456, z = 0;
// load x into accumulator 2 (funct=0)
// asm code
asm volatile ("addi a1, a1, 2");
/// printf again
printf("after asm code\n");
asm volatile ("custom0 x0, %0, 2, 0" : : "r"(x));
printf("load x into accumulator 2 (funct=0)\n");
// read it back into z (funct=1) to verify it
asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
printf("read it back into z (funct=1) to verify it\n");
assert(z == x);
// accumulate 456 into it (funct=3)
asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
printf("accumulate 456 into it (funct=3)\n");
// verify it
asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
printf("verify it\n");
assert(z == x+y);
// do it all again, but initialize acc2 via memory this time (funct=2)
asm volatile ("custom0 x0, %0, 2, 2" : : "r"(&x));
printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
assert(z == x+y);
printf("success!\n");
}
"Illegal instruction" means your processor threw an illegal instruction exception.
Since custom0 is not going to be something Linux will know how to execute in software (since it's something that's customizable!), Linux will panic and throw the error that you saw.
The question I have for you is "Did you implement the custom0 instruction in the processor? Is it enabled? Did the program execute your custom0 instruction properly and return the correct answer when you used the proxy-kernel?"

Resources