Why does the Linux kernel interfere with the execution of a RISC-V custom0 instruction on Zedboard?

dummy_rocc is a simple built-in RoCC accelerator example in the RISC-V tools that defines several custom0 instructions. After setting up dummy_rocc (on either the Spike ISA simulator or Rocket-FPGA; the procedure differs), we use dummy_rocc_test, a user-program test case, to verify that the accelerator works correctly. dummy_rocc_test can be run in two ways: on pk (the proxy kernel) or on Linux.
When I set up dummy_rocc on the Spike ISA simulator, dummy_rocc_test worked well on both pk and Linux.
Now I have replaced Spike with Rocket-FPGA on a Zedboard. Execution on pk succeeds:
root@zynq:~# ./fesvr-zynq pk /nfs/copy_to_rootfs/work/dummy_rocc_test
begin
after asm code
load x into accumulator 2 (funct=0)
read it back into z (funct=1) to verify it
accumulate 456 into it (funct=3)
verify it
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
success!
the execution on Linux fails:
./fesvr-zynq +disk=/nfs/root.bin bbl /nfs/fpga-zynq/zedboard/fpga-images-zedboard/riscv/vmlinux
..................................Booting RISC-V Linux.........................................
/ # ./work/dummy_rocc_test
begin
after asm code
[ 0.400000] dummy_rocc_test[23]: unhandled signal 4 code 0x30001 at 0x0000000000800500 in ]
[ 0.400000] CPU: 0 PID: 23 Comm: dummy_rocc_test Not tainted 3.14.33-g043bb5d #1
[ 0.400000] task: ffffffff8fa3f500 ti: ffffffff8fb76000 task.ti: ffffffff8fb76000
[ 0.400000] sepc: 0000000000800500 ra : 00000000008004fc sp : 0000003fff943c70
[ 0.400000] gp : 0000000000882198 tp : 0000000000884700 t0 : 0000000000000000
[ 0.400000] t1 : 000000000080adc8 t2 : 8101010101010100 s0 : 0000003fff943ca0
[ 0.400000] s1 : 0000000000800d5c a0 : 000000000000000f a1 : 0000002000002000
[ 0.400000] a2 : 000000000000000f a3 : 000000000085cee8 a4 : 0000000000000001
[ 0.400000] a5 : 000000000000007b a6 : 0000000000000008 a7 : 0000000000000040
[ 0.400000] s2 : 0000000000000000 s3 : 00000000008a2668 s4 : 00000000008d8d98
[ 0.400000] s5 : 00000000008d7770 s6 : 0000000000000008 s7 : 00000000008d6000
[ 0.400000] s8 : 00000000008d8d60 s9 : 0000000000000000 s10: 00000000008a32b8
[ 0.400000] s11: ffffffffffffffff t3 : 000000000000000b t4 : 000000006ffffdff
[ 0.400000] t5 : 000000000000000a t6 : 000000006ffffeff
[ 0.400000] sstatus: 8000000000003008 sbadaddr: 0000000000800500 scause: 0000000000000002
Illegal instruction
A screenshot shows that "signal 4" is caused by a custom0 instruction.
[Screenshot: readelf output of dummy_rocc_test]
So my question is: why does the Linux kernel interfere with the execution of the RISC-V custom0 instruction on the Zedboard?
The source code of dummy_rocc_test is provided as reference:
// The following is a RISC-V program to test the functionality of the
// dummy RoCC accelerator.
// Compile with riscv64-unknown-elf-gcc dummy_rocc_test.c
// Run with spike --extension=dummy_rocc pk a.out
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int main() {
    printf("begin\n");
    uint64_t x = 123, y = 456, z = 0;
    // load x into accumulator 2 (funct=0)
    // asm code
    asm volatile ("addi a1, a1, 2");
    /// printf again
    printf("after asm code\n");
    asm volatile ("custom0 x0, %0, 2, 0" : : "r"(x));
    printf("load x into accumulator 2 (funct=0)\n");
    // read it back into z (funct=1) to verify it
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("read it back into z (funct=1) to verify it\n");
    assert(z == x);
    // accumulate 456 into it (funct=3)
    asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
    printf("accumulate 456 into it (funct=3)\n");
    // verify it
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("verify it\n");
    assert(z == x+y);
    // do it all again, but initialize acc2 via memory this time (funct=2)
    asm volatile ("custom0 x0, %0, 2, 2" : : "r"(&x));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    assert(z == x+y);
    printf("success!\n");
}

"Illegal instruction" means your processor threw an illegal instruction exception.
Since custom0 is not something Linux knows how to emulate in software (it is customizable, after all), the kernel cannot handle the trap. It does not panic; it sends the process signal 4 (SIGILL), which produces the error you saw (note scause 2 in the register dump).
The questions I have for you are: Did you implement the custom0 instruction in the processor? Is it enabled? Did the program execute your custom0 instruction properly and return the correct answer when you used the proxy kernel?

Related

Why is the compiler adding an extra 'sxtw' instruction (resulting further in a kernel panic)?

Issue/Symptom:
After a function returns, the compiler adds an sxtw instruction, as seen in the disassembly. The returned address is therefore only 32 bits wide instead of 64 bits, which results in a kernel panic:
Unable to handle kernel paging request at virtual address xxxx
Build Environment:
Platform : ARMV7LE
gcc, linux-4.4.60
Architecture : arm64
gdb : aarch64-5.3-glibc-2.22/usr/bin/aarch64-linux-gdb
Details:
Here's the simplified project structure. It's been taken care of correctly in the corresponding makefile. Also note that file1.c and file2.c are part of same module.
../src/file1.c /* It has func1() defined as well as called */
../src/file2.c
../inc/files.h /* There's no func1() declared in the header */
Cause of the issue:
A call to func1() was added in file2.c without a declaration of func1 in files.h or file2.c. (The declaration of func1 was accidentally left out of files.h.)
The code compiled with no errors, but with a warning, as expected: implicit declaration of function func1.
At run time, though, right after returning from func1 inside file2.c, the system crashed when it tried to dereference the address returned by func1.
Further analysis showed that, after the function returns, the compiler added an sxtw instruction, as seen in the disassembly. The returned address is therefore only 32 bits wide instead of 64 bits, which results in a kernel panic:
Unable to handle kernel paging request at virtual address xxxx
Note that x19 is a 64-bit register while w0 is a 32-bit register.
Note that the least-significant word of x0 matches that of x19.
The system crashed while dereferencing x19.
sxtw x19, w0 /* This was added by compiler as extra instruction */
ldp x1, x0, [x19,#304] /* System crashed here */
Registers:
[ 91.388130] pc : [<ffffff80016c9074>] lr : [<ffffff80016c906c>] pstate: 80000145
[ 91.462090] sp : ffffff80094333b0
[ 91.552708] x29: ffffff80094333d0 x28: ffffffc06995408a
[ 91.652701] x27: ffffffc06c400a00 x26: 0000000000000000
[ 91.716243] x25: 0000000000000000 x24: ffffffc069958000
[ 91.779784] x23: ffffffc076e00000 x22: ffffffc06c400a00
[ 91.843326] x21: 0000000000000031 x20: ffffffc073060000
[ 91.906867] x19: 0000000066bfc780 x18: ffffff8009436888
[ 91.970409] x17: 0000000000000000 x16: ffffff8008193074
[ 92.033952] x15: 00000000000a8c06 x14: 2c30323030387830
[ 92.097492] x13: 3d7367616c66202c x12: 3038653030303030
[ 92.161034] x11: 3038666666666666 x10: 78303d646e65202c
[ 92.224576] x9 : 3063303030303030 x8 : 3030303030303030
[ 92.288117] x7 : 0000000000000880 x6 : 0000000000000000
[ 92.351659] x5 : ffffffc07fd10ad8 x4 : 0000000000000001
[ 92.415202] x3 : 0000000000000007 x2 : cb88537fdc8ba63c
[ 92.478743] x1 : 0000000000000000 x0 : ffffffc066bfc780
After adding the declaration of func1 to files.h, the extra instruction, and hence the crash, no longer appeared.
Can someone please explain why the compiler added sxtw in this case?
You should have received at least two warnings: one about the missing function declaration and another about the implicit conversion from int to a pointer type.
The reason is that implicitly declared functions have a return type of int. Converting this int value to a 64-bit pointer throws away the upper 32 bits of the result. This is the expected GNU C behavior, based on what C compilers for early 64-bit targets did. The sxtw instruction is required to implement this behavior. (Current C standards no longer have implicit function declarations, but GCC still has to support them for backwards compatibility with existing autoconf tests.)
Note that your platform is obviously AArch64 (with 64-bit registers), not 32-bit ARMv7.

Unable to get the RoCC accelerator built with the default Accumulator example for zed board

I tried to build the RoCC accelerator with the default accumulator example for the Zedboard, but I get an "illegal instruction" error.
I tried the following config in the configs.scala file:
class WithAccumRocc extends Config(
  (pname,site,here) => pname match {
    case RoccNMemChannels => 1
    case RoccMaxTaggedMemXacts => 0
    case BuildRoCC => {
      Some((p: Parameters) =>
        Module(new AccumulatorExample()(p.alterPartial({ case CoreName => "AccumRocc" }))))
    }
  }
)
class WithRoCCConfig extends Config(new WithAccumRocc ++ new DefaultFPGAConfig)
The bitstream was generated successfully, but when I ran the dummy_rocc_test binary built from the example given in riscv-isa-sim, I got the following error on the Zedboard:
root@zynq:~# ./fesvr-zynq pk /sdcard/Custom\ elfs/dummy_rocc
z 0000000000000000 ra 0000000000010044 sp 000000000feffb10 gp 0000000000017880
tp 0000000000000000 t0 0000000000017178 t1 0000000000017178 t2 0000000000000000
s0 000000000feffb40 s1 0000000000000000 a0 0000000000000001 a1 000000000feffb48
a2 0000000000000000 a3 0000000000000000 a4 0000000000000000 a5 000000000000007b
a6 0000000000000000 a7 0000000000000001 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010168 va 0000000000010168 insn 0027e00b sr 8000000000003008
An illegal instruction was executed!
Any help here would be greatly appreciated.
P.S.: The dummy_rocc_test example works fine with Spike and was compiled with riscv64-unknown-elf-gcc.
This issue has been resolved. Once the configuration change is made, Rocket needs to be rebuilt with the new FPGA configuration, and the Vivado project must also be updated accordingly in order to generate the new bitstream.

Segmentation fault when running binaries compiled using riscv64-unknown-linux-gnu-gcc in spike

#include <stdio.h>
int main()
{
    int src = 5;
    int dst = 0;
    asm ("mv %0,%1":"=X"(dst):"r"(src));
    asm("mv a0,a1");
    printf(" %d\n", dst);
    return 0;
}
prashantravi@ubuntu:~/rocket-chip$ riscv64-unknown-linux-gnu-gcc -o asm_test asm_test.c
prashantravi@ubuntu:~/rocket-chip$ spike riscv/bin/pk asm_test
z 0000000000000000 ra 0000000000000000 sp 00000000fefffb50 gp 0000000000801fb8
tp 0000000000000000 t0 0000000000000000 t1 0000000000000008 t2 00000000008012e0
s0 0000000000000000 s1 0000000000000000 a0 0000000000800430 a1 0000000000000001
a2 00000000fefffb58 a3 0000000000800484 a4 0000000000800514 a5 0000000000000000
a6 00000000fefffb50 a7 0000000000000000 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 ffffffffffffffff t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc fffffffffffffffe va fffffffffffffffe insn ffffffff sr 8000000000003008
User fetch segfault @ 0xfffffffffffffffe
I get the above error when I run programs compiled with riscv64-unknown-linux-gnu-gcc in Spike.
The same code executes perfectly when compiled with riscv64-unknown-elf-gcc.
You can't run dynamically linked programs on the proxy-kernel.
You must statically link your programs if you are going to run them on the proxy-kernel. This is performed by default using the riscv64-unknown-elf-gcc compiler. If you are going to use the riscv64-unknown-linux-gnu-gcc compiler, you must either pass -static or you must run it on the Linux kernel.
$ riscv64-unknown-elf-gcc -o asm_test asm_test.c [or...]
$ riscv64-unknown-linux-gnu-gcc -static -o asm_test asm_test.c
$ spike pk asm_test
In more detail, how I debugged this before I remembered the above limitation:
By running $ spike -d pk asm_test 2> output.txt, we can see the trace of the program:
<snippet>
374618 : core 0: 0x0000000000800320 (0x00002197) auipc gp, 0x2
374619 : core 0: 0x0000000000800324 (0xc9818193) addi gp, gp, -872
374620 : core 0: 0x0000000000800328 (0x00050793) mv a5, a0
374621 : core 0: 0x000000000080032c (0x00000517) auipc a0, 0x0
374622 : core 0: 0x0000000000800330 (0x10450513) addi a0, a0, 260
374623 : core 0: 0x0000000000800334 (0x00013583) ld a1, 0(sp)
374624 : core 0: 0x0000000000800338 (0x00810613) addi a2, sp, 8
374625 : core 0: 0x000000000080033c (0xff017113) andi sp, sp, -16
374626 : core 0: 0x0000000000800340 (0x00000697) auipc a3, 0x0
374627 : core 0: 0x0000000000800344 (0x14468693) addi a3, a3, 324
374628 : core 0: 0x0000000000800348 (0x00000717) auipc a4, 0x0
374629 : core 0: 0x000000000080034c (0x1cc70713) addi a4, a4, 460
374630 : core 0: 0x0000000000800350 (0x00010813) mv a6, sp
374631 : core 0: 0x0000000000800354 (0xfbdff06f) j pc - 0x44
374632 : core 0: 0x0000000000800310 (0x00001e17) auipc t3, 0x1
374633 : core 0: 0x0000000000800314 (0x498e3e03) ld t3, 1176(t3)
374634 : core 0: 0x0000000000800318 (0x000e0367) jalr t1, t3, 0
374635 : core 0: 0x00000000008002e0 (0x00001397) auipc t2, 0x1
374636 : core 0: 0x00000000008002e4 (0x41c30333) sub t1, t1, t3
374637 : core 0: 0x00000000008002e8 (0x4b03be03) ld t3, 1200(t2)
374638 : core 0: 0x00000000008002ec (0xfd430313) addi t1, t1, -44
374639 : core 0: 0x00000000008002f0 (0x4b038293) addi t0, t2, 1200
374640 : core 0: 0x00000000008002f4 (0x00135313) srli t1, t1, 1
374641 : core 0: 0x00000000008002f8 (0x0082b283) ld t0, 8(t0)
374642 : core 0: 0x00000000008002fc (0x000e0067) jr t3
374643 : core 0: exception trap_instruction_access_fault, epc 0xfffffffffffffffe
374644 : core 0: 0x0000000000000100 (0x34011173) csrrw sp, mscratch, sp
374645 : core 0: 0x0000000000000104 (0x04a13823) sd a0, 80(sp)
374646 : core 0: 0x0000000000000108 (0x04b13c23) sd a1, 88(sp)
If you objdump asm_test, you'll see that it's in _start, then __libc_start_main, then __libc_start_main@plt (0x800310), and then _PROCEDURE_LINKAGE_TABLE_ (0x8002e0).
From there, it attempts a jr, which jumps to 0xfffffffffffffffe, which is a misaligned fetch address. Hence the crash.

Powerpc gnu eabi register initialization

I have created a minimal bare metal application that I am compiling with the codesourcery gnu powerpc eabi lite toolchain and loading on a powerPC target with a USB JTAG TAP. The application is created from an assembly file used to configure hardware and set up registers for the eabi, a main.c file that contains an infinite loop, a linker script, and a Makefile.
I have found a good deal of documentation on startup code for the powerpc, register initialization for the eabi, and the gnu linker for creating the linker script and have tried to follow it closely.
I have the application compiling and running until it reaches main. The problem I am having is when the assembly routines complete and an rfi is executed I see the PC transition to main() as expected. However, executing the first instruction in main (lis r9, 0) results in an exception, 0x700 (invalid instruction or fp exception).
The assembly routine initially contained code to invalidate and disable the L1 data and instruction caches and then enable only the L1 instruction cache. Suspecting that some of that code was incorrect, I removed much of it and kept only the bare minimum.
Could it be that I am missing a C runtime initialization step? Any other ideas? Thanks for your help in advance.
The assembly code now only consists of the following:
.text
.global resetHandler
.global _start
.global __eabi
.space(0x0100) /* locate start at hreset vector */
_start:
b resetHandler
.space(0x3000) /* locate the remainder past the exception vector space */
resetHandler:
xor r3, r3, r3
/* set SRR0 to main */
addis r3,r0,main@h
ori r3,r3,main@l
mtspr srr0,r3
/* save machine state register to srr1 */
mfmsr r0
mtspr srr1, r0
xor r1, r1, r1
lis r1, _stack_start@h
addi r1, r1, _stack_start@l
bl __eabi
/* place the address of done in the link register */
xor r3, r3, r3
addis r3, 0, done@h
ori r3, r3, done@l
mtlr r3
rfi
__eabi:
addis r13,r0,_SDA_BASE_@h
ori r13,r13,_SDA_BASE_@l
addis r2,r0,_SDA2_BASE_@h
ori r2,r2,_SDA2_BASE_@l
blr
done:
b .
The linker script follows:
OUTPUT_ARCH(powerpc)
ENTRY(resetHandler)
SEARCH_DIR(.)
MEMORY
{
ram (rwx) : ORIGIN = 0x000000, LENGTH = 1M
}
_STACK_SIZE = 8k;
_HEAP_SIZE = 32k;
SECTIONS
{
.text :
{
*(.text*)
*(.rodata*)
} >ram
.data : ALIGN (8)
{
_final_data_start = .;
*(.data)
_final_data_end = .;
} >ram
.sdata : { *(.sdata) } >ram
.sbss : { *(.sbss) } >ram
.sdata2 : { *(.sdata2) } >ram
.sbss2 : { *(.sbss2) } >ram
.bss : ALIGN (8)
{
_bss = .;
*(.bss*)
. = ALIGN (8);
_ebss = .;
} >ram
.stack :
{
_stack_end = .;
. = . + _STACK_SIZE;
. = ALIGN(16);
__stack = .;
}
.heap :
{
__heap = .;
. = . + _HEAP_SIZE;
. = ALIGN(16);
_heap_end = .;
}
_stack_start = __stack;
_heap_start = __heap;
_SDA2_BASE_ = ADDR(.sdata2);
_SDA_BASE_ = ADDR(.sdata);
}

Accessing external bus in kernel space on an ARM based board

I'm trying to write an LCD display driver on an ARM based board.
The LCD controller is plugged on the external memory bus.
So I am trying to convert the physical addresses of the controller's registers to virtual ones.
I use the following code to do that:
#define AT91_VA_BASE_CS2 phys_to_virt(0x50000000)
static inline unsigned char at91_CS2_read(unsigned int reg)
{
    void __iomem *CS2_base = (void __iomem *)AT91_VA_BASE_CS2;
    return __raw_readb(CS2_base + reg);
}
static inline void at91_CS2_write(unsigned int reg, unsigned char value)
{
    void __iomem *CS2_base = (void __iomem *)AT91_VA_BASE_CS2;
    __raw_writeb(value, CS2_base + reg);
}
void write_lcd_port (int mode, unsigned char cmd_dat)
{
    while ((read_lcd_port() & 0x03) != 0x03) {
        /* wait while LCD is busy!!! */
    } /* endwhile */
    /* Send Command */
    if (mode == 1)
    {
        at91_CS2_write(4, cmd_dat);
    }
    /* Send Data */
    if (mode == 0)
    {
        at91_CS2_write(0, cmd_dat);
    }
}
I get the following message :
Unable to handle kernel paging request at virtual address 4f000004
pgd = c39bc000
[4f000004] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in: module_complet dm9000 at91_wdt vfat fat jffs2 nls_iso8859_1 nls_cp437 nls_base usb_storage sd_mod sg scsie
CPU: 0
PC is at read_lcd_port+0x1c/0x38 [module_complet]
LR is at 0x1
pc : [<bf0a21b8>] lr : [<00000001>] Tainted: P
sp : c380bf1c ip : 60000093 fp : c380bf2c
r10: 0003a804 r9 : c380a000 r8 : c001de64
r7 : 00000000 r6 : fefff000 r5 : 0000009c r4 : 00000001
r3 : 4f000000 r2 : 00000000 r1 : 00001438 r0 : bf0a25cc
Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user
Control: C000717F Table: 239BC000 DAC: 00000015
Process insmod (pid: 903, stack limit = 0xc380a198)
Stack: (0xc380bf1c to 0xc380c000)
bf00: 00000001
bf20: c380bf44 c380bf30 bf0a21f4 bf0a21ac 00000000 fefa0000 c380bf54 c380bf48
bf40: bf0a2288 bf0a21e4 c380bf64 c380bf58 bf0a246c bf0a2280 c380bf84 c380bf68
bf60: bf0a4058 bf0a2464 40017000 c01c89a0 bf0a2d80 c01c8990 c380bfa4 c380bf88
bf80: c004cd20 bf0a4010 00000003 00000000 0000000c 00000080 00000000 c380bfa8
bfa0: c001dcc0 c004cbc8 00000000 0000000c 00900080 40017000 0000162e 00041050
bfc0: 00000003 00000000 0000000c bea0fde4 bea0fec4 00000002 0003a804 00000000
bfe0: bea0fd10 bea0fd04 0001b290 400d1d20 60000010 00900080 20002031 20002431
Backtrace:
[<bf0a219c>] (read_lcd_port+0x0/0x38 [module_complet]) from [<bf0a21f4>] (write_lcd_port+0x20/0x80 [module_complet])
r4 = 00000001
[<bf0a21d4>] (write_lcd_port+0x0/0x80 [module_complet]) from [<bf0a2288>] (wr_cmd+0x18/0x1c [module_complet])
r5 = FEFA0000 r4 = 00000000
[<bf0a2270>] (wr_cmd+0x0/0x1c [module_complet]) from [<bf0a246c>] (lcd_init+0x18/0x80 [module_complet])
[<bf0a2454>] (lcd_init+0x0/0x80 [module_complet]) from [<bf0a4058>] (mon_module_init+0x58/0xcc [module_complet])
[<bf0a4000>] (mon_module_init+0x0/0xcc [module_complet]) from [<c004cd20>] (sys_init_module+0x168/0x2c8)
r6 = C01C8990 r5 = BF0A2D80 r4 = C01C89A0
[<c004cbb8>] (sys_init_module+0x0/0x2c8) from [<c001dcc0>] (ret_fast_syscall+0x0/0x2c)
r7 = 00000080 r6 = 0000000C r5 = 00000000 r4 = 00000003
Code: e59f001c eb3e43c2 e3a0344f e59f0014 (e5d34004)
Segmentation fault
Note that this method works for internal peripherals (such as timers), so in some cases phys_to_virt works.
I think that no page is mapped at the address 0x50000000. How can I map a page at this specific address?
I found functions like kmap, but they seem very complicated and I don't know how to use them.
The best way to access memory-mapped peripherals is with the kernel's ioremap and friends.
First, declare that you want to use a specific region of memory for your peripheral:
struct resource *res = request_mem_region(0x50000000, region_size, "at91");
When you unload your driver, you will want to free that memory region.
release_mem_region(0x50000000, region_size);
Now, you can remap the I/O region before use.
void __iomem *ptr = ioremap(0x50000000, region_size);
If you want to prevent caching of these registers, use ioremap_nocache instead. You can also remap only a subregion of your device's memory space if you're only using that part.
Now that you have the iomapped region, you can do I/O on that memory.
iowrite8(value, ptr + reg);
unsigned int val = ioread8(ptr + reg);
Once you're done reading from and writing to that region of memory, you can unmap it.
iounmap(ptr);
I hope this helps. I would recommend reading (or at least using as a reference) Linux Device Drivers, 3rd Edition, which can be read online for free.
