I am following a gem5 tutorial to add a custom instruction. My question is how to interpret the operands mentioned in "const struct riscv_opcode riscv_opcodes[]" in riscv-opc.h.
For example:
{"mod", "I", "d,s,t", MATCH_MOD, MASK_MOD, match_opcode, 0 }
How is "d,s,t" interpreted here?
Can anyone explain this whole statement?
Reference link: https://nitish2112.github.io/post/adding-instruction-riscv/
According to the comment at the top of the array describing the instructions:
/* name, isa, operands, match, mask, match_func, pinfo. */
The line says that
{"mod", "I", "d,s,t",
mod belongs to the Integer ISA and that it is a triadic instruction, meaning that it takes 3 registers whose symbolic names are d,s,t.
d being the destination register, s and t being source registers.
This question is almost 4 years old, but since I spent a lot of time figuring out similar things, I would like to post my knowledge in case anyone needs it.
"mod" is the instruction label, "I" is the type of the instruction and in this case, it is an integer instruction. It takes three registers, the destination "d" register and the source registers "s" and "t".
The MASK_MOD and MATCH_MOD are used for the instruction matching in the assembler. The instruction matching is done by the function match_opcode that you are passing in the next parameter. This function performs the matching as follows:
((insn ^ MATCH_MOD) & MASK_MOD) == 0
This means that the instruction (32 bits in length) is XOR'ed with the MATCH_MOD and then the result is AND'ed with the MASK_MOD. The result must always be zero to match the "mod" instruction that you are adding. This means you have to define the instruction opcode, FUNCT7, and FUNCT3 accordingly in the riscv-opcodes/opcodes file included in RISCV-GNU-Toolchain. You should also define the MASK_MOD and MATCH_MOD in riscv-isa-sim/riscv/encoding.h.
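To make the MATCH/MASK mechanics concrete, here is a minimal, self-contained C sketch of the same check (the MATCH_MOD/MASK_MOD values below are illustrative placeholders, not necessarily the ones from the tutorial):

#include <stdint.h>
#include <stdio.h>

/* Illustrative placeholder values -- the real ones live in riscv-opc.h /
   encoding.h.  The mask selects the fixed bits of an R-type instruction
   (funct7, funct3, opcode); the match gives their required values.       */
#define MASK_MOD  0xfe00707f
#define MATCH_MOD 0x0200000b

/* The same test that match_opcode applies to each candidate opcode entry. */
static int matches_mod(uint32_t insn)
{
    return ((insn ^ MATCH_MOD) & MASK_MOD) == 0;
}

int main(void)
{
    /* Build "mod x3, x1, x2": rd=3 (bits 11:7), rs1=1 (19:15), rs2=2 (24:20)
       dropped into the register fields of the match pattern.               */
    uint32_t insn = MATCH_MOD | (3u << 7) | (1u << 15) | (2u << 20);
    printf("matches: %d\n", matches_mod(insn));   /* prints 1 */
    return 0;
}

Because the register fields are outside MASK_MOD, any choice of d, s, and t still matches; only the opcode, funct3, and funct7 bits have to agree with MATCH_MOD.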
Related
Summary: What is the definitive reference or reference implementation for the RISC-V user-level ISA?
Context: The RISC-V website has "The RISC-V Instruction Set Manual" which explains the user-level instructions very well, but does not give an exact specification for them. I am trying to build a user-level ISA simulator now and intend to write an FPGA implementation later, so the exact behavior is important to me.
A reference implementation would be sufficient, but should preferably be as simple as possible -- i.e. I would try to understand a pipelined implementation only as a last resort. What is important is to have an understanding of the specified ISA and not of a single CPU implementation or compiler implementation.
One example to show my problem is the AUIPC instruction: The prose explanation says that "AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd." I wanted to know whether this refers to the old or new PC, i.e. the position of the AUIPC instruction or the next instruction. I looked at the "RISCV Angel" implementation, but that seems to mask out the lower bits of the (old) PC -- not just of the immediate -- which I could not find any reason for in the spec, not even in the change history of the spec (since Angel is a bit older). Instead of an answer, I now have two questions about AUIPC. Many other instructions pose similar problems to me.
AFAICT the RISC-V Instruction Set Manual you cite is the closest thing there is to a definitive reference. If there are things that are unclear or incorrect in there then you could open issues at the Github site where that document is maintained: https://github.com/riscv/riscv-isa-manual
As far as AUIPC is concerned, the answer is implied, but not stated explicitly, by this sentence at the bottom of page 9 in the current manual:
There is one additional user-visible register: the program counter pc holds the address of the current instruction.
Based on that statement I would expect that the pc value that is seen and manipulated by the AUIPC instruction is the address of the AUIPC instruction itself.
This interpretation is supported by the discussion of the JALR instruction:
The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the least-significant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.
Given that the address of the following instruction is expressed as pc+4, it seems clear that the pc value visible during the execution of JALR is the address of the JALR instruction itself.
The latest draft of the manual (at https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190321-ba17106/riscv-spec.pdf) makes the situation slightly clearer. In place of this in the current manual:
AUIPC appends 12 low-order zero bits to the 20-bit U-immediate, sign-extends the result to 64 bits, then adds it to the pc and places the result in register rd.
the latest draft says:
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc of the AUIPC instruction, then places the result in register rd.
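Put as a tiny C model (a sketch for RV64 based on the wording above; the function name is made up):

#include <stdint.h>

/* auipc rd, imm20:
   rd = address_of_this_auipc + sign_extend(imm20 << 12)            */
static uint64_t auipc_result(uint64_t pc_of_auipc, int32_t imm20)
{
    /* low 12 bits are zero, result sign-extended to 64 bits */
    int64_t offset = (int64_t)(int32_t)((uint32_t)imm20 << 12);
    return pc_of_auipc + (uint64_t)offset;
}

/* e.g. auipc_result(0x1000, 1) == 0x2000 */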
I read in ARM docs that:
GE[3:0], bits[19:16]
The instructions described in Parallel addition and subtraction instructions on
page A4-171 update these flags to indicate the results from individual bytes or halfwords
of the operation. These flags can control a later SEL instruction.
So apparently GE[3:0] stands for "eq/lt/gt"?
I ran into a couple of strange issues that I don't yet have a clue about, but they all have a CPSR value of xxxf0030, so the GE bits are 0b1111. What does that stand for? Is that normal for these GE bits?
Thanks in advance!
In the ARMv7 ARM (which matches that text), the details of how the GE flags get set are only in the operation pseudocode of the parallel instructions themselves. Sadly, they seem to have removed this nice prose description which was in the ARMv6 ARM:
Instructions that operate on halfwords:
set or clear GE[3:2] together, based on the result of the top halfword calculation
set or clear GE[1:0] together, based on the result of the bottom halfword calculation.
Instructions that operate on bytes:
set or clear GE[3] according to the result of the top byte calculation
set or clear GE[2] according to the result of the second byte calculation
set or clear GE[1] according to the result of the third byte calculation
set or clear GE[0] according to the result of the bottom byte calculation.
Each bit is set (otherwise cleared) if the results of the
corresponding calculation are as follows:
for unsigned byte addition, if the result is greater than or equal to 2^8
for unsigned halfword addition, if the result is greater than or equal to 2^16
for unsigned subtraction, if the result is greater than or equal to zero
for signed arithmetic, if the result is greater than or equal to zero.
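For example, here is a rough C model of how an unsigned parallel byte add (UADD8-style) would set the four GE bits under those rules (an illustration of the quoted rules, not pseudocode from the ARM ARM):

#include <stdint.h>

/* Model of GE[3:0] for an unsigned parallel byte add: GE[i] is set when
   byte lane i of a+b produces a result greater than or equal to 2^8
   (i.e. the lane carries out).                                          */
static unsigned ge_bits_uadd8(uint32_t a, uint32_t b)
{
    unsigned ge = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xff;
        uint32_t y = (b >> (8 * lane)) & 0xff;
        if (x + y >= 0x100)
            ge |= 1u << lane;
    }
    return ge;   /* bit 0 = bottom byte ... bit 3 = top byte */
}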
As arithmetic flags, they could have any old value (undefined at reset, and can be freely written to via APSR), so until you've specifically used one of the instructions which sets them, they're pretty much meaningless and can be ignored.
I am confused about how I can break the cmd=3222823425 value into different parts to figure out what this command actually means in the Linux kernel. I know some functions are making an ioctl call with the following parameters, but I want to know what these parameter values mean.
fd=21, cmd=3222823425 and arg=3203118816
I have been looking into various forums, man pages and other links to figure out what it means when a cmd in an ioctl system call has a value of 3222823425. I have found that cmd is a command number which consists of type, number and data_type, and the first two are 8-bit integers (0-255).
So my question is how to decode these parameter values to find out what this call is trying to do?
Be careful to refer to the right documentation to understand how to decode an ioctl command. Documentation/ioctl-number.txt explains how to create a new ioctl code, while the document linked in the other answer gives an overview of the overall process before focusing on ioctl creation as well. asm/ioctl.h is a better source, because the actual ioctl masking may vary across different architectures, but an explanation of the general convention and of the bitfields' meaning and position can be found in include/asm-generic/ioctl.h and Documentation/ioctl-decoding.txt.
From the latter:
bits    meaning
31-30   00 - no parameters: uses _IO macro
        10 - read: _IOR
        01 - write: _IOW
        11 - read/write: _IOWR
29-16   size of arguments
15-8    ascii character supposedly unique to each driver
7-0     function #
According to the above, cmd=3222823425 should decode as:
3222823425 -> 0xC0186201 -> 11000000000110000110001000000001
- `direction` -> `11` -> read/write;
- `size` -> `00000000011000` -> 24 bytes (a pointer to a struct of this size should be passed as the 3rd argument of ioctl());
- `type` -> `01100010` -> 0x62, ascii for character 'b';
- `number` -> `00000001` -> driver function #1.
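If you have a Linux build environment handy, the same decode can be checked with the _IOC_* helper macros (a minimal sketch; <linux/ioctl.h> pulls in the generic bitfield definitions):

#include <stdio.h>
#include <linux/ioctl.h>

int main(void)
{
    unsigned int cmd = 3222823425u;   /* 0xC0186201 */

    printf("dir  = %u\n",   _IOC_DIR(cmd));   /* 3 = _IOC_READ|_IOC_WRITE */
    printf("size = %u\n",   _IOC_SIZE(cmd));  /* 24 bytes                 */
    printf("type = '%c'\n", _IOC_TYPE(cmd));  /* 'b'                      */
    printf("nr   = %u\n",   _IOC_NR(cmd));    /* 1                        */
    return 0;
}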
In the hope this can help.
Regards.
According to this link, ioctl command number has multiple components:
type. The magic number. This field is _IOC_TYPEBITS bits wide (usually 8).
number. The ordinal (sequential) number. It's _IOC_NRBITS bits wide. (usually 8)
direction. The direction of data transfer. The possible values are _IOC_NONE (no data transfer), _IOC_READ, _IOC_WRITE, and _IOC_READ|_IOC_WRITE (data is transferred both ways). It's usually 2 bits.
size. The size of user data involved. It's _IOC_SIZEBITS wide (14 bits).
You should consult include/asm/ioctl.h and Documentation/ioctl-number.txt for your kernel to see the actual configuration.
For your case, 3222823425 == 0xC0186201.
So, with the standard layout (direction in the top 2 bits, then the 14-bit size, the 8-bit type, and the 8-bit number), it decodes as:
type == 0x62 (the ASCII character 'b')
number == 0x01
direction == 0x3 (read/write)
size == 0x18 (24 bytes)
I am trying to understand what the following code segment from tls.h in glibc is doing and why:
/* Macros to load from and store into segment registers. */
# define TLS_GET_FS() \
({ int __seg; __asm ("movl %%fs, %0" : "=q" (__seg)); __seg; })
I think I understand the basic operation: it is moving the value stored in the fs register into __seg. However, I have some questions:
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
More importantly I think that the scope of the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful? I'm sure that the authors of glibc have a good reason for doing this but I can't figure out what it is from looking at the source code.
I tried generating assembly for this code and I got the following:
#APP
# 13 "fs-test.cpp" 1
movl %fs, %eax
# 0 "" 2
#NO_APP
So in my case it looks like eax was used for __seg. But I don't know if that is always what happens or if it was just what happened in the small test file that I compiled. If it is always going to use eax why wouldn't the assembly be written that way? If the compiler might pick other registers then how will the programmer know which one to access since __seg goes out of scope at the end of the macro? Finally I did not see this macro used anywhere when I grepped for it in the glibc source code, so that further adds to my confusion about what its purpose is. Any explanation about what the code is doing and why is appreciated.
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
Yes.
the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful?
You have to read about the GCC statement-expression extension. The value of a statement expression is the value of the last expression in it. The __seg; at the end would be useless unless one assigns the result to something else, like this:
int foo = TLS_GET_FS();
Finally I did not see this macro used anywhere when I grepped for it in the glibc source code
The TLS_{GET,SET}_FS in fact do not appear to be used. They probably were used in some version, then accidentally left over when the code referencing them was removed.
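To see why the trailing __seg; matters, here is a small self-contained sketch of the statement-expression extension (the macro below is made up for illustration; it is not the glibc code):

#include <stdio.h>

/* A GCC statement expression: the ({ ... }) block evaluates to its last
   expression, so the temporary declared inside never has to be visible
   outside the macro.                                                      */
#define SQUARE_PLUS_ONE(x) ({ int __tmp = (x); __tmp * __tmp + 1; })

int main(void)
{
    int v = SQUARE_PLUS_ONE(3);   /* v == 10; __tmp is out of scope here */
    printf("%d\n", v);
    return 0;
}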
I'm currently playing with ARM assembly on Linux as a learning exercise. I'm using 'bare' assembly, i.e. no libcrt or libgcc. Can anybody point me to information about what state the stack pointer and other registers will be in at the start of the program, before the first instruction is executed? Obviously pc/r15 points at _start, and the rest appear to be initialised to 0, with two exceptions: sp/r13 points to an address far outside my program, and r1 points to a slightly higher address.
So to some solid questions:
What is the value in r1?
Is the value in sp a legitimate stack allocated by the kernel?
If not, what is the preferred method of allocating a stack; using brk or allocate a static .bss section?
Any pointers would be appreciated.
Since this is Linux, you can look at how it is implemented by the kernel.
The registers seem to be set by the call to start_thread at the end of load_elf_binary (if you are using a modern Linux system, it will almost always be using the ELF format). For ARM, the registers seem to be set as follows:
r0 = first word in the stack
r1 = second word in the stack
r2 = third word in the stack
sp = address of the stack
pc = binary entry point
cpsr = endianness, thumb mode, and address limit set as needed
Clearly you have a valid stack. I think the values of r0-r2 are junk, and you should instead read everything from the stack (you will see why I think this later). Now, let's look at what is on the stack. What you will read from the stack is filled by create_elf_tables.
One interesting thing to notice here is that this function is architecture-independent, so the same things (mostly) will be put on the stack on every ELF-based Linux architecture. The following is on the stack, in the order you would read it:
The number of parameters (this is argc in main()).
One pointer to a C string for each parameter, followed by a zero (this is the contents of argv in main(); argv would point to the first of these pointers).
One pointer to a C string for each environment variable, followed by a zero (this is the contents of the rarely-seen envp third parameter of main(); envp would point to the first of these pointers).
The "auxiliary vector", which is a sequence of pairs (a type followed by a value), terminated by a pair with a zero (AT_NULL) in the first element. This auxiliary vector has some interesting and useful information, which you can see (if you are using glibc) by running any dynamically-linked program with the LD_SHOW_AUXV environment variable set to 1 (for instance LD_SHOW_AUXV=1 /bin/true). This is also where things can vary a bit depending on the architecture.
Since this structure is the same for every architecture, you can look for instance at the drawing on page 54 of the SYSV 386 ABI to get a better idea of how things fit together (note, however, that the auxiliary vector type constants on that document are different from what Linux uses, so you should look at the Linux headers for them).
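As a rough illustration of that layout, here is a hedged C sketch of how the initial stack could be walked starting from the sp value seen at _start (the function name is made up; Elf32_auxv_t is glibc's declaration from <elf.h>):

#include <elf.h>   /* Elf32_auxv_t */

/* sp0 is the initial stack pointer value seen at _start. */
static Elf32_auxv_t *find_auxv(long *sp0)
{
    long argc   = sp0[0];               /* number of parameters             */
    char **argv = (char **)&sp0[1];     /* argc pointers, then a NULL       */
    char **envp = argv + argc + 1;      /* environment strings, then a NULL */

    char **p = envp;
    while (*p)                          /* skip past the environment        */
        p++;
    return (Elf32_auxv_t *)(p + 1);     /* auxiliary vector, ends at AT_NULL */
}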
Now you can see why the contents of r0-r2 are garbage. The first word in the stack is argc, the second is a pointer to the program name (argv[0]), and the third probably was zero for you because you called the program with no arguments (it would be argv[1]). I guess they are set up this way for the older a.out binary format, which as you can see at create_aout_tables puts argc, argv, and envp in the stack (so they would end up in r0-r2 in the order expected for a call to main()).
Finally, why was r0 zero for you instead of one (argc should be one if you called the program with no arguments)? I am guessing something deep in the syscall machinery overwrote it with the return value of the system call (which would be zero since the exec succeeded). You can see in kernel_execve (which does not use the syscall machinery, since it is what the kernel calls when it wants to exec from kernel mode) that it deliberately overwrites r0 with the return value of do_execve.
Here's what I use to get a Linux/ARM program started with my compiler:
/** The initial entry point.
*/
asm(
" .text\n"
" .globl _start\n"
" .align 2\n"
"_start:\n"
" sub lr, lr, lr\n" // Clear the link register.
" ldr r0, [sp]\n" // Get argc...
" add r1, sp, #4\n" // ... and argv ...
" add r2, r1, r0, LSL #2\n" // ... and compute environ.
" bl _estart\n" // Let's go!
" b .\n" // Never gets here.
" .size _start, .-_start\n"
);
As you can see, I just get the argc, argv, and environ stuff from the stack at [sp].
A little clarification: The stack pointer points to a valid area in the process' memory. r0, r1, and r2 are the first three parameters to the function being called. I populate them with argc, argv, and environ, respectively.
Here's the uClibc crt. It seems to suggest that all registers are undefined except r0 (which contains a function pointer to be registered with atexit()) and sp which contains a valid stack address.
So, the value you see in r1 is probably not something you can rely on.
Some data are placed on the stack for you.
I've never used ARM Linux, but I suggest you either look at the source for the libcrt and see what they do, or use gdb to step into an existing executable. You shouldn't need the source code; just step through the assembly code.
Everything you need to find out should happen within the very first code executed by any binary executable.
Hope this helps.
Tony