Using values of loaded registers in rocket-chip custom instructions - riscv

I am trying to add a custom instruction to a Freedom E300 Rocket Chip core.
The custom instruction should perform an operation on the values of registers a0 to a7, which have been loaded beforehand. (The values are loaded into a0 to a7 with the LUI instruction, and then the custom instruction is executed.)
lui a0, val0
lui a1, val1
:
:
CUSTOM_INST rd # <== calculate using the values of a0, a1, a2, a3, a4, a5, a6, a7
I am trying to modify the ALU module (the ALU class in rocket-chip/src/main/scala/rocket/ALU.scala) to add the arithmetic operation for the custom instruction.
In that arithmetic I want to use the values that were loaded into registers a0 to a7 in advance, but I don't know how to express that.

There is a standardized interface for custom instructions called RoCC. If you have a look at the Chipyard documentation, it has a chapter on how you can add custom instructions using this interface: https://chipyard.readthedocs.io/en/latest/Customization/RoCC-Accelerators.html
More concretely, though, you will generally have a hard time adding an instruction that works with more than two source registers, because the register file in Rocket Chip has only two read ports. In general, a large number of read ports is expensive and hard to implement in a real chip. Instead, consider breaking your custom instruction up into several independent steps. For example, one custom instruction could load values into your accelerator, and a different one could tell the accelerator to compute the result once all values have been loaded.
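To make the multi-step idea concrete, here is a small C software model (not Chisel, and not the actual RoCC API) of a hypothetical accelerator that needs eight operands but, like any instruction on Rocket, gets at most two source registers per instruction. The "load pair" and "compute" instructions, the slot numbering and the placeholder operation (a plain sum, since the question does not say what the operation is) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a hypothetical accelerator that needs eight operands
 * but receives at most two source registers (rs1, rs2) per instruction. */
#define NUM_OPERANDS 8

typedef struct {
    uint32_t operand[NUM_OPERANDS]; /* accelerator-internal operand buffer */
    unsigned loaded_mask;           /* bit i is set once operand[i] is valid */
} accel_t;

/* Hypothetical "load pair" instruction: writes rs1 and rs2 into slot pair
 * `slot` (0..3). How the slot is encoded in the instruction is an assumption. */
void accel_load_pair(accel_t *acc, unsigned slot, uint32_t rs1, uint32_t rs2) {
    assert(slot < NUM_OPERANDS / 2);
    acc->operand[2 * slot] = rs1;
    acc->operand[2 * slot + 1] = rs2;
    acc->loaded_mask |= 3u << (2 * slot);
}

/* Hypothetical "compute" instruction: writes the result to *rd.
 * A plain sum stands in for the real (unspecified) operation.
 * Returns -1 if not all operands have been loaded yet. */
int accel_compute(const accel_t *acc, uint32_t *rd) {
    if (acc->loaded_mask != (1u << NUM_OPERANDS) - 1u)
        return -1;
    uint32_t sum = 0;
    for (int i = 0; i < NUM_OPERANDS; i++)
        sum += acc->operand[i];
    *rd = sum;
    return 0;
}
```

In this model, the eight values in a0 to a7 travel as four load-pair instructions carrying (a0, a1), (a2, a3), (a4, a5) and (a6, a7), followed by one compute instruction that writes rd. Each step needs only the two register-file read ports that Rocket already has.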

Related

Can you add two address registers in 68000 using ADDA

For example if I write "ADDA A1, A5" would this be valid?
Sure you can. That's what the instruction is good for.
Note that you must add the .l suffix to the instruction if you want a full 32-bit addition. Without the suffix, the assembler assumes word (16-bit) size and adds the sign-extended word source operand to the full 32-bit destination address register.
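The difference between the two sizes is easiest to see with a source whose low word has bit 15 set. The following is a small C model of the documented ADDA semantics (not emulator code; the function names are illustrative). ADDA always updates all 32 bits of the address register and does not affect the condition codes.

```c
#include <stdint.h>

/* ADDA.L src,An: full 32-bit addition. Unsigned arithmetic wraps
 * modulo 2^32, matching the CPU. */
uint32_t adda_l(uint32_t an, uint32_t src) {
    return an + src;
}

/* ADDA.W src,An: only the low word of src is used. It is sign-extended
 * to 32 bits and then added to the whole of An. */
uint32_t adda_w(uint32_t an, uint32_t src) {
    uint32_t word = src & 0xFFFFu;
    uint32_t ext = (word & 0x8000u) ? (word | 0xFFFF0000u) : word;
    return an + ext;
}
```

For example, with A5 = $00001000 and a source of $00018000, the .l form yields $00019000. The .w form adds only the sign-extended low word $FFFF8000, which gives $FFFF9000.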

RISC-V: S-format instructions table

I have this table of S-format instructions. Can you explain to me what imm[11:5] and funct3 are? I know funct indicates its size in bits and sometimes it is 000 or 010. I don't know exactly why it's there. Also, imm[11:5] is also 7-bits of all 0s.
Please help!
imm[4:0] and imm[11:5] denote closed intervals of bits in the binary representation of the immediate operand.
The S-format is used to encode store instructions, i.e.:
sX rs2, offset(rs1)
There are different types of store instructions, e.g. store-byte (sb), store-halfword (sh), store-word (sw), etc. The funct3 part is used to encode the type (i.e. 0b000 -> sb, 0b001 -> sh, 0b010 -> sw, 0b011 -> sd, etc.). This lets all of them share one (major) opcode while still providing multiple types of store instructions, instead of wasting several (major) opcodes. IOW, funct3 encodes the minor opcode of the instruction.
The immediate operand encodes the offset. If you are wondering why it's split like this: the split increases the similarity of the remaining parts of the encoding with other instruction formats. For example, the opcode, rs1, and funct3 parts are located at exactly the same place in the R-type, I-type and B-type instruction formats. The rs2 part placement is shared with the R-type and B-type instruction formats. These similarities help to simplify the instruction decoder.
That means the offset is 12 bits wide and, in pseudo-code:
offset = sign_ext(imm[11:5] << 5 | imm[4:0])
See also the first figure in Section 2.6 (Load and Store Instructions) of the RISC-V Base specification (ratified 2019-06-08).
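Here is a short C sketch of how a decoder pulls the S-type fields apart and reassembles the 12-bit offset, following the pseudo-code above. The two test words, 0x00512423 and 0xFE512E23, are hand encodings of sw x5, 8(x2) and sw x5, -4(x2). The type and function names are illustrative.

```c
#include <stdint.h>

/* Fields of a RISC-V S-type (store) instruction word. */
typedef struct {
    uint32_t opcode; /* bits 6:0, 0x23 (STORE) for sb/sh/sw/sd */
    uint32_t funct3; /* bits 14:12, selects the store width */
    uint32_t rs1;    /* bits 19:15, base address register */
    uint32_t rs2;    /* bits 24:20, register holding the value to store */
    int32_t offset;  /* sign-extended 12-bit immediate */
} s_type_t;

s_type_t decode_s_type(uint32_t inst) {
    s_type_t d;
    d.opcode = inst & 0x7Fu;
    d.funct3 = (inst >> 12) & 0x7u;
    d.rs1 = (inst >> 15) & 0x1Fu;
    d.rs2 = (inst >> 20) & 0x1Fu;
    /* imm[11:5] lives in bits 31:25, imm[4:0] in bits 11:7 */
    uint32_t imm = ((inst >> 25) << 5) | ((inst >> 7) & 0x1Fu);
    /* sign-extend from bit 11 without relying on signed shifts */
    d.offset = (int32_t)(imm ^ 0x800u) - 0x800;
    return d;
}

/* funct3 -> mnemonic for the STORE opcode (sd exists only on RV64). */
const char *store_mnemonic(uint32_t funct3) {
    switch (funct3) {
    case 0: return "sb";
    case 1: return "sh";
    case 2: return "sw";
    case 3: return "sd";
    default: return "?";
    }
}
```

For small non-negative offsets such as 8, imm[11:5] is 0000000. This is why the 7-bit field in the question shows all zeros.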

RiscV assembler - output is not what I expected for register and immediate operands

I am compiling (with RV32I assembler) the following code - with no errors posted on the command line.
slt x15,x16,x17 # line a
slt x15,x16,22 # line b immediate operand
slti x15,x16,22 # line c
sltu x15,x16,x17 # line d
sltu x15,x16,22 # line e immediate operand
sltiu x15,x16,22 # line f
I notice that the machine code generated for line b is identical to the machine code generated for line c. And I notice the same situation with line e and f - the machine code from these 2 lines are identical. This machine output for these specific instructions, does not meet my expectation. Shouldn't the assembler throw an error or warning that the operands are not technically correct for "slt x15,x16,22" - and the immediate version of this instruction should be used - "slti x15,x16,22"? I invoke the assembler with the '-warn' option.
This result appears to defeat the purpose of having 2 different versions of these instructions. A version where all operands are registers and another version that has registers and one immediate operand. What if the intention was to use 'x22' instead of '22'?
As mentioned in a comment, I have moved this issue to GitHub as issue #79 on riscv/riscv-binutils-gdb.
The short answer to my original question is that the assembler has a feature that converts an instruction like SLTU regX,regY,imm to the immediate version of the instruction, SLTIU regX,regY,imm. I have not seen any documentation that explains this feature.
By experimenting, here is a list of instructions I have discovered that perform this operation.
.text
slt x0,x0,-1 # bug
sltu x0,x0,0 # -> sltiu
add x0,x0,5 # -> addi
xor x0,x0,8 # -> xori
or x0,x0,12 # -> ori
and x0,x0,16 # -> andi
sll x0,x0,6 # -> slli
srl x0,x0,4 # -> srli
sra x0,x0,9 # -> srai
These instructions assemble with no errors or warnings, and I verified the machine code with the list file output below (this task is simplified by using the x0 register).
Disassembly of section .text:
0000000000000000 <.text>:
0: fff02013 slt x0,x0,-1
4: 00003013 sltiu x0,x0,0
8: 00500013 addi x0,x0,5
c: 00804013 xori x0,x0,8
10: 00c06013 ori x0,x0,12
14: 01007013 andi x0,x0,16
18: 00601013 slli x0,x0,0x6
1c: 00405013 srli x0,x0,0x4
20: 40905013 srai x0,x0,0x9
The SLT instruction will write machine code for SLTI but the list file shows SLT - I consider this a bug. For detailed arguments see GitHub #79. All other instructions work as expected.
This approach works only for instructions that come in register/immediate pairs in the base instruction set, like ADD/ADDI or XOR/XORI. But alas, SUB does not have a SUBI counterpart in the RiscV ISA; I confirmed this when I received an error trying to assemble SUB with an immediate operand. So if you are a lazy assembler programmer who doesn't want to use the correct operands for a base instruction, you now have to remember that this works fine except for SUB. Or add a SUBI instruction to your custom RiscV ISA.
What follows are some philosophical comments (so you can skip the rest of this answer if your RiscV project is due tomorrow). First, I feel guilty being critical of any open-source project. I am a long-time Linux user and have used many open-source tools, not just for hobby work but for products used by IBM, HP and Dell. I have used maybe six assemblers in the past, at various levels of expertise, starting way back with the 8080/8085, and I have taught assembly language/computer architecture at the college level. I have to admit there is a lot of expertise that has huddled around RiscV, but nonetheless I do not consider myself a total noob in assemblers.
1) Assemblers should stay close to the base instructions, and therefore they should present very good reasons when they deviate. For a feature like this one, where ADD is internally converted to ADDI inside the assembler, I feel the value is very small. IMO there may be some value when using disassembly from C/C++, but I can't put my finger on it. If someone has details on why this approach was taken, please post.
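Because there is no SUBI, subtracting a constant is normally written as ADDI with the negated constant. A macro or tool that does this has to handle one detail. ADDI's immediate range is -2048..2047, so a constant k can be subtracted this way only if -k fits, i.e. for k in -2047..2048. Below is a sketch of the encoding step, in which encode_sub_imm is a hypothetical helper rather than anything binutils provides.

```c
#include <stdint.h>

/* Hypothetical helper: encode "sub rd, rs1, k" as "addi rd, rs1, -k".
 * Returns 0 and stores the instruction word in *out on success, or -1
 * if a register number is invalid or -k does not fit ADDI's signed
 * 12-bit immediate (-2048..2047). */
int encode_sub_imm(unsigned rd, unsigned rs1, int32_t k, uint32_t *out) {
    if (rd > 31 || rs1 > 31 || k < -2047 || k > 2048)
        return -1;
    int32_t imm = -k;
    uint32_t imm12 = (uint32_t)imm & 0xFFFu; /* two's-complement low 12 bits */
    *out = (imm12 << 20)                     /* imm[11:0] -> bits 31:20 */
         | ((uint32_t)rs1 << 15)             /* rs1 -> bits 19:15 */
         | (0u << 12)                        /* funct3 = 000 (ADDI) */
         | ((uint32_t)rd << 7)               /* rd -> bits 11:7 */
         | 0x13u;                            /* OP-IMM major opcode */
    return 0;
}
```

For example, "sub x5, x6, 1" becomes addi x5, x6, -1 (0xFFF30293). Subtracting -2048 is rejected because +2048 does not fit the immediate.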
2) RiscV was touted as a fresh new, open ISA. However, it is similar to MIPS and the problem is MIPS binutils baggage comes with RiscV. Seems like I have run head-on into the "it worked in MIPS so it has to work in RiscV" thinking on GitHub #79.
3) If you don't like the assembly mnemonics - or are too lazy to bother using the correct operands for an instruction - then please consider writing a macro. For example, you can write a macro for the SUB operation to handle immediate arguments. Resist the urge to carry the macro idea into the assembler - especially if it will not be well documented to new users. This feature I have discovered, is very similar to a built-in macro in the assembler.
4) Bugs in list files are important; to some people they are critical to the verification task. They should be taken seriously and fixed. I am not certain whether the SLT-to-SLTI list file bug is the fault of the assembler; it may be a problem in the binutils objdump utility.
5) Pseudoinstructions that are defined in the ISA are like built-in macros. I think they should be used sparingly, since they can add confusion. I write macros for my stack operations, like PUSH and POP. I don't mind writing those macros, and I don't feel I need many pseudoinstructions in the assembler or in the ISA. People who are familiar with gcc/GNU-style assembler syntax should be able to quickly code up some test code using only base instructions, without having to worry about discovering tricks in the assembler. I stumbled on the SLT trick by accident (a typo).
6) This trick of converting instructions in the RiscV assembler comes at the expense of 'strong typing' of the operands. If you make a typo (like I did) when you intended to use all register operands for the base instruction, you will get the immediate form of the instruction with no warnings. So consider this a friendly heads-up. I prefer to invoke the KISS principle in assemblers and lean toward strict enforcement of the correct operands. Or why not offer an assembler option to turn this feature on and off?
7) More and more, it seems like assemblers are used mostly for debug and verification and not for general-purpose software development. If you need more abstract code tools, you typically move to C or C++ for embedded cores. Yes, you could go crazy writing many assembly macros, but it is much easier to code in C/C++. You might use some inline assembler to optimize time-critical code, and it certainly helps to disassemble compiled C/C++ code to view it. But C/C++ compilers have improved so much that for many projects they make hand optimization in assembly obsolete. Assembly is still used for startup code; e.g. if you port the U-Boot bootloader to another processor, you will probably have to deal with some startup files in assembler. So I think the purpose of assemblers has morphed over time into some startup-file duty, with most of their value in debug and verification, and that is why I think things like list files have to be correct. The instructions that have this feature (e.g. converting from ADD to ADDI based on operand type) mean that the assembly programmer needs to master only one mnemonic per pair. But RiscV has a small list of base instructions anyway; this is apparent if you have any experience with the old CISC processors. In fact, RISC processors by design should have small instruction sets. So, the question in my original post: why have the immediate version of the instruction? The answer is that, for the instructions I have identified, you don't need it in assembly source. You can code them with either all registers or registers and an immediate value, and the assembler will figure it out. But the HW implementation most definitely needs both versions (register-only operands and register-plus-immediate operands); e.g. the core needs to steer ALU input operands from either the register file output or the immediate value that was extracted from the instruction word.
So, the answer to my original question, "why does this create the exact same machine code?", is "because that is how the assembler works". But as it stands today, this feature works most of the time.
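Point 7 above says the core must still steer the second ALU operand from either the register file or the instruction's immediate. Here is a tiny C model of that mux for ADD (opcode 0x33) and ADDI (opcode 0x13) only. It is an illustrative model with made-up names, not Rocket's actual ALU code.

```c
#include <stdint.h>

/* Sign-extend a 12-bit field without relying on signed shifts. */
int32_t sext12(uint32_t v) {
    return (int32_t)((v & 0xFFFu) ^ 0x800u) - 0x800;
}

/* Executes ADD or ADDI against a 32-entry register file and returns the
 * ALU result (write-back is not modeled). Only these two instructions
 * are handled; funct3/funct7 are not checked. */
uint32_t alu_add_exec(const uint32_t regs[32], uint32_t inst) {
    uint32_t opcode = inst & 0x7Fu;
    uint32_t rs1 = (inst >> 15) & 0x1Fu;
    uint32_t rs2 = (inst >> 20) & 0x1Fu;
    uint32_t a = regs[rs1];
    /* Operand-B mux: register file output for OP (0x33),
     * sign-extended immediate from bits 31:20 for OP-IMM (0x13). */
    uint32_t b = (opcode == 0x13u) ? (uint32_t)sext12(inst >> 20) : regs[rs2];
    return a + b;
}
```

With x6 = 10 and x7 = 5, "add x5, x6, x7" (0x007302B3) and "addi x5, x6, 5" (0x00530293) both produce 15. Only the source of operand B differs.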

Good programming practice: use internal procedures to take advantage of variables scope

I am doing some coding in Fortran 95. I would like to know if using subroutines changing global variables defined in modules is considered bad programming practice. I tend to use only pure subroutines in general but in this case I cannot use "pure", right. As an alternative I could define variables in a subroutine and then use those variables in procedures internal to that subroutine, as in the example below. Is that acceptable?
subroutine test(X, Y)
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp), parameter :: r1d3 = 1._dp / 3._dp
real(dp), intent(in) :: X(20)
real(dp), intent(out) :: Y(20)
real(dp) :: f1, f2, f3, f4, f5, f6, f7, f8, f9, f10
real(dp), dimension(20) :: g1, g2, g3, g4, g5, g6, g7, DX
real(dp) :: res(20), jac(20,20)
logical :: condition
f1 = exp(- norm2(X(7:12)))
g1 = X(1:6) - r1d3 * sum(X(1:3))
! code to calculate variables f1..., g1...
! functions of X
! f1 ... f10, g1 .. g7, are needed to compute both the residual and the jacobian
call residual(X, res)
condition = ( norm2(res) < tol )
! I do not want to calculate the jacobian if this is not needed. Should I?
! (the jacobian and the solve are only needed while the residual has not converged)
if (.not. condition) then
call jacobian(X, jac)
DX = -res
call gesv(jac, DX)
end if
! and so on
contains
pure subroutine residual(X, res)
....
end subroutine residual
pure subroutine jacobian(X, jac)
....
end subroutine jacobian
end subroutine test
Is the code above decently written? I could have included the computation of both the residual and the jacobian in the same subroutine and do all the needed calculations of f1 ... g7 there, avoiding the definition of residual and jacobian as internal subroutines, but I only want to calculate the jacobian if needed. What do you think?
I thought the following alternative could also work:
module EP_integration
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp), PRIVATE, SAVE :: f1, f2, f3, f4, f5, f6, f7, f8, f9, f10
real(dp), dimension(20), PRIVATE, SAVE :: g1, g2, g3, g4, g5, g6, g7
contains
subroutine calc_funcs(X, res)
! not pure: it assigns the module variables f1 .. f10, g1 .. g7
! calculates f1 .. f10, g1 .. g7 as functions of X
! f1 ... f10, g1 .. g7, are needed to compute both the residual and the jacobian
....
end subroutine calc_funcs
pure subroutine residual(X, res)
....
end subroutine residual
pure subroutine jacobian(X, jac)
....
end subroutine jacobian
end module EP_integration
or maybe USEing the module in the main subroutine instead of using the attribute SAVE.
I would like to know if using subroutines changing global variables defined in modules is considered bad programming practice.
It certainly is widely considered to be bad practice, but I bet you know that. As always, there are arguments for special cases. Personally, for example, I have no problem with a value for pi being global, but then that's something that my programs rarely update.
The rest of your question prompts the thought that you have probably not packaged your data properly -- very long argument lists suggest to me that you may not have defined data types to organise your data at the right levels.
But beyond those broad platitudes it's very difficult to provide any kind of a good answer with so little detail in your question.
Global variables are not a problem in themselves. The problem is the mutability of the variables, which is even worse when the variables are global.
What adds complexity to code is the time dependence of the variables. This is well explained in the lecture of Pecquet (if you can read French, this course explains things very well). In this code:
a=b
! some code
a=c
the variable a has a value that changes during the execution of the program. This change is made by changing the state of the memory using a side effect and often, this can be avoided. For example, pure functional programming languages kill this complexity by forbidding mutable variables and programs are much more under control than with imperative languages.
If a can be modified in another subroutine, it will be much more difficult to know in which state a is. And if you are in a multithreaded program where each thread can modify a, it can become a nightmare.
However, most scientific programs make use of some entities that have to be used by the majority of the subroutines and functions, and this often leads to your question. Often you will have to use mutable global variables in your code, so you will have to keep them coherent. In the case of a conjugate gradient, the global variables are mutable between iterations, but they are constant within a given iteration. Global constants (such as pi) are not a problem, since they are not time dependent. So you have two different time scales: the time scale of the CPU instructions and the time scale of the iterations. To keep control of your code, you will have to mutate your global data at well-defined "checkpoints" (the end of each iteration), so you know that the global data is constant during an iteration.
A simple solution to keep the coherence is to have a global variable for the current iteration A_current and a variable you are constructing for the next iteration A_next. At the end of the iteration, you copy (or swap pointers of) the A_next and A_current. This guarantees that for a given iteration you know the global state.
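The A_current/A_next idea is not specific to Fortran. Here is a minimal sketch of it in C, using a made-up example problem (a 1D Jacobi relaxation with fixed end values) and illustrative names. Each sweep reads only from the current buffer and writes only to the next one. The pointer swap at the end of the sweep is the single checkpoint where the shared state changes.

```c
#include <stddef.h>

/* Double-buffered iteration: cur plays the role of A_current and next
 * the role of A_next. Example problem (an assumption for illustration):
 * Jacobi relaxation of u'' = 0 on n >= 2 points with fixed end values.
 * The final state is left in buf_a. */
void jacobi_double_buffer(double *buf_a, double *buf_b, size_t n, int iterations) {
    double *cur = buf_a, *next = buf_b;
    next[0] = cur[0];         /* boundary values never change */
    next[n - 1] = cur[n - 1];
    for (int it = 0; it < iterations; it++) {
        for (size_t i = 1; i + 1 < n; i++)
            next[i] = 0.5 * (cur[i - 1] + cur[i + 1]); /* reads cur only */
        double *tmp = cur;    /* checkpoint: swap the two buffers */
        cur = next;
        next = tmp;
    }
    if (cur != buf_a)         /* odd iteration count: copy result back */
        for (size_t i = 0; i < n; i++)
            buf_a[i] = cur[i];
}
```

With five points and end values 0 and 4, the interior converges toward 1, 2, 3. During any one sweep, cur is never modified, so every read sees a consistent state.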
For more complicated problems, you can use the Implicit Reference to Parameters (IRP) strategy explained in this GitBook, and you can use IRPF90, an open-source Fortran code generator that I develop and use for all my codes, to program using this method.
In general, having a lot of global variables makes code unreadable and difficult to maintain. So having several dozen (or hundreds!) of global variables may be considered a symptom of bad software design.
Fortran 95 has derived data types, declared with the TYPE keyword. So you can define composite (and nested) data types made of "smaller" or "simpler" components and use them (perhaps as abstract data types). You could have some functions to build composite data, and other functions to operate on them (and change them).
A rule-of-thumb discipline for fixed-arity (non-variadic) functions is to have them accept fewer than 5 to 8 formal arguments, for readability and because of human cognitive limits. You are coding not only for the computer, but also for human colleagues (present or future ones, perhaps even yourself in a couple of months) who will need to understand your source code.
Some old Fortran 77 code had functions with several dozen formal arguments, but that makes them unreadable.
If your management permits it, I strongly suggest using at least a more recent version of Fortran, e.g. Fortran 2008 (and some numerical codes are written today in C99 or C++11, perhaps with some OpenCL, CUDA, OpenMP or OpenACC, for very good reasons; I even know some numerical scientists writing code in OCaml... So switching to a better language might be considered).
BTW, if you have no formal education in programming, I believe that learning some basics is worthwhile, even for numerical scientists. Reading SICP (and playing with a Scheme implementation) will broaden your thinking a lot and will improve your daily Fortran coding. Also, if you don't know it, read http://floating-point-gui.de/, which is relevant for every numerical scientist writing numerical code. It takes ten years to learn programming (or anything else, like numerical analysis), so be patient and persevering.
You could also consider using tools like Scilab, Octave, R ...

Decoding 68k instructions

I'm writing an interpreted 68k emulator as a personal/educational project. Right now I'm trying to develop a simple, general decoding mechanism.
As I understand it, the first two bytes of each instruction are enough to uniquely identify the operation (with two rare exceptions) and the number of words left to be read, if any.
Here is what I would like to accomplish in my decoding phase:
1. read two bytes
2. determine which instruction it is
3. extract the operands
4. pass the opcode and the operands on to the execute phase
I can't just pass the first two bytes into a lookup table like I could with the first few bits in a RISC arch, because operands are "in the way". How can I accomplish part 2 in a general way?
Broadly, my question is: How do I remove the variability of operands from the decoding process?
More background:
Here is a partial table from section 8.2 of the Programmer's Reference Manual:
Table 8.2. Operation Code Map
Bits 15-12 Operation
0000 Bit Manipulation/MOVEP/Immediate
0001 Move Byte
...
1110 Shift/Rotate/Bit Field
1111 Coprocessor Interface...
This made great sense to me, but then I look at the bit patterns for each instruction and notice that there isn't a single instruction where bits 15-12 are 0001, 0010, or 0011. There must be some big piece of the picture that I'm missing.
This Decoding Z80 Opcodes site explains decoding explicitly, which is something I haven't found in the 68k programmer's reference manual or by googling.
I've decided to simply create a look-up table with every possible pattern for each instruction. It was my first idea, but I discarded it as "wasteful, inelegant". Now, I'm accepting it as "really fast".
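Here is a minimal C sketch of that look-up-table approach. It makes some assumptions: only four instructions (NOP, RTS, MOVEQ, BRA) are described, each as a mask/value pair in which the operand bits are "don't care", and all names are illustrative. The operand variability is resolved once, when the table is built, so run-time decoding is a single array index. Operands are then extracted by per-instruction code. A real table must list every instruction and must also reject invalid addressing-mode combinations.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { M68K_ILLEGAL, M68K_NOP, M68K_RTS, M68K_MOVEQ, M68K_BRA } m68k_op_t;

typedef struct {
    uint16_t mask;  /* bits that are fixed for this instruction */
    uint16_t value; /* required values of those bits */
    m68k_op_t op;
} pattern_t;

static const pattern_t patterns[] = {
    {0xFFFF, 0x4E71, M68K_NOP},   /* 0100 1110 0111 0001 */
    {0xFFFF, 0x4E75, M68K_RTS},   /* 0100 1110 0111 0101 */
    {0xF100, 0x7000, M68K_MOVEQ}, /* 0111 ddd0 iiii iiii */
    {0xFF00, 0x6000, M68K_BRA},   /* 0110 0000 pppp pppp */
};

static uint8_t decode_table[65536]; /* one entry per possible first word */

/* Build once at startup: each word matching a pattern maps to that
 * instruction; everything else stays M68K_ILLEGAL. */
void build_decode_table(void) {
    for (uint32_t w = 0; w < 65536; w++) {
        decode_table[w] = M68K_ILLEGAL;
        for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
            if ((w & patterns[p].mask) == patterns[p].value) {
                decode_table[w] = (uint8_t)patterns[p].op;
                break;
            }
        }
    }
}

/* Decode step: a single array lookup, no pattern matching at run time. */
m68k_op_t m68k_decode(uint16_t word) { return (m68k_op_t)decode_table[word]; }

/* Operand extraction happens once the instruction is known, e.g. MOVEQ. */
void moveq_operands(uint16_t word, unsigned *dreg, int32_t *data) {
    *dreg = (word >> 9) & 7u;
    int32_t d = word & 0xFF;
    *data = d >= 128 ? d - 256 : d; /* 8-bit data is sign-extended */
}
```

Storing a small integer per entry keeps the table at 64 KiB. An array of handler function pointers indexed the same way is a common alternative.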
