How can the RISC-V HW determine the privilege level?

The current RISC-V SW privilege level is not stored in any CSR. Nevertheless, the spec states that "Attempts to access a CSR without appropriate privilege level ... raise illegal instruction". How can this be implemented in the HW, then?

Well, on interrupts - "xPP holds the previous privilege mode (x=M,S or U). The xPP fields can only hold privilege modes up to x, so MPP is two bits wide, SPP is one bit wide, and UPP is implicitly zero."
Actually, what I have found now is that the processor keeps the current mode internally, and the xRET instruction changes it - "The MRET, SRET, or URET instructions are used to return from traps in M-mode, S-mode, or U-mode respectively. When executing an xRET instruction, supposing xPP holds the value y, xIE is set to xPIE; the privilege mode is changed to y; xPIE is set to 1; and xPP is set to U (or M if user-mode is not supported)."
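In other words, the current privilege mode lives in an internal hardware register rather than in any CSR, and traps and xRET update it. A minimal C-style sketch of that hidden state for the M-mode case (current_mode, the mstatus fields, and the helper names here are purely illustrative, not from any real core):
enum mode { U_MODE = 0, S_MODE = 1, M_MODE = 3 };
/* Hidden state: the current privilege mode is kept inside the core and is not
   readable through any CSR. */
static enum mode current_mode = M_MODE;               /* harts reset into M-mode */
static struct { unsigned MPP, MIE, MPIE; } mstatus;
static void take_trap_to_m(void) {                    /* trap taken into M-mode */
    mstatus.MPP  = current_mode;                      /* xPP holds the previous privilege mode */
    mstatus.MPIE = mstatus.MIE;
    mstatus.MIE  = 0;
    current_mode = M_MODE;
}
static void execute_mret(void) {                      /* per the quoted xRET description */
    current_mode = mstatus.MPP;                       /* privilege mode is changed to y */
    mstatus.MIE  = mstatus.MPIE;                      /* xIE is set to xPIE */
    mstatus.MPIE = 1;                                 /* xPIE is set to 1 */
    mstatus.MPP  = U_MODE;                            /* xPP is set to U (or M if no U-mode) */
}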

On a trap, the privilege level the hart was running in is reflected in the MPP bits of the mstatus register.

We have mstatus.MPP, which holds the previous privilege mode; the current privilege mode is not visible to software.
On a trap, the current privilege mode is saved into mstatus.MPP; on MRET, it is simply restored from mstatus.MPP.

I found this answer from the SiFive forums quite helpful when I was looking into the same question.
RISC-V deliberately doesn’t make it easy for code to discover what mode it is running in because this is a virtualisation hole. As a general principle, code should be designed for and implicitly know what mode it will run in. Applications code should assume it is in U mode. The operating system should assume it is in S mode (it might in fact be virtualised and running in U mode, with things U mode can’t do trapped and emulated by the hypervisor).
https://forums.sifive.com/t/how-to-determine-the-current-execution-privilege-mode/2823

Related

RISC-V user level reference or reference implementation

Summary: What is the definitive reference or reference implementation for the RISC-V user-level ISA?
Context: The RISC-V website has "The RISC-V Instruction Set Manual" which explains the user-level instructions very well, but does not give an exact specification for them. I am trying to build a user-level ISA simulator now and intend to write an FPGA implementation later, so the exact behavior is important to me.
A reference implementation would be sufficient, but should preferably be as simple as possible -- i.e. I would try to understand a pipelined implementation only as a last resort. What is important is to have an understanding of the specified ISA and not of a single CPU implementation or compiler implementation.
One example to show my problem is the AUIPC instruction: The prose explanation says that "AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd." I wanted to know whether this refers to the old or new PC, i.e. the position of the AUIPC instruction or the next instruction. I looked at the "RISCV Angel" implementation, but that seems to mask out the lower bits of the (old) PC -- not just of the immediate -- which I could not find any reason for in the spec, not even in the change history of the spec (since Angel is a bit older). Instead of an answer, I now have two questions about AUIPC. Many other instructions pose similar problems to me.
AFAICT the RISC-V Instruction Set Manual you cite is the closest thing there is to a definitive reference. If there are things that are unclear or incorrect in there then you could open issues at the Github site where that document is maintained: https://github.com/riscv/riscv-isa-manual
As far as AUIPC is concerned, the answer is implied, but not stated explicitly, by this sentence at the bottom of page 9 in the current manual:
There is one additional user-visible register: the program counter pc holds the address of the current instruction.
Based on that statement I would expect that the pc value that is seen and manipulated by the AUIPC instruction is the address of the AUIPC instruction itself.
This interpretation is supported by the discussion of the JALR instruction:
The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the least-significant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.
Given that the address of the following instruction is expressed as pc+4, it seems clear that the pc value visible during the execution of JALR is the address of the JALR instruction itself.
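As an illustration of that reading, here is a rough sketch of JALR in an RV64 simulator (illustrative only; pc_of_jalr stands for the address of the JALR instruction itself):
#include <stdint.h>
/* rd receives the return address; the new pc is (rs1 + imm) with bit 0 cleared */
uint64_t exec_jalr(uint64_t pc_of_jalr, uint64_t rs1, int64_t imm12, uint64_t *next_pc) {
    *next_pc = (rs1 + (uint64_t)imm12) & ~(uint64_t)1;   /* clear the least-significant bit */
    return pc_of_jalr + 4;                               /* the "pc+4" written to rd */
}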
The latest draft of the manual (at https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190321-ba17106/riscv-spec.pdf) makes the situation slightly clearer. In place of this in the current manual:
AUIPC appends 12 low-order zero bits to the 20-bit U-immediate, sign-extends the result to 64 bits, then adds it to the pc and places the result in register rd.
the latest draft says:
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc of the AUIPC instruction, then places the result in register rd.
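So a simulator would compute AUIPC from the address of the AUIPC instruction itself. A minimal sketch in the same style as the JALR sketch above (illustrative only, RV64; pc_of_auipc is the address of the AUIPC instruction):
/* rd = pc_of_auipc + sign_extend(imm20 << 12) */
uint64_t exec_auipc(uint64_t pc_of_auipc, uint32_t imm20) {
    int64_t offset = (int64_t)(int32_t)(imm20 << 12);    /* low 12 bits zero, sign-extended */
    return pc_of_auipc + (uint64_t)offset;               /* value written to rd */
}
/* e.g. an AUIPC with U-immediate 0x12345 at address 0x1000 writes
   0x1000 + 0x12345000 = 0x12346000 to rd */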

Is it possible to {activate|de-activate} SIGFPE generation only for some specific C++11 code segments?

I am writing a path tracer in C++11, on Linux, for the numerical simulation of light transport and I am using
#include <fenv.h>
...
feenableexcept( FE_INVALID   |
                FE_DIVBYZERO |
                FE_OVERFLOW  |
                FE_UNDERFLOW );
in order to catch and debug any numerical errors that may occur during execution.
At some point in the code I have to compute the intersection of rays (line segments) against axis-aligned bounding boxes (AABBs). For this computation I am using a very optimized and robust ray-box intersection algorithm which relies on the generation of some special values (e.g. NaN and inf) described in the IEEE 754 standard. Obviously, I am not interested in catching floating point exceptions generated specifically by this ray-box intersection routine.
Thus, my questions are:
Is it possible to deactivate the generation of floating point exception signals (SIGFPE) for only some sections of the code (i.e. for the ray-box intersection code section)?
When we are running simulations we are very concerned about performance. If it is possible to suppress exception signals only for specific code sections, can this be done at compile time (i.e. instrumenting/de-instrumenting the code during its generation, so that we could avoid expensive function calls)?
Thank you for any help!
UPDATE
It is possible to instrument/de-instrument code through the use of feenableexcept and fedisableexcept function calls (actually, I posted this question because I was not aware of fedisableexcept, only of feenableexcept... shame on me!). For instance:
#define _GNU_SOURCE   // feenableexcept/fedisableexcept are GNU (glibc) extensions
#include <fenv.h>

int main() {
    float a = 1.0f;

    fedisableexcept(FE_DIVBYZERO);  // disable div-by-zero trapping
    // generates an inf that **won't be** caught
    float c = a / 0.0f;

    feenableexcept(FE_DIVBYZERO);   // enable div-by-zero trapping
    // generates an inf that **will be** caught
    float d = a / 0.0f;

    return 0;
}
Standard C++ does not provide any way to mark code at compile-time as to whether it should run with floating-point trapping enabled or disabled. In fact, support for manipulating the floating-point environment is not required by the standard, so whether an implementation has it at all is implementation-dependent. Any answer beyond standard C++ depends on the particular hardware and software you are using, but you have not reported that information.
On typical processors, enabling and disabling floating-point trapping is achieved by changing a processor control register. You do not need a function call to do this, but it is not the function call that is expensive, as you suggest in your question. The actual instruction may consume time as it may require the processor to serialize instruction execution. (Modern processors may have hundreds of instructions executing at the same time—some being decoded, some waiting for a subunit within the processor, some in various stages of calculation, some waiting to write their results to general registers, and so on. When changing a control register, the processor may have to wait for all currently executing instructions to finish, then change the register, then start executing new instructions.) If your hardware behaves this way, there is no way to get around it. (With such hardware, which is common, it is not possible to compile code to run with or without trapping without actually executing the run-time instruction to change the control register.)
You might be able to mitigate the time cost by batching path-tracing calculations, so they are performed in groups with only two changes to the floating-point control register (one to turn traps off, one to turn them on) for the entire group.
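For illustration, a rough sketch of such batching with the glibc calls already mentioned (trace_all_paths and struct path are made-up stand-ins for the intersection-heavy work; feenableexcept/fedisableexcept are GNU extensions):
#define _GNU_SOURCE                     /* feenableexcept/fedisableexcept are glibc extensions */
#include <fenv.h>
#include <stddef.h>
struct path;                            /* hypothetical: whatever your tracer batches */
void trace_all_paths(struct path *paths, size_t n);    /* hypothetical batched kernel */
void render_pass(struct path *paths, size_t n) {
    /* one control-register change for the whole group: trapping off */
    fedisableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW);
    trace_all_paths(paths, n);          /* NaN/inf tricks in the ray-box tests won't trap */
    /* clear flags raised on purpose, then one change to turn trapping back on */
    feclearexcept(FE_ALL_EXCEPT);
    feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW);
}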

How do I properly handle DDC metadata and settings?

Using REDHAWK Version 2.0.5, given a CHANNELIZER centered at 300MHz and a DDC attached to the CHANNELIZER centered at 301MHz: the DDC is tuned relative to the CHANNELIZER, and in this case the DDC is centered at a 1MHz offset from the CHANNELIZER.
A) How should I present the DDC center frequency to a user in the frontend tuner status and allocation? For example, would they enter 1MHz or 301MHz to set the center frequency for the DDC? Currently I am using the latter version.
B) In version 2.1.0 of the REDHAWK manual in section F.5.2 it says the COL_RF SRI keyword is the center frequency of the collector and the CHAN_RF is the center frequency of the stream. In the above case, I set COL_RF to 300MHz and CHAN_RF to 301MHz but the REDHAWK IDE plots center at 300MHz for the DDC. Should the CHAN_RF be a relative value such as 1MHz? Currently, at 301MHz, the IDE plots appear to center at the COL_RF frequency of 300MHz.
C) When the CHANNELIZER center frequency changes, I only set the valid field in the allocation to false on attached DDCs. Is there any other special bookkeeping that needs to be done when this happens?
D) Should disabling or enabling the output from the CHANNELIZER also disable or enable the output for the attached DDCs?
E) Must deallocating the CHANNELIZER force all DDCs that are attached to deallocate?
A) All external interfaces (allocation, FrontendTuner ports, status properties, etc) assume RF values, not IF or offsets. Allocate or tune to 301MHz in order to center a DDC at 301MHz. The center_frequency field of the frontend_tuner_status property should be set to 301MHz for that DDC.
B) Your understanding of how to use COL_RF (300MHz) and CHAN_RF (301MHz) is correct. You may be able to work around the IDE plotting issue by reordering the SRI keywords so that CHAN_RF appears first, if necessary.
For (C) and (D), there are some design decisions that are left up to the developer since the implementation, as well as the hardware (if any), may impact those decisions. Here are some recommendations, though.
C) In general, if at any point the DDCs become invalid, they should be marked as such. It is possible to retune a CHANNELIZER by a small amount such that one or more DDCs still fall within the frequency range and remain valid, but that may also be hardware dependent. Additionally, it's recommended that DDCs only produce data when both enabled AND valid, so if marking invalid you may also want to stop producing data from the invalid DDCs.
D) CHANNELIZER and RX_DIGITIZER_CHANNELIZER tuners both have wideband input and narrowband DDC output. Some implementations of an RX_DIGITIZER_CHANNELIZER may have the ability to produce wideband digital output of the analog input (acting as an RX_DIGITIZER). In this scenario, the RX_DIGITIZER_CHANNELIZER output enable/disable controls the wideband output, while the DDCs' output enables remain independently controlled. The behavior of a CHANNELIZER, which does not produce wideband output, is left as a design decision for the developer. For behavior consistent with RX_DIGITIZER_CHANNELIZER tuners, it's recommended that the DDCs remain independently controlled. Note that the enable for a tuner is specifically the output enable, not an overall enable/disable of the tuner itself. For that reason, it's recommended that the enable for a CHANNELIZER not affect the data flow to the DDCs, since that data flow is internal to the device. Again, this is all up to the developer and these are just recommendations, since the spec leaves it open.
E) Yes, deallocating a CHANNELIZER should result in deallocation of all associated DDCs.

2 Questions about Risc-V-Privileged-Spec-v1.7

Page 16, Table 3.1:
Base field in mcpuid: RV32I RV32E RV64I RV128I
What is "RV32E"?
Is there a "E" extension?
ECALL (page 30) says nothing about the behavior of the pc, while mepc (page 28) and mbadaddr (page 29) claim that "mepc will point to the beginning of the instruction". I think ECALL should set mepc to the end of the causing instruction, so that an ERET would go to the next instruction. Is that right?
As answered by CliffordVienna, RV32E ("embedded") is a new base ISA which uses 16 registers and makes some of the counter registers optional.
I would not recommend implementing an RV32E core, as it is probably an unnecessary over-optimization in core size that limits your ability to use a large body of RV*I code. But if performance is not needed, and you really need the core to be a tad smaller, and the core is not connected to a memory hierarchy that would dominate the area/power anyway, and you are willing to deal with the tool-chain headaches... then maybe an RV32E core is appropriate.
ECALL is treated like an exception, and will redirect the PC to the appropriate trap handler based on the current privilege level. MEPC will be set to the current PC of the ecall instruction.
You can verify this behavior by analyzing the Berkeley RV64G Rocket processor (https://github.com/ucb-bar/rocket/blob/master/src/main/scala/csr.scala), or by looking at the Spike ISA simulator (starting here: https://github.com/riscv/riscv-isa-sim/blob/master/riscv/insns/scall.h). Careful: as of 2015 Jun 27 the code is still in flux regarding the Privileged Spec.
If we look at how Spike handles eret ("sret": https://github.com/riscv/riscv-isa-sim/blob/master/riscv/insns/sret.h) for example, we have to be a bit careful. The PC is set to "mepc", but it's the trap handler's job to advance the PC by 4. We can see that done, for example, by the proxy kernel in some of the handler functions here (https://github.com/riscv/riscv-pk/blob/master/pk/handlers.c).
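In other words, the hardware leaves mepc pointing at the ecall itself, and it is system software that steps over it before returning. A rough sketch of that handler-side convention (field and function names are made up here, not the actual pk code):
#include <stdint.h>
struct trap_frame { uint64_t epc; /* ...saved registers... */ };
void handle_ecall(struct trap_frame *tf) {
    /* service the environment call using the saved register state, then: */
    tf->epc += 4;            /* skip the 4-byte ecall so the *RET lands on the next instruction */
}
/* the trap exit path writes tf->epc back to mepc/sepc before executing mret/sret */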
A draft of the RV32E (embedded) spec can be found here (via isa-dev mailing list):
https://lists.riscv.org/lists/arc/isa-dev/2015-06/msg00022/rv32e.pdf
It's RV32I with 16 instead of 32 registers and without the counter instructions.

instruction set emulator guide

I am interested in writing emulators for the Game Boy and other handheld consoles, but I read that the first step is to emulate the instruction set. I found a link here that said beginners should emulate the Commodore 64's 8-bit microprocessor; the thing is, I don't know a thing about emulating instruction sets. I know the MIPS instruction set, so I think I can manage to understand other instruction sets, but the problem is: what is meant by emulating them?
NOTE: If someone can provide me with a step-by-step guide to instruction set emulation for beginners, I would really appreciate it.
NOTE #2: I am planning to write in C.
NOTE #3: This is my first attempt at learning the whole emulation thing.
Thanks
EDIT: I found this site that is a detailed step-by-step guide to writing an emulator which seems promising. I'll start reading it, and hope it helps other people who are looking into writing emulators too.
Emulator 101
An instruction set emulator is a software program that reads binary program data and carries out the instructions that data contains, just as a physical microprocessor fetching from physical memory would.
The Commodore 64 used a 6502 Microprocessor. I wrote an emulator for this processor once. The first thing you need to do is read the datasheets on the processor and learn about its behavior. What opcodes does it have? How does it address memory? How does it do I/O? What are its registers? How does it start executing? These are all questions you need to be able to answer before you can write an emulator.
Here is a general overview of how it would look in C (not 100% accurate):
#include <stdint.h>

uint8_t RAM[65536];   //Declare a memory buffer for emulated RAM (64k)
uint16_t A;           //Declare Accumulator
uint16_t X;           //Declare X register
uint16_t Y;           //Declare Y register
uint16_t PC = 0;      //Declare Program counter, start executing at address 0
uint16_t FLAGS = 0;   //Start with all flags cleared
int executing = 1;    //Cleared when the emulated program should stop

void UpdateFlags(uint16_t result);  //Sets/clears the status flags from a result (not shown here)

//Return 1 if the carry flag is set, 0 otherwise; in this example, the 3rd bit is
//the carry flag (not true for the actual 6502)
#define CARRY_FLAG(flags) ((0x4 & (flags)) >> 2)
#define ADC 0x69      //ADC immediate opcode
#define LDA 0xA9      //LDA immediate opcode
#define ADC_SIZE 2    //Opcode byte plus one operand byte
#define LDA_SIZE 2    //Opcode byte plus one operand byte

while (executing) {
    switch (RAM[PC]) {    //Grab the opcode at the program counter
        case ADC:         //Add with carry
            A = A + RAM[PC+1] + CARRY_FLAG(FLAGS);
            UpdateFlags(A);
            PC += ADC_SIZE;
            break;
        case LDA:         //Load accumulator
            A = RAM[PC+1];
            UpdateFlags(A);
            PC += LDA_SIZE;
            break;
        default:
            //Invalid opcode!
            break;
    }
}
According to this reference, ADC actually has 8 opcodes in the 6502 processor, which means you will have 8 different ADC cases in your switch statement, one for each opcode and memory-addressing scheme. You will have to deal with endianness (byte order), and of course pointers. I would get a solid understanding of pointers and type casting in C if you don't already have one. To manipulate the flags register you have to have a solid understanding of bitwise operations in C. If you are clever you can make use of C macros and even function pointers to save yourself some work, as in the CARRY_FLAG example above.
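As a hedged illustration of the function-pointer idea (the handler names, the use of globals, and the omission of flag updates here are my own simplifications, not real 6502 details):
#include <stdint.h>
extern uint8_t RAM[65536];                         /* same globals as the sketch above */
extern uint16_t A, PC, FLAGS;
#define CARRY_FLAG(flags) ((0x4 & (flags)) >> 2)   /* same illustrative macro as above */
typedef void (*opcode_handler)(void);
static void do_adc_immediate(void) { A = A + RAM[PC + 1] + CARRY_FLAG(FLAGS); PC += 2; }
static void do_lda_immediate(void) { A = RAM[PC + 1]; PC += 2; }
static void do_invalid(void)       { PC += 1; }    /* or stop and report the bad opcode */
static opcode_handler dispatch[256];               /* one handler per opcode byte */
static void init_dispatch(void) {
    for (int i = 0; i < 256; i++) dispatch[i] = do_invalid;
    dispatch[0x69] = do_adc_immediate;             /* ADC #imm */
    dispatch[0xA9] = do_lda_immediate;             /* LDA #imm */
}
/* the main loop then shrinks to:  dispatch[RAM[PC]]();  */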
Every time you execute an instruction, you must advance the program counter by the size of that instruction, which is different for each opcode. Some opcodes don't take any operands, so their size is just 1 byte, while others take 8-bit or 16-bit operands, as in the LDA example above. All this should be pretty well documented.
Branch and jump instructions (JMP, BEQ, BNE, etc.) are simple: if the relevant flag condition is met (or unconditionally, in the case of JMP), load the PC with the target address. This is how "decisions" are made in a microprocessor, and emulating them is simply a matter of changing the PC, just as the real microprocessor would.
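For instance, a 6502-style relative branch (BEQ) could be emulated along these lines; the ZERO_FLAG bit position is made up here, in the same spirit as the CARRY_FLAG example:
#include <stdint.h>
extern uint8_t RAM[65536];                         /* same globals as the sketch above */
extern uint16_t PC, FLAGS;
#define ZERO_FLAG(flags) ((0x2 & (flags)) >> 1)    /* illustrative bit position only */
static void do_beq_relative(void) {                /* 6502 BEQ: opcode 0xF0, 2 bytes */
    int8_t rel = (int8_t)RAM[PC + 1];              /* signed 8-bit displacement */
    PC += 2;                                       /* step past opcode + operand first */
    if (ZERO_FLAG(FLAGS))
        PC += rel;                                 /* taken: PC-relative, forwards or backwards */
}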
The hardest part about writing an instruction set emulator is debugging. How do you know if everything is working like it should? There are plenty of resources to help you. People have written test programs that will help you debug every instruction. You can execute them one instruction at a time and compare against the reference output. If something is different, you know you have a bug somewhere and can fix it.
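For example, lock-step checking against a reference trace might look roughly like this (the trace file format and the step_one_instruction wrapper are made up for illustration):
#include <stdio.h>
#include <stdint.h>
extern uint16_t A, PC;                   /* state from the emulator core above */
extern void step_one_instruction(void);  /* hypothetical wrapper around the switch loop body */
int run_against_trace(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    unsigned exp_pc, exp_a;
    while (fscanf(f, "%x %x", &exp_pc, &exp_a) == 2) {   /* expected PC and A per step */
        step_one_instruction();
        if (PC != exp_pc || A != exp_a) {
            fprintf(stderr, "mismatch: PC=%04X (expected %04X), A=%02X (expected %02X)\n",
                    (unsigned)PC, exp_pc, (unsigned)A, exp_a);
            fclose(f);
            return 1;
        }
    }
    fclose(f);
    return 0;
}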
This should be enough to get you started. The important thing is that you have (A) a good, solid understanding of the instruction set you want to emulate and (B) a solid understanding of low-level data manipulation in C, including type casting, pointers, bitwise operations, byte order, etc.
