Why does RISC-V output -1 (0xFFFFFFFF) when dividing by 0? - riscv

As far as I know, the result of dividing by 0 in RISC-V is -1 (signed), i.e. 0xFFFFFFFF in hex.
But I don't know why RISC-V produces this value. How is this behavior justified?
Thank you!

We considered raising exceptions on integer divide by zero, with these exceptions causing a trap in most execution environments. However, this would be the only arithmetic trap in the standard ISA (floating-point exceptions set flags and write default values, but do not cause traps) and would require language implementors to interact with the execution environment’s trap handlers for this case. Further, where language standards mandate that a divide-by-zero exception must cause an immediate control flow change, only a single branch instruction needs to be added to each divide operation, and this branch instruction can be inserted after the divide and should normally be very predictably not taken, adding little runtime overhead.
The value of all bits set is returned for both unsigned and signed divide by zero to simplify the divider circuitry. The value of all 1s is both the natural value to return for unsigned divide, representing the largest unsigned number, and also the natural result for simple unsigned divider implementations. Signed division is often implemented using an unsigned division circuit and specifying the same overflow result simplifies the hardware.
from https://five-embeddev.com/riscv-isa-manual/latest/m.html
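
For concreteness, here is a minimal C sketch of the result rules for RV32 DIV and DIVU (the function names are mine; the divide-by-zero values are those described in the passage above, and the signed-overflow case is the other special case the M extension defines):

#include <stdint.h>

// Sketch of RV32 DIV/DIVU result rules. DIV by zero returns all bits set
// (-1 signed, 0xFFFFFFFF unsigned); signed overflow returns the dividend.
int32_t rv32_div(int32_t a, int32_t b) {
    if (b == 0) return -1;                            // all 1s, same bits as DIVU's 0xFFFFFFFF
    if (a == INT32_MIN && b == -1) return INT32_MIN;  // overflow case: result is the dividend
    return a / b;                                     // ordinary case
}

uint32_t rv32_divu(uint32_t a, uint32_t b) {
    return (b == 0) ? UINT32_MAX : a / b;             // largest unsigned number on divide by zero
}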

Related

Is it possible to {activate|de-activate} SIGFPE generation only for some specific C++11 code segments?

I am writing a path tracer in C++11, on Linux, for the numerical simulation of light transport and I am using
#include <fenv.h>
...
feenableexcept( FE_INVALID   |
                FE_DIVBYZERO |
                FE_OVERFLOW  |
                FE_UNDERFLOW );
in order to catch and debug any numerical errors that may occur during execution.
At some point in the code I have to compute the intersection of rays (line segments) against axis-aligned bounding boxes (AABBs). For this computation I am using a very optimized and robust ray-box intersection algorithm which relies on the generation of some special values (e.g. NaN and inf) described in the IEEE 754 standard. Obviously, I am not interested in catching floating point exceptions generated specifically by this ray-box intersection routine.
Thus, my questions are:
Is it possible to deactivate the generation of floating point exception signals (SIGFPE) for only some sections of the code (i.e. for the ray-box intersection code section)?
Since we are running numerical simulations, we are very concerned about performance. If it is possible to suppress exception signals only for specific code sections, can this be done at compile time (i.e., by instrumenting/de-instrumenting the code during its generation, so that we can avoid expensive function calls)?
Thank you for any help!
UPDATE
It is possible to instrument/de-instrument code through the use of the feenableexcept and fedisableexcept function calls (actually, I posted this question because I was not aware of fedisableexcept, only feenableexcept... shame on me!). For instance:
#define _GNU_SOURCE  // feenableexcept/fedisableexcept are glibc extensions
#include <fenv.h>

int main() {
    float a = 1.0f;

    fedisableexcept(FE_DIVBYZERO);  // disable div-by-zero trapping
    // generates an inf that **won't be** caught
    float c = a / 0.0f;

    feenableexcept(FE_DIVBYZERO);   // re-enable div-by-zero trapping
    // this divide by zero **will be** caught (delivers SIGFPE)
    float d = a / 0.0f;

    return 0;
}
Standard C++ does not provide any way to mark code at compile-time as to whether it should run with floating-point trapping enabled or disabled. In fact, support for manipulating the floating-point environment is not required by the standard, so whether an implementation has it at all is implementation-dependent. Any answer beyond standard C++ depends on the particular hardware and software you are using, but you have not reported that information.
On typical processors, enabling and disabling floating-point trapping is achieved by changing a processor control register. You do not need a function call to do this, but it is not the function call that is expensive, as you suggest in your question. The actual instruction may consume time as it may require the processor to serialize instruction execution. (Modern processors may have hundreds of instructions executing at the same time—some being decoded, some waiting for a subunit within the processor, some in various stages of calculation, some waiting to write their results to general registers, and so on. When changing a control register, the processor may have to wait for all currently executing instructions to finish, then change the register, then start executing new instructions.) If your hardware behaves this way, there is no way to get around it. (With such hardware, which is common, it is not possible to compile code to run with or without trapping without actually executing the run-time instruction to change the control register.)
You might be able to mitigate the time cost by batching path-tracing calculations, so they are performed in groups with only two changes to the floating-point control register (one to turn traps off, one to turn them on) for the entire group.
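
As a rough illustration of that batching idea (a sketch, assuming the glibc feenableexcept/fedisableexcept extensions; the Ray, Box, and intersect_ray_box names are stand-ins for the poster's actual code):

#define _GNU_SOURCE            // feenableexcept/fedisableexcept are glibc extensions
#include <fenv.h>

struct Ray { float o[3], inv_d[3]; };                // hypothetical stand-in types
struct Box { float lo[3], hi[3]; };
bool intersect_ray_box(const Ray& r, const Box& b);  // relies on inf/NaN internally

void trace_batch(const Ray* rays, const Box& box, bool* hits, int n) {
    fedisableexcept(FE_ALL_EXCEPT);                  // one control-register write for the whole group
    for (int i = 0; i < n; ++i)
        hits[i] = intersect_ray_box(rays[i], box);
    // a second write restores the traps enabled at program startup
    feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW);
}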

When will CPSR GE[3:0] bits be modified

I read in ARM docs that:
GE[3:0], bits[19:16]
The instructions described in Parallel addition and subtraction instructions on page A4-171 update these flags to indicate the results from individual bytes or halfwords of the operation. These flags can control a later SEL instruction.
So apparently GE[3:0] stands for "eq/lt/gt"?
I ran into a couple of strange issues that I don't yet have a clue about, but they all have a CPSR value of xxxf0030, so the GE bits are 0b1111. What does that stand for? Is that normal for these GE bits?
Thanks in advance!
In the ARMv7 ARM (which matches that text), the details of how the GE flags get set are only in the operation pseudocode of the parallel instructions themselves. Sadly, they seem to have removed this nice prose description which was in the ARMv6 ARM:
Instructions that operate on halfwords:
set or clear GE[3:2] together, based on the result of the top halfword calculation
set or clear GE[1:0] together, based on the result of the bottom halfword calculation.
Instructions that operate on bytes:
set or clear GE[3] according to the result of the top byte calculation
set or clear GE[2] according to the result of the second byte calculation
set or clear GE[1] according to the result of the third byte calculation
set or clear GE[0] according to the result of the bottom byte calculation.
Each bit is set (otherwise cleared) if the results of the corresponding calculation are as follows:
for unsigned byte addition, if the result is greater than or equal to 2^8
for unsigned halfword addition, if the result is greater than or equal to 2^16
for unsigned subtraction, if the result is greater than or equal to zero
for signed arithmetic, if the result is greater than or equal to zero.
As arithmetic flags, they could have any old value (undefined at reset, and can be freely written to via APSR), so until you've specifically used one of the instructions which sets them, they're pretty much meaningless and can be ignored.
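To make the halfword rules concrete, here is a rough C model of UADD16, one of the parallel instructions in question, written from the ARMv6 description above (a sketch, not the ARM ARM pseudocode):

#include <stdint.h>

// UADD16: two independent 16-bit unsigned adds. Each half sets its GE pair
// when the add carries out, i.e. when the result is >= 2^16.
uint32_t uadd16(uint32_t rn, uint32_t rm, uint32_t *ge /* GE[3:0] */) {
    uint32_t lo = (rn & 0xFFFFu) + (rm & 0xFFFFu);
    uint32_t hi = (rn >> 16) + (rm >> 16);
    *ge = (lo >= 0x10000u ? 0x3u : 0u)               // GE[1:0] from the bottom halfword
        | (hi >= 0x10000u ? 0xCu : 0u);              // GE[3:2] from the top halfword
    return ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
}

A later SEL would then pick each result byte from one of two source registers according to the corresponding GE bit.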

Verilog: Using casex for synthesis

I'm looking to implement a parallel case block that will check the value of a 16-bit register. In some cases, I need it to check all 16 bits; in others, I only need to check a few. Is casex suitable in this scenario? In case you hadn't already inferred, it is to be synthesized.
It's part of a control matrix for a microprocessor. It's a Moore machine connected to an instruction register. Instructions are 16 bits wide. For some instructions, such as mov, the machine states are exactly the same, except the register/memory addressing is different. The instruction contains the information about what register or memory it's referencing, so I do not need to explicitly have a case for every possible instruction.
For example, if my opcode was 1111, and the remaining 12 bits were addressing, I could simply use a case for 16'b1111xxxxxxxxxxxx.
I'd like it to be parallel, and so I'm not using if-else statements. I'm unsure if this will work the way I intend it to. Any suggestions would be appreciated.
Yes, you could use either casex or casez.
Examples from IEEE 1800-2012:
With casez, you can use ? for don't-care values:
logic [7:0] ir;
casez (ir)
    8'b1???????: instruction1(ir);
    8'b01??????: instruction2(ir);
    8'b00010???: instruction3(ir);
    8'b000001??: instruction4(ir);
endcase
The following is an example of the casex statement. It demonstrates an extreme case of how don't-care conditions can be dynamically controlled during simulation. In this example, if r = 8'b01100110, then the task stat2 is called.
logic [7:0] r, mask;
mask = 8'bx0x0x0x0;
casex (r ^ mask)
    8'b001100xx: stat1;
    8'b1100xx00: stat2;
    8'b00xx0011: stat3;
    8'bxx010100: stat4;
endcase
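
Applied to the scenario in the question, casez with ? wildcards in the addressing bits gives exactly one case per opcode (a sketch; ir, next_state, and the ST_* names are illustrative, not from the question):

logic [15:0] ir;
typedef enum logic [1:0] {ST_FETCH, ST_MOV, ST_ADD} state_t;
state_t next_state;

always_comb begin
    casez (ir)
        16'b1111????????????: next_state = ST_MOV;   // opcode 1111, 12 addressing bits don't-care
        16'b1110????????????: next_state = ST_ADD;   // another register-agnostic opcode
        default:              next_state = ST_FETCH;
    endcase
end

For synthesis, casez is usually preferred over casex: casex also treats unknown (x) bits in the case expression as don't-cares, which can make simulation silently match cases that real hardware would not.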

Forcing MSVC to generate FIST instructions with the /QIfist option

I'm using the /QIfist compiler switch regularly, which causes the compiler to generate FISTP instructions to round floating point values to integers, instead of calling the _ftol helper function.
How can I make it use FIST(P) DWORD instead of QWORD?
FIST QWORD requires the CPU to store the result on stack, then read stack into register and finally store to destination memory, while FIST DWORD just stores directly into destination memory.
"FIST QWORD requires the CPU to store the result on stack, then read stack into register and finally store to destination memory, while FIST DWORD just stores directly into destination memory."
I don't understand what you are trying to say here.
The FIST and FISTP instructions differ from each other in exactly two ways:
FISTP pops the top value off of the floating point stack, while FIST does not. This is the obvious difference, and is reflected in the opcode naming: FISTP has that P suffix, which means "pop", just like FADDP, etc.
FISTP has an additional encoding that works with 64-bit (QWORD) operands. That means you can use FISTP to convert a floating point value to a 64-bit integer. FIST, on the other hand, maxes out at 32-bit (DWORD) operands.
(I don't think there's a technical reason for this. I certainly can't imagine it is related to the popping behavior. I assume that when the Intel engineers added support for 64-bit operands some time later, they figured there was no reason for a non-popping version. They were probably running out of opcode encodings.)
There are lots of online references for the x86 instruction set, or you can look in Intel's manuals (FIST/FISTP are on p. 365).
Where the two instructions read the value from, and where they store it to, is exactly the same. Both read the value from the top of the floating point stack, and both store the result to memory.
There would be absolutely no advantage to the compiler using FIST instead of FISTP. Remember that you always have to pop all values off of the floating point stack when exiting from a function, so if FIST is used, you'd have to follow it by an additional FSTP instruction. That might not be any slower, but it would needlessly inflate the code.
Besides, there's another reason that the compiler prefers FISTP: the support for 64-bit operands. It allows the code generator to be identical, regardless of what size integer you're rounding to.
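For example, only FISTP can do a one-instruction conversion to a 64-bit integer (a sketch, assuming 32-bit MSVC and its inline assembler; the function name is mine):

__int64 RoundToInt64(double value)
{
    __int64 result;
    __asm
    {
        fld   QWORD PTR value    // push the double onto the x87 stack
        fistp QWORD PTR result   // round (current mode), store 64 bits, and pop
    }
    return result;
}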
The only time you might prefer to use FIST is if you're hand-writing assembly code and want to re-use the floating point value on the stack after rounding it. The compiler doesn't need to do that.
So anyway, all of that to say that the answer to your question is no. The compiler can't be made to generate FIST instructions automatically. If you're still insistent, you can write inline assembly that uses whatever instructions you want:
int RoundToNearestEven(float value)
{
    int result;
    __asm
    {
        fld  DWORD PTR value    // push 'value' onto the x87 stack
        fist DWORD PTR result   // round (current mode, nearest-even by default) and store
        // do something with the value still on the floating point stack...
        //
        // ... but be sure to pop it off before returning
        fstp st(0)              // pop without storing
    }
    return result;
}

Is it possible for a binary integer operation that results in an overflow to overwrite adjacent memory?

This question is not about any error I'm currently seeing, it's more about theory and getting educated on the variations in HW architecture design and implementation.
Scenario 1: Assuming a 16-bit processor with 16-bit registers, 16-bit addressing, and sizeof(int) = 16 bits:
unsigned int a, b, c, d;
a=0xFFFF;
b=0xFFFF;
c=a+b;
Is it possible for the memory location next to c to be overwritten? (In this case I would expect that an overflow flag would be raised during the add operation, and c either remain unchanged or be filled with undefined data.)
Scenario 2: Assuming a 32-bit processor with 32-bit registers, 32-bit addressing, sizeof(int) = 32 bits, and sizeof(short int)=16 bits:
unsigned int a, b;
unsigned short int c, d;
a=0xFFFF;
b=0xFFFF;
c=a+b;
Is it possible for the memory location next to c to be overwritten? (I would expect that no overflow flag be raised during the add operation, but whether or not a memory access or overflow flag be raised during the assignment operation would depend on the actual design and implementation of the HW. If d was located in the upper 16 bits of the same 32-bit address location (probably not even possible with 32-bit addressing), it might be overwritten.)
Scenario 3: Assuming a 32-bit processor with 32-bit registers, 16-bit addressing, sizeof(int) = 32 bits, and sizeof(short int)=16 bits:
unsigned int a, b;
unsigned short int c, d;
a=0xFFFF;
b=0xFFFF;
c=a+b;
Is it possible for the memory location next to c to be overwritten? (I would expect some overflow flag or memory violation flag to be raised during the type conversion and assignment operation.)
Scenario 4: Assuming a 32-bit processor with 32-bit registers, 32-bit addressing, and sizeof(int) = 32 bits:
unsigned int a, b;
struct {
    unsigned int c:16;
    unsigned int d:16;
} e;
a=0xFFFF;
b=0xFFFF;
e.c=a+b;
Is it possible for the memory location next to c, namely d, to be overwritten? (In this case, since c and d are expected to reside in the same 32-bit address and both are technically 32-bit integers, it's conceivable that no overflow flags be raised during addition or assignment, and d could be affected.)
I have not tried to test this on actual hardware because my question is more about theory and possible variations in design and implementation. Any insight would be appreciated.
Is it possible for a binary integer operation that results in an overflow to overwrite adjacent memory?
Are there currently any HW implementations that suffer from similar memory overwrite problems, or have there been systems in the past that had this problem?
What mechanisms do typical processors use to keep neighboring memory from being overwritten by arithmetic and assignment operations?
None of your scenarios 1-4 will result in any memory corruption, or in writing "past" the storage of the integer location you are performing arithmetic on. C specifies what happens on overflow of unsigned integers. This assumes the compiler produces the code it's supposed to; nothing will guard you against a buggy compiler that generates code which copies 4 bytes into a 2-byte variable.
Here's what C99 (6.2.5) says:
"A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
So, it's well defined what will happen when you try to "overflow" an unsigned integer.
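For example (a small demo using scenario 2's values, assuming 32-bit int):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int x = UINT_MAX;
    printf("%u\n", x + 1u);            // prints 0: reduced modulo UINT_MAX + 1

    // Scenario 2 from the question: the sum fits in a 32-bit unsigned int,
    // and the narrowing store reduces modulo 2^16 (the conversion rule,
    // C99 6.3.1.3); a neighboring variable is never touched.
    unsigned int a = 0xFFFF, b = 0xFFFF;
    unsigned short c = (unsigned short)(a + b);
    printf("%#x\n", (unsigned)c);      // prints 0xfffe
    return 0;
}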
Now, if your integers had been signed integers, it is another story. According to C, overflowing a signed integer results in undefined behavior, and undefined behavior means anything, including memory corruption, could occur. I have not yet seen a C compiler that would corrupt anything when overflowing an integer, though.
What mechanisms do typical processors use to keep neighboring memory from being overwritten by arithmetic and assignment operations?
There are no guards against neighboring memory where assignment and arithmetic operations are concerned. The processor simply executes the machine code instructions given to it (and if those instructions overwrite memory they were not supposed to, as expressed in a higher-level language, the processor does not care).
At a slightly different level, the CPU might issue a trap if it cannot carry out an operation (e.g. it references a memory location that does not exist), or if it attempts an illegal operation (e.g. division by zero, an opcode the processor does not understand, or an unaligned data access).
The addition itself is carried out inside the processor, as far as I know, so whatever you do, the add will be done inside the CPU (more precisely, in the ALU).
On overflow, the overflow flag is set, the result is still in the register, and it is then copied back to your memory location with no risk of corrupting adjacent memory.
This is how the code would (sort of) be translated into asm:
mov ax, word ptr [memory location of a]
mov bx, word ptr [memory location of b]
add ax, bx
mov word ptr [memory location of c], ax
So, as you can see, c will only ever hold what was in ax (which has a known, fixed size), whether or not an overflow occurred.
In C, the behavior of overflows with unsigned types is well-defined. Any implementation in which overflow causes any result other than what the standard predicts is non-compliant.
The behavior of overflows with signed types is undefined. While the most common effects of an overflow would either be the assignment of some erroneous value or an outright crash, nothing in the C standard guarantees that the processor won't trigger an overflow fault in a way that the compiled code tries to recover from, but which for some reason trashes the contents of a stack variable or a register. It's also possible that an overflow could cause a flag to get set in a way that code doesn't expect, and that such a flag might cause erroneous behavior on future computations.
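Because of this, portable code that worries about signed overflow tests before the operation rather than after; a minimal sketch:

#include <limits.h>

// Nonzero if a + b would overflow a signed int. The test itself only
// computes representable values, so it never overflows.
int add_overflows(int a, int b) {
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}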
Other languages may have different semantics for what happens when an overflow occurs.
Note: I've seen processors which trap on overflow, processors where unexpected traps that happen at the same time as an external interrupt may cause data corruption, and processors where an unexpected unsigned overflow could cause a subsequent computation to be off by one. I don't know of any where a signed overflow would latch a flag that would interfere with subsequent calculations, but some DSPs have interesting overflow behaviors, so I wouldn't be surprised if one exists.
