I read in ARM docs that:
GE[3:0], bits[19:16]
The instructions described in Parallel addition and subtraction instructions on
page A4-171 update these flags to indicate the results from individual bytes or halfwords
of the operation. These flags can control a later SEL instruction.
So apparently GE[3:0] stands for "eq/lt/gt"?
I came into a couple of strange issues which I yet don't have a clue, but they all have CPSR value xxxf0030, so the GE bits are 0b1111? What does that stands for? Is it normal for these GE bits?
Thanks in advance!
In the ARMv7 ARM (which matches that text), the details of how the GE flags get set are only in the operation pseudocode of the parallel instructions themselves. Sadly, they seem to have removed this nice prose description which was in the ARMv6 ARM:
Instructions that operate on halfwords:
set or clear GE[3:2] together, based on the result of the top halfword calculation
set or clear GE[1:0] together, based on the result of the bottom halfword calculation.
Instructions that operate on bytes:
set or clear GE[3] according to the result of the top byte calculation
set or clear GE[2] according to the result of the second byte calculation
set or clear GE[1] according to the result of the third byte calculation
set or clear GE[0] according to the result of the bottom byte calculation.
Each bit is set (otherwise cleared) if the results of the
corresponding calculation are as follows:
for unsigned byte addition, if the result is greater than or equal to 2^8
for unsigned halfword addition, if the result is greater than or equal to 2^16
for unsigned subtraction, if the result is greater than or equal to zero
for signed arithmetic, if the result is greater than or equal to zero.
As arithmetic flags, they could have any old value (undefined at reset, and can be freely written to via APSR), so until you've specifically used one of the instructions which sets them, they're pretty much meaningless and can be ignored.
Related
As i know, the result of dividing by 0 in RISC-V ist -1(signed), 0xFFFFFFFF(hex).
But i dont know the reason why the RISC-V outputs this code.
Also How is this procedure justified?
Thank you!
We considered raising exceptions on integer divide by zero, with these exceptions causing a trap in most execution environments. However, this would be the only arithmetic trap in the standard ISA (floating-point exceptions set flags and write default values, but do not cause traps) and would require language implementors to interact with the execution environment’s trap handlers for this case. Further, where language standards mandate that a divide-by-zero exception must cause an immediate control flow change, only a single branch instruction needs to be added to each divide operation, and this branch instruction can be inserted after the divide and should normally be very predictably not taken, adding little runtime overhead.
The value of all bits set is returned for both unsigned and signed divide by zero to simplify the divider circuitry. The value of all 1s is both the natural value to return for unsigned divide, representing the largest unsigned number, and also the natural result for simple unsigned divider implementations. Signed division is often implemented using an unsigned division circuit and specifying the same overflow result simplifies the hardware.
from https://five-embeddev.com/riscv-isa-manual/latest/m.html
what does MOVZON X'FF' do in quickjob. I believe it just moves input to output. Please let me know, if I am wrong.
The smallest unit of information is the bit. Processors usually don‘t work on single bits when accessing memory; they work on bytes. A byte consists of 8 consecutive bits (for most architectures).
To describe how different processor instructions work with bytes, bytes are sometimes subdivided into two 4-bit groups, called nibbles. Counting left to right, bits 0-3 are called „left nibble“, „high order nibble“, or „zone nibble“. Bits 4-7, the right half, are called „right nibble“, „low order nibble“, or „number nibble“.
There are instructions that work on the whole byte, e.g. MOVE. And there are instructions that work on nibbles. MOVEZONE (MOVZON) works on zone nibbles and leaves the number nibbles alone; MOVENUM (MOVNUM) works on number nibbles, and leaves the zone nibbles alone.
This kind of instructions are usually used with bytes that contain numeric values, coded as either zoned decimal, or packed decimal. They are rather exotic when working on text data.
This reference is used.
Given the instruction:
MOVZON X'FF' to OFA1
The receiving field OFA1 refers to the first record position (the 1) of the output file ( the OF) designated as A. The instruction will set the high-order bits (0-3 or "zone bits") of the first position to ones, matching bits 0-3 of the X'FF'.
However, it appears, as a matter of style, the instruction should have been written as MOVZON X'F0' TO OAF1 since the low-order bits (4-7) are not used.
I'm working with a serial protocol. Messages are of variable length that is known in advance. On both transmission and reception sides, I have the message saved to a shift register that is as long as the longest possible message.
I need to calculate CRC32 of these registers, the same as for Ethernet, as fast as possible. Since messages are variable length (anything from 12 to 64 bits), I chose serial implementation that should run already in parallel with reception/transmission of the message.
I ran into a problem with organization of data before calculation. As specified here , the data needs to be bit-reversed, padded with 32 zeros and complemented before calculation.
Even if I forget the part about running in parallel with receiving or transmitting data, how can I effectively get only my relevant message from max-length register so that I can pad it before calculation? I know that ideas like
newregister[31:0] <= oldregister[X:0] // X is my variable length
don't work. It's also impossible to have the generate for loop clause that I use to bit-reverse the old vector run variable number of times. I could use a counter to serially move data to desired length, but I cannot afford to lose this much time.
Alternatively, is there an operation that would directly give me the padded and complemented result? I do not even have an idea how to start developing such an idea.
Thanks in advance for any insight.
You've misunderstood how to do a serial CRC; the Python question you quote isn't relevant. You only need a 32-bit shift register, with appropriate feedback taps. You'll get a million hits if you do a Google search for "serial crc" or "ethernet crc". There's at least one Xilinx app note that does the whole thing for you. You'll need to be careful to preload the 32-bit register with the correct value, and whether or not you invert the 32-bit data on completion.
EDIT
The first hit on 'xilinx serial crc' is xapp209, which has the basic answer in fig 1. On top of this, you need the taps, the preload value, whether or not to invert the answer, and the value to check against on reception. I'm sure they used to do all this in another app note, but I can't find it at the moment. The basic references are the Ethernet 802.3 spec (3.2.8 Frame check Sequence field, which was p27 in the original book), and the V42 spec (8.1.1.6.2 32-bit frame check sequence, page 311 in the old CCITT Blue Book). Both give the taps. V42 requires a preload to all 1's, invert of completion, and gives the test value on reception. Warren has a (new) chapter in Hacker's Delight, which shows the taps graphically; see his website.
You only need the online generators to check your solution. Be careful, though: they will generally have different preload values, and may or may not invert the result, and may or may not be bit-reversed.
Since X is a viarable, you will need to bit assignments with a for-loop. The for-loop needs to be inside an always block and the for-loop must static unroll (ie the starting index, ending index, and step value must be constants).
for(i=0; i<32; i=i+1) begin
if (i<X)
newregister[i] <= oldregister[i];
else
newregister[i] <= 1'b0; // pad zeros
end
I'm using the /QIfist compiler switch regularly, which causes the compiler to generate FISTP instructions to round floating point values to integers, instead of calling the _ftol helper function.
How can I make it use FIST(P) DWORD, instead of QWORD?
FIST QWORD requires the CPU to store the result on stack, then read stack into register and finally store to destination memory, while FIST DWORD just stores directly into destination memory.
FIST QWORD requires the CPU to store the result on stack, then read stack into register and finally store to destination memory, while FIST DWORD just stores directly into destination memory.
I don't understand what you are trying to say here.
The FIST and FISTP instructions differ from each other in exactly two ways:
FISTP pops the top value off of the floating point stack, while FIST does not. This is the obvious difference, and is reflected in the opcode naming: FISTP has that P suffix, which means "pop", just like ADDP, etc.
FISTP has an additional encoding that works with 64-bit (QWORD) operands. That means you can use FISTP to convert a floating point value to a 64-bit integer. FIST, on the other hand, maxes out at 32-bit (DWORD) operands.
(I don't think there's a technical reason for this. I certainly can't imagine it is related to the popping behavior. I assume that when the Intel engineers added support for 64-bit operands some time later, they figured there was no reason for a non-popping version. They were probably running out of opcode encodings.)
There are lots of online references for the x86 instruction set. For example, this site is the top hit for most Google searches. Or you can look in Intel's manuals (FIST/FISTP are on p. 365).
Where the two instructions read the value from, and where they store it to, is exactly the same. Both read the value from the top of the floating point stack, and both store the result to memory.
There would be absolutely no advantage to the compiler using FIST instead of FISTP. Remember that you always have to pop all values off of the floating point stack when exiting from a function, so if FIST is used, you'd have to follow it by an additional FSTP instruction. That might not be any slower, but it would needlessly inflate the code.
Besides, there's another reason that the compiler prefers FISTP: the support for 64-bit operands. It allows the code generator to be identical, regardless of what size integer you're rounding to.
The only time you might prefer to use FIST is if you're hand-writing assembly code and want to re-use the floating point value on the stack after rounding it. The compiler doesn't need to do that.
So anyway, all of that to say that the answer to your question is no. The compiler can't be made to generate FIST instructions automatically. If you're still insistent, you can write inline assembly that uses whatever instructions you want:
int32 RoundToNearestEven(float value)
{
int32 result;
__asm
{
fld DWORD PTR value
fist DWORD PTR result
// do something with the value on the floating point stack...
//
// ... but be sure to pop it off before returning
fstp st(0)
}
return result;
}
I stumbled upon a statement in Intel Software developers manual:
"For LGDT, LIDT, LLDT, LTR, SGDT, SIDT, SLDT, STR, the exit qualification receives the value of the instruction’s displacement field, which is sign-extended to 64 bits if necessary (32 bits on processors that do not support Intel 64 architecture). If the instruction has no displacement (for example, has a register operand), zero is stored into the exit qualification. "
Now if I have an instruction LIDT 0xf290, then is "0xf290" a displacement? I think answer is yes.
So, my confusion is what all constitute as displacement? I was under impression that displacement is something which is calculated with respect to current eip value.
For eg. jmp xxx (In intrasegment jumps this will be a displacement. But for intersegment jumps, it should be absolute address.) If that is the case then why LIDT loads a relative address?
A displacement is just an offset from some origin, which may be a Base+Index*Scale, or 0. The other operand x86 has that can hold large values is immediate, which is useful for things like adding constants (e.g. ADD $42, %eax).
Incidentally, it appears that relative jumps use the immediate field, probably because they modify EIP by a constant.