What does MOVZON X'FF' do in quickjob? I believe it just moves input to output. Please let me know if I am wrong.
The smallest unit of information is the bit. Processors usually don't work on single bits when accessing memory; they work on bytes. A byte consists of 8 consecutive bits (for most architectures).
To describe how different processor instructions work with bytes, bytes are sometimes subdivided into two 4-bit groups, called nibbles. Counting left to right, bits 0-3 are called the "left nibble", "high order nibble", or "zone nibble". Bits 4-7, the right half, are called the "right nibble", "low order nibble", or "number nibble".
There are instructions that work on the whole byte, e.g. MOVE. And there are instructions that work on nibbles. MOVEZONE (MOVZON) works on zone nibbles and leaves the number nibbles alone; MOVENUM (MOVNUM) works on number nibbles, and leaves the zone nibbles alone.
Instructions of this kind are usually used with bytes that contain numeric values, coded as either zoned decimal or packed decimal. They are rather exotic when working on text data.
This reference is used.
Given the instruction:
MOVZON X'FF' to OFA1
The receiving field OFA1 refers to the first record position (the 1) of the output file (the OF) designated as A. The instruction will set the high-order bits (0-3 or "zone bits") of the first position to ones, matching bits 0-3 of the X'FF'.
However, as a matter of style, it appears the instruction should have been written as MOVZON X'F0' TO OFA1, since the low-order bits (4-7) of the source are not used.
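A minimal Python sketch of the nibble behaviour described above, assuming MOVZON copies only bits 0-3 of the source byte into the target and leaves bits 4-7 of the target alone. The function name movzon and the sample byte 0xC5 are illustrative choices, not taken from the original program.

```python
def movzon(source: int, target: int) -> int:
    """Model of a zone move: take the high nibble from source,
    keep the low (number) nibble of target."""
    return (source & 0xF0) | (target & 0x0F)


before = 0xC5                      # hypothetical byte at output position 1
after_ff = movzon(0xFF, before)    # source written as X'FF'
after_f0 = movzon(0xF0, before)    # source written as X'F0'

# Both sources give the same result because the low nibble of the source is ignored.
print(hex(after_ff), hex(after_f0))  # prints: 0xf5 0xf5
```

Under this model the byte is not simply copied from input to output: only its zone nibble is forced to F.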
For https://github.com/KastnerRG/riffa/blob/master/fpga/riffa_hdl/fifo_packer_128.v#L113, is it sensible to have 224 bits for the signal 'rPackedData'?
Besides, does anyone have any idea about the following code segment, which ORs together the shifted-in data? Should it not be AND instead?
_rPackedData = ((rPackedData>>(32*{rPackedCount[2], 2'd0})) | (rDataMasked<<(32*rPackedCount[1:0])));
The left half of that expression shifts stored data OUT, zeroing some of the higher bits. The right half shifts new data IN, placing it at the required position in the higher portion of the buffer and padding the lower bits with zeros. To get the updated content of the buffer, you need to OR both halves of the expression. If you AND them instead, you get a completely zeroed buffer, since the valid data from the left and right halves of the expression do not overlap.
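The point about non-overlapping fields can be checked with a small Python model. This is a simplified sketch, not a model of the RIFFA module: it assumes a buffer of 32-bit words with words 0 and 1 already filled, and a new word shifted in at word position 2. The names stored, new_word and count are made up for the illustration.

```python
WORD_BITS = 32

stored = 0x22222222_11111111   # words 0 and 1 already hold data
new_word = 0x33333333          # incoming word
count = 2                      # number of words already stored

# Shift the new word up to its slot; its lower bits are padded with zeros.
shifted_in = new_word << (WORD_BITS * count)

combined_or = stored | shifted_in    # keeps the old words and adds the new one
combined_and = stored & shifted_in   # the fields do not overlap, so nothing survives

print(hex(combined_or))   # prints: 0x333333332222222211111111
print(hex(combined_and))  # prints: 0x0
```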
I'm working with a serial protocol. Messages are of variable length that is known in advance. On both transmission and reception sides, I have the message saved to a shift register that is as long as the longest possible message.
I need to calculate CRC32 of these registers, the same as for Ethernet, as fast as possible. Since messages are variable length (anything from 12 to 64 bits), I chose serial implementation that should run already in parallel with reception/transmission of the message.
I ran into a problem with the organization of data before calculation. As specified here, the data needs to be bit-reversed, padded with 32 zeros and complemented before calculation.
Even if I forget the part about running in parallel with receiving or transmitting data, how can I effectively get only my relevant message from max-length register so that I can pad it before calculation? I know that ideas like
newregister[31:0] <= oldregister[X:0] // X is my variable length
don't work. It's also impossible to have the generate for loop clause that I use to bit-reverse the old vector run variable number of times. I could use a counter to serially move data to desired length, but I cannot afford to lose this much time.
Alternatively, is there an operation that would directly give me the padded and complemented result? I do not even have an idea how to start developing such an idea.
Thanks in advance for any insight.
You've misunderstood how to do a serial CRC; the Python question you quote isn't relevant. You only need a 32-bit shift register, with appropriate feedback taps. You'll get a million hits if you do a Google search for "serial crc" or "ethernet crc". There's at least one Xilinx app note that does the whole thing for you. You'll need to be careful to preload the 32-bit register with the correct value, and whether or not you invert the 32-bit data on completion.
EDIT
The first hit on 'xilinx serial crc' is xapp209, which has the basic answer in fig 1. On top of this, you need the taps, the preload value, whether or not to invert the answer, and the value to check against on reception. I'm sure they used to do all this in another app note, but I can't find it at the moment. The basic references are the Ethernet 802.3 spec (3.2.8 Frame check Sequence field, which was p27 in the original book), and the V42 spec (8.1.1.6.2 32-bit frame check sequence, page 311 in the old CCITT Blue Book). Both give the taps. V42 requires a preload to all 1's and inversion on completion, and gives the test value on reception. Warren has a (new) chapter in Hacker's Delight, which shows the taps graphically; see his website.
You only need the online generators to check your solution. Be careful, though: they will generally have different preload values, and may or may not invert the result, and may or may not be bit-reversed.
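As a software reference for such a check, here is a minimal Python sketch of a bit-serial CRC-32 that uses a single 32-bit register. It assumes the common Ethernet-style conventions: all-ones preload, inversion on completion, and bits fed least significant bit first, with the polynomial in its bit-reversed form 0xEDB88320. The function names are illustrative, and a hardware design may need a different bit order or preload.

```python
import zlib

POLY_REFLECTED = 0xEDB88320  # CRC-32 polynomial in bit-reversed form


def crc32_serial(bits):
    """Feed the message one bit at a time through a 32-bit shift register."""
    crc = 0xFFFFFFFF                      # preload with all ones
    for bit in bits:
        feedback = (crc ^ bit) & 1        # incoming bit XOR register output bit
        crc >>= 1                         # shift the register by one position
        if feedback:
            crc ^= POLY_REFLECTED         # apply the feedback taps
    return crc ^ 0xFFFFFFFF               # invert on completion


def bytes_to_bits_lsb_first(data):
    """Expand bytes into a bit list, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


message = b"123456789"
result = crc32_serial(bytes_to_bits_lsb_first(message))
print(hex(result))                        # prints: 0xcbf43926
print(result == zlib.crc32(message))      # prints: True
```

Because the loop consumes one bit per step, the message length does not need to be a multiple of 8; a 12-bit message is simply a list of 12 bits, with no padding or separate bit-reversal stage.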
Since X is a variable, you will need to do the bit assignments with a for-loop. The for-loop needs to be inside an always block, and it must be statically unrollable (i.e. the starting index, ending index, and step value must be constants).
for (i = 0; i < 32; i = i + 1) begin
  if (i < X)
    newregister[i] <= oldregister[i];
  else
    newregister[i] <= 1'b0; // pad zeros
end
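The effect of that loop can be modelled in Python to check the intended behaviour before writing a testbench. This is only a sketch: the function names are made up, and the 32-bit width is taken from the loop bounds above. It also shows that the per-bit loop is equivalent to ANDing with a mask of x ones.

```python
WIDTH = 32


def keep_low_bits_loop(old: int, x: int) -> int:
    """Bit-by-bit model of the loop: copy bits below x, pad the rest with zeros."""
    new = 0
    for i in range(WIDTH):
        if i < x:
            new |= old & (1 << i)
    return new


def keep_low_bits_mask(old: int, x: int) -> int:
    """Same result using a single mask of x ones."""
    return old & ((1 << x) - 1) & ((1 << WIDTH) - 1)


old = 0xDEADBEEF
print(hex(keep_low_bits_loop(old, 12)))  # prints: 0xeef
print(hex(keep_low_bits_mask(old, 12)))  # prints: 0xeef
```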
I read in ARM docs that:
GE[3:0], bits[19:16]
The instructions described in Parallel addition and subtraction instructions on
page A4-171 update these flags to indicate the results from individual bytes or halfwords
of the operation. These flags can control a later SEL instruction.
So apparently GE[3:0] stands for "eq/lt/gt"?
I ran into a couple of strange issues that I don't yet have a clue about, but they all have a CPSR value of xxxf0030, so the GE bits are 0b1111? What does that stand for? Is that normal for these GE bits?
Thanks in advance!
In the ARMv7 ARM (which matches that text), the details of how the GE flags get set are only in the operation pseudocode of the parallel instructions themselves. Sadly, they seem to have removed this nice prose description which was in the ARMv6 ARM:
Instructions that operate on halfwords:
set or clear GE[3:2] together, based on the result of the top halfword calculation
set or clear GE[1:0] together, based on the result of the bottom halfword calculation.
Instructions that operate on bytes:
set or clear GE[3] according to the result of the top byte calculation
set or clear GE[2] according to the result of the second byte calculation
set or clear GE[1] according to the result of the third byte calculation
set or clear GE[0] according to the result of the bottom byte calculation.
Each bit is set (otherwise cleared) if the results of the
corresponding calculation are as follows:
for unsigned byte addition, if the result is greater than or equal to 2^8
for unsigned halfword addition, if the result is greater than or equal to 2^16
for unsigned subtraction, if the result is greater than or equal to zero
for signed arithmetic, if the result is greater than or equal to zero.
As arithmetic flags, they could have any old value (undefined at reset, and can be freely written to via APSR), so until you've specifically used one of the instructions which sets them, they're pretty much meaningless and can be ignored.
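To make the rules above concrete, here is a rough Python model of an unsigned byte-wise add (in the style of UADD8) that sets one GE bit per byte lane, followed by a SEL-style selection driven by those bits. It is a sketch written from the prose description, not from the architecture pseudocode, and the function names and the sample CPSR value 0x600F0030 are illustrative.

```python
def uadd8(a: int, b: int):
    """Add four byte lanes independently; GE[lane] is set if the lane sum >= 2^8."""
    result = 0
    ge = 0
    for lane in range(4):
        total = ((a >> (8 * lane)) & 0xFF) + ((b >> (8 * lane)) & 0xFF)
        result |= (total & 0xFF) << (8 * lane)
        if total >= 0x100:
            ge |= 1 << lane
    return result, ge


def sel(ge: int, rn: int, rm: int) -> int:
    """For each byte lane, take the byte from rn if GE[lane] is set, else from rm."""
    result = 0
    for lane in range(4):
        source = rn if (ge >> lane) & 1 else rm
        result |= source & (0xFF << (8 * lane))
    return result


def ge_bits(cpsr: int) -> int:
    """Extract GE[3:0] from bits [19:16] of a status register value."""
    return (cpsr >> 16) & 0xF


total, ge = uadd8(0xFF01FF01, 0x01010101)
print(hex(total), bin(ge))                    # prints: 0x20002 0b1010
print(hex(sel(ge, 0x11223344, 0xAABBCCDD)))   # prints: 0x11bb33dd
print(bin(ge_bits(0x600F0030)))               # prints: 0b1111
```

In this model, GE is a set of four independent per-lane flags rather than an eq/lt/gt encoding, so a value of 0b1111 only means that all four bits happen to be set.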
I want to move the number 01,02,03,04 individually into the register D1 so that after all the moves have completed the result in register D1 will be 01020304.
The way I'm thinking of solving this problem is, move 01 to D1. Then somehow shift it 2 digits to the left and then move 02 in. And so on to get the solution.
How should I do this?
This can be done in several ways. The most logical is to load the intended topmost byte first, then left-shift the register (by 8 bits = 1 byte) and load the next byte:
move.b #$01,d0
lsl.w #8,d0 ; could lsl.l here, too
move.b #$02,d0
lsl.l #8,d0
move.b #$03,d0
lsl.l #8,d0
move.b #$04,d0
; d0 = $01020304
A somewhat more confusing, but (on the 68000) faster method is to load the most significant word as described above into the lower word, then use the SWAP instruction to switch register halves, then load the least significant word normally:
move.b #$01,d0
lsl.w #8,d0
move.b #$02,d0
swap d0
move.b #$03,d0
lsl.w #8,d0
move.b #$04,d0
; d0 = $01020304
The point here is that SWAP performs a rotate by 16 on the entire register, so the individual rotations can be performed using word sized shifts. Also, since the 68000 doesn't have a barrel shifter, shifting performance is dependent on shift distance, meaning shifting a register by 8 is relatively slow, while SWAP performs quickly.
On the "bigger" 68K members you won't see much performance difference, because they shift quickly, regardless of shift distance.
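Both instruction sequences can be traced with a small Python model of a 32-bit data register. This is a sketch under simplifying assumptions: only the register value is modelled (no condition codes or timing), and the helper names mirror the mnemonics but are otherwise made up. The register starts with arbitrary contents to show that no stale bits survive.

```python
MASK32 = 0xFFFFFFFF


def move_b(d: int, imm: int) -> int:
    """move.b #imm,d : replace only the lowest byte."""
    return (d & 0xFFFFFF00) | (imm & 0xFF)


def lsl_w(d: int, n: int) -> int:
    """lsl.w #n,d : shift the low word, leave the high word untouched."""
    return (d & 0xFFFF0000) | ((d << n) & 0xFFFF)


def lsl_l(d: int, n: int) -> int:
    """lsl.l #n,d : shift the whole 32-bit register."""
    return (d << n) & MASK32


def swap(d: int) -> int:
    """swap d : exchange the high and low words."""
    return ((d << 16) | (d >> 16)) & MASK32


def shift_only(d: int) -> int:
    d = lsl_w(move_b(d, 0x01), 8)
    d = lsl_l(move_b(d, 0x02), 8)
    d = lsl_l(move_b(d, 0x03), 8)
    return move_b(d, 0x04)


def with_swap(d: int) -> int:
    d = lsl_w(move_b(d, 0x01), 8)
    d = swap(move_b(d, 0x02))
    d = lsl_w(move_b(d, 0x03), 8)
    return move_b(d, 0x04)


garbage = 0xDEADBEEF  # arbitrary initial register contents
print(hex(shift_only(garbage)))  # prints: 0x1020304
print(hex(with_swap(garbage)))   # prints: 0x1020304
```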