Verilog operation unexpected result - verilog

I am studying the Verilog language and ran into a problem.
integer intA;
...
intA = - 4'd12 / 3; // expression result is 1431655761.
// -4'd12 is effectively a 32-bit reg data type
This snippet is from the standard, and it blew our minds. The standard says that 4'd12 is the 4-bit number 1100.
Then -4'd12 = 0100. That is fine so far.
To perform the division, both operands have to be brought to the same size, from 4 to 32 bits. Since -4'd12 is unsigned, it should be equal to 32'b0000...0100, but it is equal to 32'b1111...10100. That does not look right, but on to the next step.
My version of the division: -4'd12 / 3 = 32'b0000...0100 / 32'b0000...0011 = 1
The standard's version: - 4'd12 / 3 = 1431655761
Can anyone tell me why? Why does the 4-bit number keep extra bits?

You need to read section 11.8.2 Steps for evaluating an expression of the 1800-2012 LRM. The key piece you are missing is that the operand is 4'd12, and that it is sized to 32 bits as an unsigned value before the unary - operator is applied.
If you want the 4-bit value treated as signed (4'sd12 is the bit pattern 1100, which is -4 as a signed 4-bit number, so negating it gives 4), then you need to write
intA = - 4'sd12 / 3; // result is 1

Here the parser treats 4'd12 as an unsigned number, which is extended to 32 bits before the unary minus is applied. The minus sign then takes the two's complement of all 32 bits (invert, then add 1), so the result is
-(4'd12) = -(28 zeros + 1100) = 28 ones + 0100 =
11111111111111111111111111110100, which is 4294967284 as an unsigned value. Dividing 4294967284 by 3 with integer (truncating) division gives 1,431,655,761.
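The two evaluations discussed in this thread can be reproduced outside a simulator. The following is a minimal C sketch, not Verilog itself: it assumes a 32-bit expression context and models the unsigned literal with uint32_t arithmetic. The function names are made up for illustration.

```c
#include <stdint.h>

/* Model of Verilog's  -4'd12 / 3  in a 32-bit context: the unsigned
   literal is zero-extended to 32 bits first, then negated modulo 2^32. */
uint32_t neg_unsigned_then_divide(uint32_t literal, uint32_t divisor)
{
    uint32_t negated = 0u - literal;   /* two's complement, wraps mod 2^32 */
    return negated / divisor;          /* unsigned, truncating division */
}

/* Sign-extend the low 4 bits, as a signed literal such as 4'sd12 would be. */
int32_t sign_extend_4(uint32_t bits)
{
    int32_t v = (int32_t)(bits & 0xFu);
    return (v & 0x8) ? v - 16 : v;
}

/* Model of  -4'sd12 / 3 : sign-extend, negate, then signed division. */
int32_t neg_signed_then_divide(uint32_t literal_bits, int32_t divisor)
{
    return -sign_extend_4(literal_bits) / divisor;
}
```

With these helpers, neg_unsigned_then_divide(12, 3) yields 1431655761 and neg_signed_then_divide(12, 3) yields 1, matching the two results quoted from the standard.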
keep smiling :)

Related

Signed and Unsigned Multiplication Problem in Verilog

I have been working on approximate multiplication recently, and I want to write Verilog code for dynamic segment multiplication (DSM). The idea is to find the first bit position in the number that holds a 1, then take the next 3 bits after it to form a 4-bit segment that represents the 8-bit number. You then multiply the 4-bit segments instead of the 8-bit values and apply some shifts to get the final result, which saves a lot of hardware. My problem is with the multiplication of these segments, because sometimes they should be treated as signed and sometimes as unsigned. Below are the last 3 lines of my code. a and b are the 8-bit inputs, and m1 and m2 are the segments. I declared m1 and m2 as reg signed [3:0], and a and b as input signed [7:0].
Here is my code:
assign out = ({a[7],b[7]}==2'b11)||({a[7],b[7]}==2'b00) ? ($unsigned(m1)*$unsigned(m2)) << (shift_m1+shift_m2) : 16'dz;
assign out = ({a[7],b[7]}==2'b01) ? ($signed({1'b0,m1})*$signed(m2)) << (shift_m1+shift_m2) : 16'dz;
assign out = ({a[7],b[7]}==2'b10) ? ($signed(m1)*$signed({1'b0,m2})) << (shift_m1+shift_m2) : 16'dz;
But in simulation Verilog always treats the segments as unsigned and does an unsigned multiplication, even though I marked them as signed or unsigned...
Can anyone help? I have read all of the questions about this problem on Stack Overflow and elsewhere, but I still cannot solve this issue...
The rules for non-self determined operands say that if one operand is unsigned, the result is unsigned. 16'dz is unsigned.
The conditional operator i ? j : k has the condition operand i self-determined, but the two selections j and k are in a context based on the assignment or expression it is a part of. The shift operator i << j has the shift amount operand j self-determined.
All of the context rules are explained in section 11.6.1 Rules for expression bit lengths in the IEEE 1800-2017 SystemVerilog LRM.
You can get your desired result by using the signed literal 16'sdz.
However the logic you wrote may not be synthesizable for certain technologies that do not allow using a z state inside your device. The correct and more readable way is using a case statement:
always @(*) case({a[7],b[7]})
  2'b00,
  2'b11: out = $unsigned(m1)*$unsigned(m2) << shift_m1+shift_m2;
  2'b01: out = $signed({1'b0,m1})*m2 << shift_m1+shift_m2;
  2'b10: out = m1*$signed({1'b0,m2}) << shift_m1+shift_m2;
endcase
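To see how much the signedness of the context changes the product, here is a small C model of a 4-bit segment multiplied under the three interpretations used above. It is only a sketch: the function names are hypothetical, and the shifts by shift_m1 and shift_m2 are left out.

```c
/* Interpret the low 4 bits as a signed two's complement value (-8..7). */
static int sext4(unsigned bits)
{
    int v = (int)(bits & 0xFu);
    return (v & 0x8) ? v - 16 : v;
}

/* Both segments unsigned, like $unsigned(m1) * $unsigned(m2). */
int seg_mul_unsigned(unsigned m1, unsigned m2)
{
    return (int)((m1 & 0xFu) * (m2 & 0xFu));
}

/* Both segments signed, like m1 * m2 in a fully signed context. */
int seg_mul_signed(unsigned m1, unsigned m2)
{
    return sext4(m1) * sext4(m2);
}

/* m1 as a magnitude, m2 signed, like $signed({1'b0, m1}) * m2. */
int seg_mul_mixed(unsigned m1, unsigned m2)
{
    return (int)(m1 & 0xFu) * sext4(m2);
}
```

For the bit patterns m1 = 1101 and m2 = 0101, the unsigned product is 65 while the signed product is -15, which is the kind of difference an unsigned operand elsewhere in the expression silently introduces.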

Floating point addition with LSB error

I'm implementing a hardware double precision adder with Verilog. During the verification phase when I compare my hardware output to MATLAB (or C) double precision addition outputs I found some weird cases where the LSB is not matching, taking into account that I'm using the same rounding mode (round to nearest even). My question is about the accuracy of the C calculation, is it truly accurate in doing the rounding or it's limited to some CPU architecture (32 or 64 bits)?
Here's an example,
A = 0x62a5a1c59bd10037 = 1.5944933396238637e+167
B = 0x62724bc40659bf0c = 1.685748657333889e+166 = 0.1685748657333889e+167
The correct output (just by doing the addition of the above real numbers manually)
= 1.7630682053572526e+167 = 0x62a7eb3e1c9c3819 (this matches my hardware)
When I try doing A+B in C, the result is equal to
= 1.7630682053572525e+167 = 0x62a7eb3e1c9c3818
When I try this application to check the intermediate operations
http://www.ecs.umass.edu/ece/koren/arith/simulator/FPAdd/
I can see from mantissa addition that C is not doing the rounding correctly (round to nearest even). In this case the mantissa should be rounded by adding one. Any idea why this is happening?
The operation of http://www.ecs.umass.edu/ece/koren/arith/simulator/FPAdd/ is correct. The final round-to-nearest-even step rounds downward:
A+B = 1.0111111010110011111000011100100111000011100000011000|10 *2^555
The bits after the | are the ones that have to be dropped. They are 10, which is exactly halfway, so the result keeps the last mantissa bit at 0 (even) instead of rounding up to 1.
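The tie case can be checked directly from the bit patterns given in the question. This is a minimal C sketch, assuming the platform's double is IEEE 754 binary64 and the default round-to-nearest-even mode is active; the helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double and back, without aliasing issues. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Add two doubles given as raw bit patterns and return the raw result. */
uint64_t add_double_bits(uint64_t a_bits, uint64_t b_bits)
{
    /* volatile forces the sum to be stored as a 64-bit double */
    volatile double sum = bits_to_double(a_bits) + bits_to_double(b_bits);
    return double_to_bits(sum);
}
```

Calling add_double_bits(0x62a5a1c59bd10037, 0x62724bc40659bf0c) returns 0x62a7eb3e1c9c3818. The exponents differ by 3 and the three bits shifted out of B are 100, so the exact sum lies halfway between two representable values and the even mantissa (ending in ...3818) is chosen.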

Finding a remainder using division of 10

I need to find the rightmost digit of any integer. In MATLAB I can take the remainder of the value divided by 10, i.e. a = rem(Num1,10);. How do I do the same in Verilog? I have Xilinx 14.1 and 9.1.
% is the modulus operator in Verilog, just like in C.
Looking at the comments, it looks like you want a rounding function. Here is something that will do that.
One note: the code below will be VERY inefficient, since % is expensive in hardware. Consider dividing by a power of 2, like 8 or 16, instead of 10.
module round
(
  input wire[31:0] x,
  output reg[31:0] rounded
);
  reg[31:0] remainder;
  always @(*) begin
    // % operator is VERY slow and expensive!!!
    remainder = (x % 32'd10);
    // the lines below are decently efficient
    if (remainder < 32'd5)
      rounded = x - remainder;
    else
      rounded = x + (32'd10 - remainder);
  end
endmodule
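Since % behaves the same way in C, the module's logic can be checked with a small software model before simulating it. The sketch below mirrors the Verilog above; round_to_pow2 is a hypothetical variant that illustrates the power-of-2 suggestion and is not part of the original answer.

```c
#include <stdint.h>

/* Round x to the nearest multiple of 10, same logic as the module above.
   Like the 32-bit hardware, the result wraps if it exceeds 2^32 - 1. */
uint32_t round_to_10(uint32_t x)
{
    uint32_t remainder = x % 10u;
    return (remainder < 5u) ? x - remainder : x + (10u - remainder);
}

/* Hypothetical cheaper variant: nearest multiple of 2^k for 1 <= k <= 31.
   Needs only an add and a mask, no divider. Ties round up. */
uint32_t round_to_pow2(uint32_t x, unsigned k)
{
    uint32_t half = UINT32_C(1) << (k - 1);
    uint32_t mask = (UINT32_C(1) << k) - 1u;
    return (x + half) & ~mask;
}
```

For example, round_to_10(14) gives 10 and round_to_10(15) gives 20, while round_to_pow2(13, 3) gives 16.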

fixed point integer division ("fractional division") algorithm

The Honeywell DPS8 computer (and others) have/had a "divide fractional" instruction:
"This instruction divides a 71-bit fractional dividend (including sign) by a 36-bit
fractional divisor (including sign) to form a 36-bit fractional quotient (including
sign) and a 36-bit fractional remainder (including sign). Bit 35 of the remainder
corresponds to bit 70 of the dividend. The remainder sign is equal to the dividend
sign unless the remainder is zero."
So, as I understand it, this is integer division with the decimal point way over on the left.
.qqqqq / .ddddd
(I did scaled integer math in FORTH back in the day, but my memories of the techniques are lost in fog of time.)
To implement this instruction in a DPS8 emulator, I believe I need to start by creating two 70-bit numbers: the 71-bit dividend less its sign bit, and the 36-bit divisor less its sign bit and shifted 35 bits to the left so that the decimal points line up.
I think I can then form the remainder and quotient (in C) with '%' and '/', but I am unsure if those results need to be normalized (i.e. shifted).
I found an example of a "shift and subtract" algorithm ("Computer Arithmetic", slide 10), but I would prefer a more straightforward implementation.
Am I on the right track, or is the solution more nuanced? (Fixing up the signs and detection of errors have been elided from here; those stages are well documented. The actual division is the issue.) Any pointers to C implementations of this kind of hardware emulation would be particularly helpful.
I do not have the definitive answer, but as a division is a division, you might find it helpful to look at some basic division routines.
Imagine that you have a 32-bit variable and you want an 8-bit fractional part.
You then have an integer part between 0 and 16777215, and a fractional part which is between 0 and 255.
0xiiiiiiff (where i is the integer part, f is the fractional part).
Imagine you have a 24-bit dividend (numerator), say the value 3, and a 24-bit divisor (denominator), say the value 13.
As we quickly will see, 3/13 is greater than zero and less than one. That means our fractional part is nonzero, but our integer part is filled completely with zeros.
So to do the above division using a standard divide function, we'll just bit-shift the dividend by N, thus we will get N bits of precision in our fractional part.
quotient_fp = (dividend_ip << 8) / divisor_ip
So far, so good.
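As a quick check of the 3/13 example, here is a minimal C sketch of that first formula. The function name is hypothetical, and it assumes a 24-bit dividend so the shift does not overflow 32 bits.

```c
#include <stdint.h>

/* Divide two integers and return the quotient with 8 fractional bits.
   Assumes dividend_ip < 2^24 and divisor_ip != 0. */
uint32_t div_to_q8(uint32_t dividend_ip, uint32_t divisor_ip)
{
    return (dividend_ip << 8) / divisor_ip;
}
```

div_to_q8(3, 13) returns 59, and 59/256 is about 0.2305, close to 3/13 (about 0.2308) within the 8-bit resolution.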
But what if we want the divisor to have a fractional part as well?
If we just shift the divisor up by 8, then we'll have a problem:
(dividend_ip << 8) / (divisor_ip << 8)
- because we'll obviously lose our fractional part of the quotient (result).
Instead, we'll need to shift the dividend up by as many additional bits as we shift the divisor up...
((dividend_ip << 8) << 8) / (divisor_ip << 8)
...That makes it...
(dividend_ip << (dividend_precision + divisor_precision)) / (divisor_ip << divisor_precision)
Now, let's put our fractional part math into the picture...
(((dividend_ip << dividend_precision) | dividend_fp) << divisor_precision) / ((divisor_ip << divisor_precision) | divisor_fp)
Our quotient's precision will be the same as dividend_precision, which is 8 bits.
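The full formula with both integer and fractional parts can be sketched in C as follows. This is an illustration under assumptions: both precisions are fixed at 8 bits, the function name is made up, and a 64-bit intermediate is used so the extra shift cannot overflow.

```c
#include <stdint.h>

/* Divide two fixed-point numbers that each have 8 fractional bits.
   The result also has 8 fractional bits. Assumes a nonzero divisor
   and a quotient that fits in 32 bits. */
uint32_t div_q8_q8(uint32_t dividend_ip, uint32_t dividend_fp,
                   uint32_t divisor_ip, uint32_t divisor_fp)
{
    /* dividend shifted up by dividend_precision + divisor_precision */
    uint64_t dividend =
        ((((uint64_t)dividend_ip << 8) | (dividend_fp & 0xFFu)) << 8);
    /* divisor shifted up by divisor_precision only */
    uint64_t divisor = ((uint64_t)divisor_ip << 8) | (divisor_fp & 0xFFu);
    return (uint32_t)(dividend / divisor);
}
```

For example, 1.5 / 0.5 is div_q8_q8(1, 128, 0, 128), which returns 768, i.e. 3.0 with 8 fractional bits.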
Unfortunately, this eats a lot of bits.
Fortunately, in your case, the integer part is not important, so you'll have a lot of room for the fractional part.
Let's increase the precision to 15 bits; this can be tested using normal 32-bit integers...
(((dividend_ip << 15) | dividend_fp) << 15) / ((divisor_ip << 15) | divisor_fp)
Our quotient will now have a 15-bit precision.
OK, but since you're supplying only the fractional parts and the integer part is always zero anyway, you should be able to just toss the integer part. That makes it....
(((dividend_ip << 16) | dividend_fp) << 16) / ((divisor_ip << 16) | divisor_fp)
... reduced to ...
(dividend_fp << 16) / divisor_fp
... now let's use a 64-bit integer instead, we can get 32 bits of precision in the quotient...
(dividend_fp << 32) / divisor_fp
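A minimal C sketch of the fraction-only case with a 64-bit intermediate is shown below. The function name is hypothetical, and it assumes both inputs are fractions with the same scale and that the dividend is smaller than the divisor, so the quotient is itself a fraction that fits in 32 bits.

```c
#include <stdint.h>

/* Fractional divide: returns dividend/divisor as a 32-bit fraction
   (value = result / 2^32). Assumes dividend_fp < divisor_fp. */
uint32_t frac_div32(uint32_t dividend_fp, uint32_t divisor_fp)
{
    uint64_t wide = (uint64_t)dividend_fp << 32;  /* 32 bits of headroom */
    return (uint32_t)(wide / divisor_fp);
}
```

frac_div32(1, 2) returns 0x80000000, which is 0.5 as a 32-bit fraction.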
... some compilers support a 128-bit integer type (GCC and Clang provide __int128 on 64-bit targets), so you might be able to use that type in order to get 128 bits easily. I have not tried it, but I've come across info on the Web earlier; search for __int128, and you might find out how.
If you get the 128-bit type to work, you could make the dividend 128 bit, the divisor 64 bit and the quotient 64 bit...
quotient_fp = ((dividend_fp << 64) / divisor) >> (64 - 36)
... in order to get 36 bits precision.
Notice that since the result is in the top 36 bits of the quotient, the quotient needs to be shifted down (64 - 36) = 28 bits.
You could even go as high as (128 - 36) = 92 bits precision:
(dividend_fp << 92) / divisor
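The 128-bit variant might look like the sketch below. It relies on unsigned __int128, which is a GCC/Clang extension on 64-bit targets and not standard C, and it assumes the dividend is shifted up by a full 64 bits so that the quotient fills a 64-bit word before being shifted down to 36 bits. The function name is made up, and sign handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional divide with a 36-bit fractional quotient
   (value = result / 2^36). Requires dividend_fp < divisor_fp so the
   64-bit intermediate quotient cannot overflow. */
uint64_t frac_div36(uint64_t dividend_fp, uint64_t divisor_fp)
{
    assert(divisor_fp != 0 && dividend_fp < divisor_fp);
    /* 128-bit dividend, 64-bit divisor, 64-bit quotient */
    unsigned __int128 wide = (unsigned __int128)dividend_fp << 64;
    uint64_t q64 = (uint64_t)(wide / divisor_fp);
    /* the result sits in the top 36 bits; shift down by 28 */
    return q64 >> (64 - 36);
}
```

frac_div36(1, 2) returns 2^35, which is 0.5 as a 36-bit fraction.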
Now that you probably (hopefully) have a solution, I would like to recommend that you get familiar with low-level binary division (again, since you were there a while ago).
The best sources seem to be how hardware divides binary numbers; such as microcontrollers, CPUs and the like. Assembly language dividers are also good for getting to know the inner workings. Often 32-bit divide routines that use bit-shifting are very good sources.
Over time, I've come across a very clever implementation for ARM in ARM assembly language. Normally I wouldn't post references or assembly language examples, but considering that the code is very small, I think it would be alright.
Taken from A Fast Hi Precision Fixed Point Divide
r0 is the numerator (dividend)
r2 is the denominator (divisor)
mov r1,#0
adds r0,r0,r0
.rept 32
adcs r1,r2,r1,lsl#1
subcc r1,r1,r2
adcs r0,r0,r0
.endr
r0 is the quotient (result)
r1 is the remainder (rest, modulo result)
The above routine contains the basics for an unsigned divide.
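For readers who prefer C, the same shift-and-subtract idea can be written as a plain restoring divide. This is a sketch in the same spirit, not a line-by-line translation of the ARM routine; the type and function names are hypothetical, and the caller is assumed to pass a nonzero denominator.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} udiv32_result;

/* Unsigned 32-bit restoring division, one quotient bit per iteration. */
udiv32_result udiv32_shift_subtract(uint32_t numerator, uint32_t denominator)
{
    uint64_t rem = 0;   /* 64-bit so the left shift cannot overflow */
    uint32_t quot = 0;
    for (int i = 31; i >= 0; --i) {
        /* bring down the next numerator bit, most significant first */
        rem = (rem << 1) | ((numerator >> i) & 1u);
        if (rem >= denominator) {
            rem -= denominator;            /* subtraction succeeded */
            quot |= UINT32_C(1) << i;      /* so this quotient bit is 1 */
        }
    }
    udiv32_result result = { quot, (uint32_t)rem };
    return result;
}
```

udiv32_shift_subtract(100, 7) yields quotient 14 and remainder 2.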
I hope this information will be useful. It may contain errors, as I have not tested any code or example mentioned. I'm confident, though, that it's not all wrong. ;)

Verilog shift extending result?

We have the following line of code and we know that regF is 16 bits long, regD is 8 bits long and regE is 8 bits long, regC is 3 bits long and assumed unsigned:
regF <= regF + ( ( regD << regC ) & { 16{ regE [ regC ]} }) ;
My question is : will the shift regD << regC assume that the result is 8 bits or will it extended to 16 bits because of the bitwise & with the 16 bit vector?
The shift sub-expression itself has a width of 8 bits; the bit width of a shift is always the bit width of the left operand (see table 5-22 in the 2005 LRM).
However, things get more complicated after that. The shift sub-expression appears as an operand of the & operator. The bit length of the & expression is the bit-length of the largest of the 2 operands; in this case, 16 bits.
This sub-expression now appears as an operand of the + expression; the result width of this expression is again the maximum width of the two operands of the +, which is again 16.
We now have an assignment. This is not technically an operand, but the same rules are used; in this case, the LHS is also 16 bits, so the size of the RHS is unaffected.
We now know that the overall expression size is 16 bits; this size is propagated back down to the operands, except the 'self-determined' operands. The only self-determined operand here is the RHS of the shift expression (regC), which isn't extended.
The signedness of the expressions is now determined. Propagation happens in the same way. The overall effect here, since we have at least one unsigned operand, is that the expression is unsigned, and all operands are coerced to unsigned. So, all (non-self-determined) operands are coerced to unsigned 16-bit before any operation is actually carried out.
So, in other words, the shift sub-expression actually ends up as a 16-bit shift, even though it appears to be 8-bit at first sight. Note that it's not 16-bit because the RHS of the & is 16-bit, but because the entire sizing process - the width propagation up the expression - came up with an answer of 16. If you'd assigned to an 18-bit reg, instead of the 16-bit regF, then your shift would have been extended to 18 bits.
This is all very complicated and non-intuitive, at least if you have any experience of mainstream languages. It's explained (more or less) in sections 5.4 and 5.5 of the 2005 LRM. If you want any advice, then never write expressions like this. Write defensively - break everything down to individual sub-expressions, and then combine the sub-expressions.
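The difference between an 8-bit and a 16-bit shift can be made concrete with a small C model of the statement. This is only a sketch: the function names are hypothetical, regC is masked to 3 bits to match its declared width, and C's own integer promotion is overridden with explicit casts to mimic each width.

```c
#include <stdint.h>

/* What the shift would give if it really were evaluated in 8 bits. */
uint8_t shift_8bit(uint8_t regD, unsigned regC)
{
    return (uint8_t)(regD << (regC & 7u));
}

/* What Verilog computes: regD is extended to 16 bits before shifting. */
uint16_t shift_16bit_context(uint8_t regD, unsigned regC)
{
    return (uint16_t)((uint16_t)regD << (regC & 7u));
}

/* Model of: regF <= regF + ((regD << regC) & {16{regE[regC]}}); */
uint16_t next_regF(uint16_t regF, uint8_t regD, uint8_t regE, unsigned regC)
{
    regC &= 7u;
    uint16_t mask = ((regE >> regC) & 1u) ? 0xFFFFu : 0x0000u;
    uint16_t shifted = (uint16_t)((uint32_t)regD << regC);
    return (uint16_t)(regF + (shifted & mask));
}
```

With regD = 0xFF and regC = 4, the 8-bit shift would give 0xF0, while the 16-bit context keeps the upper bits and gives 0x0FF0.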
