I want to update inputs in verilog - verilog

I want to implement the following algorithm in verilog. Is there a way to update input in verilog just like c/c++(eg, a=a+i;)
a←a + b + 2 · lsw(a) · lsw(b)
d←(d ⊕ a) >>> 32
c←c + d + 2 · lsw(c) · lsw(d)
b←(b ⊕ c) >>> 24
a←a + b + 2 · lsw(a) · lsw(b)
d←(d ⊕ a) >>> 16
c←c + d + 2 · lsw(c) · lsw(d)
d←(b ⊕ c) >>> 63
here a,b,c,d are 64 bit ASCII inputs. And lsw(x) is the least significant word of 32 bit."⊕" denotes bit wise XOR and "+" denotes wordwise addition., and >>> denotes right shift operation.

I assume that you want the code to execute line by line as in C. You can use blocking statements inside an always statement for this. You will need a clock signal, because of the feedback nature of your signals.
For example,
always# (posedge clk)
begin
a = a+1;
b = b*a;
c = a+b-c;
end
The above code will execute line by line, on every positive edge of clock.
And for the operators you require,
lsw(x) - the least significant word of 32 bit can be represented by x[7:0] assuming x is defined as [31:0].
"⊕" bit wise XOR can be represented by ^ operator.
"+" and "." can be represented by | and & respectively.
right shift can be done by >> operator.

Related

Splitting an int64 into two int32, performing math, then re-joining

I am working within constraints of hardware that has 64bit integer limit. Does not support floating point. I am dealing with very large integers that I need to multiply and divide. When multiplying I encounter an overflow of the 64bits. I am prototyping a solution in python. This is what I have in my function:
upper = x >> 32 #x is cast as int64 before being passed to this function
lower = x & 0x00000000FFFFFFFF
temp_upper = upper * y // z #Dividing first is not an option, as this is not the actual equation I am working with. This is just to make sure in my testing I overflow unless I do the splitting.
temp_lower = lower * y // z
return temp_upper << 32 | lower
This works, somewhat, but I end up losing a lot of precision (my result is off by sometimes a few million). From looking at it, it appears that this is happening because of the division. If sufficient enough it shifts the upper to the right. Then when I shift it back into place I have a gap of zeroes.
Unfortunately this topic is very hard to google, since anything with upper/lower brings up results about rounding up/down. And anything about splitting ints returns results about splitting them into a char array. Anything about int arithmetic bring up basic algebra with integer math. Maybe I am just not good at googling. But can you guys give me some pointers on how to do this?
Splitting like this is just a thing I am trying, it doesnt have to be the solution. All I need to be able to do is to temporarily go over 64bit integer limit. The final result will be under 64bit (After the division part). I remember learning in college about splitting it up like this and then doing the math and re-combining. But unfortunately as I said I am having trouble finding anything online on how to do the actual math on it.
Lastly, my numbers are sometimes small. So I cant chop off the right bits. I need the results to basically be equivalent to if I used something like int128 or something.
I suppose a different way to look at this problem is this. Since I have no problem with splitting the int64, we can forget about that part. So then we can pretend that two int64's are being fed to me, one is upper and one is lower. I cant combine them, because they wont fit into a single int64. So I need to divide them first by Z. Combining step is easy. How do I do the division?
Thanks.
As I understand it, you want to perform (x*y)//z.
Your numbers x,y,z all fit on 64bits, except that you need 128 bits for intermediate x*y.
The problem you have is indeed related to division: you have
h * y = qh * z + rh
l * y = ql * z + rl
h * y << 32 + l*y = (qh<<32 + ql) * z + (rh<<32 + rl)
but nothing says that (rh<<32 + rl) < z, and in your case high bits of l*y overlap low bits of h * y, so you get the wrong quotient, off by potentially many units.
What you should do as second operation is rather:
rh<<32 + l * y = ql' * z + rl'
Then get the total quotient qh<<32 + ql'
But of course, you must care to avoid overflow when evaluating left operand...
Since you are splitting only one of the operands of x*y, I'll assume that the intermediate result always fits on 96 bits.
If that is correct, then your problem is to divide a 3 32bits limbs x*y by a 2 32bits limbs z.
It is thus like Burnigel - Ziegler divide and conquer algorithm for division.
The algorithm can be decomposed like this:
obtain the 3 limbs a2,a1,a0 of multiplication x*y by using karatsuba for example
split z into 2 limbs z1,z0
perform the div32( (a2,a1,a0) , (z1,z0) )
here is some pseudo code, only dealing with positive operands, and with no guaranty to be correct, but you get an idea of implementation:
p = 1<<32;
function (a1,a0) = split(a)
a1 = a >> 32;
a0 = a - (a1 * p);
function (a2,a1,a0) = mul22(x,y)
(x1,x0) = split(x) ;
(y1,y0) = split(y) ;
(h1,h0) = split(x1 * y1);
assert(h1 == 0); -- assume that results fits on 96 bits
(l1,l0) = split(x0 * y0);
(m1,m0) = split((x1 - x0) * (y0 - y1)); -- karatsuba trick
a0 = l0;
(carry,a1) = split( l1 + l0 + h0 + m0 );
a2 = l1 + m1 + h0 + carry;
function (q,r) = quorem(a,b)
q = a // b;
r = a - (b * q);
function (q1,q0,r0) = div21(a1,a0,b0)
(q1,r1) = quorem(a1,b0);
(q0,r0) = quorem( r1 * p + a0 , b0 );
(q1,q0) = split( q1 * p + q0 );
function q = div32(a2,a1,a0,b1,b0)
(q,r) = quorem(a2*p+a1,b1*p+b0);
q = q * p;
(a2,a1)=split(r);
if a2<b1
(q1,q0,r)=div21(a2,a1,b1);
assert(q1==0); -- since a2<b1...
else
q0=p-1;
r=(a2-b1)*p+a1+b1;
(d1,d0) = split(q0*b0);
r = (r-d1)*p + a0 - d0;
while(r < 0)
q = q - 1;
r = r + b1*p + b0;
function t=muldiv(x,y,z)
(a2,a1,a0) = mul22(x,y);
(z1,z0) = split(z);
if z1 == 0
(q2,q1,r1)=div21(a2,a1,z0);
assert(q2==0); -- otherwise result will not fit on 64 bits
t = q1*p + ( ( r1*p + a0 )//z0);
else
t = div32(a2,a1,a0,z1,z0);

Rounding of bits during different arithmetic operation in verilog?

I am new to HDL coding.
I have a problem regarding rounding and there are different types of rounding available, like Round half away from zero, Round half to even, Round half to odd etc. (found in https://en.wikipedia.org/wiki/Rounding). I wrote a verilog code of rounding for my multiplication, addition etc. with the help of internet and the code is working properly. But, I want to know what type of rounding the code is performing. Whether it is Round half to odd or Round half to even or anything else. Can anybody tell me what type of rounding the code is performing? The code is given below
`timescale 1ns / 1ps
module rounding();
parameter x = 16;
parameter y = 8;
reg [(x)-1:0] A=16'b0010011010110011;
reg [(x)-1:0] B;
initial begin
B = (A[y-1:0]) >= (1 << y-1) ? (A >>> y) + 1 : (A >>> y);
$display("%b", B);
end
endmodule
If I give input A=16'b0010011010110011; output is coming as 0000000000100111;
if A=16'b0000111100001111; output is coming as 0000000000001111
Your code divides input value A by 256 and apply rounding to the nearest integer:
A=16'b0010011010110011 -> 9907 (dec) -> 9907/256 = 38.69 (round up) ~ 39 -> 0000000000100111
A=16'b0000111100001111 -> 3855 (dec) -> 3855/256 = 15.06 (round down) ~ 15 -> 0000000000001111

Convert DFA to RE

I constructed a finite automata for the language L of all strings made of the symbols 0, 1 and 2 (Σ = {0, 1, 2}) where the last symbol is not smaller than the first symbol. E.g., the strings 0, 2012, 01231 and 102 are in the language, but 10, 2021 and 201 are not in the language.
Then from that an GNFA so I can convert to RE.
My RE looks like this:
(0(0+1+2)* )(1(0(1+2)+1+2)* )(2((0+1)2+2))*)
I have no idea if this is correct, as I think I understand RE but not entirely sure.
Could someone please tell me if it’s correct and if not why?
There is a general method to convert any DFA into a regular expression, and is probably what you should be using to solve this homework problem.
For your attempt specifically, you can tell whether an RE is incorrect by finding a word that should be in the language, but that your RE doesn't accept, or a word that shouldn't be in the language that the RE does accept. In this case, the string 1002 should be in the language, but the RE doesn't match it.
There are two primary reasons why this string isn't matched. The first is that there should be a union rather than a concatenation between the three major parts of the language (words starting with 0, 1 and 2, respectively:
(0(0+1+2)*) (1(0(1+2)+1+2)*) (2((0+1)2+2))*) // wrong
(0(0+1+2)*) + (1(0(1+2)+1+2)*) + (2((0+1)2+2))*) // better
The second problem is that in the 1 and 2 cases, the digits smaller than the starting digit need to be repeatable:
(1(0 (1+2)+1+2)*) // wrong
(1(0*(1+2)+1+2)*) // better
If you do both of those things, the RE will be correct. I'll leave it as an exercise for you to follow that step for the 2 case.
The next thing you can try is find a way to make the RE more compact:
(1(0*(1+2)+1+2)*) // verbose
(1(0*(1+2))*) // equivalent, but more compact
This last step is just a matter of preference. You don't need the trailing +1+2 because 0* can be of zero length, so 0*(1+2) covers the +1+2 case.
You can use an algorithm but this DFA might be easy enough to convert as a one-off.
First, note that if the first symbol seen in the initial state is 0, you transition to state A and remain there. A is accepting. This means any string beginning with 0 is accepted. Thus, our regular expression might as well have a term like 0(0+1+2)* in it.
Second, note that if the first symbol seen in the initial state is 1, you transition to state B and remain in states B and D from that point on. You only leave B if you see 0 and you stay out of B as long as you keep seeing 0. The only way to end on D is if the last symbol you saw was 0. Therefore, strings beginning with 1 are accepted if and only if the strings don't end in 0. We can have a term like 1(0+1+2)*(1+2) in our regular expression as well to cover these cases.
Third, note that if the first symbol seen in the initial state is 2, you transition to state C and remain in states C and E from that point on. You leave state C if you see anything but 2 and stay out of B until you see a 2 again. The only way to end up on C is if the last symbol you saw was 2. Therefore, strings beginning with 2 are accepted if and only if the strings end in 2. We can have a term like 2(0+1+2)*(2) in our regular expression as well to cover these cases.
Finally, we see that there are no other cases to consider; our three terms cover all cases and the union of them fully describes our language:
0(0+1+2)* + 1(0+1+2)*(1+2) + 2(0+1+2)*2
It was easy to just write out the answer here because this DFA is sort of like three simple DFAs put together with a start state. More complicated DFAs might be easier to convert to REs using algorithms that don't require you understand or follow what the DFA is doing.
Note that if the start state is accepting (mentioned in a comment on another answer) the RE changes as follows:
e + 0(0+1+2)* + 1(0+1+2)*(1+2) + 2(0+1+2)*2
Basically, we just tack the empty string onto it since it is not already generated by any of the other parts of the aggregate expression.
You have the equivalent of what is known as a right-linear system. It's right-linear because the variables occur on the right hand sides only to the first degree and only on the right-hand sides of each term. The system that you have may be written - with a change in labels from 0,1,2 to u,v,w - as
S ≥ u A + v B + w C
A ≥ 1 + (u + v + w) A
B ≥ 1 + u D + (v + w) B
C ≥ 1 + (u + v) E + w C
D ≥ u D + (v + w) B
E ≥ (u + v) E + w C
The underlying algebra is known as a Kleene algebra. It is defined by the following identities that serve as its fundamental properties
(xy)z = x(yz), x1 = x = 1x,
(x + y) + z = x + (y + z), x + 0 = x = 0 + x,
y0z = 0, w(x + y)z = wxz + wyz,
x + y = y + x, x + x = x,
with a partial ordering relation defined by
x ≤ y ⇔ y ≥ x ⇔ ∃z(x + z = y) ⇔ x + y = y
With respect to this ordering relation, all finite subsets have least upper bounds, including the following
0 = ⋁ ∅, x + y = ⋁ {x, y}
The sum operator "+" is the least upper bound operator.
The system you have is a right-linear fixed point system, since it expresses the variables on the left as a (right-linear) function, as given on the right, of the variables. The object being specified by the system is the least solution with respect to the ordering; i.e. the least fixed point solution; and the regular expression sought out is the value that the main variable has in the least fixed point solution.
The last axiom(s) for Kleene algebras can be stated in any of a number of equivalent ways, including the following:
0* = 1
the least fixed point solution to x ≥ a + bx + xc is x = b* a c*.
There are other ways to express it. A consequence is that one has identities such as the following:
1 + a a* = a* = 1 + a* a
(a + b)* = a* (b a*)*
(a b)* a = a (b a)*
In general, right linear systems, such as the one corresponding to your problem may be written in vector-matrix form as 𝐪 ≥ 𝐚 + A 𝐪, with the least fixed point solution given in matrix form as 𝐪 = A* 𝐚. The central theorem of Kleene algebras is that all finite right-linear systems have least fixed point solutions; so that one can actually define matrix algebras over Kleene algebras with product and sum given respectively as matrix product and matrix sum, and that this algebra can be made into a Kleene algebra with a suitably-defined matrix star operation through which the least fixed point solution is expressed. If the matrix A decomposes into block form as
B C
D E
then the star A* of the matrix has the block form
(B + C E* D)* (B + C E* D)* C E*
(E + D B* C)* D B* (E + D B* C)*
So, what this is actually saying is that for a vector-matrix system of the form
x ≥ a + B x + C y
y ≥ b + D x + E y
the least fixed point solution is given by
x = (B + C E* D)* (a + C E* b)
y = (E + D B* C)* (D B* a + b)
The star of a matrix, if expressed directly in terms of its components, will generally be huge and highly redundant. For an n×n matrix, it has size O(n³) - cubic in n - if you allow for redundant sub-expressions to be defined by macros. Otherwise, if you in-line insert all the redundancy then I think it blows up to a highly-redundant mess that is exponential in n in size.
So, there's intelligence required and involved (literally meaning: AI) in finding or pruning optimal forms that avoid the blow-up as much as possible. That's a non-trivial job for any purported matrix solver and regular expression synthesis compiler.
An heuristic, for your system, is to solve for the variables that don't have a "1" on the right-hand side and in-line substitute the solutions - and to work from bottom-up in terms of the dependency chain of the variables. That would mean starting with D and E first
D ≥ u* (v + w) B
E ≥ (u + v)* w C
In-line substitute into the other inequations
S ≥ u A + v B + w C
A ≥ 1 + (u + v + w) A
B ≥ 1 + u u* (v + w) B + (v + w) B
C ≥ 1 + (u + v) (u + v)* w C + w C
Apply Kleene algebra identities (e.g. x x* y + y = x* y)
S ≥ u A + v B + w C
A ≥ 1 + (u + v + w) A
B ≥ 1 + u* (v + w) B
C ≥ 1 + (u + v)* w C
Solve for the next layer of dependency up: A, B and C:
A ≥ (u + v + w)*
B ≥ (u* (v + w))*
C ≥ ((u + v)* w)*
Apply some more Kleene algebra (e.g. (x* y)* = 1 + (x + y)* y) to get
B ≥ 1 + N (v + w)
C ≥ 1 + N w
where, for convenience we set N = (u + v + w)*. In-line substitute at the top-level:
S ≥ u N + v (1 + N (v + w)) + w (1 + N w).
The least fixed point solution, in the main variable S, is thus:
S = u N + v + v N (v + w) + w + w N w.
where
N = (u + v + w)*.
As you can already see, even with this simple example, there's a lot of chess-playing to navigate through the system to find an optimally-pruned solution. So, it's certainly not a trivial problem. What you're essentially doing is synthesizing a control-flow structure for a program in a structured programming language from a set of goto's ... essentially the core process of reverse-compiling from assembly language to a high level language.
One measure of optimization is that of minimizing the loop-depth - which here means minimizing the depth of the stars or the star height. For example, the expression x* (y x*)* has star-height 2 but reduces to (x + y)*, which has star height 1. Methods for reducing star-height come out of the research by Hashiguchi and his resolution of the minimal star-height problem. His proof and solution (dating, I believe, from the 1980's or 1990's) is complex and to this day the process still goes on of making something more practical of it and rendering it in more accessible form.
Hashiguchi's formulation was cast in the older 1950's and 1960's formulation, predating the axiomatization of Kleene algebras (which was in the 1990's), so to date, nobody has rewritten his solution in entirely algebraic form within the framework of Kleene algebras anywhere in the literature ... as far as I'm aware. Whoever accomplishes this will have, as a result, a core element of an intelligent regular expression synthesis compiler, but also of a reverse-compiler and programming language synthesis de-compiler. Essentially, with something like that on hand, you'd be able to read code straight from binary and the lid will be blown off the world of proprietary systems. [Bite tongue, bite tongue, mustn't reveal secret yet, must keep the ring hidden.]

How to assign a dynamic range of bits

Assume I have three registers concatenated like so:
{C, A, Q}
Where C is one bit, and A and Q are, say, 8 bits.
I need to periodically perform a shift operation like so:
if (shift_regs) {C, A, Q} <= {C, A, Q} >> 1;
Another register, P, starts with a value equal to the number of bits in Q, in this case 8, and counts down by one every time a shift occurs. The whole sequence stops when P reaches 0, and the output result is 16 bits equal to {A, Q}.
However, I want perform an optimization by detecting the case where Q prematurely becomes zero before all its bits are shifted out, and assign the output result to {{8-bits_shifted_so_far{1'b0}}, A, Q[7:8-bits_shifted_so_far]} instead.
Example sequence:
C = 0, A = 8'b1101_0100, Q = 8'b0000_0011
1) after first shift, {C, A, Q} becomes {1'b0, 8'b0110_1010, 8'b0000_0001}.
2) after second shift, {C, A, Q} becomes {1'b0, 8'b0011_0101, 8'b0000_0000}.
3) Q is now zero, so assign output to {{8-bits_shifted_so_far{1'b0}}, A, Q[7:8-bits_shifted_so_far]}, i.e. {6'b00_0000, 8'b0011_0101, 2'b00}.
How can I implement this in verilog or systemverilog?
In keeping with your example, I will assume the output has a width of 16 bits.
Your problem can be simplified, since you are performing this operation at the point when Q == 8'h00, then your output does not need to contain Q[7:8-bits_shifted_so_far], it can be replaced with zeros. As Serge pointed out, dynamic ranges are not possible in verilog, but your problem has a simple solution:
//Note the blocking assignment
if(shift_regs) {C, A, Q} = {C, A, Q} >> 1;
//Set the output to A bit-shifted left by the desired amount
if(!Q) output = {8'h00, A} << (8-bits_shifted_so_far);
Essentially, all you are doing is outputting A bit shifted relative to how many shift operations have been performed.

Unexpected result of Not operator in assignment

I have two 8-bit inputs A and B,
input [7:0] A,B;
and a 9-bit output F,
output reg [8:0] F;
A and B are combined and assigned to F like this:
F <= ~(A^B);
If A is equal to 8'hFF, and B is equal to 8'hF0, why does F become 9'h1F0 and not 9'h0F0?
Why is the output 9'h1F0 and not 9'h0F0?
You defined F as 9 bits wide. Thus the compiler will expand the right-hand-side arguments to 9 bits before doing any operations.
As both A and B are unsigned they become resp
A = 9'h0FF, B=9'h0F0. EXOR gives 9'h00F. Ones complement then gives 9'h1F0.
Beware that the width expansion does not happen if you put the expression between {}:
F2 = {~(A^B)};
F2 will be 9'h0F0;
Because sections 11.8.2 Steps for evaluating an expression and 11.8.3 Steps for evaluating an assignment of the IEEE 1800-2017 LRM effectively say that the operands get extended first to match the size of the result before any operation is performed.

Resources