I am working on an assignment and am a little lost; I don't really know how to get started. I need to implement the following flags in a 32-bit ALU:
• Z ("Zero"): Set to 1 ("True") if the result of the operation is zero
• N ("Negative"): Set to 1 ("True") if the first bit of the result is 1, which indicates a negative number
• O ("Overflow"): Set to 1 ("True") to indicate that the operation overflowed the bus width.
Additionally, I need a comparison function that compares input a to input b and then sets one of three flags:
• LT if input a is less than input b
• GT if input a is greater than input b
• EQ if input a is equal to input b
I need to modify this ALU to include the three flags and the comparison outputs, then change the test bench to test all of these modifications.
This was all the information I received for this assignment, and there is no textbook or any other resource really. It's an online class, and I cannot get a response from my instructor, so I am a little confused as to how to get started. I am still a total newbie when it comes to digital logic, so please bear with me. I just need some help understanding how these flags and comparisons work. If anyone can explain a little better how they work and what they do, and possibly how I would implement them in the ALU and testbench, I would really appreciate it.
I don't expect anyone to do my assignment, I really just need help understanding it.
ALU
module alu32 (a, b, out, sel);
input [31:0] a, b;
input [3:0] sel;
output [31:0] out;
reg [31:0] out;
//Code starts here
always @(a, b, sel)
begin
case (sel)
//Arithmetic Functions
0 : out <= a + b;
1 : out <= a - b;
2 : out <= b - a;
3 : out <= a * b;
4 : out <= a / b;
5 : out <= b % a;
//Bit-wise Logic Functions
6 : out <= ~a; //Not
7 : out <= a & b; //And
8 : out <= a | b; //Or
9 : out <= a ^ b; //XOR
10 : out <= a ^~ b; //XNOR
//Logic Functions
11 : out <= !a;
12 : out <= a && b;
13 : out <= a || b;
default: out <= a + b;
endcase
end
endmodule
ALU Testbench
module alu32_tb();
reg [31:0] a, b;
reg [3:0] sel;
wire [31:0] out;
initial begin
$monitor("sel=%d a=%d b=%d out=%d", sel,a,b,out);
//Fundamental tests - all a+b
#0 sel=4'd0; a = 8'd0; b = 8'd0;
#1 sel=4'd0; a = 8'd0; b = 8'd25;
#1 sel=4'd0; a = 8'd37; b = 8'd0;
#1 sel=4'd0; a = 8'd45; b = 8'd75;
//Arithmetic
#1 sel=4'd1; a = 8'd120; b = 8'd25; //a-b
#1 sel=4'd2; a = 8'd30; b = 8'd120; //b-a
#1 sel=4'd3; a = 8'd75; b = 8'd3; //a*b
#1 sel=4'd4; a = 8'd75; b = 8'd3; //a/b
#1 sel=4'd5; a = 8'd74; b = 8'd3; //b%a
//Bit-wise Logic Functions
#1 sel=4'd6; a = 8'd31; //Not
#1 sel=4'd7; a = 8'd31; b = 8'd31; //And
#1 sel=4'd8; a = 8'd30; b = 8'd1; //Or
#1 sel=4'd9; a = 8'd30; b = 8'd1; //XOR
#1 sel=4'd10; a = 8'd30; b = 8'd1; //XNOR
//Logic Functions
#1 sel=4'd11; a = 8'd25; //Not
#1 sel=4'd12; a = 8'd30; b = 8'd0; //And
#1 sel=4'd13; a = 8'd0; b = 8'd30; //Or
#1 $finish;
end
alu32 myalu (.a(a), .b(b), .out(out), .sel(sel));
endmodule
You can add these flag outputs to the design, as shown below, and simply connect them in the testbench.
// In design:
output zero;
output overflow;
output negative;
// In testbench:
wire zero,overflow,negative;
alu32 myalu (.a(a), .b(b), .out(out), .sel(sel), .zero(zero), .overflow(overflow),.negative(negative));
For the logic part, you can use continuous assignments. You may need to add some logic so that these flags are used only for certain values of sel.
Z ("Zero"): Set to 1 ("True") if the result of the operation is zero
So the condition is that all the bits of out must be zero. This can be done in many other ways as well.
// Bit wise OR-ing on out
assign zero = ~(|out);
O ("Overflow"): Set to 1 ("True") to indicate that the operation overflowed the bus width.
Depending on how you read this description and the code shown, you may simply want the carry flag here, that is, the extra bit produced when the addition is extended by one bit. Refer to this page on Wikipedia for the overflow condition.
But the overflow condition is not the same as the carry bit. Overflow means the signed result does not fit in the bus width (data loss), while carry is a bit passed on to the next stage of a calculation.
So, doing something like the following may be useful:
// Extend the result to capture the carry bit
// Simply use this bit if you want result > bus width
{carry,out} <= a+b;
// Overflow in signed addition: both operands have the same sign,
// but the sign of the result differs from it
assign overflow = (a[31] == b[31]) && (out[31] != a[31]);
N ("Negative"): Set to 1 ("True") if the first bit of the result is 1, which indicates a negative number
Again, this is simply the MSB of the out register. The underflow condition, however, is an entirely different thing.
// Depending on sel, subtraction must be performed here
assign negative = (out[31] == 1 && (sel == 1 || sel == 2));
Also, a simple condition like assign lt = (a<b) ? 1 : 0; and similar ones for the others can detect the LT, GT and EQ conditions on the inputs.
Refer to the answer here for an explanation of the overflow/underflow flags. The Overflow-Carry link may also be useful.
Refer to Carryout-Overflow, ALU in Verilog and ALU PDF for further information about ALU implementation.
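Putting the pieces of this answer together, a minimal sketch of the modified ALU might look like the following. It is only an illustration under some assumptions: the module name alu32_flags is made up, only a few of the original sel cases are shown, overflow is computed for signed add/subtract only, negative is reported for every operation (not just for sel 1 and 2), and the comparison treats a and b as unsigned (declare them signed, or use $signed, if the assignment expects a signed comparison).

```verilog
// Sketch only: alu32_flags is a hypothetical name, not part of the assignment.
module alu32_flags (
    input      [31:0] a, b,
    input      [3:0]  sel,
    output reg [31:0] out,
    output            zero, negative,
    output reg        overflow,
    output            lt, gt, eq
);
    always @* begin
        overflow = 1'b0;                 // default: no overflow reported
        case (sel)
            4'd0: begin                  // a + b
                out = a + b;
                // same-sign operands, result with a different sign
                overflow = (a[31] == b[31]) && (out[31] != a[31]);
            end
            4'd1: begin                  // a - b
                out = a - b;
                // different-sign operands, result sign differs from a
                overflow = (a[31] != b[31]) && (out[31] != a[31]);
            end
            4'd2: begin                  // b - a
                out = b - a;
                overflow = (a[31] != b[31]) && (out[31] != b[31]);
            end
            4'd7: out = a & b;
            4'd8: out = a | b;
            default: out = 32'd0;        // remaining operations omitted here
        endcase
    end

    assign zero     = ~(|out);           // 1 when every bit of out is 0
    assign negative = out[31];           // MSB of the result

    // Comparison outputs (unsigned comparison is an assumption)
    assign lt = (a <  b);
    assign gt = (a >  b);
    assign eq = (a == b);
endmodule
```

In the testbench, the new outputs would be declared as wires, connected by name in the instantiation, and added to the $monitor format string.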
I am working on a Verilog fixed-point adder, which I will also use for subtraction. When I do the subtraction, I do not always get the correct result.
For example, 1-1=0, but I get -0.
Kindly have a look at the code below:
`timescale 1ns/1ps
module adder #(
//Parameterized values
parameter Q = 27,
parameter N = 32
)
(
input [N-1:0] a,
input [N-1:0] b,
output [N-1:0] c
);
reg [N-1:0] res;
assign c = res;
always @(a,b) begin
// both negative or both positive
if(a[N-1] == b[N-1]) begin //Since they have the same sign, absolute magnitude increases
res[N-2:0] = a[N-2:0] + b[N-2:0]; //So just the two numbers are added
res[N-1] = a[N-1]; //and the sign is set appropriately...
end
// one of them is negative...
else if(a[N-1] == 0 && b[N-1] == 1) begin // subtracts a-b
if( a[N-2:0] > b[N-2:0] ) begin // if a is greater than b,
res[N-2:0] = a[N-2:0] - b[N-2:0];
res[N-1] = 0; // manually the sign is set to positive
end
else begin // if a is less than b,
res[N-2:0] = b[N-2:0] - a[N-2:0]; // subtracting a from b to avoid a 2's complement answer
if (res[N-2:0] == 0)
res[N-1] = 0; // To remove negative zero....
else
res[N-1] = 1; // and manually the sign is set to negative
end
end
else begin // subtract b-a (a negative, b positive)
if( a[N-2:0] > b[N-2:0] ) begin // if a is greater than b,
res[N-2:0] = a[N-2:0] - b[N-2:0]; // subtracting b from a to avoid a 2's complement answer
if (res[N-2:0] == 0)
res[N-1] = 0;
else
res[N-1] = 1; // and manually the sign is set to negative
end
else begin // if a is less than b,
res[N-2:0] = b[N-2:0] - a[N-2:0];
res[N-1] = 0;
end
end
end
endmodule
Testbench for the adder is below:
`timescale 1ns/1ps
module tb_adder (
);
reg clk;
reg [ 31 : 0 ] a;
reg [ 31 : 0 ] b;
wire [ 31: 0 ] c;
adder adder_i (
.a(a),
.b(b),
.c(c)
);
parameter CLKPERIODE = 100;
initial clk = 1'b1;
always #(CLKPERIODE/2) clk = !clk;
initial begin
$monitor ("adder=%h", c);
#1
a = 32'h08000000;
b = 32'hF8000000;
#(CLKPERIODE)
$finish();
end
endmodule
I am having a hard time finding where I went wrong, as I am a newbie in Verilog. I am using this module to calculate a Taylor series in fixed-point arithmetic. Any suggestions?
The only case I could find where your code produces the dirty zero is when both inputs are the dirty zero themselves. i.e.
a = b = 32'h80000000 = "-0"
It looks like this happens because in this case your code takes the branch at
if(a[N-1] == b[N-1]) begin //Since they have the same sign, absolute magnitude increases
and this branch doesn't have the same check as the others that specifically avoids it. You could fix this by moving that code to the end of the always block so it runs no matter what branch is taken earlier.
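As a rough sketch of that suggestion, the adder could be condensed so that the negative-zero check runs once, at the end of the always block, whichever branch was taken. The module name sm_adder is made up for this illustration and the branches are merged compared with the original, but the intent is the same sign-magnitude behaviour.

```verilog
// Sketch only: sm_adder is a hypothetical, condensed version of the adder.
module sm_adder #(
    parameter N = 32
) (
    input  [N-1:0] a,
    input  [N-1:0] b,
    output [N-1:0] c
);
    reg [N-1:0] res;
    assign c = res;

    always @* begin
        if (a[N-1] == b[N-1]) begin
            // Same sign: add magnitudes, keep the common sign
            res[N-2:0] = a[N-2:0] + b[N-2:0];
            res[N-1]   = a[N-1];
        end else if (a[N-2:0] > b[N-2:0]) begin
            // Different signs, |a| > |b|: result takes the sign of a
            res[N-2:0] = a[N-2:0] - b[N-2:0];
            res[N-1]   = a[N-1];
        end else begin
            // Different signs, |a| <= |b|: result takes the sign of b
            res[N-2:0] = b[N-2:0] - a[N-2:0];
            res[N-1]   = b[N-1];
        end
        // Final check, applied to every branch: never output "-0"
        if (res[N-2:0] == 0)
            res[N-1] = 1'b0;
    end
endmodule
```

Note that this format is sign-magnitude, so with Q = 27 the operation 1 + (-1) corresponds to a = 32'h08000000 and b = 32'h88000000.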
Can you guys help me build a 32-bit ALU and explain some things to me?
I want to implement:
0 bitwise AND: out = inA & inB.
1 bitwise OR: out = inA | inB.
2 addition: out = inA + inB.
6 subtraction: out = inA - inB //2's complement
7 Set On Less Than: out = ((inA < inB)?1:0)
12 NOR: out = ~( inA | inB)
Done this so far:
module ALU #(parameter N=32)(ALUOut, Zero, ALUinA, ALUinB, ALUop);
output [N-1:0] ALUOut;
reg [N-1:0] ALUOut;
output Zero;
reg Zero;
input [3:0] ALUop;
input [N-1:0] ALUinA, ALUinB;
always @(ALUinA or ALUinB or ALUop)
begin
case (ALUop)
4'b0000: ALUOut = ALUinA & ALUinB ; // 0:AND
Your code is good; it just needs some modifications. ALUOut must be [N:0], since you'll need a carry bit for addition. Likewise, a borrow bit is needed for subtraction.
Referring to SystemVerilog LRM 1800-2012 Section 11.6, Expression bit lengths:
SystemVerilog uses the bit length of the operands to determine how many bits to use while evaluating an expression.
So ALUOut[N-1:0] = ALUinA[N-1:0] + ALUinB[N-1:0]; will strictly evaluate an N-bit expression, while ALUOut = ALUinA + ALUinB; will be evaluated according to the size of ALUOut. Here you cannot see the difference, since all your operands are N bits wide, but when ALUOut is increased to N+1 bits (including carry), it makes a difference.
For example,
module top();
bit [3:0] a,b;
logic [3:0] sum;
bit carry;
assign sum[3:0] = a[3:0] + b[3:0];
// assign {carry,sum}= a + b;
initial
$monitor("a = %0d b = %0d carry = %0d sum = %0d",a,b,carry,sum);
initial
begin
a = 5; b = 1;
#5 ; a = 15; b = 1;
end
endmodule
prints a = 15 b = 1 carry = 0 sum = 0, while using the commented-out assign statement instead prints a = 15 b = 1 carry = 1 sum = 0
Refer to LRM 1800-2012, Section 11.6 for further information.
Also, this and this links regarding ALU design can be useful.
In 2's complement, -B is ~B+1 (~ is bit invert). Therefore A - B == A + (-B) == A + ~B + 1. But you're writing RTL, so you don't need to spell out the 2's complement for subtraction, as it is the default. A - B and A + ~B + 1 will synthesize the same.
A[N-1:0] + B[N-1:0] is always an unsigned operation. A + B can be a signed operation if A and B are declared as input signed [N-1:0] A, B; otherwise it is an unsigned operation.
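The identity A - B == A + ~B + 1 can be checked in simulation with a small exhaustive testbench. This is just a sketch; the module name sub_equiv_tb and the 8-bit width are arbitrary choices for the illustration.

```verilog
// Sketch only: sub_equiv_tb is a hypothetical testbench name.
module sub_equiv_tb;
    reg  [7:0] a, b;
    wire [7:0] d_direct = a - b;          // plain subtraction
    wire [7:0] d_twos   = a + ~b + 1'b1;  // invert b and add 1
    integer i, j, errors;

    initial begin
        errors = 0;
        // Try every pair of 8-bit operands
        for (i = 0; i < 256; i = i + 1)
            for (j = 0; j < 256; j = j + 1) begin
                a = i;
                b = j;
                #1;
                if (d_direct !== d_twos)
                    errors = errors + 1;
            end
        $display("mismatches = %0d", errors);  // expected: mismatches = 0
        $finish;
    end
endmodule
```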
Other notes:
There is an issue with your header. Many simulators, synthesizers, and other Verilog tools will accept what you have, but it is not compliant with the IEEE standard. There are two header styles, ANSI and non-ANSI. I recommend ANSI unless you are required to follow the IEEE1364-1995 version of the standard.
ANSI style (IEEE Std 1364-2001 and above):
module ALU #(parameter N=32)(
output reg [N-1:0] ALUOut,
output reg Zero,
input [N-1:0] ALUinA, ALUinB,
input [3:0] ALUop );
Non-ANSI style (IEEE Std 1364-1995 and above):
module ALU (ALUOut, Zero, ALUinA, ALUinB, ALUop);
parameter N=32;
output [N-1:0] ALUOut;
output Zero;
input [3:0] ALUop;
input [N-1:0] ALUinA, ALUinB;
reg [N-1:0] ALUOut;
reg Zero;
always @(ALUinA or ALUinB or ALUop) is legal syntax. However, since IEEE1364-2001, combinational logic is recommended to be written as always @* or always @(*) (@* and @(*) are synonymous; it is a matter of user preference). With SystemVerilog (IEEE1800), the successor of Verilog (IEEE1364), always_comb is recommended over always @* for combinational logic, and always_latch for level-sensitive latching logic.
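For illustration, here is a sketch of how the ALU from the question might look with the ANSI header and always @*. The operation codes come from the list in the question; the module name alu_sketch, the default case, the unsigned Set On Less Than, and driving Zero from a continuous assignment (instead of a reg) are my assumptions.

```verilog
// Sketch only: alu_sketch is a hypothetical name for this illustration.
module alu_sketch #(parameter N = 32) (
    output reg [N-1:0] ALUOut,
    output             Zero,
    input      [N-1:0] ALUinA, ALUinB,
    input      [3:0]   ALUop
);
    always @* begin
        case (ALUop)
            4'b0000: ALUOut = ALUinA & ALUinB;      // 0:  AND
            4'b0001: ALUOut = ALUinA | ALUinB;      // 1:  OR
            4'b0010: ALUOut = ALUinA + ALUinB;      // 2:  addition
            4'b0110: ALUOut = ALUinA - ALUinB;      // 6:  subtraction
            4'b0111: ALUOut = (ALUinA < ALUinB);    // 7:  set on less than (unsigned here)
            4'b1100: ALUOut = ~(ALUinA | ALUinB);   // 12: NOR
            default: ALUOut = {N{1'b0}};            // assumed default: all zeros
        endcase
    end

    // Zero flag: high when the result is all zeros
    assign Zero = (ALUOut == {N{1'b0}});
endmodule
```

If a signed comparison is wanted for operation 7, the inputs could be declared signed or wrapped in $signed().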
Imagine we want to describe a combinational circuit that satisfy the following truth table:
a b | s0 s1 s2 s3
-----------------
0 0 | 1 d d d
0 1 | 0 1 d d
1 0 | 0 0 1 d
1 1 | 0 0 0 1
(where d stands for "don't care" value, that is, we don't care if the value of this output is 0 or 1)
If we go through traditional design, we can take advantage of these "don't cares" and assign to them the most convenient values so the resulting equations (and hence, the circuit) are the most simple ones. For example, we could change the previous truth table into this one:
a b | s0 s1 s2 s3
-----------------
0 0 | 1 1 1 1
0 1 | 0 1 0 1
1 0 | 0 0 1 1
1 1 | 0 0 0 1
And the final equations would be (using Verilog notation):
s0 = ~a & ~b;
s1 = ~a;
s2 = ~b;
s3 = 1;
(remember when you had to choose values for your outputs in a K-map so you could group as many cells as you can)
But what if I choose to design it using Verilog? I cannot do this:
module encoder (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b1ddd;
2'b01 : s = 4'b01dd;
2'b10 : s = 4'b001d;
2'b11 : s = 4'b0001;
default: s = 4'bdddd;
endcase
end
endmodule
I was told at How to assign default values to outputs in a combinational always block... that I couldn't use x as an output either, only as input. And if I use z, the resulting circuit is even worse in terms of complexity and resources used, as tristate buffers are needed.
So I'm forced to choose at design time which values (1 or 0) I want to output, and these values don't have to yield the most optimized circuit:
module encoder (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b1000;
2'b01 : s = 4'b0100;
2'b10 : s = 4'b0010;
2'b11 : s = 4'b0001;
default: s = 4'b0000;
endcase
end
endmodule
Which leads to these equations (ignoring the default clause for the moment):
s0 = ~a & ~b;
s1 = ~a & b;
s2 = a & ~b;
s3 = a & b;
Or this implementation (taken from the output of Yosys 0.3.0 at EDA Playground):
Which may or may not be the best solution for a given target, but it is what we allow the synthesizer to infer given the outputs we want.
Using the XST synthesizer targeting a Spartan 3E-100k FPGA, the above module uses 2 slices and 4 LUTs.
I assume that Verilog (or any other HDL) should free the designer from having to do such choices, so the synthesizer can apply whatever optimizations are available if the designer allows it to choose the most convenient value for a given output and for a given set of inputs. If that would be the case, then the previous design could have been optimized to look like this:
Targeting the same FPGA as above, it uses 2 slices and 3 LUTs.
For this example, I've been able to make optimizations by hand, but consider a controller module with dozens of outputs to a datapath module. There could be output signals from the controller that may have a don't care value for a given state of the controller.
For example: controller outputs a signal to select from register A or register B, and another signal to enable load of register C, so register C can be loaded with either A or B, or keep its current value.
If load is 0, I don't really care about the value of select, so every time I output load = 0 in the controller description, I should be able to output a "don't care" to select.
So my questions are:
Is there any way to write a Verilog (not SystemVerilog) description so I can give "don't care values" to outputs from combinational blocks?
If not, is this a limitation in the language, or is it much a matter of "you should make your designs so 'don't care' values are not needed"?
ADDENDUM
To my surprise, XST recognizes `x` as a valid output. It's synthesizable and seems to behave the way I expected, resulting in the same circuit being implemented with 2 slices and 3 LUTs. Yosys, on the other hand, seems to ignore it and produces the same output as the non-optimized design.
Rectification: I've tested XST with another design: a circuit that produces this truth table:
a b | s0 s1 s2 s3
-----------------
0 0 | 0 d d d
0 1 | 1 0 d d
1 0 | 1 1 0 d
1 1 | 1 1 1 0
The corresponding Verilog module, written without don't cares, could be written in a number of ways, for example, this one:
module encoder (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b0111;
2'b01 : s = 4'b1011;
2'b10 : s = 4'b1101;
2'b11 : s = 4'b1110;
default: s = 4'b1111;
endcase
end
endmodule
Which produces the worst result, in terms of minimization (2 slices, 4 LUTs in a Spartan 3E FPGA)
An optimized by hand version can be obtained by starting from this truth table:
a b | s0 s1 s2 s3
-----------------
0 0 | 0 0 0 0
0 1 | 1 0 1 0
1 0 | 1 1 0 0
1 1 | 1 1 1 0
It's easy to observe here that 3 of the 4 outputs can be obtained without a single logic gate. Thus, XST reports 1 slice, 1 LUT (the only one needed to calculate s0)
module encoder (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b0000;
2'b01 : s = 4'b1010;
2'b10 : s = 4'b1100;
2'b11 : s = 4'b1110;
default: s = 4'b1110; // yes, same output as above
endcase
end
endmodule
If I use the dirty trick of using x as a "don't care":
module encoder (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b0xxx;
2'b01 : s = 4'b10xx;
2'b10 : s = 4'b110x;
2'b11 : s = 4'b1110;
default: s = 4'bxxxx;
endcase
end
endmodule
The design synthesizes, but the result is not minimal. XST reports 1 slice, 2 LUTs.
The paper @Tim links in his comment is very clear about this matter: avoid using x in your designs. But according to this example, the language does not allow us to help the synthesizer to minimize a circuit.
Saving one or two LUTs may not be a great deal, but if the savings allow this module to stay within a slice, the P&R will have less work to place it wherever it wants.
When I use Quartus II ver 15.0, assigning "don't care" to an output is OK and generates an area-efficient circuit.
For example, if I synthesize this code:
module test1 (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b1000;
2'b01 : s = 4'b0100;
2'b10 : s = 4'b0010;
2'b11 : s = 4'b0001;
default: s = 4'b0000;
endcase
end
endmodule
Quartus generated a circuit which uses 5 Logic Elements.
However, If I use "don't care" assignment in the code above:
module test1 (
input wire a,
input wire b,
output reg [3:0] s
);
always @* begin
case ({a,b})
2'b00 : s = 4'b1xxx;
2'b01 : s = 4'b01xx;
2'b10 : s = 4'b001x;
2'b11 : s = 4'b0001;
default: s = 4'b0000;
endcase
end
endmodule
a circuit which uses only 2 Logic Elements is generated. It's interesting that although fewer logic elements are used in total, the generated circuit seems to be more complex.
I was wondering whether the generated circuit is correct. So I ran Quartus's simulator with the circuit which uses "don't care". The result is the simplest circuit we want.
I would think that supplying x to an output would do the trick -- "unknown" should do exactly what you want. I believe you can wire it directly as an output, but if that's forbidden, you could generate it by wiring both 1 and 0 to the output.
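To make the load/select case from the question concrete, here is a small sketch of a controller output decoder that assigns x to select whenever load is 0. The module name ctrl_sketch and the state encoding are invented for this example, and, as the XST, Yosys and Quartus results in this thread suggest, whether a given synthesizer actually exploits the x (and how simulation mismatches are handled) is tool-dependent.

```verilog
// Sketch only: ctrl_sketch and its state encoding are hypothetical.
module ctrl_sketch (
    input  wire [1:0] state,
    output reg        load,    // enable loading register C
    output reg        select   // 0: take A, 1: take B
);
    always @* begin
        case (state)
            2'd1: begin        // load C from A
                load   = 1'b1;
                select = 1'b0;
            end
            2'd2: begin        // load C from B
                load   = 1'b1;
                select = 1'b1;
            end
            default: begin     // C keeps its value: select is a don't care
                load   = 1'b0;
                select = 1'bx;
            end
        endcase
    end
endmodule
```

In simulation, select shows up as x in the idle states, which also makes it visible if downstream logic accidentally depends on it.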
I'm trying to implement multiplication by 15 as follows.
module mul15(
output [10:0] result,
input [3:0] a
);
assign result = a*15;
endmodule
But is there a better way to multiply a by 15?
I think there are 2 ways, like this:
1. result = a<<4 -1;
2. result = {a,3'b1111_1111};
And I think the best way is 2,
but I'm not sure, also with respect to synthesis.
update:
What if I am multiplying 0 at {a,3'b1111_1111}? This is 255 not 0.
Does anyone know the best way?
Update
How about this way?
Case1
result = {a,8'b0}+ {a,7'b0}+ {a,6'b0}+ {a,5'b0}+ {a,4'b0}+ {a,7'b0}+ {a,3'b0}+ {a,2'b0}+ {a,1'b0}+ a;
But it looks like 8 adders are used.
Case2
result = a<<8 -1
I'm not sure what is the best way else.
There is always a*16 - a. Static multiplications of power of 2 are basically free in hardware; it is just hard-coded 0s to the LSB. So you just need one 11-bit full-subtracter, which is a full adder and some inverters.
other forms:
result = (a<<4) - a; // parentheses are required: binary - binds tighter than <<
result = {a,4'b0} - a; // unsigned full-subtractor
result = {a,4'b0} + ~a + 1'b1; // unsigned full-adder w/ carry in, 2's complement
result = {{3{a[3]}},a,4'b0} + ~{ {7{a[3]}}, a} + 1'b1; // signed full-adder w/ carry in, 2's complement
The cleanest RTL version is as you have stated in the question:
module mul15(
input [3:0] a,
output reg [7:0] result
);
always @* begin
result = a * 4'd15;
end
endmodule
The constant 15 in binary is 4'b1111, that is, 8 + 4 + 2 + 1.
Instead of using a multiplier, it could be broken down into the sum of these powers of 2. Multiplying by a power of 2 is just a shift. This is how a shift-and-add multiplier works.
module mul15(
input [3:0] a,
output reg [7:0] result
);
always @* begin
// 8 4 2 1 =>15
result = (a<<3) + (a<<2) + (a<<1) + a;
end
endmodule
To minimise the number of adders required, a CSD (canonical signed digit) representation could be used, making 15 out of 16 - 1:
module mul15(
input [3:0] a,
output reg [7:0] result
);
always @* begin
// 16 - 1 =>15
result = (a<<4) - a;
end
endmodule
With a modern synthesis tool these should all result in the same thing. Therefore, having more readable code which gives a clear instruction to the tool as to what you intended gives it free rein to optimise as required.
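That the three forms compute the same value can at least be confirmed in simulation. The following is a minimal sketch of such a check; the testbench name mul15_equiv_tb and the wire names are made up for the example.

```verilog
// Sketch only: mul15_equiv_tb is a hypothetical testbench name.
module mul15_equiv_tb;
    reg  [3:0] a;
    wire [7:0] r_mul   = a * 4'd15;                          // plain multiply
    wire [7:0] r_shadd = (a << 3) + (a << 2) + (a << 1) + a; // 8 + 4 + 2 + 1
    wire [7:0] r_csd   = (a << 4) - a;                       // 16 - 1
    integer i, errors;

    initial begin
        errors = 0;
        // A 4-bit input has only 16 values, so test them all
        for (i = 0; i < 16; i = i + 1) begin
            a = i;
            #1;
            if (r_mul !== r_shadd || r_mul !== r_csd)
                errors = errors + 1;
        end
        $display("mismatches = %0d", errors);  // expected: mismatches = 0
        $finish;
    end
endmodule
```

The 8-bit result width matters here: the shifts are evaluated at the width of the assignment target, so no bits of a << 4 are lost.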
I know how to design a 4x4 array multiplier , but if I follow the same logic , the coding becomes tedious.
4 x 4 - 16 partial products
64 x 64 - 4096 partial products.
A 4 x 4 multiplier needs 8 full adders and 4 half adders; how many full adders and half adders do I need for 64 x 64 bits? How do I reduce the number of partial products? Is there a simple way to solve this?
Whenever tediously coding a repetitive pattern you should use a generate statement instead:
module array_multiplier(a, b, y);
parameter width = 8;
input [width-1:0] a, b;
output [width-1:0] y;
wire [width*width-1:0] partials;
genvar i;
assign partials[width-1 : 0] = a[0] ? b : 0;
generate for (i = 1; i < width; i = i+1) begin:gen
assign partials[width*(i+1)-1 : width*i] = (a[i] ? b << i : 0) +
partials[width*i-1 : width*(i-1)];
end endgenerate
assign y = partials[width*width-1 : width*(width-1)];
endmodule
I've verified this module using the following test-bench:
http://svn.clifford.at/handicraft/2013/array_multiplier/array_multiplier_tb.v
EDIT:
As @Debian has asked for a pipelined version - here it is. This time using a for loop in an always-region for the array part.
module array_multiplier_pipeline(clk, a, b, y);
parameter width = 8;
input clk;
input [width-1:0] a, b;
output [width-1:0] y;
reg [width-1:0] a_pipeline [0:width-2];
reg [width-1:0] b_pipeline [0:width-2];
reg [width-1:0] partials [0:width-1];
integer i;
always @(posedge clk) begin
a_pipeline[0] <= a;
b_pipeline[0] <= b;
for (i = 1; i < width-1; i = i+1) begin
a_pipeline[i] <= a_pipeline[i-1];
b_pipeline[i] <= b_pipeline[i-1];
end
partials[0] <= a[0] ? b : 0;
for (i = 1; i < width; i = i+1)
partials[i] <= (a_pipeline[i-1][i] ? b_pipeline[i-1] << i : 0) +
partials[i-1];
end
assign y = partials[width-1];
endmodule
Note that with many synthesis tools it's also possible to just add (width) register stages after the non-pipelined adder and let the tool's register balancing pass do the pipelining.
[how to] reduce the number of partial products?
A method that used to be fairly common is modified Booth encoding:
At the cost of more complicated addend selection, it at least almost halves the number of partial products.
In its simplest form, it considers groups of three adjacent bits (overlapping by one) from one of the operands, say, b, and selects 0, a, 2a, -2a or -a as an addend.
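A behavioural sketch of this recoding, under a few assumptions of mine: both operands are signed two's-complement numbers, the width W is even, and the module name booth_mult is made up. It is meant to show the addend selection, not an optimized adder structure.

```verilog
// Sketch only: booth_mult is a hypothetical name; W is assumed to be even.
module booth_mult #(parameter W = 8) (
    input  signed [W-1:0]       a,
    input  signed [W-1:0]       b,
    output reg signed [2*W-1:0] y
);
    integer i;
    reg        [W:0]     b_ext;   // b with a 0 appended below its LSB
    reg signed [2*W-1:0] a_ext;   // a sign-extended to the product width
    reg signed [2*W-1:0] addend;

    always @* begin
        b_ext = {b, 1'b0};
        a_ext = a;
        y     = 0;
        // W/2 addends instead of W: one per overlapping 3-bit group of b
        for (i = 0; i < W/2; i = i + 1) begin
            case ({b_ext[2*i+2], b_ext[2*i+1], b_ext[2*i]})
                3'b001, 3'b010: addend = a_ext;            // +a
                3'b011:         addend = a_ext <<< 1;      // +2a
                3'b100:         addend = -(a_ext <<< 1);   // -2a
                3'b101, 3'b110: addend = -a_ext;           // -a
                default:        addend = 0;                // 000 or 111
            endcase
            y = y + (addend <<< (2*i));                    // weight 4^i
        end
    end
endmodule
```

For W = 8 this sums 4 addends instead of 8. Unsigned operands would need an extra zero bit above the MSB, and therefore one more group.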
The code below generates only half of the expected output.
module arr_multi(a, b, y);
parameter w = 8;
input [w-1:0] a, b; // w-width
output [(2*w)-1:0] y; // p-partials
wire [(2*w*w)-1:0] p; //assign width as input bits multiplied by output bits
genvar i;
assign p[(2*w)-1 : 0] = a[0] ? b : 0; //first output size bits
generate
for (i = 1; i < w; i = i+1)
begin
assign p[(w*(4+(2*(i-1))))-1 : (w*2)*i] = (a[i]?b<<i :0) + p[(w*(4+(2*
(i-2))))-1 :(w*2)*(i-1)];
end
endgenerate
assign y=p[(2*w*w)-1:(2*w)*(w-1)]; //taking last output size bits
endmodule