How to design a 64 x 64 bit array multiplier in Verilog?

I know how to design a 4 x 4 array multiplier, but if I follow the same logic, the coding becomes tedious.
4 x 4: 16 partial products
64 x 64: 4096 partial products
A 4 x 4 design needs 8 full adders and 4 half adders. How many full adders and half adders do I need for 64 x 64 bits? How do I reduce the number of partial products? Is there a simple way to solve this?

Whenever you find yourself tediously coding a repetitive pattern, use a generate statement instead:
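module array_multiplier(a, b, y);
parameter width = 8;
input [width-1:0] a, b;
output [width-1:0] y; // only the low width bits of a*b
wire [width*width-1:0] partials; // width running sums, width bits each
genvar i;
// running sum 0: b if a[0] is set
assign partials[width-1 : 0] = a[0] ? b : 0;
generate for (i = 1; i < width; i = i+1) begin:gen
// running sum i = running sum i-1 + (b << i) if a[i] is set
assign partials[width*(i+1)-1 : width*i] = (a[i] ? b << i : 0) +
partials[width*i-1 : width*(i-1)];
end endgenerate
// the last running sum is the product, truncated to width bits
assign y = partials[width*width-1 : width*(width-1)];
endmodule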
I've verified this module using the following test-bench:
http://svn.clifford.at/handicraft/2013/array_multiplier/array_multiplier_tb.v
EDIT:
As @Debian has asked for a pipelined version, here it is. This time it uses a for loop in an always block for the array part.
module array_multiplier_pipeline(clk, a, b, y);
parameter width = 8;
input clk;
input [width-1:0] a, b;
output [width-1:0] y;
reg [width-1:0] a_pipeline [0:width-2];
reg [width-1:0] b_pipeline [0:width-2];
reg [width-1:0] partials [0:width-1];
integer i;
always @(posedge clk) begin
a_pipeline[0] <= a;
b_pipeline[0] <= b;
for (i = 1; i < width-1; i = i+1) begin
a_pipeline[i] <= a_pipeline[i-1];
b_pipeline[i] <= b_pipeline[i-1];
end
partials[0] <= a[0] ? b : 0;
for (i = 1; i < width; i = i+1)
partials[i] <= (a_pipeline[i-1][i] ? b_pipeline[i-1] << i : 0) +
partials[i-1];
end
assign y = partials[width-1];
endmodule
Note that with many synthesis tools it's also possible to just add (width) register stages after the non-pipelined multiplier and let the tool's register-balancing (retiming) pass do the pipelining.
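The following is a minimal sketch of that approach. It assumes a synthesis tool with register balancing/retiming enabled; how you enable it is tool-specific and is not shown. The module name `mult_retime` and the `STAGES` parameter are illustrative, not from the answer. The multiply itself is just `a * b`, and the output is 2*WIDTH bits wide so nothing is truncated:

```verilog
// Hypothetical example: a combinational multiply followed by STAGES plain
// output registers. With register balancing (retiming) enabled, a synthesis
// tool may move these registers back into the multiplier logic; without it
// they only add latency.
module mult_retime #(
    parameter WIDTH  = 8,
    parameter STAGES = 8   // latency in clock cycles
) (
    input                    clk,
    input      [WIDTH-1:0]   a, b,
    output     [2*WIDTH-1:0] y
);
    reg [2*WIDTH-1:0] pipe [0:STAGES-1];
    integer s;

    always @(posedge clk) begin
        pipe[0] <= a * b;                 // full 2*WIDTH-bit product
        for (s = 1; s < STAGES; s = s + 1)
            pipe[s] <= pipe[s-1];         // plain delay line
    end

    assign y = pipe[STAGES-1];
endmodule
```

Whether the result actually meets timing depends on how far the tool is willing to move the registers. The explicit pipeline above gives you that control directly.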

[how to] reduce the number of partial products?
A method that used to be fairly common is modified Booth encoding:
at the cost of more complicated addend selection, it almost halves their number.
In its simplest form, it considers groups of three adjacent bits (overlapping by one) from one of the operands, say b, and selects 0, a, 2a, -2a or -a as an addend.
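Below is a minimal behavioural sketch of radix-4 (modified) Booth recoding for signed operands. It assumes an even width W, and the module name `booth_radix4` is hypothetical. The W/2 partial products are simply summed in a loop here. In a real array or tree multiplier, they would feed the adder array instead. Each 3-bit group {b[2j+1], b[2j], b[2j-1]} (with b[-1] = 0) selects 0, ±a or ±2a:

```verilog
// Radix-4 (modified) Booth multiplier sketch: signed W x W -> 2W bits.
// Assumes W is even. Only W/2 partial products are generated.
module booth_radix4 #(parameter W = 8) (
    input  signed [W-1:0]       a,
    input  signed [W-1:0]       b,
    output reg signed [2*W-1:0] y
);
    reg        [W:0]     bx;   // b with an implicit 0 appended below the LSB
    reg signed [2*W-1:0] ax;   // a sign-extended to 2W bits
    reg signed [2*W-1:0] pp;   // current partial product
    reg        [2:0]     grp;  // Booth group {b[2j+1], b[2j], b[2j-1]}
    integer j;

    always @* begin
        bx = {b, 1'b0};
        ax = a;                        // a is signed, so this sign-extends
        y  = 0;
        for (j = 0; j < W/2; j = j + 1) begin
            grp = bx[2*j +: 3];
            case (grp)
                3'b001, 3'b010: pp = ax;          // +a
                3'b011:         pp = ax <<< 1;    // +2a
                3'b100:         pp = -(ax <<< 1); // -2a
                3'b101, 3'b110: pp = -ax;         // -a
                default:        pp = 0;           // 000, 111: 0
            endcase
            y = y + (pp <<< (2*j));    // partial product j has weight 4^j
        end
    end
endmodule
```

For unsigned operands, the usual trick is to zero-extend them by one or two bits first, so that the top group never sees a set "sign" bit.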

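The array_multiplier module above generates only half of the expected output, because y is only width bits wide. This version keeps 2*w bits for every partial sum and for y: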
module arr_multi(a, b, y);
parameter w = 8; // w = operand width
input [w-1:0] a, b;
output [(2*w)-1:0] y; // full 2*w-bit product
wire [(2*w*w)-1:0] p; // w partial sums, 2*w bits each
genvar i;
assign p[(2*w)-1 : 0] = a[0] ? b : 0; // first partial sum
generate
for (i = 1; i < w; i = i+1)
begin : gen
// partial sum i = partial sum i-1 + (b << i) if a[i] is set
assign p[(2*w)*(i+1)-1 : (2*w)*i] = (a[i] ? b << i : 0) + p[(2*w)*i-1 : (2*w)*(i-1)];
end
endgenerate
assign y = p[(2*w*w)-1 : (2*w)*(w-1)]; // the last partial sum is the product
endmodule

Related

What is the difference between a dataflow model and RTL-style coding in Verilog/SystemC?

I learned at school that both models are used from the same perspective, but when I looked online I found pages that give tips for converting dataflow models into RTL models. Can someone explain with an example what exactly the difference is? Thanks in advance.
This is a dataflow model:
module array_multiplier #(parameter width = 8) (
input [width-1:0] a, b,
output [width-1:0] y
);
assign y = a * b;
endmodule
This is an RTL model of a multiplier (extracted from here)
module array_multiplier_ #(parameter width = 8) (
input clk,
input [width-1:0] a, b,
output [width-1:0] y
);
reg [width-1:0] a_pipeline [0:width-2];
reg [width-1:0] b_pipeline [0:width-2];
reg [width-1:0] partials [0:width-1];
integer i;
always @(posedge clk) begin
a_pipeline[0] <= a;
b_pipeline[0] <= b;
for (i = 1; i < width-1; i = i+1) begin
a_pipeline[i] <= a_pipeline[i-1];
b_pipeline[i] <= b_pipeline[i-1];
end
partials[0] <= a[0] ? b : 0;
for (i = 1; i < width; i = i+1)
partials[i] <= (a_pipeline[i-1][i] ? b_pipeline[i-1] << i : 0) +
partials[i-1];
end
assign y = partials[width-1];
endmodule

Systemverilog recursion update value for next stage

I am trying to create recursive logic in SystemVerilog, but I seem to be missing the right way to carry the output of one iteration to the next.
Here is an example of the problem:
parameter WIDTH=4;
module test_ckt #(parameter WIDTH = 4)(CK, K, Z);
input CK;
input [WIDTH-1:0] K;
output reg Z;
wire [WIDTH/2-1:0] tt;
wire [WIDTH-1:0] tempin;
assign tempin = K;
genvar i,j;
generate
for (j=$clog2(WIDTH); j>0; j=j-1)
begin: outer
wire [(2**(j-1))-1:0] tt;
for (i=(2**j)-1; i>0; i=i-2)
begin
glitchy_ckt #(.WIDTH(1)) gckt (tempin[i:i], tempin[(i-1):i-1], tt[((i+1)/2)-1]);
end
// How do I save the value for the next iteration?
wire [(2**(j-1))-1:0] tempin;
assign outer[j].tempin = outer[j].tt;
end
endgenerate
always @(posedge CK)
begin
// How do I use the final output here?
Z <= tt[0];
end
endmodule
module glitchy_ckt #(parameter WIDTH = 1)(A1, B1, Z1);
input [WIDTH-1:0] A1,B1;
output Z1;
assign Z1 = ~A1[0] ^ B1[0];
endmodule
Expected topology:
S1: S1[1] = ~K3 ^ K2 (one gckt), S1[0] = ~K1 ^ K0 (one gckt)
S2: S2 = ~S1[1] ^ S1[0] (one gckt)
Example input and expected outputs:
Expected output:
A - 1010
----
S1 0 0 <- j=2 and i=3,1.
S2 1 <- j=1 and i=1.
Actual output:
A - 1010
----
S1 0 0 <- j=2 and i=3,1.
S2 0 <- j=1 and i=1. Here, because tempin is not updated, inputs are same as (j=2 & i=1).
Test-bench:
`timescale 1 ps / 1 ps
`include "test_ckt.v"
module mytb;
reg CK;
reg [WIDTH-1:0] A;
wire Z;
test_ckt #(.WIDTH(WIDTH)) dut(.CK(CK), .K(A), .Z(Z));
always #200 CK = ~CK;
integer i;
initial begin
$display($time, "Starting simulation");
#0 CK = 0;
A = 4'b1010;
#500 $finish;
end
initial begin
//dump waveform
$dumpfile("test_ckt.vcd");
$dumpvars(0,dut);
end
endmodule
How do I make sure that tempin and tt get updated as I go from one stage to the next?
Your code does not have any recursion in it. You were trying to solve it using loops, but generate blocks are very limited constructs and, for example, you cannot access parameters defined in other generate iterations (but you can access variables or module instances).
So, the idea is to use a real recursive instantiation of the module. In the following implementation the module rec is the one which is instantiated recursively. It actually builds the hierarchy from your example (I hope correctly).
Since you tagged it as SystemVerilog, I used SystemVerilog syntax.
module rec#(WIDTH=1) (input logic [WIDTH-1:0]source, output logic result);
if (WIDTH < 2) begin // recurse down to one bit; stopping at 2 would truncate source and skip the last stage
always_comb
result = source; // << generating the result and exiting recursion.
end
else begin:blk
localparam REC_WDT = WIDTH / 2;
logic [REC_WDT-1:0] newSource;
always_comb // << calculation of your expression
for (int i = 0; i < REC_WDT; i++)
newSource[i] = source[i*2] ^ ~source[(i*2)+1];
rec #(REC_WDT) rec(newSource, result); // << recursive instantiation with WIDTH/2
end // else: !if(WIDTH < 2)
initial $display("%m: W=%0d", WIDTH); // just my testing leftover
endmodule
The module is instantiated first time from the test_ckt:
module test_ckt #(parameter WIDTH = 4)(input logic CK, input logic [WIDTH-1:0] K, output logic Z);
logic result;
rec#(WIDTH) rec(K, result); // instantiate first time (top)
always_ff @(posedge CK)
Z <= result; // assign the results
endmodule // test_ckt
And your testbench, a bit changed:
module mytb;
reg CK;
reg [WIDTH-1:0] A;
wire Z;
test_ckt #(.WIDTH(WIDTH)) dut(.CK(CK), .K(A), .Z(Z));
always #200 CK = ~CK;
integer i;
initial begin
$display($time, "Starting simulation");
CK = 0;
A = 4'b1010;
#500
A = 4'b1000;
#500 $finish;
end
initial begin
$monitor("Z=%b", Z);
end
endmodule // mytb
Use of $display/$monitor is more convenient than dumping traces for such small examples.
I did not do much testing of what I created, so there could be issues, but you can get the basic idea from it in any case. I assume it should work with any WIDTH that is a power of 2.
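If you would rather keep the loop-based structure from the question instead of recursing, one option (a sketch, not part of the answer above) is to give every stage's outputs a slot in a single flat vector that all generate iterations can read. The module name `xnor_tree` and the vector `stage` are hypothetical. As with rec, WIDTH is assumed to be a power of 2. The result is combinational, so you would still register it on posedge CK as test_ckt does:

```verilog
// Same reduction as rec, built with plain generate loops.
// stage[WIDTH-1:0] holds the inputs; each following level (half as wide)
// is packed directly after the previous one, 2*WIDTH-1 bits in total,
// so later levels can read what earlier levels produced.
module xnor_tree #(parameter WIDTH = 4) (
    input  [WIDTH-1:0] K,
    output             Z
);
    wire [2*WIDTH-2:0] stage;
    assign stage[WIDTH-1:0] = K;

    genvar lvl, i;
    generate
        for (lvl = 0; (WIDTH >> (lvl + 1)) > 0; lvl = lvl + 1) begin : level
            localparam IN_OFF  = 2*WIDTH - 2*(WIDTH >> lvl);       // start of this level's inputs
            localparam OUT_OFF = 2*WIDTH - 2*(WIDTH >> (lvl + 1)); // start of its outputs
            for (i = 0; i < (WIDTH >> (lvl + 1)); i = i + 1) begin : node
                // same function as glitchy_ckt: ~upper ^ lower
                assign stage[OUT_OFF + i] =
                    stage[IN_OFF + 2*i] ^ ~stage[IN_OFF + 2*i + 1];
            end
        end
    endgenerate

    assign Z = stage[2*WIDTH-2];   // the last level has a single bit
endmodule
```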

Unexpected behaviour using the ternary operator (Verilog)

In the following Verilog module, I'd like to understand why the blocking assignment using concatenation doesn't give the same result as the 2 commented out blocking assignments.
When I run the program on the FPGA, it gives the expected result with the 2 blocking assignments (the leds blink), but not with the blocking assignment using concatenation (the leds stay off).
Bonus points for answers pointing to the Verilog specification explaining what is at play here!
/* Every second, the set of leds that are lit will change */
module blinky(
input clk,
output [3:0] led
);
reg [3:0] count = 0;
reg [27:0] i = 0;
localparam [27:0] nTicksPerSecond = 100000000;
assign led = {count[3],count[2],count[1],count[0]};
always @(posedge(clk)) begin
// This works:
//count = i==nTicksPerSecond ? (count + 1) : count;
//i = i==nTicksPerSecond ? 0 : i+1;
// But this doesn't:
{count,i} = i==nTicksPerSecond ?
{count+1, 28'b0 } :
{count , i+1};
end
endmodule
PS: I use Vivado 2018.2
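This happens because the widths of count+1 and i+1 are both 32 bits. An unsized number is 32 bits wide (1800-2017 LRM section 5.7.1), and the width of the addition operator is the size of its largest operand (LRM section 11.6.1). The operands of a concatenation are self-determined, so {count, i+1} is 36 bits wide. Assigning it to the 32-bit {count,i} drops its top 4 bits, and count receives bits [31:28] of i+1, which are 0; that is why the LEDs stay off. To make your code work, add a proper size to your numeric literals: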
{count,i} = i==nTicksPerSecond ?
{count+4'd1, 28'b0 } :
{count , i+28'd1};
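The sketch below isolates the effect described above. It uses continuous assignments so that both variants can be compared side by side; the module and port names are hypothetical. With an unsized 1, {count_in, i_in + 1} is 36 bits wide, so the 4 MSBs holding count_in are dropped when it is assigned to a 32-bit target:

```verilog
// Compares {count, i + 1} (unsized 1) with {count, i + 28'd1}.
module concat_width_demo (
    input  [3:0]  count_in,
    input  [27:0] i_in,
    output [3:0]  count_unsized,
    output [27:0] i_unsized,
    output [3:0]  count_sized,
    output [27:0] i_sized
);
    // i_in + 1 is self-determined inside the concatenation: 32 bits wide,
    // so the RHS is 36 bits and its top 4 bits (count_in) are truncated away.
    assign {count_unsized, i_unsized} = {count_in, i_in + 1};

    // i_in + 28'd1 is 28 bits wide, so the RHS is exactly 32 bits.
    assign {count_sized, i_sized} = {count_in, i_in + 28'd1};
endmodule
```

With count_in = 5 and i_in = 7, the unsized variant yields count 0, while the sized one keeps count at 5. Both give i = 8.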
A simpler way to write this code is
always @(posedge clk)
if (i== nTicksPerSecond)
begin
count <= count + 1;
i <= 0;
end
else
begin
i <= i + 1;
end

Bit slicing in verilog

How can I write wdata[((8*j)+7) : (8*i)] = $random; in Verilog, where i and j are reg variables? ModelSim gives an error about a non-constant range. How can I write it properly?
You should think about the solution from a hardware perspective.
Here is one solution. I hope it helps.
module temp(clk);
input clk;
reg i, j;
reg [23:0] register, select;
wire [23:0] temp;
initial
begin
i = 'd1;
j = 'd1;
end
generate
for(genvar i = 0; i<24; i++)
begin
assign temp[i] = select[i] ? $random : register[i];
end
endgenerate
always @(posedge clk)
begin
register <= temp;
end
always @*
begin
select = (32'hffff_ffff << ((j<<3)+8)) ^ (32'hffff_ffff << (i<<3));
end
endmodule
Use the array slicing construct. You can find a more detailed explanation at Array slicing Q&A
bit [7:0] PA, PB;
int loc;
initial begin
loc = 3;
PA = PB; // Read/Write
PA[7:4] = 'hA; // Read/Write of a slice
PA[loc -:4] = PA[loc+1 +:4]; // Read/Write of a variable slice equivalent to PA[3:0] = PA[7:4];
end
Verilog-2001 syntax:
[M -: N] // negative offset from bit index M, N bit result
[M +: N] // positive offset from bit index M, N bit result
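Applied to the original question, here is a sketch (module and port names hypothetical) that overwrites bytes i through j of a word. Because a part-select's width must be constant, it loops over every byte and uses a fixed 8-bit indexed part-select whose base 8*k may vary. In a testbench, you could replace the `fill` input with `$random` per byte:

```verilog
// Overwrites bytes i..j (inclusive) of wdata_in with fill.
// i and j may change at run time; every part-select is a constant 8 bits wide.
module byte_range_write #(parameter NBYTES = 4) (
    input      [8*NBYTES-1:0] wdata_in,
    input      [7:0]          i,          // first byte to overwrite
    input      [7:0]          j,          // last byte to overwrite
    input      [7:0]          fill,       // value written into each selected byte
    output reg [8*NBYTES-1:0] wdata_out
);
    integer k;
    always @* begin
        wdata_out = wdata_in;
        for (k = 0; k < NBYTES; k = k + 1)
            if (k >= i && k <= j)
                wdata_out[8*k +: 8] = fill;   // constant width, variable base
    end
endmodule
```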

Priority encoder in verilog

I am somewhat new to Verilog. I tried running this code, but it gives me an error:
module enc(in,out);
input [7:0] in;
output [3:0] out;
reg i;
reg [3:0] out;
always @*
begin
for (i=0;i<7;i=i+1)
begin
if ((in[i]==1) && (in[7:i+1]==0))
out = i;
else
out = 0;
end
end
endmodule
I think it complains about in[7:i+1], but I don't understand why.
Can someone please advise?
EDIT
OK, so I am reluctant to use X values because of their numerous problems. I was thinking of modifying the code to something like this:
module enc(in,out);
input [7:0] in;
output [2:0] out;
reg i;
reg [2:0] out,temp;
always @*
begin
temp = 0;
for (i=0;i<8;i=i+1)
begin
if (in[i]==1)
temp = i;
end
out = temp;
end
endmodule
Do you think that will do the trick? I currently don't have access to a simulator.
A priority encoder gives priority to one bit when two or more bits meet the criteria. Looking at your code, it appears you wanted to give priority to the LSB while using an up counter. out is assigned in every loop iteration, so even if your code could compile, the final result would be 6 or 0.
For an LSB priority encoder, first start with a default value for out and use a down counter:
module enc (
input wire [7:0] in,
output reg [2:0] out
);
integer i;
always @* begin
out = 0; // default value if 'in' is all 0's
for (i=7; i>=0; i=i-1)
if (in[i]) out = i;
end
endmodule
If you are only interested in simulation, then your linear loop approach should be fine, something like:
out = 0;
for (i = W - 1; i > 0; i = i - 1) begin
if (in[i] && !out)
out = i;
end
If you also care about performance, the question becomes more interesting. I once experimented with different approaches to writing parameterized priority encoders here. It turned out that Synopsys can generate an efficient implementation even from the brain-dead loop above, but other toolchains needed explicit generate magic. Here is an excerpt from the link:
output [WIDTH_LOG - 1:0] msb;
wire [WIDTH_LOG*WIDTH - 1:0] ors;
assign ors[WIDTH_LOG*WIDTH - 1:(WIDTH_LOG - 1)*WIDTH] = x;
genvar w, i;
integer j;
generate
for (w = WIDTH_LOG - 1; w >= 0; w = w - 1) begin
assign msb[w] = |ors[w*WIDTH + 2*(1 << w) - 1:w*WIDTH + (1 << w)];
if (w > 0) begin
assign ors[(w - 1)*WIDTH + (1 << w) - 1:(w - 1)*WIDTH] = msb[w] ? ors[w*WIDTH + 2*(1 << w) - 1:w*WIDTH + (1 << w)] : ors[w*WIDTH + (1 << w) - 1:w*WIDTH];
end
end
endgenerate
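For reference, here is how the excerpt might look wrapped into a complete module. The module name `msb_search`, the port list and the `WIDTH = 1 << WIDTH_LOG` default are assumptions, since the excerpt shows only the body. The level loop runs upwards here; that makes no difference, because every level is a continuous assignment. At each level, msb[w] tells whether the upper half of the current window contains a 1. That half, or otherwise the lower one, becomes the next window:

```verilog
// Binary-search MSB finder: msb = index of the highest set bit of x.
// x == 0 gives msb == 0 (this sketch has no separate "valid" output).
module msb_search #(
    parameter WIDTH_LOG = 3,
    parameter WIDTH     = 1 << WIDTH_LOG
) (
    input  [WIDTH-1:0]     x,
    output [WIDTH_LOG-1:0] msb
);
    // Slot w (WIDTH bits) holds the 2^(w+1)-bit window examined at level w.
    wire [WIDTH_LOG*WIDTH-1:0] ors;
    assign ors[WIDTH_LOG*WIDTH-1 : (WIDTH_LOG-1)*WIDTH] = x;

    genvar w;
    generate
        for (w = 0; w < WIDTH_LOG; w = w + 1) begin : level
            // Is there a 1 in the upper half of this level's window?
            assign msb[w] = |ors[w*WIDTH + 2*(1 << w) - 1 : w*WIDTH + (1 << w)];
            if (w > 0) begin : narrow
                // Keep the half that holds the highest 1 for the next level.
                assign ors[(w-1)*WIDTH + (1 << w) - 1 : (w-1)*WIDTH] =
                    msb[w] ? ors[w*WIDTH + 2*(1 << w) - 1 : w*WIDTH + (1 << w)]
                           : ors[w*WIDTH + (1 << w) - 1 : w*WIDTH];
            end
        end
    endgenerate
endmodule
```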
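So my edited solution worked... how silly! I had declared reg i; (a 1-bit variable) instead of reg [2:0] i;. (Note that with i<8 as the loop condition, even reg [2:0] i wraps from 7 back to 0 and never exits the loop; integer i or reg [3:0] i avoids that.)
Thanks, everybody.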
Hunks, I have to tell you, all your solutions are either too complex, non-synthesizable, or implemented as slow multiplexers. Alexej Bolshakov at OpenCores uploaded an outstanding parametrizable encoder on Aug 23, 2015, based on OR elements. No muxes, 100% synthesizable. Note that it assumes a one-hot (unitary) input: if several bits are set, it ORs their indices together instead of picking one by priority. His code (with my tiny formatting):
module encoder #(
parameter LINES = 16,
parameter WIDTH = $clog2(LINES)
)(
input [LINES-1:0] unitary_in,
output wor [WIDTH-1:0] binary_out
);
genvar i, j;
generate
for (i = 0; i < LINES; i = i + 1)
begin: loop_i
for (j = 0; j < WIDTH; j = j + 1)
begin: loop_j
if (i[j])
assign binary_out[j] = unitary_in[i];
end
end
endgenerate
endmodule
This solution divides the input into four blocks and checks for the first nonzero block. This block is further subdivided in the same way. It is reasonably efficient.
// find position of most significant 1 bit in 64 bits input
// (system verilog)
module bitscan(
input logic [63:0] in, // number input
output logic [5:0] out, // bit position output
output logic zeroout // indicates if input is zero
);
logic [63:0] m0; // intermediates
logic [15:0] m1;
logic [3:0] m2;
logic [5:0] r;
always_comb begin
m0 = in;
// choose between four 16-bit blocks
if (|m0[63:48]) begin
m1 = m0[63:48];
r[5:4] = 3;
end else if (|m0[47:32]) begin
m1 = m0[47:32];
r[5:4] = 2;
end else if (|m0[31:16]) begin
m1 = m0[31:16];
r[5:4] = 1;
end else begin
m1 = m0[15:0];
r[5:4] = 0;
end
// choose between four 4-bit blocks
if (|m1[15:12]) begin
m2 = m1[15:12];
r[3:2] = 3;
end else if (|m1[11:8]) begin
m2 = m1[11:8];
r[3:2] = 2;
end else if (|m1[7:4]) begin
m2 = m1[7:4];
r[3:2] = 1;
end else begin
m2 = m1[3:0];
r[3:2] = 0;
end
// choose between four remaining bits
if (m2[3]) r[1:0] = 3;
else if (m2[2]) r[1:0] = 2;
else if (m2[1]) r[1:0] = 1;
else r[1:0] = 0;
out = r;
zeroout = ~|m2;
end
endmodule
Here is another solution that uses slightly fewer resources:
module bitscan4 (
input logic [63:0] in,
output logic [5:0] out,
output logic zout
);
logic [63:0] m0;
logic [3:0] m1;
logic [3:0] m2;
logic [5:0] r;
always_comb begin
r = 0;
m0 = in;
if (|m0[63:48]) begin
r[5:4] = 3;
m1[3] = |m0[63:60];
m1[2] = |m0[59:56];
m1[1] = |m0[55:52];
m1[0] = |m0[51:48];
end else if (|m0[47:32]) begin
r[5:4] = 2;
m1[3] = |m0[47:44];
m1[2] = |m0[43:40];
m1[1] = |m0[39:36];
m1[0] = |m0[35:32];
end else if (|m0[31:16]) begin
r[5:4] = 1;
m1[3] = |m0[31:28];
m1[2] = |m0[27:24];
m1[1] = |m0[23:20];
m1[0] = |m0[19:16];
end else begin
r[5:4] = 0;
m1[3] = |m0[15:12];
m1[2] = |m0[11:8];
m1[1] = |m0[7:4];
m1[0] = |m0[3:0];
end
if (m1[3]) begin
r[3:2] = 3;
end else if (m1[2]) begin
r[3:2] = 2;
end else if (m1[1]) begin
r[3:2] = 1;
end else begin
r[3:2] = 0;
end
m2 = m0[{r[5:2],2'b0}+: 4];
if (m2[3]) r[1:0] = 3;
else if (m2[2]) r[1:0] = 2;
else if (m2[1]) r[1:0] = 1;
else r[1:0] = 0;
zout = ~|m2;
out = r;
end
endmodule
A part-select range must be constant. A genvar is a constant at elaboration time, so to use the loop index in in[7:i+1] you must put the for loop inside a generate block, like this:
genvar i;
generate
for (i=0;i<7;i=i+1) begin :gen_slices
always @* begin
// ... do whatever with in[7:i+1]
end
end
endgenerate
The problem is that applying this to your module, the way it's written, leads to other errors. Your rewritten module would look like this (be warned: this won't work either):
module enc (
input wire [7:0] in,
output reg [2:0] out // I believe you wanted this to be 3 bits wide, not 4.
);
genvar i; //a generate block needs a genvar
generate
for (i=0;i<7;i=i+1) begin :gen_block
always @* begin
if (in[i]==1'b1 && in[7:i+1]=='b0) // now this IS allowed :)
out = i;
else
out = 3'b0;
end
end
endgenerate
endmodule
This will throw a synthesis error about out being driven from more than one source. This means that the value assigned to out comes from several sources at the same time, and that is not allowed.
This is because the for block unrolls to something like this:
always @* begin
if (in[0]==1'b1 && in[7:1]=='b0)
out = 0;
else
out = 3'b0;
end
always @* begin
if (in[1]==1'b1 && in[7:2]=='b0)
out = 1;
else
out = 3'b0;
end
always @* begin
if (in[2]==1'b1 && in[7:3]=='b0)
out = 2;
else
out = 3'b0;
end
// .... and so on...
So now you have multiple combinational blocks (always @*) trying to set a value on out. All of them operate at the same time, and each one puts a specific value on out whether its if condition evaluates as true or false. Recall that the if conditions are mutually exclusive (i.e. at most one of them can be true at a time).
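So a quick and dirty way to avoid this multi-source situation (I'm sure there are more elegant ways to solve this) is to let out be high impedance whenever a block is not going to assign it a value. For the Z values to resolve, out has to be a net driven by continuous assignments; several always blocks writing the same reg still count as multiple drivers. Something like this: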
module enc (
input wire [7:0] in,
output wire [2:0] out // a net, so the tri-state drivers below are resolved
);
genvar i; //a generate block needs a genvar
generate
for (i=0;i<7;i=i+1) begin :gen_block
// drive out only if in[i] is the highest set bit, otherwise release it (Z)
assign out = (in[i]==1'b1 && in[7:i+1]=='b0) ? i : 3'bZZZ; // in[7:i+1] IS allowed here :)
end
endgenerate
// you missed the case in which in[7] is high
assign out = in[7] ? 3'd7 : 3'bZZZ;
endmodule
On the other hand, if you just need a priority encoder and your design uses small, fixed widths for inputs and outputs, you may write your encoder like this:
module enc (
input wire [7:0] in,
output reg [2:0] out
);
always @* begin
casex (in)
8'b1xxxxxxx : out = 3'd7;
8'b01xxxxxx : out = 3'd6;
8'b001xxxxx : out = 3'd5;
8'b0001xxxx : out = 3'd4;
8'b00001xxx : out = 3'd3;
8'b000001xx : out = 3'd2;
8'b0000001x : out = 3'd1;
8'b00000001 : out = 3'd0;
default : out = 3'd0;
endcase
end
endmodule
(Although there seem to be reasons not to use casex in a design. Read the comment @Tim posted about it in this other question: How can I assign a "don't care" value to an output in a combinational module in Verilog.)
In conclusion: I'm afraid I don't have a bullet-proof design for your requirements (if we take into account the contents of the paper Tim linked in his comment), but at least you now know why i was not allowed inside a part-select.
On the other hand, you can get half of the work done by studying this code I gave as an answer to another SO question. In this case, the module works like a priority encoder, parametrized and without casex statements; only the output is one-hot encoded rather than binary.
How to parameterize a case statement with don't cares?
out = in & (~(in-1))
gives you the one-hot result: only the lowest set bit of in (the first 1 when scanning from LSB to MSB) remains.
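Combining that trick with a one-hot-to-binary step gives a parameterized LSB-first priority encoder without casex. This is a sketch with a hypothetical module name and ports. The `valid` output is added because in == 0 and in == 1 would otherwise both encode to 0:

```verilog
module lsb_priority_enc #(
    parameter LINES = 8,
    parameter WIDTH = $clog2(LINES)
) (
    input      [LINES-1:0] in,
    output reg [WIDTH-1:0] out,    // index of the lowest set bit (0 if none)
    output                 valid   // 1 when any bit of in is set
);
    // in & ~(in - 1) keeps only the lowest set bit of in (all zeros if in == 0)
    wire [LINES-1:0] onehot = in & ~(in - 1'b1);
    integer k;

    // One-hot to binary: OR together the index of every set bit.
    // At most one bit of onehot is set, so this yields exactly its index.
    always @* begin
        out = 0;
        for (k = 0; k < LINES; k = k + 1)
            if (onehot[k]) out = out | k;
    end

    assign valid = |in;
endmodule
```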
