Verilog, generic adder tree

So, I'm trying to write an adder tree in Verilog. The generic part of it is that it has a configurable number of elements to add and a configurable word size. However, I'm encountering problem after problem and I'm starting to question whether this is the right way to solve my problem. (I will be using it in a larger project.) It is definitely possible to just hard code the adder tree, although that will take a lot of text.
So, I thought I'd check with you Stack Overflowers on what you think about it. Is this "the way to do it"? I'm open to suggestions on different approaches too.
I can also mention that I'm quite new to verilog.
In case anyone is interested, here's my current non-working code: (I'm not expecting you to solve the problems; I'm just showing it for convenience.)
module adderTree(
input clk,
input [`WORDSIZE * `BANKSIZE - 1 : 0] terms_flat,
output [`WORDSIZE - 1 : 0] sum
);
genvar i, j;
reg [`WORDSIZE - 1 : 0] pipeline [2 * `BANKSIZE - 1 : 0]; // Pipeline array
reg clkPl = 0; // Pipeline clock
assign sum = pipeline[0];
// Pack flat terms
generate
for (i = `BANKSIZE; i < 2 * `BANKSIZE; i = i + 1) begin
always @ (posedge clk) begin
pipeline[i] <= terms_flat[i * `WORDSIZE +: `WORDSIZE];
clkPl = 1;
end
end
endgenerate
// Add terms logarithmically
generate
for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
for (j = 0; j < 2 ** i; j = j + 1) begin
always @ (posedge clkPl) begin
pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
end
end
end
endgenerate
endmodule

Here are a few comments you might find useful:
CLOCKING
It is generally good to have as few clocks as possible in your design (preferably just one).
In this particular case it appears you are trying to generate a new clock, clkPl, but this does not work because it will never return to 0. (The "reg clkPl = 0;" initialises it to 0 at time 0, then "clkPl = 1;" sets it permanently to 1.)
You can fix this by simply replacing
always @ (posedge clkPl)
with
always @ (posedge clk)
ASSIGNMENTS
It is good form to only use blocking assignments in combinatorial blocks, and non-blocking in clocked blocks. You are mixing both blocking and non-blocking assignments in your "Pack flat terms" section.
As you don't need clkPl you can simply delete the line with the blocking assignment ("clkPl = 1;")
TREE STRUCTURE
Your double for loop:
for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
for (j = 0; j < 2 ** i; j = j + 1) begin
always @ (posedge clkPl) begin
pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
end
end
end
looks like it will access incorrect elements.
e.g. for BANKSIZE = 2**8, i will count up to 7, at which point "pipeline[i * (2 ** i) + j]" = "pipeline[7*2**7+j]" = "pipeline[896+j]", which will be out of bounds for the array. (The array has 2*BANKSIZE = 512 elements in it.)
I think you actually want this structure:
assign sum = pipeline[1];
for (i = 1; i < `BANKSIZE; i = i + 1) begin
always @ (posedge clk) begin
pipeline[i] <= pipeline[i*2] + pipeline[i*2 + 1];
end
end
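Putting the comments above together, a complete module might look roughly like the sketch below. It is only a sketch under some assumptions: the parameters N and W stand in for `BANKSIZE and `WORDSIZE, N is assumed to be a power of two, and the module name is made up. One extra change that is not discussed above: the leaves are read from terms_flat at offset (i - N), because indexing with i itself (which starts at N) would select bits beyond the top of terms_flat.

```systemverilog
// Sketch only: N and W are placeholder names for `BANKSIZE and `WORDSIZE.
// Assumes N is a power of two. The sum is truncated to W bits, as in the question.
module adder_tree_sketch #(
    parameter N = 8,    // number of terms to add (power of two assumed)
    parameter W = 16    // word size in bits
) (
    input              clk,
    input  [W*N-1:0]   terms_flat,
    output [W-1:0]     sum
);
    // Heap-style storage: leaves live at [N .. 2N-1], the root at [1].
    reg [W-1:0] pipeline [1:2*N-1];

    assign sum = pipeline[1];

    genvar i;
    generate
        // Register the flat input terms into the leaf positions.
        for (i = N; i < 2*N; i = i + 1) begin : g_leaf
            always @(posedge clk)
                pipeline[i] <= terms_flat[(i - N)*W +: W];
        end
        // Each internal node registers the sum of its two children.
        for (i = 1; i < N; i = i + 1) begin : g_node
            always @(posedge clk)
                pipeline[i] <= pipeline[2*i] + pipeline[2*i + 1];
        end
    endgenerate
endmodule
```

With this layout every level of the tree is registered on the same clock, so a new set of terms can be accepted each cycle and the result appears $clog2(N) + 1 clock edges later.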
LOWER LATENCY
Note that most verilog tools are very good at synthesising adds of multiple elements so you may want to consider combining more terms at each level of the hierarchy.
(Adding more terms costs less than someone might expect because the tools can use optimisations such as carry save adders to reduce the gate delay.)
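As an illustration of combining more terms per level, a single pipeline stage that adds four terms at once could be sketched as follows. The module and port names are made up for the example, and whether this actually meets timing depends on your tool and target.

```systemverilog
// Sketch only: one registered stage that adds four W-bit terms.
// The output is W+2 bits wide, so the sum of four terms cannot overflow.
module add4_stage_sketch #(
    parameter W = 16
) (
    input                clk,
    input      [W-1:0]   a,
    input      [W-1:0]   b,
    input      [W-1:0]   c,
    input      [W-1:0]   d,
    output reg [W+1:0]   sum
);
    always @(posedge clk)
        sum <= a + b + c + d;   // the tool is free to pick the adder structure
endmodule
```

A tree built from stages like this needs about half as many pipeline levels as one that adds two terms per level.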

Related

How to create an array to store integers in a testbench?

I'm going through all my Verilog modules and creating good testbenches for them. I already finished the larger project, but I want to get better at writing testbenches as well as upload the testbenches to the project repository.
I have the following code for a testbench that tests a module I have. This module takes a 16-bit input, trims it down to 10 bits (data input from an accelerometer, only 10 bits are used but the input is 16 bits for consistency), then converts the 10 bit signed number into decimal by outputting the number of ones, tens, hundreds, and thousands.
This is the code I have for the testbench (Note: Decimal_Data represents the "usable" 10 bits of data):
module Binary_to_Decimal_TB();
reg [15:0] Accel_Data = 16'd0;
wire [3:0] ones;
wire [3:0] tens;
wire [3:0] hundreds;
wire [3:0] thousands;
wire negative;
wire [9:0] Decimal_Data;
integer i;
integer j = 0;
Binary_to_Decimal BtD(.Accel_Data(Accel_Data), .Decimal_Data(Decimal_Data),
.ones(ones), .tens(tens), .hundreds(hundreds), .thousands(thousands), .negative(negative));
initial
begin
for (i = 0; i < 2047; i = i + 1)
begin
Accel_Data = i;
#2;
if (Decimal_Data < 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) != 4 * Decimal_Data)
j = j + 1;
end
else if (Decimal_Data == 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) != 0)
j = j + 1;
end
else
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) != 2048 - (4 * (Decimal_Data - 512)))
j = j + 1;
end
end
$display("Conversion mismatches:", j);
$finish;
end
endmodule
It works as it should, and I get zero mismatches between the numbers represented by the converted data, and the original 2's complement 10-bit number.
However, I want to write it so that if there were an error, it would save the binary number where there is a mismatch between the input and output. In C++, I'd create an array, and dynamically allocate it with the binary numbers where a mismatch is detected (since we don't know how many mismatches there would be if the design were faulty).
More clearly, right now I can see how many mismatch errors occur. I want to implement a way to see where the errors occur.
I know I'd write to the array in the conditional where I increment j, but I don't know how to create an array that's used for this purpose in Verilog.
Also, I've heard SystemVerilog is better for verification, so maybe there's something in SystemVerilog that I could use to accomplish this? I haven't really looked into SystemVerilog, but I do want to learn it.
In SystemVerilog you can create a queue, which is dynamic, to store the mismatched results.
logic [15:0] mismatches [$];
...
if (Decimal_Data < 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) !== 4 * Decimal_Data) begin
j = j + 1;
mismatches.push_back(Accel_Data);
end
end
...
$display("Conversion mismatches:", j);
foreach (mismatches[i]) $display(mismatches[i]);
$finish;
Refer to IEEE Std 1800-2017, section 7.10 Queues.
Since you are comparing 4-state values, you should use the case inequality operator (!==), as I have shown above. Furthermore, since you are comparing your DUT outputs to each other, you should also check if there are x or z values using $isunknown.
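As a rough, self-contained sketch of the two ideas above (a queue that grows as needed, and an x/z check with $isunknown), here is a tiny example. The module name, the sample values and the variable names are made up for illustration and are not from your testbench.

```systemverilog
// Sketch only: collect every sample that contains x or z bits in a queue.
module queue_isunknown_sketch;
    // Made-up 4-state sample values; two of them contain unknown bits.
    logic [3:0] samples [5] = '{4'd3, 4'bx101, 4'd9, 4'bz000, 4'd12};
    logic [3:0] bad [$];    // queue: starts empty and grows with push_back

    initial begin
        foreach (samples[k])
            if ($isunknown(samples[k]))
                bad.push_back(samples[k]);

        $display("Samples with x/z bits: %0d", bad.size());
        foreach (bad[k]) $display("  %b", bad[k]);
        $finish;
    end
endmodule
```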
Also, there is common code in your testbench which can be combined. This is especially important as you add more checking code. For example:
logic [15:0] mismatches [$];
integer expect_data;
initial
begin
for (i = 0; i < 2047; i = i + 1)
begin
Accel_Data = i;
#2;
if (Decimal_Data < 512)
begin
expect_data = 4 * Decimal_Data;
end
else if (Decimal_Data == 512)
begin
expect_data = 0;
end
else
begin
expect_data = 2048 - (4 * (Decimal_Data - 512));
end
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) !== expect_data) begin
j = j + 1;
mismatches.push_back(Accel_Data);
end
end
$display("Conversion mismatches:", j);
foreach (mismatches[i]) $display(mismatches[i]);
$finish;
end
endmodule
Out of curiosity, why not print the mismatches when they occur? No need to store them and print them out later since it is all non-synthesized code. Example:
logic [15:0] mismatches [$];
...
if (Decimal_Data < 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) !== 4 * Decimal_Data) begin
j = j + 1;
$display("ERROR: Mismatch (expected=%d) (actual=%d) @%t", 4*Decimal_Data, ones+(tens*10)+(hundreds*100)+(thousands*1000), $time);
end
end
...
$display("Conversion mismatches:", j);
$finish;

Initialize and increment a genvar variable inside nested for loop of system verilog

I'm trying to write a synthesizable code in .sv to meet my requirements, but got stuck at the very end without a solution;
I've an incoming wire which is 99 bit wide and 10 (MAX) deep
I need to feed this to a module and get its output which is 99 bit
the output needs to be assigned to an array (99 bit wide ) from 0 .. max
I can't paste the snippet, but this is what I've coded:
--
param MAX = 10;
param TOT_INST = 45 ;
wire [98:0] inwire [MAX-1:0] ;
wire [98:0] outwire [TOT_INST-1:0]
genvar x,y,z ;
generate
for (x = 0, z=0; x<=MAX-1; x=x+1, z=z+1 )
begin : gen1
for (y = x + 1; y<=MAX-1; y=y+1)
begin : gen2
some_module mod_inst ( .in1(inwire[x]), .in2(inwire[y]), .y(outwire[z]) );
end
end
endgenerate
--
The expectation is for outwire[0] to be the output for inwire[0] and inwire[1], outwire[1] to be a function of inwire[0] and inwire[2], etc. So, it becomes necessary to increment the index of outwire.
I used another genvar z for this purpose (to increment from 0 to 44). But it looks like SV doesn't support incrementing multiple genvar variables in one loop? I'm getting a compilation error at the for loop itself.
Is there any way to achieve what i need? I really appreciate you taking time to go through this question. Any insight would be really helpful.
Thanks
Jr.
I understand your intent. It seems you are trying to use comma expressions, but they won't work here.
Also, it seems that the genvar can only be assigned in the initialization and increment of the for loop, otherwise it would be easy to increment them on the innermost loop.
Since you must have unique drivers to the outwires, and the number of entries you declared (45) matches the number of instances you will create I assume you simply want them to be set incrementally.
What I would do is to calculate the number of iterations algebraically and create a local parameter. If you can't see how, review triangular numbers.
parameter MAX = 10;
// Generalizing your TOT_INST
parameter TOT_INST = MAX * (MAX - 1) / 2 ;
wire [98:0] inwire [MAX-1:0] ;
wire [98:0] outwire [TOT_INST-1:0];
genvar x,y;
generate
for (x = 0; x <= MAX-1; x = x + 1 )
begin : gen1
for (y = x + 1; y<=MAX-1; y=y+1)
begin : gen2
localparam z = TOT_INST - ((MAX - x - 2) * (MAX - x - 1)) / 2 + y - MAX;
initial begin
$display("%d %d %d", x, y, z);
end
end
end
endgenerate
The formula would be simpler if we used the x in the inner loop.
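For completeness, here is a sketch of the same loops with the instance wired to outwire[z] instead of the $display. pair_op_stub is a hypothetical stand-in for some_module (the XOR is only a placeholder body), and the wrapper module name and ports are made up for the example.

```systemverilog
// Hypothetical stand-in for some_module; the XOR is just a placeholder.
module pair_op_stub (
    input  wire [98:0] in1,
    input  wire [98:0] in2,
    output wire [98:0] y
);
    assign y = in1 ^ in2;
endmodule

// Sketch only: one instance per unordered pair (x, y) with x < y.
module pairwise_sketch #(
    parameter MAX = 10
) (
    input  wire [98:0] inwire  [MAX-1:0],
    output wire [98:0] outwire [MAX*(MAX-1)/2-1:0]
);
    localparam TOT_INST = MAX * (MAX - 1) / 2;

    genvar x, y;
    generate
        for (x = 0; x <= MAX-1; x = x + 1)
        begin : gen1
            for (y = x + 1; y <= MAX-1; y = y + 1)
            begin : gen2
                // Same index formula as above: a constant computed per iteration.
                localparam z = TOT_INST - ((MAX - x - 2) * (MAX - x - 1)) / 2 + y - MAX;
                pair_op_stub mod_inst ( .in1(inwire[x]), .in2(inwire[y]), .y(outwire[z]) );
            end
        end
    endgenerate
endmodule
```

With MAX = 10 the pair (0, 1) maps to z = 0, (0, 9) to z = 8, (1, 2) to z = 9 and (8, 9) to z = 44, so every entry of outwire gets exactly one driver.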

Most significant bit operand in part-select of vector wire is illegal

I want to make a parameterized FIR filter in Verilog on Xilinx. This is my code:
module FIRFilter(xInput, clock, reset, filterCoeff, yOutput);
parameter inputBits = 8, lengthOfFilter = 4, coeffBitLength = 8, lengthOfCoeff = lengthOfFilter + 1, outputBitWdth = 2 * inputBits;
input [(coeffBitLength * lengthOfCoeff) - 1 : 0] filterCoeff;
input clock, reset;
input [inputBits - 1 : 0] xInput;
reg [outputBitWdth - 1 : 0] addWires [lengthOfFilter - 1 : 0];
output reg [outputBitWdth - 1 : 0] yOutput;
reg [inputBits - 1 : 0] registers [lengthOfFilter - 1 : 0];
integer i, j;
always @ (posedge clock, posedge reset)
begin
if(reset)
begin
for(i = 0; i < lengthOfFilter; i = i + 1)
begin
registers[i] <= 0;
end
end
else
begin
registers[0] <= xInput;
for(i = 1; i < lengthOfFilter; i = i + 1)
begin
registers[i] <= registers[i - 1];
end
end
end
always @ (posedge clock)
begin
addWires[0] = filterCoeff[(lengthOfFilter * coeffBitLength) - 1 : (lengthOfFilter - 1) * coeffBitLength] * xInput;
for(j = 1; j < lengthOfFilter; j = j + 1)
begin
addWires[j] = (filterCoeff[((j + 1) * coeffBitLength) - 1 : j * coeffBitLength] * registers[j - 1]) + addWires[j - 1];
end
yOutput = (filterCoeff[coeffBitLength - 1 : 0] * registers[lengthOfFilter - 1]) + addWires[lengthOfFilter - 1];
end
endmodule
But I keep getting this error
ERROR:HDLCompilers:109 - "FIRFilter.v" line 33 Most significant bit operand in part-select of vector wire 'filterCoeff' is illegal
ERROR:HDLCompilers:110 - "FIRFilter.v" line 33 Least significant bit operand in part-select of vector wire 'filterCoeff' is illegal
ERROR:HDLCompilers:45 - "FIRFilter.v" line 33 Illegal right hand side of blocking assignment
I searched online for the solution but haven't got any satisfactory answer.
Can someone help me with this?
Verilog does not allow part selects signal[msb:lsb] where msb and lsb are not constants. Instead, you can use an indexed part select, which takes a variable offset and a constant width: signal[offset +: width].
addWires[0] = filterCoeff[(lengthOfFilter * coeffBitLength) +:coeffBitLength] * xInput;
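The same construct is what makes the select inside a for loop legal, since there the offset depends on the loop variable. A minimal sketch, using made-up names (C, N, coeff_flat, coeff_sum) rather than the ones from the question:

```systemverilog
// Sketch only: walk a flat coefficient vector with an indexed part select.
module part_select_sketch #(
    parameter C = 8,    // bits per coefficient
    parameter N = 5     // number of coefficients packed into coeff_flat
) (
    input  wire [C*N-1:0]            coeff_flat,
    output reg  [C+$clog2(N)-1:0]    coeff_sum   // wide enough for N C-bit values
);
    integer j;

    always @* begin
        coeff_sum = 0;
        for (j = 0; j < N; j = j + 1)
            // Selects the same bits as coeff_flat[(j+1)*C-1 : j*C],
            // but that form is rejected because j is not a constant.
            coeff_sum = coeff_sum + coeff_flat[j*C +: C];
    end
endmodule
```

The width after "+:" must still be a constant (here the parameter C); only the starting offset may vary.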

Verilog error : A reference to a wire or reg is not allowed in a constant expression

I'm new to Verilog and I would really appreciate it if someone could help me out with this error:
output reg [0:image_width][image_height:0] result
....
integer i, j, imageX, imageY, x, y, kernelX, kernelY;
....
@(negedge ACLK)
for(x = 0; x < image_width; x++) begin
for(y = 0; y < image_height; y++)
begin
//multiply every value of the filter with corresponding image pixel
for(kernelX = 0; kernelX < kernel_width; kernelX++) begin
for(kernelY = 0; kernelY < kernel_height; kernelY++)
begin
imageX = (x - kernel_width / 2 + kernelX + image_width) % image_width;
imageY = (y - kernel_height / 2 + kernelY + image_height) % image_height;
// ignore input samples which are out of bound
if( imageY >= 0 && imageY < image_height && imageX >= 0 && imageX < image_width )
//ERROR HERE!!!
result[x][y] += image[imageX][imageY] * kernel[kernelX][kernelY];
end
end
end
end
end
The error I get is:
error: A reference to a wire or reg ('x') is not allowed in a constant expression.
error: Array index expressions must be constant here.
error: A reference to a wire or reg ('imageX') is not allowed in a constant expression.
error: Array index expressions must be constant here.
error: A reference to a wire or reg ('kernelX') is not allowed in a constant expression.
error: Array index expressions must be constant here.
Could somebody tell me what I'm doing wrong? Thank you!
This line is the problem:
result[x][y] += image[imageX][imageY] * kernel[kernelX][kernelY];
Indexing into arrays is only allowed for constant expressions. You are not allowed to use variables in vector indexes. Remember that you're working with an HDL: you're dictating physical connections in hardware. Having a variable in the index implies the ability to dynamically rewire the circuit. This SO question has some rough workarounds that may work for you. However, you should really try to refactor your algorithm to avoid the need to use the variable indexing in the first place.
By the way, you should be using non-blocking assignments instead of the blocking assignments you currently have. Your code is in a clocked block, so blocking combinational logic should be avoided:
imageX <= (x - kernel_width / 2 + kernelX + image_width) % image_width;
imageY <= (y - kernel_height / 2 + kernelY + image_height) % image_height;
// ignore input samples which are out of bound
if( imageY >= 0 && imageY < image_height && imageX >= 0 && imageX < image_width )
result[x][y] <= result[x][y] + image[imageX][imageY] * kernel[kernelX][kernelY];
@(negedge ACLK);
               ^
I'm pretty sure that semicolon doesn't belong there. As written, the for loops are all outside the always block.
Additionally, your image array currently only has one bit per pixel. Is this intentional? Whether it is or not, I would recommend that you reconsider this architecture; filtering an image of any significant size in a single clock cycle is not going to synthesize very well.

Illegal operand for constant expression

I'm trying to build a task, which must delve into some hierarchy, that can concisely compare different pins on a particular instance. In particular, I'd like to do something like the following:
task check_expected;
input integer pin;
input [9:0] expected;
integer i, j;
reg [9:0] check;
begin
j = 0;
for (i = 0; i < 20; i = i + 1) begin
case (pin)
0: begin
check[0] = test.inst[i].lane_0.PIN_FIRST;
check[1] = test.inst[i].lane_1.PIN_FIRST;
...
check[9] = test.inst[i].lane_9.PIN_FIRST;
end
1: begin
check[0] = test.inst[i].lane_0.PIN_SECOND;
check[1] = test.inst[i].lane_1.PIN_SECOND;
...
check[9] = test.inst[i].lane_9.PIN_SECOND;
end
...
9: begin
check[0] = test.inst[i].lane_0.PIN_TENTH;
check[1] = test.inst[i].lane_1.PIN_TENTH;
...
check[9] = test.inst[i].lane_9.PIN_TENTH;
end
endcase
if (check[0] !== expected[j*10 + 0]) begin
TEST_FAILED = TEST_FAILED + 1;
$display("ERROR Expected=%b, @ %0t",expected[j*10 + 0],$time);
end
if (check[1] !== expected[j*10 + 1]) begin
TEST_FAILED = TEST_FAILED + 1;
$display("ERROR Expected=%b, @ %0t",expected[j*10 + 1],$time);
end
...
if (check[9] !== expected[j*10 + 9]) begin
TEST_FAILED = TEST_FAILED + 1;
$display("ERROR Expected=%b, @ %0t",expected[j*10 + 9],$time);
end
end
end
endtask
Unfortunately, attempting to do the above throws a NOTPAR error during elaboration, claiming that it is unacceptable to assign a register to a non-constant (it doesn't like any lines like check[0] = test.inst[i].lane_0.PIN_FIRST;). This is just for testing purposes, not anything synthesizable, by the way.
Can someone explain why this is disallowed and suggest a different solution? It's looking like I'll need to write a task for each and every loop iteration, and that seems like it would be needlessly bloated and ugly.
Thanks
To answer my own question, the answer is that there is no way to do it with Verilog. Verilog is an incredibly dumb (in terms of capabilities) language, and, with a task, can only support constant indices for module instances. No looping is possible.
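One workaround that is sometimes used, sketched here under assumptions: a genvar is an elaboration-time constant, so a generate loop can copy each hierarchical pin into an ordinary vector, and procedural code can then index that vector with a variable. The modules lane_stub and inst_stub below are hypothetical stand-ins for the real hierarchy (only two lanes and one pin are shown), and simulator support for hierarchical references into instance arrays may vary.

```systemverilog
// Hypothetical stand-ins for the real hierarchy; names are for illustration only.
module lane_stub;
    reg PIN_FIRST = 1'b0;
endmodule

module inst_stub;
    lane_stub lane_0 ();
    lane_stub lane_1 ();
endmodule

module test;
    inst_stub inst [19:0] ();    // array of instances: test.inst[0] .. test.inst[19]

    // Copy the pins into plain vectors. The index g is a genvar, so each
    // hierarchical path is resolved at elaboration time.
    wire [19:0] lane0_first;
    wire [19:0] lane1_first;
    genvar g;
    generate
        for (g = 0; g < 20; g = g + 1) begin : tap
            assign lane0_first[g] = test.inst[g].lane_0.PIN_FIRST;
            assign lane1_first[g] = test.inst[g].lane_1.PIN_FIRST;
        end
    endgenerate

    // Procedural code may index the vectors with an ordinary variable.
    integer i;
    initial begin
        #1;
        for (i = 0; i < 20; i = i + 1)
            if (lane0_first[i] !== 1'b0 || lane1_first[i] !== 1'b0)
                $display("ERROR inst %0d @ %0t", i, $time);
        $finish;
    end
endmodule
```

The checking loop then plays the role of the task body, with lane0_first[i] in place of test.inst[i].lane_0.PIN_FIRST.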

Resources