I wrote a simple Verilog module that calculates the inner product of two vectors. When I simulate it, the module's output shows up as a red (unknown) signal, and I don't understand why. The problem is probably related to the result variable.
Does anyone know what I'm missing?
The code -
`resetall
`timescale 1ns/10ps
`include "../hdl/params.v"
module ZCalculationParllel(features_vec, weights_vec, bias, predict);
// Feature vector in size of |NUMBER_OF_PIXELS| * |PIXEL_PRECISION|
input [(`NUMBER_OF_PIXELS * `PIXEL_PRECISION) - 1 : 0] features_vec;
// Weights vector in size of |NUMBER_OF_PIXELS| * |WEIGHT_BIAS_PRECISION|
input [(`NUMBER_OF_PIXELS * `WEIGHT_BIAS_PRECISION) - 1 : 0] weights_vec;
// Bias vector in size |WEIGHT_BIAS_PRECISION|
input [`WEIGHT_BIAS_PRECISION - 1 : 0] bias;
// The output value for prediction
//output predict; in case we want return one bit represent prediction
output [(2 * `PIXEL_PRECISION + $bits(`NUMBER_OF_PIXELS)) - 1 : 0] predict;
// Accomulator array for saving the multiplication result before the adders
wire [`PIXEL_PRECISION_PLUS_WEIGHT_BIAS_PRECISION - 1 : 0] multiplications [0 : `NUMBER_OF_PIXELS - 1];
wire [(2 * `PIXEL_PRECISION + $bits(`NUMBER_OF_PIXELS)) - 1 : 0] result;
genvar k;
generate
for (k = 0; k < `NUMBER_OF_PIXELS; k = k + 1)
begin: elementMul
assign multiplications[k] = features_vec[(k + 1) * `PIXEL_PRECISION - 1 -: `PIXEL_PRECISION] * weights_vec[(k + 1) * `WEIGHT_BIAS_PRECISION - 1 -: `WEIGHT_BIAS_PRECISION];
end
endgenerate
assign result = 2;
genvar i;
generate
for (i = 0; i < `NUMBER_OF_PIXELS; i = i + 1)
begin: elementAdd
assign result = result + multiplications[i];
end
endgenerate
assign predict = result;
endmodule
The test -
`resetall
`timescale 1ns/10ps
`include "../hdl/ZCalculationParllel.v"
`include "../hdl/params.v"
module ZCalculationParllel_tb ;
reg [(`NUMBER_OF_PIXELS * `PIXEL_PRECISION) - 1 : 0] features;
reg [(`NUMBER_OF_PIXELS * `WEIGHT_BIAS_PRECISION) - 1 : 0] weights;
reg [`WEIGHT_BIAS_PRECISION - 1 : 0] b;
wire [(2 * `PIXEL_PRECISION + $bits(`NUMBER_OF_PIXELS)) - 1 : 0] ans;
ZCalculationParllel z (
.features_vec(features),
.weights_vec(weights),
.bias(b),
.predict(ans)
);
initial
begin
b = 8'b00000010;
features[7:0] = 8'b00000010;
features[15:8] = 8'b00000001;
features[24:16] = 8'b00000010;
weights[7:0] = 8'b00000010;
weights[15:8] = 8'b00000001;
weights[24:16] = 8'b00000010;
end
endmodule
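assign statements are applied concurrently, not sequentially. Xs will result if any of the assignments conflict. Here result has several drivers (assign result = 2 plus one assign per loop iteration), and each assign result = result + multiplications[i] also feeds result back into itself.
Replace your result-related code with something like the following: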
reg [(2 * `PIXEL_PRECISION + $bits(`NUMBER_OF_PIXELS)) - 1 : 0] result;
integer i;
always @* begin
result = 2;
for (i = 0; i < `NUMBER_OF_PIXELS; i = i + 1) begin
result = result + multiplications[i];
end
end
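As a self-contained illustration of the same idea, here is a minimal sketch that accumulates inside a single always @* block. The module name inner_product, the parameters N and W, and the port names are assumptions made for this example and are not taken from the question's params.v. The multiplication is folded into the loop, and the accumulator starts at 0 rather than 2.

```verilog
// Sketch only: names and parameter values are assumptions, not the asker's.
module inner_product #(
    parameter N = 3,   // number of elements per vector
    parameter W = 8    // bits per element
) (
    input  wire [N*W-1:0]           a_flat,
    input  wire [N*W-1:0]           b_flat,
    output reg  [2*W+$clog2(N)-1:0] dot
);
    integer i;
    // One procedural block: the statements run in order, so the
    // accumulator has a single driver and no feedback loop.
    always @* begin
        dot = 0;
        for (i = 0; i < N; i = i + 1)
            dot = dot + a_flat[i*W +: W] * b_flat[i*W +: W];
    end
endmodule

module inner_product_tb;
    reg  [23:0] a, b;
    wire [17:0] dot;

    inner_product #(.N(3), .W(8)) dut (.a_flat(a), .b_flat(b), .dot(dot));

    initial begin
        a = {8'd2, 8'd1, 8'd2};
        b = {8'd2, 8'd1, 8'd2};
        #1;
        $display("dot = %0d", dot);  // 2*2 + 1*1 + 2*2 = 9
        $finish;
    end
endmodule
```

The output width of 2*W + $clog2(N) bits is enough to hold N products of two W-bit unsigned values without overflow.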
I'm going through all my Verilog modules and creating good testbenches for them. I already finished the larger project, but I want to get better at writing testbenches as well as upload the testbenches to the project repository.
I have the following code for a testbench that tests a module I have. This module takes a 16-bit input, trims it down to 10 bits (data input from an accelerometer, only 10 bits are used but the input is 16 bits for consistency), then converts the 10 bit signed number into decimal by outputting the number of ones, tens, hundreds, and thousands.
This is the code I have for the testbench (Note: Decimal_Data represents the "usable" 10 bits of data):
module Binary_to_Decimal_TB();
reg [15:0] Accel_Data = 16'd0;
wire [3:0] ones;
wire [3:0] tens;
wire [3:0] hundreds;
wire [3:0] thousands;
wire negative;
wire [9:0] Decimal_Data;
integer i;
integer j = 0;
Binary_to_Decimal BtD(.Accel_Data(Accel_Data), .Decimal_Data(Decimal_Data),
.ones(ones), .tens(tens), .hundreds(hundreds), .thousands(thousands), .negative(negative));
initial
begin
for (i = 0; i < 2047; i = i + 1)
begin
Accel_Data = i;
#2;
if (Decimal_Data < 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) != 4 * Decimal_Data)
j = j + 1;
end
else if (Decimal_Data == 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) != 0)
j = j + 1;
end
else
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) != 2048 - (4 * (Decimal_Data - 512)))
j = j + 1;
end
end
$display("Conversion mismatches:", j);
$finish;
end
endmodule
It works as it should, and I get zero mismatches between the numbers represented by the converted data, and the original 2's complement 10-bit number.
However, I want to write it so that if there were an error, it would save the binary number where there is a mismatch between the input and output. In C++, I'd create an array, and dynamically allocate it with the binary numbers where a mismatch is detected (since we don't know how many mismatches there would be if the design were faulty).
To put it more clearly: right now I can see how many mismatch errors occur, but I want a way to see where they occur.
I know I'd write to the array in the conditional where I increment j, but I don't know how to create an array that's used for this purpose in Verilog.
Also, I've heard SystemVerilog is better for verification, so maybe there's something in SystemVerilog that I could use to accomplish this? I haven't really looked into SystemVerilog, but I do want to learn it.
In SystemVerilog you can create a queue, which is dynamic, to store the mismatched results.
logic [15:0] mismatches [$];
...
if (Decimal_Data < 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) !== 4 * Decimal_Data) begin
j = j + 1;
mismatches.push_back(Accel_Data);
end
end
...
$display("Conversion mismatches:", j);
foreach (mismatches[i]) $display(mismatches[i]);
$finish;
Refer to IEEE Std 1800-2017, section 7.10 Queues.
Since you are comparing 4-state values, you should use the case inequality operator (!==), as I have shown above. Furthermore, since you are comparing your DUT outputs to each other, you should also check if there are x or z values using $isunknown.
Also, there is common code in your testbench which can be combined. This is especially important as you add more checking code. For example:
logic [15:0] mismatches [$];
integer expect_data;
initial
begin
for (i = 0; i < 2047; i = i + 1)
begin
Accel_Data = i;
#2;
if (Decimal_Data < 512)
begin
expect_data = 4 * Decimal_Data;
end
else if (Decimal_Data == 512)
begin
expect_data = 0;
end
else
begin
expect_data = 2048 - (4 * (Decimal_Data - 512));
end
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) !== expect_data) begin
j = j + 1;
mismatches.push_back(Accel_Data);
end
end
$display("Conversion mismatches:", j);
foreach (mismatches[i]) $display(mismatches[i]);
$finish;
end
endmodule
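If only plain Verilog is available, a fixed-size array sized for the worst case can stand in for the queue, with a counter tracking how many entries are in use. The sketch below is an assumption-laden illustration: check_failed is a hypothetical stand-in for the real comparison against the DUT outputs, and the module name is made up.

```verilog
// Sketch only: check_failed is a hypothetical placeholder for the real check.
module mismatch_log_tb;
    // Worst case: every one of the 2047 stimulus values mismatches.
    reg [15:0] mismatches [0:2046];
    integer    count;
    integer    i, k;

    // Placeholder check: flags inputs whose low byte is 8'hFF.
    function check_failed;
        input [15:0] value;
        begin
            check_failed = (value[7:0] == 8'hFF);
        end
    endfunction

    initial begin
        count = 0;
        for (i = 0; i < 2047; i = i + 1) begin
            if (check_failed(i)) begin
                mismatches[count] = i;   // store the offending input
                count = count + 1;
            end
        end
        $display("Conversion mismatches: %0d", count);
        for (k = 0; k < count; k = k + 1)
            $display("  mismatch at input %0d", mismatches[k]);
        $finish;
    end
endmodule
```

The array costs nothing in hardware because the testbench is never synthesized, so sizing it for the worst case is usually acceptable.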
Out of curiosity, why not print the mismatches when they occur? No need to store them and print them out later since it is all non-synthesized code. Example:
logic [15:0] mismatches [$];
...
if (Decimal_Data < 512)
begin
if (((ones * 1) + (tens * 10) + (hundreds * 100) + (thousands * 1000)) !== 4 * Decimal_Data) begin
j = j + 1;
$display("ERROR: Mismatch (expected=%d) (actual=%d) @%t", 4*Decimal_Data, ones+(tens*10)+(hundreds*100)+(thousands*1000), $time);
end
end
...
$display("Conversion mismatches:", j);
$finish;
I want to expand each bit n times.
For example,
// n = 2
5'b10101 -> 10'b1100110011
// n = 3
5'b10101 -> 15'b111000111000111
Is there any simple way (i.e., not using generate block) in Verilog or SystemVerilog?
EDIT 19.02.21
Actually, I'm doing 64bit mask to 512bit mask conversion, but it is different from {8{something}}. My current code is the following:
logic [63 : 0] x;
logic [511 : 0] y;
genvar i;
for (i = 0; i < 64; i = i + 1) begin
always_comb begin
y[(i + 1) * 8 - 1 : i * 8] = x[i] ? 8'hFF : 8'h00;
end
end
I just wonder whether a more "beautiful" way exists.
I think that your method is a good one. You cannot do it without some kind of a loop (unless you want to type all the iterations manually). There might be several variants for implementing it.
For example, using '+:' operator instead of an expression, which simplifies it a bit.
genvar i;
for (i = 0; i < 64; i = i + 1) begin
always_comb begin
y[i * 8 +: 8] = x[i] ? 8'hFF : 8'h00;
end
end
The above method actually generates 64 always blocks (as in your original one), though the sensitivity list of each block is just a single bit of 'x'.
You can move the for loop inside an always block:
always @* begin
for (int j = 0; j < 64; j++) begin
y[j * 8 +: 8] = x[j] ? 8'hFF : 8'h00;
end
end
This will end up as a single always block, but its sensitivity list will include all bits of 'x'.
If this operation is used multiple times, you can use a function:
function logic [511 : 0] transform(input logic [63 : 0] x);
for (int j = 0; j < 64; j++) begin
transform[j * 8 +: 8] = x[j] ? 8'hFF : 8'h00;
end
endfunction
...
always @* begin
y = transform(x);
end
If n is a parameter you can do:
always_comb begin
y = '0;
for(int idx=0; idx<($bits(y)/n) && idx<$bits(x); idx++) begin
y[idx*n +: n] = {n{x[idx]}};
end
end
If n is a signal you have to assign each bit:
always_comb begin
y = '0;
foreach(y[idx]) begin
y[idx] = x[ idx/n ];
end
end
A variable divisor will add timing and area overhead. Depending on your design target, it may or may not be an issue (synthesis optimization or simulation only).
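To check the parameter-based variant against the examples in the question, here is a small sketch written in plain Verilog. The module name bit_expand and the parameters W and N are assumptions for illustration only.

```verilog
// Sketch only: bit_expand, W and N are illustrative names.
module bit_expand #(
    parameter W = 5,   // input width
    parameter N = 2    // repeat count per bit
) (
    input  wire [W-1:0]   x,
    output reg  [W*N-1:0] y
);
    integer idx;
    always @* begin
        for (idx = 0; idx < W; idx = idx + 1)
            y[idx*N +: N] = {N{x[idx]}};   // N copies of bit idx
    end
endmodule

module bit_expand_tb;
    reg  [4:0]  x;
    wire [9:0]  y2;
    wire [14:0] y3;

    bit_expand #(.W(5), .N(2)) u2 (.x(x), .y(y2));
    bit_expand #(.W(5), .N(3)) u3 (.x(x), .y(y3));

    initial begin
        x = 5'b10101;
        #1;
        $display("n=2: %b", y2);  // 1100110011
        $display("n=3: %b", y3);  // 111000111000111
        $finish;
    end
endmodule
```

Because N is a parameter here, the replication count and the part-select width are constants, so no divider is inferred.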
My answer might not be the best of the answers, but if I were you, I would do something as below (assuming x and y are registers in your module that will be used in a synchronous design):
// your module name and ports
reg [63:0] x;
reg [511:0] y;
// your initializations
always @(posedge clk) begin
y[0+:8] <= x[0] ? 8'hff : 8'h00;
y[8+:8] <= x[1] ? 8'hff : 8'h00;
y[16+:8] <= x[2] ? 8'hff : 8'h00;
y[24+:8] <= x[3] ? 8'hff : 8'h00;
y[32+:8] <= x[4] ? 8'hff : 8'h00;
*
*
*
y[504+:8] <= x[63] ? 8'hff : 8'h00;
end
For different always conditions:
// your module name and ports
reg [63:0] x;
reg [511:0] y;
// your initializations
always @('some sensitivity conditions') begin
y[0+:8] <= x[0] ? 8'hff : 8'h00;
y[8+:8] <= x[1] ? 8'hff : 8'h00;
y[16+:8] <= x[2] ? 8'hff : 8'h00;
y[24+:8] <= x[3] ? 8'hff : 8'h00;
y[32+:8] <= x[4] ? 8'hff : 8'h00;
*
*
*
y[504+:8] <= x[63] ? 8'hff : 8'h00;
end
However, if I wanted a separate module that inputs x and outputs y, I would do something as below:
module mask_conversion(
input [63:0] x,
output [511:0] y
);
assign y[0+:8] = x[0] ? 8'hff : 8'h00;
assign y[8+:8] = x[1] ? 8'hff : 8'h00;
assign y[16+:8] = x[2] ? 8'hff : 8'h00;
assign y[24+:8] = x[3] ? 8'hff : 8'h00;
assign y[32+:8] = x[4] ? 8'hff : 8'h00;
*
*
*
assign y[504+:8] = x[63] ? 8'hff : 8'h00;
endmodule
It is not that difficult to type all these, you just need to copy and paste, and change numbers manually. As a result you will get guaranteed code that does what you want.
I want to make a parameterized FIR filter in Verilog on Xilinx. This is my code:
module FIRFilter(xInput, clock, reset, filterCoeff, yOutput);
parameter inputBits = 8, lengthOfFilter = 4, coeffBitLength = 8, lengthOfCoeff = lengthOfFilter + 1, outputBitWdth = 2 * inputBits;
input [(coeffBitLength * lengthOfCoeff) - 1 : 0] filterCoeff;
input clock, reset;
input [inputBits - 1 : 0] xInput;
reg [outputBitWdth - 1 : 0] addWires [lengthOfFilter - 1 : 0];
output reg [outputBitWdth - 1 : 0] yOutput;
reg [inputBits - 1 : 0] registers [lengthOfFilter - 1 : 0];
integer i, j;
always @ (posedge clock, posedge reset)
begin
if(reset)
begin
for(i = 0; i < lengthOfFilter; i = i + 1)
begin
registers[i] <= 0;
end
end
else
begin
registers[0] <= xInput;
for(i = 1; i < lengthOfFilter; i = i + 1)
begin
registers[i] <= registers[i - 1];
end
end
end
always @ (posedge clock)
begin
addWires[0] = filterCoeff[(lengthOfFilter * coeffBitLength) - 1 : (lengthOfFilter - 1) * coeffBitLength] * xInput;
for(j = 1; j < lengthOfFilter; j = j + 1)
begin
addWires[j] = (filterCoeff[((j + 1) * coeffBitLength) - 1 : j * coeffBitLength] * registers[j - 1]) + addWires[j - 1];
end
yOutput = (filterCoeff[coeffBitLength - 1 : 0] * registers[lengthOfFilter - 1]) + addWires[lengthOfFilter - 1];
end
endmodule
But I keep getting this error
ERROR:HDLCompilers:109 - "FIRFilter.v" line 33 Most significant bit operand in part-select of vector wire 'filterCoeff' is illegal
ERROR:HDLCompilers:110 - "FIRFilter.v" line 33 Least significant bit operand in part-select of vector wire 'filterCoeff' is illegal
ERROR:HDLCompilers:45 - "FIRFilter.v" line 33 Illegal right hand side of blocking assignment
I searched online for the solution but haven't got any satisfactory answer.
Can someone help me with this?
Verilog does not allow part-selects signal[msb:lsb] where msb and lsb are not constants. You can use another construct, called an indexed part-select, where you specify a constant width but a variable offset: signal[offset +: width]. For example, to select the same bits as your original first assignment:
addWires[0] = filterCoeff[(lengthOfFilter - 1) * coeffBitLength +: coeffBitLength] * xInput;
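The same construct handles the loop, where the offset depends on j: filterCoeff[((j + 1) * coeffBitLength) - 1 : j * coeffBitLength] covers the same bits as filterCoeff[j * coeffBitLength +: coeffBitLength]. The following is a minimal simulation-only sketch of that; the module name and the coefficient values are made up for illustration.

```verilog
// Sketch only: module name and coefficient values are assumptions.
module coeff_select_demo;
    parameter coeffBitLength = 8;
    parameter lengthOfCoeff  = 5;

    reg [(coeffBitLength * lengthOfCoeff) - 1 : 0] filterCoeff;
    integer j;

    initial begin
        // Coefficient j holds the value j + 1.
        filterCoeff = 40'h05_04_03_02_01;
        for (j = 0; j < lengthOfCoeff; j = j + 1)
            // Variable offset, constant width: legal in Verilog-2001.
            $display("coeff[%0d] = %0d", j,
                     filterCoeff[j * coeffBitLength +: coeffBitLength]);
        $finish;
    end
endmodule
```

This prints coeff[0] = 1 through coeff[4] = 5, one coefficient per loop iteration.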
In Verilog I have an array of binary values. How do I take the absolute value of the differences between them?
Verilog code:
module aaa(clk);
input clk;
reg [7:0] a [1:9];
reg [7:0] s [1:9];
always @(posedge clk)
begin
s[1] = a[1] - a[2];
s[2] = a[2] - a[3];
s[3] = a[1] + a[3];
end
endmodule
I want my s[1] and s[2] values to be always positive. How can I do it in synthesisable verilog?
I have tried using signed reg, but it shows an error.
Regardless of whether the number is signed or not, two's complement is still used, which performs addition and subtraction correctly at the bit level.
If a number is to be interpreted as signed, the MSB tells you whether it is positive (0) or negative (1).
To take the absolute value, negate the number when the MSB is set:
reg [31:0] ans ; // Something else drives this value
reg [31:0] abs_ans; // Absolute version of ans
// invert (absolute value)
always @* begin
if (ans[31] == 1'b1) begin
abs_ans = -ans;
end
else begin
abs_ans = ans;
end
end
NB: = is used because this is a combinatorial block; in a flip-flop (edge-triggered) block use <=, as @TzachiNoy has mentioned.
This should do the work:
s[1] <= (a[1]>a[2])?(a[1]-a[2]):(a[2]-a[1]);
Note: you should always use '<=' in clocked always blocks.
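For unsigned operands like the 8-bit array elements in the question, the compare-and-swap form can be wrapped in a small combinational module. This is only a sketch: the module name abs_diff, the parameter W, and the port names are assumptions, not part of the original code.

```verilog
// Sketch only: abs_diff, W, a, b and d are illustrative names.
module abs_diff #(parameter W = 8) (
    input  wire [W-1:0] a,
    input  wire [W-1:0] b,
    output wire [W-1:0] d
);
    // Always subtract the smaller value from the larger one,
    // so the result never wraps around.
    assign d = (a > b) ? (a - b) : (b - a);
endmodule

module abs_diff_tb;
    reg  [7:0] a, b;
    wire [7:0] d;

    abs_diff #(.W(8)) dut (.a(a), .b(b), .d(d));

    initial begin
        a = 8'd3;   b = 8'd10; #1;
        $display("|%0d - %0d| = %0d", a, b, d);  // 7
        a = 8'd200; b = 8'd5;  #1;
        $display("|%0d - %0d| = %0d", a, b, d);  // 195
        $finish;
    end
endmodule
```

Unlike the MSB-based negation, this form needs no extra sign bit, because the operands are compared as unsigned values before subtracting.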
Just following the answer from @Morgan, and because I already had a module in my system that performed this operation, here is my contribution:
module Mod(
input signed [11:0] i,
output signed [11:0] o
);
assign o = i[11] ? -i : i; // This does all the magic
endmodule
And here is a testbench:
module tb;
reg signed [11:0] i;
wire signed [11:0] o;
Mod M(i,o);
integer t;
initial begin
for (t = -10; t < 10; t = t + 1) begin
#1
i <= t;
$display("i = %d, o = %d", i, o);
end
end
endmodule
The output is:
i = x, o = x
i = -10, o = 10
i = -9, o = 9
i = -8, o = 8
i = -7, o = 7
i = -6, o = 6
i = -5, o = 5
i = -4, o = 4
i = -3, o = 3
i = -2, o = 2
i = -1, o = 1
i = 0, o = 0
i = 1, o = 1
i = 2, o = 2
i = 3, o = 3
i = 4, o = 4
i = 5, o = 5
i = 6, o = 6
i = 7, o = 7
i = 8, o = 8
So, I'm trying to write an adder tree in Verilog. The generic part of it is that it has a configurable number of elements to add and a configurable word size. However, I'm encountering problem after problem, and I'm starting to question whether this is the right way to solve my problem. (I will be using it in a larger project.) It is definitely possible to just hard-code the adder tree, although that will take a lot of text.
So, I thought I'd check with you Stack Overflowers on what you think about it. Is this "the way to do it"? I'm open to suggestions on different approaches too.
I can also mention that I'm quite new to verilog.
In case anyone is interested, here's my current non-working code: (I'm not expecting you to solve the problems; I'm just showing it for convenience.)
module adderTree(
input clk,
input [`WORDSIZE * `BANKSIZE - 1 : 0] terms_flat,
output [`WORDSIZE - 1 : 0] sum
);
genvar i, j;
reg [`WORDSIZE - 1 : 0] pipeline [2 * `BANKSIZE - 1 : 0]; // Pipeline array
reg clkPl = 0; // Pipeline clock
assign sum = pipeline[0];
// Pack flat terms
generate
for (i = `BANKSIZE; i < 2 * `BANKSIZE; i = i + 1) begin
always @ (posedge clk) begin
pipeline[i] <= terms_flat[i * `WORDSIZE +: `WORDSIZE];
clkPl = 1;
end
end
endgenerate
// Add terms logarithmically
generate
for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
for (j = 0; j < 2 ** i; j = j + 1) begin
always @ (posedge clkPl) begin
pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
end
end
end
endgenerate
endmodule
Here are a few comments you might find useful:
CLOCKING
It is generally good to have as few clocks as possible in your design (preferably just one).
In this particular case it appears you are trying to generate a new clock, clkPl, but this does not work because it will never return to 0. (The "reg clkPl = 0;" initializes it to 0 at time 0, then "clkPl = 1;" sets it permanently to 1.)
You can fix this by simply replacing
always @ (posedge clkPl)
with
always @ (posedge clk)
ASSIGNMENTS
It is good form to only use blocking assignments in combinatorial blocks, and non-blocking in clocked blocks. You are mixing both blocking and non-blocking assignments in your "Pack flat terms" section.
As you don't need clkPl you can simply delete the line with the blocking assignment ("clkPl = 1;")
TREE STRUCTURE
Your double for loop:
for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
for (j = 0; j < 2 ** i; j = j + 1) begin
always @ (posedge clkPl) begin
pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
end
end
end
looks like it will access incorrect elements.
e.g. for BANKSIZE = 2**8, i will count up to 7, at which point "pipeline[i * (2 ** i) + j]" = "pipeline[7 * 2**7 + j]" = "pipeline[896 + j]", which will be out of bounds for the array. (The array has 2*BANKSIZE = 512 elements in it.)
I think you actually want this structure:
assign sum = pipeline[1];
for (i = 1; i < `BANKSIZE; i = i + 1) begin
always @ (posedge clk) begin
pipeline[i] <= pipeline[i*2] + pipeline[i*2 + 1];
end
end
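Put together, the suggested structure might look like the sketch below. It is written under several assumptions: parameters replace the `WORDSIZE and `BANKSIZE macros, BANKSIZE is a power of two, the sum is allowed to wrap at WORDSIZE bits, and the leaf nodes read terms_flat starting from bit 0. The testbench values are made up.

```verilog
// Sketch only: assumes BANKSIZE is a power of two; sum wraps at WORDSIZE bits.
module adder_tree #(
    parameter WORDSIZE = 8,
    parameter BANKSIZE = 4
) (
    input  wire                         clk,
    input  wire [WORDSIZE*BANKSIZE-1:0] terms_flat,
    output wire [WORDSIZE-1:0]          sum
);
    // Heap layout: node i has children 2*i and 2*i+1; node 1 is the root.
    reg [WORDSIZE-1:0] pipeline [1:2*BANKSIZE-1];

    assign sum = pipeline[1];

    genvar i;
    generate
        // Leaves (nodes BANKSIZE .. 2*BANKSIZE-1) register the inputs.
        for (i = 0; i < BANKSIZE; i = i + 1) begin : leaf
            always @(posedge clk)
                pipeline[BANKSIZE + i] <= terms_flat[i*WORDSIZE +: WORDSIZE];
        end
        // Internal nodes register the sum of their two children.
        for (i = 1; i < BANKSIZE; i = i + 1) begin : node
            always @(posedge clk)
                pipeline[i] <= pipeline[2*i] + pipeline[2*i + 1];
        end
    endgenerate
endmodule

module adder_tree_tb;
    reg         clk   = 0;
    reg  [31:0] terms = {8'd4, 8'd3, 8'd2, 8'd1};
    wire [7:0]  sum;

    adder_tree #(.WORDSIZE(8), .BANKSIZE(4)) dut
        (.clk(clk), .terms_flat(terms), .sum(sum));

    always #5 clk = ~clk;

    initial begin
        // Latency is log2(BANKSIZE) + 1 = 3 clock edges.
        repeat (3) @(posedge clk);
        #1 $display("sum = %0d", sum);  // 1 + 2 + 3 + 4 = 10
        $finish;
    end
endmodule
```

Every stage is registered, so a new set of terms can be applied on each clock and the corresponding sum appears three edges later for BANKSIZE = 4.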
LOWER LATENCY
Note that most verilog tools are very good at synthesising adds of multiple elements so you may want to consider combining more terms at each level of the hierarchy.
(Adding more terms costs less than someone might expect because the tools can use optimisations such as carry save adders to reduce the gate delay.)