Reduce array to sum of elements - verilog

I am trying to reduce a vector to a sum of all it elements. Is there an easy way to do this in verilog?
Similar to the systemverilog .sum method.
Thanks

My combinational solution for this problem:
//example array
parameter cells = 8;
reg [7:0]array[cells-1:0] = {1,2,3,4,5,1,1,1};
//###############################################
genvar i;
wire [7:0] summation_steps [cells-2 : 0];//container for all sumation steps
generate
assign summation_steps[0] = array[0] + array[1];//for less cost starts witch first sum (not array[0])
for(i=0; i<cells-2; i=i+1) begin
assign summation_steps[i+1] = summation_steps[i] + array[i+2];
end
endgenerate
wire [7:0] result;
assign result = summation_steps[cells-2];

Verilog doesn't have any built-in array methods like SV. Therefore, a for-loop can be used to perform the desired functionality. Example:
parameter N = 64;
integer i;
reg [7:0] array [0:N-1]
reg [N+6:0] sum; // enough bits to handle overflow
always #*
begin
sum = {(N+7){1'b0}}; // all zero
for(i = 0; i < N; i=i+1)
sum = sum + array[i];
end

In critiquing the other answers delivered here, there are some comments to make.
The first important thing is to provide space for the sum to be accumulated. statements such as the following, in RTL, won't do that:
sum = sum + array[i]
because each of the unique nets created on the Right Hand Side (RHS) of the expression are all being assigned back to the same signal called "sum", leading to ambiguity in which of the unique nets is actually the driver (called a multiple driver hazard). To compound the problem, this statement also creates a combinational loop issue because sum is used combinationally to drive itself - not good. What would be good would be if something different could be used as the load and as the driver on each successive iteration of the loop....
Back to the argument though, in the above situation, the signal will be driven to an unknown value by most simulator tools (because: which driver should it pick? so assume none of them are right, or all of them are right - unknown!!). That is if it manages to get through the compiler at all (which is unlikely, and it doesn't at least in Cadence IEV).
The right way to do it would be to set up the following. Say you were summing bytes:
parameter NUM_BYTES = 4;
reg [7:0] array_of_bytes [NUM_BYTES-1:0];
reg [8+$clog2(NUM_BYTES):0] sum [NUM_BYTES-1:1];
always #* begin
for (int i=1; i<NUM_BYTES; i+=1) begin
if (i == 1) begin
sum[i] = array_of_bytes[i] + array_of_bytes[i-1];
end
else begin
sum[i] = sum[i-1] + array_of_bytes[i];
end
end
end
// The accumulated value is indexed at sum[NUM_BYTES-1]

Here is a module that works for arbitrarily sized arrays and does not require extra storage:
module arrsum(input clk,
input rst,
input go,
output reg [7:0] cnt,
input wire [7:0] buf_,
input wire [7:0] n,
output reg [7:0] sum);
always #(posedge clk, posedge rst) begin
if (rst) begin
cnt <= 0;
sum <= 0;
end else begin
if (cnt == 0) begin
if (go == 1) begin
cnt <= n;
sum <= 0;
end
end else begin
cnt <= cnt - 1;
sum <= sum + buf_;
end
end
end
endmodule
module arrsum_tb();
localparam N = 6;
reg clk = 0, rst = 0, go = 0;
wire [7:0] cnt;
reg [7:0] buf_, n;
wire [7:0] sum;
reg [7:0] arr[9:0];
integer i;
arrsum dut(clk, rst, go, cnt, buf_, n, sum);
initial begin
$display("time clk rst sum cnt");
$monitor("%4g %b %b %d %d",
$time, clk, rst, sum, cnt);
arr[0] = 5;
arr[1] = 6;
arr[2] = 7;
arr[3] = 10;
arr[4] = 2;
arr[5] = 2;
#5 clk = !clk;
#5 rst = 1;
#5 rst = 0;
#5 clk = !clk;
go = 1;
n = N;
#5 clk = !clk;
#5 clk = !clk;
for (i = 0; i < N; i++) begin
buf_ = arr[i];
#5 clk = !clk;
#5 clk = !clk;
go = 0;
end
#5 clk = !clk;
$finish;
end
endmodule
I designed it for 8-bit numbers but it can easily be adapted for other kinds of numbers too.

Related

Why is my register not updating in the testbench?

I have a register that I increment by a different value base on the different inputs, When I run a test bench to check it, the value does not increment. Not sure why. Attached is my code and the TB, as well as a screen shot of the simulation:
`timescale 1ns / 1ps
module TEMP_P1(
input clk,
input reset,
input inquarter,
input indime,
input innickle,
input inbev1,
input inbev2,
input inbev3,
output reg [3:0] outquarter,
output reg [3:0] outdime,
output reg [3:0] outnickle,
output reg [1:0] outbev1,
output reg [1:0] outbev2,
output reg [1:0] outbev3
);
reg [31:0] total;
reg [3:0] quarter_count;
reg [3:0] dime_count;
reg [3:0] nickle_count;
always#(clk)begin
if(reset)begin
total = 0;
quarter_count = 0;
dime_count = 0;
nickle_count = 0;
end else begin
if (inquarter) begin
quarter_count = quarter_count + 1;
total = total + 25;
end
if (indime) begin
dime_count = dime_count + 1;
total = total + 10;
end
if (innickle) begin
nickle_count = nickle_count + 1;
total = total + 5;
end
if ((inbev1 === 1) && ( total >= 100)) begin
outbev1 = 1;
total = total - 100;
end
if ((inbev2 == 1) && ( total >= 120)) begin
outbev2 = 1;
total = total - 120;
end
if ((inbev3 == 1) && ( total >= 115)) begin
outbev3 = 1;
total = total - 115;
end
if (total >= 25) begin
outquarter = 1;
outdime = 0;
outnickle = 0;
total = total - 25;
end
if ((total >= 10) && (total < 25)) begin
outquarter = 0;
outdime = 1;
outnickle = 0;
total = total - 10;
end
if ((total >= 5) && (total < 10)) begin
outquarter = 0;
outdime = 0;
outnickle = 1;
total = total - 5;
end
end
end
endmodule
`timescale 1ns / 1ps
module TEMP_P1_TB(
);
reg clk;
reg reset;
reg inquarter;
reg indime;
reg innickle;
reg inbev1;
reg inbev2;
reg inbev3;
wire [3:0] outquarter;
wire [3:0] outdime;
wire [3:0] outnickle;
wire [1:0] outbev1;
wire [1:0] outbev2;
wire [1:0] outbev3;
TEMP_P1 UUT(.clk(clk), .reset(reset), .inquarter(inquarter), .indime(indime), .innickle(innickle), .inbev1(inbev1), .inbev2(inbev2), .inbev3(inbev3),
.outquarter(outquarter), .outdime(outdime), .outnickle(outnickle), .outbev1(outbev1), .outbev2(outbev2), .outbev3(outbev3));
initial begin
clk = 0;
reset = 0;
inquarter = 0;
indime = 0;
innickle = 0;
inbev1 = 0;
inbev2 = 0;
inbev3 = 0;
#10;
reset = 1;
#1
reset = 0;
#1
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inbev1 = 1;
#1;
inbev1 = 0;
#1;
end
always begin
#1 clk = ~clk;
end
endmodule
I have tried changing blocking v. nonblocking assignments.
One major problem with your code is that you have Verilog simulation race conditions.
To model registers, you need to use these coding styles:
Signal changes triggered off only one edge of the clock (positive, for example):
#(posedge clk)
Nonblocking assignments: <=
This style must be used in both the design and testbench.
For the design, refer to these simple code examples.
In the testbench, you should replace all of the # delays, except for the clk signal, with #(posedge clk). This assures that the inputs will be synchronous to the clock. clk should use a blocking assignment; so there is no need to change your code there.
Also, the testbench code can be easier to understand if you use loops. I use repeat loops. Here are all the changes to both the design and testbench:
`timescale 1ns / 1ps
module TEMP_P1(
input clk,
input reset,
input inquarter,
input indime,
input innickle,
input inbev1,
input inbev2,
input inbev3,
output reg [3:0] outquarter,
output reg [3:0] outdime,
output reg [3:0] outnickle,
output reg [1:0] outbev1,
output reg [1:0] outbev2,
output reg [1:0] outbev3
);
reg [31:0] total;
reg [3:0] quarter_count;
reg [3:0] dime_count;
reg [3:0] nickle_count;
always #(posedge clk) begin
if (reset) begin
total <= 0;
quarter_count <= 0;
dime_count <= 0;
nickle_count <= 0;
end else begin
if (inquarter) begin
quarter_count <= quarter_count + 1;
total <= total + 25;
end
if (indime) begin
dime_count <= dime_count + 1;
total <= total + 10;
end
if (innickle) begin
nickle_count <= nickle_count + 1;
total <= total + 5;
end
if ((inbev1 === 1) && ( total >= 100)) begin
outbev1 <= 1;
total <= total - 100;
end
if ((inbev2 == 1) && ( total >= 120)) begin
outbev2 <= 1;
total <= total - 120;
end
if ((inbev3 == 1) && ( total >= 115)) begin
outbev3 <= 1;
total <= total - 115;
end
if (total >= 25) begin
outquarter <= 1;
outdime <= 0;
outnickle <= 0;
total <= total - 25;
end
if ((total >= 10) && (total < 25)) begin
outquarter <= 0;
outdime <= 1;
outnickle <= 0;
total <= total - 10;
end
if ((total >= 5) && (total < 10)) begin
outquarter <= 0;
outdime <= 0;
outnickle <= 1;
total <= total - 5;
end
end
end
endmodule
`timescale 1ns / 1ps
module TEMP_P1_TB;
reg clk;
reg reset;
reg inquarter;
reg indime;
reg innickle;
reg inbev1;
reg inbev2;
reg inbev3;
wire [3:0] outquarter;
wire [3:0] outdime;
wire [3:0] outnickle;
wire [1:0] outbev1;
wire [1:0] outbev2;
wire [1:0] outbev3;
TEMP_P1 UUT(.clk(clk), .reset(reset), .inquarter(inquarter), .indime(indime), .innickle(innickle), .inbev1(inbev1), .inbev2(inbev2), .inbev3(inbev3),
.outquarter(outquarter), .outdime(outdime), .outnickle(outnickle), .outbev1(outbev1), .outbev2(outbev2), .outbev3(outbev3));
initial begin
clk <= 0;
reset <= 0;
inquarter <= 0;
indime <= 0;
innickle <= 0;
inbev1 <= 0;
inbev2 <= 0;
inbev3 <= 0;
repeat (5) #(posedge clk);
reset <= 1;
repeat (1) #(posedge clk);
reset <= 0;
repeat (5) begin
repeat (1) #(posedge clk);
inquarter <= 1;
repeat (1) #(posedge clk);
inquarter <= 0;
end
repeat (1) #(posedge clk);
inbev1 <= 1;
repeat (1) #(posedge clk);
inbev1 <= 0;
repeat (10) #(posedge clk);
$finish;
end
always begin
#1 clk = ~clk;
end
endmodule
The above code fixes the timing problems (race conditions).
However, you also have problems with your logic since the total gets cleared after each quarter input, as you can see in the waves below:
If that is not your intended behavior, I recommend you spend more time looking at your internal waveforms. If you still have problems, you can ask a new question.
In the UUT, add the keyword posedge to # statement, like this:
always#(posedge clk)begin
In the testbench, assert reset at the beginning, release it a couple of clocks later, don't assert it again. It was asserted for 1/2 of a clock some time after the test was started. Best practice is to assert it at the unless there is a good reason not to do so.
initial begin
reset = 1;
#5 reset = 0;
Use non-blocking assignments everywhere in the UUT # statement for the clocked process like this (I did not fix them all, you should). Non-blocking assignments in a synchronous process is the correct way model registers:
always#(posedge clk)begin
if(reset)begin
total <= 0;
quarter_count <= 0;
dime_count <= 0;
nickle_count <= 0;
end else begin
if (inquarter) begin
quarter_count <= quarter_count + 1;
total <= total + 25;
In the testbench, hold the inputs (example inquarter) for at least 1 clock cycle so that they are able to be sampled at a logic 1, at the clock edge. inquarter is 1/2 a clock cycle wide. The clk period is #2, so if you change an input every #1, that is making a skinny pulse that would be easy for the DUT to miss. I did not fix these in the testbench you should. Register quarter_count is catching the input at 1 almost by accident in my simulation.
Make these changes and the registers start behaving as registers.
The waves looks like this for me on edaplayground:
Another issue is that you have an always block in the testbench
always begin
#1 clk = ~clk;
however there is no $finish or $stop anywhere in your code. The simulation will run forever until you click some sort of kill/stop in the simulation GUI, or kill the process from the command line. The solution to this is to add a
$finish;
at the end of the testbench main initial block. Now the simulation will compile, elaborate, run, and stop relatively quickly. My eda playground simulation run of your post takes about 15 seconds total.

How do you select a range of bits from an expression of registers?

I am trying to take the average of 8 8-bit registers. I was able to do it structurally, by having four 8-> 9-bit adders, two 9-> 10 bit adder, and one 10-> 11-bit adder. This works correctly; however, I was curious if there is a better way/ more efficient way to do this.
For the structural way, all I have to do is have the assign a wire from the 10->11 bit adder to the output.
I'm trying to do something like below, but it says
Index <10> is out of range [7:0] for signal .
I have it index 10, in case all the registers are large like 255.
module avg(
num_in,
clk,
rs,
ave8
) ;
input clk ;
input rs;
input [7:0] num_in ;
output [7:0] ave8 ;
reg [7:0] registers [7:0] ;
always #(posedge clk) begin
if(rs) begin
registers[0] <= 0;
registers[1] <= 0;
registers[2] <= 0;
registers[3] <= 0;
registers[4] <= 0;
registers[5] <= 0;
registers[6] <= 0;
registers[7] <= 0;
end
registers[0] <= num_in;
registers[1] <= registers[0];
registers[2] <= registers[1];
registers[3] <= registers[2];
registers[4] <= registers[3];
registers[5] <= registers[4];
registers[6] <= registers[5];
registers[7] <= registers[6];
end
// This assign function is what I am focused on.
assign ave8 = {registers[0] + registers[1] + registers[2] + registers[3] + registers[4] + registers[5] + registers[6] + registers[7]}[10:3];
One way is to create a sum wire:
wire [10:0] sum = registers[0] + registers[1] + registers[2] + registers[3] + registers[4] + registers[5] + registers[6] + registers[7];
assign ave8 = sum[10:3];
I suggest using for loops:
module avg(
input [7:0] num_in ,
input clk ,
input rs ,
output reg [7:0] ave
);
parameter SIZE = 8;
reg [7:0] registers [SIZE-1:0] ;
reg [10:0] accumulator;
integer i;
always #(posedge clk) begin
if(rs)
for(i=0;i<SIZE;i=i+1)
registers[i] <= 0;
registers[0] <= num_in;
for(i=1;i<SIZE;i=i+1)
registers[i] <= registers[i-1];
accumulator = 0;
for(i=0;i<SIZE;i=i+1)
accumulator = accumulator + registers[i];
ave <= accumulator/SIZE;
end
endmodule
This would be so much easier to write in SystemVerilog, which ISE supports:
module avg(
input [7:0] num_in ,
input clk ,
input rs ,
output logic [7:0] ave
);
parameter SIZE = 8;
logic [7:0] registers [SIZE] ;
always #(posedge clk) begin
if(rs)
registers = '{default:0};
registers <= {num_in, registers[1:$size(registers)-1]};
ave <= registers.sum() with (int'(item))/SIZE;
end
endmodule

Multiplication of 2 matrix in verilog

I've written a code for matrx multiplication in Verilog.
module multiply3x3(i1,i2,i3,i4,i5,i6,i7,i8,i9,j1,j2,j3,j4,j5,j6,j7,j8,j9,prod);
output reg [31:0]prod;
wire [7:0]resultant[3:0][3:0];
wire [7:0]a[3:0][3:0];
wire [7:0]b[3:0][3:0];
genvar i,j,k;
generate
for (i = 0; i <= 2; i=i+1) begin:i_
for (j = 0; j <= 2; j=j+1) begin:j_
assign resultant[i][j] = 8'd0;
for (k = 0; k <= 2; k=k+1) begin:k_
assign resultant[i][j] = resultant[i][j] + a[i][k] * b[k][j];
end
end
end
endgenerate
endmodule
initial begin
#100 prod = {resultant[0][0],resultant[0][1],resultant[0][2],resultant[1][0],resultant[1][1],resultant[1][2],resultant[2][0],resultant[2][1],resultant[2][2]};
end
This is where the multiplication happens, but i cannot get the output for this.
What am I doing wrong?
consider a,b declared properly.
Accumulation (a = a + p) doesn't work with wires. The type wire is supposed to model a physical wire.
You'll have to declare the variable resultant as a reg. The reg type, in Verilog, can in some cases be treated like a variable in other programming languages.
Also, you can't use the assign statement on a wire or reg multiple times (like you've done in line 78 and 80 of https://pastebin.com/txrcwUBd). You should use always (and not generate) blocks to perform such things.
Corrected Verilog:
reg [7:0] resultant[3:0][3:0];
int i, j, k;
always #(*)
for(i=0; i<3; i=i+1)
for(j=0; j<3; j=j+1) begin
resultant[i][j] = 8'd0;
for(k=0; k<3; k=k+1)
resultant[i][j] = resultant[i][j] + (a[i][k]*b[k][j]);
end

Initializing arrays in Verilog

How do I initialize the array Save_state? This statement is giving X value at the output:
reg [9:0] count
reg [9:0] Save_state [0: 1024];
always # (posedge Clock )
Count <=count+1 ;
Save_state[count] <=count ;
You can use an initial block as well. This is allowed in simulation and is synthesizable on some architectures (Xilinx FPGA and CPLD support register initialization)
reg [9:0] count
reg [9:0] Save_state [0: 1024];
integer i;
initial begin
count = 0;
for (i=0;i<=1024;i=i+1)
Save_state[i] = 0;
end
always # (posedge Clock ) begin
count <= count + 1;
Save_state[count] <= count;
end
Although for this particular example, in which the elements of the Save_state array will always have the same value, you can do like this (synthesizable on Xilinx and Altera, AFAIK):
reg [9:0] Save_state [0: 1024];
integer i;
initial begin
for (i=0;i<=1024;i=i+1)
Save_state[i] = i[9:0];
end
And at the beginning of you simulation, Save_state already have the values 0,1,2,...,1023 stored in it.
You can use a reset port to initialize count and save_state such as the following code :
integer i;
reg [9:0] count;
reg [9:0] save_state [0:1024];
always #(posedge clock or posedge reset) begin
if (reset) begin
count <= 0;
for (i=0; i<=1024; i=i+1)
save_state[i] <= 0;
end
else begin
count <= count + 1;
save_state[count] <= count;
end
end
The two statements inside the else block is applied at the same time and at the end of always block.

Error count for two different patterns in verilog

I have made 2 forms of data patterns and wants to compare them in the form of error count.....when the 2 patterns are not equal, the error count should be high....i made the code including test bench, but when i ran behavioral sumilation, the error count is only high at value 0 and not at value 1.....I expect it to be high at both 0 and 1....please help me out in this, since I am new with verilog
here is the code
`timescale 1ns / 1ps
module pattern(
clk,
start,
rst,enter code here
clear,
data_in1,
data_in2,
error
);
input [1:0] data_in1;
input [1:0] data_in2;
input clk;
input start;
input rst;
input clear;
output [1:0] error;
reg [1:0] comp_out;
reg [1:0] i = 0;
assign error = comp_out;
always#(posedge clk)
begin
comp_out = 0;
if(rst)
comp_out = 0;
else
begin
for(i = 0; i < 2; i = i + 1)
begin
if(data_in1[i] != data_in2[i])
comp_out <= comp_out + 1;
end
end
end
endmodule
here is the test bench for the above code
`timescale 1ns / 1ps
module tb_pattern();
// inputs
reg clk;
reg rst;
reg [1:0] data_in1;
reg [1:0] data_in2;
wire [1:0] error;
//outputs
//wire [15:0] count;
//instantiate the unit under test (UUT)
pattern uut (
// .count(count),
.clk(clk),
.start(start),
.rst(rst),
.clear(clear),
.data_in1(data_in1),
.data_in2(data_in2),
.error(error)
);
initial begin
clk = 1'b0;
rst = 1'b1;
repeat(4) #10 clk = ~clk;
rst = 1'b0;
forever #10 clk = ~clk; // generate a clock
end
initial begin
//initialize inputs
clk = 0;
//rst = 1;
data_in1 = 2'b00;
data_in2 = 2'b01;
#100
data_in1 = 2'b11;
data_in2 = 2'b00;
#100
$finish;
end
//force rest after delay
//#20 rst = 0;
//#25 rst = 1;
endmodule
When incrementing in a for loop you need to use blocking assignment (=), however when assigning flops you should use non-blocking assignment (<=). When you need to use a for loop to assign a flop, it is best to split the combinational and synchronous functionality into separate always blocks.
...
reg [1:0] comp_out, next_comb_out;
always #* begin : comb
next_comp_out = 0;
for (i = 0; i < 2; i = i + 1) begin
if (data_in1[i] != data_in2[i]) begin
next_comp_out = next_comp_out + 1;
end
end
end
always #(posedge clk) begin : dff
if (rst) begin
comb_out <= 1'b0;
end
else begin
comb_out <= next_comp_out;
end
end
...
begin
for(i = 0; i < 2; i = i + 1)
begin
if(data_in1[i] != data_in2[i])
comp_out <= comp_out + 1;
end
end
This for loop doesn't work the way you think it does. Because this is a non-blocking assignment, only the last iteration of the loop actually applies. So only the last bit is actually being compared here.
If both bits of your data mismatch, then the loop unrolls to something which looks like this:
comp_out <= comp_out + 1;
comp_out <= comp_out + 1;
Because this is non-blocking, the RHS of the equation are both evaluated at the same time, leaving you with:
comp_out <= 0 + 1;
comp_out <= 0 + 1;
So even though you tried to use this as a counter, only the last line takes effect, and you get a mismatch count of '1', no matter how many bits mismatch.
Try using a blocking statement (=) for comp_out assignment instead.

Resources