How to implement transition coverage on non-consecutive sampling points? - verilog

var_1 changes from 0 to 1, then from 1 to 2, and so on up to 15, but not on consecutive sampling points. I sample on every clock cycle, but the value may change only after an arbitrary number of clock cycles. The transition coverage I wrote does not work. Can transition coverage be written for this case?
bit [3:0] var_1;
var1: coverpoint var_1
{
bins var_1_trans_bin = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15);
bins var_1_bin[] = {[0:15]};
}
I see that var_1_bin is covered 100%, but var_1_trans_bin is not.
Here is the whole code:
module GRG_coverpoint;
bit [3:0] var_1;
bit Running;
bit clk;
// Example showing bins and transitions
covergroup CG1 @(posedge clk);
coverpoint var_1
{
bins var_1_trans_bin = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15);
bins var_1_bin[] = {[0:15]};
}
endgroup
initial begin
automatic CG1 cg1_inst = new;
for (int j = 0; j < 16; j++)
begin
var_1 = j;
#20;
end
$display ("CG1 Coverage = %.2f%%", cg1_inst.get_coverage());
Running = 0;
end
initial begin
clk = 0;
Running = 1;
while (Running) begin
#5 clk = ~clk;
end
$display ("Finished!!");
end
endmodule

As you realized, you do not want to sample coverage on every clock cycle; you want to sample it only when var_1 changes value. You can declare the covergroup without the optional coverage event (@(posedge clk) in your case), then call its sample() method in a separate procedural block every time the variable changes:
module GRG_coverpoint;
bit [3:0] var_1;
bit [3:0] var_2;
bit Running;
bit clk;
// Example showing bins and transitions
covergroup CG1;
coverpoint var_1
{
bins var_1_trans_bin = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15);
bins var_1_bin[] = {[0:15]};
}
endgroup
CG1 cg1_inst = new;
initial begin
cg1_inst.sample(); // Sample the initial 0 value
forever @(var_1) cg1_inst.sample(); // Sample every time var_1 changes
end
initial begin
for (int j = 0; j < 16; j++)
begin
var_1 = j;
#20;
end
$display ("CG1 Coverage = %.2f%%", cg1_inst.get_coverage());
Running = 0;
end
initial begin
clk = 0;
Running = 1;
while (Running) begin
#5 clk = ~clk;
end
$display ("Finished!!");
end
endmodule
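If you would rather keep sampling tied to the clock (for example, so that glitches between clock edges are never sampled), a variant is to evaluate on every posedge but call sample() only when var_1 differs from its previous value. The sketch below assumes a simulator that supports covergroups declared "with function sample" (IEEE 1800-2009 or later); the module name trans_cov_on_change and the signals prev and first are hypothetical, not part of the original answer.

```systemverilog
// Sketch: clock-synchronous sampling that skips cycles where var_1 did not change,
// so the transition bin sees 0,1,2,...,15 as successive samples.
module trans_cov_on_change (
  input  logic       clk,
  input  logic [3:0] var_1
);
  covergroup CG_change with function sample (bit [3:0] v);
    cp_v: coverpoint v {
      bins var_1_trans_bin = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15);
      bins var_1_bin[]     = {[0:15]};
    }
  endgroup

  CG_change   cg = new();
  logic [3:0] prev;          // value seen at the previous clock edge
  bit         first = 1'b1;  // forces a sample of the very first value

  always @(posedge clk) begin
    // Sample the first value, then only values that differ from the last one.
    if (first || var_1 != prev)
      cg.sample(var_1);
    first <= 1'b0;
    prev  <= var_1;
  end
endmodule
```

Because the decision is made at the clock edge, a value that changes and changes back between two edges is never sampled, which is usually what a clocked design cares about.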

Related

Why is my register not updating in the testbench?

I have a register that I increment by a different value based on the different inputs. When I run a testbench to check it, the value does not increment, and I'm not sure why. Attached are my code and the TB, as well as a screenshot of the simulation:
`timescale 1ns / 1ps
module TEMP_P1(
input clk,
input reset,
input inquarter,
input indime,
input innickle,
input inbev1,
input inbev2,
input inbev3,
output reg [3:0] outquarter,
output reg [3:0] outdime,
output reg [3:0] outnickle,
output reg [1:0] outbev1,
output reg [1:0] outbev2,
output reg [1:0] outbev3
);
reg [31:0] total;
reg [3:0] quarter_count;
reg [3:0] dime_count;
reg [3:0] nickle_count;
always @(clk) begin
if(reset)begin
total = 0;
quarter_count = 0;
dime_count = 0;
nickle_count = 0;
end else begin
if (inquarter) begin
quarter_count = quarter_count + 1;
total = total + 25;
end
if (indime) begin
dime_count = dime_count + 1;
total = total + 10;
end
if (innickle) begin
nickle_count = nickle_count + 1;
total = total + 5;
end
if ((inbev1 === 1) && ( total >= 100)) begin
outbev1 = 1;
total = total - 100;
end
if ((inbev2 == 1) && ( total >= 120)) begin
outbev2 = 1;
total = total - 120;
end
if ((inbev3 == 1) && ( total >= 115)) begin
outbev3 = 1;
total = total - 115;
end
if (total >= 25) begin
outquarter = 1;
outdime = 0;
outnickle = 0;
total = total - 25;
end
if ((total >= 10) && (total < 25)) begin
outquarter = 0;
outdime = 1;
outnickle = 0;
total = total - 10;
end
if ((total >= 5) && (total < 10)) begin
outquarter = 0;
outdime = 0;
outnickle = 1;
total = total - 5;
end
end
end
endmodule
`timescale 1ns / 1ps
module TEMP_P1_TB(
);
reg clk;
reg reset;
reg inquarter;
reg indime;
reg innickle;
reg inbev1;
reg inbev2;
reg inbev3;
wire [3:0] outquarter;
wire [3:0] outdime;
wire [3:0] outnickle;
wire [1:0] outbev1;
wire [1:0] outbev2;
wire [1:0] outbev3;
TEMP_P1 UUT(.clk(clk), .reset(reset), .inquarter(inquarter), .indime(indime), .innickle(innickle), .inbev1(inbev1), .inbev2(inbev2), .inbev3(inbev3),
.outquarter(outquarter), .outdime(outdime), .outnickle(outnickle), .outbev1(outbev1), .outbev2(outbev2), .outbev3(outbev3));
initial begin
clk = 0;
reset = 0;
inquarter = 0;
indime = 0;
innickle = 0;
inbev1 = 0;
inbev2 = 0;
inbev3 = 0;
#10;
reset = 1;
#1
reset = 0;
#1
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inquarter = 1;
#1;
inquarter = 0;
#1;
inbev1 = 1;
#1;
inbev1 = 0;
#1;
end
always begin
#1 clk = ~clk;
end
endmodule
I have tried switching between blocking and nonblocking assignments.
One major problem with your code is that you have Verilog simulation race conditions.
To model registers, you need to use these coding styles:
Signal changes triggered off only one edge of the clock (positive, for example):
@(posedge clk)
Nonblocking assignments: <=
This style must be used in both the design and testbench.
For the design, refer to these simple code examples.
In the testbench, you should replace all of the # delays, except for the clk signal, with @(posedge clk). This ensures that the inputs are synchronous to the clock. clk should use a blocking assignment, so there is no need to change your code there.
Also, the testbench code is easier to understand if you use loops. I use repeat loops. Here are all the changes to both the design and testbench:
`timescale 1ns / 1ps
module TEMP_P1(
input clk,
input reset,
input inquarter,
input indime,
input innickle,
input inbev1,
input inbev2,
input inbev3,
output reg [3:0] outquarter,
output reg [3:0] outdime,
output reg [3:0] outnickle,
output reg [1:0] outbev1,
output reg [1:0] outbev2,
output reg [1:0] outbev3
);
reg [31:0] total;
reg [3:0] quarter_count;
reg [3:0] dime_count;
reg [3:0] nickle_count;
always @(posedge clk) begin
if (reset) begin
total <= 0;
quarter_count <= 0;
dime_count <= 0;
nickle_count <= 0;
end else begin
if (inquarter) begin
quarter_count <= quarter_count + 1;
total <= total + 25;
end
if (indime) begin
dime_count <= dime_count + 1;
total <= total + 10;
end
if (innickle) begin
nickle_count <= nickle_count + 1;
total <= total + 5;
end
if ((inbev1 === 1) && ( total >= 100)) begin
outbev1 <= 1;
total <= total - 100;
end
if ((inbev2 == 1) && ( total >= 120)) begin
outbev2 <= 1;
total <= total - 120;
end
if ((inbev3 == 1) && ( total >= 115)) begin
outbev3 <= 1;
total <= total - 115;
end
if (total >= 25) begin
outquarter <= 1;
outdime <= 0;
outnickle <= 0;
total <= total - 25;
end
if ((total >= 10) && (total < 25)) begin
outquarter <= 0;
outdime <= 1;
outnickle <= 0;
total <= total - 10;
end
if ((total >= 5) && (total < 10)) begin
outquarter <= 0;
outdime <= 0;
outnickle <= 1;
total <= total - 5;
end
end
end
endmodule
`timescale 1ns / 1ps
module TEMP_P1_TB;
reg clk;
reg reset;
reg inquarter;
reg indime;
reg innickle;
reg inbev1;
reg inbev2;
reg inbev3;
wire [3:0] outquarter;
wire [3:0] outdime;
wire [3:0] outnickle;
wire [1:0] outbev1;
wire [1:0] outbev2;
wire [1:0] outbev3;
TEMP_P1 UUT(.clk(clk), .reset(reset), .inquarter(inquarter), .indime(indime), .innickle(innickle), .inbev1(inbev1), .inbev2(inbev2), .inbev3(inbev3),
.outquarter(outquarter), .outdime(outdime), .outnickle(outnickle), .outbev1(outbev1), .outbev2(outbev2), .outbev3(outbev3));
initial begin
clk <= 0;
reset <= 0;
inquarter <= 0;
indime <= 0;
innickle <= 0;
inbev1 <= 0;
inbev2 <= 0;
inbev3 <= 0;
repeat (5) @(posedge clk);
reset <= 1;
repeat (1) @(posedge clk);
reset <= 0;
repeat (5) begin
repeat (1) @(posedge clk);
inquarter <= 1;
repeat (1) @(posedge clk);
inquarter <= 0;
end
repeat (1) @(posedge clk);
inbev1 <= 1;
repeat (1) @(posedge clk);
inbev1 <= 0;
repeat (10) @(posedge clk);
$finish;
end
always begin
#1 clk = ~clk;
end
endmodule
The above code fixes the timing problems (race conditions).
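To see why the nonblocking style removes the race, consider this minimal sketch (the module two_stage_sync is hypothetical, not from the question). Both always blocks wake on the same clock edge. With <=, every right-hand side is evaluated before any register is updated, so q2 always receives the previous value of q1, regardless of the order in which the simulator runs the two blocks. With blocking = assignments, the result would depend on that order.

```systemverilog
// Sketch: two registers clocked by the same edge, modeled with nonblocking
// assignments so the behavior does not depend on process execution order.
module two_stage_sync (
  input  wire clk,
  input  wire d,
  output reg  q1,
  output reg  q2
);
  always @(posedge clk) q1 <= d;
  always @(posedge clk) q2 <= q1;  // reads q1 as it was before this edge
endmodule
```

The same reasoning applies to the testbench: driving inputs with <= right after @(posedge clk) means the design samples them at the next edge, not in a race with the current one.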
However, you also have problems with your logic since the total gets cleared after each quarter input, as you can see in the waves below:
If that is not your intended behavior, I recommend you spend more time looking at your internal waveforms. If you still have problems, you can ask a new question.
In the UUT, add the keyword posedge to the @ statement, like this:
always @(posedge clk) begin
In the testbench, assert reset at the beginning, release it a couple of clocks later, and don't assert it again. In your testbench it was asserted for half a clock, some time after the test started. Best practice is to assert it at the start of the simulation unless there is a good reason not to do so.
initial begin
reset = 1;
#5 reset = 0;
Use nonblocking assignments everywhere in the UUT's clocked @(posedge clk) process, like this (I did not fix them all; you should). Nonblocking assignments in a synchronous process are the correct way to model registers:
always @(posedge clk) begin
if(reset)begin
total <= 0;
quarter_count <= 0;
dime_count <= 0;
nickle_count <= 0;
end else begin
if (inquarter) begin
quarter_count <= quarter_count + 1;
total <= total + 25;
In the testbench, hold the inputs (inquarter, for example) for at least one clock cycle so that they are sampled as logic 1 at a clock edge. Currently inquarter is only half a clock cycle wide: the clk period is 2 time units, so changing an input every #1 produces a narrow pulse that the DUT can easily miss. I did not fix these in the testbench; you should. In my simulation, quarter_count catches the input at 1 almost by accident.
Make these changes and the registers start behaving as registers.
The waves look like this for me on EDA Playground:
Another issue is that you have a free-running always block in the testbench:
always begin
#1 clk = ~clk;
end
However, there is no $finish or $stop anywhere in your code. The simulation will run forever until you click some sort of kill/stop button in the simulation GUI, or kill the process from the command line. The solution is to add a
$finish;
at the end of the main initial block of the testbench. Now the simulation will compile, elaborate, run, and stop relatively quickly. My EDA Playground run of your post takes about 15 seconds in total.

How to write verilog testbench to loop through a n bit input n times

I am writing a testbench that loops through my 16-bit Data input, setting one bit at a time from 0 to 1; for example, the first iteration would be 10000...00, the second 010000...00, then 001000...00, and so on. Here is what I have right now.
module testbench();
//inputs
reg [15:0] Data = 0;
//outputs
wire [15:0] Errors;
OLS uut (
.Data (Data),
.Errors (Errors)
);
integer k = 0;
initial
begin
Data = 0;
for(k = 0; k<16; k=k+1)
begin
Data[k] = 1;
if(k>0)
begin
Data[k-1] = 0;
end
end
end
endmodule
I am unsure if I have made a mistake with my testbench or if this is expected behavior, but I can't tell how I am supposed to see the expected output in each iteration. I have tried to use console outputs to keep track of where I am in the loop and if I am resetting the previous bit to 0 after I am done with that one.
I expect to get 0 in the 'Errors' output in every iteration, so basically I need help to verify my code does what I want it to do, and also how to read the graphical output of the simulation.
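The loop in the post runs entirely in zero simulation time, so none of the intermediate values of Data are visible in the waveform.
Some delay is needed inside the loop to create a waveform.
A $finish is also needed; otherwise the testbench runs forever.
Like this: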
module testbench();
//inputs
reg [15:0] Data = 0;
integer k = 0;
initial
begin
#100 $finish; // stop after the loop has run (16 iterations x #5 = 80 time units)
end
initial
begin
Data = 0;
for(k = 0; k<16; k=k+1)
begin
Data[k] = 1;
if(k>0)
begin
Data[k-1] = 0;
end
#5; // delay here
end
end
initial
begin
$dumpfile("dump.vcd");
$dumpvars;
end
endmodule
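As an alternative to setting bit k and clearing bit k-1 in separate steps, the walking-one pattern can be produced with a single shift. A minimal sketch, assuming a hypothetical module name walking_one and the same bit order as the posted loop (k = 0 sets the LSB):

```systemverilog
// Sketch: walking-one generator. Exactly one bit of data is set, at position k.
module walking_one #(parameter WIDTH = 16) (
  input  wire [$clog2(WIDTH)-1:0] k,
  output wire [WIDTH-1:0]         data
);
  // A WIDTH-bit value of 1 shifted left by k sets bit k and clears all others.
  assign data = {{(WIDTH-1){1'b0}}, 1'b1} << k;
endmodule
```

Inside the testbench loop, the equivalent statement is Data = 16'b1 << k; followed by the #5 delay, which also removes the need for the if (k > 0) special case.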

vivado simulation error: Iteration limit 10000 is reached

While I was trying to run the simulation in vivado, I got:
ERROR: Iteration limit 10000 is reached. Possible zero delay
oscillation detected where simulation time can not advance. Please
check your source code. Note that the iteration limit can be changed
using switch -maxdeltaid. Time: 10 ns Iteration: 10000
I don't have any initial statement in my module being tested.
Could anybody point out where the problem could be?
`timescale 1ns / 1ps
module mulp(
input clk,
input rst,
input start,
input [4:0] mplier, // -13
input [4:0] mplcant, // -9
output reg done,
output [9:0] product
);
parameter N = 6;
parameter Idle = 2'b00;
parameter Load = 2'b01;
parameter Oper = 2'b10;
parameter Finish = 2'b11;
reg done_r;
reg [N-1:0] A, A_r, B, B_r;
reg [1:0] state, state_r;
reg [2:0] count, count_r;
wire [N-2:0] C, C_comp;
reg [N-2:0] C_r;
assign C = mplcant; assign C_comp = {~C + 1};
assign product = {A_r[N-2:0], B_r[N-2:0]};
always @(posedge clk) begin
if (rst) begin
state_r <= Idle;
count_r <= 0;
done_r <= 0;
A_r <= 0;
B_r <= 0;
end else begin
state_r <= state;
count_r <= count;
done_r <= done;
A_r <= A;
B_r <= B;
end // if
end // always
always @(*) begin
state = state_r;
count = count_r - 1; // count: 6
done = done_r;
A = A_r;
B = B_r;
case (state)
Idle: begin
if (start) begin
state <= Load;
end // if
end
Load: begin
A = 0; B = {mplier, 1'b0}; count = N; // start at 6
state = Oper;
end
Oper: begin
if (count == 0)
state = Finish;
else begin
case (B[1:0])
2'b01: begin
// add C to A
A = A_r + {C[N-2], C[N-2:0]};
// shift A and B
A = {A_r[N-1], A_r[N-1:1]};
B = {A_r[0], B_r[N-1:1]};
end
2'b10: begin
A = A_r + {C_comp[N-2], C_comp[N-2:0]};
A = {A_r[N-1], A[N-1:1]};
B = {A_r[0], B_r[N-1:1]};
end
(2'b00 | 2'b11): begin
A = {A_r[N-1], A[N-1:1]};
B = {A_r[0], B_r[N-1:1]};
end
default: begin
state = Idle; done = 1'bx; // error
end
endcase
end // else
end // Oper
Finish: begin
done = 1;
state = Idle;
end // Finish
default: begin
done = 1'bx;
state = Idle;
end
endcase
end // always
endmodule
You have a combinational loop. You are sampling and driving the state signal in the combinational always block. Typically, you sample the registered state variable (state_r in your code) in an FSM. Change:
case (state)
to:
case (state_r)
Unrelated, but you should use only blocking assignments in the combinational block (not a mixture). Change:
state <= Load;
to:
state = Load;
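For reference, here is a minimal sketch of the two-process FSM pattern described above: the registered block uses only nonblocking assignments, and the combinational block assigns defaults with blocking assignments and then decodes state_r, so it never reads a signal it also drives. The module count_fsm and its states are hypothetical and much simpler than the multiplier in the question.

```systemverilog
// Sketch: two-process FSM with no combinational loop.
module count_fsm #(parameter N = 6) (
  input  wire clk,
  input  wire rst,
  input  wire start,
  output reg  done
);
  localparam IDLE = 2'b00, OPER = 2'b01, FINISH = 2'b10;

  reg [1:0] state_r, state;   // registered and next state
  reg [2:0] count_r, count;   // registered and next count

  // Registered process: nonblocking assignments only.
  always @(posedge clk) begin
    if (rst) begin
      state_r <= IDLE;
      count_r <= 0;
    end else begin
      state_r <= state;
      count_r <= count;
    end
  end

  // Combinational process: defaults first, then case on the registered state.
  always @(*) begin
    state = state_r;
    count = count_r;
    done  = 1'b0;
    case (state_r)
      IDLE:    if (start) begin count = N; state = OPER; end
      OPER:    if (count_r == 0) state = FINISH;
               else count = count_r - 1;
      FINISH:  begin done = 1'b1; state = IDLE; end
      default: state = IDLE;
    endcase
  end
endmodule
```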

I want to implement a circuit in my DE1-SOC based on the SDRAM, where should I start? (I already finished a part)

I want to make a simple project in which I load 10 numbers into the SDRAM of my Altera DE1-SoC, ready to be taken as input by a logic unit I am creating.
The logic unit does only a simple arithmetic operation, Y = (X+1)*(X-1), where X is the input and Y is the output. It will pick the values (one by one) from the SDRAM, calculate, and write the results to another SDRAM array.
Then the SDRAM should store this data; I want to take this data out of the DE1-SoC to a PC, for example.
So far I have written this code (in case it is necessary to check):
module mem_prue1 (rst_n, clk, fin);
input clk, rst_n;
output fin;
wire [6:0] data_X;
reg [6:0] sec_A, sec_B, s_sec_A, s_sec_B;
reg [13:0] rslt_Y, s_rslt_Y;
reg save_sec_A, save_sec_B, save_rslt_Y, set_ram;
reg clear, enable, next_num, no_num, fin, w_mem_out;
reg [1:0] state, nextstate;
reg [3:0] indx;
parameter S0 = 0; parameter S1 = 1; parameter S2 = 2; parameter S3 = 3;
RAM_IN RAM_IN_inst1 (
.data_X (data_X),
.indx(indx)
);
RAM_OUT RAM_OUT_inst1 (
.s_rslt_Y (s_rslt_Y),
.w_mem_out (w_mem_out),
.set_ram (set_ram)
);
always @ (posedge clk or negedge rst_n)
begin
if (~rst_n)
begin
set_ram <= 1;
indx <= 0;
no_num <=0;
enable <= 1;
s_sec_A <= 0;
s_sec_B <= 0;
s_rslt_Y <= 0;
state <= S0;
end
else if (clear)
begin
enable <= 0;
state <= nextstate;
no_num <= 0;
indx <= 0;
set_ram <= 1;
fin <= 1;
end
else
begin
set_ram <= 0;
state <= nextstate;
if (save_sec_A)
s_sec_A <= sec_A;
if (save_sec_B)
s_sec_B <= sec_B;
if (save_rslt_Y)
s_rslt_Y <= rslt_Y;
if (next_num)
begin
if (indx >= 9)
begin
indx <= 0; // reset the memory index
no_num <= 1; // signal that there are no numbers left
end
else
indx <= indx + 4'b0001;
end
end
end
always @ (*)
begin
w_mem_out = 0;
sec_A = 0; sec_B = 0; rslt_Y = 0;
save_sec_A = 0; save_sec_B = 0;
save_rslt_Y = 0; clear = 0;
next_num = 0;
case (state)
S0:
begin
if (~enable)
nextstate = S0;
else
begin
sec_A = data_X + 7'b0000001;
save_sec_A = 1;
nextstate = S1;
end
end
S1: begin
sec_B = data_X - 7'b0000001;
save_sec_B = 1;
nextstate = S2;
end
S2: begin
rslt_Y = s_sec_A * s_sec_B;
save_rslt_Y = 1;
nextstate = S3;
end
S3: begin
w_mem_out = 1;
next_num = 1;
nextstate = S0;
if (no_num == 1)
clear = 1;
end
default:
nextstate = S0;
endcase
end
endmodule
This is the memory I "simulated" as a RAM for input data:
module RAM_IN (data_X, indx);
input [0:3] indx;
output [6:0] data_X;
reg [6:0] data_X;
reg [6:0] in_ram [0:9];
always @ (indx)
data_X = in_ram [indx];
initial
begin
$readmemb("C:/altera/15.0/PROYECTOS/mem_prue/in_ram.txt", in_ram);
end
endmodule
and this for output data:
module RAM_OUT (s_rslt_Y, w_mem_out, set_ram);
input [13:0]s_rslt_Y;
input set_ram, w_mem_out;
reg [3:0] addr_out; // size of 57600 data words
reg [13:0] mem_out [0:9];
always @ (w_mem_out or set_ram)
begin
if (set_ram)
addr_out = 0;
else if (w_mem_out == 1)
begin
mem_out [addr_out] = s_rslt_Y;
addr_out = addr_out + 4'b0001;
end
else
addr_out = addr_out;
end
endmodule
And the testbench:
module mem_prue1_tb ();
wire fin;
reg clk, rst_n;
mem_prue1 mem_prue1_inst1 (
.clk(clk),
.rst_n (rst_n),
.fin (fin)
);
initial
begin
rst_n <= 1;
#1 rst_n <= 0;
#2 rst_n <= 1;
clk <= 1;
end
always
begin
#5 clk = ~clk;
end
//---------------------------
integer out,i;
initial begin
out=$fopen("C:/altera/15.0/PROYECTOS/mem_prue/mem_out.txt");
end
always @(posedge clk) begin
if(fin==1)
for(i=0;i<=9;i=i+1) begin
$fdisplay(out,"%b",mem_prue1_inst1.RAM_OUT_inst1.mem_out[i]);
if(i==9)begin
$stop;
end
end
end
endmodule
So now I want to replace that "simulated" RAM with real SDRAM, but I don't know the most practical way to do it.
Should I use QSYS, NIOS-II, or just learn the Megawizard IP library and generate a variation of the UniPHY? I'm just learning to use FPGAs, so I'm a bit confused at this part. I want to download the proper manuals and tutorials to learn this in detail, but I hope you can point me in the right direction.
PS: My goal is to "isolate" my logic unit from the "simulated RAM", because I'm guessing that if I program it as I did, the memory will consume logic resources, and my main goal is to measure the area, energy, and speed of my logic ONLY, without the memory burden.
Thanks.
Your keywords (QSYS, MegaWizard, UniPHY) indicate Altera. If you are just going to simulate the SDRAM, you should be okay. Bringing up that interface in a real chip can get hairy the first time.
If you are just doing simulation, I would use QSYS to generate the SDRAM controller module. If you can use DDR3, there is an option to generate an example design. If you do that, you will be able to see how the interface to the DDR3 works. In fact, it should be ready to go.
As an FYI, reads will have more latency, so you need to either wait for the response or use a pipelined architecture in which multiple reads can be in flight simultaneously.
The "FPGAs Now What?" tutorial offers some advice (for a Xilinx platform, which apparently doesn't match your particular case) on SDRAM simulation. Basically, it boils down to finding an SDRAM vendor with an available Verilog/VHDL model and plugging it into a simulation testbench. (Note that these models aren't going to be synthesizable.)
http://www.xess.com/static/media/appnotes/FpgasNowWhatBook.pdf
Altera has a tutorial for connecting the SDRAM to a Nios II system (using Qsys) on the DE1-SoC board.
ftp://ftp.altera.com/up/pub/Altera_Material/16.0/Tutorials/Verilog/DE1-SoC/Using_the_SDRAM.pdf
If you're implementing your own controller (or using a HW only IP Core), the tutorial also has the timing information for the SDRAM as well.

Reduce array to sum of elements

I am trying to reduce a vector to the sum of all its elements. Is there an easy way to do this in Verilog,
similar to the SystemVerilog .sum method?
Thanks
My combinational solution for this problem:
//example array
parameter cells = 8;
reg [7:0]array[cells-1:0] = {1,2,3,4,5,1,1,1};
//###############################################
genvar i;
wire [7:0] summation_steps [cells-2 : 0];//container for all sumation steps
generate
assign summation_steps[0] = array[0] + array[1];// to save an adder, start with the first sum (not array[0])
for(i=0; i<cells-2; i=i+1) begin
assign summation_steps[i+1] = summation_steps[i] + array[i+2];
end
endgenerate
wire [7:0] result;
assign result = summation_steps[cells-2];
Verilog doesn't have any built-in array methods like SV. Therefore, a for-loop can be used to perform the desired functionality. Example:
parameter N = 64;
integer i;
reg [7:0] array [0:N-1];
reg [N+6:0] sum; // enough bits to handle overflow
always @*
begin
sum = {(N+7){1'b0}}; // all zero
for(i = 0; i < N; i=i+1)
sum = sum + array[i];
end
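Because Verilog-2001 module ports cannot be unpacked arrays, one way to reuse the loop above is to pass the elements as a flattened vector. A sketch under that assumption (the module array_sum and the parameters N and W are hypothetical); the result is W + $clog2(N) bits wide, which is enough to hold the sum of N W-bit values:

```systemverilog
// Sketch: combinational sum of N elements of W bits, packed into one vector.
module array_sum #(parameter N = 8, parameter W = 8) (
  input  wire [N*W-1:0]         flat,  // element i occupies bits [i*W +: W]
  output reg  [W+$clog2(N)-1:0] sum
);
  integer i;
  always @* begin
    sum = 0;
    // Each iteration adds one element to the running total from the
    // previous iteration; synthesis unrolls this into a chain of adders.
    for (i = 0; i < N; i = i + 1)
      sum = sum + flat[i*W +: W];
  end
endmodule
```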
In critiquing the other answers delivered here, there are some comments to make.
The first important thing is to provide space for the sum to be accumulated. Statements such as the following, in RTL, won't do that:
sum = sum + array[i]
because each of the unique nets created on the right-hand side (RHS) of the expression is assigned back to the same signal, sum, leading to ambiguity about which of the unique nets is actually the driver (called a multiple-driver hazard). To compound the problem, this statement also creates a combinational loop, because sum is used combinationally to drive itself, which is not good. It would be better to use a different signal as the load and as the driver on each successive iteration of the loop.
Back to the argument, though: in the above situation, most simulators will drive the signal to an unknown value (which driver should they pick? Either none of them is right or all of them are, so the result is unknown). That is, if it gets through the compiler at all (which is unlikely; it doesn't, at least in Cadence IEV).
The right way to do it would be to set up the following. Say you were summing bytes:
parameter NUM_BYTES = 4;
reg [7:0] array_of_bytes [NUM_BYTES-1:0];
reg [8+$clog2(NUM_BYTES):0] sum [NUM_BYTES-1:1];
always @* begin
for (int i=1; i<NUM_BYTES; i+=1) begin
if (i == 1) begin
sum[i] = array_of_bytes[i] + array_of_bytes[i-1];
end
else begin
sum[i] = sum[i-1] + array_of_bytes[i];
end
end
end
// The accumulated value is indexed at sum[NUM_BYTES-1]
Here is a module that works for arbitrarily sized arrays and does not require extra storage:
module arrsum(input clk,
input rst,
input go,
output reg [7:0] cnt,
input wire [7:0] buf_,
input wire [7:0] n,
output reg [7:0] sum);
always @(posedge clk, posedge rst) begin
if (rst) begin
cnt <= 0;
sum <= 0;
end else begin
if (cnt == 0) begin
if (go == 1) begin
cnt <= n;
sum <= 0;
end
end else begin
cnt <= cnt - 1;
sum <= sum + buf_;
end
end
end
endmodule
module arrsum_tb();
localparam N = 6;
reg clk = 0, rst = 0, go = 0;
wire [7:0] cnt;
reg [7:0] buf_, n;
wire [7:0] sum;
reg [7:0] arr[9:0];
integer i;
arrsum dut(clk, rst, go, cnt, buf_, n, sum);
initial begin
$display("time clk rst sum cnt");
$monitor("%4g %b %b %d %d",
$time, clk, rst, sum, cnt);
arr[0] = 5;
arr[1] = 6;
arr[2] = 7;
arr[3] = 10;
arr[4] = 2;
arr[5] = 2;
#5 clk = !clk;
#5 rst = 1;
#5 rst = 0;
#5 clk = !clk;
go = 1;
n = N;
#5 clk = !clk;
#5 clk = !clk;
for (i = 0; i < N; i++) begin
buf_ = arr[i];
#5 clk = !clk;
#5 clk = !clk;
go = 0;
end
#5 clk = !clk;
$finish;
end
endmodule
I designed it for 8-bit numbers but it can easily be adapted for other kinds of numbers too.
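One way to do that adaptation is to turn the widths into parameters. The sketch below keeps the same state machine and port order as arrsum, with hypothetical parameters DATA_W (element width), CNT_W (counter width), and SUM_W (result width). The SUM_W default of DATA_W + CNT_W bits cannot overflow for up to 2^CNT_W - 1 elements.

```systemverilog
// Sketch: arrsum with parameterized widths. On go (while idle), load cnt with n
// and clear sum; then add one element from buf_ per clock until cnt reaches 0.
module arrsum_p #(
  parameter DATA_W = 8,
  parameter CNT_W  = 8,
  parameter SUM_W  = DATA_W + CNT_W
) (
  input  wire              clk,
  input  wire              rst,
  input  wire              go,
  output reg  [CNT_W-1:0]  cnt,
  input  wire [DATA_W-1:0] buf_,
  input  wire [CNT_W-1:0]  n,
  output reg  [SUM_W-1:0]  sum
);
  always @(posedge clk, posedge rst) begin
    if (rst) begin
      cnt <= 0;
      sum <= 0;
    end else if (cnt == 0) begin
      if (go) begin
        cnt <= n;
        sum <= 0;
      end
    end else begin
      cnt <= cnt - 1;
      sum <= sum + buf_;   // SUM_W-bit accumulator, wider than one element
    end
  end
endmodule
```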
