I am new to Verilog. I was trying to write some simple code, but I am not sure how to do it in an expert way.
I have a 12-bit register "data", and each bit of that register has a specific value, e.g.
Bit 0 = 12;
Bit 1 = 16;
Bit 2 = 33;
Bit 11 = 180;
Now if any bit of the "data" register is 1, the result should be the sum of all the values that correspond to the set bits, e.g.
data = 12'b100000000101
result = 225 (180+33+12)
Right now I am checking each bit of data; if it is 1, I register the corresponding value and add it to the previously registered value. This method takes a number of cycles.
How can I do it in a fast way in Verilog?
Thank you.

It depends on what you mean by "fast". Presumably you mean time, but remember that time=cycles/frequency - reducing the number of cycles will often reduce the maximum frequency your circuit can operate at.
For example, here's a circuit that does the entire add in one cycle:
always @(*) begin
tempsum = 0;
tempsum = tempsum + (data[0]? 12:0);
tempsum = tempsum + (data[1]? 16:0);
tempsum = tempsum + (data[2]? 33:0);
// ... one line per remaining bit
end
always @(posedge clock)
result <= tempsum;
If you synthesized this circuit, you'd see a long chain of adders. It could calculate the result in a single cycle, but would have a long critical path, and therefore a lower fMax. Whether this would be "faster" is impossible to know until you synthesize it (there are too many factors to guess).
A better multi-cycle approach could be to use a tree, i.e.:
reg [31:0] sum [29:0];
always @(posedge clock) begin
// level 0
sum[0] <= (data[0]? 12:0) + (data[1]? 16:0);
sum[1] <= (data[2]? 33:0) + (data[3]? 40:0);
// ...
sum[15] <= (data[30]? 160:0) + (data[31]? 180:0);
// level 1
sum[16] <= sum [0] + sum [1];
sum[17] <= sum [2] + sum [3];
// ...
sum[23] <= sum [14] + sum [15];
// level 2
sum[24] <= sum [16] + sum [17];
sum[25] <= sum [18] + sum [19];
// ...
// level 3
sum[28] <= sum [24] + sum [25];
sum[29] <= sum [26] + sum [27];
result <= sum [28] + sum [29];
end
All that said, ultimately the "fastest" approach will also depend on the other requirements of your system, what you're implementing it on, etc.

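You can try something like this:
reg [15:0] sum;
integer i;
always @(*) begin
sum = 0; // clear first so that no latch is inferred
for (i = 0; i < 12; i = i + 1) begin
if (data[i])
sum = sum + Bit[i]; // Bit[i] holds the value assigned to bit i
end //for
end //always
assign finalSum = |data ? sum : 'h0;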
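The loop-based approach above can be made self-contained roughly as follows. This is only a sketch: the module and function names (weighted_sum, weight) are made up for illustration, and since the question only gives the values for bits 0, 1, 2 and 11, every other weight is a placeholder. The small testbench checks the example from the question (12'b100000000101 should give 225).

```verilog
// Combinational weighted sum of the set bits of "data".
module weighted_sum (
  input      [11:0] data,
  output reg [11:0] result
);
  // Per-bit values. Only bits 0, 1, 2 and 11 come from the question;
  // the default value is a placeholder invented for this sketch.
  function [7:0] weight(input integer idx);
    case (idx)
      0:       weight = 8'd12;
      1:       weight = 8'd16;
      2:       weight = 8'd33;
      11:      weight = 8'd180;
      default: weight = 8'd1;   // placeholder, replace with the real value
    endcase
  endfunction

  integer i;
  always @(*) begin
    result = 0;                 // default assignment avoids an inferred latch
    for (i = 0; i < 12; i = i + 1)
      if (data[i])
        result = result + weight(i);
  end
endmodule

// Minimal self-checking testbench for the example in the question.
module weighted_sum_tb;
  reg  [11:0] data;
  wire [11:0] result;

  weighted_sum dut (.data(data), .result(result));

  initial begin
    data = 12'b100000000101;    // bits 11, 2 and 0 set: 180 + 33 + 12
    #1;
    if (result !== 12'd225) $display("FAIL: result = %0d", result);
    else                    $display("PASS: result = %0d", result);
    $finish;
  end
endmodule
```

The synthesis tool unrolls the loop into the same adder chain as the hand-written version, so the timing trade-off discussed in the first answer still applies.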


Initialize and increment a genvar variable inside nested for loop of system verilog

I'm trying to write a synthesizable code in .sv to meet my requirements, but got stuck at the very end without a solution;
I've an incoming wire which is 99 bit wide and 10 (MAX) deep
I need to feed this to a module and get its output which is 99 bit
the output needs to be assigned to an array (99 bit wide ) from 0 .. max
I can't paste the snippet, but this is what I've coded:
parameter MAX = 10;
parameter TOT_INST = 45 ;
wire [98:0] inwire [MAX-1:0] ;
wire [98:0] outwire [TOT_INST-1:0] ;
genvar x,y,z ;
for (x = 0, z=0; x<=MAX-1; x=x+1, z=z+1 )
begin : gen1
for (y = x + 1; y<=MAX-1; y=y+1)
begin : gen2
some_module mod_inst ( .in1(inwire[x]), .in2(inwire[y]), .y(outwire[z]) );
end
end
The expectation is for outwire[0] to be a function of inwire[0] and inwire[1], outwire[1] to be a function of inwire[0] and inwire[2], etc. So it becomes necessary to increment the index of outwire.
I used another genvar z for this purpose (to increment from 0 to 44). But it looks like SV doesn't support incrementing multiple genvar variables in one loop? I'm getting a compilation error at the for loop itself.
Is there any way to achieve what I need? I really appreciate you taking the time to go through this question. Any insight would be really helpful.
I understand your intent. It seems you are trying to use comma expressions, but they won't work here.
Also, it seems that the genvar can only be assigned in the initialization and increment of the for loop, otherwise it would be easy to increment them on the innermost loop.
Since you must have unique drivers to the outwires, and the number of entries you declared (45) matches the number of instances you will create I assume you simply want them to be set incrementally.
What I would do is to calculate the number of iterations algebraically and create a local parameter. If you can't see how, review triangular numbers.
parameter MAX = 10;
// Generalizing your TOT_INST
parameter TOT_INST = MAX * (MAX - 1) / 2 ;
wire [98:0] inwire [MAX-1:0] ;
wire [98:0] outwire [TOT_INST-1:0];
genvar x,y;
for (x = 0; x <= MAX-1; x = x + 1 )
begin : gen1
for (y = x + 1; y<=MAX-1; y=y+1)
begin : gen2
localparam z = TOT_INST - ((MAX - x - 2) * (MAX - x - 1)) / 2 + y - MAX;
initial begin
$display("%d %d %d", x, y, z);
end
end
end
The formula would be simpler if we used the x in the inner loop.
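As a sanity check on the closed-form index, the sketch below walks the same (x, y) pairs in plain procedural loops and compares the formula against a running counter. The module name pair_index_tb and the function name pair_index are hypothetical; the expression itself is copied from the localparam z above.

```verilog
// Checks that the closed-form index visits 0, 1, ..., TOT_INST-1 in order.
module pair_index_tb;
  parameter  MAX      = 10;
  localparam TOT_INST = MAX * (MAX - 1) / 2;

  // Same expression as the localparam z in the generate loop.
  function automatic integer pair_index(input integer x, input integer y);
    pair_index = TOT_INST - ((MAX - x - 2) * (MAX - x - 1)) / 2 + y - MAX;
  endfunction

  integer x, y, expected, errors;

  initial begin
    expected = 0;
    errors   = 0;
    for (x = 0; x <= MAX - 1; x = x + 1)
      for (y = x + 1; y <= MAX - 1; y = y + 1) begin
        if (pair_index(x, y) !== expected) begin
          errors = errors + 1;
          $display("MISMATCH x=%0d y=%0d z=%0d expected=%0d",
                   x, y, pair_index(x, y), expected);
        end
        expected = expected + 1;
      end
    $display("pairs visited = %0d, errors = %0d", expected, errors);
    $finish;
  end
endmodule
```

In the real design, z would then be used as outwire[z] in the module instantiation inside gen2.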

Verilog Minimum Bit Width

I'm looking for a clean way to declare Verilog/SystemVerilog types with a parameterised bit width. This is what I've got so far and was wondering if there is a better way to do it. I've looked through the system functions in the LRM 1800-2009 and -2017. The closest I could find is $bits, but I would like something like $minbits. Have I overlooked something?
In VHDL, it's done by simply specifying the range:
signal counter: integer range 0 to MAX_COUNT;
...and the compiler will calculate the minimum bit width to hold that range.
For the parameter values of 20 ns and 125 ms, the counter should be 23 bits with MAX_COUNT being 6,250,000.
module Debounce
#(
parameter CLOCK_PERIOD_ns = 20, // nanoseconds.
parameter DEBOUNCE_PERIOD_ms = 125 // milliseconds.
)
. . .
function int MinBitWidth([1023:0] value);
for (MinBitWidth = 0; value > 0; MinBitWidth = MinBitWidth + 1)
value = value >> 1;
endfunction
localparam MAX_COUNT_32BITS = DEBOUNCE_PERIOD_ms * 1_000_000 / CLOCK_PERIOD_ns; // Default type of 32-bits.
localparam COUNTER_BITS = MinBitWidth(MAX_COUNT_32BITS); // Calculate actual bit width needed.
typedef logic [COUNTER_BITS - 1 : 0] TCounter;
localparam TCounter MAX_COUNT = MAX_COUNT_32BITS; // Assign to a type of the actual bit width (truncation warning from Quartus).
localparam TCounter ONE = 1;
TCounter counter;
. . .
always @(posedge clock)
. . .
if (counter == MAX_COUNT_32BITS - 1) // Synthesises a 32-bit comparer no matter how many bits are needed with unused bits tied to ground.
. . .
if (counter == MAX_COUNT - ONE) // Synthesises a 23-bit comparer as expected.
. . .
counter <= counter + 1; // Synthesises a 23-bit counter as expected.
. . .
counter <= counter + ONE; // Synthesises a 23-bit counter as expected.
Incorrect Algorithm
I considered $clog2 which is the correct way to obtain an address bus width from a RAM depth parameter. However, this is not the same as the minimum bit width of a value. Let me explain...
Consider a value of 4 which is 100 base-2 (3 bits wide).
The $clog2 algorithm calculates a value of 2, which is incorrect. It should be 3. The reason for this miscalculation is because $clog2 subtracts 1 from the value before it starts to compute the number of bits, i.e. 4 becomes 3, then it calculates the minimum bit width of the value 3, giving 2 bits. While this is mathematically correct for the ceiling of log base-2, it is not the bit width of the original value.
Here is the clogb2 algorithm from the LRM:
function integer clogb2;
input [31:0] value;
begin
value = value - 1; // GOTCHA!
for (clogb2 = 0; value > 0; clogb2 = clogb2 + 1) begin
value = value >> 1;
end
end
endfunction
Correct Algorithm
The correct algorithm is to calculate the minimum bit width of the original value, which is the algorithm given by @jonathan-mayer in his first answer before he edited it.
Here is the correct algorithm as a function:
function integer MinBitWidth;
input [1023:0] value;
begin
for (MinBitWidth = 0; value > 0; MinBitWidth = MinBitWidth + 1)
value = value >> 1;
end
endfunction
Just do +1 to get correct values for powers of 2.
$clog2(MAX_COUNT_32BITS + 1);
$clog2 from IEEE Std 1800-2017, section 20.8.1 Integer math functions:
The system function $clog2 shall return the ceiling of the log base 2
of the argument (the log rounded up to an integer value).
module tb;
parameter CLOCK_PERIOD_ns = 20; // nanoseconds.
parameter DEBOUNCE_PERIOD_ms = 125; // milliseconds.
localparam MAX_COUNT_32BITS = DEBOUNCE_PERIOD_ms * 1_000_000 / CLOCK_PERIOD_ns; // Default type of 32-bits.
localparam COUNTER_BITS = $clog2(MAX_COUNT_32BITS + 1); // Calculate actual bit width needed.
initial begin
. . .
end
endmodule
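To compare the two approaches discussed above, here is a small sketch of a testbench that checks the shift-based MinBitWidth function against $clog2(value + 1) for a range of values and prints the two cases mentioned in the thread (the value 4 and a MAX_COUNT of 6,250,000). The testbench name and the range of values tested are arbitrary choices for illustration.

```verilog
// Compares the shift-based bit-width function with $clog2(value + 1).
module min_bit_width_tb;
  // Shift-based algorithm from the answer above.
  function automatic integer MinBitWidth(input [1023:0] value);
    begin
      for (MinBitWidth = 0; value > 0; MinBitWidth = MinBitWidth + 1)
        value = value >> 1;
    end
  endfunction

  integer v, mismatches;

  initial begin
    mismatches = 0;
    for (v = 1; v <= 4096; v = v + 1)
      if (MinBitWidth(v) != $clog2(v + 1)) begin
        mismatches = mismatches + 1;
        $display("MISMATCH at value %0d", v);
      end
    $display("mismatches = %0d", mismatches);

    // The power-of-two case that plain $clog2 gets "wrong" for this purpose.
    $display("value 4: MinBitWidth = %0d, clog2(4) = %0d, clog2(4 + 1) = %0d",
             MinBitWidth(4), $clog2(4), $clog2(4 + 1));
    $display("value 6250000: MinBitWidth = %0d", MinBitWidth(6250000));
    $finish;
  end
endmodule
```

If the two agree over the tested range, $clog2(value + 1) can be used directly in a localparam, which avoids needing a user-defined constant function.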

calculation of simulation time in verilog

I want to measure the simulation time taken to calculate one prime number, i.e. the number of clock cycles needed to calculate it. As we know, calculating a large prime number takes more clock cycles than calculating a small one.
I used $time in Verilog whenever a prime is calculated and captured it in a time_s register. I calculated the difference of calculation after another prime number. Here is my code where you can see time_s1 captured the time when a prime is calculated. time_s2 is the time to calculate the difference.
module prime_number_count(
input clk
);
//for count 1
parameter N =100; // size of array
parameter N_bits = 32;
reg [N_bits-1:0] prime_number[0:N-1]; // memory array for prime_number
reg [N_bits-1:0] prime_aftr50 [0:49]; // memory array to get
integer k; // counter variable
integer k1; // counter variable
integer count;
integer test;
integer time_s1;
integer time_s2;
integer check; //Counts 1 to k
localparam S_INC = 2'b01;
localparam S_CHECK = 2'b10;
reg [1:0] state;
initial begin
prime_number[0] = 'd1;
prime_number[1] = 'd2;
//prime_aftr50[0] = 'd0;
state = S_CHECK; //Check set count first
count = 'd3;
k = 'd2; //0,1 preloaded
check = 'd1;
test = 'd1;
time_s1 = 'd0;
time_s2 = 'd0;
k1 = 'd0;
end
always @(posedge clk )
begin
$display ("time of clock %d ", $time );
if(state == S_INC)
begin // if state is 1
//$display("State: Incrementing Number to check %d", count+1);
count <= count+1 ;
state <= S_CHECK ; // change the state to 2
check <= 'd1; // Do not check against [0] value 1
test <= 'd1; // Safe default
end
else if (state == S_CHECK) begin
if (test == 0) begin
// Failed Prime test (exact divisor found)
$display("Reject %3d", count);
state <= S_INC ;
end
else if (time_s2>30000)begin
time_s1 <=$realtime ;
state <= S_INC ;
k <= k + 1;
$display("Found %1d th Prime_1 %1d", k, count);
$display("display of simulation time" , time_s2);
end // end of simulation time
else if (check == k) begin
//Passed Prime check
time_s1 <=$time ;
prime_number[k] <= count;
k <= k + 1;
state <= S_INC ;
$display("Found %1d th Prime_1 %1d", k, count);
$display("display of simulation time" , time_s2);
end
else begin
test <= count % prime_number[check] ;
check <= check + 1;
//$display("Checking %1d against %1d prime %1d : %1d", count, check, prime_number[check], count % prime_number[check]);
end
end
end
always @(posedge clk )
begin
time_s2 <=$realtime-time_s1;
// $display("display of simulation time" , time_s2) ;
end
always @(posedge clk) begin
if ( k==51+(50*k1)) begin
prime_aftr50[k1] <= count;
k1 <= k1+1;
end
end
endmodule
Background on time
Semantically I would recommend using time over integer for timestamps; both are integer types (time is a 64-bit unsigned value, integer is 32-bit signed). But because they only hold whole numbers, they are limited to the accuracy of the timescale time_unit*. Therefore I would suggest you actually use realtime, which is a real behind the scenes.
For displaying time, %t can be used instead of %d for decimal or %f for reals. The formatting of this can be controlled through $timeformat.
realtime capture = 0.0;
// Use $timeformat to change the way %t (below) is displayed
initial begin
capture = $realtime;
$display("%t", capture);
end
To control how %t is displayed :
//$timeformat(unit#, prec#, "unit", minwidth);
$timeformat(-3, 2, " ms", 10); // -3 and " ms" give useful display msg
unit is the base that time is to be displayed in, from 0 to -15
precision is the number of decimal places to display.
"unit" is a string appended to the time, such as " ns".
minwidth is the minimum number of characters that will be displayed.
unit: recommended "unit" text
0 = 1 sec
-1 = 100 ms
-2 = 10 ms
-3 = 1 ms
-4 = 100 us
-5 = 10 us
-6 = 1 us
-7 = 100 ns
-8 = 10 ns
-9 = 1 ns
-10 = 100 ps
-11 = 10 ps
-12 = 1 ps
-13 = 100 fs
-14 = 10 fs
-15 = 1 fs
With these changes (realtime types, $realtime captures and displaying with %t), analysing simulation time becomes a little easier.
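Putting those pieces together, a minimal standalone sketch might look like the following. The module name, the 1ns/1ps timescale and the #12.345 delay are arbitrary choices for illustration and are not taken from the question.

```verilog
`timescale 1ns/1ps

// Captures two timestamps as realtime and prints them, and their
// difference, using %t with a format set by $timeformat.
module timeformat_demo;
  realtime t_start;
  realtime t_end;

  initial begin
    // Display in ns, 2 decimal places, " ns" suffix, at least 10 characters.
    $timeformat(-9, 2, " ns", 10);

    t_start = $realtime;
    #12.345;                      // arbitrary delay with a fractional part
    t_end   = $realtime;

    $display("start      : %t", t_start);
    $display("end        : %t", t_end);
    $display("difference : %t", t_end - t_start);
    $finish;
  end
endmodule
```

Capturing into an integer or time variable instead of realtime would round the fractional part of the delay away, which is the accuracy loss described above.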
Now to calculate the time between finding primes:
Add the following to your initial begin:
$timeformat(-9, 2, " ns", 10);
Then in the state which adds the prime to the list you just need to add the following:
//Passed Prime check
time_s2 = time_s1; //Last Prime
time_s1 = $realtime ;
$display("Found %1d th Prime_1 %1d", k, count);
$display("Found at time : %t", time_s1);
$display("Time Diff : %t", time_s1 - time_s2);
Working example on EDA Playground.
*: Time scales for Verilog simulations are set by the `timescale directive shown below. The time_unit sets the decimal point, so any further accuracy from the precision is lost when using time or integer to record timestamps.
`timescale <time_unit>/ <time_precision>
See section 22.7 of IEEE 1800-2012 for more info.

Verilog, generic adder tree

So, I'm trying to write an adder tree in Verilog. The generic part of it is that it has a configurable number of elements to add and a configurable word size. However, I'm encountering problem after problem, and I'm starting to question whether this is the right way to solve my problem. (I will be using it in a larger project.) It is definitely possible to just hard-code the adder tree, although that will take a lot of text.
So, I thought I'd check with you Stack Overflowers on what you think about it. Is this "the way to do it"? I'm open to suggestions on different approaches too.
I can also mention that I'm quite new to Verilog.
In case anyone is interested, here's my current non-working code: (I'm not expecting you to solve the problems; I'm just showing it for convenience.)
module adderTree(
input clk,
input [`WORDSIZE * `BANKSIZE - 1 : 0] terms_flat,
output [`WORDSIZE - 1 : 0] sum
);
genvar i, j;
reg [`WORDSIZE - 1 : 0] pipeline [2 * `BANKSIZE - 1 : 0]; // Pipeline array
reg clkPl = 0; // Pipeline clock
assign sum = pipeline[0];
// Pack flat terms
for (i = `BANKSIZE; i < 2 * `BANKSIZE; i = i + 1) begin
always @ (posedge clk) begin
pipeline[i] <= terms_flat[i * `WORDSIZE +: `WORDSIZE];
clkPl = 1;
end
end
// Add terms logarithmically
for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
for (j = 0; j < 2 ** i; j = j + 1) begin
always @ (posedge clkPl) begin
pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
end
end
end
endmodule
Here are a few comments you might find useful:
It is generally good to have as few clocks as possible in your design (preferably just one).
In this particular case it appears you are trying to generate a new clock clkPl, but this does not work because it will never return to 0. (The "reg clkPl=0;" will reset it to 0 at time 0, then it is set permanently to 1 in "clkPl = 1;".)
You can fix this by simply replacing
always @ (posedge clkPl)
with
always @ (posedge clk)
It is good form to only use blocking assignments in combinatorial blocks, and non-blocking in clocked blocks. You are mixing both blocking and non-blocking assignments in your "Pack flat terms" section.
As you don't need clkPl you can simply delete the line with the blocking assignment ("clkPl = 1;")
Your double for loop:
for (i = 0; i < $clog2(`BANKSIZE); i = i + 1) begin
for (j = 0; j < 2 ** i; j = j + 1) begin
always @ (posedge clkPl) begin
pipeline[i * (2 ** i) + j] <= pipeline[i * 2 * (2 ** i) + 2 * j] + pipeline[i * 2 * (2 ** i) + 2 * j + 1];
end
end
end
looks like it will access incorrect elements.
e.g. for BANKSIZE = 2**8, i will count up to 7, at which point "pipeline[i * (2 ** i) + j]" = "pipeline[7*2**7+j]" = "pipeline[896+j]", which will be out of bounds for the array. (The array has 2*BANKSIZE=512 elements in it.)
I think you actually want this structure:
assign sum = pipeline[1];
for (i = 1; i < `BANKSIZE; i = i + 1) begin
always @ (posedge clk) begin
pipeline[i] <= pipeline[i*2] + pipeline[i*2 + 1];
end
end
Note that most verilog tools are very good at synthesising adds of multiple elements so you may want to consider combining more terms at each level of the hierarchy.
(Adding more terms costs less than someone might expect because the tools can use optimisations such as carry save adders to reduce the gate delay.)
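For reference, here is one way the suggested structure could be fleshed out into a complete module with a small testbench. This is a sketch under a few assumptions that are not in the question: module parameters replace the `WORDSIZE and `BANKSIZE macros, BANKSIZE is assumed to be a power of two, the leaf index is offset by BANKSIZE when slicing terms_flat, and the sum is truncated to WORDSIZE bits.

```verilog
// Pipelined adder tree using a heap-style layout: node i adds its
// children 2*i and 2*i+1; the leaves live at BANKSIZE .. 2*BANKSIZE-1.
module adder_tree #(
  parameter WORDSIZE = 16,
  parameter BANKSIZE = 8          // assumed to be a power of two
) (
  input                          clk,
  input  [WORDSIZE*BANKSIZE-1:0] terms_flat,
  output [WORDSIZE-1:0]          sum
);
  reg [WORDSIZE-1:0] pipeline [1:2*BANKSIZE-1];

  assign sum = pipeline[1];       // root of the tree

  genvar i;
  generate
    // Register the flat input terms into the leaf nodes.
    for (i = BANKSIZE; i < 2*BANKSIZE; i = i + 1) begin : leaf
      always @(posedge clk)
        pipeline[i] <= terms_flat[(i - BANKSIZE)*WORDSIZE +: WORDSIZE];
    end
    // Every internal node adds its two children, one level per clock.
    for (i = 1; i < BANKSIZE; i = i + 1) begin : node
      always @(posedge clk)
        pipeline[i] <= pipeline[2*i] + pipeline[2*i + 1];
    end
  endgenerate
endmodule

// Testbench: adds the terms 1..8 and checks for 36 after the pipeline fills.
module adder_tree_tb;
  reg           clk = 0;
  reg  [127:0]  terms;
  wire [15:0]   sum;
  integer       n;

  adder_tree #(.WORDSIZE(16), .BANKSIZE(8)) dut
    (.clk(clk), .terms_flat(terms), .sum(sum));

  always #5 clk = ~clk;

  initial begin
    for (n = 0; n < 8; n = n + 1)
      terms[n*16 +: 16] = n + 1;
    // 1 cycle to load the leaves + log2(8) = 3 adder levels, plus margin.
    repeat (5) @(posedge clk);
    #1;
    if (sum !== 16'd36) $display("FAIL: sum = %0d", sum);
    else                $display("PASS: sum = %0d", sum);
    $finish;
  end
endmodule
```

Widening the intermediate sums by one bit per level would avoid overflow for large inputs; it is left out here to keep the sketch close to the answer's structure.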

Size of arithmetic operation result in Verilog

I am making a signed comparator in Verilog. Here is the code:
module signedComparator(a0, a1, a2, b0, b1, b2, G, E, L);
input a0, a1, a2, b0, b1, b2;
output reg G, E, L;
always @(a0 or a1 or a2 or b0 or b1 or b2)
begin
if(a2 == 0 && b2 == 0) //both a and b >= 0
begin
L <= {a1,a0} < {b1,b0};
G <= {a1,a0} > {b1,b0};
E <= {a1,a0} == {b1,b0};
end
else if(a2 == 1 && b2 == 0) //a negative, b >= 0
begin
L <= 1;
G <= 0;
E <= 0;
end
else if(a2 == 0 && b2 == 1) //a >= 0, b negative
begin
L <= 0;
G <= 1;
E <= 0;
end
else //both a and b negative
begin
L <= (~{a1,a0} + 1) > (~{b1,b0} + 1);
G <= (~{a1,a0} + 1) < (~{b1,b0} + 1);
E <= (~{a1,a0} + 1) == (~{b1,b0} + 1);
end
end
endmodule
I am wondering, when adding vectors, what is the length of the intermediate result? I am concerned about the last case (L <= (~{a1,a0} + 1) > (~{b1,b0} + 1);). When adding 1 to ~{a1,a0}, is the result three bits in length for the comparison, or will {1,1} + 1 = {0,0}? Is there documentation somewhere for what the data type of intermediate results in verilog will be? This is hard to search for since I don't yet know the proper terminology.
I'm assuming this is for synthesis and have a few comments about your code. You seem to be using individual bits as inputs to the module and then using concatenation to make vectors later on. You can avoid this by declaring ports as signed vectors and doing a comparison directly.
input signed [2:0] a,b;
if(a == b)
else if(a > b)
Also, you are using non-blocking assignments to model combinational logic. These will work in the code you posted but really shouldn't be used in this manner. They work much better modeling synchronous logic via a clocked process. There's a good paper that summarizes a good coding style for synthesis.
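A complete version of that suggestion might look like the sketch below: signed vector ports, blocking assignments in a combinational block, and a testbench that checks all 64 input combinations against integer comparisons. The module names and the exhaustive test are illustrative additions, not code from the original post.

```verilog
// 3-bit signed comparator using signed vector ports.
module signed_comparator (
  input  signed [2:0] a, b,
  output reg          G, E, L
);
  always @(*) begin
    G = (a > b);    // both operands are signed, so the comparison is signed
    E = (a == b);
    L = (a < b);
  end
endmodule

// Exhaustive check over -4..3 for both inputs.
module signed_comparator_tb;
  reg signed [2:0] a, b;
  wire             G, E, L;
  integer          i, j, errors;

  signed_comparator dut (.a(a), .b(b), .G(G), .E(E), .L(L));

  initial begin
    errors = 0;
    for (i = -4; i <= 3; i = i + 1)
      for (j = -4; j <= 3; j = j + 1) begin
        a = i;
        b = j;
        #1;
        if (G !== (i > j) || E !== (i == j) || L !== (i < j)) begin
          errors = errors + 1;
          $display("MISMATCH a=%0d b=%0d G=%b E=%b L=%b", a, b, G, E, L);
        end
      end
    $display("errors = %0d", errors);
    $finish;
  end
endmodule
```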
I am wondering, when adding vectors, what is the length of the intermediate result?
The spec has a table for this as it depends on the operands and context.
An integer : Unsized constants are at least 32-bits
{a,b} : sizeof(a) + sizeof(b)
~{a} : sizeof(a)
a + b : max(sizeof(a),sizeof(b))
Thus your comparison operands will both be (at least) 32-bits. You can explicitly assign a constant size by using a tick before the value.
4'b1 // 0001 Binary 1
4'd1 // 0001 Decimal 1
4'd8 // 1000 Decimal 8
1'b1 // 1 Binary 1
'b1 // The same as 1, tick here only specifies dec/oct/bin format
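To see the effect of these sizing rules on an expression like the one in the question, the following sketch evaluates the same negation with an unsized and a sized constant. With {a1,a0} = 2'b01, the unsized version should be evaluated at 32 bits (the operand is extended before the inversion, so the upper bits become ones), while the sized version stays at 2 bits. The module name is made up for this illustration.

```verilog
// Shows how an unsized constant widens the whole expression.
module expr_width_tb;
  reg a1, a0;

  initial begin
    a1 = 1'b0;
    a0 = 1'b1;                       // {a1,a0} = 2'b01

    // Unsized 1 and 3 are at least 32 bits wide, so {a1,a0} is
    // zero-extended to that width before ~ is applied.
    $display("unsized: (~{a1,a0} + 1)    == 3    -> %b",
             (~{a1,a0} + 1) == 3);

    // Sized constants keep the whole expression at 2 bits.
    $display("sized  : (~{a1,a0} + 2'd1) == 2'd3 -> %b",
             (~{a1,a0} + 2'd1) == 2'd3);

    // Printing the unsized result in hex makes the extension visible.
    $display("unsized value = %h", ~{a1,a0} + 1);
    $finish;
  end
endmodule
```

In the original comparator both sides of each comparison are widened in the same way, so the extra upper bits appear on both operands.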
Is there documentation somewhere for what the data type of
intermediate results in verilog will be?
By far the best resource I've found for details like this is the spec itself, IEEE 1364.
