How to print a SystemVerilog coverage bin value at the end of simulation?


I want to find out how many times the state machine went through the following sequence of states by displaying the count at the end of the simulation.
I could not find a way to dump the value of the bin "b" in the code below.
interface i;
  typedef enum { S0, S1, S2, S3 } state_e;
  state_e state;
  assign state = dut1.sm_state;
  covergroup my_cg @(state);
    coverpoint state {
      bins b = (S0 => S1 => S2 => S3);
    }
  endgroup
  my_cg cg1 = new();
  final begin
    $display("COVERAGE:CG1.state:%0d", cg1.state.get_coverage());
  end
endinterface
Currently the output is 100 if the state machine went through the arc even once. I would instead like to count how many times it went through the arc.

get_coverage does not give you a count of bin hits; it only gives you the percentage of bins hit, i.e. the ratio of bins hit to the total number of bins. For performance, most tools stop counting once a bin has reached its required minimum number of hits (option.at_least, just 1 hit by default). This saves not only the counting but also the evaluation of the selection expressions that determine which bin to hit.
For debug, most tools give you a way of reporting the actual bin hit counts for the entire simulation.
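To illustrate why a percentage and a hit count differ, here is a small Python reference model. It is not SystemVerilog and not any tool's API; the function names and the sample trace are hypothetical. It counts how often consecutive samples of `state` match the transition S0 => S1 => S2 => S3, and computes coverage roughly the way a percentage report does, as the share of bins that reached `at_least` hits.

```python
def count_transition_hits(samples, sequence):
    """Count places where `sequence` appears as consecutive samples (hypothetical model)."""
    n = len(sequence)
    return sum(
        1
        for start in range(len(samples) - n + 1)
        if samples[start:start + n] == sequence
    )


def coverage_percent(bin_hits, at_least=1):
    """Percentage of bins whose hit count reached `at_least` (1 by default)."""
    covered = sum(1 for hits in bin_hits if hits >= at_least)
    return 100.0 * covered / len(bin_hits)


# One sample per change of `state`, since the covergroup is sampled on @(state).
samples = ["S0", "S1", "S2", "S3", "S0", "S1", "S2", "S3",
           "S1", "S0", "S1", "S2", "S3"]
hits = count_transition_hits(samples, ["S0", "S1", "S2", "S3"])
print(hits)                      # 3: the count the question asks for
print(coverage_percent([hits]))  # 100.0: what a percentage-style result shows
```

A real tool may stop incrementing the bin after `at_least` hits, so this sketch only models the counting you would have to do yourself (for example with your own counter printed from the final block) if the tool's coverage report does not give you per-bin counts.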

Related

Same random number sequence generated by Verilog-A code when running the code consecutively

I recently had to use Verilog-A to generate a set of random numbers (sigmaX, sigmaY, sigmaZ). Statistically, each of them has mean=0 and std=1, and sigmaX^2+sigmaY^2+sigmaZ^2=1. The following code in the test_solver.va file is written in Verilog-A to generate such a random number set at each time step:
`include "disciplines.h"
`include "constants.h"
module test_va(p,n,mb,mc,md,me,mf,mg);
inout p,n;
output mb,mc,md,me,mf,mg;
electrical p,n,mb,mc,md,me,mf,mg;
real randomX,randomY,randomZ; // Gaussian random variables with mean = 0, stdev = 1
real sigmaX,sigmaY,sigmaZ; // Normalized thermal noise vector components
integer seedX,seedY,seedZ; // Seed variables for RNG
integer random_seed;
//------------------------------------------------------------------//
// Define mag(x, y, z)
//------------------------------------------------------------------//
analog function real mag;
input x, y, z;
real x, y, z;
begin
mag = sqrt(pow(x,2)+pow(y,2)+pow(z,2));
end
endfunction
analog begin
random_seed = 1;
seedX = $random+random_seed;
seedY = $random+random_seed;
seedZ = $random+random_seed;
randomX = $rdist_normal(seedX, 0.0, 1.0);
randomY = $rdist_normal(seedY, 0.0, 1.0);
randomZ = $rdist_normal(seedZ, 0.0, 1.0);
sigmaX = randomX/mag(randomX, randomY, randomZ);
sigmaY = randomY/mag(randomX, randomY, randomZ);
sigmaZ = randomZ/mag(randomX, randomY, randomZ);
V(mb) <+ randomX;
V(mc) <+ randomY;
V(md) <+ randomZ;
V(me) <+ sigmaX;
V(mf) <+ sigmaY;
V(mg) <+ sigmaZ;
end
endmodule
I used HSPICE 2019 to test the random number output at each simulation step by running the following test_solver.sp file:
Title Simple
.option post=1
.option probe=0
*.option runlvl=4
.option ingold=2
*.option accurate=1
*.option method=bdf
*.option bdfrtol=1e-5
*.option bdfatol=1e-5
.option numdgt=4
.option brief
.option measfile=1
.option lis_new=1
.option vaopts=str('-G')
.save
.hdl ./test_solver.va
vin 1 0 PULSE(0 0.5 2NS 1NS 1NS 10NS 20NS)
X 1 0 2 3 4 5 6 7 test_va
.tran 0.01n 20.0n 1E-10 uic
.print tran V(1) V(2) V(3) V(4) V(5) V(6) V(7)
.end
However, I noticed that it always generates an identical random number set (sigmaX, sigmaY, sigmaZ) if I run in HSPICE consecutively. But my requirement is to have different random number sets when running the same code consecutively.
I also noticed that if I change random_seed=1 in the test_solver.va file, for example, to random_seed=2 (or 3 or 4 ...) and run in HSPICE, it will generate a different random number set than before. But it still generates the same set when running the same code consecutively.
So I wonder whether there is something wrong with my test_solver.va code, or whether I have to change "random_seed=1" every time. The latter might not be easy if I integrate this code into other designs and run it many times.

First of all, pseudo-random number generators are deterministic. That means if you start with the same seed you will always get the same result.
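Any seeded generator shows this behavior. The sketch below uses Python's `random` module purely as an analogy for `$rdist_normal` (it is not Verilog-A, and `draw_three` is a hypothetical helper): seeding it identically twice reproduces the same three Gaussian samples, exactly like running the same netlist twice.

```python
import random


def draw_three(seed):
    """Draw three N(0, 1) samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(3)]


first_run = draw_three(1)
second_run = draw_three(1)
print(first_run == second_run)     # True: same seed, same "random" numbers
print(first_run == draw_three(2))  # False: a different seed gives a different sequence
```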
I'm not aware of any way to do what you want directly in Verilog-A. I think that you will need to write your own function in 'C'. One technique that is often used is to call a high resolution timer and assume that the time in micro- or nanoseconds is essentially random. Alternatively you can call a function like getrandom().
The next problem is getting the 'C' random value back to your Verilog-A. I'm not familiar with HSPICE, but this can be done with Verilog PLI on some other simulators.
Alternatively, you could wrap your simulation in a shell script and do something like this:
1. In the script, read /dev/urandom and write a random number to a file.
2. Run HSPICE.
3. In your Verilog-A, use a system task like $fread to read the file that the script produced.
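A minimal sketch of the seed-writing step of such a wrapper, written in Python rather than shell. The file name `seed.txt` and both helper names are assumptions. `os.urandom` reads the operating system's randomness source (/dev/urandom on Linux), and `time.time_ns()` is the high-resolution-timer alternative mentioned above.

```python
import os
import time


def make_seed(source="urandom"):
    """Return a non-negative seed that fits in a signed 32-bit integer."""
    if source == "urandom":
        raw = int.from_bytes(os.urandom(4), "little")  # OS randomness (/dev/urandom on Linux)
    else:
        raw = time.time_ns()  # high-resolution timer, treated as "essentially random"
    return raw & 0x7FFFFFFF  # keep 31 bits so it is a positive 32-bit integer


def write_seed_file(path="seed.txt", source="urandom"):
    """Write one seed for this run to `path` (a hypothetical file name) and return it."""
    seed = make_seed(source)
    with open(path, "w") as f:
        f.write(f"{seed}\n")
    return seed
```

After `write_seed_file()` returns, the wrapper would start HSPICE as usual. Which file-reading tasks the Verilog-A model can then use depends on what your simulator supports.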

Is there a method in verilog to start reading ROM data from a specific address?

I've designed a ROM for coefficients and an up-down counter to read these coefficients one by one, but there are two possible starting points: one set of coefficients for type 1 and another set for type 2. For example, for type 1 I want to start from address 0, and for type 2 from address 30. I remember someone told me this is possible using some # or something, but I don't remember the actual way to do it.
This is my counter code:
module UDcounter(input clk, rst, up, GItype,
                 output reg [5:0] addr);
  always @(posedge clk, posedge rst)
    if (rst)
      addr <= 6'b0;
    else
    begin
      if (GItype) // assume 1 is a long GI type
      begin
        // addr=6'b000000;
        if (up)
          addr = addr + 1;
        else addr = addr - 1;
      end
      else // for short GI
      begin
        // addr=6'b100000;
        if (up)
          addr = addr + 1;
        else addr = addr - 1;
      end
    end
endmodule
The problem is that on every clock cycle it starts addressing from addr=0 (for example), so the output address is always 1 (from the +1 line).

So what I understood from your question is that you want to design a ROM which will store coefficients.
Going by your question, I assume that you have two types of coefficients, viz. type a and type b, stored in the ROM; say the starting address for type a is 0 and for type b is 30. To access the ROM you would want two counters, viz. addr_ptr_a and addr_ptr_b, which act as address pointers. Let's assume the ROM has 60 address locations; then addr_ptr_a will count from 0 to 29 and addr_ptr_b will count from 30 to 59.
The GItype signal can be used to determine which counter to enable.
I am assuming a sequential read operation; for random reads you would need separate logic to generate the read address.
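A small Python behavioral model of the two-pointer scheme (the class and parameter names are hypothetical, and wrapping each pointer inside its own range is an assumption of this sketch, not something the answer specifies). The key difference from the question's code is that each pointer keeps its value between clock edges and is only loaded with its base address (0 or 30) on reset, so nothing forces the address back to 0 every cycle.

```python
class CoefficientAddressGen:
    """Behavioral model of two address pointers sharing one 60-word ROM.

    Hypothetical: pointer "a" covers addresses 0..29, pointer "b" covers 30..59,
    and gi_type selects which pointer is enabled on each clock edge.
    Wrapping inside each range is an assumption made for this sketch.
    """

    def __init__(self, a_range=(0, 29), b_range=(30, 59)):
        self.ranges = {"a": a_range, "b": b_range}
        self.reset()

    def reset(self):
        # Load each pointer with its base address (the equivalent of the rst branch).
        self.ptr = {name: lo for name, (lo, _hi) in self.ranges.items()}

    def step(self, gi_type, up):
        """One clock edge: only the selected pointer moves; the other keeps its value."""
        name = "a" if gi_type else "b"
        lo, hi = self.ranges[name]
        span = hi - lo + 1
        offset = (self.ptr[name] - lo + (1 if up else -1)) % span
        self.ptr[name] = lo + offset
        return self.ptr[name]  # address presented to the ROM after this edge


gen = CoefficientAddressGen()
print([gen.step(gi_type=True, up=True) for _ in range(3)])  # [1, 2, 3]
print(gen.step(gi_type=False, up=True))                     # 31
```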

Loop Convergence - Verilog Synthesis

I am trying to repeatedly subtract 10 from a number to get its last digit (without division). For example, when q=54, we get q=4 after the loop. Likewise, for q=205 the output is q=5.
if(q>10)
while(q>10)
begin
q=q-10;
end
The iteration should converge logically. However, I am getting an error:
"[Synth 8-3380] loop condition does not converge after 2000 iterations"
I checked the post - Use of For loop in always block. It says that the number of iterations in a loop must be fixed.
Then I tried to implement this loop with a fixed number of iterations as well, like below (just to check whether this at least synthesizes):
if(q>10)
while(loopco<9)
begin
q=q-10;
loopco=loopco-1;
end
But the above does not work either; I get the same error "[Synth 8-3380] loop condition does not converge after 2000 iterations". Logically, it should be 10 iterations, as I had declared the value of loopco=8.
Any suggestions on how to implement the above functionality in verilog will be helpful.

That code cannot be synthesized. For synthesis, the loop has to have a number of iterations known at compile time, so the tool has to know how many subtractions to make. In this case it can't.
Never forget that for synthesis you are converting a language to hardware. In this case the tool needs to generate the code for N subtractions but the value of N is not known.
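The point can be checked with a quick Python model (hypothetical helper names): the number of subtractions the loop performs depends on the run-time value of q, so there is no single N to build hardware for. The only number fixed in advance is the worst case implied by the bit width, which is exactly the upper limit a width-based workaround can rely on.

```python
def subtraction_count(q):
    """How many times a `while (q >= 10) q = q - 10;` loop subtracts."""
    count = 0
    while q >= 10:
        q -= 10
        count += 1
    return count


def worst_case_count(width):
    """Largest subtraction_count over all width-bit inputs: known before run time."""
    return ((1 << width) - 1) // 10


counts = {subtraction_count(q) for q in range(64)}
print(sorted(counts))       # [0, 1, 2, 3, 4, 5, 6]: depends on the value of q
print(worst_case_count(6))  # 6: fixed by the 6-bit width alone
```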
You are already stating that you are trying to avoid division. That suggests to me you know the generic division operator cannot be synthesized. Trying to work around that using repeated subtraction will not work. You should have been suspicious: if it were that easy, it would have been done by now.
You could build it yourself if you know the upper limit of q (which you do from the number of bits):
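wire [5:0] q;
reg  [3:0] rem;
always @(*)
  if (q < 6'd10)
    rem = q;
  else if (q < 6'd20)
    rem = q - 6'd10;
  else if (q < 6'd30)
    rem = q - 6'd20;
  else if (q < 6'd40)
    rem = q - 6'd30;
  else if (q < 6'd50)
    rem = q - 6'd40;
  else if (q < 6'd60)
    rem = q - 6'd50;
  else
    rem = q - 6'd60;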
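A quick exhaustive check of this comparison chain, written as a Python model (the function name is hypothetical): it mirrors each branch and confirms that rem equals q % 10 for every 6-bit q, and that the result always fits in the 4-bit rem.

```python
def last_digit_compare_chain(q):
    """Python mirror of the if / else-if chain for a 6-bit q."""
    if not 0 <= q < 64:
        raise ValueError("q must fit in 6 bits")
    for bound in (10, 20, 30, 40, 50, 60):
        if q < bound:
            return q - (bound - 10)  # subtract the tens already passed
    return q - 60                    # final else branch: q in 60..63


results = [last_digit_compare_chain(q) for q in range(64)]
print(results == [q % 10 for q in range(64)])  # True
print(max(results) < 16)                        # True: fits in reg [3:0] rem
```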
I just noticed a link next to your question showing that this has been asked before:
How to NOT use while() loops in verilog (for synthesis)?

Stacking and dynamic programming

Basically I'm trying to solve this problem :
Given N unit cube blocks, find the smallest number of piles needed to use all the blocks. A pile is either a cube or a pyramid. For example, two valid piles are the cube 4*4*4 = 64, using 64 blocks, and the pyramid 1²+2²+3²+4² = 30, using 30 blocks.
However, I can't find the right angle to approach it. I feel like it's similar to the knapsack problem, but I couldn't find an implementation.
Any help would be much appreciated !

First I will give a recurrence relation which permits solving the problem recursively. Given N, let CUBE-NUMS and PYRAMID-NUMS be the subsets of cube numbers (k³) and square pyramidal numbers (1²+2²+...+k²) in {1,...,N}, respectively. Let PERMITTED_SIZES be the union of these. Note that, as 1 occurs in PERMITTED_SIZES, any instance is feasible and yields a nonnegative optimum.
The following function in pseudocode solves the problem recursively.
int MinimumNumberOfPiles(int N)
{
if (N == 0) return 0; // base case: no blocks left
int Result = 1 + min { MinimumNumberOfPiles(N-i) }
where i in PERMITTED_SIZES and i <= N;
return Result;
}
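A direct Python translation of this recursion, with memoization added through `functools.lru_cache` so each subproblem is solved only once. The function names are illustrative, and `permitted_sizes` builds the cube and pyramid sizes from the question.

```python
from functools import lru_cache


def permitted_sizes(n):
    """Cube numbers k**3 and square pyramidal numbers 1**2 + ... + k**2 that are <= n."""
    sizes = set()
    k = 1
    while k ** 3 <= n:
        sizes.add(k ** 3)
        k += 1
    k, pyramid = 1, 1
    while pyramid <= n:
        sizes.add(pyramid)
        k += 1
        pyramid += k * k
    return sorted(sizes)


def minimum_number_of_piles(n):
    sizes = permitted_sizes(n)

    @lru_cache(maxsize=None)
    def solve(m):
        if m == 0:
            return 0  # base case: nothing left to pile
        return 1 + min(solve(m - s) for s in sizes if s <= m)

    return solve(n)


print(minimum_number_of_piles(94))  # 2 (64 + 30)
```

Because the recursion can go up to N levels deep (taking one block at a time), Python's default recursion limit makes this form suitable only for small N; the iterative formulation avoids that.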
The idea is to choose a permitted pile size, remove that many blocks (which makes the problem instance smaller), and solve recursively for the smaller instance. To use dynamic programming in order to avoid evaluating the same subproblem multiple times, one would use a one-dimensional state space, namely an array A[0..N] where A[i] is the minimum number of piles needed for i unit blocks. Using this state space, the problem can be solved iteratively as follows.
for (int i = 0; i <= N; i++)
{
if i is 0 set A[i] to 0,
if i occurs in PERMITTED_SIZES, set A[i] to 1,
set A[i] to positive infinity otherwise;
}
This initializes the states which are known beforehand and correspond to the base cases in the above recursion. Next, the missing states are filled using the following loop.
for (int i = 0; i <= N; i++)
{
if (A[i] is positive infinity)
{
A[i] = 1 + min { A[i-j] : j is in PERMITTED_SIZES and j is smaller than i }
}
}
The desired optimal value will be found in A[N]. Note that this algorithm only calculates the minimum number of piles, but not the piles themselves; if a suitable partition is needed, it has to be found either by backtracking or by maintaining additional auxiliary data structures.
Provided that PERMITTED_SIZES is known, the problem can be solved in O(N^2) steps, as PERMITTED_SIZES contains at most N values. This bound is loose: for the cube and pyramid sizes in the question, PERMITTED_SIZES contains only O(N^(1/3)) values, which gives O(N^(4/3)) steps.
The problem can be seen as an adaptation of the Rod Cutting Problem in which every piece must have a permitted size and costs 1 (pieces of any other size are not allowed), and the objective is to minimize the total cost, i.e. the number of pieces.
Generating PERMITTED_SIZES from the input adds some further computation cost.
More precisely, the corresponding choice of piles, once A is filled, can be generated using backtracking as follows.
int i = N; // i is the total amount still to be distributed
while ( i > 0 )
{
choose j such that
j is in PERMITTED_SIZES and j <= i
and
A[i] == 1 + A[i-j] // such a j exists because A[i] was computed this way
Output "Take a set of size" + j; // or just output j, which is the set size
// the part above can be commented as "let's find out how
// the value in A[i] was generated"
set i = i-j; // decrease amount to distribute
}
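The iterative table fill and the backtracking step combine into a short Python sketch (illustrative names; the permitted sizes are passed in, here the cube and pyramid sizes up to 100, and must contain 1). Where several sizes explain A[i], it simply takes the smallest, so the returned piles are one optimal choice, not the only one.

```python
def min_piles_with_choice(n, permitted_sizes):
    """Bottom-up DP over A[0..n], then backtracking to recover one optimal set of piles."""
    sizes = sorted(s for s in set(permitted_sizes) if s <= n)
    A = [0] * (n + 1)  # A[i] = minimum number of piles for i blocks; A[0] = 0
    for i in range(1, n + 1):
        A[i] = 1 + min(A[i - j] for j in sizes if j <= i)
    piles = []
    i = n  # amount still to be distributed
    while i > 0:
        # find out how the value in A[i] was generated
        j = next(j for j in sizes if j <= i and A[i] == 1 + A[i - j])
        piles.append(j)
        i -= j  # decrease amount to distribute
    return A[n], piles


# Cubes (1, 8, 27, 64) and square pyramidal numbers (1, 5, 14, 30, 55, 91) up to 100
SIZES_UP_TO_100 = [1, 5, 8, 14, 27, 30, 55, 64, 91]
print(min_piles_with_choice(94, SIZES_UP_TO_100))  # (2, [30, 64])
```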

How do you move non-zero elements in an array to the top in a single cycle?

I have the following 8-bit array:
0
4
0
0
5
0
2
0
How do I turn it into the following in a single cycle (without iterating over the elements one by one)?
4
5
2
0
0
0
0
0
I know how to do it in software (MATLAB), but I'm not sure how to do it with combinational logic.
% initialise temporary vectors
TempType = zeros(maxType,1);
TempStart = zeros(maxType,1);
TempStop = zeros(maxType,1);
index = 1;
% remove zero elements from the middle
for j = 1:maxType
if (PreType(j) > 0 && PreStart(j) > 0 && PreStop(j) > 0)
TempType(index) = PreType(j);
TempStart(index) = PreStart(j);
TempStop(index) = PreStop(j);
index = index + 1;
end
end

I think any simplified sorting algorithm can do the job. For example, here is a modified bubble sort solution implemented in a single cycle:
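module MoveZeros;
  parameter W1 = 8;
  parameter W2 = 10;
  integer i, j;
  logic [W1-1:0] array[W2-1:0] = {0,4,0,0,5,0,2,0,0,1};
  logic [W1-1:0] temp;
  always_comb begin
    // repeat the compare-and-swap sweep W2 times
    for (i = W2-1; i >= 0; i = i-1)
      // stop at j = 1 so that array[j-1] never indexes below 0
      for (j = W2-1; j >= 1; j = j-1)
      begin
        // swap a zero with the non-zero element just below it,
        // so non-zero values rise towards the high indices
        if (array[j] == 0 && array[j-1] != 0) begin
          temp = array[j];
          array[j] = array[j-1];
          array[j-1] = temp;
        end
      end
  end
endmodule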
output:
# array = '{4, 5, 2, 1, 0, 0, 0, 0, 0, 0}
Working example on edaplayground. Depending on your cycle time and the width of your input array (W2), you may want to break this algorithm into multiple cycles.
Synthesis tools unroll loops; therefore, the synthesized circuit will have O(W2^2) comparators and multiplexers, which can explode in size. Hence, for bigger arrays, a multi-cycle solution is the way to go.
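To see where the quadratic cost comes from, here is a Python model of the same compare-and-swap sweep (hypothetical function name; the list is in display order, array[W2-1] first, like the printed output). It also counts how many neighbour comparisons the fully unrolled hardware would contain.

```python
def move_zeros_bubble(values):
    """Model of the always_comb loop: W2 sweeps of 'swap a zero with its non-zero neighbour'."""
    a = list(values)
    n = len(a)
    comparisons = 0
    for _ in range(n):          # outer loop: W2 sweeps
        for k in range(n - 1):  # inner loop: one compare/swap per neighbouring pair
            comparisons += 1
            if a[k] == 0 and a[k + 1] != 0:
                a[k], a[k + 1] = a[k + 1], a[k]
    return a, comparisons


print(move_zeros_bubble([0, 4, 0, 0, 5, 0, 2, 0, 0, 1]))
# ([4, 5, 2, 1, 0, 0, 0, 0, 0, 0], 90)
```

Non-zero values are only ever swapped with zeros, so their relative order is preserved; the comparison count grows as W2 * (W2 - 1), matching the O(W2^2) estimate.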

This is not an answer, which would take several hours of work, but SO's comments are not up to this sort of question. You should ask on comp.arch.fpga, if it's still alive.
Start by finding a datasheet for one of the old asynchronous fall-through FIFOs; these will include a circuit diagram. You don't really want to do anything like this, because the stage-to-stage handshaking is hairy, and you can't apply all 8 values simultaneously, but it'll give you ideas for a more synchronous implementation. Adapting a fall-through FIFO to do what you want is trivial - just ignore zero inputs.
If you can go up to 8 clock cycles, a more synchronous implementation is easy, with relatively limited hardware.
One cycle doesn't look too difficult, but will use more hardware. How sure are you that you must do it in one cycle? How much hardware can you use? If you've got a free PLL/DLL I'd be inclined to use that to get an 8x clock.
EDIT
Actually, with the benefit of more than 2 minutes' thought, this seems pretty easy, even in one cycle.
Say you've got 8 registers with your 8 inputs (I0-I7), and 8 output registers (Q0-Q7). Each output register has associated logic which selects an input register for source data. The Q0 selector finds the lowest-numbered I register which contains non-zero data. The Q1 selector finds the next highest I register which contains non-zero data, and so on. Each selector drives a mux which loads the corresponding output register. Q0 requires an 8-1 mux (eight 8-bit inputs from I0-I7, one 8-bit output which goes to the input of Q0). Q1 requires a 7-1 mux (the inputs can only be I1-I7), and so on, until Q7, which doesn't require a mux at all (it can only be driven by I7).
The only smarts are in the selectors which find the source data for each output register. The Q7 selector is trivial; Q7 can only select I7, and only if all of I0-I7 contain non-zero data. Q6 is a bit more complicated, and so on.
If you can't see how to code a selector, ask specifically about that one in a new question, to avoid all the comments.
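One possible way to express the selectors, sketched as a Python behavioral model (hypothetical function name; the prefix-count formulation is an assumption of this sketch, not taken from the answer): Qk selects Ii when Ii is non-zero and exactly k of I0..I(i-1) are non-zero. In hardware the prefix counts would be small adders, and each selector's result is a one-hot select for Qk's mux.

```python
def compact_nonzero(inputs):
    """Behavioral model of the per-output selectors: I0..I(n-1) -> Q0..Q(n-1).

    Qk takes Ii when Ii is non-zero and exactly k of I0..I(i-1) are non-zero,
    i.e. Ii is the (k+1)-th non-zero input. Only i >= k can qualify, so Qk
    needs an (n-k)-to-1 mux; Q(n-1) needs none. Unselected outputs are 0.
    """
    n = len(inputs)
    nonzero = [x != 0 for x in inputs]
    # prefix[i] = number of non-zero inputs among I0..I(i-1)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + int(nonzero[i])
    outputs = []
    for k in range(n):
        # One-hot (or all-zero) select lines for Qk's mux, covering inputs Ik..I(n-1)
        selects = [nonzero[i] and prefix[i] == k for i in range(k, n)]
        value = 0
        for offset, sel in enumerate(selects):
            if sel:
                value = inputs[k + offset]
        outputs.append(value)
    return outputs


print(compact_nonzero([0, 4, 0, 0, 5, 0, 2, 0]))  # [4, 5, 2, 0, 0, 0, 0, 0]
```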
