Execution order of initial and always blocks in Verilog - verilog

I'm new to Verilog programming and would like to know how the Verilog program is executed. Does all initial and always block execution begin at time t = 0, or does initial block execution begin at time t = 0 and all always blocks begin after initial block execution? I examined the Verilog program's abstract syntax tree, and all initial and always blocks begin at the same hierarchical level. Thank you very much.

All initial and all always blocks throughout your design create concurrent processes that start at time 0. The ordering is indeterminate as far as the LRM is concerned, but it may be repeatable for debug purposes when executing the same version of the same simulation tool. In other words, never rely on the simulation ordering to make your code execute properly.

Verilog requires event-driven simulation. As such, the order of execution of all always blocks and assign statements depends on the flow of those events. A signal updated in one block will cause execution of all other blocks that depend on that signal.
The difference between always blocks and initial blocks is that the latter are executed unconditionally at time 0 and usually produce some initial events, like generating clocks and/or scheduling reset signals. So, in a sense, initial blocks are executed first, before other blocks react to the events they produce.
But there is no defined execution order across multiple initial blocks, or between initial blocks and the always blocks which they force into execution.
In addition, there are other ways to generate events besides initial blocks.
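A minimal sketch of this: all the processes below start concurrently at time 0. The order of the two time-0 messages is simulator-dependent, while the always block only runs once the initial block produces the edge (the clock name and delay are made up for illustration):

module tb;
  reg clk;
  initial $display("[%0t] initial block A", $time);
  initial $display("[%0t] initial block B", $time);
  initial begin
    clk = 0;
    #5 clk = 1;        // an initial block producing an event
  end
  always @(posedge clk)
    $display("[%0t] always block reacts to the event", $time);
endmodule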

In practice, nobody cares, and you shouldn't either.
On actual hardware, the chip immediately after power-up is very unstable because of the transient states of the power supply circuit, hence its initial state is untrustworthy.
The method to ensure a known initial state in practice is to set it under a reset signal in an always block:
always @(posedge clk or negedge n_reset)
begin
  if (~n_reset)
    initial_state <= 0;
  else
    do_something();
end

Related

Understanding Verilog Code with two Clocks

I am pretty new to Verilog and I use it to verify some code from a simulation program.
Right now I am struggling with a Verilog code snippet, because the simulation program uses 2 clocks (one system clock and a PLL of it) where two hardware components work together and thus synchronize each other:
module something (input data, input sys_clk, input pll_clk);
reg vid;
always @(posedge sys_clk)
  vid <= data;
always @(posedge pll_clk)
  if (vid)
    ; // do something
endmodule
When reading about non-blocking assignments, it says the update of the left-hand side is postponed until other evaluations in the current time step are completed.
Intuitively I thought this means they are updated at the end of the time step, so if data changes from 0 to 1 in sys_clk tick "A", then at the end of "A" and the beginning of the next sys_clk tick this value is in vid, and so only after "A" can the second always block (of pll_clk) read vid = 1.
Is this how it works or did I miss something?
Thank you :)
In this particular case it means that
if posedge sys_clk and posedge pll_clk happen simultaneously, then vid will not have a chance to update before it gets used in the pll_clk block. So, if vid was '0' before the clock edges (and is updated to '1' in the first block), it will still be '0' in the if statement of the second block. This sequence is guaranteed by the use of the non-blocking assignment in the first block.
if the posedges do not happen at the same time, then the value of vid will be updated at posedge sys_clk and picked up later at the following posedge of pll_clk.
In simulation, a non-blocking assignment guarantees that the update itself happens after all the blocks are evaluated in the current simulation cycle. It has nothing to do with the next clock cycle. However, the latter is often used in tutorials to illustrate a particular single-clock situation, creating confusion.
Also, being simultaneous is a simulation abstraction, meaning that both edges happen in the same simulation time step (or within a certain small time interval in hardware).
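A minimal sketch of the simultaneous-edge case (both clocks are driven identically here so their posedges coincide; signal names follow the question, the delays are arbitrary):

module tb;
  reg sys_clk = 0, pll_clk = 0;
  reg data = 1, vid = 0;
  always #5 sys_clk = ~sys_clk;
  always #5 pll_clk = ~pll_clk;      // same period and phase: edges coincide
  always @(posedge sys_clk)
    vid <= data;                     // non-blocking: update deferred to the NBA region
  always @(posedge pll_clk)
    $display("[%0t] pll block sees vid = %b", $time, vid); // reads the pre-edge value
  initial #12 $finish;
endmodule

On the coincident edge, the pll_clk block reads the value vid held before the edge; the updated value is visible only from the following edge onward.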

SystemVerilog calculations right before writing to clocking block

I have a task, whose job it is to drive data onto a bus via a clocking block. See snippet:
task effects_driver::ReadQueueData();
stream_intf_.cycles(); // clocking block event
if (sample_q_.size() >= 2) // check to see if there is enough data in the queue
begin
automatic bit [31:0] sample0, sample1;
sample0 = sample_q_.pop_front(); // read from queue
sample1 = sample_q_.pop_front(); // read from queue
stream_intf_.cb.AXIS_TDATA <= {sample1, sample0}; // drive two samples at once
stream_intf_.cb.AXIS_TVALID <= 1;
end
else
...
endtask
You'll notice that I need to read a couple of items out of a queue before writing it to the clocking block. Is this the correct way to do it? Am I guaranteed that the simulator will perform these blocking assignments to the automatic variable before writing it to the clocking block?
P.S. I run into this scenario semi-frequently--where I need to do some quick calculations on the fly right before writing to the clocking block.
I believe you meant to ask "Am I guaranteed that the simulator will perform these blocking assignments to the automatic variable before writing it to the clocking block?" because that is what your code is doing.
The answer to that is yes, blocking assignments are guaranteed to complete before executing the statement following it.
Also note that there is no need to declare sample0 and sample1 with automatic lifetimes because class methods always have automatic lifetimes. Variables declared within them are implicitly automatic.
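A stripped-down sketch of that ordering guarantee, with a plain variable standing in for the clocking block signal (the queue contents and names here are made up):

module tb;
  bit [31:0] sample_q[$] = '{32'h1111, 32'h2222};
  bit [63:0] bus;
  initial begin
    automatic bit [31:0] sample0, sample1;
    sample0 = sample_q.pop_front();  // blocking: completes before the next statement
    sample1 = sample_q.pop_front();  // blocking: completes before the next statement
    bus <= {sample1, sample0};       // the non-blocking write sees the popped values
  end
endmodule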

Behaviour of Blocking Assignments inside Tasks called from within always_ff blocks

Have looked for an answer to this question online everywhere but I haven't managed to find an answer yet.
I've got a SystemVerilog project at the moment where I've implemented a circular buffer in a module separate from the main module. The queue module itself has a synchronous portion that acquires data from a set of signals, but it also has a combinational section that responds to an input. Now, when I want to query the state of this queue in my main module, a task inside an always_ff block sets the input using a blocking assignment, and the next statement reads the output and acts on it.
An example would look something like this in almost SystemVerilog:
module foo(clk, ...);
queue q1(clk, ...);
always_ff @(posedge clk)
begin
check_queue(...);
end
task check_queue();
begin
query_in = 3;
if (query_out == 5)
begin
<<THINGS HAPPEN>>
end
end
endtask
endmodule
module queue(clk, query_in, query_out);
always_comb
begin
query_out = query_in + 2;
end
endmodule
My question essentially comes down to: does this idea work? In my head, because the queue is combinational it should respond as soon as the input stimulus is applied, so it should be fine; but because it's within a task within an always_ff block, I'm a bit concerned about the use of blocking assignments.
Can someone help? If you need more information then let me know and I can give some clarifications.
This creates a race condition and most likely will not work. It has nothing to do with your use of a task. You are trying to read the value of a signal (query_out) that is being assigned in another concurrent process. Whether or not it gets updated by the time you reach the if statement is a race. Use a non-blocking assignment for all variables that go outside of the always_ff block, and it guarantees you get the previous value.
To figure this out, you can just mentally inline the task inside the always_ff block. By the way, it really looks like it should be a function in your case. Now, remember that execution of any always block must finish before any other is executed. So, the following will never evaluate to '5' at the same clock edge:
query_in = 3;
if (query_out == 5)
query_out will become 5 after this block (your task) is evaluated, and will be ready at the next clock edge only. So, you should expect a one-cycle delay.
You need to split it into several always blocks.
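One way the split could look, reusing the hypothetical names from the example above (and accepting the one-cycle delay described):

always_ff @(posedge clk)
  query_in <= 3;                  // drive the queue input from its own process

always_ff @(posedge clk)
  if (query_out == 5)             // query_out is sampled a cycle after query_in settles
  begin
    // <<THINGS HAPPEN>>
  end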

Execution in verilog sequentially or concurrently

I am new to Verilog and am finding its execution tricky. How does execution occur in a Verilog program? Say I have 2 modules and a testbench:
module module1(clock,A,B,C);
input A,B,clock;
output C;
assign C=A+B;
endmodule
module module2(clock,A,B,C);
input A,B,clock;
output C;
assign C=A-B;
endmodule
module testbench;
reg A,B,clock;
wire C;
module1 m1(clock,A,B,C);
module2 m2(clock,A,B,C);
initial
clock=1'b0;
always
#4 clock=~clock;
endmodule
I understand all initial blocks start at time 0. But are these initial blocks then executed sequentially, i.e. if an initial block has more than one line, will all of them be executed sequentially or concurrently? Also, how does module execution take place? Will module1 start first, as it appears before module2 in the testbench, finish completely, and then module2 start, or do both run concurrently? What happens when the clock changes after 4 time units: will a running module stop mid-way when the clock changes, or will it complete its previous execution and then start again with the new clock?
In Verilog, instantiation of a module means adding physical hardware to your board.
Modules are nothing but small hardware blocks that work concurrently. Every module can have some procedural blocks, continuous assignment statements or both.
Every procedural block executes concurrently, similar applies to continuous assignment statements.
I refer as this:
Procedural blocks: initial, always etc. blocks.
Continuous assignment: assign, force etc.
So, no matter in what sequence you instantiate modules, all are going to work in parallel.
Here comes the concept of the time step. Each time step contains active, inactive and NBA regions.
For each time step, all the instances are checked in every region. If any execution is to be done in, say, module1, it is done; in parallel, the other module, say module2, is also checked. If there is some dependency between the modules, then they are executed again.
Here, in your example, C is a single wire driven as the output of both modules. This creates a multiple-driver conflict between the modules, which is of course not good.
Think from a hardware perspective: two or more different hardware blocks can have the same inputs but cannot have the same outputs. So, the output wires must be different.
module testbench;
reg A,B,clock;
wire C1,C2; // different wires
module1 m1(clock,A,B,C1);
module2 m2(clock,A,B,C2);
initial clock=1'b0;
always #4 clock=~clock;
endmodule
Also, here the modules contain only continuous assignments, so the clock has no effect. The modules are also running in between the clock edges; it's just that there are no events scheduled at those time steps.
As we know now, all procedural blocks are executed in parallel, but the contents inside a procedural block are executed sequentially. To make the contents concurrent, the fork..join construct is used. For example:
initial
begin
a<=0;
#5;
b<=1; // b is assigned at 5ns
end
initial
fork
a<=0;
#5;
b<=1; // b is assigned at 0ns
join
Refer to Verilog Procedural Blocks and Concurrent and Sequential Statements pages for further information.
Another way to think about this from a simulation point of view
All of the initial, always, and continuous assign statements in your design execute concurrently starting at time 0. It doesn't matter whether they are in different modules or not - they are all equally concurrent. The elaboration step flattens out all of your module instances. All that is left are hierarchical names for things that were inside those modules.
Now, unless you are running the simulation on massively parallel CPUs (essentially what running on the real synthesized hardware does), there is no way to actually run all of these processes concurrently. A software simulator has to choose one process to go first. You just can't rely on which one it chooses.
That is what the Verilog scheduling algorithm does. It puts everything scheduled to run at time 0 into an event queue (the active queue), and starts executing each process one at a time. It executes each process until it finishes, or until it has to block waiting for some delay or for a signal to change. If the process has to block, it gets suspended and put onto another queue. Then the next process in the current queue starts executing, and these steps keep repeating until the current queue is empty.
Then the scheduling algorithm picks another queue to become the active queue, and advances time if that queue is scheduled with some delay.

Which region are continuous assignments and primitive instantiations with #0 scheduled

All the #0-related code examples I have found involve procedural code (i.e. code inside begin-end). What about continuous assignments and primitive instantiations? IEEE 1364 & IEEE 1800 (Verilog & SystemVerilog respectively) only give a one-line description that I can find (quoting all versions of IEEE 1364 under the section named "The stratified event queue"):
An explicit zero delay (#0) requires that the process be suspended and added as an inactive event for the current time so that the process is resumed in the next simulation cycle in the current time.
I read documents and talked with a few engineers who had been working with Verilog long before IEEE Std 1364-1995. In summary, the inactive region was a failed solution for synchronizing flip-flops given Verilog's indeterminate processing order. Later Verilog introduced non-blocking assignments (<=), which resolved the synchronization problem despite the indeterminate order. The inactive region was left in the scheduler so as not to break legacy code and a few obscure corner cases. Modern guidelines say to avoid using #0 because it creates race conditions and may hinder simulation performance. The performance impact is a don't-care for small designs, but I run huge designs with mixed RTL down to transistor-level modules. So even small performance gains add up, and not having to debug a rogue race condition is a time saver.
I've run test cases removing/adding #0 on Verilog primitives in large-scale designs. Some simulators show notable changes, others do not. It is difficult to tell who is doing a better job of following the LRM, or who has a smarter optimizer.
Adding a pre-compile script to remove hard-coded forms of #0 is easy enough. The challenge is with parameterized delays. Do I really need to create generate blocks to avoid the inactive region? It feels like it could introduce more problems than it solves:
generate
if ( RISE > 0 || FALL > 0)
tranif1 #(RISE,FALL) ipassgate ( D, S, G );
else
tranif1 ipassgate ( D, S, G );
if ( RISE > 0 || FALL > 0 || DECAY > 0)
cmos #(RISE,FALL,DECAY) i1 ( out, in, NG, PG );
else
cmos i1 ( out, in, NG, PG );
if (DELAY > 0)
assign #(DELAY) io = drive ? data : 'z;
else
assign io = drive ? data : 'z;
endgenerate
Verilog primitives and continuous assignments have been with Verilog since the beginning. I believe parameterized delays have been around longer than the inactive region. I haven't found any documentation with a recommendation or explanation for these conditions. My local network of Verilog/SystemVerilog gurus are all unsure which region it should run in. Is there a detail we are all overlooking, or is it a gray area in the language? If it is a gray area, how do I determine which way it is implemented?
An accepted answer should include a citation to any version of IEEE 1364 or IEEE 1800, or at least a way to do proof-of-concept testing.
This is an easy one. Section 28.16 Gate and net delays of the 1800-2012 LRM, as well as section 7.14 Gate and net delays of the 1364-2005 LRM, both say:
For both gates and nets, the default delay shall be zero when no delay
specification is given.
So that means
gateName instanceName (pins);
is equivalent to writing
gateName #0 instanceName (pins);
I'm not sure where the text you quoted came from, but section 4.4.2.3 Inactive events region of the 1800-2012 LRM says
If events are being executed in the active region set, an explicit #0
delay control requires the process to be suspended and an event to be
scheduled into the Inactive region of the current time slot so that
the process can be resumed in the next Inactive to Active iteration.
The key text is delay control, which is a procedural construct. So #0 as an inactive event only applies to procedural statements.
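A small sketch contrasting the two uses of #0 (signal names are arbitrary). The #0 on the continuous assignment is a zero delay specification, which is the default anyway, while the procedural #0 is a delay control that suspends the process into the Inactive region:

module t;
  reg a;
  wire w;
  assign #0 w = a;    // delay specification: equivalent to plain 'assign w = a;'
  initial begin
    a = 0;
    #0;               // delay control: process suspends, resumes in the Inactive region
    a = 1;
  end
endmodule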
The problem with procedural #0's is that they move race conditions; they don't eliminate them. Sometimes you have to add multiple serial #0's to move away from a race condition, but you don't always know how many, because another piece of code may also be adding #0's. Just look at the UVM code; it's littered with messy #0's because they did not take the time to code things properly.
