All #0-related code examples I have found concern procedural code (i.e., code inside begin-end). What about continuous assignments and primitive instantiations? IEEE 1364 and IEEE 1800 (Verilog and SystemVerilog respectively) give only a one-line description that I can find (quoting all versions of IEEE 1364 under the section named "The stratified event queue"):
An explicit zero delay (#0) requires that the process be suspended and added as an inactive event for the current time so that the process is resumed in the next simulation cycle in the current time.
I read documents and talked with a few engineers who have been working with Verilog since long before IEEE Std 1364-1995. In summary, the inactive region was a failed solution for synchronizing flip-flops given Verilog's indeterminate processing order. Verilog later added non-blocking assignments (<=), which resolved the synchronization problem with indeterminate ordering. The inactive region was left in the scheduler so as not to break legacy code and a few obscure corner cases. Modern guidelines say to avoid using #0 because it creates race conditions and may hinder simulation performance. The performance impact is a don't-care for small designs, but I run huge designs with mixed RTL and transistor-level modules, so even small performance gains add up, and not having to debug a rogue race condition is a time saver.
I've run test cases removing/adding #0 on Verilog primitives in large-scale designs. Some simulators show notable changes; others do not. It is difficult to tell who is doing a better job of following the LRM or who has the smarter optimizer.
Adding a per-compile script to remove hard-coded forms of #0 is easy enough. The challenge is with parameterized delay. Do I really need to create generate blocks to avoid the inactive region? It feels like that could introduce more problems than it solves:
generate
  if (RISE > 0 || FALL > 0)
    tranif1 #(RISE,FALL) ipassgate ( D, S, G );
  else
    tranif1 ipassgate ( D, S, G );
  if (RISE > 0 || FALL > 0 || DECAY > 0)
    cmos #(RISE,FALL,DECAY) i1 ( out, in, NG, PG );
  else
    cmos i1 ( out, in, NG, PG );
  if (DELAY > 0)
    assign #(DELAY) io = drive ? data : 'z;
  else
    assign io = drive ? data : 'z;
endgenerate
Verilog primitives and continuous assignments have been part of Verilog since the beginning. I believe parameterized delay has been around longer than the inactive region. I haven't found any documentation recommending or explaining how these cases should be handled. My local network of Verilog/SystemVerilog gurus are all unsure which region they should run in. Is there a detail we are all overlooking, or is it a gray area in the language? If it is a gray area, how do I determine which way it is implemented?
An accepted answer should include a citation to any version of IEEE 1364 or IEEE 1800, or at least a way to do proof-of-concept testing.
This is an easy one. Section 28.16 Gate and net delays of the 1800-2012 LRM as well as section 7.14 Gate and net delays of the 1364-2005 LRM both say
For both gates and nets, the default delay shall be zero when no delay
specification is given.
So that means
gateName instanceName (pins);
is equivalent to writing
gateName #0 instanceName (pins);
I'm not sure where the text you quoted came from, but section 4.4.2.3 Inactive events region of the 1800-2012 LRM says
If events are being executed in the active region set, an explicit #0
delay control requires the process to be suspended and an event to be
scheduled into the Inactive region of the current time slot so that
the process can be resumed in the next Inactive to Active iteration.
The key text is delay control, which is a procedural construct. So #0 as an inactive event only applies to procedural statements.
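One way to do the proof-of-concept testing the question asks for (a sketch, not tied to any particular simulator; what the first $display prints depends on LRM-legal event ordering) is to put a procedural #0 next to zero-delay continuous assignments and compare when their updates become visible:

```verilog
// Sketch: a procedural #0 suspends into the Inactive region, while the
// #0 on a continuous assignment is just an explicit zero net delay.
module poc;
  reg  r = 0;
  wire w0, wz;
  assign #0 w0 = r;   // explicit zero net delay
  assign     wz = r;  // default zero net delay -- should behave the same
  initial begin
    r = 1;
    $display("active region: w0=%b wz=%b", w0, wz);
    #0; // suspend; resume in the next Inactive-to-Active iteration
    $display("after #0:      w0=%b wz=%b", w0, wz);
  end
endmodule
```

If the two nets ever behave differently from each other, the simulator is treating the explicit #0 net delay as something other than the default delay, which would contradict the cited sections.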
The problem with procedural #0's is that they move race conditions; they don't eliminate them. Sometimes you have to add multiple serial #0's to move away from a race condition, but you don't always know how many, because another piece of code is also adding #0's. Just look at the UVM code; it's littered with messy #0's because they did not take the time to code things properly.
I'm new to Verilog programming and would like to know how a Verilog program is executed. Do all initial and always blocks begin execution at time t = 0, or do initial blocks begin at time t = 0 and all always blocks begin after the initial blocks execute? I examined the Verilog program's abstract syntax tree, and all initial and always blocks appear at the same hierarchical level. Thank you very much.
All initial and always blocks throughout your design create concurrent processes that start at time 0. The ordering is indeterminate as far as the LRM is concerned, but it may be repeatable for debug purposes when executing the same version of the same simulation tool. In other words, never rely on the simulation ordering to make your code execute properly.
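A minimal sketch of that indeterminacy (either print order is legal per the LRM):

```verilog
// Two concurrent processes starting at time 0; the LRM does not define
// which $display runs first, though a given simulator is usually repeatable.
module order_demo;
  initial $display("process A");
  initial $display("process B");
endmodule
```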
Verilog requires event-driven simulation. As such, the order of execution of all always blocks and assign statements depends on the flow of those events. A signal updated in one block will cause execution of all other blocks which depend on that signal.
The difference between always blocks and initial blocks is that the latter are executed unconditionally at time 0 and usually produce some initial events, like generation of clocks and/or scheduling of reset signals. So, in a sense, initial blocks are executed first, before other blocks react to the events they produce.
But there is no defined execution order across multiple initial blocks, or across initial blocks and always blocks which were triggered into execution by other initial blocks.
In addition, there are other ways to generate events besides initial blocks.
In practice, nobody cares, and you shouldn't either.
On actual hardware, the chip immediately after power-up is very unstable because of transient states in the power supply circuit, hence its initial state is untrustworthy.
The method to ensure a known initial state in practice is to reset it explicitly, as in
always @(posedge clk or negedge n_reset) begin
  if (~n_reset)
    initial_state <= 0;
  else
    do_something();
end
I am pretty new to Verilog and I use it to verify some code from a simulation program.
Right now I am struggling with a Verilog code snippet, because the simulation program uses 2 clocks (one system clock and a PLL-derived clock) where two hardware components work together, thus synchronizing each other:
module something (input data);
  reg vid;
  always @(posedge sys_clk)
    vid <= data;
  always @(posedge pll_clk)
    if (vid)
      // do something
When reading about non-blocking assignments, it says the update of the left-hand side is postponed until other evaluations in the current time step are completed.
Intuitively I thought this means they are updated at the end of the time step; thus if data changes from 0 to 1 in sys_clk tick "A", at the end of "A" and the beginning of the next sys_clk tick this value is in vid, and so only after "A" can the second always block (of pll_clk) read vid = 1.
Is this how it works, or did I miss something?
Thank you :)
In this particular case it means that
if posedge sys_clk and posedge pll_clk happen simultaneously, then vid will not have a chance to update before it gets used in the pll_clk block. So, if vid was '0' before the clock edges (and is updated to '1' in the first block), it will still be '0' in the if statement of the second block. This sequencing is guaranteed by the use of the non-blocking assignment in the first block
if the posedges are not happening at the same time, then the value of vid will be updated at posedge sys_clk and picked up later at the following posedge of pll_clk.
In simulation, a non-blocking assignment guarantees that the actual update happens after all the blocks have been evaluated in the current time tick. It has nothing to do with the next clock cycle. However, the latter is often used in tutorials to illustrate a particular single-clock situation, creating confusion.
Also being simultaneous is a simulation abstraction, meaning that both edges happen in the same clock tick (or within a certain small time interval in hardware).
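Both cases can be seen in a small sketch (clock and signal names taken from the question; the timescale and stimulus values are assumptions):

```verilog
`timescale 1ns/1ns
module two_clk_tb;
  reg sys_clk = 0, pll_clk = 0, data = 0, vid = 0;

  always @(posedge sys_clk) vid <= data;                          // writer
  always @(posedge pll_clk) $display("t=%0t vid=%b", $time, vid); // reader

  initial begin
    data = 1;
    #5 sys_clk = 1; pll_clk = 1;  // simultaneous edges: reader samples the OLD vid (0)
    #5 sys_clk = 0; pll_clk = 0;
    #5 pll_clk = 1;               // later pll edge: reader sees the updated vid (1)
    #1 $finish;
  end
endmodule
```

The first display showing the old value is guaranteed by the non-blocking assignment in the writer block, regardless of which always block the simulator happens to run first.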
I am facing some doubts regarding the nondeterminism in Verilog Scheduling Semantics mentioned in the Verilog LRM. Below is the excerpt which I am unable to understand:
"Another source of nondeterminism is that statements without time-control constructs in behavioral blocks do not have to be
executed as one event. Time control statements are the # expression
and @ expression constructs (see 9.7). At any time while evaluating a
behavioral statement, the simulator may suspend execution and place
the partially completed event as a pending active event on the event
queue. The effect of this is to allow the interleaving of process
execution. Note that the order of interleaved execution is
non-deterministic and not under control of the user."
The only inference I could make was that statements in a behavioral block may be paused to allow execution of other behavioral blocks (which are active in the same timestep), so as to interleave process execution, though I am not sure.
Also, I don't understand the line "statements without time-control constructs in behavioral blocks do not have to be executed as one event". What does the LRM mean by saying it doesn't execute as one event, and what would happen if a behavioral block contained only time-controlled statements?
Can anyone please explain this with the help of some examples? Thanks in advance.
The only thing which simulation guarantees is that all statements in an always block will be executed sequentially. Say, as in the following block:
always @(b, c, e, f) begin
  a = b | c;
  d = e & f;
  g = a ^ d ^ x;
  ...
end
However, the simulator can decide to execute the first two statements in a row, but then stop execution of this block before the last statement and let other blocks continue. Then it will return to the last statement. In this sense you have a non-deterministic order of execution of the statements.
Guess what?! The value of x can definitely change while the block is suspended. a and d can potentially also change while other statements are executed. So, the result of g could be non-deterministic. Good programming will help to remove this type of non-determinism: list all events in the sensitivity list, do not multiply-drive signals, and so on. The simulator will then do its best to avoid those situations.
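That first guideline applied to the block above looks like this (a sketch; x is read in the block but was missing from the sensitivity list):

```verilog
// Complete sensitivity list: every signal that is read (including x)
// triggers re-evaluation, so g cannot be left stale.
always @(b, c, e, f, x) begin
  a = b | c;
  d = e & f;
  g = a ^ d ^ x;
end
// Equivalent and less error-prone: "always @*" infers the full list.
```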
The need for this interleave simulation is to allow simulators to do better optimization for performance reasons and allow other blocks to progress in case of the very long and loopy statements.
Answering the comment about time and event controls
In the above block a simulator can decide when to break the execution. From the point of view of the programmer, it is non-deterministic. You do not know when it can happen. The only thing that is known is that it will happen within the same simulation (time) tick. A good simulator will try its best to avoid any side effects of this.
Timing and delay controls provide deterministic stops. For example,
always @(b, c, e, f) begin
  a = b | c;
  d = e & f;
  #1 g = a ^ d ^ x;
  ...
end
In the above statement, the #1 actually tells the simulator to stop execution of the statements and wait until the next time tick.
always @(b, c, e, f) begin
  a = b | c;
  d = e & f;
  @(posedge clk)
  g = a ^ d ^ x;
  ...
end
Here it will stop execution and wait for the posedge clk event.
Note that the above examples are not synthesizable and should be used in behavioral code (test benches) only. For synthesis you have to stick with the very first example and make sure your code is written in accordance with good Verilog coding practices.
I am self-teaching myself VHDL, and took this answer's advice by starting with Frank Vahid's Digital Design (2nd edition).
On page 254 (link) he explains that updates to storage items occur on the next rising clock edge, therefore, at the end of a state's clock cycle. This is explained in figure 5.14 using the "Jreg" storage item, the behavior of which I understand.
What I don't understand, is why the storage item "P" behaves differently:
1) Its value is already known upon entering state S0 (whereas "Jreg" is not)
2) Upon transitioning to state S1, "P" immediately updates to the value given by state S1, whereas "Jreg" is not updated until the end of that clock cycle
Is there a difference between "Jreg" and "P" I'm not aware of? Is it an error in the example?
It appears that P is a combinational signal (not dependent on a clock) and that Jreg is a sequential register (dependent on a clock). Jreg appears to behave as a counter (which requires a clock or drive signal of some sort). The example says that the machine waits for an input signal B to go high, and once it does, it sets output P high. Then, using Jreg to count the number of clock cycles passed since B went high, it holds P high until Jreg counts to a certain number of clock cycles (2), at which point both Jreg and P are reset to 0.
1) Jreg is unknown at the start and most likely so is P; however, P does not have to wait for a clock tick because it's a combinational signal.
2) Again, Jreg is sequential, so it must wait for a clock tick to change its state.
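The distinction can be sketched in Verilog (hypothetical names; Vahid's example is a full FSM, this shows only the registered-vs-combinational pattern):

```verilog
// Jreg-like signals are registered: they change only at the clock edge.
// P-like signals are combinational: they follow the current state at once.
module fsm_sketch(input clk, input [1:0] state,
                  output reg [1:0] Jreg, output reg P);
  localparam S_HIGH = 2'd1;  // hypothetical state encoding
  always @(posedge clk)      // sequential: waits for the edge
    Jreg <= Jreg + 1;
  always @*                  // combinational: no clock wait
    P = (state == S_HIGH);
endmodule
```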
Everywhere this is mentioned as a guideline, but after a lot of thought I want to know what harm it will cause if we use non-blocking statements inside an always block for combinational logic as well. I won't be mixing the two together. But what I feel is that when we use non-blocking for combinational statements in an always block, it represents the hardware more accurately. Does it not...?
For Example, if we take the following circuit:
In this diagram, when the inputs a, b, c are supplied, the outputs x1 and x will not be available instantly. There will be gate delays: first x1 will be available and then x will be available. If we use blocking statements, both are available instantly. If we use non-blocking, it resembles the hardware more accurately.
For Example, if we take the following code based on the above diagram
module block_nonblock(output logic x, x1, y, y1, input logic a, b, c);

always @* begin : BLOCKING
  x1 = a & b;
  x = x1 & c;
end

always @* begin : NONBLOCKING
  y1 <= a & b;
  y <= y1 & c;
end

endmodule
This synthesizes as:
Both are synthesized as AND gates and give the same simulation results, but when we check for changes in the outputs in delta time, I feel the non-blocking version matches the hardware more accurately than the blocking one.
Also, I went through IEEE P1364.1 / D1.6 Draft Standard for Verilog® Register Transfer Level Synthesis, which specifies the use of non-blocking for sequential modeling but doesn't specifically mandate blocking for combinational modeling using always blocks. It says don't mix the two (blocking and non-blocking) in combinational statements.
So, shouldn't we use non-blocking for combinational statements in always blocks which deal with pure combinational logic (non-sequential, no clocks involved)?
The harm is in simulation; in performance, and in race conditions.
Your NONBLOCKING block executes twice for every change in a or b. Non-blocking assignment updates are scheduled into a later queue of events, and this causes a much bigger rippling effect where blocks get repeatedly executed.
When you simulate RTL code, you are doing so in the absence of physical delays. Synthesis tools understand how the logic is going to be implemented, but simulation tools cannot do this and also need to work with non-synthesizable code. They have to execute the code exactly as written. They also have to deal with massive amounts of concurrency while executing on a processor with a single or limited number of threads. So simulation introduces race conditions in software that would not exist in real hardware. Non-blocking assignments are there to prevent those race conditions when writing sequential logic, but they can have the opposite effect if you use them in combinational logic, especially the combinational logic involved in the generation of clocks.
You ask "So, shouldn't we use nonblocking for combinational statemnts in Sequential?" The answer is No.
If you use non-blocking assignments for combinational logic in clocked always blocks, you will get more flip-flops than you expect. Basically, non-blocking assignments in clocked always blocks will behave like flip-flops when you simulate and infer flip-flops when you synthesise.
So,
1 - use blocking assignments for gates and
2 - use non-blocking assignments for flip-flops.
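The extra-flip-flop effect from the paragraph above can be sketched like this (hypothetical module and signal names; n is meant as a combinational intermediate):

```verilog
// With <= on the intermediate in a clocked block, q samples the
// PREVIOUS value of n, so synthesis infers two flip-flop stages.
module extra_flop(input clk, a, b, c, output reg q);
  reg n;
  always @(posedge clk) begin
    n <= a & b;   // n becomes its own register
    q <= n | c;   // q sees n's pre-edge value
  end
endmodule

// Intended single-flop version: blocking for the intermediate.
module one_flop(input clk, a, b, c, output reg q);
  reg n;
  always @(posedge clk) begin
    n = a & b;    // combinational intermediate within the block
    q <= n | c;   // one flip-flop, fed by (a & b) | c
  end
endmodule
```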
Your very own description of the behaviour of the circuit actually suggests the use of blocking assignments.
You say (emphasis mine):
In this diagram when the inputs a,b,c are supplied the outputs x1 and
x will not be available instantly. There will be gate delays. first x1
will be available and then x will be available. If we use blocking
statements both are available instantly. If we use nonblocking, it
resembles the hardware more accurately.
So you need x1 to be available before x. So your always block must use a blocking assignment, so...
always @* begin : BLOCKING
  x1 = a & b;
  x = x1 & c;
end
x1 first will have a known value, and then, and only then, x will have a value. If you use non-blocking...
always @* begin : NONBLOCKING
  x1 <= a & b;
  x <= x1 & c;
end
You are telling the simulator that x1 and x are to be updated at the same time.
Although this may work for synthesis, in simulation it won't, and you want to make sure your simulated circuit works as intended before going on to the synthesis phase.
We should not use non-blocking assignments in combinational blocks, because if we do, they infer transport delays in the design and our results will never come out as expected. If we use blocking assignments, those transport delays are suppressed, and we can safely say there are no glitches in the waveform. So it is recommended to use blocking statements in combinational designs.
I have found a satisfactory answer and would like input on it. I feel we should use non-blocking statements for both combinational and sequential statements.
For sequential logic it is pretty clear why we should use them.
I will describe the reason for combinational blocks.
For combinational segments we will use non-blocking statements because, even though blocking and non-blocking statements give us the same hardware or RTL in the end, it is the non-blocking statements which show us the glitches in simulation. These glitches will be there in hardware as well (because of gate delays), so we can rectify them when we see them in simulation so that they cause less harm at a later stage of the design/development cycle.
In the circuit I originally mentioned in the question, if we give the inputs together as (A = 1, B = 1, C = 0) and then change them together after, say, 10 ns to (A = 1, B = 0, C = 1), then we can see that there is a glitch. This glitch will be there in actual hardware as well, but it is only shown in simulation by the non-blocking statements' output (Y) and not by the blocking statements' output (X). Once we see a glitch we can take additional measures to prevent it, so that it doesn't happen in hardware.
Hence I feel it is safe to conclude that we must use non-blocking statements for combinational blocks.
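The scenario described above can be tried directly (a sketch reusing the module and stimulus from the question; whether a glitch is actually visible on y depends on how your simulator logs delta cycles):

```verilog
`timescale 1ns/1ns
module tb_block_nonblock;
  logic a, b, c;
  logic x, x1, y, y1;
  // dut is the block_nonblock module from the question
  block_nonblock dut (.x(x), .x1(x1), .y(y), .y1(y1), .a(a), .b(b), .c(c));
  initial begin
    $monitor("t=%0t x1=%b x=%b | y1=%b y=%b", $time, x1, x, y1, y);
    a = 1; b = 1; c = 0;
    #10 b = 0; c = 1;   // the simultaneous input change from the question
    #10 $finish;
  end
endmodule
```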