SystemVerilog: fork-join and writing parallel testbenches

I am following the testbench example at this link:
http://www.verificationguide.com/p/systemverilog-testbench-example-00.html
I have two questions regarding fork-join statements. The test environment has the following tasks for initiating the test:
task test();
  fork
    gen.main();
    driv.main();
  join_any
endtask

task post_test();
  wait(gen.ended.triggered);
  wait(gen.repeat_count == driv.no_transactions);
endtask

task run;
  pre_test();
  test();
  post_test();
  $finish;
endtask
My first question is: why do we wait for the generator event to be triggered in the post_test() task? Why not use a regular fork-join instead, which, as far as I understand, waits for both threads to finish before continuing?
I read another Stack Overflow question (System Verilog fork join - Not actually parallel?) that said these threads are not actually executed in parallel in the CPU sense, but only in the simulation sense.
My second question is: what is the point of fork-join if the threads are not actually executed in parallel? There would be no performance benefit, so why not follow a sequential algorithm like:
while true:
    Create new input
    Feed input to module
    Check output
To me this seems much simpler than the testbench example.
Thanks for your help!

Without the code for gen and driv, it is difficult to say. Most likely, though, driv and gen communicate with each other in some manner, i.e. gen produces data which driv consumes and uses to drive something else.
If gen and driv were written in a "generate input / consume input" fashion, then your loop would make sense. However, most likely they generate and consume data based on events and cannot easily be split into such functions. Something like the following is usually much cleaner:
gen:
forever begin
  @(some_event);
  generate_data();
  prepare_for_next_event();
end

driv:
forever begin
  wait(gen_ready);
  drive_data();
end
So, for the above reason, you cannot run them sequentially; they must run in parallel. For all programming purposes they are running in parallel. In more detail, they run in the same single thread, but the simulator schedules their execution based on events generated during simulation. So you need fork.
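To see what "the same single thread, scheduled by events" means without a simulator, here is a rough JavaScript analogy (not SystemVerilog). The `Mailbox` class, the bodies of `gen`/`driv` and the item count are made up for illustration. The mailbox has a single slot, so the generator cannot run more than one item ahead; the point is only that two cooperative tasks on one thread interleave whenever one of them suspends, much like two forked processes in a simulator.

```javascript
// One-slot mailbox: put() waits while the slot is full, get() waits while it is empty.
class Mailbox {
  constructor() {
    this.full = false;
    this.item = undefined;
    this.getters = []; // resolvers of suspended get() calls
    this.putters = []; // resolvers of suspended put() calls
  }
  async put(x) {
    while (this.full) await new Promise((resolve) => this.putters.push(resolve));
    this.item = x;
    this.full = true;
    const wake = this.getters.shift();
    if (wake) wake();
  }
  async get() {
    while (!this.full) await new Promise((resolve) => this.getters.push(resolve));
    const x = this.item;
    this.full = false;
    this.item = undefined;
    const wake = this.putters.shift();
    if (wake) wake();
    return x;
  }
}

// gen: create a transaction, hand it over, repeat
async function gen(mbx, count, log) {
  for (let i = 0; i < count; i++) {
    log.push(`gen ${i}`);
    await mbx.put(i); // suspends while the driver has not taken the previous item
  }
}

// driv: wait until gen has something ready, then "drive" it
async function driv(mbx, count, log) {
  for (let i = 0; i < count; i++) {
    const item = await mbx.get(); // suspends until an item is available
    log.push(`drive ${item}`);
  }
}

// "fork gen; driv; join": start both on the same thread and wait for both
function runBoth(count) {
  const mbx = new Mailbox();
  const log = [];
  return Promise.all([gen(mbx, count, log), driv(mbx, count, log)]).then(() => log);
}

runBoth(3).then((log) => console.log(log));
```

Running `gen` on its own would stall at its second `put`, and `driv` on its own would stall at its first `get`, so in this model the two only make progress when they are started together.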
As for the join_any, I think the test in your case is supposed to finish when either of the threads is done. However, the driver also has to finish all outstanding jobs before it can exit. That is why there are wait statements in the post_test task.
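The join_any/post_test pattern can be mimicked the same way. In this JavaScript sketch, `Promise.race` plays the role of join_any and the loop after it plays the role of post_test(). Assumptions: generation is instantaneous, each transaction takes 2 ms to "drive", and polling with a `sleep()` helper stands in for SystemVerilog's `wait`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function run(repeatCount) {
  const queue = [];
  let genEnded = false;   // stands in for gen.ended being triggered
  let noTransactions = 0; // stands in for driv.no_transactions
  let finished = false;   // stands in for $finish stopping the driver

  async function genMain() {
    for (let i = 0; i < repeatCount; i++) queue.push(i); // generation takes no time here
    genEnded = true;
  }

  async function drivMain() {
    while (!finished) { // a driver normally loops until the simulation ends
      if (queue.length === 0) {
        await sleep(1);
        continue;
      }
      queue.shift();
      await sleep(2); // driving one transaction takes time
      noTransactions++;
    }
  }

  // test(): fork gen.main(); driv.main(); join_any
  await Promise.race([genMain(), drivMain()]);
  const drivenAtJoinAny = noTransactions;

  // post_test(): wait(gen.ended.triggered); wait(gen.repeat_count == driv.no_transactions);
  while (!genEnded || noTransactions !== repeatCount) await sleep(1);
  finished = true; // $finish

  return { drivenAtJoinAny, drivenAtEnd: noTransactions };
}

run(3).then((r) => console.log(r)); // { drivenAtJoinAny: 0, drivenAtEnd: 3 }
```

At the join_any point the driver has sent nothing yet; only the post_test-style wait guarantees that all `repeatCount` items were driven before finishing. In this model the driver loop only stops at `$finish`, so a `Promise.all` (fork...join) would never resolve.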

Related

Execution order of initial and always blocks in Verilog

I'm new to Verilog programming and would like to know how a Verilog program is executed. Do all initial and always blocks begin executing at time t = 0, or do initial blocks begin at t = 0 and always blocks only after the initial blocks have finished? I examined the abstract syntax tree of my Verilog program, and all initial and always blocks are at the same hierarchical level. Thank you very much.
All initial and all always blocks throughout your design create concurrent processes that start at time 0. The ordering is indeterminate as far as the LRM is concerned, but it may be repeatable for debug purposes when running the same version of the same simulation tool. In other words, never rely on the simulation ordering to make your code execute properly.
Verilog requires event-driven simulation. As such, the order of execution of all always blocks and assign statements depends on the flow of those events. A signal updated in one block causes execution of all other blocks that depend on that signal.
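A toy model of that event flow, written in JavaScript rather than Verilog: each `assign` is re-evaluated only when a signal it reads changes value, and changes keep propagating until nothing is left to evaluate. The `Sim` class is invented for illustration and leaves out time, scheduling regions and nonblocking assignments.

```javascript
class Sim {
  constructor() {
    this.values = {};  // signal name -> current value
    this.readers = {}; // signal name -> blocks sensitive to it
    this.queue = [];   // blocks waiting to be evaluated
    this.evaluations = 0;
  }
  signal(name, init) {
    this.values[name] = init;
    this.readers[name] = [];
  }
  // assign <out> = fn(inputs...)
  assign(out, inputs, fn) {
    const block = () => {
      this.evaluations++;
      this.set(out, fn(...inputs.map((n) => this.values[n])));
    };
    inputs.forEach((n) => this.readers[n].push(block));
    this.queue.push(block); // evaluated once at time 0
  }
  set(name, value) {
    if (this.values[name] === value) return; // no change, no event
    this.values[name] = value;
    this.queue.push(...this.readers[name]);  // wake every block that reads this signal
  }
  settle() {
    while (this.queue.length > 0) this.queue.shift()();
  }
}

const sim = new Sim();
["a", "b", "c", "d"].forEach((n) => sim.signal(n, 0));
sim.assign("c", ["a", "b"], (a, b) => a & b); // assign c = a & b;
sim.assign("d", ["c"], (c) => ~c & 1);        // assign d = ~c;
sim.settle();
sim.set("a", 1);
sim.set("b", 1);
sim.settle();
console.log(sim.values); // { a: 1, b: 1, c: 1, d: 0 }
```

Changing `a` and `b` only wakes the `c` block; `d` is re-evaluated because `c` changed, not because of any textual order in the source.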
The difference between always blocks and initial blocks is that the latter are executed unconditionally at time 0 and usually produce some initial events, like generating clocks and/or scheduling reset signals. So, in a sense, initial blocks are executed first, before other blocks react to the events they produce.
But there is no defined execution order across multiple initial blocks, or across initial blocks and always blocks that were triggered by other initial blocks.
In addition, there are other ways to generate events besides initial blocks.
In practice, nobody cares, and you shouldn't either.
On actual hardware, the chip immediately after power-up is very unstable because of the transient states of the power supply circuit, hence its initial state is untrustworthy.
The way to ensure a known initial state in practice is to set it from a reset signal, as in:
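// clk, state and next_state are illustrative names
always @(posedge clk or negedge n_reset) begin
  if (!n_reset)
    state <= 0;            // reset asserted (active low): force a known state
  else
    state <= next_state;   // normal operation ("do_something()")
end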

Execution in verilog sequentially or concurrently

I am new to Verilog and am finding its execution tricky. How does execution occur in a Verilog program? Say I have two modules and a testbench:
module module1(clock,A,B,C);
  input A,B,clock;
  output C;
  assign C=A+B;
endmodule

module module2(clock,A,B,C);
  input A,B,clock;
  output C;
  assign C=A-B;
endmodule

module testbench;
  reg A,B,clock;
  wire C;
  module1 m1(clock,A,B,C);
  module2 m2(clock,A,B,C);
  initial
    clock=1'b0;
  always
    #4 clock=~clock;
endmodule
I understand that all initial blocks start at time 0. But are the statements inside an initial block executed sequentially, i.e. if an initial block has more than one line, are they executed sequentially or concurrently? Also, how does module execution take place? Will module1 start first, since it appears before module2 in the testbench, and finish completely before module2 starts, or do both run concurrently? What happens when the clock changes after 4 time units: will the running module stop midway when the clock changes, or will it complete its previous execution and then start again with the new clock?
In Verilog, instantiating a module means adding physical hardware to your design.
Modules are nothing but small hardware blocks that work concurrently. Every module can have some procedural blocks, continuous assignment statements or both.
Every procedural block executes concurrently, and the same applies to continuous assignment statements.
I refer to them as follows:
Procedural blocks: initial, always etc. blocks.
Continuous assignment: assign, force etc.
So, no matter in what sequence you instantiate modules, all are going to work in parallel.
Here comes the concept of the time step. Each time step contains active, inactive and NBA (nonblocking assignment) regions; refer to the figure here.
For each time step, all the instances are checked in every region. If anything has to be executed in, say, module1, it is executed, and in parallel the other module, say module2, is also checked. If there is some dependency between the modules, they are evaluated again.
Here, in your example, C is a single wire that is the output of both modules. This creates contention between the two drivers (wherever they disagree, the wire resolves to x), which is of course not good.
Think from a hardware perspective: two or more different hardware blocks can share the same inputs but cannot drive the same output. So the output wires must be different.
module testbench;
reg A,B,clock;
wire C1,C2; // different wires
module1 m1(clock,A,B,C1);
module2 m2(clock,A,B,C2);
initial clock=1'b0;
always #4 clock=~clock;
endmodule
Also, these modules contain only continuous assignments, so the clock has no effect on them. And no, the modules do not stop running between clock edges; it's just that no events are scheduled at those time steps.
As we now know, all procedural blocks are executed in parallel, but the statements inside a procedural block are executed sequentially. To make the statements inside a block run concurrently, the fork...join construct is used. For example:
initial
begin
  a<=0;
  #5;
  b<=1; // b is assigned at 5ns
end

initial
fork
  a<=0;
  #5;
  b<=1; // b is assigned at 0ns
join
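The timing difference between the two blocks can be checked with a small JavaScript model (an illustration, not a simulator). Statements are either a delay `{ delay: n }` or an assignment `{ assign: [name, value] }`; `runBeginEnd` runs them one after another, while `runForkJoin` starts each top-level statement as its own branch at time 0.

```javascript
// begin...end: a delay holds back every statement after it
function runBeginEnd(stmts) {
  const log = [];
  let time = 0;
  for (const s of stmts) {
    if ("delay" in s) time += s.delay;
    else log.push({ time, name: s.assign[0], value: s.assign[1] });
  }
  return log;
}

// fork...join: every statement is its own branch, all starting at time 0
function runForkJoin(stmts) {
  const log = [];
  let joinTime = 0;
  for (const s of stmts) {
    if ("delay" in s) joinTime = Math.max(joinTime, s.delay); // only delays its own branch
    else log.push({ time: 0, name: s.assign[0], value: s.assign[1] });
  }
  return { log, joinTime }; // the join resumes when the slowest branch ends
}

const body = [{ assign: ["a", 0] }, { delay: 5 }, { assign: ["b", 1] }];
console.log(runBeginEnd(body)); // b assigned at time 5
console.log(runForkJoin(body)); // b assigned at time 0, join completes at time 5
```

In the fork version the `#5` only delays its own (otherwise empty) branch, which is also why the join itself does not complete until time 5.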
Refer to the Verilog Procedural Blocks and Concurrent and Sequential Statements sites for further information.
Another way to think about this from a simulation point of view
All of the initial, always, and continuous assign statements in your design execute concurrently starting at time 0. It doesn't matter whether they are in different modules or not - they are all equally concurrent. The elaboration step flattens out all of your module instances. All that is left are hierarchical names for things that were inside those modules.
Now, unless you are running the simulation on massively parallel CPUs (which is essentially what running on the real synthesized hardware does), there is no way to actually run all of these processes concurrently. A software simulator has to choose one process to go first, and you just can't rely on which one it chooses.
That is what the Verilog scheduling algorithm does. It puts everything scheduled to run at time 0 into an event queue (the active queue) and starts executing the processes one at a time. It executes each process until it finishes, or until it has to block waiting for some delay or for a signal to change. If the process has to block, it gets suspended and put onto another queue. Then the next process in the current queue starts executing, and these steps repeat until the current queue is empty.
Then the scheduling algorithm picks another queue to become the active queue, and advances time if that queue is scheduled with some delay.
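The algorithm described above can be sketched in a few lines of JavaScript, using generator functions as processes and `yield n` as a stand-in for `#n`. This is a simplified model under stated assumptions: one active queue per time step, no separate inactive/NBA regions, and no wake-ups on signal changes.

```javascript
function simulate(processes, endTime) {
  const sim = { now: 0 };
  let active = processes.map((make) => make(sim)); // every process is ready at time 0
  const future = new Map();                        // wake-up time -> suspended processes

  while (sim.now <= endTime) {
    while (active.length > 0) {
      const proc = active.shift(); // run one process at a time
      const step = proc.next();    // ...until it finishes or blocks on a delay
      if (step.done) continue;     // finished: drop it
      const wake = sim.now + step.value;
      if (!future.has(wake)) future.set(wake, []);
      future.get(wake).push(proc); // suspended: park it on a later queue
    }
    if (future.size === 0) break;           // nothing left to run
    sim.now = Math.min(...future.keys());   // advance time to the next non-empty queue
    active = future.get(sim.now);
    future.delete(sim.now);
  }
}

// initial clock = 1'b0;  always #4 clock = ~clock;
let clock;
const trace = [];
simulate([
  function* initialBlock() {
    clock = 0;
  },
  function* alwaysBlock(sim) {
    for (;;) {
      yield 4;
      clock ^= 1;
      trace.push(`${sim.now}: clock=${clock}`);
    }
  },
], 12);
console.log(trace); // [ '4: clock=1', '8: clock=0', '12: clock=1' ]
```

The order of `initialBlock` and `alwaysBlock` in the array decides which runs first at time 0; here it does not matter only because the always block waits `#4` before touching `clock`.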

How to use uvm_test_done objection in test sequence?

I am doing the following in my UVM testbench to create a sequence and start the test.
1. I have some sequences. I'm copying a code snippet from one of them below.
Inside body():
`uvm_create_on(my_seq, p_sequencer.my_sequencer)
my_seq.randomize();
`uvm_send(my_seq)
2. In my test, I'm doing the following to start a sequence:
task run_phase(uvm_phase phase);
.....
phase.raise_objection(this);
seq.start(env.virtual_sequencer);
phase.drop_objection(this);
endtask
Now, if I do this, the test starts and ends at time zero; that is, the DUT is not being driven by my sequence. If I make the following change, it seems to work fine:
Option 1: changing run_phase in the test:
task run_phase(uvm_phase phase);
.....
phase.raise_objection(this);
seq.start(env.virtual_sequencer);
#100000; // Adding delay here.
phase.drop_objection(this);
endtask
If I do this, the test starts, I can see that the DUT is being driven, and everything works as expected. However, the test always ends at time 1000000, even if the sequence has not finished sending all the transactions to the DUT. That's not good, as I don't know how long my DUT will take to complete a test. So I tried something like this instead:
Option 2: keeping the default code in the test (no delay added in run_phase) and making the following change inside the body of my_seq:
Inside body():
uvm_test_done.raise_objection(this);
`uvm_create_on(my_seq, p_sequencer.my_sequencer)
my_seq.randomize();
`uvm_send(my_seq)
uvm_test_done.drop_objection(this);
If I do this, it works fine. Is this the proper way of handling objections? Going back to my original implementation, I assumed that my sequence is blocking: whenever I start a sequence in the test's run_phase using start(...), it would block at that line until the sequence has finished sending all the transactions. So I didn't add any delay in my original code.
I think I'm missing something here. Any help will be greatly appreciated.
If you're doing a fork in your main sequence, then its body() task (which is called by start()) won't block. If you need a fork...join_none for some kind of synchronization, you should also implement a mechanism to know when the forked processes terminate, so that you can stall body() until then. Example:
// inside your main sequence
event forked_finished;

task body();
  fork
    begin
      `uvm_do(some_sub_seq)
      -> forked_finished;   // fires only after the sub-sequence is done
    end
  join_none

  // do some other stuff here in parallel

  // make sure that the forked processes also finished
  @forked_finished;
endtask
This code assumes that the forked process finishes after your other code does. In production code you probably wouldn't rely on this assumption, and would instead use a uvm_event so you can first check whether the event has already been triggered before waiting.
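Here is a JavaScript sketch of that idea (not UVM code): a "sticky" event that remembers it has been triggered, so a late waiter returns immediately instead of hanging the way a plain `@forked_finished` would. The `StickyEvent` class and its `waitOn()` method are invented for this illustration; in UVM the equivalent check would go through uvm_event (for example its `is_on()` / `wait_on()` methods).

```javascript
class StickyEvent {
  constructor() {
    this.on = false;
    this.waiters = [];
  }
  trigger() {
    this.on = true;
    this.waiters.splice(0).forEach((resolve) => resolve()); // release everyone waiting
  }
  waitOn() {
    if (this.on) return Promise.resolve(); // already triggered: do not block
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

async function body(log) {
  const forkedFinished = new StickyEvent();
  // fork ... join_none: the "sub-sequence" runs in the background (here it finishes at once)
  (async () => {
    log.push("sub-sequence done");
    forkedFinished.trigger();
  })();
  // "do some other stuff here in parallel" (here it takes longer than the sub-sequence)
  await new Promise((resolve) => setTimeout(resolve, 5));
  log.push("other stuff done");
  await forkedFinished.waitOn(); // a non-sticky wait would hang: the trigger already happened
  log.push("body returns");
}

const log = [];
body(log).then(() => console.log(log));
```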
Because body() waits until all stimulus is finished, you shouldn't have any problem raising an objection before starting this sequence and dropping it once it's done.
You really have to consider the semantics of your sequence. Usually I expect a sequence's body not to complete until the sequence is finished. So doing a fork/join_none would be undesirable, because the caller of the sequence would have no way of knowing when the sequence has completed. This is similar to what you see in your test.
The solution is to not have my_seq::body return until it is complete.
If the caller of my_seq needs to do something in parallel with my_seq, then it should be their responsibility to do the appropriate fork.
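To make the objection behaviour concrete, here is a JavaScript model (the names `driveItems`, `nonBlockingBody`, `blockingBody` and `runPhase` are hypothetical). `runPhase` "raises an objection", awaits the sequence body the way `seq.start()` would, then "drops" it and reports how many items had been driven at that moment.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for sending n transactions to the DUT, one per millisecond.
async function driveItems(n, driven) {
  for (let i = 0; i < n; i++) {
    await sleep(1);
    driven.push(i);
  }
}

// body() that forks the stimulus and returns at once (fork ... join_none, never waited on)
async function nonBlockingBody(driven) {
  driveItems(3, driven); // deliberately not awaited
}

// body() that does not return until its stimulus is complete
async function blockingBody(driven) {
  await driveItems(3, driven);
}

// run_phase(): returns how many items had been driven when the objection was dropped
async function runPhase(body) {
  const driven = [];
  // phase.raise_objection(this);
  await body(driven); // seq.start(env.virtual_sequencer);
  // phase.drop_objection(this);  (the phase is free to end from here on)
  return driven.length;
}

runPhase(nonBlockingBody).then((n) => console.log("non-blocking body:", n)); // 0
runPhase(blockingBody).then((n) => console.log("blocking body:", n));        // 3
```

A body that forks and returns drops the objection with nothing driven, which matches the zero-time test in the question; a body that returns only when it is complete drops it after all three items. A caller that wants parallelism can wrap `blockingBody` in its own `Promise.all` (its own fork) instead of the sequence doing it internally.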

Can we have race conditions in a single-thread program?

You can find a very good explanation of what a race condition is here.
I have recently seen many people making confusing statements about race conditions and threads.
I had learned that race conditions could only occur between threads. But I saw code that looked like it had race conditions in event-based and asynchronous languages, even when the program was single-threaded, like in Node.js, in GTK+, etc.
Can we have a race condition in a single thread program?
All examples are in a fictional language very close to Javascript.
Short:
A race condition can only occur between two or more threads / external state (one of which can be the OS). We cannot have race conditions inside a single-threaded process that does no I/O.
But a single-threaded program can, in many cases:
give situations which look similar to race conditions, as in an event-based program with an event loop, but are not real race conditions
trigger a race condition between or with other thread(s), for example because the execution of some parts of the program depends on external state:
other programs, like clients
library threads or servers
system clock
I) Race conditions can only occur with two or more threads
A race condition can only occur when two or more threads try to access a shared resource without knowing that it is being modified at the same time by unknown instructions from the other thread(s). This gives an undetermined result. (This is really important.)
A single-threaded process is nothing more than a sequence of known instructions, which therefore produces a determined result, even if the execution order of the instructions is not easy to read from the code.
II) But we are not safe
II.1) Situations similar to race conditions
Many programming languages implement asynchronous programming features through events or signals, handled by a main loop or event loop which checks the event queue and triggers the listeners. Examples of this are Javascript, libuevent, reactPHP, GNOME GLib... Sometimes we can find situations which seem to be race conditions, but are not.
The way the event loop is invoked is always known, so the result is determined, even if the execution order of instructions is not easy to read (or cannot be read at all if we do not know the library).
Example:
setTimeout(
  function() { console.log("EVENT LOOP CALLED"); },
  1
); // We want to print EVENT LOOP CALLED after 1 millisecond
var now = new Date();
while (new Date() - now < 10) {
  // busy-wait: we do something for 10 milliseconds
}
console.log("EVENT LOOP NOT CALLED");
In Javascript the output is always (you can test it in Node.js):
EVENT LOOP NOT CALLED
EVENT LOOP CALLED
because the event loop only runs when the call stack is empty (all functions have returned).
Be aware that this is just an example, and that in languages that implement events differently the result might be different, but it would still be determined by the implementation.
II.2) Race conditions with other threads, for example:
II.2.i) With other programs like clients
If other processes send requests to our process, our program does not treat requests atomically, and our process shares some resources between requests, there might be a race condition between clients.
Example:
var step;
on('requestOpen')(
function() {
step = 0;
}
);
on('requestData')(
function() {
step = step + 1;
}
);
on('requestEnd')(
function() {
step = step +1; //step should be 2 after that
sendResponse(step);
}
);
Here, we have a classic race condition setup. If a request is opened just before another one ends, step will be reset to 0. If two requestData events are triggered before the requestEnd because of two concurrent requests, step will reach 3. But this is because we treat the sequence of events as undetermined, and we expect a program's result to be undetermined most of the time when its input is undetermined.
In fact, if our program is single thread, given a sequence of events the result is still always determined. The race condition is between clients.
There are two ways to look at this:
We can consider the clients as part of our program (why not?), and in this case our program is multi-threaded. End of story.
More commonly, we consider that clients are not part of our program. In this case they are just input. And when we ask whether a program has a determined result or not, we do so for a given input. Otherwise even the simplest program return input; would have an undetermined result.
Note that :
if our process treats requests atomically, it is the same as having a mutex between clients, and there is no race condition.
if we can identify request and attach the variable to a request object which is the same at every step of the request, there is no shared resource between clients and no race condition
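Both points can be checked by replaying one fixed interleaving of two clients against the shared-variable handlers from the example above and against a per-request version. The handler names and client ids are made up; the per-request version keeps its counter in a `Map` keyed by request id.

```javascript
// Shared-variable handlers, as in the example above
function makeSharedServer(responses) {
  let step;
  return {
    requestOpen: () => { step = 0; },
    requestData: () => { step = step + 1; },
    requestEnd: (id) => { step = step + 1; responses[id] = step; },
  };
}

// Per-request state: no resource is shared between clients
function makePerRequestServer(responses) {
  const steps = new Map();
  return {
    requestOpen: (id) => { steps.set(id, 0); },
    requestData: (id) => { steps.set(id, steps.get(id) + 1); },
    requestEnd: (id) => {
      responses[id] = steps.get(id) + 1;
      steps.delete(id);
    },
  };
}

// One fixed interleaving of events from clients "A" and "B"
const events = [
  ["requestOpen", "A"], ["requestData", "A"],
  ["requestOpen", "B"], ["requestData", "B"],
  ["requestEnd", "A"], ["requestEnd", "B"],
];

function replay(makeServer) {
  const responses = {};
  const server = makeServer(responses);
  for (const [name, id] of events) server[name](id);
  return responses;
}

console.log(replay(makeSharedServer));     // { A: 2, B: 3 }
console.log(replay(makePerRequestServer)); // { A: 2, B: 2 }
```

The replay is deterministic: the same event sequence always gives the same responses, and the wrong `3` for client B comes entirely from how the two clients' events were interleaved.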
II.2.ii) With library thread(s)
In our programs, we often use libraries which spawn other processes or threads, or that just do I/O with other processes (and I/O is always undetermined).
Example :
databaseClient.sendRequest('add Me to the database');
databaseClient.sendRequest('remove Me from the database');
This can trigger a race condition with an asynchronous library. This is the case if sendRequest() returns after having sent the request to the database, but before the request is actually executed. We immediately send another request, and we cannot know whether the first will be executed before the second, because the database works on another thread. There is a race condition between the program and the database process.
But if the database were on the same thread as the program (which in real life does not happen often), it would be impossible for sendRequest to return before the request is processed. (Unless the request is queued, but in that case the result is still determined, as we know exactly how and when the queue is read.)
II.2.iii) With system clock
@mingwei-samuel's answer gives an example of a race condition in a single-threaded JS program, between two setTimeout callbacks. Actually, once both setTimeout calls have been made, the execution order is already determined. This order depends on the system clock state (so, an external thread) at the time of the setTimeout calls.
Conclusion
In short, single-threaded programs are not free from triggering race conditions, but these can only occur with or between other threads or external programs. The result of our program might be undetermined because the input it receives from those other programs is undetermined.
Race conditions can occur in any system that has concurrently executing processes that create state changes in external processes, examples of which include:
multithreading,
event loops,
multiprocessing,
instruction level parallelism where out-of-order execution of instructions has to take care to avoid race conditions,
circuit design,
dating (romance),
real races in e.g. the olympic games.
Yes.
A "race condition" is a situation when the result of a program can change depending on the order operations are run (threads, async tasks, individual instructions, etc).
For example, in Javascript:
setTimeout(() => console.log("Hello"), 10);
setTimeout(() => setTimeout(() => console.log("World"), 4), 4);
// VM812:1 Hello
// VM812:2 World
setTimeout(() => console.log("Hello"), 10);
setTimeout(() => setTimeout(() => console.log("World"), 4), 4);
// VM815:2 World
// VM815:1 Hello
So clearly this code depends on how the JS event loop works, how tasks are ordered/chosen, what other events occurred during execution, and even how your operating system chose to schedule the JS runtime process.
This is contrived, but a real program could have a situation where "Hello" needs to be run before "World", which could result in some nasty non-deterministic bugs. How people could consider this not a "real" race condition, I'm not sure.
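If "Hello" really must come before "World", the dependency has to be expressed explicitly rather than left to timer ordering. A small JavaScript sketch, assuming a promise-based `delay` helper: the "World" chain keeps its own 4 + 4 ms of timers but also waits for the "Hello" promise.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function helloThenWorld(log) {
  // "Hello" after 10 ms, as in the example above
  const hello = delay(10).then(() => log.push("Hello"));
  // "World" after 4 + 4 ms, but only once "Hello" has been logged
  const world = delay(4)
    .then(() => delay(4))
    .then(() => hello) // explicit dependency instead of relying on timer order
    .then(() => log.push("World"));
  await Promise.all([hello, world]);
}

const log = [];
helloThenWorld(log).then(() => console.log(log)); // [ 'Hello', 'World' ]
```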
Data Races
It is not possible to have data races in single-threaded code.
A "data race" is multiple threads accessing a shared resource at the same time in an inconsistent way, or, specifically for memory, multiple threads accessing the same memory where one (or more) of them is writing. Of course, with a single thread this is not possible.
This seems to be what @jillro's answer is talking about.
Note: the exact definitions of "race condition" and "data race" are not agreed upon. But if it looks like a race condition, acts like a race condition, and causes nasty non-deterministic bugs like a race condition, then I think it should be called a race condition.

Verilog doesn't have something like main()?

I understand that modules are essentially like C++ functions. However, I didn't find anything like a main() section that calls those functions. How does it work without a main() section?
Trying to find (or conceptually force) a main() equivalent in HDL is the wrong way to go about learning HDL -- it will prevent you from making progress. For synthesisable descriptions you need to make the leap from sequential thinking (one instruction running after another) to "parallel" thinking (everything is running all the time). Mentally, look at your code from left to right instead of top to bottom, and you may realize that the concept of main() isn't all that meaningful.
In HDL, we don't "call" functions, we instantiate modules and connect their ports to nets; again, you'll need to change your mental view of the process.
Once you get it, it all becomes much smoother...
Keep in mind that the normal use of Verilog is modeling/describing circuits. When you apply power, all the circuits start to run, so you need to write your reset logic to get each piece into a stable, usable operating state. Typically you'll include a reset line and do your initialization in response to that.
Verilog has initial blocks, which are kind of like main() in C. These are lists of statements that are scheduled to run from time 0. Verilog can have multiple initial blocks, though, and they execute concurrently.
always blocks will also work like main() if they have no sensitivity list:
always begin // no sensitivity list
  s = 4;
  #10; // delays are required, otherwise the simulation loops forever at time 0
  s = 8;
  #10;
end
