Verilog automatic task - verilog

What does it mean if a task is declared with the automatic keyword in Verilog?
task automatic do_things;
input [31:0] number_of_things;
reg [31:0] tmp_thing;
begin
// ...
end
endtask;
Note: This question is mostly because I'm curious if there are any hardware programmers on the site. :)

"automatic" does in fact mean "re-entrant". The term itself is stolen from software languages -- for example, C has the "auto" keyword for declaring variables as being allocated on the stack when the scope it's in is executed, and deallocated afterwards, so that multiple invocations of the same scope do not see persistent values of that variable. The reason you may not have heard of this keyword in C is that it is the default storage class for all types :-) The alternatives are "static", which means "allocate this variable statically (to a single global location in memory), and refer to this same memory location throughout the execution of the program, regardless of how many times the function is invoked", and "volatile", which means "this is a register elsewhere on my SoC or something on another device which I have no control over; compiler, please don't optimize reads to me away, even when you think you know my value from previous reads with no intermediate writes in the code".
"automatic" is intended for recursive functions, but also for running the same function in different threads of execution concurrently. For instance, if you "fork" off N different blocks (using Verilog's fork->join statement), and have them all call the same function at the same time, the same problems arise as a function calling itself recursively.
In many cases, your code will be just fine without declaring the task or function as "automatic", but it's good practice to put it in there unless you specifically need it to be otherwise.

It means that the task is re-entrant - items declared within the task are dynamically allocated rather than shared between different invocations of the task.
You see - some of us do Verilog... (ugh)

The "automatic" keyword also allows you to write recursive functions (since verilog 2001). I believe they should be synthesisable if they bottom out, but I'm not sure if they have tool support.
I too, do verilog!

As Will and Marty say, the automatic was intended for recursive functions.
If a normal (i.e. not automatic) function is called with different values and processed by the simulator in the same time slice, the returned value is indeterminate. That can be quite a tricky bug to spot! This is only a simulation issue, when synthesised the logic will be correct.
Making the function automatic fixes this.

In computing, a computer program or subroutine is called re-entrant if multiple invocations can safely run concurrently (Wikipedia).
In simple words, the keyword automatic makes it safe, when multiple instances of a task run at a same time.
:D

Automatic is just opposite to static in usual programming. So is the case with Verilog. Think of static variables, they cannot be re-initialized. See the Verilog description below:
for (int i = 0; i < 3; i++) begin
static int f = 0;
f = f + 1;
end
Result of the above program will be f = 3. Also, see the program below:
for (int i = 0; i < 3; i++) begin
int f = 0;
f = f + 1;
end
The result of above program is f = 1. What makes a difference is static keyword.
Conclusion is tasks in Verilog should be automatic because they are invoked (called) so many times. If they were static (if not declared explicitly, they are static), they could have used the result from the previous call which often we do not want.

Related

Peterson's solution just use one variable

For Pi:
do {
turn = i; // prepare enter section
while(turn==j);
//critical section
turn = j; //exit section.
} while(true);
For Pj:
do {
turn = j; // prepare enter section
while(turn==i);
//critical section
turn = i; //exit section.
} while(true);
In this simplified algorithm, if process i want to enter critical section for i, it will set "turn = i"(different from Peterson's solution which will set "turn = j"). this algorithm does not seem to cause deadlock or starvation, so why Peterson's algorithm not simplified like this?
Another Question: as i know, mutual exclusion mechanisms such as semaphore P/V operations require atomicity (P should do test sem.value and sem.value-- concurrently). but why the algorithm above just use one variable turn does not seem to require atomicity (turn = i, test turn == j not atomicity )?
Before you ask whether the algorithm avoids deadlock and starvation, you first have to verify that it still locks. With your version, even assuming sequential consistency, the operations could be sequenced like this:
Pi Pj
turn = i;
while (turn == j); // exits immediately
turn = j;
while (turn == i); // exits immediately
// critical section // critical section
and you have a lock violation.
To your second question: it depends on what you mean by "atomicity". You do need it to be the case that when one thread stores turn = i; then the other thread loading turn will only read i or j and not anything else. On some machines, depending on the type of turn and the values of i and j, you could get tearing and load an entirely different value. So whatever language you are using may require you to declare turn as "atomic" in some fashion to avoid this. In C++ in particular, if turn isn't declared std::atomic, then any concurrent read/write access is a data race, and the behavior of the entire program becomes undefined (that's bad).
Besides the need to avoid tearing and data races, Peterson's algorithm also requires strict memory ordering (sequential consistency), which on many systems / languages is not guaranteed unless specially requested, again perhaps by declaring the variable as atomic in some fashion.
It is true that unlike more typical lock algorithms, Peterson doesn't require an atomic read-modify-write, only atomic sequentially consistent loads and stores. That's precisely what makes it an interesting and clever algorithm. But there's a substantial tradeoff in complexity and performance, especially if you want more than two threads, and most real-life systems do have reasonably efficient atomic RMW instructions, so Peterson is rarely used in practice.

C++98: is volatile needed for access to shared global objects in a multithreaded application

Sadly I'm stuck with C++98, which I'm using in an embedded application.
My question is: I have a multithreaded application, with various global shared variables (evil, I know).
I do protect every access to them using mutexes. Do I also need to declare these global variables as volatile, in order to prevent the compiler from optimizing accesses to them?
Searching online it seems that volatile is absolutely useless for multithreading, but a lot of articles are related to C++11, which did introduce a memory model which recognizes threads, but I'm in C++98 land.
I also found some resources that indicate that volatile is instead useful in my case, such as this Barr Group's article.
Let me emphasize the fact that I don't want to get rid of the mutexes at all, or try lock free programming. The mutexes are absolutely staying, I just want to understand if the volatile keyword is needed.
Do I also need to declare these global variables as volatile, in order to prevent the compiler from optimizing accesses to them?
No. And if you did, you would still be in trouble because volatile is not sufficient -- things other than the compiler (such as the CPU, posting buffers, and memory controllers) can also optimize accesses.
As I'm sure you've read elsewhere, volatile has no defined multi-threading semantics in C++98. So unless it does in your particular threading standard (which you don't specify), then it's completely useless to you.
Presumably, your code uses mutexes properly. No optimization is allowed to break code that only relies on guarantees provided by the relevant standards or implementation. So if you're using the mutexes correctly, then your code is guaranteed to work.
What does keyword volatile mean ?
You enforce your program always to read/write your variable to memory (like cache miss every time). Always (CPU->L1->L2->L3->Bus->Memoryand Memory->Bus->L3->L2->L1->CPU)
Actually it slows your program, so you should not use it until exact need.
You may know that compiler may do some optimizations, but these are designed not to affect/change your program logic.
Example 1:
int d = 5;
int b = 10;
for (int i=0; i < 1e9; i++){
cout << i;
b++;
}
cout << d; // d may be created only here, or not created at all. compiler may just cout << 5;
// var b - seems to be skipped at all due to being unused
Example 2:
int a = 0;
for (int i = 0; i < 10; i++){
a++;
sleep(1000);
}
cout << a;
// this code could be compiled like the following one
sleep(10000);
cout << 10;
Keyword volatile prevents your var from such optimizations.
i.e. vars a, b will be created and incremented, also cache missed always.
Here is one of the most common use cases for using volatile:
You have a third party sensor or scoreboard, that is attached to var address, and its value is always read and shown on the scoreboard. so in this case you need volatile, to prevent compiler optimizations on it, so your score board can show correct value.
I guess now you can decide whether you need volatile or not

Verilog doesn't have something like main()?

I understand that modules are essentially like c++ functions. However, I didn't find something like a main() section that calls those functions. How does it work without a main() section?
Trying to find (or conceptually force) a main() equivalent in HDL is the wrong way to go about learning HDL -- it will prevent you from making progress. For synthesisable descriptions you need to make the leap from sequential thinking (one instruction running after another) to "parallel" thinking (everything is running all the time). Mentally, look at your code from left to right instead of top to bottom, and you may realize that the concept of main() isn't all that meaningful.
In HDL, we don't "call" functions, we instantiate modules and connect their ports to nets; again, you'll need to change your mental view of the process.
Once you get it, it all becomes much smoother...
Keep in mind that the normal use of Verilog is modeling/describing circuits. When you apply power, all the circuits start to run, so you need to write your reset logic to get each piece into a stable, usable operating state. Typically you'll include a reset line and do your initialization in response to that.
Verilog has initial blocks are kinda like main() in C. These are lists of statements that are scheduled to run from time 0. Verilog can have multiple initial blocks though, that are executed concurrently.
always blocks will also work as main() if they've an empty sensitivity list:
always begin // no sensitivity list
s = 4;
#10; // delay statements, or sim will infinite loop
s = 8;
#10;
end

Interview Question on .NET Threading

Could you describe two methods of synchronizing multi-threaded write access performed
on a class member?
Please could any one help me what is this meant to do and what is the right answer.
When you change data in C#, something that looks like a single operation may be compiled into several instructions. Take the following class:
public class Number {
private int a = 0;
public void Add(int b) {
a += b;
}
}
When you build it, you get the following IL code:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: dup
// Pushes the value of the private variable 'a' onto the stack
IL_0003: ldfld int32 Simple.Number::a
// Pushes the value of the argument 'b' onto the stack
IL_0008: ldarg.1
// Adds the top two values of the stack together
IL_0009: add
// Sets 'a' to the value on top of the stack
IL_000a: stfld int32 Simple.Number::a
IL_000f: ret
Now, say you have a Number object and two threads call its Add method like this:
number.Add(2); // Thread 1
number.Add(3); // Thread 2
If you want the result to be 5 (0 + 2 + 3), there's a problem. You don't know when these threads will execute their instructions. Both threads could execute IL_0003 (pushing zero onto the stack) before either executes IL_000a (actually changing the member variable) and you get this:
a = 0 + 2; // Thread 1
a = 0 + 3; // Thread 2
The last thread to finish 'wins' and at the end of the process, a is 2 or 3 instead of 5.
So you have to make sure that one complete set of instructions finishes before the other set. To do that, you can:
1) Lock access to the class member while it's being written, using one of the many .NET synchronization primitives (like lock, Mutex, ReaderWriterLockSlim, etc.) so that only one thread can work on it at a time.
2) Push write operations into a queue and process that queue with a single thread. As Thorarin points out, you still have to synchronize access to the queue if it isn't thread-safe, but it's worth it for complex write operations.
There are other techniques. Some (like Interlocked) are limited to particular data types, and there are even more (like the ones discussed in Non-blocking synchronization and Part 4 of Joseph Albahari's Threading in C#), though they are more complex: approach them with caution.
In multithreaded applications, there are many situations where simultaneous access to the same data can cause problems. In such cases synchronization is required to guarantee that only one thread has access at any one time.
I imagine they mean using the lock-statement (or SyncLock in VB.NET) vs. using a Monitor.
You might want to read this page for examples and an understanding of the concept. However, if you have no experience with multithreaded application design, it will likely become quickly apparent, should your new employer put you to the test. It's a fairly complicated subject, with many possible pitfalls such as deadlock.
There is a decent MSDN page on the subject as well.
There may be other options, depending on the type of member variable and how it is to be changed. Incrementing an integer for example can be done with the Interlocked.Increment method.
As an excercise and demonstration of the problem, try writing an application that starts 5 simultaneous threads, incrementing a shared counter a million times per thread. The intended end result of the counter would be 5 million, but that is (probably) not what you will end up with :)
Edit: made a quick implementation myself (download). Sample output:
Unsynchronized counter demo:
expected counter = 5000000
actual counter = 4901600
Time taken (ms) = 67
Synchronized counter demo:
expected counter = 5000000
actual counter = 5000000
Time taken (ms) = 287
There are a couple of ways, several of which are mentioned previously.
ReaderWriterLockSlim is my preferred method. This gives you a database type of locking, and allows for upgrading (although the syntax for that is incorrect in the MSDN last time I looked and is very non-obvious)
lock statements. You treat a read like a write and just prevent access to the variable
Interlocked operations. This performs an operations on a value type in an atomic step. This can be used for lock free threading (really wouldn't recommend this)
Mutexes and Semaphores (haven't used these)
Monitor statements (this is essentially how the lock keyword works)
While I don't mean to denigrate other answers, I would not trust anything that does not use one of these techniques. My apologies if I have forgotten any.

Is it ok to have multiple threads writing the same values to the same variables?

I understand about race conditions and how with multiple threads accessing the same variable, updates made by one can be ignored and overwritten by others, but what if each thread is writing the same value (not different values) to the same variable; can even this cause problems? Could this code:
GlobalVar.property = 11;
(assuming that property will never be assigned anything other than 11), cause problems if multiple threads execute it at the same time?
The problem comes when you read that state back, and do something about it. Writing is a red herring - it is true that as long as this is a single word most environments guarantee the write will be atomic, but that doesn't mean that a larger piece of code that includes this fragment is thread-safe. Firstly, presumably your global variable contained a different value to begin with - otherwise if you know it's always the same, why is it a variable? Second, presumably you eventually read this value back again?
The issue is that presumably, you are writing to this bit of shared state for a reason - to signal that something has occurred? This is where it falls down: when you have no locking constructs, there is no implied order of memory accesses at all. It's hard to point to what's wrong here because your example doesn't actually contain the use of the variable, so here's a trivialish example in neutral C-like syntax:
int x = 0, y = 0;
//thread A does:
x = 1;
y = 2;
if (y == 2)
print(x);
//thread B does, at the same time:
if (y == 2)
print(x);
Thread A will always print 1, but it's completely valid for thread B to print 0. The order of operations in thread A is only required to be observable from code executing in thread A - thread B is allowed to see any combination of the state. The writes to x and y may not actually happen in order.
This can happen even on single-processor systems, where most people do not expect this kind of reordering - your compiler may reorder it for you. On SMP even if the compiler doesn't reorder things, the memory writes may be reordered between the caches of the separate processors.
If that doesn't seem to answer it for you, include more detail of your example in the question. Without the use of the variable it's impossible to definitively say whether such a usage is safe or not.
It depends on the work actually done by that statement. There can still be some cases where Something Bad happens - for example, if a C++ class has overloaded the = operator, and does anything nontrivial within that statement.
I have accidentally written code that did something like this with POD types (builtin primitive types), and it worked fine -- however, it's definitely not good practice, and I'm not confident that it's dependable.
Why not just lock the memory around this variable when you use it? In fact, if you somehow "know" this is the only write statement that can occur at some point in your code, why not just use the value 11 directly, instead of writing it to a shared variable?
(edit: I guess it's better to use a constant name instead of the magic number 11 directly in the code, btw.)
If you're using this to figure out when at least one thread has reached this statement, you could use a semaphore that starts at 1, and is decremented by the first thread that hits it.
I would expect the result to be undetermined. As in it would vary from compiler to complier, langauge to language and OS to OS etc. So no, it is not safe
WHy would you want to do this though - adding in a line to obtain a mutex lock is only one or two lines of code (in most languages), and would remove any possibility of problem. If this is going to be two expensive then you need to find an alternate way of solving the problem
In General, this is not considered a safe thing to do unless your system provides for atomic operation (operations that are guaranteed to be executed in a single cycle).
The reason is that while the "C" statement looks simple, often there are a number of underlying assembly operations taking place.
Depending on your OS, there are a few things you could do:
Take a mutual exclusion semaphore (mutex) to protect access
in some OS, you can temporarily disable preemption, which guarantees your thread will not swap out.
Some OS provide a writer or reader semaphore which is more performant than a plain old mutex.
Here's my take on the question.
You have two or more threads running that write to a variable...like a status flag or something, where you only want to know if one or more of them was true. Then in another part of the code (after the threads complete) you want to check and see if at least on thread set that status... for example
bool flag = false
threadContainer tc
threadInputs inputs
check(input)
{
...do stuff to input
if(success)
flag = true
}
start multiple threads
foreach(i in inputs)
t = startthread(check, i)
tc.add(t) // Keep track of all the threads started
foreach(t in tc)
t.join( ) // Wait until each thread is done
if(flag)
print "One of the threads were successful"
else
print "None of the threads were successful"
I believe the above code would be OK, assuming you're fine with not knowing which thread set the status to true, and you can wait for all the multi-threaded stuff to finish before reading that flag. I could be wrong though.
If the operation is atomic, you should be able to get by just fine. But I wouldn't do that in practice. It is better just to acquire a lock on the object and write the value.
Assuming that property will never be assigned anything other than 11, then I don't see a reason for assigment in the first place. Just make it a constant then.
Assigment only makes sense when you intend to change the value unless the act of assigment itself has other side effects - like volatile writes have memory visibility side-effects in Java. And if you change state shared between multiple threads, then you need to synchronize or otherwise "handle" the problem of concurrency.
When you assign a value, without proper synchronization, to some state shared between multiple threads, then there's no guarantees for when the other threads will see that change. And no visibility guarantees means that it it possible that the other threads will never see the assignt.
Compilers, JITs, CPU caches. They're all trying to make your code run as fast as possible, and if you don't make any explicit requirements for memory visibility, then they will take advantage of that. If not on your machine, then somebody elses.

Resources