Delphi [volatile] and InterlockedCompareExchange not reliable? - multithreading

I wrote a simple lock-free node stack (Delp[hi XE4, Win7-64, 32-bit app) where I can have multiple 'stacks' and pop/push nodes between them from various threads simultaneously.
It works 99.999% of the time but eventually glitch under a stress test using all CPU cores.
Stripped-down, it comes down to this (not real/compiled code):
Nodes are :
type POBNode = ^TOBNode;
[volatile]TOBNode = record
[volatile]Next : POBNode;
Data : Int64;
end;
Simplified stack :
type TOBStack = class
private
[volatile]Head:POBNode;
function Pop:POBNode;
procedure Push(NewNode:POBNode);
end;
procedure TOBStack.Push(NewNode:POBNode);
var zTmp : POBNode;
begin;
repeat
zTmp:=InterlockedCompareExchangePointer(Pointer(Head),nil,nil);(memory fenced-read*)
NewNode.Next:=zTmp;
if InterlockedCompareExchangePointer(Head,NewNode,zTmp)=zTmp
then break (*success*)
else continue;
until false;
end;
function TOBStack.Pop:POBNode;
begin;
repeat
Result:=InterlockedCompareExchangePointer(Pointer(Head),nil,nil);(memory fenced-read*)
if Result=nil
the exit;
NewHead:=Result.Next;
if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result
then break (*Success*)
else continue;(*Fail, try again*)
until False;
end;
I have tried many variations on this but cannot get it to be stable.
If I create a few threads that each have a stack and they all push/pop to/from a global stack, it eventually glitch, but not quickly. Only when I stress it for minutes on end from multiple threads, in tight loops.
I cannot have hidden bugs in my code, so I need more advice than to ask a specific question to get this running 100% error-free, 24/7.
Does the code above look fine for a lock-free thread-safe stack?
What else can I look at? This is impossible to debug normally as the errors occur at various places, telling me there is a pointer or RAM corruption happening somewhere. I also get duplicate entries, meaning that a node that was popped of one stack, then later returned to that same stack, is still on top of the old stack... Impossible to happen according to my algorithm? This lead me to believe it's possible to violate Delphi/Windows InterlockedCompareExchange Methods or there is some hidden knowledge I have yet to be revealed. :) (I also tried TInterlocked)
I have made a complete test case that can be copied from ftp.true.co.za.
In there I run 8 threads doing 400 000 push/pops each and it usually crashes (safely due to checks/raised exceptions) after a few cycles of these tests, sometimes many many test cycles complete before one suddenly crash.
Any advice would be appreciated.
Regards
Anton
E

At first I was skeptical of this being an ABA problem as indicated by gabr. It seemed to me that: if one thread looks at the current Head, and another thread pushes then pops; you're happy to still operate on the same Head in the same way.
However, consider this from your Pop method:
NewHead:=Result.Next;
if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result
If the thread is swapped out after the first line.
A value for NewHead is stored in a local variable.
Then another thread successfully pops the node this thread was targetting.
It also manages to push the same node back, but with a different value for Next before the first thread resumes.
The second line will pass the comparand check allowing head to receive the NewHead value from the local variable.
However, the current value for NewHead is incorrect, thereby corrupting your stack.
There's a subtle variation on this problem not even covered by your test app. This problem isn't checked in your test app because you aren't destroying any nodes until the end of your test.
Look at current head
Another thread pops some nodes.
The nodes are destroyed, and new nodes created and pushed.
By the time your "looking thread" is active again, it could be looking at an entirely different node that is coincidentally at the same address.
If you're popping, you might assign a garbage pointer to Head.
Apart form the above...
Looking at your test app there's also some really dodgy code. E.g.
You generate a "random number": J:=GetTickCount and 7;(*Get a 'random' number 0..7*).
Do you realise how fast computers are?
Do you realise that GetTickCount will generate reams of duplicates in a tight loop?
I.e. the numbers you generate will be nothing like random.
And when comments don't agree with code, my spidey-sense kicks in.
You're allocating memory of a hard-coded size: GetMem(zTmp,12);(*Allocate new node*).
Why aren't you using SizeOf?
You're using a multi-platform compiler.... The size of that structure can vary.
There is absolutely zero reason to hard-code the size of the structure.
Right now, given these two examples, I wouldn't be entirely confident that there isn't also an error in your test code.

Related

Confusion about C++11 lock free stack push() function

I'm reading C++ Concurrency in Action by Anthony Williams, and don't understand its push implementation of the lock_free_stack class. Listing 7.12 to be precise
void push(T const& data)
{
counted_node_ptr new_node;
new_node.ptr=new node(data);
new_node.external_count=1;
new_node.ptr->next=head.load(std::memory_order_relaxed)
while(!head.compare_exchange_weak(new_node.ptr->next,new_node, std::memory_order_release, std::memory_order_relaxed));
}
So imagine 2 threads (A, B) calling push function. Both of them reach while loop but not start it. So they both read the same value from head.load(std::memory_order_relaxed).
Then we have the following things going on:
B thread gets swiped out for any reason
A thread starts the loop and obviously successfully adds a new node to the stack.
B thread gets back on track and also starts the loop.
And this is where it gets interesting as it seems to me.
Because there was a load operation with std::memory_order_relaxed and compare_exchange_weak(..., std::memory_order_release, ...) in case of success it looks like there is no synchronization between threads whatsoever.
I mean it's like std::memory_order_relaxed - std::memory_order_release and not std::memory_order_acquire - std::memory_order_release.
So B thread will simply add a new node to the stack but to its initial state when we had no nodes in the stack and reset head to this new node.
I was doing my research all around this subject and the best i could find was in this post Does exchange or compare_and_exchange reads last value in modification order?
So the question is, is it true? and all RMW functions see the last value in modification order? No matter what std::memory_order we used, if we use RMW operation it will synchronize with all threads (CPU and etc) and find the last value to be written to the atomic operation upon it is called?
So after some research and asking a bunch of people I believe I found the proper answer to this question, I hope it'll be a help to someone.
So the question is, is it true? and all RMW functions see the last
value in modification order?
Yes, it is true.
No matter what std::memory_order we used, if we use RMW operation it
will synchronize with all threads (CPU and etc) and find the last
value to be written to the atomic operation upon it is called?
Yes, it is also true, however there is something that needs to be highlighted.
RMW operation will synchronize only the atomic variable it works with. In our case, it is head.load
Perhaps you would like to ask why we need release - acquire semantics at all if RMW does the synchronization even with the relaxed memory order.
The answer is because RMW works only with the variable it synchronizes, but other operations which occurred before RMW might not be seen in the other thread.
lets look at the push function again:
void push(T const& data)
{
counted_node_ptr new_node;
new_node.ptr=new node(data);
new_node.external_count=1;
new_node.ptr->next=head.load(std::memory_order_relaxed)
while(!head.compare_exchange_weak(new_node.ptr->next,new_node, std::memory_order_release, std::memory_order_relaxed));
}
In this example, in case of using two push threads they won't be synchronized with each other to some extent, but it could be allowed here.
Both threads will always see the newest head because compare_exchange_weak
provides this. And a new node will be always added to the top of the stack.
However if we tried to get the value like this *(new_node.ptr->next) after this line new_node.ptr->next=head.load(std::memory_order_relaxed) things could easily turn ugly: empty pointer might be dereferenced.
This might happen because a processor can change the order of instructions and because there is no synchronization between threads the second thread could see the pointer to a top node even before that was initialized!
And this is exactly where release-acquire semantic comes to help. It ensures that all operations which happened before release operation will be seen in acquire part!
Check out and compare listings 5.5 and 5.8 in the book.
I also recommend you to read this article about how processors work, it might provide some essential information for better understanding.
memory barriers

If one thread writes to a location and another thread is reading, can the second thread see the new value then the old?

Start with x = 0. Note there are no memory barriers in any of the code below.
volatile int x = 0
Thread 1:
while (x == 0) {}
print "Saw non-zer0"
while (x != 0) {}
print "Saw zero again!"
Thread 2:
x = 1
Is it ever possible to see the second message, "Saw zero again!", on any (real) CPU? What about on x86_64?
Similarly, in this code:
volatile int x = 0.
Thread 1:
while (x == 0) {}
x = 2
Thread 2:
x = 1
Is the final value of x guaranteed to be 2, or could the CPU caches update main memory in some arbitrary order, so that although x = 1 gets into a CPU's cache where thread 1 can see it, then thread 1 gets moved to a different cpu where it writes x = 2 to that cpu's cache, and the x = 2 gets written back to main memory before x = 1.
Yes, it's entirely possible. The compiler could, for example, have just written x to memory but still have the value in a register. One while loop could check memory while the other checks the register.
It doesn't happen due to CPU caches because cache coherency hardware logic makes the caches invisible on all CPUs you are likely to actually use.
Theoretically, the write race you talk about could happen due to posted write buffering and read prefetching. Miraculous tricks were used to make this impossible on x86 CPUs to avoid breaking legacy code. But you shouldn't expect future processors to do this.
Leaving aside for a second tricks done by the compiler (even ones allowed by language standards), I believe you're asking how the micro-architecture could behave in such scenario. Keep in mind that the code would most likely expand into a busy wait loop of cmp [x] + jz or something similar, which hides a load inside it. This means that [x] is likely to live in the cache of the core running thread 1.
At some point, thread 2 would come and perform the store. If it resides on a different core, the line would first be invalidated completely from the first core. If these are 2 threads running on the same physical core - the store would immediately affect all chronologically younger loads.
Now, the most likely thing to happen on a modern out-of-order machine is that all the loads in the pipeline at this point would be different iterations of the same first loop (since any branch predictor facing so many repetitive "taken" resolution is likely to assume the branch will continue being taken, until proven wrong), so what would happen is that the first load to encounter the new value modified by the other thread will cause the matching branch to simply flush the entire pipe from all younger operations, without the 2nd loop ever having a chance to execute.
However, it's possible that for some reason you did get to the 2nd loop (let's say the predictor issue a not-taken prediction just at the right moment when the loop condition check saw the new value) - in this case, the question boils down to this scenario:
Time -->
----------------------------------------------------------------
thread 1
cmp [x],0 execute
je ... execute (not taken)
...
cmp [x],0 execute
jne ... execute (not taken)
Can_We_Get_Here:
...
thread2
store [x],1 execute
In other words, given that most modern CPUs may execute instructions out of order, can a younger load be evaluated before an older one to the same address, allowing the store (from another thread) to change the value so it may be observed inconsistently by the loads.
My guess is that the above timeline is quite possible given the nature of out-of-order execution engines today, as they simply arbitrate and perform whatever operation is ready. However, on most x86 implementations there are safeguards to protect against such a scenario, since the memory ordering rules strictly say -
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
Such mechanisms may detect this scenario and flush the machine to prevent the stale/wrong values becoming visible. So The answer is - no, it should not be possible, unless of course the software or the compiler change the nature of the code to prevent the hardware from noticing the relation. Then again, memory ordering rules are sometimes flaky, and i'm not sure all x86 manufacturers adhere to the exact same wording, but this is a pretty fundamental example of consistency, so i'd be very surprised if one of them missed it.
The answer seems to be, "this is exactly the job of the CPU cache coherency." x86 processors implement the MESI protocol, which guarantee that the second thread can't see the new value then the old.

Interview Question on .NET Threading

Could you describe two methods of synchronizing multi-threaded write access performed
on a class member?
Please could any one help me what is this meant to do and what is the right answer.
When you change data in C#, something that looks like a single operation may be compiled into several instructions. Take the following class:
public class Number {
private int a = 0;
public void Add(int b) {
a += b;
}
}
When you build it, you get the following IL code:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: dup
// Pushes the value of the private variable 'a' onto the stack
IL_0003: ldfld int32 Simple.Number::a
// Pushes the value of the argument 'b' onto the stack
IL_0008: ldarg.1
// Adds the top two values of the stack together
IL_0009: add
// Sets 'a' to the value on top of the stack
IL_000a: stfld int32 Simple.Number::a
IL_000f: ret
Now, say you have a Number object and two threads call its Add method like this:
number.Add(2); // Thread 1
number.Add(3); // Thread 2
If you want the result to be 5 (0 + 2 + 3), there's a problem. You don't know when these threads will execute their instructions. Both threads could execute IL_0003 (pushing zero onto the stack) before either executes IL_000a (actually changing the member variable) and you get this:
a = 0 + 2; // Thread 1
a = 0 + 3; // Thread 2
The last thread to finish 'wins' and at the end of the process, a is 2 or 3 instead of 5.
So you have to make sure that one complete set of instructions finishes before the other set. To do that, you can:
1) Lock access to the class member while it's being written, using one of the many .NET synchronization primitives (like lock, Mutex, ReaderWriterLockSlim, etc.) so that only one thread can work on it at a time.
2) Push write operations into a queue and process that queue with a single thread. As Thorarin points out, you still have to synchronize access to the queue if it isn't thread-safe, but it's worth it for complex write operations.
There are other techniques. Some (like Interlocked) are limited to particular data types, and there are even more (like the ones discussed in Non-blocking synchronization and Part 4 of Joseph Albahari's Threading in C#), though they are more complex: approach them with caution.
In multithreaded applications, there are many situations where simultaneous access to the same data can cause problems. In such cases synchronization is required to guarantee that only one thread has access at any one time.
I imagine they mean using the lock-statement (or SyncLock in VB.NET) vs. using a Monitor.
You might want to read this page for examples and an understanding of the concept. However, if you have no experience with multithreaded application design, it will likely become quickly apparent, should your new employer put you to the test. It's a fairly complicated subject, with many possible pitfalls such as deadlock.
There is a decent MSDN page on the subject as well.
There may be other options, depending on the type of member variable and how it is to be changed. Incrementing an integer for example can be done with the Interlocked.Increment method.
As an excercise and demonstration of the problem, try writing an application that starts 5 simultaneous threads, incrementing a shared counter a million times per thread. The intended end result of the counter would be 5 million, but that is (probably) not what you will end up with :)
Edit: made a quick implementation myself (download). Sample output:
Unsynchronized counter demo:
expected counter = 5000000
actual counter = 4901600
Time taken (ms) = 67
Synchronized counter demo:
expected counter = 5000000
actual counter = 5000000
Time taken (ms) = 287
There are a couple of ways, several of which are mentioned previously.
ReaderWriterLockSlim is my preferred method. This gives you a database type of locking, and allows for upgrading (although the syntax for that is incorrect in the MSDN last time I looked and is very non-obvious)
lock statements. You treat a read like a write and just prevent access to the variable
Interlocked operations. This performs an operations on a value type in an atomic step. This can be used for lock free threading (really wouldn't recommend this)
Mutexes and Semaphores (haven't used these)
Monitor statements (this is essentially how the lock keyword works)
While I don't mean to denigrate other answers, I would not trust anything that does not use one of these techniques. My apologies if I have forgotten any.

Keep app responsive during long task

A certain form in our application displays a graphical view of a model. The user can, amongst loads of other stuff, initiate a transformation of the model that can take quite some time. This transformation sometimes proceeds without any user interaction, at other times frequent user input is necessary. While it lasts the UI should be disabled (just showing a progress dialog) unless user input is needed.
Possible Approaches:
Ignore the issue, just put the transformation code in a procedure and call that. Bad because the app seems hung in cases where the transformation needs some time but requires no user input.
Sprinkle the code with callbacks: This is obtrusive - you’d have to put a lot of these calls in the transformation code - as well as unpredictable - you could never be sure that you’d found the right spots.
Sprinkle the code with Application.ProcessMessages: Same problems as with callbacks. Additionally you get all the issues with ProcessMessages.
Use a thread: This relieves us from the “obtrusive and unpredictable” part of 2. and 3. However it is a lot of work because of the “marshalling” that is needed for the user input - call Synchronize, put any needed parameters in tailor-made records etc. It’s also a nightmare to debug and prone to errors.
//EDIT: Our current solution is a thread. However it's a pain in the a** because of the user input. And there can be a lot of input code in a lot of routines. This gives me a feeling that a thread is not the right solution.
I'm going to embarass myself and post an outline of the unholy mix of GUI and work code that I've produced:
type
// Helper type to get the parameters into the Synchronize'd routine:
PGetSomeUserInputInfo = ^TGetSomeUserInputInfo;
TGetSomeUserInputInfo = record
FMyModelForm: TMyModelForm;
FModel: TMyModel;
// lots of in- and output parameters
FResult: Boolean;
end;
{ TMyThread }
function TMyThread.GetSomeUserInput(AMyModelForm: TMyModelForm;
AModel: TMyModel; (* the same parameters as in TGetSomeUserInputInfo *)): Boolean;
var
GSUII: TGetSomeUserInputInfo;
begin
GSUII.FMyModelForm := AMyModelForm;
GSUII.FModel := AModel;
// Set the input parameters in GSUII
FpCallbackParams := #GSUII; // FpCallbackParams is a Pointer field in TMyThread
Synchronize(DelegateGetSomeUserInput);
// Read the output parameters from GSUII
Result := GSUII.FResult;
end;
procedure TMyThread.DelegateGetSomeUserInput;
begin
with PGetSomeUserInputInfo(FpCallbackParams)^ do
FResult := FMyModelForm.DoGetSomeUserInput(FModel, (* the params go here *));
end;
{ TMyModelForm }
function TMyModelForm.DoGetSomeUserInput(Sender: TMyModel; (* and here *)): Boolean;
begin
// Show the dialog
end;
function TMyModelForm.GetSomeUserInput(Sender: TMyModel; (* the params again *)): Boolean;
begin
// The input can be necessary in different situations - some within a thread, some not.
if Assigned(FMyThread) then
Result := FMyThread.GetSomeUserInput(Self, Sender, (* the params *))
else
Result := DoGetSomeUserInput(Sender, (* the params *));
end;
Do you have any comments?
I think as long as your long-running transformations require user interaction, you're not going to be truly happy with any answer you get. So let's back up for a moment: Why do you need to interrupt the transformation with requests for more information? Are these really questions you couldn't have anticipated before starting the transformation? Surely the users aren't too happy about the interruptions, either, right? They can't just set the transformation going and then go get a cup of coffee; they need to sit and watch the progress bar in case there's an issue. Ugh.
Maybe the issues the transformation encounters are things that could be "saved up" until the end. Does the transformation need to know the answers immediately, or could it finish everything else, and then just do some "fix-ups" afterward?
Definitely go for a threaded option (even after your edit, saying you find it complex). The solution that duffymo suggests is, in my opinion, very poor UI design (even though it's not explicitly about the appearance, it is about how the user interfaces with your application). Programs that do this are annoying, because you have no idea how long the task will take, when it will complete, etc. The only way this approach could be made better would be by stamping the results with the generation date/time, but even then you require the user to remember when they started the process.
Take the time/effort and make the application useful, informative and less frustrating for your end user.
For an optimal solution you will have to analyse your code anyway, and find all the places to check whether the user wants to cancel the long-running operation. This is true both for a simple procedure and a threaded solution - you want the action to finish after a few tenths of a second to have your program appear responsive to the user.
Now what I would do first is to create an interface (or abstract base class) with methods like:
IModelTransformationGUIAdapter = interface
function isCanceled: boolean;
procedure setProgress(AStep: integer; AProgress, AProgressMax: integer);
procedure getUserInput1(...);
....
end;
and change the procedure to have a parameter of this interface or class:
procedure MyTransformation(AGuiAdapter: IModelTransformationGUIAdapter);
Now you are prepared to implement things in a background thread or directly in the main GUI thread, the transformation code itself will not need to be changed, once you have added code to update the progress and check for a cancel request. You only implement the interface in different ways.
I would definitely go without a worker thread, especially if you want to disable the GUI anyway. To make use of multiple processor cores you can always find parts of the transformation process that are relatively separated and process them in their own worker threads. This will give you much better throughput than a single worker thread, and it is easy to accomplish using AsyncCalls. Just start as many of them in parallel as you have processor cores.
Edit:
IMO this answer by Rob Kennedy is the most insightful yet, as it does not focus on the details of the implementation, but on the best experience for the user. This is surely the thing your program should be optimised for.
If there really is no way to either get all information before the transformation is started, or to run it and patch some things up later, then you still have the opportunity to make the computer do more work so that the user has a better experience. I see from your various comments that the transformation process has a lot of points where the execution branches depending on user input. One example that comes to mind is a point where the user has to choose between two alternatives (like horizontal or vertical direction) - you could simply use AsyncCalls to initiate both transformations, and there are chances that the moment the user has chosen his alternative both results are already calculated, so you can simply present the next input dialog. This would better utilise multi-core machines. Maybe an idea to follow up on.
TThread is perfect and easy to use.
Develope and debug your slow function.
if it is ready, put the call into the tthread execute method.
Use the onThreadTerminate Event to find out the end of you function.
for user feedback use syncronize!
I think your folly is thinking of the transformation as a single task. If user input is required as part of the calculation and the input asked for depends on the caclulation up to that point, then I would refactor the single task into a number of tasks.
You can then run a task, ask for user input, run the next task, ask for more input, run the next task, etc.
If you model the process as a workflow, it should become clear what tasks, decisions and user input is required.
I would run each task in a background thread to keep the user interface interactive, but without all the marshaling issues.
Process asynchronously by sending a message to a queue and have the listener do the processing. The controller sends an ACK message to the user that says "We've received your request for processing. Please check back later for results." Give the user a mailbox or link to check back and see how things are progressing.
While I don't completely understand what your trying to do, what I can offer is my view on a possible solution. My understanding is that you have a series of n things to do, and along the way decisions on one could cause one or more different things to be added to the "transformation". If this is the case, then I would attempt to separate (as much as possible) the GUI and decisions from the actual work that needs to be done. When the user kicks off the "transformation" I would (not in a thread yet) loop through each of the necessary decisions but not performing any work...just asking the questions required to do the work and then pushing the step along with the parameters into a list.
When the last question is done, spawn your thread passing it the list of steps to run along with the parameters. The advantage of this method is you can show a progress bar of 1 of n items to give the user an idea of how long it might take when they come back after getting their coffee.
I'd certainly go with threads. Working out how a thread will interact with the user is often difficult, but the solution that has worked well for me is to not have the thread interact with the user, but have the user side GUI interact with the thread. This solves the problem of updating the GUI using synchronize, and gives the user more responsive activity.
So, to do this, I use various variables in the thread, accessed by Get/Set routines that use critical sections, to contain status information. For starters, I'd have a "Cancelled" property for the GUI to set to ask the thread to stop please. Then a "Status"property that indicates if the thread is waiting, busy or complete. You might have a "human readable" status to indicate what is happening, or a percentage complete.
To read all this information, just use a timer on the form and update. I tend to have a "statusChanged" property too, which is set if one of the other items needs refreshing, which stops too much reading going on.
This has worked well for me in various apps, including one which displays the status of up to 8 threads in a list box with progress bars.
If you decide going with Threads, which I also find somewhat complex the way they are implemented in Delphi, I would recommend the OmniThreadLibrary by Primož Gabrijelčič or Gabr as he is known here at Stack Overflow.
It is the simplest to use threading library I know of. Gabr writes great stuff.
If you can split your transformation code into little chunks, then you can run that code when the processor is idle. Just create an event handler, hook it up to the Application.OnIdle event. As long as you make sure that each chunk of code is fairly short (the amount of time you want the application to be unresponsive...say 1/2 a second. The important thing is to set the done flag to false at the end of your handler :
procedure TMyForm .IdleEventHandler(Sender: TObject;
var Done: Boolean);
begin
{Do a small bit of work here}
Done := false;
end;
So for example if you have a loop, instead of using a for loop, use a while loop, make sure the scope of the loop variable is at the form level. Set it to zero before setting the onIdle event, then for example perform 10 loops per onidle hit until you hit the end of the loop.
Count := 0;
Application.OnIdle := IdleEventHandler;
...
...
procedure TMyForm .IdleEventHandler(Sender: TObject;
var Done: Boolean);
var
LocalCount : Integer;
begin
LocalCount := 0;
while (Count < MaxCount) and (Count < 10) do
begin
{Do a small bit of work here}
Inc(Count);
Inc(LocalCount);
end;
Done := false;
end;

Is it ok to have multiple threads writing the same values to the same variables?

I understand about race conditions and how with multiple threads accessing the same variable, updates made by one can be ignored and overwritten by others, but what if each thread is writing the same value (not different values) to the same variable; can even this cause problems? Could this code:
GlobalVar.property = 11;
(assuming that property will never be assigned anything other than 11), cause problems if multiple threads execute it at the same time?
The problem comes when you read that state back, and do something about it. Writing is a red herring - it is true that as long as this is a single word most environments guarantee the write will be atomic, but that doesn't mean that a larger piece of code that includes this fragment is thread-safe. Firstly, presumably your global variable contained a different value to begin with - otherwise if you know it's always the same, why is it a variable? Second, presumably you eventually read this value back again?
The issue is that presumably, you are writing to this bit of shared state for a reason - to signal that something has occurred? This is where it falls down: when you have no locking constructs, there is no implied order of memory accesses at all. It's hard to point to what's wrong here because your example doesn't actually contain the use of the variable, so here's a trivialish example in neutral C-like syntax:
int x = 0, y = 0;
//thread A does:
x = 1;
y = 2;
if (y == 2)
print(x);
//thread B does, at the same time:
if (y == 2)
print(x);
Thread A will always print 1, but it's completely valid for thread B to print 0. The order of operations in thread A is only required to be observable from code executing in thread A - thread B is allowed to see any combination of the state. The writes to x and y may not actually happen in order.
This can happen even on single-processor systems, where most people do not expect this kind of reordering - your compiler may reorder it for you. On SMP even if the compiler doesn't reorder things, the memory writes may be reordered between the caches of the separate processors.
If that doesn't seem to answer it for you, include more detail of your example in the question. Without the use of the variable it's impossible to definitively say whether such a usage is safe or not.
It depends on the work actually done by that statement. There can still be some cases where Something Bad happens - for example, if a C++ class has overloaded the = operator, and does anything nontrivial within that statement.
I have accidentally written code that did something like this with POD types (builtin primitive types), and it worked fine -- however, it's definitely not good practice, and I'm not confident that it's dependable.
Why not just lock the memory around this variable when you use it? In fact, if you somehow "know" this is the only write statement that can occur at some point in your code, why not just use the value 11 directly, instead of writing it to a shared variable?
(edit: I guess it's better to use a constant name instead of the magic number 11 directly in the code, btw.)
If you're using this to figure out when at least one thread has reached this statement, you could use a semaphore that starts at 1, and is decremented by the first thread that hits it.
I would expect the result to be undetermined. As in it would vary from compiler to complier, langauge to language and OS to OS etc. So no, it is not safe
WHy would you want to do this though - adding in a line to obtain a mutex lock is only one or two lines of code (in most languages), and would remove any possibility of problem. If this is going to be two expensive then you need to find an alternate way of solving the problem
In General, this is not considered a safe thing to do unless your system provides for atomic operation (operations that are guaranteed to be executed in a single cycle).
The reason is that while the "C" statement looks simple, often there are a number of underlying assembly operations taking place.
Depending on your OS, there are a few things you could do:
Take a mutual exclusion semaphore (mutex) to protect access
in some OS, you can temporarily disable preemption, which guarantees your thread will not swap out.
Some OS provide a writer or reader semaphore which is more performant than a plain old mutex.
Here's my take on the question.
You have two or more threads running that write to a variable...like a status flag or something, where you only want to know if one or more of them was true. Then in another part of the code (after the threads complete) you want to check and see if at least on thread set that status... for example
bool flag = false
threadContainer tc
threadInputs inputs
check(input)
{
...do stuff to input
if(success)
flag = true
}
start multiple threads
foreach(i in inputs)
t = startthread(check, i)
tc.add(t) // Keep track of all the threads started
foreach(t in tc)
t.join( ) // Wait until each thread is done
if(flag)
print "One of the threads were successful"
else
print "None of the threads were successful"
I believe the above code would be OK, assuming you're fine with not knowing which thread set the status to true, and you can wait for all the multi-threaded stuff to finish before reading that flag. I could be wrong though.
If the operation is atomic, you should be able to get by just fine. But I wouldn't do that in practice. It is better just to acquire a lock on the object and write the value.
Assuming that property will never be assigned anything other than 11, then I don't see a reason for assigment in the first place. Just make it a constant then.
Assigment only makes sense when you intend to change the value unless the act of assigment itself has other side effects - like volatile writes have memory visibility side-effects in Java. And if you change state shared between multiple threads, then you need to synchronize or otherwise "handle" the problem of concurrency.
When you assign a value, without proper synchronization, to some state shared between multiple threads, then there's no guarantees for when the other threads will see that change. And no visibility guarantees means that it it possible that the other threads will never see the assignt.
Compilers, JITs, CPU caches. They're all trying to make your code run as fast as possible, and if you don't make any explicit requirements for memory visibility, then they will take advantage of that. If not on your machine, then somebody elses.

Resources