QThread::idealThreadCount() always returning "2" - multithreading

I need to display, in a QSpinBox, the number of cores, or threads, that the CPU has. The problem is:
QThread cpuInfo(this); //get CPU info
ui->spnBx_nmb_nodes->setValue(cpuInfo.idealThreadCount()); //get thread count
This always returns "2". I tried it on a "2 cores / 4 threads" notebook, a "4 cores / 8 threads" computer, and a "12 cores / 24 threads" server; in all cases it returns "2" as the ideal thread count.
Can someone please shed some light on this?

idealThreadCount()'s implementation differs across operating systems:
On Windows, QThread::idealThreadCount() calls the Win32 function GetNativeSystemInfo() and returns the dwNumberOfProcessors field of the SYSTEM_INFO struct that the call populates.
On Linux (and most other Unix-y OSes), QThread::idealThreadCount() calls sysconf(_SC_NPROCESSORS_ONLN) and returns that value.
On MacOS/X (and BSD and iOS), QThread::idealThreadCount() calls sysctl(CTL_HW, HW_NCPU) and returns the value it receives from there.
QThread::idealThreadCount() also contains some other back-end implementations for less commonly used OSes, which I won't attempt to summarize here; if you need to look for yourself, the code is at lines 461-515 of qtbase/src/corelib/thread/qthread_unix.cpp.
Given all of the above, the question devolves to: why is the OS call (that Qt is calling through to) returning 2 instead of a more appropriate number? It sounds like a bug to me, although one other possibility is that idealThreadCount() is returning the correct number, but your QSpinBox is clamping that number down to 2 for some reason. If you haven't done so already, I suggest printing out the value returned by cpuInfo.idealThreadCount() directly, in addition to passing it to setValue(), just to be sure.
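For example, a minimal standalone check (a sketch, assuming a Qt build environment; idealThreadCount() is static, so no QThread instance is needed):

#include <QDebug>
#include <QThread>

int main() {
    // Print the raw value, bypassing any widget that might clamp it.
    qDebug() << "idealThreadCount:" << QThread::idealThreadCount();
    return 0;
}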

Try the following code:
auto const value = 8;
auto *nmb_nodes = ui->spnBx_nmb_nodes;
nmb_nodes->setValue(value);
Q_ASSERT(nmb_nodes->value() == value);
My bet is that the assertion will fail. So your problem is likely not what you think it is.
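If the assertion does fail, the likely culprit is the spin box's range: QSpinBox clamps setValue() to [minimum(), maximum()]. A hedged sketch of the fix, using the widget name from the question:

// Widen the range before setting the value, so the count is not clamped.
ui->spnBx_nmb_nodes->setMaximum(QThread::idealThreadCount());
ui->spnBx_nmb_nodes->setValue(QThread::idealThreadCount());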

Related

How does thread context-switching work with global variable?

I have been confused at this question:
I have a C++ function:
int balance = 100; // global variable, starts at 100

void withdraw(int x) {
    balance = balance - x;
}
balance is a global integer variable, which equals 100 at the start.
We run the above function on two different threads, thread A and thread B. Thread A runs withdraw(50) and thread B runs withdraw(30).
Assuming we don't protect balance, what is the final result of balance after running those threads in the following sequences?
A1->A2->A3->B1->B2->B3
B1->B2->B3->A1->A2->A3
A1->A2->B1->B2->B3->A3
B1->B2->A1->A2->A3->B3
Explanation:
A1 means the OS executes the first line of function withdraw in thread A, A2 means the OS executes the second line of function withdraw in thread A, B3 means the OS executes the third line of function withdraw in thread B, and so on.
Each sequence is presumably how the OS schedules threads A and B.
My answer is
20
20
50 (Before the context switch, the OS saves balance. After the context switch, the OS restores balance to 50)
70 (Similar to above)
But my friend disagrees. He said that balance is a global variable, so it is not saved on the stack and therefore is not affected by context switching. He claimed that all 4 sequences result in 20.
So who is right? I can't find fault in his logic.
(We assume we have one processor that can only execute one thread at a time)
Unless the threading standard you are using specifies the behavior, there's no way to know. Most typical threading standards don't, so typically there's no way to know.
Your answer sounds like nonsense though. The OS has no idea what balance is nor any way to do anything to it around a context switch. Also, threads can run at the same time without context switches.
Your friend's answer also sounds like nonsense. How does he know that it won't be cached in a register by the compiler and thus some of the modifications will stomp on previous ones?
But the point is, both of you are just guessing about what might happen to happen. If you want to answer this usefully, you have to talk about what is guaranteed to happen.
Clearly homework, but saved by doing actual work before asking.
First, forget about context switching. Context switching is totally irrelevant to the problem. Assume that you have multiple processors, each executing one thread, and each progressing at an unknown speed, stopping and starting at unpredictable times. Even better, assume that this stopping and starting is controlled by an enemy, who will try to break your program.
And because context switching is irrelevant, the OS will not save or restore anything. It won't touch the variable balance. Only your two threads will.
Your friend is absolutely, totally wrong. It's quite the opposite. Since balance is a global variable, both threads can read and write it. But the problem is not only that they might read and write it in unknown order, as you examined; it is worse. They could access it at the same time, and if one thread modifies data while another reads it, you have a race condition and anything at all could happen. Not only could you get any result, your program could also crash.
If balance was a local variable saved on the stack, then each thread would have its own copy, and nothing bad would happen.
Consider this line:
balance = balance - x;
Thread A reads balance. It is 100. Now, thread A subtracts 50 and ... oops
Thread B reads balance. It is 100. Now, thread B subtracts 30 and updates the variable, which is now 70.
...thread A continues and updates the variable, which is now 50. You've just completely lost the work of Thread B.
Threads don't execute "lines of code" -- they execute machine instructions. It does not matter if a global variable is affected by context switching. What matters is when the variable is read, and when it is written, by each thread, because the value is "taken off the shelf" and modified, then "put back". Once the first thread has read the global variable and is working with the value "somewhere in space", the second thread must not read the global variable until the first thread has written the updated value.
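To make that read/modify/write window concrete, here is a minimal C++ sketch (names invented) that reproduces the lost-update race from the question:

#include <iostream>
#include <thread>

int balance = 100; // shared and unprotected -- a data race, by design

void withdraw(int x) {
    balance = balance - x; // compiles to separate load, subtract, store steps
}

int main() {
    std::thread a(withdraw, 50);
    std::thread b(withdraw, 30);
    a.join();
    b.join();
    // Depending on how the loads and stores interleave, this may print
    // 20, 50, or 70 -- and formally the data race is undefined behavior.
    std::cout << balance << '\n';
    return 0;
}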
Simple and short answer for C++: unsynchronized access to a shared variable is undefined behavior, so anything can happen. The value could e.g. be 100, 70, 50, 20, 42 or -458995. The program could crash or not. And in theory it's even allowed to order pizza.
The actual machine code that is executed is usually far away from what your program looks like, and in the case of undefined behavior you are no longer guaranteed that the actual behavior has anything to do with the C++ code you have written.
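A hedged sketch of the standard C++11 fix: make the read-modify-write itself atomic, so every withdrawal is applied exactly once regardless of scheduling:

#include <atomic>

std::atomic<int> balance{100};

void withdraw(int x) {
    // fetch_sub performs the load, subtract, and store as one atomic step.
    balance.fetch_sub(x);
}

With this version, all four sequences in the question (and any interleaving on real hardware) end with balance == 20.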

Delphi [volatile] and InterlockedCompareExchange not reliable?

I wrote a simple lock-free node stack (Delphi XE4, Win7-64, 32-bit app) where I can have multiple 'stacks' and pop/push nodes between them from various threads simultaneously.
It works 99.999% of the time but eventually glitches under a stress test using all CPU cores.
Stripped-down, it comes down to this (not real/compiled code):
Nodes are:
type
  POBNode = ^TOBNode;
  [volatile] TOBNode = record
    [volatile] Next : POBNode;
    Data : Int64;
  end;
Simplified stack:
type
  TOBStack = class
  private
    [volatile] Head : POBNode;
    function Pop : POBNode;
    procedure Push(NewNode : POBNode);
  end;

procedure TOBStack.Push(NewNode : POBNode);
var
  zTmp : POBNode;
begin
  repeat
    zTmp := InterlockedCompareExchangePointer(Pointer(Head), nil, nil); (*memory-fenced read*)
    NewNode.Next := zTmp;
    if InterlockedCompareExchangePointer(Pointer(Head), NewNode, zTmp) = zTmp
      then break (*success*)
      else continue;
  until false;
end;

function TOBStack.Pop : POBNode;
var
  NewHead : POBNode;
begin
  repeat
    Result := InterlockedCompareExchangePointer(Pointer(Head), nil, nil); (*memory-fenced read*)
    if Result = nil
      then exit;
    NewHead := Result.Next;
    if InterlockedCompareExchangePointer(Pointer(Head), NewHead, Result) = Result
      then break (*success*)
      else continue; (*fail, try again*)
  until false;
end;
I have tried many variations on this but cannot get it to be stable.
If I create a few threads that each have a stack and they all push/pop to/from a global stack, it eventually glitches, but not quickly -- only when I stress it for minutes on end from multiple threads, in tight loops.
I cannot afford hidden bugs in my code, so I need broader advice, not just an answer to a specific question, to get this running 100% error-free, 24/7.
Does the code above look fine for a lock-free thread-safe stack?
What else can I look at? This is impossible to debug normally, as the errors occur in various places, telling me there is pointer or RAM corruption happening somewhere. I also get duplicate entries, meaning that a node that was popped off one stack and later returned to that same stack is still on top of the old stack... impossible according to my algorithm? This led me to believe it's possible to defeat the Delphi/Windows InterlockedCompareExchange methods, or that there is some hidden knowledge that has yet to be revealed to me. :) (I also tried TInterlocked.)
I have made a complete test case that can be copied from ftp.true.co.za.
In there I run 8 threads doing 400 000 push/pops each, and it usually crashes (safely, due to checks/raised exceptions) after a few cycles of these tests; sometimes many, many test cycles complete before one suddenly crashes.
Any advice would be appreciated.
Regards,
Anton E
At first I was skeptical of this being an ABA problem, as indicated by gabr. It seemed to me that if one thread looks at the current Head, and another thread pushes then pops, you're happy to still operate on the same Head in the same way.
However, consider this from your Pop method:
NewHead:=Result.Next;
if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result
If the thread is swapped out after the first line:
A value for NewHead is stored in a local variable.
Then another thread successfully pops the node this thread was targeting.
It also manages to push the same node back, but with a different value for Next, before the first thread resumes.
The second line will pass the comparand check, allowing Head to receive the NewHead value from the local variable.
However, the locally saved NewHead value is now stale, thereby corrupting your stack.
There's a subtle variation on this problem that your test app doesn't even cover, because you aren't destroying any nodes until the end of your test:
Look at current head
Another thread pops some nodes.
The nodes are destroyed, and new nodes created and pushed.
By the time your "looking thread" is active again, it could be looking at an entirely different node that is coincidentally at the same address.
If you're popping, you might assign a garbage pointer to Head.
Apart from the above...
Looking at your test app there's also some really dodgy code. E.g.
You generate a "random number": J := GetTickCount and 7; (*Get a 'random' number 0..7*).
Do you realise how fast computers are?
Do you realise that GetTickCount will generate reams of duplicates in a tight loop?
I.e. the numbers you generate will be nothing like random.
And when comments don't agree with code, my spidey-sense kicks in.
You're allocating memory with a hard-coded size: GetMem(zTmp,12); (*Allocate new node*).
Why aren't you using SizeOf, e.g. GetMem(zTmp, SizeOf(TOBNode))?
You're using a multi-platform compiler... the size of that structure can vary.
There is absolutely zero reason to hard-code the size of the structure.
Right now, given these two examples, I wouldn't be entirely confident that there isn't also an error in your test code.

Java: ordering results retrieved from asynchronous tasks

I've got a computation (CTR encryption) that requires results in a precise order.
For this I created a multithreaded design that calculates said results; in this case the result is a ByteBuffer. The calculation itself of course runs asynchronously, so the results may become available at any time and in any order. The "user" is a single-threaded application that uses the results by calling a method, after which the ByteBuffers are returned to the pool of resources by said method - the management of resources is already handled (using a thread-safe stack).
Now the question: I need something that aggregates the results and makes them available in the right order. If the next result is not available, the method that the user called should block until it is. Does anyone know a good strategy or class in java.util.concurrent that can return asynchronously calculated results in order?
The solution must be thread-safe. I would like to avoid third-party libraries, Thread.sleep() / Thread.wait(), and threading-related keywords other than "synchronized". Furthermore, the tasks may be given to e.g. an Executor in the correct order if that is required. This is for research, so feel free to use Java 1.6 or even 1.7 constructs.
Note: I've tagged these questions [jre] as I want to keep within the classes defined in the JRE, and [encryption] as somebody may already have had to deal with it, but the question itself is purely about Java & multi-threading.
Use the executors framework:
ExecutorService executorService = Executors.newFixedThreadPool(5);
List<Future<ByteBuffer>> futures = executorService.invokeAll(listOfCallables);
for (Future<ByteBuffer> future : futures) {
    // do something with future.get();
}
executorService.shutdown();
The listOfCallables will be a List<Callable<ByteBuffer>> that you have constructed to operate on the data. For example:
list.add(new SubTaskCalculator(1, 20));
list.add(new SubTaskCalculator(21, 40));
list.add(new SubTaskCalculator(41, 60));
(arbitrary ranges of numbers, adjust that to your task at hand)
.get() blocks until the result is complete, but other tasks are running at the same time, so when you reach them, their .get() will already be ready. Note that invokeAll() returns the futures in the same order as the list of callables, which is what gives you the results in order.
Returning results in the right order is trivial. As each result arrives, store it in an arraylist, and once you have ALL the results, just sort the arraylist. You could use a PriorityQueue to keep the results sorted at all times as they arrive, but there is no point in doing this, since you will not be making any use of the results before all of them have arrived anyway.
So, what you could do is this:
Declare a "WorkItem" class which contains one of your bytearrays and its ordinal number, so that they can be sorted by ordinal number.
In your work threads, do something like this:
// ...do work and produce a workItem...
synchronized (lockObject) {
    resultList.add(workItem);
    numberOfResults++;
    lockObject.notifyAll();
}
In your main thread, do something like this:
synchronized (lockObject) {
    while (numberOfResults != numberOfItems)
        lockObject.wait(); // may throw InterruptedException; handle as appropriate
}
Collections.sort(resultList);
// ...go ahead and use the results...
My new answer after gaining a better understanding of what you want to do:
Declare a "WorkItem" class which contains one of your bytearrays and its ordinal number, so that they can be sorted by ordinal number.
Make use of a java.util.PriorityQueue which is kept sorted by ordinal number. Essentially, all we care about is that the first item in the priority queue at any given time is the next item to process.
Each work thread stores its result in the PriorityQueue and issues a notifyAll() on some locking object.
The main thread waits on the locking object; when it wakes, if the queue is non-empty and the ordinal of the (peeked, not dequeued) first item equals the number of items processed so far, it dequeues that item and processes it. If not, it keeps waiting. Once all of the items have been produced and processed, it is done.
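A minimal sketch of that consumer logic, with invented names (written in C++ for concreteness; the structure maps one-to-one onto java.util.PriorityQueue guarded by synchronized/wait/notifyAll):

#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <vector>

struct WorkItem {
    uint64_t ordinal;            // position in the required output order
    std::vector<uint8_t> bytes;  // the computed result
    bool operator>(const WorkItem& o) const { return ordinal > o.ordinal; }
};

std::mutex m;
std::condition_variable cv;
// min-heap by ordinal: top() is always the smallest outstanding ordinal
std::priority_queue<WorkItem, std::vector<WorkItem>, std::greater<WorkItem>> results;
uint64_t nextOrdinal = 0; // how many items the consumer has taken so far

// Called by worker threads whenever a result is ready (in any order).
void produce(WorkItem item) {
    std::lock_guard<std::mutex> lock(m);
    results.push(std::move(item));
    cv.notify_all();
}

// Called by the single consumer; blocks until the next in-order item exists.
WorkItem consumeNext() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] {
        return !results.empty() && results.top().ordinal == nextOrdinal;
    });
    WorkItem item = results.top(); // top() is const, so take a copy
    results.pop();
    ++nextOrdinal;
    return item;
}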

When should the Win32 InterlockedExchange function be used?

I came across the function InterlockedExchange and was wondering when I should use it. In my opinion, setting a 32-bit value on an x86 processor should always be atomic, shouldn't it?
In the case where I want to use the function, the new value does not depend on the old value (it is not an increment operation).
Could you provide an example where this method is mandatory? (I'm not looking for InterlockedCompareExchange.)
InterlockedExchange is both a write and a read -- it returns the previous value.
This is necessary to ensure another thread didn't write a different value just after you did. For example, say you're trying to increment a variable. You can read the value, add 1, then set the new value with InterlockedExchange. The value returned by InterlockedExchange must match the value you originally read, otherwise another thread probably incremented it at the same time, and you need to loop around and try again.
As well as writing the new value, InterlockedExchange also reads and returns the previous value; this whole operation is atomic. This is useful for lock-free algorithms.
(Incidentally, 32-bit writes are not guaranteed to be atomic. Consider the case where the write is unaligned and straddles a cache boundary, for instance.)
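One concrete, classic use (a hedged sketch, not from the original answers): a test-and-set spinlock, where the returned previous value tells you whether the lock was already held:

#include <windows.h>

volatile LONG g_lock = 0; // 0 = free, 1 = held

void acquire() {
    // Atomically write 1 and look at what was there before; if it was
    // already 1, another thread holds the lock, so spin and retry.
    while (InterlockedExchange(&g_lock, 1) == 1)
        YieldProcessor(); // hint to the CPU that we are busy-waiting
}

void release() {
    InterlockedExchange(&g_lock, 0); // atomic, fully fenced write
}

A plain g_lock = 1 followed by a separate read could not do this: the whole point is that the write and the read of the old value happen as one indivisible step.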
In a multi-processor or multi-core machine, each core has its own cache - so each core has its own, potentially different, "view" of what the content of the system memory is.
Thread synchronization mechanisms take care of synchronizing between cores; for more information, look at http://blogs.msdn.com/oldnewthing/archive/2008/10/03/8969397.aspx or google for acquire and release semantics.
Setting a 32-bit value is atomic, but only if you're setting a literal.
b = a is 2 operations:
mov eax,dword ptr [a]
mov dword ptr [b],eax
Theoretically there could be some interruption between the first and second operation.
Writing a value is never atomic by default. When you write a value to a variable, several machine instructions are generated. With modern, preemptive OSes, the OS might switch to another thread between the individual operations of the write.
This is even more a problem on multi-processor machines, where several threads could be executing at the same time, and trying to write to a single memory location simultaneously.
Interlocked operations avoid this by using specialized instructions to make the write (x86 has dedicated instructions for this kind of situation), which do the read-modify-write in one instruction. These instructions also lock the memory bus of all processors, to ensure that no other executing thread could be writing to the value at the same time.
InterlockedExchange makes sure that the change of a variable and the return of its original value are not interrupted by other threads.
So, if 'i' is an int, these calls (taken individually) do not need InterlockedExchange around 'i':
a = i;
i = 9;
i = a;
i = a + 9;
a = i + 9;
if(0 == i)
None of these statements rely upon BOTH the initial AND final values of 'i'. But these following calls DO need InterlockedExchange around 'i':
a = i++; //a = InterlockedExchangeAdd(&i, 1); (note: InterlockedExchange(&i, i + 1) would not be safe here, because reading i to compute i + 1 is a separate, non-atomic step)
Without it, two threads running through this same code might get the same value of 'i' assigned to 'a', or 'a' may unexpectedly skip two or more numbers.
if(0 == i++) //if(0 == InterlockedExchangeAdd(&i, 1))
Two threads may both execute the code that is only supposed to happen once.
etc.
Wow, so many conflicting answers. It's hard to sift through who's right, who's wrong, and what information is misleading.
I'm unsure of the answer too, given the above half-answers, but I think it works like this. I may be wrong, and it will be interesting to find out if I am:
32-bit read & writes ARE atomic, but depending on your code, that may not mean much.
don't worry about non-aligned read/writes. ALL 32-bit writes to a 32-bit variable have to be aligned or the machine page-faults.
don't worry about a write wrapping around the end of a cached page, that can't happen.
If you need to write-then-read on one thread, and you're writing on another thread, then you need to use InterlockedExchange. If you're simply reading the value on one thread, and writing it on another, then you don't need to use it, but those values may be wiggly because of multithreading.

Is it ok to have multiple threads writing the same values to the same variables?

I understand about race conditions and how, with multiple threads accessing the same variable, updates made by one can be ignored and overwritten by others. But what if each thread is writing the same value (not different values) to the same variable; can even this cause problems? Could this code:
GlobalVar.property = 11;
(assuming that property will never be assigned anything other than 11), cause problems if multiple threads execute it at the same time?
The problem comes when you read that state back, and do something about it. Writing is a red herring - it is true that as long as this is a single word, most environments guarantee the write will be atomic, but that doesn't mean that a larger piece of code that includes this fragment is thread-safe. Firstly, presumably your global variable contained a different value to begin with - otherwise, if you know it's always the same, why is it a variable? Secondly, presumably you eventually read this value back again?
The issue is that presumably, you are writing to this bit of shared state for a reason - to signal that something has occurred? This is where it falls down: when you have no locking constructs, there is no implied order of memory accesses at all. It's hard to point to what's wrong here because your example doesn't actually contain the use of the variable, so here's a trivialish example in neutral C-like syntax:
int x = 0, y = 0;

// thread A does:
x = 1;
y = 2;
if (y == 2)
    print(x);

// thread B does, at the same time:
if (y == 2)
    print(x);
Thread A will always print 1, but it's completely valid for thread B to print 0. The order of operations in thread A is only required to be observable from code executing in thread A - thread B is allowed to see any combination of the state. The writes to x and y may not actually happen in order.
This can happen even on single-processor systems, where most people do not expect this kind of reordering - your compiler may reorder it for you. On SMP even if the compiler doesn't reorder things, the memory writes may be reordered between the caches of the separate processors.
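A hedged sketch of how C++11 atomics make this example well-defined: use y as the signal with release/acquire ordering, so the write to x is guaranteed to be visible once the write to y is seen:

#include <atomic>
#include <cstdio>

int x = 0;
std::atomic<int> y{0};

void thread_a() {
    x = 1;                                 // plain write...
    y.store(2, std::memory_order_release); // ...published by this store
}

void thread_b() {
    if (y.load(std::memory_order_acquire) == 2)
        std::printf("%d\n", x);            // now guaranteed to print 1
}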
If that doesn't seem to answer it for you, include more detail of your example in the question. Without the use of the variable it's impossible to definitively say whether such a usage is safe or not.
It depends on the work actually done by that statement. There can still be some cases where Something Bad happens - for example, if a C++ class has overloaded the = operator, and does anything nontrivial within that statement.
I have accidentally written code that did something like this with POD types (builtin primitive types), and it worked fine -- however, it's definitely not good practice, and I'm not confident that it's dependable.
Why not just lock the memory around this variable when you use it? In fact, if you somehow "know" this is the only write statement that can occur at some point in your code, why not just use the value 11 directly, instead of writing it to a shared variable?
(edit: I guess it's better to use a constant name instead of the magic number 11 directly in the code, btw.)
If you're using this to figure out when at least one thread has reached this statement, you could use a semaphore that starts at 1, and is decremented by the first thread that hits it.
I would expect the result to be undetermined. As in, it would vary from compiler to compiler, language to language, OS to OS, etc. So no, it is not safe.
Why would you want to do this, though? Obtaining a mutex lock is only one or two lines of code (in most languages), and would remove any possibility of a problem. If that is going to be too expensive, then you need to find an alternative way of solving the problem.
In general, this is not considered a safe thing to do unless your system provides for atomic operations (operations that are guaranteed to be executed as a single indivisible step).
The reason is that while the "C" statement looks simple, often there are a number of underlying assembly operations taking place.
Depending on your OS, there are a few things you could do:
Take a mutual exclusion semaphore (mutex) to protect access
In some OSes, you can temporarily disable preemption, which guarantees your thread will not be swapped out.
Some OSes provide a reader/writer semaphore, which is more performant than a plain old mutex.
Here's my take on the question.
You have two or more threads running that write to a variable - like a status flag or something - where you only want to know if one or more of them set it to true. Then in another part of the code (after the threads complete) you want to check whether at least one thread set that status... for example:
bool flag = false
threadContainer tc
threadInputs inputs

check(input)
{
    ...do stuff to input
    if (success)
        flag = true
}

// start multiple threads
foreach (i in inputs)
{
    t = startthread(check, i)
    tc.add(t) // keep track of all the threads started
}

foreach (t in tc)
    t.join() // wait until each thread is done

if (flag)
    print "One of the threads was successful"
else
    print "None of the threads were successful"
I believe the above code would be OK, assuming you're fine with not knowing which thread set the status to true, and you can wait for all the multi-threaded stuff to finish before reading that flag. I could be wrong though.
If the operation is atomic, you should be able to get by just fine. But I wouldn't do that in practice. It is better just to acquire a lock on the object and write the value.
Assuming that property will never be assigned anything other than 11, I don't see a reason for the assignment in the first place. Just make it a constant.
Assignment only makes sense when you intend to change the value, unless the act of assignment itself has other side effects - like volatile writes have memory-visibility side effects in Java. And if you change state shared between multiple threads, then you need to synchronize or otherwise "handle" the problem of concurrency.
When you assign a value, without proper synchronization, to some state shared between multiple threads, there are no guarantees about when the other threads will see that change. And no visibility guarantees means it is possible that the other threads will never see the assignment.
Compilers, JITs, CPU caches: they're all trying to make your code run as fast as possible, and if you don't state any explicit requirements for memory visibility, they will take advantage of that. If not on your machine, then on somebody else's.
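A minimal sketch of the flag pattern done with explicit visibility guarantees, using C++ atomics (in Java the equivalent would be a volatile boolean or an AtomicBoolean); the names are invented:

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<bool> flag{false};

void check(int input) {
    bool success = (input % 2 == 0); // stand-in for the real work
    if (success)
        flag.store(true); // atomic write with guaranteed visibility
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(check, i);
    for (auto& t : threads)
        t.join(); // join() also synchronizes: main sees the workers' writes
    std::puts(flag.load() ? "at least one thread succeeded"
                          : "no thread succeeded");
    return 0;
}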
