When myThread.Start(...) is called, do we have the assurance that the thread is started?

When myThread.Start(...) is called, do we have the assurance that the thread is started? The MSDN documentation isn't really specific about that. It says that the status of the thread is changed to Running.
I am asking because I've seen the following code a couple of times. It creates a thread, starts it, and then loops until the status becomes Running. Is the loop necessary?
Thread t = new Thread(new ParameterizedThreadStart(ThreadMethod));
t.Start(data);
while (t.ThreadState != System.Threading.ThreadState.Running &&
       t.ThreadState != System.Threading.ThreadState.WaitSleepJoin)
{
    Thread.Sleep(10);
}
Thanks!

If you're set on not allowing your loop to continue until the thread has "started", then it will depend on what exactly you mean by "started". Does that mean that the thread has been created by the OS and signaled to run, but not necessarily that it's done anything yet? Does that mean that it's executed one or more operations?
While it's likely fine, your loop isn't bulletproof: it's theoretically possible that the entire thread executes between the time you call Start and the time you check the ThreadState. It's also not a good idea to read the ThreadState property twice in one check, since its value can change between the two reads.
If you want to stick with checking the state, something like this would/could be more reliable:
ThreadState state = t.ThreadState;
while (state != ThreadState.Running && state != ThreadState.WaitSleepJoin)
{
    Thread.Sleep(10);
    state = t.ThreadState;
}
However, this is still subject to the possibility of the thread starting, running, then stopping before you even get the chance to check. Yes, you could expand the check to include other states, but I would recommend using a WaitHandle to signal when the thread "starts".
ManualResetEvent signal;

void foo()
{
    Thread t = new Thread(new ParameterizedThreadStart(ThreadMethod));
    signal = new ManualResetEvent(false); // initially unsignaled
    t.Start(data);
    signal.WaitOne();
    /* code to execute after the thread has "started" */
}

void ThreadMethod(object foo)
{
    signal.Set();
    /* do your work */
}
You still have the possibility of the thread ending before you check, but you're guaranteed to have that WaitHandle set once the thread starts. The call to WaitOne will block indefinitely until Set has been called on the WaitHandle.

I guess it depends on what you are doing after the loop. If whatever comes after it is critically dependent on the thread running, then checking is not a bad idea. Personally, I'd use a ManualResetEvent or something similar that is set by the thread, rather than checking the ThreadState.

No. Thread.Start causes a "thread to be scheduled for execution". It will start, but it may take a (short) period of time before the code within your delegate actually runs. In fact, the code above doesn't do what (I suspect) the author intended, either. Setting the thread's ThreadState to ThreadState.Running (which does happen in Thread.Start) just means it's scheduled to run -- the ThreadState can be "Running" before the delegate is actually executing.
As John Bergess suggested, using a ManualResetEvent to notify the main thread that the thread is running is a much better option than sleeping and checking the thread's state.

Related

Why wait() method from QWaitCondition always takes a QMutex as parameter?

I am trying to pause my thread while waiting for a user action. I know I could use Qt::BlockingQueuedConnection but that is not the point here. I would like to use QWaitCondition, but I don't understand why I need a QMutex in this particular case.
Consider this code:
class MyWorker : public QThread
{
    Q_OBJECT
private:
    QMutex mDummy;
    QWaitCondition mStep1;
    void doStuff1() {}
    void doStuff2() {}
signals:
    void step1Finished();
public:
    MyWorker(...) {}
protected:
    void run()
    {
        doStuff1();
        emit step1Finished();
        mDummy.lock();
        mStep1.wait(&mDummy);
        mDummy.unlock();
        doStuff2();
    }
};
In this case the QMutex mDummy seems useless to me. I use it only because wait() needs it as a parameter.
I know that wait() unlocks the mutex and then (re)locks it after waking up, but why is there no way to use wait() without it?
First of all, a wait condition needs a mutex, so you have to give it one; that's simply what a wait condition is. It is the most low-level signalling mechanism between threads in multi-threading, so it doesn't provide the "convenience" you seem to be looking for.
But you also need the mutex to get things working right. A wait condition might have a spurious wakeup, that is, it could be woken up for "no reason" (google "wait condition spurious wakeup" to learn more). So you have to have some condition to check, and keep waiting if it's still not time to continue. And to avoid race conditions, that check has to be protected by the mutex.
Snippets:
// wait
mDummy.lock();
mStopWaiting = false; // maybe here, if you want to make sure this waits in all cases
while (!mStopWaiting)
{
    // note that wait releases the mutex while waiting
    mStep1.wait(&mDummy);
}
mDummy.unlock();

// signal end of wait
mDummy.lock();
mStopWaiting = true;
mStep1.wakeOne(); // or wakeAll() maybe depending on other code
mDummy.unlock();
As you can see, that mutex isn't so dummy after all. Note that all access to mStopWaiting has to be protected by this mutex, not just here.
Imagine you want to wait for something to happen. Since that something has to happen in another thread (since this thread is waiting) it has to be protected in some way to avoid race conditions.
Imagine you use the following code:
1. Acquire a lock.
2. Check if the thing you want to wait for has happened.
3. If it has, stop; you're done.
4. If it hasn't, wait.
Oops. We're still holding the lock. There's no way the thing we're waiting for can happen because no other thread can access it.
Let's try again.
1. Acquire a lock.
2. Check if the thing you want to wait for has happened.
3. If it has, stop; you're done.
4. If it hasn't, release the lock and wait.
Oops. What if it happens after we release the lock but before we start waiting? Then we'll be waiting for something that has already happened.
So what we need for step 4 is an atomic "unlock and wait" operation. This releases the lock and waits without giving another thread a chance to sneak in and change things before we can start waiting.
If you don't need an atomic "unlock and wait" operation, don't use QWaitCondition. This is its sole purpose. It takes a QMutex so it knows what to unlock. That QMutex must protect whatever it is the thread is waiting for or your code will be vulnerable to the very race condition QWaitCondition exists to solve for you.
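To make that concrete, here is a minimal sketch (assuming Qt; the ready flag is an invented stand-in for "the thing you're waiting for") of the pattern described above. The mutex protects the condition data, and wait() releases it atomically while the thread sleeps:

#include <QMutex>
#include <QWaitCondition>

QMutex mutex;              // protects 'ready'
QWaitCondition cond;
bool ready = false;        // hypothetical condition data

void waiter()
{
    mutex.lock();
    while (!ready)         // the loop guards against spurious wakeups
        cond.wait(&mutex); // atomically: unlock mutex and sleep; relock on wake
    mutex.unlock();
}

void notifier()
{
    mutex.lock();
    ready = true;          // change the condition under the mutex
    mutex.unlock();
    cond.wakeOne();
}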

why does std::condition_variable::wait need mutex?

TL;DR
Why does std::condition_variable::wait need a mutex as one of its parameters?
Answer 1
You may look at the documentation and quote that:
wait... Atomically releases lock
But that's not a real reason. It just validates my question even more: why does it need it in the first place?
Answer 2
The predicate most likely queries the state of a shared resource, and it must be lock-guarded.
OK, fair.
Two questions here:
Is it always true that the predicate queries the state of a shared resource? I assume yes; it doesn't make sense to me to implement it otherwise.
What if I do not pass any predicate (it is optional)?
Using predicate - lock makes sense
int i = 0;
void waits()
{
    std::unique_lock<std::mutex> lk(cv_m);
    cv.wait(lk, []{ return i == 1; });
    std::cout << i;
}
Not Using predicate - why can't we lock after the wait?
int i = 0;
void waits()
{
    cv.wait(lk);
    std::unique_lock<std::mutex> lk(cv_m);
    std::cout << i;
}
Notes
I know that there are no harmful implications to this practice. I just don't know how to explain to myself why it was designed this way.
Question
If predicate is optional and is not passed to wait, why do we need the lock?
When using a condition variable to wait for a condition, a thread performs the following sequence of steps:
1. It determines that the condition is not currently true.
2. It starts waiting for some other thread to make the condition true. This is the wait call.
For example, the condition might be that a queue has elements in it, and a thread might see that the queue is empty and wait for another thread to put things in the queue.
If another thread were to intercede between these two steps, it could make the condition true and notify on the condition variable before the first thread actually starts waiting. In this case, the waiting thread would not receive the notification, and it might never stop waiting.
The purpose of requiring the lock to be held is to prevent other threads from interceding like this. Additionally, the lock must be unlocked to allow other threads to do whatever we're waiting for, but the unlock can't happen before the wait call (because of the notify-before-wait problem) and it can't happen after the wait call (because we can't do anything while we're waiting). It has to be part of the wait call itself, so wait has to know about the lock.
Now, you might look at the notify_* methods and notice that those methods don't require the lock to be held, so there's nothing actually stopping another thread from notifying between steps 1 and 2. However, a thread calling notify_* is supposed to hold the lock while performing whatever action it does to make the condition true, which is usually enough protection.
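To make the queue example above concrete, here is a minimal sketch (invented names, assuming C++11) of how the check and the wait share one lock, and how the notifier changes the state under the same lock:

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;                 // protects q
std::condition_variable cv;
std::queue<int> q;

void consumer()
{
    std::unique_lock<std::mutex> lk(m);
    // Equivalent to: while (q.empty()) cv.wait(lk);
    cv.wait(lk, []{ return !q.empty(); });
    int item = q.front();     // the lock is held again here
    q.pop();
    // ... process item ...
}

void producer(int item)
{
    {
        std::lock_guard<std::mutex> lk(m);
        q.push(item);         // make the condition true under the lock
    }
    cv.notify_one();
}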
TL;DR
If predicate is optional and is not passed to wait, why do we need the lock?
condition_variable is designed to wait for a certain condition to come true, not to wait just for a notification. So to "catch" the "moment" when the condition becomes true you need to check the condition and wait for the notification. And to avoid a race condition you need those two to be a single atomic operation.
Purpose Of condition_variable:
Enable a program to implement this: do some action when a condition C holds.
Intended Protocol:
Condition producer changes state of the world from !C to C.
Condition consumer waits for C to happen and takes the action while/after condition C holds.
Simplification:
For simplicity (to limit the number of cases to think about), let's assume that C never switches back to !C. Let's also forget about spurious wakeups. Even with these assumptions we'll see that the lock is necessary.
Naive Approach:
Let's have two threads with an essential code summarized like this:
void producer() {
    _condition = true;
    _condition_variable.notify_all();
}

void consumer() {
    if (!_condition) {
        _condition_variable.wait();
    }
    action();
}
The Problem:
The problem here is a race condition. A problematic interleaving of the threads is the following:
1. The consumer reads the condition, checks it to be false and decides to wait.
2. The thread scheduler interrupts the consumer and resumes the producer.
3. The producer updates the condition to become true and invokes notify_all().
4. The consumer is resumed.
5. The consumer actually does wait(), but is never notified and woken up (a liveness hazard).
So without locking the consumer may miss the event of the condition becoming true.
Solution:
Disclaimer: this code still does not handle spurious wakeups or the possibility of the condition becoming false again.
void producer() {
    {
        std::unique_lock<std::mutex> l(_mutex);
        _condition = true;
    }
    _condition_variable.notify_all();
}

void consumer() {
    {
        std::unique_lock<std::mutex> l(_mutex);
        if (!_condition) {
            _condition_variable.wait(l);
        }
    }
    action();
}
Here we check the condition, release the lock, and start waiting as a single atomic operation, preventing the race condition mentioned before.
See Also
Why Lock condition await must hold the lock
You need a std::unique_lock when using std::condition_variable for the same reason you need a std::FILE* when using std::fwrite and for the same reason a BasicLockable is necessary when using std::unique_lock itself.
The feature std::fwrite gives you, the entire reason it exists, is to write to files. So you have to give it a file. The feature std::unique_lock provides is RAII locking and unlocking of a mutex (or another BasicLockable, like std::shared_mutex, etc.), so you have to give it something to lock and unlock.
The feature std::condition_variable provides, the entire reason it exists, is atomically waiting and unlocking a lock (and, on waking, completing the wait and relocking). So you have to give it something to lock.
Why would someone want that is a separate question that has been discussed already. For example:
When is a condition variable needed, isn't a mutex enough?
Conditional Variable vs Semaphore
Advantages of using condition variables over mutex
And so on.
As has been explained, the pred parameter is optional, but having some sort of predicate and testing it isn't. Or, in other words, not having a predicate doesn't make any sense, in a manner similar to how having a condition variable without a lock doesn't make any sense.
The reason you have a lock is because you have shared state you need to protect from simultaneous access. Some function of that shared state is the predicate.
If you don't have a predicate and you don't have a lock you really don't need a condition variable just like if you don't have a file you really don't need fwrite.
A final point is that the second code snippet you wrote is very broken. Obviously it won't compile as you define the lock after you try to pass it as an argument to condition_variable::wait(). You probably meant something like:
std::mutex mtx_cv;
std::condition_variable cv;
...
{
    std::unique_lock<std::mutex> lk(mtx_cv);
    cv.wait(lk);
    lk.lock(); // throws std::system_error with an error code of std::errc::resource_deadlock_would_occur
}
The reason this is wrong is very simple. condition_variable::wait's effects are (from [thread.condition.condvar]):
Effects:
— Atomically calls lock.unlock() and blocks on *this.
— When unblocked, calls lock.lock() (possibly blocking on the lock), then returns.
— The function will unblock when signaled by a call to notify_one() or a call to notify_all(), or spuriously.
After the return from wait() the lock is locked, and unique_lock::lock() throws an exception if it has already locked the mutex it wraps ([thread.lock.unique.locking]).
Again, why someone would want to couple waiting and locking the way std::condition_variable does is a separate question; but given that it does, you cannot, by definition, lock a std::condition_variable's std::unique_lock after std::condition_variable::wait has returned.
It's not stated in the documentation (and it could be implemented differently), but conceptually you can imagine that the condition variable has another mutex of its own, used both to protect its internal data and to coordinate the condition check, the waiting, and the notification with modifications of the consumer's data (e.g. queue.size()) that affect the test.
So when you call wait(...) the following (logically) happens.
Precondition: The consumer code holds the lock (CCL) controlling the consumer condition data (CCD).
The condition is checked; if it is true, execution in the consumer code continues, still holding the lock.
If false, it first acquires its own lock (CVL), adds the current thread to the waiting-thread collection, releases the consumer lock, puts itself to waiting, and releases its own lock (CVL).
That final step is tricky, because it needs to sleep the thread and release the CVL at the same time, or in that order, or in some way such that threads notified just before going to wait are able to (somehow) not go to wait.
The step of acquiring the CVL before releasing the CCL is key. Any parallel thread trying to update the CCD and notify will be blocked either by the CCL or the CVL. If the CCL were released before acquiring the CVL, a parallel thread could acquire the CCL, change the data, and then notify before the to-be-waiting thread is added to the waiters.
A parallel thread acquires the CCL, modifies the data to make the condition true (or at least worth testing), and then notifies. Notification acquires the CVL and identifies any blocked threads to unwait. The unwaited threads then seek to acquire the CCL and may block there, but won't leave wait and re-perform the test until they've acquired it.
Notification must acquire the CVL to make sure threads that have found the test false have been added to the waiters.
It's OK (possibly preferable for performance) to notify without holding the CCL, because the hand-off between the CCL and CVL in the wait code ensures the ordering.
It may be preferable because notifying while holding the CCL may mean all the unwaited threads just unwait to block (on the CCL) while the thread modifying the data is still holding the lock.
Notice that even if the CCD is atomic you must modify it while holding the CCL, or that lock-CVL-then-unlock-CCL step won't ensure the total ordering required to make sure notifications aren't sent while threads are in the process of going to wait.
The standard only talks about atomicity of operations, and another implementation may have a way of blocking notification until the "add to waiters" step has completed following a failed test. The C++ Standard is careful not to dictate an implementation.
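Purely as an illustration of this conceptual model (not how any real implementation is required to work, and every name here is invented), a toy wait()/notify_one() might look like this, with the CVL as an internal mutex and the caller's lock as the CCL:

#include <mutex>
#include <semaphore>   // C++20
#include <vector>

struct toy_condvar {
    std::mutex cvl;                              // the internal "CVL"
    std::vector<std::binary_semaphore*> waiters; // registered waiting threads

    void wait(std::unique_lock<std::mutex>& ccl) {
        std::binary_semaphore me{0};
        cvl.lock();               // acquire the CVL *before* releasing the CCL
        waiters.push_back(&me);   // register as a waiter
        ccl.unlock();             // now notifiers can change the data...
        cvl.unlock();             // ...and reach the waiter list
        me.acquire();             // sleep; the semaphore stores an early wakeup,
                                  // so releasing the CVL before sleeping is safe
        ccl.lock();               // re-acquire the caller's lock, then return
    }

    void notify_one() {
        std::lock_guard<std::mutex> g(cvl);      // must hold the CVL to notify
        if (!waiters.empty()) {
            waiters.back()->release();
            waiters.pop_back();
        }
    }
};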
In all that, to answer some of the specific questions.
Must the state be shared? Sort of. There could be an external condition, like a file being in a directory, with the wait timed to retry after a time period. You can decide for yourself whether you consider the file system, or even just the wall clock, to be shared state.
Must there be any state? Not necessarily. A thread can wait on notification.
That could be tricky to coordinate because there has to be enough sequencing to stop the other thread notifying out of turn. The commonest solution is to have some boolean flag set by the notifying thread so the notified thread knows if it missed it. The normal use of void wait(std::unique_lock<std::mutex>& lk) is when the predicate is checked outside:
std::unique_lock<std::mutex> ulk(ccd_mutex);
while (!condition) {
    cv.wait(ulk);
}
Where the notifying thread uses:
{
    std::lock_guard<std::mutex> guard(ccd_mutex);
    condition = true;
}
cv.notify_one();
The reason is that sometimes the waiting thread holds m_mutex:
#include <mutex>
#include <condition_variable>

void CMyClass::MyFunc()
{
    std::unique_lock<std::mutex> guard(m_mutex);
    // do something (on the protected resource)
    m_condition.wait(guard, [this]() { return !m_bSpuriousWake; });
    // do something else (on the protected resource)
    guard.unlock();
    // do something else than else
}
and a thread should never go to sleep while holding m_mutex. One doesn't want to lock everybody out while sleeping. So, atomically: {guard is unlocked and the thread goes to sleep}. Once it is woken up by the other thread (by m_condition.notify_one(), say), guard is locked again, and the thread continues.
Reference (video)
Because otherwise there's a race condition between the waiting thread noticing the change of the shared state and the wait() call.
Assume we have a shared state of type std::atomic<int> state_; there's still a fair chance for the waiting thread to miss a notification:
T1 (waiting)                                    | T2 (notification)
------------------------------------------------*---------------------------
1) for (int i = state_; i != 0; i = state_) {   |
2)                                              |   state_ = 0;
3)                                              |   cv.notify();
4)     cv.wait();                               |
5) }                                            |
6) // go on with the satisfied condition...     |
Note that the wait() call failed to notice the latest value of state_ and may keep waiting forever.
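A sketch of the fix (with invented names): make the check and the wait a single atomic step by performing both under a mutex, which is exactly what cv.wait(lock) provides.

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
int state_ = 1;   // no longer needs to be atomic: the mutex protects it

void waiting_thread()      // T1
{
    std::unique_lock<std::mutex> lk(m);
    while (state_ != 0)
        cv.wait(lk);       // unlock + block happen atomically
    // go on with the satisfied condition...
}

void notifying_thread()    // T2
{
    {
        std::lock_guard<std::mutex> lk(m);
        state_ = 0;        // change the state under the same mutex
    }
    cv.notify_one();
}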

Why can a sub-classed QThread simply fail to start?

This is using a sub-classed QThread based on the ideas expressed in the whitepaper "QThreads: You were not doing so wrong". It does not have an event loop, nor does it have slots. It just emits signals and stops. In fact its primary signal is the QThread finished one.
Basically I have a Qt application using a background thread to monitor stuff. Upon finding what it is looking for, it records its data and terminates.
The termination sends a signal to the main event-loop part of the application, which processes it and, when done, starts the background thread anew. I can usually get this working for tens of seconds, but then it just seems to quit.
It seems that when the main application tries to start the thread, it doesn't really run. I base this on telemetry code that increments counters as procedures get executed.
Basically:
// in main application. Setup not shown.
// background points to the QThread sub-class object
void MainWindow::StartBackground()
{
    background->startcount++;
    background->start();
    if (background->isRunning())
    {
        background->startedcount++;
    }
}

// in sub-classed QThread
void Background::run()
{
    runcount++;
    // Do stuff until done
}
So when I notice (by watching Process Explorer) that my background thread doesn't seem to be running, I cause the debugger to break in and check the counts. What I see is that startcount and startedcount are equal, and have a value one greater than runcount.
So I can only conclude that the thread didn't really run, but I have been unable to find any evidence of why.
I have not been able to find documentation on QThreads failing to start due to some error condition, or on what evidence there would be of such an error.
I suppose I could set up a slot to catch started from the thread. The starting code could loop on a timed-out semaphore, trying again and again until the started slot actually resets the semaphore. But it feels ugly.
EDIT - further information
So using the semaphore method, I have a way to breakpoint on failure to start.
I sampled isFinished() right before I wanted to do start(), and it was false. After my 100ms semaphore timeout it became true.
So the question seems to be evolving into 'Why does QThread sometimes emit a finished() signal before isFinished() becomes true?'
Heck of a race condition. I'd hate to spin on isFinished() before starting the next background thread.
So this may be a duplicate of
QThread emits finished() signal but isRunning() returns true and isFinished() returns false
But not exactly, because I do override run() and I have no event loop.
In particular the events 8 and 9 in that answer are not in the same order. My slot is getting a finished() before isFinished() goes true.
I'm not sure an explicit quit() is any different than letting run() return;
It sounds as if you have a race condition wherein you may end up trying to restart your thread before the previous iteration has actually finished. If that's the case then, from what I've seen, the next call to QThread::start will be silently ignored. You need to update your code so that it checks the status of the thread before restarting -- either by calling QThread::isFinished or handling the QThread::finished signal.
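For example, a minimal guard might look like this (a sketch using the members from the question; note the OP's later edit suggests even this check can race with the finished() signal):

void MainWindow::StartBackground()
{
    if (background->isRunning())  // previous run hasn't fully finished yet
        return;                   // defer the restart (e.g. retry from a timer)
    background->start();
}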
On the other hand... why have the thread repeatedly started and stopped? Would it not be easier to simply start the thread once? Whatever code is run within the context of QThread::run can monitor whatever it monitors and signal the main app when it finds anything of note.
Better still. Separate the monitor logic from the thread entirely...
class monitor : public QObject {
    .
    .
    .
};

QThread monitor_thread;
monitor monitor;

/*
 * Fix up any signals to/from monitor.
 */
monitor.moveToThread(&monitor_thread);
monitor_thread.start();
The monitor class can do whatever it wants, and when it's time to quit, the app can just call monitor_thread.quit().
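The "fix up any signals" comment is where the interesting wiring happens. A hypothetical hookup might look like this (doMonitor, found, onFound, and the mainWindow pointer are invented names for illustration, not from the original post):

// Hypothetical wiring. Because 'monitor' lives in monitor_thread,
// doMonitor() runs in that thread, and the 'found' connection is
// automatically queued across threads.
QObject::connect(&monitor_thread, &QThread::started,
                 &monitor, &monitor::doMonitor);
QObject::connect(&monitor, &monitor::found,
                 mainWindow, &MainWindow::onFound);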
There is a race condition in the version of Qt I am using. I don't know if it was reported or not before, but I do not have the latest, so it's probably moot unless I can demonstrate it in the current version.
Similar bugs were reported here long ago:
QThread.isFinished returns False in slot connected to finished() signal
(the version I use is much more recent than Qt 4.8.5)
What's more important is that I can work around it with the following code:
while (isRunning())
{
    msleep(1);
}
start();
I've run a few tests, and it never seems to take more than 1ms for the race condition to settle. Probably just needs a context switch to clean up.

Is there any meaning in calling pthread_detach(th) after calling pthread_join(th, NULL)?

I found a strange piece of code in an open-source software package:
for (i = 0; i < store->scan_threads; i++) {
    pthread_join(thread_ids[i], NULL);
    pthread_detach(thread_ids[i]);
}
Is there any point in calling pthread_detach here?
That stanza is silly and unsafe.
Design-wise, the detach is unnecessary — the join completion already means that the thread is completely finished. There's nothing to detach. (The code in question simply spawns threads with default joinability.)
Implementation-wise, the detach is unsafe. A thread ID may be recycled as soon as the thread is finished — oops, didn't mean to detach that other thread! Worse, the ID is not guaranteed to be meaningful at all after the call to join returns — SEGV?
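In other words, the corrected stanza is just the join (a sketch, assuming the same variables as the question):

for (i = 0; i < store->scan_threads; i++) {
    /* join alone reclaims the thread's resources; after it returns,
       thread_ids[i] is no longer a valid thread ID to pass to anything */
    pthread_join(thread_ids[i], NULL);
}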
In this code (considering that this code is from the main thread...):
pthread_join(thread_ids[i], NULL);
This will make the main thread wait for the thread with ID thread_ids[i] to finish, and then, if the main thread has more work to do,
pthread_detach(thread_ids[i]);
will release the resources used by the thread (with thread ID thread_ids[i]).

Disabling a System.Threading.Timer instance while its callback is in progress

I am using two instances of System.Threading.Timer to fire off 2 tasks that are repeated periodically.
My question is: if the timer is disabled but at that point in time it is executing its callback on a thread, will the Main method exit, or will it wait for the executing callbacks to complete?
In the code below, Method1RunCount is synchronized for read and write using a lock statement (that part of the code is not shown below). The callback for timer1 increments Method1RunCount by 1 at the end of each run.
static void Main(string[] args)
{
    TimerCallback callback1 = Method1;
    System.Threading.Timer timer1 = new System.Threading.Timer(callback1, null, 0, 90000);
    TimerCallback callback2 = Method2;
    System.Threading.Timer timer2 = new System.Threading.Timer(callback2, null, 0, 60000);

    while (true)
    {
        System.Threading.Thread.Sleep(250);
        if (Method1RunCount == 4)
        {
            // DISABLE the TIMERS
            timer1.Change(System.Threading.Timeout.Infinite, System.Threading.Timeout.Infinite);
            timer2.Change(System.Threading.Timeout.Infinite, System.Threading.Timeout.Infinite);
            break;
        }
    }
}
This kind of code tends to work by accident: the period of the timer is large enough to avoid the threading race on the Method1RunCount variable. Make the period smaller and there's a real danger that the main thread won't see the value "4" at all. Odds get considerably worse when the processor is heavily loaded and the main thread doesn't get scheduled for a while. The timer's callback can then execute more than once while the main thread is waiting for the processor, completely missing the value getting incremented to 4. Note how the lock statement does not in fact prevent this; it isn't locked by the main thread, since the main thread is probably sleeping.
There's also no reasonable guess you can make at how often Method2 runs. Not just because it has a completely different timer period, but fundamentally because it isn't synchronized to either Method1 or the Main method at all.
You'd normally increment Method1RunCount at the end of Method1. That doesn't otherwise guarantee that Method1 won't be aborted: it runs on a threadpool thread, and threadpool threads always have Thread.IsBackground set to true, so the CLR will readily abort them when the main thread exits. This, again, tends not to cause a problem by accident.
If it is absolutely essential that Method1 executes exactly 4 times then the simple way to ensure that is to let Method1 do the counting. Calling Timer.Change() inside the method is fine. Use a class like AutoResetEvent to let the main thread know about it. Which now no longer needs the Sleep anymore. You still need a lock to ensure that Method1 cannot be re-entered while it is executing. A good way to know that you are getting thread synchronization wrong is when you see yourself using Thread.Sleep().
From the docs on System.Threading.Timer (http://msdn.microsoft.com/en-us/library/system.threading.timer.aspx):
When a timer is no longer needed, use the Dispose method to free the resources held by the timer. Note that callbacks can occur after the Dispose() method overload has been called, because the timer queues callbacks for execution by thread pool threads. You can use the Dispose(WaitHandle) method overload to wait until all callbacks have completed.
