SetEvent ResetEvent WaitForMultipleObjectsEx - Race condition? - multithreading

I don't fully understand PulseEvent or its race condition. To avoid it, I am trying to use SetEvent instead, and to call ResetEvent every time before WaitForMultipleObjectsEx.
This is my flow:
Thread One - Uses CreateEvent to create an auto-reset event, then spawns Thread TWO and tells it about the event.
Thread One - Tell thread TWO to run.
Thread TWO will call ResetEvent on the event and then immediately start WaitForMultipleObjectsEx on the event and some other handles for file watching. If WaitForMultipleObjectsEx returns and it is not due to the event, the loop restarts immediately. If WaitForMultipleObjectsEx returns because the event became signaled, the loop is not restarted.
So now imagine this case please:
Thread TWO - loop is running
Thread One - needs to add a path, so it does (1) SetEvent, and then (2) sends another message to thread 2 to add a path, and then (3) sends message to thread 2 to restart loop.
The add-path and restart-loop messages will not reach Thread TWO unless I stop the loop in TWO, which is what the SetEvent does. Thread TWO will see that it was stopped because of the event, so it won't restart the loop. It will then get the add-path message, add the path, and restart the loop.
Thread One - needs to stop the thread, so it does (1) SetEvent and then (2) waits for a message from thread 2; when it gets that message it will terminate the thread.
Will this avoid race condition?
Thank you

Suppose the loop needs to be interrupted twice in succession. You're imagining a sequence of events something like this, on thread ONE and thread TWO:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread TWO reads the message "restart the wait loop".
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread ONE sends message related to the second interruption.
Thread TWO stops the loop, receives the message about the second interruption.
But since you don't have any control over the timing between the two threads, it might instead happen like this:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread TWO reads the message "restart the wait loop".
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE sends a message about the second interruption, but TWO isn't listening!
Even if the message passing mechanism is synchronous, so that ONE won't continue until TWO has read the message, it could happen this way:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread TWO reads the message "restart the wait loop", but is then swapped out.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE sends a message about the second interruption, but TWO isn't listening!
(Obviously, a similar thing can happen if you use PulseEvent.)
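To make the lost signal concrete, here is a small deterministic replay of the two orderings. It is only a toy model in Python, not Win32 code: `ToyEvent` and `replay` are made-up names, and the "event" is just a flag, so nothing actually blocks. It shows only that "set, then reset" leaves the event unsignaled, while "reset, then set" does not.

```python
class ToyEvent:
    """Single-threaded stand-in for an event object: just a signaled flag."""

    def __init__(self):
        self.signaled = False

    def set(self):
        self.signaled = True

    def reset(self):
        self.signaled = False

    def wait_would_return(self):
        # A real wait blocks until signaled; here we only report whether it would return.
        return self.signaled


def replay(steps):
    """Apply the steps in order, then report whether TWO's wait would return."""
    event = ToyEvent()
    for step in steps:
        getattr(event, step)()
    return event.wait_would_return()


# Intended order: TWO resets the event and starts waiting, then ONE sets it.
intended = replay(["reset", "set"])

# Unlucky order: ONE sets the event first, then TWO resets it and starts waiting.
unlucky = replay(["set", "reset"])

print("intended order, wait returns:", intended)
print("unlucky order, wait returns:", unlucky)
```

In the unlucky order the interruption request is erased before TWO ever waits, so TWO keeps waiting while ONE believes it has been interrupted.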
One quick solution would be to use a second event for TWO to signal ONE at the appropriate point, i.e., after resetting the main event but before waiting on it, but that seems somewhat inelegant and also doesn't generalize very well. If you can guarantee that there will never be two interruptions in close-enough succession, you might simply choose to ignore the race condition, but note that it is difficult to reason about this because there is no theoretical limit to how long it might take for thread TWO to resume running after being swapped out.
The various alternatives depend on how the messages are being passed between the threads and any other constraints. [If you can provide more information about your current implementation I'll update my answer accordingly.]
This is an overview of some of the more obvious options.
If the message-passing mechanism is synchronous (if thread ONE waits for thread TWO to receive the message before proceeding) then using a single auto-reset event should just work. Thread ONE won't set the event until after thread TWO has received the restart-loop message. If the event is already set when thread TWO starts waiting, that just means that there were two interruptions in immediate succession; TWO will never stall waiting for a message that isn't coming. [This potential stall is the only reason I can think of why you might not want to use an auto-reset event. If you have another concern, please edit your question to provide more details.]
If it is OK for sending a message to be non-blocking, and you aren't already locked in to a particular solution, any of these options would probably be sensible:
User mode APCs (the QueueUserAPC function) provide a message-passing mechanism that automatically interrupts alertable waits.
You could implement a simple queue (protected by a critical section) which uses an event to indicate whether there is a message pending or not. In this case you can safely use a manual-reset event provided that you only manipulate it when you hold the same critical section that protects the queue.
You could use an auto-reset event in combination with any sort of thread-safe queue, provided only that the queue allows you to test for emptiness without blocking. The idea here is that thread ONE would always insert the message into the queue before setting the event, and if thread TWO sees that the event is set but it turns out that the queue is empty, the event is ignored. If efficiency is a concern, you might even be able to find a suitable lock-free queue implementation. (I don't recommend attempting that yourself.)
(All of those mechanisms could also be made synchronous by using a second event object.)
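As a rough illustration of the second option (a simple queue guarded by a lock, plus an event that is only touched while the lock is held), here is a sketch in Python rather than Win32. The assumptions are mine: `threading.Lock` stands in for the critical section, `threading.Event` behaves like a manual-reset event, and `MessageQueue`, `post`, `take_all` and the message strings are hypothetical names. In the real program the wait would be a WaitForMultipleObjectsEx call that also covers the file-watching handles.

```python
import threading
from collections import deque


class MessageQueue:
    def __init__(self):
        self._lock = threading.Lock()      # plays the role of the critical section
        self._items = deque()
        self.pending = threading.Event()   # set exactly when the queue is non-empty

    def post(self, message):
        # Queue and event change together, under the same lock.
        with self._lock:
            self._items.append(message)
            self.pending.set()

    def take_all(self):
        # Draining and resetting are one atomic step, so no signal can be lost.
        with self._lock:
            messages = list(self._items)
            self._items.clear()
            self.pending.clear()
            return messages


def worker(mq, handled):
    while True:
        mq.pending.wait()                  # thread TWO's wait loop
        for message in mq.take_all():
            if message == "stop":
                return
            handled.append(message)


mq = MessageQueue()
handled = []
thread = threading.Thread(target=worker, args=(mq, handled))
thread.start()
mq.post("add path A")
mq.post("add path B")
mq.post("stop")
thread.join(timeout=5)
print(handled)
```

Because the sender never sets the event separately from enqueuing the message, there is no window in which the worker can reset the event and miss a message that is already on its way.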
I wouldn't recommend the following approaches, but if you happen to already be using one of these for messaging this is how you can make it work:
If you're using named pipes for messaging, you could use asynchronous I/O in thread TWO. Thread TWO would use an auto-reset event internally: you specify the event handle when you issue the I/O call, and Windows sets it when I/O arrives. From the point of view of thread ONE, there's only a single operation. From the point of view of thread TWO, if the event is set, a message is definitely available. (I believe this is somewhat similar to your original approach; you just have to issue the I/O call in advance rather than afterwards.)
If you're using a window queue for messaging, the MsgWaitForMultipleObjectsEx() function allows you to wait for a window message and other events simultaneously.
PS:
The other problem with PulseEvent, the one mentioned in the documentation, is that this can happen:
Thread TWO starts waiting.
Thread TWO is preempted by Windows and all user code on the thread stops running.
Thread ONE pulses the event.
Thread TWO is restarted by Windows, and the wait is resumed.
Thread ONE sends a message, but TWO isn't listening.
(Personally I'm a bit disappointed that the kernel doesn't deal with this situation; I would have thought that it would be possible for it to set a flag saying that the wait shouldn't be resumed. But I can only assume that there is a good reason why this is impractical.)

The Auto-Reset Events
Would you please try to change the flow so there is just SetEvent and WaitForMultipleObjectsEx with auto-reset events? You may create more events if you need. For example, each thread will have its own pair of events: one to get notifications and another to report about its state changes - you define the scheme that best suits your needs.
Since there will be auto-reset events, there would be neither ResetEvent nor PulseEvent.
If you can change the logic of the algorithm flow this way, the program will become clear, reliable, and straightforward.
I advise this because this is how our applications have worked since the days of Windows NT 3.51: we manage to do everything we need with just SetEvent and WaitForMultipleObjects (without the Ex suffix).
As for PulseEvent, as you know, it is very unreliable, even though it has existed since the very first version of Windows NT (3.1). Maybe it was reliable then, but not now.
To create the auto-reset events, use the bManualReset argument of the CreateEvent API function (if this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled -- this is not what you need). If this parameter is FALSE, the function creates an auto-reset event object. The system will automatically reset the event state to non-signaled after a single waiting thread has been released, i.e., after WaitForMultipleObjects or WaitForSingleObject or other wait functions that explicitly wait for this event to become signaled.
These auto-reset events are very reliable and easy to use.
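For readers who want to see the auto-reset behaviour spelled out, here is a small model of it in Python. This is not how Windows implements events, and `AutoResetEvent` is a hypothetical class built on `threading.Condition`; it only mirrors the rule described above, that a successful wait releases one waiter and puts the event back into the non-signaled state.

```python
import threading


class AutoResetEvent:
    def __init__(self):
        self._cond = threading.Condition()
        self._signaled = False

    def set(self):
        with self._cond:
            self._signaled = True
            self._cond.notify()            # wake at most one waiting thread

    def wait(self, timeout=None):
        with self._cond:
            if not self._cond.wait_for(lambda: self._signaled, timeout):
                return False               # timed out without being signaled
            self._signaled = False         # the successful wait consumes the signal
            return True


event = AutoResetEvent()
event.set()
first = event.wait(timeout=0)     # consumes the signal
second = event.wait(timeout=0)    # nothing left to consume
print(first, second)
```

Note that there is no reset method at all: the only way the event becomes non-signaled is by a wait succeeding, which is what removes the ResetEvent race from the flow.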
Let me make a few additional notes on the PulseEvent. Even Microsoft has admitted that PulseEvent is unreliable and should not be used -- see https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx -- because only those threads will be notified that are in the "wait" state when PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is, and, even if you are responsible for the program flow, the state can be changed by the operating system contrary to your program logic. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call (APC) and returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode APC at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
The Manual-Reset Events
The manual-reset events are not that bad. :-) You can reliably use them when you need to notify multiple instances of a global state change that occurs only once, for example, application exit. The auto-reset events can only be used to notify one thread (because if several threads are waiting simultaneously for an auto-reset event and you set the event, one random thread will exit its wait and reset the event, but the behavior of the remaining threads that also wait for the event will be undefined). From the Microsoft documentation, we may assume that one and only one thread will exit while others would definitely not exit, but this is not very explicitly articulated in the documentation. Anyway, we must take the following quote into consideration: "Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order" Source - https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx
So, when you need to notify all the threads quickly, just set the manual-reset event to the signaled state, rather than signaling each auto-reset event for each thread. Once you have signaled the manual-reset event, do not call ResetEvent afterwards. The drawback of this solution is that the threads need to have an additional event handle passed in the array of their WaitForMultipleObjects. The array size is limited to MAXIMUM_WAIT_OBJECTS, which is 64, although we never came close to this limit in practice.
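The "set once, never reset" broadcast can be sketched like this. It is a Python analog, not Win32 code: `threading.Event` stays set until `clear()` is called, which is the manual-reset behaviour, and the names `shutdown` and `worker` are just for illustration. A real worker would wait on this handle together with its own auto-reset event.

```python
import threading

shutdown = threading.Event()      # stays signaled once set, like a manual-reset event
woken = []
woken_lock = threading.Lock()


def worker(name):
    shutdown.wait()               # every waiter is released by the single set() below
    with woken_lock:
        woken.append(name)


threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(4)]
for t in threads:
    t.start()

shutdown.set()                    # one call notifies all current and future waiters
for t in threads:
    t.join(timeout=5)

print(sorted(woken))
```

Since the event is never cleared, a thread that starts waiting late still sees it signaled, so nobody can miss the exit notification.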
You can get more ideas about auto-reset events and manual reset events from https://www.codeproject.com/Articles/39040/Auto-and-Manual-Reset-Events-Revisited


set a deadline for each callback in an event-driven/ event-loop based program

In a typical ASIO or event-based programming library like libevent, is there a way to set a deadline for each callback?
I am worried about possible infinite loops within the callbacks. Is there a way to gracefully detect them, remove the misbehaving callback from task queue and continue processing other tasks in the queue?
I can think of a way to detect it through an external thread, kill the event-loop thread, and create a different thread, but I am trying to see if there are any other commonly used methods. I believe this is a problem that someone has faced at some point and thought through a solution.
There is no general way to unstick a thread without its cooperation, whether it's running a callback or not. The thread may hold critical locks or may have acquired resources that would never get released if the thread was somehow coerced to stop from the outside.
If you really do need this functionality, then all code that could potentially be interrupted must be designed to support some specific method of interruption. You can start a deadline timer when you enter the callback and cancel it when you're finished. The deadline timer would have to trigger the thread's interruption mechanism. You'd need at least one other thread running the I/O service in order for some thread to run the timer handler while the callback was running in another thread.
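A minimal sketch of that idea, in Python rather than ASIO or libevent: the deadline timer runs on a second thread and triggers a cooperative interruption mechanism, here simply a flag the callback must poll. `run_with_deadline` and the two callbacks are hypothetical names, and a callback that never checks the flag would still hang.

```python
import threading
import time


def run_with_deadline(callback, seconds):
    cancel = threading.Event()
    timer = threading.Timer(seconds, cancel.set)   # fires on its own thread
    timer.start()                                  # start the deadline on entry
    try:
        return callback(cancel)
    finally:
        timer.cancel()                             # harmless if it already fired


def stuck_callback(cancel):
    # Would loop forever, except that it checks the interruption flag.
    while not cancel.is_set():
        time.sleep(0.01)
    return "interrupted"


def quick_callback(cancel):
    return "done"


print(run_with_deadline(quick_callback, 0.2))
print(run_with_deadline(stuck_callback, 0.2))
```

The important part is the contract, not the timer: every piece of code that may be interrupted has to be written to notice the request and unwind cleanly.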
You can also isolate the code in its own process with some kind of wrapper. Then if the code fails to terminate, you can kill the process from the outside.
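Process isolation might look roughly like this, assuming the untrusted code can be launched as a separate program. The example uses Python's standard `subprocess` module; `run_isolated` is a made-up wrapper name and the child programs are trivial stand-ins.

```python
import subprocess
import sys


def run_isolated(source, timeout_seconds):
    """Run Python source in a child process; return True if it finished in time."""
    try:
        subprocess.run([sys.executable, "-c", source],
                       timeout=timeout_seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False


finished = run_isolated("pass", 10)
hung = run_isolated("while True: pass", 0.5)
print(finished, hung)
```

Killing the child is safe in a way that killing a thread is not, because the operating system reclaims the child's locks and memory along with the process.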

How do you detect that a TEvent has been set?

The Delphi XE2 documentation says this about TEvent:
Sometimes, you need to wait for a thread to finish some operation rather than waiting for a particular thread to complete execution. To do this, use an event object. Event objects (System.SyncObjs.TEvent) should be created with global scope so that they can act like signals that are visible to all threads.
When a thread completes an operation that other threads depend on, it calls TEvent.SetEvent. SetEvent turns on the signal, so any other thread that checks will know that the operation has completed. To turn off the signal, use the ResetEvent method.
For example, consider a situation where you must wait for several threads to complete their execution rather than a single thread. Because you don't know which thread will finish last, you can't simply use the WaitFor method of one of the threads. Instead, you can have each thread increment a counter when it is finished, and have the last thread signal that they are all done by setting an event.
The Delphi documentation does not, however, explain how another thread can detect that TEvent.SetEvent was called. Could you please explain how to check whether TEvent.SetEvent was called?
If you want to test if an event is signaled or not, call the WaitFor method and pass a timeout value of 0. If the event is set, it will return wrSignaled. If not, it will time out immediately and return wrTimeout.
Having said that, the normal usage of an event is not to check whether it's signaled in this manner, but to synchronize by blocking the current thread until the event is signaled. You do this by passing a nonzero value to the timeout parameter, either the constant INFINITE if you're certain that it will finish and you want to wait until it does, or a smaller value if you don't want to block for an indefinite amount of time.
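The same two usage patterns, polling with a zero timeout versus blocking, can be illustrated with Python's `threading.Event`. This is only an analog of TEvent, not Delphi code: `wait` returns `True` or `False` where WaitFor returns wrSignaled or wrTimeout, and the timer thread stands in for "some other thread finishing its work".

```python
import threading

done = threading.Event()

# Polling: a zero timeout returns at once and just reports the current state.
polled_before = done.wait(timeout=0)
done.set()
polled_after = done.wait(timeout=0)

# Normal usage: block until another thread signals, with an upper bound on the wait.
done.clear()
threading.Timer(0.1, done.set).start()   # another thread signals shortly afterwards
blocked_result = done.wait(timeout=5)

print(polled_before, polled_after, blocked_result)
```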

some information on timer_helper_thread() of librt.so.1

Can anybody give some information on the timer_helper_thread() function of librt.so.1?
I am using the POSIX timer_create() function in my application for timer functionality, and I am using SIGEV_THREAD for notification. When the timeout happens, I can see in gdb that two threads are created. One is the thread whose start function I specified, and the other is a thread whose start function is timer_helper_thread() of librt.so.1. Of these two, timer_helper_thread() does not exit even after my thread exits. Can anybody tell me when timer_helper_thread() will exit, and give some information on it?
Short answer: don't worry about it; it's an implementation detail and will clean up after itself when your program exits. But if you're curious...
From glibc's timer_create(2) man page:
SIGEV_THREAD:
Upon timer expiration, invoke sigev_notify_function as if it were the start function of a new thread. (Among the implementation possibilities here are that each timer notification could result in the creation of a new thread, or that a single thread is created to receive all notifications.)
And also:
The functionality for SIGEV_THREAD is implemented within glibc, rather than the kernel.
So glibc (i.e. librt.so) assumes that the kernel cannot create a thread in response to a timer event -- that all it supports is sending a signal. So someone needs to receive that signal and create the handler thread. If you wanted to muck with the details of receiving the signal yourself, you wouldn't have used SIGEV_THREAD, so glibc doesn't bother you and instead creates its own thread just for handling timer events.
This timer helper thread lasts from the first time you call timer_create() until your program ends. Unless you're doing something unusual, you don't need to worry about it; it will clean up after itself when your program exits. The only thing it does is wait for a timer to expire, so it's not using up any extra processing power. Furthermore, it looks like there will only ever be the one helper thread, no matter how many timers you create.
@jander: Your comment here is interesting: "This timer helper thread lasts from the first time you call timer_create() until your program ends."
A thread is created every time a timer expires. Is this the same as the timer_helper_thread() you mention?
I have a similar post where I observe a separate thread created only for timer_create(). Would this be the timer_helper_thread()?
Ref: New thread on invocation of timer_create()

What's the best way to signal threads that sleep or block to stop?

I've got a service that I need to shut down and update. I'm having difficulties with this in two different cases:
I have some threads that sleep for large amounts of time. Obviously I can't wait for them to wake up to finish shutting down the service. I had a thought to use an AutoResetEvent that gets set by some controller thread when the sleep interval is up (by just checking every two seconds or something), and triggering it immediately at OnClose time. Is there a better way to facilitate that?
I have one thread that makes a call to a blocking method call (one which I cannot modify). How do you signal such a thread to stop?
I'm not sure if I understood your first question correctly, but have you looked at using WaitForSingleObject as an alternative to Sleep? You can specify a timeout as well as an object to wait on, so if you want it to wake up earlier, just signal the object.
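The "wait with a timeout instead of sleeping" idea looks roughly like this in Python, where `Event.wait(timeout)` plays the part of WaitForSingleObject. The names `stop_requested` and `sleepy_worker` and the one-hour interval are made up for the example.

```python
import threading
import time

stop_requested = threading.Event()


def sleepy_worker(interval_seconds, log):
    while True:
        # Sleeps for the interval, but returns True immediately if signaled.
        if stop_requested.wait(timeout=interval_seconds):
            log.append("stopped early")
            return
        log.append("did periodic work")


log = []
thread = threading.Thread(target=sleepy_worker, args=(3600, log))
thread.start()

started = time.monotonic()
stop_requested.set()              # what the shutdown code would do
thread.join(timeout=5)
elapsed = time.monotonic() - started
print(log, f"{elapsed:.2f}s")
```

No controller thread polling every two seconds is needed: the timeout itself is the sleep interval, and the signal cuts it short.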
What exactly do you mean by "call to a blocking thread"? Or did you just mean a blocking call? In general, there isn't a way to interrupt a thread without forcefully terminating it. However, if the call is a system call, there might be ways to return control by making the call fail, eg. cancelling I/O or closing an associated handle.
For 1., you can get your threads into an interruptible sleep by using SleepEx rather than Sleep. Once they get this shutdown kick (initiated from your termination logic using QueueUserAPC), you can detect that it happened using the return code from SleepEx and terminate those threads accordingly. This is similar to the suggestion to use WaitForSingleObject, but you don't need another per-thread handle that's just used to terminate the associated thread.
The return value is zero if the specified time interval expired.
The return value is WAIT_IO_COMPLETION if the function returned due to one or more I/O completion callback functions. This can happen only if bAlertable is TRUE, and if the thread that called the SleepEx function is the same thread that called the extended I/O function.
For 2., that's a tough one unless you have access to some resource used in that thread that can cause the blocking call to abort in such a way that the calling thread can handle it cleanly. You may just have to implement code to kill that thread with extreme prejudice using TerminateThread (probably this should be the last thing you do before exiting the process) and see what happens under test.
An easy and reliable solution is to kill the service process. A process is the memory-safe abstraction of the OS, after all, so you can safely terminate one without regard for process-internal state - of course, if your process is communicating or fiddling with external state, all bets are off...
Additionally, you could implement the solution which OS's themselves commonly do: one warning signal asking the process to clean up as best possible (which sets a flag and gracefully exits what can be gracefully stopped), and then forceful termination if the process doesn't exit by itself (which ends pesky things like blocking I/O).
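The two-stage shutdown (warn first, then force) can be sketched with Python's `subprocess` module. This is an illustration under assumptions, not a service implementation: `stop_process` is a hypothetical helper and the child is a stand-in that just sleeps. On POSIX `terminate()` sends SIGTERM and `kill()` sends SIGKILL; on Windows both end the process forcefully, so a real Windows service would deliver the polite request through its own stop notification instead.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds):
    proc.terminate()                  # stage 1: ask the process to exit
    try:
        proc.wait(timeout=grace_seconds)
        return "exited after warning"
    except subprocess.TimeoutExpired:
        proc.kill()                   # stage 2: forceful termination
        proc.wait()
        return "killed"


child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
outcome = stop_process(child, grace_seconds=5)
print(outcome, child.returncode)
```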
All services should be built such that forceful termination isn't harmful, since these processes are system managed and may be terminated by things such as a reboot - i.e., your service ideally should permit this without corrupting storage anyhow.
Oh, and one final warning: Windows services may share a process (I presume for efficiency, though it strikes me as an avoidable optimization), so if you go this route, you want to make sure your service is not sharing a process with other services. You can ensure this by passing the option SERVICE_WIN32_OWN_PROCESS to ChangeServiceConfig.

Is SetEvent atomic?

Is it safe to have 2 or more threads call the Win32 API's SetEvent on the same event handle without protecting the call with a critical section?
It's safe, but remember that if one thread Sets it, and another thread Sets it at the same time, you're not going to get two notifications, just one; since the 2nd one changed it from True to...True. If you're worried about this, use Semaphores instead.
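The difference between an event and a semaphore here is easy to demonstrate. The sketch below uses Python's `threading` primitives as stand-ins for the Win32 objects; clearing the event after each successful wait imitates an auto-reset event, and the counter names are made up.

```python
import threading

# Two sets on an event collapse into a single notification.
event = threading.Event()
event.set()
event.set()
event_wakeups = 0
while event.wait(timeout=0):
    event.clear()                 # imitate an auto-reset event consuming the signal
    event_wakeups += 1

# Two releases on a semaphore are counted separately.
semaphore = threading.Semaphore(0)
semaphore.release()
semaphore.release()
semaphore_wakeups = 0
while semaphore.acquire(blocking=False):
    semaphore_wakeups += 1

print(event_wakeups, semaphore_wakeups)
```

So if every signal must correspond to exactly one wake-up, the count has to live somewhere: either in a semaphore or in a queue that the woken thread drains.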
Assuming you have multiple threads waiting on the same event, running the same code:
If your code doesn't clear the event until it's done processing, you effectively have a CS. Since the event remains signaled until it is cleared (i.e., it is not auto-reset), having multiple threads signal it does nothing except spin the CPU.
If your code clears it at the beginning of processing, or the event is auto-reset, then you would have multiple threads running the same function, which is unsafe if these threads share anything.
There are no restrictions on calling SetEvent from multiple threads.
