how dose pthread implement thread cancellation, which system call it used? - multithreading

I know that if a thread is PTHREAD_CANCEL_ENABLE, then it can be canceled, and pthread_cancel(pthread_t thread) will send a cancellation request to the specified thread, and the reaction of the thread depend on the following 2 attributes:
if the cancellation type is PTHREAD_CANCEL_ASYNCHRONOUS, cancellation requests are acted on immediately.
if the cancellation type is PTHREAD_CANCEL_DEFERRED, cancellation requests are acted on as soon as the thread reaches a cancellation point.
I just want to know what happened at the cancellation point and in the signal handler, and how it can respond to the cancellation request at the cancellation point. I know pthreads use signals to implement this, but I do not know the details. This problem bothered me for a long time!!!
Wish you guys could help me, thanks a lot for your answers!!!

Related

how glibc implemented thread cancellation

There are two cancel type in pthread:
PTHREAD_CANCEL_DEFERRED
A cancellation request is deferred until the thread next calls a function that is a cancella‐tion point
PTHREAD_CANCEL_ASYNCHRONOUS
The thread can be canceled at any time.
My problem is how PTHREAD_CANCEL_DEFERRED is implemented and what happened at the cancellation point function like read, write ...
Why does the thread not execute the code after the cancellation point function, how to do this? Whether the operating system supports it?

SetEvent ResetEvent WaitForMultipleObjectsEx - Race condition?

I am not able to understand the PulseEvent or race condition. But to avoid it I am trying to SetEvent instead, and ResetEvent every time before WaitForMultipleObjectsEx.
This is my flow:
Thread One - Uses CreateEvent to create an auto reseting event, I then spawn and tell Thread TWO about it.
Thread One - Tell thread TWO to run.
Thread TWO will do ResetEvent on event and then immediately start WaitForMultipleObjectsEx on the event and some other stuff for file watching. If WaitForMultipleObjectsEx returns, and it is not due to the event, then restart the loop immediately. If WaitForMultipleObjectsEx returns, due to event going to signaled, then do not restart loop.
So now imagine this case please:
Thread TWO - loop is running
Thread One - needs to add a path, so it does (1) SetEvent, and then (2) sends another message to thread 2 to add a path, and then (3) sends message to thread 2 to restart loop.
The messages of add path and restart loop will not come in to Thread TWO unless I stop the loop in TWO, which is done by the SetEvent. Thread TWO will see it was stoped due to the event, and so it wont restart the loop. So it will now get the message to add path, so it will add path, then restart loop.
Thread One - needs to stop the thread, so it does (1) SetEvent and then (2) waits for message thread 2, when it gets that message it will terminate the thread.
Will this avoid race condition?
Thank you
Suppose the loop needs to be interrupted twice in succession. You're imagining a sequence of events something like this, on thread ONE and thread TWO:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread TWO reads the message "restart the wait loop".
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread ONE sends message related to the second interruption.
Thread TWO stops the loop, receives the message about the second interruption.
But since you don't have any control over the timing between the two threads, it might instead happen like this:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread TWO reads the message "restart the wait loop".
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE sends a message about the second interruption, but TWO isn't listening!
Even if the message passing mechanism is synchronous, so that ONE won't continue until TWO has read the message, it could happen this way:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread TWO reads the message "restart the wait loop", but is then swapped out.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE sends a message about the second interruption, but TWO isn't listening!
(Obviously, a similar thing can happen if you use PulseEvent.)
One quick solution would be to use a second event for TWO to signal ONE at the appropriate point, i.e., after resetting the main event but before waiting on it, but that seems somewhat inelegant and also doesn't generalize very well. If you can guarantee that there will never be two interruptions in close-enough succession, you might simply choose to ignore the race condition, but note that it is difficult to reason about this because there is no theoretical limit to how long it might take for thread TWO to resume running after being swapped out.
The various alternatives depend on how the messages are being passed between the threads and any other constraints. [If you can provide more information about your current implementation I'll update my answer accordingly.]
This is an overview of some of the more obvious options.
If the message-passing mechanism is synchronous (if thread ONE waits for thread TWO to receive the message before proceeding) then using a single auto-reset event should just work. Thread ONE won't set the event until after thread TWO has received the restart-loop message. If the event is already set when thread TWO starts waiting, that just means that there were two interruptions in immediate succession; TWO will never stall waiting for a message that isn't coming. [This potential stall is the only reason I can think of why you might not want to use an auto-reset event. If you have another concern, please edit your question to provide more details.]
If is OK for sending a message to be non-blocking, and you aren't already locked in to a particular solution, any of these options would probably be sensible:
User mode APCs (the QueueUserAPC function) provide a message-passing mechanism that automatically interrupts alertable waits.
You could implement a simple queue (protected by a critical section) which uses an event to indicate whether there is a message pending or not. In this case you can safely use a manual-reset event provided that you only manipulate it when you hold the same critical section that protects the queue.
You could use an auto-reset event in combination with any sort of thread-safe queue, provided only that the queue allows you to test for emptiness without blocking. The idea here is that thread ONE would always insert the message into the queue before setting the event, and if thread TWO sees that the event is set but it turns out that the queue is empty, the event is ignored. If efficiency is a concern, you might even be able to find a suitable lock-free queue implementation. (I don't recommend attempting that yourself.)
(All of those mechanisms could also be made synchronous by using a second event object.)
I wouldn't recommend the following approaches, but if you happen to already be using one of these for messaging this is how you can make it work:
If you're using named pipes for messaging, you could use asynchronous I/O in thread TWO. Thread TWO would use an auto-reset event internally, you specify the event handle when you issue the I/O call and Windows sets it when I/O arrives. From the point of view of thread ONE, there's only a single operation. From the point of view of thread TWO, if the event is set, a message is definitely available. (I believe this is somewhat similar to your original approach, you just have to issue the I/O call in advance rather than afterwards.)
If you're using a window queue for messaging, the MsgWaitForMultipleObjectsEx() function allows you to wait for a window message and other events simultaneously.
PS:
The other problem with PulseEvent, the one mentioned in the documentation, is that this can happen:
Thread TWO starts waiting.
Thread TWO is preempted by Windows and all user code on the thread stops running.
Thread ONE pulses the event.
Thread TWO is restarted by Windows, and the wait is resumed.
Thread ONE sends a message, but TWO isn't listening.
(Personally I'm a bit disappointed that the kernel doesn't deal with this situation; I would have thought that it would be possible for it to set a flag saying that the wait shouldn't be resumed. But I can only assume that there is a good reason why this is impractical.)
The Auto-Reset Events
Would you please try to change the flow so there is just SetEvent and WaitForMultipleObjectsEx with auto-reset events? You may create more events if you need. For example, each thread will have its own pair of events: one to get notifications and another to report about its state changes - you define the scheme that best suits your needs.
Since there will be auto-reset events, there would be neither ResetEvent nor PulseEvent.
If you will be able to change the logic of the algorithm flow this way - the program will become clear, reliable, and straightforward.
I advise this because this is how our applications work since the times of Windows NT 3.51 – we manage to do everything we need with just SetEvent and WaitForMultipleObjects (without the Ex suffix).
As for the PulseEvent, as you know, it is very unreliable, even though it exists from the very first version of Windows NT - 3.1 - maybe it was reliable then, but not now.
To create the auto-reset events, use the bManualReset argument of the CreateEvent API function (if this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled -- this is not what you need). If this parameter is FALSE, the function creates an auto-reset event object. The system will automatically reset the event state to non-signaled after a single waiting thread has been released, i.e., after WaitForMultipleObjects or WaitForSingleObject or other wait functions that explicitly wait for this event to become signaled.
These auto-reset events are very reliable and easy to use.
Let me make a few additional notes on the PulseEvent. Even Microsoft has admitted that PulseEvent is unreliable and should not be used -- see https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx -- because only those threads will be notified that are in the "wait" state when PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is, and, even if you are responsible for the program flow, the state can be changed by the operating system contrary to your program logic. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call (APC) and returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode APC at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
The Manual-Reset Events
The Manual-Reset events are not that bad. :-) You can reliably use them when you need to notify multiple instances of a global state change that occurs only once, for example, application exit. The auto-reset events can only be used to notify one thread (because if more threads are waiting simultaneously for an auto-reset event and you set the event, one random thread will exist and will reset the event, but the behavior of the remaining threads that also wait for the event, will be undefined). From the Microsoft documentation, we may assume that one and only one thread will exit while others would definitely not exit, but this is not very explicitly articulated in the documentation. Anyway, we must take the following quote into consideration: "Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order" Source - https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx
So, when you need to notify all the threads quickly – just set the manual-reset event to the signaled state, rather than signaling each auto-reset event for each thread. Once you have signaled the manual-reset event, do not call ResetEvent since then. The drawback of this solution is that the threads need to have an additional event handle passed in the array of their WaitForMultipleObjects. The array size is limited, although, to MAXIMUM_WAIT_OBJECTS, which is 64, we never reached close to this limit in practice.
You can get more ideas about auto-reset events and manual reset events from https://www.codeproject.com/Articles/39040/Auto-and-Manual-Reset-Events-Revisited

will setting pthread_canelState to PTHREAD_CANCEL_DISABLE queue up the cancellation requests?

I am trying to understand Posix threads. In the man page of pthread_cancel(), it is mentioned that "thread’s cancelability state, determined by pthread_setcancelstate(), can be enabled or disabled. If a thread has disabled cancellation, then a cancellation request remains queued until the thread enables cancellation.
But when I was reading about thread cancellation points on http://www.makelinux.net/alp/029, it is mentioned that if we set the cancel type as disabled (uncancellable), the cancellation requests are quietly ignored.
Can any one please let me know whether cancellation requests are getting queued or ignored if we set the cancellation type as DISABLED?
POSIX threads controls thread cancellation by a combination of two binaries variables:
cancellation STATE and cancellation TYPE. The associated functions are pthread_setcancelstate() and pthread_setcanceltype() accordingly.
When the STATE is set to disabled, the cancellation request is ignored.
It is not thrown out, it is suspended (or as you correctly wrote - "queued"), until the STATE is set back to enabled. Since the state is enabled, the OS starts the cancellation process according to the cancellation type. If you have a code that must be executed before a thread is cancelled (e.g. memory de-allocation etc.), you may set the thread cancellation state to disabled, before entering the code, and enable the cancellation exiting the code. The second question is how and when the thread is really stopped (cancelled). The the cancellation type answers this question. If the type is set to (not recommended) asynchronous, the cancellation may occur at the nearest instruction. If the type is set to the default deferred cancellation, the cancellation will occur at the next "cancellation point", a POSIX function that checks thread cancellation status and terminates the thread.

What is the difference between cancelling and exiting a pthread?

The term "cancelling a pthread" and "exiting a pthread" looks confusing.
Can someone help me clearly explain the difference between the two?
P.S: Please don't help me with a link to the man pages, i have already seen that :-)
Added:
1) How are the thread data structures handled, and the cleanup
different in both these cases?
2) When there are signals pending for a thread, how are the pending
signals mask handled in both these cases?
Referring the 1st question in the OP's addendum, verbatim from man pthread_cancel():
When a cancellation requested is acted on, the following steps occur for thread (in this order):
Cancellation clean-up handlers are popped (in the reverse of the order in which they were pushed) and called. (See pthread_cleanup_push(3).)
Thread-specific data destructors are called, in an unspecified order. (See pthread_key_create(3).)
The thread is terminated. (See pthread_exit(3).)
The above steps happen asynchronously with respect to the pthread_cancel() call; the return status of pthread_cancel() merely informs the caller whether the cancellation request was successfully queued.
After a canceled thread has terminated, a join with that thread using pthread_join(3) obtains PTHREAD_CANCELED as the thread's exit status. (Joining with a thread is the only way to know that cancellation has completed.)
The only difference I see is the exit point: For a cancelled thread it's any cancellation point the thread function might pass, else it's pthread_exit(), return or the end of the thread function.
Update (referring the 2nd question):
I'd say if a signal was put into a thread's queue and still is pending when cancellation has finished the signal is lost. I'm not sure but I could imagine that signal processing is going on as long as the thread lives, that is also "during" the cancellation.
All I could find regarding this is from man pthread_exit:
BUGS
Currently, there are limitations in the kernel implementation logic for wait(2) ing on a stopped thread group with a dead thread group leader. This can manifest in problems such as a locked terminal if a stop signal is sent to a foreground process whose thread group leader has already called pthread_exit(3).
All quotes are comming from the Debian (non-free) package manpages-posix-dev (2.16-1) . (The source package is here.)

Status of threads when signal handler runs

Assume a multi-threaded application, with a signal handler defined in it.
Now if a signal is delivered to the PROCESS, and signal handler is invoked - My doubt is what happens to other threads during the period signal handler is running. Do they keep running, as if nothing has happened or they are suspended for that period .. or ...?
Also if someone can tell me WHY to justify the answer?
The specification is pretty clear how signals and threads interact:
Signals generated for the process shall be delivered to exactly one of those threads within the process which is in a call to a sigwait() function selecting that signal or has not blocked delivery of the signal.
As the signal is delivered to exactly one thread, other threads are unaffected (and keep running).
The threads are independent: a signal from one thread to a second thread will not affect any of the others. The why is because they are independent. The only reason why it would affect the others is if the signal handler of the thread in question somehow interacts with other threads.

Resources