select system call occasionally not failing with EINTR - linux

Using timer_create, we deliver a real-time signal to a thread that waits in the select function.
This signal is caught and handled in the thread. Based on the fact that select will be interrupted when a signal is caught, I have some logic implemented for the case where select fails with error number EINTR.
This works fine most of the time, but occasionally I notice that select is not getting interrupted (or somehow the code within the EINTR case is not getting executed).
What are possible reasons for this?

It can be that when the timer expiry signal is delivered you are not waiting in select, hence it does not return EINTR.
If you want to receive EINTR only when the thread is blocked in select, you may block that signal in the thread using pthread_sigmask and use pselect or epoll_pwait that would unblock that signal while waiting only. This way the rest of your code does not need to be concerned with handling EINTR.
If you have more than one thread in the process make sure you block that signal in all other threads, so that only one thread gets delivered that signal. See Signal Concepts for more details.
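As a rough sketch of the pthread_sigmask plus pselect approach described above (the function names block_and_handle and wait_unblocking, and the flag timer_signal_seen, are made up for illustration and are not from the question), something like this should work under the stated assumptions:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t timer_signal_seen = 0;

static void on_timer_signal(int signo)
{
    (void)signo;
    timer_signal_seen = 1;
}

/* Hypothetical helper: install the handler and block `signo` in the calling
 * thread. Returns 0 on success, -1 on failure. */
int block_and_handle(int signo)
{
    struct sigaction sa = {0};
    sigset_t block;

    assert(signo > 0);
    sa.sa_handler = on_timer_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, signo);
    return pthread_sigmask(SIG_BLOCK, &block, NULL) == 0 ? 0 : -1;
}

/* Hypothetical helper: wait on read descriptors like select(), but with
 * `signo` unblocked only for the duration of the wait. Returns whatever
 * pselect() returns; an interruption shows up as -1 with errno == EINTR. */
int wait_unblocking(int nfds, fd_set *readfds,
                    const struct timespec *timeout, int signo)
{
    sigset_t wait_mask;

    /* Start from the thread's current mask and remove only `signo`. */
    pthread_sigmask(SIG_BLOCK, NULL, &wait_mask);
    sigdelset(&wait_mask, signo);
    return pselect(nfds, readfds, NULL, NULL, timeout, &wait_mask);
}
```

With this arrangement a signal that arrives while the thread is busy elsewhere stays pending and interrupts the next wait, instead of being handled at a point where nobody checks for EINTR.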
A more elegant option (IMO) is to avoid using timer_create and rather pass the delay to the next timer expiry as select time-out argument (this is what libevent does). But that requires you to maintain your own min-heap of timers.
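To illustrate the time-out idea, here is a minimal sketch that computes the select time-out from a set of timer deadlines. It assumes the deadlines are absolute millisecond values on the same clock as now_ms, and it uses a plain array with a linear scan instead of a min-heap to keep it short; next_select_timeout is a hypothetical name, not a libevent function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Fill `tv` with the delay until the earliest deadline and return `tv`,
 * or return NULL when there are no timers (passing NULL to select() makes
 * it block indefinitely). A deadline already in the past yields a zero
 * time-out so select() only polls. */
struct timeval *next_select_timeout(const int64_t *deadlines_ms, size_t count,
                                    int64_t now_ms, struct timeval *tv)
{
    assert(tv != NULL);
    if (count == 0)
        return NULL;

    /* Linear scan for the earliest deadline; a min-heap would give this
     * in O(1) at the cost of more bookkeeping. */
    int64_t earliest = deadlines_ms[0];
    for (size_t i = 1; i < count; i++) {
        if (deadlines_ms[i] < earliest)
            earliest = deadlines_ms[i];
    }

    int64_t delay_ms = earliest > now_ms ? earliest - now_ms : 0;
    tv->tv_sec = delay_ms / 1000;
    tv->tv_usec = (delay_ms % 1000) * 1000;
    return tv;
}
```

The returned pointer can be passed directly as the last argument of select; when select returns 0, the timers whose deadlines have passed are due.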

Related

pause() system call and receiving a SIGINT signal

I'm a beginner in Linux and Process signal handling.
Let's say we have a process A and it executes the pause() function; we know that this puts the current process to sleep until a signal is received by the process.
But when we type Ctrl-C, the kernel sends a SIGINT to process A, and when A receives the signal, it executes SIGINT's default action, which is terminating the current process. So my question is:
Does the process A resume first or handler get executed first?
For simplicity, let's assume process A has only a single thread, which is blocking in a pause() call, and exactly one signal gets sent to the process.
Does the process A resume first or handler get executed first?
The signal handler gets executed first, then the pause() call returns.
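A small sketch that checks this ordering, assuming a handler is installed (with the default SIGINT action the process would simply terminate). It uses SIGALRM and alarm(1) as a stand-in for the externally sent signal; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

static void on_alarm(int signo)
{
    (void)signo;
    handler_ran = 1;
}

/* Returns 1 if the handler had already run by the time pause() returned
 * with EINTR, 0 otherwise, and -1 if the handler could not be installed. */
int handler_runs_before_pause_returns(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    handler_ran = 0;
    alarm(1);                 /* deliver SIGALRM in about one second */
    int rc = pause();         /* sleeps until a handler has returned */
    int saved_errno = errno;

    assert(rc == -1);         /* pause() only ever returns -1 */
    return handler_ran == 1 && saved_errno == EINTR;
}
```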
What if there are multiple signals?
Standard signals are not queued, so if you send say two INT signals to the process very quickly in succession, only one of them is delivered.
If there are multiple signals, the order is unspecified.
What about POSIX realtime signals? (SIGRTMIN+0 to SIGRTMAX-0)
They are just like standard named signals, except they are queued (to a limit), and if more than one of them is pending, they get delivered in increasing numerical order.
If there are both standard and realtime signals pending, it is unspecified which ones get delivered first; although in practice, in Linux and many other systems, the standard signals get delivered first, then the realtime ones.
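The queueing difference can be observed with a sketch like the following, which assumes Linux behaviour and a single-threaded process. It blocks the signal, sends it several times with sigqueue, and then collects pending instances with sigtimedwait; both function names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Block `signo`, send it to this process `times` times, then count how
 * many pending instances can be collected. Returns -1 if sending fails. */
int count_queued_instances(int signo, int times)
{
    sigset_t set;
    union sigval value;
    struct timespec no_wait = {0, 0};
    int collected = 0;

    assert(times >= 0);
    sigemptyset(&set);
    sigaddset(&set, signo);
    sigprocmask(SIG_BLOCK, &set, NULL);

    value.sival_int = 0;
    for (int i = 0; i < times; i++) {
        if (sigqueue(getpid(), signo, value) != 0)
            return -1;
    }

    /* A zero time-out makes sigtimedwait() fail once nothing is pending. */
    while (sigtimedwait(&set, NULL, &no_wait) == signo)
        collected++;
    return collected;
}

/* Block two signals, queue them in the order (a, b), and return the one
 * that is accepted first. */
int first_accepted(int a, int b)
{
    sigset_t set;
    union sigval value;
    struct timespec no_wait = {0, 0};

    sigemptyset(&set);
    sigaddset(&set, a);
    sigaddset(&set, b);
    sigprocmask(SIG_BLOCK, &set, NULL);

    value.sival_int = 0;
    if (sigqueue(getpid(), a, value) != 0 || sigqueue(getpid(), b, value) != 0)
        return -1;

    int first = sigtimedwait(&set, NULL, &no_wait);
    /* Drain the remaining instance so nothing stays pending. */
    while (sigtimedwait(&set, NULL, &no_wait) > 0)
        ;
    return first;
}
```

On Linux I would expect count_queued_instances(SIGUSR1, 3) to report a single instance and count_queued_instances(SIGRTMIN, 3) to report all three, and first_accepted(SIGRTMIN + 1, SIGRTMIN) to return SIGRTMIN even though it was queued second.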
What if there are multiple threads in the process?
The kernel will pick one thread among those that do not have the signal masked (via sigprocmask() or pthread_sigmask()), and use that thread to deliver the signal to the signal handler.
If more than one thread is blocked in a pause() call, one of them gets woken up. If more than one signal is pending, it is unspecified whether the one woken thread handles them all, or whether more than one thread is woken up.
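One common way to take the guesswork out of which thread gets the signal is to block it everywhere and accept it in a single dedicated thread with sigwait. This is a minimal sketch of that pattern, not something prescribed by the man pages; the names signal_thread and receive_in_dedicated_thread are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Written by the signal thread, read by the creator after pthread_join(). */
static int received_signo = 0;

static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    int signo = 0;

    /* sigwait() accepts a pending blocked signal without running a handler. */
    if (sigwait(set, &signo) == 0)
        received_signo = signo;
    return NULL;
}

/* Returns the signal number accepted by the dedicated thread, or -1 on error. */
int receive_in_dedicated_thread(int signo)
{
    sigset_t set;
    pthread_t tid;

    assert(signo > 0);
    sigemptyset(&set);
    sigaddset(&set, signo);

    /* Block before creating threads: new threads inherit the creator's mask,
     * so no thread has the signal unblocked. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, &set) != 0)
        return -1;

    /* Process-directed signal: it stays pending until a thread accepts it. */
    if (kill(getpid(), signo) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return received_signo;
}
```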
In general, I warmly recommend reading the man 7 signal, man 7 signal-safety, man 2 sigaction, man 2 sigqueue, and man 2 sigwaitinfo man pages. (While the links go to the Linux man pages project, each of the pages includes a Conforming To section naming the related standards, and Linux-specific behaviour is clearly marked.)

SetEvent ResetEvent WaitForMultipleObjectsEx - Race condition?

I am not able to understand the PulseEvent or race condition. But to avoid it I am trying to SetEvent instead, and ResetEvent every time before WaitForMultipleObjectsEx.
This is my flow:
Thread One - Uses CreateEvent to create an auto-resetting event, I then spawn and tell Thread TWO about it.
Thread One - Tell thread TWO to run.
Thread TWO will do ResetEvent on event and then immediately start WaitForMultipleObjectsEx on the event and some other stuff for file watching. If WaitForMultipleObjectsEx returns, and it is not due to the event, then restart the loop immediately. If WaitForMultipleObjectsEx returns, due to event going to signaled, then do not restart loop.
So now imagine this case please:
Thread TWO - loop is running
Thread One - needs to add a path, so it does (1) SetEvent, and then (2) sends another message to thread 2 to add a path, and then (3) sends message to thread 2 to restart loop.
The messages of add path and restart loop will not come in to Thread TWO unless I stop the loop in TWO, which is done by the SetEvent. Thread TWO will see it was stopped due to the event, and so it won't restart the loop. So it will now get the message to add path, so it will add path, then restart loop.
Thread One - needs to stop the thread, so it does (1) SetEvent and then (2) waits for a message from thread 2; when it gets that message it will terminate the thread.
Will this avoid race condition?
Thank you
Suppose the loop needs to be interrupted twice in succession. You're imagining a sequence of events something like this, on thread ONE and thread TWO:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread TWO reads the message "restart the wait loop".
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread ONE sends message related to the second interruption.
Thread TWO stops the loop, receives the message about the second interruption.
But since you don't have any control over the timing between the two threads, it might instead happen like this:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread TWO reads the message "restart the wait loop".
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE sends a message about the second interruption, but TWO isn't listening!
Even if the message passing mechanism is synchronous, so that ONE won't continue until TWO has read the message, it could happen this way:
Thread ONE realizes that the first interruption is complete.
Thread ONE sends a message telling TWO to restart the wait loop.
Thread TWO reads the message "restart the wait loop", but is then swapped out.
Thread ONE now realizes that another interruption is needed.
Thread ONE sets the event to ask for another interruption.
Thread TWO resets the event.
Thread TWO starts waiting.
Thread ONE sends a message about the second interruption, but TWO isn't listening!
(Obviously, a similar thing can happen if you use PulseEvent.)
One quick solution would be to use a second event for TWO to signal ONE at the appropriate point, i.e., after resetting the main event but before waiting on it, but that seems somewhat inelegant and also doesn't generalize very well. If you can guarantee that there will never be two interruptions in close-enough succession, you might simply choose to ignore the race condition, but note that it is difficult to reason about this because there is no theoretical limit to how long it might take for thread TWO to resume running after being swapped out.
The various alternatives depend on how the messages are being passed between the threads and any other constraints. [If you can provide more information about your current implementation I'll update my answer accordingly.]
This is an overview of some of the more obvious options.
If the message-passing mechanism is synchronous (if thread ONE waits for thread TWO to receive the message before proceeding) then using a single auto-reset event should just work. Thread ONE won't set the event until after thread TWO has received the restart-loop message. If the event is already set when thread TWO starts waiting, that just means that there were two interruptions in immediate succession; TWO will never stall waiting for a message that isn't coming. [This potential stall is the only reason I can think of why you might not want to use an auto-reset event. If you have another concern, please edit your question to provide more details.]
If it is OK for sending a message to be non-blocking, and you aren't already locked in to a particular solution, any of these options would probably be sensible:
User mode APCs (the QueueUserAPC function) provide a message-passing mechanism that automatically interrupts alertable waits.
You could implement a simple queue (protected by a critical section) which uses an event to indicate whether there is a message pending or not. In this case you can safely use a manual-reset event provided that you only manipulate it when you hold the same critical section that protects the queue.
You could use an auto-reset event in combination with any sort of thread-safe queue, provided only that the queue allows you to test for emptiness without blocking. The idea here is that thread ONE would always insert the message into the queue before setting the event, and if thread TWO sees that the event is set but it turns out that the queue is empty, the event is ignored. If efficiency is a concern, you might even be able to find a suitable lock-free queue implementation. (I don't recommend attempting that yourself.)
(All of those mechanisms could also be made synchronous by using a second event object.)
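To make the "queue plus event, both manipulated under the same lock" option more concrete, here is a portable sketch of the discipline. Note that it is a POSIX analogue and not Win32 code: a pthread mutex stands in for the critical section, and a boolean flag with a condition variable stands in for the manual-reset event. The type msg_queue and the function names are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

typedef struct {
    pthread_mutex_t lock;      /* stands in for the critical section */
    pthread_cond_t  changed;   /* lets a waiter sleep until `signaled` is true */
    bool            signaled;  /* stands in for the manual-reset event state */
    int             items[QUEUE_CAPACITY];
    size_t          head, count;
} msg_queue;

void queue_init(msg_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->signaled = false;
    q->head = q->count = 0;
}

/* Sender side: enqueue and signal under the same lock, so the flag can
 * never say "message pending" while the queue is empty. */
bool queue_push(msg_queue *q, int msg)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->items[(q->head + q->count) % QUEUE_CAPACITY] = msg;
        q->count++;
        q->signaled = true;                  /* the "SetEvent" step */
        pthread_cond_broadcast(&q->changed);
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Receiver side: block until signaled, take one message, and reset the
 * flag only when the queue has become empty. */
int queue_pop_wait(msg_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->signaled)
        pthread_cond_wait(&q->changed, &q->lock);

    assert(q->count > 0);                    /* signaled implies non-empty */
    int msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    if (--q->count == 0)
        q->signaled = false;                 /* the "ResetEvent" step */
    pthread_mutex_unlock(&q->lock);
    return msg;
}
```

Because the flag is only ever changed while the lock is held, a message that arrives between "reset" and "start waiting" cannot be lost, which is exactly the race in the original flow.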
I wouldn't recommend the following approaches, but if you happen to already be using one of these for messaging this is how you can make it work:
If you're using named pipes for messaging, you could use asynchronous I/O in thread TWO. Thread TWO would use an auto-reset event internally, you specify the event handle when you issue the I/O call and Windows sets it when I/O arrives. From the point of view of thread ONE, there's only a single operation. From the point of view of thread TWO, if the event is set, a message is definitely available. (I believe this is somewhat similar to your original approach, you just have to issue the I/O call in advance rather than afterwards.)
If you're using a window queue for messaging, the MsgWaitForMultipleObjectsEx() function allows you to wait for a window message and other events simultaneously.
PS:
The other problem with PulseEvent, the one mentioned in the documentation, is that this can happen:
Thread TWO starts waiting.
Thread TWO is preempted by Windows and all user code on the thread stops running.
Thread ONE pulses the event.
Thread TWO is restarted by Windows, and the wait is resumed.
Thread ONE sends a message, but TWO isn't listening.
(Personally I'm a bit disappointed that the kernel doesn't deal with this situation; I would have thought that it would be possible for it to set a flag saying that the wait shouldn't be resumed. But I can only assume that there is a good reason why this is impractical.)
The Auto-Reset Events
Would you please try to change the flow so there is just SetEvent and WaitForMultipleObjectsEx with auto-reset events? You may create more events if you need. For example, each thread will have its own pair of events: one to get notifications and another to report about its state changes - you define the scheme that best suits your needs.
Since there will be auto-reset events, there would be neither ResetEvent nor PulseEvent.
If you will be able to change the logic of the algorithm flow this way - the program will become clear, reliable, and straightforward.
I advise this because this is how our applications work since the times of Windows NT 3.51 – we manage to do everything we need with just SetEvent and WaitForMultipleObjects (without the Ex suffix).
As for the PulseEvent, as you know, it is very unreliable, even though it exists from the very first version of Windows NT - 3.1 - maybe it was reliable then, but not now.
To create the auto-reset events, use the bManualReset argument of the CreateEvent API function (if this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled -- this is not what you need). If this parameter is FALSE, the function creates an auto-reset event object. The system will automatically reset the event state to non-signaled after a single waiting thread has been released, i.e., after WaitForMultipleObjects or WaitForSingleObject or other wait functions that explicitly wait for this event to become signaled.
These auto-reset events are very reliable and easy to use.
Let me make a few additional notes on the PulseEvent. Even Microsoft has admitted that PulseEvent is unreliable and should not be used -- see https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx -- because only those threads will be notified that are in the "wait" state when PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is, and, even if you are responsible for the program flow, the state can be changed by the operating system contrary to your program logic. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call (APC) and returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode APC at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
The Manual-Reset Events
The Manual-Reset events are not that bad. :-) You can reliably use them when you need to notify multiple instances of a global state change that occurs only once, for example, application exit. The auto-reset events can only be used to notify one thread (because if more threads are waiting simultaneously for an auto-reset event and you set the event, one random thread will exit the wait and reset the event, but the behavior of the remaining threads that also wait for the event will be undefined). From the Microsoft documentation, we may assume that one and only one thread will exit while others would definitely not exit, but this is not very explicitly articulated in the documentation. Anyway, we must take the following quote into consideration: "Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order" Source - https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx
So, when you need to notify all the threads quickly, just set the manual-reset event to the signaled state, rather than signaling each auto-reset event for each thread. Once you have signaled the manual-reset event, do not call ResetEvent afterwards. The drawback of this solution is that the threads need to have an additional event handle passed in the array of their WaitForMultipleObjects. The array size is limited to MAXIMUM_WAIT_OBJECTS, which is 64, although we have never come close to this limit in practice.
You can get more ideas about auto-reset events and manual reset events from https://www.codeproject.com/Articles/39040/Auto-and-Manual-Reset-Events-Revisited

pthread_sigmask not working properly with aio callback threads

My application is sometimes terminating from SIGIO or SIGUSR1 signals even though I have blocked these signals.
My main thread starts off with blocking SIGIO and SIGUSR1, then makes 2 AIO read operations. These operations use threads to get notification about operation status. The notify functions (invoked as detached threads) start another AIO operation (they manipulate the data that has been read and start writing it back to the file) and notification is handled by sending signal (one operation uses SIGIO, the other uses SIGUSR1) to this process. I am receiving these signals synchronously by calling sigwait in the main thread. Unfortunately, sometimes my program crashes, being stopped by SIGUSR1 or SIGIO signal (which should be blocked by a sigmask).
One possible solution is to set SIG_IGN handlers for them but this doesn't solve the problem. Their handlers shouldn't be invoked, rather should they be retrieved from pending signals by sigwait in the next iteration of the main program loop.
I have no idea which thread handles this signal in this manner. Maybe it's the init who receives this signal? Or some shell thread? I have no idea.
I'd hazard a guess that the signal is being received by one of your AIO callback threads, or by the very thread which generates the signal. (Prove me wrong and I'll delete this answer.)
Unfortunately per the standard, "[t]he signal mask of [a SIGEV_THREAD] thread is implementation-defined." For example, on Linux (glibc 2.12), if I block SIGUSR1 in main, then contrive to run a SIGEV_THREAD handler from an aio_read call, the handler runs with SIGUSR1 unblocked.
This makes SIGEV_THREAD handlers unsuitable for an application that must reliably and portably handle signals.

Interrupt while placing process on the waiting queue

Suppose there is a process that is trying to enter the critical region, but since it is occupied by some other process, the current process has to wait for it. So, at the time when the process is getting added to the waiting queue of the semaphore, suppose an interrupt comes (e.g., the battery runs out); then what will happen to that process and the waiting queue?
I think that since the battery has run out, this interrupt will have the highest priority, and so the context of the process which was placing the process on the waiting queue would be saved and the interrupt service routine for this interrupt would be executed.
And then it will return to the process that was placing the process on the queue.
Please give some hints/suggestions for this question.
This is very hardware / OS dependent, however a few thoughts:
As has been mentioned in the comments, a ‘battery finished’ interrupt may be considered as a special case, simply because the machine may turn off without taking any action, in which case the processes + queue will disappear. In general however, assuming a non-fatal interrupt and an OS that suspends / resumes correctly, I think it’s unlikely there will be any noticeable impact to the execution of either process.
In a multi-core setup, the process may not be immediately suspended. The interrupt could be handled by a different core and neither of the processes you’ve mentioned would be any the wiser.
In a pre-emptive multitasking OS there's also no guarantee that the process adding to the queue would be resumed immediately after the interrupt, the scheduler could decide to activate the process currently in the critical section or another process entirely. What would happen when the process adding itself to the semaphore wait queue resumed would depend on how far through adding it was, how the queue has been implemented and what state the semaphore was in. It may be that it never gets on to the wait queue because it detects that the other process has already woken up and left the critical section, or it may be that it completes adding itself to the queue and suspends as if nothing had happened…
In a single core/processor machine with a cooperative multitasking OS, I think the scenario you’ve described in your question is quite likely, with the executing process being suspended to handle the interrupt and then resumed afterwards until it finished adding itself to the queue and yielded.
It depends on the implementation, but conceptually the same operating process should be performing both the addition of the process to the wait queue and the management of the interrupts, so your process being moved to wait would instead be treated as interrupted from the wait queue.
For Java, see the API for Thread.interrupt()
Interrupts this thread.
Unless the current thread is interrupting itself, which is always permitted, the checkAccess method of this thread is invoked, which may cause a SecurityException to be thrown.
If this thread is blocked in an invocation of the wait(), wait(long), or wait(long, int) methods of the Object class, or of the join(), join(long), join(long, int), sleep(long), or sleep(long, int), methods of this class, then its interrupt status will be cleared and it will receive an InterruptedException.
If this thread is blocked in an I/O operation upon an interruptible channel then the channel will be closed, the thread's interrupt status will be set, and the thread will receive a ClosedByInterruptException.
If this thread is blocked in a Selector then the thread's interrupt status will be set and it will return immediately from the selection operation, possibly with a non-zero value, just as if the selector's wakeup method were invoked.
If none of the previous conditions hold then this thread's interrupt status will be set.
Interrupting a thread that is not alive need not have any effect.

Linux/vxworks signals

I came across the following in a vxworks manual and was wondering why this is the case.
What types of things do signals do that make them undesirable?
In applications, signals are most appropriate for error and exception handling, and not for a general-purpose inter-task communication.
The main issue with signals is that signal handlers are registered on a per process/memory space basis (in vxWorks, the kernel represents one memory space, and each RTP is a different memory space).
This means that regardless of the thread/task context, the same signal handler will get executed (for a given process). This can cause some problems with side-effects if your signal handler is not well behaved.
For example, if your signal handler uses a mutex to protect a shared resource, this could cause nasty problems, or at least, unexpected behavior:
Task A            Task B                Signal Handler
Take Mutex
...
Gets preempted
                  does something
                  ....
                  <SIGNAL ARRIVES>----->Take Mutex (blocks)
resumes
....
Give Mutex
                                  ----->Resumes Handler
I'm not sure the example above really conveys what I'm trying to.
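One way to keep a handler "well behaved" in the sense above is to do no locking in it at all: the handler only records that the signal arrived, and the task takes the mutex later from its normal context. This is a minimal sketch written against the POSIX API rather than VxWorks-specific calls; all names in it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

static pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;                  /* protected by resource_lock */
static volatile sig_atomic_t work_requested = 0;

/* The handler takes no mutex: it only sets a flag. */
static void on_signal(int signo)
{
    (void)signo;
    work_requested = 1;
}

/* Returns 0 on success, -1 on failure (same convention as sigaction). */
int install_flag_handler(int signo)
{
    struct sigaction sa = {0};

    assert(signo > 0);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from normal task context, e.g. once per loop iteration.
 * Returns 1 if a signal had been recorded and the work was done. */
int process_pending_signal(void)
{
    if (!work_requested)
        return 0;
    work_requested = 0;

    /* Safe to block here: we are not inside the signal handler. */
    pthread_mutex_lock(&resource_lock);
    shared_counter++;
    pthread_mutex_unlock(&resource_lock);
    return 1;
}
```

Since the handler never blocks, it cannot end up waiting on a mutex held by whichever task it happened to interrupt.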
Here are some other characteristics of signals:
Handler not executed until the task/process is scheduled. Just because you sent the signal, doesn't mean the handler will execute right away
No guarantee on which Task/Thread will execute the handler. Any thread/task in the process could run it (whichever thread/task executes first). VxWorks has ways around this.
Note that the above only applies to asynchronous signals sent via a kill call.
An exception will generate a synchronous signal which WILL get executed right away in the current context.

Resources