How do I suspend another thread (not the current one)? - linux

I'm trying to implement a simulation of a microcontroller. This simulation is not meant to be a clock-cycle-precise representation of one specific microcontroller, but to check the general correctness of the code.
I thought of having a "main thread" executing normal code and a second thread executing ISR code. Whenever an ISR needs to be run, the ISR thread suspends the "main thread".
Of course, I want to have a feature to block interrupts.
I thought of solving this with a mutex that the ISR thread holds whenever it executes ISR code while the main thread holds it as long as "interrupts are blocked".
A POR (power on reset) can then be implemented by not only suspending but killing the main thread (and starting a new one executing the POR function).
The windows API provides the necessary functions.
But it seems to be impossible to do the above with posix threads (on linux).
I don't want to change the actual hardware independent microcontroller code. So inserting anything to check for pending interrupts is not an option.
Receiving interrupts at non-well-behaved points is desirable, as this also happens on microcontrollers (unless you block interrupts).
Is there a way to suspend another thread on linux? (Debuggers must use that option somehow, I think.)
Please, don't tell me this is a bad idea. I know that is true in most circumstances. But the main code does not use standard libs or locks/mutexes/semaphores.

SIGSTOP does not work - it always stops the entire process.
Instead you can use some other signals, say SIGUSR1 for suspending and SIGUSR2 for resuming:
// at process start call init_pthread_suspending to install the handlers
// to suspend a thread use pthread_kill(thread_id, SUSPEND_SIG)
// to resume a thread use pthread_kill(thread_id, RESUME_SIG)
#include <signal.h>

#define RESUME_SIG  SIGUSR2
#define SUSPEND_SIG SIGUSR1

static sigset_t wait_mask;
static __thread volatile sig_atomic_t suspended; // per-thread flag

void resume_handler(int sig)
{
    suspended = 0;
}

void suspend_handler(int sig)
{
    if (suspended) return;
    suspended = 1;
    do sigsuspend(&wait_mask); while (suspended); // sleep until RESUME_SIG clears the flag
}

void init_pthread_suspending()
{
    struct sigaction sa;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SUSPEND_SIG);
    sigdelset(&wait_mask, RESUME_SIG);
    sigfillset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = resume_handler;
    sigaction(RESUME_SIG, &sa, NULL);
    sa.sa_handler = suspend_handler;
    sigaction(SUSPEND_SIG, &sa, NULL);
}
I am very annoyed by replies like "you should not suspend another thread, that is bad".
Guys, why do you assume others are idiots and don't know what they are doing? Imagine that others, too, have heard about deadlocking and still, in full consciousness, want to suspend other threads.
If you don't have a real answer to their question, why do you waste your and the readers' time?
And yes, IMO pthreads are a very short-sighted API, a disgrace for POSIX.

The HotSpot Java VM uses SIGUSR2 to implement suspend/resume for Java threads on Linux.
A procedure based on a signal handler for SIGUSR2 might be:
Providing a signal handler for SIGUSR2 allows a thread to request a lock
(which has already been acquired by the signal-sending thread).
This suspends the thread.
As soon as the suspending thread releases the lock, the signal handler can
(and will?) get the lock. The signal handler releases the lock immediately and
returns.
This resumes the thread.
It will probably be necessary to introduce a control variable to make sure that the main thread is in the signal handler before starting the actual processing of the ISR.
(The details depend on whether the signal handler is called synchronously or asynchronously.)
I don't know, if this is exactly how it is done in the Java VM, but I think the above procedure does what I need.
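A minimal sketch of that procedure (not taken from the HotSpot sources; suspend_lock, isr_suspend_handler and in_handler are illustrative names, and pthread_mutex_lock() is not formally async-signal-safe, so treat this as an outline of the protocol rather than production code):
#include <pthread.h>
#include <signal.h>
#include <sched.h>

static pthread_mutex_t suspend_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t in_handler;   // the control variable mentioned above

static void isr_suspend_handler(int sig)
{
    (void)sig;
    in_handler = 1;                        // tell the ISR thread we are parked
    pthread_mutex_lock(&suspend_lock);     // blocks: the ISR thread holds the lock
    pthread_mutex_unlock(&suspend_lock);   // ISR thread released it -> resume
    in_handler = 0;
}

// ISR thread side:
//   pthread_mutex_lock(&suspend_lock);    // take the lock first
//   pthread_kill(main_thread, SIGUSR2);   // park the main thread
//   while (!in_handler) sched_yield();    // wait until it is really parked
//   ... run the simulated ISR ...
//   pthread_mutex_unlock(&suspend_lock);  // let the main thread resume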

Somehow I think sending the other thread SIGSTOP works.
However, you are far better off writing some thread communication involving semaphores, mutexes and global variables.
You see, if you suspend the other thread in malloc() and you call malloc() -> deadlock.
Did I mention that lots of C standard library functions, let alone other libraries you use, will call malloc() behind your back?
EDIT:
Hmmm, no standard library code. Maybe use setjmp()/longjmp() from a signal handler to simulate the POR, and a signal handler to simulate an interrupt.
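A minimal sketch of that idea, assuming a SIGUSR1 handler simulates the POR by jumping back to a saved point in main() (por_point and por_handler are illustrative names, not part of the original):
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>

static sigjmp_buf por_point;

static void por_handler(int sig)
{
    (void)sig;
    siglongjmp(por_point, 1);   // unwind straight back to the POR point
}

int main(void)
{
    signal(SIGUSR1, por_handler);

    if (sigsetjmp(por_point, 1)) {
        // we arrive here after a simulated power-on reset
    }

    // ... run the hardware-independent microcontroller code here ...
    for (;;)
        pause();
}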
TO THOSE WHO KEEP DOWNVOTING THIS: The answer was accepted for the contents after EDIT, which is a specific scenario that cannot be used in any other scenario.

Solaris has the thr_suspend(3C) call that would do what you want. Is switching to Solaris a possibility?
Other than that, you're probably going to have to do some gymnastics with mutexes and/or semaphores. The problem is that you'll only suspend when you check the mutex, which will probably be at a well-behaved point. Depending on what you're actually trying to accomplish, this might not be desirable.
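For reference, if Solaris is an option, the calls would look roughly like this (a sketch only; thr_suspend(3C) and thr_continue(3C) take the target's thread_t):
#include <thread.h>   /* Solaris threads API */

void park_and_unpark(thread_t target)
{
    thr_suspend(target);    /* target stops at an arbitrary point */
    /* ... run the simulated ISR ... */
    thr_continue(target);   /* target resumes where it was        */
}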

It makes more sense to have the main thread execute the ISRs - because that's how the real controller works (presumably). Just have it check after each emulated instruction if there is both an interrupt pending, and interrupts are currently enabled - if so, emulate a call to the ISR.
The second thread is still used - but it just listens for the conditions which cause an interrupt, and mark the relevant interrupt as pending (for the other thread to later pick up).
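A minimal sketch of that arrangement (step_instruction and run_isr are hypothetical stand-ins for the emulator's own functions):
#include <stdatomic.h>
#include <stdbool.h>

extern void step_instruction(void);  // hypothetical: emulate one instruction
extern void run_isr(void);           // hypothetical: emulated call to the ISR

static atomic_bool irq_pending;      // listener thread sets it: atomic_store(&irq_pending, true)
static bool        irq_enabled = true;

static void emulator_loop(void)
{
    for (;;) {
        step_instruction();
        if (irq_enabled && atomic_exchange(&irq_pending, false))
            run_isr();               // pick up the pending interrupt
    }
}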

Related

What is a Spinning Thread?

I have stumbled upon the term spinning, referring to a thread while reading this (ROS)
What is the general concept behind spinning a thread?
My intuition would say that a spinning thread is a thread that keeps executing in a multithreaded process with a certain frequency, somewhat related to the concept of polling (i.e. keeps checking some condition with a certain frequency), but I am not sure at all about it.
Could you give some explanation? The more general the better.
There are a couple of separate concepts here.
In terms of ROS (the link you reference), ros::spin() runs the ROS callback invoker, so that pending events are delivered to your program callbacks via a thread belonging to your program. This sort of call typically does not return; it will wait for new events to be ready, and invoke appropriate callbacks when they occur.
But you also refer to "spinning a thread."
This is a separate topic. It generally relates to a low level programming pattern whereby a thread will repeatedly check for some condition being met without being suspended.
A common way to wait for some condition to be met is to just wait on a condition variable. In this example, the thread will be suspended by the kernel until some other thread calls notify on the condition variable. Upon the notify, the kernel will resume the thread, and the condition will evaluate to true, allowing the thread to continue.
#include <mutex>
#include <condition_variable>
std::mutex m;
std::condition_variable cv;
bool ready = false;
std::unique_lock<std::mutex> lk(m);
cv.wait(lk, []{ return ready; }); /* thread suspended until notified and ready == true */
Alternatively, a spinning approach would repeatedly check some condition without going to sleep. (Caution: this results in high CPU usage, and there are subtle caveats to implementing it correctly.)
Here is an example of a simple spinlock (although note that spinning threads can be used for other purposes than spinlocks). In the below code, notice that the while loop repeatedly calls test_and_set ... which is just an attempt to set the flag to true; that's the spin part.
#include <atomic>
std::atomic_flag lock = ATOMIC_FLAG_INIT;
// spin until test_and_set returns false, i.e. until we acquire the lock
while (lock.test_and_set(std::memory_order_acquire)); // acquire lock
/* got the flag .. do work */
lock.clear(std::memory_order_release); // release lock
To spin is to loop (as in a while loop) without sleeping: your task consumes CPU constantly until the condition is satisfied.

How to ensure a signal handler never yields to a thread within the same process group?

This is a bit of a meta question since I think I have a solution that works for me, but it has its own downsides and upsides. I need to do a fairly common thing, catch SIGSEGV on a thread (no dedicated crash handling thread), dump some debug information and exit.
The catch here is the fact that upon crash, my application runs llvm-symbolizer, which takes a while (relatively speaking) and causes a yield (either because of clone + execve or exceeding the time quantum for the thread; I've seen the latter happen when doing symbolication myself in-process using libLLVM). The reason for doing all this is to get a stack trace with demangled symbols and with line/file information (stored in a separate DWP file). For obvious reasons I do not want a yield happening across my SIGSEGV handler since I intend to terminate the application (thread group) after it has executed and never return from the signal handler.
I'm not that familiar with Linux signal handling and with glibc's wrappers doing magic around them, though, I know the basic gotchas but there isn't much information on the specifics of handling signals like whether synchronous signal handlers get any kind of special priority in terms of scheduling.
Brainstorming, I had a few ideas and downsides to them:
pthread_kill(<every other thread>, SIGSTOP) - Cumbersome with more threads, interacts with signal handlers which seems like it could have unintended side effects. Also requires intercepting thread creation from other libraries to keep track of the thread list and an increasing chance of pre-emption with every system call. Possibly even change their contexts once they're stopped to point to a syscall exit stub or flat out use SIGKILL.
Global flag to serve as cancellation points for all threads (kinda like pthread_cancel/pthread_testcancel). Safer, but requires a lot of maintenance, and across a large codebase it can be hellish, in addition to a mild performance overhead. A global flag could also cause the error to cascade, since the program is already in an unpredictable state, so letting any other thread run there is already not great.
"Abusing" the scheduler which is my current pick, with my implementation as one of the answers. Switching to FIFO scheduling policy and raising priority therefore becoming the only runnable thread in that group.
Core dumps are not an option since the goal here was to avoid them in the first place. I would also prefer not to require a helper program aside from the symbolizer.
Environment is a typical glibc based Linux (4.4) distribution with NPTL.
I know that crash handlers are fairly common now so I believe none of the ways I picked are that great, especially considering I've never seen the scheduler "hack" ever get used that way. So with that, does anyone have a better alternative that is cleaner and less riskier than the scheduler "hack" and am I missing any important points in my general ideas about signals?
Edit: It seems that I haven't really considered MP in this equation (as per comments) and the fact that other threads are still runnable in an MP situation and can happily continue running alongside the FIFO thread on a different processor. I can however change the affinity of the process to only execute on the same core as the crashing thread, which will effectively freeze all other threads at scheduling boundaries. However, that still leaves the "FIFO thread yielding due to blocking IO" scenario open.
It seems like the FIFO + SIGSTOP option is the best one, though I do wonder if there are any other tricks that can make a thread unschedulable short of using SIGSTOP. From the documentation it seems like it's not possible to set a thread's CPU affinity to zero (leaving it in a limbo state where it's technically runnable except no processors are available for it to run on).
upon crash, my application runs llvm-symbolizer
That is likely to cause deadlocks. I can't find any statement about llvm-symbolizer being async-signal safe. It's likely to call malloc, and if so will surely deadlock if the crash also happens inside malloc (e.g. due to heap corruption elsewhere).
Switching to FIFO scheduling policy and raising priority therefore becoming the only runnable thread in that group.
I believe you are mistaken: a SCHED_FIFO thread will run so long as it is runnable (i.e. does not issue any blocking system calls). If the thread does issue such a call (which it has to: to e.g. open the separate .dwp file), it will block and other threads will become runnable.
TL;DR: there is no easy way to achieve what you want, and it seems unnecessary anyway: what do you care that other threads continue running while the crashing thread finishes its business?
This is the best solution I could come up with (parts omitted for brevity, but it shows the principle); my basic assumption is that in this situation the process runs as root. This approach can lead to resource starvation in case things go really bad, and it requires privileges (if I understand the sched(7) man page correctly). I run the part of the signal handler that causes preemptions under the OSSplHigh guard and exit the scope as soon as I can. This is not strictly C++ related, since the same could be done in C or any other native language.
void spl_get(spl_t& O)
{
    os_assert(syscall(__NR_sched_getattr,
                      0, &O, sizeof(spl_t), 0) == 0);
}

void spl_set(spl_t& N)
{
    os_assert(syscall(__NR_sched_setattr,
                      0, &N, 0) == 0);
}

void splx(uint32_t PRI, spl_t& O)
{
    spl_get(O);                    // save the current scheduling attributes
    spl_t PL = {0};
    PL.size = sizeof(PL);
    PL.sched_policy = SCHED_FIFO;
    PL.sched_priority = PRI;
    spl_set(PL);
}

class OSSplHigh {
    os::spl_t OldPrioLevel;
public:
    OSSplHigh()  { os::splx(2, OldPrioLevel); }
    ~OSSplHigh() { os::spl_set(OldPrioLevel); }
};
The handler itself is quite trivial using sigaltstack and sigaction, though I do not block SIGSEGV on any thread. Also, oddly enough, the sched_setattr and sched_getattr syscalls and the struct definition weren't exposed through glibc, contrary to the documentation.
Late Edit: The best solution involved sending SIGSTOP to all threads (intercepting pthread_create via the linker's --wrap option to keep a ledger of all running threads); thanks to the suggestion in the comments.
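A minimal sketch of that Late Edit (link with -Wl,--wrap=pthread_create; the ledger size and names are illustrative, and the SIGSTOP loop in the comment is only what the edit describes, not a general recipe):
#include <pthread.h>
#include <signal.h>

#define MAX_THREADS 256
static pthread_t thread_ledger[MAX_THREADS];
static int thread_count;
static pthread_mutex_t ledger_lock = PTHREAD_MUTEX_INITIALIZER;

int __real_pthread_create(pthread_t *, const pthread_attr_t *,
                          void *(*)(void *), void *);

int __wrap_pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                          void *(*fn)(void *), void *arg)
{
    int rc = __real_pthread_create(tid, attr, fn, arg);
    if (rc == 0) {
        pthread_mutex_lock(&ledger_lock);
        if (thread_count < MAX_THREADS)
            thread_ledger[thread_count++] = *tid;   // record the new thread
        pthread_mutex_unlock(&ledger_lock);
    }
    return rc;
}

// In the SIGSEGV handler (self == pthread_self()):
//   for (int i = 0; i < thread_count; ++i)
//       if (!pthread_equal(thread_ledger[i], self))
//           pthread_kill(thread_ledger[i], SIGSTOP);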

suspendThread in windows

Keeping my question short... I am writing a simulation of an RTOS. As usual, the main problem comes with context-switch simulation. In the case of interrupts it is really becoming hard not to deviate from 'good' coding guidelines.
Say Task A is running and the user application is calculating its harmless private stuff, which will run for a long time. During Task A, an interrupt X is supposed to occur. (Hint: Task A has nothing to do with triggering this interrupt X.) Now how do I perform a context switch from Task A to the interrupt X handler?
My current implementation is based on a context thread that waits until some context switch is requested; an interrupt controller thread that can generate interrupts if someone requests interrupt triggering; and a main thread that is running Task A. Now I use the interrupt controller thread to spawn a new thread for interrupt X and then request the context thread to do the context switch, along the lines sketched below. The context thread suspends Task A's main thread and resumes the interrupt X handler thread. At the end of the interrupt X handler thread, Task A's main thread is resumed.
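On Windows, that context-switch step might look roughly like this (a sketch only; the handle names are illustrative, the interrupt X thread is assumed to have been created with CREATE_SUSPENDED, and error handling is omitted):
#include <windows.h>

void switch_to_isr(HANDLE task_a_thread, HANDLE isr_x_thread)
{
    SuspendThread(task_a_thread);                 /* park Task A's main thread          */
    ResumeThread(isr_x_thread);                   /* let the interrupt X handler run    */
    WaitForSingleObject(isr_x_thread, INFINITE);  /* wait for the handler thread to end */
    ResumeThread(task_a_thread);                  /* resume Task A                      */
}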
[Edit] Just to clarify, I already know suspending and terminating threads from outside is really bad. That is why I asked this question. Also, please don't recommend using events etc. for controlling Task A; it is user application code and I can't control it. The user can even write while(1){} if he wants...
I suspect that you can't do what you want to do in that way.
You mentioned that suspending a thread from outside is really bad. The reason is that you have no idea what the thread is doing when you suspend it. It's impossible to know whether the thread currently owns a mutex; if it does then any other thread that tries to access the same mutex is going to deadlock.
You have the problem that the runtime being used by the threads that might be suspended is the same as the one being used by the supervisor. That means there are many such potential deadlocks between the supervisor and the other threads.
In a real environment (i.e. not a simulator), the operating system kernel can suspend threads because there are checks in place to ensure that these deadlocks can't happen. I don't know the details, but it probably involves masking interrupts at certain critical points, and probably not sharing the same mutexes between user-mode code and critical parts of the kernel scheduler. (In your case that would mean your scheduler could not use any of the same OS API functions, either directly or indirectly, as are allowed to be used by the user threads, in case they involve mutexes. This of course would be virtually impossible to achieve.)
The reason I asked in a comment whether you have any control over the user code compiler is that if you controlled the compiler then you could arrange for the user code to effectively mask interrupts for the duration of each instruction and only yield to another thread at well-defined points between instructions. This is how it is done in a control system that I work on.
The other aspect is platform dependence. In Linux and other unix-like operating systems, you have signals, which are like user-mode interrupts. You could potentially use signals to emulate context switching, although you would still have the same problem with mutexes. There is absolutely no equivalent on Windows (as far as I know) precisely because of the problem already stated. The nearest thing is an asynchronous procedure call, but this will run only when the thread has put itself into an alertable wait state (which means the thread is in a deterministic state and is now safe to interrupt).
I think you are going to have to re-think the whole concept so that your supervisory thread has the sort of privileged control above the user threads that the OS has in a non-emulated environment. That will probably involve replacing the compiler or the run-time libraries, or both, with something of your own making.

What is a safe and easy way to exchange data from a threaded ISR? (Raspberry Pi)

I'm trying to develop a C/C++ userspace application on the Raspberry Pi which processes data coming from an SPI device. I'm using the WiringPi Library (function wiringPiISR) which registers a function (the real interrupt handler) that will be called from a pthreaded interrupt handler on an IRQ event.
I heard that STL containers aren't thread safe, but is it enough to have a mutex lock while executing my callback function and of course a lock in the main thread while accessing the buffer/container there?
My "real interrupt handler" which is registered through wiringPiISR looks like this
#include <cstdint>
#include <deque>
#include <pthread.h>

static std::deque<uint8_t> buffer;
static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static void callback(uint8_t byte);   // forward declaration

static void irq_handler()
{
    uint8_t data;
    while (digitalRead(IRQ_PIN) == 0)
    {
        data = spi_txrx(CMD_READBYTE);
        pthread_mutex_lock(&mutex1);   // protect the shared buffer
        callback(data);
        pthread_mutex_unlock(&mutex1);
    }
}

static void callback(uint8_t byte)
{
    buffer.push_back(byte);
}
Or is there an easier way to achieve the data exchange between a threaded ISR and main thread?
Is that a real ISR?
Anyway, mutexes are not a good fit for an ISR, because they lead to priority inversion.
Let's look at normal mutex usage, with two threads:
Thread A runs and takes the mutex.
For some reason, thread A is preempted, and thread B executes.
Thread B tries to take the mutex, but can't.
Thread B is put to sleep, allowing another thread to run, for instance thread C or thread A.
...
At some point, thread A will be rescheduled, will resume its operation, and release the mutex.
When thread B is scheduled again, it takes the mutex.
Now the scenario is very different when it comes to an ISR. An ISR won't be put to sleep in favor of a lower-priority thread, so the mutex-owning thread will not run while you are in the ISR, and you will never get past point three.
So the real question is: "When running an IRQ handler, is it possible for other code to run?" Otherwise you are in a deadlock!

Linux kernel interrupt handler mutex protection?

Do I need to protect my interrupt handler being called many times for the same interrupt?
Given the following code, I am not sure about the system calls I should make. I am getting rare, random deadlocks with this current implementation :-
void interrupt_handler(void)
{
    down_interruptible(&sem); // or use a lock here ?
    clear_intr();             // clear interrupt source on H/W
    wake_up_interruptible(...);
    up(&sem);                 // unlock?
    return IRQ_HANDLED;
}

void set/clear_intr()
{
    spin_lock_irq(&lock);
    RMW(x); // set/clear a bit by read/modify/write of the H/W interrupt routing register
    spin_unlock_irq(&lock);
}

void read()
{
    set_intr(); // same as clear_intr, but sets a bit
    wait_event_interruptible(...);
}
Should interrupt_handler:down_interruptible be spin_lock_irq / spin_lock_irqsave / local_irq_disable?
Should set/clear_intr:spin_lock_irq be spin_lock_irqsave / local_irq_disable?
Can it (H/W -> kernel -> driver handler) keep generating/getting interrupts until its cleared? Can the interrupt_handler keep getting called while within it?
If as currently implemented the interrupt handler is reentrant then will it block on the down_interruptible?
From LDD3 :-
must be reentrant—it must be capable of running in more than one context at the same time.
Edit 1) after some nice help, suggestions are :-
remove down_interruptible from within interrupt_handler
Move spin_lock_irq outside the set/clear methods (no need for spin_lock_irqsave, you say?). I really don't see the benefit of this?!
Code :-
void interrupt_handler(void)
{
    read_reg(y); // eg of other stuff in the handler
    spin_lock_irq(&lock);
    clear_intr(); // clear interrupt source on H/W
    spin_unlock_irq(&lock);
    wake_up_interruptible(...);
    return IRQ_HANDLED;
}

void set/clear_intr()
{
    RMW(x);
}

void read()
{
    error_checks(); // eg of some other stuff in the read method
    spin_lock_irq(&lock);
    set_intr(); // same as clear_intr, but sets a bit
    spin_unlock_irq(&lock);
    wait_event_interruptible(...);
    // more code here...
}
Edit 2) After reading some more SO posts: Why kernel code/thread executing in interrupt context cannot sleep?, which links to Robert Love's article, I read this:
some interrupt handlers (known in Linux as fast interrupt handlers) run with all interrupts on the local processor disabled. This is done to ensure that the interrupt handler runs without interruption, as quickly as possible. More so, all interrupt handlers run with their current interrupt line disabled on all processors. This ensures that two interrupt handlers for the same interrupt line do not run concurrently. It also prevents device driver writers from having to handle recursive interrupts, which complicate programming.
And I have fast interrupts enabled (SA_INTERRUPT)! So no need for mutex/locks/semaphores/spins/waits/sleeps/etc/etc!
Don't use semaphores in interrupt context, use spin_lock_irqsave instead. Quoting LDD3:
If you have a spinlock that can be taken by code that runs in (hardware or software) interrupt context, you must use one of the forms of spin_lock that disables interrupts. Doing otherwise can deadlock the system, sooner or later. If you do not access your lock in a hardware interrupt handler, but you do via software interrupts (in code that runs out of a tasklet, for example, a topic covered in Chapter 7), you can use spin_lock_bh to safely avoid deadlocks while still allowing hardware interrupts to be serviced.
As for point 2, make your set_intr and clear_intr require the caller to lock the spinlock, otherwise you'll find your code deadlocking. Again from LDD3:
To make your locking work properly, you have to write some functions with the assumption that their caller has already acquired the relevant lock(s). Usually, only your internal, static functions can be written in this way; functions called from outside must handle locking explicitly. When you write internal functions that make assumptions about locking, do yourself (and anybody else who works with your code) a favor and document those assumptions explicitly. It can be very hard to come back months later and figure out whether you need to hold a lock to call a particular function or not.
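Putting that advice together, a handler/read pair along those lines might look like this (a sketch only, not the poster's actual driver; the set/clear helpers assume the caller already holds the lock, and spin_lock_irqsave is used as recommended above):
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/fs.h>

static DEFINE_SPINLOCK(lock);
static DECLARE_WAIT_QUEUE_HEAD(wq);
static int data_ready;

static void clear_intr(void)   /* caller must hold 'lock' */
{
    /* RMW of the H/W interrupt routing register goes here */
}

static irqreturn_t interrupt_handler(int irq, void *dev_id)
{
    unsigned long flags;

    spin_lock_irqsave(&lock, flags);
    clear_intr();
    data_ready = 1;
    spin_unlock_irqrestore(&lock, flags);

    wake_up_interruptible(&wq);     /* safe from interrupt context */
    return IRQ_HANDLED;
}

static ssize_t my_read(struct file *f, char __user *buf, size_t n, loff_t *off)
{
    unsigned long flags;

    spin_lock_irqsave(&lock, flags);
    /* set_intr(): enable the interrupt source, caller holds the lock */
    data_ready = 0;
    spin_unlock_irqrestore(&lock, flags);

    if (wait_event_interruptible(wq, data_ready))  /* may sleep: process context only */
        return -ERESTARTSYS;
    return 0;
}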
Use a spinlock in interrupt context: a semaphore will put you to sleep if it cannot acquire the lock, and you must not sleep in interrupt context.
The code you posted does not look like a device driver irq handler.
The irq handlers in kernel drivers return irqreturn_t and take int irq_no, void *data as arguments.
You have also not specified whether you are registering a threaded handler or a non-threaded handler.
A non-threaded irq handler cannot make any sleeping calls, whether or not you hold any spinlocks.
wait_event, mutexes, semaphores, etc. are all sleeping calls and must not be used in a non-threaded irq handler. You can, however, hold a spinlock to prevent interruption of your interrupt handler. This will ensure that maskable irqs and the scheduler do not interrupt your irq handler in the middle.
In a threaded irq handler, such things as sleeping calls (wait queues, mutexes, etc.) can be used but are still not recommended.
