I am toying around with Rust and various UNIX libraries. A use-case that I have right now is that I want to react to POSIX signals. To keep things reasonable I want to create an abstraction over the signal handling so that the rest of my program doesn't have to worry about them as much.
Let's call the abstraction SignalHandler:
struct SignalHandler {
    pub signals: Arc<Vec<libc::c_int>>,
}
I would like this signals vector to be filled with all the signals that are received. My real state is more complicated, but let's use this vector as an example.
I want the API to behave like this:
// ← No signals are being captured
let h = SignalHandler::try_create().unwrap(); // try_create() returns Option<SignalHandler>
// ← Signals are added to h.signals
// Only one signal handler can be active at a time per process
assert!(SignalHandler::try_create().is_none());
// ← Signals are added to h.signals
drop(h);
// ← No signals are being captured
The problem is that registering a signal handler (e.g. using the nix crate) requires a pointer to a C function:
use nix::sys::signal;
let action = signal::SigAction::new(signal::SigHandler::Handler(handle_signal),
                                    signal::SaFlags::empty(), signal::SigSet::empty());
unsafe { signal::sigaction(signal::Signal::SIGINT, &action) }.unwrap();
I can't pass the signals vector to the handle_signal function, since it needs to have the C ABI and thus can't be a closure. I would like to give out a Weak<_> pointer to that function somehow. This probably means using global state.
So the question is: what data structure should I use for global state that can either be "unset" (i.e. no signals vector) or atomically "set" to some mutable state that I initialize in try_create?
For this type of global state, I would recommend the lazy_static crate. Its macro lets you define a lazily-initialized global; wrap the value in a Mutex (or similar) if it must be mutable. You may be able to get away with a global Option<T> that way.
There is one problem with this situation, though: it is hard to do much of anything safely inside a signal handler. Since a signal handler must be async-signal-safe (loosely speaking, re-entrant), any kind of lock is out, as is any memory allocation (unless the allocator is also async-signal-safe). That means an Arc<Mutex<Vec<T>>> or anything similar will not work. You may already know this and be dealing with it in some way, though.
Depending on your needs, I might point you towards the chan_signal crate, which is an abstraction over signals that uses a dedicated thread and sigwait to receive them.
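For illustration, here is a minimal C sketch of what such a sigwait-based abstraction does under the hood; the function names and the choice of SIGINT are mine:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig;
    for (;;) {
        if (sigwait(set, &sig) == 0)
            printf("received signal %d\n", sig);  /* ordinary code: we are NOT in a handler */
    }
    return NULL;
}

int main(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, NULL);  /* block before spawning, so every thread inherits the mask */

    pthread_t tid;
    pthread_create(&tid, NULL, signal_thread, &set);
    pthread_join(tid, NULL);
    return 0;
}

Because the signal is received synchronously in a normal thread, none of the async-signal-safety restrictions apply there.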
Hope that helps. Another interesting resource to look at is signalfd, which creates a file descriptor on which pending signals are queued; the nix crate has a binding to that as well.
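Again for illustration, a minimal C sketch of the signalfd approach (Linux-only; the nix binding wraps these same calls). The signal must be blocked first so it is delivered through the descriptor rather than through a handler:

#include <sys/signalfd.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, NULL);  /* must block before calling signalfd() */

    int fd = signalfd(-1, &mask, 0);
    struct signalfd_siginfo si;
    if (read(fd, &si, sizeof(si)) == sizeof(si))  /* blocks until a signal is queued */
        printf("got signal %d\n", (int)si.ssi_signo);
    close(fd);
    return 0;
}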
Related
Just to get a better understanding of the Send and Sync traits, are there examples of types that either:
Implement Send and do not implement Sync.
Implement Sync and do not implement Send.
First of all, it is important to realize that most structs (or enums) are Send:
any struct that does not contain any reference can be Send + 'static
any struct that contain references with a lower-bound lifetime of 'a can be Send + 'a
As a result, you would generally expect any Sync struct to be Send too, because Send is such an easy bar to reach compared to the much harder bar of being Sync (which requires that the value can be safely accessed from multiple threads at once through shared references).
However, nothing prevents the creator of a type from specifically marking it as not Send. For example, let's resuscitate conditions!
The idea of conditions, in Lisp, is that you set up a handler for a given condition (say, FileNotFound), and then when that condition is met deep in the stack, your handler is called.
How would you implement this in Rust?
Well, to preserve thread independence, you would use thread-local storage for the condition handlers (see std::thread_local!). Each condition would have a stack of condition handlers, with either only the top one invoked, or an iterative process starting from the top and reaching down until one succeeds.
But then, how would you set them?
Personally, I'd use RAII! I would bind the condition handler in the thread-local stack and register it in the frame (for example, using an intrusive doubly-linked list as the stack).
This way, when I am done, the condition handler automatically un-registers itself.
Of course, the system has to account for users doing unexpected things (like storing the condition handlers in the heap and not dropping them in the order they were created), and this is why we use a doubly-linked list, so that the handler can un-register itself from the middle of the stack if necessary.
So we have a:
struct ConditionHandler<T> {
    handler: T,                              // user-supplied handler
    prev: Option<*mut ConditionHandler<T>>,  // intrusive links into the
    next: Option<*mut ConditionHandler<T>>,  // thread-local handler stack
}
and the "real" handler is passed by the user as T.
Would this handler be Sync?
Possibly; it depends on how you create it, but there is no reason you could not create a handler such that a reference to it can safely be shared between multiple threads.
Note: those threads could not access its prev/next data members, which are private, and need not be Sync.
Would this handler be Send?
Unless specific care is taken, no.
The prev and next fields are not protected against concurrent access, and even worse, if the handler were dropped while another thread still held a reference to it (for example, another handler trying to un-register itself), that now-dangling reference would cause Undefined Behavior.
Note: the latter issue means that just switching Option<*mut Handler<T>> for AtomicPtr<ConditionHandler<T>> is not sufficient; see Common Pitfalls in Writing Lock-Free Algorithms for more details.
And there you have it: a ConditionHandler<T> is Sync if T is Sync but will never be Send (as is).
For completeness, there are also types that implement Send but not Sync; the classic examples are Cell and RefCell. They can safely be sent to another thread, but because their interior mutability is unsynchronized they cannot be shared between threads. Plain containers such as Option<T> or Vec<T> simply inherit from their contents: they are Send when T is Send and Sync when T is Sync.
This is a continuation from here: Golang: Shared communication in async http server
Assuming I have a hashmap w/ locking:
//create async hashmap for inter request communication
type state struct {
    *sync.Mutex                      // inherits locking methods
    AsyncResponses map[string]string // map ids to values
}

var State = &state{&sync.Mutex{}, map[string]string{}}
Functions that write to this will place a lock. My question is, what is the best / fastest way to have another function check for a value without blocking writes to the hashmap? I'd like to know the instant a value is present on it.
MyVal = State.AsyncResponses[MyId]
Reading a shared map without blocking writers is the very definition of a data race. In fact, semantically it is a data race even when the writers are blocked during the read: as soon as you finish reading the value and unblock the writers, the value may no longer exist in the map.
Anyway, it is not very likely that proper synchronization will be the bottleneck in most programs. An uncontended lock of a {RW,}Mutex is probably on the order of < 20 ns even on mid-range CPUs. I suggest postponing optimization not only until the program is correct, but also until you have measured where the major part of the time is being spent.
I'm implementing user threads in Linux kernel 2.4, and I'm using ualarm to invoke context switches between the threads.
We have a requirement that our thread library's functions should be uninterruptible by the context-switching mechanism for threads, so I looked into blocking signals and learned that sigprocmask is the standard way to do this.
However, it looks like I need to do quite a lot to implement this:
sigset_t new_set, old_set;
sigemptyset(&new_set);
sigaddset(&new_set, SIGALRM);
sigprocmask(SIG_BLOCK, &new_set, &old_set);
This blocks SIGALRM, but it takes three function invocations! A lot can happen in the time these functions take to run, including the signal being delivered.
The best idea I had to mitigate this was temporarily disabling ualarm, like this:
sigset_t new_set, old_set;
useconds_t remaining = ualarm(0, 0);  /* pause the timer, remembering what was left */
sigemptyset(&new_set);
sigaddset(&new_set, SIGALRM);
sigprocmask(SIG_BLOCK, &new_set, &old_set);
ualarm(remaining, 0);                 /* restart the timer */
Which is fine except that this feels verbose. Isn't there a better way to do this?
As WhirlWind points out, the signal set functions are quite lightweight and may even be implemented as macros; you can also just keep around a signal set that contains only SIGALRM and re-use it (see the sketch below).
Regardless, it doesn't actually matter if the signal arrives during the sigemptyset() or sigaddset() calls: the new_set and old_set variables are (presumably) local to this thread, and the critical section isn't entered until sigprocmask() returns.
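A sketch of that "build the set once, reuse it" suggestion; the helper names are made up. After the one-time setup, each critical section costs exactly one sigprocmask() call on entry and one on exit:

#include <signal.h>

static sigset_t alarm_set;                 /* contains only SIGALRM */

void init_alarm_set(void)                  /* call once at startup */
{
    sigemptyset(&alarm_set);
    sigaddset(&alarm_set, SIGALRM);
}

void enter_critical(sigset_t *old)         /* block SIGALRM, remember the old mask */
{
    sigprocmask(SIG_BLOCK, &alarm_set, old);
}

void leave_critical(const sigset_t *old)   /* restore the old mask */
{
    sigprocmask(SIG_SETMASK, old, NULL);
}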
You'll find that sigemptyset() and sigaddset() in <signal.h> are just macros or inline functions, so they execute inline in your code. Just use a stack variable when you call them.
However, why don't you do this in a single-threaded startup section of your code? I also doubt that the call to sigprocmask() is atomic. Blocking signals does not mean your code will be uninterruptible.
By the way, I'm not sure how you're using ualarm, but if you're not catching or ignoring SIGALRM when you call it the first time, you'll probably kill your process.
sigprocmask() is the only one of these functions that goes to kernel level and actually changes the signal-masking status. The others are just manipulation functions for setting up the mask before calling sigprocmask(), or for passing the set to another signal-related function.
As we know, doing things in signal handlers is really bad, because they run in an interrupt-like context. It's quite possible that various locks (including the malloc() heap lock!) are held when the signal handler is called.
So I want to implement a thread-safe timer without using the signal mechanism.
How can I do that?
Sorry; to clarify, I'm not expecting answers about thread safety in general, but about how to implement a timer on Unix or Linux that is thread-safe.
Use usleep(3) or sleep(3) in your thread. This will block the thread until the timeout expires.
If you need to wait on I/O and have a timer expire before any I/O is ready, use select(2), poll(2) or epoll(7) with a timeout.
If you still need to use a signal handler, create a pipe with pipe(2), do a blocking read on the read side in your thread, or use select/poll/epoll to wait for it to be ready, and write a byte to the write end of your pipe in the signal handler with write(2). It doesn't matter what you write to the pipe - the idea is to just get your thread to wake up. If you want to multiplex signals on the one pipe, write the signal number or some other ID to the pipe.
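Here is a minimal sketch of that pipe trick, assuming SIGALRM is the signal of interest. Note that on Linux, select(2) itself is interrupted by the handler and fails with EINTR, so the wait loop retries:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>

static int pipe_fds[2];

static void on_signal(int signum)
{
    unsigned char b = (unsigned char)signum;
    write(pipe_fds[1], &b, 1);   /* write() is async-signal-safe */
}

int main(void)
{
    pipe(pipe_fds);
    signal(SIGALRM, on_signal);
    alarm(1);                    /* deliver SIGALRM in one second */

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(pipe_fds[0], &rfds);
        struct timeval tv = { 5, 0 };  /* fallback timeout, as described above */

        int r = select(pipe_fds[0] + 1, &rfds, NULL, NULL, &tv);
        if (r == -1 && errno == EINTR)
            continue;            /* interrupted by the signal; the pipe is now readable */
        if (r > 0) {
            unsigned char b;
            read(pipe_fds[0], &b, 1);
            printf("woken by signal %u\n", b);
        } else if (r == 0) {
            printf("timed out\n");
        }
        break;
    }
    return 0;
}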
You should probably use something like pthreads, the POSIX threads library. It provides not only threads themselves but also basic synchronization primitives such as mutexes (locks), condition variables, and semaphores. Here's a tutorial I found that seems to be decent:
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
For what it's worth, if you're totally unfamiliar with multithreaded programming, it might be a little easier to learn it in Java or Python, if you know either of those, than in C.
I think the usual way around the problems you describe is to make the signal handlers do only a minimal amount of work. E.g. setting some timer_expired flag. Then you have some thread that regularly checks whether the flag has been set, and does the actual work.
If you don't want to use signals I suppose you'd have to make a thread sleep or busy-wait for the specified time.
Use a POSIX interval timer, and have it notify via a signal. Inside the signal handler function almost none of the C library, printf() included, may be used, because those functions aren't async-signal-safe.
Use a single global flag, declared static volatile sig_atomic_t, for your signal handler to manipulate. The handler should contain literally that one line of code and NOTHING else; the flag then steers the control flow elsewhere in the one and only thread of the program.
#include <signal.h>

void do_zig(void);   /* the application's work functions */
void do_zag(void);

static volatile sig_atomic_t g_zig_instead_of_zag_flg = 0;

void signal_handler_fnc(int signum)
{
    (void)signum;
    g_zig_instead_of_zag_flg = 1;   /* the ONLY thing the handler does */
}

int main(void)
{
    /* ... install signal_handler_fnc and arm the timer here ... */
    for (;;) {
        if (!g_zig_instead_of_zag_flg) {
            do_zag();
        } else {
            do_zig();
            g_zig_instead_of_zag_flg = 0;
        }
    }
    return 0;
}
Michael Kerrisk's The Linux Programming Interface has examples of both methods, and a few more, but the examples depend on a lot of his own private helper functions that you have to get working first, and they carefully avoid many of the gotchas they ought to explore, so they're not great.
Using the POSIX interval timer that notifies via a thread instead makes everything a lot worse, and AFAICT that notification method is pretty much useless. I only say "pretty much" because there may be SOME case where doing nothing in the main() thread and everything in the handler thread is useful, but I sure can't think of one.
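For reference, a minimal sketch of creating and arming such an interval timer (the function name is mine; error handling trimmed; link with -lrt on older glibc):

#include <signal.h>
#include <time.h>

int arm_periodic_timer(void)
{
    timer_t timerid;
    struct itimerspec its;

    /* A NULL sigevent means the timer notifies with SIGALRM by default. */
    if (timer_create(CLOCK_REALTIME, NULL, &timerid) == -1)
        return -1;

    its.it_value.tv_sec = 1;      /* first expiry after one second */
    its.it_value.tv_nsec = 0;
    its.it_interval.tv_sec = 1;   /* then every second */
    its.it_interval.tv_nsec = 0;

    return timer_settime(timerid, 0, &its, NULL);
}

Pair it with a one-line flag-setting handler like the one above.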
In a digital signal acquisition system, often data is pushed into an observer in the system by one thread.
example from Wikipedia/Observer_pattern:
foreach (IObserver observer in observers)
observer.Update(message);
When a user action from, say, a GUI thread requires the data to stop flowing, you want to break the subject-observer connection, or even dispose of the observer altogether.
One may argue: you should just stop the data source, and wait for a sentinel value to dispose of the connection. But that would incur more latency in the system.
Of course, if the data-pumping thread has just asked for the address of the observer, it might find itself sending a message to a destroyed object.
Has someone created an 'official' Design Pattern countering this situation? Shouldn't they?
If you want the data source to always be on the safe side of concurrency, it should have at least one pointer that is always safe for it to use.
So the Observer object should have a lifetime that isn't ended before that of the data source.
This can be done by only adding Observers, but never removing them.
You could have each observer not do the core implementation itself, but have it delegate this task to an ObserverImpl object.
You lock access to this impl object. This is no big deal: it just means the GUI unsubscriber may be blocked for a little while if the observer is busy using the ObserverImpl object. If GUI responsiveness is an issue, you can use some kind of concurrent job-queue mechanism and push an unsubscription job onto it (like PostMessage in Windows).
When unsubscribing, you just swap the core implementation for a dummy implementation. Again, this operation should grab the lock. This does introduce some waiting for the data source, but since it's just a lock, pointer swap, and unlock, you could say it is fast enough for real-time applications.
If you want to avoid piling up Observer objects that contain nothing but a dummy, you have to do some bookkeeping, but this can boil down to something as trivial as an object holding a pointer to the Observer it needs to remove from the list.
Optimization:
If you also keep the implementations (the real one and the dummy) alive as long as the Observer itself, you can do this without an actual lock and use something like InterlockedExchangePointer to swap the pointers.
Worst case scenario: a delegating call is in progress while the pointer is swapped. That's no big deal: all objects stay alive, so the delegation can simply continue, and the next delegating call will go to the new implementation object (barring any new swaps, of course). A concrete sketch follows.
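To make that concrete, here is a small sketch in portable C11, with <stdatomic.h> playing the role of InterlockedExchangePointer; all type and function names are illustrative:

#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    void (*update)(const char *msg);
} ObserverImpl;

static void real_update(const char *msg)  { printf("handling %s\n", msg); }
static void dummy_update(const char *msg) { (void)msg; /* deliberately ignore */ }

/* Both impls live as long as the Observer itself, so the data thread
   can never dereference a freed pointer. */
static ObserverImpl real_impl  = { real_update };
static ObserverImpl dummy_impl = { dummy_update };

typedef struct {
    _Atomic(ObserverImpl *) impl;
} Observer;

void notify(Observer *o, const char *msg)    /* called by the data-source thread */
{
    ObserverImpl *impl = atomic_load(&o->impl);
    impl->update(msg);
}

void unsubscribe(Observer *o)                /* called by the GUI thread; never blocks */
{
    atomic_exchange(&o->impl, &dummy_impl);
}

int main(void)
{
    Observer obs;
    atomic_init(&obs.impl, &real_impl);
    notify(&obs, "sample");   /* dispatches to real_update */
    unsubscribe(&obs);
    notify(&obs, "sample");   /* now a no-op via dummy_update */
    return 0;
}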
You could send a message to all observers informing them the data source is terminating and let the observers remove themselves from the list.
In response to the comment, the implementation of the subject-observer pattern should allow for dynamic addition / removal of observers. In C#, the event system is a subject/observer pattern where observers are added using event += observer and removed using event -= observer.