In POSIX, because of the "spurious wakeup" problem, programmers are forced to use a while loop instead of an if statement when checking the condition.
I find spurious wakeups an unintuitive and confusing problem, but I assumed it was an inevitable one.
Recently, I found that Win32 event objects don't have the "spurious wakeup" problem.
Why do POSIX and other systems still use condition variables that have the "spurious wakeup" problem, given that it can apparently be solved?
You ask:
Why do POSIX and other systems still use condition variables that have the "spurious wakeup" problem, given that it can apparently be solved?
Basically, it's faster than the alternative.
The RATIONALE section of the POSIX.1-2017 treatment of pthread_cond_broadcast and pthread_cond_signal specifically has this to say about "Multiple Awakenings by Condition Signal":
While [the "spurious wakeup"] problem could be resolved, the loss of efficiency for a fringe condition that occurs only rarely is unacceptable, especially given that one has to check the predicate associated with a condition variable anyway. Correcting this problem would unnecessarily reduce the degree of concurrency in this basic building block for all higher-level synchronization operations.
The text further observes that forcing "a predicate-testing-loop around the condition wait" is a more robust coding practice than the alternative, because the application will necessarily tolerate superfluous broadcasts and signals from elsewhere in the code.
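For reference, this is the predicate-testing loop the RATIONALE is defending, sketched with pthreads (the variable and function names here are illustrative, not from the standard):

```c
#include <pthread.h>
#include <stdbool.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
bool ready = false;             /* the predicate, guarded by `lock` */

void wait_until_ready(void)
{
    pthread_mutex_lock(&lock);
    while (!ready)                        /* while, not if: wakeups may be spurious */
        pthread_cond_wait(&cond, &lock);  /* atomically unlocks, waits, relocks */
    pthread_mutex_unlock(&lock);
}

void signal_ready(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cond);
}
```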
The use case is the need to mask SIGPIPE in pthreads that do their own write() and/or SSL_write(), and to have it compile on current POSIX-ish systems like Linux, macOS, BSD, etc. The typical approach on Linux is explained quite nicely here, and there is a lot of good additional discussion on the topic here.
The typical signal(SIGPIPE, SIG_IGN) does work everywhere I have tried, but (I believe) there should be a more surgical solution that avoids globally ignoring SIGPIPE. It would also be nice to avoid platform-specific pragmas if possible.
The sigtimedwait() function does not appear to exist in (current?) versions of macOS, so a cross-platform solution does not look likely using that approach.
The sigwait() function seems to exist everywhere, but it will block forever if the particular signal is not actually pending. So the next best approach appears to be to use sigpending() to see what is pending, and then sigwait() to service it; both appear to be available everywhere.
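For concreteness, here is a minimal sketch of the sequence I have in mind (the wrapper name and structure are mine, and edge cases such as a SIGPIPE that was already pending before the call are glossed over):

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

ssize_t write_masking_sigpipe(int fd, const void *buf, size_t len)
{
    sigset_t sigpipe_set, old_set, pending;
    sigemptyset(&sigpipe_set);
    sigaddset(&sigpipe_set, SIGPIPE);

    /* Block SIGPIPE for this thread only. */
    pthread_sigmask(SIG_BLOCK, &sigpipe_set, &old_set);

    ssize_t n = write(fd, buf, len);
    int saved_errno = errno;

    /* If the write raised SIGPIPE, it is now pending; consume it. */
    sigpending(&pending);
    if (sigismember(&pending, SIGPIPE)) {
        int sig;
        sigwait(&sigpipe_set, &sig);
    }

    /* Restore this thread's previous signal mask. */
    pthread_sigmask(SIG_SETMASK, &old_set, NULL);
    errno = saved_errno;
    return n;
}
```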
What has me concerned is that there is virtually nothing (that I can find) written on this particular problem, which is usually a sign that I am missing something painfully obvious.
So, is pthread_sigmask() / sigpending() / sigwait() / pthread_sigmask() a good choice for the above use case? Or are there (non?)obvious pitfalls I should be aware of?
So, is pthread_sigmask() / sigpending() / sigwait() / pthread_sigmask() a good choice for the above use case? Or are there (non?)obvious pitfalls I should be aware of?
There's the fact that sigwait() and sigtimedwait() were released in the same version of POSIX. If you're looking to achieve portability by relying on standards, and if macOS fails to conform by omitting the latter, then you should be concerned about how else it fails to conform. Indeed, there are other areas of nonconformance that may bite you, though not necessarily with your particular proposed series of function calls.
For best portability I would suggest going for the simplest solutions possible. In this case I would simply ignore the signal (that is, set its disposition to SIG_IGN). I infer that you understand that signal dispositions are per-process characteristics, not per-thread characteristics, but so what? All of your write()s should be checking their return values to detect short writes and error conditions anyway, and if they do that correctly then they will take appropriate action without any need to receive a signal.
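A minimal sketch of that simpler approach (the pipe setup is just a self-contained way to demonstrate the effect): with SIGPIPE ignored, the failed write() reports EPIPE as an error return instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Per-process disposition: SIGPIPE is ignored everywhere. */
    signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (pipe(fds) != 0)
        return 1;
    close(fds[0]);  /* close the read end so writes have no reader */

    /* Without SIG_IGN this write would kill the process with SIGPIPE;
       with it, we get a normal error return to check. */
    if (write(fds[1], "x", 1) < 0)
        fprintf(stderr, "write failed: %s\n", strerror(errno));
    return 0;
}
```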
Wikipedia defines a race condition as:
A race condition or race hazard is the behavior of an electronics, software, or other system where the output is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the order the programmer intended.
Rust is a:
safe, concurrent, practical language
If we create software that is 100% Rust, can we avoid race conditions? Why or why not?
No.
I've seen race conditions in:
filesystem accesses,
database accesses,
access to other services.
The environment in which a program evolves is full of races, and there is nothing a programming language can do but embrace that.
Rust focuses on memory-safety. In the context of multi-threaded programming, this means preventing data races.
A program with no data race can still contain race conditions:
data race: modification of a value while it is being read/written by another thread with no synchronization, the resulting behavior is unpredictable (especially when optimizers are involved),
race condition: a timing issue on a sequence of events, the resulting behavior is one of a small set of possible behaviors. It can be solved by synchronization, but this is not the only solution.
Race conditions are not memory errors. For Rust, this means they are considered safe, although of course they are still undesirable. They may happen at many different levels: between threads, processes, servers, ...
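As one concrete illustration, here is a hypothetical time-of-check-to-time-of-use (TOCTOU) sketch in C; the point is language-independent, and a Rust version would have exactly the same flaw:

```c
/* TOCTOU race: the check and the use are two separate filesystem
 * operations, and another process can unlink the file between them.
 * No shared memory is involved, so there is no data race, yet the
 * outcome still depends on the timing of uncontrollable events. */
#include <fcntl.h>
#include <unistd.h>

int open_if_readable(const char *path)
{
    if (access(path, R_OK) == 0) {      /* check: file looks readable */
        /* ... another process may remove `path` right here ... */
        return open(path, O_RDONLY);    /* use: can still fail */
    }
    return -1;
}
```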
If you only change a MyInt: Integer variable in one or more threads with one of the interlocked functions, let's say InterlockedIncrement, can we guarantee that after the InterlockedIncrement is executed, a plain read of the variable in any thread will return the latest updated value? Yes, no, and why?
If not, is it possible to achieve that in Delphi? Note that I'm talking about only one variable, so there is no need to worry about consistency across two or more variables.
The root problem and doubts seem essentially the same as in this SO post, but that one targets C#, and I'm using Delphi 2007, so I have no access to volatile, nor to newer versions of Delphi. In that discussion, two major problems that seem to affect Delphi as well were raised:
The cache of the processor reading the variable may not be updated.
The compiler may optimize the code in a way that causes problems to read.
If this is really a problem, I'm very worried about using even a simple counter with InterlockedIncrement, or solutions like the lock-free initialization proposed here, and would just go with plain critical sections or a multi-reader/single-writer lock for safety.
Initial analysis
This is what I've found so far, but feel free to address the problems in other ways if appropriate, or even to raise other, unknown problems, so that the objective of the question can be achieved:
For problem 1, I expected that the "full fence" would also force the caches of other processors to be updated... but reading around, that seems not to be the case. It looks like the cache would only be updated if a "read barrier" (or whatever it is called) were issued on the processor that will read the variable. If this is true, is there a way to issue such a "read barrier" in Delphi, just before reading the variable? A full fence seems to imply both read and write barriers, so that would also be OK. Since there is no InterlockedRead function, according to the discussion in the first post, could we (just speculating) try to work around this using something like InterlockedCompareExchange (ugh... writing the variable just to be able to read it smells bad), or maybe "lock"-prefixed low-level assembly instructions (which could be encapsulated)?
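Speculating along those lines, the InterlockedCompareExchange workaround would look roughly like this (shown here in C against the same Win32 API that Windows.pas imports; the wrapper name is mine):

```c
#include <windows.h>

/* Read `target` with full-fence semantics by "exchanging" it with
   itself: if *target == 0 we store 0 (no change), otherwise nothing
   is written, and in every case the locked instruction returns the
   variable's original value. */
LONG InterlockedReadWorkaround(volatile LONG *target)
{
    return InterlockedCompareExchange(target, 0, 0);
}
```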
For problem 2, would Delphi's optimizations have an impact here? Is there any way to avoid that?
Edit: The solution must work in D2007, but preferably I would like not to make a possible future migration to newer Delphi any harder, and to be able to use the same piece of code on ARM as well (this became clear to me after David's comments). So, if possible, it would be nice if the solution were not coupled to the x86/x64 memory model. It would be nice if I only needed to replace the plain Windows.pas interlocked functions with whatever provides the same interlocked functionality in newer Delphi/ARM, without needing to review the logic for ARM (one less concern).
But do the interlocked functions provide enough abstraction from the CPU architecture in this case? Problem 1 suggests they don't, but I'm not sure whether that would affect ARM Delphi. Is there any way around it that keeps things simple and still allows a relevant performance advantage over critical sections and similar sync objects?
So I'm designing a new software interface for a USB HID device, and I have a question about concurrency protection. I assume that I will have to add concurrency protection around my calls to ReadFile and WriteFile (please correct me if I'm wrong), as these may be called from different threads in my design.
In the past I have sometimes used static booleans to implement thread safety, adding a loop with a 1 ms wait until the bool indicated that the code was safe to enter. I have also used CriticalSections. Could anyone tell me whether CriticalSections are fundamentally better than using a static bool? I know that I won't have to code up a waiting loop, but what polling rate do they use in VC++ to check the state of the lock? Are they hooked into the OS in some way that makes them better? Is using a bool for a concurrency check not always safe? Etc.
I don't know much about C++, but concurrency usually isn't implemented by polling, and generally shouldn't be: polling wastes processor time and energy. The two main low-level approaches are:
Blocking on a lock, perhaps with a timeout.
Using a lock-free primitive, most likely either compare-and-set or compare-and-swap.
Both approaches are supported by typical modern hardware, and they lead to very different "flavors" of interaction. Writing lock-free data structures is best left to the experts: they tend to be complicated and, more importantly, it tends to be difficult to see whether they are correct and whether they guarantee progress in the face of contention (their descriptions are usually accompanied by pages of proofs). Fortunately, you can get libraries of them, and in many, though not all, cases they are faster and better behaved than blocking ones.
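To see why the static-bool-plus-wait-loop from the question is not safe: with a plain bool, the test ("is it free?") and the set ("now it's mine") are two separate steps, so two threads can both see "free" and both enter. Compare-and-swap fuses them into one indivisible step. A sketch in C11 (illustrative only, not production code):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool locked = false;

void spin_lock(void)
{
    bool expected = false;
    /* Atomically: if locked == false, set it to true and enter.
       On failure, `expected` is overwritten with the current value,
       so reset it and retry. Test and set happen as one step. */
    while (!atomic_compare_exchange_weak(&locked, &expected, true))
        expected = false;
}

void spin_unlock(void)
{
    atomic_store(&locked, false);
}
```

In practice a real lock would back off or block rather than spin forever, which is roughly what a Windows CRITICAL_SECTION does: an optional bounded spin, then a kernel wait on contention.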
Well, never mind. I'll just stick with critical sections. I was hoping to get some interesting information about how critical sections work, or at least a reference to an article about that, i.e. why they are different from just writing your own polling loop. I see from this discussion, std::mutex performance compared to win32 CRITICAL_SECTION, that there is some confusion about how std::mutex works, but I'm thinking that it is better to use CRITICAL_SECTIONs, as that seems to be the surest way to get the fastest concurrency protection on Windows.
Thanks anyways.
How efficient is a try_lock on a mutex? That is, roughly how many assembler instructions are likely involved, and how much time do they take, in the two possible cases (i.e. the mutex was already locked, or it was free and could be locked)?
In case you have trouble answering the question, here is how to approach it (in case that is really unclear):
If the answer depends a lot on the OS implementation and hardware: please answer it for common OSes (e.g. Linux, Windows, Mac OS X), recent versions of them (in case they differ a lot from earlier versions), and common hardware (x86, amd64, ppc, arm).
If it also depends on the library: take pthread as an example.
Please also answer whether the cases really differ at all, and if they do, please state the differences. That is, what do they do differently? What common algorithms are around? Are there different algorithms in use, or do all common systems (common as per the list above, if that is unclear) implement mutexes in the same way?
As per this Meta discussion, this really should be a separate question.
Also, I have asked this separately from the question about the performance of a lock because I am not sure whether try_lock may behave differently, perhaps also depending on the implementation. Then again, please answer for common implementations. And this very similar/related question shows that this is an interesting question which can be answered.
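For concreteness, here is a minimal pthread sketch of the two cases I am asking about (the demonstration itself is mine, not from any particular implementation):

```c
// pthread_mutex_trylock() never blocks: it returns 0 when it acquires
// a free mutex, and EBUSY when the mutex is already locked.
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int main(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    if (pthread_mutex_trylock(&m) == 0)
        puts("case 1: mutex was free and is now held");

    if (pthread_mutex_trylock(&m) == EBUSY)
        puts("case 2: mutex already locked, call returned immediately");

    pthread_mutex_unlock(&m);
    return 0;
}
```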
A mutex is a logical construction that is independent of any implementation. Operations on mutexes therefore are neither efficient nor inefficient - they are simply defined.
Your question is therefore akin to asking "How efficient is a car?", without reference to what kind of car you might be talking about.
I could implement mutexes in the real world with smoke signals, carrier pigeons or a pencil and paper. I could also implement them on a computer. I could implement a mutex with certain operations on a Cray 1, on an Intel Core 2 Duo, or on the 486 in my basement. I could implement them in hardware. I could implement them in software in the operating system kernel, or in userspace, or using some combination of the two. I might simulate mutexes (but not implement them) using lock-free algorithms that are guaranteed conflict-free within a critical section.
EDIT: Your subsequent edits don't help the situation. "In a low level language (like C or whatever)" is mostly irrelevant, because then we're into measuring language implementation performance, and that's a slippery slope at best. "[F]rom pthread or whatever the native system library provides" is similarly unhelpful, because as I said, there are so many ways that one could implement mutexes in different environments that it's not even a useful comparison to make.
This is why your question is unanswerable.