Optimizing bottlenecked thread-safe code wrapped in non-thread-safe code

I have the following pseudocode which I want to optimize:
(1) Lock(L1);
{ Not thread-safe code }
{ Thread safe bottleneck code }
{ Not thread-safe code }
(2) Unlock(L1);
I am looking for a way to optimize the code, such that the safe bottleneck code will be performed in parallel.
One idea I had, which does not work:
(1) Lock(L1);
{ Not thread-safe code }
Unlock(L1);
{ Thread safe bottleneck code }
Lock(L1);
{ Not thread-safe code }
(2) Unlock(L1);
The way I see it, the lock at (1) must be unlocked before entering the thread-safe code, but if I do that there's always some edge case that causes a race condition.
I can add as many locks as I want, but prefer not to remove the lock and unlock at (1) and (2), unless there's no other option.
Assume each of the blocks is atomic and cannot be split.
Any suggestions?
Thanks.

Related

Why does std::condition_variable::wait need a mutex?

TL;DR
Why does std::condition_variable::wait need a mutex as one of its arguments?
Answer 1
You may look at the documentation and quote that:
wait... Atomically releases lock
But that's not a real reason. It just reinforces my question: why does it need the lock in the first place?
Answer 2
The predicate most likely queries the state of a shared resource, so it must be guarded by a lock.
OK. fair.
Two questions here
Is it always true that the predicate queries the state of a shared resource? I assume yes. It doesn't make sense to me to implement it otherwise.
What if I do not pass any predicate (it is optional)?
Using predicate - lock makes sense
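#include <condition_variable>
#include <iostream>
#include <mutex>

std::condition_variable cv;
std::mutex cv_m; // protects i
int i = 0;

void waits()
{
    std::unique_lock<std::mutex> lk(cv_m);
    cv.wait(lk, []{ return i == 1; });
    std::cout << i;
}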
Not Using predicate - why can't we lock after the wait?
int i = 0;
void waits()
{
cv.wait(lk);
std::unique_lock<std::mutex> lk(cv_m);
std::cout << i;
}
Notes
I know that there are no harmful implications to this practice. I just don't know how to explain to myself why it was designed this way.
Question
If predicate is optional and is not passed to wait, why do we need the lock?

When using a condition variable to wait for a condition, a thread performs the following sequence of steps:
1. It determines that the condition is not currently true.
2. It starts waiting for some other thread to make the condition true. This is the wait call.
For example, the condition might be that a queue has elements in it, and a thread might see that the queue is empty and wait for another thread to put things in the queue.
If another thread were to intercede between these two steps, it could make the condition true and notify on the condition variable before the first thread actually starts waiting. In this case, the waiting thread would not receive the notification, and it might never stop waiting.
The purpose of requiring the lock to be held is to prevent other threads from interceding like this. Additionally, the lock must be unlocked to allow other threads to do whatever we're waiting for, but it can't happen before the wait call because of the notify-before-wait problem, and it can't happen after the wait call because we can't do anything while we're waiting. It has to be part of the wait call, so wait has to know about the lock.
Now, you might look at the notify_* methods and notice that those methods don't require the lock to be held, so there's nothing actually stopping another thread from notifying between steps 1 and 2. However, a thread calling notify_* is supposed to hold the lock while performing whatever action it does to make the condition true, which is usually enough protection.
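The queue example above can be sketched as follows. This is a minimal illustration, not taken from the original answer: the names queue_mutex, queue_cv, items, pop_blocking, push and demo_queue_handoff are all hypothetical. The consumer performs the check and the wait under the same lock that the producer takes to change the queue, so a notification cannot fall between the two steps.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

std::mutex queue_mutex;            // protects `items` (hypothetical name)
std::condition_variable queue_cv;  // notified when `items` becomes non-empty
std::queue<int> items;

// Consumer: step 1 (check) and step 2 (wait) both happen while holding the
// lock, so no producer can make the condition true in between.
int pop_blocking() {
    std::unique_lock<std::mutex> lk(queue_mutex);
    while (items.empty()) {
        queue_cv.wait(lk);  // atomically unlocks and blocks; relocks before returning
    }
    int value = items.front();
    items.pop();
    return value;
}

// Producer: changes the shared state under the same lock, then notifies.
void push(int value) {
    {
        std::lock_guard<std::mutex> guard(queue_mutex);
        items.push(value);
    }
    queue_cv.notify_one();
}

// Runs one producer against one consumer and returns what the consumer saw.
int demo_queue_handoff() {
    int received = 0;
    std::thread consumer([&] { received = pop_blocking(); });
    std::thread producer([] { push(42); });
    producer.join();
    consumer.join();
    return received;
}
```

Whichever thread runs first, the consumer either sees the element already in the queue and never waits, or is registered as a waiter before the producer can take the lock.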

TL;DR
If predicate is optional and is not passed to wait, why do we need the lock?
condition_variable is designed to wait for a certain condition to come true, not to wait just for a notification. So to "catch" the "moment" when the condition becomes true you need to check the condition and wait for the notification. And to avoid a race condition you need those two to be a single atomic operation.
Purpose Of condition_variable:
Enable a program to implement this: do some action when a condition C holds.
Intended Protocol:
Condition producer changes state of the world from !C to C.
Condition consumer waits for C to happen and takes the action while/after condition C holds.
Simplification:
For simplicity (to limit the number of cases to think about) let's assume that C never switches back to !C. Let's also forget about spurious wakeups. Even with these assumptions we'll see that the lock is necessary.
Naive Approach:
Let's have two threads with an essential code summarized like this:
void producer() {
    _condition = true;
    _condition_variable.notify_all();
}
void consumer() {
    if (!_condition) {
        _condition_variable.wait();
    }
    action();
}
The Problem:
The problem here is a race condition. A problematic interleaving of the threads is the following:
1. The consumer reads the condition, finds it false, and decides to wait.
2. The thread scheduler interrupts the consumer and resumes the producer.
3. The producer updates the condition to true and invokes notify_all().
4. The consumer is resumed.
5. The consumer actually calls wait(), but is never notified and woken up (a liveness hazard).
So without locking the consumer may miss the event of the condition becoming true.
Solution:
Disclaimer: this code still does not handle spurious wakeups and possibility of condition becoming false again.
void producer() {
    {
        std::unique_lock<std::mutex> l(_mutex);
        _condition = true;
    }
    _condition_variable.notify_all();
}
void consumer() {
    {
        std::unique_lock<std::mutex> l(_mutex);
        if (!_condition) {
            _condition_variable.wait(l);
        }
    }
    action();
}
Here we check condition, release lock and start waiting as a single atomic operation, preventing the race condition mentioned before.
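For completeness, here is one way the same protocol might look once spurious wakeups are handled too. This is a sketch under assumptions: the Gate struct, its members, and run_gate are hypothetical names, and the counter actions merely stands in for action(). The predicate overload of wait re-checks the condition after every wakeup, which is equivalent to wrapping wait(l) in a while loop.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical wrapper around the producer/consumer pair discussed above.
struct Gate {
    std::mutex mutex;
    std::condition_variable cv;
    bool condition = false;
    int actions = 0;  // stands in for action()

    void producer() {
        {
            std::lock_guard<std::mutex> l(mutex);
            condition = true;
        }
        cv.notify_all();
    }

    void consumer() {
        {
            std::unique_lock<std::mutex> l(mutex);
            // Same as: while (!condition) cv.wait(l);
            // so a spurious wakeup just re-checks and goes back to waiting.
            cv.wait(l, [this] { return condition; });
        }
        ++actions;
    }
};

// Starts the consumer first, then the producer, and reports how many times
// the action ran.
int run_gate() {
    Gate g;
    std::thread c([&] { g.consumer(); });
    std::thread p([&] { g.producer(); });
    c.join();
    p.join();
    return g.actions;
}
```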
See Also
Why Lock condition await must hold the lock

You need a std::unique_lock when using std::condition_variable for the same reason you need a std::FILE* when using std::fwrite and for the same reason a BasicLockable is necessary when using std::unique_lock itself.
The feature std::fwrite gives you, the entire reason it exists, is to write to files. So you have to give it a file. The feature std::unique_lock provides you is RAII locking and unlocking of a mutex (or another BasicLockable, like std::shared_mutex, etc.), so you have to give it something to lock and unlock.
The feature std::condition_variable provides, the entire reason it exists, is atomically waiting and unlocking a lock (and completing a wait and locking). So you have to give it something to lock.
Why someone would want that is a separate question that has already been discussed. For example:
When is a condition variable needed, isn't a mutex enough?
Conditional Variable vs Semaphore
Advantages of using condition variables over mutex
And so on.
As has been explained, the pred parameter is optional, but having some sort of predicate and testing it isn't. In other words, not having a predicate doesn't make any sense, in the same way that having a condition variable without a lock doesn't make any sense.
The reason you have a lock is because you have shared state you need to protect from simultaneous access. Some function of that shared state is the predicate.
If you don't have a predicate and you don't have a lock you really don't need a condition variable just like if you don't have a file you really don't need fwrite.
A final point is that the second code snippet you wrote is very broken. Obviously it won't compile as you define the lock after you try to pass it as an argument to condition_variable::wait(). You probably meant something like:
std::mutex mtx_cv;
std::condition_variable cv;
...
{
    std::unique_lock<std::mutex> lk(mtx_cv);
    cv.wait(lk);
    lk.lock(); // throws std::system_error with an error code of std::errc::resource_deadlock_would_occur
}
The reason this is wrong is very simple. condition_variable::wait's effects are (from [thread.condition.condvar]):
Effects:
— Atomically calls lock.unlock() and blocks on *this.
— When unblocked, calls lock.lock() (possibly blocking on the lock), then returns.
— The function will unblock when signaled by a call to notify_one() or a call to notify_all(), or spuriously
After the return from wait() the lock is locked, and unique_lock::lock() throws an exception if it has already locked the mutex it wraps ([thread.lock.unique.locking]).
Again, why someone would want to couple waiting and locking the way std::condition_variable does is a separate question, but given that it does, you cannot, by definition, lock a std::condition_variable's std::unique_lock after std::condition_variable::wait has returned.
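The behaviour quoted from the standard can be observed directly. The following is only an illustrative sketch: wait_returns_locked and the ready flag are hypothetical additions so that the waiter has something to wait for and something to notify it.

```cpp
#include <condition_variable>
#include <mutex>
#include <system_error>
#include <thread>

// Returns true if, once wait() has returned, the unique_lock already owns the
// mutex and a second lock() call is rejected with resource_deadlock_would_occur.
bool wait_returns_locked() {
    std::mutex mtx_cv;
    std::condition_variable cv;
    bool ready = false;  // hypothetical predicate state
    bool owned_after_wait = false;
    bool relock_rejected = false;

    std::thread notifier([&] {
        {
            std::lock_guard<std::mutex> g(mtx_cv);
            ready = true;
        }
        cv.notify_one();
    });

    {
        std::unique_lock<std::mutex> lk(mtx_cv);
        cv.wait(lk, [&] { return ready; });
        owned_after_wait = lk.owns_lock();  // wait() relocked before returning
        try {
            lk.lock();  // already owned by this unique_lock
        } catch (const std::system_error& e) {
            relock_rejected =
                (e.code() == std::errc::resource_deadlock_would_occur);
        }
    }  // lk still owns the mutex here and releases it on scope exit

    notifier.join();
    return owned_after_wait && relock_rejected;
}
```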

It's not stated in the documentation (and could be implemented differently), but conceptually you can imagine that the condition variable has another mutex, both to protect its own data and to coordinate the condition, waiting and notification with modification of the consumer code data (e.g. queue.size()) affecting the test.
So when you call wait(...) the following (logically) happens.
Precondition: The consumer code holds the lock (CCL) controlling the consumer condition data (CCD).
The condition is checked, if true, execution in the consumer code continues still holding the lock.
If false, it first acquires its own lock (CVL), adds the current thread to the collection of waiting threads, releases the consumer lock (CCL), puts itself to sleep, and releases its own lock (CVL).
That final step is tricky because it needs to sleep the thread and release the CVL at the same time or in that order or in a way that threads notified just before going to wait are able to (somehow) not go to wait.
The step of acquiring the CVL before releasing the CCL is key. Any parallel thread trying to update the CCD and notify will be blocked either by the CCL or the CVL. If the CCL were released before acquiring the CVL, a parallel thread could acquire the CCL, change the data and then notify before the to-be-waiting thread is added to the waiters.
A parallel thread acquires the CCL, modifies the data to make the condition true (or at least worth testing) and then notifies. Notification acquires the CVL and identifies a blocked thread (or threads), if any, to unwait. The unwaited threads then seek to acquire the CCL and may block there, but won't leave wait and re-perform the test until they've acquired it.
Notification must acquire the CVL to make sure threads that have found the test false have been added to the waiters.
It's OK (possibly preferable for performance) to notify without holding the CCL because the hand-off between the CCL and CVL in the wait code is ensuring the ordering.
It may be preferable because notifying while holding the CCL may mean all the unwaited threads just unwait to block (on the CCL) while the thread modifying the data is still holding the lock.
Notice that even if the CCD is atomic you must modify it while holding the CCL, or the "lock CVL, unlock CCL" step won't ensure the total ordering required to make sure notifications aren't sent while threads are in the process of going to wait.
The standard only talks about atomicity of operations, and another implementation may have a way of blocking notification until the "add to waiters" step that follows a failed test has completed. The C++ Standard is careful not to dictate an implementation.
In all that, to answer some of the specific questions.
Must the state be shared? Sort of. There could be an external condition like a file being in a directory and the wait is timed to re-try after a time-period. You can decide for yourself whether you consider the file system or even just the wall-clock to be shared state.
Must there be any state? Not necessarily. A thread can wait on notification.
That could be tricky to coordinate because there has to be enough sequencing to stop the other thread notifying out of turn. The commonest solution is to have some boolean flag set by the notifying thread so the notified thread knows if it missed it. The normal use of void wait(std::unique_lock<std::mutex>& lk) is when the predicate is checked outside:
std::unique_lock<std::mutex> ulk(ccd_mutex);
while (!condition) {
    cv.wait(ulk);
}
Where the notifying thread uses:
{
    std::lock_guard<std::mutex> guard(ccd_mutex);
    condition = true;
}
cv.notify_one();

The reason is that the waiting thread sometimes already holds m_mutex:
#include <mutex>
#include <condition_variable>
void CMyClass::MyFunc()
{
    std::unique_lock<std::mutex> guard(m_mutex);
    // do something (on the protected resource)
    m_condition.wait(guard, [this]() { return !m_bSpuriousWake; });
    // do something else (on the protected resource)
    guard.unlock();
    // do something else than else
}
and a thread should never go to sleep while holding m_mutex. One doesn't want to lock everybody out while sleeping. So, atomically: {guard is unlocked and the thread goes to sleep}. Once it is woken up by the other thread (m_condition.notify_one(), let's say), guard is locked again, and then the thread continues.

Because otherwise there is a race condition between the waiting thread noticing the change of the shared state and the wait() call.
Assume we have a shared state of type std::atomic<int> called state_; there is still a fair chance for the waiting thread to miss a notification:
   T1 (waiting)                                   | T2 (notification)
--------------------------------------------------+---------------------------
1) for (int i = state_; i != 0; i = state_) {     |
2)                                                | state_ = 0;
3)                                                | cv.notify();
4)     cv.wait();                                 |
5) }                                              |
6) // go on with the satisfied condition...       |
Note that the wait() call failed to notice the latest value of state_ and may keep waiting forever.
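A possible fix, sketched here under assumptions (the names state_mutex, state_cv, wait_for_zero, set_zero and demo_atomic_state are hypothetical), is to keep the atomic but perform both the test-and-wait and the update while holding the same mutex. The update can then no longer land between the test in step 1 and the wait in step 4.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

std::atomic<int> state_{1};
std::mutex state_mutex;            // hypothetical: serializes test-and-wait with the update
std::condition_variable state_cv;

// T1: the test and the wait are done under the lock.
void wait_for_zero() {
    std::unique_lock<std::mutex> lk(state_mutex);
    while (state_.load() != 0) {
        state_cv.wait(lk);
    }
}

// T2: even though state_ is atomic, it is modified while holding the lock,
// so the store cannot happen between T1's test and T1's wait.
void set_zero() {
    {
        std::lock_guard<std::mutex> g(state_mutex);
        state_.store(0);
    }
    state_cv.notify_all();
}

// Runs both threads to completion and returns the final state.
int demo_atomic_state() {
    state_.store(1);
    std::thread waiter(wait_for_zero);
    std::thread notifier(set_zero);
    waiter.join();
    notifier.join();
    return state_.load();
}
```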

Is this reader-writer lock implementation correct?

I am wondering whether the following implementation of the reader/writer problem is correct.
We're using only one mutex and a count variable to indicate the number of readers.
read api:
void read() {
    mutex.lock();
    count++;
    mutex.unlock();
    // Do read
    mutex.lock();
    count--;
    mutex.unlock();
}
write api:
void write() {
    while (1) {
        mutex.lock();
        if (count == 0) {
            // Do write
            mutex.unlock();
            return;
        }
        mutex.unlock();
    }
}
Looks like in the code:
Only one lock is used so there is no deadlock problem;
Writer can only write when count == 0 so there is no race conditions.
For the readers-preference variant of the readers-writers problem, is there any problem with the code above? It looks like all the standard implementations use two locks (e.g. https://en.wikipedia.org/wiki/Readers%E2%80%93writers_problem#First_readers-writers_problem). If the above implementation is correct, why does the Wikipedia version use two locks? Thank you!

It's correct, but it will perform atrociously. Imagine if while a reader is trying to do work there are two waiting writers. Those two waiting writers will constantly acquire and release the mutex, saturating the CPU resources while the reader is trying to finish its work so that the system as a whole can make forward progress.
The nightmare scenario would be where the reader shares a physical core with one of the waiting writers. Yikes.
Correct, yes. Useful and sensible, definitely not!
One reason to use two locks is to prevent two writers from competing. A more common solution, at least in my experience, is to use a lock with a condition variable to release waiting writers or alternate phases.
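As a rough sketch of the lock-plus-condition-variable approach mentioned above (not the asker's code; the RwLock class and demo_rwlock are hypothetical names), waiting writers sleep on the condition variable instead of spinning on the mutex. Like the original, this version favours readers, so a steady stream of readers can still starve writers.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reader-preference lock built from one mutex and one
// condition variable. Blocked writers sleep instead of busy-waiting.
class RwLock {
public:
    void lock_shared() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !writing_; });
        ++readers_;
    }
    void unlock_shared() {
        bool last;
        {
            std::lock_guard<std::mutex> lk(m_);
            last = (--readers_ == 0);
        }
        if (last) cv_.notify_all();  // a writer may now proceed
    }
    void lock() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !writing_ && readers_ == 0; });
        writing_ = true;
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> lk(m_);
            writing_ = false;
        }
        cv_.notify_all();  // wake waiting readers and writers
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int readers_ = 0;
    bool writing_ = false;
};

// Four writers increment a counter 1000 times each while four readers read it.
int demo_rwlock() {
    RwLock rw;
    int shared_value = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([&] {
            for (int j = 0; j < 1000; ++j) {
                rw.lock();
                ++shared_value;
                rw.unlock();
            }
        });
    }
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([&] {
            for (int j = 0; j < 1000; ++j) {
                rw.lock_shared();
                int seen = shared_value;  // read-only access
                (void)seen;
                rw.unlock_shared();
            }
        });
    }
    for (auto& t : threads) t.join();
    return shared_value;
}
```

In C++17 and later, std::shared_mutex provides this kind of shared/exclusive locking out of the box.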

Posix Thread Synchronization Primitives: pthread_cond_signal() and pthread_cond_wait()

I was writing multithreaded code using pthread_cond in conjunction with mutexes, which made me wonder:
Is the signal one-time, so that if the signal is sent before the other thread is waiting for it, the other thread will keep waiting indefinitely?
Since cond_wait() unlocks the mutex, is it a rule of thumb to write this statement JUST before mutex_unlock() (I realise this makes the latter redundant, but I do that just for clarity), or are there many scenarios where you would want to write the function outside the mutex lock?

Make this your mantra:
Only ever wait for something ...
Waiting should almost always look like this:
if (pthread_mutex_lock(...) != 0) {
    /* something terrible happened, panic */
}
while (test-condition) {
    pthread_cond_wait(...);
}
pthread_mutex_unlock(...);
If the exclusive check of test-condition fails, and so a context enters pthread_cond_wait the associated mutex will be atomically unlocked.
This means another context can enter code that looks like:
if (pthread_mutex_lock(...) != 0) {
    /* panic */
}
test-condition = false;
pthread_cond_signal(...);
pthread_mutex_unlock(...);
This changes the predicate and wakes the first context, which is in the call to pthread_cond_wait; that context in turn re-checks the predicate test-condition and can jump past the loop.
If we just look at the waiting code again:
if (pthread_mutex_lock(...) != 0) {
    /* something terrible happened, panic */
}
while (test-condition) {
    pthread_cond_wait(...);
}
pthread_mutex_unlock(...);
Between the call to wait and unlock, there is always exclusivity: either because the mutex was acquired exclusively (the predicated wait loop was not entered), or because before returning from a call to pthread_cond_wait the mutex was re-acquired atomically.
Synchronization is hard to get right, and is costly for a multi-threaded application; one should attempt to keep critical sections simple to squeeze the margins for error to their minimum size.
Another important thing to do is check the return values of all these pthread_* calls; the return value is important information about state that you always need to know, and nearly always need to act upon.
Some useful man pages (for return values):
pthread_mutex_lock
pthread_cond_wait
pthread_cond_signal
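Putting the pieces together, the waiting and signalling sides might look like the sketch below, written against the POSIX API from C++. The names mtx, cond, must_wait, wait_until_released, release_waiter and demo_signal_order are hypothetical, and the error handling is deliberately minimal (each function just returns the first non-zero code). Because the waiter tests the predicate before it ever calls pthread_cond_wait, a signal sent before the waiter arrives is not lost: the waiter simply sees must_wait == false and skips the wait.

```cpp
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool must_wait = true;  // the predicate ("test-condition")

// Waiter: returns 0 on success, otherwise the first pthread error code.
int wait_until_released() {
    int rc = pthread_mutex_lock(&mtx);
    if (rc != 0) return rc;
    while (must_wait && rc == 0) {
        rc = pthread_cond_wait(&cond, &mtx);  // unlocks, sleeps, relocks
    }
    int unlock_rc = pthread_mutex_unlock(&mtx);
    return rc != 0 ? rc : unlock_rc;
}

// Signaller: changes the predicate under the mutex, then signals.
int release_waiter() {
    int rc = pthread_mutex_lock(&mtx);
    if (rc != 0) return rc;
    must_wait = false;
    rc = pthread_cond_signal(&cond);
    int unlock_rc = pthread_mutex_unlock(&mtx);
    return rc != 0 ? rc : unlock_rc;
}

static int waiter_result = -1;

static void* waiter_entry(void*) {
    waiter_result = wait_until_released();
    return nullptr;
}

// Works whether the signal is sent before or after the waiter starts waiting.
int demo_signal_order() {
    pthread_t t;
    if (pthread_create(&t, nullptr, waiter_entry, nullptr) != 0) return -1;
    int signal_rc = release_waiter();
    if (pthread_join(t, nullptr) != 0) return -1;
    return signal_rc != 0 ? signal_rc : waiter_result;
}
```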

Spawn a thread and worker queue only if resource is busy

Hoping someone can help me design this correctly.
In my TCP code, I have a SendMessage() function that tries to write to the wire. I am trying to design the call so that it moves to a producer/consumer model if a lot of concurrent requests happen, but at the same time, stays single-threaded if there are no concurrent requests (for maximum performance).
I'm struggling on how to design this without race conditions because there is no way to move locks between threads.
What I have so far is something like (pseudo-coded):
SendMessage(msg) {
    if(Monitor.TryEnter(wirelock,200)) {
        try{
            sendBytes(msg);
        }
        finally {
            Monitor.Exit...
        }
    }
    else {
        _SomeThreadSafeQueue.add(msg)
        Monitor.TryEnter(consumerlock,..
        Task.Factory.New(ConsumerThreadMethod....
    }
}
ConsumerThreadMethod() {
    lock (wirelock) {
        while(therearemessagesinthequeue)
            sendBytes...
    }
}
Any obvious race conditions?
EDIT: Found a flaw in the last one. How about this instead?
SendMessage(msg) {
    if(Monitor.TryEnter(wirelock)) {
        try{
            sendBytes(msg);
        }
        finally {
            Monitor.Exit...
        }
    }
    else {
        _SomeThreadSafeQueue.add(msg)
        if (Interlocked.Increment(ref _threadcounter) == 1)
        {
            Task.Factory.StartNew(() => ConsumerThreadMethod());
        }
        else
        {
            Interlocked.Decrement(ref _threadcounter);
        }
    }
}
ConsumerThreadMethod() {
    while(therearemessagesinthequeue) {
        lock (wirelock) {
            sendBytes...
        }
    }
    Interlocked.Decrement(ref _threadcounter);
}
So basically using the interlocked counter as a way to only ever spawn one thread (if necessary)

No obvious races but TryEnter is a cause for some serious idle time. I actually think that using a consumer thread all the time is the best solution. If there is little to do, the overhead will be really small (the consumer thread will be asleep when not working, if designed correctly).
Now you create a new task for each sent message, resulting in huge contention on the lock, since you are using a while loop in the consumer thread.
EDIT: Since you are using non-blocking sockets, a single consumer thread should be enough to handle all send requests. The throughput of a single thread is higher than your network. If you have more consumers it's hard to make sure that no two consumer threads send on the same socket, without serializing everything using a mutex. I don't think switching between single-threaded and multi-threaded is a good idea.
Your current "multithreaded" solution does not give you any performance gain since all work is protected using the same mutex. It will be as slow, or slower, than a single thread.
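To illustrate the "one consumer thread all the time" design, here is a rough sketch. It is written in C++ rather than the asker's C#, and every name in it (Sender, send_message, demo_sender, the wire vector standing in for sendBytes) is hypothetical. The consumer sleeps on a condition variable while the queue is empty, so the idle overhead is small, and only that one thread ever touches the wire.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sender with a single, always-running consumer thread.
class Sender {
public:
    explicit Sender(std::vector<std::string>& wire)
        : wire_(wire), worker_([this] { run(); }) {}

    ~Sender() {
        {
            std::lock_guard<std::mutex> g(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the queue before exiting
    }

    // Callers only enqueue; they never touch the wire themselves.
    void send_message(std::string msg) {
        {
            std::lock_guard<std::mutex> g(m_);
            pending_.push(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            // Asleep here whenever there is nothing to send.
            cv_.wait(lk, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and fully drained
            std::string msg = std::move(pending_.front());
            pending_.pop();
            lk.unlock();                     // do the slow I/O without the lock
            wire_.push_back(std::move(msg)); // stand-in for sendBytes(msg)
            lk.lock();
        }
    }

    std::vector<std::string>& wire_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

// Sends three messages and returns how many reached the (fake) wire.
std::size_t demo_sender() {
    std::vector<std::string> wire;
    {
        Sender s(wire);
        s.send_message("a");
        s.send_message("b");
        s.send_message("c");
    }  // destructor waits for the queue to drain
    return wire.size();
}
```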

Some questions about pthread_mutex_lock and pthread_mutex_unlock

When a thread has acquired the lock and executes the following code, will the lock it has acquired be released just by the return statement? The code looks like this:
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int foo()
{
    pthread_mutex_lock(&mutex);
    /* ... execute some code here, and some error happens ... */
    return -1; /* returns without calling pthread_mutex_unlock */
    pthread_mutex_unlock(&mutex);
    return 0;
}
Some error happens before the pthread_mutex_unlock statement and the thread returns to the caller. Will the thread give the mutex back to other threads without executing pthread_mutex_unlock?

No, the lock is not automatically released. This is why, in C++ code, it is common to use Resource Acquisition Is Initialization (RAII), which takes advantage of construction/destruction to ensure that each call to the lock function has a corresponding call to unlock. If you are writing pure C code, though, you will need to make sure that you unlock the mutex, even in error situations, before returning.
Note that you can make your coding a little bit easier by doing the following:
static inline int some_function_critical_section_unsynchronized(void) {
    // ... may return early on error; the caller still unlocks
}
int some_function(void) {
    int status = 0;
    pthread_mutex_lock(&mutex);
    status = some_function_critical_section_unsynchronized();
    pthread_mutex_unlock(&mutex);
    return status;
}
In other words, if you can separate the logic into smaller functions, you may be able to tease out the locking code from your logic. Of course, sometimes this is not possible (like when coding in this fashion would make the critical section too large, and for performance, the less readable form is required).
If you can use C++, I would strongly suggest using boost::thread and boost::mutex::scoped_lock to ensure that the acquired mutex is automatically released when the lock object goes out of scope.
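With a C++11 or later compiler the same idea is available in the standard library. The sketch below uses std::lock_guard in place of the Boost types; the names account_mutex, balance, withdraw and mutex_is_free are hypothetical and only serve the illustration. The early error return does not leak the lock, because the guard's destructor runs on every path out of the function.

```cpp
#include <mutex>

std::mutex account_mutex;  // hypothetical: protects `balance`
int balance = 0;

// Returns -1 on error, 0 on success. There is no explicit unlock call:
// `guard` releases the mutex when it goes out of scope, on both paths.
int withdraw(int amount) {
    std::lock_guard<std::mutex> guard(account_mutex);
    if (amount > balance) {
        return -1;  // error path: the mutex is still released here
    }
    balance -= amount;
    return 0;
}

// Helper for the illustration: reports whether the mutex can be acquired,
// i.e. that the previous call did not leave it locked.
bool mutex_is_free() {
    if (!account_mutex.try_lock()) return false;
    account_mutex.unlock();
    return true;
}
```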

No, it will not automatically unlock the mutex. You must explicitly call pthread_mutex_unlock() in the error path, if the mutex has been locked by the function.
