Mutexes vs Monitors - A Comparison

From what I have learned about mutexes - they generally provide a locking capability on a shared resource. So if a new thread wants to access this locked shared resource, it either quits or has to continually poll the lock (wasting processor cycles while waiting for it).
However, a monitor has condition variables, which provide a more asynchronous way of waiting: threads are put on a wait queue and therefore do not consume processor cycles.
Would this be the only advantage of monitors over mutexes (or any general locking mechanism without condition variables)?

Mutexes are a low-level construct. They just provide mutual exclusion and memory visibility/ordering. Monitors, on the other hand, are higher level - they allow threads to wait for an application-specific condition to hold.
So, in some cases monitors are just overkill compared to a simple lock/unlock, but in most cases mutexes alone are not nearly enough - so you see them used with one or more condition variables, which is conceptually equivalent to using monitors.
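To make the equivalence concrete, here is a minimal sketch in C with pthreads of the mutex-plus-condition-variable pattern; the counter name and the produce/consume framing are just illustrative assumptions, not taken from the question:

    #include <pthread.h>

    /* Monitor-style guarded counter: the mutex gives mutual exclusion,
       the condition variable lets consumers sleep (no busy-waiting)
       until the application-specific condition holds. */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  item_ready = PTHREAD_COND_INITIALIZER;
    static int item_count = 0;  /* the shared state the condition is about */

    void produce(void) {
        pthread_mutex_lock(&lock);
        item_count++;
        pthread_cond_signal(&item_ready);  /* wake one waiting consumer */
        pthread_mutex_unlock(&lock);
    }

    void consume(void) {
        pthread_mutex_lock(&lock);
        while (item_count == 0)  /* re-check: wakeups can be spurious */
            pthread_cond_wait(&item_ready, &lock);  /* releases lock, sleeps */
        item_count--;
        pthread_mutex_unlock(&lock);
    }

A waiting consumer sits on the condition variable's wait queue instead of polling the lock, which is exactly the advantage the question describes.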

I think a monitor locks an object (multiple threads cannot access the object at the same time),
while a mutex locks a section of code (of the multiple threads, only one can go through that section at a time).

Why doesn't POSIX provide a robust IPC semaphore (regarding process crash safety)?

According to this link, How do I recover a semaphore when the process that decremented it to zero crashes?, it seems that there is no robust inter-process semaphore, and the author finally chose file locking, which is guaranteed to be released properly by system-level or kernel-level control.
But I also found the robust mutex provided by pthreads (https://man7.org/linux/man-pages/man3/pthread_mutexattr_setrobust.3.html), so why is there nothing like a robust semaphore?
And an extra question: what robust alternatives do we have for IPC synchronization? File locking seems to be the best one. I think providing such a mechanism is not that difficult at the system or kernel level, since they do implement file locking - so why don't they provide other approaches?
When you use a mutex, it can be acquired by at most one thread at a time. Therefore, once the mutex has been acquired, the owner can write its process ID or thread ID (depending on the system) into the mutex, and future users can detect whether the owner is still alive or not.
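For illustration, a hedged sketch of that detection mechanism using the pthreads robust-mutex API (the mutex is assumed to live in a shared memory segment mapped by all participating processes; error handling omitted):

    #include <errno.h>
    #include <pthread.h>

    /* 'm' is assumed to be placed in shared memory mapped by all processes. */
    void init_robust(pthread_mutex_t *m) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    void lock_robust(pthread_mutex_t *m) {
        if (pthread_mutex_lock(m) == EOWNERDEAD) {
            /* The previous owner died holding the mutex. We now own it:
               repair the protected state, then mark the mutex usable again. */
            pthread_mutex_consistent(m);
        }
    }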
However, a semaphore is ultimately a counter. It is possible that different threads may increment or decrement the counter. There isn't intrinsically one resource that is being shared; there could instead be multiple resources.
For example, if we're trying to limit ourselves to a certain number of outgoing connections (say, 8), then we could create a semaphore with that value and allow threads to acquire it (wait) to make a connection, and then increment it (post) when they're done. If we never try to make more than 8 connections at once, the semaphore will never block; we'll have acquired it successfully each time, even though there's no mutual exclusion.
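A minimal sketch of that connection-limiting pattern with an unnamed POSIX semaphore (the connection calls are placeholders):

    #include <semaphore.h>

    static sem_t conn_slots;

    void setup(void) {
        sem_init(&conn_slots, 0, 8);  /* 8 permits, shared between threads */
    }

    void do_request(void) {
        sem_wait(&conn_slots);  /* take a slot; blocks only if all 8 are busy */
        /* open_connection(); ...; close_connection();  -- placeholders */
        sem_post(&conn_slots);  /* return the slot */
    }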
In such a situation, there isn't going to be space inside the semaphore to store every process's thread ID. Using memory allocation is tricky because that code needs to be synchronized independently, and even if that could be solved, it means that acquiring the semaphore would have at least O(N) cost. I work on a production system that uses hundreds of threads, so you can imagine the performance problems if we had such a semaphore design.
There are other solutions which you can use when you need robustness, such as file locking or a robust mutex in a shared memory segment, but none of them have the same properties as a semaphore. Therefore, any discussion of what primitives should be used instead depends on the particular needs of the situation (which should probably be placed in a new question).
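As an illustration of the file-locking alternative, a minimal sketch with flock (the lock-file path is made up; the kernel releases the lock automatically if the holding process crashes):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    void with_file_lock(void) {
        int fd = open("/tmp/myapp.lock", O_CREAT | O_RDWR, 0600);  /* hypothetical path */
        flock(fd, LOCK_EX);  /* blocks until we hold the exclusive lock */
        /* ... critical section: released by the kernel even if we crash ... */
        flock(fd, LOCK_UN);
        close(fd);
    }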

Can multiple threads acquire a lock on the same object?

I am taking a course on concurrency. The text says that multithreading allows high throughput as it takes advantage of the multiple cores of the CPU.
I have a question about locking in the context of multiple cores. If we have multiple threads and they are running on different CPU cores, why can't two threads acquire the same lock? How does the OS protect against such scenarios?
Locking and locks are for synchronization: they prevent data corruption when multiple threads want to write to the same memory.
Generally you run multiple threads and use locking only in critical sections.
If two or more threads want to write to the same place at the same time, the benefit of multi-core computation is limited. Of course, you can skip locking in that situation, but the results become unpredictable.
For example, in a multi-threaded matrix multiplication you can create a thread for every row of the resulting matrix. No locking is needed because every thread writes to a different place, so this scenario fully benefits from multiple processors (see the sketch below).
If you want to permit more than one thread to share access to a resource, you can use a Semaphore (in Java).
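Here is the sketch: a compact pthreads version of the lock-free per-row matrix multiplication (the fixed 3x3 size is chosen just for the example):

    #include <pthread.h>

    #define N 3
    static double A[N][N], B[N][N], C[N][N];

    /* Each thread computes one row of C; no locking is needed because
       no two threads ever write to the same element. */
    static void *row_worker(void *arg) {
        int i = (int)(long)arg;
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
        return 0;
    }

    void multiply(void) {
        pthread_t t[N];
        for (long i = 0; i < N; i++)
            pthread_create(&t[i], 0, row_worker, (void *)i);
        for (int i = 0; i < N; i++)
            pthread_join(t[i], 0);
    }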
If we have multiple threads and they are running on different CPU cores, why can't two threads acquire the same lock?
The purpose of a mutex/lock is to implement mutual exclusion - only one thread can lock a mutex at a time. Or, in other words, many threads cannot lock the same mutex at the same time, by definition. This mechanism is needed to allow multiple threads to store into or read from a shared non-atomic resource without data races.
How does the OS protect against such scenarios?
OS support is needed to prevent the threads from busy-waiting when locking a mutex that is already locked by another thread. Linux implementations of mutexes (and semaphores) use futex to put the waiting threads to sleep and wake them up when the mutex is released.
Here is a longer explanation from Linus Torvalds of how mutexes are implemented.
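To make the futex mechanism concrete, here is a heavily simplified, Linux-only sketch of a futex-backed lock, adapted from the pattern in Ulrich Drepper's paper "Futexes Are Tricky"; it is illustrative, not production code:

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* state: 0 = unlocked, 1 = locked, 2 = locked with waiters */
    typedef struct { atomic_int state; } ftx_lock;

    static long futex(atomic_int *addr, int op, int val) {
        return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
    }

    void ftx_acquire(ftx_lock *l) {
        int c = 0;
        if (atomic_compare_exchange_strong(&l->state, &c, 1))
            return;                             /* fast path: no syscall */
        if (c != 2)
            c = atomic_exchange(&l->state, 2);  /* announce contention */
        while (c != 0) {
            futex(&l->state, FUTEX_WAIT, 2);    /* sleep until woken */
            c = atomic_exchange(&l->state, 2);
        }
    }

    void ftx_release(ftx_lock *l) {
        if (atomic_exchange(&l->state, 0) == 2) /* waiters may be sleeping */
            futex(&l->state, FUTEX_WAKE, 1);    /* wake exactly one */
    }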

Multithreading on multiple core/processors

I get the idea that if locking and unlocking a mutex is an atomic operation, it can protect the critical section of code in the case of a single-processor architecture.
Whichever thread is scheduled first will be able to "lock" the mutex in a single machine-code operation.
But how are mutexes any good when the threads are running on multiple cores, where different threads could be executing simultaneously on different cores?
I can't seem to grasp how a multithreaded program can work without deadlocks or race conditions on multiple cores.
The general answer:
Mutexes are an operating system concept. An operating system offering mutexes has to ensure that they work correctly on all hardware that the operating system wants to support. If implementing a mutex is not possible for specific hardware, the operating system cannot offer mutexes on that hardware. If the operating system requires the existence of mutexes to work correctly, it cannot support that hardware at all. How the operating system implements mutexes for specific hardware is, unsurprisingly, very hardware dependent and varies a lot between operating systems and their supported hardware.
The detailed answer:
Most general purpose CPUs offer atomic operations. These operations are designed to be atomic across all CPU cores within a system, whether these cores are part of a single or multiple individual CPUs.
With as few as two atomic operations, atomic_or and atomic_and, it is possible to implement a lock. E.g., think of

    int atomic_or(int *addr, int val);

It atomically calculates *addr = *addr | val and returns the old value of *addr prior to performing the calculation. If *lock == 0 and multiple threads call atomic_or(lock, 1), then only one of them - the first to perform the operation - will get 0 as the result. All other threads get 1 as the result. The one thread that got 0 is the winner: it has the lock; all other threads register for an event and go to sleep.
The winner thread now has exclusive access to the section following the atomic_or. It can perform the desired work, and once it is done, it just clears the lock again (atomic_and(lock, 0)) and generates a system event that the lock is now available again.
The system will then wake up one, some, or all of the threads that registered for this event before going to sleep, and the race for the lock starts all over. Either one of the woken-up threads wins the race, or possibly none of them does, because another thread was even faster and grabbed the lock between the atomic_and and the wake-up. That is okay and still correct, as it's still only one thread having access at a time. All threads that failed to obtain the lock go back to sleep.
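In C11 terms, the acquire/release pair described above might look like the following sketch, with spinning standing in for the OS-specific event registration and sleep:

    #include <stdatomic.h>

    atomic_int lock_word = 0;

    void acquire(atomic_int *lock) {
        /* The thread that gets 0 back is the winner and holds the lock. */
        while (atomic_fetch_or(lock, 1) != 0)
            ;  /* lost the race; a real system would register for the event and sleep */
    }

    void release(atomic_int *lock) {
        atomic_fetch_and(lock, 0);  /* clear the lock; a real system would now signal the event */
    }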
Of course, the actual implementations of modern systems are often much more complicated than that. They may take things like thread priorities into account (high-priority threads may be preferred in the lock race) or might ensure that every thread waiting for a mutex will eventually also get it (precautions exist that prevent a thread from always losing the lock race). Also, mutexes can be recursive, in which case the system ensures that the same thread can obtain the same mutex multiple times without deadlocking, and this requires some additional bookkeeping.
Probably needless to say, but atomic operations are more expensive, as they require the cores within a system to synchronize their work, and this slows their processing throughput. They may be somewhat expensive if all cores run on a single CPU, but they may be very expensive if there are multiple CPUs, as the synchronization must take place over the bus system that connects the CPUs with each other, and this bus system usually does not operate at CPU speed.
On the other hand, using mutexes will always slow down processing to begin with, as providing exclusive access to resources has to slow down processing if multiple threads ever require access at the same time to continue their work. So for implementing mutexes, the extra cost of atomic operations is irrelevant. Actually, if you can implement a function in a thread-safe way using just atomic operations instead of full-featured mutexes, you will quite often have a noticeable speed benefit, despite these operations being more expensive than normal operations.
Threads are managed by the operating system, which, among other things, is responsible for scheduling threads to cores; it can also avoid scheduling a specific thread onto a core.
A mutex is an operating-system concept. You're basically asking the OS to block a thread until some other thread tells the OS it's OK.
On modern operating systems, threads are an abstraction over the physical hardware. A programmer targets the thread as an abstraction for code execution. There is no separate abstraction for working on a hardware core available. The operating system is responsible for mapping threads to physical cores.
A mutex is a data structure that lives in system memory. Any thread that has access can read that memory position, regardless of which thread or core it is running in. It doesn't matter whether your code is executing on core 1 or core 20; it still has the ability to read the current state of the lock.
In other words, regardless of the number of threads or cores, there is only one shared system memory for them to act on.

Thread synchronization vs process synchronization

Can we use the same synchronization mechanisms for both thread synchronization and process synchronization?
What are the synchronization mechanisms that are available only within a process?
Semaphores are generally what is used for multi-process synchronization in terms of shared memory access, etc.
Critical sections, mutexes, and condition variables are the more common tools for thread synchronization within a process.
Generally speaking, the methods used to synchronize threads are not used to synchronize processes, but the reverse is usually not true. In fact, it's fairly common to use semaphores for thread synchronization.
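For example, a hedged sketch of cross-process synchronization with a POSIX named semaphore (the name "/myapp_sem" is made up for the example):

    #include <fcntl.h>
    #include <semaphore.h>

    void critical_work(void) {
        /* Named semaphores are visible to any process that opens the same name. */
        sem_t *s = sem_open("/myapp_sem", O_CREAT, 0600, 1);  /* binary semaphore */
        sem_wait(s);  /* enter the cross-process critical section */
        /* ... touch the shared resource ... */
        sem_post(s);  /* leave */
        sem_close(s);
    }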
There are several synchronization entities. They have different purposes and scopes. Different languages and operating systems implement them differently. On Windows, for one, you can use monitors for synching threads within a process, or a mutex for synching processes. There are semaphores, events, barriers... It all depends on the case. .NET provides so-called slim versions that have improved performance but target only in-process synching.
One thing to remember, though: synching processes requires system resources, the allocation and manipulation (locking and releasing) of which take quite a while.
An application consists of one or more processes. A process, in the simplest terms, is an executing program. One or more threads run in the context of the process. A thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently being executed by another thread.
Ref.
As to specific synchronization constructs, that will depend on the OS/environment/language.
One difference: Threads within a process have equal access to the memory of the process. Memory is typically private to a process, but can be explicitly shared.

Critical Sections that Spin on Posix?

The Windows API provides critical sections in which a waiting thread will spin a limited number of times before context switching, but only on a multiprocessor system. Such critical sections are created with InitializeCriticalSectionAndSpinCount. (See http://msdn.microsoft.com/en-us/library/ms682530.aspx.) This is efficient when you have a critical section that will often only be locked for a short period of time, so contention should not immediately trigger a context switch. Two related questions:
For a high-level, cross-platform threading library or an implementation of a synchronized block, is having a small amount of spinning before triggering a context switch a good default?
What, if anything, is the equivalent to InitializeCriticalSectionAndSpinCount on other OSes, especially POSIX?
Edit: Of course no spin count will be optimal for all cases. I'm only interested in whether using a nonzero spin count would be a better default than not using one.
My opinion is that the optimal "spin count" for best application performance is too hardware-dependent for it to be an important part of a cross-platform API, and you should probably just use mutexes (in POSIX, pthread_mutex_init / destroy / lock / trylock) or spin-locks (pthread_spin_init / destroy / lock / trylock). Rationale follows.
What's the point of the spin count? Basically, if the lock owner is running simultaneously with the thread attempting to acquire the lock, then the lock owner might release the lock quickly enough that the EnterCriticalSection caller could avoid giving up CPU control in acquiring the lock, improving that thread's performance, and avoiding context switch overhead. Two things:
1: obviously this relies on the lock owner running in parallel to the thread attempting to acquire the lock. This is impossible on a single execution core, which is almost certainly why Microsoft treats the count as 0 in such environments. Even with multiple cores, it's quite possible that the lock owner is not running when another thread attempts to acquire the lock, and in such cases the optimal spin count (for that attempt) is still 0.
2: with simultaneous execution, the optimal spin count is still hardware dependent. Different processors will take different amounts of time to perform similar operations. They have different instruction sets (the ARM I work with most doesn't have an integer divide instruction), different cache sizes, the OS will have different pages in memory... Decrementing the spin count may take a different amount of time on a load-store architecture than on an architecture in which arithmetic instructions can access memory directly. Even on the same processor, the same task will take different amounts of time, depending on (at least) the contents and organization of the memory cache.
If the optimal spin count with simultaneous execution is infinite, then the pthread_spin_* functions should do what you're after. If it is not, then use the pthread_mutex_* functions.
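As an aside, the closest built-in POSIX-world equivalent I know of is glibc's non-portable adaptive mutex type, which spins briefly on SMP systems before sleeping (the spin count itself is not settable through this API); a minimal sketch:

    #define _GNU_SOURCE  /* for PTHREAD_MUTEX_ADAPTIVE_NP (glibc-specific) */
    #include <pthread.h>

    pthread_mutex_t m;

    void init_adaptive(void) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        /* glibc extension: spin a short while before falling back to sleeping */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(&m, &attr);
        pthread_mutexattr_destroy(&attr);
    }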
For a high-level, cross-platform threading library or an implementation of a synchronized block, is having a small amount of spinning before triggering a context switch a good default?
One would think so. Many moons ago, Solaris 2.x implemented adaptive locks, which did exactly this: spin for a while if the mutex is held by a thread currently executing on another CPU, or block otherwise.
Obviously, it makes no sense to spin on single-CPU systems.
