what is the difference between semaphore and critical region? - semaphore

The only thing I understood is:
a semaphore is a primitive way
a critical region has a GUARD variable (a semaphore also does, but the name is not GUARD!)
??
So what's the difference?

Generally, a critical region is a place where, if two separate threads of execution were to be present, a race condition or some other undesirable effect would occur. Semaphores are one way of preventing two threads from being in the critical region at the same point in time.

The GUARD would only allow 1 thread to enter the critical region at a time, whereas the semaphore can allow n threads (you specify n) to concurrently enter the critical region.

When a process executes code that manipulates shared data (or a resource), we say that the process is in its critical section (CS) for that shared data.
A semaphore is a nonnegative integer variable used as a flag, which signals if and when the resource is free.

There are two interpretations of "critical region":
A region of code that will produce undefined results if executed simultaneously by two threads.
A region of code that is isolated from all executors except for the current thread. An example of this would be an interrupt handler. These regions are more commonly called "critical sections". On Intel CPUs you can begin/end a critical section with the CLI/STI instructions.

Related

Critical region codes and Semaphores

Semaphores, which are data structures created by the operating system, are used to provide synchronization and mutual exclusion between processes. wait() and signal() are methods invoked by the operating system to manage semaphores, and these methods cannot be interrupted by interrupt service routine signals.
What I am wondering is whether critical region code between the wait() and signal() methods can be interrupted or not?
Yes, they can be interrupted, simply because the definition itself imposes no such restriction.
In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed are protected. This protected section is the critical section or critical region. It cannot be executed by more than one process at a time.
So critical-section demands mutual exclusion but it does not say anything about atomicity.
So yes, critical region code between the wait() and signal() methods can be interrupted, but a good synchronization construct ensures that once a process/thread enters the critical section, even if that process is later interrupted, no other process can enter the critical section.

Many threads and critical region

If I have many threads running at the same time how can I allow only 1 thread to enter the critical region? Also what will happen if I had more than 1 thread in the critical region?
There are several kinds of bugs that can happen as a result of improperly-protected critical sections, but the most common is called a race condition. This occurs when the correctness of your program's behavior or output is dependent on events occurring in a particular sequence, but it's possible that the events will occur in a different sequence. This will tend to cause the program to behave in an unexpected or unpredictable manner. (Sorry to be a little vague on that last point but by its very nature it's often difficult to predict in advance what the exact result will be other than to say "it probably won't be what you wanted").
Generally, to fix this, you'd use some kind of lock to ensure that only one thread can access the critical section at once. The most common mechanism for this is a mutex lock, which is used for the "simple" case - you have some kind of a shared resource and only one thread can access it at one time.
There are some other mechanisms available too for more complicated cases, such as:
Reader-Writer Locks - either one thread can be writing to the resource, or any number of threads can be reading from it.
Counting semaphore - some specified number of threads can access a particular resource at one time. As an analogy, think of a parking lot that only has, for example, 100 spaces - once 100 cars are parked there, it can't accept any more (or, at least, not until one of them leaves).
The .NET Framework provides a ManualResetEvent - basically, the threads in question have to wait until an event happens.
This isn't a lock per se, but it's becoming increasingly common to use immutable data structures to obviate the need for locking in the first place. The idea here is that no thread can modify another thread's data, they're always working against either a local version or the unmodified "central" version.
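To make the mutex case concrete, here is a minimal C++ sketch (the names shared_counter and worker are invented for this example) in which a lock ensures only one thread is inside the critical section at a time:

// Minimal sketch: a mutex guarding a shared counter.
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex counter_mutex;
long shared_counter = 0;

void worker()
{
    for (int i = 0; i < 100000; ++i)
    {
        std::lock_guard<std::mutex> guard(counter_mutex); // lock on entry to the critical section
        ++shared_counter;                                 // critical section: one thread at a time
    }                                                     // guard unlocks at end of scope
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
    std::cout << shared_counter << "\n"; // always 400000 with the lock; unpredictable without it
}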

How does a Mutex work? Does a mutex protect variables globally? Does the scope in which it is defined matter?

Does a mutex lock access to variables globally, or just those in the same scope as the locked mutex?
Note that I had to change the title of this question, as a lot of answers seem to be confused as to what I was asking. This is not a question about the scope (global or otherwise) of a "mutex object", it is a question about what scope of variables are "locked" by a mutex.
I believe the answer to be that a mutex locks access to all variables, i.e. all global and locally scoped variables. (This is a result of a mutex blocking thread execution rather than access to specific regions of memory.)
I am attempting to understand Mutexes.
I was attempting to understand what sections of memory, or equivalently, which variables, a mutex would lock.
However my understanding from reading around online is that Mutexes do not lock memory, they lock (or block) simultaneously running threads which are all members of the same process. (Is that correct?)
https://mortoray.com/2011/12/16/how-does-a-mutex-work-what-does-it-cost/
So my question has become simply "are mutexes global?"
... or are they perhaps "generally speaking global, but the stackoverflow community can imagine some special cases in which they are not?"
When originally considering my question, I was interested in things such as those shown in the following example.
// both in global scope: will this mutex lock any global scope variable?
int global_variable;
mutex global_variable_mutex;

int main()
{
    // one thread operates here and locks global_variable_mutex
    // before reading/writing

    {
        // local variables in a loop
        // launch some threads here, and wait later
        int local_variable;
        mutex local_variable_mutex;

        // wait for launched thread to return
        // does the mutex here prevent data races to the variable
        // global_variable ???
    }
}
One may assume this is pseudo-code for C++ or C, or any other similarly relevant language.
2021 edit: Question title has been changed to better reflect the contents of the question and associated answers.
So my question has become simply "are mutexes global?"
No. A mutex has a lock() and an unlock() method, and the only thing a mutex does is cause its lock() call (from any thread) not to return for as long as another thread has that mutex locked. When the thread that was holding the mutex locked calls unlock(), that is when the lock() call will return in the first thread. That way it is guaranteed that only a single thread will be holding the mutex-lock (i.e. executing in the region between its lock() call and its unlock() call) at any given time.
That's really all there is to it. So a mutex will affect only the threads that call lock() on that particular mutex, and nothing else.
Mutex stands for "Mutual Exclusion" - using one correctly ensures that only one thread at a time will ever be executing any "critical section" protected by the same mutex.
If there are some variables you only ever modify inside critical sections protected by the same mutex, your code doesn't have a data race. No matter whether they're global, static, or pointed to by different variables in different threads or any other way two threads might have a reference to the same object.
When I asked this question I was confused...
When I originally asked this question, I was confused because I had no conceptual understanding of how a "mutex" functions in hardware, whereas I did have a conceptual understanding of many other things that exist in hardware. (For example, how a compiler converts text into machine readable instructions; how cache and memory work; how graphics or coprocessors work; how network hardware and interfaces work, etc.)
Misconception 1: Mutex does not lock memory locations
When I first heard about Mutex, long before writing this question, I misunderstood a mutex to be a feature which locks regions of memory. (That region might be global.)
This is not what happens. Other threads and processes can continue to access main memory and cache while another thread holds a mutex lock. You can see immediately why such a design would be inefficient: it would block all other system processes for the sake of synchronizing one.
Misconception 2: The scope in which a mutex object is declared is irrelevant
The context here is C code, and C-like languages where you have scoped blocks defined by { and }; however, the same logic could apply to Python, where scope is defined by indentation.
I believe that this misunderstanding came from the existence of scoped_lock objects, and similar concepts where scope is used to manage the lifetime (locking and unlocking, resources) of a Mutex object.
One could also argue that since pointers and references to a Mutex can be passed around a program, the scope of a Mutex couldn't be used to define what variables are "locked" by a mutex.
For example, I misunderstood the following snippet:
{
    int x, y, z;
    Mutex m;
    m.lock();
}
I believed that the above snippet would lock access to variables x, y and z from all other threads because x, y and z are declared in the same scope as the mutex m. This is also not how a mutex works.
Understanding 1: Mutex is typically implemented in hardware using atomic operations
Atomic operations are a completely separate concept from mutexes; however, they are a prerequisite to understanding how a mutex can exist and how it can work.
When a CPU executes something like c = a + b, this involves a sequence of individual (atomic) operations. The word atom is derived from atomos, meaning "indivisible" or "fundamental". (Atoms are divisible, but when the theorists of Ancient Greece originally conceived of the objects from which matter is composed, they assumed that particles must be divisible down to some fundamental smallest possible component, which itself is indivisible. They were not too far wrong, since an atom is made from other fundamental particles which so far we understand to be indivisible.)
Returning to the point: c = a + b is something like the following:
load a from memory into register 1
load b from memory into register 2
do operation add: add contents of register 2 to register 1, result is in register 1
save register 1 to memory c
The add operation might take several clock cycles, and loading from/storing to memory typically takes on the order of 100 clock cycles on modern x86 machines. However, each operation is atomic in the sense that a single CPU instruction is being completed, and this instruction cannot be divided into smaller steps. The instructions are themselves fundamental computing operations.
With that understood, there exists a set of atomic instructions which can do things such as:
load a value from memory, increment it, and save it back to memory
load a value from memory, decrement it, and save it back to memory
load a value from memory, compare it to a value which is already loaded into a register, and branch depending on the comparison result
Note that such operations are typically significantly slower than their non-atomic sequence counterparts. This is because optimizations such as pipelining are forfeited when executing the above instructions. (I think?)
At this point my knowledge becomes a bit less accurate and more hand-wavy, but as far as I understand, these operations are typically implemented by having some digital logic inside the processor which keeps all other processors away from the relevant data while one of these atomic operations (listed above) is executing.
Meaning: if there are 8 CPU cores running and one core encounters an instruction like the above, it signals the other cores not to touch that memory until it has finished the atomic operation. (It is at least something approximately along these lines.)
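For what it's worth, C++ exposes these read-modify-write operations directly through std::atomic; a minimal illustrative sketch of their high-level counterparts:

// Illustrative C++ counterparts of the atomic instructions listed above.
#include <atomic>
#include <cassert>

int main()
{
    std::atomic<int> value(5);

    value.fetch_add(1); // atomically: load, increment, store
    value.fetch_sub(1); // atomically: load, decrement, store

    int expected = 5;
    // atomically: compare with an expected value and store 42 only if it still matches
    bool swapped = value.compare_exchange_strong(expected, 42);
    assert(swapped && value.load() == 42);
}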
Understanding 2: Actual mutex operation
Given the above, it is possible to implement a mutex using these atomic machine instructions. Other answers posted here suggest possible ways of doing it, including something similar to reference counting (a semaphore).
How an actual mutex in C++ works is this:
Each mutex object has a variable in memory associated with it, the value of this variable indicates whether a mutex is locked or not
This mutex variable is updated using the special atomic operations that a CPU supports for the purpose of allowing a mutex to be programmed
Elsewhere in memory there are some other variables/data which you want to protect/synchronize access to
This synchronization is done using the mutex variable/data
Before a thread reads/writes to some data/variable which needs to be accessed mutually exclusively by all threads which operate on it, that thread must first "lock" the special mutex data/variable
This is done using the atomic operations built into a CPU for the purpose of supporting mutex programming
So you see, the data which is "locked" and accessed mutually exclusively is entirely independent from the actual data used to store the state of the mutex.
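To make that concrete, here is a toy spinlock sketch built on a single atomic flag. This is illustrative only: it is not how std::mutex is actually implemented, and it omits the sleep/wake cooperation with the operating system described below.

// Toy spinlock: the atomic flag is the "mutex variable"; the data it protects lives elsewhere.
#include <atomic>

class SpinLock
{
    std::atomic_flag locked = ATOMIC_FLAG_INIT; // lock state: set = locked, clear = free
public:
    void lock()
    {
        // atomically set the flag; if it was already set, another thread holds the lock, so retry
        while (locked.test_and_set(std::memory_order_acquire)) { /* spin */ }
    }
    void unlock()
    {
        locked.clear(std::memory_order_release); // mark the lock as free again
    }
};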
If another thread wants to read/write the data which must be accessed mutually exclusively, it will try to lock the mutex. If the mutex is already locked, that means another thread has the right to access this data, and no other thread is permitted to, therefore this thread will typically go to sleep, and will be re-woken by the operating system when the mutex is next unlocked.
It is important to note that the operating system (kernel) is critically involved in the mutex process. Typically, before a thread sleeps, it will tell the operating system that it wishes to be woken up again when the mutex is free. The operating system is also notified when other threads lock or unlock a mutex. Hence information about the state of a mutex is synchronized via the operating system kernel.
This is also why writing a multithreaded OS kernel is (probably) very difficult: the kernel cannot rely on an operating system beneath it for this support. I don't know the details of how it is actually done; it sounds like a hard problem.
This is pretty much everything I know about the subject. Obviously my knowledge is not absolute...
Note: Feel free to correct my Greek history or x86 Machine Instruction knowledge in the comments section. No doubt not everything here is perfectly accurate.
As your question suggests, I assume you are asking your question independent of any programming language.
First it is important to understand what a mutex is and how it works. A mutex is a binary semaphore. Then what is a semaphore? A semaphore is an integer with the following attributes:
You can initialize it to any permitted value (for a mutex, 1 or 0).
A thread can access the semaphore and increment or decrement its integer value.
When a thread decrements it:
If the result is positive or zero, that thread can continue its work.
If the result is negative, that thread will wait, and the semaphore value will not be decremented further by any later thread.
When a thread increments it (at which point the semaphore value will be either positive or 0), and the result is 0, one of the waiting threads can continue execution.
So when a thread wants to access a shared resource, it decrements the mutex value (from 1 to 0); any other thread that then tries to decrement it gets a negative result and waits. When it finishes, it increments the mutex value (so that a waiting thread can continue). That is how access control happens by means of a mutex (binary semaphore).
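As a rough sketch of those rules, a counting semaphore can be built in C++ from a mutex and a condition variable. (This version keeps the count nonnegative and blocks inside down() instead of letting the value go negative, but the observable behaviour is the same; the down()/up() names follow the description above.)

// Sketch of a counting semaphore built from a mutex and a condition variable.
#include <condition_variable>
#include <mutex>

class Semaphore
{
    std::mutex m;
    std::condition_variable cv;
    int count;
public:
    explicit Semaphore(int initial) : count(initial) {}

    void down() // decrement; block while the count is zero
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return count > 0; });
        --count;
    }

    void up() // increment and wake one waiting thread
    {
        std::lock_guard<std::mutex> lock(m);
        ++count;
        cv.notify_one();
    }
};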
I think you can see that your question does not quite apply here. As a simple answer to
So my question has become simply "are mutexes global?"
is simply NO.
A mutex has whatever scope you assign to it. It can be global or local again based on where and how you declare it. If for example you declare a mutex in global memory in a place where you can access it globally, then it is indeed global. If instead you declare it at function or private class scope level, then only that function or class will have access to it.
That said, in order to be useful for synchronization, the mutex needs to be declared in a scope that can be accessed by the threads needing to synchronize on it. Whether that's at global scope or some local scope depends on your program structure. I'd advise declaring it at the highest scope accessible to the threads but no higher.
In your particular example, the mutex is indeed global because you've declared it in global memory.
Locking doesn't operate on the variables it protects, it just works by giving threads a way to arrange that only one thread at a time will be doing something (like reading+writing a data structure). And that it will be finished, with memory effects visible, before the next thread's turn to read and maybe modify that data. (A readers+writers lock allows multiple readers but only one writer).
Any thread that can access the mutex object can lock / unlock it. The mutex object itself is a normal variable that you can put in any scope you want, even a local variable and then put a pointer to it somewhere that other threads can see. (Although normally you wouldn't do that.)
Mutex is named for "Mutual Exclusion" - using one correctly ensures that only one thread at a time will ever be executing any "critical section" (wikipedia) protected by the same mutex. Separate mutexes can allow different threads to hold different locks. Different functions or blocks that use the same mutex (normally because they access the same data) won't both run at once.
If there are some variables you only ever modify inside critical sections protected by the same mutex, those accesses won't be a data race, and if you don't have other bugs, your code is thread-safe. No matter whether they're global, static, or pointed to by different variables in different threads or any other way two threads might have a reference to the same object.
If you write code that accesses shared data without taking a lock on a mutex, it might see a partially-updated value, especially for a struct with multiple pointers / integers. (And in C++, simultaneous accesses to non-atomic variables are undefined behaviour if they're not all reads.)
Locking is a cooperative activity, normally nothing stops you from getting it wrong. If you're familiar with file locking, you may have heard of advisory vs. mandatory locks (the OS will deny open calls by other programs). Mutexes in multi-threaded programs are advisory; no memory protection or other hardware mechanism stops another thread from executing code that accesses the bytes of an object.
(At a low enough level, that's actually useful for lock-free atomics, especially with some control over ordering of those operations from memory barriers and/or release-store / acquire-load. And CPU cache hardware is up to the task of maintaining coherency from multiple accesses. But if you use locking, you don't have to worry about any of that. If you use locking incorrectly, understanding the possible symptoms might help identify that there is a locking problem.)
Some programs have phases where only a single thread is running, or only one that would need to touch certain variables, so enforced locking for every access to a variable isn't something that every language provides. (C++ std::atomic<T> is sort of like that: every access is as-if there were a lock/unlock of a lock protecting just that T object, except it's limited to operations that most CPUs can do without needing to lock/unlock a separate lock. Unless you use a large T, in which case there actually is a lock. Or, if you use a memory order weaker than the default seq_cst, you can see orderings that wouldn't have been possible if all accesses were acquiring/releasing locks.)
Besides, consistency between multiple variables is often important, so it matters that you hold one lock across multiple operations on multiple variables, or multiple members of the same struct.
Some tools can help detect code that doesn't respect a mutex while other threads are running, though, like clang -fsanitize=thread.
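As a small illustration of the point about consistency across multiple variables (the struct and names below are invented for the example), one mutex can protect two fields that must change together:

// One mutex protects two fields that must stay consistent with each other.
#include <mutex>

struct Point
{
    std::mutex m;
    int x = 0;
    int y = 0;

    void move(int dx, int dy)
    {
        std::lock_guard<std::mutex> lock(m); // hold the lock across BOTH updates
        x += dx;
        y += dy;
    } // no other (locking) thread can ever observe x updated but y not yet updated
};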

Out-of-order execution and reordering: can I see something from after the barrier before the barrier?

According to wikipedia: A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
Usually, articles talk about something like this (I will use monitors instead of membars):
class ReadWriteExample {

    int A = 0;
    int Another = 0;

    // thread1 runs this method
    void writer() {
        lock monitor1;   // a new value will be stored
        A = 10;          // stores 10 to memory location A
        unlock monitor1; // a new value is ready for the reader to read

        Another = 20;    // #see my question
    }

    // thread2 runs this method
    void reader() {
        lock monitor1;   // a new value will be read
        assert A == 10;  // loads from memory location A
        print Another;   // #see my question
        unlock monitor1; // a new value was just read
    }
}
But I wonder: is it possible that the compiler or CPU will shuffle things around in such a way that the code will print 20? I don't need a guarantee.
I.e. by definition, operations issued prior to the barrier can't be pushed down by the compiler, but is it possible that operations issued after the barrier would occasionally be seen before the barrier? (Just as a possibility.)
Thanks
My answer below only addresses Java's memory model. The answer really can't be made for all languages as each may define the rules differently.
But I wonder is it possible that compiler or cpu will shuffle the things around in a such way that code will print 20? I don't need guarantee.
Your question seems to be: "Is it possible for the store Another = 20 to be re-ordered above the monitor unlock?"
The answer is yes, it can be. If you look at the JSR 166 Cookbook, the first grid shown explains how re-orderings work.
In your writer case, the first operation would be a MonitorExit and the second operation would be a NormalStore. The grid explains that, yes, this sequence is permitted to be re-ordered.
This is known as Roach Motel ordering; that is, memory accesses can be moved into a synchronized block but cannot be moved out.
What about another language? Well, this question is too broad to answer for all languages, as each may define the rules differently. If that is the case, you would need to refine your question.
In Java there is the concept of happens-before. You can read all the details about it in the Java Language Specification. A Java compiler or runtime engine can re-order code, but it must abide by the happens-before rules. These rules are important for a Java developer who wants detailed control over how their code is re-ordered. I myself have been burnt by re-ordering: it turned out I was referencing the same object via two different variables, and the runtime engine re-ordered my code, not realizing that the operations were on the same object. If I had had either a happens-before relationship (between the two operations) or used the same variable, the re-ordering would not have occurred.
Specifically:
It follows from the above definitions that:
An unlock on a monitor happens-before every subsequent lock on that monitor.
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
A call to start() on a thread happens-before any actions in the started thread.
All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
The default initialization of any object happens-before any other actions (other than default-writes) of a program.
Short answer: yes. This is very compiler- and CPU-architecture dependent. You have here the definition of a race condition. The scheduling quantum won't end mid-instruction (you can't have two writes to the same location). However, the quantum could end between instructions, and how instructions are executed out-of-order in the pipeline is architecture dependent (outside of the monitor block).
Now come the "it depends" complications. The CPU guarantees little (see race condition). You might also look at NUMA (ccNUMA): it is a method to scale CPU and memory access by grouping CPUs (nodes) with local RAM and a group owner, plus a special bus between nodes.
The monitor doesn't prevent the other thread from running. It only prevents it from entering the code between the monitors. Therefore, when the writer exits the monitor section it is free to execute the next statement, regardless of the other thread being inside the monitor. Monitors are gates that block access. Also, the quantum could interrupt the second thread after the A == 10 assertion, allowing Another to change value. Again, the quantum won't interrupt mid-instruction. Always think of threads executing in perfect parallel.
How do you apply this? I'm a bit out of date (sorry, C#/Java these days) with current Intel processors and how their pipelines work (hyperthreading etc). Years ago I worked with a processor called MIPS, and it had (through compiler instruction ordering) the ability to execute instructions that occurred serially AFTER a branch instruction (the delay slot). On this CPU/compiler combination, YES, what you describe could happen. If Intel offers the same, then yes, it could happen. Especially with NUMA (both Intel & AMD have this; I'm most familiar with AMD's implementation).
My point: if threads were running across NUMA nodes and the access was to a common memory location, then it could occur. Of course the OS tries hard to schedule operations within the same node.
You might be able to simulate this. I know C++ on MS allows access to NUMA technology (I've played with it). See if you can allocate memory across two nodes (placing A on one, and Another on the other), and schedule the threads to run on specific nodes.
What happens in this model is that there are two pathways to RAM. I suppose this isn't what you had in mind; you probably meant a single path/node model, in which case I go back to the MIPS model I described above.
I assumed a processor that interrupts - there are others that have a Yield model.

Conditional Variable vs Semaphore

When to use a semaphore and when to use a conditional variable?
Locks are used for mutual exclusion. When you want to ensure that a piece of code is atomic, put a lock around it. You could theoretically use a binary semaphore to do this, but that's a special case.
Semaphores and condition variables build on top of the mutual exclusion provided by locks and are used for providing synchronized access to shared resources. They can be used for similar purposes.
A condition variable is generally used to avoid busy waiting (looping repeatedly while checking a condition) while waiting for a resource to become available. For instance, if you have a thread (or multiple threads) that can't continue onward until a queue is empty, the busy waiting approach would be to just do something like this:
// pseudocode
while (!queue.empty())
{
    sleep(1);
}
The problem with this is that you're wasting processor time by having this thread repeatedly check the condition. Why not instead have a synchronization variable that can be signaled to tell the thread that the resource is available?
// pseudocode
syncVar.lock.acquire();

while (!queue.empty())
{
    syncVar.wait();
}

// do stuff with queue

syncVar.lock.release();
Presumably, you'll have a thread somewhere else that is pulling things out of the queue. When the queue is empty, it can call syncVar.signal() to wake up a random thread that is sitting asleep on syncVar.wait() (or there's usually also a signalAll() or broadcast() method to wake up all the threads that are waiting).
I generally use synchronization variables like this when I have one or more threads waiting on a single particular condition (e.g. for the queue to be empty).
Semaphores can be used similarly, but I think they're better used when you have a shared resource that can be available and unavailable based on some integer number of available things. Semaphores are good for producer/consumer situations where producers are allocating resources and consumers are consuming them.
Think about if you had a soda vending machine. There's only one soda machine and it's a shared resource. You have one thread that's a vendor (producer) who is responsible for keeping the machine stocked and N threads that are buyers (consumers) who want to get sodas out of the machine. The number of sodas in the machine is the integer value that will drive our semaphore.
Every buyer (consumer) thread that comes to the soda machine calls the semaphore down() method to take a soda. This will grab a soda from the machine and decrement the count of available sodas by 1. If there are sodas available, the code will just keep running past the down() statement without a problem. If no sodas are available, the thread will sleep here waiting to be notified of when soda is made available again (when there are more sodas in the machine).
The vendor (producer) thread would essentially be waiting for the soda machine to be empty. The vendor gets notified when the last soda is taken from the machine (and one or more consumers are potentially waiting to get sodas out). The vendor would restock the soda machine with the semaphore up() method, the available number of sodas would be incremented each time and thereby the waiting consumer threads would get notified that more soda is available.
The wait() and signal() methods of a synchronization variable tend to be hidden within the down() and up() operations of the semaphore.
Certainly there's overlap between the two choices. There are many scenarios where a semaphore or a condition variable (or set of condition variables) could both serve your purposes. Both semaphores and condition variables are associated with a lock object that they use to maintain mutual exclusion, but then they provide extra functionality on top of the lock for synchronizing thread execution. It's mostly up to you to figure out which one makes the most sense for your situation.
That's not necessarily the most technical description, but that's how it makes sense in my head.
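For a concrete (if simplified) version of the soda machine, here is a sketch using C++20's std::counting_semaphore; only the buyers and a single restock are shown, and the names are invented for the example:

// Simplified soda-machine sketch using C++20 std::counting_semaphore.
// A fuller version would also track empty slots so the vendor can wait until restocking is needed.
#include <semaphore>
#include <thread>
#include <vector>

std::counting_semaphore<100> sodas(3); // the machine starts with 3 sodas (capacity 100)

void buyer()
{
    sodas.acquire(); // down(): take a soda, or sleep until one becomes available
    // ... drink the soda ...
}

void vendor()
{
    sodas.release(5); // up(): restock 5 sodas, waking up to 5 waiting buyers
}

int main()
{
    std::vector<std::thread> buyers;
    for (int i = 0; i < 8; ++i)
        buyers.emplace_back(buyer);
    std::thread v(vendor);
    v.join();
    for (auto& t : buyers)
        t.join();
}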
Let's reveal what's under the hood.
A condition variable is essentially a wait-queue that supports blocking-wait and wakeup operations, i.e. you can put a thread into the wait-queue and set its state to BLOCKED, and get a thread out of it and set its state to READY.
Note that to use a condition variable, two other elements are needed:
a condition (typically implemented by checking a flag or a counter)
a mutex that protects the condition
The protocol then becomes:
acquire the mutex
check the condition
if the condition is not yet satisfied, block and (atomically) release the mutex; otherwise release the mutex and continue
A semaphore is essentially a counter + a mutex + a wait queue. And it can be used as-is, without external dependencies. You can use it either as a mutex or as a condition variable.
Therefore, a semaphore can be treated as a more sophisticated structure than a condition variable, while the latter is more lightweight and flexible.
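In C++ terms, the condition-variable protocol above looks roughly like this (the boolean flag ready stands in for whatever condition you are waiting on):

// Sketch of the wait protocol described above.
#include <condition_variable>
#include <mutex>

std::mutex m;               // the mutex that protects the condition
std::condition_variable cv; // the wait-queue
bool ready = false;         // the condition itself

void waiter()
{
    std::unique_lock<std::mutex> lock(m); // acquire the mutex
    cv.wait(lock, [] { return ready; });  // check the condition; block and release the mutex while it is false
    // here the condition holds and the mutex is held again
}

void notifier()
{
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;                     // change the condition under the mutex
    }
    cv.notify_one();                      // wake one waiting thread
}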
Semaphores can be used to implement exclusive access to variables, however they are meant to be used for synchronization. Mutexes, on the other hand, have a semantics which is strictly related to mutual exclusion: only the process which locked the resource is allowed to unlock it.
Unfortunately you cannot implement synchronization with mutexes, that's why we have condition variables. Also notice that with condition variables you can unlock all the waiting threads in the same instant by using the broadcast unlocking. This cannot be done with semaphores.
Semaphores and condition variables are very similar and are used mostly for the same purposes. However, there are minor differences that can make one preferable. For example, to implement barrier synchronization you would not be able to use a semaphore, but a condition variable is ideal.
Barrier synchronization is when you want all of your threads to wait until everyone has arrived at a certain point in the thread function. This can be implemented with a static counter, initially set to the total number of threads, which each thread decrements when it reaches the barrier. This means we want each thread to sleep until the last one arrives. A semaphore would do the exact opposite! With a semaphore, each thread would keep running, and the last thread (which sets the semaphore value to 0) would go to sleep.
A condition variable, on the other hand, is ideal. When each thread reaches the barrier, we check whether the counter is zero. If not, we put the thread to sleep with the condition variable's wait function. When the last thread arrives at the barrier, the counter is decremented to zero, and this last thread calls the condition variable's signal (broadcast) function, which wakes up all the other threads.
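A rough single-use sketch of that barrier in C++ (the class and names are invented for the example):

// Barrier synchronization from a mutex, a counter, and a condition variable (single use).
#include <condition_variable>
#include <mutex>

class Barrier
{
    std::mutex m;
    std::condition_variable cv;
    int remaining; // how many threads have not yet arrived
public:
    explicit Barrier(int thread_count) : remaining(thread_count) {}

    void arrive_and_wait()
    {
        std::unique_lock<std::mutex> lock(m);
        if (--remaining == 0)
            cv.notify_all(); // the last thread to arrive wakes everyone
        else
            cv.wait(lock, [this] { return remaining == 0; }); // earlier arrivals sleep
    }
};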
I file condition variables under monitor synchronization. I've generally seen semaphores and monitors as two different synchronization styles. There are differences between the two in terms of how much state data is inherently kept and how you want to model code - but there really isn't any problem that can be solved by one but not the other.
I tend to code towards monitor form; in most languages I work in that comes down to mutexes, condition variables, and some backing state variables. But semaphores would do the job too.
A semaphore needs to know its count upfront, at initialization. There is no such requirement for condition variables.
The mutex and the condition variable can both be seen as derived from the semaphore.
For a mutex, the semaphore uses two states: 0 and 1.
For a condition variable, the semaphore uses a counter.
They are like syntactic sugar:
conditionalVar + mutex == semaphore

Resources