How to control simultaneous access to multiple shared queues by multiple producers?

How to control simultaneous access to multiple shared queues by multiple producers? - multithreading

One way would be to lock and then check the status of first shared queue, push data if space available, or ignore if not, and then unlock.
Then check the status of second shared queue, push data if space available, or ignore if not, and then unlock.
So, on and so forth.
Here we'll be constantly locking and unlocking to see the status of a shared queue and then act accordingly.
Questions:
What are the drawbacks of this method? Of course time will be spent in locking and unlocking. Is that it?
What are the other ways to achieve the same effect without the current method's drawbacks?

Lock contention is very expensive because it requires a context switch - see the LMAX Disruptor for a more in-depth explanation, in particular the performance results page; the Disruptor is a lock-free bounded queue that exhibits less latency than a bounded queue that uses locks.
One way to reduce lock contention is to have your producers check the queues in a different order from each other, for example instead of each producer checking Queue1, then Queue2, ... and finally QueueN, each producer would repeatedly generate a random number between [1, N] and then check Queue[Rand(N)]. A more complex solution would be to maintain a set of queues sorted according to their available space (e.g. in Java this would be a ConcurrentSkipListSet), then have each producer remove the queue from the head of the set (i.e. the queue with the most available space that is not being simultaneously accessed by another producer), add an element, and insert the queue back into the set; a simpler solution along the same vein is to maintain an unbounded unsorted queue of queues and to have a producer remove and check the queue from the head of the queue of queues and to then insert the queue back into the tail of the queue of queues, which would ensure that only one producer is able to check a queue at any given point of time.
Another solution is to reduce and ideally eliminate the number of locks - it's difficult to write lock-free algorithms but it's also potentially very rewarding as demonstrated by the performance of LMAX's lock-free queue. In lieu of replacing your locked bounded queues with LMAX's lock-free bounded queues, another solution is to replace your locked bounded queues with lock-free unbounded queues (e.g. Java's ConcurrentLinkedQueue; lock-free unbounded queues are much more likely to be in your language's standard library than lock-free bounded queues) and to place conservative lock-free guards on these queues. For example, using Java's AtomicInteger for the guards:
public class BoundedQueue<T> {
private ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
private AtomicInteger bound = new AtomicInteger(0);
private final int maxSize;
public BoundedQueue(int maxSize) {
this.maxSize = maxSize;
}
public T poll() {
T retVal = queue.poll();
if(retVal != null) {
bound.decrementAndGet();
}
return retVal;
}
public boolean offer(T t) {
if(t == null) throw new NullPointerException();
int boundSize = bound.get();
for(int retryCount = 0; retryCount < 3 && boundSize < maxSize; retryCount++) {
if(bound.compareAndSet(boundSize, boundSize + 1)) {
return queue.offer(t);
}
boundSize = bound.get();
}
return false;
}
}
poll() will return the element from the head of the queue, decrementing bound if the head element isn't null i.e. if the queue isn't empty. offer(T t) attempts to increment the size of bound without exceeding maxSize, if this succeeds then it puts the element at the tail of the queue, else if this fails three times then the method returns false. This is a conservative guard because it is possible for offer to fail even if the queue isn't full, e.g. if an element is removed after boundSize = bound.get() sets boundSize to maxSize, or if the bound.compareAndSet(expected, newVal) method happens to fail three times due to multiple consumers calling poll().

Really, you are making too many locks/unlocks here. The solution is to make the same check twice:
check if space is available, if not, continue
lock
check if space is available AGAIN
... go on as you did before.
This way you will lock if you needn't to do it only in very rare cases.
I have first seen the solution in the book "Professional Java EE Design Patterns" (Yener, Theedom)
Edit.
About spreading the start queue numbers among threads.
Notice, that without any special organization these threads are waiting for queues only the first time. The next time the necessary timeshift will be already created by simply waiting. Of course, we can create the timeshift ourselves, spreading the start numbers among threads. And that simple spread by equal shift will be more effective that a random one.

Related

How/when to release memory in wait-free algorithms

I'm having trouble figuring out a key point in wait-free algorithm design. Suppose a data structure has a pointer to another data structure (e.g. linked list, tree, etc), how can the right time for releasing a data structure?
The problem is this, there are separate operations that can't be executed atomically without a lock. For example one thread reads the pointer to some memory, and increments the use count for that memory to prevent free while this thread is using the data, which might take long, and even if it doesn't, it's a race condition. What prevents another thread from reading the pointer, decrementing the use count and determining that it's no longer used and freeing it before the first thread incremented the use count?
The main issue is that current CPUs only have a single word CAS (compare & swap). Alternatively the problem is that I'm clueless about waitfree algorithms and data structures and after reading some papers I'm still not seeing the light.
IMHO Garbage collection can't be the answer, because it would either GC would have to be prevented from running if any single thread is inside an atomic block (which would mean it can't be guaranteed that the GC will ever run again) or the problem is simply pushed to the GC, in which case, please explain how the GC would figure out if the data is in the silly state (a pointer is read [e.g. stored in a local variable] but the the use count didn't increment yet).
PS, references to advanced tutorials on wait-free algorithms for morons are welcome.
Edit: You should assume that the problem is being solved in a non-managed language, like C or C++. After all if it were Java, we'd have no need to worry about releasing memory. Further assume that the compiler may generate code that will store temporary references to objects in registers (invisible to other threads) right before the usage counter increment, and that a thread can be interrupted between loading the object address and incrementing the counter. This of course doesn't mean that the solution must be limited to C or C++, rather that the solution should give a set of primitives that allowing the implementation of wait-free algorithms on linked data structures. I'm interested in the primitives and how they solve the problem of designing wait-free algorithms. With such primitives a wait-free algorithm can be implemented equally well in C++ and Java.

After some research I learned this.
The problem is not trivial to solve and there are several solutions each with advantages and disadvantages. The reason for the complexity comes from inter CPU synchronization issues. If not done right it might appear to work correctly 99.9% of the time, which isn't enough, or it might fail under load.
Three solutions that I found are 1) hazard pointers, 2) quiescence period based reclamation (used by the Linux kernel in the RCU implementation) 3) reference counting techniques. 4) Other 5) Combinations
Hazard pointers work by saving the currently active references in a well-known per thread location, so any thread deciding to free memory (when the counter appears to be zero) can check if the memory is still in use by anyone. An interesting improvement is to buffer request to release memory in a small array and free them up in a batch when the array is full. The advantage of using hazard pointers is that it can actually guarantee an upper bound on unreclaimed memory. The disadvantage is that it places extra burden on the reader.
Quiescence period based reclamation works by delaying the actual release of the memory until it's known that each thread has had a chance to finish working on any data that may need to be released. The way to know that this condition is satisfied is to check if each thread passed through a quiescent period (not in a critical section) after the object was removed. In the Linux kernel this means something like each task making a voluntary task switch. In a user space application it would be the end of a critical section. This can be achieved by a simple counter, each time the counter is even the thread is not in a critical section (reading shared data), each time the counter is odd the thread is inside a critical section, to move from a critical section or back all the thread needs to do is to atomically increment the number. Based on this the "garbage collector" can determine if each thread has had a chance to finish. There are several approaches, one simple one would be to queue up the requests to free memory (e.g. in a linked list or an array), each with the current generation (managed by the GC), when the GC runs it checks the state of the threads (their state counters) to see if each passed to the next generation (their counter is higher than the last time or is the same and even), any memory can be reclaimed one generation after it was freed. The advantage of this approach is that is places the least burden on the reading threads. The disadvantage is that it can't guarantee an upper bound for the memory waiting to be released (e.g. one thread spending 5 minutes in a critical section, while the data keeps changing and memory isn't released), but in practice it works out all right.
There is a number of reference counting solutions, many of them require double compare and swap, which some CPUs don't support, so can't be relied upon. The key problem remains though, taking a reference before updating the counter. I didn't find enough information to explain how this can be done simply and reliably though. So .....
There are of course a number of "Other" solutions, it's a very important topic of research with tons of papers out there. I didn't examine all of them. I only need one.
And of course the various approaches can be combined, for example hazard pointers can solve the problems of reference counting. But there's a nearly infinite number of combinations, and in some cases a spin lock might theoretically break wait-freedom, but doesn't hurt performance in practice. Somewhat like another tidbit I found in my research, it's theoretically not possible to implement wait-free algorithms using compare-and-swap, that's because in theory (purely in theory) a CAS based update might keep failing for non-deterministic excessive times (imagine a million threads on a million cores each trying to increment and decrement the same counter using CAS). In reality however it rarely fails more than a few times (I suspect it's because the CPUs spend more clocks away from CAS than there are CPUs, but I think if the algorithm returned to the same CAS on the same location every 50 clocks and there were 64 cores there could be a chance of a major problem, then again, who knows, I don't have a hundred core machine to try this). Another results of my research is that designing and implementing wait-free algorithms and data-structures is VERY challenging (even if some of the heavy lifting is outsourced, e.g. to a garbage collector [e.g. Java]), and might perform less well than a similar algorithm with carefully placed locks.
So, yeah, it's possible to free memory even without delays. It's just tricky. And if you forget to make the right operations atomic, or to place the right memory barrier, oh, well, you're toast. :-) Thanks everyone for participating.

I think atomic operations for increment/decrement and compare-and-swap would solve this problem.
Idea:
All resources have a counter which is modified with atomic operations. The counter is initially zero.
Before using a resource: "Acquire" it by atomically incrementing its counter. The resource can be used if and only if the incremented value is greater than zero.
After using a resource: "Release" it by atomically decrementing its counter. The resource should be disposed/freed if and only if the decremented value is equal to zero.
Before disposing: Atomically compare-and-swap the counter value with the minimum (negative) value. Dispose will not happen if a concurrent thread "Acquired" the resource in between.
You haven't specified a language for your question. Here goes an example in c#:
class MyResource
{
// Counter is initially zero. Resource will not be disposed until it has
// been acquired and released.
private int _counter;
public bool Acquire()
{
// Atomically increment counter.
int c = Interlocked.Increment(ref _counter);
// Resource is available if the resulting value is greater than zero.
return c > 0;
}
public bool Release()
{
// Atomically decrement counter.
int c = Interlocked.Decrement(ref _counter);
// We should never reach a negative value
Debug.Assert(c >= 0, "Resource was released without being acquired");
// Dispose when we reach zero
if (c == 0)
{
// Mark as disposed by setting counter its minimum value.
// Only do this if the counter remain at zero. Atomic compare-and-swap operation.
if (Interlocked.CompareExchange(ref _counter, int.MinValue, c) == c)
{
// TODO: Run dispose code (free stuff)
return true; // tell caller that resource is disposed
}
}
return false; // released but still in use
}
}
Usage:
// "r" is an instance of MyResource
bool acquired = false;
try
{
if (acquired = r.Acquire())
{
// TODO: Use resource
}
}
finally
{
if (acquired)
{
if (r.Release())
{
// Resource was disposed.
// TODO: Nullify variable or similar to let GC collect it.
}
}
}

I know this is not the best way but it works for me:
for shared dynamic data-structure lists I use usage counter per item
for example:
struct _data
{
DWORD usage;
bool delete;
// here add your data
_data() { usage=0; deleted=true; }
};
const int MAX = 1024;
_data data[MAX];
now when item is started to be used somwhere then
// start use of data[i]
data[i].cnt++;
after is no longer used then
// stop use of data[i]
data[i].cnt--;
if you want to add new item to list then
// add item
for (i=0;i<MAX;i++) // find first deleted item
if (data[i].deleted)
{
data[i].deleted=false;
data[i].cnt=0;
// copy/set your data
break;
}
and now in the background once in a while (on timer or whatever)
scann data[] an all undeleted items with cnt == 0 set as deleted (+ free its dynamic memory if it has any)
[Note]
to avoid multi-thread access problems implement single global lock per data list
and program it so you cannot scann data while any data[i].cnt is changing
one bool and one DWORD suffice for this if you do not want to use OS locks
// globals
bool data_cnt_locked=false;
DWORD data_cnt=0;
now any change of data[i].cnt modify like this:
// start use of data[i]
while (data_cnt_locked) Sleep(1);
data_cnt++;
data[i].cnt++;
data_cnt--;
and modify delete scan like this
while (data_cnt) Sleep(1);
data_cnt_locked=true;
Sleep(1);
if (data_cnt==0) // just to be sure
for (i=0;i<MAX;i++) // here scan for items to delete ...
if (!data[i].cnt)
if (!data[i].deleted)
{
data[i].deleted=true;
data[i].cnt=0;
// release your dynamic data ...
}
data_cnt_locked=false;
PS.
do not forget to play with the sleep times a little to suite your needs
lock free algorithm sleep times are sometimes dependent on OS task/scheduler
this is not really an lock free implementation
because while GC is at work then all is locked
but if ather than that multi access is not blocking to each other
so if you do not run GC too often you are fine

Effects of swapping buffers on concurrent access

Consider an application with two threads, Producer and Consumer.
Both threads are running approximately equally frequent, multiple times in a second.
Both threads access the same memory region, where Producer writes to the memory, and Consumer reads the current chunk of data and does something with it, without invalidating the data.
A classical approach is this one:
int[] sharedData;
//Called frequently by thread Producer
void WriteValues(int[] data)
{
lock(sharedData)
{
Array.Copy(data, sharedData, LENGTH);
}
}
//Called frequently by thread Consumer
void WriteValues()
{
int[] data;
lock(sharedData)
{
Array.Copy(sharedData, data, LENGTH);
}
DoSomething(data);
}
If we assume that the Array.Copy takes time, this code would run slow, since Producer always has to wait for Consumer during copying and vice versa.
An approach to this problem would be to create two buffers, one which is accessed by the Consumer, and one which is written to by the Producer, and swap the buffers, as soon as writing has finished.
int[] frontBuffer;
int[] backBuffer;
//Called frequently by thread Producer
void WriteValues(int[] data)
{
lock(backBuffer)
{
Array.Copy(data, backBuffer, LENGTH);
int[] temp = frontBuffer;
frontBuffer = backBuffer;
backBuffer = temp;
}
}
//Called frequently by thread Consumer
void WriteValues()
{
int[] data;
int[] currentFrontBuffer = frontBuffer;
lock(currentForntBuffer)
{
Array.Copy(currentFrontBuffer , data, LENGTH);
}
DoSomething(currentForntBuffer );
}
Now, my questions:
Is locking, as shown in the 2nd example, safe? Or does the change of references introduce problems?
Will the code in the 2nd example execute faster than the code in the 1st example?
Are there any better methods to efficiently solve the problem described above?
Could there be a way to solve this problem without locks? (Even if I think it is impossible)
Note: this is no classical producer/consumer problem: It is possible for Consumer to read the values multiple times before Producer writes it again - the old data stays valid until Producer writes new data.

Is locking, as shown in the 2nd example, safe? Or does the change of references introduce problems?
As far as I can tell, because reference assignment is atomic, this may be safe but not ideal. Because the WriteValues() method reads from frontBuffer without a lock or memory barrier forcing a cache refresh, there no guarantee that the variable will ever be updated with new values from main memory. There is then a potential to continuously read the stale, cached values of that instance from the local register or CPU cache. I'm unsure of whether the compiler/JIT might infer a cache refresh anyway based on the local variable, maybe somebody with more specific knowledge can speak to this area.
Even if the values aren't stale, you may also run into more contention than you would like. For example...
Thread A calls WriteValues()
Thread A takes a lock on the instance in frontBuffer and starts copying.
Thread B calls WriteValues(int[])
Thread B writes its data, moves the currently locked frontBuffer instance into backBuffer.
Thread B calls WriteValues(int[])
Thread B waits on the lock for backBuffer because Thread A still has it.
Will the code in the 2nd example execute faster than the code in the 1st example?
I suggest that you profile it and find out. X being faster than Y only matters if Y is too slow for your particular needs, and you are the only one who knows what those are.
Are there any better methods to efficiently solve the problem described above?
Yes. If you are using .Net 4 and above, there is a BlockingCollection type in System.Collections.Concurrent that models the Producer/Consumer pattern well. If you consistently read more than you write, or have multiple readers to very few writers, you may also want to consider the ReaderWriterLockSlim class. As a general rule of thumb, you should do as little within a lock as you can, which will also help to alleviate your time issue.
Could there be a way to solve this problem without locks? (Even if I think it is impossible)
You might be able to, but I wouldn't suggest trying that unless you are extremely familiar with multi-threading, cache coherency, and potential compiler/JIT optimizations. Locking will most likely be fine for your situation and it will be much easier for you (and others reading your code) to reason about and maintain.

How are "nonblocking" data structures possible?

I'm having trouble understanding how any data structure can be "nonblocking".
Say you're making a "nonblocking" hashtable. At some point or another, your hashtable gets too full, so you have to re-hash into a larger table.
This implies you need to allocate memory, which is a global resource. So it seems that you must obtain some sort of lock to prevent global corruption of the heap... irrespective of possible problems with your data structure itself!
But then that means every other thread must block while you allocate your memory...
What am I missing here?
(How) can you allocate memory without blocking another thread which is doing the same?

Two examples for non blocking designs are optimistic design and Transactional Memory.
The idea of this is - in most of the cases, the blocking is redundant - since two OPs can concurrently occur without interrupting each other. However, sometimes when 2 OPs occur concurrently and the data becomes corrupted because of it - you can roll back to your previous state, and retry.
There might still be locks in these designs, but the time the data is locked is significantly shorter, and is limited only to the critical time where the affect of the OP is taking place.

Just for some definitions, additional information and to distinguish between non-blocking, lock-free and wait-free terms, I recommend reading the following article (I won't copy the relevant passages here as it's too long):
Definitions of Non-blocking, Lock-free and Wait-free

Most strategies have one fundamental pattern in common. They use a compare and swap (CAS) operation in a loop until it succeeds.
For example, lets consider a stack implemented with a linked list. I chose a linked list implementation because it is easy to make concurrent with a CAS, but there are other ways to do it. I will use C-like pseudocode.
Push(T item)
{
Node node = new Node(); // allocate node memory
Node initial;
do
{
initial = head;
node.Value = item;
node.Next = initial;
}
while (CompareAndSwap(head, node, initial) != initial);
}
Pop()
{
Node node;
Node initial;
do
{
initial = head;
node = initial.Next;
}
while (CompareAndSwap(head, node, initial) != initial);
T value = initial.Value;
delete initial; // deallocate node memory
return value;
}
In the above code CompareAndSwap is a non-blocking atomic operation that replaces the value in a memory address with a new value and returns the old value. If the old value does not match the expected value then you spin through the loop and try it all again.

All that non-blocking means is that you never wait indefinitely, not that you never wait at all. As long as your heap is also implemented using a non-blocking algorithm, you can implement other non-blocking algorithms on top of it.

What does threadsafe mean?

Recently I tried to Access a textbox from a thread (other than the UI thread) and an exception was thrown. It said something about the "code not being thread safe" and so I ended up writing a delegate (sample from MSDN helped) and calling it instead.
But even so I didn't quite understand why all the extra code was necessary.
Update:
Will I run into any serious problems if I check
Controls.CheckForIllegalCrossThread..blah =true

Eric Lippert has a nice blog post entitled What is this thing you call "thread safe"? about the definition of thread safety as found of Wikipedia.
3 important things extracted from the links :
“A piece of code is thread-safe if it functions correctly during
simultaneous execution by multiple threads.”
“In particular, it must satisfy the need for multiple threads to
access the same shared data, …”
“…and the need for a shared piece of data to be accessed by only one
thread at any given time.”
Definitely worth a read!

In the simplest of terms threadsafe means that it is safe to be accessed from multiple threads. When you are using multiple threads in a program and they are each attempting to access a common data structure or location in memory several bad things can happen. So, you add some extra code to prevent those bad things. For example, if two people were writing the same document at the same time, the second person to save will overwrite the work of the first person. To make it thread safe then, you have to force person 2 to wait for person 1 to complete their task before allowing person 2 to edit the document.

Wikipedia has an article on Thread Safety.
This definitions page (you have to skip an ad - sorry) defines it thus:
In computer programming, thread-safe describes a program portion or routine that can be called from multiple programming threads without unwanted interaction between the threads.
A thread is an execution path of a program. A single threaded program will only have one thread and so this problem doesn't arise. Virtually all GUI programs have multiple execution paths and hence threads - there are at least two, one for processing the display of the GUI and handing user input, and at least one other for actually performing the operations of the program.
This is done so that the UI is still responsive while the program is working by offloading any long running process to any non-UI threads. These threads may be created once and exist for the lifetime of the program, or just get created when needed and destroyed when they've finished.
As these threads will often need to perform common actions - disk i/o, outputting results to the screen etc. - these parts of the code will need to be written in such a way that they can handle being called from multiple threads, often at the same time. This will involve things like:
Working on copies of data
Adding locks around the critical code
Opening files in the appropriate mode - so if reading, don't open the file for write as well.
Coping with not having access to resources because they're locked by other threads/processes.

Simply, thread-safe means that a method or class instance can be used by multiple threads at the same time without any problems occurring.
Consider the following method:
private int myInt = 0;
public int AddOne()
{
int tmp = myInt;
tmp = tmp + 1;
myInt = tmp;
return tmp;
}
Now thread A and thread B both would like to execute AddOne(). but A starts first and reads the value of myInt (0) into tmp. Now for some reason, the scheduler decides to halt thread A and defer execution to thread B. Thread B now also reads the value of myInt (still 0) into it's own variable tmp. Thread B finishes the entire method so in the end myInt = 1. And 1 is returned. Now it's Thread A's turn again. Thread A continues. And adds 1 to tmp (tmp was 0 for thread A). And then saves this value in myInt. myInt is again 1.
So in this case the method AddOne() was called two times, but because the method was not implemented in a thread-safe way the value of myInt is not 2, as expected, but 1 because the second thread read the variable myInt before the first thread finished updating it.
Creating thread-safe methods is very hard in non-trivial cases. And there are quite a few techniques. In Java you can mark a method as synchronized, this means that only one thread can execute that method at a given time. The other threads wait in line. This makes a method thread-safe, but if there is a lot of work to be done in a method, then this wastes a lot of space. Another technique is to 'mark only a small part of a method as synchronized' by creating a lock or semaphore, and locking this small part (usually called the critical section). There are even some methods that are implemented as lock-less thread-safe, which means that they are built in such a way that multiple threads can race through them at the same time without ever causing problems, this can be the case when a method only executes one atomic call. Atomic calls are calls that can't be interrupted and can only be done by one thread at a time.

In real world example for the layman is
Let's suppose you have a bank account with the internet and mobile banking and your account have only $10.
You performed transfer balance to another account using mobile banking, and the meantime, you did online shopping using the same bank account.
If this bank account is not threadsafe, then the bank allows you to perform two transactions at the same time and then the bank will become bankrupt.
Threadsafe means that an object's state doesn't change if simultaneously multiple threads try to access the object.

You can get more explanation from the book "Java Concurrency in Practice":
A class is thread‐safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.

A module is thread-safe if it guarantees it can maintain its invariants in the face of multi-threaded and concurrence use.
Here, a module can be a data-structure, class, object, method/procedure or function. Basically scoped piece of code and related data.
The guarantee can potentially be limited to certain environments such as a specific CPU architecture, but must hold for those environments. If there is no explicit delimitation of environments, then it is usually taken to imply that it holds for all environments that the code can be compiled and executed.
Thread-unsafe modules may function correctly under mutli-threaded and concurrent use, but this is often more down to luck and coincidence, than careful design. Even if some module does not break for you under, it may break when moved to other environments.
Multi-threading bugs are often hard to debug. Some of them only happen occasionally, while others manifest aggressively - this too, can be environment specific. They can manifest as subtly wrong results, or deadlocks. They can mess up data-structures in unpredictable ways, and cause other seemingly impossible bugs to appear in other remote parts of the code. It can be very application specific, so it is hard to give a general description.

Thread safety: A thread safe program protects it's data from memory consistency errors. In a highly multi-threaded program, a thread safe program does not cause any side effects with multiple read/write operations from multiple threads on same objects. Different threads can share and modify object data without consistency errors.
You can achieve thread safety by using advanced concurrency API. This documentation page provides good programming constructs to achieve thread safety.
Lock Objects support locking idioms that simplify many concurrent applications.
Executors define a high-level API for launching and managing threads. Executor implementations provided by java.util.concurrent provide thread pool management suitable for large-scale applications.
Concurrent Collections make it easier to manage large collections of data, and can greatly reduce the need for synchronization.
Atomic Variables have features that minimize synchronization and help avoid memory consistency errors.
ThreadLocalRandom (in JDK 7) provides efficient generation of pseudorandom numbers from multiple threads.
Refer to java.util.concurrent and java.util.concurrent.atomic packages too for other programming constructs.

Producing Thread-safe code is all about managing access to shared mutable states. When mutable states are published or shared between threads, they need to be synchronized to avoid bugs like race conditions and memory consistency errors.
I recently wrote a blog about thread safety. You can read it for more information.

You are clearly working in a WinForms environment. WinForms controls exhibit thread affinity, which means that the thread in which they are created is the only thread that can be used to access and update them. That is why you will find examples on MSDN and elsewhere demonstrating how to marshall the call back onto the main thread.
Normal WinForms practice is to have a single thread that is dedicated to all your UI work.

I find the concept of http://en.wikipedia.org/wiki/Reentrancy_%28computing%29 to be what I usually think of as unsafe threading which is when a method has and relies on a side effect such as a global variable.
For example I have seen code that formatted floating point numbers to string, if two of these are run in different threads the global value of decimalSeparator can be permanently changed to '.'
//built in global set to locale specific value (here a comma)
decimalSeparator = ','
function FormatDot(value : real):
//save the current decimal character
temp = decimalSeparator
//set the global value to be
decimalSeparator = '.'
//format() uses decimalSeparator behind the scenes
result = format(value)
//Put the original value back
decimalSeparator = temp

To understand thread safety, read below sections:
4.3.1. Example: Vehicle Tracker Using Delegation
As a more substantial example of delegation, let's construct a version of the vehicle tracker that delegates to a thread-safe class. We store the locations in a Map, so we start with a thread-safe Map implementation, ConcurrentHashMap. We also store the location using an immutable Point class instead of MutablePoint, shown in Listing 4.6.
Listing 4.6. Immutable Point class used by DelegatingVehicleTracker.
class Point{
public final int x, y;
public Point() {
this.x=0; this.y=0;
}
public Point(int x, int y) {
this.x = x;
this.y = y;
}
}
Point is thread-safe because it is immutable. Immutable values can be freely shared and published, so we no longer need to copy the locations when returning them.
DelegatingVehicleTracker in Listing 4.7 does not use any explicit synchronization; all access to state is managed by ConcurrentHashMap, and all the keys and values of the Map are immutable.
Listing 4.7. Delegating Thread Safety to a ConcurrentHashMap.
public class DelegatingVehicleTracker {
private final ConcurrentMap<String, Point> locations;
private final Map<String, Point> unmodifiableMap;
public DelegatingVehicleTracker(Map<String, Point> points) {
this.locations = new ConcurrentHashMap<String, Point>(points);
this.unmodifiableMap = Collections.unmodifiableMap(locations);
}
public Map<String, Point> getLocations(){
return this.unmodifiableMap; // User cannot update point(x,y) as Point is immutable
}
public Point getLocation(String id) {
return locations.get(id);
}
public void setLocation(String id, int x, int y) {
if(locations.replace(id, new Point(x, y)) == null) {
throw new IllegalArgumentException("invalid vehicle name: " + id);
}
}
}
If we had used the original MutablePoint class instead of Point, we would be breaking encapsulation by letting getLocations publish a reference to mutable state that is not thread-safe. Notice that we've changed the behavior of the vehicle tracker class slightly; while the monitor version returned a snapshot of the locations, the delegating version returns an unmodifiable but “live” view of the vehicle locations. This means that if thread A calls getLocations and thread B later modifies the location of some of the points, those changes are reflected in the Map returned to thread A.
4.3.2. Independent State Variables
We can also delegate thread safety to more than one underlying state variable as long as those underlying state variables are independent, meaning that the composite class does not impose any invariants involving the multiple state variables.
VisualComponent in Listing 4.9 is a graphical component that allows clients to register listeners for mouse and keystroke events. It maintains a list of registered listeners of each type, so that when an event occurs the appropriate listeners can be invoked. But there is no relationship between the set of mouse listeners and key listeners; the two are independent, and therefore VisualComponent can delegate its thread safety obligations to two underlying thread-safe lists.
Listing 4.9. Delegating Thread Safety to Multiple Underlying State Variables.
public class VisualComponent {
private final List<KeyListener> keyListeners
= new CopyOnWriteArrayList<KeyListener>();
private final List<MouseListener> mouseListeners
= new CopyOnWriteArrayList<MouseListener>();
public void addKeyListener(KeyListener listener) {
keyListeners.add(listener);
}
public void addMouseListener(MouseListener listener) {
mouseListeners.add(listener);
}
public void removeKeyListener(KeyListener listener) {
keyListeners.remove(listener);
}
public void removeMouseListener(MouseListener listener) {
mouseListeners.remove(listener);
}
}
VisualComponent uses a CopyOnWriteArrayList to store each listener list; this is a thread-safe List implementation particularly suited for managing listener lists (see Section 5.2.3). Each List is thread-safe, and because there are no constraints coupling the state of one to the state of the other, VisualComponent can delegate its thread safety responsibilities to the underlying mouseListeners and keyListeners objects.
4.3.3. When Delegation Fails
Most composite classes are not as simple as VisualComponent: they have invariants that relate their component state variables. NumberRange in Listing 4.10 uses two AtomicIntegers to manage its state, but imposes an additional constraint—that the first number be less than or equal to the second.
Listing 4.10. Number Range Class that does Not Sufficiently Protect Its Invariants. Don't do this.
public class NumberRange {
// INVARIANT: lower <= upper
private final AtomicInteger lower = new AtomicInteger(0);
private final AtomicInteger upper = new AtomicInteger(0);
public void setLower(int i) {
//Warning - unsafe check-then-act
if(i > upper.get()) {
throw new IllegalArgumentException(
"Can't set lower to " + i + " > upper ");
}
lower.set(i);
}
public void setUpper(int i) {
//Warning - unsafe check-then-act
if(i < lower.get()) {
throw new IllegalArgumentException(
"Can't set upper to " + i + " < lower ");
}
upper.set(i);
}
public boolean isInRange(int i){
return (i >= lower.get() && i <= upper.get());
}
}
NumberRange is not thread-safe; it does not preserve the invariant that constrains lower and upper. The setLower and setUpper methods attempt to respect this invariant, but do so poorly. Both setLower and setUpper are check-then-act sequences, but they do not use sufficient locking to make them atomic. If the number range holds (0, 10), and one thread calls setLower(5) while another thread calls setUpper(4), with some unlucky timing both will pass the checks in the setters and both modifications will be applied. The result is that the range now holds (5, 4)—an invalid state. So while the underlying AtomicIntegers are thread-safe, the composite class is not. Because the underlying state variables lower and upper are not independent, NumberRange cannot simply delegate thread safety to its thread-safe state variables.
NumberRange could be made thread-safe by using locking to maintain its invariants, such as guarding lower and upper with a common lock. It must also avoid publishing lower and upper to prevent clients from subverting its invariants.
If a class has compound actions, as NumberRange does, delegation alone is again not a suitable approach for thread safety. In these cases, the class must provide its own locking to ensure that compound actions are atomic, unless the entire compound action can also be delegated to the underlying state variables.
If a class is composed of multiple independent thread-safe state variables and has no operations that have any invalid state transitions, then it can delegate thread safety to the underlying state variables.

What is a semaphore?

A semaphore is a programming concept that is frequently used to solve multi-threading problems. My question to the community:
What is a semaphore and how do you use it?

Think of semaphores as bouncers at a nightclub. There are a dedicated number of people that are allowed in the club at once. If the club is full no one is allowed to enter, but as soon as one person leaves another person might enter.
It's simply a way to limit the number of consumers for a specific resource. For example, to limit the number of simultaneous calls to a database in an application.
Here is a very pedagogic example in C# :-)
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
namespace TheNightclub
{
public class Program
{
public static Semaphore Bouncer { get; set; }
public static void Main(string[] args)
{
// Create the semaphore with 3 slots, where 3 are available.
Bouncer = new Semaphore(3, 3);
// Open the nightclub.
OpenNightclub();
}
public static void OpenNightclub()
{
for (int i = 1; i <= 50; i++)
{
// Let each guest enter on an own thread.
Thread thread = new Thread(new ParameterizedThreadStart(Guest));
thread.Start(i);
}
}
public static void Guest(object args)
{
// Wait to enter the nightclub (a semaphore to be released).
Console.WriteLine("Guest {0} is waiting to entering nightclub.", args);
Bouncer.WaitOne();
// Do some dancing.
Console.WriteLine("Guest {0} is doing some dancing.", args);
Thread.Sleep(500);
// Let one guest out (release one semaphore).
Console.WriteLine("Guest {0} is leaving the nightclub.", args);
Bouncer.Release(1);
}
}
}

The article Mutexes and Semaphores Demystified by Michael Barr is a great short introduction into what makes mutexes and semaphores different, and when they should and should not be used. I've excerpted several key paragraphs here.
The key point is that mutexes should be used to protect shared resources, while semaphores should be used for signaling. You should generally not use semaphores to protect shared resources, nor mutexes for signaling. There are issues, for instance, with the bouncer analogy in terms of using semaphores to protect shared resources - you can use them that way, but it may cause hard to diagnose bugs.
While mutexes and semaphores have some similarities in their implementation, they should always be used differently.
The most common (but nonetheless incorrect) answer to the question posed at the top is that mutexes and semaphores are very similar, with the only significant difference being that semaphores can count higher than one. Nearly all engineers seem to properly understand that a mutex is a binary flag used to protect a shared resource by ensuring mutual exclusion inside critical sections of code. But when asked to expand on how to use a "counting semaphore," most engineers—varying only in their degree of confidence—express some flavor of the textbook opinion that these are used to protect several equivalent resources.
...
At this point an interesting analogy is made using the idea of bathroom keys as protecting shared resources - the bathroom. If a shop has a single bathroom, then a single key will be sufficient to protect that resource and prevent multiple people from using it simultaneously.
If there are multiple bathrooms, one might be tempted to key them alike and make multiple keys - this is similar to a semaphore being mis-used. Once you have a key you don't actually know which bathroom is available, and if you go down this path you're probably going to end up using mutexes to provide that information and make sure you don't take a bathroom that's already occupied.
A semaphore is the wrong tool to protect several of the essentially same resource, but this is how many people think of it and use it. The bouncer analogy is distinctly different - there aren't several of the same type of resource, instead there is one resource which can accept multiple simultaneous users. I suppose a semaphore can be used in such situations, but rarely are there real-world situations where the analogy actually holds - it's more often that there are several of the same type, but still individual resources, like the bathrooms, which cannot be used this way.
...
The correct use of a semaphore is for signaling from one task to another. A mutex is meant to be taken and released, always in that order, by each task that uses the shared resource it protects. By contrast, tasks that use semaphores either signal or wait—not both. For example, Task 1 may contain code to post (i.e., signal or increment) a particular semaphore when the "power" button is pressed and Task 2, which wakes the display, pends on that same semaphore. In this scenario, one task is the producer of the event signal; the other the consumer.
...
Here an important point is made that mutexes interfere with real time operating systems in a bad way, causing priority inversion where a less important task may be executed before a more important task because of resource sharing. In short, this happens when a lower priority task uses a mutex to grab a resource, A, then tries to grab B, but is paused because B is unavailable. While it's waiting, a higher priority task comes along and needs A, but it's already tied up, and by a process that isn't even running because it's waiting for B. There are many ways to resolve this, but it most often is fixed by altering the mutex and task manager. The mutex is much more complex in these cases than a binary semaphore, and using a semaphore in such an instance will cause priority inversions because the task manager is unaware of the priority inversion and cannot act to correct it.
...
The cause of the widespread modern confusion between mutexes and semaphores is historical, as it dates all the way back to the 1974 invention of the Semaphore (capital "S", in this article) by Djikstra. Prior to that date, none of the interrupt-safe task synchronization and signaling mechanisms known to computer scientists was efficiently scalable for use by more than two tasks. Dijkstra's revolutionary, safe-and-scalable Semaphore was applied in both critical section protection and signaling. And thus the confusion began.
However, it later became obvious to operating system developers, after the appearance of the priority-based preemptive RTOS (e.g., VRTX, ca. 1980), publication of academic papers establishing RMA and the problems caused by priority inversion, and a paper on priority inheritance protocols in 1990, 3 it became apparent that mutexes must be more than just semaphores with a binary counter.
Mutex: resource sharing
Semaphore: signaling
Don't use one for the other without careful consideration of the side effects.

Mutex: exclusive-member access to a resource
Semaphore: n-member access to a resource
That is, a mutex can be used to syncronize access to a counter, file, database, etc.
A sempahore can do the same thing but supports a fixed number of simultaneous callers. For example, I can wrap my database calls in a semaphore(3) so that my multithreaded app will hit the database with at most 3 simultaneous connections. All attempts will block until one of the three slots opens up. They make things like doing naive throttling really, really easy.

Consider, a taxi that can accommodate a total of 3(rear)+2(front) persons including the driver. So, a semaphore allows only 5 persons inside a car at a time.
And a mutex allows only 1 person on a single seat of the car.
Therefore, Mutex is to allow exclusive access for a resource (like an OS thread) while a Semaphore is to allow access for n number of resources at a time.

#Craig:
A semaphore is a way to lock a
resource so that it is guaranteed that
while a piece of code is executed,
only this piece of code has access to
that resource. This keeps two threads
from concurrently accesing a resource,
which can cause problems.
This is not restricted to only one thread. A semaphore can be configured to allow a fixed number of threads to access a resource.

Semaphore can also be used as a ... semaphore.
For example if you have multiple process enqueuing data to a queue, and only one task consuming data from the queue. If you don't want your consuming task to constantly poll the queue for available data, you can use semaphore.
Here the semaphore is not used as an exclusion mechanism, but as a signaling mechanism.
The consuming task is waiting on the semaphore
The producing task are posting on the semaphore.
This way the consuming task is running when and only when there is data to be dequeued

There are two essential concepts to building concurrent programs - synchronization and mutual exclusion. We will see how these two types of locks (semaphores are more generally a kind of locking mechanism) help us achieve synchronization and mutual exclusion.
A semaphore is a programming construct that helps us achieve concurrency, by implementing both synchronization and mutual exclusion. Semaphores are of two types, Binary and Counting.
A semaphore has two parts : a counter, and a list of tasks waiting to access a particular resource. A semaphore performs two operations : wait (P) [this is like acquiring a lock], and release (V)[ similar to releasing a lock] - these are the only two operations that one can perform on a semaphore. In a binary semaphore, the counter logically goes between 0 and 1. You can think of it as being similar to a lock with two values : open/closed. A counting semaphore has multiple values for count.
What is important to understand is that the semaphore counter keeps track of the number of tasks that do not have to block, i.e., they can make progress. Tasks block, and add themselves to the semaphore's list only when the counter is zero. Therefore, a task gets added to the list in the P() routine if it cannot progress, and "freed" using the V() routine.
Now, it is fairly obvious to see how binary semaphores can be used to solve synchronization and mutual exclusion - they are essentially locks.
ex. Synchronization:
thread A{
semaphore &s; //locks/semaphores are passed by reference! think about why this is so.
A(semaphore &s): s(s){} //constructor
foo(){
...
s.P();
;// some block of code B2
...
}
//thread B{
semaphore &s;
B(semaphore &s): s(s){} //constructor
foo(){
...
...
// some block of code B1
s.V();
..
}
main(){
semaphore s(0); // we start the semaphore at 0 (closed)
A a(s);
B b(s);
}
In the above example, B2 can only execute after B1 has finished execution. Let's say thread A comes executes first - gets to sem.P(), and waits, since the counter is 0 (closed). Thread B comes along, finishes B1, and then frees thread A - which then completes B2. So we achieve synchronization.
Now let's look at mutual exclusion with a binary semaphore:
thread mutual_ex{
semaphore &s;
mutual_ex(semaphore &s): s(s){} //constructor
foo(){
...
s.P();
//critical section
s.V();
...
...
s.P();
//critical section
s.V();
...
}
main(){
semaphore s(1);
mutual_ex m1(s);
mutual_ex m2(s);
}
The mutual exclusion is quite simple as well - m1 and m2 cannot enter the critical section at the same time. So each thread is using the same semaphore to provide mutual exclusion for its two critical sections. Now, is it possible to have greater concurrency? Depends on the critical sections. (Think about how else one could use semaphores to achieve mutual exclusion.. hint hint : do i necessarily only need to use one semaphore?)
Counting semaphore: A semaphore with more than one value. Let's look at what this is implying - a lock with more than one value?? So open, closed, and ...hmm. Of what use is a multi-stage-lock in mutual exclusion or synchronization?
Let's take the easier of the two:
Synchronization using a counting semaphore: Let's say you have 3 tasks - #1 and 2 you want executed after 3. How would you design your synchronization?
thread t1{
...
s.P();
//block of code B1
thread t2{
...
s.P();
//block of code B2
thread t3{
...
//block of code B3
s.V();
s.V();
}
So if your semaphore starts off closed, you ensure that t1 and t2 block, get added to the semaphore's list. Then along comes all important t3, finishes its business and frees t1 and t2. What order are they freed in? Depends on the implementation of the semaphore's list. Could be FIFO, could be based some particular priority,etc. (Note : think about how you would arrange your P's and V;s if you wanted t1 and t2 to be executed in some particular order, and if you weren't aware of the implementation of the semaphore)
(Find out : What happens if the number of V's is greater than the number of P's?)
Mutual Exclusion Using counting semaphores: I'd like you to construct your own pseudocode for this (makes you understand things better!) - but the fundamental concept is this : a counting semaphore of counter = N allows N tasks to enter the critical section freely. What this means is you have N tasks (or threads, if you like) enter the critical section, but the N+1th task gets blocked (goes on our favorite blocked-task list), and only is let through when somebody V's the semaphore at least once. So the semaphore counter, instead of swinging between 0 and 1, now goes between 0 and N, allowing N tasks to freely enter and exit, blocking nobody!
Now gosh, why would you need such a stupid thing? Isn't the whole point of mutual exclusion to not let more than one guy access a resource?? (Hint Hint...You don't always only have one drive in your computer, do you...?)
To think about : Is mutual exclusion achieved by having a counting semaphore alone? What if you have 10 instances of a resource, and 10 threads come in (through the counting semaphore) and try to use the first instance?

I've created the visualization which should help to understand the idea. Semaphore controls access to a common resource in a multithreading environment.
ExecutorService executor = Executors.newFixedThreadPool(7);
Semaphore semaphore = new Semaphore(4);
Runnable longRunningTask = () -> {
boolean permit = false;
try {
permit = semaphore.tryAcquire(1, TimeUnit.SECONDS);
if (permit) {
System.out.println("Semaphore acquired");
Thread.sleep(5);
} else {
System.out.println("Could not acquire semaphore");
}
} catch (InterruptedException e) {
throw new IllegalStateException(e);
} finally {
if (permit) {
semaphore.release();
}
}
};
// execute tasks
for (int j = 0; j < 10; j++) {
executor.submit(longRunningTask);
}
executor.shutdown();
Output
Semaphore acquired
Semaphore acquired
Semaphore acquired
Semaphore acquired
Could not acquire semaphore
Could not acquire semaphore
Could not acquire semaphore
Sample code from the article

A semaphore is an object containing a natural number (i.e. a integer greater or equal to zero) on which two modifying operations are defined. One operation, V, adds 1 to the natural. The other operation, P, decreases the natural number by 1. Both activities are atomic (i.e. no other operation can be executed at the same time as a V or a P).
Because the natural number 0 cannot be decreased, calling P on a semaphore containing a 0 will block the execution of the calling process(/thread) until some moment at which the number is no longer 0 and P can be successfully (and atomically) executed.
As mentioned in other answers, semaphores can be used to restrict access to a certain resource to a maximum (but variable) number of processes.

A hardware or software flag. In multi tasking systems , a semaphore is as variable with a value that indicates the status of a common resource.A process needing the resource checks the semaphore to determine the resources status and then decides how to proceed.

Semaphores are act like thread limiters.
Example: If you have a pool of 100 threads and you want to perform some DB operation. If 100 threads access the DB at a given time, then there may be locking issue in DB so we can use semaphore which allow only limited thread at a time.Below Example allow only one thread at a time. When a thread call the acquire() method, it will then get the access and after calling the release() method, it will release the acccess so that next thread will get the access.
package practice;
import java.util.concurrent.Semaphore;
public class SemaphoreExample {
public static void main(String[] args) {
Semaphore s = new Semaphore(1);
semaphoreTask s1 = new semaphoreTask(s);
semaphoreTask s2 = new semaphoreTask(s);
semaphoreTask s3 = new semaphoreTask(s);
semaphoreTask s4 = new semaphoreTask(s);
semaphoreTask s5 = new semaphoreTask(s);
s1.start();
s2.start();
s3.start();
s4.start();
s5.start();
}
}
class semaphoreTask extends Thread {
Semaphore s;
public semaphoreTask(Semaphore s) {
this.s = s;
}
#Override
public void run() {
try {
s.acquire();
Thread.sleep(1000);
System.out.println(Thread.currentThread().getName()+" Going to perform some operation");
s.release();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

So imagine everyone is trying to go to the bathroom and there's only a certain number of keys to the bathroom. Now if there's not enough keys left, that person needs to wait. So think of semaphore as representing those set of keys available for bathrooms (the system resources) that different processes (bathroom goers) can request access to.
Now imagine two processes trying to go to the bathroom at the same time. That's not a good situation and semaphores are used to prevent this. Unfortunately, the semaphore is a voluntary mechanism and processes (our bathroom goers) can ignore it (i.e. even if there are keys, someone can still just kick the door open).
There are also differences between binary/mutex & counting semaphores.
Check out the lecture notes at http://www.cs.columbia.edu/~jae/4118/lect/L05-ipc.html.

This is an old question but one of the most interesting uses of semaphore is a read/write lock and it has not been explicitly mentioned.
The r/w locks works in simple fashion: consume one permit for a reader and all permits for writers.
Indeed, a trivial implementation of a r/w lock but requires metadata modification on read (actually twice) that can become a bottle neck, still significantly better than a mutex or lock.
Another downside is that writers can be started rather easily as well unless the semaphore is a fair one or the writes acquire permits in multiple requests, in such case they need an explicit mutex between themselves.
Further read:

Mutex is just a boolean while semaphore is a counter.
Both are used to lock part of code so it's not accessed by too many threads.
Example
lock.set()
a += 1
lock.unset()
Now if lock was a mutex, it means that it will always be locked or unlocked (a boolean under the surface) regardless how many threads try access the protected snippet of code. While locked, any other thread would just wait until it's unlocked/unset by the previous thread.
Now imagine if instead lock was under the hood a counter with a predefined MAX value (say 2 for our example). Then if 2 threads try to access the resource, then lock would get its value increased to 2. If a 3rd thread then tried to access it, it would simply wait for the counter to go below 2 and so on.
If lock as a semaphore had a max of 1, then it would be acting exactly as a mutex.

A semaphore is a way to lock a resource so that it is guaranteed that while a piece of code is executed, only this piece of code has access to that resource. This keeps two threads from concurrently accesing a resource, which can cause problems.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string