I am a beginner of multi thread programming.
I've been using Concurrent classes in a multi-threaded environment, but suddenly I was curious about using blockingqueue.
I thought Concurrent classes like ConcurrentHashMap use locking.
Recently, I happened to use QUEUE, and I looked at thread-safe queues. So I knew that there were BlockingQueue and linkedConcurrentQueue, and I studied these two queues.
The blockingqueue is thread-safe using locking. This is the typical thread-safe way I was thinking.
Concurrentqueue is processed thread-safe by using an algorithm called CAS.
However, I do not know whether the target of this CAS is the queue itself or an element belonging to the queue.
Even if multiple threads poll elements in concurrentqueue at the same time, are they polling different elements? Isn't there a case that polls for the same element at a time?
If so, lock-free concurrentqueue looks too good compared to blockingqueue... What is the reason that blockingqueue is still deprecate and alive? (except for take() method)
Are there any articles I would like to study or reference? Thank you!
A blocking operation cannot be lock-free (and vice versa). A lock-free operation provides a progress guarantee. Here is the definition from Wikipedia:
An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress).
A thread that is blocked cannot make progress until it is "unblocked". But that usually means that some other thread must have made progress in order to unblock the blocked thread. So the blocked thread depends on some other thread's progress. In a lock-free operation this is not the case - a thread can only be prevented to make progress, if some other thread is instead making progress and thereby interferes with the first thread.
That's why a lock-free queue cannot have a blocking pop operation that blocks in case the queue is empty, waiting for some other thread to push an item. Instead they usually have a tryPop operation that indicates whether it was successful or not.
A (lock-free) Compare-And-Swap (CAS) operation can usually only be performed on a single pointer. It is one of the fundamental operations required by lock-free algorithms. In case of the ConcurrentLinkedQueue (which is an implementation of the queue proposed by Michael and Scott in Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms), the head and tail pointers are updated using CAS operations.
The topic of concurrent programming is quite vast and complex. If you are really interested, I would suggest to start with the excellent book The Art of Multiprocessor Programming.
There is lot of discussion on thread synchronization on SO as well as on many forums all-over the Internet. However, I could not find precise information as to how exactly it happens at OS level conceptually.
As we all know there are these types of thread synchronization objects:
Mutex
Semaphore
Critical section
And as I understand it fully, allowing multiple threads at a time to modify a resource (for example, two threads simultaneously changing bits of a variable in memory) is not a good idea and so we use these objects. But then that's what exactly same should happen when multiple threads try to access these objects as well.
What really happens at the core? How exactly does OS achieve this?
How can we explain this to someone at conceptual level (rather than going in hardware or assembly level details)?
First let's sum up what the fundamental problem of threading really is-- two threads try to access the same piece of memory at the same time. You can imagine that when this happens we can't guarantee that a piece of memory is in a valid state, and our program might be incorrect.
Trying to keep this very high level, part of the way processors work is by throwing interrupts which basically tell a thread to stop what is doing and do something else. This is where much of the problem of threading lies. A thread can be interrupted in the middle of task. Imagine one thread is interrupted in the middle of an operation and some intermediate garbage value exists because the thread hasn't finished its task. Another thread could come along and read this value and destroy the correctness of your program.
The OS achieves this with Atomic instructions. Without getting into the details, image that there were some instructions that were guaranteed to either be completed or not completed. This means that if a thread checks the result of an instruction it won't see an intermediate results. So an atomic add method would either show the value before the add or after the add, but not during the add when their might be some intermediate state.
Now if you have a few atomic instructions you might be able to imagine that you could build higher level abstractions that deal with threads and thread safety on the back of these. Maybe the most basic example in a lock created with the test and set primitive. Take a look at this wikipedia article https://en.wikipedia.org/wiki/Test-and-set. Now that was probably a lot because these things get pretty complex. But I will attempt to given an example that clarifies. If you have two processes running that are trying to access some section of code, a very naive solution would be to create a lock variable
boolean isLocked = false;
Anytime a process tried to acquire this lock you could merely check isLocked==false and wait until isLocked ==true before executing some code. For example...
while(isLocked){
//wait for isLocked == false
}
isLocked = true;
// execute the code you want to be locked
isLocked = false;
Of course, we know that something as simple as setting or reading a boolean can be interrupted and cause threading mayhem. So, the good folks that developed kernels and processors and hardware created an atomic test and set operation which returns the old value of a boolean and sets the new value to true. So of course you can implement our lock above by doing something like.
while(testAndSet(isLocked)){ //wait until the old value returned is
false so the lock is unlocked } //do some critical stuff
//unlock after you did the critical stuff lock = false;
I only show the implementation of a basic lock above to prove the point that it is possible to build higher level abstractions on atomic instructions. Atomic instruction are about as low level as you can get conceptually, in my opinion, without delving into hardware specifics. You can imagine though that within hardware, the hardware must somehow set a flag of some sort when memory is being read that precludes another thread from accessing the same memory.
Hope that helps!
I'm learning Operating System now, and I'm quite confused with the two concepts - mutex and atomic operation. In my understanding, they are the same, but my OS instructor gave us such a question,
Suppose a multi-processor operating system kernel tracks the number of processes created by each user. This operating system kernel maintains a counter variable for each user that it increments every time it creates a new process for a user and decrements every time a process from that user terminates. Furthermore, this operating system runs on a processor that provides atomic fetch-and-increment and fetch-and-decrement instructions.
Should the operating system update the counter using the atomic increment and decrement instructions, or should it update the counter in a critical section protected by a mutex?
This question indicates that mutex and atomic operation are two things. Could anyone help me with it?
An atomic operation is one that cannot be subdivided into smaller parts. As such, it will never be halfway done, so you can guarantee that it will always be observed in a consistent state. For example, modern hardware implements atomic compare-and-swap operations.
A mutex (short for mutual exclusion) excludes other processes or threads from executing the same section of code (the critical section). Basically, it ensures that at most one thread is executing a given section of code. A mutex is also called a lock.
Underneath the hood, locks must be implemented using hardware somehow, and the implementation must make use of the atomicity guarantees of the underlying hardware.
Most nontrivial operations cannot be made atomic, so you must either use a lock to block other threads from operating while the critical section executes, or else you must carefully design a lock-free algorithm that ensures that all the critical state-changing operations can be safely implemented using atomic operations.
This is a very deep subject, and there is a large body of literature on all these topics. The Wikipedia links I've given are a good starting point, but since you're taking a class on operating systems right now, it might be best for you to ask your professor to provide good resources for learning and understanding this stuff.
If you're a total noob, my answer may be a good place to start. I've just learned how these work, and feel I'm in a good place to relay back.
Generally, both of these are means of avoiding bad things that happen when you read something that's halfway written.
Mutex
A mutex is like the key to a bathroom at a small business. Only one person ever has the key, so if some other person comes along they'll likely have to wait. Here's the rubs:
If someone walks off with the key, then the waiting person never stops waiting.
Nothing can stop some other process from making its own door to the bathroom.
In the context of code, a mutex is mostly the key part, and the person is a process.
Atomic
Atomic means something that can't be split into smaller steps. In the natural world there is no CPU clock -- so everything we do could be smaller steps -- but let's pretend...
When you're typing on your keyboard, every key you hit is an atomic action. It happens all at once, and you can not hit two keys at exactly the same time. Here's what's good about this:
No waiting: the fact that no two keys are being hit at the same time is not because one has to wait. It's because one is always done by the time the next gets there.
No collision: no matter how much you hammer away, you'll never get two characters overlaid. One always happens before the other, completely.
For a counter example, if you were trying to type two words at the same time, that would be not atomic. The letters would mix up.
In the context of code, hitting keys is the same as running a single CPU command. It doesn't matter what other commands are in queue, the one your are doing will finish in its entirety before the next happens.
If you can do something atomically, then you don't have to worry about collision. But not everything is feasible within these bounds. Generally, atomics are for really low level operations -- like getting and setting an primitive (int, boolean, etc). For anything that's going to run a bunch of CPU commands but wants to be atomic, there's a couple tricks:
Use a mutex. Kind of cheating, not really atomic. But some things do this and call themselves atomic.
Carefully writing code such that it never requires more than one concurrent instruction on a piece of data in a row to remain correct. This one gets a bit deeper, but sometimes it can be done.
From here there's tons of reading to get into the nitty gritty details, but this should be enough to give you a foundation understanding of the subject.
first read #Daniel answer then mine.
If your processor provides atomic instructions enough to complete your task you do not need Mutex/locks. In your case fetch-increment and fetch-decrement are supposed to be atomic so you do not need to use Mutex.
Atomic operations use low level/hardware level locks to make some operations ATOMIC: operations which are virtually performed in one go/cpu cycle. So atomic operations never place system in inconsistent state
EDIT
No Atomic and Mutex are not same thing but two opposite things used for same purpose of making sure that state of system should not become inconsistent. You use Mutex for Non-ATOMIC operations while for ATOMIC operations you do not use Mutex.
Back in my days as a BeOS programmer, I read this article by Benoit Schillings, describing how to create a "benaphore": a method of using atomic variable to enforce a critical section that avoids the need acquire/release a mutex in the common (no-contention) case.
I thought that was rather clever, and it seems like you could do the same trick on any platform that supports atomic-increment/decrement.
On the other hand, this looks like something that could just as easily be included in the standard mutex implementation itself... in which case implementing this logic in my program would be redundant and wouldn't provide any benefit.
Does anyone know if modern locking APIs (e.g. pthread_mutex_lock()/pthread_mutex_unlock()) use this trick internally? And if not, why not?
What your article describes is in common use today. Most often it's called "Critical Section", and it consists of an interlocked variable, a bunch of flags and an internal synchronization object (Mutex, if I remember correctly). Generally, in the scenarios with little contention, the Critical Section executes entirely in user mode, without involving the kernel synchronization object. This guarantees fast execution. When the contention is high, the kernel object is used for waiting, which releases the time slice conductive for faster turnaround.
Generally, there is very little sense in implementing synchronization primitives in this day and age. Operating systems come with a big variety of such objects, and they are optimized and tested in significantly wider range of scenarios than a single programmer can imagine. It literally takes years to invent, implement and test a good synchronization mechanism. That's not to say that there is no value in trying :)
Java's AbstractQueuedSynchronizer (and its sibling AbstractQueuedLongSynchronizer) works similarly, or at least it could be implemented similarly. These types form the basis for several concurrency primitives in the Java library, such as ReentrantLock and FutureTask.
It works by way of using an atomic integer to represent state. A lock may define the value 0 as unlocked, and 1 as locked. Any thread wishing to acquire the lock attempts to change the lock state from 0 to 1 via an atomic compare-and-set operation; if the attempt fails, the current state is not 0, which means that the lock is owned by some other thread.
AbstractQueuedSynchronizer also facilitates waiting on locks and notification of conditions by maintaining CLH queues, which are lock-free linked lists representing the line of threads waiting either to acquire the lock or to receive notification via a condition. Such notification moves one or all of the threads waiting on the condition to the head of the queue of those waiting to acquire the related lock.
Most of this machinery can be implemented in terms of an atomic integer representing the state as well as a couple of atomic pointers for each waiting queue. The actual scheduling of which threads will contend to inspect and change the state variable (via, say, AbstractQueuedSynchronizer#tryAcquire(int)) is outside the scope of such a library and falls to the host system's scheduler.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am applying my new found knowledge of threading everywhere and getting lots of surprises
Example:
I used threads to add numbers in an
array. And outcome was different every
time. The problem was that all of my
threads were updating the same
variable and were not synchronized.
What are some known thread issues?
What care should be taken while using
threads?
What are good multithreading resources.
Please provide examples.
sidenote:(I renamed my program thread_add.java to thread_random_number_generator.java:-)
In a multithreading environment you have to take care of synchronization so two threads doesn't clobber the state by simultaneously performing modifications. Otherwise you can have race conditions in your code (for an example see the infamous Therac-25 accident.) You also have to schedule the threads to perform various tasks. You then have to make sure that your synchronization and scheduling doesn't cause a deadlock where multiple threads will wait for each other indefinitely.
Synchronization
Something as simple as increasing a counter requires synchronization:
counter += 1;
Assume this sequence of events:
counter is initialized to 0
thread A retrieves counter from memory to cpu (0)
context switch
thread B retrieves counter from memory to cpu (0)
thread B increases counter on cpu
thread B writes back counter from cpu to memory (1)
context switch
thread A increases counter on cpu
thread A writes back counter from cpu to memory (1)
At this point the counter is 1, but both threads did try to increase it. Access to the counter has to be synchronized by some kind of locking mechanism:
lock (myLock) {
counter += 1;
}
Only one thread is allowed to execute the code inside the locked block. Two threads executing this code might result in this sequence of events:
counter is initialized to 0
thread A acquires myLock
context switch
thread B tries to acquire myLock but has to wait
context switch
thread A retrieves counter from memory to cpu (0)
thread A increases counter on cpu
thread A writes back counter from cpu to memory (1)
thread A releases myLock
context switch
thread B acquires myLock
thread B retrieves counter from memory to cpu (1)
thread B increases counter on cpu
thread B writes back counter from cpu to memory (2)
thread B releases myLock
At this point counter is 2.
Scheduling
Scheduling is another form of synchronization and you have to you use thread synchronization mechanisms like events, semaphores, message passing etc. to start and stop threads. Here is a simplified example in C#:
AutoResetEvent taskEvent = new AutoResetEvent(false);
Task task;
// Called by the main thread.
public void StartTask(Task task) {
this.task = task;
// Signal the worker thread to perform the task.
this.taskEvent.Set();
// Return and let the task execute on another thread.
}
// Called by the worker thread.
void ThreadProc() {
while (true) {
// Wait for the event to become signaled.
this.taskEvent.WaitOne();
// Perform the task.
}
}
You will notice that access to this.task probably isn't synchronized correctly, that the worker thread isn't able to return results back to the main thread, and that there is no way to signal the worker thread to terminate. All this can be corrected in a more elaborate example.
Deadlock
A common example of deadlock is when you have two locks and you are not careful how you acquire them. At one point you acquire lock1 before lock2:
public void f() {
lock (lock1) {
lock (lock2) {
// Do something
}
}
}
At another point you acquire lock2 before lock1:
public void g() {
lock (lock2) {
lock (lock1) {
// Do something else
}
}
}
Let's see how this might deadlock:
thread A calls f
thread A acquires lock1
context switch
thread B calls g
thread B acquires lock2
thread B tries to acquire lock1 but has to wait
context switch
thread A tries to acquire lock2 but has to wait
context switch
At this point thread A and B are waiting for each other and are deadlocked.
There are two kinds of people that do not use multi threading.
1) Those that do not understand the concept and have no clue how to program it.
2) Those that completely understand the concept and know how difficult it is to get it right.
I'd make a very blatant statement:
DON'T use shared memory.
DO use message passing.
As a general advice, try to limit the amount of shared state and prefer more event-driven architectures.
I can't give you examples besides pointing you at Google. Search for threading basics, thread synchronisation and you'll get more hits than you know.
The basic problem with threading is that threads don't know about each other - so they will happily tread on each others toes, like 2 people trying to get through 1 door, sometimes they will pass though one after the other, but sometimes they will both try to get through at the same time and will get stuck. This is difficult to reproduce, difficult to debug, and sometimes causes problems. If you have threads and see "random" failures, this is probably the problem.
So care needs to be taken with shared resources. If you and your friend want a coffee, but there's only 1 spoon you cannot both use it at the same time, one of you will have to wait for the other. The technique used to 'synchronise' this access to the shared spoon is locking. You make sure you get a lock on the shared resource before you use it, and let go of it afterwards. If someone else has the lock, you wait until they release it.
Next problem comes with those locks, sometimes you can have a program that is complex, so much that you get a lock, do something else then access another resource and try to get a lock for that - but some other thread has that 2nd resource, so you sit and wait... but if that 2nd thread is waiting for the lock you hold for the 1st resource.. it's going to sit and wait. And your app just sits there. This is called deadlock, 2 threads both waiting for each other.
Those 2 are the vast majority of thread issues. The answer is generally to lock for as short a time as possible, and only hold 1 lock at a time.
I notice you are writing in java and that nobody else mentioned books so Java Concurrency In Practice should be your multi-threaded bible.
-- What are some known thread issues? --
Race conditions.
Deadlocks.
Livelocks.
Thread starvation.
-- What care should be taken while using threads? --
Using multi-threading on a single-processor machine to process multiple tasks where each task takes approximately the same time isn’t always very effective.For example, you might decide to spawn ten threads within your program in order to process ten separate tasks. If each task takes approximately 1 minute to process, and you use ten threads to do this processing, you won’t have access to any of the task results for the whole 10 minutes. If instead you processed the same tasks using just a single thread, you would see the first result in 1 minute, the next result 1 minute later, and so on. If you can make use of each result without having to rely on all of the results being ready simultaneously, the single
thread might be the better way of implementing the program.
If you launch a large number of threads within a process, the overhead of thread housekeeping and context switching can become significant. The processor will spend considerable time in switching between threads, and many of the threads won’t be able to make progress. In addition, a single process with a large number of threads means that threads in other processes will be scheduled less frequently and won’t receive a reasonable share of processor time.
If multiple threads have to share many of the same resources, you’re unlikely to see performance benefits from multi-threading your application. Many developers see multi-threading as some sort of magic wand that gives automatic performance benefits. Unfortunately multi-threading isn’t the magic wand that it’s sometimes perceived to be. If you’re using multi-threading for performance reasons, you should measure your application’s performance very closely in several different situations, rather than just relying on some non-existent magic.
Coordinating thread access to common data can be a big performance killer. Achieving good performance with multiple threads isn’t easy when using a coarse locking plan, because this leads to low concurrency and threads waiting for access. Alternatively, a fine-grained locking strategy increases the complexity and can also slow down performance unless you perform some sophisticated tuning.
Using multiple threads to exploit a machine with multiple processors sounds like a good idea in theory, but in practice you need to be careful. To gain any significant performance benefits, you might need to get to grips with thread balancing.
-- Please provide examples. --
For example, imagine an application that receives incoming price information from
the network, aggregates and sorts that information, and then displays the results
on the screen for the end user.
With a dual-core machine, it makes sense to split the task into, say, three threads. The first thread deals with storing the incoming price information, the second thread processes the prices, and the final thread handles the display of the results.
After implementing this solution, suppose you find that the price processing is by far the longest stage, so you decide to rewrite that thread’s code to improve its performance by a factor of three. Unfortunately, this performance benefit in a single thread may not be reflected across your whole application. This is because the other two threads may not be able to keep pace with the improved thread. If the user interface thread is unable to keep up with the faster flow of processed information, the other threads now have to wait around for the new bottleneck in the system.
And yes, this example comes directly from my own experience :-)
DONT use global variables
DONT use many locks (at best none at all - though practically impossible)
DONT try to be a hero, implementing sophisticated difficult MT protocols
DO use simple paradigms. I.e share the processing of an array to n slices of the same size - where n should be equal to the number of processors
DO test your code on different machines (using one, two, many processors)
DO use atomic operations (such as InterlockedIncrement() and the like)
YAGNI
The most important thing to remember is: do you really need multithreading?
I agree with pretty much all the answers so far.
A good coding strategy is to minimise or eliminate the amount of data that is shared between threads as much as humanly possible. You can do this by:
Using thread-static variables (although don't go overboard on this, it will eat more memory per thread, depending on your O/S).
Packaging up all state used by each thread into a class, then guaranteeing that each thread gets exactly one state class instance to itself. Think of this as "roll your own thread-static", but with more control over the process.
Marshalling data by value between threads instead of sharing the same data. Either make your data transfer classes immutable, or guarantee that all cross-thread calls are synchronous, or both.
Try not to have multiple threads competing for the exact same I/O "resource", whether it's a disk file, a database table, a web service call, or whatever. This will cause contention as multiple threads fight over the same resource.
Here's an extremely contrived OTT example. In a real app you would cap the number of threads to reduce scheduling overhead:
All UI - one thread.
Background calcs - one thread.
Logging errors to a disk file - one thread.
Calling a web service - one thread per unique physical host.
Querying the database - one thread per independent group of tables that need updating.
Rather than guessing how to do divvy up the tasks, profile your app and isolate those bits that are (a) very slow, and (b) could be done asynchronously. Those are good candidates for a separate thread.
And here's what you should avoid:
Calcs, database hits, service calls, etc - all in one thread, but spun up multiple times "to improve performance".
Don't start new threads unless you really need to. Starting threads is not cheap and for short running tasks starting the thread may actually take more time than executing the task itself. If you're on .NET take a look at the built in thread pool, which is useful in a lot of (but not all) cases. By reusing the threads the cost of starting threads is reduced.
EDIT: A few notes on creating threads vs. using thread pool (.NET specific)
Generally try to use the thread pool. Exceptions:
Long running CPU bound tasks and blocking tasks are not ideal run on the thread pool cause they will force the pool to create additional threads.
All thread pool threads are background threads, so if you need your thread to be foreground, you have to start it yourself.
If you need a thread with different priority.
If your thread needs more (or less) than the standard 1 MB stack space.
If you need to be able to control the life time of the thread.
If you need different behavior for creating threads than that offered by the thread pool (e.g. the pool will throttle creating of new threads, which may or may not be what you want).
There are probably more exceptions and I am not claiming that this is the definitive answer. It is just what I could think of atm.
I am applying my new found knowledge of threading everywhere
[Emphasis added]
DO remember that a little knowledge is dangerous. Knowing the threading API of your platform is the easy bit. Knowing why and when you need to use synchronisation is the hard part. Reading up on "deadlocks", "race-conditions", "priority inversion" will start you in understanding why.
The details of when to use synchronisation are both simple (shared data needs synchronisation) and complex (atomic data types used in the right way don't need synchronisation, which data is really shared): a lifetime of learning and very solution specific.
An important thing to take care of (with multiple cores and CPUs) is cache coherency.
I am surprised that no one has pointed out Herb Sutter's Effective Concurrency columns yet. In my opinion, this is a must read if you want to go anywhere near threads.
a) Always make only 1 thread responsible for a resource's lifetime. That way thread A won't delete a resource thread B needs - if B has ownership of the resource
b) Expect the unexpected
DO think about how you will test your code and set aside plenty of time for this. Unit tests become more complicated. You may not be able to manually test your code - at least not reliably.
DO think about thread lifetime and how threads will exit. Don't kill threads. Provide a mechanism so that they exit gracefully.
DO add some kind of debug logging to your code - so that you can see that your threads are behaving correctly both in development and in production when things break down.
DO use a good library for handling threading rather than rolling your own solution (if you can). E.g. java.util.concurrency
DON'T assume a shared resource is thread safe.
DON'T DO IT. E.g. use an application container that can take care of threading issues for you. Use messaging.
In .Net one thing that surprised me when I started trying to get into multi-threading is that you cannot straightforwardly update the UI controls from any thread other than the thread that the UI controls were created on.
There is a way around this, which is to use the Control.Invoke method to update the control on the other thread, but it is not 100% obvious the first time around!
Don't be fooled into thinking you understand the difficulties of concurrency until you've split your head into a real project.
All the examples of deadlocks, livelocks, synchronization, etc, seem simple, and they are. But they will mislead you, because the "difficulty" in implementing concurrency that everyone is talking about is when it is used in a real project, where you don't control everything.
While your initial differences in sums of numbers are, as several respondents have pointed out, likely to be the result of lack of synchronisation, if you get deeper into the topic, be aware that, in general, you will not be able to reproduce exactly the numeric results you get on a serial program with those from a parallel version of the same program. Floating-point arithmetic is not strictly commutative, associative, or distributive; heck, it's not even closed.
And I'd beg to differ with what, I think, is the majority opinion here. If you are writing multi-threaded programs for a desktop with one or more multi-core CPUs, then you are working on a shared-memory computer and should tackle shared-memory programming. Java has all the features to do this.
Without knowing a lot more about the type of problem you are tackling, I'd hesitate to write that 'you should do this' or 'you should not do that'.