Why does Scala use a ForkJoinPool to back its default ExecutionContext?

Why does Scala use a ForkJoinPool to back its default ExecutionContext? - multithreading

In Scala you can use a global ExecutionContext if you don't need to define your own by importing scala.concurrent.ExecutionContext.Implicits.global.
My question is why ForkJoinPool was chosen for this executor instead of ThreadPoolExecutor.
My understanding is that the fork-join framework is excellent at recursively decomposing problems. You're supposed to write code that breaks a task into halves, so that half can be executed in-thread and the other half can be executed by another thread. This seems like a very particular programming model and not one that would be generally applicable to handling execution of general asynchronous tasks across a wide range of possible applications.
So why was ForkJoinPool chosen for the default execution context that most people will likely use? Does the work-stealing design result in improved performance even if you don't use the full fork-join paradigm?

I can't speak for the scala designers, but idiomatic use of scala Futures often involves creation of a lot of very small, short-lived tasks (e.g. every map call creates a new task) and so the work-stealing design is appropriate.
If you care about these kind of precise details you might prefer to use scalaz-concurrent's Future, which uses trampolines to avoid creating extra tasks for each map step and makes execution contexts explicit (and AIUI defaults to a ThreadPoolExecutor).

Related

Non-determinism of synchronous execution contexts (a.k.a. `parasitic`)

In Scala, as explained in the PR that introduced it, parasitic allows to steal
execution time from other threads by having its
Runnables run on the Thread which calls execute and then yielding back control
to the caller after all its Runnables have been executed.
It appears to be a neat trick to avoid context switches when:
you are doing a trivial operation following on a Future coming from an actually long running operation, or
you are working with an API that doesn't allow you to specify an ExecutionContext for a Future but you but you would like to make sure the operation continues on that same thread, without introducing a different threadpool
The PR that originally introduced parasitic further explains that
When using parasitic with abstractions such as Future it will in many cases be non-deterministic
as to which Thread will be executing the logic, as it depends on when/if that Future is completed.
This concept is also repeated in the official Scala documentation in the paragraphs about Synchronous Execution Contexts:
One might be tempted to have an ExecutionContext that runs computations within the current thread:
val currentThreadExecutionContext = ExecutionContext.fromExecutor(
new Executor {
// Do not do this!
def execute(runnable: Runnable) { runnable.run() }
})
This should be avoided as it introduces non-determinism in the execution of your future.
Future {
doSomething
}(ExecutionContext.global).map {
doSomethingElse
}(currentThreadExecutionContext)
The doSomethingElse call might either execute in doSomething’s thread or in the main thread, and therefore be either asynchronous or synchronous. As explained here a callback should not be both.
I have a couple of doubts:
how is parasitic different from the synchronous execution context in the Scala documentation?
what is the source of non-determinism mentioned in both sources? From the comment in the PR that introduced parasitic it sounds like if doSomething completes very quickly it may return control to the main thread and you may end up actually not running doSomethingElse on a global thread but on the main one. That's what I could make of it, but I would like to confirm.
is there a reliable way to have a computation run on the same thread as its preceding task? I guess using a lazy implementation of the Future abstraction (e.g. IO in Cats) could make this more easy and reliable, but I'd like to understand if this is possible at all using the Future from the standard library.

parasitic has an upper bound on stack recursion, to try to mitigate the risk of StackOverflowErrors due to nested submissions, and can instead defer Runnables to a queue.
The source of non-determinisism is: If the Future is not yet completed: it will register to run on the completing thread. if the Future is completed it will run on the registering thread. Since those two situations can depend on timing, it is not deterministic which thread will execute the code.
How do you know A) which Thread that is, and B) whether it would ever be able to execute another task again?
I find that it is easier to think about Futures as read-handles for values that may or may not exist at a specific point in time. That nicely untangles the notion of Threads from the picture, and now it is rather about: When a value is available, I want to do X—and here is the ExecutionContext that will do X.

Semi-explicit parallelism in Haskell

I am reading semi-explicit parallelism in Haskell, and get some confusion.
par :: a -> b -> b
People say that this approach allows us to make automatically parallelization by evaluating every sub-expression of a Haskell program in parallel. But this approach has the following disadvantages:
1) It creates far too many small items of the work, which cannot be efficiently scheduled. As I understand, if you use par function for every lines of Haskell program, it will creates too many threads, and it's not practical at all. Is that right?
2) With this approach, parallelism is limited by data dependencies in the source program. If I understand correctly, it means every sub-expression must be independent. Like, in the par function, a and b must be independent.
3) The Haskell runtime system does not necessarily create a thread to compute the value of the expression a. Instead, it creates a spark, which has the potential to be executed on a different thread from the parent thread.
So, my question is : finally the runtime system will create a thread to compute a or not? Or if the expression a is needed to compute the expression b, the system will create a new thread to compute a? Otherwise, it will not. Is this true?
I am a newbie to Haskell, so maybe my questions are still basic for all of you. Thanks for your answer.

The par combinator you mention is part of the Glasgow parallel Haskell (GpH) which implements semi-explicit parallelism, which however means it is not fully implicit and hence does not provide automatic parallelisation. The programmer still needs to identify subexperssions deemed worthwhile executing in parallel to avoid the issue you mention in 1).
Moreover, the annotation is not prescriptive (as e.g. pthread_create in C or forkIO in Haskell) but advisory, that means the runtime system finally decides whether to evaluate the subexpressions in parallel or not. This provides additional flexibility and a way for dynamic granularity control. Additionally, so-called Evaluation Strategies have been designed to abstract over par and pseq and to separate specification of coordination from computation. For instance, a strategy parListChunk allows to chunk a list and force it to weak head normal form (this is a case where some strictness is needed).
2) Parallelism is limited by data dependencies in the sense that the computation defines the way the graph is reduced and which computation is demanded at which point. It is not true that every sub-expression must be independent. For instance E1 par E2 returns the result of E2, it means to be useful, some part of E1 needs to be used in E2 and hence E2 depends on E1.
3) The picture is slightly confused here because of the GHC-specific terminology. There are Capabilities (or Haskell Execution Contexts) which implement parallel graph reduction and maintain a spark pool and a thread pool each. Usually there is one Capability per core (can be thought of as OS threads). On the other end of the continuum there are sparks, which are basically pointers to parts of the graph that have not been evaluated yet (thunks). And there are threads (actually sort of tasks or work units), so that to be evaluated in parallel a spark needs to be turned into a thread (which has a so called thread state object that contains the necessary execution environment and allows a thunk to be evaluated in parallel). A thunk may depend on results of other thunks and blocks until these results arrive. These threads are much more lightweight than OS threads and are being multiplexed onto the available Capablities.
So, in summary, a runtime will not even neccesarily create a lightweight thread to evaluate a sub-expression. By the way, random work-stealing is used for load-balancing.
This is a very high-level approach to parallelism and avoids race conditions and deadlocks by design. The synchronisation is implicitly mediated through graph reduction. A nice position statement discusses further Why Parallel Functional Programming Matters. For more information on the abstract machine behind the scenes have a look at Stackless Tagless G-Machine and checkout the notes on Haskell Execution Model on GHC wiki (usually the most up-to-date documentation alongside the source code).

Yes, you are correct. You would not gain anything by creating a spark for every expression you want computed. You would get way, way too many sparks. Trying to manage this is what Data Parallel Haskell is about. DPH is a way of breaking down a nested computations into well-sized chunks which can then be computed in parallel. Keep in mind that this is still a research effort and probably not ready for mainstream consumption.
Once again, you are correct. If a depends on b you have to compute as much of a as b needs to be able to start computation of b.
Yup. Threads actually have a pretty high overhead compared to some of the alternatives. Sparks are somewhat like thunks only they can be computedd independently of time.
No, the RTS will not create a thread to compute a. You can decide how many threads the RTS should have running (+RTS -N6 for six threads) and they will be kept alive for the duration of the program.
par only creates a spark. A spark is not a thread. The sparks occupy a work pool, and the scheduler performs work stealing – i.e. when a thread goes idle it picks up a spark from the pool and computes it.

How to articulate the difference between asynchronous and parallel programming?

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.

When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.

I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callback communication is implicit while sharing of resources is optional (consider RMI where results are computed in a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming has usually throughput as the main objective while latency, i.e. the completion time for a single element, might be worse than a equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency of Daniele Varacca which is a good set of notes for theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.

This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key differences is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spinup, and teardown threads needed.

async: Do this by yourself somewhere else and notify me when you complete(callback). By the time i can continue to do my thing.
parallel: Hire as many guys(threads) as you wish and split the job to them to complete quicker and let me know(callback) when you complete. By the time i might continue to do my other stuff.
the main difference is parallelism mostly depends on hardware.

My basic understanding is:
Asynchonous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.

I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.

It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.

Why Asynchronous ?
With today's application's growing more and more connected and also potentially
long running tasks or blocking operations such as Network I/O or Database Operations.So it's very important to hide the latency of these operations by starting them in background and returning back to the user interface quickly as possible. Here Asynchronous come in to the picture, Responsiveness.
Why parallel programming?
With today's data sets growing larger and computations growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case, by dividing the workload into chunks and then executing those chunks simultaneously. We can call this as "Parallel" .
Obviously it will give high Performance to our application.

Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.

Asynchronous: Running a method or task in background, without blocking. May not necessorily run on a separate thread. Uses Context Switching / time scheduling.
Parallel Tasks: Each task runs parallally. Does not use context switching / time scheduling.

I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written, you will have a unique stack of cards (don't copy & paste!) and the difference between what normally goes on when run code normally and asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.

Generally, there are only two ways you can do more than one thing each time. One is asynchronous, the other is parallel.
From the high level, like the popular server NGINX and famous Python library Tornado, they both fully utilize asynchronous paradigm which is Single thread server could simultaneously serve thousands of clients (some IOloop and callback). Using ECF(exception control follow) which could implement the asynchronous programming paradigm. so asynchronous sometimes doesn't really do thing simultaneous, but some io bound work, asynchronous could really promotes the performance.
The parallel paradigm always refers multi-threading, and multiprocessing. This can fully utilize multi-core processors, do things really simultaneously.

Summary of all above answers
parallel computing:
▪ solves throughput issue.
Concerned with breaking a large task into smaller chunks
▪ is machine related (multi machine/core/cpu/processor needed), eg: master slave, map reduce.
Parallel computations usually involve a central control which distributes the work among several processors
asynchronous:
▪ solves latency issue
ie, the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multi thread needed)
Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java

What is so great about TPL

I have done this POC and verified that when you create 4 threads and run them on Quad core machine, all cores get busy - so, CLR is already scheduling threads on different cores effectively, so why the class TASK?
I agree Task simplifies the creation and use of threads, but apart from that what? Its just a wrapper around threads and threadpools right? Or does it in some way help scheduling threads on multicore machines?
I am specifially looking at whats with Task wrt multicore that wasnt there in 2.0 threads.

"I agree Task simplifies the creation and use of threads"
Isn't that enough? Isn't it fabulous that it provides higher-level building blocks so that us mere mortals can build lock-free multithreaded code which is safe because really smart people like Joe Duffy have done the work for us?
If TPL really just consisted of a way of starting a new task, it wouldn't be much use - the work-stealing etc is nice, but probably not crucial to most of us. It's the building blocks around tasks - and in particular around the idea of a "future" - which provide the value. Do you really want to write Parallel.ForEach yourself? Do you want to want to work out how to perform partitioning efficiently yourself? I know that if I tried doing that, it would take me a long time and I'd certainly do a worse job of it than the PFX team.
Many of the advances in development haven't been about making it possible to do something which was impossible before - they've been about raising the abstraction level so that a problem can be solved once and then that solution reused. Do you feel the same way about the CLR itself? You could do the same thing in assembly yourself, obviously... but by raising the abstraction level, the CLR and C# make us more productive.

Although you could do everything equivalently in TPL or threadpool, for a better abstraction, and scalability patterns TPL is preferred over Threadpool. But it is upto the programmer, and if you know exactly what you are doing, and based on your scheduling and synchronization requirements play out in your specific application you could use Threadpool more effectively. There are some stuff you get free with TPL which you've got to code up when using Threadpool, like following few I can think of now.
work stealing
worker thread local pool
scheduling groups of actions like Parallel.For

The TPL lets you program in terms of Tasks not threads. Thinking of tasks solely as a better thread abstraction would be a mistake. Tasks allow you to specify the work you want to get executed not how you want the work executed (threads). This allows you to express the potential parallelism of your application and have the TPL runtime--the scheduler and thread pool--handle how that work gets executed. This means that the TPL will take a lot of the burden off you of having your application deal with ensuring the best perfromance on a wide variety of hardware with different numbers of cores.
For example, the TPL makes it easy to implement key several design patterns that allow you to express the potential parallelism of your application.
http://msdn.microsoft.com/en-us/library/ff963553.aspx
Like Futures (mentioned by Jon) as well as pipelines and parallel loops.

How to define threadsafe?

Threadsafe is a term that is thrown around documentation, however there is seldom an explanation of what it means, especially in a language that is understandable to someone learning threading for the first time.
So how do you explain Threadsafe code to someone new to threading?
My ideas for options are the moment are:
Do you use a list of what makes code
thread safe vs. thread unsafe
The book definition
A useful metaphor

Multithreading leads to non-deterministic execution - You don't know exactly when a certain piece of parallel code is run.
Given that, this wonderful multithreading tutorial defines thread safety like this:
Thread-safe code is code which has no indeterminacy in the face of any multithreading scenario. Thread-safety is achieved primarily with locking, and by reducing the possibilities for interaction between threads.
This means no matter how the threads are run in particular, the behaviour is always well-defined (and therefore free from race conditions).

Eric Lippert says:
When I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?".
It is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent.

G'day,
A good place to start is to have a read of the POSIX paper on thread safety.
Edit: Just the first few paragraphs give you a quick overview of thread safety and re-entrant code.
HTH
cheers,

i maybe wrong but one of the criteria for being thread safe is to use local variables only. Using global variables can have undefined result if the same function is called from different threads.

A thread safe function / object (hereafter referred to as an object) is an object which is designed to support multiple concurrent calls. This can be achieved by serialization of the parallel requests or some sort of support for intertwined calls.
Essentially, if the object safely supports concurrent requests (from multiple threads), it is thread safe. If it is not thread safe, multiple concurrent calls could corrupt its state.
Consider a log book in a hotel. If a person is writing in the book and another person comes along and starts to concurrently write his message, the end result will be a mix of both messages. This can also be demonstrated by several threads writing to an output stream.

I would say to understand thread safe, start with understanding difference between thread safe function and reentrant function.
Please check The difference between thread-safety and re-entrancy for details.

Tread-safe code is code that won't fail because the same data was changed in two places at once. Thread safe is a smaller concept than concurrency-safe, because it presumes that it was in fact two threads of the same program, rather than (say) hardware modifying data, or the OS.

A particularly valuable aspect of the term is that it lies on a spectrum of concurrent behavior, where thread safe is the strongest, interrupt safe is a weaker constraint than thread safe, and reentrant even weaker.
In the case of thread safe, this means that the code in question conforms to a consistent api and makes use of resources such that other code in a different thread (such as another, concurrent instance of itself) will not cause an inconsistency, so long as it also conforms to the same use pattern. the use pattern MUST be specified for any reasonable expectation of thread safety to be had.
The interrupt safe constraint doesn't normally appear in modern userland code, because the operating system does a pretty good job of hiding this, however, in kernel mode this is pretty important. This means that the code will complete successfully, even if an interrupt is triggered during its execution.
The last one, reentrant, is almost guaranteed with all modern languages, in and out of userland, and it just means that a section of code may be entered more than once, even if execution has not yet preceeded out of the code section in older cases. This can happen in the case of recursive function calls, for instance. It's very easy to violate the language provided reentrancy by accessing a shared global state variable in the non-reentrant code.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string