I'm working on a project that involves computationally intensive tasks that I want to run in parallel. To do so, I'm using multiple async calls to run the tasks and awaitAll to wait until all of them have completed.
suspend fun tasks() {
    coroutineScope {
        val result = List(10) {
            async {
                // do stuff
            }
        }.awaitAll()
    }
}
My question is how to bridge between this code that is run in parallel and regular synchronous code.
I tried to use runBlocking, but that seems to run all the async tasks one after another, defeating the whole purpose of using coroutines. The only way I got it to work was to use suspend functions all the way up to the main function, but that is not suitable in my case, as I rely on third-party libraries to call my code from regular functions.
Is there a way to call suspend functions from regular functions while still maintaining their ability to run in parallel?
The reason the subtasks were running sequentially is that, by default, runBlocking() uses the thread that invoked it to create a single-threaded dispatcher and runs all coroutines on it. If your coroutines never suspend, they will be executed one after another.
In order to use another dispatcher, we simply need to pass it to runBlocking():
runBlocking(Dispatchers.Default) { ... }
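For example, the bridge for the code above could look like this; it is just a sketch, and compute(i) is a hypothetical placeholder for the real CPU-bound work, not something from your code:

import kotlinx.coroutines.*

fun compute(i: Int): Int = i * i  // stands in for "do stuff"

// Callable from regular, non-suspending code; the async tasks run in parallel
// on the threads of Dispatchers.Default instead of the single calling thread.
fun tasksBlocking(): List<Int> = runBlocking(Dispatchers.Default) {
    List(10) { i ->
        async { compute(i) }
    }.awaitAll()
}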
However, if you aren't using coroutines already and don't plan to suspend, but only perform CPU-intensive tasks, then I'm not sure you can benefit from coroutines at all. Coroutines are mostly for performing tasks that often have to wait for something, so we can utilize their suspend and resume feature. They are useful if we need to fork and join frequently. But if we simply need to parallelize CPU computation, then the classic approach with executors will do.
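For completeness, a plain-executor sketch of the same fan-out, reusing the hypothetical compute(i) placeholder from the sketch above:

import java.util.concurrent.Callable
import java.util.concurrent.Executors

fun tasksWithExecutor(): List<Int> {
    val pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())
    try {
        return (0 until 10)
            .map { i -> pool.submit(Callable { compute(i) }) }  // fork
            .map { it.get() }                                   // join (blocks)
    } finally {
        pool.shutdown()
    }
}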
Also, when bridging using runBlocking(), be careful not to invoke it from inside a coroutine. We can't block the thread while running inside a coroutine context; doing so could cause serious performance problems or even a deadlock. If you invoke runBlocking() and then, somewhere deep in the call stack inside it, you invoke one of your bridges again, you will run into problems.
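A sketch of the shape to avoid, with hypothetical names:

import kotlinx.coroutines.*

fun bridge(): Int = runBlocking(Dispatchers.Default) {  // fine when called from plain blocking code
    doWork()
}

suspend fun doWork(): Int {
    // ...deep in the call stack, calling bridge() again would block a dispatcher
    // thread from inside a coroutine and can starve or deadlock the dispatcher
    return 42
}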
In Scala, as explained in the PR that introduced it, parasitic allows to steal execution time from other threads by having its Runnables run on the Thread which calls execute and then yielding back control to the caller after all its Runnables have been executed.
It appears to be a neat trick to avoid context switches when:
you are doing a trivial operation following on a Future coming from an actually long running operation, or
you are working with an API that doesn't allow you to specify an ExecutionContext for a Future, but you would like to make sure the operation continues on that same thread, without introducing a different thread pool
The PR that originally introduced parasitic further explains that
When using parasitic with abstractions such as Future it will in many cases be non-deterministic as to which Thread will be executing the logic, as it depends on when/if that Future is completed.
This concept is also repeated in the official Scala documentation in the paragraphs about Synchronous Execution Contexts:
One might be tempted to have an ExecutionContext that runs computations within the current thread:
val currentThreadExecutionContext = ExecutionContext.fromExecutor(
  new Executor {
    // Do not do this!
    def execute(runnable: Runnable) { runnable.run() }
  })
This should be avoided as it introduces non-determinism in the execution of your future.
Future {
  doSomething
}(ExecutionContext.global).map {
  doSomethingElse
}(currentThreadExecutionContext)
The doSomethingElse call might either execute in doSomething’s thread or in the main thread, and therefore be either asynchronous or synchronous. As explained here a callback should not be both.
I have a couple of doubts:
how is parasitic different from the synchronous execution context in the Scala documentation?
what is the source of non-determinism mentioned in both sources? From the comment in the PR that introduced parasitic, it sounds like if doSomething completes very quickly it may return control to the main thread, and you may actually end up running doSomethingElse not on a global thread but on the main one. That's what I could make of it, but I would like to confirm.
is there a reliable way to have a computation run on the same thread as its preceding task? I guess using a lazy implementation of the Future abstraction (e.g. IO in Cats) could make this easier and more reliable, but I'd like to understand if this is possible at all using the Future from the standard library.
parasitic has an upper bound on stack recursion, to try to mitigate the risk of StackOverflowErrors due to nested submissions, and can instead defer Runnables to a queue.
The source of non-determinism is: if the Future is not yet completed, the callback is registered to run on the completing thread; if the Future is already completed, it will run on the registering thread. Since which of those two situations occurs depends on timing, it is not deterministic which thread will execute the code.
How do you know A) which Thread that is, and B) whether it would ever be able to execute another task again?
I find that it is easier to think about Futures as read-handles for values that may or may not exist at a specific point in time. That nicely untangles the notion of Threads from the picture, and now it is rather about: When a value is available, I want to do X—and here is the ExecutionContext that will do X.
I have read about Kotlin coroutines recently, and now I wonder what the difference is between the AsyncTask class, multi-threaded programming, and coroutines. In what situations should I use each one?
AsyncTask is an abstract class and must be subclassed. An AsyncTask has 4 steps: onPreExecute, doInBackground, onProgressUpdate and onPostExecute.
They are executed serially on a single background thread.
If you want to fetch a URL or perform a heavyweight computation in Android, you have to use async programming.
They can be used when there is a small task that needs to communicate with the main thread,
or for tasks that use multiple instances in parallel.
Thread is a concurrent unit of execution. It has its own call stack.
With threads, the operating system switches running threads preemptively according to its scheduler.
They can be used for tasks that run in parallel by using multiple threads,
or for tasks where you want to control CPU usage relative to the GUI thread.
Coroutines are used to write asynchronous code that looks like normal sequential code.
They can provide a very high level of concurrency with very little overhead.
They are simple to read; unlike threads they are lightweight, and unlike AsyncTask, a lot of them can run at the same time.
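To illustrate the lightweight point, a minimal sketch using kotlinx.coroutines: launching this many platform threads would exhaust memory, but 100,000 coroutines finish without trouble because a suspended coroutine does not hold a thread.

import kotlinx.coroutines.*

fun main() = runBlocking {
    val jobs = List(100_000) {
        launch {
            delay(1_000)  // suspends; the thread is free to run other coroutines
        }
    }
    jobs.joinAll()
    println("All 100,000 coroutines finished")
}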
AsyncTask was the first solution proposed by Google in the Android SDK for processing work in the background while keeping the main thread free from many complex operations. In fact, AsyncTask lets you do complex processing in an asynchronous manner. Compared to a classical Java Thread, AsyncTask was somewhat specialized, providing UI wrappers around threads in order to allow a more enjoyable experience as a developer coding in an async way. The AsyncTask class has since been deprecated, and the recommended way to solve things is by using coroutines.
Coroutines are not a new concept introduced by Kotlin; in fact, the concept exists in a lot of programming languages (Go has Goroutines and Java will provide something called Fibers). The main advantage of using coroutines is the simplicity of the code: the only thing that differentiates a sync function from an async function is the suspend keyword put in front of the function.
For example, the following function is executed in a synchronous way:
fun doSomething() = println("Print something")
while the following one is executed in an asynchronous way, due to the usage of the suspend keyword:
suspend fun doSomething() = println("Print something")
When a suspend function suspends, the thread is not blocked there; it can go on running other code, and the suspended function receives a Continuation through which it delivers the computed value once that value is available.
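A small sketch of that behaviour (fetchValue is an illustrative name, not a real API): while the first coroutine is suspended in delay(), the thread is free to start the second one, so both finish in roughly the time of one.

import kotlinx.coroutines.*

suspend fun fetchValue(id: Int): Int {
    delay(100)       // suspends the coroutine without blocking the thread
    return id * 10
}

fun main() = runBlocking {
    val a = async { fetchValue(1) }
    val b = async { fetchValue(2) }
    println(a.await() + b.await())  // prints 30 after roughly 100 ms, not 200 ms
}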
According to the documentation, Dart is single-threaded, but to perform two operations at a time we use Future objects, which work similarly to threads.
Use Future objects (futures) to perform asynchronous operations.
If Dart is single-threaded, then why does it allow us to perform asynchronous operations?
Note: Asynchronous operations are parallel operations which are called threads
You mentioned that :
Asynchronous operations are parallel operations which are called threads
First of all, asynchronous operations are not exactly parallel or even concurrent. It simply means that we do not want to block our flow of execution (thread) or wait for the response until certain work is done. But the way we implement asynchronous operations decides whether it is parallel or concurrent.
Parallelism vs Concurrency?
Parallelism is actually doing lots of things simultaneously at the same time. For example: you are walking and at the same time you're digesting your food. Both tasks are running completely in parallel and at exactly the same time.
While
Concurrency is the illusion of parallelism. Tasks seem to be executed in parallel, but they aren't. It's like handling lots of things at a time but only doing one task at a specific time. For example: you are walking and suddenly stop to tie your shoe lace. After tying your shoe lace you start walking again.
Now coming to Dart: Future objects along with the async and await keywords are used to perform asynchronous tasks. Here asynchronous doesn't mean that tasks will be executed parallel or concurrent to each other. Instead, in Dart even an asynchronous task is executed on the same thread, which means that while we wait for another task to be completed, we will continue executing our synchronous code. Future objects are used to represent the result of a task which will be done at some time in the future.
If you want to really execute your tasks concurrently, then consider using Isolates, which run in a separate thread and don't share memory with the main (spawning) thread.
Why? Because it is a necessity. Some operations, like http requests or timers, are asynchronous in nature.
There are isolates which allow you to execute code in a different process. The difference to threads in other programming languages is that isolates do not share memory with each other (which would lead to concurrency issues), they only communicate through messages.
To receive these messages (or wrapped in a Future, the result of it), Dart uses an event loop.
The Event Loop and Dart
Are Futures in Dart threads?
Dart is single-threaded, but it can call native code (like C/C++) to perform asynchronous operations, which can introduce new threads.
In Flutter, the Flutter engine is implemented in C++, which provides the low-level implementation of Flutter's core API, including asynchronous tasks like file and network I/O through new threads underneath.
Like Dart, JavaScript is also single-threaded. I find this video very helpful for understanding the "single threaded" thing: What the heck is the event loop?
Here are a few notes:
Asynchronous doesn't mean multi-threaded. It means the code is not run at the same time. Usually asynchronous just means that it is scheduled to be run on the same thread (Isolate) after other tasks have finished.
Dart isn't actually single threaded. You can create another thread by creating another Isolate. However, within an Isolate the Dart code runs on a single thread and separate Isolates don't share memory. They can only communicate by messages.
A Future says that a value (or an error) will be returned at some point in the future. It doesn't say which thread the work is done on. Most futures are done on the current Isolate, but some futures (IO, for example) can be done on separate threads.
See this answer for links to more resources.
I have an article explaining this: https://medium.com/@truongsinh/flutter-dart-async-concurrency-demystify-1cc739aaae57
In short, Flutter/Dart is not technically single-threaded, even though Dart code is executed in a single thread. Dart is a concurrent language with message passing pattern, that can take full advantage of modern multi-core architecture, without worrying about lock or mutex. Blocking in Dart can be either I/O-bound or CPU-bound, which should be solved, respectively, by Future and Dart’s Isolate/Flutter’s compute.
Would it make sense to use both asyncio and threading in the same Python project, so that code runs in different threads and, in some of them, asyncio is used to get sequential-looking code for asynchronous activities?
Or would trying to do this mean that I am missing some basic concept about the usage of either threading or asyncio?
I didn't fully understand what you're asking (the part about "sequential-looking code for asynchronous activities"), but since there are no answers yet, I'll write some thoughts.
Let's talk about why we need asyncio/threads at all. Imagine we have a task to make two requests.
If we use plain single-threaded non-async code, our only option is to make the request for one URL and, only after it's done, for the other:
request(url1)
request(url2)
The problem here is that we do the job inefficiently: each function spends most of its execution time doing nothing, just waiting for network results. It would be cool if we were somehow able to use the CPU for the second request while the first one is stuck with network stuff and doesn't need it.
This problem can be solved (and usually is solved) by running the functions in different threads:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=2) as e:
    e.submit(request, url1)
    e.submit(request, url2)
We would get the results faster this way. While the first request is stuck with the network, the CPU is able to do something useful for the second request in another thread.
This is, however, not an ideal solution: switching between threads has some cost, and the execution flow is more complex than in the first example.
There should be a better way.
Using one function's idle period to start executing another function is, in general, what asyncio is about:
await asyncio.gather(
    async_request(url1),
    async_request(url2),
)
The event loop manages the execution flow: when the first coroutine reaches some I/O operation and the CPU can be used to do work elsewhere, the second coroutine starts. Later the event loop returns to resume executing the first coroutine.
We get "parallel" requests and clean understandable code. Since we have parallelization in single thread, we just don't need another.
Actually, when we use asyncio, threads can still be useful. If we're ready to pay for them, they can help us turn synchronous I/O functions into asynchronous ones very quickly:
import asyncio

async def async_request(url):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, request, url)
But again, it's optional, and we can usually find a module that makes requests (and other I/O tasks) asynchronous without threads.
I haven't come across any other cases where threads are useful in asynchronous programs.
Sure it may make sense.
Asynchronous code in principle runs a bunch of routines in the same thread.
This means that the moment one routine has to wait for input or output (I/O), it will halt that routine temporarily and simply start processing another routine, until it encounters a wait there, and so on.
Multi-threaded (or "parallelized") code in principle runs at the same time on different cores of your machine. (Note that in Python parallel processing is achieved by using multiple processes, as pointed out by @Yassine Faris below.)
It may make perfect sense to use both in the same program. Use asyncio in order to keep processing while waiting for I/O. Use multi-threading (multiprocessing in Python) to do, for example, heavy calculations in parallel in another part of your program.
Does an asynchronous call always create a new thread? What is the difference between the two?
Does an asynchronous call always create or use a new thread?
Wikipedia says:
In computer programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
I know async calls can be done on a single thread. How is this possible?
Whenever the operation that needs to happen asynchronously does not require the CPU to do work, that operation can be done without spawning another thread. For example, if the async operation is I/O, the CPU does not have to wait for the I/O to complete. It just needs to start the operation, and can then move on to other work while the I/O hardware (disk controller, network interface, etc.) does the I/O work. The hardware lets the CPU know when it's finished by interrupting the CPU, and the OS then delivers the event to your application.
Frequently higher-level abstractions and APIs don't expose the underlying asynchronous API's available from the OS and the underlying hardware. In those cases it's usually easier to create threads to do asynchronous operations, even if the spawned thread is just waiting on an I/O operation.
If the asynchronous operation requires the CPU to do work, then generally that operation has to happen in another thread in order for it to be truly asynchronous. Even then, it will really only be asynchronous if there is more than one execution unit.
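A schematic sketch of the single-threaded case, in Kotlin (EventLoop and asyncRead are made-up names, not a real API): the "asynchronous call" just records a callback, and a loop running on the same thread executes it later, once the result has been delivered; no second thread is involved.

class EventLoop {
    private val ready = ArrayDeque<() -> Unit>()

    fun post(task: () -> Unit) { ready.addLast(task) }

    fun run() {
        while (ready.isNotEmpty()) ready.removeFirst().invoke()
    }
}

fun asyncRead(loop: EventLoop, onDone: (String) -> Unit) {
    // A real system would queue the callback when the OS/hardware reports the
    // I/O as complete; here we queue it immediately to keep the sketch short.
    loop.post { onDone("data") }
}

fun main() {
    val loop = EventLoop()
    asyncRead(loop) { println("got $it") }  // returns at once, nothing read yet
    println("asyncRead returned; still on the same thread")
    loop.run()  // the callback runs here, on the very same thread
}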
This question is darn near too general to answer.
In the general case, an asynchronous call does not necessarily create a new thread. That's one way to implement it, with a pre-existing thread pool or external process being other ways. It depends heavily on language, object model (if any), and run time environment.
Asynchronous just means the calling thread doesn't sit and wait for the response, nor does the asynchronous activity happen in the calling thread.
Beyond that, you're going to need to get more specific.
No, asynchronous calls do not always involve threads.
They typically do start some sort of operation which continues in parallel with the caller. But that operation might be handled by another process, by the OS, by other hardware (like a disk controller), by some other computer on the network, or by a human being. Threads aren't the only way to get things done in parallel.
JavaScript is single-threaded and asynchronous. When you use XmlHttpRequest, for example, you provide it with a callback function that will be executed asynchronously when the response returns.
John Resig has a good explanation of the related issue of how timers work in JavaScript.
Multi-threading refers to more than one operation happening in the same process, while async programming spreads across processes. For example, if my operation calls a web service, the thread need not wait till the web service returns. Here we use async programming, which allows the thread not to wait for a process on another machine to complete. And when it starts getting the response from the web service, it can interrupt the main thread to say that the web service has completed processing the request. Now the main thread can process the result.
Windows has always had asynchronous processing, since the non-preemptive times (versions 2.13, 3.0, 3.1, etc.), using the message loop, way before supporting real threads. So to answer your question: no, it is not necessary to create a thread to perform asynchronous processing.
Asynchronous calls don't even need to occur on the same system/device as the one invoking the call. So if the question is, does an asynchronous call require a thread in the current process, the answer is no. However, there must be a thread of execution somewhere processing the asynchronous request.
Thread of execution is a vague term. In a cooperative tasking systems such as the early Macintosh and Windows OS'es, the thread of execution could simply be the same process that made the request running another stack, instruction pointer, etc... However, when people generally talk about asynchronous calls, they typically mean calls that are handled by another thread if it is intra-process (i.e. within the same process) or by another process if it is inter-process.
Note that inter-process (or interprocess) communication (IPC) is commonly generalized to include intra-process communication, since the techniques for locking, and synchronizing data are usually the same regardless of what process the separate threads of execution run in.
Some systems allow you to take advantage of the concurrency in the kernel for some facilities using callbacks. For a rather obscure instance, asynchronous I/O callbacks were used to implement non-blocking internet servers back in the non-preemptive multitasking days of Mac System 6-8.
This way you have concurrent execution streams "in" your program without threads as such.
Asynchronous just means that you don't block your program waiting for something (function call, device, etc.) to finish. It can be implemented in a separate thread, but it is also common to use a dedicated thread for synchronous tasks and communicate via some kind of event system and thus achieve asynchronous-like behavior.
There are examples of single-threaded asynchronous programs. Something like:
...do something
...send some async request
while (not done)
...do something else
...do async check for results
The nature of asynchronous calls is such that, if you want the application to continue running while the call is in progress, you will either need to spawn a new thread, or at least utilise another thread that you have created solely for the purpose of handling asynchronous callbacks.
Sometimes, depending on the situation, you may want to invoke an asynchronous method but make it appear to the user to be synchronous (i.e. block until the asynchronous method has signalled that it is complete). This can be achieved through Win32 APIs such as WaitForSingleObject.