Is there a better way to wait for an async execution to finish in Rust?

Just to explain: I send a command and an id to another thread, the other thread executes the operation and sends the result and id back as a tuple. Because multiple functions might be called at nearly the same time, and these are IO operations, results might come back in a different order than the requests went out. So I have this function to wait for the right return value; it puts everyone else's results into a map (the variable called results) so that the other calling functions can retrieve theirs as well. However, if two functions call this one at the same time, I think they might race to retrieve values from the channel. I just think there should be a more elegant way of doing this.
For now I have this function to wait for an async execution in another thread:
async fn wait_for(&mut self, id: u64) -> (u64, io::Result<SomeResult>) {
    loop {
        // Receive the next completed result from the worker thread.
        let temp = self.channels.1.recv().await.unwrap();
        if temp.0 == id {
            return temp;
        } else {
            // Not ours: stash it so the caller it belongs to can find it.
            self.results.insert(temp.0, temp.1);
        }
        // Another caller may have stashed our result in the meantime.
        if let Some(res) = self.results.remove(&id) {
            return (id, res);
        }
    }
}
P.S. I also had a thought of having another thread do the checking and return the results in order, but that pretty much defeats the purpose of async/await (as far as I know).
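One common alternative (a minimal sketch, not the poster's code, assuming tokio; Command and SomeResult are stand-ins for the real types) is to send a dedicated oneshot::Sender along with each request. The worker replies on that sender, so every caller awaits exactly its own result, and the shared receive loop and results map disappear:

use std::io;
use tokio::sync::{mpsc, oneshot};

struct Command;    // stand-in for the real command type
struct SomeResult; // stand-in for the real result type

struct Request {
    command: Command,
    // The worker answers each request on its own channel.
    reply: oneshot::Sender<io::Result<SomeResult>>,
}

async fn call(tx: &mpsc::Sender<Request>, command: Command) -> io::Result<SomeResult> {
    let (reply, rx) = oneshot::channel();
    tx.send(Request { command, reply }).await.expect("worker stopped");
    // No id bookkeeping: this await resolves only with this request's result.
    rx.await.expect("worker dropped the reply sender")
}

// Worker side, receiving requests and answering each one directly:
//
//     while let Some(req) = rx.recv().await {
//         let result = do_io(req.command).await;
//         let _ = req.reply.send(result);
//     }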

Related

How to leverage concurrency for processing a large array of data in Rust?

I am a JavaScript developer building a desktop application with Tauri. I am used to single-threaded languages and am trying to get comfortable with the concept of concurrency.
My problem can be summarized as follows. I receive a JSON array of length 500 from the backend. I loop through this array and perform some asynchronous operations, like making a network call. At the end, I aggregate, structure, and return the data. This entire process takes around 25-35 seconds on my machine.
I wanted to leverage concurrency to reduce the time required for this operation. One possible solution I thought of was to create n threads, say 8, and process the data in parallel.
async fn main() {
    // create variables which will hold the final structured data
    let app_data;
    let app_to_bundle_map;
    // create 8 threads
    let thread_1 = thread::spawn(|| {});
    let thread_2 = thread::spawn(|| {});
    // .. and so on
    for i in 1..500 {
        let thread_assigned = i % 8;
        match thread_assigned {
            1 => {
                // process_data() on thread 1 and insert into app_data & app_to_bundle_map.
                // But how do I assign process_data() to the thread's closure?
                // How do I make sure thread 1 is available for use?
            }
            2 => {
                // process_data() on thread 2 and insert into app_data & app_to_bundle_map
            }
            _ => {
                // process_data() on thread _ and insert into app_data & app_to_bundle_map
            }
        }
    }
}
async fn process_data(item: String, data: &Data) {
    // perform some heavy operations like
    make_a_network_call(item.url).await;
    // perform more operations and modify the function argument
    more_processing();
}
async fn make_a_network_call(url: String) -> String {
    let client = reqwest::Client::builder().build().unwrap();
    let res: Result<Response, Error> = client.get(url).send().await;
    match res {
        Ok(res) => {
            // return the response body
            res.text().await.unwrap()
        }
        Err(err) => {
            format!(r#"{{"error": "{placeholder}"}}"#, placeholder = err.to_string())
        }
    }
}
Another option I thought of was dividing my data of size 500 into 8 parts and then processing them in parallel. Is this a better approach? Or are both approaches wrong, and if so, what do you suggest is the correct way to solve such problems in Rust? Overall, my final goal is to reduce the time from 25-35 seconds to under 10 seconds. Looking forward to everybody's insights. Thank you in advance.
Concurrency is hard [citation needed]. What you are trying to do here is handle it manually. That could of course work, but it will be a pain, especially if you are a beginner. Luckily there are fantastic libraries out there that handle the concurrency part for you. You should look into whether they already provide what you need. From your description I am not quite sure whether you are CPU-bound or IO-bound.
If you are CPU-bound, look at the rayon crate. It lets you easily iterate in parallel over any iterator, as in the sketch below.
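A minimal sketch of the CPU-bound case (heavy_cpu_work is a hypothetical stand-in for whatever per-item computation you have):

use rayon::prelude::*;

fn heavy_cpu_work(s: &str) -> String {
    s.chars().rev().collect() // stand-in for real CPU-bound work
}

fn process_all(items: Vec<String>) -> Vec<String> {
    items
        .par_iter() // rayon splits the work across a thread pool
        .map(|item| heavy_cpu_work(item))
        .collect()
}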
If you are IO-bound, look at async Rust. There are many libraries that do many things, but I would recommend tokio to begin with. It is production-ready and puts great emphasis on networking. You will, however, need to learn a bit about async Rust, as it requires a different mental model than normal synchronous code.
And regardless of which one you choose, you should familiarize yourself with channels. They are a great and easy tool for passing data around, including from one thread to another.
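For the IO-bound case, a minimal sketch (assuming tokio and reqwest, with the item type simplified to a URL string) that keeps at most 8 requests in flight, similar to the 8-thread idea in the question but without manual thread management:

use futures::stream::{self, StreamExt};

async fn process_all(urls: Vec<String>) -> Vec<String> {
    let client = reqwest::Client::new();
    stream::iter(urls)
        .map(|url| {
            let client = client.clone(); // Client is a cheap handle to clone
            async move {
                match client.get(&url).send().await {
                    Ok(res) => res.text().await.unwrap_or_default(),
                    Err(err) => format!(r#"{{"error": "{}"}}"#, err),
                }
            }
        })
        .buffer_unordered(8) // at most 8 requests in flight at once
        .collect::<Vec<_>>()
        .await
}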

How to understand FusedFuture

Recently, I have been reading the source code in the repo parity-bridges-common. Some .rs files are full of asynchronous syntax that I am not familiar with and find hard to follow, especially around FusedFuture. Example files:
sync_loop.rs
message_loop.rs
According to futures::future::fuse, my understanding is that we can turn a Future into a FusedFuture and then poll it again and again, many times. Besides this, I found a function named terminated() on Fuse and the example below.
Creates a new Fuse-wrapped future which is already terminated.
This can be useful in combination with looping and the select! macro, which bypasses terminated futures.
let (sender, mut stream) = mpsc::unbounded();

// Send a few messages into the stream
sender.unbounded_send(()).unwrap();
sender.unbounded_send(()).unwrap();
drop(sender);

// Use `Fuse::terminated()` to create an already-terminated future
// which may be instantiated later.
// bear: We must use terminated() here in order to use select!?
let foo_printer = Fuse::terminated();
pin_mut!(foo_printer);

loop {
    select! {
        _ = foo_printer => {},
        () = stream.select_next_some() => {
            if !foo_printer.is_terminated() {
                println!("Foo is already being printed!");
            } else {
                // bear: here we reset the future that foo_printer points to?
                foo_printer.set(async {
                    // do some other async operations
                    println!("Printing foo from `foo_printer` future");
                }.fuse());
            }
        },
        complete => break, // `foo_printer` is terminated and the stream is done
    }
}
I cannot understand when to use this function and how to combine it with select!. Can anyone help me understand it more easily? Or are there better docs or examples of this usage?
Some posts I found that may be useful:
Why doesn't tokio::select! require FusedFuture?
What is the difference between futures::select! and tokio::select!?

Polling many futures of different types

I'm trying to understand how to implement polling multiple futures with different types. For context, I'm calling an API that will return something like:
[{"type": "source_a", "id": 123}, {"type": "source_b", "id": 234}, ...]
Each type in the API response requires a call to another API, with each API returning different data types. The code I've written works something like this:
async fn get_data(sources: Vec<Source>) -> Data {
    let mut data = Default::default();
    for source in sources {
        if source.kind == "source_a" {
            let source_data = get_source_a(source).await;
            process_source_a(source_data, &mut data);
        } else if source.kind == "source_b" {
            // ...
        }
    }
    data
}
This won't run concurrently; it will simply fetch sources one at a time and process them. How can I rewrite it so that each source is fetched concurrently and then processed once its data is available? Speaking Rustily, I think what I want is to execute a closure that mutably borrows data when the future is ready. Should I be looking at something like an Arc<RefCell<Data>>?
To process the futures in parallel, you need to await something like join_all, which will run them concurrently and return when they are all done. For this to work, you have to resolve two issues:
join_all requires futures of the same type, so you need to box them or otherwise unify them.
data needs to be accessed by multiple async blocks, so it needs to be protected by Arc and Mutex.
The first issue can be solved simply by spawning the async fns as tasks, which has the added advantage of potentially running them in parallel (in addition to them being run concurrently). The example below uses tokio::spawn, but it would be almost exactly the same for async_std. Since there is no reproducible example, I can't test the code, but it could look like this:
use std::sync::{Arc, Mutex};

async fn get_data(sources: Vec<Source>) -> Data {
    let data = Arc::new(Mutex::new(Data::default()));
    let mut tasks = vec![];
    for source in sources {
        if source.kind == "source_a" {
            let data = Arc::clone(&data);
            tasks.push(tokio::task::spawn(async move {
                let source_data = get_source_a(source).await;
                process_source_a(source_data, &mut data.lock().unwrap());
            }));
        } else if source.kind == "source_b" {
            // ...
        }
    }
    // Wait for all sources to finish and propagate the panic if any.
    // With async_std this wouldn't require the `for_each()`.
    futures::future::join_all(tasks)
        .await
        .into_iter()
        .for_each(|x| x.unwrap());
    // As all tasks are done, there should be no other references to `data`
    // at this point, so we can extract it out of the Arc<Mutex<_>> wrapping.
    Arc::try_unwrap(data).unwrap().into_inner().unwrap()
}
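The boxing route mentioned above also works without spawning or a Mutex. This is a minimal sketch under assumptions (the question's Source, Data, get_source_a/get_source_b, and process_source_a/process_source_b, plus a hypothetical PartialData enum to unify the two result types): fetch everything concurrently on the current task, then fold the results into data sequentially.

use futures::future::{join_all, BoxFuture, FutureExt};

enum PartialData {
    A(SourceAData), // hypothetical per-source result types
    B(SourceBData),
}

async fn get_data(sources: Vec<Source>) -> Data {
    // Unify the different future types behind one boxed trait object.
    let futures: Vec<BoxFuture<'static, PartialData>> = sources
        .into_iter()
        .map(|source| match source.kind.as_str() {
            "source_a" => async move { PartialData::A(get_source_a(source).await) }.boxed(),
            _ => async move { PartialData::B(get_source_b(source).await) }.boxed(),
        })
        .collect();

    // All fetches run concurrently; the mutation of `data` happens
    // afterwards on one thread, so no locking is required.
    let mut data = Data::default();
    for partial in join_all(futures).await {
        match partial {
            PartialData::A(d) => process_source_a(d, &mut data),
            PartialData::B(d) => process_source_b(d, &mut data),
        }
    }
    data
}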

What is the meaning of "await" in Rust?

This question may relate more to async programming in general than to Rust, but after googling a lot, there are still some points I think are missing. Since I am learning Rust, I will put it in a Rust way.
Let me first give my understanding of async programming; after all, this is the basis, and maybe I am wrong:
To make programs run efficiently, handling tasks concurrently is essential. So threads are used, and a thread can be joined whenever the data from it is needed. But threads are not enough to handle many tasks, as a server must. So thread pools are used; but how do you fetch data when it is needed without knowing which thread to wait for? That's where callback functions (cb for short) come in: with a cb, you only need to think about what happens inside the cb. In addition, to reduce CPU overhead, green threads come up.
But what if the async operations need to happen one after another? That leads to "callback hell". So the "future/promise" style comes up, which makes code look like sync code, or like a chain (as in JavaScript). But the code still doesn't look quite nice. Finally, the "async/await" style comes up as another layer of syntactic sugar over the "future/promise" style. And usually, "async/await" combined with green threads is called "coroutines", whether the async tasks run on one native thread or several.
=============================================
As far as I know at this point, the keyword "await" can only be used inside an "async" function, and only "async" functions can be "awaited". But why? And what is it used for, given that there is already "async"? Anyway, I tested the code below:
use async_std::task;

// async fn easy_task() {
//     for i in 0..100 {
//         dbg!(i);
//     }
//     println!("finished easy task");
// }

async fn heavy_task(cnt1: i32, cnt2: i32) {
    for i in 0..cnt1 {
        println!("heavy_task1 cnt:{}", i);
    }
    println!("heavy task: waiting sub task");
    // normal_sub_task(cnt2);
    sub_task(cnt2).await;
    println!("heavy task: sub task finished");
    for i in 0..cnt1 {
        println!("heavy_task2 cnt:{}", i);
    }
    println!("finished heavy task");
}

fn normal_sub_task(cnt: i32) {
    println!("normal sub_task: start sub task");
    for i in 0..cnt {
        println!("normal sub task cnt:{}", i);
    }
    println!("normal sub_task: finished sub task");
}

async fn sub_task(cnt: i32) {
    println!("sub_task: start sub task");
    for i in 0..cnt {
        println!("sub task cnt:{}", i);
    }
    println!("sub_task: finished sub task");
}

fn outer_task(cnt: i32) {
    for i in 0..cnt {
        println!("outer task cnt:{}", i);
    }
    println!("finished outer task");
}

fn main() {
    // let _easy_f = easy_task();
    let heavy_f = heavy_task(3000, 500);
    let handle = task::spawn(heavy_f);
    print!("=================after spawn==============");
    outer_task(5000);
    // task::join_handle(handle);
    task::block_on(handle);
}
The conclusions I got from the test are:
1. Whether I await the async sub_task or just call normal_sub_task (the sync version) in the middle of async heavy_task(), the code below it (the heavy loop task2) never cuts in line.
2. Whether I await the async sub_task or just call normal_sub_task (the sync version) in the middle of async heavy_task(), outer_task sometimes cuts in line, interrupting heavy_task1 or the sub task.
Therefore, what is the meaning of "await"? It seems that only the keyword "async" does anything here.
References:
async_std
the sing/dance example from the Rust async book
the task module in the official Rust docs
a recommended This Week in Rust article about async programming
another article about Rust threads and async programming using the futures crate
Stack Overflow question: What is the purpose of async/await in Rust?
Conclusion 2 seems to contradict what Shepmaster said: "...we felt async functions should run synchronously to the first await."
The await keyword suspends the execution of an asynchronous function until the awaited future (future.await) produces a value. It has the same meaning as in all the other languages that use the await concept.
When a future is awaited, the "state of execution" of the async function is persisted into an internal execution context, and other async functions have the opportunity to progress if they are ready to run. When the awaited future completes, the async function resumes at the exact point of suspension.
If you think you need only async and write something like:
// OK: let result = future.await
let result = future;
you don't get a value, but something that represents a value ready in the future.
And if you mark a function async without awaiting anything inside its body, you are injecting a sequential task into the asynchronous engine: when executed, it will run to completion like a normal function, preventing asynchronous behavior.
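A tiny sketch of that point, using async_std as in the question: calling an async fn does nothing by itself; its body runs only when the returned future is awaited.

use async_std::task;

async fn compute() -> i32 {
    println!("computing...");
    42
}

fn main() {
    task::block_on(async {
        let fut = compute();    // nothing printed yet: just a future
        let result = fut.await; // now "computing..." prints
        println!("result = {result}");
    });
}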
Some more comments about your code
The confusion probably arises from a misunderstanding of the task concept. When learning async Rust I found the async book pretty useful. The book defines tasks as:
Tasks are the top-level futures that have been submitted to an executor
heavy_task is really the only task in your example, because it is the only future submitted to the async runtime with task::block_on.
For example, the function outer_task has nothing to do with the asynchronous world: it is not a task, and it gets executed immediately when called. heavy_task behaves asynchronously and awaits the sub_task(cnt2) future, but the sub_task future, once executed, runs immediately to completion.
So, as it stands, your code behaves practically sequentially. But keep in mind that things in reality are more subtle: in the presence of other async tasks, the await inside heavy_task works as a suspension point and gives other tasks the opportunity to make progress toward completion.
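To make that last point concrete, here is a minimal sketch (using async_std as in the question; worker is a hypothetical helper): two spawned tasks run concurrently, and each .await gives the executor a chance to switch between them.

use async_std::task;
use std::time::Duration;

async fn worker(name: &'static str) {
    for i in 0..3 {
        println!("{name}: step {i}");
        // Awaiting here suspends this task and lets the other one progress.
        task::sleep(Duration::from_millis(10)).await;
    }
}

fn main() {
    task::block_on(async {
        let a = task::spawn(worker("task A"));
        let b = task::spawn(worker("task B"));
        a.await;
        b.await;
    });
}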

Function/Code Design with Concurrency in Swift

I'm trying to create my first app in Swift which involves making multiple requests to a website. These requests are each done using the block
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in ... }
task.resume()
From what I understand, this block runs on a thread different from the main thread.
My question is: what is the best way to design code that relies on the values produced in that block? For instance, the ideal design (not possible, however, because the thread executing these blocks is not the main thread) would be:
func prepareEmails() {
    var names = getNames()
    var emails = getEmails()
    ...
    sendEmails()
}

func getNames() -> NSArray {
    var names: NSArray! = nil
    ....
    var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
        names = ...
    })
    task.resume()
    return names
}

func getEmails() -> NSArray {
    var emails: NSArray! = nil
    ....
    var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
        emails = ...
    })
    task.resume()
    return emails
}
However, in the above design, getNames() and getEmails() will most likely return nil, as the task will not have updated emails/names by the time the function returns.
The alternative design (which I currently implement) effectively removes the prepareEmails function and does everything sequentially inside the task completion handlers:
func prepareEmails() {
    getNames()
}

func getNames() {
    ...
    var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
        getEmails(names)
    })
    task.resume()
}

func getEmails(names: NSArray) {
    ...
    var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
        sendEmails(emails, names)
    })
    task.resume()
}
Is there a more effective design than the latter? This is my first experience with concurrency, so any advice would be greatly appreciated.
The typical pattern when calling an asynchronous method that has a completionHandler parameter is to use the completionHandler closure pattern, yourself. So the methods don't return anything, but rather call a closure with the returned information as a parameter:
func getNames(completionHandler: (NSArray!) -> ()) {
    ....
    let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, response, error -> Void in
        let names = ...
        completionHandler(names)
    }
    task.resume()
}

func getEmails(completionHandler: (NSArray!) -> ()) {
    ....
    let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, response, error -> Void in
        let emails = ...
        completionHandler(emails)
    }
    task.resume()
}
Then, if you need to perform these sequentially, as suggested by your code sample (i.e. if the retrieval of emails was dependent upon the names returned by getNames), you could do something like:
func prepareEmails() {
    getNames() { names in
        getEmails() { emails in
            sendEmails(names, emails) // I'm assuming the names and emails are the input to this method
        }
    }
}
Or, if they can run concurrently, then you should do so, as it will be faster. The trick is how to make a third task dependent upon two other asynchronous tasks. The two traditional alternatives include
Wrapping each of these asynchronous tasks in its own asynchronous NSOperation, and then create a third task dependent upon those other two operations. This is probably beyond the scope of the question, but you can refer to the Operation Queue section of the Concurrency Programming Guide or see the Asynchronous vs Synchronous Operations and Subclassing Notes sections of the NSOperation Class Reference.
Use dispatch groups, entering the group before each request, leaving the group within the completion handler of each request, and then adding a dispatch group notification block (called when all of the group "enter" calls are matched by their corresponding "leave" calls):
func prepareEmails() {
    let group = dispatch_group_create()

    var emails: NSArray!
    var names: NSArray!

    dispatch_group_enter(group)
    getNames() { results in
        names = results
        dispatch_group_leave(group)
    }

    dispatch_group_enter(group)
    getEmails() { results in
        emails = results
        dispatch_group_leave(group)
    }

    dispatch_group_notify(group, dispatch_get_main_queue()) {
        if names != nil && emails != nil {
            self.sendEmails(names, emails)
        } else {
            // one or both of those requests failed; tell the user
        }
    }
}
Frankly, if there's any way to retrieve both the emails and names in a single network request, that's going to be far more efficient. But if you're stuck with two separate requests, you could do something like the above.
Note, I wouldn't generally use NSArray in my Swift code, but rather use an array of String objects (e.g. [String]). Furthermore, I'd put in error handling where I return the nature of the error if either of these fail. But hopefully this illustrates the concepts involved in (a) writing your own methods with completionHandler blocks; and (b) invoking a third bit of code dependent upon the completion of two other asynchronous tasks.
The answers above (particularly Rob's dispatch-group-based answer) describe the concurrency concepts needed to run two tasks in parallel and then respond to the result. They omit error handling for clarity, because traditionally, correct solutions to concurrency problems are quite verbose.
Not so with HoneyBee.
HoneyBee.start()
    .setErrorHandler(handleErrorFunc)
    .branch {
        $0.chain(getNames)
        +
        $0.chain(getEmails)
    }
    .chain(sendEmails)
This code snippet manages all of the concurrency, routes all errors to handleErrorFunc and looks like the concurrent pattern that is desired.
