When should you use tokio::join!() over tokio::spawn()? - rust

Let's say I want to download two web pages concurrently with Tokio...
Either I could implement this with tokio::spawn():
async fn v1() {
let t1 = tokio::spawn(reqwest::get("https://example.com"));
let t2 = tokio::spawn(reqwest::get("https://example.org"));
let (r1, r2) = (t1.await.unwrap(), t2.await.unwrap());
println!("example.com = {}", r1.unwrap().status());
println!("example.org = {}", r2.unwrap().status());
}
Or I could implement this with tokio::join!():
async fn v2() {
let t1 = reqwest::get("https://example.com");
let t2 = reqwest::get("https://example.org");
let (r1, r2) = tokio::join!(t1, t2);
println!("example.com = {}", r1.unwrap().status());
println!("example.org = {}", r2.unwrap().status());
}
In both cases, the two requests are happening concurrently. However, in the second case, the two requests are running in the same task and therefore on the same thread.
So, my questions are:
Is there an advantage to tokio::join!() over tokio::spawn()?
If so, in which scenarios? (it doesn't have to do anything with downloading web pages)
I'm guessing there's a very small overhead to spawning a new task, but is that it?

The difference will depend on how you have configured the runtime. tokio::join! runs the futures concurrently within the current task, while tokio::spawn creates a new task for each future.
In a single-threaded runtime, these are effectively the same. In a multi-threaded runtime, using tokio::spawn twice like that may use two separate threads.
From the docs for tokio::join!:
By running all async expressions on the current task, the expressions are able to run concurrently but not in parallel. This means all expressions are run on the same thread and if one branch blocks the thread, all other expressions will be unable to continue. If parallelism is required, spawn each async expression using tokio::spawn and pass the join handle to join!.
For IO-bound tasks, like downloading web pages, you aren't going to notice the difference; most of the time will be spent waiting for packets and each task can efficiently interleave their processing.
Use tokio::spawn when tasks are more CPU-bound and could block each other.
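The "same task, same thread" behaviour quoted from the docs can be modelled without Tokio. What follows is a toy sketch using only the standard library, not Tokio's implementation: `block_on`, `join2`, `yield_once`, `worker` and `interleaving` are hypothetical names, and the executor simply busy-polls. It shows that a join polls each future in turn from one task, so the futures only interleave at their `.await` points, and that they may borrow local data (here `log`) because nothing is moved to another task.

```rust
use std::future::{poll_fn, Future};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for a busy-polling toy executor.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls one future on the current thread until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns `Pending` once, then `Ready`: a stand-in for waiting on I/O.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|_cx| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            Poll::Pending
        }
    })
    .await
}

/// Toy two-way join: both futures are polled, in order, by the same task.
async fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = Box::pin(a);
    let mut b = Box::pin(b);
    let mut ra: Option<A::Output> = None;
    let mut rb: Option<B::Output> = None;
    poll_fn(move |cx| {
        // Only poll a future that has not finished yet.
        if ra.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(cx) {
                ra = Some(v);
            }
        }
        if rb.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(cx) {
                rb = Some(v);
            }
        }
        if ra.is_some() && rb.is_some() {
            Poll::Ready((ra.take().unwrap(), rb.take().unwrap()))
        } else {
            Poll::Pending
        }
    })
    .await
}

/// Records two steps, yielding after each one. Borrows `log`, no `'static` needed.
async fn worker(name: &'static str, log: &Mutex<Vec<String>>) {
    for step in 0..2 {
        log.lock().unwrap().push(format!("{name}{step}"));
        yield_once().await;
    }
}

/// Runs two workers through the toy join and returns the order of their steps.
fn interleaving() -> Vec<String> {
    let log = Mutex::new(Vec::new());
    block_on(join2(worker("a", &log), worker("b", &log)));
    log.into_inner().unwrap()
}
```

Calling `interleaving()` yields the steps in the order a0, b0, a1, b1: each worker runs until it yields, then the other gets its turn. If one worker never yielded (a blocking or CPU-heavy loop), the other could not make progress, which is the situation where spawning separate tasks helps.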

I would typically look at this from the other angle: why would I use tokio::spawn over tokio::join!? Spawning a new task has more constraints than joining two futures. The 'static requirement can be very annoying, and as such spawning is not my go-to choice.
In addition to the cost of spawning the task, which I would guess is fairly marginal, there is also the cost of signaling the original task when it's done. That I would also guess is marginal, but you'd have to measure both in your environment and async workloads to see whether they actually have an impact.
But you're right, the biggest boon to using two tasks is that they have the opportunity to work in parallel, not just concurrently. On the other hand, async is most suited to I/O-bound workloads where there is lots of waiting, and depending on your workload it is probably unlikely that this lack of parallelism would have much impact.
All in all, tokio::join! is nicer and more flexible to use, and I doubt the technical difference would make an impact on performance. But as always: measure!

@kmdreko's answer was great and I'd like to add some details to it!
As mentioned, using tokio::spawn has a 'static requirement, so the following snippet doesn't compile:
async fn v1() {
let url = String::from("https://example.com");
let t1 = tokio::spawn(reqwest::get(&url)); // `url` does not live long enough
let t2 = tokio::spawn(reqwest::get(&url));
let (r1, r2) = (t1.await.unwrap(), t2.await.unwrap());
}
However, the equivalent snippet with tokio::join! does compile:
async fn v2() {
let url = String::from("https://example.com");
let t1 = reqwest::get(&url);
let t2 = reqwest::get(&url);
let (r1, r2) = tokio::join!(t1, t2);
}
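The same ownership rule exists for OS threads in the standard library, which makes it easy to experiment with outside a runtime. The sketch below uses threads as a stand-in for Tokio tasks, and `measure`, `with_spawn` and `with_scope` are hypothetical names. `std::thread::spawn` needs `'static` data like `tokio::spawn`, so the usual workaround is to give each unit of work its own owned copy. Scoped threads may borrow, much as the futures passed to `tokio::join!` may. The analogy covers borrowing only: scoped threads run in parallel, whereas `join!` stays on one task.

```rust
use std::thread;

/// Stand-in for real work that only needs to read the shared string.
fn measure(url: &str) -> usize {
    url.len()
}

/// Like `tokio::spawn`: the closure must be `'static`, so each thread
/// receives its own owned copy of the string instead of a reference.
fn with_spawn(url: &str) -> (usize, usize) {
    let (u1, u2) = (url.to_owned(), url.to_owned());
    let t1 = thread::spawn(move || measure(&u1));
    let t2 = thread::spawn(move || measure(&u2));
    (t1.join().unwrap(), t2.join().unwrap())
}

/// Borrowing works here because the scope guarantees both threads
/// finish before `url` can go out of scope.
fn with_scope(url: &str) -> (usize, usize) {
    thread::scope(|s| {
        let t1 = s.spawn(|| measure(url));
        let t2 = s.spawn(|| measure(url));
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Applied to the non-compiling snippet above, the equivalent change would be to pass an owned `String` (for example a clone of `url`) into each spawned task rather than `&url`.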
Also, that answer got me curious about the cost of spawning a new task so I wrote the following simple benchmark:
use std::time::Instant;
#[tokio::main]
async fn main() {
let now = Instant::now();
for _ in 0..100_000 {
v1().await;
}
println!("tokio::spawn = {:?}", now.elapsed());
let now = Instant::now();
for _ in 0..100_000 {
v2().await;
}
println!("tokio::join! = {:?}", now.elapsed());
}
async fn v1() {
let t1 = tokio::spawn(do_nothing());
let t2 = tokio::spawn(do_nothing());
t1.await.unwrap();
t2.await.unwrap();
}
async fn v2() {
let t1 = do_nothing();
let t2 = do_nothing();
tokio::join!(t1, t2);
}
async fn do_nothing() {}
In release mode, I get the following output on my macOS laptop:
tokio::spawn = 862.155882ms
tokio::join! = 369.603µs
EDIT: This benchmark is flawed in many ways (see comments), so don't rely on it for the specific numbers. However, the conclusion that spawning is more expensive than joining 2 tasks seems to be true.

Related

How to terminate a blocking tokio task?

In my application I have a blocking task that synchronously reads messages from a queue and feeds them to a running task.
All of this works fine, but the problem that I'm having is that the process does not terminate correctly, since the queue_reader task does not stop.
I've constructed a small example based on the tokio documentation at: https://docs.rs/tokio/1.20.1/tokio/task/fn.spawn_blocking.html
use tokio::sync::mpsc;
use tokio::task;
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
loop {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
println!("Finalizing thread");
queue_reader.abort(); // This doesn't seem to terminate the queue_reader task
queue_reader.await.unwrap(); // <-- The process hangs on this task.
println!("Done");
}
At first I expected queue_reader.abort() to terminate the task, but it doesn't. My understanding is that Tokio can only abort tasks that use .await internally, because that is where they hand control back to Tokio. Is this right?
In order to terminate the queue_reader task I introduced a oneshot channel, over which I signal the termination, as shown in the next snippet.
use tokio::task;
use tokio::sync::{oneshot, mpsc};
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// A new channel to communicate when the process must finish.
let (term_tx, mut term_rx) = oneshot::channel();
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
// As long as termination is not signalled
while term_rx.try_recv().is_err() {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
// Signal termination
term_tx.send(()).unwrap();
println!("Finalizing thread");
queue_reader.await.unwrap();
println!("Done");
}
My question is, is this the canonical/best way to do this, or are there better alternatives?
Tokio cannot terminate CPU-bound/blocking tasks.
It is technically possible to kill OS threads, but generally it is not a good idea, as it's expensive to create new threads and it can leave your program in an invalid state. Even if Tokio decided this was something worth implementing, it would severely limit its implementation: it would be forced into a multithreaded model, just to support the possibility that you'd want to kill a blocking task before it's finished.
Your solution is pretty good; give your blocking task the responsibility for terminating itself and provide a way to tell it to do so. If this future was part of a library, you could abstract the mechanism away by returning a "handle" to the task that had a cancel() method.
Are there better alternatives? Maybe, but that would depend on other factors. Your solution is good and easily extended, for example if you later needed to send different types of signal to the task.
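The "handle with a cancel() method" idea could look roughly like this. It is a minimal sketch using std::thread and an AtomicBool in place of spawn_blocking and the oneshot channel, and `Worker` and `spawn_queue_reader` are hypothetical names. One assumption worth calling out: the worker uses a non-blocking `try_send`, because a blocking send on a full channel would never get back to checking the stop flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, TrySendError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle that owns a blocking worker and can ask it to stop.
struct Worker {
    stop: Arc<AtomicBool>,
    thread: JoinHandle<u64>,
}

impl Worker {
    /// Requests cancellation, waits for the worker to exit, and returns
    /// how many messages it managed to send.
    fn cancel(self) -> u64 {
        self.stop.store(true, Ordering::Relaxed);
        self.thread.join().unwrap()
    }
}

/// Starts the worker and returns its handle plus the receiving end.
fn spawn_queue_reader() -> (Worker, Receiver<u32>) {
    let (tx, rx) = mpsc::sync_channel::<u32>(2);
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let thread = thread::spawn(move || {
        let mut sent = 0u64;
        // The worker itself decides when to stop by checking the flag.
        while !flag.load(Ordering::Relaxed) {
            // Stand-in for receiving messages from the queue.
            match tx.try_send(5) {
                Ok(()) => sent += 1,
                // Channel full: back off briefly, then re-check the flag.
                Err(TrySendError::Full(_)) => thread::sleep(Duration::from_millis(1)),
                // Receiver dropped: nobody is listening any more.
                Err(TrySendError::Disconnected(_)) => break,
            }
        }
        sent
    });
    (Worker { stop, thread }, rx)
}
```

A caller would receive from the channel until its own condition is met and then call `worker.cancel()`, without needing to know how the stop signal is delivered.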

Rust: Safe multi threading with recursion

I'm new to Rust.
For learning purposes, I'm writing a simple program to search for files in Linux, and it uses a recursive function:
fn ffinder(base_dir:String, prmtr:&'static str, e:bool, h:bool) -> std::io::Result<()>{
let mut handle_vec = vec![];
let pth = std::fs::read_dir(&base_dir)?;
for p in pth {
let p2 = p?.path().clone();
if p2.is_dir() {
if !h{ //search doesn't include hidden directories
let sstring:String = get_fname(p2.display().to_string());
let slice:String = sstring[..1].to_string();
if slice != ".".to_string() {
let handle = thread::spawn(move || {
ffinder(p2.display().to_string(),prmtr,e,h);
});
handle_vec.push(handle);
}
}
else {//search include hidden directories
let handle2 = thread::spawn(move || {
ffinder(p2.display().to_string(),prmtr,e,h);
});
handle_vec.push(handle2);
}
}
else {
let handle3 = thread::spawn(move || {
if compare(rmv_underline(get_fname(p2.display().to_string())),rmv_underline(prmtr.to_string()),e){
println!("File found at: {}",p2.display().to_string().blue());
}
});
handle_vec.push(handle3);
}
}
for h in handle_vec{
h.join().unwrap();
}
Ok(())
}
I've tried to use multi threading (thread::spawn), however it can create too many threads, exploding the OS limit, which breaks the program execution.
Is there a way to multi thread with recursion, using a safe, limited (fixed) amount of threads?
As one of the commenters mentioned, this is an absolutely perfect case for using Rayon. The blog post mentioned doesn't show how Rayon might be used in recursion, only making an allusion to crossbeam's scoped threads with a broken link. However, Rayon provides its own scoped threads implementation that solves your problem as well, in that it only uses as many threads as you have cores available, avoiding the error you ran into.
Here's the documentation for it:
https://docs.rs/rayon/1.0.1/rayon/fn.scope.html
Here's an example from some code I recently wrote. Basically what it does is recursively scan a folder, and each time it nests into a folder it creates a new job to scan that folder while the current thread continues. In my own tests it vastly outperforms a single threaded approach.
let source = PathBuf::from("/foo/bar/");
let (tx, rx) = mpsc::channel();
rayon::scope(|s| scan(&source, tx, s));
fn scan<'a, U: AsRef<Path>>(
src: &U,
tx: Sender<(Result<DirEntry, std::io::Error>, u64)>,
scope: &Scope<'a>,
) {
let dir = fs::read_dir(src).unwrap();
dir.into_iter().for_each(|entry| {
let info = entry.as_ref().unwrap();
let path = info.path();
if path.is_dir() {
let tx = tx.clone();
scope.spawn(move |s| scan(&path, tx, s)) // Recursive call here
} else {
// dbg!("{}", path.as_os_str().to_string_lossy());
let size = info.metadata().unwrap().len();
tx.send((entry, size)).unwrap();
}
});
}
I'm not an expert on Rayon, but I'm fairly certain the threading strategy works like this:
Rayon creates a pool of threads to match the number of logical cores you have available in your environment. The first call to the scoped function creates a job that the first available thread "steals" from the queue of jobs available. Each time we make another recursive call, it doesn't necessarily execute immediately, but it creates a new job that an idle thread can then "steal" from the queue. If all of the threads are busy, the job queue just fills up each time we make another recursive call, and each time a thread finishes its current job it steals another job from the queue.
The full code can be found here: https://github.com/1Dragoon/fcp
(Note that repo is a work in progress and the code there is currently typically broken and probably won't work at the time you're reading this.)
As a caveat, I'm more of a sysadmin than an actual developer, so I don't know if this is the ideal approach. From Rayon's documentation linked earlier:
scope() is a more flexible building block compared to join(), since a loop can be used to spawn any number of tasks without recursing
The language of that is a bit confusing. I'm not sure what they mean by "without recursing". Join seems to intend for you to already have tasks known about ahead of time and to execute them in parallel as threads become available, whereas scope seems to be more aimed at only creating jobs when you need them rather than having to know everything you need to do in advance. Either that or I'm understanding their meaning backwards.
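The job-queue behaviour described above can be approximated with the standard library alone, which may help in seeing why the thread count stays fixed. This is a simplified sketch, not Rayon's implementation: there is one shared queue rather than per-thread work stealing, and `Queue` and `walk` are hypothetical names. Each worker takes a directory from the queue, lists it, and pushes any subdirectories back as new jobs instead of recursing or spawning.

```rust
use std::collections::VecDeque;
use std::fs;
use std::path::PathBuf;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Directories waiting to be scanned, plus a count of directories that
/// are either queued or currently being scanned.
struct Queue {
    jobs: VecDeque<PathBuf>,
    pending: usize,
}

/// Walks `root` with a fixed number of threads and returns all files found.
/// Symlink cycles are not handled in this sketch.
fn walk(root: PathBuf, workers: usize) -> Vec<PathBuf> {
    let state = Arc::new((
        Mutex::new(Queue { jobs: VecDeque::from([root]), pending: 1 }),
        Condvar::new(),
    ));
    let found: Arc<Mutex<Vec<PathBuf>>> = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let state = Arc::clone(&state);
            let found = Arc::clone(&found);
            thread::spawn(move || {
                let (lock, cvar) = &*state;
                loop {
                    // Take the next directory, or exit once no work remains.
                    let dir = {
                        let mut q = lock.lock().unwrap();
                        loop {
                            if let Some(dir) = q.jobs.pop_front() {
                                break dir;
                            }
                            if q.pending == 0 {
                                return;
                            }
                            q = cvar.wait(q).unwrap();
                        }
                    };
                    let mut subdirs = Vec::new();
                    let mut files = Vec::new();
                    // Unreadable directories are silently skipped here.
                    if let Ok(entries) = fs::read_dir(&dir) {
                        for entry in entries.flatten() {
                            let path = entry.path();
                            if path.is_dir() {
                                subdirs.push(path);
                            } else {
                                files.push(path);
                            }
                        }
                    }
                    found.lock().unwrap().extend(files);
                    // Queue the subdirectories as new jobs and mark this one done.
                    let mut q = lock.lock().unwrap();
                    q.pending += subdirs.len();
                    q.jobs.extend(subdirs);
                    q.pending -= 1;
                    cvar.notify_all();
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let mut files = found.lock().unwrap().clone();
    files.sort();
    files
}
```

However deep the tree is, only `workers` threads ever exist. A deep or wide directory structure makes the queue grow, not the thread count, which is the property the original thread-per-directory version lacked.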

Polling many futures of different types

I'm trying to understand how to implement polling multiple futures with different types. For context, I'm calling an API that will return something like:
[{"type": "source_a", "id": 123}, {"type": "source_b", "id": 234}, ...]
Each type in the API response requires a call to another API, with each API returning different data types. The code I've written works something like this:
async fn get_data(sources: Vec<Source>) -> Data {
let mut data = Default::default();
for source in sources {
if source.kind == "source_a" {
let source_data = get_source_a(source).await;
process_source_a(source_data, &mut data);
} else if source.kind == "source_b" {
...
}
}
data
}
This won't run concurrently, it will simply fetch sources one at a time and process them. How can I rewrite this so that each source is fetched concurrently and then processed once data is available? Speaking Rustily, I think what I want is to execute a closure that mutably borrows data when the future is ready. Should I be looking at something like an Arc<RefCell<Data>>?
To process the futures in parallel, you need to await something like join_all, which will run them concurrently and return when they are all done. For this to work, you have to resolve two issues:
join_all requires futures of the same type, so you need to box them or otherwise unify them.
data needs to be accessed by multiple async blocks, so it needs to be protected by Arc and Mutex.
The first issue can be solved simply by spawning the async fns as tasks, which has the added advantage of potentially running them in parallel (in addition to them being run concurrently). The example below uses tokio::spawn, but it would be almost exactly the same for async_std. Since there is no reproducible example, I can't test the code, but it could look like this:
use std::sync::{Arc, Mutex};
async fn get_data(sources: Vec<Source>) -> Data {
let data = Arc::new(Mutex::new(Data::default()));
let mut tasks = vec![];
for source in sources {
if source.kind == "source_a" {
let data = Arc::clone(&data);
tasks.push(tokio::task::spawn(async move {
let source_data = get_source_a(source).await;
process_source_a(source_data, &mut data.lock().unwrap());
}));
} else if source.kind == "source_b" {
// ...
}
}
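// Wait for all sources to finish and propagate the panic if any.
// With async_std this wouldn't require the `for_each()`.
futures::future::join_all(tasks)
.await
.into_iter()
.for_each(|x| x.unwrap());
// As all tasks are done, there should be no references to `data` at
// this point, so we can extract it out of the Arc<Mutex<_>> wrapping.
// `try_unwrap` is an associated function of `Arc`, and `into_inner`
// returns a `LockResult` that also has to be unwrapped.
Arc::try_unwrap(data)
.ok()
.expect("a task still holds a reference to `data`")
.into_inner()
.unwrap()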
}

Runtime returning without completing all tasks despite using block_on in Tokio

I am writing a multi-threaded concurrent Kafka producer using Rust and Tokio. The project has 2 modes: an interactive mode that runs in an infinite loop, and a file mode that takes a file as an argument, reads it, and sends its lines as messages to Kafka via multiple threads. Interactive mode works fine, but file mode has issues.
To achieve this, I had initially started with Rayon, but then switched to a more flexible runtime, Tokio. Now I am able to parallelize the task of sending data over a specified number of threads within Tokio; however, it seems that the runtime is getting dropped before all messages are produced. Here is my code:
pub fn worker(brokers: String, f: File, t: usize, topic: Arc<String>) {
let reader = BufReader::new(f);
let mut rt = runtime::Builder::new()
.threaded_scheduler()
.core_threads(t)
.build()
.unwrap();
let producers: Arc<Vec<Mutex<BaseProducer>>> = Arc::new(
(0..t)
.map(|_| get_producer(&brokers))
.collect::<Vec<Mutex<BaseProducer>>>(),
);
let acounter = atomic::AtomicUsize::new(0);
let _results: Vec<_> = reader
.lines()
.map(|line| line.unwrap())
.map(move |line| {
let prods = producers.clone();
let tp = topic.clone();
let cnt = acounter.swap(
(acounter.load(atomic::Ordering::SeqCst) + 1) % t,
atomic::Ordering::SeqCst,
);
rt.block_on(async move {
match prods[cnt]
.lock()
.unwrap()
.send(BaseRecord::to(&(*tp)).payload(&line).key(""))
{
Ok(_) => (),
Err(e) => eprintln!("{:?}", e),
};
})
})
.collect();
}
fn get_producer(brokers: &String) -> Mutex<BaseProducer> {
Mutex::new(
BaseProducer::from_config(
ClientConfig::new()
.set("bootstrap.servers", &brokers)
.set("message.timeout.ms", "5000"),
)
.expect("Producer creation error"),
)
}
As a high-level walkthrough: I create mutable producers equal to the number of threads specified and every task within this thread will use one of these producers. The file is read line by line sequentially and every line is moved into the closure that produces it as a message to Kafka.
The code works fine, for the most part, but there are issues related to the runtime exiting without completing all tasks, even though I am using the runtime's block_on function, which is supposed to block until the future (the async block in my case) is complete.
I believe the issue is that the runtime is getting dropped before all of the threads within Tokio have exited successfully.
I tried reading a file having 100,000 records with this approach. On a single thread, I was able to produce 28,000 records. On 2 threads, close to 46,000 records. And while utilising all 8 cores of my CPU, I was getting 99,000-100,000 messages nondeterministically.
I have checked several answers on SO, but none help in my case. I also read through the documentation of tokio::runtime::Runtime here and tried to use spawn and then use futures::future::join, but that didn't work either.
Any help is appreciated!

What's a good detailed explanation of Tokio's simple TCP echo server example (on GitHub and API reference)?

Tokio has the same example of a simple TCP echo server on its:
GitHub main page (https://github.com/tokio-rs/tokio)
API reference main page (https://docs.rs/tokio/0.2.18/tokio/)
However, in both pages, there is no explanation of what's actually going on. Here's the example, slightly modified so that the main function does not return Result<(), Box<dyn std::error::Error>>:
use tokio::net::TcpListener;
use tokio::prelude::*;
#[tokio::main]
async fn main() {
if let Ok(mut tcp_listener) = TcpListener::bind("127.0.0.1:8080").await {
while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
tokio::spawn(async move {
let mut buf = [0; 1024];
// In a loop, read data from the socket and write the data back.
loop {
let n = match tcp_stream.read(&mut buf).await {
// socket closed
Ok(n) if n == 0 => return,
Ok(n) => n,
Err(e) => {
eprintln!("failed to read from socket; err = {:?}", e);
return;
}
};
// Write the data back
if let Err(e) = tcp_stream.write_all(&buf[0..n]).await {
eprintln!("failed to write to socket; err = {:?}", e);
return;
}
}
});
}
}
}
After reading the Tokio documentation (https://tokio.rs/docs/overview/), here's my mental model of this example. A task is spawned for each new TCP connection. And a task is ended whenever a read/write error occurs, or when the client ends the connection (i.e. n == 0 case). Therefore, if there are 20 connected clients at a point in time, there would be 20 spawned tasks. However, under the hood, this is NOT equivalent to spawning 20 threads to handle the connected clients concurrently. As far as I understand, this is basically the problem that asynchronous runtimes are trying to solve. Correct so far?
Next, my mental model is that a tokio scheduler (e.g. the multi-threaded threaded_scheduler which is the default for apps, or the single-threaded basic_scheduler which is the default for tests) will schedule these tasks concurrently on 1-to-N threads. (Side question: for the threaded_scheduler, is N fixed during the app's lifetime? If so, is it equal to num_cpus::get()?). If one task is .awaiting for the read or write_all operations, then the scheduler can use the same thread to perform more work for one of the other 19 tasks. Still correct?
Finally, I'm curious whether the outer code (i.e. the code that is .awaiting for tcp_listener.accept()) is itself a task? Such that in the 20 connected clients example, there aren't really 20 tasks but 21: one to listen for new connections + one per connection. All of these 21 tasks could be scheduled concurrently on one or many threads, depending on the scheduler. In the following example, I wrap the outer code in a tokio::spawn and .await the handle. Is it completely equivalent to the example above?
use tokio::net::TcpListener;
use tokio::prelude::*;
#[tokio::main]
async fn main() {
let main_task_handle = tokio::spawn(async move {
if let Ok(mut tcp_listener) = TcpListener::bind("127.0.0.1:8080").await {
while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
tokio::spawn(async move {
// ... same as above ...
});
}
}
});
main_task_handle.await.unwrap();
}
This answer is a summary of an answer I received on Tokio's Discord from Alice Ryhl. Big thank you!
First of all, indeed, for the multi-threaded scheduler, the number of OS threads is fixed to num_cpus.
Second, Tokio can swap the currently running task at every .await on a per-thread basis.
Third, the main function runs in its own task, which is spawned by the #[tokio::main] macro.
Therefore, for the first code block example, if there are 20 connected clients, there would be 21 tasks: one for the main macro + one for each of the 20 open TCP streams. For the second code block example, there would be 22 tasks because of the extra outer tokio::spawn but it's needless and doesn't add any concurrency.
