How to tokio::join multiple tasks? - rust

Imagine that some futures are stored in a Vec whose length is determined at runtime, and you need to join these futures concurrently. What should you do?
Obviously, following the example in the tokio::join documentation, manually handling each length the Vec could have (1, 2, 3, ...) and dealing with each respective case should work.
extern crate tokio;
let v = Vec::new();
// directly or indirectly you push many futures to the vector
// to join these futures concurrently one possible way is
if v.len() == 0 {}
if v.len() == 1 { join!(v.pop()); }
if v.len() == 2 { join!(v.pop(), v.pop() ); }
// ...
I also noticed that the documentation shows tokio::join! taking a list as its parameter, but when I use syntax like
or something like
tokio::join![ v ] / tokio::join![ v[..] ] / tokio::join![ v[..][..] ]
it just doesn't work.
So my question is: is there a more efficient way to join these futures, or am I missing something in the documentation?

You can use futures::future::join_all to "merge" your collection of futures together into a single future, that resolves when all of the subfutures resolve.

join_all and try_join_all, as well as more versatile FuturesOrdered and FuturesUnordered utilities from the same crate futures, are executed as a single task. This is probably fine if the constituent futures are not often concurrently ready to perform work, but if you want to make use of CPU parallelism with the multi-threaded runtime, consider spawning the individual futures as separate tasks and waiting on the tasks to finish.
Tokio 1.21.0 or later: JoinSet
With recent Tokio releases, you can use JoinSet to get the maximum flexibility, including the ability to abort all tasks. The tasks in the set are also aborted when JoinSet is dropped.
use tokio::task::JoinSet;
// ...
let mut set = JoinSet::new();
for fut in v {
    // Each future becomes its own task on the runtime
    set.spawn(fut);
}
while let Some(res) = set.join_next().await {
    let out = res?;
    // ...
}
Older API
Spawn tasks with tokio::spawn and wait on the join handles:
use futures::future;
// ...
let outputs = future::try_join_all(v.into_iter().map(tokio::spawn)).await?;
You can also use the FuturesOrdered and FuturesUnordered combinators to process the outputs asynchronously in a stream:
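use futures::stream::FuturesUnordered;
use futures::prelude::*;
// ...
let mut completion_stream = v.into_iter()
    .map(tokio::spawn)
    .collect::<FuturesUnordered<_>>();
// Outputs arrive in completion order, not in the order of `v`
while let Some(res) = completion_stream.next().await {
    // ...
}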
One caveat with waiting for tasks this way is that the tasks are not cancelled when the future (e.g. an async block) that has spawned the task and possibly owns the returned JoinHandle gets dropped. The JoinHandle::abort method needs to be used to explicitly cancel the task.

A full example:
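use std::time::Duration;
use tokio::time::sleep;
#[tokio::main]
async fn main() {
    let tasks = (0..5).map(|i| tokio::spawn(async move {
        sleep(Duration::from_secs(1)).await; // simulate some work
        i * 2
    }));
    // join_all yields the outputs in the same order as the input futures
    let result = futures::future::join_all(tasks).await;
    println!("{:?}", result); // [Ok(0), Ok(2), Ok(4), Ok(6), Ok(8)]
}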


Why the channel in the example code of tokio::sync::Notify is a mpsc?

I'm learning the synchronization primitives of tokio. From the example code of Notify, I find it hard to understand why Channel<T> is an mpsc channel.
use tokio::sync::Notify;
use std::collections::VecDeque;
use std::sync::Mutex;
struct Channel<T> {
    values: Mutex<VecDeque<T>>,
    notify: Notify,
}
impl<T> Channel<T> {
    pub fn send(&self, value: T) {
        self.values.lock().unwrap().push_back(value);
        // Notify the consumer a value is available
        self.notify.notify_one();
    }
    // This is a single-consumer channel, so several concurrent calls to
    // `recv` are not allowed.
    pub async fn recv(&self) -> T {
        loop {
            // Drain values
            if let Some(value) = self.values.lock().unwrap().pop_front() {
                return value;
            }
            // Wait for values to be available
            self.notify.notified().await;
        }
    }
}
If there are elements in values, the consumer task takes one.
If there is no element in values, the consumer task yields until the producer notifies it.
But after writing some test code, I could not find a case where a consumer loses a notification from the producer.
Could someone give me test code that proves the above Channel<T> fails to work as an mpmc channel?
The following code shows why it is unsafe to use the above channel as mpmc.
use std::sync::Arc;
async fn main() {
let mut i = 0;
let ch = Arc::new(Channel {
values: Mutex::new(VecDeque::new()),
notify: Notify::new(),
let mut handles = vec![];
for i in 0..100{
if i % 2 == 1{
for _ in 0..2{
let sender = ch.clone();
tokio::spawn(async move{
for _ in 0..2{
let receiver = ch.clone();
let handle = tokio::spawn(async move{
i += 1;
println!("No.{i} loop finished.");
If the next loop iteration never runs, some consumer tasks have not finished, which means a consumer task missed a notification.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: multiple calls to notify_one result in only a single permit if no thread is waiting yet. And there is a short time window in which multiple threads may already have checked for the existence of an item but aren't waiting yet.

Rust deadlock with shared struct: Arc + channel + atomic

I'm new to Rust and was trying to generate plenty of JSON data on the fly for a project, but I'm having deadlocks.
I've tried removing the serialization (serde_json) and sending the HashMaps through the channel instead, but I still get deadlocks on my computer. However, if I comment out the send( line and send a string myself, the code works flawlessly, so the deadlock seems to be caused by my DatasetGenerator, but I don't understand why.
Code summary:
Have a DatasetGenerator object that can generate sequences of "events" and serialize them to JSON. works like an "iterator" - It increments an internal atomic counter in the generator and then generates the i-th item in the sequence + serializes the JSON.
Have a generator threadpool generate these JSONs at high throughput (very large payloads each)
Send these JSONs through a channel to other thread (which will send them through network but irrelevant for this question)
Depending on whether I use tx_ref.send( or tx_ref.send(some_new_string) below, my code deadlocks or succeeds:
extern crate threads_pool;
use threads_pool::*;
mod generator;
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
fn main() {
// N will be an argument, and a very high number. For tests use this:
const N: i64 = 12; // Increase this if you're not getting the deadlock yet, or run cargo run again until it happens.
let (tx, rx) = mpsc::channel();
let tx_producer = tx.clone();
let producer_thread = thread::spawn(move || {
let pool = ThreadPool::new(4);
let generator = Arc::new(generator::data_generator::DatasetGenerator::new(3000));
for i in 0..N {
println!("Generating #{}", i);
let tx_ref = tx_producer.clone();
let generator_ref = generator.clone();
pool.execute(move || {
////////// v !!!DEADLOCK HERE!!! v //////////
tx_ref.send(generator_ref.next()).expect("tx failed."); // This locks!
//tx_ref.send(format!(" {} ", i)).expect("tx failed."); // This works!
////////// ^ !!!DEADLOCK HERE!!! ^ //////////
println!("Generator done!");
println!("-» Consumer consuming!");
for j in 0..N {
let s = rx.recv().expect("rx failed");
println!("-» Consumed #{}: {} ... ", j, &s[..10]);
println!("Consumer done!!");
println!("Success. Exit!");
This is my DatasetGenerator which seems to be causing all the trouble (as not using serde but outputting the HashMaps still gives deadlocks). src/generator/
use serde_json::Value;
use std::collections::HashMap;
use std::sync::atomic;
pub struct DatasetGenerator {
    num_features: usize,
    pub counter: atomic::AtomicI64,
    feature_names: Vec<String>,
}
type Datapoint = HashMap<String, Value>;
type Out = String;
impl DatasetGenerator {
    pub fn new(num_features: usize) -> DatasetGenerator {
        let mut feature_names = Vec::new();
        for i in 0..num_features {
            feature_names.push(format!("f_{}", i));
        }
        DatasetGenerator {
            num_features,
            counter: atomic::AtomicI64::new(0),
            feature_names,
        }
    }
    /// Generates the next item in the sequence (iterator-like).
    pub fn next(&self) -> Out {
        let value = self.counter.fetch_add(1, atomic::Ordering::SeqCst);
        self.gen(value)
    }
    /// Generates the ith item in the sequence. DEADLOCKS!!! ///////////////////////////
    pub fn gen(&self, ith: i64) -> Out {
        let mut data = Datapoint::with_capacity(self.num_features);
        for f in 0..self.num_features {
            let name = self.feature_names.get(f).unwrap();
            data.insert(name.to_string(), Value::from(ith));
        }
        serde_json::json!(data).to_string() // Tried without serialization and still deadlocks!
    }
}
Commit with deadlock code is here if you want to try out yourself with cargo run:
Deadlock on Windows with Rust 1.60.0:
Thank you for the help! it's greatly appreciated :)
I've followed the suggestions from @kmdreko's answer below, and apparently the problem is in the generator: not all the items are generated. Even though pool.execute() is called N times, only a random number of closures c < N are executed, even if I place pool.close() before leaving the producer_thread. Why does that happen, and how can it be fixed?
Fix: Turns out this lockup is caused by the threads_pool library (0.2.6). I switched the thread pool to rayon's and it worked smoothly at the first try.
One thing you should change: an mpsc::Receiver will return an error on .recv() if it cannot possibly yield a result by realizing that all the associated mpsc::Senders have dropped, which is a good indicator that all the work is done. Your tx_refs and even tx_producer will be dropped when their respective tasks/threads complete, however you still have tx in scope that can theoretically give a value. This is what gives you the apparent deadlock. You should simply remove tx_producer and use tx directly so it is moved into the producer thread and dropped accordingly.
Now, you'll see either all N tasks complete, or you'll get an error indicating that some tasks did not complete. The reason not all tasks are completing is because you're creating the thread pool, spawning all the tasks, and then immediately destroying it. The threads_pool documentation says that the threads will finish their current job when the pool is destroyed, but you want to wait until all jobs have completed. For that you need to call the .close() method provided by the PoolManager trait before the end of the closure.
The reason you saw inconsistent behavior, and why returning a string directly helped, is that those jobs required less work, so the threads could get away with completing all of them before they saw their signal to exit. Your generator requires much more computation, so it's not surprising that the threads only process 4-plus-a-bit jobs before they see they've been told to exit.
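The disconnect behaviour described in this answer can be seen in isolation with only the standard library. The following is a minimal sketch, not the asker's program: `produce_and_collect` is a hypothetical helper, and plain `std::thread` workers stand in for the thread pool. The original sender is moved into the producer thread, so once it and all of its clones are dropped, the receiving loop ends by itself instead of blocking.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: spawns `n` workers that each send one message,
/// then collects everything the channel yields.
fn produce_and_collect(n: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel();

    // `tx` is moved into the producer thread; no sender stays behind in this scope.
    let producer = thread::spawn(move || {
        let workers: Vec<_> = (0..n)
            .map(|i| {
                let tx = tx.clone();
                thread::spawn(move || tx.send(format!("item {i}")).expect("tx failed"))
            })
            .collect();
        // Wait for every job before the producer exits.
        for w in workers {
            w.join().expect("worker panicked");
        }
        // The original `tx` is dropped here, after all of its clones.
    });

    // Iterating a Receiver stops once `recv` reports that every Sender is gone.
    let mut items: Vec<String> = rx.iter().collect();
    producer.join().expect("producer panicked");
    items.sort();
    items
}

fn main() {
    let items = produce_and_collect(4);
    println!("received {} items: {:?}", items.len(), items);
}
```

If a clone of the sender were kept alive in the consuming scope, `rx.iter()` would never finish, which is the apparent deadlock from the question.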

How to run multiple futures that call thread::sleep in parallel? [duplicate]

I have a slow future that blocks for 1 second before running to completion.
I've tried to use the join combinator but the composite future my_app executes the futures sequentially:
#![feature(pin, futures_api, arbitrary_self_types)]
extern crate futures; // v0.3
use futures::prelude::*;
use futures::task::Context;
use std::pin::PinMut;
use std::{thread, time};
use futures::executor::ThreadPoolBuilder;
struct SlowComputation {}
impl Future for SlowComputation {
    type Output = ();
    fn poll(self: PinMut<Self>, _cx: &mut Context) -> Poll<Self::Output> {
        let millis = time::Duration::from_millis(1000);
        thread::sleep(millis); // blocks the polling thread for 1 second
        Poll::Ready(())
    }
}
fn main() {
let fut1 = SlowComputation {};
let fut2 = SlowComputation {};
let my_app = fut1.join(fut2);
.expect("Failed to create threadpool")
Why does join work like that? I expected the futures to be spawned on different threads.
What is the right way to obtain my goal?
futures-preview = "0.3.0-alpha.6"
$ time target/debug/futures03
real 0m2.004s
user 0m0.000s
sys 0m0.004s
If you combine futures with join() they'll be transformed into a single task, running on a single thread.
If the futures are well-behaved, they would run in parallel in an event-driven (asynchronous) manner. You would expect your application to sleep for 1 second.
But unfortunately the future you implemented is not well-behaved. It blocks the current thread for one second, disallowing any other work to be done during this time. Because the futures are run on the same thread, they cannot run at the same time. Your application will sleep for 2 seconds.
Note that if you change your example to the following, the futures will remain separate tasks and you can run them independently in parallel on your thread pool:
fn main() {
let fut1 = SlowComputation {};
let fut2 = SlowComputation {};
let mut pool = ThreadPoolBuilder::new()
.expect("Failed to create threadpool");
Writing futures that block the main thread is highly discouraged and in a real application you should probably use timers provided by a library, for example tokio::timer::Delay or tokio::timer::timeout::Timeout.
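To make the timing argument concrete without depending on any particular futures release, here is a small sketch that uses only the standard library. It is not the futures API from the question: `run_blocking_jobs` is a hypothetical helper that gives each blocking job its own OS thread, which is the only way calls to `thread::sleep` can overlap.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `n` blocking jobs, each on its own OS thread,
/// and returns how long the whole batch took.
fn run_blocking_jobs(n: usize, each: Duration) -> Duration {
    let start = Instant::now();
    // Start every job first ...
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(move || thread::sleep(each)))
        .collect();
    // ... and only then wait for them, so the sleeps overlap.
    for h in handles {
        h.join().expect("job panicked");
    }
    start.elapsed()
}

fn main() {
    let elapsed = run_blocking_jobs(2, Duration::from_millis(500));
    println!("two 500 ms jobs took {:?}", elapsed);
}
```

The batch should take roughly as long as one job rather than the sum of all jobs. Running the same two sleeps one after the other on a single thread, which is what a joined pair of blocking futures amounts to, takes the sum.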

Is it possible to share a HashMap between threads without locking the entire HashMap?

I would like to have a shared struct between threads. The struct has many fields that are never modified and a HashMap, which is. I don't want to lock the whole HashMap for a single update/remove, so my HashMap looks something like HashMap<u8, Mutex<u8>>. This works, but it makes no sense since the thread will lock the whole map anyways.
Here's this working version, without threads; I don't think that's necessary for the example.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
fn main() {
    let s = Arc::new(Mutex::new(S::new()));
    let z = s.clone();
    let _ = z.lock().unwrap();
}
struct S {
    x: HashMap<u8, Mutex<u8>>, // other non-mutable fields
}
impl S {
    pub fn new() -> S {
        S {
            x: HashMap::default(),
        }
    }
}
Is this possible in any way? Is there something obvious I missed in the documentation?
I've been trying to get this working, but I'm not sure how. Basically every example I see there's always a Mutex (or RwLock, or something like that) guarding the inner value.
I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?
In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.
Here's an example:
use std::{
    collections::HashMap,
    sync::{Arc, Mutex, RwLock},
    thread,
    time::Duration,
};
fn main() {
    let data = Arc::new(RwLock::new(HashMap::new()));
    let threads: Vec<_> = (0..10)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();
    for t in threads {
        t.join().expect("Thread panicked");
    }
    println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
    loop {
        // Assume that the element already exists
        let map = data.read().expect("RwLock poisoned");
        if let Some(element) = map.get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");
            // Perform our normal work updating a specific element.
            // The entire HashMap only has a read lock, which
            // means that other threads can access it.
            *element += 1;
            thread::sleep(Duration::from_secs(1));
            return;
        }
        // If we got this far, the element doesn't exist
        // Get rid of our read lock and switch to a write lock
        // You want to minimize the time we hold the writer lock
        drop(map);
        let mut map = data.write().expect("RwLock poisoned");
        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
        map.entry(id).or_insert_with(|| Mutex::new(0));
        // Let the loop start us over to try again
    }
}
This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.
This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.
In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can then clone the Arc of the value, returning it and unlocking the hashmap.
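A minimal sketch of that `RwLock<HashMap<K, Arc<Mutex<V>>>>` variant follows, using only the standard library. The names `Shared`, `element_for` and `run_workers` are hypothetical, introduced for illustration. The map lock is held just long enough to clone the `Arc`, and the element is then mutated with the map fully unlocked.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

type Shared = Arc<RwLock<HashMap<u8, Arc<Mutex<i32>>>>>;

/// Hypothetical helper: returns a handle to the element for `id`, inserting it
/// first if needed. The map lock is released before this function returns.
fn element_for(data: &Shared, id: u8) -> Arc<Mutex<i32>> {
    // Fast path: a short read lock, just long enough to clone the Arc.
    if let Some(element) = data.read().expect("RwLock poisoned").get(&id) {
        return Arc::clone(element);
    }
    // Slow path: a short write lock; `entry` covers the race where another
    // thread inserted the key in between.
    let mut map = data.write().expect("RwLock poisoned");
    Arc::clone(map.entry(id).or_insert_with(|| Arc::new(Mutex::new(0))))
}

/// Hypothetical driver: `threads` workers each increment one of two elements
/// once; returns the sum of all elements.
fn run_workers(threads: u8) -> i32 {
    let data: Shared = Arc::new(RwLock::new(HashMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let element = element_for(&data, i % 2);
                // Only this element's Mutex is held here; the map is unlocked.
                *element.lock().expect("Mutex poisoned") += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().expect("Thread panicked");
    }
    let map = data.read().expect("RwLock poisoned");
    let mut total = 0;
    for element in map.values() {
        total += *element.lock().expect("Mutex poisoned");
    }
    total
}

fn main() {
    println!("total increments: {}", run_workers(8));
}
```

The cost of this design is one extra allocation and reference count per value, in exchange for never holding the map lock while working on an element.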
The next step up would be to use a crate like arc-swap, which says:
Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.
The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.
I often advocate for performing some kind of smarter algorithm. For example, you could spin up N threads each with their own HashMap. You then shard work among them. For the simple example above, you could use id % N_THREADS, for example. There are also complicated sharding schemes that depend on your data.
As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
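As an illustration of the sharding idea, here is a sketch under stated assumptions: `count_sharded` is a hypothetical helper, the "work" is simply counting how often each id occurs, and ids are routed with `id % n_threads` as suggested above. Each worker owns a private `HashMap` and receives its ids over a channel, so no lock is involved.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: routes each id to worker `id % n_threads`. Every worker
/// owns a private HashMap, so no locks are needed. `n_threads` must be > 0.
fn count_sharded(ids: &[u8], n_threads: usize) -> HashMap<u8, i32> {
    let mut senders = Vec::with_capacity(n_threads);
    let mut workers = Vec::with_capacity(n_threads);

    for _ in 0..n_threads {
        let (tx, rx) = mpsc::channel::<u8>();
        senders.push(tx);
        workers.push(thread::spawn(move || {
            let mut local: HashMap<u8, i32> = HashMap::new();
            // The loop ends once the sender for this shard is dropped.
            for id in rx {
                *local.entry(id).or_insert(0) += 1;
            }
            local
        }));
    }

    for &id in ids {
        senders[id as usize % n_threads]
            .send(id)
            .expect("worker exited early");
    }
    drop(senders); // close every channel so the workers finish

    // A key only ever lives in one shard, so merging cannot overwrite anything.
    let mut merged = HashMap::new();
    for w in workers {
        merged.extend(w.join().expect("worker panicked"));
    }
    merged
}

fn main() {
    let counts = count_sharded(&[1, 2, 3, 1, 1, 2], 4);
    println!("id 1 was seen {:?} times", counts.get(&1));
}
```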
Suppose the key of the data can be mapped to a u8.
You can have Arc<HashMap<u8, Mutex<HashMap<Key, Value>>>>.
When you initialize the data structure, you populate the whole first-level map before putting it in the Arc (it will be immutable after initialization).
When you want a value from the map you will need to do a double get, something like:
data.get(&map_to_u8(id)).unwrap().lock().expect("poison").get(&id)
where the unwrap is safe because we initialized the first map with all the values.
To write to the map, something like:
data.get(&map_to_u8(id)).unwrap().lock().expect("poison").entry(id).or_insert_with(|| value);
It's easy to see that contention is reduced, because we now have 256 Mutexes and the probability of multiple threads asking for the same Mutex is low.
@Shepmaster's example with 100 threads takes about 10s on my machine; the following example takes a little more than 1 second.
use std::{
    collections::HashMap,
    sync::{Arc, Mutex},
    thread,
    time::Duration,
};
fn main() {
    let mut inner = HashMap::new();
    for i in 0..=u8::MAX {
        inner.insert(i, Mutex::new(HashMap::new()));
    }
    let data = Arc::new(inner);
    let threads: Vec<_> = (0..100)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();
    for t in threads {
        t.join().expect("Thread panicked");
    }
    println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<HashMap<u8, Mutex<HashMap<u8, Mutex<i32>>>>>) {
    loop {
        // The first unwrap is safe because we populated an entry for every `u8`
        if let Some(element) = data.get(&id).unwrap().lock().expect("poison").get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");
            // Perform our normal work updating a specific element.
            // Only this key's inner map is locked, which
            // means that threads using other first-level keys can proceed.
            *element += 1;
            thread::sleep(Duration::from_secs(1));
            return;
        }
        // If we got this far, the element doesn't exist.
        // The lock taken above has been released by now, so lock again to insert.
        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
        data.get(&id).unwrap().lock().expect("poison").entry(id).or_insert_with(|| Mutex::new(0));
        // Let the loop start us over to try again
    }
}
Maybe you want to consider evmap:
A lock-free, eventually consistent, concurrent multi-value map.
The trade-off is eventual-consistency: Readers do not see changes until the writer refreshes the map. A refresh is atomic and the writer decides when to do it and expose new data to the readers.

Preferred method for awaiting concurrent threads

I have a program that loops over HTTP responses. These don't depend on each other, so they can be done simultaneously. I am using threads to do this:
extern crate hyper;
use std::thread;
use std::sync::Arc;
use hyper::Client;
fn main() {
    let client = Arc::new(Client::new());
    for num in 0..10 {
        let client_helper = client.clone();
        thread::spawn(move || {
            client_helper.get(&format!("{}", num)).send().unwrap();
        }).join().unwrap();
    }
}
This works, but I can see other possibilities of doing this such as:
let mut threads = vec![];
threads.push(thread::spawn(move || {
/* snip */
for thread in threads {
let _ = thread.join();
It would also make sense to me to use a function that returns the thread handle, but I couldn't figure out how to do that; I'm not sure what the return type has to be.
What is the optimal/recommended way to wait for concurrent threads in Rust?
Your first program does not actually have any parallelism. Each time you spin up a worker thread, you immediately wait for it to finish before you start the next one. This is, of course, worse than useless.
The second way works, but there are crates that do some of the busywork for you. For example, scoped_threadpool and crossbeam have thread pools that allow you to write something like (untested, may contain mistakes):
let client = &Client::new(); // No Arc needed
run_in_pool(|scope| {
    for num in 0..10 {
        scope.spawn(move || {
            client.get(&format!("{}", num)).send().unwrap();
        });
    }
});
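On the return-type question: `thread::spawn` returns a `JoinHandle<T>`, where `T` is whatever the closure returns, so a function that starts a worker can return that type directly. The sketch below uses only the standard library. `fetch` is a hypothetical stand-in for the HTTP request (no hyper involved), and `spawn_fetch`, `fetch_all` and `fetch_all_scoped` are illustrative names. The last function shows the standard library's own scoped threads (`std::thread::scope`, available since Rust 1.63), which cover the "no Arc needed" case without an extra crate.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the HTTP request in the question.
fn fetch(num: u32) -> String {
    format!("response {num}")
}

/// A function that starts a worker returns `JoinHandle<T>`,
/// where `T` is the type the thread's closure returns.
fn spawn_fetch(num: u32) -> JoinHandle<String> {
    thread::spawn(move || fetch(num))
}

/// Start every thread first, then join them, so the workers overlap.
fn fetch_all(n: u32) -> Vec<String> {
    // Collecting into a Vec forces all spawns to happen before the first join.
    let handles: Vec<JoinHandle<String>> = (0..n).map(spawn_fetch).collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("Thread panicked"))
        .collect()
}

/// Same idea with scoped threads: borrowed data needs no Arc, and every
/// thread is guaranteed to be finished when `thread::scope` returns.
fn fetch_all_scoped(prefix: &str, n: u32) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = (0..n)
            .map(|num| scope.spawn(move || format!("{prefix} {num}")))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("Thread panicked"))
            .collect::<Vec<String>>()
    })
}

fn main() {
    println!("{:?}", fetch_all(3));
    println!("{:?}", fetch_all_scoped("item", 2));
}
```

Joining in spawn order keeps the results in the same order as the inputs, regardless of which thread finishes first.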
