Optimizing channel sends - rust

I have a simple program that I'd like to optimize. (I don't care about ordering of messages)
use tokio::sync::mpsc;
use tokio::task;
use tokio::time::{sleep, Duration};
async fn do_sleep() {
sleep(Duration::from_millis(100)).await;
}
#[tokio::main]
async fn main() {
let (tx, mut rx) = mpsc::channel(32);
let tx2 = tx.clone();
let models = vec![1, 2, 3];
for model in models {
tx.send(format!("sending {}", model)).await;
}
while let Some(message) = rx.recv().await {
do_sleep().await;
println!("GOT = {}", message);
}
}
Concretely, I want to optimize the tx.send(). What is the tradeoff to wrapping every tx.send() in a future vs a tokio task?

Sequentially awaiting the send()s guarantees ordering wrt. what the receiver observes, tokio::spawn(chan.send(msg)) can lead to reordering.
tokio::spawn removes any kind of backpressure on the producer when the consumer is slow, i.e. mpsc::channel(buf_size) puts a limit of buf_size messages in the internal buffer before senders have to wait. buf_size puts an upper bound on the memory consumption and forces the sender to slow down in case of a slow consumer.
You can increase the buffer size to a value that can handle bursts of messages if you want to keep a hard upper bound on memory or you can use an unbounded channel which never waits on send().
If the producer always and continuously outpaces the receiver, then you'll eventually run into memory issues.
It's likely that you will see more improvements by parallelizing the handling of messages on the receiving end.

Related

Why the channel in the example code of tokio::sync::Notify is a mpsc?

I'm learning the synchronizing primitive of tokio. From the example code of Notify, I found it is confused to understand why Channel<T> is mpsc.
use tokio::sync::Notify;
use std::collections::VecDeque;
use std::sync::Mutex;
struct Channel<T> {
values: Mutex<VecDeque<T>>,
notify: Notify,
}
impl<T> Channel<T> {
pub fn send(&self, value: T) {
self.values.lock().unwrap()
.push_back(value);
// Notify the consumer a value is available
self.notify.notify_one();
}
// This is a single-consumer channel, so several concurrent calls to
// `recv` are not allowed.
pub async fn recv(&self) -> T {
loop {
// Drain values
if let Some(value) = self.values.lock().unwrap().pop_front() {
return value;
}
// Wait for values to be available
self.notify.notified().await;
}
}
}
If there are elements in values, the consumer tasks will take it away
If there is no element in values, the consumer tasks will yield until the producer nitify it
But after I writen some test code, I found in no case the consumer will lose the notice from producer.
Could some one give me test code to prove the above Channel<T> fail to work well as a mpmc?
The following code shows why it is unsafe to use the above channel as mpmc.
use std::sync::Arc;
#[tokio::main]
async fn main() {
let mut i = 0;
loop{
let ch = Arc::new(Channel {
values: Mutex::new(VecDeque::new()),
notify: Notify::new(),
});
let mut handles = vec![];
for i in 0..100{
if i % 2 == 1{
for _ in 0..2{
let sender = ch.clone();
tokio::spawn(async move{
sender.send(1);
});
}
}else{
for _ in 0..2{
let receiver = ch.clone();
let handle = tokio::spawn(async move{
receiver.recv().await;
});
handles.push(handle);
}
}
}
futures::future::join_all(handles).await;
i += 1;
println!("No.{i} loop finished.");
}
}
Not running the next loop means that there are consumer tasks not finishing, and consumer tasks miss a notify.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: Multiple calls to notify_one only result in a single token if no thread is waiting yet. And there is a short time window where it is possible that multiple threads already checked for the existance of an item but aren't waiting yet.

the trait `std::marker::Sync` is not implemented for `std::sync::mpsc::Sender<i32>

I'm attempting to build a multithreaded application using MPSC and I'm running into the error in the title. I'm not sure what the proper pattern for this use case is - I'm looking for a pattern that will allow me to clone the producer channel and move it into a new thread to be used.
This new thread will keep an open websocket and send a subset of the websocket message data through the producer whenever a websocket message is received. Data from other threads will be needed in the consumer thread which is why I assume the MPSC pattern is a suitable choice.
In addition to the message in the title it also shows these:
`std::sync::mpsc::Sender<i32>` cannot be shared between threads safely
help: the trait `std::marker::Sync` is not implemented for `std::sync::mpsc::Sender`
Can/should I implement Send for this? Is this an appropriate time to use Rc or Pin? I believe this is happening because I'm attempting to send a type that doesn't implement Send across an .await in an async closure, but I don't what to make of it or what to reach for in this situation.
I've been able to reduce my issue down to this:
use futures::stream::{self, StreamExt};
use std::sync::mpsc::{channel, Receiver, Sender};
#[tokio::main]
async fn main() {
let (tx, rx): (Sender<i32>, Receiver<i32>) = channel();
tokio::spawn(async move {
let a = [1, 2, 3];
let mut s = stream::iter(a.iter())
.cycle()
.for_each(move |int| async {
tx.send(*int);
})
.await;
});
}
There are several issues with your code. The first is that you're missing a move in the innermost async block, so the compiler tries to borrow a reference to tx. That's why you get the error that Sender (type of tx) doesn't implement Sync.
Once you add the missing move you get a different error:
error[E0507]: cannot move out of `tx`, a captured variable in an `FnMut` closure
Now the issue is that for_each() will call the closure multiple times, so you are actually not allowed to move tx into the async block - because there would be nothing to move after the first invocation of the closure.
Since MPSC channels allow multiple producers, Sender implements Clone, so you can simply clone tx before moving it to the async block. This compiles:
let (tx, _rx): (Sender<i32>, Receiver<i32>) = channel();
tokio::spawn(async move {
let a = [1, 2, 3];
let _s = stream::iter(a.iter())
.cycle()
.for_each(move |int| {
let tx = tx.clone();
async move {
tx.send(*int).unwrap();
}
})
.await;
});
Playground
Finally, as pointed out in the comments, you almost certainly want to use async channels here. While the channel you create is unbounded, so senders will never block, the receiver will block when there are no messages and therefore halt a whole executor thread.
As it happens, the sender side of tokio MPSC channels also implements Sync making code close to the one from your question compile:
let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();
tokio::spawn(async move {
let a = [1, 2, 3];
let _s = stream::iter(a.iter())
.cycle()
.for_each(|int| async {
tx.send(*int).unwrap();
})
.await;
});
assert_eq!(rx.recv().await, Some(1));
assert_eq!(rx.recv().await, Some(2));
assert_eq!(rx.recv().await, Some(3));
Playground

Rust deadlock with shared struct: Arc + channel + atomic

I'm new to Rust and was trying to generate plenty of JSON data on the fly for a project, but I'm having deadlocks.
I've tried removing the serialization (json_serde) and sending the HashMaps in the channel instead but I still get deadlocks on my computer. If I however comment the send(generator.next()) line and send a string myself, code works flawlessly, thus the deadlock is caused by my DatasetGenerator, but I don't understand why.
Code summary:
Have a DatasetGenerator object that can generate sequences of "events" and serialize them to JSON.
generator.next() works like an "iterator" - It increments an internal atomic counter in the generator and then generates the i-th item in the sequence + serializes the JSON.
Have a generator threadpool generate these JSONs at high throughput (very large payloads each)
Send these JSONs through a channel to other thread (which will send them through network but irrelevant for this question)
Depending if I comment tx_ref.send(generator_ref.next()) or tx_ref.send(some_new_string) below my code deadlocks or succeeds:
src/main.rs:
extern crate threads_pool;
use threads_pool::*;
mod generator;
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
fn main() {
// N will be an argument, and a very high number. For tests use this:
const N: i64 = 12; // Increase this if you're not getting the deadlock yet, or run cargo run again until it happens.
let (tx, rx) = mpsc::channel();
let tx_producer = tx.clone();
let producer_thread = thread::spawn(move || {
let pool = ThreadPool::new(4);
let generator = Arc::new(generator::data_generator::DatasetGenerator::new(3000));
for i in 0..N {
println!("Generating #{}", i);
let tx_ref = tx_producer.clone();
let generator_ref = generator.clone();
pool.execute(move || {
////////// v !!!DEADLOCK HERE!!! v //////////
tx_ref.send(generator_ref.next()).expect("tx failed."); // This locks!
//tx_ref.send(format!(" {} ", i)).expect("tx failed."); // This works!
////////// ^ !!!DEADLOCK HERE!!! ^ //////////
})
.unwrap();
}
println!("Generator done!");
});
println!("-» Consumer consuming!");
for j in 0..N {
let s = rx.recv().expect("rx failed");
println!("-» Consumed #{}: {} ... ", j, &s[..10]);
}
println!("Consumer done!!");
producer_thread.join().unwrap();
println!("Success. Exit!");
}
This is my DatasetGenerator which seems to be causing all the trouble (as not using serde but outputting the HashMaps still gives deadlocks). src/generator/dataset_generator.rs:
use serde_json::Value;
use std::collections::HashMap;
use std::sync::atomic;
pub struct DatasetGenerator {
num_features: usize,
pub counter: atomic::AtomicI64,
feature_names: Vec<String>,
}
type Datapoint = HashMap<String, Value>;
type Out = String;
impl DatasetGenerator {
pub fn new(num_features: usize) -> DatasetGenerator {
let mut feature_names = Vec::new();
for i in 0..num_features {
feature_names.push(format!("f_{}", i));
}
DatasetGenerator {
num_features,
counter: atomic::AtomicI64::new(0),
feature_names,
}
}
/// Generates the next item in the sequence (iterator-like).
pub fn next(&self) -> Out {
let value = self.counter.fetch_add(1, atomic::Ordering::SeqCst);
self.gen(value)
}
/// Generates the ith item in the sequence. DEADLOCKS!!! ///////////////////////////
pub fn gen(&self, ith: i64) -> Out {
let mut data = Datapoint::with_capacity(self.num_features);
for f in 0..self.num_features {
let name = self.feature_names.get(f).unwrap();
data.insert(name.to_string(), Value::from(ith));
}
serde_json::json!(data).to_string() // Tried without serialization and still deadlocks!
}
}
Commit with deadlock code is here if you want to try out yourself with cargo run: https://github.com/AlbertoEAF/learn-rust/tree/dc5fa867e5a70b605553ef65796fdc9dd42d38a0/rest-injector
Deadlock on Windows with Rust 1.60.0:
Thank you for the help! it's greatly appreciated :)
Update
I've followed the suggestions from #kmdreko's answer below, and apparently the problem is in the generator: not all the items are generated. Even though pool.execute() is called N times, only a random number of closures c < N are executed even if I place pool.close() before leaving the producer_thread. Why does that happen / How can it be fixed?
Fix: Turns out this lockup is caused by the threads_pool library (0.2.6). I switched the thread pool to rayon's and it worked smoothly at the first try.
One thing you should change: an mpsc::Receiver will return an error on .recv() if it cannot possibly yield a result by realizing that all the associated mpsc::Senders have dropped, which is a good indicator that all the work is done. Your tx_refs and even tx_producer will be dropped when their respective tasks/threads complete, however you still have tx in scope that can theoretically give a value. This is what gives you the apparent deadlock. You should simply remove tx_producer and use tx directly so it is moved into the producer thread and dropped accordingly.
Now, you'll see either all N tasks complete, or you'll get an error indicating that some tasks did not complete. The reason not all tasks are completing is because you're creating the thread pool, spawning all the tasks, and then immediately destroying it. The threads_pool documentation says that the threads will finish their current job when the pool is destroyed, but you want to wait until all jobs have completed. For that you need to call the .close() method provided by the PoolManager trait before the end of the closure.
The reason you saw inconsistent behavior, but was benefited by returning a string directly is because the jobs required less work and the threads could get away with completing all them before they saw their signal to exit. Your generator_ref.next() requires much more computation so its not surprising they'd only process 4-plus-a-bit jobs before they see they've been told to exit.

What are idiomatic ways to send data between threads?

I want to do some calculation in a separate thread, and then recover the data from the main thread. What are the canonical ways to pass some data from a thread to another in Rust?
fn main() {
let handle = std::thread::spawn(|| {
// I want to send this to the main thread:
String::from("Hello world!")
});
// How to recover the data from the other thread?
handle.join().unwrap();
}
There are lots of ways to send send data between threads -- without a clear "best" solution. It depends on your situation.
Using just thread::join
Many people do not realize that you can very easily send data with only the thread API, but only twice: once to the new thread and once back.
use std::thread;
let data_in = String::from("lots of data");
let handle = thread::spawn(move || {
println!("{}", data_in); // we can use the data here!
let data_out = heavy_compuations();
data_out // <-- simply return the data from the closure
});
let data_out = handle.join().expect("thread panicked :(");
println!("{}", data_out); // we can use the data generated in the thread here!
(Playground)
This is immensely useful for threads that are just spawned to do one specific job. Note the move keyword before the closure that makes sure all referenced variables are moved into the closure (which is then moved to another thread).
Channels from std
The standard library offers a multi producer single consumer channel in std::sync::mpsc. You can send arbitrarily many values through a channel, so it can be used in more situations. Simple example:
use std::{
sync::mpsc::channel,
thread,
time::Duration,
};
let (sender, receiver) = channel();
thread::spawn(move || {
sender.send("heavy computation 1").expect("receiver hung up :(");
thread::sleep(Duration::from_millis(500));
sender.send("heavy computation 2").expect("receiver hung up :(");
});
let result1 = receiver.recv().unwrap();
let result2 = receiver.recv().unwrap();
(Playground)
Of course you can create another channel to provide communication in the other direction as well.
More powerful channels by crossbeam
Unfortunately, the standard library currently only provides channels that are restricted to a single consumer (i.e. Receiver can't be cloned). To get more powerful channels, you probably want to use the channels from the awesome crossbeam library. Their description:
This crate is an alternative to std::sync::mpsc with more features and better performance.
In particular, it is a mpmc (multi consumer!) channel. This provides a nice way to easily share work between multiple threads. Example:
use std::thread;
// You might want to use a bounded channel instead...
let (sender, receiver) = crossbeam_channel::unbounded();
for _ in 0..num_cpus::get() {
let receiver = receiver.clone(); // clone for this thread
thread::spawn(move || {
for job in receiver {
// process job
}
});
}
// Generate jobs
for x in 0..10_000 {
sender.send(x).expect("all threads hung up :(");
}
(Playground)
Again, adding another channel allows you to communicate results back to the main thread.
Other methods
There are plenty of other crates that offer some other means of sending data between threads. Too many to list them here.
Note that sending data is not the only way to communicate between threads. There is also the possibility to share data between threads via Mutex, atomics, lock-free data structures and many other ways. This is conceptually very different. It depends on the situation whether sending or sharing data is the better way to describe your cross thread communication.
The idiomatic way to do so is to use a channel. It conceptually behaves like an unidirectional tunnel: you put something in one end and it comes out the other side.
use std::sync::mpsc::channel;
fn main() {
let (sender, receiver) = channel();
let handle = std::thread::spawn(move || {
sender.send(String::from("Hello world!")).unwrap();
});
let data = receiver.recv().unwrap();
println!("Got {:?}", data);
handle.join().unwrap();
}
The channel won't work anymore when the receiver is dropped.
They are mainly 3 ways to recover the data:
recv will block until something is received
try_recv will return immediately. If the channel is not closed, it is either Ok(data) or Err(TryRevcError::Empty).
recv_timeout is the same as try_recv but it waits to get a data a certain amount of time.

How can I send a message to a specific thread?

I need to create some threads where some of them are going to run until their runner variable value has been changed. This is my minimal code.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {
let mut log_runner = Arc::new(Mutex::new(true));
println!("{}", *log_runner.lock().unwrap());
let mut threads = Vec::new();
{
let mut log_runner_ref = Arc::clone(&log_runner);
// log runner thread
let handle = thread::spawn(move || {
while *log_runner_ref.lock().unwrap() == true {
// DO SOME THINGS CONTINUOUSLY
println!("I'm a separate thread!");
}
});
threads.push(handle);
}
// let the main thread to sleep for x time
thread::sleep(Duration::from_millis(1));
// stop the log_runner thread
*log_runner.lock().unwrap() = false;
// join all threads
for handle in threads {
handle.join().unwrap();
println!("Thread joined!");
}
println!("{}", *log_runner.lock().unwrap());
}
It looks like I'm able to set the log_runner_ref in the log runner thread after 1 second to false. Is there a way to mark the treads with some name / ID or something similar and send a message to a specific thread using its specific marker (name / ID)?
If I understand it correctly, then the let (tx, rx) = mpsc::channel(); can be used for sending messages to all the threads simultaneously rather than to a specific one. I could send some identifier with the messages and each thread will be looking for its own identifier for the decision if to act on received message or not, but I would like to avoid the broadcasting effect.
MPSC stands for Multiple Producers, Single Consumer. As such, no, you cannot use that by itself to send a message to all threads, since for that you'd have to be able to duplicate the consumer. There are tools for this, but the choice of them requires a bit more info than just "MPMC" or "SPMC".
Honestly, if you can rely on channels for messaging (there are cases where it'd be a bad idea), you can create a channel per thread, assign the ID outside of the thread, and keep a HashMap instead of a Vec with the IDs associated to the threads. Receiver<T> can be moved into the thread (it implements Send if T implements Send), so you can quite literally move it in.
You then keep the Sender outside and send stuff to it :-)

Resources