How to find if tokio::sync::mpsc::Receiver has been closed?

I have a loop where I do some work and send the result with a Sender. The work takes time and I need to retry it in case of failure. It's possible that while I'm retrying, the receiver gets closed, and my retries will be a waste of time. Because of this, I need a way to check whether the Receiver is still around without sending a message.
In an ideal world, I want my code to look like this in pseudocode:
let (tx, rx) = tokio::sync::mpsc::channel(1);

tokio::spawn(async move {
    // do some stuff with rx and drop it after some time
    rx.recv(...).await;
});

let mut attempts = 0;
loop {
    if tx.is_closed() {
        break;
    }
    if let Ok(result) = do_work().await {
        attempts = 0;
        let _ = tx.send(result).await;
    } else {
        if attempts >= 10 {
            break;
        } else {
            attempts += 1;
            continue;
        }
    }
}
The problem is that Sender doesn't have an is_closed method. It does have pub fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), ClosedError>>, but I don't know what Context is or where I can find it.
When I don't have a value to send, how can I check if the sender is able to send?

Sender has a try_send method:
Attempts to immediately send a message on this Sender
This method differs from send by returning immediately if the channel's buffer is full or no receiver is waiting to acquire some data. Compared with send, this function has two failure cases instead of one (one for disconnection, one for a full buffer).
Use it instead of send and check for the error:

if let Err(TrySendError::Closed(_)) = tx.try_send(result) {
    break;
}

Keep in mind that try_send has a second failure case, TrySendError::Full, which the pattern above silently ignores (the message is dropped); handle it separately if you still need to deliver the value.
It is possible to do what you want by using poll_fn from the futures crate. It adapts a function returning Poll into a Future:
use futures::future::poll_fn; // 0.3.5
use std::future::Future;
use tokio::sync::mpsc::{channel, error::ClosedError, Sender}; // 0.2.22
use tokio::time::delay_for; // 0.2.22

fn wait_until_ready<'a, T>(
    sender: &'a mut Sender<T>,
) -> impl Future<Output = Result<(), ClosedError>> + 'a {
    poll_fn(move |cx| sender.poll_ready(cx))
}

#[tokio::main]
async fn main() {
    let (mut tx, mut rx) = channel::<i32>(1);

    tokio::spawn(async move {
        // Receive one value and close the channel.
        let val = rx.recv().await;
        println!("{:?}", val);
    });

    wait_until_ready(&mut tx).await.unwrap();
    tx.send(123).await.unwrap();

    wait_until_ready(&mut tx).await.unwrap();
    delay_for(std::time::Duration::from_secs(1)).await;
    tx.send(456).await.unwrap(); // 456 likely never printed out,
                                 // despite having a positive readiness response
                                 // and the send "succeeding"
}
Note, however, that in the general case this is susceptible to TOCTOU: even though Sender's poll_ready reserves a slot in the channel for later use, it is possible that the receiving end is closed between the readiness check and the actual send. I tried to indicate this in the code.
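Worth noting for readers on newer tokio: since the 0.3/1.x releases, Sender exposes both an is_closed() method and an async closed() method, so the check from the question's pseudocode can be written directly, with no poll_fn dance. A minimal sketch against the tokio 1.x API:

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<i32>(1);

    tokio::spawn(async move {
        // Receive one value, then drop rx, closing the channel.
        println!("{:?}", rx.recv().await);
    });

    tx.send(1).await.unwrap();
    // Resolves once the receiver half has been dropped or closed.
    tx.closed().await;
    assert!(tx.is_closed());
}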

Send a null message that the receiver ignores. It could be anything. For example, if you're sending T now you could change it to Option<T> and have the receiver ignore Nones.
Yeah, that will work, although I don't really like this approach since I'd need to change the communication format.
I wouldn't get hung up on the communication format. This isn't a well-defined network protocol that should be isolated from implementation details; it's an internal communication mechanism between two pieces of your own code.
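For completeness, a rough sketch of that workaround; the Option wrapper and the names here are illustrative, not from the question:

let (tx, mut rx) = tokio::sync::mpsc::channel::<Option<i32>>(1);

tokio::spawn(async move {
    while let Some(msg) = rx.recv().await {
        if let Some(value) = msg {
            println!("got {value}");
        }
        // None is just a liveness probe; ignore it.
    }
});

// In the producer's retry loop: a failed probe means the receiver is gone.
if tx.send(None).await.is_err() {
    // receiver closed, stop retrying
}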

Related

Why is the channel in the example code of tokio::sync::Notify an mpsc?

I'm learning the synchronization primitives of tokio. From the example code of Notify, I found it confusing to understand why Channel<T> is an mpsc.
use tokio::sync::Notify;

use std::collections::VecDeque;
use std::sync::Mutex;

struct Channel<T> {
    values: Mutex<VecDeque<T>>,
    notify: Notify,
}

impl<T> Channel<T> {
    pub fn send(&self, value: T) {
        self.values.lock().unwrap()
            .push_back(value);

        // Notify the consumer a value is available
        self.notify.notify_one();
    }

    // This is a single-consumer channel, so several concurrent calls to
    // `recv` are not allowed.
    pub async fn recv(&self) -> T {
        loop {
            // Drain values
            if let Some(value) = self.values.lock().unwrap().pop_front() {
                return value;
            }

            // Wait for values to be available
            self.notify.notified().await;
        }
    }
}
If there are elements in values, a consumer task will take one away.
If there are no elements in values, a consumer task will yield until the producer notifies it.
But after writing some test code, I found no case where the consumer loses the notice from the producer.
Could someone give me test code to prove that the above Channel<T> fails to work as an mpmc?
The following code shows why it is unsafe to use the above channel as an mpmc.
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let mut i = 0;
    loop {
        let ch = Arc::new(Channel {
            values: Mutex::new(VecDeque::new()),
            notify: Notify::new(),
        });
        let mut handles = vec![];
        for i in 0..100 {
            if i % 2 == 1 {
                for _ in 0..2 {
                    let sender = ch.clone();
                    tokio::spawn(async move {
                        sender.send(1);
                    });
                }
            } else {
                for _ in 0..2 {
                    let receiver = ch.clone();
                    let handle = tokio::spawn(async move {
                        receiver.recv().await;
                    });
                    handles.push(handle);
                }
            }
        }
        futures::future::join_all(handles).await;
        i += 1;
        println!("No.{i} loop finished.");
    }
}
If the program never reaches the next loop iteration, some consumer tasks have not finished, which means those consumer tasks missed a notify.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: multiple calls to notify_one only result in a single token if no thread is waiting yet. And there is a short time window where it is possible that multiple threads have already checked for the existence of an item but aren't waiting yet.
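One way to close that window (the approach the current tokio Notify docs take for their bounded-channel example, if I remember right) is to create the Notified future first and enable it before checking the queue, so a permit that arrives in between is stored rather than lost. A sketch of recv rewritten that way, using Notified::enable from tokio 1.x:

pub async fn recv(&self) -> T {
    // Create the future eagerly so `enable` can register us as a
    // waiter *before* we look at the queue.
    let future = self.notify.notified();
    tokio::pin!(future);

    loop {
        // Register as a pending waiter; from this point on a
        // notify_one() stores a permit instead of being dropped.
        future.as_mut().enable();

        if let Some(value) = self.values.lock().unwrap().pop_front() {
            return value;
        }

        // Wait for a permit, then re-arm with a fresh Notified.
        future.as_mut().await;
        future.set(self.notify.notified());
    }
}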

Why does my asynchronous request pool (using crossbeam channels) block?

My main goal is to write an API server which retrieves part of its information from another, external API server. However, that API server is quite fragile, therefore I would like to limit the global number of concurrent requests made to it, for example to 10 or 20.
Thus, my idea was to write something like an HttpPool, which consumes tasks via a crossbeam bounded channel and distributes them among tokio tasks. The idea was to use a bounded channel to avoid publishing too much work, and a fixed set of tasks to limit the number of concurrent requests to the external API.
It seems to work as long as I do not create more than 8 tasks. If I define more, it blocks after fetching the first tasks from the queue.
use std::{error::Error, result::Result};
use tokio::sync::oneshot::Sender;
use tokio::time::timeout;
use tokio::time::{sleep, Duration};

use crossbeam_channel;

#[derive(Debug)]
struct HttpTaskRequest {
    url: String,
    result: Sender<String>,
}

type PoolSender = crossbeam_channel::Sender<HttpTaskRequest>;
type PoolReceiver = crossbeam_channel::Receiver<HttpTaskRequest>;

#[derive(Debug)]
struct HttpPool {
    size: i32,
    sender: PoolSender,
    receiver: PoolReceiver,
}

impl HttpPool {
    fn new(capacity: i32) -> Self {
        let (tx, rx) = crossbeam_channel::bounded::<HttpTaskRequest>(capacity as usize);
        HttpPool {
            size: capacity,
            sender: tx,
            receiver: rx,
        }
    }

    async fn start(self) -> Result<HttpPool, Box<dyn Error>> {
        for i in 0..self.size {
            let task_receiver = self.receiver.clone();
            tokio::spawn(async move {
                loop {
                    match task_receiver.recv() {
                        Ok(request) => {
                            if request.result.is_closed() {
                                println!("Task[{i}] received url {} already closed by receiver, seems to reach timeout already", request.url);
                            } else {
                                println!("Task[{i}] started to work {:?}", request.url);
                                let resp = reqwest::get("https://httpbin.org/ip").await;
                                println!("Resp: {:?}", resp);
                                println!("Done Send request for url {}", request.url);
                                request.result.send("Result".to_owned()).expect("Failed to send result");
                            }
                        }
                        Err(err) => println!("Error: {err}"),
                    }
                }
            });
        }

        Ok(self)
    }

    pub async fn request(&self, url: String) -> Result<(), Box<dyn Error>> {
        let (os_sender, os_receiver) = tokio::sync::oneshot::channel::<String>();
        let request = HttpTaskRequest {
            result: os_sender,
            url: url.clone(),
        };
        self.sender.send(request).expect("Failed to publish message to task group");

        // check if a timeout or value was returned
        match timeout(Duration::from_millis(100), os_receiver).await {
            Ok(res) => {
                println!("Request finished without reaching the timeout {}", res.unwrap());
            }
            Err(_) => {
                println!("Request {url} run into timeout");
            }
        }
        Ok(())
    }
}

#[tokio::main]
async fn main() {
    let http_pool = HttpPool::new(20).start().await.expect("Failed to start http pool");

    for i in 0..10 {
        let url = format!("T{}", i.to_string());
        http_pool.request(url).await.expect("Failed to request message");
    }

    loop {}
}
Maybe somebody can explain why the code blocks? Is it related to tokio::spawn?
I guess my attempt is wrong, so please let me know if there is another way to handle it. The goal can be summarized like this: I would like to request URLs and process them in such a fashion that no more than N concurrent requests are made against the API server.
I have read this question: How can I perform parallel asynchronous HTTP GET requests with reqwest?. But there, they know the work up front, which is not the case in my example; requests arrive on the fly, hence I am not sure how to handle them.
I have finally solved the mystery about the blocking in my code example above. As we can see, I used the crate crossbeam_channel, which does not cooperate with async code. If we call recv on this type of channel, the thread blocks until a message is received. Hence, there is no way we can return to the tokio scheduler, which implies that no other task is able to run. To refresh your memories: async code only returns to the scheduler when an .await is reached.
Furthermore, the code was working if we spawned fewer tasks than worker threads. The default number of worker threads equals the CPU core count, in my case eight. Hence, if I started more than that, all worker threads were blocked and the application froze.
The fix was to replace the crate crossbeam-channel with async-channel, as stated on the tokio tutorial page.
In case my answer is vague, I recommend reading the following posts:
https://github.com/tokio-rs/tokio/discussions/3858
https://ryhl.io/blog/async-what-is-blocking/
https://crates.io/crates/async-channel
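To make the fix concrete, here is a rough sketch of the worker setup from start rewritten on top of async-channel (capacity and size stand in for the HttpPool fields above, and HttpTaskRequest is the type from the question): async_channel's recv() is an async fn, so waiting for work suspends the task and yields back to the tokio scheduler instead of blocking the worker thread.

let (tx, rx) = async_channel::bounded::<HttpTaskRequest>(capacity);

for i in 0..size {
    let task_receiver = rx.clone();
    tokio::spawn(async move {
        // recv().await suspends this task rather than blocking the OS
        // thread, so any number of workers can share the runtime.
        while let Ok(request) = task_receiver.recv().await {
            println!("Task[{i}] handling {}", request.url);
            let resp = reqwest::get("https://httpbin.org/ip").await;
            let _ = request.result.send(format!("{resp:?}"));
        }
    });
}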

Rust deadlock with shared struct: Arc + channel + atomic

I'm new to Rust and was trying to generate plenty of JSON data on the fly for a project, but I'm having deadlocks.
I've tried removing the serialization (serde_json) and sending the HashMaps through the channel instead, but I still get deadlocks on my computer. If I instead comment out the send(generator.next()) line and send a string I build myself, the code works flawlessly. Thus the deadlock is caused by my DatasetGenerator, but I don't understand why.
Code summary:
Have a DatasetGenerator object that can generate sequences of "events" and serialize them to JSON.
generator.next() works like an "iterator": it increments an internal atomic counter in the generator, then generates the i-th item in the sequence and serializes it to JSON.
Have a generator threadpool generate these JSONs at high throughput (very large payloads each)
Send these JSONs through a channel to another thread (which will send them over the network, but that's irrelevant for this question)
Depending on whether I comment tx_ref.send(generator_ref.next()) or tx_ref.send(some_new_string) below, my code deadlocks or succeeds:
src/main.rs:
extern crate threads_pool;
use threads_pool::*;
mod generator;
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
fn main() {
// N will be an argument, and a very high number. For tests use this:
const N: i64 = 12; // Increase this if you're not getting the deadlock yet, or run cargo run again until it happens.
let (tx, rx) = mpsc::channel();
let tx_producer = tx.clone();
let producer_thread = thread::spawn(move || {
let pool = ThreadPool::new(4);
let generator = Arc::new(generator::data_generator::DatasetGenerator::new(3000));
for i in 0..N {
println!("Generating #{}", i);
let tx_ref = tx_producer.clone();
let generator_ref = generator.clone();
pool.execute(move || {
////////// v !!!DEADLOCK HERE!!! v //////////
tx_ref.send(generator_ref.next()).expect("tx failed."); // This locks!
//tx_ref.send(format!(" {} ", i)).expect("tx failed."); // This works!
////////// ^ !!!DEADLOCK HERE!!! ^ //////////
})
.unwrap();
}
println!("Generator done!");
});
println!("-» Consumer consuming!");
for j in 0..N {
let s = rx.recv().expect("rx failed");
println!("-» Consumed #{}: {} ... ", j, &s[..10]);
}
println!("Consumer done!!");
producer_thread.join().unwrap();
println!("Success. Exit!");
}
This is my DatasetGenerator, which seems to be causing all the trouble (not using serde and outputting the HashMaps directly still deadlocks). src/generator/dataset_generator.rs:
use serde_json::Value;
use std::collections::HashMap;
use std::sync::atomic;

pub struct DatasetGenerator {
    num_features: usize,
    pub counter: atomic::AtomicI64,
    feature_names: Vec<String>,
}

type Datapoint = HashMap<String, Value>;
type Out = String;

impl DatasetGenerator {
    pub fn new(num_features: usize) -> DatasetGenerator {
        let mut feature_names = Vec::new();
        for i in 0..num_features {
            feature_names.push(format!("f_{}", i));
        }
        DatasetGenerator {
            num_features,
            counter: atomic::AtomicI64::new(0),
            feature_names,
        }
    }

    /// Generates the next item in the sequence (iterator-like).
    pub fn next(&self) -> Out {
        let value = self.counter.fetch_add(1, atomic::Ordering::SeqCst);
        self.gen(value)
    }

    /// Generates the ith item in the sequence. DEADLOCKS!!! ///////////////////////////
    pub fn gen(&self, ith: i64) -> Out {
        let mut data = Datapoint::with_capacity(self.num_features);
        for f in 0..self.num_features {
            let name = self.feature_names.get(f).unwrap();
            data.insert(name.to_string(), Value::from(ith));
        }
        serde_json::json!(data).to_string() // Tried without serialization and still deadlocks!
    }
}
The commit with the deadlock code is here if you want to try it yourself with cargo run: https://github.com/AlbertoEAF/learn-rust/tree/dc5fa867e5a70b605553ef65796fdc9dd42d38a0/rest-injector
The deadlock occurs on Windows with Rust 1.60.0 (screenshot omitted).
Thank you for the help! It's greatly appreciated :)
Update
I've followed the suggestions from @kmdreko's answer below, and apparently the problem is in the generator: not all the items are generated. Even though pool.execute() is called N times, only a random number of closures c < N is executed, even if I place pool.close() before leaving producer_thread. Why does that happen, and how can it be fixed?
Fix: Turns out this lockup is caused by the threads_pool library (0.2.6). I switched the thread pool to rayon's and it worked smoothly on the first try.
One thing you should change: an mpsc::Receiver will return an error from .recv() if it cannot possibly yield a result, by realizing that all the associated mpsc::Senders have been dropped; this is a good indicator that all the work is done. Your tx_refs and even tx_producer will be dropped when their respective tasks/threads complete, but you still have tx in scope, which could theoretically give a value. This is what gives you the apparent deadlock. You should simply remove tx_producer and use tx directly, so it is moved into the producer thread and dropped accordingly.
Now, you'll see either all N tasks complete, or you'll get an error indicating that some tasks did not complete. The reason not all tasks are completing is because you're creating the thread pool, spawning all the tasks, and then immediately destroying it. The threads_pool documentation says that the threads will finish their current job when the pool is destroyed, but you want to wait until all jobs have completed. For that you need to call the .close() method provided by the PoolManager trait before the end of the closure.
The reason you saw inconsistent behavior, but benefited from returning a string directly, is that those jobs required less work, and the threads could get away with completing all of them before they saw their signal to exit. Your generator_ref.next() requires much more computation, so it's not surprising they'd only process 4-plus-a-bit jobs before they see they've been told to exit.
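For reference, a sketch of producer_thread with both changes applied; this assumes threads_pool's close() behaves as its docs describe, and the rest mirrors the code from the question:

let (tx, rx) = mpsc::channel();

let producer_thread = thread::spawn(move || {
    // `tx` itself is moved in here, so it is dropped when this thread
    // ends and the consumer's rx.recv() returns Err instead of hanging.
    let mut pool = ThreadPool::new(4);
    let generator = Arc::new(generator::dataset_generator::DatasetGenerator::new(3000));
    for i in 0..N {
        println!("Generating #{}", i);
        let tx_ref = tx.clone();
        let generator_ref = generator.clone();
        pool.execute(move || {
            tx_ref.send(generator_ref.next()).expect("tx failed.");
        })
        .unwrap();
    }
    // Wait for queued jobs to finish instead of tearing the pool down
    // while they are still pending.
    pool.close();
});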

Sharing Mutable Data Between Threads in Rust

I know there are hundreds of questions just like this one, but I'm having trouble wrapping my head around how to do the thing I'm trying to do.
I want an http server that accepts and processes events. On receiving/processing an event, I want the EventManager to send an update to an ApplicationMonitor that is tracking how many events have been accepted/processed. The ApplicationMonitor would also (eventually) handle things like tracking the number of concurrent connections, but in this example I just want my EventManager to send an Inc("event_accepted") update to my ApplicationMonitor.
To be useful, I need the ApplicationMonitor to be able to return a snapshot of the stats when requested through a /stats route.
So I have an ApplicationMonitor which spawns a thread and listens on a channel for incoming Stat events. When it receives a Stat event, it updates the stats HashMap. The stats HashMap must be mutable both within ApplicationMonitor and within the spawned thread.
use std::sync::mpsc;
use std::sync::mpsc::Sender;
use std::thread;
use std::thread::JoinHandle;
use std::collections::HashMap;

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32)
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let mut stats: HashMap<&'static str, i32> = HashMap::new();
        let join_handle = thread::spawn(move || {
            for stat in rx.recv() {
                match stat {
                    Stat::Inc(nm) => {
                        let current_val = stats.entry(nm).or_insert(0);
                        stats.insert(nm, *current_val + 1);
                    },
                    Stat::Dec(nm) => {
                        let current_val = stats.entry(nm).or_insert(0);
                        stats.insert(nm, *current_val - 1);
                    },
                    Stat::Set(nm, val) => {
                        stats.insert(nm, val);
                    }
                }
            }
        });
        let am = ApplicationMonitor {
            tx,
            join_handle
        };
        am
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        self.stats.clone()
    }
}
Because rx cannot be cloned, I must move it into the closure. When I do this, I am no longer able to access stats outside of the thread.
I thought maybe I needed a second channel so the thread could communicate its internals back out, but this doesn't work as I would need another thread to listen for that in a non-blocking way.
Is this where I'd use Arc?
How can I have stats live both inside and outside the thread context?
Yes, this is a place where you'd wrap your stats in an Arc so that you can have multiple references to it from different threads. But just wrapping in an Arc will only give you a read-only view of the HashMap - if you need to be able to modify it, you'll also need to wrap it in something which guarantees that only one thing can modify it at a time. So you'll probably end up with either an Arc<Mutex<HashMap<&'static str, i32>>> or a Arc<RwLock<HashMap<&'static str, i32>>>.
Alternatively, if you're just changing the values, and not adding or removing values, you could potentially use an Arc<HashMap<&'static str, AtomicU32>>, which would allow you to read and modify different values in parallel without needing to take out a map-wide lock, but atomics can be a little more fiddly to understand and use correctly than locks.
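To make that concrete, here is a minimal sketch of the Arc<Mutex<...>> variant, adapted to the ApplicationMonitor from the question. It also iterates the Receiver directly: for stat in rx.recv() in the original iterates a single Result, so it only ever processes one message.

use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::thread::JoinHandle;

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
    stats: Arc<Mutex<HashMap<&'static str, i32>>>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let stats = Arc::new(Mutex::new(HashMap::new()));
        let thread_stats = Arc::clone(&stats);
        let join_handle = thread::spawn(move || {
            // Iterating the Receiver yields every message until all
            // Senders have been dropped.
            for stat in rx {
                let mut stats = thread_stats.lock().unwrap();
                match stat {
                    Stat::Inc(nm) => *stats.entry(nm).or_insert(0) += 1,
                    Stat::Dec(nm) => *stats.entry(nm).or_insert(0) -= 1,
                    Stat::Set(nm, val) => { stats.insert(nm, val); }
                }
            }
        });
        ApplicationMonitor { tx, join_handle, stats }
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        // Clone under the lock so the caller gets a consistent snapshot.
        self.stats.lock().unwrap().clone()
    }
}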

Moving Receiver to thread complains about Sync, but expected Send

I'm trying to share a receiver, via an Arc, with a thread, so I can do centralized pub-sub via a dispatcher. However, I get the following error:
src/dispatcher.rs:58:11: 58:24 error: the trait `core::marker::Sync` is not implemented for the type `core::cell::UnsafeCell<std::sync::mpsc::Flavor<dispatcher::DispatchMessage>>` [E0277]
src/dispatcher.rs:58 thread::spawn(move || {
^~~~~~~~~~~~~
src/dispatcher.rs:58:11: 58:24 note: `core::cell::UnsafeCell<std::sync::mpsc::Flavor<dispatcher::DispatchMessage>>` cannot be shared between threads safely
src/dispatcher.rs:58 thread::spawn(move || {
Wat! I thought only Send was required for moving across channels? The code of DispatchMessage is:
#[derive(PartialEq, Debug, Clone)]
enum DispatchType {
    ChangeCurrentChannel,
    OutgoingMessage,
    IncomingMessage
}

#[derive(Clone)]
struct DispatchMessage {
    dispatch_type: DispatchType,
    payload: String
}
Both String and surely Enum are Send, right? Why is it complaining about Sync?
The relevant part from the dispatcher:
pub fn start(&self) {
    let shared_subscribers = Arc::new(self.subscribers);
    for ref broadcaster in &self.broadcasters {
        let shared_broadcaster = Arc::new(Mutex::new(broadcaster));
        let broadcaster = shared_broadcaster.clone();
        let subscribers = shared_subscribers.clone();
        thread::spawn(move || {
            loop {
                let message = &broadcaster.lock().unwrap().recv().ok().expect("Couldn't receive message in broadcaster");
                match subscribers.get(type_to_str(&message.dispatch_type)) {
                    Some(ref subs) => {
                        for sub in subs.iter() { sub.send(*message).unwrap(); }
                    },
                    None => ()
                }
            }
        });
    }
}
Full dispatcher code is in this gist: https://gist.github.com/timonv/5cdc56bf671cee69d3fa
If it's still relevant, built against the 5-2-2015 nightly.
Arc requires Sync, and it seems to me like you're attempting to put channels inside an Arc. Channels are not Sync, neither Sender nor Receiver.
Without knowing what you're trying to do, here are some things that may help you:
it's possible to clone a Sender, so where you would normally Arc a T and share it between many threads, you can instead clone the Sender and give one to each thread, since it is Send
otherwise (and especially for Receiver, which you can't clone) you have to stick it inside an Arc<Mutex<T>>, which makes it Sync. A sketch of both approaches follows.
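A minimal, self-contained sketch of those two options; all names here are illustrative:

use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // Option 1: clone the Sender, one per thread; Sender is Send.
    for id in 0..4 {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(format!("hello from {id}")).unwrap();
        });
    }
    drop(tx); // drop the original so rx sees the channel close

    // Option 2: share the single Receiver behind Arc<Mutex<..>>.
    let rx = Arc::new(Mutex::new(rx));
    let mut handles = vec![];
    for _ in 0..2 {
        let rx = Arc::clone(&rx);
        handles.push(thread::spawn(move || {
            loop {
                // Hold the lock only long enough to pull one message.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok(msg) => println!("{msg}"),
                    Err(_) => break, // all senders dropped
                }
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
}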
Although Jorge is correct in the general sense, the problem with this particular code is that creating an Arc<Mutex<...>> takes ownership of its argument, so it cannot be built around a reference. This makes sense when you think about it: how can you lock something that is not yours? More concretely, we need to lock whatever is at that memory location, not the pointer to it.
Changing the code to create the Arc Mutex when the broadcaster is added to the struct solves the problem. This would change that part of the code to:
pub fn register_broadcaster(&mut self, broadcaster: &mut Broadcast) {
    let handle = Arc::new(Mutex::new(broadcaster.broadcast_handle()));
    self.broadcasters.push(handle);
}
And then the start method of the dispatcher would look like:
pub fn start(&self) {
    // Assuming that broadcasters.clone() copies the vector, but increases the ref count on the elements
    for broadcaster in self.broadcasters.clone() {
        let subscribers = self.subscribers.clone();
        thread::spawn(move || {
            loop {
                let message = broadcaster.lock().unwrap().recv().ok().expect("Couldn't receive message in broadcaster or channel hung up");
                match subscribers.get(type_to_str(&message.dispatch_type)) {
                    Some(ref subs) => {
                        for sub in subs.iter() { sub.send(message.clone()).unwrap(); }
                    },
                    None => ()
                }
            }
        });
    }
}