Is it possible, if a task sends to a and an other (at the same time) sends to b, that tokio::select! on a and b drops one of the value by cancelling the remaining future? Or is it guaranteed to be received at the next loop iteration?
use tokio::sync::mpsc::Receiver;
async fn foo(mut a: Receiver<()>, mut b: Receiver<()>) {
loop {
tokio::select!{
_ = a.recv() => {
println!("A!");
}
_ = b.recv() => {
println!("B!");
}
}
}
}
My mind can't get around what is really happening behind the async magic in that case.
It doesn't appear to be guaranteed in the documentation anywhere, but is likely to work for reading directly from a channel because of the way rusts poll based architecture works. A select is equivalent to polling each of futures in a random order, until one of them is ready, or if none are, then waiting until a waker is signaled and then repeating the process. A message is only removed from the channel when returned by a successful poll. A successful poll stops the select, so the rest of the channels will not be touched. Thus they will be polled the next time the loop occurs and then return the message.
However, this is a dangerous approach, because if the receiver is replaced with something that returns a future that does anything more complex than a direct read, where it could potentially suspend after the read, then you could lose messages when that happens. As such it should probably be treated as if it wouldn't work. A safer approach would be to store the futures in mutable variables that you update when they fire:
use tokio::sync::mpsc::Receiver;
async fn foo(mut a: Receiver<()>, mut b: Receiver<()>) {
let mut a_fut = a.recv();
let mut b_fut = b.recv();
loop {
tokio::select!{
_ = a_fut => {
println!("A!");
a_fut = a.recv();
}
_ = b_fut => {
println!("B!");
b_fut = b.recv();
}
}
}
}
Related
My requirement is very simple, which is a very reasonable requirement in many programs. It is to send a specified message to my Channel after a specified time.
I've checked tokio for topics related to delay, interval or timeout, but none of them seem that straightforward to implement.
What I've come up with now is to spawn an asynchronous task, then wait or sleep for a certain amount of time, and finally send the message.
But, obviously, spawning an asynchronous task is a relatively heavy operation. Is there a better solution?
async fn my_handler(sender: mpsc::Sender<i32>, dur: Duration) {
tokio::spawn(async {
time::sleep(dur).await;
sender.send(0).await;
}
}
You could try adding a second channel and a continuously running task that buffers messages until the time they are to be received. Implementing this is more involved than it sounds, I hope I'm handling cancellations right here:
fn make_timed_channel<T: Ord + Send + Sync + 'static>() -> (Sender<(Instant, T)>, Receiver<T>) {
// Ord is an unnecessary requirement arising from me stuffing both the Instant and the T into the Binary heap
// You could drop this requirement by using the priority_queue crate instead
let (sender1, receiver1) = mpsc::channel::<(Instant, T)>(42);
let (sender2, receiver2) = mpsc::channel::<T>(42);
let mut receiver1 = Some(receiver1);
tokio::spawn(async move {
let mut buf = std::collections::BinaryHeap::<Reverse<(Instant, T)>>::new();
loop {
// Pretend we're a bounded channel or exit if the upstream closed
if buf.len() >= 42 || receiver1.is_none() {
match buf.pop() {
Some(Reverse((time, element))) => {
sleep_until(time).await;
if sender2.send(element).await.is_err() {
break;
}
}
None => break,
}
}
// We have some deadline to send a message at
else if let Some(Reverse((then, _))) = buf.peek() {
if let Ok(recv) = timeout_at(*then, receiver1.as_mut().unwrap().recv()).await {
match recv {
Some(recv) => buf.push(Reverse(recv)),
None => receiver1 = None,
}
} else {
if sender2.send(buf.pop().unwrap().0 .1).await.is_err() {
break;
}
}
}
// We're empty, wait around
else {
match receiver1.as_mut().unwrap().recv().await {
Some(recv) => buf.push(Reverse(recv)),
None => receiver1 = None,
}
}
}
});
(sender1, receiver2)
}
Playground
Whether this is more efficient than spawning tasks, you'd have to benchmark. (I doubt it. Tokio iirc has some much fancier solution than a BinaryHeap for waiting for waking up at the next timeout, e.g.)
One optimization you could make if you don't need a Receiver<T> but just something that .poll().await can be called on: You could drop the second channel and maintain the BinaryHeap inside a custom receiver.
In my application I have a blocking task that synchronically reads messages from a queue and feeds them to a running task.
All of this works fine, but the problem that I'm having is that the process does not terminate correctly, since the queue_reader task does not stop.
I've constructed a small example based on the tokio documentation at: https://docs.rs/tokio/1.20.1/tokio/task/fn.spawn_blocking.html
use tokio::sync::mpsc;
use tokio::task;
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
loop {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
println!("Finalizing thread");
queue_reader.abort(); // This doesn't seem to terminate the queue_reader task
queue_reader.await.unwrap(); // <-- The process hangs on this task.
println!("Done");
}
At first I expected that queue_reader.abort() should terminate the task, however it doesn't. My expectation is that tokio can only do this for tasks that use .await internally, because that will handle control over to tokio. Is this right?
In order to terminate the queue_reader task I introduced a oneshot channel, over which I signal the termination, as shown in the next snippet.
use tokio::task;
use tokio::sync::{oneshot, mpsc};
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// A new channel to communicate when the process must finish.
let (term_tx, mut term_rx) = oneshot::channel();
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
// As long as termination is not signalled
while term_rx.try_recv().is_err() {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
// Signal termination
term_tx.send(()).unwrap();
println!("Finalizing thread");
queue_reader.await.unwrap();
println!("Done");
}
My question is, is this the canonical/best way to do this, or are there better alternatives?
Tokio cannot terminate CPU-bound/blocking tasks.
It is technically possible to kill OS threads, but generally it is not a good idea, as it's expensive to create new threads and it can leave your program in an invalid state. Even if Tokio decided this was something worth implementing, it would serverely limit its implementation - it would be forced into a multithread model, just to support the possibility that you'd want to kill a blocking task before it's finished.
Your solution is pretty good; give your blocking task the responsibility for terminating itself and provide a way to tell it to do so. If this future was part of a library, you could abstract the mechanism away by returning a "handle" to the task that had a cancel() method.
Are there better alternatives? Maybe, but that would depend on other factors. Your solution is good and easily extended, for example if you later needed to send different types of signal to the task.
Recently, I am reading source code in the repo parity-bridges-common. There are some rs files full of incomprehensibly asynchronous syntax that I am not familiar with. Especially about FusedFuture. Example file:
sync_loop.rs
message_loop.rs
According to the futures::future::fuse, what I understand is that we can transfer a Future into FusedFuture and then poll it, again and again, for many times. Besides this, I find a function named terminated() in FuseFurture and example below.
Creates a new Fuse-wrapped future which is already terminated.
This can be useful in combination with looping and the select! macro, which bypasses terminated futures.
let (sender, mut stream) = mpsc::unbounded();
// Send a few messages into the stream
sender.unbounded_send(()).unwrap();
sender.unbounded_send(()).unwrap();
drop(sender);
// Use `Fuse::terminated()` to create an already-terminated future
// which may be instantiated later.
// bear: We must terminated() here in order to use select!?
let foo_printer = Fuse::terminated();
pin_mut!(foo_printer);
loop {
select! {
_ = foo_printer => {},
() = stream.select_next_some() => {
if !foo_printer.is_terminated() {
println!("Foo is already being printed!");
} else {
// bear: here we reset the foo_printer pointed value?
foo_printer.set(async {
// do some other async operations
println!("Printing foo from `foo_printer` future");
}.fuse());
}
},
complete => break, // `foo_printer` is terminated and the stream is done
}
}
I cannot understand when to use this function and how to combine it with select!. Can anyone help me understand it more easily? Or are there better docs or examples about this usage?
Some posts i found, maybe useful:
Why doesn’t tokio::select! require FusedFuture?
What is the difference between futures::select! and tokio::select?
I'm trying to understand how to implement polling multiple futures with different types. For context, I'm calling an API that will return something like:
[{"type": "source_a", "id": 123}, {"type": "source_b", "id": 234}, ...]
Each type in the API response requires a call to another API, with each API returning different data types. The code I've written works something like this:
async fn get_data(sources: Vec<Source>) -> Data {
let mut data = Default::default();
for source in sources {
if source.kind == "source_a" {
let source_data = get_source_a(source).await;
process_source_a(source_data, &mut data);
} else if source.kind == "source_b" {
...
}
}
data
}
This won't run concurrently, it will simply fetch sources one at a time and process them. How can I rewrite this so that each source is fetched concurrently and then processed once data is available? Speaking Rustily, I think what I want is to execute a closure that mutably borrows data when the future is ready. Should I be looking at something like an Arc<RefCell<Data>>?
To process the futures in parallel, you need to await something like join_all, which will run them concurrently and return when they are all done. For this to work, you have to resolve two issues:
join_all requires futures of the same type, so you need to box them or otherwise unify them.
data needs to be accessed by multiple async blocks, so it needs to be protected by Arc and Mutex.
The first issue can be solved simply by spawning the async fns as tasks, which has the added advantage of potentially running them in parallel (in addition to them being run concurrently). The example below uses tokio::spawn, but it would be almost exactly the same for async_std. Since there is no reproducible example, I can't test the code, but it could look like this:
async fn get_data(sources: Vec<Source>) -> Data {
let data = Arc::new(Mutex::new(Data::default()));
let mut tasks = vec![];
for source in sources {
if source.kind == "source_a" {
let data = Arc::clone(&data);
tasks.push(tokio::task::spawn(async move {
let source_data = get_source_a(source).await;
process_source_a(source_data, &mut data.lock().unwrap());
}));
} else if source.kind == "source_b" {
// ...
}
}
// Wait for all sources to finish and propagate the panic if any.
// With async_std this wouldn't require the `for_each()`.
futures::future::join_all(tasks)
.await
.for_each(|x| x.unwrap());
// As all tasks are done, there should be no references to `data` at
// this point, so we can extract it out of the Arc<Mutex<_>> wrapping.
data.try_unwrap().unwrap().into_inner()
}
Here's an example but what should I wait on to decide when it is done. Do we have a better way to wait for the channel to be empty and all the threads to have completed? Full example is at http://github.com/posix4e/rust_webcrawl
loop {
let n_active_threads = running_threads.compare_and_swap(0, 0, Ordering::SeqCst);
match rx.try_recv() {
Ok(new_site) => {
let new_site_copy = new_site.clone();
let tx_copy = tx.clone();
counter += 1;
print!("{} ", counter);
if !found_urls.contains(&new_site) {
found_urls.insert(new_site);
running_threads.fetch_add(1, Ordering::SeqCst);
let my_running_threads = running_threads.clone();
pool.execute(move || {
for new_url in get_websites_helper(new_site_copy) {
if new_url.starts_with("http") {
tx_copy.send(new_url).unwrap();
}
}
my_running_threads.fetch_sub(1, Ordering::SeqCst);
});
}
}
Err(TryRecvError::Empty) if n_active_threads == 0 => break,
Err(TryRecvError::Empty) => {
writeln!(&mut std::io::stderr(),
"Channel is empty, but there are {} threads running",
n_active_threads);
thread::sleep_ms(10);
},
Err(TryRecvError::Disconnected) => unreachable!(),
}
}
This is actually a very complicated question, one with a great potential for race conditions! As I understand it, you:
Have an unbounded queue
Have a set of workers that operate on the queue items
The workers can put an unknown amount of items back into the queue
Want to know when everything is "done"
One obvious issue is that it may never be done. If every worker puts one item back into the queue, you've got an infinite loop.
That being said, I feel like the solution is to track
How many items are queued
How many items are in progress
When both of these values are zero, then you are done. Easier said than done...
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize,Ordering};
use std::sync::mpsc::{channel,TryRecvError};
use std::thread;
fn main() {
let running_threads = Arc::new(AtomicUsize::new(0));
let (tx, rx) = channel();
// We prime the channel with the first bit of work
tx.send(10).unwrap();
loop {
// In an attempt to avoid a race condition, we fetch the
// active thread count before checking the channel. Otherwise,
// we might read nothing from the channel, and *then* a thread
// finishes and added something to the queue.
let n_active_threads = running_threads.compare_and_swap(0, 0, Ordering::SeqCst);
match rx.try_recv() {
Ok(id) => {
// I lie a bit and increment the counter to start
// with. If we let the thread increment this, we might
// read from the channel before the thread ever has a
// chance to run!
running_threads.fetch_add(1, Ordering::SeqCst);
let my_tx = tx.clone();
let my_running_threads = running_threads.clone();
// You could use a threadpool, but I'm spawning
// threads to only rely on stdlib.
thread::spawn(move || {
println!("Working on {}", id);
// Simulate work
thread::sleep_ms(100);
if id != 0 {
my_tx.send(id - 1).unwrap();
// Send multiple sometimes
if id % 3 == 0 && id > 2 {
my_tx.send(id - 2).unwrap();
}
}
my_running_threads.fetch_sub(1, Ordering::SeqCst);
});
},
Err(TryRecvError::Empty) if n_active_threads == 0 => break,
Err(TryRecvError::Empty) => {
println!("Channel is empty, but there are {} threads running", n_active_threads);
// We sleep a bit here, to avoid quickly spinning
// through an empty channel while the worker threads
// work.
thread::sleep_ms(1);
},
Err(TryRecvError::Disconnected) => unreachable!(),
}
}
}
I make no guarantees that this implementation is perfect (I probably should guarantee that it's broken, because threading is hard). One big caveat is that I don't intimately know the meanings of all the variants of Ordering, so I chose the one that looked to give the strongest guarantees.