How do I spawn a long running Tokio task within another task without blocking the parent task? - rust

I'm trying to construct an object that can manage a feed from a websocket but be able to switch between multiple feeds.
There is a Feed trait:
trait Feed {
async fn start(&mut self);
async fn stop(&mut self);
}
There are three structs that implement Feed: A, B, and C.
When start is called, it starts an infinite loop of listening for messages from a websocket and processing each one as it comes in.
I want to implement a FeedManager that maintains a single active feed but can receive commands to switch what feed source it is using.
enum FeedCommand {
Start(String),
Stop,
}
struct FeedManager {
active_feed_handle: tokio::task::JoinHandle,
controller: mpsc::Receiver<FeedCommand>,
}
impl FeedManager {
async fn start(&self) {
while let Some(command) = self.controller.recv().await {
match command {
FeedCommand::Start(feed_type) => {
// somehow tell the active feed to stop (need channel probably) or kill the task?
if feed_type == "A" {
// replace active feed task with a new tokio task for consuming feed A
} else if feed_type == "B" {
// replace active feed task with a new tokio task for consuming feed B
} else {
// replace active feed task with a new tokio task for consuming feed C
}
}
}
}
}
}
I'm struggling to understand how to manage all the of Tokio tasks properly. FeedManager's core loop is to listen forever for new commands that come in, but it needs to be able to spawn another long lived task without blocking on it (so it can listen for commands).
My first attempt was:
if feed_type == "A" {
self.active_feed_handle = tokio::spawn(async {
A::new().start().await;
});
self.active_feed_handle.await
}
the .await on the handle would cause the core loop to no longer accept commands, right?
can I omit that last .await and have the task still run?
do I need to clean up the currently active task some way?

You can I spawn a long running Tokio task without blocking the parent task by spawning a task — that's a primary reason tasks exist. If you don't .await the task, then you won't wait for the task:
use std::time::Duration;
use tokio::{task, time}; // 1.3.0
#[tokio::main]
async fn main() {
task::spawn(async {
time::sleep(Duration::from_secs(100)).await;
eprintln!(
"You'll likely never see this printed \
out because the parent task has exited \
and so has the entire program"
);
});
}
See also:
What happens to an async task when it is aborted?

One way to do this would be to use Tokio's join!() macro, which takes multiple futures and awaits on all of them. You could the create multiple futures and join!() them together to await on them collectively.

Related

How to terminate a blocking tokio task?

In my application I have a blocking task that synchronically reads messages from a queue and feeds them to a running task.
All of this works fine, but the problem that I'm having is that the process does not terminate correctly, since the queue_reader task does not stop.
I've constructed a small example based on the tokio documentation at: https://docs.rs/tokio/1.20.1/tokio/task/fn.spawn_blocking.html
use tokio::sync::mpsc;
use tokio::task;
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
loop {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
println!("Finalizing thread");
queue_reader.abort(); // This doesn't seem to terminate the queue_reader task
queue_reader.await.unwrap(); // <-- The process hangs on this task.
println!("Done");
}
At first I expected that queue_reader.abort() should terminate the task, however it doesn't. My expectation is that tokio can only do this for tasks that use .await internally, because that will handle control over to tokio. Is this right?
In order to terminate the queue_reader task I introduced a oneshot channel, over which I signal the termination, as shown in the next snippet.
use tokio::task;
use tokio::sync::{oneshot, mpsc};
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// A new channel to communicate when the process must finish.
let (term_tx, mut term_rx) = oneshot::channel();
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
// As long as termination is not signalled
while term_rx.try_recv().is_err() {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
// Signal termination
term_tx.send(()).unwrap();
println!("Finalizing thread");
queue_reader.await.unwrap();
println!("Done");
}
My question is, is this the canonical/best way to do this, or are there better alternatives?
Tokio cannot terminate CPU-bound/blocking tasks.
It is technically possible to kill OS threads, but generally it is not a good idea, as it's expensive to create new threads and it can leave your program in an invalid state. Even if Tokio decided this was something worth implementing, it would serverely limit its implementation - it would be forced into a multithread model, just to support the possibility that you'd want to kill a blocking task before it's finished.
Your solution is pretty good; give your blocking task the responsibility for terminating itself and provide a way to tell it to do so. If this future was part of a library, you could abstract the mechanism away by returning a "handle" to the task that had a cancel() method.
Are there better alternatives? Maybe, but that would depend on other factors. Your solution is good and easily extended, for example if you later needed to send different types of signal to the task.

What's a good detailed explanation of Tokio's simple TCP echo server example (on GitHub and API reference)?

Tokio has the same example of a simple TCP echo server on its:
GitHub main page (https://github.com/tokio-rs/tokio)
API reference main page (https://docs.rs/tokio/0.2.18/tokio/)
However, in both pages, there is no explanation of what's actually going on. Here's the example, slightly modified so that the main function does not return Result<(), Box<dyn std::error::Error>>:
use tokio::net::TcpListener;
use tokio::prelude::*;
#[tokio::main]
async fn main() {
if let Ok(mut tcp_listener) = TcpListener::bind("127.0.0.1:8080").await {
while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
tokio::spawn(async move {
let mut buf = [0; 1024];
// In a loop, read data from the socket and write the data back.
loop {
let n = match tcp_stream.read(&mut buf).await {
// socket closed
Ok(n) if n == 0 => return,
Ok(n) => n,
Err(e) => {
eprintln!("failed to read from socket; err = {:?}", e);
return;
}
};
// Write the data back
if let Err(e) = tcp_stream.write_all(&buf[0..n]).await {
eprintln!("failed to write to socket; err = {:?}", e);
return;
}
}
});
}
}
}
After reading the Tokio documentation (https://tokio.rs/docs/overview/), here's my mental model of this example. A task is spawned for each new TCP connection. And a task is ended whenever a read/write error occurs, or when the client ends the connection (i.e. n == 0 case). Therefore, if there are 20 connected clients at a point in time, there would be 20 spawned tasks. However, under the hood, this is NOT equivalent to spawning 20 threads to handle the connected clients concurrently. As far as I understand, this is basically the problem that asynchronous runtimes are trying to solve. Correct so far?
Next, my mental model is that a tokio scheduler (e.g. the multi-threaded threaded_scheduler which is the default for apps, or the single-threaded basic_scheduler which is the default for tests) will schedule these tasks concurrently on 1-to-N threads. (Side question: for the threaded_scheduler, is N fixed during the app's lifetime? If so, is it equal to num_cpus::get()?). If one task is .awaiting for the read or write_all operations, then the scheduler can use the same thread to perform more work for one of the other 19 tasks. Still correct?
Finally, I'm curious whether the outer code (i.e. the code that is .awaiting for tcp_listener.accept()) is itself a task? Such that in the 20 connected clients example, there aren't really 20 tasks but 21: one to listen for new connections + one per connection. All of these 21 tasks could be scheduled concurrently on one or many threads, depending on the scheduler. In the following example, I wrap the outer code in a tokio::spawn and .await the handle. Is it completely equivalent to the example above?
use tokio::net::TcpListener;
use tokio::prelude::*;
#[tokio::main]
async fn main() {
let main_task_handle = tokio::spawn(async move {
if let Ok(mut tcp_listener) = TcpListener::bind("127.0.0.1:8080").await {
while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
tokio::spawn(async move {
// ... same as above ...
});
}
}
});
main_task_handle.await.unwrap();
}
This answer is a summary of an answer I received on Tokio's Discord from Alice Ryhl. Big thank you!
First of all, indeed, for the multi-threaded scheduler, the number of OS threads is fixed to num_cpus.
Second, Tokio can swap the currently running task at every .await on a per-thread basis.
Third, the main function runs in its own task, which is spawned by the #[tokio::main] macro.
Therefore, for the first code block example, if there are 20 connected clients, there would be 21 tasks: one for the main macro + one for each of the 20 open TCP streams. For the second code block example, there would be 22 tasks because of the extra outer tokio::spawn but it's needless and doesn't add any concurrency.

How can I share or avoid sharing a websocket resource between two threads?

I am using tungstenite to build a chat server, and the way I want to do it relies on having many threads that communicate with each other through mpsc. I want to start up a new thread for each user that connects to the server and connect them to a websocket, and also have that thread be able to read from mpsc so that the server can send messages out through that connection.
The problem is that the mpsc read blocks the thread, but I can't block the thread if I want to be reading from it. The only thing I could think of to work around that is to make two threads, one for inbound and one for outbound messages, but that requires me to share my websocket connection with both workers, which of course I cannot do.
Here's a heavily truncated version of my code where I try to make two workers in the Action::Connect arm of the match statement, which gives error[E0382]: use of moved value: 'websocket' for trying to move it into the second worker's closure:
extern crate tungstenite;
extern crate workerpool;
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc::{self, Sender, Receiver};
use workerpool::Pool;
use workerpool::thunk::{Thunk, ThunkWorker};
use tungstenite::server::accept;
pub enum Action {
Connect(TcpStream),
Send(String),
}
fn main() {
let (main_send, main_receive): (Sender<Action>, Receiver<Action>) = mpsc::channel();
let worker_pool = Pool::<ThunkWorker<()>>::new(8);
{
// spawn thread to listen for users connecting to the server
let main_send = main_send.clone();
worker_pool.execute(Thunk::of(move || {
let listener = TcpListener::bind(format!("127.0.0.1:{}", 8080)).unwrap();
for (_, stream) in listener.incoming().enumerate() {
main_send.send(Action::Connect(stream.unwrap())).unwrap();
}
}));
}
let mut users: Vec<Sender<String>> = Vec::new();
// process actions from children
while let Some(act) = main_receive.recv().ok() {
match act {
Action::Connect(stream) => {
let mut websocket = accept(stream).unwrap();
let (user_send, user_receive): (Sender<String>, Receiver<String>) = mpsc::channel();
let main_send = main_send.clone();
// thread to read user input and propagate it to the server
worker_pool.execute(Thunk::of(move || {
loop {
let message = websocket.read_message().unwrap().to_string();
main_send.send(Action::Send(message)).unwrap();
}
}));
// thread to take server output and propagate it to the server
worker_pool.execute(Thunk::of(move || {
while let Some(message) = user_receive.recv().ok() {
websocket.write_message(tungstenite::Message::Text(message.clone())).unwrap();
}
}));
users.push(user_send);
}
Action::Send(message) => {
// take user message and echo to all users
for user in &users {
user.send(message.clone()).unwrap();
}
}
}
}
}
If I create just one thread for both in and output in that arm, then user_receive.recv() blocks the thread so I can't read any messages with websocket.read_message() until I get an mpsc message from the main thread. How can I solve both problems? I considered cloning the websocket but it doesn't implement Clone and I don't know if just making a new connection with the same stream is a reasonable thing to try to do, it seems hacky.
The problem is that the mpsc read blocks the thread
You can use try_recv to avoid thread blocking. The another implementation of mpsc is crossbeam_channel. That project is a recommended replacement even by the author of mpsc
I want to start up a new thread for each user that connects to the server
I think the asyn/await approach will be much better from most of the prospectives then thread per client one. You can read more about it there

Does rust currently have a library implement function similar to JavaScript's setTimeout and setInterval?

Does rust currently have a library implement function similar to JavaScript's setTimeout and setInverval?, that is, a library that can call multiple setTimeout and setInterval to implement management of multiple tasks at the same time.
I feel that tokio is not particularly convenient to use. I imagine it to be used like this:
fn callback1() {
println!("callback1");
}
fn callback2() {
println!("callback2");
}
set_interval(callback1, 10);
set_interval(callback1, 20);
set_timeout(callback1, 30);
Of course, I can simulate a function to make it work:
// just for test, not what I wanted at all
type rust_listener_callback = fn();
fn set_interval(func: rust_listener_callback, duration: i32) {
func()
}
fn set_timeout(func: rust_listener_callback, duration: i32) {
func();
}
If a set_interval is implemented in this way, multiple combinations, dynamic addition and deletion, and cancellation are not particularly convenient:
use tokio::time;
async fn set_interval(func: rust_listener_callback, duration: u64) {
let mut interval = time::interval(Duration::from_millis(duration));
tokio::spawn(async move {
loop {
interval.tick().await;
func()
}
}).await;
}
// emm maybe loop can be removed, just a sample
While, What I want to know is if there is a library to do this, instead of writing it myself.
I have some idea if I would write it myself. Generally, all functions are turned into a task queue or task tree, and then tokio::time::delay_for can be used to execute them one by one, but the details are actually more complicated.
However, I think that this general capability may have already been implemented but I has not found for the time being, so I want to ask here, Thank you very much.
And importantly, I hope it can support single thread
setTimeout can be done like this without the need for a crate:
tokio::spawn(async move {
tokio::time::sleep(Duration::from_secs(5)).await;
// code goes here
});
I asked myself the same question a few days ago, created a solution for this (for tokio runtimes), and found your stackoverflow post just now.
https://crates.io/crates/tokio-js-set-interval
code.rs
use std::time::Duration;
use tokio_js_set_interval::{set_interval, set_timeout};
#[tokio::main]
async fn main() {
println!("hello1");
set_timeout!(println!("hello2"), 0);
println!("hello3");
set_timeout!(println!("hello4"), 0);
println!("hello5");
// give enough time before tokios runtime exits
tokio::time::sleep(Duration::from_millis(1)).await;
}
But this must be used with caution. There is no guarantee that the futures will be executed (because tokios runtime must run long enough). Use it only:
for educational purposes,
and if you have low priority background tasks that you don't expect to get executed always
I created a library just for this which allows setting many timeouts using only 1 tokio task (instead of spawning a new task for each timeout) which provides better performance and lower memory usage.
The library also supports cancelling timeouts, and provides some ways to optimize the performance of the timeouts.
Check it out:
https://crates.io/crates/set_timeout
Usage example:
#[tokio::main]
async fn main() {
let scheduler = TimeoutScheduler::new(None);
// schedule a future which will run after 1.234 seconds from now.
scheduler.set_timeout(Duration::from_secs_f32(1.234), async move {
println!("It works!");
});
// make sure that the main task doesn't end before the timeout is executed, because if the main
// task returns the runtime stops running.
tokio::time::sleep(Duration::from_secs(2)).await;
}

Is it possible to force resume a sleeping thread?

Is it possible to force resume a sleeping thread which has been paused? For example, by calling sleep:
std::thread::sleep(std::time::Duration::from_secs(60 * 20));
I know that I can communicate between threads using std::sync::mpsc but if the thread is asleep, this does not force it to wake up before the time indicated.
I have thought that using std::sync::mpsc and maybe
Builder and .name associated with the thread, but I do not know how to get the thread to wake up.
If you want to be woken up by an event, thread::sleep() is not the correct function to use, as it's not supposed to be stopped.
There are other methods of waiting while being able to be woken up by an event (this is usually called blocking). Probably the easiest way is to use a channel together with Receiver::recv_timeout(). Often it's also sufficient to send () through the channel. That way we just communicate a signal, but don't send actual data.
If you don't want to wake up after a specific timeout, but only when a signal arrives, just use Receiver::recv().
Example with timeout:
use std::thread;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;
use std::io;
fn main() {
let (sender, receiver) = mpsc::channel();
thread::spawn(move || {
loop {
match receiver.recv_timeout(Duration::from_secs(2)) {
Err(RecvTimeoutError::Timeout) => {
println!("Still waiting... I'm bored!");
// we'll try later...
}
Err(RecvTimeoutError::Disconnected) => {
// no point in waiting anymore :'(
break;
}
Ok(_) => {
println!("Finally got a signal! ♥♥♥");
// doing work now...
}
}
}
});
loop {
let mut s = String::new();
io::stdin().read_line(&mut s).expect("reading from stdin failed");
if s.trim() == "start" {
sender.send(()).unwrap();
}
}
}
Here, the second thread is woken up at least every two seconds (the timeout), but also earlier once something was sent through the channel.
park_timeout allows timed sleeps with wakeups from unpark, but it can also wake up early.
See std::thread module documentation

Resources