Can tokio::select allow defining arbitrary numbre of branches - rust

I am writing an echo server that is able to listen on multiple ports. My working server below relies on select to accept connections from 2 different listeners
However, instead of defining the listeners as individual variables, is it possible to define select branches based on a Vec<TcpListener> ?
use tokio::{io, net, select, spawn};
#[tokio::main]
async fn main() {
let listener1 = net::TcpListener::bind("127.0.0.1:8001").await.unwrap();
let listener2 = net::TcpListener::bind("127.0.0.1:8002").await.unwrap();
loop {
let (conn, _) = select! {
v = listener1.accept() => v.unwrap(),
v = listener2.accept() => v.unwrap(),
};
spawn(handle(conn));
}
}
async fn handle(mut conn: net::TcpStream) {
let (mut read, mut write) = conn.split();
io::copy(&mut read, &mut write).await.unwrap();
}

While futures::future::select_all() works, it is not very elegant (IMHO) and it creates an allocation for each round. A better solution is to use streams (note this also allocates on each round, but this allocates much less):
use tokio::{io, net, spawn};
use tokio_stream::wrappers::TcpListenerStream;
use futures::stream::{StreamExt, SelectAll};
#[tokio::main]
async fn main() {
let mut listeners = SelectAll::new();
listeners.push(TcpListenerStream::new(net::TcpListener::bind("127.0.0.1:8001").await.unwrap()));
listeners.push(TcpListenerStream::new(net::TcpListener::bind("127.0.0.1:8002").await.unwrap()));
while let Some(conn) = listeners.next().await {
let conn = conn.unwrap();
spawn(handle(conn));
}
}

You can use the select_all function from the futures crate, which takes an iterator of futures and awaits any of them (instead of all of them, like join_all does):
use futures::{future::select_all, FutureExt};
use tokio::{io, net, select, spawn};
#[tokio::main]
async fn main() {
let mut listeners = [
net::TcpListener::bind("127.0.0.1:8001").await.unwrap(),
net::TcpListener::bind("127.0.0.1:8002").await.unwrap(),
];
loop {
let (result, index, _) = select_all(
listeners
.iter_mut()
// note: `FutureExt::boxed` is called here because `select_all`
// requires the futures to be pinned
.map(|listener| listener.accept().boxed()),
)
.await;
let (conn, _) = result.unwrap();
spawn(handle(conn));
}
}

Related

deno_runtime running multiple invokes on single worker concurrently

I'm trying to run multiple invocation of the same script on a single deno MainWorker concurrently, and waiting for their
results (since the scripts can be async). Conceptually, I want something like the loop in run_worker below.
type Tx = Sender<(String, Sender<String>)>;
type Rx = Receiver<(String, Sender<String>)>;
struct Runner {
worker: MainWorker,
futures: FuturesUnordered<Pin<Box<dyn Future<Output=(String, Result<Global<Value>, Error>)>>>>,
response_futures: FuturesUnordered<Pin<Box<dyn Future<Output=(String, Result<(), SendError<String>>)>>>>,
result_senders: HashMap<String, Sender<String>>,
}
impl Runner {
fn new() ...
async fn run_worker(&mut self, rx: &mut Rx, main_module: ModuleSpecifier, user_module: ModuleSpecifier) {
self.worker.execute_main_module(&main_module).await.unwrap();
self.worker.preload_side_module(&user_module).await.unwrap();
loop {
tokio::select! {
msg = rx.recv() => {
if let Some((id, sender)) = msg {
let global = self.worker.js_runtime.execute_script("test", "mod.entry()").unwrap();
self.result_senders.insert(id, sender);
self.futures.push(Box::pin(async {
let resolved = self.worker.js_runtime.resolve_value(global).await;
return (id, resolved);
}));
}
},
script_result = self.futures.next() => {
if let Some((id, out)) = script_result {
self.response_futures.push(Box::pin(async {
let value = deserialize_value(out.unwrap(), &mut self.worker);
let res = self.result_senders.remove(&id).unwrap().send(value).await;
return (id.clone(), res);
}));
}
},
// also handle response_futures here
else => break,
}
}
}
}
The worker can't be borrowed as mutable multiple times, so this won't work. So the worker has to be a RefCell, and
I've created a BorrowingFuture:
struct BorrowingFuture {
worker: RefCell<MainWorker>,
global: Global<Value>,
id: String
}
And its poll implementation:
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
match Pin::new(&mut Box::pin(self.worker.borrow_mut().js_runtime.resolve_value(self.global.clone()))).poll(cx) {
Poll::Ready(result) => Poll::Ready((self.id.clone(), result)),
Poll::Pending => {
cx.waker().clone().wake_by_ref();
Poll::Pending
}
}
}
So the above
self.futures.push(Box::pin(async {
let resolved = self.worker.js_runtime.resolve_value(global).await;
return (id, resolved);
}));
would become
self.futures.push(Box::pin(BorrowingFuture{worker: self.worker, global: global.clone(), id: id.clone()}));
and this would have to be done for the response_futures above as well.
But I see a few issues with this.
Creating a new future on every poll and then polling that seems wrong, but it does work.
It probably has a performance impact because new objects are created constantly.
The same issue would happen for the response futures, which would call send on each poll, which seems completely wrong.
The waker.wake_by_ref is called on every poll, because there is no way to know when a script result
will resolve. This results in the future being polled thousands (and more) times per second (always creating a new object),
which could be the same as checking it in a loop, I guess.
Note My current setup doesn't use select!, but an enum as Output from multiple Future implementations, pushed into
a single FuturesUnordered, and then matched to handle the correct type (script, send, receive). I used select here because
it's far less verbose, and gets the point across.
Is there a way to do this better/more efficiently? Or is it just not the way MainWorker was meant to be used?
main for completeness:
#[tokio::main]
async fn main() {
let main_module = deno_runtime::deno_core::resolve_url(MAIN_MODULE_SPECIFIER).unwrap();
let user_module = deno_runtime::deno_core::resolve_url(USER_MODULE_SPECIFIER).unwrap();
let (tx, mut rx) = channel(1);
let (result_tx, mut result_rx) = channel(1);
let handle = thread::spawn(move || {
let runtime = tokio::runtime::Builder::new_multi_thread().enable_all().build().unwrap();
let mut runner = Runner::new();
runtime.block_on(runner.run_worker(&mut rx, main_module, user_module));
});
tx.send(("test input".to_string(), result_tx)).await.unwrap();
let result = result_rx.recv().await.unwrap();
println!("result from worker {}", result);
handle.join().unwrap();
}

How to create threads in a for loop and get the return value from each?

I am writing a program that pings a set of targets 100 times, and stores each RTT value returned from the ping into a vector, thus giving me a set of RTT values for each target. Say I have n targets, I would like all of the pinging to be done concurrently. The rust code looks like this:
let mut sample_rtts_map = HashMap::new();
for addr in targets.to_vec() {
let mut sampleRTTvalues: Vec<f32> = vec![];
//sample_rtts_map.insert(addr, sampleRTTvalues);
thread::spawn(move || {
while sampleRTTvalues.len() < 100 {
let sampleRTT = ping(addr);
sampleRTTvalues.push(sampleRTT);
// thread::sleep(Duration::from_millis(5000));
}
});
}
The hashmap is used to tell which vector of values belongs to which target. The problem is, how do I retrieve the updated sampleRTTvalues from each thread after the thread is done executing? I would like something like:
let (name, sampleRTTvalues) = thread::spawn(...)
The name, being the name of the thread, and sampleRTTvalues being the vector. However, since I'm creating threads in a for loop, each thread is being instantiated the same way, so how I differentiate them?
Is there some better way to do this? I've looked into schedulers, future, etc., but it seems my case can just be done with simple threads.
I go the desired behavior with the following code:
use std::thread;
use std::sync::mpsc;
use std::collections::HashMap;
use rand::Rng;
use std::net::{Ipv4Addr,Ipv6Addr,IpAddr};
const RTT_ONE: IpAddr = IpAddr::V4(Ipv4Addr::new(127,0,0,1));
const RTT_TWO: IpAddr = IpAddr::V6(Ipv6Addr::new(0,0,0,0,0,0,0,1));
const RTT_THREE: IpAddr = IpAddr::V4(Ipv4Addr::new(127,0,1,1));//idk how ip adresses work, forgive if this in invalid but you get the idea
fn ping(address: IpAddr) -> f32 {
rand::thread_rng().gen_range(5.0..107.0)
}
fn main() {
let targets = [RTT_ONE,RTT_TWO,RTT_THREE];
let mut sample_rtts_map: HashMap<IpAddr,Vec<f32>> = HashMap::new();
for addr in targets.into_iter() {
let (sample_values,moved_values) = mpsc::channel();
let mut sampleRTTvalues: Vec<f32> = vec![];
thread::spawn(move || {
while sampleRTTvalues.len() < 100 {
let sampleRTT = ping(addr);
sampleRTTvalues.push(sampleRTT);
//thread::sleep(Duration::from_millis(5000));
}
});
sample_rtts_map.insert(addr,moved_values.recv().unwrap());
}
}
note that the use rand::Rng can be removed when implementing, as it is only so the example works. what this does is pass data from the spawned thread to the main thread, and in the method used it waits until the data is ready before adding it to the hash map. If this is problematic (takes a long time, etc.) then you can use try_recv instead of recv which will add an error / option type that will return a recoverable error if the value is ready when unwrapped, or return the value if it's ready
You can use a std::sync::mpsc channel to collect your data:
use std::collections::HashMap;
use std::sync::mpsc::channel;
use std::thread;
fn ping(_: &str) -> f32 { 0.0 }
fn main() {
let targets = ["a", "b"]; // just for example
let mut sample_rtts_map = HashMap::new();
let (tx, rx) = channel();
for addr in targets {
let tx = tx.clone();
thread::spawn(move || {
for _ in 0..100 {
let sampleRTT = ping(addr);
tx.send((addr, sampleRTT));
}
});
}
drop(tx);
// exit loop when all thread's tx have dropped
while let Ok((addr, sampleRTT)) = rx.recv() {
sample_rtts_map.entry(addr).or_insert(vec![]).push(sampleRTT);
}
println!("sample_rtts_map: {:?}", sample_rtts_map);
}
This will run all pinging threads simultaneously, and collect data in main thread synchronously, so that we can avoid using locks. Do not forget to drop sender in main thread after cloning to all pinging threads, or the main thread will hang forever.

tokio::select! but for a Vec of futures

I have a Vec of futures which I want to execute concurrently (but not necessarily in parallel). Basically, I'm looking for some kind of select function that is similar to tokio::select! but takes a collection of futures, or, conversely, a function that is similar to futures::join_all but returns once the first future is done.
An additional requirement is that once a future finished I might want to add a new future to the Vec.
With such a function, my code would roughly look like this:
use std::future::Future;
use std::time::Duration;
use tokio::time::sleep;
async fn wait(millis: u64) -> u64 {
sleep(Duration::from_millis(millis)).await;
millis
}
// This pseudo-implementation simply removes the last
// future and awaits it. I'm looking for something that
// instead polls all futures until one is finished, then
// removes that future from the Vec and returns it.
async fn select<F, O>(futures: &mut Vec<F>) -> O
where
F: Future<Output=O>
{
let future = futures.pop().unwrap();
future.await
}
#[tokio::main]
async fn main() {
let mut futures = vec![
wait(500),
wait(300),
wait(100),
wait(200),
];
while !futures.is_empty() {
let finished = select(&mut futures).await;
println!("Waited {}ms", finished);
if some_condition() {
futures.push(wait(200));
}
}
}
This is exactly what futures::stream::FuturesUnordered is for (which I've found by looking through the source of StreamExt::for_each_concurrent):
use futures::{stream::FuturesUnordered, StreamExt};
use std::time::Duration;
use tokio::time::{sleep, Instant};
async fn wait(millis: u64) -> u64 {
sleep(Duration::from_millis(millis)).await;
millis
}
#[tokio::main]
async fn main() {
let mut futures = FuturesUnordered::new();
futures.push(wait(500));
futures.push(wait(300));
futures.push(wait(100));
futures.push(wait(200));
let start_time = Instant::now();
let mut num_added = 0;
while let Some(wait_time) = futures.next().await {
println!("Waited {}ms", wait_time);
if num_added < 3 {
num_added += 1;
futures.push(wait(200));
}
}
println!("Completed all work in {}ms", start_time.elapsed().as_millis());
}
(playground)
Here's a working prototype based on streams and StreamExt::for_each_concurrent, as Martin Gallagher has suggested in a comment:
use std::time::Duration;
use tokio::sync::RwLock;
use tokio::time::sleep;
use futures::stream::{self, StreamExt};
use futures::{channel::mpsc, sink::SinkExt};
async fn wait(millis: u64) -> u64 {
sleep(Duration::from_millis(millis)).await;
millis
}
#[tokio::main]
async fn main() {
let (mut sink, futures_stream) = mpsc::unbounded();
let start_futures = vec![wait(500), wait(300), wait(100), wait(200)];
let num_futures = RwLock::new(start_futures.len());
sink.send_all(&mut stream::iter(start_futures.into_iter().map(Ok)))
.await
.unwrap();
let sink_lock = RwLock::new(sink);
futures_stream
.for_each_concurrent(None, |fut| async {
let wait_time = fut.await;
println!("Waited {}", wait_time);
if some_condition() {
println!("Adding new future");
let mut sink = sink_lock.write().await;
sink.send(wait(100)).await.unwrap();
} else {
let mut num_futures = num_futures.write().await;
*num_futures -= 1;
}
let num_futures = num_futures.read().await;
if *num_futures <= 0 {
// Close the sink to exit the for_each_concurrent
sink_lock.write().await.close().await.unwrap();
}
})
.await;
}
While this approach works it has the drawback that we need to maintain a separate counter of remaining futures so that we can close the sink -- there's no Vec of futures for which we can check whether it's empty. Closing the sink requires another lock.
Given that I'm fairly new to Rust I wouldn't be surprised if this approach could be made more elegant.

How to write using tokio Framed LinesCodec?

The following code reads from my server successfully, however I cannot seem to get the correct syntax or semantics to write back to the server when a particular line command is recognized. Do I need to create a FramedWriter? Most examples I have found split the socket but that seems overkill I expect the codec to be able to handle the bi-directional io by providing some async write method.
# Cargo.toml
[dependencies]
tokio = { version = "0.3", features = ["full"] }
tokio-util = { version = "0.4", features = ["codec"] }
//! main.rs
use tokio::net::{ TcpStream };
use tokio_util::codec::{ Framed, LinesCodec };
use tokio::stream::StreamExt;
use std::error::Error;
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
let saddr = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8081);
let conn = TcpStream::connect(saddr).await?;
let mut server = Framed::new(conn, LinesCodec::new_with_max_length(1024));
while let Some(Ok(line)) = server.next().await {
match line.as_str() {
"READY" => println!("Want to write a line to the stream"),
_ => println!("{}", line),
}
}
Ok({})
}
According to the documentation, Framed implements Stream and Sink traits. Sink defines only the bare minimum of low-level sending methods. To get the high-level awaitable methods like send() and send_all(), you need to use the SinkExt extension trait.
For example (playground):
use futures::sink::SinkExt;
// ...
while let Some(Ok(line)) = server.next().await {
match line.as_str() {
"READY" => server.send("foo").await?,
_ => println!("{}", line),
}
}

Is there an API to race N threads (or N closures on N threads) to completion?

Given several threads that complete with an Output value, how do I get the first Output that's produced? Ideally while still being able to get the remaining Outputs later in the order they're produced, and bearing in mind that some threads may or may not terminate.
Example:
struct Output(i32);
fn main() {
let mut spawned_threads = Vec::new();
for i in 0..10 {
let join_handle: ::std::thread::JoinHandle<Output> = ::std::thread::spawn(move || {
// pretend to do some work that takes some amount of time
::std::thread::sleep(::std::time::Duration::from_millis(
(1000 - (100 * i)) as u64,
));
Output(i) // then pretend to return the `Output` of that work
});
spawned_threads.push(join_handle);
}
// I can do this to wait for each thread to finish and collect all `Output`s
let outputs_in_order_of_thread_spawning = spawned_threads
.into_iter()
.map(::std::thread::JoinHandle::join)
.collect::<Vec<::std::thread::Result<Output>>>();
// but how would I get the `Output`s in order of completed threads?
}
I could solve the problem myself using a shared queue/channels/similar, but are there built-in APIs or existing libraries which could solve this use case for me more elegantly?
I'm looking for an API like:
fn race_threads<A: Send>(
threads: Vec<::std::thread::JoinHandle<A>>
) -> (::std::thread::Result<A>, Vec<::std::thread::JoinHandle<A>>) {
unimplemented!("so far this doesn't seem to exist")
}
(Rayon's join is the closest I could find, but a) it only races 2 closures rather than an arbitrary number of closures, and b) the thread pool w/ work stealing approach doesn't make sense for my use case of having some closures that might run forever.)
It is possible to solve this use case using pointers from How to check if a thread has finished in Rust? just like it's possible to solve this use case using an MPSC channel, however here I'm after a clean API to race n threads (or failing that, n closures on n threads).
These problems can be solved by using a condition variable:
use std::sync::{Arc, Condvar, Mutex};
#[derive(Debug)]
struct Output(i32);
enum State {
Starting,
Joinable,
Joined,
}
fn main() {
let pair = Arc::new((Mutex::new(Vec::new()), Condvar::new()));
let mut spawned_threads = Vec::new();
let &(ref lock, ref cvar) = &*pair;
for i in 0..10 {
let my_pair = pair.clone();
let join_handle: ::std::thread::JoinHandle<Output> = ::std::thread::spawn(move || {
// pretend to do some work that takes some amount of time
::std::thread::sleep(::std::time::Duration::from_millis(
(1000 - (100 * i)) as u64,
));
let &(ref lock, ref cvar) = &*my_pair;
let mut joinable = lock.lock().unwrap();
joinable[i] = State::Joinable;
cvar.notify_one();
Output(i as i32) // then pretend to return the `Output` of that work
});
lock.lock().unwrap().push(State::Starting);
spawned_threads.push(Some(join_handle));
}
let mut should_stop = false;
while !should_stop {
let locked = lock.lock().unwrap();
let mut locked = cvar.wait(locked).unwrap();
should_stop = true;
for (i, state) in locked.iter_mut().enumerate() {
match *state {
State::Starting => {
should_stop = false;
}
State::Joinable => {
*state = State::Joined;
println!("{:?}", spawned_threads[i].take().unwrap().join());
}
State::Joined => (),
}
}
}
}
(playground link)
I'm not claiming this is the simplest way to do it. The condition variable will awake the main thread every time a child thread is done. The list can show the state of each thread, if one is (about to) finish, it can be joined.
No, there is no such API.
You've already been presented with multiple options to solve your problem:
Use channels
Use a CondVar
Use futures
Sometimes when programming, you have to go beyond sticking pre-made blocks together. This is supposed to be a fun part of programming. I encourage you to embrace it. Go create your ideal API using the components available and publish it to crates.io.
I really don't see what's so terrible about the channels version:
use std::{sync::mpsc, thread, time::Duration};
#[derive(Debug)]
struct Output(i32);
fn main() {
let (tx, rx) = mpsc::channel();
for i in 0..10 {
let tx = tx.clone();
thread::spawn(move || {
thread::sleep(Duration::from_millis((1000 - (100 * i)) as u64));
tx.send(Output(i)).unwrap();
});
}
// Don't hold on to the sender ourselves
// Otherwise the loop would never terminate
drop(tx);
for r in rx {
println!("{:?}", r);
}
}

Resources