I want to connect to multiple peers and do a series of tasks on each connection concurrently.
I have the following contrived scenario:
use futures::{
    stream::{self, FuturesUnordered},
    StreamExt,
};

struct State {
    foo: String,
    bar: String,
}

#[tokio::main]
async fn main() {
    let mut futures = (0..10)
        .map(move |i| async move {
            let delay = core::time::Duration::from_secs(2);
            tokio::time::sleep(delay).await;
            if i % 2 == 0 {
                // only half of the connections are successful
                stream::iter(Ok::<i32, std::io::Error>(i).into_iter())
            } else {
                let custom_error = std::io::Error::new(std::io::ErrorKind::Other, "oh no!");
                stream::iter(Err::<i32, std::io::Error>(custom_error).into_iter())
            }
        })
        .collect::<FuturesUnordered<_>>()
        .flatten() // ignore unsuccessful attempts
        .map(|peer| (State { foo: "foo".into(), bar: "bar".into() }, peer));
    // first task
    futures
        .by_ref()
        .for_each_concurrent(None, |x| async move {
            println!("processing task #1 for peer {:b}", x.1);
            let delay = core::time::Duration::from_secs(2);
            tokio::time::sleep(delay).await;
        })
        .await;

    // second task
    futures
        .by_ref()
        .for_each_concurrent(None, |x| async move {
            println!("processing task #2 for peer {:b}", x.1);
            let delay = core::time::Duration::from_secs(2);
            tokio::time::sleep(delay).await;
        })
        .await;
}
Rust Playground
Output:
processing task #1 for peer 0
processing task #1 for peer 10
processing task #1 for peer 100
processing task #1 for peer 110
processing task #1 for peer 1000
Why is only the first task being processed here? I would expect the following output:
processing task #1 for peer 0
processing task #1 for peer 10
processing task #1 for peer 100
processing task #1 for peer 110
processing task #1 for peer 1000
processing task #2 for peer 0
processing task #2 for peer 10
processing task #2 for peer 100
processing task #2 for peer 110
processing task #2 for peer 1000
Is there a way to run for_each_concurrent multiple times here?
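One workaround sketch (my own, assuming it's acceptable to buffer the connected peers in memory): for_each_concurrent drives the stream to exhaustion, so the first call consumes every peer and the second call finds the stream already empty. Collecting once and building a fresh stream per pass produces the expected output:

// after building `futures` exactly as in the code above:
let peers: Vec<_> = futures.collect().await;

// first task
stream::iter(peers.iter())
    .for_each_concurrent(None, |x| async move {
        println!("processing task #1 for peer {:b}", x.1);
        tokio::time::sleep(core::time::Duration::from_secs(2)).await;
    })
    .await;

// second task
stream::iter(peers.iter())
    .for_each_concurrent(None, |x| async move {
        println!("processing task #2 for peer {:b}", x.1);
        tokio::time::sleep(core::time::Duration::from_secs(2)).await;
    })
    .await;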
I have code to check the network connection, like this:
use std::process::Command;
use std::thread;
use std::time::Duration;

fn network_connection() {
    loop {
        let output = Command::new("ping")
            .arg("-c")
            .arg("1")
            .arg("google.com")
            .output()
            .expect("[ ERR ] Failed to execute network check process");

        let network_status = output.status;

        if !network_status.success() {
            // NB: this only builds the command; nothing is executed
            // without a call to .status() or .spawn().
            Command::new("init").arg("6");
        }

        thread::sleep(Duration::from_millis(60000));
    }
}

fn main() {
    thread::spawn(|| {
        network_connection();
    });
    ...
This should ping Google every 60 seconds, but when I look at the number of requests sent to the router, it's about 200 requests per 10 minutes.
Is this spawning more threads than one?
main() is running only once.
My requirement is very simple, and it's a very common one in many programs: send a specified message to my channel after a specified delay.
I've checked tokio for topics related to delay, interval, or timeout, but none of them seem straightforward to apply here.
What I've come up with is to spawn an asynchronous task, then wait or sleep for the given amount of time, and finally send the message.
But, obviously, spawning an asynchronous task is a relatively heavy operation. Is there a better solution?
use std::time::Duration;
use tokio::{sync::mpsc, time};

async fn my_handler(sender: mpsc::Sender<i32>, dur: Duration) {
    tokio::spawn(async move {
        time::sleep(dur).await;
        let _ = sender.send(0).await;
    });
}
You could try adding a second channel and a continuously running task that buffers messages until the time they are to be received. Implementing this is more involved than it sounds; I hope I'm handling cancellations right here:
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use tokio::sync::mpsc::{self, Receiver, Sender};
use tokio::time::{sleep_until, timeout_at, Instant};

fn make_timed_channel<T: Ord + Send + Sync + 'static>() -> (Sender<(Instant, T)>, Receiver<T>) {
    // Ord is an unnecessary requirement arising from me stuffing both the Instant
    // and the T into the BinaryHeap. You could drop this requirement by using the
    // priority_queue crate instead.
    let (sender1, receiver1) = mpsc::channel::<(Instant, T)>(42);
    let (sender2, receiver2) = mpsc::channel::<T>(42);
    let mut receiver1 = Some(receiver1);

    tokio::spawn(async move {
        let mut buf = BinaryHeap::<Reverse<(Instant, T)>>::new();
        loop {
            // Pretend we're a bounded channel, or exit if the upstream closed
            if buf.len() >= 42 || receiver1.is_none() {
                match buf.pop() {
                    Some(Reverse((time, element))) => {
                        sleep_until(time).await;
                        if sender2.send(element).await.is_err() {
                            break;
                        }
                    }
                    None => break,
                }
            }
            // We have some deadline to send a message at
            else if let Some(Reverse((then, _))) = buf.peek() {
                if let Ok(recv) = timeout_at(*then, receiver1.as_mut().unwrap().recv()).await {
                    match recv {
                        Some(recv) => buf.push(Reverse(recv)),
                        None => receiver1 = None,
                    }
                } else if sender2.send(buf.pop().unwrap().0 .1).await.is_err() {
                    break;
                }
            }
            // We're empty, wait around
            else {
                match receiver1.as_mut().unwrap().recv().await {
                    Some(recv) => buf.push(Reverse(recv)),
                    None => receiver1 = None,
                }
            }
        }
    });

    (sender1, receiver2)
}
Playground
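A quick usage sketch (my addition, not from the playground link), showing that elements sent out of order come back in deadline order:

use std::time::Duration;
use tokio::time::Instant;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = make_timed_channel::<&str>();
    let now = Instant::now();

    // Sent out of order; delivery should follow the deadlines.
    tx.send((now + Duration::from_millis(200), "second")).await.unwrap();
    tx.send((now + Duration::from_millis(100), "first")).await.unwrap();

    assert_eq!(rx.recv().await, Some("first"));
    assert_eq!(rx.recv().await, Some("second"));
}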
Whether this is more efficient than spawning tasks, you'd have to benchmark. (I doubt it; IIRC Tokio has a much fancier mechanism than a BinaryHeap for waking up at the next timeout.)
One optimization you could make, if you don't need an actual Receiver<T> but just something that can be awaited for the next element: you could drop the second channel and maintain the BinaryHeap inside a custom receiver.
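For illustration, a rough sketch of that idea (the TimedReceiver type and its recv method are my own hypothetical names, not from the original answer):

use std::{cmp::Reverse, collections::BinaryHeap};
use tokio::{
    sync::mpsc::Receiver,
    time::{sleep_until, timeout_at, Instant},
};

struct TimedReceiver<T> {
    inner: Receiver<(Instant, T)>,
    buf: BinaryHeap<Reverse<(Instant, T)>>,
}

impl<T: Ord> TimedReceiver<T> {
    async fn recv(&mut self) -> Option<T> {
        loop {
            // Copy the earliest deadline out so the heap isn't borrowed
            // across the await points below.
            let deadline = self.buf.peek().map(|Reverse((then, _))| *then);
            match deadline {
                // A deadline is pending: buffer anything arriving before it.
                Some(then) => match timeout_at(then, self.inner.recv()).await {
                    Ok(Some(item)) => self.buf.push(Reverse(item)),
                    // Upstream closed: wait out the earliest deadline and drain.
                    Ok(None) => {
                        let Reverse((then, item)) = self.buf.pop().unwrap();
                        sleep_until(then).await;
                        return Some(item);
                    }
                    // Deadline reached: deliver the earliest element.
                    Err(_) => {
                        let Reverse((_, item)) = self.buf.pop().unwrap();
                        return Some(item);
                    }
                },
                // Nothing buffered: block on the upstream channel.
                None => match self.inner.recv().await {
                    Some(item) => self.buf.push(Reverse(item)),
                    None => return None,
                },
            }
        }
    }
}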
Tokio shows the same example of a simple TCP echo server in two places:
GitHub main page (https://github.com/tokio-rs/tokio)
API reference main page (https://docs.rs/tokio/0.2.18/tokio/)
However, neither page explains what's actually going on. Here's the example, slightly modified so that the main function does not return Result<(), Box<dyn std::error::Error>>:
use tokio::net::TcpListener;
use tokio::prelude::*;

#[tokio::main]
async fn main() {
    if let Ok(mut tcp_listener) = TcpListener::bind("127.0.0.1:8080").await {
        while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
            tokio::spawn(async move {
                let mut buf = [0; 1024];

                // In a loop, read data from the socket and write the data back.
                loop {
                    let n = match tcp_stream.read(&mut buf).await {
                        // socket closed
                        Ok(n) if n == 0 => return,
                        Ok(n) => n,
                        Err(e) => {
                            eprintln!("failed to read from socket; err = {:?}", e);
                            return;
                        }
                    };

                    // Write the data back
                    if let Err(e) = tcp_stream.write_all(&buf[0..n]).await {
                        eprintln!("failed to write to socket; err = {:?}", e);
                        return;
                    }
                }
            });
        }
    }
}
After reading the Tokio documentation (https://tokio.rs/docs/overview/), here's my mental model of this example. A task is spawned for each new TCP connection, and a task ends whenever a read/write error occurs or the client closes the connection (i.e. the n == 0 case). Therefore, if there are 20 connected clients at a point in time, there are 20 spawned tasks. However, under the hood, this is NOT equivalent to spawning 20 threads to handle the connected clients concurrently. As far as I understand, this is basically the problem that asynchronous runtimes are trying to solve. Correct so far?
Next, my mental model is that a Tokio scheduler (e.g. the multi-threaded threaded_scheduler, which is the default for apps, or the single-threaded basic_scheduler, which is the default for tests) will schedule these tasks concurrently on 1 to N threads. (Side question: for the threaded_scheduler, is N fixed during the app's lifetime? If so, is it equal to num_cpus::get()?) If one task is .awaiting the read or write_all operation, the scheduler can use the same thread to perform more work for one of the other 19 tasks. Still correct?
Finally, I'm curious whether the outer code (i.e. the code .awaiting tcp_listener.accept()) is itself a task, such that in the 20-connected-clients example there aren't really 20 tasks but 21: one to listen for new connections plus one per connection. All 21 of these tasks could be scheduled concurrently on one or many threads, depending on the scheduler. In the following example, I wrap the outer code in a tokio::spawn and .await the handle. Is it completely equivalent to the example above?
use tokio::net::TcpListener;
use tokio::prelude::*;

#[tokio::main]
async fn main() {
    let main_task_handle = tokio::spawn(async move {
        if let Ok(mut tcp_listener) = TcpListener::bind("127.0.0.1:8080").await {
            while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
                tokio::spawn(async move {
                    // ... same as above ...
                });
            }
        }
    });

    main_task_handle.await.unwrap();
}
This answer is a summary of an answer I received on Tokio's Discord from Alice Ryhl. Big thank you!
First of all, indeed, for the multi-threaded scheduler, the number of OS threads is fixed, and equal to num_cpus::get().
Second, Tokio can swap the currently running task at every .await, on a per-thread basis.
Third, the main function runs in its own task, which is spawned by the #[tokio::main] macro.
Therefore, for the first code block example, if there are 20 connected clients, there are 21 tasks: the main task plus one for each of the 20 open TCP streams. For the second code block example, there are 22 tasks because of the extra outer tokio::spawn, but that extra spawn is needless and doesn't add any concurrency.
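As an aside, if you don't want N tied to num_cpus, the thread count can be set explicitly. A minimal sketch using the tokio 0.2 Builder API (matching the version in the question; the count of 4 is an arbitrary choice for illustration):

use tokio::runtime::Builder; // tokio 0.2

fn main() {
    // Explicitly size the threaded scheduler instead of relying on
    // the num_cpus::get() default.
    let mut runtime = Builder::new()
        .threaded_scheduler()
        .core_threads(4) // arbitrary, for illustration
        .enable_all()
        .build()
        .expect("failed to build the runtime");

    runtime.block_on(async {
        println!("running on a 4-thread scheduler");
    });
}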
I'd like to spin up a Tokio event loop alongside a Rocket server, then add events to this loop later on. I read "Is there a way to launch a tokio::Delay on a new thread to allow the main loop to continue?", but it's still not clear to me how to achieve my goal.
As the documentation states:
The returned handle can be used to spawn tasks that run on this runtime, and can be cloned to allow moving the Handle to other threads.
Here is an example of spinning up the event loop in one thread and having a second thread spawn tasks on it.
use futures::future; // 0.3.5
use std::{thread, time::Duration};
use tokio::{runtime::Runtime, time}; // 0.2.21

fn main() {
    let (shutdown_tx, shutdown_rx) = tokio::sync::oneshot::channel();
    let (handle_tx, handle_rx) = std::sync::mpsc::channel();

    let tokio_thread = thread::spawn(move || {
        let mut runtime = Runtime::new().expect("Unable to create the runtime");

        eprintln!("Runtime created");

        // Give a handle to the runtime to another thread.
        handle_tx
            .send(runtime.handle().clone())
            .expect("Unable to give runtime handle to another thread");

        // Continue running until notified to shutdown
        runtime.block_on(async {
            shutdown_rx.await.expect("Error on the shutdown channel");
        });

        eprintln!("Runtime finished");
    });

    let another_thread = thread::spawn(move || {
        let handle = handle_rx
            .recv()
            .expect("Could not get a handle to the other thread's runtime");

        eprintln!("Another thread created");

        let task_handles: Vec<_> = (0..10)
            .map(|value| {
                // Run this future in the other thread's runtime
                handle.spawn(async move {
                    eprintln!("Starting task for value {}", value);
                    time::delay_for(Duration::from_secs(2)).await;
                    eprintln!("Finishing task for value {}", value);
                })
            })
            .collect();

        // Finish all pending tasks
        handle.block_on(async move {
            future::join_all(task_handles).await;
        });

        eprintln!("Another thread finished");
    });

    another_thread.join().expect("Another thread panicked");

    shutdown_tx
        .send(())
        .expect("Unable to shutdown runtime thread");

    tokio_thread.join().expect("Tokio thread panicked");
}
Runtime created
Another thread created
Starting task for value 0
Starting task for value 1
Starting task for value 2
Starting task for value 3
Starting task for value 4
Starting task for value 5
Starting task for value 6
Starting task for value 7
Starting task for value 8
Starting task for value 9
Finishing task for value 0
Finishing task for value 5
Finishing task for value 4
Finishing task for value 3
Finishing task for value 9
Finishing task for value 2
Finishing task for value 1
Finishing task for value 7
Finishing task for value 8
Finishing task for value 6
Another thread finished
Runtime finished
The solution for Tokio 0.1 is available in the revision history of this post.
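For comparison, a rough sketch of the same idea against the tokio 1.x API (my assumption of the modern equivalent; it is not part of the original 0.2 answer):

use std::{thread, time::Duration};
use tokio::runtime::Runtime; // tokio 1.x

fn main() {
    // The multi-threaded runtime starts its worker threads on creation,
    // so tasks spawned through a Handle run without a dedicated driver thread.
    let runtime = Runtime::new().expect("Unable to create the runtime");
    let handle = runtime.handle().clone();

    let another_thread = thread::spawn(move || {
        // Spawn a task onto the runtime from a non-runtime thread...
        let task = handle.spawn(async {
            tokio::time::sleep(Duration::from_secs(2)).await;
            eprintln!("Task finished");
        });
        // ...and block this plain thread until the task completes.
        handle.block_on(task).expect("Task panicked");
    });

    another_thread.join().expect("Another thread panicked");
}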
See also:
How to run an asynchronous task from a non-main thread in Tokio?
Here's an example, but what should I wait on to decide when it is done? Is there a better way to wait for the channel to be empty and all the threads to have completed? The full example is at http://github.com/posix4e/rust_webcrawl
loop {
    let n_active_threads = running_threads.compare_and_swap(0, 0, Ordering::SeqCst);

    match rx.try_recv() {
        Ok(new_site) => {
            let new_site_copy = new_site.clone();
            let tx_copy = tx.clone();
            counter += 1;
            print!("{} ", counter);

            if !found_urls.contains(&new_site) {
                found_urls.insert(new_site);
                running_threads.fetch_add(1, Ordering::SeqCst);
                let my_running_threads = running_threads.clone();

                pool.execute(move || {
                    for new_url in get_websites_helper(new_site_copy) {
                        if new_url.starts_with("http") {
                            tx_copy.send(new_url).unwrap();
                        }
                    }
                    my_running_threads.fetch_sub(1, Ordering::SeqCst);
                });
            }
        }
        Err(TryRecvError::Empty) if n_active_threads == 0 => break,
        Err(TryRecvError::Empty) => {
            writeln!(
                &mut std::io::stderr(),
                "Channel is empty, but there are {} threads running",
                n_active_threads
            );
            thread::sleep_ms(10);
        }
        Err(TryRecvError::Disconnected) => unreachable!(),
    }
}
This is actually a very complicated question, one with a great potential for race conditions! As I understand it, you:
Have an unbounded queue
Have a set of workers that operate on the queue items
The workers can put an unknown amount of items back into the queue
Want to know when everything is "done"
One obvious issue is that it may never be done. If every worker puts one item back into the queue, you've got an infinite loop.
That being said, I feel like the solution is to track:
How many items are queued
How many items are in progress
When both of these values are zero, then you are done. Easier said than done...
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, TryRecvError};
use std::thread;

fn main() {
    let running_threads = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = channel();

    // We prime the channel with the first bit of work
    tx.send(10).unwrap();

    loop {
        // In an attempt to avoid a race condition, we fetch the
        // active thread count before checking the channel. Otherwise,
        // we might read nothing from the channel, and *then* a thread
        // finishes and adds something to the queue.
        let n_active_threads = running_threads.compare_and_swap(0, 0, Ordering::SeqCst);

        match rx.try_recv() {
            Ok(id) => {
                // I lie a bit and increment the counter to start
                // with. If we let the thread increment this, we might
                // read from the channel before the thread ever has a
                // chance to run!
                running_threads.fetch_add(1, Ordering::SeqCst);

                let my_tx = tx.clone();
                let my_running_threads = running_threads.clone();

                // You could use a threadpool, but I'm spawning
                // threads to only rely on stdlib.
                thread::spawn(move || {
                    println!("Working on {}", id);
                    // Simulate work
                    thread::sleep_ms(100);

                    if id != 0 {
                        my_tx.send(id - 1).unwrap();
                        // Send multiple sometimes
                        if id % 3 == 0 && id > 2 {
                            my_tx.send(id - 2).unwrap();
                        }
                    }

                    my_running_threads.fetch_sub(1, Ordering::SeqCst);
                });
            }
            Err(TryRecvError::Empty) if n_active_threads == 0 => break,
            Err(TryRecvError::Empty) => {
                println!(
                    "Channel is empty, but there are {} threads running",
                    n_active_threads
                );
                // We sleep a bit here, to avoid quickly spinning
                // through an empty channel while the worker threads
                // work.
                thread::sleep_ms(1);
            }
            Err(TryRecvError::Disconnected) => unreachable!(),
        }
    }
}
I make no guarantees that this implementation is perfect (I probably should guarantee that it's broken, because threading is hard). One big caveat is that I don't intimately know the meanings of all the variants of Ordering, so I chose the one that looked to give the strongest guarantees.
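One small aside (my note, not part of the original answer): the compare_and_swap(0, 0, ...) above only ever reads the counter, since storing 0 over 0 is a no-op, and on current Rust, where compare_and_swap is deprecated, a plain load expresses the same intent:

use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let running_threads = AtomicUsize::new(0);

    // Same observable result as compare_and_swap(0, 0, Ordering::SeqCst):
    // read the current value without modifying it.
    let n_active_threads = running_threads.load(Ordering::SeqCst);
    println!("{} active threads", n_active_threads);
}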