Spawning concurrent tokio tasks that reliably run - rust

I implemented a toy UDP server that simply squirts out data over a set of sockets as fast as it can through several concurrent tasks. I naively thought this would allow efficient use of CPU resources by making use of Tokio's threaded runtime:
    use std::error::Error;
    use std::io;
    use tokio::net::UdpSocket;
    struct Server {
        socket: UdpSocket,
        buf: Vec<u8>,
        id: u32,
    }
    const T_LOOPS: usize = 100000;
    impl Server {
        async fn run(self) -> Result<(), io::Error> {
            let Server {
                mut socket,
                buf,
                id,
            } = self;
            let peer = "127.0.0.1:9876".to_string();
            loop {
                for _n in 0..T_LOOPS {
                    let _amt = socket.send_to(&buf[..], &peer).await?;
                }
                println!("server {} run {} loops", id, T_LOOPS);
            }
        }
    }
    #[tokio::main(max_threads=8)]
    async fn main() -> Result<(), Box<dyn Error>> {
        let addr = "0.0.0.0:0".to_string();
        for n in 0u32..4u32 {
            let socket = UdpSocket::bind(&addr).await?;
            let server = Server {
                socket: socket,
                buf: vec![0; 1500],
                id: n,
            };
            tokio::spawn(async move {
                server.run().await
            });
        }
        Ok(())
    }
In practice, which tasks run seems to be fairly non-deterministic. Generally the first spawned task runs, along with any number of the other three (and not always in order). For example, the output might look something like:
server 0 run 100000 loops
server 0 run 100000 loops
server 0 run 100000 loops
or
server 1 run 100000 loops
server 0 run 100000 loops
server 2 run 100000 loops
server 1 run 100000 loops
server 2 run 100000 loops
server 0 run 100000 loops
but this is not consistently what I need (which would show all 4 ids).
I can't help but feel I must be missing something here. I've played around with core-threads and max-threads with no useful result.
How do I make this type of system work reliably with Tokio (or async in general)?
Though this is a toy system, it's actually a stepping stone towards my desired use-case.

Related

Parallelly processing HUGE file by splitting it into logical shards

I'm trying to process a huge file in parallel. It is roughly 15 GB to 60 GB and contains between 560 million and 2 billion records.
A record looks something like the following:
<id> <amount>
123, 6000
123, 4593
111, 1
111, 100
111, -50
111, 10000
There could be thousands of users within a file, each with their activity recorded as a series of transactions.
I processed this file sequentially. Not an issue.
This can be safely parallelized as long as every client's data is processed by the same thread/task.
To make use of the other available cores, I tried to process it in parallel by creating logical groups, each processed by the same tokio task. For now I'm spawning a single task per available core, and each transaction is routed to a task based on its client id.
This approach is much slower than the sequential one.
The following is a snippet of the approach:
    #[tokio::main]
    async fn main() -> Result<(), Box<dyn Error>> {
        let max_threads_supported = num_cpus::get();
        let mut account_state: HashMap<u16, Client> = HashMap::new();
        let (result_sender, mut result_receiver) =
            mpsc::channel::<HashMap<u16, Client>>(max_threads_supported);
        // repository of sender for each shard
        let mut sender_repository: HashMap<u16, Sender<Transaction>> = HashMap::new();
        for task_counter in 0..max_threads_supported {
            let result_sender_clone = result_sender.clone();
            // create separate mpsc channel for each processor
            let (sender, mut receiver) = mpsc::channel::<Transaction>(10_000);
            sender_repository.insert(task_counter as u16, sender);
            tokio::spawn(async move {
                let mut exec_engine = Engine::initialize();
                while let Some(tx) = receiver.recv().await {
                    match exec_engine.execute_transaction(tx) {
                        Ok(_) => (),
                        Err(err) => ()
                    }
                }
                result_sender_clone
                    .send(exec_engine.get_account_state_owned())
                    .await
            });
        }
        drop(result_sender);
        tokio::spawn(async move {
            // just reading tx from the file sequentially via a buffered std::io reader
            for result in reader.deserialize::<Transaction>() {
                match result {
                    Ok(tx) => {
                        match sender_repository.get(&(&tx.get_client_id() % max_threads_supported)) {
                            Some(sender) => {
                                sender.send(tx).await;
                            }
                            None => ()
                        }
                    }
                    _ =>()
                }
            }
        });
        // accumulate result from all the processor
        while let Some(result) = result_receiver.recv().await {
            account_state.extend(result.into_iter());
        }
        // do what ever you like with result
        Ok(())
    }
But this is considerably slower than the sequential approach. What am I doing wrong?
Btw, I've also tried a broadcast approach, but there is a chance of a lagging consumer losing messages, so I moved to mpsc.
How can I optimize this for better performance?
There are a couple of misconceptions here. The main one is that tokio is not meant for CPU-bound parallelism. It is meant for IO-bound, event-based situations where short reaction times and good scalability are required, like web servers.
What further reinforces my impression is that you "spawn one task per CPU core", which is the opposite of what you want in tokio. Tokio's strength is that you can spawn a very large number of tasks, and tokio efficiently schedules them on the available CPU resources. I mean, some configurations of the tokio runtime are single-threaded! There, spawning more tasks achieves absolutely no speedup whatsoever; spawning tasks is not for speedup, but for waiting at more await points at the same time. For example, in a web server, if you are connected to 100 clients at the same time, you need 100 wait points to wait for a message from each of them. That's where you need one task per connection.
What you actually want is not asynchronism but parallelism.
The current go-to library for structured parallelism is rayon combined with the excellent crossbeam-channel library for dataflow.
This should point you in the right direction:
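    use std::collections::HashMap;
    use std::sync::Mutex;
    // `Client`, `Engine`, `Transaction` and the reader come from the question's code.
    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let max_threads_supported = num_cpus::get();
        let account_state: Mutex<HashMap<u16, Client>> = Mutex::new(HashMap::new());
        rayon::scope(|s| {
            let (work_queue_sender, work_queue_receiver) = crossbeam_channel::bounded(1000);
            for _task_counter in 0..max_threads_supported {
                let work_receiver = work_queue_receiver.clone();
                // Rebind as a reference so the `move` closure only moves the borrow.
                let account_state = &account_state;
                s.spawn(move |_| {
                    let mut exec_engine = Engine::initialize();
                    while let Ok(tx) = work_receiver.recv() {
                        // TODO: do some proper error handling
                        exec_engine.execute_transaction(tx).unwrap();
                    }
                    // Assuming std's Mutex: `lock` returns a Result, hence the `unwrap`.
                    account_state
                        .lock()
                        .unwrap()
                        .extend(exec_engine.get_account_state_owned().into_iter());
                });
            }
            // Placeholder: construct the reader here as in the question.
            let reader = Reader;
            for result in reader.deserialize::<Transaction>() {
                work_queue_sender.send(result.unwrap()).unwrap();
            }
            // Closing the channel lets the workers leave their receive loops.
            drop(work_queue_sender);
        });
        // Do whatever you want with the `account_state` HashMap.
        Ok(())
    }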
Many imports were missing from your code (please provide a minimal reproducible example next time), so I wasn't able to test this code.
But it should look somewhat similar to this.
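One thing to watch with a single shared work queue is that any worker may pick up any transaction, so transactions of one client can land on different workers. If the per-client affinity from the question matters, a minimal sketch using only the standard library could give each worker its own channel and route by client id. The `Transaction` struct, the `process_sharded` function and the "sum the amounts" logic are hypothetical stand-ins for the question's `Transaction` and `Engine`, not the real types.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the question's `Transaction` type.
struct Transaction {
    client_id: u16,
    amount: i64,
}

/// Routes every transaction of a client to the same worker thread and
/// returns the summed amount per client (a stand-in for the real engine).
fn process_sharded(transactions: Vec<Transaction>, workers: usize) -> HashMap<u16, i64> {
    assert!(workers > 0, "need at least one worker");
    let mut senders = Vec::with_capacity(workers);
    let mut handles = Vec::with_capacity(workers);
    for _ in 0..workers {
        // One bounded channel per worker, so the reader cannot run far ahead.
        let (tx, rx) = mpsc::sync_channel::<Transaction>(10_000);
        senders.push(tx);
        handles.push(thread::spawn(move || {
            let mut balances: HashMap<u16, i64> = HashMap::new();
            // The loop ends once the sending side of this channel is dropped.
            for t in rx {
                *balances.entry(t.client_id).or_insert(0) += t.amount;
            }
            balances
        }));
    }
    for t in transactions {
        // Same client id always maps to the same shard, preserving order per client.
        let shard = t.client_id as usize % workers;
        senders[shard].send(t).expect("worker exited early");
    }
    drop(senders); // close all channels so the workers can finish
    let mut result = HashMap::new();
    for handle in handles {
        // Shards own disjoint client ids, so the maps never collide.
        result.extend(handle.join().expect("worker panicked"));
    }
    result
}

fn main() {
    let txs = vec![
        Transaction { client_id: 123, amount: 6000 },
        Transaction { client_id: 111, amount: 1 },
        Transaction { client_id: 123, amount: 4593 },
    ];
    println!("{:?}", process_sharded(txs, 2));
}
```

Whether this beats the sequential version still depends on how expensive each transaction is compared with the cost of sending it through a channel; for very cheap per-record work, the single reader thread is likely to remain the bottleneck.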

If "futures do nothing unless awaited", why does `tokio::spawn` work anyway?

I have read here that futures in Rust do nothing unless they are awaited. However, I tried a more complex example, and it is unclear to me why I get the message from the 2nd print in this example, because task::spawn gives me a JoinHandle on which I never call .await.
Meanwhile, I tried the same example but with an .await above the 2nd print, and now only the message from the 1st print appears.
If I wait for all the futures at the end, both messages are printed, which I understand. My question is: why do I see this behaviour in the previous 2 cases?
    use futures::stream::{FuturesUnordered, StreamExt};
    use futures::TryStreamExt;
    use rand::prelude::*;
    use std::collections::VecDeque;
    use std::sync::Arc;
    use tokio::sync::Semaphore;
    use tokio::task::JoinHandle;
    use tokio::{task, time};
    fn candidates() -> Vec<i32> {
        Vec::from([2, 2])
    }
    async fn produce_event(nanos: u64) -> i32 {
        println!("waiting {}", nanos);
        time::sleep(time::Duration::from_nanos(nanos)).await;
        1
    }
    async fn f(seconds: i64, semaphore: &Arc<Semaphore>) {
        let mut futures = vec![];
        for (i, j) in (0..1).enumerate() {
            for (i, event) in candidates().into_iter().enumerate() {
                let permit = Arc::clone(semaphore).acquire_owned().await;
                let secs = 500;
                futures.push(task::spawn(async move {
                    let _permit = permit;
                    produce_event(500); // 2nd example has an .await here
                    println!("Event produced at {}", seconds);
                }));
            }
        }
    }
    #[tokio::main()]
    async fn main() {
        let semaphore = Arc::new(Semaphore::new(45000));
        for _ in 0..1 {
            let mut futures: FuturesUnordered<_> = (0..2).map(|moment| f(moment, &semaphore)).collect();
            while let Some(item) = futures.next().await {
                let () = item;
            }
        }
    }
"However, I tried a more complex example and it is a little unclear why I get a message printed by the 2nd print in this example because task::spawn gives me a JoinHandle on which I do not do any .await."
You're spawning tasks. A task is a separate thread of execution that can run concurrently with the current task and can be scheduled in parallel.
All the JoinHandle does there is wait for that task to end; it doesn't control whether the task runs.
"Meanwhile, I tried the same example, but with an await above the 2nd print, and now I get printed only the message in the 1st print."
You spawn a bunch of tasks and make them sleep. Since you don't wait for them to terminate (you don't join them), and there is no sleep in their parent task either, the loops terminate once all the tasks have been spawned, you reach the end of the main function, and the program terminates.
At this point all the tasks are still sleeping.
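To make the "futures do nothing unless something drives them" part concrete, here is a minimal sketch using only the standard library. Instead of a real runtime it polls the future once by hand, which is roughly the job a runtime takes over for a spawned task. `NoopWaker`, `demo` and the flag-setting `produce_event` are hypothetical names for illustration and are not tokio APIs.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in for the question's `produce_event`: records that its body ran.
async fn produce_event(started: Arc<AtomicBool>) -> i32 {
    started.store(true, Ordering::SeqCst);
    1
}

/// Returns (body ran before polling, body ran after polling, output if ready).
fn demo() -> (bool, bool, Option<i32>) {
    let started = Arc::new(AtomicBool::new(false));
    // Calling an async fn only builds the future; the body has not run yet.
    let mut fut = Box::pin(produce_event(Arc::clone(&started)));
    let before_poll = started.load(Ordering::SeqCst);

    // Something has to poll the future: `.await` in a running task, or the
    // runtime itself for a spawned task. Here it is done manually.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let output = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    };
    (before_poll, started.load(Ordering::SeqCst), output)
}

fn main() {
    let (before, after, output) = demo();
    println!("ran before poll: {}, after poll: {}, output: {:?}", before, after, output);
}
```

This is also why the bare `produce_event(500);` in the first example prints nothing: the future is created and dropped without ever being polled, while the task passed to `task::spawn` is polled by the runtime regardless of what happens to its JoinHandle.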

Rust rayon tcp blocking

I'm writing a program which executes a command on a server using ssh and gets the output.
The part I don't understand is lower in the code.
If the function waits and then returns a string, it works as expected, but if it works with TCP it starts performing very badly. I expect that using 100 threads on 100 hosts will perform 100 times faster,
because it will open 100 sockets simultaneously.
In the sleep version, changing poolThreads directly affects the execution time. In the version with TCP streams, changing the pool from 1 to 100 with 100 hosts only speeds it up from 90 to 67, because some hosts are offline.
I read the documentation, but cannot find anything to help me.
    use clap::{App, Arg};
    use rayon::prelude::*;
    use rayon::ThreadPoolBuilder;
    use ssh2::Session;
    use std::fmt::Display;
    use std::io::prelude::*;
    use std::io::{BufReader, Read};
    use std::net::{TcpStream, ToSocketAddrs};
    use std::thread::sleep;
    use std::time::Duration;
    fn process_host<A>(hostname: A) -> Result<String, String>
    where
        A: ToSocketAddrs + Display,
    {
        sleep(Duration::from_secs(1));
        return Ok(hostname.to_string());
        // here is the problem
        // -----------------------------------------------------------
        let tcp = match TcpStream::connect(&hostname) {
            Ok(a) => a,
            Err(e) => {
                return Err(format!("{}:{}", hostname, e).to_string());
            }
        };
        let mut sess = match Session::new() {
            Ok(a) => a,
            Err(e) => {
                // todo logging
                return Err(format!("{}:{}", hostname, e).to_string());
            }
        };
        sess.set_tcp_stream(tcp);
        match sess.handshake() {
            Ok(a) => a,
            Err(e) => {
                return Err(format!("{}:{}", hostname, e).to_string());
            }
        };
        Ok(format!("{}", hostname))
    }
    fn main() {
        let hosts = vec!["aaaaa:22", "bbbbbbb:22"];
        let pool = ThreadPoolBuilder::new()
            .num_threads(10)
            .build()
            .expect("failed creating pool");
        pool.install(|| {
            hosts
                .par_iter()
                .map(|x| process_host(x))
                .for_each(|x| println!("{:?}", x))
        });
    }
To debug such a problem, you need to analyze where your program wastes its time.
There are two ways:
profiling, and analyzing the TCP connections.
I suggest the second, because it's much easier.
Dump the traffic with Wireshark and filter it by port 22.
After this, use the Conversations
tab. There you can sort connections by time and see that the program doesn't speed up because there is no time limit on the ssh connection.
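If the missing time limit is indeed the cause, one way to bound the TCP part is `TcpStream::connect_timeout` from the standard library, sketched below. `connect_with_timeout` is a hypothetical helper name, and the sketch only covers the TCP connect and subsequent socket reads and writes; name resolution and the ssh handshake would need their own limits, which are not shown here.

```rust
use std::io;
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Tries each resolved address in turn, giving up on each after `timeout`.
/// Note: the name lookup in `to_socket_addrs` is not covered by the timeout.
fn connect_with_timeout(host: &str, timeout: Duration) -> io::Result<TcpStream> {
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "no addresses resolved");
    for addr in host.to_socket_addrs()? {
        match TcpStream::connect_timeout(&addr, timeout) {
            Ok(stream) => {
                // Also bound later reads and writes on this socket.
                stream.set_read_timeout(Some(timeout))?;
                stream.set_write_timeout(Some(timeout))?;
                return Ok(stream);
            }
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

fn main() {
    // Placeholder host names taken from the question.
    for host in ["aaaaa:22", "bbbbbbb:22"] {
        match connect_with_timeout(host, Duration::from_secs(3)) {
            Ok(_) => println!("{}: connected", host),
            Err(e) => println!("{}: {}", host, e),
        }
    }
}
```

With a bound like this, an offline host costs at most the timeout per resolved address instead of the operating system's default connect timeout, so a pool of 100 threads should no longer be held up by a few dead hosts.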

How to Use Serial Port in Multiple Threads in Rust?

I am trying to read and write to my serial port on Linux to communicate with a microcontroller and I'm trying to do so in Rust.
My normal pattern when developing in say C++ or Python is to have two threads: one which sends requests out over serial periodically and one which reads bytes out of the buffer and handles them.
In Rust, I'm running into trouble with the borrow checker while using the serial crate. It makes sense to me why this happens, but I'm unsure what a design for an asynchronous communication interface would look like in Rust. Here's a snippet of my source:
    let mut port = serial::open(&device_path.as_os_str()).unwrap();
    let request_temperature: Vec<u8> = vec![0xAA];
    thread::spawn(|| {
        let mut buffer: Vec<u8> = Vec::new();
        loop {
            let _bytes_read = port.read(&mut buffer);
            // process data
            thread::sleep(Duration::from_millis(100));
        }
    });
    loop {
        port.write(&request_temperature);
        thread::sleep(Duration::from_millis(1000));
    }
How can I emulate this functionality, where I have two threads holding onto a mutable resource, in Rust? I know that this specific example could be done in a single thread, but for an eventual larger program I expect this to end up being multiple threads.
You can wrap your port in an Arc and a Mutex, then you can write something like:
    use std::sync::{Arc, Mutex};
    use std::thread;
    use std::time::Duration;
    // Stub standing in for the real serial port.
    struct Port;
    impl Port {
        pub fn read(&mut self, _v: &mut Vec<u8>) {
            println!("READING...");
        }
        pub fn write(&mut self, _v: &Vec<u8>) {
            println!("WRITING...");
        }
    }
    pub fn main() {
        let port = Arc::new(Mutex::new(Port));
        let p2 = port.clone();
        let handle = thread::spawn(move || {
            let mut buffer: Vec<u8> = Vec::new();
            for _ in 0..100 {
                // The lock guard is a temporary and is released at the end of this statement.
                let _bytes_read = p2.lock().unwrap().read(&mut buffer);
                thread::sleep(Duration::from_millis(10));
            }
        });
        let request_temperature: Vec<u8> = vec![0xAA];
        for _ in 0..10 {
            port.lock().unwrap().write(&request_temperature);
            thread::sleep(Duration::from_millis(100));
        }
        // Wait for the reader thread and surface a panic if it had one.
        handle.join().unwrap();
    }
So that this will run on a test machine, I've replaced the serial port with a stub struct, reduced the sleeps and replaced the infinite loops with finite ones.
While this works, you'll probably want proper communication between the threads at some stage, at which point you'll want to look at std::sync::mpsc::channel.
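As a rough idea of what the channel-based variant might look like, here is a minimal sketch in which a reader thread forwards bytes to the main thread over `std::sync::mpsc::channel`. `PortReader`, its `read_byte` method and `collect_readings` are hypothetical stubs for illustration; a real serial read would go where the stub produces a counter value.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stub standing in for the read half of the serial port.
struct PortReader {
    next: u8,
}

impl PortReader {
    /// Pretends to read one byte; a real port would block on the device here.
    fn read_byte(&mut self) -> u8 {
        let byte = self.next;
        self.next = self.next.wrapping_add(1);
        byte
    }
}

/// Spawns a reader thread that sends `count` bytes back over a channel.
fn collect_readings(count: usize) -> Vec<u8> {
    let (tx, rx) = mpsc::channel::<u8>();
    let reader = thread::spawn(move || {
        let mut port = PortReader { next: 0xA0 };
        for _ in 0..count {
            // `send` fails only if the receiver is gone; stop reading then.
            if tx.send(port.read_byte()).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receiver's iteration below.
    });
    // The main thread handles the data without ever touching the port itself.
    let readings: Vec<u8> = rx.iter().collect();
    reader.join().expect("reader thread panicked");
    readings
}

fn main() {
    println!("{:02X?}", collect_readings(3));
}
```

The reader thread owns its side of the port outright, so no Mutex is needed around it; only the data crosses threads.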

Rust Concurrent Execution with Futures and Tokio

I've got some Rust code that currently looks like this:
    fn read_stdin(mut tx: mpsc::Sender<String>) {
        loop {
            // read from stdin and send value over tx.
        }
    }
    fn sleep_for(n: u64) -> impl Future<Item = (), Error = ()> {
        thread::sleep(time::Duration::from_millis(n));
        println!("[{}] slept for {} ms", Local::now().format("%T%.3f"), n);
        future::ok(())
    }
    fn main() {
        let (stdin_tx, stdin_rx) = mpsc::channel(0);
        thread::spawn(move || read_stdin(stdin_tx));
        let server = stdin_rx
            .map(|data| data.trim().parse::<u64>().unwrap_or(0))
            .for_each(|n| tokio::spawn(sleep_for(n * 100)));
        tokio::run(server);
    }
It uses tokio and futures, with the aim of running some "CPU heavy" work (emulated by the sleep_for function) and then outputting some stuff to stdout.
When I run it, things seem to work fine and I get this output:
2
[00:00:00.800] slept for 200 ms
10
1
[00:00:01.800] slept for 1000 ms
[00:00:01.900] slept for 100 ms
The first output with the value 2 is exactly as expected, and I see the timestamp printed after 200ms. But for the next inputs, it becomes clear that the sleep_for function is being executed sequentially, and not concurrently.
The output that I want to see is
2
[00:00:00.800] slept for 200 ms
10
1
[00:00:00.900] slept for 100 ms
[00:00:01.900] slept for 1000 ms
It seems that to get the output I'm looking for, I want to execute sleep_for(10) and sleep_for(1) concurrently. How would I go about doing this in Rust with futures and tokio?
(Note: the actual values of the timestamps aren't important; I'm using them to show the ordering of execution within the program.)
Found a solution with the use of the futures-timer crate.
    use chrono::Local;
    use futures::{future, sync::mpsc, Future, Sink, Stream};
    use futures_timer::Delay;
    use std::{io::stdin, thread, time::Duration};
    fn read_stdin(mut tx: mpsc::Sender<String>) {
        let stdin = stdin();
        loop {
            let mut buf = String::new();
            stdin.read_line(&mut buf).unwrap();
            tx = tx.send(buf).wait().unwrap()
        }
    }
    fn main() {
        let (stdin_tx, stdin_rx) = mpsc::channel(0);
        thread::spawn(move || read_stdin(stdin_tx));
        let server = stdin_rx
            .map(|data| data.trim().parse::<u64>().unwrap_or(0) * 100)
            .for_each(|delay| {
                println!("[{}] {} ms -> start", Local::now().format("%T%.3f"), delay);
                tokio::spawn({
                    Delay::new(Duration::from_millis(delay))
                        .and_then(move |_| {
                            println!("[{}] {} ms -> done", Local::now().format("%T%.3f"), delay);
                            future::ok(())
                        })
                        .map_err(|e| panic!(e))
                })
            });
        tokio::run(server);
    }
The issue is that, rather than letting the future become parked and later notifying the current task, the code presented in the question was just sleeping the thread, so no progress could be made.
Update: Now I've just come across tokio-timer which seems like the standard way of doing this.
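To illustrate the difference between sleeping the thread and yielding from a future, here is a toy sketch using only the standard library. It is not how tokio or futures-timer are implemented: `Delay`, `NoopWaker` and `run_all` are hypothetical names, and the "executor" simply re-polls every unfinished future in a loop instead of parking until a waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::{Duration, Instant};

/// Toy timer future: reports `Pending` until its deadline has passed.
struct Delay {
    deadline: Instant,
}

impl Future for Delay {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.deadline {
            Poll::Ready(())
        } else {
            // Return immediately instead of sleeping, so other futures get a turn.
            Poll::Pending
        }
    }
}

/// A waker that does nothing; the toy executor below re-polls on its own.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy single-threaded executor: polls all unfinished delays in turn and
/// returns their durations (in ms) in the order they completed.
fn run_all(delays_ms: &[u64]) -> Vec<u64> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let start = Instant::now();
    let mut pending: Vec<(u64, Pin<Box<Delay>>)> = delays_ms
        .iter()
        .map(|&ms| (ms, Box::pin(Delay { deadline: start + Duration::from_millis(ms) })))
        .collect();
    let mut finished = Vec::new();
    while !pending.is_empty() {
        // Keep only the futures that are still pending after this round.
        pending.retain_mut(|(ms, fut)| match fut.as_mut().poll(&mut cx) {
            Poll::Ready(()) => {
                finished.push(*ms);
                false
            }
            Poll::Pending => true,
        });
        // Crude pause between rounds; a real runtime would wait for a wake-up.
        thread::sleep(Duration::from_millis(1));
    }
    finished
}

fn main() {
    // The longer delay is submitted first, yet the shorter one finishes first.
    println!("{:?}", run_all(&[300, 20]));
}
```

Had `poll` called `thread::sleep` for the full duration, as `sleep_for` in the question effectively does, the first delay would have held the thread until it was done and the completions would come out strictly in submission order.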

Resources