Rust Tokio - limit x threads in specific piece of code [duplicate]

I am using Tokio for some asynchronous Rust code, and have run into a problem. I have some tasks which require access to a connection pool, and the nature of the connection pool means that only a fixed number (NUMCPUS) can run at a time - all other requests will block until there is a free connection.
Currently, I'm just using task::spawn_blocking, which kind of works. However, this has the downside that once 512 requests are blocking on the connection pool, Tokio's entire blocking pool is exhausted and all further blocking tasks are just queued up. This also prevents spawn_blocking calls from elsewhere in the code, ones that don't rely on the connection pool at all, from running.
Is there any way to tell Tokio to keep a certain set of blocking tasks separate and only spawn N of them at a time, while still allowing unrelated blocking tasks to run without queueing up?
The spawn_blocking documentation suggests using Rayon for CPU intensive tasks, but a) it is not clear how to integrate Rayon with Tokio and b) my tasks are not CPU intensive anyway.

You can use a Semaphore: initialize it with the number of concurrently allowed tasks and have each task acquire the semaphore before processing and release it when done. Something like (untested):
use std::future::Future;

use tokio::sync::Semaphore;

struct Pool {
    sem: Semaphore,
}

impl Pool {
    fn new(size: usize) -> Self {
        Pool { sem: Semaphore::new(size) }
    }

    async fn spawn<T>(&self, f: T) -> T::Output
    where
        T: Future + Send + 'static,
        T::Output: Send + 'static,
    {
        // Wait for a free slot; the permit is held until it is dropped at the
        // end of this function, i.e. after `f` has completed.
        let _permit = self.sem.acquire().await.expect("semaphore closed");
        f.await
    }
}
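Since the question is specifically about blocking tasks, the same idea can be combined with spawn_blocking so that at most N of these tasks occupy blocking-pool threads at once, leaving the rest of the blocking pool free for unrelated work. A minimal, untested sketch (the run_limited helper and its bounds are illustrative, not part of any crate):

use std::sync::Arc;

use tokio::sync::Semaphore;
use tokio::task;

async fn run_limited<F, R>(sem: Arc<Semaphore>, work: F) -> R
where
    F: FnOnce() -> R + Send + 'static,
    R: Send + 'static,
{
    // Wait for a permit *before* entering the blocking pool, so queued
    // requests wait here instead of tying up blocking-pool threads.
    let permit = sem.clone().acquire_owned().await.expect("semaphore closed");
    task::spawn_blocking(move || {
        let result = work();
        drop(permit); // the slot frees up once the blocking work is done
        result
    })
    .await
    .expect("blocking task panicked")
}

With a shared Arc<Semaphore::new(NUMCPUS)>, only NUMCPUS of these connection-pool tasks ever sit on blocking-pool threads at the same time, while unrelated spawn_blocking calls are unaffected.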

Related

Is there a performance difference between futures::executor::block_on and block_in_place

I have calls to async code inside a synchronous method (this method is part of a trait and I can't implement it asynchronously) so I use block_on to wait for the async calls to finish.
The sync method will be called from async code.
So the application is in #[tokio::main] and it calls the synchronous method when some event happens (endpoint hit), and the synchronous method will call some async code and wait on it to finish and return.
It turns out block_on can't be used inside async code. I have found that tokio::task::block_in_place effectively creates a synchronous context inside the async context, which allows calling block_on inside it.
So the method now looks like this:
impl SomeTrait for MyStruct {
    fn some_sync_method(&self, handle: tokio::runtime::Handle) -> u32 {
        tokio::task::block_in_place(|| {
            handle.block_on(some_async_function())
        })
    }
}
Is this implementation better, or should I use futures::executor::block_on instead:
impl SomeTrait for MyStruct {
    fn some_sync_method(&self, handle: tokio::runtime::Handle) -> u32 {
        futures::executor::block_on(some_async_function())
    }
}
What is the underlying difference between the two implementations, and in which cases would each of them be more efficient?
Btw, this method gets called a lot. This is part of a web server.
Don't use futures::executor::block_on(). Even before comparing the performance, there is something more important to consider: futures::executor::block_on() is just wrong here, as it blocks the asynchronous runtime.
As explained in block_in_place() docs:
In general, issuing a blocking call or performing a lot of compute in a future without yielding is problematic, as it may prevent the executor from driving other tasks forward. Calling this function informs the executor that the currently executing task is about to block the thread, so the executor is able to hand off any other tasks it has to a new worker thread before that happens. See the CPU-bound tasks and blocking code section for more information.
futures::executor::block_on() is likely to be a little more performant (I haven't benchmarked, though) because it doesn't inform the executor. But that is the point: you need to inform the executor. Otherwise, your code can get stuck until your blocking function completes, executing essentially serially and without utilizing the machine's resources.
If this pattern makes up most of your code, you may want to reconsider using an async runtime at all; it may be more efficient to just spawn threads. Alternatively, give up on that synchronous library and use only async code.
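For reference, here is a minimal, self-contained sketch of the block_in_place pattern discussed above. SomeTrait, MyStruct, and some_async_function are stand-ins for the asker's types, and Handle::current() is used instead of passing a handle in, which is an assumption on my part:

use tokio::runtime::Handle;

trait SomeTrait {
    fn some_sync_method(&self) -> u32;
}

struct MyStruct;

async fn some_async_function() -> u32 {
    42
}

impl SomeTrait for MyStruct {
    fn some_sync_method(&self) -> u32 {
        // block_in_place tells the runtime this worker thread is about to
        // block, so other tasks can be handed off to another worker first.
        // It requires the multi-thread runtime flavor (the default for
        // #[tokio::main]).
        tokio::task::block_in_place(|| Handle::current().block_on(some_async_function()))
    }
}

#[tokio::main]
async fn main() {
    let s = MyStruct;
    println!("{}", s.some_sync_method());
}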

Is there a way to create multiple pools for tokio::spawn_blocking so some tasks don't starve others?


How can I launch a daemon in a websocket handler with actix-web?

Given a basic setup of a WebSocket server with Actix, how can I launch a daemon inside my message handler?
I've extended the example starter code linked above to call daemon(false, true) using the fork crate.
use actix::{Actor, StreamHandler};
use actix_web::{web, App, Error, HttpRequest, HttpResponse, HttpServer};
use actix_web_actors::ws;
use fork::{daemon, Fork};

/// Define HTTP actor
struct MyWs;

impl Actor for MyWs {
    type Context = ws::WebsocketContext<Self>;
}

/// Handler for ws::Message message
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWs {
    fn handle(
        &mut self,
        msg: Result<ws::Message, ws::ProtocolError>,
        ctx: &mut Self::Context,
    ) {
        match msg {
            Ok(ws::Message::Ping(msg)) => ctx.pong(&msg),
            Ok(ws::Message::Text(text)) => {
                println!("text message received");
                if let Ok(Fork::Child) = daemon(false, true) {
                    println!("from daemon: this print but then the websocket crashes!");
                };
                ctx.text(text)
            }
            Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
            _ => (),
        }
    }
}

async fn index(req: HttpRequest, stream: web::Payload) -> Result<HttpResponse, Error> {
    let resp = ws::start(MyWs {}, &req, stream);
    println!("{:?}", resp);
    resp
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().route("/ws/", web::get().to(index)))
        .bind("127.0.0.1:8080")?
        .run()
        .await
}
The above code starts the server but when I send it a message, I receive a Panic in Arbiter thread.
text message received
from daemon: this print but then the websocket crashes!
thread 'actix-rt:worker:0' panicked at 'failed to park', /Users/xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.25/src/runtime/basic_scheduler.rs:158:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Panic in Arbiter thread.
The issue with your application is that the actix-web runtime (i.e. Tokio) is multi-threaded. This is a problem because the fork() call (used internally by daemon()) only replicates the thread that called fork().
Even if your parent process has N threads, your child process will have only 1. If your parent process has any mutexes locked by those threads, their state will be replicated in the child process, but as those threads do not exist there, they will remain locked forever.
If you have an Rc/Arc it will never deallocate its memory, because it will never be dropped, so its internal count will never reach zero. The same applies to any pointers and shared state.
Or, said more simply: your forked child will end up in an undefined state.
This is best explained in Calling fork() in a Multithreaded Environment:
The fork() system call creates an exact duplicate of the address space from which it is called, resulting in two address spaces executing the same code. Problems can occur if the forking address space has multiple threads executing at the time of the fork(). When multithreading is a result of library invocation, threads are not necessarily aware of each other's presence, purpose, actions, and so on. Suppose that one of the other threads (any thread other than the one doing the fork()) has the job of deducting money from your checking account. Clearly, you do not want this to happen twice as a result of some other thread's decision to call fork().

Because of these types of problems, which in general are problems of threads modifying persistent state, POSIX defined the behavior of fork() in the presence of threads to propagate only the forking thread. This solves the problem of improper changes being made to persistent state. However, it causes other problems, as discussed in the next paragraph.

In the POSIX model, only the forking thread is propagated. All the other threads are eliminated without any form of notice; no cancels are sent and no handlers are run. However, all the other portions of the address space are cloned, including all the mutex state. If the other thread has a mutex locked, the mutex will be locked in the child process, but the lock owner will not exist to unlock it. Therefore, the resource protected by the lock will be permanently unavailable.
Here you can find a more reputable source with more details
To answer your other question:
"how can I launch a daemon inside my message handler?"
I assume you want to implement the classical Unix "fork() on accept()" model. In that case you are out of luck, because servers such as actix-web, and async/await in general, are not designed with that in mind. Even if you have a single-threaded async/await server:

When a child is forked it inherits all file descriptors from the parent, so it's common after a fork for the child to close its listening socket in order to avoid a resource leak - but there is no way to do that on any of the async/await based servers; not because it's impossible, but because it's not implemented. An even more important reason to do that is to prevent the child process from accepting new connections - because even if you run a single-threaded server, it's still capable of processing many tasks concurrently - i.e. when your handler calls .await on something, the acceptor is free to accept a new connection (by stealing it from the socket's queue) and start processing it.

Your parent server may also have already spawned a lot of tasks, and those would be replicated in each forked child, thus executing the very same thing multiple times, independently in each process.

And well... there is no way to prevent any of that on any of the async/await based servers I'm familiar with. You would need a custom server that:

Checks in its acceptor task whether it is running in the child, and if so closes the listening socket and drops the acceptor.

Does not execute any other task that was forked from the parent - but there is no way to achieve that.

In other words, async/await and "fork() on accept()" are two different and incompatible models for processing tasks concurrently.

A possible solution would be to have a non-async acceptor daemon that only accepts connections and forks itself, then spawns a web server in the child and feeds it the accepted socket. But although possible, none of the servers currently support that.
As described in the other answer, the async runtime you're relying on may completely break if you touch it in the child process. Touching anything can completely break assumptions the actix or tokio devs made. Wacky stuff will happen if you so much as return from the function.
See this response by one of the key authors of tokio to someone doing something similar (calling fork() in the context of a threadpool with hyper):
Threads + fork is bad news... you can fork if you immediately exec and do not allocate memory or perform any other operation that may have been corrupted by the fork.
Going back to your question:
The objective is for my websocket to respond to messages and be able to launch isolated long-running processes that launch successfully and do not exit when the websocket exits.
I don't think you want to manually fork() at all. Utility functions provided by actix/tokio should integrate well with their runtimes. You may:
Run blocking or CPU-heavy code on a dedicated thread with actix_web::web::block
Spawn a future with actix::AsyncContext::spawn. You would ideally want to use e.g. tokio::process::Command rather than the std version to avoid blocking in an async context.
If all you're doing in the child process is running Command::new() and later Command::spawn(), I'm pretty sure you can just call it directly. There's no need to fork; it does that internally.
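For illustration, here is a minimal sketch of that direct approach. The ./worker path and the Stdio settings are placeholders, and it uses std::process rather than the tokio version since the StreamHandler callback is synchronous and spawn() itself returns quickly:

use std::io;
use std::process::{Command, Stdio};

fn launch_worker() -> io::Result<()> {
    // spawn() starts the child and returns immediately; as long as we never
    // wait() on it, it keeps running independently of the websocket actor.
    // Note: this does not fully daemonize the child (no setsid/double fork),
    // but it avoids fork()ing the runtime's threads entirely.
    Command::new("./worker")
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    Ok(())
}

Inside the StreamHandler above, a call like this could take the place of the daemon(false, true) call.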

The shared mutex problem in Rust (implementing AsyncRead/AsyncWrite for Arc<Mutex<IpStack>>)

Suppose I have a userspace TCP/IP stack. It's natural that I wrap it in Arc<Mutex<>> so I can share it with my threads.
It's also natural that I want to implement AsyncRead and AsyncWrite for it, so that libraries which expect impl AsyncWrite and impl AsyncRead, like hyper, can use it.
This is an example:
use core::task::{Context, Poll};
use std::pin::Pin;
use std::sync::Arc;

use tokio::io::{AsyncRead, AsyncWrite};

struct IpStack {}

impl IpStack {
    pub fn send(self, data: &[u8]) {
    }

    // TODO: async or not?
    pub fn receive<F>(self, f: F)
    where
        F: Fn(Option<&[u8]>),
    {
    }
}

pub struct Socket {
    stack: Arc<futures::lock::Mutex<IpStack>>,
}

impl AsyncRead for Socket {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut tokio::io::ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        // How should I lock and call IpStack::receive here?
        Poll::Ready(Ok(()))
    }
}

impl AsyncWrite for Socket {
    fn poll_write(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<Result<usize, std::io::Error>> {
        // How should I lock and call IpStack::send here?
        Poll::Ready(Ok(buf.len()))
    }

    // poll_flush and poll_shutdown...
}
Playground
I don't see anything wrong with my assumptions, and I don't see a better way to share a stack with multiple threads than wrapping it in Arc<Mutex<>>.
This is similar to try_lock on futures::lock::Mutex outside of async? which caught my interest.
How should I lock the mutex without blocking? Notice that once I have the lock, the IpStack is not async; it has calls that block. I would like to make it async too, but I don't know if that would make the problem much harder. Or would the problem get simpler if it had async calls?
I found the tokio documentation page on tokio::sync::Mutex pretty helpful: https://docs.rs/tokio/1.6.0/tokio/sync/struct.Mutex.html
From your description it sounds you want:
Non-blocking operations
One big data structure that manages all the IO resources managed by the userspace TCP/IP stack
To share that one big data structure across threads
I would suggest exploring something like an actor and use message passing to communicate with a task spawned to manage the TCP/IP resources. I think you could wrap the API kind of like the mini-redis example cited in tokio's documentation to implement AsyncRead and AsyncWrite. It might be easier to start with an API that returns futures of complete results and then work on streaming. I think this would be easier to make correct. Could be fun to exercise it with loom.
I think if you were intent on synchronizing access to the TCP/IP stack through a mutex you'd probably end up with an Arc<Mutex<...>> but with an API that wraps the mutex locks like mini-redis. The suggestion the tokio documentation makes is that their Mutex implementation is more appropriate for managing IO resources rather than sharing raw data and that does fit your situation I think.
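To make that actor suggestion slightly more concrete, here is a minimal, untested sketch; the Command enum, spawn_stack_actor, and the stubbed IpStack are illustrative names rather than anything from an existing crate. One task owns the stack outright and everything else talks to it over a channel, so no mutex is needed:

use tokio::sync::{mpsc, oneshot};

struct IpStack; // stand-in for the userspace stack

impl IpStack {
    fn send(&mut self, _data: &[u8]) {}
}

enum Command {
    Send { data: Vec<u8>, done: oneshot::Sender<()> },
}

fn spawn_stack_actor(mut stack: IpStack) -> mpsc::Sender<Command> {
    let (tx, mut rx) = mpsc::channel(32);
    tokio::spawn(async move {
        // The actor task is the only owner of the stack, so its blocking
        // calls never contend with anyone else for a lock.
        while let Some(cmd) = rx.recv().await {
            match cmd {
                Command::Send { data, done } => {
                    stack.send(&data);
                    let _ = done.send(());
                }
            }
        }
    });
    tx
}

A Socket could then hold a clone of the mpsc::Sender and forward writes as Command::Send messages, awaiting the oneshot reply when it needs completion.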
You should not use an asynchronous mutex for this. Use a standard std::sync::Mutex.
Asynchronous mutexes like futures::lock::Mutex and tokio::sync::Mutex allow locking to be awaited instead of blocking so they are safe to use in async contexts. They are designed to be used across awaits. This is precisely what you don't want to happen! Locking across an await means that the mutex is locked for potentially a very long time and would prevent other asynchronous tasks wanting to use the IpStack from making progress.
Implementing AsyncRead/AsyncWrite is straightforward in theory: either the operation can be completed immediately, or it coordinates through some mechanism to notify the context's waker when the data is ready and returns immediately. Neither case requires extended use of the underlying IpStack, so it's safe to use a non-asynchronous mutex.
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};

use tokio::io::{AsyncRead, AsyncWrite};

struct IpStack {}

pub struct Socket {
    stack: Arc<Mutex<IpStack>>,
}

impl AsyncRead for Socket {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut tokio::io::ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        let ip_stack = self.stack.lock().unwrap();
        // do your stuff
        Poll::Ready(Ok(()))
    }
}
I don't see another better way to share a stack with multiple threads unless I wrap it in Arc<Mutex<>>.
A Mutex is certainly the most straightforward way to implement something like this, but I would suggest an inversion of control.
In the Mutex-based model, the IpStack is really driven by the Sockets, which consider the IpStack to be a shared resource. This results in a problem:
If a Socket blocks on locking the stack, it violates the contract of AsyncRead by spending an unbounded amount of time executing.
If a Socket doesn't block on locking the stack, choosing instead to use try_lock(), it may be starved because it doesn't remain "in line" for the lock. A fair locking algorithm, such as that provided by parking_lot, can't save you from starvation if you don't wait.
Instead, you might approach the problem the way the system network stack does. Sockets are not actors: the network stack drives the sockets, not the other way around.
In practice, this means that the IpStack should have some means of polling the sockets to determine which one(s) to write to or read from next. OS interfaces for this purpose, though not directly applicable, may provide some inspiration. Classically, BSD provided select(2) and poll(2); these days, APIs like epoll(7) (Linux) and kqueue(2) (FreeBSD) are preferred for large numbers of connections.
A dead simple strategy, loosely modeled on select/poll, is to repeatedly scan a list of Socket connections in round-robin fashion, handling their pending data as soon as it is available.
For a basic implementation, some concrete steps are as follows (a code sketch of the read side appears after the list):
When creating a new Socket, a bidirectional channel (i.e. one bounded channel in each direction) is established between it and the IpStack.
AsyncWrite on a Socket attempts to send data over the outgoing channel to the IpStack. If the channel is full, return Poll::Pending.
AsyncRead on a Socket attempts to receive data over the incoming channel from the IpStack. If the channel is empty, return Poll::Pending.
The IpStack must be driven externally (for instance, in an event loop on another thread) to continually poll the open sockets for available data, and to deliver incoming data to the correct sockets. By allowing the IpStack to control which sockets' data is sent, you can avoid the starvation problem of the Mutex solution.
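Here is a minimal sketch of the read side of that design, assuming the stack-driving task delivers Vec<u8> chunks over a tokio mpsc channel; the Socket shape and field names are illustrative, and bytes that don't fit the caller's buffer are simply dropped to keep the example short:

use std::pin::Pin;
use std::task::{Context, Poll};

use tokio::io::{AsyncRead, ReadBuf};
use tokio::sync::mpsc;

pub struct Socket {
    // Filled by the task that drives the IpStack.
    incoming: mpsc::Receiver<Vec<u8>>,
}

impl AsyncRead for Socket {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        let this = self.get_mut();
        match this.incoming.poll_recv(cx) {
            // Data delivered by the stack: copy what fits into the caller's
            // buffer (a real implementation would keep the remainder around).
            Poll::Ready(Some(data)) => {
                let n = data.len().min(buf.remaining());
                buf.put_slice(&data[..n]);
                Poll::Ready(Ok(()))
            }
            // The stack dropped its sender: signal EOF by filling nothing.
            Poll::Ready(None) => Poll::Ready(Ok(())),
            // Nothing available yet; poll_recv has registered the waker.
            Poll::Pending => Poll::Pending,
        }
    }
}

The write side would mirror this over the outgoing channel; tokio's mpsc Sender has no poll-based API out of the box, so something like tokio_util::sync::PollSender (or a try_send plus manual wakeup scheme) would be needed there.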

How can I reliably clean up Rust threads performing blocking IO?

It seems to be a common idiom in Rust to spawn off a thread for blocking IO so you can use non-blocking channels:
use std::net::TcpListener;
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    let (accept_tx, accept_rx) = channel();
    let listener_thread = thread::spawn(move || {
        let listener = TcpListener::bind(":::0").unwrap();
        for client in listener.incoming() {
            if let Err(_) = accept_tx.send(client.unwrap()) {
                break;
            }
        }
    });
}
The problem is, rejoining threads like this depends on the spawned thread "realizing" that the receiving end of the channel has been dropped (i.e., calling send(..) returns Err(_)):
drop(accept_rx);
listener_thread.join(); // blocks until listener thread reaches accept_tx.send(..)
You can make dummy connections for TcpListeners, and shutdown TcpStreams via a clone, but these seem like really hacky ways to clean up such threads, and as it stands, I don't even know of a hack to trigger a thread blocking on a read from stdin to join.
How can I clean up threads like these, or is my architecture just wrong?
One simply cannot safely cancel a thread reliably in Windows or Linux/Unix/POSIX, so it isn't available in the Rust standard library.
Here is an internals discussion about it.
There are a lot of unknowns that come from cancelling threads forcibly. It can get really messy. Beyond that, the combination of threads and blocking I/O will always face this issue: you need every blocking I/O call to have timeouts for it to even have a chance of being interruptible reliably. If one can't write async code, one needs to either use processes (which have a defined boundary and can be ended by the OS forcibly, but obviously come with heavier weight and data sharing challenges) or non-blocking I/O which will land your thread back in an event loop that is interruptible.
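To illustrate that point with the listener from the question: one (admittedly polling-based) way to make the accept loop interruptible is to put the listener into non-blocking mode and check a shutdown flag between attempts. The flag, the 50 ms back-off, and the bind address below are arbitrary choices in this sketch:

use std::io;
use std::net::TcpListener;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() -> io::Result<()> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let flag = shutdown.clone();

    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.set_nonblocking(true)?;

    let listener_thread = thread::spawn(move || {
        while !flag.load(Ordering::Relaxed) {
            match listener.accept() {
                Ok((stream, _addr)) => {
                    // Hand the stream off (e.g. over a channel) here.
                    let _ = stream;
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                    // No pending connection: back off briefly, then re-check
                    // the shutdown flag.
                    thread::sleep(Duration::from_millis(50));
                }
                Err(e) => {
                    eprintln!("accept failed: {}", e);
                    break;
                }
            }
        }
    });

    // Later: request shutdown and join without hanging inside accept().
    shutdown.store(true, Ordering::Relaxed);
    listener_thread.join().unwrap();
    Ok(())
}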
mio is available for async code. Tokio is a higher-level crate built on top of mio that makes writing non-blocking async code even more straightforward.
