I read Rocket v0.5 now uses the Tokio runtime to support Async. I know Async can offer great scalability when we have lots of (like hundreds or thousands of) concurrent IO-bound requests. But many web/REST server apps simply don't fall into that category and in such cases, I feel like Async would only complicate stuff. Sorry if that sounds like a dumb question, but with Rocket 0.5+ will I still be able to write a traditional non-async code the same way as before? Does Async-support in Rocket 0.5+ mean that we will only get Async behaviour for async fn handlers? If so, will the Tokio runtime still play any role in non-async code?
Sure you can.
Look at the first example on Rocket's web page:
#[get("/")]
fn index() -> &'static str {
"Hello, world!"
}
There is no async/await anywhere. The nicest thing about Rocket 0.5 is that you can choose which views are sync and which are async, simply by declaring them so, and you can mix them together as you see fit.
For example, this will just work:
#[get("/sync")]
fn index1() -> &'static str {
"Hello, sync!"
}
#[get("/async")]
async fn index2() -> &'static str {
"Hello, async!"
}
The Rocket runtime is all async under the hood, but that doesn't need to be exposed to your view handlers at all. When a non-async handler is run, it will be as if Rocket used spawn_blocking().
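If a handler does need to make a long blocking call, you can also offload it explicitly to the blocking thread pool. A minimal sketch, assuming Rocket 0.5's re-export of tokio (rocket::tokio); the sleep here just stands in for real blocking work:

use rocket::tokio::task;

#[get("/blocking")]
async fn blocking_work() -> &'static str {
    // Run the blocking call on tokio's dedicated blocking thread pool so
    // it doesn't stall one of the async worker threads.
    task::spawn_blocking(|| {
        std::thread::sleep(std::time::Duration::from_millis(100));
        "Hello from a blocking task!"
    })
    .await
    .unwrap_or("the blocking task panicked")
}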
I have calls to async code inside a synchronous method (the method is part of a trait, so I can't make it async), so I use block_on to wait for the async calls to finish.
The sync method will be called from async code.
So the application runs in #[tokio::main] and calls the synchronous method when some event happens (an endpoint is hit), and the synchronous method calls some async code and waits for it to finish before returning.
It turns out block_on can't be used inside async code. I have found that tokio::task::block_in_place essentially opens a synchronous context inside the async context, which allows one to call block_on inside it.
So the method now looks like this:
impl SomeTrait for MyStruct {
    fn some_sync_method(&self, handle: tokio::runtime::Handle) -> u32 {
        tokio::task::block_in_place(|| {
            handle.block_on(some_async_function())
        })
    }
}
Is this implementation better, or should I use futures::executor::block_on instead:
impl SomeTrait for MyStruct {
    fn some_sync_method(&self, handle: tokio::runtime::Handle) -> u32 {
        futures::executor::block_on(some_async_function())
    }
}
What is the underlying difference between the two implementations, and in which cases would each of them be more efficient?
Btw, this method gets called a lot. This is part of a web server.
Don't use futures::executor::block_on(). Even before comparing performance, there is something more important to consider: futures::executor::block_on() is simply wrong here, as it blocks the asynchronous runtime.
As explained in the block_in_place() documentation:
In general, issuing a blocking call or performing a lot of compute in a future without yielding is problematic, as it may prevent the executor from driving other tasks forward. Calling this function informs the executor that the currently executing task is about to block the thread, so the executor is able to hand off any other tasks it has to a new worker thread before that happens. See the CPU-bound tasks and blocking code section for more information.
futures::executor::block_on() is likely to be slightly more performant (I haven't benchmarked it, though) precisely because it doesn't inform the executor. But that is the point: you need to inform the executor. Otherwise, your code can get stuck until your blocking function completes, effectively serializing execution without utilizing the machine's resources.
If this pattern makes up most of your code, you may want to reconsider using an async runtime at all; it may be more efficient to just spawn threads. Alternatively, give up on the library that forces the synchronous trait and use only async code.
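For reference, here is a complete, runnable version of the recommended pattern as a minimal sketch: some_async_function is simulated with a sleep, and the runtime must be the multi-threaded flavor, since block_in_place panics on a current-thread runtime.

use std::time::Duration;

async fn some_async_function() -> u32 {
    tokio::time::sleep(Duration::from_millis(10)).await;
    42
}

struct MyStruct;

impl MyStruct {
    // The pattern recommended above: tell the executor this thread is about
    // to block, then drive the future to completion via the runtime handle.
    fn some_sync_method(&self, handle: tokio::runtime::Handle) -> u32 {
        tokio::task::block_in_place(|| handle.block_on(some_async_function()))
    }
}

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    let s = MyStruct;
    let result = s.some_sync_method(tokio::runtime::Handle::current());
    assert_eq!(result, 42);
}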
I want to write some generic retry logic for a future.
I know the concrete return type and want to retry the same future.
My code only has access to the future - I do not want to wrap every fn call site in a closure to enable recreating it.
It seems that a "future" is a combination of (fn, args), and when .await is called, it runs and waits for the result in place.
If I am able to clone all of the args, would it be possible to create a clone of the not-started future to retry it if it fails the first time?
The problem is that a not-yet-started future is the same type as a future that has already started - the future transforms itself in place. So while in theory a Future could be Clone, that would place severe constraints on the state it is allowed to keep during its whole lifetime. For futures implemented with async fn, not only would the initial state (the parameters passed to the async fn) have to be Clone, but so would all the local variables that cross .await points.
A simple experiment shows that today's async blocks don't auto-implement Clone the way they do e.g. Send, even in cases where that would be safe. For example:
use std::future::Future;

async fn retry(f: impl Future + Clone) {
    todo!()
}

fn main() {
    // fails to compile:
    retry(async {});
    // ^^^^^^^^ the trait `Clone` is not implemented for `impl Future`
}
I do not want to wrap every fn call site in a closure to enable recreating it.
In this situation that's probably exactly what you need to do. Or use some sort of macro if the closure requires too much boilerplate.
A Future can be cloned via FutureExt::shared: https://docs.rs/futures/latest/futures/future/trait.FutureExt.html#method.shared. This is useful for handing the future to multiple consumers, but it is not suitable for retry.
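For illustration, a minimal sketch of shared(), driven with futures' own block_on just to keep the example self-contained: the future's output must be Clone, and every clone resolves to the same value, which is why it cannot re-run a failed attempt:

use futures::future::FutureExt;

fn main() {
    futures::executor::block_on(async {
        // shared() turns the future into a cloneable handle; all clones
        // poll the same underlying future and receive the same output.
        let fut = async { 42u32 }.shared();
        let a = fut.clone();
        let b = fut;
        assert_eq!((a.await, b.await), (42, 42));
    });
}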
To have retries with Futures, you need some kind of Future factory, to create a new Future for a retry when an error occurs. Ideally this retry mechanism would be wrapped in its own Future, to hide the complexity for consumers.
There's a crate which does that already: https://docs.rs/futures-retry/latest/futures_retry/struct.FutureRetry.html
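To make the factory idea concrete, here is a minimal hand-rolled sketch; the names retry and make_fut are illustrative and not from futures-retry. The caller passes a closure that builds a fresh future for every attempt:

use std::future::Future;

// Retry up to `attempts` times, creating a brand-new future per attempt.
async fn retry<F, Fut, T, E>(mut make_fut: F, attempts: usize) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut last_err = None;
    for _ in 0..attempts {
        match make_fut().await {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e),
        }
    }
    // Panics if `attempts` is zero, since no error was ever produced.
    Err(last_err.expect("attempts must be at least 1"))
}

A call site then looks like retry(|| fetch(url.clone()), 3).await: the closure is exactly the future factory described above.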
I'm trying to call an async function inside non-async context, and I'm having a really hard time with it.
Channels have been far easier to use for me - it's pretty simple and intuitive.
recv means block the thread until you receive something.
try_recv means see if something's there, otherwise error out.
recv_timeout means try for a certain amount of time, and then error out if nothing's there after the timeout.
I've been looking all over the documentation of std::future::Future, but I don't see any way to do something similar. None of the functions I've tried offer a simple solution; they all take or return awkward types that require even more unwrapping.
The Future trait in the standard library is very rudimentary and provides a stable foundation for others to build on.
Async runtimes (such as tokio, async-std, smol) include combinators that can take a future and turn it into another future. The tokio library has one such combinator called timeout.
Here is an example which times out after 1 second while attempting to receive on a oneshot channel.
use std::time::Duration;
use tokio::{runtime::Runtime, sync::oneshot, time::{timeout, error::Elapsed}};

fn main() {
    // Create a oneshot channel; its receiver implements `Future`, so we can use it as an example.
    let (_tx, rx) = oneshot::channel::<()>();

    // Create a new tokio runtime. If you already have an async environment,
    // you probably want to use tokio::spawn instead in order to re-use the existing runtime.
    let rt = Runtime::new().unwrap();

    // block_on is a function on the runtime which makes the current thread poll the future
    // until it has completed. async move { } creates an async block, which implements `Future`.
    let output: Result<_, Elapsed> = rt.block_on(async move {
        // The timeout function returns another future, which outputs a `Result<T, Elapsed>`.
        // If the inner future times out, the `Elapsed` error is returned.
        timeout(Duration::from_secs(1), rx).await
    });
    println!("{:?}", output);
}
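As an aside, if you are in fully synchronous code, tokio's channels also offer direct blocking analogs of the std methods described in the question. A minimal sketch (the blocking side needs no runtime at all):

use tokio::sync::mpsc;

fn main() {
    // blocking_send / blocking_recv are tokio's analogs of std's send/recv.
    // They must not be called from inside an async context (they would block
    // a runtime worker thread), but they are fine on plain threads like these.
    let (tx, mut rx) = mpsc::channel::<u32>(8);
    let producer = std::thread::spawn(move || {
        tx.blocking_send(7).unwrap();
    });
    assert_eq!(rx.blocking_recv(), Some(7));
    producer.join().unwrap();
}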
Suppose I have a userspace TCP/IP stack. It's natural to wrap it in Arc<Mutex<>> so I can share it with my threads.
It's also natural that I want to implement AsyncRead and AsyncWrite for it, so libraries that expect impl AsyncWrite and impl AsyncRead like hyper can use it.
This is an example:
use core::task::{Context, Poll};
use std::pin::Pin;
use std::sync::Arc;
use tokio::io::{AsyncRead, AsyncWrite};

struct IpStack {}

impl IpStack {
    pub fn send(&mut self, data: &[u8]) {
    }

    //TODO: async or not?
    pub fn receive<F>(&mut self, f: F)
    where
        F: Fn(Option<&[u8]>),
    {
    }
}

pub struct Socket {
    stack: Arc<futures::lock::Mutex<IpStack>>,
}

impl AsyncRead for Socket {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut tokio::io::ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        //How should I lock and call IpStack::receive here?
        Poll::Ready(Ok(()))
    }
}

impl AsyncWrite for Socket {
    fn poll_write(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<Result<usize, std::io::Error>> {
        //How should I lock and call IpStack::send here?
        Poll::Ready(Ok(buf.len()))
    }

    //poll_flush and poll_shutdown...
}
I don't see anything wrong with my assumptions, and I don't see a better way to share a stack with multiple threads than wrapping it in Arc<Mutex<>>.
This is similar to "try_lock on futures::lock::Mutex outside of async?", which caught my interest.
How should I lock the mutex without blocking? Notice that once I have the lock, the IpStack is not async; it has calls that block. I would like to make it async too, but I don't know if that would make the problem much harder. Or would the problem get simpler if it had async calls?
I found the tokio documentation page on tokio::sync::Mutex pretty helpful: https://docs.rs/tokio/1.6.0/tokio/sync/struct.Mutex.html
From your description it sounds like you want:
Non-blocking operations
One big data structure that owns all the IO resources of the userspace TCP/IP stack
To share that one big data structure across threads
I would suggest exploring something like an actor, using message passing to communicate with a task spawned to manage the TCP/IP resources. I think you could wrap the API, much like the mini-redis example cited in tokio's documentation, to implement AsyncRead and AsyncWrite. It might be easier to start with an API that returns futures of complete results and then work on streaming; I think this would be easier to make correct. It could also be fun to exercise it with loom.
I think if you were intent on synchronizing access to the TCP/IP stack through a mutex you'd probably end up with an Arc<Mutex<...>> but with an API that wraps the mutex locks like mini-redis. The suggestion the tokio documentation makes is that their Mutex implementation is more appropriate for managing IO resources rather than sharing raw data and that does fit your situation I think.
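To make the actor suggestion concrete, here is a minimal, runnable sketch loosely following the mini-redis pattern; the Command type and channel sizes are illustrative, not from any particular crate:

use tokio::sync::{mpsc, oneshot};

// Commands go over a channel to a single task that owns the IpStack,
// so no Mutex is needed at all.
enum Command {
    Send { data: Vec<u8>, done: oneshot::Sender<()> },
}

struct IpStack;

impl IpStack {
    fn send(&mut self, _data: &[u8]) { /* the blocking stack work */ }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Command>(32);

    // The actor task is the sole owner of the stack; no locking required.
    tokio::spawn(async move {
        let mut stack = IpStack;
        while let Some(cmd) = rx.recv().await {
            match cmd {
                Command::Send { data, done } => {
                    stack.send(&data);
                    let _ = done.send(());
                }
            }
        }
    });

    // Clients send commands and await completion instead of locking.
    let (done_tx, done_rx) = oneshot::channel();
    tx.send(Command::Send { data: b"hello".to_vec(), done: done_tx })
        .await
        .ok()
        .expect("actor task has shut down");
    done_rx.await.unwrap();
}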
You should not use an asynchronous mutex for this. Use a standard std::sync::Mutex.
Asynchronous mutexes like futures::lock::Mutex and tokio::sync::Mutex allow the lock acquisition to be awaited instead of blocking, so they are safe to use in async contexts. They are designed to be held across await points. This is precisely what you don't want to happen! Holding the lock across an await means the mutex may stay locked for a very long time, preventing other asynchronous tasks that want to use the IpStack from making progress.
Implementing AsyncRead/AsyncWrite is straightforward in theory: either the operation can be completed immediately, or the implementation arranges for the context's waker to be notified when the data is ready and returns right away. Neither case requires holding the underlying IpStack for an extended time, so it's safe to use a non-asynchronous mutex.
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};
use tokio::io::AsyncRead;

struct IpStack {}

pub struct Socket {
    stack: Arc<Mutex<IpStack>>,
}

impl AsyncRead for Socket {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut tokio::io::ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        let ip_stack = self.stack.lock().unwrap();
        // do your stuff
        Poll::Ready(Ok(()))
    }
}
I don't see another better way to share a stack with multiple threads unless I wrap it in Arc<Mutex<>>.
A Mutex is certainly the most straightforward way to implement something like this, but I would suggest an inversion of control.
In the Mutex-based model, the IpStack is really driven by the Sockets, which consider the IpStack to be a shared resource. This results in a problem:
If a Socket blocks on locking the stack, it violates the contract of AsyncRead by spending an unbounded amount of time executing.
If a Socket doesn't block on locking the stack, choosing instead to use try_lock(), it may be starved because it doesn't remain "in line" for the lock. A fair locking algorithm, such as that provided by parking_lot, can't save you from starvation if you don't wait.
Instead, you might approach the problem the way the system network stack does. Sockets are not actors: the network stack drives the sockets, not the other way around.
In practice, this means that the IpStack should have some means of polling the sockets to determine which one(s) to write to or read from next. OS interfaces for this purpose, though not directly applicable, may provide some inspiration. Classically, BSD provided select(2) and poll(2); these days, APIs like epoll(7) (Linux) and kqueue(2) (FreeBSD) are preferred for large numbers of connections.
A dead simple strategy, loosely modeled on select/poll, is to repeatedly scan a list of Socket connections in round-robin fashion, handling their pending data as soon as it is available.
For a basic implementation, some concrete steps are:
When creating a new Socket, a bidirectional channel (i.e. one bounded channel in each direction) is established between it and the IpStack.
AsyncWrite on a Socket attempts to send data over the outgoing channel to the IpStack. If the channel is full, return Poll::Pending (a sketch of this write path follows the list).
AsyncRead on a Socket attempts to receive data over the incoming channel from the IpStack. If the channel is empty, return Poll::Pending.
The IpStack must be driven externally (for instance, in an event loop on another thread) to continually poll the open sockets for available data, and to deliver incoming data to the correct sockets. By allowing the IpStack to control which sockets' data is sent, you can avoid the starvation problem of the Mutex solution.
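Here is a minimal sketch of the write path from step 2, under the assumption that the outgoing channel is a bounded tokio mpsc channel wrapped in tokio_util's PollSender, whose poll_reserve registers the waker when the channel is full so the Poll::Pending wakeup is not lost:

use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;
use tokio_util::sync::PollSender;

pub struct Socket {
    // Bounded channel to the IpStack task.
    outgoing: PollSender<Vec<u8>>,
}

impl AsyncWrite for Socket {
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<Result<usize, std::io::Error>> {
        // Wait for capacity; on Pending the waker is already registered.
        match self.outgoing.poll_reserve(cx) {
            Poll::Pending => Poll::Pending,
            // The IpStack task dropped its receiver: the connection is gone.
            Poll::Ready(Err(_)) => {
                Poll::Ready(Err(std::io::ErrorKind::BrokenPipe.into()))
            }
            Poll::Ready(Ok(())) => {
                // Capacity was reserved, so send_item cannot fail with Full.
                self.outgoing
                    .send_item(buf.to_vec())
                    .map_err(|_| std::io::Error::from(std::io::ErrorKind::BrokenPipe))?;
                Poll::Ready(Ok(buf.len()))
            }
        }
    }

    fn poll_flush(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
    ) -> Poll<Result<(), std::io::Error>> {
        Poll::Ready(Ok(()))
    }

    fn poll_shutdown(
        mut self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
    ) -> Poll<Result<(), std::io::Error>> {
        // Closing the sender signals the IpStack task to tear down this socket.
        self.outgoing.close();
        Poll::Ready(Ok(()))
    }
}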
I'm researching and playing with Rust's async/.await to write a service in Rust that will pull from some websockets and do something with that data. A colleague of mine (who did this similar "data feed importing" in C#) has told me to handle these feeds asynchronously, since threads would be bad performance-wise.
It's my understanding that, to do any async in Rust, you need a runtime (e.g. Tokio). After inspecting most of the code I've found on the subject, it seems that a prerequisite is to have:
#[tokio::main]
async fn main() {
    // ...
}
which provides the runtime that manages our async code. I came to this conclusion because you cannot use .await outside of async functions or blocks.
This leads me to my main question: if intending to use async in Rust, do you always need an async fn main() as described above? If so, how do you structure your synchronous code? Can structs have async methods and associated functions implemented (or should they even)?
All of this stems from my initial approach to writing this service, because the way I envisioned it is to have some sort of struct which would handle multiple websocket feeds and if they need to be done asynchronously, then by this logic, that struct would have to have async logic in it.
No. #[tokio::main] is just a convenience attribute which creates a Tokio runtime and launches the main function inside it.
If you want to initialize a runtime instance explicitly, you can use the Builder. The runtime has a spawn method, which takes a future and executes it inside the runtime without itself being async, and a block_on method, which drives a future to completion on the current thread. This allows you to create and use a Tokio runtime anywhere in your non-async code.
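A minimal sketch of that explicit setup (the multi-threaded flavor with all drivers enabled, matching #[tokio::main]'s defaults):

use tokio::runtime::Builder;

fn main() {
    // Build the runtime explicitly; this is roughly what #[tokio::main]
    // expands to (a multi-threaded runtime with IO and time drivers).
    let rt = Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime");

    // block_on drives an async block to completion from sync code.
    let msg = rt.block_on(async {
        // websocket/feed-handling work would go here
        "done"
    });
    println!("{msg}");

    // spawn launches a task from non-async code; it runs in the background
    // on the runtime's worker threads.
    let handle = rt.spawn(async { 1 + 1 });
    assert_eq!(rt.block_on(handle).unwrap(), 2);
}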