What is futures::future::lazy for?

In the Tokio documentation we have this snippet:
extern crate tokio;
extern crate futures;
use futures::future::lazy;
tokio::run(lazy(|| {
    for i in 0..4 {
        tokio::spawn(lazy(move || {
            println!("Hello from task {}", i);
            Ok(())
        }));
    }

    Ok(())
}));
The explanation for this is:
The lazy function runs the closure the first time the future is polled. It is used here to ensure that tokio::spawn is called from a task. Without lazy, tokio::spawn would be called from outside the context of a task, which results in an error.
I'm not sure I understand this precisely, despite having some familiarity with Tokio. It seems that these two lazy have slightly different roles, and that this explanation only applies to the first one. Isn't the second call to lazy (inside the for loop) just here to convert the closure into a future?

The purpose of lazy is covered by the documentation for lazy:
Creates a new future which will eventually be the same as the one created by the closure provided.
The provided closure is only run once the future has a callback scheduled on it, otherwise the callback never runs. Once run, however, this future is the same as the one the closure creates.
Like a plain closure, it's used to prevent code from being eagerly evaluated. In synchronous terms, it's the difference between calling a function and calling the closure that the function returned:
fn sync() -> impl FnOnce() {
    println!("This is run when the function is called");
    || println!("This is run when the return value is called")
}

fn main() {
    let a = sync();
    println!("Called the function");
    a();
}
And the parallel for futures 0.1:
use futures::{future, Future}; // 0.1.27

fn not_sync() -> impl Future<Item = (), Error = ()> {
    println!("This is run when the function is called");
    future::lazy(|| {
        println!("This is run when the return value is called");
        Ok(())
    })
}

fn main() {
    let a = not_sync();
    println!("Called the function");
    a.wait().unwrap();
}
With async/await syntax, this function should not be needed anymore:
#![feature(async_await)] // 1.37.0-nightly (2019-06-05)
use futures::executor; // 0.3.0-alpha.16
use std::future::Future;

fn not_sync() -> impl Future<Output = ()> {
    println!("This is run when the function is called");
    async {
        println!("This is run when the return value is called");
    }
}

fn main() {
    let a = not_sync();
    println!("Called the function");
    executor::block_on(a);
}
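For comparison, here is a hedged sketch of how the original snippet might look with modern Tokio (1.x) and async/await, where async blocks are already lazy and no lazy combinator is needed. Unlike tokio::run, the 1.x runtime does not wait for outstanding tasks, so the sketch joins them explicitly:
#[tokio::main]
async fn main() {
    let mut handles = Vec::new();
    for i in 0..4 {
        // The async block, like a lazy future, does nothing until polled,
        // and tokio::spawn is called from inside the runtime here.
        handles.push(tokio::spawn(async move {
            println!("Hello from task {}", i);
        }));
    }
    // tokio::run waited for all spawned tasks; the 1.x runtime does not,
    // so join the handles explicitly.
    for handle in handles {
        handle.await.unwrap();
    }
}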
As you've identified, Tokio's examples use lazy to:
ensure that the code in the closure is only run from inside the executor.
ensure that a closure is run as a future
I view these two aspects of lazy as effectively the same.
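To see why the outer lazy matters, here is a hedged sketch (tokio 0.1 / futures 0.1, as in the question) of what happens without it: tokio::spawn would run before tokio::run has set up an executor, and it panics in that situation (see the SpawnError question linked below).
extern crate futures; // 0.1
extern crate tokio; // 0.1

use futures::future::lazy;

fn main() {
    // Without the outer `lazy`, the closure body would effectively run here,
    // outside of any executor. Calling spawn at this point panics:
    // tokio::spawn(lazy(|| Ok(())));

    // With the outer `lazy`, the spawn only happens once the runtime
    // polls the wrapping future from inside a task.
    tokio::run(lazy(|| {
        tokio::spawn(lazy(|| Ok(())));
        Ok(())
    }));
}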
See also:
Why does calling tokio::spawn result in the panic "SpawnError { is_shutdown: true }"?
What is the purpose of async/await in Rust?

Related

Async loop on a new thread in rust: the trait `std::future::Future` is not implemented for `()`

I know this question has been asked many times, but I still can't figure out what to do (more below).
I'm trying to spawn a new thread using std::thread::spawn and then run an async loop inside of it.
The async function I want to run:
#[tokio::main]
pub async fn pull_tweets(pg_pool2: Arc<PgPool>, config2: Arc<Settings>) {
    let mut scheduler = AsyncScheduler::new();

    scheduler.every(10.seconds()).run(move || {
        let arc_pool = pg_pool2.clone();
        let arc_config = config2.clone();
        async {
            pull_from_main(arc_pool, arc_config).await;
        }
    });

    tokio::spawn(async move {
        loop {
            scheduler.run_pending().await;
            tokio::time::sleep(Duration::from_millis(100)).await;
        }
    });
}
Spawning a thread to run in:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || async {
        pull_tweets(pg_pool2, config2).await;
    });
}
The error:
error[E0277]: `()` is not a future
--> src/main.rs:89:9
|
89 | pull_tweets(pg_pool2, config2).await;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `()` is not a future
|
= help: the trait `std::future::Future` is not implemented for `()`
= note: required by `poll`
The last comment here does an amazing job of generalizing the problem. It seems that at some point a return value implementing IntoFuture is expected, and I don't have that. I tried adding Ok(()) to both the closure and the function, but no luck:
Adding it to the closure does literally nothing.
Adding it to the async function gives me a new but similar-sounding error:
`Result<(), ()>` is not a future
Then I noticed that the answer specifically talks about extension functions, which I'm not using. This also talks about extension functions.
Some other SO answers:
This is caused by a missing async.
This and this are specific to the reqwest library.
So none of these seem to apply. Could someone help me understand 1) why the error exists here and 2) how to fix it?
Note 1: All of this can be easily solved by replacing std::thread::spawn with tokio::task::spawn_blocking. But I'm purposefully experimenting with thread spawn as per this article.
Note 2: Broader context on what I want to achieve: I'm pulling 150,000 tweets from twitter in an async loop. I want to compare 2 implementations: running on the main runtime vs running on separate thread. The latter is where I struggle.
Note 3: In my mind, threads and async tasks are two different primitives that don't mix, i.e. spawning a new thread doesn't affect how tasks behave, and spawning a new task only adds work to an existing thread. Please let me know if this worldview is mistaken (and what I can read).
#[tokio::main] converts your function into roughly the following:
pub fn pull_tweets(pg_pool2: Arc<PgPool>, config2: Arc<Settings>) {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        let mut scheduler = AsyncScheduler::new();
        // ...
    });
}
Notice that it is a synchronous function that spawns a new runtime and runs the inner future to completion. You do not await it; it is a separate runtime with its own dedicated thread pool and scheduler:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || {
        pull_tweets(pg_pool2, config2);
    });
}
Note that your original example was wrong in another way:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || async {
        pull_tweets(pg_pool2, config2).await;
    });
}
Even if pull_tweets were an async function, the thread would not do anything, as all you do is create another future in the async block. That future is never executed, because futures are lazy (and there is no executor context in that thread anyway).
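A minimal sketch of that laziness in isolation (a standalone example with illustrative names, not taken from the question):
async fn say_hi() {
    // This line never runs in the program below, because the future
    // is never polled.
    println!("hi");
}

fn main() {
    // Calling an async function only constructs a future; nothing executes.
    let _fut = say_hi();
    // `_fut` is dropped here without ever being awaited or spawned,
    // so "hi" is never printed.
}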
I would structure the code to spawn the runtime directly in the new thread, and call any async functions you want from there:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap();

        rt.block_on(async {
            pull_tweets(pg_pool2, config2).await;
        });
    });
}

pub async fn pull_tweets(pg_pool2: Arc<PgPool>, config2: Arc<Settings>) {
    // ...
}

Run async function in run_interval and return result

I need to run an async function in actix::prelude::AsyncContext::run_interval, but I need to also pass in a struct member and return the result (not the future). This is a somewhat more complex version of this question here. As can be seen in the commented section below, I have tried a few approaches but all of them fail for one reason or another.
I have looked at a few related resources, including the AsyncContext trait and these StackOverflow questions: 3, 4.
Here is my example code (actix crate is required in Cargo.toml):
use std::time::Duration;
use actix::{Actor, Arbiter, AsyncContext, Context, System};

struct MyActor {
    id: i32,
}

impl MyActor {
    fn new(id: i32) -> Self {
        Self {
            id: id,
        }
    }

    fn heartbeat(&self, ctx: &mut <Self as Actor>::Context) {
        ctx.run_interval(Duration::from_secs(1), |act, ctx| {
            //lifetime issue
            //let res = 0;
            //Arbiter::spawn(async {
            //    res = two(act.id).await;
            //});

            //future must return `()`
            //let res = Arbiter::spawn(two(act.id));

            //async closures unstable
            //let res = Arbiter::current().exec(async || {
            //    two(act.id).await
            //});
        });
    }
}

impl Actor for MyActor {
    type Context = Context<Self>;

    fn started(&mut self, ctx: &mut Self::Context) {
        self.heartbeat(ctx);
    }
}

// assume functions `one` and `two` live in another module
async fn one(id: i32) -> i32 {
    // assume something is done with id here
    let x = id;
    1
}

async fn two(id: i32) -> i32 {
    let x = id;
    // assume this may call other async functions
    one(x).await;
    2
}

fn main() {
    let mut system = System::new("test");
    system.block_on(async { MyActor::new(10).start() });
    system.run();
}
Rust version:
$ rustc --version
rustc 1.50.0 (cb75ad5db 2021-02-10)
Using Arbiter::spawn would work, but the issue is with the data being accessed from inside the async block that's passed to Arbiter::spawn. Since you're accessing act from inside the async block, that reference will have to live longer than the closure that calls Arbiter::spawn. In fact, it will have to have a 'static lifetime, since the future produced by the async block could potentially live until the end of the program.
One way to get around this in this specific case, given that you need an i32 inside the async block, and an i32 is a Copy type, is to move it:
ctx.run_interval(Duration::from_secs(1), |act, ctx| {
    let id = act.id;
    Arbiter::spawn(async move {
        two(id).await;
    });
});
Since we're using async move, the id variable will be moved into the future, and will thus be available whenever the future is run. By assigning it to id first, we are actually copying the data, and it's the copy (id) that will be moved.
But this might not be what you want if you're trying to get a more general solution where you can access the object inside the async function. In that case, it gets a bit trickier, and you might want to consider not using async functions if possible. If you must, it might be possible to keep the data you need in a separate object wrapped in std::rc::Rc, which can then be cloned and moved into the async block without duplicating the underlying data.
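For illustration, a hedged sketch of that Rc approach, adapted from the question's code (actix 0.10 as in the question; the Inner struct and field names are hypothetical):
use std::rc::Rc;
use std::time::Duration;
use actix::{Actor, Arbiter, AsyncContext, Context, System};

struct Inner {
    id: i32,
}

struct MyActor {
    inner: Rc<Inner>,
}

impl MyActor {
    fn new(id: i32) -> Self {
        Self { inner: Rc::new(Inner { id }) }
    }

    fn heartbeat(&self, ctx: &mut <Self as Actor>::Context) {
        ctx.run_interval(Duration::from_secs(1), |act, _ctx| {
            // Cloning the Rc only bumps a reference count; the clone is
            // moved into the future, so no borrow of `act` escapes the closure.
            let inner = Rc::clone(&act.inner);
            Arbiter::spawn(async move {
                two(inner.id).await;
            });
        });
    }
}

impl Actor for MyActor {
    type Context = Context<Self>;

    fn started(&mut self, ctx: &mut Self::Context) {
        self.heartbeat(ctx);
    }
}

async fn two(id: i32) -> i32 {
    id
}

fn main() {
    let mut system = System::new("rc-example");
    system.block_on(async { MyActor::new(10).start() });
    system.run().unwrap();
}
Note that Rc (rather than Arc) is enough here because Arbiter::spawn runs the future on the current arbiter's thread, so the future does not need to be Send.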

No method named `poll_next` found for struct `tokio::sync::mpsc::Receiver<&str>` in the current scope [duplicate]

I'm trying to learn async programming, but this very basic example doesn't work:
use std::future::Future;

fn main() {
    let t = async {
        println!("Hello, world!");
    };
    t.poll();
}
Everything I've read from the specs says this should work, but cargo complains that method "poll" can't be found in "impl std::future::Future". What am I doing wrong?
poll has this signature:
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
There are two problems with calling this in the way you do:
poll is not implemented on a future Fut, but on Pin<&mut Fut>, so you need to get a pinned reference first. The pin_mut! macro is often useful here, and if the future implements Unpin, you can use Pin::new as well.
The bigger problem however is that poll takes a &mut Context<'_> argument. The context is created by the asynchronous runtime and passed in to the poll function of the outermost future. This means that you can't just poll a future like that, you need to be in an asynchronous runtime to do it.
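To make that second point concrete, here is a hedged sketch (assuming the futures 0.3 crate) of roughly what an executor does: it pins the future, builds a Context from a Waker (a no-op one here), and calls poll. This is for illustration only; real code should use a runtime as shown below.
use std::future::Future;
use std::task::{Context, Poll};
use futures::pin_mut;
use futures::task::noop_waker;

fn main() {
    let t = async {
        println!("Hello, world!");
    };
    // Pin the future on the stack so that `poll` can be called on it.
    pin_mut!(t);
    // Build a Context from a Waker; a real runtime supplies one that
    // reschedules the task, but a no-op waker is enough for a single poll.
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match t.as_mut().poll(&mut cx) {
        Poll::Ready(()) => println!("future completed"),
        Poll::Pending => println!("future is not ready yet"),
    }
}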
Instead, you can use a crate like tokio or async-std to run a future in a synchronous context:
// tokio
use tokio::runtime::Runtime;

let runtime = Runtime::new().unwrap();
let result = runtime.block_on(async {
    // ...
});

// async-std
let result = async_std::task::block_on(async {
    // ...
});
Or even better, you can use #[tokio::main] or #[async_std::main] to convert your main function into an asynchronous function:
// tokio
#[tokio::main]
async fn main() {
    // ...
}

// async-std
#[async_std::main]
async fn main() {
    // ...
}

Spawn non-static future with Tokio

I have an async method that should execute some futures in parallel, and only return after all futures finished. However, it is passed some data by reference that does not live as long as 'static (it will be dropped at some point in the main method). Conceptually, it's similar to this (Playground):
use std::time::Duration;
use tokio::spawn;
use tokio::task::JoinHandle;
use tokio::time::delay_for; // tokio 0.2

async fn do_sth(with: &u64) {
    delay_for(Duration::new(*with, 0)).await;
    println!("{}", with);
}

async fn parallel_stuff(array: &[u64]) {
    let mut tasks: Vec<JoinHandle<()>> = Vec::new();
    for i in array {
        let task = spawn(do_sth(i));
        tasks.push(task);
    }
    for task in tasks {
        task.await;
    }
}

#[tokio::main]
async fn main() {
    parallel_stuff(&[3, 1, 4, 2]);
}
Now, tokio wants futures that are passed to spawn to be valid for the 'static lifetime, because I could drop the handle without the future stopping. That means that my example above produces this error message:
error[E0759]: `array` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
--> src/main.rs:12:25
|
12 | async fn parallel_stuff(array: &[u64]) {
| ^^^^^ ------ this data with an anonymous lifetime `'_`...
| |
| ...is captured here...
...
15 | let task = spawn(do_sth(i));
| ----- ...and is required to live as long as `'static` here
So my question is: How do I spawn futures that are only valid for the current context that I can then wait until all of them completed?
It is not possible to spawn a non-'static future from async Rust. This is because any async function might be cancelled at any time, so there is no way to guarantee that the caller really outlives the spawned tasks.
It is true that there are various crates that allow scoped spawns of async tasks, but these crates cannot be used from async code. What they do allow is spawning scoped async tasks from non-async code. This doesn't contradict the point above, because the non-async code that spawned them cannot be cancelled at any time, as it is not async.
Generally there are two approaches to this:
Spawn a 'static task by using Arc rather than ordinary references.
Use the concurrency primitives from the futures crate instead of spawning.
Generally to spawn a static task and use Arc, you must have ownership of the values in question. This means that since your function took the argument by reference, you cannot use this technique without cloning the data.
async fn do_sth(with: Arc<[u64]>, idx: usize) {
    delay_for(Duration::new(with[idx], 0)).await;
    println!("{}", with[idx]);
}

async fn parallel_stuff(array: &[u64]) {
    // Make a clone of the data so we can share it across tasks.
    let shared: Arc<[u64]> = Arc::from(array);
    let mut tasks: Vec<JoinHandle<()>> = Vec::new();
    for i in 0..array.len() {
        // Cloning an Arc does not clone the data.
        let shared_clone = shared.clone();
        let task = spawn(do_sth(shared_clone, i));
        tasks.push(task);
    }
    for task in tasks {
        task.await;
    }
}
Note that if you have a mutable reference to the data, and the data is Sized, i.e. not a slice, it is possible to temporarily take ownership of it.
async fn do_sth(with: Arc<Vec<u64>>, idx: usize) {
    delay_for(Duration::new(with[idx], 0)).await;
    println!("{}", with[idx]);
}

async fn parallel_stuff(array: &mut Vec<u64>) {
    // Swap the array with an empty one to temporarily take ownership.
    let vec = std::mem::take(array);
    let shared = Arc::new(vec);
    let mut tasks: Vec<JoinHandle<()>> = Vec::new();
    // Iterate over `shared` here: `array` is empty after the take above.
    for i in 0..shared.len() {
        // Cloning an Arc does not clone the data.
        let shared_clone = shared.clone();
        let task = spawn(do_sth(shared_clone, i));
        tasks.push(task);
    }
    for task in tasks {
        task.await;
    }
    // Put back the vector where we took it from.
    // This works because there is only one Arc left.
    *array = Arc::try_unwrap(shared).unwrap();
}
Another option is to use the concurrency primitives from the futures crate. These have the advantage of working with non-'static data, but the disadvantage that the tasks will not be able to run on multiple threads at the same time.
For many workflows this is perfectly fine, as async code should spend most of its time waiting for IO anyway.
One approach is to use FuturesUnordered. This is a special collection that can store many different futures, and it has a next function that runs all of them concurrently and returns once the first of them finishes. (The next function is only available when StreamExt is imported.)
You can use it like this:
use futures::stream::{FuturesUnordered, StreamExt};

async fn do_sth(with: &u64) {
    delay_for(Duration::new(*with, 0)).await;
    println!("{}", with);
}

async fn parallel_stuff(array: &[u64]) {
    let mut tasks = FuturesUnordered::new();
    for i in array {
        let task = do_sth(i);
        tasks.push(task);
    }
    // This loop runs everything concurrently, and waits until they have
    // all finished.
    while let Some(()) = tasks.next().await {}
}
Note: The FuturesUnordered must be defined after the shared value. Otherwise you will get a borrow error that is caused by them being dropped in the wrong order.
Another approach is to use a Stream. With streams, you can use buffer_unordered. This is a utility that uses FuturesUnordered internally.
use futures::stream::StreamExt;

async fn do_sth(with: &u64) {
    delay_for(Duration::new(*with, 0)).await;
    println!("{}", with);
}

async fn parallel_stuff(array: &[u64]) {
    // Create a stream going through the array.
    futures::stream::iter(array)
        // For each item in the stream, create a future.
        .map(|i| do_sth(i))
        // Run at most 10 of the futures concurrently.
        .buffer_unordered(10)
        // Since Streams are lazy, we must use for_each or collect to run them.
        // Here we use for_each and do nothing with the return value from do_sth.
        .for_each(|()| async {})
        .await;
}
Note that in both cases, importing StreamExt is important as it provides various methods that are not available on streams without importing the extension trait.
In the case of code that uses threads for parallelism, it is possible to avoid copying by extending a lifetime with transmute. An example:
fn main() {
    let now = std::time::Instant::now();
    let string = format!("{now:?}");
    println!(
        "{now:?} has length {}",
        parallel_len(&[&string, &string]) / 2
    );
}

fn parallel_len(input: &[&str]) -> usize {
    // SAFETY: this variable needs to be static, because it is passed into a thread,
    // but the thread does not live longer than this function, because we wait for
    // it to finish by calling `join` on it.
    let input: &[&'static str] = unsafe { std::mem::transmute(input) };
    let mut threads = vec![];
    for &txt in input {
        // `txt` is copied into the closure so that it satisfies the
        // 'static bound of `thread::spawn`.
        threads.push(std::thread::spawn(move || txt.len()));
    }
    threads.into_iter().map(|t| t.join().unwrap()).sum()
}
It seems reasonable that this should also work for asynchronous code, but I do not know enough about that to say for sure.

How do you write test assertions inside of tokio::run futures?

How do you test your futures which are meant to be run in the Tokio runtime?
fn fut_dns() -> impl Future<Item = (), Error = ()> {
    let f = dns::lookup("www.google.de", "127.0.0.1:53");
    f.then(|result| match result {
        Ok(smtptls) => {
            println!("{:?}", smtptls);
            assert_eq!(smtptls.version, "TLSRPTv1");
            assert!(smtptls.rua.len() > 0);
            assert_eq!(smtptls.rua[0], "mailto://...");
            ok(())
        }
        Err(e) => {
            println!("error: {:?}", e);
            err(())
        }
    })
}

#[test]
fn smtp_log_test() {
    tokio::run(fut_dns());
    assert!(true);
}
The future runs, and the thread executing the future panics on a failed assert. You can see the panic in the console, but the test does not pick up panics from the threads that tokio::run spawns.
The question How can I test a future that is bound to a tokio TcpStream? doesn't answer this, because it simply says: A simple way to test async code may be to use a dedicated runtime for each test
I do this!
My question is related to how the test can detect if the future works. The future needs a started runtime environment.
The test passes even though the future's assertions fail or it calls err(()).
So what can I do?
Do not write your assertions inside the future.
As described in How can I test a future that is bound to a tokio TcpStream?, create a Runtime to execute your future. As described in How do I synchronously return a value calculated in an asynchronous Future in stable Rust?, compute your value and then exit the async world:
fn run_one<F>(f: F) -> Result<F::Item, F::Error>
where
    F: IntoFuture,
    F::Future: Send + 'static,
    F::Item: Send + 'static,
    F::Error: Send + 'static,
{
    let mut runtime = tokio::runtime::Runtime::new().expect("Unable to create a runtime");
    runtime.block_on(f.into_future())
}

#[test]
fn smtp_log_test() {
    let smtptls = run_one(dns::lookup("www.google.de", "127.0.0.1:53")).unwrap();
    assert_eq!(smtptls.version, "TLSRPTv1");
    assert!(smtptls.rua.len() > 0);
    assert_eq!(smtptls.rua[0], "mailto://...");
}
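For what it's worth, with async/await and a newer Tokio the same pattern is usually written with the #[tokio::test] attribute, which creates the dedicated runtime for you. A hedged sketch, assuming dns::lookup has become an async fn returning a Result:
#[tokio::test]
async fn smtp_log_test() {
    // The attribute builds a runtime per test; assertions stay in the
    // test body, so a failure is reported as a normal test failure.
    let smtptls = dns::lookup("www.google.de", "127.0.0.1:53").await.unwrap();
    assert_eq!(smtptls.version, "TLSRPTv1");
    assert!(smtptls.rua.len() > 0);
    assert_eq!(smtptls.rua[0], "mailto://...");
}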
