Call a random function with variable arguments dynamically - rust

I have a list of functions with variable arguments, and I want to randomly pick one of them at runtime and call it, in a loop. I'm looking to improve the performance of my solution.
I have a function that calculates the arguments based on some randomness, and then (should) return a function pointer, which I could then call.
pub async fn choose_random_endpoint(
    &self,
    rng: ThreadRng,
    endpoint_type: EndpointType,
) -> impl Future<Output = Result<std::string::String, MyError>> {
    match endpoint_type {
        EndpointType::Type1 => {
            let endpoint_arguments = self.choose_endpoint_arguments(rng);
            let endpoint = endpoint1(&self.arg1, &self.arg2, &endpoint_arguments.arg3);
            endpoint
        }
        EndpointType::Type2 => {
            let endpoint_arguments = self.choose_endpoint_arguments(rng);
            let endpoint = endpoint2(
                &self.arg1,
                &self.arg2,
                &endpoint_arguments.arg3,
                rng.clone(),
            );
            endpoint
        }
        EndpointType::Type3 => {
            let endpoint_arguments = self.choose_endpoint_arguments(rng);
            let endpoint = endpoint3(
                &self.arg1,
                &self.arg2,
                &endpoint_arguments.arg3,
                rng.clone(),
            );
            endpoint
        }
    }
}
The error I obtain is
expected opaque type `impl Future<Output = Result<std::string::String, MyError>>` (opaque type at <src/calls/type1.rs:14:6>)
found opaque type `impl Future<Output = Result<std::string::String, MyError>>` (opaque type at <src/type2.rs:19:6>)
The compiler advises me to await the endpoints, and this solves the issue, but is there a performance overhead to this?
Outer function:
Assume there is a loop calling this function:
pub async fn make_call(arg1: &str, arg2: &str) -> Result<String> {
    let mut rng = rand::thread_rng();
    let random_endpoint_type = choose_random_endpoint_type(&mut rng);
    let random_endpoint = choose_random_endpoint(&rng, random_endpoint_type);
    // call the endpoint
    Ok(response)
}
Now, I want to call make_call every X seconds, but I don't want my main thread to block during the endpoint calls, as those are expensive. I suppose the right way to approach this is to spawn a new thread every X seconds that calls make_call?
Also, performance-wise: having so many clones on the rng seems quite expensive. Is there a more performant way to do this?

The error you get is sort of unrelated to async. It's the same one you get when you try to return two different iterators from a function. Your function as written doesn't even need to be async. I'm going to remove async from it when it's not needed, but if you need async (like for implementing an async-trait) then you can add it back and it'll probably work the same.
I've reduced your code into a simpler example that has the same issue (playground):
async fn a() -> &'static str {
    "a"
}

async fn b() -> &'static str {
    "b"
}

fn a_or_b() -> impl Future<Output = &'static str> {
    if rand::random() {
        a()
    } else {
        b()
    }
}
What you're trying to write
When you want to return a trait, but the specific type that implements that trait isn't known at compile time, you can return a trait object. A future has to be pinned before it can be polled, and a dyn Future is not Unpin, so this uses a pinned box (playground).
fn a_or_b() -> Pin<Box<dyn Future<Output = &'static str>>> {
    if rand::random() {
        Box::pin(a())
    } else {
        Box::pin(b())
    }
}
You may need the type to be something like Pin<Box<dyn Future<Output = &'static str> + Send + Sync + 'static>> depending on the context.
What you should write
I think the only reason you'd do the above is if you want to generate the future with some kind of async rng, then do something else, and then run the generated future after that. Otherwise there's no need to have nested futures; just await the inner futures when you call them (playground).
async fn a_or_b() -> &'static str {
    if rand::random() {
        a().await
    } else {
        b().await
    }
}
This is conceptually equivalent to the Pin<Box> method, just without having to allocate a Box. Instead, you have an opaque type that implements Future itself.
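To compare the two variants without pulling in a runtime, here is a std-only sketch. The `pick_a` flag stands in for `rand::random()`, and `block_on` is a toy executor written just for this example (an assumption, not part of the original code); it is only suitable for futures that complete without ever returning `Pending`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn a() -> &'static str {
    "a"
}

async fn b() -> &'static str {
    "b"
}

/// Trait-object variant: both arms are erased to the same boxed type.
fn boxed(pick_a: bool) -> Pin<Box<dyn Future<Output = &'static str>>> {
    if pick_a {
        Box::pin(a())
    } else {
        Box::pin(b())
    }
}

/// Await-inside variant: the function itself is the single opaque future.
async fn awaited(pick_a: bool) -> &'static str {
    if pick_a {
        a().await
    } else {
        b().await
    }
}

/// Waker that does nothing; enough for futures that never return `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (hypothetical helper): polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Calling `block_on(boxed(true))` and `block_on(awaited(true))` both yield "a"; the only difference is that `boxed` allocates.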
Blocking
The blocking behavior of these is only slightly different. Pin<Box> will block on non-async things when you call it, while the async one will block on non-async things where you await it. This is probably mostly the random generation.
The blocking behavior of the endpoint is the same and depends on what happens inside there. It'll block or not block wherever you await either way.
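The difference in when the non-async part runs can be made visible with a small log. This is a sketch under assumptions: `Log`, `boxed_variant`, `async_variant`, `demo` and the toy `block_on` executor are hypothetical names introduced for illustration only.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Log = RefCell<Vec<&'static str>>;

/// Plain function returning a boxed future: the code before `Box::pin`
/// runs as soon as the function is called.
fn boxed_variant<'a>(log: &'a Log) -> Pin<Box<dyn Future<Output = ()> + 'a>> {
    log.borrow_mut().push("boxed: setup");
    Box::pin(async move {
        log.borrow_mut().push("boxed: body");
    })
}

/// `async fn`: nothing in the body runs until the future is first polled.
async fn async_variant(log: &Log) {
    log.borrow_mut().push("async: setup");
    log.borrow_mut().push("async: body");
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (hypothetical helper) for futures that never return `Pending`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Returns the log right after creating both futures, and again after awaiting them.
fn demo() -> (Vec<&'static str>, Vec<&'static str>) {
    let log = Log::default();
    let boxed = boxed_variant(&log);
    let lazy = async_variant(&log);
    let after_calls = log.borrow().clone();
    block_on(lazy);
    block_on(boxed);
    let after_awaits = log.borrow().clone();
    (after_calls, after_awaits)
}
```

After both calls, only "boxed: setup" has been logged; everything else is logged once the futures are driven.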
If you want to have multiple make_call calls happening at the same time, you'll need to do that outside the function anyway. Using the tokio runtime, it would look something like this:
use tokio::task;
use futures::future::join_all;
let tasks: Vec<_> = (0..100).map(|_| task::spawn(make_call())).collect();
let results = join_all(tasks).await;
This also lets you do other stuff while the futures are running, in between collect(); and let results.
If something inside your function blocks, you'd want to spawn it with task::spawn_blocking (and then await that handle) so that the await call in make_call doesn't get blocked.
RNG
If your runtime is multithreaded, the ThreadRng will be an issue. You could create a type that implements Rng + Send with from_entropy, and pass that into your functions. Or you can call thread_rng or even just rand::random where you need it. This makes a new rng per thread, but will reuse them on later calls since it's a thread-local static. On the other hand, if you don't need as much randomness, you can go with a Rng + Send type from the beginning.
If your runtime isn't multithreaded, you should be able to pass &mut ThreadRng all the way through, assuming the borrow checker is smart enough. You won't be able to pass it into an async function and then spawn it, though, so you'd have to create a new one inside that function.

Related

Extracting the saved local variables of the generator data structure of Future

From the "Rust for Rustaceans" book, I read that "... every await or yield is really a return from the function. After all, there are several local variables in the function, and it’s not clear how they’re restored when we resume later on. This is where the compiler-generated part of generators comes into play. The compiler transparently injects code to persist those variables into and read them from the generator’s associated data structure, rather than the stack, at the time of execution. So if you declare, write to, or read from some local variable a, you are really operating on something akin to self.a"
Say I have something like this:
use futures::future::{AbortHandle, Abortable};
use tokio::{time::sleep};
use std::{time::Duration};
async fn echo(s: String, times_to_repeat: u32) {
let mut vec = Vec::new();
for n in 0..times_to_repeat {
println!("Iteration {} Echoing {}", n, s.clone());
vec.push(s.clone());
sleep(Duration::from_millis(10)).await;
}
}
async fn child(s: String) {
echo(s, 100).await
}
#[tokio::main]
async fn main() {
let (abort_handle, abort_registration) = AbortHandle::new_pair();
let result_fut = Abortable::new(child(String::from("Hello")), abort_registration);
tokio::spawn(async move {
sleep(Duration::from_millis(100)).await;
abort_handle.abort();
});
result_fut.await.unwrap();
}
After abort has been called, how do I save/serialize variables like n and vec? Is there a way to reach inside the data structure of the generator that is generated from the Future?

How to iterate over a vector of functions and call them without moving the vector?

I have a struct that contains a vector of functions and a method that iterates over each and calls it.
struct App {
functions: Vec<Box<dyn FnOnce(i32) -> i32>>
}
impl App {
pub fn run(&mut self) {
for function in &mut self.functions {
println!("{}", (function)(1));
}
}
}
However, this moves the vector, and I have not found a clean way to copy it.
cannot move out of `*function` which is behind a mutable reference
move occurs because `*function` has type `Box<dyn FnOnce(i32) -> i32>`, which does not
implement the `Copy` trait (rustc E0507)
So is there any way to call the function without moving it, or a clean way to copy it?
The way function traits in Rust are structured is a little confusing at the beginning, but it follows the same pattern as ownership.
There are three function ownership types:
FnOnce - similar to self. Requires the caller to own this function object, and the object will be consumed. This is the strongest guarantee a caller can make to a function, and every function is FnOnce.
FnMut - similar to &mut self. Requires the caller to hold a mutable reference to the function. Can be called multiple times. All the functions that might have side effects require at least FnMut ownership. Every function that is executable through an FnMut is also compatible with FnOnce, as FnOnce has stronger ownership guarantees.
Fn - similar to &self. Allows the caller to reference this immutably, meaning, this object can be referenced and called from multiple points simultaneously (important: not talking multi-threading here, multi-threading needs Sync additionally). This is the weakest ownership requirement, functions that allow owners to call them through this usually have no side effects and just compute an output from an input. All functions that allow being called through Fn automatically also can be called through FnOnce and FnMut, as those give stronger ownership guarantees.
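As a compact illustration of the three traits, the following sketch passes one closure of each kind to a function with the matching bound. All names here (`call_once`, `call_twice_mut`, `call_twice`, `demo`) are made up for this example.

```rust
/// Consumes the closure: it can be called exactly once.
fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

/// Needs mutable access to the closure, but may call it repeatedly.
fn call_twice_mut<F: FnMut() -> u32>(mut f: F) -> u32 {
    f();
    f()
}

/// Only needs a shared reference, so the caller keeps the closure.
fn call_twice<F: Fn(i32) -> i32>(f: &F) -> i32 {
    f(1) + f(2)
}

fn demo() -> (String, u32, i32) {
    let s = String::from("owned");
    // Moves `s` out of its captured state when called: `FnOnce` only.
    let give_away = move || s;

    let mut count: u32 = 0;
    // Mutates captured state: `FnMut` (and therefore also `FnOnce`).
    let counter = || {
        count += 1;
        count
    };

    // Captures nothing and mutates nothing: `Fn`, `FnMut` and `FnOnce`.
    let double = |x: i32| 2 * x;

    (call_once(give_away), call_twice_mut(counter), call_twice(&double))
}
```

Passing `give_away` to `call_twice_mut` or `counter` to `call_twice` would be rejected by the compiler, which is the same mismatch the question runs into.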
With that out of the way, in your case your functions are FnOnce, meaning they require owning. Calling them will consume them. That last part especially is the reason why you get errors here.
The way you own self in your run method is too weak. You can't call an FnOnce, which requires ownership, on a self object that you don't own. A &mut self object only allows you to call FnMut members.
So there are two ways to fix your code:
Change the type of the self parameter to owned. Note, however, that this will make run a consuming function that will destroy self in the end.
Change the type of your function objects to FnMut. This is of course only possible if your functions allow being called through FnMut.
Here is an example of how it would look if you changed FnOnce to Fn:
struct App {
    functions: Vec<Box<dyn Fn(i32) -> i32>>,
}

impl App {
    pub fn run(&self) {
        for function in &self.functions {
            println!("{}", (function)(1));
        }
    }
}

fn main() {
    let app = App {
        functions: vec![Box::new(|x| 2 * x), Box::new(|x| 3 * x)],
    };
    app.run();
    app.run();
}
2
3
2
3
Note that the closure |x| 2 * x here is compatible with Fn, because it has zero side effects. This means that our self variable can now be an immutable reference, because calling those functions is guaranteed to not mutate self.
We can also call app.run() multiple times, and app doesn't have to be mutable.
Now here is an example of a function that requires FnMut:
struct App {
functions: Vec<Box<dyn FnMut(i32) -> i32>>,
}
impl App {
pub fn run(&mut self) {
for function in &mut self.functions {
println!("{}", (function)(1));
}
}
}
fn main() {
let mut add = {
let mut sum = 0;
move |x| {
sum += x;
println!("sum is now {}!", sum);
sum
}
};
add(5);
add(10);
let mut app = App {
functions: vec![Box::new(add)],
};
app.run();
app.run();
}
sum is now 5!
sum is now 15!
sum is now 16!
16
sum is now 17!
17
This function now has a side effect - it changes the value of sum every time it is called. It is therefore no longer compatible with Fn, and using it in the previous code would cause an error.
Our function objects now have to be FnMut, our self has to become &mut self and our app object is required to be mutable.
We are, however, still able to call app.run() multiple times.
Now, let's see what would be necessary to use an FnOnce:
struct App {
functions: Vec<Box<dyn FnOnce(i32) -> i32>>,
}
impl App {
pub fn run(self) {
for function in self.functions {
println!("{}", (function)(1));
}
}
}
fn main() {
let add = {
let s = String::new();
move |x| {
drop(s);
x
}
};
let app = App {
functions: vec![Box::new(add)],
};
app.run();
}
1
The reason this closure is FnOnce is because of the drop. It destroys the s variable when called, and that can for obvious reasons only happen once.
Therefore, to store it in the App, App has to change its function object type to FnOnce. Further, self in the run() method now also needs to be taken as an owned object, because parts of it will get destroyed in the process.
This now means that we can call app.run() only once. There is no need to mark app as mut, because it won't get mutated in the process, it will get completely consumed, so we don't need to worry about mutability. It simply can't get accessed afterwards any more.
It makes sense that we can now call app.run() only once, because the function it contains can also only be called once.
I hope this helped somehow to increase your understanding of your situation.

How do I define the lifetime for a tokio task spawned from a class?

I'm attempting to write a generic set_interval function helper:
pub fn set_interval<F, Fut>(mut f: F, dur: Duration)
where
    F: Send + 'static + FnMut() -> Fut,
    Fut: Future<Output = ()> + Send + 'static,
{
    let mut interval = tokio::time::interval(dur);
    tokio::spawn(async move {
        // first tick is at 0ms
        interval.tick().await;
        loop {
            interval.tick().await;
            tokio::spawn(f());
        }
    });
}
This works fine until it's called from inside a class:
fn main() {}
struct Foo {}
impl Foo {
fn bar(&self) {
set_interval(|| self.task(), Duration::from_millis(1000));
}
async fn task(&self) {
}
}
self is not 'static, and we can't restrict lifetime parameter to something that is less than 'static because of tokio::task.
Is it possible to modify set_interval implementation so it works in cases like this?
Link to playground
P.S. Tried to
let instance = self.clone();
set_interval(move || instance.task(), Duration::from_millis(1000));
but I also get an error: error: captured variable cannot escape FnMut closure body
Is it possible to modify set_interval implementation so it works in cases like this?
Not really. Though spawn-ing f() really doesn't help either, as it precludes a simple "callback owns the object" solution (as you need either both callback and future to own the object, or just future).
I think that leaves two solutions:
Convert everything to shared mutability Arc, the callback owns one Arc, then on each tick it clones that and moves the clone into the future (the task method).
Have the future (task) acquire the object from some external source instead of being called on one, this way the intermediate callback doesn't need to do anything. Or the callback can do the acquiring and move that into the future, same diff.
Incidentally at this point it could make sense to just create the future directly, but allow cloning it. So instead of taking a callback set_interval would take a clonable future, and it would spawn() clones of its stored future instead of creating them anew.
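The first option (the callback owns one Arc and hands a clone to each future) could be sketched as follows. This is a std-only illustration under assumptions: `make_callback`, `BoxedTask`, `run_ticks` and the toy `block_on` are hypothetical, `task` returns a `usize` only so the result can be inspected, and the ticks are simulated by calling the callback directly instead of going through `set_interval`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Foo {
    name: String,
}

impl Foo {
    /// Takes `Arc<Self>` so the returned future owns its receiver and is `'static`.
    async fn task(self: Arc<Self>) -> usize {
        self.name.len()
    }
}

type BoxedTask = Pin<Box<dyn Future<Output = usize> + Send + 'static>>;

/// Builds a callback of the `FnMut() -> Fut` shape: it owns one `Arc`
/// and moves a fresh clone into every future it creates.
fn make_callback(foo: Arc<Foo>) -> impl FnMut() -> BoxedTask + Send + 'static {
    move || -> BoxedTask {
        let foo = Arc::clone(&foo);
        Box::pin(foo.task())
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (hypothetical helper) for futures that never return `Pending`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Simulates `ticks` interval ticks and reports the results and the final Arc count.
fn run_ticks(ticks: usize) -> (Vec<usize>, usize) {
    let foo = Arc::new(Foo { name: "foo".to_string() });
    let mut callback = make_callback(Arc::clone(&foo));
    let results: Vec<usize> = (0..ticks).map(|_| block_on(callback())).collect();
    (results, Arc::strong_count(&foo))
}
```

Because both the closure and each future are `Send + 'static`, a callback built this way should satisfy the bounds on `set_interval` once `task` returns `()`.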
As mentioned by @Masklinn, you can clone the Arc to allow for this. Note that cloning the Arc will not clone the underlying data, just the pointer, so it is generally OK to do so, and should not have a major impact on performance.
Here is an example. The following code will produce the error async block may outlive the current function, but it borrows data, which is owned by the current function:
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // 🛑 Error: async block may outlive the current function, but it borrows data, which is owned by the current function
    let data = Arc::new("Hello, World".to_string());
    tokio::task::spawn(async {
        println!("1: {}", data.len());
    });
    tokio::task::spawn(async {
        println!("2: {}", data.len());
    });
}
Rust suggests adding move to both async blocks, but that alone fails with a use-of-moved-value error, because the first block takes ownership of data and nothing is left for the second.
To fix the problem, we can clone the Arc for each task and then add the move keyword to the async blocks:
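use std::sync::Arc;

#[tokio::main]
async fn main() {
    let data = Arc::new("Hello, World".to_string());

    // Each task gets its own clone of the Arc (a pointer copy, not a copy of the String).
    let data_for_task_1 = data.clone();
    let task_1 = tokio::task::spawn(async move {
        println!("1: {}", data_for_task_1.len());
    });

    let data_for_task_2 = data.clone();
    let task_2 = tokio::task::spawn(async move {
        println!("2: {}", data_for_task_2.len());
    });

    // Wait for both tasks so their output appears before main returns.
    task_1.await.unwrap();
    task_2.await.unwrap();
}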

Run async function in run_interval and return result

I need to run an async function in actix::prelude::AsyncContext::run_interval, but I need to also pass in a struct member and return the result (not the future). This is a somewhat more complex version of this question here. As can be seen in the commented section below, I have tried a few approaches but all of them fail for one reason or another.
I have looked at a few related resources, including the AsyncContext trait and these StackOverflow questions: 3, 4.
Here is my example code (actix crate is required in Cargo.toml):
use std::time::Duration;
use actix::{Actor, Arbiter, AsyncContext, Context, System};
struct MyActor {
id: i32
}
impl MyActor {
fn new(id: i32) -> Self {
Self {
id: id,
}
}
fn heartbeat(&self, ctx: &mut <Self as Actor>::Context) {
ctx.run_interval(Duration::from_secs(1), |act, ctx| {
//lifetime issue
//let res = 0;
//Arbiter::spawn(async {
// res = two(act.id).await;
//});
//future must return `()`
//let res = Arbiter::spawn(two(act.id));
//async closures unstable
//let res = Arbiter::current().exec(async || {
// two(act.id).await
//});
});
}
}
impl Actor for MyActor {
type Context = Context<Self>;
fn started(&mut self, ctx: &mut Self::Context) {
self.heartbeat(ctx);
}
}
// assume functions `one` and `two` live in another module
async fn one(id: i32) -> i32 {
// assume something is done with id here
let x = id;
1
}
async fn two(id: i32) -> i32 {
let x = id;
// assume this may call other async functions
one(x).await;
2
}
fn main() {
let mut system = System::new("test");
system.block_on(async { MyActor::new(10).start() });
system.run();
}
Rust version:
$ rustc --version
rustc 1.50.0 (cb75ad5db 2021-02-10)
Using Arbiter::spawn would work, but the issue is with the data being accessed from inside the async block that's passed to Arbiter::spawn. Since you're accessing act from inside the async block, that reference will have to live longer than the closure that calls Arbiter::spawn. In fact, it will have to have a lifetime of 'static since the future produced by the async block could potentially live until the end of the program.
One way to get around this in this specific case, given that you need an i32 inside the async block, and an i32 is a Copy type, is to move it:
ctx.run_interval(Duration::from_secs(1), |act, ctx| {
let id = act.id;
Arbiter::spawn(async move {
two(id).await;
});
});
Since we're using async move, the id variable will be moved into the future, and will thus be available whenever the future is run. By assigning it to id first, we are actually copying the data, and it's the copy (id) that will be moved.
But this might not be what you want if you're trying to get a more general solution where you can access the object inside the async function. In that case, it gets a bit trickier, and you might want to consider not using async functions if possible. If you must, it might be possible to have a separate object with the data you need, wrapped in std::rc::Rc, which can then be moved into the async block without duplicating the underlying data.
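A rough, std-only sketch of the Rc idea follows. `ActorData`, `heartbeat_task`, `demo` and the toy `block_on` executor are hypothetical, and this `two` is a stand-in that doubles its input rather than the function from the question. Note that Rc is not Send, so this only fits single-threaded spawning; a multi-threaded executor would need Arc instead.

```rust
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Data the actor wants to share with its async work.
struct ActorData {
    id: i32,
    name: String,
}

/// Stand-in for the question's `two`.
async fn two(id: i32) -> i32 {
    id * 2
}

/// The future owns an `Rc` handle, so it borrows nothing from the caller.
fn heartbeat_task(data: Rc<ActorData>) -> impl Future<Output = i32> {
    async move {
        // Only the pointer was moved in; the `ActorData` itself was not copied.
        two(data.id).await + data.name.len() as i32
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (hypothetical helper) for futures that never return `Pending`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Returns (handles while the future exists, result, handles afterwards).
fn demo() -> (usize, i32, usize) {
    let data = Rc::new(ActorData { id: 10, name: "actor".to_string() });
    let fut = heartbeat_task(Rc::clone(&data));
    let while_pending = Rc::strong_count(&data);
    let result = block_on(fut);
    (while_pending, result, Rc::strong_count(&data))
}
```

Inside `run_interval`, the closure would clone the Rc stored on the actor and move that clone into the spawned block, in the same way the `id` copy is moved above.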

Multi-threaded memoisation in Rust

I am developing an algorithm in Rust that I want to multi-thread. The nature of the algorithm is that it produces solutions to overlapping subproblems, hence why I am looking for a way to achieve multi-threaded memoisation.
An implementation of (single-threaded) memoisation is presented by Pritchard in this article.
I would like to have this functionality extended such that:
Whenever the underlying function must be invoked, including recursively, the result is evaluated asynchronously on a new thread.
Continuing on from the previous point, suppose we have some memoised function f, and f(x) that needs to recursively invoke f(x1), f(x2), … f(xn). It should be possible for all of these recursive invocations to be evaluated concurrently on separate threads.
If the memoised function is called on an input whose result is currently being evaluated, the current thread should block on this thread, and somehow obtain the result after it is released. This ensures that we don't end up with multiple threads attempting to evaluate the same result.
There is a means of forcing f(x) to be evaluated and cached (if it isn't already) without blocking the current thread. This allows the programmer to preemptively begin the evaluation of a result on a particular value that they know will be (or is likely to be) needed later.
One way you could do this is by storing a HashMap, where the key is the parameters to f and the value is the receiver of a oneshot message containing the result. Then for any value that you need:
If there is already a receiver in the map, await it.
Otherwise, spawn a future to start calculating the result, and store the receiver in the map.
Here is a very contrived example that took way longer than it should have, but successfully runs (Playground):
use futures::{
future::{self, BoxFuture},
prelude::*,
ready,
};
use std::{
collections::HashMap,
pin::Pin,
sync::Arc,
task::{Context, Poll},
};
use tokio::sync::{oneshot, Mutex};
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct MemoInput(usize);
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct MemoReturn(usize);
/// This is necessary in order to make a concrete type for the `HashMap`.
struct OneshotReceiverUnwrap<T>(oneshot::Receiver<T>);
impl<T> Future for OneshotReceiverUnwrap<T> {
type Output = T;
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
// Don't worry too much about this part
Poll::Ready(ready!(Pin::new(&mut self.0).poll(cx)).unwrap())
}
}
type MemoMap = Mutex<HashMap<MemoInput, future::Shared<OneshotReceiverUnwrap<MemoReturn>>>>;
/// Compute (2^n)-1, super inefficiently.
fn compute(map: Arc<MemoMap>, x: MemoInput) -> BoxFuture<'static, MemoReturn> {
async move {
// First, get all dependencies.
let dependencies: Vec<MemoReturn> = future::join_all({
let map2 = map.clone();
let mut map_lock = map.lock().await;
// This is an iterator of futures that resolve to the results of the
// dependencies.
(0..x.0).map(move |i| {
let key = MemoInput(i);
let key2 = key.clone();
(*map_lock)
.entry(key)
.or_insert_with(|| {
// If the value is not currently being calculated (ie.
// is not in the map), start calculating it
let (tx, rx) = oneshot::channel();
let map3 = map2.clone();
tokio::spawn(async move {
// Compute the value, then send it to the receiver
// that we put in the map. This will awake all
// threads that were awaiting it.
tx.send(compute(map3, key2).await).unwrap();
});
// Return a shared future so that multiple threads at a
// time can await it
OneshotReceiverUnwrap(rx).shared()
})
.clone() // Clone one instance of the shared future for us
})
})
.await;
// At this point, all dependencies have been resolved!
let result = dependencies.iter().map(|r| r.0).sum::<usize>() + x.0;
MemoReturn(result)
}
.boxed() // Box in order to prevent a recursive type
}
#[tokio::main]
async fn main() {
let map = Arc::new(MemoMap::default());
let result = compute(map, MemoInput(10)).await.0;
println!("{}", result); // 1023
}
Note: this could certainly be better optimized, this is just a POC example.
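For comparison, here is a std-only sketch that uses OS threads and `OnceLock` instead of tokio and oneshot channels. It is a different technique from the one above, not a translation of it: each input gets one `OnceLock` slot, and `get_or_init` makes later callers block until the first caller has filled it. The names `Cache` and `memo` are made up for this example, and spawning one thread per dependency is only reasonable for small inputs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

/// One slot per input; `OnceLock` makes later callers wait for the first.
type Cache = Arc<Mutex<HashMap<u64, Arc<OnceLock<u64>>>>>;

/// Same toy recurrence as above: f(x) = x + f(0) + ... + f(x - 1) = 2^x - 1.
fn memo(cache: &Cache, x: u64) -> u64 {
    // Hold the map lock only long enough to fetch or create the slot.
    let slot = cache.lock().unwrap().entry(x).or_default().clone();

    // The first caller runs the closure; concurrent callers for the same
    // `x` block here until the value is set, then read the cached result.
    *slot.get_or_init(|| {
        // Evaluate all dependencies concurrently, one thread each.
        let workers: Vec<_> = (0..x)
            .map(|i| {
                let cache = Arc::clone(cache);
                thread::spawn(move || memo(&cache, i))
            })
            .collect();
        workers.into_iter().map(|w| w.join().unwrap()).sum::<u64>() + x
    })
}
```

To start evaluating a value ahead of time without blocking, one could spawn `memo` on its own thread and simply not join the handle; a later `memo` call for the same input then waits on, or reuses, that result.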
