Spawning tasks with non-static lifetimes with tokio 0.1.x - rust

I have a tokio core whose main task is running a websocket (client). When I receive some messages from the server, I want to execute a new task that will update some data. Below is a minimal failing example:
use tokio_core::reactor::{Core, Handle};
use futures::future::Future;
use futures::future;
struct Client {
handle: Handle,
data: usize,
}
impl Client {
fn update_data(&mut self) {
// spawn a new task that updates the data
self.handle.spawn(future::ok(()).and_then(|x| {
self.data += 1; // error here
future::ok(())
}));
}
}
fn main() {
let mut runtime = Core::new().unwrap();
let mut client = Client {
handle: runtime.handle(),
data: 0,
};
let task = future::ok::<(), ()>(()).and_then(|_| {
// under some conditions (omitted), we update the data
client.update_data();
future::ok::<(), ()>(())
});
runtime.run(task).unwrap();
}
Which produces this error:
error[E0477]: the type `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:13:51: 16:10 self:&mut &mut Client]>` does not fulfill the required lifetime
--> src/main.rs:13:21
|
13 | self.handle.spawn(future::ok(()).and_then(|x| {
| ^^^^^
|
= note: type must satisfy the static lifetime
The problem is that new tasks that are spawned through a handle need to be static. The same issue is described here. Sadly it is unclear to me how I can fix the issue. Even some attempts with and Arc and a Mutex (which really shouldn't be needed for a single-threaded application), I was unsuccessful.
Since developments occur rather quickly in the tokio landscape, I am wondering what the current best solution is. Do you have any suggestions?
edit
The solution by Peter Hall works for the example above. Sadly when I built the failing example I changed tokio reactor, thinking they would be similar. Using tokio::runtime::current_thread
use futures::future;
use futures::future::Future;
use futures::stream::Stream;
use std::cell::Cell;
use std::rc::Rc;
use tokio::runtime::current_thread::{Builder, Handle};
struct Client {
handle: Handle,
data: Rc<Cell<usize>>,
}
impl Client {
fn update_data(&mut self) {
// spawn a new task that updates the data
let mut data = Rc::clone(&self.data);
self.handle.spawn(future::ok(()).and_then(move |_x| {
data.set(data.get() + 1);
future::ok(())
}));
}
}
fn main() {
// let mut runtime = Core::new().unwrap();
let mut runtime = Builder::new().build().unwrap();
let mut client = Client {
handle: runtime.handle(),
data: Rc::new(Cell::new(1)),
};
let task = future::ok::<(), ()>(()).and_then(|_| {
// under some conditions (omitted), we update the data
client.update_data();
future::ok::<(), ()>(())
});
runtime.block_on(task).unwrap();
}
I obtain:
error[E0277]: `std::rc::Rc<std::cell::Cell<usize>>` cannot be sent between threads safely
--> src/main.rs:17:21
|
17 | self.handle.spawn(future::ok(()).and_then(move |_x| {
| ^^^^^ `std::rc::Rc<std::cell::Cell<usize>>` cannot be sent between threads safely
|
= help: within `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::Cell<usize>>`
= note: required because it appears within the type `[closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]`
= note: required because it appears within the type `futures::future::chain::Chain<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`
= note: required because it appears within the type `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`
So it does seem like in this case I need an Arc and a Mutex even though the entire code is single-threaded?

In a single-threaded program, you don't need to use Arc; Rc is sufficient:
use std::{rc::Rc, cell::Cell};
struct Client {
handle: Handle,
data: Rc<Cell<usize>>,
}
impl Client {
fn update_data(&mut self) {
let data = Rc::clone(&self.data);
self.handle.spawn(future::ok(()).and_then(move |_x| {
data.set(data.get() + 1);
future::ok(())
}));
}
}
The point is that you no longer have to worry about the lifetime because each clone of the Rc acts as if it owns the data, rather than accessing it via a reference to self. The inner Cell (or RefCell for non-Copy types) is needed because the Rc can't be dereferenced mutably, since it has been cloned.
The spawn method of tokio::runtime::current_thread::Handle requires that the future is Send, which is what is causing the problem in the update to your question. There is an explanation (of sorts) for why this is the case in this Tokio Github issue.
You can use tokio::runtime::current_thread::spawn instead of the method of Handle, which will always run the future in the current thread, and does not require that the future is Send. You can replace self.handle.spawn in the code above and it will work just fine.
If you need to use the method on Handle then you will also need to resort to Arc and Mutex (or RwLock) in order to satisfy the Send requirement:
use std::sync::{Mutex, Arc};
struct Client {
handle: Handle,
data: Arc<Mutex<usize>>,
}
impl Client {
fn update_data(&mut self) {
let data = Arc::clone(&self.data);
self.handle.spawn(future::ok(()).and_then(move |_x| {
*data.lock().unwrap() += 1;
future::ok(())
}));
}
}
If your data is really a usize, you could also use AtomicUsize instead of Mutex<usize>, but I personally find it just as unwieldy to work with.

Related

Callbacks that mutate local state

I have two structs:
Client, which stores a callback and calls it in response to receiving new data. As an example, you can think of this as a websocket client, and we want to provide a hook for incoming messages.
BusinessLogic, which wants to hold a Client initialized with a callback that will update its local value in response to changes that the Client sees.
After following compiler hints, I arrived at the following minimal example:
Rust playground link
use rand::Rng;
struct Client<'cb> {
callback: Box<dyn FnMut(i64) + 'cb>,
}
impl<'cb> Client<'cb> {
fn do_thing(&mut self) {
// does stuff
let value = self._get_new_value();
// does more stuff
(self.callback)(value);
// does even more stuff
}
fn _get_new_value(&self) -> i64 {
let mut rng = rand::thread_rng();
rng.gen()
}
}
struct BusinessLogic<'cb> {
value: Option<i64>,
client: Option<Client<'cb>>,
}
impl<'cb> BusinessLogic<'cb> {
fn new() -> Self {
Self {
value: None,
client: None,
}
}
fn subscribe(&'cb mut self) {
self.client = Some(Client {
callback: Box::new(|value| {
self.value = Some(value);
})
})
}
}
fn main() {
let mut bl = BusinessLogic::new();
bl.subscribe();
println!("Hello, world!");
}
Problem is, I am still getting the following compiler error:
Compiling playground v0.0.1 (/playground)
error[E0597]: `bl` does not live long enough
--> src/main.rs:51:5
|
51 | bl.subscribe();
| ^^^^^^^^^^^^^^ borrowed value does not live long enough
...
54 | }
| -
| |
| `bl` dropped here while still borrowed
| borrow might be used here, when `bl` is dropped and runs the destructor for type `BusinessLogic<'_>`
For more information about this error, try `rustc --explain E0597`.
error: could not compile `playground` due to previous error
I understand why I'm seeing this error: the call to subscribe uses a borrow of bl with a lifetime of 'cb, which is not necessarily contained within the scope of main(). However, I don't see how to resolve this issue. Won't I always need to provide a lifetime for the callback stored in Client, which will end up bleeding through my code in the form of 'cb lifetime annotations?
More generally, I'm interested in understanding what is the canonical way of solving this callback/hook problem in Rust. I'm open to designs different from the one I have proposed, and if there are relevant performance concerns for various options, that would be useful to know also.
What you've created is a self-referential structure, which is problematic and not really expressible with references and lifetime annotations. See: Why can't I store a value and a reference to that value in the same struct? for the potential problems and workarounds. Its an issue here because you want to be able to mutate the BusinessLogic in the callback, but since it holds the Client, you can mutate the callback while its running, which is no good.
I would instead suggest that the callback has full ownership of the BusinessLogic which does not directly reference the Client:
use rand::Rng;
struct Client {
callback: Box<dyn FnMut(i64)>,
}
impl Client {
fn do_thing(&mut self) {
let value = rand::thread_rng().gen();
(self.callback)(value);
}
}
struct BusinessLogic {
value: Option<i64>,
}
fn main() {
let mut bl = BusinessLogic {
value: None
};
let mut client = Client {
callback: Box::new(move |value| {
bl.value = Some(value);
})
};
client.do_thing();
println!("Hello, world!");
}
if you need the subscriber to have backwards communication to the Client, you can pass an additional parameter that the callback can mutate, or simply do it via return value
if you need more complicated communication from the Client to the callback, either send a Message enum as the argument, or make the callback a custom trait instead of just FnMut with additional methods
if you need a single BusinessLogic to operate from multiple Clients use Arc+Mutex to allow shared ownership

Why do I get an error that "Sync is not satisfied" when moving self, which contains an Arc, into a new thread?

I have a struct which holds an Arc<Receiver<f32>> and I'm trying to add a method which takes ownership of self, and moves the ownership into a new thread and starts it. However, I'm getting the error
error[E0277]: the trait bound `std::sync::mpsc::Receiver<f32>: std::marker::Sync` is not satisfied
--> src/main.rs:19:9
|
19 | thread::spawn(move || {
| ^^^^^^^^^^^^^ `std::sync::mpsc::Receiver<f32>` cannot be shared between threads safely
|
= help: the trait `std::marker::Sync` is not implemented for `std::sync::mpsc::Receiver<f32>`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<std::sync::mpsc::Receiver<f32>>`
= note: required because it appears within the type `Foo`
= note: required because it appears within the type `[closure#src/main.rs:19:23: 22:10 self:Foo]`
= note: required by `std::thread::spawn`
If I change the struct to hold an Arc<i32> instead, or just a Receiver<f32>, it compiles, but not with a Arc<Receiver<f32>>. How does this work? The error doesn't make sense to me as I'm not trying to share it between threads (I'm moving it, not cloning it).
Here is the full code:
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;
pub struct Foo {
receiver: Arc<Receiver<f32>>,
}
impl Foo {
pub fn new() -> (Foo, Sender<f32>) {
let (sender, receiver) = channel::<f32>();
let sink = Foo {
receiver: Arc::new(receiver),
};
(sink, sender)
}
pub fn run_thread(self) -> thread::JoinHandle<()> {
thread::spawn(move || {
println!("Thread spawned by 'run_thread'");
self.run(); // <- This line gives the error
})
}
fn run(mut self) {
println!("Executing 'run'")
}
}
fn main() {
let (example, sender) = Foo::new();
let handle = example.run_thread();
handle.join();
}
How does this work?
Let's check the requirements of thread::spawn again:
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
F: FnOnce() -> T,
F: Send + 'static, // <-- this line is important for us
T: Send + 'static,
Since Foo contains an Arc<Receiver<_>>, let's check if and how Arc implements Send:
impl<T> Send for Arc<T>
where
T: Send + Sync + ?Sized,
So Arc<T> implements Send if T implements Send and Sync. And while Receiver implements Send, it does not implement Sync.
So why does Arc have such strong requirements for T? T also has to implement Send because Arc can act like a container; if you could just hide something that doesn't implement Send in an Arc, send it to another thread and unpack it there... bad things would happen. The interesting part is to see why T also has to implement Sync, which is apparently also the part you are struggling with:
The error doesn't make sense to me as I'm not trying to share it between threads (I'm moving it, not cloning it).
The compiler can't know that the Arc in Foo is in fact not shared. Consider if you would add a #[derive(Clone)] to Foo later (which is possible without a problem):
fn main() {
let (example, sender) = Foo::new();
let clone = example.clone();
let handle = example.run_thread();
clone.run();
// oopsie, now the same `Receiver` is used from two threads!
handle.join();
}
In the example above there is only one Receiver which is shared between threads. And this is no good, since Receiver does not implement Sync!
To me this code raises the question: why the Arc in the first place? As you noticed, without the Arc, it works without a problem: you clearly state that Foo is the only owner of the Receiver. And if you are "not trying to share [the Receiver]" anyway, there is no point in having multiple owners.

How do I share a generic struct between threads using Arc<Mutex<MyStruct<T>>>?

I have some mutable state I need to share between threads. I followed the concurrency section of the Rust book, which shares a vector between threads and mutates it.
Instead of a vector, I need to share a generic struct that is ultimately monomorphized. Here is a distilled example of what I'm trying:
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
use std::marker::PhantomData;
trait Memory {}
struct SimpleMemory;
impl Memory for SimpleMemory {}
struct SharedData<M: Memory> {
value: usize,
phantom: PhantomData<M>,
}
impl<M: Memory> SharedData<M> {
fn new() -> Self {
SharedData {
value: 0,
phantom: PhantomData,
}
}
}
fn main() {
share(SimpleMemory);
}
fn share<M: Memory>(memory: M) {
let data = Arc::new(Mutex::new(SharedData::<M>::new()));
for i in 0..3 {
let data = data.clone();
thread::spawn(move || {
let mut data = data.lock().unwrap();
data.value += i;
});
}
thread::sleep(Duration::from_millis(50));
}
The compiler complains with the following error:
error[E0277]: the trait bound `M: std::marker::Send` is not satisfied
--> src/main.rs:37:9
|
37 | thread::spawn(move || {
| ^^^^^^^^^^^^^
|
= help: consider adding a `where M: std::marker::Send` bound
= note: required because it appears within the type `std::marker::PhantomData<M>`
= note: required because it appears within the type `SharedData<M>`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Mutex<SharedData<M>>`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<std::sync::Mutex<SharedData<M>>>`
= note: required because it appears within the type `[closure#src/main.rs:37:23: 40:10 data:std::sync::Arc<std::sync::Mutex<SharedData<M>>>, i:usize]`
= note: required by `std::thread::spawn`
I'm trying to understand why M would need to implement Send, and what the appropriate way to accomplish this is.
I'm trying to understand why M would need to implement Send, ...
Because, as stated by the Send documentation:
Types that can be transferred across thread boundaries.
If it's not Send, it is by definition not safe to send to another thread.
Almost all of the information you need is right there in the documentation:
thread::spawn requires the callable you give it to be Send.
You're using a closure, which is only Send if all the values it captures are Send. This is true in general of most types (they are Send if everything they're made of is Send, and similarly for Sync).
You're capturing data, which is an Arc<T>, which is only Send if T is Send.
T is a Mutex<U>, which is only Send if U is Send.
U is M. Thus, M must be Send.
In addition, note that thread::spawn also requires that the callable be 'static, so you need that too. It needs that because if it didn't require that, it'd have no guarantee that the value will continue to exist for the entire lifetime of the thread (which may or may not outlive the thread that spawned it).
..., and what the appropriate way to accomplish this is.
Same way as any other constraints: M: 'static + Send + Memory.

tokio-curl: capture output into a local `Vec` - may outlive borrowed value

I do not know Rust well enough to understand lifetimes and closures yet...
Trying to collect the downloaded data into a vector using tokio-curl:
extern crate curl;
extern crate futures;
extern crate tokio_core;
extern crate tokio_curl;
use std::io::{self, Write};
use std::str;
use curl::easy::Easy;
use tokio_core::reactor::Core;
use tokio_curl::Session;
fn main() {
// Create an event loop that we'll run on, as well as an HTTP `Session`
// which we'll be routing all requests through.
let mut lp = Core::new().unwrap();
let mut out = Vec::new();
let session = Session::new(lp.handle());
// Prepare the HTTP request to be sent.
let mut req = Easy::new();
req.get(true).unwrap();
req.url("https://www.rust-lang.org").unwrap();
req.write_function(|data| {
out.extend_from_slice(data);
io::stdout().write_all(data).unwrap();
Ok(data.len())
})
.unwrap();
// Once we've got our session, issue an HTTP request to download the
// rust-lang home page
let request = session.perform(req);
// Execute the request, and print the response code as well as the error
// that happened (if any).
let mut req = lp.run(request).unwrap();
println!("{:?}", req.response_code());
println!("out: {}", str::from_utf8(&out).unwrap());
}
Produces an error:
error[E0373]: closure may outlive the current function, but it borrows `out`, which is owned by the current function
--> src/main.rs:25:24
|
25 | req.write_function(|data| {
| ^^^^^^ may outlive borrowed value `out`
26 | out.extend_from_slice(data);
| --- `out` is borrowed here
|
help: to force the closure to take ownership of `out` (and any other referenced variables), use the `move` keyword, as shown:
| req.write_function(move |data| {
Investigating further, I see that Easy::write_function requires the 'static lifetime, but the example of how to collect output from the curl-rust docs uses Transfer::write_function instead:
use curl::easy::Easy;
let mut data = Vec::new();
let mut handle = Easy::new();
handle.url("https://www.rust-lang.org/").unwrap();
{
let mut transfer = handle.transfer();
transfer.write_function(|new_data| {
data.extend_from_slice(new_data);
Ok(new_data.len())
}).unwrap();
transfer.perform().unwrap();
}
println!("{:?}", data);
The Transfer::write_function does not require the 'static lifetime:
impl<'easy, 'data> Transfer<'easy, 'data> {
/// Same as `Easy::write_function`, just takes a non `'static` lifetime
/// corresponding to the lifetime of this transfer.
pub fn write_function<F>(&mut self, f: F) -> Result<(), Error>
where F: FnMut(&[u8]) -> Result<usize, WriteError> + 'data
{
...
But I can't use a Transfer instance on tokio-curl's Session::perform because it requires the Easy type:
pub fn perform(&self, handle: Easy) -> Perform {
transfer.easy is a private field that is directly passed to session.perform.
It this an issue with tokio-curl? Maybe it should mark the transfer.easy field as public or implement new function like perform_transfer? Is there another way to collect output using tokio-curl per transfer?
The first thing you have to understand when using the futures library is that you don't have any control over what thread the code is going to run on.
In addition, the documentation for curl's Easy::write_function says:
Note that the lifetime bound on this function is 'static, but that is often too restrictive. To use stack data consider calling the transfer method and then using write_function to configure a callback that can reference stack-local data.
The most straight-forward solution is to use some type of locking primitive to ensure that only one thread at a time may have access to the vector. You also have to share ownership of the vector between the main thread and the closure:
use std::sync::Mutex;
use std::sync::Arc;
let out = Arc::new(Mutex::new(Vec::new()));
let out_closure = out.clone();
// ...
req.write_function(move |data| {
let mut out = out_closure.lock().expect("Unable to lock output");
// ...
}).expect("Cannot set writing function");
// ...
let out = out.lock().expect("Unable to lock output");
println!("out: {}", str::from_utf8(&out).expect("Data was not UTF-8"));
Unfortunately, the tokio-curl library does not currently support using the Transfer type that would allow for stack-based data.

How do I share a mutable object between threads using Arc?

I'm trying to share a mutable object between threads in Rust using Arc, but I get this error:
error[E0596]: cannot borrow data in a `&` reference as mutable
--> src/main.rs:11:13
|
11 | shared_stats_clone.add_stats();
| ^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
This is the sample code:
use std::{sync::Arc, thread};
fn main() {
let total_stats = Stats::new();
let shared_stats = Arc::new(total_stats);
let threads = 5;
for _ in 0..threads {
let mut shared_stats_clone = shared_stats.clone();
thread::spawn(move || {
shared_stats_clone.add_stats();
});
}
}
struct Stats {
hello: u32,
}
impl Stats {
pub fn new() -> Stats {
Stats { hello: 0 }
}
pub fn add_stats(&mut self) {
self.hello += 1;
}
}
What can I do?
Arc's documentation says:
Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.
You will likely want a Mutex combined with an Arc:
use std::{
sync::{Arc, Mutex},
thread,
};
struct Stats;
impl Stats {
fn add_stats(&mut self, _other: &Stats) {}
}
fn main() {
let shared_stats = Arc::new(Mutex::new(Stats));
let threads = 5;
for _ in 0..threads {
let my_stats = shared_stats.clone();
thread::spawn(move || {
let mut shared = my_stats.lock().unwrap();
shared.add_stats(&Stats);
});
// Note: Immediately joining, no multithreading happening!
// THIS WAS A LIE, see below
}
}
This is largely cribbed from the Mutex documentation.
How can I use shared_stats after the for? (I'm talking about the Stats object). It seems that the shared_stats cannot be easily converted to Stats.
As of Rust 1.15, it's possible to get the value back. See my additional answer for another solution as well.
[A comment in the example] says that there is no multithreading. Why?
Because I got confused! :-)
In the example code, the result of thread::spawn (a JoinHandle) is immediately dropped because it's not stored anywhere. When the handle is dropped, the thread is detached and may or may not ever finish. I was confusing it with JoinGuard, a old, removed API that joined when it is dropped. Sorry for the confusion!
For a bit of editorial, I suggest avoiding mutability completely:
use std::{ops::Add, thread};
#[derive(Debug)]
struct Stats(u64);
// Implement addition on our type
impl Add for Stats {
type Output = Stats;
fn add(self, other: Stats) -> Stats {
Stats(self.0 + other.0)
}
}
fn main() {
let threads = 5;
// Start threads to do computation
let threads: Vec<_> = (0..threads).map(|_| thread::spawn(|| Stats(4))).collect();
// Join all the threads, fail if any of them failed
let result: Result<Vec<_>, _> = threads.into_iter().map(|t| t.join()).collect();
let result = result.unwrap();
// Add up all the results
let sum = result.into_iter().fold(Stats(0), |i, sum| sum + i);
println!("{:?}", sum);
}
Here, we keep a reference to the JoinHandle and then wait for all the threads to finish. We then collect the results and add them all up. This is the common map-reduce pattern. Note that no thread needs any mutability, it all happens in the master thread.

Resources