I'm left scratching my head about how to design this. Basically, I want to implement a worker pool – kind of similar to the ThreadPool from the book, but with a twist. In the book, they just pass a closure for one of the threads in the pool to run. However, I would like to have some state for every thread in the pool. Let me explain:
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Job {
    x: usize,
}

struct WorkerPool {
    sender: mpsc::Sender<Job>,
    workers: Vec<Worker>,
}

impl WorkerPool {
    fn new(num_workers: usize) -> WorkerPool {
        let mut workers = Vec::with_capacity(num_workers);
        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));
        for id in 0..num_workers {
            workers.push(Worker::new(id, receiver.clone()));
        }
        WorkerPool { sender, workers }
    }
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
    receiver: Arc<Mutex<mpsc::Receiver<Job>>>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        Worker {
            id,
            thread: None,
            receiver,
        }
    }

    fn start(&mut self) {
        self.thread = Some(thread::spawn(move || loop {
            let job = self.receiver.lock().unwrap().recv().unwrap();
            self.add_to_id(job.x);
        }));
    }

    pub fn add_to_id(&self, x: usize) {
        println!("The result is: {}", self.id + x);
    }
}
Every one of my Workers gets an id, and its job is to accept a Job containing a number and print its id plus that number (this is, of course, a simplified version; in my real use case, each worker gets an HTTP client and some other state). A pretty simple problem in my eyes, but obviously the code above does not compile.
I realize that the code in Worker::start cannot possibly work, because it is moving self into the thread closure while I am trying to assign to self at the same time.
The question is, how else would I access the fields in the "parent" struct of the thread?
Can I somehow constrain the thread closure's lifetime to that of the struct? (Pretty sure the answer is no, because spawned closures have to be 'static.) Or, the other way around, do I have to make everything 'static here?
I'm not sure of the exact problem you are trying to solve, but you can make your code compile by ensuring that the data accessed within the thread remains valid for the entire lifetime of the thread. For example, keep id valid by wrapping it in an Arc<Mutex<usize>>.
Here is an example that makes your code compile:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ed750f5ba5db9597efb9b2b80bde2959
let recv = self.receiver.clone();
let id = self.id.clone();
self.thread = Some(thread::spawn(move || loop {
    let job = recv.lock().unwrap().recv().unwrap();
    //self.add_to_id(job.x);
    *id.lock().unwrap() = job.x;
}));
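For reference, here is a minimal self-contained sketch of that approach; the Arc<Mutex<usize>> field type is my assumption about how the playground example is wired up, not code quoted from it:

use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Job {
    x: usize,
}

struct Worker {
    id: Arc<Mutex<usize>>,
    thread: Option<thread::JoinHandle<()>>,
    receiver: Arc<Mutex<mpsc::Receiver<Job>>>,
}

impl Worker {
    fn start(&mut self) {
        // Clone the shared handles so the 'static closure owns them outright.
        let recv = Arc::clone(&self.receiver);
        let id = Arc::clone(&self.id);
        self.thread = Some(thread::spawn(move || loop {
            let job = recv.lock().unwrap().recv().unwrap();
            println!("The result is: {}", *id.lock().unwrap() + job.x);
        }));
    }
}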
I am having a hard time figuring out how to sort out this issue.
So I have a struct ArcWorker holding a shared reference to a Worker (as you can see below).
I wrote a function in ArcWorker called join() in which the line self.internal.lock().unwrap().join(); fails with the following error:
cannot move out of dereference of std::sync::MutexGuard<'_, models::worker::Worker>
What I attempt through that line is to lock the mutex, unwrap, and call the join() function on the Worker.
As far as I understand, once the lock function is called, it borrows a reference to self (&self), but I then need some way to pass self by value to join (std::thread's join function takes self by value).
What can I do to make this work? I tried to find an answer to my question for hours but to no avail.
pub struct Worker {
    accounts: Vec<Arc<Mutex<Account>>>,
    thread_join_handle: Option<thread::JoinHandle<()>>,
}

pub struct ArcWorker {
    internal: Arc<Mutex<Worker>>,
}

impl ArcWorker {
    pub fn new(accounts: Vec<Arc<Mutex<Account>>>) -> ArcWorker {
        return ArcWorker {
            internal: Arc::new(Mutex::new(Worker {
                accounts: accounts,
                thread_join_handle: None,
            })),
        }
    }

    pub fn spawn(&self) {
        let local_self_1 = self.internal.clone();
        self.internal.lock().unwrap().thread_join_handle = Some(thread::spawn(move || {
            println!("Spawn worker");
            local_self_1.lock().unwrap().perform_random_transactions();
        }));
    }

    pub fn join(&self) {
        self.internal.lock().unwrap().join();
    }
}

impl Worker {
    fn join(self) {
        if let Some(thread_join_handle) = self.thread_join_handle {
            thread_join_handle.join().expect("Couldn't join the associated threads.")
        }
    }

    fn perform_random_transactions(&self) {
    }
}
Since you already hold the JoinHandle in an Option, you can make Worker::join() take &mut self instead of self and change the if let condition to:
// note added `.take()`
if let Some(thread_join_handle) = self.thread_join_handle.take() {
Option::take() will move the handle out of the Option and give you ownership of it, leaving None behind in self.thread_join_handle. With this change, ArcWorker::join() should compile as-is.
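For illustration, the revised method would look roughly like this (a sketch; the rest of Worker stays as in the question):

impl Worker {
    // &mut self lets us move the handle out of the Option with take().
    fn join(&mut self) {
        if let Some(thread_join_handle) = self.thread_join_handle.take() {
            thread_join_handle
                .join()
                .expect("Couldn't join the associated threads.");
        }
    }
}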
I'm trying to send a "view" of read-only data to another thread for processing. Basically, the main thread does work and continuously updates a set of data. Whenever an update occurs, the main thread should send the updated data down to the other threads, where they will process it in a read-only manner. I do not want to copy the data, as it may be very large. (The main thread also keeps a "cache" of the data in-memory anyway.)
I can achieve this with Arc<RwLock<T>>, where T being my data structure.
However, there is nothing stopping the side threads from updating the data: they can simply acquire a write lock and modify it.
My question is: is there something similar to RwLock where only the owner/creator has write access, while all other instances have read-only access? This way I would have compile-time checking against logic bugs where side threads accidentally update the data.
Regarding these questions:
Sharing read-only object between threads in Rust?
How can I pass a reference to a stack variable to a thread?
The above questions suggest solving it with Arc<Mutex<T>> or Arc<RwLock<T>>, which is all fine, but it still doesn't give compile-time enforcement of only one writer.
Additionally: crossbeam or rayon's scoped threads don't help here as I want my side threads to outlive my main thread.
You can create a wrapper type over an Arc<RwLock<T>> that only exposes cloning via a read-only wrapper:
mod shared {
    use std::sync::{Arc, LockResult, RwLock, RwLockReadGuard, RwLockWriteGuard};

    pub struct Lock<T> {
        inner: Arc<RwLock<T>>,
    }

    impl<T> Lock<T> {
        pub fn new(val: T) -> Self {
            Self {
                inner: Arc::new(RwLock::new(val)),
            }
        }

        pub fn write(&self) -> LockResult<RwLockWriteGuard<'_, T>> {
            self.inner.write()
        }

        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }

        pub fn read_only(&self) -> ReadOnly<T> {
            ReadOnly {
                inner: self.inner.clone(),
            }
        }
    }

    pub struct ReadOnly<T> {
        inner: Arc<RwLock<T>>,
    }

    impl<T> ReadOnly<T> {
        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }
    }
}
Now you can pass read-only versions of the value to spawned threads and continue writing in the main thread:
fn main() {
    let val = shared::Lock::new(String::new());

    for _ in 0..10 {
        let view = val.read_only();
        std::thread::spawn(move || {
            // view.write().unwrap().push_str("...");
            // ERROR: no method named `write` found for struct `ReadOnly` in the current scope
            println!("{}", view.read().unwrap());
        });
    }

    val.write().unwrap().push_str("...");
    println!("{}", val.read().unwrap());
}
I know there are hundreds of questions just like this one, but I'm having trouble wrapping my head around how to do the thing I'm trying to do.
I want an HTTP server that accepts and processes events. On receiving/processing an event, I want the EventManager to send an update to an ApplicationMonitor that is tracking how many events have been accepted/processed. The ApplicationMonitor would also (eventually) handle things like tracking the number of concurrent connections, but in this example I just want my EventManager to send an Inc('event_accepted') update to my ApplicationMonitor.
To be useful, I need the ApplicationMonitor to be able to return a snapshot of the stats when requested through a /stats route.
So I have an ApplicationMonitor which spawns a thread and listens on a channel for incoming Stat events. When it receives a Stat event, it updates the stats HashMap. The stats HashMap must be mutable within both ApplicationMonitor and the spawned thread.
use std::sync::mpsc;
use std::sync::mpsc::Sender;
use std::thread;
use std::thread::JoinHandle;
use std::collections::HashMap;

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let mut stats: HashMap<&'static str, i32> = HashMap::new();
        let join_handle = thread::spawn(move || {
            for stat in rx.recv() {
                match stat {
                    Stat::Inc(nm) => {
                        let current_val = stats.entry(nm).or_insert(0);
                        stats.insert(nm, *current_val + 1);
                    },
                    Stat::Dec(nm) => {
                        let current_val = stats.entry(nm).or_insert(0);
                        stats.insert(nm, *current_val - 1);
                    },
                    Stat::Set(nm, val) => {
                        stats.insert(nm, val);
                    }
                }
            }
        });
        let am = ApplicationMonitor {
            tx,
            join_handle
        };
        am
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        self.stats.clone()
    }
}
Because rx cannot be cloned, I must move the references into the closure. When I do this, I am no longer able to access stats outside of the thread.
I thought maybe I needed a second channel so the thread could communicate its internals back out, but this doesn't work, as I would need another thread to listen for that in a non-blocking way.
Is this where I'd use Arc?
How can I have stats live inside and out of the thread context?
Yes, this is a place where you'd wrap your stats in an Arc so that you can have multiple references to it from different threads. But just wrapping in an Arc will only give you a read-only view of the HashMap - if you need to be able to modify it, you'll also need to wrap it in something which guarantees that only one thing can modify it at a time. So you'll probably end up with either an Arc<Mutex<HashMap<&'static str, i32>>> or a Arc<RwLock<HashMap<&'static str, i32>>>.
Alternatively, if you're just changing the values, and not adding or removing keys, you could potentially use an Arc<HashMap<&'static str, AtomicU32>>, which would allow you to read and modify different values in parallel without needing to take out a map-wide lock, but atomics can be a little more fiddly to understand and use correctly than locks.
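As a concrete illustration, here is one way the monitor from the question could be restructured with an Arc<Mutex<HashMap>>. This is a sketch only; the extra stats field and the thread_stats clone are my additions, not code from the question:

use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
    stats: Arc<Mutex<HashMap<&'static str, i32>>>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let stats = Arc::new(Mutex::new(HashMap::new()));
        let thread_stats = Arc::clone(&stats);
        let join_handle = thread::spawn(move || {
            // Iterating the receiver blocks on each message and stops
            // once every Sender has been dropped.
            for stat in rx {
                let mut stats = thread_stats.lock().unwrap();
                match stat {
                    Stat::Inc(nm) => *stats.entry(nm).or_insert(0) += 1,
                    Stat::Dec(nm) => *stats.entry(nm).or_insert(0) -= 1,
                    Stat::Set(nm, val) => {
                        stats.insert(nm, val);
                    }
                }
            }
        });
        ApplicationMonitor { tx, join_handle, stats }
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        // Clone under the lock so callers get an independent snapshot.
        self.stats.lock().unwrap().clone()
    }
}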
I have a tokio core whose main task is running a websocket (client). When I receive some messages from the server, I want to execute a new task that will update some data. Below is a minimal failing example:
use tokio_core::reactor::{Core, Handle};
use futures::future::Future;
use futures::future;

struct Client {
    handle: Handle,
    data: usize,
}

impl Client {
    fn update_data(&mut self) {
        // spawn a new task that updates the data
        self.handle.spawn(future::ok(()).and_then(|x| {
            self.data += 1; // error here
            future::ok(())
        }));
    }
}

fn main() {
    let mut runtime = Core::new().unwrap();
    let mut client = Client {
        handle: runtime.handle(),
        data: 0,
    };

    let task = future::ok::<(), ()>(()).and_then(|_| {
        // under some conditions (omitted), we update the data
        client.update_data();
        future::ok::<(), ()>(())
    });
    runtime.run(task).unwrap();
}
Which produces this error:
error[E0477]: the type `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:13:51: 16:10 self:&mut &mut Client]>` does not fulfill the required lifetime
--> src/main.rs:13:21
|
13 | self.handle.spawn(future::ok(()).and_then(|x| {
| ^^^^^
|
= note: type must satisfy the static lifetime
The problem is that new tasks spawned through a handle need to be 'static. The same issue is described here. Sadly, it is unclear to me how I can fix it. Even with some attempts using an Arc and a Mutex (which really shouldn't be needed for a single-threaded application), I was unsuccessful.
Since developments occur rather quickly in the tokio landscape, I am wondering what the current best solution is. Do you have any suggestions?
edit
The solution by Peter Hall works for the example above. Sadly, when I built the failing example I had changed the tokio reactor, thinking they would be similar. Using tokio::runtime::current_thread:
use futures::future;
use futures::future::Future;
use futures::stream::Stream;
use std::cell::Cell;
use std::rc::Rc;
use tokio::runtime::current_thread::{Builder, Handle};

struct Client {
    handle: Handle,
    data: Rc<Cell<usize>>,
}

impl Client {
    fn update_data(&mut self) {
        // spawn a new task that updates the data
        let mut data = Rc::clone(&self.data);
        self.handle.spawn(future::ok(()).and_then(move |_x| {
            data.set(data.get() + 1);
            future::ok(())
        }));
    }
}

fn main() {
    // let mut runtime = Core::new().unwrap();
    let mut runtime = Builder::new().build().unwrap();
    let mut client = Client {
        handle: runtime.handle(),
        data: Rc::new(Cell::new(1)),
    };

    let task = future::ok::<(), ()>(()).and_then(|_| {
        // under some conditions (omitted), we update the data
        client.update_data();
        future::ok::<(), ()>(())
    });
    runtime.block_on(task).unwrap();
}
I obtain:
error[E0277]: `std::rc::Rc<std::cell::Cell<usize>>` cannot be sent between threads safely
--> src/main.rs:17:21
|
17 | self.handle.spawn(future::ok(()).and_then(move |_x| {
| ^^^^^ `std::rc::Rc<std::cell::Cell<usize>>` cannot be sent between threads safely
|
= help: within `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::Cell<usize>>`
= note: required because it appears within the type `[closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]`
= note: required because it appears within the type `futures::future::chain::Chain<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`
= note: required because it appears within the type `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`
So it does seem like in this case I need an Arc and a Mutex even though the entire code is single-threaded?
In a single-threaded program, you don't need to use Arc; Rc is sufficient:
use std::{rc::Rc, cell::Cell};

struct Client {
    handle: Handle,
    data: Rc<Cell<usize>>,
}

impl Client {
    fn update_data(&mut self) {
        let data = Rc::clone(&self.data);
        self.handle.spawn(future::ok(()).and_then(move |_x| {
            data.set(data.get() + 1);
            future::ok(())
        }));
    }
}
The point is that you no longer have to worry about the lifetime because each clone of the Rc acts as if it owns the data, rather than accessing it via a reference to self. The inner Cell (or RefCell for non-Copy types) is needed because the Rc can't be dereferenced mutably, since it has been cloned.
The spawn method of tokio::runtime::current_thread::Handle requires that the future is Send, which is what is causing the problem in the update to your question. There is an explanation (of sorts) for why this is the case in this Tokio Github issue.
You can use tokio::runtime::current_thread::spawn instead of the method of Handle, which will always run the future in the current thread, and does not require that the future is Send. You can replace self.handle.spawn in the code above and it will work just fine.
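A sketch of that variant, reusing the Client from above (data stays an Rc<Cell<usize>>; the handle field is simply no longer used for spawning):

use tokio::runtime::current_thread;

impl Client {
    fn update_data(&mut self) {
        let data = Rc::clone(&self.data);
        // current_thread::spawn runs the future on the current thread's
        // executor, so no `Send` bound is required and the Rc clone is fine.
        current_thread::spawn(future::ok(()).and_then(move |_x| {
            data.set(data.get() + 1);
            future::ok(())
        }));
    }
}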
If you need to use the method on Handle then you will also need to resort to Arc and Mutex (or RwLock) in order to satisfy the Send requirement:
use std::sync::{Mutex, Arc};

struct Client {
    handle: Handle,
    data: Arc<Mutex<usize>>,
}

impl Client {
    fn update_data(&mut self) {
        let data = Arc::clone(&self.data);
        self.handle.spawn(future::ok(()).and_then(move |_x| {
            *data.lock().unwrap() += 1;
            future::ok(())
        }));
    }
}
If your data is really a usize, you could also use AtomicUsize instead of Mutex<usize>, but I personally find it just as unwieldy to work with.
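For completeness, here is a sketch of the AtomicUsize variant (same fragment style as the snippets above, so Handle and the futures imports come from the surrounding code):

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Client {
    handle: Handle,
    data: Arc<AtomicUsize>,
}

impl Client {
    fn update_data(&mut self) {
        let data = Arc::clone(&self.data);
        self.handle.spawn(future::ok(()).and_then(move |_x| {
            // A single atomic increment; no lock to acquire or poison.
            data.fetch_add(1, Ordering::SeqCst);
            future::ok(())
        }));
    }
}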
I'm trying to share a mutable object between threads in Rust using Arc, but I get this error:
error[E0596]: cannot borrow data in a `&` reference as mutable
--> src/main.rs:11:13
|
11 | shared_stats_clone.add_stats();
| ^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
This is the sample code:
use std::{sync::Arc, thread};

fn main() {
    let total_stats = Stats::new();
    let shared_stats = Arc::new(total_stats);

    let threads = 5;
    for _ in 0..threads {
        let mut shared_stats_clone = shared_stats.clone();
        thread::spawn(move || {
            shared_stats_clone.add_stats();
        });
    }
}
struct Stats {
    hello: u32,
}

impl Stats {
    pub fn new() -> Stats {
        Stats { hello: 0 }
    }

    pub fn add_stats(&mut self) {
        self.hello += 1;
    }
}
What can I do?
Arc's documentation says:
Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.
You will likely want a Mutex combined with an Arc:
use std::{
    sync::{Arc, Mutex},
    thread,
};

struct Stats;

impl Stats {
    fn add_stats(&mut self, _other: &Stats) {}
}

fn main() {
    let shared_stats = Arc::new(Mutex::new(Stats));

    let threads = 5;
    for _ in 0..threads {
        let my_stats = shared_stats.clone();
        thread::spawn(move || {
            let mut shared = my_stats.lock().unwrap();
            shared.add_stats(&Stats);
        });
        // Note: Immediately joining, no multithreading happening!
        // THIS WAS A LIE, see below
    }
}
This is largely cribbed from the Mutex documentation.
How can I use shared_stats after the for loop? (I'm talking about the Stats object.) It seems that shared_stats cannot easily be converted back into a Stats.
As of Rust 1.15, it's possible to get the value back. See my additional answer for another solution as well.
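One way to do that, assuming every thread holding a clone has finished (e.g. after joining them), is Arc::try_unwrap followed by Mutex::into_inner; this is a sketch, not necessarily the linked answer's exact approach:

// Succeeds only if this is the last Arc pointing at the data.
let stats_mutex = Arc::try_unwrap(shared_stats).ok().expect("other Arcs still exist");
// Consume the Mutex to get the Stats value back out.
let total_stats = stats_mutex.into_inner().unwrap();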
[A comment in the example] says that there is no multithreading. Why?
Because I got confused! :-)
In the example code, the result of thread::spawn (a JoinHandle) is immediately dropped because it's not stored anywhere. When the handle is dropped, the thread is detached and may or may not ever finish. I was confusing it with JoinGuard, an old, removed API that joined when it was dropped. Sorry for the confusion!
For a bit of editorial, I suggest avoiding mutability completely:
use std::{ops::Add, thread};

#[derive(Debug)]
struct Stats(u64);

// Implement addition on our type
impl Add for Stats {
    type Output = Stats;

    fn add(self, other: Stats) -> Stats {
        Stats(self.0 + other.0)
    }
}

fn main() {
    let threads = 5;

    // Start threads to do computation
    let threads: Vec<_> = (0..threads).map(|_| thread::spawn(|| Stats(4))).collect();

    // Join all the threads, fail if any of them failed
    let result: Result<Vec<_>, _> = threads.into_iter().map(|t| t.join()).collect();
    let result = result.unwrap();

    // Add up all the results
    let sum = result.into_iter().fold(Stats(0), |sum, i| sum + i);
    println!("{:?}", sum);
}
Here, we keep a reference to the JoinHandle and then wait for all the threads to finish. We then collect the results and add them all up. This is the common map-reduce pattern. Note that no thread needs any mutability, it all happens in the master thread.