I'm trying to spawn a set of threads and have each perform a long-running operation. I want to pass a structure to each worker thread as that thread's internal state. The collection of these structs is kept in a vector that is part of a Master struct.
The compiler rejects passing an internal member of a struct to Arc::new():
use std::thread;
use std::sync::Arc;
struct Worker {
    name: String,
}
struct Master {
    workers: Vec<Worker>,
}
impl Worker {
    fn start(&self) {
        println!("My name is {} and I'm working!", self.name);
        thread::sleep_ms(100_000);
    }
}
impl Master {
    pub fn run_test(&mut self) {
        for i in 0..10 {
            self.workers.push(Worker {
                name: String::new() + "Worker" + &i.to_string()
            });
        }
        let mut data = Arc::new(self.workers);
        for i in 0..10 {
            let local_data = data.clone();
            thread::spawn(move || {
                local_data[i].start();
            });
        }
        thread::sleep_ms(100_000);
    }
}
fn main() {
    let mut master = Master { workers: vec![] };
}
The error message:
error[E0507]: cannot move out of borrowed content
--> <anon>:26:33
|
26 | let mut data = Arc::new(self.workers);
| ^^^^ cannot move out of borrowed content
What am I doing wrong? Is this idiomatic Rust?
Welcome to Ownership.
In Rust, any single piece of data has one and exactly one owner. Don't be fooled by Rc and Arc: they are a shared interface on top of a single (invisible) owner.
The simplest way of expressing ownership is by value:
struct Master {
    workers: Vec<Worker>
}
Here, Master owns a Vec<Worker> which itself owns multiple Worker.
Similarly, functions that take their argument by value (fn new(t: T) -> Arc<T> for example) receive ownership of their argument.
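For instance (a generic illustration, not tied to the question's types), passing an owned value into such a function moves it, and the caller cannot use it afterwards:
fn consume(v: Vec<i32>) {
    // `v` is owned here and dropped at the end of this function.
    println!("consumed {} elements", v.len());
}

fn main() {
    let v = vec![1, 2, 3];
    consume(v); // ownership moves into `consume`
    // println!("{:?}", v); // error[E0382]: use of moved value: `v`
}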
And that is where the issue lies:
Arc::new(self.workers)
means that you are, at the same time:
claiming that Master is the owner of workers
claiming that Arc is the owner of workers
Given the rule of one and exactly one owner, those two claims are clearly irreconcilable.
So, how do you cheat and have multiple co-owners for a single piece of data?
Well... use Rc or Arc!
struct Master {
    workers: Arc<Vec<Worker>>
}
And now creating data is as simple as:
let data = self.workers.clone();
which creates a new Arc (which just bumps the reference count).
That's not quite all, though. The core tenet of the Borrowing system is: Aliasing XOR Mutability.
Since Arc is about aliasing, it prevents mutability. You cannot insert workers into self.workers any longer!
There are multiple solutions, such as deferring the creation of the Arc until the vector is fully built; however, the most common is to use cells or mutexes, that is, Rc<RefCell<T>> or Arc<Mutex<T>> (or Arc<RwLock<T>>).
RefCell and Mutex are wrappers that move borrow checking from compile-time to run-time. This gives a bit more flexibility, but may result in run-time panics instead of compile-time errors, so is best used as a last resort.
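To make that concrete, here is a minimal sketch of the example above reworked so that the vector is fully built before it is shared; the Arc<Vec<Worker>> field from the previous snippet is assumed, and joining the handles replaces the sleeps:
use std::sync::Arc;
use std::thread;

struct Worker {
    name: String,
}

impl Worker {
    fn start(&self) {
        println!("My name is {} and I'm working!", self.name);
    }
}

struct Master {
    workers: Arc<Vec<Worker>>,
}

impl Master {
    pub fn run_test(&mut self) {
        // Build the vector first, while we still have exclusive access...
        let mut workers = Vec::new();
        for i in 0..10 {
            workers.push(Worker { name: format!("Worker{}", i) });
        }
        // ...and only then share it; from here on it is read-only.
        self.workers = Arc::new(workers);

        let mut handles = Vec::new();
        for i in 0..10 {
            let local_data = self.workers.clone();
            handles.push(thread::spawn(move || {
                local_data[i].start();
            }));
        }
        for handle in handles {
            handle.join().unwrap();
        }
    }
}

fn main() {
    let mut master = Master { workers: Arc::new(Vec::new()) };
    master.run_test();
}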
Related
I am looking to use a member function of a struct as a mutable callback. The standard way I would do this is by wrapping my struct instance in a mutex and then in an Arc. In this use case however, my callback is a real-time audio callback which needs to be real-time safe and is triggered on a dedicated thread. This means no locking can happen that might delay the callback from occurring.
My actual member callback method must take a mutable reference to the struct (I think this is where things fall apart). The struct does contain members but these can all be atomic variables.
Here is an example of the kind of thing I'm looking for:
use std::sync::Arc;
use clap::error;
use cpal::{traits::{HostTrait, DeviceTrait}, StreamConfig, OutputCallbackInfo, StreamError};
use std::sync::atomic::AtomicI16;
pub struct AudioProcessor {
    var: AtomicI16,
}
impl AudioProcessor {
    fn audio_block_f32(&mut self, audio: &mut [f32], _info: &OutputCallbackInfo) {
    }
    fn audio_error(&mut self, error: StreamError) {
    }
}
fn initAudio() {
    let out_dev = cpal::default_host().default_output_device().expect("No available output device found");
    let mut supported_configs_range = out_dev.supported_output_configs().expect("Could not obtain device configs");
    let config = supported_configs_range.next().expect("No available configs").with_max_sample_rate();
    let proc = Arc::new(AudioProcessor { var: AtomicI16::new(5) });
    let audio_callback_instance = proc.clone();
    let error_callback_instance = proc.clone();
    out_dev.build_output_stream(
        &StreamConfig::from(config),
        move |audio: &mut [f32], info: &OutputCallbackInfo| audio_callback_instance.audio_block_f32(audio, info),
        move |stream_error| error_callback_instance.clone().audio_error(stream_error),
    );
}
Currently this code fails in the build_output_stream() function with this error message:
cannot borrow data in an Arc as mutable
trait DerefMut is required to modify through a dereference, but it is not implemented for Arc<AudioProcessor>
Are there any standard ways of dealing with problems like this?
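For what it's worth, one direction that fits the "members can all be atomic variables" constraint is a sketch like the following: the methods take &self and all mutation goes through the atomics, so the Arc never needs DerefMut. The cpal-specific plumbing from the question is elided here, and the closures at the bottom stand in for the callbacks you would hand to build_output_stream.
use std::sync::Arc;
use std::sync::atomic::{AtomicI16, Ordering};

pub struct AudioProcessor {
    var: AtomicI16,
}

impl AudioProcessor {
    // &self instead of &mut self: the atomic provides the interior mutability.
    fn audio_block_f32(&self, audio: &mut [f32]) {
        let gain = self.var.load(Ordering::Relaxed) as f32;
        for sample in audio.iter_mut() {
            *sample *= gain;
        }
    }

    fn audio_error(&self) {
        // e.g. mute on error
        self.var.store(0, Ordering::Relaxed);
    }
}

fn main() {
    let processor = Arc::new(AudioProcessor { var: AtomicI16::new(5) });
    let audio_instance = processor.clone();
    let error_instance = processor.clone();

    // These closures only ever need shared access to the AudioProcessor,
    // so they can be moved onto the audio thread without locking.
    let audio_cb = move |audio: &mut [f32]| audio_instance.audio_block_f32(audio);
    let error_cb = move || error_instance.audio_error();

    let mut buffer = [1.0_f32; 4];
    audio_cb(&mut buffer);
    error_cb();
    println!("{:?}", buffer);
}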
I'm trying to speed up a computationally-heavy Rust function by making it concurrent using only the built-in thread support. In particular, I want to alternate between quick single-threaded phases (where the main thread has mutable access to a big structure) and concurrent phases (where many worker threads run with read-only access to the structure). I don't want to make extra copies of the structure or force it to be 'static. Where I'm having trouble is convincing the borrow checker that the worker threads have finished.
Ignoring the borrow checker, an Arc reference seems to do all that is needed. The reference count in the Arc increases with the .clone() for each worker, then decreases as the workers conclude and I join all the worker threads. If (and only if) the Arc reference count is 1, it should be safe for the main thread to resume. The borrow checker, however, doesn't seem to know about Arc reference counts, and insists that my structure needs to be 'static.
Here's some sample code which works fine if I don't use threads, but won't compile if I switch the comments to enable the multi-threaded case.
use std::sync::Arc;
use std::thread;

struct BigStruct {
    data: Vec<usize>
    // Lots more
}
pub fn main() {
    let ref_bigstruct = &mut BigStruct { data: Vec::new() };
    for i in 0..3 {
        ref_bigstruct.data.push(i); // Phase where main thread has write access
        run_threads(ref_bigstruct); // Phase where worker threads have read-only access
    }
}
fn run_threads(ref_bigstruct: &BigStruct) {
    let arc_bigstruct = Arc::new(ref_bigstruct);
    {
        let arc_clone_for_worker = arc_bigstruct.clone();
        // SINGLE-THREADED WORKS:
        worker_thread(arc_clone_for_worker);
        // MULTI-THREADED DOES NOT COMPILE:
        // let handle = thread::spawn(move || { worker_thread(arc_clone_for_worker); } );
        // handle.join();
    }
    assert!(Arc::strong_count(&arc_bigstruct) == 1);
    println!("??? How can I tell the borrow checker that all borrows of ref_bigstruct are done?")
}
fn worker_thread(my_struct: Arc<&BigStruct>) {
    println!(" worker says len()={}", my_struct.data.len());
}
I'm still learning about Rust lifetimes, but what I think (fear?) I need is an operation that will take an ordinary (not 'static) reference to my structure and give me an Arc that I can clone into immutable references with a 'static lifetime for use by the workers. Once all the worker Arc references are dropped, the borrow checker needs to allow my thread-spawning function to return. For safety, I assume this would panic if the reference count is >1. While this seems like it would generally conform to Rust's safety requirements, I don't see how to do it.
The underlying problem is not that the borrow checker fails to follow Arc, and the solution is not to use Arc. The problem is that the borrow checker cannot understand that the reason a spawned thread must be 'static is that it may outlive the spawning thread, and thus that immediately .join()ing it is fine.
The solution is to use scoped threads, that is, threads that allow you to use non-'static data because they are always joined before the scope ends, and thus the spawned thread cannot outlive the spawning thread. Problem is, there are no scoped threads in the standard library. Well, there are, but they're unstable.
So if you insist on not using crates, for some reason, you have no choice but to use unsafe code (don't, really). But if you can use external crates, then you can use the well-known crossbeam crate with its crossbeam::scope function, at least until std's scoped threads are stabilized.
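A minimal sketch of that shape, assuming crossbeam (the 0.8-style crossbeam::scope API) is added as a dependency and using the BigStruct from the question:
struct BigStruct {
    data: Vec<usize>,
}

fn run_threads(big: &BigStruct) {
    crossbeam::scope(|s| {
        for _ in 0..3 {
            // The borrow of `big` is tied to the scope, so no 'static bound is needed.
            s.spawn(move |_| {
                println!("worker says len()={}", big.data.len());
            });
        }
        // Every thread spawned on `s` is joined before scope() returns.
    })
    .unwrap();
    // The caller can mutate the BigStruct again once this function returns.
}

fn main() {
    let mut big = BigStruct { data: Vec::new() };
    for i in 0..3 {
        big.data.push(i);  // single-threaded write phase
        run_threads(&big); // multi-threaded read-only phase
    }
}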
In Rust's Arc<T>, T is by definition immutable. This means that in order to use Arc to let threads access data that is going to change, you also need to wrap that data in some type that provides interior mutability.
Rust provides a type that is especially suited for one writer or multiple readers in parallel, called RwLock.
So for your simple example, this would probably look something like this:
use std::{sync::{Arc, RwLock}, thread};
struct BigStruct {
    data: Vec<usize>
    // Lots more
}
pub fn main() {
    let arc_bigstruct = Arc::new(RwLock::new(BigStruct { data: Vec::new() }));
    for i in 0..3 {
        arc_bigstruct.write().unwrap().data.push(i); // Phase where main thread has write access
        run_threads(&arc_bigstruct); // Phase where worker threads have read-only access
    }
}
fn run_threads(ref_bigstruct: &Arc<RwLock<BigStruct>>) {
    {
        let arc_clone_for_worker = ref_bigstruct.clone();
        // MULTI-THREADED
        let handle = thread::spawn(move || { worker_thread(&arc_clone_for_worker); });
        handle.join().unwrap();
    }
    assert!(Arc::strong_count(&ref_bigstruct) == 1);
}
fn worker_thread(my_struct: &Arc<RwLock<BigStruct>>) {
    println!(" worker says len()={}", my_struct.read().unwrap().data.len());
}
Which outputs
worker says len()=1
worker says len()=2
worker says len()=3
As for your question, the borrow checker does not know when an Arc is released, as far as I know. The references are counted at runtime.
I want a method of my struct to run in a synchronized way. I wanted to do this by using Mutex (Playground):
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
    map: BTreeMap<String, String>,
    mutex: Mutex<()>,
}
impl A {
    pub fn new() -> A {
        A {
            map: BTreeMap::new(),
            mutex: Mutex::new(()),
        }
    }
}
impl A {
    fn synchronized_call(&mut self) {
        let mutex_guard_res = self.mutex.try_lock();
        if mutex_guard_res.is_err() {
            return
        }
        let mut _mutex_guard = mutex_guard_res.unwrap(); // safe because of check above
        let mut lambda = |text: String| {
            let _ = self.map.insert("hello".to_owned(), "d".to_owned());
        };
        lambda("dd".to_owned());
    }
}
Error message:
error[E0500]: closure requires unique access to `self` but `self.mutex` is already borrowed
--> <anon>:23:26
|
18 | let mutex_guard_res = self.mutex.try_lock();
| ---------- borrow occurs here
...
23 | let mut lambda = |text: String| {
| ^^^^^^^^^^^^^^ closure construction occurs here
24 | if let Some(m) = self.map.get(&text) {
| ---- borrow occurs due to use of `self` in closure
...
31 | }
| - borrow ends here
As I understand it, when we borrow anything from a struct we are unable to use the struct's other fields until that borrow is finished. But how can I do method synchronization then?
The closure needs a mutable reference to self.map in order to insert something into it. But closure capturing works with whole bindings only. This means that if you write self.map, the closure attempts to capture self, not self.map. And self can't be mutably borrowed/captured, because parts of self are already immutably borrowed.
We can solve this closure-capturing problem by introducing a new binding for the map alone such that the closure is able to capture it (Playground):
let mm = &mut self.map;
let mut lambda = |text: String| {
    let _ = mm.insert("hello".to_owned(), text);
};
lambda("dd".to_owned());
However, there is something you overlooked: since synchronized_call() accepts &mut self, you don't need the mutex! Why? Mutable references are also called exclusive references, because the compiler can assure at compile time that there is only one such mutable reference at any given time.
Therefore you know statically that there is at most one instance of synchronized_call() running on any specific object at any given time, as long as the function is not recursive (does not call itself).
If you have mutable access to a mutex, you know that the mutex is unlocked. See the Mutex::get_mut() method for more explanation. Isn't that amazing?
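For illustration (a hypothetical snippet, not from the question), that last point looks like this:
use std::sync::Mutex;

fn main() {
    let mut m = Mutex::new(0_i32);
    // A &mut Mutex proves no other reference exists, so no locking is required.
    *m.get_mut().unwrap() += 1;
    assert_eq!(*m.lock().unwrap(), 1);
}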
Rust mutexes do not work the way you are trying to use them. In Rust, a mutex protects specific data relying on the borrow-checking mechanism used elsewhere in the language. As a consequence, declaring a field Mutex<()> doesn't make sense, because it is protecting read-write access to the () unit object that has no values to mutate.
As Lukas explained, your call_synchronized as declared doesn't need to do synchronization because its signature already requests an exclusive (mutable) reference to self, which prevents it from being invoked from multiple threads on the same object. In other words, you need to change the signature of call_synchronized because the current one does not match the functionality it is intended to provide.
call_synchronized needs to accept a shared reference to self, which will signal to Rust that it can be called from multiple threads in the first place. Inside call_synchronized a call to Mutex::lock will simultaneously lock the mutex and provide a mutable reference to the underlying data, carefully scoped so that the lock is held for the duration of the reference:
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
    synced_map: Mutex<BTreeMap<String, String>>,
}
impl A {
    pub fn new() -> A {
        A {
            synced_map: Mutex::new(BTreeMap::new()),
        }
    }
}
impl A {
    fn synchronized_call(&self) {
        let mut map = self.synced_map.lock().unwrap();
        // omitting the lambda for brevity, but it would also work
        // (as long as it refers to map rather than self.map)
        map.insert("hello".to_owned(), "d".to_owned());
    }
}
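And because synchronized_call now takes &self, the same A can be shared across threads, for example behind an Arc (a usage sketch assuming the A defined just above is in scope):
use std::sync::Arc;
use std::thread;

fn main() {
    let a = Arc::new(A::new());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let a = a.clone();
            thread::spawn(move || a.synchronized_call())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}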
I have a struct:
struct MyData {
    x: i32
}
I want to asynchronously start a long operation on this struct.
My first attempt was this:
fn foo(&self) { // should return immediately
    std::thread::Thread::spawn(move || {
        println!("{:?}", self.x); // consider a very long operation
    });
}
Clearly the compiler cannot infer an appropriate lifetime due to conflicting requirements because self may be on the stack frame and thus cannot be guaranteed to exist by the time the operation is running on a different stack frame.
To solve this, I attempted to make a copy of self and provide that copy to the new thread:
fn foo(&self) { // should return immediately
    let clone = self.clone();
    std::thread::Thread::spawn(move || {
        println!("{:?}", clone.x); // consider a very long operation
    });
}
I think that does not compile because now clone is on the stack frame which is similar to before. I also tried to do the clone inside the thread, and that does not compile either, I think for similar reasons.
Then I decided maybe I could use a channel to push the copied data into the thread, on the theory that perhaps channel can magically move (copy?) stack-allocated data between threads, which is suggested by this example in the documentation. However the compiler cannot infer a lifetime for this either:
fn foo(&self) { // should return immediately
    let (tx, rx) = std::sync::mpsc::channel();
    tx.send(self.clone());
    std::thread::Thread::spawn(move || {
        println!("{:?}", rx.recv().unwrap().x); // consider a very long operation
    });
}
Finally, I decided to just copy my struct onto the heap explicitly, and pass an Arc into the thread. But not even here can the compiler figure out a lifetime:
fn foo(&self) { // should return immediately
    let arc = std::sync::Arc::new(self.clone());
    std::thread::Thread::spawn(move || {
        println!("{:?}", arc.clone().x); // consider a very long operation
    });
}
Okay borrow checker, I give up. How do I get a copy of self onto my new thread?
I think your issue is simply because your structure does not derive the Clone trait. You can get your second example to compile and run by adding a #[derive(Clone)] before your struct's definition.
What I don't understand about the compiler's behaviour here is which .clone() function it tried to use. Your structure did not implement the Clone trait, so it should not have had a .clone() method by default.
playpen
You may also want to consider having your function take self by value, and letting your caller decide whether to make a clone or just move the value in.
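Putting that together, here is a sketch of the second attempt with #[derive(Clone)] added (and using std::thread::spawn, the current spelling of Thread::spawn):
#[derive(Clone)]
struct MyData {
    x: i32,
}

impl MyData {
    fn foo(&self) { // returns immediately
        let clone = self.clone();
        std::thread::spawn(move || {
            println!("{:?}", clone.x); // consider a very long operation
        });
    }
}

fn main() {
    let d = MyData { x: 42 };
    d.foo();
    // Give the detached thread a moment to print before main exits.
    std::thread::sleep(std::time::Duration::from_millis(100));
}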
As an alternative solution, you could use thread::scoped and maintain a handle to the thread. This allows the thread to hold a reference, without the need to copy it in:
#![feature(old_io,std_misc)]
use std::thread::{self, JoinGuard};
use std::old_io::timer;
use std::time::duration::Duration;

struct MyData {
    x: i32,
}

// returns immediately
impl MyData {
    fn foo(&self) -> JoinGuard<()> {
        thread::scoped(move || {
            timer::sleep(Duration::milliseconds(300));
            println!("{:?}", self.x); // consider a very long operation
            timer::sleep(Duration::milliseconds(300));
        })
    }
}

fn main() {
    let d = MyData { x: 42 };
    let _thread = d.foo();
    println!("I'm so fast!");
}
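For readers on current Rust: thread::scoped never made it into stable Rust, and the borrowing-without-copying pattern is now spelled std::thread::scope (stable since Rust 1.63). Note that unlike the JoinGuard version, scope() joins all of its threads before returning, so foo no longer returns immediately; a rough sketch:
use std::thread;
use std::time::Duration;

struct MyData {
    x: i32,
}

impl MyData {
    fn foo(&self) {
        thread::scope(|s| {
            s.spawn(|| {
                thread::sleep(Duration::from_millis(300));
                println!("{:?}", self.x); // borrows self; no copy needed
            });
            // All threads spawned on `s` are joined here, before scope() returns.
        });
    }
}

fn main() {
    let d = MyData { x: 42 };
    d.foo();
    println!("printed only after the worker has finished");
}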
So if you had a MUD server that handled each TCP connection in a separate task,
for stream in acceptor.incoming() {
    match stream {
        Err(e) => { /* connection failed */ }
        Ok(stream) => spawn(proc() {
            handle_client(stream)
        })
    }
}
What would be the strategy for sharing mutable world data for that server? I can imagine n connections responding to commands from the user, each command needing to visit and possibly modify the world.
pub struct Server<'a> {
    world: World<'a>
}

pub struct World<'a> {
    pub chat_rooms: HashMap<&'a str, ChatRoom<'a>>
}

impl<'a> World<'a> {
    pub fn new() -> World<'a> {
        let mut rooms = HashMap::new();
        rooms.insert("General", ChatRoom::new("General"));
        rooms.insert("Help", ChatRoom::new("Help"));
        World { chat_rooms: rooms }
    }
}
Would Arc be the way to go?
let shared_server = Arc::new(server);
let server = shared_server.clone();
spawn(proc() {
    // Work with server
});
What about scaling to 100 or 1000 users? I'm just looking for a nudge in the right direction.
An Arc will let you access a value from multiple tasks, but it will not allow you to borrow the value mutably. The compiler cannot verify statically that only one task would borrow the value mutably at a time, and mutating a value concurrently on different tasks leads to data races.
Rust's standard library provides some types that allow mutating a shared object safely. Here are two of them:
Mutex: This is a simple mutual exclusion lock. A Mutex wraps the protected value, so the only way to access the value is by locking the mutex. Only one task at a time may access the wrapped value.
RwLock: This is a reader-writer lock. This kind of lock lets multiple tasks read a value at the same time, but writers must have exclusive access. These are basically the same rules that the borrow checker and RefCell enforce (except the lock waits until the borrow is released, instead of failing compilation or panicking).
You'll need to wrap a Mutex or a RwLock in an Arc to make it accessible to multiple tasks.
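A minimal sketch of that shape with today's std (std::thread in place of spawn(proc()), and with the borrowed &'a str keys and ChatRoom replaced by owned String stand-ins so the world can be 'static):
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

pub struct World {
    pub chat_rooms: HashMap<String, String>, // room name -> topic, standing in for ChatRoom
}

fn main() {
    let mut rooms = HashMap::new();
    rooms.insert("General".to_string(), "general chat".to_string());
    rooms.insert("Help".to_string(), "help chat".to_string());
    let world = Arc::new(RwLock::new(World { chat_rooms: rooms }));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let world = world.clone();
            thread::spawn(move || {
                // Many readers may hold the lock at once...
                let n = world.read().unwrap().chat_rooms.len();
                // ...but a writer gets exclusive access.
                world
                    .write()
                    .unwrap()
                    .chat_rooms
                    .insert(format!("Room{}", i), format!("room {} of {}", i, n));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("{} rooms", world.read().unwrap().chat_rooms.len());
}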