Sharing Mutable Data Between Threads in Rust

I know there are hundreds of questions just like this one, but I'm having trouble wrapping my head around how to do the thing I'm trying to do.
I want an HTTP server that accepts and processes events. On receiving/processing an event, I want the EventManager to send an update to an ApplicationMonitor that is tracking how many events have been accepted/processed. The ApplicationMonitor would also (eventually) handle things like tracking the number of concurrent connections, but in this example I just want my EventManager to send an Inc('event_accepted') update to my ApplicationMonitor.
To be useful, I need the ApplicationMonitor to be able to return a snapshot of the stats when requested through a /stats route.
So I have an ApplicationMonitor which spawns a thread and listens on a channel for incoming Stat events. When it receives a Stat event, it updates the stats HashMap. The stats HashMap must be mutable within both ApplicationMonitor and the spawned thread.
use std::sync::mpsc;
use std::sync::mpsc::Sender;
use std::thread;
use std::thread::JoinHandle;
use std::collections::HashMap;

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let mut stats: HashMap<&'static str, i32> = HashMap::new();
        // `stats` is moved into the closure here and never comes back out.
        let join_handle = thread::spawn(move || {
            for stat in rx {
                match stat {
                    Stat::Inc(nm) => {
                        *stats.entry(nm).or_insert(0) += 1;
                    },
                    Stat::Dec(nm) => {
                        *stats.entry(nm).or_insert(0) -= 1;
                    },
                    Stat::Set(nm, val) => {
                        stats.insert(nm, val);
                    }
                }
            }
        });
        ApplicationMonitor {
            tx,
            join_handle
        }
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        // ERROR: there is no `stats` field on ApplicationMonitor,
        // because `stats` was moved into the spawned thread.
        self.stats.clone()
    }
}
Because rx cannot be cloned, I must move it (along with stats) into the closure. When I do this, I am no longer able to access stats outside of the thread.
I thought maybe I needed a second channel so the thread could communicate its internals back out, but this doesn't work, as I would need another thread to listen for that in a non-blocking way.
Is this where I'd use Arc?
How can I have stats live inside and out of the thread context?

Yes, this is a place where you'd wrap your stats in an Arc so that you can have multiple references to it from different threads. But just wrapping it in an Arc will only give you a read-only view of the HashMap - if you need to be able to modify it, you'll also need to wrap it in something which guarantees that only one thing can modify it at a time. So you'll probably end up with either an Arc<Mutex<HashMap<&'static str, i32>>> or an Arc<RwLock<HashMap<&'static str, i32>>>.
Alternatively, if you're just changing the values, and not adding or removing entries, you could potentially use an Arc<HashMap<&'static str, AtomicU32>>, which would allow you to read and modify different values in parallel without needing to take out a map-wide lock, but atomics can be a little more fiddly to understand and use correctly than locks.
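For illustration, here is a minimal sketch (not the poster's code) of the Arc<Mutex<...>> variant applied to the ApplicationMonitor above: both the struct and the spawned thread hold a clone of the Arc, and get_snapshot briefly locks the map to clone it.

use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
    stats: Arc<Mutex<HashMap<&'static str, i32>>>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let stats = Arc::new(Mutex::new(HashMap::new()));
        // The thread gets its own clone of the Arc; the struct keeps the other.
        let thread_stats = Arc::clone(&stats);
        let join_handle = thread::spawn(move || {
            for stat in rx {
                let mut stats = thread_stats.lock().unwrap();
                match stat {
                    Stat::Inc(nm) => *stats.entry(nm).or_insert(0) += 1,
                    Stat::Dec(nm) => *stats.entry(nm).or_insert(0) -= 1,
                    Stat::Set(nm, val) => {
                        stats.insert(nm, val);
                    }
                }
            }
        });
        ApplicationMonitor { tx, join_handle, stats }
    }

    // A short lock, then a clone, so callers get an independent snapshot.
    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        self.stats.lock().unwrap().clone()
    }
}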

Related

Rust member functions as multithreaded mutable callbacks in real-time safe code

I am looking to use a member function of a struct as a mutable callback. The standard way I would do this is by wrapping my struct instance in a mutex and then in an Arc. In this use case however, my callback is a real-time audio callback which needs to be real-time safe and is triggered on a dedicated thread. This means no locking can happen that might delay the callback from occurring.
My actual member callback method must take a mutable reference to the struct (I think this is where things fall apart). The struct does contain members but these can all be atomic variables.
Here is an example of the kind of thing I'm looking for:
use std::sync::Arc;
use std::sync::atomic::AtomicI16;
use cpal::{traits::{DeviceTrait, HostTrait}, OutputCallbackInfo, StreamConfig, StreamError};

pub struct AudioProcessor {
    var: AtomicI16,
}

impl AudioProcessor {
    fn audio_block_f32(&mut self, audio: &mut [f32], _info: &OutputCallbackInfo) {
    }

    fn audio_error(&mut self, error: StreamError) {
    }
}

fn initAudio() {
    let out_dev = cpal::default_host()
        .default_output_device()
        .expect("No available output device found");
    let mut supported_configs_range = out_dev
        .supported_output_configs()
        .expect("Could not obtain device configs");
    let config = supported_configs_range
        .next()
        .expect("No available configs")
        .with_max_sample_rate();

    let proc = Arc::new(AudioProcessor { var: AtomicI16::new(5) });
    let audio_callback_instance = proc.clone();
    let error_callback_instance = proc.clone();

    // ERROR: both closures call `&mut self` methods through an `Arc`.
    out_dev.build_output_stream(
        &StreamConfig::from(config),
        move |audio: &mut [f32], info: &OutputCallbackInfo| {
            audio_callback_instance.audio_block_f32(audio, info)
        },
        move |stream_error| error_callback_instance.clone().audio_error(stream_error),
    );
}
Currently this code fails in the build_output_stream() function with this error message:
cannot borrow data in an Arc as mutable
trait DerefMut is required to modify through a dereference, but it is not implemented for Arc<AudioProcessor>
Are there any standard ways of dealing with problems like this?
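One standard way out, hinted at by the question itself, is to make the callbacks take &self instead of &mut self and keep all mutable state in atomics; the Arc<AudioProcessor> can then be shared freely without any locking. Below is a minimal sketch of that idea with the cpal plumbing left out so it stands alone; the simplified signatures and names are illustrative, not from the original post.

use std::sync::atomic::{AtomicI16, Ordering};
use std::sync::Arc;

pub struct AudioProcessor {
    var: AtomicI16,
}

impl AudioProcessor {
    // &self instead of &mut self: the atomic provides interior mutability,
    // so calling this through an Arc<AudioProcessor> is fine.
    fn audio_block_f32(&self, audio: &mut [f32]) {
        let v = self.var.load(Ordering::Relaxed) as f32;
        for sample in audio.iter_mut() {
            *sample = v; // placeholder "processing"
        }
    }

    fn audio_error(&self) {
        // Lock-free state change, safe to call from a dedicated callback thread.
        self.var.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let processor = Arc::new(AudioProcessor { var: AtomicI16::new(5) });
    let audio_callback_instance = Arc::clone(&processor);
    let error_callback_instance = Arc::clone(&processor);

    let mut buffer = [0.0f32; 4];
    audio_callback_instance.audio_block_f32(&mut buffer); // would be cpal's data callback
    error_callback_instance.audio_error();                // would be cpal's error callback
    println!("{:?} var = {}", buffer, processor.var.load(Ordering::Relaxed));
}

In the real program the callbacks would keep their cpal signatures; the only change is that the receiver becomes &self and the fields become atomics.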

How do I send read-only data to other threads without copying?

I'm trying to send a "view" of a read-only data to another thread for processing. Basically the main thread does work, and continuously updates a set of data. Whenever an update occurs, the main thread should send the updated data down to other threads where they will process it in a read-only manner. I do not want to copy the data as it may be very large. (The main thread also keeps a "cache" of the data in-memory anyway.)
I can achieve this with Arc<RwLock<T>>, where T being my data structure.
However, there is nothing stopping the side threads from updating the data. A side thread can simply take the write lock and then write to the data.
My question is: is there something similar to RwLock where the owner/creator has the only write access, but all other handles have read-only access? That way I would get compile-time checking against logic bugs where a side thread accidentally updates the data.
Regarding these questions:
Sharing read-only object between threads in Rust?
How can I pass a reference to a stack variable to a thread?
The above questions suggest solving it with Arc<Mutex<T>> or Arc<RwLock<T>>, which is all fine, but it still doesn't give compile-time enforcement of only one writer.
Additionally: crossbeam or rayon's scoped threads don't help here as I want my side threads to outlive my main thread.
You can create a wrapper type around an Arc<RwLock<T>> that only hands out additional clones through a read-only wrapper:
mod shared {
    use std::sync::{Arc, LockResult, RwLock, RwLockReadGuard, RwLockWriteGuard};

    pub struct Lock<T> {
        inner: Arc<RwLock<T>>,
    }

    impl<T> Lock<T> {
        pub fn new(val: T) -> Self {
            Self {
                inner: Arc::new(RwLock::new(val)),
            }
        }

        pub fn write(&self) -> LockResult<RwLockWriteGuard<'_, T>> {
            self.inner.write()
        }

        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }

        pub fn read_only(&self) -> ReadOnly<T> {
            ReadOnly {
                inner: self.inner.clone(),
            }
        }
    }

    pub struct ReadOnly<T> {
        inner: Arc<RwLock<T>>,
    }

    impl<T> ReadOnly<T> {
        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }
    }
}
Now you can pass read-only views of the value to spawned threads and continue writing in the main thread:
fn main() {
    let val = shared::Lock::new(String::new());

    for _ in 0..10 {
        let view = val.read_only();
        std::thread::spawn(move || {
            // view.write().unwrap().push_str("...");
            // ERROR: no method named `write` found for struct `ReadOnly` in the current scope
            println!("{}", view.read().unwrap());
        });
    }

    val.write().unwrap().push_str("...");
    println!("{}", val.read().unwrap());
}

Referencing self in a long-running Rust thread

I'm left scratching my head about how to design this. Basically, I want to implement a worker pool – kind of similar to the ThreadPool from the book, but with a twist. In the book, they just pass a closure for one of the threads in the pool to run. However, I would like to have some state for every thread in the pool. Let me explain:
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Job {
    x: usize,
}

struct WorkerPool {
    sender: mpsc::Sender<Job>,
    workers: Vec<Worker>,
}

impl WorkerPool {
    fn new(num_workers: usize) -> WorkerPool {
        let mut workers = Vec::with_capacity(num_workers);
        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));
        for id in 0..num_workers {
            workers.push(Worker::new(id, receiver.clone()));
        }
        WorkerPool { sender, workers }
    }
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
    receiver: Arc<Mutex<mpsc::Receiver<Job>>>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        Worker {
            id,
            thread: None,
            receiver,
        }
    }

    fn start(&mut self) {
        // ERROR: the closure captures a borrow of `self`,
        // but thread::spawn requires a 'static closure.
        self.thread = Some(thread::spawn(move || loop {
            let job = self.receiver.lock().unwrap().recv().unwrap();
            self.add_to_id(job.x);
        }));
    }

    pub fn add_to_id(&self, x: usize) {
        println!("The result is: {}", self.id + x);
    }
}
Every one of my Workers gets an id, and its job is to accept a Job containing a number and print its id plus that number (this is, of course, a simplified version; in my real use case, each worker gets an HTTP client and some other state). Pretty simple problem in my eyes, but obviously the code above does not compile.
I realize that the code in Worker::start cannot possibly work, because it is moving self into the thread closure while I am trying to assign to self at the same time.
The question is, how else would I access the fields in the "parent" struct of the thread?
Can I somehow constrain the thread closure's lifetime to that of the struct? (Pretty sure the answer is no, because the closure has to be 'static.) Or the other way around, do I have to make everything 'static here?
I'm not sure of the exact problem you are trying to solve, but you can make your code compile by ensuring that the values accessed within the thread remain valid for the thread's whole lifetime. Make id remain valid by wrapping it in an Arc<Mutex<usize>>.
Here is an example that makes your code compile:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ed750f5ba5db9597efb9b2b80bde2959
let recv = self.receiver.clone();
let id = self.id.clone();
self.thread = Some(thread::spawn(move || loop {
    let job = recv.lock().unwrap().recv().unwrap();
    // self.add_to_id(job.x);
    *id.lock().unwrap() = job.x;
}));
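Filling in the struct change the answer describes (my reconstruction, not the playground code verbatim): id becomes an Arc<Mutex<usize>>, so the 'static closure can own clones of both handles instead of borrowing self. Unlike the snippet above, which overwrites the shared id, this version keeps the original add-to-id behaviour.

use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Job {
    x: usize,
}

struct Worker {
    // id is shared state now, so the thread can own a handle to it.
    id: Arc<Mutex<usize>>,
    thread: Option<thread::JoinHandle<()>>,
    receiver: Arc<Mutex<mpsc::Receiver<Job>>>,
}

impl Worker {
    fn start(&mut self) {
        // Clone the Arcs so the 'static closure owns its own handles
        // instead of borrowing `self`.
        let recv = Arc::clone(&self.receiver);
        let id = Arc::clone(&self.id);
        self.thread = Some(thread::spawn(move || loop {
            let job = match recv.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break, // channel closed, stop the worker
            };
            println!("The result is: {}", *id.lock().unwrap() + job.x);
        }));
    }
}

fn main() {
    let (sender, receiver) = mpsc::channel();
    let mut worker = Worker {
        id: Arc::new(Mutex::new(1)),
        thread: None,
        receiver: Arc::new(Mutex::new(receiver)),
    };
    worker.start();
    sender.send(Job { x: 41 }).unwrap();
    drop(sender); // close the channel so the worker loop exits
    worker.thread.take().unwrap().join().unwrap();
}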

Circular reference between two structs with high performance

I have two structs, Node and Communicator. Node contains the business logic, while Communicator contains methods for sending and receiving UDP messages. Node needs to call methods on Communicator when it wants to send messages, and Communicator needs to call methods on Node when it receives a UDP message. If they were the same struct, there would be no problem at all. But I want to separate them because they clearly have different responsibilities; having it all in one struct would become unmanageable. My code looks as follows:
fn main() {
    use std::sync::{Arc, Mutex, Condvar, Weak};
    use std::thread;

    pub struct Node {
        communicator: Option<Arc<Communicator>>
    }

    impl Node {
        pub fn new() -> Node {
            Node {
                communicator: None
            }
        }

        pub fn set_communicator(&mut self, communicator: Arc<Communicator>) {
            self.communicator = Some(communicator);
        }
    }

    pub struct Communicator {
        node: Option<Weak<Node>>
    }

    impl Communicator {
        pub fn new() -> Communicator {
            Communicator {
                node: None
            }
        }

        pub fn set_node(&mut self, node: Weak<Node>) {
            self.node = Some(node);
        }
    }

    let mut my_node = Arc::new(Node::new());
    let mut my_communicator = Arc::new(Communicator::new());

    Arc::get_mut(&mut my_node).unwrap().set_communicator(Arc::clone(&my_communicator));
    //Arc::get_mut(&mut my_communicator).unwrap().set_node(Arc::downgrade(&my_node));
}
My code panics, as I predicted, if I uncomment the last line so that both set calls run. But something like that is what I want to achieve.
I see a couple of options:
Use Mutex or RwLock to gain interior mutability (see the sketch after this list). But that gives a performance penalty.
Use Cell or RefCell. But they are not thread-safe.
Use AtomicCell from crossbeam. Looks like the best option so far. But what about performance?
Use unsafe. But where? And how to ensure the code remains memory safe in spite of unsafe?
In theory I could split Communicator into a Sender and a Receiver, so that there would be no cyclical references. But for my program, I know I will have a similar situation in the future where this will not be possible.
It seems that since I just split one struct into two for code-structuring reasons, gaining no new functionality, there should be a way to do this without paying the performance penalty that something like a Mutex would impose.
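For reference, a minimal sketch of the first option listed above (Mutex for interior mutability), with the wiring adjusted by me: each struct wraps its back-reference in a Mutex<Option<...>>, so no Arc::get_mut is needed and both links can be set after the Arcs are already shared.

use std::sync::{Arc, Mutex, Weak};

pub struct Node {
    communicator: Mutex<Option<Arc<Communicator>>>,
}

pub struct Communicator {
    node: Mutex<Option<Weak<Node>>>,
}

fn main() {
    let my_node = Arc::new(Node { communicator: Mutex::new(None) });
    let my_communicator = Arc::new(Communicator { node: Mutex::new(None) });

    // Interior mutability: no Arc::get_mut, so nothing panics even though
    // both Arcs are shared when the links are set.
    *my_node.communicator.lock().unwrap() = Some(Arc::clone(&my_communicator));
    *my_communicator.node.lock().unwrap() = Some(Arc::downgrade(&my_node));

    // The cycle is Node -> Arc<Communicator> and Communicator -> Weak<Node>,
    // so dropping both Arcs still frees everything.
    assert!(my_node.communicator.lock().unwrap().is_some());
    assert!(my_communicator.node.lock().unwrap().is_some());
}

Whether the per-call lock cost matters is exactly the performance question raised above; the other listed options trade that lock for different constraints.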

Forced to use a Mutex when it's not required

I am writing a game and have a player list defined as follows:
pub struct PlayerList {
    by_name: HashMap<String, Arc<Mutex<Player>>>,
    by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
This struct has methods for adding, removing, getting players, and getting the player count.
The NetworkServer and Server share this list as follows:
NetworkServer {
    ...
    player_list: Arc<Mutex<PlayerList>>,
    ...
}

Server {
    ...
    player_list: Arc<Mutex<PlayerList>>,
    ...
}
This is inside an Arc<Mutex> because the NetworkServer accesses the list in a different thread (network loop).
When a player joins, a thread is spawned for them and they are added to the player_list.
Although the only operation I'm doing is adding to player_list, I'm forced to use Arc<Mutex<Player>> instead of the more natural Rc<RefCell<Player>> in the HashMaps because Mutex<PlayerList> requires it. I am not accessing players from the network thread (or any other thread) so it makes no sense to put them under a Mutex. Only the HashMaps need to be locked, which I am doing using Mutex<PlayerList>. But Rust is pedantic and wants to protect against all misuses.
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant. Is there a workaround instead of using unsafe or something?
Here's an example:
use std::cell::Cell;
use std::collections::HashMap;
use std::ffi::CString;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);

struct Player {
    pub name: String,
    pub uuid: Uuid,
}

struct PlayerList {
    by_name: HashMap<String, Arc<Mutex<Player>>>,
    by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}

impl PlayerList {
    fn add_player(&mut self, p: Player) {
        let name = p.name.clone();
        let uuid = p.uuid;

        let p = Arc::new(Mutex::new(p));
        self.by_name.insert(name, Arc::clone(&p));
        self.by_uuid.insert(uuid, p);
    }
}

struct NetworkServer {
    player_list: Arc<Mutex<PlayerList>>,
}

impl NetworkServer {
    fn start(&mut self) {
        let player_list = Arc::clone(&self.player_list);
        thread::spawn(move || {
            loop {
                // fake network loop
                // listen for incoming connections, accept player and add them to player_list.
                player_list.lock().unwrap().add_player(Player {
                    name: "blahblah".into(),
                    uuid: Uuid([0; 16]),
                });
            }
        });
    }
}

struct Server {
    player_list: Arc<Mutex<PlayerList>>,
    network_server: NetworkServer,
}

impl Server {
    fn start(&mut self) {
        self.network_server.start();
        // main game loop
        loop {
            // I am only accessing players in this loop in this thread (main thread),
            // so Mutex for individual player is not needed although rust requires it.
        }
    }
}

fn main() {
    let player_list = Arc::new(Mutex::new(PlayerList {
        by_name: HashMap::new(),
        by_uuid: HashMap::new(),
    }));
    let network_server = NetworkServer {
        player_list: Arc::clone(&player_list),
    };
    let mut server = Server {
        player_list,
        network_server,
    };
    server.start();
}
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant.
You mean, as right now you are only accessing Players in the main thread, but at any time later you may accidentally introduce an access to them in another thread?
From the point of view of the language, if you can get a reference to a value, you may use the value. Therefore, if multiple threads have a reference to a value, this value should be safe to use from multiple threads. There is no way to enforce, at compile-time, that a particular value, although accessible, is actually never used.
This raises the question, however:
If the value is never used by a given thread, why does this thread have access to it in the first place?
It seems to me that you have a design issue. If you can manage to redesign your program so that only the main thread has access to the PlayerList, then you will immediately be able to use Rc<RefCell<...>>.
For example, you could instead have the network thread send a message to the main thread announcing that a new player connected.
At the moment, you are "Communicating by Sharing", and you could shift toward "Sharing by Communicating" instead. The former usually has synchronization primitives (such as mutexes, atomics, ...) all over the place, and may face contention/dead-lock issues, while the latter usually has communication queues (channels) and requires an "asynchronous" style of programming.
Send is a marker trait that governs which objects can have ownership transferred across thread boundaries. It is automatically implemented for any type that is entirely composed of Send types. It is also an unsafe trait because manually implementing this trait can cause the compiler to not enforce the concurrency safety that we love about Rust.
The problem is that Rc<RefCell<Player>> isn't Send and thus your PlayerList isn't Send and thus can't be sent to another thread, even when wrapped in an Arc<Mutex<>>. The unsafe workaround would be to unsafe impl Send for your PlayerList struct.
Putting this code into your playground example allows it to compile the same way as the original with Arc<Mutex<Player>>
struct PlayerList {
    by_name: HashMap<String, Rc<RefCell<Player>>>,
    by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}

// SAFETY: this promise is not checked by the compiler; it is only sound as long
// as the Players really are only ever touched from a single thread.
unsafe impl Send for PlayerList {}

impl PlayerList {
    fn add_player(&mut self, p: Player) {
        let name = p.name.clone();
        let uuid = p.uuid;

        let p = Rc::new(RefCell::new(p));
        self.by_name.insert(name, Rc::clone(&p));
        self.by_uuid.insert(uuid, p);
    }
}
Playground
The Nomicon is sadly a little sparse at explaining what rules have to be enforced by the programmer when unsafely implementing Send for a type containing Rcs, but accessing them in only one thread seems safe enough...
For completeness, here's TRPL's bit on Send and Sync
I suggest solving this threading problem with a multi-sender, single-receiver channel. The network threads get a Sender<Player> and no direct access to the player list.
The Receiver<Player> gets stored inside the PlayerList. The only thread accessing the PlayerList is then the main thread, so you can remove the Mutex around it. Instead, in the place where the main thread used to lock the mutex, it dequeues all pending players from the Receiver<Player>, wraps them in an Rc<RefCell<>>, and adds them to the appropriate collections.
Though looking at the bigger design, I wouldn't use a per-player thread in the first place. Instead I'd use some kind of single-threaded, event-loop-based design. (I didn't look into which Rust libraries are good in that area, but tokio seems popular.)
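A minimal sketch of that Sender<Player>/Receiver<Player> design (the drain_incoming helper and the rest of the wiring are my own, not from the answer): the network thread only owns a Sender, while the main thread owns the Receiver plus the Rc<RefCell<Player>> maps, so no Mutex is needed anywhere.

use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::mpsc::{channel, Receiver};
use std::thread;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);

struct Player {
    name: String,
    uuid: Uuid,
}

struct PlayerList {
    incoming: Receiver<Player>,
    by_name: HashMap<String, Rc<RefCell<Player>>>,
    by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}

impl PlayerList {
    // Called where the old code used to lock the Mutex: drain whatever the
    // network thread has queued up and index it with Rc<RefCell<...>>.
    fn drain_incoming(&mut self) {
        while let Ok(p) = self.incoming.try_recv() {
            let name = p.name.clone();
            let uuid = p.uuid;
            let p = Rc::new(RefCell::new(p));
            self.by_name.insert(name, Rc::clone(&p));
            self.by_uuid.insert(uuid, p);
        }
    }
}

fn main() {
    let (tx, rx) = channel::<Player>();
    let mut list = PlayerList {
        incoming: rx,
        by_name: HashMap::new(),
        by_uuid: HashMap::new(),
    };

    // The "network" thread only ever sees the Sender.
    let net = thread::spawn(move || {
        tx.send(Player { name: "blahblah".into(), uuid: Uuid([0; 16]) }).unwrap();
    });
    net.join().unwrap();

    // One iteration of the main/game loop: single-threaded access, so no locks.
    list.drain_incoming();
    println!("players: {}", list.by_name.len());
}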
