How do I send read-only data to other threads without copying?

I'm trying to send a "view" of read-only data to another thread for processing. Basically the main thread does work and continuously updates a set of data. Whenever an update occurs, the main thread should send the updated data down to other threads, where they will process it in a read-only manner. I do not want to copy the data as it may be very large. (The main thread also keeps a "cache" of the data in memory anyway.)
I can achieve this with Arc<RwLock<T>>, where T is my data structure.
However, there is nothing stopping the side threads from updating the data: they can simply call write() and then mutate it.
My question: is there something similar to RwLock where the owner/creator has the only write access, while all other handles have read-only access? That way I get compile-time checking against logic bugs where a side thread accidentally updates the data.
Regarding these questions:
Sharing read-only object between threads in Rust?
How can I pass a reference to a stack variable to a thread?
The above questions suggest solving it with Arc<Mutex<T>> or Arc<RwLock<T>>, which is fine, but it still doesn't give compile-time enforcement of only one writer.
Additionally: crossbeam's or rayon's scoped threads don't help here, as I want my side threads to outlive my main thread.

You can create a wrapper type over an Arc<RwLock<T>> that only exposes cloning via a read-only wrapper:
mod shared {
    use std::sync::{Arc, LockResult, RwLock, RwLockReadGuard, RwLockWriteGuard};

    pub struct Lock<T> {
        inner: Arc<RwLock<T>>,
    }

    impl<T> Lock<T> {
        pub fn new(val: T) -> Self {
            Self {
                inner: Arc::new(RwLock::new(val)),
            }
        }

        pub fn write(&self) -> LockResult<RwLockWriteGuard<'_, T>> {
            self.inner.write()
        }

        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }

        pub fn read_only(&self) -> ReadOnly<T> {
            ReadOnly {
                inner: self.inner.clone(),
            }
        }
    }

    pub struct ReadOnly<T> {
        inner: Arc<RwLock<T>>,
    }

    impl<T> ReadOnly<T> {
        pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
            self.inner.read()
        }
    }
}
Now you can pass read-only versions of the value to spawned threads and continue writing in the main thread:
fn main() {
    let val = shared::Lock::new(String::new());

    for _ in 0..10 {
        let view = val.read_only();
        std::thread::spawn(move || {
            // view.write().unwrap().push_str("...");
            // ERROR: no method named `write` found for struct `ReadOnly` in the current scope
            println!("{}", view.read().unwrap());
        });
    }

    val.write().unwrap().push_str("...");
    println!("{}", val.read().unwrap());
}

Related

How can I store variables in traits so that they can be used in other trait methods?

I have several objects in my code that share common functionality. These objects act essentially like services: they have a lifetime (controlled by start/stop) and perform work until that lifetime ends. I am in the process of refactoring my code to reduce duplication, but I am getting stuck.
At a high level, my objects all have the code below implemented in them.
impl SomeObject {
    fn start(&self) {
        // starts a new thread.
        // stores the 'JoinHandle' so thread can be joined later.
    }

    fn do_work(&self) {
        // perform work in the context of the new thread.
    }

    fn stop(&self) {
        // interrupts the work that this object is doing.
        // stops the thread.
    }
}
Essentially, these objects act like "services" so in order to refactor, my first thought was that I should create a trait called "service" as shown below.
trait Service {
    fn start(&self) {}
    fn do_work(&self);
    fn stop(&self) {}
}
Then, I can just update my objects to each implement the "Service" trait. The issue that I am having though, is that since traits are not allowed to have fields/properties, I am not sure how I can go about saving the 'JoinHandle' in the trait so that I can use it in the other methods.
Is there an idiomatic way to handle this problem in Rust?
tl;dr: how can I save variables in a trait so that they can be reused in different trait methods?
Edit:
Here is the solution I settled on. Any feedback is appreciated.
extern crate log;

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::JoinHandle;

pub struct Context {
    name: String,
    thread_join_handle: JoinHandle<()>,
    is_thread_running: Arc<AtomicBool>,
}

pub trait Service {
    fn start(&self, name: String) -> Context
    where
        Self: 'static, // the spawned closure calls `Self::do_work`, so `Self` must outlive the thread
    {
        log::trace!("starting service, name=[{}]", name);
        let is_thread_running = Arc::new(AtomicBool::new(true));
        let cloned_is_thread_running = is_thread_running.clone();
        // Loop until `stop` clears the flag. (Wrapping this `while` in an extra
        // `loop` would make the thread spin forever once the flag is cleared,
        // so `join` would never return.)
        let thread_join_handle = std::thread::spawn(move || {
            while cloned_is_thread_running.load(Ordering::SeqCst) {
                Self::do_work();
            }
        });
        log::trace!("started service, name=[{}]", name);
        Context {
            name,
            thread_join_handle,
            is_thread_running,
        }
    }

    fn stop(context: Context) {
        log::trace!("stopping service, name=[{}]", context.name);
        context.is_thread_running.store(false, Ordering::SeqCst);
        context
            .thread_join_handle
            .join()
            .expect("joining service thread");
        log::trace!("stopped service, name=[{}]", context.name);
    }

    fn do_work();
}
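As a hypothetical usage sketch of the trait above (it assumes the same `log` dependency; the `Heartbeat` type and its sleep interval are made up purely for illustration):
// Hypothetical implementor: it only supplies `do_work` and inherits
// `start`/`stop` from the trait's default methods.
struct Heartbeat;

impl Service for Heartbeat {
    fn do_work() {
        // pretend to do one unit of work per loop iteration
        std::thread::sleep(std::time::Duration::from_millis(100));
    }
}

fn main() {
    let service = Heartbeat;
    let context = service.start("heartbeat".to_string());
    std::thread::sleep(std::time::Duration::from_secs(1));
    Heartbeat::stop(context);
}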
I think you just need to save your JoinHandle in the struct's state itself; the other methods can then access it as well, because they all receive the struct's data through self.
use std::thread::JoinHandle;

struct SomeObject {
    // `Option` so the struct can exist before the thread is started.
    join_handle: Option<JoinHandle<()>>,
}

impl Service for SomeObject {
    // Note: the trait's `start` would need to take `&mut self` for this to work.
    fn start(&mut self) {
        // starts a new thread.
        // stores the 'JoinHandle' so thread can be joined later.
        self.join_handle = Some(what_ever_you_wanted_to_set); // store it here like this
    }

    fn do_work(&self) {
        // perform work in the context of the new thread.
    }

    fn stop(&self) {
        // interrupts the work that this object is doing.
        // stops the thread.
    }
}
Hopefully that works for you.
Depending on how you're using the trait, you could restructure your code so that start returns a JoinHandle and the other functions take the join handle as input:
trait Service {
    fn start(&self) -> JoinHandle<()>;
    fn do_work(&self, handle: &mut JoinHandle<()>);
    fn stop(&self, handle: JoinHandle<()>);
}
(maybe with different function arguments depending on what you need). This way you could probably cut down on duplicate code by putting all the code that handles the handles (haha) outside of the structs themselves, making it more generic (see the sketch below). If you want the structs to use the JoinHandle outside of this trait, I'd say it's best to just do what Equinox suggested and make it a field.
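A minimal sketch of that idea, with the trait restated so the snippet is self-contained (`do_work` is omitted for brevity; `run_once` and `Worker` are made-up names):
use std::thread::JoinHandle;

trait Service {
    fn start(&self) -> JoinHandle<()>;
    fn stop(&self, handle: JoinHandle<()>);
}

// Generic driver: the handle management lives here once,
// instead of being repeated in every concrete service.
fn run_once<S: Service>(service: &S) {
    let handle = service.start();
    // ... let the service do its work ...
    service.stop(handle);
}

struct Worker;

impl Service for Worker {
    fn start(&self) -> JoinHandle<()> {
        std::thread::spawn(|| println!("working"))
    }

    fn stop(&self, handle: JoinHandle<()>) {
        handle.join().expect("joining worker thread");
    }
}

fn main() {
    run_once(&Worker);
}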

How can I determine if I have a unique Arc when it's dropped?

I've an Arc<Mutex<Thing>> field in a struct which is cloned many times. It is shared between concurrent threads. Drop::drop is called for each clone as it goes out of scope. Is there any way to determine when Drop::drop is called for the last (unique) Arc<Mutex<Thing>>?
It's clear that strong_count is subject to data races (I've seen them). So, you can't count on Arc::strong_count() == 1 (no pun intended).
I found that I couldn't use Arc::try_unwrap() due to a move issue.
Arc::is_unique() is private.
Other than keeping a Arc<AtomicUsize> field, which is incremented on clone and decremented on drop, is there any way to determine if a drop is for a unique Arc<Mutex<Thing>>?
Here's an MRE:
use std::sync::Arc;

#[derive(Debug)]
enum Action {
    One,
    Two,
    Three,
}

// Thing trait which operates on an Action, which should be an enum, allowing for
// different action sets.
trait Thing<T> {
    fn disconnected(&self);
    fn action(&self, action: T);
}

// There are many instances of an ActionController.
// There may be zero or more clones of an instance.
// The final drop of the instances should call thing.disconnected()
// In a multi-core environment, the same instance may be running on multiple cores.
// ActionController should not be generic.
#[derive(Clone)]
struct ActionController {
    id: usize,
    thing: Arc<dyn Thing<Action>>,
}

impl ActionController {
    fn new(id: usize, thing: Box<dyn Thing<Action>>) -> Self {
        Self { id, thing: Arc::from(thing) }
    }

    fn invoke(&self, action: Action) {
        self.thing.action(action);
    }
}

// To work around the drop issue, I've implemented Clone for ActionController which
// performs a fetch_add(1) on clone and a fetch_sub(1) on drop. This provides
// sufficient information to call disconnected() -- but it just seems like there's
// got to be a better way.
impl Drop for ActionController {
    fn drop(&mut self) {
        // drop will be called for each clone of a Controller instance. When
        // the unique instance is dropped, disconnected() must be called.
        self.thing.disconnected();
    }
}

struct Controlled {}

impl Thing<Action> for Controlled {
    fn disconnected(&self) {
        println!("disconnected")
    }

    fn action(&self, action: Action) {
        println!("action: {:#?}", action)
    }
}

fn bad() {
    let controlled = Controlled {};
    let controlled = Box::new(controlled) as Box<dyn Thing<Action>>;
    let controller = ActionController::new(1, controlled);
    let clone = controller.clone();

    controller.invoke(Action::One);
    clone.invoke(Action::Two);
    drop(controller);
    clone.invoke(Action::Three);
}

fn main() {
    bad();
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn incorrect() {
        bad();
    }
}
Arc::try_unwrap is probably the intended way to do this - is it possible to restructure your code to avoid the move issues you were running into?
Why do you want to know? If you have some extra cleanup code that needs to be executed before the Mutex<Thing> is dropped, maybe you could use an Arc<MyLockedThing> instead, where MyLockedThing is a struct containing a Mutex<Thing> that impls Drop to do the cleanup?
It seems like you want to be notified when the data inside the Arc is to be dropped. If so, this can be done by implementing Drop on the type "inside" the Arc.
Define a newtype:
struct ThingAction(Box<dyn Thing<Action>>);
impl Thing<Action> for ThingAction {
fn disconnected(&self) {
self.0.disconnected()
}
fn action(&self, action: Action) {
self.0.action(action)
}
}
And implement Drop:
impl Drop for ThingAction {
fn drop(&mut self) {
self.disconnected()
}
}
Then use the newtype:
#[derive(Clone)]
struct ActionController {
id: usize,
thing: Arc<ThingAction>,
}
impl ActionController {
fn new(id: usize, thing: Box<dyn Thing<Action>>) -> Self {
Self { id, thing: Arc::new(ThingAction(thing)) }
}
I don't think there's any perfect way to do this without stdlib support (go check out Arc::drop).
Weak::strong_count or Weak::upgrade is less subject to races: if you downgrade your Arc and then drop it, and the weak reference's strong count is 0 (or upgrading it fails), you know the value is dead. There is still no guarantee that the current thread is the one that killed it, though; two threads might have dropped their Arcs concurrently before either had time to check the weak reference's strong count.
I think the only bulletproof way would be to get notified by a Drop stored inside the Arc, that you're guaranteed is only called once.
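To make the race concrete, here is a small self-contained sketch of that downgrade-then-drop check (drop_and_check is a made-up helper name):
use std::sync::{Arc, Weak};

// Drops the given Arc and reports whether the value is gone afterwards.
// Caveat from above: if two threads do this concurrently, both may observe
// `true`, so you still cannot tell which one dropped the final reference.
fn drop_and_check<T>(arc: Arc<T>) -> bool {
    let weak: Weak<T> = Arc::downgrade(&arc);
    drop(arc);
    weak.upgrade().is_none()
}

fn main() {
    let a = Arc::new(String::from("thing"));
    let b = Arc::clone(&a);
    assert!(!drop_and_check(a)); // `b` still keeps the value alive
    assert!(drop_and_check(b)); // last strong reference gone
}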

Sharing Mutable Data Between Threads in Rust

I know there are hundreds of questions just like this one, but I'm having trouble wrapping my head around how to do the thing I'm trying to do.
I want an http server that accepts and processes events. On receiving/processing an event, I want the EventManager to send an update to an ApplicationMonitor that is tracking how many events have been accepted/processed. The ApplicationMonitor would also (eventually) handle things like tracking the number of concurrent connections, but in this example I just want my EventManager to send an Inc('event_accepted') update to my ApplicationMonitor.
To be useful, I need the ApplicationMonitor to be able to return a snapshot of the stats when requested through a /stats route.
So I have an ApplicationMonitor which spawns a thread and listens on a channel for incoming Stat events. When it receives a Stat event it updates the stats HashMap. The stats hashmap must be mutable within both ApplicationMonitor as well as the spawned thread.
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::mpsc::Sender;
use std::thread;
use std::thread::JoinHandle;

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let mut stats: HashMap<&'static str, i32> = HashMap::new();
        let join_handle = thread::spawn(move || {
            // Iterating over the receiver blocks until the channel is closed.
            for stat in rx {
                match stat {
                    Stat::Inc(nm) => {
                        *stats.entry(nm).or_insert(0) += 1;
                    }
                    Stat::Dec(nm) => {
                        *stats.entry(nm).or_insert(0) -= 1;
                    }
                    Stat::Set(nm, val) => {
                        stats.insert(nm, val);
                    }
                }
            }
        });
        ApplicationMonitor { tx, join_handle }
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        // ERROR: `stats` was moved into the spawned thread,
        // so there is no `self.stats` to read here.
        self.stats.clone()
    }
}
Because rx cannot be cloned, I must move the references into the closure. When I do this, I am no longer able to access stats outside of the thread.
I thought maybe I needed a second channel so the thread could communicate its internals back out, but this doesn't work as I would need another thread to listen for that in a non-blocking way.
Is this where I'd use Arc?
How can I have stats live inside and out of the thread context?
Yes, this is a place where you'd wrap your stats in an Arc so that you can have multiple references to it from different threads. But just wrapping in an Arc will only give you a read-only view of the HashMap - if you need to be able to modify it, you'll also need to wrap it in something which guarantees that only one thing can modify it at a time. So you'll probably end up with either an Arc<Mutex<HashMap<&'static str, i32>>> or a Arc<RwLock<HashMap<&'static str, i32>>>.
Alternatively, if you're just changing the values, and not adding or removing values, you could potentially use an Arc<HashMap<&'static str, AtomicU32>>, which would allow you to read and modify different values in parallel without needing to take out a Map-wide lock, but Atomics can be a little more fiddly to understand and use correctly than locks.
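For concreteness, here is a rough sketch of the Arc<Mutex<...>> suggestion, reusing the type and field names from the question (the stats field added to ApplicationMonitor is the new piece):
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

pub enum Stat {
    Inc(&'static str),
    Dec(&'static str),
    Set(&'static str, i32),
}

pub struct ApplicationMonitor {
    pub tx: Sender<Stat>,
    pub join_handle: JoinHandle<()>,
    stats: Arc<Mutex<HashMap<&'static str, i32>>>,
}

impl ApplicationMonitor {
    pub fn new() -> ApplicationMonitor {
        let (tx, rx) = mpsc::channel::<Stat>();
        let stats = Arc::new(Mutex::new(HashMap::new()));
        // One clone goes into the thread, the other stays in the struct.
        let thread_stats = Arc::clone(&stats);
        let join_handle = thread::spawn(move || {
            for stat in rx {
                let mut map = thread_stats.lock().unwrap();
                match stat {
                    Stat::Inc(nm) => *map.entry(nm).or_insert(0) += 1,
                    Stat::Dec(nm) => *map.entry(nm).or_insert(0) -= 1,
                    Stat::Set(nm, val) => {
                        map.insert(nm, val);
                    }
                }
            }
        });
        ApplicationMonitor { tx, join_handle, stats }
    }

    pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
        self.stats.lock().unwrap().clone()
    }
}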

How to safely wrap C pointers in rust structs

I am building safe bindings for a C library in Rust and I started facing a weird issue.
I created a struct to own the unsafe pointer to the objects returned by the library and free them safely.
This is what I have:
pub struct VipsImage {
    pub(crate) ctx: *mut bindings::VipsImage,
}

impl Drop for VipsImage {
    fn drop(&mut self) {
        unsafe {
            if !self.ctx.is_null() {
                bindings::g_object_unref(self.ctx as *mut c_void);
            }
        }
    }
}
This works fine as long as I don't share it between async calls. If I return one of these objects from an async function and use it afterwards, it gets corrupted. If I use and free them in a single operation, they work as expected.
How would I implement Send and Sync for a struct like that so I can safely share it between threads?
If someone wants to check the full library code, here's the link
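For reference, the opt-in being asked about is just two unsafe impls. This is only a sketch of the mechanism; whether it is actually sound depends entirely on the C library's own thread-safety guarantees, which Rust cannot check for you:
// SAFETY (assumed, not verified here): the underlying C object must be safe
// to move to another thread (Send) and to access from several threads at
// once through &VipsImage (Sync). If the library does not guarantee this,
// these impls are unsound.
unsafe impl Send for VipsImage {}
unsafe impl Sync for VipsImage {}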

Circular reference between two structs with high performance

I have two structs Node and Communicator. Node contains "business-logic" while Communicator contains methods for sending and receiving UDP messages. Node needs to call methods on Communicator when it wants to send messages and Communicator needs to call methods on Node when it receives a UDP message. If they are the same struct, there is no problem at all. But I want to separate them because they clearly have different responsibilities. Having it all in one struct would become unmanageable. My code looks as follows:
fn main() {
    use std::sync::{Arc, Mutex, Condvar, Weak};
    use std::thread;

    pub struct Node {
        communicator: Option<Arc<Communicator>>,
    }

    impl Node {
        pub fn new() -> Node {
            Node { communicator: None }
        }

        pub fn set_communicator(&mut self, communicator: Arc<Communicator>) {
            self.communicator = Some(communicator);
        }
    }

    pub struct Communicator {
        node: Option<Weak<Node>>,
    }

    impl Communicator {
        pub fn new() -> Communicator {
            Communicator { node: None }
        }

        pub fn set_node(&mut self, node: Weak<Node>) {
            self.node = Some(node);
        }
    }

    let mut my_node = Arc::new(Node::new());
    let mut my_communicator = Arc::new(Communicator::new());

    Arc::get_mut(&mut my_node).unwrap().set_communicator(Arc::clone(&my_communicator));
    //Arc::get_mut(&mut my_communicator).unwrap().set_node(Arc::downgrade(&my_node));
}
My code crashes, as I predicted, if I uncomment the last line so that both of the last two calls run (Arc::get_mut returns None once the Arc has been cloned, so the unwrap panics). But something like that is what I want to achieve.
I see a couple of options:
Use Mutex or RwLock to gain interior mutability (see the sketch after this list). But that gives a performance penalty.
Use Cell or RefCell. But they are not thread-safe.
Use AtomicCell from crossbeam. Looks like the best option so far. But what about performance?
Use unsafe. But where? And how to ensure the code remains memory safe in spite of unsafe?
In theory I could split Communicator into a Sender and a Receiver, so that there would be no cyclical references. But for my program, I know I will have a similar situation in the future where this will not be possible.
It seems that, since I just split one struct into two for code-structuring reasons and gained no new functionality, there should be a way to do this without paying the performance penalty a Mutex would impose.
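For what it's worth, here is a minimal sketch of the Mutex/RwLock option from the list above, where the back-reference sits behind an RwLock that is written exactly once during setup (the field layout is illustrative, not taken from the question):
use std::sync::{Arc, RwLock, Weak};

struct Node {
    communicator: Arc<Communicator>,
}

struct Communicator {
    // Written once during wiring, only read afterwards.
    node: RwLock<Weak<Node>>,
}

fn main() {
    let communicator = Arc::new(Communicator {
        node: RwLock::new(Weak::new()),
    });
    let node = Arc::new(Node {
        communicator: Arc::clone(&communicator),
    });

    // Fill in the back-reference after both halves exist.
    *communicator.node.write().unwrap() = Arc::downgrade(&node);

    // Communicator can reach Node when a UDP message arrives...
    if let Some(n) = communicator.node.read().unwrap().upgrade() {
        // ...and Node can reach Communicator directly to send.
        let _ = Arc::strong_count(&n.communicator);
    }
}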
