Forced to use of Mutex when it's not required

Forced to use of Mutex when it's not required - multithreading

I am writing a game and have a player list defined as follows:
pub struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
This struct has methods for adding, removing, getting players, and getting the player count.
The NetworkServer and Server shares this list as follows:
NetworkServer {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
Server {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
This is inside an Arc<Mutex> because the NetworkServer accesses the list in a different thread (network loop).
When a player joins, a thread is spawned for them and they are added to the player_list.
Although the only operation I'm doing is adding to player_list, I'm forced to use Arc<Mutex<Player>> instead of the more natural Rc<RefCell<Player>> in the HashMaps because Mutex<PlayerList> requires it. I am not accessing players from the network thread (or any other thread) so it makes no sense to put them under a Mutex. Only the HashMaps need to be locked, which I am doing using Mutex<PlayerList>. But Rust is pedantic and wants to protect against all misuses.
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant. Is there a workaround instead of using unsafe or something?
Here's an example:
use std::cell::Cell;
use std::collections::HashMap;
use std::ffi::CString;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);
struct Player {
pub name: String,
pub uuid: Uuid,
}
struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
impl PlayerList {
fn add_player(&mut self, p: Player) {
let name = p.name.clone();
let uuid = p.uuid;
let p = Arc::new(Mutex::new(p));
self.by_name.insert(name, Arc::clone(&p));
self.by_uuid.insert(uuid, p);
}
}
struct NetworkServer {
player_list: Arc<Mutex<PlayerList>>,
}
impl NetworkServer {
fn start(&mut self) {
let player_list = Arc::clone(&self.player_list);
thread::spawn(move || {
loop {
// fake network loop
// listen for incoming connections, accept player and add them to player_list.
player_list.lock().unwrap().add_player(Player {
name: "blahblah".into(),
uuid: Uuid([0; 16]),
});
}
});
}
}
struct Server {
player_list: Arc<Mutex<PlayerList>>,
network_server: NetworkServer,
}
impl Server {
fn start(&mut self) {
self.network_server.start();
// main game loop
loop {
// I am only accessing players in this loop in this thread. (main thread)
// so Mutex for individual player is not needed although rust requires it.
}
}
}
fn main() {
let player_list = Arc::new(Mutex::new(PlayerList {
by_name: HashMap::new(),
by_uuid: HashMap::new(),
}));
let network_server = NetworkServer {
player_list: Arc::clone(&player_list),
};
let mut server = Server {
player_list,
network_server,
};
server.start();
}

As I'm only accessing Players in the main thread, locking everytime to do that is both annoying and less performant.
You mean, as right now you are only accessing Players in the main thread, but at any time later you may accidentally introduce an access to them in another thread?
From the point of view of the language, if you can get a reference to a value, you may use the value. Therefore, if multiple threads have a reference to a value, this value should be safe to use from multiple threads. There is no way to enforce, at compile-time, that a particular value, although accessible, is actually never used.
This raises the question, however:
If the value is never used by a given thread, why does this thread have access to it in the first place?
It seems to me that you have a design issue. If you can manage to redesign your program so that only the main thread has access to the PlayerList, then you will immediately be able to use Rc<RefCell<...>>.
For example, you could instead have the network thread send a message to the main thread announcing that a new player connected.
At the moment, you are "Communicating by Sharing", and you could shift toward "Sharing by Communicating" instead. The former usually has synchronization primitives (such as mutexes, atomics, ...) all over the place, and may face contention/dead-lock issues, while the latter usually has communication queues (channels) and requires an "asynchronous" style of programming.

Send is a marker trait that governs which objects can have ownership transferred across thread boundaries. It is automatically implemented for any type that is entirely composed of Send types. It is also an unsafe trait because manually implementing this trait can cause the compiler to not enforce the concurrency safety that we love about Rust.
The problem is that Rc<RefCell<Player>> isn't Send and thus your PlayerList isn't Send and thus can't be sent to another thread, even when wrapped in an Arc<Mutex<>>. The unsafe workaround would be to unsafe impl Send for your PlayerList struct.
Putting this code into your playground example allows it to compile the same way as the original with Arc<Mutex<Player>>
struct PlayerList {
by_name: HashMap<String, Rc<RefCell<Player>>>,
by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}
unsafe impl Send for PlayerList {}
impl PlayerList {
fn add_player(&mut self, p: Player) {
let name = p.name.clone();
let uuid = p.uuid;
let p = Rc::new(RefCell::new(p));
self.by_name.insert(name, Rc::clone(&p));
self.by_uuid.insert(uuid, p);
}
}
Playground
The Nomicon is sadly a little sparse at explaining what rules have have to be enforced by the programmer when unsafely implementing Send for a type containing Rcs, but accessing in only one thread seems safe enough...
For completeness, here's TRPL's bit on Send and Sync

I suggest solving this threading problem using a multi-sender-single-receiver channel. The network threads get a Sender<Player> and no direct access to the player list.
The Receiver<Player> gets stored inside the PlayerList. The only thread accessing the PlayerList is the main thread, so you can remove the Mutex around it. Instead in the place where the main-thread used to lock the mutexit dequeue all pending players from the Receiver<Player>, wraps them in an Rc<RefCell<>> and adds them to the appropriate collections.
Though looking at the bigger designing, I wouldn't use a per-player thread in the first place. Instead I'd use some kind single threaded event-loop based design. (I didn't look into which Rust libraries are good in that area, but tokio seems popular)

Related

How can I store variables in traits so that it can be used in other methods on trait?

I have several objects in my code that have common functionality. These objects act essentially like services where they have a life-time (controlled by start/stop) and perform work until their life-time ends. I am in the process of trying to refactor my code to reduce duplication but getting stuck.
At a high level, my objects all have the code below implemented in them.
impl SomeObject {
fn start(&self) {
// starts a new thread.
// stores the 'JoinHandle' so thread can be joined later.
}
fn do_work(&self) {
// perform work in the context of the new thread.
}
fn stop(&self) {
// interrupts the work that this object is doing.
// stops the thread.
}
}
Essentially, these objects act like "services" so in order to refactor, my first thought was that I should create a trait called "service" as shown below.
trait Service {
fn start(&self) {}
fn do_work(&self);
fn stop(&self) {}
}
Then, I can just update my objects to each implement the "Service" trait. The issue that I am having though, is that since traits are not allowed to have fields/properties, I am not sure how I can go about saving the 'JoinHandle' in the trait so that I can use it in the other methods.
Is there an idiomatic way to handle this problem in Rust?
tldr; how can I save variables in a trait so that they can be re-used in different trait methods?
Edit:
Here is the solution I settled on. Any feedback is appreciated.
extern crate log;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::JoinHandle;
pub struct Context {
name: String,
thread_join_handle: JoinHandle<()>,
is_thread_running: Arc<AtomicBool>,
}
pub trait Service {
fn start(&self, name: String) -> Context {
log::trace!("starting service, name=[{}]", name);
let is_thread_running = Arc::new(AtomicBool::new(true));
let cloned_is_thread_running = is_thread_running.clone();
let thread_join_handle = std::thread::spawn(move || loop {
while cloned_is_thread_running.load(Ordering::SeqCst) {
Self::do_work();
}
});
log::trace!("started service, name=[{}]", name);
return Context {
name: name,
thread_join_handle: thread_join_handle,
is_thread_running: is_thread_running,
};
}
fn stop(context: Context) {
log::trace!("stopping service, name=[{}]", context.name);
context.is_thread_running.store(false, Ordering::SeqCst);
context
.thread_join_handle
.join()
.expect("joining service thread");
log::trace!("stopped service, name=[{}]", context.name);
}
fn do_work();
}

I think you just need to save your JoinHandle in the struct's state itself, then the other methods can access it as well because they all get all the struct's data passed to them already.
struct SomeObject {
join_handle: JoinHandle;
}
impl Service for SomeObject {
fn start(&mut self) {
// starts a new thread.
// stores the 'JoinHandle' so thread can be joined later.
self.join_handle = what_ever_you_wanted_to_set //Store it here like this
}
fn do_work(&self) {
// perform work in the context of the new thread.
}
fn stop(&self) {
// interrupts the work that this object is doing.
// stops the thread.
}
}
Hopefully that works for you.

Depending on how you're using the trait, you could restructure your code and make it so start returns a JoinHandle and the other functions take the join handle as input
trait Service {
fn start(&self) -> JoinHandle;
fn do_work(&self, handle: &mut JoinHandle);
fn stop(&self, handle: JoinHandle);
}
(maybe with different function arguments depending on what you need). this way you could probably cut down on duplicate code by putting all the code that handles the handles (haha) outside of the structs themselves and makes it more generic. If you want the structs the use the JoinHandle outside of this trait, I'd say it's best to just do what Equinox suggested and just make it a field.

How to create a complex concurrent datastructure in rust

I need a struct containing several fields that should be usable in a concurrent context.
struct Foo {
number: u32,
collection: Vec<u32>,
}
In C I would add a Mutex to the struct and (un)lock that, but doing this in rust, using a Mutex<()> is not possible, as the struct is located somewhere inside an Arc. This means I need the Mutex around the fields if I want to mutate them.
However this leads to problems if I want to extract functionality into private methods, as Mutex is not reentrant.
[Edit]
If the fields in question are needed in both the main method and the helper method, locking multiple times is not the solution, as I would like the whole block to be treated as one unit, otherwise it would be possible to change something inbetween from another thread.
[/Edit]
In the simeple case like above, the solution would probably be to not make it concurrent at all and let the caller deal with it, i.e. use Mutex<Foo> and deal with the locking every time it is used.
However, in some situations, a CondVar is needed, to wait on the collection being filled or emptied, in which case I need a MutexGuard to wait on.
I guess I could move the CondVar to the outside as well, but then I would need to be concerned with the inner workings of my struct on the outside, which I don't like at all.
I don't know if I'm going completely into the wrong direction here, but I couldn't find anything on this by searching the web. Everything only talks about locking simple datatypes.
The approaches mentioned above:
struct Foo {
lock: Mutex<()>,
number: u32,
collection: Vec<u32>,
}
struct Foo {
number: Mutex<u32>,
collection: Mutex<Vec<u32>>,
}
impl Foo {
pub fn add_if_bigger(&self, v: u32) {
let collection = self.collection.lock().unwrap();
if self.is_bigger(v) {
collection.push(v);
}
}
fn is_bigger(&self) -> bool {
let collection = self.collection.lock().unwrap();
// already locked by this thread --^
// ...
}
}

Sharing Mutable Data Between Threads in Rust

I know there are hundreds of questions just like this one, but i'm having trouble wrapping my head around how to do the thing I'm trying to do.
I want an http server that accepts and processes events. On receiving/processing an event, i want the EventManager to send an update to an ApplicationMonitor that is tracking how many events have been accepted/processed. The ApplicationMonitor would also (eventually) handle things like tracking number of concurrent connections, but in this example I just want my EventManager to send an Inc('event_accepted') update to my ApplicationMonitor.
To be useful, I need the ApplicationMonitor to be able to return a snapshot of the stats when the requested through a /stats route.
So I have an ApplicationMonitor which spawns a thread and listens on a channel for incoming Stat events. When it receives a Stat event it updates the stats HashMap. The stats hashmap must be mutable within both ApplicationMonitor as well as the spawned thread.
use std::sync::mpsc;
use std::sync::mpsc::Sender;
use std::thread;
use std::thread::JoinHandle;
use std::collections::HashMap;
pub enum Stat {
Inc(&'static str),
Dec(&'static str),
Set(&'static str, i32)
}
pub struct ApplicationMonitor {
pub tx: Sender<Stat>,
pub join_handle: JoinHandle<()>
}
impl ApplicationMonitor {
pub fn new() -> ApplicationMonitor {
let (tx, rx) = mpsc::channel::<Stat>();
let mut stats: HashMap<&'static str, i32> = HashMap::new();
let join_handle = thread::spawn(move || {
for stat in rx.recv() {
match stat {
Stat::Inc(nm) => {
let current_val = stats.entry(nm).or_insert(0);
stats.insert(nm, *current_val + 1);
},
Stat::Dec(nm) => {
let current_val = stats.entry(nm).or_insert(0);
stats.insert(nm, *current_val - 1);
},
Stat::Set(nm, val) => {
stats.insert(nm, val);
}
}
}
});
let am = ApplicationMonitor {
tx,
join_handle
};
am
}
pub fn get_snapshot(&self) -> HashMap<&'static str, i32> {
self.stats.clone()
}
}
Because rx cannot be cloned, I must move the references into the closure. When I do this, I am no longer able to access stats outside of the thread.
I thought maybe I needed a second channel so the thread could communicate it's internals back out, but this doesn't work as i would need another thread to listen for that in a non-blocking way.
Is this where I'd use Arc?
How can I have stats live inside and out of the thread context?

Yes, this is a place where you'd wrap your stats in an Arc so that you can have multiple references to it from different threads. But just wrapping in an Arc will only give you a read-only view of the HashMap - if you need to be able to modify it, you'll also need to wrap it in something which guarantees that only one thing can modify it at a time. So you'll probably end up with either an Arc<Mutex<HashMap<&'static str, i32>>> or a Arc<RwLock<HashMap<&'static str, i32>>>.
Alternatively, if you're just changing the values, and not adding or removing values, you could potentially use an Arc<HashMap<&static str, AtomicU32>>, which would allow you to read and modify different values in parallel without needing to take out a Map-wide lock, but Atomics can be a little more fiddly to understand and use correctly than locks.

Is it safe to `Send` struct containing `Rc` if strong_count is 1 and weak_count is 0?

I have a struct that is not Send because it contains Rc. Lets say that Arc has too big overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is (hopefully safe) abstraction that I have in mind:
mod my_struct {
use std::rc::Rc;
#[derive(Debug)]
pub struct MyStruct {
reference_counted: Rc<String>,
// more fields...
}
impl MyStruct {
pub fn new() -> Self {
MyStruct {
reference_counted: Rc::new("test".to_string())
}
}
pub fn pack_for_sending(self) -> Result<Sendable, Self> {
if Rc::strong_count(&self.reference_counted) == 1 &&
Rc::weak_count(&self.reference_counted) == 0
{
Ok(Sendable(self))
} else {
Err(self)
}
}
// There are more methods, some may clone the `Rc`!
}
/// `Send`able wrapper for `MyStruct` that does not allow you to access it,
/// only unpack it.
pub struct Sendable(MyStruct);
// Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
// only created when the `Rc` has strong count 1 and weak count 0.
unsafe impl Send for Sendable {}
impl Sendable {
/// Retrieve the inner `MyStruct`, making it not-sendable again.
pub fn unpack(self) -> MyStruct {
self.0
}
}
}
use crate::my_struct::MyStruct;
fn main() {
let handle = std::thread::spawn(|| {
let my_struct = MyStruct::new();
dbg!(&my_struct);
// Do something with `my_struct`, but at the end the inner `Rc` should
// not be shared with anybody.
my_struct.pack_for_sending().expect("Some Rc was still shared!")
});
let my_struct = handle.join().unwrap().unpack();
dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned only by a single onwer and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable which does not allow access to the contained value.
But in some crazy world Rc could for example internally use thread local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.

No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_some() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<str>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
if !is_aliased(&mut t) {
Ok(t)
} else {
...
}
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all assumptions that are counted on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage if still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write a RFC proposing a Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...

The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.

Rust syncronization strategy for MUD server

So if you had a MUD sever that handled each tcp connection in a separate process,
for stream in acceptor.incoming() {
match stream {
Err(e) => { /* connection failed */ }
Ok(stream) => spawn(proc() {
handle_client(stream)
})
}
}
What would be the strategy for sharing mutable world data for that server? I can imagine n connections responding to commands from the user. Each command needing to visit and possibly modify the world.
pub struct Server<'a> {
world: World<'a>
}
pub struct World<'a> {
pub chat_rooms: HashMap<&'a str, ChatRoom<'a>>
}
impl<'a> World<'a> {
pub fn new() -> World<'a> {
let mut rooms = HashMap::new();
rooms.insert("General", ChatRoom::new("General"));
rooms.insert("Help", ChatRoom::new("Help"));
World{chat_rooms: rooms}
}
}
Would Arc be the way to go?
let shared_server = Arc::new(server);
let server = shared_server.clone();
spawn(proc() {
// Work with server
});
What about scaling to 100 or 1000 users? I'm just looking for a nudge in the right direction.

An Arc will let you access a value from multiple tasks, but it will not allow you to borrow the value mutably. The compiler cannot verify statically that only one task would borrow the value mutably at a time, and mutating a value concurrently on different tasks leads to data races.
Rust's standard library provides some types that allow mutating a shared object safely. Here are two of them:
Mutex: This is a simple mutual exclusion lock. A Mutex wraps the protected value, so the only way to access the value is by locking the mutex. Only one task at a time may access the wrapped value.
RWLock: This is a reader-writer lock. This kind of lock lets multiple tasks read a value at the same time, but writers must have exclusive access. This is basically the same rules that the borrow checker and RefCell have (except the lock waits until the borrow is released, instead of failing compilation or panicking).
You'll need to wrap a Mutex or a RWLock in an Arc to make it accessible to multiple tasks.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Forced to use of Mutex when it's not required - multithreading

Related

How can I store variables in traits so that it can be used in other methods on trait?

How to create a complex concurrent datastructure in rust

Sharing Mutable Data Between Threads in Rust

Is it safe to `Send` struct containing `Rc` if strong_count is 1 and weak_count is 0?

Rust syncronization strategy for MUD server

Categories

Resources