Rust synchronization strategy for MUD server


So if you had a MUD server that handled each TCP connection in a separate task,
for stream in acceptor.incoming() {
    match stream {
        Err(e) => { /* connection failed */ }
        Ok(stream) => spawn(proc() {
            handle_client(stream)
        })
    }
}
What would be the strategy for sharing mutable world data for that server? I can imagine n connections responding to commands from their users, with each command needing to visit and possibly modify the world.
pub struct Server<'a> {
    world: World<'a>
}
pub struct World<'a> {
    pub chat_rooms: HashMap<&'a str, ChatRoom<'a>>
}
impl<'a> World<'a> {
    pub fn new() -> World<'a> {
        let mut rooms = HashMap::new();
        rooms.insert("General", ChatRoom::new("General"));
        rooms.insert("Help", ChatRoom::new("Help"));
        World{chat_rooms: rooms}
    }
}
Would Arc be the way to go?
let shared_server = Arc::new(server);
let server = shared_server.clone();
spawn(proc() {
    // Work with server
});
What about scaling to 100 or 1000 users? I'm just looking for a nudge in the right direction.

An Arc will let you access a value from multiple tasks, but it will not allow you to borrow the value mutably. The compiler cannot verify statically that only one task would borrow the value mutably at a time, and mutating a value concurrently on different tasks leads to data races.
Rust's standard library provides some types that allow mutating a shared object safely. Here are two of them:
Mutex: This is a simple mutual exclusion lock. A Mutex wraps the protected value, so the only way to access the value is by locking the mutex. Only one task at a time may access the wrapped value.
RwLock (named RWLock in early, pre-1.0 Rust): This is a reader-writer lock. This kind of lock lets multiple tasks read a value at the same time, but writers must have exclusive access. These are basically the same rules that the borrow checker and RefCell enforce (except that the lock waits until the borrow is released, instead of failing compilation or panicking).
You'll need to wrap a Mutex or an RwLock in an Arc to make it accessible to multiple tasks.
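As a rough sketch of that advice in current Rust (threads via std::thread::spawn rather than the old proc tasks), the world can be wrapped in an Arc<RwLock<...>> and cloned into each connection handler. The simplified World, the handle_client signature and run_clients are assumptions made up for illustration; they are not the asker's real types, and no actual TCP handling is shown.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical, simplified world: room name -> messages posted in that room.
struct World {
    chat_rooms: HashMap<String, Vec<String>>,
}

impl World {
    fn new() -> World {
        let mut chat_rooms = HashMap::new();
        chat_rooms.insert("General".to_string(), Vec::new());
        chat_rooms.insert("Help".to_string(), Vec::new());
        World { chat_rooms }
    }
}

// Stand-in for handling one connection: post a message, then read the room back.
fn handle_client(world: &RwLock<World>, user: usize) -> usize {
    {
        // Writers get exclusive access; the guard unlocks when it goes out of scope.
        let mut w = world.write().unwrap();
        if let Some(room) = w.chat_rooms.get_mut("General") {
            room.push(format!("hello from user {}", user));
        }
    }
    // Any number of readers may hold the lock at the same time.
    let w = world.read().unwrap();
    w.chat_rooms["General"].len()
}

// Spawns `n` threads sharing one world and returns the final message count.
fn run_clients(n: usize) -> usize {
    let world = Arc::new(RwLock::new(World::new()));
    let handles: Vec<_> = (0..n)
        .map(|user| {
            let world = Arc::clone(&world);
            thread::spawn(move || handle_client(&world, user))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let w = world.read().unwrap();
    w.chat_rooms["General"].len()
}

fn main() {
    println!("messages in General: {}", run_clients(8));
}
```

Whether a single lock around the whole world scales to 1000 users depends on how long each command holds the write lock. If that becomes a bottleneck, finer-grained locks (for example one per room) are one possible next step.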

Related

Can the borrow checker know when an Arc is "released"? Can a 'static lifetime be granted temporarily?

I'm trying to speed up a computationally-heavy Rust function by making it concurrent using only the built-in thread support. In particular, I want to alternate between quick single-threaded phases (where the main thread has mutable access to a big structure) and concurrent phases (where many worker threads run with read-only access to the structure). I don't want to make extra copies of the structure or force it to be 'static. Where I'm having trouble is convincing the borrow checker that the worker threads have finished.
Ignoring the borrow checker, an Arc reference seems like it does all that is needed. The reference count in the Arc increases with the .clone() for each worker, then decreases as the workers conclude and I join all the worker threads. If (and only if) the Arc reference count is 1, it should be safe for the main thread to resume. The borrow checker, however, doesn't seem to know about Arc reference counts, and insists that my structure needs to be 'static.
Here's some sample code which works fine if I don't use threads, but won't compile if I switch the comments to enable the multi-threaded case.
struct BigStruct {
    data: Vec<usize>
    // Lots more
}
pub fn main() {
    let ref_bigstruct = &mut BigStruct { data: Vec::new() };
    for i in 0..3 {
        ref_bigstruct.data.push(i); // Phase where main thread has write access
        run_threads(ref_bigstruct); // Phase where worker threads have read-only access
    }
}
fn run_threads(ref_bigstruct: &BigStruct) {
    let arc_bigstruct = Arc::new(ref_bigstruct);
    {
        let arc_clone_for_worker = arc_bigstruct.clone();
        // SINGLE-THREADED WORKS:
        worker_thread(arc_clone_for_worker);
        // MULTI-THREADED DOES NOT COMPILE:
        // let handle = thread::spawn(move || { worker_thread(arc_clone_for_worker); } );
        // handle.join();
    }
    assert!(Arc::strong_count(&arc_bigstruct) == 1);
    println!("??? How can I tell the borrow checker that all borrows of ref_bigstruct are done?")
}
fn worker_thread(my_struct: Arc<&BigStruct>) {
    println!(" worker says len()={}", my_struct.data.len());
}
I'm still learning about Rust lifetimes, but what I think (fear?) I need is an operation that will take an ordinary (not 'static) reference to my structure and give me an Arc that I can clone into immutable references with a 'static lifetime for use by the workers. Once all the worker Arc references are dropped, the borrow checker needs to allow my thread-spawning function to return. For safety, I assume this would panic if the reference count is >1. While this seems like it would generally conform with Rust's safety requirements, I don't see how to do it.

The underlying problem is not that the borrow checker fails to follow Arc, and the solution is not to use Arc. The problem is that the borrow checker does not understand why a spawned thread's data must be 'static: the thread may outlive the spawning thread. It therefore cannot tell that immediately calling .join() on the thread makes the borrow fine.
And the solution is to use scoped threads, that is, threads that allow you to use non-'static data because they are always joined before the scope ends, and thus the spawned thread cannot outlive the spawning thread. Problem is, there are no scoped threads in the standard library. Well, there are, however they're unstable.
So if you insist on not using crates, for some reason, you have no choice but to use unsafe code (don't, really). But if you can use external crates, then you can use the well-known crossbeam crate with its crossbeam::scope function, at least until std's scoped threads are stabilized (they since have been, as std::thread::scope, in Rust 1.63).
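For illustration, here is a minimal sketch of the scoped-thread approach using std::thread::scope from the standard library (Rust 1.63 or later) instead of crossbeam. The worker body and the run_phases helper are assumptions invented for the example; only BigStruct and the alternating write/read phases come from the question.

```rust
use std::thread;

struct BigStruct {
    data: Vec<usize>,
}

// Read-only phase: workers borrow `big` directly, with no Arc and no 'static bound.
fn run_threads(big: &BigStruct, workers: usize) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| s.spawn(|| big.data.len()))
            .collect();
        // Every scoped thread is joined before `scope` returns,
        // so the shared borrow of `big` ends with this call.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

// Alternates a single-threaded write phase with a concurrent read phase.
fn run_phases(rounds: usize, workers: usize) -> Vec<Vec<usize>> {
    let mut big = BigStruct { data: Vec::new() };
    let mut seen = Vec::new();
    for i in 0..rounds {
        big.data.push(i); // main thread has mutable access
        seen.push(run_threads(&big, workers)); // workers have read-only access
    }
    seen
}

fn main() {
    for (round, lens) in run_phases(3, 2).iter().enumerate() {
        println!("round {}: workers saw len()={:?}", round, lens);
    }
}
```

Because the borrow ends when the scope returns, the main thread can mutate the structure again on the next iteration without copies, locks, or reference counting.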

In Rust, the T inside an Arc<T> is immutable by definition. So to let threads access data that is going to change through an Arc, you also need to wrap that data in a type with interior mutability.
Rust provides a type that is especially suited to either a single writer or multiple parallel readers, called RwLock.
So for your simple example, this would probably look something like this:
use std::{sync::{Arc, RwLock}, thread};
struct BigStruct {
    data: Vec<usize>
    // Lots more
}
pub fn main() {
    let arc_bigstruct = Arc::new(RwLock::new(BigStruct { data: Vec::new() }));
    for i in 0..3 {
        arc_bigstruct.write().unwrap().data.push(i); // Phase where main thread has write access
        run_threads(&arc_bigstruct); // Phase where worker threads have read-only access
    }
}
fn run_threads(ref_bigstruct: &Arc<RwLock<BigStruct>>) {
    {
        let arc_clone_for_worker = ref_bigstruct.clone();
        // MULTI-THREADED
        let handle = thread::spawn(move || { worker_thread(&arc_clone_for_worker); } );
        handle.join().unwrap();
    }
    assert!(Arc::strong_count(ref_bigstruct) == 1);
}
fn worker_thread(my_struct: &Arc<RwLock<BigStruct>>) {
    println!(" worker says len()={}", my_struct.read().unwrap().data.len());
}
Which outputs
worker says len()=1
worker says len()=2
worker says len()=3
As for your question, the borrow checker does not know when an Arc is released, as far as I know. The references are counted at runtime.
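Since the count is only known at run time, the "safe to resume once the count is 1" idea from the question can be expressed as a run-time check instead. One possible sketch, assuming it is acceptable to move the structure into the Arc for the concurrent phase and back out afterwards, uses Arc::try_unwrap, which succeeds only when no other strong reference remains. The run_phases helper is an assumption added for the example.

```rust
use std::sync::Arc;
use std::thread;

struct BigStruct {
    data: Vec<usize>,
}

// Moves `big` into an Arc for the read-only phase and hands it back afterwards.
fn run_threads(big: BigStruct, workers: usize) -> BigStruct {
    let shared = Arc::new(big);
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let local = Arc::clone(&shared);
            thread::spawn(move || println!(" worker says len()={}", local.data.len()))
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // each worker drops its clone before it finishes
    }
    // Checked at run time: succeeds only if this is the last reference.
    match Arc::try_unwrap(shared) {
        Ok(big) => big,
        Err(_) => panic!("a worker still holds a reference"),
    }
}

// Alternates a mutable single-threaded phase with a shared read-only phase.
fn run_phases(rounds: usize, workers: usize) -> Vec<usize> {
    let mut big = BigStruct { data: Vec::new() };
    for i in 0..rounds {
        big.data.push(i); // plain mutable access, no lock needed
        big = run_threads(big, workers);
    }
    big.data
}

fn main() {
    println!("final data: {:?}", run_phases(3, 2));
}
```

This avoids the RwLock entirely, at the cost of passing the structure by value between phases. Moving it only transfers ownership of the struct itself; the heap buffer behind the Vec is not copied.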

Accessing disjoint entries in global HashMap for lifetime of thread in Rust

My current project requires recording some information for various events that happen during the execution of a thread. These events are saved in a global struct indexed by the thread ID:
RECORDER1: HashMap<ThreadId, Vec<Entry>> = HashMap::new();
Every thread appends new Entry values to its own vector. Therefore, threads access "disjoint" vectors. Rust requires synchronization primitives to make the above work, of course. So the real implementation looks like:
struct Entry {
// ... not important.
}
#[derive(Clone, Eq, PartialEq, Hash)]
struct ThreadId;
// lazy_static necessary to initialize this data structure.
lazy_static! {
    /// Global data structure. Threads access disjoint entries based on their unique thread id.
    /// "Outer" mutex necessary as lazy_static requires sync (so cannot use RefCell).
    static ref RECORDER2: Mutex<HashMap<ThreadId, Vec<Entry>>> = Mutex::new(HashMap::new());
}
This works, but all threads contend on the same global lock. It would be nice if a thread could "borrow" its respective vector for the lifetime of the thread so it could write all the entries it needs without needing to lock every time (I understand the outer lock is necessary for ensuring threads don't insert into the HashMap at the same time).
We can do this by adding an Arc and some more interior mutability via a Mutex for the values in the HashMap:
lazy_static! {
    static ref RECORDER: Mutex<HashMap<ThreadId, Arc<Mutex<Vec<Entry>>>>> = Mutex::new(HashMap::new());
}
Now we can "check out" our entry when a thread is spawned:
fn local_borrow() {
    std::thread::spawn(|| {
        let mut recorder = RECORDER.lock().expect("Unable to acquire outer mutex lock.");
        let my_thread_id: ThreadId = ThreadId {}; // Get thread id...
        // Insert entry in hashmap for our thread.
        // Omit logic to check if key-value pair already existed (it shouldn't).
        recorder.insert(my_thread_id.clone(), Arc::new(Mutex::new(Vec::new())));
        // Get "reference" to vector
        let local_entries: Arc<Mutex<Vec<Entry>>> = recorder
            .get(&my_thread_id)
            .unwrap() // We just inserted this entry, so unwrap.
            .clone(); // Clone on the Arc to acquire a "copy".
        // Lock once, use multiple times.
        let mut local_entries: MutexGuard<_> = local_entries.lock().unwrap();
        local_entries.push(Entry {});
        local_entries.push(Entry {});
    });
}
This works and is what I want. However, due to API constraints I have to access the MutexGuard from widely different places across the code without the ability to pass the MutexGuard as an argument to functions. So instead I use a thread local variable:
thread_local! {
    /// This variable is initialized lazily. Due to API constraints, we use this thread_local! to
    /// "pass" LOCAL_ENTRIES around.
    static LOCAL_ENTRIES: Arc<Mutex<Vec<Entry>>> = {
        let mut recorder = RECORDER.lock().expect("Unable to acquire outer mutex lock.");
        let my_thread_id: ThreadId = ThreadId {}; // Get thread id...
        // Omit logic to check if key-value pair already existed (it shouldn't).
        recorder.insert(my_thread_id.clone(), Arc::new(Mutex::new(Vec::new())));
        // Get "reference" to vector
        recorder
            .get(&my_thread_id)
            .unwrap() // We just inserted this entry, so unwrap.
            .clone() // Clone on the Arc to acquire a "copy".
    }
}
I cannot make LOCAL_ENTRIES: MutexGuard<_> since thread_local! requires a 'static lifetime. So currently I have to .lock() every time I want to access the thread-local variable:
fn main() {
    std::thread::spawn(|| {
        // Record important message.
        LOCAL_ENTRIES.with(|entries| {
            // We have to lock every time we want to write to LOCAL_ENTRIES. It would be nice
            // to lock once and hold on to the MutexGuard for the lifetime of the thread, but
            // this is not possible due to the lifetime on the MutexGuard.
            let mut entries = entries.lock().expect("Unable to acquire lock");
            entries.push(Entry {});
        });
    });
}
Sorry for all the code and explanation but I'm really stuck and wanted to show why it doesn't work and what I'm trying to get working. How can one get around this in Rust?
Or am I getting hung up on cost of the mutex locking? For any Arc<Mutex<Vec<Entry>>>, the lock will always be unlocked so the cost of doing the atomic locking will be tiny?
Thanks for any thoughts. Here is the complete example in Rust Playground.

Forced to use of Mutex when it's not required

I am writing a game and have a player list defined as follows:
pub struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
This struct has methods for adding, removing, getting players, and getting the player count.
The NetworkServer and Server share this list as follows:
NetworkServer {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
Server {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
This is inside an Arc<Mutex> because the NetworkServer accesses the list in a different thread (network loop).
When a player joins, a thread is spawned for them and they are added to the player_list.
Although the only operation I'm doing is adding to player_list, I'm forced to use Arc<Mutex<Player>> instead of the more natural Rc<RefCell<Player>> in the HashMaps because Mutex<PlayerList> requires it. I am not accessing players from the network thread (or any other thread) so it makes no sense to put them under a Mutex. Only the HashMaps need to be locked, which I am doing using Mutex<PlayerList>. But Rust is pedantic and wants to protect against all misuses.
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant. Is there a workaround instead of using unsafe or something?
Here's an example:
use std::cell::Cell;
use std::collections::HashMap;
use std::ffi::CString;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);
struct Player {
    pub name: String,
    pub uuid: Uuid,
}
struct PlayerList {
    by_name: HashMap<String, Arc<Mutex<Player>>>,
    by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
impl PlayerList {
    fn add_player(&mut self, p: Player) {
        let name = p.name.clone();
        let uuid = p.uuid;
        let p = Arc::new(Mutex::new(p));
        self.by_name.insert(name, Arc::clone(&p));
        self.by_uuid.insert(uuid, p);
    }
}
struct NetworkServer {
    player_list: Arc<Mutex<PlayerList>>,
}
impl NetworkServer {
    fn start(&mut self) {
        let player_list = Arc::clone(&self.player_list);
        thread::spawn(move || {
            loop {
                // fake network loop
                // listen for incoming connections, accept player and add them to player_list.
                player_list.lock().unwrap().add_player(Player {
                    name: "blahblah".into(),
                    uuid: Uuid([0; 16]),
                });
            }
        });
    }
}
struct Server {
    player_list: Arc<Mutex<PlayerList>>,
    network_server: NetworkServer,
}
impl Server {
    fn start(&mut self) {
        self.network_server.start();
        // main game loop
        loop {
            // I am only accessing players in this loop in this thread. (main thread)
            // so Mutex for individual player is not needed although rust requires it.
        }
    }
}
fn main() {
    let player_list = Arc::new(Mutex::new(PlayerList {
        by_name: HashMap::new(),
        by_uuid: HashMap::new(),
    }));
    let network_server = NetworkServer {
        player_list: Arc::clone(&player_list),
    };
    let mut server = Server {
        player_list,
        network_server,
    };
    server.start();
}

As I'm only accessing Players in the main thread, locking everytime to do that is both annoying and less performant.
You mean, as right now you are only accessing Players in the main thread, but at any time later you may accidentally introduce an access to them in another thread?
From the point of view of the language, if you can get a reference to a value, you may use the value. Therefore, if multiple threads have a reference to a value, this value should be safe to use from multiple threads. There is no way to enforce, at compile-time, that a particular value, although accessible, is actually never used.
This raises the question, however:
If the value is never used by a given thread, why does this thread have access to it in the first place?
It seems to me that you have a design issue. If you can manage to redesign your program so that only the main thread has access to the PlayerList, then you will immediately be able to use Rc<RefCell<...>>.
For example, you could instead have the network thread send a message to the main thread announcing that a new player connected.
At the moment, you are "Communicating by Sharing", and you could shift toward "Sharing by Communicating" instead. The former usually has synchronization primitives (such as mutexes, atomics, ...) all over the place, and may face contention/dead-lock issues, while the latter usually has communication queues (channels) and requires an "asynchronous" style of programming.

Send is a marker trait that governs which objects can have ownership transferred across thread boundaries. It is automatically implemented for any type that is entirely composed of Send types. It is also an unsafe trait because manually implementing this trait can cause the compiler to not enforce the concurrency safety that we love about Rust.
The problem is that Rc<RefCell<Player>> isn't Send and thus your PlayerList isn't Send and thus can't be sent to another thread, even when wrapped in an Arc<Mutex<>>. The unsafe workaround would be to unsafe impl Send for your PlayerList struct.
Putting this code into your playground example allows it to compile the same way as the original with Arc<Mutex<Player>>
struct PlayerList {
    by_name: HashMap<String, Rc<RefCell<Player>>>,
    by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}
unsafe impl Send for PlayerList {}
impl PlayerList {
    fn add_player(&mut self, p: Player) {
        let name = p.name.clone();
        let uuid = p.uuid;
        let p = Rc::new(RefCell::new(p));
        self.by_name.insert(name, Rc::clone(&p));
        self.by_uuid.insert(uuid, p);
    }
}
The Nomicon is sadly a little sparse at explaining what rules have to be enforced by the programmer when unsafely implementing Send for a type containing Rcs, but accessing in only one thread seems safe enough...
For completeness, here's TRPL's bit on Send and Sync

I suggest solving this threading problem using a multi-sender-single-receiver channel. The network threads get a Sender<Player> and no direct access to the player list.
The Receiver<Player> gets stored inside the PlayerList. The only thread accessing the PlayerList is the main thread, so you can remove the Mutex around it. Instead, in the place where the main thread used to lock the mutex, it dequeues all pending players from the Receiver<Player>, wraps them in an Rc<RefCell<>>, and adds them to the appropriate collections.
Though looking at the bigger design, I wouldn't use a per-player thread in the first place. Instead I'd use some kind of single-threaded, event-loop-based design. (I didn't look into which Rust libraries are good in that area, but tokio seems popular.)
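A minimal sketch of the channel-based design described above might look like the following. It uses std::sync::mpsc from the standard library; the u128 stand-in for Uuid, the incoming field, and the drain_incoming and start_network names are assumptions invented for the example.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

struct Player {
    name: String,
    uuid: u128, // stand-in for the Uuid type in the question
}

// Owned by the main thread only, so Rc<RefCell<..>> is enough.
struct PlayerList {
    by_name: HashMap<String, Rc<RefCell<Player>>>,
    by_uuid: HashMap<u128, Rc<RefCell<Player>>>,
    incoming: Receiver<Player>,
}

impl PlayerList {
    fn new(incoming: Receiver<Player>) -> PlayerList {
        PlayerList { by_name: HashMap::new(), by_uuid: HashMap::new(), incoming }
    }

    // Called from the game loop: moves every queued player into the maps
    // without blocking, and returns how many were added.
    fn drain_incoming(&mut self) -> usize {
        let pending: Vec<Player> = self.incoming.try_iter().collect();
        let added = pending.len();
        for p in pending {
            let (name, uuid) = (p.name.clone(), p.uuid);
            let p = Rc::new(RefCell::new(p));
            self.by_name.insert(name, Rc::clone(&p));
            self.by_uuid.insert(uuid, p);
        }
        added
    }
}

// Fake network thread: it only ever sees the Sender, never the list.
fn start_network(tx: Sender<Player>, count: u128) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for i in 0..count {
            tx.send(Player { name: format!("player{}", i), uuid: i }).unwrap();
        }
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut players = PlayerList::new(rx);
    start_network(tx, 3).join().unwrap();
    println!("added {} players", players.drain_incoming());
}
```

Only Player values cross the thread boundary, and Player is Send, so neither a Mutex nor an unsafe impl is needed. The PlayerList itself is not Send because it holds Rc values, and the compiler will reject any later attempt to move it to another thread.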

One mutable borrow and multiple immutable borrows

I'm trying to write a program that spawns a background thread that continuously inserts data into some collection. At the same time, I want to keep getting input from stdin and check if that input is in the collection the thread is operating on.
Here is a boiled down example:
use std::collections::HashSet;
use std::thread;
fn main() {
    let mut set: HashSet<String> = HashSet::new();
    thread::spawn(move || {
        loop {
            set.insert("foo".to_string());
        }
    });
    loop {
        let input: String = get_input_from_stdin();
        if set.contains(&input) {
            // Do something...
        }
    }
}
fn get_input_from_stdin() -> String {
    String::new()
}
However this doesn't work because of ownership stuff.
I'm still new to Rust but this seems like something that should be possible. I just can't find the right combination of Arcs, Rcs, Mutexes, etc. to wrap my data in.

First of all, please read Need holistic explanation about Rust's cell and reference counted types.
There are two problems to solve here:
Sharing ownership between threads,
Mutable aliasing.
To share ownership, the simplest solution is Arc. It requires its argument to be Sync (accessible safely from multiple threads) which can be achieved for any Send type by wrapping it inside a Mutex or RwLock.
To safely get aliasing in the presence of mutability, both Mutex and RwLock will work. If you had multiple readers, RwLock might have an extra performance edge. Since you have a single reader there's no point: let's use the simple Mutex.
And therefore, your type is: Arc<Mutex<HashSet<String>>>.
The next trick is passing the value to the closure to run in another thread. The value is moved, and therefore you need to first make a clone of the Arc and then pass the clone, otherwise you've moved your original and cannot access it any longer.
Finally, accessing the data requires going through the borrows and locks...
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;
// get_input_from_stdin() is the same stub as in the question.
fn main() {
    let set = Arc::new(Mutex::new(HashSet::new()));
    let clone = set.clone();
    thread::spawn(move || {
        loop {
            clone.lock().unwrap().insert("foo".to_string());
        }
    });
    loop {
        let input: String = get_input_from_stdin();
        if set.lock().unwrap().contains(&input) {
            // Do something...
        }
    }
}
The call to unwrap is there because Mutex::lock returns a Result; it may be impossible to lock the Mutex if it is poisoned, which means a panic occurred while it was locked and therefore its content is possibly garbage.
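If panicking on a poisoned lock is not what you want, the PoisonError returned by lock still gives access to the guard through into_inner. The sketch below shows one way to use it; the contains_ignoring_poison and poisoned_set helpers are assumptions made up for the example, and whether the data is still trustworthy after a panic is a call only the application can make.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// Locks the set even if another thread panicked while holding the lock.
// Assumption: the set is still usable after such a panic.
fn contains_ignoring_poison(set: &Mutex<HashSet<String>>, key: &str) -> bool {
    let guard = set.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.contains(key)
}

// Builds a poisoned mutex by panicking in a thread that holds the guard.
fn poisoned_set() -> Arc<Mutex<HashSet<String>>> {
    let set = Arc::new(Mutex::new(HashSet::new()));
    let clone = Arc::clone(&set);
    let result = thread::spawn(move || {
        let mut guard = clone.lock().unwrap();
        guard.insert("foo".to_string());
        panic!("simulated failure while the lock is held");
    })
    .join();
    assert!(result.is_err()); // the worker thread panicked
    set
}

fn main() {
    let set = poisoned_set();
    println!("poisoned: {}", set.is_poisoned());
    println!("contains foo: {}", contains_ignoring_poison(&set, "foo"));
}
```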

Arc reference to member of field

I'm trying to spawn a given set of threads and have each perform a long running operation. I would be passing a structure to each worker thread as the internal state of the given thread. The collection of said structs is kept in a vector, part of a Master struct.
The compiler rejects me passing the internal member of a struct to Arc::new():
use std::thread;
use std::sync::Arc;
struct Worker {
    name: String,
}
struct Master {
    workers: Vec<Worker>,
}
impl Worker {
    fn start(&self) {
        println!("My name is {} and I'm working!", self.name);
        thread::sleep_ms(100_000);
    }
}
impl Master {
    pub fn run_test(&mut self) {
        for i in 0..10 {
            self.workers.push(Worker {
                name: String::new() + "Worker" + &i.to_string()
            });
        }
        let mut data = Arc::new(self.workers);
        for i in 0..10 {
            let local_data = data.clone();
            thread::spawn(move || {
                local_data[i].start();
            });
        }
        thread::sleep_ms(100_000);
    }
}
fn main() {
    let mut master = Master { workers: vec![] };
}
The error message:
error[E0507]: cannot move out of borrowed content
  --> <anon>:26:33
   |
26 |         let mut data = Arc::new(self.workers);
   |                                 ^^^^ cannot move out of borrowed content
What am I doing wrong? Is this idiomatic Rust?

Welcome to Ownership.
In Rust, any single piece of data has one and exactly one owner. Don't be fooled by Rc and Arc: they are a shared interface on top of a single (invisible) owner.
The simplest way of expressing ownership is by value:
struct Master {
    workers: Vec<Worker>
}
Here, Master owns a Vec<Worker> which itself owns multiple Worker.
Similarly, functions that take their argument by value (fn new(t: T) -> Arc<T> for example) receive ownership of their argument.
And that is where the issue lies:
Arc::new(self.workers)
means that you are, at the same time:
claiming that Master is the owner of workers
claiming that Arc is the owner of workers
Given the rule of one and exactly one owner, this is clearly intractable.
So, how do you cheat and have multiple co-owners for a single piece of data?
Well... use Rc or Arc!
struct Master {
    workers: Arc<Vec<Worker>>
}
And now creating data is as simple as:
let data = self.workers.clone();
which creates a new Arc (which just bumps the reference count).
That's not quite all, though. The core tenet of the Borrowing system is: Aliasing XOR Mutability.
Since Arc is about aliasing, it prevents mutability. You cannot insert workers into self.workers any longer!
There are multiple solutions, such as deferring the initialization of self.workers until the vector is built, however the most common is to use cells or mutexes, that is Rc<RefCell<T>> or Arc<Mutex<T>> (or Arc<RwLock<T>>).
RefCell and Mutex are wrappers that move borrow checking from compile-time to run-time. This gives a bit more flexibility, but may result in run-time panics (RefCell) or blocking and deadlocks (Mutex) instead of compile-time errors, so is best used as a last resort.
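To illustrate the "defer the initialization" option mentioned above, here is a rough sketch that builds the vector while it is still uniquely owned and only then wraps it in an Arc, so no cell or mutex is needed. The Master::new constructor, the String return value of start, and joining the threads instead of sleeping are assumptions introduced for the example.

```rust
use std::sync::Arc;
use std::thread;

struct Worker {
    name: String,
}

impl Worker {
    fn start(&self) -> String {
        format!("My name is {} and I'm working!", self.name)
    }
}

struct Master {
    workers: Arc<Vec<Worker>>,
}

impl Master {
    // Build the vector while it is still uniquely owned, then freeze it in an Arc.
    fn new(count: usize) -> Master {
        let workers: Vec<Worker> = (0..count)
            .map(|i| Worker { name: format!("Worker{}", i) })
            .collect();
        Master { workers: Arc::new(workers) }
    }

    // One thread per worker; each thread holds its own Arc clone.
    fn run_test(&self) -> Vec<String> {
        let handles: Vec<_> = (0..self.workers.len())
            .map(|i| {
                let local_data = Arc::clone(&self.workers);
                thread::spawn(move || local_data[i].start())
            })
            .collect();
        // Join instead of sleeping, so the results are known to be complete.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    }
}

fn main() {
    let master = Master::new(3);
    for line in master.run_test() {
        println!("{}", line);
    }
}
```

If workers must be added after threads have started, this no longer suffices and the Arc<Mutex<T>> or Arc<RwLock<T>> route becomes necessary.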
