Alternative to `static mut` and unsafe while managing global application state [duplicate] - rust

This question already has answers here:
How do I create a global, mutable singleton?
(7 answers)
Closed 4 months ago.
Background
I have been learning rust recently and my most recent project involves setting a global APP_STATE that can then be accessed throughout the app. There are a few other globals as well.
Note: These variables pretty much need to be globals, otherwise I will have to pass them as arguments into every function and trait I have - which is not very elegant.
The Problem
The aforementioned globals are mutable, i.e they are represented like the following:
pub Struct AppState {
running: bool
suspended: bool
}
static mut APP_STATE = AppState {running: true, suspended:false}
To access these values, I must use unsafe like so: (Ignore the logic of code itself, just an example)
pub unsafe fn create_app {
APP_STATE.running = true;
APP_STATE.suspended = false;
}
unsafe fn confirm_app_state_valid() {
if APP_STATE.running == APP_STATE.suspended { // equality on booleans is just the XNOR(Logical Bi-conditional) operator.
fatal("Fatal! App was both running and suspended at same time. Could not resolve. Crashed")
};
}
The Question
How can I change my code to
remove the unsafe (I understand why having unsafe is needed for mutable statics - avoid race conditions). Note that my app uses concurrency and performance is critical(graphics).
I want to remove the unsafe or atleast reduce its usage - currently it encompasses the entire main loop.
not have to pass state as an argument everywhere
Already looked at this. I did not understand how I might implement the solution provided here.

If you absolutely want some static global state, you can use once_cell in conjunction with a Mutex (see the example below).
However, I don't understand your remark « which is clearly not optimal » about passing the state as a parameter; do you mean inelegant or inefficient?
Moreover, you state « performance is critical »; in my opinion, this static global state requiring runtime borrow checking at each access is less efficient than the usual static borrow checking.
use once_cell::sync::OnceCell;
use std::sync::{Mutex, MutexGuard};
#[derive(Debug)]
struct AppState {
running: bool,
suspended: bool,
}
static APP_STATE: OnceCell<Mutex<AppState>> = OnceCell::new();
fn access_app_state() -> MutexGuard<'static, AppState> {
APP_STATE.get().unwrap().lock().unwrap()
}
fn confirm_app_state_valid() {
let app_state = access_app_state();
if app_state.running == app_state.suspended {
panic!("Fatal! App was both running and suspended at same time...");
}
println!("App state is correct: {:?}", app_state);
}
fn change_app_state() {
let mut app_state = access_app_state();
app_state.running = !app_state.running;
app_state.suspended = !app_state.suspended;
}
fn main() {
APP_STATE
.set(Mutex::new(AppState {
running: true,
suspended: false,
}))
.unwrap();
confirm_app_state_valid();
change_app_state();
confirm_app_state_valid();
}
/*
App state is correct: AppState { running: true, suspended: false }
App state is correct: AppState { running: false, suspended: true }
*/
As stated by Chayim Friedman in a comment, since Rust 1.63 Mutex::new() is const.
We can get rid of once_cell and just initialise the global mutex with a content which is known at compile-time.
use std::sync::{Mutex, MutexGuard};
#[derive(Debug)]
struct AppState {
running: bool,
suspended: bool,
}
static APP_STATE: Mutex<AppState> = Mutex::new(AppState {
running: true,
suspended: false,
});
fn access_app_state() -> MutexGuard<'static, AppState> {
APP_STATE.lock().unwrap()
}
fn confirm_app_state_valid() {
let app_state = access_app_state();
if app_state.running == app_state.suspended {
panic!("Fatal! App was both running and suspended at same time...");
}
println!("App state is correct: {:?}", app_state);
}
fn change_app_state() {
let mut app_state = access_app_state();
app_state.running = !app_state.running;
app_state.suspended = !app_state.suspended;
}
fn main() {
confirm_app_state_valid();
change_app_state();
confirm_app_state_valid();
}
/*
App state is correct: AppState { running: true, suspended: false }
App state is correct: AppState { running: false, suspended: true }
*/

Related

Rust: joining thread fails with: cannot move out of dereference of `std::sync::MutexGuard<'_, models::worker::Worker>`

I am having a hard time figuring out how to sort out this issue.
So I have a class ArcWorker holding a shared reference to Worker (as you can remark below).
I wrote a function in ArcWorker called join() in which the line self.internal.lock().unwrap().join(); fails with the following error:
cannot move out of dereference of std::sync::MutexGuard<'_, models::worker::Worker>
What I attempt through that line is to lock the mutex, unwrap and call the join() function from the Worker class.
As far as I understand, once that the lock function is called and it borrows a reference to self (&self), then I need some way to get to pass self by value to join (std::thread's join function requires passing self by value).
What can I do to make this work? Tried to find an answer to my question for hours but to no avail.
pub struct Worker {
accounts: Vec<Arc<Mutex<Account>>>,
thread_join_handle: Option<thread::JoinHandle<()>>
}
pub struct ArcWorker {
internal: Arc<Mutex<Worker>>
}
impl ArcWorker {
pub fn new(accounts: Vec<Arc<Mutex<Account>>>) -> ArcWorker {
return ArcWorker {
internal: Arc::new(Mutex::new(Worker {
accounts: accounts,
thread_join_handle: None
}))
}
}
pub fn spawn(&self) {
let local_self_1 = self.internal.clone();
self.internal.lock().unwrap().thread_join_handle = Some(thread::spawn(move || {
println!("Spawn worker");
local_self_1.lock().unwrap().perform_random_transactions();
}));
}
pub fn join(&self) {
self.internal.lock().unwrap().join();
}
}
impl Worker {
fn join(self) {
if let Some(thread_join_handle) = self.thread_join_handle {
thread_join_handle.join().expect("Couldn't join the associated threads.")
}
}
fn perform_random_transactions(&self) {
}
}
Since you already hold JoinHandle in an option, you can make Worker::join() take &mut self instead of self and change the if let condition to:
// note added `.take()`
if let Some(thread_join_handle) = self.thread_join_handle.take() {
Option::take() will move the handle out of the option and give you ownership over it, while leaving None in self.thread_join_handle. With this change ArcWorker::join() should compile as-is.

What is the Rust equivalent of C++'s shared_from_this?

I have an object that I know that is inside an Arc because all the instances are always Arced. I would like to be able to pass a cloned Arc of myself in a function call. The thing I am calling will call me back later on other threads.
In C++, there is a standard mixin called enable_shared_from_this. It enables me to do exactly this
class Bus : public std::enable_shared_from_this<Bus>
{
....
void SetupDevice(Device device,...)
{
device->Attach(shared_from_this());
}
}
If this object is not under shared_ptr management (the closest C++ has to Arc) then this will fail at run time.
I cannot find an equivalent.
EDIT:
Here is an example of why its needed. I have a timerqueue library. It allows a client to request an arbitrary closure to be run at some point in the future. The code is run on a dedicated thread. To use it you must pass a closure of the function you want to be executed later.
use std::time::{Duration, Instant};
use timerqueue::*;
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
// inline me keeper cos not on github
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
// -----------------------------------
struct Test {
data:String,
me: MeKeeper<Self>,
}
impl Test {
pub fn new() -> Arc<Test>{
let arc = Arc::new(Self {
me: MeKeeper::new(),
data: "Yo".to_string()
});
arc.me.save(&arc);
arc
}
fn task(&self) {
println!("{}", self.data);
}
// in real use case the TQ and a ton of other status data is passed in the new call for Test
// to keep things simple here the 'container' passes tq as an arg
pub fn do_stuff(&self, tq: &TimerQueue) {
// stuff includes a async task that must be done in 1 second
//.....
let me = self.me.get().clone();
tq.queue(
Box::new(move || me.task()),
"x".to_string(),
Instant::now() + Duration::from_millis(1000),
);
}
}
fn main() {
// in real case (PDP11 emulator) there is a Bus class owning tons of objects thats
// alive for the whole duration
let tq = Arc::new(TimerQueue::new());
let test = Test::new();
test.do_stuff(&*tq);
// just to keep everything alive while we wait
let mut input = String::new();
std::io::stdin().read_line(&mut input).unwrap();
}
cargo toml
[package]
name = "tqclient"
version = "0.1.0"
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
timerqueue = { git = "https://github.com/pm100/timerqueue.git" }
parking_lot = "0.11"
There is no way to go from a &self to the Arc that self is stored in. This is because:
Rust references have additional assumptions compared to C++ references that would make such a conversion undefined behavior.
Rust's implementation of Arc does not even expose the information necessary to determine whether self is stored in an Arc or not.
Luckily, there is an alternative approach. Instead of creating a &self to the value inside the Arc, and passing that to the method, pass the Arc directly to the method that needs to access it. You can do that like this:
use std::sync::Arc;
struct Shared {
field: String,
}
impl Shared {
fn print_field(self: Arc<Self>) {
let clone: Arc<Shared> = self.clone();
println!("{}", clone.field);
}
}
Then the print_field function can only be called on an Shared encapsulated in an Arc.
having found that I needed this three times in recent days I decided to stop trying to come up with other designs. Maybe poor data design as far as rust is concerned but I needed it.
Works by changing the new function of the types using it to return an Arc rather than a raw self. All my objects are arced anyway, before they were arced by the caller, now its forced.
mini util library called mekeeper
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
to use it
pub struct Test {
me: MeKeeper<Self>,
foo:i8,
}
impl Test {
pub fn new() -> Arc<Self> {
let arc = Arc::new(Test {
me: MeKeeper::new(),
foo:42
});
arc.me.save(&arc);
arc
}
}
now when an instance of Test wants to call a function that requires it to pass in an Arc it does:
fn nargle(){
let me = me.get();
Ooddle::fertang(me,42);// fertang needs an Arc<T>
}
the weak use is what the shared_from_this does so as to prevent refcount deadlocks, I stole that idea.
The unreachable path is safe because the only place that can call MeKeeper::get is the instance of T (Test here) that owns it and that call can only happen if the T instance is alive. Hence no none return from weak::upgrade

How do I keep internal state in a WebAssembly module written in Rust?

I want to do computations on a large set of data each frame of my web app. Only a subset of this will be used by JavaScript, so instead of sending the entire set of data back and forth between WebAssembly and JavaScript each frame, it would be nice if the data was maintained internally in my WebAssembly module.
In C, something like this works:
#include <emscripten/emscripten.h>
int state = 0;
void EMSCRIPTEN_KEEPALIVE inc() {
state++;
}
int EMSCRIPTEN_KEEPALIVE get() {
return state;
}
Is the same thing possible in Rust? I tried doing it with a static like this:
static mut state: i32 = 0;
pub fn main() {}
#[no_mangle]
pub fn add() {
state += 1;
}
#[no_mangle]
pub fn get() -> i32 {
state
}
But it seems static variables cannot be mutable.
Francis Gagné is absolutely correct that global variables generally make your code worse and you should avoid them.
However, for the specific case of WebAssembly as it is today, we don't have to worry about this concern:
if you have multiple threads
We can thus choose to use mutable static variables, if we have a very good reason to do so:
// Only valid because we are using this in a WebAssembly
// context without threads.
static mut STATE: i32 = 0;
#[no_mangle]
pub extern fn add() {
unsafe { STATE += 1 };
}
#[no_mangle]
pub extern fn get() -> i32 {
unsafe { STATE }
}
We can see the behavior with this NodeJS driver program:
const fs = require('fs-extra');
fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
.then(bytes => WebAssembly.instantiate(bytes))
.then(({ module, instance }) => {
const { get, add } = instance.exports;
console.log(get());
add();
add();
console.log(get());
});
0
2
error[E0133]: use of mutable static requires unsafe function or block
In general, accessing mutable global variables is unsafe, which means that you can only do it in an unsafe block. With mutable global variables, it's easy to accidentally create dangling references (think of a reference to an item of a global mutable Vec), data races (if you have multiple threads – Rust doesn't care that you don't actually use threads) or otherwise invoke undefined behavior.
Global variables are usually not the best solution to a problem because it makes your software less flexible and less reusable. Instead, consider passing the state explicitly (by reference, so you don't need to copy it) to the functions that need to operate on it. This lets the calling code work with multiple independent states.
Here's an example of allocating unique state and modifying that:
type State = i32;
#[no_mangle]
pub extern fn new() -> *mut State {
Box::into_raw(Box::new(0))
}
#[no_mangle]
pub extern fn free(state: *mut State) {
unsafe { Box::from_raw(state) };
}
#[no_mangle]
pub extern fn add(state: *mut State) {
unsafe { *state += 1 };
}
#[no_mangle]
pub extern fn get(state: *mut State) -> i32 {
unsafe { *state }
}
const fs = require('fs-extra');
fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
.then(bytes => WebAssembly.instantiate(bytes))
.then(({ module, instance }) => {
const { new: newFn, free, get, add } = instance.exports;
const state1 = newFn();
const state2 = newFn();
add(state1);
add(state2);
add(state1);
console.log(get(state1));
console.log(get(state2));
free(state1);
free(state2);
});
2
1
Note — This currently needs to be compiled in release mode to work. Debugging mode has some issues at the moment.
Admittedly, this is not less unsafe because you're passing raw pointers around, but it makes it clearer in the calling code that there is some mutable state being manipulated. Also note that it is now the responsibility of the caller to ensure that the state pointer is being handled correctly.

Is it possible initialize a global variable at runtime without a mutex? [duplicate]

This is something of a controversial topic, so let me start by explaining my use case, and then talk about the actual problem.
I find that for a bunch of unsafe things, it's important to make sure that you don't leak memory; this is actually quite easy to do if you start using transmute() and forget(). For example, passing a boxed instance to C code for an arbitrary amount of time, then fetching it back out and 'resurrecting it' by using transmute.
Imagine I have a safe wrapper for this sort of API:
trait Foo {}
struct CBox;
impl CBox {
/// Stores value in a bound C api, forget(value)
fn set<T: Foo>(value: T) {
// ...
}
/// Periodically call this and maybe get a callback invoked
fn poll(_: Box<Fn<(EventType, Foo), ()> + Send>) {
// ...
}
}
impl Drop for CBox {
fn drop(&mut self) {
// Safely load all saved Foo's here and discard them, preventing memory leaks
}
}
To test this is actually not leaking any memory, I want some tests like this:
#[cfg(test)]
mod test {
struct IsFoo;
impl Foo for IsFoo {}
impl Drop for IsFoo {
fn drop(&mut self) {
Static::touch();
}
}
#[test]
fn test_drops_actually_work() {
guard = Static::lock(); // Prevent any other use of Static concurrently
Static::reset(); // Set to zero
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(Static::get() == 2); // Assert that all expected drops were invoked
guard.release();
}
}
How can you create this type of static singleton object?
It must use a Semaphore style guard lock to ensure that multiple tests do not concurrently run, and then unsafely access some kind of static mutable value.
I thought perhaps this implementation would work, but practically speaking it fails because occasionally race conditions result in a duplicate execution of init:
/// Global instance
static mut INSTANCE_LOCK: bool = false;
static mut INSTANCE: *mut StaticUtils = 0 as *mut StaticUtils;
static mut WRITE_LOCK: *mut Semaphore = 0 as *mut Semaphore;
static mut LOCK: *mut Semaphore = 0 as *mut Semaphore;
/// Generate instances if they don't exist
unsafe fn init() {
if !INSTANCE_LOCK {
INSTANCE_LOCK = true;
INSTANCE = transmute(box StaticUtils::new());
WRITE_LOCK = transmute(box Semaphore::new(1));
LOCK = transmute(box Semaphore::new(1));
}
}
Note specifically that unlike a normal program where you can be certain that your entry point (main) is always running in a single task, the test runner in Rust does not offer any kind of single entry point like this.
Other, obviously, than specifying the maximum number of tasks; given dozens of tests, only a handful need to do this sort of thing, and it's slow and pointless to limit the test task pool to one just for this one case.
It looks like a use case for std::sync::Once:
use std::sync::{Once, ONCE_INIT};
static INIT: Once = ONCE_INIT;
Then in your tests call
INIT.doit(|| unsafe { init(); });
Once guarantees that your init will only be executed once, no matter how many times you call INIT.doit().
See also lazy_static, which makes things a little more ergonomic. It does essentially the same thing as a static Once for each variable, but wraps it in a type that implements Deref so that you can access it like a normal reference.
Usage looks like this (from the documentation):
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
lazy_static! {
static ref HASHMAP: HashMap<u32, &'static str> = {
let mut m = HashMap::new();
m.insert(0, "foo");
m.insert(1, "bar");
m.insert(2, "baz");
m
};
static ref COUNT: usize = HASHMAP.len();
static ref NUMBER: u32 = times_two(21);
}
fn times_two(n: u32) -> u32 { n * 2 }
fn main() {
println!("The map has {} entries.", *COUNT);
println!("The entry for `0` is \"{}\".", HASHMAP.get(&0).unwrap());
println!("A expensive calculation on a static results in: {}.", *NUMBER);
}
Note that autoderef means that you don't even have to use * whenever you call a method on your static variable. The variable will be initialized the first time it's Deref'd.
However, lazy_static variables are immutable (since they're behind a reference). If you want a mutable static, you'll need to use a Mutex:
lazy_static! {
static ref VALUE: Mutex<u64>;
}
impl Drop for IsFoo {
fn drop(&mut self) {
let mut value = VALUE.lock().unwrap();
*value += 1;
}
}
#[test]
fn test_drops_actually_work() {
// Have to drop the mutex guard to unlock, so we put it in its own scope
{
*VALUE.lock().unwrap() = 0;
}
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(*VALUE.lock().unwrap() == 2); // Assert that all expected drops were invoked
}
If you're willing to use nightly Rust you can use SyncLazy instead of the external lazy_static crate:
#![feature(once_cell)]
use std::collections::HashMap;
use std::lazy::SyncLazy;
static HASHMAP: SyncLazy<HashMap<i32, String>> = SyncLazy::new(|| {
println!("initializing");
let mut m = HashMap::new();
m.insert(13, "Spica".to_string());
m.insert(74, "Hoyten".to_string());
m
});
fn main() {
println!("ready");
std::thread::spawn(|| {
println!("{:?}", HASHMAP.get(&13));
}).join().unwrap();
println!("{:?}", HASHMAP.get(&74));
// Prints:
// ready
// initializing
// Some("Spica")
// Some("Hoyten")
}

Generate sequential IDs for each instance of a struct

I'm writing a system where I have a collection of Objects, and each Object has a unique integral ID. Here's how I would do it in C++:
class Object {
public:
Object(): id_(nextId_++) { }
private:
int id_;
static int nextId_;
}
int Object::nextId_ = 1;
This is obviously not thread_safe, but if I wanted it to be, I could make nextId_ an std::atomic_int, or wrap a mutex around the nextId_++ expression.
How would I do this in (preferably safe) Rust? There's no static struct members, nor are global mutable variables safe. I could always pass nextId into the new function, but these objects are going to be allocated in a number of places, and I would prefer not to pipe the nextId number hither and yon. Thoughts?
Atomic variables can live in statics, so you can use it relatively straightforwardly (the downside is that you have global state).
Example code: (playground link)
use std::{
sync::atomic::{AtomicUsize, Ordering},
thread,
};
static OBJECT_COUNTER: AtomicUsize = AtomicUsize::new(0);
#[derive(Debug)]
struct Object(usize);
impl Object {
fn new() -> Self {
Object(OBJECT_COUNTER.fetch_add(1, Ordering::SeqCst))
}
}
fn main() {
let threads = (0..10)
.map(|_| thread::spawn(|| Object::new()))
.collect::<Vec<_>>();
for t in threads {
println!("{:?}", t.join().unwrap());
}
}
nor are global mutable variables safe
Your C++ example seems like it would have thread-safety issues, but I don't know enough C++ to be sure.
However, only unsynchronized global mutable variables are trouble. If you don't care about cross-thread issues, you can use a thread-local:
use std::cell::Cell;
#[derive(Debug)]
struct Monster {
id: usize,
health: u8,
}
thread_local!(static MONSTER_ID: Cell<usize> = Cell::new(0));
impl Monster {
fn new(health: u8) -> Monster {
MONSTER_ID.with(|thread_id| {
let id = thread_id.get();
thread_id.set(id + 1);
Monster { id, health }
})
}
}
fn main() {
let gnome = Monster::new(41);
let troll = Monster::new(42);
println!("gnome {:?}", gnome);
println!("troll {:?}", troll);
}
If you do want something that works better with multiple threads, check out bluss' answer, which shows how to use an atomic variable.

Resources