Generate sequential IDs for each instance of a struct - rust

I'm writing a system where I have a collection of Objects, and each Object has a unique integral ID. Here's how I would do it in C++:
class Object {
public:
Object(): id_(nextId_++) { }
private:
int id_;
static int nextId_;
}
int Object::nextId_ = 1;
This is obviously not thread_safe, but if I wanted it to be, I could make nextId_ an std::atomic_int, or wrap a mutex around the nextId_++ expression.
How would I do this in (preferably safe) Rust? There's no static struct members, nor are global mutable variables safe. I could always pass nextId into the new function, but these objects are going to be allocated in a number of places, and I would prefer not to pipe the nextId number hither and yon. Thoughts?

Atomic variables can live in statics, so you can use it relatively straightforwardly (the downside is that you have global state).
Example code: (playground link)
use std::{
sync::atomic::{AtomicUsize, Ordering},
thread,
};
static OBJECT_COUNTER: AtomicUsize = AtomicUsize::new(0);
#[derive(Debug)]
struct Object(usize);
impl Object {
fn new() -> Self {
Object(OBJECT_COUNTER.fetch_add(1, Ordering::SeqCst))
}
}
fn main() {
let threads = (0..10)
.map(|_| thread::spawn(|| Object::new()))
.collect::<Vec<_>>();
for t in threads {
println!("{:?}", t.join().unwrap());
}
}

nor are global mutable variables safe
Your C++ example seems like it would have thread-safety issues, but I don't know enough C++ to be sure.
However, only unsynchronized global mutable variables are trouble. If you don't care about cross-thread issues, you can use a thread-local:
use std::cell::Cell;
#[derive(Debug)]
struct Monster {
id: usize,
health: u8,
}
thread_local!(static MONSTER_ID: Cell<usize> = Cell::new(0));
impl Monster {
fn new(health: u8) -> Monster {
MONSTER_ID.with(|thread_id| {
let id = thread_id.get();
thread_id.set(id + 1);
Monster { id, health }
})
}
}
fn main() {
let gnome = Monster::new(41);
let troll = Monster::new(42);
println!("gnome {:?}", gnome);
println!("troll {:?}", troll);
}
If you do want something that works better with multiple threads, check out bluss' answer, which shows how to use an atomic variable.

Related

How do I keep internal state in a WebAssembly module written in Rust?

I want to do computations on a large set of data each frame of my web app. Only a subset of this will be used by JavaScript, so instead of sending the entire set of data back and forth between WebAssembly and JavaScript each frame, it would be nice if the data was maintained internally in my WebAssembly module.
In C, something like this works:
#include <emscripten/emscripten.h>
int state = 0;
void EMSCRIPTEN_KEEPALIVE inc() {
state++;
}
int EMSCRIPTEN_KEEPALIVE get() {
return state;
}
Is the same thing possible in Rust? I tried doing it with a static like this:
static mut state: i32 = 0;
pub fn main() {}
#[no_mangle]
pub fn add() {
state += 1;
}
#[no_mangle]
pub fn get() -> i32 {
state
}
But it seems static variables cannot be mutable.
Francis Gagné is absolutely correct that global variables generally make your code worse and you should avoid them.
However, for the specific case of WebAssembly as it is today, we don't have to worry about this concern:
if you have multiple threads
We can thus choose to use mutable static variables, if we have a very good reason to do so:
// Only valid because we are using this in a WebAssembly
// context without threads.
static mut STATE: i32 = 0;
#[no_mangle]
pub extern fn add() {
unsafe { STATE += 1 };
}
#[no_mangle]
pub extern fn get() -> i32 {
unsafe { STATE }
}
We can see the behavior with this NodeJS driver program:
const fs = require('fs-extra');
fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
.then(bytes => WebAssembly.instantiate(bytes))
.then(({ module, instance }) => {
const { get, add } = instance.exports;
console.log(get());
add();
add();
console.log(get());
});
0
2
error[E0133]: use of mutable static requires unsafe function or block
In general, accessing mutable global variables is unsafe, which means that you can only do it in an unsafe block. With mutable global variables, it's easy to accidentally create dangling references (think of a reference to an item of a global mutable Vec), data races (if you have multiple threads – Rust doesn't care that you don't actually use threads) or otherwise invoke undefined behavior.
Global variables are usually not the best solution to a problem because it makes your software less flexible and less reusable. Instead, consider passing the state explicitly (by reference, so you don't need to copy it) to the functions that need to operate on it. This lets the calling code work with multiple independent states.
Here's an example of allocating unique state and modifying that:
type State = i32;
#[no_mangle]
pub extern fn new() -> *mut State {
Box::into_raw(Box::new(0))
}
#[no_mangle]
pub extern fn free(state: *mut State) {
unsafe { Box::from_raw(state) };
}
#[no_mangle]
pub extern fn add(state: *mut State) {
unsafe { *state += 1 };
}
#[no_mangle]
pub extern fn get(state: *mut State) -> i32 {
unsafe { *state }
}
const fs = require('fs-extra');
fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
.then(bytes => WebAssembly.instantiate(bytes))
.then(({ module, instance }) => {
const { new: newFn, free, get, add } = instance.exports;
const state1 = newFn();
const state2 = newFn();
add(state1);
add(state2);
add(state1);
console.log(get(state1));
console.log(get(state2));
free(state1);
free(state2);
});
2
1
Note — This currently needs to be compiled in release mode to work. Debugging mode has some issues at the moment.
Admittedly, this is not less unsafe because you're passing raw pointers around, but it makes it clearer in the calling code that there is some mutable state being manipulated. Also note that it is now the responsibility of the caller to ensure that the state pointer is being handled correctly.

Is it possible initialize a global variable at runtime without a mutex? [duplicate]

This is something of a controversial topic, so let me start by explaining my use case, and then talk about the actual problem.
I find that for a bunch of unsafe things, it's important to make sure that you don't leak memory; this is actually quite easy to do if you start using transmute() and forget(). For example, passing a boxed instance to C code for an arbitrary amount of time, then fetching it back out and 'resurrecting it' by using transmute.
Imagine I have a safe wrapper for this sort of API:
trait Foo {}
struct CBox;
impl CBox {
/// Stores value in a bound C api, forget(value)
fn set<T: Foo>(value: T) {
// ...
}
/// Periodically call this and maybe get a callback invoked
fn poll(_: Box<Fn<(EventType, Foo), ()> + Send>) {
// ...
}
}
impl Drop for CBox {
fn drop(&mut self) {
// Safely load all saved Foo's here and discard them, preventing memory leaks
}
}
To test this is actually not leaking any memory, I want some tests like this:
#[cfg(test)]
mod test {
struct IsFoo;
impl Foo for IsFoo {}
impl Drop for IsFoo {
fn drop(&mut self) {
Static::touch();
}
}
#[test]
fn test_drops_actually_work() {
guard = Static::lock(); // Prevent any other use of Static concurrently
Static::reset(); // Set to zero
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(Static::get() == 2); // Assert that all expected drops were invoked
guard.release();
}
}
How can you create this type of static singleton object?
It must use a Semaphore style guard lock to ensure that multiple tests do not concurrently run, and then unsafely access some kind of static mutable value.
I thought perhaps this implementation would work, but practically speaking it fails because occasionally race conditions result in a duplicate execution of init:
/// Global instance
static mut INSTANCE_LOCK: bool = false;
static mut INSTANCE: *mut StaticUtils = 0 as *mut StaticUtils;
static mut WRITE_LOCK: *mut Semaphore = 0 as *mut Semaphore;
static mut LOCK: *mut Semaphore = 0 as *mut Semaphore;
/// Generate instances if they don't exist
unsafe fn init() {
if !INSTANCE_LOCK {
INSTANCE_LOCK = true;
INSTANCE = transmute(box StaticUtils::new());
WRITE_LOCK = transmute(box Semaphore::new(1));
LOCK = transmute(box Semaphore::new(1));
}
}
Note specifically that unlike a normal program where you can be certain that your entry point (main) is always running in a single task, the test runner in Rust does not offer any kind of single entry point like this.
Other, obviously, than specifying the maximum number of tasks; given dozens of tests, only a handful need to do this sort of thing, and it's slow and pointless to limit the test task pool to one just for this one case.
It looks like a use case for std::sync::Once:
use std::sync::{Once, ONCE_INIT};
static INIT: Once = ONCE_INIT;
Then in your tests call
INIT.doit(|| unsafe { init(); });
Once guarantees that your init will only be executed once, no matter how many times you call INIT.doit().
See also lazy_static, which makes things a little more ergonomic. It does essentially the same thing as a static Once for each variable, but wraps it in a type that implements Deref so that you can access it like a normal reference.
Usage looks like this (from the documentation):
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
lazy_static! {
static ref HASHMAP: HashMap<u32, &'static str> = {
let mut m = HashMap::new();
m.insert(0, "foo");
m.insert(1, "bar");
m.insert(2, "baz");
m
};
static ref COUNT: usize = HASHMAP.len();
static ref NUMBER: u32 = times_two(21);
}
fn times_two(n: u32) -> u32 { n * 2 }
fn main() {
println!("The map has {} entries.", *COUNT);
println!("The entry for `0` is \"{}\".", HASHMAP.get(&0).unwrap());
println!("A expensive calculation on a static results in: {}.", *NUMBER);
}
Note that autoderef means that you don't even have to use * whenever you call a method on your static variable. The variable will be initialized the first time it's Deref'd.
However, lazy_static variables are immutable (since they're behind a reference). If you want a mutable static, you'll need to use a Mutex:
lazy_static! {
static ref VALUE: Mutex<u64>;
}
impl Drop for IsFoo {
fn drop(&mut self) {
let mut value = VALUE.lock().unwrap();
*value += 1;
}
}
#[test]
fn test_drops_actually_work() {
// Have to drop the mutex guard to unlock, so we put it in its own scope
{
*VALUE.lock().unwrap() = 0;
}
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(*VALUE.lock().unwrap() == 2); // Assert that all expected drops were invoked
}
If you're willing to use nightly Rust you can use SyncLazy instead of the external lazy_static crate:
#![feature(once_cell)]
use std::collections::HashMap;
use std::lazy::SyncLazy;
static HASHMAP: SyncLazy<HashMap<i32, String>> = SyncLazy::new(|| {
println!("initializing");
let mut m = HashMap::new();
m.insert(13, "Spica".to_string());
m.insert(74, "Hoyten".to_string());
m
});
fn main() {
println!("ready");
std::thread::spawn(|| {
println!("{:?}", HASHMAP.get(&13));
}).join().unwrap();
println!("{:?}", HASHMAP.get(&74));
// Prints:
// ready
// initializing
// Some("Spica")
// Some("Hoyten")
}

What is the right smart pointer to have multiple strong references and allow mutability?

I want to have a structure on the heap with two references; one for me and another from a closure. Note that the code is for the single-threaded case:
use std::rc::Rc;
#[derive(Debug)]
struct Foo {
val: u32,
}
impl Foo {
fn set_val(&mut self, val: u32) {
self.val = val;
}
}
impl Drop for Foo {
fn drop(&mut self) {
println!("we drop {:?}", self);
}
}
fn need_callback(mut cb: Box<FnMut(u32)>) {
cb(17);
}
fn create() -> Rc<Foo> {
let rc = Rc::new(Foo { val: 5 });
let weak_rc = Rc::downgrade(&rc);
need_callback(Box::new(move |x| {
if let Some(mut rc) = weak_rc.upgrade() {
if let Some(foo) = Rc::get_mut(&mut rc) {
foo.set_val(x);
}
}
}));
rc
}
fn main() {
create();
}
In the real code, need_callback saves the callback to some place, but before that may call cb as need_callback does.
The code shows that std::rc::Rc is not suitable for this task because foo.set_val(x) is never called; I have two strong references and Rc::get_mut gives None in this case.
What smart pointer with reference counting should I use instead of std::rc::Rc to make it possible to call foo.set_val? Maybe it is possible to fix my code and still use std::rc::Rc?
After some thinking, I need something like std::rc::Rc, but weak references should prevent dropping. I can have two weak references and upgrade them to strong when I need mutability.
Because it is a singled-threaded program, I will have only strong reference at a time, so everything will work as expected.
Rc (and its multithreaded counterpart Arc) only concern themselves with ownership. Instead of a single owner, there is now joint ownership, tracked at runtime.
Mutability is a different concept, although closely related to ownership: if you own a value, then you have the ability to mutate it. This is why Rc::get_mut only works when there is a single strong reference - it's the same as saying there is a single owner.
If you need the ability to divide mutability in a way that doesn't match the structure of the program, you can use tools like Cell or RefCell for single-threaded programs:
use std::cell::RefCell;
fn create() -> Rc<RefCell<Foo>> {
let rc = Rc::new(RefCell::new(Foo { val: 5 }));
let weak_rc = Rc::downgrade(&rc);
need_callback(move |x| {
if let Some(rc) = weak_rc.upgrade() {
rc.borrow_mut().set_val(x);
}
});
rc
}
Or Mutex, RwLock, or an atomic type in multithreaded contexts:
use std::sync::Mutex;
fn create() -> Rc<Mutex<Foo>> {
let rc = Rc::new(Mutex::new(Foo { val: 5 }));
let weak_rc = Rc::downgrade(&rc);
need_callback(move |x| {
if let Some(rc) = weak_rc.upgrade() {
if let Ok(mut foo) = rc.try_lock() {
foo.set_val(x);
}
}
});
rc
}
These tools all defer the check that there is only a single mutable reference to runtime, instead of compile time.

Detecting new struct initialization

I'm coming from mostly OOP languages, so getting this concept to work in Rust kinda seems hard. I want to implement a basic counter that keeps count of how many "instances" I've made of that type, and keep them in a vector for later use.
I've tried many different things, first was making a static vector variable, but that cant be done due to it not allowing static stuff that have destructors.
This was my first try:
struct Entity {
name: String,
}
struct EntityCounter {
count: i64,
}
impl Entity {
pub fn init() {
let counter = EntityCounter { count: 0 };
}
pub fn new(name: String) {
println!("Entity named {} was made.", name);
counter += 1; // counter variable unaccessable (is there a way to make it global to the struct (?..idek))
}
}
fn main() {
Entity::init();
Entity::new("Hello".to_string());
}
Second:
struct Entity {
name: String,
counter: i32,
}
impl Entity {
pub fn new(self) {
println!("Entity named {} was made.", self.name);
self.counter = self.counter + 1;
}
}
fn main() {
Entity::new(Entity { name: "Test".to_string() });
}
None of those work, I was just trying out some concepts on how I could be able to implement such a feature.
Your problems appear to be somewhat more fundamental than what you describe. You're kind of throwing code at the wall to see what sticks, and that's simply not going to get you anywhere. I'd recommend reading the Rust Book completely before continuing. If you don't understand something in it, ask about it. As it stands, you're demonstrating you don't understand variable scoping, return types, how instance construction works, how statics work, and how parameters are passed. That's a really shaky base to try and build any understanding on.
In this particular case, you're asking for something that's deliberately not straightforward. You say you want a counter and a vector of instances. The counter is simple enough, but a vector of instances? Rust doesn't allow easy sharing like other languages, so how you go about doing that depends heavily on what it is you're actually intending to use this for.
What follows is a very rough guess at something that's maybe vaguely similar to what you want.
/*!
Because we need the `lazy_static` crate, you need to add the following to your
`Cargo.toml` file:
```cargo
[dependencies]
lazy_static = "0.2.1"
```
*/
#[macro_use] extern crate lazy_static;
mod entity {
use std::sync::{Arc, Weak, Mutex};
use std::sync::atomic;
pub struct Entity {
pub name: String,
}
impl Entity {
pub fn new(name: String) -> Arc<Self> {
println!("Entity named {} was made.", name);
let ent = Arc::new(Entity {
name: name,
});
bump_counter();
remember_instance(ent.clone());
ent
}
}
/*
The counter is simple enough, though I'm not clear on *why* you even want
it in the first place. You don't appear to be using it for anything...
*/
static COUNTER: atomic::AtomicUsize = atomic::ATOMIC_USIZE_INIT;
fn bump_counter() {
// Add one using the most conservative ordering.
COUNTER.fetch_add(1, atomic::Ordering::SeqCst);
}
pub fn get_counter() -> usize {
COUNTER.load(atomic::Ordering::SeqCst)
}
/*
There are *multiple* ways of doing this part, and you simply haven't given
enough information on what it is you're trying to do. This is, at best,
a *very* rough guess.
`Mutex` lets us safely mutate the vector from any thread, and `Weak`
prevents `INSTANCES` from keeping every instance alive *forever*. I mean,
maybe you *want* that, but you didn't specify.
Note that I haven't written a "cleanup" function here to remove dead weak
references.
*/
lazy_static! {
static ref INSTANCES: Mutex<Vec<Weak<Entity>>> = Mutex::new(vec![]);
}
fn remember_instance(entity: Arc<Entity>) {
// Downgrade to a weak reference. Type constraint is just for clarity.
let entity: Weak<Entity> = Arc::downgrade(&entity);
INSTANCES
// Lock mutex
.lock().expect("INSTANCES mutex was poisoned")
// Push entity
.push(entity);
}
pub fn get_instances() -> Vec<Arc<Entity>> {
/*
This is about as inefficient as I could write this, but again, without
knowing your access patterns, I can't really do any better.
*/
INSTANCES
// Lock mutex
.lock().expect("INSTANCES mutex was poisoned")
// Get a borrowing iterator from the Vec.
.iter()
/*
Convert each `&Weak<Entity>` into a fresh `Arc<Entity>`. If we
couldn't (because the weak ref is dead), just drop that element.
*/
.filter_map(|weak_entity| weak_entity.upgrade())
// Collect into a new `Vec`.
.collect()
}
}
fn main() {
use entity::Entity;
let e0 = Entity::new("Entity 0".to_string());
println!("e0: {}", e0.name);
{
let e1 = Entity::new("Entity 1".to_string());
println!("e1: {}", e1.name);
/*
`e1` is dropped here, which should cause the underlying `Entity` to
stop existing, since there are no other "strong" references to it.
*/
}
let e2 = Entity::new("Entity 2".to_string());
println!("e2: {}", e2.name);
println!("Counter: {}", entity::get_counter());
println!("Instances:");
for ent in entity::get_instances() {
println!("- {}", ent.name);
}
}

How can you make a safe static singleton in Rust?

This is something of a controversial topic, so let me start by explaining my use case, and then talk about the actual problem.
I find that for a bunch of unsafe things, it's important to make sure that you don't leak memory; this is actually quite easy to do if you start using transmute() and forget(). For example, passing a boxed instance to C code for an arbitrary amount of time, then fetching it back out and 'resurrecting it' by using transmute.
Imagine I have a safe wrapper for this sort of API:
trait Foo {}
struct CBox;
impl CBox {
/// Stores value in a bound C api, forget(value)
fn set<T: Foo>(value: T) {
// ...
}
/// Periodically call this and maybe get a callback invoked
fn poll(_: Box<Fn<(EventType, Foo), ()> + Send>) {
// ...
}
}
impl Drop for CBox {
fn drop(&mut self) {
// Safely load all saved Foo's here and discard them, preventing memory leaks
}
}
To test this is actually not leaking any memory, I want some tests like this:
#[cfg(test)]
mod test {
struct IsFoo;
impl Foo for IsFoo {}
impl Drop for IsFoo {
fn drop(&mut self) {
Static::touch();
}
}
#[test]
fn test_drops_actually_work() {
guard = Static::lock(); // Prevent any other use of Static concurrently
Static::reset(); // Set to zero
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(Static::get() == 2); // Assert that all expected drops were invoked
guard.release();
}
}
How can you create this type of static singleton object?
It must use a Semaphore style guard lock to ensure that multiple tests do not concurrently run, and then unsafely access some kind of static mutable value.
I thought perhaps this implementation would work, but practically speaking it fails because occasionally race conditions result in a duplicate execution of init:
/// Global instance
static mut INSTANCE_LOCK: bool = false;
static mut INSTANCE: *mut StaticUtils = 0 as *mut StaticUtils;
static mut WRITE_LOCK: *mut Semaphore = 0 as *mut Semaphore;
static mut LOCK: *mut Semaphore = 0 as *mut Semaphore;
/// Generate instances if they don't exist
unsafe fn init() {
if !INSTANCE_LOCK {
INSTANCE_LOCK = true;
INSTANCE = transmute(box StaticUtils::new());
WRITE_LOCK = transmute(box Semaphore::new(1));
LOCK = transmute(box Semaphore::new(1));
}
}
Note specifically that unlike a normal program where you can be certain that your entry point (main) is always running in a single task, the test runner in Rust does not offer any kind of single entry point like this.
Other, obviously, than specifying the maximum number of tasks; given dozens of tests, only a handful need to do this sort of thing, and it's slow and pointless to limit the test task pool to one just for this one case.
It looks like a use case for std::sync::Once:
use std::sync::{Once, ONCE_INIT};
static INIT: Once = ONCE_INIT;
Then in your tests call
INIT.doit(|| unsafe { init(); });
Once guarantees that your init will only be executed once, no matter how many times you call INIT.doit().
See also lazy_static, which makes things a little more ergonomic. It does essentially the same thing as a static Once for each variable, but wraps it in a type that implements Deref so that you can access it like a normal reference.
Usage looks like this (from the documentation):
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
lazy_static! {
static ref HASHMAP: HashMap<u32, &'static str> = {
let mut m = HashMap::new();
m.insert(0, "foo");
m.insert(1, "bar");
m.insert(2, "baz");
m
};
static ref COUNT: usize = HASHMAP.len();
static ref NUMBER: u32 = times_two(21);
}
fn times_two(n: u32) -> u32 { n * 2 }
fn main() {
println!("The map has {} entries.", *COUNT);
println!("The entry for `0` is \"{}\".", HASHMAP.get(&0).unwrap());
println!("A expensive calculation on a static results in: {}.", *NUMBER);
}
Note that autoderef means that you don't even have to use * whenever you call a method on your static variable. The variable will be initialized the first time it's Deref'd.
However, lazy_static variables are immutable (since they're behind a reference). If you want a mutable static, you'll need to use a Mutex:
lazy_static! {
static ref VALUE: Mutex<u64>;
}
impl Drop for IsFoo {
fn drop(&mut self) {
let mut value = VALUE.lock().unwrap();
*value += 1;
}
}
#[test]
fn test_drops_actually_work() {
// Have to drop the mutex guard to unlock, so we put it in its own scope
{
*VALUE.lock().unwrap() = 0;
}
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(*VALUE.lock().unwrap() == 2); // Assert that all expected drops were invoked
}
If you're willing to use nightly Rust you can use SyncLazy instead of the external lazy_static crate:
#![feature(once_cell)]
use std::collections::HashMap;
use std::lazy::SyncLazy;
static HASHMAP: SyncLazy<HashMap<i32, String>> = SyncLazy::new(|| {
println!("initializing");
let mut m = HashMap::new();
m.insert(13, "Spica".to_string());
m.insert(74, "Hoyten".to_string());
m
});
fn main() {
println!("ready");
std::thread::spawn(|| {
println!("{:?}", HASHMAP.get(&13));
}).join().unwrap();
println!("{:?}", HASHMAP.get(&74));
// Prints:
// ready
// initializing
// Some("Spica")
// Some("Hoyten")
}

Resources