This is something of a controversial topic, so let me start by explaining my use case, and then talk about the actual problem.
I find that for a bunch of unsafe things, it's important to make sure that you don't leak memory; this is actually quite easy to do if you start using transmute() and forget(). For example, passing a boxed instance to C code for an arbitrary amount of time, then fetching it back out and 'resurrecting it' by using transmute.
Imagine I have a safe wrapper for this sort of API:
trait Foo {}
struct CBox;
impl CBox {
/// Stores value in a bound C api, forget(value)
fn set<T: Foo>(value: T) {
// ...
}
/// Periodically call this and maybe get a callback invoked
fn poll(_: Box<Fn<(EventType, Foo), ()> + Send>) {
// ...
}
}
impl Drop for CBox {
fn drop(&mut self) {
// Safely load all saved Foo's here and discard them, preventing memory leaks
}
}
To test this is actually not leaking any memory, I want some tests like this:
#[cfg(test)]
mod test {
struct IsFoo;
impl Foo for IsFoo {}
impl Drop for IsFoo {
fn drop(&mut self) {
Static::touch();
}
}
#[test]
fn test_drops_actually_work() {
guard = Static::lock(); // Prevent any other use of Static concurrently
Static::reset(); // Set to zero
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(Static::get() == 2); // Assert that all expected drops were invoked
guard.release();
}
}
How can you create this type of static singleton object?
It must use a Semaphore style guard lock to ensure that multiple tests do not concurrently run, and then unsafely access some kind of static mutable value.
I thought perhaps this implementation would work, but practically speaking it fails because occasionally race conditions result in a duplicate execution of init:
/// Global instance
static mut INSTANCE_LOCK: bool = false;
static mut INSTANCE: *mut StaticUtils = 0 as *mut StaticUtils;
static mut WRITE_LOCK: *mut Semaphore = 0 as *mut Semaphore;
static mut LOCK: *mut Semaphore = 0 as *mut Semaphore;
/// Generate instances if they don't exist
unsafe fn init() {
if !INSTANCE_LOCK {
INSTANCE_LOCK = true;
INSTANCE = transmute(box StaticUtils::new());
WRITE_LOCK = transmute(box Semaphore::new(1));
LOCK = transmute(box Semaphore::new(1));
}
}
Note specifically that unlike a normal program where you can be certain that your entry point (main) is always running in a single task, the test runner in Rust does not offer any kind of single entry point like this.
Other, obviously, than specifying the maximum number of tasks; given dozens of tests, only a handful need to do this sort of thing, and it's slow and pointless to limit the test task pool to one just for this one case.
It looks like a use case for std::sync::Once:
use std::sync::{Once, ONCE_INIT};
static INIT: Once = ONCE_INIT;
Then in your tests call
INIT.doit(|| unsafe { init(); });
Once guarantees that your init will only be executed once, no matter how many times you call INIT.doit().
See also lazy_static, which makes things a little more ergonomic. It does essentially the same thing as a static Once for each variable, but wraps it in a type that implements Deref so that you can access it like a normal reference.
Usage looks like this (from the documentation):
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
lazy_static! {
static ref HASHMAP: HashMap<u32, &'static str> = {
let mut m = HashMap::new();
m.insert(0, "foo");
m.insert(1, "bar");
m.insert(2, "baz");
m
};
static ref COUNT: usize = HASHMAP.len();
static ref NUMBER: u32 = times_two(21);
}
fn times_two(n: u32) -> u32 { n * 2 }
fn main() {
println!("The map has {} entries.", *COUNT);
println!("The entry for `0` is \"{}\".", HASHMAP.get(&0).unwrap());
println!("A expensive calculation on a static results in: {}.", *NUMBER);
}
Note that autoderef means that you don't even have to use * whenever you call a method on your static variable. The variable will be initialized the first time it's Deref'd.
However, lazy_static variables are immutable (since they're behind a reference). If you want a mutable static, you'll need to use a Mutex:
lazy_static! {
static ref VALUE: Mutex<u64>;
}
impl Drop for IsFoo {
fn drop(&mut self) {
let mut value = VALUE.lock().unwrap();
*value += 1;
}
}
#[test]
fn test_drops_actually_work() {
// Have to drop the mutex guard to unlock, so we put it in its own scope
{
*VALUE.lock().unwrap() = 0;
}
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(*VALUE.lock().unwrap() == 2); // Assert that all expected drops were invoked
}
If you're willing to use nightly Rust you can use SyncLazy instead of the external lazy_static crate:
#![feature(once_cell)]
use std::collections::HashMap;
use std::lazy::SyncLazy;
static HASHMAP: SyncLazy<HashMap<i32, String>> = SyncLazy::new(|| {
println!("initializing");
let mut m = HashMap::new();
m.insert(13, "Spica".to_string());
m.insert(74, "Hoyten".to_string());
m
});
fn main() {
println!("ready");
std::thread::spawn(|| {
println!("{:?}", HASHMAP.get(&13));
}).join().unwrap();
println!("{:?}", HASHMAP.get(&74));
// Prints:
// ready
// initializing
// Some("Spica")
// Some("Hoyten")
}
Related
I'm building an interpreter with a garbage collector. I want a thread-local nursery region, and a shared older region. I am having trouble setting up the nursery. I have:
const NurserySize : usize = 25000;
#[thread_local]
static mut NurseryMemory : [usize;NurserySize] = [0;NurserySize];
thread_local! {
static Nursery: AllocableRegion = AllocableRegion::makeLocal(unsafe{&mut NurseryMemory});
}
#[cfg(test)]
mod testMemory {
use super::*;
#[test]
fn test1() {
Nursery.with(|n| n.allocObject(10));
}
}
First question is why do I need the unsafe - NurseryMemory is thread local, so access can't be unsafe?
Second question is how can I actually use this? The code is at playground, but it doesn't compile and attempts I've made to fix it seem to make it worse.
1. Why is unsafe required to get a reference to a mutable ThreadLocal?
The same reason that you need unsafe for a normal mutable static,
you would be able to create aliasing mut pointers in safe code.
The following incorrectly creates two mutable references to the mutable thread local.
#![feature(thread_local)]
#[thread_local]
static mut SomeValue: Result<&str, usize> = Ok("hello world");
pub fn main() {
let first = unsafe {&mut SomeValue};
let second = unsafe {&mut SomeValue};
if let Ok(string) = first {
*second = Err(0); // should invalidate the string reference, but it doesn't
println!("{}", string) // as first and second are considered to be disjunct
}
}
first wouldn't even need to be a mutable reference for this to be a problem.
2. How to fix the code?
You could use a RefCell around the AllocatableRegion to dynamically enforce the borrowing of the inner value.
const NurserySize : usize = 25000;
#[thread_local]
static mut NurseryMemory : [usize;NurserySize] = [0;NurserySize];
thread_local! {
static Nursery: RefCell<AllocableRegion> = RefCell::new(AllocableRegion::makeLocal(unsafe{&mut NurseryMemory}));
}
#[cfg(test)]
mod testMemory {
use super::*;
#[test]
fn test1() {
Nursery.with(|n| n.borrow_mut().allocObject(10));
}
}
You don't need unsafe to make a mutable thread-local struct in rust. However, Nursery does need to be a RefCell. This is sufficient:
use std::cell::RefCell;
const NURSERY_SIZE: usize = 300;
thread_local! {
static NURSERY: RefCell<[usize; NURSERY_SIZE]> = RefCell::new([0; NURSERY_SIZE]);
}
#[cfg(test)]
mod test_memory {
use super::*;
#[test]
fn test1() {
NURSERY.with(|n| n.borrow_mut()[10] = 20);
}
}
Rust Playground link
I have an object that I know that is inside an Arc because all the instances are always Arced. I would like to be able to pass a cloned Arc of myself in a function call. The thing I am calling will call me back later on other threads.
In C++, there is a standard mixin called enable_shared_from_this. It enables me to do exactly this
class Bus : public std::enable_shared_from_this<Bus>
{
....
void SetupDevice(Device device,...)
{
device->Attach(shared_from_this());
}
}
If this object is not under shared_ptr management (the closest C++ has to Arc) then this will fail at run time.
I cannot find an equivalent.
EDIT:
Here is an example of why its needed. I have a timerqueue library. It allows a client to request an arbitrary closure to be run at some point in the future. The code is run on a dedicated thread. To use it you must pass a closure of the function you want to be executed later.
use std::time::{Duration, Instant};
use timerqueue::*;
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
// inline me keeper cos not on github
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
// -----------------------------------
struct Test {
data:String,
me: MeKeeper<Self>,
}
impl Test {
pub fn new() -> Arc<Test>{
let arc = Arc::new(Self {
me: MeKeeper::new(),
data: "Yo".to_string()
});
arc.me.save(&arc);
arc
}
fn task(&self) {
println!("{}", self.data);
}
// in real use case the TQ and a ton of other status data is passed in the new call for Test
// to keep things simple here the 'container' passes tq as an arg
pub fn do_stuff(&self, tq: &TimerQueue) {
// stuff includes a async task that must be done in 1 second
//.....
let me = self.me.get().clone();
tq.queue(
Box::new(move || me.task()),
"x".to_string(),
Instant::now() + Duration::from_millis(1000),
);
}
}
fn main() {
// in real case (PDP11 emulator) there is a Bus class owning tons of objects thats
// alive for the whole duration
let tq = Arc::new(TimerQueue::new());
let test = Test::new();
test.do_stuff(&*tq);
// just to keep everything alive while we wait
let mut input = String::new();
std::io::stdin().read_line(&mut input).unwrap();
}
cargo toml
[package]
name = "tqclient"
version = "0.1.0"
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
timerqueue = { git = "https://github.com/pm100/timerqueue.git" }
parking_lot = "0.11"
There is no way to go from a &self to the Arc that self is stored in. This is because:
Rust references have additional assumptions compared to C++ references that would make such a conversion undefined behavior.
Rust's implementation of Arc does not even expose the information necessary to determine whether self is stored in an Arc or not.
Luckily, there is an alternative approach. Instead of creating a &self to the value inside the Arc, and passing that to the method, pass the Arc directly to the method that needs to access it. You can do that like this:
use std::sync::Arc;
struct Shared {
field: String,
}
impl Shared {
fn print_field(self: Arc<Self>) {
let clone: Arc<Shared> = self.clone();
println!("{}", clone.field);
}
}
Then the print_field function can only be called on an Shared encapsulated in an Arc.
having found that I needed this three times in recent days I decided to stop trying to come up with other designs. Maybe poor data design as far as rust is concerned but I needed it.
Works by changing the new function of the types using it to return an Arc rather than a raw self. All my objects are arced anyway, before they were arced by the caller, now its forced.
mini util library called mekeeper
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
to use it
pub struct Test {
me: MeKeeper<Self>,
foo:i8,
}
impl Test {
pub fn new() -> Arc<Self> {
let arc = Arc::new(Test {
me: MeKeeper::new(),
foo:42
});
arc.me.save(&arc);
arc
}
}
now when an instance of Test wants to call a function that requires it to pass in an Arc it does:
fn nargle(){
let me = me.get();
Ooddle::fertang(me,42);// fertang needs an Arc<T>
}
the weak use is what the shared_from_this does so as to prevent refcount deadlocks, I stole that idea.
The unreachable path is safe because the only place that can call MeKeeper::get is the instance of T (Test here) that owns it and that call can only happen if the T instance is alive. Hence no none return from weak::upgrade
This is something of a controversial topic, so let me start by explaining my use case, and then talk about the actual problem.
I find that for a bunch of unsafe things, it's important to make sure that you don't leak memory; this is actually quite easy to do if you start using transmute() and forget(). For example, passing a boxed instance to C code for an arbitrary amount of time, then fetching it back out and 'resurrecting it' by using transmute.
Imagine I have a safe wrapper for this sort of API:
trait Foo {}
struct CBox;
impl CBox {
/// Stores value in a bound C api, forget(value)
fn set<T: Foo>(value: T) {
// ...
}
/// Periodically call this and maybe get a callback invoked
fn poll(_: Box<Fn<(EventType, Foo), ()> + Send>) {
// ...
}
}
impl Drop for CBox {
fn drop(&mut self) {
// Safely load all saved Foo's here and discard them, preventing memory leaks
}
}
To test this is actually not leaking any memory, I want some tests like this:
#[cfg(test)]
mod test {
struct IsFoo;
impl Foo for IsFoo {}
impl Drop for IsFoo {
fn drop(&mut self) {
Static::touch();
}
}
#[test]
fn test_drops_actually_work() {
guard = Static::lock(); // Prevent any other use of Static concurrently
Static::reset(); // Set to zero
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(Static::get() == 2); // Assert that all expected drops were invoked
guard.release();
}
}
How can you create this type of static singleton object?
It must use a Semaphore style guard lock to ensure that multiple tests do not concurrently run, and then unsafely access some kind of static mutable value.
I thought perhaps this implementation would work, but practically speaking it fails because occasionally race conditions result in a duplicate execution of init:
/// Global instance
static mut INSTANCE_LOCK: bool = false;
static mut INSTANCE: *mut StaticUtils = 0 as *mut StaticUtils;
static mut WRITE_LOCK: *mut Semaphore = 0 as *mut Semaphore;
static mut LOCK: *mut Semaphore = 0 as *mut Semaphore;
/// Generate instances if they don't exist
unsafe fn init() {
if !INSTANCE_LOCK {
INSTANCE_LOCK = true;
INSTANCE = transmute(box StaticUtils::new());
WRITE_LOCK = transmute(box Semaphore::new(1));
LOCK = transmute(box Semaphore::new(1));
}
}
Note specifically that unlike a normal program where you can be certain that your entry point (main) is always running in a single task, the test runner in Rust does not offer any kind of single entry point like this.
Other, obviously, than specifying the maximum number of tasks; given dozens of tests, only a handful need to do this sort of thing, and it's slow and pointless to limit the test task pool to one just for this one case.
It looks like a use case for std::sync::Once:
use std::sync::{Once, ONCE_INIT};
static INIT: Once = ONCE_INIT;
Then in your tests call
INIT.doit(|| unsafe { init(); });
Once guarantees that your init will only be executed once, no matter how many times you call INIT.doit().
See also lazy_static, which makes things a little more ergonomic. It does essentially the same thing as a static Once for each variable, but wraps it in a type that implements Deref so that you can access it like a normal reference.
Usage looks like this (from the documentation):
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
lazy_static! {
static ref HASHMAP: HashMap<u32, &'static str> = {
let mut m = HashMap::new();
m.insert(0, "foo");
m.insert(1, "bar");
m.insert(2, "baz");
m
};
static ref COUNT: usize = HASHMAP.len();
static ref NUMBER: u32 = times_two(21);
}
fn times_two(n: u32) -> u32 { n * 2 }
fn main() {
println!("The map has {} entries.", *COUNT);
println!("The entry for `0` is \"{}\".", HASHMAP.get(&0).unwrap());
println!("A expensive calculation on a static results in: {}.", *NUMBER);
}
Note that autoderef means that you don't even have to use * whenever you call a method on your static variable. The variable will be initialized the first time it's Deref'd.
However, lazy_static variables are immutable (since they're behind a reference). If you want a mutable static, you'll need to use a Mutex:
lazy_static! {
static ref VALUE: Mutex<u64>;
}
impl Drop for IsFoo {
fn drop(&mut self) {
let mut value = VALUE.lock().unwrap();
*value += 1;
}
}
#[test]
fn test_drops_actually_work() {
// Have to drop the mutex guard to unlock, so we put it in its own scope
{
*VALUE.lock().unwrap() = 0;
}
{
let c = CBox;
c.set(IsFoo);
c.set(IsFoo);
c.poll(/*...*/);
}
assert!(*VALUE.lock().unwrap() == 2); // Assert that all expected drops were invoked
}
If you're willing to use nightly Rust you can use SyncLazy instead of the external lazy_static crate:
#![feature(once_cell)]
use std::collections::HashMap;
use std::lazy::SyncLazy;
static HASHMAP: SyncLazy<HashMap<i32, String>> = SyncLazy::new(|| {
println!("initializing");
let mut m = HashMap::new();
m.insert(13, "Spica".to_string());
m.insert(74, "Hoyten".to_string());
m
});
fn main() {
println!("ready");
std::thread::spawn(|| {
println!("{:?}", HASHMAP.get(&13));
}).join().unwrap();
println!("{:?}", HASHMAP.get(&74));
// Prints:
// ready
// initializing
// Some("Spica")
// Some("Hoyten")
}
I want to have a structure on the heap with two references; one for me and another from a closure. Note that the code is for the single-threaded case:
use std::rc::Rc;
#[derive(Debug)]
struct Foo {
val: u32,
}
impl Foo {
fn set_val(&mut self, val: u32) {
self.val = val;
}
}
impl Drop for Foo {
fn drop(&mut self) {
println!("we drop {:?}", self);
}
}
fn need_callback(mut cb: Box<FnMut(u32)>) {
cb(17);
}
fn create() -> Rc<Foo> {
let rc = Rc::new(Foo { val: 5 });
let weak_rc = Rc::downgrade(&rc);
need_callback(Box::new(move |x| {
if let Some(mut rc) = weak_rc.upgrade() {
if let Some(foo) = Rc::get_mut(&mut rc) {
foo.set_val(x);
}
}
}));
rc
}
fn main() {
create();
}
In the real code, need_callback saves the callback to some place, but before that may call cb as need_callback does.
The code shows that std::rc::Rc is not suitable for this task because foo.set_val(x) is never called; I have two strong references and Rc::get_mut gives None in this case.
What smart pointer with reference counting should I use instead of std::rc::Rc to make it possible to call foo.set_val? Maybe it is possible to fix my code and still use std::rc::Rc?
After some thinking, I need something like std::rc::Rc, but weak references should prevent dropping. I can have two weak references and upgrade them to strong when I need mutability.
Because it is a singled-threaded program, I will have only strong reference at a time, so everything will work as expected.
Rc (and its multithreaded counterpart Arc) only concern themselves with ownership. Instead of a single owner, there is now joint ownership, tracked at runtime.
Mutability is a different concept, although closely related to ownership: if you own a value, then you have the ability to mutate it. This is why Rc::get_mut only works when there is a single strong reference - it's the same as saying there is a single owner.
If you need the ability to divide mutability in a way that doesn't match the structure of the program, you can use tools like Cell or RefCell for single-threaded programs:
use std::cell::RefCell;
fn create() -> Rc<RefCell<Foo>> {
let rc = Rc::new(RefCell::new(Foo { val: 5 }));
let weak_rc = Rc::downgrade(&rc);
need_callback(move |x| {
if let Some(rc) = weak_rc.upgrade() {
rc.borrow_mut().set_val(x);
}
});
rc
}
Or Mutex, RwLock, or an atomic type in multithreaded contexts:
use std::sync::Mutex;
fn create() -> Rc<Mutex<Foo>> {
let rc = Rc::new(Mutex::new(Foo { val: 5 }));
let weak_rc = Rc::downgrade(&rc);
need_callback(move |x| {
if let Some(rc) = weak_rc.upgrade() {
if let Ok(mut foo) = rc.try_lock() {
foo.set_val(x);
}
}
});
rc
}
These tools all defer the check that there is only a single mutable reference to runtime, instead of compile time.
I'm writing a system where I have a collection of Objects, and each Object has a unique integral ID. Here's how I would do it in C++:
class Object {
public:
Object(): id_(nextId_++) { }
private:
int id_;
static int nextId_;
}
int Object::nextId_ = 1;
This is obviously not thread_safe, but if I wanted it to be, I could make nextId_ an std::atomic_int, or wrap a mutex around the nextId_++ expression.
How would I do this in (preferably safe) Rust? There's no static struct members, nor are global mutable variables safe. I could always pass nextId into the new function, but these objects are going to be allocated in a number of places, and I would prefer not to pipe the nextId number hither and yon. Thoughts?
Atomic variables can live in statics, so you can use it relatively straightforwardly (the downside is that you have global state).
Example code: (playground link)
use std::{
sync::atomic::{AtomicUsize, Ordering},
thread,
};
static OBJECT_COUNTER: AtomicUsize = AtomicUsize::new(0);
#[derive(Debug)]
struct Object(usize);
impl Object {
fn new() -> Self {
Object(OBJECT_COUNTER.fetch_add(1, Ordering::SeqCst))
}
}
fn main() {
let threads = (0..10)
.map(|_| thread::spawn(|| Object::new()))
.collect::<Vec<_>>();
for t in threads {
println!("{:?}", t.join().unwrap());
}
}
nor are global mutable variables safe
Your C++ example seems like it would have thread-safety issues, but I don't know enough C++ to be sure.
However, only unsynchronized global mutable variables are trouble. If you don't care about cross-thread issues, you can use a thread-local:
use std::cell::Cell;
#[derive(Debug)]
struct Monster {
id: usize,
health: u8,
}
thread_local!(static MONSTER_ID: Cell<usize> = Cell::new(0));
impl Monster {
fn new(health: u8) -> Monster {
MONSTER_ID.with(|thread_id| {
let id = thread_id.get();
thread_id.set(id + 1);
Monster { id, health }
})
}
}
fn main() {
let gnome = Monster::new(41);
let troll = Monster::new(42);
println!("gnome {:?}", gnome);
println!("troll {:?}", troll);
}
If you do want something that works better with multiple threads, check out bluss' answer, which shows how to use an atomic variable.