Rust fast stack implementation without unnecessary memset?

I need a fast stack in Rust. Millions of these need to be created/destroyed per second, and each of them needs only a fixed depth. I'm trying to squeeze out as much speed as I can. I came up with the following (basically a textbook stack implementation):
const L: usize = 1024;
pub struct Stack {
xs: [(u64, u64, u64, u64); L],
sz: usize
}
impl Stack {
pub fn new() -> Self {
Self { xs: [(0, 0 ,0, 0); L], sz: 0 }
}
pub fn push(&mut self, item: (u64, u64, u64, u64)) -> bool {
if (self.sz + 1) <= L {
self.xs[self.sz] = item;
self.sz += 1;
true
} else {
false
}
}
pub fn pop(&mut self) -> Option<(u64, u64, u64, u64)> {
(self.sz > 0).then(|| {
self.sz -= 1;
self.xs[self.sz]
})
}
}
The problem is the memset this generates for the array, which is unnecessary. So I tried to get rid of it:
pub fn new2() -> Self {
let xs = std::array::from_fn(|_| unsafe { MaybeUninit::uninit().assume_init() });
Self { xs, sz: 0 }
}
This gets rid of the memset, but now I have a warning:
|
18 | let xs = std::array::from_fn(|_| unsafe { MaybeUninit::uninit().assume_init() });
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| this code causes undefined behavior when executed
| help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
|
= note: integers must not be uninitialized
= note: `#[warn(invalid_value)]` on by default
If uninitialized integers cause undefined behavior, is it not possible to create this kind of stack, where the logic of the stack guarantees proper behavior and I avoid unnecessary memory operations?

You need to use MaybeUninit all the way through. Change your array to an array of MaybeUninits:
use std::mem::MaybeUninit;
const L: usize = 1024;
pub struct Stack {
xs: [MaybeUninit<(u64, u64, u64, u64)>; L],
sz: usize
}
// From standard library
// https://doc.rust-lang.org/stable/src/core/mem/maybe_uninit.rs.html#350-353
#[must_use]
#[inline(always)]
pub const fn uninit_array<const N: usize, T>() -> [MaybeUninit<T>; N] {
// SAFETY: An uninitialized `[MaybeUninit<_>; LEN]` is valid.
unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() }
}
impl Stack {
pub fn new() -> Self {
Self { xs: uninit_array(), sz: 0 }
}
pub fn push(&mut self, item: (u64, u64, u64, u64)) -> bool {
if (self.sz + 1) <= L {
self.xs[self.sz].write(item);
self.sz += 1;
true
} else {
false
}
}
pub fn pop(&mut self) -> Option<(u64, u64, u64, u64)> {
(self.sz > 0).then(|| {
self.sz -= 1;
// Safety: The value has been initialized
unsafe {
self.xs[self.sz].assume_init()
}
})
}
}
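For completeness, a quick usage sketch of my own (not part of the original answer); it needs no extra drop handling because the element type is Copy:
fn main() {
    let mut s = Stack::new();
    assert!(s.push((1, 2, 3, 4)));
    assert!(s.push((5, 6, 7, 8)));
    // LIFO order, and pop() returns None once the stack is empty
    assert_eq!(s.pop(), Some((5, 6, 7, 8)));
    assert_eq!(s.pop(), Some((1, 2, 3, 4)));
    assert_eq!(s.pop(), None);
}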

Cannot return reference to temporary value with RwLock and iterators

I haven't found an answer to this in other questions.
I have reduced my problem to the following:
use std::sync::RwLock;
pub fn main() {
iter_lock().for_each(|v| {
println!("{}", v);
});
}
fn get_lock<'a>() -> &'a RwLock<Vec<u32>> {
static mut lock: RwLock<Vec<u32>> = RwLock::new(Vec::new());
unsafe { &lock }
}
fn iter_lock<'a>() -> impl std::iter::Iterator<Item = &'a u32> {
get_lock().read().unwrap().iter()
}
The code above will not compile and give the following error:
error[E0515]: cannot return reference to temporary value
--> src/main.rs:15:5
|
15 | get_lock().read().unwrap().iter()
| --------------------------^^^^^^^
| |
| returns a reference to data owned by the current function
| temporary value created here
|
= help: use `.collect()` to allocate the iterator
Note that the static mut is not necessary in the code above, but I need it because I have to define the variable inside an impl block.
I need to return an iterator, not a Vec, because I am trying to avoid any allocations, and this function will always be used for iteration.
How can I solve this issue? I'm not afraid of using unsafe code, so unsafe suggestions are also welcome.
You can try something like this:
use std::sync::{RwLock, RwLockReadGuard};
pub fn main() {
let data = Data::new(&[1, 2, 3]);
data.iter().for_each(|x| println!("{:?}", x));
}
struct Data {
inner: RwLock<Vec<u32>>,
}
impl Data {
fn new(vec: &[u32]) -> Self {
Self {
inner: RwLock::new(vec.to_vec()),
}
}
fn iter(&self) -> Iter<'_> {
let d = self.inner.read().unwrap();
Iter::new(d)
}
}
struct Iter<'a> {
inner: RwLockReadGuard<'a, Vec<u32>>,
current_index: usize,
}
impl<'a> Iter<'a> {
pub fn new(inner: RwLockReadGuard<'a, Vec<u32>>) -> Iter<'a> {
Self {
inner,
current_index: 0,
}
}
}
impl Iterator for Iter<'_> {
type Item = u32;
fn next(&mut self) -> Option<Self::Item> {
if self.current_index >= self.inner.len() {
return None;
}
let item = &self.inner[self.current_index];
self.current_index += 1;
Some(*item)
}
}
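The same pattern generalizes over the element type; here is a minimal sketch of mine (not from the original answer) that makes it explicit that the read guard lives inside the iterator, so the read lock is held until the iterator is dropped:
use std::sync::{RwLock, RwLockReadGuard};

// The guard is stored in the iterator, so the lock stays held while iterating.
struct GuardIter<'a, T> {
    guard: RwLockReadGuard<'a, Vec<T>>,
    index: usize,
}

impl<T: Copy> Iterator for GuardIter<'_, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        let item = self.guard.get(self.index).copied();
        self.index += 1;
        item
    }
}

fn iter_locked<T: Copy>(lock: &RwLock<Vec<T>>) -> GuardIter<'_, T> {
    GuardIter { guard: lock.read().unwrap(), index: 0 }
}

fn main() {
    let lock = RwLock::new(vec![1u32, 2, 3]);
    iter_locked(&lock).for_each(|v| println!("{v}"));
}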

Why does peek return unexpected results in my circular linked list?

I'm attempting to implement a circular linked list in Rust. I've read Too Many Linked Lists and multiple resources on Rust's Pin type, and I've attempted to build a circular linked list from scratch, but I'm baffled by what the compiler spits out.
The high-level concept is that I want a linked list struct and a node struct defined as follows:
use std::pin::Pin;
use std::marker::PhantomPinned;
use std::ptr;
#[derive(Debug)]
struct Node<T> {
value: T,
next: *mut Node<T>,
_pin: PhantomPinned,
}
#[derive(Debug)]
pub struct CircularLinkedList<T> {
last: *mut Node<T>,
size: usize
}
For the first node pushed into the list, I want node.next to point to the node struct itself, i.e. I want the logic for pushing a new node onto the list to follow this Java implementation:
public void push(Item item) {
Node node = new Node();
node.item = item;
if (isEmpty()) {
last = node;
last.next = last;
} else {
node.next = last.next;
last.next = node;
last = node;
}
size++;
}
The Rust implementation I have so far is:
impl<T> CircularLinkedList<T> where T: std::fmt::Debug {
pub fn new() -> Self {
CircularLinkedList { last: ptr::null_mut(), size: 0 }
}
pub fn enqueue(&mut self, value: T) {
self.size = self.size + 1;
if self.last.is_null() {
let new_last = Node {
value,
next: ptr::null_mut(),
_pin: PhantomPinned,
};
// Pin new node
let mut pinned_new_last = Box::pin(new_last);
// Get raw pointer to new node
let ptr = pinned_new_last.as_ref().get_ref() as *const Node<T> as *mut Node<T>;
// Assign next new_last.next value of 'ptr'
unsafe {
let mut_ref: Pin<&mut Node<T>> = Pin::as_mut(&mut pinned_new_last);
Pin::get_unchecked_mut(mut_ref).next = ptr;
}
self.last = ptr;
}
else {
println!("else");
}
}
pub fn peek(&self) -> Option<&T> {
unsafe {
let a = &(*self.last).value;
Some(a)
}
}
}
In the following main() function, I'm baffled as to why peek() doesn't return Some(1) and why Miri gives me a memory error:
fn main () {
let mut list = LinkedList::CircularLinkedList::new();
list.enqueue(1);
println!("{:?}", list.peek());
// Some(2043) !????????, expecting Some(1)
}
Warning: you should not write unsafe code unless you completely understand all of its details! (Experimentation is fine, though.)
You have a use-after-free. At the end of enqueue(), the destructor of the Box runs for pinned_new_last and frees its memory, so your linked list now points to a freed node. Adding std::mem::forget() is enough in this case:
pub fn enqueue(&mut self, value: T) {
self.size = self.size + 1;
if self.last.is_null() {
let new_last = Node {
value,
next: ptr::null_mut(),
_pin: PhantomPinned,
};
// Pin new node
let mut pinned_new_last = Box::pin(new_last);
// Get raw pointer to new node
let ptr = pinned_new_last.as_ref().get_ref() as *const Node<T> as *mut Node<T>;
// Assign next new_last.next value of 'ptr'
unsafe {
let mut_ref: Pin<&mut Node<T>> = Pin::as_mut(&mut pinned_new_last);
Pin::get_unchecked_mut(mut_ref).next = ptr;
}
std::mem::forget(pinned_new_last);
self.last = ptr;
} else {
println!("else");
}
}
Of course, this leaks the memory. You have to take care to free it in a destructor, something like:
impl<T> Drop for CircularLinkedList<T> {
fn drop(&mut self) {
let mut last = self.last;
let size = self.size;
// Panic safety
self.size = 0;
self.last = std::ptr::null_mut();
for _ in 0..size {
let current = last;
unsafe {
last = (*last).next;
drop(Box::from_raw(current)); // drop the node and free its allocation
}
}
}
}
Also note that while storing the node as a *mut pointer is sound, actually mutating through it is UB, because the pointer was derived from a shared reference, so this is really a code smell.
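To illustrate that point, here is a minimal sketch of my own (not part of the original answer): if the raw pointer comes from Box::into_raw, it keeps mutable provenance, there is no forget()/Pin dance, and mutation through it is fine. It only covers enqueue/peek for a list of my own naming, not the full list from the question:
use std::ptr;

// Sketch only: raw pointers from Box::into_raw, no Pin/PhantomPinned needed,
// because the node already lives at a stable heap address.
struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

pub struct CircularList<T> {
    last: *mut Node<T>,
    size: usize,
}

impl<T> CircularList<T> {
    pub fn new() -> Self {
        CircularList { last: ptr::null_mut(), size: 0 }
    }
    pub fn enqueue(&mut self, value: T) {
        // Box::into_raw hands us ownership of the allocation as a *mut
        // with mutable provenance; no shared reference is involved.
        let ptr = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        unsafe {
            if self.last.is_null() {
                (*ptr).next = ptr; // a single node points to itself
            } else {
                (*ptr).next = (*self.last).next;
                (*self.last).next = ptr;
            }
        }
        self.last = ptr;
        self.size += 1;
    }
    pub fn peek(&self) -> Option<&T> {
        if self.last.is_null() { None } else { unsafe { Some(&(*self.last).value) } }
    }
}

impl<T> Drop for CircularList<T> {
    fn drop(&mut self) {
        let mut current = self.last;
        for _ in 0..self.size {
            unsafe {
                let next = (*current).next;
                drop(Box::from_raw(current)); // free each node exactly once
                current = next;
            }
        }
    }
}

fn main() {
    let mut list = CircularList::new();
    list.enqueue(1);
    println!("{:?}", list.peek()); // Some(1)
}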
I've taken some time to learn more about Rust and was able to come up with the following implementation, which doesn't produce any Miri errors:
pub mod LinkedList {
use std::marker::PhantomPinned;
use std::ptr;
use std::mem::ManuallyDrop;
// _pin -> make the struct implement !Unpin, and make it pinnable
#[derive(Debug)]
struct Node<T> where T: std::fmt::Debug {
value: T,
next: *mut Node<T>,
_pin: PhantomPinned,
}
#[derive(Debug)]
pub struct CircularLinkedList<T> where T: std::fmt::Debug {
last: *mut Node<T>,
size: usize
}
impl<T> CircularLinkedList<T> where T: std::fmt::Debug {
pub fn new() -> Self {
CircularLinkedList { last: ptr::null_mut(), size: 0 }
}
pub fn enqueue(&mut self, value: T) {
self.size = self.size + 1;
// Create a new Node struct, assign it to new_last variable
// Struct - no guarantees on data layout
let new_last = Node {
value,
next: ptr::null_mut(),
_pin: PhantomPinned,
};
// Construct a new <Pin<Box<T>>>
// new_last is pinned in heap, and can't move from this address
// pinned_new_last is Pin<Box<T>> type - a smart pointer
// Wrap in ManuallyDrop - 0-cost wrapper that stop compiler from calling destructor for pinned_new_last
// When Pin<Box<T>> dropped, both the pointer and pointee dropped, so need to clean this up in the drop
let mut pinned_new_last = ManuallyDrop::new(Box::pin(new_last));
if self.last.is_null() {
unsafe {
// as_mut gets Pin<Box<Node>> => Pin<&mut Node>
// get_unchecked_mut gets Pin<&mut Node> => &mut Node
// We coerce it into a *mut T
pinned_new_last.as_mut().get_unchecked_mut().next = pinned_new_last.as_mut().get_unchecked_mut() as *mut Node<T>;
self.last = pinned_new_last.as_mut().get_unchecked_mut() as *mut Node<T>;
}
} else {
unsafe {
pinned_new_last.as_mut().get_unchecked_mut().next = (*self.last).next;
(*self.last).next = pinned_new_last.as_mut().get_unchecked_mut() as *mut Node<T>;
self.last = pinned_new_last.as_mut().get_unchecked_mut() as *mut Node<T>;
}
}
}
pub fn dequeue(&mut self) -> Option<T> {
if self.last.is_null() {None}
else {
self.size = self.size - 1;
unsafe {
// This fixed the memory leak!!!
// Let Box destructor call destructor of T and free allocated memory
let last = Box::from_raw(self.last);
if self.size == 0 {self.last = ptr::null_mut();}
else {self.last = last.next;}
let value = (*last).value;
Some(value)
}
}
}
pub fn peek(&self) -> Option<&T> {
if self.last.is_null() {None}
else {
unsafe {
// Deref self.last (it is a *mut Node)
// Get value field
// Cast entire thing to &
Some(&(*self.last).value)
}
}
}
}
impl<T> Drop for CircularLinkedList<T> where T: std::fmt::Debug {
fn drop(&mut self) {
while let Some(_) = self.dequeue() {}
}
}
}
fn main () {
let mut list = LinkedList::CircularLinkedList::new();
list.enqueue(1);
list.enqueue(2);
list.dequeue();
list.dequeue();
println!("{:?}", list.peek());
}

how to solve cyclic-dependency & Iterator lifetime problem?

Rustaceans, when I started to write a BloomFilter example in Rust, I found I had several problems to solve. I struggled with them for a day without making progress, so I need help; any suggestion would help me a lot. Thanks.
Problems
How do I solve the lifetime error when passing an Iterator into another function?
// let bits = self.hash(value); // how to solve this lifetime error without using 'static storage?
// Below is a workaround, but the bits have to be computed in advance.
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits);
How do I solve a cyclic dependency between structs without modifying the lower-layer code, e.g. bloom_filter?
// cyclic-dependency:
// RedisCache -> BloomFilter -> Storage
// | ^
// ------------<impl>------------
//
// v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
Since Rust does not have null values, how can I solve the cyclic-dependency initialization without any stubs?
let mut cache = RedisCache(
Client::open("redis://localhost").unwrap(),
// I found I can use Weak::new() to solve it, but that requires downgrading an Rc reference.
// v-- need a BloomFilter stub to create RedisCache
RefCell::new(BloomFilter::new()),
);
Code
#![allow(unused)]
mod bloom_filter {
use std::{hash::Hash, marker::PhantomData};
pub type BitsIter = Box<dyn Iterator<Item = u64>>;
pub trait Storage {
fn set(&mut self, bits: BitsIter);
fn contains_all(&self, bits: BitsIter) -> bool;
}
pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);
impl<T: Hash> BloomFilter<T> {
pub fn new() -> BloomFilter<T> {
return Self::by(Box::new(ArrayStorage([0; 5000])));
struct ArrayStorage<const N: usize>([u8; N]);
impl<const N: usize> Storage for ArrayStorage<N> {
fn set(&mut self, bits: BitsIter) {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.for_each(|index| self.0[index] = 1);
}
fn contains_all(&self, bits: BitsIter) -> bool {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.all(|index| self.0[index] == 1)
}
}
}
pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
BloomFilter(storage, PhantomData)
}
pub fn add(&mut self, value: T) {
// let bits = self.hash(value); // how to solve such lifetime error?
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits);
}
pub fn contains(&self, value: T) -> bool {
// lifetime problem same as Self::add(T)
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.contains_all(bits)
}
fn hash<'a, H: Hash + 'a>(&self, _value: H) -> Box<dyn Iterator<Item = u64> + 'a> {
todo!()
}
}
}
mod spi {
use super::bloom_filter::*;
use redis::{Client, Commands, RedisResult};
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
pub struct RedisCache<'a>(Client, RefCell<BloomFilter<&'a str>>);
impl<'a> RedisCache<'a> {
pub fn new() -> RedisCache<'a> {
let mut cache = RedisCache(
Client::open("redis://localhost").unwrap(),
// v-- need a BloomFilter stub to create RedisCache
RefCell::new(BloomFilter::new()),
);
// v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
return cache;
}
pub fn get(&mut self, key: &str, load_value: fn() -> Option<String>) -> Option<String> {
let filter = self.1.borrow();
if filter.contains(key) {
if let Ok(value) = self.0.get::<&str, String>(key) {
return Some(value);
}
if let Some(actual_value) = load_value() {
let _: () = self.0.set(key, &actual_value).unwrap();
return Some(actual_value);
}
}
return None;
}
}
impl<'a> Storage for RedisCache<'a> {
fn set(&mut self, bits: BitsIter) {
todo!()
}
fn contains_all(&self, bits: BitsIter) -> bool {
todo!()
}
}
}
Updated
First, thanks to #Colonel Thirty Two, who gave me a lot of information I hadn't mastered yet and helped me fix the iterator lifetime problem.
I solved the cyclic dependency by breaking the Storage responsibility out into another struct, RedisStorage, without modifying the bloom_filter module, although it makes the example more bloated. Below are their relationships:
RedisCache -> BloomFilter -> Storage <----------------
    |                                                |
    |-------> redis::Client <- RedisStorage ---<impl>-
I realized that the ownership & lifetime system is not only used by the borrow checker; Rustaceans also need to do more up-front design to obey its rules than in a GC language such as Java. Am I right?
Final Code
mod bloom_filter {
use std::{
hash::{Hash, Hasher},
marker::PhantomData,
};
pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;
pub trait Storage {
fn set(&mut self, bits: BitsIter);
fn contains_all(&self, bits: BitsIter) -> bool;
}
pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);
impl<T: Hash> BloomFilter<T> {
#[allow(unused)]
pub fn new() -> BloomFilter<T> {
return Self::by(Box::new(ArrayStorage([0; 5000])));
struct ArrayStorage<const N: usize>([u8; N]);
impl<const N: usize> Storage for ArrayStorage<N> {
fn set(&mut self, bits: BitsIter) {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.for_each(|index| self.0[index] = 1);
}
fn contains_all(&self, bits: BitsIter) -> bool {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.all(|index| self.0[index] == 1)
}
}
}
pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
BloomFilter(storage, PhantomData)
}
pub fn add(&mut self, value: T) {
self.0.set(self.hash(value));
}
pub fn contains(&self, value: T) -> bool {
self.0.contains_all(self.hash(value))
}
fn hash<'a, H: Hash + 'a>(&self, value: H) -> BitsIter<'a> {
Box::new(
[3, 11, 31, 71, 131]
.into_iter()
.map(|salt| SimpleHasher(0, salt))
.map(move |mut hasher| hasher.hash(&value)),
)
}
}
struct SimpleHasher(u64, u64);
impl SimpleHasher {
fn hash<H: Hash>(&mut self, value: &H) -> u64 {
value.hash(self);
self.finish()
}
}
impl Hasher for SimpleHasher {
fn finish(&self) -> u64 {
self.0
}
fn write(&mut self, bytes: &[u8]) {
self.0 += bytes.iter().fold(0u64, |acc, k| acc * self.1 + *k as u64)
}
}
}
mod spi {
use super::bloom_filter::*;
use redis::{Client, Commands};
use std::{cell::RefCell, rc::Rc};
pub struct RedisCache<'a>(Rc<RefCell<Client>>, BloomFilter<&'a str>);
impl<'a> RedisCache<'a> {
pub fn new(client: Rc<RefCell<Client>>, filter: BloomFilter<&'a str>) -> RedisCache<'a> {
RedisCache(client, filter)
}
pub fn get<'f>(
&mut self,
key: &str,
load_value: fn() -> Option<&'f str>,
) -> Option<String> {
if self.1.contains(key) {
let mut redis = self.0.as_ref().borrow_mut();
if let Ok(value) = redis.get::<&str, String>(key) {
return Some(value);
}
if let Some(actual_value) = load_value() {
let _: () = redis.set(key, &actual_value).unwrap();
return Some(actual_value.into());
}
}
return None;
}
}
struct RedisStorage(Rc<RefCell<Client>>);
const BLOOM_FILTER_KEY: &str = "bloom_filter";
impl Storage for RedisStorage {
fn set(&mut self, bits: BitsIter) {
bits.for_each(|slot| {
let _: bool = self
.0
.as_ref()
.borrow_mut()
.setbit(BLOOM_FILTER_KEY, slot as usize, true)
.unwrap();
})
}
fn contains_all(&self, mut bits: BitsIter) -> bool {
bits.all(|slot| {
self.0
.as_ref()
.borrow_mut()
.getbit(BLOOM_FILTER_KEY, slot as usize)
.unwrap()
})
}
}
#[test]
fn prevent_cache_penetration_by_bloom_filter() {
let client = Rc::new(RefCell::new(Client::open("redis://localhost").unwrap()));
redis::cmd("FLUSHDB").execute(&mut *client.as_ref().borrow_mut());
let mut filter: BloomFilter<&str> = BloomFilter::by(Box::new(RedisStorage(client.clone())));
assert!(!filter.contains("Rust"));
filter.add("Rust");
assert!(filter.contains("Rust"));
let mut cache = RedisCache::new(client, filter);
assert_eq!(
cache.get("Rust", || Some("System Language")),
Some("System Language".to_string())
);
assert_eq!(
cache.get("Rust", || panic!("must never be called after cached")),
Some("System Language".to_string())
);
assert_eq!(
cache.get("Go", || panic!("reject to loading `Go` from external storage")),
None
);
}
}
pub type BitsIter = Box<dyn Iterator<Item = u64>>;
In this case, the object in the box must be valid for the 'static lifetime. That isn't the case for the iterator returned by hash - it's limited to the lifetime of self.
Try replacing with:
pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;
Or using generics instead of boxed trait objects.
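To expand on that last suggestion, here is a minimal sketch of mine of the "generics instead of boxed trait objects" route (the names and the salted-hash stand-in are my own, not the questioner's code); the trade-off is that a trait with impl Trait arguments is no longer dyn-compatible, so you cannot store it as Box<dyn Storage>:
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

// `impl Iterator` in argument position is just an anonymous generic
// parameter, so no Box and no lifetime on a type alias is needed.
trait Storage {
    fn set(&mut self, bits: impl Iterator<Item = u64>);
    fn contains_all(&self, bits: impl Iterator<Item = u64>) -> bool;
}

struct ArrayStorage<const N: usize>([u8; N]);

impl<const N: usize> Storage for ArrayStorage<N> {
    fn set(&mut self, bits: impl Iterator<Item = u64>) {
        let size = self.0.len() as u64;
        bits.for_each(|bit| self.0[(bit % size) as usize] = 1);
    }
    fn contains_all(&self, bits: impl Iterator<Item = u64>) -> bool {
        let size = self.0.len() as u64;
        bits.map(|bit| (bit % size) as usize).all(|i| self.0[i] == 1)
    }
}

// The filter is generic over its storage instead of boxing it.
struct BloomFilter<T: Hash, S: Storage> {
    storage: S,
    _marker: PhantomData<T>,
}

impl<T: Hash, S: Storage> BloomFilter<T, S> {
    fn by(storage: S) -> Self {
        BloomFilter { storage, _marker: PhantomData }
    }
    fn add(&mut self, value: T) {
        let bits = Self::hash(&value);
        self.storage.set(bits);
    }
    fn contains(&self, value: T) -> bool {
        self.storage.contains_all(Self::hash(&value))
    }
    // Stand-in hash: several salted hashes of the value. The iterator
    // borrows `value`, which is fine because it is consumed immediately.
    fn hash(value: &T) -> impl Iterator<Item = u64> + '_ {
        [3u64, 11, 31, 71, 131].into_iter().map(move |salt| {
            let mut hasher = DefaultHasher::new();
            salt.hash(&mut hasher);
            value.hash(&mut hasher);
            hasher.finish()
        })
    }
}

fn main() {
    let mut filter: BloomFilter<&str, _> = BloomFilter::by(ArrayStorage([0u8; 5000]));
    filter.add("Rust");
    assert!(filter.contains("Rust"));
    // Almost certainly false, though bloom filters can report false positives.
    println!("contains \"Go\": {}", filter.contains("Go"));
}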
So your RedisCache needs a BloomFilter, but the BloomFilter also needs the RedisCache?
Your BloomFilter should not use the RedisCache that itself uses the BloomFilter - that's a recipe for infinitely recursive calls (how would you know which calls to the RedisCache should update the bloom filter and which calls come from the bloom filter itself?).
If you really have to, you need some form of shared ownership, like Rc or Arc. Your BloomFilter will also need to hold a weak reference, or else the two objects will refer to each other and never be freed.
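A minimal sketch of that shared-ownership shape, with placeholder types of mine standing in for RedisCache and BloomFilter (it is not tied to the real redis API):
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Placeholder types: Filter stands in for BloomFilter, Cache for RedisCache.
struct Filter {
    // Weak, so the Cache -> Filter -> Cache cycle does not leak.
    cache: Weak<RefCell<Cache>>,
}

struct Cache {
    filter: Option<Rc<RefCell<Filter>>>,
}

fn main() {
    let cache = Rc::new(RefCell::new(Cache { filter: None }));
    let filter = Rc::new(RefCell::new(Filter { cache: Rc::downgrade(&cache) }));
    cache.borrow_mut().filter = Some(filter);

    // Upgrading the Weak succeeds as long as a strong Rc to the cache is alive.
    let filter_ref = cache.borrow().filter.as_ref().unwrap().clone();
    assert!(filter_ref.borrow().cache.upgrade().is_some());
}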

Value referencing data owned by the current function

Here's my code:
struct Something<'a> {
val: u32,
another: &'a AnotherThing,
}
struct AnotherThing {
val: u32,
}
impl Default for AnotherThing {
fn default() -> Self {
Self {
val: 2,
}
}
}
trait Anything {
fn new(val: u32) -> Self;
}
impl Anything for Something<'_> {
fn new(val: u32) -> Self {
Self {
val,
another: &AnotherThing::default(),
}
}
}
fn main() {
let _ = Something::new(1);
}
It doesn't compile because:
Compiling playground v0.0.1 (/playground)
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:24:9
|
24 | / Self {
25 | | val,
26 | | another: &AnotherThing::default(),
| | ----------------------- temporary value created here
27 | | }
| |_________^ returns a value referencing data owned by the current function
I understand the problem, but I don't know how to fix it. If it's not possible to use the Default trait for this case, how can I deal with the ownership inside the function? Below is a simpler example:
struct Something<'a> {
val: u32,
another: &'a AnotherThing,
}
struct AnotherThing {
val: u32,
}
trait Anything {
fn new(val: u32) -> Self;
}
impl Anything for Something<'_> {
fn new(val: u32) -> Self {
let at = AnotherThing { val : 2 };
Self {
val,
another: &at,
}
}
}
fn main() {
let _ = Something::new(1);
}
If I had another: &AnotherThing { val: 2 } instead of another: &at, it would work. If I want the another field to be a reference whose value comes from a function call, how can I do it?
You can do it like this:
#[derive(Default)]
struct Something<'a> {
val: u32,
another: &'a AnotherThing,
}
struct AnotherThing {
val: u32,
}
impl<'a> Default for &'a AnotherThing {
fn default() -> &'a AnotherThing {
&AnotherThing {
val: 3,
}
}
}
trait Anything {
fn new(val: u32) -> Self;
}
impl Anything for Something<'_> {
fn new(val: u32) -> Self {
Self {
val,
..Default::default()
}
}
}
Another option is to create a const item, to which you can take a reference with the 'static lifetime, thus satisfying any 'a:
struct Something<'a> {
val: u32,
another: &'a AnotherThing,
}
struct AnotherThing {
val: u32,
}
const ANOTHER_THING_DEFAULT: AnotherThing = AnotherThing { val: 3 };
trait Anything {
fn new(val: u32) -> Self;
}
impl Anything for Something<'_> {
fn new(val: u32) -> Self {
Self {
val,
another: &ANOTHER_THING_DEFAULT,
}
}
}
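Both variants compile because the reference has the 'static lifetime (via the explicit const item here, or via rvalue promotion of &AnotherThing { val: 3 } in the Default impl above), and a &'static reference coerces to any shorter 'a. A short usage sketch of mine, assuming the definitions from either snippet above:
fn main() {
    let s = Something::new(1);
    assert_eq!(s.val, 1);
    assert_eq!(s.another.val, 3); // borrows the shared default AnotherThing
}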

How to create a single threaded singleton in Rust?

I'm currently trying to wrap a C library in Rust, and it has a few requirements: the C library can only be run on a single thread, and it can only be initialized/cleaned up once, on the same thread. I want something like the following.
extern "C" {
fn init_lib() -> *mut c_void;
fn cleanup_lib(ctx: *mut c_void);
}
// This line doesn't work.
static mut CTX: Option<(ThreadId, Rc<Context>)> = None;
struct Context(*mut c_void);
impl Context {
fn acquire() -> Result<Rc<Context>, Error> {
// If CTX has a reference on the current thread, clone and return it.
// Otherwise initialize the library and set CTX.
}
}
impl Drop for Context {
fn drop(&mut self) {
unsafe { cleanup_lib(self.0); }
}
}
Does anyone have a good way to achieve something like this? Every solution I come up with involves creating a Mutex/Arc and making the Context type Send and Sync, which I don't want, since I want it to remain single-threaded.
A working solution I came up with was to just implement the reference counting myself, removing the need for Rc entirely.
#![feature(once_cell)]
use std::{error::Error, ffi::c_void, fmt, lazy::SyncLazy, sync::Mutex, thread::ThreadId};
extern "C" {
fn init_lib() -> *mut c_void;
fn cleanup_lib(ctx: *mut c_void);
}
#[derive(Debug)]
pub enum ContextError {
InitOnOtherThread,
}
impl fmt::Display for ContextError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match *self {
ContextError::InitOnOtherThread => {
write!(f, "Context already initialized on a different thread")
}
}
}
}
impl Error for ContextError {}
struct StaticPtr(*mut c_void);
unsafe impl Send for StaticPtr {}
static CTX: SyncLazy<Mutex<Option<(ThreadId, usize, StaticPtr)>>> =
SyncLazy::new(|| Mutex::new(None));
pub struct Context(*mut c_void);
impl Context {
pub fn acquire() -> Result<Context, ContextError> {
let mut ctx = CTX.lock().unwrap();
if let Some((id, ref_count, ptr)) = ctx.as_mut() {
if *id == std::thread::current().id() {
*ref_count += 1;
return Ok(Context(ptr.0));
}
Err(ContextError::InitOnOtherThread)
} else {
let ptr = unsafe { init_lib() };
*ctx = Some((std::thread::current().id(), 1, StaticPtr(ptr)));
Ok(Context(ptr))
}
}
}
impl Drop for Context {
fn drop(&mut self) {
let mut ctx = CTX.lock().unwrap();
let (_, ref_count, ptr) = ctx.as_mut().unwrap();
*ref_count -= 1;
if *ref_count == 0 {
unsafe {
cleanup_lib(ptr.0);
}
*ctx = None;
}
}
}
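A usage sketch of mine for this version; it assumes init_lib and cleanup_lib are actually provided by the C library you link against, so it is illustrative rather than runnable on its own:
fn main() {
    // The first acquire on this thread initializes the library;
    // later acquires only bump the reference count.
    let a = Context::acquire().unwrap();
    let b = Context::acquire().unwrap();
    drop(a);
    drop(b); // the reference count hits zero here, so cleanup_lib runs
}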
I think the most 'rustic' way to do this is with std::sync::mpsc::sync_channel and an enum describing library operations.
The only public-facing elements of this module are launch_lib(), the SafeLibRef struct (but not its internals), and the pub fns that are part of impl SafeLibRef.
Also, this example strongly represents the philosophy that the best way to deal with global state is to not have any.
I have played fast and loose with the Result::unwrap() calls. It would be more responsible to handle error conditions better.
use std::sync::{ atomic::{ AtomicBool, Ordering }, mpsc::{ SyncSender, Receiver, sync_channel } };
use std::ffi::c_void;
extern "C" {
fn init_lib() -> *mut c_void;
fn do_op_1(ctx: *mut c_void, a: u16, b: u32, c: u64) -> f64;
fn do_op_2(ctx: *mut c_void, a: f64) -> bool;
fn cleanup_lib(ctx: *mut c_void);
}
enum LibOperation {
Op1(u16,u32,u64,SyncSender<f64>),
Op2(f64, SyncSender<bool>),
Terminate(SyncSender<()>),
}
#[derive(Clone)]
pub struct SafeLibRef(SyncSender<LibOperation>);
fn lib_thread(rx: Receiver<LibOperation>) {
static LIB_INITIALIZED: AtomicBool = AtomicBool::new(false);
if LIB_INITIALIZED.compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst).is_err() {
panic!("Tried to double-initialize library!");
}
let libptr = unsafe { init_lib() };
loop {
let op = rx.recv();
if op.is_err() {
unsafe { cleanup_lib(libptr) };
break;
}
match op.unwrap() {
LibOperation::Op1(a,b,c,tx_res) => {
let res: f64 = unsafe { do_op_1(libptr, a, b, c) };
tx_res.send(res).unwrap();
},
LibOperation::Op2(a, tx_res) => {
let res: bool = unsafe { do_op_2(libptr, a) };
tx_res.send(res).unwrap();
}
LibOperation::Terminate(tx_res) => {
unsafe { cleanup_lib(libptr) };
tx_res.send(()).unwrap();
break;
}
}
}
}
/// This needs to be called no more than once.
/// The resulting SafeLibRef can be cloned and passed around.
pub fn launch_lib() -> SafeLibRef {
let (tx,rx) = sync_channel(0);
std::thread::spawn(|| lib_thread(rx));
SafeLibRef(tx)
}
// This is the interface that most of your code will use
impl SafeLibRef {
pub fn op_1(&self, a: u16, b: u32, c: u64) -> f64 {
let (res_tx, res_rx) = sync_channel(1);
self.0.send(LibOperation::Op1(a, b, c, res_tx)).unwrap();
res_rx.recv().unwrap()
}
pub fn op_2(&self, a: f64) -> bool {
let (res_tx, res_rx) = sync_channel(1);
self.0.send(LibOperation::Op2(a, res_tx)).unwrap();
res_rx.recv().unwrap()
}
pub fn terminate(&self) {
let (res_tx, res_rx) = sync_channel(1);
self.0.send(LibOperation::Terminate(res_tx)).unwrap();
res_rx.recv().unwrap();
}
}
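And a usage sketch of mine for the channel-based version, under the same assumption that the extern "C" functions are provided by the wrapped library:
fn main() {
    let lib = launch_lib();
    let x = lib.op_1(1, 2, 3);
    let ok = lib.op_2(x);
    println!("op_1 -> {x}, op_2 -> {ok}");
    // Shuts the worker thread down and runs cleanup_lib exactly once.
    lib.terminate();
}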
