Lock-free sharing of data with a single mutator in Rust - rust

Context
As an exercise, I'm attempting to re-implement https://github.com/urbanairship/sarlacc-pit in Rust.
Sarlacc-pit mirrors an external source into an in memory data structure (Set/Map/Etc). Clients of the library operate purely in terms of the collection with no knowledge that its contents are changing under the hood.
Problem
Clients need to keep an immutable reference to the collection, while a single update thread keeps a mutable reference in order to update its contents. This directly violates rust's guarantees but should be safe in this case with the following rough structure:
pub struct Map<K, V> {
delegate: SomeReferenceType<Arc<HashMap<K, V>>>
}
impl<K, V> Map<K, V> {
pub fn get(&self, k: &K) -> Option<&V> {
self.delegate.borrow().get(k)
}
fn update(&mut self, new_delegate: HashMap<K, V>) {
self.delegate.set(Arc::new(new_delegate));
}
}
pub struct UpdateService<K, V> {
collection: Arc<Map<K, V>>
}
impl<K, V> UpdateService<K ,V> {
pub fn get_collection(&self) -> Arc<Map<K, V>> {
collection.clone()
}
// Called from a thread run on a cadence
fn update_collection(&mut self) {
let new_value = /* fetch and process value from backing store */
self.collection.borrow_mut().update(new_value);
}
}
This doesn't compile for a number of reasons, I realize.
The core of the question is: What should the type of SomeReferenceType be to allow to allow these mutable and immutable references to co-exist without something like an ReadWriteLock? Am I missing something?

If update_collection is called from another thread, what guarantees do you have that the main thread isn't in the middle of reading from the collection at the same time? With the information you have provided, you need something like a RwLock or Mutex to make this safe.
You have asserted that you believe this to be safe. If there is an undisclosed constraint on your system that allows you to guarantee that a simultaneous read and write cannot happen then there may be a way to incorporate that into the types. But a better answer can't be given otherwise.
For example, if updates are infrequent, it might satisfy your use case to use three copies of your collection, and swap them over after each modification:
One for reading,
One for writing,
One for transitioning clients while swapping the collections
This would not be a "beginner" level Rust project though.

Related

Rust (Newbie): Read and write (mutable) access to the same underlying array from multiple threads for in-memory database?

I'm considering making Rust my primary development language instead of Go, so I've been reading the docs.
One thing that my work requires a lot of is reading and writing from multiple threads into a custom database that is stored in memory as a single massive array and could be 32GB in size. The database functions are designed to avoid race conditions, and mutexes or atomic primitives are used where necessary.
The Rust doc implies that an array can only be either mutable (writeable) on a single thread, or non-mutable (read only) by many threads, and cannot be writable on one and readable on another simultaneously. How then can an in-memory database be used...? It doesn't make sense!
Do I have this wrong?
Forgive me that I can't give any specific Rust example, because I'm still learning the Rust syntax, and to be honest I need to know the answer to this question before using all my time learning a language I will be unable to use.
There is an unsafe way to do it, namely using the UnsafeCell, which returns mutable raw pointers to its interior data. These are not tracked by the Borrow-Checker and so you have to make sure the invariants are upheld
pub struct UnsafeVec<T> {
data: UnsafeCell<Vec<T>>
}
impl<T> UnsafeVec<T> {
pub fn new() -> Self {
UnsafeVec { data: UnsafeCell::new(Vec::new()) }
}
pub fn push(&mut self, arg: T) {
self.data.get_mut().push(arg)
}
pub unsafe fn index_mut(&self, index: usize) -> &mut T {
&mut (*self.data.get())[index]
}
}
unsafe impl<T> Sync for UnsafeVec<T> {}
unsafe impl<T> Send for UnsafeVec<T> {}
which allows you to write
fn main() {
let mut unsafe_vec = UnsafeVec::<i32>::new();
unsafe_vec.push(15);
unsafe {
*unsafe_vec.index_mut(0) += 1;
}
}
The method index_mut allows to modify the interior vector with an immutable reference.
The Sync and Send traits signal the compiler that the type can be safely shared across threads, which is only true if you prevent possible data races manually!.
Again, this is an unsafe option that requires you to uphold the invariants yourself.

Having trouble with proper design pattern

I'm trying to write a simple game engine, but the normal patterns I would use to do this don't work here due to the strict borrowing rules.
There is a World struct which owns a collection of Object trait objects. We can think of these as moving physical objects in the game engine. World is responsible for calling update and draw on each of these during each game tick.
The World struct also owns a collection of Event trait objects. Each of these events encapsulates an arbitrary piece of code that modifies the objects during the game ticks.
Ideally, the Event trait object has a single method do_event that takes no arguments that can be called by World during each game tick. The problem is that a particular Event needs to have mutable references to the objects that it modifies but these objects are owned by World. We could have World pass mutable references to the necessary objects when it calls do_event for each event, but how does it know which objects to pass to each event object?
If this were C++ or Python, when constructing a particular Event, I would pass the target of the event to the constructor where it would be saved and then used during do_event. Obviously, this doesn't work in Rust because the targets are all owned by World so we cannot store mutable references elsewhere.
I could add some ID system and a lookup table so World knows which objects to pass to which Events but this is messy and costly. I suspect my entire way of approaching this class of problem needs to change, however, I am stumped.
A sketch of my code:
trait Object {
fn update(&mut self);
fn draw(&self);
}
trait Event {
fn do_event(&mut self);
}
struct World {
objects: Vec<Box<dyn Object + 'a>>,
events: Vec<Box<dyn Event + 'a>>,
}
impl World {
fn update(&mut self) {
for obj in self.objects.iter_mut() {
obj.update();
}
for evt in self.events.iter_mut() {
evt.do_event();
}
}
fn draw(&mut self) { /*...*/ }
}
struct SomeEvent<'a, T: Object> {
target: &'a mut T,
}
impl<'a, T: Object> Event for SomeEvent<'a, T> {
fn update(&self) {
self.target.do_shrink(); // need mutable reference here
}
}
I know that RefCell enables multiple objects to obtain mutable references. Perhaps that is the direction I should go. However, based on what I learned from the Rust book, I got the sense that RefCells break some of Rust's main ideas by introducing unsafe code and circular references. I guess I was just wondering if there was some other obvious design pattern that adhered more to the idiomatic way of doing things.
After reading the question comments and watching kyren's RustConf 2018 talk (thanks trentcl), I've realized the OO approach to my game engine is fundamentally incompatible with Rust (or at least quite difficult).
After working on this for a day or so, I would highly recommend watching the posted video and then using the specs library (an implementation of the ECS system mentioned by user2722968).
Hope this helps someone else in the future.

Is it safe to `Send` struct containing `Rc` if strong_count is 1 and weak_count is 0?

I have a struct that is not Send because it contains Rc. Lets say that Arc has too big overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is (hopefully safe) abstraction that I have in mind:
mod my_struct {
use std::rc::Rc;
#[derive(Debug)]
pub struct MyStruct {
reference_counted: Rc<String>,
// more fields...
}
impl MyStruct {
pub fn new() -> Self {
MyStruct {
reference_counted: Rc::new("test".to_string())
}
}
pub fn pack_for_sending(self) -> Result<Sendable, Self> {
if Rc::strong_count(&self.reference_counted) == 1 &&
Rc::weak_count(&self.reference_counted) == 0
{
Ok(Sendable(self))
} else {
Err(self)
}
}
// There are more methods, some may clone the `Rc`!
}
/// `Send`able wrapper for `MyStruct` that does not allow you to access it,
/// only unpack it.
pub struct Sendable(MyStruct);
// Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
// only created when the `Rc` has strong count 1 and weak count 0.
unsafe impl Send for Sendable {}
impl Sendable {
/// Retrieve the inner `MyStruct`, making it not-sendable again.
pub fn unpack(self) -> MyStruct {
self.0
}
}
}
use crate::my_struct::MyStruct;
fn main() {
let handle = std::thread::spawn(|| {
let my_struct = MyStruct::new();
dbg!(&my_struct);
// Do something with `my_struct`, but at the end the inner `Rc` should
// not be shared with anybody.
my_struct.pack_for_sending().expect("Some Rc was still shared!")
});
let my_struct = handle.join().unwrap().unpack();
dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned only by a single onwer and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable which does not allow access to the contained value.
But in some crazy world Rc could for example internally use thread local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.
No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_some() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<str>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
if !is_aliased(&mut t) {
Ok(t)
} else {
...
}
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all assumptions that are counted on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage if still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write a RFC proposing a Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...
The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.

It is possible to have single-threaded zero-cost memoization without mutability in safe Rust?

I'm interested in finding or implementing a Rust data structure that provides a zero-cost way to memoize a single computation with an arbitrary output type T. Specifically I would like a generic type Cache<T> whose internal data takes up no more space than an Option<T>, with the following basic API:
impl<T> Cache<T> {
/// Return a new Cache with no value stored in it yet.
pub fn new() -> Self {
// ...
}
/// If the cache has a value stored in it, return a reference to the
/// stored value. Otherwise, compute `f()`, store its output
/// in the cache, and then return a reference to the stored value.
pub fn call<F: FnOnce() -> T>(&self, f: F) -> &T {
// ...
}
}
The goal here is to be able to share multiple immutable references to a Cache within a single thread, with any holder of such a reference being able to access the value (triggering the computation if it is the first time). Since we're only requiring to be able to share a Cache within a single thread, it is not necessary for it to be Sync.
Here is a way to implement the API safely (or at least I think it's safe) using unsafe under the hood:
use std::cell::UnsafeCell;
pub struct Cache<T> {
value: UnsafeCell<Option<T>>
}
impl<T> Cache<T> {
pub fn new() -> Self {
Cache { value: UnsafeCell::new(None) }
}
pub fn call<F: FnOnce() -> T>(&self, f: F) -> &T {
let ptr = self.value.get();
unsafe {
if (*ptr).is_none() {
let t = f();
// Since `f` potentially could have invoked `call` on this
// same cache, to be safe we must check again that *ptr
// is still None, before setting the value.
if (*ptr).is_none() {
*ptr = Some(t);
}
}
(*ptr).as_ref().unwrap()
}
}
}
Is it possible to implement such a type in safe Rust (i.e., not writing our own unsafe code, only indirectly relying on unsafe code in the standard library)?
Clearly, the call method requires mutating self, which means that Cache must use some form of interior mutability. However, it does not seem possible to do with a Cell, because Cell provides no way to retrieve a reference to the enclosed value, as is required by the desired API of call above. And this is for a good reason, as it would be unsound for Cell to provide such a reference, because it would have no way to ensure that the referenced value is not mutated over the lifetime of the reference. On the other hand, for the Cache type, after call is called once, the API above does not provide any way for the stored value to be mutated again, so it is safe for it to hand out a reference with a lifetime that can be a long as that of the Cache itself.
If Cell can't work, I'm curious if the Rust standard library might provide some other safe building blocks for interior mutability which could be used to implement this Cache.
Neither RefCell nor Mutex fulfill the goals here:
they are not zero-cost: they involve storing more data than that of an Option<T> and add unnecessary runtime checks.
they do not seem to provide any way to return a real reference with the lifetime that we want -- instead we could only return a Ref or MutexGuard which is not the same thing.
Using only an Option would not provide the same functionality: if we share immutable references to the Cache, any holder of such a reference can invoke call and obtain the desired value (and mutating the Cache in the process so that future calls will retrieve the same value); whereas, sharing immutable references to an Option, it would not be possible to mutate the Option, so it could not work.

What are the use cases of the newly proposed Pin type?

There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning ?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector to move it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct move, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust. As a library type. It creates the extra guarantee that as long as Pin exist the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}

Resources