How does interior mutability work for caching behavior? - rust

I'm trying to create a struct that takes a Path and, on demand, loads the image from the path specified. Here's what I have so far:
extern crate image;
use std::cell::{RefCell};
use std::path::{Path};
use image::{DynamicImage};
pub struct ImageCell<'a> {
image: RefCell<Option<DynamicImage>>,
image_path: &'a Path,
}
impl<'a> ImageCell<'a> {
pub fn new<P: AsRef<Path>>(image_path: &'a P) -> ImageCell<'a>{
ImageCell { image: RefCell::new(None), image_path: image_path.as_ref() }
}
//copied from https://doc.rust-lang.org/nightly/std/cell/index.html#implementation-details-of-logically-immutable-methods
pub fn get_image(&self) -> &DynamicImage {
{
let mut cache = self.image.borrow_mut();
if cache.is_some() {
return cache.as_ref().unwrap(); //Error here
}
let image = image::open(self.image_path).unwrap();
*cache = Some(image);
}
self.get_image()
}
}
This fails to compile:
src/image_generation.rs:34:24: 34:29 error: `cache` does not live long enough
src/image_generation.rs:34 return cache.as_ref().unwrap();
^~~~~
src/image_generation.rs:30:46: 42:6 note: reference must be valid for the anonymous lifetime #1 defined on the block at 30:45...
src/image_generation.rs:30 pub fn get_image(&self) -> &DynamicImage {
src/image_generation.rs:31 {
src/image_generation.rs:32 let mut cache = self.image.borrow_mut();
src/image_generation.rs:33 if cache.is_some() {
src/image_generation.rs:34 return cache.as_ref().unwrap();
src/image_generation.rs:35 }
...
src/image_generation.rs:32:53: 39:10 note: ...but borrowed value is only valid for the block suffix following statement 0 at 32:52
src/image_generation.rs:32 let mut cache = self.image.borrow_mut();
src/image_generation.rs:33 if cache.is_some() {
src/image_generation.rs:34 return cache.as_ref().unwrap();
src/image_generation.rs:35 }
src/image_generation.rs:36
src/image_generation.rs:37 let image = image::open(self.image_path).unwrap();
...
I think I understand why: the lifetime of cache is tied to the borrow_mut() call.
Is there any way to structure the code so that this works?

I'm not totally convinced you need interior mutability here. However, I do think the solution you've proposed is generally useful, so I'll elaborate on one way to achieve it.
The problem with your current code is that RefCell provides dynamic borrowing semantics: borrowing the contents of a RefCell is opaque to Rust's borrow checker and is checked at runtime instead. When you try to return a &DynamicImage while the image still lives inside the RefCell, the RefCell has no way to track that outstanding borrow. If RefCell allowed that, other code could overwrite its contents while a &DynamicImage was still on loan. Whoops! Memory safety violation.
For this reason, borrowing a value out of a RefCell is tied to the lifetime of the guard you get back when you call borrow_mut(). In this case, the lifetime of the guard is the stack frame of get_image, which no longer exists after the function returns. Therefore, you cannot borrow the contents of a RefCell like you're doing.
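To make the dynamic tracking concrete, here is a minimal sketch. The function name `observe` is made up for illustration; `try_borrow_mut` is the non-panicking variant of `borrow_mut` in the standard library. It shows that the RefCell refuses a conflicting borrow while a guard is alive and allows it again once the guard is dropped:

```rust
use std::cell::RefCell;

/// Returns two flags:
/// 1. could we borrow mutably while a shared guard was still alive?
/// 2. could we borrow mutably after that guard was dropped?
fn observe() -> (bool, bool) {
    let cell = RefCell::new(Some(5));

    // `borrow` returns a guard; the RefCell records it at runtime.
    let guard = cell.borrow();
    let during = cell.try_borrow_mut().is_ok();

    // The borrow ends exactly when the guard is dropped.
    drop(guard);
    let after = cell.try_borrow_mut().is_ok();

    (during, after)
}
```

Calling `observe()` yields `(false, true)`. A plain `&T` returned from `get_image` would have to outlive the guard, which is precisely what the compiler rejects.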
An alternative approach (while maintaining the requirement of interior mutability) is to move values in and out of the RefCell. This enables you to retain cache semantics.
The basic idea is to return a guard that contains the dynamic image along with a pointer back to the cell it originated from. Once you're done with the dynamic image, the guard will be dropped and we can add the image back to the cell's cache.
To maintain ergonomics, we impl Deref on the guard so that you can mostly pretend like it is a DynamicImage. Here's the code with some comments and a few other things cleaned up:
use std::cell::RefCell;
use std::io;
use std::mem;
use std::ops::Deref;
use std::path::{Path, PathBuf};
struct ImageCell {
image: RefCell<Option<DynamicImage>>,
// Suffer the one time allocation into a `PathBuf` to avoid dealing
// with the lifetime.
image_path: PathBuf,
}
impl ImageCell {
fn new<P: Into<PathBuf>>(image_path: P) -> ImageCell {
ImageCell {
image: RefCell::new(None),
image_path: image_path.into(),
}
}
fn get_image(&self) -> io::Result<DynamicImageGuard> {
// `take` transfers ownership out from the `Option` inside the
// `RefCell`. If there was no value there, then generate an image
// and return it. Otherwise, move the value out of the `RefCell`
// and return it.
let image = match self.image.borrow_mut().take() {
None => {
println!("Opening new image: {:?}", self.image_path);
DynamicImage::open(&self.image_path)?
}
Some(img) => {
println!("Retrieving image from cache: {:?}", self.image_path);
img
}
};
// The guard provides the `DynamicImage` and a pointer back to
// `ImageCell`. When it's dropped, the `DynamicImage` is added
// back to the cache automatically.
Ok(DynamicImageGuard { image_cell: self, image: image })
}
}
struct DynamicImageGuard<'a> {
image_cell: &'a ImageCell,
image: DynamicImage,
}
impl<'a> Drop for DynamicImageGuard<'a> {
fn drop(&mut self) {
// When a `DynamicImageGuard` goes out of scope, this method is
// called. We move the `DynamicImage` out of its current location
// and put it back into the `RefCell` cache.
println!("Adding image to cache: {:?}", self.image_cell.image_path);
let image = mem::replace(&mut self.image, DynamicImage::empty());
*self.image_cell.image.borrow_mut() = Some(image);
}
}
impl<'a> Deref for DynamicImageGuard<'a> {
type Target = DynamicImage;
fn deref(&self) -> &DynamicImage {
// This increases the ergonomics of a `DynamicImageGuard`. Because
// of this impl, most uses of `DynamicImageGuard` can be as if
// it were just a `&DynamicImage`.
&self.image
}
}
// A dummy image type.
struct DynamicImage {
data: Vec<u8>,
}
// Dummy image methods.
impl DynamicImage {
fn open<P: AsRef<Path>>(_p: P) -> io::Result<DynamicImage> {
// Open image on file system here.
Ok(DynamicImage { data: vec![] })
}
fn empty() -> DynamicImage {
DynamicImage { data: vec![] }
}
}
fn main() {
let cell = ImageCell::new("foo");
{
let img = cell.get_image().unwrap(); // opens new image
println!("image data: {:?}", img.data);
} // adds image to cache (on drop of `img`)
let img = cell.get_image().unwrap(); // retrieves image from cache
println!("image data: {:?}", img.data);
} // adds image back to cache (on drop of `img`)
There is a really important caveat to note here: This only has one cache location, which means if you call get_image a second time before the first guard has been dropped, then a new image will be generated from scratch since the cell will be empty. This semantic is hard to change (in safe code) because you've committed to a solution that uses interior mutability. Generally speaking, the whole point of interior mutability is to mutate something without the caller being able to observe it. Indeed, that should be the case here, assuming that opening an image always returns precisely the same data.
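The caveat can be observed with a stripped-down sketch of the same take-and-put-back pattern. Everything here is a hypothetical stand-in: `Cache` plays the role of `ImageCell`, a `String` plays the role of the image, and the `loads` counter exists only to make the extra load visible:

```rust
use std::cell::{Cell, RefCell};
use std::mem;
use std::ops::Deref;

// Hypothetical stand-in for `ImageCell`: caches a `String` and counts loads.
struct Cache {
    slot: RefCell<Option<String>>,
    loads: Cell<u32>,
}

struct Guard<'a> {
    cache: &'a Cache,
    value: String,
}

impl Cache {
    fn new() -> Cache {
        Cache { slot: RefCell::new(None), loads: Cell::new(0) }
    }

    fn get(&self) -> Guard<'_> {
        // Move the cached value out, or "load" a fresh one if the slot is empty.
        let cached = self.slot.borrow_mut().take();
        let value = match cached {
            Some(v) => v,
            None => {
                self.loads.set(self.loads.get() + 1);
                String::from("pixels")
            }
        };
        Guard { cache: self, value }
    }
}

impl Deref for Guard<'_> {
    type Target = String;
    fn deref(&self) -> &String {
        &self.value
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        // Put the value back into the single cache slot.
        let value = mem::take(&mut self.value);
        *self.cache.slot.borrow_mut() = Some(value);
    }
}

fn loads_with_overlap() -> u32 {
    let cache = Cache::new();
    let first = cache.get(); // slot is empty: first load
    let second = cache.get(); // slot is still empty, `first` holds the value: second load
    assert_eq!(*first, *second);
    drop(second);
    drop(first);
    cache.loads.get()
}

fn loads_without_overlap() -> u32 {
    let cache = Cache::new();
    drop(cache.get()); // first load, then the value goes back into the slot
    drop(cache.get()); // served from the cache
    cache.loads.get()
}
```

Here `loads_with_overlap()` returns 2 while `loads_without_overlap()` returns 1: overlapping guards cost an extra load, but callers still see the same data either way.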
This approach can be generalized to be thread safe (by using Mutex for interior mutability instead of RefCell) and possibly more performant by choosing a different caching strategy depending on your use case. For example, the regex crate uses a simple memory pool to cache compiled regex state. Since this caching should be opaque to callers, it is implemented with interior mutability using precisely the same mechanism outlined here.
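As a rough sketch of the thread-safe variant, the RefCell can be swapped for a Mutex while the guard logic stays the same. The names `SharedCache`, `SharedGuard` and `total_len_from_threads` are invented for this example, and a `Vec<u8>` stands in for the image. This is one possible shape, not how the regex crate implements its pool:

```rust
use std::mem;
use std::ops::Deref;
use std::sync::Mutex;
use std::thread;

// Hypothetical thread-safe counterpart of `ImageCell`.
struct SharedCache {
    slot: Mutex<Option<Vec<u8>>>,
}

struct SharedGuard<'a> {
    cache: &'a SharedCache,
    value: Vec<u8>,
}

impl SharedCache {
    fn new() -> SharedCache {
        SharedCache { slot: Mutex::new(None) }
    }

    fn get(&self) -> SharedGuard<'_> {
        // The lock is held only long enough to move the value out.
        let cached = self.slot.lock().unwrap().take();
        // Stand-in for the expensive load.
        let value = cached.unwrap_or_else(|| vec![1, 2, 3]);
        SharedGuard { cache: self, value }
    }
}

impl Deref for SharedGuard<'_> {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        &self.value
    }
}

impl Drop for SharedGuard<'_> {
    fn drop(&mut self) {
        // Return the value to the shared slot when the guard goes away.
        let value = mem::take(&mut self.value);
        *self.cache.slot.lock().unwrap() = Some(value);
    }
}

// Use the cache from four scoped threads and add up the lengths they saw.
fn total_len_from_threads(cache: &SharedCache) -> usize {
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..4 {
            handles.push(s.spawn(|| cache.get().len()));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

Threads that find the slot empty simply load their own copy, which is the same single-slot caveat as before, just across threads.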

Related

Transferring ownership between enum variants [duplicate]

I'm trying to replace a value behind a mutable borrow, moving part of it into the new value:
enum Foo<T> {
Bar(T),
Baz(T),
}
impl<T> Foo<T> {
fn switch(&mut self) {
*self = match self {
&mut Foo::Bar(val) => Foo::Baz(val),
&mut Foo::Baz(val) => Foo::Bar(val),
}
}
}
The code above doesn't work, and understandably so: moving the value out of self breaks its integrity. But since that value is dropped immediately afterwards, I (if not the compiler) could guarantee its safety.
Is there some way to achieve this? I feel like this is a job for unsafe code, but I'm not sure how that would work.
mem::uninitialized has been deprecated since Rust 1.39, replaced by MaybeUninit.
However, uninitialized data is not required here. Instead, you can use ptr::read to get the data referred to by self.
At this point, tmp has ownership of the data in the enum, but if we were to drop self, that data would attempt to be read by the destructor, causing memory unsafety.
We then perform our transformation and put the value back, restoring the safety of the type.
use std::ptr;
enum Foo<T> {
Bar(T),
Baz(T),
}
impl<T> Foo<T> {
fn switch(&mut self) {
// I copied this code from Stack Overflow without reading
// the surrounding text that explains why this is safe.
unsafe {
let tmp = ptr::read(self);
// Must not panic before we get to `ptr::write`
let new = match tmp {
Foo::Bar(val) => Foo::Baz(val),
Foo::Baz(val) => Foo::Bar(val),
};
ptr::write(self, new);
}
}
}
More advanced versions of this code would prevent a panic from bubbling out of this code and instead cause the program to abort.
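One way such a panic-proof version could look is sketched below. The helper `replace_with_or_abort` and the `AbortOnUnwind` type are hypothetical names for this example and are not the API of the crates listed afterwards. The idea is a drop guard that aborts the process if the closure unwinds while `*slot` is logically empty:

```rust
use std::{mem, process, ptr};

// Dropping this guard aborts the process. It is only ever dropped when a
// panic unwinds through `replace_with_or_abort`.
struct AbortOnUnwind;

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        process::abort();
    }
}

// Hypothetical helper: replaces `*slot` with `f(old value)`.
fn replace_with_or_abort<T>(slot: &mut T, f: impl FnOnce(T) -> T) {
    let guard = AbortOnUnwind;
    unsafe {
        // After this read, `*slot` must not be observed or dropped
        // until `ptr::write` has put a valid value back.
        let old = ptr::read(slot);
        // If `f` panics, unwinding drops `guard`, which aborts before
        // anyone can touch the moved-from `*slot`.
        let new = f(old);
        ptr::write(slot, new);
    }
    // Success: defuse the guard without running its destructor.
    mem::forget(guard);
}

enum Foo<T> {
    Bar(T),
    Baz(T),
}

impl<T> Foo<T> {
    fn switch(&mut self) {
        replace_with_or_abort(self, |old| match old {
            Foo::Bar(val) => Foo::Baz(val),
            Foo::Baz(val) => Foo::Bar(val),
        });
    }
}
```

Aborting is a blunt instrument, but it keeps the unsafe block sound no matter what the closure does.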
See also:
replace_with, a crate that wraps this logic up.
take_mut, a crate that wraps this logic up.
Change enum variant while moving the field to the new variant
How can I swap in a new value for a field in a mutable reference to a structure?
The code above doesn't work, and understandably so, moving the value
out of self breaks the integrity of it.
This is not exactly what happens here. For example, the same thing with self taken by value (and declared mut so it can be reassigned) would work nicely:
impl<T> Foo<T> {
fn switch(mut self) {
self = match self {
Foo::Bar(val) => Foo::Baz(val),
Foo::Baz(val) => Foo::Bar(val),
}
}
}
Rust is absolutely fine with partial and total moves. The problem here is that you do not own the value you're trying to move - you only have a mutable borrowed reference. You cannot move out of any reference, including mutable ones.
This is in fact one of the frequently requested features - a special kind of reference which would allow moving out of it. It would allow several kinds of useful patterns. You can find more here and here.
In the meantime for some cases you can use std::mem::replace and std::mem::swap. These functions allow you to "take" a value out of mutable reference, provided you give something in exchange.
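For the enum from the question, a safe version along these lines is possible when the payload has a cheap placeholder value. The sketch below assumes `T: Default`, a bound the original code does not have, and uses `std::mem::take`, which is `mem::replace` with `Default::default()` as the value given in exchange:

```rust
use std::mem;

enum Foo<T> {
    Bar(T),
    Baz(T),
}

// Assumption for this sketch: `T: Default`, so there is something to
// leave behind in the old variant while the real value moves out.
impl<T: Default> Foo<T> {
    fn switch(&mut self) {
        *self = match self {
            // `val` is a `&mut T` here; `mem::take` swaps in `T::default()`
            // and hands back the original value by ownership.
            Foo::Bar(val) => Foo::Baz(mem::take(val)),
            Foo::Baz(val) => Foo::Bar(mem::take(val)),
        };
    }
}
```

The temporary default value is dropped when `*self` is overwritten, so no unsafe code is needed, at the price of the extra trait bound.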
Okay, I figured out how to do it with a bit of unsafeness and std::mem.
I replace self with an uninitialized temporary value. Since I now "own" what used to be self, I can safely move the value out of it and replace it:
use std::mem;
enum Foo<T> {
Bar(T),
Baz(T),
}
impl<T> Foo<T> {
fn switch(&mut self) {
// This is safe since we will overwrite it without ever reading it.
let tmp = mem::replace(self, unsafe { mem::uninitialized() });
// We absolutely must **never** panic while the uninitialized value is around!
let new = match tmp {
Foo::Bar(val) => Foo::Baz(val),
Foo::Baz(val) => Foo::Bar(val),
};
let uninitialized = mem::replace(self, new);
mem::forget(uninitialized);
}
}
fn main() {}

having two struct reference each other - rust

I'm quite new to Rust programming, and I'm trying to convert some code that I had in JS to Rust.
A plain concept of it is as below:
fn main() {
let mut ds=DataSource::new();
let mut pp =Processor::new(&mut ds);
}
struct DataSource {
st2r: Option<&Processor>,
}
struct Processor {
st1r: &DataSource,
}
impl DataSource {
pub fn new() -> Self {
DataSource {
st2r: None,
}
}
}
impl Processor {
pub fn new(ds: &mut DataSource) -> Self {
let pp = Processor {
st1r: ds,
};
ds.st2r = Some(&pp);
pp
}
}
As you can see I have two main modules in my system that are inter-connected to each other and I need a reference of each in another.
Well, this code would complain about lifetimes and such stuff, of course 😑. So I started throwing lifetime specifiers around like a madman and even after all that, it still complains that in "Processor::new" I can't return something that has been borrowed. Legit. But I can't find any solution around it! No matter how I try to handle the referencing of each other, it ends with this borrowing error.
So, can anyone point out a solution for this situation? Is my app's structure not valid in Rust and I should do it in another way? or there's a trick to this that my inexperienced mind can't find?
Thanks.
What you're trying to do can't be expressed with references and lifetimes because:
The DataSource must live longer than the Processor so that pp.st1r is guaranteed to be valid,
and the Processor must live longer than the DataSource so that ds.st2r is guaranteed to be valid. You might think that since ds.st2r is an Option and since the None variant doesn't contain a reference this allows a DataSource with a None value in st2r to outlive any Processors, but unfortunately the compiler can't know at compile-time whether st2r contains Some value, and therefore must assume it does.
Your problem is compounded by the fact that you need a mutable reference to the DataSource so that you can set its st2r field at a time when you also have an immutable outstanding reference inside the Processor, which Rust won't allow.
You can make your code work by switching to dynamic lifetime and mutability tracking using Rc (for dynamic lifetime tracking) and RefCell (for dynamic mutability tracking):
use std::cell::RefCell;
use std::rc::{ Rc, Weak };
fn main() {
let ds = Rc::new (RefCell::new (DataSource::new()));
let pp = Processor::new (Rc::clone (&ds));
}
struct DataSource {
st2r: Weak<Processor>,
}
struct Processor {
st1r: Rc<RefCell<DataSource>>,
}
impl DataSource {
pub fn new() -> Self {
DataSource {
st2r: Weak::new(),
}
}
}
impl Processor {
pub fn new(ds: Rc<RefCell<DataSource>>) -> Rc<Self> {
let pp = Rc::new (Processor {
st1r: ds,
});
pp.st1r.borrow_mut().st2r = Rc::downgrade (&pp);
pp
}
}
Note that I've replaced your Option<&Processor> with a Weak<Processor>. It would be possible to use an Option<Rc<Processor>> but this would risk leaking memory if you dropped all references to DataSource without setting st2r to None first. The Weak<Processor> behaves more or less like an Option<Rc<Processor>> that is set to None automatically when all other references are dropped, ensuring that memory will be freed properly.
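A small sketch of how the back-reference behaves at runtime may help. It repeats the two structs so it stands alone; the `name` field and the `processor_name` helper are additions made up for this illustration:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct DataSource {
    st2r: Weak<Processor>,
}

struct Processor {
    st1r: Rc<RefCell<DataSource>>,
    // Hypothetical field, only here so there is something to read back.
    name: &'static str,
}

impl Processor {
    fn new(ds: Rc<RefCell<DataSource>>, name: &'static str) -> Rc<Self> {
        let pp = Rc::new(Processor { st1r: ds, name });
        // Store a non-owning back-reference in the data source.
        pp.st1r.borrow_mut().st2r = Rc::downgrade(&pp);
        pp
    }
}

// Hypothetical helper: follow the weak back-reference if the processor
// is still alive.
fn processor_name(ds: &RefCell<DataSource>) -> Option<&'static str> {
    ds.borrow().st2r.upgrade().map(|pp| pp.name)
}
```

With `let ds = Rc::new(RefCell::new(DataSource { st2r: Weak::new() }))`, `processor_name(&ds)` is `None` before a processor is attached, `Some(name)` while the `Rc<Processor>` returned by `Processor::new` is alive, and `None` again once that `Rc` has been dropped, with no manual reset of `st2r`.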

Is there a way to have a struct contain a reference that might no longer be valid?

Like some kind of Box that holds the reference to the value or something? I'd have to check whether the value is still alive or not before reading it, like when an Option is pattern matched.
A mock example:
struct Whatever {
thing: AliveOrNot<i32>,
}
fn main() {
let mut w = Whatever { thing: AliveOrNot::Empty };
w.thing = AliveOrNot::new(100);
match w.thing {
Empty => println!("doesn't exist"),
Full(value) => println!("{}", value),
}
}
The exact case is that I'm using an sdl2 Font and I want to store instances of that struct in another struct. I don't want to do something like this because the Parent would have to live exactly as long as the Font:
struct Font<'a, 'b> {
aa: &'a i32,
bb: &'b i32,
}
struct Parent<'a, 'b, 'c> {
f: &'c Font<'a, 'b>
}
What I want is for the Parent to work regardless of whether that field is still alive or not.
You may be interested in std::rc::Weak or std::sync::Weak:
use std::rc::{Rc, Weak};
struct Whatever {
thing: Weak<i32>,
}
impl Whatever {
fn do_it(&self) {
match self.thing.upgrade() {
Some(value) => println!("{}", value),
None => println!("doesn't exist"),
}
}
}
fn its_dead_jim() -> Whatever {
let owner = Rc::new(42);
let thing = Rc::downgrade(&owner);
let whatever = Whatever { thing };
whatever.do_it();
whatever
}
fn main() {
let whatever = its_dead_jim();
whatever.do_it();
}
This prints:
42
doesn't exist
There is no way to do this in safe Rust using non-'static references. A huge point of Rust's references is that it's impossible to refer to invalid values.
You could leak the memory, creating a &'static i32, but that's not sustainable if you need to do this multiple times.
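For completeness, leaking looks roughly like this. `Box::leak` is the standard library function for it; the `make_whatever` helper is a name invented for the sketch:

```rust
struct Whatever {
    thing: &'static i32,
}

// Hypothetical constructor that leaks one heap allocation per call.
fn make_whatever(value: i32) -> Whatever {
    // `Box::leak` gives up ownership of the allocation and returns a
    // reference valid for the rest of the program. The memory is never
    // freed, which is why this does not scale to repeated use.
    let leaked: &'static mut i32 = Box::leak(Box::new(value));
    Whatever { thing: leaked }
}
```

The reference can never dangle because the value is never destroyed, which also means this does not model "might no longer be valid" at all.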
You could use unsafe code and deal with raw pointers that have no notion of lifetimes. You then assume the responsibility of ensuring you don't introduce memory unsafety.
See also:
Need holistic explanation about Rust's cell and reference counted types
Situations where Cell or RefCell is the best choice
Is there a shared pointer with a single strong owner and multiple weak references?
Why can't I store a value and a reference to that value in the same struct?
How to convert a String into a &'static str
Is there any way to return a reference to a variable created in a function?
How to solve this rust lifetime bound issue of SDL2?
Cannot infer an appropriate lifetime for autoref due to conflicting requirements

Can a type know when a mutable borrow to itself has ended?

I have a struct and I want to call one of the struct's methods every time a mutable borrow to it has ended. To do so, I would need to know when the mutable borrow to it has been dropped. How can this be done?
Disclaimer: The answer that follows describes a possible solution, but it's not a very good one, as described by this comment from Sebastien Redl:
[T]his is a bad way of trying to maintain invariants. Mostly because dropping the reference can be suppressed with mem::forget. This is fine for RefCell, where if you don't drop the ref, you will simply eventually panic because you didn't release the dynamic borrow, but it is bad if violating the "fraction is in shortest form" invariant leads to weird results or subtle performance issues down the line, and it is catastrophic if you need to maintain the "thread doesn't outlive variables in the current scope" invariant.
Nevertheless, it's possible to use a temporary struct as a "staging area" that updates the referent when it's dropped, and thus maintain the invariant correctly; however, that version basically amounts to making a proper wrapper type and a kind of weird way to use it. The best way to solve this problem is through an opaque wrapper struct that doesn't expose its internals except through methods that definitely maintain the invariant.
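The mem::forget problem from the quoted comment can be shown with a small sketch. All names (`Counter`, `CounterRef`, `drop_versus_forget`) are made up for the illustration; the guard's destructor bumps a counter so it is visible whether it ran:

```rust
use std::mem;

struct Counter {
    value: i32,
    finished_borrows: u32,
}

struct CounterRef<'a>(&'a mut Counter);

impl Counter {
    fn borrow_mut(&mut self) -> CounterRef<'_> {
        CounterRef(self)
    }
}

impl CounterRef<'_> {
    fn set(&mut self, value: i32) {
        self.0.value = value;
    }
}

impl Drop for CounterRef<'_> {
    fn drop(&mut self) {
        // Stand-in for "restore the invariant when the borrow ends".
        self.0.finished_borrows += 1;
    }
}

fn drop_versus_forget() -> (i32, u32) {
    let mut counter = Counter { value: 0, finished_borrows: 0 };
    {
        let mut guard = counter.borrow_mut();
        guard.set(1);
    } // `guard` is dropped here, so the destructor runs once.

    let mut guard = counter.borrow_mut();
    guard.set(2);
    // Entirely safe code: the destructor never runs for this guard.
    mem::forget(guard);

    (counter.value, counter.finished_borrows)
}
```

`drop_versus_forget()` returns `(2, 1)`: the second mutation happened, but the end-of-borrow hook ran only for the first guard.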
Without further ado, the original answer:
Not exactly... but pretty close. We can use RefCell<T> as a model for how this can be done. It's a bit of an abstract question, but I'll use a concrete example to demonstrate. (This won't be a complete example, but something to show the general principles.)
Let's say you want to make a Fraction struct that is always in simplest form (fully reduced, e.g. 3/5 instead of 6/10). You write a struct RawFraction that will contain the bare data. RawFraction instances are not always in simplest form, but they have a method fn reduce(&mut self) that reduces them.
Now you need a smart pointer type that you will always use to mutate the RawFraction, which calls .reduce() on the pointed-to struct when it's dropped. Let's call it RefMut, because that's the naming scheme RefCell uses. You implement Deref<Target = RawFraction>, DerefMut, and Drop on it, something like this:
pub struct RefMut<'a>(&'a mut RawFraction);
impl<'a> Deref for RefMut<'a> {
type Target = RawFraction;
fn deref(&self) -> &RawFraction {
self.0
}
}
impl<'a> DerefMut for RefMut<'a> {
fn deref_mut(&mut self) -> &mut RawFraction {
self.0
}
}
impl<'a> Drop for RefMut<'a> {
fn drop(&mut self) {
self.0.reduce();
}
}
Now, whenever you have a RefMut to a RawFraction and drop it, you know the RawFraction will be in simplest form afterwards. All you need to do at this point is ensure that RefMut is the only way to get &mut access to the RawFraction part of a Fraction.
pub struct Fraction(RawFraction);
impl Fraction {
pub fn new(numerator: i32, denominator: i32) -> Self {
// create a RawFraction, reduce it and wrap it up
}
pub fn borrow_mut(&mut self) -> RefMut {
RefMut(&mut self.0)
}
}
Pay attention to the pub markings (and lack thereof): I'm using those to ensure the soundness of the exposed interface. All three types should be placed in a module by themselves. It would be incorrect to mark the RawFraction field pub inside Fraction, since then it would be possible (for code outside the module) to create an unreduced Fraction without using new or get a &mut RawFraction without going through RefMut.
Supposing all this code is placed in a module named frac, you can use it something like this (assuming Fraction implements Display):
let mut f = frac::Fraction::new(3, 10);
println!("{}", f); // prints 3/10
f.borrow_mut().numerator += 3;
println!("{}", f); // prints 3/5
The types encode the invariant: Wherever you have Fraction, you can know that it's fully reduced. When you have a RawFraction, &RawFraction, etc., you can't be sure. If you want, you may also make RawFraction's fields non-pub, so that you can't get an unreduced fraction at all except by calling borrow_mut on a Fraction.
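Putting the pieces together, a complete version of the `frac` module might look like the following sketch. The `gcd` helper, the `Display` impl and the field names `numerator` and `denominator` are filled in as assumptions; sign normalization and zero denominators are deliberately ignored:

```rust
mod frac {
    use std::fmt;
    use std::ops::{Deref, DerefMut};

    pub struct RawFraction {
        pub numerator: i32,
        pub denominator: i32,
    }

    // Euclid's algorithm; assumed helper, not part of the original answer.
    fn gcd(mut a: i32, mut b: i32) -> i32 {
        while b != 0 {
            let t = a % b;
            a = b;
            b = t;
        }
        a.abs()
    }

    impl RawFraction {
        pub fn reduce(&mut self) {
            let g = gcd(self.numerator, self.denominator);
            if g > 1 {
                self.numerator /= g;
                self.denominator /= g;
            }
        }
    }

    // The tuple field is private, so only this module can build a `RefMut`.
    pub struct RefMut<'a>(&'a mut RawFraction);

    impl Deref for RefMut<'_> {
        type Target = RawFraction;
        fn deref(&self) -> &RawFraction {
            &*self.0
        }
    }

    impl DerefMut for RefMut<'_> {
        fn deref_mut(&mut self) -> &mut RawFraction {
            &mut *self.0
        }
    }

    impl Drop for RefMut<'_> {
        fn drop(&mut self) {
            // Restore the invariant when the mutable borrow ends.
            self.0.reduce();
        }
    }

    pub struct Fraction(RawFraction);

    impl Fraction {
        pub fn new(numerator: i32, denominator: i32) -> Self {
            let mut raw = RawFraction { numerator, denominator };
            raw.reduce();
            Fraction(raw)
        }

        pub fn borrow_mut(&mut self) -> RefMut<'_> {
            RefMut(&mut self.0)
        }
    }

    impl fmt::Display for Fraction {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{}/{}", self.0.numerator, self.0.denominator)
        }
    }
}
```

With this module, `frac::Fraction::new(3, 10)` displays as 3/10, and after `f.borrow_mut().numerator += 3;` on a mutable binding it displays as 3/5, because the temporary `RefMut` is dropped at the end of that statement.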
Basically the same thing is done in RefCell. There you want to reduce the runtime borrow-count when a borrow ends. Here you want to perform an arbitrary action.
So let's re-use the concept of writing a function that returns a wrapped reference:
struct Data {
content: i32,
}
impl Data {
fn borrow_mut(&mut self) -> DataRef {
println!("borrowing");
DataRef { data: self }
}
fn check_after_borrow(&self) {
if self.content > 50 {
println!("Hey, content should be <= {:?}!", 50);
}
}
}
struct DataRef<'a> {
data: &'a mut Data
}
impl<'a> Drop for DataRef<'a> {
fn drop(&mut self) {
println!("borrow ends");
self.data.check_after_borrow()
}
}
fn main() {
let mut d = Data { content: 42 };
println!("content is {}", d.content);
{
let b = d.borrow_mut();
//let c = &d; // Compiler won't let you have another borrow at the same time
b.data.content = 123;
println!("content set to {}", b.data.content);
} // borrow ends here
println!("content is now {}", d.content);
}
This results in the following output:
content is 42
borrowing
content set to 123
borrow ends
Hey, content should be <= 50!
content is now 123
Be aware that you can still obtain an unchecked mutable borrow with e.g. let c = &mut d;. This will be silently dropped without calling check_after_borrow.

Can't figure out a function to return a reference to a given type stored in RefCell<Box<Any>>

Most of this is boilerplate, provided as a compilable example. Scroll down.
use std::rc::{Rc, Weak};
use std::cell::RefCell;
use std::any::{Any, AnyRefExt};
struct Shared {
example: int,
}
struct Widget {
parent: Option<Weak<Box<Widget>>>,
specific: RefCell<Box<Any>>,
shared: RefCell<Shared>,
}
impl Widget {
fn new(specific: Box<Any>,
parent: Option<Rc<Box<Widget>>>) -> Rc<Box<Widget>> {
let parent_option = match parent {
Some(parent) => Some(parent.downgrade()),
None => None,
};
let shared = Shared{example: 10};
Rc::new(box Widget{
parent: parent_option,
specific: RefCell::new(specific),
shared: RefCell::new(shared)})
}
}
struct Woo {
foo: int,
}
impl Woo {
fn new() -> Box<Any> {
box Woo{foo: 10} as Box<Any>
}
}
fn main() {
let widget = Widget::new(Woo::new(), None);
{
// This is a lot of work...
let cell_borrow = widget.specific.borrow();
let woo = cell_borrow.downcast_ref::<Woo>().unwrap();
println!("{}", woo.foo);
}
// Can the above be made into a function?
// let woo = widget.get_specific::<Woo>();
}
I'm learning Rust and trying to figure out some workable way of doing a widget hierarchy. The above basically does what I need, but it is a bit cumbersome. Especially vexing is the fact that I have to use two statements to convert the inner widget (specific member of Widget). I tried several ways of writing a function that does it all, but the amount of reference and lifetime wizardry is just beyond me.
Can it be done? Can the commented out method at the bottom of my example code be made into reality?
Comments regarding better ways of doing this entire thing are appreciated, but put it in the comments section (or create a new question and link it)
I'll just present a working, simplified and more idiomatic version of your code and then explain all the changes I made there:
use std::rc::{Rc, Weak};
use std::any::{Any, AnyRefExt};
struct Shared {
example: int,
}
struct Widget {
parent: Option<Weak<Widget>>,
specific: Box<Any>,
shared: Shared,
}
impl Widget {
fn new(specific: Box<Any>, parent: Option<Rc<Widget>>) -> Widget {
let parent_option = match parent {
Some(parent) => Some(parent.downgrade()),
None => None,
};
let shared = Shared { example: 10 };
Widget{
parent: parent_option,
specific: specific,
shared: shared
}
}
fn get_specific<T: 'static>(&self) -> Option<&T> {
self.specific.downcast_ref::<T>()
}
}
struct Woo {
foo: int,
}
impl Woo {
fn new() -> Woo {
Woo { foo: 10 }
}
}
fn main() {
let widget = Widget::new(box Woo::new() as Box<Any>, None);
let specific = widget.get_specific::<Woo>().unwrap();
println!("{}", specific.foo);
}
First of all, there are needless RefCells inside your structure. RefCells are needed very rarely: only when you need to mutate the internal state of an object through a & reference (instead of &mut). This is a useful tool for implementing abstractions, but it is almost never needed in application code. Because it is not clear from your code that you really need it, I assume that it was used mistakenly and can be removed.
Next, Rc<Box<Something>> when Something is a struct (like in your case where Something = Widget) is redundant. Putting an owned box into a reference-counted box is just an unnecessary double indirection and allocation. Plain Rc<Widget> is the correct way to express this thing. When dynamically sized types land, it will be also true for trait objects.
Finally, you should try to always return unboxed values. Returning Rc<Box<Widget>> (or even Rc<Widget>) is unnecessarily limiting for the callers of your code. You can go from Widget to Rc<Widget> easily, but not the other way around. Rust optimizes by-value returns automatically; if your callers need Rc<Widget> they can box the return value themselves:
let rc_w = box(RC) Widget::new(...);
Same thing is also true for Box<Any> returned by Woo::new().
You can see that in the absence of RefCells your get_specific() method can be implemented very easily. However, you really can't do it with RefCell because RefCell uses RAII for dynamic borrow checks, so you can't return a reference to its internals from a method. You'll have to return core::cell::Refs, and your callers will need to downcast_ref() themselves. This is just another reason to use RefCells sparingly.
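If the RefCell does have to stay, returning a `Ref` can be sketched as follows. Note that this uses present-day Rust syntax (`dyn Any`, `i32`, `Box::new`) rather than the pre-1.0 syntax of the code above, and `Ref::filter_map`, which was stabilized in Rust 1.63, long after this answer was written. The `Widget` here is trimmed down to the one relevant field:

```rust
use std::any::Any;
use std::cell::{Ref, RefCell};

struct Woo {
    foo: i32,
}

// Trimmed-down widget that keeps only the `specific` field.
struct Widget {
    specific: RefCell<Box<dyn Any>>,
}

impl Widget {
    fn new(specific: Box<dyn Any>) -> Widget {
        Widget { specific: RefCell::new(specific) }
    }

    // Returns a guard instead of a plain reference, so the dynamic borrow
    // stays registered for as long as the caller holds on to the result.
    fn get_specific<T: 'static>(&self) -> Option<Ref<'_, T>> {
        Ref::filter_map(self.specific.borrow(), |boxed| boxed.downcast_ref::<T>()).ok()
    }
}
```

A caller can then write `widget.get_specific::<Woo>().unwrap().foo` in one expression; asking for a type that is not stored yields `None`, and the failed borrow is released immediately.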
