struct with reference to element of a vector in another field - rust

I have the below example where I want a struct which holds a vector of data in one field, and has another field which contains the currently selected field. My understanding is that this is not possible in rust because I could remove the element in tables which selected points to, thereby creating a dangling pointer (can't borrow mut when an immutable borrow exists). The obvious workaround is to instead store a usize index for the element, rather than a &'a String. But this means I need to update the index if I remove an element from tables. Is there any way to avoid this using smart pointers, or just any better solutions in general? I've looked at other questions but they are not quite the same as below, and have extra information which makes them harder to follow for a beginner like myself, whereas below is a very minimal example.
struct Data<'a> {
selected: &'a String,
tables: Vec<String>,
}
fn main() {
let tables = vec!["table1".to_string(), "table2".to_string()];
let my_stuff = Data {
selected: &tables[0],
tables: tables,
};
}

You quite rightfully assessed that the way you wrote it is not possible, because Rust guarantees memory safety and storing it as a reference would give the possibility to create a dangling pointer.
There are several solutions that I could see here.
Static Strings
This of course only works if you store compile-time static strings.
struct Data {
selected: &'static str,
tables: Vec<&'static str>,
}
fn main() {
let tables = vec!["table1", "table2"];
let my_stuff = Data {
selected: &tables[0],
tables,
};
}
The reason this works is because static strings are non-mutable and guaranteed to never be deallocated. Also, in case this is confusing, I recommend reading up on the differences between Strings and str slices.
You can even go one further and reduce the lifetime down to 'a. But then, you have to store them as &'a str in the vector, to ensure they cannot be edited.
But that then allows you to store Strings in them, as long as the strings can be borrowed for the entire lifetime of the Data object.
struct Data<'a> {
selected: &'a str,
tables: Vec<&'a str>,
}
fn main() {
let str1 = "table1".to_string();
let str2 = "table2".to_string();
let tables = vec![str1.as_str(), str2.as_str()];
let my_stuff = Data {
selected: &tables[0],
tables,
};
}
Reference counting smart pointers
Depending your situation, there are several types that are recommended:
Rc<...> - if your data is immutable. Otherwise, you need to create interior mutability with:
Rc<Cell<...>> - safest and best solution IF your problem is single-threaded and deals with simple data types
Rc<RefCell<...>> - for more complex data types that have to be updated in-place and can't just be moved in and out
Arc<Mutex<...>> - as soon as your problem stretches over multiple threads
In your case, the data is in fact simple and your program is single-threaded, so I'd go with:
use std::{cell::Cell, rc::Rc};
struct Data {
selected: Rc<Cell<String>>,
tables: Vec<Rc<Cell<String>>>,
}
fn main() {
let tables = vec![
Rc::new(Cell::new("table1".to_string())),
Rc::new(Cell::new("table2".to_string())),
];
let my_stuff = Data {
selected: tables[0].clone(),
tables,
};
}
Of course, if you don't want to modify your strings after creation, you could go with:
use std::rc::Rc;
struct Data {
selected: Rc<String>,
tables: Vec<Rc<String>>,
}
fn main() {
let tables = vec![Rc::new("table1".to_string()), Rc::new("table2".to_string())];
let my_stuff = Data {
selected: tables[0].clone(),
tables,
};
}
Hiding the data structure and using an index
As you already mentioned, you could use an index instead. Then you would have to hide the vector and provide getters/setters/modifiers to make sure the index is kept in sync when the vector changes.
I'll keep the implementation up to the reader and won't provide an example here :)
I hope this helped already, or at least gave you a couple of new ideas. I'm happy to see new people come to the community, so feel free to ask further questions if you have any :)

Related

What is the design pattern to use when you need two borrows in Rust?

Here is a Rust function.
impl Room {
pub fn getPageContent(&mut self, bookNumber: u32, pageNumber: u16, alp: &Alphabet) {
let page = &mut self.books[(bookNumber-1) as usize].pages[(pageNumber-1) as usize];
if page.text.is_empty() {
page.generateContent(alp, bookNumber, &self);
}
}
}
Each page needs information from to room in order to generate its content.
However, doing that as show in the function of course leads to
cannot borrow `self` as immutable because it is also borrowed as mutable´
My question is : what is the intended design pattern to be used in such cases ? Especially at scale, as this is only an example.
Giving the page each information it needs from the Room would be a solution if all those informations implement Clone, but doesn't seem advisable (what if ´generateContent´ needs 10 different informations ? Such a pattern would quickly multiply to work to be done along with potential mistakes and overall complexity).
Another solution is to generate the content directly in the ´getPageContent´, but this pattern would quickly lead to having everything be done inside Room, which does not seem to be a good idea either when applied at scale either.
You can pull the mutation out of generateContent then change generateContent to take a non-mut reference.
pub fn get_page_content(&mut self, book_number: u32, page_number: u16, alp: &Alphabet) {
let page = &self.books[(book_number - 1) as usize].pages[(page_number - 1) as usize];
if page.text.is_empty() {
let text = page.generate_content(alp, book_number, self);
// lookup the page again but mut this time
let page =
&mut self.books[(book_number - 1) as usize].pages[(page_number - 1) as usize];
page.text = text;
}
}

How to create a Box<UnsafeCell<[T]>>

The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a std::Vec<T>, and use .into_boxed_slice(). However, nothing similar to this seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.
The only (not-unsafe) way to create a Box<[T]> is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed a parameter. This in turn effectively requires T: Copy, because T has to be copied from behind the reference into the new Box. But UnsafeCell is not Copy, regardless if T is. Discussion about making UnsafeCell Copy has been going on for years, yielding no final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box and UnsafeCell are both CoerceUnsize, and [T; N] is Unsize, you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>. This limits you to initializing from fixed-sized arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely mutate it to a Box<UnsafeCell<[T]>, as the UnsafeCell<[T]> is guaranteed to have the same memory layout as a [T], given that [T] doesn't use niche-values (even if T does).
Transmute:
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
unsafe {
mem::transmute(inp)
}
}
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
Having said all this: I strongly suggest you are suffering from the xy-problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.
Just swap the pointer types to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}
Playground

Using a HashSet to canonicalize objects in Rust

As an educational exercise, I'm looking at porting cvs-fast-export to Rust.
Its basic mode of operation is to parse a number of CVS master files into a intermediate form, and then to analyse the intermediate form with the goal of transforming it into a git fast-export stream.
One of the things that is done when parsing is to convert common parts of the intermediate form into a canonical representation. A motivating example is commit authors. A CVS repository may have hundreds of thousands of individual file commits, but probably less than a thousand authors. So an interning table is used when parsing where you input the author as you parse it from the file and it will give you a pointer to a canonical version, creating a new one if it hasn't seen it before. (I've heard this called atomizing or interning too). This pointer then gets stored on the intermediate object.
My first attempt to do something similar in Rust attempted to use a HashSet as the interning table. Note this uses CVS version numbers rather than authors, this is just a sequence of digits such as 1.2.3.4, represented as a Vec.
use std::collections::HashSet;
use std::hash::Hash;
#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Vec<u16>);
fn intern<T:Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> &T {
let dupe = item.clone();
if !set.contains(&item) {
set.insert(item);
}
set.get(&dupe).unwrap()
}
fn main() {
let mut set: HashSet<CvsNumber> = HashSet::new();
let c1 = CvsNumber(vec![1, 2]);
let c2 = intern(&mut set, c1);
let c3 = CvsNumber(vec![1, 2]);
let c4 = intern(&mut set, c3);
}
This fails with error[E0499]: cannot borrow 'set' as mutable more than once at a time. This is fair enough, HashSet doesn't guarantee references to its keys will be valid if you add more items after you have obtained a reference. The C version is careful to guarantee this. To get this guarantee, I think the HashSet should be over Box<T>. However I can't explain the lifetimes for this to the borrow checker.
The ownership model I am going for here is that the interning table owns the canonical versions of the data, and hands out references. The references should be valid as long the interning table exists. We should be able to add new things to the interning table without invalidating the old references. I think the root of my problem is that I'm confused how to write the interface for this contract in a way consistent with the Rust ownership model.
Solutions I see with my limited Rust knowledge are:
Do two passes, build a HashSet on the first pass, then freeze it and use references on the second pass. This means additional temporary storage (sometimes substantial).
Unsafe
Does anyone have a better idea?
I somewhat disagree with #Shepmaster on the use of unsafe here.
While right now it does not cause issue, should someone decide in the future to change the use of HashSet to include some pruning (for example, to only ever keep a hundred authors in there), then unsafe will bite you sternly.
In the absence of a strong performance reason, I would simply use a Rc<XXX>. You can alias it easily enough: type InternedXXX = Rc<XXX>;.
use std::collections::HashSet;
use std::hash::Hash;
use std::rc::Rc;
#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Rc<Vec<u16>>);
fn intern<T:Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> T {
if !set.contains(&item) {
let dupe = item.clone();
set.insert(dupe);
item
} else {
set.get(&item).unwrap().clone()
}
}
fn main() {
let mut set: HashSet<CvsNumber> = HashSet::new();
let c1 = CvsNumber(Rc::new(vec![1, 2]));
let c2 = intern(&mut set, c1);
let c3 = CvsNumber(Rc::new(vec![1, 2]));
let c4 = intern(&mut set, c3);
}
Your analysis is correct. The ultimate issue is that when modifying the HashSet, the compiler cannot guarantee that the mutations will not affect the existing allocations. Indeed, in general they might affect them, unless you add another layer of indirection, as you have identified.
This is a prime example of a place that unsafe is useful. You, the programmer, can assert that the code will only ever be used in a particular way, and that particular way will allow the variable to be stable through any mutations. You can use the type system and module visibility to help enforce these conditions.
Note that String already introduces a heap allocation. So long as you don't change the String once it's allocated, you don't need an extra Box.
Something like this seems like an OK start:
use std::{cell::RefCell, collections::HashSet, mem};
struct EasyInterner(RefCell<HashSet<String>>);
impl EasyInterner {
fn new() -> Self {
EasyInterner(RefCell::new(HashSet::new()))
}
fn intern<'a>(&'a self, s: &str) -> &'a str {
let mut set = self.0.borrow_mut();
if !set.contains(s) {
set.insert(s.into());
}
let interned = set.get(s).expect("Impossible missing string");
// TODO: Document the pre- and post-conditions that the code must
// uphold to make this unsafe code valid instead of copying this
// from Stack Overflow without reading it
unsafe { mem::transmute(interned.as_str()) }
}
}
fn main() {
let i = EasyInterner::new();
let a = i.intern("hello");
let b = i.intern("world");
let c = i.intern("hello");
// Still strings
assert_eq!(a, "hello");
assert_eq!(a, c);
assert_eq!(b, "world");
// But with the same address
assert_eq!(a.as_ptr(), c.as_ptr());
assert!(a.as_ptr() != b.as_ptr());
// This shouldn't compile; a cannot outlive the interner
// let x = {
// let i = EasyInterner::new();
// let a = i.intern("hello");
// a
// };
let the_pointer;
let i = {
let i = EasyInterner::new();
{
// Introduce a scope to contstrain the borrow of `i` for `s`
let s = i.intern("inner");
the_pointer = s.as_ptr();
}
i // moving i to a new location
// All outstanding borrows are invalidated
};
// but the data is still allocated
let s = i.intern("inner");
assert_eq!(the_pointer, s.as_ptr());
}
However, it may be much more expedient to use a crate like:
string_cache, which has the collective brainpower of the Servo project behind it.
typed-arena
generational-arena

Share a reference variable in two collections

I am trying to share a reference in two collections: a map and a list. I want to push to both of the collections and remove from the back of the list and remove from the map too. This code is just a sample to present what I want to do, it doesn't even compile!
What is the right paradigm to implement this? What kind of guarantees are the best practice?
use std::collections::{HashMap, LinkedList};
struct Data {
url: String,
access: u64,
}
struct Holder {
list: LinkedList<Box<Data>>,
map: HashMap<String, Box<Data>>,
}
impl Holder {
fn push(&mut self, d: Data) {
let boxed = Box::new(d);
self.list.push_back(boxed);
self.map.insert(d.url.to_owned(), boxed);
}
fn remove_last(&mut self) {
if let Some(v) = self.list.back() {
self.map.remove(&v.url);
}
self.list.pop_back();
}
}
The idea of storing a single item in multiple collections simultaneously is not new... and is not simple.
The common idea to sharing elements is either doing it unsafely (knowing how many collections there are) or doing it with a shared pointer.
There is a some inefficiency in just blindly using regular collections with a shared pointer:
Node based collections mean you have a double indirection
Switching from one view to the other require finding the element all over again
In Boost.MultiIndex (C++), the paradigm used is to create an intrusive node to wrap the value, and then link this node in the various views. The trick (and difficulty) is to craft the intrusive node in a way that allows getting to the surrounding elements to be able to "unlink" it in O(1) or O(log N).
It would be quite unsafe to do so, and I don't recommend attempting it unless you are ready to spend significant time on it.
Thus, for a quick solution:
type Node = Rc<RefCell<Data>>;
struct Holder {
list: LinkedList<Node>,
map: HashMap<String, Node>,
}
As long as you do not need to remove by url, this is efficient enough.
Complete example:
use std::collections::{HashMap, LinkedList};
use std::cell::RefCell;
use std::rc::Rc;
struct Data {
url: String,
access: u64,
}
type Node = Rc<RefCell<Data>>;
struct Holder {
list: LinkedList<Node>,
map: HashMap<String, Node>,
}
impl Holder {
fn push(&mut self, d: Data) {
let url = d.url.to_owned();
let node = Rc::new(RefCell::new(d));
self.list.push_back(Rc::clone(&node));
self.map.insert(url, node);
}
fn remove_last(&mut self) {
if let Some(node) = self.list.back() {
self.map.remove(&node.borrow().url);
}
self.list.pop_back();
}
}
fn main() {}
Personally, I would use Rc instead of Box.
Alternatively, you could store indexes into your list as the value type of your hash map (i.e. use HashMap<String, usize> instead of HashMap<String, Box<Data>>.
Depending on what you are trying to do, it might be a better idea to only have either a list or a map, and not both.

Referencing a containing struct in Rust (and calling methods on it)

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a container structure in Rust where its elements also store a reference to the containing container so that they can call methods on it. As far as I could figure out, I need to do this via Rc<RefCell<T>>. Is this correct?
So far, I have something like the following:
struct Container {
elems: ~[~Element]
}
impl Container {
pub fn poke(&mut self) {
println!("Got poked.");
}
}
struct Element {
datum: int,
container: Weak<RefCell<Container>>
}
impl Element {
pub fn poke_container(&mut self) {
let c1 = self.container.upgrade().unwrap(); // Option<Rc>
let mut c2 = c1.borrow().borrow_mut(); // &RefCell
c2.get().poke();
// self.container.upgrade().unwrap().borrow().borrow_mut().get().poke();
// -> Error: Borrowed value does not live long enough * 2
}
}
fn main() {
let container = Rc::new(RefCell::new(Container{ elems: ~[] }));
let mut elem1 = Element{ datum: 1, container: container.downgrade() };
let mut elem2 = Element{ datum: 2, container: container.downgrade() };
elem1.poke_container();
}
I feel like I am missing something here. Is accessing the contents of a Rc<RefCell<T>> really this difficult (in poke_container)? Or am I approaching the problem the wrong way?
Lastly, and assuming the approach is correct, how would I write an add method for Container so that it could fill in the container field in Element (assuming I changed the field to be of type Option<Rc<RefCell<T>>>? I can't create another Rc from &mut self as far as I know.
The long chain of method calls actually works for me on master without any changes, because the lifetime of "r-values" (e.g. the result of function calls) have changed so that the temporary return values last until the end of the statement, rather than the end of the next method call (which seemed to be how the old rule worked).
As Vladimir hints, overloadable dereference will likely reduce it to
self.container.upgrade().unwrap().borrow_mut().poke();
which is nicer.
In any case, "mutating" shared ownership is always going to be (slightly) harder to write in Rust that either single ownership code or immutable shared ownership code, because it's very easy for such code to be memory unsafe (and memory safety is the core goal of Rust).

Resources