Rust rustc::middle::graph::Graph with string node indices

I'm new to Rust (using 0.10) and am exploring it by implementing something like the rustc::middle::graph::Graph struct, but using strings as node indices and storing nodes in a HashMap.
Assuming non-static keys, what's a reasonable and efficient policy for ownership of the strings? Does the HashMap need to own its keys? Does each NodeIndex need to own its str? Is it possible for the node to own the string that defines its index and have everything else borrow that string? More generally, how should one share an immutable (but non-static) string amongst several data structures? If the answer is "it depends", what are the relevant issues?
If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?

Is it possible for the node to own the string that defines its index and have everything else borrow that string?
[...]
If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?
Not really: it's impossible (for the compiler) to verify that self-references don't get invalidated, i.e. many borrowing-internally situations (including this one specifically) end up allowing code similar to
struct Foo<'a> {
things: Vec<~str>,
borrows: Vec<&'a str>
}
let mut foo = Foo { ... };
foo.things.push(~"x");
foo.things.push(~"y");
// foo.things is [~"x", ~"y"]
// try to borrow the ~"y" to put a "y" into borrows
foo.borrows.push(foo.things.get(1).as_slice());
// ... time/SLoC passes ...
// we make the borrowed slice point to freed memory
// by popping/deallocating the ~"y"
foo.things.pop();
The compiler has a very hard time telling that an arbitrary modification won't be like the .pop call and invalidate the internal pointers. Basically, putting a self-reference to data that an object owns into the object itself would have to freeze that object, so no further modifications could occur (including moving out of the struct, e.g. to return it).
Tl;dr: you can't have one part of the Graph storing ~strs and another part storing &str slices into those ~strs.
That said, if you were to use some unsafe code, and only allow the Graph to be expanded, i.e. never removing nodes (i.e. never letting a ~str be deallocated until the whole graph is being destroyed), then some version of this could actually work.
More generally, how should one share an immutable (but non-static) string amongst several data structures?
You can use a Rc<~str>, with each data structure storing its own pointer. Something like
struct Graph<T> {
nodes: HashMap<Rc<~str>, Node<T>>
}
struct Node<T> {
name: Rc<~str>,
data: T,
outgoing_edges: Vec<Rc<~str>>
}
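The answer above targets Rust 0.10, where ~str was the owned string type. In current Rust that type no longer exists, and the closest counterpart to Rc<~str> is Rc<str>. The following is a minimal sketch of the same layout under that assumption; the add_node and add_edge helpers are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;
use std::rc::Rc;

struct Node<T> {
    name: Rc<str>,
    data: T,
    outgoing_edges: Vec<Rc<str>>,
}

struct Graph<T> {
    nodes: HashMap<Rc<str>, Node<T>>,
}

impl<T> Graph<T> {
    fn new() -> Self {
        Graph { nodes: HashMap::new() }
    }

    // Hypothetical helper: allocates the string once and shares it
    // between the map key and the node itself.
    fn add_node(&mut self, name: &str, data: T) -> Rc<str> {
        let key: Rc<str> = Rc::from(name);
        let node = Node {
            name: Rc::clone(&key),
            data,
            outgoing_edges: Vec::new(),
        };
        self.nodes.insert(Rc::clone(&key), node);
        key
    }

    // Hypothetical helper: returns false if either endpoint is missing.
    fn add_edge(&mut self, from: &str, to: &str) -> bool {
        // Reuse the target's existing key so no new string is allocated.
        let target = match self.nodes.get_key_value(to) {
            Some((key, _)) => Rc::clone(key),
            None => return false,
        };
        match self.nodes.get_mut(from) {
            Some(node) => {
                node.outgoing_edges.push(target);
                true
            }
            None => false,
        }
    }
}
```

Because Rc<str> implements Borrow<str>, lookups such as nodes.get("a") work with a plain &str, so callers never need to build an Rc just to query the map.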
Another approach would be using a bidirectional map connecting strings with an "interned" index of a simple type, something like
struct Graph<T> {
index: BiMap<~str, Index>,
nodes: HashMap<Index, Node<T>>
}
#[deriving(Hash, Eq, TotalEq)]
struct Index { x: uint }
struct Node<T> {
name: Index,
data: T,
outgoing_edges: Vec<Index>
}
(Unfortunately this is hypothetical: Rust's stdlib doesn't have a BiMap type like this (yet...).)
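As a rough sketch of how the interning idea could look in current Rust without a BiMap type, a HashMap from string to index can be paired with a Vec for the reverse direction. The Interner type and its intern and name methods are hypothetical names, not a standard-library API, and this simple version stores each string twice.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Index(usize);

// Hypothetical stand-in for the BiMap: string -> index via the map,
// index -> string via the vector.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, Index>,
    names: Vec<String>,
}

impl Interner {
    // Returns the existing index for `name`, or assigns the next free one.
    fn intern(&mut self, name: &str) -> Index {
        if let Some(&idx) = self.ids.get(name) {
            return idx;
        }
        let idx = Index(self.names.len());
        self.ids.insert(name.to_owned(), idx);
        self.names.push(name.to_owned());
        idx
    }

    // Panics if `idx` did not come from this interner.
    fn name(&self, idx: Index) -> &str {
        &self.names[idx.0]
    }
}

struct Node<T> {
    name: Index,
    data: T,
    outgoing_edges: Vec<Index>,
}

struct Graph<T> {
    index: Interner,
    nodes: HashMap<Index, Node<T>>,
}
```

The nodes and edges then hold only small Copy indices, and the strings live in exactly one structure that owns them.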

Related

Storing mutable references owned by someone else

I want to store structs in a HashMap but also keep references to the same structs in a vector field of another struct.
Let's say I have a tree built from two types of structs, Item and Relation.
I am storing all the relations in a HashMap by their id,
but I also want to fill each item.out_relations vector with mutable references to the same Relation structs, which are owned by the HashMap.
Here is my Item:
pub struct Item<'a> {
pub id: oid::ObjectId,
pub out_relations: Vec<&'a mut Relation>, // <-- want to fill this
}
When iterating over the relations loaded from the DB,
I am trying to do something like this:
let from_id = relation.from_id; //to find related item
item_map // my items loaded here already
.entry(from_id)
.and_modify(|item| {
(*item)
.out_relations.push(
relation_map.try_insert(relation.id, relation).unwrap() // mut ref to relation expected here
)
}
);
For now the compiler warns that try_insert is an unstable feature and points me to this bug.
But let's imagine I already have this mutable ref to a relation owned by the HashMap: is it going to work?
Or will this again be some "reference does not live long enough" error? What are my options then? Would it be better to store just the relation id in the item's out_relations vector rather than refs, and take the relation from the hashmap when needed?
This is called shared mutability, and it is forbidden by the borrow checker.
Fortunately Rust offers safe tools to achieve this.
In your case you need to use Rc<RefCell<Relation>>, so your code would be:
pub struct Item {
pub id: oid::ObjectId,
pub out_relations: Vec<Rc<RefCell<Relation>>>,
}
And when iterating over the relations loaded from the DB, you would do something like this:
let relation = Rc::new(RefCell::new(relation));
// read both ids up front (assuming the id type is Copy)
let (id, from_id) = (relation.borrow().id, relation.borrow().from_id);
// the map keeps one handle; insert is stable, but unlike try_insert it overwrites an existing entry
relation_map.insert(id, Rc::clone(&relation));
item_map // my items loaded here already
    .entry(from_id)
    .and_modify(|item| {
        // the item keeps a second handle to the same relation
        item.out_relations.push(Rc::clone(&relation));
    });
If you want to mutate relation later, you can use .borrow_mut()
relation.borrow_mut().from_id = new_id;
I agree with Oussama Gammoudi's diagnosis, but I'd like to offer alternative solutions.
The issue with reference counting is its runtime cost: every clone and drop of an Rc updates a counter, and RefCell checks its borrows at run time.
In your vector, can you store the key to the hash map instead? Then, when you need to get the value from the vector, get the key from the vector and fetch the value from the hash map?
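A minimal sketch of that key-based layout follows. The u32 ids and the label field are stand-ins for the real oid::ObjectId and Relation contents, and the out_relations function is a hypothetical helper introduced for illustration.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the types in the question.
struct Relation {
    id: u32,
    from_id: u32,
    label: String,
}

struct Item {
    id: u32,
    out_relations: Vec<u32>, // keys into the relation map, not references
}

// Hypothetical helper: resolves the stored keys to shared references.
// Keys that are no longer in the map are skipped.
fn out_relations<'a>(
    item: &Item,
    relation_map: &'a HashMap<u32, Relation>,
) -> Vec<&'a Relation> {
    item.out_relations
        .iter()
        .filter_map(|id| relation_map.get(id))
        .collect()
}
```

When a relation has to be mutated, relation_map.get_mut(&id) gives a temporary mutable borrow, so no long-lived &mut references need to be stored anywhere.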
Another strategy is to keep a vector of the values, and then store indices into the vector in your hash map and other vector. This Rust Keynote speaker describes the strategy well: https://youtu.be/aKLntZcp27M?t=1787

how can I create a throwaway mutable reference?

I'm trying to wrap a vector to change its index behaviour, so that when an out of bounds access happens, instead of panicking it returns a reference to a dummy value, like so:
use std::ops::Index;
struct VecWrapper(Vec<()>);
impl Index<usize> for VecWrapper {
type Output = ();
fn index(&self, idx: usize) -> &() {
if idx < self.0.len() {
&self.0[idx]
} else {
&()
}
}
}
This works just fine for the implementation of Index, but trying to implement IndexMut the same way fails for obvious reasons. The type in my collection does not have a Drop implementation, so no destructor needs to be called (other than freeing the memory).
The only solution I can think of is to have a static mutable array containing thousands of dummies and to hand out references to elements of this array. This is a horrible solution, though: it still leads to UB if more dummies are ever borrowed than the static array holds.
Give the wrapper an additional field that's the dummy. It's got the same lifetime restrictions, and so can't be aliased.
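A minimal sketch of that suggestion, using i32 elements instead of () so the effect is visible. The field names items and dummy are illustrative, and resetting the dummy before handing it out is an extra design choice here, not part of the answer above.

```rust
use std::ops::{Index, IndexMut};

struct VecWrapper {
    items: Vec<i32>,
    dummy: i32, // scratch slot handed out for out-of-bounds writes
}

impl Index<usize> for VecWrapper {
    type Output = i32;

    fn index(&self, idx: usize) -> &i32 {
        // Out-of-bounds reads always see a constant 0.
        self.items.get(idx).unwrap_or(&0)
    }
}

impl IndexMut<usize> for VecWrapper {
    fn index_mut(&mut self, idx: usize) -> &mut i32 {
        if idx < self.items.len() {
            &mut self.items[idx]
        } else {
            // Reset so earlier out-of-bounds writes do not leak through.
            self.dummy = 0;
            &mut self.dummy
        }
    }
}
```

Because index_mut takes &mut self, the borrow checker guarantees that only one mutable reference into the wrapper exists at a time, so a single dummy is enough.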

Expected struct but found mutable reference to struct

I created a type Space that has optional fields left_neighbor and right_neighbor of the same type Space. Rust needs to know the size of the type at compile time, so I wrapped the types in a Box<>. I now want to create a method that I can call on a Space that creates the right_neighbor of this Space object and assigns it as such. It also needs to set the left_neighbor field of the new Space to the old Space so they can both find each other.
pub struct Space {
left_neighbor: Option<Box<Space>>,
right_neighbor: Option<Box<Space>>,
}
impl Space {
pub fn new() -> Self {
Self {
left_neighbor: None,
right_neighbor: None,
}
}
pub fn create_neigbor(&mut self) {
let neighbor_space = Space::new();
neighbor_space.left_neighbor = Some(Box::new(self));
self.right_neighbor = Some(Box::new(neighbor_space));
}
}
This gives the compile error:
|
16 | neighbor_space.left_neighbor = Some(Box::new(self));
| ^^^^ expected struct `Space`, found `&mut Space`
How do I fix this problem?
First of all, what you are making is similar to a doubly linked list. Doubly linked lists are notoriously hard to write in safe Rust. I recommend reading Learning Rust with Entirely Too Many Linked Lists.
As a TL;DR, you can use raw pointers and unsafe, or store your spaces in external storage (a vector or similar) and use indices into that storage as a proxy for the actual thing, or use Rc/Arc instead of Box (though this will probably lead to memory leaks unless you properly use weak refs and such).
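The external-storage option could look roughly like the sketch below. The Spaces container and its add and create_neighbor methods are hypothetical names, and neighbors are stored as usize indices into the vector rather than as boxes or references.

```rust
struct Space {
    left_neighbor: Option<usize>,
    right_neighbor: Option<usize>,
}

// Hypothetical container that owns every Space; neighbors refer to
// each other by position in `all`.
struct Spaces {
    all: Vec<Space>,
}

impl Spaces {
    fn new() -> Self {
        Spaces { all: Vec::new() }
    }

    // Adds an unconnected space and returns its index.
    fn add(&mut self) -> usize {
        self.all.push(Space {
            left_neighbor: None,
            right_neighbor: None,
        });
        self.all.len() - 1
    }

    // Creates a new space to the right of `of` and links both directions.
    // Panics if `of` is not a valid index; overwrites any existing
    // right neighbor of `of`.
    fn create_neighbor(&mut self, of: usize) -> usize {
        let new = self.add();
        self.all[new].left_neighbor = Some(of);
        self.all[of].right_neighbor = Some(new);
        new
    }
}
```

Since the links are plain indices, both spaces can point at each other without any shared ownership, at the cost of having to go through the container for every lookup.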

How to take, transform and replace a vector in a mutable reference?

I have a struct Database { events: Vec<Event> }. I would like to apply some maps and filters to events. What is a good way to do this?
Here's what I tried:
fn update(db: &mut Database) {
db.events = db.events.into_iter().filter(|e| !e.cancelled).collect();
}
This doesn't work:
cannot move out of `db.events` which is behind a mutable reference
...
move occurs because `db.events` has type `Vec<Event>`, which does not implement the `Copy` trait
Is there any way to persuade Rust compiler that I'm taking the field value only temporarily?
The conceptual reason this doesn't work is panics. If, for example, the filter callback panics, then db.events would have been moved out of by into_iter, but there would be no value to put back in its place: it would be uninitialized, and therefore unsafe.
Joël Hecht has what you really want to do in your specific instance: Vec::retain lets you filter out elements in place, and also reuses the storage.
Alexey Larionov also has an answer involving Vec::drain, which will leave an empty vector until the replacement happens. It requires a new allocation, though.
However, in the general case, the replace_with and take_mut crates offer functions to help accomplish what you are doing. You give the function a closure that takes the value and returns its replacement; the crate runs that closure and aborts the process if it panics.
In the case you describe, the safer way is to use Vec::retain:
fn update(db: &mut Database) {
db.events.retain(|e| !e.cancelled);
}
As an alternative to @Joël Hecht's answer, you can use Vec::drain to remove the elements and then rebuild the vector:
fn update(db: &mut Database) {
db.events = db.events
.drain(..)
.filter(|e| !e.cancelled)
.collect();
}
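A related standard-library option, not mentioned in the answers above, is std::mem::take: it swaps the field with an empty Vec and hands back the old one by value, so into_iter can consume it. This is a sketch under the assumption that Event has a cancelled flag as in the question; the struct definitions are stand-ins.

```rust
// Stand-ins for the types in the question.
struct Event {
    cancelled: bool,
}

struct Database {
    events: Vec<Event>,
}

fn update(db: &mut Database) {
    // Leaves an empty Vec in db.events, so the field is always valid,
    // even if the filter closure were to panic.
    let events = std::mem::take(&mut db.events);
    db.events = events.into_iter().filter(|e| !e.cancelled).collect();
}
```

This pattern generalizes to maps and other transformations where retain does not fit, as long as the field's type implements Default.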

Use reference to new struct in the struct initializer

I am trying to create a disjoint set structure in Rust. It looks like this
struct DisjointSet<'s> {
id: usize,
parent: &'s mut DisjointSet<'s>,
}
The default disjoint set is a singleton structure, in which the parent refers to itself. Hence, I would like to have the option to do the following:
let a: DisjointSet = DisjointSet {
id: id,
parent: self,
};
where the self is a reference to the object that will be created.
I have tried working around this issue by creating a custom constructor. However, my attempts failed because partial initialization is not allowed. The compiler suggests using Option<DisjointSet<'s>>, but this is quite ugly. Do you have any suggestions?
My question differs from Structure containing fields that know each other
because I am interested in getting the reference to the struct that will be created.
As @delnan says, at their core these sorts of data structures are directed acyclic graphs (DAGs), with all the sharing that entails. Rust is strict about what sharing can happen, so it takes a bit of extra effort to convince the compiler to accept your code in this case.
Fortunately though, "all the sharing that entails" isn't literally "all the sharing": a DAG is acyclic (modulo wanting to have parent: self), so a reference counting type like Rc or Arc is a perfect way to handle the sharing (reference counting is not so good if there are cycles). Specifically:
struct DisjointSet {
id: Cell<usize>,
parent: Rc<DisjointSet>,
}
The Cell has zero runtime overhead (there is definitely some syntactic overhead) for such a small type.
Unfortunately, this still isn't quite right for the same reason that the compiler suggests using Option<...>. There's no way to create the first DisjointSet. However, the suggested fix still works:
struct DisjointSet {
id: Cell<usize>,
parent: Option<Rc<DisjointSet>>,
}
(The Option<...> is free: Option<Rc<...>> is a single nullable pointer, just like Rc<...> is a single non-nullable pointer, and presumably one would need a branch on "do I have a parent or not" anyway.)
If you are going to take this approach, I would recommend not trying to use the Option for partial initialisation, but instead use it to represent the fact that the given set is a "root". It is easy to traverse up a chain with this representation, e.g.
fn find_root(mut x: &DisjointSet) -> &DisjointSet {
while let Some(ref parent) = x.parent {
x = parent
}
x
}
The same approach should work fine with references, but the lifetimes can often be hard to juggle.
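Putting the pieces of this answer together, a small self-contained sketch might look as follows. The new_root and new_child constructors are hypothetical helpers added for illustration, and find_root is written in current pattern syntax rather than with ref.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct DisjointSet {
    id: Cell<usize>,
    parent: Option<Rc<DisjointSet>>, // None marks a root
}

// Hypothetical constructor for a singleton set.
fn new_root(id: usize) -> Rc<DisjointSet> {
    Rc::new(DisjointSet {
        id: Cell::new(id),
        parent: None,
    })
}

// Hypothetical constructor for a set attached to an existing parent.
fn new_child(id: usize, parent: &Rc<DisjointSet>) -> Rc<DisjointSet> {
    Rc::new(DisjointSet {
        id: Cell::new(id),
        parent: Some(Rc::clone(parent)),
    })
}

// Walks the parent chain until it reaches a set without a parent.
fn find_root(mut x: &DisjointSet) -> &DisjointSet {
    while let Some(parent) = &x.parent {
        x = parent;
    }
    x
}
```

Note that parent itself is immutable behind the Rc in this sketch; re-parenting a set after creation (as a union operation would) would need interior mutability for that field as well, for example a RefCell.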

Resources