How can I make a structure with internal references? - rust

I'm trying to make a graph with adjacency lists, but I can't figure out how to specify an appropriate lifetime for the references in the adjacency list.
What I'm trying to get at is the following:
struct Graph<T> {
nodes : Vec<T>,
adjacencies : Vec<Vec<&T>>
}
This won't work because there is a lifetime specifier missing for the reference type. I suppose I could use indices for the adjacencies, but I'm actually interested in the internal reference problem, and this is just a vehicle to express that problem.
The way I see it, this should be possible to do safely, since the nodes are owned by the object. It should be allowed to keep references to those nodes around.
Am I right? How can this be done in Rust? Or, if I'm wrong, what did I miss?

It is not possible to represent this concept in Rust with just references due to Rust’s memory safety—such an object could not be constructed without already existing. As long as nodes and adjacencies are stored separately, it’s OK, but as soon as you try to join them inside the same structure, it can’t be made to work thus.
The alternatives are using reference counting (Rc<T> if immutable is OK or Rc<RefCell<T>> with inner mutability) or using unsafe pointers (*const T or *mut T).

I think there is an issue here. If we transpose this to C++:
template <typename T>
struct Graph {
std::vector<T> nodes;
std::vector<std::vector<T*>> adjacencies;
};
where adjacencies points into nodes, then one realizes that there is an issue: an operation on nodes that invalidates references (such as reallocation) will leave dangling pointers into adjacencies.
I see at least two ways to fix this:
Use indexes in adjacencies, as those are stable
Use a level of indirection in nodes, so that the memory is pinned
In Rust, this gives:
struct Graph<T> {
nodes: Vec<T>,
adjacencies: Vec<Vec<uint>>,
};
struct Graph<T> {
nodes: Vec<Rc<RefCell<T>>>,
adjacencies: Vec<Vec<Rc<RefCell<T>>>>,
};
Note: the RefCell is to allow mutating T despite it being aliased, by introducing a runtime check.

Related

Struct holding references and life time annotations. Is my reasoning correct?

I want to create a struct that will hold a bunch of Path references for later lookup.
I first started with this:
struct AbsoluteSystemPaths{
home_folder:Path,
shortcuts_file:Path
}
The compiler complained about -> Path doesn't have a size known at compile-time.
It is sensible, a struct can only have as much as one non sized var member and I added 2 (home_folder and shortcuts_file). So, I decided to move them to references:
struct AbsoluteSystemPaths{
home_folder:& Path,
shortcuts_file: &Path
}
That made the sized error go away but brought another one related to lifetimes: "Missing lifetime specifier" for the two var members of type Path.
And again, it is pretty sensible, Rust needs to know how much time this 2 references will live to avoid dangling references or memory leaks.
So, I added the life time specifiers:
struct AbsoluteSystemPaths<'a> {
home_folder:&'a Path,
shortcuts_file: &'a Path
}
and then everything compiled without errors.
What I would like to know is:
Is my general reasoning correct?
The lifetime annotations I added could be read as: Both path references will last at least as 'a that is the lifetime of an instance contrusted from AbsoluteSystemPaths struct?
Yes, your understanding is correct.
One thing to note with Path, as it is an unsized type as you've mentioned, is that the standard library provides a struct called PathBuf that represents "An owned, mutable path (akin to String).". This is an alternative to using &Path if you'd like your struct to own the Path instead of hold references to them. This may be useful depending on the way you're using your struct.

Trying to make a graph with cycles in RUST [duplicate]

I have a data structure which can be represented as a unidirectional graph between some structs linked with link objects because links contain metadata.
It looks something like this:
struct StateMachine {
resources: Vec<Resource>,
links: Vec<Link>,
}
struct Resource {
kind: ResourceType,
// ...
}
enum LinkTarget {
ResourceList(Vec<&Resource>),
LabelSelector(HashMap<String, String>),
}
struct Link {
from: LinkTarget,
to: LinkTarget,
metadata: SomeMetadataStruct,
}
The whole structure needs to be mutable because I need to be able to add and remove links and resources at runtime. Because of this, I cannot use the normal lifetime model and bind the resources to the parent struct's lifetime.
I understand that I need to "choose my own guarantee" by picking the appropriate type, but I'm not sure what's the best way to solve this problem.
Modeling graph-like structures in Rust is not a simple problem.
Here there are two valuable discussions from Nick Cameron and Niko Matsakis (two main Rust developers at Mozilla.)
Graphs and arena allocation
Modeling Graphs in Rust Using Vector Indices
Actually, for a graph like structure, the simplest solution is to use an arena such as TypedArena.
The lifetime of the nodes will then be only dependent on the lifetime of the instance of the typed arena they were created from, which will greatly simplify resource management.
Warning: avoid a scenario where you dynamically add/remove nodes to the graph, as the nodes will NOT be removed from the arena until said arena is dropped, so the size of the arena would grow, unbounded.
If you are in a situation where you will add/remove nodes at runtime, another solution is to:
have a collection of Resources
have the edges only indirectly refer to the Resources (not owners, and not borrowers either)
Two examples:
HashMap<ResourceId, (Resource, Vec<ResourceId>)>
type R = RefCell<Resource>, Vec<Rc<R>> and Vec<(Weak<R>, Vec<Weak<R>>)>
in either case, you are responsible for cleaning up the edges when removing a resource, and forgetting may lead to a memory leak and panics (when dereferencing) but is otherwise safe.
There are, probably, infinite variations on the above.
The simplest solution for a graph-like structure is to use a library which models graphs. petgraph is a good choice:
use petgraph::Graph; // 0.5.1
use std::{collections::HashMap, rc::Rc};
struct Resource;
enum LinkTarget {
ResourceList(Vec<Rc<Resource>>),
LabelSelector(HashMap<String, String>),
}
struct SomeMetadataStruct;
fn main() {
let mut graph = Graph::new();
let n1 = graph.add_node(LinkTarget::LabelSelector(Default::default()));
let n2 = graph.add_node(LinkTarget::LabelSelector(Default::default()));
let _l2 = graph.add_edge(n1, n2, SomeMetadataStruct);
}
The guarantees that you have to choose here center around the member of ResourceList. I assume that you wish to have single-threaded shared immutable Resources.
if you need to share them across threads, use a Vec<Arc<Resource>>
if they aren't shared, you can own them — Vec<Resource>
if they need to be mutable, use a Vec<Rc<RefCell<Resource>>> (Or a Mutex or RwLock if also multithreaded)

Tell the Rust compiler that the lifetime of a parameter is always identical to a struct's lifetime

Sorry if this is obvious, I'm starting out with Rust.
I'm trying to implement a simple Composition relationship (one object is the single owner of another one, and the inner object is destroyed automatically when the outer object dies).
I originally thought it would be as simple as declaring a struct like this:
struct Outer {
inner: Inner
}
To my knowledge, that does exactly what I want: the inner attribute is owned by the outer struct, and will be destroyed whenever the outer object disappears.
However, in my case, the Inner type is from a library, and has a lifetime parameter.
// This is illegal
struct Outer {
inner: Inner<'a>
}
// I understand why, I'm using an undeclared lifetime parameter
I have read a bit on lifetime parameters (but I'm not yet completely used to them), and I'm unsure whether there is some syntax to tell the Rust compiler that “the lifetime this field expects is its owner's“, or whether it's just not possible (in which case, what would be the proper way to architecture this code?).
Edit: more detailed situation
I'm writing a project with Vulkano. I want to bundle multiple structures into a single structure so I can pass all at once to functions throughout the project.
Here, I have:
The Engine struct, which should hold everything
The Instance struct, which represents the Vulkan API
The PhysicalDevice struct, which represents a specific GPU, and can only be used as long as its matching Instance exists
The struct I'm struggling with is PhysicalDevice:
// https://github.com/vulkano-rs/vulkano/blob/c6959aa961c9c4bac59f53c99e73620b458d8d82/vulkano/src/device/physical.rs#L297
pub struct PhysicalDevice<'a> {
instance: &'a Arc<Instance>,
device: usize,
}
I want to create a struct that looks like:
pub struct Engine {
instance: Arc<Instance>,
device: PhysicalDevice,
}
Because the Engine struct owns PhysicalDevice as well as Arc<Instance>, and the instance referenced by PhysicalDevice is the same as the one referenced by the Engine, the PhysicalDevice's lifetime requirement should always be valid (since the contained instance cannot be freed before the Engine is freed).
I don't have very good reason of using this architecture, apart from the fact that this is the standard way of bundling related data in other languages. If this not proper "good practices" in Rust, I'm curious as to what the recommended approach would be.
“the lifetime this field expects is its owner's“,
This is impossible.
Generally, you should understand any type with a lifetime parameter (whether it is a reference &'a T or a struct Foo<'a> or anything else) as pointing to something elsewhere and which lives independently.
In your case, whatever Inner borrows (whatever its lifetime is about) cannot be a part of Outer. This is because if it was, Outer would be borrowing parts of itself, which makes Outer impossible to use normally. (Doing so requires pinning, preventing the struct from moving and thereby invalidating the references, which is not currently possible to set up without resorting to unsafe code.)
So, there are three cases you might have:
Your full situation is
struct Outer {
some_data: SomeOtherType,
inner: Inner<'I_want_to_borrow_the_some_data_field>,
}
This is not possible in basic Rust; you need to either
put some_data somewhere other than Outer,
use the ouroboros library which provides mechanisms to build sound self-referential structs (at the price of heap allocations and a complex interface),
or design your data structures differently in some other way.
The data Inner borrows is already independent. In that case, the correct solution is to propagate the lifetime parameter.
struct Outer<'a> {
inner: Inner<'a>,
}
There is not actually any borrowed data; Inner provides for the possibility but isn't actually using it in this case (or the borrowed data is a compile-time constant). In this case, you can use the 'static lifetime.
struct Outer {
inner: Inner<'static>,
}
You can use a bound on Self:
struct Outer<'a> where Self: 'a {
inner: Inner<'a>
}

Is my understanding of a Rust vector that supports Rc or Box wrapped types correct?

I'm not looking for code samples. I want to state my understanding of Box vs. Rc and have you tell me if my understanding is right or wrong.
Let's say I have some trait ChattyAnimal and a struct Cat that implements this trait, e.g.
pub trait ChattyAnimal {
fn make_sound(&self);
}
pub struct Cat {
pub name: String,
pub sound: String
}
impl ChattyAnimal for Cat {
fn make_sound(&self) {
println!("Meow!");
}
}
Now let's say I have other structs (Dog, Cow, Chicken, ...) that also implement the ChattyAnimal trait, and let's say I want to store all of these in the same vector.
So step 1 is I would have to use a Box type, because the Rust compiler cannot determine the size of everything that might implement this trait. And therefore, we must store these items on the heap – viola using a Box type, which is like a smarter pointer in C++. Anything wrapped with Box is automatically deleted by Rust when it goes out of scope.
// I can alias and use my Box type that wraps the trait like this:
pub type BoxyChattyAnimal = Box<dyn ChattyAnimal>;
// and then I can use my type alias, i.e.
pub struct Container {
animals: Vec<BoxyChattyAnimal>
}
Meanwhile, with Box, Rust's borrow checker requires changing when I pass or reassign the instance. But if I actually want to have multiple references to the same underlying instance, I have to use Rc. And so to have a vector of ChattyAnimal instances where each instance can have multiple references, I would need to do:
pub type RcChattyAnimal = Rc<dyn ChattyAnimal>;
pub struct Container {
animals: Vec<RcChattyAnimal>
}
One important take away from this is that if I want to have a vector of some trait type, I need to explicitly set that vector's type to a Box or Rc that wraps my trait. And so the Rust language designers force us to think about this in advance so that a Box or Rc cannot (at least not easily or accidentally) end up in the same vector.
This feels like a very and well thought design – helping prevent me from introducing bugs in my code. Is my understanding as stated above correct?
Yes, all this is correct.
There's a second reason for this design: it allows the compiler to verify that the operations you're performing on the vector elements are using memory in a safe way, relative to how they're stored.
For example, if you had a method on ChattyAnimal that mutates the animal (i.e. takes a &mut self argument), you could call that method on elements of a Vec<Box<dyn ChattyAnimal>> as long as you had a mutable reference to the vector; the Rust compiler would know that there could only be one reference to the ChattyAnimal in question (because the only reference is inside the Box, which is inside the Vec, and you have a mutable reference to the Vec so there can't be any other references to it). If you tried to write the same code with a Vec<Rc<dyn ChattyAnimal>>, the compiler would complain; it wouldn't be able to completely eliminate the possibility that your code might be mutating the animal at the same time as the code that called it was in the middle of trying to read the animal, which might lead to some inconsistencies in the calling code.
As a consequence, the compiler needs to know that all the elements of the Vec have their memory treated in the same way, so that it can check to make sure that a reference to some arbitrary element of the Vec is being used appropriately.
(There's a third reason, too, which is performance; because the compiler knows that this is a "vector of Boxes" or "vector of Rcs", it can generate code that assumes a particular storage mechanism. For example, if you have a vector of Rcs, and clone one of the elements, the machine code that the compiler generates will work simply by going to the memory address listed in the vector and adding 1 to the reference count stored there – there's no need for any extra levels of indirection. If the vector were allowed to mix different allocation schemes, the generated code would have to be a lot more complex, because it wouldn't be able to assume things like "there is a reference count", and would instead need to (at runtime) find the appropriate piece of code for dealing with the memory allocation scheme in use, and then run it; that would be much slower.)

Array as a struct field

I would like to create a non binary tree structure in Rust. Here is a try
struct TreeNode<T> {
tag : T,
father : Weak<TreeNode<T>>,
childrenlists : [Rc<TreeNode<T>>]
}
Unfortunately, this does not compile.
main.rs:4:1: 8:2 error: the trait `core::marker::Sized` is not implemented for the type `[alloc::rc::Rc<TreeNode<T>>]` [E0277]
main.rs:4 struct TreeNode<T> {
main.rs:5 tag : T,
main.rs:6 father : Weak<TreeNode<T>>,
main.rs:7 childrenlist : [Rc<TreeNode<T>>]
main.rs:8 }
main.rs:4:1: 8:2 note: `[alloc::rc::Rc<TreeNode<T>>]` does not have a constant size known at compile-time
main.rs:4 struct TreeNode<T> {
main.rs:5 tag : T,
main.rs:6 father : Weak<TreeNode<T>>,
main.rs:7 childrenlist : [Rc<TreeNode<T>>]
main.rs:8 }
error: aborting due to previous error
The code compiles if we replace an array with a Vec. However, the structure is immutable and I do not need an overallocated Vec.
I heard it could be possible to have a struct field with size unknown at compile time, provided it is unique. How can we do it?
Rust doesn't have the concept of a variable-length (stack) array, which you seem to be trying to use here.
Rust has a couple different array-ish types.
Vec<T> ("vector"): Dynamically sized; dynamically allocated on the heap. This is probably what you want to use. Initialize it with Vec::with_capacity(foo) to avoid overallocation (this creates an empty vector with the given capacity).
[T; n] ("array"): Statically sized; lives on the stack. You need to know the size at compile time, so this won't work for you (unless I've misanalysed your situation).
[T] ("slice"): Unsized; usually used from &[T]. This is a view into a contiguous set of Ts in memory somewhere. You can get it by taking a reference to an array, or a vector (called "taking a slice of an array/vector"), or even taking a view into a subset of the array/vector. Being unsized, [T] can't be used directly as a variable (it can be used as a member of an unsized struct), but you can view it from behind a pointer. Pointers referring to [T] are fat ; i.e. they have an extra field for the length. &[T] would be useful if you want to store a reference to an existing array; but I don't think that's what you want to do here.
If you don't know the size of the list in advance, you have two choices:
&[T] which is just a reference to some piece of memory that you don't own
Vec<T> which is your own storage.
The correct thing here is to use a Vec. Why? Because you want the children list (array of Rc) to be actually owned by the TreeNode. If you used a &[T], it means that someone else would be keeping the list, not the TreeNode. With some lifetime trickery, you could write some valid code but you would have to go very far to please the compiler because the borrowed reference would have to be valid at least as long as the TreeNode.
Finally, a sentence in your question shows a misunderstanding:
However, the structure is immutable and I do not need an overallocated Vec.
You confuse mutability and ownership. Sure you can have an immutable Vec. It seems like you want to avoid allocating memory from the heap, but that's not possible, precisely because you don't know the size of the children list. Now if you're concerned with overallocating, you can fine-tune the vector storage with methods like with_capacity() and shrink_to_fit().
A final note: if you actually know the size of the list because it is fixed at compile time, you just need to use a [T; n] where n is compile-time known. But that's not the same as [T].

Resources