Trying to make a graph with cycles in RUST [duplicate] - rust

I have a data structure that can be represented as a directed graph of structs. The structs are connected through separate link objects because the links carry metadata.
It looks something like this:
struct StateMachine {
resources: Vec<Resource>,
links: Vec<Link>,
}
struct Resource {
kind: ResourceType,
// ...
}
enum LinkTarget {
ResourceList(Vec<&Resource>),
LabelSelector(HashMap<String, String>),
}
struct Link {
from: LinkTarget,
to: LinkTarget,
metadata: SomeMetadataStruct,
}
The whole structure needs to be mutable because I need to be able to add and remove links and resources at runtime. Because of this, I cannot use the normal lifetime model and bind the resources to the parent struct's lifetime.
I understand that I need to "choose my own guarantee" by picking the appropriate type, but I'm not sure what's the best way to solve this problem.

Modeling graph-like structures in Rust is not a simple problem.
Here are two valuable discussions by Nick Cameron and Niko Matsakis (two core Rust developers at Mozilla):
Graphs and arena allocation
Modeling Graphs in Rust Using Vector Indices

Actually, for a graph like structure, the simplest solution is to use an arena such as TypedArena.
The lifetime of the nodes will then be only dependent on the lifetime of the instance of the typed arena they were created from, which will greatly simplify resource management.
Warning: avoid a scenario where you dynamically add/remove nodes to the graph, as the nodes will NOT be removed from the arena until said arena is dropped, so the size of the arena would grow, unbounded.
If you are in a situation where you will add/remove nodes at runtime, another solution is to:
have a collection of Resources
have the edges only indirectly refer to the Resources (not owners, and not borrowers either)
Two examples:
HashMap<ResourceId, (Resource, Vec<ResourceId>)>
type R = RefCell<Resource>, Vec<Rc<R>> and Vec<(Weak<R>, Vec<Weak<R>>)>
In either case, you are responsible for cleaning up the edges when removing a resource. Forgetting to do so may lead to memory leaks and panics (when dereferencing), but is otherwise safe.
There are, probably, infinite variations on the above.
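As a minimal sketch of the first variation, the map below stores each resource next to the ids of the resources it links to. The names `Store`, `add`, `link`, `remove` and `targets` are hypothetical, and `ResourceId` is assumed to be a plain integer. Removal shows the edge clean-up that the answer says is your responsibility.

```rust
use std::collections::HashMap;

// Hypothetical id type; any small Copy + Hash + Eq type would do.
type ResourceId = u32;

#[derive(Debug)]
struct Resource {
    name: String,
}

#[derive(Default)]
struct Store {
    next_id: ResourceId,
    // Each entry owns a resource plus the ids of the resources it links to.
    entries: HashMap<ResourceId, (Resource, Vec<ResourceId>)>,
}

impl Store {
    // Inserts a resource and returns the id that edges will refer to.
    fn add(&mut self, name: &str) -> ResourceId {
        let id = self.next_id;
        self.next_id += 1;
        self.entries
            .insert(id, (Resource { name: name.to_string() }, Vec::new()));
        id
    }

    // Adds an edge; returns false if either endpoint does not exist.
    fn link(&mut self, from: ResourceId, to: ResourceId) -> bool {
        if !self.entries.contains_key(&to) {
            return false;
        }
        match self.entries.get_mut(&from) {
            Some((_, edges)) => {
                edges.push(to);
                true
            }
            None => false,
        }
    }

    // Removes a resource and every edge that pointed at it.
    fn remove(&mut self, id: ResourceId) -> Option<Resource> {
        let (resource, _) = self.entries.remove(&id)?;
        for (_, edges) in self.entries.values_mut() {
            edges.retain(|&target| target != id);
        }
        Some(resource)
    }

    // Returns the outgoing edges of a resource, if it exists.
    fn targets(&self, id: ResourceId) -> Option<&[ResourceId]> {
        self.entries.get(&id).map(|(_, edges)| edges.as_slice())
    }
}
```

Because edges are plain ids, cycles need no special treatment, and a stale id simply fails to resolve instead of dangling.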

The simplest solution for a graph-like structure is to use a library which models graphs. petgraph is a good choice:
use petgraph::Graph; // 0.5.1
use std::{collections::HashMap, rc::Rc};
struct Resource;
enum LinkTarget {
ResourceList(Vec<Rc<Resource>>),
LabelSelector(HashMap<String, String>),
}
struct SomeMetadataStruct;
fn main() {
let mut graph = Graph::new();
let n1 = graph.add_node(LinkTarget::LabelSelector(Default::default()));
let n2 = graph.add_node(LinkTarget::LabelSelector(Default::default()));
let _l2 = graph.add_edge(n1, n2, SomeMetadataStruct);
}
The guarantees that you have to choose here center around the member of ResourceList. I assume that you wish to have single-threaded shared immutable Resources.
if you need to share them across threads, use a Vec<Arc<Resource>>
if they aren't shared, you can own them — Vec<Resource>
if they need to be mutable, use a Vec<Rc<RefCell<Resource>>> (Or a Mutex or RwLock if also multithreaded)

Related

Does Rust's borrow checker really mean that I should re-structure my program?

So I've read Why can't I store a value and a reference to that value in the same struct? and I understand why my naive approach to this was not working, but I'm still very unclear how to better handle my situation.
I have a program I wanted to structure like follows (details omitted because I can't make this compile anyway):
use std::sync::Mutex;
struct Species{
index : usize,
population : Mutex<usize>
}
struct Simulation<'a>{
species : Vec<Species>,
grid : Vec<&'a Species>
}
impl<'a> Simulation<'a>{
pub fn new() -> Self {...} //I don't think there's any way to implement this
pub fn run(&self) {...}
}
The idea is that I create a vector of Species (which won't change for the lifetime of Simulation, except in specific mutex-guarded fields) and then a grid representing which species live where, which will change freely. This implementation won't work, at least not any way I've been able to figure out. As I understand it, the issue is that however I write my new method, the moment it returns, all of the references in grid would become invalid, because Simulation and therefore Simulation.species are moved to another location on the stack. Even if I could prove to the compiler that species and its contents would continue to exist, they actually won't be in the same place. Right?
I've looked into various ways around this, such as making species as an Arc on the heap or using usizes instead of references and implementing my own lookup function into the species vector, but these seem slower, messier or worse. What I'm starting to think is that I need to really re-structure my code to look something like this (details filled in with placeholders because now it actually runs):
use std::sync::Mutex;
struct Species{
index : usize,
population : Mutex<usize>
}
struct Simulation<'a>{
species : &'a Vec<Species>, //Now just holds a reference rather than data
grid : Vec<&'a Species>
}
impl<'a> Simulation<'a>{
pub fn new(species : &'a Vec <Species>) -> Self { //has to be given pre-created species
let grid = vec!(species.first().unwrap(); 10);
Self{species, grid}
}
pub fn run(&self) {
let mut population = self.grid[0].population.lock().unwrap();
println!("Population: {}", population);
*population += 1;
}
}
pub fn top_level(){
let species = vec![Species{index: 0, population : Mutex::new(0_)}];
let simulation = Simulation::new(&species);
simulation.run();
}
As far as I can tell this runs fine, and ticks off all the ideal boxes:
grid uses simple references with minimal boilerplate for me
these references are checked at compile time with minimal overhead for the system
Safety is guaranteed by the compiler (unlike a custom map based approach)
But, this feels very weird to me: the two-step initialization process of creating owned memory and then references can't be abstracted any way that I can see, which feels like I'm exposing an implementation detail to the calling function. top_level has to also be responsible for establishing any other functions or (scoped) threads to run the simulation, call draw/gui functions, etc. If I need multiple levels of references, I believe I will need to add additional initialization steps to that level.
So, my question is just "Am I doing this right?". While I can't exactly prove this is wrong, I feel like I'm losing a lot of near-universal abstraction of the call structure. Is there really no way to return species and simulation as a pair at the end (with some one-off update to make all references point to the "forever home" of the data)?
Phrasing my problem a second way: I do not like that I cannot have a function with a signature of () -> Simulation, when I can have a pair of function calls that have that same effect. I want to be able to encapsulate the creation of this simulation. I feel like the fact that this approach cannot do so indicates I'm doing something wrong, and that there may be a more idiomatic approach I'm missing.
"I've looked into various ways around this, such as making species as an Arc on the heap or using usizes instead of references and implementing my own lookup function into the species vector, but these seem slower, messier or worse."
Don't assume that, test it. I once had a self-referential (using ouroboros) structure much like yours, with a vector of things and a vector of references to them. I tried rewriting it to use indices instead of references, and it was faster.
Rc/Arc is also an option worth trying out — note that there is only an extra cost to the reference counting when an Arc is cloned or dropped. Arc<Species> doesn't cost any more to dereference than &Species, and you can always get an &Species from an Arc<Species>. So the reference counting only matters if and when you're changing which Species is in an element of Grid.
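A minimal sketch of the Arc variant, assuming one species and ten grid cells as in the question's placeholder code. Having `run` return the new population is an illustrative change, not part of the original. Because the grid shares ownership instead of borrowing, `Simulation` needs no lifetime parameter and `new` can take no arguments.

```rust
use std::sync::{Arc, Mutex};

struct Species {
    index: usize,
    population: Mutex<usize>,
}

struct Simulation {
    species: Vec<Arc<Species>>,
    // Each cell shares ownership of a species; no borrow of `species` is involved.
    grid: Vec<Arc<Species>>,
}

impl Simulation {
    // The `() -> Simulation` constructor the question asks for.
    pub fn new() -> Self {
        let species = vec![Arc::new(Species {
            index: 0,
            population: Mutex::new(0),
        })];
        // Cloning an Arc only bumps the reference count.
        let grid = vec![Arc::clone(&species[0]); 10];
        Self { species, grid }
    }

    // Increments the population of the species in cell 0 and returns it.
    pub fn run(&self) -> usize {
        // Reading through an Arc is an ordinary pointer dereference.
        let mut population = self.grid[0].population.lock().unwrap();
        *population += 1;
        *population
    }
}
```

Reassigning a grid cell costs one reference-count increment and one decrement; reading a cell costs nothing extra.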
If you're owning a Vec of objects, then want to also keep track of references to particular objects in that Vec, a usize index is almost always the simplest design. It might feel like extra boilerplate to you now, but it's a hell of a lot better than properly dealing with keeping pointers in check in a self-referential struct (as somebody who's made this mistake in C++ more than I should have, trust me). Rust's rules are saving you from some real headaches, just not ones that are obvious to you necessarily.
If you want to get fancy and feel like a raw usize is too arbitrary, then I recommend you look at slotmap. A simple SlotMap is internally not much more than an array of values, so iteration is fast and storage is efficient. But it gives you generational indices (slotmap calls these "keys") to the values: each value is tagged with a "generation", and each index also keeps hold of its generation. You can therefore safely remove and replace items in the Vec without your references suddenly pointing at a different object. It's really cool.
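The plain index-based design recommended above can be sketched as follows. The helper `species_at` is a hypothetical name, and returning the population from `run` is an illustrative change. The grid stores positions in `species` instead of references, so the struct owns everything and can be built and returned by a single function.

```rust
use std::sync::Mutex;

struct Species {
    index: usize,
    population: Mutex<usize>,
}

struct Simulation {
    species: Vec<Species>,
    // Each cell stores an index into `species` instead of a reference.
    grid: Vec<usize>,
}

impl Simulation {
    // No lifetime parameter, so construction is fully encapsulated.
    pub fn new() -> Self {
        let species = vec![Species {
            index: 0,
            population: Mutex::new(0),
        }];
        let grid = vec![0; 10];
        Self { species, grid }
    }

    // Hypothetical helper that resolves a grid cell to its species.
    fn species_at(&self, cell: usize) -> &Species {
        &self.species[self.grid[cell]]
    }

    // Increments the population of the species in cell 0 and returns it.
    pub fn run(&self) -> usize {
        let mut population = self.species_at(0).population.lock().unwrap();
        *population += 1;
        *population
    }
}
```

The indices stay valid when the `Simulation` value is moved, which is exactly the property the reference-based version could not provide.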

Tell the Rust compiler that the lifetime of a parameter is always identical to a struct's lifetime

Sorry if this is obvious, I'm starting out with Rust.
I'm trying to implement a simple Composition relationship (one object is the single owner of another one, and the inner object is destroyed automatically when the outer object dies).
I originally thought it would be as simple as declaring a struct like this:
struct Outer {
inner: Inner
}
To my knowledge, that does exactly what I want: the inner attribute is owned by the outer struct, and will be destroyed whenever the outer object disappears.
However, in my case, the Inner type is from a library, and has a lifetime parameter.
// This is illegal
struct Outer {
inner: Inner<'a>
}
// I understand why, I'm using an undeclared lifetime parameter
I have read a bit on lifetime parameters (but I'm not yet completely used to them), and I'm unsure whether there is some syntax to tell the Rust compiler that "the lifetime this field expects is its owner's", or whether it's just not possible (in which case, what would be the proper way to architect this code?).
Edit: more detailed situation
I'm writing a project with Vulkano. I want to bundle multiple structures into a single structure so I can pass all at once to functions throughout the project.
Here, I have:
The Engine struct, which should hold everything
The Instance struct, which represents the Vulkan API
The PhysicalDevice struct, which represents a specific GPU, and can only be used as long as its matching Instance exists
The struct I'm struggling with is PhysicalDevice:
// https://github.com/vulkano-rs/vulkano/blob/c6959aa961c9c4bac59f53c99e73620b458d8d82/vulkano/src/device/physical.rs#L297
pub struct PhysicalDevice<'a> {
instance: &'a Arc<Instance>,
device: usize,
}
I want to create a struct that looks like:
pub struct Engine {
instance: Arc<Instance>,
device: PhysicalDevice,
}
Because the Engine struct owns PhysicalDevice as well as Arc<Instance>, and the instance referenced by PhysicalDevice is the same as the one referenced by the Engine, the PhysicalDevice's lifetime requirement should always be valid (since the contained instance cannot be freed before the Engine is freed).
I don't have a very good reason for using this architecture, apart from the fact that this is the standard way of bundling related data in other languages. If this is not good practice in Rust, I'm curious as to what the recommended approach would be.
"the lifetime this field expects is its owner's"
This is impossible.
Generally, you should understand any type with a lifetime parameter (whether it is a reference &'a T or a struct Foo<'a> or anything else) as pointing to something elsewhere and which lives independently.
In your case, whatever Inner borrows (whatever its lifetime is about) cannot be a part of Outer. This is because if it was, Outer would be borrowing parts of itself, which makes Outer impossible to use normally. (Doing so requires pinning, preventing the struct from moving and thereby invalidating the references, which is not currently possible to set up without resorting to unsafe code.)
So, there are three cases you might have:
Your full situation is
struct Outer {
some_data: SomeOtherType,
inner: Inner<'I_want_to_borrow_the_some_data_field>,
}
This is not possible in basic Rust; you need to either
put some_data somewhere other than Outer,
use the ouroboros library which provides mechanisms to build sound self-referential structs (at the price of heap allocations and a complex interface),
or design your data structures differently in some other way.
The data Inner borrows is already independent. In that case, the correct solution is to propagate the lifetime parameter.
struct Outer<'a> {
inner: Inner<'a>,
}
There is not actually any borrowed data; Inner provides for the possibility but isn't actually using it in this case (or the borrowed data is a compile-time constant). In this case, you can use the 'static lifetime.
struct Outer {
inner: Inner<'static>,
}
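The second and third cases can be sketched with a stand-in for the library type. Here `Inner` is a hypothetical struct that borrows a `&str`, and `StaticOuter`, `borrowed_len` and `make_static` are illustrative names, not part of any library.

```rust
// Hypothetical stand-in for a library type that borrows data.
struct Inner<'a> {
    text: &'a str,
}

// The borrowed data lives outside `Outer`, so the lifetime is propagated.
struct Outer<'a> {
    inner: Inner<'a>,
}

// Nothing is borrowed except a compile-time constant, so 'static works.
struct StaticOuter {
    inner: Inner<'static>,
}

// `owner` outlives the `Outer` built from it, which is what 'a expresses.
fn borrowed_len(owner: &str) -> usize {
    let outer = Outer {
        inner: Inner { text: owner },
    };
    outer.inner.text.len()
}

// String literals have the 'static lifetime, so no parameter is needed.
fn make_static() -> StaticOuter {
    StaticOuter {
        inner: Inner { text: "constant" },
    }
}
```

In both cases `Outer` owns its `Inner` value; the lifetime only describes the data that `Inner` points to, which must live somewhere else.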
You can use a bound on Self:
struct Outer<'a> where Self: 'a {
inner: Inner<'a>
}

Is my understanding of a Rust vector that supports Rc or Box wrapped types correct?

I'm not looking for code samples. I want to state my understanding of Box vs. Rc and have you tell me if my understanding is right or wrong.
Let's say I have some trait ChattyAnimal and a struct Cat that implements this trait, e.g.
pub trait ChattyAnimal {
fn make_sound(&self);
}
pub struct Cat {
pub name: String,
pub sound: String
}
impl ChattyAnimal for Cat {
fn make_sound(&self) {
println!("Meow!");
}
}
Now let's say I have other structs (Dog, Cow, Chicken, ...) that also implement the ChattyAnimal trait, and let's say I want to store all of these in the same vector.
So step 1 is I would have to use a Box type, because the Rust compiler cannot determine the size of everything that might implement this trait. Therefore, we must store these items on the heap, voilà, using a Box type, which is like a smart pointer in C++. Anything wrapped with Box is automatically deleted by Rust when it goes out of scope.
// I can alias and use my Box type that wraps the trait like this:
pub type BoxyChattyAnimal = Box<dyn ChattyAnimal>;
// and then I can use my type alias, i.e.
pub struct Container {
animals: Vec<BoxyChattyAnimal>
}
Meanwhile, with Box, Rust's borrow checker requires ownership to change when I pass or reassign the instance. But if I actually want to have multiple references to the same underlying instance, I have to use Rc. And so to have a vector of ChattyAnimal instances where each instance can have multiple references, I would need to do:
pub type RcChattyAnimal = Rc<dyn ChattyAnimal>;
pub struct Container {
animals: Vec<RcChattyAnimal>
}
One important takeaway from this is that if I want to have a vector of some trait type, I need to explicitly set that vector's type to a Box or Rc that wraps my trait. And so the Rust language designers force us to think about this in advance so that a Box and an Rc cannot (at least not easily or accidentally) end up in the same vector.
This feels like a very well-thought-out design, helping prevent me from introducing bugs in my code. Is my understanding as stated above correct?
Yes, all this is correct.
There's a second reason for this design: it allows the compiler to verify that the operations you're performing on the vector elements are using memory in a safe way, relative to how they're stored.
For example, if you had a method on ChattyAnimal that mutates the animal (i.e. takes a &mut self argument), you could call that method on elements of a Vec<Box<dyn ChattyAnimal>> as long as you had a mutable reference to the vector; the Rust compiler would know that there could only be one reference to the ChattyAnimal in question (because the only reference is inside the Box, which is inside the Vec, and you have a mutable reference to the Vec so there can't be any other references to it). If you tried to write the same code with a Vec<Rc<dyn ChattyAnimal>>, the compiler would complain; it wouldn't be able to completely eliminate the possibility that your code might be mutating the animal at the same time as the code that called it was in the middle of trying to read the animal, which might lead to some inconsistencies in the calling code.
As a consequence, the compiler needs to know that all the elements of the Vec have their memory treated in the same way, so that it can check to make sure that a reference to some arbitrary element of the Vec is being used appropriately.
(There's a third reason, too, which is performance; because the compiler knows that this is a "vector of Boxes" or "vector of Rcs", it can generate code that assumes a particular storage mechanism. For example, if you have a vector of Rcs, and clone one of the elements, the machine code that the compiler generates will work simply by going to the memory address listed in the vector and adding 1 to the reference count stored there – there's no need for any extra levels of indirection. If the vector were allowed to mix different allocation schemes, the generated code would have to be a lot more complex, because it wouldn't be able to assume things like "there is a reference count", and would instead need to (at runtime) find the appropriate piece of code for dealing with the memory allocation scheme in use, and then run it; that would be much slower.)
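The difference between mutating through a Box and through an Rc can be sketched as follows. This is a simplified variant of the question's trait: the `rename` method is a hypothetical addition, and `make_sound` returns a String instead of printing so the result can be inspected.

```rust
use std::cell::RefCell;
use std::rc::Rc;

trait ChattyAnimal {
    fn make_sound(&self) -> String;
    // Hypothetical mutating method, added for illustration.
    fn rename(&mut self, name: &str);
}

struct Cat {
    name: String,
}

impl ChattyAnimal for Cat {
    fn make_sound(&self) -> String {
        format!("{} says Meow!", self.name)
    }

    fn rename(&mut self, name: &str) {
        self.name = name.to_string();
    }
}

// A Box is the unique owner of its animal, so mutable access to the
// slice is enough to get mutable access to every element.
fn rename_boxed(animals: &mut [Box<dyn ChattyAnimal>], name: &str) {
    for animal in animals.iter_mut() {
        animal.rename(name);
    }
}

// An Rc may be aliased, so mutation has to go through RefCell's runtime
// borrow check. Calling `rename` on a plain `Rc<dyn ChattyAnimal>`
// is rejected at compile time.
fn rename_shared(animals: &[Rc<RefCell<dyn ChattyAnimal>>], name: &str) {
    for animal in animals {
        animal.borrow_mut().rename(name);
    }
}
```

With the Rc version, every handle to the same animal observes the rename, which is the aliasing the compiler refuses to allow without an explicit interior-mutability wrapper.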

How safe is this behavior of GTK-rs Builder::get_object?

In The Rust Programming Language, it says something like:
Move semantics
There’s some more subtlety here, though: Rust ensures that there is
exactly one binding to any given resource. For example, if we have a
vector, we can assign it to another binding:
But I found that I can do this using gtk-rs:
let label1: gtk::Label = builder.get_object("label1").unwrap();
let label1_test: gtk::Label = builder.get_object("label1").unwrap();
Both now point to the same resource, or am I missing something?
Builder::get_object is defined as:
pub fn get_object<T: IsA<Object>>(&self, name: &str) -> Option<T> {
unsafe {
Option::<Object>::from_glib_none(
ffi::gtk_builder_get_object(self.to_glib_none().0, name.to_glib_none().0))
.and_then(|obj| obj.downcast().ok())
}
}
Although this is not really something from Rust directly, just from gtk-rs, I was wondering whether I am right and how safe this is.
Maybe it could use Rc?
GTK/GLib objects (GObject) implement reference counting directly, similar to the Arc type in Rust. You can safely have multiple references to the same object, and once the last one goes out of scope the object will be destroyed.
For mutability, in Rust, gtk-rs uses interior mutability (conceptually). So you can mutate every reference to the same object, even if there are multiple of them. The implementation of the objects has to handle that (and has to anyway because that's how things work in GTK/GLib in C).
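The behavior described above can be modeled with the standard library alone. This is a conceptual analogy, not the gtk-rs API: `Label` here is a hypothetical handle type, and it uses Rc with RefCell where GObject uses its own reference counting and GTK handles mutation internally.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a reference-counted widget handle.
#[derive(Clone)]
struct Label {
    inner: Rc<RefCell<String>>,
}

impl Label {
    fn new(text: &str) -> Self {
        Label {
            inner: Rc::new(RefCell::new(text.to_string())),
        }
    }

    // Takes `&self`: mutation goes through interior mutability.
    fn set_text(&self, text: &str) {
        *self.inner.borrow_mut() = text.to_string();
    }

    fn text(&self) -> String {
        self.inner.borrow().clone()
    }

    // Number of live handles to the same underlying object.
    fn ref_count(&self) -> usize {
        Rc::strong_count(&self.inner)
    }
}
```

Each handle is a separate Rust value with a single owner, so the "exactly one binding" rule is not violated; what is shared is the object behind the handles.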

How can I make a structure with internal references?

I'm trying to make a graph with adjacency lists, but I can't figure out how to specify an appropriate lifetime for the references in the adjacency list.
What I'm trying to get at is the following:
struct Graph<T> {
nodes : Vec<T>,
adjacencies : Vec<Vec<&T>>
}
This won't work because there is a lifetime specifier missing for the reference type. I suppose I could use indices for the adjacencies, but I'm actually interested in the internal reference problem, and this is just a vehicle to express that problem.
The way I see it, this should be possible to do safely, since the nodes are owned by the object. It should be allowed to keep references to those nodes around.
Am I right? How can this be done in Rust? Or, if I'm wrong, what did I miss?
It is not possible to represent this concept in Rust with just references due to Rust’s memory safety—such an object could not be constructed without already existing. As long as nodes and adjacencies are stored separately, it’s OK, but as soon as you try to join them inside the same structure, it can’t be made to work thus.
The alternatives are using reference counting (Rc<T> if immutable is OK or Rc<RefCell<T>> with inner mutability) or using unsafe pointers (*const T or *mut T).
I think there is an issue here. If we transpose this to C++:
template <typename T>
struct Graph {
std::vector<T> nodes;
std::vector<std::vector<T*>> adjacencies;
};
where adjacencies points into nodes, then one realizes that there is an issue: an operation on nodes that invalidates references (such as reallocation) will leave dangling pointers in adjacencies.
I see at least two ways to fix this:
Use indexes in adjacencies, as those are stable
Use a level of indirection in nodes, so that the memory is pinned
In Rust, this gives:
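// Option 1: adjacency lists store indices into `nodes`.
struct Graph<T> {
    nodes: Vec<T>,
    adjacencies: Vec<Vec<usize>>,
}
// Option 2 (an alternative definition): nodes are individually
// heap-allocated and shared. Requires `use std::{cell::RefCell, rc::Rc};`.
struct Graph<T> {
    nodes: Vec<Rc<RefCell<T>>>,
    adjacencies: Vec<Vec<Rc<RefCell<T>>>>,
}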
Note: the RefCell is to allow mutating T despite it being aliased, by introducing a runtime check.
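A minimal sketch of the index-based option, with a small hypothetical API (`add_node`, `add_edge`, `neighbors`) around it. References to nodes are only produced on demand, so reallocation of `nodes` cannot leave anything dangling.

```rust
struct Graph<T> {
    nodes: Vec<T>,
    // adjacencies[i] lists the indices of the nodes that node i points to.
    adjacencies: Vec<Vec<usize>>,
}

impl<T> Graph<T> {
    fn new() -> Self {
        Graph {
            nodes: Vec::new(),
            adjacencies: Vec::new(),
        }
    }

    // Stores a node and returns its index, which stays stable as the Vec grows.
    fn add_node(&mut self, value: T) -> usize {
        self.nodes.push(value);
        self.adjacencies.push(Vec::new());
        self.nodes.len() - 1
    }

    // Panics if either index is out of range.
    fn add_edge(&mut self, from: usize, to: usize) {
        assert!(to < self.nodes.len(), "unknown target node");
        self.adjacencies[from].push(to);
    }

    // Resolves indices to references only for the duration of the borrow.
    fn neighbors(&self, node: usize) -> impl Iterator<Item = &T> + '_ {
        self.adjacencies[node].iter().map(move |&i| &self.nodes[i])
    }
}
```

Cycles and self-loops are just repeated indices. Removing nodes is the part this sketch leaves out, since it would shift or invalidate indices.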
