How to model complex recursive data structures (graphs)? - rust

I am very interested in Rust and am now starting my first non-trivial project in the language. I am still having a bit of trouble fully understanding the concepts of borrowing and lifetime.
The application is a logic gate simulator in which components are defined recursively (in terms of other components and their interconnections).
My current plan is to implement this similarly as I would in C++ by having a Component structure owning a vector of Components (its sub-components) and a vector of Nets describing the inter-connections between those components:
pub struct Pin {
name: String
}
pub struct Net<'a> {
nodes: Vec<(&'a Component<'a>,&'a Pin)>
}
pub struct Component<'a> {
sub_components: Vec<Box<Component<'a>>>,
in_pins: Vec<Pin>,
out_pins: Vec<Pin>,
netlist: Vec<Net<'a>>
}
impl<'a> Component<'a> {
pub fn new() -> Component<'a> {
...
}
pub fn add_subcomponent( & mut self, comp: Component<'a> ) {
// -> &Box<Component<'a>> ??
....
}
}
In C++, Net would be easy to implement as an array of pointers to Components but I am not sure of the best way to do this in Rust, I suppose I should use borrowed pointers? Or is there a better way?
Consider the following main:
fn main() {
let sub1 = Component::new();
let sub2 = Component::new();
let circuit = Component::new();
circuit.add_subcomponent( sub1 );
circuit.add_subcomponent( sub2 );
// sub1 and sub2 are now empty...
}
How can I configure circuit to create a net between sub1 and sub2? Shall I have add_subcomponent returns a borrowed pointer to the added Component? or the Box?
It would be great if someone could point me in the right direction.
Thanks a lot.

You can't represent an arbitrary graph structure in safe rust.
The best way to implement this pattern is to use unsafe code and raw pointers, or an existing abstraction that wraps this functionality in a safe api, for example http://static.rust-lang.org/doc/master/std/cell/struct.RefCell.html
For example, the typical bi directional linked list would be:
struct Node {
next: Option<Node>, // Each node 'owns' the next one
prev: *mut Node // Backrefs are unsafe
}
There have been a number of 'safe' implementations floating around, where you have something like:
struct Node {
id: u32,
next: u32,
prev: u32
}
struct Nodes {
all:Vec<Node>,
root:Option<Node>
}
This is 'technically' safe, but it's a terrible pattern; it breaks all the safety rules by just manually implementing raw pointers. I strongly advise against it.
You could try using references, eg:
struct Container<'a> {
edges:Vec<Edge<'a>>,
pub root: Node
}
struct Node {
children:Vec<Node>
}
struct Edge<'a> {
n1: &'a Node,
n2: &'a Node
}
...but you'll stumble almost immediately into borrow checker hell. For example, when you remove a node, how does the borrow checker know that the associated links in 'edges' are no longer valid?
Although you might be able to define the structures, populating them will be extremely troublesome.
I know that's probably a quite unsatisfying answer; you may find it useful to search github for 'rust graph' and 'rust tree' and look at the implementations other people have done.
Typically they enforce single-ownership of a sub-tree to a parent object.

Related

Overriding or dynamic dispatch equivalent in rust

I've been learning rust for a while and loving it. I've hit a wall though trying to do something which ought to be simple and elegant so I'm sure I'm missing the obvious.
So I'm parsing JavaScript using the excellent RESSA crate and end up with an AST which is a graph of structs defined by the crate. Now I need to traverse this many times and 'visit' certain nodes with my logic. So I've written a traverser that does that but when it hits a certain nodes it needs to call a callback. In my niavity, I thought I'd define a struct with an attribute for every type with an Option<Fn()> value. In my traverser, I check for the Some value and call it. This works fine but it's ugly because I have to populate this enormous struct with dozens of attributes most of which are None because I'm not interested in those types. Then I thought traits, I'd define a trait 'Visit' which defines the function with a default implementation that does nothing. Then I can just redefine the trait implementation with my desired implementation but this is no good because all the types must have an implementation and then the implementation cannot be redefined. Is there as nice way I can just provide a specific implementation for a few types and leave the rest as default or check for the existence of a function before calling it ? I must be missing an idiomatic way to do this.
You can look at something like syn::Visit, which is a visitor in a popular Rust AST library, for inspiration.
The Visit trait is implemented by the visitor only, and has one method for each node type, with the default implementation only visiting the children:
// this snippet has been slightly altered from the source
pub trait Visit<'ast> {
fn visit_expr(&mut self, i: &'ast Expr) {
visit_expr(self, i);
}
fn visit_expr_array(&mut self, i: &'ast ExprArray) {
visit_expr_array(self, i);
}
fn visit_expr_assign(&mut self, i: &'ast ExprAssign) {
visit_expr_assign(self, i);
}
// ...
}
pub fn visit_expr<'ast, V>(v: &mut V, node: &'ast Expr)
where
V: Visit<'ast> + ?Sized,
{
match node {
Expr::Array(_binding_0) => v.visit_expr_array(_binding_0),
Expr::Assign(_binding_0) => v.visit_expr_assign(_binding_0),
// ...
}
}
pub fn visit_expr_array<'ast, V>(v: &mut V, node: &'ast ExprArray)
where
V: Visit<'ast> + ?Sized,
{
for el in &node.elems {
v.visit_expr(el);
}
}
// ...
With this pattern, you can create a visitor where you only implement the methods you need, and whatever you don't implement will just get the default behavior.
Additionally, because the default methods call separate functions that do the default behavior, you can call those within your custom visitor methods if you need to invoke the default behavior of visiting the children. (Rust doesn't let you invoke default implementations of an overriden trait method directly.)
So for example, a visitor to print all array expressions in a Rust program using syn::Visit could look like:
struct MyVisitor;
impl Visit<'ast> for MyVisitor {
fn visit_expr_array(&mut self, i: &'ast ExprArray) {
println("{:?}", i);
// call default visitor method to visit this node's children as well
visit_expr_array(i);
}
}
fn main() {
let root = get_ast_root_node();
MyVisitor.visit_expr(&root);
}

How to achieve encapsulation of struct fields without borrowing the struct as a whole

My question has already been somewhat discussed here.
The problem is that I want to access multiple distinct fields of a struct in order to use them, but I don't want to work on the fields directly. Instead I'd like to encapsulate the access to them, to gain more flexibility.
I tried to achieve this by writing methods for this struct, but as you can see in the question mentioned above or my older question here this approach fails in Rust because the borrow checker will only allow you to borrow different parts of a struct if you do so directly, or if you use a method borrowing them together, since all the borrow checker knows from its signature is that self is borrowed and only one mutable reference to self can exist at any time.
Losing the flexibility to borrow different parts of a struct as mutable at the same time is of course not acceptable. Therefore I'd like to know whether there is any idiomatic way to do this in Rust.
My naive approach would be to write a macro instead of a function, performing (and encapsulating) the same functionality.
EDIT:
Because Frxstrem suggested that the question which I linked to at the top may answer my question I want to make it clear, that I'm not searching for some way to solve this. My question is which of the proposed solutions (if any) is the right way to do it?
After some more research it seems that there is no beautiful way to achieve partial borrowing without accessing the struct fields directly, at least as of now.
There are, however, multiple imperfect solutions to this which I'd like to discuss a little, so that anyone who might end up here may weigh the pros and cons for themself.
I'll use the code from my original question as an example to illustrate them here.
So let's say you have a struct:
struct GraphState {
nodes: Vec<Node>,
edges: Vec<Edge>,
}
and you're trying to do borrow one part of this struct mutably and the other immutably:
// THIS CODE DOESN'T PASS THE BORROW-CHECKER
impl GraphState {
pub fn edge_at(&self, edge_index: u16) -> &Edge {
&self.edges[usize::from(edge_index)]
}
pub fn node_at_mut(&mut self, node_index: u8) -> &mut Node {
&mut self.nodes[usize::from(node_index)]
}
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = self.edge_at(edge_index); // first (immutable) borrow here
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
self.node_at_mut(node_index).remove_edge(edge_index); // second (mutable)
// borrow here -> ERROR
}
}
}
}
But of course this fails, since you're not allowed to borrow self as mutable and immutable at the same time.
So to get around this you could simply access the fields directly:
impl GraphState {
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = &self.edges[usize::from(edge_index)];
for i in 0..2 {
let node_index = edge.node_indices[i];
self.nodes[usize::from(node_index)].remove_edge(edge_index);
}
}
}
This approach works, but it has two major drawbacks:
The accessed fields need to be public (at least if you want to allow access to them from another scope). If they're implementation details that you'd rather have private you're in bad luck.
You always need to operate on the fields directly. This means that code like usize::from(node_index) needs to be repeated everywhere, which makes this approach both brittle and cumbersome.
So how can you solve this?
A) Borrow everything at once
Since borrowing self mutably multiple times is not allowed, mutably borrowing all the parts you want at once instead is one straightforward way to solve this problem:
pub fn edge_at(edges: &[Edge], edge_index: u16) -> &Edge {
&edges[usize::from(edge_index)]
}
pub fn node_at_mut(nodes: &mut [Node], node_index: u8) -> &mut Node {
&mut nodes[usize::from(node_index)]
}
impl GraphState {
pub fn data_mut(&mut self) -> (&mut [Node], &mut [Edge]) {
(&mut self.nodes, &mut self.edges)
}
pub fn remove_edge(&mut self, edge_index: u16) {
let (nodes, edges) = self.data_mut(); // first (mutable) borrow here
let edge = edge_at(edges, edge_index);
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
node_at_mut(nodes, node_index).remove_edge(edge_index); // no borrow here
// -> no error
}
}
}
}
It's clearly a workaround and far from ideal, but it works and it allows you to keep the fields themselves private (though you'll probably need to expose the implementation somewhat, as the user has to hand the necessary data to the other functions by hand).
B) Use a macro
If code reuse is all you worry about and visibility is not an issue to you you can write macros like these:
macro_rules! node_at_mut {
($this:ident, $index:expr) => {
&mut self.nodes[usize::from($index)]
}
}
macro_rules! edge_at {
($this:ident, $index:expr) => {
&mut self.edges[usize::from($index)]
}
}
...
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = edge_at!(self, edge_index);
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
node_at_mut!(self, node_index).remove_edge(edge_index);
}
}
}
If your fields are public anyway I'd probably go with this solution, as it seems the most elegant to me.
C) Go unsafe
What we want to do here is obviously safe.
Sadly the borrow checker cannot see this, since the function signatures tell him no more than that self is being borrowed. Luckily Rust allows us to use the unsafe keyword for cases like these:
pub fn remove_edge(&mut self, edge_index: u16) {
let edge: *const Edge = self.edge_at(edge_index);
for i in 0..2 {
unsafe {
let node_index = (*edge).node_indices[i];
self.node_at_mut(node_index).remove_edge(edge_index); // first borrow here
// (safe though since remove_edge will not invalidate the first pointer)
}
}
}
This works and gives us all the benefits of solution A), but using unsafe for something that could easily be done without, if only the language had some syntax for actual partial borrowing, seems a bit ugly to me. On the other hand it may (in some cases) be preferable to solution A), due to its sheer clunkyness...
EDIT: After some thought I realised, that we know this approach to be safe here, only because we know about the implementation. In a different use case the data in question might actually be contained in one single field (say a Map), even if it looks like we are accessing two very distinct kinds of data when calling from outside.
This is why this last approach is rightfully unsafe, as there is no way for the borrow checker to check whether we are actually borrowing different things without exposing the private fields, rendering our effort pointless.
Contrary to my initial belief, this couldn't even really be changed by expanding the language. The reason is that one would still need to expose information about private fields in some way for this to work.
After writing this answer I also found this blog post which goes a bit deeper into possible solutions (and also mentions some advanced techniques that didn't come to my mind, none of which is universally applicable though). If you happen to know another solution, or a way to improve the ones proposed here, please let me know.

What are the use cases of the newly proposed Pin type?

There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning ?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector to move it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct move, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust. As a library type. It creates the extra guarantee that as long as Pin exist the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}

Vec<MyTrait> without N heap allocations?

I'm trying to port some C++ code to Rust. It composes a virtual (.mp4) file from a few kinds of slices (string reference, lazy-evaluated string reference, part of a physical file) and serves HTTP requests based on the result. (If you're curious, see Mp4File which takes advantage of the FileSlice interface and its concrete implementations in http.h.)
Here's the problem: I want require as few heap allocations as possible. Let's say I have a few implementations of resource::Slice that I can hopefully figure out on my own. Then I want to make the one that composes them all:
pub trait Slice : Send + Sync {
/// Returns the length of the slice in bytes.
fn len(&self) -> u64;
/// Writes bytes indicated by `range` to `out.`
fn write_to(&self, range: &ByteRange,
out: &mut io::Write) -> io::Result<()>;
}
// (used below)
struct SliceInfo<'a> {
range: ByteRange,
slice: &'a Slice,
}
/// A `Slice` composed of other `Slice`s.
pub struct Slices<'a> {
len: u64,
slices: Vec<SliceInfo<'a>>,
}
impl<'a> Slices<'a> {
pub fn new() -> Slices<'a> { ... }
pub fn append(&mut self, slice: &'a resource::Slice) { ... }
}
impl<'a> Slice for Slices<'a> { ... }
and use them to append lots and lots of slices with as few heap allocations as possible. Simplified, something like this:
struct ThingUsedWithinMp4Resource {
slice_a: resource::LazySlice,
slice_b: resource::LazySlice,
slice_c: resource::LazySlice,
slice_d: resource::FileSlice,
}
struct Mp4Resource {
slice_a: resource::StringSlice,
slice_b: resource::LazySlice,
slice_c: resource::StringSlice,
slice_d: resource::LazySlice,
things: Vec<ThingUsedWithinMp4Resource>,
slices: resource::Slices
}
impl Mp4Resource {
fn new() {
let mut f = Mp4Resource{slice_a: ...,
slice_b: ...,
slice_c: ...,
slices: resource::Slices::new()};
// ...fill `things` with hundreds of things...
slices.append(&f.slice_a);
for thing in f.things { slices.append(&thing.slice_a); }
slices.append(&f.slice_b);
for thing in f.things { slices.append(&thing.slice_b); }
slices.append(&f.slice_c);
for thing in f.things { slices.append(&thing.slice_c); }
slices.append(&f.slice_d);
for thing in f.things { slices.append(&thing.slice_d); }
f;
}
}
but this isn't working. The append lines cause errors "f.slice_* does not live long enough", "reference must be valid for the lifetime 'a as defined on the block at ...", "...but borrowed value is only valid for the block suffix following statement". I think this is similar to this question about the self-referencing struct. That's basically what this is, with more indirection. And apparently it's impossible.
So what can I do instead?
I think I'd be happy to give ownership to the resource::Slices in append, but I can't put a resource::Slice in the SliceInfo used in Vec<SliceInfo> because resource::Slice is a trait, and traits are unsized. I could do a Box<resource::Slice> instead but that means a separate heap allocation for each slice. I'd like to avoid that. (There can be thousands of slices per Mp4Resource.)
I'm thinking of doing an enum, something like:
enum BasicSlice {
String(StringSlice),
Lazy(LazySlice),
File(FileSlice)
};
and using that in the SliceInfo. I think I can make this work. But it definitely limits the utility of my resource::Slices class. I want to allow it to be used easily in situations I didn't anticipate, preferably without having to define a new enum each time.
Any other options?
You can add a User variant to your BasicSlice enum, which takes a Box<SliceInfo>. This way only the specialized case of users will take the extra allocation, while the normal path is optimized.

How to make a struct where one of the fields refers to another field

I have the following problem: I have a have a data structure that is parsed from a buffer and contains some references into this buffer, so the parsing function looks something like
fn parse_bar<'a>(buf: &'a [u8]) -> Bar<'a>
So far, so good. However, to avoid certain lifetime issues I'd like to put the data structure and the underlying buffer into a struct as follows:
struct BarWithBuf<'a> {bar: Bar<'a>, buf: Box<[u8]>}
// not even sure if these lifetime annotations here make sense,
// but it won't compile unless I add some lifetime to Bar
However, now I don't know how to actually construct a BarWithBuf value.
fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
let my_bar = parse_bar(&*buf);
BarWithBuf {buf: buf, bar: my_bar}
}
doesn't work, since buf is moved in the construction of the BarWithBuf value, but we borrowed it for parsing.
I feel like it should be possible to do something along the lines of
fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
let mut bwb = BarWithBuf {buf: buf};
bwb.bar = parse_bar(&*bwb.buf);
bwb
}
to avoid moving the buffer after parsing the Bar, but I can't do that because the whole BarWithBuf struct has to be initalised in one go.
Now I suspect that I could use unsafe code to partially construct the struct, but I'd rather not do that.
What would be the best way to solve this problem? Do I need unsafe code? If I do, would it be safe do to this here? Or am I completely on the wrong track here and there is a better way to tie a data structure and its underlying buffer together?
I think you're right in that it's not possible to do this without unsafe code. I would consider the following two options:
Change the reference in Bar to an index. The contents of the box won't be protected by a borrow, so the index might become invalid if you're not careful. However, an index might convey the meaning of the reference in a clearer way.
Move Box<[u8]> into Bar, and add a function buf() -> &[u8] to the implementation of Bar; instead of references, store indices in Bar. Now Bar is the owner of the buffer, so it can control its modification and keep the indices valid (thereby avoiding the problem of option #1).
As per DK's suggestion below, store indices in BarWithBuf (or in a helper struct BarInternal) and add a function fn bar(&self) -> Bar to the implementation of BarWithBuf, which constructs a Bar on-the-fly.
Which of these options is the most appropriate one depends on the actual problem context. I agree that some form of "member-by-member construction" of structs would be immensely helpful in Rust.
Here's an approach that will work through a little bit of unsafe code. This approach requires that you are okay with putting the referred-to thing (here, your [u8]) on the heap, so it won't work for direct reference of a sibling field.
Let's start with a toy Bar<'a> implementation:
struct Bar<'a> {
refs: Vec<&'a [u8]>,
}
impl<'a> Bar<'a> {
pub fn parse(src: &'a [u8]) -> Self {
// placeholder for actually parsing goes here
Self { refs: vec![src] }
}
}
We'll make BarWithBuf that uses a Bar<'static>, as 'static is the only lifetime with an an accessible name. The buffer we store things in can be anything that doesn't move the target data around on us. I'm going to go with a Vec<u8>, but Box, Pin, whatever will work fine.
struct BarWithBuf {
buf: Vec<u8>,
bar: Bar<'static>,
}
The implementation requires a tiny bit of unsafe code.
impl BarWithBuf {
pub fn new(buf: Vec<u8>) -> Self {
// The `&'static [u8]` is inferred, but writing it here for demo
let buf_slice: &'static [u8] = unsafe {
// Going through a pointer is a "good" way to get around lifetime checks
std::slice::from_raw_parts(&buf[0], buf.len())
};
let bar = Bar::parse(buf_slice);
Self { buf, bar }
}
/// Access to Bar should always come through this function.
pub fn bar(&self) -> &Bar {
&self.bar
}
}
The BarWithBuf::bar is an important function to re-associate the proper lifetimes to the references. Rust's lifetime elision rules make the function equivalent to pub fn bar<'a>(&'a self) -> &'a Bar<'a>, which turns out to be exactly what we want. The lifetime of the slices in BarWithBuf::bar::refs are tied to the lifetime of BarWithBuf.
WARNING: You have to be very careful with your implementation here. You cannot make #[derive(Clone)] for BarWithBuf, since the default clone implementation will clone buf, but the elements of bar.refs will still point to the original. It is only one line of unsafe code, but the safety is still off in the "safe" bits.
For larger bits of self-referencing structures, there's the ouroboros crate, which wraps up a lot of unsafe bits for you. The techniques are similar to the one I described above, but they live behind macros, which is a more pleasant experience if you find yourself making a number of self-references.

Resources