Safely traversing a directed acyclic graph - rust

I'm attempting to build and traverse a DAG. There seem to be two feasible approaches: use Rc<RefCell<Node>> for edges, or use an arena allocator and some unsafe code. (See details here.)
I'm opting for the former, but I'm having difficulty walking the graph down to its leaves, as any borrow of a child node relies on borrows of its parents:
use std::cell::RefCell;
use std::rc::Rc;
// See: https://aminb.gitbooks.io/rust-for-c/content/graphs/index.html,
// https://github.com/nrc/r4cppp/blob/master/graphs/src/ref_graph.rs
pub type Link<T> = Rc<RefCell<T>>;
pub struct DagNode {
    /// Each node can have several edges flowing *into* it, i.e. several owners,
    /// hence the use of Rc. RefCell is used so we can have mutability
    /// while building the graph.
    pub edge: Option<Link<DagNode>>,
    // Other data here
}
// Attempt to walk down the DAG until we reach a leaf.
fn walk_to_end(node: &Link<DagNode>) -> &Link<DagNode> {
    let nb = node.borrow();
    match nb.edge {
        Some(ref prev) => walk_to_end(prev),
        // Here be dragons: the borrow relies on all previous borrows,
        // so this fails to compile.
        None => node
    }
}
I could modify the reference counts, i.e.
fn walk_to_end(node: Link<DagNode>) -> Link<DagNode> {
    let nb = node.borrow();
    match nb.edge {
        Some(ref prev) => walk_to_end(prev.clone()),
        None => node.clone()
    }
}
but bumping the reference counts every time you traverse a node seems like quite the hack. What is the idiomatic approach here?

Rc isn't really a problem here: if you get rid of the RefCells, everything just compiles. Actually, in some situations, this might be a solution: if you need to mutate the contents of the node, but not the edges, you can just change your data structure so the edges aren't inside a RefCell.
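To illustrate the suggestion of keeping the edges out of the RefCell, here is a minimal sketch. It assumes the edges never change once a node has been built; the data field is a hypothetical payload standing in for "other data". Because only plain & borrows are chained through the Rc handles, the leaf can be returned by reference and no reference count is touched during the walk.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Edges are fixed once a node is built, so they are plain Rc handles;
// only the (hypothetical) payload sits inside a RefCell.
pub struct DagNode {
    pub edge: Option<Rc<DagNode>>,
    pub data: RefCell<i32>,
}

// Ordinary `&` borrows can be chained through Rc, so the returned reference
// simply has the same lifetime as the `node` argument.
pub fn walk_to_end(node: &Rc<DagNode>) -> &Rc<DagNode> {
    match &node.edge {
        Some(next) => walk_to_end(next),
        None => node,
    }
}
```

The payload can still be mutated through the returned reference, e.g. *walk_to_end(&root).data.borrow_mut() += 10;, while the graph shape stays immutable.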
The argument also isn't really the problem; this compiles:
fn walk_to_end(node: &Link<DagNode>) -> Link<DagNode> {
    let nb = node.borrow();
    match nb.edge {
        Some(ref prev) => walk_to_end(prev),
        None => node.clone()
    }
}
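The problem here is really returning the result. There isn't any way to write the return type you want: a reference to the leaf is only valid while every RefCell borrow taken on the way down is still alive, and each of those borrows is a local of one of the recursive calls. You could theoretically make your method return a wrapper around a Vec<Ref<T>>, but that's a lot more expensive than just bumping the reference count on the result.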
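If recursion depth is a concern for very long chains, the same traversal can be written as a loop. The sketch below is an alternative rather than the answer's own code: it clones one Rc handle per step (a cheap, non-atomic count update, which is the cost the question worries about) and holds at most one RefCell borrow at a time, so nothing is still borrowed when it returns.

```rust
use std::cell::RefCell;
use std::rc::Rc;

pub type Link<T> = Rc<RefCell<T>>;

pub struct DagNode {
    pub edge: Option<Link<DagNode>>,
}

// Iterative walk: each step clones the next handle and immediately releases
// the RefCell borrow, so no borrow has to outlive its parent.
pub fn walk_to_end(node: &Link<DagNode>) -> Link<DagNode> {
    let mut current = Rc::clone(node);
    loop {
        // The temporary `Ref` from `borrow()` is dropped at the end of this
        // statement. (A `while let` on the same expression would keep it alive
        // for the whole loop body and reject the assignment to `current`.)
        let next = current.borrow().edge.clone();
        match next {
            Some(next) => current = next,
            None => return current,
        }
    }
}
```

Since no borrow survives the walk, the caller can call borrow_mut() on the returned leaf right away.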
More generally, Rc<RefCell<T>> is difficult to work with because it's a complicated data structure: you can safely mutate multiple nodes at the same time, and it keeps track of exactly how many edges reference each node.
Note that you don't have to dip into unsafe code to use an arena. https://crates.io/crates/typed-arena provides a safe API for arenas. I'm not sure why the example you linked to uses UnsafeCell; it certainly isn't necessary.
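typed-arena is an external crate, so as a std-only stand-in, here is a sketch of a related but different technique: an index-based arena, where all nodes live in one Vec and edges are indices into it. This is not typed-arena's API; Dag, add, walk_to_end, node and node_mut are hypothetical names, and the rule that an edge may only point at an already existing node (so nodes are added leaves-first) is an assumption of this sketch.

```rust
// Hypothetical index-based arena (std only): no Rc, RefCell or unsafe code.
pub struct DagNode {
    pub edge: Option<usize>, // index of the next node in the arena, if any
    pub data: i32,           // hypothetical payload
}

pub struct Dag {
    nodes: Vec<DagNode>,
}

impl Dag {
    pub fn new() -> Self {
        Dag { nodes: Vec::new() }
    }

    // Edges may only point at nodes that already exist, so this API
    // cannot create a cycle.
    pub fn add(&mut self, data: i32, edge: Option<usize>) -> usize {
        if let Some(target) = edge {
            assert!(target < self.nodes.len(), "edge must point at an existing node");
        }
        self.nodes.push(DagNode { edge, data });
        self.nodes.len() - 1
    }

    // Walking needs only `&self`: following an index borrows nothing.
    pub fn walk_to_end(&self, mut index: usize) -> usize {
        while let Some(next) = self.nodes[index].edge {
            index = next;
        }
        index
    }

    pub fn node(&self, index: usize) -> &DagNode {
        &self.nodes[index]
    }

    pub fn node_mut(&mut self, index: usize) -> &mut DagNode {
        &mut self.nodes[index]
    }
}
```

The trade-off is that indices are not tied to a particular Dag, so passing an index from one arena to another is a logic error the compiler will not catch.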

Related

Rust mutable container of immutable elements?

With Rust, is it in general possible to have a mutable container of immutable values?
Example:
struct TestStruct { value: i32 }
fn test_fn()
{
    let immutable_instance = TestStruct{value: 123};
    let immutable_box = Box::new(immutable_instance);
    let mut mutable_vector = vec!(immutable_box);
    mutable_vector[0].value = 456;
}
Here, my TestStruct instance is wrapped in two containers: a Box, then a Vec. From the perspective of a new Rust user, it's surprising that moving the Box into the Vec makes both the Box and the TestStruct instance mutable.
Is there a similar construct whereby the boxed value is immutable, but the container of boxes is mutable? More generally, is it possible to have multiple "layers" of containers without the whole tree being either mutable or immutable?
Not really. You could easily create one (just create a wrapper object which implements Deref but not DerefMut), but the reality is that Rust doesn't really see (im)mutability that way, because its main concern is controlling sharing / visibility.
After all, for an external observer what difference is there between
mutable_vector[0].value = 456;
and
mutable_vector[0] = Box::new(TestStruct{value: 456});
?
None is the answer, because Rust's ownership system means it's not possible for an observer to have kept a handle on the original TestStruct, thus they can't know whether that structure was replaced or modified in place[1][2].
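For completeness, the wrapper mentioned at the start of this answer (a type implementing Deref but not DerefMut) could be sketched as follows; Frozen and demo are hypothetical names. Note that the wrapper's private field only protects the value from code outside the module that defines Frozen, and that the mutable Vec can still replace the element wholesale, which is exactly the point made above.

```rust
use std::ops::Deref;

pub struct TestStruct {
    pub value: i32,
}

// Hypothetical read-only wrapper: it implements `Deref` but not `DerefMut`,
// so the wrapped value can be read, but not mutated, through it.
pub struct Frozen<T>(T);

impl<T> Frozen<T> {
    pub fn new(value: T) -> Self {
        Frozen(value)
    }
}

impl<T> Deref for Frozen<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

pub fn demo() -> Vec<Frozen<Box<TestStruct>>> {
    let mut v = vec![Frozen::new(Box::new(TestStruct { value: 123 }))];
    // v[0].value = 456; // rejected: `Frozen` does not implement `DerefMut`
    // ...but the element can still be replaced through the mutable Vec:
    v[0] = Frozen::new(Box::new(TestStruct { value: 456 }));
    v
}
```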
If you want to secure your internal state, use visibility instead: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8a9346072b32cedcf2fccc0eeb9f55c5
mod foo {
    pub struct TestStruct { value: i32 }
    impl TestStruct {
        pub fn new(value: i32) -> Self { Self { value } }
    }
}
fn test_fn() {
    // A struct literal would fail here too (the field is private), so use `new`.
    let immutable_instance = foo::TestStruct::new(123);
    let immutable_box = Box::new(immutable_instance);
    let mut mutable_vector = vec!(immutable_box);
    mutable_vector[0].value = 456; // error: field `value` is private
}
does not compile because from the point of view of test_fn, TestStruct::value is not accessible. Therefore test_fn has no way to mutate a TestStruct unless you add an &mut method on it.
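Building on that, here is a sketch of the pattern where the container stays mutable but its elements are read-only from the outside: the field stays private and only a &self accessor is exposed. The value() getter and the build function are hypothetical additions for illustration.

```rust
mod foo {
    pub struct TestStruct {
        value: i32, // private: only code inside `foo` can touch it
    }

    impl TestStruct {
        pub fn new(value: i32) -> Self {
            Self { value }
        }

        // Read-only accessor; there is deliberately no `&mut self` method.
        pub fn value(&self) -> i32 {
            self.value
        }
    }
}

use foo::TestStruct;

fn build() -> Vec<Box<TestStruct>> {
    let mut v = vec![Box::new(TestStruct::new(123))];
    // v[0].value = 456; // does not compile: `value` is private outside `foo`
    v.push(Box::new(TestStruct::new(456))); // the Vec itself is still mutable
    v
}
```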
[1]: technically they could check the address in memory and that might tell them, but even then it's not a sure thing (in either direction), hence pinning being a thing.
[2]: this observability distinction is also embraced by other languages. For instance, Clojure largely falls on the "immutable all the things" side, but it has a concept of transients, which allow locally mutable objects.

Transfer ownership of Vec to VecDeque

In the following code, I want to transfer the staged_bytes vector into the buffer. Specifically, I want buffer to take ownership of staged_bytes so that I can reuse the staged_bytes field for a brand new Vec<u8>.
I show a solution to my problem in the code. Unfortunately, according to the Rust documentation, the clone() it relies on copies the vector's elements. Since that vector can be big, I don't want the copy, hence my desire for an ownership transfer.
I think I can't do it the way I want because, during transfer(), ownership would be held by both staged_bytes and buffer at the same time.
So what are the solutions? I thought of shared pointers (Rc, etc.), but that seems like overkill since the vector is not actually shared (maybe the optimizer figures it out?)...
use std::collections::VecDeque;
struct M {
    buffer : VecDeque<Vec<u8>>,
    staged_bytes: Vec<u8>
}
impl M {
    fn transfer(&mut self) {
        // This is what I want to do (the idea)
        // (it doesn't compile, E0507)
        // self.buffer.push_front(self.staged_bytes);
        // This is what I don't want to do
        // (it compiles)
        self.buffer.push_front(self.staged_bytes.clone());
        // After the transfer, I can start with a new vec.
        self.staged_bytes = Vec::new();
    }
}
fn main() {
    let mut s = M {
        buffer : VecDeque::new(),
        staged_bytes: Vec::new()
    };
    s.staged_bytes.push(112);
    s.transfer();
}
You want to use std::mem::take() here, which returns the value moved from a mutable reference and replaces it with the default value of the type:
self.buffer.push_front(std::mem::take(&mut self.staged_bytes));
Since the default value of a Vec is an empty vector, you can remove the following assignment (self.staged_bytes = Vec::new();) as it is redundant.
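A complete version of the struct using std::mem::take might look like this. Moving a Vec only moves its pointer, length and capacity; the heap buffer holding the bytes stays where it is, so nothing is copied.

```rust
use std::collections::VecDeque;
use std::mem;

struct M {
    buffer: VecDeque<Vec<u8>>,
    staged_bytes: Vec<u8>,
}

impl M {
    fn transfer(&mut self) {
        // `mem::take` moves the Vec out of the field and leaves `Vec::new()`
        // (which does not allocate) in its place; the elements are not copied.
        self.buffer.push_front(mem::take(&mut self.staged_bytes));
    }
}
```

If the fresh staging vector should start with some capacity instead, std::mem::replace(&mut self.staged_bytes, Vec::with_capacity(n)) performs the same swap with a value of your choosing.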

How to achieve encapsulation of struct fields without borrowing the struct as a whole

My question has already been somewhat discussed here.
The problem is that I want to access multiple distinct fields of a struct in order to use them, but I don't want to work on the fields directly. Instead I'd like to encapsulate the access to them, to gain more flexibility.
I tried to achieve this by writing methods for this struct, but as you can see in the question mentioned above, or in my older question here, this approach fails in Rust. The borrow checker only allows you to borrow different parts of a struct if you do so directly, or if you use a single method that borrows them together: all it knows from a method's signature is that self is borrowed, and only one mutable reference to self can exist at any time.
Losing the flexibility to borrow different parts of a struct as mutable at the same time is of course not acceptable. Therefore I'd like to know whether there is any idiomatic way to do this in Rust.
My naive approach would be to write a macro instead of a function, performing (and encapsulating) the same functionality.
EDIT:
Because Frxstrem suggested that the question I linked to at the top may answer my question, I want to make it clear that I'm not looking for just any way to solve this. My question is which of the proposed solutions (if any) is the right way to do it.
After some more research it seems that there is no beautiful way to achieve partial borrowing without accessing the struct fields directly, at least as of now.
There are, however, multiple imperfect solutions to this which I'd like to discuss a little, so that anyone who might end up here may weigh the pros and cons for themself.
I'll use the code from my original question as an example to illustrate them here.
So let's say you have a struct:
struct GraphState {
    nodes: Vec<Node>,
    edges: Vec<Edge>,
}
and you're trying to borrow one part of this struct mutably and the other immutably:
// THIS CODE DOESN'T PASS THE BORROW-CHECKER
impl GraphState {
    pub fn edge_at(&self, edge_index: u16) -> &Edge {
        &self.edges[usize::from(edge_index)]
    }
    pub fn node_at_mut(&mut self, node_index: u8) -> &mut Node {
        &mut self.nodes[usize::from(node_index)]
    }
    pub fn remove_edge(&mut self, edge_index: u16) {
        let edge = self.edge_at(edge_index); // first (immutable) borrow here
        // update the edge-index collection of the nodes connected by this edge
        for i in 0..2 {
            let node_index = edge.node_indices[i];
            self.node_at_mut(node_index).remove_edge(edge_index); // second (mutable)
            // borrow here -> ERROR
        }
    }
}
But of course this fails, since you're not allowed to borrow self as mutable and immutable at the same time.
So to get around this you could simply access the fields directly:
impl GraphState {
    pub fn remove_edge(&mut self, edge_index: u16) {
        let edge = &self.edges[usize::from(edge_index)];
        for i in 0..2 {
            let node_index = edge.node_indices[i];
            self.nodes[usize::from(node_index)].remove_edge(edge_index);
        }
    }
}
This approach works, but it has two major drawbacks:
The accessed fields need to be public (at least if you want to allow access to them from another module). If they're implementation details that you'd rather keep private, you're out of luck.
You always need to operate on the fields directly. This means that code like usize::from(node_index) needs to be repeated everywhere, which makes this approach both brittle and cumbersome.
So how can you solve this?
A) Borrow everything at once
Since borrowing self mutably multiple times is not allowed, mutably borrowing all the parts you want at once instead is one straightforward way to solve this problem:
pub fn edge_at(edges: &[Edge], edge_index: u16) -> &Edge {
    &edges[usize::from(edge_index)]
}
pub fn node_at_mut(nodes: &mut [Node], node_index: u8) -> &mut Node {
    &mut nodes[usize::from(node_index)]
}
impl GraphState {
    pub fn data_mut(&mut self) -> (&mut [Node], &mut [Edge]) {
        (&mut self.nodes, &mut self.edges)
    }
    pub fn remove_edge(&mut self, edge_index: u16) {
        let (nodes, edges) = self.data_mut(); // first (mutable) borrow here
        let edge = edge_at(edges, edge_index);
        // update the edge-index collection of the nodes connected by this edge
        for i in 0..2 {
            let node_index = edge.node_indices[i];
            node_at_mut(nodes, node_index).remove_edge(edge_index); // no new borrow of self here
            // -> no error
        }
    }
}
It's clearly a workaround and far from ideal, but it works and it allows you to keep the fields themselves private (though you'll probably need to expose the implementation somewhat, as callers have to pass the necessary data to the helper functions explicitly).
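A self-contained version of option A might look like the sketch below. The Node and Edge definitions (an edge_indices list on each node, two u8 endpoints per edge) and the GraphState::new and node helpers are assumptions, since the question never shows them.

```rust
// Hypothetical Node/Edge types; the question does not show their definitions.
pub struct Node {
    pub edge_indices: Vec<u16>,
}

impl Node {
    pub fn remove_edge(&mut self, edge_index: u16) {
        self.edge_indices.retain(|&e| e != edge_index);
    }
}

pub struct Edge {
    pub node_indices: [u8; 2],
}

pub struct GraphState {
    nodes: Vec<Node>, // the fields themselves stay private
    edges: Vec<Edge>,
}

fn edge_at(edges: &[Edge], edge_index: u16) -> &Edge {
    &edges[usize::from(edge_index)]
}

fn node_at_mut(nodes: &mut [Node], node_index: u8) -> &mut Node {
    &mut nodes[usize::from(node_index)]
}

impl GraphState {
    pub fn new(nodes: Vec<Node>, edges: Vec<Edge>) -> Self {
        GraphState { nodes, edges }
    }

    // Splitting the borrow once, in a single place, yields two independent borrows.
    pub fn data_mut(&mut self) -> (&mut [Node], &mut [Edge]) {
        (&mut self.nodes, &mut self.edges)
    }

    pub fn remove_edge(&mut self, edge_index: u16) {
        let (nodes, edges) = self.data_mut();
        let edge = edge_at(edges, edge_index); // `&mut [Edge]` coerces to `&[Edge]`
        for &node_index in &edge.node_indices {
            node_at_mut(nodes, node_index).remove_edge(edge_index);
        }
    }

    pub fn node(&self, node_index: u8) -> &Node {
        &self.nodes[usize::from(node_index)]
    }
}
```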
B) Use a macro
If code reuse is all you worry about and visibility is not an issue for you, you can write macros like these (note that the body must use $this rather than self: macro hygiene prevents a macro body from referring to the caller's self directly):
macro_rules! node_at_mut {
    ($this:ident, $index:expr) => {
        &mut $this.nodes[usize::from($index)]
    };
}
macro_rules! edge_at {
    ($this:ident, $index:expr) => {
        &$this.edges[usize::from($index)]
    };
}
...
    pub fn remove_edge(&mut self, edge_index: u16) {
        let edge = edge_at!(self, edge_index);
        // update the edge-index collection of the nodes connected by this edge
        for i in 0..2 {
            let node_index = edge.node_indices[i];
            node_at_mut!(self, node_index).remove_edge(edge_index);
        }
    }
}
If your fields are public anyway I'd probably go with this solution, as it seems the most elegant to me.
C) Go unsafe
What we want to do here is obviously safe.
Sadly, the borrow checker cannot see this, since the function signatures tell it no more than that self is being borrowed. Luckily, Rust allows us to use the unsafe keyword for cases like these:
pub fn remove_edge(&mut self, edge_index: u16) {
    let edge: *const Edge = self.edge_at(edge_index);
    for i in 0..2 {
        unsafe {
            let node_index = (*edge).node_indices[i];
            self.node_at_mut(node_index).remove_edge(edge_index); // mutable borrow of self here
            // (safe though, since remove_edge will not invalidate the first pointer)
        }
    }
}
This works and gives us all the benefits of solution A), but using unsafe for something that could easily be done without it, if only the language had syntax for actual partial borrowing, seems a bit ugly to me. On the other hand, it may (in some cases) be preferable to solution A), due to the latter's sheer clunkiness...
EDIT: After some thought I realised that we know this approach to be safe here only because we know about the implementation. In a different use case the data in question might actually be contained in one single field (say a Map), even if it looks like we are accessing two very distinct kinds of data when calling from outside.
This is why this last approach is rightfully unsafe, as there is no way for the borrow checker to check whether we are actually borrowing different things without exposing the private fields, rendering our effort pointless.
Contrary to my initial belief, this couldn't even really be changed by expanding the language. The reason is that one would still need to expose information about private fields in some way for this to work.
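For this particular example there is also a safe route that keeps the &self / &mut self accessor methods. It is a different technique from the three above and only applies when the data needed from the first borrow is small and Copy (or cheap to clone): the edge's two node indices are copied out, which ends the shared borrow before the mutable one starts. The Node and Edge definitions are again hypothetical.

```rust
// Hypothetical Node/Edge types; the question does not show their definitions.
pub struct Node {
    pub edge_indices: Vec<u16>,
}

impl Node {
    pub fn remove_edge(&mut self, edge_index: u16) {
        self.edge_indices.retain(|&e| e != edge_index);
    }
}

pub struct Edge {
    pub node_indices: [u8; 2],
}

pub struct GraphState {
    nodes: Vec<Node>,
    edges: Vec<Edge>,
}

impl GraphState {
    pub fn new(nodes: Vec<Node>, edges: Vec<Edge>) -> Self {
        GraphState { nodes, edges }
    }

    pub fn edge_at(&self, edge_index: u16) -> &Edge {
        &self.edges[usize::from(edge_index)]
    }

    pub fn node_at_mut(&mut self, node_index: u8) -> &mut Node {
        &mut self.nodes[usize::from(node_index)]
    }

    pub fn remove_edge(&mut self, edge_index: u16) {
        // `[u8; 2]` is Copy: this copies the indices out, so the shared borrow
        // of `self` ends here, before `node_at_mut` takes a mutable one.
        let node_indices = self.edge_at(edge_index).node_indices;
        for node_index in node_indices {
            self.node_at_mut(node_index).remove_edge(edge_index);
        }
    }
}
```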
After writing this answer I also found this blog post which goes a bit deeper into possible solutions (and also mentions some advanced techniques that didn't come to my mind, none of which is universally applicable though). If you happen to know another solution, or a way to improve the ones proposed here, please let me know.

Sequencing two independent futures with different error types

I have code like the following:
let done = client_a
    .get_future(data)
    .then(move |result| {
        // further processing on result
        // spawn a future here
    });
tokio::run(done);
Now I have another future the result of which I want to process along with 'processing of result'.
However, that future is completely independent of client_a, which implies:
Both could have different error types.
Failure of one should not stop the other.
let done = client_a
    .get_future(data)
    .then(move |result| {
        // how to fit in
        // client_b.get_future
        // here
        // further processing on both results
        // spawn third future here
    });
tokio::run(done);
If both error and item types are heterogenous and you know how many futures you will chain, the simplest way to do so is to chain into an infallible Future (because that's what your remaining future really is) whose Item type is a tuple of all the intermediate results.
This can be implemented by simple chaining:
let future1_casted = future1.then(future::ok::<Result<_, _>, ()>);
let future2_casted = future2.then(future::ok::<Result<_, _>, ()>);
let chain = future1_casted
.and_then(|result1| future2_casted.map(|result2| (result1, result2)));
The final future type is a tuple containing all the results of the futures.
If you do not know how many futures you are chaining, you will need to strengthen your requirements and explicitly know ahead of time the possible return types of your futures. Since it is not possible without macros to generate an arbitrary-sized tuple, you're going to need to store the intermediate results into a structure requiring homogeneous types.
To solve this problem, you need to define an enum that can hold each of your types, for example for the errors:
use std::{char, num};
#[derive(PartialEq, Debug)]
pub enum MyError {
    Utf16Error(char::DecodeUtf16Error),
    ParseError(num::ParseIntError)
}
impl From<char::DecodeUtf16Error> for MyError {
    fn from(e: char::DecodeUtf16Error) -> Self {
        MyError::Utf16Error(e)
    }
}
impl From<num::ParseIntError> for MyError {
    fn from(e: num::ParseIntError) -> Self {
        MyError::ParseError(e)
    }
}
From there, combining futures follows the same route as before: turn each fallible future into an infallible one whose item is a Result<_, _>, and combine them into a structure like a Vec with future::loop_fn().
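The futures themselves need the futures and tokio crates, but the role of MyError can be shown synchronously. In this std-only stand-in (number_from_utf16 and process_all are hypothetical functions, not part of the answer), the From impls let ? convert either library error into MyError, so independent operations that fail in different ways can be collected into one homogeneous Vec<Result<u32, MyError>>, and one failure does not stop the others.

```rust
use std::char::{decode_utf16, DecodeUtf16Error};
use std::num::ParseIntError;

#[derive(PartialEq, Debug)]
pub enum MyError {
    Utf16Error(DecodeUtf16Error),
    ParseError(ParseIntError),
}

impl From<DecodeUtf16Error> for MyError {
    fn from(e: DecodeUtf16Error) -> Self {
        MyError::Utf16Error(e)
    }
}

impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> Self {
        MyError::ParseError(e)
    }
}

// Hypothetical operation that can fail in two different ways.
pub fn number_from_utf16(units: &[u16]) -> Result<u32, MyError> {
    // `?` converts each library error into MyError through the From impls above.
    let text = decode_utf16(units.iter().copied()).collect::<Result<String, DecodeUtf16Error>>()?;
    Ok(text.trim().parse::<u32>()?)
}

// Each input is processed independently; all outcomes share one type.
pub fn process_all(inputs: &[&[u16]]) -> Vec<Result<u32, MyError>> {
    inputs.iter().map(|units| number_from_utf16(units)).collect()
}
```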

Referencing a containing struct in Rust (and calling methods on it)

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a container structure in Rust where its elements also store a reference to the containing container so that they can call methods on it. As far as I could figure out, I need to do this via Rc<RefCell<T>>. Is this correct?
So far, I have something like the following:
struct Container {
    elems: ~[~Element]
}
impl Container {
    pub fn poke(&mut self) {
        println!("Got poked.");
    }
}
struct Element {
    datum: int,
    container: Weak<RefCell<Container>>
}
impl Element {
    pub fn poke_container(&mut self) {
        let c1 = self.container.upgrade().unwrap(); // Option<Rc>
        let mut c2 = c1.borrow().borrow_mut(); // &RefCell
        c2.get().poke();
        // self.container.upgrade().unwrap().borrow().borrow_mut().get().poke();
        // -> Error: Borrowed value does not live long enough * 2
    }
}
fn main() {
    let container = Rc::new(RefCell::new(Container{ elems: ~[] }));
    let mut elem1 = Element{ datum: 1, container: container.downgrade() };
    let mut elem2 = Element{ datum: 2, container: container.downgrade() };
    elem1.poke_container();
}
I feel like I am missing something here. Is accessing the contents of a Rc<RefCell<T>> really this difficult (in poke_container)? Or am I approaching the problem the wrong way?
Lastly, and assuming the approach is correct, how would I write an add method for Container so that it could fill in the container field in Element (assuming I changed the field to be of type Option<Rc<RefCell<T>>>)? I can't create another Rc from &mut self as far as I know.
The long chain of method calls actually works for me on master without any changes, because the lifetime of "r-values" (e.g. the results of function calls) has changed so that temporary return values last until the end of the statement, rather than the end of the next method call (which seemed to be how the old rule worked).
As Vladimir hints, overloadable dereference will likely reduce it to
self.container.upgrade().unwrap().borrow_mut().poke();
which is nicer.
In any case, "mutating" shared ownership is always going to be (slightly) harder to write in Rust than either single-ownership code or immutable shared-ownership code, because it's very easy for such code to be memory unsafe (and memory safety is the core goal of Rust).
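In current Rust, the one-line form does compile. Here is a minimal sketch in modern syntax, which also covers the add question. Two details are assumptions for illustration: the pokes counter stands in for the println!, and add is written as an associated function taking &Rc<RefCell<Container>> instead of &mut self, because a &mut Container cannot produce the Rc that owns it, whereas the shared handle can be downgraded.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

pub struct Container {
    pub elems: Vec<Element>,
    pub pokes: u32, // hypothetical counter standing in for println!
}

impl Container {
    pub fn poke(&mut self) {
        self.pokes += 1;
    }

    // Takes the shared handle rather than `&mut self`, so it can hand out Weak links.
    pub fn add(this: &Rc<RefCell<Container>>, datum: i32) {
        let elem = Element { datum, container: Rc::downgrade(this) };
        this.borrow_mut().elems.push(elem);
    }
}

pub struct Element {
    pub datum: i32,
    pub container: Weak<RefCell<Container>>,
}

impl Element {
    pub fn poke_container(&self) {
        // The Rc and RefMut temporaries both live until the end of the statement.
        self.container.upgrade().unwrap().borrow_mut().poke();
    }
}
```

Note that calling poke_container on an element while the container is already borrowed (for example while iterating over elems) would panic at runtime, since RefCell refuses a second, mutable borrow.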
