Use regular reference instead of `Box` in recursive data structures - rust

I am new to Rust. When I read chapter 15 of The Rust Programming Language, I failed to know why one should use Boxes in recursive data structures instead of regular references. 15.1 of the book explains that indirection is required to avoid infinite-sized structures, but it does not explain why to use Box.
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn main() {
let list = Cons(1, &Cons(2, &Cons(3, &Nil)));
println!("{:?}", list);
}
The code above compiles and produces the desired output. It seems that using FunctionalList to store a small amount of data on stack works perfectly well. Does this code cause troubles?

It is true that the FunctionalList works in this simple case. However, we will run into some difficulties if we try to use this structure in other ways. For instance, suppose we tried to construct a FunctionalList and then return it from a function:
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list(x: u32) -> FunctionalList {
return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
}
fn main() {
let list = make_list(1);
println!("{:?}", list);
}
This results in the following compile error:
error[E0106]: missing lifetime specifier
--> src/main.rs:9:25
|
9 | fn make_list(x: u32) -> FunctionalList {
| ^^^^^^^^^^^^^^ help: consider giving it an explicit bounded or 'static lifetime: `FunctionalList + 'static`
If we follow the hint and add a 'static lifetime, then we instead get this error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:10:12
|
10 | return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
| ^^^^^^^^^^^^^^^^^^^^^^-----------------^^
| | |
| | temporary value created here
| returns a value referencing data owned by the current function
The issue is that the inner FunctionalList values here are owned by implicit temporary variables whose scope ends at the end of the make_list function. These values would thus be dropped at the end of the function, leaving dangling references to them, which Rust disallows, hence the borrow checker rejects this code.
In contrast, if FunctionalList had been defined to Box its FunctionalList component, then ownership would have been moved from the temporary value into the containing FunctionalList, and we would have been able to return it without any problem.
With your original FunctionalList, the thing we have to think about is that every value in Rust has to have an owner somewhere; and so if, as in this case, the FunctionaList is not the owner of its inner FunctionalLists, then that ownership has to reside somewhere else. In your example, that owner was an implicit temporary variable, but in more complex situations we could use a different kind of external owner. Here's an example of using a TypedArena (from the typed-arena crate) to own the data, so that we can still implement a variation of the make_list function:
use typed_arena::Arena;
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list<'a>(x: u32, arena: &'a Arena<FunctionalList<'a>>) -> &mut FunctionalList<'a> {
let l0 = arena.alloc(Nil);
let l1 = arena.alloc(Cons(x + 2, l0));
let l2 = arena.alloc(Cons(x + 1, l1));
let l3 = arena.alloc(Cons(x, l2));
return l3;
}
fn main() {
let arena = Arena::new();
let list = make_list(1, &arena);
println!("{:?}", list);
}
In this case, we adapted the return type of make_list to return only a mutable reference to a FunctionalList, instead of returning an owned FunctionalList, since now the ownership resides in the arena.

Related

Rust multiple lifetimes in structs

I have problems with understanding the behavior and availability of structs with multiple lifetime parameters. Consider the following:
struct My<'a,'b> {
first: &'a String,
second: &'b String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = My{
first: &first,
second: &second
}
}
println!("{}", my.first)
}
The error message says that
|
13 | second: &second
| ^^^^^^^ borrowed value does not live long enough
14 | }
15 | }
| - `second` dropped here while still borrowed
16 | println!("{}", my.first)
| -------- borrow later used here
First, I do not access the .second element of the struct. So, I do not see the problem.
Second, the struct has two life time parameters. I assume that compiler tracks the fields of struct seperately.
For example the following compiles fine:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
std::mem::drop(my.second);
println!("{}", my.first)
}
Which means that even though, .second of the struct is dropped that does not invalidate the whole struct. I can still access the non-dropped elements.
Why doesn't the same the same work for structs with references?
The struct has two independent lifetime parameters. Just like a struct with two type parameters are independent of each other, I would expect that these two lifetimes are independent as well. But the error message suggest that in the case of lifetimes these are not independent. The resultant struct does not have two lifetime parameters but only one that is the smaller of the two.
If the validity of struct containing two references limited to the lifetime of reference with the smallest lifetime, then my question is what is the difference between
struct My1<'a,'b>{
f: &'a X,
s: &'b Y,
}
and
struct My2<'a>{
f: &'a X,
s: &'a Y
}
I would expect that structs with multiple lifetime parameters to behave similar to functions with multiple lifetime parameters. Consider these two functions
fn fun_single<'a>(x:&'a str, y: &'a str) -> &'a str {
if x.len() <= y.len() {&x[0..1]} else {&y[0..1]}
}
fn fun_double<'a,'b>(x: &'a str, y:&'b str) -> &'a str {
&x[0..1]
}
fn main() {
let first = "first".to_string();
let second = "second".to_string();
let ref_first = &first;
let ref_second = &second;
let result_ref = fun_single(ref_first, ref_second);
std::mem::drop(second);
println!("{result_ref}")
}
In this version we get the result from a function with single life time parameter. Compiler thinks that two function parameters are related so it picks the smallest lifetime for the reference we return from the function. So it does not compile this version.
But if we just replace the line
let result_ref = fun_single(ref_first, ref_second);
with
let result_ref = fun_double(ref_first, ref_second);
the compiler sees that two lifetimes are independent so even when you drop second result_ref is still valid, the lifetime of the return reference is not the smallest but independent from second parameter and it compiles.
I would expect that structs with multiple lifetimes and functions with multiple lifetimes to behave similarly. But they don't.
What am I missing here?
I assume that compiler tracks the fields of struct seperately.
I think that's the core of your confusion. The compiler does track each lifetime separately, but only statically at compile time, not during runtime. It follows from this that Rust generally can not allow structs to be partially valid.
So, while you do specify two lifetime parameters, the compiler figures that the struct can only be valid as long as both of them are alive: that is, until the shorter-lived one lives.
But then how does the second example work? It relies on an exceptional feature of the compiler, called Partial Moving. That means that whenever you move out of a struct, it allows you to move disjoint parts separately.
It is essentially a syntax sugar for the following:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
let Own{
first: my_first,
second: my_second,
} = my;
std::mem::drop(my_second);
println!("{}", my_first);
}
Note that this too is a static feature, so the following will not compile (even though it would work when run):
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
if false {
std::mem::drop(my.first);
}
println!("{}", my.first)
}
The struct may not be moved as a whole once it has been partially moved, so not even this allows you to have partially valid structs.
A local variable may be partially initialized, such as in your second example. Rust can track this for local variables and give you an error if you attempt to access the uninitialized parts.
However in your first example the variable isn't actually partially initialized, it's fully initialized (you give it both the first and second field). Then, when second goes out of scope, my is still fully initialized, but it's second field is now invalid (but initialized). Thus it doesn't even let the variable exist past when second is dropped to avoid an invalid reference.
Rust could track this since you have 2 lifetimes and name the second lifetime a special 'never that would signal the reference is always invalid, but it currently doesn't.

Can I move from &mut self to &self within the same function?

I'm trying to learn a bit of Rust through a toy application, which involves a tree data structure that is filled dynamically by querying an external source. In the beginning, only the root node is present. The tree structure provides a method get_children(id) that returns a [u32] of the IDs of all the node's children — either this data is already known, or the external source is queried and all the nodes are inserted into the tree.
I'm running into the following problem with the borrow checker that I can't seem to figure out:
struct Node {
id: u32,
value: u64, // in my use case, this type is much larger and should not be copied
children: Option<Vec<u32>>,
}
struct Tree {
nodes: std::collections::HashMap<u32, Node>,
}
impl Tree {
fn get_children(&mut self, id: u32) -> Option<&[u32]> {
// This will perform external queries and add new nodes to the tree
None
}
fn first_even_child(&mut self, id: u32) -> Option<u32> {
let children = self.get_children(id)?;
let result = children.iter().find(|&id| self.nodes.get(id).unwrap().value % 2 == 0)?;
Some(*result)
}
}
Which results in:
error[E0502]: cannot borrow `self.nodes` as immutable because it is also borrowed as mutable
--> src/lib.rs:19:43
|
18 | let children = self.get_children(id)?;
| ---- mutable borrow occurs here
19 | let result = children.iter().find(|&id| self.nodes.get(id).unwrap().value % 2 == 0)?;
| ---- ^^^^^ ---------- second borrow occurs due to use of `self.nodes` in closure
| | |
| | immutable borrow occurs here
| mutable borrow later used by call
Since get_children might insert nodes into the tree, we need a &mut self reference. However, the way I see it, after the value of children is known, self no longer needs to be borrowed mutably. Why does this not work, and how would I fix it?
EDIT -- my workaround
After Chayim Friedman's answer, I decided against returning Self. I mostly ran into the above problem when first calling get_children to get a list of IDs and then using nodes.get() to obtain the corresponding Node. Instead, I refactored to provide the following functions:
impl Tree {
fn load_children(&mut self, id: u32) {
// If not present yet, perform queries to add children to the tree
}
fn iter_children(&self, id: u32) -> Option<IterChildren> {
// Provides an iterator over the children of node `id`
}
}
Downgrading a mutable reference into a shared reference produces a reference that should be kept unique. This is necessary for e.g. Cell::from_mut(), which has the following signature:
pub fn from_mut(t: &mut T) -> &Cell<T>
This method relies on the uniqueness guarantee of &mut T to ensure no references to T are kept directly, only via Cell. If downgrading the reference would mean the unqiueness could have been violated, this method would be unsound, because the value inside the Cell could have been changed by another shared references (via interior mutability).
For more about this see Common Rust Lifetime Misconceptions: downgrading mut refs to shared refs is safe.
To solve this you need to get both shared references from the same shared reference that was created from the mutable reference. You can, for example, also return &Self from get_children():
fn get_children(&mut self, id: u32) -> Option<(&Self, &[u32])> {
// This will perform external queries and add new nodes to the tree
Some((self, &[]))
}
fn first_even_child(&mut self, id: u32) -> Option<u32> {
let (this, children) = self.get_children(id)?;
let result = children.iter().find(|&id| this.nodes.get(id).unwrap().value % 2 == 0)?;
Some(*result)
}

Rust error :Cannot return value referencing temporary value

I'm trying to make a code that returns the mode of a list of given numbers.
Here's the code :
use std::collections::HashMap;
fn mode (vector: &Vec<i32>) -> Vec<&&i32> {
let mut occurrences = HashMap::new();
let mut n= Vec::new();
let mut mode = Vec::new();
for i in vector {
let j= occurrences.entry(i).or_insert(0);
*j+=1;
}
for (num, occ) in occurrences.clone().iter() {
if occ> n[0] {
n.clear();
mode.clear();
n.push(occ);
mode.push(num);
} else if occ== n[0] {
mode.push(num);
}
}
mode
}
fn main () {
let mut numbers: Vec<i32>= vec![1,5,2,2,5,3]; // 2 and 5 are the mode
numbers.sort();
println!("the mode is {:?}:", mode(&numbers));
}
I used a vector for the mode since a dataset could be multimodal.
Anyway, I'm getting the following error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:26:5
|
13 | for (num, occ) in occurrences.clone().iter() {
| ------------------- temporary value created here
...
26 | mode
| ^^^^ returns a value referencing data owned by the current function
When you return from the current function, any owned values are destroyed (other than the ones being returned from the function), and any data referencing that destroyed data therefore cannot be returned, e.g.:
fn example() -> &str {
let s = String::from("hello"); // owned data
&s // error: returns a value referencing data owned by the current function
// you can imagine this is added by the compiler
drop(s);
}
The issue you have comes from iter(). iter() returns an iterator of shared references:
let values: Vec<i32> = vec![1, 2, 3];
for i in values.iter() {
// i is a &i32
}
for i in values {
// i is an i32
}
So when you call occurrences.clone().iter() you're creating a temporary value (via clone()) which is owned by the current function, then iterating over that data via shared reference. When you destructure the tuple in (num, occ), these are also shared references.
Because you later call mode.push(num), Rust realizes that mode has the type Vec<&i32>. However, there is an implicit lifetime here. The lifetime of num is essentially the lifetime of the current function (let's call that 'a), so the full type of mode is Vec<&'a i32>.
Because of that, you can't return it from the current function.
To fix
Removing iter() should work, since then you will be iterating over owned values. You might also find that you can remove .clone() too, I haven't looked too closely but it seems like it's redundant.
A couple of other points while you're here:
It's rare to interact with &Vec<Foo>, instead it's much more usual to use slices: &[Foo]. They're more general, and in almost all cases more performant (you can still pass your data in like: &numbers)
Check out clippy, it has a bunch of linter rules that can catch a bunch of errors much earlier, and usually does a good job explaining them: https://github.com/rust-lang/rust-clippy

Return Ref to something inside of Rc<RefCell<>> without Ref::map

Working code first:
use std::cell::{Ref, RefCell};
use std::rc::Rc;
struct ValueHolder {
value: i32
}
fn give_value(wrapped: &Rc<RefCell<ValueHolder>>) -> Ref<i32> {
Ref::map(
(**wrapped).borrow(),
|borrowed| { &(*borrowed).value },
)
}
fn main() {
println!("Our value: {}", *give_value(
&Rc::new(RefCell::new(ValueHolder { value: 1337 }))
));
}
The relevant part is the give_value function.
I want to return a Ref to something inside a Rc<RefCell<>>, in this case a value inside of a struct.
That works just fine, but since I just started to learn Rust, I wonder, how I could achieve the same thing without using Ref::map.
The "naive" approach:
fn give_value(wrapped: &Rc<RefCell<ValueHolder>>) -> &i32 {
&(*(**wrapped).borrow()).value
}
fails for obvious reasons:
error[E0515]: cannot return value referencing temporary value
--> src/bin/rust_example.rs:9:5
|
9 | &(*(**wrapped).borrow()).value
| ^^^--------------------^^^^^^^
| | |
| | temporary value created here
| returns a value referencing data owned by the current function
So my question is: How can I recreate what the Ref::map function does by myself?
You can't. RefCell requires that Ref be used whenever the value is used, so it knows when it should allow a borrow. When the Ref gets dropped, it signals to the RefCell that it can decrease the borrow count, and if the borrow count is 0, then a mutable borrow can happen. Being able to obtain a reference to the internal data that doesn't borrow the Ref (note that a reference you can return from a function cannot borrow the Ref otherwise you'd be referencing a temporary) would be problematic, because then the Ref could get dropped and the RefCell thinks that it can lend out a mutable borrow, but there's still an immutable borrow to the data. The source code for Ref::map is:
pub fn map<U: ?Sized, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, U>
where
F: FnOnce(&T) -> &U,
{
Ref { value: f(orig.value), borrow: orig.borrow }
}
which uses private fields that you can't access. So, no, you cannot recreate Ref::map yourself.

Why can't I reuse a &mut reference after passing it to a function that accepts a generic type?

Why doesn't this code compile:
fn use_cursor(cursor: &mut io::Cursor<&mut Vec<u8>>) {
// do some work
}
fn take_reference(data: &mut Vec<u8>) {
{
let mut buf = io::Cursor::new(data);
use_cursor(&mut buf);
}
data.len();
}
fn produce_data() {
let mut data = Vec::new();
take_reference(&mut data);
data.len();
}
The error in this case is:
error[E0382]: use of moved value: `*data`
--> src/main.rs:14:5
|
9 | let mut buf = io::Cursor::new(data);
| ---- value moved here
...
14 | data.len();
| ^^^^ value used here after move
|
= note: move occurs because `data` has type `&mut std::vec::Vec<u8>`, which does not implement the `Copy` trait
The signature of io::Cursor::new is such that it takes ownership of its argument. In this case, the argument is a mutable reference to a Vec.
pub fn new(inner: T) -> Cursor<T>
It sort of makes sense to me; because Cursor::new takes ownership of its argument (and not a reference) we can't use that value later on. At the same time it doesn't make sense: we essentially only pass a mutable reference and the cursor goes out of scope afterwards anyway.
In the produce_data function we also pass a mutable reference to take_reference, and it doesn't produce a error when trying to use data again, unlike inside take_reference.
I found it possible to 'reclaim' the reference by using Cursor.into_inner(), but it feels a bit weird to do it manually, since in normal use-cases the borrow-checker is perfectly capable of doing it itself.
Is there a nicer solution to this problem than using .into_inner()? Maybe there's something else I don't understand about the borrow-checker?
Normally, when you pass a mutable reference to a function, the compiler implicitly performs a reborrow. This produces a new borrow with a shorter lifetime.
When the parameter is generic (and is not of the form &mut T), the compiler doesn't do this reborrowing automatically1. However, you can do it manually by dereferencing your existing mutable reference and then referencing it again:
fn take_reference(data: &mut Vec<u8>) {
{
let mut buf = io::Cursor::new(&mut *data);
use_cursor(&mut buf);
}
data.len();
}
1 — This is because the current compiler architecture only allows a chance to do a coercion if both the source and target types are known at the coercion site.

Resources