I have problems with understanding the behavior and availability of structs with multiple lifetime parameters. Consider the following:
struct My<'a,'b> {
first: &'a String,
second: &'b String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = My{
first: &first,
second: &second
}
}
println!("{}", my.first)
}
The error message says that
|
13 | second: &second
| ^^^^^^^ borrowed value does not live long enough
14 | }
15 | }
| - `second` dropped here while still borrowed
16 | println!("{}", my.first)
| -------- borrow later used here
First, I do not access the .second element of the struct. So, I do not see the problem.
Second, the struct has two life time parameters. I assume that compiler tracks the fields of struct seperately.
For example the following compiles fine:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
std::mem::drop(my.second);
println!("{}", my.first)
}
Which means that even though, .second of the struct is dropped that does not invalidate the whole struct. I can still access the non-dropped elements.
Why doesn't the same the same work for structs with references?
The struct has two independent lifetime parameters. Just like a struct with two type parameters are independent of each other, I would expect that these two lifetimes are independent as well. But the error message suggest that in the case of lifetimes these are not independent. The resultant struct does not have two lifetime parameters but only one that is the smaller of the two.
If the validity of struct containing two references limited to the lifetime of reference with the smallest lifetime, then my question is what is the difference between
struct My1<'a,'b>{
f: &'a X,
s: &'b Y,
}
and
struct My2<'a>{
f: &'a X,
s: &'a Y
}
I would expect that structs with multiple lifetime parameters to behave similar to functions with multiple lifetime parameters. Consider these two functions
fn fun_single<'a>(x:&'a str, y: &'a str) -> &'a str {
if x.len() <= y.len() {&x[0..1]} else {&y[0..1]}
}
fn fun_double<'a,'b>(x: &'a str, y:&'b str) -> &'a str {
&x[0..1]
}
fn main() {
let first = "first".to_string();
let second = "second".to_string();
let ref_first = &first;
let ref_second = &second;
let result_ref = fun_single(ref_first, ref_second);
std::mem::drop(second);
println!("{result_ref}")
}
In this version we get the result from a function with single life time parameter. Compiler thinks that two function parameters are related so it picks the smallest lifetime for the reference we return from the function. So it does not compile this version.
But if we just replace the line
let result_ref = fun_single(ref_first, ref_second);
with
let result_ref = fun_double(ref_first, ref_second);
the compiler sees that two lifetimes are independent so even when you drop second result_ref is still valid, the lifetime of the return reference is not the smallest but independent from second parameter and it compiles.
I would expect that structs with multiple lifetimes and functions with multiple lifetimes to behave similarly. But they don't.
What am I missing here?
I assume that compiler tracks the fields of struct seperately.
I think that's the core of your confusion. The compiler does track each lifetime separately, but only statically at compile time, not during runtime. It follows from this that Rust generally can not allow structs to be partially valid.
So, while you do specify two lifetime parameters, the compiler figures that the struct can only be valid as long as both of them are alive: that is, until the shorter-lived one lives.
But then how does the second example work? It relies on an exceptional feature of the compiler, called Partial Moving. That means that whenever you move out of a struct, it allows you to move disjoint parts separately.
It is essentially a syntax sugar for the following:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
let Own{
first: my_first,
second: my_second,
} = my;
std::mem::drop(my_second);
println!("{}", my_first);
}
Note that this too is a static feature, so the following will not compile (even though it would work when run):
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
if false {
std::mem::drop(my.first);
}
println!("{}", my.first)
}
The struct may not be moved as a whole once it has been partially moved, so not even this allows you to have partially valid structs.
A local variable may be partially initialized, such as in your second example. Rust can track this for local variables and give you an error if you attempt to access the uninitialized parts.
However in your first example the variable isn't actually partially initialized, it's fully initialized (you give it both the first and second field). Then, when second goes out of scope, my is still fully initialized, but it's second field is now invalid (but initialized). Thus it doesn't even let the variable exist past when second is dropped to avoid an invalid reference.
Rust could track this since you have 2 lifetimes and name the second lifetime a special 'never that would signal the reference is always invalid, but it currently doesn't.
Related
Why can I have multiple mutable references to a static type in the same scope?
My code:
static mut CURSOR: Option<B> = None;
struct B {
pub field: u16,
}
impl B {
pub fn new(value: u16) -> B {
B { field: value }
}
}
struct A;
impl A {
pub fn get_b(&mut self) -> &'static mut B {
unsafe {
match CURSOR {
Some(ref mut cursor) => cursor,
None => {
CURSOR= Some(B::new(10));
self.get_b()
}
}
}
}
}
fn main() {
// first creation of A, get a mutable reference to b and change its field.
let mut a = A {};
let mut b = a.get_b();
b.field = 15;
println!("{}", b.field);
// second creation of A, a the mutable reference to b and change its field.
let mut a_1 = A {};
let mut b_1 = a_1.get_b();
b_1.field = 16;
println!("{}", b_1.field);
// Third creation of A, get a mutable reference to b and change its field.
let mut a_2 = A {};
let b_2 = a_2.get_b();
b_2.field = 17;
println!("{}", b_1.field);
// now I can change them all
b.field = 1;
b_1.field = 2;
b_2.field = 3;
}
I am aware of the borrowing rules
one or more references (&T) to a resource,
exactly one mutable reference (&mut T).
In the above code, I have a struct A with the get_b() method for returning a mutable reference to B. With this reference, I can mutate the fields of struct B.
The strange thing is that more than one mutable reference can be created in the same scope (b, b_1, b_2) and I can use all of them to modify B.
Why can I have multiple mutable references with the 'static lifetime shown in main()?
My attempt at explaining this is behavior is that because I am returning a mutable reference with a 'static lifetime. Every time I call get_b() it is returning the same mutable reference. And at the end, it is just one identical reference. Is this thought right? Why am I able to use all of the mutable references got from get_b() individually?
There is only one reason for this: you have lied to the compiler. You are misusing unsafe code and have violated Rust's core tenet about mutable aliasing. You state that you are aware of the borrowing rules, but then you go out of your way to break them!
unsafe code gives you a small set of extra abilities, but in exchange you are now responsible for avoiding every possible kind of undefined behavior. Multiple mutable aliases are undefined behavior.
The fact that there's a static involved is completely orthogonal to the problem. You can create multiple mutable references to anything (or nothing) with whatever lifetime you care about:
fn foo() -> (&'static i32, &'static i32, &'static i32) {
let somewhere = 0x42 as *mut i32;
unsafe { (&*somewhere, &*somewhere, &*somewhere) }
}
In your original code, you state that calling get_b is safe for anyone to do any number of times. This is not true. The entire function should be marked unsafe, along with copious documentation about what is and is not allowed to prevent triggering unsafety. Any unsafe block should then have corresponding comments explaining why that specific usage doesn't break the rules needed. All of this makes creating and using unsafe code more tedious than safe code, but compared to C where every line of code is conceptually unsafe, it's still a lot better.
You should only use unsafe code when you know better than the compiler. For most people in most cases, there is very little reason to create unsafe code.
A concrete reminder from the Firefox developers:
I am new to Rust. When I read chapter 15 of The Rust Programming Language, I failed to know why one should use Boxes in recursive data structures instead of regular references. 15.1 of the book explains that indirection is required to avoid infinite-sized structures, but it does not explain why to use Box.
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn main() {
let list = Cons(1, &Cons(2, &Cons(3, &Nil)));
println!("{:?}", list);
}
The code above compiles and produces the desired output. It seems that using FunctionalList to store a small amount of data on stack works perfectly well. Does this code cause troubles?
It is true that the FunctionalList works in this simple case. However, we will run into some difficulties if we try to use this structure in other ways. For instance, suppose we tried to construct a FunctionalList and then return it from a function:
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list(x: u32) -> FunctionalList {
return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
}
fn main() {
let list = make_list(1);
println!("{:?}", list);
}
This results in the following compile error:
error[E0106]: missing lifetime specifier
--> src/main.rs:9:25
|
9 | fn make_list(x: u32) -> FunctionalList {
| ^^^^^^^^^^^^^^ help: consider giving it an explicit bounded or 'static lifetime: `FunctionalList + 'static`
If we follow the hint and add a 'static lifetime, then we instead get this error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:10:12
|
10 | return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
| ^^^^^^^^^^^^^^^^^^^^^^-----------------^^
| | |
| | temporary value created here
| returns a value referencing data owned by the current function
The issue is that the inner FunctionalList values here are owned by implicit temporary variables whose scope ends at the end of the make_list function. These values would thus be dropped at the end of the function, leaving dangling references to them, which Rust disallows, hence the borrow checker rejects this code.
In contrast, if FunctionalList had been defined to Box its FunctionalList component, then ownership would have been moved from the temporary value into the containing FunctionalList, and we would have been able to return it without any problem.
With your original FunctionalList, the thing we have to think about is that every value in Rust has to have an owner somewhere; and so if, as in this case, the FunctionaList is not the owner of its inner FunctionalLists, then that ownership has to reside somewhere else. In your example, that owner was an implicit temporary variable, but in more complex situations we could use a different kind of external owner. Here's an example of using a TypedArena (from the typed-arena crate) to own the data, so that we can still implement a variation of the make_list function:
use typed_arena::Arena;
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list<'a>(x: u32, arena: &'a Arena<FunctionalList<'a>>) -> &mut FunctionalList<'a> {
let l0 = arena.alloc(Nil);
let l1 = arena.alloc(Cons(x + 2, l0));
let l2 = arena.alloc(Cons(x + 1, l1));
let l3 = arena.alloc(Cons(x, l2));
return l3;
}
fn main() {
let arena = Arena::new();
let list = make_list(1, &arena);
println!("{:?}", list);
}
In this case, we adapted the return type of make_list to return only a mutable reference to a FunctionalList, instead of returning an owned FunctionalList, since now the ownership resides in the arena.
I have the following code, where I'm trying to return the struct Foo with a set of default values for the field values. These values may be changed later. But the compiler complains:
error: `initial` does not live long enough
How this could be achieved? Any alternatives?
struct Foo <'a> {
values: &'a mut Vec<i32>,
}
impl <'a> Foo <'a> {
fn new() -> Foo <'a> {
let initial = vec![1, 2];
Foo { values: &mut initial }
}
}
let my_foo = Foo::new();
my_foo.values.push(3);
There are two problems here.
The first is that you don't need to use &mut to make a structure field mutable. Mutability is inherited in Rust. That is, if you have a Foo stored in a mutable variable (let mut f: Foo), its fields are mutable; if it's in an immutable variable (let f: Foo), its fields are immutable. The solution is to just use:
struct Foo {
values: Vec<i32>,
}
and return a Foo by value.
The second problem (and the source of the actual compilation error) is that you're trying to return a borrow to something you created in the function. This is impossible. No, there is no way around it; you can't somehow extend the lifetime of initial, returning initial as well as the borrow won't work. Really. This is one of the things Rust was specifically designed to absolutely forbid.
If you want to transfer something out of a function, one of two things must be true:
It is being stored somewhere outside the function that will outlive the current call (as in, you were given a borrow as an argument; returning doesn't count), or
You are returning ownership, not just a borrowed reference.
The corrected Foo works because it owns the Vec<i32>.
In the following example:
struct SimpleMemoryBank {
vec: Vec<Box<i32>>,
}
impl SimpleMemoryBank {
fn new() -> SimpleMemoryBank {
SimpleMemoryBank{ vec: Vec::new() }
}
fn add(&mut self, value: i32) -> &mut i32 {
self.vec.push(Box::new(value));
let last = self.vec.len() - 1;
&mut *self.vec[last]
}
}
fn main() {
let mut foo = SimpleMemoryBank::new();
// Works okay
foo.add(1);
foo.add(2);
// Doesn't work: "cannot borrow `foo` as mutable more than once at a time"
let one = foo.add(1);
let two = foo.add(2);
}
add() can be called multiple times in a row, as long as I don't store the result of the function call. But if I store the result of the function (let one = ...), I then get the error:
problem.rs:26:15: 26:18 error: cannot borrow `foo` as mutable more than once at a time
problem.rs:26 let two = foo.add(2);
^~~
problem.rs:25:15: 25:18 note: previous borrow of `foo` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `foo` until the borrow ends
problem.rs:25 let one = foo.add(1);
^~~
problem.rs:27:2: 27:2 note: previous borrow ends here
problem.rs:17 fn main() {
...
problem.rs:27 }
^
error: aborting due to previous error
Is this a manifestation of issue #6393: borrow scopes should not always be lexical?
How can I work around this? Essentially, I want to add a new Box to the vector, and then return a reference to it (so the caller can use it).
This is exactly the problem that Rust is designed to prevent you from causing. What would happen if you did:
let one = foo.add(1);
foo.vec.clear();
println!("{}", one);
Or what if foo.add worked by pushing the new value at the beginning of the vector? Bad things would happen! The main thing is that while you have a borrow out on a variable, you cannot mutate the variable any more. If you were able to mutate it, then you could potentially invalidate the memory underlying the borrow and then your program could do a number of things, the best case would be that it crashes.
Is this a manifestation of issue #6393: borrow scopes should not always be lexical?
Kind of, but not really. In this example, you never use one or two, so theoretically a non-lexical scope would allow it to compile. However, you then state
I want to add a new Box to the vector, and then return a reference to it (so the caller can use it)
Which means your real code wants to be
let one = foo.add(1);
let two = foo.add(2);
do_something(one);
do_something(two);
So the lifetimes of the variables would overlap.
In this case, if you just want a place to store variables that can't be deallocated individually, don't overlap with each other, and cannot be moved, try using a TypedArena:
extern crate arena;
use arena::TypedArena;
fn main() {
let arena = TypedArena::new();
let one = arena.alloc(1);
let two = arena.alloc(2);
*one = 3;
println!("{}, {}", one, two);
}
Rust tutorials often advocate passing an argument by reference:
fn my_func(x: &Something)
This makes it necessary to explicitly take a reference of the value at the call site:
my_func(&my_value).
It is possible to use the ref keyword usually used in pattern matching:
fn my_func(ref x: Something)
I can call this by doing
my_func(my_value)
Memory-wise, does this work like I expect or does it copy my_value on the stack before calling my_func and then get a reference to the copy?
The value is copied, and the copy is then referenced.
fn f(ref mut x: i32) {
*x = 12;
}
fn main() {
let mut x = 42;
f(x);
println!("{}", x);
}
Output: 42
Both functions declare x to be &Something. The difference is that the former takes a reference as the parameter, while the latter expects it to be a regular stack value. To illustrate:
#[derive(Debug)]
struct Something;
fn by_reference(x: &Something) {
println!("{:?}", x); // prints "&Something""
}
fn on_the_stack(ref x: Something) {
println!("{:?}", x); // prints "&Something""
}
fn main() {
let value_on_the_stack: Something = Something;
let owned: Box<Something> = Box::new(Something);
let borrowed: &Something = &value_on_the_stack;
// Compiles:
on_the_stack(value_on_the_stack);
// Fail to compile:
// on_the_stack(owned);
// on_the_stack(borrowed);
// Dereferencing will do:
on_the_stack(*owned);
on_the_stack(*borrowed);
// Compiles:
by_reference(owned); // Does not compile in Rust 1.0 - editor
by_reference(borrowed);
// Fails to compile:
// by_reference(value_on_the_stack);
// Taking a reference will do:
by_reference(&value_on_the_stack);
}
Since on_the_stack takes a value, it gets copied, then the copy matches against the pattern in the formal parameter (ref x in your example). The match binds x to the reference to the copied value.
If you call a function like f(x) then x is always passed by value.
fn f(ref x: i32) {
// ...
}
is equivalent to
fn f(tmp: i32) {
let ref x = tmp;
// or,
let x = &tmp;
// ...
}
i.e. the referencing is completely restricted to the function call.
The difference between your two functions becomes much more pronounced and obvious if the value doesn't implement Copy. For example, a Vec<T> doesn't implement Copy, because that is an expensive operation, instead, it implements Clone (Which requires a specific method call).
Assume two methods are defined as such
fn take_ref(ref v: Vec<String>) {}// Takes a reference, ish
fn take_addr(v: &Vec<String>) {}// Takes an explicit reference
take_ref will try to copy the value passed, before referencing it. For Vec<T>, this is actually a move operation (Because it doesn't copy). This actually consumes the vector, meaning the following code would throw a compiler error:
let v: Vec<String>; // assume a real value
take_ref(v);// Value is moved here
println!("{:?}", v);// Error, v was moved on the previous line
However, when the reference is explicit, as in take_addr, the Vec isn't moved but passed by reference. Therefore, this code does work as intended:
let v: Vec<String>; // assume a real value
take_addr(&v);
println!("{:?}", v);// Prints contents as you would expect