What is the intuition behind Rust lifetimes? - rust

I have read about Rust lifetimes in many different resources and I'm still not able to figure out the intuition behind them. Consider this code:
#[derive(Debug)]
struct Example<'a> {
    name: &'a str,
    other_name: String,
}

fn main() {
    let a: &'static str = "hello world";
    println!("{}", a);
    let b: Example = Example {
        name: "Hello",
        other_name: "World".into(),
    };
    println!("{:?}", b);
}
In my understanding, all things in Rust have a lifetime attached to them. In the line let a: &'static str = "hello world"; the variable a is kept alive till the end of the program, and the 'static is optional, that is, let a: &str = "hello world"; is also valid. My confusion is about adding a custom lifetime to other items, such as struct Example.
struct Example<'a> {
    name: &'a str,
    other_name: String,
}
Why do we need to attach a lifetime 'a to it? What is a simplified and intuitive reasoning why we use lifetimes in Rust?

If you come from a background in garbage-collected languages (I see you're familiar with Python), the whole notion of lifetimes can feel very alien indeed. Even quite high-level memory management concepts, such as the difference between stack and heap or when allocations and deallocations occur, can be difficult to grasp, because these are details that garbage collection hides from you (at a cost).
On the other hand, if you come from a language where you've had to manage memory yourself (like C++, for example), these are concepts with which you'll already be quite comfortable. My understanding is that Rust was primarily designed to compete in this "systems language" space whilst at the same time introducing strategies (like the borrow checker) to help avoid most memory management errors. Hence much of the documentation has been written with this audience in mind.
Before you can really understand "lifetimes", you should get to grips with the stack and the heap. Lifetime issues mostly arise with things that are (or might be) on the heap. Rust's ownership model is ultimately about associating each heap allocation with a specific stack item (perhaps via other intermediate heap items), such that when an item is popped from the stack all its associated heap allocations are freed.
Then ask yourself, whenever you have a reference to (i.e. the memory address of) something: will that something still be at the expected location in memory when the reference is used? One reason it might not be is because it was on the heap and its owning item has been popped from the stack, causing it to be dropped and its memory allocation freed; another might be because it has relocated to some other location in memory (for example, it's a Vec that outgrew the space available in its previous allocation). Even mere mutations of the data can violate expectations about what’s held there, so they’re not allowed to happen from under you either.
The most important thing to grasp is that Rust's lifetimes have no impact whatsoever on this question: that is, they never affect how long something remains at a memory location—they are merely assertions that we make about the answers to that question, and the code won't compile if those assertions cannot be verified.
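Here is a small illustration of lifetimes acting as assertions rather than instructions. It is a sketch, and first_word is a hypothetical helper that does not come from the question. The 'a only states that the returned slice borrows from the argument. It does not keep the argument alive, so the result is usable only while the borrowed String still exists.

```rust
// Hypothetical helper: 'a ties the returned slice to the input string.
// The annotation changes nothing about how long `s` lives; it only lets
// the compiler check that the result is not used after `s` is gone.
fn first_word<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    let word = first_word(&text);
    println!("{word}");
    // drop(text); println!("{word}"); // would be rejected: `text` is still borrowed
}
```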
So, on to your example:
struct Example<'a> {
    name: &'a str,
    other_name: String,
}
Let's imagine we create an instance of this struct:
let foo = Example { name: "eggyal", other_name: String::from("Eka") };
Now suppose this foo, a stack item, is at address 0x1000. Delving into the implementation details for a typical 64-bit system, our memory might look something like this:
...
0x1000 foo.name#ptr = 0xabcd
0x1008 foo.name#len = 6
0x1010 foo.other_name#ptr = 0x5678
0x1018 foo.other_name#cap = 3
0x1020 foo.other_name#len = 3
...
0x5678 'E'
0x5679 'k'
0x567a 'a'
...
0xabcd 'e'
0xabce 'g'
0xabcf 'g'
0xabd0 'y'
0xabd1 'a'
0xabd2 'l'
...
Notice that, in foo, name consists of just a pointer and a length, whereas other_name additionally has a capacity (which, in this example, is the same as its length). So what's the difference between &str and String? It's all about where responsibility for managing the associated memory allocation lies.
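You can check the pointer/length versus pointer/capacity/length difference with std::mem::size_of. This is a sketch, and layout_in_words is a hypothetical helper. On current compilers for mainstream targets, &str is two machine words and String is three. The exact sizes and field order are implementation details, not documented guarantees.

```rust
use std::mem::size_of;

/// Sizes of `&str` and `String`, measured in machine words.
fn layout_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<&str>() / word, size_of::<String>() / word)
}

fn main() {
    // &str is (pointer, length); String is (pointer, capacity, length), in some order
    println!("{:?}", layout_in_words());

    let other_name = String::from("Eka");
    // String tracks its capacity separately from its length
    println!("len = {}, capacity = {}", other_name.len(), other_name.capacity());
    assert!(other_name.capacity() >= other_name.len());
}
```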
Since String is an owned, heap-allocated string, foo.other_name "owns" (is responsible for) its associated memory allocation. Hence, when foo is dropped (e.g. because it is popped from the stack), Rust will ensure that those three bytes at address 0x5678 are freed and returned to the allocator (which ultimately happens through an implementation of std::ops::Drop). Owning the allocation also means that String can safely mutate the memory, relocate the value to another address, etc. (provided that it's not currently on loan somewhere else).
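The following sketch makes the idea "the owner is popped, so its heap allocation is freed" concrete. The Noisy type and the drop_counts helper are hypothetical and exist only to observe when Drop runs. A real String does the equivalent bookkeeping internally.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type, used only to observe when a value is dropped.
struct Noisy {
    _data: Box<u32>,        // heap allocation owned by this value
    drops: Rc<Cell<usize>>, // shared counter, bumped when a Noisy is dropped
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
        // once this returns, the fields are dropped as well,
        // so the Box frees its heap allocation
    }
}

/// Returns the drop count seen inside the owner's scope and just after it.
fn drop_counts() -> (usize, usize) {
    let drops = Rc::new(Cell::new(0));
    let inside;
    {
        let _owner = Noisy { _data: Box::new(42), drops: Rc::clone(&drops) };
        inside = drops.get();
    } // `_owner` goes out of scope (is popped) here
    (inside, drops.get())
}

fn main() {
    let (inside, after) = drop_counts();
    println!("inside scope: {inside}, after scope: {after}");
}
```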
By contrast, the memory allocation at 0xabcd is not "owned" by foo.name—we say that it's "borrowing" the allocation instead—but if foo.name does not manage the allocation, how can it be sure that it contains what it's supposed to? Well, we programmers promise Rust that we will keep the contents valid for the duration of the borrow (which we give a name, in your case 'a: &'a str means that the memory holding the str is being borrowed for lifetime 'a), and the borrow checker ensures that we keep our promise.
But how long are we promising that lifetime 'a will be? Well, it’ll be different for every instance of Example: the period of time for which we promise "eggyal" will be at 0xabcd for foo will in all likelihood be completely different to the period of time that we promise the name value of some other instance will be at its address. So our lifetime 'a is a parameter of Example: this is why it’s declared as Example<'a>.
Fortunately, we don’t ever need to explicitly define how long our lifetimes will actually last, as the compiler knows everything's actual lifetime and merely needs to check that our assertions hold. In our example, the compiler determines that the provided value, "eggyal", is a string literal and therefore of type &'static str, so it will be at its address 0xabcd for the 'static lifetime. Thus in the case of foo, 'a is allowed to be "any lifetime up to and including 'static"; in @Aloso's answer you can see an example with a different lifetime. Then wherever foo is used, any lifetime assertions at that usage site can be checked and verified against this determined bound.
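This sketch shows the point that 'a can differ for each instance, reusing the question's Example struct. needs_static is a hypothetical helper that only accepts an Example<'static>. The local String stands in for any owner that lives for less than the whole program.

```rust
#[derive(Debug)]
struct Example<'a> {
    name: &'a str,
    other_name: String,
}

// Hypothetical helper: only accepts instances whose `name` is borrowed for 'static.
fn needs_static(e: &Example<'static>) -> String {
    format!("{} / {}", e.name, e.other_name)
}

fn main() {
    // "eggyal" is a string literal (&'static str), so for `foo` 'a may be 'static.
    let foo = Example { name: "eggyal", other_name: String::from("Eka") };
    println!("{}", needs_static(&foo));

    // Here `name` borrows a local String, so for `bar` 'a can last no longer than `local`.
    let local = String::from("Aloso");
    let bar = Example { name: &local, other_name: String::new() };
    println!("{:?}", bar);
    // needs_static(&bar); // would be rejected: `local` does not live for 'static
}
```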
It takes some getting used to, but I find picturing the memory layout like this and asking myself "when does the memory allocation get freed?" helps me to understand the lifetimes in my code (sometimes I need to think about when the value might be relocated or mutated instead, but merely considering deallocation is often enough—and is usually a little bit easier to grasp).

In this line let a:&'static str = "hello world"; the variable a is kept alive till the end of the program
No, that's not what happens. a is a reference, i.e. it refers to some string data. That string data is 'static, which means that it is alive until the end of the program. a, however, doesn't need to be alive until the end of the program (it happens to be in this case, because it is the first value declared in the main function, but that's just coincidence).
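The variable and the data it refers to are separate things. To show this, the sketch below returns the &'static str out of the function where a was declared (greeting is a hypothetical function). After greeting returns, the variable a no longer exists, but the string data it pointed to is still valid.

```rust
/// Hypothetical example: returns the string data that `a` refers to.
fn greeting() -> &'static str {
    let a: &'static str = "hello world";
    a // the variable `a` ends here, but the data it points to lives on
}

fn main() {
    let s = greeting(); // still valid: the literal is stored for the whole program
    println!("{s}");
}
```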
When a struct has a lifetime, that usually means that it borrows another value, and it can only be used while that value is alive. For example:
struct Example<'a> {
    name: &'a str,
    other_name: String,
}

fn main() {
    let s: String = "hello world".to_string();
    let example = Example {
        name: &s[..],
        other_name: "".to_string(),
    };
    drop(s); // the lifetime of s ends here
    // however, example borrows s, therefore its lifetime
    // is tied to s. This means that example can't be used
    // after s was dropped. Therefore, the following line
    // will trigger a compiler error:
    println!("{}", example.name);
}
The reasoning in the actual error message is slightly different, but I think it's still easy to understand:
error[E0505]: cannot move out of `s` because it is borrowed
--> src/main.rs:12:10
|
9 | name: &s[..],
| - borrow of `s` occurs here
...
12 | drop(s); // the lifetime of s ends here
| ^ move out of `s` occurs here
...
18 | println!("{}", example.name);
| ------------ borrow later used here
The error message points out that example, which borrows s, is used after s was dropped. This is forbidden because example has a lifetime that can't outlive s.
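For comparison, here is a sketch of the same code reordered so that it compiles (describe is a hypothetical wrapper). example is used while s is still alive, and s is dropped only afterwards. At that point the borrow held by example has already ended.

```rust
struct Example<'a> {
    name: &'a str,
    other_name: String,
}

fn describe() -> String {
    let s: String = "hello world".to_string();
    let example = Example { name: &s[..], other_name: "".to_string() };
    // Use the borrow while `s` is still alive...
    let out = format!("{}{}", example.name, example.other_name);
    // ...and only then drop `s`: `example` is never used again, so its borrow has ended.
    drop(s);
    out
}

fn main() {
    println!("{}", describe());
}
```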
I hope this gives you a better understanding of how this works. I also recommend reading this and this chapter of the Rust book.


Why is mutating an owned value and borrowed reference safe in Rust?

In How does Rust prevent data races when the owner of a value can read it while another thread changes it?, I understand I need &mut self, when we want to mutate an object, even when the method is called with an owned value.
But how about primitive values, like i32? I ran this code:
fn change_aaa(bbb: &mut i32) {
    *bbb = 3;
}

fn main() {
    let mut aaa: i32 = 1;
    change_aaa(&mut aaa); // somehow run this asynchronously
    aaa = 2; // ... and will have data race here
}
My questions are:
Is this safe in a non-concurrent situation?
According to The Rust Programming Language, if we think of the owned value as a pointer, it is not safe according to the following rules; however, it compiles.
Two or more pointers access the same data at the same time.
At least one of the pointers is being used to write to the data.
There’s no mechanism being used to synchronize access to the data.
Is this safe in a concurrent situation?
I tried, but I find it hard to put change_aaa(&mut aaa) into a thread, according to Why can't std::thread::spawn accept arguments in Rust? and How does Rust prevent data races when the owner of a value can read it while another thread changes it?. However, is it designed to be hard or impossible to do this, or just because I am unfamiliar with Rust?
The signature of change_aaa doesn't allow it to move the reference into another thread. For example, you might imagine a change_aaa() implemented like this:
fn change_aaa(bbb: &mut i32) {
    std::thread::spawn(move || {
        std::thread::sleep(std::time::Duration::from_secs(1));
        *bbb = 100; // ha ha ha - kaboom!
    });
}
But the above doesn't compile. This is because, after desugaring the lifetime elision, the full signature of change_aaa() is:
fn change_aaa<'a>(bbb: &'a mut i32)
The lifetime annotation means that change_aaa must support references of any lifetime 'a chosen by the caller, even a very short one, such as one that invalidates the reference as soon as change_aaa() returns. And this is exactly how change_aaa() is called from main(), which can be desugared to:
let mut aaa: i32 = 1;
{
    let aaa_ref = &mut aaa;
    change_aaa(aaa_ref);
    // aaa_ref goes out of scope here, and we're free to mutate
    // aaa as we please
}
aaa = 2; // ... and will have data race here
So the lifetime of the reference is short, and ends just before the assignment to aaa. On the other hand, thread::spawn() requires the closure it is given to satisfy a 'static bound. That means that the closure passed to thread::spawn() must either only contain owned data, or references to 'static data (data guaranteed to last until the end of the program). Since change_aaa() accepts bbb with a lifetime that may be shorter than 'static, it cannot pass bbb to thread::spawn().
To get a grip on this you can try to come up with imaginative ways to write change_aaa() so that it writes to *bbb in a thread. If you succeed in doing so, you will have found a bug in rustc. In other words:
However, is it designed to be hard or impossible to do this, or just because I am unfamiliar with Rust?
It is designed to be impossible to do this, except through types that are explicitly designed to make it safe (e.g. Arc to prolong the lifetime, and Mutex to make writes data-race-safe).
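Here is a minimal sketch of that approach. It assumes you are free to change the function's signature, and change_aaa_in_thread and run are hypothetical names. The spawned thread receives its own Arc handle, so the closure owns everything it captures and meets the 'static bound. The Mutex makes the writes from both threads data-race free.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical variant of change_aaa that takes a shared handle instead of &mut i32.
// The closure owns its Arc clone, so it satisfies thread::spawn's 'static bound.
fn change_aaa_in_thread(bbb: Arc<Mutex<i32>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        *bbb.lock().unwrap() = 3; // Mutex serializes this write with other accesses
    })
}

fn run() -> (i32, i32) {
    let aaa = Arc::new(Mutex::new(1));
    let handle = change_aaa_in_thread(Arc::clone(&aaa));
    handle.join().unwrap(); // wait for the writer so the result is deterministic
    let after_thread = *aaa.lock().unwrap();
    *aaa.lock().unwrap() = 2; // the main thread writes through the same Mutex
    let after_main = *aaa.lock().unwrap();
    (after_thread, after_main)
}

fn main() {
    println!("{:?}", run());
}
```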
Is this safe in a non-concurrent situation? According to this post, if we think of the owned value of self as a pointer, it is not safe according to the following rules; however, it compiles.
Two or more pointers access the same data at the same time.
At least one of the pointers is being used to write to the data.
There’s no mechanism being used to synchronize access to the data.
It is safe according to those rules: there is one pointer accessing the data at line 2 (the pointer passed to change_aaa). That borrow ends when change_aaa returns, and only after that is another pointer used to update the local.
Is this safe in a concurrent situation? I tried, but I find it hard to put change_aaa(&mut aaa) into a thread, according to post and post. However, is it designed to be hard or impossible to do this, or just because I am unfamiliar with Rust?
While it is possible to put change_aaa(&mut aaa) in a separate thread using scoped threads, the corresponding lifetimes will ensure the compiler rejects any code trying to modify aaa while that thread runs. You will essentially have this failure:
fn main() {
    let mut aaa: i32 = 1;
    let r = &mut aaa;
    aaa = 2;
    println!("{}", r);
}
error[E0506]: cannot assign to `aaa` because it is borrowed
--> src/main.rs:10:5
|
9 | let r = &mut aaa;
| -------- borrow of `aaa` occurs here
10 | aaa = 2;
| ^^^^^^^ assignment to borrowed `aaa` occurs here
11 | println!("{}", r);
| - borrow later used here
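For completeness, here is a sketch of the scoped-thread version using std::thread::scope, which is stable since Rust 1.63. run is a hypothetical wrapper. The compiler accepts the write from the spawned thread. aaa can be used again only after thread::scope returns, because every thread spawned in the scope has been joined by then.

```rust
use std::thread;

fn change_aaa(bbb: &mut i32) {
    *bbb = 3;
}

fn run() -> (i32, i32) {
    let mut aaa: i32 = 1;
    thread::scope(|s| {
        // The spawned thread may borrow `aaa` mutably, because every thread
        // spawned in the scope is joined before `thread::scope` returns.
        s.spawn(|| change_aaa(&mut aaa));
        // `aaa = 2;` here would be rejected: `aaa` is still mutably borrowed.
    });
    // The scope has ended, so the borrow is over and `aaa` may be used again.
    let after_thread = aaa;
    aaa = 2;
    (after_thread, aaa)
}

fn main() {
    println!("{:?}", run());
}
```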

What could go wrong with partially initializing a struct using MaybeUninit::uninit().assume_init() on a reference field then initializing after?

Given a struct like so:
pub struct MyStruct<'a> {
    id: u8,
    other: &'a OtherStruct,
}
I want to partially initialize it with an id field, then assign to other reference field afterwards. Note: For what I'm showing in this question, it seems extremely unnecessary to do this, but it is necessary in the actual implementation.
The Rust documentation talks about initializing a struct field-by-field, which would be done like so:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

fn get_struct<'a>(other: &'a OtherStruct) -> MyStruct<'a> {
    let mut uninit: MaybeUninit<MyStruct<'a>> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();
    unsafe {
        addr_of_mut!((*ptr).id).write(8);
        addr_of_mut!((*ptr).other).write(other);
        uninit.assume_init()
    }
}
Ok, so that's a possibility and it works, but is it necessary? Is it safe to instead do the following, which also seems to work?
fn get_struct2<'a>(other: &'a OtherStruct) -> MyStruct<'a> {
    let mut my_struct = MyStruct {
        id: 8,
        other: unsafe { MaybeUninit::uninit().assume_init() },
    };
    my_struct.other = other;
    my_struct
}
Note the first way causes no warnings and the second one gives the following warning...
other: unsafe { MaybeUninit::uninit().assume_init() },
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
this code causes undefined behavior when executed
help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
...which makes sense because if the other field were accessed that could cause problems.
From having almost no understanding of this, I'm guessing that for the second way it's initially defining a struct that has its other reference pointing at whatever location in memory, but once a valid reference is assigned it should be good. Is that correct? I'm thinking it might matter for situations like if there was a struct or enum that wasn't initialized due to compiler optimizations so wrapping in MaybeUninit would prevent those optimizations, but is it ok for a reference? I'm never accessing the reference until it's assigned to.
Edit: Also, I know this could also be solved by using an Option or some other container for initialization in the private API of the struct, but let's skip over that.
It's undefined behavior (see What Every C Programmer Should Know About Undefined Behavior, which applies to Rust code using unsafe as well):
Behavior considered undefined
A reference or Box that is dangling, unaligned, or points to an invalid value.
Note:
Undefined behavior affects the entire program. For example, calling a function in C that exhibits undefined behavior of C means your entire program contains undefined behaviour that can also affect the Rust code. And vice versa, undefined behavior in Rust can cause adverse affects on code executed by any FFI calls to other languages.
Dangling pointers
A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type (using size_of_val). As a consequence, if the span is empty, "dangling" is the same as "non-null". Note that slices and strings point to their entire range, so it is important that the length metadata is never too large. In particular, allocations and therefore slices and strings cannot be bigger than isize::MAX bytes.
Source: The Rust Reference
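The null-pointer niche is one way to see that the compiler relies on references being valid, even references that are never read. In this sketch, OtherStruct is a hypothetical stand-in for the question's type. The code checks that Option<&T> is the same size as &T, which the standard library documents as a guarantee. None is stored as the null bit pattern, a value that no valid reference can have. The compiler builds layout and optimizations on assumptions like this. That is why a reference produced by MaybeUninit::uninit().assume_init() is undefined behavior as soon as it exists, not only when it is read.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the question's OtherStruct.
#[allow(dead_code)]
struct OtherStruct {
    value: u64,
}

/// True when `Option<&OtherStruct>` needs no more space than `&OtherStruct`.
fn reference_has_null_niche() -> bool {
    size_of::<Option<&OtherStruct>>() == size_of::<&OtherStruct>()
}

fn main() {
    // `None` is stored as the null bit pattern, which no valid reference may have.
    println!("{}", reference_has_null_niche());
}
```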

What are the differences between fn(b: Box<dyn Trait>) and fn<T: Trait>(b: &mut T) in Rust? [duplicate]

I'm a bit confused about how pointers work in Rust. There's ref, Box, &, *, and I'm not sure how they work together.
Here's how I understand it currently:
Box isn't really a pointer - it's a way to allocate data on the heap, and pass around unsized types (traits especially) in function arguments.
ref is used in pattern matching to borrow something that you match on, instead of taking it. For example,
let thing: Option<i32> = Some(4);
match thing {
None => println!("none!"),
Some(ref x) => println!("{}", x), // x is a borrowed thing
}
println!("{}", x + 1); // wouldn't work without the ref since the block would have taken ownership of the data
& is used to make a borrow (borrowed pointer). If I have a function fn foo(&self) then I'm taking a reference to myself that will expire after the function terminates, leaving the caller's data alone. I can also pass data that I want to retain ownership of by doing bar(&mydata).
* is used to make a raw pointer: for example, let y: i32 = 4; let x = &y as *const i32. I understand pointers in C/C++ but I'm not sure how this works with Rust's type system, and how they can be safely used. I'm also not sure what the use cases are for this type of pointer. Additionally, the * symbol can be used to dereference things (what things, and why?).
Could someone explain the 4th type of pointer to me, and verify that my understanding of the other types is correct? I'd also appreciate anyone pointing out any common use cases that I haven't mentioned.
First of all, all of the items you listed are really different things, even if they are related to pointers. Box is a library-defined smart pointer type; ref is a syntax for pattern matching; & is a reference operator, doubling as a sigil in reference types; * is a dereference operator, doubling as a sigil in raw pointer types. See below for more explanation.
There are four basic pointer types in Rust which can be divided in two groups - references and raw pointers:
&T - immutable (shared) reference
&mut T - mutable (exclusive) reference
*const T - immutable raw pointer
*mut T - mutable raw pointer
The difference between the last two is very thin, because either can be cast to the other without any restrictions, so the const/mut distinction there serves mostly as a lint. Raw pointers can be created freely to anything, and they can also be created out of thin air from integers, for example.
Naturally, this is not so for references: reference types and their interaction define one of the key features of Rust, borrowing. References have a lot of restrictions on how and when they can be created, how they can be used and how they interact with each other. In return, they can be used without unsafe blocks. What borrowing is exactly and how it works is out of scope of this answer, though.
Both references and raw pointers can be created using & operator:
let mut x: u32 = 12; // mut is needed for the &mut borrows below
let ref1: &u32 = &x;
let raw1: *const u32 = &x;
let ref2: &mut u32 = &mut x;
let raw2: *mut u32 = &mut x;
Both references and raw pointers can be dereferenced using * operator, though for raw pointers it requires an unsafe block:
*ref1; *ref2;
unsafe { *raw1; *raw2; }
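The snippets above show the syntax side by side. As written, though, using ref1 after ref2 has been created would be rejected by the borrow checker. The sketch below orders the same operations so that they compile (demo is a hypothetical wrapper).

```rust
fn demo() -> (u32, u32, u32) {
    let mut x: u32 = 12;

    // Shared reference and *const pointer, both created with `&`.
    let ref1: &u32 = &x;
    let raw1: *const u32 = &x;
    let via_ref = *ref1;            // no unsafe needed
    let via_raw = unsafe { *raw1 }; // dereferencing a raw pointer needs unsafe

    // The shared borrows are no longer used, so `x` may now be borrowed mutably.
    let ref2: &mut u32 = &mut x;
    *ref2 += 1;                     // x == 13
    let raw2: *mut u32 = &mut x;
    unsafe { *raw2 += 1 };          // x == 14

    (via_ref, via_raw, x)
}

fn main() {
    println!("{:?}", demo());
}
```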
The dereference operator is often omitted, because another operator, the "dot" operator (i.e., .), automatically references or dereferences its left argument. So, for example, if we have these definitions:
struct X { n: u32 }
impl X {
    fn method(&self) -> u32 { self.n }
}
then, even though method() takes self by reference, self.n automatically dereferences it, so you won't have to type (*self).n. A similar thing happens when method() is called:
let x = X { n: 12 };
let n = x.method();
Here, the compiler automatically references x in x.method(), so you won't have to write (&x).method().
The next to last piece of code also demonstrated the special &self syntax. It means just self: &Self, or, more specifically, self: &X in this example. &mut self (self: &mut Self) works the same way; raw-pointer receivers such as self: *const Self are not accepted on stable Rust.
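Here is a runnable sketch of the auto-referencing and auto-dereferencing described above, reusing X (demo is a hypothetical wrapper). The last call goes through a Box<X>, which the dot operator dereferences automatically as well.

```rust
struct X {
    n: u32,
}

impl X {
    // `&self` is shorthand for `self: &Self`; `self.n` auto-dereferences it.
    fn method(&self) -> u32 {
        self.n
    }
}

fn demo() -> [u32; 4] {
    let x = X { n: 12 };
    let boxed = Box::new(X { n: 7 });
    [
        x.method(),     // the compiler inserts the borrow: same as (&x).method()
        (&x).method(),  // explicit borrow
        X::method(&x),  // fully qualified call: no automatic referencing here
        boxed.method(), // auto-deref through Box<X>, then auto-ref
    ]
}

fn main() {
    println!("{:?}", demo());
}
```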
So, references are the main pointer kind in Rust and should be used almost always. Raw pointers, which don't have restrictions of references, should be used in low-level code implementing high-level abstractions (collections, smart pointers, etc.) and in FFI (interacting with C libraries).
Rust also has dynamically-sized (or unsized) types. These types do not have a definite statically-known size and therefore can only be used through a pointer/reference. However, a bare pointer is not enough: additional information is needed, for example, the length for slices or a pointer to a virtual method table for trait objects. This information is "embedded" in pointers to unsized types, making these pointers "fat".
A fat pointer is basically a structure which contains the actual pointer to the piece of data and some additional information (length for slices, pointer to vtable for trait objects). What's important here is that Rust handles these details about pointer contents completely transparently for the user: if you pass &[u32] or *mut dyn SomeTrait values around, the corresponding internal information will be automatically passed along.
Box<T> is one of the smart pointers in the Rust standard library. It provides a way to allocate enough memory on the heap to store a value of the corresponding type, and then it serves as a handle, a pointer to that memory. Box<T> owns the data it points to; when it is dropped, the corresponding piece of memory on the heap is deallocated.
A very useful way to think of boxes is to consider them as regular values, but with a fixed size. That is, Box<T> is equivalent to just T, except it always takes a number of bytes which correspond to the pointer size of your machine. We say that (owned) boxes provide value semantics. Internally, they are implemented using raw pointers, like almost any other high-level abstraction.
Boxes (in fact, this is true for almost all of the other smart pointers, like Rc) can also be borrowed: you can get a &T out of Box<T>. This can happen automatically with the . operator or you can do it explicitly by dereferencing and referencing it again:
let x: Box<u32> = Box::new(12);
let y: &u32 = &*x;
In this regard, Boxes are similar to built-in pointers: you can use the dereference operator to reach their contents. This is possible because the dereference operator in Rust is overloadable, and it is overloaded for most (if not all) of the smart pointer types. This allows easy borrowing of these pointers' contents.
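Here is a short sketch of borrowing through a Box, including writing through it via DerefMut (demo is a hypothetical wrapper).

```rust
fn demo() -> (u32, u32) {
    let mut x: Box<u32> = Box::new(12);
    let y: &u32 = &*x; // deref the Box to reach the u32, then borrow it
    let before = *y;
    *x += 1;           // DerefMut: write through the Box (y is no longer used)
    (before, *x)
}

fn main() {
    println!("{:?}", demo());
}
```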
And, finally, ref is just a syntax in patterns to obtain a variable of the reference type instead of a value. For example:
let mut x: u32 = 12; // mut is needed for the ref mut binding below
let y = x; // y: u32, a copy of x
let ref z = x; // z: &u32, points to x
let ref mut zz = x; // zz: &mut u32, points to x
While the above example can be rewritten with reference operators:
let z = &x;
let zz = &mut x;
(which would also make it more idiomatic), there are cases when refs are indispensable, for example, when taking references into enum variants:
let x: Option<Vec<u32>> = ...;
match x {
Some(ref v) => ...
None => ...
}
In the above example, x is only borrowed inside the whole match statement, which allows using x after this match. If we write it as such:
match x {
Some(v) => ...
None => ...
}
then x will be consumed by this match and will become unusable after it.
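Here is a self-contained version of the ref example, with a concrete vector filled in for illustration (demo is a hypothetical wrapper). It also shows the common modern spelling, matching on &x, where the binding v automatically becomes a reference.

```rust
fn demo() -> (usize, usize, Option<Vec<u32>>) {
    let x: Option<Vec<u32>> = Some(vec![1, 2, 3]);

    // `ref v` binds v: &Vec<u32>, so x is only borrowed by this match.
    let len_with_ref = match x {
        Some(ref v) => v.len(),
        None => 0,
    };

    // Matching on &x gives the same result: v is bound as a reference automatically.
    let len_with_borrow = match &x {
        Some(v) => v.len(),
        None => 0,
    };

    // x was never moved, so it can still be used (here, returned) after both matches.
    (len_with_ref, len_with_borrow, x)
}

fn main() {
    println!("{:?}", demo());
}
```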
Box is logically a newtype around a raw pointer (*const T). However, it allocates and deallocates its data during construction and destruction, so does not have to borrow data from some other source.
The same thing is true of other pointer types, like Rc - a reference counted pointer. These are structs containing private raw pointers which they allocate into and deallocate from.
A raw pointer has exactly the same layout as a normal pointer (a reference) to the same type, so raw pointers are not compatible with C pointers in every case. Importantly, *const str and *const [T] are fat pointers, which means they contain extra information about the value's length.
However, raw pointers make absolutely no guarantees as to their validity. For example, I can safely do
123 as *const String
This pointer is invalid, since the memory location 123 does not point to a valid String. Thus, when dereferencing one, an unsafe block is required.
Further, whereas borrows are required to respect certain laws - namely that you cannot have multiple borrows if one is mutable - raw pointers do not have to respect this. There are other, weaker, laws that must be obeyed, but you're less likely to run afoul of these.
There is no logical difference between *mut and *const, although they may need to be cast to one another to do certain operations; the difference serves mainly as documentation.
References and raw pointers are the same thing at the implementation level. The difference from the programmer perspective is that references are safe (in Rust terms), but raw pointers are not.
The borrow checker guarantees that references are always valid (lifetime management), that you can have only one mutable reference at a time, etc.
These constraints can be too strict for many use cases, so raw pointers (which have no such constraints, as in C/C++) are useful to implement low-level data structures and low-level code in general. However, you can only dereference raw pointers, or call their unsafe methods, inside an unsafe block.
The containers in the standard library, as well as Box and Rc, are implemented using raw pointers.
Box and Rc are Rust's counterparts of C++ smart pointers, that is, wrappers around raw pointers.
I would like to add my two cents.
A. Table
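Reference/Pointer | Data location | Mutable | Shared ownership | Safe | impl Copy
------------------|---------------|---------|------------------|------|----------
&T                | stack         | ❌      | ✔️               | ✔️   | ✔️
&mut T            | stack         | ✔️      | ❌               | ✔️   | ❌
*const T          | stack         | ❌      | ✔️               | ❌   | ✔️
*mut T            | stack         | ✔️      | ✔️               | ❌   | ✔️
Box<T>            | heap          | ✔️      | ❌               | ✔️   | ❌
Rc<T>             | heap          | ❌      | ✔️               | ✔️   | ❌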
B. Comments on table
&T
Mutable (❌): Error (rustc E0594): cannot assign to *some_ref, which is behind a & reference; some_ref is a & reference, so the data it refers to cannot be written.
Shared (✔️)
Safe (✔️)
impl Copy (✔️)
&mut T
Mutable (✔️)
Shared (❌): It has only one owner. Error (rustc E0499): cannot borrow x as mutable more than once at a time; second mutable borrow occurs here.
Safe (✔️)
impl Copy (❌): Error: move occurs because some_ref has type &mut u32, which does not implement the Copy trait.
*const T
Mutable (❌): Error (rustc E0594): cannot assign to *some_raw_pointer, which is behind a *const pointer; raw1 is a *const pointer, so the data it refers to cannot be written.
Shared (✔️)
Safe (❌): Error (rustc E0133): dereference of raw pointer is unsafe and requires unsafe function or block; raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior.
impl Copy (✔️): Please check the official documentation.
*mut T
Mutable (✔️)
Shared (✔️)
Safe (❌): Error (rustc E0133): dereference of raw pointer is unsafe and requires unsafe function or block; raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior.
impl Copy (✔️): Please check the Official Documentation.
Box<T>
Mutable (✔️)
Shared (❌): To see this, take a reference to a box inside some scope. Because the box has only one owner, it is dropped as soon as that scope ends, so the reference cannot outlive it. Please refer to this SO answer for more details. Error (rustc E0597): some_box does not live long enough; borrowed value does not live long enough.
Safe (✔️)
impl Copy (❌): Please check the official documentation. Actually, there is a reason:
You can't implement Copy for Box, because that would allow the creation of multiple boxes referencing the same thing.
Rc<T>
Mutable (❌): Well, mutation is only possible while there is a single copy (for example through Rc::get_mut), and it's a bit more complicated. Error (rustc E0594): cannot assign to data in an Rc; trait DerefMut is required to modify through a dereference, but it is not implemented for Rc<u32>.
Shared (✔️): Actually it's multiple ownership.
Safe (✔️)
impl Copy (❌): Please check the Official Documentation.
C. Related Notes
1. Copy trait vs move:
According to the official documentation:
It’s important to note that in these two examples, the only difference is whether you are allowed to access x after the assignment. Under the hood, both a copy and a move can result in bits being copied in memory, although this is sometimes optimized away.
So, be aware that move transfers ownership, while Copy has nothing to do with it.
2. Mutable References do not implement Copy
Some types can’t be copied safely. For example, copying &mut T would create an aliased mutable reference. Copying String would duplicate responsibility for managing the String’s buffer, leading to a double free.
It's good anyway to read the full Copy documentation page.
3. Dereferencing Pointers and Unsafe
The term unsafe here means that you can't dereference the pointer except inside an unsafe function or block. Otherwise, you'll get the following error:
dereference of raw pointer is unsafe and requires unsafe function or block raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior rustc (E0133).
4. ref is the same as &
Box is a smart pointer, which is a data type: it is not just a simple pointer to an address in memory. A Box is the owner of its value.
fn main() {
    // this will point to a value 0.1 which will be stored on the HEAP
    // the var heap_value is just the address and it will be stored in the stack
    // Box pointer is the owner of the value
    let heap_value = Box::new(0.1);
    // "x" is a primitive type, it will have a fixed size and therefore will be stored on the stack.
    let x = 0.1;
    // * dereference which means just get the stored value
    println!("they are equal or not {}", x == *heap_value); // true
}
Dereference a tuple:
fn main() {
    let coord = Box::new((25, 50));
    // x is a pointer
    let x = coord;
    // to extract all the tuple data structure
    // if you are behind a reference and you need to use the value
    let extracted_tuple = *x;
}
type of "x" pointer is: Box<(i32, i32)>
type of "extracted_tuple" is (i32, i32)
Keep in mind that a reference held in a local variable, such as stack_ref below, is stored on the stack, because it has a fixed size (the data it points to can live anywhere).
fn main() {
    let stack_var = 10;
    // this is a reference to stack_var; they both are on the stack.
    // stack_ref points to stack_var above
    let stack_ref = &stack_var;
    // this will create a box pointer. heap memory will be allocated:
    // a copy of stack_var will be stored on the heap, and heap_var points to that memory
    let heap_var = Box::new(stack_var);
    println!("heap var is {}", heap_var);
}
As you said, ref is used in pattern matching to borrow something that you match on. Instead of using the ref keyword, you can match on &thing:
let thing: Option<i32> = Some(4);
match &thing {
    None => println!("none!"),
    Some(x) => println!("{}", x), // x is a borrowed thing (&i32)
}
println!("{:?}", thing); // thing was only borrowed, so it is still usable

What happens to the stack when a value is moved in Rust? [duplicate]

In Rust, there are two possibilities to take a reference
Borrow, i.e., take a reference but don't allow mutating the reference destination. The & operator borrows ownership from a value.
Borrow mutably, i.e., take a reference to mutate the destination. The &mut operator mutably borrows ownership from a value.
The Rust documentation about borrowing rules says:
First, any borrow must last for a scope no greater than that of the
owner. Second, you may have one or the other of these two kinds of
borrows, but not both at the same time:
one or more references (&T) to a resource,
exactly one mutable reference (&mut T).
I believe that taking a reference is creating a pointer to the value and accessing the value by the pointer. This could be optimized away by the compiler if there is a simpler equivalent implementation.
However, I don't understand what move means and how it is implemented.
For types implementing the Copy trait it means copying e.g. by assigning the struct member-wise from the source, or a memcpy(). For small structs or for primitives this copy is efficient.
And for move?
This question is not a duplicate of What are move semantics? because Rust and C++ are different languages and move semantics are different between the two.
Semantics
Rust implements what is known as an Affine Type System:
Affine types are a version of linear types imposing weaker constraints, corresponding to affine logic. An affine resource can only be used once, while a linear one must be used once.
Types that are not Copy, and are thus moved, are Affine Types: you may use them either once or never, nothing else.
Rust qualifies this as a transfer of ownership in its Ownership-centric view of the world (*).
(*) Some of the people working on Rust are much more qualified in CS than I am, and they knowingly implemented an Affine Type System; however, unlike Haskell, which exposes the math-y/CS-y concepts, Rust tends to expose more pragmatic concepts.
Note: from my reading, it could be argued that Affine Types returned from a function tagged with #[must_use] are actually Linear Types.
Implementation
It depends. Please keep in mind that Rust is a language built for speed, and there are numerous optimization passes at play here, which will depend on the compiler used (rustc + LLVM, in our case).
Within a function body:
fn main() {
let s = "Hello, World!".to_string();
let t = s;
println!("{}", t);
}
If you check the LLVM IR (in Debug), you'll see:
%_5 = alloca %"alloc::string::String", align 8
%t = alloca %"alloc::string::String", align 8
%s = alloca %"alloc::string::String", align 8
%0 = bitcast %"alloc::string::String"* %s to i8*
%1 = bitcast %"alloc::string::String"* %_5 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 24, i32 8, i1 false)
%2 = bitcast %"alloc::string::String"* %_5 to i8*
%3 = bitcast %"alloc::string::String"* %t to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 24, i32 8, i1 false)
Underneath the covers, rustc invokes memcpy twice: once to copy s (which holds the result of "Hello, World!".to_string()) into the temporary _5, and once to copy _5 into t. While this might seem inefficient, if you check the same IR in Release mode you will see that LLVM has completely elided the copies (realizing that s was unused).
The same situation occurs when calling a function: in theory you "move" the object into the function's stack frame; in practice, if the object is large, rustc might switch to passing a pointer instead.
Another situation is returning from a function, but even then the compiler might apply "return value optimization" and build directly in the caller's stack frame -- that is, the caller passes a pointer into which to write the return value, which is used without intermediary storage.
The ownership/borrowing constraints of Rust enable optimizations that are difficult to reach in C++ (which also has RVO but cannot apply it in as many cases).
So, the digest version:
moving large objects is inefficient, but there are a number of optimizations at play that might elide the move altogether
moving involves a memcpy of std::mem::size_of::<T>() bytes, so moving a large String is efficient: it only copies a few bytes (the pointer, capacity and length), whatever the size of the heap buffer it holds onto
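A quick way to check the last point: the size of a String does not depend on its contents, and moving it leaves the heap buffer where it was. A minimal sketch; `move_keeps_buffer` is a hypothetical helper, and the "three machine words" figure is what current rustc produces for the (pointer, capacity, length) layout.

```rust
use std::mem::size_of;

// Hypothetical helper: moves a String into a new binding and reports
// whether the heap buffer stayed at the same address
fn move_keeps_buffer(s: String) -> bool {
    let before = s.as_ptr();
    let t = s; // move: only the (pointer, capacity, length) header is copied
    t.as_ptr() == before
}

fn main() {
    // The header size is the same for an empty string and a 1 MB string
    println!("size_of::<String>() = {}", size_of::<String>());
    let big = "x".repeat(1_000_000);
    println!("buffer unchanged after move: {}", move_keeps_buffer(big));
}
```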
When you move an item, you are transferring ownership of that item. That's a key component of Rust.
Let's say I had a struct, and then I assign the struct from one variable to another. By default, this will be a move, and I've transferred ownership. The compiler will track this change of ownership and prevent me from using the old variable any more:
pub struct Foo {
value: u8,
}
fn main() {
let foo = Foo { value: 42 };
let bar = foo;
println!("{}", foo.value); // error: use of moved value: `foo.value`
println!("{}", bar.value);
}
how it is implemented.
Conceptually, moving something doesn't need to do anything. In the example above, there wouldn't be a reason to actually allocate space somewhere and then move the allocated data when I assign to a different variable. I don't actually know what the compiler does, and it probably changes based on the level of optimization.
For practical purposes though, you can think that when you move something, the bits representing that item are duplicated as if via memcpy. This helps explain what happens when you pass a variable to a function that consumes it, or when you return a value from a function (again, the optimizer can do other things to make it efficient, this is just conceptually):
// Ownership is transferred from the caller to the callee
fn do_something_with_foo(foo: Foo) {}
// Ownership is transferred from the callee to the caller
fn make_a_foo() -> Foo { Foo { value: 42 } }
"But wait!", you say, "memcpy only comes into play with types implementing Copy!". This is mostly true, but the big difference is that when a type implements Copy, both the source and the destination are valid to use after the copy!
One way of thinking of move semantics is the same as copy semantics, but with the added restriction that the thing being moved from is no longer a valid item to use.
However, it's often easier to think of it the other way: The most basic thing that you can do is to move / give ownership away, and the ability to copy something is an additional privilege. That's the way that Rust models it.
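A minimal sketch of that model, reusing the answer's Foo and make_a_foo and adding a hypothetical Copy type Point for contrast: deriving Copy grants the extra privilege, while Foo can only be moved.

```rust
// Copy is opt-in: with it, assignment duplicates the bits and both values stay usable
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Foo does not implement Copy, so assigning or passing it moves it
#[derive(Debug)]
pub struct Foo {
    value: u8,
}

// Ownership is transferred from the caller to the callee
fn consume(foo: Foo) -> u8 {
    foo.value
}

// Ownership is transferred from the callee to the caller
fn make_a_foo() -> Foo {
    Foo { value: 42 }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p; // copy: `p` is still valid
    assert_eq!(p, q);

    let foo = make_a_foo();
    let value = consume(foo); // move: `foo` is no longer valid
    // println!("{:?}", foo); // would not compile: `foo` was moved into `consume`
    println!("{} {:?}", value, q);
}
```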
This is a tough question for me! After using Rust for a while the move semantics are natural. Let me know what parts I've left out or explained poorly.
Rust's move keyword always bothered me, so I decided to write down my understanding, which I obtained after discussions with my colleagues.
I hope this might help someone.
let x = 1;
In the above statement, x is a variable whose value is 1. Now,
let y = || println!("y is a variable whose value is a closure");
So, the move keyword is used to transfer ownership of a variable to the closure.
In the example below, without move, x is not owned by the closure: it is only borrowed. Hence x is not owned by y and remains available for further use.
let x = 1;
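let y = || println!("this is a closure that prints x = {}", x);
On the other hand, in the next case, x is owned by the closure, i.e. by y. Note that because x is an integer, which is Copy, move copies it into the closure and the original x remains usable; for a non-Copy type such as a String, the value is moved and is no longer available for further use.
let x = 1;
let y = move || println!("this is a closure that prints x = {}", x);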
By owning I mean containing as a member variable. You can think of the two cases above as roughly equivalent to the following two structs, which is approximately how the Rust compiler desugars closures.
The former (without move, i.e. no transfer of ownership):
struct ClosureObject<'a> {
x: &'a u32
}
let x = 1;
let y = ClosureObject {
x: &x
};
The latter (with move, i.e. transfer of ownership):
struct ClosureObject {
x: u32
}
let x = 1;
let y = ClosureObject {
x: x
};
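A runnable sketch of both cases, using a String so that the move is observable; `make_greeter` is a hypothetical function written for illustration. It also shows where move is actually required: when the closure must outlive the variables it captures.

```rust
// `move` is required here: the returned closure outlives this function, so it must own `name`
fn make_greeter(name: String) -> impl Fn() -> String {
    move || format!("hello, {}", name)
}

fn main() {
    // i32 is Copy: `move` copies x into the closure, so x stays usable afterwards
    let x = 1;
    let print_x = move || println!("this is a closure that prints x = {}", x);
    print_x();
    println!("x is still usable: {}", x);

    // String is not Copy: without `move` the closure only borrows `name`
    let name = String::from("ferris");
    let name_len = || name.len();
    println!("{}", name_len());
    println!("{}", name); // fine: `name` was only borrowed

    // `name` is moved into make_greeter and then into the returned closure
    let greet = make_greeter(name);
    // println!("{}", name); // would not compile: `name` has been moved
    println!("{}", greet());
}
```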
Please let me answer my own question. I had trouble, but by asking a question here I did Rubber Duck Problem Solving. Now I understand:
A move is a transfer of ownership of the value.
For example, the assignment let x = a; transfers ownership: at first a owned the value; after the let, x owns the value. Rust forbids using a thereafter.
In fact, if you do println!("a: {:?}", a); after the let, the Rust compiler says:
error: use of moved value: `a`
println!("a: {:?}", a);
^
Complete example:
#[derive(Debug)]
struct Example { member: i32 }
fn main() {
let a = Example { member: 42 }; // A struct is moved
let x = a;
println!("a: {:?}", a);
println!("x: {:?}", x);
}
And what does this move mean?
It seems that the concept comes from C++11. A document about C++ move semantics says:
From a client code point of view, choosing move instead of copy means that you don't care what happens to the state of the source.
Aha. C++11 does not care what happens to the source. So, in this vein, Rust is free to forbid using the source after a move.
And how it is implemented?
I don't know. But I can imagine that Rust does literally nothing. x is just a different name for the same value. Names usually are compiled away (except of course debugging symbols). So it's the same machine code whether the binding has the name a or x.
It seems C++ does the same in copy constructor elision.
Doing nothing is the most efficient possible.
Passing a value to a function also results in a transfer of ownership; it is very similar to the other examples:
struct Example { member: i32 }
fn take(ex: Example) {
// 2) Now ex owns the data that a owned in main
println!("a.member: {}", ex.member)
// 3) When ex goes out of scope, the value it owns is dropped,
// so Rust frees that memory.
}
fn main() {
let a = Example { member: 42 };
take(a); // 1) Ownership is transferred to the function take
// 4) We can no longer use a to access the data it pointed to
println!("a.member: {}", a.member);
}
Hence the expected error:
post_test_7.rs:12:30: 12:38 error: use of moved value: `a.member`
let s1:String= String::from("hello");
let s2:String= s1;
To ensure memory safety, Rust invalidates s1, so instead of a shallow copy this is called a move.
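If you actually want two independent Strings, you ask for a deep copy explicitly with clone. A minimal sketch contrasting the two; `clone_is_deep` is a hypothetical helper.

```rust
// Hypothetical helper: returns true when `clone` produced a separate heap buffer
fn clone_is_deep(s1: &String) -> bool {
    let s2 = s1.clone(); // deep copy: allocates a new buffer and copies the bytes
    s2 == *s1 && s2.as_ptr() != s1.as_ptr()
}

fn main() {
    let s1: String = String::from("hello");
    println!("clone allocates a new buffer: {}", clone_is_deep(&s1));

    let s2: String = s1; // move: s2 takes over s1's buffer and s1 is invalidated
    // println!("{}", s1); // would not compile: s1 was moved into s2
    println!("{}", s2);
}
```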
fn main() {
// Each value in Rust has a variable that is called its owner.
// There can only be one owner at a time.
let s = String::from("hello");
take_ownership(s);
// Uncommenting the next line fails with: borrow of moved value: `s`
// println!("{}", s);
// Passing a parameter to a function is the same as assigning s to another variable:
// s is moved into `my_string`, then `println!("{}", my_string)` prints it.
// When take_ownership returns, my_string goes out of scope and is dropped.
let x: i32 = 2;
makes_copy(x);
// Instead of being moved, integers are copied, so we can still use "x" after the call.
// Primitive types such as i32 implement Copy; their size is known at compile time,
// so a local i32 lives on the stack and copying it is cheap.
println!("{}", x);
}
fn take_ownership(my_string: String) {
println!("{}", my_string);
}
fn makes_copy(some_integer: i32) {
println!("{}", some_integer);
}

Lifetime constraints to model scoped garbage collection

I'm working with a friend to define a safe public API for lifetimes of a "scoped" garbage collector. The lifetimes are either overly constrained and correct code does not compile or the lifetimes are too loose and they may allow invalid behavior. After trying multiple approaches, we are still stuck getting a correct API. This is especially frustrating because Rust's lifetimes can help avoid bugs in this situation but right now it just looks stubborn.
Scoped garbage collection
I am implementing an ActionScript interpreter and need a garbage collector. I studied rust-gc but it did not suit my needs. The main reason is that it requires the garbage collected values to have a static lifetime because the GC state is a thread-local static variable. I need to get garbage-collected bindings to a dynamically created host object. The other reason to avoid globals is that it is easier for me to handle multiple independent garbage-collected scopes, control their memory limits or serialize them.
A scoped garbage collector is similar to a typed-arena. You can use it to allocate values and they are all freed once the garbage collector is dropped. The difference is that you can also trigger garbage collection during its lifetime and it will clean-up the unreachable data (and is not limited to a single type).
I have a working implementation (mark & sweep GC with scopes), but its interface is not yet safe to use.
Here is a usage example of what I want:
pub struct RefNamedObject<'a> {
pub name: &'a str,
pub other: Option<Gc<'a, GcRefCell<RefNamedObject<'a>>>>,
}
fn main() {
// Initialize host settings: in our case the host object will be replaced by a string
// In this case it lives for the duration of `main`
let host = String::from("HostConfig");
{
// Create the garbage-collected scope (similar usage to `TypedArena`)
let gc_scope = GcScope::new();
// Allocate a garbage-collected string: returns a smart pointer `Gc` for this data
let a: Gc<String> = gc_scope.alloc(String::from("a")).unwrap();
{
let b = gc_scope.alloc(String::from("b")).unwrap();
}
// Manually trigger garbage collection: will free b's memory
gc_scope.collect_garbage();
// Allocate data and get a Gc pointer, data references `host`
let host_binding: Gc<RefNamedObject> = gc_scope
.alloc(RefNamedObject {
name: &host,
other: None,
})
.unwrap();
// At the end of this block, gc_scope is dropped with all its
// remaining values (`a` and `host_binding`)
}
}
Lifetime properties
The basic intuition is that Gc can only contain data that lives as long as (or longer than) the corresponding GcScope. Gc is similar to Rc but supports cycles. You need to use Gc<GcRefCell<T>> to mutate values (similar to Rc<RefCell<T>>).
Here are the properties that must be satisfied by the lifetimes of my API:
Gc cannot live longer than its GcScope
The following code must fail because a outlives gc_scope:
let a: Gc<String>;
{
let gc_scope = GcScope::new();
a = gc_scope.alloc(String::from("a")).unwrap();
}
// This must fail: the gc_scope was dropped with all its values
println!("{}", *a); // Invalid
Gc cannot contain data that lives shorter than its GcScope
The following code must fail because msg does not live as long as (or longer than) gc_scope
let gc_scope = GcScope::new();
let a: Gc<&String>;
{
let msg = String::from("msg");
a = gc_scope.alloc(&msg).unwrap();
}
It must be possible to allocate multiple Gc (no exclusion on gc_scope)
The following code must compile
let gc_scope = GcScope::new();
let a = gc_scope.alloc(String::from("a"));
let b = gc_scope.alloc(String::from("b"));
It must be possible to allocate values containing references with lifetimes longer than gc_scope
The following code must compile
let msg = String::from("msg");
let gc_scope = GcScope::new();
let a: Gc<&str> = gc_scope.alloc(&msg).unwrap();
It must be possible to create cycles of Gc pointers (that's the whole point)
Similarly to the Rc<RefCell<T>> pattern, you can use Gc<GcRefCell<T>> to mutate values and create cycles:
// The lifetimes correspond to my best solution so far, they can change
struct CircularObj<'a> {
pub other: Option<Gc<'a, GcRefCell<CircularObj<'a>>>>,
}
let gc_scope = GcScope::new();
let n1 = gc_scope.alloc(GcRefCell::new(CircularObj { other: None }));
let n2 = gc_scope.alloc(GcRefCell::new(CircularObj {
other: Some(Gc::clone(&n1)),
}));
n1.borrow_mut().other = Some(Gc::clone(&n2));
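For comparison, here is the same cycle built with std's Rc<RefCell<T>>, the pattern the question refers to. It compiles, but reference counting alone can never free it, which is the gap a tracing collector such as the scoped GC is meant to fill. A sketch; the struct mirrors CircularObj without the Gc lifetime.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Same shape as CircularObj, but using Rc instead of the custom Gc
struct CircularObj {
    other: Option<Rc<RefCell<CircularObj>>>,
}

// Builds n1 <-> n2 and returns the strong counts observed while both locals are alive
fn cycle_strong_counts() -> (usize, usize) {
    let n1 = Rc::new(RefCell::new(CircularObj { other: None }));
    let n2 = Rc::new(RefCell::new(CircularObj {
        other: Some(Rc::clone(&n1)),
    }));
    n1.borrow_mut().other = Some(Rc::clone(&n2));
    // Each node is owned by its local variable and by the other node
    (Rc::strong_count(&n1), Rc::strong_count(&n2))
    // When n1 and n2 are dropped here, each count only falls to 1: the cycle is leaked
}

fn main() {
    println!("{:?}", cycle_strong_counts());
}
```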
Solutions so far
Automatic lifetime / lifetime tag
Implemented on the auto-lifetime branch
This solution is inspired by neon's handles.
This lets any valid code compile (and allowed me to test my implementation) but is too loose and allows invalid code. It allows Gc to outlive the gc_scope that created it. (Violates the first property)
The idea here is that I add a single lifetime 'gc to all my structs, meant to represent "how long gc_scope lives".
// A smart pointer for `T` valid during `'gc`
pub struct Gc<'gc, T: Trace + 'gc> {
pub ptr: NonNull<GcBox<T>>,
pub phantom: PhantomData<&'gc T>,
pub rooted: Cell<bool>,
}
I call it automatic lifetimes because the methods never mix these struct lifetimes with the lifetime of the references they receive.
Here is the impl for gc_scope.alloc:
impl<'gc> GcScope<'gc> {
// ...
pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Inner/outer lifetimes
Implemented on the inner-outer branch
This implementation tries to fix the previous issue by relating Gc to the lifetime of GcScope. It is overly constrained and prevents the creation of cycles. This violates the last property.
To constrain Gc relative to its GcScope, I introduce two lifetimes: 'inner is the lifetime of GcScope and the result is Gc<'inner, T>. 'outer represents a lifetime longer than 'inner and is used for the allocated value.
Here is the alloc signature:
impl<'outer> GcScope<'outer> {
// ...
pub fn alloc<'inner, T: Trace + 'outer>(
&'inner self,
value: T,
) -> Result<Gc<'inner, T>, GcAllocErr> {
// ...
}
// ...
}
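To see what the inner/outer signature accepts in isolation, here is a heavily simplified, hypothetical mock: no collection happens and values are simply boxed, and there is no Drop impl, so drop check does not come into play. It only illustrates how the lifetimes in alloc constrain callers; it is not the scoped-gc implementation.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Hypothetical stand-in for GcScope: only carries the 'outer lifetime
pub struct GcScope<'outer> {
    _outer: PhantomData<&'outer ()>,
}

// Hypothetical stand-in for Gc: a Box replaces the NonNull<GcBox<T>> pointer
pub struct Gc<'inner, T> {
    value: Box<T>,
    _scope: PhantomData<&'inner ()>,
}

impl<'outer> GcScope<'outer> {
    pub fn new() -> Self {
        GcScope { _outer: PhantomData }
    }

    // Same shape as the inner/outer signature: the result borrows the scope for 'inner,
    // while the allocated value must outlive 'outer
    pub fn alloc<'inner, T: 'outer>(&'inner self, value: T) -> Gc<'inner, T> {
        Gc { value: Box::new(value), _scope: PhantomData }
    }
}

impl<'inner, T> Deref for Gc<'inner, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

fn main() {
    let msg = String::from("msg"); // lives longer than the scope (fourth property)
    let gc_scope = GcScope::new();
    let a = gc_scope.alloc(String::from("a")); // several allocations (third property)
    let b = gc_scope.alloc(String::from("b"));
    let c: Gc<&str> = gc_scope.alloc(msg.as_str());
    println!("{} {} {}", *a, *b, *c);
    // First property: a Gc borrows its scope, so if `escaped` were used after the block,
    // `s` does not live long enough would be reported:
    // let escaped; { let s = GcScope::new(); escaped = s.alloc(1); }
}
```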
Closure (context management)
Implemented on the with branch
Another idea was to not let the user create a GcScope manually with GcScope::new but instead expose a function GcScope::with(executor) providing a reference to the gc_scope. The closure executor corresponds to the gc_scope. So far, it either prevents the use of external references or allows data to leak into external Gc variables (violating the fourth and first properties respectively).
Here is the alloc signature:
impl<'gc> GcScope<'gc> {
// ...
pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Here is a usage example showing the violation of the first property:
let message = GcScope::with(|scope| {
scope
.alloc(NamedObject {
name: String::from("Hello, World!"),
})
.unwrap()
});
println!("{}", message.name);
What I'd like
From what I understand, the alloc signature I'd like is:
impl<'gc> GcScope<'gc> {
pub fn alloc<T: Trace + 'gc>(&'gc self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Where everything lives as long or longer than self (the gc_scope). But this blows up with the most simple tests:
fn test_gc() {
let scope: GcScope = GcScope::new();
scope.alloc(String::from("Hello, World!")).unwrap();
}
causes
error[E0597]: `scope` does not live long enough
--> src/test.rs:50:3
|
50 | scope.alloc(String::from("Hello, World!")).unwrap();
| ^^^^^ borrowed value does not live long enough
51 | }
| - `scope` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
I have no idea what happens here.
Edit: As explained to me on IRC, this is because I implement Drop which requires &mut self, but the scope is already borrowed in read-only mode.
Overview
Here is a quick overview of the main components of my library.
GcScope contains a RefCell holding its mutable state. It was introduced so that alloc does not require &mut self, which "locked" the gc_scope and violated property 3 (allocating multiple values).
This mutable state is GcState. It keeps track of all the allocated values, which are stored as a forward-only linked list of GcBox. A GcBox is heap-allocated and contains the actual value along with some metadata: how many active Gc pointers have it as their root, and a boolean flag used to check whether the value is reachable from the roots (see rust-gc). The value must outlive its gc_scope, so GcBox has a lifetime parameter, and in turn GcState and GcScope need one too: it is always the same lifetime, meaning "longer than gc_scope". The fact that GcScope combines a RefCell (interior mutability) with a lifetime may be the reason why I can't get my lifetimes to work (does it cause invariance?).
Gc is a smart pointer to some gc_scope-allocated data. You can only get it through gc_scope.alloc or by cloning it.
GcRefCell is most likely fine, it's just a RefCell wrapper adding metadata and behavior to properly support borrows.
Flexibility
I'm fine with the following requirements to get a solution:
unsafe code
nightly features
API changes (see for example my with approach). What matters is that I can create a temporary zone where I can manipulate garbage-collected values and that they are all dropped after this. These garbage-collected values need to be able to access longer-lived (but not static) variables outside of the scope.
The repository has a few tests in scoped-gc/src/lib.rs (compile-fail) and scoped-gc/src/test.rs.
I found a solution; I'll post it once it's written up.
This is one of the hardest problems I had with lifetimes with Rust so far, but I managed to find a solution. Thank you to panicbit and mbrubeck for having helped me on IRC.
What helped me to move forward was the explanation of the error I posted at the end of my question:
error[E0597]: `scope` does not live long enough
--> src/test.rs:50:3
|
50 | scope.alloc(String::from("Hello, World!")).unwrap();
| ^^^^^ borrowed value does not live long enough
51 | }
| - `scope` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
I did not understand this error because it wasn't clear to me why scope was borrowed, for how long, or why it needs to no longer be borrowed at the end of the scope.
The reason is that during the allocation of a value, the scope is immutably borrowed for as long as the allocated value lives. The problem is that the scope contains a state object that implements Drop: custom implementations of drop take &mut self, so it is not possible to get a mutable borrow for the drop while the scope is still immutably borrowed.
Understanding that drop requires &mut self and that it is incompatible with immutable borrows unlocked the situation.
It turns out that the inner-outer approach described in the question above had the correct lifetimes with alloc:
impl<'outer> GcScope<'outer> {
// ...
pub fn alloc<'inner, T: Trace + 'outer>(
&'inner self,
value: T,
) -> Result<Gc<'inner, T>, GcAllocErr> {
// ...
}
// ...
}
The returned Gc lives as long as GcScope and the allocated values must live longer than the current GcScope. As mentioned in the question, the issue with this solution is that it did not support circular values.
The circular values failed to work not because of the lifetimes of alloc but due to the custom drop. Removing drop allowed all the tests to pass (but leaked memory).
The explanation is quite interesting:
The lifetime of alloc expresses the properties of the allocated values. The allocated values cannot outlive their GcScope, but their content must live as long as or longer than GcScope. When creating a cycle, the value is subject to both constraints: it is allocated, so it must live as long as or shorter than GcScope, but it is also referenced by another allocated value, so it must live as long as or longer than GcScope. Because of this there is only one solution: the allocated value must live exactly as long as its scope.
It means that the lifetime of GcScope and its allocated values is exactly the same. When two lifetimes are the same, Rust does not guarantee the order of the drops. The reason why this happens is that the drop implementations could try to access each other and since there's no ordering it would be unsafe (the value might already have been freed).
This is explained in the Drop Check chapter of the Rustonomicon.
In our case, the drop implementation of the garbage collector's state does not dereference the allocated values (quite the opposite, it frees their memory), so the Rust compiler is overly cautious in preventing us from implementing drop.
Fortunately, the Nomicon also explains how to work around this check for values with the same lifetime. The solution is to use the may_dangle attribute on the lifetime parameter of the Drop implementation.
This is an unstable attribute that requires enabling the generic_param_attrs and dropck_eyepatch features.
Concretely, my drop implementation became:
unsafe impl<#[may_dangle] 'gc> Drop for GcState<'gc> {
fn drop(&mut self) {
// Free all the values allocated in this scope
// Might require changes to make sure there's no use after free
}
}
And I added the following lines to lib.rs:
#![feature(generic_param_attrs)]
#![feature(dropck_eyepatch)]
You can read more about these features:
generic_param_attrs
may_dangle
I updated my library scoped-gc with the fix for this issue if you want to take a closer look at it.

Resources