Moved variable still borrowing after calling `drop`? - rust

fn main() {
    let mut x: Vec<&i32> = vec![];
    let a = 1;
    x.push(&a);
    drop(x);
    // x.len(); // error[E0382]: use of moved value: `x`
} // `a` dropped here while still borrowed
The compiler knows drop() drops x (as evident from the error in the commented-out code) but still thinks the variable is borrowing from a! This is unfair!
Should this be considered one of the numerous dupes of rust-lang/rust#6393 (which is now tracked by rust-lang/rfcs#811)? The discussion there, though, seems to be centered on making &mut self and &self coexist in a single block.

I can't give you a definite answer, but I'll try to explain a few things here. Let's start with clarifying something:
The compiler knows drop() drops x
This is not true. While there are a few "magic" things in the standard library that the compiler knows about, drop() is not such a lang item. In fact, you could implement drop() yourself and it's actually the easiest thing to do:
fn drop<T>(_: T) {}
The function just takes something by value (thus, it's moved into drop()) and since nothing happens inside of drop(), this value is dropped at the end of the scope, like in any other function. So: the compiler doesn't know x is dropped, it just knows x is moved.
As you might have noticed, the compiler error stays the same regardless of whether or not we add the drop() call. Right now, the compiler will only look at the scope of a variable when it comes to references. From Niko Matsakis' intro to NLL:
The way that the compiler currently works, assigning a reference into a variable means that its lifetime must be as large as the entire scope of that variable.
And in a later blog post of his:
In particular, today, once a lifetime must extend beyond the boundaries of a single statement [...], it must extend all the way till the end of the enclosing block.
This is exactly what happens here, so yes, your problem has to do with all this "lexical borrowing" stuff. From the current compiler's perspective, the lifetime of the expression &a needs to be at least as large as the scope of x. But this doesn't work: the reference would outlive a, because the scope of x is larger than the scope of a, as pointed out by the compiler:
= note: values in a scope are dropped in the opposite order they are created
And I guess you already know all that, but you can fix your example by swapping the lines let mut x ...; and let a ...;.
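To make the suggested fix concrete, here is a minimal sketch of the reordered example. The helper name borrowed_len is made up for illustration; the body is the code from the question with the two let lines swapped, so a is declared first and therefore outlives x.

```rust
/// Same code as in the question, but `a` is declared before `x`.
/// Values are dropped in reverse declaration order, so `x` can never
/// outlive the value it borrows from.
fn borrowed_len() -> usize {
    let a = 1;
    let mut x: Vec<&i32> = vec![];
    x.push(&a);
    let len = x.len();
    drop(x); // moves `x` into `drop`; `x` cannot be used afterwards
    len
}
```

Calling borrowed_len() returns 1. As far as I know, compilers with non-lexical lifetimes also accept the original ordering, so this reordering mainly matters for the older borrow checker discussed in this answer.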
I'm not sure whether or not this exact problem would be solved by any of the currently proposed solutions. But I hope that we will see soon enough, as all of this is being addressed as part of the Rust 2017 roadmap. A good place to read up on the updates is here (which also contains links to the five relevant blog posts of Niko).

The compiler knows drop() drops x (as evident from the error in the commented-out code)
The Rust compiler doesn't know anything about drop and what it does. It's just a library function, which could do anything it likes with the value since it now owns it.
The definition of drop, as the documentation points out, is literally just:
fn drop<T>(_x: T) { }
It works because the argument is moved into the function and is therefore automatically dropped by the compiler when the function returns.
If you create your own function, you will get exactly the same error message:
fn my_drop<T>(_x: T) { }
fn main() {
    let mut x: Vec<&i32> = vec![];
    let a = 1;
    x.push(&a);
    my_drop(x);
    x.len();
}
This is exactly what is meant in the documentation when it says drop "isn't magic".
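One way to convince yourself of this is to watch a destructor run. The sketch below is only an illustration: Noisy and dropped_after_my_drop are hypothetical names, and the Cell<bool> flag is just a simple way to observe the drop from the outside.

```rust
use std::cell::Cell;

/// Hypothetical helper type: sets the flag when it is dropped.
struct Noisy<'a>(&'a Cell<bool>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// A hand-written stand-in for `std::mem::drop`.
fn my_drop<T>(_x: T) {}

/// Returns whether the value was already dropped right after `my_drop`.
fn dropped_after_my_drop() -> bool {
    let flag = Cell::new(false);
    let value = Noisy(&flag);
    my_drop(value); // `value` is moved in and dropped when `my_drop` returns
    flag.get()
}
```

dropped_after_my_drop() returns true: the destructor has run by the time my_drop returns, without the compiler treating the function specially.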

Related

Does Rust automatically execute code using multithreads [duplicate]

Rust disallows this kind of code because it is unsafe:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    *ref_to_i_1 = 1;
    *ref_to_i_2 = 2;
}
How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be ok.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:
for (it: collection) {
if (predicate(*it)) {
collection.remove(it);
}
}
For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, it's also undefined behaviour according to the language specification, but the collections tend to throw ConcurrentModificationException - a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.
This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
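For comparison, here is a sketch of how the same task is usually written in safe Rust. The function name remove_matching and its signature are made up for this example; Vec::retain is the standard library method that does the filtering without a separate iterator being alive.

```rust
/// Removes every element for which `predicate` returns true.
fn remove_matching(mut collection: Vec<i32>, predicate: impl Fn(&i32) -> bool) -> Vec<i32> {
    // A direct translation of the loop above would not compile:
    //
    //     for item in &collection {
    //         if predicate(item) { collection.remove(0); } // error[E0502]
    //     }
    //
    // because the iterator holds an immutable borrow of `collection`.
    // `retain` performs the whole operation behind a single `&mut self`.
    collection.retain(|item| !predicate(item));
    collection
}
```

For example, remove_matching(vec![1, 2, 3, 4], |n| n % 2 == 0) yields vec![1, 3].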
An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
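As a rough sketch of that idea (not necessarily identical to the code on the linked page), a swap can be built on std::ptr::copy_nonoverlapping, Rust's memcpy-like primitive. The non-overlap requirement is satisfied because a and b are both &mut T and therefore cannot refer to the same memory.

```rust
use std::ptr;

/// Swaps two values through raw pointers.
/// Sound only because `a` and `b` are `&mut T` and thus never alias.
fn swap<T>(a: &mut T, b: &mut T) {
    let pa: *mut T = a;
    let pb: *mut T = b;
    unsafe {
        // Bitwise copy of `*a`; from here until the final write,
        // `*a` must be treated as logically moved out.
        let tmp: T = ptr::read(pa);
        // Like `memcpy`: undefined behavior if the ranges overlap,
        // which the `&mut` parameters rule out.
        ptr::copy_nonoverlapping(pb, pa, 1);
        // Move the saved value into `*b` without dropping the old bits.
        ptr::write(pb, tmp);
    }
}
```

With let mut x = 1 and let mut y = 2, calling swap(&mut x, &mut y) leaves x == 2 and y == 1. In real code you would use std::mem::swap, which is already safe.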
A more contrived example of reference aliasing is like this:
int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined
You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
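A rough Rust counterpart of f, written only for illustration, might look like the sketch below. Because the two parameters are &mut i32, the body can rely on them pointing at different integers.

```rust
/// Rust counterpart of the C function `f` above.
/// `x` and `y` are guaranteed to refer to different integers.
fn f(x: &mut i32, y: &mut i32) -> i32 {
    let sum = *x + *y;
    *x += 1;
    *y += 1;
    sum
}

// Usage with two distinct variables is fine:
//
//     let (mut a, mut b) = (3, 3);
//     f(&mut a, &mut b);
//
// whereas `f(&mut i, &mut i)` is rejected with error[E0499]:
// cannot borrow `i` as mutable more than once at a time.
```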
How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
I believe that although you trigger 'undefined behavior' by doing this, technically the noalias flag is not used by the Rust compiler for &mut references, so practically speaking, right now, you probably can't actually trigger undefined behavior this way. What you're triggering is 'implementation specific behavior', which is 'behaves like C++ according to LLVM'.
See Why does the Rust compiler not optimize code assuming that two mutable references cannot alias? for more information.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
Have a read of this series of blog articles about undefined behavior
In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.
To understand the issue here, consider when compiling in release mode the compiler may or may not reorder statements when optimizations are performed; that means that although your code may run in the linear sequence:
a; b; c;
There is no guarantee the compiler will execute them in that sequence when it runs, if (according to what the compiler knows), there is no logical reason that the statements must be performed in a specific atomic sequence. Part 3 of the blog I've linked to above demonstrates how this can cause undefined behavior.
tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to continue to make your program behave in a deterministic fashion if and only if your program does not trigger undefined behavior.
As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.
So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
Pertinent quote:
The C FAQ defines “undefined behavior” like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
Author's Note: The following answer was originally written for How do intertwined scopes create a "data race"?
The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.
The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    *r2 = 2;
    println!("{}", r1);
    println!("{}", r2);
}
The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:
// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code.
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    println!("{}", r1);
    *r2 = 2;
    println!("{}", r2);
}
It might even realize that the println!s always print the same value and take advantage of that fact to further rearrange foo:
fn foo(r1: &mut i32, r2: &mut i32) {
    println!("{}", 1);
    println!("{}", 2);
    *r1 = 1;
    *r2 = 2;
}
It's good that a compiler can do this optimization! (Even if Rust's currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?
You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid to do.
You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.
One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.
If you want shared mutability and are willing to forgo those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:
use std::cell::Cell;
fn main() {
    // No `mut` needed: `Cell` is mutated through shared references.
    let i = Cell::new(42);
    let ref_to_i_1 = &i;
    let ref_to_i_2 = &i;
    foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
    r1.set(1);
    r2.set(2);
    println!("{}", r1.get()); // prints 2, guaranteed
    println!("{}", r2.get()); // also prints 2
}
The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
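Cell is not the only type in std::cell. As a small sketch, RefCell moves the exclusivity check to run time: it hands out a mutable borrow only while no other borrow is active. The function name second_borrow_is_rejected is hypothetical; borrow_mut and try_borrow_mut are the standard RefCell methods.

```rust
use std::cell::RefCell;

/// Returns true if a second mutable borrow is refused
/// while the first one is still alive.
fn second_borrow_is_rejected() -> bool {
    let shared = RefCell::new(vec![1, 2, 3]);
    let first = shared.borrow_mut();
    // `try_borrow_mut` reports the conflict as an `Err` instead of panicking.
    let second = shared.try_borrow_mut();
    let rejected = second.is_err();
    drop(second);
    drop(first); // the first borrow ends here
    rejected
}
```

So even with interior mutability, two live mutable borrows of the same value are never handed out; the check simply happens at run time instead of compile time.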
Also read
The Problem With Single-threaded Shared Mutability describes how having multiple mutable references can cause soundness issues even in the absence of multiple threads and data races.
Dan Hulme's answer illustrates how aliased mutation of more complex data can also cause undefined behavior (even before compiler optimizations).
† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.
The simplest example I know of is trying to push into a Vec that's borrowed:
let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);
This is a compiler error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
--> src/main.rs:4:5
|
3 | let c = &v[0];
| - immutable borrow occurs here
4 | v.push('b');
| ^^^^^^^^^^^ mutable borrow occurs here
5 | dbg!(c);
| - immutable borrow later used here
It's good that this is a compiler error, because otherwise it could be a use-after-free. push may reallocate the Vec's heap storage, which would invalidate our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.
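One way to write this so that it compiles, sketched here under the assumption that a copy of the element is good enough, is to copy the char out instead of holding a reference across the push. The function name first_after_push is made up for the example.

```rust
/// Copies the first element out before pushing, so no borrow of `v`
/// is alive while `push` has `&mut` access to it.
fn first_after_push() -> char {
    let mut v = vec!['a'];
    let c = v[0]; // `char` is `Copy`: this is a copy, not a borrow
    v.push('b'); // free to reallocate; nothing points into the old buffer
    c
}
```

Alternatively, taking the reference after the push (let c = &v[0];) also works, because the mutable borrow made by push has already ended by then.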
Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}
let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
    MyEnum::Ptr(i) => i,
    MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);
Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:
error[E0506]: cannot assign to `my_enum` because it is borrowed
--> src/main.rs:12:1
|
8 | let my_int_ptr_ptr: &&i32 = match &my_enum {
| -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
| ---------------- borrow later used here
The term "aliasing" is typically used to identify situations where changing the order of operations involving different references would change the effect of those operations. If multiple references to an object are stored in different places, but the object is not modified during the lifetime of those references, compilers may usefully hoist, defer, or consolidate operations using those references without affecting program behavior.
For example, if a compiler sees that code reads the contents of an object referenced by x, then does something with an object referenced by y, and again reads the contents of the object referenced by x, and if the compiler knows that the action on y cannot have modified the object referenced by x, the compiler may consolidate both reads of x into a single read.
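In Rust terms, that situation could look like the sketch below (the function name sum_around_write is invented for illustration). Since x is a shared reference and y is a mutable one, they cannot refer to the same integer, so a compiler is allowed to reuse the first read of *x instead of loading it again. Whether a given compiler version actually does so is not something the code can show.

```rust
/// Reads `*x`, writes through `y`, then reads `*x` again.
fn sum_around_write(x: &i32, y: &mut i32) -> i32 {
    let before = *x;
    *y += 1; // cannot modify `*x`: `&i32` and `&mut i32` never alias
    let after = *x; // may legally be folded into the first read
    before + after
}
```

With x = 10 and y = 0, the call returns 20 and leaves y at 1, regardless of whether the second read is actually performed.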
Determining in all cases whether an action on one reference might affect another would be an intractable problem if programmers had unlimited freedom to use and store references however they saw fit. Rust, however, seeks to handle the two easy cases:
If an object will never be modified during the lifetime of a reference, machine code using the reference won't have to worry about what operations might change it during that lifetime, since it would be impossible for any operations to do so.
If during the lifetime of a reference, an object will only be modified by references that are visibly based upon that reference, machine code using that reference won't have to worry about whether any operations using that reference will interact with operations involving references that appear to be unrelated, because no seemingly-unrelated references will identify the same object.
Allowing for the possibility of aliasing between mutable references would make things much more complicated, since many optimizations that can be performed equally well with unshared references to mutable objects or shareable references to immutable ones could no longer be performed. Once a language supports situations where operations involving seemingly independent references need to be processed in precisely ordered fashion, it's hard for compilers to know when such precise sequencing is required.

Is it safe to temporarily give away ownership of the contents of a mutable borrow in Rust? [duplicate]

This question already has an answer here:
replace a value behind a mutable reference by moving and mapping the original
Is a function that modifies a &mut T in place by applying an FnOnce(T) -> T safe to have in Rust, or can it lead to undefined behavior? Is it included in the standard library somewhere, or in a well-known crate?
If you additionally assume T: Default, that looks like
fn modify<T, F: FnOnce(T) -> T>(x: &mut T, f: F)
where
    T: Default,
{
    let val = std::mem::take(x);
    let val = f(val);
    *x = val;
}
(See also https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f015812bac6f527fe663fe4e0b7a3188)
My question is about doing the same but dropping the where T: Default clause (and no T: Clone either). This requires a different implementation, since you can't use std::mem::take.
I'm not sure how to implement the unconstrained version, but it should be possible using unsafe Rust.
I'm learning Rust from a background of linear types and sub-structural logic. Rust's mutable borrow seems very similar to moving a resource in and then back out of a function, but I don't know if it is actually safe to take temporary ownership of the contents of a mutable borrow like this.
It is safe, and there are even crates for that (can't find them now).
HOWEVER.
When writing unsafe code, you have to be very careful. If you don't know exactly what you're doing, it can easily lead to UB.
Here, for example, there is something you maybe haven't thought of: panic safety.
Suppose we implement that trivially:
pub fn modify<T, F: FnOnce(T) -> T>(v: &mut T, f: F) {
    let prev = unsafe { std::ptr::read(v) };
    let new = f(prev);
    unsafe { std::ptr::write(v, new) };
}
Trivially right.
Or is it?
fn main() {
    struct MyStruct(pub i32);
    impl Drop for MyStruct {
        fn drop(&mut self) {
            println!("MyStruct({}) dropped", self.0);
        }
    }
    let mut v = MyStruct(123);
    std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        modify(&mut v, |_prev| {
            // `_prev` is dropped here.
            panic!("Haha, evil panic!");
        })
    }))
    .unwrap_err();
    v.0 = 456; // Writing to uninitialized memory!
    // `v` is dropped here, double drop!
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6f7312a8be70cd43cf5cf7a9816be56a
I used a custom type whose destructor does nothing but print, but imagine what could happen if this was a Vec that freed its memory and we were writing into freed memory (and then, as a bonus, got a double free).
It is correct, as @Kendas said, that when there are no interruption points it is valid to leave memory in an uninitialized state in Rust. The problem is that many more places than you would wish are actually interruption points. In fact, when writing unsafe code, you have to consider any call to external code (i.e. code that is neither yours nor code you trust not to do bad things, such as std) to be an interruption point.
Unsafe code is hard. Better stay in the safe land.
Edit: You may wonder what the AssertUnwindSafe is. Maybe you even tried to remove it and noticed it doesn't compile. Well, UnwindSafe is a protection against this, and AssertUnwindSafe is a way to bypass the protection.
You may ask, what's the point? The point is, this protection is really not accurate; so inaccurate, in fact, that bypassing it does not even require unsafe. But it still exists, so we have a lower chance of accidental UB.
It doesn't matter to you as the writer of the API - you should act like this protection doesn't exist, because it is safe to bypass it and easy to do so by mistake. The Rust standard library itself had bugs like that in the past (#86443, #81740, ... - It is not an accident that they're both in the same code - those issues tend to appear in chunks. But there're more).
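For completeness, here is one commonly used way to make the unconstrained version panic-safe: abort the process if the closure panics, so that nobody can ever observe the moved-out value. This is only a sketch under that assumption; AbortOnPanic is a hypothetical helper type, and I believe crates such as take_mut and replace_with are built around a similar idea.

```rust
use std::ptr;

/// Hypothetical guard: if it is dropped (which only happens here when
/// the closure unwinds), the whole process is aborted.
struct AbortOnPanic;

impl Drop for AbortOnPanic {
    fn drop(&mut self) {
        std::process::abort();
    }
}

pub fn modify<T, F: FnOnce(T) -> T>(v: &mut T, f: F) {
    let guard = AbortOnPanic;
    unsafe {
        // `*v` is logically moved out from here on.
        let prev = ptr::read(v);
        // If `f` panics, unwinding drops `guard` and aborts before
        // anyone can touch the now-invalid `*v`.
        let new = f(prev);
        ptr::write(v, new);
    }
    // Normal path: `*v` is valid again, so defuse the guard.
    std::mem::forget(guard);
}
```

The trade-off is that a panic inside the closure can no longer be caught; the program simply terminates. Usage looks the same as before, e.g. modify(&mut s, |mut t| { t.push('c'); t }) for a String s.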
Well, you are replacing the contents of the borrowed memory location with the default value. This would mean that the memory is indeed correct at every point. So there should not be any undefined behavior.
Basically from the perspective of the mutable reference x, you are mutating it to the default value, then mutating it again to a different new value.
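To illustrate why the T: Default version stays well-defined even when the closure panics, here is a small sketch. The helper value_after_panic is made up for the demonstration; modify is a compact form of the function from the question.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Compact form of the `T: Default` version from the question.
fn modify<T: Default, F: FnOnce(T) -> T>(x: &mut T, f: F) {
    let val = std::mem::take(x); // `*x` now holds `T::default()`
    *x = f(val);
}

/// Lets the closure panic and returns what is left in the variable.
fn value_after_panic() -> String {
    let mut s = String::from("hello");
    let result = catch_unwind(AssertUnwindSafe(|| {
        modify(&mut s, |_old| panic!("closure panicked"));
    }));
    assert!(result.is_err());
    s // still a valid `String`: the default (empty) one
}
```

value_after_panic() returns an empty string. The original contents are lost, but the variable never holds an invalid value, which is exactly what the unsafe ptr::read version cannot guarantee.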
In general, if there is a chance of undefined behavior, you will need to use the unsafe keyword. Or somebody has made a mistake while using the unsafe keyword further down the stack. It is relatively rare for these things to happen in the standard library.
Go ahead and look at the safety remarks in the code if you must: https://doc.rust-lang.org/src/core/mem/mod.rs.html#756

What is the logic behind freezing mutable borrowed variables in Rust? [duplicate]

Rust disallows this kind of code because it is unsafe:
fn main() {
let mut i = 42;
let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
*ref_to_i_1 = 1;
*ref_to_i_2 = 2;
}
How can I do do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be ok.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:
for (it: collection) {
if (predicate(*it)) {
collection.remove(it);
}
}
For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, it's also undefined behaviour according to the language specification, but the collections tend to throw ConcurrentModificationException - a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.
This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
A more contrived example of reference aliasing is like this:
int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined
You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
How can I do do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
I believe that although you trigger 'undefined behavior' by doing this, technically the noalias flag is not used by the Rust compiler for &mut references, so practically speaking, right now, you probably can't actually trigger undefined behavior this way. What you're triggering is 'implementation specific behavior', which is 'behaves like C++ according to LLVM'.
See Why does the Rust compiler not optimize code assuming that two mutable references cannot alias? for more information.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
Have a read of this series of blog articles about undefined behavior
In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.
To understand the issue here, consider when compiling in release mode the compiler may or may not reorder statements when optimizations are performed; that means that although your code may run in the linear sequence:
a; b; c;
There is no guarantee the compiler will execute them in that sequence when it runs, if (according to what the compiler knows), there is no logical reason that the statements must be performed in a specific atomic sequence. Part 3 of the blog I've linked to above demonstrates how this can cause undefined behavior.
tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to continue to make your program behave in a deterministic fashion if and only if your program does not trigger undefined behavior.
As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.
So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
Pertinent quote:
The C FAQ defines “undefined behavior” like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
Author's Note: The following answer was originally written for How do intertwined scopes create a "data race"?
The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.
The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:
fn main() {
let mut i = 42;
let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &mut i32, r2: &mut i32) {
*r1 = 1;
*r2 = 2;
println!("{}", r1);
println!("{}", r2);
}
The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:
// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code.
fn foo(r1: &mut i32, r2: &mut i32) {
*r1 = 1;
println!("{}", r1);
*r2 = 2;
println!("{}", r2);
}
It might even realize that the println!s always print the same value and take advantage of that fact to further rearrange foo:
fn foo(r1: &mut i32, r2: &mut i32) {
println!("{}", 1);
println!("{}", 2);
*r1 = 1;
*r2 = 2;
}
It's good that a compiler can do this optimization! (Even if Rust's currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?
You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid to do.
You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.
One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.
If you want shared mutability and to forego those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:
fn main() {
let mut i = std::cell::Cell::new(42);
let ref_to_i_1 = &i;
let ref_to_i_2 = &i;
foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
r1.set(1);
r2.set(2);
println!("{}", r1.get()); // prints 2, guaranteed
println!("{}", r2.get()); // also prints 2
}
The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
Also read
The Problem With Single-threaded Shared Mutability describes how having multiple mutable references can cause soundness issues even in the absence of multiple threads and data races.
Dan Hulme's answer illustrates how aliased mutation of more complex data can also cause undefined behavior (even before compiler optimizations).
† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.
The simplest example I know of is trying to push into a Vec that's borrowed:
let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);
This is a compiler error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
--> src/main.rs:4:5
|
3 | let c = &v[0];
| - immutable borrow occurs here
4 | v.push('b');
| ^^^^^^^^^^^ mutable borrow occurs here
5 | dbg!(c);
| - immutable borrow later used here
It's good that this is a compiler error, because otherwise it would be a use-after-free. push reallocates the Vec's heap storage and invalidates our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.
Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
enum MyEnum<'a> {
Ptr(&'a i32),
Usize(usize),
}
let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
MyEnum::Ptr(i) => i,
MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);
Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:
error[E0506]: cannot assign to `my_enum` because it is borrowed
--> src/main.rs:12:1
|
8 | let my_int_ptr_ptr: &&i32 = match &my_enum {
| -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
| ---------------- borrow later used here
The term "aliasing" is typically used to identify situations where changing the order of operations involving different references would change the effect of those operations. If multiple references to an object are stored in different places, but the object is not modified during the lifetime of those references, compilers may usefully hoist, defer, or consolidate operations using those references without affecting program behavior.
For example, if a compiler sees that code reads the contents of an object referenced by x, then does something with an object referenced by y, and again reads the contents of the object referenced by x, and if the compiler knows that the action on y cannot have modified the object referenced by x, the compiler may consolidate both reads of x into a single read.
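A small Rust sketch of that situation, with a made-up function name. Because y is a &mut i32, it cannot refer to the same object as x, so the compiler is free to reuse the first read of *x instead of loading it again. Whether it actually does so depends on the optimizer.

```rust
/// Hypothetical example: reads `*x`, writes through `y`, reads `*x` again.
fn read_write_read(x: &i32, y: &mut i32) -> i32 {
    let first = *x;
    // `y` is an exclusive reference, so this write cannot change `*x`.
    *y += 1;
    // The compiler may therefore consolidate this read with `first`.
    let second = *x;
    first + second
}

fn main() {
    let a = 10;
    let mut b = 0;
    println!("{}", read_write_read(&a, &mut b));
    // Passing `&mut a` for `y` while `&a` is alive would be rejected at compile time.
}
```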
Determining in all cases whether an action on one reference might affect another would be an intractable problem if programmers had unlimited freedom to use and store references however they saw fit. Rust, however, seeks to handle the two easy cases:
If an object will never be modified during the lifetime of a reference, machine code using the reference won't have to worry about what operations might change it during that lifetime, since it would be impossible for any operations to do so.
If during the lifetime of a reference, an object will only be modified by references that are visibly based upon that reference, machine code using that reference won't have to worry about whether any operations using that reference will interact with operations involving references that appear to be unrelated, because no seemingly-unrelated references will identify the same object.
Allowing for the possibility of aliasing between mutable references would make things much more complicated, since many optimizations that can be applied equally well to unshared references to mutable objects and to shareable references to immutable ones could no longer be applied. Once a language supports situations where operations involving seemingly independent references need to be processed in a precisely ordered fashion, it's hard for compilers to know when such precise sequencing is required.

Why is the "move" keyword necessary when it comes to threads; why would I ever not want that behavior?

For example (taken from the Rust docs):
let v = vec![1, 2, 3];
let handle = thread::spawn(move || {
println!("Here's a vector: {:?}", v);
});
This is not a question about what move does, but about why it is necessary to specify.
In cases where you want the closure to take ownership of an outside value, would there ever be a reason not to use the move keyword? If move is always required in these cases, is there any reason why the presence of move couldn't just be implied/omitted? For example:
let v = vec![1, 2, 3];
let handle = thread::spawn(/* move is implied here */ || {
// Compiler recognizes that `v` exists outside of this closure's
// scope and does black magic to make sure the closure takes
// ownership of `v`.
println!("Here's a vector: {:?}", v);
});
The above example gives the following compile error:
closure may outlive the current function, but it borrows `v`, which is owned by the current function
When the error magically goes away simply by adding move, I can't help but wonder to myself: why would I ever not want that behavior?
I'm not suggesting anything is wrong with the required syntax. I'm just trying to gain a deeper understanding of move from people who understand Rust better than I do. :)
It's all about lifetime annotations, and a design decision Rust made long ago.
See, the reason why your thread::spawn example fails to compile is because it expects a 'static closure. Since the new thread can run longer than the code that spawned it, we have to make sure that any captured data stays alive after the caller returns. The solution, as you pointed out, is to pass ownership of the data with move.
But the 'static constraint is a lifetime annotation, and a fundamental principle of Rust is that lifetime annotations never affect run-time behavior. In other words, lifetime annotations are only there to convince the compiler that the code is correct; they can't change what the code does.
If Rust inferred the move keyword based on whether the callee expects 'static, then changing the lifetimes in thread::spawn may change when the captured data is dropped. This means that a lifetime annotation is affecting runtime behavior, which is against this fundamental principle. We can't break this rule, so the move keyword stays.
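To make "when the captured data is dropped" concrete, here is a sketch that records drop order with and without move. The Noisy type, its name_len method and the drop_order function are all hypothetical names invented for this illustration.

```rust
use std::cell::RefCell;

/// Hypothetical type that records its own destruction in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Noisy<'_> {
    fn name_len(&self) -> usize {
        self.name.len()
    }
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Runs two closures and returns the order in which things were dropped.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let borrowed = Noisy { name: "borrowed", log: &log };
        let moved = Noisy { name: "moved", log: &log };
        {
            // No `move`: the closure only borrows `borrowed`.
            let by_ref = || borrowed.name_len();
            // `move`: the closure takes ownership of `moved`.
            let by_value = move || moved.name_len();
            by_ref();
            by_value();
            // `by_value` dies at this brace and takes `moved` with it.
        }
        log.borrow_mut().push("closures gone".to_string());
        // `borrowed` is only dropped here, with its own scope.
    }
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

The value captured with move is destroyed together with the closure, while the borrowed one lives on until its own scope ends. If move were inferred from a lifetime bound, tweaking that bound could silently switch between these two behaviors.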
Addendum: Why are lifetime annotations erased?
To give us the freedom to change how lifetime inference works, which allows for improvements like non-lexical lifetimes (NLL).
So that alternative Rust implementations like mrustc can save effort by ignoring lifetimes.
Much of the compiler assumes that lifetimes work this way, so to make it otherwise would take a huge effort with dubious gain. (See this article by Aaron Turon; it's about specialization, not closures, but its points apply just as well.)
There are actually a few things in play here. To help answer your question, we must first understand why move exists.
Rust has 3 types of closures:
FnOnce, a closure that consumes its captured variables (and hence can only be called once),
FnMut, a closure that mutably borrows its captured variables, and
Fn, a closure that immutably borrows its captured variables.
When you create a closure, Rust infers which trait to use based on how the closure uses the values from the environment. The manner in which a closure captures its environment depends on its type. A FnOnce captures by value (which may be a move, or a copy if the type is Copy), a FnMut mutably borrows, and a Fn immutably borrows. However, if you use the move keyword when declaring a closure, it will always "capture by value", or take ownership of the environment before capturing it. Thus, the move keyword is irrelevant for FnOnces, but it changes how Fns and FnMuts capture data.
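A rough sketch of the three cases, using made-up helper functions (call_fn, call_fn_mut, call_fn_once, demo) that each demand one of the traits. Note how the first two captured variables remain usable afterwards, while the third has been moved into its closure.

```rust
// Hypothetical helpers, each requiring a different closure trait.
fn call_fn<F: Fn() -> usize>(f: F) -> usize {
    f()
}
fn call_fn_mut<F: FnMut()>(mut f: F) {
    f()
}
fn call_fn_once<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f()
}

fn demo() -> (usize, Vec<i32>, Vec<i32>) {
    let a = vec![1, 2, 3];
    let mut b = vec![1];
    let c = vec![7];

    // Only reads `a`, so `a` is captured by shared reference.
    let len = call_fn(|| a.len());
    // Mutates `b`, so `b` is captured by mutable reference.
    call_fn_mut(|| b.push(2));
    // Returns `c`, which consumes it, so `c` is captured by value.
    let c_back = call_fn_once(|| c);

    // `a` and `b` are still usable here; `c` is not.
    (len + a.len(), b, c_back)
}

fn main() {
    println!("{:?}", demo());
}
```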
Coming to your example, Rust infers the type of the closure to be a Fn, because println! only requires a reference to the value(s) it is printing (the Rust book page you linked talks about this when explaining the error without move). The closure thus attempts to borrow v, and the standard lifetime rules apply. Since thread::spawn requires that the closure passed to it have a 'static lifetime, the captured environment must also have a 'static lifetime, which a borrow of the local v cannot satisfy, causing the error. You must thus explicitly specify that you want the closure to take ownership of v.
This can be further exemplified by changing the closure to something that the compiler would infer to be a FnOnce -- || v, as a simple example. Since the compiler infers that the closure is a FnOnce, it captures v by value by default, and the line let handle = thread::spawn(|| v); compiles without requiring the move.
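Both halves of this can be sketched in a few lines. The function names are invented. The second function assumes std::thread::scope is available (Rust 1.63 or later) and shows a case where you would not want move: a scoped thread is joined before the scope returns, so borrowing v is allowed and the caller keeps ownership.

```rust
use std::thread;

/// The closure returns `v`, so it captures `v` by value without `move`.
fn spawn_without_move() -> Vec<i32> {
    let v = vec![1, 2, 3];
    let handle = thread::spawn(|| v);
    handle.join().unwrap()
}

/// A scoped thread may borrow `v`, because it cannot outlive the scope.
fn scoped_borrow() -> usize {
    let v = vec![1, 2, 3];
    let len = thread::scope(|s| s.spawn(|| v.len()).join().unwrap());
    // `v` was only borrowed, so it is still owned and usable here.
    len + v.len()
}

fn main() {
    println!("{:?}", spawn_without_move());
    println!("{}", scoped_borrow());
}
```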
The existing answers have great information, which led me to an understanding that is easier for me to think about, and hopefully easier for other Rust newcomers to get.
Consider this simple Rust program:
fn print_vec (v: &Vec<u32>) {
println!("Here's a vector: {:?}", v);
}
fn main() {
let mut v: Vec<u32> = vec![1, 2, 3];
print_vec(&v); // `print_vec()` borrows `v`
v.push(4);
}
Now, asking why the move keyword can't be implied is like asking why the "&" in print_vec(&v) can't also be implied.
Rust’s central feature is ownership. You can't just tell the compiler, "Hey, here's a bunch of code I wrote, now please discern perfectly everywhere I intend to reference, borrow, copy, move, etc. Kthnxsbye!" Symbols and keywords like & and move are a necessary and integral part of the language.
In hindsight, this seems really obvious, and makes my question seem a little silly!

Why do immutable references to copy types in rust exist?

So I just started learning rust (first few chapters of "the book") and am obviously quite a noob. I finished the ownership-basics chapter (4) and wrote some test programs to make sure I understood everything. I seem to have the basics down but I asked myself why immutable references to copy-types are even possible. I will try to explain my thoughts with examples.
I thought that you maybe want to store a reference to a copy-type so you can check its value later instead of having a copy of the old value, but this can't be it, since the underlying value can't be changed as long as it's borrowed.
The most basic example of this would be this code:
let mut x = 10; // push i32
let x_ref = &x; // push immutable reference to x
// x = 100; change x which is disallowed since it's borrowed currently
println!("{}", x_ref); // do something with the reference since you want the current value of x
The only reason for this I can currently think of (with my current knowledge) is that they just exist so you can call generic methods which require references (like cmp) with them.
This code demonstrates this:
let x = 10; // push i32
// let ordering = 10.cmp(x); try to compare it but you can't since cmp wants a reference
let ordering = 10.cmp(&x); // this works since it's now a reference
So, is that the only reason you can create immutable references to copy-types?
Disclaimer:
I don't see "Just continue reading the book" as a valid answer. However, I fully understand if you say something like "Yes, you need those for this and this use case (optional example); it will be covered in chapter X". I hope you understand what I mean :)
EDIT:
Maybe worth mentioning, I'm a C# programmer and not new to programming itself.
EDIT 2:
I don't know if this is technically a duplicate of this question but I do not fully understand the question and the answer so I hope for a more simple answer understandable by a real noob.
An immutable reference to a Copy-type is still "an immutable reference". The code that gets passed the reference can't change the original value. It can make a (hopefully) trivial copy of that value, but it can still only ever change that copy after doing so.
That is, the original owner of the value is ensured that - while receivers of the reference may decide to make a copy and change that - the state of whatever is referenced can't ever change. If the receiver wants to change the value, it can feel free; nobody else is going to see it, though.
Immutable references to primitives are no different. Primitives are Copy everywhere, yet you probably already have an intuition for what "an immutable reference" means semantically for them. For instance:
fn print_the_age(age: &i32) { ... }
That function could make a copy via *age and change it. But the caller will not see that change and it does not make much sense to do so in the first place.
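A minimal sketch of that point, with a made-up function name: the callee copies the value out of the reference and changes its copy, and the caller's value is untouched.

```rust
/// Hypothetical function: works on a private copy of the referenced value.
fn age_next_year(age: &i32) -> i32 {
    // `i32` is `Copy`, so dereferencing copies the value out.
    let mut copy = *age;
    // Only the local copy changes; `*age += 1` would not compile.
    copy += 1;
    copy
}

fn main() {
    let age = 41;
    let next = age_next_year(&age);
    println!("{} {}", age, next);
}
```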
Update due to comment: There is no advantage per se, at least as far as primitives are concerned (larger types may be costly to copy). It does boil down to the semantic relationship between the owner of the i32 and the receiver: "Here is a reference, it is guaranteed to not change while you have that reference, I - the owner - can't change, move or deallocate it, and no thread, including my own, could possibly do that".
Consider where the reference is coming from: If you receive an &i32, wherever it is coming from can't change and can't deallocate. The i32 may be part of a larger type, which - due to handing out a reference - can't move, change or get de-allocated; the receiver is guaranteed of that. It's hard to say there is an advantage per se in here; it might be advantageous to communicate more detailed type (and lifetime!) relationships this way.
They're very useful, because they can be passed to generic functions that expect a reference:
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
If immutable references to Copy types were forbidden, we would need two versions (T: !Copy is hypothetical syntax here; Rust has no such negative bound):
fn map_vec_own<T: !Copy, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
fn map_vec_copy<T: Copy, U>(v: &Vec<T>, f: impl Fn( T) -> U) -> Vec<U> {...}
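As a sketch, the single generic version can be filled in with an iterator chain (the body is my assumption; the original elides it) and then called with both a Copy and a non-Copy element type:

```rust
/// One possible body for the generic signature above.
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {
    v.iter().map(f).collect()
}

fn main() {
    // `i32` is `Copy`; the closure still receives `&i32`.
    let doubled = map_vec(&vec![1, 2, 3], |n| n * 2);
    // `String` is not `Copy`; the same function works unchanged.
    let lengths = map_vec(&vec![String::from("ab"), String::from("cde")], |s| s.len());
    println!("{:?} {:?}", doubled, lengths);
}
```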
Immutable references are, naturally, used to provide access to the referenced data. For instance, you could have loaded a dictionary and have multiple threads reading from it at the same time, each using their own immutable reference. Because the references are immutable those threads will not corrupt that common data.
Using only mutable references, you can't be sure of that so you need to make full copies. Copying data takes time and space, which are always limited. The primary question for performance tends to be if your data fits in CPU cache.
I'm guessing you were thinking of "copy" types as ones that fit in the same space as the reference itself, i.e. sizeof(type) <= sizeof(type*). Rust's Copy trait indicates data that could be safely copied, no matter the size. These are orthogonal concepts; for instance, a pointer might not be safely copied without adjusting a reference count, or an array might be copyable but take gigabytes of memory. This is why Rc<T> has the Clone trait, not Copy.
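A short sketch of both points, using invented helper names. A one-megabyte array is Copy even though it is far larger than a reference to it, and cloning an Rc has to bump the reference count, which is why a plain bitwise copy would not be safe.

```rust
use std::mem::size_of;
use std::rc::Rc;

// Compiles only if `T` is `Copy`; used purely as a compile-time check.
fn assert_copy<T: Copy>() {}

/// Size in bytes of a large, but still `Copy`, array type.
fn big_array_size() -> usize {
    assert_copy::<[u8; 1 << 20]>();
    size_of::<[u8; 1 << 20]>()
}

/// Strong count of an `Rc` after one explicit clone.
fn rc_count_after_clone() -> usize {
    let a = Rc::new(5);
    // `Rc` is `Clone` but not `Copy`: cloning increments the count.
    let _b = Rc::clone(&a);
    Rc::strong_count(&a)
}

fn main() {
    println!("array: {} bytes", big_array_size());
    println!("reference: {} bytes", size_of::<&[u8; 1 << 20]>());
    println!("strong count: {}", rc_count_after_clone());
}
```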

Resources