Does Rust automatically execute code using multithreads [duplicate] - multithreading

Rust disallows this kind of code because it is unsafe:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    *ref_to_i_1 = 1;
    *ref_to_i_2 = 2;
}
How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be ok.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?

A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:
for (it: collection) {
    if (predicate(*it)) {
        collection.remove(it);
    }
}
For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. For most Java standard collections, the behaviour is left unspecified by the collection documentation, but the iterators tend to throw ConcurrentModificationException on a best-effort basis, a check that has a runtime cost even when your code is correct. Neither language can detect the error during compilation.
This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
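As a rough sketch of what the safe version can look like in Rust (the helper name remove_matching is made up for illustration), Vec::retain filters the collection without the caller ever holding an iterator while mutating it:

```rust
/// Hypothetical helper: removes every element for which `predicate` is true.
/// `Vec::retain` keeps the elements for which the closure returns true,
/// so the predicate is negated.
fn remove_matching(collection: &mut Vec<i32>, predicate: impl Fn(&i32) -> bool) {
    collection.retain(|item| !predicate(item));
}

fn main() {
    let mut numbers = vec![1, 2, 3, 4, 5, 6];
    // Remove the even numbers in place.
    remove_matching(&mut numbers, |n| n % 2 == 0);
    assert_eq!(numbers, [1, 3, 5]);
}
```

Writing the loop by hand, with a for loop over &numbers that calls numbers.remove(..) in its body, is rejected by the borrow checker for the reason given above.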
An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
A more contrived example of reference aliasing is like this:
int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined
You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
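For comparison, here is a minimal sketch of a Rust counterpart of f (my own translation, not taken from the linked material). It works with two distinct variables, and the aliased call is left commented out because the compiler rejects it:

```rust
/// Returns the sum of the two values, then increments both,
/// loosely mirroring `(*x)++ + (*y)++` from the C example.
fn f(x: &mut i32, y: &mut i32) -> i32 {
    let sum = *x + *y;
    *x += 1;
    *y += 1;
    sum
}

fn main() {
    let mut i = 3;
    let mut j = 4;
    assert_eq!(f(&mut i, &mut j), 7);
    assert_eq!((i, j), (4, 5));

    // Uncommenting the next line fails to compile with error E0499,
    // because `i` would be mutably borrowed twice at the same time:
    // f(&mut i, &mut i);
}
```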

How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
I believe that although you trigger 'undefined behavior' by doing this, at the time of writing the Rust compiler does not emit the noalias attribute for &mut references, so practically speaking, right now, you probably can't actually observe a miscompilation this way. What you're triggering is 'implementation-specific behavior', which is 'behaves like C++ according to LLVM'.
See Why does the Rust compiler not optimize code assuming that two mutable references cannot alias? for more information.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
Have a read of this series of blog articles about undefined behavior
In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.
To understand the issue here, consider that when compiling in release mode, the compiler may reorder statements as part of its optimizations. That means that although your code is written as the linear sequence:
a; b; c;
there is no guarantee that the compiled program will execute them in that order if, according to what the compiler knows, there is no logical reason why the statements must run in that specific sequence. Part 3 of the blog series I've linked to above demonstrates how this can cause undefined behavior.
tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to continue to make your program behave in a deterministic fashion if and only if your program does not trigger undefined behavior.
As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.
So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
Pertinent quote:
The C FAQ defines “undefined behavior” like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Author's Note: The following answer was originally written for How do intertwined scopes create a "data race"?
The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.
The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    *r2 = 2;
    println!("{}", r1);
    println!("{}", r2);
}
The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:
// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code.
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    println!("{}", r1);
    *r2 = 2;
    println!("{}", r2);
}
It might even realize that the println!s always print the same value and take advantage of that fact to further rearrange foo:
fn foo(r1: &mut i32, r2: &mut i32) {
    println!("{}", 1);
    println!("{}", 2);
    *r1 = 1;
    *r2 = 2;
}
It's good that a compiler can do this optimization! (Even if Rust's currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?
You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid to do.
You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.
One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.
If you want shared mutability and to forego those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:
use std::cell::Cell;
fn main() {
    // No `mut` needed: Cell allows mutation through a shared reference.
    let i = Cell::new(42);
    let ref_to_i_1 = &i;
    let ref_to_i_2 = &i;
    foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
    r1.set(1);
    r2.set(2);
    println!("{}", r1.get()); // prints 2, guaranteed
    println!("{}", r2.get()); // also prints 2
}
The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
Also read
The Problem With Single-threaded Shared Mutability describes how having multiple mutable references can cause soundness issues even in the absence of multiple threads and data races.
Dan Hulme's answer illustrates how aliased mutation of more complex data can also cause undefined behavior (even before compiler optimizations).
† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.

The simplest example I know of is trying to push into a Vec that's borrowed:
let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);
This is a compiler error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:5
  |
3 |     let c = &v[0];
  |              - immutable borrow occurs here
4 |     v.push('b');
  |     ^^^^^^^^^^^ mutable borrow occurs here
5 |     dbg!(c);
  |          - immutable borrow later used here
It's good that this is a compiler error, because otherwise it would be a use-after-free. push may reallocate the Vec's heap storage, which would invalidate our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.
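One way to make the snippet compile, sketched here with a hypothetical helper first_then_push, is to copy the element out instead of keeping a reference to it, or to take the reference only after the push:

```rust
/// Hypothetical helper: reads the first element by value, then pushes.
/// `char` is `Copy`, so no borrow of `v` is still alive when `push` runs.
fn first_then_push(v: &mut Vec<char>, next: char) -> char {
    let c = v[0];
    v.push(next);
    c
}

fn main() {
    let mut v = vec!['a'];
    let c = first_then_push(&mut v, 'b');
    assert_eq!(c, 'a');

    // Borrowing after the mutation is also fine: the reference points
    // into whatever storage the Vec uses now.
    let c_ref = &v[0];
    assert_eq!(*c_ref, 'a');
    assert_eq!(v, ['a', 'b']);
}
```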
Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}
let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
    MyEnum::Ptr(i) => i,
    MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);
Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:
error[E0506]: cannot assign to `my_enum` because it is borrowed
  --> src/main.rs:12:1
   |
8  | let my_int_ptr_ptr: &&i32 = match &my_enum {
   |                                   -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
   |      ---------------- borrow later used here

The term "aliasing" is typically used to identify situations where changing the order of operations involving different references would change the effect of those operations. If multiple references to an object are stored in different places, but the object is not modified during the lifetime of those references, compilers may usefully hoist, defer, or consolidate operations using those references without affecting program behavior.
For example, if a compiler sees that code reads the contents of an object referenced by x, then does something with an object referenced by y, and again reads the contents of the object referenced by x, and if the compiler knows that the action on y cannot have modified the object referenced by x, the compiler may consolidate both reads of x into a single read.
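A minimal Rust sketch of that situation, using a hypothetical function read_twice: because x is a shared reference and y is a mutable one, the write through y cannot change *x, so a compiler is allowed to reuse the first read. Whether a particular compiler version actually does so is not something this example demonstrates.

```rust
/// Reads `*x`, writes through `y`, then reads `*x` again.
/// Under Rust's rules `y` cannot point at the same object as `x`,
/// so the second read may be consolidated with the first.
fn read_twice(x: &i32, y: &mut i32) -> i32 {
    let first = *x;
    *y += 1;
    let second = *x; // always equal to `first` in safe Rust
    first + second
}

fn main() {
    let a = 10;
    let mut b = 0;
    assert_eq!(read_twice(&a, &mut b), 20);
    assert_eq!(b, 1);
}
```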
Determining in all cases whether an action on one reference might affect another would be an intractable problem if programmers had unlimited freedom to use and store references however they saw fit. Rust, however, seeks to handle the two easy cases:
If an object will never be modified during the lifetime of a reference, machine code using the reference won't have to worry about what operations might change it during that lifetime, since it would be impossible for any operations to do so.
If during the lifetime of a reference, an object will only be modified by references that are visibly based upon that reference, machine code using that reference won't have to worry about whether any operations using that reference will interact with operations involving references that appear to be unrelated, because no seemingly-unrelated references will identify the same object.
Allowing for the possibility of aliasing between mutable references would make things much more complicated, since many optimizations that can be applied equally well to unshared references to mutable objects or to shareable references to immutable ones could no longer be performed. Once a language supports situations where operations involving seemingly independent references need to be processed in precisely ordered fashion, it's hard for compilers to know when such precise sequencing is required.

Related

Is it possible to borrow different parts of self with different mutability statically?

I am designing a struct with a prepare-commit pattern in Rust, which mostly looks like this after simplification:
struct Data {
    a: usize,
}
struct Adder<'a> {
    data: &'a mut Data,
    added: usize,
}
struct Holder<'a> {
    data: &'a Data,
}
impl<'a> Adder<'a> {
    fn add_once(&mut self) {
        // Needs added to be mut, but not data
        self.added = self.added + self.data.a;
    }
    fn commit(&mut self) {
        // Needs data to be mut, but not added
        self.data.a = self.added;
    }
}
fn main() {
    let mut d = Data { a: 1 };
    let h = Holder { data: &d };
    let mut ad = Adder {
        data: &mut d,
        added: 0,
    };
    ad.add_once();
    ad.add_once();
    println!("{}", h.data.a); // Cannot borrow here
    ad.commit();
    println!("{}", ad.data.a);
}
This snippet does not compile. The problem is that since Adder borrows data mutably, no other immutable borrow (e.g. Holder here) can stay alive once the Adder struct has been created. However, the add_once calls do not really need to borrow data mutably; only commit does. Ideally there would be some way in Rust to let the compiler know that add_once only needs added to be mutable, not data, so that Holder could live past the add_once calls but not past the commit call.
I do know two possible current solutions to this, however I think they both have drawbacks:
I could wrap the fields of Adder in RefCell and borrow the entire Adder immutably, borrowing mutably only when needed. The problem with this approach is that RefCell checks borrowing dynamically, which should not be necessary here: all lifetime bounds should be known at compile time, and it is just that different parts of the struct have different bounds.
I could use unsafe blocks to convert references to raw pointers and do whatever I want. However, as far as I can tell, there is no way to do that without disabling all borrow checks altogether (I don't really know how to write unsafe Rust properly). And I want a borrow check, so that if Holder lives past commit the compiler still gives a proper borrow error.
So, a summary of my questions:
Is there a static and safe way in Rust to borrow different parts of self, with different mutability?
If not, is there unsafe boilerplate to do so, still ensuring the proper borrow check as described above? (and is there an existing crate for this)
What is the best practice in general for these situations?
Is there any RFC addressing this issue at the Rust language level?
Is there a static and safe way in Rust to borrow different parts of self, with different mutability?
No.
If not, is there unsafe boilerplate to do so, still ensuring the proper borrow check as described above? (and is there an existing crate for this)
You can use unsafe for that, but it will require you to work with raw pointers only. You may be able to encapsulate it in a safe API, but it's unlikely that you'll have proper automatic verification inside the code. For this reason I also don't believe there is a crate for that; at least I don't know of one.
What is the best practice in general for these situations?
It depends.
I would generally use RefCell for that (or other interior mutability primitives such as Cell; note that if you can use Cell it is actually zero-cost), and use unsafe code only if profiling shows a bottleneck. However, this kind of issue often arises from improper API design, so first I would see whether I can redesign my API so that the problem does not exist.
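As one possible illustration of such a redesign (a sketch only; the type Pending and its method signatures are my own invention, not something from the question), the pending state can be kept separate from the data, so that each method borrows the data only for the duration of the call and only as strongly as it needs:

```rust
struct Data {
    a: usize,
}

struct Holder<'a> {
    data: &'a Data,
}

/// Hypothetical replacement for `Adder`: it owns only the pending sum
/// and receives the data as a parameter instead of storing a reference.
struct Pending {
    added: usize,
}

impl Pending {
    /// Needs `self` to be mutable, but only a shared borrow of the data.
    fn add_once(&mut self, data: &Data) {
        self.added += data.a;
    }

    /// Consumes the pending state and needs the data mutably.
    fn commit(self, data: &mut Data) {
        data.a = self.added;
    }
}

fn main() {
    let mut d = Data { a: 1 };
    let h = Holder { data: &d };
    let mut pending = Pending { added: 0 };
    pending.add_once(&d);
    pending.add_once(&d);
    println!("{}", h.data.a); // fine: only shared borrows so far
    pending.commit(&mut d); // `h` must not be used after this point
    println!("{}", d.a);
    assert_eq!(d.a, 2);
}
```

Using h after the commit call would still be reported as a borrow error, which is the static check the question asks for.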
Is there any RFC addressing this issue at the Rust language level?
Not today, and I think it is very unlikely in the future.
Something similar was suggested during the work on two-phase borrows: https://github.com/rust-lang/rust/issues/49434. The idea was that any mutable reference will start as shared and only "upgrade" when it is actually used. It can be extended to "upgrade" only when a mutable reference is actually required, and this is basically what you propose.
However, the effect of two-phase borrows is completely local. For your code to work, we would need some way to represent a time-aware mutable reference in a type. I don't see that happening.
In addition, two-phase borrows work poorly with Stacked Borrows. @RalfJung ended up just using raw pointers when two-phase borrows are involved. Using them for everything will probably mean poor optimization opportunities (although maybe future models, or even Stacked Borrows itself, will find a way to handle that).

Is it safe to temporarily give away ownership of the contents of a mutable borrow in Rust? [duplicate]

This question already has an answer here:
replace a value behind a mutable reference by moving and mapping the original
Is a function that modifies a &mut T in place by applying a function FnOnce(T) -> T safe to have in Rust, or can it lead to undefined behavior? Is it included in the standard library somewhere, or in a well-known crate?
If you additionally assume T: Default, that looks like
fn modify<T, F: FnOnce(T) -> T>(x: &mut T, f: F)
where
    T: Default,
{
    let val = std::mem::take(x);
    let val = f(val);
    *x = val;
}
(See also https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f015812bac6f527fe663fe4e0b7a3188)
My question is about doing the same but dropping the where T: Default clause (and no T: Clone either). This requires a different implementation, since you can't use std::mem::take.
I'm not sure how to implement the unconstrained version, but it should be possible using unsafe Rust.
I'm learning Rust from a background of linear types and sub-structural logic. Rust's mutable borrow seems very similar to moving a resource in and then back out of a function, but I don't know if it is actually safe to take temporary ownership of the contents of a mutable borrow like this.
It is safe, and there are even crates for that (can't find them now).
HOWEVER.
When writing unsafe code, you have to be very careful. If you don't know exactly what you're doing, it can easily lead to UB.
Here, for example, there is something you maybe haven't thought of: panic safety.
Suppose we implement that trivially:
pub fn modify<T, F: FnOnce(T) -> T>(v: &mut T, f: F) {
    let prev = unsafe { std::ptr::read(v) };
    let new = f(prev);
    unsafe { std::ptr::write(v, new) };
}
Trivially right.
Or is it?
fn main() {
    struct MyStruct(pub i32);
    impl Drop for MyStruct {
        fn drop(&mut self) {
            println!("MyStruct({}) dropped", self.0);
        }
    }
    let mut v = MyStruct(123);
    std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        modify(&mut v, |_prev| {
            // `prev` is dropped here.
            panic!("Haha, evil panic!");
        })
    }))
    .unwrap_err();
    v.0 = 456; // Writing to uninitialized memory!
    // `v` is dropped here, double drop!
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6f7312a8be70cd43cf5cf7a9816be56a
I used a custom type whose destructor does nothing but print, but imagine what could happen if this were a Vec that freed its memory and we then wrote into the freed memory (and then, as a bonus, got a double free).
It is correct, as @Kendas said, that when there are no interruption points it is valid to leave memory in an uninitialized state in Rust. The problem is that many more places than you would wish are actually interruption points. In fact, when writing unsafe code, you have to consider any call to external code (i.e. code that is neither yours nor code you trust not to do bad things, such as std) to be an interruption point.
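One way to close that particular hole, shown here only as a sketch, is to abort the process if the closure panics, so that nobody can ever observe the moved-out value. The guard type AbortOnDrop is a name I made up for this example:

```rust
/// Hypothetical guard: if it is dropped (which here only happens while
/// unwinding from a panic inside `f`), the whole process is aborted.
struct AbortOnDrop;

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        std::process::abort();
    }
}

pub fn modify<T, F: FnOnce(T) -> T>(v: &mut T, f: F) {
    let guard = AbortOnDrop;
    unsafe {
        // `*v` is logically uninitialized from the read until the write.
        let prev = std::ptr::read(v);
        let new = f(prev);
        std::ptr::write(v, new);
    }
    // `f` returned normally, so disarm the guard without running its Drop.
    std::mem::forget(guard);
}

fn main() {
    let mut s = String::from("a");
    modify(&mut s, |mut t| {
        t.push('b');
        t
    });
    assert_eq!(s, "ab");
}
```

The cost of this design is that a panic inside f can no longer be caught; it takes the whole process down instead.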
Unsafe code is hard. Better stay in the safe land.
Edit: You may wonder what AssertUnwindSafe is. Maybe you even tried to remove it and noticed that the code doesn't compile. Well, UnwindSafe is a protection against this, and AssertUnwindSafe is a way to bypass the protection.
You may ask, what's the point? The point is that this protection is really not accurate; so inaccurate that bypassing it does not even require unsafe. But it still exists, so that we have a lower chance of accidental UB.
It doesn't matter to you as the writer of the API: you should act as if this protection doesn't exist, because it is safe to bypass it and easy to do so by mistake. The Rust standard library itself had bugs like that in the past (#86443, #81740, ...; it is not an accident that they're both in the same code, as those issues tend to appear in clusters. But there are more).
Well, you are replacing the contents of the borrowed memory location with the default value. This would mean that the memory is indeed correct at every point. So there should not be any undefined behavior.
Basically from the perspective of the mutable reference x, you are mutating it to the default value, then mutating it again to a different new value.
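To make that concrete, here is a small sketch that uses the T: Default version from the question (written compactly) and checks what the caller sees afterwards, both when the closure returns and when it panics:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// The `T: Default` version: `*x` holds a valid value at every point.
fn modify<T: Default, F: FnOnce(T) -> T>(x: &mut T, f: F) {
    let val = std::mem::take(x); // `*x` is now `T::default()`
    *x = f(val);
}

fn main() {
    // Normal case: the closure's result ends up behind the reference.
    let mut v = vec![1, 2, 3];
    modify(&mut v, |mut old| {
        old.push(4);
        old
    });
    assert_eq!(v, [1, 2, 3, 4]);

    // Panicking case: the old value is dropped during unwinding and the
    // reference is left pointing at the default value, which is valid.
    let mut s = String::from("kept");
    let result = catch_unwind(AssertUnwindSafe(|| {
        modify(&mut s, |_old| panic!("boom"));
    }));
    assert!(result.is_err());
    assert_eq!(s, "");
}
```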
In general, if there is a chance of undefined behavior, you will need to use the unsafe keyword. Or somebody has made a mistake while using the unsafe keyword further down the stack. It is relatively rare for these things to happen in the standard library.
Go ahead and look at the safety remarks in the code if you must: https://doc.rust-lang.org/src/core/mem/mod.rs.html#756

What is the logic behind freezing mutable borrowed variables in Rust? [duplicate]

Rust disallows this kind of code because it is unsafe:
fn main() {
let mut i = 42;
let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
*ref_to_i_1 = 1;
*ref_to_i_2 = 2;
}
How can I do do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be ok.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:
for (it: collection) {
if (predicate(*it)) {
collection.remove(it);
}
}
For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, it's also undefined behaviour according to the language specification, but the collections tend to throw ConcurrentModificationException - a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.
This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
A more contrived example of reference aliasing is like this:
int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined
You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
How can I do do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
I believe that although you trigger 'undefined behavior' by doing this, technically the noalias flag is not used by the Rust compiler for &mut references, so practically speaking, right now, you probably can't actually trigger undefined behavior this way. What you're triggering is 'implementation specific behavior', which is 'behaves like C++ according to LLVM'.
See Why does the Rust compiler not optimize code assuming that two mutable references cannot alias? for more information.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
Have a read of this series of blog articles about undefined behavior
In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.
To understand the issue here, consider when compiling in release mode the compiler may or may not reorder statements when optimizations are performed; that means that although your code may run in the linear sequence:
a; b; c;
There is no guarantee the compiler will execute them in that sequence when it runs, if (according to what the compiler knows), there is no logical reason that the statements must be performed in a specific atomic sequence. Part 3 of the blog I've linked to above demonstrates how this can cause undefined behavior.
tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to continue to make your program behave in a deterministic fashion if and only if your program does not trigger undefined behavior.
As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.
So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
Pertinent quote:
The C FAQ defines “undefined behavior” like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
Author's Note: The following answer was originally written for How do intertwined scopes create a "data race"?
The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.
The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:
fn main() {
let mut i = 42;
let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &mut i32, r2: &mut i32) {
*r1 = 1;
*r2 = 2;
println!("{}", r1);
println!("{}", r2);
}
The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:
// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code.
fn foo(r1: &mut i32, r2: &mut i32) {
*r1 = 1;
println!("{}", r1);
*r2 = 2;
println!("{}", r2);
}
It might even realize that the println!s always print the same value and take advantage of that fact to further rearrange foo:
fn foo(r1: &mut i32, r2: &mut i32) {
println!("{}", 1);
println!("{}", 2);
*r1 = 1;
*r2 = 2;
}
It's good that a compiler can do this optimization! (Even if Rust's currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?
You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid to do.
You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.
One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.
If you want shared mutability and to forego those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:
fn main() {
let mut i = std::cell::Cell::new(42);
let ref_to_i_1 = &i;
let ref_to_i_2 = &i;
foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
r1.set(1);
r2.set(2);
println!("{}", r1.get()); // prints 2, guaranteed
println!("{}", r2.get()); // also prints 2
}
The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
Also read
The Problem With Single-threaded Shared Mutability describes how having multiple mutable references can cause soundness issues even in the absence of multiple threads and data races.
Dan Hulme's answer illustrates how aliased mutation of more complex data can also cause undefined behavior (even before compiler optimizations).
† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.
The simplest example I know of is trying to push into a Vec that's borrowed:
let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);
This is a compiler error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
--> src/main.rs:4:5
|
3 | let c = &v[0];
| - immutable borrow occurs here
4 | v.push('b');
| ^^^^^^^^^^^ mutable borrow occurs here
5 | dbg!(c);
| - immutable borrow later used here
It's good that this is a compiler error, because otherwise it would be a use-after-free. push reallocates the Vec's heap storage and invalidates our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.
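For comparison, here is a minimal sketch of one way to restructure the snippet so that it compiles. The function name first_then_push is made up for illustration. Because char is Copy, the element can be copied out instead of borrowed, so no reference into the Vec is alive when push takes &mut self:

```rust
// Hypothetical helper: copy the first element out, then push.
fn first_then_push(v: &mut Vec<char>) -> char {
    let c = v[0]; // copies the char; no borrow of `v` outlives this line
    v.push('b'); // may reallocate, but nothing points into the old storage
    c
}

fn main() {
    let mut v = vec!['a'];
    let c = first_then_push(&mut v);
    println!("{} {:?}", c, v);
}
```

Taking the reference only after the push would work just as well; the point is simply that the shared borrow and the mutable borrow never overlap.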
Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
enum MyEnum<'a> {
Ptr(&'a i32),
Usize(usize),
}
let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
MyEnum::Ptr(i) => i,
MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);
Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:
error[E0506]: cannot assign to `my_enum` because it is borrowed
--> src/main.rs:12:1
|
8 | let my_int_ptr_ptr: &&i32 = match &my_enum {
| -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
| ---------------- borrow later used here
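As a contrast, here is a sketch of a variant that the borrow checker should accept. It assumes we only need the inner &i32, not a pointer into my_enum itself. Since &i32 is Copy, matching on my_enum by value copies the reference out, and my_enum is no longer borrowed when it is reassigned. The function name read_after_reassign is hypothetical, and the constant is shortened so that it also fits a 32-bit usize:

```rust
#[allow(dead_code)]
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}

// Hypothetical helper: copy the reference out of the enum, then overwrite the enum.
fn read_after_reassign(my_int: &i32) -> i32 {
    let mut my_enum = MyEnum::Ptr(my_int);
    // `i` is a copy of the stored `&i32`; it points at `my_int`, not into `my_enum`.
    let my_int_ptr: &i32 = match my_enum {
        MyEnum::Ptr(i) => i,
        MyEnum::Usize(_) => unreachable!(),
    };
    my_enum = MyEnum::Usize(0xdeadbeef); // fine: nothing borrows `my_enum` here
    assert!(matches!(my_enum, MyEnum::Usize(_)));
    *my_int_ptr
}

fn main() {
    let my_int = 42;
    println!("{}", read_after_reassign(&my_int));
}
```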
The term "aliasing" is typically used to identify situations where changing the order of operations involving different references would change the effect of those operations. If multiple references to an object are stored in different places, but the object is not modified during the lifetime of those references, compilers may usefully hoist, defer, or consolidate operations using those references without affecting program behavior.
For example, if a compiler sees that code reads the contents of an object referenced by x, then does something with an object referenced by y, and again reads the contents of the object referenced by x, and if the compiler knows that the action on y cannot have modified the object referenced by x, the compiler may consolidate both reads of x into a single read.
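A small sketch of that situation in Rust, with made-up names: x is a shared reference and y is a mutable one, so the compiler is entitled to assume that writing through y cannot change what x refers to. Whether a particular compiler version actually merges the two reads is up to the optimizer; the code only shows why it would be allowed to:

```rust
// Hypothetical function: reads `*x`, writes through `y`, then reads `*x` again.
fn read_twice(x: &i32, y: &mut i32) -> i32 {
    let first = *x;
    *y += 1; // cannot alias `*x`, because `y` is a `&mut` reference
    let second = *x; // may legally be replaced by reusing `first`
    first + second
}

fn main() {
    let a = 10;
    let mut b = 0;
    let sum = read_twice(&a, &mut b);
    println!("{} {}", sum, b);
}
```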
Determining in all cases whether an action on one reference might affect another would be an intractable problem if programmers had unlimited freedom to use and store references however they saw fit. Rust, however, seeks to handle the two easy cases:
If an object will never be modified during the lifetime of a reference, machine code using the reference won't have to worry about what operations might change it during that lifetime, since it would be impossible for any operations to do so.
If during the lifetime of a reference, an object will only be modified by references that are visibly based upon that reference, machine code using that reference won't have to worry about whether any operations using that reference will interact with operations involving references that appear to be unrelated, because no seemingly-unrelated references will identify the same object.
Allowing for the possibility of aliasing between mutable references would make things much more complicated, since many optimizations that can be performed equally well with unshared references to mutable objects or shareable references to immutable ones could no longer be performed. Once a language supports situations where operations involving seemingly independent references need to be processed in precisely ordered fashion, it's hard for compilers to know when such precise sequencing is required.

Moved variable still borrowing after calling `drop`?

fn main() {
let mut x: Vec<&i32> = vec![];
let a = 1;
x.push(&a);
drop(x);
// x.len(); // error[E0382]: use of moved value: `x`
} // `a` dropped here while still borrowed
The compiler knows drop() drops x (as evident from the error in the commented-out code) but still thinks the variable is borrowing from a! This is unfair!
Should this be considered one of the numerous dupes of rust-lang/rust#6393 (which is now tracked by rust-lang/rfcs#811)? The discussion there seems to be centered on making &mut self and &self coexist in a single block, though.
I can't give you a definite answer, but I'll try to explain a few things here. Let's start with clarifying something:
The compiler knows drop() drops x
This is not true. While there are a few "magic" things in the standard library that the compiler knows about, drop() is not such a lang item. In fact, you could implement drop() yourself and it's actually the easiest thing to do:
fn drop<T>(_: T) {}
The function just takes something by value (thus, it's moved into drop()) and since nothing happens inside of drop(), this value is dropped at the end of the scope, like in any other function. So: the compiler doesn't know x is dropped, it just knows x is moved.
As you might have noticed, the compiler error stays the same regardless of whether or not we add the drop() call. Right now, the compiler will only look at the scope of a variable when it comes to references. From Niko Matsakis' intro to NLL:
The way that the compiler currently works, assigning a reference into a variable means that its lifetime must be as large as the entire scope of that variable.
And in a later blog post of his:
In particular, today, once a lifetime must extend beyond the boundaries of a single statement [...], it must extend all the way till the end of the enclosing block.
This is exactly what happens here, so yes, your problem has to do with all this "lexical borrowing" stuff. From the current compiler's perspective, the lifetime of the expression &a needs to be at least as large as the scope of x. But this doesn't work: the reference would outlive a, because the scope of x is larger than the scope of a, as pointed out by the compiler:
= note: values in a scope are dropped in the opposite order they are created
And I guess you already know all that, but you can fix your example by swapping the lines let mut x ...; and let a ...;.
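A sketch of the reordered version, wrapped in a hypothetical function push_then_drop so the result can be checked. Declaring a first means it is dropped after x, so the reference stored in x never outlives the value it points to:

```rust
// Hypothetical wrapper around the example, with the two `let` lines swapped.
fn push_then_drop() -> usize {
    let a = 1; // declared first, therefore dropped last
    let mut x: Vec<&i32> = vec![];
    x.push(&a);
    let len = x.len();
    drop(x); // `x` is moved into `drop` and freed there
    len
}

fn main() {
    println!("{}", push_then_drop());
}
```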
I'm not sure whether or not this exact problem would be solved by any of the currently proposed solutions. But I hope that we will see soon enough, as all of this is being addressed as part of the Rust 2017 roadmap. A good place to read up on the updates is here (which also contains links to the five relevant blog posts of Niko).
The compiler knows drop() drops x (as evident from the error in the commented-out code)
The Rust compiler doesn't know anything about drop and what it does. It's just a library function, which could do anything it likes with the value since it now owns it.
The definition of drop, as the documentation points out, is literally just:
fn drop<T>(_x: T) { }
It works because the argument is moved into the function, and is therefore automatically dropped by the compiler when the function finishes.
If you create your own function, you will get exactly the same error message:
fn my_drop<T>(_x: T) { }
fn main() {
let mut x: Vec<&i32> = vec![];
let a = 1;
x.push(&a);
my_drop(x);
x.len();
}
This is exactly what is meant in the documentation when it says drop "isn't magic".
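To see that a hand-written function really does run the destructor when it returns, here is a small sketch. The Noisy type, its log field, and the run function are invented for this illustration; the destructor records a message in a shared log so the ordering can be inspected:

```rust
use std::cell::RefCell;

// Hypothetical type whose destructor writes to a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Same shape as the standard library's `drop`: take ownership, do nothing.
fn my_drop<T>(_x: T) {}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let value = Noisy { name: "value", log: &log };
        log.borrow_mut().push("before my_drop".to_string());
        my_drop(value); // `value` is moved in and dropped when `my_drop` returns
        log.borrow_mut().push("after my_drop".to_string());
    }
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

The "dropped value" entry lands between the other two, which is the behaviour you would expect if the destructor runs at the end of my_drop rather than at the end of the enclosing block.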

Can Rust optimise away the bit-wise copy during move of an object someday?

Consider the snippet
struct Foo {
dummy: [u8; 65536],
}
fn bar(foo: Foo) {
println!("{:p}", &foo)
}
fn main() {
let o = Foo { dummy: [42u8; 65536] };
println!("{:p}", &o);
bar(o);
}
A typical result of the program is
0x7fffc1239890
0x7fffc1229890
where the addresses are different.
Apparently, the large array dummy has been copied, as expected in the compiler's move implementation. Unfortunately, this can have a non-trivial performance impact, as dummy is a very large array. This impact can force people to pass the argument by reference instead, even when the function conceptually "consumes" the argument.
Since Foo does not derive Copy, object o is moved. Since Rust forbids access to a moved object, what prevents bar from "reusing" the original object o, instead of forcing the compiler to generate a potentially expensive bit-wise copy? Is there a fundamental difficulty, or will we see the compiler someday optimise away this bit-wise copy?
Given that in Rust (unlike C or C++) the address of a value is not considered to matter, there is nothing in terms of language that prevents the elision of the copy.
However, today rustc does not optimize anything: all optimizations are delegated to LLVM, and it seems you have hit a limitation of the LLVM optimizer here (it's unclear whether this limitation is due to LLVM being close to C's semantics or is just an omission).
So, there are two avenues of improving code generation for this:
teaching LLVM to perform this optimization (if possible)
teaching rustc to perform this optimization (optimization passes are coming to rustc now that it has MIR)
For now, though, you might simply want to avoid allocating such large objects on the stack; you can Box them, for example.
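A sketch of the Box variant of the snippet from the question. The addresses function and the usize return values are additions made for illustration. Moving a Box<Foo> only moves the pointer, so the array stays at the same heap address inside bar. Note that Box::new may still build the value on the stack before copying it to the heap once, depending on optimization level:

```rust
struct Foo {
    dummy: [u8; 65536],
}

// Takes ownership of the box; only the pointer is moved, not the 64 KiB array.
fn bar(foo: Box<Foo>) -> usize {
    println!("{:p} (first byte {})", foo, foo.dummy[0]);
    &*foo as *const Foo as usize
}

// Hypothetical helper returning the heap address before and after the move.
fn addresses() -> (usize, usize) {
    let o = Box::new(Foo { dummy: [42u8; 65536] });
    println!("{:p}", o);
    let before = &*o as *const Foo as usize;
    let after = bar(o);
    (before, after)
}

fn main() {
    let (before, after) = addresses();
    println!("same address: {}", before == after);
}
```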

Resources