Why is transmuting &T to &mut T Undefined Behaviour? - rust

I want to reinterpret an immutable reference to a mutable reference (in an unsafe block) and be responsible for the safety checks on my own, yet it appears I cannot use mem::transmute() to do so.
let map_of_vecs: HashMap<usize, Vec<_>> = ...;
let vec = map_of_vecs[2];
/// obtain a mutable reference to vec here
I do not want to wrap the Vecs into Cells because that would affect all other areas of code that use map_of_vecs and I only need mutability in one line.
I do not have mutable access to map_of_vecs

The Rust optimiser makes the assumption that &mut T references are unique. For example, it might deduce that a particular piece of memory can be reused because a mutable reference to that memory exists but is never accessed again.
However, if you transmute a &T to a &mut T then you are able to create multiple mutable references to the same data. If the compiler makes this assumption, you could end up dereferencing a value that has been overwritten with something else.
This is just one example of how the compiler might make use of the assumption that mutable references are unique. In fact, the compiler is free to use this information in any way it sees fit — which could (and likely will) change from version to version.
Even if you think you have guaranteed that the reference isn't aliased, you can't always guarantee that users of your code won't create more references. Even if you think you can be sure of that, the existence of references is extremely subtle and it's very easy to miss one. For example when you call a method that takes &self, that's a reference.

The Rust compiler annotates &T function parameters with the LLVM noalias and readonly attributes (provided that T does not contain any UnsafeCell parts). The noalias attribute tells LLVM that the memory behind this pointer may only be written to through this pointer (and not through any other pointers), and the readonly attribute tells LLVM that it can't be written to through this pointer (but possibly other pointers). In combination, the two attributes allow the LLVM optimiser to assume the memory is not changed at all during the execution of this function, and the code can be optimised based on this assumption. The optimiser may reorder instructions or remove code in a way that is only safe to do if you actually stick to this contract.
Another way the conversion can lead to undefined behaviour is for statics: immutable statics without UnsafeCells will be placed into read-only memory, so if you actually write to them, your code will segfault.
For parameters with UnsafeCells the compiler does not emit the readonly attribute, and statics containing an UnsafeCell are placed into writable memory.

Related

How does refcell interact with the borrow checker

I am trying to wrap my head around the RefCell type, not as a utility, but in terms of it's implementation.
I understand the general idea, a struct that has mutable data inside of it that keeps track of how many active references to that data exist.
but how does it interact with the borrow checker to allow you to mutate the data multiple times?
but how does it interact with the borrow checker to allow you to mutate the data multiple times?
Internally, it doesn't really - it uses unsafe code.
RefCell is a checked wrapper around UnsafeCell, which provides the inner mutability. Its get method returns a *mut pointer to the wrapped type, which can be cast to a &mut reference with an arbitrary lifetime. This, of course, is very unsafe, so it's up to the user to ensure there's no violations of the borrow checker rules with UnsafeCell.
Cell does this by not allowing references to the inner object. RefCell does this via essentially a single-threaded lock, keeping track of whether a reference exists and denying access if so. Mutex and RwLock are similar, but use thread-safe locks.
(As an aside, UnsafeCell IS compiler magic, but for a different reason - it's needed to tell the compiler that an object may change behind an immutable reference. But that doesn't have to do with the borrow checks so much as optimizations.)

How to modify private mutable state when the trait dictates a non-mutable self reference? [duplicate]

When would you be required to use Cell or RefCell? It seems like there are many other type choices that would be suitable in place of these, and the documentation warns that using RefCell is a bit of a "last resort".
Is using these types a "code smell"? Can anyone show an example where using these types makes more sense than using another type, such as Rc or even Box?
It is not entirely correct to ask when Cell or RefCell should be used over Box and Rc because these types solve different problems. Indeed, more often than not RefCell is used together with Rc in order to provide mutability with shared ownership. So yes, use cases for Cell and RefCell are entirely dependent on the mutability requirements in your code.
Interior and exterior mutability are very nicely explained in the official Rust book, in the designated chapter on mutability. External mutability is very closely tied to the ownership model, and mostly when we say that something is mutable or immutable we mean exactly the external mutability. Another name for external mutability is inherited mutability, which probably explains the concept more clearly: this kind of mutability is defined by the owner of the data and inherited to everything you can reach from the owner. For example, if your variable of a structural type is mutable, so are all fields of the structure in the variable:
struct Point { x: u32, y: u32 }
// the variable is mutable...
let mut p = Point { x: 10, y: 20 };
// ...and so are fields reachable through this variable
p.x = 11;
p.y = 22;
let q = Point { x: 10, y: 20 };
q.x = 33; // compilation error
Inherited mutability also defines which kinds of references you can get out of the value:
{
let px: &u32 = &p.x; // okay
}
{
let py: &mut u32 = &mut p.x; // okay, because p is mut
}
{
let qx: &u32 = &q.x; // okay
}
{
let qy: &mut u32 = &mut q.y; // compilation error since q is not mut
}
Sometimes, however, inherited mutability is not enough. The canonical example is reference-counted pointer, called Rc in Rust. The following code is entirely valid:
{
let x1: Rc<u32> = Rc::new(1);
let x2: Rc<u32> = x1.clone(); // create another reference to the same data
let x3: Rc<u32> = x2.clone(); // even another
} // here all references are destroyed and the memory they were pointing at is deallocated
At the first glance it is not clear how mutability is related to this, but recall that reference-counted pointers are called so because they contain an internal reference counter which is modified when a reference is duplicated (clone() in Rust) and destroyed (goes out of scope in Rust). Hence Rc has to modify itself even though it is stored inside a non-mut variable.
This is achieved via internal mutability. There are special types in the standard library, the most basic of them being UnsafeCell, which allow one to work around the rules of external mutability and mutate something even if it is stored (transitively) in a non-mut variable.
Another way to say that something has internal mutability is that this something can be modified through a &-reference - that is, if you have a value of type &T and you can modify the state of T which it points at, then T has internal mutability.
For example, Cell can contain Copy data and it can be mutated even if it is stored in non-mut location:
let c: Cell<u32> = Cell::new(1);
c.set(2);
assert_eq!(c.get(), 2);
RefCell can contain non-Copy data and it can give you &mut pointers to its contained value, and absence of aliasing is checked at runtime. This is all explained in detail on their documentation pages.
As it turned out, in overwhelming number of situations you can easily go with external mutability only. Most of existing high-level code in Rust is written that way. Sometimes, however, internal mutability is unavoidable or makes the code much clearer. One example, Rc implementation, is already described above. Another one is when you need shared mutable ownership (that is, you need to access and modify the same value from different parts of your code) - this is usually achieved via Rc<RefCell<T>>, because it can't be done with references alone. Even another example is Arc<Mutex<T>>, Mutex being another type for internal mutability which is also safe to use across threads.
So, as you can see, Cell and RefCell are not replacements for Rc or Box; they solve the task of providing you mutability somewhere where it is not allowed by default. You can write your code without using them at all; and if you get into a situation when you would need them, you will know it.
Cells and RefCells are not code smell; the only reason whey they are described as "last resort" is that they move the task of checking mutability and aliasing rules from the compiler to the runtime code, as in case with RefCell: you can't have two &muts pointing to the same data at the same time, this is statically enforced by the compiler, but with RefCells you can ask the same RefCell to give you as much &muts as you like - except that if you do it more than once it will panic at you, enforcing aliasing rules at runtime. Panics are arguably worse than compilation errors because you can only find errors causing them at runtime rather than at compilation time. Sometimes, however, the static analyzer in the compiler is too restrictive, and you indeed do need to "work around" it.
No, Cell and RefCell aren't "code smells". Normally, mutability is inherited, that is you can mutate a field or a part of a data structure if and only if you have exclusive access to of the whole data structure, and hence you can opt into mutability at that level with mut (i.e., foo.x inherits its mutability or lack thereof from foo). This is a very powerful pattern and should be used whenever it works well (which is surprisingly often). But it's not expressive enough for all code everywhere.
Box and Rc have nothing to do with this. Like almost all other types, they respect inherited mutability: you can mutate the contents of a Box if you have exclusive, mutable access to the Box (because that means you have exclusive access to the contents, too). Conversely, you can never get a &mut to the contents of an Rc because by its nature Rc is shared (i.e. there can be multiple Rcs referring to the same data).
One common case of Cell or RefCell is that you need to share mutable data between several places. Having two &mut references to the same data is normally not allowed (and for good reason!). However, sometimes you need it, and the cell types enable doing it safely.
This could be done via the common combination of Rc<RefCell<T>>, which allows the data to stick around for as long as anyone uses it and allows everyone (but only one at a time!) to mutate it. Or it could be as simple as &Cell<i32> (even if the cell is wrapped in a more meaningful type). The latter is also commonly used for internal, private, mutable state like reference counts.
The documentation actually has several examples of where you'd use Cell or RefCell. A good example is actually Rc itself. When creating a new Rc, the reference count must be increased, but the reference count is shared between all Rcs, so, by inherited mutability, this couldn't possibly work. Rc practically has to use a Cell.
A good guideline is to try writing as much code as possible without cell types, but using them when it hurts too much without them. In some cases, there is a good solution without cells, and, with experience, you'll be able to find those when you previously missed them, but there will always be things that just aren't possible without them.
Suppose you want or need to create some object of the type of your choice and dump it into an Rc.
let x = Rc::new(5i32);
Now, you can easily create another Rc that points to the exact same object and therefore memory location:
let y = x.clone();
let yval: i32 = *y;
Since in Rust you may never have a mutable reference to a memory location to which any other reference exists, these Rc containers can never be modified again.
So, what if you wanted to be able to modify those objects and have multiple Rc pointing to one and the same object?
This is the issue that Cell and RefCell solve. The solution is called "interior mutability", and it means that Rust's aliasing rules are enforced at runtime instead of compile-time.
Back to our original example:
let x = Rc::new(RefCell::new(5i32));
let y = x.clone();
To get a mutable reference to your type, you use borrow_mut on the RefCell.
let yval = x.borrow_mut();
*yval = 45;
In case you already borrowed the value your Rcs point to either mutably or non-mutably, the borrow_mut function will panic, and therefore enforce Rust's aliasing rules.
Rc<RefCell<T>> is just one example for RefCell, there are many other legitimate uses. But the documentation is right. If there is another way, use it, because the compiler cannot help you reason about RefCells.

Understand smart pointers in Rust

I am a newbie to Rust and writing to understand the "Smart pointers" in Rust. I have basic understanding of how smart pointers works in C++ and has been using it for memory management since a few years ago. But to my very much surprise, Rust also provides such utility explicitly.
Because from a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointers have been automatically wrapped with a smart pointer, which seems very reasonable. Then why do we still need such Box<T>, Rc<T>, and Ref<T> stuff? According to this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be apprecicated a lot. Thanks.
You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.
The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that has been removed since some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced #T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

When should I use a reference instead of transferring ownership?

From the Rust book's chapter on ownership, non-copyable values can be passed to functions by either transferring ownership or by using a mutable or immutable reference. When you transfer ownership of a value, it can't be used in the original function anymore: you must return it back if you want to. When you pass a reference, you borrow the value and can still use it.
I come from languages where values are immutable by default (Haskell, Idris and the like). As such, I'd probably never think about using references at all. Having the same value in two places looks dangerous (or, at least, awkward) to me. Since references are a feature, there must be a reason to use them.
Are there situations I should force myself to use references? What are those situations and why are they beneficial? Or are they just for convenience and defaulting to passing ownership is fine?
Mutable references in particular look very dangerous.
They are not dangerous, because the Rust compiler will not let you do anything dangerous. If you have a &mut reference to a value then you cannot simultaneously have any other references to it.
In general you should pass references around. This saves copying memory and should be the default thing you do, unless you have a good reason to do otherwise.
Some good reasons to transfer ownership instead:
When the value's type is small in size, such as bool, u32, etc. It's often better performance to move/copy these values to avoid a level of indirection. Usually these values implement Copy, and actually the compiler may make this optimisation for you automatically. Something it's free to do because of a strong type system and immutability by default!
When the value's current owner is going to go out of scope, you may want to move the value somewhere else to keep it alive.

Situations where Cell or RefCell is the best choice

When would you be required to use Cell or RefCell? It seems like there are many other type choices that would be suitable in place of these, and the documentation warns that using RefCell is a bit of a "last resort".
Is using these types a "code smell"? Can anyone show an example where using these types makes more sense than using another type, such as Rc or even Box?
It is not entirely correct to ask when Cell or RefCell should be used over Box and Rc because these types solve different problems. Indeed, more often than not RefCell is used together with Rc in order to provide mutability with shared ownership. So yes, use cases for Cell and RefCell are entirely dependent on the mutability requirements in your code.
Interior and exterior mutability are very nicely explained in the official Rust book, in the designated chapter on mutability. External mutability is very closely tied to the ownership model, and mostly when we say that something is mutable or immutable we mean exactly the external mutability. Another name for external mutability is inherited mutability, which probably explains the concept more clearly: this kind of mutability is defined by the owner of the data and inherited to everything you can reach from the owner. For example, if your variable of a structural type is mutable, so are all fields of the structure in the variable:
struct Point { x: u32, y: u32 }
// the variable is mutable...
let mut p = Point { x: 10, y: 20 };
// ...and so are fields reachable through this variable
p.x = 11;
p.y = 22;
let q = Point { x: 10, y: 20 };
q.x = 33; // compilation error
Inherited mutability also defines which kinds of references you can get out of the value:
{
let px: &u32 = &p.x; // okay
}
{
let py: &mut u32 = &mut p.x; // okay, because p is mut
}
{
let qx: &u32 = &q.x; // okay
}
{
let qy: &mut u32 = &mut q.y; // compilation error since q is not mut
}
Sometimes, however, inherited mutability is not enough. The canonical example is reference-counted pointer, called Rc in Rust. The following code is entirely valid:
{
let x1: Rc<u32> = Rc::new(1);
let x2: Rc<u32> = x1.clone(); // create another reference to the same data
let x3: Rc<u32> = x2.clone(); // even another
} // here all references are destroyed and the memory they were pointing at is deallocated
At the first glance it is not clear how mutability is related to this, but recall that reference-counted pointers are called so because they contain an internal reference counter which is modified when a reference is duplicated (clone() in Rust) and destroyed (goes out of scope in Rust). Hence Rc has to modify itself even though it is stored inside a non-mut variable.
This is achieved via internal mutability. There are special types in the standard library, the most basic of them being UnsafeCell, which allow one to work around the rules of external mutability and mutate something even if it is stored (transitively) in a non-mut variable.
Another way to say that something has internal mutability is that this something can be modified through a &-reference - that is, if you have a value of type &T and you can modify the state of T which it points at, then T has internal mutability.
For example, Cell can contain Copy data and it can be mutated even if it is stored in non-mut location:
let c: Cell<u32> = Cell::new(1);
c.set(2);
assert_eq!(c.get(), 2);
RefCell can contain non-Copy data and it can give you &mut pointers to its contained value, and absence of aliasing is checked at runtime. This is all explained in detail on their documentation pages.
As it turned out, in overwhelming number of situations you can easily go with external mutability only. Most of existing high-level code in Rust is written that way. Sometimes, however, internal mutability is unavoidable or makes the code much clearer. One example, Rc implementation, is already described above. Another one is when you need shared mutable ownership (that is, you need to access and modify the same value from different parts of your code) - this is usually achieved via Rc<RefCell<T>>, because it can't be done with references alone. Even another example is Arc<Mutex<T>>, Mutex being another type for internal mutability which is also safe to use across threads.
So, as you can see, Cell and RefCell are not replacements for Rc or Box; they solve the task of providing you mutability somewhere where it is not allowed by default. You can write your code without using them at all; and if you get into a situation when you would need them, you will know it.
Cells and RefCells are not code smell; the only reason whey they are described as "last resort" is that they move the task of checking mutability and aliasing rules from the compiler to the runtime code, as in case with RefCell: you can't have two &muts pointing to the same data at the same time, this is statically enforced by the compiler, but with RefCells you can ask the same RefCell to give you as much &muts as you like - except that if you do it more than once it will panic at you, enforcing aliasing rules at runtime. Panics are arguably worse than compilation errors because you can only find errors causing them at runtime rather than at compilation time. Sometimes, however, the static analyzer in the compiler is too restrictive, and you indeed do need to "work around" it.
No, Cell and RefCell aren't "code smells". Normally, mutability is inherited, that is you can mutate a field or a part of a data structure if and only if you have exclusive access to of the whole data structure, and hence you can opt into mutability at that level with mut (i.e., foo.x inherits its mutability or lack thereof from foo). This is a very powerful pattern and should be used whenever it works well (which is surprisingly often). But it's not expressive enough for all code everywhere.
Box and Rc have nothing to do with this. Like almost all other types, they respect inherited mutability: you can mutate the contents of a Box if you have exclusive, mutable access to the Box (because that means you have exclusive access to the contents, too). Conversely, you can never get a &mut to the contents of an Rc because by its nature Rc is shared (i.e. there can be multiple Rcs referring to the same data).
One common case of Cell or RefCell is that you need to share mutable data between several places. Having two &mut references to the same data is normally not allowed (and for good reason!). However, sometimes you need it, and the cell types enable doing it safely.
This could be done via the common combination of Rc<RefCell<T>>, which allows the data to stick around for as long as anyone uses it and allows everyone (but only one at a time!) to mutate it. Or it could be as simple as &Cell<i32> (even if the cell is wrapped in a more meaningful type). The latter is also commonly used for internal, private, mutable state like reference counts.
The documentation actually has several examples of where you'd use Cell or RefCell. A good example is actually Rc itself. When creating a new Rc, the reference count must be increased, but the reference count is shared between all Rcs, so, by inherited mutability, this couldn't possibly work. Rc practically has to use a Cell.
A good guideline is to try writing as much code as possible without cell types, but using them when it hurts too much without them. In some cases, there is a good solution without cells, and, with experience, you'll be able to find those when you previously missed them, but there will always be things that just aren't possible without them.
Suppose you want or need to create some object of the type of your choice and dump it into an Rc.
let x = Rc::new(5i32);
Now, you can easily create another Rc that points to the exact same object and therefore memory location:
let y = x.clone();
let yval: i32 = *y;
Since in Rust you may never have a mutable reference to a memory location to which any other reference exists, these Rc containers can never be modified again.
So, what if you wanted to be able to modify those objects and have multiple Rc pointing to one and the same object?
This is the issue that Cell and RefCell solve. The solution is called "interior mutability", and it means that Rust's aliasing rules are enforced at runtime instead of compile-time.
Back to our original example:
let x = Rc::new(RefCell::new(5i32));
let y = x.clone();
To get a mutable reference to your type, you use borrow_mut on the RefCell.
let yval = x.borrow_mut();
*yval = 45;
In case you already borrowed the value your Rcs point to either mutably or non-mutably, the borrow_mut function will panic, and therefore enforce Rust's aliasing rules.
Rc<RefCell<T>> is just one example for RefCell, there are many other legitimate uses. But the documentation is right. If there is another way, use it, because the compiler cannot help you reason about RefCells.

Resources