Why is immutability enforced in Rust unless otherwise specified with `mut`?

Why is immutability enforced in Rust unless otherwise specified with `mut`? - rust

Why is immutability forced in Rust, unless you specify mut? Is this a design choice for safety, do you consider this how it should be naturally in other languages?
I should probably clarify, I'm still a newbie at Rust. So is this a design choice related to another feature in the language?

The Rust-Book actually addresses this topic.
There is no single reason that bindings are immutable by default, but we can think about it through one of Rust’s primary focuses: safety. If you forget to say mut, the compiler will catch it, and let you know that you have mutated something you may not have intended to mutate. If bindings were mutable by default, the compiler would not be able to tell you this. If you did intend mutation, then the solution is quite easy: add mut.
There are other good reasons to avoid mutable state when possible, but they’re out of the scope of this guide. In general, you can often avoid explicit mutation, and so it is preferable in Rust. That said, sometimes, mutation is what you need, so it’s not verboten.
Basically it is the C++-Mantra that everything that you don't want to modify should be const, just properly done by reversing the rules. Also see this Stackoverflow article about C++.

Related

Rust Box vs non-box

Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.

Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.

Rust Ownership Smart Pointers

I've recently started learning Rust and just learned about the Smart Pointers (Box, Rc and RefCell).
In the guide they talked about Rc implementing "shared ownership". But if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
And to me (still a Rust newbie) it seems as if Rc and RefCell take ownership of they value they contain and just "expose" different types of references to the contained value?
Am I wrong and if yes: why is Rust allowed to "cheat" the ownership system like that and would I be theoretically able to implement my own "cheating" types?

if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
No. Rust guarantees that there can be no more than a single mutable borrow and there cannot be mutable and non-mutable borrows at the same time. It doesn't say anything about owners.
why is Rust allowed to "cheat" the ownership system
It doesn't.
would I be theoretically able to implement my own "cheating" types
Yes. Those types are all implemented in Rust¹. Those types are battle-tested and perfectly safe under Rust's safety rules, but they require the use of unsafe at a lower level.
Note that unsafe doesn't permit going around the rule that you can have one mutable borrow XOR any number of non-mutable borrows, but using unsafe, you could do it anyway. This, of course, would actually be unsafe (and trigger undefined behavior).
1: Although some of those types are implemented using features that are still private to the compiler so you wouldn't be able to do everything as efficiently as the standard library, and Box and UnsafeCell are special to the language and cannot be reproduced by a normal library. There are for example many crates providing Rc or Arc alternatives which are better that the standard ones in some cases.

Can I avoid using explicit lifetime specifiers and instead use reference counting (Rc)?

I am reading the Rust Book and everything was pretty simple to understand (thanks to the book's authors), until the section about lifetimes. I spent all day, reading a lot of articles on lifetimes and still I am very insecure about using them correctly.
What I do understand, though, is that the concept of explicit lifetime specifiers aims to solve the problem of dangling references. I also know that Rust has reference-counting smart pointers (Rc) which I believe is the same as shared_ptr in C++, which has the same purpose: to prevent dangling references.
Given that those lifetimes are so horrendous to me, and smart pointers are very familiar and comfortable for me (I used them in C++ a lot), can I avoid the lifetimes in favor of smart pointers? Or are lifetimes an inevitable thing that I'll have to understand and use in Rust code?

are lifetimes an inevitable thing that I'll have to understand and use in Rust code?
In order to read existing Rust code, you probably don't need to understand lifetimes. The borrow-checker understands them so if it compiles then they are correct and you can just review what the code does.
I am very insecure about using them correctly.
The most important thing to understand about lifetimes annotations is that they do nothing. Rather, they are a way to express to the compiler the relationship between references. For example, if an input and output to a function have the same lifetime, that means that the output contains a reference to the input (or part of it) and therefore is not allowed to live longer than the input. Using them "incorrectly" means that you are telling the compiler something about the lifetime of a reference which it can prove to be untrue - and it will give you an error, so there is nothing to be insecure about!
can I avoid the lifetimes in favor of smart pointers?
You could choose to avoid using references altogether and use Rc everywhere. You would be missing out on one of the big features of Rust: lifetimes and references form one of the most important zero-cost abstractions, which enable Rust to be fast and safe at the same time. There is code written in Rust that nobody would attempt to write in C/C++ because a human could never be absolutely certain that they haven't introduced a memory bug. Avoiding Rust references in favour of smart pointers will mostly result in slower code, because smart pointers have runtime overhead.
Many APIs use references. In order to use those APIs you will need to have at least some grasp of what is going on.
The best way to understand is just to write code and gain an intuition from what works and what doesn't. Rust's error messages are excellent and will help a lot with forming that intuition.

Is it safe and defined behavior to transmute between a T and an UnsafeCell<T>?

A recent question was looking for the ability to construct self-referential structures. In discussing possible answers for the question, one potential answer involved using an UnsafeCell for interior mutability and then "discarding" the mutability through a transmute.
Here's a small example of such an idea in action. I'm not deeply interested in the example itself, but it's just enough complication to require a bigger hammer like transmute as opposed to just using UnsafeCell::new and/or UnsafeCell::into_inner:
use std::{
cell::UnsafeCell, mem, rc::{Rc, Weak},
};
// This is our real type.
struct ReallyImmutable {
value: i32,
myself: Weak<ReallyImmutable>,
}
fn initialize() -> Rc<ReallyImmutable> {
// This mirrors ReallyImmutable but we use `UnsafeCell`
// to perform some initial interior mutation.
struct NotReallyImmutable {
value: i32,
myself: Weak<UnsafeCell<NotReallyImmutable>>,
}
let initial = NotReallyImmutable {
value: 42,
myself: Weak::new(),
};
// Without interior mutability, we couldn't update the `myself` field
// after we've created the `Rc`.
let second = Rc::new(UnsafeCell::new(initial));
// Tie the recursive knot
let new_myself = Rc::downgrade(&second);
unsafe {
// Should be safe as there can be no other accesses to this field
(&mut *second.get()).myself = new_myself;
// No one outside of this function needs the interior mutability
// TODO: Is this call safe?
mem::transmute(second)
}
}
fn main() {
let v = initialize();
println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value))
}
This code appears to print out what I'd expect, but that doesn't mean that it's safe or using defined semantics.
Is transmuting from a UnsafeCell<T> to a T memory safe? Does it invoke undefined behavior? What about transmuting in the opposite direction, from a T to an UnsafeCell<T>?

(I am still new to SO and not sure if "well, maybe" qualifies as an answer, but here you go. ;)
Disclaimer: The rules for these kinds of things are not (yet) set in stone. So, there is no definitive answer yet. I'm going to make some guesses based on (a) what kinds of compiler transformations LLVM does/we will eventually want to do, and (b) what kind of models I have in my head that would define the answer to this.
Also, I see two parts to this: The data layout perspective, and the aliasing perspective. The layout issue is that NotReallyImmutable could, in principle, have a totally different layout than ReallyImmutable. I don't know much about data layout, but with UnsafeCell becoming repr(transparent) and that being the only difference between the two types, I think the intent is for this to work. You are, however, relying on repr(transparent) being "structural" in the sense that it should allow you to replace things in larger types, which I am not sure has been written down explicitly anywhere. Sounds like a proposal for a follow-up RFC that extends the repr(transparent) guarantees appropriately?
As far as aliasing is concerned, the issue is breaking the rules around &T. I'd say that, as long as you never have a live &T around anywhere when writing through the &UnsafeCell<T>, you are good -- but I don't think we can guarantee that quite yet. Let's look in more detail.
Compiler perspective
The relevant optimizations here are the ones that exploit &T being read-only. So if you reordered the last two lines (transmute and the assignment), that code would likely be UB as we may want the compiler to be able to "pre-fetch" the value behind the shared reference and re-use that value later (i.e. after inlining this).
But in your code, we would only emit "read-only" annotations (noalias in LLVM) after the transmute comes back, and the data is indeed read-only starting there. So, this should be good.
Memory models
The "most aggressive" of my memory models essentially asserts that all values are always valid, and I think even that model should be fine with your code. &UnsafeCell is a special case in that model where validity just stops, and nothing is said about what lives behind this reference. The moment the transmute returns, we grab the memory it points to and make it all read-only, and even if we did that "recursively" through the Rc (which my model doesn't, but only because I couldn't figure out a good way to make it do so) you'd be fine as you don't mutate any more after the transmute. (As you may have noticed, this is the same restriction as in the compiler perspective. The point of these models is to allow compiler optimizations, after all. ;)
(As a side-note, I really wish miri was in better shape right now. Seems I have to try and get validation to work again in there, because then I could tell you to just run your code in miri and it'd tell you if that version of my model is okay with what you are doing :D )
I am thinking about other models currently that only check things "on access", but haven't worked out the UnsafeCell story for that model yet. What this example shows is that the model may have to contain ways for a "phase transition" of memory first being UnsafeCell, but later having normal sharing with read-only guarantees. Thanks for bringing this up, that will make for some nice examples to think about!
So, I think I can say that (at least from my side) there is the intent to allow this kind of code, and doing so does not seem to prevent any optimizations. Whether we'll actually manage to find a model that everybody can agree with and that still allows this, I cannot predict.
The opposite direction: T -> UnsafeCell<T>
Now, this is more interesting. The problem is that, as I said above, you must not have a &T live when writing through an UnsafeCell<T>. But what does "live" mean here? That's a hard question! In some of my models, this could be as weak as "a reference of that type exists somewhere and the lifetime is still active", i.e., it could have nothing to do with whether the reference is actually used. (That's useful because it lets us do more optimizations, like moving a load out of a loop even if we cannot prove that the loop ever runs -- which would introduce a use of an otherwise unused reference.) And since &T is Copy, you cannot even really get rid of such a reference either. So, if you have x: &T, then after let y: &UnsafeCell<T> = transmute(x), the old x is still around and its lifetime still active, so writing through y could well be UB.
I think you'd have to somehow restrict the aliasing that &T allows, very carefully making sure that nobody still holds such a reference. I'm not going to say "this is impossible" because people keep surprising me (especially in this community ;) but TBH I cannot think of a way to make this work. I'd be curious if you have an example though where you think this is reasonable.

What's the right way to have a thread-safe lazy-initialized possibly mutable value in Rust?

I have a struct that contains a field that is rather expensive to initialize, so I want to be able to do so lazily. However, this may be necessary in a method that takes &self. The field also needs to be able to modified once it is initialized, but this will only occur in methods that take &mut self.
What is the correct (as in idiomatic, as well as in thread-safe) way to do this in Rust? It seems to me that it would be trivial with either of the two constraints:
If it only needed to be lazily initialized, and not mutated, I could simply use lazy-init's Lazy<T> type.
If it only needed to be mutable and not lazy, then I could just use a normal field (obviously).
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.

The simplest solution is RwLock<Option<T>>.
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.
lazy-init uses tricky code because it guarantees lock-free access after creation. Lock-free is always a bit trickier.
Note that in Rust it's easy to tell whether something is tricky or not: tricky means using an unsafe block. Since you can use RwLock<Option<T>> without any unsafe block there is nothing for you to worry about.
A variant to RwLock<Option<T>> may be necessary if you want to capture a closure for initialization once, rather than have to pass it at each potential initialization call-site.
In this case, you'll need something like RwLock<SimpleLazy<T>> where:
enum SimpleLazy<T> {
Initialized(T),
Uninitialized(Box<FnOnce() -> T>),
}
You don't have to worry about making SimpleLazy<T> Sync as RwLock will take care of that for you.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string