How to convert RefMut<T> to Ref<T> - rust

I was wondering if it's possible to convert a RefMut<T> to a Ref<T> in Rust?
Some context: inside a function, I've used .borrow_mut() to access an item in a RefCell<T> mutably, and now want to return an immutable reference to it from the function. I believe I cannot use &T, as it would borrow from a temporary value that cannot be returned from the function, though correct me if I'm wrong.
If not, I'd love to know the technical reason why it's not possible.

If you just want to prevent mutation through the RefMut, you can hand out a shared reference to it, which "turns off" the DerefMut implementation (you won't have access to an exclusive reference).
But if you want to "convert" a RefMut to a Ref, then I'm afraid it's not possible. The only way is to drop the RefMut and then borrow again. I don't think this is impossible per se. You could imagine a method on RefMut that would change the reference counters and downgrade a RefMut to a Ref. But there is nothing like this in the current standard library.
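A minimal sketch of both options, assuming a RefCell<Vec<i32>> and a hypothetical helper named bump_then_read (neither comes from the question):

```rust
use std::cell::{Ref, RefCell};

// Hypothetical helper: mutate through a RefMut, then hand back a Ref.
fn bump_then_read(cell: &RefCell<Vec<i32>>) -> Ref<'_, Vec<i32>> {
    {
        let mut guard = cell.borrow_mut();
        guard.push(1);

        // Option 1: a shared reference to the guard only goes through Deref,
        // so nothing can be mutated through `read_only`.
        let read_only: &Vec<i32> = &guard;
        let _len = read_only.len();
    } // the RefMut is dropped here, releasing the mutable borrow

    // Option 2: borrow again, this time immutably, and return the Ref.
    cell.borrow()
}
```

The returned Ref keeps the cell immutably borrowed until the caller drops it, so a try_borrow_mut on the same cell fails in the meantime.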

Related

How does refcell interact with the borrow checker

I am trying to wrap my head around the RefCell type, not as a utility, but in terms of its implementation.
I understand the general idea: a struct that has mutable data inside of it and keeps track of how many active references to that data exist.
But how does it interact with the borrow checker to allow you to mutate the data multiple times?
Internally, it doesn't really - it uses unsafe code.
RefCell is a checked wrapper around UnsafeCell, which provides the interior mutability. Its get method returns a *mut pointer to the wrapped type, which can be cast to a &mut reference with an arbitrary lifetime. This, of course, is very unsafe, so it's up to the user of UnsafeCell to ensure there are no violations of the borrow checker rules.
Cell does this by not allowing references to the inner object. RefCell does this via essentially a single-threaded lock, keeping track of whether a reference exists and denying access if so. Mutex and RwLock are similar, but use thread-safe locks.
(As an aside, UnsafeCell IS compiler magic, but for a different reason - it's needed to tell the compiler that an object may change behind an immutable reference. But that doesn't have to do with the borrow checks so much as optimizations.)
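To make the "single-threaded lock" idea concrete, here is a toy sketch, not the real RefCell implementation. ToyCell and try_with_mut are invented names, and a plain boolean flag stands in for RefCell's borrow counter:

```rust
use std::cell::{Cell, UnsafeCell};

// Toy stand-in for RefCell: one flag instead of a borrow counter.
struct ToyCell<T> {
    borrowed: Cell<bool>,
    value: UnsafeCell<T>,
}

impl<T> ToyCell<T> {
    fn new(value: T) -> Self {
        ToyCell {
            borrowed: Cell::new(false),
            value: UnsafeCell::new(value),
        }
    }

    // Runs `f` with exclusive access, or returns None if the value is
    // already borrowed (the runtime equivalent of a borrow-check error).
    fn try_with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        if self.borrowed.replace(true) {
            return None; // someone else holds the "lock"
        }
        // SAFETY: the flag ensures no other reference handed out by this
        // method is alive while `f` runs.
        let result = f(unsafe { &mut *self.value.get() });
        self.borrowed.set(false);
        Some(result)
    }
}
```

Note that try_with_mut takes &self: as far as the borrow checker is concerned, only a shared borrow of the ToyCell exists, and the uniqueness of the inner &mut is enforced by the flag at runtime.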

Understanding usage of Rc<RefCell<SomeStruct>> in Rust

I'm looking at some code that uses
Rc<RefCell<SomeStruct>>
So I went out to read about the differences between Rc and RefCell:
Here is a recap of the reasons to choose Box, Rc, or RefCell:
Rc enables multiple owners of the same data; Box and RefCell have single owners.
Box allows immutable or mutable borrows checked at compile time; Rc allows only immutable borrows checked at compile time; RefCell allows immutable or mutable borrows checked at runtime.
Because RefCell allows mutable borrows checked at runtime, you can mutate the value inside the RefCell even when the RefCell is immutable.
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access? I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access?
By dereferencing. If you have a variable x of type Rc<...>, you can access the inner value using *x. In many cases this happens implicitly; for example you can call methods on x simply with x.method(...).
I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
The get_mut() method is probably more recent than the explanation stating that Rc only allows immutable borrows. Moreover, it only returns a mutable borrow if there currently is only a single owner of the inner value, i.e. if you currently wouldn't need Rc in the first place. As soon as there are multiple owners, get_mut() will return None.
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
RefCell will allow you to get mutable access even when multiple owners exist, and even if you only hold a shared reference to the RefCell. It will dynamically check at runtime that only a single mutable reference exists at any given time, and it will panic if you request a second, concurrent one (or return an error for the try_borrow methods, respectively). This functionality is not offered by Rc.
So in summary, Rc gives you shared ownership. The inner value has multiple owners, and reference counting makes sure the data stays alive as long as at least one owner still holds onto it. This is useful if your data doesn't have a clear single owner. RefCell gives you interior mutability, i.e. you can borrow the inner value dynamically at runtime, and modify it even with a shared reference. The combination Rc<RefCell<...>> gives you the combination of both – a value with multiple owners that can be borrowed mutably by any one of the owners.
For further details, you can read the relevant chapters of the Rust book:
Rc<T>, the Reference Counted Smart Pointer
RefCell<T> and the Interior Mutability Pattern
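The points above can be sketched as follows. This is only an illustration: SomeStruct is given a made-up counter field, and both function names are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Stand-in for the question's SomeStruct; the field is an assumption.
struct SomeStruct {
    counter: u32,
}

// Rc::get_mut only succeeds while there is exactly one owner.
fn get_mut_needs_sole_owner() -> (bool, bool) {
    let mut first = Rc::new(SomeStruct { counter: 0 });
    let while_sole_owner = Rc::get_mut(&mut first).is_some();
    let _second = Rc::clone(&first);
    let while_shared = Rc::get_mut(&mut first).is_some();
    (while_sole_owner, while_shared)
}

// With Rc<RefCell<..>>, every owner can mutate; conflicts are caught at runtime.
fn mutate_through_any_owner() -> (u32, bool) {
    let first = Rc::new(RefCell::new(SomeStruct { counter: 0 }));
    let second = Rc::clone(&first);

    first.borrow_mut().counter += 1;
    second.borrow_mut().counter += 1;

    // While a Ref is alive, a mutable borrow through any owner is refused.
    let reader = first.borrow();
    let conflict_detected = second.try_borrow_mut().is_err();
    (reader.counter, conflict_detected)
}
```

Using borrow_mut instead of try_borrow_mut in the conflicting case would panic rather than return an error.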
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
An Rc pointer allows you to have shared ownership. Since ownership is shared, the value owned by an Rc pointer is immutable.
The RefCell smart pointer represents single ownership over the data it holds, much like the Box smart pointer. The difference is that Box enforces the borrowing rules at compile time, whereas RefCell enforces them at run time.
If you combine them, you get a smart pointer that can have multiple owners, each of which can borrow the value mutably at run time. A perfect use case is a doubly linked list in Rust.
use std::cell::RefCell;
use std::rc::Rc;
struct LinkedList<T> {
    head: Pointer<T>,
    tail: Pointer<T>,
}
struct Node<T> {
    element: T,
    next: Pointer<T>,
    prev: Pointer<T>,
}
// We need multiple owners who can mutate the data.
// It is an Option because "end.next" would be None.
type Pointer<T> = Option<Rc<RefCell<Node<T>>>>;
With three nodes "front", "middle" and "end", both "front" and "end" point to "middle", and both can mutate it. Imagine you need to insert a new node after "front": you will need to mutate "front.next". So in a doubly linked list you need multiple ownership and mutability at the same time.
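Here is one way the list could be filled in, as a sketch only: new, push_back and to_vec are hypothetical methods that are not part of the answer. Strong Rc links in both directions form reference cycles, so real implementations often use Weak for prev; that refinement is left out here.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Pointer<T> = Option<Rc<RefCell<Node<T>>>>;

struct Node<T> {
    element: T,
    next: Pointer<T>,
    prev: Pointer<T>,
}

struct LinkedList<T> {
    head: Pointer<T>,
    tail: Pointer<T>,
}

impl<T> LinkedList<T> {
    fn new() -> Self {
        LinkedList { head: None, tail: None }
    }

    // Appends a node; the old tail is mutated through a shared Rc handle.
    fn push_back(&mut self, element: T) {
        let node = Rc::new(RefCell::new(Node {
            element,
            next: None,
            prev: self.tail.clone(),
        }));
        match self.tail.take() {
            Some(old_tail) => {
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => {
                self.head = Some(Rc::clone(&node));
            }
        }
        self.tail = Some(node);
    }

    // Walks the list front to back, cloning each element.
    fn to_vec(&self) -> Vec<T>
    where
        T: Clone,
    {
        let mut out = Vec::new();
        let mut cursor = self.head.clone();
        while let Some(node) = cursor {
            out.push(node.borrow().element.clone());
            cursor = node.borrow().next.clone();
        }
        out
    }
}
```

After pushing three elements, the middle node is owned by both its neighbours (front.next and end.prev), which is exactly the shared, mutable ownership that Rc<RefCell<...>> provides.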

Why is transmuting &T to &mut T Undefined Behaviour?

I want to reinterpret an immutable reference to a mutable reference (in an unsafe block) and be responsible for the safety checks on my own, yet it appears I cannot use mem::transmute() to do so.
let map_of_vecs: HashMap<usize, Vec<_>> = ...;
let vec = &map_of_vecs[&2];
// obtain a mutable reference to vec here
I do not want to wrap the Vecs into Cells because that would affect all other areas of code that use map_of_vecs and I only need mutability in one line.
I do not have mutable access to map_of_vecs.
The Rust optimiser makes the assumption that &mut T references are unique. For example, it might deduce that a particular piece of memory can be reused because a mutable reference to that memory exists but is never accessed again.
However, if you transmute a &T to a &mut T then you are able to create multiple mutable references to the same data. If the compiler makes this assumption, you could end up dereferencing a value that has been overwritten with something else.
This is just one example of how the compiler might make use of the assumption that mutable references are unique. In fact, the compiler is free to use this information in any way it sees fit — which could (and likely will) change from version to version.
Even if you think you have guaranteed that the reference isn't aliased, you can't always guarantee that users of your code won't create more references. Even if you think you can be sure of that, the existence of references is extremely subtle and it's very easy to miss one. For example when you call a method that takes &self, that's a reference.
The Rust compiler annotates &T function parameters with the LLVM noalias and readonly attributes (provided that T does not contain any UnsafeCell parts). The noalias attribute tells LLVM that the memory behind this pointer may only be written to through this pointer (and not through any other pointers), and the readonly attribute tells LLVM that it can't be written to through this pointer (but possibly other pointers). In combination, the two attributes allow the LLVM optimiser to assume the memory is not changed at all during the execution of this function, and the code can be optimised based on this assumption. The optimiser may reorder instructions or remove code in a way that is only safe to do if you actually stick to this contract.
Another way the conversion can lead to undefined behaviour is for statics: immutable statics without UnsafeCells will be placed into read-only memory, so if you actually write to them, your code will segfault.
For parameters with UnsafeCells the compiler does not emit the readonly attribute, and statics containing an UnsafeCell are placed into writable memory.
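For comparison, the sanctioned way to mutate through a shared reference goes through UnsafeCell, usually via a safe wrapper such as RefCell. A small sketch, assuming the map could store RefCell<Vec<i32>> values after all (which the question wanted to avoid) and using a hypothetical helper named push_to_bucket:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Only a shared reference to the map is needed: the mutation goes through
// the RefCell, which contains an UnsafeCell, so no aliasing rule is broken.
fn push_to_bucket(
    map: &HashMap<usize, RefCell<Vec<i32>>>,
    key: usize,
    value: i32,
) -> bool {
    match map.get(&key) {
        Some(bucket) => {
            bucket.borrow_mut().push(value);
            true
        }
        None => false,
    }
}
```

Because the value type contains an UnsafeCell, the compiler no longer assumes the memory behind the shared reference is unchanged, which is what makes this version well-defined where the transmute is not.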

Understand smart pointers in Rust

I am a newbie to Rust and am writing to understand "smart pointers" in Rust. I have a basic understanding of how smart pointers work in C++ and have been using them for memory management for a few years. But to my great surprise, Rust also provides such utilities explicitly.
Because from a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointer has been automatically wrapped with a smart pointer, which seems very reasonable. Then why do we still need Box<T>, Rc<T>, and Ref<T>, as described in this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be appreciated a lot. Thanks.
You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
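As an illustration of the "no known size" case, a trait object can only be owned behind a pointer such as Box. The Shape trait and the two structs below are made up for this sketch:

```rust
// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// `dyn Shape` has no size known at compile time, so each value is owned
// through a Box; the Box itself has a fixed size and can live in a slice.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```

Each Box still owns its shape: when the collection holding the boxes goes out of scope, every shape is destroyed with it.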
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
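The "non-unique but still mutable" case from the edit can be seen with a Mutex. In this small sketch the function name bump is an assumption; it receives only a shared reference and still changes the value:

```rust
use std::sync::Mutex;

// Takes a shared (non-unique) borrow, yet mutates the protected value.
// The lock guard provides the uniqueness that `&` alone does not.
fn bump(counter: &Mutex<u32>) -> u32 {
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    *guard
}
```

Several shared references to the same Mutex can coexist, and each call to bump through any of them observes the updates made through the others.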
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.
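A short sketch of "disposed by the last owner". The function name is hypothetical, and a Weak handle (which does not keep the value alive) is used here only to observe when the value goes away:

```rust
use std::rc::Rc;

// Returns (number of owners, alive after first drop, alive after second drop).
fn last_owner_frees() -> (usize, bool, bool) {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first);

    // A Weak handle observes the value without owning it.
    let observer = Rc::downgrade(&first);
    let owners = Rc::strong_count(&first);

    drop(first);
    let alive_after_first = observer.upgrade().is_some();

    drop(second);
    let alive_after_second = observer.upgrade().is_some();

    (owners, alive_after_first, alive_after_second)
}
```

Dropping the first owner leaves the String alive because a second owner remains; only dropping the last one destroys it.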
The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that has been removed since some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced @T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

What's the rule of thumb when dealing with passing args in Rust?

I read a couple of articles and it's still unclear to me. It looks like T and &T are kind of interchangeable as long as the compiler doesn't show any errors. But after reading the official docs, I want to pass everything by reference to take advantage of borrowing.
Could you provide any simple rule about passing an arg as T against &T when T is an object/string? E.g., in C++ there're 3 options:
T – copy the value, can't mutate the current value
&T – don't create a copy, can mutate the current value
const &T – don't create a copy, can't mutate the current value
E.g., is it a good idea to pass by T if I want to deallocate T after it goes out of scope in my child function (the function I'm passing T to), and use &T if I want to use it in my child function in read-only mode and then continue to use it in my current (parent) function?
Thanks!
These are the rules I personally use (in order).
1. Pass by value (T) if the parameter has a generic type and the trait(s) that this generic type implements all take &self or &mut self but there is a blanket impl for &T or &mut T (respectively) for all types T that implement that trait (or these traits). For example, in std::io::Write, all methods take &mut self, but there is a blanket impl impl<'a, W: Write + ?Sized> Write for &'a mut W provided by the standard library. This means that although you accept a T (where T: Write) by value, one can pass a &mut T because &mut T also implements Write.
2. Pass by value (T) if you must take ownership of the value (for example, because you pass it to another function/method that takes it by value, or because the alternatives would require potentially expensive clones).
3. Pass by mutable reference (&mut T) if you must mutate the object (by calling other functions/methods that take the object by mutable reference, or just by overwriting it and you want the caller to see the new value) but do not need to take ownership of it.
4. Pass by value (T) if the type is Copy and is small (my criterion for small is size_of::<T>() <= size_of::<usize>() * 2, but other people might have slightly different criteria). The primitive integer and floating-point types are examples of such types. Passing values of these types by reference would create an unnecessary indirection in memory, so the caller will have to perform an additional machine instruction to read it. When size_of::<T>() <= size_of::<usize>(), you're usually not saving anything by passing the value by reference because T and &T will usually be both passed in a single register (if the function has few enough parameters).
5. Pass by shared reference (&T) otherwise.
In general, prefer passing by shared reference when possible. This avoids potentially expensive clones when the type is large or manages resources other than memory, and gives the most flexibility to the caller in how the value can be used after the call.
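One small function per rule, in the same order as the list above. All function names here are made up for illustration:

```rust
use std::io::{self, Write};

// First rule: generic over Write and taken by value; thanks to the blanket
// impl for &mut W, callers can still pass `&mut writer` and keep the writer.
fn write_greeting<W: Write>(mut out: W) -> io::Result<()> {
    out.write_all(b"hi")
}

// Second rule: takes ownership, so the buffer is reused instead of cloned.
fn into_shouted(mut text: String) -> String {
    text.make_ascii_uppercase();
    text
}

// Third rule: mutates in place; the caller sees the change and keeps ownership.
fn append_suffix(text: &mut String) {
    text.push('!');
}

// Fourth rule: a small Copy type is simply passed by value.
fn double(x: u32) -> u32 {
    x * 2
}

// Fifth rule: read-only access through a shared reference.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|word| word.len()).sum()
}
```

For instance, calling write_greeting(&mut buf) twice on the same Vec<u8> works because the Vec itself is never moved into the function.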
E.g., is it a good idea to pass by T if I want to deallocate T after it goes out of scope in my child function (the function I'm passing T to)
You'd better have a good reason for that! If you ever decide that you actually need to use the T later in the caller, then you'll have to change the callee's signature and update all call sites (because unlike in C++, where going from T to const T& is mostly transparent, going from T to &T in Rust is not: you must add a & in front of the argument in all call sites).
I recommend you use Clippy if you're not already using it. Clippy has a lint that can notify you if you write a function that takes an argument by value but the function doesn't need to take ownership of it (this lint used to warn by default, but it no longer does 😞, so you have to enable it manually with #[warn(clippy::needless_pass_by_value)]).
