Rust multiple levels of indirection - performance impact?

I've been experimenting with everyone's favorite new thing, Rust. I'm coming from a C background, so I think this may be what is causing my confusion here. I've tried to find a clear, definitive answer online but could not. Here's my issue, simplified:
fn main() {
    let v: Vec<&str> = vec!["abc", "def"];
    let z: Vec<&&str> = v.iter().filter(|e| e.starts_with("a")).collect(); // type of `e` is &&&str!
    // do something with z
}
My issue is with the types of z and e (within the closure). I understand the need for the additional layer of reference for the ownership system. I also understand I can work around this using things like copied() or into_iter() (for z's type), or by using |&e|.
However, I am wondering if there's a performance impact here. Even if I used copied(), the type of e in |e| would still be a double reference. If references and pointers map 1:1, this seems like a waste. But references and pointers do not need (in this case at least) to map 1:1: the compiler could track a reference for the ownership rules without emitting an extra pointer dereference.
So here's my question: do multiple references translate 1:1 to pointers? Are there optimizations? In this case (e, for example), am I going through two levels of indirection?

Do multiple references translate 1:1 to pointers?
Yes.
But the compiler can optimize them out, as usual, and for iterators it usually does. Once everything is inlined, the compiler can turn the iterator chain into a simple loop, so this is rarely a problem. If it is, you can always use copied(), or even switch to a plain loop. This is part of Rust's zero-cost abstraction story.
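As a rough sketch of the alternatives mentioned in the question and answer, the three closure styles can be compared side by side. The function names here are hypothetical, and whether the optimizer actually removes the extra indirection depends on inlining, so it is worth measuring rather than assuming.

```rust
// Hypothetical helpers; none of these names come from the original post.

/// Keeps references into the slice: inside `filter`, `e` is `&&&str`.
fn filter_by_ref<'a>(v: &'a [&'a str]) -> Vec<&'a &'a str> {
    v.iter().filter(|e| e.starts_with('a')).collect()
}

/// Strips one reference layer in the closure pattern: `e` is `&&str`.
fn filter_with_pattern<'a>(v: &'a [&'a str]) -> Vec<&'a &'a str> {
    v.iter().filter(|&e| e.starts_with('a')).collect()
}

/// Copies each `&str` out first, so the result holds plain `&str` values.
fn filter_copied<'a>(v: &[&'a str]) -> Vec<&'a str> {
    v.iter().copied().filter(|e| e.starts_with('a')).collect()
}

fn main() {
    let v: Vec<&str> = vec!["abc", "def"];
    let by_ref = filter_by_ref(&v);
    let by_pattern = filter_with_pattern(&v);
    let copied = filter_copied(&v);
    // All three select the same element; only the reference depth differs.
    assert_eq!(by_ref, vec![&"abc"]);
    assert_eq!(by_pattern, vec![&"abc"]);
    assert_eq!(copied, vec!["abc"]);
}
```

The copied() variant is usually the most convenient when the elements are cheap Copy values such as &str, since the caller no longer has to deal with `&&str`.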


Why can I access struct fields by a variable and the reference to that variable in the same way? (Rust)

If I print x.passwd, I will get 234
If I print y.passwd, I will get 234 too. But how is that possible, since y = &x (essentially storing the address of x)? Shouldn't I have to dereference in order to access passwd, as in (*y).passwd?
I was solving a LeetCode problem where a node's val field was accessed directly through a reference without dereferencing, and that made me more confused about references.
On the left-hand side we have Option<Box>, while on the right we have &Option<Box>. How can we perform Some(node) = node?
PS: I hope someone can explain with a memory diagram of what is actually happening. And if anyone has good resources for understanding references and borrowing, please let me know. I have been referring to the docs and the Let's Get Rusty YouTube channel, but references are still a little confusing for me.
In Rust, the . operator will automatically dereference as necessary to find something with the right name. In y.passwd, y is a reference, but references don't have any named fields, so the compiler tries looking at the type of the referent — Cred — and does find the field named passwd.
The same thing works with methods, but there's some more to it — in addition to dereferencing, the compiler will also try adding an & to find a matching method. That way you don't have to write the awkward (&foo).bar() just to call a method that takes &self, similar to how you've already found that you don't have to write (*y).passwd.
In general, you rarely (but not never) have to worry about whether or not something is a reference when using the . operator.
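The auto-dereferencing described above can be sketched as follows. Only the field name passwd, the type name Cred and the value 234 come from the question; the struct layout and the method passwd_plus_one are assumptions made up for illustration.

```rust
// Hypothetical definition: the question does not show the real struct.
struct Cred {
    passwd: u32,
}

impl Cred {
    // Takes `&self`, so calling it on an owned value relies on auto-ref.
    fn passwd_plus_one(&self) -> u32 {
        self.passwd + 1
    }
}

fn main() {
    let x = Cred { passwd: 234 };
    let y = &x;
    let z = &y; // a reference to a reference

    // Field access: `.` dereferences as many times as needed.
    assert_eq!(x.passwd, 234);
    assert_eq!(y.passwd, 234);
    assert_eq!((*y).passwd, 234); // explicit form of the line above
    assert_eq!(z.passwd, 234);

    // Method calls: auto-ref on `x`, auto-deref on `z`.
    assert_eq!(x.passwd_plus_one(), 235);
    assert_eq!((&x).passwd_plus_one(), 235); // explicit form
    assert_eq!(z.passwd_plus_one(), 235);
}
```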

Rust Box vs non-box

Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.
Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.
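A minimal sketch of that runtime enforcement, assuming a Vec<i32> inside the RefCell and a hypothetical helper try_push that reports whether the mutable borrow was granted:

```rust
use std::cell::RefCell;

/// Hypothetical helper: pushes `value` if a mutable borrow is available,
/// and returns whether the push happened.
fn try_push(cell: &RefCell<Vec<i32>>, value: i32) -> bool {
    match cell.try_borrow_mut() {
        Ok(mut v) => {
            v.push(value);
            true
        }
        // Another borrow is currently alive, so the rule is enforced here.
        Err(_) => false,
    }
}

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);

    // No outstanding borrows: the mutable borrow succeeds.
    assert!(try_push(&cell, 4));

    {
        // While a shared borrow is alive, a mutable one is refused at runtime.
        let reader = cell.try_borrow().expect("no writer is active");
        assert!(!try_push(&cell, 5));
        assert_eq!(reader.len(), 4);
    } // `reader` is dropped here, releasing the shared borrow

    assert!(try_push(&cell, 5));
    assert_eq!(*cell.borrow(), vec![1, 2, 3, 4, 5]);
}
```

The non-try variants borrow() and borrow_mut() apply the same check but panic instead of returning an Err.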

Erroneous mutable borrow (E0502) when trying to remove and insert into a HashMap

I am a beginner to Rust and tried using a HashMap<u64, u64>. I want to remove an element and insert it with a modified value:
let mut r = HashMap::new();
let mut i = 2;
...
if r.contains_key(&i) {
    let v = r.get(&i).unwrap();
    r.remove(&i);
    r.insert(i, v+1);
}
Now, the borrow checker complains that r is borrowed immutably, then mutably, and then immutably again in the three lines of the if-block.
I don't understand what's going on...I guess since the get, remove and insert methods have r as implicit argument, it is borrowed in the three calls. But why is it a problem that this borrow in the remove call is mutable?
But why is it a problem that this borrow in the remove call is mutable?
The problem is the overlap: Rust allows either any number of immutable borrows or a single mutable borrow at a time, and the two kinds cannot overlap.
The issue here is that v is a reference to the map contents, meaning the existence of v requires borrowing the map until v stops being used. Which thus overlaps with both remove and insert calls, and forbids them.
Now there are various ways to fix this. Since in this specific case you're using u64 which is Copy, you can just dereference and it'll copy the value you got from the map, removing the need for a borrow:
if r.contains_key(&i) {
    let v = *r.get(&i).unwrap();
    r.remove(&i);
    r.insert(i, v+1);
}
This is limited in its flexibility though, as it only works for Copy types[0].
In this specific case it probably doesn't matter that much, because Copy is cheap, but it would still make more sense to use the advanced APIs Rust provides, for safety, for clarity, and because you'll eventually need them for less trivial types.
The simplest is to just use get_mut: where get returns an Option<&V>, get_mut returns an Option<&mut V>, meaning you can... update the value in-place, you don't need to get it out, and you don't need to insert it back in (nor do you need a separate lookup but you already didn't need that really):
if let Some(v) = r.get_mut(&i) {
    *v += 1;
}
This is more than sufficient for your use case.
The second option is the Entry API, and the thing which will ruin every other hashmap API for you forever. I'm not joking, every other language becomes ridiculously frustrating, you may want to avoid clicking on that link (though you will eventually need to learn about it anyway, as it solves real borrowing and efficiency issues).
It doesn't really show its stuff here because your use case is simple and get_mut more than does the job, but anyway, you could write the increment as:
r.entry(i).and_modify(|v| *v += 1);
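Where the Entry API starts to pay off is the "update or insert" case, which goes beyond what the original snippet does. The following is a sketch under that assumption: the helper bump and the choice of 1 as the starting value are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: increments the value for `key`,
/// or inserts 1 if the key is absent. One lookup either way.
fn bump(map: &mut HashMap<u64, u64>, key: u64) {
    map.entry(key).and_modify(|v| *v += 1).or_insert(1);
}

fn main() {
    let mut r: HashMap<u64, u64> = HashMap::new();
    r.insert(2, 10);

    bump(&mut r, 2); // present: 10 becomes 11
    bump(&mut r, 7); // absent: inserted as 1

    assert_eq!(r.get(&2), Some(&11));
    assert_eq!(r.get(&7), Some(&1));
}
```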
Incidentally in most languages (and certainly in Rust as well) when you insert an item in a hashmap, the old value gets evicted if there was one. So the remove call was already redundant and wholly unnecessary.
And pattern-matching an Option (such as that returned by HashMap::get) is generally safer, cleaner, and faster than painstakingly and procedurally doing all the low-level bits.
So even without using advanced APIs, the original code can be simplified to:
if let Some(&v) = r.get(&i) {
    r.insert(i, v+1);
}
I'd still recommend the get_mut version over that as it is simpler, avoids the double lookup, and works on non-Copy types, but YMMV.
Also, unlike in most languages, Rust's HashMap::insert returns the old value (if any). That is not a concern here but can be useful in some cases.
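Both points about insert (it overwrites without a prior remove, and it hands back the evicted value) can be seen in a short sketch. The helper name increment_existing is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: increments the value for `key` if present and
/// returns the previous value; leaves the map untouched otherwise.
fn increment_existing(map: &mut HashMap<u64, u64>, key: u64) -> Option<u64> {
    // Copy the u64 out so no borrow of the map outlives this line.
    let current = *map.get(&key)?;
    // No `remove` needed: `insert` overwrites and returns the old value.
    map.insert(key, current + 1)
}

fn main() {
    let mut r: HashMap<u64, u64> = HashMap::new();
    assert_eq!(r.insert(2, 10), None); // key was absent

    assert_eq!(increment_existing(&mut r, 2), Some(10));
    assert_eq!(r.get(&2), Some(&11));

    // Missing keys are not created.
    assert_eq!(increment_existing(&mut r, 3), None);
    assert_eq!(r.len(), 1);
}
```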
[0] as well as Clone ones, by explicitly calling .clone(), that may or may not translate to a significant performance impact depending on the type you're cloning.
The problem is that you keep an immutable reference when getting v. Since it is a u64, just clone it explicitly (or dereference it, as u64 is Copy) so there is no more reference involved:
let v = r.get(&i).unwrap().clone();

Why do immutable references to copy types in rust exist?

So I just started learning rust (first few chapters of "the book") and am obviously quite a noob. I finished the ownership-basics chapter (4) and wrote some test programs to make sure I understood everything. I seem to have the basics down but I asked myself why immutable references to copy-types are even possible. I will try to explain my thoughts with examples.
I thought that maybe you would want to store a reference to a copy-type so you can check its value later instead of holding a copy of the old value, but this can't be it, since the underlying value can't be changed for as long as it is borrowed.
The most basic example of this would be this code:
let mut x = 10; // push i32
let x_ref = &x; // push immutable reference to x
// x = 100; change x which is disallowed since it's borrowed currently
println!("{}", x_ref); // do something with the reference since you want the current value of x
The only reason for this I can currently think of (with my current knowledge) is that they just exist so you can call generic methods which require references (like cmp) with them.
This code demonstrates this:
let x = 10; // push i32
// let ordering = 10.cmp(x); try to compare it but you can't since cmp wants a reference
let ordering = 10.cmp(&x); // this works since it's now a reference
So, is that the only reason you can create immutable references to copy-types?
Disclaimer:
I don't see "Just continue reading the book" as a valid answer. However, I fully understand if you say something like "Yes, you need those for this and this use-case (optional example), it will be covered in chapter X." I hope you understand what I mean :)
EDIT:
Maybe worth mentioning, I'm a C# programmer and not new to programming itself.
EDIT 2:
I don't know if this is technically a duplicate of this question but I do not fully understand the question and the answer so I hope for a more simple answer understandable by a real noob.
An immutable reference to a Copy-type is still "an immutable reference". The code that gets passed the reference can't change the original value. It can make a (hopefully) trivial copy of that value, but it can still only ever change that copy after doing so.
That is, the original owner of the value is ensured that - while receivers of the reference may decide to make a copy and change that - the state of whatever is referenced can't ever change. If the receiver wants to change the value, it can feel free; nobody else is going to see it, though.
Immutable references to primitives are no different. Although primitives are Copy everywhere, you probably already have an intuition for what "an immutable reference" means semantically for them. For instance:
fn print_the_age(age: &i32) { ... }
That function could make a copy via *age and change it. But the caller will not see that change and it does not make much sense to do so in the first place.
Update due to comment: There is no advantage per se, at least as far as primitives are concerned (larger types may be costly to copy). It does boil down to the semantic relationship between the owner of the i32 and the receiver: "Here is a reference, it is guaranteed to not change while you have that reference, I - the owner - can't change or move or deallocate it, and there is no other thread, including myself, that could possibly do that".
Consider where the reference is coming from: If you receive an &i32, wherever it is coming from can't change it and can't deallocate it. The i32 may be part of a larger type, which - due to handing out a reference - can't be moved, changed or deallocated; the receiver is guaranteed of that. It's hard to say there is an advantage per se in here; it might be advantageous to communicate more detailed type (and lifetime!) relationships this way.
They're very useful, because they can be passed to generic functions that expect a reference:
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
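If immutable references to Copy types were forbidden, we would need two versions (written here with a T: !Copy negative bound, which is not valid Rust today and is used only for illustration):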
fn map_vec_own<T: !Copy, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
fn map_vec_copy<T: Copy, U>(v: &Vec<T>, f: impl Fn( T) -> U) -> Vec<U> {...}
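To make the single generic version concrete, here is one possible body for map_vec (the original elides it with {...}, so this implementation is an assumption), called once with a Copy element type and once with a non-Copy one:

```rust
/// One possible implementation of the signature above: applies `f` to a
/// reference to every element, whether or not `T` is `Copy`.
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {
    v.iter().map(f).collect()
}

fn main() {
    // i32 is Copy, yet the closure still receives `&i32`.
    let numbers: Vec<i32> = vec![1, 2, 3];
    assert_eq!(map_vec(&numbers, |n| n * 2), vec![2, 4, 6]);

    // String is not Copy; the very same function handles it.
    let names = vec![String::from("ab"), String::from("cde")];
    assert_eq!(map_vec(&names, |s| s.len()), vec![2, 3]);
}
```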
Immutable references are, naturally, used to provide access to the referenced data. For instance, you could have loaded a dictionary and have multiple threads reading from it at the same time, each using their own immutable reference. Because the references are immutable those threads will not corrupt that common data.
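The dictionary scenario might look roughly like this. It is only a sketch: the helper parallel_lookup, the use of std::thread::scope to spawn the readers, and the sample data are all assumptions, since the answer describes the idea without code.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical helper: looks up each word on its own thread. Every thread
/// shares the same immutable borrow of `dict`; nothing is copied or locked.
fn parallel_lookup(dict: &HashMap<&str, u32>, words: &[&str]) -> Vec<Option<u32>> {
    thread::scope(|s| {
        let handles: Vec<_> = words
            .iter()
            .map(|&w| s.spawn(move || dict.get(w).copied()))
            .collect();
        // Join in spawn order so results line up with `words`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let dict: HashMap<&str, u32> = HashMap::from([("one", 1), ("two", 2)]);
    let found = parallel_lookup(&dict, &["two", "three", "one"]);
    assert_eq!(found, vec![Some(2), None, Some(1)]);
}
```

Trying to mutate dict inside one of the spawned closures would be rejected at compile time, which is the guarantee the paragraph above relies on.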
Using only mutable references, you can't be sure of that so you need to make full copies. Copying data takes time and space, which are always limited. The primary question for performance tends to be if your data fits in CPU cache.
I'm guessing you were thinking of "copy" types as ones that fit in the same space as the reference itself, i.e. sizeof(type) <= sizeof(type*). Rust's Copy trait indicates data that can be safely copied bit for bit, no matter the size. These are orthogonal concepts; for instance, a pointer might not be safely copied without adjusting a reference count, or an array might be copyable but take gigabytes of memory. This is why Rc<T> has the Clone trait, not Copy.
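Both halves of that point can be sketched in a few lines. The type Big and the two helper functions are hypothetical names chosen for illustration; the 4096-byte size is arbitrary.

```rust
use std::rc::Rc;

// Copy says "a bitwise copy is valid", not "the value is small":
// this struct is far larger than a pointer and is still Copy.
#[derive(Clone, Copy)]
struct Big {
    data: [u8; 4096],
}

/// Takes `Big` by value; the caller keeps its own copy.
fn sum_bytes(b: Big) -> u32 {
    b.data.iter().map(|&x| u32::from(x)).sum()
}

/// Clones the `Rc` and reports the strong count while the clone is alive.
fn count_while_cloned<T>(rc: &Rc<T>) -> usize {
    let _extra = Rc::clone(rc); // explicit: bumps the reference count
    Rc::strong_count(rc)
}

fn main() {
    let big = Big { data: [1; 4096] };
    assert_eq!(sum_bytes(big), 4096);
    assert_eq!(sum_bytes(big), 4096); // `big` was copied, not moved

    let shared = Rc::new(String::from("shared"));
    assert_eq!(Rc::strong_count(&shared), 1);
    assert_eq!(count_while_cloned(&shared), 2);
    assert_eq!(Rc::strong_count(&shared), 1); // the clone was dropped
}
```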

Is it safe and defined behavior to transmute between a T and an UnsafeCell<T>?

A recent question was looking for the ability to construct self-referential structures. In discussing possible answers for the question, one potential answer involved using an UnsafeCell for interior mutability and then "discarding" the mutability through a transmute.
Here's a small example of such an idea in action. I'm not deeply interested in the example itself, but it's just enough complication to require a bigger hammer like transmute as opposed to just using UnsafeCell::new and/or UnsafeCell::into_inner:
use std::{
    cell::UnsafeCell, mem, rc::{Rc, Weak},
};

// This is our real type.
struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}

fn initialize() -> Rc<ReallyImmutable> {
    // This mirrors ReallyImmutable but we use `UnsafeCell`
    // to perform some initial interior mutation.
    struct NotReallyImmutable {
        value: i32,
        myself: Weak<UnsafeCell<NotReallyImmutable>>,
    }

    let initial = NotReallyImmutable {
        value: 42,
        myself: Weak::new(),
    };

    // Without interior mutability, we couldn't update the `myself` field
    // after we've created the `Rc`.
    let second = Rc::new(UnsafeCell::new(initial));

    // Tie the recursive knot
    let new_myself = Rc::downgrade(&second);

    unsafe {
        // Should be safe as there can be no other accesses to this field
        (&mut *second.get()).myself = new_myself;

        // No one outside of this function needs the interior mutability
        // TODO: Is this call safe?
        mem::transmute(second)
    }
}

fn main() {
    let v = initialize();
    println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value))
}
This code appears to print out what I'd expect, but that doesn't mean that it's safe or using defined semantics.
Is transmuting from a UnsafeCell<T> to a T memory safe? Does it invoke undefined behavior? What about transmuting in the opposite direction, from a T to an UnsafeCell<T>?
(I am still new to SO and not sure if "well, maybe" qualifies as an answer, but here you go. ;)
Disclaimer: The rules for these kinds of things are not (yet) set in stone. So, there is no definitive answer yet. I'm going to make some guesses based on (a) what kinds of compiler transformations LLVM does/we will eventually want to do, and (b) what kind of models I have in my head that would define the answer to this.
Also, I see two parts to this: The data layout perspective, and the aliasing perspective. The layout issue is that NotReallyImmutable could, in principle, have a totally different layout than ReallyImmutable. I don't know much about data layout, but with UnsafeCell becoming repr(transparent) and that being the only difference between the two types, I think the intent is for this to work. You are, however, relying on repr(transparent) being "structural" in the sense that it should allow you to replace things in larger types, which I am not sure has been written down explicitly anywhere. Sounds like a proposal for a follow-up RFC that extends the repr(transparent) guarantees appropriately?
As far as aliasing is concerned, the issue is breaking the rules around &T. I'd say that, as long as you never have a live &T around anywhere when writing through the &UnsafeCell<T>, you are good -- but I don't think we can guarantee that quite yet. Let's look in more detail.
Compiler perspective
The relevant optimizations here are the ones that exploit &T being read-only. So if you reordered the last two lines (transmute and the assignment), that code would likely be UB as we may want the compiler to be able to "pre-fetch" the value behind the shared reference and re-use that value later (i.e. after inlining this).
But in your code, we would only emit "read-only" annotations (noalias in LLVM) after the transmute comes back, and the data is indeed read-only starting there. So, this should be good.
Memory models
The "most aggressive" of my memory models essentially asserts that all values are always valid, and I think even that model should be fine with your code. &UnsafeCell is a special case in that model where validity just stops, and nothing is said about what lives behind this reference. The moment the transmute returns, we grab the memory it points to and make it all read-only, and even if we did that "recursively" through the Rc (which my model doesn't, but only because I couldn't figure out a good way to make it do so) you'd be fine as you don't mutate any more after the transmute. (As you may have noticed, this is the same restriction as in the compiler perspective. The point of these models is to allow compiler optimizations, after all. ;)
(As a side-note, I really wish miri was in better shape right now. Seems I have to try and get validation to work again in there, because then I could tell you to just run your code in miri and it'd tell you if that version of my model is okay with what you are doing :D )
I am thinking about other models currently that only check things "on access", but haven't worked out the UnsafeCell story for that model yet. What this example shows is that the model may have to contain ways for a "phase transition" of memory first being UnsafeCell, but later having normal sharing with read-only guarantees. Thanks for bringing this up, that will make for some nice examples to think about!
So, I think I can say that (at least from my side) there is the intent to allow this kind of code, and doing so does not seem to prevent any optimizations. Whether we'll actually manage to find a model that everybody can agree with and that still allows this, I cannot predict.
The opposite direction: T -> UnsafeCell<T>
Now, this is more interesting. The problem is that, as I said above, you must not have a &T live when writing through an UnsafeCell<T>. But what does "live" mean here? That's a hard question! In some of my models, this could be as weak as "a reference of that type exists somewhere and the lifetime is still active", i.e., it could have nothing to do with whether the reference is actually used. (That's useful because it lets us do more optimizations, like moving a load out of a loop even if we cannot prove that the loop ever runs -- which would introduce a use of an otherwise unused reference.) And since &T is Copy, you cannot even really get rid of such a reference either. So, if you have x: &T, then after let y: &UnsafeCell<T> = transmute(x), the old x is still around and its lifetime still active, so writing through y could well be UB.
I think you'd have to somehow restrict the aliasing that &T allows, very carefully making sure that nobody still holds such a reference. I'm not going to say "this is impossible" because people keep surprising me (especially in this community ;) but TBH I cannot think of a way to make this work. I'd be curious if you have an example though where you think this is reasonable.
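For the specific self-referential Rc in the question, there is also a way to sidestep the transmute entirely. This does not settle whether the transmute is defined behavior; it is a different technique, using the standard library's Rc::new_cyclic, shown here as a sketch that reuses the question's ReallyImmutable type.

```rust
use std::rc::{Rc, Weak};

struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}

fn initialize() -> Rc<ReallyImmutable> {
    // `new_cyclic` passes the closure a `Weak` that points at the allocation
    // being built. Upgrading it inside the closure would yield `None`,
    // because the value does not exist yet.
    Rc::new_cyclic(|weak| ReallyImmutable {
        value: 42,
        myself: weak.clone(),
    })
}

fn main() {
    let v = initialize();
    let via_weak = v.myself.upgrade().map(|inner| inner.value);
    assert_eq!(via_weak, Some(42));
    println!("{} -> {:?}", v.value, via_weak);
}
```

No UnsafeCell, no unsafe block and no layout assumptions are involved, so none of the aliasing questions discussed above arise for this particular case.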
