Rust E0382 - value used here after move - rust

I am new to Rust and is really struggling with way to write code the Rust way. I understand its rules to enforce memory correctness, however I cannot figure out the changes required to comply in code.
I have created a Tree like object from the json structure recieved from the application.
I am trying to create two operations on tree,
Get the leaves of tree
Get the mapping of parent -> children in a map
The high level code looks like this
fn rename_workspaces(conn: Connection) {
let i3_info = I3Info::new(conn);
let _leaves = i3_info.get_leaves();
let _parent_child = i3_info.dfs_parent_child();
}
However, rust is complaining that i3_info variable has been used after the move. I understand its complaint, however, I cannot figure out what should be the correct Rust way to solve it.
Please help me to figure out the change in thinking required to solve this. This is important, because my application really need to perform these calculations on the tree structure multiple times.
Interesting thing is , I am not really mutating the structure, just iterating over it and returning the new / mutated structure from the function.
Source link: https://github.com/madhur/i3-auto-workspace-icons-rust/blob/main/src/main.rs

The problem is that you have declared the methods of I3Info such that they consume (move) the I3Info:
pub fn dfs_parent_child(self) ...
pub fn get_leaves(self) ...
To not consume the I3Info, allowing it to be used more than once, declare your methods to take references to the I3Info:
pub fn dfs_parent_child(&self) ...
pub fn get_leaves(&self) ...
You will need to modify the code within these methods, also, to work with references because this change also means you can no longer move things out of self — they have to be left intact. Sometimes this is as simple as putting & before a field access (&self.foo instead of self.foo), and sometimes it will require more extensive changes.
The general “Rust way of thinking” lessons here are:
Think about the type of your method receivers. self is not always right, and neither is &self.
Don't take ownership of values except when it makes sense. Passing by & reference is a good default choice (except for Copy types, like numbers).

Related

Is there a way to automatically register trait implementors?

I'm trying to load JSON files that refer to structs implementing a trait. When the JSON files are loaded, the struct is grabbed from a hashmap. The problem is, I'll probably have to have a lot of structs put into that hashmap all over my code. I would like to have that done automatically. To me this seems to be doable with procedural macros, something like:
#[my_proc_macro(type=ImplementedType)]
struct MyStruct {}
impl ImplementedType for MyStruct {}
fn load_implementors() {
let implementors = HashMap::new();
load_implementors!(implementors, ImplementedType);
}
Is there a way to do this?
No
There is a core issue that makes it difficult to skip manually inserting into a structure. Consider this simplified example, where we simply want to print values that are provided separately in the code-base:
my_register!(alice);
my_register!(bob);
fn main() {
my_print(); // prints "alice" and "bob"
}
In typical Rust, there is no mechanism to link the my_print() call to the multiple invocations of my_register. There is no support for declaration merging, run-time/compile-time reflection, or run-before-main execution that you might find in other languages that might make this possible (unless of course there's something I'm missing).
But Also Yes
There are third party crates built around link-time or run-time tricks that can make this possible:
ctor allows you to define functions that are executed before main(). With it, you can have my_register!() create invididual functions for alice and bob that when executed will add themselves to some global structure which can then be accessed by my_print().
linkme allows you to define a slice that is made from elements defined separately, which are combined at compile time. The my_register!() simply needs to use this crate's attributes to add an element to the slice, which my_print() can easily access.
I understand skepticism of these methods since the declarative approach is often clearer to me, but sometimes they are necessary or the ergonomic benefits outweigh the "magic".

Does Rust's borrow checker really mean that I should re-structure my program?

So I've read Why can't I store a value and a reference to that value in the same struct? and I understand why my naive approach to this was not working, but I'm still very unclear how to better handle my situation.
I have a program I wanted to structure like follows (details omitted because I can't make this compile anyway):
use std::sync::Mutex;
struct Species{
index : usize,
population : Mutex<usize>
}
struct Simulation<'a>{
species : Vec<Species>,
grid : Vec<&'a Species>
}
impl<'a> Simulation<'a>{
pub fn new() -> Self {...} //I don't think there's any way to implement this
pub fn run(&self) {...}
}
The idea is that I create a vector of Species (which won't change for the lifetime of Simulation, except in specific mutex-guarded fields) and then a grid representing which species live where, which will change freely. This implementation won't work, at least not any way I've been able to figure out. As I understand it, the issue is that pretty much however I make my new method, the moment it returns, all of the references in grid would becomine invalid as Simulation and therefor Simulation.species are moved to another location in the stack. Even if I could prove to the compiler that species and its contents would continue to exist, they actually won't be in the same place. Right?
I've looked into various ways around this, such as making species as an Arc on the heap or using usizes instead of references and implementing my own lookup function into the species vector, but these seem slower, messier or worse. What I'm starting to think is that I need to really re-structure my code to look something like this (details filled in with placeholders because now it actually runs):
use std::sync::Mutex;
struct Species{
index : usize,
population : Mutex<usize>
}
struct Simulation<'a>{
species : &'a Vec<Species>, //Now just holds a reference rather than data
grid : Vec<&'a Species>
}
impl<'a> Simulation<'a>{
pub fn new(species : &'a Vec <Species>) -> Self { //has to be given pre-created species
let grid = vec!(species.first().unwrap(); 10);
Self{species, grid}
}
pub fn run(&self) {
let mut population = self.grid[0].population.lock().unwrap();
println!("Population: {}", population);
*population += 1;
}
}
pub fn top_level(){
let species = vec![Species{index: 0, population : Mutex::new(0_)}];
let simulation = Simulation::new(&species);
simulation.run();
}
As far as I can tell this runs fine, and ticks off all the ideal boxes:
grid uses simple references with minimal boilerplate for me
these references are checked at compile time with minimal overhead for the system
Safety is guaranteed by the compiler (unlike a custom map based approach)
But, this feels very weird to me: the two-step initialization process of creating owned memory and then references can't be abstracted any way that I can see, which feels like I'm exposing an implementation detail to the calling function. top_level has to also be responsible for establishing any other functions or (scoped) threads to run the simulation, call draw/gui functions, etc. If I need multiple levels of references, I believe I will need to add additional initialization steps to that level.
So, my question is just "Am I doing this right?". While I can't exactly prove this is wrong, I feel like I'm losing a lot of near-universal abstraction of the call structure. Is there really no way to return species and simulation as a pair at the end (with some one-off update to make all references point to the "forever home" of the data).
Phrasing my problem a second way: I do not like that I cannot have a function with a signature of ()-> Simulation, when I can can have a pair of function calls that have that same effect. I want to be able to encapsulate the creation of this simulation. I feel like the fact that this approach cannot do so indicates I'm doing something wrong, and that there may be a more idiomatic approach I'm missing.
I've looked into various ways around this, such as making species as an Arc on the heap or using usizes instead of references and implementing my own lookup function into the species vector, but these seem slower, messier or worse.
Don't assume that, test it. I once had a self-referential (using ouroboros) structure much like yours, with a vector of things and a vector of references to them. I tried rewriting it to use indices instead of references, and it was faster.
Rc/Arc is also an option worth trying out — note that there is only an extra cost to the reference counting when an Arc is cloned or dropped. Arc<Species> doesn't cost any more to dereference than &Species, and you can always get an &Species from an Arc<Species>. So the reference counting only matters if and when you're changing which Species is in an element of Grid.
If you're owning a Vec of objects, then want to also keep track of references to particular objects in that Vec, a usize index is almost always the simplest design. It might feel like extra boilerplate to you now, but it's a hell of a lot better than properly dealing with keeping pointers in check in a self-referential struct (as somebody who's made this mistake in C++ more than I should have, trust me). Rust's rules are saving you from some real headaches, just not ones that are obvious to you necessarily.
If you want to get fancy and feel like a raw usize is too arbitrary, then I recommend you look at slotmap. For a simple SlotMap, internally it's not much more than an array of values, iteration is fast and storage is efficient. But it gives you generational indices (slotmap calls these "keys") to the values: each value is embellished with a "generation" and each index also internally keeps hold of a its generation, therefore you can safely remove and replace items in the Vec without your references suddenly pointing at a different object, it's really cool.

How to decide when function input params should be references or not?

When writing a function how does one decide whether to make input parameters referenced or consumed?
For example, should I do this?
fn foo(val: Bar) -> bool { check(val) } // version 1
Or use referenced param instead?
fn foo(val: &Bar) -> bool { check(*val) } // version 2
On the client side, if I only had the second version but wanted to have my value consumed, I'd have to do something like this:
// given in: Bar
let out = foo(&in); // using version 2 but wanting to consume ownership
drop(in);
On the other hand, if I only had the first version but wanted to keep my reference, I'd have to do something like this:
// given in: &Bar
let out = foo(in.clone()); // using version 1 but wanting to keep reference alive
So which is preferred, and why?
Are there any performance considerations in making this choice? Or does the compiler make them equivalent in terms of performance, and how?
And when would you want to offer both versions (via traits)? And for those times how do you write the underlying implementations for both functions -- do you duplicate the logic in each method signature or do you have one proxy to the other? Which to which, and why?
Rust's goal is to have performance and syntax similar to C/C++ without the memory problems. To do this it avoids things like garbage collection and instead enforces a particular strict memory model of "ownership" and "borrowing". These are critical concepts in Rust. I would suggest reading Understanding Ownership in The Rust Book.
The rules of memory ownership are...
Each value in Rust has a variable that’s called its owner.
There can only be one owner at a time.
When the owner goes out of scope, the value will be dropped.
Enforcing a single owner avoids a great many bugs and complications typical of C and C++ programs while avoiding complex and slow memory management at runtime.
You can't get very far with only that, so Rust provides references. A reference lets functions safely "borrow" data without taking ownership. You can have either as many immutable references as you like, or only one mutable reference.
When applied to function calls, passing a value passes ownership to the function. Passing a reference is "borrowing", ownership is retained.
It's really, really important to understand ownership, borrowing, and later on, lifetimes. But here's some rules of thumb.
If your function needs to take ownership of the data, pass by value.
If your function only needs to read the data, pass a reference.
If your function needs to change the data, pass a mutable reference.
Note what's not in there: performance. Let the compiler take care of that.
Assuming check only reads data and checks that it's ok, it should take a reference. So your example would be...
fn foo(val: &Bar) -> bool { check(val) }
On the client side, if I only had the second version but wanted to have my value consumed...
There's no reason to want a function which takes a reference to do that. If it's the function's job to manage the memory, you pass it ownership. If it isn't, it's not its job to manage your memory.
There's also no need to manually call drop. You'd simply let the variable fall out of scope and it will be automatically dropped.
And when would you want to offer both versions (via traits)?
You wouldn't. If a function can take a reference there's no reason for it to take ownership.
If the function needs ownership, you should pass by value. If the function only needs a reference, you should pass by reference.
Passing by value fn foo(val: Bar) when it isn't necessary for the function to work could require the user to clone the value. Passing by reference is preferred in this case since a clone can be avoided.
Passing by reference fn foo(val: &Bar) when the function needs ownership would require it to either copy or clone the value. Pass by value is preferred in this case because it gives the user control whether an existing value's ownership is transferred or is cloned. The function doesn't have to make that decision and a clone can be avoided.
There are some exceptions, simple primitives like i32 can be passed-by-value without any performance penalty and may be more convenient.
And when would you want to offer both versions (via traits)?
You could use the Borrow trait:
fn foo<B: Borrow<Bar>>(val: B) -> bool {
check(val.borrow())
}
let b: Bar = ...;
foo(&b); // both of
foo(b); // these work

Why do immutable references to copy types in rust exist?

So I just started learning rust (first few chapters of "the book") and am obviously quite a noob. I finished the ownership-basics chapter (4) and wrote some test programs to make sure I understood everything. I seem to have the basics down but I asked myself why immutable references to copy-types are even possible. I will try to explain my thoughts with examples.
I thought that you maybe want to store a reference to a copy-type so you can check it's value later instead of having a copy of the old value but this can't be it since the underlying value can't be changed as long as it's been borrowed.
The most basic example of this would be this code:
let mut x = 10; // push i32
let x_ref = &x; // push immutable reference to x
// x = 100; change x which is disallowed since it's borrowed currently
println!("{}", x_ref); // do something with the reference since you want the current value of x
The only reason for this I can currently think of (with my current knowledge) is that they just exist so you can call generic methods which require references (like cmp) with them.
This code demonstrates this:
let x = 10; // push i32
// let ordering = 10.cmp(x); try to compare it but you can't since cmp wants a reference
let ordering = 10.cmp(&x) // this works since it's now a reference
So, is that the only reason you can create immutable references to copy-types?
Disclaimer:
I don't see Just continue reading the book as a valid answer. However I fully understand if you say something like Yes you need those for this and this use-case (optional example), it will be covered in chapter X. I hope you understand what I mean :)
EDIT:
Maybe worth mentioning, I'm a C# programmer and not new to programming itself.
EDIT 2:
I don't know if this is technically a duplicate of this question but I do not fully understand the question and the answer so I hope for a more simple answer understandable by a real noob.
An immutable reference to a Copy-type is still "an immutable reference". The code that gets passed the reference can't change the original value. It can make a (hopefully) trivial copy of that value, but it can still only ever change that copy after doing so.
That is, the original owner of the value is ensured that - while receivers of the reference may decide to make a copy and change that - the state of whatever is referenced can't ever change. If the receiver wants to change the value, it can feel free; nobody else is going to see it, though.
Immutable references to primitives are not different, and while being Copy everywhere, you are probably more inclined to what "an immutable reference" means semantically for primitive types. For instance
fn print_the_age(age: &i32) { ... }
That function could make a copy via *age and change it. But the caller will not see that change and it does not make much sense to do so in the first place.
Update due to comment: There is no advantage per se, at least as far as primitives are concerned (larger types may be costly to copy). It does boil down to the semantic relationship between the owner of the i32 and the receiver: "Here is a reference, it is guaranteed to not change while you have that reference, I - the owner - can't change or move or deallocate and there is no other thread else including myself that could possibly do that".
Consider where the reference is coming from: If you receive an &i32, wherever it is coming from can't change and can't deallocate. The `i32´ may be part of a larger type, which - due to handing out a reference - can't move, change or get de-allocated; the receiver is guaranteed of that. It's hard to say there is an advantage per se in here; it might be advantageous to communicate more detailed type (and lifetime!) relationships this way.
They're very useful, because they can be passed to generic functions that expect a reference:
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
If immutable references of non-Copy types were forbidden, we would need two versions:
fn map_vec_own<T: !Copy, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
fn map_vec_copy<T: Copy, U>(v: &Vec<T>, f: impl Fn( T) -> U) -> Vec<U> {...}
Immutable references are, naturally, used to provide access to the referenced data. For instance, you could have loaded a dictionary and have multiple threads reading from it at the same time, each using their own immutable reference. Because the references are immutable those threads will not corrupt that common data.
Using only mutable references, you can't be sure of that so you need to make full copies. Copying data takes time and space, which are always limited. The primary question for performance tends to be if your data fits in CPU cache.
I'm guessing you were thinking of "copy" types as ones that fit in the same space as the reference itself, i.e. sizeof(type) <= sizeof(type*). Rust's Copy trait indicates data that could be safely copied, no matter the size. These are orthogonal concepts; for instance, a pointer might not be safely copied without adjusting a refernce count, or an array might be copyable but take gigabytes of memory. This is why Rc<T> has the Clone trait, not Copy.

Is it safe and defined behavior to transmute between a T and an UnsafeCell<T>?

A recent question was looking for the ability to construct self-referential structures. In discussing possible answers for the question, one potential answer involved using an UnsafeCell for interior mutability and then "discarding" the mutability through a transmute.
Here's a small example of such an idea in action. I'm not deeply interested in the example itself, but it's just enough complication to require a bigger hammer like transmute as opposed to just using UnsafeCell::new and/or UnsafeCell::into_inner:
use std::{
cell::UnsafeCell, mem, rc::{Rc, Weak},
};
// This is our real type.
struct ReallyImmutable {
value: i32,
myself: Weak<ReallyImmutable>,
}
fn initialize() -> Rc<ReallyImmutable> {
// This mirrors ReallyImmutable but we use `UnsafeCell`
// to perform some initial interior mutation.
struct NotReallyImmutable {
value: i32,
myself: Weak<UnsafeCell<NotReallyImmutable>>,
}
let initial = NotReallyImmutable {
value: 42,
myself: Weak::new(),
};
// Without interior mutability, we couldn't update the `myself` field
// after we've created the `Rc`.
let second = Rc::new(UnsafeCell::new(initial));
// Tie the recursive knot
let new_myself = Rc::downgrade(&second);
unsafe {
// Should be safe as there can be no other accesses to this field
(&mut *second.get()).myself = new_myself;
// No one outside of this function needs the interior mutability
// TODO: Is this call safe?
mem::transmute(second)
}
}
fn main() {
let v = initialize();
println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value))
}
This code appears to print out what I'd expect, but that doesn't mean that it's safe or using defined semantics.
Is transmuting from a UnsafeCell<T> to a T memory safe? Does it invoke undefined behavior? What about transmuting in the opposite direction, from a T to an UnsafeCell<T>?
(I am still new to SO and not sure if "well, maybe" qualifies as an answer, but here you go. ;)
Disclaimer: The rules for these kinds of things are not (yet) set in stone. So, there is no definitive answer yet. I'm going to make some guesses based on (a) what kinds of compiler transformations LLVM does/we will eventually want to do, and (b) what kind of models I have in my head that would define the answer to this.
Also, I see two parts to this: The data layout perspective, and the aliasing perspective. The layout issue is that NotReallyImmutable could, in principle, have a totally different layout than ReallyImmutable. I don't know much about data layout, but with UnsafeCell becoming repr(transparent) and that being the only difference between the two types, I think the intent is for this to work. You are, however, relying on repr(transparent) being "structural" in the sense that it should allow you to replace things in larger types, which I am not sure has been written down explicitly anywhere. Sounds like a proposal for a follow-up RFC that extends the repr(transparent) guarantees appropriately?
As far as aliasing is concerned, the issue is breaking the rules around &T. I'd say that, as long as you never have a live &T around anywhere when writing through the &UnsafeCell<T>, you are good -- but I don't think we can guarantee that quite yet. Let's look in more detail.
Compiler perspective
The relevant optimizations here are the ones that exploit &T being read-only. So if you reordered the last two lines (transmute and the assignment), that code would likely be UB as we may want the compiler to be able to "pre-fetch" the value behind the shared reference and re-use that value later (i.e. after inlining this).
But in your code, we would only emit "read-only" annotations (noalias in LLVM) after the transmute comes back, and the data is indeed read-only starting there. So, this should be good.
Memory models
The "most aggressive" of my memory models essentially asserts that all values are always valid, and I think even that model should be fine with your code. &UnsafeCell is a special case in that model where validity just stops, and nothing is said about what lives behind this reference. The moment the transmute returns, we grab the memory it points to and make it all read-only, and even if we did that "recursively" through the Rc (which my model doesn't, but only because I couldn't figure out a good way to make it do so) you'd be fine as you don't mutate any more after the transmute. (As you may have noticed, this is the same restriction as in the compiler perspective. The point of these models is to allow compiler optimizations, after all. ;)
(As a side-note, I really wish miri was in better shape right now. Seems I have to try and get validation to work again in there, because then I could tell you to just run your code in miri and it'd tell you if that version of my model is okay with what you are doing :D )
I am thinking about other models currently that only check things "on access", but haven't worked out the UnsafeCell story for that model yet. What this example shows is that the model may have to contain ways for a "phase transition" of memory first being UnsafeCell, but later having normal sharing with read-only guarantees. Thanks for bringing this up, that will make for some nice examples to think about!
So, I think I can say that (at least from my side) there is the intent to allow this kind of code, and doing so does not seem to prevent any optimizations. Whether we'll actually manage to find a model that everybody can agree with and that still allows this, I cannot predict.
The opposite direction: T -> UnsafeCell<T>
Now, this is more interesting. The problem is that, as I said above, you must not have a &T live when writing through an UnsafeCell<T>. But what does "live" mean here? That's a hard question! In some of my models, this could be as weak as "a reference of that type exists somewhere and the lifetime is still active", i.e., it could have nothing to do with whether the reference is actually used. (That's useful because it lets us do more optimizations, like moving a load out of a loop even if we cannot prove that the loop ever runs -- which would introduce a use of an otherwise unused reference.) And since &T is Copy, you cannot even really get rid of such a reference either. So, if you have x: &T, then after let y: &UnsafeCell<T> = transmute(x), the old x is still around and its lifetime still active, so writing through y could well be UB.
I think you'd have to somehow restrict the aliasing that &T allows, very carefully making sure that nobody still holds such a reference. I'm not going to say "this is impossible" because people keep surprising me (especially in this community ;) but TBH I cannot think of a way to make this work. I'd be curious if you have an example though where you think this is reasonable.

Resources