Why is std::rc::Rc<> not Copy?


Can someone explain to me why Rc<> is not Copy?
I'm writing code that uses a lot of shared pointers, and having to type .clone() all the time is getting on my nerves.
It seems to me that Rc<> should just consist of a pointer, which is a fixed size, so the type itself should be Sized and hence Copy, right?
Am I missing something?

It seems to me that Rc<> should just consist of a pointer, which is a fixed size, so the type itself should be Sized and hence Copy, right?
This is not quite true. Rc is short for Reference Counted. This means that the type keeps track of how many references point to the owned data. That way we can have multiple owners at the same time and safely free the data, once the reference count reaches 0.
But how do we keep the reference counter valid and up to date? Exactly, we have to do something whenever a new reference/owner is created and whenever a reference/owner is deleted. Specifically, we have to increase the counter in the former case and decrease it in the latter.
The counter is decreased by implementing Drop, the Rust equivalent of a destructor. This drop() function is executed whenever a variable goes out of scope – perfect for our goal.
But when do we do the increment? You guessed it: in clone(). The Copy trait, by definition, says that a type can be duplicated just by copying bits:
Types that can be copied by simply copying bits (i.e. memcpy).
This is not true in our case, because: yes, we "just copy bits", but we also do additional work! We do need to increment our reference counter!

A type cannot implement Copy if it implements Drop. Since Rc does implement Drop to decrement its reference count, Rc cannot be Copy.
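To see this bookkeeping in action, here is a small sketch built on the real std function Rc::strong_count (the helper name strong_counts is only for illustration). It shows that clone() increments the count and dropping a handle decrements it, which is work a plain bit copy could never do:

```rust
use std::rc::Rc;

/// Records the strong count after creating an Rc, after cloning it,
/// and after dropping the clone (hypothetical helper for illustration).
fn strong_counts() -> (usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let after_new = Rc::strong_count(&a);
    let b = Rc::clone(&a); // Clone::clone increments the count
    let after_clone = Rc::strong_count(&a);
    drop(b); // Drop::drop decrements it again
    let after_drop = Rc::strong_count(&a);
    (after_new, after_clone, after_drop)
}

fn main() {
    let (n, c, d) = strong_counts();
    println!("after new: {n}, after clone: {c}, after drop: {d}");
}
```

For the same reason, adding #[derive(Clone, Copy)] to a struct that contains an Rc field is rejected at compile time: Rc is not Copy, so nothing containing it can be.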
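In addition, Rc is not just a raw pointer. In the standard library version this answer was written against (the internal Shared type has since been replaced by NonNull), it consists of a Shared: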
pub struct Rc<T: ?Sized> {
    ptr: Shared<RcBox<T>>,
}
Which, in turn, is not only a pointer:
pub struct Shared<T: ?Sized> {
    pointer: NonZero<*const T>,
    _marker: PhantomData<T>,
}
PhantomData is needed to express the ownership of T:
this marker has no consequences for variance, but is necessary for
dropck to understand that we logically own a T.
For details, see:
https://github.com/rust-lang/rfcs/blob/master/text/0769-sound-generic-drop.md#phantom-data


How to determine if passing an argument to a function call will do a move, a copy, a ref or a mut ref?

I am in my first days with Rust, coming from PHP.
I am developing some basic console programs to gain confidence with ownership.
Is there a way to determine if passing an argument to a function call will do a move, a copy, a ref or a mut ref?
I am referring to structs only, defined by myself, for now.
I am referring to functions not returning values. This is a whole new topic for me.
From what I can understand so far:
mystruct: &mut StructureOfMine
is explicitly a mutable ref, so if in the function I alter a field of mystruct, it will be reflected outside
mystruct: &StructureOfMine
is explicitly a non mutable ref, so in the function I cannot alter fields of mystruct
mystruct: StructureOfMine
mystruct is moved if StructureOfMine is not implementing Copy. I can alter mystruct, but the caller cannot use mystruct anymore.
mystruct is copied if StructureOfMine implements Copy. I can alter mystruct; the caller keeps using its original copy of mystruct, but changes to this copy are not reflected in the original one.
Questions
Can you confirm or correct what I understood of these cases?
Is there a way to determine if something has been copied or moved ?

As @BlackBeans already answered most of your question, I will talk about "how do I detect a Copy vs a Move".
It's not really important which one is actually used in the end. Rust figures this out internally.
The only difference for the programmer is that you cannot use a moved object after it has been moved away. This isn't a big concern, however, because this is not something the programmer has to ensure by themselves; it is literally a compiler error to get this wrong. This is the beauty of the borrow checker.
Here, look at the following code:
#[derive(Debug, Copy, Clone)]
struct StructWithCopy;

fn take_object(obj: StructWithCopy) {
    println!("Taken: {:?} (Address: {:p})", obj, &obj);
}

fn main() {
    let obj = StructWithCopy;
    take_object(obj);
    println!("Original: {:?} (Address: {:p})", obj, &obj);
}
Taken: StructWithCopy (Address: 0x7fff9ced8478)
Original: StructWithCopy (Address: 0x7fff9ced8508)
Here a copy is used; that is why you are still able to access the object after passing it to the function.
If the object weren't copyable, this would cause a compiler error:
#[derive(Debug, Clone)]
struct StructWithoutCopy;

fn take_object(obj: StructWithoutCopy) {
    println!("Taken: {:?} (Address: {:p})", obj, &obj);
}

fn main() {
    let obj = StructWithoutCopy;
    take_object(obj);
    println!("Original: {:?} (Address: {:p})", obj, &obj);
}
error[E0382]: borrow of moved value: `obj`
  --> src/main.rs:11:48
   |
9  |     let obj = StructWithoutCopy;
   |         --- move occurs because `obj` has type `StructWithoutCopy`, which does not implement the `Copy` trait
10 |     take_object(obj);
   |                 --- value moved here
11 |     println!("Original: {:?} (Address: {:p})", obj, &obj);
   |                                                ^^^ value borrowed here after move
   |
   = note: this error originates in the macro `$crate::format_args_nl` (in Nightly builds, run with -Z macro-backtrace for more info)
This is the beauty of programming in Rust: as long as your code uses no unsafe blocks, you can be confident that when it compiles, it contains no undefined behavior such as double frees, use of uninitialized memory, or buffer overflows.
Don't think about it too much, just enjoy the confidence your compiler provides :)

You've understood the cases, but it seems like you are missing the big picture, because they are all handled by a single, very simple rule: when you pass a value to a function, it is moved, which means it is copied to a new location where the function expects it. In addition, Rust's ownership model now considers the value to be owned by the called function, no longer by the caller. Keep in mind that this rule applies to any type. Let's see how it applies to the particular cases you have mentioned.
Types that implement Copy
Let's start with a fundamental one. Copy is just a marker trait that says to the compiler "forget about the ownership transfer", that is, if I move a value, it doesn't mean I don't own it anymore (at least from the ownership point of view; Copy also implies other things).
It's actually quite natural: assume I have a number, and I give you that number (by copying it, which is the only way to physically move memory around). Does it mean my version of the number gets invalidated? No. Which is why all numerical types in Rust implement Copy.
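The same reasoning applies to your own Copy types, which also confirms your last case. A minimal sketch, where Point and shift_right are hypothetical names: the function modifies its own copy, while the caller's value stays usable and unchanged.

```rust
/// Hypothetical small Copy type used for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Takes `p` by value. Because `Point: Copy`, the caller keeps its own copy;
/// `mut` on the parameter only lets this function change its local copy.
fn shift_right(mut p: Point) -> Point {
    p.x += 10;
    p
}

fn main() {
    let original = Point { x: 1, y: 2 };
    let shifted = shift_right(original);
    // `original` is still usable and unchanged.
    println!("original = {:?}, shifted = {:?}", original, shifted);
}
```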
A type that doesn't implement Copy, on the other hand, is for example a vector. This is because a vector is, roughly speaking, a pointer to an area of memory that is also "owned" by whoever owns the vector, and that gets freed when the vector is dropped. This means that, when I move a vector to you, I only copy that pointer (along with the length and capacity), not the whole allocated memory (which shouldn't move anyway). This invalidates my version of the vector, because otherwise the data allocated in memory, which is owned by the vector, would now have two owners (which is not allowed by Rust's ownership model). This is why Vec<_>: !Copy. Note that if you wanted to also copy the whole allocated memory, in order to keep my version valid, you'd have to explicitly clone it, which is possible because Vec<T>: Clone if T: Clone.
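A short sketch of the move-versus-clone distinction for Vec (total is a hypothetical helper that takes ownership of its argument):

```rust
/// Hypothetical helper that takes ownership of the vector.
fn total(v: Vec<i32>) -> i32 {
    v.iter().sum()
}

fn main() {
    let numbers = vec![1, 2, 3];
    // `clone` duplicates the heap buffer, so `numbers` remains valid.
    let first = total(numbers.clone());
    println!("{} {:?}", first, numbers);
    // This moves `numbers` (only its pointer, length and capacity are copied);
    // using `numbers` after this line would be a compile error.
    let second = total(numbers);
    println!("{}", second);
}
```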
Borrows
Now, the case of borrows is very simple. For any type T, &T: Copy. It's that simple! When I pass a borrow of data I own, I first have to create a borrow myself (which I own), then I pass it, which simply means copying the pointer, and my borrow is still valid.
Note that this is possible because it's impossible (at least, let's assume it is for the sake of simplicity) to modify a value behind a borrow, which makes the &T: Copy sound.
In addition, I can't myself access the owned data if there is still a chance of someone having a borrow of it somewhere.
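One way to convince yourself that &T: Copy even when T itself is not: the generic function duplicate below (a hypothetical helper) only accepts Copy arguments, yet it happily accepts a &String.

```rust
/// Hypothetical helper: it only compiles for `T: Copy` and returns two copies.
fn duplicate<T: Copy>(x: T) -> (T, T) {
    (x, x)
}

fn main() {
    let owned = String::from("hello"); // String is not Copy...
    let borrow: &String = &owned; // ...but &String is
    let (a, b) = duplicate(borrow);
    // Only the pointer was copied, so `borrow` and `owned` are still usable.
    println!("{a} {b} {borrow} {owned}");
}
```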
Mutable borrows
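A mutable borrow is pretty much like a borrow, except it doesn't implement Copy (nor Clone, so you can't explicitly clone it either). This means that when you first create a mutable borrow, which you own, and then pass it, it is still copied, but your version gets invalidated (because you gave away ownership). (In practice, when the receiving parameter is declared as &mut T, the compiler inserts an implicit reborrow, &mut *r, so your mutable borrow is usable again once the call returns; a plain binding such as let moved = r; does move it.)
In addition, you can't create a borrow (mutable or not) or access the owned data if the mutable borrow is still around.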
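A small sketch contrasting the two situations (bump and bump_three_times are hypothetical names): passing r to a parameter typed &mut u32 reborrows it, whereas binding it with a plain let genuinely moves it.

```rust
/// Increments the counter behind a mutable borrow.
fn bump(counter: &mut u32) {
    *counter += 1;
}

/// Shows implicit reborrowing versus an actual move of a `&mut`.
fn bump_three_times() -> u32 {
    let mut n = 0;
    let r = &mut n;
    bump(r); // implicit reborrow (`&mut *r`): `r` stays usable...
    bump(r); // ...so it can be passed again
    let moved = r; // a plain `let` without a type annotation moves `r`
    bump(moved);
    // bump(r); // would not compile: `r` was moved into `moved`
    n
}

fn main() {
    println!("n = {}", bump_three_times());
}
```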

Is it safe to transmute::<&'a Arc<T>, &'a Weak<T>>(…)?

Is it safe to transmute a shared reference & to a strong Arc<T> into a shared reference & to a Weak<T>?
To ask another way: is the following safe function sound, or is it a vulnerability waiting to happen?
use std::mem::transmute;
use std::sync::{Arc, Weak};

pub fn as_weak<'a, T>(strong: &'a Arc<T>) -> &'a Weak<T> {
    unsafe { transmute::<&'a Arc<T>, &'a Weak<T>>(strong) }
}
Why I want to do this
We have an existing function that returns a &Weak<T>. The internal data structure has changed a bit, and I now have an Arc<T> where I previously had a Weak<T>, but I need to maintain semver compatibility with this function's interface. I'd rather avoid needing to stash an actual Weak<T> copy just for the sake of this function if I don't need to.
Why I hope this is safe
The underlying memory representations of Arc<T> and Weak<T> are the same: a non-null pointer (or pointer-like value for Weak::new()) to an internal ArcInner struct, which contains the strong and weak reference counts and the inner T value.
Arc<T> also contains a PhantomData<T>, but my understanding is that if that changes anything, it would only apply on drop, which isn't relevant for the case here as we're only transmuting a shared reference, not an owned value.
The operations that an Arc<T> will perform on its inner pointer are presumably a superset of those that may be performed by a Weak<T>, since they have the same representation but Arc carries a guarantee that the inner T value is still alive, while Weak does not.
Given these facts, it seems to me like nothing could go wrong. However, I haven't written much unsafe code before, and never for a production case like this. I'm not confident that I fully understand the possible issues. Is this transmutation safe and sound, or are there other factors that need to be considered?

No, this is not sound.
Neither Arc nor Weak has a #[repr] forcing a particular layout, therefore they are both #[repr(Rust)] by default. According to the Rustonomicon section about repr(Rust):
struct A {
    a: i32,
    b: u64,
}

struct B {
    a: i32,
    b: u64,
}
Rust does guarantee that two instances of A have their data laid out in exactly the same way. However Rust does not currently guarantee that an instance of A has the same field ordering or padding as an instance of B.
You cannot therefore assume that Arc<T> and Weak<T> have the same layout.
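If the function's signature were allowed to change, the sound way to obtain a Weak<T> from an Arc<T> is Arc::downgrade, which returns an owned Weak<T> (bumping the weak count) rather than a &Weak<T>. Note that this does not satisfy the question's semver constraint; it is only a sketch of the safe alternative, and the name as_weak_owned is hypothetical.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical safe counterpart to `as_weak`: it returns an owned `Weak<T>`
/// instead of reinterpreting the `Arc` in place.
fn as_weak_owned<T>(strong: &Arc<T>) -> Weak<T> {
    Arc::downgrade(strong)
}

fn main() {
    let strong = Arc::new(42);
    let weak = as_weak_owned(&strong);
    println!(
        "strong = {}, weak = {}",
        Arc::strong_count(&strong),
        Arc::weak_count(&strong)
    );
    // While an Arc is alive, the Weak can be upgraded back to an Arc.
    println!("upgrade while alive: {:?}", weak.upgrade());
    drop(strong);
    // Once the last Arc is gone, upgrading fails.
    println!("upgrade after drop: {:?}", weak.upgrade());
}
```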

Is my understanding of a Rust vector that supports Rc or Box wrapped types correct?

I'm not looking for code samples. I want to state my understanding of Box vs. Rc and have you tell me if my understanding is right or wrong.
Let's say I have some trait ChattyAnimal and a struct Cat that implements this trait, e.g.
pub trait ChattyAnimal {
    fn make_sound(&self);
}

pub struct Cat {
    pub name: String,
    pub sound: String,
}

impl ChattyAnimal for Cat {
    fn make_sound(&self) {
        println!("Meow!");
    }
}
Now let's say I have other structs (Dog, Cow, Chicken, ...) that also implement the ChattyAnimal trait, and let's say I want to store all of these in the same vector.
So step 1 is that I would have to use a Box type, because the Rust compiler cannot determine the size of everything that might implement this trait. Therefore we must store these items on the heap, voilà, using a Box type, which is like a smart pointer in C++. Anything wrapped in a Box is automatically dropped by Rust when the Box goes out of scope.
// I can alias and use my Box type that wraps the trait like this:
pub type BoxyChattyAnimal = Box<dyn ChattyAnimal>;

// and then I can use my type alias, i.e.
pub struct Container {
    animals: Vec<BoxyChattyAnimal>,
}
Meanwhile, with Box, Rust's borrow checker requires ownership to change (a move) whenever I pass or reassign the instance. But if I actually want to have multiple references to the same underlying instance, I have to use Rc. And so to have a vector of ChattyAnimal instances where each instance can have multiple references, I would need to do:
use std::rc::Rc;

pub type RcChattyAnimal = Rc<dyn ChattyAnimal>;

pub struct Container {
    animals: Vec<RcChattyAnimal>,
}
One important takeaway from this is that if I want to have a vector of some trait type, I need to explicitly set that vector's type to a Box or Rc that wraps my trait. And so the Rust language designers force us to think about this in advance, so that Box and Rc values cannot (at least not easily or accidentally) end up in the same vector.
This feels like a very well-thought-out design that helps keep me from introducing bugs in my code. Is my understanding as stated above correct?

Yes, all this is correct.
There's a second reason for this design: it allows the compiler to verify that the operations you're performing on the vector elements are using memory in a safe way, relative to how they're stored.
For example, if you had a method on ChattyAnimal that mutates the animal (i.e. takes a &mut self argument), you could call that method on elements of a Vec<Box<dyn ChattyAnimal>> as long as you had a mutable reference to the vector; the Rust compiler would know that there could only be one reference to the ChattyAnimal in question (because the only reference is inside the Box, which is inside the Vec, and you have a mutable reference to the Vec so there can't be any other references to it). If you tried to write the same code with a Vec<Rc<dyn ChattyAnimal>>, the compiler would complain; it wouldn't be able to completely eliminate the possibility that your code might be mutating the animal at the same time as the code that called it was in the middle of trying to read the animal, which might lead to some inconsistencies in the calling code.
As a consequence, the compiler needs to know that all the elements of the Vec have their memory treated in the same way, so that it can check to make sure that a reference to some arbitrary element of the Vec is being used appropriately.
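A minimal sketch of the mutation point above, assuming a hypothetical variant of the question's trait in which make_sound returns a String and a mutating rename(&mut self, ...) method exists. With Box elements, a mutable slice is enough to mutate each animal; with Rc elements the compiler will not hand out &mut through a shared handle, and Rc::get_mut only succeeds when no other Rc or Weak handle to that animal exists.

```rust
use std::rc::Rc;

// Hypothetical variant of the question's trait with a mutating method.
pub trait ChattyAnimal {
    fn make_sound(&self) -> String;
    fn rename(&mut self, new_name: &str);
}

pub struct Cat {
    pub name: String,
}

impl ChattyAnimal for Cat {
    fn make_sound(&self) -> String {
        format!("{}: Meow!", self.name)
    }

    fn rename(&mut self, new_name: &str) {
        self.name = new_name.to_string();
    }
}

/// Each Box is uniquely owned, so `&mut` access to the slice gives
/// exclusive access to every animal.
fn rename_all_boxed(animals: &mut [Box<dyn ChattyAnimal>], name: &str) {
    for animal in animals.iter_mut() {
        animal.rename(name);
    }
}

/// An Rc may be aliased elsewhere, so `animal.rename(name)` would not compile
/// here; `Rc::get_mut` returns `Some` only for uniquely held animals.
/// Returns how many animals were actually renamed.
fn rename_all_rc(animals: &mut [Rc<dyn ChattyAnimal>], name: &str) -> usize {
    let mut renamed = 0;
    for animal in animals.iter_mut() {
        if let Some(a) = Rc::get_mut(animal) {
            a.rename(name);
            renamed += 1;
        }
    }
    renamed
}

fn main() {
    let mut boxed: Vec<Box<dyn ChattyAnimal>> = Vec::new();
    boxed.push(Box::new(Cat { name: "Tom".to_string() }));
    rename_all_boxed(&mut boxed, "Felix");
    println!("{}", boxed[0].make_sound());

    let shared: Rc<dyn ChattyAnimal> = Rc::new(Cat { name: "Tom".to_string() });
    let mut rcs = vec![Rc::clone(&shared)];
    // `shared` still points at the same Cat, so get_mut refuses.
    println!("renamed while aliased: {}", rename_all_rc(&mut rcs, "Felix"));
    drop(shared);
    println!("renamed when unique: {}", rename_all_rc(&mut rcs, "Felix"));
}
```

Wrapping the animal in a RefCell inside the Rc would be the usual way to allow shared mutation, at the cost of runtime borrow checks.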
(There's a third reason, too, which is performance; because the compiler knows that this is a "vector of Boxes" or "vector of Rcs", it can generate code that assumes a particular storage mechanism. For example, if you have a vector of Rcs, and clone one of the elements, the machine code that the compiler generates will work simply by going to the memory address listed in the vector and adding 1 to the reference count stored there – there's no need for any extra levels of indirection. If the vector were allowed to mix different allocation schemes, the generated code would have to be a lot more complex, because it wouldn't be able to assume things like "there is a reference count", and would instead need to (at runtime) find the appropriate piece of code for dealing with the memory allocation scheme in use, and then run it; that would be much slower.)

Does a type that implements copy get moved if possible?

If you pass a type into a function that is not a pointer type and it implements Copy, does Rust copy it only if necessary?
Here is some specific code:
#[derive(Clone, Copy)]
struct Data([u8; 2]);

#[derive(Clone, Copy)]
struct Buffer(Data);

fn do_something(id: Buffer) {
    println!("{}", id.0.0[0]);
}

fn create_buffer() -> Buffer {
    Buffer(Data([0x01, 0x01]))
}

fn main() {
    let buffer = create_buffer();
    // Since we don't use buffer again in this function, the Copy trait isn't necessary on Buffer.
    // But "Buffer" does implement the Copy trait. Will Rust copy it anyway?
    do_something(buffer);
}
If rust does not copy it, are there rules to this behavior or is it entirely up to the compiler to decide? Can I rely on this not being copied?
What if I were to call do_something(buffer) twice? Does it copy twice, or move once and copy once? This requires the copy trait to even compile so I expect at least 1 copy.
do_something(buffer);
do_something(buffer);

Move and copy are solely type-system concerns, denoting whether the source can or cannot still be used (by high-level code) after it has been used once[0]. In the jargon, they only determine whether the type is affine (i.e. move, the default) or normal, i.e. unrestricted (Copy).
After typechecking, both result in the same actual operation: a memcpy (semantically). The usual elision optimisations then work the same way for both and may optimise away the actual copy.
So the answer would be "yes". At a codegen level, there really is no difference between Copy and non-Copy types as long as they are used in the same way. Do note that the elision optimisation may fail to trigger in both cases. In fact there are issues on the tracker (some closed and some not) where "move elision" fails to trigger, and a large on-stack non-Copy type gets copied around.
[0] I believe there is limited NRVO and plans for more in MIR / rustc itself, but the vast majority of the work is left to LLVM: https://github.com/rust-lang/rust/issues/32966

Does a move involve a copy?

Editor's note: this question was asked before Rust 1.0 and some of the assertions in the question are not necessarily true in Rust 1.0. Some answers have been updated to address both versions.
I have this struct
struct Triplet {
    one: i32,
    two: i32,
    three: i32,
}
If I pass this to a function, it is implicitly copied. Now, sometimes I read that some values are not copyable and therefore have to be moved.
Would it be possible to make this struct Triplet non-copyable? For example, would it be possible to implement a trait which would make Triplet non-copyable and therefore "movable"?
I read somewhere that one has to implement the Clone trait to copy things that are not implicitly copyable, but I never read about the other way around, that is having something that is implicitly copyable and making it non-copyable so that it moves instead.
Does that even make any sense?

Preface: This answer was written before opt-in built-in traits (specifically the Copy aspects) were implemented. The sections that only applied to the old scheme (the one in effect when the question was asked) are marked "Old:".
Old: To answer the basic question, you can add a marker field storing a NoCopy value. E.g.
struct Triplet {
    one: int,
    two: int,
    three: int,
    _marker: NoCopy
}
You can also do it by having a destructor (via implementing the Drop trait), but using the marker types is preferred if the destructor is doing nothing.
Types now move by default, that is, when you define a new type it doesn't implement Copy unless you explicitly implement it for your type:
#[derive(Clone)] // Copy requires Clone as a supertrait
struct Triplet {
    one: i32,
    two: i32,
    three: i32,
}
impl Copy for Triplet {} // add this for copy, leave it out for move
The implementation can only exist if every type contained in the new struct or enum is itself Copy. If not, the compiler will print an error message. It can also only exist if the type doesn't have a Drop implementation.
To answer the question you didn't ask... "what's up with moves and copy?":
Firstly I'll define two different "copies":
a byte copy, which is just shallowly copying an object byte-by-byte, not following pointers, e.g. if you have (&usize, u64), it is 16 bytes on a 64-bit computer, and a shallow copy would be taking those 16 bytes and replicating their value in some other 16-byte chunk of memory, without touching the usize at the other end of the &. That is, it's equivalent to calling memcpy.
a semantic copy, duplicating a value to create a new (somewhat) independent instance that can be safely used separately from the old one. E.g. a semantic copy of an Rc<T> involves just increasing the reference count, and a semantic copy of a Vec<T> involves creating a new allocation, and then semantically copying each stored element from the old to the new. These can be deep copies (e.g. Vec<T>) or shallow (e.g. Rc<T> doesn't touch the stored T). Clone is loosely defined as the smallest amount of work required to semantically copy a value of type T from inside a &T to T.
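To make the two flavours of semantic copy concrete, here is a sketch comparing Vec::clone (deep) with Rc::clone (shallow). Comparing as_ptr() addresses and using Rc::ptr_eq is just one convenient way to observe the difference; the helper names are hypothetical.

```rust
use std::rc::Rc;

/// Cloning a Vec is a deep semantic copy: equal contents, new heap buffer.
fn vec_clone_is_deep() -> bool {
    let v = vec![1, 2, 3];
    let w = v.clone();
    v == w && v.as_ptr() != w.as_ptr()
}

/// Cloning an Rc is a shallow semantic copy: same allocation, count incremented.
/// Returns (same allocation?, strong count after the clone).
fn rc_clone_is_shallow() -> (bool, usize) {
    let r = Rc::new(vec![1, 2, 3]);
    let r2 = Rc::clone(&r);
    (Rc::ptr_eq(&r, &r2), Rc::strong_count(&r))
}

fn main() {
    println!("Vec clone is deep: {}", vec_clone_is_deep());
    println!("Rc clone (same allocation, count): {:?}", rc_clone_is_shallow());
}
```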
Rust is like C: every by-value use of a value is a byte copy:
let x: T = ...;
let y: T = x; // byte copy

fn foo(z: T) -> T {
    return z // byte copy
}

foo(y) // byte copy
They are byte copies whether or not T moves or is "implicitly copyable". (To be clear, they aren't necessarily literally byte-by-byte copies at run-time: the compiler is free to optimise the copies out if code's behaviour is preserved.)
However, there's a fundamental problem with byte copies: you end up with duplicated values in memory, which can be very bad if they have destructors, e.g.
{
    let v: Vec<u8> = vec![1, 2, 3];
    let w: Vec<u8> = v;
} // destructors run here
If w were just a plain byte copy of v, then there would be two vectors pointing at the same allocation, both with destructors that free it... causing a double free, which is a problem. NB. This would be perfectly fine if we did a semantic copy of v into w, since then w would be its own independent Vec<u8> and the destructors wouldn't be trampling on each other.
There are a few possible fixes here:
Let the programmer handle it, like C. (there's no destructors in C, so it's not as bad... you just get left with memory leaks instead. :P )
Perform a semantic copy implicitly, so that w has its own allocation, like C++ with its copy constructors.
Regard by-value uses as a transfer of ownership, so that v can no longer be used and doesn't have its destructor run.
The last is what Rust does: a move is just a by-value use where the source is statically invalidated, so the compiler prevents further use of the now-invalid memory.
let v: Vec<u8> = vec![1, 2, 3];
let w: Vec<u8> = v;
println!("{:?}", v); // error[E0382]: borrow of moved value: `v`
Types that have destructors must move when used by-value (aka when byte copied), since they have management/ownership of some resource (e.g. a memory allocation, or a file handle) and it's very unlikely that a byte copy will correctly duplicate this ownership.
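A sketch showing that after a move only the destination's destructor runs, so there is no double free. The hypothetical Noisy type counts its own destructor calls in a Cell:

```rust
use std::cell::Cell;

/// Hypothetical type whose destructor records how many times it ran.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves a `Noisy` from `v` to `_w` and reports how many destructor calls ran.
fn drops_after_move() -> u32 {
    let drops = Cell::new(0);
    {
        let v = Noisy { drops: &drops };
        let _w = v; // byte copy; `v` is now statically invalid
        // println!("{}", v.drops.get()); // would not compile: `v` was moved
    } // only `_w`'s destructor runs here
    drops.get()
}

fn main() {
    println!("destructor calls: {}", drops_after_move());
}
```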
"Well... what's an implicit copy?"
Think about a primitive type like u8: a byte copy is simple, just copy the single byte, and a semantic copy is just as simple, copy the single byte. In particular, a byte copy is a semantic copy... Rust even has a built-in trait Copy that captures which types have identical semantic and byte copies.
Hence, for these Copy types by-value uses are automatically semantic copies too, and so it's perfectly safe to continue using the source.
let v: u8 = 1;
let w: u8 = v;
println!("{}", v); // perfectly fine
Old: The NoCopy marker overrides the compiler's automatic behaviour of assuming that types which can be Copy (i.e. only containing aggregates of primitives and &) are Copy. However, this will change when opt-in built-in traits are implemented.
As mentioned above, opt-in built-in traits are now implemented, so the compiler no longer has this automatic behaviour. However, the rules used for the automatic behaviour in the past are the same rules used to check whether it is legal to implement Copy.

The easiest way is to embed something in your type that is not copyable.
The standard library (before Rust 1.0) provided a "marker type" for exactly this use case: NoCopy. For example:
struct Triplet {
    one: i32,
    two: i32,
    three: i32,
    nocopy: NoCopy,
}
