Does a move involve a copy? [duplicate] - rust

Editor's note: this question was asked before Rust 1.0 and some of the assertions in the question are not necessarily true in Rust 1.0. Some answers have been updated to address both versions.
I have this struct
struct Triplet {
one: i32,
two: i32,
three: i32,
}
If I pass this to a function, it is implicitly copied. Now, sometimes I read that some values are not copyable and therefore have to moved.
Would it be possible to make this struct Triplet non-copyable? For example, would it be possible to implement a trait which would make Triplet non-copyable and therefore "movable"?
I read somewhere that one has to implement the Clone trait to copy things that are not implicitly copyable, but I never read about the other way around, that is having something that is implicitly copyable and making it non-copyable so that it moves instead.
Does that even make any sense?

Preface: This answer was written before opt-in built-in traits—specifically the Copy aspects—were implemented. I've used block quotes to indicate the sections that only applied to the old scheme (the one that applied when the question was asked).
Old: To answer the basic question, you can add a marker field storing a NoCopy value. E.g.
struct Triplet {
one: int,
two: int,
three: int,
_marker: NoCopy
}
You can also do it by having a destructor (via implementing the Drop trait), but using the marker types is preferred if the destructor is doing nothing.
Types now move by default, that is, when you define a new type it doesn't implement Copy unless you explicitly implement it for your type:
struct Triplet {
one: i32,
two: i32,
three: i32
}
impl Copy for Triplet {} // add this for copy, leave it out for move
The implementation can only exist if every type contained in the new struct or enum is itself Copy. If not, the compiler will print an error message. It can also only exist if the type doesn't have a Drop implementation.
To answer the question you didn't ask... "what's up with moves and copy?":
Firstly I'll define two different "copies":
a byte copy, which is just shallowly copying an object byte-by-byte, not following pointers, e.g. if you have (&usize, u64), it is 16 bytes on a 64-bit computer, and a shallow copy would be taking those 16 bytes and replicating their value in some other 16-byte chunk of memory, without touching the usize at the other end of the &. That is, it's equivalent to calling memcpy.
a semantic copy, duplicating a value to create a new (somewhat) independent instance that can be safely used separately to the old one. E.g. a semantic copy of an Rc<T> involves just increasing the reference count, and a semantic copy of a Vec<T> involves creating a new allocation, and then semantically copying each stored element from the old to the new. These can be deep copies (e.g. Vec<T>) or shallow (e.g. Rc<T> doesn't touch the stored T), Clone is loosely defined as the smallest amount of work required to semantically copy a value of type T from inside a &T to T.
Rust is like C, every by-value use of a value is a byte copy:
let x: T = ...;
let y: T = x; // byte copy
fn foo(z: T) -> T {
return z // byte copy
}
foo(y) // byte copy
They are byte copies whether or not T moves or is "implicitly copyable". (To be clear, they aren't necessarily literally byte-by-byte copies at run-time: the compiler is free to optimise the copies out if code's behaviour is preserved.)
However, there's a fundamental problem with byte copies: you end up with duplicated values in memory, which can be very bad if they have destructors, e.g.
{
let v: Vec<u8> = vec![1, 2, 3];
let w: Vec<u8> = v;
} // destructors run here
If w was just a plain byte copy of v then there would be two vectors pointing at the same allocation, both with destructors that free it... causing a double free, which is a problem. NB. This would be perfectly fine, if we did a semantic copy of v into w, since then w would be its own independent Vec<u8> and destructors wouldn't be trampling on each other.
There's a few possible fixes here:
Let the programmer handle it, like C. (there's no destructors in C, so it's not as bad... you just get left with memory leaks instead. :P )
Perform a semantic copy implicitly, so that w has its own allocation, like C++ with its copy constructors.
Regard by-value uses as a transfer of ownership, so that v can no longer be used and doesn't have its destructor run.
The last is what Rust does: a move is just a by-value use where the source is statically invalidated, so the compiler prevents further use of the now-invalid memory.
let v: Vec<u8> = vec![1, 2, 3];
let w: Vec<u8> = v;
println!("{}", v); // error: use of moved value
Types that have destructors must move when used by-value (aka when byte copied), since they have management/ownership of some resource (e.g. a memory allocation, or a file handle) and its very unlikely that a byte copy will correctly duplicate this ownership.
"Well... what's an implicit copy?"
Think about a primitive type like u8: a byte copy is simple, just copy the single byte, and a semantic copy is just as simple, copy the single byte. In particular, a byte copy is a semantic copy... Rust even has a built-in trait Copy that captures which types have identical semantic and byte copies.
Hence, for these Copy types by-value uses are automatically semantic copies too, and so it's perfectly safe to continue using the source.
let v: u8 = 1;
let w: u8 = v;
println!("{}", v); // perfectly fine
Old: The NoCopy marker overrides the compiler's automatic behaviour of assuming that types which can be Copy (i.e. only containing aggregates of primitives and &) are Copy. However this will be changing when opt-in built-in traits is implemented.
As mentioned above, opt-in built-in traits are implemented, so the compiler no longer has automatic behaviour. However, the rule used for the automatic behaviour in the past are the same rules for checking whether it is legal to implement Copy.

The easiest way is to embed something in your type that is not copyable.
The standard library provides a "marker type" for exactly this use case: NoCopy. For example:
struct Triplet {
one: i32,
two: i32,
three: i32,
nocopy: NoCopy,
}

Related

Is my understanding of a Rust vector that supports Rc or Box wrapped types correct?

I'm not looking for code samples. I want to state my understanding of Box vs. Rc and have you tell me if my understanding is right or wrong.
Let's say I have some trait ChattyAnimal and a struct Cat that implements this trait, e.g.
pub trait ChattyAnimal {
fn make_sound(&self);
}
pub struct Cat {
pub name: String,
pub sound: String
}
impl ChattyAnimal for Cat {
fn make_sound(&self) {
println!("Meow!");
}
}
Now let's say I have other structs (Dog, Cow, Chicken, ...) that also implement the ChattyAnimal trait, and let's say I want to store all of these in the same vector.
So step 1 is I would have to use a Box type, because the Rust compiler cannot determine the size of everything that might implement this trait. And therefore, we must store these items on the heap – viola using a Box type, which is like a smarter pointer in C++. Anything wrapped with Box is automatically deleted by Rust when it goes out of scope.
// I can alias and use my Box type that wraps the trait like this:
pub type BoxyChattyAnimal = Box<dyn ChattyAnimal>;
// and then I can use my type alias, i.e.
pub struct Container {
animals: Vec<BoxyChattyAnimal>
}
Meanwhile, with Box, Rust's borrow checker requires changing when I pass or reassign the instance. But if I actually want to have multiple references to the same underlying instance, I have to use Rc. And so to have a vector of ChattyAnimal instances where each instance can have multiple references, I would need to do:
pub type RcChattyAnimal = Rc<dyn ChattyAnimal>;
pub struct Container {
animals: Vec<RcChattyAnimal>
}
One important take away from this is that if I want to have a vector of some trait type, I need to explicitly set that vector's type to a Box or Rc that wraps my trait. And so the Rust language designers force us to think about this in advance so that a Box or Rc cannot (at least not easily or accidentally) end up in the same vector.
This feels like a very and well thought design – helping prevent me from introducing bugs in my code. Is my understanding as stated above correct?
Yes, all this is correct.
There's a second reason for this design: it allows the compiler to verify that the operations you're performing on the vector elements are using memory in a safe way, relative to how they're stored.
For example, if you had a method on ChattyAnimal that mutates the animal (i.e. takes a &mut self argument), you could call that method on elements of a Vec<Box<dyn ChattyAnimal>> as long as you had a mutable reference to the vector; the Rust compiler would know that there could only be one reference to the ChattyAnimal in question (because the only reference is inside the Box, which is inside the Vec, and you have a mutable reference to the Vec so there can't be any other references to it). If you tried to write the same code with a Vec<Rc<dyn ChattyAnimal>>, the compiler would complain; it wouldn't be able to completely eliminate the possibility that your code might be mutating the animal at the same time as the code that called it was in the middle of trying to read the animal, which might lead to some inconsistencies in the calling code.
As a consequence, the compiler needs to know that all the elements of the Vec have their memory treated in the same way, so that it can check to make sure that a reference to some arbitrary element of the Vec is being used appropriately.
(There's a third reason, too, which is performance; because the compiler knows that this is a "vector of Boxes" or "vector of Rcs", it can generate code that assumes a particular storage mechanism. For example, if you have a vector of Rcs, and clone one of the elements, the machine code that the compiler generates will work simply by going to the memory address listed in the vector and adding 1 to the reference count stored there – there's no need for any extra levels of indirection. If the vector were allowed to mix different allocation schemes, the generated code would have to be a lot more complex, because it wouldn't be able to assume things like "there is a reference count", and would instead need to (at runtime) find the appropriate piece of code for dealing with the memory allocation scheme in use, and then run it; that would be much slower.)

Does a type that implements copy get moved if possible?

If you pass a type into a function that is not a pointer type and it implements Copy, does Rust copy it only if necessary?
Here is some specific code:
#[derive(Clone, Copy)]
struct Data([u8; 2]);
#[derive(Clone, Copy)]
struct Buffer(Data);
fn do_something(id: Buffer) {
println!("{}", id.0.0[0]);
}
fn create_buffer() -> Buffer {
Buffer(Data([0x01, 0x01]))
}
fn main() {
let buffer = create_buffer();
// Since we don't use buffer again in this function, copy trait isn't necessary on Buffer.
// But, "Buffer" does implement the Copy trait. Will Rust copy it anyway?
do_something(buffer);
}
If rust does not copy it, are there rules to this behavior or is it entirely up to the compiler to decide? Can I rely on this not being copied?
What if I were to call do_something(buffer) twice? Does it copy twice, or move once and copy once? This requires the copy trait to even compile so I expect at least 1 copy.
do_something(buffer);
do_something(buffer);
move and copy are solely type system concerns denoting whether the source can or can not still be used (by high-level code) after it has been used once[0]. In the lingo, it only defines whether the type is affine (aka move, the default) or normal (Copy)
After typechecking, both result in the same actual operation: a memcopy (semantically), then the normal elision optimisations will work the same for both and may optimise away the actual copy.
So the answer would be "yes". At a codegen level, there really is no difference between Copy and non-Copy types as long as they are used in the same way. Do note that the elision optimisation may fail to trigger in both cases. In fact there are issues on the tracker (some closed and some not) where "move elision" fails to trigger, and a large on-stack non-Copy type gets copied around.
[0] I believe there is limited NRVO and plans for more in MIR / rustc itself, but the vast majority of the work is left to LLVM: https://github.com/rust-lang/rust/issues/32966

Is it possible to create an Arc<[T]> from a Vec<T>?

To be more specific, why doesn't Arc<T> implement from_raw with a dynamically sized T while Box<T> does?
use std::sync::Arc;
fn main() {
let x = vec![1, 2, 3].into_boxed_slice();
let y = Box::into_raw(x);
let z = unsafe { Arc::from_raw(y) }; // ERROR
}
(play)
As pointed out in the comments, Arc::from_raw must be used with a pointer from Arc::into_raw, so the above example doesn't make sense. My original question (Is it possible to create an Arc<[T]> from a Vec<T>) remains: is this possible, and if not, why?
As of Rust 1.21.0, you can do this:
let thing: Arc<[i32]> = vec![1, 2, 3].into();
This was enabled by RFC 1845:
In addition: From<Vec<T>> for Rc<[T]> and From<Box<T: ?Sized>> for Rc<T> will be added.
Identical APIs will also be added for Arc.
Internally, this uses a method called copy_from_slice, so the allocation of the Vec is not reused. For the details why, check out DK.'s answer.
No.
First of all, as already noted in comments, you can't toss raw pointers around willy-nilly like that. To quote the documentation of Arc::from_raw:
The raw pointer must have been previously returned by a call to a Arc::into_raw.
You absolutely must read the documentation any time you're using an unsafe method.
Secondly, the conversion you want is impossible. Vec<T> → Box<[T]> works because, internally, Vec<T> is effectively a (Box<[T]>, usize) pair. So, all the method does is give you access to that internal Box<[T]> pointer [1]. Arc<[T]>, however, is not physically compatible with a Box<[T]>, because it has to contain the reference counts. The thing being pointed to by Arc<T> has a different size and layout to the thing being pointed to by Box<T>.
The only way you could get from Vec<T> to Arc<[T]> would be to reallocate the contents of the vector in a reference-counted allocation... which I'm not aware of any way to do. I don't believe there's any particular reason it couldn't be implemented, it just hasn't [2].
All that said, I believe not being able to use dynamically sized types with Arc::into_raw/Arc::from_raw is a bug. It's certainly possible to get Arcs with dynamically sized types... though only by casting from pointers to fixed-sized types.
[1]: Not quite. Vec<T> doesn't actually have a Box<[T]> inside it, but it has something compatible. It also has to shrink the slice to not contain uninitialised elements.
[2]: Rust does not, on the whole, have good support for allocating dynamically sized things in general. It's possible that part of the reason for this hole in particular is that Box<T> also can't allocate arrays directly, which is possibly because Vec<T> exists, because Vec<T> used to be part of the language itself, and why would you add array allocation to Box when Vec already exists? "Why not have ArcVec<T>, then?" Because you'd never be able to construct one due to shared ownership.
Arc<[T]> is an Arc containing a pointer to a slice of T's. But [T] is not actually Sized at compile time since the compiler does not know how long it will be (versus &[T] which is just a reference and thus has a known size).
use std::sync::Arc;
fn main() {
let v: Vec<u32> = vec![1, 2, 3];
let b: Box<[u32]> = v.into_boxed_slice();
let y: Arc<[u32]> = Arc::new(*b);
print!("{:?}", y)
}
Play Link
However, you can make an Arc<&[T]> without making a boxed slice:
use std::sync::Arc;
fn main() {
let v = vec![1, 2, 3];
let y: Arc<&[u32]> = Arc::new(&v[..]);
print!("{:?}", y)
}
Shared Ref Play Link
However, this seems like a study in the type system with little practical value. If what you really want is a view of the Vec that you can pass around between threads, an Arc<&[T]> will give you what you want. And if you need it to be on the heap, Arc<Box<&[T]>> works fine too.

What is the `PhantomData` actually doing in the implementation of `Vec`? [duplicate]

This question already has answers here:
Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?
(2 answers)
Closed 5 years ago.
How does PhantomData work in Rust? In the Nomicon it says the following:
In order to tell dropck that we do own values of type T, and therefore may drop some T's when we drop, we must add an extra PhantomData saying exactly that.
To me that seems to imply that when we add a PhantomData field to a structure, say in the case of a Vec.
pub struct Vec<T> {
data: *mut T,
length: usize,
capacity: usize,
phantom: PhantomData<T>,
}
that the drop checker should forbid the following sequence of code:
fn main() -> () {
let mut vector = Vec::new();
let x = Box::new(1 as i32);
let y = Box::new(2 as i32);
let z = Box::new(3 as i32);
vector.push(x);
vector.push(y);
vector.push(z);
}
Since the freeing of x, y, and z would occur before the freeing of the Vec, I would expect some complaint from the compiler. However, if you run the code above there is no warning or error.
The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).

Why is std::rc::Rc<> not Copy?

Can someone explain to me why Rc<> is not Copy?
I'm writing code that uses a lot of shared pointers, and having to type .clone() all the time is getting on my nerves.
It seems to me that Rc<> should just consist of a pointer, which is a fixed size, so the type itself should be Sized and hence Copy, right?
Am I missing something?
It seems to me that Rc<> should just consist of a pointer, which is a fixed size, so the type itself should be Sized and hence Copy, right?
This is not quite true. Rc is short for Reference Counted. This means that the type keeps track of how many references point to the owned data. That way we can have multiple owners at the same time and safely free the data, once the reference count reaches 0.
But how do we keep the reference counter valid and up to date? Exactly, we have to do something whenever a new reference/owner is created and whenever a reference/owner is deleted. Specifically, we have to increase the counter in the former case and decrease it in the latter.
The counter is decreased by implementing Drop, the Rust equivalent of a destructor. This drop() function is executed whenever a variable goes out of scope – perfect for our goal.
But when do we do the increment? You guessed it: in clone(). The Copy trait, by definition, says that a type can be duplicated just by copying bits:
Types that can be copied by simply copying bits (i.e. memcpy).
This is not true in our case, because: yes, we "just copy bits", but we also do additional work! We do need to increment our reference counter!
Drop impl of Rc
Clone impl of Rc
A type cannot implement Copy if it implements Drop (source). Since Rc does implement it to decrement its reference count, it is not possible.
In addition, Rc is not just a pointer. It consists of a Shared:
pub struct Rc<T: ?Sized> {
ptr: Shared<RcBox<T>>,
}
Which, in turn, is not only a pointer:
pub struct Shared<T: ?Sized> {
pointer: NonZero<*const T>,
_marker: PhantomData<T>,
}
PhantomData is needed to express the ownership of T:
this marker has no consequences for variance, but is necessary for
dropck to understand that we logically own a T.
For details, see:
https://github.com/rust-lang/rfcs/blob/master/text/0769-sound-generic-drop.md#phantom-data

Resources