The nomicon says:
repr(transparent)
[...]
This repr is only considered part of the public ABI of a type if either the single field is pub, or if its layout is documented in prose. Otherwise, the layout should not be relied upon by other crates.
ManuallyDrop<T> is repr(transparent), but its single field is not pub. Its docs say:
[...] This wrapper is 0-cost.
ManuallyDrop<T> is subject to the same layout optimizations as T. [...]
Does this count as documenting its layout in the prose? Is it safe to assume that I can transmute or otherwise convert (e.g. pointer cast) from T to ManuallyDrop<T>?
This was actually just clarified with today's release of Rust 1.61.0. The documentation of ManuallyDrop now specifies (the newly added text is "guaranteed to have the same layout as T, and is"):
ManuallyDrop<T> is guaranteed to have the same layout as T, and is subject
to the same layout optimizations as T.
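To make that concrete, here is a minimal sketch of both conversions from the question, relying only on the documented layout guarantee. The helper name into_manually_drop is made up for illustration and is not a std API.

```rust
use std::mem::{self, ManuallyDrop};

/// Hypothetical helper: reinterprets a `Box<T>` as a `Box<ManuallyDrop<T>>`
/// via a pointer cast, relying on the documented same-layout guarantee.
fn into_manually_drop<T>(boxed: Box<T>) -> Box<ManuallyDrop<T>> {
    let raw: *mut T = Box::into_raw(boxed);
    // SAFETY: `ManuallyDrop<T>` is documented to have the same layout as `T`,
    // and `raw` came from `Box::into_raw`, so it is valid and uniquely owned.
    unsafe { Box::from_raw(raw.cast::<ManuallyDrop<T>>()) }
}

fn main() {
    // Size and alignment agree.
    assert_eq!(mem::size_of::<ManuallyDrop<String>>(), mem::size_of::<String>());
    assert_eq!(mem::align_of::<ManuallyDrop<String>>(), mem::align_of::<String>());

    // By-value transmute from `T` to `ManuallyDrop<T>`.
    let s = String::from("hello");
    let wrapped: ManuallyDrop<String> = unsafe { mem::transmute(s) };
    assert_eq!(wrapped.as_str(), "hello");
    // The wrapper would never run the destructor, so unwrap it again
    // to let the String free its buffer normally.
    let s = ManuallyDrop::into_inner(wrapped);
    assert_eq!(s, "hello");

    // Pointer cast through a Box.
    let boxed = into_manually_drop(Box::new(vec![1, 2, 3]));
    assert_eq!(boxed.len(), 3);
    // Move the value back out so the Vec's buffer is not leaked.
    let v: Vec<i32> = ManuallyDrop::into_inner(*boxed);
    assert_eq!(v, [1, 2, 3]);
}
```

For the by-value direction you normally don't need unsafe at all, since ManuallyDrop::new(value) does the same thing; the casts are mostly interesting for references, raw pointers and containers.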
I was wondering, is there a way to know the list of all smart pointers in Rust std?
I know String and Vec<T> are smart pointers, and reading Chp. 15 of the Rust book, I am learning about Box<T>, Rc<T>, Ref<T> and RefMut<T>
I was just wondering, is there a place to know all the available smart pointers in Rust's std?
I don't think an all-encompassing list would be particularly useful, since there are lots of them (many of which serve mainly as an implementation detail of another type). If you really want a complete list of everything that's technically a smart pointer, then, as eggyal pointed out in a comment on your question, you could browse the implementors of Deref, but that will probably give you more noise than useful information. I've listed some of the most common and useful ones below:
Box<T> - a unique pointer to an object on the heap. Analogous to C++'s std::unique_ptr.
Rc<T>/Weak<T> - a shared pointer that provides shared ownership of a value on a single thread. This smart pointer cannot be sent between threads safely since it does not use atomic operations to maintain its refcount (the compiler will make sure you don't accidentally do this).
Arc<T>/Weak<T> - very similar to Rc except it uses atomic operations to update its refcount, and thus is thread-safe. Similar to std::shared_ptr.
Vec<T>/String/PathBuf/OsString et al. - all of these are smart pointers for owning dynamically allocated arrays of items on the heap. Read their documentation for more specific details.
Cow<'a, B> - a clone-on-write smart pointer. Useful for when you have a value that could be borrowed or owned.
The list above isn't the full picture but it will get you very far with most of the code you write.
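As a rough illustration of how these differ in use, here is a small sketch. The function names normalize and parallel_sum are invented for this example; everything else is plain std.

```rust
use std::borrow::Cow;
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Replaces spaces with underscores, allocating only if a change is needed.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

/// Sums a shared vector once per worker thread and adds up the results.
/// `Arc` is needed because the refcount is touched from several threads.
fn parallel_sum(data: Arc<Vec<u32>>, workers: usize) -> u32 {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let data = Arc::clone(&data);
            thread::spawn(move || data.iter().sum::<u32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // Box: a single owner, the value lives on the heap.
    let boxed: Box<u64> = Box::new(7);
    assert_eq!(*boxed + 1, 8);

    // Rc: shared ownership on one thread; Weak does not keep the value alive.
    let shared = Rc::new(String::from("shared"));
    let second = Rc::clone(&shared);
    let weak = Rc::downgrade(&shared);
    assert_eq!(Rc::strong_count(&shared), 2);
    drop(second);
    drop(shared);
    assert!(weak.upgrade().is_none());

    // Arc: same idea, but usable across threads (4 workers, each sums to 6).
    assert_eq!(parallel_sum(Arc::new(vec![1, 2, 3]), 4), 24);

    // Cow: borrowed when nothing changes, owned when it had to allocate.
    assert!(matches!(normalize("no_spaces"), Cow::Borrowed(_)));
    assert!(matches!(normalize("has space"), Cow::Owned(_)));
}
```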
As you've noted there are other smart pointers like Ref and MutexGuard. These are returned by types with interior mutability, and usually have some kind of specific behavior on drop, such as releasing a lock or decrementing a refcount. Usually you don't interact with these types as much, but you can read their documentation on an as-needed basis.
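The "specific behavior on drop" is easiest to see by probing the container while a guard is alive. A small sketch (push_and_len and increment are names I made up for the example):

```rust
use std::cell::RefCell;
use std::sync::Mutex;

/// Appends to a `RefCell`-guarded vector and returns the new length.
fn push_and_len(cell: &RefCell<Vec<i32>>, value: i32) -> usize {
    {
        // `RefMut` marks the cell as mutably borrowed until it is dropped.
        let mut guard = cell.borrow_mut();
        guard.push(value);
        // A second borrow while `guard` is alive is refused at run time.
        assert!(cell.try_borrow().is_err());
    } // `guard` is dropped here, which resets the borrow flag.
    cell.borrow().len()
}

/// Increments a counter behind a `Mutex` and returns the new value.
fn increment(counter: &Mutex<u32>) -> u32 {
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    // The lock stays held for as long as `guard` lives.
    assert!(counter.try_lock().is_err());
    *guard
} // `guard` is dropped here, which releases the lock.

fn main() {
    let cell = RefCell::new(vec![1, 2]);
    assert_eq!(push_and_len(&cell, 3), 3);

    let counter = Mutex::new(0);
    assert_eq!(increment(&counter), 1);
    // After the guard is gone the lock can be taken again.
    assert!(counter.try_lock().is_ok());
}
```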
There is also Pin<P>, which wraps another pointer type P, but this smart pointer is notoriously hard to understand and really only comes up in conversations about the implementation details of futures and generators. You can read more about it in the std::pin module documentation.
Box<> is explained like this on the Rust Book:
... allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.
With a description like that, I would expect the described object to be called Heap<> or somethingHeapsomethingelse (DerefHeap, perhaps?). Instead, we use Box.
Why was the name Box chosen?
First, Heap is a very overloaded term, and importantly a heap is an abstract data structure often used to implement things like priority queues. Having a type called Heap which is not a heap would be extremely confusing, which is a good reason to avoid the name.
Second, "box" is related to the concept of "boxing" or "boxed" objects in languages which strongly distinguish between value and reference types, e.g. Java or JavaScript (see https://en.wikipedia.org/wiki/Object_type_(object-oriented_programming)). In those languages, a "boxed" type is the heap-allocated version of a value type, e.g. int/Integer in Java or number/Number in JavaScript.
Rust's Box performs an operation which is similar in spirit. Box also originally had a built-in "lifting" operator called box (it's still an internal operation and was originally planned to be stabilised for placement new). As such, "box"/"boxing" makes sense linguistically in a way "heap"/"heaping" really does not (as "heaping" hints at a lot of things being put on a heap).
How is an Option laid out in memory? Since an i32 already takes up an even number of bytes, is Rust forced to use a full byte to store the single bit None/Some?
EDIT: According to this answer, Rust in fact uses an extra 4 (!) bytes. Why?
For structs and enums declared without special layout modifiers, the Rust docs state
Nominal types without a repr attribute have the default representation. Informally, this representation is also called the rust representation.
There are no guarantees of data layout made by this representation.
Option cannot possibly be repr(transparent) or repr(i*) since it is neither a newtype struct nor a fieldless enum, and we can check the source code and see that it's not declared repr(C). So no guarantees are made about the layout.
If it were declared repr(C), then we'd get the C representation, which is what you're envisioning. We need one integer to indicate whether it's None or Some (which size of integer is implementation-defined) and then we need enough space to store the actual i32.
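In reality, since Rust is given a lot of leeway here, it can do clever things. If you have a variable which is only ever Some, the optimizer needn't keep the tag around (and, again, no guarantees are made about layout, so it's free to make this change internally). In principle, if an i32 were provably never negative, Rust could use, say, -1 to indicate None. What the compiler does in practice is type-based: when the payload type has a bit pattern that is never a valid value, such as zero for NonZeroI32 or null for Box<T> and &T, that pattern stands for None and no separate tag is stored. A plain i32 has no spare bit pattern, so Option<i32> needs a separate tag, and alignment rounds the total up to 8 bytes on typical targets, which is where the extra 4 bytes come from.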
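You can check the sizes yourself. A quick sketch: COptionI32 is a hypothetical hand-written struct standing in for the C-style layout described above, not anything from std. The asserted equalities are the ones the std docs guarantee; the Option<i32> size is only printed, because the default representation promises nothing.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

/// Hypothetical C-style counterpart of `Option<i32>`: explicit tag + payload.
#[allow(dead_code)]
#[repr(C)]
struct COptionI32 {
    is_some: u8,
    value: i32,
}

fn main() {
    // Documented guarantee: the forbidden value 0 is reused for `None`.
    assert_eq!(size_of::<Option<NonZeroI32>>(), size_of::<i32>());
    // Documented guarantee: the null pointer is reused for `None`.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());

    // Every bit pattern of an i32 is a valid value, so the tag needs extra
    // space, and the size is then rounded up to a multiple of the alignment.
    assert!(size_of::<Option<i32>>() > size_of::<i32>());

    // Typically both of these report 8 bytes on mainstream targets.
    println!("Option<i32>: {} bytes", size_of::<Option<i32>>());
    println!("COptionI32:  {} bytes", size_of::<COptionI32>());
}
```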
Is the exact layout of D structs defined? That is, the exact offset of every member defined and in a compiler-independent way? That would mean that the compiler would, fortunately or unfortunately, depending on your needs, be forbidden to reorder fields to get optimal packing of smaller items and minimise all offsets.
It is indeed illegal for the D compiler to rearrange members of a struct (though it can for classes). It's important that the compiler not rearrange members for structs, because structs are supposed to be able to be used for low-level stuff that requires specific memory layouts. It's also the case that structs need to be able to interact with C code, so they need to match what you'd get in C (at least when extern(C) is used). So, structs definitely don't get their members rearranged. In addition, you can specify the alignment of members via the align attribute, so you have full control over the layout of a struct.
Now, the default layout can differ depending on the architecture (e.g. 64-bit pointers take up more space than 32-bit pointers, which will affect how the struct members are packed), but it should match what you get in C on that architecture.
I have an easy question regarding Box<X>.
I understand what it does, it allocates X on the heap.
In C++ you use the new operator to allocate something on the heap so it can outlive the current scope (because if you create something on the stack it goes away at the end of the current block).
But reading Rust's documentation, it looks like you can create something on the stack and still return it taking advantage of the language's move semantics without having to resort to the heap.
Then it's not clear to me when to use Box<X> as opposed to simply X.
I just started reading about Rust so I apologize if I'm missing something obvious.
First of all: C++11 (and newer) has move semantics with rvalue references, too. So your question would also apply to C++. Keep in mind though, that C++'s move semantics are -- unlike Rust's -- highly unsafe.
Second: the term "move semantics" somehow hints at the absence of a "copy", which is not true. Suppose you have a struct with 100 64-bit integers. If you transfer an object of this struct via move semantics, those 100 integers will be copied (of course, the compiler's optimizer can often remove those copies, but anyway...). The advantage of move semantics comes into play when dealing with objects that manage some kind of data on the heap (or pointers in general).
For example, take a look at Vec (similar to C++'s vector): the type itself only contains a pointer and two pointer-sized integers (ptr, len and cap). Those three 64-bit values are still copied when the vector is moved, but the main data of the vector (which lives on the heap) is not touched.
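A small sketch to make this observable (buffer_survives_move is a name invented for this example). Note that the three-word size of Vec is what current implementations do, not something the docs promise, so it is only printed here.

```rust
use std::mem::size_of;

/// Moves a vector into a new binding and reports whether the heap buffer
/// is still at the same address afterwards.
fn buffer_survives_move(v: Vec<u64>) -> bool {
    let before = v.as_ptr();
    let moved = v; // copies only the small (ptr, len, cap) header
    before == moved.as_ptr()
}

fn main() {
    // Header size in current implementations: three pointer-sized fields.
    println!("size_of::<Vec<u64>>()  = {}", size_of::<Vec<u64>>());
    println!("3 * size_of::<usize>() = {}", 3 * size_of::<usize>());

    // 100 integers stored inline: a move has to copy all 800 bytes.
    assert_eq!(size_of::<[u64; 100]>(), 800);

    // The same 100 integers in a Vec: the heap data does not move.
    assert!(buffer_survives_move((0..100).collect()));
}
```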
That being said, let's discuss the main question: "Why to use Box at all?". There are actually many use cases:
Unsized types: some types (e.g. trait objects, which includes closures behind dyn Fn) are unsized, meaning their size is not known to the compiler at compile time. But the compiler has to know the size of each stack frame, hence values of those unsized types cannot live directly on the stack.
Recursive data structures: think of a BinaryTreeNode struct. It saves two members named "left" and "right" of type... BinaryTreeNode? That won't work. So you can box both children so that the compiler knows the size of your struct.
Huge structs: think of the 100 integer struct mentioned above. If you don't want to copy it every time, you can allocate it on the heap (this happens pretty seldom).
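The recursive case could look roughly like this. The field and method names are my own choice; without the Box the compiler rejects the struct because it would have infinite size.

```rust
/// A binary tree node. `Box` gives each child a fixed, pointer-sized slot,
/// so the size of the struct is known to the compiler.
struct BinaryTreeNode {
    value: i32,
    left: Option<Box<BinaryTreeNode>>,
    right: Option<Box<BinaryTreeNode>>,
}

impl BinaryTreeNode {
    /// Creates a node without children.
    fn leaf(value: i32) -> Self {
        BinaryTreeNode { value, left: None, right: None }
    }

    /// Adds up the values of this node and all of its descendants.
    fn sum(&self) -> i32 {
        let left = self.left.as_ref().map_or(0, |n| n.sum());
        let right = self.right.as_ref().map_or(0, |n| n.sum());
        self.value + left + right
    }
}

fn main() {
    let tree = BinaryTreeNode {
        value: 1,
        left: Some(Box::new(BinaryTreeNode::leaf(2))),
        right: Some(Box::new(BinaryTreeNode {
            value: 3,
            left: Some(Box::new(BinaryTreeNode::leaf(4))),
            right: None,
        })),
    };
    assert_eq!(tree.sum(), 10);
}
```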
There are cases where you can't return X, e.g. if X is ?Sized (trait objects, slices whose length is not known at compile time, etc.). In those cases Box<X> will still work.
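For instance, a sketch with two made-up functions: make_op returns one of two different closure types behind a trait object, and first_squares returns a slice whose length is only known at run time.

```rust
/// Returns a closure as a boxed trait object. Which closure is picked is
/// decided at run time, so there is no single concrete type to return.
fn make_op(double: bool) -> Box<dyn Fn(i32) -> i32> {
    if double {
        Box::new(|x: i32| x * 2)
    } else {
        Box::new(|x: i32| x + 1)
    }
}

/// Returns a boxed slice; `[u64]` itself is unsized and cannot be returned.
fn first_squares(n: usize) -> Box<[u64]> {
    (0..n as u64)
        .map(|i| i * i)
        .collect::<Vec<u64>>()
        .into_boxed_slice()
}

fn main() {
    assert_eq!(make_op(true)(5), 10);
    assert_eq!(make_op(false)(5), 6);
    assert_eq!(first_squares(4).to_vec(), vec![0, 1, 4, 9]);
}
```

When only one concrete type is ever returned, impl Trait in return position avoids the allocation; the Box is needed once the concrete type or size is decided at run time.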