Box<X> vs move semantics on X - rust

I have an easy question regarding Box<X>.
I understand what it does, it allocates X on the heap.
In C++ you use the new operator to allocate something on the heap so it can outlive the current scope (because if you create something on the stack it goes away at the end of the current block).
But reading Rust's documentation, it looks like you can create something on the stack and still return it taking advantage of the language's move semantics without having to resort to the heap.
Then it's not clear to me when to use Box<X> as opposed to simply X.
I just started reading about Rust so I apologize if I'm missing something obvious.

First of all: C++11 (and newer) has move semantics with rvalue references, too. So your question would also apply to C++. Keep in mind though, that C++'s move semantics are -- unlike Rust's ones -- highly unsafe.
Second: the word "move semantic" somehow hints the absence of a "copy", which is not true. Suppose you have a struct with 100 64-bit integers. If you would transfer an object of this struct via move semantics, those 100 integers will be copied (of course, the compiler's optimizer can often remove those copies, but anyway...). The advantage of move semantics comes to play when dealing with objects that deal with some kind of data on the heap (or pointers in general).
For example, take a look at Vec (similar to C++'s vector): the type itself only contains a pointer and two pointer-sized integer (ptr, len and cap). Those three times 64bit are still copied when the vector is moved, but the main data of the vector (which lives on the heap) is not touched.
That being said, let's discuss the main question: "Why to use Box at all?". There are actually many use cases:
Unsized types: some types (e.g. Trait-objects which also includes closures) are unsized, meaning their size is not known to the compiler. But the compiler has to know the size of each stack frame -- hence those unsized types cannot live on the stack.
Recursive data structures: think of a BinaryTreeNode struct. It saves two members named "left" and "right" of type... BinaryTreeNode? That won't work. So you can box both children so that the compiler knows the size of your struct.
Huge structs: think of the 100 integer struct mentioned above. If you don't want to copy it every time, you can allocate it on the heap (this happens pretty seldom).

There are cases where you can’t return X eg. if X is ?Sized (traits, non-compile-time-sized arrays, etc.). In those cases Box<X> will still work.

Related

Why is Box called like that in Rust?

Box<> is explained like this on the Rust Book:
... allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.
With a description like that, I would expect the described object to be called Heap<> or somethingHeapsomethingelse (DerefHeap, perhaps?). Instead, we use Box.
Why was the name Box chosen?
First, Heap is a very overloaded term, and importantly a heap is an abstract datastructure often used to implement things like priority queues. Having a type called Heap which is not a heap would be extremely confusing, a good reason to avoid that.
Second, "box" is related to the concept of "boxing" or "boxed" objects, in languages which strongly distinguish between value and reference types e.g. Java or Javascript: https://en.wikipedia.org/wiki/Object_type_(object-oriented_programming), in those a "boxed" type is the heap-allocated version of a value type e.g. int/Integer in java, or number/Number in Javascript.
Rust's Box performs an operation which is similar in spirit. Box also originally had a built-in "lifting" operator called box (it's still an internal operation and was originally planned to be stabilised for placement new), as such "box"/"boxing" makes sense linguistically in a way "heap"/"heaping" really does not (as "heaping" hints at a lot of things being put on a heap).

Rust Storage of Option<i32>

How is an Option laid out in memory? Since a i32 already takes up an even number of bytes, it Rust forced to use a full byte to store the single bit None/Some?
EDIT: According to this answer, Rust in fact uses an extra 4 (!) bytes. Why?
For structs and enums declared without special layout modifiers, the Rust docs state
Nominal types without a repr attribute have the default representation. Informally, this representation is also called the rust representation.
There are no guarantees of data layout made by this representation.
Option cannot possibly be repr(transparent) or repr(i*) since it is neither a newtype struct nor a fieldless enum, and we can check the source code and see that it's not declared repr(C). So no guarantees are made about the layout.
If it were declared repr(C), then we'd get the C representation, which is what you're envisioning. We need one integer to indicate whether it's None or Some (which size of integer is implementation-defined) and then we need enough space to store the actual i32.
In reality, since Rust is given a lot of leeway here, it can do clever things. If you have a variable which is only ever Some, it needn't store the tag bit (and, again, no guarantees are made about layout, so it's free to make this change internally). If you have an i32 that starts at 0 and goes up to 10, it's provably never negative, so Rust might choose to use, say, -1 to indicate None.

Rust Box vs non-box

Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.
Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.

Rust seems to allocate the same space in memory for an array of booleans as an array of 8 bit integers

Running this code in rust:
fn main() {
println!("{:?}", std::mem::size_of::<[u8; 1024]>());
println!("{:?}", std::mem::size_of::<[bool; 1024]>());
}
1024
1024
This is not what I expected. So I compiled and ran in release mode. But I got the same answer.
Why does the rust compiler seemingly allocate a whole byte for each single boolean? To me it seems to be a simple optimization to only allocate 128 bytes instead. This project implies I'm not the first to think this.
Is this a case of compilers being way harder than the seem? Or is this not optimized because it isn't a realistic scenario? Or am I not understanding something here?
Pointers and references.
There is an assumption that you can always take a reference to an item of a slice, a field of a struct, etc...
There is an assumption in the language that any reference to an instance of a statically sized type can transmuted to a type-erased pointer *mut ().
Those two assumptions together mean that:
due to (2), it is not possible to create a "bit-reference" which would allow sub-byte addressing,
due to (1), it is not possible not to have references.
This essentially means that any type must have a minimum alignment of one byte.
Note that this is not necessarily an issue. Opting in to a 128 bytes representation should be done cautiously, as it implies trading off speed (and convenience) for memory. It's not a pure win.
Prior art (in the name of std::vector<bool> in C++) is widely considered a mistake in hindsight.

Stack allocation of `isbits` types in Julia

Summary of question and answers
Objects of a particular type, say
type Foo
a::A
b::B
end
can be stored in either of two ways:
Inlined (aka by value): in this case, the statement "variable foo::Foo is stored at location x" effectively means we have a variable foo.a::A at location x and a variable foo.b::B at location x + sizeof(A) (technically the addresses could be a bit more complicated, but that's irrelevant for our purposes).
Referenced (aka by reference): "foo::Foo is stored at location x" means the location x contains a pointer fooptr::Ptr{Foo} such that there is a variable foo.a::A at location fooptr and foo.b::B at location fooptr + sizeof(A).
Unlike other languages (I'm looking at you, C/C++), Julia decides by itself whether to store variables inlined or referenced, and it does so based on the properties of the type:
mutable types -> referenced,
immutable types -> referenced if at least one of its fields is referenced, inlined otherwise.
There are at least two reasons for this rule:
StefanKarpinski's answer: The garbage collector needs be able to find all pointers to heap-allocated objects on the stack. Currently, Julia ensures this by storing all such pointers on a separate "shadow stack", but if we allowed composite types containing pointers to be placed on the stack then such a neat separation would no longer be possible. Instead, the compiler would need to look for pointers among other variables which poses technical difficulties.
yuyichao's answer: Julia requires the inline/reference decision to be made on a per-type rather than per-object basis, which means a hypothetical type
immutable A
a::A
end
would have to be infinitely big if we insisted on inlining it. So we would either have to forbid such recursive immutable types, or we could at most allow non-recursive immutable types to be inlined.
Original question
My understanding of memory management in Julia is:
mutable types -> heap-allocated,
immutable types and tuples -> stack-allocated unless one of their fields is heap-allocated (i.e. mutable).
I don't quite understand the rationale for this behaviour, however. I've read somewhere that the problem with stack-allocating immutables with pointers to mutables is that then the garbage collector might consider the mutables unreachable and destroy them prematurely. On the other hand, if we place the immutable on the heap then there will still be a pointer to the mutables, so it might seem like we avoided the problem, but actually we just shifted it to making sure that now the immutable itself will not be destroyed.
Can anyone explain this to me who has only very superficial knowledge of how garbage collection works?
The problem with stack-allocation of objects which reference other objects is knowing that they need to be traced during garbage collection. The simplest way to do this is what Julia does: heap allocate the objects and "root" them using "shadow stack" which is pushed and popped in sync with the actual stack. This introduces a fair bit of overhead and forces these objects to be heap allocated.
A more sophisticated approach that avoids the overhead of a shadow stack and heap allocation is to stack allocated these objects and then scan the stack which doing garbage collection and follow references from objects in the stack to objects on the heap. However, this requires knowing which objects in the stack are pointers to objects on the heap – in general, non heap-allocated objects are not guaranteed to be kept intact or contiguous in registers or the stack. One approach to doing this is called "conservative stack scanning" which entails assuming during gc that any value on the stack which looks like it could be a pointer to an object on the heap actually is. That approach has been successfully used in applications like Safari's JavaScript engine, but it's not without it's challenges. We've contemplated using conservative stack scanning in Julia, and an initial effort to do so was started but the effort was never completed.
References:
https://github.com/JuliaLang/julia/issues/11714
https://github.com/JuliaLang/julia/pull/8134
There are multiple issues/concepts that are frequently mixed together whenever this is brought up.
mutable or non-pointerfree immutable doesn't necessarily mean heap allocation, we already have optimization passes to elide some of the optimizations and are working on improving them further.
The object layout ABI is an user visible behavior and not something an optimization pass can easily change (unless it can prove that the local optimization it wants to do does not escape). The current ABI is that only isbits immutable will be stored inline (and "stack allocated" when used as local variable). There's a fundamental limitation of lifting the requirement of pointerfree-ness for inlined object, i.e. the necessity to handle recursive types. It is impossible to make all types in a reference circle stored inline and the loop has to be broken somewhere if we want to make some of them inlined. I believe we do have a consistent and predictable model to do this though whether this is desireable is another issue.
This is somewhat related to performance but not always. Stored inline means more copy so it's hard to make sure there's no regression if we do the switch.
Edit: And I should also mention that pointer-free is a sufficient condition for cycle free and is easier to compute, which is partly why we are currently using it to break inlining cycles.
GC support. This is basically the easiest part. It's very easy to make GC recognize pointers on the stack. It just needs to be done if we decide to change the object layout ABI.
Edit: And I should add that "GC support" is needed because we currently only support a limited / simple stack layout for object reference (i.e. an array of pointers). It's this that needs to be improved.

Resources