I just read the documentation of the std::boxed module and came across this sentence:
For non-zero-sized values, a Box will use the Global allocator for its allocation
What are non-zero-sized values (in this context and in general)?
Non-zero-sized types are types that occupy one or more bytes in memory. This is typical of most types, since the point of most data structures is to, well, store data.
Zero-sized types are types that do not occupy any space in memory, like () and PhantomData. They have certain uses, but allocators need to handle them specially.
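As a rough illustration of the distinction, the sketch below asks the compiler for the size of a few types. Marker and zst_sizes are hypothetical names invented for this example; size_of and PhantomData are the standard library items.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// A hypothetical unit struct, used only for illustration.
struct Marker;

// Returns the sizes, in bytes, of three zero-sized types.
fn zst_sizes() -> [usize; 3] {
    [
        size_of::<()>(),
        size_of::<PhantomData<u64>>(),
        size_of::<Marker>(),
    ]
}

fn main() {
    let _marker = Marker;

    // None of these types occupies any memory.
    assert_eq!(zst_sizes(), [0, 0, 0]);

    // A non-zero-sized type for comparison.
    assert_eq!(size_of::<u32>(), 4);

    // Boxing a zero-sized value is allowed; the Box itself is still one pointer wide.
    let boxed: Box<()> = Box::new(());
    assert_eq!(size_of::<Box<()>>(), size_of::<usize>());
    drop(boxed);
}
```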
Related
This may sound like a question with an obvious answer, but please bear with me.
Why is it useful that the Rust compiler knows the size in memory a type takes at compile time?
Is this used for some optimization? If so, what kinds?
Can one infer the size of the binary produced from the size of the types? That is, will a program using the i128 type produce a bigger binary compared to a program that uses u32? Or does this "knowing of types" only affect the amount of memory the program will use at runtime?
Knowing the size of the types in your program lets the compiler do a bunch of things:
Allocate the correct amount of memory for stack frames (since all your local variables are on the stack).
Allocate the correct amount of memory when heap-allocating a value.
Reason about things in the type system regarding sized-ness.
Additionally, the compiler can use its knowledge of type layout for optimizations. For example, size_of::<Option<Box<usize>>>() == size_of::<Box<usize>>(), since a null pointer is an invalid bit pattern for Box and can therefore be used to represent None.
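The size equality mentioned above can be checked directly. This is a minimal sketch; option_box_is_pointer_sized is a hypothetical helper name, and the comparison with Option<usize> reflects what current compilers do in practice rather than a documented guarantee.

```rust
use std::mem::size_of;

// True when wrapping a Box in Option costs no extra space.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<usize>>>() == size_of::<Box<usize>>()
}

fn main() {
    // None is stored as the null pointer, which a Box can never be.
    assert!(option_box_is_pointer_sized());

    // Every bit pattern of usize is a valid value, so Option<usize>
    // needs additional room to record whether it is Some or None.
    assert!(size_of::<Option<usize>>() > size_of::<usize>());
}
```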
This doesn't really affect the size of your binary in any meaningful way. Chances are that the biggest factor there is the actual code rather than the data. Technically, different types often use different instructions, so yes, different types could result in differently sized binaries. However, any such difference is probably overshadowed by other things (such as generic monomorphization of functions).
At runtime, however, the size of your data does matter. For example, 4 million u8s take up 4 MB, whereas 4 million u128s take up 64 MB.
In any case, knowing the size of the types in your program is simply part of compilation: it's not something that can be turned off, and the compiler could not function without it.
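The arithmetic behind those figures is just the element count multiplied by the element size. In the sketch below, bytes_needed is a hypothetical helper, and MB is taken to mean 1,000,000 bytes.

```rust
use std::mem::size_of;

// Number of bytes needed to store `count` values of type T back to back.
fn bytes_needed<T>(count: usize) -> usize {
    count * size_of::<T>()
}

fn main() {
    // A u8 is 1 byte, so 4 million of them take 4 MB.
    assert_eq!(bytes_needed::<u8>(4_000_000), 4_000_000);

    // A u128 is 16 bytes, so 4 million of them take 64 MB.
    assert_eq!(bytes_needed::<u128>(4_000_000), 64_000_000);
}
```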
Box<> is explained like this in the Rust Book:
... allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.
With a description like that, I would expect the described object to be called Heap<> or somethingHeapsomethingelse (DerefHeap, perhaps?). Instead, we use Box.
Why was the name Box chosen?
First, Heap is a very overloaded term. Importantly, a heap is an abstract data structure often used to implement things like priority queues. Having a type called Heap which is not a heap would be extremely confusing, which is a good reason to avoid the name.
Second, "box" relates to the concept of "boxing" or "boxed" objects in languages that strongly distinguish between value and reference types, e.g. Java or JavaScript (https://en.wikipedia.org/wiki/Object_type_(object-oriented_programming)). In those languages, a "boxed" type is the heap-allocated version of a value type, e.g. int/Integer in Java or number/Number in JavaScript.
Rust's Box performs an operation which is similar in spirit. Box also originally had a built-in "lifting" operator called box (it's still an internal operation and was originally planned to be stabilised for placement new). As such, "box"/"boxing" makes sense linguistically in a way "heap"/"heaping" really does not, since "heaping" hints at a lot of things being put on a heap.
Say I have a value type Foo, and a method Bar which accepts a reference to a Foo. Most languages will allow me to allocate a new Foo on the stack, and will automatically box it when I try and pass it in to Bar. However, as far as I am aware, this involves copying the Foo value onto the heap, and then using that reference.
Is it possible for a language to include a way of allocating a garbage collected object on the stack? When the method ends, the runtime could check if the object is still in use, and only then would it need to allocate the object on the heap, and update the references.
I imagine this would improve performance for methods that do not keep the reference, and it would hinder performance for methods that do.
Yes, Graal's partial escape analysis does that. Regular EA can only stack-allocate (more precisely: decompose the object into fields and put those fields on the stack) when the object doesn't escape. Partial EA, by contrast, can optimistically allocate on the stack and only reify the data into an object in the uncommon cases where the object must exist.
Also note that garbage collection is not a binary choice. You can have environments that mix and match garbage-collection, ref-counting, arena or scope-based allocators with automatic deallocation and completely manual management. In such a case stack allocations could also be one of the latter things while some heap would be garbage-collected.
I'm confused by what seem to be conflicting statements in the documentation for vectors in Rust:
A ‘vector’ is a dynamic or ‘growable’ array, implemented as the standard library type Vec<T>.
and
Vectors store their contents as contiguous arrays of T on the heap. This means that they must be able to know the size of T at compile time (that is, how many bytes are needed to store a T?). The size of some things can't be known at compile time. For these you'll have to store a pointer to that thing: thankfully, the Box type works perfectly for this.
Rust vectors are dynamically growable, but I don't see how that fits with the statement that their size must be known at compile time.
It's been a while since I've worked with a lower-level language where I have to think about memory allocation so I'm probably missing something obvious.
Note the wording:
they must be able to know the size of T
This says that the size of an individual element must be known at compile time. The total count of elements, and thus the total amount of memory allocated, is not known until runtime.
When the vector allocates memory, it says "I want to store 12 FooBar structs. One FooBar is 24 bytes, therefore I need to allocate 288 bytes total".
The 12 is the dynamic capacity of the vector, the 24 is the static size of one element (the T).
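The calculation above can be sketched in code. FooBar here is a hypothetical struct made of three u64 fields so that it comes out at 24 bytes, and bytes_for is a helper invented for the example. Vec::with_capacity only promises at least the requested capacity, so the sketch checks a lower bound.

```rust
use std::mem::size_of;

// Hypothetical 24-byte element type: three 8-byte fields.
#[allow(dead_code)]
struct FooBar {
    a: u64,
    b: u64,
    c: u64,
}

// Bytes of heap storage backing the vector: dynamic capacity times static element size.
fn bytes_for<T>(v: &Vec<T>) -> usize {
    v.capacity() * size_of::<T>()
}

fn main() {
    // The element size is fixed at compile time.
    assert_eq!(size_of::<FooBar>(), 24);
    assert_eq!(12 * size_of::<FooBar>(), 288);

    // The element count is chosen at runtime.
    let v: Vec<FooBar> = Vec::with_capacity(12);
    assert!(v.capacity() >= 12);
    assert!(bytes_for(&v) >= 288);
}
```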
I thought one of the big features of Rust is being a systems language comparable to C but with a garbage collector. If this is the case, why do you need to return values of a static size (or use Box from what I gather)?
Why does Rust need to return static sizes?
Every value in every language needs to have a static size. That's how the compiler / interpreter / runtime / virtual machine / hardware knows how to access the bits that make up the value.
In many languages, every value is comparable to a Rust Box, so each one takes up one or two pointers' worth of space. The statically known size of those values provides a layer of indirection that can point to something with a runtime-determined size.
In Rust (and C, C++, probably other system languages), you can also directly store arbitrary values on the stack, unboxed. In these cases, you still need to know the size that the value will occupy.
This is a simplification, as some languages allow certain specific values to reside on the stack, while others "embed" certain value types inside of the fixed-size indirection. Tricks like these are usually for performance reasons.
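To make the indirection concrete in Rust, here is a minimal sketch of a function whose return value has a fixed size even though the concrete type behind it is picked at runtime. Shape, Circle, Rect, and make_shape are hypothetical names used only for this example.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    radius: f64,
}

struct Rect {
    w: f64,
    h: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

// The return type is a fixed-size pointer; the value it points to
// may be a Circle (8 bytes) or a Rect (16 bytes).
fn make_shape(round: bool) -> Box<dyn Shape> {
    if round {
        Box::new(Circle { radius: 1.0 })
    } else {
        Box::new(Rect { w: 2.0, h: 3.0 })
    }
}

fn main() {
    // A Box of a trait object is two pointers wide: data pointer plus vtable pointer.
    assert_eq!(size_of::<Box<dyn Shape>>(), 2 * size_of::<usize>());
    // A Box of a sized type is a single pointer.
    assert_eq!(size_of::<Box<Circle>>(), size_of::<usize>());

    assert_eq!(make_shape(false).area(), 6.0);
    assert!(make_shape(true).area() > 3.0);
}
```

Returning dyn Shape directly would not compile, because the compiler could not tell how much space the return value needs.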
but with a garbage collector
Rust does not have a garbage collector. It does have smart pointers that deallocate resources when the pointer goes out of scope.
Box is the obvious smart pointer, but there's also Rc and Arc.
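A small sketch of that scope-based cleanup follows. Noisy is a hypothetical type that counts how many times its destructor runs; the two helper functions are likewise invented for the example. With Box, the value is dropped when its single owner goes out of scope. With Rc, it is dropped only when the last owner goes away.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type that records each run of its destructor in a shared counter.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Box: the heap value is freed as soon as its owner leaves scope.
fn box_drop_count() -> u32 {
    let drops = Cell::new(0);
    {
        let _boxed = Box::new(Noisy { drops: &drops });
    } // _boxed goes out of scope here and the destructor runs
    drops.get()
}

// Rc: the heap value is freed only when the last reference is dropped.
fn rc_drop_counts() -> (u32, u32) {
    let drops = Cell::new(0);
    let first = Rc::new(Noisy { drops: &drops });
    let second = Rc::clone(&first);

    drop(first);
    let after_first = drops.get(); // still alive through `second`

    drop(second);
    (after_first, drops.get())
}

fn main() {
    assert_eq!(box_drop_count(), 1);
    assert_eq!(rc_drop_counts(), (0, 1));
}
```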