What does the "box" keyword do? - rust

In Rust, we can use the Box<T> type to allocate things on the heap. This type is used to safely abstract pointers to heap memory. Box<T> is provided by the Rust standard library.
I was curious about how Box<T> allocation is implemented, so I found its source code. Here is the code for Box<T>::new (as of Rust 1.0):
impl<T> Box<T> {
/// Allocates memory on the heap and then moves `x` into it.
/// [...]
#[stable(feature = "rust1", since = "1.0.0")]
#[inline(always)]
pub fn new(x: T) -> Box<T> {
box x
}
}
The only line in the implementation returns the value box x. This box keyword is not explained anywhere in the official documentation; in fact, it is only mentioned briefly on the std::boxed documentation page.

NOTE: This reply is a bit old. Since it talks about internals and unstable features, things have changed a little bit. The basic mechanism remains the same though, so the answer is still capable of explaining the underlying mechanisms of box.
What does box x usually uses to allocate and free memory?
The answer is the functions marked with lang items exchange_malloc for allocation and exchange_free for freeing. You can see the implementation of those in the default standard library at heap.rs#L112 and heap.rs#L125.
In the end the box x syntax depends on the following lang items:
owned_box on a Box struct to encapsulate the allocated pointer. This struct does not need a Drop implementation, it is implemented automatically by the compiler.
exchange_malloc to allocate the memory.
exchange_free to free the previously allocated memory.
This can be effectively seen in the lang items chapter of the unstable rust book using this no_std example:
#![feature(lang_items, box_syntax, start, no_std, libc)]
#![no_std]
extern crate libc;
extern {
fn abort() -> !;
}
#[lang = "owned_box"]
pub struct Box<T>(*mut T);
#[lang = "exchange_malloc"]
unsafe fn allocate(size: usize, _align: usize) -> *mut u8 {
let p = libc::malloc(size as libc::size_t) as *mut u8;
// malloc failed
if p as usize == 0 {
abort();
}
p
}
#[lang = "exchange_free"]
unsafe fn deallocate(ptr: *mut u8, _size: usize, _align: usize) {
libc::free(ptr as *mut libc::c_void)
}
#[start]
fn main(argc: isize, argv: *const *const u8) -> isize {
let x = box 1;
0
}
#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "panic_fmt"] fn panic_fmt() -> ! { loop {} }
Notice how Drop was not implemented for the Box struct? Well let's see the LLVM IR generated for main:
define internal i64 #_ZN4main20hbd13b522fdb5b7d4ebaE(i64, i8**) unnamed_addr #1 {
entry-block:
%argc = alloca i64
%argv = alloca i8**
%x = alloca i32*
store i64 %0, i64* %argc, align 8
store i8** %1, i8*** %argv, align 8
%2 = call i8* #_ZN8allocate20hf9df30890c435d76naaE(i64 4, i64 4)
%3 = bitcast i8* %2 to i32*
store i32 1, i32* %3, align 4
store i32* %3, i32** %x, align 8
call void #"_ZN14Box$LT$i32$GT$9drop.103617h8817b938807fc41eE"(i32** %x)
ret i64 0
}
The allocate (_ZN8allocate20hf9df30890c435d76naaE) was called as expected to build the Box, meanwhile... Look! A Drop method for the Box (_ZN14Box$LT$i32$GT$9drop.103617h8817b938807fc41eE)! Let's see the IR for this method:
define internal void #"_ZN14Box$LT$i32$GT$9drop.103617h8817b938807fc41eE"(i32**) unnamed_addr #0 {
entry-block:
%1 = load i32** %0
%2 = ptrtoint i32* %1 to i64
%3 = icmp ne i64 %2, 2097865012304223517
br i1 %3, label %cond, label %next
next: ; preds = %cond, %entry- block
ret void
cond: ; preds = %entry-block
%4 = bitcast i32* %1 to i8*
call void #_ZN10deallocate20he2bff5e01707ad50VaaE(i8* %4, i64 4, i64 4)
br label %next
}
There it is, deallocate (ZN10deallocate20he2bff5e01707ad50VaaE) being called on the compiler generated Drop!
Notice even on the standard library the Drop trait is not implemented by user-code. Indeed Box is a bit of a magical struct.

Before box was marked as unstable, it was used as a shorthand for calling Box::new. However, it's always been intended to be able to allocate arbitrary types, such as Rc, or to use arbitrary allocators. Neither of these have been finalized, so it wasn't marked as stable for the 1.0 release. This is done to prevent supporting a bad decision for all of Rust 1.x.
For further reference, you can read the RFC that changed the "placement new" syntax and also feature gated it.

box does exactly what Box::new() does - it creates an owned box.
I believe that you can't find implementation of box keyword because currently it is hardcoded to work with owned boxes, and Box type is a lang item:
#[lang = "owned_box"]
#[stable(feature = "rust1", since = "1.0.0")]
#[fundamental]
pub struct Box<T>(Unique<T>);
Because it is a lang item, the compiler has special logic to handle its instantiation which it can link with box keyword.
I believe that the compiler delegates box allocation to functions in alloc::heap module.
As for what box keyword does and supposed to do in general, Shepmaster's answer describes perfectly.

Related

Why does moving a value generate LLVM IR that includes a value copy?

If I assign a variable (with a type without copy semantics) to a second one, rustc creates a copy of the value even if the first one cannot be used/accessed anymore.
For example:
#[derive(Debug)]
struct T {
val: i32,
}
fn main() {
let v1 = T { val: 123 };
let v2 = v1;
// println!("{:?}", v1); error!
println!("{:?}", v2);
}
This generates LLVM IR which contains two store operations:
%v2 = alloca i32, align 4
%v1 = alloca i32, align 4
store i32 123, i32* %v1, align 4
store i32 123, i32* %v2, align 4
Isn't it unnecessary workload to copy values with move semantics if the first copy cannot be used anymore?
LLVM was specifically designed to perform optimizations. A front-end will try to encode everything it knows about the operations being performed in the LLVM IR and then invoke optimizations upon the LLVM IR.
If you turn on optimizations you'll indeed only get:
%v2 = alloca i32, align 4
%0 = bitcast i32* %v2 to i8*, !dbg !36
call void #llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0), !dbg !36
store i32 123, i32* %v2, align 4, !dbg !37
%v1 has been optimized away entirely.
https://godbolt.org/z/fW9457eEn
Because copy vs move is language semantics not how it is done in the generated code. Rust always (maybe that have changed?) generates copy and then just logically disallows usage of the moved value. It relies on LLVM ability to remove unneeded copies.

LLVM produced by rustc gives error about argument type of main when run with lli

I'm trying to learn a little about the LLVM IR, particularly what exactly rustc outputs. I'm having a little bit of trouble running even a very simple case.
I put the following in a source file simple.rs:
fn main() {
let x = 7u32;
let y = x + 2;
}
and run rustc --emit llvm-ir simple.rs to get the file simple.ll, containing
; ModuleID = 'simple.cgu-0.rs'
source_filename = "simple.cgu-0.rs"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: uwtable
define internal void #_ZN6simple4main17h8ac50d7470339b75E() unnamed_addr #0 {
start:
br label %bb1
bb1: ; preds = %start
ret void
}
define i64 #main(i64, i8**) unnamed_addr {
top:
%2 = call i64 #_ZN3std2rt10lang_start17ha09816a4e25587eaE(void ()* #_ZN6simple4main17h8ac50d7470339b75E, i64 %0, i8** %1)
ret i64 %2
}
declare i64 #_ZN3std2rt10lang_start17ha09816a4e25587eaE(void ()*, i64, i8**) unnamed_addr
attributes #0 = { uwtable }
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"PIE Level", i32 2}
I then try to run this with the command
lli-3.9 -load ~/.multirust/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-35ad9950c7e5074b.so simple.ll
but I get the error message
LLVM ERROR: Invalid type for first argument of main() supplied
I'm able to make a minimal reproduction of this as follows: I make a file called s2.ll, containing
define i32 #main(i64, i8**) {
ret i32 42
}
and running lli-3.9 s2.ll gives the same error message. But if I change the contents of s2.ll to
define i32 #main(i32, i8**) {
ret i32 42
}
(i.e. I've changed the type of argc in main) then lli-3.9 s2.ll runs, and echo $? reveals that it did indeed return 42.
I don't think I should have to pass in the i64 explicitly - my argument list or C strings should be put into memory somewhere and the pointer and length passed to main automatically, right? Therefore I assume that I'm doing something wrong in the way in invoke lli - but I have no idea what.
Rust marks its entry point (the function marked with #[start] attribute, by default the function lang_start in the standard library) as taking an argc parameter of type isize. This is a bug because it should have the type of a C int, so it should be 32 bits on a 64-bit platform, but isize is 64 bits. However, due to the way 64-bit calling conventions work, this happens to still work correctly. The same issue also exists for the return type.
A fix for this has been committed on 2017-10-01 and should be present in Rust 1.22.
lli is apparently more strict about checking the type of main which is why it gives the error. But if you use llc instead, it should work correctly.
To get the correct main signature, you can cancel the default main by putting #![no_main] at the top of the module, and provide your own main marked with #[no_mangle]. But note that this will skip the standard library's initialization.
#![no_main]
#[no_mangle]
pub extern fn main(_argc: i32, _argv: *const *const u8) -> i32 {
0
}
See also:
documentation about lang_items and disabling main
Tracking issue for #[start] feature, where some people mention isize not being correct.

Why does Rust store i64s captured by a closure as i64*s in the LLVM IR closure environment?

In this simple example
#[inline(never)]
fn apply<F, A, B>(f: F, x: A) -> B
where F: FnOnce(A) -> B {
f(x)
}
fn main() {
let y: i64 = 1;
let z: i64 = 2;
let f = |x: i64| x + y + z;
print!("{}", apply(f, 42));
}
the closure passed to apply is passed as a LLVM IR {i64*, i64*}*:
%closure = type { i64*, i64* }
define internal fastcc i64 #apply(%closure* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 personality i32 (i32, i32, i64, %"8.unwind::libunwind::_Unwind_Exception"*, %"8.unwind::libunwind::_Unwind_Context"*)* #rust_eh_personality {
entry-block:
%1 = getelementptr inbounds %closure, %closure* %0, i64 0, i32 1
%2 = getelementptr inbounds %closure, %closure* %0, i64 0, i32 0
%3 = load i64*, i64** %2, align 8
%4 = load i64*, i64** %1, align 8
%.idx.val.val.i = load i64, i64* %3, align 8, !noalias !1
%.idx1.val.val.i = load i64, i64* %4, align 8, !noalias !1
%5 = add i64 %.idx.val.val.i, 42
%6 = add i64 %5, %.idx1.val.val.i
ret i64 %6
}
(apply actually has a more complicated name in the generated LLVM code.)
This causes two loads to get to each of the captured variables. Why isn't %closure just {i64, i64} (which would make the argument to apply {i64, i64}*)?
Closures capture by reference by default. You can change that behavior to capture by value by adding the move keyword before the parameter list:
let f = move |x: i64| x + y + z;
This generates much leaner code:
define internal fastcc i64 #apply(i64 %.0.0.val, i64 %.0.1.val) unnamed_addr #0 personality i32 (i32, i32, i64, %"8.unwind::libunwind::_Unwind_Exception"*, %"8.unwind::libunwind::_Unwind_Context"*)* #rust_eh_personality {
entry-block:
%0 = add i64 %.0.0.val, 42
%1 = add i64 %0, %.0.1.val
ret i64 %1
}
Adding the move keyword means that any value that the closure uses will be moved into the closure's environment. In the case of integers, which are Copy, it doesn't make much difference, but in the case of other types like String, it means that you can't use the String anymore in the outer scope after creating the closure. It's an all-or-nothing deal, but you can manually take references to individual variables outside a move closure and have the closure use these references instead of the original values to get manual capture-by-reference behavior.
Can you observe the value vs ref difference somehow in this code?
If you take the address of the captured variable, you can observe the difference. Notice how the first and second output lines are the same, and the third is different.

Do copy semantics in Rust literally become a copy in memory?

Let's say I have the following struct in Rust:
struct Num {
pub num: i32;
}
impl Num {
pub fn new(x: i32) -> Num {
Num { num: x }
}
}
impl Clone for Num {
fn clone(&self) -> Num {
Num { num: self.num }
}
}
impl Copy for Num { }
impl Add<Num> for Num {
type Output = Num;
fn add(self, rhs: Num) -> Num {
Num { num: self.num + rhs.num }
}
}
And then I have the following code snippet:
let a = Num::new(0);
let b = Num::new(1);
let c = a + b;
let d = a + b;
This works because Num is marked as Copy. Otherwise, the second addition would be a compilation error, since a and b had already been moved into the add function during the first addition (I think).
The question is what the emitted assembly does. When the add function is called, are two copies of the arguments made into the new stack frame, or is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?
If the Rust compiler isn't smart enough, and actually does the copying like a function with a argument passed by value in C++, how do you avoid the performance overhead in cases where it matters?
The context is I am implementing a matrix class (just to learn), and if I have a 100x100 matrix, I really don't want to be invoking two copies every time I try to do a multiply or add.
The question is what the emitted assembly does
There's no need to guess; you can just look. Let's use this code:
use std::ops::Add;
#[derive(Copy, Clone, Debug)]
struct Num(i32);
impl Add for Num {
type Output = Num;
fn add(self, rhs: Num) -> Num {
Num(self.0 + rhs.0)
}
}
#[inline(never)]
fn example() -> Num {
let a = Num(0);
let b = Num(1);
let c = a + b;
let d = a + b;
c + d
}
fn main() {
println!("{:?}", example());
}
Paste it into the Rust Playground, then select the Release mode and view the LLVM IR:
Search through the result to see the definition of the example function:
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 #_ZN10playground7example17h60e923840d8c0cd0E() unnamed_addr #2 {
start:
ret i32 2
}
That's right, this was completely and totally evaluated at compile time and simplified all the way down to a simple constant. Compilers are pretty good nowadays.
Maybe you want to try something not quite as hardcoded?
#[inline(never)]
fn example(a: Num, b: Num) -> Num {
let c = a + b;
let d = a + b;
c + d
}
fn main() {
let something = std::env::args().count();
println!("{:?}", example(Num(something as i32), Num(1)));
}
Produces
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 #_ZN10playground7example17h73d4138fe5e9856fE(i32 %a) unnamed_addr #3 {
start:
%0 = shl i32 %a, 1
%1 = add i32 %0, 2
ret i32 %1
}
Oops, the compiler saw that we we basically doing (x + 1) * 2, so it did some tricky optimizations here to get to 2x + 2. Let's try something harder...
#[inline(never)]
fn example(a: Num, b: Num) -> Num {
a + b
}
fn main() {
let something = std::env::args().count() as i32;
let another = std::env::vars().count() as i32;
println!("{:?}", example(Num(something), Num(another)));
}
Produces
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 #_ZN10playground7example17h73d4138fe5e9856fE(i32 %a, i32 %b) unnamed_addr #3 {
start:
%0 = add i32 %b, %a
ret i32 %0
}
A simple add instruction.
The real takeaway from this is:
Look at the generated assembly for your cases. Even similar-looking code might optimize differently.
Perform micro and macro benchmarking. You never know exactly how the code will play out in the big picture. Maybe all your cache will be blown, but your micro benchmarks will be stellar.
is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?
As you just saw, the Rust compiler plus LLVM are pretty smart. In general, it is possible to elide copies when it knows that the operand isn't needed. Whether it will work in your case or not is tough to answer.
Even if it did, you might not want to be passing large items via the stack as it's always possible that it will need to be copied.
And note that you don't have to implement copy for the value, you can choose to only allow it via references:
impl<'a, 'b> Add<&'b Num> for &'a Num {
type Output = Num;
fn add(self, rhs: &'b Num) -> Num {
Num(self.0 + rhs.0)
}
}
In fact, you may want to implement both ways of adding them, and maybe all 4 permutations of value / reference!
If the Rust compiler isn't smart enough, and actually does the copying like a function with a argument passed by value in C++, how do you avoid the performance overhead in cases where it matters?
The context is I am implementing a matrix class (just to learn), and if I have a 100x100 matrix, I really don't want to be invoking two copies every time I try to do a multiply or add.
All of Rust's implicit copies (be them from moves or actual Copy types) are a shallow memcpy. If you heap allocate, only the pointers and such are copied. Unlike C++, passing a vector by value will only copy three pointer-sized values.
To copy heap memory, an explicit copy must be made, normally by calling .clone(), implemented with #[derive(Clone)] or impl Clone.
I've talked in more detail about this elsewhere.
Shepmaster points out that shallow copies are often messed with a lot by the compiler - generally only heap memory and massive stack values cause problems.

What happens when a stack-allocated value is boxed?

If we have a value that is already allocated on stack, will boxing copy it to heap and then transfer ownership (that's how it works in .NET, with the exception that both copies will stay alive)? Or will the compiler be "smart" enough to allocate it directly on heap from the beginning?
struct Foo {
x: i32,
}
fn main() {
// a is allocated on stack?
let a = Foo { x: 1 };
// if a is not used, it will be optimized out
println!("{}", a.x);
// what happens here? will the stack allocated structure
// be moved to heap? or was it originally allocated on heap?
let b = Box::new(a);
}
I'm not a specialist in assembler, but this looks like it is actually allocated on stack and then moved: http://pastebin.com/8PzsgTJ1. But I need a confirmation from someone who actually knows what is happening.
It would be pretty strange for this optimization to happen as you describe it. For example, in this code:
let a = Foo { x: 1 };
// operation that observes a
let b = Box::new(a);
// operation that observes b
&a and &b would be equal, which would be surprising. However, if you do something similar, but don't observe a:
#[inline(never)]
fn frobnotz() -> Box<Foo> {
let a = Foo { x: 1 };
Box::new(a)
}
You can see via the LLVM IR that this case was optimized:
define internal fastcc noalias dereferenceable(4) %Foo* #_ZN8frobnotz20h3dca7bc0ee8400bciaaE() unnamed_addr #0 {
entry-block:
%0 = tail call i8* #je_mallocx(i64 4, i32 0)
%1 = icmp eq i8* %0, null
br i1 %1, label %then-block-106-.i.i, label %"_ZN5boxed12Box$LT$T$GT$3new20h2665038481379993400E.exit"
then-block-106-.i.i: ; preds = %entry-block
tail call void #_ZN3oom20he7076b57c17ed7c6HYaE()
unreachable
"_ZN5boxed12Box$LT$T$GT$3new20h2665038481379993400E.exit": ; preds = %entry-block
%2 = bitcast i8* %0 to %Foo*
%x.sroa.0.0..sroa_idx.i = bitcast i8* %0 to i32*
store i32 1, i32* %x.sroa.0.0..sroa_idx.i, align 4
ret %Foo* %2
}
Similarly, you can return the struct on the stack and then box it up, and there will still just be the one allocation:
You may think that this gives us terrible performance: return a value and then immediately box it up ?! Isn't this pattern the worst of both worlds? Rust is smarter than that. There is no copy in this code. main allocates enough room for the box, passes a pointer to that memory into foo as x, and then foo writes the value straight into the Box.
As explained in the official Rust documentation here, Box<T>::new(x: T) allocates memory on the heap and then moves the argument into that memory. Accessing a after let b = Box::new(a) is a compile-time error.

Resources