Struct padding rules in Rust

Recently, while learning about type layout in Rust, I saw that structs support the #[repr(C)] attribute, so I wanted to see the difference between the default (Rust) representation and the C-like representation. Here is the code:
use type_layout::TypeLayout;

#[derive(TypeLayout)]
struct ACG1 {
    time1: u16, // 2
    time2: u16, // 2
    upper: u32, // 4
    lower: u16, // 2
}

#[derive(TypeLayout)]
#[repr(C)]
struct ACG2 {
    time1: u16, // 2
    time2: u16, // 2
    upper: u32, // 4
    lower: u16, // 2
}

fn main() {
    println!("ACG1: {}", ACG1::type_layout());
    println!("ACG2: {}", ACG2::type_layout());
}
And I get the following output:
I understand the rules for padding the #[repr(C)] structure and the size of the structure as a whole, but what confuses me is the default (Rust) representation of ACG1. I can't find any clear documentation on Rust's padding rules, and I would expect the padding to be included in the overall size of the structure, so why is the size of ACG1 only 12 bytes?
BTW, this is the crate I used to print the layout of the structure: type-layout 0.2.0

This crate does not seem to account for field reordering. It appears the compiler reordered the struct to have upper first:
struct ACG1 {
    upper: u32,
    time1: u16,
    time2: u16,
    lower: u16,
}
It’s somewhat hard to see, but the derive macro implementation checks the difference between fields in declared order. So in this sense, there are four bytes of "padding" between the beginning of the struct and the first field (time1) and four bytes of "padding" between the third field (upper) and fourth field (lower).
There is an issue filed that it doesn't work for non-#[repr(C)] structs, so I would not recommend using this crate for this purpose.
As far as Rust's rules go, the reference says "There are no guarantees of data layout made by [the default] representation." So in theory, the compiler can do whatever it wants and reorder fields based on access patterns. But in practice, I don't think it’s that elaborate and organizing by field size is a simple way to minimize padding.
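To check the reordering without a third-party crate, one option is the standard `offset_of!` macro (stable since Rust 1.77). This is a minimal sketch: `AcgRust` and `AcgC` are renamed copies of the question's structs, the helper functions are hypothetical, and the offsets reported for the default representation are whatever your compiler happened to choose, not a guarantee.

```rust
use std::mem::{align_of, offset_of, size_of};

// Default (Rust) representation: the compiler may reorder fields.
#[allow(dead_code)]
struct AcgRust {
    time1: u16,
    time2: u16,
    upper: u32,
    lower: u16,
}

// C representation: fields stay in declaration order.
#[allow(dead_code)]
#[repr(C)]
struct AcgC {
    time1: u16,
    time2: u16,
    upper: u32,
    lower: u16,
}

/// Offsets of (time1, time2, upper, lower) in the #[repr(C)] struct.
fn c_offsets() -> [usize; 4] {
    [
        offset_of!(AcgC, time1),
        offset_of!(AcgC, time2),
        offset_of!(AcgC, upper),
        offset_of!(AcgC, lower),
    ]
}

/// Offsets of the same fields in the default-representation struct.
/// These values are compiler-dependent.
fn rust_offsets() -> [usize; 4] {
    [
        offset_of!(AcgRust, time1),
        offset_of!(AcgRust, time2),
        offset_of!(AcgRust, upper),
        offset_of!(AcgRust, lower),
    ]
}

fn main() {
    println!(
        "repr(C):    offsets {:?}, size {}, align {}",
        c_offsets(),
        size_of::<AcgC>(),
        align_of::<AcgC>()
    );
    println!(
        "repr(Rust): offsets {:?}, size {}, align {}",
        rust_offsets(),
        size_of::<AcgRust>(),
        align_of::<AcgRust>()
    );
}
```

If the compiler moved `upper` to the front, its reported offset in `AcgRust` is 0 and the three u16 fields follow it, which matches the reordered struct shown above.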

As others have said, it seems to be an issue in the crate.
Better to ask the compiler directly (the -Z flag requires a nightly toolchain):
cargo clean
cargo rustc -- -Zprint-type-sizes
This will give you:
...
print-type-size type: `ACG1`: 12 bytes, alignment: 4 bytes
print-type-size     field `.upper`: 4 bytes
print-type-size     field `.time1`: 2 bytes
print-type-size     field `.time2`: 2 bytes
print-type-size     field `.lower`: 2 bytes
print-type-size     end padding: 2 bytes
print-type-size type: `ACG2`: 12 bytes, alignment: 4 bytes
print-type-size     field `.time1`: 2 bytes
print-type-size     field `.time2`: 2 bytes
print-type-size     field `.upper`: 4 bytes
print-type-size     field `.lower`: 2 bytes
print-type-size     end padding: 2 bytes

Related

Rust why is size of enum variant of (u32, u32) less than (u64)?

Was looking into packing enums and while doing so I ran following program
enum SizeEnum {
    V1(u32, u32),
    // V2(u64),
    V3(u32, u32),
}

fn main() {
    println!("{:?}", std::mem::size_of::<SizeEnum>());
}
The output is 12 bytes (96 bits). What I expected was 16 bytes (128 bits). That's what happens when I uncomment V2 variant.
Questions are:
So why does a u32, u32 variant use less space than u64?
And why 12 bytes (96 bits) rather than something like 64+8 (72 bits)? I assume it's something about padding but would appreciate a detailed answer.
Both questions boil down to alignment.
The u32s need to be aligned to 4 bytes. The u64 needs to be aligned to 8 bytes.
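Therefore, assuming a one-byte discriminant, with the u32 payload there are 3 bytes of padding after the discriminant (so the first u32 starts at offset 4), and with the u64 payload there are 7. That gives 1 + 3 + 8 = 12 bytes in the first case and 1 + 7 + 8 = 16 bytes in the second.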
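The arithmetic can be checked with a small model. This is only a sketch: `tagged_size` is a hypothetical helper that assumes a one-byte tag placed before the payload, which is a simplification of what the compiler does, and the layout of default-representation enums is not guaranteed. `TwoU32` and `WithU64` are illustrative names for the two versions of the question's enum.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
enum TwoU32 {
    V1(u32, u32),
    V3(u32, u32),
}

#[allow(dead_code)]
enum WithU64 {
    V1(u32, u32),
    V2(u64),
    V3(u32, u32),
}

/// Rounds `n` up to the next multiple of `align` (which must be a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Simplified model: a 1-byte tag, padding up to the payload's alignment,
/// the payload itself, then the total rounded up to that alignment.
fn tagged_size(payload: usize, align: usize) -> usize {
    round_up(round_up(1, align) + payload, align)
}

fn main() {
    println!(
        "TwoU32:  size {}, align {}, model {}",
        size_of::<TwoU32>(),
        align_of::<TwoU32>(),
        tagged_size(8, align_of::<TwoU32>())
    );
    println!(
        "WithU64: size {}, align {}, model {}",
        size_of::<WithU64>(),
        align_of::<WithU64>(),
        tagged_size(8, align_of::<WithU64>())
    );
}
```

On targets where u64 is 8-byte aligned, the model gives 12 and 16, matching the sizes reported in the question.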

How to align a packed struct with no padding between fields in Rust?

I'm working with an ABI where I need exact control over the data layout of the payload on both ends. Furthermore, there should be no padding between fields at all, ever. Additionally, the beginning of the payload should be page-aligned.
#[repr(C)] helps a lot. The modifiers #[repr(packed(N))] and #[repr(align(N))] are compatible with repr(C) but they can't be used together. I can't achieve what I want with #[repr(C, packed(4096))].
How to solve this?
The packed(N) type layout modifier does not guarantee that there will never be any padding. That is only the case for packed / packed(1), because packed(N) can only lower the alignment of each field to min(N, default alignment). packed(N) means neither that the struct is "packed" (no padding at all between fields) nor that the struct is aligned to N bytes, so packed(4096) does not give you 4096-byte alignment.
If you want a page-aligned struct with no padding at all, you want to do the following:
#[repr(align(4096))]
struct Aligned4096<T>(T);
// plus impl convenient methods

#[repr(C, packed)]
struct Foo {
    a: u8,
    b: u64,
    c: u16,
    d: u8,
}
// plus impl convenient methods

fn main() {
    // Built with a struct literal here; a Foo::new() constructor would work the same way.
    let aligned_foo = Aligned4096(Foo { a: 0, b: 0, c: 0, d: 0 });
}
A more detailed view of how different N in packed(N) change the type layout is shown in this table on GitHub. More information about the type layout modifiers in general is provided in the official language documentation.
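To confirm that the wrapper approach gives both properties, the sizes and alignments can be inspected with `size_of` and `align_of`. A minimal sketch, reusing the `Aligned4096` and `Foo` definitions from the answer; `address_remainder` is a hypothetical helper added for the check:

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(align(4096))]
struct Aligned4096<T>(T);

#[allow(dead_code)]
#[repr(C, packed)]
struct Foo {
    a: u8,
    b: u64,
    c: u16,
    d: u8,
}

/// Builds a page-aligned Foo on the stack and returns its address modulo 4096.
fn address_remainder() -> usize {
    let aligned = Aligned4096(Foo { a: 1, b: 2, c: 3, d: 4 });
    (&aligned as *const Aligned4096<Foo> as usize) % 4096
}

fn main() {
    // 1 + 8 + 2 + 1 bytes, with no padding between or after the fields.
    println!("Foo: size {}, align {}", size_of::<Foo>(), align_of::<Foo>());
    // The wrapper raises the alignment, which also rounds the size up to 4096.
    println!(
        "Aligned4096<Foo>: size {}, align {}",
        size_of::<Aligned4096<Foo>>(),
        align_of::<Aligned4096<Foo>>()
    );
    println!("address % 4096 = {}", address_remainder());
}
```

Note the trade-off: the payload itself is 12 bytes, but the wrapper occupies a full 4096 bytes because a type's size is always a multiple of its alignment.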

Is it possible to have a type with a larger alignment than its own size?

Is it ever possible to have a type in Rust with a larger alignment than its own size? Conversely, does the Rust compiler always add padding to a type to make its size a multiple of its alignment?
This simple example code would seem to indicate that the answer is no, all types have a size that is a multiple of their alignment, but I wanted to make sure there aren't more esoteric possibilities.
use std::mem::{size_of, align_of};

struct b1 {
    byte: u8
}

#[repr(align(4))]
struct b4 {
    byte: u8
}

struct b5 {
    a: u8,
    b: u8,
    c: u8,
    d: u8,
    e: u8,
}

#[repr(align(8))]
struct b8 {
    a: u8,
    b: u8,
    c: u8,
    d: u8,
    e: u8,
}

fn main() {
    assert_eq!(size_of::<b1>(), 1);
    assert_eq!(align_of::<b1>(), 1);
    assert_eq!(size_of::<b4>(), 4);
    assert_eq!(align_of::<b4>(), 4);
    assert_eq!(size_of::<b5>(), 5);
    assert_eq!(align_of::<b5>(), 1);
    assert_eq!(size_of::<b8>(), 8);
    assert_eq!(align_of::<b8>(), 8);
}
There is a similar question for C++, where the answer seems to be "not in standard C++, but some compiler extensions support it. You can't create an array of T in that case".
The Rust reference has this to say about size and alignment (emphasis mine):
Size and Alignment
[...]
The size of a value is the offset in bytes between successive elements in an array with that item type including alignment padding. The size of a value is always a multiple of its alignment. The size of a value can be checked with the size_of_val function.
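One corner case is worth spelling out: zero is a multiple of every alignment, so a zero-sized type can have an alignment larger than its size without contradicting the rule quoted above. A minimal sketch; `Five`, `Empty`, and `size_is_multiple_of_align` are illustrative names, not taken from the question:

```rust
use std::mem::{align_of, size_of};

// Five bytes of data, padded up to the requested alignment.
#[allow(dead_code)]
#[repr(align(8))]
struct Five([u8; 5]);

// No data at all, but still a requested alignment.
#[allow(dead_code)]
#[repr(align(16))]
struct Empty;

/// True if T's size is a multiple of its alignment.
fn size_is_multiple_of_align<T>() -> bool {
    size_of::<T>() % align_of::<T>() == 0
}

fn main() {
    println!("Five:     size {}, align {}", size_of::<Five>(), align_of::<Five>());
    println!("Empty:    size {}, align {}", size_of::<Empty>(), align_of::<Empty>());
    // A zero-length array keeps the alignment of its element type.
    println!(
        "[u64; 0]: size {}, align {}",
        size_of::<[u64; 0]>(),
        align_of::<[u64; 0]>()
    );
    println!(
        "rule holds: {} {} {}",
        size_is_multiple_of_align::<Five>(),
        size_is_multiple_of_align::<Empty>(),
        size_is_multiple_of_align::<[u64; 0]>()
    );
}
```

So the answer to "alignment larger than size" is yes, but only for zero-sized types; any type that occupies at least one byte is padded up to a multiple of its alignment.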

Why does Box<[T]> need 16 bytes in memory, but a referenced slice needs only 8? (on x64 machine)

Consider:
fn main() {
    // Prints 8, 8, 16
    println!(
        "{}, {}, {}",
        std::mem::size_of::<Box<i8>>(),
        std::mem::size_of::<Box<&[i8]>>(),
        std::mem::size_of::<Box<[i8]>>(),
    );
}
Why do owned slices take 16 bytes, but referenced slices take only 8?
Box<T> is basically *const T (Actually it's a newtype around Unique<T>, which itself is a NonNull<T> with PhantomData<T> (for dropck), but let's stick to *const T for simplicity).
A pointer in Rust normally has the same size as usize (size_of::<usize>()), except when T is a dynamically sized type (DST). Currently, a Box<DST> is 2 * size_of::<usize>() in size (the exact representation is not stable at the time of writing). A pointer to a DST is called a fat pointer.
Currently, there are two kinds of DSTs: slices and trait objects. A fat pointer to a slice can be modeled like this (FatPtr is an illustrative name here, not a public standard library type):
#[repr(C)]
struct FatPtr<T> {
    data: *const T,
    len: usize,
}
Note: For a trait pointer, len is replaced by a pointer to the vtable.
With this information, your question can be answered:
Box<i8>: i8 is a sized type => basically the same as *const i8 => 8 bytes in size (with 64 bit pointer width)
Box<[i8]>: [i8] is a DST => basically the same as FatPtr<i8> => 16 bytes in size (with 64 bit pointer width)
Box<&[i8]>: &[i8] is not a DST. It's basically the same as *const FatPtr<i8> => 8 bytes in size (with 64 bit pointer width)
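The three cases above, plus a few related pointer types, can be compared in units of machine words so that the result does not depend on the pointer width. A minimal sketch; `words` is a hypothetical helper, and the two-word size of pointers to DSTs reflects current compiler behavior rather than a documented guarantee:

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Size of a pointer-like type P, measured in machine words (usize).
fn words<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

fn main() {
    // Pointers to sized types: one word.
    println!("Box<i8>        : {}", words::<Box<i8>>());
    println!("&i8            : {}", words::<&i8>());
    println!("Box<&[i8]>     : {}", words::<Box<&[i8]>>());
    // Pointers to slices: data pointer plus length.
    println!("Box<[i8]>      : {}", words::<Box<[i8]>>());
    println!("&[i8]          : {}", words::<&[i8]>());
    println!("&str           : {}", words::<&str>());
    // Pointers to trait objects: data pointer plus vtable pointer.
    println!("Box<dyn Debug> : {}", words::<Box<dyn Debug>>());
}
```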
The size of a reference depends on the "sizedness" of the referenced type:
A reference to a sized type is a single pointer to the memory address.
A reference to an unsized type is a pointer to the memory and the size of the pointed datum. That's what is called a fat pointer:
#[repr(C)]
struct FatPtr<T> {
    data: *const T,
    len: usize,
}
A Box is a special kind of pointer that points to the heap, but it is still a pointer.
Knowing that, you understand that:
Box<i8> is 8 bytes because i8 is sized,
Box<&[i8]> is 8 bytes because a reference is sized,
Box<[i8]> is 16 bytes because a slice is unsized.
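The (data, len) pair can also be observed through stable APIs, without relying on the exact in-memory representation of the fat pointer. A minimal sketch; `split` and `round_trip` are hypothetical helpers:

```rust
use std::slice;

/// Splits a slice reference into the two pieces a fat pointer carries.
fn split(s: &[i8]) -> (*const i8, usize) {
    (s.as_ptr(), s.len())
}

/// Takes a slice apart and rebuilds it from its data pointer and length.
fn round_trip(values: &[i8]) -> Vec<i8> {
    let (data, len) = split(values);
    // SAFETY: `data` and `len` come from `values`, which is still borrowed
    // for the duration of this call, so the memory is valid and unaliased
    // by any mutable reference.
    let rebuilt: &[i8] = unsafe { slice::from_raw_parts(data, len) };
    rebuilt.to_vec()
}

fn main() {
    let boxed: Box<[i8]> = vec![1, 2, 3].into_boxed_slice();
    let (data, len) = split(&boxed);
    println!("data = {:p}, len = {}", data, len);
    println!("rebuilt = {:?}", round_trip(&boxed));
}
```

A thin pointer such as Box<i8> has no length to carry because the size of i8 is known at compile time, which is why it fits in a single word.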

Why do fat pointers sometimes percolate outwards?

I thought that I had understood fat pointers in Rust, but I have a case where I can't understand why they seem to percolate outwards from an inner type. Presumably my mental model is off, but I'm struggling to come up with a satisfactory explanation for this code:
use std::cell::RefCell;
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    println!("{}", size_of::<Rc<RefCell<Vec<u8>>>>());
    println!("{}", size_of::<Rc<RefCell<dyn Debug>>>());
    println!("{}", size_of::<Box<Rc<RefCell<dyn Debug>>>>());
}
which, on a 64-bit machine, prints 8, 16, 8.
Since Rc makes a Box internally (using into_raw_non_null), I expected this to print 8, 8, 8. Is there a reason why, at least from size_of's perspective, the fat pointer seems to percolate outwards from Debug, even past Rc's Box? Is it because it's stored as a raw pointer perhaps?
Ultimately, Rc<RefCell<dyn Debug>> is the pointer to the trait object, and pointers to trait objects are fat pointers. The types inside and outside of it are not fat pointers.
There are no fat pointers in the Vec<u8> case whatsoever. A Vec<T> is a (*mut T, usize, usize), a RefCell<T> is a (T, usize), and an Rc<T> is a (*mut T).
size_of              | is
---------------------+---
Vec<u8>              | 24
RefCell<Vec<u8>>     | 32
Rc<RefCell<Vec<u8>>> |  8
Your second and third cases do involve a fat pointer for a trait object: Rc<RefCell<dyn Debug>>. Putting that fat pointer behind another pointer (the Box) gives a thin pointer again, because Rc<RefCell<dyn Debug>> is itself a sized, 16-byte value: *mut Rc<RefCell<dyn Debug>>.
size_of                     | is
----------------------------+---
Rc<RefCell<dyn Debug>>      | 16
Box<Rc<RefCell<dyn Debug>>> |  8
Notably, RefCell<dyn Debug> on its own is unsized, so you cannot ask for its size at compile time:
error[E0277]: the size for values of type `dyn std::fmt::Debug` cannot be known at compilation time
 --> src/main.rs:4:20
  |
4 |     println!("{}", mem::size_of::<RefCell<dyn Debug>>());
  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
  |
  = help: within `std::cell::RefCell<dyn std::fmt::Debug>`, the trait `std::marker::Sized` is not implemented for `dyn std::fmt::Debug`
  = note: to learn more, visit <https://doc.rust-lang.org/book/second-edition/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
  = note: required because it appears within the type `std::cell::RefCell<dyn std::fmt::Debug>`
  = note: required by `std::mem::size_of`
The trait object requires some indirection; when you add some, you've finally constructed some type of fat pointer.
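In practice the indirection is added first and the concrete type is erased afterwards, through an unsizing coercion. A minimal sketch of that flow; `erase` and `show` are hypothetical helpers, not part of the question's code:

```rust
use std::cell::RefCell;
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

/// Wraps a concrete value, then erases its type via an unsizing coercion.
fn erase<T: Debug + 'static>(value: T) -> Rc<RefCell<dyn Debug>> {
    // Rc<RefCell<T>> is a thin pointer; coercing it to the return type
    // attaches the vtable pointer and makes it fat.
    Rc::new(RefCell::new(value))
}

/// Formats the erased value through the trait object.
fn show(shared: &Rc<RefCell<dyn Debug>>) -> String {
    format!("{:?}", shared.borrow())
}

fn main() {
    let shared = erase(vec![1u8, 2, 3]);
    println!("value = {}", show(&shared));

    let word = size_of::<usize>();
    println!(
        "Rc<RefCell<dyn Debug>>      : {} word(s)",
        size_of::<Rc<RefCell<dyn Debug>>>() / word
    );
    // Boxing the fat pointer stores it on the heap; the Box itself is thin.
    println!(
        "Box<Rc<RefCell<dyn Debug>>> : {} word(s)",
        size_of::<Box<Rc<RefCell<dyn Debug>>>>() / word
    );
}
```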
You can use the unstable option -Z print-type-sizes to explore the layout of structs:
type: `std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>`: 24 bytes, alignment: 8 bytes
    field `.strong`: 8 bytes
    field `.weak`: 8 bytes
    field `.value`: 8 bytes
type: `core::nonzero::NonZero<*const std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>>`: 16 bytes, alignment: 8 bytes
    field `.0`: 16 bytes
type: `std::ptr::NonNull<std::rc::RcBox<std::cell::RefCell<dyn std::fmt::Debug>>>`: 16 bytes, alignment: 8 bytes
    field `.pointer`: 16 bytes
type: `std::rc::Rc<std::cell::RefCell<dyn std::fmt::Debug>>`: 16 bytes, alignment: 8 bytes
    field `.ptr`: 16 bytes
    field `.phantom`: 0 bytes, offset: 0 bytes, alignment: 1 bytes
type: `std::cell::RefCell<dyn std::fmt::Debug>`: 8 bytes, alignment: 8 bytes
    field `.borrow`: 8 bytes
    field `.value`: 0 bytes
I'm not 100% sure how to parse this output, as I expect RefCell<dyn Debug> to be an unsized type (as shown by the error above). I assume the meaning of "0 bytes" is overloaded.
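One plausible reading is that the 8 bytes reported for RefCell<dyn Debug> cover only the statically known prefix (the borrow flag), while the real size depends on the concrete value and is only available at run time. That interpretation is an assumption, but the run-time size itself can be checked with `size_of_val`. A minimal sketch; `erased_size` is a hypothetical helper:

```rust
use std::cell::RefCell;
use std::fmt::Debug;
use std::mem::{size_of, size_of_val};

/// Run-time size of the RefCell behind a type-erased reference,
/// computed from the size and alignment stored in the vtable.
fn erased_size(cell: &RefCell<dyn Debug>) -> usize {
    size_of_val(cell)
}

fn main() {
    let small = RefCell::new(0u8);
    let large = RefCell::new([0u64; 4]);

    // Each &RefCell<T> is coerced to &RefCell<dyn Debug> at the call site.
    println!(
        "RefCell<u8>:       erased {} bytes, static {} bytes",
        erased_size(&small),
        size_of::<RefCell<u8>>()
    );
    println!(
        "RefCell<[u64; 4]>: erased {} bytes, static {} bytes",
        erased_size(&large),
        size_of::<RefCell<[u64; 4]>>()
    );
}
```

The same erased type reports different sizes for different underlying values, which is exactly why `size_of::<RefCell<dyn Debug>>()` cannot compile.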
