Is it possible to have a type with a larger alignment than its own size? - rust

Is it ever possible to have a type in Rust with a larger alignment than its own size? Conversely, does the Rust compiler always add padding to a type to make its size at least a multiple of its alignment?
This simple example code would seem to indicate that the answer is no, all types have a size that is a multiple of their alignment, but I wanted to make sure there aren't more esoteric possibilities.
use std::mem::{size_of, align_of};
struct b1 {
byte: u8
}
#[repr(align(4))]
struct b4 {
byte: u8
}
struct b5 {
a: u8,
b: u8,
c: u8,
d: u8,
e: u8,
}
#[repr(align(8))]
struct b8 {
a: u8,
b: u8,
c: u8,
d: u8,
e: u8,
}
fn main() {
assert_eq!(size_of::<b1>(), 1);
assert_eq!(align_of::<b1>(), 1);
assert_eq!(size_of::<b4>(), 4);
assert_eq!(align_of::<b4>(), 4);
assert_eq!(size_of::<b5>(), 5);
assert_eq!(align_of::<b5>(), 1);
assert_eq!(size_of::<b8>(), 8);
assert_eq!(align_of::<b8>(), 8);
}
There is a similar question for C++, where the answer seems to be "not in standard C++, but some compiler extensions support it. You can't create an array of T in that case".

The Rust Reference has this to say about size and alignment:
Size and Alignment
[...]
The size of a value is the offset in bytes between successive elements in an array with that item type including alignment padding. The size of a value is always a multiple of its alignment. The size of a value can be checked with the size_of_val function.
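A quick way to check the quoted rule is to compare size_of and align_of directly. The sketch below uses made-up type names. It shows the compiler padding a 1-byte struct up to its 16-byte alignment. It also shows one edge case: a zero-sized type can have an alignment larger than its size, because 0 is a multiple of every alignment. Such a type still obeys the quoted rule, and an array of it is fine because every element sits at offset 0.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example types, not from the question.
// A single byte forced to 16-byte alignment: the compiler pads it to 16 bytes.
#[allow(dead_code)]
#[repr(align(16))]
struct Padded(u8);

// A zero-sized type with 8-byte alignment: size 0, which is still a multiple of 8.
#[allow(dead_code)]
#[repr(align(8))]
struct ZeroSized;

/// Returns (size, alignment) for any sized type.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    assert_eq!(layout_of::<Padded>(), (16, 16));
    // The array stride equals size_of, so three elements occupy 3 * 16 bytes.
    assert_eq!(size_of::<[Padded; 3]>(), 48);

    // Alignment larger than size: only possible when the size is 0.
    assert_eq!(layout_of::<ZeroSized>(), (0, 8));
    assert_eq!(layout_of::<[u64; 0]>(), (0, align_of::<u64>()));

    // Every type satisfies size % align == 0.
    for (size, align) in [layout_of::<Padded>(), layout_of::<ZeroSized>(), layout_of::<u8>()] {
        assert_eq!(size % align, 0);
    }
}
```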

Related

How can I find the size of an enum while ignoring the discriminant?

The Rust Reference documents that a Rust enum annotated with #[repr(C)] can be viewed as a C struct of two fields. The first field is a C enum for the discriminant, the second field is a C union of C structs corresponding to the fields of the enum's variants.
Due to a bug in an FFI interoperation library, I need to avoid using unions that are exactly 8 bytes. To that end, I wanted to add some static assertions to my Rust code so I would be aware of any problematic enums. I do not know how to ask the compiler for the size of the generated union type (or equivalently, the size of the enum without accounting for the discriminant):
#[repr(C)]
enum UnionSizeIs8Bytes {
A(u8),
B(u64),
}
#[repr(C)]
enum UnionSizeIsNot8Bytes {
A(u8),
B(u16),
}
const _: () = {
// Should fail, but does not
assert!(8 != std::mem::size_of::<UnionSizeIs8Bytes>());
// Should not fail, but does
assert!(8 != std::mem::size_of::<UnionSizeIsNot8Bytes>());
};
Reading The Book about repr(C) field-less enums:
[...] the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.
That is, they try to be fully compatible with C enums.
And in the next section about struct-like enums:
[...] is a repr(C) struct with two fields:
a repr(C) version of the enum with all fields removed ("the tag")
a repr(C) union of repr(C) structs for the fields of each variant that had them ("the payload")
That is, your enum:
#[repr(C)]
enum UnionSizeIs8Bytes {
A(u8),
B(u64),
}
has the same layout as this explicit equivalent:
#[repr(C)]
enum UnionSizeIs8Bytes_Tag {
A,
B,
}
#[repr(C)]
union UnionSizeIs8Bytes_Union {
a: u8,
b: u64,
}
#[repr(C)]
struct UnionSizeIs8Bytes_Explicit {
tag: UnionSizeIs8Bytes_Tag,
data: UnionSizeIs8Bytes_Union,
}
Now, what is the actual size and alignment of an enum in C? It seems that even experts do not fully agree on the details. In practice, most mainstream C compilers define the underlying type of an enum as a plain int, which will be an i32 or u32.
With that in mind, the layout of your examples should be straightforward:
UnionSizeIs8Bytes:
  0-4: tag
  4-8: padding
  8-16: union
    8-9: u8
    8-16: u64
  Size: 16, alignment: 8
UnionSizeIsNot8Bytes:
  0-4: tag
  4-6: union
    4-5: u8
    4-6: u16
  6-8: padding
  Size: 8, alignment: 4
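The layouts above can be checked with size_of and align_of. This sketch assumes a target where a C enum is a 4-byte int, as on mainstream x86_64 and aarch64 toolchains. It redeclares the question's enums and the hand-written UnionSizeIs8Bytes_* mirror so the block compiles on its own. It then confirms that the mirror has the same layout as the enum.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
enum UnionSizeIs8Bytes { A(u8), B(u64) }

#[allow(dead_code)]
#[repr(C)]
enum UnionSizeIsNot8Bytes { A(u8), B(u16) }

// Hand-written equivalent of UnionSizeIs8Bytes, following the repr(C) rules quoted above.
#[allow(dead_code, non_camel_case_types)]
#[repr(C)]
enum UnionSizeIs8Bytes_Tag { A, B }

#[allow(dead_code, non_camel_case_types)]
#[repr(C)]
union UnionSizeIs8Bytes_Union { a: u8, b: u64 }

#[allow(dead_code, non_camel_case_types)]
#[repr(C)]
struct UnionSizeIs8Bytes_Explicit { tag: UnionSizeIs8Bytes_Tag, data: UnionSizeIs8Bytes_Union }

fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    // Assumes a 4-byte C enum tag.
    assert_eq!(size_of::<UnionSizeIs8Bytes_Tag>(), 4);
    assert_eq!(layout_of::<UnionSizeIs8Bytes>(), (16, 8));
    assert_eq!(layout_of::<UnionSizeIsNot8Bytes>(), (8, 4));
    // The explicit tag-plus-union struct has the same layout as the enum.
    assert_eq!(layout_of::<UnionSizeIs8Bytes_Explicit>(), layout_of::<UnionSizeIs8Bytes>());
    assert_eq!(size_of::<UnionSizeIs8Bytes_Union>(), 8);
}
```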
Note that the alignment of a repr(C) enum is never less than that of the tag, that is 4 bytes using the above assumptions.
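To compute the size of the data without the tag, subtract the alignment from the full size. The alignment equals the offset of the union, so it covers the size of the tag itself plus any padding before the union. Note that when the union is less aligned than the tag, the result also includes trailing padding. For example, for UnionSizeIsNot8Bytes it yields 4, although the union itself is only 2 bytes.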
const fn size_of_enum_data<T>() -> usize {
std::mem::size_of::<T>() - std::mem::align_of::<T>()
}
If you want to be extra sure you could subtract std::mem::align_of::<T>().max(std::mem::size_of::<i32>()), in case your architecture's i32 does not have alignment equal to 4, but unfortunately max doesn't seem to be const yet. You could write an if of course, but that gets ugly, something like:
const fn size_of_enum_data<T>() -> usize {
let a = std::mem::align_of::<T>();
let i = std::mem::size_of::<i32>();
std::mem::size_of::<T>() - if a > i { a } else { i }
}
And if you want to be extra, extra sure, you can use c_int instead of i32. But then, on esoteric architectures where c_int is not i32, the assumption that a C enum is the same size as a C int may not hold either...
Then your assertions would be:
const _: () = {
// It fails
assert!(8 != size_of_enum_data::<UnionSizeIs8Bytes>());
// It does not fail
assert!(8 != size_of_enum_data::<UnionSizeIsNot8Bytes>());
};
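To see what the subtraction actually reports, the sketch below evaluates size_of_enum_data at runtime. It again assumes a 4-byte C enum tag. Because the helper measures everything after the tag, it also counts trailing padding whenever the union is less aligned than the tag. FiveBytes is a made-up enum whose union is only 5 bytes, yet the helper reports 8. This means the compile-time assertion above could also reject enums whose union is not exactly 8 bytes.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
enum UnionSizeIs8Bytes { A(u8), B(u64) }

#[allow(dead_code)]
#[repr(C)]
enum UnionSizeIsNot8Bytes { A(u8), B(u16) }

// Hypothetical enum whose payload union is 5 bytes with 1-byte alignment.
#[allow(dead_code)]
#[repr(C)]
enum FiveBytes { A([u8; 5]), B(u8) }

// Size of everything after the tag, including any trailing padding.
const fn size_of_enum_data<T>() -> usize {
    size_of::<T>() - align_of::<T>()
}

fn main() {
    // Assumes a 4-byte C enum tag.
    assert_eq!(size_of_enum_data::<UnionSizeIs8Bytes>(), 8); // union is exactly 8 bytes
    assert_eq!(size_of_enum_data::<UnionSizeIsNot8Bytes>(), 4); // 2-byte union + 2 bytes padding
    assert_eq!(size_of_enum_data::<FiveBytes>(), 8); // 5-byte union + 3 bytes padding
}
```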

How to align a packed struct with no padding between fields in Rust?

I'm working with an ABI where I need exact control over the data layout of the payload on both ends. Furthermore, there should be no padding between fields at all, ever. Additionally, the beginning of the payload should be page-aligned.
#[repr(C)] helps a lot. The modifiers #[repr(packed(N))] and #[repr(align(N))] are compatible with repr(C) but they can't be used together. I can't achieve what I want with #[repr(C, packed(4096))].
How to solve this?
The packed(N) type layout modifier does not guarantee that there will never be any padding. That is only the case for packed / packed(1), because packed(N) can only lower the alignment of each field to min(N, default alignment). packed(N) doesn't mean that the struct is "packed" (i.e. no padding at all between fields), nor that the alignment of the struct is 4096 bytes.
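The difference between packed and packed(N) is easy to observe with size_of and align_of. In this sketch, which uses illustrative struct names, each struct has the same two fields:

- packed(2) lowers the u64's alignment to 2, so one padding byte remains after the u8.
- packed(16) lowers nothing, because no field needs more than 8.
- Only packed removes all padding.

```rust
use std::mem::{align_of, size_of};

// Same two fields under different representations; names are illustrative.
#[allow(dead_code)]
#[repr(C)]
struct Plain { a: u8, b: u64 }

#[allow(dead_code)]
#[repr(C, packed(2))]
struct Packed2 { a: u8, b: u64 }

#[allow(dead_code)]
#[repr(C, packed(16))]
struct Packed16 { a: u8, b: u64 }

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed1 { a: u8, b: u64 }

fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    assert_eq!(layout_of::<Plain>(), (16, 8)); // a, 7 bytes padding, b
    assert_eq!(layout_of::<Packed2>(), (10, 2)); // a, 1 byte padding, b
    assert_eq!(layout_of::<Packed16>(), (16, 8)); // min(16, 8) = 8: no change
    assert_eq!(layout_of::<Packed1>(), (9, 1)); // no padding at all
}
```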
If you want a page-aligned struct with no padding at all, you want to do the following:
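// Wrapper that gives whatever it contains page (4096-byte) alignment.
#[allow(dead_code)]
#[repr(align(4096))]
struct Aligned4096<T>(T);
// No padding between fields: size is exactly 1 + 8 + 2 + 1 = 12 bytes, alignment 1.
#[allow(dead_code)]
#[repr(C, packed)]
struct Foo {
    a: u8,
    b: u64,
    c: u16,
    d: u8,
}
impl Foo {
    fn new() -> Self {
        Foo { a: 0, b: 0, c: 0, d: 0 }
    }
}
fn main() {
    let aligned_foo = Aligned4096(Foo::new());
    // Foo sits at offset 0 of the page-aligned wrapper.
    assert_eq!(std::mem::size_of::<Foo>(), 12);
    assert_eq!(std::mem::align_of_val(&aligned_foo), 4096);
    assert_eq!(&aligned_foo as *const _ as usize % 4096, 0);
}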
A more detailed view of how different N in packed(N) change the type layout is shown in this table on GitHub. More information about the type layout modifiers in general is provided in the official language documentation.

Is the data in Vec<T> always densely packed?

It is a common pattern to see this 'shortcut' code in Rust:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::std::slice::from_raw_parts(
(p as *const T) as *const u8,
::std::mem::size_of::<T>(),
)
}
i.e. given a struct, unsafely cast the pointer to it to &[u8] to read the underlying bytes.
However, is it valid to take the same approach when using Vec<T>?
For example, this appears to work:
use std::mem::size_of;
use std::slice::from_raw_parts;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Point {
pub x: u8,
pub y: u8,
pub z: u8,
}
fn as_bytes(data: &[Point]) -> &[u8] {
unsafe {
let raw_pointer = data.as_ptr();
from_raw_parts(raw_pointer as *const u8, size_of::<Point>() * data.len())
}
}
fn main() {
let points = vec![Point{x: 0u8, y: 1u8, z: 2u8}, Point{x: 3u8, y: 4u8, z: 5u8}];
let slice = points.as_slice();
println!("{:?}", slice);
let bytes = as_bytes(slice);
println!("{:?}", bytes);
assert!(bytes.len() == 6);
assert!(bytes[0] == 0u8);
assert!(bytes[1] == 1u8);
assert!(bytes[2] == 2u8);
assert!(bytes[3] == 3u8);
assert!(bytes[4] == 4u8);
assert!(bytes[5] == 5u8);
}
...but is it reliable to assume that Vec<T> is represented as a single contiguous block of data this way?
The documentation on https://doc.rust-lang.org/std/vec/struct.Vec.html#capacity-and-reallocation says:
If a Vec has allocated memory, then the memory it points to is on the heap (as defined by the allocator Rust is configured to use by default), and its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice), followed by capacity-len logically uninitialized, contiguous elements.
...but I'm not really sure if I understand what it means. Does this actually mean that for Vec<T> the underlying pointer points to a block of memory of length size_of::<T>() * length of the Vec?
Yes, a Vec<T> can be made into something that can be treated as a pointer to a block of memory of length std::mem::size_of::<T>() times the length of Vec.
There is one caveat: what you are actually interested in is the slice of T that the Vec provides; the Vec itself should be considered an implementation detail. Beyond that:
A Vec<T> can deref to a slice [T]. Take that slice.
The Rust Reference defines that a slice has the same layout as the section of the array it slices. So when we deref from a Vec<T> to a [T], this slice of length n is guaranteed to have the same memory layout as an array [T; n].
The Rust Reference defines the memory layout of an array:
Arrays are laid out so that the nth element of the array is offset from the start of the array by n * the size of the type bytes. An array of [T; n] has a size of size_of::<T>() * n and the same alignment of T.
We know n (from [T]) and we know "the size of the type bytes" (via mem::size_of::<T>()). All members of an array must be fully initialized at all times. Combined with the two sentences from the paragraph above, this means it is safe to access all bytes up to mem::size_of::<T>() * length of Vec (more precisely, the length of the slice, which is what brings in the array memory layout rule). This holds only if T has no padding bytes, as is the case for Point. Padding bytes are uninitialized, and reading them as u8 is undefined behavior.
To make use of all that, you should make sure that you get a slice of the Vec first, use as_ptr() on the slice, and cast the raw pointer you get. This ensures the sequence of definitions as above. Your fn as_bytes(data: &[Point]) -> &[u8] is exactly correct.
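The stride guarantee can be observed by measuring each element's address relative to the start of the slice. In this sketch, element_offsets is a made-up helper that confirms element i begins exactly i * size_of::<T>() bytes in, with no gaps. The Padded struct is also hypothetical. It shows why the byte view is only sound for padding-free types: its size is 4 even though its fields hold only 3 bytes.

```rust
use std::mem::size_of;

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Point { pub x: u8, pub y: u8, pub z: u8 }

// Hypothetical struct with one padding byte between `a` and `b`.
#[allow(dead_code)]
#[repr(C)]
struct Padded { a: u8, b: u16 }

/// Byte offset of each element from the start of the slice (illustrative helper).
fn element_offsets<T>(data: &[T]) -> Vec<usize> {
    let base = data.as_ptr() as usize;
    data.iter().map(|e| e as *const T as usize - base).collect()
}

fn main() {
    let points = vec![Point { x: 0, y: 1, z: 2 }; 4];
    let slice: &[Point] = &points; // Vec<Point> derefs to [Point]
    // Element i starts at i * size_of::<Point>() = i * 3 bytes.
    assert_eq!(element_offsets(slice), vec![0, 3, 6, 9]);
    // The slice covers exactly size_of::<Point>() * len bytes.
    assert_eq!(std::mem::size_of_val(slice), size_of::<Point>() * slice.len());
    // Padded holds 3 bytes of data but occupies 4: one byte is padding.
    assert_eq!(size_of::<Padded>(), 4);
}
```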

How to calculate u64 modulus u8 in Rust? [duplicate]

Editor's note: This question is from a version of Rust prior to 1.0 and references some items that are not present in Rust 1.0. The answers still contain valuable information.
What's the idiomatic way to convert from (say) a usize to a u32?
For example, casting using 4294967295us as u32 works and the Rust 0.12 reference docs on type casting say
A numeric value can be cast to any numeric type. A raw pointer value can be cast to or from any integral type or raw pointer type. Any other cast is unsupported and will fail to compile.
but 4294967296us as u32 will silently overflow and give a result of 0.
I found ToPrimitive and FromPrimitive which provide nice functions like to_u32() -> Option<u32>, but they're marked as unstable:
#[unstable(feature = "core", reason = "trait is likely to be removed")]
What's the idiomatic (and safe) way to convert between numeric (and pointer) types?
The platform-dependent size of isize / usize is one reason why I'm asking this question. The original scenario was that I wanted to convert from u32 to usize so I could represent a tree in a Vec<u32>. For example, with let t = vec![0u32, 0u32, 1u32], the grandparent of node 2 would be t[t[2us] as usize]. I wondered how this would fail if usize were less than 32 bits.
Converting values
From a type that fits completely within another
There's no problem here. Use the From trait to be explicit that there's no loss occurring:
fn example(v: i8) -> i32 {
i32::from(v) // or v.into()
}
You could choose to use as, but it's recommended to avoid it when you don't need it (see below):
fn example(v: i8) -> i32 {
v as i32
}
From a type that doesn't fit completely in another
There isn't a single method that makes general sense - you are asking how to fit two things in a space meant for one. One good initial attempt is to use an Option — Some when the value fits and None otherwise. You can then fail your program or substitute a default value, depending on your needs.
Since Rust 1.34, you can use TryFrom:
use std::convert::TryFrom;
fn example(v: i32) -> Option<i8> {
i8::try_from(v).ok()
}
Before that, you'd have to write similar code yourself:
fn example(v: i32) -> Option<i8> {
    // Check both bounds: values below i8::MIN do not fit either.
    if v > std::i8::MAX as i32 || v < std::i8::MIN as i32 {
        None
    } else {
        Some(v as i8)
    }
}
From a type that may or may not fit completely within another
The range of numbers isize / usize can represent changes based on the platform you are compiling for. You'll need to use TryFrom regardless of your current platform.
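Applied to the original usize / u32 question, a minimal sketch might look like this. to_u32 and to_usize are illustrative helper names. The out-of-range case only runs on targets where usize is wider than 32 bits. It shows TryFrom rejecting the value, while as silently truncates it.

```rust
use std::convert::TryFrom;

// Illustrative helpers: None means the value does not fit.
fn to_u32(n: usize) -> Option<u32> {
    u32::try_from(n).ok()
}

fn to_usize(n: u32) -> Option<usize> {
    usize::try_from(n).ok()
}

fn main() {
    assert_eq!(to_u32(42), Some(42));
    assert_eq!(to_usize(7), Some(7));

    // One past u32::MAX only exists when usize is wider than 32 bits.
    if let Some(too_big) = (u32::MAX as usize).checked_add(1) {
        assert_eq!(to_u32(too_big), None);
        // `as` keeps only the low 32 bits, so the same value silently becomes 0.
        assert_eq!(too_big as u32, 0);
    }
}
```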
See also:
How do I convert a usize to a u32 using TryFrom?
Why is type conversion from u64 to usize allowed using `as` but not `From`?
What as does
but 4294967296us as u32 will silently overflow and give a result of 0
When converting to a smaller type, as just takes the lower bits of the number, disregarding the upper bits, including the sign:
fn main() {
let a: u16 = 0x1234;
let b: u8 = a as u8;
println!("0x{:04x}, 0x{:02x}", a, b); // 0x1234, 0x34
let a: i16 = -257;
let b: u8 = a as u8;
println!("0x{:02x}, 0x{:02x}", a, b); // 0xfeff, 0xff
}
See also:
What is the difference between From::from and as in Rust?
About ToPrimitive / FromPrimitive
RFC 369, Num Reform, states:
Ideally [...] ToPrimitive [...] would all be removed in favor of a more principled way of working with C-like enums
In the meantime, these traits live on in the num crate:
ToPrimitive
FromPrimitive

How do I return a vector of dynamic length in a pub extern "C" fn?

I want to return a vector in a pub extern "C" fn. Since a vector has an arbitrary length, I guess I need to return a struct with
the pointer to the vector, and
the number of elements in the vector
My current code is:
extern crate libc;
use self::libc::{size_t, int32_t, int64_t};
// struct to represent an array and its size
#[repr(C)]
pub struct array_and_size {
values: int64_t, // this is probably not how you denote a pointer, right?
size: int32_t,
}
// The vector I want to return the address of is already in a Boxed struct,
// which I have a pointer to, so I guess the vector is on the heap already.
// Dunno if this changes/simplifies anything?
#[no_mangle]
pub extern "C" fn rle_show_values(ptr: *mut Rle) -> array_and_size {
let rle = unsafe {
assert!(!ptr.is_null());
&mut *ptr
};
// this is the Vec<i32> I want to return
// the address and length of
let values = rle.values;
let length = values.len();
array_and_size {
values: Box::into_raw(Box::new(values)),
size: length as i32,
}
}
#[derive(Debug, PartialEq)]
pub struct Rle {
pub values: Vec<i32>,
}
The error I get is
$ cargo test
Compiling ranges v0.1.0 (file:///Users/users/havpryd/code/rust-ranges)
error[E0308]: mismatched types
--> src/rle.rs:52:17
|
52 | values: Box::into_raw(Box::new(values)),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected i64, found *-ptr
|
= note: expected type `i64`
= note: found type `*mut std::vec::Vec<i32>`
error: aborting due to previous error
error: Could not compile `ranges`.
To learn more, run the command again with --verbose.
-> exit code: 101
I posted the whole thing because I could not find an example of returning arrays/vectors in the eminently useful Rust FFI Omnibus.
Is this the best way to return a vector of unknown size from Rust? How do I fix my remaining compile error? Thanks!
Bonus q: if the fact that my vector is in a struct changes the answer, perhaps you could also show how to do this if the vector was not in a Boxed struct already (which I think means the vector it owns is on the heap too)? I guess many people looking up this q will not have their vectors boxed already.
Bonus q2: I only return the vector to view its values (in Python), but I do not want to let the calling code change the vector. But I guess there is no way to make the memory read-only and ensure the calling code does not fudge with the vector? const is just for showing intent, right?
P.S.: I do not know C or Rust well, so my attempt might be completely WTF.
pub struct array_and_size {
values: int64_t, // this is probably not how you denote a pointer, right?
size: int32_t,
}
First of all, you're correct. The type you want for values is *mut int32_t.
In general (and note that there are a variety of C coding styles), C APIs often don't "like" returning ad-hoc sized array structs like this. The more common C API would be:
int32_t rle_values_size(RLE *rle);
int32_t *rle_values(RLE *rle);
(Note: many internal programs do in fact use sized array structs, but this is by far the most common for user-facing libraries because it's automatically compatible with the most basic way of representing arrays in C).
In Rust, this would translate to:
extern "C" fn rle_values_size(rle: *mut RLE) -> int32_t
extern "C" fn rle_values(rle: *mut RLE) -> *mut int32_t
The size function is straightforward. To return the array, simply do:
extern "C" fn rle_values(rle: *mut RLE) -> *mut int32_t {
    // as_mut_ptr() also works for an empty Vec, where indexing [0] would panic
    unsafe { (*rle).values.as_mut_ptr() }
}
This gives a raw pointer to the first element of the Vec's underlying buffer, which is all C-style arrays really are.
If, instead of giving C a reference to your data, you want to give C the data, the most common option is to let the user pass in a buffer that you copy the data into:
extern "C" fn rle_values_buf(rle: *mut RLE, buf: *mut int32_t, len: int32_t) {
    use std::ptr;
    unsafe {
        let values = &(*rle).values;
        // Make sure we don't overrun the caller's buffer (or read past our own data)
        let count = values.len().min(len.max(0) as usize);
        ptr::copy_nonoverlapping(values.as_ptr(), buf, count);
    }
}
Which, from C, looks like
void rle_values_buf(RLE *rle, int32_t *buf, int32_t len);
This (shallowly) copies your data into the presumably C-allocated buffer, which the C user is then responsible for destroying. It also prevents multiple mutable copies of your array from floating around at the same time (assuming you don't implement the version that returns a pointer).
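Here is a self-contained sketch of the copy-into-the-caller's-buffer approach that can be exercised without a C compiler. It uses plain i32 instead of libc::int32_t (libc defines the latter as an alias for i32) and reuses the question's Rle struct. main plays the role of the C caller. Unlike the signature above, this version also returns the number of elements copied, which is an illustrative addition.

```rust
use std::ptr;

#[derive(Debug, PartialEq)]
pub struct Rle {
    pub values: Vec<i32>,
}

/// Copies up to `len` values into `buf`; returns how many were copied.
pub extern "C" fn rle_values_buf(rle: *const Rle, buf: *mut i32, len: i32) -> i32 {
    if rle.is_null() || buf.is_null() {
        return 0;
    }
    unsafe {
        let values = &(*rle).values;
        // Never write past the caller's buffer, never read past our data.
        let count = values.len().min(len.max(0) as usize);
        ptr::copy_nonoverlapping(values.as_ptr(), buf, count);
        count as i32
    }
}

fn main() {
    let rle = Rle { values: vec![1, 2, 3, 4, 5] };
    // Caller-owned, zero-initialized buffer, as the C side would provide.
    let mut buf = [0i32; 3];
    let copied = rle_values_buf(&rle, buf.as_mut_ptr(), buf.len() as i32);
    assert_eq!(copied, 3);
    assert_eq!(buf, [1, 2, 3]);
}
```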
Note that you could sort of "move" the array into C as well, but it's not particularly recommended. It involves the use of mem::forget and expecting the C user to explicitly call a destruction function. It also requires both you and the user to obey some discipline that may be difficult to structure the program around.
If you want to receive an array from C, you essentially just ask for both a *mut i32 and i32 corresponding to the buffer start and length. You can assemble this into a slice using the from_raw_parts function, and then use the to_vec function to create an owned Vec containing the values allocated from the Rust side. If you don't plan on needing to own the values, you can simply pass around the slice you produced via from_raw_parts.
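Receiving an array from C might look like the following sketch. The function names sum_values and copy_values are hypothetical, and main stands in for the C caller. Both take *const i32 and usize rather than *mut i32 and i32, since they only read and a length cannot be negative. std::slice::from_raw_parts turns the pointer/length pair into a borrowed slice, and to_vec makes a Rust-owned copy when one is needed.

```rust
use std::slice;

/// Borrowing: sums the caller's values without taking ownership.
/// Safety: `ptr` must point to `len` initialized i32 values (or be null).
pub unsafe extern "C" fn sum_values(ptr: *const i32, len: usize) -> i64 {
    if ptr.is_null() {
        return 0;
    }
    let values = unsafe { slice::from_raw_parts(ptr, len) };
    values.iter().map(|&v| i64::from(v)).sum()
}

/// Owning: copies the caller's values into a Rust-allocated Vec.
/// Safety: same requirements as `sum_values`.
pub unsafe fn copy_values(ptr: *const i32, len: usize) -> Vec<i32> {
    if ptr.is_null() {
        return Vec::new();
    }
    let values = unsafe { slice::from_raw_parts(ptr, len) };
    values.to_vec()
}

fn main() {
    let from_c = [1i32, 2, 3];
    let total = unsafe { sum_values(from_c.as_ptr(), from_c.len()) };
    assert_eq!(total, 6);
    let owned = unsafe { copy_values(from_c.as_ptr(), from_c.len()) };
    assert_eq!(owned, vec![1, 2, 3]);
}
```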
However, it is imperative that all values be initialized from either side, typically to zero. Otherwise you invoke legitimately undefined behavior which often results in segmentation faults (which tend to frustratingly disappear when inspected with GDB).
There are multiple ways to pass an array to C.
First of all, while C has the concept of fixed-size arrays (int a[5] has type int[5] and sizeof(a) will return 5 * sizeof(int)), it is not possible to directly pass an array to a function or return an array from it.
On the other hand, it is possible to wrap a fixed size array in a struct and return that struct.
Furthermore, when using an array, all elements must be initialized, otherwise a memcpy technically has undefined behavior (as it is reading from undefined values) and valgrind will definitely report the issue.
Using a dynamic array
A dynamic array is an array whose length is unknown at compile-time.
One may choose to return a dynamic array if no reasonable upper bound is known, or if this bound is deemed too large for passing by value.
There are two ways to handle this situation:
ask C to pass a suitably sized buffer
allocate a buffer and return it to C
They differ in who allocates the memory. The former is simpler, but may require either a way to hint at a suitable size or the ability to "rewind" if the size proves unsuitable.
Ask C to pass a suitably sized buffer
// file.h
int rust_func(int32_t* buffer, size_t buffer_length);
// file.rs
#[no_mangle]
pub extern "C" fn rust_func(buffer: *mut i32, buffer_length: libc::size_t) -> libc::c_int {
    // your code here: fill `buffer`, then return a status code
    0
}
Note the existence of std::slice::from_raw_parts_mut to transform this pointer + length into a mutable slice (do initialize it with 0s before making it a slice or ask the client to).
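Here is a sketch of what the body of such a function could do. The behavior is invented for illustration: it fills the caller's buffer with the squares 0, 1, 4, ... and returns the number of elements written. Only the shape of the signature comes from file.h above. It uses i32 and usize in place of libc::c_int and libc::size_t, which are the same types on common targets, so it compiles without the libc crate.

```rust
use std::slice;

/// Fills `buffer` with squares and returns how many elements were written,
/// or -1 if the buffer pointer is null. Illustrative behavior only.
pub extern "C" fn rust_func(buffer: *mut i32, buffer_length: usize) -> i32 {
    if buffer.is_null() {
        return -1;
    }
    // Safety: the caller promises `buffer` points to `buffer_length` initialized i32s.
    let out = unsafe { slice::from_raw_parts_mut(buffer, buffer_length) };
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = (i * i) as i32;
    }
    buffer_length as i32
}

fn main() {
    // Zero-initialize before handing the buffer over, as recommended above.
    let mut buf = vec![0i32; 5];
    let written = rust_func(buf.as_mut_ptr(), buf.len());
    assert_eq!(written, 5);
    assert_eq!(buf, vec![0, 1, 4, 9, 16]);
}
```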
Allocate a buffer and return it to C
// file.h
#include <stddef.h>
#include <stdint.h>
typedef struct {
    int32_t* array;
    size_t length;
} DynArray;
DynArray rust_alloc(void);
void rust_free(DynArray);
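// file.rs
#[repr(C)]
pub struct DynArray {
    array: *mut i32,
    length: libc::size_t,
}
#[no_mangle]
pub extern "C" fn rust_alloc() -> DynArray {
    let v: Vec<i32> = vec!(...);
    // A boxed slice has capacity == length, which rust_free relies on.
    let mut boxed = v.into_boxed_slice();
    let result = DynArray {
        array: boxed.as_mut_ptr(),
        length: boxed.len() as _,
    };
    // Hand ownership to C; the memory is only released by rust_free.
    std::mem::forget(boxed);
    result
}
#[no_mangle]
pub extern "C" fn rust_free(array: DynArray) {
    if !array.array.is_null() {
        unsafe {
            // Rebuild the boxed slice from the same pointer and length, then drop it.
            let slice = std::ptr::slice_from_raw_parts_mut(array.array, array.length as usize);
            drop(Box::from_raw(slice));
        }
    }
}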
Using a fixed-size array
Similarly, a struct containing a fixed-size array can be used. Note that in both Rust and C all elements should be initialized, even if unused; zeroing them works well.
Similarly to the dynamic case, it can be either passed by mutable pointer or returned by value.
// file.h
struct FixedArray {
int32_t array[32];
};
// file.rs
#[repr(C)]
struct FixedArray {
array: [libc::int32_t; 32],
}
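Both ways of using FixedArray can be sketched as follows. The function names fixed_by_value and fixed_fill are hypothetical, and i32 stands in for libc::int32_t. Returning by value copies all 32 elements (128 bytes), so it suits small arrays. The pointer version lets the caller own the storage.

```rust
#[repr(C)]
pub struct FixedArray {
    array: [i32; 32],
}

/// Returned by value: every element is initialized, unused ones to zero.
pub extern "C" fn fixed_by_value() -> FixedArray {
    let mut result = FixedArray { array: [0; 32] };
    for (i, slot) in result.array.iter_mut().take(4).enumerate() {
        *slot = i as i32 + 1;
    }
    result
}

/// Filled through a mutable pointer supplied by the caller.
pub extern "C" fn fixed_fill(out: *mut FixedArray) {
    // Safety: the caller passes a valid, writable FixedArray (or null).
    if let Some(out) = unsafe { out.as_mut() } {
        out.array = [0; 32];
        out.array[0] = 42;
    }
}

fn main() {
    let a = fixed_by_value();
    assert_eq!(&a.array[..5], &[1, 2, 3, 4, 0]);

    let mut b = FixedArray { array: [7; 32] };
    fixed_fill(&mut b);
    assert_eq!(b.array[0], 42);
    assert_eq!(b.array[1], 0);
    assert_eq!(std::mem::size_of::<FixedArray>(), 128);
}
```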
