As far as I know, the Rust compiler is allowed to pack, reorder, and add padding to each field of a struct. How can I specify the precise memory layout if I need it?
In C#, I have the StructLayout attribute, and in C/C++, I could use various compiler extensions. I could verify the memory layout by checking the byte offset of expected value locations.
I'd like to write OpenGL code employing custom shaders, which needs precise memory layout. Is there a way to do this without sacrificing performance?
As described in the FFI guide, you can add attributes to structs to use the same layout as C:
#[repr(C)]
struct Object {
a: i32,
// other members
}
and you also have the ability to pack the struct:
#[repr(C, packed)]
struct Object {
a: i32,
// other members
}
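To see what the two attributes change, a minimal sketch can compare sizes and alignments with std::mem::size_of and std::mem::align_of. The type names Plain and Packed are made up for this illustration, and the numbers in the comments assume a target where u32 has an alignment of 4.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example types; only the attributes differ.
#[repr(C)]
struct Plain {
    a: u8,
    b: u32,
}

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    // repr(C): `b` is aligned to 4, so 3 padding bytes follow `a`.
    println!("Plain:  size {}, align {}", size_of::<Plain>(), align_of::<Plain>());
    // repr(C, packed): no padding, and the alignment drops to 1.
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>());
}
```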
And for detecting that the memory layout is ok, you can initialize a struct and check that the offsets are ok by casting the pointers to integers:
#[repr(C, packed)]
struct Object {
a: u8,
b: u16,
c: u32, // other members
}
fn main() {
let obj = Object {
a: 0xaa,
b: 0xbbbb,
c: 0xcccccccc,
};
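// addr_of! yields raw pointers without creating references to unaligned packed fields
let a_ptr: *const u8 = std::ptr::addr_of!(obj.a);
let b_ptr: *const u16 = std::ptr::addr_of!(obj.b);
let c_ptr: *const u32 = std::ptr::addr_of!(obj.c);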
let base = a_ptr as usize;
println!("a: {}", a_ptr as usize - base);
println!("b: {}", b_ptr as usize - base);
println!("c: {}", c_ptr as usize - base);
}
outputs:
a: 0
b: 1
c: 3
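There's no longer to_uint; plain as casts do the job. In current Rust, std::ptr::addr_of! also avoids creating references to unaligned packed fields, which the compiler rejects:
#[repr(C, packed)]
struct Object {
a: i8,
b: i16,
c: i32, // other members
}
fn main() {
let obj = Object {
a: 0x1a,
b: 0x1bbb,
c: 0x1ccccccc,
};
let base = &obj as *const _ as usize;
// addr_of! yields raw pointers without creating intermediate references
let a_off = std::ptr::addr_of!(obj.a) as usize - base;
let b_off = std::ptr::addr_of!(obj.b) as usize - base;
let c_off = std::ptr::addr_of!(obj.c) as usize - base;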
println!("a: {}", a_off);
println!("b: {}", b_off);
println!("c: {}", c_off);
}
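On newer toolchains the same check can be done without creating an instance at all. The sketch below assumes a compiler that provides std::mem::offset_of! (stabilized in Rust 1.77); the constant names are made up for the example.

```rust
use std::mem::offset_of;

#[allow(dead_code)]
#[repr(C, packed)]
struct Object {
    a: u8,
    b: u16,
    c: u32,
}

// Offsets are computed at compile time; no value of `Object` is needed.
const A_OFF: usize = offset_of!(Object, a);
const B_OFF: usize = offset_of!(Object, b);
const C_OFF: usize = offset_of!(Object, c);

fn main() {
    println!("a: {}", A_OFF);
    println!("b: {}", B_OFF);
    println!("c: {}", C_OFF);
}
```

Because the offsets are constants, they could also be checked in a test or a const assertion rather than printed.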
You also can set memory layout for "data-carrying enums" like this.
#[repr(u8)] // any primitive integer type works here, e.g. u16 or i32
enum MyEnum {
A(u32),
B(f32, u64),
C { x: u32, y: u8 },
D,
}
Details are described in the Unsafe Code Guidelines and in RFC 2195:
https://rust-lang.github.io/unsafe-code-guidelines/layout/enums.html
https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html#motivation
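As a small illustration of what the primitive representation guarantees, the sketch below reads the tag of such an enum through a pointer cast. The helper name tag is hypothetical; the technique relies on the rule that a repr(u8) enum starts with its u8 tag, with tags assigned from 0 in declaration order when no explicit discriminants are given.

```rust
#[allow(dead_code)]
#[repr(u8)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// Hypothetical helper: reads the tag of a primitive-repr enum.
fn tag(e: &MyEnum) -> u8 {
    // SAFETY: with `repr(u8)`, the first byte of the enum is its `u8` tag.
    unsafe { *(e as *const MyEnum as *const u8) }
}

fn main() {
    println!("A -> {}", tag(&MyEnum::A(1)));
    println!("C -> {}", tag(&MyEnum::C { x: 1, y: 2 }));
    println!("D -> {}", tag(&MyEnum::D));
}
```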
TL;DR: I thought that the packed attribute in Rust always strips any padding between the fields but apparently this is only true for packed(1).
I want my struct to represent the exact bytes in memory without any padding between fields, but the struct also needs to be page-aligned. The compiler output for my code example isn't what I expected. From the language reference [0], I understood that packed(N) aligns the struct to an N-byte boundary. I expected only the beginning of the struct to be aligned, with no padding between the fields. However, I found that:
#[repr(C, packed(4096))]
struct Foo {
first: u8,
second: u32,
}
let foo = Foo { first: 0, second: 0 };
println!("foo is page-aligned: {}", &foo as *const _ as usize & 0xfff == 0);
println!("{:?}", &foo.first as *const _);
println!("{:?}", &foo.second as *const _);
println!("padding between fields: {}", &foo.second as *const _ as usize - &foo.first as *const _ as usize);
results in
foo is page-aligned: false
0x7ffc85be5eb8
0x7ffc85be5ebc
padding between fields: 4
Why is the struct not page-aligned and why is there padding between the fields? I found out that I can achieve what I want with
#[repr(align(4096))]
struct PageAligned<T>(T);
#[repr(C, packed)]
struct Foo {
first: u8,
second: u32,
}
let foo = Foo { first: 0, second: 0 };
let aligned_foo = PageAligned(Foo { first: 0, second: 0 });
it results in
foo is page-aligned: true
0x7ffd18c12000
0x7ffd18c12001
padding between fields: 1
but I think this is counter-intuitive. Is this how it is supposed to work? I'm on Rust stable 1.57.
To meet the requirements with the available tools, your best option may be to construct a substitute for your u32 field which naturally has an alignment of 1:
#[repr(C, align(4096))]
struct Foo {
first: u8,
second: [u8; 4],
}
impl Foo {
fn second(&self) -> u32 {
u32::from_ne_bytes(self.second)
}
fn set_second(&mut self, value: u32) {
self.second = u32::to_ne_bytes(value);
}
}
This struct's layout passes your tests.
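The underlying reason is that packed(N) can only lower a type's alignment (each field is aligned to at most N), while align(N) can only raise it, and the two cannot be combined on one struct. A quick way to confirm the byte-array layout is a sketch like the following, which assumes a toolchain with std::mem::offset_of! (Rust 1.77 or later):

```rust
use std::mem::{align_of, offset_of, size_of};

#[allow(dead_code)]
#[repr(C, align(4096))]
struct Foo {
    first: u8,
    second: [u8; 4],
}

fn main() {
    let foo = Foo { first: 0, second: [0; 4] };
    let addr = &foo as *const Foo as usize;
    println!("foo is page-aligned: {}", addr & 0xfff == 0);
    println!("alignment: {}", align_of::<Foo>());
    // `[u8; 4]` has alignment 1, so it directly follows `first`.
    println!("offset of second: {}", offset_of!(Foo, second));
    // The size is rounded up to a multiple of the alignment.
    println!("size: {}", size_of::<Foo>());
}
```

Note that the struct's size grows to a full page, because a type's size is always a multiple of its alignment.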
I am attempting to map a simple struct to an underlying buffer as follows, where modifying the struct also modifies the buffer:
#[repr(C, packed)]
pub struct User {
id: u8,
username: [u8; 20],
}
fn main() {
let buf = [97u8; 21];
let mut u: User = unsafe { std::ptr::read(buf.as_ptr() as *const _) };
let buf_addr = &buf[0] as *const u8;
let id_addr = &u.id as *const u8;
println!("buf addr: {:p} id addr: {:p} id val: {}", buf_addr, id_addr, u.id);
assert_eq!(buf_addr, id_addr); // TODO addresses not equal
u.id = 10;
println!("id val: {}", u.id);
println!("{:?}", &buf);
assert_eq!(buf[0], u.id); // TODO buffer not updated
}
However the starting address of the buffer is different to the address of the first member in the struct and modifying the struct does not modify the buffer. What is wrong with the above example?
The struct only contains owned values. That means that, in order to construct one, you have to copy data into it. And that is exactly what you are doing when you use ptr::read.
But what you want to do (at least the code presented) is not possible. If you work around Rust's safety checks with unsafe code then you would have two mutable references to the same data, which is Undefined Behaviour.
You can, however, create a safe API of mutable "views" onto the data, something like this:
#[repr(C, packed)]
pub struct User {
id: u8,
username: [u8; 20],
}
pub struct RawUser {
buf: [u8; 21],
}
impl RawUser {
pub fn as_bytes_mut(&mut self) -> &mut [u8; 21] {
&mut self.buf
}
pub fn as_user_mut(&mut self) -> &mut User {
unsafe { &mut *(self.buf.as_mut_ptr() as *mut _) }
}
}
These accessors let you view the same data in different ways, while allowing the Rust borrow checker to enforce memory safety. Usage looks like this:
fn main() {
let buf = [97u8; 21];
let mut u: RawUser = RawUser { buf };
let user = u.as_user_mut();
user.id = 10;
println!("id val: {}", user.id); // id val: 10
let bytes = u.as_bytes_mut();
// it would be a compile error to try to access `user` here
assert_eq!(bytes[0], 10);
}
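If the buffer is owned elsewhere, a variation on the same idea is to borrow it instead of wrapping it. This is only a sketch: the helper name user_view is made up, and it relies on User being repr(C, packed) with only u8-based fields, so that it has size 21, alignment 1, and no invalid bit patterns.

```rust
#[allow(dead_code)]
#[repr(C, packed)]
pub struct User {
    id: u8,
    username: [u8; 20],
}

// Hypothetical helper: reinterpret a caller-owned buffer as a `User`.
pub fn user_view(buf: &mut [u8; 21]) -> &mut User {
    // SAFETY: `User` has size 21 and alignment 1, and every byte pattern
    // is a valid `u8` / `[u8; 20]`. The returned borrow keeps `buf`
    // exclusively borrowed for as long as the view is alive.
    unsafe { &mut *(buf.as_mut_ptr() as *mut User) }
}

fn main() {
    let mut buf = [97u8; 21];
    let user = user_view(&mut buf);
    user.id = 10;
    // `user` is not used past this point, so `buf` can be read again.
    assert_eq!(buf[0], 10);
    println!("{:?}", buf);
}
```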
I have a function implemented in C, and I want to write a function in Rust with the same interface. The function receives a pointer to the beginning of the array (uint8_t *) and the length of the array. I need to be able to run through the array.
There must be a better way to get the next value, but now I can do this strange thing:
use std::mem;
pub extern "C" fn print_next(i: *const u8) {
let mut ii = unsafe { mem::transmute::<*const u8, i64>(i) };
ii += 1;
let iii = unsafe { mem::transmute::<i64, *const u8>(ii) };
let jj = unsafe { *iii };
println!("{}", jj); // jj is next value
}
As Shepmaster said, you probably need to provide the length of the slice.
Most of the time, a function that works with raw pointers is unsafe, because it usually needs to dereference them at some point. It is a good idea to mark such functions unsafe so that the safety responsibility is delegated to the caller.
Here are some examples using offset and from_raw_parts:
use std::mem;
use std::slice;
// unsafe!
pub extern "C" fn print_next(i: *const u8) {
let mut ii = unsafe { mem::transmute::<*const u8, i64>(i) };
ii += 1;
let iii = unsafe { mem::transmute::<i64, *const u8>(ii) };
let jj = unsafe { *iii };
println!("{}", jj); // jj is next value
}
// unsafe!
pub unsafe extern "C" fn print_next2(i: *const u8) {
let j = *i.offset(1);
println!("{}", j);
}
// (less but still ...) unsafe!
pub unsafe extern "C" fn print_next3(i: *const u8, len: usize) {
let slice = slice::from_raw_parts(i, len);
// we are not checking the size ... so it may panic!
println!("{}", slice[1]);
}
fn main() {
let a = [9u8, 4, 6, 7];
print_next(&a as *const u8);
unsafe {
print_next2(&a[1] as *const u8);
print_next3(&a[2] as *const u8, 2);
}
// what if I print something not in a??
print_next(&a[3] as *const u8); // BAD
unsafe {
print_next2(&a[3] as *const u8); // BAD
print_next3(&a[3] as *const u8, 2); // as bad as others, length is wrong
print_next3(&a[3] as *const u8, 1); // panic! out of bounds
}
}
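One way to avoid both the out-of-bounds read and the panic is to check the pointer and use a fallible lookup on the slice. The following is a sketch under assumptions: the function name next_value and the -1 sentinel are invented for the example and are not part of the original C interface.

```rust
use std::slice;

/// Returns the second element of the array, or -1 if there is none.
///
/// # Safety
/// `ptr` must be null or point to `len` readable bytes.
pub unsafe extern "C" fn next_value(ptr: *const u8, len: usize) -> i32 {
    if ptr.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // `get` returns None instead of panicking when the index is out of range.
    match bytes.get(1) {
        Some(&b) => i32::from(b),
        None => -1,
    }
}

fn main() {
    let a = [9u8, 4, 6, 7];
    let second = unsafe { next_value(a.as_ptr(), a.len()) };
    println!("{}", second);
    let past_end = unsafe { next_value(a[3..].as_ptr(), 1) };
    println!("{}", past_end);
}
```

The function is still unsafe, since nothing can verify that len matches the real allocation, but a correct length can no longer lead to a panic across the FFI boundary.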
I have a type:
struct Foo {
memberA: Bar,
memberB: Baz,
}
and a pointer which I know is a pointer to memberB in Foo:
p: *const Baz
What is the correct way to get a new pointer p: *const Foo which points to the original struct Foo?
My current implementation is the following, which I'm pretty sure invokes undefined behavior due to the dereference of (p as *const Foo) where p is not a pointer to a Foo:
let p2 = p as usize -
((&(*(p as *const Foo)).memberB as *const _ as usize) - (p as usize));
This is part of FFI - I can't easily restructure the code to avoid needing to perform this operation.
This is very similar to Get pointer to object from pointer to some member but for Rust, which as far as I know has no offsetof macro.
The dereference expression produces an lvalue, but that lvalue is not actually read from, we're just doing pointer math on it, so in theory, it should be well defined. That's just my interpretation though.
My solution involves using a null pointer to retrieve the offset to the field, so it's a bit simpler than yours as it avoids one subtraction (we'd be subtracting 0). I believe I saw some C compilers/standard libraries implementing offsetof by essentially returning the address of a field from a null pointer, which is what inspired the following solution.
fn main() {
let p: *const Baz = 0x1248 as *const _;
let p2: *const Foo = unsafe { ((p as usize) - (&(*(0 as *const Foo)).memberB as *const _ as usize)) as *const _ };
println!("{:p}", p2);
}
We can also define our own offset_of! macro:
macro_rules! offset_of {
($ty:ty, $field:ident) => {
unsafe { &(*(0 as *const $ty)).$field as *const _ as usize }
}
}
fn main() {
let p: *const Baz = 0x1248 as *const _;
let p2: *const Foo = ((p as usize) - offset_of!(Foo, memberB)) as *const _;
println!("{:p}", p2);
}
With the implementation of RFC 2582, raw reference MIR operator, it is now possible to get the address of a field in a struct without an instance of the struct and without invoking undefined behavior.
use std::{mem::MaybeUninit, ptr};
struct Example {
a: i32,
b: u8,
c: bool,
}
fn main() {
let offset = unsafe {
let base = MaybeUninit::<Example>::uninit();
let base_ptr = base.as_ptr();
let c = ptr::addr_of!((*base_ptr).c);
(c as usize) - (base_ptr as usize)
};
println!("{}", offset);
}
The implementation of this is tricky and nuanced. It is best to use a crate that is well-maintained, such as memoffset.
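For the original problem of getting from a field pointer back to the containing struct, a sketch along these lines may work on newer toolchains. It assumes std::mem::offset_of! is available (Rust 1.77 or later); Bar, Baz, the snake_case field names, and the helper foo_from_member_b are placeholders for illustration.

```rust
use std::mem::offset_of;

#[allow(dead_code)]
struct Bar(u32);
#[allow(dead_code)]
struct Baz(u64);

#[repr(C)]
struct Foo {
    member_a: Bar,
    member_b: Baz,
}

/// # Safety
/// `p` must point to the `member_b` field of a live `Foo` and must have
/// been derived from a pointer to that whole `Foo`.
unsafe fn foo_from_member_b(p: *const Baz) -> *const Foo {
    // Step back by the field's byte offset to reach the start of the struct.
    unsafe { (p as *const u8).sub(offset_of!(Foo, member_b)) as *const Foo }
}

fn main() {
    let foo = Foo { member_a: Bar(1), member_b: Baz(2) };
    let base: *const Foo = &foo;
    // Derive the field pointer from the whole-struct pointer.
    let p: *const Baz = unsafe { std::ptr::addr_of!((*base).member_b) };
    let p2 = unsafe { foo_from_member_b(p) };
    assert_eq!(p2, base);
    println!("{}", unsafe { (*p2).member_a.0 });
}
```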
Before this functionality was stabilized, you had to have a valid instance of the struct. You can use tools like once_cell to minimize the overhead of the dummy value that you need to create:
use once_cell::sync::Lazy; // 1.4.1
struct Example {
a: i32,
b: u8,
c: bool,
}
static DUMMY: Lazy<Example> = Lazy::new(|| Example {
a: 0,
b: 0,
c: false,
});
static OFFSET_C: Lazy<usize> = Lazy::new(|| {
let base: *const Example = &*DUMMY;
let c: *const bool = &DUMMY.c;
(c as usize) - (base as usize)
});
fn main() {
println!("{}", *OFFSET_C);
}
If you must have this at compile time, you can place similar code into a build script and write out a Rust source file with the offsets. However, that will span multiple compiler invocations, so you are relying on the struct layout not changing between those invocations. Using something with a known representation would reduce that risk.
See also:
How do I create a global, mutable singleton?
How to create a static string at compile time
I'm reading a series of bytes from a socket and I need to put each segment of n bytes as a item in a struct.
use std::mem;
#[derive(Debug)]
struct Things {
x: u8,
y: u16,
}
fn main() {
let array = [22 as u8, 76 as u8, 34 as u8];
let foobar: Things;
unsafe {
foobar = mem::transmute::<[u8; 3], Things>(array);
}
println!("{:?}", foobar);
}
I'm getting errors that say that foobar is 32 bits when array is 24 bits. Shouldn't foobar be 24 bits (8 + 16 = 24)?
The issue here is that the y field is 16-bit-aligned. So your memory layout is actually
x
padding
y
y
Note that swapping the order of x and y doesn't help. The struct's alignment is 2 (that of y), and a type's size is always a multiple of its alignment, so the struct still occupies 32 bits. In addition, the field order of a struct with the default representation is unspecified, so code that depends on it risks undefined behavior.
The reasons for alignment are explained in Purpose of memory alignment.
You can remove the padding by adding the repr(packed) attribute to your struct, but you'll lose performance and the ability to take references to misaligned fields:
#[repr(packed)]
struct Things {
x: u8,
y: u16,
}
The best way would be to not use transmute at all, but to extract the values manually and hope the optimizer makes it fast:
let foobar = Things {
x: array[0],
y: ((array[1] as u16) << 8) | (array[2] as u16),
};
A crate like byteorder may simplify the process of reading different sizes and endianness from the bytes.
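The standard library also covers the simple cases. As a minimal sketch, u16::from_be_bytes reproduces the shift-and-or above (big-endian order is an assumption carried over from that snippet; u16::from_le_bytes would be used for little-endian data). The helper name parse_things is made up for the example.

```rust
#[derive(Debug, PartialEq)]
struct Things {
    x: u8,
    y: u16,
}

// Hypothetical helper: parse 3 bytes as `x` followed by a big-endian `y`.
fn parse_things(bytes: [u8; 3]) -> Things {
    Things {
        x: bytes[0],
        y: u16::from_be_bytes([bytes[1], bytes[2]]),
    }
}

fn main() {
    let things = parse_things([22, 76, 34]);
    println!("{:?}", things);
}
```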
bincode and serde can do this quite simply.
use bincode::{deserialize};
use serde::{Deserialize};
#[derive(Deserialize, Debug)]
struct Things {
x: u8,
y: u16,
}
fn main() {
let array = [22 as u8, 76 as u8, 34 as u8];
let foobar: Things = deserialize(&array).unwrap();
println!("{:?}", foobar);
}
This also works well for serializing a struct into bytes.
use bincode::{serialize};
use serde::{Serialize};
#[derive(Serialize, Debug)]
struct Things {
x: u8,
y: u16,
}
fn main() {
let things = Things{
x: 22,
y: 8780,
};
let baz = serialize(&things).unwrap();
println!("{:?}", baz);
}
I was having issues using the byteorder crate when dealing with structs that also had char arrays. I couldn't get past the compiler errors. I ended up casting like this:
#[repr(packed)]
struct Things {
x: u8,
y: u16,
}
fn main() {
let data: [u8; 3] = [0x22, 0x76, 0x34];
unsafe {
let things_p: *const Things = data.as_ptr() as *const Things;
let things: &Things = &*things_p;
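// Copy the packed field to a local first: formatting `things.y` directly
// would take a reference to an unaligned field.
let y = things.y;
println!("{:x} {:x}", things.x, y);
}
}
Note that passing things.y straight to println! takes a reference to an unaligned packed field. Older compilers accepted this with the warning below; current compilers reject it as an error:
= warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!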
If you can, change Things to behave like a C struct:
#[repr(C)]
struct Things2 {
x: u8,
y: u16,
}
Then initialize data like this. Note the extra byte for alignment purposes.
let data: [u8; 4] = [0x22, 0, 0x76, 0x34];
use std::mem;
fn main() {
let bytes = vec![0u8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xff];
let data_ptr: *const u64 = unsafe { mem::transmute(bytes[0..4].as_ptr()) };
let data: u64 = unsafe { *data_ptr };
println!("{:#x}", data);
}