Align struct members but not struct itself - rust

I'm writing some Rust code that interfaces with C. The C code defines some structures like this:
struct foo {
    // other fields
    uint32_t count;
    const struct bar* array;
};
I want to define a Rust structure with this layout, but for safety I want to put count and array in their own structure as private members, like so:
#[repr(C)]
pub struct Foo {
    // other fields
    pub bars: BarSlice,
}
#[repr(C)]
pub struct BarSlice {
    count: u32,
    ptr: *const Bar,
}
This is wrong, though: on 64-bit systems BarSlice will be aligned to 8 bytes, and the field immediately before count might start at an 8-byte-aligned offset and be 4 bytes long. C places count directly after that field, whereas Rust pads BarSlice out to the next 8-byte boundary. How can I tell Rust to align the members of the struct correctly, but not worry about the alignment of the struct itself?
If possible, I would also rather avoid creating a different slice type for each alignment circumstance, and avoid having private fields in Foo.
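For illustration, the mismatch can be measured with std::mem::size_of. This is a sketch under stated assumptions: the "other fields" are taken to be a single uint32_t, and Bar is a dummy stand-in for the real struct bar. FooFlat mirrors the C header, while FooNested uses the BarSlice wrapper from the question.

```rust
use std::mem::size_of;

// Dummy stand-in for `struct bar` (assumption: its contents don't matter here).
#[repr(C)]
pub struct Bar {
    _private: u8,
}

// Mirrors the C header, assuming the "other fields" are one uint32_t.
#[repr(C)]
pub struct FooFlat {
    pub other: u32,
    pub count: u32,
    pub array: *const Bar,
}

// The wrapper type from the question.
#[repr(C)]
pub struct BarSlice {
    count: u32,
    ptr: *const Bar,
}

// The same header expressed with the wrapper.
#[repr(C)]
pub struct FooNested {
    pub other: u32,
    pub bars: BarSlice,
}
```

On a 64-bit target FooFlat should come out at 16 bytes and FooNested at 24 (4 bytes of padding before bars and 4 more inside BarSlice); on a 32-bit target both are 12 bytes, which is why the problem only shows up on 64-bit systems.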

Related

Best way to populate a struct from a similar struct?

I am new to Rust, and am attempting to take a struct returned from a library (referred to as the source struct) and convert it into a protobuf message using prost. The goal is to take the source struct, map its field types to the protobuf message types (or rather, the appropriate types for the prost-generated struct, referred to as the message struct), and populate the message struct fields using fields from the source struct. The source struct fields are a subset of the message struct fields. For example:
pub struct Message {
    pub id: i32,
    pub email: String,
    pub name: String,
}
pub struct Source {
    pub email: Email,
    pub name: Name,
}
So, I would like to take fields from Source, map the types to the corresponding types in Message, and populate the fields of Message using Source (the fields have the same names). Currently, I am manually assigning the values by creating a Message struct and doing something like message_struct.email = source_struct.email.to_string();. However, I have multiple Message structs based on protobuf, some having 20+ fields, so I'm really hoping to find a more efficient way of doing this.
If I understand you correctly, you want to define a new struct based on the fields of another one. In that case you have to use macros.
https://doc.rust-lang.org/book/ch19-06-macros.html
This question (and its answer) could also be useful for you: Is it possible to generate a struct with a macro?
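A minimal sketch of the macro approach, assuming (hypothetically) that every copied field converts with to_string(), as in the question. convert_fields is an illustrative name, not an existing library macro, and Email/Name are placeholder types that only implement Display.

```rust
use std::fmt;

// Placeholder library types (assumption: all they need is Display).
pub struct Email(pub String);
pub struct Name(pub String);

impl fmt::Display for Email {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl fmt::Display for Name {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

pub struct Source {
    pub email: Email,
    pub name: Name,
}

#[derive(Debug, PartialEq)]
pub struct Message {
    pub id: i32,
    pub email: String,
    pub name: String,
}

// Hypothetical helper: builds `$ty`, setting the `extra` fields explicitly and
// filling each listed field from the same-named field of `$src` via to_string().
macro_rules! convert_fields {
    ($ty:ident, $src:expr, { $($extra:ident : $val:expr),* }, [$($field:ident),*]) => {{
        let src = &$src;
        $ty {
            $($extra: $val,)*
            $($field: src.$field.to_string(),)*
        }
    }};
}
```

Calling convert_fields!(Message, source, { id: 7 }, [email, name]) expands to a plain struct literal, so forgetting a field is still a compile error rather than a silently defaulted value.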
The best way to convert values from one struct type to another is to implement the From<T> trait (which also gives you the matching Into<T> for free).
https://doc.rust-lang.org/std/convert/trait.From.html
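A sketch of the From approach for the example types. Since Source has no id, this version (one option among several, not the only way) converts from a (&Source, i32) tuple; Email and Name are again placeholder types implementing Display.

```rust
use std::fmt;

// Placeholder library types (assumption: they implement Display).
pub struct Email(pub String);
pub struct Name(pub String);

impl fmt::Display for Email {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl fmt::Display for Name {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

pub struct Source {
    pub email: Email,
    pub name: Name,
}

#[derive(Debug, PartialEq)]
pub struct Message {
    pub id: i32,
    pub email: String,
    pub name: String,
}

// Source carries no id, so the conversion takes it alongside the source.
impl<'a> From<(&'a Source, i32)> for Message {
    fn from((source, id): (&'a Source, i32)) -> Self {
        Message {
            id,
            email: source.email.to_string(),
            name: source.name.to_string(),
        }
    }
}
```

With the impl in place, both Message::from((&source, 3)) and (&source, 3).into() work, because the standard library's blanket impl derives Into from From.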
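What you are describing is called FRU (functional record update), the ..base struct update syntax.
On stable Rust it currently works only when both values have the same struct type; relaxing this to the same struct with different generic parameters is the unstable type_changing_struct_update feature.
RFC 2528 talks about a generalization that would make this work: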
struct Foo {
    field1: &'static str,
    field2: i32,
}
struct Bar {
    field1: f64,
    field2: i32,
}
let foo = Foo { field1: "hi", field2: 1 };
let bar = Bar { field1: 3.14, ..foo };
Unfortunately, this has not yet been implemented.
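What does work on stable today is FRU between values of the same type, which combines well with Default: set the fields you have and take the rest from Message::default(). This is a sketch assuming the message type implements Default (prost's derive generates one, as far as I know); phone is a hypothetical extra field that Source lacks.

```rust
// Stand-in for a prost-generated message type (assumption: it implements Default).
#[derive(Debug, Default, PartialEq)]
pub struct Message {
    pub id: i32,
    pub email: String,
    pub name: String,
    pub phone: String, // hypothetical field with no counterpart in Source
}

pub fn message_from_parts(email: &str, name: &str) -> Message {
    Message {
        email: email.to_string(),
        name: name.to_string(),
        // FRU: every field not listed above comes from Message::default(),
        // which has the same type, so this compiles on stable Rust.
        ..Default::default()
    }
}
```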
You could make a method to create a Message from a Source like this.
impl Message {
    pub fn from_source(source: &Source, id: i32) -> Self {
        Message {
            id,
            email: source.email.to_string(),
            name: source.name.to_string(),
        }
    }
}
And then:
let source: Source = todo!(); // obtain the Source from the library
let id: i32 = todo!(); // choose the message id
let message = Message::from_source(&source, id);

Writing to a field in a MaybeUninit structure?

I'm doing something with MaybeUninit and FFI in Rust that seems to work, but I suspect may be unsound/relying on undefined behavior.
My aim is to have a struct MoreA extend a struct A, by including A as an initial field. And then to call some C code that writes to the struct A. And then finalize MoreA by filling in its additional fields, based on what's in A.
In my application, the additional fields of MoreA are all integers, so I don't have to worry about assignments to them dropping the (uninitialized) previous values.
Here's a minimal example:
use core::fmt::Debug;
use std::mem::MaybeUninit;
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}
unsafe fn mock_ffi(p: *mut A) {
    // write doesn't drop previous (uninitialized) occupant of p
    p.write(A(1, 2));
}
fn main() {
    let mut b = MaybeUninit::<MoreA>::uninit();
    unsafe { mock_ffi(b.as_mut_ptr().cast()); }
    let b = unsafe {
        let mut b = b.assume_init();
        b.more = 3;
        b
    };
    assert_eq!(&b, &MoreA { head: A(1, 2), more: 3 });
}
Is the code let b = unsafe { ... } sound? It runs OK and Miri doesn't complain.
But the MaybeUninit docs say:
Moreover, uninitialized memory is special in that the compiler knows that it does not have
a fixed value. This makes it undefined behavior to have uninitialized data in a variable
even if that variable has an integer type, which otherwise can hold any fixed bit pattern.
Also, the Rust Reference says that behavior considered undefined includes:
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
... An integer (i*/u*) or ... obtained from uninitialized memory.
On the other hand, it doesn't seem possible to write to the more field before calling assume_init. Later on the same page:
There is currently no supported way to create a raw pointer or reference to a field of a struct
inside MaybeUninit<Struct>. That means it is not possible to create a struct by calling
MaybeUninit::uninit::<Struct>() and then writing to its fields.
If what I'm doing in the above code example does trigger undefined behavior, what would solutions be?
I'd like to avoid boxing the A value (that is, I'd like to have it be directly included in MoreA).
I'd hope also to avoid having to create one A to pass to mock_ffi and then having to copy the results into MoreA. A in my real application is a large structure.
I guess if there's no sound way to get what I'm after, though, I'd have to choose one of those two fallbacks.
If struct A is of a type that can hold the bit-pattern 0 as a valid value, then I guess a third fallback would be:
Start with MaybeUninit::zeroed() rather than MaybeUninit::uninit().
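The zeroed() fallback might look like the sketch below. It is only sound because every field of A and MoreA is an integer, so the all-zero bit pattern is a valid value; build_zeroed is an illustrative name, and mock_ffi is the question's stand-in for the C call.

```rust
use std::mem::MaybeUninit;

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

// Same stand-in for the C function as in the question.
unsafe fn mock_ffi(p: *mut A) {
    unsafe { p.write(A(1, 2)) }
}

fn build_zeroed() -> MoreA {
    // All-zero bits are a valid MoreA because every field is an integer,
    // so assume_init() here does not produce an invalid value.
    let mut b = unsafe { MaybeUninit::<MoreA>::zeroed().assume_init() };
    // b is fully initialized now, so an ordinary &mut borrow of the field
    // coerces to the *mut A the C function expects.
    unsafe { mock_ffi(&mut b.head) };
    b.more = 3;
    b
}
```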
Currently, the only sound way to refer to uninitialized memory, of any type, is MaybeUninit. In practice, reading an uninitialized integer often appears to work, but the Reference lists it as undefined behavior, so it must not be relied on. It is definitely not safe to read or write to an uninitialized bool or most other types.
In general, as the documentation states, you cannot initialize a struct field by field. However, it is sound to do so as long as:
the struct has repr(C). This is necessary because it prevents Rust from doing clever layout tricks, so that the layout of a field of type MaybeUninit<T> remains identical to the layout of a field of type T, regardless of its adjacent fields.
every field is MaybeUninit. This lets us assume_init() for the entire struct, and then later initialize each field individually.
Given that your struct is already repr(C), you can use an intermediate representation which uses MaybeUninit for every field. The repr(C) also means that we can transmute between the types once it is initialized, provided that the two structs have the same fields in the same order.
use std::mem::{self, MaybeUninit};
#[repr(C)]
struct MoreAConstruct {
    head: MaybeUninit<A>,
    more: MaybeUninit<i32>,
}
let b: MoreA = unsafe {
    // It's OK to assume a struct is initialized when all of its fields are MaybeUninit
    let mut b_construct = MaybeUninit::<MoreAConstruct>::uninit().assume_init();
    mock_ffi(b_construct.head.as_mut_ptr());
    b_construct.more = MaybeUninit::new(3);
    // Same fields in the same order under repr(C), so the layouts match
    mem::transmute(b_construct)
};
It is now possible (since Rust 1.51) to initialize fields of any uninitialized struct using the std::ptr::addr_of_mut macro. This example is from the documentation:
You can use MaybeUninit, and the std::ptr::addr_of_mut macro, to
initialize structs field by field:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    list: Vec<u8>,
}
let foo = {
    let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();
    // Initializing the `name` field
    unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }
    // Initializing the `list` field
    // If there is a panic here, then the `String` in the `name` field leaks.
    unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }
    // All the fields are initialized, so we call `assume_init` to get an initialized Foo.
    unsafe { uninit.assume_init() }
};
assert_eq!(
    foo,
    Foo {
        name: "Bob".to_string(),
        list: vec![0, 1, 2]
    }
);
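Applied to the types from the question, the same technique might look like this sketch (build_more_a is an illustrative name, and mock_ffi is the question's stand-in for the C call). The raw pointer to the head field goes to the C side, more is written through addr_of_mut!, and assume_init only runs once every field has been written.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

// Same stand-in for the C function as in the question.
unsafe fn mock_ffi(p: *mut A) {
    unsafe { p.write(A(1, 2)) }
}

fn build_more_a() -> MoreA {
    let mut b = MaybeUninit::<MoreA>::uninit();
    let ptr = b.as_mut_ptr();
    unsafe {
        // addr_of_mut! yields a raw pointer to each field without ever
        // creating a reference to uninitialized memory.
        mock_ffi(addr_of_mut!((*ptr).head));
        addr_of_mut!((*ptr).more).write(3);
        // Both fields have been written, so the whole struct is initialized.
        b.assume_init()
    }
}
```

No copy of A is made and nothing is boxed: the C side writes straight into the head field of the final MoreA.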

How do I use cbindgen to return and free a Box<Vec<_>>?

I have a struct returned to C code from Rust. I have no idea if it's a good way to do things, but it does work for rebuilding the struct and freeing memory without leaks.
#[repr(C)]
pub struct s {
    // ...
}
#[repr(C)]
#[allow(clippy::box_vec)]
pub struct s_arr {
    arr: *const s,
    n: i8,
    vec: Box<Vec<s>>,
}
/// Frees memory that was returned to C code
#[no_mangle]
pub unsafe extern "C" fn free_s_arr(a: *mut s_arr) {
    drop(Box::from_raw(a));
}
/// Generates an array for the C code
#[no_mangle]
pub unsafe extern "C" fn gen_s_arr() -> *mut s_arr {
    let mut many_s: Vec<s> = Vec::new();
    // ... logic here
    Box::into_raw(Box::new(s_arr {
        arr: many_s.as_mut_ptr(),
        n: many_s.len() as i8,
        vec: Box::new(many_s),
    }))
}
The C header is currently written by hand, but I wanted to try out cbindgen. The manual C definition for s_arr is:
struct s_arr {
    struct s *arr;
    int8_t n;
    void *_;
};
cbindgen generates the following for s_arr:
typedef struct Box_Vec_s Box_Vec_s;
typedef struct s_arr {
    const s *arr;
    int8_t n;
    Box_Vec_s vec;
} s_arr;
This doesn't work since struct Box_Vec_s is not defined. Ideally I would just want to override the cbindgen type generated for vec to make it void * since it requires no code changes and thus no additional testing, but I am open to other suggestions.
I have looked through the cbindgen documentation, though not the examples, and couldn't find anything.
Your question is a bit unclear, but I think that if I understood you right, you're confusing two things and being led down a dark alley as a result.
In C, a dynamically-sized array, as you probably know, is identified by two things:
Its starting position, as a pointer
Its length
Rust follows the same convention: a Vec<_>, under the hood, shares the same structure (well, almost: it also has a capacity, but that's beside the point).
Passing the boxed vector on top of a pointer is not only overkill, but extremely unwise. FFI bindings may be smart, but they're not smart enough to deal with a boxed complex type most of the time.
To solve this, we're going to simplify your bindings. I've added a single field to struct S to show you how it works. I've also cleaned up your FFI boundary:
#[repr(C)]
pub struct S {
    foo: u8,
}
#[repr(C)]
pub struct s_arr {
    arr: *mut S,
    n: usize,
    cap: usize,
}
// Retrieve the vector back (Rust side only: Vec has no C representation)
pub unsafe fn recombine_s_arr(ptr: *mut S, n: usize, cap: usize) -> Vec<S> {
    Vec::from_raw_parts(ptr, n, cap)
}
#[no_mangle]
pub extern "C" fn gen_s_arr() -> s_arr {
    let mut many_s: Vec<S> = Vec::new();
    let output = s_arr {
        arr: many_s.as_mut_ptr(),
        n: many_s.len(),
        cap: many_s.capacity(),
    };
    // Keep the buffer alive; ownership now travels with `output`
    std::mem::forget(many_s);
    output
}
With this, cbindgen returns the expected header definitions:
typedef struct {
    uint8_t foo;
} so58311426S;
typedef struct {
    so58311426S *arr;
    uintptr_t n;
    uintptr_t cap;
} so58311426s_arr;
so58311426s_arr gen_s_arr(void);
This allows us to call gen_s_arr() from either C or Rust and retrieve a struct that is usable across both parts of the FFI boundary (so58311426s_arr). This struct contains all we need to be able to modify our array of S (well, so58311426S according to cbindgen).
When passing through FFI, you need to make sure of a few simple things:
You cannot pass types that have no C equivalent, such as a Box<Vec<_>>; you will almost universally need to convert down to a set of pointers or change your definitions to accommodate (as I have done here)
You most definitely do not pass raw vectors. At most, you pass a slice's pointer and length as separate fields, since a Rust slice reference is a fat pointer with no C equivalent (see the point above).
You make sure to std::mem::forget() whatever you do not want to deallocate, and make sure to remember to deallocate it or reform it somewhere else.
I will edit this answer in an hour; I have a plane to catch. Let me know if any of this needs clarification and I'll get to it once I'm in the right country :-)
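The answer leaves the deallocation side open. A sketch of a matching free function, assuming the C side calls it exactly once per value returned by gen_s_arr, could look like this. free_s_arr here is a hypothetical export, and the #[no_mangle] attributes a real library needs are left out of the sketch (in edition 2024 they are spelled #[unsafe(no_mangle)]).

```rust
use std::mem::ManuallyDrop;

#[repr(C)]
pub struct S {
    foo: u8,
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct s_arr {
    arr: *mut S,
    n: usize,
    cap: usize,
}

pub extern "C" fn gen_s_arr() -> s_arr {
    let v: Vec<S> = (0..3).map(|i| S { foo: i }).collect();
    // ManuallyDrop keeps the buffer alive, like mem::forget in the answer.
    let mut v = ManuallyDrop::new(v);
    s_arr { arr: v.as_mut_ptr(), n: v.len(), cap: v.capacity() }
}

/// Hypothetical counterpart: rebuilds the Vec and drops it, freeing the buffer.
pub unsafe extern "C" fn free_s_arr(a: s_arr) {
    // Safety: `a` must come unmodified from gen_s_arr and must not be freed twice.
    unsafe { drop(Vec::from_raw_parts(a.arr, a.n, a.cap)) };
}
```

Passing the capacity back is what makes from_raw_parts sound here; reconstructing with the length alone would be undefined behavior whenever the two differ.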

Struct with mixed bitflag and normal members

I'm trying to recreate a C struct with mixed bitfield members and "normal" members in Rust for FFI.
I've read that the bitflags crate would be the one to go with; unfortunately, I find its documentation lacking in regard to how the syntax actually works.
The bitflags crate makes it easier to create bitmasks in a similar style as in C using enums. The bitfield crate claims to create bitfields that can be accessed, however I have no idea how it works.
I have a C structure like this:
struct mixed {
    unsigned int flag_1_1 : 1;
    unsigned int flag_2_7 : 7;
    unsigned int flag_3_8 : 8;
    unsigned int some_val1;
    unsigned int some_val2;
    unsigned int flag_4_16 : 16;
};
I have no clue how to represent it in Rust. I'd use the libc crate to get access to c_uint, but other than that I'm pretty much out of ideas, and searching for other code that does this has not been successful:
#[repr(transparent)] // do I also need repr(C) ?
struct mixed {
    flags_1_3: mixed_flags_1_3;
    some_val1: c_uint;
    some_val2: c_uint;
    flags_4: mixed_flags_4;
}
bitfield!(
    #[repr(transparent)] // do I also need repr(C), here too?
    struct mixed_flags_1_3(u16 /* what about this parameter? */) {
        u8; // what does this mean?
        /* get/field, set: first bit, last bit; */
        flag_1_1, _: 0, 0;
        flag_2_7, _: 7, 1;
        flag_3_8, _: 15, 8;
    }
)
bitfield!(
    #[repr(transparent)]
    struct mixed_flags_4(u8) {
        u8;
        flag_4_16, _: 15, 0;
    }
)
These are just guesses, how do I create a correct representation?
In cases like this you can look at the code bindgen generates:
$ bindgen test.h
#[repr(C)]
#[derive(Copy, Clone, Debug, Default, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub struct __BindgenBitfieldUnit<Storage, Align>
where
    Storage: AsRef<[u8]> + AsMut<[u8]>,
{
    storage: Storage,
    align: [Align; 0],
}
// skipped: generated accessor methods for the bitfields
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct mixed {
    pub _bitfield_1: __BindgenBitfieldUnit<[u8; 2usize], u8>,
    pub some_val1: ::std::os::raw::c_uint,
    pub some_val2: ::std::os::raw::c_uint,
    pub _bitfield_2: __BindgenBitfieldUnit<[u8; 2usize], u16>,
    pub __bindgen_padding_0: u16,
}
Using rustc -- -Z unstable-options --pretty=expanded I think I could figure out what the macro does, and this seems to yield something that could be correct; however, this is probably only compatible when the compiler does not try to pad or reorder the bitfields in the struct.
use bitfield::bitfield;
use std::os::raw::c_uint; // or libc::c_uint
#[repr(C)] // repr(transparent) only allows a single non-zero-sized field
struct mixed {
    flags_1_3: mixed_flags_1_3,
    some_val1: c_uint,
    some_val2: c_uint,
    flags_4: mixed_flags_4,
}
bitfield! {
    // Results in a "tuple struct", ie. u16 = total size of bitfields
    #[repr(transparent)]
    struct mixed_flags_1_3(u16);
    // All the following fields' values should be treated as an
    // unsigned int when accessed
    c_uint;
    /* getter, setter: most significant bit, least significant bit; */
    flag_1_1, _: 0, 0;
    flag_2_7, _: 7, 1;
    // One could change the type again here, if one wanted to:
    // u16;
    flag_3_8, _: 15, 8;
}
bitfield! {
    #[repr(transparent)]
    struct mixed_flags_4(u16);
    c_uint;
    flag_4_16, _: 15, 0;
}
But for now at least I think I will just use libclang and bindgen as dependencies and generate my bindings automatically, due to the aforementioned problems with platform compat.
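If you would rather avoid both crates, the layout can also be written by hand with plain integer storage and shift/mask accessors. This sketch assumes the common GCC/Clang little-endian bitfield allocation (first declared bitfield in the least significant bits, the first three bitfields sharing one unsigned int unit); C leaves bitfield layout implementation-defined, so check it against bindgen's output for your target. Mixed and its accessor names are illustrative.

```rust
use std::os::raw::c_uint;

// Hand-written stand-in for `struct mixed` (assumed GCC/Clang little-endian layout).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct Mixed {
    flags_1_3: u16, // flag_1_1 = bit 0, flag_2_7 = bits 1..=7, flag_3_8 = bits 8..=15
    pub some_val1: c_uint, // offset 4: 2 padding bytes follow flags_1_3, as in C
    pub some_val2: c_uint,
    flags_4: u16, // flag_4_16 = bits 0..=15; tail padding rounds the size up to 16
}

impl Mixed {
    pub fn flag_1_1(&self) -> c_uint {
        (self.flags_1_3 & 0x1) as c_uint
    }
    pub fn flag_2_7(&self) -> c_uint {
        ((self.flags_1_3 >> 1) & 0x7F) as c_uint
    }
    pub fn flag_3_8(&self) -> c_uint {
        ((self.flags_1_3 >> 8) & 0xFF) as c_uint
    }
    pub fn flag_4_16(&self) -> c_uint {
        self.flags_4 as c_uint
    }
    // Clear the 7-bit field, then OR in the (truncated) new value.
    pub fn set_flag_2_7(&mut self, v: c_uint) {
        self.flags_1_3 = (self.flags_1_3 & !(0x7Fu16 << 1)) | (((v & 0x7F) as u16) << 1);
    }
}
```

Only set_flag_2_7 is shown; the other setters follow the same clear-then-or pattern.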

Move semantics in Rust

I'm wrapping a C library in Rust, and many of its functions take parameters by pointers to structs, which themselves often have pointers to other structs. In the interest of reducing overhead, I'd like to provide the ability to cache the results of marshaling the Rust data into the C structs.
Here's an example of how the C library might expect some parameters:
#[repr(C)]
struct Foo {
    x: i32,
    y: f32
}
#[repr(C)]
struct Bar {
    p_foo: *const Foo,
    z: bool
}
And how I'd imagine an owning, "cached" version would look:
struct Cached {
    foo: Option<Foo>,
    bar: Bar
}
The p_foo field of bar would be constructed to point to Some value within foo, or a null pointer if there was None.
The issue here, of course, is that if a value of Cached were moved, a straight memcpy would be inappropriate: bar.p_foo would additionally need to be redirected. This would be easy to ensure in C++, with its definable move semantics, but does Rust offer a solution besides "don't set bar.p_foo until it's used"? That would certainly work, but I don't expect these cached values to be moved anywhere near as often as they are reused, and there is a bit of work involved to set up these pointers, especially if the nesting/chaining is deep/long. I'd also rather not Box the substructures up on the heap.
To clarify, here's what I can write in C++, which I would like to replicate in Rust:
struct Foo {
    int x;
    float y;
};
struct Bar {
    Foo const *pFoo;
    bool z;
};
// bear with me while I conjure up a Maybe for C++
class Cached {
public:
    // would have appropriate copy constructor/assignment
    Cached(Cached &&other) {
        m_foo = other.m_foo;
        m_bar = other.m_bar;
        if(m_foo.isJust()) {
            m_bar.pFoo = &m_foo.value();
        } // else already nullptr
    }
    // similar move assignment
private:
    Maybe<Foo> m_foo;
    Bar m_bar;
};
The Rust equivalent would be to not use raw pointers, as raw pointers are there for implementing our safe data structures, not for implementing normal data structures.
#[repr(C)]
struct Foo {
    x: i32,
    y: f32
}
#[repr(C)]
struct Bar {
    p_foo: Option<Box<Foo>>,
    z: bool
}
An Option<Box<T>> is guaranteed to have exactly the same representation in memory as a *const T (with None as the null pointer), as long as T is Sized, i.e. not a trait object or slice. The only difference is that it's safe to use within Rust.
This way you don't even need a Cached struct anymore, but can directly pass around the Bar object.
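A quick sketch that checks both claims, assuming the Foo and Bar definitions above: the Option<Box<Foo>> field has the size of a raw pointer, and because Foo lives on the heap, moving the Bar leaves the address C would see unchanged. foo_ptr is an illustrative helper, not part of any API.

```rust
use std::mem::size_of;
use std::ptr;

#[repr(C)]
pub struct Foo {
    x: i32,
    y: f32,
}

#[repr(C)]
pub struct Bar {
    p_foo: Option<Box<Foo>>,
    z: bool,
}

// What C would see in p_foo: null for None, the heap address of the Foo for Some.
pub fn foo_ptr(bar: &Bar) -> *const Foo {
    bar.p_foo.as_deref().map_or(ptr::null(), |f| f as *const Foo)
}
```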
I'd also rather not Box the substructures up on the heap.
Then I suggest you don't keep a Bar object around, and instead conjure it up whenever you need to pass one to C:
#[repr(C)]
struct Foo {
    x: i32,
    y: f32
}
#[repr(C)]
struct Bar<'a> {
    p_foo: Option<&'a Foo>,
    z: bool
}
struct Cached {
    foo: Option<Foo>,
    z: bool,
}
impl Cached {
    fn bar<'a>(&'a self) -> Bar<'a> {
        Bar {
            p_foo: self.foo.as_ref(),
            z: self.z,
        }
    }
}
there is a bit of work involved to set up these pointers, especially if the nesting/chaining is deep/long.
That sounds a lot like premature optimization. Don't optimize where you haven't benchmarked.
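To show how the borrowed Bar<'a> is used, here is a sketch with a Rust stand-in for the C function (mock_c_consume and call_c are hypothetical names). Because the Bar is rebuilt on every call, moving a Cached around never leaves a stale pointer behind, and the Option<&Foo> field is passed to C as a nullable pointer.

```rust
#[repr(C)]
pub struct Foo {
    x: i32,
    y: f32,
}

#[repr(C)]
pub struct Bar<'a> {
    p_foo: Option<&'a Foo>,
    z: bool,
}

pub struct Cached {
    foo: Option<Foo>,
    z: bool,
}

impl Cached {
    pub fn bar(&self) -> Bar<'_> {
        Bar { p_foo: self.foo.as_ref(), z: self.z }
    }
}

// Hypothetical stand-in for the C function; a real one would be declared in
// an extern "C" block taking *const Bar.
unsafe fn mock_c_consume(bar: *const Bar<'_>) -> i32 {
    let bar = unsafe { &*bar };
    match bar.p_foo {
        Some(f) => f.x,
        None => -1,
    }
}

pub fn call_c(cached: &Cached) -> i32 {
    // Built right before the call, so it always points into the current location.
    let bar = cached.bar();
    unsafe { mock_c_consume(&bar) }
}
```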

Resources