Is there a way to pass a compile-time constant to the repr directive?

const CELL_ALIGNMENT: usize = std::mem::align_of::<EntryType>();
#[repr(align(CELL_ALIGNMENT))]
pub struct AlignedCell;
#[repr(C)]
pub struct AlignedHeader {
_align: [AlignedCell; 0],
count: usize,
}
CELL_ALIGNMENT is a constant, but it looks like repr doesn't accept a constant; only a literal is allowed. Is there any way to get around this?

No, you cannot use an expression in #[repr(align(_))]. It must be a literal.
The next best thing: you can assert the alignment of your struct matches another at compile-time by using the static-assertions crate:
#[repr(align(8))]
pub struct AlignedCell;
static_assertions::assert_eq_align!(AlignedCell, EntryType);
It doesn't set the alignment automatically, but you'll get a compiler error if it's wrong.
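A similar check can be sketched with the standard library alone, using a const assertion. In this sketch EntryType is a hypothetical stand-in (a u64 wrapper, assumed to have 8-byte alignment as on common 64-bit targets). The AlignedHeader variant at the end is an additional assumption, not part of the answer above: a zero-length array of EntryType itself already carries EntryType's alignment, which may avoid the literal entirely.

```rust
use std::mem::align_of;

// Hypothetical stand-in for the question's EntryType.
#[allow(dead_code)]
pub struct EntryType(u64);

// The literal still has to be written by hand...
#[repr(align(8))]
pub struct AlignedCell;

// ...but compilation fails if it ever disagrees with EntryType.
const _: () = assert!(align_of::<AlignedCell>() == align_of::<EntryType>());

// Alternative (assumption): a zero-length array contributes only alignment.
#[allow(dead_code)]
#[repr(C)]
pub struct AlignedHeader {
    _align: [EntryType; 0],
    count: usize,
}

fn main() {
    println!("AlignedCell alignment: {}", align_of::<AlignedCell>());
    println!("AlignedHeader alignment: {}", align_of::<AlignedHeader>());
}
```

The const assertion costs nothing at run time; it only turns a mismatch into a build failure.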

Related

How to align a packed struct with no padding between fields in Rust?

I'm working with an ABI where I need exact control over the data layout of the payload on both ends. Furthermore, there should be no padding between fields at all, ever. Additionally, the beginning of the payload should be page-aligned.
#[repr(C)] helps a lot. The modifiers #[repr(packed(N))] and #[repr(align(N))] are compatible with repr(C) but they can't be used together. I can't achieve what I want with #[repr(C, packed(4096))].
How to solve this?
The packed(N) type layout modifier does not guarantee that there will never be any padding. That is only the case for packed / packed(1), because packed(N) can only lower the alignment of each field to min(N, default alignment). packed(N) doesn't mean that the struct is "packed" (i.e. no padding at all between fields), nor does packed(4096) mean that the struct is aligned to 4096 bytes.
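The effect of different N values can be checked with size_of and align_of. The following is a minimal sketch with hypothetical struct names; the numbers for the unpacked struct assume a target where u64 is 8-byte aligned (typical for 64-bit platforms).

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Plain {
    a: u8,
    b: u64,
}

#[allow(dead_code)]
#[repr(C, packed(2))]
struct Packed2 {
    a: u8,
    b: u64,
}

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed1 {
    a: u8,
    b: u64,
}

fn main() {
    // repr(C): b is aligned to 8, so 7 padding bytes follow a.
    assert_eq!((size_of::<Plain>(), align_of::<Plain>()), (16, 8));
    // packed(2): b's alignment is lowered to 2, so 1 padding byte remains.
    assert_eq!((size_of::<Packed2>(), align_of::<Packed2>()), (10, 2));
    // packed / packed(1): no padding at all.
    assert_eq!((size_of::<Packed1>(), align_of::<Packed1>()), (9, 1));
}
```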
If you want a page-aligned struct with no padding at all, you want to do the following:
#[repr(align(4096))]
struct Aligned4096<T>(T);
// plus impl convenient methods
#[repr(C, packed)]
struct Foo {
a: u8,
b: u64,
c: u16,
d: u8,
}
// plus impl convenient methods
fn main() {
let aligned_foo = Aligned4096(Foo::new());
}
A more detailed view of how different N in packed(N) change the type layout is shown in this table on GitHub. More information about the type layout modifiers in general is provided in the official language documentation.
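The combination of a packed payload and an aligned wrapper can be verified the same way. This sketch reuses the Aligned4096 and Foo definitions from the answer; the assertions are illustrative additions.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(align(4096))]
struct Aligned4096<T>(T);

#[allow(dead_code)]
#[repr(C, packed)]
struct Foo {
    a: u8,
    b: u64,
    c: u16,
    d: u8,
}

fn main() {
    // The payload has no padding: 1 + 8 + 2 + 1 bytes.
    assert_eq!(size_of::<Foo>(), 12);
    assert_eq!(align_of::<Foo>(), 1);
    // The wrapper is page-aligned; its size is rounded up to the alignment.
    assert_eq!(align_of::<Aligned4096<Foo>>(), 4096);
    assert_eq!(size_of::<Aligned4096<Foo>>(), 4096);
}
```

Note that the wrapper's size is rounded up to a full page, so the padding moves to the end of the wrapper rather than between the fields.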

What does BigUint::from(24u32) do?

BigUint::from(24u32)
I understand that it is related to a big unsigned integer since it is part of a cryptography module to factor big numbers. I don't understand what the part from(24u32) does.
For example, if it is used in this context
let b: BigUint = (BigUint::from(24u32) * &n).sqrt() + &one;
Conversions between types in Rust are defined by the From trait, which defines a function from, so that Foo::from(bar) converts the value bar to the type Foo. In your case, you are therefore converting 24u32 to type BigUint, but what's this 24u32?
Integer literals in Rust can use a suffix to specify their actual type. If you wrote just 24, it could be for example an i32, a u8 or a u32. Most of the time the compiler is able to infer the actual type from the way the value is used, and when it can't it defaults to i32. But in your case that won't work: there is no From<i32> implementation for BigUint, but there are conversions from all the regular unsigned types (u8, u16, u32, u64 and u128), so the compiler doesn't know which one to use. Adding the u32 suffix clarifies the type, so 24u32 is the value 24 with the type u32, allowing the compiler to understand that you want the From<u32> implementation.
The BigUint struct implements the From<u32> trait, which means that it will implement a from(u32) function.
Implementations of the From<_> trait are used to perform a value conversion that will consume the original input. In your example the BigUint struct is constructed from the 24u32 number.
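The same mechanism can be shown without the big-integer crate. In this sketch Wide is a hypothetical wrapper type standing in for BigUint; only the From implementations matter.

```rust
// Hypothetical wrapper type, standing in for BigUint.
#[derive(Debug, PartialEq)]
struct Wide(u128);

impl From<u32> for Wide {
    fn from(value: u32) -> Self {
        Wide(u128::from(value))
    }
}

impl From<u64> for Wide {
    fn from(value: u64) -> Self {
        Wide(u128::from(value))
    }
}

fn main() {
    // The suffix selects the From<u32> implementation.
    let a = Wide::from(24u32);
    // Into is provided automatically for every From implementation.
    let b: Wide = 24u64.into();
    assert_eq!(a, b);
    // An unsuffixed Wide::from(24) is expected to be rejected here: with
    // several candidates the literal falls back to i32, and there is no
    // From<i32> implementation.
}
```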

Is there any way to restrict a generic type to one of several types?

I'm trying to create a generic struct which uses an "integer type" for references into an array. For performance reasons I'd like to be able to specify easily whether to use u16, u32 or u64. Something like this (which obviously isn't valid Rust code):
struct Foo<T: u16 or u32 or u64> { ... }
Is there any way to express this?
For references into an array usually you'd just use a usize rather than different integer types.
However, to do what you are after you can create a new trait, implement that trait for u16, u32 and u64 and then restrict T to your new trait.
pub trait MyNewTrait {}
impl MyNewTrait for u16 {}
impl MyNewTrait for u32 {}
impl MyNewTrait for u64 {}
struct Foo<T: MyNewTrait> { ... }
You may then also add methods onto MyNewTrait and the impls to encapsulate the logic specific to u16, u32 and u64.
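As a minimal sketch of that idea, the trait below carries one method that converts each permitted integer type to usize. The names ArrayIndex, to_usize and resolve are hypothetical; the u64 conversion is assumed to be used on a 64-bit target, since the cast would truncate on a 32-bit one.

```rust
/// Hypothetical trait; implemented only for the index types we allow.
pub trait ArrayIndex: Copy {
    fn to_usize(self) -> usize;
}

impl ArrayIndex for u16 {
    fn to_usize(self) -> usize {
        self as usize
    }
}
impl ArrayIndex for u32 {
    fn to_usize(self) -> usize {
        self as usize
    }
}
impl ArrayIndex for u64 {
    fn to_usize(self) -> usize {
        self as usize
    }
}

pub struct Foo<T: ArrayIndex> {
    refs: Vec<T>,
}

impl<T: ArrayIndex> Foo<T> {
    /// Resolves every stored index against `items`.
    pub fn resolve<'a, V>(&self, items: &'a [V]) -> Vec<&'a V> {
        self.refs.iter().map(|r| &items[r.to_usize()]).collect()
    }
}

fn main() {
    let foo = Foo { refs: vec![2u16, 0] };
    let items = ["a", "b", "c"];
    assert_eq!(foo.resolve(&items), vec![&"c", &"a"]);
}
```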
Sometimes you may want to use an enum rather than a generic type with a trait bound. For example:
enum Unsigned {
U16(u16),
U32(u32),
U64(u64),
}
struct Foo { x: Unsigned, ... }
One advantage of making a new type over implementing a new trait for existing types is that you can add foreign traits and inherent behavior to the new type. You can implement any traits you like for Unsigned, like Add, Mul, etc. When Foo contains an Unsigned, implementing traits on Unsigned doesn't affect the signature of Foo like it would to add them as bounds on Foo's parameter (e.g. Foo<T: Add<Output = T> + PartialOrd + ...>). On the other hand, you do still have to implement each trait.
Another thing to note: while you can generally always make a new type and implement a trait for it, an enum is "closed": you can't add new types to Unsigned without touching the rest of its implementation, like you could if you used a trait. This may be a good thing or a bad thing depending on what your design calls for.
"Performance reasons" is a bit ambiguous, but if you're thinking of storing a lot of Unsigneds that will all be the same internal type, and this:
struct Foo([Unsigned; 1_000_000]);
would waste a ton of space over storing a million u16s, you can still make Foo generic! Just implement From<u16>, From<u32>, and From<u64> for Unsigned and write this instead:
struct Foo<T: Into<Unsigned>>([T; 1_000_000]);
Now you only have one simple trait bound on T, you're not wasting space for tags and padding, and functions that deal with T can always convert it to Unsigned to do calculations with. The cost of the conversion may even be optimized away entirely.
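A small sketch of this layout follows, with a four-element array instead of a million. The helper methods widen and sum are hypothetical additions used only to show the conversion at work.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Unsigned {
    U16(u16),
    U32(u32),
    U64(u64),
}

impl From<u16> for Unsigned {
    fn from(v: u16) -> Self {
        Unsigned::U16(v)
    }
}
impl From<u32> for Unsigned {
    fn from(v: u32) -> Self {
        Unsigned::U32(v)
    }
}
impl From<u64> for Unsigned {
    fn from(v: u64) -> Self {
        Unsigned::U64(v)
    }
}

impl Unsigned {
    /// Hypothetical helper: widen any variant to u64 for calculations.
    fn widen(self) -> u64 {
        match self {
            Unsigned::U16(v) => v.into(),
            Unsigned::U32(v) => v.into(),
            Unsigned::U64(v) => v,
        }
    }
}

struct Foo<T: Into<Unsigned>>([T; 4]);

impl<T: Into<Unsigned> + Copy> Foo<T> {
    fn sum(&self) -> u64 {
        self.0
            .iter()
            .map(|&x| {
                let u: Unsigned = x.into();
                u.widen()
            })
            .sum()
    }
}

fn main() {
    let foo = Foo([1u16, 2, 3, 4]);
    assert_eq!(foo.sum(), 10);
    // Storing bare u16 values avoids the enum's tag and padding.
    assert!(size_of::<Foo<u16>>() < size_of::<[Unsigned; 4]>());
}
```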
See Also
Should I use enum to emulate the polymorphism or use trait with Box<trait> instead?

How do I return a vector of dynamic length in a pub extern "C" fn?

I want to return a vector in a pub extern "C" fn. Since a vector has an arbitrary length, I guess I need to return a struct with
the pointer to the vector, and
the number of elements in the vector
My current code is:
extern crate libc;
use self::libc::{size_t, int32_t, int64_t};
// struct to represent an array and its size
#[repr(C)]
pub struct array_and_size {
values: int64_t, // this is probably not how you denote a pointer, right?
size: int32_t,
}
// The vector I want to return the address of is already in a Boxed struct,
// which I have a pointer to, so I guess the vector is on the heap already.
// Dunno if this changes/simplifies anything?
#[no_mangle]
pub extern "C" fn rle_show_values(ptr: *mut Rle) -> array_and_size {
let rle = unsafe {
assert!(!ptr.is_null());
&mut *ptr
};
// this is the Vec<i32> I want to return
// the address and length of
let values = rle.values;
let length = values.len();
array_and_size {
values: Box::into_raw(Box::new(values)),
size: length as i32,
}
}
#[derive(Debug, PartialEq)]
pub struct Rle {
pub values: Vec<i32>,
}
The error I get is
$ cargo test
Compiling ranges v0.1.0 (file:///Users/users/havpryd/code/rust-ranges)
error[E0308]: mismatched types
--> src/rle.rs:52:17
|
52 | values: Box::into_raw(Box::new(values)),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected i64, found *-ptr
|
= note: expected type `i64`
= note: found type `*mut std::vec::Vec<i32>`
error: aborting due to previous error
error: Could not compile `ranges`.
To learn more, run the command again with --verbose.
-> exit code: 101
I posted the whole thing because I could not find an example of returning arrays/vectors in the eminently useful Rust FFI Omnibus.
Is this the best way to return a vector of unknown size from Rust? How do I fix my remaining compile error? Thanks!
Bonus q: if the fact that my vector is in a struct changes the answer, perhaps you could also show how to do this if the vector was not in a Boxed struct already (which I think means the vector it owns is on the heap too)? I guess many people looking up this q will not have their vectors boxed already.
Bonus q2: I only return the vector to view its values (in Python), but I do not want to let the calling code change the vector. But I guess there is no way to make the memory read-only and ensure the calling code does not fudge with the vector? const is just for showing intent, right?
Ps: I do not know C or Rust well, so my attempt might be completely WTF.
pub struct array_and_size {
values: int64_t, // this is probably not how you denote a pointer, right?
size: int32_t,
}
First of all, you're correct. The type you want for values is *mut int32_t.
In general, and note that there are a variety of C coding styles, C often doesn't "like" returning ad-hoc sized array structs like this. The more common C API would be
int32_t rle_values_size(RLE *rle);
int32_t *rle_values(RLE *rle);
(Note: many internal programs do in fact use sized array structs, but this is by far the most common for user-facing libraries because it's automatically compatible with the most basic way of representing arrays in C).
In Rust, this would translate to:
extern "C" fn rle_values_size(rle: *mut RLE) -> int32_t
extern "C" fn rle_values(rle: *mut RLE) -> *mut int32_t
The size function is straightforward. To return the array, simply do
extern "C" fn rle_values(rle: *mut RLE) -> *mut int32_t {
    // as_mut_ptr also works for an empty Vec, where indexing [0] would panic
    unsafe { (&mut *rle).values.as_mut_ptr() }
}
This gives a raw pointer to the first element of the Vec's underlying buffer, which is all C-style arrays really are.
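The two-function API can be sketched end to end as follows. This version uses plain i32 and usize instead of the libc aliases, omits the export attribute so that it compiles as an ordinary program, and drives the functions from Rust the way a C caller would; those choices are assumptions made for illustration.

```rust
pub struct Rle {
    pub values: Vec<i32>,
}

// To export these to C, an attribute such as #[no_mangle] would be added.
pub extern "C" fn rle_values_size(rle: *const Rle) -> usize {
    assert!(!rle.is_null());
    // SAFETY: the caller promises `rle` points to a live Rle.
    let rle = unsafe { &*rle };
    rle.values.len()
}

pub extern "C" fn rle_values(rle: *mut Rle) -> *mut i32 {
    assert!(!rle.is_null());
    // SAFETY: the caller promises `rle` points to a live Rle.
    let rle = unsafe { &mut *rle };
    rle.values.as_mut_ptr()
}

fn main() {
    let mut rle = Rle { values: vec![1, 2, 3] };
    let len = rle_values_size(&rle);
    let ptr = rle_values(&mut rle);
    // What a C caller would do: walk `len` elements starting at `ptr`.
    let seen: Vec<i32> = (0..len).map(|i| unsafe { *ptr.add(i) }).collect();
    assert_eq!(seen, [1, 2, 3]);
}
```

The pointer stays valid only while the Rle is alive and its Vec is not reallocated, which is the discipline the caller has to follow.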
If, instead of giving C a reference to your data you want to give C the data, the most common option would be to allow the user to pass in a buffer that you clone the data into:
extern "C" fn rle_values_buf(rle: *mut RLE, buf: *mut int32_t, len: int32_t) {
    use std::ptr;
    unsafe {
        let values = &(*rle).values;
        // Make sure we don't overrun our buffer's length (or read past the Vec)
        let count = (len.max(0) as usize).min(values.len());
        ptr::copy_nonoverlapping(values.as_ptr(), buf, count);
    }
}
Which, from C, looks like
void rle_values_buf(RLE *rle, int32_t *buf, int32_t len);
This (shallowly) copies your data into the presumably C-allocated buffer, which the C user is then responsible for destroying. It also prevents multiple mutable copies of your array from floating around at the same time (assuming you don't implement the version that returns a pointer).
Note that you could sort of "move" the array into C as well, but it's not particularly recommended: it involves the use of mem::forget and expecting the C user to explicitly call a destruction function, and it requires both you and the user to obey some discipline that may be difficult to structure the program around.
If you want to receive an array from C, you essentially just ask for both a *mut i32 and i32 corresponding to the buffer start and length. You can assemble this into a slice using the from_raw_parts function, and then use the to_vec function to create an owned Vector containing the values allocated from the Rust side. If you don't plan on needing to own the values, you can simply pass around the slice you produced via from_raw_parts.
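Receiving a buffer from C might look like the sketch below. The function name sum_values is hypothetical, and usize is used for the length instead of i32 for simplicity.

```rust
use std::slice;

/// Hypothetical function; a C header could declare it as
/// `int64_t sum_values(const int32_t *data, size_t len);`
pub extern "C" fn sum_values(data: *const i32, len: usize) -> i64 {
    if data.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller promises `data` points to `len` initialized values.
    let borrowed: &[i32] = unsafe { slice::from_raw_parts(data, len) };
    // to_vec copies into Rust-owned memory; skip it if borrowing is enough.
    let owned: Vec<i32> = borrowed.to_vec();
    owned.iter().map(|&v| i64::from(v)).sum()
}

fn main() {
    let c_side = [1, 2, 3, 4];
    assert_eq!(sum_values(c_side.as_ptr(), c_side.len()), 10);
    assert_eq!(sum_values(std::ptr::null(), 0), 0);
}
```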
However, it is imperative that all values be initialized from either side, typically to zero. Otherwise you invoke legitimately undefined behavior which often results in segmentation faults (which tend to frustratingly disappear when inspected with GDB).
There are multiple ways to pass an array to C.
First of all, while C has the concept of fixed-size arrays (int a[5] has type int[5] and sizeof(a) will return 5 * sizeof(int)), it is not possible to directly pass an array to a function or return an array from it.
On the other hand, it is possible to wrap a fixed size array in a struct and return that struct.
Furthermore, when using an array, all elements must be initialized, otherwise a memcpy technically has undefined behavior (as it is reading from undefined values) and valgrind will definitely report the issue.
Using a dynamic array
A dynamic array is an array whose length is unknown at compile-time.
One may choose to return a dynamic array if no reasonable upper bound is known, or if this bound is deemed too large for passing by value.
There are two ways to handle this situation:
ask C to pass a suitably sized buffer
allocate a buffer and return it to C
They differ in who allocates the memory: the former is simpler, but may require either a way to hint at a suitable size or the ability to "rewind" if the size proves unsuitable.
Ask C to pass a suitably sized buffer
// file.h
int rust_func(int32_t* buffer, size_t buffer_length);
// file.rs
#[no_mangle]
pub extern fn rust_func(buffer: *mut libc::int32_t, buffer_length: libc::size_t) -> libc::c_int {
// your code here
}
Note the existence of std::slice::from_raw_parts_mut to transform this pointer + length into a mutable slice (do initialize it with 0s before making it a slice or ask the client to).
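A possible body for such a function is sketched below. The name fill_values and the VALUES constant are hypothetical; returning the required element count is one assumed way of hinting at a suitable size, so that the caller can retry with a larger buffer.

```rust
use std::{ptr, slice};

// Hypothetical data the library wants to hand out.
const VALUES: [i32; 3] = [10, 20, 30];

/// Copies as many values as fit and returns how many are available in total.
pub extern "C" fn fill_values(buffer: *mut i32, buffer_length: usize) -> usize {
    if !buffer.is_null() && buffer_length > 0 {
        // Zero the memory first so the slice never covers uninitialized data.
        unsafe { ptr::write_bytes(buffer, 0, buffer_length) };
        // SAFETY: the caller promises `buffer` holds `buffer_length` elements.
        let out = unsafe { slice::from_raw_parts_mut(buffer, buffer_length) };
        let n = out.len().min(VALUES.len());
        out[..n].copy_from_slice(&VALUES[..n]);
    }
    VALUES.len()
}

fn main() {
    let mut small = [0i32; 2];
    let needed = fill_values(small.as_mut_ptr(), small.len());
    assert_eq!(needed, 3);
    assert_eq!(small, [10, 20]);
}
```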
Allocate a buffer and return it to C
// file.h
struct DynArray {
    int32_t* array;
    size_t length;
};
struct DynArray rust_alloc();
void rust_free(struct DynArray);
// file.rs
#[repr(C)]
struct DynArray {
array: *mut libc::int32_t,
length: libc::size_t,
}
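#[no_mangle]
pub extern fn rust_alloc() -> DynArray {
    let v: Vec<i32> = vec!(...);
    // A boxed slice has capacity == length, so pointer + length
    // is enough to rebuild and free it later.
    let mut boxed = v.into_boxed_slice();
    let result = DynArray {
        array: boxed.as_mut_ptr(),
        length: boxed.len() as _,
    };
    std::mem::forget(boxed);
    result
}
#[no_mangle]
pub extern fn rust_free(array: DynArray) {
    if !array.array.is_null() {
        // Rebuild the boxed slice so that the whole buffer is freed,
        // not just a single element.
        unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(array.array, array.length))); }
    }
}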
Using a fixed-size array
Similarly, a struct containing a fixed size array can be used. Note that both in Rust and C all elements should be initialized, even if unused; zeroing them works well.
Similarly to the dynamic case, it can be either passed by mutable pointer or returned by value.
// file.h
struct FixedArray {
int32_t array[32];
};
// file.rs
#[repr(C)]
struct FixedArray {
array: [libc::int32_t; 32],
}
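Returning such a struct by value could look like the following sketch. The function name rust_fixed and the values written are hypothetical, and plain i32 replaces the libc alias.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FixedArray {
    pub array: [i32; 32],
}

/// Hypothetical function; C would see `struct FixedArray rust_fixed(void);`
pub extern "C" fn rust_fixed() -> FixedArray {
    // Every element is initialized to zero, including the unused ones.
    let mut result = FixedArray { array: [0; 32] };
    for (i, slot) in result.array.iter_mut().take(3).enumerate() {
        *slot = (i as i32 + 1) * 10;
    }
    result
}

fn main() {
    let fixed = rust_fixed();
    assert_eq!(&fixed.array[..4], &[10, 20, 30, 0]);
}
```

In practice a count of valid elements would usually travel alongside the array, since the fixed length says nothing about how many slots are in use.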

Is it possible to return part of a struct by reference?

Consider the following two structs:
pub struct BitVector<S: BitStorage> {
data: Vec<S>,
capacity: usize,
storage_size: usize
}
pub struct BitSlice<'a, S: BitStorage> {
data: &'a [S],
storage_size: usize
}
Where BitStorage is practically a type that is restricted to all unsigned integers (u8, u16, u32, u64, usize).
How to implement the Deref trait? (BitVector<S> derefs to BitSlice<S> similar to how Vec<S> derefs to &[S])
I have tried the following (Note that it doesn't compile due to issues with lifetimes, but more importantly because I try to return a value on the stack as a reference):
impl<'b, S: BitStorage> Deref for BitVector<S> {
type Target = BitSlice<'b, S>;
fn deref<'a>(&'a self) -> &'a BitSlice<'b, S> {
let slice = BitSlice {
data: self.data,
storage_size: self.storage_size,
};
&slice
}
}
I am aware that it is possible to return a field of a struct by reference, so for example I could return &Vec<S> or &usize in the Deref trait, but is it possible to return a BitSlice noting that I essentially have all the data in the BitVector already as Vec<S> can be transformed into &[S] and storage_size is already there?
I would think this is possible if I could create a struct using both values and somehow tell the compiler to ignore the fact that it is a struct that is created on the stack and instead just use the existing values, but I have got no clue how.
Deref is required to return a reference. A reference always points to some existing memory, and any local variable will not exist long enough. While there are, in theory, some sick tricks you could play to create a new object in deref and return a reference to it, all that I'm aware of result in a memory leak. Let's ignore these technicalities and just say it's plain impossible.
Now what? You'll have to change your API. Vec can implement Deref because it derefs to [T], not to &[T] or anything like that. You may have success with the same strategy: make BitSlice<S> an unsized type containing only a slice [S], so that the return type is &'a BitSlice<S>. This assumes the storage_size member is not needed. But it seems that this refers to the number of bits that are logically valid (i.e., can be accessed without extending the bit vector); if so, that seems unavoidable [1].
The other alternative, of course, is to not implement Deref. Inconvenient, but if your slice data type is too far from an actual slice, it may be the only option.
If RFC PR #1524, which proposed custom dynamically-sized types, were accepted, you could have a type BitSlice<S> that is like a slice but can have additional contents such as storage_size. However, this doesn't exist yet and it's far from certain if it ever will.
[1] The capacity member on BitVector, however, seems pointless. Isn't that just sizeof S * 8?
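The unsized-type strategy described in this answer can be sketched as follows, under the stated assumption that storage_size is dropped. The BitStorage bound is omitted for brevity, and the helper names from_slice and len_bits are hypothetical. The pointer cast relies on #[repr(transparent)], the same pattern the standard library uses for wrappers around unsized types.

```rust
use std::mem::size_of;
use std::ops::Deref;

/// Unsized wrapper around a slice of storage words.
#[repr(transparent)]
pub struct BitSlice<S> {
    data: [S],
}

impl<S> BitSlice<S> {
    fn from_slice(slice: &[S]) -> &BitSlice<S> {
        // SAFETY: repr(transparent) guarantees BitSlice<S> has the same
        // layout as [S], so the reference can be reinterpreted.
        unsafe { &*(slice as *const [S] as *const BitSlice<S>) }
    }

    /// Number of bits held by the underlying storage.
    pub fn len_bits(&self) -> usize {
        self.data.len() * size_of::<S>() * 8
    }
}

pub struct BitVector<S> {
    data: Vec<S>,
}

impl<S> Deref for BitVector<S> {
    type Target = BitSlice<S>;

    fn deref(&self) -> &BitSlice<S> {
        // No new value is created; the existing buffer is merely re-typed.
        BitSlice::from_slice(&self.data)
    }
}

fn main() {
    let bits = BitVector { data: vec![0u8, 0xFF] };
    // len_bits is reached through Deref.
    assert_eq!(bits.len_bits(), 16);
}
```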
