Translating C++'s union struct bitfield to Rust [duplicate]

I have a C struct defined as:
struct my_c_s {
u_char *ptr;
unsigned flag_a:1;
unsigned flag_b:1;
int some_num;
}
How would flag_a and flag_b be represented?
#[repr(C)]
pub struct my_rust_s {
pub ptr: *const u_char,
//pub flag_a: ?,
//pub flag_b: ?,
pub some_num: ::libc::c_int,
}
Can I declare them as bools? Or does that whole thing need to be some sort of set of bits with a single field, and then I bitmask them out?
e.g. pub flag_bits: ::libc::c_uint,

No, you can't.
There is an open issue about supporting bitfields, which doesn't seem to be active. In the issue, @retep998 explains how bitfields are handled in winapi. That might be helpful if you need to handle bitfields in a C interface.
The OP seems to be aiming at C interoperation, but if you just need bitfield functionality, there are several solutions:
Always consider the simple, redundant solution first: avoid bitfields and let the fields align naturally.
bitfield, suggested in a comment. I didn't know about it, but it seems to provide an equivalent of C bitfields.
bitflags. This seems suitable for bit-based flags, which are typically represented as an enum in C.
#[repr(packed)], if you just want to pack fields to some degree, ignoring alignment. The fields will still be aligned to a byte boundary.
bit-vec, if you need homogeneous arrays of bits.
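For the C interop case in the question, the "single field plus bitmask" idea can be sketched roughly as follows. This is only a sketch: C leaves bitfield layout implementation-defined, so the assumption that flag_a lands in bit 0 and flag_b in bit 1 of one unsigned storage unit must be checked against the actual compiler and target. The names MyRustS, FLAG_A, FLAG_B and the accessor methods are made up for illustration.

```rust
use std::os::raw::{c_int, c_uchar, c_uint};

// Hypothetical mirror of `struct my_c_s`.
// ASSUMPTION: the C compiler stores flag_a in bit 0 and flag_b in bit 1
// of a single `unsigned` storage unit placed between `ptr` and `some_num`.
#[repr(C)]
pub struct MyRustS {
    pub ptr: *const c_uchar,
    pub flag_bits: c_uint,
    pub some_num: c_int,
}

const FLAG_A: c_uint = 1 << 0;
const FLAG_B: c_uint = 1 << 1;

impl MyRustS {
    pub fn flag_a(&self) -> bool {
        self.flag_bits & FLAG_A != 0
    }

    pub fn flag_b(&self) -> bool {
        self.flag_bits & FLAG_B != 0
    }

    pub fn set_flag_a(&mut self, on: bool) {
        if on {
            self.flag_bits |= FLAG_A;
        } else {
            self.flag_bits &= !FLAG_A;
        }
    }

    pub fn set_flag_b(&mut self, on: bool) {
        if on {
            self.flag_bits |= FLAG_B;
        } else {
            self.flag_bits &= !FLAG_B;
        }
    }
}

fn main() {
    let mut s = MyRustS {
        ptr: std::ptr::null(),
        flag_bits: 0,
        some_num: 7,
    };
    s.set_flag_b(true);
    println!("a={} b={} bits={:#b}", s.flag_a(), s.flag_b(), s.flag_bits);
}
```

Keeping the raw bits in one public field and exposing the flags through methods means the struct size and field offsets stay under the control of #[repr(C)].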

Related

How to align a packed struct with no padding between fields in Rust?

I'm working with an ABI where I need exact control over the data layout of the payload on both ends. Furthermore, there should be no padding between fields at all, ever. Additionally, the beginning of the payload should be page-aligned.
#[repr(C)] helps a lot. The modifiers #[repr(packed(N))] and #[repr(align(N))] are compatible with repr(C) but they can't be used together. I can't achieve what I want with #[repr(C, packed(4096))].
How to solve this?
The packed(N) type layout modifier does not guarantee that there will never be any padding. That is only the case for packed / packed(1), because packed(N) can only lower the alignment of each field to min(N, default alignment). packed(N) does not mean that the struct is "packed" (i.e. no padding at all between fields), nor that the alignment of the struct is 4096 bytes.
If you want a page-aligned struct with no padding at all, you want to do the following:
#[repr(align(4096))]
struct Aligned4096<T>(T);
// plus impl convenient methods
#[repr(C, packed)]
struct Foo {
a: u8,
b: u64,
c: u16,
d: u8,
}
// plus impl convenient methods
fn main() {
let aligned_foo = Aligned4096(Foo::new());
}
A more detailed view of how different N in packed(N) change the type layout is shown in this table on GitHub. More information about the type layout modifiers in general is provided in the official language documentation.
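To check the layout claims, here is a self-contained variant of the snippet above. Foo::new is a placeholder constructor invented for this sketch (the answer only says "plus impl convenient methods"), and the field values are arbitrary.

```rust
use std::mem::{align_of, size_of};

// Wrapper that raises the alignment of whatever it contains to 4096 bytes.
#[repr(align(4096))]
struct Aligned4096<T>(T);

// Packed payload: no padding between fields, alignment 1.
#[repr(C, packed)]
#[derive(Clone, Copy)]
struct Foo {
    a: u8,
    b: u64,
    c: u16,
    d: u8,
}

impl Foo {
    // Hypothetical constructor, not part of the original answer.
    fn new() -> Self {
        Foo { a: 1, b: 2, c: 3, d: 4 }
    }
}

fn main() {
    let aligned_foo = Aligned4096(Foo::new());

    // 1 + 8 + 2 + 1 bytes, with no padding in between.
    assert_eq!(size_of::<Foo>(), 12);
    assert_eq!(align_of::<Foo>(), 1);
    // The wrapper, not the payload, carries the page alignment.
    assert_eq!(align_of::<Aligned4096<Foo>>(), 4096);

    let addr = &aligned_foo as *const Aligned4096<Foo> as usize;
    assert_eq!(addr % 4096, 0);

    // Copy packed fields out by value; taking a reference to them is rejected.
    let b = aligned_foo.0.b;
    println!("b = {b}, address = {addr:#x}");
}
```

Note that the wrapper's size is rounded up to a multiple of its alignment, so the padding ends up after the payload rather than between its fields.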

What does BigUint::from(24u32) do?

BigUint::from(24u32)
I understand that it is related to a big unsigned integer since it is part of a cryptography module to factor big numbers. I don't understand what the part from(24u32) does.
For example, if it is used in this context
let b: BigUint = (BigUint::from(24u32) * &n).sqrt() + &one;
Conversions between types in Rust are defined by the From trait, which defines a function from, so that Foo::from(bar) converts the value bar to the type Foo. In your case, you are therefore converting 24u32 to type BigUint, but what's this 24u32?
Integer literals in Rust can use a suffix to specify their actual type. If you wrote just 24, it could be for example an i32, a u8 or a u32. Most of the time the compiler is able to infer the actual type from the way the value is used, and when it can't it defaults to i32. But in your case that won't work: there is no From<i32> implementation for BigUint, but there are conversion functions for all the regular unsigned types: u8, u16, u32, u64 and u128, so the compiler doesn't know which one to use. Adding the u32 suffix clarifies the type, so 24u32 is the value 24 with the type u32, allowing the compiler to understand that you want the From<u32> implementation, i.e. <BigUint as From<u32>>::from(24).
The BigUint struct implements the From<u32> trait, which means that it will implement a from(u32) function.
Implementations of the From<_> trait are used to perform a value conversion that will consume the original input. In your example the BigUint struct is constructed from the 24u32 number.
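The same mechanism can be tried without any external crate. The sketch below uses a made-up Wide type as a stand-in for BigUint (it is not how num-bigint is implemented) and gives it two From implementations, so the literal suffix is what selects the conversion.

```rust
// Hypothetical stand-in for a big-integer type; NOT the real BigUint.
#[derive(Debug, PartialEq)]
struct Wide(u128);

impl From<u8> for Wide {
    fn from(v: u8) -> Self {
        Wide(u128::from(v))
    }
}

impl From<u32> for Wide {
    fn from(v: u32) -> Self {
        Wide(u128::from(v))
    }
}

fn main() {
    // The `u32` suffix picks the `From<u32>` implementation.
    let a = Wide::from(24u32);
    // `into()` is the mirror image of `from`; here the `u8` suffix picks `From<u8>`.
    let b: Wide = 24u8.into();
    assert_eq!(a, b);
    println!("{:?}", a);
}
```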

What are the differences between the multiple ways to create zero-sized structs?

I found four different ways to create a struct with no data:
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
struct C(()); // unit-valued tuple struct
struct D; // unit struct
(I'm leaving arbitrarily nested tuples that contain only ()s and single-variant enum declarations out of the question, as I understand why those shouldn't be used).
What are the differences between these four declarations? Would I use them for specific purposes, or are they interchangeable?
The book and the reference were surprisingly unhelpful. I did find this accepted RFC (clarified_adt_kinds) which goes into the differences a bit, namely that the unit struct also declares a constant value D and that the tuple structs also declare constructors B() and C(_: ()). However it doesn't offer a design guideline on why to use which.
My guess would be that when I export them with pub, there are differences in which kinds can actually be constructed outside of my module, but I found no conclusive documentation about that.
There are only two functional differences between these four definitions (and a fifth possibility I'll mention in a minute):
Syntax (the most obvious). mcarton's answer goes into more detail.
When the struct is marked pub, whether its constructor (also called struct literal syntax) is usable outside the module it's defined in.
The only one of your examples that is not directly constructible from outside the current module is C. If you try to do this, you will get an error:
mod stuff {
pub struct C(());
}
let _c = stuff::C(()); // error[E0603]: tuple struct `C` is private
This happens because the field is not marked pub; if you declare C as pub struct C(pub ()), the error goes away.
There's another possibility you didn't mention that gives a marginally more descriptive error message: a normal struct, with a zero-sized non-pub member.
mod stuff {
pub struct E {
_dummy: (),
}
}
let _e = stuff::E { _dummy: () }; // error[E0451]: field `_dummy` of struct `main::stuff::E` is private
(Again, you can make the _dummy field available outside of the module by declaring it with pub.)
Since E's constructor is only usable inside the stuff module, stuff has exclusive control over when and how values of E are created. Many structs in the standard library take advantage of this, like Box (to take an obvious example). Zero-sized types work in exactly the same way; in fact, from outside the module it's defined in, the only way you would know that an opaque type is zero-sized is by calling mem::size_of.
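As a rough illustration of that exclusive control, the module can expose its own constructor function. E::new is a name chosen for this sketch; any public function inside stuff would do.

```rust
mod stuff {
    pub struct E {
        _dummy: (),
    }

    impl E {
        // Only code inside `stuff` can name the private field,
        // so this is the only way to obtain an `E` from outside.
        pub fn new() -> E {
            E { _dummy: () }
        }
    }
}

fn main() {
    let _e = stuff::E::new();
    // The private unit field does not add any size.
    assert_eq!(std::mem::size_of::<stuff::E>(), 0);
    println!("size of E: {}", std::mem::size_of::<stuff::E>());
}
```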
See also
What is an idiomatic way to create a zero-sized struct that can't be instantiated outside its crate?
Why define a struct with single private field of unit type?
struct D; // unit struct
This is the usual way for people to write a zero-sized struct.
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
These are just special cases of the basic struct and the tuple struct which happen to have no fields. RFC 1506 explains the rationale for allowing them (they used not to be allowed):
Permit tuple structs and tuple variants with 0 fields. This restriction is artificial and can be lifted trivially. Macro writers dealing with tuple structs/variants will be happy to get rid of this one special case.
As such, they could easily be generated by macros, but people will rarely write those on their own.
struct C(()); // unit-valued tuple struct
This is another special case of tuple struct. In Rust, () is a type just like any other type, so struct C(()); isn't much different from struct E(u32);. While the type itself isn't very useful, forbidding it would make yet another special case that would need to be handled in macros or generics (struct F<T>(T) can of course be instantiated as F<()>).
Note that there are many other ways to have empty types in Rust. E.g., a function can return Result<(), !> to indicate that it doesn't produce a value and cannot fail (on stable Rust, where ! cannot yet be written in that position, std::convert::Infallible plays the same role). While you might think that returning () in that case would be better, you might have to do that if you implement a trait that dictates that you return Result<T, E> but lets you choose T = () and E = !.
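For reference, here is a small sketch that constructs each of the four declarations from the question and checks that they really are all zero-sized. It adds nothing beyond what is described above.

```rust
use std::mem::size_of;

struct A {} // empty braced struct
struct B(); // empty tuple struct
struct C(()); // unit-valued tuple struct
struct D; // unit struct

fn main() {
    // Each kind is built with the syntax matching its declaration.
    let _a = A {};
    let _b = B();
    let _c = C(());
    let _d = D;
    // Brace syntax is also accepted for a unit struct.
    let _d2 = D {};

    assert_eq!(size_of::<A>(), 0);
    assert_eq!(size_of::<B>(), 0);
    assert_eq!(size_of::<C>(), 0);
    assert_eq!(size_of::<D>(), 0);
    println!("all four are zero-sized");
}
```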

Is there any way to restrict a generic type to one of several types?

I'm trying to create a generic struct which uses an "integer type" for references into an array. For performance reasons I'd like to be able to specify easily whether to use u16, u32 or u64. Something like this (which obviously isn't valid Rust code):
struct Foo<T: u16 or u32 or u64> { ... }
Is there any way to express this?
For references into an array usually you'd just use a usize rather than different integer types.
However, to do what you are after you can create a new trait, implement that trait for u16, u32 and u64 and then restrict T to your new trait.
pub trait MyNewTrait {}
impl MyNewTrait for u16 {}
impl MyNewTrait for u32 {}
impl MyNewTrait for u64 {}
struct Foo<T: MyNewTrait> { ... }
You may then also add methods onto MyNewTrait and the impls to encapsulate the logic specific to u16, u32 and u64.
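A minimal sketch of what such methods could look like: the trait is renamed IndexInt here, and to_usize, the refs field and resolve are names invented for this example. The `as usize` casts assume a target where usize is wide enough for the values actually stored.

```rust
// Hypothetical trait bundling the integer types allowed as indices.
pub trait IndexInt: Copy {
    fn to_usize(self) -> usize;
}

impl IndexInt for u16 {
    fn to_usize(self) -> usize {
        usize::from(self)
    }
}

impl IndexInt for u32 {
    fn to_usize(self) -> usize {
        // ASSUMPTION: usize is at least 32 bits on the target.
        self as usize
    }
}

impl IndexInt for u64 {
    fn to_usize(self) -> usize {
        // ASSUMPTION: stored values fit in usize (truncates otherwise).
        self as usize
    }
}

struct Foo<T: IndexInt> {
    refs: Vec<T>,
}

impl<T: IndexInt> Foo<T> {
    // Look up every stored index in `data`; panics if an index is out of bounds.
    fn resolve<'a, V>(&self, data: &'a [V]) -> Vec<&'a V> {
        self.refs.iter().map(|&i| &data[i.to_usize()]).collect()
    }
}

fn main() {
    let data = ["a", "b", "c"];
    let foo = Foo { refs: vec![2u16, 0] };
    println!("{:?}", foo.resolve(&data));
}
```

Because only u16, u32 and u64 implement the trait, Foo<i8> or Foo<String> is rejected at compile time.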
Sometimes you may want to use an enum rather than a generic type with a trait bound. For example:
enum Unsigned {
U16(u16),
U32(u32),
U64(u64),
}
struct Foo { x: Unsigned, ... }
One advantage of making a new type over implementing a new trait for existing types is that you can add foreign traits and inherent behavior to the new type. You can implement any traits you like for Unsigned, like Add, Mul, etc. When Foo contains an Unsigned, implementing traits on Unsigned doesn't affect the signature of Foo like it would to add them as bounds on Foo's parameter (e.g. Foo<T: Add<Output = T> + PartialOrd + ...>). On the other hand, you do still have to implement each trait.
Another thing to note: while you can generally always make a new type and implement a trait for it, an enum is "closed": you can't add new types to Unsigned without touching the rest of its implementation, like you could if you used a trait. This may be a good thing or a bad thing depending on what your design calls for.
"Performance reasons" is a bit ambiguous, but if you're thinking of storing a lot of Unsigneds that will all be the same internal type, and this:
struct Foo([Unsigned; 1_000_000]);
would waste a ton of space over storing a million u16s, you can still make Foo generic! Just implement From<u16>, From<u32>, and From<u64> for Unsigned and write this instead:
struct Foo<T: Into<Unsigned>>([T; 1_000_000]);
Now you only have one simple trait bound on T, you're not wasting space for tags and padding, and functions that deal with T can always convert it to Unsigned to do calculations with. The cost of the conversion may even be optimized away entirely.
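Put together, that could look roughly like the sketch below. The array is shortened to 4 elements to keep the example small, and widen and sum are helper names made up here to have something to compute.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Unsigned {
    U16(u16),
    U32(u32),
    U64(u64),
}

impl From<u16> for Unsigned {
    fn from(v: u16) -> Self {
        Unsigned::U16(v)
    }
}

impl From<u32> for Unsigned {
    fn from(v: u32) -> Self {
        Unsigned::U32(v)
    }
}

impl From<u64> for Unsigned {
    fn from(v: u64) -> Self {
        Unsigned::U64(v)
    }
}

impl Unsigned {
    // Hypothetical helper: widen any variant to u64 for calculations.
    fn widen(self) -> u64 {
        match self {
            Unsigned::U16(v) => u64::from(v),
            Unsigned::U32(v) => u64::from(v),
            Unsigned::U64(v) => v,
        }
    }
}

// Storage uses the compact T; Unsigned only appears during calculations.
struct Foo<T: Into<Unsigned>>([T; 4]);

impl<T: Into<Unsigned> + Copy> Foo<T> {
    fn sum(&self) -> u64 {
        self.0
            .iter()
            .map(|&x| {
                let u: Unsigned = x.into();
                u.widen()
            })
            .sum()
    }
}

fn main() {
    let foo = Foo([1u16, 2, 3, 4]);
    // The generic version stores bare u16s, with no tag or padding per element.
    assert!(size_of::<Foo<u16>>() < size_of::<[Unsigned; 4]>());
    println!("sum = {}", foo.sum());
}
```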
See Also
Should I use enum to emulate the polymorphism or use trait with Box<trait> instead?

How does a repr(C) type handle Option?

I have this C code:
typedef void (*f_t)(int a);
struct Foo {
f_t f;
};
extern void f(struct Foo *);
bindgen generates the following Rust code (I have removed unimportant details):
#[repr(C)]
#[derive(Copy, Clone)]
#[derive(Debug)]
pub struct Foo {
pub f: ::std::option::Option<extern "C" fn(a: ::std::os::raw::c_int)>,
}
I do not understand why Option is here. Obviously that Rust enum and C pointer are not the same thing on the bit level, so how does the Rust compiler handle this?
When I call the C f function and pass a pointer to a Rust struct Foo, does the compiler convert Foo_rust to Foo_C and then only pass a pointer to Foo_C to f?
From The Rust Programming Language chapter on FFI (emphasis mine):
Certain types are defined to not be null. This includes references (&T, &mut T), boxes (Box<T>), and function pointers (extern "abi" fn()). When interfacing with C, pointers that might be null are often used. As a special case, a generic enum that contains exactly two variants, one of which contains no data and the other containing a single field, is eligible for the "nullable pointer optimization". When such an enum is instantiated with one of the non-nullable types, it is represented as a single pointer, and the non-data variant is represented as the null pointer. So Option<extern "C" fn(c_int) -> c_int> is how one represents a nullable function pointer using the C ABI.
Said another way:
Obviously that Rust enum and C pointer are not the same thing on the bit level
They actually are, when the Option contains a specific set of types.
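This is easy to check from Rust itself. In the sketch below, Callback, print_it and call_if_set are illustrative names, and call_if_set is a pure-Rust stand-in for the C function f so that the example runs without linking any C code.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

type Callback = extern "C" fn(c_int);

#[repr(C)]
pub struct Foo {
    pub f: Option<Callback>,
}

extern "C" fn print_it(a: c_int) {
    println!("called with {a}");
}

// Hypothetical stand-in for the C side: call the pointer unless it is null.
fn call_if_set(foo: &Foo, a: c_int) -> bool {
    match foo.f {
        Some(cb) => {
            cb(a);
            true
        }
        None => false,
    }
}

fn main() {
    // No extra tag: the Option is exactly as large as the bare function pointer.
    assert_eq!(size_of::<Option<Callback>>(), size_of::<Callback>());

    let set = Foo { f: Some(print_it as Callback) };
    let unset = Foo { f: None };
    assert!(call_if_set(&set, 3));
    assert!(!call_if_set(&unset, 3));
}
```

So no conversion between a "Rust Foo" and a "C Foo" takes place: a pointer to the Rust struct can be handed to C as is, and None is what C sees as NULL.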
See also:
Can I use the "null pointer optimization" for my own non-pointer types?
What is the overhead of Rust's Option type?
How to check if function pointer passed from C is non-NULL
