What are the differences between the multiple ways to create zero-sized structs? - struct

I found four different ways to create a struct with no data:
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
struct C(()); // unit-valued tuple struct
struct D; // unit struct
(I'm leaving arbitrarily nested tuples that contain only ()s and single-variant enum declarations out of the question, as I understand why those shouldn't be used).
What are the differences between these four declarations? Would I use them for specific purposes, or are they interchangeable?
The book and the reference were surprisingly unhelpful. I did find this accepted RFC (clarified_adt_kinds) which goes into the differences a bit, namely that the unit struct also declares a constant value D and that the tuple structs also declare constructors B() and C(_: ()). However it doesn't offer a design guideline on why to use which.
My guess would be that when I export them with pub, there are differences in which kinds can actually be constructed outside of my module, but I found no conclusive documentation about that.

There are only two functional differences between these four definitions (and a fifth possibility I'll mention in a minute):
Syntax (the most obvious). mcarton's answer goes into more detail.
When the struct is marked pub, whether its constructor (also called struct literal syntax) is usable outside the module it's defined in.
The only one of your examples that is not directly constructible from outside the current module is C. If you try to do this, you will get an error:
mod stuff {
pub struct C(());
}
let _c = stuff::C(()); // error[E0603]: tuple struct `C` is private
This happens because the field is not marked pub; if you declare C as pub struct C(pub ()), the error goes away.
There's another possibility you didn't mention that gives a marginally more descriptive error message: a normal struct, with a zero-sized non-pub member.
mod stuff {
pub struct E {
_dummy: (),
}
}
let _e = stuff::E { _dummy: () }; // error[E0451]: field `_dummy` of struct `main::stuff::E` is private
(Again, you can make the _dummy field available outside of the module by declaring it with pub.)
Since E's constructor is only usable inside the stuff module, stuff has exclusive control over when and how values of E are created. Many structs in the standard library take advantage of this, like Box (to take an obvious example). Zero-sized types work in exactly the same way; in fact, from outside the module it's defined in, the only way you would know that an opaque type is zero-sized is by calling mem::size_of.
See also
What is an idiomatic way to create a zero-sized struct that can't be instantiated outside its crate?
Why define a struct with single private field of unit type?

struct D; // unit struct
This is the usual way for people to write a zero-sized struct.
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
These are just special cases of basic struct and tuple struct which happen to have no parameters. RFC 1506 explains the rational to allow those (they didn't used to):
Permit tuple structs and tuple variants with 0 fields. This restriction is artificial and can be lifted trivially. Macro writers dealing with tuple structs/variants will be happy to get rid of this one special case.
As such, they could easily be generated by macros, but people will rarely write those on their own.
struct C(()); // unit-valued tuple struct
This is another special case of tuple struct. In Rust, () is a type just like any other type, so struct C(()); isn't much different from struct E(u32);. While the type itself isn't very useful, forbidding it would make yet another special case that would need to be handled in macros or generics (struct F<T>(T) can of course be instantiated as F<()>).
Note that there are many other ways to have empty types in Rust. Eg. it is possible to have a function return Result<(), !> to indicate that it doesn't produce a value, and cannot fail. While you might think that returning () in that case would be better, you might have to do that if you implement a trait that dictates you to return Result<T, E> but lets you choose T = () and E = !.

Related

rust, how to run entities' origina method when access it from wrapped Box, [duplicate]

How can I downcast a trait to a struct like in this C# example?
I have a base trait and several derived structs that must be pushed into a single vector of base traits.
I have to check if the each item of the vector is castable to a specific derived struct and, if yes, use it as a struct of that type.
This is my Rust code, I don't know how to implement the commented part.
trait Base {
fn base_method(&self);
}
struct Derived1;
impl Derived1 {
pub fn derived1_method(&self) {
println!("Derived1");
}
}
impl Base for Derived1 {
fn base_method(&self) {
println!("Base Derived1");
}
}
struct Derived2;
impl Derived2 {
pub fn derived2_method(&self) {
println!("Derived2");
}
}
impl Base for Derived2 {
fn base_method(&self) {
println!("Base Derived2");
}
}
fn main() {
let mut bases = Vec::<Box<dyn Base>>::new();
let der1 = Derived1{};
let der2 = Derived2{};
bases.push(Box::new(der1));
bases.push(Box::new(der2));
for b in bases {
b.base_method();
//if (b is Derived1)
// (b as Derived1).derived1_method();
//else if (b is Derived2)
// (b as Derived2).derived2_method();
}
}
Technically you can use as_any, as explained in this answer:
How to get a reference to a concrete type from a trait object?
However, type-checking and downcasting when looping over a vector of trait objects is considered a code smell. If you put a bunch of objects into a vector and then loop over that vector, presumably the objects in that vector are supposed to play a similar role.
So then you should refactor your code such that you can call the same method on your object regardless of the underlying concrete type.
From your code, it seems you're purely checking the type (and downcasting) so that you can call the appropriate method. What you really should do, then, is introduce yet another trait that provides a unified interface that you then can call from your loop, so that the loop doesn't need to know the concrete type at all.
EDIT: Allow me to add a concrete example that highlights this, but I'm going to use Python to show this, because in Python it's very easy to do what you are asking to do, so we can then focus on why it's not the best design choice:
class Dog:
def bark():
print("Woof woof")
class Cat:
def meow():
print("Meow meow")
list_of_animals = [Dog(), Dog(), Cat(), Dog()]
for animal in list_of_animals:
if isinstance(animal, Dog):
animal.bark()
elif isinstance(animal, Cat):
animal.meow()
Here Python's dynamic typing allows us to just slap all the objects into the same list, iterate over it, and then figure out the type at runtime so we can call the right method.
But really the whole point of well-designed object oriented code is to lift that burden from the caller to the object. This type of design is very inflexible, because as soon as you add another animal, you'll have to add another branch to your if block, and you better do that everywhere you had that branching.
The solution is of course to identify the common role that both bark and meow play, and abstract that behavior into an interface. In Python of course we don't need to formally declare such an interface, so we can just slap another method in:
class Dog:
...
def make_sound():
self.bark()
class Cat:
...
def make_sound():
self.meow()
...
for animal in list_of_animals:
animal.make_sound()
In your Rust code, you actually have two options, and that depends on the rest of your design. Either, as I suggested, adding another trait that expresses the common role that the objects play (why put them into a vector otherwise in the first place?) and implementing that for all your derived structs. Or, alternatively, expressing all the various derived structs as different variants of the same enum, and then add a method to the enum that handles the dispatch. The enum is of course more closed to outside extension than using the trait version. That's why the solution will depend on your needs for that.

Why doesn't the struct update syntax work on non-exhaustive structs?

In the struct update syntax, the "spreaded" struct must be the same type as the resulting struct. So the spreaded struct has to contain all fields already.
What, then, is left that is not "exhausted"? Why is the struct update syntax not allowed for non-exhaustive struct?
use some_crate::NonExhaustiveStruct;
let a = NonExhaustiveStruct::default();
let b = {
some_field: true,
..a //Why doesn't this work?
};
This is currently an explicitly unsupported edge case: https://rust-lang.github.io/rfcs/2008-non-exhaustive.html#functional-record-updates. Within the same crate struct spread update syntax is allowed on non-exhausive structs but it is not allowed when the struct is defined in a separate crate.
The reasoning for this is that a private field could be added in future, and code outside the crate can't do spread updates of structs with private fields.

How to handle generic types with different concrete types in rust efficiently?

The main goal is to implement a computation graph, that handles nodes with values and nodes with operators (think of simple arithmetic operators like add, subtract, multiply etc..). An operator node can take up to two value nodes, and "produces" a resulting value node.
Up to now, I'm using an enum to differentiate between a value and operator node:
pub enum Node<'a, T> where T : Copy + Clone {
Value(ValueNode<'a, T>),
Operator(OperatorNode)
}
pub struct ValueNode<'a, T> {
id: usize,
value_object : &'a dyn ValueType<T>
}
Update: Node::Value contains a struct, which itself contains a reference to a trait object ValueType, which is being implemented by a variety of concrete types.
But here comes the problem. During compililation, the generic types will be elided, and replaced by the actual types. The generic type T is also being propagated throughout the computation graph (obviously):
pub struct ComputationGraph<T> where T : Copy + Clone {
nodes: Vec<Node<T>>
}
This actually restricts the usage of ComputeGraph to one specific ValueType.
Furthermore the generic type T cannot be Sized, since a value node can be an opqaue type handled by a different backend not available through rust (think of C opqaue types made available through FFI).
One possible solution to this problem would be to introduce an additional enum type, that "mirrors" the concrete implementation of the valuetype trait mentioned above. this approach would be similiar, that enum dispatch does.
Is there anything I haven't thought of to use multiple implementations of ValueType?
update:
What i want to achive is following code:
pub struct Scalar<T> where T : Copy + Clone{
data : T
}
fn main() {
let cg = ComputeGraph::new();
// a new scalar type. doesn't have to be a tuple struct
let a = Scalar::new::<f32>(1.0);
let b_size = 32;
let b = Container::new::<opaque_type>(32);
let op = OperatorAdd::new();
// cg.insert_operator_node constructs four nodes: 3 value nodes
// and one operator nodes internally.
let result = cg.insert_operator_node::<Container>(&op, &a, &b);
}
update
ValueType<T> looks like this
pub trait ValueType<T> {
fn get_size(&self) -> usize;
fn get_value(&self) -> T;
}
update
To further increase the clarity of my question think of a small BLAS library backed by OpenCL. The memory management and device interaction shall be transparent to the user. A Matrix type allocates space on an OpenCL device with types as a primitive type buffer, and the appropriate call will return a pointer to that specific region of memory. Think of an operation that will scale the matrix by a scalar type, that is being represented by a primitive value. Both the (pointer to the) buffer and the scalar can be passed to a kernel function. Going back to the ComputeGraph, it seems obvious, that all BLAS operations form some type of computational graph, which can be reduced to a linear list of instructions ( think here of setting kernel arguments, allocating buffers, enqueue the kernel, storing the result, etc... ). Having said all that, a computation graph needs to be able to store value nodes with a variety of types.
As always the answer to the problem posed in the question is obvious. The graph expects one generic type (with trait bounds). Using an enum to "cluster" various subtypes was the solution, as already sketched out in the question.
An example to illustrate the solution. Consider following "subtypes":
struct Buffer<T> {
// fields
}
struct Scalar<T> {
// fields
}
struct Kernel {
// fields
}
The value containing types can be packed into an enum:
enum MemType {
Buffer(Buffer<f32>);
Scalar(Scalar<f32>);
// more enum variants ..
}
Now MemType and Kernel can now be packed in an enum as well
enum Node {
Value(MemType);
Operator(Kernel);
}
Node can now be used as the main type for nodes/vertices inside the graph. The solution might not be very elegant, but it does the trick for now. Maybe some code restructuring might be done in the future.

How does a repr(C) type handle Option?

I have this C code:
typedef void (*f_t)(int a);
struct Foo {
f_t f;
};
extern void f(struct Foo *);
bindgen generates the following Rust code (I have removed unimportant details):
#[repr(C)]
#[derive(Copy, Clone)]
#[derive(Debug)]
pub struct Foo {
pub f: ::std::option::Option<extern "C" fn(a: ::std::os::raw::c_int)>,
}
I do not understand why Option is here. Obviously that Rust enum and C pointer are not the same thing on the bit level, so how does the Rust compiler handle this?
When I call the C f function and pass a pointer to a Rust struct Foo, does the compiler convert Foo_rust to Foo_C and then only pass a pointer to Foo_C to f?
From The Rust Programming Language chapter on FFI (emphasis mine):
Certain types are defined to not be null. This includes references (&T, &mut T), boxes (Box<T>), and function pointers (extern "abi" fn()). When interfacing with C, pointers that might be null are often used. As a special case, a generic enum that contains exactly two variants, one of which contains no data and the other containing a single field, is eligible for the "nullable pointer optimization". When such an enum is instantiated with one of the non-nullable types, it is represented as a single pointer, and the non-data variant is represented as the null pointer. So Option<extern "C" fn(c_int) -> c_int> is how one represents a nullable function pointer using the C ABI.
Said another way:
Obviously that Rust enum and C pointer are not the same thing on the bit level
They actually are, when the Option contains a specific set of types.
See also:
Can I use the "null pointer optimization" for my own non-pointer types?
What is the overhead of Rust's Option type?
How to check if function pointer passed from C is non-NULL

Why is the Copy trait needed for default (struct valued) array initialization?

When I define a struct like this, I can pass it to a function by value without adding anything specific:
#[derive(Debug)]
struct MyType {
member: u16,
}
fn my_function(param: MyType) {
println!("param.member: {}", param.member);
}
When I want to create an array of MyType instances with a default value
fn main() {
let array = [MyType { member: 1234 }; 100];
println!("array[42].member: ", array[42].member);
}
The Rust compiler tells me:
error[E0277]: the trait bound `MyType: std::marker::Copy` is not satisfied
--> src/main.rs:11:17
|
11 | let array = [MyType { member: 1234 }; 100];
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `std::marker::Copy` is not implemented for `MyType`
|
= note: the `Copy` trait is required because the repeated element will be copied
When I implement Copy and Clone, everything works:
impl Copy for MyType {}
impl Clone for MyType {
fn clone(&self) -> Self {
MyType {
member: self.member.clone(),
}
}
}
Why do I need to specify an empty Copy trait implementation?
Is there a simpler way to do this or do I have to re-think something?
Why does it work when passing an instance of MyType to the function by value? My guess is that it is being moved, so there is no copy in the first place.
Contrary to C/C++, Rust has very explicit distinction between types which are copied and which are moved. Note that this is only a semantic distinction; on the implementation level move is a shallow bytewise copy, however, the compiler places certain restrictions on what you can do with variables you moved from.
By default every type is only moveable (non-copyable). It means that values of such types are moved around:
let x = SomeNonCopyableType::new();
let y = x;
x.do_something(); // error!
do_something_else(x); // error!
You see, the value which was stored in x has been moved to y, and so you can't do anything with x.
Move semantics is a very important part of ownership concept in Rust. You can read more on it in the official guide.
Some types, however, are simple enough so their bytewise copy is also their semantic copy: if you copy a value byte-by-byte, you will get a new completely independent value. For example, primitive numbers are such types. Such property is designated by Copy trait in Rust, i.e. if a type implements Copy, then values of this type are implicitly copyable. Copy does not contain methods; it exists solely to mark that implementing types have certain property and so it is usually called a marker trait (as well as few other traits which do similar things).
However, it does not work for all types. For example, structures like dynamically allocated vectors cannot be automatically copyable: if they were, the address of the allocation contained in them would be byte-copied too, and then the destructor of such vector will be run twice over the same allocation, causing this pointer to be freed twice, which is a memory error.
So by default custom types in Rust are not copyable. But you can opt-in for it using #[derive(Copy, Clone)] (or, as you noticed, using direct impl; they are equivalent, but derive usually reads better):
#[derive(Copy, Clone)]
struct MyType {
member: u16
}
(deriving Clone is necessary because Copy inherits Clone, so everything which is Copy must also be Clone)
If your type can be automatically copyable in principle, that is, it doesn't have an associated destructor and all of its members are Copy, then with derive your type will also be Copy.
You can use Copy types in array initializer precisely because the array will be initialized with bytewise copies of the value used in this initializer, so your type has to implement Copy to designate that it indeed can be automatically copied.
The above was the answer to 1 and 2. As for 3, yes, you are absolutely correct. It does work precisely because the value is moved into the function. If you tried to use a variable of MyType type after you passed it into the function, you would quickly notice an error about using a moved value.
Why do I need to specify an empty Copy trait implementation?
Copy is a special built-in trait such that T implementing Copy represents that it is safe to duplicate a value of type T with a shallow byte copy.
This simple definition mean that one just needs to tell the compiler those semantics are correct, since there's no fundamental change in run-time behaviour: both a move (a non-Copy type) and a "copy" are shallow byte copies, it's just a question of if the source is usable later. See an older answer for more details.
(The compiler will complain if the contents of MyType isn't Copy itself; previously it would be automatically implemented, but that all changed with opt-in built-in traits.)
Creating an array is duplicating the value via shallow copies, and this is guaranteed to be safe if T is Copy. It is safe in more general situations, #5244 covers some of them, but at the core, a non-Copy struct won't be able to be used to create a fixed-length array automatically because the compiler can't tell that the duplication is safe/correct.
Is there a simpler way to do this or do I have to re-think something (I'm coming from C)?
#[derive(Copy)]
struct MyType {
member: u16
}
will insert the appropriate empty implementation (#[derive] works with several other traits, e.g. one often sees #[derive(Copy, Clone, PartialEq, Eq)].)
Why does it work when passing an instance of MyType to the function by value? My guess is that it is being moved, so there is no copy in the first place.
Well, without calling the function one doesn't see the move vs. copy behaviour (if you were to call it twice the same non-Copy value, the compiler would emit an error about moved values). But, a "move" and a "copy" are essentially the same on the machine. All by-value uses of a value are shallow copies semantically in Rust, just like in C.

Resources