Avoiding pubs in Rust - struct

I've just split my program into an executable and a large file full of struct definitions (structs.rs).
In order to use the structs and their fields in the main executable, I have to prefix every struct definition and every field definition with pub.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Foo {
    pub bar: u8,
    pub baz: [u8; 4]
}
This reduces readability.
Is there a way of avoiding all of these pubs?
Or should I be using another way of decomposing my program into multiple files?

This is expected. Modules and crates are privacy boundaries in Rust. The default privacy level for structs and their fields is "same module and its submodules", so parent modules and sibling modules need explicit permission to touch a struct's fields.
In Rust, structs typically have only a few fields, but those fields may have important invariants to uphold. For example, if Vec.len were public, you could cause unsafe out-of-bounds access. As a precaution, Rust requires programmers to think through whether they can allow access to each field.
It's unusual to have thousands of structs or structs with thousands of fields, unless they're mirroring some external data definition. If that's the case, consider auto-generating struct definitions with a macro or build.rs.
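To make the privacy boundary concrete, here is a minimal sketch, assuming a structs module like the one in the question. Foo is the plain-data struct from the question; SmallBuf is a hypothetical type added only to show a field with an invariant that should stay private.

```rust
mod structs {
    // Plain data with no invariants: public fields are reasonable here.
    #[derive(Debug, Clone)]
    pub struct Foo {
        pub bar: u8,
        pub baz: [u8; 4],
    }

    // Hypothetical type with an invariant (`len <= buf.len()`), so its
    // fields stay private and all access goes through methods.
    #[derive(Debug, Clone)]
    pub struct SmallBuf {
        buf: [u8; 4],
        len: usize,
    }

    impl SmallBuf {
        pub fn new() -> Self {
            SmallBuf { buf: [0; 4], len: 0 }
        }

        // Returns false instead of writing past the end of `buf`.
        pub fn push(&mut self, value: u8) -> bool {
            if self.len == self.buf.len() {
                return false;
            }
            self.buf[self.len] = value;
            self.len += 1;
            true
        }

        pub fn as_slice(&self) -> &[u8] {
            &self.buf[..self.len]
        }
    }
}

fn main() {
    let foo = structs::Foo { bar: 1, baz: [2, 3, 4, 5] };
    let mut buf = structs::SmallBuf::new();
    buf.push(foo.bar);
    // buf.len = 99; // would not compile: field `len` is private
    println!("{:?} {:?}", foo, buf.as_slice());
}
```

If the structs are only used inside one crate, pub(crate) is a narrower alternative to pub, although it does not reduce the amount of annotation.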

Related

Avoid duplicating use statements at beginning of doctests

When writing doctests for methods on a struct or trait, functions in a file, etc., I often find myself putting the same use my_crate::{MyStruct, MyTrait, Etc} at the beginning of each doctest. Is there any way to avoid this duplication, such as defining those use statements just once for the whole module?
If you find the same group of items being imported over and over again, consider creating a prelude module. The core idea is that you re-export those key items from that module so anyone using your crate can write use your_crate::prelude::*; to import all of them as a group.
One common use case for this is a crate with many frequently used traits. When you provide the most common traits as a group, your users don't need to spend time figuring out which traits provide which methods. You can also choose to add structs/enums/unions, but I wouldn't recommend it. Unlike traits, types are almost always referred to explicitly and are much easier to find.
For example, here is what rayon's prelude module looks like.
//! The rayon prelude imports the various `ParallelIterator` traits.
//! The intention is that one can include `use rayon::prelude::*` and
//! have easy access to the various traits and methods you will need.
pub use crate::iter::FromParallelIterator;
pub use crate::iter::IndexedParallelIterator;
pub use crate::iter::IntoParallelIterator;
pub use crate::iter::IntoParallelRefIterator;
pub use crate::iter::IntoParallelRefMutIterator;
pub use crate::iter::ParallelBridge;
pub use crate::iter::ParallelDrainFull;
pub use crate::iter::ParallelDrainRange;
pub use crate::iter::ParallelExtend;
pub use crate::iter::ParallelIterator;
pub use crate::slice::ParallelSlice;
pub use crate::slice::ParallelSliceMut;
pub use crate::str::ParallelString;
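As a rough sketch of the same pattern at small scale, the following collapses a hypothetical crate into one file. The shapes module, the Area and Scale traits, and scaled_area are all made-up names for illustration; only the prelude idea comes from the answer above.

```rust
mod shapes {
    pub trait Area {
        fn area(&self) -> f64;
    }

    pub trait Scale {
        fn scale(&mut self, factor: f64);
    }

    pub struct Square(pub f64);

    impl Area for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }

    impl Scale for Square {
        fn scale(&mut self, factor: f64) {
            self.0 *= factor;
        }
    }
}

// The prelude only re-exports the traits that callers need in scope.
pub mod prelude {
    pub use super::shapes::{Area, Scale};
}

fn scaled_area(side: f64, factor: f64) -> f64 {
    // One glob import instead of naming each trait. From outside the
    // crate this would read `use your_crate::prelude::*;`.
    use self::prelude::*;

    let mut sq = shapes::Square(side);
    sq.scale(factor); // needs `Scale` in scope
    sq.area() // needs `Area` in scope
}

fn main() {
    println!("{}", scaled_area(2.0, 1.5));
}
```

In a doctest, the single prelude import would replace the longer list of use statements, so each example starts with one line instead of several.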

Can I rely on the serialization of two identical structures being identical?

Consider the following code:
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct MyStruct {
    a: u32,
    b: u32
}
#[derive(Serialize, Deserialize)]
#[serde(rename = "MyStruct")]
struct AlsoMyStruct {
    a: u32,
    b: u32
}
I am wondering if I can safely do something like:
let ser = any_encoding::serialize(&MyStruct{a: 33, b: 44}).unwrap();
let deser: AlsoMyStruct = any_encoding::deserialize(&ser).unwrap();
where any_encoding is, e.g., bincode, json, or any other Serde-supported encoding. In my head, this should work nicely: the two structures have the same name (I'm explicitly renaming AlsoMyStruct into "MyStruct") and exactly the same fields: same field names, same field types, same order of fields.
However, I am wondering: is this actually guaranteed to work? Or is there some other corner-case, maybe platform-dependent, unforeseen piece of information that a Serde serializer/deserializer might include in the representation of MyStruct / AlsoMyStruct that could lead to the two representations being incompatible?
In general, no, you cannot expect this to work. The reason is that neither serde nor any particular serializer or deserializer guarantees that you can round-trip your data (source). This means you cannot expect this to work in all cases even if you use the same struct in both places.
For example, JSON cannot round-trip Option<()>, and formats that are not self-describing, like bincode, do not support untagged enums.
Nothing in the type signatures enforces round-tripping.
Here are some reasons why deserialization might fail:
Using skip_serializing_none with formats that are not self-describing (serde #1732).
Anything which calls deserialize_any, such as untagged, adjacently tagged, or internally tagged enums (serde #1762).
Borrowing during deserialization, e.g., for &'de str or &'de [u8]. serde_json only supports &'de str if there are no escape sequences and never supports &'de [u8].
Some formats cannot serialize some types, e.g., JSON does not support structs as map keys and bincode only supports sequences of known lengths (bincode #167).
A type only implements one of the traits (Serialize/Deserialize) or the implementations do not match, e.g., serialize as number but deserialize as string.
That being said, this can work under some circumstances. The structs should have the same name and the same fields in the same order. The field types, or rather their Serialize/Deserialize implementations, also need to support round-tripping. As the Option<()> example above shows, whether a value round-trips also depends on the Serializer/Deserializer implementation, even if the Serialize/Deserialize implementations support it.
Many types do try to support round-tripping, since that is what is most commonly expected.
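To illustrate why field order matters for formats that are not self-describing, here is a small sketch that uses a hand-written encoding rather than serde or bincode. The functions encode_my_struct, decode_also, decode_swapped and the Swapped struct are hypothetical; the encoding simply writes the fields in declaration order with no names or tags.

```rust
struct MyStruct {
    a: u32,
    b: u32,
}

struct AlsoMyStruct {
    a: u32,
    b: u32,
}

// Same field names and types as above, but declared in a different order.
struct Swapped {
    b: u32,
    a: u32,
}

// Writes the fields in declaration order as little-endian bytes.
fn encode_my_struct(v: &MyStruct) -> Vec<u8> {
    let mut out = Vec::with_capacity(8);
    out.extend_from_slice(&v.a.to_le_bytes());
    out.extend_from_slice(&v.b.to_le_bytes());
    out
}

// Reads four bytes starting at `at`, or None if the input is too short.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn decode_also(bytes: &[u8]) -> Option<AlsoMyStruct> {
    Some(AlsoMyStruct {
        a: read_u32(bytes, 0)?,
        b: read_u32(bytes, 4)?,
    })
}

// Decodes in *its* declaration order, so `b` takes the first four bytes.
fn decode_swapped(bytes: &[u8]) -> Option<Swapped> {
    Some(Swapped {
        b: read_u32(bytes, 0)?,
        a: read_u32(bytes, 4)?,
    })
}

fn main() {
    let bytes = encode_my_struct(&MyStruct { a: 33, b: 44 });
    let also = decode_also(&bytes).unwrap();
    let swapped = decode_swapped(&bytes).unwrap();
    println!("also: a={} b={}", also.a, also.b);
    println!("swapped: a={} b={}", swapped.a, swapped.b);
}
```

With this encoding, AlsoMyStruct decodes to the original values, while Swapped decodes without any error but with a and b exchanged. A self-describing format such as JSON carries the field names and would not have this particular problem.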

How to automatically #derive(Something) on every struct (possibly with filtering)?

Is there a way to get the Rust compiler to add derives on all structs, possibly with some kind of matching, like only structs in X crate, or maybe recursively apply #derives? I would rather not vendor those crates because it'd be a lot of work to keep them up to date.
No, the Rust compiler does not provide such a facility.
You can't derive traits for types outside of your own crate. So you have two options:
Submit a PR to the crate in question to add the implementations. Many projects will accept changes that implement common traits like serde::Serialize, provided they are feature-gated.
Use wrapper types and implement the traits there, e.g., if the type in question is Uuid from the uuid crate:
pub struct MyUuid(uuid::Uuid);
impl MyTrait for MyUuid {
    // etc
}
Note that you can't derive traits that need to know the fields of third party types because the tokens will not be available.
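A minimal sketch of the wrapper approach follows, using std::time::Duration as a stand-in for the third-party type so the example has no external dependencies. MyDuration is a hypothetical name. The derives on the wrapper work only because Duration already implements those traits; Display has to be written by hand because Duration does not implement it.

```rust
use std::fmt;
use std::time::Duration;

// Newtype around a type defined in another crate. Deriving here
// delegates to the impls that `Duration` already provides.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MyDuration(pub Duration);

// A foreign trait implemented for the local wrapper. Writing
// `impl fmt::Display for Duration` directly would be rejected,
// since both the trait and the type live outside this crate.
impl fmt::Display for MyDuration {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ms", self.0.as_millis())
    }
}

// Conversion so callers can move between the two types easily.
impl From<Duration> for MyDuration {
    fn from(inner: Duration) -> Self {
        MyDuration(inner)
    }
}

fn main() {
    let d: MyDuration = Duration::from_millis(1500).into();
    println!("{} / {:?}", d, d);
}
```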

When to use a reference or a box to have a field that implements a trait in a struct?

I have the following code:
pub trait MyTrait {
    fn do_something(&self);
}
If I want a struct A to have a field a that implements the trait MyTrait, there are 2 options:
pub struct A<'a> {
    a: &'a dyn MyTrait
}
or
pub struct A {
    a: Box<dyn MyTrait>
}
But on Difference between pass by reference and by box, someone said:
Really, Box<T> is only useful for recursive data structures (so that
they can be represented rather than being of infinite size) and for
the very occasional performance optimisation on large types (which you
shouldn’t try doing without measurements).
Unless A implements MyTrait, I'd say A is not a recursive data structure, so that makes me think I should prefer using a reference instead of a box.
If I have another struct B that has a reference to some A object, like this:
pub struct A<'a> {
    a: &'a dyn MyTrait
}
pub struct B<'a, 'b: 'a> {
    b: &'a A<'b>
}
I need to say that 'b is larger than 'a, and according to the documentation:
You won't often need this syntax, but it can come up in situations
like this one, where you need to refer to something you have a
reference to.
I feel like that's a bad choice too, because the example here is really simple and probably doesn't need this kind of advanced feature.
How to decide whether I should use a reference or a box then?
Unfortunately, the quote you used applies to a completely different situation.
Really, Box<T> is only useful for recursive data structures (so that they can be represented rather than being of infinite size) and for the very occasional performance optimisation on large types (which you shouldn’t try doing without measurements).
That quote is about choosing between MyEnum and Box<MyEnum> for data members:
it is not comparing to references,
it is not talking about traits.
So... reset your brain, and let's start from scratch again.
The main difference between Box and a reference is ownership:
A Box indicates that the surrounding struct owns the piece of data,
A reference indicates that the surrounding struct borrows the piece of data.
Their use, therefore, is dictated by whether you want ownership or borrowing, which is a situational decision: neither is better than the other, in the same way that neither a screwdriver nor a hammer is better than the other.
Rc (and Arc) can somewhat alleviate the need to decide, as they allow multiple owners; however, they also introduce the risk of reference cycles, which are their own nightmare to debug, so I would caution against overusing them.
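The ownership difference can be sketched as follows. Greeter, Owning, Borrowing and make_owning are hypothetical names, and do_something returns a String here only so the result can be observed.

```rust
pub trait MyTrait {
    fn do_something(&self) -> String;
}

struct Greeter;

impl MyTrait for Greeter {
    fn do_something(&self) -> String {
        String::from("hello")
    }
}

// Owns its trait object: no lifetime parameter, and the value can be
// returned from the function that created it.
pub struct Owning {
    a: Box<dyn MyTrait>,
}

// Borrows its trait object: the caller keeps ownership, and the struct
// cannot outlive the value it points to.
pub struct Borrowing<'a> {
    a: &'a dyn MyTrait,
}

impl Owning {
    pub fn run(&self) -> String {
        self.a.do_something()
    }
}

impl<'a> Borrowing<'a> {
    pub fn run(&self) -> String {
        self.a.do_something()
    }
}

// Works because `Owning` carries the `Greeter` along with it.
fn make_owning() -> Owning {
    Owning { a: Box::new(Greeter) }
}

fn main() {
    let owned = make_owning();
    let greeter = Greeter;
    let borrowed = Borrowing { a: &greeter };
    println!("{} {}", owned.run(), borrowed.run());
}
```

An equivalent make_borrowing function that creates a Greeter locally and returns a Borrowing pointing at it would not compile, which is the practical signal that the situation calls for ownership.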

How to suppress the warning for "drop_with_repr_extern" at a fine granularity?

I am currently experimenting with multi-threading code, and its performance is affected by whether two data members share the same cache line or not.
In order to avoid false-sharing, I need to specify the layout of the struct without the Rust compiler interfering, and thus I use repr(C). However, this same struct also implements Drop, and therefore the compiler warns about the "incompatibility" of repr(C) and Drop, which I care naught for.
However, attempting to silence this futile warning has proven beyond me.
Here is a reduced example:
#[repr(C)]
#[derive(Default, Debug)]
struct Simple<T> {
    item: T,
}
impl<T> Drop for Simple<T> {
    fn drop(&mut self) {}
}
fn main() {
    println!("{:?}", Simple::<u32>::default());
}
which emits #[warn(drop_with_repr_extern)].
I have tried specifying #[allow(drop_with_repr_extern)]:
at struct
at impl Drop
at mod
and none of them worked. Only the crate-level suppression worked, which is rather heavy-handed.
Which leads us to: is there a more granular way of suppressing this warning?
Note: remarks on a better way to ensure that two data members are spread over different cache lines are welcome; however they will not constitute answers on their own.
The reason is near the end of rustc_lint/builtin.rs:
The lint does not walk the crate, instead using ctx.tcx.lang_items.drop_trait() to look up all Drop trait implementations within the crate. The annotations are only picked up while walking the crate. I've stumbled upon the same problem in this question. So unless someone changes the lint to actually walk the crate and pick up Drop impls as it goes, you need to annotate the whole crate.
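On the side question of keeping two data members on different cache lines, one possible approach is an aligned wrapper type instead of controlling the whole struct layout. This is only a sketch: CachePadded and Counters are hypothetical names, and the 64-byte line size is an assumption that does not hold on every target.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumes a 64-byte cache line. The alignment also rounds the size of
// the wrapper up to 64 bytes, so two wrapped fields never share a line.
#[repr(align(64))]
#[derive(Default, Debug)]
struct CachePadded<T>(T);

#[derive(Default, Debug)]
struct Counters {
    produced: CachePadded<AtomicUsize>,
    consumed: CachePadded<AtomicUsize>,
}

// The outer struct can still implement Drop; it does not need repr(C).
impl Drop for Counters {
    fn drop(&mut self) {}
}

fn main() {
    let c = Counters::default();
    c.produced.0.fetch_add(1, Ordering::Relaxed);
    c.consumed.0.fetch_add(1, Ordering::Relaxed);
    println!(
        "align = {}, size = {}",
        align_of::<CachePadded<AtomicUsize>>(),
        size_of::<Counters>()
    );
}
```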
