Serde skip field serialization depending on a "global" runtime condition - rust

Depending on some runtime condition, I'd like to either serialize a field or not. That condition applies to the whole serialization and has nothing to do with the field's value itself. Hence, I cannot use skip_serializing_if() if I understand it right, unless I use some sort of global state, but then that would be more like a constant, not a "condition".
As an example, let's say the condition depends on the client that requested the file: some clients need that field, others do not.
If the condition says serialize, do so even if the field's value is None (i.e. explicitly create a property with null value in the output JSON).
What's the simplest and cleanest way to achieve that?

Just create a function and ignore the argument:
use serde_json; // 1.0.67
use serde::Serialize; // 1.0.130
fn condition_met<T>(_: &T) -> bool {
    false
}

#[derive(Serialize)]
struct Foo {
    #[serde(skip_serializing_if = "condition_met")]
    data: Option<u32>,
}

fn main() {
    println!("{}", serde_json::to_string(&Foo { data: None }).unwrap());
}
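If the condition really has to be decided at runtime, the skip function can read shared state that is set per request. A minimal sketch, assuming a global AtomicBool flag (the flag name and where it gets set are application-specific assumptions):

use std::sync::atomic::{AtomicBool, Ordering};
use serde::Serialize; // 1.0.130

// Hypothetical global flag; set it from whatever knows about the requesting client.
static INCLUDE_DATA: AtomicBool = AtomicBool::new(true);

fn condition_met<T>(_: &T) -> bool {
    // skip_serializing_if skips when this returns true,
    // so skip exactly when the field should not be included.
    !INCLUDE_DATA.load(Ordering::Relaxed)
}

#[derive(Serialize)]
struct Foo {
    #[serde(skip_serializing_if = "condition_met")]
    data: Option<u32>,
}

fn main() {
    INCLUDE_DATA.store(true, Ordering::Relaxed);
    println!("{}", serde_json::to_string(&Foo { data: None }).unwrap()); // {"data":null}
    INCLUDE_DATA.store(false, Ordering::Relaxed);
    println!("{}", serde_json::to_string(&Foo { data: None }).unwrap()); // {}
}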

Related

Why does Default derived on an enum not apply to references to the enum?

I have an enum that has Default derived on it, say for example:
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub enum Enum<T> {
    #[default]
    None,
    One(T),
    Two(T),
}
Vec has methods like last() which return an Option<&T>. But when I call unwrap_or_default() on the Option<&T>, I get the following error:
22 | println!("{:?}", v.last().unwrap_or_default());
| ^^^^^^^^ ----------------- required by a bound introduced by this call
| |
| the trait `Default` is not implemented for `&Enum<i8>`
|
= help: the trait `Default` is implemented for `Enum<T>`
note: required by a bound in `Option::<T>::unwrap_or_default`
I can work around this in one of two ways:
Use unwrap_or() and provide a reference to the default element manually:
v.last().unwrap_or(&Enum::None)
Implement Default for &Enum.
impl<'a> Default for &'a Enum2 {
    fn default() -> &'a Enum2 { &Enum2::None }
}
But I don't understand why I should have to. Is there a separate trait I should derive to use the tagged default element correctly here?
Context: the goal was to output "None" if the list was empty and Enum::Display(val) from Some(val) for the last element, within a sequence of format arguments.
match self.history.last() { None => "None", Some(val) => val }
is illegal: the types of the match arms must match. Many other options don't work due to val being borrowed and self not being Copy. Defining a separate default for Enum was an easily-thought-of option to avoid extra string formatting/allocation.
What should Default return for &Enum?
Remember, a reference must point to some valid data, and does not keep said data alive.
So even if the Default implementation would allocate a new object, there is nowhere to keep it alive, because the returned reference wouldn't keep it alive. The only way is to store the default element somewhere globally, and then it would be shared by everyone. Many objects are more complicated and cannot simply be stored globally, because they are not const-initializable. What should happen then?
The fact that a reference does not implement Default automatically is intentional, and for good reasons. unwrap_or_default is simply the wrong function to call if your contained value is a reference. Use match or if let to extract the value without having to create a fallback value. unwrap_or is also a valid choice if you absolutely must have a fallback value, although I don't see how that is easily possible with a reference without falling back to global const values. Which is also fine, if it fits your use case.
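As a quick sketch of the match/if let route with the enum from the question (printing directly instead of formatting into a larger string, which is an assumption about the surrounding code):

#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub enum Enum<T> {
    #[default]
    None,
    One(T),
    Two(T),
}

fn main() {
    let history: Vec<Enum<i8>> = vec![Enum::One(1)];
    // No fallback value is needed; each arm handles its own case.
    match history.last() {
        Some(val) => println!("{:?}", val),
        None => println!("None"),
    }
}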
In your case, it's of course fine to implement Default for the reference manually, and it's probably what I would do if I absolutely had to. But to me, needing it at all is most likely a code smell.
For enums this simple, it's actually an anti-pattern to use an immutable reference instead of a value. References are 64 bits on a 64-bit system, while an enum with three simple variants is most likely only 8 bits large, so using a reference is pure overhead.
So in this case, you can use copied() to query the object by-value. This is most likely even faster than querying it by reference.
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub enum Enum<T> {
    #[default]
    None,
    One(T),
    Two(T),
}

fn main() {
    let v: Vec<Enum<u32>> = vec![];
    println!("{:?}", v.last().copied().unwrap_or_default());
}
None
Either way, I think the fact that you have this question in the first place indicates that there is an architectural issue in your code.
There are two cases:
Your enum members all contain valid data; in that case it is always important to distinguish in .last() between "one element with some data" and "no element", and falling back to a default wouldn't be viable.
Your enum has a None member, in which case having the None element at .last() would be identical to having no members (which is what I suspect your code does). In that case it's pointless to manually define a None member; simply use the Option enum that already exists. That would make your problem trivially solvable: Option already implements Default (which is None) and can propagate the reference to its member via .as_ref():
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum EnumContent<T> {
    One(T),
    Two(T),
}

// Option already implements `Default`.
pub type Enum<T> = Option<EnumContent<T>>;

fn main() {
    let mut v: Vec<Enum<u32>> = vec![];
    println!("{:?} -> {:?}", v, v.last().and_then(Option::as_ref));
    v.push(Some(EnumContent::One(42)));
    println!("{:?} -> {:?}", v, v.last().and_then(Option::as_ref));
    v.push(None);
    println!("{:?} -> {:?}", v, v.last().and_then(Option::as_ref));
}
[] -> None
[Some(One(42))] -> Some(One(42))
[Some(One(42)), None] -> None

How do I inspect function arguments at runtime in Rust?

Say I have a trait that looks like this:
use std::{error::Error, fmt::Debug};
use super::CheckResult;
/// A Checker is a component that is responsible for checking a
/// particular aspect of the node under investigation, be that metrics,
/// system information, API checks, load tests, etc.
#[async_trait::async_trait]
pub trait Checker: Debug + Sync + Send {
    type Input: Debug;

    /// This function is expected to take input, whatever that may be,
    /// and return a vec of check results.
    async fn check(&self, input: &Self::Input) -> anyhow::Result<Vec<CheckResult>>;
}
And say I have two implementations of this trait:
pub struct ApiData {
    some_response: String,
}

pub struct MetricsData {
    number_of_events: u64,
}

pub struct ApiChecker;

impl Checker for ApiChecker {
    type Input = ApiData;
    // implement check function
}

pub struct MetricsChecker;

impl Checker for MetricsChecker {
    type Input = MetricsData;
    // implement check function
}
In my code I have a Vec of these Checkers that looks like this:
pub struct MyServer {
    checkers: Vec<Box<dyn Checker>>,
}
What I want to do is figure out, based on what Checkers are in this Vec, what data I need to fetch. For example, if it just contained an ApiChecker, I would only need to fetch the ApiData. If both ApiChecker and MetricsChecker were there, I'd need both ApiData and MetricsData. You can also imagine a third checker where Input = (ApiData, MetricsData). In that case I'd still just need to fetch ApiData and MetricsData once.
I imagine an approach where the Checker trait has an additional function on it that looks like this:
fn required_data(&self) -> HashSet<DataId>;
This could then return something like [DataId::Api, DataId::Metrics]. I would then run this for all Checkers in my vec, and I'd end up with a complete list of the data I need to get. I could then do some complicated set of checks like this:
let mut required_data = HashSet::new();
for checker in checkers {
    required_data.extend(checker.required_data());
}

let mut api_data: Option<ApiData> = None;
if required_data.contains(&DataId::Api) {
    api_data = Some(get_api_data());
}
And so on for each of the data types.
I'd then pass them into the check calls like this:
api_checker.check(
    api_data.expect("There was some logic error and we didn't get the API data even though a Checker declared that it needed it")
);
The reasons I want to fetch the data outside of the Checkers is:
To avoid fetching the same data multiple times.
To support memoization between unrelated calls where the arguments are the same (this could be done inside some kind of Fetcher trait implementation for example).
To support generic retry logic.
By now you can probably see that I've got two big problems:
The declaration of what data a specific Checker needs is duplicated, once in the function signature and again from the required_data function. This naturally introduces bug potential. Ideally this information would only be declared once.
Similarly, in the calling code, I have to trust that the data that the Checkers said they needed was actually accurate (the expect in the previous snippet). If it's not, and we didn't get data we needed, there will be problems.
I think both of these problems would be solved if the function signature, and specifically the Input associated type, were able to express this "required data" declaration on its own. Unfortunately I'm not sure how to do that. I see there is a nightly feature in std::any that implements Provider and Demand: https://doc.rust-lang.org/std/any/index.html#provider-and-demand. This sort of sounds like what I want, but I have to use stable Rust, plus I figure I must be missing something and there is an easier way to do this without going rogue with semi-dynamic typing.
tl;dr: How can I inspect what types the arguments are for a function (keeping in mind that the input might be more complex than just one thing, such as a struct or tuple) at runtime from outside the trait implementer? Alternatively, is there a better way to design this code that would eliminate the need for this kind of reflection?
Your problems start way earlier than you mention:
checkers: Vec<Box<dyn Checker>>
This is an incomplete type. The associated type Input means that Checker<Input = ApiData> and Checker<Input = MetricsData> are incompatible. How would you call checkers[0].check(input)? What type would input be? If you want a collection of "checkers" then you'll need a unified API, where the arguments to .check() are all the same.
I would suggest a different route altogether: instead of providing the input, provide a type that can retrieve whatever input the checkers ask for. That way there's no need to coordinate, in a type-safe way, what data each checker will ask for; it's inherent in the methods the checkers themselves call. And if your primary concern is repeatedly retrieving the same data for different checkers, all you need to do is implement caching in the provider. The same goes for retry logic.
Here's my suggestion:
struct DataProvider { /* cached api and metrics */ }

impl DataProvider {
    fn fetch_api_data(&mut self) -> anyhow::Result<ApiData> { todo!() }
    fn fetch_metrics_data(&mut self) -> anyhow::Result<MetricsData> { todo!() }
}

#[async_trait::async_trait]
trait Checker {
    async fn check(&self, data: &mut DataProvider) -> anyhow::Result<Vec<CheckResult>>;
}

struct ApiAndMetricsChecker;

#[async_trait::async_trait]
impl Checker for ApiAndMetricsChecker {
    async fn check(&self, data: &mut DataProvider) -> anyhow::Result<Vec<CheckResult>> {
        let _api_data = data.fetch_api_data()?;
        let _metrics_data = data.fetch_metrics_data()?;
        // do something with api and metrics data
        todo!()
    }
}
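To illustrate the caching point, here is a rough sketch of what DataProvider could hold internally. It assumes ApiData is Clone and reuses the question's hypothetical get_api_data(); metrics would follow the same pattern:

struct DataProvider {
    api: Option<ApiData>,
    metrics: Option<MetricsData>,
}

impl DataProvider {
    fn fetch_api_data(&mut self) -> anyhow::Result<ApiData> {
        if self.api.is_none() {
            // Fetch once and cache; retry logic could also live here.
            self.api = Some(get_api_data());
        }
        Ok(self.api.clone().unwrap())
    }
}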

How to serialize/deserialize a map with the None values

I need a map with Option values in my configuration. However, serde seems to ignore any pairs with a None value:
use std::collections::HashMap;
use serde::{Deserialize, Serialize};
use toml;
#[derive(Debug, Serialize, Deserialize)]
struct Config {
    values: HashMap<String, Option<u32>>,
}

fn main() {
    let values = [("foo", Some(5)), ("bar", None)]
        .iter()
        .map(|(name, s)| (name.to_string(), s.clone()))
        .collect();
    let config = Config { values };

    let s = toml::ser::to_string(&config).unwrap();
    println!("{}", s);
}
produces
[values]
foo = 5
The same goes for deserializing: I simply cannot represent bar: None in any form, since TOML has no notion of None or null or the like.
Are there some tricks to do that?
The closest alternative I have found is to use a special sentinel value (the one you will probably use in Option::unwrap_or), which appears in the TOML file as the real value (e.g. 0), and converts from Option::None on serialization. But on deserialization, the sentinel value converts to Option::None and leaves us with the real Option type.
Serde has a special #[serde(with = "module")] attribute to customize the ser/de behavior of a field, which you can use here.
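A rough sketch of what such a module could look like for this Config, assuming 0 is usable as the sentinel (the module name opt_sentinel and the sentinel choice are arbitrary):

use std::collections::HashMap;
use serde::{Deserialize, Serialize};

mod opt_sentinel {
    use std::collections::HashMap;
    use serde::{Deserialize, Deserializer, Serialize, Serializer};

    // Hypothetical sentinel standing in for None; pick one that can never be a real value.
    const SENTINEL: u32 = 0;

    pub fn serialize<S: Serializer>(
        map: &HashMap<String, Option<u32>>,
        ser: S,
    ) -> Result<S::Ok, S::Error> {
        let plain: HashMap<&str, u32> = map
            .iter()
            .map(|(k, v)| (k.as_str(), v.unwrap_or(SENTINEL)))
            .collect();
        plain.serialize(ser)
    }

    pub fn deserialize<'de, D: Deserializer<'de>>(
        de: D,
    ) -> Result<HashMap<String, Option<u32>>, D::Error> {
        let plain = HashMap::<String, u32>::deserialize(de)?;
        Ok(plain
            .into_iter()
            .map(|(k, v)| (k, if v == SENTINEL { None } else { Some(v) }))
            .collect())
    }
}

#[derive(Debug, Serialize, Deserialize)]
struct Config {
    #[serde(with = "opt_sentinel")]
    values: HashMap<String, Option<u32>>,
}

With this in place, bar appears in the TOML as bar = 0 and comes back as None on deserialization.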

How can I flatten an enum to a special case when the explicit cases don't match

I'd like it so that when the case is unknown, it is associated with the last case:
#[derive(Serialize, Deserialize, Clone, Debug)]
#[serde(untagged)]
pub enum Action {
    Action1,
    Action2,
    Action3,
    Other(String), // when not known it should be here
}
I've tried using the directive
#[serde(untagged)]
but then it doesn't serialize properly
let b = Action::Action1;
let s = serde_json::to_string(&b);
let ss = s.unwrap();
println!("ss {:#?}", &ss);
let val = serde_json::to_value(b);
println!("ss {:#?}", &val);
results in
ss "null"
ss Ok(
Null,
)
I can think of two options that build off each other.
First, use From to turn this into a string every time you serialize, and From to turn it back into your own type when you deserialize. This requires you to convert every time you serialize and deserialize, but it will accomplish your goal.
If you want to make the API a little cleaner at the cost of doing more work, you can implement Serialize and Deserialize yourself. Here are some references on how to do that:
Custom Serialization
Implementing Serialize
Implementing Deserialize
As a second option, you can offload the custom serialization and deserialization if you're willing to add another dependency, serde_with.
According to its docs:
De/Serializing a type using the Display and FromStr traits, e.g., for u8, url::Url, or mime::Mime. Check DisplayFromStr or serde_with::rust::display_fromstr for details.
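Here is a sketch of the first option using serde's into/from container attributes; the lowercase string spellings and the round trip through String are assumptions about the desired wire format:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Clone, Debug)]
#[serde(into = "String", from = "String")]
pub enum Action {
    Action1,
    Action2,
    Action3,
    Other(String), // when not known it should be here
}

impl From<Action> for String {
    fn from(a: Action) -> String {
        match a {
            Action::Action1 => "action1".to_owned(),
            Action::Action2 => "action2".to_owned(),
            Action::Action3 => "action3".to_owned(),
            Action::Other(s) => s,
        }
    }
}

impl From<String> for Action {
    fn from(s: String) -> Action {
        match s.as_str() {
            "action1" => Action::Action1,
            "action2" => Action::Action2,
            "action3" => Action::Action3,
            _ => Action::Other(s),
        }
    }
}

fn main() {
    println!("{}", serde_json::to_string(&Action::Action1).unwrap()); // "action1"
    let unknown: Action = serde_json::from_str("\"something-else\"").unwrap();
    println!("{:?}", unknown); // Other("something-else")
}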

Writing to a field in a MaybeUninit structure?

I'm doing something with MaybeUninit and FFI in Rust that seems to work, but I suspect may be unsound/relying on undefined behavior.
My aim is to have a struct MoreA extend a struct A, by including A as an initial field. And then to call some C code that writes to the struct A. And then finalize MoreA by filling in its additional fields, based on what's in A.
In my application, the additional fields of MoreA are all integers, so I don't have to worry about assignments to them dropping the (uninitialized) previous values.
Here's a minimal example:
use core::fmt::Debug;
use std::mem::MaybeUninit;

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

unsafe fn mock_ffi(p: *mut A) {
    // write doesn't drop previous (uninitialized) occupant of p
    p.write(A(1, 2));
}

fn main() {
    let mut b = MaybeUninit::<MoreA>::uninit();
    unsafe { mock_ffi(b.as_mut_ptr().cast()); }

    let b = unsafe {
        let mut b = b.assume_init();
        b.more = 3;
        b
    };

    assert_eq!(&b, &MoreA { head: A(1, 2), more: 3 });
}
Is the code let b = unsafe { ... } sound? It runs Ok and Miri doesn't complain.
But the MaybeUninit docs say:
Moreover, uninitialized memory is special in that the compiler knows that it does not have
a fixed value. This makes it undefined behavior to have uninitialized data in a variable
even if that variable has an integer type, which otherwise can hold any fixed bit pattern.
Also, the Rust reference says that Behavior considered undefined includes:
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
... An integer (i*/u*) or ... obtained from uninitialized memory.
On the other hand, it doesn't seem possible to write to the more field before calling assume_init. Later on the same page:
There is currently no supported way to create a raw pointer or reference to a field of a struct
inside MaybeUninit. That means it is not possible to create a struct by calling
MaybeUninit::uninit::<Struct>() and then writing to its fields.
If what I'm doing in the above code example does trigger undefined behavior, what would solutions be?
I'd like to avoid boxing the A value (that is, I'd like to have it be directly included in MoreA).
I'd hope also to avoid having to create one A to pass to mock_ffi and then having to copy the results into MoreA. A in my real application is a large structure.
I guess if there's no sound way to get what I'm after, though, I'd have to choose one of those two fallbacks.
If struct A is of a type that can hold the bit-pattern 0 as a valid value, then I guess a third fallback would be:
Start with MaybeUninit::zeroed() rather than MaybeUninit::uninit().
Currently, the only sound way to refer to uninitialized memory—of any type—is MaybeUninit. In practice, it is probably safe to read or write to uninitialized integers, but that is not officially documented. It is definitely not safe to read or write to an uninitialized bool or most other types.
In general, as the documentation states, you cannot initialize a struct field by field. However, it is sound to do so as long as:
the struct has repr(C). This is necessary because it prevents Rust from doing clever layout tricks, so that the layout of a field of type MaybeUninit<T> remains identical to the layout of a field of type T, regardless of its adjacent fields.
every field is MaybeUninit. This lets us assume_init() for the entire struct, and then later initialise each field individually.
Given that your struct is already repr(C), you can use an intermediate representation which uses MaybeUninit for every field. The repr(C) also means that we can transmute between the types once it is initialised, provided that the two structs have the same fields in the same order.
use std::mem::{self, MaybeUninit};

#[repr(C)]
struct MoreAConstruct {
    head: MaybeUninit<A>,
    more: MaybeUninit<i32>,
}

let b: MoreA = unsafe {
    // It's OK to assume a struct is initialized when all of its fields are MaybeUninit
    let mut b_construct = MaybeUninit::<MoreAConstruct>::uninit().assume_init();
    mock_ffi(b_construct.head.as_mut_ptr());
    b_construct.more = MaybeUninit::new(3);
    mem::transmute(b_construct)
};
It is now possible (since Rust 1.51) to initialize fields of any uninitialized struct using the std::ptr::addr_of_mut macro. This example is from the documentation:
You can use MaybeUninit, and the std::ptr::addr_of_mut macro, to
initialize structs field by field:
#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    list: Vec<u8>,
}

let foo = {
    let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();

    // Initializing the `name` field
    unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }

    // Initializing the `list` field
    // If there is a panic here, then the `String` in the `name` field leaks.
    unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }

    // All the fields are initialized, so we call `assume_init` to get an initialized Foo.
    unsafe { uninit.assume_init() }
};

assert_eq!(
    foo,
    Foo {
        name: "Bob".to_string(),
        list: vec![0, 1, 2],
    }
);
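Applied to the question's example, the same macro lets the FFI call initialize head in place before the remaining field is written. This is a sketch reusing the question's A, MoreA and mock_ffi definitions:

use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

fn main() {
    let mut uninit = MaybeUninit::<MoreA>::uninit();
    let ptr = uninit.as_mut_ptr();
    let b = unsafe {
        // Let the C code write the `head` field in place.
        mock_ffi(addr_of_mut!((*ptr).head));
        // Fill in the extra field, then declare the whole struct initialized.
        addr_of_mut!((*ptr).more).write(3);
        uninit.assume_init()
    };
    assert_eq!(&b, &MoreA { head: A(1, 2), more: 3 });
}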
