Deserialize data error using serde in Rust

I have three structs A, B and C that are fed to a method process_data() as a list of JSON values. All three structs are serde serializable/deserializable.
They are defined as follows:
#[derive(Serialize, Deserialize)]
struct A {
pub a: u32,
}
#[derive(Serialize, Deserialize)]
struct B {
pub b: u32,
}
#[derive(Serialize, Deserialize)]
struct C {
pub c: u32,
}
The function signature looks like this
fn process_data(data: String) {}
data can contain any of these structs, but it's guaranteed that at least one of A, B or C will be there:
data = "A{a: 1}"
or data = "[A{a:1}, B{b:1}, C{c:1}]"
or data = "[B{b:1}, A{a:1}, C{c:1}]"
I am looking for a way to process the variable data through serde within process_data, such that I can extract the structs from the data stream.
What I have tried so far.
I tried defining a struct called Collect which holds all the structs like this:-
#[derive(Serialize, Deserialize)]
struct Collect {
pub a1: Option<A>,
pub b1: Option<B>,
pub c1: Option<C>
}
and then process the data as follows:-
serde_json::from_str::<Collect>(data.as_str())
But this call returns an error. I would also like to preserve the order in which the items arrive in the vector.
I am not sure if serde will work in this case.

I'll assume you wanted the following JSON data:
[{"a":1}, {"b":1}, {"c":1}]
So you want to deserialize into a Vec<Collect>, which runs into two problems:
{"a":1} contains only the field of struct A, with no a1 wrapper around it. A missing level like this is normally handled by tagging the field with #[serde(flatten)].
{"a":1} doesn't contain the fields of b1 or c1. Missing fields are normally handled by tagging the field (an Option) with #[serde(default)].
Unfortunately, the combination of the two doesn't seem to work on deserialization.
Instead, you can deserialize to an untagged enum:
#[derive(Serialize, Deserialize, Debug)]
#[serde(untagged)]
enum CollectSer {
A { a: u32 },
B { b: u32 },
C { c: u32 },
}
If you do absolutely want to use your Collect as is, with the Options, you can do that still:
#[derive(Serialize, Deserialize, Debug, Default)]
#[serde(from = "CollectSer")]
struct Collect {
#[serde(flatten)]
pub a1: Option<A>,
#[serde(flatten)]
pub b1: Option<B>,
#[serde(flatten)]
pub c1: Option<C>,
}
impl From<CollectSer> for Collect {
fn from(cs: CollectSer) -> Self {
match cs {
CollectSer::A { a } => Collect {
a1: Some(A { a }),
..Default::default()
},
CollectSer::B { b } => Collect {
b1: Some(B { b }),
..Default::default()
},
CollectSer::C { c } => Collect {
c1: Some(C { c }),
..Default::default()
},
}
}
}
I suggest you just stick with the enum though, it's a lot more rustic.
(Apologies if I misguessed the structure of your data, but if so, I suppose you can at least point out the difference with this?)

Related

How to serialize a type that might be an arbitrary string?

I have an enum type that is defined as either one of a list of predefined strings or an arbitrary value (i.e. code that uses this type potentially wants to handle a few specific cases a certain way and also allow an arbitrary string).
I'm trying to represent this in Rust with serde the following way:
#[derive(Serialize, Debug)]
pub enum InvalidatedAreas {
#[serde(rename = "all")]
All,
#[serde(rename = "stacks")]
Stacks,
#[serde(rename = "threads")]
Threads,
#[serde(rename = "variables")]
Variables,
String(String),
}
When used as a member, I would like to serialize the above enum as simply a string value:
#[derive(Serialize, Debug)]
struct FooBar {
foo: InvalidatedAreas,
bar: InvalidatedAreas,
}
fn main() -> Result<(), serde_json::Error> {
    let foob = FooBar {
        foo: InvalidatedAreas::Stacks,
        bar: InvalidatedAreas::String("hello".to_string()),
    };
    // `?` only works because main returns a Result
    let j = serde_json::to_string(&foob)?;
    println!("{}", j);
    Ok(())
}
What I get is:
{"foo":"stacks","bar":{"String":"hello"}}
But I need
{"foo":"stacks","bar":"hello"}
If I add #[serde(untagged)] to the enum definition, I get
{"foo":null,"bar":"hello"}
How can I serialize this correctly?
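I've arrived at the following solution. It requires a bit of repetition, but it's not so bad. Note that Serialize has to be removed from the enum's derive list, since the trait is now implemented by hand. I'll leave the question open in case someone has a better idea.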
impl ToString for InvalidatedAreas {
fn to_string(&self) -> String {
match &self {
InvalidatedAreas::All => "all",
InvalidatedAreas::Stacks => "stacks",
InvalidatedAreas::Threads => "threads",
InvalidatedAreas::Variables => "variables",
InvalidatedAreas::String(other) => other
}
.to_string()
}
}
impl Serialize for InvalidatedAreas {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_str(&self.to_string())
}
}
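A possible variation on the solution above, sketched with the standard library only: implementing Display instead of ToString still provides to_string() for free, and a borrowing helper avoids allocating a String. The as_str method is a hypothetical helper introduced here, not part of the original post. With it, the body of the hand-written Serialize impl could presumably become serializer.serialize_str(self.as_str()).

```rust
use std::fmt;

#[derive(Debug)]
pub enum InvalidatedAreas {
    All,
    Stacks,
    Threads,
    Variables,
    String(String),
}

impl InvalidatedAreas {
    // Hypothetical helper: borrows the string form without allocating.
    pub fn as_str(&self) -> &str {
        match self {
            InvalidatedAreas::All => "all",
            InvalidatedAreas::Stacks => "stacks",
            InvalidatedAreas::Threads => "threads",
            InvalidatedAreas::Variables => "variables",
            InvalidatedAreas::String(other) => other,
        }
    }
}

// Display gives us to_string() through the blanket ToString impl.
impl fmt::Display for InvalidatedAreas {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

fn main() {
    println!("{}", InvalidatedAreas::Stacks);
    println!("{}", InvalidatedAreas::String("hello".to_string()));
}
```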

Serde MsgPack versioning

I need to serialize some data into files. For the sake of memory efficiency, I want to use the default compact serializer of MessagePack (MsgPack), as it only serializes field values without their names. I also want to be able to change the data structure in future versions, which obviously can't be done without also storing some meta/versioning information. I imagine the most efficient way to do it is to simply use some "header" field for that purpose. Here is an example:
pub struct Data {
pub version: u8,
pub items: Vec<Item>,
}
pub struct Item {
pub field_a: i32,
pub field_b: String,
pub field_c: i16, // Added in version 3
}
Can I do something like that in rmp-serde (or maybe some other crate?) - to somehow annotate that a certain struct field should only be taken into account for specific file versions?
You can achieve this by writing a custom deserializer like this:
use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize};
#[derive(Serialize)]
pub struct Data {
pub version: u8,
pub items: Vec<Item>,
}
#[derive(Serialize)]
pub struct Item {
pub field_a: i32,
pub field_b: String,
pub field_c: i16, // Added in version 3
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
// Inner structs, used for deserializing only
#[derive(Deserialize)]
pub struct InnerData {
version: u8,
items: Vec<InnerItem>,
}
#[derive(Deserialize)]
pub struct InnerItem {
field_a: i32,
field_b: String,
field_c: Option<i16>, // Added in version 3 - note that this field is optional
}
// Deserialize the inner structs
let inner_data = InnerData::deserialize(deserializer)?;
// Get the version so we can add custom logic based on the version later on
let version = inner_data.version;
// Map the InnerData/InnerItem structs to Data/Item using our version based logic
Ok(Data {
version,
items: inner_data
.items
.into_iter()
.map(|item| {
// Name the error type explicitly: `?` inside a closure cannot infer it
Ok::<_, D::Error>(Item {
field_a: item.field_a,
field_b: item.field_b,
field_c: if version < 3 {
42 // Default value
} else {
// Get the value of field_c
// If it's missing return an error, since it's required since version 3
// Otherwise return the value
item.field_c
.ok_or_else(|| D::Error::missing_field("field_c"))?
},
})
})
.collect::<Result<_, _>>()?,
})
}
}
A short explanation of how the deserializer works:
1. We create "dumb" inner structs which are copies of your structs, except that the "new" fields are optional.
2. We deserialize into these inner structs.
3. We map from the inner structs to the outer structs using version-based logic.
4. If one of the new fields is missing in a new version, we return a D::Error::missing_field error.
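The version-based mapping step can be looked at in isolation, without any serde machinery. The following is a minimal sketch under a few assumptions: upgrade_item, upgrade_items and FIELD_C_DEFAULT are hypothetical names, and a plain String stands in for the deserializer's error type.

```rust
#[derive(Debug, PartialEq)]
struct Item {
    field_a: i32,
    field_b: String,
    field_c: i16, // Added in version 3
}

// Mirror of Item in which the newer field is optional.
struct InnerItem {
    field_a: i32,
    field_b: String,
    field_c: Option<i16>,
}

// Hypothetical default for files written before version 3.
const FIELD_C_DEFAULT: i16 = 42;

// Map one inner item to the public type according to the file version.
fn upgrade_item(version: u8, inner: InnerItem) -> Result<Item, String> {
    let field_c = if version < 3 {
        FIELD_C_DEFAULT
    } else {
        // From version 3 on the field is required.
        inner
            .field_c
            .ok_or_else(|| "missing field `field_c`".to_string())?
    };
    Ok(Item {
        field_a: inner.field_a,
        field_b: inner.field_b,
        field_c,
    })
}

// Collecting into a Result stops at the first item that fails.
fn upgrade_items(version: u8, items: Vec<InnerItem>) -> Result<Vec<Item>, String> {
    items
        .into_iter()
        .map(|item| upgrade_item(version, item))
        .collect()
}

fn main() {
    let old = vec![InnerItem { field_a: 1, field_b: "x".to_string(), field_c: None }];
    println!("{:?}", upgrade_items(2, old));
}
```

Pulling the mapping into a named function with an explicit return type also sidesteps having to annotate the error type of a closure that uses `?`.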

Is there any convenient way to modify the field of a Rust struct while cloning it without unnecessarily cloning the field to be modified?

I have some code which needs to modify a field of a struct during a clone. I want to avoid unnecessarily cloning that field before modifying it. The original struct is wrapped in an Rc so I can only access it via a reference. Here is a simplified example with a naive attempt to use the struct update syntax:
use std::rc::Rc;
#[derive(Clone, Debug)]
struct B {
value: i32,
}
#[derive(Clone, Debug)]
struct A {
b: B,
c: i32,
}
impl A {
fn update_c(&self, c: i32) -> Self {
A {
c,
..*self
}
}
fn update_b(&self, b: B) -> Self {
A {
b,
..*self
}
}
}
fn main() {
let a = Rc::new(A {
b: B { value: 0 },
c: 0,
});
let a = Rc::new(a.update_c(-1));
}
Unfortunately this doesn't compile because my type does not implement Copy, and I don't really want it to implement Copy. I'm also guessing that in this case, the copy performed by the dereference in ..*self would be just as expensive as cloning the entire struct before modification.
The only way I've found so far to make this work is to manually clone each field:
fn update_c(&self, c: i32) -> Self {
Self {
c,
b: self.b.clone()
}
}
But this would not be convenient for a struct with many fields. Is there a different solution or crate out there with a custom derive to have this functionality available for every field without having to exponentially increase the amount of code you write to add a new field?

How do I avoid generating JSON when serializing a value that is null or a default value?

The serde_json::to_string() function will generate a string which may include null for an Option<T>, or 0 for a u32. This makes the output larger, so I want to ignore these sorts of values.
I want to simplify the JSON string output of the following structure:
use serde_derive::Serialize; // 1.0.82
#[derive(Serialize)]
pub struct WeightWithOptionGroup {
pub group: Option<String>,
pub proportion: u32,
}
When group is None and proportion is 0, the JSON string should be "{}"
The answer to "How do I change Serde's default implementation to return an empty object instead of null?" resolves the Option problem, but it offers no solution for 0.
The "Skip serializing field" page of the Serde documentation gave me the answer.
Here is the fixed code:
#[derive(Debug, Clone, Serialize, Deserialize, Default, PartialEq, Ord, PartialOrd, Eq)]
pub struct WeightWithOptionGroup {
#[serde(skip_serializing_if = "Option::is_none")]
#[serde(default)]
pub group: Option<String>,
#[serde(skip_serializing_if = "is_zero")]
#[serde(default)]
pub proportion: u32,
}
/// This is only used for serialize
#[allow(clippy::trivially_copy_pass_by_ref)]
fn is_zero(num: &u32) -> bool {
*num == 0
}
There are a couple of ways you could do this:
Mark each of the fields with a skip_serializing_if attribute to say when to skip them. This is much easier, but you'll have to remember to do it for every field.
Write your own Serde serialiser that does this custom JSON form. This is more work, but shouldn't be too bad, especially given you can still use the stock JSON deserialiser.
For those searching for how to skip serialization for some enum entries, you can do this:
#[derive(Serialize, Deserialize)]
enum Metadata {
    App,   // want this serialized
    Ebook, // want this serialized
    Empty, // don't want this serialized
}
// ItemType is not shown in this post; it is assumed to be defined elsewhere.
#[derive(Serialize, Deserialize)]
struct Request<'a> {
    request_id: &'a str,
    item_type: ItemType,
    #[serde(skip_serializing_if = "metadata_is_empty")]
    metadata: Metadata,
}
// Metadata has no borrowed data here, so it needs no lifetime parameter.
fn metadata_is_empty(metadata: &Metadata) -> bool {
    matches!(metadata, Metadata::Empty)
}

How to make a public struct where all fields are public without repeating `pub` for every field?

How can I define a public struct in Rust where all the fields are public without having to repeat pub modifier in front of every field?
A pub_struct macro would be ideal:
pub_struct! Foo {
a: i32,
b: f64,
// ...
}
which would be equivalent to:
pub struct Foo {
pub a: i32,
pub b: f64,
//...
}
macro_rules! pub_struct {
($name:ident {$($field:ident: $t:ty,)*}) => {
#[derive(Debug, Clone, PartialEq)] // ewww
pub struct $name {
$(pub $field: $t),*
}
}
}
Unfortunately, derive may only be applied to structs, enums and unions, so I don't know how to hoist those to the caller.
Usage:
pub_struct!(Foo {
a: i32,
b: f64,
});
It would be nice if I didn't need the parentheses and semicolon, i.e. if Rust supported reader macros.
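One way the derive might be hoisted to the caller is to let the macro accept arbitrary attributes in front of the struct name and forward them. This is a sketch rather than a definitive answer: the $meta fragment name and the optional trailing comma are choices made for this example.

```rust
macro_rules! pub_struct {
    // Zero or more attributes, then the name and the field list.
    ($(#[$meta:meta])* $name:ident { $($field:ident: $t:ty),* $(,)? }) => {
        // Forward the caller's attributes (derives, docs, ...) unchanged.
        $(#[$meta])*
        pub struct $name {
            $(pub $field: $t),*
        }
    };
}

pub_struct!(
    #[derive(Debug, Clone, PartialEq)]
    Foo {
        a: i32,
        b: f64,
    }
);

fn main() {
    let foo = Foo { a: 1, b: 2.0 };
    let copy = foo.clone();
    println!("{:?} == {:?}: {}", foo, copy, foo == copy);
}
```

The caller now decides which traits to derive, so the macro no longer hard-codes Debug, Clone and PartialEq.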
