Skip empty objects when deserializing array with serde - rust

I need to deserialize a JSON array of a type, let's call it Foo. I have implemented this and it works well for most inputs, but I have noticed that the latest version of the data will sometimes include erroneous empty objects.
Prior to this change, each Foo could be deserialized into the following enum:
#[derive(Deserialize)]
#[serde(untagged)]
pub enum Foo<'s> {
    Error {
        // My current workaround is using Option<Cow<'s, str>>
        error: Cow<'s, str>,
    },
    Value {
        a: u32,
        b: i32,
        // etc.
    },
}

/// Foo is part of a larger struct Bar.
// Note: #[serde(untagged)] only applies to enums, so it is not used on the struct.
#[derive(Deserialize)]
pub struct Bar<'s> {
    foos: Vec<Foo<'s>>,
    // etc.
}
This struct may represent one of the following JSON values:
// Valid inputs
[]
[{"a": 34, "b": -23},{"a": 33, "b": -2},{"a": 37, "b": 1}]
[{"error":"Unable to connect to network"}]
[{"a": 34, "b": -23},{"error":"Timeout"},{"a": 37, "b": 1}]
// Possible input for latest versions of data
[{},{},{},{},{},{},{"a": 34, "b": -23},{},{},{},{},{},{},{},{"error":"Timeout"},{},{},{},{},{},{}]
This does not happen very often, but it is enough to cause issues. Normally, the array should include 3 or fewer entries, but these extraneous empty objects break that convention. There is no meaningful information I can gain from parsing {}, and in the worst cases there can be hundreds of them in one array.
I do not want to error on parsing {} as the array still contains other meaningful values, but I do not want to include {} in my parsed data either. Ideally I would also be able to use tinyvec::ArrayVec<[Foo<'s>; 3]> instead of a Vec<Foo<'s>> to save memory and reduce time spent performing allocation during parsing, but I am unable to because of this issue.
How can I skip {} JSON values when deserializing an array with serde in Rust?
I also put together a Rust Playground with some test cases to try different solutions.

serde_with::VecSkipError provides a way to skip any elements that fail to deserialize. Note that it ignores every error, not only the one caused by the empty object {}, so it might be too permissive.
#[serde_with::serde_as]
#[derive(Deserialize)]
pub struct Bar<'s> {
#[serde_as(as = "serde_with::VecSkipError<_>")]
foos: Vec<Foo<'s>>,
}

The simplest, but not performant, solution would be to define an enum that captures both the Foo case and the empty case, deserialize into a vector of those, and then filter that vector to get just the nonempty ones.
#[derive(Deserialize, Debug)]
#[serde(untagged)]
pub enum FooDe<'s> {
Nonempty(Foo<'s>),
Empty {},
}
fn main() {
let json = r#"[
{},{},{},{},{},{},
{"a": 34, "b": -23},
{},{},{},{},{},{},{},
{"error":"Timeout"},
{},{},{},{},{},{}
]"#;
let foo_des = serde_json::from_str::<Vec<FooDe>>(json).unwrap();
let foos = foo_des
.into_iter()
.filter_map(|item| {
use FooDe::*;
match item {
Nonempty(foo) => Some(foo),
Empty {} => None,
}
})
.collect();
let bar = Bar { foos };
println!("{:?}", bar);
// Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}
Conceptually this is simple but you're allocating a lot of space for Empty cases that you ultimately don't need. Instead, you can control exactly how deserialization is done by implementing it yourself.
struct BarVisitor<'s> {
marker: PhantomData<fn() -> Bar<'s>>,
}
impl<'s> BarVisitor<'s> {
fn new() -> Self {
BarVisitor {
marker: PhantomData,
}
}
}
// This is the trait that informs Serde how to deserialize Bar.
impl<'de, 's: 'de> Deserialize<'de> for Bar<'s> {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
impl<'de, 's: 'de> Visitor<'de> for BarVisitor<'s> {
// The type that our Visitor is going to produce.
type Value = Bar<'s>;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("a list of objects")
}
fn visit_seq<V>(self, mut access: V) -> Result<Self::Value, V::Error>
where
V: SeqAccess<'de>,
{
let mut foos = Vec::new();
while let Some(foo_de) = access.next_element::<FooDe>()? {
if let FooDe::Nonempty(foo) = foo_de {
foos.push(foo)
}
}
let bar = Bar { foos };
Ok(bar)
}
}
// Instantiate our Visitor and ask the Deserializer to drive
// it over the input data, resulting in an instance of Bar.
deserializer.deserialize_seq(BarVisitor::new())
}
}
fn main() {
let json = r#"[
{},{},{},{},{},{},
{"a": 34, "b": -23},
{},{},{},{},{},{},{},
{"error":"Timeout"},
{},{},{},{},{},{}
]"#;
let bar = serde_json::from_str::<Bar>(json).unwrap();
println!("{:?}", bar);
// Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}
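To make the allocation difference between the two approaches concrete, here is a minimal std-only sketch. It does not use serde: the simplified Foo and FooDe types and the sample_input helper are hypothetical stand-ins for the parsed elements. It compares collecting every element and filtering afterwards with dropping empty elements as they arrive, which is what the visitor above does.

```rust
// Simplified stand-ins for the types above (no serde, no lifetimes).
#[derive(Debug, Clone, PartialEq)]
enum Foo {
    Error { error: String },
    Value { a: u32, b: i32 },
}

#[derive(Debug, Clone, PartialEq)]
enum FooDe {
    Nonempty(Foo),
    Empty,
}

// Hypothetical input: many empty objects around two meaningful entries.
fn sample_input() -> Vec<FooDe> {
    let mut items = vec![FooDe::Empty; 6];
    items.push(FooDe::Nonempty(Foo::Value { a: 34, b: -23 }));
    items.extend(vec![FooDe::Empty; 7]);
    items.push(FooDe::Nonempty(Foo::Error { error: "Timeout".to_string() }));
    items.extend(vec![FooDe::Empty; 6]);
    items
}

// Approach 1: buffer every element first, then filter.
// Returns the kept values plus the number of elements that were buffered.
fn collect_then_filter(input: impl Iterator<Item = FooDe>) -> (Vec<Foo>, usize) {
    let all: Vec<FooDe> = input.collect();
    let buffered = all.len();
    let foos = all
        .into_iter()
        .filter_map(|item| match item {
            FooDe::Nonempty(foo) => Some(foo),
            FooDe::Empty => None,
        })
        .collect();
    (foos, buffered)
}

// Approach 2: drop empty elements as they arrive, like the visitor does.
fn filter_while_streaming(input: impl Iterator<Item = FooDe>) -> (Vec<Foo>, usize) {
    let mut foos = Vec::new();
    for item in input {
        if let FooDe::Nonempty(foo) = item {
            foos.push(foo);
        }
    }
    let buffered = foos.len();
    (foos, buffered)
}

fn main() {
    let (a, buffered_a) = collect_then_filter(sample_input().into_iter());
    let (b, buffered_b) = filter_while_streaming(sample_input().into_iter());
    assert_eq!(a, b);
    println!("buffered elements: {} vs {}", buffered_a, buffered_b);
}
```

Both approaches yield the same two values; only the number of elements held in memory along the way differs.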

Related

Deserialize using a function of the tag

An API returns this internally tagged structure, with "topic" being the tag:
{
    "topic": "Car",
    "name": "BMW",
    "HP": 250
}
This can be deserialized with
#[derive(Serialize, Deserialize)]
#[serde(tag = "topic")]
pub enum Catalog {
    CarEntry(Car),
    // ... (other types)
}
#[derive(Serialize, Deserialize)]
pub struct Car {
    pub name: String,
    pub HP: i32,
}
It turns out that instead of reporting the topic as just Car, the API actually sends Car.product1 or Car.product2 etc.
This breaks the deserialization, because the deserializer doesn't know what the type is based on the string. Is there a way to supply a function to chop off the type string so that the correct model is found?
I don't think serde provides a way to mangle the tag before using it (at least I don't see anything relevant). And the generated deserializers for tagged enums are relatively complex, with internal caching if the tag isn't the first field, and whatnot, so I wouldn't want to reproduce that in a custom deserializer.
The cheapest (but not necessarily most efficient) shot at this is to deserialize to serde_json::Value first, manually process the tag, and then deserialize the serde_json::Values to whatever struct you want.
Do that in a custom deserializer, and it starts looking reasonable:
impl<'de> Deserialize<'de> for Catalog {
    fn deserialize<D>(d: D) -> Result<Self, <D as Deserializer<'de>>::Error>
    where
        D: Deserializer<'de>,
    {
        use serde_json::{Map, Value};
        #[derive(Deserialize)]
        struct Pre {
            topic: String,
            #[serde(flatten)]
            data: Map<String, Value>,
        }
        let v = Pre::deserialize(d)?;
        // Now you can mangle Pre any way you want to get your final structs.
        match v.topic.as_bytes() {
            [b'C', b'a', b'r', b'.', _rest @ ..] => Ok(Catalog::CarEntry(
                serde_json::from_value(v.data.into()).map_err(de::Error::custom)?,
            )),
            [b'B', b'a', b'r', b'.', _rest @ ..] => Ok(Catalog::BarEntry(
                serde_json::from_value(v.data.into()).map_err(de::Error::custom)?,
            )),
            _ => return Err(de::Error::unknown_variant(&v.topic, &["Car.…", "Bar.…"])),
        }
    }
}
Btw, what do you want to do with the suffix of topic? Throw it away? How do you plan on handling serialization if you do throw it away?
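If the suffix should be kept rather than thrown away, one option is to split the topic at the first '.' before matching on it. The sketch below uses only the standard library. split_topic is a hypothetical helper, not part of serde, and it assumes the base tag itself never contains a '.'.

```rust
/// Splits a topic such as "Car.product1" into its base tag and optional suffix.
/// Hypothetical helper for illustration; assumes the base tag contains no '.'.
fn split_topic(topic: &str) -> (&str, Option<&str>) {
    match topic.split_once('.') {
        Some((base, suffix)) => (base, Some(suffix)),
        None => (topic, None),
    }
}

fn main() {
    for topic in ["Car.product1", "Car.product2", "Car"] {
        let (base, suffix) = split_topic(topic);
        println!("{}: base = {}, suffix = {:?}", topic, base, suffix);
    }
}
```

The base could then drive the match in a custom deserializer, while the suffix is stored in a field so that serialization can reconstruct the original topic.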
You can use an enum with struct variants directly instead of defining an extra struct type.
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize, Debug)]
#[serde(tag = "topic")]
pub enum Catalog {
Car { name: String, hp: i32 }
}
fn main() {
let car = Catalog::Car { name: String::from("BMW"), hp: 2000 };
// Convert the Car to a JSON string.
let serialized = serde_json::to_string(&car).unwrap();
// Prints serialized = {"topic":"Car","name":"BMW","hp":2000}
println!("serialized = {}", serialized);
// Convert the JSON string back to a Car.
let deserialized: Catalog = serde_json::from_str(&serialized).unwrap();
// Prints deserialized = Car { name: "BMW", hp: 2000 }
println!("deserialized = {:?}", deserialized);
}
You can use #[serde(rename(...))] to rename the variant in the output.
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize, Debug)]
#[serde(tag = "topic")]
pub enum Catalog {
#[serde(rename(serialize = "Car", deserialize = "CarEntry"))]
CarEntry(Car),
}
#[derive(Serialize, Deserialize, Debug)]
pub struct Car {
pub name: String,
pub hp: i32
}
fn main() {
let car = Car { name: String::from("BMW"), hp: 2000 };
let catalog = Catalog::CarEntry(car);
// Convert the Car to a JSON string.
let serialized = serde_json::to_string(&catalog).unwrap();
// Prints serialized = {"topic":"Car","name":"BMW","hp":2000}
println!("serialized = {}", serialized);
// Convert the JSON string back to a Car.
let deserialized: Car = serde_json::from_str(&serialized).unwrap();
// Prints deserialized = Car { name: "BMW", hp: 2000 }
println!("deserialized = {:?}", deserialized);
}

Can one specify serde's rename_all rule at runtime?

I have a data model that I would like to be deserialized from "camelCase" to the rust standard "snake_case" when reading from a source, X. But I'd like to leave it in "snake_case" when reading or writing to another source, Y.
For example, the following code,
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Data {
    foo_bar: String,
    hello_world: String,
}
can only be encoded and decoded in camel case. Even if I manually defined my Serialize and Deserialize implementations, I can't define multiple for the same struct. I could define a second struct that's a copy/paste of the other and then derive but that method would get tedious with multiple large structs. What I would really like to do is specify that rename_all attribute at run-time. But I'm not seeing any way to do that in serde's API.
I think the best way (sigh) is to just write out one struct Data_ per #[serde(rename_all = ...)], then write one additional struct Data that will be the in-memory representation (which won't be serializable, to remove ambiguity), then implement From in both directions for the Data_s and Data so that they're interconvertible.
Thankfully, we can use a macro so that we only have to specify the fields once. (It is incredibly disgusting nonetheless.)
use serde::{Deserialize, Serialize}; // 1.0.130
use serde_json; // 1.0.69
macro_rules! interconvertible {
($T:ident <-> $U:ident, $($field_name:ident),*) => {
impl From<$T> for $U {
fn from(t: $T) -> Self {
let $T { $($field_name),* } = t;
Self { $($field_name),* }
}
}
impl From<$U> for $T {
fn from(u: $U) -> Self {
let $U { $($field_name),* } = u;
Self { $($field_name),* }
}
}
};
}
macro_rules! create_data_structs {
($($field_name:ident: $field_type:ty),* $(,)?) => {
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
struct DataX {
$($field_name: $field_type),*
}
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
struct DataY {
$($field_name: $field_type),*
}
#[derive(Debug)]
struct Data {
$($field_name: $field_type),*
}
interconvertible!(DataX <-> Data, $($field_name),*);
interconvertible!(DataY <-> Data, $($field_name),*);
}
}
create_data_structs!(foo_bar: String, hello_world: String);
fn main() -> serde_json::Result<()> {
let x1: DataX = serde_json::from_str(r#"{"fooBar": "a", "helloWorld": "b"}"#)?;
let y1: DataY = serde_json::from_str(r#"{"foo_bar": "a", "hello_world": "b"}"#)?;
println!("{:?}, {:?}", x1, y1);
let x2: Data = x1.into();
let y2: Data = y1.into();
println!("{:?}, {:?}", x2, y2);
let x_string = serde_json::to_string(&DataX::from(x2))?;
let y_string = serde_json::to_string(&DataY::from(y2))?;
println!("{:?}, {:?}", x_string, y_string);
Ok(())
}
The output is:
DataX { foo_bar: "a", hello_world: "b" }, DataY { foo_bar: "a", hello_world: "b" }
Data { foo_bar: "a", hello_world: "b" }, Data { foo_bar: "a", hello_world: "b" }
"{\"fooBar\":\"a\",\"helloWorld\":\"b\"}", "{\"foo_bar\":\"a\",\"hello_world\":\"b\"}"
Since I'm only ever decoding from source X, I can use the #[serde(alias = "...")] attribute. So my use case above would be
#[derive(Serialize, Deserialize)]
struct Data {
    #[serde(alias = "fooBar")]
    foo_bar: String,
    #[serde(alias = "helloWorld")]
    hello_world: String,
}
It's still a little tedious but better than an intermediate struct. It won't work though if I want to decode or encode to different cases.
(I'm not going to mark this as an answer because it's a work-around for my specific use case. If anyone has a more generic solution feel free to answer.)
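A more generic direction, sketched here under assumptions, is to convert the key names themselves at run time (for example on an intermediate map of keys to values) and keep a single snake_case struct. The helpers below are hypothetical and std-only. They handle only simple ASCII names and are not serde's own renaming logic.

```rust
/// Converts a simple ASCII camelCase name to snake_case,
/// e.g. "fooBar" becomes "foo_bar". Hypothetical helper.
fn camel_to_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for ch in name.chars() {
        if ch.is_ascii_uppercase() {
            // Start a new word before every uppercase letter except a leading one.
            if !out.is_empty() {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Converts a snake_case name to camelCase,
/// e.g. "foo_bar" becomes "fooBar". Hypothetical helper.
fn snake_to_camel(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for ch in name.chars() {
        if ch == '_' {
            // Drop the underscore and capitalize the following character.
            upper_next = true;
        } else if upper_next {
            out.push(ch.to_ascii_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    for key in ["fooBar", "helloWorld"] {
        let snake = camel_to_snake(key);
        println!("{} -> {} -> {}", key, snake, snake_to_camel(&snake));
    }
}
```

Which direction is applied can then be chosen per source at run time, at the cost of an intermediate representation whose keys are rewritten before the derived implementation sees them.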

serde: deserialize a field based on the value of another field

I would like to deserialize a wire format, like this JSON, into the Data structure below, and I am failing to write the serde Deserialize implementations for the corresponding Rust types.
{ "type": "TypeA", "value": { "id": "blah", "content": "0xa1b.." } }
enum Content {
TypeA(Vec<u8>),
TypeB(BigInt),
}
struct Value {
id: String,
content: Content,
}
struct Data {
typ: String,
value: Value,
}
The difficulty is selecting the correct value of the Content enumeration, which is based on the typ value.
As far as I know, deserialization in serde is stateless, and hence there is no way of either
- knowing what the value of typ is at the time of deserialization of content (even though the deserialization order is guaranteed), or
- injecting the value of typ into the deserializer and then collecting it.
How can this be achieved with serde?
I have looked at
- serde_state, but I cannot get the macros working, and this library wraps serde, which worries me
- DeserializeSeed, but my understanding is that it must be used in place of Deserialize for all types, and my data model is big
The existing SO answers usually exploit the fact that the related fields are at the same level. This is not the case here: the actual data model is big and deep, and the fields are "far apart".
This is much simpler with an adjacently tagged enum, at the cost of changing your data structure:
use serde::{Deserialize, Deserializer}; // 1.0.130
use serde_json; // 1.0.67
#[derive(Debug, Deserialize)]
#[serde(tag = "type", content = "value")]
enum Data {
TypeA(Value<String>),
TypeB(Value<u32>),
}
#[derive(Debug, Deserialize)]
struct Value<T> {
id: String,
content: T,
}
fn main() {
let input = r#"{"type": "TypeA", "value": { "id": "blah", "content": "0xa1b..."}}"#;
let data: Data = serde_json::from_str(input).unwrap();
println!("{:?}", data);
}
Alternatively, you can write your own custom deserializer using an intermediate serde_json::Value:
use serde::{Deserialize, Deserializer};// 1.0.130
use serde_json; // 1.0.67
#[derive(Debug)]
enum Content {
TypeA(String),
TypeB(String),
}
#[derive(Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Debug)]
struct Data {
typ: String,
value: Value,
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let json: serde_json::value::Value = serde_json::value::Value::deserialize(deserializer)?;
let typ = json.get("type").expect("type").as_str().unwrap();
let value = json.get("value").expect("value");
let id = value.get("id").expect("id").as_str().unwrap();
let content = value.get("content").expect("content").as_str().unwrap();
Ok(Data {
typ: typ.to_string(),
value: Value {
id: id.to_string(),
content: {
match typ {
"TypeA" => Content::TypeA(content.to_string()),
"TypeB" => Content::TypeB(content.to_string()),
_ => panic!("Invalid type, but this should be an error not a panic"),
}
}
}
})
}
}
fn main() {
let input = r#"{"type": "TypeA", "value": { "id": "blah", "content": "0xa1b..."}}"#;
let data: Data = serde_json::from_str(input).unwrap();
println!("{:?}", data);
}
Disclaimer: I didn't handle errors correctly, and you could also extract the content matching into a function, for example. The above code is just to illustrate the main idea.
There are a few different ways this can be solved, e.g. with a custom impl Deserialize for Data that deserializes into a serde_json::Value and then manually juggles between the types.
For a somewhat related example, check out this answer that I wrote in the past. It's not a one-to-one solution, but it might give some hints for implementing Deserialize manually for what you want.
That being said, I personally prefer to minimize how often I have to impl Deserialize manually, and instead deserialize into another type and have it converted automatically using #[serde(from = "FromType")].
First, instead of type_: String, I'd suggest we introduce enum ContentType.
#[derive(Deserialize, Clone, Copy, Debug)]
enum ContentType {
TypeA,
TypeB,
TypeC,
TypeD,
}
Now, let's consider the types you introduced. I've added a few extra variants to Content, as you mentioned the variants can be different.
#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum Content {
TypeA(Vec<u8>),
TypeB(Vec<u8>),
TypeC(String),
TypeD { foo: i32, bar: i32 },
}
#[derive(Deserialize, Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Deserialize, Debug)]
#[serde(try_from = "IntermediateData")]
struct Data {
#[serde(alias = "type")]
type_: ContentType,
value: Value,
}
Nothing crazy yet or much different. All the "magic" happens in the IntermediateData type, along with the impl TryFrom.
First, let's introduce a check_type(), which takes a ContentType and checks it against the Content. If the Content variant doesn't match the ContentType variant, it converts the content.
In short, with #[serde(untagged)], serde tries the variants in order and returns the first one that deserializes successfully, if any. So if it can deserialize a Vec<u8>, it will always produce Content::TypeA(). Knowing this, if the ContentType is TypeB and the Content is TypeA, check_type() simply changes it to TypeB.
impl Content {
    // TODO: impl proper error type instead of `String`
    fn check_type(self, type_: ContentType) -> Result<Self, String> {
        match (type_, self) {
            (ContentType::TypeA, content @ Self::TypeA(_)) => Ok(content),
            (ContentType::TypeB, Self::TypeA(content)) => Ok(Self::TypeB(content)),
            (ContentType::TypeC | ContentType::TypeD, content) => Ok(content),
            (type_, content) => Err(format!(
                "unexpected combination of {:?} and {:?}",
                type_, content
            )),
        }
    }
}
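The "first matching variant wins" behaviour, and why a correction step is needed, can be imitated without serde. The following is only a std-only analogy, not how serde is implemented: Raw is a hypothetical stand-in for an already-parsed JSON value, the as_type_* helpers are made up for illustration, and this check_type is a simplified, stricter variant of the method above.

```rust
// Hypothetical stand-in for an already-parsed JSON value.
#[derive(Debug, PartialEq)]
enum Raw {
    Bytes(Vec<u8>),
    Text(String),
}

#[derive(Debug, PartialEq)]
enum Content {
    TypeA(Vec<u8>),
    TypeB(Vec<u8>),
    TypeC(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ContentType {
    TypeA,
    TypeB,
    TypeC,
}

fn as_type_a(raw: &Raw) -> Option<Content> {
    match raw {
        Raw::Bytes(bytes) => Some(Content::TypeA(bytes.clone())),
        Raw::Text(_) => None,
    }
}

fn as_type_b(raw: &Raw) -> Option<Content> {
    match raw {
        Raw::Bytes(bytes) => Some(Content::TypeB(bytes.clone())),
        Raw::Text(_) => None,
    }
}

fn as_type_c(raw: &Raw) -> Option<Content> {
    match raw {
        Raw::Text(text) => Some(Content::TypeC(text.clone())),
        Raw::Bytes(_) => None,
    }
}

// Try each variant in declaration order and return the first that fits.
// TypeB can never win for bytes, because TypeA is always tried before it.
fn first_match(raw: &Raw) -> Option<Content> {
    let attempts: [fn(&Raw) -> Option<Content>; 3] = [as_type_a, as_type_b, as_type_c];
    attempts.iter().find_map(|attempt| attempt(raw))
}

// Simplified correction step: relabel TypeA as TypeB when the tag says so.
fn check_type(content: Content, tag: ContentType) -> Result<Content, String> {
    match (tag, content) {
        (ContentType::TypeA, c @ Content::TypeA(_)) => Ok(c),
        (ContentType::TypeB, Content::TypeA(bytes)) => Ok(Content::TypeB(bytes)),
        (ContentType::TypeC, c @ Content::TypeC(_)) => Ok(c),
        (tag, c) => Err(format!("unexpected combination of {:?} and {:?}", tag, c)),
    }
}

fn main() {
    let raw = Raw::Bytes(vec![0, 1, 2, 3]);
    // The bytes always come back as TypeA, because TypeA is tried first.
    let content = first_match(&raw).unwrap();
    println!("{:?}", content);
    // The tag from the sibling field is then used to correct the variant.
    println!("{:?}", check_type(content, ContentType::TypeB));
    // Text only fits the third attempt.
    println!("{:?}", first_match(&Raw::Text("foo".to_string())));
}
```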
Now all we need is the intermediate IntermediateData, along with a TryFrom conversion, which calls check_type() on the Content.
#[derive(Deserialize, Debug)]
struct IntermediateData {
#[serde(alias = "type")]
type_: ContentType,
value: Value,
}
impl TryFrom<IntermediateData> for Data {
// TODO: impl proper error type instead of `String`
type Error = String;
fn try_from(mut data: IntermediateData) -> Result<Self, Self::Error> {
data.value.content = data.value.content.check_type(data.type_)?;
Ok(Data {
type_: data.type_,
value: data.value,
})
}
}
That's all. Now we can test it against the following:
// serde = { version = "1", features = ["derive"] }
// serde_json = "1.0"
use std::convert::TryFrom;
use serde::Deserialize;
// ... all the previous code ...
fn main() {
let json = r#"{ "type": "TypeA", "value": { "id": "foo", "content": [0, 1, 2, 3] } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeB", "value": { "id": "foo", "content": [0, 1, 2, 3] } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeC", "value": { "id": "bar", "content": "foo" } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeD", "value": { "id": "baz", "content": { "foo": 1, "bar": 2 } } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
}
Then it correctly results in Datas with Content::TypeA, Content::TypeB, Content::TypeC, and the last one Content::TypeD.
Lastly, there is issue #939, which talks about adding a #[serde(validate = "...")] attribute. However, it was created in 2017, so I wouldn't hold my breath on it.
DeserializeSeed can be mixed with normal Deserialize code. It does not need to be used for all types. Here, it is enough to use it to deserialize Value.
use serde::de::{DeserializeSeed, IgnoredAny, MapAccess, Visitor};
use serde::*;
use std::fmt;
#[derive(Debug)]
enum ContentType {
A,
B,
Unknown,
}
#[derive(Debug)]
enum Content {
TypeA(String),
TypeB(i32),
}
#[derive(Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Debug)]
struct Data {
typ: String,
value: Value,
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
struct DataVisitor;
impl<'de> Visitor<'de> for DataVisitor {
type Value = Data;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct Data")
}
fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
where
A: MapAccess<'de>,
{
let mut typ = None;
let mut value = None;
while let Some(key) = access.next_key()? {
match key {
"type" => {
typ = Some(access.next_value()?);
}
"value" => {
let seed = match typ.as_deref() {
Some("TypeA") => ContentType::A,
Some("TypeB") => ContentType::B,
_ => ContentType::Unknown,
};
value = Some(access.next_value_seed(seed)?);
}
_ => {
access.next_value::<IgnoredAny>()?;
}
}
}
Ok(Data {
typ: typ.unwrap(),
value: value.unwrap(),
})
}
}
deserializer.deserialize_map(DataVisitor)
}
}
impl<'de> DeserializeSeed<'de> for ContentType {
type Value = Value;
fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
where
D: Deserializer<'de>,
{
struct ValueVisitor(ContentType);
impl<'de> Visitor<'de> for ValueVisitor {
type Value = Value;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct Value")
}
fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
where
A: MapAccess<'de>,
{
let mut id = None;
let mut content = None;
while let Some(key) = access.next_key()? {
match key {
"id" => {
id = Some(access.next_value()?);
}
"content" => {
content = Some(match self.0 {
ContentType::A => Content::TypeA(access.next_value()?),
ContentType::B => Content::TypeB(access.next_value()?),
ContentType::Unknown => {
panic!("Should not happen if type happens to occur before value, but JSON is unordered.");
}
});
}
_ => {
access.next_value::<IgnoredAny>()?;
}
}
}
Ok(Value {
id: id.unwrap(),
content: content.unwrap(),
})
}
}
deserializer.deserialize_map(ValueVisitor(self))
}
}
fn main() {
let j = r#"{"type": "TypeA", "value": {"id": "blah", "content": "0xa1b.."}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap());
let j = r#"{"type": "TypeB", "value": {"id": "blah", "content": 666}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap());
let j = r#"{"type": "TypeB", "value": {"id": "blah", "content": "Foobar"}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap_err());
}
The main downside of this solution is that you lose the possibility of deriving the code.

Deserialization of json with serde by a numerical value as type identifier

I'm quite new to Rust and come from an OOP background, so maybe I have misunderstood some Rust basics.
I want to parse a fixed JSON structure with serde. This structure represents one of several message types. Each message has a numeric type attribute to distinguish it. The exact structure of the individual message types mostly differs, but some can also be identical.
{"type": 1, "sender_id": 4, "name": "sender", ...}
{"type": 2, "sender_id": 5, "measurement": 3.1415, ...}
{"type": 3, "sender_id": 6, "measurement": 13.37, ...}
...
First of all, I defined an enum to distinguish between message types, and a struct for each type of message without a field storing the type.
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
T1(Type1),
T2(Type2),
T3(Type3),
// ...
}
#[derive(Debug, Serialize, Deserialize)]
struct Type1 {
sender_id: u32,
name: String,
// ...
}
#[derive(Debug, Serialize, Deserialize)]
struct Type2 {
sender_id: u32,
measurement: f64,
// ...
}
#[derive(Debug, Serialize, Deserialize)]
struct Type3 {
sender_id: u32,
measurement: f64,
// ...
}
// ...
When I try to turn a string to a Message object, I get an error.
let message = r#"{"type":1,"sender_id":123456789,"name":"sender"}"#;
let message: Message = serde_json::from_str(message)?; // error here
// Error: Custom { kind: InvalidData, error: Error("invalid type: integer `1`, expected variant identifier", line: 1, column: 9) }
So, as I understand it, serde tries to figure out the type of the current message, but it needs a string for that. I also tried to write my own deserialize() function: I tried to get the numerical value of the type key and create the specific object based on that value.
How do I have to implement deserialize() to extract the type of the message and create the specific message object? Is it possible to do this without writing a deserialize() function for each Type1/2/3/... struct?
impl<'de> Deserialize<'de> for Message {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // which functions do I have to call?
    }
}
Or is there a better solution to achieve my deserialization?
I prepared a playground for this issue: Playground
Serde doesn't support integer tags yet (see issue #745).
If you're able to change what's producing the data so that type becomes a string, i.e. "1" instead of 1, then you can get it working simply by using #[serde(rename)].
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
#[serde(rename = "1")]
T1(Type1),
#[serde(rename = "2")]
T2(Type2),
#[serde(rename = "3")]
T3(Type3),
// ...
}
If that's not an option, then you indeed need to create a custom deserializer. The shortest option in terms of code is likely to deserialize into a serde_json::Value, match on the type, and then deserialize the serde_json::Value into the correct Type{1,2,3}.
use serde_json::Value;
impl<'de> serde::Deserialize<'de> for Message {
fn deserialize<D: serde::Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
let value = Value::deserialize(d)?;
Ok(match value.get("type").and_then(Value::as_u64).unwrap() {
1 => Message::T1(Type1::deserialize(value).unwrap()),
2 => Message::T2(Type2::deserialize(value).unwrap()),
3 => Message::T3(Type3::deserialize(value).unwrap()),
type_ => panic!("unsupported type {:?}", type_),
})
}
}
You'll probably want to perform some proper error handling, instead of unwrapping and panicking.
If you need serialization as well, then you will likewise need a custom serializer. For this you could create a new type to serialize into, as you cannot use Message.
use serde::Serializer;
impl Serialize for Message {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
#[derive(Serialize)]
#[serde(untagged)]
enum Message_<'a> {
T1(&'a Type1),
T2(&'a Type2),
T3(&'a Type3),
}
#[derive(Serialize)]
struct TypedMessage<'a> {
#[serde(rename = "type")]
t: u64,
#[serde(flatten)]
msg: Message_<'a>,
}
let msg = match self {
Message::T1(t) => TypedMessage { t: 1, msg: Message_::T1(t) },
Message::T2(t) => TypedMessage { t: 2, msg: Message_::T2(t) },
Message::T3(t) => TypedMessage { t: 3, msg: Message_::T3(t) },
};
msg.serialize(serializer)
}
}
When #[serde(flatten)] is used, the derived code relies on serde::private::ser::FlatMapSerializer, which is hidden from the documentation. Instead of creating new types, you could use SerializeMap and FlatMapSerializer directly.
However, be warned: since it is undocumented, any future release of serde could break your code if you use FlatMapSerializer directly.
use serde::{private::ser::FlatMapSerializer, ser::SerializeMap, Serializer};
impl Serialize for Message {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut s = serializer.serialize_map(None)?;
let type_ = &match self {
Message::T1(_) => 1,
Message::T2(_) => 2,
Message::T3(_) => 3,
};
s.serialize_entry("type", &type_)?;
match self {
Message::T1(t) => t.serialize(FlatMapSerializer(&mut s))?,
Message::T2(t) => t.serialize(FlatMapSerializer(&mut s))?,
Message::T3(t) => t.serialize(FlatMapSerializer(&mut s))?,
}
s.end()
}
}

Is there a way to create a copy of an enum with some field values updated?

I have an enum with several record variants:
enum A {
Var1 { a: i64, b: i64 },
Var2 { c: i32, d: i32 },
}
I want to create a modified copy of such an enum (with different behavior for each variant). I know I can do this:
match a {
A::Var1 { a, b } => A::Var1 { a: new_a, b },
A::Var2 { c, d } => A::Var2 { c, d: new_d },
}
However each variant has quite a few fields, and I'd prefer not to explicitly pass them all. Is there any way to say "clone this enum, except use this value for field x instead of the cloned value"?
Not exactly.
There's "functional record update syntax", but it's only for structs:
struct Foo {
bar: u8,
baz: u8,
quux: u8,
}
fn foo() {
let foobar = Foo {
bar: 1,
baz: 2,
quux: 3,
};
let bazquux = Foo { baz: 4, ..foobar };
}
Best you can do without creating structs for each variant is something like this:
let mut altered = x.clone();
match &mut altered {
A::Var1 { a, .. } => *a = new_a,
A::Var2 { d, .. } => *d = new_d,
};
altered
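Put together as a complete program, the clone-then-mutate approach could look like the following minimal sketch. The function name with_updates and its parameters are hypothetical, and A needs to derive Clone for this to work.

```rust
#[derive(Debug, Clone, PartialEq)]
enum A {
    Var1 { a: i64, b: i64 },
    Var2 { c: i32, d: i32 },
}

/// Returns a modified copy: Var1 gets a new `a`, Var2 gets a new `d`.
/// All other fields keep their cloned values.
fn with_updates(original: &A, new_a: i64, new_d: i32) -> A {
    let mut altered = original.clone();
    match &mut altered {
        // `..` ignores the remaining fields, so they need not be listed.
        A::Var1 { a, .. } => *a = new_a,
        A::Var2 { d, .. } => *d = new_d,
    }
    altered
}

fn main() {
    let first = A::Var1 { a: 1, b: 2 };
    let second = A::Var2 { c: 3, d: 4 };
    println!("{:?}", with_updates(&first, 10, 40));
    println!("{:?}", with_updates(&second, 10, 40));
}
```

Adding a field to a variant does not require touching with_updates, since each arm names only the field it changes.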
I'm afraid you're hitting one of the restrictions on Rust based on its design and your only real solution is mutation in-place and writing a mutator function or four.
The problem with enums is that you need to match to be able to do anything with them. Until that point, Rust knows or infers very little about which variant the value actually is. An additional problem is the lack of any reflection-like ability to query a type and figure out whether it has a given field, and the inability to do anything but exhaustively match all contents.
Honestly, the cleanest way may actually depend on the purpose of your mutations. Are they a defined set of changes to an enum based on a business concern of some sort? If so, you may actually want to wrap your logic into a trait extension and use that to encapsulate the logic.
Consider, for instance, a very contrived example. We're building an application that has to deal with different items and apply taxes to them. Said taxes depend on the type of products, and for some reason, all our products are represented by variants of an enum, like so:
#[derive(Debug)]
enum Item {
    Food { price: u8, calories: u8 },
    Technology { price: u8 },
}

trait TaxExt {
    fn apply_tax(&mut self);
}

impl TaxExt for Item {
    fn apply_tax(&mut self) {
        match self {
            Item::Food { price, .. } => {
                // Our food costs double, for tax reasons
                *price *= 2;
            }
            Item::Technology { price } => {
                // Technology has a 1 unit tax
                *price += 1;
            }
        }
    }
}

fn main() {
    let mut obj = Item::Food {
        price: 3,
        calories: 200,
    };
    obj.apply_tax();
    println!("{:?}", obj);
}
Assuming you can split your logic like so, it is probably the cleanest way to structure this.
