How to ignore unknown enum variant while deserializing? - rust

I have several non-exhaustive enums that I need to handle gracefully.
When a not-yet-known variant is encountered, I need to ignore that value and continue processing the others.
I am currently deserializing vectors of data from JSON, and I have managed to obtain vectors of MyStruct for my application.
My application needs to be forward-compatible with new versions of the enums and should simply ignore unknown variants.
For example, currently:
use serde::{Deserialize};
#[derive(Deserialize, Debug)]
#[non_exhaustive]
pub enum CaseStyle {
Lowercase,
Uppercase,
}
#[derive(Deserialize, Debug)]
#[non_exhaustive]
pub enum Encoding {
Plain,
Base64,
}
#[derive(Deserialize, Debug)]
pub struct MyStruct {
case_style: CaseStyle,
encoding: Encoding,
}
fn main() {
let j = r#"[
{"case_style": "Lowercase","encoding":"Plain"},
{"case_style": "Snakecase","encoding":"Plain"},
{"case_style": "Lowercase","encoding":"Aes"},
{"case_style": "Uppercase","encoding":"Base64"}
]"#;
// Convert the JSON string to vec.
let deserialized: Vec<MyStruct> = serde_json::from_str(&j).unwrap();
// Desired output: deserialized = [MyStruct { case_style: Lowercase, encoding: Plain }, MyStruct { case_style: Uppercase, encoding: Base64 }]
println!("deserialized = {:?}", deserialized);
}
This example fails because of the two unknown variants in the JSON data. How can I simply skip the entries with unknown variants during deserialization?

You can deserialize a Vec<Option<MyStruct>> by converting all deserialization errors to None and all successes to Some(...). Afterwards, you can remove the Option layer by flattening the vector. You could skip the Option, but that would require writing a custom deserializer for Vec.
Based on the serde_with crate:
use serde::Deserialize;
use serde_with::{serde_as, DefaultOnError};
#[derive(Deserialize, Debug)]
#[non_exhaustive]
pub enum CaseStyle {
Lowercase,
Uppercase,
}
#[derive(Deserialize, Debug)]
#[non_exhaustive]
pub enum Encoding {
Plain,
Base64,
}
#[derive(Deserialize, Debug)]
pub struct MyStruct {
case_style: CaseStyle,
encoding: Encoding,
}
fn main() {
let j = r#"[
{"case_style": "Lowercase","encoding":"Plain"},
{"case_style": "Snakecase","encoding":"Plain"},
{"case_style": "Lowercase","encoding":"Aes"},
{"case_style": "Uppercase","encoding":"Base64"}
]"#;
#[serde_as]
#[derive(Deserialize)]
struct W(#[serde_as(as = "Vec<DefaultOnError>")] Vec<Option<MyStruct>>);
// Convert the JSON string to vec.
let deserialized: Vec<MyStruct> = serde_json::from_str::<W>(&j)
.unwrap()
.0
.into_iter()
.flatten()
.collect();
// Prints deserialized = [MyStruct { case_style: Lowercase, encoding: Plain }, MyStruct { case_style: Uppercase, encoding: Base64 }]
println!("deserialized = {:?}", deserialized);
}
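The "errors become None, then flatten" idea does not depend on serde itself. As a minimal std-only sketch of the same two steps (the `parse_case_style` function is a hypothetical stand-in for the per-element deserializer, not part of serde or serde_with):

```rust
#[derive(Debug, PartialEq)]
pub enum CaseStyle {
    Lowercase,
    Uppercase,
}

// Hypothetical stand-in for a per-element deserializer:
// known names succeed, anything else is an error.
fn parse_case_style(name: &str) -> Result<CaseStyle, String> {
    match name {
        "Lowercase" => Ok(CaseStyle::Lowercase),
        "Uppercase" => Ok(CaseStyle::Uppercase),
        other => Err(format!("unknown variant `{}`", other)),
    }
}

// Step 1: turn every error into `None` and every success into `Some(..)`.
// Step 2: flatten, which drops the `None` entries.
fn keep_known(names: &[&str]) -> Vec<CaseStyle> {
    let maybe: Vec<Option<CaseStyle>> = names
        .iter()
        .map(|name| parse_case_style(name).ok())
        .collect();
    maybe.into_iter().flatten().collect()
}
```

Calling `keep_known(&["Lowercase", "Snakecase", "Uppercase"])` keeps only the two known variants, which mirrors what the `W` wrapper does for whole `MyStruct` entries.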

use serde::{Deserialize, Deserializer};
use serde_with::{serde_as, DefaultOnError};
#[derive(Deserialize, Debug)]
#[non_exhaustive]
pub enum CaseStyle {
Lowercase,
Uppercase,
}
#[derive(Deserialize, Debug)]
#[non_exhaustive]
pub enum Encoding {
Plain,
Base64,
}
#[derive(Deserialize, Debug)]
pub struct MyStruct {
case_style: CaseStyle,
encoding: Encoding,
}
#[derive(Deserialize, Debug)]
pub struct VecMyStruct {
#[serde(deserialize_with = "skip_on_error")]
items: Vec<MyStruct>,
}
fn skip_on_error<'de, D>(deserializer: D) -> Result<Vec<MyStruct>, D::Error>
where
D: Deserializer<'de>,
{
#[serde_as]
#[derive(Deserialize, Debug)]
struct MayBeT(#[serde_as(as = "DefaultOnError")] Option<MyStruct>);
let values: Vec<MayBeT> = Deserialize::deserialize(deserializer)?;
Ok(values.into_iter().filter_map(|t| t.0).collect())
}
fn main() {
let j = r#"{"items":[
{"case_style": "Lowercase","encoding":"Plain"},
{"case_style": "Snakecase","encoding":"Plain"},
{"case_style": "Lowercase","encoding":"Aes"},
{"case_style": "Uppercase","encoding":"Base64"}
]}"#;
// Convert the JSON string to vec.
let deserialized: VecMyStruct = serde_json::from_str(&j).unwrap();
// Prints deserialized = VecMyStruct { items: [MyStruct { case_style: Lowercase, encoding: Plain }, MyStruct { case_style: Uppercase, encoding: Base64 }] }
println!("deserialized = {:?}", deserialized);
}
I found this approach thanks to @jonasbb. The only remaining issue is making "skip_on_error" more generic, so that it could take any T: Deserialize<'de> and not only MyStruct. As soon as I add a type parameter T, the compiler reports errors while implementing Deserialize for MayBeT.

Related

Rust: how to minimize pattern matching when parsing json into complex enum

So, let's say I am expecting a lot of different JSONs of a known format from a network stream. I define structures for them and wrap them with an enum representing all the possibilities:
use serde::Deserialize;
use serde_json;
#[derive(Deserialize, Debug)]
struct FirstPossibleResponse {
first_field: String,
}
#[derive(Deserialize, Debug)]
struct SecondPossibleResponse {
second_field: String,
}
#[derive(Deserialize, Debug)]
enum ResponseFromNetwork {
FirstPossibleResponse(FirstPossibleResponse),
SecondPossibleResponse(SecondPossibleResponse),
}
Then, being a smart folk, I want to provide myself with a short way of parsing these JSONs into my structures, so I am implementing a trait (and here is the problem):
impl From<String> for ResponseFromNetwork {
fn from(r: String) -> Self {
match serde_json::from_slice(&r.as_bytes()) {
Ok(v) => ResponseFromNetwork::FirstPossibleResponse(v),
Err(_) => match serde_json::from_slice(&r.as_bytes()) {
Ok(v) => ResponseFromNetwork::SecondPossibleResponse(v),
Err(_) => unimplemented!("idk"),
},
}
}
}
...To use it later like this:
fn main() {
let data_first = r#"
{
"first_field": "first_value"
}"#;
let data_second = r#"
{
"second_field": "first_value"
}"#;
print!("{:?}", ResponseFromNetwork::from(data_first.to_owned()));
print!("{:?}", ResponseFromNetwork::from(data_second.to_owned()));
}
So, as mentioned earlier, the problem is that this match tree is the only way I got parsing to work, and as you can imagine, the more variations of JSON I might receive over the network, the deeper and nastier the tree grows.
I want to have it something like this instead, i.e. parse it once and then operate depending on the value:
use serde_json::Result;
impl From<String> for ResponseFromNetwork {
fn from(r: String) -> Self {
let parsed: Result<ResponseFromNetwork> = serde_json::from_slice(r.as_bytes());
match parsed {
Ok(v) => {
print!("And here we should match on invariants or something: {:?}", v);
v
}
Err(e) => unimplemented!("{:?}", e),
}
}
}
But it doesn't really work:
thread 'main' panicked at 'not implemented: Error("unknown variant `first_field`, expected `FirstPossibleResponse` or `SecondPossibleResponse`", line: 3, column: 25)', src/main.rs:28:23
#[serde(untagged)] is designed for precisely that use case. Just add it in front of the definition of enum ResponseFromNetwork and your code will work the way you want it to:
#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum ResponseFromNetwork {
FirstPossibleResponse(FirstPossibleResponse),
SecondPossibleResponse(SecondPossibleResponse),
}
impl From<String> for ResponseFromNetwork {
fn from(r: String) -> Self {
match serde_json::from_slice(r.as_bytes()) {
Ok(v) => v,
Err(e) => unimplemented!("{:?}", e),
}
}
}
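Conceptually, serde tries the variants of an untagged enum in declaration order and returns the first one that deserializes successfully. A std-only sketch of that "first parser that succeeds" pattern, which also avoids the nested match from the question, might look like this (the `Response` type, the `key=value` input shape, and the `parse_*` functions are all hypothetical stand-ins, not serde APIs):

```rust
#[derive(Debug, PartialEq)]
enum Response {
    First(String),
    Second(String),
}

// Hypothetical parsers: each one recognizes exactly one "key=value" shape.
fn parse_first(input: &str) -> Option<Response> {
    input
        .strip_prefix("first_field=")
        .map(|v| Response::First(v.to_string()))
}

fn parse_second(input: &str) -> Option<Response> {
    input
        .strip_prefix("second_field=")
        .map(|v| Response::Second(v.to_string()))
}

// Try each parser in order and return the first success,
// instead of nesting one match inside another.
fn parse_any(input: &str) -> Option<Response> {
    let parsers: [fn(&str) -> Option<Response>; 2] = [parse_first, parse_second];
    parsers.iter().find_map(|parse| parse(input))
}
```

Supporting another shape then means adding one entry to the list rather than one more level of nesting, which is roughly what adding a variant to the untagged enum achieves.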
If the formats of the response JSON strings can be extended (which may not be possible if they are predefined and unchangeable), adding a tag field, say "kind", to each JSON object, annotating each variant struct with #[serde(tag = "kind")], and annotating the enum with #[serde(untagged)] can address the issue.
use serde::Deserialize;
use serde_json;
#[derive(Deserialize, Debug)]
#[serde(tag = "kind")]
struct FirstPossibleResponse {
first_field: String,
}
#[derive(Deserialize, Debug)]
#[serde(tag = "kind")]
struct SecondPossibleResponse {
second_field: String,
}
#[derive(Deserialize, Debug)]
#[serde(tag = "kind")]
struct ThirdPossibleResponse {
third_field: String,
}
#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum ResponseFromNetwork {
FirstPossibleResponse(FirstPossibleResponse),
SecondPossibleResponse(SecondPossibleResponse),
ThirdPossibleResponse(ThirdPossibleResponse),
}
impl From<String> for ResponseFromNetwork {
fn from(r: String) -> Self {
match serde_json::from_slice(&r.as_bytes()) {
Ok(v) => v,
Err(_) => unimplemented!("idk"),
}
}
}
fn main() {
let data_first = r#"
{
"kind":"FirstPossibleResponse",
"first_field": "first_value"
}"#;
let data_second = r#"
{
"kind":"SecondPossibleResponse",
"second_field": "second_value"
}"#;
let data_third = r#"
{
"kind":"ThirdPossibleResponse",
"third_field": "third_value"
}"#;
println!("{:?}", ResponseFromNetwork::from(data_first.to_owned()));
println!("{:?}", ResponseFromNetwork::from(data_second.to_owned()));
println!("{:?}", ResponseFromNetwork::from(data_third.to_owned()));
}

Capture original payload through serde

I wonder whether there's a way to preserve the original String using serde_json? Consider this example:
#[derive(Debug, Serialize, Deserialize)]
struct User {
#[serde(skip)]
pub raw: String,
pub id: u64,
pub login: String,
}
{
"id": 123,
"login": "johndoe"
}
My structure would end up containing such values:
User {
raw: String::from(r#"{"id": 123,"login": "johndoe"}"#),
id: 123,
login: String::from("johndoe")
}
Currently, I do that by deserializing into a Value, then deserializing this value into the User structure and assigning the Value to the raw field. That doesn't seem right; perhaps there's a better way to do it?
This solution uses the RawValue type from serde_json (available behind the crate's raw_value feature) to first get the original input string. Then a new Deserializer is created from that string to deserialize the User type.
This solution can work with readers by using Box<serde_json::value::RawValue> as an intermediary type, and it can also work with structs that borrow from the input by using &'de serde_json::value::RawValue as the intermediary. You can test this in the solution by (un-)commenting the borrow field.
use std::marker::PhantomData;
#[derive(Debug, serde::Serialize, serde::Deserialize)]
#[serde(remote = "Self")]
struct User<'a> {
#[serde(skip)]
pub raw: String,
pub id: u64,
pub login: String,
// Test for borrowing input data
// pub borrow: &'a str,
#[serde(skip)]
pub ignored: PhantomData<&'a ()>,
}
impl serde::Serialize for User<'_> {
fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
Self::serialize(self, serializer)
}
}
impl<'a, 'de> serde::Deserialize<'de> for User<'a>
where
'de: 'a,
{
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
use serde::de::Error;
// Deserializing a `&'a RawValue` would also work here
// but then you lose support for deserializing from readers
let raw: Box<serde_json::value::RawValue> = Box::deserialize(deserializer)?;
// Use this line instead if you have a struct which borrows from the input
// let raw = <&'de serde_json::value::RawValue>::deserialize(deserializer)?;
let mut raw_value_deserializer = serde_json::Deserializer::from_str(raw.get());
let mut user =
User::deserialize(&mut raw_value_deserializer).map_err(|err| D::Error::custom(err))?;
user.raw = raw.get().to_string();
Ok(user)
}
}
fn main() {
// Test serialization
let u = User {
raw: String::new(),
id: 456,
login: "USERNAME".to_string(),
// Test for borrowing input data
// borrow: "foobar",
ignored: PhantomData,
};
let json = serde_json::to_string(&u).unwrap();
println!("{}", json);
// Test deserialization
let u2: User = serde_json::from_str(&json).unwrap();
println!("{:#?}", u2);
}

Deserialize map with empty objects as values

I have JSON documents that may contain objects whose keys refer to empty objects, like this:
{
"mylist": {
"foo": {},
"bar": {}
}
}
I would like to deserialize them to a Vec of Strings (and serialize it back to the format above later):
pub struct MyStruct {
#[serde(skip_serializing_if = "Option::is_none")]
pub my_list: Option<Vec<String>>, // should contain "foo", "bar"
}
How can I do that with serde?
You need to write your own deserializing method, and use deserialize_with or implement it directly for your type:
use serde::Deserialize; // 1.0.127
use serde::Deserializer;
use serde_json; // 1.0.66
use std::collections::HashMap;
#[derive(Deserialize, Debug)]
pub struct MyStruct {
#[serde(deserialize_with = "deserialize_as_vec", alias = "mylist")]
pub my_list: Vec<String>,
}
#[derive(Deserialize)]
struct DesHelper {}
fn deserialize_as_vec<'de, D>(deserializer: D) -> Result<Vec<String>, D::Error>
where
D: Deserializer<'de>,
{
let data: HashMap<String, DesHelper> = HashMap::<String, DesHelper>::deserialize(deserializer)?;
Ok(data.keys().cloned().collect())
}
fn main() {
let example = r#"
{
"mylist": {
"foo": {},
"bar": {}
}
}"#;
let deserialized: MyStruct = serde_json::from_str(&example).unwrap();
println!("{:?}", &deserialized);
}
Results:
MyStruct { my_list: ["foo", "bar"] }
Note the use of the helper struct for the empty objects. The code is pretty straightforward: you deserialize a map and then keep only the keys, which is all you need.
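One caveat when collecting `HashMap` keys: the iteration order is unspecified, so the resulting `Vec` is not guaranteed to come out as `["foo", "bar"]`. If a stable order matters, one option is to sort the keys after collecting them. A small std-only sketch (the `sorted_keys` helper name is an assumption for illustration; deserializing into a `BTreeMap` instead would be another option):

```rust
use std::collections::HashMap;

// Collect the keys of a map into a Vec and sort them, so the output
// does not depend on HashMap's unspecified iteration order.
fn sorted_keys<V>(map: &HashMap<String, V>) -> Vec<String> {
    let mut keys: Vec<String> = map.keys().cloned().collect();
    keys.sort();
    keys
}
```

In `deserialize_as_vec`, this could replace the final `data.keys().cloned().collect()` call. Note that sorting gives alphabetical order, not the order in which the keys appeared in the document.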

Using serde to deserialize a HashMap with an enum key

I have the following Rust code which models a configuration file which includes a HashMap keyed with an enum.
use std::collections::HashMap;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Hash)]
enum Source {
#[serde(rename = "foo")]
Foo,
#[serde(rename = "bar")]
Bar
}
#[derive(Debug, Clone, Serialize, Deserialize)]
struct SourceDetails {
name: String,
address: String,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Config {
name: String,
main_source: Source,
sources: HashMap<Source, SourceDetails>,
}
fn main() {
let config_str = std::fs::read_to_string("testdata.toml").unwrap();
match toml::from_str::<Config>(&config_str) {
Ok(config) => println!("toml: {:?}", config),
Err(err) => eprintln!("toml: {:?}", err),
}
let config_str = std::fs::read_to_string("testdata.json").unwrap();
match serde_json::from_str::<Config>(&config_str) {
Ok(config) => println!("json: {:?}", config),
Err(err) => eprintln!("json: {:?}", err),
}
}
This is the Toml representation:
name = "big test"
main_source = "foo"
[sources]
foo = { name = "fooname", address = "fooaddr" }
[sources.bar]
name = "barname"
address = "baraddr"
This is the JSON representation:
{
"name": "big test",
"main_source": "foo",
"sources": {
"foo": {
"name": "fooname",
"address": "fooaddr"
},
"bar": {
"name": "barname",
"address": "baraddr"
}
}
}
Deserializing the JSON with serde_json works perfectly, but deserializing the Toml with toml gives the following error:
Error: Error { inner: ErrorInner { kind: Custom, line: Some(5), col: 0, at: Some(77), message: "invalid type: string \"foo\", expected enum Source", key: ["sources"] } }
If I change the sources HashMap to be keyed on String instead of Source, both the JSON and the Toml deserialize correctly.
I'm pretty new to serde and toml, so I'm looking for suggestions on how to properly deserialize the Toml variant.
As others have said in the comments, the Toml deserializer doesn't support enums as keys.
You can use serde attributes to convert them to String first:
use std::convert::TryFrom;
use std::fmt;
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Hash)]
#[serde(try_from = "String")]
enum Source {
Foo,
Bar
}
And then implement a conversion from String:
struct SourceFromStrError;
impl fmt::Display for SourceFromStrError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.write_str("SourceFromStrError")
}
}
impl TryFrom<String> for Source {
type Error = SourceFromStrError;
fn try_from(s: String) -> Result<Self, Self::Error> {
match s.as_str() {
"foo" => Ok(Source::Foo),
"bar" => Ok(Source::Bar),
_ => Err(SourceFromStrError),
}
}
}
If you only need this for the HashMap in question, you could also follow the suggestion in the Toml issue: keep the definition of Source the same and use the serde_with crate to change how the HashMap is (de)serialized instead:
use serde_with::{serde_as, DisplayFromStr};
use std::collections::HashMap;
#[serde_as]
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Config {
name: String,
main_source: Source,
#[serde_as(as = "HashMap<DisplayFromStr, _>")]
sources: HashMap<Source, SourceDetails>,
}
This requires a FromStr implementation for Source, rather than TryFrom<String>:
use std::str::FromStr;
impl FromStr for Source {
type Err = SourceFromStrError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s {
"foo" => Ok(Source::Foo),
"bar" => Ok(Source::Bar),
_ => Err(SourceFromStrError),
}
}
}
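Because `Config` also derives `Serialize`, `DisplayFromStr` should additionally need a `Display` implementation for `Source`, used when writing the keys back out. A std-only sketch of a matching `Display`/`FromStr` pair follows. The lowercase names mirror the keys in the config files, and the enum and error type are repeated here only so the block is self-contained:

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Source {
    Foo,
    Bar,
}

#[derive(Debug, PartialEq)]
struct SourceFromStrError;

impl fmt::Display for SourceFromStrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SourceFromStrError")
    }
}

// Writes the same lowercase names that `from_str` accepts,
// so a value survives a Display -> FromStr round trip.
impl fmt::Display for Source {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Source::Foo => "foo",
            Source::Bar => "bar",
        })
    }
}

impl FromStr for Source {
    type Err = SourceFromStrError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "foo" => Ok(Source::Foo),
            "bar" => Ok(Source::Bar),
            _ => Err(SourceFromStrError),
        }
    }
}
```

Keeping the two implementations symmetric is the main design point: whatever `Display` writes for a key, `FromStr` should be able to read back.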

How do I deserialize a struct through its own `new` constructor? [duplicate]

I'm using Serde to deserialize an XML file which has the hex value 0x400 as a string and I need to convert it to the value 1024 as a u32.
Do I need to implement the Visitor trait so that I separate 0x and then decode 400 from base 16 to base 10? If so, how do I do that so that deserialization for base 10 integers remains intact?
The deserialize_with attribute
The easiest solution is to use the Serde field attribute deserialize_with to set a custom deserialization function for your field. You can then get the raw string and convert it as appropriate:
use serde::{de::Error, Deserialize, Deserializer}; // 1.0.94
use serde_json; // 1.0.40
#[derive(Debug, Deserialize)]
struct EtheriumTransaction {
#[serde(deserialize_with = "from_hex")]
account: u64, // hex
amount: u64, // decimal
}
fn from_hex<'de, D>(deserializer: D) -> Result<u64, D::Error>
where
D: Deserializer<'de>,
{
let s: &str = Deserialize::deserialize(deserializer)?;
// do better hex decoding than this
u64::from_str_radix(&s[2..], 16).map_err(D::Error::custom)
}
fn main() {
let raw = r#"{"account": "0xDEADBEEF", "amount": 100}"#;
let transaction: EtheriumTransaction =
serde_json::from_str(raw).expect("Couldn't deserialize");
assert_eq!(transaction.amount, 100);
assert_eq!(transaction.account, 0xDEAD_BEEF);
}
Note how this can use any other existing Serde implementation to decode. Here, we decode to a string slice (let s: &str = Deserialize::deserialize(deserializer)?). You can also create intermediate structs that map directly to your raw data, derive Deserialize on them, then deserialize to them inside your implementation of Deserialize.
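The `&s[2..]` slice in `from_hex` can panic on very short input and never checks that the prefix really is `0x`. A std-only sketch of a stricter helper that `from_hex` could call instead is shown below. The name `parse_hex_or_decimal` and the choice to fall back to base 10 when no prefix is present are assumptions for illustration:

```rust
use std::num::ParseIntError;

// Parse "0x400" / "0X400" as hexadecimal and anything else as decimal.
// Returns an error instead of panicking on empty or malformed input.
fn parse_hex_or_decimal(s: &str) -> Result<u64, ParseIntError> {
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => s.parse::<u64>(),
    }
}
```

Inside `from_hex`, the result can be converted with `.map_err(D::Error::custom)` just like the original `from_str_radix` call.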
Implement serde::Deserialize
From here, it's a tiny step to promoting it to your own type to allow reusing it:
#[derive(Debug, Deserialize)]
struct EtheriumTransaction {
account: Account, // hex
amount: u64, // decimal
}
#[derive(Debug, PartialEq)]
struct Account(u64);
impl<'de> Deserialize<'de> for Account {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let s: &str = Deserialize::deserialize(deserializer)?;
// do better hex decoding than this
u64::from_str_radix(&s[2..], 16)
.map(Account)
.map_err(D::Error::custom)
}
}
This method allows you to also add or remove fields as the "inner" deserialized type can do basically whatever it wants.
The from and try_from attributes
You can also place the custom conversion logic from above into a From or TryFrom implementation, then instruct Serde to make use of that via the from or try_from attributes:
#[derive(Debug, Deserialize)]
struct EtheriumTransaction {
account: Account, // hex
amount: u64, // decimal
}
#[derive(Debug, PartialEq, Deserialize)]
#[serde(try_from = "IntermediateAccount")]
struct Account(u64);
#[derive(Deserialize)]
struct IntermediateAccount<'a>(&'a str);
impl<'a> TryFrom<IntermediateAccount<'a>> for Account {
type Error = std::num::ParseIntError;
fn try_from(other: IntermediateAccount<'a>) -> Result<Self, Self::Error> {
// do better hex decoding than this
u64::from_str_radix(&other.0[2..], 16).map(Self)
}
}
See also:
How to transform fields during serialization using Serde?
