Rust serde untagged enum fails - rust

I'm tying to deserialize one of two structs using serde. The input comes from a csv file.
use serde::Deserialize;
use std::io;
#[derive(Deserialize)]
struct A {
value: i8,
}
#[derive(Deserialize)]
struct B {
value: String,
}
#[derive(Deserialize)]
#[serde(untagged)]
enum C {
One(A),
Two(B),
}
fn main() {
let mut rdr = csv::Reader::from_reader(io::stdin());
for result in rdr.deserialize() {
let record: Result<C, csv::Error> = result;
match record {
Ok(value) => {
println!("ok");
}
Err(error) => {
println!("Error parsing line: {}", error);
}
}
}
}
If I understand untagged enums correctly, this should try parsing it as an A struct, so just an i8, if that fails, try parsing it as a B struct, so just a String.
I've verified that my structs deserialize correctly on their own.
Here's the command I run:
echo "value\nTest\n1" | cargo r
Here's the output:
Error parsing line: CSV deserialize error: record 1 (line: 2, byte: 6): data did not match any variant of untagged enum C
Error parsing line: CSV deserialize error: record 2 (line: 3, byte: 11): data did not match any variant of untagged enum C

If you switch your model from "enum of structs" to "struct containing enum", the csv input works as expected:
use serde::{Deserialize, Deserializer, Serialize};
#[derive(Deserialize)]
enum Value {
One(i8),
Two(String),
}
#[derive(Deserialize)]
struct Data {
value: Value,
}
fn main() {
let mut rdr = csv::Reader::from_reader(io::stdin());
for result in rdr.deserialize() {
let record: Result<Data, csv::Error> = result;
match record {
Ok(value) => {
println!("ok");
}
Err(error) => {
println!("Error parsing line: {}", error);
}
}
}
}
It seems that the csv crate isn't able to understand that the value field in your struct A and the value field in your struct B should belong to the same column. Since they have the same name, defining as a struct containing an untagged enum makes more sense.

Related

How do I extract information about the type in a derive macro?

I am implementing a derive macro to reduce the amount of boilerplate I have to write for similar types.
I want the macro to operate on structs which have the following format:
#[derive(MyTrait)]
struct SomeStruct {
records: HashMap<Id, Record>
}
Calling the macro should generate an implementation like so:
impl MyTrait for SomeStruct {
fn foo(&self, id: Id) -> Record { ... }
}
So I understand how to generate the code using quote:
#[proc_macro_derive(MyTrait)]
pub fn derive_answer_fn(item: TokenStream) -> TokenStream {
...
let generated = quote!{
impl MyTrait for #struct_name {
fn foo(&self, id: #id_type) -> #record_type { ... }
}
}
...
}
But what is the best way to get #struct_name, #id_type and #record_type from the input token stream?
One way is to use the venial crate to parse the TokenStream.
use proc_macro2;
use quote::quote;
use venial;
#[proc_macro_derive(MyTrait)]
pub fn derive_answer_fn(item: proc_macro::TokenStream) -> proc_macro::TokenStream {
// Ensure it's deriving for a struct.
let s = match venial::parse_declaration(proc_macro2::TokenStream::from(item)) {
Ok(venial::Declaration::Struct(s)) => s,
Ok(_) => panic!("Can only derive this trait on a struct"),
Err(_) => panic!("Error parsing into valid Rust"),
};
let struct_name = s.name;
// Get the struct's first field.
let fields = s.fields;
let named_fields = match fields {
venial::StructFields::Named(named_fields) => named_fields,
_ => panic!("Expected a named field"),
};
let inners: Vec<(venial::NamedField, proc_macro2::Punct)> = named_fields.fields.inner;
if inners.len() != 1 {
panic!("Expected exactly one named field");
}
// Get the name and type of the first field.
let first_field_name = &inners[0].0.name;
let first_field_type = &inners[0].0.ty;
// Extract Id and Record from the type HashMap<Id, Record>
if first_field_type.tokens.len() != 6 {
panic!("Expected type T<R, S> for first named field");
}
let id = first_field_type.tokens[2].clone();
let record = first_field_type.tokens[4].clone();
// Implement MyTrait.
let generated = quote! {
impl MyTrait for #struct_name {
fn foo(&self, id: #id) -> #record { *self.#first_field_name.get(&id).unwrap() }
}
};
proc_macro::TokenStream::from(generated)
}

Deserialize using a function of the tag

An API with this internally tagged field structure, with "topic" being the tag:
{
"topic": "Car"
"name": "BMW"
"HP": 250
}
This can be deserialized with
#[derive(Serialize, Deserialize)]
#[serde(tag = "topic")]
pub enum catalog {
CarEntry(Car),
... (other types)
}
#[derive(Serialize, Deserialize)]
pub struct Car {
pub name: String
pub HP: i32
}
It turns out that instead of reporting the topic as just Car, the API actually sends Car.product1 or Car.product2 etc.
This breaks the deserialization, because the deserializer doesn't know what the type is based on the string. Is there a way to supply a function to chop off the type string so that the correct model is found?
I don't think serde provides a way to mangle the tag before using it (at least I don't see anything relevant). And the generated serializers for tagged enums are relatively complex, with internal caching if the tag isn't the first field, and whatnot, so I wouldn't want to reproduce that in a custom deserializer.
The cheapest (but not necessarily most efficient) shot at this is to deserialize to serde_json::Value first, manually process the tag, and then deserialize the serde_json::Values to whatever struct you want.
Do that in a custom deserializer, and it starts looking reasonable:
impl<'de> Deserialize<'de> for Catalog {
fn deserialize<D>(d: D) -> Result<Self, <D as Deserializer<'de>>::Error>
where
D: Deserializer<'de>,
{
use serde_json::{Map, Value};
#[derive(Deserialize)]
struct Pre {
topic: String,
#[serde(flatten)]
data: Map<String, Value>,
}
let v = Pre::deserialize(d)?;
// Now you can mangle Pre any way you want to get your final structs.
match v.topic.as_bytes() {
[b'C', b'a', b'r', b'.', _rest # ..] => Ok(Catalog::CarEntry(
serde_json::from_value(v.data.into()).map_err(de::Error::custom)?,
)),
[b'B', b'a', b'r', b'.', _rest # ..] => Ok(Catalog::BarEntry(
serde_json::from_value(v.data.into()).map_err(de::Error::custom)?,
)),
_ => return Err(de::Error::unknown_variant(&v.topic, &["Car.…", "Bar.…"])),
}
}
}
Playground
Btw, what do you want to do with the suffix of topic? Throw it away? How do you plan on handling serialization if you do throw it away?
You can directly use enum instead of defining extra struct type.
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize, Debug)]
#[serde(tag = "topic")]
pub enum Catalog {
Car { name: String, hp: i32 }
}
fn main() {
let car = Catalog::Car { name: String::from("BMW"), hp: 2000 };
// Convert the Car to a JSON string.
let serialized = serde_json::to_string(&car).unwrap();
// Prints serialized = {"topic":"Car","name":"BMW","hp":2000}
println!("serialized = {}", serialized);
// Convert the JSON string back to a Car.
let deserialized: Catalog = serde_json::from_str(&serialized).unwrap();
// Prints deserialized = Car { name: "BMW", hp: 2000 }
println!("deserialized = {:?}", deserialized);
}
Playground
You can use #[serde(rename()] to rename type in output
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize, Debug)]
#[serde(tag = "topic")]
pub enum Catalog {
#[serde(rename(serialize = "Car", deserialize = "CarEntry"))]
CarEntry(Car),
}
#[derive(Serialize, Deserialize, Debug)]
pub struct Car {
pub name: String,
pub hp: i32
}
fn main() {
let car = Car { name: String::from("BMW"), hp: 2000 };
let catalog = Catalog::CarEntry(car);
// Convert the Car to a JSON string.
let serialized = serde_json::to_string(&catalog).unwrap();
// Prints serialized = {"topic":"Car","name":"BMW","hp":2000}
println!("serialized = {}", serialized);
// Convert the JSON string back to a Car.
let deserialized: Car = serde_json::from_str(&serialized).unwrap();
// Prints deserialized = Car { name: "BMW", hp: 2000 }
println!("deserialized = {:?}", deserialized);
}
Playground

Can one specify serde's rename_all rule at runtime?

I have a data model that I would like to be deserialized from "camelCase" to the rust standard "snake_case" when reading from a source, X. But I'd like to leave it in "snake_case" when reading or writing to another source, Y.
For example, the following code,
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Data {
foo_bar: String,
hello_word: String,
}
can only be encoded and decoded in camel case. Even if I manually defined my Serialize and Deserialize implementations, I can't define multiple for the same struct. I could define a second struct that's a copy/paste of the other and then derive but that method would get tedious with multiple large structs. What I would really like to do is specify that rename_all attribute at run-time. But I'm not seeing any way to do that in serde's API.
I think the best way sigh is to just write out one struct Data_ per #[serde(rename_all = ...)], then write one additional struct Data that will be the in-memory representation (which won't be serializable, to remove ambiguity), then implement From in both directions for the Data_s and Data so that they're interconvertible.
Thankfully, we can use a macro so that we only have to specify the fields once. (It is incredibly disgusting nonetheless.)
This playground available here.
use serde::{Deserialize, Serialize}; // 1.0.130
use serde_json; // 1.0.69
macro_rules! interconvertible {
($T:ident <-> $U:ident, $($field_name:ident),*) => {
impl From<$T> for $U {
fn from(t: $T) -> Self {
let $T { $($field_name),* } = t;
Self { $($field_name),* }
}
}
impl From<$U> for $T {
fn from(u: $U) -> Self {
let $U { $($field_name),* } = u;
Self { $($field_name),* }
}
}
};
}
macro_rules! create_data_structs {
($($field_name:ident: $field_type:ty),* $(,)?) => {
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
struct DataX {
$($field_name: $field_type),*
}
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
struct DataY {
$($field_name: $field_type),*
}
#[derive(Debug)]
struct Data {
$($field_name: $field_type),*
}
interconvertible!(DataX <-> Data, $($field_name),*);
interconvertible!(DataY <-> Data, $($field_name),*);
}
}
create_data_structs!(foo_bar: String, hello_world: String);
fn main() -> serde_json::Result<()> {
let x1: DataX = serde_json::from_str(r#"{"fooBar": "a", "helloWorld": "b"}"#)?;
let y1: DataY = serde_json::from_str(r#"{"foo_bar": "a", "hello_world": "b"}"#)?;
println!("{:?}, {:?}", x1, y1);
let x2: Data = x1.into();
let y2: Data = y1.into();
println!("{:?}, {:?}", x2, y2);
let x_string = serde_json::to_string(&DataX::from(x2))?;
let y_string = serde_json::to_string(&DataY::from(y2))?;
println!("{:?}, {:?}", x_string, y_string);
Ok(())
}
The output is:
DataX { foo_bar: "a", hello_world: "b" }, DataY { foo_bar: "a", hello_world: "b" }
[Data { foo_bar: "a", hello_world: "b" }, Data { foo_bar: "a", hello_world: "b" }]
"{\"fooBar\":\"a\",\"helloWorld\":\"b\"}", "{\"foo_bar\":\"a\",\"hello_world\":\"b\"}"
Since I'm only every decoding from source X I can utilize the #[serde(alias = ???)] macro. So my above use case would be
#[derive(Serialize, Deserialize)]
struct Data {
#[serde(alias="fooBar")]
foo_bar: String,
#[serde(alias="helloWorld")]
hello_word: String,
}
It's still a little tedious but better than an intermediate struct. It won't work though if I want to decode or encode to different cases.
(I'm not going to mark this as an answer because it's a work-around for my specific use case. If anyone has a more generic solution feel free to answer.)

Deserialization of json with serde by a numerical value as type identifier

I'm quite new to rust and come from an OOP background. So, maybe I misunderstood some rust basics.
I want to parse a fixed json-structure with serde. This structure represents one of different messages types. Each message has a numeric type attribute to distinguish it. The exact structure of the individual message types varies mostly, but they can also be the same.
{"type": 1, "sender_id": 4, "name": "sender", ...}
{"type": 2, "sender_id": 5, "measurement": 3.1415, ...}
{"type": 3, "sender_id": 6, "measurement": 13.37, ...}
...
First of all, I defined an enum to distinguish between message types also a struct for each type of message without a field storing the type.
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
T1(Type1),
T2(Type2),
T3(Type3),
// ...
}
#[derive(Debug, Serialize, Deserialize)]
struct Type1 {
sender_id: u32,
name: String,
// ...
}
#[derive(Debug, Serialize, Deserialize)]
struct Type2 {
sender_id: u32,
measurement: f64,
// ...
}
#[derive(Debug, Serialize, Deserialize)]
struct Type3 {
sender_id: u32,
measurement: f64,
// ...
}
// ...
When I try to turn a string to a Message object, I get an error.
let message = r#"{"type":1,"sender_id":123456789,"name":"sender"}"#;
let message: Message = serde_json::from_str(message)?; // error here
// Error: Custom { kind: InvalidData, error: Error("invalid type: integer `1`, expected variant identifier", line: 1, column: 9) }
So, as I understood, serde tries to figure out the type of the current message but it needs a string
for that. I also tried to write my own deserialize()-function. I tried to get the numerical value
of the corresponding type-key and wanted to create the specific object by the type value.
How I have to implement the deserialize() to extract the type of the message and create the specific message object? Is it possible to write this without writing a deserialize()-function for each Type1/2/3/... struct?
impl<'de> Deserialize<'de> for Message {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,
{
// which functions I have to call?
}
Or is there a better solution to achieve my deserialization?
I prepared a playground for this issue: Playground
Serde doesn't support integer tags yet (see issue #745).
If you're able to change what's producing the data, then if you're able to change type into a string, i.e. "1" instead of 1. Then you can get it working simply using #[serde(rename)].
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
#[serde(rename = "1")]
T1(Type1),
#[serde(rename = "2")]
T2(Type2),
#[serde(rename = "3")]
T3(Type3),
// ...
}
If that's not an option, then you indeed need to create a custom deserializer. The shortest in terms of code, is likely to deserialize into a serde_json::Value, and then match on the type, and deserialize the serde_json::Value into the correct Type{1,2,3}.
use serde_json::Value;
impl<'de> serde::Deserialize<'de> for Message {
fn deserialize<D: serde::Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
let value = Value::deserialize(d)?;
Ok(match value.get("type").and_then(Value::as_u64).unwrap() {
1 => Message::T1(Type1::deserialize(value).unwrap()),
2 => Message::T2(Type2::deserialize(value).unwrap()),
3 => Message::T3(Type3::deserialize(value).unwrap()),
type_ => panic!("unsupported type {:?}", type_),
})
}
}
You'll probably want to perform some proper error handling, instead of unwrapping and panicking.
If you need serialization as well, then you will likewise need a custom serializer. For this you could create a new type to serialize into, as you cannot use Message.
use serde::Serializer;
impl Serialize for Message {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
#[derive(Serialize)]
#[serde(untagged)]
enum Message_<'a> {
T1(&'a Type1),
T2(&'a Type2),
T3(&'a Type3),
}
#[derive(Serialize)]
struct TypedMessage<'a> {
#[serde(rename = "type")]
t: u64,
#[serde(flatten)]
msg: Message_<'a>,
}
let msg = match self {
Message::T1(t) => TypedMessage { t: 1, msg: Message_::T1(t) },
Message::T2(t) => TypedMessage { t: 2, msg: Message_::T2(t) },
Message::T3(t) => TypedMessage { t: 3, msg: Message_::T3(t) },
};
msg.serialize(serializer)
}
}
When using #[serde(flatten)], then it uses serde::private::ser::FlatMapSerializer, which is hidden from the documentation. In place of creating new types, you could use SerializeMap and FlatMapSerializer.
However, be warned, given it's undocumented, then any future release of serde could break your code if you're using FlatMapSerializer directly.
use serde::{private::ser::FlatMapSerializer, ser::SerializeMap, Serializer};
impl Serialize for Message {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut s = serializer.serialize_map(None)?;
let type_ = &match self {
Message::T1(_) => 1,
Message::T2(_) => 2,
Message::T3(_) => 3,
};
s.serialize_entry("type", &type_)?;
match self {
Message::T1(t) => t.serialize(FlatMapSerializer(&mut s))?,
Message::T2(t) => t.serialize(FlatMapSerializer(&mut s))?,
Message::T3(t) => t.serialize(FlatMapSerializer(&mut s))?,
}
s.end()
}
}

Use of moved value when pattern matching an enum with multiple values after downcasting

I can use pattern matching on an enum that has one String parameter:
extern crate robots;
use std::any::Any;
use robots::actors::{Actor, ActorCell};
#[derive(Clone, PartialEq)]
pub enum ExampleMessage {
Msg { param_a: String },
}
pub struct Dummy {}
impl Actor for Dummy {
// Using `Any` is required for actors in RobotS
fn receive(&self, message: Box<Any>, _context: ActorCell) {
if let Ok(message) = Box::<Any>::downcast::<ExampleMessage>(message) {
match *message {
ExampleMessage::Msg { param_a } => println!("got message"),
}
}
}
}
And yet I am unable to perform pattern matching on an enum with 2 parameters:
#[derive(Clone, PartialEq)]
pub enum ExampleMessage {
Msg { param_a: String, param_b: usize },
}
impl Actor for Dummy {
// Using `Any` is required for actors in RobotS
fn receive(&self, message: Box<Any>, _context: ActorCell) {
if let Ok(message) = Box::<Any>::downcast::<ExampleMessage>(message) {
match *message {
ExampleMessage::Msg { param_a, param_b } => println!("got message"),
}
}
}
}
This results in the error:
error[E0382]: use of moved value: `message`
--> src/example.rs:19:48
|
19 | ExampleMessage::Msg { param_a, param_b } => {
| ------- ^^^^^^^ value used here after move
| |
| value moved here
|
= note: move occurs because `message.param_a` has type `std::string::String`, which does not implement the `Copy` trait
I tried pattern matching on the same enum without downcasting before, and this works fine but I am required to downcast.
This just seems like very strange behavior to me and I don't know how to circumvent this error.
I am using Rust 1.19.0-nightly (afa1240e5 2017-04-29)
I tried pattern matching on the same enum without downcasting before, and this works fine
This is a good attempt at reducing the problem. The issue is that you reduced too far. Downcasting a Box<T> to a Foo doesn't return a Foo, it returns a Box<Foo>:
fn downcast<T>(self) -> Result<Box<T>, Box<Any + 'static>>
You can reproduce the problem with:
#[derive(Clone, PartialEq)]
pub enum ExampleMessage {
Msg { param_a: String, param_b: usize },
}
fn receive2(message: Box<ExampleMessage>) {
match *message {
ExampleMessage::Msg { param_a, param_b } => println!("got message"),
}
}
fn main() {}
The good news
This is a limitation of the current implementation of the borrow checker and your original code will work as-is when non-lexical lifetimes are enabled:
#![feature(nll)]
#[derive(Clone, PartialEq)]
pub enum ExampleMessage {
Msg { param_a: String, param_b: usize },
}
fn receive2(message: Box<ExampleMessage>) {
match *message {
ExampleMessage::Msg { param_a, param_b } => println!("got message"),
}
}
fn main() {}
The current reality
Non-lexical lifetimes and the MIR-based borrow checker are not yet stable!
When you match against a dereferenced value, the value is not normally moved. This allows you to do something like:
enum Foo {
One,
Two,
}
fn main() {
let f = &Foo::One;
match *f {
Foo::One => {}
Foo::Two => {}
}
}
In this case, you wish to take ownership of the thing inside the Box1 in order to take ownership of the fields when destructuring it in the match. You can accomplish this by moving the value out of the box before trying to match on it.
The long way to do this is:
fn receive2(message: Box<ExampleMessage>) {
let message = *message;
match message {
ExampleMessage::Msg { param_a, param_b } => println!("got message"),
}
}
But you can also force the move by using curly braces:
fn receive2(message: Box<ExampleMessage>) {
match {*message} {
ExampleMessage::Msg { param_a, param_b } => println!("got message"),
}
}
I don't fully understand why a single field would work; it's certainly inconsistent. My only guess is that the ownership of the Box is moved to the first param, the param is extracted, then the compiler tries to move it again to the next parameter.
1 — Moving the contained element out via * is a special power that only Box supports. For example, if you try to do this with a reference, you get the "cannot move out of borrowed content" error. You cannot implement the Deref trait to do this either; it's a hard-coded ability inside the compiler.

Resources