How to parse TOML in Rust with unknown structure? - rust

My configuration file has a large number of arbitrary key-value pairs in it, which I want to parse using the toml crate. However it seems as if the standard way is to use a given struct that fits the configuration file. How can I load the key-value pairs into a data structure like a map or an iterator of pairs, instead of having to specifiy the structure beforehand with a struct?

toml as a Value struct that can hold anything and that you can introspect dynamically in order to discover any content without forcing the usage of a specific structure.
use toml::Value;
fn show_value(
value: &Value,
indent: usize,
) {
let pfx = " ".repeat(indent);
print!("{}", pfx);
match value {
Value::String(string) => {
println!("a string --> {}", string);
}
Value::Integer(integer) => {
println!("an integer --> {}", integer);
}
Value::Float(float) => {
println!("a float --> {}", float);
}
Value::Boolean(boolean) => {
println!("a boolean --> {}", boolean);
}
Value::Datetime(datetime) => {
println!("a datetime --> {}", datetime);
}
Value::Array(array) => {
println!("an array");
for v in array.iter() {
show_value(v, indent + 1);
}
}
Value::Table(table) => {
println!("a table");
for (k, v) in table.iter() {
println!("{}key {}", pfx, k);
show_value(v, indent + 1);
}
}
}
}
fn main() {
let input_text = r#"
abc = 123
[def]
ghi = "hello"
jkl = [ 12.34, 56.78 ]
"#;
let value = input_text.parse::<Value>().unwrap();
show_value(&value, 0);
}
/*
a table
key abc
an integer --> 123
key def
a table
key ghi
a string --> hello
key jkl
an array
a float --> 12.34
a float --> 56.78
*/

You actually don't need to do anything special other than tell it to deserialize into a HashMap:
use std::collections::HashMap;
use toml;
fn main() {
let toml_data = r#"
foo = "123"
bar = "456"
"#;
let config: HashMap<String, String> = toml::from_str(toml_data).unwrap();
println!("{:?}", config);
}
Of course, since TOML and Rust are both typed, your keys all need to be the same type (String in this example), and it cannot handle tables, since it wouldn't know where in the map a table should go.
If you do have a couple tables, just add your maps as fields to a struct and that works just as simply:
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use toml;
#[derive(Debug, Serialize, Deserialize)]
struct Config {
data_a: HashMap<String, String>,
data_b: HashMap<String, String>,
}
fn main() {
let toml_data = r#"
[data_a]
foo = "123"
bar = "456"
[data_b]
bat = "123"
baz = "456"
"#;
let config: Config = toml::from_str(toml_data).unwrap();
println!("{:?}", config);
}

For what is is worth!
I came here to find a way to handle a toml-config file for my project.
This is what I've found:
You can parse an arbitrary toml file by using the Table type.
See the documentation.
All types can be automatically parsed but you cannot escape the fact that rust is typed. Therefore you have to parse the values into an expected type.
See my example:
use toml::Table;
fn main() {
//Load toml file
let path = std::path::Path::new("../Cargo.toml");
let file = match std::fs::read_to_string(path) {
Ok(f) => f,
Err(e) => panic!("{}", e),
};
let cfg: Table = file.parse().unwrap();
println!("Config in table format\n");
dbg!(&cfg);
println!("Index into config");
let cfg_string: &str = cfg["package"]["version"].as_str().unwrap();
println!("Version: {:?}", cfg_string);
let cfg_bool: bool = cfg["package"]["nest"]["nested_bool"].as_bool().unwrap();
println!("Nested bool: {:?}", cfg_bool);
// Default value if failed
let cfg_float: f64 = cfg["package"]["nest"]["nested_int"]
.as_float()
.unwrap_or(5.0);
println!("Default float to value: {:?}", cfg_float);
}
The toml-file
[package]
name = "rust_test"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
toml = "0.6.0"
[package.nest]
nested_float = 1.0
nested_int = 1
nested_bool = false
Output:
Config in table format
[src/main.rs:13] &cfg = {
"dependencies": Table(
{
"toml": String(
"0.6.0",
),
},
),
"package": Table(
{
"edition": String(
"2021",
),
"name": String(
"rust_test",
),
"nest": Table(
{
"nested_bool": Boolean(
false,
),
"nested_float": Float(
1.0,
),
"nested_int": Integer(
1,
),
},
),
"version": String(
"0.1.0",
),
},
),
}
Index into config
Version: "0.1.0"
Nested bool: false
Default float to value: 5.0

Related

Get struct from inside tuple variant [duplicate]

I wish that enums in Rust can be used like Haskell's productive type. I want to
access a field's value directly
assign a field's value directly or make a clone with the changing value.
Directly means that not using too long pattern matching code, but just could access like let a_size = a.size.
In Haskell:
data TypeAB = A {size::Int, name::String} | B {size::Int, switch::Bool} deriving Show
main = do
let a = A 1 "abc"
let b = B 1 True
print (size a) -- could access a field's value directly
print (name a) -- could access a field's value directly
print (switch b) -- could access a field's value directly
let aa = a{size=2} -- could make a clone directly with the changing value
print aa
I tried two styles of Rust enum definition like
Style A:
#[derive(Debug)]
enum EntryType {
A(TypeA),
B(TypeB),
}
#[derive(Debug)]
struct TypeA {
size: u32,
name: String,
}
#[derive(Debug)]
struct TypeB {
size: u32,
switch: bool,
}
fn main() {
let mut ta = TypeA {
size: 3,
name: "TAB".to_string(),
};
println!("{:?}", &ta);
ta.size = 2;
ta.name = "TCD".to_string();
println!("{:?}", &ta);
let mut ea = EntryType::A(TypeA {
size: 1,
name: "abc".to_string(),
});
let mut eb = EntryType::B(TypeB {
size: 1,
switch: true,
});
let vec_ab = vec![&ea, &eb];
println!("{:?}", &ea);
println!("{:?}", &eb);
println!("{:?}", &vec_ab);
// Want to do like `ta.size = 2` for ea
// Want to do like `ta.name = "bcd".to_string()` for ea
// Want to do like `tb.switch = false` for eb
// ????
println!("{:?}", &ea);
println!("{:?}", &eb);
println!("{:?}", &vec_ab);
}
Style B:
#[derive(Debug)]
enum TypeCD {
TypeC { size: u32, name: String },
TypeD { size: u32, switch: bool },
}
fn main() {
// NOTE: Rust requires representative struct name before each constructor
// TODO: Check constructor name can be duplicated
let mut c = TypeCD::TypeC {
size: 1,
name: "abc".to_string(),
};
let mut d = TypeCD::TypeD {
size: 1,
switch: true,
};
let vec_cd = vec![&c, &d];
println!("{:?}", &c);
println!("{:?}", &d);
println!("{:?}", &vec_cd);
// Can't access a field's value like
// let c_size = c.size
let c_size = c.size; // [ERROR]: No field `size` on `TypeCD`
let c_name = c.name; // [ERROR]: No field `name` on `TypeCD`
let d_switch = d.switch; // [ERROR]: No field `switch` on `TypeCD`
// Can't change a field's value like
// c.size = 2;
// c.name = "cde".to_string();
// d.switch = false;
println!("{:?}", &c);
println!("{:?}", &d);
println!("{:?}", &vec_cd);
}
I couldn't access/assign values directly in any style. Do I have to implement functions or a trait just to access a field's value? Is there some way of deriving things to help this situation?
What about style C:
#[derive(Debug)]
enum Color {
Green { name: String },
Blue { switch: bool },
}
#[derive(Debug)]
struct Something {
size: u32,
color: Color,
}
fn main() {
let c = Something {
size: 1,
color: Color::Green {
name: "green".to_string(),
},
};
let d = Something {
size: 2,
color: Color::Blue { switch: true },
};
let vec_cd = vec![&c, &d];
println!("{:?}", &c);
println!("{:?}", &d);
println!("{:?}", &vec_cd);
let _ = c.size;
}
If all variant have something in common, why separate them?
Of course, I need to access not common field too.
This would imply that Rust should define what to do when the actual type at runtime doesn't contain the field you required. So, I don't think Rust would add this one day.
You could do it yourself. It will require some lines of code, but that matches the behavior of your Haskell code. However, I don't think this is the best thing to do. Haskell is Haskell, I think you should code in Rust and not try to code Haskell by using Rust. That a general rule, some feature of Rust come directly from Haskell, but what you want here is very odd in my opinion for Rust code.
#[derive(Debug)]
enum Something {
A { size: u32, name: String },
B { size: u32, switch: bool },
}
impl Something {
fn size(&self) -> u32 {
match self {
Something::A { size, .. } => *size,
Something::B { size, .. } => *size,
}
}
fn name(&self) -> &String {
match self {
Something::A { name, .. } => name,
Something::B { .. } => panic!("Something::B doesn't have name field"),
}
}
fn switch(&self) -> bool {
match self {
Something::A { .. } => panic!("Something::A doesn't have switch field"),
Something::B { switch, .. } => *switch,
}
}
fn new_size(&self, size: u32) -> Something {
match self {
Something::A { name, .. } => Something::A {
size,
name: name.clone(),
},
Something::B { switch, .. } => Something::B {
size,
switch: *switch,
},
}
}
// etc...
}
fn main() {
let a = Something::A {
size: 1,
name: "Rust is not haskell".to_string(),
};
println!("{:?}", a.size());
println!("{:?}", a.name());
let b = Something::B {
size: 1,
switch: true,
};
println!("{:?}", b.switch());
let aa = a.new_size(2);
println!("{:?}", aa);
}
I think there is currently no built-in way of accessing size directly on the enum type. Until then, enum_dispatch or a macro-based solution may help you.

How do I serialize Polars DataFrame Row/HashMap of `AnyValue` into JSON?

I have a row of a polars dataframe created using iterators reading a parquet file from this method: Iterate over rows polars rust
I have constructed a HashMap that represents an individual row and I would like to now convert that row into JSON.
This is what my code looks like so far:
use polars::prelude::*;
use std::iter::zip;
use std::{fs::File, collections::HashMap};
fn main() -> anyhow::Result<()> {
let file = File::open("0.parquet").unwrap();
let mut df = ParquetReader::new(file).finish()?;
dbg!(df.schema());
let fields = df.fields();
let columns: Vec<&String> = fields.iter().map(|x| x.name()).collect();
df.as_single_chunk_par();
let mut iters = df.iter().map(|s| s.iter()).collect::<Vec<_>>();
for _ in 0..df.height() {
let mut row = HashMap::new();
for (column, iter) in zip(&columns, &mut iters) {
let value = iter.next().expect("should have as many iterations as rows");
row.insert(column, value);
}
dbg!(&row);
let json = serde_json::to_string(&row).unwrap();
dbg!(json);
break;
}
Ok(())
}
And I have the following feature flags enabled: ["parquet", "serde", "dtype-u8", "dtype-i8", "dtype-date", "dtype-datetime"].
I am running into the following error at the serde_json::to_string(&row).unwrap() line:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("the enum variant AnyValue::Datetime cannot be serialized", line: 0, column: 0)', src/main.rs:47:48
I am also unable to implement my own serialized for AnyValue::DateTime because of only traits defined in the current crate can be implemented for types defined outside of the crate.
What's the best way to serialize this row into JSON?
I was able to resolve this error by using a match statement over value to change it from a Datetime to an Int64.
let value = match value {
AnyValue::Datetime(value, TimeUnit::Milliseconds, _) => AnyValue::Int64(value),
x => x
};
row.insert(column, value);
Root cause is there is no enum variant for Datetime in the impl Serialize block: https://docs.rs/polars-core/0.24.0/src/polars_core/datatypes/mod.rs.html#298
Although this code now works, it outputs data that looks like:
{'myintcolumn': {'Int64': 22342342343},
'mylistoclumn': {'List': {'datatype': 'Int32', 'name': '', 'values': []}},
'mystrcolumn': {'Utf8': 'lorem ipsum lorem ipsum'}
So you likely to be customizing the serialization here regardless of the data type.
Update: If you want to get the JSON without all of the inner nesting, I had to do a gnarly match statement:
use polars::prelude::*;
use std::iter::zip;
use std::{fs::File, collections::HashMap};
use serde_json::json;
fn main() -> anyhow::Result<()> {
let file = File::open("0.parquet").unwrap();
let mut df = ParquetReader::new(file).finish()?;
dbg!(df.schema());
let fields = df.fields();
let columns: Vec<&String> = fields.iter().map(|x| x.name()).collect();
df.as_single_chunk_par();
let mut iters = df.iter().map(|s| s.iter()).collect::<Vec<_>>();
for _ in 0..df.height() {
let mut row = HashMap::new();
for (column, iter) in zip(&columns, &mut iters) {
let value = iter.next().expect("should have as many iterations as rows");
let value = match value {
AnyValue::Null => json!(Option::<String>::None),
AnyValue::Int64(val) => json!(val),
AnyValue::Int32(val) => json!(val),
AnyValue::Int8(val) => json!(val),
AnyValue::Float32(val) => json!(val),
AnyValue::Float64(val) => json!(val),
AnyValue::Utf8(val) => json!(val),
AnyValue::List(val) => {
match val.dtype() {
DataType::Int32 => ({let vec: Vec<Option<_>> = val.i32().unwrap().into_iter().collect(); json!(vec)}),
DataType::Float32 => ({let vec: Vec<Option<_>> = val.f32().unwrap().into_iter().collect(); json!(vec)}),
DataType::Utf8 => ({let vec: Vec<Option<_>> = val.utf8().unwrap().into_iter().collect(); json!(vec)}),
DataType::UInt8 => ({let vec: Vec<Option<_>> = val.u8().unwrap().into_iter().collect(); json!(vec)}),
x => panic!("unable to parse list column: {} with value: {} and type: {:?}", column, x, x.inner_dtype())
}
},
AnyValue::Datetime(val, TimeUnit::Milliseconds, _) => json!(val),
x => panic!("unable to parse column: {} with value: {}", column, x)
};
row.insert(*column as &str, value);
}
let json = serde_json::to_string(&row).unwrap();
dbg!(json);
break;
}
Ok(())
}

serde: deserialize a field based on the value of another field

I would like to deserialize a wire format, like this JSON, into the Data structure below and I am failing to write the serde Deserialize implementations for the corresponding rust types.
{ "type": "TypeA", "value": { "id": "blah", "content": "0xa1b.." } }
enum Content {
TypeA(Vec<u8>),
TypeB(BigInt),
}
struct Value {
id: String,
content: Content,
}
struct Data {
typ: String,
value: Value,
}
The difficulty is selecting the correct value of the Content enumeration, which is based on the typ value.
As far as I know, deserialization in serde is stateless, an hence there is no way of either
knowing what the value of typ is at the time of deserialization of content (even though the deserialization order is guaranteed)
or injecting the value of typ in the deserializer then collecting it.
How can this be achieved with serde ?
I have looked at
serde_state but I cannot get the macros working and this library is wrapping serde, which worries me
DeserializeSeed but my undestanding is that it must be used in place of Deserialize for all types and my data model is big
The existing SO answers usually exploit the fact that the related fields are at the same level. This is not the case here: the actual data model is big, deep and the fields are "far apart"
Much simpler using tagging, but changing your data structure:
use serde::{Deserialize, Deserializer}; // 1.0.130
use serde_json; // 1.0.67
#[derive(Debug, Deserialize)]
#[serde(tag = "type", content = "value")]
enum Data {
TypeA(Value<String>),
TypeB(Value<u32>),
}
#[derive(Debug, Deserialize)]
struct Value<T> {
id: String,
content: T,
}
fn main() {
let input = r#"{"type": "TypeA", "value": { "id": "blah", "content": "0xa1b..."}}"#;
let data: Data = serde_json::from_str(input).unwrap();
println!("{:?}", data);
}
Playground
Also, you can write your own custom desializer using some intermediary serde_json::Value:
use serde::{Deserialize, Deserializer};// 1.0.130
use serde_json; // 1.0.67
#[derive(Debug)]
enum Content {
TypeA(String),
TypeB(String),
}
#[derive(Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Debug)]
struct Data {
typ: String,
value: Value,
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let json: serde_json::value::Value = serde_json::value::Value::deserialize(deserializer)?;
let typ = json.get("type").expect("type").as_str().unwrap();
let value = json.get("value").expect("value");
let id = value.get("id").expect("id").as_str().unwrap();
let content = value.get("content").expect("content").as_str().unwrap();
Ok(Data {
typ: typ.to_string(),
value: Value {
id: id.to_string(),
content: {
match typ {
"TypeA" => Content::TypeA(content.to_string()),
"TypeB" => Content::TypeB(content.to_string()),
_ => panic!("Invalid type, but this should be an error not a panic"),
}
}
}
})
}
}
fn main() {
let input = r#"{"type": "TypeA", "value": { "id": "blah", "content": "0xa1b..."}}"#;
let data: Data = serde_json::from_str(input).unwrap();
println!("{:?}", data);
}
Playground
Disclaimer: I didn't handle error correctly and you could also extract the content matching into a function for example. The above code is just to illustrate the main idea.
There's a few different ways this can be solved, e.g. with a custom impl Deserialize for Data, then deserialize into a serde_json::Value, and then manually juggling between the types.
For a somewhat example of that, checkout this answer that I wrote in the past. It's not a one-to-one solution, but it might give some hints for implementing Deserialize manually, for what you want.
That being said. Personally, I prefer to minimize when I have to impl Deserialize manually, and instead deserialize into another type, and have it automatically convert using #[serde(from = "FromType")].
First, instead of type_: String, I'd suggest we introduce enum ContentType.
#[derive(Deserialize, Clone, Copy, Debug)]
enum ContentType {
TypeA,
TypeB,
TypeC,
TypeD,
}
Now, let's consider the types you introduced. I've added a few extra variants to Content, as you mentioned the variants can be different.
#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum Content {
TypeA(Vec<u8>),
TypeB(Vec<u8>),
TypeC(String),
TypeD { foo: i32, bar: i32 },
}
#[derive(Deserialize, Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Deserialize, Debug)]
#[serde(try_from = "IntermediateData")]
struct Data {
#[serde(alias = "type")]
type_: ContentType,
value: Value,
}
Nothing crazy yet or much different. All the "magic" happens in the IntermediateData type, along with the impl TryFrom.
First, let's introduce a check_type(), which takes a ContentType and checks it against the Content. If the Content variant doesn't match the ContentType variant, then convert it.
In short, when using #[serde(untagged)] then when serde attempts to deserialize Content it will always return the first successful variant it can deserialize to if any. So if it can deserialize a Vec<u8>, then it will always result in Content::TypeA(). Knowing this, then in our check_type(), if the ContentType is TypeB and the Content is TypeA. Then we simply change it to TypeB.
impl Content {
// TODO: impl proper error type instead of `String`
fn check_type(self, type_: ContentType) -> Result<Self, String> {
match (type_, self) {
(ContentType::TypeA, content # Self::TypeA(_)) => Ok(content),
(ContentType::TypeB, Self::TypeA(content)) => Ok(Self::TypeB(content)),
(ContentType::TypeC | ContentType::TypeD, content) => Ok(content),
(type_, content) => Err(format!(
"unexpected combination of {:?} and {:?}",
type_, content
)),
}
}
}
Now all we need is the intermediate IntermediateData, along with a TryFrom conversion, which calls check_type() on the Content.
#[derive(Deserialize, Debug)]
struct IntermediateData {
#[serde(alias = "type")]
type_: ContentType,
value: Value,
}
impl TryFrom<IntermediateData> for Data {
// TODO: impl proper error type instead of `String`
type Error = String;
fn try_from(mut data: IntermediateData) -> Result<Self, Self::Error> {
data.value.content = data.value.content.check_type(data.type_)?;
Ok(Data {
type_: data.type_,
value: data.value,
})
}
}
That's all. Now we can test it against the following:
// serde = { version = "1", features = ["derive"] }
// serde_json = "1.0"
use std::convert::TryFrom;
use serde::Deserialize;
// ... all the previous code ...
fn main() {
let json = r#"{ "type": "TypeA", "value": { "id": "foo", "content": [0, 1, 2, 3] } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeB", "value": { "id": "foo", "content": [0, 1, 2, 3] } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeC", "value": { "id": "bar", "content": "foo" } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeD", "value": { "id": "baz", "content": { "foo": 1, "bar": 2 } } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
}
Then it correctly results in Datas with Content::TypeA, Content::TypeB, Content::TypeC, and the last one Content::TypeD.
Lastly. There is issue #939 which talks about adding a #[serde(validate = "...")]. However, it was created in 2017, so I wouldn't hold my breath on it.
DeserializeSeed can be mixed with normal Deserialize code. It does not need to be used for all types. Here, it is enough to use it to deserialize Value.
Playground
use serde::de::{DeserializeSeed, IgnoredAny, MapAccess, Visitor};
use serde::*;
use std::fmt;
#[derive(Debug)]
enum ContentType {
A,
B,
Unknown,
}
#[derive(Debug)]
enum Content {
TypeA(String),
TypeB(i32),
}
#[derive(Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Debug)]
struct Data {
typ: String,
value: Value,
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
struct DataVisitor;
impl<'de> Visitor<'de> for DataVisitor {
type Value = Data;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct Data")
}
fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
where
A: MapAccess<'de>,
{
let mut typ = None;
let mut value = None;
while let Some(key) = access.next_key()? {
match key {
"type" => {
typ = Some(access.next_value()?);
}
"value" => {
let seed = match typ.as_deref() {
Some("TypeA") => ContentType::A,
Some("TypeB") => ContentType::B,
_ => ContentType::Unknown,
};
value = Some(access.next_value_seed(seed)?);
}
_ => {
access.next_value::<IgnoredAny>()?;
}
}
}
Ok(Data {
typ: typ.unwrap(),
value: value.unwrap(),
})
}
}
deserializer.deserialize_map(DataVisitor)
}
}
impl<'de> DeserializeSeed<'de> for ContentType {
type Value = Value;
fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
where
D: Deserializer<'de>,
{
struct ValueVisitor(ContentType);
impl<'de> Visitor<'de> for ValueVisitor {
type Value = Value;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct Value")
}
fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
where
A: MapAccess<'de>,
{
let mut id = None;
let mut content = None;
while let Some(key) = access.next_key()? {
match key {
"id" => {
id = Some(access.next_value()?);
}
"content" => {
content = Some(match self.0 {
ContentType::A => Content::TypeA(access.next_value()?),
ContentType::B => Content::TypeB(access.next_value()?),
ContentType::Unknown => {
panic!("Should not happen if type happens to occur before value, but JSON is unordered.");
}
});
}
_ => {
access.next_value::<IgnoredAny>()?;
}
}
}
Ok(Value {
id: id.unwrap(),
content: content.unwrap(),
})
}
}
deserializer.deserialize_map(ValueVisitor(self))
}
}
fn main() {
let j = r#"{"type": "TypeA", "value": {"id": "blah", "content": "0xa1b.."}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap());
let j = r#"{"type": "TypeB", "value": {"id": "blah", "content": 666}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap());
let j = r#"{"type": "TypeB", "value": {"id": "blah", "content": "Foobar"}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap_err());
}
The main downside of this solution is that you lose the possibility of deriving the code.

How to create a macro that matches enum variants without knowing its structure?

I've found the following solution to create a macro that defines a function which returns true if an enum matches a variant:
macro_rules! is_variant {
($name: ident, $enum_type: ty, $enum_pattern: pat) => {
fn $name(value: &$enum_type) -> bool {
matches!(value, $enum_pattern)
}
}
}
Usage:
enum TestEnum {
A,
B(),
C(i32, i32),
}
is_variant!(is_a, TestEnum, TestEnum::A);
is_variant!(is_b, TestEnum, TestEnum::B());
is_variant!(is_c, TestEnum, TestEnum::C(_, _));
assert_eq!(is_a(&TestEnum::A), true);
assert_eq!(is_a(&TestEnum::B()), false);
assert_eq!(is_a(&TestEnum::C(1, 1)), false);
Is there a way to define this macro so that
providing placeholders for the variant data can be avoided?
In other words, change the macro to be able to use it like so:
is_variant!(is_a, TestEnum, TestEnum::A);
is_variant!(is_a, TestEnum, TestEnum::B);
is_variant!(is_a, TestEnum, TestEnum::C);
Using std::mem::discriminant, as described in Compare enums only by variant, not value, doesn't help since it can only be used to compare two enum instances. In this case there is only one single object and the variant identifier.
It also mentions matching on TestEnum::A(..) but that doesn't work if the variant has no data.
You can do that using proc macros. There is a chapter in rust book that may help.
Then you can use it like:
use is_variant_derive::IsVariant;
#[derive(IsVariant)]
enum TestEnum {
A,
B(),
C(i32, i32),
D { _name: String, _age: i32 },
}
fn main() {
let x = TestEnum::C(1, 2);
assert!(x.is_c());
let x = TestEnum::A;
assert!(x.is_a());
let x = TestEnum::B();
assert!(x.is_b());
let x = TestEnum::D {_name: "Jane Doe".into(), _age: 30 };
assert!(x.is_d());
}
For the above effect, proc macro crate will look like:
is_variant_derive/src/lib.rs:
extern crate proc_macro;
use proc_macro::TokenStream;
use proc_macro2::{Span, TokenStream as TokenStream2};
use quote::{format_ident, quote, quote_spanned};
use syn::spanned::Spanned;
use syn::{parse_macro_input, Data, DeriveInput, Error, Fields};
// https://crates.io/crates/convert_case
use convert_case::{Case, Casing};
macro_rules! derive_error {
($string: tt) => {
Error::new(Span::call_site(), $string)
.to_compile_error()
.into();
};
}
#[proc_macro_derive(IsVariant)]
pub fn derive_is_variant(input: TokenStream) -> TokenStream {
// See https://doc.servo.org/syn/derive/struct.DeriveInput.html
let input: DeriveInput = parse_macro_input!(input as DeriveInput);
// get enum name
let ref name = input.ident;
let ref data = input.data;
let mut variant_checker_functions;
// data is of type syn::Data
// See https://doc.servo.org/syn/enum.Data.html
match data {
// Only if data is an enum, we do parsing
Data::Enum(data_enum) => {
// data_enum is of type syn::DataEnum
// https://doc.servo.org/syn/struct.DataEnum.html
variant_checker_functions = TokenStream2::new();
// Iterate over enum variants
// `variants` if of type `Punctuated` which implements IntoIterator
//
// https://doc.servo.org/syn/punctuated/struct.Punctuated.html
// https://doc.servo.org/syn/struct.Variant.html
for variant in &data_enum.variants {
// Variant's name
let ref variant_name = variant.ident;
// Variant can have unnamed fields like `Variant(i32, i64)`
// Variant can have named fields like `Variant {x: i32, y: i32}`
// Variant can be named Unit like `Variant`
let fields_in_variant = match &variant.fields {
Fields::Unnamed(_) => quote_spanned! {variant.span()=> (..) },
Fields::Unit => quote_spanned! { variant.span()=> },
Fields::Named(_) => quote_spanned! {variant.span()=> {..} },
};
// construct an identifier named is_<variant_name> for function name
// We convert it to snake case using `to_case(Case::Snake)`
// For example, if variant is `HelloWorld`, it will generate `is_hello_world`
let mut is_variant_func_name =
format_ident!("is_{}", variant_name.to_string().to_case(Case::Snake));
is_variant_func_name.set_span(variant_name.span());
// Here we construct the function for the current variant
variant_checker_functions.extend(quote_spanned! {variant.span()=>
fn #is_variant_func_name(&self) -> bool {
match self {
#name::#variant_name #fields_in_variant => true,
_ => false,
}
}
});
// Above we are making a TokenStream using extend()
// This is because TokenStream is an Iterator,
// so we can keep extending it.
//
// proc_macro2::TokenStream:- https://docs.rs/proc-macro2/1.0.24/proc_macro2/struct.TokenStream.html
// Read about
// quote:- https://docs.rs/quote/1.0.7/quote/
// quote_spanned:- https://docs.rs/quote/1.0.7/quote/macro.quote_spanned.html
// spans:- https://docs.rs/syn/1.0.54/syn/spanned/index.html
}
}
_ => return derive_error!("IsVariant is only implemented for enums"),
};
let (impl_generics, ty_generics, where_clause) = input.generics.split_for_impl();
let expanded = quote! {
impl #impl_generics #name #ty_generics #where_clause {
// variant_checker_functions gets replaced by all the functions
// that were constructed above
#variant_checker_functions
}
};
TokenStream::from(expanded)
}
Cargo.toml for the library named is_variant_derive:
[lib]
proc-macro = true
[dependencies]
syn = "1.0"
quote = "1.0"
proc-macro2 = "1.0"
convert_case = "0.4.0"
Cargo.toml for the binary:
[dependencies]
is_variant_derive = { path = "../is_variant_derive" }
Then have both crates in the same directory (workspace) and then have this Cargo.toml:
[workspace]
members = [
"bin",
"is_variant_derive",
]
Playground
Also note that proc-macro needs to exist in its own separate crate.
Or you can directly use is_variant crate.

How can I set a struct field value by string name?

Out of habit from interpreted programming languages, I want to rewrite many values based on their key. I assumed that I would store all the information in the struct prepared for this project. So I started iterating:
struct Container {
x: String,
y: String,
z: String
}
impl Container {
// (...)
fn load_data(&self, data: &HashMap<String, String>) {
let valid_keys = vec_of_strings![ // It's simple vector with Strings
"x", "y", "z"
] ;
for key_name in &valid_keys {
if data.contains_key(key_name) {
self[key_name] = Some(data.get(key_name);
// It's invalid of course but
// I do not know how to write it correctly.
// For example, in PHP I would write it like this:
// $this[$key_name] = $data[$key_name];
}
}
}
// (...)
}
Maybe macros? I tried to use them. key_name is always interpreted as it is, I cannot get value of key_name instead.
How can I do this without repeating the code for each value?
With macros, I always advocate starting from the direct code, then seeing what duplication there is. In this case, we'd start with
fn load_data(&mut self, data: &HashMap<String, String>) {
if let Some(v) = data.get("x") {
self.x = v.clone();
}
if let Some(v) = data.get("y") {
self.y = v.clone();
}
if let Some(v) = data.get("z") {
self.z = v.clone();
}
}
Note the number of differences:
The struct must take &mut self.
It's inefficient to check if a value is there and then get it separately.
We need to clone the value because we only only have a reference.
We cannot store an Option in a String.
Once you have your code working, you can see how to abstract things. Always start by trying to use "lighter" abstractions (functions, traits, etc.). Only after exhausting that, I'd start bringing in macros. Let's start by using stringify
if let Some(v) = data.get(stringify!(x)) {
self.x = v.clone();
}
Then you can extract out a macro:
macro_rules! thing {
($this: ident, $data: ident, $($name: ident),+) => {
$(
if let Some(v) = $data.get(stringify!($name)) {
$this.$name = v.clone();
}
)+
};
}
impl Container {
fn load_data(&mut self, data: &HashMap<String, String>) {
thing!(self, data, x, y, z);
}
}
fn main() {
let mut c = Container::default();
let d: HashMap<_, _> = vec![("x".into(), "alpha".into())].into_iter().collect();
c.load_data(&d);
println!("{:?}", c);
}
Full disclosure: I don't think this is a good idea.

Resources