How do I annotate a HashMap? - rust

I want to write a function to get some data from a HashMap, except that its values can themselves be HashMaps.
So, my data would look something like this:
{
"foo": {
"bar": "baz"
}
}
And I'm looking for an annotation like HashMap<String, HashMap<...>>.
How can I annotate this in my function?

I assume with "can be a HashMap" you mean "is definitely a HashMap"? Because in Rust, you have to know all the types at compile time, so a "maybe" isn't quite that easy.
Assuming you mean "is definitely", this works:
use std::collections::HashMap;
fn main() {
let mut x: HashMap<String, HashMap<String, String>> = HashMap::new();
let foo_entry = x.entry("foo".to_string()).or_insert_with(HashMap::new);
foo_entry.insert("bar".to_string(), "baz".to_string());
println!("{:#?}", x);
}
{
"foo": {
"bar": "baz",
},
}
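To use that type in a function signature, you can write the nested type out in full or give it a name with a type alias. Here is a minimal sketch; `Nested` and `lookup` are hypothetical names chosen for illustration:

```rust
use std::collections::HashMap;

// Hypothetical alias for the nested map type from the question.
type Nested = HashMap<String, HashMap<String, String>>;

// Looks up `outer`, then `inner`; `?` returns None if either key is missing.
fn lookup<'a>(map: &'a Nested, outer: &str, inner: &str) -> Option<&'a String> {
    map.get(outer)?.get(inner)
}

fn main() {
    let mut x: Nested = HashMap::new();
    x.entry("foo".to_string())
        .or_default()
        .insert("bar".to_string(), "baz".to_string());

    println!("{:?}", lookup(&x, "foo", "bar")); // Some("baz")
    println!("{:?}", lookup(&x, "foo", "nope")); // None
}
```

The alias is purely cosmetic: `&Nested` and `&HashMap<String, HashMap<String, String>>` are the same type.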
If you mean "might be", then I assume you have a JSON-like situation, for which I can wholeheartedly recommend the serde_json library:
use serde_json::json;
fn main() {
let x = json!({
"foo": {
"bar": "baz"
}
});
println!("As Debug: \n{:#?}\n", x);
println!("As JSON serialized: \n{}", x.to_string());
}
As Debug:
Object({
"foo": Object({
"bar": String(
"baz",
),
}),
})
As JSON serialized:
{"foo":{"bar":"baz"}}
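If you want the "might be a HashMap" case without pulling in a dependency, a recursive enum can express it with the standard library alone. This is a minimal sketch; `Node` and `get_path` are hypothetical names, and leaves are assumed to always be strings:

```rust
use std::collections::HashMap;

// Hypothetical recursive type: each value is either a plain string
// or another map of the same shape, so the nesting depth is not fixed.
#[derive(Debug)]
enum Node {
    Leaf(String),
    Map(HashMap<String, Node>),
}

// Follows a path of keys through nested maps. Returns None if a key is
// missing or if the path runs into a Leaf before it ends.
fn get_path<'a>(node: &'a Node, path: &[&str]) -> Option<&'a Node> {
    let mut current = node;
    for key in path {
        current = match current {
            Node::Map(map) => map.get(*key)?,
            Node::Leaf(_) => return None,
        };
    }
    Some(current)
}

fn main() {
    let mut inner = HashMap::new();
    inner.insert("bar".to_string(), Node::Leaf("baz".to_string()));
    let mut outer = HashMap::new();
    outer.insert("foo".to_string(), Node::Map(inner));
    let root = Node::Map(outer);

    println!("{:?}", get_path(&root, &["foo", "bar"])); // Some(Leaf("baz"))
}
```

serde_json::Value is essentially a richer version of this idea, with extra variants for numbers, booleans, arrays, and null.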

Related

How to parse TOML in Rust with unknown structure?

My configuration file has a large number of arbitrary key-value pairs in it, which I want to parse using the toml crate. However, it seems as if the standard way is to use a given struct that fits the configuration file. How can I load the key-value pairs into a data structure like a map or an iterator of pairs, instead of having to specify the structure beforehand with a struct?
The toml crate has a Value type (an enum) that can hold any TOML value and that you can inspect dynamically to discover its content, without forcing the use of a specific structure.
use toml::Value;
fn show_value(
value: &Value,
indent: usize,
) {
let pfx = " ".repeat(indent);
print!("{}", pfx);
match value {
Value::String(string) => {
println!("a string --> {}", string);
}
Value::Integer(integer) => {
println!("an integer --> {}", integer);
}
Value::Float(float) => {
println!("a float --> {}", float);
}
Value::Boolean(boolean) => {
println!("a boolean --> {}", boolean);
}
Value::Datetime(datetime) => {
println!("a datetime --> {}", datetime);
}
Value::Array(array) => {
println!("an array");
for v in array.iter() {
show_value(v, indent + 1);
}
}
Value::Table(table) => {
println!("a table");
for (k, v) in table.iter() {
println!("{}key {}", pfx, k);
show_value(v, indent + 1);
}
}
}
}
fn main() {
let input_text = r#"
abc = 123
[def]
ghi = "hello"
jkl = [ 12.34, 56.78 ]
"#;
let value = input_text.parse::<Value>().unwrap();
show_value(&value, 0);
}
/*
a table
key abc
an integer --> 123
key def
a table
key ghi
a string --> hello
key jkl
an array
a float --> 12.34
a float --> 56.78
*/
You actually don't need to do anything special other than tell it to deserialize into a HashMap:
use std::collections::HashMap;
use toml;
fn main() {
let toml_data = r#"
foo = "123"
bar = "456"
"#;
let config: HashMap<String, String> = toml::from_str(toml_data).unwrap();
println!("{:?}", config);
}
Of course, since TOML and Rust are both typed, all your values need to be the same type (String in this example), and a HashMap<String, String> cannot hold a table, because a table value does not deserialize into a String.
If you do have a couple of tables, just add your maps as fields to a struct, and that works just as simply:
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use toml;
#[derive(Debug, Serialize, Deserialize)]
struct Config {
data_a: HashMap<String, String>,
data_b: HashMap<String, String>,
}
fn main() {
let toml_data = r#"
[data_a]
foo = "123"
bar = "456"
[data_b]
bat = "123"
baz = "456"
"#;
let config: Config = toml::from_str(toml_data).unwrap();
println!("{:?}", config);
}
For what it is worth:
I came here to find a way to handle a toml-config file for my project.
This is what I've found:
You can parse an arbitrary toml file by using the Table type.
See the documentation.
All types can be parsed automatically, but you cannot escape the fact that Rust is typed, so you have to convert each value into the type you expect.
See my example:
use toml::Table;
fn main() {
//Load toml file
let path = std::path::Path::new("../Cargo.toml");
let file = match std::fs::read_to_string(path) {
Ok(f) => f,
Err(e) => panic!("{}", e),
};
let cfg: Table = file.parse().unwrap();
println!("Config in table format\n");
dbg!(&cfg);
println!("Index into config");
let cfg_string: &str = cfg["package"]["version"].as_str().unwrap();
println!("Version: {:?}", cfg_string);
let cfg_bool: bool = cfg["package"]["nest"]["nested_bool"].as_bool().unwrap();
println!("Nested bool: {:?}", cfg_bool);
// Default value if failed
let cfg_float: f64 = cfg["package"]["nest"]["nested_int"]
.as_float()
.unwrap_or(5.0);
println!("Default float to value: {:?}", cfg_float);
}
The TOML file:
[package]
name = "rust_test"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
toml = "0.6.0"
[package.nest]
nested_float = 1.0
nested_int = 1
nested_bool = false
Output:
Config in table format
[src/main.rs:13] &cfg = {
"dependencies": Table(
{
"toml": String(
"0.6.0",
),
},
),
"package": Table(
{
"edition": String(
"2021",
),
"name": String(
"rust_test",
),
"nest": Table(
{
"nested_bool": Boolean(
false,
),
"nested_float": Float(
1.0,
),
"nested_int": Integer(
1,
),
},
),
"version": String(
"0.1.0",
),
},
),
}
Index into config
Version: "0.1.0"
Nested bool: false
Default float to value: 5.0

serde: deserialize a field based on the value of another field

I would like to deserialize a wire format, like this JSON, into the Data structure below and I am failing to write the serde Deserialize implementations for the corresponding rust types.
{ "type": "TypeA", "value": { "id": "blah", "content": "0xa1b.." } }
enum Content {
TypeA(Vec<u8>),
TypeB(BigInt),
}
struct Value {
id: String,
content: Content,
}
struct Data {
typ: String,
value: Value,
}
The difficulty is selecting the correct value of the Content enumeration, which is based on the typ value.
As far as I know, deserialization in serde is stateless, and hence there is no way of either
knowing what the value of typ is at the time of deserialization of content (even though the deserialization order is guaranteed)
or injecting the value of typ in the deserializer then collecting it.
How can this be achieved with serde?
I have looked at
serde_state but I cannot get the macros working and this library is wrapping serde, which worries me
DeserializeSeed, but my understanding is that it must be used in place of Deserialize for all types, and my data model is big
The existing SO answers usually exploit the fact that the related fields are at the same level. This is not the case here: the actual data model is big, deep and the fields are "far apart"
It is much simpler with an adjacently tagged enum (#[serde(tag = ..., content = ...)]), but that requires changing your data structure:
use serde::{Deserialize, Deserializer}; // 1.0.130
use serde_json; // 1.0.67
#[derive(Debug, Deserialize)]
#[serde(tag = "type", content = "value")]
enum Data {
TypeA(Value<String>),
TypeB(Value<u32>),
}
#[derive(Debug, Deserialize)]
struct Value<T> {
id: String,
content: T,
}
fn main() {
let input = r#"{"type": "TypeA", "value": { "id": "blah", "content": "0xa1b..."}}"#;
let data: Data = serde_json::from_str(input).unwrap();
println!("{:?}", data);
}
Alternatively, you can write your own custom deserializer that goes through an intermediate serde_json::Value:
use serde::{Deserialize, Deserializer};// 1.0.130
use serde_json; // 1.0.67
#[derive(Debug)]
enum Content {
TypeA(String),
TypeB(String),
}
#[derive(Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Debug)]
struct Data {
typ: String,
value: Value,
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let json: serde_json::value::Value = serde_json::value::Value::deserialize(deserializer)?;
let typ = json.get("type").expect("type").as_str().unwrap();
let value = json.get("value").expect("value");
let id = value.get("id").expect("id").as_str().unwrap();
let content = value.get("content").expect("content").as_str().unwrap();
Ok(Data {
typ: typ.to_string(),
value: Value {
id: id.to_string(),
content: {
match typ {
"TypeA" => Content::TypeA(content.to_string()),
"TypeB" => Content::TypeB(content.to_string()),
_ => panic!("Invalid type, but this should be an error not a panic"),
}
}
}
})
}
}
fn main() {
let input = r#"{"type": "TypeA", "value": { "id": "blah", "content": "0xa1b..."}}"#;
let data: Data = serde_json::from_str(input).unwrap();
println!("{:?}", data);
}
Disclaimer: I didn't handle errors correctly, and you could also extract the content matching into a function, for example. The above code is just to illustrate the main idea.
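Extracting the content matching into a function could look roughly like this. It is a sketch using the same Content shape as the answer above; `content_for_type` is a hypothetical name, and it returns an error for an unknown tag instead of panicking:

```rust
// Same Content shape as the answer above, redeclared so this compiles alone.
#[derive(Debug, PartialEq)]
enum Content {
    TypeA(String),
    TypeB(String),
}

// Hypothetical helper: picks the Content variant from the "type" tag and
// reports an unknown tag as an error instead of panicking.
fn content_for_type(typ: &str, content: &str) -> Result<Content, String> {
    match typ {
        "TypeA" => Ok(Content::TypeA(content.to_string())),
        "TypeB" => Ok(Content::TypeB(content.to_string())),
        other => Err(format!("unknown type {:?}", other)),
    }
}

fn main() {
    println!("{:?}", content_for_type("TypeA", "0xa1b..."));
    println!("{:?}", content_for_type("TypeC", "0xa1b..."));
}
```

Inside `deserialize`, the String error can then be turned into `D::Error` with `serde::de::Error::custom`, e.g. `content_for_type(typ, content).map_err(serde::de::Error::custom)?`.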
There are a few different ways this can be solved, e.g. with a custom impl Deserialize for Data that deserializes into a serde_json::Value and then manually juggles between the types.
For a somewhat similar example, check out this answer that I wrote in the past. It's not a one-to-one solution, but it might give some hints for implementing Deserialize manually for what you want.
That being said, personally I prefer to minimize how often I have to impl Deserialize manually. Instead, I deserialize into another type and have it converted automatically using #[serde(from = "FromType")], or #[serde(try_from = "FromType")] when the conversion can fail, as below.
First, instead of typ: String, I'd suggest we introduce an enum ContentType.
#[derive(Deserialize, Clone, Copy, Debug)]
enum ContentType {
TypeA,
TypeB,
TypeC,
TypeD,
}
Now, let's consider the types you introduced. I've added a few extra variants to Content, as you mentioned the variants can be different.
#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum Content {
TypeA(Vec<u8>),
TypeB(Vec<u8>),
TypeC(String),
TypeD { foo: i32, bar: i32 },
}
#[derive(Deserialize, Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Deserialize, Debug)]
#[serde(try_from = "IntermediateData")]
struct Data {
#[serde(alias = "type")]
type_: ContentType,
value: Value,
}
Nothing crazy yet or much different. All the "magic" happens in the IntermediateData type, along with the impl TryFrom.
First, let's introduce a check_type(), which takes a ContentType and checks it against the Content. If the Content variant doesn't match the ContentType variant, it converts the Content when possible and returns an error otherwise.
With #[serde(untagged)], serde tries each variant in order and returns the first one that deserializes successfully. So anything that deserializes as a Vec<u8> always becomes Content::TypeA(). Knowing this, check_type() simply changes Content::TypeA into Content::TypeB when the ContentType is TypeB.
impl Content {
// TODO: impl proper error type instead of `String`
fn check_type(self, type_: ContentType) -> Result<Self, String> {
match (type_, self) {
(ContentType::TypeA, content @ Self::TypeA(_)) => Ok(content),
(ContentType::TypeB, Self::TypeA(content)) => Ok(Self::TypeB(content)),
(ContentType::TypeC | ContentType::TypeD, content) => Ok(content),
(type_, content) => Err(format!(
"unexpected combination of {:?} and {:?}",
type_, content
)),
}
}
}
Now all we need is the IntermediateData type, along with a TryFrom conversion that calls check_type() on the Content.
#[derive(Deserialize, Debug)]
struct IntermediateData {
#[serde(alias = "type")]
type_: ContentType,
value: Value,
}
impl TryFrom<IntermediateData> for Data {
// TODO: impl proper error type instead of `String`
type Error = String;
fn try_from(mut data: IntermediateData) -> Result<Self, Self::Error> {
data.value.content = data.value.content.check_type(data.type_)?;
Ok(Data {
type_: data.type_,
value: data.value,
})
}
}
That's all. Now we can test it against the following:
// serde = { version = "1", features = ["derive"] }
// serde_json = "1.0"
use std::convert::TryFrom;
use serde::Deserialize;
// ... all the previous code ...
fn main() {
let json = r#"{ "type": "TypeA", "value": { "id": "foo", "content": [0, 1, 2, 3] } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeB", "value": { "id": "foo", "content": [0, 1, 2, 3] } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeC", "value": { "id": "bar", "content": "foo" } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
let json = r#"{ "type": "TypeD", "value": { "id": "baz", "content": { "foo": 1, "bar": 2 } } }"#;
let data: Data = serde_json::from_str(json).unwrap();
println!("{:#?}", data);
}
This correctly produces Data values with Content::TypeA, Content::TypeB, Content::TypeC, and Content::TypeD respectively.
Lastly, there is serde issue #939, which discusses adding a #[serde(validate = "...")] attribute. However, it was opened in 2017, so I wouldn't hold my breath.
DeserializeSeed can be mixed with normal Deserialize code. It does not need to be used for all types. Here, it is enough to use it to deserialize Value.
use serde::de::{DeserializeSeed, IgnoredAny, MapAccess, Visitor};
use serde::*;
use std::fmt;
#[derive(Debug)]
enum ContentType {
A,
B,
Unknown,
}
#[derive(Debug)]
enum Content {
TypeA(String),
TypeB(i32),
}
#[derive(Debug)]
struct Value {
id: String,
content: Content,
}
#[derive(Debug)]
struct Data {
typ: String,
value: Value,
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
struct DataVisitor;
impl<'de> Visitor<'de> for DataVisitor {
type Value = Data;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct Data")
}
fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
where
A: MapAccess<'de>,
{
let mut typ = None;
let mut value = None;
while let Some(key) = access.next_key()? {
match key {
"type" => {
typ = Some(access.next_value()?);
}
"value" => {
let seed = match typ.as_deref() {
Some("TypeA") => ContentType::A,
Some("TypeB") => ContentType::B,
_ => ContentType::Unknown,
};
value = Some(access.next_value_seed(seed)?);
}
_ => {
access.next_value::<IgnoredAny>()?;
}
}
}
Ok(Data {
typ: typ.unwrap(),
value: value.unwrap(),
})
}
}
deserializer.deserialize_map(DataVisitor)
}
}
impl<'de> DeserializeSeed<'de> for ContentType {
type Value = Value;
fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
where
D: Deserializer<'de>,
{
struct ValueVisitor(ContentType);
impl<'de> Visitor<'de> for ValueVisitor {
type Value = Value;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct Value")
}
fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
where
A: MapAccess<'de>,
{
let mut id = None;
let mut content = None;
while let Some(key) = access.next_key()? {
match key {
"id" => {
id = Some(access.next_value()?);
}
"content" => {
content = Some(match self.0 {
ContentType::A => Content::TypeA(access.next_value()?),
ContentType::B => Content::TypeB(access.next_value()?),
ContentType::Unknown => {
panic!("Should not happen if type happens to occur before value, but JSON is unordered.");
}
});
}
_ => {
access.next_value::<IgnoredAny>()?;
}
}
}
Ok(Value {
id: id.unwrap(),
content: content.unwrap(),
})
}
}
deserializer.deserialize_map(ValueVisitor(self))
}
}
fn main() {
let j = r#"{"type": "TypeA", "value": {"id": "blah", "content": "0xa1b.."}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap());
let j = r#"{"type": "TypeB", "value": {"id": "blah", "content": 666}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap());
let j = r#"{"type": "TypeB", "value": {"id": "blah", "content": "Foobar"}}"#;
dbg!(serde_json::from_str::<Data>(j).unwrap_err());
}
The main downside of this solution is that you can no longer derive Deserialize for Data and Value; those implementations have to be written by hand.

Custom serde serialization for enum type

I have the following data structure which should be able to hold either a String, a u64 value, a boolean value, or a String vector.
#[derive(Serialize, Deserialize)]
pub enum JsonRpcParam {
String(String),
Int(u64),
Bool(bool),
Vec(Vec<String>)
}
The use case for this data structure is to build JSON RPC parameters which can have multiple types, so I would be able to build a parameter list like this:
let mut params : Vec<JsonRpcParam> = Vec::new();
params.push(JsonRpcParam::String("Test".to_string()));
params.push(JsonRpcParam::Bool(true));
params.push(JsonRpcParam::Int(64));
params.push(JsonRpcParam::Vec(vec![String::from("abc"), String::from("cde")]));
My problem is now the serialization. I am using serde_json for the serialization part. The default serialization of the vector posted above yields:
[
{
"String":"Test"
},
{
"Bool":true
},
{
"Int":64
},
{
"Vec":[
"abc",
"cde"
]
}
]
Instead, I would like the serialization to look like this:
[
"Test",
true,
64,
["abc","cde"]
]
I attempted to implement a custom serialize method for this type, but I don't know how to achieve what I want. My attempt looks like this:
impl Serialize for JsonRpcParam {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer {
match *self {
JsonRpcParam::String(x) => serializer.serialize_str(x),
JsonRpcParam::Int(x) => serializer.serialize_u64(x),
JsonRpcParam::Bool(x) => serializer.serialize_bool(x),
JsonRpcParam::Vec(x) => _
}
}
}
Instead of manually implementing Serialize you can instead use #[serde(untagged)].
In your case that will work perfectly fine. Be warned, however, that if an enum variant isn't unique and can't be clearly identified from the JSON, deserialization picks the first variant that matches. For example, if you also had a later variant JsonRpcParam::OtherString(String), a JSON string would always deserialize into JsonRpcParam::String(String).
#[derive(Serialize, Deserialize, Debug)]
#[serde(untagged)]
pub enum JsonRpcParam {
String(String),
Int(u64),
Bool(bool),
Vec(Vec<String>),
}
If you now use e.g. serde_json::to_string_pretty(), then it will yield an output in your desired format:
fn main() {
let mut params: Vec<JsonRpcParam> = Vec::new();
params.push(JsonRpcParam::String("Test".to_string()));
params.push(JsonRpcParam::Bool(true));
params.push(JsonRpcParam::Int(64));
params.push(JsonRpcParam::Vec(vec![
String::from("abc"),
String::from("cde"),
]));
let json = serde_json::to_string_pretty(&params).unwrap();
println!("{}", json);
}
Output:
[
"Test",
true,
64,
[
"abc",
"cde"
]
]

Access struct field by variable

I want to iterate over the fields of a struct and access the respective value in each iteration:
#[derive(Default, Debug)]
struct A {
foo: String,
bar: String,
baz: String
}
fn main() {
let fields = vec!["foo", "bar", "baz"];
let a: A = Default::default();
for field in fields {
let value = a[field] // this doesn't work
}
}
How can I access a field by variable?
Rust doesn't have any way of iterating directly over a struct's fields. You should instead use a collection type such as Vec, an array, or one of the collections in std::collections if your data semantically represents a collection of some sort.
If you still feel the need to iterate over the fields, perhaps you need to reconsider your approach and see if there isn't a more idiomatic way to accomplish it.
By pattern matching on the field name, you can map each name to its field and look the fields up one by one:
#[derive(Default, Debug)]
struct A {
foo: String,
bar: String,
baz: String
}
impl A {
fn get(&self, field_string: &str) -> Result<&String, String> {
match field_string {
"foo" => Ok(&self.foo),
"bar" => Ok(&self.bar),
"baz" => Ok(&self.baz),
_ => Err(format!("invalid field name to get '{}'", field_string))
}
}
}
fn main() {
let fields = vec!["foo", "bar", "baz"];
let a = A {
foo: "value_of_foo".to_string(),
bar: "value_of_bar".to_string(),
baz: "value_of_baz".to_string()
};
for field in fields {
let value = a.get(field).unwrap();
println!("{:?}", value);
}
}
This prints:
"value_of_foo"
"value_of_bar"
"value_of_baz"
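If the goal is only to iterate, a method that returns name/value pairs avoids both the string matching and the error case. A minimal sketch; `fields` is a hypothetical method name, and it assumes all fields share the type String:

```rust
#[derive(Default, Debug)]
struct A {
    foo: String,
    bar: String,
    baz: String,
}

impl A {
    // Hypothetical helper: pairs each field name with a reference to its value.
    fn fields(&self) -> [(&'static str, &String); 3] {
        [("foo", &self.foo), ("bar", &self.bar), ("baz", &self.baz)]
    }
}

fn main() {
    let a = A {
        foo: "value_of_foo".to_string(),
        bar: "value_of_bar".to_string(),
        baz: "value_of_baz".to_string(),
    };
    for (name, value) in a.fields() {
        println!("{} = {:?}", name, value);
    }
}
```

The array still has to be kept in sync with the struct by hand, which is exactly what a derive macro like the one below automates.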
I am now writing a derive macro that generates such code automatically for any struct, although it may still have some bugs:
field_accessor (https://github.com/europeanplaice/field_accessor).
Cargo.toml
[dependencies]
field_accessor = "0"
use field_accessor::FieldAccessor;
#[derive(Default, Debug, FieldAccessor)]
struct A {
foo: String,
bar: String,
baz: String
}
fn main() {
let a = A {
foo: "value_of_foo".to_string(),
bar: "value_of_bar".to_string(),
baz: "value_of_baz".to_string()
};
for field in a.getstructinfo().field_names.iter() {
let value = a.get(field).unwrap();
println!("{:?}", value);
}
}
It also prints:
"value_of_foo"
"value_of_bar"
"value_of_baz"
Based on sshashank124's answer, I came to the conclusion that I should use a HashMap instead of a struct:
use std::collections::HashMap;
fn main() {
let mut b = HashMap::new();
b.insert("foo", 1);
b.insert("bar", 2);
b.insert("baz", 3);
let fields = vec!["foo", "bar", "baz"];
for &field in &fields {
// `get` returns an Option, since the key might be missing
let value = b.get(field);
println!("{:?}", value);
}
}

What's an idiomatic way to delete a value from HashMap if it is empty?

The following code works, but it doesn't look nice as the definition of is_empty is too far away from the usage.
fn remove(&mut self, index: I, primary_key: &Rc<K>) {
let is_empty;
{
let ks = self.data.get_mut(&index).unwrap();
ks.remove(primary_key);
is_empty = ks.is_empty();
}
// I have to wrap `ks` in an inner scope so that we can borrow `data` mutably.
if is_empty {
self.data.remove(&index);
}
}
Is there a way to drop the variables borrowed in the condition before entering the if branch, e.g.
if {ks.is_empty()} {
self.data.remove(&index);
}
Whenever you have a double look-up of a key, think of the Entry API.
With the entry API, you get a handle to a key-value pair and can:
read the key,
read/modify the value,
remove the entry entirely (getting the key and value back).
It's extremely powerful.
In this case:
use std::collections::HashMap;
use std::collections::hash_map::Entry;
fn remove(hm: &mut HashMap<i32, String>, index: i32) {
if let Entry::Occupied(o) = hm.entry(index) {
if o.get().is_empty() {
o.remove_entry();
}
}
}
fn main() {
let mut hm = HashMap::new();
hm.insert(1, String::from(""));
remove(&mut hm, 1);
println!("{:?}", hm);
}
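Applied to the shape of the question (remove an element from the inner collection, then drop the entry if it became empty), the entry version could look like this. It is a sketch that assumes a HashSet<u32> inner collection and u32 keys, since the question's actual I and K types are not shown; `remove_key` is a hypothetical name:

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

// Assumes HashSet<u32> values; the question's real types (I, Rc<K>) are generic.
fn remove_key(data: &mut HashMap<u32, HashSet<u32>>, index: u32, primary_key: &u32) {
    // One lookup: the entry handle lets us modify the value and then remove it.
    if let Entry::Occupied(mut entry) = data.entry(index) {
        entry.get_mut().remove(primary_key);
        if entry.get().is_empty() {
            entry.remove();
        }
    }
}

fn main() {
    let mut data: HashMap<u32, HashSet<u32>> = HashMap::new();
    data.entry(1).or_default().insert(10);
    remove_key(&mut data, 1, &10);
    println!("{:?}", data); // {}
}
```

A missing index is simply ignored here, rather than hitting unreachable!() as in the version below.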
I did this in the end:
// Requires: use std::collections::hash_map::Entry::{Occupied, Vacant};
match self.data.entry(index) {
Occupied(mut occupied) => {
let is_empty = {
let ks = occupied.get_mut();
ks.remove(primary_key);
ks.is_empty()
};
if is_empty {
occupied.remove();
}
},
Vacant(_) => unreachable!()
}
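For what it's worth, with non-lexical lifetimes (the default since Rust 2018), the borrow returned by get_mut ends after its last use, so the inner scope from the question is no longer needed and the check can go straight into the if condition. A sketch with the same assumed types as above (HashSet<u32> values, u32 keys, hypothetical `remove_key`):

```rust
use std::collections::{HashMap, HashSet};

// Assumes HashSet<u32> values and u32 keys, standing in for the question's types.
fn remove_key(data: &mut HashMap<u32, HashSet<u32>>, index: u32, primary_key: &u32) {
    if let Some(ks) = data.get_mut(&index) {
        ks.remove(primary_key);
        // `ks` is not used after this condition, so its mutable borrow of
        // `data` has already ended when `data.remove` runs.
        if ks.is_empty() {
            data.remove(&index);
        }
    }
}

fn main() {
    let mut data: HashMap<u32, HashSet<u32>> = HashMap::new();
    data.entry(7).or_default().insert(1);
    remove_key(&mut data, 7, &1);
    println!("{:?}", data); // {}
}
```

This does two lookups (get_mut and remove) in the emptying case, whereas the Entry version does one.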
