How to sort HashMap keys when serializing with serde? - rust

I'm serializing a HashMap with serde, like so:
#[derive(Serialize, Deserialize)]
struct MyStruct {
map: HashMap<String, String>
}
HashMap's key order is unspecified, and since the hashing is randomized (see documentation), the keys actually end up coming out in different order between identical runs.
I'd like my HashMap to be serialized in sorted (e.g. alphabetical) key order, so that the serialization is deterministic.
I could use a BTreeMap instead of a HashMap to achieve this, as BTreeMap::keys() returns its keys in sorted order, but I'd rather not change my data structure just to accommodate the serialization logic.
How do I tell serde to sort the HashMap keys before serializing?

Use the serialize_with field attribute:
use serde::{Deserialize, Serialize, Serializer}; // 1.0.106
use serde_json; // 1.0.52
use std::collections::{BTreeMap, HashMap};
#[derive(Serialize, Deserialize, Default)]
struct MyStruct {
#[serde(serialize_with = "ordered_map")]
map: HashMap<String, String>,
}
fn ordered_map<S>(value: &HashMap<String, String>, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let ordered: BTreeMap<_, _> = value.iter().collect();
ordered.serialize(serializer)
}
fn main() {
let mut m = MyStruct::default();
m.map.insert("gamma".into(), "3".into());
m.map.insert("alpha".into(), "1".into());
m.map.insert("beta".into(), "2".into());
println!("{}", serde_json::to_string_pretty(&m).unwrap());
}
Here, I've chosen to just rebuild an entire BTreeMap from the HashMap and then reuse the existing serialization implementation.
{
"map": {
"alpha": "1",
"beta": "2",
"gamma": "3"
}
}
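The ordering comes from BTreeMap itself rather than from serde. As a minimal std-only sketch (the helper names `sorted_view` and `keys_in_order` are made up for illustration), borrowing the entries of a HashMap into a BTreeMap is enough to get a deterministic, ascending key order:

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: borrow every entry of a HashMap into a BTreeMap,
/// which iterates in ascending key order.
fn sorted_view<K: Ord, V>(map: &HashMap<K, V>) -> BTreeMap<&K, &V> {
    map.iter().collect()
}

/// Collect the keys in the order a serializer would visit the sorted view.
fn keys_in_order(map: &HashMap<String, String>) -> Vec<String> {
    sorted_view(map).keys().map(|k| k.to_string()).collect()
}
```

Only references are collected, so neither keys nor values are cloned while building the sorted view.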

A slightly more generic version with automatic sorting, in two variants: one that uses itertools and one that relies only on the std lib.
// This requires the itertools crate
use itertools::Itertools;
pub fn sorted_map<S: Serializer, K: Serialize + Ord, V: Serialize>(
value: &HashMap<K, V>,
serializer: S,
) -> Result<S::Ok, S::Error> {
value
.iter()
.sorted_by_key(|v| v.0)
.collect::<BTreeMap<_, _>>()
.serialize(serializer)
}
// This only uses std (editions before 2021 also need `use std::iter::FromIterator;`)
pub fn sorted_map<S: Serializer, K: Serialize + Ord, V: Serialize>(
value: &HashMap<K, V>,
serializer: S,
) -> Result<S::Ok, S::Error> {
let mut items: Vec<(_, _)> = value.iter().collect();
items.sort_by(|a, b| a.0.cmp(&b.0));
BTreeMap::from_iter(items).serialize(serializer)
}
Both of the above functions can be used with these structs:
#[derive(Serialize)]
pub struct Obj1 {
#[serde(serialize_with = "sorted_map")]
pub table: HashMap<&'static str, i32>,
}
#[derive(Serialize)]
pub struct Obj2 {
#[serde(serialize_with = "sorted_map")]
pub table: HashMap<String, i32>,
}
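In both variants the explicit sort is not strictly needed, because a BTreeMap orders its keys on insertion anyway. If the aim is to skip the tree altogether, a sorted Vec of borrowed pairs carries the same ordering. The sketch below is std-only and the name `sorted_entries` is hypothetical; as an assumption to verify, such a Vec could then be handed to serde's `Serializer::collect_map` instead of building a BTreeMap.

```rust
use std::collections::HashMap;

/// Hypothetical helper: borrowed entries sorted by key, with no BTreeMap.
fn sorted_entries<K: Ord, V>(map: &HashMap<K, V>) -> Vec<(&K, &V)> {
    let mut entries: Vec<(&K, &V)> = map.iter().collect();
    // Map keys are unique, so an unstable sort cannot reorder equal keys.
    entries.sort_unstable_by(|a, b| a.0.cmp(b.0));
    entries
}
```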

Related

Serialization of Hashmap inside a struct is failing

I have a struct
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize, Clone, Debug, Default)]
pub struct State {
pub hash_map: HashMap<String, String>,
}
Serializing this struct does not work.
The stack trace leads to collect_map
fn collect_map<K, V, I>(self, iter: I) -> Result<Self::Ok, Self::Error>
where
K: Serialize,
V: Serialize,
I: IntoIterator<Item = (K, V)>,
{
let iter = iter.into_iter();
let mut serializer = try!(self.serialize_map(iterator_len_hint(&iter)));
and then to
fn serialize_map(self, _len: Option<usize>) -> Result<Self::SerializeMap> {
unreachable!()
}
in impl<'a> ser::Serializer for &'a mut Serializer, where the method is not implemented, so it fails.
From my understanding, HashMap<String, String> should work out of the box, so I'm not sure what I'm missing.
I'm using serde = { version = "1.0.150", default-features = false, features = ["derive"] }
and have tried the troubleshooting as per https://serde.rs/derive.html
You need to add std to the serde features: serde = { version = "1.0.150", default-features = false, features = ["derive", "std"] }
HashMap is defined in std, so Serde only provides its Serialize/Deserialize impls for HashMap when the std feature is enabled.

How to just use custom serialisation for "stringy" serialisation?

I've recently got to grips with custom serialisation/deserialisation: https://stackoverflow.com/a/63846824/129805
I want to use this custom "stringy" serialisation (and deserialisation) only for JSON and RON, while using #[derive(Serialize, ...)] for all the binary serialisations, such as bincode. (Inflating a two-byte (100, 200) to seven or more bytes of "100:200" is pointlessly wasteful.)
I need to do this within a single executable, as server/server comms will be bincode or protobufs, while client/server comms will be JSON.
Both server/server and client/server comms will use the same serialisable structs. i.e. I want a single set of structs for all comms, but they should use custom serialisation for JSON/RON but derived serialisation for bin/protobufs.
How can I do this?
Update:
Here is working code with tests which pass:
use serde::{Serialize, Serializer, Deserialize, Deserializer};
use serde::de::{self, Visitor, Unexpected};
use std::fmt;
use std::str::FromStr;
use regex::Regex;
#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct DerivedIncline {
rise: u8,
distance: u8,
}
impl DerivedIncline {
pub fn new(rise: u8, distance: u8) -> DerivedIncline {
DerivedIncline {rise, distance}
}
}
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct StringyIncline {
rise: u8,
distance: u8,
}
impl StringyIncline {
pub fn new(rise: u8, distance: u8) -> StringyIncline {
StringyIncline {rise, distance}
}
}
impl Serialize for StringyIncline {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
serializer.serialize_str(&format!("{}:{}", self.rise, self.distance))
}
}
struct StringyInclineVisitor;
impl<'de> Visitor<'de> for StringyInclineVisitor {
type Value = StringyIncline;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("a colon-separated pair of integers between 0 and 255")
}
fn visit_str<E>(self, s: &str) -> Result<Self::Value, E>
where
E: de::Error,
{
let re = Regex::new(r"^(\d+):(\d+)$").unwrap(); // anchored so the whole string must match; PERF: move this into a lazy_static!
if let Some(nums) = re.captures(s) {
if let Ok(rise) = u8::from_str(&nums[1]) { // nums[0] is the whole match, so we must skip that
if let Ok(distance) = u8::from_str(&nums[2]) {
Ok(StringyIncline::new(rise, distance))
} else {
Err(de::Error::invalid_value(Unexpected::Str(s), &self))
}
} else {
Err(de::Error::invalid_value(Unexpected::Str(s), &self))
}
} else {
Err(de::Error::invalid_value(Unexpected::Str(s), &self))
}
}
}
impl<'de> Deserialize<'de> for StringyIncline {
fn deserialize<D>(deserializer: D) -> Result<StringyIncline, D::Error>
where
D: Deserializer<'de>,
{
deserializer.deserialize_string(StringyInclineVisitor)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn serialisation() {
let stringy_incline = StringyIncline::new(4, 3);
let derived_incline = DerivedIncline::new(4, 3);
let json = serde_json::to_string(&stringy_incline).unwrap();
assert_eq!(json, "\"4:3\"");
let bin = bincode::serialize(&derived_incline).unwrap();
assert_eq!(bin, [4u8, 3u8]);
}
#[test]
fn deserialisation() {
let json = "\"4:3\"";
let bin = [4u8, 3u8];
let deserialised_json: StringyIncline = serde_json::from_str(&json).unwrap();
let deserialised_bin: DerivedIncline = bincode::deserialize(&bin).unwrap();
assert_eq!(deserialised_json, StringyIncline::new(4, 3));
assert_eq!(deserialised_bin, DerivedIncline::new(4, 3));
}
}
I want to have a single Incline struct which acts like StringyIncline when serialised to JSON, or like DerivedIncline when serialised to bincode.
If you're using nightly and are willing to turn on the specialization feature, you can write a function that tells you whether the generic parameter S is a serde_json::Serializer:
trait IsJsonSerializer {
fn is_json_serializer() -> bool;
}
impl<T> IsJsonSerializer for T {
default fn is_json_serializer() -> bool {
false
}
}
impl<W,F> IsJsonSerializer for &mut serde_json::Serializer<W,F> {
fn is_json_serializer() -> bool {
true
}
}
Then you can write if S::is_json_serializer() {...}. Using this your serialization function can be written:
#[derive(Serialize, Deserialize, PartialEq, Eq, Debug)]
struct RawIncline {
rise: u8,
distance: u8,
}
impl Serialize for Incline {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
if S::is_json_serializer() {
serializer.serialize_str(&format!("{}:{}", self.rise, self.distance))
} else {
RawIncline{rise:self.rise, distance:self.distance}.serialize(serializer)
}
}
}
You can then do something similar for deserialization.
I can't think of a way to get something like this to work without the specialization feature, so it is limited to nightly for now, but I'd love to see whether it is possible somehow.
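A route that may work on stable: serde's Serializer and Deserializer traits expose an is_human_readable() method, which defaults to true and which compact binary formats can override to return false. Whether a particular binary crate does so is an assumption to check in its documentation. The std-only sketch below does not use serde at all; `ToySerializer`, `TextFormat` and `BinaryFormat` are hypothetical stand-ins that only model the branch a real Serialize impl would take.

```rust
/// Toy stand-in for a serializer; only the human-readable flag matters here.
trait ToySerializer {
    fn is_human_readable(&self) -> bool;
}

struct TextFormat; // plays the role of a JSON-like serializer
struct BinaryFormat; // plays the role of a bincode-like serializer

impl ToySerializer for TextFormat {
    fn is_human_readable(&self) -> bool {
        true
    }
}

impl ToySerializer for BinaryFormat {
    fn is_human_readable(&self) -> bool {
        false
    }
}

struct Incline {
    rise: u8,
    distance: u8,
}

impl Incline {
    /// One type, two encodings, selected by asking the format.
    fn encode<S: ToySerializer>(&self, serializer: &S) -> Vec<u8> {
        if serializer.is_human_readable() {
            // "stringy" form for text formats
            format!("{}:{}", self.rise, self.distance).into_bytes()
        } else {
            // compact two-byte form for binary formats
            vec![self.rise, self.distance]
        }
    }
}
```

In a real implementation the same check would sit inside both serialize and deserialize, so that each format reads back what it wrote.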

How can I ignore extra tuple items when deserializing with Serde? ("trailing characters" error)

Serde ignores unknown named fields when deserializing into regular structs. How can I similarly ignore extra items when deserializing into tuple structs (e.g. from a heterogeneous JSON array)?
For example, this code ignores the extra "c" field just fine:
#[derive(Serialize, Deserialize, Debug)]
pub struct MyStruct { a: String, b: i32 }
fn test_deserialize() -> MyStruct {
::serde_json::from_str::<MyStruct>(r#"
{
"a": "foo",
"b": 123,
"c": "ignore me"
}
"#).unwrap()
}
// => MyStruct { a: "foo", b: 123 }
By contrast, this fails on the extra item in the tuple:
#[derive(Serialize, Deserialize, Debug)]
pub struct MyTuple(String, i32);
fn test_deserialize_tuple() -> MyTuple {
::serde_json::from_str::<MyTuple>(r#"
[
"foo",
123,
"ignore me"
]
"#).unwrap()
}
// => Error("trailing characters", line: 5, column: 13)
I'd like to allow extra items for forward compatibility in my data format. What's the easiest way to get Serde to ignore extra tuple items when deserializing?
You can implement a custom Visitor which ignores the rest of the sequence. Be aware that the whole sequence must be consumed. This is the important part (try removing it and you'll get the same error):
// This is very important!
while let Some(IgnoredAny) = seq.next_element()? {
// Ignore rest
}
Here's a working example:
use std::fmt;
use serde::de::{self, Deserialize, Deserializer, IgnoredAny, SeqAccess, Visitor};
use serde::Serialize;
#[derive(Serialize, Debug)]
pub struct MyTuple(String, i32);
impl<'de> Deserialize<'de> for MyTuple {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
struct MyTupleVisitor;
impl<'de> Visitor<'de> for MyTupleVisitor {
type Value = MyTuple;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct MyTuple")
}
fn visit_seq<V>(self, mut seq: V) -> Result<Self::Value, V::Error>
where
V: SeqAccess<'de>,
{
let s = seq
.next_element()?
.ok_or_else(|| de::Error::invalid_length(0, &self))?;
let n = seq
.next_element()?
.ok_or_else(|| de::Error::invalid_length(1, &self))?;
// This is very important!
while let Some(IgnoredAny) = seq.next_element()? {
// Ignore rest
}
Ok(MyTuple(s, n))
}
}
deserializer.deserialize_seq(MyTupleVisitor)
}
}
fn main() {
let two_elements = r#"["foo", 123]"#;
let three_elements = r#"["foo", 123, "bar"]"#;
let tuple: MyTuple = serde_json::from_str(two_elements).unwrap();
assert_eq!(tuple.0, "foo");
assert_eq!(tuple.1, 123);
let tuple: MyTuple = serde_json::from_str(three_elements).unwrap();
assert_eq!(tuple.0, "foo");
assert_eq!(tuple.1, 123);
}
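One way to picture why the drain loop matters: a format may check, after the visitor returns, that the sequence was read to its end, and report an error if elements are left over. The toy model below is std-only and is not serde's SeqAccess; `ToySeq`, `read_pair` and the `drain_rest` flag are hypothetical names used only to illustrate that behaviour.

```rust
/// Toy stand-in for a sequence being deserialized (not serde's SeqAccess).
struct ToySeq<'a> {
    items: std::slice::Iter<'a, &'a str>,
}

impl<'a> ToySeq<'a> {
    fn next_element(&mut self) -> Option<&'a str> {
        self.items.next().copied()
    }

    /// Mimics a format that rejects a sequence with unread elements.
    fn end(mut self) -> Result<(), String> {
        match self.items.next() {
            None => Ok(()),
            Some(_) => Err("trailing characters".to_string()),
        }
    }
}

/// Read two elements; optionally drain the rest before finishing.
fn read_pair(input: &[&str], drain_rest: bool) -> Result<(String, String), String> {
    let mut seq = ToySeq { items: input.iter() };
    let a = seq.next_element().ok_or("invalid length 0")?.to_string();
    let b = seq.next_element().ok_or("invalid length 1")?.to_string();
    if drain_rest {
        // Counterpart of `while let Some(IgnoredAny) = seq.next_element()? {}`
        while seq.next_element().is_some() {}
    }
    seq.end()?;
    Ok((a, b))
}
```

With three input elements, `read_pair` succeeds only when `drain_rest` is true; with exactly two it succeeds either way.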
For JSON, I'd combine RawValue and a custom deserialization:
use serde::{Deserialize, Deserializer};
#[derive(Debug)]
struct MyTuple(String, i32);
#[derive(Deserialize, Debug)]
struct MyTupleFutureCompat<'a>(
String,
i32,
#[serde(default, borrow)] Option<&'a serde_json::value::RawValue>,
);
impl<'de> Deserialize<'de> for MyTuple {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let t: MyTupleFutureCompat = Deserialize::deserialize(deserializer)?;
Ok(MyTuple(t.0, t.1))
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let json = r#"[
"foo",
123,
"ignore me"
]"#;
let d: MyTuple = serde_json::from_str(json)?;
println!("{:?}", d);
Ok(())
}
See also:
How to transform fields during deserialization using Serde?
Is there a way to deserialize arbitrary JSON using Serde without creating fine-grained objects?
Why can Serde not derive Deserialize for a struct containing only a &Path?

How do I use Serde to serialize a HashMap with structs as keys to JSON?

I want to serialize a HashMap with structs as keys:
use serde::{Deserialize, Serialize}; // 1.0.68
use std::collections::HashMap;
fn main() {
#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash)]
struct Foo {
x: u64,
}
#[derive(Serialize, Deserialize, Debug)]
struct Bar {
x: HashMap<Foo, f64>,
}
let mut p = Bar { x: HashMap::new() };
p.x.insert(Foo { x: 0 }, 0.0);
let serialized = serde_json::to_string(&p).unwrap();
}
This code compiles, but when I run it I get an error:
Error("key must be a string", line: 0, column: 0)
I changed the code:
#[derive(Serialize, Deserialize, Debug)]
struct Bar {
x: HashMap<u64, f64>,
}
let mut p = Bar { x: HashMap::new() };
p.x.insert(0, 0.0);
let serialized = serde_json::to_string(&p).unwrap();
The key in the HashMap is now a u64, which is still not a string, yet this version serializes without error. Why does the first code give an error?
You can use serde_as from the serde_with crate to encode the HashMap as a sequence of key-value pairs:
use serde_with::serde_as; // 1.5.1
#[serde_as]
#[derive(Serialize, Deserialize, Debug)]
struct Bar {
#[serde_as(as = "Vec<(_, _)>")]
x: HashMap<Foo, f64>,
}
Which will serialize to (and deserialize from) this:
{
"x":[
[{"x": 0}, 0.0],
[{"x": 1}, 0.0],
[{"x": 2}, 0.0]
]
}
There is likely some overhead from converting the HashMap to Vec, but this can be very convenient.
According to the JSON specification, JSON keys must be strings. For some non-string key types, serde_json uses fmt::Display to turn the key into a string, which allows a wider range of HashMaps to be serialized. That's why HashMap<u64, f64> works just as HashMap<String, f64> would. However, not all types are covered (Foo is one such case).
That's why we need to provide our own Serialize implementation:
use serde::ser::{Serialize, SerializeMap, Serializer};
use std::fmt::{Display, Formatter};
impl Display for Foo {
fn fmt(&self, f: &mut Formatter) -> std::fmt::Result {
write!(f, "{}", self.x)
}
}
impl Serialize for Bar {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut map = serializer.serialize_map(Some(self.x.len()))?;
for (k, v) in &self.x {
map.serialize_entry(&k.to_string(), &v)?;
}
map.end()
}
}
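The key conversion can be looked at on its own, without serde. In this std-only sketch, `stringify_keys` is a hypothetical helper and the BTreeMap is used only so that the result has a predictable order; it shows what the map looks like once every struct key has been passed through Display:

```rust
use std::collections::{BTreeMap, HashMap};
use std::fmt;

#[derive(PartialEq, Eq, Hash)]
struct Foo {
    x: u64,
}

impl fmt::Display for Foo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.x)
    }
}

/// Hypothetical helper: turn struct keys into strings via Display.
fn stringify_keys<K: fmt::Display, V: Clone>(map: &HashMap<K, V>) -> BTreeMap<String, V> {
    map.iter().map(|(k, v)| (k.to_string(), v.clone())).collect()
}
```

One caveat of this approach: two distinct keys whose Display output is identical would collapse into a single entry, so the string form needs to be unique per key.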
I've found a bulletproof solution 😃 It requires no extra dependencies, is compatible with HashMap, BTreeMap and other iterable types, and works with flexbuffers.
The following code converts a field (map) to the intermediate Vec representation:
pub mod vectorize {
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::iter::FromIterator;
pub fn serialize<'a, T, K, V, S>(target: T, ser: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
T: IntoIterator<Item = (&'a K, &'a V)>,
K: Serialize + 'a,
V: Serialize + 'a,
{
let container: Vec<_> = target.into_iter().collect();
serde::Serialize::serialize(&container, ser)
}
pub fn deserialize<'de, T, K, V, D>(des: D) -> Result<T, D::Error>
where
D: Deserializer<'de>,
T: FromIterator<(K, V)>,
K: Deserialize<'de>,
V: Deserialize<'de>,
{
let container: Vec<_> = serde::Deserialize::deserialize(des)?;
Ok(T::from_iter(container.into_iter()))
}
}
To use it just add the module's name as an attribute:
#[derive(Debug, Serialize, Deserialize)]
struct MyComplexType {
#[serde(with = "vectorize")]
map: HashMap<MyKey, String>,
}
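The round trip that the module relies on can be checked without serde. The following std-only sketch uses the hypothetical names `to_pairs` and `from_pairs`; it flattens a map into pairs and rebuilds a collection from them through FromIterator, which is why the same module fits HashMap, BTreeMap and similar types.

```rust
use std::collections::HashMap;
use std::iter::FromIterator;

/// Flatten a map into owned (key, value) pairs, the shape that gets serialized.
fn to_pairs<K: Clone, V: Clone>(map: &HashMap<K, V>) -> Vec<(K, V)> {
    map.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
}

/// Rebuild any FromIterator collection from the pairs.
fn from_pairs<T, K, V>(pairs: Vec<(K, V)>) -> T
where
    T: FromIterator<(K, V)>,
{
    pairs.into_iter().collect()
}
```

Unlike this sketch, the `vectorize::serialize` function above collects references, so it does not clone keys or values.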
The remaining code, if you want to check it locally:
use anyhow::Error;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct MyKey {
one: String,
two: u16,
more: Vec<u8>,
}
#[derive(Debug, Serialize, Deserialize)]
struct MyComplexType {
#[serde(with = "vectorize")]
map: HashMap<MyKey, String>,
}
fn main() -> Result<(), Error> {
let key = MyKey {
one: "1".into(),
two: 2,
more: vec![1, 2, 3],
};
let mut map = HashMap::new();
map.insert(key.clone(), "value".into());
let instance = MyComplexType { map };
let serialized = serde_json::to_string(&instance)?;
println!("JSON: {}", serialized);
let deserialized: MyComplexType = serde_json::from_str(&serialized)?;
let expected_value = "value".to_string();
assert_eq!(deserialized.map.get(&key), Some(&expected_value));
Ok(())
}
And on the Rust playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bf1773b6e501a0ea255ccdf8ce37e74d
While all of the provided answers achieve the goal of serializing your HashMap to JSON, they are ad hoc or hard to maintain.
One correct way to allow a specific data structure to be serialized with serde as a map key is the way serde handles integer keys in HashMaps (which works): serialize the value to a String. This has a few advantages, namely:
no intermediate data structure,
no need to clone the entire HashMap,
easier maintenance by applying OOP concepts, and
serialization that is usable in more complex structures such as MultiMap.
This can be done by manually implementing Serialize and Deserialize for your data-type.
I use composite ids for maps.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Proj {
pub value: u64,
}
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Doc {
pub proj: Proj,
pub value: u32,
}
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Sec {
pub doc: Doc,
pub value: u32,
}
Manually implementing serde serialization for these types is a hassle, so instead we delegate the implementation to the FromStr and From<Self> for String (Into<String> blanket) traits.
impl From<Doc> for String {
fn from(val: Doc) -> Self {
format!("{}{:08X}", val.proj, val.value)
}
}
impl FromStr for Doc {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match parse_doc(s) {
Ok((_, p)) => Ok(p),
Err(e) => Err(e.to_string()),
}
}
}
In order to parse the Doc we make use of nom. The parse functionality below is explained in their examples.
fn is_hex_digit(c: char) -> bool {
c.is_digit(16)
}
fn from_hex8(input: &str) -> Result<u32, std::num::ParseIntError> {
u32::from_str_radix(input, 16)
}
fn parse_hex8(input: &str) -> IResult<&str, u32> {
map_res(take_while_m_n(8, 8, is_hex_digit), from_hex8)(input)
}
fn parse_doc(input: &str) -> IResult<&str, Doc> {
let (input, proj) = parse_proj(input)?;
let (input, value) = parse_hex8(input)?;
Ok((input, Doc { value, proj }))
}
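For readers without nom at hand, the same string round trip can be sketched with std only. This is not the parser above: the fixed layout (16 hex digits for the project followed by 8 for the document) is an assumption, since the original does not show how Proj is formatted or what parse_proj does.

```rust
use std::str::FromStr;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Proj {
    value: u64,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Doc {
    proj: Proj,
    value: u32,
}

impl From<Doc> for String {
    fn from(d: Doc) -> String {
        // Assumed layout: 16 hex digits for the project, 8 for the document.
        format!("{:016X}{:08X}", d.proj.value, d.value)
    }
}

impl FromStr for Doc {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Reject wrong lengths and anything that is not an ASCII hex digit.
        if s.len() != 24 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("expected 24 hex digits, got {:?}", s));
        }
        let (p, v) = s.split_at(16);
        let proj = u64::from_str_radix(p, 16).map_err(|e| e.to_string())?;
        let value = u32::from_str_radix(v, 16).map_err(|e| e.to_string())?;
        Ok(Doc { proj: Proj { value: proj }, value })
    }
}
```

A fixed-width encoding keeps parsing unambiguous, because the boundary between the project and document parts never depends on the values themselves.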
Now we need to hook up the String conversion and str::parse(&str) to serde; we can do this with a simple macro.
macro_rules! serde_str {
($type:ty) => {
impl Serialize for $type {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let s: String = self.clone().into();
serializer.serialize_str(&s)
}
}
impl<'de> Deserialize<'de> for $type {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
paste! {deserializer.deserialize_string( [<$type Visitor>] {})}
}
}
paste! {struct [<$type Visitor>] {}}
impl<'de> Visitor<'de> for paste! {[<$type Visitor>]} {
type Value = $type;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
formatter.write_str("\"")
}
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: serde::de::Error,
{
match str::parse(v) {
Ok(id) => Ok(id),
Err(_) => Err(serde::de::Error::custom("invalid format")),
}
}
}
};
}
Here we are using paste to interpolate the names. Beware that now the struct will always serialize as defined above. Never as a struct, always as a string.
It is important to implement fn visit_str instead of fn visit_string because visit_string defers to visit_str.
Finally, we have to call the macro for our custom structs:
serde_str!(Sec);
serde_str!(Doc);
serde_str!(Proj);
Now the specified types can be serialized to and from string with serde.

How do I serialize or deserialize an Arc<T> in Serde?

I have a struct that contains children of its own type. These children are wrapped in Arcs, and I'm getting issues when calling serde_json::to_string on it. My struct is:
#[derive(Serialize, Deserialize)]
pub struct Category {
pub id: i32,
pub name: String,
pub parent_id: i32,
pub children: Vec<Arc<Category>>,
}
This produces the error: the trait 'serde::Serialize' is not implemented for 'std::sync::Arc<db::queries::categories::Category>'.
I've tried a few different approaches to get serialization working, such as:
#[serde(serialize_with = "arc_category_vec")]
pub children: Vec<Arc<Category>>
fn arc_category_vec<S>(value: &Vec<Arc<Category>>, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut seq = serializer.serialize_seq(Some(value.len()))?;
for e in value {
seq.serialize_element(e.as_ref())?;
}
seq.end()
}
This doesn't help as I get the same error. I also tried:
impl Serialize for Arc<Category> {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut state = serializer.serialize_struct("Category", 4)?;
state.serialize_field("id", &self.id)?;
state.serialize_field("name", &self.name)?;
state.serialize_field("parent_id", &self.parent_id)?;
state.serialize_field("children", &self.children)?;
state.end()
}
}
but that gives the error impl doesn't use types inside crate
I could probably live without deserialization, since serialization is more important at this point.
Serde provides implementations of Serialize and Deserialize for Arc<T> and Rc<T>, but only if the rc feature is enabled.
There's a comment on Serde's reference website explaining why:
Opt into impls for Rc<T> and Arc<T>. Serializing and deserializing these types does not preserve identity and may result in multiple copies of the same data. Be sure that this is what you want before enabling this feature.
To enable the rc feature, you need to ask for it in your own Cargo.toml:
[dependencies]
serde = { version = "1.0", features = ["rc"] }
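The warning about identity can be illustrated without serde. In this std-only sketch, `round_trip` is a hypothetical stand-in for serializing and then deserializing: the contents of each Arc are copied out and wrapped in a fresh Arc, which is roughly what a deserializer has to do because the wire format carries values, not pointers.

```rust
use std::sync::Arc;

/// Stand-in for a serialize/deserialize round trip: every element is rebuilt
/// from its contents, so previously shared allocations become separate ones.
fn round_trip(items: &[Arc<String>]) -> Vec<Arc<String>> {
    items.iter().map(|a| Arc::new(String::clone(a))).collect()
}
```

Two entries that pointed at the same allocation before the round trip still compare equal by value afterwards, but `Arc::ptr_eq` no longer holds for them.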

Resources