Related
I'm implementing a bytecode VM and am struggling referencing data stored in a parsed representation of the bytecode. As is the nature of (most) bytecode, it and thus its parsed representation remain unmodified once it's initialized. A separate Vm contains the mutable parts (stack etc.) along with that module. I made an MCVE with additional explanatory comments to illustrate the problem; it's at the bottom and on the playground. The parsed bytecode may look like this:
Module { struct_types: {"Bar": StructType::Named(["a", "b"])} }
The strings "Bar", "a", "b" are references into the bytecode and have lifetime 'b, so we also have lifetimes in the types Module<'b> and StructType<'b>.
After creating this, I will want to create struct instances, think let bar = Bar { a: (), b: () };. At least currently, each struct instance needs to hold a reference to its type, so that type might look like this:
pub struct Struct<'b> {
struct_type: &'b bytecode::StructType<'b>,
fields: Vec<Value<'b>>,
}
The values of a struct's fields may be constants whose value is stored in the bytecode, so the Value enum has a lifetime 'b as well, and that works. The problem is that I have a &'b bytecode::StructType<'b> in the first field: how do I get a reference that lives long enough? I think the reference would actually be valid long enough.
The part of the code that I suspect to be the critical one is here:
pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
// self.struct_types.get(name)
todo!("fix lifetime problems")
}
With the commented out code, I can't get a 'b reference because the reference self.struct_types lives too short; to fix that I'd need to do &'b self which would spread virally through the code; also, most of the times I need to borrow the Vm mutably, which doesn't work if all those exclusive self references have to live long.
Introducing a separate lifetime 'm so that I could return a &'m StructType<'b> sounds like something that I could try as well, but that sounds just as viral and in addition introduces a separate lifetime I need to keep track of; being able to replace 'b with 'm (or at least only having on in each place) would be a bit nicer.
Finally this feels like something that pinning could be helpful with, but I don't understand that topic enough to make an educated guess on how that could be approached.
MCVE
#![allow(dead_code)]
mod bytecode {
use std::collections::BTreeMap;
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StructType<'b> {
/// unit struct type; doesn't have fields
Empty,
/// tuple struct type; fields are positional
Positional(usize),
/// "normal" struct type; fields are named
Named(Vec<&'b str>),
}
impl<'b> StructType<'b> {
pub fn field_count(&self) -> usize {
match self {
Self::Empty => 0,
Self::Positional(field_count) => *field_count,
Self::Named(fields) => fields.len(),
}
}
}
#[derive(Debug, Clone)]
pub struct Module<'b> {
struct_types: BTreeMap<&'b str, StructType<'b>>,
}
impl<'b> Module<'b> {
// here is the problem: I would like to return a reference with lifetime 'b.
// from the point I start executing instructions, I know that I won't modify
// the module (particularly, I won't add entries to the map), so I think that
// lifetime should be possible - pinning? `&'b self` everywhere? idk
pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
// self.struct_types.get(name)
todo!("fix lifetime problems")
}
}
pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
// this would use nom to parse actual bytecode
assert_eq!(bytecode, "struct Bar { a, b }");
let bar = &bytecode[7..10];
let a = &bytecode[13..14];
let b = &bytecode[16..17];
let fields = vec![a, b];
let bar_struct = StructType::Named(fields);
let struct_types = BTreeMap::from_iter([
(bar, bar_struct),
]);
Module { struct_types }
}
}
mod vm {
use crate::bytecode::{self, StructType};
#[derive(Debug, Clone)]
pub enum Value<'b> {
Unit,
Struct(Struct<'b>),
}
#[derive(Debug, Clone)]
pub struct Struct<'b> {
struct_type: &'b bytecode::StructType<'b>,
fields: Vec<Value<'b>>,
}
impl<'b> Struct<'b> {
pub fn new(struct_type: &'b bytecode::StructType<'b>, fields: Vec<Value<'b>>) -> Self {
Struct { struct_type, fields }
}
}
#[derive(Debug, Clone)]
pub struct Vm<'b> {
module: bytecode::Module<'b>,
}
impl<'b> Vm<'b> {
pub fn new(module: bytecode::Module<'b>) -> Self {
Self { module }
}
pub fn create_struct(&mut self, type_name: &'b str) -> Value<'b> {
let struct_type: &'b StructType<'b> = self.module.struct_type(type_name).unwrap();
// just initialize the fields to something, we don't care
let fields = vec![Value::Unit; struct_type.field_count()];
let value = Value::Struct(Struct::new(struct_type, fields));
value
}
}
}
pub fn main() {
// the bytecode contains all constants needed at runtime;
// we're just interested in how struct types are handled
// obviously the real bytecode is not as human-readable
let bytecode = "struct Bar { a, b }";
// we parse that into a module that, among other things,
// has a map of all struct types
let module = bytecode::parse(bytecode);
println!("{:?}", module);
// we create a Vm that is capable of running commands
// that are stored in the module
let mut vm = vm::Vm::new(module);
// now we try to execute an instruction to create a struct value
// the instruction for this contains a reference to the type name
// stored in the bytecode.
// the struct value contains a reference to its type and holds its field values.
let value = {
let bar = &bytecode[7..10];
vm.create_struct(bar)
};
println!("{:?}", value);
}
&'b bytecode::StructType<'b> is a classic anti-pattern in Rust, it strongly indicates incorrectly annotated lifetimes. It doesn't make sense that an object would depend on some lifetime and borrowing it creates the same lifetime. That is very rare to happen on purpose.
So I suspect you need two lifetimes, which I will call 'm and 'b:
'b: the lifetime of the bytecode string, everything that references it will use &'b str.
'm: the lifetime of the Module object. Everything that references it or its contained StructType will use this lifetime.
If split into two lifetimes and adjusted correctly, it simply works:
#![allow(dead_code)]
mod bytecode {
use std::{collections::BTreeMap, iter::FromIterator};
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StructType<'b> {
/// unit struct type; doesn't have fields
Empty,
/// tuple struct type; fields are positional
Positional(usize),
/// "normal" struct type; fields are named
Named(Vec<&'b str>),
}
impl<'b> StructType<'b> {
pub fn field_count(&self) -> usize {
match self {
Self::Empty => 0,
Self::Positional(field_count) => *field_count,
Self::Named(fields) => fields.len(),
}
}
}
#[derive(Debug, Clone)]
pub struct Module<'b> {
struct_types: BTreeMap<&'b str, StructType<'b>>,
}
impl<'b> Module<'b> {
// here is the problem: I would like to return a reference with lifetime 'b.
// from the point I start executing instructions, I know that I won't modify
// the module (particularly, I won't add entries to the map), so I think that
// lifetime should be possible - pinning? `&'b self` everywhere? idk
pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
self.struct_types.get(name)
}
}
pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
// this would use nom to parse actual bytecode
assert_eq!(bytecode, "struct Bar { a, b }");
let bar = &bytecode[7..10];
let a = &bytecode[13..14];
let b = &bytecode[16..17];
let fields = vec![a, b];
let bar_struct = StructType::Named(fields);
let struct_types = BTreeMap::from_iter([(bar, bar_struct)]);
Module { struct_types }
}
}
mod vm {
use crate::bytecode::{self, StructType};
#[derive(Debug, Clone)]
pub enum Value<'b, 'm> {
Unit,
Struct(Struct<'b, 'm>),
}
#[derive(Debug, Clone)]
pub struct Struct<'b, 'm> {
struct_type: &'m bytecode::StructType<'b>,
fields: Vec<Value<'b, 'm>>,
}
impl<'b, 'm> Struct<'b, 'm> {
pub fn new(struct_type: &'m bytecode::StructType<'b>, fields: Vec<Value<'b, 'm>>) -> Self {
Struct {
struct_type,
fields,
}
}
}
#[derive(Debug, Clone)]
pub struct Vm<'b> {
module: bytecode::Module<'b>,
}
impl<'b> Vm<'b> {
pub fn new(module: bytecode::Module<'b>) -> Self {
Self { module }
}
pub fn create_struct(&mut self, type_name: &str) -> Value<'b, '_> {
let struct_type: &StructType<'b> = self.module.struct_type(type_name).unwrap();
// just initialize the fields to something, we don't care
let fields = vec![Value::Unit; struct_type.field_count()];
let value = Value::Struct(Struct::new(struct_type, fields));
value
}
}
}
pub fn main() {
// the bytecode contains all constants needed at runtime;
// we're just interested in how struct types are handled
// obviously the real bytecode is not as human-readable
let bytecode = "struct Bar { a, b }";
// we parse that into a module that, among other things,
// has a map of all struct types
let module = bytecode::parse(bytecode);
println!("{:?}", module);
// we create a Vm that is capable of running commands
// that are stored in the module
let mut vm = vm::Vm::new(module);
// now we try to execute an instruction to create a struct value
// the instruction for this contains a reference to the type name
// stored in the bytecode.
// the struct value contains a reference to its type and holds its field values.
let value = {
let bar = &bytecode[7..10];
vm.create_struct(bar)
};
println!("{:?}", value);
}
Module { struct_types: {"Bar": Named(["a", "b"])} }
Struct(Struct { struct_type: Named(["a", "b"]), fields: [Unit, Unit] })
It can further be simplified, however, due to the fact that 'm is connected to 'b, and therefore everything that depends on 'm automatically also has access to 'b objects, because 'b is guaranteed to outlive 'm.
Therefore, let's introduce 'a, which we will now use inside of the vm mod to reference anything from the bytecode mod. This will further allow lifetime elysion to happen at a couple of points, simplifying the code even further:
#![allow(dead_code)]
mod bytecode {
use std::{collections::BTreeMap, iter::FromIterator};
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StructType<'b> {
/// unit struct type; doesn't have fields
Empty,
/// tuple struct type; fields are positional
Positional(usize),
/// "normal" struct type; fields are named
Named(Vec<&'b str>),
}
impl<'b> StructType<'b> {
pub fn field_count(&self) -> usize {
match self {
Self::Empty => 0,
Self::Positional(field_count) => *field_count,
Self::Named(fields) => fields.len(),
}
}
}
#[derive(Debug, Clone)]
pub struct Module<'b> {
struct_types: BTreeMap<&'b str, StructType<'b>>,
}
impl<'b> Module<'b> {
// here is the problem: I would like to return a reference with lifetime 'b.
// from the point I start executing instructions, I know that I won't modify
// the module (particularly, I won't add entries to the map), so I think that
// lifetime should be possible - pinning? `&'b self` everywhere? idk
pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
self.struct_types.get(name)
}
}
pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
// this would use nom to parse actual bytecode
assert_eq!(bytecode, "struct Bar { a, b }");
let bar = &bytecode[7..10];
let a = &bytecode[13..14];
let b = &bytecode[16..17];
let fields = vec![a, b];
let bar_struct = StructType::Named(fields);
let struct_types = BTreeMap::from_iter([(bar, bar_struct)]);
Module { struct_types }
}
}
mod vm {
use crate::bytecode::{self, StructType};
#[derive(Debug, Clone)]
pub enum Value<'a> {
Unit,
Struct(Struct<'a>),
}
#[derive(Debug, Clone)]
pub struct Struct<'a> {
struct_type: &'a bytecode::StructType<'a>,
fields: Vec<Value<'a>>,
}
impl<'a> Struct<'a> {
pub fn new(struct_type: &'a bytecode::StructType, fields: Vec<Value<'a>>) -> Self {
Struct {
struct_type,
fields,
}
}
}
#[derive(Debug, Clone)]
pub struct Vm<'a> {
module: bytecode::Module<'a>,
}
impl<'a> Vm<'a> {
pub fn new(module: bytecode::Module<'a>) -> Self {
Self { module }
}
pub fn create_struct(&mut self, type_name: &str) -> Value {
let struct_type: &StructType = self.module.struct_type(type_name).unwrap();
// just initialize the fields to something, we don't care
let fields = vec![Value::Unit; struct_type.field_count()];
let value = Value::Struct(Struct::new(struct_type, fields));
value
}
}
}
pub fn main() {
// the bytecode contains all constants needed at runtime;
// we're just interested in how struct types are handled
// obviously the real bytecode is not as human-readable
let bytecode = "struct Bar { a, b }";
// we parse that into a module that, among other things,
// has a map of all struct types
let module = bytecode::parse(bytecode);
println!("{:?}", module);
// we create a Vm that is capable of running commands
// that are stored in the module
let mut vm = vm::Vm::new(module);
// now we try to execute an instruction to create a struct value
// the instruction for this contains a reference to the type name
// stored in the bytecode.
// the struct value contains a reference to its type and holds its field values.
let value = {
let bar = &bytecode[7..10];
vm.create_struct(bar)
};
println!("{:?}", value);
}
Module { struct_types: {"Bar": Named(["a", "b"])} }
Struct(Struct { struct_type: Named(["a", "b"]), fields: [Unit, Unit] })
Fun fact: This is now one of the rare cases where we legitimately have to use &'a bytecode::StructType<'a>, so take my opening statement with a grain of salt, and you were kind of right all along :)
The crazy thing is if we then rename 'a to 'b to be consistent with your original code, we get almost your code with only some minor differences:
#![allow(dead_code)]
mod bytecode {
use std::{collections::BTreeMap, iter::FromIterator};
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StructType<'b> {
/// unit struct type; doesn't have fields
Empty,
/// tuple struct type; fields are positional
Positional(usize),
/// "normal" struct type; fields are named
Named(Vec<&'b str>),
}
impl<'b> StructType<'b> {
pub fn field_count(&self) -> usize {
match self {
Self::Empty => 0,
Self::Positional(field_count) => *field_count,
Self::Named(fields) => fields.len(),
}
}
}
#[derive(Debug, Clone)]
pub struct Module<'b> {
struct_types: BTreeMap<&'b str, StructType<'b>>,
}
impl<'b> Module<'b> {
// here is the problem: I would like to return a reference with lifetime 'b.
// from the point I start executing instructions, I know that I won't modify
// the module (particularly, I won't add entries to the map), so I think that
// lifetime should be possible - pinning? `&'b self` everywhere? idk
pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
self.struct_types.get(name)
}
}
pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
// this would use nom to parse actual bytecode
assert_eq!(bytecode, "struct Bar { a, b }");
let bar = &bytecode[7..10];
let a = &bytecode[13..14];
let b = &bytecode[16..17];
let fields = vec![a, b];
let bar_struct = StructType::Named(fields);
let struct_types = BTreeMap::from_iter([(bar, bar_struct)]);
Module { struct_types }
}
}
mod vm {
use crate::bytecode::{self, StructType};
#[derive(Debug, Clone)]
pub enum Value<'b> {
Unit,
Struct(Struct<'b>),
}
#[derive(Debug, Clone)]
pub struct Struct<'b> {
struct_type: &'b bytecode::StructType<'b>,
fields: Vec<Value<'b>>,
}
impl<'b> Struct<'b> {
pub fn new(struct_type: &'b bytecode::StructType, fields: Vec<Value<'b>>) -> Self {
Struct {
struct_type,
fields,
}
}
}
#[derive(Debug, Clone)]
pub struct Vm<'b> {
module: bytecode::Module<'b>,
}
impl<'b> Vm<'b> {
pub fn new(module: bytecode::Module<'b>) -> Self {
Self { module }
}
pub fn create_struct(&mut self, type_name: &str) -> Value {
let struct_type: &StructType = self.module.struct_type(type_name).unwrap();
// just initialize the fields to something, we don't care
let fields = vec![Value::Unit; struct_type.field_count()];
let value = Value::Struct(Struct::new(struct_type, fields));
value
}
}
}
pub fn main() {
// the bytecode contains all constants needed at runtime;
// we're just interested in how struct types are handled
// obviously the real bytecode is not as human-readable
let bytecode = "struct Bar { a, b }";
// we parse that into a module that, among other things,
// has a map of all struct types
let module = bytecode::parse(bytecode);
println!("{:?}", module);
// we create a Vm that is capable of running commands
// that are stored in the module
let mut vm = vm::Vm::new(module);
// now we try to execute an instruction to create a struct value
// the instruction for this contains a reference to the type name
// stored in the bytecode.
// the struct value contains a reference to its type and holds its field values.
let value = {
let bar = &bytecode[7..10];
vm.create_struct(bar)
};
println!("{:?}", value);
}
Module { struct_types: {"Bar": Named(["a", "b"])} }
Struct(Struct { struct_type: Named(["a", "b"]), fields: [Unit, Unit] })
So the actual fix for your original code is as follows:
4c4
< use std::collections::BTreeMap;
---
> use std::{collections::BTreeMap, iter::FromIterator};
36,38c36,37
< pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
< // self.struct_types.get(name)
< todo!("fix lifetime problems")
---
> pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
> self.struct_types.get(name)
73c72
< pub fn new(struct_type: &'b bytecode::StructType<'b>, fields: Vec<Value<'b>>) -> Self {
---
> pub fn new(struct_type: &'b bytecode::StructType, fields: Vec<Value<'b>>) -> Self {
91,92c90,91
< pub fn create_struct(&mut self, type_name: &'b str) -> Value<'b> {
< let struct_type: &'b StructType<'b> = self.module.struct_type(type_name).unwrap();
---
> pub fn create_struct(&mut self, type_name: &str) -> Value {
> let struct_type: &StructType = self.module.struct_type(type_name).unwrap();
I hope deriving them step by step made it somewhat clear why those changes are necessary.
I have JSON objects with the following format:
{
"name": "foo",
"value": 1234,
"upper_bound": 5000,
"lower_bound": 1000
}
I'd like to use serde to work with these objects, with a struct like
struct MyObject {
name: String,
value: i32,
bound: Range<i32>,
}
Without any modifications, serializing one of these structs yields
{
"name": "foo",
"value": 1234,
"bound": {
"start": 1000,
"end": 5000
}
}
I can apply #[serde(flatten)] to get closer, yielding
{
"name": "foo",
"value": 1234,
"start": 1000,
"end": 5000
}
But adding #[serde(rename...)] doesn't seem to change anything, no matter what kind of arguments I try giving to the rename. Is it possible to flatten the range, and rename the args?
You can use serde attribute with and just use a intermediate structure letting the real implementation to serde:
use core::ops::Range;
use serde::{Deserialize, Serialize};
use serde_json::Error;
#[derive(Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
struct Foo {
name: String,
value: i32,
#[serde(with = "range_aux", flatten)]
bound: Range<i32>,
}
mod range_aux {
use core::ops::Range;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
#[derive(Serialize, Deserialize)]
struct RangeAux {
upper_bound: i32,
lower_bound: i32,
}
pub fn serialize<S>(range: &Range<i32>, ser: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
RangeAux::serialize(
&RangeAux {
upper_bound: range.end,
lower_bound: range.start,
},
ser,
)
}
pub fn deserialize<'de, D>(d: D) -> Result<Range<i32>, D::Error>
where
D: Deserializer<'de>,
{
let range_aux: RangeAux = RangeAux::deserialize(d)?;
Ok(Range {
start: range_aux.lower_bound,
end: range_aux.upper_bound,
})
}
}
fn main() -> Result<(), Error> {
let data = r#"{"name":"foo","value":1234,"upper_bound":5000,"lower_bound":1000}"#;
let foo: Foo = serde_json::from_str(data)?;
assert_eq!(
foo,
Foo {
name: "foo".to_string(),
value: 1234,
bound: 1000..5000
}
);
let output = serde_json::to_string(&foo)?;
assert_eq!(data, output);
Ok(())
}
That very close to remote pattern but this doesn't work with generic see serde#1844.
A possible generic version:
use core::ops::Range;
use serde::{Deserialize, Serialize};
use serde_json::Error;
#[derive(Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
struct Foo {
name: String,
value: i32,
#[serde(with = "range_aux", flatten)]
bound: Range<i32>,
}
mod range_aux {
use core::ops::Range;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
pub fn serialize<S, Idx: Serialize>(range: &Range<Idx>, ser: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
// could require Idx to be Copy or Clone instead of borrowing Idx
#[derive(Serialize)]
struct RangeAux<'a, Idx> {
upper_bound: &'a Idx,
lower_bound: &'a Idx,
}
RangeAux::serialize(
&RangeAux {
upper_bound: &range.end,
lower_bound: &range.start,
},
ser,
)
}
pub fn deserialize<'de, D, Idx: Deserialize<'de>>(d: D) -> Result<Range<Idx>, D::Error>
where
D: Deserializer<'de>,
{
#[derive(Deserialize)]
struct RangeAux<Idx> {
upper_bound: Idx,
lower_bound: Idx,
}
let range_aux: RangeAux<Idx> = RangeAux::deserialize(d)?;
Ok(Range {
start: range_aux.lower_bound,
end: range_aux.upper_bound,
})
}
}
fn main() -> Result<(), Error> {
let data = r#"{"name":"foo","value":1234,"upper_bound":5000,"lower_bound":1000}"#;
let foo: Foo = serde_json::from_str(data)?;
assert_eq!(
foo,
Foo {
name: "foo".to_string(),
value: 1234,
bound: 1000..5000
}
);
let output = serde_json::to_string(&foo)?;
assert_eq!(data, output);
Ok(())
}
Not necessarily more concise than a custom serializer, but certainly a good bit more trivial is a solution with [serde(from and into)]. (I feel like I'm posting this on every serde question. :/)
You define an auxiliary, serializable struct that has the JSON structure you want:
#[derive(Deserialize, Serialize, Clone)]
struct AuxMyObject {
name: String,
value: i32,
upper_bound: i32,
lower_bound: i32,
}
Then you explain to rust how your auxiliary struct relates to the original struct. It's a bit tedious (but easy), there may be some macro crates that help lessen the typing load:
impl From<MyObject> for AuxMyObject {
fn from(from: MyObject) -> Self {
Self {
name: from.name,
value: from.value,
lower_bound: from.bound.start,
upper_bound: from.bound.end,
}
}
}
impl From<AuxMyObject> for MyObject {
fn from(from: AuxMyObject) -> Self {
Self {
name: from.name,
value: from.value,
bound: Range {
start: from.lower_bound,
end: from.upper_bound,
},
}
}
}
Lastly, you tell serde to replace your main struct with the auxiliary struct when serializing:
#[derive(Serialize, Deserialize, Debug, PartialEq, Clone)]
#[serde(from = "AuxMyObject", into = "AuxMyObject")]
struct MyObject { … }
Playground
Serde ignores unknown named fields when deserializing into regular structs. How can I similarly ignore extra items when deserializing into tuple structs (e.g. from a heterogeneous JSON array)?
For example, this code ignores the extra "c" field just fine:
#[derive(Serialize, Deserialize, Debug)]
pub struct MyStruct { a: String, b: i32 }
fn test_deserialize() -> MyStruct {
::serde_json::from_str::<MyStruct>(r#"
{
"a": "foo",
"b": 123,
"c": "ignore me"
}
"#).unwrap()
}
// => MyStruct { a: "foo", b: 123 }
By contrast, this fails on the extra item in the tuple:
#[derive(Serialize, Deserialize, Debug)]
pub struct MyTuple(String, i32);
fn test_deserialize_tuple() -> MyTuple {
::serde_json::from_str::<MyTuple>(r#"
[
"foo",
123,
"ignore me"
]
"#).unwrap()
}
// => Error("trailing characters", line: 5, column: 13)
I'd like to allow extra items for forward compatibility in my data format. What's the easiest way to get Serde to ignore extra tuple items when deserializing?
You can implement a custom Visitor which ignores rest of the sequence. Be aware that the whole sequence must be consumed. This is an important part (try to remove it and you'll get same error):
// This is very important!
while let Some(IgnoredAny) = seq.next_element()? {
// Ignore rest
}
Here's a working example:
use std::fmt;
use serde::de::{self, Deserialize, Deserializer, IgnoredAny, SeqAccess, Visitor};
use serde::Serialize;
#[derive(Serialize, Debug)]
pub struct MyTuple(String, i32);
impl<'de> Deserialize<'de> for MyTuple {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
struct MyTupleVisitor;
impl<'de> Visitor<'de> for MyTupleVisitor {
type Value = MyTuple;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("struct MyTuple")
}
fn visit_seq<V>(self, mut seq: V) -> Result<Self::Value, V::Error>
where
V: SeqAccess<'de>,
{
let s = seq
.next_element()?
.ok_or_else(|| de::Error::invalid_length(0, &self))?;
let n = seq
.next_element()?
.ok_or_else(|| de::Error::invalid_length(1, &self))?;
// This is very important!
while let Some(IgnoredAny) = seq.next_element()? {
// Ignore rest
}
Ok(MyTuple(s, n))
}
}
deserializer.deserialize_seq(MyTupleVisitor)
}
}
fn main() {
let two_elements = r#"["foo", 123]"#;
let three_elements = r#"["foo", 123, "bar"]"#;
let tuple: MyTuple = serde_json::from_str(two_elements).unwrap();
assert_eq!(tuple.0, "foo");
assert_eq!(tuple.1, 123);
let tuple: MyTuple = serde_json::from_str(three_elements).unwrap();
assert_eq!(tuple.0, "foo");
assert_eq!(tuple.1, 123);
}
For JSON, I'd combine RawValue and a custom deserialization:
use serde::{Deserialize, Deserializer};
#[derive(Debug)]
struct MyTuple(String, i32);
#[derive(Deserialize, Debug)]
struct MyTupleFutureCompat<'a>(
String,
i32,
#[serde(default, borrow)] Option<&'a serde_json::value::RawValue>,
);
impl<'de> Deserialize<'de> for MyTuple {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let t: MyTupleFutureCompat = Deserialize::deserialize(deserializer)?;
Ok(MyTuple(t.0, t.1))
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let json = r#"[
"foo",
123,
"ignore me"
]"#;
let d: MyTuple = serde_json::from_str(json)?;
println!("{:?}", d);
Ok(())
}
See also:
How to transform fields during deserialization using Serde?
Is there a way to deserialize arbitrary JSON using Serde without creating fine-grained objects?
Why can Serde not derive Deserialize for a struct containing only a &Path?
I want to serialize a HashMap with structs as keys:
use serde::{Deserialize, Serialize}; // 1.0.68
use std::collections::HashMap;
fn main() {
#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash)]
struct Foo {
x: u64,
}
#[derive(Serialize, Deserialize, Debug)]
struct Bar {
x: HashMap<Foo, f64>,
}
let mut p = Bar { x: HashMap::new() };
p.x.insert(Foo { x: 0 }, 0.0);
let serialized = serde_json::to_string(&p).unwrap();
}
This code compiles, but when I run it I get an error:
Error("key must be a string", line: 0, column: 0)'
I changed the code:
#[derive(Serialize, Deserialize, Debug)]
struct Bar {
x: HashMap<u64, f64>,
}
let mut p = Bar { x: HashMap::new() };
p.x.insert(0, 0.0);
let serialized = serde_json::to_string(&p).unwrap();
The key in the HashMap is now a u64 instead of a string. Why does the first code give an error?
You can use serde_as from the serde_with crate to encode the HashMap as a sequence of key-value pairs:
use serde_with::serde_as; // 1.5.1
#[serde_as]
#[derive(Serialize, Deserialize, Debug)]
struct Bar {
#[serde_as(as = "Vec<(_, _)>")]
x: HashMap<Foo, f64>,
}
Which will serialize to (and deserialize from) this:
{
"x":[
[{"x": 0}, 0.0],
[{"x": 1}, 0.0],
[{"x": 2}, 0.0]
]
}
There is likely some overhead from converting the HashMap to Vec, but this can be very convenient.
According to JSONs specification, JSON keys must be strings. serde_json uses fmt::Display in here, for some non-string keys, to allow serialization of wider range of HashMaps. That's why HashMap<u64, f64> works as well as HashMap<String, f64> would. However, not all types are covered (Foo's case here).
That's why we need to provide our own Serialize implementation:
impl Display for Foo {
fn fmt(&self, f: &mut Formatter) -> std::fmt::Result {
write!(f, "{}", self.x)
}
}
impl Serialize for Bar {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut map = serializer.serialize_map(Some(self.x.len()))?;
for (k, v) in &self.x {
map.serialize_entry(&k.to_string(), &v)?;
}
map.end()
}
}
(playground)
I've found the bulletproof solution 😃
Extra dependencies not required
Compatible with HashMap, BTreeMap and other iterable types
Works with flexbuffers
The following code converts a field (map) to the intermediate Vec representation:
pub mod vectorize {
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::iter::FromIterator;
pub fn serialize<'a, T, K, V, S>(target: T, ser: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
T: IntoIterator<Item = (&'a K, &'a V)>,
K: Serialize + 'a,
V: Serialize + 'a,
{
let container: Vec<_> = target.into_iter().collect();
serde::Serialize::serialize(&container, ser)
}
pub fn deserialize<'de, T, K, V, D>(des: D) -> Result<T, D::Error>
where
D: Deserializer<'de>,
T: FromIterator<(K, V)>,
K: Deserialize<'de>,
V: Deserialize<'de>,
{
let container: Vec<_> = serde::Deserialize::deserialize(des)?;
Ok(T::from_iter(container.into_iter()))
}
}
To use it just add the module's name as an attribute:
#[derive(Debug, Serialize, Deserialize)]
struct MyComplexType {
#[serde(with = "vectorize")]
map: HashMap<MyKey, String>,
}
The remained part if you want to check it locally:
use anyhow::Error;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct MyKey {
one: String,
two: u16,
more: Vec<u8>,
}
#[derive(Debug, Serialize, Deserialize)]
struct MyComplexType {
#[serde(with = "vectorize")]
map: HashMap<MyKey, String>,
}
fn main() -> Result<(), Error> {
let key = MyKey {
one: "1".into(),
two: 2,
more: vec![1, 2, 3],
};
let mut map = HashMap::new();
map.insert(key.clone(), "value".into());
let instance = MyComplexType { map };
let serialized = serde_json::to_string(&instance)?;
println!("JSON: {}", serialized);
let deserialized: MyComplexType = serde_json::from_str(&serialized)?;
let expected_value = "value".to_string();
assert_eq!(deserialized.map.get(&key), Some(&expected_value));
Ok(())
}
And on the Rust playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bf1773b6e501a0ea255ccdf8ce37e74d
While all provided answers will fulfill the goal of serializing your HashMap to json they are ad hoc or hard to maintain.
One correct way to allow a specific data structure to be serialized with serde as keys in a map, is the same way serde handles integer keys in HashMaps (which works): They serialize the value to String. This has a few advantages; namely
Intermediate data-structure omitted,
no need to clone the entire HashMap,
easier maintained by applying OOP concepts, and
serialization usable in more complex structures such as MultiMap.
This can be done by manually implementing Serialize and Deserialize for your data-type.
I use composite ids for maps.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Proj {
pub value: u64,
}
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Doc {
pub proj: Proj,
pub value: u32,
}
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Sec {
pub doc: Doc,
pub value: u32,
}
So now manually implementing serde serialization for them is kind of a hassle, so instead we delegate the implementation to the FromStr and From<Self> for String (Into<String> blanket) traits.
impl From<Doc> for String {
fn from(val: Doc) -> Self {
format!("{}{:08X}", val.proj, val.value)
}
}
impl FromStr for Doc {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match parse_doc(s) {
Ok((_, p)) => Ok(p),
Err(e) => Err(e.to_string()),
}
}
}
In order to parse the Doc we make use of nom. The parse functionality below is explained in their examples.
fn is_hex_digit(c: char) -> bool {
c.is_digit(16)
}
fn from_hex8(input: &str) -> Result<u32, std::num::ParseIntError> {
u32::from_str_radix(input, 16)
}
fn parse_hex8(input: &str) -> IResult<&str, u32> {
map_res(take_while_m_n(8, 8, is_hex_digit), from_hex8)(input)
}
fn parse_doc(input: &str) -> IResult<&str, Doc> {
let (input, proj) = parse_proj(input)?;
let (input, value) = parse_hex8(input)?;
Ok((input, Doc { value, proj }))
}
Now we need to hook up self.to_string() and str::parse(&str) to serde we can do this using a simple macro.
macro_rules! serde_str {
($type:ty) => {
impl Serialize for $type {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let s: String = self.clone().into();
serializer.serialize_str(&s)
}
}
impl<'de> Deserialize<'de> for $type {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
paste! {deserializer.deserialize_string( [<$type Visitor>] {})}
}
}
paste! {struct [<$type Visitor>] {}}
impl<'de> Visitor<'de> for paste! {[<$type Visitor>]} {
type Value = $type;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
formatter.write_str("\"")
}
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: serde::de::Error,
{
match str::parse(v) {
Ok(id) => Ok(id),
Err(_) => Err(serde::de::Error::custom("invalid format")),
}
}
}
};
}
Here we are using paste to interpolate the names. Beware that now the struct will always serialize as defined above. Never as a struct, always as a string.
It is important to implement fn visit_str instead of fn visit_string because visit_string defers to visit_str.
Finally, we have to call the macro for our custom structs
serde_str!(Sec);
serde_str!(Doc);
serde_str!(Proj);
Now the specified types can be serialized to and from string with serde.
I have a structure that looks somewhat like this:
pub struct MyStruct {
data: Arc<Mutex<HashMap<i32, Vec<i32>>>>,
}
I can easily get a lock on the mutex and query the underlying HashMap:
let d = s.data.lock().unwrap();
let v = d.get(&1).unwrap();
println!("{:?}", v);
Now I want to make a method to encapsulate the querying, so I write something like this:
impl MyStruct {
pub fn get_data_for(&self, i: &i32) -> &Vec<i32> {
let d = self.data.lock().unwrap();
d.get(i).unwrap()
}
}
This fails to compile because I'm trying to return a reference to the data under a Mutex:
error: `d` does not live long enough
--> <anon>:30:9
|
30 | d.get(i).unwrap()
| ^
|
note: reference must be valid for the anonymous lifetime #1 defined on the block at 28:53...
--> <anon>:28:54
|
28 | pub fn get_data_for(&self, i: &i32) -> &Vec<i32> {
| ^
note: ...but borrowed value is only valid for the block suffix following statement 0 at 29:42
--> <anon>:29:43
|
29 | let d = self.data.lock().unwrap();
| ^
I can fix it by wrapping the HashMap values in an Arc, but it looks ugly (Arc in Arc) and complicates the code:
pub struct MyStruct {
data: Arc<Mutex<HashMap<i32, Arc<Vec<i32>>>>>,
}
What is the best way to approach this? Is it possible to make a method that does what I want, without modifying the data structure?
Full example code.
The parking_lot crate provides an implementation of Mutexes that's better on many accounts than the one in std. Among the goodies is MutexGuard::map, which implements an interface similar to owning_ref's.
use std::sync::Arc;
use parking_lot::{Mutex, MutexGuard, MappedMutexGuard};
use std::collections::HashMap;
pub struct MyStruct {
data: Arc<Mutex<HashMap<i32, Vec<i32>>>>,
}
impl MyStruct {
pub fn get_data_for(&self, i: &i32) -> MappedMutexGuard<Vec<i32>> {
MutexGuard::map(self.data.lock(), |d| d.get_mut(i).unwrap())
}
}
You can try it on the playground here.
This solution is similar to #Neikos's, but using owning_ref to do hold the MutexGuard and a reference to the Vec:
extern crate owning_ref;
use std::sync::Arc;
use std::sync::{Mutex,MutexGuard};
use std::collections::HashMap;
use std::vec::Vec;
use owning_ref::MutexGuardRef;
type HM = HashMap<i32, Vec<i32>>;
pub struct MyStruct {
data: Arc<Mutex<HM>>,
}
impl MyStruct {
pub fn new() -> MyStruct {
let mut hm = HashMap::new();
hm.insert(3, vec![2,3,5,7]);
MyStruct{
data: Arc::new(Mutex::new(hm)),
}
}
pub fn get_data_for<'ret, 'me:'ret, 'c>(&'me self, i: &'c i32) -> MutexGuardRef<'ret, HM, Vec<i32>> {
MutexGuardRef::new(self.data.lock().unwrap())
.map(|mg| mg.get(i).unwrap())
}
}
fn main() {
let s: MyStruct = MyStruct::new();
let vref = s.get_data_for(&3);
for x in vref.iter() {
println!("{}", x);
}
}
This has the advantage that it's easy (through the map method on owning_ref) to get a similar reference to anything else reachable from the Mutex (an individual item in a Vec, etc.) without having to re-implement the returned type.
This can be made possible by using a secondary struct that implements Deref and holds the MutexGuard.
Example:
use std::sync::{Arc, Mutex, MutexGuard};
use std::collections::HashMap;
use std::ops::Deref;
pub struct Inner<'a>(MutexGuard<'a, HashMap<i32, Vec<i32>>>, i32);
impl<'a> Deref for Inner<'a> {
type Target = Vec<i32>;
fn deref(&self) -> &Self::Target {
self.0.get(&self.1).unwrap()
}
}
pub struct MyStruct {
data: Arc<Mutex<HashMap<i32, Vec<i32>>>>,
}
impl MyStruct {
pub fn get_data_for<'a>(&'a self, i: i32) -> Inner<'a> {
let d = self.data.lock().unwrap();
Inner(d, i)
}
}
fn main() {
let mut hm = HashMap::new();
hm.insert(1, vec![1,2,3]);
let s = MyStruct {
data: Arc::new(Mutex::new(hm))
};
{
let v = s.get_data_for(1);
println!("{:?}", *v);
let x : Vec<_> = v.iter().map(|x| x * 2).collect();
println!("{:?}", x); // Just an example to see that it works
}
}
As described in Why can't I store a value and a reference to that value in the same struct?, the Rental crate allows for self-referential structs in certain cases. Here, we bundle the Arc, the MutexGuard, and the value all into a struct that Derefs to the value:
#[macro_use]
extern crate rental;
use std::{
collections::HashMap, sync::{Arc, Mutex},
};
use owning_mutex_guard_value::OwningMutexGuardValue;
pub struct MyStruct {
data: Arc<Mutex<HashMap<i32, Vec<i32>>>>,
}
impl MyStruct {
pub fn get_data_for(&self, i: &i32) -> OwningMutexGuardValue<HashMap<i32, Vec<i32>>, Vec<i32>> {
OwningMutexGuardValue::new(
self.data.clone(),
|d| Box::new(d.lock().unwrap()),
|g, _| g.get(i).unwrap(),
)
}
}
rental! {
mod owning_mutex_guard_value {
use std::sync::{Arc, Mutex, MutexGuard};
#[rental(deref_suffix)]
pub struct OwningMutexGuardValue<T, U>
where
T: 'static,
U: 'static,
{
lock: Arc<Mutex<T>>,
guard: Box<MutexGuard<'lock, T>>,
value: &'guard U,
}
}
}
fn main() {
let mut data = HashMap::new();
data.insert(1, vec![1, 2, 3]);
let s = MyStruct {
data: Arc::new(Mutex::new(data)),
};
let locked_data = s.get_data_for(&1);
let total: i32 = locked_data.iter().map(|x| x * 2).sum();
println!("{}", total);
assert!(s.data.try_lock().is_err());
drop(locked_data);
assert!(s.data.try_lock().is_ok());
}
Here's an implementation of the closure-passing approach mentioned in the comments:
impl MyStruct {
pub fn with_data_for<T>(&self, i: &i32, f: impl FnOnce(&Vec<i32>) -> T) -> Option<T> {
let map_guard = &self.data.lock().ok()?;
let vec = &map_guard.get(i)?;
Some(f(vec))
}
}
Rust Playground
Example usage:
s.with_data_for(&1, |v| {
println!("{:?}", v);
});
let sum: i32 = s.with_data_for(&1, |v| v.iter().sum()).unwrap();
println!("{}", sum);