Functionally creating a nested object from a flat structure

I am attempting to turn a flat structure like the following:
let flat = vec![
Foo {
a: "abc1".to_owned(),
b: "efg1".to_owned(),
c: "yyyy".to_owned(),
d: "aaaa".to_owned(),
},
Foo {
a: "abc1".to_owned(),
b: "efg2".to_owned(),
c: "zzzz".to_owned(),
d: "bbbb".to_owned(),
}];
into a nested JSON object through serde_json that looks something like:
{
"abc1": {
"efg1": {
"c": "yyyy",
"d": "aaaa"
},
"efg2": {
"c": "zzzz",
"d": "bbbb"
}
}
}
(The values of b are guaranteed to be unique within the array.)
If I had needed only one layer, I would do something like this:
let map = flat.into_iter().map(|input| (input.a, NewType {
b: input.b,
c: input.c,
d: input.d,
})).collect::<HashMap<String, NewType>>();
let out = serde_json::to_string(&map).unwrap();
However, this doesn't seem to scale to multiple layers (i.e. (String, (String, NewType)) can't collect into HashMap<String, HashMap<String, NewType>>).
Is there a better way than manually looping and inserting entries into the hash maps before turning them into JSON?

A map will preserve the shape of the data. That is not what you want; the cardinality of the data has been changed after the transformation. So a mere map won't be sufficient.
Instead, a fold will do: you start with an empty HashMap, and populate it as you iterate through the collection. But it is hardly any more readable than a loop in this case. I find a multimap is quite useful here:
use multimap::MultiMap;
use std::collections::HashMap;
struct Foo {
a: String,
b: String,
c: String,
d: String,
}
#[derive(Debug)]
struct NewFoo {
c: String,
d: String,
}
fn main() {
let flat = vec![
Foo {
a: "abc1".to_owned(),
b: "efg1".to_owned(),
c: "yyyy".to_owned(),
d: "aaaa".to_owned(),
},
Foo {
a: "abc1".to_owned(),
b: "efg2".to_owned(),
c: "zzzz".to_owned(),
d: "bbbb".to_owned(),
},
];
let map = flat
.into_iter()
.map(|e| (e.a, (e.b, NewFoo { c: e.c, d: e.d })))
.collect::<MultiMap<_, _>>()
.into_iter()
.map(|e| (e.0, e.1.into_iter().collect::<HashMap<_, _>>()))
.collect::<HashMap<_, _>>();
println!("{:#?}", map);
}
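For comparison, the fold mentioned above can be sketched with only the standard library. This is a minimal sketch, not the answer's own code: the `nest` helper and the `Nested` alias are hypothetical names introduced for illustration, and the result would still need to be handed to serde_json (with `Serialize` derived on `NewFoo`) to produce JSON.

```rust
use std::collections::HashMap;

struct Foo {
    a: String,
    b: String,
    c: String,
    d: String,
}

#[derive(Debug, PartialEq)]
struct NewFoo {
    c: String,
    d: String,
}

// Hypothetical alias, introduced only to keep the signatures short.
type Nested = HashMap<String, HashMap<String, NewFoo>>;

// Hypothetical helper: groups the flat records by `a`, then by `b`.
fn nest(flat: Vec<Foo>) -> Nested {
    flat.into_iter().fold(Nested::new(), |mut outer, e| {
        // Create the inner map the first time `a` is seen, then insert under `b`.
        outer
            .entry(e.a)
            .or_default()
            .insert(e.b, NewFoo { c: e.c, d: e.d });
        outer
    })
}

fn main() {
    let flat = vec![
        Foo {
            a: "abc1".to_owned(),
            b: "efg1".to_owned(),
            c: "yyyy".to_owned(),
            d: "aaaa".to_owned(),
        },
        Foo {
            a: "abc1".to_owned(),
            b: "efg2".to_owned(),
            c: "zzzz".to_owned(),
            d: "bbbb".to_owned(),
        },
    ];
    let map = nest(flat);
    println!("{:#?}", map);
}
```

Whether this reads better than the MultiMap chain is a matter of taste; it avoids the extra crate and the intermediate Vec per key.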

If you need to do something custom to flatten/merge your Foo structure, you could turn it into JSON Values in your Rust code using something like this:
use serde_json::{json, Map, Value};
let mut root: Map<String, Value> = Map::new();
for foo in flat.into_iter() {
let b = json!({ "c": foo.c, "d": foo.d });
if let Some(a) = root.get_mut(&foo.a) {
if let Value::Object(map) = a {
map.insert(foo.b, b);
}
} else {
root.insert(foo.a, json!({foo.b: b}));
}
};

Related

Skip empty objects when deserializing array with serde

I need to deserialize an array (JSON) of a type, let's call it Foo. I have implemented this and it works well for most stuff, but I have noticed the latest version of the data will sometimes include erroneous empty objects.
Prior to this change, each Foo can be de-serialized to the following enum:
#[derive(Deserialize)]
#[serde(untagged)]
pub enum Foo<'s> {
Error {
// My current workaround is using Option<Cow<'s, str>>
error: Cow<'s, str>,
},
Value {
a: u32,
b: i32,
// etc.
}
}
/// Foo is part of a larger struct Bar.
#[derive(Deserialize)]
pub struct Bar<'s> {
foos: Vec<Foo<'s>>,
// etc.
}
This struct may represent one of the following JSON values:
// Valid inputs
[]
[{"a": 34, "b": -23},{"a": 33, "b": -2},{"a": 37, "b": 1}]
[{"error":"Unable to connect to network"}]
[{"a": 34, "b": -23},{"error":"Timeout"},{"a": 37, "b": 1}]
// Possible input for latest versions of data
[{},{},{},{},{},{},{"a": 34, "b": -23},{},{},{},{},{},{},{},{"error":"Timeout"},{},{},{},{},{},{}]
This does not happen very often, but it is enough to cause issues. Normally, the array should include three or fewer entries, but these extraneous empty objects break that convention. There is no meaningful information I can gain from parsing {}, and in the worst cases there can be hundreds of them in one array.
I do not want to error on parsing {} as the array still contains other meaningful values, but I do not want to include {} in my parsed data either. Ideally I would also be able to use tinyvec::ArrayVec<[Foo<'s>; 3]> instead of a Vec<Foo<'s>> to save memory and reduce time spent performing allocation during parsing, but I am unable to due to this issue.
How can I skip {} JSON values when deserializing an array with serde in Rust?
I also put together a Rust Playground with some test cases to try different solutions.

serde_with::VecSkipError provides a way to ignore any elements which fail deserialization, by skipping them. This will ignore any errors and not only the empty object {}. So it might be too permissive.
#[serde_with::serde_as]
#[derive(Deserialize)]
pub struct Bar<'s> {
#[serde_as(as = "serde_with::VecSkipError<_>")]
foos: Vec<Foo<'s>>,
}

The simplest, but not performant, solution would be to define an enum that captures both the Foo case and the empty case, deserialize into a vector of those, and then filter that vector to get just the nonempty ones.
#[derive(Deserialize, Debug)]
#[serde(untagged)]
pub enum FooDe<'s> {
Nonempty(Foo<'s>),
Empty {},
}
fn main() {
let json = r#"[
{},{},{},{},{},{},
{"a": 34, "b": -23},
{},{},{},{},{},{},{},
{"error":"Timeout"},
{},{},{},{},{},{}
]"#;
let foo_des = serde_json::from_str::<Vec<FooDe>>(json).unwrap();
let foos = foo_des
.into_iter()
.filter_map(|item| {
use FooDe::*;
match item {
Nonempty(foo) => Some(foo),
Empty {} => None,
}
})
.collect();
let bar = Bar { foos };
println!("{:?}", bar);
// Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}
Conceptually this is simple but you're allocating a lot of space for Empty cases that you ultimately don't need. Instead, you can control exactly how deserialization is done by implementing it yourself.
struct BarVisitor<'s> {
marker: PhantomData<fn() -> Bar<'s>>,
}
impl<'s> BarVisitor<'s> {
fn new() -> Self {
BarVisitor {
marker: PhantomData,
}
}
}
// This is the trait that informs Serde how to deserialize Bar.
impl<'de, 's: 'de> Deserialize<'de> for Bar<'s> {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
impl<'de, 's: 'de> Visitor<'de> for BarVisitor<'s> {
// The type that our Visitor is going to produce.
type Value = Bar<'s>;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("a list of objects")
}
fn visit_seq<V>(self, mut access: V) -> Result<Self::Value, V::Error>
where
V: SeqAccess<'de>,
{
let mut foos = Vec::new();
while let Some(foo_de) = access.next_element::<FooDe>()? {
if let FooDe::Nonempty(foo) = foo_de {
foos.push(foo)
}
}
let bar = Bar { foos };
Ok(bar)
}
}
// Instantiate our Visitor and ask the Deserializer to drive
// it over the input data, resulting in an instance of Bar.
deserializer.deserialize_seq(BarVisitor::new())
}
}
fn main() {
let json = r#"[
{},{},{},{},{},{},
{"a": 34, "b": -23},
{},{},{},{},{},{},{},
{"error":"Timeout"},
{},{},{},{},{},{}
]"#;
let bar = serde_json::from_str::<Bar>(json).unwrap();
println!("{:?}", bar);
// Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}

Efficiently get all items of a vector with a given id

Say I have a vector of items where each item has an id, like in the example below. I can of course get all the items in the vector with a given id using something like large_vector.iter().filter(|item| item.id == given_id). However, for improved performance I can do some preprocessing and sort the vector by item id and store the bounds for each id, like in the example below. This way I can quickly access a slice of the vector for any given id. I end up doing this a lot but feel like I am reinventing the wheel and needlessly opening myself up to bugs. Is there a better way to do this directly, preferably using the standard library, or else some other library?
use std::{collections::HashMap, ops::Range};
#[derive(Debug)]
struct Item {
id: String,
val: f64,
}
impl Item {
fn new(id: &str, val: f64) -> Item {
Item { id: id.into(), val }
}
}
fn main() {
let mut large_vector = vec![
Item::new("C", 2.21),
Item::new("A", 34.2),
Item::new("B", 23.54),
Item::new("C", 34.34),
Item::new("C", 45.21),
Item::new("B", 21.34),
];
// first sort by id
large_vector.sort_by(|item1, item2| item1.id.cmp(&item2.id));
dbg!(&large_vector);
// now create a HashMap storing bounds for each id
let mut lookup = HashMap::new();
let mut start: usize = 0;
let mut end: usize = 0;
if let Some(first_item) = large_vector.get(0) {
let mut current_id = first_item.id.clone();
// insert bound if entered new id section or is last item
for item in &large_vector {
if current_id != item.id {
lookup.insert(current_id.clone(), Range { start, end });
current_id = item.id.clone();
start = end;
}
end += 1;
}
lookup.insert(current_id.clone(), Range { start, end });
}
// test by getting the items for a given id
dbg!(&lookup);
let range = lookup.get("C").unwrap();
dbg!(range);
let items = large_vector[range.start..range.end]
.iter()
.collect::<Vec<_>>();
dbg!(items);
}
[src/main.rs:26] &large_vector = [
Item {
id: "A",
val: 34.2,
},
Item {
id: "B",
val: 23.54,
},
Item {
id: "B",
val: 21.34,
},
Item {
id: "C",
val: 2.21,
},
Item {
id: "C",
val: 34.34,
},
Item {
id: "C",
val: 45.21,
},
]
[src/main.rs:47] &lookup = {
"A": 0..1,
"B": 1..3,
"C": 3..6,
}
[src/main.rs:49] range = 3..6
[src/main.rs:53] items = [
Item {
id: "C",
val: 2.21,
},
Item {
id: "C",
val: 34.34,
},
Item {
id: "C",
val: 45.21,
},
]

Assuming that your items have to be in a vector, and you can only sort them, I can think of two possibilities:
1. The solution you proposed. It should be the fastest one for lookup, but has the drawback that the lookup tables get completely invalidated every time you insert/remove an item.
2. Keep the vector sorted and perform a log(n) based divide-and-conquer search to get the range. If you are interested in what I mean with that, I can provide you with some code.
But in general, I think a vector is simply the wrong data structure. I'd try to change that first.
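The second possibility can be sketched with the standard library's slice::partition_point, which performs a binary search on a partitioned slice. This is a minimal sketch assuming the vector is kept sorted by id; the helper name `items_with_id` is hypothetical and not part of the original answer.

```rust
#[derive(Debug)]
struct Item {
    id: String,
    val: f64,
}

// Hypothetical helper: returns the contiguous run of items whose id matches.
// Assumes `items` is already sorted by `id`; otherwise the result is meaningless.
fn items_with_id<'a>(items: &'a [Item], id: &str) -> &'a [Item] {
    // Index of the first item whose id is >= the target.
    let start = items.partition_point(|item| item.id.as_str() < id);
    // Index of the first item whose id is > the target.
    let end = items.partition_point(|item| item.id.as_str() <= id);
    &items[start..end]
}

fn main() {
    let mut large_vector = vec![
        Item { id: "C".into(), val: 2.21 },
        Item { id: "A".into(), val: 34.2 },
        Item { id: "B".into(), val: 23.54 },
        Item { id: "C".into(), val: 34.34 },
        Item { id: "C".into(), val: 45.21 },
        Item { id: "B".into(), val: 21.34 },
    ];
    // Sort once; each lookup afterwards is two binary searches.
    large_vector.sort_by(|x, y| x.id.cmp(&y.id));

    let c_items = items_with_id(&large_vector, "C");
    let total: f64 = c_items.iter().map(|item| item.val).sum();
    println!("{} items with id C, total {}", c_items.len(), total);
}
```

Compared with the precomputed HashMap of ranges, each lookup costs O(log n) instead of O(1), but there is no separate table to rebuild when items are inserted or removed, as long as the vector stays sorted.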

Access struct field by variable

I want to iterate over the fields of a struct and access its respective value for each iteration:
#[derive(Default, Debug)]
struct A {
foo: String,
bar: String,
baz: String
}
fn main() {
let fields = vec!["foo", "bar", "baz"];
let a: A = Default::default();
for field in fields {
let value = a[field] // this doesn't work
}
}
How can I access a field by variable?

Rust doesn't have any way of iterating directly over its fields. You should instead use a collection type such as Vec, array or one of the collections in std::collections if your data semantically represents a collection of some sort.
If you still feel the need to iterate over the fields, perhaps you need to reconsider your approach to your task and see if there isn't a more idiomatic/proper way to accomplish it.

By using pattern matching, you can iterate over its fields.
#[derive(Default, Debug)]
struct A {
foo: String,
bar: String,
baz: String
}
impl A {
fn get(&self, field_string: &str) -> Result<&String, String> {
match field_string {
"foo" => Ok(&self.foo),
"bar" => Ok(&self.bar),
"baz" => Ok(&self.baz),
_ => Err(format!("invalid field name to get '{}'", field_string))
}
}
}
fn main() {
let fields = vec!["foo", "bar", "baz"];
let a = A {
foo: "value_of_foo".to_string(),
bar: "value_of_bar".to_string(),
baz: "value_of_baz".to_string()
};
for field in fields {
let value = a.get(field).unwrap();
println!("{:?}", value);
}
}
returns
"value_of_foo"
"value_of_bar"
"value_of_baz"
I am now writing a macro that implements such code automatically for any struct, although there may be some bugs.
field_accessor (https://github.com/europeanplaice/field_accessor).
Cargo.toml
[dependencies]
field_accessor = "0"
use field_accessor::FieldAccessor;
#[derive(Default, Debug, FieldAccessor)]
struct A {
foo: String,
bar: String,
baz: String
}
fn main() {
let a = A {
foo: "value_of_foo".to_string(),
bar: "value_of_bar".to_string(),
baz: "value_of_baz".to_string()
};
for field in a.getstructinfo().field_names.iter() {
let value = a.get(field).unwrap();
println!("{:?}", value);
}
}
It also returns
"value_of_foo"
"value_of_bar"
"value_of_baz"

Based on the answer of sshashank124 I came to the conclusion that I should use a HashMap instead of a struct:
use std::collections::HashMap;
fn main() {
let mut b = HashMap::new();
b.insert("foo", 1);
b.insert("bar", 2);
b.insert("baz", 3);
let fields = vec!["foo", "bar", "baz"];
for &field in &fields {
// get returns an Option, which is None for an unknown key
let value = b.get(field);
println!("{:?}", value);
}
}

How to move values out of a vector when the vector is immediately discarded?

I am receiving data in the form of a string vector, and need to populate a struct using a subset of the values, like this:
const json: &str = r#"["a", "b", "c", "d", "e", "f", "g"]"#;
struct A {
third: String,
first: String,
fifth: String,
}
fn main() {
let data: Vec<String> = serde_json::from_str(json).unwrap();
let a = A {
third: data[2],
first: data[0],
fifth: data[4],
};
}
This doesn't work because I'm moving values out of the vector. The compiler believes that this leaves data in an uninitialized state that can cause problems, but because I never use data again, it shouldn't matter.
The conventional solution is swap_remove, but it is problematic because the elements are not accessed in reverse order (assuming the structure is populated top to bottom).
I solve this now by doing a mem::replace and having data as mut, which clutters this otherwise clean code:
fn main() {
let mut data: Vec<String> = serde_json::from_str(json).unwrap();
let a = A {
third: std::mem::replace(&mut data[2], "".to_string()),
first: std::mem::replace(&mut data[0], "".to_string()),
fifth: std::mem::replace(&mut data[4], "".to_string())
};
}
Is there an alternative to this solution that doesn't require me to have all these replace calls and data unnecessarily mut?

I've been in this situation, and the cleanest solution I've found was to create an extension:
trait Extract: Default {
/// Replace self with default and returns the initial value.
fn extract(&mut self) -> Self;
}
impl<T: Default> Extract for T {
fn extract(&mut self) -> Self {
std::mem::replace(self, T::default())
}
}
And in your solution, you can replace the std::mem::replace with it:
const JSON: &str = r#"["a", "b", "c", "d", "e", "f", "g"]"#;
struct A {
third: String,
first: String,
fifth: String,
}
fn main() {
let mut data: Vec<String> = serde_json::from_str(JSON).unwrap();
let _a = A {
third: data[2].extract(),
first: data[0].extract(),
fifth: data[4].extract(),
};
}
That's basically the same code, but it is much more readable.
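Since Rust 1.40 the standard library also provides std::mem::take, which replaces a value with its Default and returns the old one, much like extract above. A minimal sketch without the extension trait follows; the `build` helper is a hypothetical name, and a plain array of strings stands in for the parsed JSON so that the example needs no external crates.

```rust
#[derive(Debug)]
struct A {
    third: String,
    first: String,
    fifth: String,
}

// Hypothetical helper: takes ownership of the vector and moves three elements out.
// Panics if `data` has fewer than five elements, like the indexing in the question.
fn build(mut data: Vec<String>) -> A {
    A {
        // `take` leaves an empty String behind, which does not allocate.
        third: std::mem::take(&mut data[2]),
        first: std::mem::take(&mut data[0]),
        fifth: std::mem::take(&mut data[4]),
    }
}

fn main() {
    // Stand-in for `serde_json::from_str(JSON).unwrap()`.
    let data: Vec<String> = ["a", "b", "c", "d", "e", "f", "g"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let a = build(data);
    println!("{:?}", a);
}
```

Taking the vector by value as `mut data` in the helper also keeps the `mut` out of the caller's code.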
If you like funny things, you can even write a macro:
macro_rules! vec_destruc {
{ $v:expr => $( $n:ident : $i:expr; )+ } => {
let ( $( $n ),+ ) = {
let mut v = $v;
(
$( std::mem::replace(&mut v[$i], Default::default()) ),+
)
};
}
}
const JSON: &str = r#"["a", "b", "c", "d", "e", "f", "g"]"#;
#[derive(Debug)]
struct A {
third: String,
first: String,
fifth: String,
}
fn main() {
let data: Vec<String> = serde_json::from_str(JSON).unwrap();
vec_destruc! { data =>
first: 0;
third: 2;
fifth: 4;
};
let a = A { first, third, fifth };
println!("{:?}", a);
}

In small cases like this (also seen in naïve command line argument processing), I transfer ownership of the vector into an iterator and pop all the values off, keeping those I'm interested in:
fn main() {
let data: Vec<String> = serde_json::from_str(json).unwrap();
let mut data = data.into_iter().fuse();
let first = data.next().expect("Needed five elements, missing the first");
let _ = data.next();
let third = data.next().expect("Needed five elements, missing the third");
let _ = data.next();
let fifth = data.next().expect("Needed five elements, missing the fifth");
let a = A {
third,
first,
fifth,
};
}
I'd challenge the requirement to have a vector, however. Using a tuple is simpler and avoids much of the error handling needed, if you have exactly 5 elements:
fn main() {
let data: (String, String, String, String, String) = serde_json::from_str(json).unwrap();
let a = A {
third: data.2,
first: data.0,
fifth: data.4,
};
}
See also:
How can I ignore extra tuple items when deserializing with Serde? ("trailing characters" error)

Another option is to use a vector of Option<String>. This allows us to move the values out, while keeping track of what values have been moved, so they are not dropped with the vector.
let mut data: Vec<Option<String>> = serde_json::from_str(json).unwrap();
let a = A {
third: data[2].take().unwrap(),
first: data[0].take().unwrap(),
fifth: data[4].take().unwrap(),
};

Is there a way to create a copy of an enum with some field values updated?

I have an enum with several record variants:
enum A {
Var1 { a: i64, b: i64 },
Var2 { c: i32, d: i32 },
}
I want to create a modified copy of such an enum (with different behavior for each variant). I know I can do this:
match a {
A::Var1 { a, b } => A::Var1 { a: new_a, b },
A::Var2 { c, d } => A::Var2 { c, d: new_d },
}
However each variant has quite a few fields, and I'd prefer not to explicitly pass them all. Is there any way to say "clone this enum, except use this value for field x instead of the cloned value"?

Not exactly.
There's "functional record update syntax", but it's only for structs:
struct Foo {
bar: u8,
baz: u8,
quux: u8,
}
fn foo() {
let foobar = Foo {
bar: 1,
baz: 2,
quux: 3,
};
let bazquux = Foo { baz: 4, ..foobar };
}
Best you can do without creating structs for each variant is something like this:
let mut altered = x.clone();
match &mut altered {
A::Var1 { a, .. } => *a = new_a,
A::Var2 { d, .. } => *d = new_d,
};
altered
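If creating a struct for each variant is acceptable, the functional record update syntax becomes usable inside the match. The following is a minimal sketch under that assumption; the wrapper structs and the `updated` helper are hypothetical and change the shape of the enum from the question.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Var1 {
    a: i64,
    b: i64,
}

#[derive(Clone, Debug, PartialEq)]
struct Var2 {
    c: i32,
    d: i32,
}

// Each variant now wraps a struct instead of declaring its fields inline.
#[derive(Clone, Debug, PartialEq)]
enum A {
    Var1(Var1),
    Var2(Var2),
}

// Hypothetical helper: returns a copy with one field replaced per variant.
fn updated(x: &A, new_a: i64, new_d: i32) -> A {
    match x {
        // Only the changed field is named; the rest comes from the clone.
        A::Var1(v) => A::Var1(Var1 { a: new_a, ..v.clone() }),
        A::Var2(v) => A::Var2(Var2 { d: new_d, ..v.clone() }),
    }
}

fn main() {
    let x = A::Var1(Var1 { a: 1, b: 2 });
    println!("{:?}", updated(&x, 10, 20));
}
```

The cost is an extra layer when constructing and matching values, so this mainly pays off when variants have many fields.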

I'm afraid you're hitting one of the restrictions on Rust based on its design and your only real solution is mutation in-place and writing a mutator function or four.
The problem with enums is that you need to match to be able to do anything on them. Until that point, Rust knows or infers very little about what the struct actually is. An additional problem is the lack of any kind of reflection-like ability to allow for the ability to query a type and figure out if it has a field, and the inability to do anything but exhaustively match all contents.
Honestly, the cleanest way may actually depend on the purpose of your mutations. Are they a defined set of changes to an enum based on a business concern of some sort? If so, you may actually want to wrap your logic into a trait extension and use that to encapsulate the logic.
Consider, for instance, a very contrived example. We're building an application that has to deal with different items and apply taxes to them. Said taxes depend on the type of products, and for some reason, all our products are represented by variants of an enum, like so:
#[derive(Debug)]
enum Item {
Food { price: u8, calories: u8 },
Technology { price: u8 },
}
trait TaxExt {
fn apply_tax(&mut self);
}
impl TaxExt for Item {
fn apply_tax(&mut self) {
match self {
&mut Item::Food {
ref mut price,
calories: _,
} => {
// Our food costs double, for tax reasons
*price *= 2;
}
&mut Item::Technology { ref mut price } => {
// Technology has a 1 unit tax
*price += 1;
}
}
}
}
fn main() {
let mut obj = Item::Food {
price: 3,
calories: 200,
};
obj.apply_tax();
println!("{:?}", obj);
}
Assuming you can split your logic like so, it is probably the cleanest way to structure this.
