Additive deserializing with Serde - rust

I'd like to additively deserialize multiple files into the same data structure, where "additively" means that each new file overwrites only the fields it actually contains, leaving the others unmodified. The context is config files: deserialize an "app" config shipped with the app, then override it with a per-user config file.
I use "file" here for the sake of clarity; this could be any deserialization data source.
Note: After writing the below, I realized maybe the question boils down to: is there a clever use of #[serde(default = ...)] to provide a default from an existing data structure? I'm not sure if that's (currently) possible.
Example
Data structure
struct S {
    x: f32,
    y: String,
}
"App" file (using JSON for example):
{ "x": 5.0, "y": "app" }
"User" file overriding only "y":
{ "y": "user" }
Expected result after deserializing both (app, then user):
assert_eq!(s.x, 5.0);
assert_eq!(s.y, "user");
Expected solution
I'm deliberately ignoring any "dynamic" solution that stores all config settings in, say, a single HashMap; although this works and is flexible, it is fairly inconvenient to use at runtime and potentially slower. So I'm calling this approach out of scope for this question.
The data structure can contain nested structs. I want to avoid writing too much per-struct code manually (like implementing Deserialize by hand). A typical config file for a moderately sized app can contain hundreds of settings, and I don't want the burden of maintaining all of that code.
All fields can be expected to implement Default. The idea is that the first deserialized file falls back on Default::default() for all missing fields, while subsequent ones fall back on the already-existing values for anything not explicitly overridden in the new file.
Avoid having to change every single field of every struct to Option<T> just for the sake of serializing/deserializing. This would make runtime usage very painful, and given the property above there would never be a None value anyway once deserialization completes (if a field is missing from all files, it defaults to Default::default()).
I'm fine with a solution containing only a fixed number (2) of overriding files ("app" and "user" in example above).
Current partial solution
I know how to do the first part of falling back to Default; this is well documented: simply use #[serde(default)] on all structs.
One approach would be to simply deserialize both files with #[serde(default)], then overwrite each app-config field with the user-config value whenever the latter differs from its default. But this 1) probably requires all fields to implement Eq or PartialEq, and 2) is potentially expensive and not very elegant (the information is lost during deserialization, then somehow recreated afterwards).
I have a feeling I need a custom Deserializer that holds a reference to the existing data structure, falling back to it when a field is not found, since the default one has no user context while deserializing. But I'm not sure how to keep track of which field is currently being deserialized.
Any hint or idea much appreciated, thanks!

Frustratingly, serde::Deserialize has a method called deserialize_in_place that is explicitly omitted from docs.rs and is considered "part of the public API but hidden from rustdoc to hide it from newbies". This method does exactly what you're asking for (deserialize into an existing &mut T object), especially if you implement it yourself to ensure that only provided keys are overridden and other keys are ignored.

Related

Dealing with `Options` and defaults when parsing in TOML structs with Rust+Serde

I have been working on configuration parsing code and wondered if you could help me to pick the best types for this code.
I am trying to parse the following TOML:
[default]
a=10
b="abc"
[section1]
a = 78
[section2]
b="xyz"
The types of the keys are the same in each section, and each field follows the chain of defaults: sectionX.value => default.value => default value hardcoded in Rust, via x.value.or(default.value).or(Some(...)) for each field.
The most straightforward way to declare it in Rust (including serde attributes):
struct Section {
    a: Option<usize>,
    b: Option<String>,
}
The problem is that I want to parse all defaults first, and then use a fully materialized struct with no unassigned values in my code.
struct Section {
    a: usize,
    b: String,
}
I can use the following approaches:
Use the original Section struct and always unwrap/expect, because I "know" the defaults have been assigned in the config-parsing code. If I make a mistake I can catch the panic, but that does not look tidy; I'd like to leverage more help from the compiler.
Use the second Section struct (the one without Options). I can assign defaults via serde annotations, but then I lose the signal that something was unspecified and needs a default from another level.
Declare both variants of the struct. This is the most verbose variant, and it will look even more verbose when I grow to 20+ fields or embedded structs in the config.
Generate solution 3 via macros or some clever typing. Is there a crate for this? Maybe Section could have some generic type that is Option<T> in one place, but a "newtype" wrapper with a single value somewhere else?
Something else?
Some of the solutions above would work alright, but maybe there is a known elegant solution.
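One std-only sketch in the spirit of solution 3, without macros: keep the Option-typed struct purely as a parsing artifact and fold it into the materialized struct right after parsing. Field names come from the question; the hardcoded defaults 10 and "abc" are assumptions for illustration:

```rust
// Parsed form: every field optional. In practice this struct would carry
// #[derive(serde::Deserialize)] and be filled by the toml crate.
#[derive(Clone, Default)]
struct RawSection {
    a: Option<usize>,
    b: Option<String>,
}

// Materialized form used by the rest of the program: no Options left.
struct Section {
    a: usize,
    b: String,
}

impl RawSection {
    // Chain of defaults: sectionX.value => default.value => hardcoded default.
    fn materialize(self, default: &RawSection) -> Section {
        Section {
            a: self.a.or(default.a).unwrap_or(10),
            b: self
                .b
                .or_else(|| default.b.clone())
                .unwrap_or_else(|| "abc".to_string()),
        }
    }
}
```

The compiler then guarantees that code running after materialize never sees an unassigned value, while the Raw* struct retains the "was it specified?" signal during parsing.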

Why can't I use C#9's "with" keyword to create a copy of structs (like with records)

C# 9 has a new feature: the record type. A record is just a class, but with a bunch of automatically created functions and properties. Basically the idea (as I understand it) was a class that behaves like a struct for things like copying, comparison with Equals, immutability and so on.
Along with the record type came a new feature using the keyword "with". To create a copy of a record, you can write something like: var copy = original with { Property = new_value };
Now I wondered: if records were designed to behave like structs (but are classes), why doesn't the new "with" keyword also work with structs? As far as I can tell, structs have all the features necessary for this, e.g. they are copied by value.
Instead, to get a similar feature for structs, I have to write a copy constructor and can then write: var copy = new StructType(original) { Property = new_value };
Short answer:
That's how the feature was designed.
Long answer:
The compiler creates a synthesized clone method with the reserved name <Clone>$. When you use the with keyword, the compiler calls this clone method and then modifies whatever properties you want to modify.
Structs and ordinary classes don't have a synthesized clone method, hence with can't be used with them.
You may want to write a language proposal to extend the usage of with keyword.
Edit:
Currently, there is a proposal to allow record structs; see "Proposal: record structs" for more information. This may be what you want.

SCIM PATCH library

I am implementing SCIM provisioning for my current project, and I am trying to implement the PATCH method, which turns out to be not that easy.
What I read in the RFC is that SCIM PATCH is almost like JSON PATCH, but looking deeper, the way the path is described differs, which prevents me from using json-patch libraries.
example:
"path":"addresses[type eq \"work\"]"
"path":"members[value eq \"2819c223-7f76-453a-919d-413861904646\"]"
Do you know any library that is doing SCIM PATCH out of the box?
My project is currently a Node project, but I don't care about the language; I can rewrite it in JavaScript if needed.
Edit
I have finally created my own library for this, called scim-patch; it is available on npm: https://www.npmjs.com/package/scim-patch
I implement SCIM PATCH operations in my own library. Please take a look here and here. It is currently a work in progress for v2, but the CRUD capability required by patch operations has matured.
First of all, you need a way to parse the SCIM path, which can optionally include a filter. I implement a finite state machine to parse the path and filter. A scanner goes through each byte of the text and points out interesting events, and a parser uses the scanner to break the text into meaningful tokens. For instance, emails[value eq "foo#bar.com"].type can be broken down into emails, [, value, eq, "foo#bar.com", ] and type. Finally, a compiler takes these tokens and assembles them into an abstract syntax tree. On paper, it looks something like the following:
emails -> eq -> type
         /  \
    value    "foo#bar.com"
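As a rough illustration of the scanner stage (a hypothetical sketch in Rust, not the author's actual implementation), a single pass can split a path into tokens while keeping quoted literals intact:

```rust
// Split a SCIM path into tokens. Brackets become their own tokens, dots and
// spaces act as separators, and double-quoted literals are kept whole.
fn tokenize(path: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut cur = String::new();
    let mut chars = path.chars();
    while let Some(c) = chars.next() {
        match c {
            '[' | ']' => {
                if !cur.is_empty() {
                    tokens.push(std::mem::take(&mut cur));
                }
                tokens.push(c.to_string());
            }
            '.' | ' ' => {
                if !cur.is_empty() {
                    tokens.push(std::mem::take(&mut cur));
                }
            }
            '"' => {
                // Consume a quoted string literal as a single token.
                let mut lit = String::from("\"");
                for d in chars.by_ref() {
                    lit.push(d);
                    if d == '"' {
                        break;
                    }
                }
                tokens.push(lit);
            }
            _ => cur.push(c),
        }
    }
    if !cur.is_empty() {
        tokens.push(cur);
    }
    tokens
}
```

A real scanner also needs to handle escaped quotes and report malformed input, which this sketch omits.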
Next, you need a way to traverse the resource data structure according to the abstract syntax tree. I designed my property model to carry a reference to the SCIM attribute. Consider the following resource:
{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "imulab",
    "emails": [
        {
            "value": "foo#bar.com",
            "type": "work"
        },
        {
            "value": "bar#foo.com",
            "type": "home"
        }
    ]
}

I start traversing from the root of the resource and find the child called emails, which will return a multiValued property of complex type. I see my next token (eq) is the root of a filter, so I perform the filter operations on the two elements of emails. For each element, I go down the value child and evaluate its value. Since only the first element matches the filter, I finally go down the type child of that complex property and arrive at the target property. From there, you are free to perform Add, Replace and Remove operations.
There are two things I recommend watching out for.
One is that your traversal path will split when you hit a multiValued property. In the above example, only one element matched the filter; in reality, there may be many matches, or there could be no filter at all, forcing you to traverse all elements.
The other is the syntax of the SCIM path. The specification mandates that the schema URN may be prefixed in front of the actual path, delimited from it by a :. In that representation, emails.type and urn:ietf:params:scim:schemas:core:2.0:User:emails.type are equivalent. Note that the schema URN contains a dot (.) in the 2.0 part. This creates a further complication: you cannot simply split the text on . and hope to get the correct tokens. I use a Trie data structure to record all schema URNs as reserved words. Whenever I start a new segment of the path, I first try to match it in the Trie instead of relying solely on . to terminate the segment.
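A simplified sketch of that idea, using a plain list of known URNs instead of a Trie (the helper and the constant are hypothetical): peel off the schema-URN prefix before splitting on dots.

```rust
// Known schema URNs, treated as reserved words that may prefix a path.
const KNOWN_URNS: &[&str] = &["urn:ietf:params:scim:schemas:core:2.0:User"];

// Split a SCIM attribute path into segments, stripping a schema URN first
// so the dot inside "2.0" is not mistaken for a path separator.
fn split_path(path: &str) -> Vec<&str> {
    for urn in KNOWN_URNS {
        if let Some(rest) = path.strip_prefix(urn).and_then(|r| r.strip_prefix(':')) {
            let mut segments = vec![*urn];
            segments.extend(rest.split('.'));
            return segments;
        }
    }
    path.split('.').collect()
}
```

A Trie does the same prefix matching in one walk over the input, which matters once many schema URNs (including extension schemas) are registered.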
Hope it will help your work.
Have a look at scim2-filter-parser: https://github.com/15five/scim2-filter-parser
It is a library mainly used by the authors' django-scim2 library: https://github.com/15five/django-scim2
It relies on python AST objects, but I think you should get some takeaways from there.
Since I did not find any TypeScript library implementing SCIM patch operations, I have implemented my own library.
You can find it here: https://www.npmjs.com/package/scim-patch

How to split long struct tags in golang?

Let's say I have the following struct, where valid is for validation of the struct with custom messages for each validator (specifically govalidator).
type Login struct {
    Email    string `json:"email" valid:"required~Email is required,email~The email address provided is not valid"`
    Password string `json:"password" valid:"required~Password is required,stringlength(6|40)~Password length must be between 6 and 40"`
}
After adding a few validators, the line is too long and not maintainable.
I want to split it across new lines, but that is not supported by Go and not compatible with reflect.StructTag.Get.
However, according to my testing, the validator works with multiline struct tags, but go vet fails.
In short, what is the correct way to split long struct tags?
As you noted, the convention expected by StructTag.Get() does not allow newline characters in struct tags (and if you do not follow the convention, StructTag.Get() will not work properly). In my opinion, that is just too much stuff squeezed into a single tag value.
If you want to store that much meta info about your structures, I would store it outside of struct tags, properly modeled by other structs, so they can be accessed / processed in a type-safe manner.
If you have no choice and you do need to put that much info into a single tag, then you have to choose between the convenience of the ready-made StructTag.Get() method, or forgo the convention, use whatever format you want in the struct tags, and implement your own tag-parsing logic.

Sharing weak trait object references

I'm trying to provide "views" of non-owned structs to separate components of a system.
Assume a set of traits with distinct methods: Drawable, Modifiable and a number of structs which implement at least one of the traits - SimpleBox, Panel, Expression.
Different components of the system will need to frequently access sequences of these objects, using methods of specific traits; consider a DrawingManager or a ModifyManager:
struct DrawingManager {
    items: Vec<Weak<dyn Drawable>>,
}
struct ModifyManager {
    items: Vec<Weak<dyn Modifiable>>,
}
While a single object may be referenced in both managers, assume that there is a separate single owner of all structs:
struct ObjectManager {
    boxes: Vec<Rc<SimpleBox>>,
    panels: Vec<Rc<Panel>>,
    expressions: Vec<Rc<Expression>>,
}
Ideally, deletion would be managed from one place: simply removing a struct from the ObjectManager should be enough to invalidate the references to it in all other components (hence the use of Weak).
Is there a way of doing this?
Is this the correct way to achieve this?
Is there a more idiomatic way of implementing this functionality?
The system contains several traits, so making a single trait using methods of all the other traits seems like a bad idea. Several traits have more than one method, so replacing them with closures is not possible.
What I have tried
As one object may produce one or more Rc<Trait>, we might envision implementing this with a HashMap<ID, Vec<Rc<Any>>>, whereby each struct has a unique ID which maps to the list of all the Rcs that have been made for it.
When we want to remove an object, we remove it from the corresponding list and remove its entry in the hashmap, invalidating all Weak references.
However, implementing this fails: to insert into the HashMap, one must upcast Rc<Trait> -> Rc<Any>, only to downcast it again later.
I'm not sure if this is the idiomatic way of doing this, but I've since developed a crate providing this functionality - dependent_view.
Using the crate, the initial problem can be solved by using DependentRc instead of plain Rc's:
struct ObjectManager {
    boxes: Vec<DependentRc<SimpleBox>>,
    panels: Vec<DependentRc<Panel>>,
    expressions: Vec<DependentRc<Expression>>,
}
let object_manager : ObjectManager = ObjectManager::new();
Then using macros provided by the crate, we can obtain Weak<> references to these structs:
let box_view: Weak<dyn Drawable> = to_view!(object_manager.boxes[0]);
let panel_view: Weak<dyn Drawable> = to_view!(object_manager.panels[0]);
let expression_view: Weak<dyn Drawable> = to_view!(object_manager.expressions[0]);
With this, dropping the corresponding DependentRc<> will invalidate all Weak<> references that have been made of it.
