Processing a large dataset of nested data - yamldotnet

I have a rather large set of data which is structured in a somewhat unique fashion. It looks something like this:
foo:
  - name: "some name"
    location: "some location"
    type: "someType"
    bar:
      - name: "A bar element"
        location: "location here"
        type: "someOtherType"
        attachments:
          - type: "attachmentTypeA"
            name: "Attachment name"
          - type: "attachmentTypeB"
            name: "Attachment name"
    baz:
      - name: "another name"
        location: "another location"
        type: "anotherType"
        qux:
          - name: "My name here"
            location: "My location here"
            type: "SomeOtherTypeHere"
            xyzzy:
              - name: "Another name here"
                location: "Another location here"
                type: "anotherTypeHere"
                bar:
                  - name: "Some name here"
                    location: "Some location here"
                    type: "typeHere"
                    attachments:
                      - type: "attachmentTypeA"
                        name: "attachment name here"
                      - type: "attachmentTypeA"
                        name: "attachment name here"
                      - type: "attachmentTypeB"
                        name: "attachment name here"
                  - name: "Another name here"
                    location: "Another location here"
                    type: "anotherTypeHere"
                    attachments:
                      - type: "attachmentTypeA"
                        name: "attachment name here"
                      - type: "attachmentTypeC"
                        name: "attachment name here"
                      - type: "attachmentTypeD"
                        name: "attachment name here"
      - name: "Another baz listing"
        location: "Baz location"
        type: "bazTypeHere"
So basically, you have "foo" at the top level (and there can be more than one foo, but always at the top level). In general, the structure is:
foo > baz > qux > xyzzy > bar
However, any of the sub elements can be at the root, or under foo, provided they are in order. So these are valid:
foo
qux
xyzzy
bar
attachments
bar
attachments
As is this:
foo
baz
qux
xyzzy
bar
attachments
bar
attachments
qux
xyzzy
bar
attachments
bar
attachments
xyzzy
bar
attachments
bar
attachments
And so on. It's wacky, I know, but that's the dataset I inherited. I looked at the examples, in particular the DeserializeObjectGraph and LoadingAYamlStream examples. The DeserializeObjectGraph approach gets kind of crazy when the data is laid out like this, and I finally gave up on it as it just got too hairy. The stream approach seems like a better fit, I think, but I'm running into trouble.
I am loading up the YAML as follows:
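using System.IO;
using YamlDotNet.RepresentationModel;

string contents = File.ReadAllText(fileName);
var input = new StringReader(contents);
var yaml = new YamlStream();
yaml.Load(input); // the parsed documents are then available via yaml.Documents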
As you can see, nothing fancy there. I'm just trying to get a "tree" of objects that I can then iterate through. I tried using the AllNodes property from the root node, but I can't for the life of me figure out how to iterate through them recursively in some manner that makes sense. I will also confess that I am a C# newbie who is still learning (old C guy here), so bear with me!
Can anyone suggest an approach, or possibly some code or even pseudocode that might be able to help me out?
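A rough, language-neutral sketch of the kind of recursive walk being asked about, written in JavaScript purely for illustration. It assumes the YAML has already been parsed into plain nested objects and arrays; the walk function and the sample data are hypothetical. In YamlDotNet the same shape of recursion would presumably branch on YamlMappingNode, YamlSequenceNode and YamlScalarNode from the RepresentationModel namespace instead of on Array.isArray.

```javascript
// Depth-first walk over a parsed tree.
// Any property whose value is an array is treated as a list of child elements
// (foo, baz, qux, xyzzy, bar, attachments); scalar properties are skipped.
function walk(node, visit, depth = 0) {
  for (const [kind, children] of Object.entries(node)) {
    if (!Array.isArray(children)) continue;
    for (const item of children) {
      visit(kind, item, depth);
      walk(item, visit, depth + 1); // recurse into the child's own lists
    }
  }
}

// Tiny stand-in for the parsed document (hypothetical data).
const root = {
  foo: [
    {
      name: "some name",
      bar: [
        {
          name: "A bar element",
          attachments: [{ type: "attachmentTypeA", name: "Attachment name" }],
        },
      ],
      baz: [{ name: "another name" }],
    },
  ],
};

// Collect one indented line per element, in document order.
const seen = [];
walk(root, (kind, item, depth) => {
  seen.push(`${"  ".repeat(depth)}${kind}: ${item.name}`);
});
console.log(seen.join("\n"));
```

Because the walk does not hard-code the foo > baz > qux > xyzzy > bar order, it copes with elements appearing at any level; the visit callback receives the element kind and depth, so order checks could be added there if needed.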

Related

How to Perform UPSERT Operation in Arango DB with Different multiple keys (Composite Key)?

The official documentation already shows how to do this. Below is an example that works fine:
Example: 1
LET documents = [
  { name: 'Doc 1', value: 111, description: 'description 111' },
  { name: 'Doc 2', value: 222, description: 'description 2' },
  { name: 'Doc 3', value: 333, description: 'description 3' }
]
FOR doc IN documents
  UPSERT { name: doc.name, description: doc.description }
  INSERT doc
  UPDATE doc
  IN MyCollection
But I want to check several different key combinations for each document on UPSERT, like:
Example: 2
LET documents = [
  { name: 'Doc 1', value: 777, description: 'description 111' },
  { name: 'Doc 2', value: 888, description: 'description 2' },
  { name: 'Doc 3', value: 999, description: 'description 3' }
]
FOR doc IN documents
  UPSERT {
    { name: doc.name, description: doc.description },
    { value: doc.value, description: doc.description },
    { name: doc.name, value: doc.value }
  }
  INSERT doc
  UPDATE doc
  IN MyCollection
Or any other way (using FILTER or something). I have tried, but nothing works.
If I understand your problem, you would want to update a document, if there's an existing one with at least 2 fields matching, otherwise insert it as new.
UPSERT won't be able to do that. It can only do one match. So a subquery is necessary. In the solution below, I ran a query to find the key of the first document that matches at least 2 fields. If there's no such document then it will return null.
Then the UPSERT can work by matching the _key to that.
LET documents = [
  { name: 'Doc 1', value: 777, description: 'description 111' },
  { name: 'Doc 2', value: 888, description: 'description 2' },
  { name: 'Doc 3', value: 999, description: 'description 3' }
]
FOR doc IN documents
  LET matchKey = FIRST(
    FOR rec IN MyCollection
      FILTER (rec.name == doc.name) + (rec.value == doc.value) + (rec.description == doc.description) > 1
      LIMIT 1
      RETURN rec._key
  )
  UPSERT { _key: matchKey }
  INSERT doc
  UPDATE doc
  IN MyCollection
Note: There's a trick with adding booleans together which works because true will be converted to 1, while false is zero. You can write it out explicitly like this: (rec.name==doc.name?1:0)
While this will work for you, it's not a very efficient solution. In fact, there is no efficient one in this case, because for each document to be added or updated, all the existing documents need to be scanned to find a match. I'm not sure what kind of problem you are trying to solve with this, but it might be better to rethink your design so that the matching condition can be simpler.
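To make the "at least two of three fields match" rule concrete, here is a small in-memory sketch in JavaScript. It is not ArangoDB code: the array stands in for MyCollection, and matchCount and upsertByTwoFields are hypothetical helper names. It relies on the same boolean-to-number trick, since JavaScript also coerces true to 1 and false to 0 when adding.

```javascript
// Count how many of the three fields agree between an existing record and
// an incoming document. Adding booleans coerces them to 0 or 1.
function matchCount(rec, doc) {
  return (
    (rec.name === doc.name) +
    (rec.value === doc.value) +
    (rec.description === doc.description)
  );
}

// Emulates the subquery + UPSERT: update the first record that matches on
// at least two fields, otherwise insert the document as new.
function upsertByTwoFields(collection, doc) {
  const existing = collection.find((rec) => matchCount(rec, doc) > 1);
  if (existing) {
    Object.assign(existing, doc); // UPDATE branch
    return "updated";
  }
  collection.push({ ...doc }); // INSERT branch
  return "inserted";
}

// In-memory stand-in for MyCollection (hypothetical data).
const myCollection = [
  { name: "Doc 1", value: 111, description: "description 111" },
];

const r1 = upsertByTwoFields(myCollection, {
  name: "Doc 1", value: 777, description: "description 111",
});
const r2 = upsertByTwoFields(myCollection, {
  name: "Doc 2", value: 888, description: "description 2",
});
console.log(r1, r2, myCollection.length);
```

The linear find over the whole array mirrors the cost noted above: every incoming document triggers a scan of all existing ones.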

Is there any way to exclude null properties when using Groovy's YAMLBuilder?

I'm using Groovy to generate an openapi spec document in YAML. I'm using YamlBuilder to convert the object model to a YAML string.
It's been working well so far, but one issue I've noticed is that null properties are present in the YAML output. This is causing validation errors in the openapi validators I'm using, so I'd like to remove any null properties from the YAML output.
Is there any way to achieve this? I can't see it in the docs. The equivalent JsonBuilder allows config options to be set; is there such a thing for YamlBuilder?
The part of the script which generates the YAML looks like this:
def generateSpec() {
    println "============================\nGenerating Customer SPI spec\n============================"
    def components = generateComponents()
    def paths = generatePaths()
    def info = Info
        .builder()
        .title('Customer SPI')
        .description('A customer API')
        .version('0.1')
        .build()
    def customerSpec = [openapi: "3.0.3", components: components, info: info, paths: paths]
    def yaml = new YamlBuilder()
    yaml(customerSpec)
    println(yaml.toString())
    return yaml.toString()
}
Here's my current output. Note the null value of the format property on firstname, among others.
---
openapi: "3.0.3"
components:
  schemas:
    Customer:
      type: "object"
      properties:
        firstname:
          type: "string"
          format: null
    ArrayOfCustomers:
      items:
        $ref: "#/components/schemas/Customer"
info:
  title: "Customer SPI"
  version: "0.1"
  description: "An API."
paths:
  /customers:
    parameters: null
    get:
      responses:
        "200":
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ArrayOfCustomers"
          description: "An array of customers matching the search criteria"
      summary: "Search customers"
  /customers/{customerRef}:
    parameters:
    - required: true
      schema:
        type: "string"
        format: null
      description: "Customer reference"
      in: "path"
      name: "customerRef"
    get:
      responses:
        "200":
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Customer"
          description: "A customer with the given reference"
      summary: "Load a customer"
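One approach that does not depend on any YamlBuilder option is to strip null properties from the model before handing it to the builder. The sketch below shows the idea in JavaScript for illustration only; stripNulls is a hypothetical helper, not part of Groovy or YamlBuilder. It assumes the model is already plain maps and lists, so typed objects such as Info would first need converting to maps.

```javascript
// Recursively drop null/undefined properties from nested objects and arrays.
// Returns a new structure; the input is left untouched.
function stripNulls(value) {
  if (Array.isArray(value)) {
    return value
      .filter((item) => item !== null && item !== undefined)
      .map(stripNulls);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      if (child === null || child === undefined) continue; // skip null property
      out[key] = stripNulls(child);
    }
    return out;
  }
  return value; // scalars pass through unchanged
}

// Cut-down version of the spec from the question (hypothetical data).
const spec = {
  components: {
    schemas: {
      Customer: {
        type: "object",
        properties: { firstname: { type: "string", format: null } },
      },
    },
  },
  paths: {
    "/customers": { parameters: null, get: { summary: "Search customers" } },
  },
};

const cleaned = stripNulls(spec);
console.log(JSON.stringify(cleaned, null, 2));
```

In Groovy the same recursion could presumably be written over Map and List instances and applied to customerSpec just before the yaml(customerSpec) call.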

How to show from MongoDB

I want to take the information from the first row in my DB and assign it to my solution array
(please see the picture).
I have 3 columns (title, slogan and description),
and I want to assign "my title", "my slogan" and "my description" to my solution array:
const solution = [
  {
    title: "",       // should be "my title"
    slogan: "",      // should be "my slogan"
    description: ""  // should be "my description"
  }
];
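A minimal sketch of the mapping step, assuming the first document has already been fetched. The firstRow object below is a hypothetical stand-in for the result of something like findOne() in the MongoDB Node.js driver or in Mongoose, and toSolution is a made-up helper name.

```javascript
// Hypothetical shape of the first document returned by the database.
const firstRow = {
  _id: 1,
  title: "my title",
  slogan: "my slogan",
  description: "my description",
};

// Pick only the three fields needed and wrap them in an array.
// Returns an empty array when no document was found.
function toSolution(doc) {
  if (!doc) return [];
  const { title, slogan, description } = doc;
  return [{ title, slogan, description }];
}

const solution = toSolution(firstRow);
console.log(solution);
```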

Collections in Hugo data file using Netlify CMS

I'm trying to get my head round using Netlify CMS with Hugo ssg.
I use:
netlify-cms#1.0
hugo#0.29
I have a simple netlify-cms config.yml with two collections: posts and authors.
backend:
  name: github
  repo: sebhewelt/atlas
  branch: master
display_url: https://mypage.com
publish_mode: editorial_workflow
media_folder: "static/uploads"
public_folder: "/uploads"
collections:
  - label: "Posts"
    name: "post"
    folder: "content"
    create: true
    slug: "{{year}}-{{month}}-{{day}}-{{slug}}"
    fields:
      - { label: "Title", name: "title", widget: "string" }
      - { label: "Publish Date", name: "date", widget: "datetime", format: "YYYY-MM-DD hh:mma" }
      - { label: "Body", name: "body", widget: "markdown" }
  - label: "Authors"
    name: "author"
    folder: "data"
    create: true
    fields:
      - { label: "Name", name: "name", widget: "string" }
      - { label: "About", name: "about", widget: "string" }
The docs distinguish two collection types, of which I assume I should choose the file collection, as I'd like to hold the authors data in one file.
I'd like to be able to add authors via the admin dashboard and save them to a file in the data folder. The docs don't provide an example of what the file holding the authors should look like (or does the CMS create it automatically?).
I encounter an error with my current config. When I save the "New Author" entry I get this:
Failed to persist entry: Error: Collection must have a field name that is a valid entry identifier
Why do I get this error?
Your authors file needs to be under a top-level collection. Also, if you want to be able to add multiple authors to the file, you need to wrap the "name" and "about" widgets in a "list" type widget.
Example:
collections:
  - label: "Settings"
    name: "settings"
    files:
      - name: "authors"
        label: "Authors"
        file: "data/authors.yml"
        extension: "yml"
        fields:
          - label: "Author"
            name: "author"
            widget: "list"
            fields:
              - { label: "Name", name: "name", widget: "string" }
              - { label: "About", name: "about", widget: "string" }
CMS docs for file collections: https://www.netlifycms.org/docs/collection-types/#file-collections
CMS docs for list widgets: https://www.netlifycms.org/docs/widgets/#list

node.js+mongoose - How to implement matching (like order matching)?

I have a node.js+mongoose REST API. I have two schemas whose documents need to be matched against each other, either whenever a new entry is added to one of them or on a schedule. The matching compares the whole set of documents against a set of parameters. Let's say, for example, I have the two schemas below:
Males: {
  age: Number,
  location: String,
  language: String,
  matchedFemales: []
}
Females: {
  age: Number,
  location: String,
  language: String,
  matchedMales: []
}
Now I have to take a collection, go through all of its documents and find matches. I have lots of parameters as matching criteria, but as an example, let's say language and location should be the same and the age should be almost equal (plus or minus 1 year). Like below:
Males: [{
  id: 1001,
  age: 20,
  location: "London",
  language: "English",
  matchedFemales: [2001, 2002]
},
{
  id: 1002,
  age: 30,
  location: "London",
  language: "English",
  matchedFemales: []
},
{
  id: 1003,
  age: 20,
  location: "Madrid",
  language: "Spanish",
  matchedFemales: [2003]
}]
Females: [{
  id: 2001,
  age: 20,
  location: "London",
  language: "English",
  matchedMales: [1001]
},
{
  id: 2002,
  age: 19,
  location: "London",
  language: "English",
  matchedMales: [1001]
},
{
  id: 2003,
  age: 20,
  location: "Madrid",
  language: "Spanish",
  matchedMales: [1003]
}]
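The example criteria (same language, same location, age within one year) can be written as a small predicate. The sketch below runs it over in-memory copies of the sample documents; isMatch and matchesByMale are hypothetical names, and a real implementation would query the database rather than filter arrays.

```javascript
// In-memory copies of the sample documents, without the match arrays.
const males = [
  { id: 1001, age: 20, location: "London", language: "English" },
  { id: 1002, age: 30, location: "London", language: "English" },
  { id: 1003, age: 20, location: "Madrid", language: "Spanish" },
];
const females = [
  { id: 2001, age: 20, location: "London", language: "English" },
  { id: 2002, age: 19, location: "London", language: "English" },
  { id: 2003, age: 20, location: "Madrid", language: "Spanish" },
];

// Same language, same location, ages at most one year apart.
function isMatch(a, b) {
  return (
    a.language === b.language &&
    a.location === b.location &&
    Math.abs(a.age - b.age) <= 1
  );
}

// Full comparison of every male against every female: O(males x females).
const matchesByMale = new Map(
  males.map((m) => [
    m.id,
    females.filter((f) => isMatch(m, f)).map((f) => f.id),
  ])
);
console.log(matchesByMale);
```

Since the predicate is symmetric, the female-side match lists are the same pairs read in the other direction.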
How should I perform this matching, and how should I store the matches?
Should I iterate through each document in the Male collection, find its matches in the Female collection and update it? If so, I plan to have a service do it and call this service every X minutes. This job will be time and resource consuming (as it has to go through each document and find matches), but it will run a fixed number of times per day. If it runs every 5 minutes, then it will run only 12 times an hour.
Alternatively, instead of matching all documents on the LHS against all documents on the RHS, I can find matches just for a single document whenever a record is inserted, and update it. This method is less time and resource consuming per run than the previous one, but it will run more often, i.e. on every insert or update.
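For the per-insert variant, the example criteria translate into a single query filter built from the new document. The sketch below only constructs that filter object; buildMatchFilter is a hypothetical helper, and the commented-out Mongoose calls assume models named Male and Female that the question does not define.

```javascript
// Build a MongoDB query filter from a newly inserted document:
// same language and location, age within plus or minus one year.
function buildMatchFilter(person) {
  return {
    language: person.language,
    location: person.location,
    age: { $gte: person.age - 1, $lte: person.age + 1 },
  };
}

// Hypothetical new document.
const newMale = { id: 1004, age: 20, location: "London", language: "English" };
const filter = buildMatchFilter(newMale);
console.log(JSON.stringify(filter));

// Possible use with Mongoose (assumed model names, not run here):
//   const matches = await Female.find(filter);
//   await Female.updateMany(filter, { $addToSet: { matchedMales: newMale.id } });
//   await Male.updateOne(
//     { id: newMale.id },
//     { $addToSet: { matchedFemales: { $each: matches.map((f) => f.id) } } }
//   );
```

With an index covering language, location and age, such a query would likely avoid scanning the whole opposite collection on every insert.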
Or is there any other elegant way to do this?
P.S - If this question seems inappropriate, kindly direct me to the right source for reference or consultation.

Resources