Most elegant way to get two properties at the same time - groovy

Suppose you have hierarchical data and want to obtain the merged values of two separate properties. What is the most elegant, or Groovy, way to do so?
The following example holds information about failed and skipped tests. Of course it makes sense for these values to be kept separate, but for the use case of retrieving a list of all tests that did not run successfully, I came across two possible solutions, neither of which satisfied me.
def junitResultList = [
    [
        name: "Testsuite A",
        children: [
            failedTests: ["Test 1", "Test 2"],
            skippedTests: []
        ]
    ],
    [
        name: "Testsuite B",
        children: [
            failedTests: ["CursorTest"],
            skippedTests: ["ClickTest", "DragNDropTest"]
        ]
    ]
]
To be more specific, I want the value to be ["Test 1", "Test 2", "CursorTest", "ClickTest", "DragNDropTest"].
The first approach was simply to add the two spread property lists:
(junitResultList*.children*.failedTests +
 junitResultList*.children*.skippedTests).flatten()
While this works, specifying the path to these properties twice did not seem like the most Groovy way, so I came up with this horrible but somehow appealing disasterpiece:
(junitResultList*.children*.findAll {
    ['skippedTests', 'failedTests'].contains(it.key)
})*.values().flatten()

You can simplify your initial expression to something like this:
junitResultList.children.collect { it.failedTests + it.skippedTests }.flatten()
or
junitResultList.children.collect { [it.failedTests, it.skippedTests] }.flatten()
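As a side note (my addition, not part of the original answer): collectMany combines the collect and flatten steps into one call, assuming Groovy 1.8.1 or later:
// collectMany concatenates the sublists returned by the closure
junitResultList.children.collectMany { it.failedTests + it.skippedTests }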

You can simply do as below:
// Define the keys to find
def requiredKeys = ['failedTests', 'skippedTests']
println requiredKeys.collect { junitResultList.children."$it" }.flatten()

You can get the subMap()s and then the values() on that:
junitResultList*.children*.subMap(["failedTests","skippedTests"])*.values().flatten()
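For reference, a minimal self-contained script (my addition) that runs this last approach against the data from the question:
def junitResultList = [
    [name: "Testsuite A", children: [failedTests: ["Test 1", "Test 2"], skippedTests: []]],
    [name: "Testsuite B", children: [failedTests: ["CursorTest"], skippedTests: ["ClickTest", "DragNDropTest"]]]
]

// subMap picks both test lists from each children map; flatten merges them
def notRun = junitResultList*.children*.subMap(["failedTests", "skippedTests"])*.values().flatten()
assert notRun == ["Test 1", "Test 2", "CursorTest", "ClickTest", "DragNDropTest"]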

Related

Lowercasing complex object field names in azure data factory data flow

I'm trying to lowercase the field names in a row entry in Azure Data Factory data flow. Inside a complex object I've got something like
{
    "field": "sample",
    "functions": [
        {
            "Name": "asdf",
            "Value": "sdfsd"
        },
        {
            "Name": "dfs",
            "Value": "zxcv"
        }
    ]
}
and basically what I want is for "Name" and "Value" to be "name" and "value". However, I can't seem to find any expression that works for the nested fields of a complex object in the expression builder.
I've tried using something like a select with a rule-based mapping, the rule being 1 == 1 and lower($$), but $$ seems to only work for root columns of the complex object and not for the nested fields inside.
As suggested by Mark Kromer MSFT, to change the case of columns inside a complex type, select the functions at the Hierarchy level.

combine a list of dictionaries with one key value match

listofdicts = [
    {
        "if-e0": "e0",
        "ip-add-e0": "192.168.1.1",
        "name": "host1"
    },
    {
        "if-e1": "e1",
        "ip-add-e1": "192.168.2.1",
        "name": "host1"
    },
    {
        "if-e1": "e1",
        "ip-add-e1": "172.16.1.1",
        "name": "host2"
    },
    {
        "if-e2": "e2",
        "ip-add-e2": "172.16.2.1",
        "name": "host2"
    }
]
Expected Result:
listofdicts = [
    {
        "if-e0": "e0",
        "ip-add-e0": "192.168.1.1",
        "if-e1": "e1",
        "ip-add-e1": "192.168.2.1",
        "name": "host1"
    },
    {
        "if-e1": "e1",
        "ip-add-e1": "172.16.1.1",
        "if-e2": "e2",
        "ip-add-e2": "172.16.2.1",
        "name": "host2"
    }
]
I have been trying to make this work, but no luck yet; the actual list has more than 60K dicts, with unique and matching hosts.
It may have an easy solution, but for me it's been a nightmare for the past few hours.
I appreciate your assistance.
Regards,
Avinash
Graph theory seems to be helpful here.
To solve this, you need to build a graph, where each vertex relates to one dictionary from your input list.
There should be an edge between two vertices if there is a common key-value pair in the corresponding dictionaries; more specifically, for dictionaries d1 and d2 there should be an edge if set(d1.items()).intersection(d2.items()) is non-empty, i.e. if there is at least one key-value pair in the intersection of the item sets of d1 and d2.
After the graph is built, you need to find all the connected components (that's a pretty simple DFS (depth-first search); you can google it if you're not familiar with graph algorithms). Each component's dictionaries should be combined into one: there should be one resulting dictionary per component. The list of these resulting dictionaries is your answer.
Here is an example of how you combine the dictionaries of one connectivity component:
connectivity_component_dicts = [{...}, ...]

resulting_dict = {}
for d in connectivity_component_dicts:
    # The later `d` appears in `connectivity_component_dicts`,
    # the higher priority its keys have when dicts share keys.
    resulting_dict.update(d)
@Kolay.Ne Hi, hey guys,
It did work with a very basic catch. The graph method is fantastic for solving it, although I used the approach below and that worked:
import copy

# Note: this mutates listofdicts while iterating over it; the index
# checks below keep the loop in bounds as items are removed.
for d in listofdicts:
    x = listofdicts.index(d)
    for y in range(len(listofdicts)):
        k = 'name'
        if y != x and y < len(listofdicts):
            if listofdicts[x][k] == listofdicts[y][k]:
                dc = copy.deepcopy(listofdicts[y])
                listofdicts[x].update(dc)
                listofdicts.remove(dc)
There could be other approaches to solve it; I'm sure the Pythonic way would be just a couple of lines. This solved my problem for the job at hand.
Thank you to Kolay.Ne for responding quickly and trying to assist. The graph method is fantastic as well; it requires professional coding and will surely be more scalable.
a = []
for i in listofdicts:
    # Prints only the first dict seen for each name; it does not merge them
    if i["name"] not in a:
        a.append(i["name"])
        print(i)

How to use UNWIND to execute block composed of a MATCH and two FOREACHs?

I'm running neo4j queries from node.js using the neo4j-driver. A lot of things were simplified to cut irrelevant information, but what is needed is here.
I have been trying to make a query to ingest a data set with some quirks, defined as follows:
Curriculum: A list of Publications
Publication: Contains data about a publication and a field that is a list of Authors
Author: Relevant fields are externalId and normalizedFullName.
externalId is an id that comes from the data's origin system. It is not guaranteed to be present, but if it is, it will uniquely identify a node
normalizedFullName will always be present and it's ok to assume the same author will always have the same name wherever it appears; it is also acceptable that full name may not be unique and that at some point two different persons may be stored as the same node
It is possible for an author to be part of one publication with only their normalizedFullName and part of another with normalizedFullName AND externalId. As you can see, it is not very consistent data, but this is not a problem for the ends I need it for.
It will look like this (don't mind any syntax errors):
"curriculum": [
{
"data": {
"fieldA": "a",
"fieldB": "b"
},
"authors": [
{
"externalId": "",
"normalizedFullName": "namea namea"
},
{
"externalId": "123456",
"normalizedFullName": "nameb nameb"
}
]
},
{
"data": {
"fieldA": "d",
"fieldB": "e"
},
"authors": [
{
"externalId": "123321",
"normalizedFullName": "namea namea"
},
{
"externalId": "123456",
"normalizedFullName": "nameb nameb"
}
]
}
]
Merging everything
Merging the publication part is trivial, but things get complicated when it comes to the authors, since I have to follow this logic (simplified here) to merge an author:
IF the author doesn't have an externalId OR no node was created with their externalId THEN
    merge by normalizedFullName
ELSE IF there is already a node with this externalId THEN
    merge by externalId
So, acknowledging that I would need some kind of conditional merge, and finding that it could be achieved by "the foreach trick", I was able to come up with this little monster (comments added to clarify):
// For each publication, merge it
UNWIND {publications} as publication
MERGE (p:Publication { fieldA: publication.data.fieldA, fieldB: publication.data.fieldB })
ON CREATE SET p = publication.data
WITH p, publication.authors AS authors
// Then, for each author in this publication
UNWIND authors AS author
// IF the author doesn't have an externalId OR no node was created with their externalId THEN
MATCH (a:Author) WHERE a.externalId = author.data.externalId AND a.externalId <> '' WITH count(a) as found, author, p
// Merge by name
FOREACH(ignoreMe IN CASE WHEN found = 0 THEN [1] ELSE [] END |
    MERGE (aa:Author { normalizedFullName: author.data.normalizedFullName })
    ON CREATE SET aa = author.data
    MERGE (aa)-[:CONTRIBUTED]->(p)
)
// Else, merge by externalId
FOREACH(ignoreMe IN CASE WHEN found > 0 THEN [1] ELSE [] END |
    MERGE (aa:Author { externalId: author.data.externalId })
    ON CREATE SET aa = author.data
    MERGE (aa)-[:CONTRIBUTED]->(p)
)
Note: this is not the real query I'm using; it just shows the exact structures.
The Problem
It doesn't work. It only creates the publications (correctly) and never the authors. It seems the MATCH, the FOREACH, or a combination of both is messing up the loop I expected the UNWIND to produce.
I'm at a point where I can't find a way to do it properly, and I can't find what is wrong, even after checking the available documentation.
So, what do I do?
(let me know if anymore information is needed)
Thanks in advance for any insight!
First of all: author.data.externalId does not exist. The right property path is author.externalId (without data). The same goes for author.data.normalizedFullName.
I simulated your scenario here, putting your data set as a parameter in the Neo4j browser interface, and then ran your query. As expected, the authors are never created.
I corrected your query with these steps:
1. Changed author.data.externalId to author.externalId and author.data.normalizedFullName to author.normalizedFullName.
2. Changed MATCH (a:Author) to OPTIONAL MATCH (a:Author) to ensure that the query continues even when no results are found.
3. Removed count(a) as found (not necessary) and changed the tests from found = 0 to a IS NULL and from found > 0 to a IS NOT NULL.
Your corrected query:
UNWIND {publications} as publication
MERGE (p:Publication { fieldA: publication.data.fieldA, fieldB: publication.data.fieldB })
ON CREATE SET p = publication.data
WITH p, publication.authors AS authors
UNWIND authors AS author
OPTIONAL MATCH (a:Author) WHERE a.externalId = author.externalId AND a.externalId <> '' WITH a, author, p
FOREACH(ignoreMe IN CASE WHEN a IS NULL THEN [1] ELSE [] END |
    MERGE (aa:Author { normalizedFullName: author.normalizedFullName })
    ON CREATE SET aa = author
    MERGE (aa)-[:CONTRIBUTED]->(p)
)
FOREACH(ignoreMe IN CASE WHEN a IS NOT NULL THEN [1] ELSE [] END |
    MERGE (aa:Author { externalId: author.externalId })
    ON CREATE SET aa = author
    MERGE (aa)-[:CONTRIBUTED]->(p)
)
I think the problem (or at least one problem) is that if your author MATCH fails, the entire row for that author will be wiped out, and the rest of the query will not execute for that author.
Try using OPTIONAL MATCH instead, that will preserve the row and allow the query to finish for those rows.
As for additional options on how to do conditional cypher operations, we actually just released new versions of APOC Procedures with conditional cypher execution, so take a look at apoc.do.when() when you get the chance.

Accessing and looping though nested Json in Groovy

So I have been stuck with this problem for quite some time now.
I'm working with some data provided as JSON and retrieved via WSLITE using Groovy.
So far, handling the JSON structure and finding data was no problem, since the webservice response provided by WSLITE is returned as an instance of JsonSlurper.
I now came across some cases where the structure of that JSON response varies depending on what I query for.
Structure looks like this:
{
    "result": [
        ...
        "someId": [...]
        "somethingElse": [...]
        ...
        "variants": {
            "0123": {
                "description": "somethingsomething",
                ...,
                ...,
                "subVariant": {
                    "1234001": {
                        "name": "somename",
                        ...
                    }
                }
            },
            "4567": {
                "description": "somethingsomething",
                ...,
                ...,
                "subVariant": {
                    "4567001": {
                        "name": "somename",
                        ...
                        ...
                    }
                }
            }
        }
    ]
}
As you can see, the variants node is an object holding another nested object, which holds even more nested objects.
The problem here is that the variants node sometimes holds more than one object, but always at least one. The same goes for the subVariant node.
Furthermore, I do not know the names of the (numeric) nodes in advance, hence I have to build the path dynamically.
Since JSON is handled as maps, lists and arrays in Groovy, I thought I was able to do this:
def js = new JsonSlurper().parseText(jsonResponse)
println js.result.variants[0].subVariant[0].name // js.result.variants.getClass() prints class java.util.ArrayList
but that returns null
while accessing the data statically like this:
println "js.result.variants."0123".subVariant."1234001".name
works just fine.
Just printing the variants like this:
println js.result.variants
also works fine. It prints the first (and only) content of the whole variants tree.
My question is: why is that?
It kind of seems like I cannot address nested objects by index... am I right? How else would I go about that?
Thanks in advance.
Well, the reason you can't access it using [0] is simply that "variants": { ... isn't an array; an array would look like this: "variants": [ ...
So your variants is an object with a variable number of attributes, which is basically a named map.
However, if you don't know the entry names, you can iterate through them using .each. Since result is an array, js.result.variants yields a list of variants maps, so you could do something like js.result.variants.each { variants -> variants.each { key, val -> println val.description } }
And something similar for the subVariant.
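To make this concrete, here is a minimal self-contained sketch (my addition; the sample values are invented stand-ins for the elided fields):
import groovy.json.JsonSlurper

def jsonResponse = '''
{
    "result": [{
        "variants": {
            "0123": {
                "description": "first variant",
                "subVariant": { "1234001": { "name": "somename" } }
            },
            "4567": {
                "description": "second variant",
                "subVariant": { "4567001": { "name": "othername" } }
            }
        }
    }]
}
'''

def js = new JsonSlurper().parseText(jsonResponse)

// result is a JSON array, so js.result.variants is a list with one variants map per result entry
js.result.variants.each { variants ->
    variants.each { key, variant ->
        println "variant ${key}: ${variant.description}"
        // subVariant is again a named map, so iterate it the same way
        variant.subVariant.each { subKey, subVariant ->
            println "  subVariant ${subKey}: ${subVariant.name}"
        }
    }
}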

How to update embedded document?

How to update the text of second comment to "new content"
{
    name: 'Me',
    comments: [
        {
            "author": "Joe S.",
            "text": "I'm Thirsty"
        },
        {
            "author": "Adder K.",
            "text": "old content"
        }
    ]
}
Updating the embedded array basically involves two steps:
1.
You create a modified version of the whole array. There are multiple operations that you can use to modify an array, and they are listed here: http://www.rethinkdb.com/api/#js:document_manipulation-insert_at
In your example, if you know that the document that you want to update is the second element of the array, you would write something like
oldArray.changeAt(1, oldArray.nth(1).merge({text: "new content"}))
to generate the new array. The 1 here is the index of the second element, as indexes start at 0. If you do not know the index, you can use the indexesOf function to search for a specific entry in the array. Multiple things are happening here: changeAt replaces an element of the array. Here, the element at index 1 is replaced by the result of oldArray.nth(1).merge({text: "new content"}). In that value, we first pick the element that we want to base our new element on, by using oldArray.nth(1). This gives us the JSON object
{
    "author": "Adder K.",
    "text": "old content"
}
By using merge, we can replace the text field of this object by the new value.
2.
Now that we can construct the new object, we still have to actually store it in the original row. For this, we use update and just set the "comments" field to the new array. We can access the value of the old array in the row through the ReQL r.row variable. Overall, the query will look as follows:
r.table(...).get(...).update({
    comments: r.row('comments').changeAt(1,
        r.row('comments').nth(1).merge({text: "new content"}))
}).run(conn, callback)
Daniel's solution is correct. However, there are several open issues on Github for planned enhancements, including:
generic object and array modification (https://github.com/rethinkdb/rethinkdb/issues/895)
being able to specify optional arguments for merge and update (https://github.com/rethinkdb/rethinkdb/issues/872)
being able to specify a conflict resolution function for merge (https://github.com/rethinkdb/rethinkdb/issues/873)
...among other related issues. Until those are introduced into ReQL (particularly #895), Daniel's approach is the correct one.
