listofdicts = [
    {
        "if-e0": "e0",
        "ip-add-e0": "192.168.1.1",
        "name": "host1"
    },
    {
        "if-e1": "e1",
        "ip-add-e1": "192.168.2.1",
        "name": "host1"
    },
    {
        "if-e1": "e1",
        "ip-add-e1": "172.16.1.1",
        "name": "host2"
    },
    {
        "if-e2": "e2",
        "ip-add-e2": "172.16.2.1",
        "name": "host2"
    }
]
Expected Result:
listofdicts = [
    {
        "if-e0": "e0",
        "ip-add-e0": "192.168.1.1",
        "if-e1": "e1",
        "ip-add-e1": "192.168.2.1",
        "name": "host1"
    },
    {
        "if-e1": "e1",
        "ip-add-e1": "172.16.1.1",
        "if-e2": "e2",
        "ip-add-e2": "172.16.2.1",
        "name": "host2"
    }
]
I have been trying to make this work, but no luck yet; the actual list has more than 60K dicts, with unique and matching hosts.
It could be easy to solve, but for me it has been a nightmare for the past few hours.
Appreciate your assistance.
Regards,
Avinash
Graph theory seems to be helpful here.
To solve this, you need to build a graph, where each vertex relates to one dictionary from your input list.
There should be an edge between two vertices if the corresponding dictionaries share a key-value pair. More specifically, for dictionaries d1 and d2 there should be an edge if len(set(d1.items()).intersection(d2.items())) != 0 or, simpler, if set(d1.items()).intersection(d2.items()) is non-empty; the condition means there is at least one key-value pair in the intersection of the sets of items of d1 and d2.
After the graph is built, you need to find all the connected components (that's a pretty simple DFS (depth-first search); you can google it if you're not familiar with graph algorithms). Each component's dictionaries should be combined into one: there should be one resulting dictionary per component. The list of these resulting dictionaries is your answer.
Here is an example of how you combine some dictionaries:
connectivity_component_dicts = [{...}, ...]
resulting_dict = {k: v for d in connectivity_component_dicts for k, v in d.items()}
# Note that the greater the index of `d` in `connectivity_component_dicts`,
# the higher priority its keys have if several dicts contain the same key.
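Putting it all together, a minimal sketch of the approach in plain Python (the function name and the O(n^2) pairwise edge building are illustrative choices; for 60K dicts you would want to index the items to find neighbours faster):

def merge_by_shared_items(dicts):
    n = len(dicts)
    item_sets = [set(d.items()) for d in dicts]
    # Build the graph: an edge wherever two dicts share a key-value pair.
    adjacent = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if item_sets[i] & item_sets[j]:
                adjacent[i].append(j)
                adjacent[j].append(i)
    # DFS over the graph, merging each connected component into one dict.
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, merged = [start], {}
        while stack:
            v = stack.pop()
            merged.update(dicts[v])
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(merged)
    return result

For the sample data above, merge_by_shared_items(listofdicts) returns the two merged host dictionaries shown in the expected result.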
@Kolay.Ne Hi, hey guys,
It did work, with one very basic catch. The graph method is a fantastic way to solve it, although I used the approach below and that worked:
import copy

x = 0
while x < len(listofdicts):
    y = x + 1
    while y < len(listofdicts):
        if listofdicts[x]["name"] == listofdicts[y]["name"]:
            # Merge the duplicate into the first occurrence, then drop it.
            listofdicts[x].update(copy.deepcopy(listofdicts[y]))
            del listofdicts[y]  # don't advance y: the next item shifted into this slot
        else:
            y += 1
    x += 1
There could be other approaches to solve it; I'm sure the Pythonic way would be just a couple of lines, but this solved my problem for the job at hand.
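For reference, such a couple-of-lines version (a sketch that assumes, like the loop above, that merging on the "name" key alone is enough) could be:

merged = {}
for d in listofdicts:
    merged.setdefault(d["name"], {}).update(d)
listofdicts = list(merged.values())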
Thank you to Kolay.Ne for responding quickly and trying to assist. The graph method is fantastic as well; it requires more professional coding, and for sure it will be more scalable.
# Note: this only prints the first dict seen for each name; it does not merge them.
a = []
for i in listofdicts:
    if i["name"] not in a:
        a.append(i["name"])
        print(i)
I'm trying to lowercase the field names in a row entry in Azure Data Flow. Inside a complex object I've got something like
{
    "field": "sample",
    "functions": [
        {
            "Name": "asdf",
            "Value": "sdfsd"
        },
        {
            "Name": "dfs",
            "Value": "zxcv"
        }
    ]
}
and basically what I want is for "Name" and "Value" to be "name" and "value". However, I can't seem to use any expressions that will work for the nested fields of a complex object in the expression builder.
I've tried using something like a select with a rule-based mapping, the rule being 1 == 1 and lower($$), but with $$ it seems to only work for root columns of the complex object and not the nested fields inside.
As suggested by @Mark Kromer MSFT, to change the case of columns inside a complex type, select the functions at the Hierarchy level.
I have the following snippets in my configuration; the idea is to migrate the current logic/syntax from 0.11 to 0.12. First, I am creating a map from two lists:
my_vars = zipmap(
  var.foo_vars,
  flatten(data.terraform_remote_state.foo.*.outputs.some_id)
)
Then iterate over it to produce some key value pairs.
...
"var": [for key in keys(local.my_vars) :
  {
    name  = key
    value = lookup(local.my_vars, key)
  }
],
...
And here is the relevant tfvars configuration.
foo_vars = [
  "A",
  "B",
  "C"
]
The problem is that this logic doesn't seem to preserve order, and I can't figure out a good way to make that happen. From what I understand, once you turn the lists into a map with zipmap, the order is recalculated. Is there anything that can be done to have the original order preserved?
I'm not tied to the current solution, so maybe there is a way to generate the key/values that doesn't require a map to be created first and can be done instead with only the two lists?
~ foo = [
{
name = "A"
value = "1"
},
- {
- name = "B"
- value = "2"
},
{
name = "C"
value = "3"
},
+ {
+ name = "B"
+ valueFrom = "2"
},
]
The important thing here is that, as you've noticed, Terraform's map type is an unordered map which identifies elements only by their keys, not by position. Therefore if you have a situation where you need to preserve the order of a sequence, then a map is not a suitable data structure to use.
I have a suspicion that keeping things ordered may not actually be necessary to solve your underlying problem here, but I can't tell from the information you've shared what the real-world meaning of all of these values is, so I'm going to answer on the assumption that you do need to preserve the order. If you are working with ordered sequences only because you are creating multiple instances of a resource using count, I'd suggest that you consider using resource for_each instead, which may allow you to solve your underlying problem in a way that is not sensitive to the order of items in var.foo_vars.
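That direction might look something like this (a sketch with a placeholder resource type, assuming each name alone identifies an instance):

resource "example_thing" "this" {
  # Hypothetical resource; for_each keys instances by name instead of position.
  for_each = toset(var.foo_vars)

  name = each.value
}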
Given two lists of the same length, you can produce a new list that combines the corresponding elements from each list by writing a for expression like this:
locals {
  my_vars = [
    for i, some_id in data.terraform_remote_state.foo.*.outputs.some_id : {
      name  = var.foo_vars[i]
      value = some_id
    }
  ]
}
The above relies on the fact that the index i of each element in one list correlates with the element at the same index in the other list, so we can use the i from the data source instances to access the corresponding element of var.foo_vars.
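With the example foo_vars above, and borrowing the some_id values from the plan output for illustration, local.my_vars then evaluates to an ordered list like:

[
  { name = "A", value = "1" },
  { name = "B", value = "2" },
  { name = "C", value = "3" },
]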
Suppose you have hierarchical data and want to obtain the merged value of separate properties: what is the most elegant, or groovy, way to do so?
The following example holds information about failed and skipped tests. Of course, it makes sense that these values are separated, but for the use case of retrieving a list of all tests that did not run successfully, I came across two possible solutions, neither of which satisfied me.
def junitResultList = [
    [
        name: "Testsuite A",
        children: [
            failedTests: ["Test 1", "Test 2"],
            skippedTests: []
        ]
    ],
    [
        name: "Testsuite B",
        children: [
            failedTests: ["CursorTest"],
            skippedTests: ["ClickTest", "DragNDropTest"]
        ]
    ]
]
To be more specific, I want the value to be ["Test 1", "Test 2", "CursorTest", "ClickTest", "DragNDropTest"].
The first approach was simply to perform an addition of the spread test lists:
(junitResultList*.children*.failedTests +
    junitResultList*.children*.skippedTests).flatten()
While this works, specifying the path to these properties twice did not seem like the most groovy way, so I came up with this horrible but somehow appealing disasterpiece:
(junitResultList*.children*.findAll {
    ['skippedTests', 'failedTests'].contains(it.key)
})*.values().flatten()
You can simplify your initial expression to something like this:
junitResultList.children.collect { it.failedTests + it.skippedTests }.flatten()
or
junitResultList.children.collect { [it.failedTests, it.skippedTests] }.flatten()
You can just do as below:
// Define the keys to find
def requiredKeys = ['failedTests', 'skippedTests']
println requiredKeys.collect { junitResultList.children."$it" }.flatten()
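For the sample data this prints [Test 1, Test 2, CursorTest, ClickTest, DragNDropTest], which matches the expected value.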
You can get the subMap()s and then the values() on that:
junitResultList*.children*.subMap(["failedTests","skippedTests"])*.values().flatten()
How can I get all the possible paths between two vertices (e.g. X and Y) with maxDepth = 2?
I tried with TRAVERSAL, but it takes around 10 seconds to execute. Here is the query:
FOR p IN TRAVERSAL(locations, connections, "X", "outbound", { minDepth: 1, maxDepth: 2, paths: true })
    FILTER p.destination._key == "Y"
    RETURN p.path.vertices[*].name
The locations (vertices) collection has 23753 documents, and the connections (edges) collection has 123414 documents.
You can speed up the query a lot if you put the filter for the destination right into the TRAVERSAL via the option filterVertices, which takes examples of the vertices that should be matched by the traversal. With vertexFilterMethod you can define what should happen to all vertices that do not match the example.
In your query you only want to match the target vertex "Y"; all other vertices should be passed through but not included in the result, hence exclude.
This makes the later FILTER obsolete.
Right now the internal optimizer is not able to do that automagically but this magic is on our roadmap.
This is a query containing the optimization:
FOR p IN TRAVERSAL(locations, connections, "X", "outbound", { minDepth: 1, maxDepth: 2, paths: true, filterVertices: [{_key: "Y"}], vertexFilterMethod: ["exclude"]})
    RETURN p.path.vertices[*].name
Whenever I try to create a new graph with 700,000 to 2 million edges, it takes a long time. Thanks to the great new feature in the API,
/_api/query/current
I observed that the graph creation possibly triggers some kind of automatic cache loading, but twice?
[
    {
        "id": "70",
        "query": "FOR x IN GRAPH_VERTICES(#graph, {}) SORT RAND() LIMIT #limit RETURN x",
        "started": "2015-03-31T19:06:59Z",
        "runTime": 41.95919394493103
    },
    {
        "id": "71",
        "query": "FOR x IN GRAPH_VERTICES(#graph, {}) SORT RAND() LIMIT #limit RETURN x",
        "started": "2015-03-31T19:06:59Z",
        "runTime": 41.95719385147095
    }
]
Is this correct? Is there a more efficient way?
Thanks in advance!
The graph viewer issued the mentioned RAND() query two times:
- one instance is fired to determine a random vertex from the graph
- the other instance is fired to determine the attributes of some random vertices of the graph, in order to populate the search input field
The AQL query that was used by the graph viewer was inefficient. It built a big list, sorted it randomly, and returned 1 (first query) or 10 (second query) documents from it. This has been fixed in commit c28575f202a58d5c93e6c36883effda48c2a7159, so it's much more efficient now.
The fix will be included in the next build (i.e. 2.5.2).