Delete duplicate dictionaries inside a list by key - python-3.x

I'm new to Python. I have the following list of dictionaries:
l = [{'id': 2, 'source_id': 100},
     {'id': 1, 'source_id': 100},
     {'id': 3, 'source_id': 1234},
     {'id': 5, 'source_id': 200},
     {'id': 4, 'source_id': 200}]
And I want to get a result like:
l = [{'id': 1, 'source_id': 100},
     {'id': 3, 'source_id': 1234},
     {'id': 4, 'source_id': 200}]
I understand that the first step is sorting the list:
sorted_sources_list = sorted(l, key=lambda source: source['id'])
But I don't know how to delete the duplicates with the greater id, so that only the entry with the lowest id remains for each source_id.

You can iterate through the items and add each one to a new, initially empty list, but before adding it, check whether an item with the same key already exists in the new list. If it does, the item is a duplicate and can be skipped; if it doesn't, add it.
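A minimal sketch of that idea, adapted to the question: it tracks the source_id values already kept instead of comparing whole items, and it visits the items in ascending id order so that the kept entry is the one with the lowest id. The names `seen` and `unique` are illustrative and do not come from the original post.

```python
l = [{'id': 2, 'source_id': 100},
     {'id': 1, 'source_id': 100},
     {'id': 3, 'source_id': 1234},
     {'id': 5, 'source_id': 200},
     {'id': 4, 'source_id': 200}]

seen = set()   # source_id values that already have an entry in `unique`
unique = []    # de-duplicated result, in ascending id order

# Visiting items in ascending id order means the first item met
# for each source_id is the one with the lowest id.
for item in sorted(l, key=lambda source: source['id']):
    if item['source_id'] not in seen:
        seen.add(item['source_id'])
        unique.append(item)

print(unique)
```

Using a set for the membership check keeps each lookup cheap, which matters more than the extra variable once the list grows.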

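Try:
from itertools import groupby
get_id = lambda x: x.get("source_id")
# Sort by (source_id, id) so each group starts with its lowest id, then keep that first item.
l = [next(g) for _, g in groupby(sorted(l, key=lambda x: (get_id(x), x["id"])), get_id)]
Outputs:
[{'id': 1, 'source_id': 100}, {'id': 4, 'source_id': 200}, {'id': 3, 'source_id': 1234}]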

Related

python 3 - match and append dicts based on key

Given a very large list, called tsv_data, which resembles:
{'id':1,'name':'bob','size':2},
{'id':2,'name':'bob','size':3},
{'id':3,'name':'sarah','size':2},
{'id':4,'name':'sarah','size':2},
{'id':5,'name':'sarah','size':3},
{'id':6,'name':'sarah','size':3},
{'id':7,'name':'jack','size':5},
And a separate set of all the unique name strings therein, called names:
{'bob','sarah','jack'}
The aim is to produce the following data structure:
[
    {'name':'bob','children':
        [
            {'id':1,'size':2},
            {'id':2,'size':3}
        ]
    },
    {'name':'sarah','children':
        [
            {'id':3,'size':2},
            {'id':4,'size':2},
            {'id':5,'size':3},
            {'id':6,'size':3}
        ]
    },
    {'name':'jack','children':
        [
            {'id':7,'size':5}
        ]
    }
]
I find it challenging to write a for loop that restructures the data, because each name has a different number of children.
Is there a Python solution that is robust to the number of items per name? Please demonstrate, thanks.

Here is a straightforward solution.
tsv_data = [
{'id':1,'name':'bob','size':2},
{'id':2,'name':'bob','size':3},
{'id':3,'name':'sarah','size':2},
{'id':4,'name':'sarah','size':2},
{'id':5,'name':'sarah','size':3},
{'id':6,'name':'sarah','size':3},
{'id':7,'name':'jack','size':5}
]
names = {'bob','sarah','jack'}
expected_keys = ('id', 'size')
result = []
for name in names:
    result.append({'name': name,
                   'children': [{k: v for k, v in d.items() if k in expected_keys}
                                for d in tsv_data if d.get('name') == name]})
# result:
# [{'name': 'sarah',
# 'children': [{'id': 3, 'size': 2},
# {'id': 4, 'size': 2},
# {'id': 5, 'size': 3},
# {'id': 6, 'size': 3}]},
# {'name': 'bob', 'children': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}]},
# {'name': 'jack', 'children': [{'id': 7, 'size': 5}]}]
This solution iterates over the whole of tsv_data once for each name. If tsv_data or names is large and speed matters, you could first build a dictionary that maps each name to its subset of tsv_data, so that the data is scanned only once.
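The single-pass variant mentioned above could look like the following sketch. It assumes every record has a 'name' key, and `children_by_name` is an illustrative name that is not part of the original answer.

```python
from collections import defaultdict

tsv_data = [
    {'id': 1, 'name': 'bob', 'size': 2},
    {'id': 2, 'name': 'bob', 'size': 3},
    {'id': 3, 'name': 'sarah', 'size': 2},
    {'id': 4, 'name': 'sarah', 'size': 2},
    {'id': 5, 'name': 'sarah', 'size': 3},
    {'id': 6, 'name': 'sarah', 'size': 3},
    {'id': 7, 'name': 'jack', 'size': 5},
]
expected_keys = ('id', 'size')

# Map each name to the list of its children, scanning tsv_data only once.
children_by_name = defaultdict(list)
for d in tsv_data:
    child = {k: d[k] for k in expected_keys if k in d}
    children_by_name[d['name']].append(child)

# Convert the mapping into the requested list-of-dicts shape.
result = [{'name': name, 'children': children}
          for name, children in children_by_name.items()]

print(result)
```

Because dicts preserve insertion order in Python 3.7 and later, the names appear in the order they are first met in tsv_data, and the separate names set is no longer needed.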

Python 3 - dynamically nest dicts into lists given unknown number of categories

Variable tsv_data has the following structure:
[
{'id':1,'name':'bob','type':'blue','size':2},
{'id':2,'name':'bob','type':'blue','size':3},
{'id':3,'name':'bob','type':'blue','size':4},
{'id':4,'name':'bob','type':'red','size':2},
{'id':5,'name':'sarah','type':'blue','size':2},
{'id':6,'name':'sarah','type':'blue','size':3},
{'id':7,'name':'sarah','type':'green','size':2},
{'id':8,'name':'jack','type':'blue','size':5},
]
Which I would like to restructure into:
[
    {'name':'bob', 'children':[
        {'name':'blue','children':[
            {'id':1, 'size':2},
            {'id':2, 'size':3},
            {'id':3, 'size':4}
        ]},
        {'name':'red','children':[
            {'id':4, 'size':2}
        ]}
    ]},
    {'name':'sarah', 'children':[
        {'name':'blue','children':[
            {'id':5, 'size':2},
            {'id':6, 'size':3},
        ]},
        {'name':'green','children':[
            {'id':7, 'size':2}
        ]}
    ]},
    {'name':'jack', 'children':[
        {'name':'blue', 'children':[
            {'id':8, 'size':5}
        ]}
    ]}
]
What is blocking me is that I don't know how many items will be in the children list for each major category. Similarly, I don't know which types will be present: it could be blue, green, or red, all three or any combination (for example, only red and green, or only green).
Question
How might we devise a fool-proof way to compile the flat list of dicts in tsv_data into a multi-tier hierarchical data structure like the one above?

Given your major categories as a list:
categories = ['name', 'type']
You can first transform the input data into a nested dict of lists, which makes it easier and more efficient to access children by key than in your desired output format, a nested list of dicts:
tree = {}
for record in tsv_data:
    node = tree
    # pop() removes the category keys from each record in place
    for category in categories[:-1]:
        node = node.setdefault(record.pop(category), {})
    node.setdefault(record.pop(categories[-1]), []).append(record)
tree would become:
{'bob': {'blue': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}, {'id': 3, 'size': 4}], 'red': [{'id': 4, 'size': 2}]}, 'sarah': {'blue': [{'id': 5, 'size': 2}, {'id': 6, 'size': 3}], 'green': [{'id': 7, 'size': 2}]}, 'jack': {'blue': [{'id': 8, 'size': 5}]}}
You can then transform the nested dict to your desired output structure with a recursive function:
def transform(node):
    if isinstance(node, dict):
        return [
            {'name': name, 'children': transform(child)}
            for name, child in node.items()
        ]
    return node
so that transform(tree) would return:
[{'name': 'bob', 'children': [{'name': 'blue', 'children': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}, {'id': 3, 'size': 4}]}, {'name': 'red', 'children': [{'id': 4, 'size': 2}]}]}, {'name': 'sarah', 'children': [{'name': 'blue', 'children': [{'id': 5, 'size': 2}, {'id': 6, 'size': 3}]}, {'name': 'green', 'children': [{'id': 7, 'size': 2}]}]}, {'name': 'jack', 'children': [{'name': 'blue', 'children': [{'id': 8, 'size': 5}]}]}]
Demo: https://replit.com/#blhsing/NotableCourageousTranslations
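One thing to be aware of is that record.pop removes the category keys from the dicts in tsv_data itself. If the input needs to stay unchanged, a variant along these lines may help. It is only a sketch: `build_tree` and `sample` are hypothetical names, and the sample is a shortened version of the question's data.

```python
def build_tree(records, categories):
    """Group records into nested dicts keyed by each category in turn."""
    tree = {}
    for record in records:
        node = tree
        # Walk (or create) one dict level per category except the last.
        for category in categories[:-1]:
            node = node.setdefault(record[category], {})
        # Copy only the non-category fields, leaving the input record intact.
        leaf = {k: v for k, v in record.items() if k not in categories}
        node.setdefault(record[categories[-1]], []).append(leaf)
    return tree


sample = [
    {'id': 1, 'name': 'bob', 'type': 'blue', 'size': 2},
    {'id': 4, 'name': 'bob', 'type': 'red', 'size': 2},
    {'id': 8, 'name': 'jack', 'type': 'blue', 'size': 5},
]
tree = build_tree(sample, ['name', 'type'])
print(tree)
```

The resulting nested dict has the same shape as the one built with pop, so it can be passed to the same recursive transform step.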

python converting a List of Tuples into a Dict with external keys

I have a list of tuples like this
[
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e')
]
The result should be a list of dictionaries
[
{'id': 1, 'label': 'a'},
{'id': 2, 'label': 'b'},
{'id': 3, 'label': 'c'},
{'id': 4, 'label': 'd'},
{'id': 5, 'label': 'e'}
]
I'm using python 3.6
The first value in every tuple is named as 'id' and the second value in every tuple is named as 'label'.
I want to get the above result without using a loop, since the data will be huge.
Is there any built-in method to achieve my result?

Using map
Ex:
data = [
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e')
]
keys = ["id", 'label']
print(list(map(lambda x: dict(zip(keys, x)), data)))
#List comprehension
#print([dict(zip(keys, i)) for i in data])
Output:
[{'id': 1, 'label': 'a'},
{'id': 2, 'label': 'b'},
{'id': 3, 'label': 'c'},
{'id': 4, 'label': 'd'},
{'id': 5, 'label': 'e'}]

A simple comprehension will do:
l = [
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e')
]
results = [{"id" : id, "label": label} for id, label in l]
If your data is too big you can use a generator (so it will be lazily evaluated):
results = ({"id" : id, "label": label} for id, label in l)
Or use map:
results = map(lambda x: {"id" : x[0], "label": x[1]}, l)
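To illustrate what lazy evaluation means here, the sketch below consumes the generator in two steps with itertools.islice. The names `lazy_results`, `first_two`, and `rest` are illustrative only.

```python
from itertools import islice

l = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]

# No dictionaries are built yet; each one is created only when requested.
lazy_results = ({"id": id_, "label": label} for id_, label in l)

# Take just the first two items; the remaining tuples are not processed yet.
first_two = list(islice(lazy_results, 2))
print(first_two)

# A generator can be consumed only once, so this yields the remaining three.
rest = list(lazy_results)
print(len(rest))
```

The same single-pass behaviour applies to the map object in Python 3, so wrap either one in list() if the results need to be reused.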

import itertools as it  # part of the standard library, no installation needed
ids = []  # resulting list of dicts
a = [
    (1, 'a'),  # input
    (2, 'b'),
    (3, 'c'),
    (4, 'd'),
    (5, 'e')
]
for i, j in a:
    di = ("id", i, "label", j)
    # Pair up consecutive elements: ("id", i), ("label", j)
    ids.append(dict(it.zip_longest(*[iter(di)] * 2, fillvalue="")))
print(ids)
Output:
[{'id': 1, 'label': 'a'}, {'id': 2, 'label': 'b'}, {'id': 3, 'label': 'c'}, {'id': 4, 'label': 'd'}, {'id': 5, 'label': 'e'}]

how to make collection of list from dict having same id?

[{'id': 6, 'name': 'Jorge'}, {'id': 6, 'name': 'Matthews'}, {'id': 6, 'name': 'Matthews'}, {'id': 7, 'name': 'Christine'}, {'id': 7, 'name': 'Smith'}, {'id': 7, 'name': 'Chris'}]
And I want to collect the names of the entries that share the same id into a list, like this:
[{'id': 6, 'name': ['Jorge','Matthews','Matthews']}, {'id': 7, 'name': ['Christine','Smith','Chris']}]

L = [{'id': 6, 'name': 'Jorge'}, {'id': 6, 'name': 'Matthews'}, {'id': 6, 'name': 'Matthews'}, {'id': 7, 'name': 'Christine'}, {'id': 7, 'name': 'Smith'}, {'id': 7, 'name': 'Chris'}]
temp = {}
for d in L:
    if d['id'] not in temp:
        temp[d['id']] = []
    temp[d['id']].append(d['name'])
answer = []
for k in sorted(temp):
    answer.append({'id':k, 'name':temp[k]})

You can use itertools.groupby to group all the ids and then just extract the name for each element in the group:
In [1]:
import itertools as it
import operator as op
L = [{'id': 6, 'name': 'Jorge'}, ...]
_id = op.itemgetter('id')
[{'id':k, 'name':[e['name'] for e in g]} for k, g in it.groupby(sorted(L, key=_id), key=_id)]
Out[1]:
[{'id': 6, 'name': ['Jorge', 'Matthews', 'Matthews']},
{'id': 7, 'name': ['Christine', 'Smith', 'Chris']}]

Python: Retrieve result from inner dictionary

I'm fairly new to Python and I don't know how to retrieve a value from an inner dictionary.
This is the value I have in my variable:
variable = {'hosts': 1, 'key':'abc', 'result': {'data':[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]}, 'version': 2}
What I want to do is assign to a new variable the number of licenses 'mike' has, for example.
Sorry for such an apparently simple newbie question, but I've only been using Python for a couple of days and need this working as soon as possible. I've searched Google and Stack Overflow but haven't been able to find an answer.
PS: Using python3

Working through it step by step, starting with:
>>> from pprint import pprint
>>> pprint(variable)
{'hosts': 1,
 'key': 'abc',
 'result': {'data': [{'id': 'john', 'licenses': 2},
                     {'id': 'mike', 'licenses': 1}]},
 'version': 2}
First, let's get to the result dict:
>>> result = variable['result']
>>> pprint(result)
{'data': [{'id': 'john', 'licenses': 2}, {'id': 'mike', 'licenses': 1}]}
and then to its data key:
>>> data = result['data']
>>> pprint(data)
[{'id': 'john', 'licenses': 2}, {'id': 'mike', 'licenses': 1}]
Now, we have to scan that for the 'mike' dict:
>>> for item in data:
...     if item['id'] == 'mike':
...         print(item['licenses'])
...         break
...
1
You could shorten that to:
>>> for item in variable['result']['data']:
...     if item['id'] == 'mike':
...         print(item['licenses'])
...         break
...
1
But it would be much better to rearrange your data structure like this:
variable = {
    'hosts': 1,
    'version': 2,
    'licenses': {
        'john': 2,
        'mike': 1,
    }
}
Then you could just do:
>>> variable['licenses']['mike']
1
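If the original structure cannot be changed at the source, a lookup dict of that shape can be derived from it. The following is a minimal sketch, assuming each id appears only once in the data list; `licenses` and `mike_licenses` are illustrative names.

```python
variable = {'hosts': 1, 'key': 'abc',
            'result': {'data': [{'licenses': 2, 'id': 'john'},
                                {'licenses': 1, 'id': 'mike'}]},
            'version': 2}

# Build an id -> licenses mapping from the list of dicts.
licenses = {item['id']: item['licenses'] for item in variable['result']['data']}

# Assign mike's licence count to a new variable, as the question asks.
mike_licenses = licenses['mike']
print(mike_licenses)
```

After the mapping is built, each lookup by id is a plain dict access instead of a scan through the list.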

You can use nested references as follows:
variable['result']['data'][1]['licenses'] += 1
variable['result'] returns:
{'data':[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]}
variable['result']['data'] returns:
[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]
variable['result']['data'][1] returns:
{'licenses': 1, 'id':'mike'}
variable['result']['data'][1]['licenses'] returns:
1
which we then increment using += 1. Note that the index 1 relies on mike's entry being the second item in the list.
