My data structure is set up similar to this:
[[{'proj': 'XABCD'}, {'test': 1}], [{'proj': 'XABCD'}, {'test': 2}], [{'proj': 'XDEFG'}, {'test': 1}]]
I'd like to split the main list based on the values of 'proj', so that my result is one list for each unique project:
[[{'proj': 'XABCD'}, {'test': 1}], [{'proj': 'XABCD'}, {'test': 2}]]
[[{'proj': 'XDEFG'}, {'test': 1}]]
I don't know how many different projects will be present or what their names will be, so I can't hardcode any sorting.
I was thinking of looping through the main list, assigning each unique project as a key in a dictionary, then appending the sublist to that project's value. My code and result come out like this:
projects = {}
for sample in contaminated_samples:
    proj = sample[0]['proj']
    if proj in projects.keys():
        projects[proj].append(sample)
    else:
        projects[proj] = [sample]
{'XABCD': [[{'proj': 'XABCD'}, {'test': 1}], [{'proj': 'XABCD'}, {'test': 2}]], 'XDEFG': [[{'proj': 'XDEFG'}, {'test': 1}]]}
While this works, I was wondering if there's a more efficient way, or some sort of list/dictionary comprehension, that would give me the same or similar results.
I would basically do the same as you, but simplify it slightly with setdefault():
data = [
[{'proj': 'XABCD'}, {'test': 1}],
[{'proj': 'XABCD'}, {'test': 2}],
[{'proj': 'XDEFG'}, {'test': 1}]
]
data2 = {}
for row in data:
    data2.setdefault(row[0]["proj"], []).append(row)
data2 = list(data2.values())
print(data2)
This results in:
[
[
[{'proj': 'XABCD'}, {'test': 1}],
[{'proj': 'XABCD'}, {'test': 2}]
],
[
[{'proj': 'XDEFG'}, {'test': 1}]
]
]
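Another option, assuming Python 3.7+ (where dicts keep insertion order), is collections.defaultdict, which removes the need for setdefault(). A minimal sketch; split_by_project is just a hypothetical name:

```python
from collections import defaultdict


def split_by_project(samples):
    """Group sublists by the 'proj' value of their first dict.

    Assumes each sample looks like [{'proj': ...}, {...}], as in the question.
    """
    groups = defaultdict(list)  # a missing key starts as an empty list
    for sample in samples:
        groups[sample[0]["proj"]].append(sample)
    # dicts keep insertion order, so projects appear in first-seen order
    return list(groups.values())


data = [
    [{"proj": "XABCD"}, {"test": 1}],
    [{"proj": "XABCD"}, {"test": 2}],
    [{"proj": "XDEFG"}, {"test": 1}],
]
print(split_by_project(data))
```

Both this and the setdefault() version make a single pass over the list, so a comprehension would not be meaningfully faster; defaultdict mostly just reads a little cleaner.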
I have the following test list:
testing = [
{'score': [('a', 90)], 'text': 'abc'},
{'score': [('a', 80)], 'text': 'kuku'},
{'score': [('a', 70)], 'text': 'lulu'},
{'score': [('b', 90)], 'text': 'dalu'},
{'score': [('b', 86)], 'text': 'pupu'},
{'score': [('b', 80)], 'text': 'mumu'},
{'score': [('c', 46)], 'text': 'foo'},
{'score': [('c', 26)], 'text': 'too'}
]
I would like to go through each dict, group by the first element of the score tuple (a, b or c), average the second elements, and collect the texts for each group, to get the following:
{"a": {"avg_score": 80, "texts_unique": ['abc', 'kuku', 'lulu']}, "b": the same logic... }
I have seen a pandas approach; is there a best practice for doing this?
Try:
from statistics import mean
testing = [
{"score": [("a", 90)], "text": "abc"},
{"score": [("a", 80)], "text": "kuku"},
{"score": [("a", 70)], "text": "lulu"},
{"score": [("b", 90)], "text": "dalu"},
{"score": [("b", 86)], "text": "pupu"},
{"score": [("b", 80)], "text": "mumu"},
{"score": [("c", 46)], "text": "foo"},
{"score": [("c", 26)], "text": "too"},
]
out = {}
for d in testing:
    out.setdefault(d["score"][0][0], []).append((d["score"][0][1], d["text"]))
out = {
    k: {
        "avg_score": mean(i for i, _ in v),
        "texts_unique": list(set(i for _, i in v)),
    }
    for k, v in out.items()
}
print(out)
Prints:
{
"a": {"avg_score": 80, "texts_unique": ["abc", "kuku", "lulu"]},
"b": {
"avg_score": 85.33333333333333,
"texts_unique": ["mumu", "dalu", "pupu"],
},
"c": {"avg_score": 36, "texts_unique": ["foo", "too"]},
}
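Because list(set(...)) does not preserve order, the texts can come back shuffled (as with "b" above). If the original order matters, one possible variation deduplicates with dict.fromkeys() instead, which keeps first-seen order. A sketch; summarize is a hypothetical name:

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Group records by the letter in record['score'][0][0].

    Averages the numbers and keeps each group's texts in first-seen order,
    using dict.fromkeys() instead of set() to drop duplicates.
    """
    scores = defaultdict(list)
    texts = defaultdict(list)
    for rec in records:
        letter, value = rec["score"][0]
        scores[letter].append(value)
        texts[letter].append(rec["text"])
    return {
        letter: {
            "avg_score": mean(scores[letter]),
            "texts_unique": list(dict.fromkeys(texts[letter])),
        }
        for letter in scores
    }
```

Calling summarize(testing) gives the same averages as above, with each text list in the order the texts first appear.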
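You can use itertools.groupby to group your data by the letter key and then use a helper function to build the desired object for each letter. Note that groupby only groups consecutive items with the same key; this works here because testing is already ordered by letter, otherwise sort it by the same key first: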
import itertools
def grouper(g):
    return {
        'avg_score': sum(t['score'][0][1] for t in g) / len(g),
        'texts_unique': list(set(t['text'] for t in g)),
    }
res = {k: grouper(list(g)) for k, g in itertools.groupby(testing, key=lambda t: t['score'][0][0])}
Output:
{
"a": {
"avg_score": 80.0,
"texts_unique": [
"abc",
"lulu",
"kuku"
]
},
"b": {
"avg_score": 85.33333333333333,
"texts_unique": [
"mumu",
"dalu",
"pupu"
]
},
"c": {
"avg_score": 36.0,
"texts_unique": [
"foo",
"too"
]
}
}
Here is a simplified look at my data right after parsing:
[
{'id':'group1'},
{'id':'member1', 'parentId':'group1', 'size':51},
{'id':'member2', 'parentId':'group1', 'size':16},
{'id':'group2'},
{'id':'member1', 'parentId':'group2', 'size':21},
...
]
The desired output should be like this:
data =
[
[
{'id':'group1'},
{'id':'member1', 'parentId':'group1', 'size':51},
{'id':'member2', 'parentId':'group1', 'size':16}
],
[
{'id':'group2'},
{'id':'member1', 'parentId':'group2', 'size':21},
]
]
The issue is that this data structure is hard to iterate over because each group contains a different number of objects: some might have 10, some might have 3, so it's unclear where each list should begin and end. The data is also not uniform: some entries have only an 'id' and no 'parentId' or 'size'.
master_data = []
for i in range(len(tsv_data)):
    temp = {}
    for j in range(?????):
        ???
How can I use Python to arrange plain .tsv data into a list of lists like the one above?
I thought a good first step was to see whether I could tally something simple before tackling the whole data set, so I tried to count all occurrences of group1, based on this discussion:
group_counts = {}
for member in data:
    group = member.get('group1')
    try:
        group_counts[group] += 1
    except KeyError:
        group_counts[group] = 1
However, this returned:
'list' object has no attribute 'get'
This leads me to believe that counting text occurrences may not be the solution after all.
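The error means the elements of data were lists rather than dicts at that point; also, member.get('group1') looks up a key named 'group1', while 'group1' is actually a value of 'id' or 'parentId'. Assuming the parsed rows are a flat list of dicts like the sample at the top, the tally could be done with collections.Counter on 'parentId'. A sketch; count_members is a hypothetical name:

```python
from collections import Counter


def count_members(rows):
    """Count member rows per group, assuming rows is a flat list of dicts.

    Group header rows have no 'parentId', so they are skipped.
    """
    return Counter(row["parentId"] for row in rows if "parentId" in row)
```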
You could first collect all the groups to build the new data structure, and afterwards add all the items to them:
data = [
{
'id': 'group1'
}, {
'id': 'member1',
'parentId': 'group1',
'size': 51
}, {
'id': 'member2',
'parentId': 'group1',
'size': 16
}, {
'id': 'group2'
}, {
'id': 'member1',
'parentId': 'group2',
'size': 21
}, {
'id': 'member3',
'parentId': 'group1',
'size': 16
}
]
result = {}  # Use a dict for easier grouping.
# extract all groups (assumes group ids contain the substring 'group')
for dct in data:
    if 'group' in dct['id']:
        result[dct['id']] = [dct]
# extract all items and add them to their groups
for dct in data:
    if 'parentId' in dct:
        result[dct['parentId']].append(dct)
nestedListResult = list(result.values())
Out:
[
[
{
'id': 'group1'
}, {
'id': 'member1',
'parentId': 'group1',
'size': 51
}, {
'id': 'member2',
'parentId': 'group1',
'size': 16
}, {
'id': 'member3',
'parentId': 'group1',
'size': 16
}
], [{
'id': 'group2'
}, {
'id': 'member1',
'parentId': 'group2',
'size': 21
}]
]
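Since the question starts from a .tsv file, here is a sketch that covers both steps: reading the file with csv.DictReader (tab delimiter) and grouping in a single pass. It treats any row without a parentId as a group header instead of relying on the id containing 'group'. The header row id, parentId, size is an assumption about the file layout, and parse_tsv and group_by_parent are hypothetical names:

```python
import csv
import io


def parse_tsv(text):
    """Parse TSV text with an assumed header (id, parentId, size) into dicts.

    Empty fields are dropped, so group rows end up as just {'id': ...}.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        row = {k: v for k, v in record.items() if v}  # drop blank/missing columns
        if "size" in row:
            row["size"] = int(row["size"])
        rows.append(row)
    return rows


def group_by_parent(rows):
    """Build one list per group: the group row first, then its members."""
    groups = {}
    for row in rows:
        if "parentId" in row:
            groups.setdefault(row["parentId"], []).append(row)
        else:
            # a row without parentId is a group header; keep it at the front
            groups.setdefault(row["id"], []).insert(0, row)
    return list(groups.values())
```

With a real file you would pass open(path, newline="").read() to parse_tsv, or hand the file object straight to csv.DictReader.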
I have a list of dictionaries like this:
list1 = [{'name': 'maik','is_payed': 1, 'brand': 'HP', 'count': 1, 'items': [{'device': 'mouse', 'count': 110}]},{'name': 'milanie','is_payed': 0, 'brand': 'dell', 'count':10, 'items': [{'device': 'bales', 'count': 200}]}]
list2 = [{'name': 'maik','is_payed': 0, 'brand': 'HP', 'count': 20, 'items': [{'device': 'mouse', 'count': 1}]},{'name': 'nikola','is_payed': 1, 'brand': 'toshiba', 'count':10, 'items': [{'device': 'hard', 'count': 20}]}]
import pandas as pd
my_list = list1 + list2
count = pd.DataFrame(my_list).groupby(['name', 'is_payed'])
final_list_ = []
for commande, group in count:
    print(commande)
    records = group.to_dict("records")
    final_list_.append({"name": commande[0],
                        "payed": commande[1],
                        "occurrence": len(group),
                        "items": pd.DataFrame(records).groupby('device').agg(
                            occurrence=('device', 'count')).reset_index().to_dict('records')})
I don't know how to get a result in this form:
The 'payed' field should be payed/total_commands.
For example, take maik: he has two commands, one paid and one not, so his final result would be:
{'name': 'maik','payed': 1/2, 'brand': 'HP', 'count': 21, 'items': [{'device': 'mouse', 'count': 111}]}
Since you just want to group by "name" and are only interested in the "is_payed" values, let's concentrate on that and ignore the other data.
So for our purposes, your starting data looks like:
my_list = [
{'name': 'maik', 'is_payed': 1},
{'name': 'milanie', 'is_payed': 0},
{'name': 'maik', 'is_payed': 0},
{'name': 'nikola', 'is_payed': 1}
]
Now let's take a first pass over this data and count how many times we see each name, and how many of those entries have the "is_payed" flag set:
results = {}
for item in my_list:
    key = item["name"]
    results.setdefault(key, {"is_payed": 0, "count": 0})
    results[key]["count"] += 1
    results[key]["is_payed"] += item["is_payed"]
At this point we have a dictionary that will look like:
{
'maik': {'is_payed': 1, 'count': 2},
'milanie': {'is_payed': 0, 'count': 1},
'nikola': {'is_payed': 1, 'count': 1}
}
Now we will take a pass over this dictionary and create our true final result:
results = [
    {"name": key, "payed": f"{value['is_payed']}/{value['count']}"}
    for key, value in results.items()
]
Giving us:
[
{'name': 'maik', 'payed': '1/2'},
{'name': 'milanie', 'payed': '0/1'},
{'name': 'nikola', 'payed': '1/1'}
]
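The answer above only builds the payed ratio. If you also want the other fields from the expected output, one way is to accumulate everything in the same first pass. A sketch under stated assumptions: brand is taken from the first order seen for each name, count is summed, and item counts are summed per device; summarize_orders is a hypothetical name:

```python
def summarize_orders(orders):
    """Aggregate orders per name into the shape shown in the question.

    Assumptions: 'brand' comes from the first order for a name, 'count'
    is summed, and item counts are summed per device.
    """
    summary = {}
    for order in orders:
        entry = summary.setdefault(order["name"], {
            "name": order["name"],
            "paid": 0,
            "total": 0,
            "brand": order["brand"],
            "count": 0,
            "items": {},  # device -> summed count
        })
        entry["paid"] += order["is_payed"]
        entry["total"] += 1
        entry["count"] += order["count"]
        for item in order["items"]:
            device = item["device"]
            entry["items"][device] = entry["items"].get(device, 0) + item["count"]
    return [
        {
            "name": e["name"],
            "payed": f"{e['paid']}/{e['total']}",
            "brand": e["brand"],
            "count": e["count"],
            "items": [{"device": d, "count": c} for d, c in e["items"].items()],
        }
        for e in summary.values()
    ]
```

Usage: summarize_orders(list1 + list2).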
I'm trying to use a list comprehension to nest one list of dictionaries inside another. I have two lists of dictionaries: categories and oils. Each oil should be added to the category whose id equals the oil's category_id.
def nest(parent, child):
    items = []
    for element in child:
        if element.get('category_id') == parent.get('id'):
            items.append(element)
    parent.update({'items': items})
    return parent
def merge(parent, child):
    results = []
    for element in parent:
        results.append(nest(element, child))
    return results
categories = [
{'id': 1000, 'name': 'Single'},
{'id': 2000, 'name': 'Blend'}]
oils = [
{'id': 100, 'name': 'Orange', 'category_id': 1000},
{'id': 101, 'name': 'Lavender', 'category_id': 1000},
{'id': 102, 'name': 'Peppermint', 'category_id': 1000},
{'id': 104, 'name': 'Inspired', 'category_id': 2000},
{'id': 105, 'name': 'Focus', 'category_id': 2000},
{'id': 107, 'name': 'Tea Tree', 'category_id': 1000}]
results = merge(categories, oils)
print(results)
# output:
# [
# {'id': 1000, 'name': 'Single', 'items': [
# {'id': 100, 'name': 'Orange', 'category_id': 1000},
# {'id': 101, 'name': 'Lavender', 'category_id': 1000},
# {'id': 102, 'name': 'Peppermint', 'category_id': 1000},
# {'id': 107, 'name': 'Tea Tree', 'category_id': 1000}
# ]},
# {'id': 2000, 'name': 'Blend', 'items': [
# {'id': 104, 'name': 'Inspired', 'category_id': 2000},
# {'id': 105, 'name': 'Focus', 'category_id': 2000}
# ]}
# ]
I'm trying to convert the above to a list comprehension, without success:
merged = [
element.update({'items': nest}) for nest in oils
for element in categories if element.get('id') == nest.get('category_id')
]
print(merged)
# output: [None, None, None, None, None, None]
merged = [dict(**c, items=[o for o in oils if o['category_id'] == c['id']]) for c in categories]
from pprint import pprint
pprint(merged)
Prints:
[{'id': 1000,
'items': [{'category_id': 1000, 'id': 100, 'name': 'Orange'},
{'category_id': 1000, 'id': 101, 'name': 'Lavender'},
{'category_id': 1000, 'id': 102, 'name': 'Peppermint'},
{'category_id': 1000, 'id': 107, 'name': 'Tea Tree'}],
'name': 'Single'},
{'id': 2000,
'items': [{'category_id': 2000, 'id': 104, 'name': 'Inspired'},
{'category_id': 2000, 'id': 105, 'name': 'Focus'}],
'name': 'Blend'}]
EDIT (to use a dynamic key name):
variable = 'elements' # this is your dynamic variable
merged = [dict(**c, **{variable: [o for o in oils if o['category_id'] == c['id']]}) for c in categories]
I guess it would be nice as a simple function. Thanks.
def merge(parent, child, nested='nested', key='id', foreign='parent_id'):
    return [
        dict(**element, **{nested: [nest for nest in child if nest[foreign] == element[key]]})
        for element in parent
    ]
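The comprehension rescans the whole child list for every parent. For larger lists, one alternative is to index the children by their foreign key once and then look them up. A sketch; merge_indexed is a hypothetical name and its defaults are chosen to match the categories/oils example:

```python
from collections import defaultdict


def merge_indexed(parent, child, nested="items", key="id", foreign="category_id"):
    """Nest children under parents with one pass over each list.

    Children are indexed by their foreign key first. Parents are copied,
    so, unlike nest() above, the input dicts are not modified.
    """
    by_parent = defaultdict(list)
    for item in child:
        by_parent[item[foreign]].append(item)
    # .get() on a defaultdict does not insert missing keys
    return [{**p, nested: by_parent.get(p[key], [])} for p in parent]
```

Usage: merge_indexed(categories, oils) gives the same nested structure as the comprehension.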
I have a DataFrame column whose type is string, but its values look like a list of dictionaries.
How can I extract a key's value from that string? Specifically, how can I extract the value of the 'name' key when the 'job' key has the value 'Director'?
[{
'credit_id': '549e9edcc3a3682f2300824b',
'department': 'Camera',
'gender': 2,
'id': 473,
'job': 'Additional Photography',
'name': 'Brian Tufano',
'profile_path': None
}, {
'credit_id': '52fe4214c3a36847f8002595',
'department': 'Directing',
'gender': 2,
'id': 578,
'job': 'Director',
'name': 'Ridley Scott',
'profile_path': '/oTAL0z0vsjipCruxXUsDUIieuhk.jpg'
}, {
'credit_id': '52fe4214c3a36847f800259b',
'department': 'Production',
'gender': 2,
'id': 581,
'job': 'Producer',
'name': 'Michael Deeley',
'profile_path': None
}, {
'credit_id': '52fe4214c3a36847f800263f',
'department': 'Writing',
'gender': 2,
'id': 584,
'job': 'Novel',
'name': 'Philip K. Dick',
'profile_path': '/jDOKJN8SQ17QsJ7omv4yBNZi7XY.jpg'
}, {
'credit_id': '549e9f85c3a3685542004c7b',
'department': 'Crew',
'gender': 2,
'id': 584,
'job': 'Thanks',
'name': 'Philip K. Dick',
'profile_path': '/jDOKJN8SQ17QsJ7omv4yBNZi7XY.jpg'
}, {
'credit_id': '52fe4214c3a36847f800261b',
'department': 'Writing',
'gender': 2,
'id': 583,
'job': 'Screenplay',
'name': 'Hampton Fancher',
'profile_path': '/lrGecnLhzjzgwjKHvrmYtRAqOsP.jpg'
}
]
You can achieve this by converting your list into a pandas DataFrame (pandas.DataFrame):
import pandas as pd
df = pd.DataFrame(yourList)
df['name'][df['job']=="Director"] # you will get a series of all the names matching your condition.
Another way is to use .loc[]:
df.loc[df["job"]=="Director",["name"]] # You will get a dataframe with name column having all the names matching the condition.
import pandas as pd
df = pd.DataFrame(LIST)
Name = df['name'][df['job'] == "Director"]
print('Name =',Name)
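Both answers assume the value is already a Python list. Since the column actually holds strings, each value has to be parsed first. The text uses Python literal syntax (single quotes, None), which json.loads rejects but ast.literal_eval accepts. A sketch; names_for_job is a hypothetical helper:

```python
import ast


def names_for_job(crew_text, job="Director"):
    """Return the 'name' values whose 'job' equals `job`.

    crew_text is the string stored in the column; ast.literal_eval
    safely turns it into a list of dicts without executing code.
    """
    crew = ast.literal_eval(crew_text)
    return [person["name"] for person in crew if person.get("job") == job]
```

Applied to a column (its name, crew here, is an assumption), this would be something like df['directors'] = df['crew'].apply(names_for_job).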