I want to change the format of my data like this. To explain, I have this dict:
{
'41': [
'1029136700',
'1028348931'
],
'42': ['12234454']
...
}
Then I want to convert it to the following format using lambda and map:
[
{
'key':'41','value':'1029136700'
},
{
'key':'41','value': '1028348931'
},
{
'key':'42', 'value': '12234454'
}
...
]
Can you give me a clue on how to achieve this in Python?
Here is a clue on how to get what you want - you have to iterate over the initial dictionary:
for key, list_of_vals in initial_dictionary.items():
    for val in list_of_vals:
        # now you have pairs
        # for example, key = '41', val = '1029136700'
        # use them as you need
Just a couple more steps and you get exactly what you want.
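For instance, the remaining steps could look like the sketch below. It uses the sample data from the question, and result is just an illustrative name for the output list.

```python
# Sample data from the question
initial_dictionary = {
    '41': ['1029136700', '1028348931'],
    '42': ['12234454'],
}

result = []
for key, list_of_vals in initial_dictionary.items():
    for val in list_of_vals:
        # build one small dict per (key, value) pair
        result.append({'key': key, 'value': val})

print(result)
```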
However, doing this with map() and lambda is not as easy. You have to replace each for loop (there are two of them) with a map() call containing a lambda. In Python 3, map() returns a lazy iterator, so you have to consume it to get the actual result, for example with list(map(...)). Here is the complete code:
result = list()
# set() is only used to consume the outer map; result is filled as a side effect
set(map(lambda item: result.extend(list(
        map(lambda val: {'key': item[0], 'value': val},
            item[1]))),
    initial_dictionary.items()))
print(result)
Output:
[{'key': '41', 'value': '1029136700'}, {'key': '41', 'value': '1028348931'}, {'key': '42', 'value': '12234454'}]
This implementation is significantly more difficult to read and understand.
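If map() and lambda are a hard requirement, a variant without side effects is possible by flattening the nested maps with itertools.chain.from_iterable. This is only a sketch using the sample data from the question; the list comprehension at the end is probably what most people would write instead.

```python
from itertools import chain

# Sample data from the question
initial_dictionary = {
    '41': ['1029136700', '1028348931'],
    '42': ['12234454'],
}

# The inner map builds the dicts for one key; chain.from_iterable
# joins the per-key iterators into one flat sequence.
result = list(chain.from_iterable(
    map(lambda item: map(lambda val: {'key': item[0], 'value': val}, item[1]),
        initial_dictionary.items())
))

# Equivalent list comprehension, without map() or lambda
result_comp = [
    {'key': key, 'value': val}
    for key, vals in initial_dictionary.items()
    for val in vals
]

print(result)
```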
I have a list of dictionaries like the one below. I want to group the dictionaries by grade and convert the list into a single dictionary whose keys are the grade values and whose values are lists of the remaining dictionaries.
Input:
[
{'name':'abc','mark':'99','grade':'A'},
{'name':'xyz','mark':'90','grade':'A'},
{'name':'123','mark':'70','grade':'C'},
]
I want my output like below:
{
A: [ {'name': 'abc','mark':'99'}, {'name': 'xyz','mark':'90'} ],
C: [ {'name': '123','mark':'70'} ]
}
I tried sorted and groupby, but I was not able to remove grade from the dictionaries.
Use a loop with dict.setdefault:
l = [{'name':'abc','mark':'99','grade':'A'},
     {'name':'xyz','mark':'90','grade':'A'},
     {'name':'123','mark':'70','grade':'C'},
    ]
out = {}
for d in l:
    # avoid mutating the original dictionaries
    d = d.copy()
    # get grade, try to get the key in "out"
    # if the key doesn't exist, initialize with an empty list
    out.setdefault(d.pop('grade'), []).append(d)
print(out)
Output:
{'A': [{'name': 'abc', 'mark': '99'},
{'name': 'xyz', 'mark': '90'}],
'C': [{'name': '123', 'mark': '70'}],
}
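Since the question mentions sorted and groupby, here is a sketch of how that route could also drop the grade key. It assumes the same sample list l; by_grade is just an illustrative helper name.

```python
from itertools import groupby
from operator import itemgetter

l = [{'name': 'abc', 'mark': '99', 'grade': 'A'},
     {'name': 'xyz', 'mark': '90', 'grade': 'A'},
     {'name': '123', 'mark': '70', 'grade': 'C'}]

by_grade = itemgetter('grade')

# groupby only groups adjacent items, so the list is sorted by grade first.
# The inner dict comprehension copies each dict without its 'grade' key.
out = {
    grade: [{k: v for k, v in d.items() if k != 'grade'} for d in group]
    for grade, group in groupby(sorted(l, key=by_grade), key=by_grade)
}

print(out)
```

The original dictionaries in l are left untouched, because each one is rebuilt rather than popped from.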
I want to add the following to each object in a dictionary.
My dictionary looks like this:
{'Niveau1Obj1': {'Niveau2Obj1': {'Niveau3Obj1': {}}}}
For each object I want to add the following value:
{'type':'object'}
So the final outcome should look something like this:
{'Niveau1Obj1': {'type': 'object'}, 'Niveau2Obj1': {'type': 'object'}, 'Niveau3Obj1': {'type': 'object'}}
My code doesn't result in the desired outcome. The code is:
objects = {'Niveau1Obj1': {'Niveau2Obj1': {'Niveau3Obj1': {}}}}
for key, obj in objects.items():
objects[key].setdefault(key, {}).update({'type':'object'})
It adds {'type':'object'} only to the last part of the dictionary.
What am I doing wrong?
Try recursion:
dct = {"Niveau1Obj1": {"Niveau2Obj1": {"Niveau3Obj1": {}}}}
def get_keys(d):
    if isinstance(d, dict):
        for k in d:
            yield k
            yield from get_keys(d[k])
out = {k: {"type": "object"} for k in get_keys(dct)}
print(out)
Prints:
{
"Niveau1Obj1": {"type": "object"},
"Niveau2Obj1": {"type": "object"},
"Niveau3Obj1": {"type": "object"},
}
I have a dataframe column, loaded from a JSONL file, that holds multiple nested dictionaries, as shown below:
`df['referenced_tweets'][0]`
producing (shortened output)
'id': '1392893055112400898',
'public_metrics': {'retweet_count': 0,
'reply_count': 1,
'like_count': 2,
'quote_count': 0},
'conversation_id': '1392893055112400898',
'created_at': '2021-05-13T17:22:37.000Z',
'reply_settings': 'everyone',
'entities': {'annotations': [{'start': 65,
'end': 77,
'probability': 0.9719000000000001,
'type': 'Person',
'normalized_text': 'Jill McMillan'}],
'mentions': [{'start': 23,
'end': 36,
'username': 'usasklibrary',
'protected': False,
'description': 'The official account of the University Library at USask.',
'created_at': '2019-06-04T17:19:12.000Z',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'url': '*removed*',
'expanded_url': 'http://library.usask.ca',
'display_url': 'library.usask.ca'}]}},
'name': 'University Library',
'url': '....',
'profile_image_url': 'https://pbs.twimg.com/profile_images/1278828446026629120/G1w7t-HK_normal.jpg',
'verified': False,
'id': '1135959197902921728',
'public_metrics': {'followers_count': 365,
'following_count': 119,
'tweet_count': 556,
'listed_count': 9}}]},
'text': 'Wonderful session with #usasklibrary Graduate Writing Specialist Jill McMillan who is walking SURE students through the process of organizing/analyzing a literature review! So grateful to the library -- our largest SURE: Student Undergraduate Research Experience partner!',
...
My intention is to create a function that automatically extracts specific fields (e.g. text, type) for the entire dataframe (not just one row). So I wrote this function:
### x = df['referenced_tweets']
def extract_TextType(x):
    dic = {}
    for i in x:
        if i != " ":
            new_df= pd.DataFrame.from_dict(i)
            dic['refd_text']=new_df['text']
            dic['refd_type'] = new_df['type']
        else:
            print('none')
    return dic
However running the function:
df['referenced_tweets'].apply(extract_TextType)
produces an error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
The whole point is to extract these two nested columns (texts & type) from the original "referenced tweets" column and match them to the original rows.
What am I doing wrong?
A couple of things to consider here. referenced_tweets holds a list, so the line new_df = pd.DataFrame.from_dict(i) is most likely not parsing the data the way you expect.
Also, because the list may contain multiple tweets, you are right to iterate over it, but you don't need to put it into a df to do so. Since you are using .apply(), your function also creates a new dictionary in each cell. If that's what you want, that is OK. If you really just want a new dataframe, you can adapt the following. I don't have access to referenced_tweets, so I'm using entities as an example.
Here's my example:
ents = df[df.entities.notnull()]['entities']
dict_hold_list = []
for ent in ents:
    # print(ent['hashtags'])
    for htag in ent['hashtags']:
        # print(htag['text'])
        # print(htag['indices'])
        dict_hold_list.append({'text': htag['text'], 'indices': htag['indices']})
df_hashtags = pd.DataFrame(dict_hold_list)
Because you have not provided a working JSON sample or dataframe, I can't test this, but your solution could look like this:
refs = df[df.referenced_tweets.notnull()]['referenced_tweets']
dict_hold_list = []
for ref in refs:
    # print(ref)
    for r in ref:
        # print(r['text'])
        # print(r['type'])
        dict_hold_list.append({'text': r['text'], 'type': r['type']})
df_ref_tweets = pd.DataFrame(dict_hold_list)
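The question also asks to match the extracted values back to the original rows. One way to do that is to carry the row label along with each record. The sketch below is plain Python so it can run without the real data: referenced_tweets is a hypothetical stand-in for the column (row label mapped to a list of dicts, or None), and extract_text_type is an illustrative name, not something from the original code.

```python
# Hypothetical stand-in for df['referenced_tweets']:
# row label -> list of tweet dicts, or None when the cell is empty
referenced_tweets = {
    0: [{'text': 'first tweet', 'type': 'replied_to'}],
    1: None,
    2: [{'text': 'second tweet', 'type': 'quoted'},
        {'text': 'third tweet', 'type': 'replied_to'}],
}

def extract_text_type(column):
    """Return one record per referenced tweet, tagged with its source row."""
    records = []
    for row_label, refs in column.items():
        if not isinstance(refs, list):
            # skip missing cells (None, NaN, ...)
            continue
        for r in refs:
            records.append({
                'row': row_label,
                'text': r.get('text'),
                'type': r.get('type'),
            })
    return records

records = extract_text_type(referenced_tweets)
print(records)
```

The records list can be passed to pd.DataFrame(records), and the row column then links each extracted tweet to the row it came from. A pandas Series also has an .items() method that yields (index, value) pairs, so the same function should work on the real column, although that is untested here.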
I am trying to achieve the following output:
[{'Key': 'Language', 'Value': 'Python'}, {'Key': 'Version', 'Value': '3.7'}]
I have implemented a method to achieve the above output:
@cli.command('test', context_settings=dict(
    ignore_unknown_options=True,
    allow_extra_args=True
))
@click.pass_context
def test(ctx):
    data = dict()
    tags=dict()
    tag_list = list()
    for item in ctx.args:
        data.update([item.split('=')])
    for items in data.items():
        tags['Key'] = items[0]
        tags['Value'] = items[1]
        tag_list.append(tags)
    print(tag_list)
Method Call:
python test.py test Language=Python Version=3.7
But I am getting below output:
[{'Key': 'Version', 'Value': '3.7'}, {'Key': 'Version', 'Value': '3.7'}]
The old dict is being overwritten and then appended again.
Can you please help me with this?
Thanks & Regards,
You are mutating the same dict in the last loop, so your list will have references to the same dict.
Move this:
tags=dict()
...inside the final for loop:
for items in data.items():
    tags=dict()
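To see the aliasing without click, here is a small sketch. data is hard-coded to what the parsed command-line arguments would give, and shared is an illustrative name for the buggy list.

```python
data = {'Language': 'Python', 'Version': '3.7'}

# Buggy: one dict object is reused, so the list holds the same object twice
tags = {}
shared = []
for key, value in data.items():
    tags['Key'] = key
    tags['Value'] = value
    shared.append(tags)
print(shared[0] is shared[1])

# Fixed: a new dict is created on every iteration
tag_list = [{'Key': key, 'Value': value} for key, value in data.items()]
print(tag_list)
```

The list comprehension has the same effect as moving tags=dict() inside the loop, since each iteration builds a fresh dict.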
I've got this data structure coming from Vimeo API
{'duration': 720,
'language': 'sv',
'link': 'https://vimeo.com/neweuropefilmsale/incidentbyabank',
'name': 'INCIDENT BY A BANK',
'user': {
'link': 'https://vimeo.com/neweuropefilmsales',
'location': 'Warsaw, Poland',
'name': 'New Europe Film Sales'
}
}
I want to transform it into
[720, "sv", "http..", "incident..", "http..", "Warsaw", "New Europe.."]
to load it into a Google spreadsheet. I also need to keep the order of the values consistent.
PS. I see similar questions but answers are not in Python 3
Thanks
I'm going to use the csv module to create a CSV file like you've described out of your data.
First, we should use a header row for your file, so the order doesn't matter, only dict keys do:
import csv
# This defines the order they'll show up in final file
fieldnames = [
'name', 'link', 'duration', 'language',
'user_name', 'user_link', 'user_location',
]
# Open the file with Python
with open('my_file.csv', 'w', newline='') as my_file:
    # Attach a CSV writer to the file with the desired fieldnames
    writer = csv.DictWriter(my_file, fieldnames)
    # Write the header row
    writer.writeheader()
Notice the DictWriter: it lets us write dicts based on their keys instead of their order (dicts only guarantee insertion order from Python 3.7 on; in CPython 3.6 it was an implementation detail). The above code will end up with a file like this:
name,link,duration,language,user_name,user_link,user_location
Which we can then add rows to, but let's convert your data first, so the keys match the above field names:
data = {
'duration': 720,
'language': 'sv',
'link': 'https://vimeo.com/neweuropefilmsale/incidentbyabank',
'name': 'INCIDENT BY A BANK',
'user': {
'link': 'https://vimeo.com/neweuropefilmsales',
'location': 'Warsaw, Poland',
'name': 'New Europe Film Sales'
}
}
for key, value in data['user'].items():
    data['user_{}'.format(key)] = value
del data['user']
This ends up with the data dictionary like this:
data = {
'duration': 720,
'language': 'sv',
'link': 'https://vimeo.com/neweuropefilmsale/incidentbyabank',
'name': 'INCIDENT BY A BANK',
'user_link': 'https://vimeo.com/neweuropefilmsales',
'user_location': 'Warsaw, Poland',
'user_name': 'New Europe Film Sales',
}
We can now simply pass this as a whole row to the CSV writer, and everything else is done automatically. Note that this call has to run while the file is still open, i.e. inside the with block above:
# Using the same writer from above, insert the data from above
writer.writerow(data)
That's it, now just import this into your Google spreadsheets :)
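Put together, the steps above could look like the following sketch. To keep it self-contained it writes to an in-memory io.StringIO buffer instead of my_file.csv, and it builds a separate row dict rather than modifying data in place; both choices are mine, not part of the answer above.

```python
import csv
import io

data = {
    'duration': 720,
    'language': 'sv',
    'link': 'https://vimeo.com/neweuropefilmsale/incidentbyabank',
    'name': 'INCIDENT BY A BANK',
    'user': {
        'link': 'https://vimeo.com/neweuropefilmsales',
        'location': 'Warsaw, Poland',
        'name': 'New Europe Film Sales',
    },
}

# This defines the column order in the output
fieldnames = [
    'name', 'link', 'duration', 'language',
    'user_name', 'user_link', 'user_location',
]

# Flatten the nested 'user' dict into user_* keys without mutating data
row = {k: v for k, v in data.items() if k != 'user'}
row.update({'user_{}'.format(k): v for k, v in data['user'].items()})

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())

# The plain list from the question, in the same fixed order
values = [row[name] for name in fieldnames]
print(values)
```

Because the order comes from fieldnames rather than from the dict, the values stay in the same columns for every record.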
This is a simple solution using recursion:
dictionary = {
'duration': 720,
'language': 'sv',
'link': 'https://vimeo.com/neweuropefilmsale/incidentbyabank',
'name': 'INCIDENT BY A BANK',
'user': {
'link': 'https://vimeo.com/neweuropefilmsales',
'location': 'Warsaw, Poland',
'name': 'New Europe Film Sales'
}
}
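def flatten(current, result=None):
    # A mutable default ([]) would be shared between calls and keep
    # accumulating values, so create the list on the first call instead
    if result is None:
        result = []
    if isinstance(current, dict):
        for key in current:
            flatten(current[key], result)
    else:
        result.append(current)
    return result
result = flatten(dictionary)
print(result)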
Explanation: We call flatten() recursively until we reach a value that is not itself a dictionary (the else branch of if isinstance(current, dict):). When we reach such a value, we append it to our result list. This works for any depth of nesting.
See: How would I flatten a nested dictionary in Python 3?
I used the same solution, but I've changed the result collection to be a list.
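The same idea can also be written as a generator, which avoids passing the result list around. This is only a sketch; iter_values is an illustrative name, and the order of the output relies on dicts preserving insertion order (Python 3.7+).

```python
dictionary = {
    'duration': 720,
    'language': 'sv',
    'link': 'https://vimeo.com/neweuropefilmsale/incidentbyabank',
    'name': 'INCIDENT BY A BANK',
    'user': {
        'link': 'https://vimeo.com/neweuropefilmsales',
        'location': 'Warsaw, Poland',
        'name': 'New Europe Film Sales',
    },
}

def iter_values(current):
    """Yield the non-dict values of a nested dict, depth first."""
    if isinstance(current, dict):
        for value in current.values():
            yield from iter_values(value)
    else:
        yield current

result = list(iter_values(dictionary))
print(result)
```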