I have a DataFrame called df_imdb.
Each row contains information about one movie. The DataFrame has a column named 'genres' that lists the movie's genres; a movie can have more than one, e.g. [{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}, {'id': 9648, 'name': 'Mystery'}]
I want to find the most frequently used genres across these movies (the top 3 genres in this DataFrame).
Each value is a list of dictionaries, so there are multiple options:
Option 1: Pure pandas. Convert the values under the key 'name' to a Series and use value_counts.
import pandas as pd

df = pd.DataFrame({'genres':[[{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}, {'id': 9648, 'name': 'Mystery'}],[{'id': 53, 'name': 'Thriller'}, {'id': 30, 'name': 'Blah'}, {'id': 9648, 'name': 'Mystery'}]]})
# One column per genre position, then stack into a single Series and count
(df['genres'].apply(lambda x: pd.Series([i['name'] for i in x]))
    .stack().value_counts())
You get
Thriller 2
Mystery 2
Action 1
Blah 1
Option 2: Convert the values to a flat list and use Counter
from collections import Counter
l_genres = df['genres'].apply(lambda x: [i['name'] for i in x]).sum()
Counter(l_genres)
You get
Counter({'Thriller': 2, 'Action': 1, 'Mystery': 2, 'Blah': 1})
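To get only the top 3 from the Counter approach, Counter.most_common can be used. The following is a minimal sketch: the plain list `rows` is a stand-in for `df['genres'].tolist()`, and itertools.chain is swapped in for `.sum()` to flatten the lists, which is an assumption about what suits larger data rather than part of the answer above.

```python
from collections import Counter
from itertools import chain

# Stand-in for df['genres'].tolist(); same sample data as above
rows = [
    [{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}, {'id': 9648, 'name': 'Mystery'}],
    [{'id': 53, 'name': 'Thriller'}, {'id': 30, 'name': 'Blah'}, {'id': 9648, 'name': 'Mystery'}],
]

# Flatten the per-movie lists lazily and pull out each genre name
names = (genre['name'] for genre in chain.from_iterable(rows))

# most_common(3) returns the three (name, count) pairs with the highest counts
top3 = Counter(names).most_common(3)
print(top3)
```

Ties are returned in the order the names were first seen, so which genre takes third place here depends on row order.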
Edit: If the data type is str and not list, first convert the values with literal_eval:
from ast import literal_eval
df['genres'] = df['genres'].apply(literal_eval)
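As a quick illustration of what literal_eval does to a single cell, here is a small sketch. The string `raw` is a made-up example of how such a value might look after being read from a CSV file; that origin is an assumption.

```python
from ast import literal_eval

# Hypothetical cell value: the list of dicts stored as text
raw = "[{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}]"

# literal_eval safely parses Python literals (lists, dicts, strings, numbers)
parsed = literal_eval(raw)
names = [genre['name'] for genre in parsed]

print(type(parsed).__name__, names)
```

Unlike eval, literal_eval only accepts literal syntax, so it raises an error on anything else rather than executing it.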
Give this a try:
c = df_imdb['genres'].apply(lambda x: [n['name'] for n in x]).sum()
pd.Series(c).value_counts()
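For the top 3 specifically, another possible sketch uses Series.explode (available in pandas 0.25 and later) followed by head(3). The sample frame below reuses the data from the first answer; with the real data, df_imdb would take its place.

```python
import pandas as pd

# Sample data copied from the first answer; df_imdb would take its place
df = pd.DataFrame({'genres': [
    [{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}, {'id': 9648, 'name': 'Mystery'}],
    [{'id': 53, 'name': 'Thriller'}, {'id': 30, 'name': 'Blah'}, {'id': 9648, 'name': 'Mystery'}],
]})

# explode() gives one row per genre dict; map() pulls out the name
genre_names = df['genres'].explode().map(lambda genre: genre['name'])

# value_counts() sorts by count descending, so head(3) is the top 3
top3 = genre_names.value_counts().head(3)
print(top3)
```

This assumes every movie has at least one genre; an empty list explodes to NaN, which the lambda would not handle.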
I'm a newbie in Python. I have the following list of dictionaries:
l = [{'id': 2, 'source_id': 100},
{'id': 1, 'source_id': 100},
{'id': 3, 'source_id': 1234},
{'id': 5, 'source_id': 200},
{'id': 4, 'source_id': 200}]
And I want to get a result like:
l = [{'id': 1, 'source_id': 100},
{'id': 3, 'source_id': 1234},
{'id': 4, 'source_id': 200}]
I understand the first step is sorting the list:
sorted_sources_list = sorted(l, key=lambda source: source['id'])
But I don't know how to delete the duplicates (entries with the same source_id) that have the greater id.
You can iterate through the items and add each one to a new, initially empty list, but before you add it, check whether an item with the same source_id already exists in the new list. If it does, it's a duplicate and you skip it; if it doesn't, it's not a duplicate and you add it.
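The idea above can be sketched roughly as follows. This version uses a dict keyed by source_id instead of a second list, which is a substitution made here to keep the lookup cheap. The function name `keep_smallest_id` is hypothetical.

```python
def keep_smallest_id(records):
    """Keep one record per source_id: the one with the smallest id."""
    best = {}
    for rec in records:
        key = rec['source_id']
        # First time we see this source_id, or this record has a smaller id
        if key not in best or rec['id'] < best[key]['id']:
            best[key] = rec
    # Sort by id to match the ordering shown in the question
    return sorted(best.values(), key=lambda rec: rec['id'])


l = [{'id': 2, 'source_id': 100},
     {'id': 1, 'source_id': 100},
     {'id': 3, 'source_id': 1234},
     {'id': 5, 'source_id': 200},
     {'id': 4, 'source_id': 200}]

result = keep_smallest_id(l)
print(result)
```

Because the comparison is on id, this does not depend on the order of the input list.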
Try:
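from itertools import groupby
getId = lambda x: x.get("source_id", None)
# Group by source_id, then keep the entry with the smallest id in each group
l = [min(group, key=lambda d: d['id']) for _, group in groupby(sorted(l, key=getId), getId)]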
Outputs:
[{'id': 1, 'source_id': 100}, {'id': 4, 'source_id': 200}, {'id': 3, 'source_id': 1234}]
How best can I split this column in pandas to get a column with the genre? For [{'id': 16, 'name': 'Animation'}], that would be 'Animation'.
Use it like this:
genre = {'id': 16, 'name': 'Animation'}
df = pd.DataFrame([genre])  # wrap the dict in a list; a dict of plain scalars needs an index
df['name']
and you will get the desired output.
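If the column holds a list of such dicts per row, a sketch along these lines could produce a genre column directly. The sample frame and the column names `genre_names` and `first_genre` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: each cell is a list of genre dicts
df = pd.DataFrame({'genres': [
    [{'id': 16, 'name': 'Animation'}],
    [{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}],
]})

# All genre names per row, as a list
df['genre_names'] = df['genres'].apply(lambda genres: [g['name'] for g in genres])

# Only the first genre per row (None when the list is empty)
df['first_genre'] = df['genre_names'].apply(lambda names: names[0] if names else None)

print(df[['genre_names', 'first_genre']])
```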
I have these two lists of dicts, as below:
src_dict = [{"id": 111, "name": "sam"}, {"id": 333, "name": "name_changed_to_not_ross"}, {"id": 444, "name": "jack"}]
dest_dict = [{"rec_id":"abc","fields":{"id":111,"name":"sam"}},
{"rec_id":"pqr","fields":{"id":222,"name":"john"}},
{"rec_id":"xyz","fields":{"id":333,"name":"ross"}}]
I need to create an insert list and an update list, but in different formats. When building the update list, I need to combine the corresponding values from both lists of dicts. (This is partly possible with blhsing's approach, but it does not give the complete result.)
ids = {d['fields']['id'] for d in dest_dict}
records_update = [d for d in src_dict if d['id'] in ids]
records_insert = [d for d in src_dict if d['id'] not in ids]
Here is the result
records_insert (this is good)
[{'id': 444, 'name': 'jack'}]
records_update (this is the issue)
[{'id': 111, 'name': 'sam'}, {'id': 333, 'name': 'ross'}]
This is the output I expect in the records_update:
records_update (expected output)
[{'rec_id': 'abc', 'fields': {'id': 111, 'name': 'sam'}},
{'rec_id': 'xyz', 'fields': {'id': 333, 'name': 'name_changed_to_not_ross'}}]
Thanks!
I think you just got the order wrong.
ids = {d['id'] for d in src_dict}
records_update = [d for d in dest_dict if d['fields']['id'] in ids]
>>> records_update
[{'rec_id': 'abc', 'fields': {'id': 111, 'name': 'sam'}}, {'rec_id': 'xyz', 'fields': {'id': 333, 'name': 'ross'}}]
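The output above still carries the old name 'ross', whereas the expected output takes the name from src_dict. One way to get there, sketched under the assumption that ids are unique within each list, is to index src_dict by id and rebuild each matching record. The name `src_by_id` is introduced here for illustration.

```python
src_dict = [{"id": 111, "name": "sam"},
            {"id": 333, "name": "name_changed_to_not_ross"},
            {"id": 444, "name": "jack"}]
dest_dict = [{"rec_id": "abc", "fields": {"id": 111, "name": "sam"}},
             {"rec_id": "pqr", "fields": {"id": 222, "name": "john"}},
             {"rec_id": "xyz", "fields": {"id": 333, "name": "ross"}}]

# Index the source records by id (assumes ids are unique)
src_by_id = {d["id"]: d for d in src_dict}
dest_ids = {d["fields"]["id"] for d in dest_dict}

# Keep rec_id from the destination, take the field values from the source
records_update = [
    {"rec_id": d["rec_id"], "fields": src_by_id[d["fields"]["id"]]}
    for d in dest_dict
    if d["fields"]["id"] in src_by_id
]

# Source records with no matching destination record
records_insert = [d for d in src_dict if d["id"] not in dest_ids]

print(records_update)
print(records_insert)
```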
I have a dictionary:
dict={
{'dept': 'ECE', 'id': 1, 'name': 'asif', 'City': 'Bangalore'},
{'dept': 'ECE', 'id': 2, 'name': 'iqbal', 'City': 'Kolkata'}
}
Is there any way to filter out the name and dept on the basis of City?
I tried but couldn't find a way.
OK, if I understand your question, and given that the set has been converted to a list, I think something like this could be what you are looking for:
data=[
{'dept': 'ECE', 'id': 1, 'name': 'asif', 'City': 'Bangalore'},
{'dept': 'ECE', 'id': 2, 'name': 'iqbal', 'City': 'Kolkata'},
]
my_keys = ('name','dept',)
my_cities = ['Kolkata',]
my_dicts = [{key:value for key, value in dictionary.items() if key in my_keys} for dictionary in data if dictionary['City'] in my_cities]
print(my_dicts)
I'm fairly new to Python and I don't know how to retrieve a value from an inner dictionary.
This is the value I have in my variable:
variable = {'hosts': 1, 'key':'abc', 'result': {'data':[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]}, 'version': 2}
What I want to do is assign a new variable the number of licenses 'mike' has, for example.
Sorry for such a newbie and apparently simple question, but I've only been using Python for a couple of days and need this working ASAP. I've searched the oracle (Google) and Stack Overflow but haven't been able to find an answer...
PS: Using python3
Working through it and starting with
>>> from pprint import pprint
>>> pprint(variable)
{'hosts': 1,
 'key': 'abc',
 'result': {'data': [{'id': 'john', 'licenses': 2},
                     {'id': 'mike', 'licenses': 1}]},
 'version': 2}
First, let's get to the result dict:
>>> result = variable['result']
>>> pprint(result)
{'data': [{'id': 'john', 'licenses': 2}, {'id': 'mike', 'licenses': 1}]}
and then to its data key:
>>> data = result['data']
>>> pprint(data)
[{'id': 'john', 'licenses': 2}, {'id': 'mike', 'licenses': 1}]
Now, we have to scan that for the 'mike' dict:
>>> for item in data:
...     if item['id'] == 'mike':
...         print(item['licenses'])
...         break
...
1
You could shorten that to:
>>> for item in variable['result']['data']:
...     if item['id'] == 'mike':
...         print(item['licenses'])
...         break
...
1
But much better would be to rearrange your data structure like:
variable = {
    'hosts': 1,
    'version': 2,
    'licenses': {
        'john': 2,
        'mike': 1,
    }
}
Then you could just do:
>>> variable['licenses']['mike']
1
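If the original structure cannot be changed at the source, a lookup like the rearranged one above can still be built once from the nested list. This is a small sketch; the name `licenses` is introduced here and assumes each id appears only once.

```python
variable = {'hosts': 1, 'key': 'abc',
            'result': {'data': [{'licenses': 2, 'id': 'john'},
                                {'licenses': 1, 'id': 'mike'}]},
            'version': 2}

# Map each id to its license count (assumes ids are unique)
licenses = {item['id']: item['licenses'] for item in variable['result']['data']}

mike_licenses = licenses['mike']
print(mike_licenses)
```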
You can use nested references as follows:
variable['result']['data'][1]['licenses'] += 1
variable['result'] returns:
{'data':[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]}
variable['result']['data'] returns:
[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]
variable['result']['data'][1] returns:
{'licenses': 1, 'id':'mike'}
variable['result']['data'][1]['licenses'] returns:
1
which we then increment using += 1.
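The index 1 above relies on 'mike' being the second entry. A sketch that looks the entry up by id instead, so it does not depend on position, could look like this:

```python
variable = {'hosts': 1, 'key': 'abc',
            'result': {'data': [{'licenses': 2, 'id': 'john'},
                                {'licenses': 1, 'id': 'mike'}]},
            'version': 2}

# Find the first entry whose id is 'mike'; None if there is no such entry
entry = next((item for item in variable['result']['data'] if item['id'] == 'mike'), None)

if entry is not None:
    # entry is the same dict object as in the list, so this updates variable too
    entry['licenses'] += 1

print(variable['result']['data'])
```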