Pick two items from a list based on a condition - python-3.x

Here is a simplified version of the problem.
Given the following list,
my_list = [{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'mango', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'},
{'name': 'leek', 'type': 'vegetable'}]
How can I pick only two items of each type from the list, to get something like the following?
filtered = [{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'leek', 'type': 'vegetable'}]

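You can use itertools.groupby to group the elements of your list by type and then grab only the first 2 elements from each group. groupby only merges consecutive items, so the list is sorted by the same key first.
>>> from itertools import groupby
>>> from pprint import pprint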
>>> f = lambda k: k['type']
>>> n = 2
>>> res = [grp for _,grps in groupby(sorted(my_list, key=f), f) for grp in list(grps)[:n]]
>>> pprint(res)
[{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'}]

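You can group with groupby and then pick the first 2 of each group. This works without sorting only because my_list already has all items of one type next to each other: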
from itertools import groupby
a = [list(j)[:2] for i, j in groupby(my_list, key = lambda x: x['type'])]
print(a)
[[{'name': 'apple', 'type': 'fruit'}, {'name': 'orange', 'type': 'fruit'}],
[{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'}]]
Then flatten the list of lists:
sum(a, [])
[{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'}]
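Both answers rely on groupby, which needs items of the same type to be adjacent. If the list is not grouped and the original order should be kept, a single pass with a per-type counter is one alternative. This is only a minimal sketch; take_per_type is a hypothetical helper name, not something from the answers above.

```python
from collections import defaultdict

my_list = [{'name': 'apple', 'type': 'fruit'},
           {'name': 'orange', 'type': 'fruit'},
           {'name': 'mango', 'type': 'fruit'},
           {'name': 'tomato', 'type': 'vegetable'},
           {'name': 'potato', 'type': 'vegetable'},
           {'name': 'leek', 'type': 'vegetable'}]


def take_per_type(items, n=2, key='type'):
    """Keep at most n items for each value of item[key], in original order."""
    seen = defaultdict(int)  # how many items of each type were kept so far
    picked = []
    for item in items:
        if seen[item[key]] < n:
            picked.append(item)
            seen[item[key]] += 1
    return picked


filtered = take_per_type(my_list)
print(filtered)
```

Like the groupby answers, this keeps the first two items of each type, so it returns potato rather than leek for the vegetables.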

Related

How can I enter a dictionary inside another empty dictionary?

The example code -
innerdict = {}
outerdict = {}
for line in range(1, 10, 2):
    for j in range(1, 10, 2):
        line_tuple = ("Item" + str( line ), int( line ))
        key = line_tuple[1]
        if line ==j:
            outerdict[key] = dict( innerdict )
            outerdict[key] = {'Name': '{0}'.format( "item"+str(j) ), 'Price': '{0}'.format( j )}
print(outerdict)
The output comes out like this:
{1: {'Name': 'item1', 'Price': '1'}, 3: {'Name': 'item3', 'Price': '3'}, 5: {'Name': 'item5', 'Price': '5'}, 7: {'Name': 'item7', 'Price': '7'}, 9: {'Name': 'item9', 'Price': '9'}}
The above output is achievable since it is conventional. I found a lot of online suggestions regarding nested dictionary comprehension.
But I want the output to come out like below-
{{'Name': 'item1', 'Price': '1'}, {'Name': 'item3', 'Price': '3'}, {'Name': 'item5', 'Price': '5'}, {'Name': 'item7', 'Price': '7'}, {'Name': 'item9', 'Price': '9'}}
Thanks in advance!
This is not possible, as the dict objects are not hashable.
{{1:2}} would mean putting a dict {1:2} into a set, which is not possible because of the un-hashability of the objects mentioned above. Better put them in a list:
[{1:2}, {2:3}]
What you want is a list of dictionaries. The literal in your question is not a dictionary: braces around comma-separated values with no key: value pairs form a set literal, and a set cannot hold dicts because they are not hashable.
You can check this by assigning the literal to a variable and then checking its type:
d = {{'Name': 'item1', 'Price': '1'}, {'Name': 'item3', 'Price': '3'}, {'Name': 'item5', 'Price': '5'}, {'Name': 'item7', 'Price': '7'}, {'Name': 'item9', 'Price': '9'}}
print(type(d))
The assignment raises a TypeError (unhashable type: 'dict'), so the print is never reached.
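To make the suggestion concrete, here is a small sketch that builds the same five records as a list of dicts and then shows that putting them into a set fails. The names items and error_message are illustrative only.

```python
# Build the records from the question as a list, which can hold dicts.
items = [{'Name': 'item' + str(j), 'Price': str(j)} for j in range(1, 10, 2)]
print(items)

# Trying to put the same dicts into a set fails, because dicts are unhashable.
error_message = None
try:
    set(items)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```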

create nested object from records oriented dictionary

I have the following data frame:
[{'Name': 'foo', 'Description': 'foobar', 'Value': '5'}, {'Name': 'baz', 'Description': 'foobaz', 'Value': '4'}, {'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}]
And I'd like to create two nested categories. One category for Name, Description keys and another category for Value key. Example of output for one object:
{'details': {'Name': 'foo', 'Description': 'foobar'}, 'stats': { 'Value': '5' }}
So far I have only been able to achieve this by joining each item manually, and I'm pretty sure this is not the right solution.
Here is one solution:
import pandas as pd
data = [{'Name': 'foo', 'Description': 'foobar', 'Value': '5'}, {'Name': 'baz', 'Description': 'foobaz', 'Value': '4'}, {'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}]
df = pd.DataFrame(data)
m = df.to_dict('records')
stats = [{'stats':i.popitem()} for i in m]
details = [{'details':i} for i in m]
g = list(zip(details,stats))
print(*g)
({'details': {'Name': 'foo', 'Description': 'foobar'}}, {'stats': ('Value', '5')}) ({'details': {'Name': 'baz', 'Description': 'foobaz'}}, {'stats': ('Value', '4')}) ({'details': {'Name': 'bar', 'Description': 'foofoo'}}, {'stats': ('Value', '8')})
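The key function here is popitem(), which removes the last-inserted pair from the dictionary and returns it as a (key, value) tuple. That is why 'stats' holds a tuple rather than a dict in the output above, and why 'Value' no longer appears in the 'details' dicts.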
Using a list comprehension (df is the DataFrame built from your records):
from json import dump
result = [{
    'details': {col: row[col] for col in ['Name', 'Description']},
    'stats': {col: row[col] for col in ['Value']}
} for row in df.to_dict(orient='records')]
# Write to file
with open('result.json', 'w') as f:
    dump(result, f)
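Since the data already arrives as a list of record dicts, the same nesting can also be done without pandas. The sketch below assumes every record has the three keys shown in the question; nest is a hypothetical helper name, and the split into detail_keys and stat_keys is just one way to parameterise it.

```python
records = [{'Name': 'foo', 'Description': 'foobar', 'Value': '5'},
           {'Name': 'baz', 'Description': 'foobaz', 'Value': '4'},
           {'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}]


def nest(row, detail_keys=('Name', 'Description'), stat_keys=('Value',)):
    """Return a new dict that splits one flat record into two categories."""
    return {
        'details': {k: row[k] for k in detail_keys},
        'stats': {k: row[k] for k in stat_keys},
    }


nested = [nest(row) for row in records]
print(nested[0])
```

Unlike the popitem() approach, this leaves the original records untouched.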

How to extract multiple data points from multiple strings in Python?

I have a dataset that consists of thousands of entries such as the following:
[{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2015',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '392168030'},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2014',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '384356146'},
....17020-ish rows later.....
{'country': {'id': 'XH', 'value': 'IDA blend'},
'date': '1960',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '163861743'},
...]
I want to create a DataFrame using pandas such that y-axis = 'id' and x-axis = 'date', with 'value' being the stored value. I can't figure out the best way to approach this...
EDIT:
Imagine a sheet with just numbers ('value' from the dataset). The x-axis columns would be the extracted date and the y-axis rows would be the country id ('id'). The final object would be a dataset that is y*x in size. The numbers would all be of type 'float'.
EDIT 2:
The dataset represents ~304 countries from 1960 - 2016, so there are approximately 304 * 56 = 17024 entries in the dataset. I need to store the 'value' (where for entry 2, value = 392168030) with respect to each country and date.
EDIT 3:
Using the above data, an example output data set would be structured thusly:
      2016   2015        2014        ...   1960
1A    None   392168030   384356146   ...   w
...
XH    x      y           z           ...   163861743
First, extract the relevant fields from the original dataset:
import pandas as pd
dataset = [{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2015',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '392168030'},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2014',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '384356146'},
{'country': {'id': 'XH', 'value': 'IDA blend'},
'date': '1960',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '163861743'}]
df = [[entry['country']['id'], entry['date'], entry['value']] for entry in dataset]
df = pd.DataFrame(df, columns=['id','date','value'])
Then pivot the DataFrame:
df = df.pivot(index='id',columns='date',values='value')
The output:
date 1960 2014 2015 2016
id
1A None 384356146 392168030 None
XH 163861743 None None None
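The question also asks for the numbers to be floats, while the pivot above keeps them as strings. One way to handle that is to convert each value while collecting it. The following is a plain-Python sketch of the same id-by-date layout as a nested dict; the entries are trimmed to the fields that are used, and mapping None to NaN is an assumption about how missing values should be treated.

```python
dataset = [
    {'country': {'id': '1A', 'value': 'Arab World'}, 'date': '2016', 'value': None},
    {'country': {'id': '1A', 'value': 'Arab World'}, 'date': '2015', 'value': '392168030'},
    {'country': {'id': '1A', 'value': 'Arab World'}, 'date': '2014', 'value': '384356146'},
    {'country': {'id': 'XH', 'value': 'IDA blend'}, 'date': '1960', 'value': '163861743'},
]

# table[country_id][date] -> float value (NaN where the source has None)
table = {}
for entry in dataset:
    row = table.setdefault(entry['country']['id'], {})
    raw = entry['value']
    row[entry['date']] = float(raw) if raw is not None else float('nan')

print(table['1A']['2015'])
```

If a DataFrame is still wanted, passing this dict to pd.DataFrame.from_dict(table, orient='index') should give ids as rows and dates as columns.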
I had to make a guess about how the "thousands of entries" might look but I came up with this possible solution.
entry1 = {
'country': {'id': '1A', 'value': 'Arab World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None
}
entry2 = {
'country': {'id': '1B', 'value': 'Another World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None
}
entries = [entry1, entry2]
countries_index = []
date_cols = []
for entry in entries:
    date_cols.append(entry['date'])
    countries_index.append(entry['country']['id'])
import pandas as pd
df = pd.DataFrame(date_cols, columns=['date'], index=countries_index)
This creates a data frame, df, which looks like this:
date
1A 2016
1B 2016

Splitting tuple into segments

If I have the following tuple...:
("Year-7 [{'Name': 'Barry', 'Age': 11}, {'Name': 'Larry', 'Age': 11}]",
"Year-8 [{'Name': 'Harry', 'Age': 11}, {'Name': 'Parry', 'Age': 11}]",
"Year-9 [{'Name': 'Sally', 'Age': 11}, {'Name': 'Garry', 'Age': 11}]")
How do I split this up into the following tuples?
("Year-7", "Year-8", "Year-9")
("[{'Name': 'Barry', 'Age': 11}, {'Name': 'Larry', 'Age': 11}]", "[{'Name': 'Harry', 'Age': 11}, {'Name': 'Parry', 'Age': 11}]", "[{'Name': 'Sally', 'Age': 11}, {'Name': 'Garry', 'Age': 11}]")
Thanks in advance,
Jack
.................
t = ("Year-7 [{'Name': 'Barry', 'Age': 11}, {'Name': 'Larry', 'Age': 11}]",
"Year-8 [{'Name': 'Harry', 'Age': 11}, {'Name': 'Parry', 'Age': 11}]",
"Year-9 [{'Name': 'Sally', 'Age': 11}, {'Name': 'Garry', 'Age': 11}]")
tuple([k[7:] for k in list(t)])
Did you also want:
tuple([k[:6] for k in list(t)])
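The slices above depend on every label being exactly six characters long. If that is not guaranteed (for example "Year-10"), splitting each string once at the first space is a more forgiving variant. This is a sketch under the assumption that the label never contains a space; the ast.literal_eval step is optional and only applies if the list-of-dict text should become real Python objects.

```python
import ast

t = ("Year-7 [{'Name': 'Barry', 'Age': 11}, {'Name': 'Larry', 'Age': 11}]",
     "Year-8 [{'Name': 'Harry', 'Age': 11}, {'Name': 'Parry', 'Age': 11}]",
     "Year-9 [{'Name': 'Sally', 'Age': 11}, {'Name': 'Garry', 'Age': 11}]")

# Split each string once, at the first space: [label, rest]
pairs = [s.split(' ', 1) for s in t]

# Transpose the pairs into one tuple of labels and one tuple of payload strings
labels, payloads = zip(*pairs)

# Optionally turn each payload string into an actual list of dicts
parsed = [ast.literal_eval(p) for p in payloads]

print(labels)
print(payloads[0])
```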

pandas dictionary to list of dictionary key/values

I am trying to build a list of dictionaries from a data frame with a list comprehension, and this is what I have after a few attempts. Is there pandas functionality I am missing that would avoid the for loop?
import numpy as np
import pandas as pd
file = ['a.txt','a.txt','b.txt','c.txt']
year = ['2016','2017','2016','2018']
paper = ['Biology','Biology','Math','English']
name = ['Ann,Matt','Maya','Rob',np.nan]
df = pd.DataFrame({
    'file':file,
    'year':year,
    'paper':paper,
    'name':name
})
df
dfd = df.to_dict('index')
dfd
>>>
{0: {'file': 'a.txt', 'year': '2016', 'paper': 'Biology', 'name': 'Ann,Matt'},
1: {'file': 'a.txt', 'year': '2017', 'paper': 'Biology', 'name': 'Maya'},
2: {'file': 'b.txt', 'year': '2016', 'paper': 'Math', 'name': 'Rob'},
3: {'file': 'c.txt', 'year': '2018', 'paper': 'English', 'name': nan}}
Tried:
d = []
for i in dfd.items():
d.append(i)
>>>
[(0,
{'file': 'a.txt', 'year': '2016', 'paper': 'Biology', 'name': 'Ann,Matt'}),
(1, {'file': 'a.txt', 'year': '2017', 'paper': 'Biology', 'name': 'Maya'}),
(2, {'file': 'b.txt', 'year': '2016', 'paper': 'Math', 'name': 'Rob'}),
(3, {'file': 'c.txt', 'year': '2018', 'paper': 'English', 'name': nan})]
I am trying to get it like this instead; what I get above is in tuple format.
[{'file': 'a.txt', 'year': '2016', 'paper': 'Biology', 'name': 'Ann,Matt'},
{'file': 'a.txt', 'year': '2017', 'paper': 'Biology', 'name': 'Maya'},
{'file': 'b.txt', 'year': '2016', 'paper': 'Math', 'name': 'Rob'},
{'file': 'c.txt', 'year': '2018', 'paper': 'English', 'name': nan}]
You almost had it. dfd.items() iterates over the keys and values of your dfd dict together, so you can ignore the key part of each tuple and keep only the value in a list comprehension, like this:
d = [v for k,v in dfd.items()]
I just tested that with your data, and it gives the output you want.
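Because the keys are being thrown away anyway, the same list can be taken straight from the dict's values. The snippet below is a small sketch with a shortened, hand-written copy of dfd so that it runs without pandas.

```python
# Shortened stand-in for the dict produced by df.to_dict('index')
dfd = {
    0: {'file': 'a.txt', 'year': '2016', 'paper': 'Biology', 'name': 'Ann,Matt'},
    1: {'file': 'a.txt', 'year': '2017', 'paper': 'Biology', 'name': 'Maya'},
    2: {'file': 'b.txt', 'year': '2016', 'paper': 'Math', 'name': 'Rob'},
}

# The values are already the row dicts, in insertion order
d = list(dfd.values())
print(d)
```

On the pandas side, df.to_dict('records') should produce this list of row dicts directly, without the intermediate index-keyed dict.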
