How to convert a list of lists into dictionary objects? - python-3.x

I have a list of lists that looks like this:
[[['1', '1#1`']],
[['2', '2#2.com']],
[['3', '3#3.com']],
[['4', '4#4.com']],
[['5', '5#5.com']],
[['6', '6#6']],
[['7', '7#7']],
[['8', '8#8']],
[['8.5', '8.5#8.5']],
[['9', '9#9']],
[['10', '10#10']],
[['11', '11#11']],
[['12', '12#12']],
[['13', '13#13.com']],
[['14', '14#14.com']],
[['15', '15#15.com']],
[['16', '16#16.com']],
[['17', '17#17.com']],
[['18#18.com', '18']],
[['19', '19#19.com']]]
Is there any way I can clean up the list by turning it into a list of dictionaries, like so:
[{'id': '1', 'email': '1#1'}, {'id': '2', 'email': '2#2.com'}]
Ideally, if an email appears in the id position, could it be flipped into the email position?

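You can use a list comprehension. Each pair is used as-is unless its first element contains '#', in which case the pair is reversed so that the email ends up second: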
In [1]: mylist = [[['1', '1#1`']],
...: [['2', '2#2.com']],
...: [['3', '3#3.com']],
...: [['4', '4#4.com']],
...: [['5', '5#5.com']],
...: [['6', '6#6']],
...: [['7', '7#7']],
...: [['8', '8#8']],
...: [['8.5', '8.5#8.5']],
...: [['9', '9#9']],
...: [['10', '10#10']],
...: [['11', '11#11']],
...: [['12', '12#12']],
...: [['13', '13#13.com']],
...: [['14', '14#14.com']],
...: [['15', '15#15.com']],
...: [['16', '16#16.com']],
...: [['17', '17#17.com']],
...: [['18#18.com', '18']],
...: [['19', '19#19.com']]]
In [2]: [{'id': i, 'email': e} for i, e in (pair[0] if '#' not in pair[0][0] else reversed(pair[0]) for pair in mylist)]
Out[2]:
[{'id': '1', 'email': '1#1`'},
{'id': '2', 'email': '2#2.com'},
{'id': '3', 'email': '3#3.com'},
{'id': '4', 'email': '4#4.com'},
{'id': '5', 'email': '5#5.com'},
{'id': '6', 'email': '6#6'},
{'id': '7', 'email': '7#7'},
{'id': '8', 'email': '8#8'},
{'id': '8.5', 'email': '8.5#8.5'},
{'id': '9', 'email': '9#9'},
{'id': '10', 'email': '10#10'},
{'id': '11', 'email': '11#11'},
{'id': '12', 'email': '12#12'},
{'id': '13', 'email': '13#13.com'},
{'id': '14', 'email': '14#14.com'},
{'id': '15', 'email': '15#15.com'},
{'id': '16', 'email': '16#16.com'},
{'id': '17', 'email': '17#17.com'},
{'id': '18', 'email': '18#18.com'},
{'id': '19', 'email': '19#19.com'}]
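If you have arbitrary nesting, you can flatten everything first and then pair the items up two at a time:
def flatten(lst):
    # Recursively yield every non-list item, however deeply it is nested
    for sub in lst:
        if isinstance(sub, list):
            yield from flatten(sub)
        else:
            yield sub
# zip(*[it]*2) takes consecutive items from one shared iterator, producing (id, email) pairs
[{'id': i, 'email': e} for i, e in (pair if '#' not in pair[0] else reversed(pair) for pair in zip(*[flatten(mylist)]*2))]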
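If the comprehensions feel dense, the same conversion can be written as a plain loop. This is a minimal sketch that, like the answers here, assumes a value containing '#' is the email; the function name pairs_to_records is hypothetical.

```python
def pairs_to_records(nested):
    """Turn [[[id, email]], ...] into [{'id': ..., 'email': ...}, ...]."""
    records = []
    for item in nested:
        first, second = item[0]  # each entry wraps a single [x, y] pair
        # Assumption: the value containing '#' is the email; swap if it came first
        if '#' in first:
            first, second = second, first
        records.append({'id': first, 'email': second})
    return records


mylist = [[['1', '1#1']], [['2', '2#2.com']], [['18#18.com', '18']]]
print(pairs_to_records(mylist))
# [{'id': '1', 'email': '1#1'}, {'id': '2', 'email': '2#2.com'}, {'id': '18', 'email': '18#18.com'}]
```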

Related

ValueError: Invalid broadcasting comparison with block values - how to resolve it in a Pythonic way

Hi, I have two DataFrames and I'm trying to compare their values, but I get a ValueError about broadcasting:
import numpy as np
import pandas as pd
dict_1 = {'a': {0: [{'value': 'A123',
'label': 'Professional'},
{'value': 'B141', 'label': 'Passion'}]},
'b': {0: [{'value': 'B5529',
'label': 'Innovation'},
{'value': 'B3134', 'label': 'Businees Value'},
{'value': 'B3856',
'label': 'Electrofication'},
{'value': 'B3859', 'label': 'Insurance'},
{'value': 'B3856', 'label': 'Requirements'},
{'value': 'B3345', 'label': 'Stories'}]},
"c" : "hello"}
dict_2 = {'a': {0: np.nan},
'b': {0: [{'value': 'B4785',
'label': 'Innovation'},
{'value': 'B4635', 'label': 'Businees Value'},
{'value': 'B1234', 'label': 'Requirements'},
{'value': 'B9853', 'label': 'Stories'}]},
'c': "hello"
}
df1 = pd.DataFrame(dict_1)
df2 = pd.DataFrame(dict_2)
Here I only want to compare rows, not two complete DataFrames, because in my real scenario df1 has shape (500, 2) and df2 has shape (1, 2). So I used the code below to extract the values that differ in each row:
df1[~(df1[['a', 'b', 'c']] == df2[['a', 'b', 'c']].iloc[0])]
The desired result: the single row of df2 should be compared with every row of df1 (in my scenario df1 has more than one row). Where the values are identical the cell should be NaN; otherwise I should get the corresponding value from df1.
You can use DataFrame.mask to replace every cell where the condition is True with np.nan. If df1 and df2 each have a single row:
condition = df1 == df2
df1.mask(condition, other=np.nan)
If df1 has more than one row, comparing df1 == df2 directly fails because the two frames have different shapes. Instead, use apply with a callable that compares each row of df1 with the first (and only) row of df2 and returns True or False for each cell:
dict_1 = {'a':
{0: [{'value': 'A123',
'label': 'Professional'},
{'value': 'B141', 'label': 'Passion'}],
1: [{'value': 'B123',
'label': 'Professional'},
{'value': 'B141', 'label': 'Passion'}]
},
'b': {0: [{'value': 'B5529',
'label': 'Innovation'},
{'value': 'B3134', 'label': 'Businees Value'},
{'value': 'B3856',
'label': 'Electrofication'},
{'value': 'B3859', 'label': 'Insurance'},
{'value': 'B3856', 'label': 'Requirements'},
{'value': 'B3345', 'label': 'Stories'}],
1: [{'value': 'C5529',
'label': 'Innovation'},
{'value': 'B3134', 'label': 'Businees Value'},
{'value': 'B3856',
'label': 'Electrofication'},
{'value': 'B3859', 'label': 'Insurance'},
{'value': 'B3856', 'label': 'Requirements'},
{'value': 'B3345', 'label': 'Stories'}],
},
"c" : {0: "hello", 1: "hola"}}
# New df1 with two rows
df1 = pd.DataFrame(dict_1)
condition = df1.apply(lambda x: x==df2.iloc[0], axis=1)
df1.mask(condition, other=np.nan)
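A more explicit variant, sketched under the assumption that df2 has exactly one row: build the boolean condition cell by cell with Python's ==, which copes with cells holding lists of dicts, then pass it to mask. The small frames below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, trimmed-down frames: df1 has several rows, df2 has exactly one
df1 = pd.DataFrame({
    'a': [[{'value': 'A123', 'label': 'Professional'}],
          [{'value': 'B123', 'label': 'Professional'}]],
    'c': ['hello', 'hola'],
})
df2 = pd.DataFrame({
    'a': [[{'value': 'A123', 'label': 'Professional'}]],
    'c': ['hello'],
})

reference = df2.iloc[0]  # the single row every df1 row is compared with

# True where a df1 cell equals the matching cell of the reference row;
# plain Python == handles cells that hold lists of dicts
condition = pd.DataFrame(
    [[row[col] == reference[col] for col in df1.columns] for _, row in df1.iterrows()],
    index=df1.index,
    columns=df1.columns,
)

result = df1.mask(condition, other=np.nan)  # matching cells become NaN
print(result)
```

Building the condition in Python is slower than a vectorised comparison, but it avoids broadcasting entirely, which is what breaks when cells contain lists.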

Remove some same-index elements from two lists based on one of them

Let's suppose that I have these two lists:
a = [{'id': 3}, {'id': 7}, None, {'id': 1}, {'id': 6}, None]
b = ['5', '5', '3', '5', '3', '5']
I want to filter both lists at the same indices, based only on a: specifically, I want to drop every index where a holds None.
So finally I want to have this:
[{'id': 3}, {'id': 7}, {'id': 1}, {'id': 6}]
['5', '5', '5', '3']
I have written this code:
a_temp = []
b_temp = []
for index, el in enumerate(a):
    if el:
        a_temp.append(a[index])
        b_temp.append(b[index])
a = a_temp[:]
b = b_temp[:]
I am wondering, though, whether there is a more Pythonic way to do this.
This solution:
uses zip() to group corresponding elements of a and b together;
builds 2-tuples of corresponding elements, keeping only those whose element from a is not None;
uses the zip(*iterable) idiom to transpose the result, splitting the sequence of 2-tuples into two tuples, which are assigned to new_a and new_b.
a = [{'id': 3}, {'id': 7}, None, {'id': 1}, {'id': 6}, None]
b = ['5', '5', '3', '5', '3', '5']
new_a, new_b = zip(*((x, y) for x, y in zip(a, b) if x is not None))
# new_a = ({'id': 3}, {'id': 7}, {'id': 1}, {'id': 6})
# new_b = ('5', '5', '5', '3')
If you just want a simple solution, please try:
a = [{'id': 3}, {'id': 7}, None, {'id': 1}, {'id': 6}, None]
b = ['5', '5', '3', '5', '3', '5']
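# Collect the indices where a holds None
n = []
for i in range(len(b)):
    if a[i] is None:
        n.append(i)
# Pop from the highest index down so the remaining indices stay valid
for i in sorted(n, reverse=True):
    a.pop(i)
    b.pop(i)
print(a)  # [{'id': 3}, {'id': 7}, {'id': 1}, {'id': 6}]
print(b)  # ['5', '5', '5', '3']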
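Another option, sketched here with the standard-library itertools.compress, builds a keep/drop mask from a once and applies it to both lists. Unlike the zip(*...) unpacking, it returns lists and still works when every element of a is None (unpacking an empty zip into two names raises a ValueError).

```python
from itertools import compress

a = [{'id': 3}, {'id': 7}, None, {'id': 1}, {'id': 6}, None]
b = ['5', '5', '3', '5', '3', '5']

# One boolean per index: True where the element of a should be kept
keep = [x is not None for x in a]

# compress() yields only the items whose matching selector is True
new_a = list(compress(a, keep))
new_b = list(compress(b, keep))

print(new_a)  # [{'id': 3}, {'id': 7}, {'id': 1}, {'id': 6}]
print(new_b)  # ['5', '5', '5', '3']
```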

Pick two items from a list based on a condition

Here is the simplified version of the problem ;)
Given the following list:
my_list = [{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'mango', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'},
{'name': 'leek', 'type': 'vegetable'}]
How can I pick only two items of each type from the list, to get the following?
filtered = [{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'leek', 'type': 'vegetable'}]
You can use itertools.groupby to group the elements of your list by type and then take only the first 2 elements of each group. Because groupby only groups consecutive items, sort the list by the same key first:
>>> from itertools import groupby
>>> f = lambda k: k['type']
>>> n = 2
>>> res = [grp for _,grps in groupby(sorted(my_list, key=f), f) for grp in list(grps)[:n]]
>>> from pprint import pprint
>>> pprint(res)
[{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'}]
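You can also use groupby directly and then pick the first 2 items from each group. Without sorting, this relies on items of the same type already being adjacent in my_list, as they are here: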
from itertools import groupby
a = [list(j)[:2] for i, j in groupby(my_list, key = lambda x: x['type'])]
print(a)
[[{'name': 'apple', 'type': 'fruit'}, {'name': 'orange', 'type': 'fruit'}],
[{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'}]]
sum(a,[])
Out[299]:
[{'name': 'apple', 'type': 'fruit'},
{'name': 'orange', 'type': 'fruit'},
{'name': 'tomato', 'type': 'vegetable'},
{'name': 'potato', 'type': 'vegetable'}]
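If the list is not already grouped and you want to avoid sorting (sorting emits the groups in sorted order of type rather than in their original order), a single pass with a per-type counter does the same job. This is a sketch; the helper name first_n_per_type is hypothetical.

```python
from collections import defaultdict

my_list = [{'name': 'apple', 'type': 'fruit'},
           {'name': 'orange', 'type': 'fruit'},
           {'name': 'mango', 'type': 'fruit'},
           {'name': 'tomato', 'type': 'vegetable'},
           {'name': 'potato', 'type': 'vegetable'},
           {'name': 'leek', 'type': 'vegetable'}]


def first_n_per_type(items, n=2):
    """Keep at most n items of each 'type', in their original order."""
    kept = defaultdict(int)  # type -> number of items kept so far
    result = []
    for item in items:
        if kept[item['type']] < n:
            result.append(item)
            kept[item['type']] += 1
    return result


print(first_n_per_type(my_list))
# [{'name': 'apple', 'type': 'fruit'}, {'name': 'orange', 'type': 'fruit'},
#  {'name': 'tomato', 'type': 'vegetable'}, {'name': 'potato', 'type': 'vegetable'}]
```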

How to make a collection of lists from dicts having the same id?

[{'id': 6, 'name': 'Jorge'}, {'id': 6, 'name': 'Matthews'}, {'id': 6, 'name': 'Matthews'}, {'id': 7, 'name': 'Christine'}, {'id': 7, 'name': 'Smith'}, {'id': 7, 'name': 'Chris'}]
I want to collect the names that share the same id into a list, like this:
[{'id': 6, 'name': ['Jorge','Matthews','Matthews']}, {'id': 7, 'name': ['Christine','Smith','Chris']}]
L = [{'id': 6, 'name': 'Jorge'}, {'id': 6, 'name': 'Matthews'}, {'id': 6, 'name': 'Matthews'}, {'id': 7, 'name': 'Christine'}, {'id': 7, 'name': 'Smith'}, {'id': 7, 'name': 'Chris'}]
temp = {}
for d in L:
    if d['id'] not in temp:
        temp[d['id']] = []
    temp[d['id']].append(d['name'])
answer = []
for k in sorted(temp):
    answer.append({'id': k, 'name': temp[k]})
You can use itertools.groupby to group all the ids and then just extract the name for each element in the group:
In [1]:
import itertools as it
import operator as op
L = [{'id': 6, 'name': 'Jorge'}, ...]
_id = op.itemgetter('id')
[{'id':k, 'name':[e['name'] for e in g]} for k, g in it.groupby(sorted(L, key=_id), key=_id)]
Out[1]:
[{'id': 6, 'name': ['Jorge', 'Matthews', 'Matthews']},
{'id': 7, 'name': ['Christine', 'Smith', 'Chris']}]
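A variant of the first answer using collections.defaultdict, so a new id starts out with an empty list automatically and no membership check is needed. Unlike groupby, it does not need the input sorted; sorting happens once at the end, over the distinct ids only.

```python
from collections import defaultdict

L = [{'id': 6, 'name': 'Jorge'}, {'id': 6, 'name': 'Matthews'},
     {'id': 6, 'name': 'Matthews'}, {'id': 7, 'name': 'Christine'},
     {'id': 7, 'name': 'Smith'}, {'id': 7, 'name': 'Chris'}]

names_by_id = defaultdict(list)  # id -> list of names; unseen ids start as []
for d in L:
    names_by_id[d['id']].append(d['name'])

answer = [{'id': k, 'name': v} for k, v in sorted(names_by_id.items())]
print(answer)
# [{'id': 6, 'name': ['Jorge', 'Matthews', 'Matthews']}, {'id': 7, 'name': ['Christine', 'Smith', 'Chris']}]
```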

How to extract multiple data points from multiple strings in Python?

I have a dataset that consists of thousands of entries such as the following:
[{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2015',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '392168030'},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2014',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '384356146'},
....17020-ish rows later.....
{'country': {'id': 'XH', 'value': 'IDA blend'},
'date': '1960',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '163861743'},
...]
I want to create a pandas DataFrame with the country 'id' as the rows (y-axis), 'date' as the columns (x-axis), and 'value' as the cell values. I can't figure out the best way to approach this.
EDIT:
Imagine a sheet with just numbers ('value' from the dataset). The x-axis columns would be the extracted date and the y-axis rows would be the country id ('id'). The final object would be a dataset that is y*x in size. The numbers would all be of type 'float'.
EDIT 2:
The dataset represents ~304 countries from 1960 - 2016, so there are approximately 304 * 56 = 17024 entries in the dataset. I need to store the 'value' (where for entry 2, value = 392168030) with respect to each country and date.
EDIT 3:
Using the above data, an example output data set would be structured thusly:
      2016   2015        2014        ...   1960
1A    None   392168030   384356146   ...   w
...
XH    x      y           z           ...   163861743
First extract the relevant fields from the original dataset:
import pandas as pd
dataset = [{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2015',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '392168030'},
{'country': {'id': '1A', 'value': 'Arab World'},
'date': '2014',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '384356146'},
{'country': {'id': 'XH', 'value': 'IDA blend'},
'date': '1960',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': '163861743'}]
# One [id, date, value] row per entry
rows = [[entry['country']['id'], entry['date'], entry['value']] for entry in dataset]
df = pd.DataFrame(rows, columns=['id', 'date', 'value'])
Then pivot the DataFrame:
df = df.pivot(index='id',columns='date',values='value')
The output:
date       1960       2014       2015  2016
id
1A          NaN  384356146  392168030  None
XH    163861743        NaN        NaN   NaN
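The question asks for float values, while the pivot above keeps the original strings. A sketch that also converts them, assuming every record has the same nested shape as the sample: pd.json_normalize flattens the nested dicts into dotted column names such as 'country.id', and pd.to_numeric turns the strings into floats and None into NaN.

```python
import pandas as pd

# The four sample records from the question (the real data has ~17,000)
dataset = [
    {'country': {'id': '1A', 'value': 'Arab World'}, 'date': '2016', 'decimal': '0',
     'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'}, 'value': None},
    {'country': {'id': '1A', 'value': 'Arab World'}, 'date': '2015', 'decimal': '0',
     'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'}, 'value': '392168030'},
    {'country': {'id': '1A', 'value': 'Arab World'}, 'date': '2014', 'decimal': '0',
     'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'}, 'value': '384356146'},
    {'country': {'id': 'XH', 'value': 'IDA blend'}, 'date': '1960', 'decimal': '0',
     'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'}, 'value': '163861743'},
]

# Flatten the nested dicts; 'country': {'id': ...} becomes a 'country.id' column
df = pd.json_normalize(dataset)

# Numeric strings become floats and None becomes NaN
df['value'] = pd.to_numeric(df['value'])

# Rows = country id, columns = year, cells = value (missing combinations are NaN)
table = df.pivot(index='country.id', columns='date', values='value')
print(table)
```

pivot raises a ValueError if the same (id, date) pair occurs twice; if the full dataset can contain duplicates, pivot_table with an explicit aggfunc is the usual alternative.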
I had to make a guess about how the "thousands of entries" might look but I came up with this possible solution.
entry1 = {
'country': {'id': '1A', 'value': 'Arab World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None
}
entry2 = {
'country': {'id': '1B', 'value': 'Another World'},
'date': '2016',
'decimal': '0',
'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
'value': None
}
entries = [entry1, entry2]
countries_index = []
date_cols = []
for entry in entries:
    date_cols.append(entry['date'])
    countries_index.append(entry['country']['id'])
import pandas as pd
df = pd.DataFrame(date_cols, columns=['date'], index=countries_index)
This creates a DataFrame, df, which looks like this:
date
1A 2016
1B 2016
