I have 5 lists. I would like to map these 5 lists to a list of dictionaries, each one with a key/value pair from each of the 5 lists for every dictionary instance [n]. My first thought is to set up a loop to enumerate each occurrence of a dictionary in the list of dictionaries, but I'm not sure what that might look like. Any thoughts?
name = ["John", "Sally", "Allen", "Nick", "Charles", "Richie", "Derek"]
age = [21, 36, 33, 29, 40, 18, 35]
hometown = ["New York", "Washington", "Philadelphia", "Atlanta", "Miami", "LA", "Seattle"]
favorite_food = ["chicken", "steak", "spaghetti", "fish", "oreos", "hamburger", "cereal"]
pet = ["cat", "fish", "dog", "hamster", "dog", "cat", "snake"]
list of dictionaries such that
D[0]={'name':'John', 'age':'21', 'hometown': 'New York', 'favorite_food':
'chicken', 'pet': 'cat'}
You can use the built-in function zip and list/dict comprehensions for this:
name = ["John", "Sally", "Allen", "Nick", "Charles", "Richie", "Derek"]
age = [21, 36, 33, 29, 40, 18, 35]
hometown = ["New York", "Washington", "Philadelphia", "Atlanta", "Miami", "LA",
"Seattle"]
favorite_food = ["chicken", "steak", "spaghetti", "fish", "oreos", "hamburger", "cereal"]
pet = ["cat", "fish", "dog", "hamster", "dog", "cat", "snake"]
fields = ["name", "age", "hometown", "favorite_food", "pet"]
zipped = zip(name, age, hometown, favorite_food, pet)
d = [{k: v for k, v in zip(fields,el)} for el in zipped]
The zip function will allow you to "pair" up or tuple up several lists.
For the first three attributes, you can do this to get a tuple:
>>> for i in zip(name, age, hometown):
...     print(i)
...
('John', 21, 'New York')
('Sally', 36, 'Washington')
('Allen', 33, 'Philadelphia')
('Nick', 29, 'Atlanta')
('Charles', 40, 'Miami')
('Richie', 18, 'LA')
('Derek', 35, 'Seattle')
If you make a list
L = []
you can add dictionaries to it:
>>> L=[]
>>> for t in zip(name, age, hometown):
...     d = {}
...     d['name'] = t[0]
...     d['age'] = t[1]
...     d['hometown'] = t[2]
...     L.append(d)
...
That's for the first three - extending to the whole lot should be clear.
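As a rough sketch of that extension to all five lists, the keys and their source lists can be kept together in one dict and zipped in a single pass. The names `columns` and `people` are made up for illustration and are not part of the answers above.

```python
# Sample data copied from the question.
name = ["John", "Sally", "Allen", "Nick", "Charles", "Richie", "Derek"]
age = [21, 36, 33, 29, 40, 18, 35]
hometown = ["New York", "Washington", "Philadelphia", "Atlanta", "Miami", "LA", "Seattle"]
favorite_food = ["chicken", "steak", "spaghetti", "fish", "oreos", "hamburger", "cereal"]
pet = ["cat", "fish", "dog", "hamster", "dog", "cat", "snake"]

# Hypothetical helper mapping: dictionary key -> list holding that key's values.
columns = {
    "name": name,
    "age": age,
    "hometown": hometown,
    "favorite_food": favorite_food,
    "pet": pet,
}

# zip(*columns.values()) yields one tuple per person;
# dict(zip(columns, row)) pairs each value with its key.
people = [dict(zip(columns, row)) for row in zip(*columns.values())]

print(people[0])
# {'name': 'John', 'age': 21, 'hometown': 'New York', 'favorite_food': 'chicken', 'pet': 'cat'}
```

Note that zip stops at the shortest input, so this assumes all five lists have the same length.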
Related
that's a basic one but I'm stuck.
I want to create a list with combined strings and numbers like [df_22, df_23, ... df_30].
I have tried
def createList(r1, r2):
    return str(list(range(r1, r2+1)))
so it gives me a list of numbers:
In: mylist = createList(22, 30)
In: mylist
Out: '[22, 23, 24, 25, 26, 27, 28, 29, 30]'
I am not sure how to add 'df_' to it because 'return df + str(list(range(r1, r2+1)))' gives a UFuncTypeError.
and
def createList(r1, r2):
    res = 'df_'
    return res + str(list(range(r1, r2+1)))
In: mylist = createList(22, 30)
In: mylist
Out: 'df_[22, 23, 24, 25, 26, 27, 28, 29, 30]'
I need my list to be
mylist
Out: [df_22, df_23, ... df_30]
This does what your title seems to describe:
def createList(r1, r2):
    return str(['df_%d'%x for x in range(r1, r2+1)])
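Because the function above wraps its result in str(), it returns one string that merely looks like a list. If an actual list of strings is wanted, a minimal variant could drop the str() call. The name `create_list` is an assumption used here only to keep the sketch separate from the original function.

```python
def create_list(r1, r2):
    """Return ['df_<r1>', ..., 'df_<r2>'] as a list of strings (r2 is inclusive)."""
    return [f"df_{x}" for x in range(r1, r2 + 1)]


mylist = create_list(22, 30)
print(mylist)
# ['df_22', 'df_23', 'df_24', 'df_25', 'df_26', 'df_27', 'df_28', 'df_29', 'df_30']
```

Each element is a plain string such as 'df_22'; it is not a reference to a variable of that name.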
I have a dictionary "a" with the keys "x", "y" and "z", each containing integer values.
What is the most efficient way to produce a joint list if I want to merge two keys in the dictionary (assuming the keys hold the same number of values and the values are integers)?
For example, x+y and y+z.
Explanation:
Suppose you have to merge two keys into a new list or a new dict without altering the original dictionary.
Example:
a = {"x" : {1,2,3,....,50},
     "y" : {1,2,3,.....50},
     "z" : {1,2,3,.....50},
}
Desired list:
x+y = [2,4,6,8.....,100]
y+z = [2,4,6,......,100]
A very efficient way is to convert the dictionary to a pandas DataFrame and let its vectorized methods do the job for you:
import pandas as pd
a = {"x" : range(1,51), "y" : range(1,51), "z" : range(1,51)}
df = pd.DataFrame(a)
x_plus_y = (df['x'] + df['y']).to_list()
y_plus_z = (df['y'] + df['z']).to_list()
print(x_plus_y)
#[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100]
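If pandas is not available, the same elementwise sums can be sketched in plain Python with zip. This assumes the values are ordered sequences of equal length (lists here, not sets, since sets have no order to pair elements by).

```python
# Assumption: each key holds an ordered list of the same length.
a = {"x": list(range(1, 51)), "y": list(range(1, 51)), "z": list(range(1, 51))}

# Pair up the i-th elements of two columns and add them.
x_plus_y = [p + q for p, q in zip(a["x"], a["y"])]
y_plus_z = [p + q for p, q in zip(a["y"], a["z"])]

print(x_plus_y[:5], x_plus_y[-1])
# [2, 4, 6, 8, 10] 100
```

The original dictionary is left untouched; only new lists are created.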
It seems like you're trying to mimic a join-type operation. That is not native to python dicts, so if you really want that type of functionality, I'd recommend looking at the pandas library.
If you just want to merge dict keys without more advanced features, this function should help:
from itertools import chain
from collections import Counter
from typing import Dict, List, Set
def merge_keys(data: Dict[str, Set[int]], *merge_list: str) -> Dict[str, List[int]]:
    merged_data = dict()
    # Count how often each value occurs across the sets of the selected keys
    merged_counts = Counter(list(chain(*map(lambda k: list(data.get(k, {})) if k in merge_list else [], data))))
    # value * count gives the sum contributed by that value across the merged keys
    merged_data['+'.join(merge_list)] = [k*v for k,v in merged_counts.items()]
    return merged_data
You can run this with merge_keys(a, "x", "y", "z", ...), where a is the name of your dict. You can pass as many keys as you want ("x", "y", "z", ...), since this function takes a variable number of arguments.
If you want two separate merges in the same dict, all you need to do is (the dict union operator | requires Python 3.9 or later):
b = merge_keys(a, "x", "y") | merge_keys(a, "y", "z")
Note that the order of the keys changes the final merged key ("y+z" vs "z+y") but not the value of their merged sets.
P.S: This was actually a little tricky since the original dict had set values, not lists, which aren't ordered, so you can't just add them elementwise. That's why I used Counter here, in case you were wondering.
I have 3 lists as below:
names = ["paul", "saul", "steve", "chimpy"]
ages = [28, 59, 22, 5]
scores = [59, 85, 55, 60]
And I need to convert them to a dictionary like this:
{'steve': [22, 55, 'fail'], 'saul': [59, 85, 'pass'], 'paul': [28, 59, 'fail'], 'chimpy': [5, 60, 'pass']}
'pass' and 'fail' are coming from the score if it is >=60 or not.
I can do this with a series of for loops but I'm looking for more neat/professional method.
Thank you.
Using zip you can do at least this "condensed" implementation:
res = dict()
for n,a,s in zip(names,ages,scores):
    res[n] = [a,s,'fail' if s <60 else 'pass']
You could do this very neatly using dictionary comprehension:
D = {name: [age, score, 'fail' if score<60 else 'pass'] for name, age, score in zip(names, ages, scores)}
I'm trying to build a batch generator which takes a large Pandas DataFrame as input and output as a given number of rows (batch_size). I practiced on the smaller dataframe with 10 rows to get it work. I have trouble with the generator function where the for loop below works well on the practice dataframe, and spits out the designated batch size:
for i in range(0, len(df), 3):
    lower = i
    upper = i+3
    print(df.iloc[lower:upper])
However, trying to build this into a generator function is proving difficult:
def Generator(batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch
Unfortunately:
next(Generator(e.g.1))
returns the same row over and over again
I'm fairly new to working with this, and I feel I must be missing something, however, I can't spot what.
If anyone could point out what might be the issue I would very much appreciate it.
Edit:
The dataframe is predefined, it is:
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 'Gueniva', 'Know', 'Sara', 'Cat'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 'Jaker', 'Alom', 'Ormon', 'Koozer'],
'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24],
'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df
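Call Generator once, keep the generator object it returns, and call next() on that same object. Otherwise every call to Generator(...) creates a brand-new generator with its own freshly shuffled copy of the dataframe, and next() only ever returns its first batch, which is identical each time if you provide a seed.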
After fixing the indentation problems it works as it should:
import pandas as pd
# I dislike variable scope bleeding into the function, provide df explicitly
def Generator(df, batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah',
'Gueniva', 'Know', 'Sara', 'Cat'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig',
'Jaker', 'Alom', 'Ormon', 'Koozer'],
'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24],
'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
'preTestScore', 'postTestScore'])
# capture a "state" for the generator function
i = iter(Generator(df, 2))
# get the next states from the iterator and print
print(next(i))
print(next(i))
print(next(i))
Output:
first_name last_name age preTestScore postTestScore
8 Sara Ormon 73 26 234
6 Gueniva Jaker 26 52 52
first_name last_name age preTestScore postTestScore
5 Sarah Mornig 53 13 82
9 Cat Koozer 24 26 254
first_name last_name age preTestScore postTestScore
1 Molly Jacobson 52 24 94
2 Tina Ali 36 31 57
Alternatively you can do:
k = Generator(df, 1)
print(next(k))
print(next(k))
print(next(k))
which works as well.
If you do
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))
you create three separate shuffled copies of df. They might show you the same rows, because you only ever print the first batch of each generator before it is discarded.
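The difference can be seen without pandas in a small sketch. The function `batches` is a hypothetical stand-in for Generator that shuffles a plain list instead of a dataframe; the fixed seed is an assumption made so that the two fresh generators are guaranteed to agree.

```python
import random


def batches(items, batch_size, seed=None):
    """Yield the shuffled items in chunks of batch_size (stand-in for Generator)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    for offset in range(0, len(shuffled), batch_size):
        yield shuffled[offset:offset + batch_size]


data = list(range(10))

# One generator object: each next() resumes where the previous call stopped.
gen = batches(data, 3, seed=0)
consumed = [next(gen), next(gen), next(gen), next(gen)]

# A fresh generator per call: next() always returns that generator's first batch.
first_a = next(batches(data, 3, seed=0))
first_b = next(batches(data, 3, seed=0))

print(consumed)          # four different batches covering all ten items
print(first_a, first_b)  # the same first batch twice
```

In practice a for loop over `batches(data, 3)` does the same as repeated next() calls on one object and stops cleanly when the data runs out.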
So I have 2 lists:
list_1 = ['BANK OF AMERICA, NATIONAL ASSOCIATION', 'WELLS FARGO & COMPANY', 'JPMORGAN CHASE & CO.', 'U.S. BANCORP', \
'SCOTTRADE BANK', 'CITIBANK, N.A.', 'PNC Bank N.A.', 'CAPITAL ONE FINANCIAL CORPORATION', 'SUNTRUST BANKS, INC.', 'Paypal Holdings, Inc']
list_2 = [["CAPITAL ONE FINANCIAL CORPORATION", 62],["CITIBANK, N.A.", 78],["JPMORGAN CHASE & CO.", 167], \
["Paypal Holdings, Inc", 56], ["SCOTTRADE BANK", 81],["SUNTRUST BANKS, INC.", 57],["U.S. BANCORP", 83],["WELLS FARGO & COMPANY", 179]]
List_1 is the master list which will not change and List_2 does not have BANK OF AMERICA, NATIONAL ASSOCIATION and PNC Bank N.A.. Anyway I want to compare both lists and if the name matches, I want to get the values of the list in the order of list_1. If the name is not in list_2, then it will put a 0 instead.
This is the expected output:
[0, 179, 167, 83, 81, 78, 0, 62, 57, 56]
If you are not restricted to list as the datatype here, you can convert list_2 to a dict, which makes this a lot easier:
>>> dict_list_2 = {x[0]:x[1] for x in list_2}
>>> d = []
>>> for x in list_1:
...     d.append(dict_list_2.get(x, 0))
...
>>> d
[0, 179, 167, 83, 81, 78, 0, 62, 57, 56]
>>>
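The same lookup can be sketched more compactly with dict() and a list comprehension. The names `lookup` and `values` are made up for this sketch; the data is copied from the question.

```python
list_1 = ['BANK OF AMERICA, NATIONAL ASSOCIATION', 'WELLS FARGO & COMPANY',
          'JPMORGAN CHASE & CO.', 'U.S. BANCORP', 'SCOTTRADE BANK',
          'CITIBANK, N.A.', 'PNC Bank N.A.', 'CAPITAL ONE FINANCIAL CORPORATION',
          'SUNTRUST BANKS, INC.', 'Paypal Holdings, Inc']
list_2 = [["CAPITAL ONE FINANCIAL CORPORATION", 62], ["CITIBANK, N.A.", 78],
          ["JPMORGAN CHASE & CO.", 167], ["Paypal Holdings, Inc", 56],
          ["SCOTTRADE BANK", 81], ["SUNTRUST BANKS, INC.", 57],
          ["U.S. BANCORP", 83], ["WELLS FARGO & COMPANY", 179]]

# dict() accepts any iterable of [key, value] pairs.
lookup = dict(list_2)

# Walk list_1 in order; names missing from list_2 fall back to 0.
values = [lookup.get(name, 0) for name in list_1]

print(values)
# [0, 179, 167, 83, 81, 78, 0, 62, 57, 56]
```

The match is exact and case-sensitive, so a name spelled differently in the two lists would be treated as missing.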