I could not combine 2 lists into dictionary using zip() - python-3.x

I just learned about zip() from Stack Overflow, but it does not work properly in my code.
def diction():
    import random
    import string
    import itertools

    dictionary_key = {}
    upper_list = []
    string_dictionary_upper = string.ascii_uppercase
    for n in string_dictionary_upper:
        upper_list.append(n)
    upper_list_new = list(random.shuffle(upper_list))
    dictionary_key = dict(zip(upper_list, upper_list_new))

diction()
The error is 'NoneType' object is not iterable, but I could not find why.

random.shuffle shuffles the list in place and returns None, which is why list(random.shuffle(upper_list)) raises that error. If you want to create a shuffled copy of a list, do so in two steps:
1) copy the list
2) shuffle the copy:
upper_list_new = upper_list[:] #create a copy
random.shuffle(upper_list_new) #shuffle the copy
The result can then be zipped with other lists.
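Putting it together, a corrected version of the original function could look like this (a sketch; it returns the dict so the result can actually be used, whereas the original built the mapping and discarded it):

```python
import random
import string

def diction():
    # Key list A..Z; shuffle a *copy*, since random.shuffle works
    # in place and returns None.
    upper_list = list(string.ascii_uppercase)
    upper_list_new = upper_list[:]      # copy the list
    random.shuffle(upper_list_new)      # shuffle the copy in place
    return dict(zip(upper_list, upper_list_new))

dictionary_key = diction()
```

The keys come out in alphabetical order and the values are a random permutation of the same letters.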

Related

Automated creation of multiple datasets in Python-Pytables

In my script, I create several datasets manually:
import tables
dset1 = f.create_earray(f.root, "dataset1", atom=tables.Float64Atom(), shape=(0, 2))
dset2 = f.create_earray(f.root, "dataset2", atom=tables.Float64Atom(), shape=(0, 2))
dset3 = f.create_earray(f.root, "dataset3", atom=tables.Float64Atom(), shape=(0, 2))
...
I want to achieve two things:
Automate the above statements to execute in a loop fashion and create any desired (N) datasets
Then I use the .append method sequentially (as given below), which I also want to automate:
dset1.append(np_array1)
dset2.append(np_array2)
dset3.append(np_array3)
...
I will appreciate any assistance.
It's hard to provide specific advice without more details. If you already have the NumPy arrays, you can create the EArray with the data in a single call (using the obj= parameter). Here's a little code snippet that shows how to do this in a loop.
import tables as tb
import numpy as np

with tb.File('SO_64397597.h5', 'w') as h5f:
    arr1 = np.ones((10, 2))
    arr2 = 2. * np.ones((10, 2))
    arr3 = 3. * np.ones((10, 2))
    arr_list = [arr1, arr2, arr3]
    for cnt in range(1, 4):
        h5f.create_earray("/", "dataset" + str(cnt), obj=arr_list[cnt - 1])
The code above doesn't keep the dataset objects around. If you need them, you can access them programmatically with this call:
# pass where as the full path to the node; name not required
ds = h5f.get_node("/dataset1")
# or pass where as the path to the group, and name as the dataset name
ds = h5f.get_node("/", "dataset1")
If you don't have the arrays when you create the datasets, you can create the EArrays in the first loop, then add the np.array data in a second loop. See below:
with tb.File('SO_64397597.h5', 'w') as h5f:
    for cnt in range(1, 4):
        h5f.create_earray("/", "dataset" + str(cnt), atom=tb.Float64Atom(), shape=(0, 2))
    # get array data...
    arr_list = [arr1, arr2, arr3]
    # add array data
    for cnt in range(1, 4):
        h5f.get_node("/", "dataset" + str(cnt)).append(arr_list[cnt - 1])
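The underlying idea in both loops is to replace N numbered variables (dset1, dset2, ...) with names generated in a loop. The same pattern in plain Python, keeping the handles in a dict keyed by the generated name (a hypothetical sketch, no PyTables required):

```python
# Hypothetical sketch: N "datasets" held in a dict instead of N
# numbered variables, so both creation and appending can be looped.
N = 3
datasets = {"dataset" + str(i): [] for i in range(1, N + 1)}

# Append one row to each dataset by name, like dset.append(np_array)
new_rows = {1: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0, 6.0]}
for i in range(1, N + 1):
    datasets["dataset" + str(i)].append(new_rows[i])
```

With PyTables the dict values would be the EArray nodes returned by create_earray (or fetched via get_node), but the looping logic is the same.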

List comprehension requiring values from separate lists for function input, with multiple return values

I have two lists. One of the lists contains many pandas.core.frame.DataFrame objects, named X_train_frames and the other contains many pandas.core.series.Series objects named y_train_frames.
Each value in X_train_frames maps to a label in y_train_frames
I would like to use them in a function together and return a list.
I have tried:
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state = 1, sampling_strategy = 'minority')
X_bal_frames, y_bal_frames = [smote.fit_resample(X_frame, y_frame) for X_frame, y_frame in zip(X_train_frames, y_train_frames)]
I receive the following error:
ValueError: too many values to unpack (expected 2)
I expect to return two lists of SMOTE resampled data in this case:
X_bal_frames will have a list of pandas.core.frame.DataFrames
and
y_bal_frames will have a list of pandas.core.series.Series
zip(*x) transposes a sequence of pairs into two tuples, so each half can be captured separately with the syntax below.
a, b = zip(*x)
Applied to this example:
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state = 1, sampling_strategy = 'minority')
X_bal_frames, y_bal_frames = zip(*[smote.fit_resample(X_frame, y_frame) for X_frame, y_frame in zip(X_train_frames, y_train_frames)])
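To see why this fixes the unpacking error, here is a minimal illustration with placeholder strings standing in for the resampled frames: fit_resample returns an (X, y) pair, so the comprehension yields a list of pairs, and zip(*...) transposes it.

```python
# Each element stands in for one (X_resampled, y_resampled) pair
pairs = [("X1", "y1"), ("X2", "y2"), ("X3", "y3")]

# Without the *, unpacking into two names fails: there are 3 pairs.
# zip(*pairs) transposes to (X1, X2, X3) and (y1, y2, y3).
X_parts, y_parts = zip(*pairs)
```

Note that zip returns tuples; wrap each in list(...) if you need actual lists.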

How to use Python3 multiprocessing to append list?

I have an empty list empty_list = []
and 2 other lists: list1 = [[1,2,3],[4,5,6],[7,8,9]] and list2 = [[10,11,12],[13,14,15],[16,17,18]].
I would like to do two things:
pick up [1,2,3] from list1 and [10,11,12] from list2 to make [1,2,3,10,11,12]; [4,5,6] and [13,14,15] to form [4,5,6,13,14,15]; and finally [7,8,9] and [16,17,18] to form [7,8,9,16,17,18]
append listA = [1,2,3,10,11,12], listB = [4,5,6,13,14,15], listC = [7,8,9,16,17,18] to empty_list with axis=0.
I have done this without multiprocessing, but it is slow. How can I do it with multiprocessing?
I have two naive approaches but do not know how to implement it.
use a pool:
make a func0 for picking up sub-lists and merging them, using pool.map(func0, [lst for lst in [list1, list2, list3]])
make a func1 for appending listA, listB, listC to the empty list, and then pool.map(func1, [lst for lst in [listA, listB, listC]])
use multiprocessing.Array, but I have not figured out how to do it.
This sample may not need multiprocessing, but my real lists have thousands of lines.
I am not sure if this can help, but you can avoid some list comprehensions:
empty_list = []
for l1, l2 in zip(list1, list2):
    empty_list.append(l1 + l2)
Let's check time performance with some random lists:
import timeit
code_to_test = """
import numpy as np
list1 = [np.random.randint(0,10, 100).tolist() for i in range(10_000)]
list2 = [np.random.randint(0,10, 100).tolist() for i in range(10_000)]
empty_list=[]
for l1,l2 in zip(list1,list2):
    empty_list.append(l1+l2)
"""
elapsed_time = timeit.timeit(code_to_test, number=100)/100
print(elapsed_time, ' seconds')
0.12564824399999452 seconds
You can use dask to parallelize numpy operations:
import dask.array as da
list1 = da.from_array(list1)
list2 = da.from_array(list2)
result = da.hstack([list1,list2])
result.compute()
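For completeness, the pairwise merge itself can be parallelized with a standard multiprocessing pool. This is a minimal sketch (assuming a Unix "fork" start method); for plain list concatenation the serial loop above will usually be faster, since inter-process overhead dwarfs the work per pair:

```python
from multiprocessing import get_context
from operator import add

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
list2 = [[10, 11, 12], [13, 14, 15], [16, 17, 18]]

# Each worker receives one (l1, l2) pair; operator.add concatenates
# the two sub-lists, exactly like l1 + l2 in the serial loop.
with get_context("fork").Pool(2) as pool:
    merged = pool.starmap(add, zip(list1, list2))
```

Parallelism only pays off here if the per-pair work is expensive (heavy computation, not just concatenation).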

Generating class name list based on class index list

I'm playing with iris_dataset from sklearn.datasets
I want to generate a list similar to iris_dataset['target'], but containing the name of each class instead of its index.
The way I did it:
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()
y = iris_dataset.target
print("Iris target: \n {}".format(iris_dataset.target))
unique_y = np.unique(y)
class_seq = ['']
class_seq = class_seq * y.shape[0]
for i in range(y.shape[0]):
    for (yy, tn) in zip(unique_y, iris_dataset.target_names):
        if y[i] == yy:
            class_seq[i] = tn
print("Class sequence: \n {}".format(class_seq))
but I would like to do it not looping through all of the elements of y, how to do it better way?
The outcome is that I need this list for pandas.radviz plot to have a proper legend:
pd.plotting.radviz(iris_DataFrame,'class_seq',color=['blue','red','green'])
And further to have it for any other dataset.
You can do it by looping over iris_dataset.target_names.size instead. That is only 3 iterations here, so it should be a lot faster for large y arrays.
class_seq = np.empty(y.shape, dtype=iris_dataset.target_names.dtype)
for i in range(iris_dataset.target_names.size):
    mask = y == i
    class_seq[mask] = iris_dataset.target_names[i]
If you want to have class_seq as a list: class_seq = list(class_seq)
You can also do it with a list comprehension:
class_seq = [ iris_dataset.target_names[i] for i in iris_dataset.target]
or by using map
class_seq = list(map(lambda x : iris_dataset.target_names[x], iris_dataset.target))
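Since target_names is a NumPy array, the mapping can also be done with fancy (integer-array) indexing, with no Python-level loop at all. A sketch with stand-in arrays in place of the sklearn objects:

```python
import numpy as np

# Stand-ins for iris_dataset.target_names and iris_dataset.target
target_names = np.array(["setosa", "versicolor", "virginica"])
target = np.array([0, 0, 2, 1, 2])

# Indexing an array with an integer array maps every index to its
# name in one vectorized step.
class_seq = target_names[target].tolist()
```

With the real dataset this is simply iris_dataset.target_names[iris_dataset.target].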

List of dictionaries set comprehension calculation

My data structure is a list of dicts. I would like to run a function over the values of certain keys, and then output only a certain number of dictionaries as the result.
from datetime import datetime
from dateutil.parser import parse

today = '05/17/18'
adict = [{'taskid':1,'desc':'task1','complexity':5,'dl':'05/28/18'},
         {'taskid':2,'desc':'task2','complexity':3,'dl':'05/20/18'},
         {'taskid':3,'desc':'task3','complexity':1,'dl':'05/25/18'}]

def conv_tm(t):
    return datetime.strptime(t, '%m/%d/%y')

def days(obj):
    day = conv_tm(today)
    dl = conv_tm(obj)
    dur = (dl - day).days
    if dur < 0:
        dur = 1
    return dur
I found the easiest way to process the dates for the 'dl' key was to run this list comprehension:
vals = [days(i['dl']) for i in adict]
#this also worked, but I didn't like it as much
vals = list(map(lambda x: days(x['dl']), adict))
Now, I need to do 2 things: 1) zip this list back up to the 'dl' key, and 2) return or print a (random) set of 2 dicts without altering the original list, perhaps like so:
{'taskid': 1, 'desc': 'task1', 'dl': 11, 'complexity': 5}
{'taskid': 3, 'desc': 'task3', 'dl': 8, 'complexity': 1}
Cheers
You could produce the new dicts directly like this:
new_dicts = [{**d, 'dl': days(d['dl'])} for d in adict]
If you need vals separately, you can use it to do this as well:
new_dicts = [{**d, 'dl': v} for d, v in zip(adict, vals)]
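For the "(random) set of 2 dicts" part, random.sample draws without replacement and leaves the source list untouched. A sketch using precomputed dl values (11, 3, and 8 days from today for the three tasks above):

```python
import random

new_dicts = [
    {'taskid': 1, 'desc': 'task1', 'complexity': 5, 'dl': 11},
    {'taskid': 2, 'desc': 'task2', 'complexity': 3, 'dl': 3},
    {'taskid': 3, 'desc': 'task3', 'complexity': 1, 'dl': 8},
]

# Pick 2 distinct dicts at random; new_dicts itself is not modified
picked = random.sample(new_dicts, 2)
```

Note that sample returns references to the same dict objects, so mutate copies if the originals must stay pristine.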
