I could not combine 2 lists into dictionary using zip() - python-3.x

I just learned about zip() from Stack Overflow, but it does not work properly in my code.
def diction():
    import random
    import string
    import itertools

    dictionary_key = {}
    upper_list = []
    string_dictionary_upper = string.ascii_uppercase
    for n in string_dictionary_upper:
        upper_list.append(n)
    upper_list_new = list(random.shuffle(upper_list))
    dictionary_key = dict(zip(upper_list, upper_list_new))

diction()
The error is 'NoneType' object is not iterable, but I could not find why.

random.shuffle shuffles the list in place and returns None, which is why list(random.shuffle(upper_list)) raises that error. If you want to create a shuffled copy of a list, do so in two steps:
1) copy the list
2) shuffle the copy:
upper_list_new = upper_list[:] #create a copy
random.shuffle(upper_list_new) #shuffle the copy
The result can then be zipped with other lists.
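Putting it together, a corrected version of the original function could look like this (a sketch; it returns the dict so the result can actually be used, whereas the original built the mapping and discarded it):

```python
import random
import string

def diction():
    # Key list A..Z; shuffle a *copy*, since random.shuffle works
    # in place and returns None.
    upper_list = list(string.ascii_uppercase)
    upper_list_new = upper_list[:]      # copy the list
    random.shuffle(upper_list_new)      # shuffle the copy in place
    return dict(zip(upper_list, upper_list_new))

dictionary_key = diction()
```

The keys come out in alphabetical order and the values are a random permutation of the same letters.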

Related

Automated creation of multiple datasets in Python-Pytables

In my script, I create several datasets manually:
import tables
dset1 = f.create_earray(f.root, "dataset1", atom=tables.Float64Atom(), shape=(0, 2))
dset2 = f.create_earray(f.root, "dataset2", atom=tables.Float64Atom(), shape=(0, 2))
dset3 = f.create_earray(f.root, "dataset3", atom=tables.Float64Atom(), shape=(0, 2))
...
I want to achieve two things:
Automate the above statements to execute in a loop fashion and create any desired (N) datasets
Then I use the .append method sequentially (as given below), which I also want to automate:
dset1.append(np_array1)
dset2.append(np_array2)
dset3.append(np_array3)
...
I will appreciate any assistance.
It's hard to provide specific advice without more details. If you already have the NumPy arrays, you can create the EArray with the data in a single call (using the obj= parameter). Here's a little code snippet that shows how to do this in a loop.
import tables as tb
import numpy as np

with tb.File('SO_64397597.h5', 'w') as h5f:
    arr1 = np.ones((10, 2))
    arr2 = 2. * np.ones((10, 2))
    arr3 = 3. * np.ones((10, 2))
    arr_list = [arr1, arr2, arr3]
    for cnt in range(1, 4):
        h5f.create_earray("/", "dataset" + str(cnt), obj=arr_list[cnt - 1])
The code above doesn't keep the dataset objects around. If you need them, you can access them programmatically with this call:
# pass where as the full path to the node; name not required
ds = h5f.get_node("/dataset1")
# or pass where as the path to the group, and name as the dataset name
ds = h5f.get_node("/", "dataset1")
If you don't have the arrays when you create the datasets, you can create the EArrays in the first loop, then add the np.array data in a second loop. See below:
with tb.File('SO_64397597.h5', 'w') as h5f:
    for cnt in range(1, 4):
        h5f.create_earray("/", "dataset" + str(cnt), atom=tb.Float64Atom(), shape=(0, 2))
    # get array data...
    arr_list = [arr1, arr2, arr3]
    # add array data
    for cnt in range(1, 4):
        h5f.get_node("/", "dataset" + str(cnt)).append(arr_list[cnt - 1])
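The underlying idea in both loops is to replace N numbered variables (dset1, dset2, ...) with names generated in a loop. The same pattern in plain Python, keeping the handles in a dict keyed by the generated name (a hypothetical sketch, no PyTables required):

```python
# Hypothetical sketch: N "datasets" held in a dict instead of N
# numbered variables, so both creation and appending can be looped.
N = 3
datasets = {"dataset" + str(i): [] for i in range(1, N + 1)}

# Append one row to each dataset by name, like dset.append(np_array)
new_rows = {1: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0, 6.0]}
for i in range(1, N + 1):
    datasets["dataset" + str(i)].append(new_rows[i])
```

With PyTables the dict values would be the EArray nodes returned by create_earray (or fetched via get_node), but the looping logic is the same.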

List comprehension requiring values from separate lists for function input, with multiple return values

I have two lists. One of the lists contains many pandas.core.frame.DataFrame objects, named X_train_frames and the other contains many pandas.core.series.Series objects named y_train_frames.
Each value in X_train_frames maps to a label in y_train_frames
I would like to use them in a function together and return a list.
I have tried:
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state = 1, sampling_strategy = 'minority')
X_bal_frames, y_bal_frames = [smote.fit_resample(X_frame, y_frame) for X_frame, y_frame in zip(X_train_frames, y_train_frames)]
I receive the following error:
ValueError: too many values to unpack (expected 2)
I expect to return two lists of SMOTE resampled data in this case:
X_bal_frames will have a list of pandas.core.frame.DataFrames
and
y_bal_frames will have a list of pandas.core.series.Series
zip(*x) transposes a sequence of pairs into two tuples, so each half can be captured separately with the syntax below.
a, b = zip(*x)
Applied to this example:
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state = 1, sampling_strategy = 'minority')
X_bal_frames, y_bal_frames = zip(*[smote.fit_resample(X_frame, y_frame) for X_frame, y_frame in zip(X_train_frames, y_train_frames)])
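To see why this fixes the unpacking error, here is a minimal illustration with placeholder strings standing in for the resampled frames: fit_resample returns an (X, y) pair, so the comprehension yields a list of pairs, and zip(*...) transposes it.

```python
# Each element stands in for one (X_resampled, y_resampled) pair
pairs = [("X1", "y1"), ("X2", "y2"), ("X3", "y3")]

# Without the *, unpacking into two names fails: there are 3 pairs.
# zip(*pairs) transposes to (X1, X2, X3) and (y1, y2, y3).
X_parts, y_parts = zip(*pairs)
```

Note that zip returns tuples; wrap each in list(...) if you need actual lists.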

How to use Python3 multiprocessing to append list?

I have an empty list empty_list = []
and 2 other lists: list1 = [[1,2,3],[4,5,6],[7,8,9]] and list2 = [[10,11,12],[13,14,15],[16,17,18]].
I would like to do two things:
pick up [1,2,3] from list1 and [10,11,12] from list2 to make [1,2,3,10,11,12]; [4,5,6] and [13,14,15] to form [4,5,6,13,14,15]; and finally [7,8,9] and [16,17,18] to form [7,8,9,16,17,18]
append listA = [1,2,3,10,11,12], listB = [4,5,6,13,14,15], listC = [7,8,9,16,17,18] to empty_list with axis=0.
I have done this without multiprocessing, but it is slow. How can I do it with multiprocessing?
I have two naive approaches but do not know how to implement it.
use a pool:
make a func0 for picking up sub-lists and merging them, using pool.map(func0, [lst for lst in [list1, list2, list3]])
make a func1 for appending listA, listB, listC to the empty list, and then pool.map(func1, [lst for lst in [listA, listB, listC]])
use multiprocessing.Array, but I have not figured out how to do it.
This sample may not need multiprocessing, but my real lists have thousands of lines.
I am not sure if this can help, but you can avoid some list comprehensions:
empty_list = []
for l1, l2 in zip(list1, list2):
    empty_list.append(l1 + l2)
Let's check time performance with some random lists:
import timeit
code_to_test = """
import numpy as np
list1 = [np.random.randint(0,10, 100).tolist() for i in range(10_000)]
list2 = [np.random.randint(0,10, 100).tolist() for i in range(10_000)]
empty_list=[]
for l1,l2 in zip(list1,list2):
    empty_list.append(l1+l2)
"""
elapsed_time = timeit.timeit(code_to_test, number=100)/100
print(elapsed_time, ' seconds')
0.12564824399999452 seconds
You can use dask to parallelize numpy operations:
import dask.array as da
list1 = da.from_array(list1)
list2 = da.from_array(list2)
result = da.hstack([list1,list2])
result.compute()
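For completeness, the pairwise merge itself can be parallelized with a standard multiprocessing pool. This is a minimal sketch (assuming a Unix "fork" start method); for plain list concatenation the serial loop above will usually be faster, since inter-process overhead dwarfs the work per pair:

```python
from multiprocessing import get_context
from operator import add

list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
list2 = [[10, 11, 12], [13, 14, 15], [16, 17, 18]]

# Each worker receives one (l1, l2) pair; operator.add concatenates
# the two sub-lists, exactly like l1 + l2 in the serial loop.
with get_context("fork").Pool(2) as pool:
    merged = pool.starmap(add, zip(list1, list2))
```

Parallelism only pays off here if the per-pair work is expensive (heavy computation, not just concatenation).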

Generating class name list based on class index list

I'm playing with iris_dataset from sklearn.datasets
I want to generate a list similar to iris_dataset['target'], but containing the name of each class instead of its index.
The way I did it:
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()
y = iris_dataset.target
print("Iris target: \n {}".format(iris_dataset.target))
unique_y = np.unique(y)
class_seq = ['']
class_seq = class_seq * y.shape[0]
for i in range(y.shape[0]):
    for (yy, tn) in zip(unique_y, iris_dataset.target_names):
        if y[i] == yy:
            class_seq[i] = tn
print("Class sequence: \n {}".format(class_seq))
but I would like to do it not looping through all of the elements of y, how to do it better way?
The outcome is that I need this list for pandas.radviz plot to have a proper legend:
pd.plotting.radviz(iris_DataFrame,'class_seq',color=['blue','red','green'])
And further to have it for any other dataset.
You can do it by looping over iris_dataset.target_names.size instead. That is only 3 iterations here, so it should be a lot faster for large y arrays.
class_seq = np.empty(y.shape, dtype=iris_dataset.target_names.dtype)
for i in range(iris_dataset.target_names.size):
    mask = y == i
    class_seq[mask] = iris_dataset.target_names[i]
If you want to have class_seq as a list: class_seq = list(class_seq)
You can also do it with a list comprehension:
class_seq = [ iris_dataset.target_names[i] for i in iris_dataset.target]
or by using map
class_seq = list(map(lambda x : iris_dataset.target_names[x], iris_dataset.target))
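Since target_names is a NumPy array, the mapping can also be done with fancy (integer-array) indexing, with no Python-level loop at all. A sketch with stand-in arrays in place of the sklearn objects:

```python
import numpy as np

# Stand-ins for iris_dataset.target_names and iris_dataset.target
target_names = np.array(["setosa", "versicolor", "virginica"])
target = np.array([0, 0, 2, 1, 2])

# Indexing an array with an integer array maps every index to its
# name in one vectorized step.
class_seq = target_names[target].tolist()
```

With the real dataset this is simply iris_dataset.target_names[iris_dataset.target].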

List of dictionaries set comprehension calculation

My data structure is a list of dicts. I would like to run a function over the values of certain keys, and then output only a certain number of dictionaries as the result.
from datetime import datetime
from dateutil.parser import parse

today = '05/17/18'
adict = [{'taskid':1,'desc':'task1','complexity':5,'dl':'05/28/18'},
         {'taskid':2,'desc':'task2','complexity':3,'dl':'05/20/18'},
         {'taskid':3,'desc':'task3','complexity':1,'dl':'05/25/18'}]

def conv_tm(t):
    return datetime.strptime(t, '%m/%d/%y')

def days(obj):
    day = conv_tm(today)
    dl = conv_tm(obj)
    dur = (dl - day).days
    if dur < 0:
        dur = 1
    return dur
I found the easiest way to process the dates for the 'dl' key was to run this list comprehension:
vals = [days(i['dl']) for i in adict]
#this also worked, but I didn't like it as much
vals = list(map(lambda x: days(x['dl']), adict))
Now, I need to do 2 things: 1) zip this list back up to the 'dl' key, and 2) return or print a (random) set of 2 dicts without altering the original list, perhaps like so:
{'taskid': 1, 'desc': 'task1', 'dl': 11, 'complexity': 5}
{'taskid': 3, 'desc': 'task3', 'dl': 8, 'complexity': 1}
Cheers
You could produce the new dicts directly like this:
new_dicts = [{**d, 'dl': days(d['dl'])} for d in adict]
If you need vals separately, you can use it to do this as well:
new_dicts = [{**d, 'dl': v} for d, v in zip(adict, vals)]
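For the "(random) set of 2 dicts" part, random.sample draws without replacement and leaves the source list untouched. A sketch using precomputed dl values (11, 3, and 8 days from today for the three tasks above):

```python
import random

new_dicts = [
    {'taskid': 1, 'desc': 'task1', 'complexity': 5, 'dl': 11},
    {'taskid': 2, 'desc': 'task2', 'complexity': 3, 'dl': 3},
    {'taskid': 3, 'desc': 'task3', 'complexity': 1, 'dl': 8},
]

# Pick 2 distinct dicts at random; new_dicts itself is not modified
picked = random.sample(new_dicts, 2)
```

Note that sample returns references to the same dict objects, so mutate copies if the originals must stay pristine.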
