How to use Python3 multiprocessing to append list? - python-3.x

I have an empty list empty_list = []
and 2 other lists: list1 = [[1,2,3],[4,5,6],[7,8,9]] and list2 = [[10,11,12],[13,14,15],[16,17,18]].
I would like to do two things:
pick up [1,2,3] from list1 and [10,11,12] from list2 to make [1,2,3,10,11,12]; [4,5,6] and [13,14,15] to form [4,5,6,13,14,15]; and finally [7,8,9] and [16,17,18] to form [7,8,9,16,17,18];
append listA = [1,2,3,10,11,12], listB = [4,5,6,13,14,15] and listC = [7,8,9,16,17,18] to empty_list along axis 0.
I have done this without multiprocessing, but it is slow, so I would like to ask how to do it with multiprocessing.
I have two naive approaches but do not know how to implement them:
use a Pool:
make a func0 for picking up the sub-lists and merging them, using pool.map(func0, [lst for lst in [list1, list2]]),
make a func1 for appending listA, listB and listC to the empty list, and then pool.map(func1, [lst for lst in [listA, listB, listC]]),
use multiprocessing.Array,
but I have not figured out how to do either.
This sample may not need multiprocessing, but my real lists have thousands of lines.

I am not sure if this can help, but you can avoid some list comprehensions:
empty_list = []
for l1, l2 in zip(list1, list2):
    empty_list.append(l1 + l2)
Let's check time performance with some random lists:
import timeit

code_to_test = """
import numpy as np
list1 = [np.random.randint(0, 10, 100).tolist() for i in range(10_000)]
list2 = [np.random.randint(0, 10, 100).tolist() for i in range(10_000)]
empty_list = []
for l1, l2 in zip(list1, list2):
    empty_list.append(l1 + l2)
"""
elapsed_time = timeit.timeit(code_to_test, number=100) / 100
print(elapsed_time, ' seconds')
0.12564824399999452 seconds

You can use dask to parallelize numpy operations:
import dask.array as da
list1 = da.from_array(list1)
list2 = da.from_array(list2)
result = da.hstack([list1,list2])
result.compute()
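If you specifically want multiprocessing, a minimal sketch with multiprocessing.Pool could look like the following (merge_pair is a hypothetical helper name; the __main__ guard is required on platforms that spawn worker processes):
from multiprocessing import Pool

def merge_pair(pair):
    # Concatenate one sub-list of list1 with the matching sub-list of list2.
    l1, l2 = pair
    return l1 + l2

if __name__ == "__main__":
    list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    list2 = [[10, 11, 12], [13, 14, 15], [16, 17, 18]]
    with Pool() as pool:
        # zip pairs up the matching sub-lists; pool.map farms each pair out to a worker
        result = pool.map(merge_pair, list(zip(list1, list2)))
    print(result)  # [[1, 2, 3, 10, 11, 12], [4, 5, 6, 13, 14, 15], [7, 8, 9, 16, 17, 18]]
Keep in mind that for something as cheap as list concatenation, the pickling and process start-up overhead will usually make this slower than the plain zip loop above; multiprocessing only pays off when the per-item work is heavy.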

Related

How to extract lists having the same element value?

I have a list of lists like this:
data=[["date1","a",14,15],["date1","b",14,15],["date1","c",14,15],["date2","a",14,15],["date2","b",14,15],["date2","c",14,15],["date3","a",14,15],["date3","b",14,15],["date3","c",14,15]]
I want to get the lists that share the same value at index 1. I tried this code, but I got 9 lists when I just need 3 lists.
data=[["date1","a",14,15],["date1","b",14,15],["date1","c",14,15],["date2","a",14,15],["date2","b",14,15],["date2","c",14,15],["date3","a",14,15],["date3","b",14,15],["date3","c",14,15]]
for i in data:
    a = []
    for j in data:
        if (i[1] == j[1]):
            a.append(j)
    print(a)
I expected to get:
["date1","a",14,15],["date2","a",14,15],["date3","a",14,15]
["date1","b",14,15],["date2","b",14,15],["date3","b",14,15]
["date1","c",14,15],["date2","c",14,15],["date3","c",14,15]
from itertools import groupby
from operator import itemgetter

data = [["date1","a",14,15],["date1","b",14,15],["date1","c",14,15],["date2","a",14,15],["date2","b",14,15],["date2","c",14,15],["date3","a",14,15],["date3","b",14,15],["date3","c",14,15]]
print(
    [list(v) for k, v in groupby(sorted(data, key=itemgetter(1)), key=itemgetter(1))]
)
In order for groupby to work, the data has to be sorted.
Depending on your use case, the list instantiation of the iterator might not be needed; I added it to see proper output instead of <itertools._grouper ...>.
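If you would rather not sort the data first, a minimal alternative sketch using collections.defaultdict groups everything in a single pass and preserves the original order within each group (reusing data as defined above):
from collections import defaultdict

groups = defaultdict(list)
for row in data:
    groups[row[1]].append(row)  # group on the value at index 1
print(list(groups.values()))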

Add a value from a list to the end of a value in another list in Python

I am sure that this is a basic task and that the answer is somewhere on Google, but the problem is that I don't know what this is "called", so I am having a bad time trying to google it; almost every page demonstrates merging two lists, which is not what I am after.
I basically have two lists, and I would like to add each value from the list "add" to the end of each word in the list "stuff" and print the result.
add = ['123', '12345']
stuff = ['Cars', 'Suits', 'Drinks']
Desired output
Cars123
Cars12345
Suits123
Suits12345
Drinks123
Drinks12345
Thanks in advance, and sorry again for bad research.
Is there any reason you can't just use a nested loop? It's certainly the simplest solution.
for i in stuff:
    for j in add:
        print(i + j)
gives
Cars123
Cars12345
Suits123
Suits12345
Drinks123
Drinks12345
This assumes that both lists contain strings.
As a side point, shadowing function names like add with variable names is generally a bad idea, so I would consider renaming that variable.
Ignore what I said about combinations in the comment!
>>> from itertools import product
>>> add = ['123', '12345']
>>> stuff = ['Cars', 'Suits', 'Drinks']
>>> for a, s in product(add, stuff):
... a+s
...
'123Cars'
'123Suits'
'123Drinks'
'12345Cars'
'12345Suits'
'12345Drinks'
To match the desired Cars123 ordering, swap the arguments (product(stuff, add)) and concatenate the other way around (s + a).
Addendum: timing information: this code, which compares the nested loop with the product function from itertools, does indeed show that the latter takes more time, by a ratio of about 2.64.
import timeit

def approach_1():
    add = ['123', '12345']; stuff = ['Cars', 'Suits', 'Drinks']
    for a in add:
        for s in stuff:
            a + s

def approach_2():
    from itertools import product
    add = ['123', '12345']; stuff = ['Cars', 'Suits', 'Drinks']
    for a, s in product(add, stuff):
        a + s

t1 = timeit.timeit('approach_1()', 'from __main__ import approach_1', number=10000)
t2 = timeit.timeit('approach_2()', 'from __main__ import approach_2', number=10000)
print(t2 / t1)
You need two for loops for that:
for stuff_element in stuff:
    for add_element in add:
        print(stuff_element + add_element)
Try this:
for i in stuff:
    for j in add:
        print(i + j)
Let me know if it works
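If you want the combined strings as a list rather than printed, a one-line sketch with a nested list comprehension:
combined = [s + a for s in stuff for a in add]
print(combined)  # ['Cars123', 'Cars12345', 'Suits123', 'Suits12345', 'Drinks123', 'Drinks12345']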

Generating class name list based on class index list

I'm playing with the iris dataset from sklearn.datasets.
I want to generate a list similar to iris_dataset['target'], but containing the name of each class instead of its index.
The way I did it:
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()
y = iris_dataset.target
print("Iris target: \n {}".format(iris_dataset.target))
unique_y = np.unique(y)
class_seq = ['']
class_seq = class_seq * y.shape[0]
for i in range(y.shape[0]):
    for (yy, tn) in zip(unique_y, iris_dataset.target_names):
        if y[i] == yy:
            class_seq[i] = tn
print("Class sequence: \n {}".format(class_seq))
but I would like to do it without looping through all of the elements of y. How can this be done in a better way?
The point is that I need this list for a pandas radviz plot to have a proper legend:
pd.plotting.radviz(iris_DataFrame,'class_seq',color=['blue','red','green'])
And further to have it for any other dataset.
You can do it by looping over iris_dataset.target_names.size instead. That is only 3 iterations, so it should be a lot faster for large y arrays.
class_seq = np.empty(y.shape, dtype=iris_dataset.target_names.dtype)
for i in range(iris_dataset.target_names.size):
    mask = y == i
    class_seq[mask] = iris_dataset.target_names[i]
If you want to have class_seq as a list: class_seq = list(class_seq)
You can do it with a list comprehension:
class_seq = [iris_dataset.target_names[i] for i in iris_dataset.target]
or by using map:
class_seq = list(map(lambda x: iris_dataset.target_names[x], iris_dataset.target))
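A fully vectorized alternative, a sketch relying on the fact that target_names is a NumPy array, so indexing it with the integer target array maps every index to its name in one step:
from sklearn.datasets import load_iris

iris_dataset = load_iris()
# NumPy fancy indexing: look up all the names at once, no Python-level loop.
class_seq = iris_dataset.target_names[iris_dataset.target].tolist()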

How to speed up for loop execution using multiprocessing in python

I have two lists. List A contains 500 words. List B contains 10000 words. I am trying to find similar words for List A with respect to List B. I am using Spacy's similarity function.
The problem I am facing is that it takes ages to compute. I am new to multiprocessing, hence this request for help.
How do I speed up the execution of the for-loop part through multiprocessing in Python?
The following is my code.
ListA = ['Dell', 'GPU',......]  # 500 words list
ListB = ['Docker','Ec2'.......]  # 10000 words list
s_words = []
for token1 in ListB:
    list_to_sort = []
    for token2 in ListA:
        list_to_sort.append((token1, token2, nlp(str(token1)).similarity(nlp(str(token2)))))
    sorted_list = sorted(list_to_sort, key=itemgetter(2), reverse=True)[0][:2]
    s_words.append(sorted_list)
You can use the multiprocessing package, which I hope will reduce your time significantly. See here for a sample code.
Have you tried nlp.pipe()?
You could do something like this:
from operator import itemgetter
import spacy
nlp = spacy.load("en_core_web_lg")
ListA = ['Apples', 'Monkey'] # 500 words lists
ListB = ['Grapefruit', 'Ape', 'Oranges', 'Banana'] # 10000 words lists
s_words = []
docs_a = nlp.pipe(ListA)
docs_b = list(nlp.pipe(ListB))
for token1 in docs_a:
    list_to_sort = []
    for token2 in docs_b:
        list_to_sort.append((token1.text, token2.text, token1.similarity(token2)))
    sorted_list = sorted(list_to_sort, key=itemgetter(2), reverse=True)[0][:2]
    s_words.append(sorted_list)
print(s_words)
That should already speed things up for you. The nlp.pipe() function also has an n_process parameter, which might be what you're looking for.
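For instance, a sketch of loading the larger list with several worker processes (assuming a recent spaCy version, where Language.pipe accepts n_process and batch_size):
# Parse the 10000-word list with 4 worker processes, in batches of 256 texts.
docs_b = list(nlp.pipe(ListB, n_process=4, batch_size=256))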

I could not combine 2 lists into a dictionary using zip()

I just learned about the zip() method from Stack Overflow, but it does not work properly.
def diction():
    import random
    import string
    import itertools
    dictionary_key = {}
    upper_list = []
    string_dictionary_upper = string.ascii_uppercase
    for n in string_dictionary_upper:
        upper_list.append(n)
    upper_list_new = list(random.shuffle(upper_list))
    dictionary_key = dict(zip(upper_list, upper_list_new))
diction()
The error is "'NoneType' object is not iterable", but I could not find out why.
random.shuffle shuffles the list in place and returns None, which is why list(random.shuffle(upper_list)) raises that error. If you want to create a shuffled copy of a list, do so in two steps:
1) Copy the list
2) Shuffle the copy:
upper_list_new = upper_list[:]   # create a copy
random.shuffle(upper_list_new)   # shuffle the copy
The result can then be zipped with other lists.
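Alternatively, random.sample returns a new shuffled list instead of shuffling in place, so the whole function can be rewritten without an explicit copy; a minimal sketch:
import random
import string

def diction():
    upper_list = list(string.ascii_uppercase)
    # random.sample(population, k) with k == len(population) returns a shuffled copy
    upper_list_new = random.sample(upper_list, len(upper_list))
    return dict(zip(upper_list, upper_list_new))

print(diction())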
