np.random.choice conflict with multiprocessing? multiprocessing inside for loop? - python-3.x

I want to use np.random.choice inside a multiprocessing pool, but I get IndexError: list index out of range. I don't get any error when I use the choice function inside a plain for loop (in series, not in parallel). Any ideas on how to overcome this? This is just a small part of my routine, but parallelising it would greatly improve its speed. My code is shown below. I declare X before anything else in the routine so it works as a global variable, but it is dynamically populated inside the main block. I also noticed that there is some conflict between multiprocessing and the for loop. Any ideas on how to implement this?
from multiprocessing import Pool
from numpy.random import choice
import numpy as np

K = 10
X = []

def function(k):
    global X
    np.random.RandomState(k)
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = choice(aux, 3, replace=False)
    x = X[a] + 0.7*(X[b] - X[c])
    return x

if __name__ == '__main__':
    X = np.arange(K)
    for n in range(K):
        pool = Pool(K)
        w = pool.map(function, np.arange(K))
        pool.close()
        print(w)

Child processes do not share the memory space of the parent process. Since you populate X inside the if __name__ ... clause, the child processes only have access to the X defined at module level, i.e. X = [].
A quick solution would be to shift the line X = np.arange(K) outside the clause like below:
from multiprocessing import Pool
from numpy.random import choice
import numpy as np

K = 10
X = []
X = np.arange(K)

def function(k):
    global X
    np.random.RandomState(k)
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = choice(aux, 3, replace=False)
    x = X[a] + 0.7 * (X[b] - X[c])
    return k, x

if __name__ == '__main__':
    pool = Pool(10)
    w = pool.map(function, np.arange(K))
    pool.close()
    print(w)
Output
[(0, 10.899999999999999), (1, 9.4), (2, 5.7), (3, 7.4), (4, 1.1000000000000005), (5, -1.0999999999999996), (6, 5.6), (7, 3.8), (8, 5.5), (9, -4.8999999999999995)]
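Side note (not part of the original answer): np.random.RandomState(k) on its own creates a generator object that is immediately discarded, so it does not seed the module-level choice call. If per-task reproducibility matters, one sketch is a drop-in replacement for function in the example above that keeps the generator and draws from it:
def function(k):
    rng = np.random.RandomState(k)               # keep the per-task generator ...
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = rng.choice(aux, 3, replace=False)  # ... and draw from it
    return k, X[a] + 0.7 * (X[b] - X[c])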
If you do not want to initialize X for all child processes (memory constraints?), you can use a Manager to store X so it can be shared with the processes without being copied into every child. To pass more than one argument to the child processes, you will also have to use pool.starmap instead of pool.map. Lastly, delete that global X statement; it does nothing useful here, since global is only needed when you plan to modify a global variable from a local scope.
from multiprocessing import Pool, Manager
from numpy.random import choice
import numpy as np

K = 10

def function(X, k):
    np.random.RandomState(k)
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = choice(aux, 3, replace=False)
    x = X[a] + 0.7 * (X[b] - X[c])
    return k, x

if __name__ == '__main__':
    m = Manager()
    X = m.list(np.arange(K))
    pool = Pool(10)
    args = [(X, val) for val in np.arange(K)]
    w = pool.starmap(function, args)
    pool.close()
    print(w)
Output
[(0, -1.5999999999999996), (1, 7.3), (2, 4.9), (3, 1.9000000000000004), (4, 5.5), (5, -1.0999999999999996), (6, 4.800000000000001), (7, 7.3), (8, 0.10000000000000053), (9, 4.7)]
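A third option (a sketch, not from the original answer) is to ship X to each worker once via the pool's initializer, which avoids both copying X per task and the per-access round trips of a Manager proxy; it assumes the workers only read X:
from multiprocessing import Pool
from numpy.random import choice
import numpy as np

K = 10

def init_worker(x):
    # runs once in every worker process; stores X as a worker-local global
    global X
    X = x

def function(k):
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = choice(aux, 3, replace=False)
    return k, X[a] + 0.7 * (X[b] - X[c])

if __name__ == '__main__':
    X = np.arange(K)
    with Pool(K, initializer=init_worker, initargs=(X,)) as pool:
        print(pool.map(function, np.arange(K)))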


Python Multiprocessing (Splitting data in smaller chunks - multiple function arguments)

Note from 22.02.21:
Potentially my problem could also be solved by more efficient memory usage instead of multiprocessing, since I realized that the memory load gets very high and might be a limiting factor here.
I'm trying to reduce the time that my script needs to run by making use of multiprocessing.
In the past I got some good tips about increasing the speed of the function itself (Increase performance of np.where() loop), but now I would like to make use of all cores of a 32-core workstation.
My function compares entries of two lists (X and Y) with reference lists Q and Z. For every element in X/Y, it checks whether X[i] occurs somewhere in Q and whether Y[i] occurs in Z. If X[i] == Q[s] AND Y[i] == Z[s], it returns the index "s".
(Note: My real data consists of DNA sequencing reads and I need to map my reads to the reference.)
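To make the matching rule concrete, here is a tiny made-up illustration (the values are invented, not from the real data):
Q = ['AC', 'GT', 'AC']
Z = ['TT', 'CA', 'GG']
X = ['AC']
Y = ['GG']
# X[0] == Q[s] and Y[0] == Z[s] hold together only for s == 2,
# so the expected result for this element is the index list [2]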
What I tried so far:
Splitting my long lists X and Y into even chunks (n-chunks, where n == cpu_count)
Trying the "concurrent.futures.ProcessPoolExecutor()" to run the function for each "sublist" in parallel and in the end combine the result of each process to one final dictionary (matchdict). (--> see commented out section)
My problem:
All cores are being used when I uncomment the multiprocessing section, but it ends up with an error (index out of range) that I have not yet been able to resolve. (--> Tip: lower N to 1000 and you will immediately see the error without waiting forever)
Does anyone know how to solve this, or can suggest a better approach to use multiprocessing in my code?
Here is the code:
import numpy as np
import multiprocessing
import concurrent.futures

np.random.seed(1)

def matchdictfunc(index, x, y, q, z):  # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict

def split(a, n):  # function to split list in n even parts
    k, m = divmod(len(a), n)
    return list((a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)))

def splitinput(index, X, Y, Q, Z):  # split large lists X and Y in n even parts (n = cpu_count), make new list containing n-times Q and Z (to feed Q and Z for every process)
    cpu_count = multiprocessing.cpu_count()
    # create multiple chunks for X and Y and index:
    index_split = split(index, cpu_count)
    X_split = split(X, cpu_count)
    Y_split = split(Y, cpu_count)
    # create list with several times Q and Z since it needs to be same length as X_split etc:
    Q_mult = []
    Z_mult = []
    for _ in range(cpu_count):
        Q_mult.append(Q)
        Z_mult.append(Z)
    return index_split, X_split, Y_split, Q_mult, Z_mult

# N will finally scale up to 10^9
N = 10000000
M = 300
index = [str(x) for x in list(range(N))]
X = np.random.randint(M, size=N)
Y = np.random.randint(M, size=N)
# Q and Z size is fixed at 120000
Q = np.random.randint(M, size=120000)
Z = np.random.randint(M, size=120000)
# convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
X = np.char.mod('%d', X).tolist()
Y = np.char.mod('%d', Y).tolist()
Q = np.char.mod('%d', Q).tolist()
Z = np.char.mod('%d', Z).tolist()

# single-core:
matchdict = matchdictfunc(index, X, Y, Q, Z)

# split lists to number of processors (cpu_count)
index_split, X_split, Y_split, Q_mult, Z_mult = splitinput(index, X, Y, Q, Z)

## Multiprocessing attempt - FAILS! (index out of range)
# finallist = []
# if __name__ == '__main__':
#     with concurrent.futures.ProcessPoolExecutor() as executor:
#         results = executor.map(matchlistfunc, X_split, Y_split, Q_mult, Z_mult)
#         for result in results:
#             finallist.append(result)
#     matchdict = {}
#     for d in finallist:
#         matchdict.update(d)
Your function matchdictfunc currently has parameters x, y, q, z but in fact does not use them, although in the multiprocessing version it will need to use the X and Y arguments. There is also no need for function splitinput to replicate Q and Z into the returned values Q_mult and Z_mult. Currently, matchdictfunc expects Q and Z to be global variables, and we can arrange for that to be the case in the multiprocessing version by using the initializer and initargs arguments when constructing the pool. You should also move code that does not need to be executed by the sub-processes into the block controlled by if __name__ == '__main__':, such as the array initialization code. These changes result in:
import numpy as np
import multiprocessing
import concurrent.futures

MULTIPROCESSING = True

def init_pool(q, z):
    global Q, Z
    Q = q
    Z = z

def matchdictfunc(index, X, Y):  # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict

def split(a, n):  # function to split list in n even parts
    k, m = divmod(len(a), n)
    return list((a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)))

def splitinput(index, X, Y):  # split large lists X and Y in n even parts (n = cpu_count)
    cpu_count = multiprocessing.cpu_count()
    # create multiple chunks for X and Y and index:
    index_split = split(index, cpu_count)
    X_split = split(X, cpu_count)
    Y_split = split(Y, cpu_count)
    return index_split, X_split, Y_split

def main():
    # following required for non-multiprocessing
    if not MULTIPROCESSING:
        global Q, Z
    np.random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    index = [str(x) for x in list(range(N))]
    X = np.random.randint(M, size=N)
    Y = np.random.randint(M, size=N)
    # Q and Z size is fixed at 120000
    Q = np.random.randint(M, size=120000)
    Z = np.random.randint(M, size=120000)
    # convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
    X = np.char.mod('%d', X).tolist()
    Y = np.char.mod('%d', Y).tolist()
    Q = np.char.mod('%d', Q).tolist()
    Z = np.char.mod('%d', Z).tolist()
    # for non-multiprocessing:
    if not MULTIPROCESSING:
        matchdict = matchdictfunc(index, X, Y)
    else:
        # for multiprocessing:
        # split lists to number of processors (cpu_count)
        index_split, X_split, Y_split = splitinput(index, X, Y)
        with concurrent.futures.ProcessPoolExecutor(initializer=init_pool, initargs=(Q, Z)) as executor:
            finallist = [result for result in executor.map(matchdictfunc, index_split, X_split, Y_split)]
        matchdict = {}
        for d in finallist:
            matchdict.update(d)
    #print(matchdict)

if __name__ == '__main__':
    main()
Note: I tried this for a smaller value of N = 1000 (printing out the results of matchdict) and the multiprocessing version seemed to return the same results. My machine does not have the resources to run with the full value of N without freezing up everything else.
Another Approach
I am working under the assumption that your DNA data is external and the X and Y values can be read n values at a time or can be read in and written out so that this is possible. Then rather than having all the data resident in memory and splitting it up into 32 pieces, I propose that it be read n values at a time and thus broken up into approximately N/n pieces.
In the following code I have switched to using the imap method from class multiprocessing.pool.Pool. The advantage is that it lazily submits tasks to the process pool, that is, the iterable argument doesn't have to be a list or convertible to a list. Instead the pool will iterate over the iterable sending tasks to the pool in chunksize groups. In the code below, I have used a generator function for the argument to imap, which will generate successive X and Y values. Your actual generator function would first open the DNA file (or files) and read in successive portions of the file.
import numpy as np
import multiprocessing

def init_pool(q, z):
    global Q, Z
    Q = q
    Z = z

def matchdictfunc(t):  # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    index, X, Y = t
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict

def next_tuple(n, stop, M):
    start = 0
    while True:
        end = min(start + n, stop)
        index = [str(x) for x in list(range(start, end))]
        x = np.random.randint(M, size=n)
        y = np.random.randint(M, size=n)
        # convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
        x = np.char.mod('%d', x).tolist()
        y = np.char.mod('%d', y).tolist()
        yield (index, x, y)
        start = end
        if start >= stop:
            break

def compute_chunksize(XY_AT_A_TIME, N):
    n_tasks, remainder = divmod(N, XY_AT_A_TIME)
    if remainder:
        n_tasks += 1
    chunksize, remainder = divmod(n_tasks, multiprocessing.cpu_count() * 4)
    if remainder:
        chunksize += 1
    return chunksize

def main():
    np.random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    # Q and Z size is fixed at 120000
    Q = np.random.randint(M, size=120000)
    Z = np.random.randint(M, size=120000)
    # convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
    Q = np.char.mod('%d', Q).tolist()
    Z = np.char.mod('%d', Z).tolist()
    matchdict = {}
    # number of X, Y pairs at a time:
    # experiment with this, especially as N increases:
    XY_AT_A_TIME = 10000
    chunksize = compute_chunksize(XY_AT_A_TIME, N)
    #print('chunksize =', chunksize) # 32 with 8 cores
    with multiprocessing.Pool(initializer=init_pool, initargs=(Q, Z)) as pool:
        for d in pool.imap(matchdictfunc, next_tuple(XY_AT_A_TIME, N, M), chunksize):
            matchdict.update(d)
    #print(matchdict)

if __name__ == '__main__':
    import time
    t = time.time()
    main()
    print('total time =', time.time() - t)
Update
I want to eliminate numpy from the benchmark. It is known that numpy can use multiple threads for some of its operations, and when it is used in multiprocessing applications this can be a cause of reduced performance. So the first thing I did was to take the OP's original program and, where the code was, for example:
import numpy as np
np.random.seed(1)
X = np.random.randint(M, size=N)
X = np.char.mod('%d', X).tolist()
I replaced it with:
import random
random.seed(1)
X = [str(random.randrange(M)) for _ in range(N)]
I then timed the OP's program to get the time for generating the X, Y, Q and Z lists and the total time. On my desktop the times were approximately 20 seconds and 37 seconds respectively! So in my multiprocessing version, just generating the arguments for the process pool's workers accounts for more than half the total running time. I also found, for the second approach, that as I increased the value of XY_AT_A_TIME the CPU utilization went down from 100% to around 50% while the total elapsed time improved. I haven't quite figured out why this is.
Next I tried to emulate how the programs would function if they were reading the data in. So I wrote out 2 * N random integers to a file, temp.txt and modified the OP's program to initialize X and Y from the file and then modified my second approach's next_tuple function as follows:
def next_tuple(n, stop, M):
    with open('temp.txt') as f:
        start = 0
        while True:
            end = min(start + n, stop)
            index = [str(x) for x in range(start, end)]  # improvement
            x = [f.readline().strip() for _ in range(n)]
            y = [f.readline().strip() for _ in range(n)]
            yield (index, x, y)
            start = end
            if start >= stop:
                break
Again as I increased XY_AT_A_TIME the CPU utilization went down (best performance I found was value 400000 with CPU utilization only around 40%).
I finally rewrote my first approach to be more memory efficient (see below). This updated version again reads the random numbers from a file, but uses generator functions for X, Y and index so I don't need memory for both the full lists and the splits. Again, I do not expect the multiprocessing and non-multiprocessing versions to produce identical results because of the way I assign the X and Y values in the two cases (a simple solution would have been to write the random numbers to an X-value file and a Y-value file and read the values back from those two files), but this has no effect on the running times. Yet the CPU utilization, despite using the default pool size of 8, was only 30 - 40% (it fluctuated quite a bit), and the overall running time was nearly double the non-multiprocessing running time. But why?
import random
import multiprocessing
import concurrent.futures
import time

MULTIPROCESSING = True
POOL_SIZE = multiprocessing.cpu_count()

def init_pool(q, z):
    global Q, Z
    Q = q
    Z = z

def matchdictfunc(index, X, Y):  # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict

def split(a):  # function to split list in POOL_SIZE even parts
    k, m = divmod(N, POOL_SIZE)
    divisions = [(i + 1) * k + min(i + 1, m) - (i * k + min(i, m)) for i in range(POOL_SIZE)]
    parts = []
    for division in divisions:
        part = [next(a) for _ in range(division)]
        parts.append(part)
    return parts

def splitinput(index, X, Y):  # split large lists X and Y in n even parts (n = POOL_SIZE)
    # create multiple chunks for X and Y and index:
    index_split = split(index)
    X_split = split(X)
    Y_split = split(Y)
    return index_split, X_split, Y_split

def main():
    global N
    # following required for non-multiprocessing
    if not MULTIPROCESSING:
        global Q, Z
    random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    # Q and Z size is fixed at 120000
    Q = [str(random.randrange(M)) for _ in range(120000)]
    Z = [str(random.randrange(M)) for _ in range(120000)]
    with open('temp.txt') as f:
        # for non-multiprocessing:
        if not MULTIPROCESSING:
            index = [str(x) for x in range(N)]
            X = [f.readline().strip() for _ in range(N)]
            Y = [f.readline().strip() for _ in range(N)]
            matchdict = matchdictfunc(index, X, Y)
        else:
            # for multiprocessing:
            # split lists to number of processors (POOL_SIZE)
            # generator functions:
            index = (str(x) for x in range(N))
            X = (f.readline().strip() for _ in range(N))
            Y = (f.readline().strip() for _ in range(N))
            index_split, X_split, Y_split = splitinput(index, X, Y)
            with concurrent.futures.ProcessPoolExecutor(POOL_SIZE, initializer=init_pool, initargs=(Q, Z)) as executor:
                finallist = [result for result in executor.map(matchdictfunc, index_split, X_split, Y_split)]
            matchdict = {}
            for d in finallist:
                matchdict.update(d)

if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
Resolution?
Can it be that the overhead of transferring the data from the main process to the subprocesses, which involves shared memory reading and writing, is what is slowing everything down? So, this final version was an attempt to eliminate this potential cause for the slowdown. On my desktop I have 8 processors. For the first approach dividing the N = 10000000 X and Y values among them means that each process should be processing N // 8 -> 1250000 values. So I wrote out the random numbers in 16 groups of 1250000 numbers (8 groups for X and 8 groups for Y) as a binary file noting the offset and length of each of these 16 groups using the following code:
import random

random.seed(1)
with open('temp.txt', 'wb') as f:
    offsets = []
    for i in range(16):
        n = [str(random.randrange(300)) for _ in range(1250000)]
        b = ','.join(n).encode('ascii')
        l = len(b)
        offsets.append((f.tell(), l))
        f.write(b)
print(offsets)
And from that I constructed lists X_SPECS and Y_SPECS that the worker function matchdictfunc could use for reading in the values X and Y itself as needed. So now instead of passing 1250000 values at a time to this worker function, we are just passing indices 0, 1, ... 7 to the worker function so it knows which group it has to read in. Shared memory access has been totally eliminated in accessing X and Y (it's still required for Q and Z) and the disk access moved to the process pool. The CPU Utilization will, of course, not be 100% because the worker function is doing I/O. But I found that while the running time has now been greatly improved, it still offered no improvement over the original non-multiprocessing version:
OP's original program modified to read `X` and `Y` values in from file: 26.2 seconds
Multiprocessing elapsed time: 29.2 seconds
In fact, when I changed the code to use multithreading by replacing the ProcessPoolExecutor with a ThreadPoolExecutor, the elapsed time went down by almost another second, demonstrating that there is very little contention for the Global Interpreter Lock within the worker function, i.e. most of the time is being spent in C-language code. The main work is done by:
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
When we do this with multiprocessing, we have multiple list comprehensions and multiple zip operations (on smaller lists) being performed by separate processes and we then assemble the results in the end. This is conjecture on my part, but there just may not be any performance gains to be had by taking what are already efficient operations and scaling them down across multiple processors. Or in other words, I am stumped and that was my best guess.
The final version (with some additional optimizations -- please note):
import random
import concurrent.futures
import time

POOL_SIZE = 8
X_SPECS = [(0, 4541088), (4541088, 4541824), (9082912, 4540691), (13623603, 4541385), (18164988, 4541459), (22706447, 4542961), (27249408, 4541847), (31791255, 4542186)]
Y_SPECS = [(36333441, 4542101), (40875542, 4540120), (45415662, 4540802), (49956464, 4540971), (54497435, 4541427), (59038862, 4541523), (63580385, 4541571), (68121956, 4542335)]

def init_pool(q_z):
    global Q_Z
    Q_Z = q_z

def matchdictfunc(index, i):  # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    x_offset, x_len = X_SPECS[i]
    y_offset, y_len = Y_SPECS[i]
    with open('temp.txt', 'rb') as f:
        f.seek(x_offset, 0)
        X = f.read(x_len).decode('ascii').split(',')
        f.seek(y_offset, 0)
        Y = f.read(y_len).decode('ascii').split(',')
    lookup = {}
    for i, (q, z) in enumerate(Q_Z):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict

def split(a):  # function to split list in POOL_SIZE even parts
    k, m = divmod(N, POOL_SIZE)
    divisions = [(i + 1) * k + min(i + 1, m) - (i * k + min(i, m)) for i in range(POOL_SIZE)]
    parts = []
    for division in divisions:
        part = [next(a) for _ in range(division)]
        parts.append(part)
    return parts

def main():
    global N
    random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    # Q and Z size is fixed at 120000
    Q = (str(random.randrange(M)) for _ in range(120000))
    Z = (str(random.randrange(M)) for _ in range(120000))
    Q_Z = list(zip(Q, Z))  # pre-compute the `zip` function
    # for multiprocessing:
    # split lists to number of processors (POOL_SIZE)
    # generator functions:
    index = (str(x) for x in range(N))
    index_split = split(index)
    with concurrent.futures.ProcessPoolExecutor(POOL_SIZE, initializer=init_pool, initargs=(Q_Z,)) as executor:
        finallist = executor.map(matchdictfunc, index_split, range(8))
        matchdict = {}
        for d in finallist:
            matchdict.update(d)
    print(len(matchdict))

if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
The Cost of Inter-Process Memory Transfers
In the code below, the function create_files was called to create 100 identical files, each consisting of a "pickled" list of 1,000,000 numbers. I then used a multiprocessing pool of size 8 twice to read and unpickle the 100 files, reconstituting the original lists. The difference between the first case (worker1) and the second case (worker2) is that in the second case the list is returned to the caller (but not saved, so that memory can be garbage collected immediately). The second case took more than three times longer than the first case. This can also explain in part why you do not see a speedup when you switch to multiprocessing.
from multiprocessing import Pool
import pickle
import time

def create_files():
    l = [i for i in range(1000000)]
    # create 100 identical files:
    for file in range(1, 101):
        with open(f'pkl/test{file}.pkl', 'wb') as f:
            pickle.dump(l, f)

def worker1(file):
    file_name = f'pkl/test{file}.pkl'
    with open(file_name, 'rb') as f:
        obj = pickle.load(f)

def worker2(file):
    file_name = f'pkl/test{file}.pkl'
    with open(file_name, 'rb') as f:
        obj = pickle.load(f)
    return file_name, obj

POOLSIZE = 8

if __name__ == '__main__':
    #create_files()
    pool = Pool(POOLSIZE)
    t = time.time()
    # no data returned:
    for file in range(1, 101):
        pool.apply_async(worker1, args=(file,))
    pool.close()
    pool.join()
    print(time.time() - t)

    pool = Pool(POOLSIZE)
    t = time.time()
    for file in range(1, 101):
        pool.apply_async(worker2, args=(file,))
    pool.close()
    pool.join()
    print(time.time() - t)

    t = time.time()
    for file in range(1, 101):
        worker2(file)
    print(time.time() - t)

How to use ray parallelism within a class in python?

I want to use the ray task method rather than the ray actor method to parallelise a method within a class, the reason being that the latter seems to require changing how a class is instantiated (as shown here). A toy code example is below, as well as the error.
import numpy as np
import ray

class MyClass(object):
    def __init__(self):
        ray.init(num_cpus=4)

    @ray.remote
    def func(self, x, y):
        return x * y

    def my_func(self):
        a = [1, 2, 3]
        b = np.random.normal(0, 1, 10000)
        result = []
        # we wish to parallelise over the array `a`
        for sub_array in np.array_split(a, 3):
            result.append(self.func.remote(sub_array, b))
        return result

mc = MyClass()
mc.my_func()
>>> TypeError: missing a required argument: 'y'
The error arises because ray is not "aware" of the class, so the decorated function still expects self to be passed explicitly; the two arguments supplied fill self and x, leaving y missing.
The code works fine if we do not use classes:
@ray.remote
def func(x, y):
    return x * y

def my_func():
    a = [1, 2, 3, 4]
    b = np.random.normal(0, 1, 10000)
    result = []
    # we wish to parallelise over the list `a`
    # split `a` and send each chunk to a different processor
    for sub_array in np.array_split(a, 4):
        result.append(func.remote(sub_array, b))
    return result

res = my_func()
ray.get(res)
>>> [array([-0.41929678, -0.83227786, -2.69814232, ..., -0.67379119,
-0.79057845, -0.06862196]),
array([-0.83859356, -1.66455572, -5.39628463, ..., -1.34758239,
-1.5811569 , -0.13724391]),
array([-1.25789034, -2.49683358, -8.09442695, ..., -2.02137358,
-2.37173535, -0.20586587]),
array([ -1.67718712, -3.32911144, -10.79256927, ..., -2.69516478,
-3.1623138 , -0.27448782])]
We see the output is a list of 4 arrays, as expected. How can I get MyClass to work with parallelism using ray?
a few tips:
It's generally recommended that you only use the ray.remote decorator on functions or classes in python (not bound methods).
You should be very careful about calling ray.init inside the constructor of a class, since ray.init is not idempotent (which means your program will fail if you instantiate multiple instances of MyClass). Instead, make sure ray.init is only run once in your program, for example by guarding it as in the sketch below.
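A minimal guard sketch (ray.is_initialized() and the ignore_reinit_error flag are part of the public ray API; running this at module import time is just one possible placement):
import ray

# initialise ray at most once, no matter how many objects are created later
if not ray.is_initialized():
    ray.init(num_cpus=4)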
I think there are two ways of achieving the results you're going for with Ray here.
You could move func out of the class, so it becomes a function instead of a bound method. Note that in this approach MyClass will be serialized, which means that changes that func makes to MyClass will not be reflected anywhere outside the function. In your simplified example, this doesn't appear to be a problem. This approach makes it easiest to achieve the most parallelism.
@ray.remote
def func(obj, x, y):
    return x * y

class MyClass(object):
    def my_func(self):
        ...
        # we wish to parallelise over the array `a`
        for sub_array in np.array_split(a, 3):
            result.append(func.remote(self, sub_array, b))
        return result
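A hypothetical usage sketch for this first approach (it assumes the elided body of my_func still defines a, b and result as in the question, and that ray.init() has already been called exactly once):
mc = MyClass()
object_refs = mc.my_func()        # list of ObjectRefs produced by func.remote(...)
values = ray.get(object_refs)     # block until all products are available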
The other approach you could consider is to use async actors. In this approach, the ray actor will handle concurrency via asyncio, but this comes with the limitations of asyncio.
@ray.remote(num_cpus=4)
class MyClass(object):
    async def func(self, x, y):
        return x * y

    def my_func(self):
        a = [1, 2, 3]
        b = np.random.normal(0, 1, 10000)
        result = []
        # we wish to parallelise over the array `a`
        for sub_array in np.array_split(a, 3):
            result.append(self.func.remote(sub_array, b))
        return result
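For completeness, a minimal usage sketch for the async-actor approach (the argument values are illustrative, and it assumes ray.init() has been called exactly once elsewhere):
mc = MyClass.remote()                           # create the actor process
ref = mc.func.remote(np.array([1, 2, 3]), 2.0)  # schedule one call on the actor
print(ray.get(ref))                             # -> array([2., 4., 6.])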
Please see this code:
import time
import ray
# ray.init()  # may be required here once, depending on the ray version

@ray.remote
class Prime:
    # Constructor
    def __init__(self, number):
        self.num = number

    def SumPrime(self, num):
        tot = 0
        for i in range(3, num):
            c = 0
            for j in range(2, int(i**0.5) + 1):
                if i % j == 0:
                    c = c + 1
            if c == 0:
                tot = tot + i
        return tot

num = 1000000
start = time.time()
# make an object of the Prime class (a ray actor)
prime = Prime.remote(num)
print("duration =", time.time() - start, "\nsum_prime = ", ray.get(prime.SumPrime.remote(num)))

multiprocessing for a stochastic process with multiple arguments

I want to solve a stochastic differential equation using multiprocessing. A simplified not-parallel code is like:
import numpy as np

x = np.zeros((2, 3, 4))                 # matrix
z = np.random.normal(0, 1, (2, 3, 4))   # noise
z_array = z
for i in range(2):
    for j in range(3):
        x[i, j, 0] = i
        for k in range(3):
            x[i, j, k+1] = x[i, j, k] * z_array[i, j, k]
The outcomes are the noise z_array and the corresponding matrix x. I want to use multiprocessing for the second loop. The problem is that I don't know how to incorporate the noise z into the parallel code. A naive implementation looks like this:
import os
import numpy as np
import functools as ft
from multiprocess import Pool

def fun(i, k):
    x = np.zeros(4)
    x[0] = i
    for k in range(2):
        z = np.random.normal(0, 1)
        x[k+1] = x[k]*z
    return x

if __name__ == '__main__':
    pool = Pool(os.cpu_count()-1)
    x = np.zeros((2, 3, 4))
    for i in range(2):
        result = np.array(pool.map(ft.partial(fun, i), range(3)))
        x[i] = result
    pool.close()
    pool.join()
Since the code involves random numbers, I am not sure whether the parallel code is correct, and I don't know how to recover the noise z. Any ideas?
You can pre-generate the noise z and pass it, together with the indices, as a tuple argument. That way the noise is already available and does not need to be generated inside the function. You can also fold the first loop over i from the original code into the tuple so that it, too, runs in the multiprocessing code.
For the code below:
In the second snippet you wrote, the k loop inside fun runs over range(2), which I assume is a typo; I keep it as range(3), as in the original code.
I have incorporated the first loop into the multiprocessing setup too
If memory is not an issue and the matrix is small, use the option below, which is cleaner and makes the equivalence between your original code and the multiprocessing code easier to read. If memory is an issue, you can compute only smaller matrices inside fun and then assemble the result rather than summing full-size matrices (a sketch of that variant follows the main code below).
Main code:
import os
import numpy as np
from multiprocessing import Pool

def fun(t):
    i, j, z = t
    x = np.zeros((2, 3, 4))
    x[i, j, 0] = i
    for k in range(3):
        x[i, j, k + 1] = x[i, j, k] * z[k]
    return x

if __name__ == '__main__':
    z = np.random.normal(0, 1, (2, 3, 4))
    pool = Pool(os.cpu_count() - 1)
    map_args = ((i, j, z[i, j, :]) for i in range(2) for j in range(3))
    result = np.array(pool.map(fun, map_args))
    x = np.sum(result, 0)
    pool.close()
    pool.join()
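If memory is a concern, here is a sketch of the leaner variant mentioned above: each worker computes only its own length-4 vector and the parent writes it into x. The helper name fun_small is hypothetical, not from the original answer.
import os
import numpy as np
from multiprocessing import Pool

def fun_small(t):
    i, j, z = t
    row = np.zeros(4)
    row[0] = i                      # same recurrence as `fun`, but only one (4,) vector
    for k in range(3):
        row[k + 1] = row[k] * z[k]
    return i, j, row

if __name__ == '__main__':
    z = np.random.normal(0, 1, (2, 3, 4))
    x = np.zeros((2, 3, 4))
    map_args = ((i, j, z[i, j, :]) for i in range(2) for j in range(3))
    with Pool(os.cpu_count() - 1) as pool:
        for i, j, row in pool.imap_unordered(fun_small, map_args):
            x[i, j, :] = row        # parent assembles the full matrix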

identifying leaves and ancestors of a tree - Python

I have a tree as shown below:
I need to find leaves that have the same ancestor. For example, in the above tree, there are two nodes with the same ancestor. Can someone suggest a way to do that?
With the help of those two answers, I tried the following to find the pairs of leaves with a common parent, so that I can then join those two leaves and update the tree. But it did not give me the correct pairs of leaves and did not update the tree correctly. Can you please find the mistake here and help me with that?
def common_parent(T, n1, n2):
    for n1 in N:
        for n2 in N:
            if T.neighbors(n1) == T.neighbors(n2):
                return (n1, n2)

nodes_pairs = []
for n1 in N:
    for n2 in N:
        if n1 != n2 and common_parent(T, n1, n2):
            nodes_pairs.append(common_ancestor(T, n1, n2))
print(nodes_pairs)

for n1 in N:
    for n2 in N:
        if T.neighbors(n1) == T.neighbors(n2):
            T.add_edge(n1, n2, weight='distance')
print(T.edges())
Can be done like that:
import networkx as nx
import matplotlib.pyplot as plt
from collections import defaultdict

G = nx.Graph()
## setup, borrowed from https://necromuralist.github.io/data_science/posts/distance-in-social-networks/
left = tuple("AAKBCCFEDEIE")
right = tuple("KBBCFEGFEIJH")
G.add_edges_from(list(zip(left, right)))
##

# retrieve nodes of degree=1
k1corona = list(nx.k_corona(G, 1))
# and their parents
nodes = {node: list(G[node])[0] for _set in k1corona for node in _set}
# counting leaves for each parent
parents = defaultdict(int)
for node, parent in nodes.items():
    parents[parent] += 1
# filtering out loners
leaves = [node for node, parent in nodes.items() if parents[parent] >= 2]
# drawing
pos = nx.spring_layout(G, random_state=0)
nx.draw_networkx(G, pos=pos, with_labels=True)
nx.draw_networkx_nodes(G.subgraph(leaves), pos=pos, with_labels=True, node_color='blue')
plt.show()
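For the example edges above, the degree-one nodes are G, D, J and H; only D and H share the same parent (E), so the computation should end with (this is my reading of the code, not output copied from the answer):
print(leaves)   # expected: ['D', 'H'] (order may vary)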
Find nodes with a common ancestor
create tree
G = nx.balanced_tree(2, 2, create_using=None)
plotting
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G,pos)
nx.draw_networkx_labels(G,pos, font_color='w')
plt.axis('off')
plt.show()
def common_ancestor(G, n1, n2):
    if nx.shortest_path_length(G, n1, n2) == 2:
        return (n1, n2)

nodes_pairs = []
for n1 in G.nodes():
    for n2 in G.nodes():
        if n1 != n2 and common_ancestor(G, n1, n2):
            nodes_pairs.append(common_ancestor(G, n1, n2))
nodes_pairs
[out]:
[(0, 3),
(0, 4),
(0, 5),
(0, 6),
(1, 2),
(2, 1),
(3, 0),
(3, 4),
(4, 0),
(4, 3),
(5, 0),
(5, 6),
(6, 0),
(6, 5)]
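If only sibling leaves are wanted, one possible refinement (an assumption on my part, not part of the answer) is to filter the pairs down to degree-one nodes and drop duplicate orderings:
leaf_pairs = {tuple(sorted(p)) for p in nodes_pairs
              if G.degree(p[0]) == 1 and G.degree(p[1]) == 1}
print(leaf_pairs)   # for the balanced tree above: {(3, 4), (5, 6)}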

Add and Multiplication of Polynomials in Python

I want to add and multiply two polynomials. A function takes two arguments, like add([(4,3),(3,0)],[(-4,3),(2,1)]). So the polynomials look like
4x^3 + 3 and -4x^3 + 2x
I want to add and multiply both these two polynomials without using any library.
I have created a simplified version for both addition and multiplication by creating a blank list that can store the coefficients from the constant term up to the coefficient of the highest exponent. The logic is simply to accumulate the coefficients and then create a list containing tuple pairs of the format (coefficient, exponent).
def add(p1, p2):
    x = [0] * (max(p1[0][1], p2[0][1]) + 1)
    for i in p1 + p2:
        x[i[1]] += i[0]
    res = [(x[i], i) for i in range(len(x)) if x[i] != 0]
    res.sort(key=lambda r: r[1], reverse=True)
    return res

def mul(p1, p2):
    # the result can have degree up to the *sum* of the leading exponents
    x = [0] * (p1[0][1] + p2[0][1] + 1)
    for i in p1:
        for j in p2:
            x[i[1] + j[1]] += i[0] * j[0]
    res = [(x[i], i) for i in range(len(x)) if x[i] != 0]
    res.sort(key=lambda r: r[1], reverse=True)
    return res
Please note that this code works only for non-negative exponents and assumes each input list starts with its highest-exponent term.
Addition and multiplication of the polynomials you referred to in the question yield the following results:
add([(4,3),(3,0)],[(-4,3),(2,1)]) = [(2, 1), (3, 0)]
mul([(4,3),(3,0)],[(-4,3),(2,1)]) = [(-16, 6), (8, 4), (-12, 3), (6, 1)]
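As an alternative sketch (not the answer above), the same (coefficient, exponent) pairs can be handled with a dictionary keyed by exponent, which avoids pre-sizing the coefficient list and also works when the tuples are not sorted; the names add_d and mul_d are just illustrative:
def add_d(p1, p2):
    acc = {}
    for coeff, exp in p1 + p2:
        acc[exp] = acc.get(exp, 0) + coeff
    return sorted(((c, e) for e, c in acc.items() if c != 0),
                  key=lambda t: t[1], reverse=True)

def mul_d(p1, p2):
    acc = {}
    for c1, e1 in p1:
        for c2, e2 in p2:
            acc[e1 + e2] = acc.get(e1 + e2, 0) + c1 * c2
    return sorted(((c, e) for e, c in acc.items() if c != 0),
                  key=lambda t: t[1], reverse=True)

# add_d([(4,3),(3,0)], [(-4,3),(2,1)])  ->  [(2, 1), (3, 0)]
# mul_d([(4,3),(3,0)], [(-4,3),(2,1)])  ->  [(-16, 6), (8, 4), (-12, 3), (6, 1)]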
For addition I have written a method
def poly_add(x, y):
    r = []
    min_len = min(len(x), len(y))
    for i in range(min_len):
        if x[i][1] == y[i][1]:
            m = x[i][0] + y[i][0]
            if m != 0:
                r.append((m, x[i][1]))
        if x[i][1] != y[i][1]:
            r.append((y[i]))
            r.append((x[i]))
    return r

Resources