Multiprocessing for a stochastic process with multiple arguments - python-3.x

I want to solve a stochastic differential equation using multiprocessing. A simplified, non-parallel version of the code looks like this:
import numpy as np
x = np.zeros((2, 3, 4)) #matrix
z = np.random.normal(0, 1, (2,3,4)) #noise
z_array = z
for i in range(2):
    for j in range(3):
        x[i,j,0] = i
        for k in range(3):
            x[i,j,k+1] = x[i,j,k]*z_array[i,j,k]
The outcomes are the noise z_array and the corresponding matrix x. I want to use multiprocessing for the second loop. The problem is that I don't know how to incorporate the noise z in the parallel code. A naive implementation looks like this:
import os
import numpy as np
import functools as ft
from multiprocess import Pool
def fun(i, k):
    x = np.zeros(4)
    x[0] = i
    for k in range(2):
        z = np.random.normal(0, 1)
        x[k+1] = x[k]*z
    return x
if __name__=='__main__':
    pool = Pool(os.cpu_count()-1)
    x = np.zeros((2, 3, 4))
    for i in range(2):
        result = np.array(pool.map(ft.partial(fun, i), range(3)))
        x[i] = result
    pool.close()
    pool.join()
Since the code involves random numbers, I am not sure whether the parallel code is correct, and I don't know how to get the noise z. Any ideas?

You can pre-generate the noise z and pass it to the worker function, together with the loop indices, as a tuple. That way the noise is still available afterwards and does not need to be generated inside the function. You can also fold the first loop over i from the original code into the tuple, so that it runs in the multiprocessing code as well.
Notes on the code below:
In the second code you wrote, the k loop inside fun runs over range(2), which I assume is a typo; I am using range(3) as in the original code.
I have incorporated the first loop into the multiprocessing setup too.
If memory is not an issue and the matrix is small, use the option below: it is cleaner, and the equivalence between your original code and the multiprocessing code is easier to see. If memory is an issue, you can compute only smaller matrices inside fun and then reshape the result rather than summing (let me know if you want that solution).
Main code:
import os
import numpy as np
from multiprocessing import Pool
def fun(t):
    i, j, z = t
    x = np.zeros((2, 3, 4))
    x[i, j, 0] = i
    for k in range(3):
        x[i, j, k + 1] = x[i, j, k] * z[k]
    return x
if __name__=='__main__':
    z = np.random.normal(0, 1, (2,3,4))
    pool = Pool(os.cpu_count() - 1)
    map_args = ((i, j, z[i, j, :]) for i in range(2) for j in range(3))
    result = np.array(pool.map(fun, map_args))
    x = np.sum(result, 0)
    pool.close()
    pool.join()
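The memory-lighter variant mentioned above could look roughly like the sketch below: each task returns only its own length-4 trajectory, and the rows are reshaped into the (2, 3, 4) array instead of being summed. The names simulate_row and run_parallel are illustrative and not part of the original answer, and the pool size of 2 is arbitrary.

```python
import numpy as np
from multiprocessing import Pool

def simulate_row(args):
    """Compute one trajectory x[i, j, :] from its pre-generated noise row."""
    i, z_row = args
    x = np.zeros(len(z_row))
    x[0] = i
    for k in range(len(z_row) - 1):
        x[k + 1] = x[k] * z_row[k]
    return x

def run_parallel(z, processes=2):
    """Send every (i, j) pair to a worker and reshape the rows back to z.shape."""
    n_i, n_j, n_k = z.shape
    tasks = [(i, z[i, j, :]) for i in range(n_i) for j in range(n_j)]
    with Pool(processes) as pool:
        rows = pool.map(simulate_row, tasks)  # map keeps the task order
    return np.array(rows).reshape(n_i, n_j, n_k)

if __name__ == "__main__":
    z = np.random.normal(0, 1, (2, 3, 4))
    x = run_parallel(z)
    print(x.shape)
```

Because pool.map returns results in the order the tasks were submitted, the reshape puts every row back at its (i, j) position.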

Related

np.random.choice conflict with multiprocessing? multiprocessing inside for loop?

I want to use np.random.choice inside a multiprocessing pool, but I get the IndexError: list index out of range. I don't get any error when I use the choice function inside a for loop (in series then, not parallel). Any ideas on how to overcome this? This is just a small part of my routine but would surely improve a lot its speed. My code is like below. I declare X before anything else in the routine so it works as a global variable, but it's dynamically populated inside the main. I also noticed that there is some conflict with multiprocessing and the for loop. Any ideas on how to implement this?
from multiprocessing import Pool
from numpy.random import choice
import numpy as np
K = 10
X = []
def function(k):
    global X
    np.random.RandomState(k)
    aux = [i for i in np.arange(K) if i != k]
    a,b,c = choice(aux,3,replace=False)
    x = X[a]+0.7*(X[b]-X[c])
    return x
if __name__ == '__main__':
    X = np.arange(K)
    for n in range(K):
        pool = Pool(K)
        w = pool.map(function,np.arange(K))
        pool.close()
    print(w)
Child processes do not share the memory space of the parent process. Since you populate X inside the if __name__ ... clause, child processes that re-import the module (the spawn start method, which is the only option on Windows and the default on macOS) only see the X defined at module level, i.e. X = []
A quick solution is to move the line X = np.arange(K) outside the clause, as below:
from multiprocessing import Pool
from numpy.random import choice
import numpy as np
K = 10
X = []
X = np.arange(K)
def function(k):
    global X
    np.random.RandomState(k)
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = choice(aux, 3, replace=False)
    x = X[a] + 0.7 * (X[b] - X[c])
    return k, x
if __name__ == '__main__':
    pool = Pool(10)
    w = pool.map(function, np.arange(K))
    pool.close()
    print(w)
Output
[(0, 10.899999999999999), (1, 9.4), (2, 5.7), (3, 7.4), (4, 1.1000000000000005), (5, -1.0999999999999996), (6, 5.6), (7, 3.8), (8, 5.5), (9, -4.8999999999999995)]
If you do not want to initialize X in every child process (memory constraints?), you can use a manager to store X so that it can be shared between processes without being copied for every child. To pass more than one argument to the child processes, you will also have to use pool.starmap instead. Lastly, delete the global X statement: it does nothing useful, since global is only needed when you plan to rebind a global variable from a local scope.
from multiprocessing import Pool, Manager
from numpy.random import choice
import numpy as np
K = 10
def function(X, k):
    np.random.RandomState(k)
    aux = [i for i in np.arange(K) if i != k]
    a, b, c = choice(aux, 3, replace=False)
    x = X[a] + 0.7 * (X[b] - X[c])
    return k, x
if __name__ == '__main__':
    m = Manager()
    X = m.list(np.arange(K))
    pool = Pool(10)
    args = [(X, val) for val in np.arange(K)]
    w = pool.starmap(function, args)
    pool.close()
    print(w)
Output
[(0, -1.5999999999999996), (1, 7.3), (2, 4.9), (3, 1.9000000000000004), (4, 5.5), (5, -1.0999999999999996), (6, 4.800000000000001), (7, 7.3), (8, 0.10000000000000053), (9, 4.7)]
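Another way to give each worker its own copy of X, without relying on module-level code or a Manager proxy, is a pool initializer. The sketch below is an illustration rather than part of the answer above: init_worker and mutate are hypothetical names, and the pool size of 4 is arbitrary. It also seeds a per-task generator with np.random.default_rng(k), because the bare np.random.RandomState(k) call in the code above creates a generator object and discards it, so it has no effect on choice.

```python
import numpy as np
from multiprocessing import Pool

K = 10
X = None  # set in each worker process by init_worker

def init_worker(x):
    """Runs once per worker process and stores the input array there."""
    global X
    X = x

def mutate(k):
    """Combine three distinct entries of X, none of them at index k."""
    rng = np.random.default_rng(k)  # per-task seed, so results are reproducible
    candidates = [i for i in range(K) if i != k]
    a, b, c = rng.choice(candidates, 3, replace=False)
    return k, X[a] + 0.7 * (X[b] - X[c])

if __name__ == "__main__":
    data = np.arange(K)
    with Pool(4, initializer=init_worker, initargs=(data,)) as pool:
        print(pool.map(mutate, range(K)))
```

With the seed tied to k, repeated runs give the same result for each task regardless of which worker picks it up.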

Python Multiprocessing (Splitting data in smaller chunks - multiple function arguments)

Note from 22.02.21:
-Potentially my problem could also be solved by a more efficient memory usage instead of multiprocessing, since I realized that the memory load gets very high and might be a limiting factor here.
I'm trying to reduce the time that my script needs to run by making use of multiprocessing.
In the past I got some good tips about increasing the speed of the function itself (Increase performance of np.where() loop), but now I would like to make use of all cores of a 32-core workstation.
My function compares entries of two lists (X and Y) with the reference lists Q and Z. For every element in X/Y, it checks whether X[i] occurs somewhere in Q and whether Y[i] occurs in Z. If X[i] == Q[s] AND Y[i] == Z[s], it returns the index "s".
(Note: My real data consists of DNA sequencing reads and I need to map my reads to the reference.)
What I tried so far:
Splitting my long lists X and Y into even chunks (n-chunks, where n == cpu_count)
Trying the "concurrent.futures.ProcessPoolExecutor()" to run the function for each "sublist" in parallel and in the end combine the result of each process to one final dictionary (matchdict). (--> see commented out section)
My problem:
All cores are getting used when I uncomment the multiprocessing section but it ends up with an error (index out of range) which I could not yet resolve. (--> Tip: lower N to 1000 and you will immediately see the error without waiting forever)
Does anyone know how to solve this, or can suggest a better approach to use multiprocessing in my code?
Here is the code:
import numpy as np
import multiprocessing
import concurrent.futures
np.random.seed(1)
def matchdictfunc(index,x,y,q,z): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict
def split(a, n): # function to split list in n even parts
    k, m = divmod(len(a), n)
    return list((a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)))
def splitinput(index,X,Y,Q,Z): # split large lists X and Y in n-even parts (n = cpu_count), make new list containing n-times Q and Z (to feed Q and Z for every process)
    cpu_count = multiprocessing.cpu_count()
    #create multiple chunks for X and Y and index:
    index_split = split(index,cpu_count)
    X_split = split(X,cpu_count)
    Y_split = split(Y,cpu_count)
    # create list with several times Q and Z since it needs to be same length as X_split etc:
    Q_mult = []
    Z_mult = []
    for _ in range(cpu_count):
        Q_mult.append(Q)
        Z_mult.append(Z)
    return index_split,X_split,Y_split,Q_mult,Z_mult
# N will finally scale up to 10^9
N = 10000000
M = 300
index = [str(x) for x in list(range(N))]
X = np.random.randint(M, size=N)
Y = np.random.randint(M, size=N)
# Q and Z size is fixed at 120000
Q = np.random.randint(M, size=120000)
Z = np.random.randint(M, size=120000)
# convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
X = np.char.mod('%d', X).tolist()
Y = np.char.mod('%d', Y).tolist()
Q = np.char.mod('%d', Q).tolist()
Z = np.char.mod('%d', Z).tolist()
# single-core:
matchdict = matchdictfunc(index,X,Y,Q,Z)
# split lists to number of processors (cpu_count)
index_split,X_split,Y_split,Q_mult,Z_mult = splitinput(index,X,Y,Q,Z)
## Multiprocessing attempt - FAILS! (index out of range)
# finallist = []
# if __name__ == '__main__':
# with concurrent.futures.ProcessPoolExecutor() as executor:
# results = executor.map(matchlistfunc,X_split,Y_split,Q_mult,Z_mult)
# for result in results:
# finallist.append(result)
# matchdict = {}
# for d in finallist:
# matchdict.update(d)
Your function matchdictfunc currently has arguments x, y, q, z but in fact does not use them, although the multiprocessing version will need to use two of them. There is also no need for function splitinput to replicate Q and Z into the returned values Q_mult and Z_mult. Currently, matchdictfunc expects Q and Z to be global variables, and we can arrange for that to be the case in the multiprocessing version by using the initializer and initargs arguments when constructing the pool. You should also move code that does not need to be executed by the sub-processes, such as the array initialization code, into the block controlled by if __name__ == '__main__':. These changes result in:
import numpy as np
import multiprocessing
import concurrent.futures
MULTIPROCESSING = True
def init_pool(q, z):
    global Q, Z
    Q = q
    Z = z
def matchdictfunc(index, X, Y): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict
def split(a, n): # function to split list in n even parts
    k, m = divmod(len(a), n)
    return list((a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)))
def splitinput(index, X, Y): # split large lists X and Y in n-even parts (n = cpu_count))
    cpu_count = multiprocessing.cpu_count()
    #create multiple chunks for X and Y and index:
    index_split = split(index,cpu_count)
    X_split = split(X,cpu_count)
    Y_split = split(Y,cpu_count)
    return index_split, X_split ,Y_split
def main():
    # following required for non-multiprocessing
    if not MULTIPROCESSING:
        global Q, Z
    np.random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    index = [str(x) for x in list(range(N))]
    X = np.random.randint(M, size=N)
    Y = np.random.randint(M, size=N)
    # Q and Z size is fixed at 120000
    Q = np.random.randint(M, size=120000)
    Z = np.random.randint(M, size=120000)
    # convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
    X = np.char.mod('%d', X).tolist()
    Y = np.char.mod('%d', Y).tolist()
    Q = np.char.mod('%d', Q).tolist()
    Z = np.char.mod('%d', Z).tolist()
    # for non-multiprocessing:
    if not MULTIPROCESSING:
        matchdict = matchdictfunc(index, X, Y)
    else:
        # for multiprocessing:
        # split lists to number of processors (cpu_count)
        index_split, X_split, Y_split = splitinput(index, X, Y)
        with concurrent.futures.ProcessPoolExecutor(initializer=init_pool, initargs=(Q, Z)) as executor:
            finallist = [result for result in executor.map(matchdictfunc, index_split, X_split, Y_split)]
        matchdict = {}
        for d in finallist:
            matchdict.update(d)
    #print(matchdict)
if __name__ == '__main__':
    main()
Note: I tried this for a smaller value of N = 1000 (printing out the results of matchdict) and the multiprocessing version seemed to return the same results. My machine does not have the resources to run with the full value of N without freezing up everything else.
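One further refinement, offered as a sketch rather than as part of the code above: since Q and Z never change, the lookup dictionary can be built once per worker in the initializer instead of once per call to the worker function. The names LOOKUP and match_chunk, the tiny input lists, and max_workers=2 are all illustrative assumptions.

```python
import concurrent.futures

LOOKUP = None  # built once per worker process by init_pool

def init_pool(q, z):
    """Build the (q, z) -> [positions] table a single time per worker."""
    global LOOKUP
    LOOKUP = {}
    for pos, key in enumerate(zip(q, z)):
        LOOKUP.setdefault(key, []).append(pos)

def match_chunk(index, xs, ys):
    """Return {index entry: positions in Q/Z where (x, y) matches}."""
    return {idx: LOOKUP.get((x, y), []) for idx, x, y in zip(index, xs, ys)}

if __name__ == "__main__":
    Q, Z = ["1", "2", "1"], ["5", "6", "5"]
    index_split = [["0", "1"], ["2"]]
    X_split = [["1", "2"], ["9"]]
    Y_split = [["5", "6"], ["9"]]
    matchdict = {}
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=2, initializer=init_pool, initargs=(Q, Z)
    ) as executor:
        for d in executor.map(match_chunk, index_split, X_split, Y_split):
            matchdict.update(d)
    print(matchdict)
```

This matters most when the input is cut into many small chunks, because the cost of building the table is then paid once per process rather than once per chunk.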
Another Approach
I am working under the assumption that your DNA data is external and the X and Y values can be read n values at a time or can be read in and written out so that this is possible. Then rather than having all the data resident in memory and splitting it up into 32 pieces, I propose that it be read n values at a time and thus broken up into approximately N/n pieces.
In the following code I have switched to using the imap method from class multiprocessing.pool.Pool. The advantage is that it lazily submits tasks to the process pool, that is, the iterable argument doesn't have to be a list or convertible to a list. Instead the pool will iterate over the iterable sending tasks to the pool in chunksize groups. In the code below, I have used a generator function for the argument to imap, which will generate successive X and Y values. Your actual generator function would first open the DNA file (or files) and read in successive portions of the file.
import numpy as np
import multiprocessing
def init_pool(q, z):
    global Q, Z
    Q = q
    Z = z
def matchdictfunc(t): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    index, X, Y = t
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict
def next_tuple(n, stop, M):
    start = 0
    while True:
        end = min(start + n, stop)
        index = [str(x) for x in list(range(start, end))]
        # generate end - start values so the last (possibly shorter) group matches index
        x = np.random.randint(M, size=end - start)
        y = np.random.randint(M, size=end - start)
        # convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
        x = np.char.mod('%d', x).tolist()
        y = np.char.mod('%d', y).tolist()
        yield (index, x, y)
        start = end
        if start >= stop:
            break
def compute_chunksize(XY_AT_A_TIME, N):
    n_tasks, remainder = divmod(N, XY_AT_A_TIME)
    if remainder:
        n_tasks += 1
    chunksize, remainder = divmod(n_tasks, multiprocessing.cpu_count() * 4)
    if remainder:
        chunksize += 1
    return chunksize
def main():
    np.random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    # Q and Z size is fixed at 120000
    Q = np.random.randint(M, size=120000)
    Z = np.random.randint(M, size=120000)
    # convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
    Q = np.char.mod('%d', Q).tolist()
    Z = np.char.mod('%d', Z).tolist()
    matchdict = {}
    # number of X, Y pairs at a time:
    # experiment with this, especially as N increases:
    XY_AT_A_TIME = 10000
    chunksize = compute_chunksize(XY_AT_A_TIME, N)
    #print('chunksize =', chunksize) # 32 with 8 cores
    with multiprocessing.Pool(initializer=init_pool, initargs=(Q, Z)) as pool:
        for d in pool.imap(matchdictfunc, next_tuple(XY_AT_A_TIME, N, M), chunksize):
            matchdict.update(d)
    #print(matchdict)
if __name__ == '__main__':
    import time
    t = time.time()
    main()
    print('total time =', time.time() - t)
Update
I want to eliminate numpy from the benchmark. NumPy can use multiple threads for some of its operations (through its BLAS backend), and when used in multiprocessing applications this can be the cause of reduced performance. So the first thing I did was to take the OP's original program and, where the code was, for example:
import numpy as np
np.random.seed(1)
X = np.random.randint(M, size=N)
X = np.char.mod('%d', X).tolist()
I replaced it with:
import random
random.seed(1)
X = [str(random.randrange(M)) for _ in range(N)]
I then timed the OP's program to get the time for generating the X, Y, Q and Z lists and the total time. On my desktop the times were approximately 20 seconds and 37 seconds respectively! So in my multiprocessing version just generating the arguments for the process pool's processes is more than half the total running time. I also discovered for the second approach, that as I increased the value of XY_AT_A_TIME that the CPU utilization went down from 100% to around 50% but that the total elapsed time improved. I haven't quite figured out why this is.
Next I tried to emulate how the programs would function if they were reading the data in. So I wrote out 2 * N random integers to a file, temp.txt and modified the OP's program to initialize X and Y from the file and then modified my second approach's next_tuple function as follows:
def next_tuple(n, stop, M):
    with open('temp.txt') as f:
        start = 0
        while True:
            end = min(start + n, stop)
            index = [str(x) for x in range(start, end)] # improvement
            # read end - start lines so the last (possibly shorter) group matches index
            x = [f.readline().strip() for _ in range(end - start)]
            y = [f.readline().strip() for _ in range(end - start)]
            yield (index, x, y)
            start = end
            if start >= stop:
                break
Again as I increased XY_AT_A_TIME the CPU utilization went down (best performance I found was value 400000 with CPU utilization only around 40%).
I finally rewrote my first approach trying to be more memory efficient (see below). This updated version again reads the random numbers from a file but uses generator functions for X, Y and index so I don't need memory for both the full lists and the splits. Again, I do not expect duplicated results for the multiprocessing and non-multiprocessing versions because of the way I am assigning the X and Y values in the two cases (a simple solution to this would have been to write the random numbers to an X-value file and a Y-value file and read the values back from the two files). But this has no effect on the running times. But again, the CPU utilization, despite using the default pool size of 8, was only 30 - 40% (it fluctuated quite a bit) and the overall running time was nearly double the non-multiprocessing running time. But why?
import random
import multiprocessing
import concurrent.futures
import time
MULTIPROCESSING = True
POOL_SIZE = multiprocessing.cpu_count()
def init_pool(q, z):
    global Q, Z
    Q = q
    Z = z
def matchdictfunc(index, X, Y): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    lookup = {}
    for i, (q, z) in enumerate(zip(Q, Z)):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict
def split(a): # function to split list in POOL_SIZE even parts
    k, m = divmod(N, POOL_SIZE)
    divisions = [(i + 1) * k + min(i + 1, m) - (i * k + min(i, m)) for i in range(POOL_SIZE)]
    parts = []
    for division in divisions:
        part = [next(a) for _ in range(division)]
        parts.append(part)
    return parts
def splitinput(index, X, Y): # split large lists X and Y in n-even parts (n = POOL_SIZE)
    #create multiple chunks for X and Y and index:
    index_split = split(index)
    X_split = split(X)
    Y_split = split(Y)
    return index_split, X_split ,Y_split
def main():
    global N
    # following required for non-multiprocessing
    if not MULTIPROCESSING:
        global Q, Z
    random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    # Q and Z size is fixed at 120000
    Q = [str(random.randrange(M)) for _ in range(120000)]
    Z = [str(random.randrange(M)) for _ in range(120000)]
    with open('temp.txt') as f:
        # for non-multiprocessing:
        if not MULTIPROCESSING:
            index = [str(x) for x in range(N)]
            X = [f.readline().strip() for _ in range(N)]
            Y = [f.readline().strip() for _ in range(N)]
            matchdict = matchdictfunc(index, X, Y)
        else:
            # for multiprocessing:
            # split lists to number of processors (POOL_SIZE)
            # generator functions:
            index = (str(x) for x in range(N))
            X = (f.readline().strip() for _ in range(N))
            Y = (f.readline().strip() for _ in range(N))
            index_split, X_split, Y_split = splitinput(index, X, Y)
            with concurrent.futures.ProcessPoolExecutor(POOL_SIZE, initializer=init_pool, initargs=(Q, Z)) as executor:
                finallist = [result for result in executor.map(matchdictfunc, index_split, X_split, Y_split)]
            matchdict = {}
            for d in finallist:
                matchdict.update(d)
if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
Resolution?
Can it be that the overhead of transferring the data from the main process to the subprocesses, which involves shared memory reading and writing, is what is slowing everything down? So, this final version was an attempt to eliminate this potential cause for the slowdown. On my desktop I have 8 processors. For the first approach dividing the N = 10000000 X and Y values among them means that each process should be processing N // 8 -> 1250000 values. So I wrote out the random numbers in 16 groups of 1250000 numbers (8 groups for X and 8 groups for Y) as a binary file noting the offset and length of each of these 16 groups using the following code:
import random
random.seed(1)
with open('temp.txt', 'wb') as f:
    offsets = []
    for i in range(16):
        n = [str(random.randrange(300)) for _ in range(1250000)]
        b = ','.join(n).encode('ascii')
        l = len(b)
        offsets.append((f.tell(), l))
        f.write(b)
print(offsets)
And from that I constructed lists X_SPECS and Y_SPECS that the worker function matchdictfunc could use for reading in the values X and Y itself as needed. So now instead of passing 1250000 values at a time to this worker function, we are just passing indices 0, 1, ... 7 to the worker function so it knows which group it has to read in. Shared memory access has been totally eliminated in accessing X and Y (it's still required for Q and Z) and the disk access moved to the process pool. The CPU Utilization will, of course, not be 100% because the worker function is doing I/O. But I found that while the running time has now been greatly improved, it still offered no improvement over the original non-multiprocessing version:
OP's original program modified to read `X` and `Y` values in from file: 26.2 seconds
Multiprocessing elapsed time: 29.2 seconds
In fact, when I changed the code to use multithreading by replacing the ProcessPoolExecutor with ThreadPoolExecutor, the elapsed time went down almost another second, demonstrating that there is very little contention for the Global Interpreter Lock within the worker function, i.e. most of the time is being spent in C-language code. The main work is done by:
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
When we do this with multiprocessing, we have multiple list comprehensions and multiple zip operations (on smaller lists) being performed by separate processes and we then assemble the results in the end. This is conjecture on my part, but there just may not be any performance gains to be had by taking what are already efficient operations and scaling them down across multiple processors. Or in other words, I am stumped and that was my best guess.
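For reference, the way the printed offsets map onto X_SPECS and Y_SPECS can be sketched on a much smaller scale. This is an illustration under assumptions, not the exact code used for the timings: it writes to an in-memory buffer instead of temp.txt, uses 3 values per group, and the helper names write_groups and read_group are made up.

```python
import io
import random

GROUPS = 8            # one X group and one Y group per worker process
VALUES_PER_GROUP = 3  # tiny stand-in for 1250000

def write_groups(f, n_groups, n_values, m=300):
    """Write comma-separated groups back to back; return (offset, length) pairs."""
    offsets = []
    for _ in range(n_groups):
        data = ",".join(str(random.randrange(m)) for _ in range(n_values)).encode("ascii")
        offsets.append((f.tell(), len(data)))
        f.write(data)
    return offsets

def read_group(f, spec):
    """Seek to one group and decode it back into a list of strings."""
    offset, length = spec
    f.seek(offset, 0)
    return f.read(length).decode("ascii").split(",")

random.seed(1)
buffer = io.BytesIO()
offsets = write_groups(buffer, 2 * GROUPS, VALUES_PER_GROUP)
X_SPECS = offsets[:GROUPS]  # first 8 groups hold the X values
Y_SPECS = offsets[GROUPS:]  # last 8 groups hold the Y values
print(read_group(buffer, X_SPECS[0]), read_group(buffer, Y_SPECS[0]))
```

Each worker then only needs its group number to find and read its own X and Y values.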
The final version (with some additional optimizations -- please note):
import random
import concurrent.futures
import time
POOL_SIZE = 8
X_SPECS = [(0, 4541088), (4541088, 4541824), (9082912, 4540691), (13623603, 4541385), (18164988, 4541459), (22706447, 4542961), (27249408, 4541847), (31791255, 4542186)]
Y_SPECS = [(36333441, 4542101), (40875542, 4540120), (45415662, 4540802), (49956464, 4540971), (54497435, 4541427), (59038862, 4541523), (63580385, 4541571), (68121956, 4542335)]
def init_pool(q_z):
    global Q_Z
    Q_Z = q_z
def matchdictfunc(index, i): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
    x_offset, x_len = X_SPECS[i]
    y_offset, y_len = Y_SPECS[i]
    with open('temp.txt', 'rb') as f:
        f.seek(x_offset, 0)
        X = f.read(x_len).decode('ascii').split(',')
        f.seek(y_offset, 0)
        Y = f.read(y_len).decode('ascii').split(',')
    lookup = {}
    for i, (q, z) in enumerate(Q_Z):
        lookup.setdefault((q, z), []).append(i)
    matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
    matchdict = {}
    for ind, match in enumerate(matchlist):
        matchdict[index[ind]] = match
    return matchdict
def split(a): # function to split list in POOL_SIZE even parts
    k, m = divmod(N, POOL_SIZE)
    divisions = [(i + 1) * k + min(i + 1, m) - (i * k + min(i, m)) for i in range(POOL_SIZE)]
    parts = []
    for division in divisions:
        part = [next(a) for _ in range(division)]
        parts.append(part)
    return parts
def main():
    global N
    random.seed(1)
    # N will finally scale up to 10^9
    N = 10000000
    M = 300
    # Q and Z size is fixed at 120000
    Q = (str(random.randrange(M)) for _ in range(120000))
    Z = (str(random.randrange(M)) for _ in range(120000))
    Q_Z = list(zip(Q, Z)) # pre-compute the `zip` function
    # for multiprocessing:
    # split lists to number of processors (POOL_SIZE)
    # generator functions:
    index = (str(x) for x in range(N))
    index_split = split(index)
    with concurrent.futures.ProcessPoolExecutor(POOL_SIZE, initializer=init_pool, initargs=(Q_Z,)) as executor:
        finallist = executor.map(matchdictfunc, index_split, range(8))
        matchdict = {}
        for d in finallist:
            matchdict.update(d)
    print(len(matchdict))
if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
The Cost of Inter-Process Memory Transfers
In the code below function create_files was called to create 100 identical files consisting of a "pickled" list of 1,000,000 numbers. I then used a multiprocessing pool of size 8 twice to read the 100 files and unpickle the files to reconstitute the original lists. The difference between the first case (worker1) and the second case (worker2) was that in the second case the list is returned back to the caller (but not saved so that memory can be garbage collected immediately). The second case took more than three times longer than the first case. This can also explain in part why you do not see a speedup when you switch to multiprocessing.
from multiprocessing import Pool
import pickle
import time
def create_files():
    l = [i for i in range(1000000)]
    # create 100 identical files:
    for file in range(1, 101):
        with open(f'pkl/test{file}.pkl', 'wb') as f:
            pickle.dump(l, f)
def worker1(file):
    file_name = f'pkl/test{file}.pkl'
    with open(file_name, 'rb') as f:
        obj = pickle.load(f)
def worker2(file):
    file_name = f'pkl/test{file}.pkl'
    with open(file_name, 'rb') as f:
        obj = pickle.load(f)
    return file_name, obj
POOLSIZE = 8
if __name__ == '__main__':
    #create_files()
    pool = Pool(POOLSIZE)
    t = time.time()
    # no data returned:
    for file in range(1, 101):
        pool.apply_async(worker1, args=(file,))
    pool.close()
    pool.join()
    print(time.time() - t)
    pool = Pool(POOLSIZE)
    t = time.time()
    for file in range(1, 101):
        pool.apply_async(worker2, args=(file,))
    pool.close()
    pool.join()
    print(time.time() - t)
    t = time.time()
    for file in range(1, 101):
        worker2(file)
    print(time.time() - t)
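Part of that cost can be seen without a pool at all: whatever a pool worker returns is pickled in the child process and unpickled in the parent. The rough sketch below measures just that round trip for a list of the same size. The function name pickle_round_trip is illustrative, and timings vary by machine, so none are quoted here.

```python
import pickle
import time

def pickle_round_trip(obj):
    """Serialize and deserialize obj; return (copy, payload size in bytes, seconds)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj)
    copy = pickle.loads(payload)
    return copy, len(payload), time.perf_counter() - start

numbers = list(range(1000000))
copy, size, seconds = pickle_round_trip(numbers)
print(f"{size} bytes pickled and unpickled in {seconds:.3f} s")
```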

Why is getting the first 30 keys of the dictionary in two statements faster than one statement?

I was running a benchmark for myself when I encountered this interesting thing. I am trying to get the first 30 keys of a dictionary, and I have written three ways to do it, as follows:
import time
dic = {str(i): i for i in range(10 ** 6)}
start_time = time.time()
x = list(dic.keys())[0:30]
print(time.time() - start_time)
start_time = time.time()
y = list(dic.keys())
x = y[0:30]
print(time.time() - start_time)
start_time = time.time()
z = dic.keys()
y = list(z)
x = y[0:30]
print(time.time() - start_time)
The results are:
0.015970945358276367
0.010970354080200195
0.01691460609436035
Surprisingly, the second method is much faster! Any thoughts on this?
I used Python's timeit module to measure the various alternatives, and added one of my own that doesn't convert the keys to a list:
from timeit import timeit
dic = {str(i): i for i in range(10 ** 6)}
def f1():
    x = list(dic.keys())[0:30]
    return x
def f2():
    y = list(dic.keys())
    x = y[0:30]
    return x
def f3():
    z = dic.keys()
    y = list(z)
    x = y[0:30]
    return x
def f4():
    x = [k for _, k in zip(range(30), dic.keys())]
    return x
t1 = timeit(lambda: f1(), number=10)
t2 = timeit(lambda: f2(), number=10)
t3 = timeit(lambda: f3(), number=10)
t4 = timeit(lambda: f4(), number=10)
print(t1)
print(t2)
print(t3)
print(t4)
Prints:
0.1911074290110264
0.20418328599771485
0.18727918600779958
3.5186996683478355e-05
Maybe this is due to inaccuracies in how you measure time. You can use timeit for this kind of thing:
import timeit
dic = {str(i): i for i in range(10 ** 6)}
# 27.5125/29.0836/26.8525
timeit.timeit("x = list(dic.keys())[0:30]", number=1000, globals={"dic": dic})
# 28.6648/26.4684/30.9534
timeit.timeit("y = list(dic.keys());x=y[0:30]", number=1000, globals={"dic": dic})
# 31.7345/29.5301/30.7541
timeit.timeit("z=dic.keys();y=list(z);x=y[0:30]", number=1000, globals={'dic': dic})
The comments show the times I got when running the same code 3 different times. As you can see, even by performing a large number of repetitions, it is possible to obtain quite large variations in time measured. This can be due to several different things:
An item can be in the cache of your processor or not.
Your processor can be occupied doing several other things.
Etc...
As stated by @Andrej Kesely, your bottleneck is the conversion of the dictionary keys into a list. To do that, Python has to go through all of the dictionary's keys, because that is how it converts something to a list in general. By avoiding this, you can get much better results.
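One way to avoid the full list conversion is itertools.islice, which stops after the requested number of keys instead of materializing all of them. A minimal sketch (the helper name first_keys is illustrative):

```python
from itertools import islice

def first_keys(d, n):
    """Return the first n keys of d in insertion order without copying all keys."""
    return list(islice(d, n))

dic = {str(i): i for i in range(10 ** 6)}
print(first_keys(dic, 30)[:3])  # ['0', '1', '2']
```

Iterating over a dict yields its keys, so the work done here is proportional to n rather than to the size of the dictionary.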

Different Result when calculating Polepairs with Octave vs Python

I am trying to calculate pole pairs with both Octave 5.1.10 and Python 3.8.
The Octave code:
wc=1
n=4
s={n}
G=1
function poles (n, wc, G)
  s={n}
  for k =1:n
    s{k}=wc*e^((j*(2*k+n-1)*pi)/(2*n))
  endfor
endfunction
The output is:
s =
{
[1,1] = -0.38268 + 0.92388i
[1,2] = -0.92388 + 0.38268i
[1,3] = -0.92388 - 0.38268i
[1,4] = -0.38268 - 0.92388i
}
The Python code:
import numpy as np
import math
wc=1
n=4
G=1
def poles (n, wc, G):
    import math
    s=[] #contains the complex polpairs
    e=math.e
    pi=math.pi
    for k in range(n):
        s.append(wc*e**((1j*(2*k+n-1)*pi)/(2*n)))
    return s
returns
s=[
(0.38268343236508984 + 0.9238795325112867j),
(-0.3826834323650897 + 0.9238795325112867j),
(-0.9238795325112867 + 0.3826834323650899j),
(-0.9238795325112868 - 0.38268343236508967j)]
Can someone explain to me why these two outputs are different?
In your octave loop, k takes values from 1 to 4.
In your python loop, k takes values from 0 to 3
If you want the same behaviour in your python loop, change
for k in range(n):
to
for k in range(1, n + 1):
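A compact version of the Python function with the 1-based loop, as a sketch: it uses cmath.exp instead of math.e ** ..., and it drops the G parameter because the original function never uses it. The printed values should agree with the Octave output to the five decimals shown there.

```python
import cmath

def poles(n, wc):
    """Return the n poles wc * exp(j*(2k+n-1)*pi/(2n)) for k = 1..n."""
    return [wc * cmath.exp(1j * (2 * k + n - 1) * cmath.pi / (2 * n))
            for k in range(1, n + 1)]

for s in poles(4, 1):
    print(f"{s.real:.5f} {s.imag:+.5f}j")
```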

Smoothing values (neighbors between 1-9)

Instructions: Compute and store R=1000 random values from 0-1 as x. moving_window_average(x, n_neighbors) is pre-loaded into memory from 3a. Compute the moving window average for x for the range of n_neighbors 1-9. Store x as well as each of these averages as consecutive lists in a list called Y.
My solution:
R = 1000
n_neighbors = 9
x = [random.uniform(0,1) for i in range(R)]
Y = [moving_window_average(x, n_neighbors) for n_neighbors in range(1,n_neighbors)]
where moving_window_average(x, n_neighbors) is a function as follows:
def moving_window_average(x, n_neighbors=1):
    n = len(x)
    width = n_neighbors*2 + 1
    x = [x[0]]*n_neighbors + x + [x[-1]]*n_neighbors
    # To complete the function,
    # return a list of the mean of values from i to i+width for all values i from 0 to n-1.
    mean_values=[]
    for i in range(1,n+1):
        mean_values.append((x[i-1] + x[i] + x[i+1])/width)
    return (mean_values)
This gives me an error, Check your usage of Y again. Even though I've tested for a few values, I did not get yet why there is a problem with this exercise. Did I just misunderstand something?
The instruction tells you to compute moving averages for all neighbors ranging from 1 to 9. So the below code should work:
import random
random.seed(1)
R = 1000
x = []
for i in range(R):
    num = random.uniform(0,1)
    x.append(num)
Y = []
Y.append(x)
for i in range(1,10):
    mov_avg = moving_window_average(x, n_neighbors=i)
    Y.append(mov_avg)
Actually, your moving_window_average(x, n_neighbors) function does not work for n_neighbors greater than one. The interpreter won't complain, but the result is not what was asked for: the loop always sums three values (x[i-1], x[i], x[i+1]) regardless of width.
I suggest using something like:
def moving_window_average(x, n_neighbors=1):
    n = len(x)
    width = n_neighbors*2 + 1
    x = [x[0]]*n_neighbors + x + [x[-1]]*n_neighbors
    mean_values = []
    for i in range(n):
        temp = x[i: i+width]
        sum_= 0
        for elm in temp:
            sum_+= elm
        mean_values.append(sum_ / width)
    return mean_values
My solution for +100XP
import random
random.seed(1)
R=1000
Y = list()
x = [random.uniform(0, 1) for num in range(R)]
for n_neighbors in range(10):
    Y.append(moving_window_average(x, n_neighbors))
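To check a windowed implementation by hand, a tiny input helps. The sketch below restates the corrected function with slicing and sum (same padding and width as above) and then builds Y as the exercise describes. It is one possible arrangement, not the graded reference solution.

```python
import random

def moving_window_average(x, n_neighbors=1):
    """Average each value with n_neighbors values on either side (edges padded)."""
    n = len(x)
    width = n_neighbors * 2 + 1
    padded = [x[0]] * n_neighbors + x + [x[-1]] * n_neighbors
    return [sum(padded[i:i + width]) / width for i in range(n)]

# Hand-checkable case: the padded list is [0, 0, 3, 6, 6]
print(moving_window_average([0, 3, 6], 1))  # [1.0, 3.0, 5.0]

random.seed(1)
x = [random.uniform(0, 1) for _ in range(1000)]
Y = [x] + [moving_window_average(x, n) for n in range(1, 10)]
print(len(Y), len(Y[0]), len(Y[9]))  # 10 1000 1000
```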
