Python Multiprocessing (Splitting data in smaller chunks - multiple function arguments) - python-3.x
Note from 22.02.21:
-Potentially my problem could also be solved by a more efficient memory usage instead of multiprocessing, since I realized that the memory load gets very high and might be a limiting factor here.
I'm trying to reduce the time that my script needs to run by making use of multiprocessing.
In the past I got some good tips about increasing the speed of the function itself (Increase performance of np.where() loop), but now I would like to make use of all cores of a 32-core workstation.
My function compares entries of two lists (X and Y) with a reference lists Q and Z. For every element in X/Y, it checks whether X[i] occurs somewhere in Q and whether Y[i] occurs in Z. If X[i] == Q[s] AND Y[i] == Z[s], it returns the index "s".
(Note: My real data consists of DNA sequencing reads and I need to map my reads to the reference.)
What I tried so far:
Splitting my long lists X and Y into even chunks (n-chunks, where n == cpu_count)
Trying the "concurrent.futures.ProcessPoolExecutor()" to run the function for each "sublist" in parallel and in the end combine the result of each process to one final dictionary (matchdict). (--> see commented out section)
My problem:
All cores are getting used when I uncomment the multiprocessing section but it ends up with an error (index out of range) which I could not yet resolve. (--> Tip: lower N to 1000 and you will immediately see the error without waiting forever)
Does anyone know how to solve this, or can suggest a better approach to use multiprocessing in my code?
Here is the code:
import numpy as np
import multiprocessing
import concurrent.futures
np.random.seed(1)
def matchdictfunc(index,x,y,q,z): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
lookup = {}
for i, (q, z) in enumerate(zip(Q, Z)):
lookup.setdefault((q, z), []).append(i)
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
matchdict = {}
for ind, match in enumerate(matchlist):
matchdict[index[ind]] = match
return matchdict
def split(a, n): # function to split list in n even parts
k, m = divmod(len(a), n)
return list((a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)))
def splitinput(index,X,Y,Q,Z): # split large lists X and Y in n-even parts (n = cpu_count), make new list containing n-times Q and Z (to feed Q and Z for every process)
cpu_count = multiprocessing.cpu_count()
#create multiple chunks for X and Y and index:
index_split = split(index,cpu_count)
X_split = split(X,cpu_count)
Y_split = split(Y,cpu_count)
# create list with several times Q and Z since it needs to be same length as X_split etc:
Q_mult = []
Z_mult = []
for _ in range(cpu_count):
Q_mult.append(Q)
Z_mult.append(Z)
return index_split,X_split,Y_split,Q_mult,Z_mult
# N will finally scale up to 10^9
N = 10000000
M = 300
index = [str(x) for x in list(range(N))]
X = np.random.randint(M, size=N)
Y = np.random.randint(M, size=N)
# Q and Z size is fixed at 120000
Q = np.random.randint(M, size=120000)
Z = np.random.randint(M, size=120000)
# convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
X = np.char.mod('%d', X).tolist()
Y = np.char.mod('%d', Y).tolist()
Q = np.char.mod('%d', Q).tolist()
Z = np.char.mod('%d', Z).tolist()
# single-core:
matchdict = matchdictfunc(index,X,Y,Q,Z)
# split lists to number of processors (cpu_count)
index_split,X_split,Y_split,Q_mult,Z_mult = splitinput(index,X,Y,Q,Z)
## Multiprocessing attempt - FAILS! (index out of range)
# finallist = []
# if __name__ == '__main__':
# with concurrent.futures.ProcessPoolExecutor() as executor:
# results = executor.map(matchlistfunc,X_split,Y_split,Q_mult,Z_mult)
# for result in results:
# finallist.append(result)
# matchdict = {}
# for d in finallist:
# matchdict.update(d)
Your function matchdictfunc currently has arguments x, y, q, z but in fact does not use them, although in the multiprocessing version it will need to use two arguments. There is also no need for function splitinput to replicate Q and Z into returned values Q_split and Z_split. Currently, matchdictfunc is expecting Q and Z to be global variables and we can arrange for that to be the case in the multiprocessing version by using the initializer and initargs arguments when constructing the pool. You should also move code that you do not need to be executed by the sub-processes into the block controlled by if __name__ == '__main__':, such as the arary initialization code. These changes result in:
import numpy as np
import multiprocessing
import concurrent.futures
MULTIPROCESSING = True
def init_pool(q, z):
global Q, Z
Q = q
Z = z
def matchdictfunc(index, X, Y): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
lookup = {}
for i, (q, z) in enumerate(zip(Q, Z)):
lookup.setdefault((q, z), []).append(i)
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
matchdict = {}
for ind, match in enumerate(matchlist):
matchdict[index[ind]] = match
return matchdict
def split(a, n): # function to split list in n even parts
k, m = divmod(len(a), n)
return list((a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)))
def splitinput(index, X, Y): # split large lists X and Y in n-even parts (n = cpu_count))
cpu_count = multiprocessing.cpu_count()
#create multiple chunks for X and Y and index:
index_split = split(index,cpu_count)
X_split = split(X,cpu_count)
Y_split = split(Y,cpu_count)
return index_split, X_split ,Y_split
def main():
# following required for non-multiprocessing
if not MULTIPROCESSING:
global Q, Z
np.random.seed(1)
# N will finally scale up to 10^9
N = 10000000
M = 300
index = [str(x) for x in list(range(N))]
X = np.random.randint(M, size=N)
Y = np.random.randint(M, size=N)
# Q and Z size is fixed at 120000
Q = np.random.randint(M, size=120000)
Z = np.random.randint(M, size=120000)
# convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
X = np.char.mod('%d', X).tolist()
Y = np.char.mod('%d', Y).tolist()
Q = np.char.mod('%d', Q).tolist()
Z = np.char.mod('%d', Z).tolist()
# for non-multiprocessing:
if not MULTIPROCESSING:
matchdict = matchdictfunc(index, X, Y)
else:
# for multiprocessing:
# split lists to number of processors (cpu_count)
index_split, X_split, Y_split = splitinput(index, X, Y)
with concurrent.futures.ProcessPoolExecutor(initializer=init_pool, initargs=(Q, Z)) as executor:
finallist = [result for result in executor.map(matchdictfunc, index_split, X_split, Y_split)]
matchdict = {}
for d in finallist:
matchdict.update(d)
#print(matchdict)
if __name__ == '__main__':
main()
Note: I tried this for a smaller value of N = 1000 (printing out the results of matchdict) and the multiprocessing version seemed to return the same results. My machine does not have the resources to run with the full value of N without freezing up everything else.
Another Approach
I am working under the assumption that your DNA data is external and the X and Y values can be read n values at a time or can be read in and written out so that this is possible. Then rather than having all the data resident in memory and splitting it up into 32 pieces, I propose that it be read n values at a time and thus broken up into approximately N/n pieces.
In the following code I have switched to using the imap method from class multiprocessing.pool.Pool. The advantage is that it lazily submits tasks to the process pool, that is, the iterable argument doesn't have to be a list or convertible to a list. Instead the pool will iterate over the iterable sending tasks to the pool in chunksize groups. In the code below, I have used a generator function for the argument to imap, which will generate successive X and Y values. Your actual generator function would first open the DNA file (or files) and read in successive portions of the file.
import numpy as np
import multiprocessing
def init_pool(q, z):
global Q, Z
Q = q
Z = z
def matchdictfunc(t): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
index, X, Y = t
lookup = {}
for i, (q, z) in enumerate(zip(Q, Z)):
lookup.setdefault((q, z), []).append(i)
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
matchdict = {}
for ind, match in enumerate(matchlist):
matchdict[index[ind]] = match
return matchdict
def next_tuple(n, stop, M):
start = 0
while True:
end = min(start + n, stop)
index = [str(x) for x in list(range(start, end))]
x = np.random.randint(M, size=n)
y = np.random.randint(M, size=n)
# convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
x = np.char.mod('%d', x).tolist()
y = np.char.mod('%d', y).tolist()
yield (index, x, y)
start = end
if start >= stop:
break
def compute_chunksize(XY_AT_A_TIME, N):
n_tasks, remainder = divmod(N, XY_AT_A_TIME)
if remainder:
n_tasks += 1
chunksize, remainder = divmod(n_tasks, multiprocessing.cpu_count() * 4)
if remainder:
chunksize += 1
return chunksize
def main():
np.random.seed(1)
# N will finally scale up to 10^9
N = 10000000
M = 300
# Q and Z size is fixed at 120000
Q = np.random.randint(M, size=120000)
Z = np.random.randint(M, size=120000)
# convert int32 arrays to str64 arrays and then to list, to represent original data (which are strings and not numbers)
Q = np.char.mod('%d', Q).tolist()
Z = np.char.mod('%d', Z).tolist()
matchdict = {}
# number of X, Y pairs at a time:
# experiment with this, especially as N increases:
XY_AT_A_TIME = 10000
chunksize = compute_chunksize(XY_AT_A_TIME, N)
#print('chunksize =', chunksize) # 32 with 8 cores
with multiprocessing.Pool(initializer=init_pool, initargs=(Q, Z)) as pool:
for d in pool.imap(matchdictfunc, next_tuple(XY_AT_A_TIME, N, M), chunksize):
matchdict.update(d)
#print(matchdict)
if __name__ == '__main__':
import time
t = time.time()
main()
print('total time =', time.time() - t)
Update
I want to eliminate using numpy from the benchmark. It is known that numpy uses multiprocessing for some of its operations and when used in multiprocessing applications can be the cause of of reduced performance. So the first thing I did was to take the OP's original program and where the code was, for example:
import numpy as np
np.random.seed(1)
X = np.random.randint(M, size=N)
X = np.char.mod('%d', X).tolist()
I replaced it with:
import random
random.seed(1)
X = [str(random.randrange(M)) for _ in range(N)]
I then timed the OP's program to get the time for generating the X, Y, Q and Z lists and the total time. On my desktop the times were approximately 20 seconds and 37 seconds respectively! So in my multiprocessing version just generating the arguments for the process pool's processes is more than half the total running time. I also discovered for the second approach, that as I increased the value of XY_AT_A_TIME that the CPU utilization went down from 100% to around 50% but that the total elapsed time improved. I haven't quite figured out why this is.
Next I tried to emulate how the programs would function if they were reading the data in. So I wrote out 2 * N random integers to a file, temp.txt and modified the OP's program to initialize X and Y from the file and then modified my second approach's next_tuple function as follows:
def next_tuple(n, stop, M):
with open('temp.txt') as f:
start = 0
while True:
end = min(start + n, stop)
index = [str(x) for x in range(start, end)] # improvement
x = [f.readline().strip() for _ in range(n)]
y = [f.readline().strip() for _ in range(n)]
yield (index, x, y)
start = end
if start >= stop:
break
Again as I increased XY_AT_A_TIME the CPU utilization went down (best performance I found was value 400000 with CPU utilization only around 40%).
I finally rewrote my first approach trying to be more memory efficient (see below). This updated version again reads the random numbers from a file but uses generator functions for X, Y and index so I don't need memory for both the full lists and the splits. Again, I do not expect duplicated results for the multiprocessing and non-multiprocessing versions because of the way I am assigning the X and Y values in the two cases (a simple solution to this would have been to write the random numbers to an X-value file and a Y-value file and read the values back from the two files). But this has no effect on the running times. But again, the CPU utilization, despite using the default pool size of 8, was only 30 - 40% (it fluctuated quite a bit) and the overall running time was nearly double the non-multiprocessing running time. But why?
import random
import multiprocessing
import concurrent.futures
import time
MULTIPROCESSING = True
POOL_SIZE = multiprocessing.cpu_count()
def init_pool(q, z):
global Q, Z
Q = q
Z = z
def matchdictfunc(index, X, Y): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
lookup = {}
for i, (q, z) in enumerate(zip(Q, Z)):
lookup.setdefault((q, z), []).append(i)
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
matchdict = {}
for ind, match in enumerate(matchlist):
matchdict[index[ind]] = match
return matchdict
def split(a): # function to split list in POOL_SIZE even parts
k, m = divmod(N, POOL_SIZE)
divisions = [(i + 1) * k + min(i + 1, m) - (i * k + min(i, m)) for i in range(POOL_SIZE)]
parts = []
for division in divisions:
part = [next(a) for _ in range(division)]
parts.append(part)
return parts
def splitinput(index, X, Y): # split large lists X and Y in n-even parts (n = POOL_SIZE)
#create multiple chunks for X and Y and index:
index_split = split(index)
X_split = split(X)
Y_split = split(Y)
return index_split, X_split ,Y_split
def main():
global N
# following required for non-multiprocessing
if not MULTIPROCESSING:
global Q, Z
random.seed(1)
# N will finally scale up to 10^9
N = 10000000
M = 300
# Q and Z size is fixed at 120000
Q = [str(random.randrange(M)) for _ in range(120000)]
Z = [str(random.randrange(M)) for _ in range(120000)]
with open('temp.txt') as f:
# for non-multiprocessing:
if not MULTIPROCESSING:
index = [str(x) for x in range(N)]
X = [f.readline().strip() for _ in range(N)]
Y = [f.readline().strip() for _ in range(N)]
matchdict = matchdictfunc(index, X, Y)
else:
# for multiprocessing:
# split lists to number of processors (POOL_SIZE)
# generator functions:
index = (str(x) for x in range(N))
X = (f.readline().strip() for _ in range(N))
Y = (f.readline().strip() for _ in range(N))
index_split, X_split, Y_split = splitinput(index, X, Y)
with concurrent.futures.ProcessPoolExecutor(POOL_SIZE, initializer=init_pool, initargs=(Q, Z)) as executor:
finallist = [result for result in executor.map(matchdictfunc, index_split, X_split, Y_split)]
matchdict = {}
for d in finallist:
matchdict.update(d)
if __name__ == '__main__':
t = time.time()
main()
print('total time =', time.time() - t)
Resolution?
Can it be that the overhead of transferring the data from the main process to the subprocesses, which involves shared memory reading and writing, is what is slowing everything down? So, this final version was an attempt to eliminate this potential cause for the slowdown. On my desktop I have 8 processors. For the first approach dividing the N = 10000000 X and Y values among them means that each process should be processing N // 8 -> 1250000 values. So I wrote out the random numbers in 16 groups of 1250000 numbers (8 groups for X and 8 groups for Y) as a binary file noting the offset and length of each of these 16 groups using the following code:
import random
random.seed(1)
with open('temp.txt', 'wb') as f:
offsets = []
for i in range(16):
n = [str(random.randrange(300)) for _ in range(1250000)]
b = ','.join(n).encode('ascii')
l = len(b)
offsets.append((f.tell(), l))
f.write(b)
print(offsets)
And from that I constructed lists X_SPECS and Y_SPECS that the worker function matchdictfunc could use for reading in the values X and Y itself as needed. So now instead of passing 1250000 values at a time to this worker function, we are just passing indices 0, 1, ... 7 to the worker function so it knows which group it has to read in. Shared memory access has been totally eliminated in accessing X and Y (it's still required for Q and Z) and the disk access moved to the process pool. The CPU Utilization will, of course, not be 100% because the worker function is doing I/O. But I found that while the running time has now been greatly improved, it still offered no improvement over the original non-multiprocessing version:
OP's original program modified to read `X` and `Y` values in from file: 26.2 seconds
Multiprocessing elapsed time: 29.2 seconds
In fact, when I changed the code to use multithreading by replacing the ProcessPoolExecutor with ThreadPoolExecutor, the elpased time went down almost another second demonstrating the there is very little contention for the Global Interpreter Lock within the worker function, i.e. most of the time is being spent in C-language code. The main work is done by:
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
When we do this with multiprocessing, we have multiple list comprehensions and multiple zip operations (on smaller lists) being performed by separate processes and we then assemble the results in the end. This is conjecture on my part, but there just may not be any performance gains to be had by taking what are already efficient operations and scaling them down across multiple processors. Or in other words, I am stumped and that was my best guess.
The final version (with some additional optimizations -- please note):
import random
import concurrent.futures
import time
POOL_SIZE = 8
X_SPECS = [(0, 4541088), (4541088, 4541824), (9082912, 4540691), (13623603, 4541385), (18164988, 4541459), (22706447, 4542961), (27249408, 4541847), (31791255, 4542186)]
Y_SPECS = [(36333441, 4542101), (40875542, 4540120), (45415662, 4540802), (49956464, 4540971), (54497435, 4541427), (59038862, 4541523), (63580385, 4541571), (68121956, 4542335)]
def init_pool(q_z):
global Q_Z
Q_Z = q_z
def matchdictfunc(index, i): # function to match entries of X and Y to Q and Z and get index of Q/Z where their values match X/Y
x_offset, x_len = X_SPECS[i]
y_offset, y_len = Y_SPECS[i]
with open('temp.txt', 'rb') as f:
f.seek(x_offset, 0)
X = f.read(x_len).decode('ascii').split(',')
f.seek(y_offset, 0)
Y = f.read(y_len).decode('ascii').split(',')
lookup = {}
for i, (q, z) in enumerate(Q_Z):
lookup.setdefault((q, z), []).append(i)
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
matchdict = {}
for ind, match in enumerate(matchlist):
matchdict[index[ind]] = match
return matchdict
def split(a): # function to split list in POOL_SIZE even parts
k, m = divmod(N, POOL_SIZE)
divisions = [(i + 1) * k + min(i + 1, m) - (i * k + min(i, m)) for i in range(POOL_SIZE)]
parts = []
for division in divisions:
part = [next(a) for _ in range(division)]
parts.append(part)
return parts
def main():
global N
random.seed(1)
# N will finally scale up to 10^9
N = 10000000
M = 300
# Q and Z size is fixed at 120000
Q = (str(random.randrange(M)) for _ in range(120000))
Z = (str(random.randrange(M)) for _ in range(120000))
Q_Z = list(zip(Q, Z)) # pre-compute the `zip` function
# for multiprocessing:
# split lists to number of processors (POOL_SIZE)
# generator functions:
index = (str(x) for x in range(N))
index_split = split(index)
with concurrent.futures.ProcessPoolExecutor(POOL_SIZE, initializer=init_pool, initargs=(Q_Z,)) as executor:
finallist = executor.map(matchdictfunc, index_split, range(8))
matchdict = {}
for d in finallist:
matchdict.update(d)
print(len(matchdict))
if __name__ == '__main__':
t = time.time()
main()
print('total time =', time.time() - t)
The Cost of Inter-Process Memory Transfers
In the code below function create_files was called to create 100 identical files consisting of a "pickled" list of 1,000,000 numbers. I then used a multiprocessing pool of size 8 twice to read the 100 files and unpickle the files to reconstitute the original lists. The difference between the first case (worker1) and the second case (worker2) was that in the second case the list is returned back to the caller (but not saved so that memory can be garbage collected immediately). The second case took more than three times longer than the first case. This can also explain in part why you do not see a speedup when you switch to multiprocessing.
from multiprocessing import Pool
import pickle
import time
def create_files():
l = [i for i in range(1000000)]
# create 100 identical files:
for file in range(1, 101):
with open(f'pkl/test{file}.pkl', 'wb') as f:
pickle.dump(l, f)
def worker1(file):
file_name = f'pkl/test{file}.pkl'
with open(file_name, 'rb') as f:
obj = pickle.load(f)
def worker2(file):
file_name = f'pkl/test{file}.pkl'
with open(file_name, 'rb') as f:
obj = pickle.load(f)
return file_name, obj
POOLSIZE = 8
if __name__ == '__main__':
#create_files()
pool = Pool(POOLSIZE)
t = time.time()
# no data returned:
for file in range(1, 101):
pool.apply_async(worker1, args=(file,))
pool.close()
pool.join()
print(time.time() - t)
pool = Pool(POOLSIZE)
t = time.time()
for file in range(1, 101):
pool.apply_async(worker2, args=(file,))
pool.close()
pool.join()
print(time.time() - t)
t = time.time()
for file in range(1, 101):
worker2(file)
print(time.time() - t)
Related
Numpy Vectorization for Nested 'for' loop
I was trying to write a program which plots level set for any given function. rmin = -5.0 rmax = 5.0 c = 4.0 x = np.arange(rmin,rmax,0.1) y = np.arange(rmin,rmax,0.1) x,y = np.meshgrid(x,y) f = lambda x,y: y**2.0 - 4*x realplots = [] for i in range(x.shape[0]): for j in range(x.shape[1]): if abs(f(x[i,j],y[i,j])-c)< 1e-4: realplots.append([x[i,j],y[i,j]])` But it being a nested for loop, is taking lot of time. Any help in vectorizing the above code/new method of plotting level set is highly appreciated.(Note: The function 'f' will be changed at the time of running.So, the vectorization must be done without considering the function's properties) I tried vectorizing through ans = np.where(abs(f(x,y)-c)<1e-4,np.array([x,y]),[0,0]) but it was giving me operands could not be broadcast together with shapes (100,100) (2,100,100) (2,) I was adding [0,0] as an escape from else condition in np.where which is indeed wrong.
Since you get the values rather than the indexes, you don't really need np.where. You can directly use the mask to index x and y, look at the "Boolean array indexing" section of the documentation. It is straightforward: def vectorized(x, y, c, f, threshold): mask = np.abs(f(x, y) - c) < threshold x, y = x[mask], y[mask] return np.stack([x, y], axis=-1) Your function for reference: def op(x, y, c, f, threshold): res = [] for i in range(x.shape[0]): for j in range(x.shape[1]): if abs(f(x[i, j], y[i, j]) - c) < threshold: res.append([x[i, j], y[i, j]]) return res Tests: rmin, rmax = -5.0, +5.0 c = 4.0 threshold = 1e-4 x = np.arange(rmin, rmax, 0.1) y = np.arange(rmin, rmax, 0.1) x, y = np.meshgrid(x, y) f = lambda x, y: y**2 - 4 * x res_op = op(x, y, c, f, threshold) res_vec = vectorized(x, y, c, f, threshold) assert np.allclose(res_op, res_vec)
Why is getting the first 30 keys of the dictionary in two statements faster than one statement?
I was doing a benchmark for myself that I encountered this interesting thing. I am trying to get the first 30 keys of a dictionary, and I have written three ways to get it as follows: import time dic = {str(i): i for i in range(10 ** 6)} start_time = time.time() x = list(dic.keys())[0:30] print(time.time() - start_time) start_time = time.time() y = list(dic.keys()) x = y[0:30] print(time.time() - start_time) start_time = time.time() z = dic.keys() y = list(z) x = y[0:30] print(time.time() - start_time) The results are: 0.015970945358276367 0.010970354080200195 0.01691460609436035 Surprisingly, the second method is much faster! Any thoughts on this?
Using Python's timeit module to measure various alternatives. I added mine which doesn't convert the keys to list: from timeit import timeit dic = {str(i): i for i in range(10 ** 6)} def f1(): x = list(dic.keys())[0:30] return x def f2(): y = list(dic.keys()) x = y[0:30] return x def f3(): z = dic.keys() y = list(z) x = y[0:30] return x def f4(): x = [k for _, k in zip(range(30), dic.keys())] return x t1 = timeit(lambda: f1(), number=10) t2 = timeit(lambda: f2(), number=10) t3 = timeit(lambda: f3(), number=10) t4 = timeit(lambda: f4(), number=10) print(t1) print(t2) print(t3) print(t4) Prints: 0.1911074290110264 0.20418328599771485 0.18727918600779958 3.5186996683478355e-05
Maybe this is due to inaccuracies in your measure of time. You can use timeit for doing this kind of things: import timeit dic = {str(i): i for i in range(10 ** 6)} # 27.5125/29.0836/26.8525 timeit.timeit("x = list(dic.keys())[0:30]", number=1000, globals={"dic": dic}) # 28.6648/26.4684/30.9534 timeit.timeit("y = list(dic.keys());x=y[0:30]", number=1000) # 31.7345/29.5301/30.7541 timeit.timeit("z=dic.keys();y=list(z);x=y[0:30]", number=1000, globals={'dic': dic}) The comments show the times I got when running the same code 3 different times. As you can see, even by performing a large number of repetitions, it is possible to obtain quite large variations in time measured. This can be due to several different things: An item can be in the cache of your processor or not. Your processor can be occupied doing several other things. Etc... As stated by #Andrej Kesely, your bottleneck is due to the fact that you cast your dictionary keys into a list. By doing so, Python goes through the entire dictionary keys, because that's how it converts something to a list generally. Hence, by avoiding this, you can get much better results.
Get output from functions which are run in parallel
I have two functions which I am running asynchronously using python multiprocessing. I need to store values from each function in an array, to use it later. I have tried to store the output in a numpy array in each of the functions and return them. However, I am not getting any output. The print statements are working alright. x = np.array([]) y = np.array([]) def func1(): for _ in range(1,150): print((rstr.xeger(random.choice(sList)))) np.append(x,rstr.xeger(random.choice(sList))) return x def func2(): for _ in range(1,150): print(fake.name()) np.append(y,fake.name()) return y if __name__ == '__main__': p1 = multiprocessing.Pool(5) p2 = multiprocessing.Pool(5) print("Started") start = time.time() x = p1.apply_async(func1) y = p2.apply_async(func2) print(time.time() - start) print("Ended") I need the arrays x and y which are being returned from the two functions.
Better way to solve simultaneous linear equations programmatically in Python
I have the following code that solves simultaneous linear equations by starting with the first equation and finding y when x=0, then putting that y into the second equation and finding x, then putting that x back into the first equation etc... Obviously, this has the potential to reach infinity, so if it reaches +-inf then it swaps the order of the equations so the spiral/ladder goes the other way. This seems to work, tho I'm not such a good mathematician that I can prove it will always work beyond a hunch, and of course some lines never meet (I know how to use matrices and linear algebra to check straight off whether they will never meet, but I'm not so interested in that atm). Is there a better way to 'spiral' in on the answer? I'm not interested in using math functions or numpy for the whole solution - I want to be able to code the solution. I don't mind using libraries to improve the performance, for instance using some sort of statistical method. This may be a very naive question from either a coding or maths point of view, but if so I'd like to know why! My code is as follows: # A python program to solve 2d simultaneous equations # by iterating over coefficients in spirals import numpy as np def Input(coeff_or_constant, var, lower, upper): val = int(input("Let the {} {} be a number between {} and {}: ".format(coeff_or_constant, var, lower, upper))) if val >= lower and val <= upper : return val else: print("Invalid input") exit(0) def Equation(equation_array): a = Input("coefficient", "a", 0, 10) b = Input("coefficient", "b", 0, 10) c = Input("constant", "c", 0, 10) equation_list = [a, b, c] equation_array.append(equation_list) return equation_array def Stringify_Equations(equation_array): A = str(equation_array[0][0]) B = str(equation_array[0][1]) C = str(equation_array[0][2]) D = str(equation_array[1][0]) E = str(equation_array[1][1]) F = str(equation_array[1][2]) eq1 = str(A + "y = " + B + "x + " + C) eq2 = str(D + "y = " + E + "x + " + F) print(eq1) print(eq2) def Spiral(equation_array): a = equation_array[0][0] b = equation_array[0][1] c = equation_array[0][2] d = equation_array[1][0] e = equation_array[1][1] f = equation_array[1][2] # start at y when x = 0 x = 0 infinity_flag = False count = 0 coords = [] coords.append([0, 0]) coords.append([1, 1]) # solve equation 2 for x when y = START while not (coords[0][0] == coords[1][0]): try: y = ( ( b * x ) + c ) / a except: y = 0 print(y) try: x = ( ( d * y ) - f ) / e except: x = 0 if x >= 100000 or x <= -100000: count = count + 1 if count >= 100000: print("It\'s looking like these linear equations don\'t intersect!") break print(x) new_coords = [x, y] coords.append(new_coords) coords.pop(0) if not ((x == float("inf") or x == float("-inf")) and (y == float("inf") or y == float("-inf"))): pass else: infinity_flag if False else True if infinity_flag == False: # if the spiral is divergent this switches the equations around so it converges # the infinity_flag is to check if both spirals returned infinity meaning the lines do not intersect # I think this would mostly work for linear equations, but for other kinds of equations it might not x = 0 a = equation_array[1][0] b = equation_array[1][1] c = equation_array[1][2] d = equation_array[0][0] e = equation_array[0][1] f = equation_array[0][2] infinity_flag = False else: print("These linear equations do not intersect") break y = round(y, 3) x = round(x, 3) print(x, y) equation_array = [] print("Specify coefficients a and b, and a constant c for equation 1") equations = Equation(equation_array) print("Specify coefficients a and b, and a constant c for equation 1") equations = Equation(equation_array) print(equation_array) Stringify_Equations(equation_array) Spiral(equation_array)
Smoothing values (neighbors between 1-9)
Instructions: Compute and store R=1000 random values from 0-1 as x. moving_window_average(x, n_neighbors) is pre-loaded into memory from 3a. Compute the moving window average for x for the range of n_neighbors 1-9. Store x as well as each of these averages as consecutive lists in a list called Y. My solution: R = 1000 n_neighbors = 9 x = [random.uniform(0,1) for i in range(R)] Y = [moving_window_average(x, n_neighbors) for n_neighbors in range(1,n_neighbors)] where moving_window_average(x, n_neighbors) is a function as follows: def moving_window_average(x, n_neighbors=1): n = len(x) width = n_neighbors*2 + 1 x = [x[0]]*n_neighbors + x + [x[-1]]*n_neighbors # To complete the function, # return a list of the mean of values from i to i+width for all values i from 0 to n-1. mean_values=[] for i in range(1,n+1): mean_values.append((x[i-1] + x[i] + x[i+1])/width) return (mean_values) This gives me an error, Check your usage of Y again. Even though I've tested for a few values, I did not get yet why there is a problem with this exercise. Did I just misunderstand something?
The instruction tells you to compute moving averages for all neighbors ranging from 1 to 9. So the below code should work: import random random.seed(1) R = 1000 x = [] for i in range(R): num = random.uniform(0,1) x.append(num) Y = [] Y.append(x) for i in range(1,10): mov_avg = moving_window_average(x, n_neighbors=i) Y.append(mov_avg)
Actually your moving_window_average(list, n_neighbors) function is not going to work with a n_neighbors bigger than one, I mean, the interpreter won't say a thing, but you're not delivering correctness on what you have been asked. I suggest you to use something like: def moving_window_average(x, n_neighbors=1): n = len(x) width = n_neighbors*2 + 1 x = [x[0]]*n_neighbors + x + [x[-1]]*n_neighbors mean_values = [] for i in range(n): temp = x[i: i+width] sum_= 0 for elm in temp: sum_+= elm mean_values.append(sum_ / width) return mean_values
My solution for +100XP import random random.seed(1) R=1000 Y = list() x = [random.uniform(0, 1) for num in range(R)] for n_neighbors in range(10): Y.append(moving_window_average(x, n_neighbors))