I am using Python 3.x and I would like to speed up my code by parallelizing my functions using the multiprocessing module. I applied multiprocessing, but for some reason it didn't work properly and I am not sure where the problem is.
The following is a small example of what I did.
Any suggestions are appreciated.
import numpy as np
import math
import multiprocessing as mp

lower_bound = -500
upper_bound = 500
dimension = 1000
Base_Value = 10
Popula_size = 3000
MinResolution = 8

population_in = np.random.choice(np.linspace(lower_bound, upper_bound, Base_Value ** MinResolution), size=(Popula_size, dimension), replace=True)
resolution = np.random.randint(1, 8, size=(1, dimension))

def Discretiz(lower_bound, upper_bound, DiscPopulation, resolution):
    pop_size = int(len(DiscPopulation))
    the_new_population = np.zeros((pop_size, dimension))
    for i in range(pop_size):
        for ii in range(dimension):
            decimal = int(np.round((DiscPopulation[i][ii] - lower_bound) / ((upper_bound - lower_bound) / (math.pow(Base_Value, resolution[:, ii]) - 1))))
            the_new_population[i, ii] = lower_bound + decimal * ((upper_bound - lower_bound) / (math.pow(Base_Value, resolution[:, ii]) - 1))
    return the_new_population

# without parallelizing
# the_new_population = Discretiz(lower_bound, upper_bound, population_in, resolution)

# with parallelizing
pool = mp.Pool(mp.cpu_count())
the_new_population = [pool.apply(Discretiz, args=(lower_bound, upper_bound, population_in, resolution))]
print(the_new_population)
With:

population_in = np.random.choice(np.linspace(lower_bound, upper_bound, Base_Value ** MinResolution), size=(Popula_size, dimension), replace=True)

you make a 2D array of shape (Popula_size, dimension). This is what gets passed as DiscPopulation, alongside

resolution = np.random.randint(1, 8, size=(1, dimension))

The double-loop function can be replaced with one that operates on whole arrays, without the slow element-by-element iteration:
def Discretiz(lower_bound, upper_bound, DiscPopulation, resolution):
    pop_size = DiscPopulation.shape[0]  # no need for the 'int' (and not actually used below)
    num = DiscPopulation - lower_bound
    divisor = (upper_bound - lower_bound) / (Base_Value ** resolution - 1)
    decimal = num / divisor
    # this divide does (pop, dimension)/(1, dimension); ok by broadcasting
    decimal = np.round(decimal)  # no need for int
    the_new_population = lower_bound + decimal * divisor
    return the_new_population
I wrote this inline here; it is syntactically correct, but I have not tried to run it.
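To make the broadcasting claim concrete, here is a minimal editorial sanity check, with small hypothetical shapes standing in for (Popula_size, dimension) and (1, dimension):

import numpy as np

a = np.ones((4, 3))                # stands in for the (Popula_size, dimension) population
b = np.arange(1, 4).reshape(1, 3)  # stands in for the (1, dimension) divisor row
print((a / b).shape)               # (4, 3): each column is divided by its own divisor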
I fixed the code now, but it is still not faster than the old one; in fact it takes more time, and I am not sure why:
without parallelizing: 25.831339597702026 seconds
with parallelizing: 44.12706518173218 seconds ???!!!
import numpy as np
import math
import time
from multiprocessing import Process, Value, Array, Manager, Pool, cpu_count

lower_bound = -500
upper_bound = 500
dimension = 1000
Base_Value = 10
Popula_size = 2000
MinResolution = 8

population_in = np.random.choice(np.linspace(lower_bound, upper_bound, Base_Value ** MinResolution), size=(Popula_size, dimension), replace=True)
resolution = np.random.randint(1, 8, size=(1, dimension))

start_time = time.time()

def Discretiz1(DiscPopulation, resolution):
    DiscPopulation = np.reshape(DiscPopulation, (Popula_size, dimension))
    resolution = np.reshape(resolution, (1, dimension))
    the_new_population = np.zeros((Popula_size, dimension))
    for i in range(Popula_size):
        for ii in range(dimension):
            decimal = int(np.round((DiscPopulation[i][ii] - lower_bound) / ((upper_bound - lower_bound) / (math.pow(Base_Value, resolution[:, ii]) - 1))))
            the_new_population[i, ii] = lower_bound + decimal * ((upper_bound - lower_bound) / (math.pow(Base_Value, resolution[:, ii]) - 1))
    # print(the_new_population)

if __name__ == '__main__':
    num_cores = cpu_count()
    Pool(processes=num_cores)
    population_in = np.reshape(population_in, (1, Popula_size * dimension))[0]
    resolution = np.reshape(resolution, (1, dimension))[0]
    arr1 = Array('d', population_in)
    arr2 = Array('i', resolution)
    start_time = time.time()
    p = Process(target=Discretiz1, args=(arr1, arr2))
    p.start()
    p.join()
    print("--- %s seconds ---" % (time.time() - start_time))
This is the old one, without parallelizing:
import numpy as np
import math
import time

lower_bound = -500
upper_bound = 500
dimension = 1000
Base_Value = 10
Popula_size = 2000
MinResolution = 8

population_in = np.random.choice(np.linspace(lower_bound, upper_bound, Base_Value ** MinResolution), size=(Popula_size, dimension), replace=True)
resolution = np.random.randint(1, 8, size=(1, dimension))

start_time = time.time()

def Discretiz(lower_bound, upper_bound, DiscPopulation, resolution):
    pop_size = int(len(DiscPopulation))
    the_new_population = np.zeros((pop_size, dimension))
    for i in range(pop_size):
        for ii in range(dimension):
            decimal = int(np.round((DiscPopulation[i][ii] - lower_bound) / ((upper_bound - lower_bound) / (math.pow(Base_Value, resolution[:, ii]) - 1))))
            the_new_population[i, ii] = lower_bound + decimal * ((upper_bound - lower_bound) / (math.pow(Base_Value, resolution[:, ii]) - 1))
    return the_new_population

# without parallelizing
the_new_population = Discretiz(lower_bound, upper_bound, population_in, resolution)
print("--- %s seconds ---" % (time.time() - start_time))
Related
I'm writing a script that tracks the shifts of a sample by estimating the displacement of an ensemble of particles. The first implementation, in Python, works all right, but it takes too long for a large number of samples. To combat this, I tried rewriting the method in Cython, but as this was my first time ever using it, I can't seem to get any performance increase. I know 3D FFTs exist and are often faster than looped 2D FFTs, but in this instance they take too much memory and/or are slower than for-loops.
Python function:
import numpy as np
from scipy.fft import fftshift
import pyfftw

def python_corr(frame_a, frame_b):
    DTYPEf = 'float32'
    DTYPEc = 'complex64'
    k = frame_a.shape[0]
    m = frame_a.shape[1]  # size y of 2d sample
    n = frame_a.shape[2]  # size x of 2d sample
    fs = [m, n]           # sample shape
    bs = [m, n//2 + 1]    # rfft sample shape
    corr = np.zeros([k, m, n], DTYPEf)  # out
    fft_forward = pyfftw.builders.rfft2(
        pyfftw.empty_aligned(fs, dtype=DTYPEf),
        axes=[-2, -1],
    )
    fft_backward = pyfftw.builders.irfft2(
        pyfftw.empty_aligned(bs, dtype=DTYPEc),
        axes=[-2, -1],
    )
    for ind in range(k):  # looping over 2D samples
        window_a = frame_a[ind, :, :]
        window_b = frame_b[ind, :, :]
        corr[ind, :, :] = fftshift(  # cross correlation via FFT algorithm
            np.real(fft_backward(
                np.conj(fft_forward(window_a)) * fft_forward(window_b)
            )),
            axes=[-2, -1]
        )
    return corr
Cython function:
import numpy as np
from scipy.fft import fftshift
import pyfftw
cimport numpy as np
np.import_array()
cimport cython

DTYPEf = np.float32
ctypedef np.float32_t DTYPEf_t
DTYPEc = np.complex64
ctypedef np.complex64_t DTYPEc_t

@cython.boundscheck(False)
@cython.nonecheck(False)
def cython_corr(
    np.ndarray[DTYPEf_t, ndim=3] frame_a,
    np.ndarray[DTYPEf_t, ndim=3] frame_b,
):
    cdef int ind, k, m, n
    k = frame_a.shape[0]
    m = frame_a.shape[1]  # size y of sample
    n = frame_a.shape[2]  # size x of sample
    cdef DTYPEf_t[:, :] window_a = pyfftw.empty_aligned([m, n], dtype=DTYPEf)  # sample a
    window_a[:, :] = 0.
    cdef DTYPEf_t[:, :] window_b = pyfftw.empty_aligned([m, n], dtype=DTYPEf)  # sample b
    window_b[:, :] = 0.
    cdef DTYPEf_t[:, :] corr = pyfftw.empty_aligned([m, n], dtype=DTYPEf)  # cross-corr matrix
    corr[:, :] = 0.
    cdef DTYPEf_t[:, :, :] out = pyfftw.empty_aligned([k, m, n], dtype=DTYPEf)  # out
    out[:, :, :] = 0.
    cdef object fft_forward
    cdef object fft_backward
    cdef DTYPEc_t[:, :] f2a = pyfftw.empty_aligned([m, n//2 + 1], dtype=DTYPEc)  # rfft out of sample a
    f2a[:, :] = 0. + 0.j
    cdef DTYPEc_t[:, :] f2b = pyfftw.empty_aligned([m, n//2 + 1], dtype=DTYPEc)  # rfft out of sample b
    f2b[:, :] = 0. + 0.j
    cdef DTYPEc_t[:, :] r = pyfftw.empty_aligned([m, n//2 + 1], dtype=DTYPEc)  # cross-power spectrum of samples a and b
    r[:, :] = 0. + 0.j
    fft_forward = pyfftw.builders.rfft2(
        pyfftw.empty_aligned([m, n], dtype=DTYPEf),
        axes=[0, 1],
    )
    fft_backward = pyfftw.builders.irfft2(
        pyfftw.empty_aligned([m, n//2 + 1], dtype=DTYPEc),
        axes=[0, 1],
    )
    for ind in range(k):
        window_a = frame_a[ind, :, :]
        window_b = frame_b[ind, :, :]
        r = np.conj(fft_forward(window_a)) * fft_forward(window_b)  # cross-power spectrum of samples a and b
        corr = fft_backward(r).real  # cross correlation
        corr = fftshift(corr, axes=[0, 1])  # shift Q1 --> Q3, Q2 --> Q4
        # the fftshift could be moved out of the loop, but let's use that as a last resort :)
        out[ind, :, :] = corr
    return out
Test for methods:
import time
aa = bb = np.empty([14000, 24, 24]).astype('float32')  # a small test with 14000 24x24px samples
print(f'Number of samples: {aa.shape[0]}')
start = time.time()
corr = python_corr(aa, bb)
print(f'Time for Python: {time.time() - start}')
del corr
start = time.time()
corr = cython_corr(aa, bb)
print(f'Time for Cython: {time.time() - start}')
del corr
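Since this question is left unanswered here, a brief editorial sanity check of the identity both versions rely on may help: the loop computes irfft2(conj(rfft2(a)) * rfft2(b)), which is a circular cross-correlation; the fftshift then just moves zero displacement to the center. A small NumPy-only check (my own, not from the post), with NumPy's rfft2 standing in for pyfftw's:

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8)).astype('float32')
b = rng.standard_normal((8, 8)).astype('float32')

# FFT route, as in the loop above
via_fft = np.fft.irfft2(np.conj(np.fft.rfft2(a)) * np.fft.rfft2(b), s=a.shape)

# direct circular cross-correlation: direct[dy, dx] = sum over (y, x) of a[y, x] * b[y+dy, x+dx]
direct = np.zeros_like(via_fft)
for dy in range(8):
    for dx in range(8):
        direct[dy, dx] = np.sum(a * np.roll(b, (-dy, -dx), axis=(0, 1)))

print(np.allclose(via_fft, direct, atol=1e-3))  # True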
I have used the equation of motion (Newton's law) for a simple spring-and-mass scenario, incorporating it into the given 2nd-order ODE y'' + (k/m)x = 0; y(0) = 3; y'(0) = 0.
Using the Euler method and the exact solution to solve the problem, I have been able to run it and get some OK results. However, when I plot the results I get a diagonal line across the oscillating curves I am after.
[Current plot output with diagonal line]
Can anyone help point out what is causing this issue, and how I can fix it, please?
MY CODE:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sympy import Function, dsolve, Eq, Derivative, sin, cos, symbols
from sympy.abc import x, i
import math
# Given is y" + (k/m)x = 0; y(0) = 3; y'(0) = 0
# Parameters
h = 0.01; #Step Size
t = 50.0; #Time(sec)
k = 1; #Spring Stiffness
m = 1; #Mass
x0 = 3;
v0 = 0;
# Exact Analytical Solution
x_exact = x0*cos(math.sqrt(k/m)*t);
v_exact = -x0*math.sqrt(k/m)*sin(math.sqrt(k/m)*t);
# Eulers Method
x = np.zeros( int( t/h ) );
v = np.zeros( int( t/h ) );
x[1] = x0;
v[1] = v0;
x_exact = np.zeros( int( t/h ) );
v_exact = np.zeros( int( t/h ) );
te = np.zeros( int( t/h ) );
x_exact[1] = x0;
v_exact[1] = v0;
#print(len(x));
for i in range(1, int(t/h) - 1): #MAIN LOOP
    x[i+1] = x[i] + h*v[i];
    v[i+1] = v[i] - h*k/m*x[i];
    te[i] = i * h
    x_exact[i] = x0*cos(math.sqrt(k/m)* te[i]);
    v_exact[i] = -x0*math.sqrt(k/m)*sin(math.sqrt(k/m)* te[i]);
    # print(x_exact[i], '\t'*2, x[i]);
#plot
%config InlineBackend.figure_format = 'svg'
plt.plot(te, x_exact, te, v_exact)
plt.title("DISPLACEMENT")
plt.xlabel("Time (s)")
plt.ylabel("Displacement (m)")
plt.grid(linewidth=0.3)
A somewhat more direct computation is
te = np.arange(0,t,h)
N = len(te)
w = (k/m)**0.5
x_exact = x0*np.cos(w*te);
v_exact = -x0*w*np.sin(w*te);
plt.plot(te, x_exact, te, v_exact)
resulting in a clean plot of the cosine/sine oscillations.
Note that arrays in Python start at index zero, so the initial conditions belong at index 0:

x = np.empty(N)
v = np.empty(N)
x[0] = x0;
v[0] = v0;
for i in range(N - 1): #MAIN LOOP
    x[i+1] = x[i] + h*v[i];
    v[i+1] = v[i] - h*k/m*x[i];
plt.plot(te, x, te, v)
then gives the plot with the expected increasing amplitude.
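An editorial aside on why the explicit-Euler amplitude increases: each step applies the linear map A = [[1, h], [-h*k/m, 1]] to (x, v), and det(A) = 1 + h^2*k/m > 1, so the phase-space area grows by that factor on every step. A one-liner to check:

import numpy as np

h, k, m = 0.01, 1, 1
A = np.array([[1, h], [-h * k / m, 1]])  # one explicit-Euler step acting on (x, v)
print(np.linalg.det(A))  # 1.0001 > 1, so the orbit slowly spirals outward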
I'm trying to run the code below, but I get a syntax error. The complete code and error are:

line 27
    return np.exp(-1.0)*self.rf*self.T)*average
SyntaxError: invalid syntax
import numpy as np
import math
import time

class optionPricing:

    def __init__(self, S0, E, T, rf, sigma, interations):
        self.S0 = S0
        self.E = E
        self.T = T
        self.rf = rf
        self.sigma = sigma
        self.interations = interations

    def call_option_simulation(self):
        option_data = np.zeros([self.interations, 2])
        rand = np.random.normal(0, 1, [1, self.interations])
        stock_price = self.S0*np.exp(self.T*(self.rf - 0.5*self.sigma**2) + self.sigma*np.sqrt(self.T)*rand)
        option_data[:, 1] = stock_price - self.E
        average = np.sum(np.amax(option_data, axis=1))/float(self.interations)
        return np.exp(-1.0)*self.rf*self.T)*average

    def put_option_simulation(self):
        option_data = np.zeros([self.interations, 2])
        rand = np.random.normal(0, 1, [1, self.interations])
        stock_price = self.S0 * np.exp(self.T * (self.rf - 0.5 * self.sigma ** 2) + self.sigma * np.sqrt(self.T) * rand)
        option_data[:, 1] = self.E - stock_price
        average = np.sum(np.amax(option_data, axis=1)) / float(self.interations)
        return np.exp(-1.0) * self.rf * self.T) * average

if __name__ == "__name__":
    S0 = 100                # underlying stock price at t=0
    E = 100                 # strike price
    T = 1                   # time to maturity
    rf = 0.05               # risk-free rate
    sigma = 0.2             # volatility of the underlying stock
    interations = 10000000  # number of iterations in the Monte Carlo simulation
    model = optionPricing(S0, E, T, rf, sigma, interations)
    print("call option price with monte-carlo approach: ", model.call_option_simulation())
    ptint("put option price with monte-carlo approach: ", model.put_option_simulation())
The parentheses are unbalanced: return np.exp(-1.0)*self.rf*self.T)*average has one opening bracket and two closing brackets. The discount factor was presumably meant to be inside the exponential: return np.exp(-1.0*self.rf*self.T)*average (and likewise in put_option_simulation). Note that the code also has if __name__ == "__name__":, which should be if __name__ == "__main__":, and ptint, which should be print.
I am trying to do Monte Carlo minimization to solve for the parameters of a given equation. My equation has 4 parameters, and with n candidate values per parameter the four nested loops do about n**4 evaluations; when I tried n = 100 I saw that it is not a good idea to search the whole parameter space this way.
Here is my code:
import sys
import numpy as np
#import matplotlib.pyplot as plt
#import pandas as pd
import random

# method returns the model value for given x and parameters alpha1..alpha4
def currentFunc(x, alpha1, alpha2, alpha3, alpha4):
    term = -(x/alpha4)
    term_Norm = term
    expoterm = np.exp(term_Norm)
    #print('check term: x: %0.10f %0.10f exp: %0.10f' % (x, term_Norm, expoterm))
    return -alpha1*((alpha2/(alpha3 + expoterm)) - 1)

def sumsquarecurr(x, y, a1, a2, a3, a4):
    xsize = len(x)
    ysize = len(y)
    sumsqdiff = 0
    if xsize != ysize:
        print("check your X and Y length, exiting ...")
        sys.exit(0)
    for i in range(ysize):
        diff = y[i] - currentFunc(x[i], a1, a2, a3, a4)
        sumsqdiff += diff*diff
    return sumsqdiff

# number of random values (this affects the accuracy of the Monte Carlo method)
n = 10
a_rnad = []
b_rnad = []
c_rnad = []
d_rnad = []
for i in range(n):
    #random.seed(555)
    xtemp = random.uniform(0.0, 2.0)
    print('check %.4f ' % (xtemp))
    a_rnad.append(xtemp)
    b_rnad.append(xtemp)
    c_rnad.append(xtemp)
    d_rnad.append(xtemp)

Yfit = [-7, -5, -3, -1, 1, 3, 5, 7]
Xfit = [8.077448e-07, 6.221196e-07, 4.231292e-07, 1.710039e-07, -4.313762e-05, -8.248818e-05, -1.017410e-04, -1.087409e-04]

# placeholder for the parameters and the minimum sum of squares
# [alpha1, alpha2, alpha3, alpha4, min]
minparam = [0, 0, 0, 0, 99999999999.0]
for j in range(len(a_rnad)):
    for i in range(len(b_rnad)):
        for k in range(len(c_rnad)):
            for m in range(len(d_rnad)):
                minsumsqdiff_temp = sumsquarecurr(Xfit, Yfit, a_rnad[j], b_rnad[i], c_rnad[k], d_rnad[m])
                print('alpha1: %.4f alpha2: %.4f alpha3: %.4f alpha4: %.4f min: %0.4f' % (a_rnad[j], b_rnad[i], c_rnad[k], d_rnad[m], minsumsqdiff_temp))
                if minsumsqdiff_temp < minparam[4]:
                    minparam[0] = a_rnad[j]
                    minparam[1] = b_rnad[i]
                    minparam[2] = c_rnad[k]
                    minparam[3] = d_rnad[m]
                    minparam[4] = minsumsqdiff_temp

print('minimization: alpha1: %.4f alpha2: %.4f alpha3: %.4f alpha4: %.4f min: %0.4f' % (minparam[0], minparam[1], minparam[2], minparam[3], minparam[4]))
Question:
Is there a way to make this algorithm run faster (e.g. by cutting down the search/phase space)?
I feel I am reinventing the wheel. Does anyone know a Python module that can do what I am trying to do?
Thanks in advance for your help.
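As an editorial pointer (not from the original thread): this kind of least-squares parameter fit is exactly what scipy.optimize.curve_fit does, replacing the n**4 grid with a local optimizer. A minimal sketch, where the starting guess p0 is a placeholder of my choosing and convergence on this data is not guaranteed:

import numpy as np
from scipy.optimize import curve_fit

def currentFunc(x, alpha1, alpha2, alpha3, alpha4):
    return -alpha1 * ((alpha2 / (alpha3 + np.exp(-x / alpha4))) - 1)

Xfit = np.array([8.077448e-07, 6.221196e-07, 4.231292e-07, 1.710039e-07,
                 -4.313762e-05, -8.248818e-05, -1.017410e-04, -1.087409e-04])
Yfit = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)

# p0 is a placeholder guess (all ones); a poor guess may raise a fit error
popt, pcov = curve_fit(currentFunc, Xfit, Yfit, p0=[1.0, 1.0, 1.0, 1.0])
print(popt)  # fitted alpha1..alpha4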
I was trying to make a program in Python 3.4.1 to obtain the prime numbers from 2 to 100,000.
My problem is that it takes too much time to process all the information and never gives me any result.
I left it running for around half an hour; it slowed down my whole computer and never produced what I wanted.
I am using the Sieve of Eratosthenes algorithm ("Criba de Eratóstenes").
Here is my code:
from math import *

def primos(num):
    num2 = num + 1
    tnumeros = []   # tnumeros = every number from 2 to num
    npnumeros = []  # npnumeros = every number that is not prime
    pnumeros = []   # pnumeros = every prime number
    for a in range(2, num2):
        tnumeros.append(a)
    for i in range(2, int(sqrt(num)) + 1):
        for j in range(i, int(num / i) + 1):
            np = i * j
            npnumeros.append(np)
    npnumeros = list(set(npnumeros))
    for e in tnumeros:
        if e in npnumeros:
            continue
        else:
            pnumeros.append(e)
    return str("".join(str(pnumeros)))

print(primos(100000))
Don't use a list for your npnumeros value; use a set instead. You're only interested in looking up whether a number is in that collection, so make it a set from the start:
npnumeros = set()

# ...

for i in range(2, int(sqrt(num)) + 1):
    for j in range(i, int(num / i) + 1):
        np = i * j
        npnumeros.add(np)

# npnumeros = list(set(npnumeros))  # Remove this line, it's no longer needed

for e in tnumeros:
    if e in npnumeros:
        continue
    else:
        pnumeros.append(e)
The reason your code is slow is that looking up numbers in a list is O(N) time, and doing that inside an O(N) loop is O(N^2) time. But looking up numbers in a set is O(1) time, so you'll have O(N) time inside that loop. Going from O(N^2) to O(N) is going to represent a HUGE difference in processing speed.
If you don't understand the O(N) notation I used, Google "Big O notation" to read more about it.
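For a concrete (editorial) illustration of that gap, a quick membership-test timing with timeit, using a hypothetical 100,000-element collection:

import timeit

data_list = list(range(100000))
data_set = set(data_list)

# worst case for the list: the sought value is at the end
print(timeit.timeit('99999 in data_list', globals=globals(), number=1000))
print(timeit.timeit('99999 in data_set', globals=globals(), number=1000))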
This is a severely truncated answer, since this question should probably be moved to Code Review.
One quick speed-up is simply leaving npnumeros as a set instead of a list. That means the later check if e in npnumeros: happens significantly faster.
The modified code:

from math import *

def primos(num):
    num2 = num + 1
    tnumeros = []   # tnumeros = every number from 2 to num
    npnumeros = []  # npnumeros = every number that is not prime
    pnumeros = []   # pnumeros = every prime number
    for a in range(2, num2):
        tnumeros.append(a)
    for i in range(2, int(sqrt(num)) + 1):
        for j in range(i, int(num / i) + 1):
            np = i * j
            npnumeros.append(np)
    npnumeros = set(npnumeros)
    for e in tnumeros:
        if e in npnumeros:
            continue
        else:
            pnumeros.append(e)
    return str("".join(str(pnumeros)))

print(primos(100000))
runs ~60 times faster.
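A further editorial aside, since the question names the Sieve of Eratosthenes: a conventional boolean-array sieve avoids building a large collection of composites altogether. A minimal sketch (the function name primos_criba is mine):

def primos_criba(num):
    # es_primo[n] is True while n is still presumed prime
    es_primo = [True] * (num + 1)
    es_primo[0] = es_primo[1] = False
    for i in range(2, int(num ** 0.5) + 1):
        if es_primo[i]:
            # strike out every multiple of i, starting at i*i
            for j in range(i * i, num + 1, i):
                es_primo[j] = False
    return [n for n, p in enumerate(es_primo) if p]

print(len(primos_criba(100000)))  # 9592 primes below 100,000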