Slower times in multithreading using Julia 1.3.1 - multithreading

I started recently using the new multithreading interface in the 1.3.1 version. After I tried the fibonacci example in this blog post and getting significant speedups, I started experimenting with some old algorithms of mine.
I have a function that uses the trapezoid method to calculate integrals, both below or above a curve:
function trapezoid( x :: AbstractVector ,
y :: AbstractVector ;
y0 :: Number = 0.0 ,
inv :: Number = NaN )
int = zeros(length(x)-1)
for i = 2:length(x)
if isnan(inv) == true
int[i-1] = (y[i]+y[i-1]-2y0) * (x[i]-x[i-1]) / 2
else
int[i-1] = (2inv-(y[i]+y[i-1])-2y0) * (x[i]-x[i-1]) / 2
end # if
end # for
integral = sum(int) ;
return integral
end
Then I have a very inefficient algorithm that determines the midpoint index of a curve comparing the area below and above the curve:
function EAM_without_threads( x :: Vector{Float64} ,
y :: Vector{Float64} ,
y0 :: Real ,
ymean :: Real )
approx = Vector{Float64}(undef,length(x)-1)
for i in 1:length(x)-1
x1 = #view(x[1:i ])
x2 = #view(x[i:end])
y1 = #view(y[1:i ])
y2 = #view(y[i:end])
Al = trapezoid( x1 , y1 , y0=y0 )
Au = trapezoid( x2 , y2 , inv=ymean )
approx[i] = abs(Al-Au)
end
minind = findmin(approx)[2]
return x[minind]
end
And:
function EAM_with_threads( x :: Vector{Float64} ,
y :: Vector{Float64} ,
y0 :: Real ,
ymean :: Real )
approx = Vector{Float64}(undef,length(x)-1)
for i in 1:length(x)-1
x1 = #view(x[1:i ])
x2 = #view(x[i:end])
y1 = #view(y[1:i ])
y2 = #view(y[i:end])
Al = #spawn trapezoid( x1 , y1 , y0=y0 )
Au = #spawn trapezoid( x2 , y2 , inv=ymean )
approx[i] = abs(fetch(Al)-fetch(Au))
end
minind = findmin(approx)[2]
return x[minind]
end
This is what I used to try both functions:
using SpecialFunctions
using BenchmarkTools
x = collect(-10.0:5e-4:10.0)
y = erf.(x)
And then got these results:
julia> #btime EAM_without_threads(x,y,-1.0,1.0)
7.515 s (315905 allocations: 11.94 GiB)
julia> #btime EAM_with_threads(x,y,-1.0,1.0)
10.295 s (1274131 allocations: 12.00 GiB)
I don't understand... Using htop I can see that all my 8 threads are working almost at full capacity. This is my machine:
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-4712MQ CPU # 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 8
I know about the overhead of dealing with several threads, and in small problems I understand if it's slower, but why in this case?
I'm also searching for multithreading "good practices", because I guess not every piece of code will benefit from parallelism.
Thank you all in advance.

Your code is doing some very redundant work here. It's doing a full trapezoidal integral for each step, instead of just updating Al and Au incrementally. Here I've rewritten the code so that it does zero allocations, and my version of the EAM is on my computer 5 orders of magnitude faster than the original, without using any threads.
In general: before you start looking into things like threading, consider whether your algorithm is efficient. You can get much bigger speedups from a fast algorithm than from threading.
function trapz(x, y; y0=0.0, inv=NaN)
length(x) != length(y) && error("Input arrays cannot have different lengths")
s = zero(eltype(x))
if isnan(inv)
#inbounds for i in eachindex(x, y)[1:end-1]
s += (y[i+1] + y[i] - 2y0) * (x[i+1] - x[i])
end
else
#inbounds for i in eachindex(x, y)[1:end-1]
s += (2inv - (y[i+1] + y[i]) - 2y0) * (x[i+1] - x[i])
end
end
return s / 2
end
function eam(x, y, y0, ymean)
length(x) != length(y) && error("Input arrays cannot have different lengths")
Au = trapz(x, y; inv=ymean)
Al = zero(Au)
amin = abs(Al - Au)
ind = firstindex(x)
#inbounds for i in eachindex(x, y)[2:end-1] # 2:length(x)-1
Al += (y[i] + y[i-1] - 2y0) * (x[i] - x[i-1]) / 2
Au -= (2ymean - (y[i] + y[i-1])) * (x[i] - x[i-1]) / 2
aval = abs(Al - Au)
if aval < amin
(amin, ind) = (aval, i)
end
end
return x[ind]
end
Benchmarks here (I use #time for your code and #btime for my own, since it would just be too time consuming to use #btime on really slow code):
julia> x = collect(-10.0:5e-4:10.0);
julia> y = erf.(x);
julia> #time EAM_without_threads(x, y, -1.0, 1.0)
15.611004 seconds (421.72 k allocations: 11.942 GiB, 11.73% gc time)
0.0
julia> #btime eam($x, $y, -1.0, 1.0)
181.991 μs (0 allocations: 0 bytes)
0.0
A small extra remark: you should not write if isnan(inv) == true, that is redundant. Just write if isnan(inv).

Try this
function EAM_with_threads( x :: Vector{Float64} ,
y :: Vector{Float64} ,
y0 :: Real ,
ymean :: Real )
approx = Vector{Float64}(undef,length(x)-1)
Threads.#threads for i in 1:length(x)-1
x1 = #view(x[1:i ])
x2 = #view(x[i:end])
y1 = #view(y[1:i ])
y2 = #view(y[i:end])
Al = trapezoid( x1 , y1 , y0=y0 )
Au = trapezoid( x2 , y2 , inv=ymean )
approx[i] = abs(Al-Au)
end
minind = findmin(approx)[2]
return x[minind]
end
Your for loop is easily parallelize, so the lowest fruit is to do each iteration of the "for loop" in parallel. It is much easier to reduce the overall time taken by doing this in parallel then to try and parallelize the internal instance of a "for loop".
I know about the overhead of dealing with several threads, and in
small problems I understand if it's slower, but why in this case?
Well, I think your first problem is that you didn't benchmark how long it takes to do
trapezoid( x1 , y1 , y0=y0 )
If you did, you will find that it takes hardly any time at all. Anything that does not take up a substantial amount of time is not worth doing in parallel. If A and B is independent and they both take up a long time then you should do A and B in parallel. Otherwise find something else to parallelize first.
Lets look at what you have
x = collect(-10.0:5e-4:10.0)
and
for i in 1:length(x)-1
So basically your for loop has around 40000 iterations
Your multithreading method takes
total_time = setup_time * 40000 + ind_work_time/2 * 40000
Where as parallelizing the for loop takes
total_time = setup_time * 1 + ind_work_time * 40000/8
For comparison, the non-multithreaded method take
total_time = ind_work_time * 40000

Related

Optimizing asymmetrically reweighted penalized least squares smoothing (from matlab to python)

I'm trying to apply the method for baselinining vibrational spectra, which is announced as an improvement over asymmetric and iterative re-weighted least-squares algorithms in the 2015 paper (doi:10.1039/c4an01061b), where the following matlab code was provided:
function z = baseline(y, lambda, ratio)
% Estimate baseline with arPLS in Matlab
N = length(y);
D = diff(speye(N), 2);
H = lambda*D'*D;
w = ones(N, 1);
while true
W = spdiags(w, 0, N, N);
% Cholesky decomposition
C = chol(W + H);
z = C \ (C' \ (w.*y) );
d = y - z;
% make d-, and get w^t with m and s
dn = d(d<0);
m = mean(d);
s = std(d);
wt = 1./ (1 + exp( 2* (d-(2*s-m))/s ) );
% check exit condition and backup
if norm(w-wt)/norm(w) < ratio, break; end
end
that I rewrote into python:
def baseline_arPLS(y, lam, ratio):
# Estimate baseline with arPLS
N = len(y)
k = [numpy.ones(N), -2*numpy.ones(N-1), numpy.ones(N-2)]
offset = [0, 1, 2]
D = diags(k, offset).toarray()
H = lam * numpy.matmul(D.T, D)
w_ = numpy.ones(N)
while True:
W = spdiags(w_, 0, N, N, format='csr')
# Cholesky decomposition
C = cholesky(W + H)
z_ = spsolve(C.T, w_ * y)
z = spsolve(C, z_)
d = y - z
# make d- and get w^t with m and s
dn = d[d<0]
m = numpy.mean(dn)
s = numpy.std(dn)
wt = 1. / (1 + numpy.exp(2 * (d - (2*s-m)) / s))
# check exit condition and backup
norm_wt, norm_w = norm(w_-wt), norm(w_)
if (norm_wt / norm_w) < ratio:
break
w_ = wt
return(z)
Except for the input vector y the method requires parameters lam and ratio and it runs ok for values lam<1.e+07 and ratio>1.e-01, but outputs poor results. When values are changed outside this range, for example lam=1e+07, ratio=1e-02 the CPU starts heating up and job never finishes (I interrupted it after 1min). Also in both cases the following warning shows up:
/usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/dsolve/linsolve.py: 144: SparseEfficencyWarning: spsolve requires A to be CSC or CSR matrix format warn('spsolve requires A to be CSC or CSR format',
although I added the recommended format='csr' option to the spdiags call.
And here's some synthetic data (similar to one in the paper) for testing purposes. The noise was added along with a 3rd degree polynomial baseline The method works well for parameters bl_1 and fails to converge for bl_2:
import numpy
from matplotlib import pyplot
from scipy.sparse import spdiags, diags, identity
from scipy.sparse.linalg import spsolve
from numpy.linalg import cholesky, norm
import sys
x = numpy.arange(0, 1000)
noise = numpy.random.uniform(low=0, high = 10, size=len(x))
poly_3rd_degree = numpy.poly1d([1.2e-06, -1.23e-03, .36, -4.e-04])
poly_baseline = poly_3rd_degree(x)
y = 100 * numpy.exp(-((x-300)/15)**2)+\
200 * numpy.exp(-((x-750)/30)**2)+ \
100 * numpy.exp(-((x-800)/15)**2) + noise + poly_baseline
bl_1 = baseline_arPLS(y, 1e+07, 1e-01)
bl_2 = baseline_arPLS(y, 1e+07, 1e-02)
pyplot.figure(1)
pyplot.plot(x, y, 'C0')
pyplot.plot(x, poly_baseline, 'C1')
pyplot.plot(x, bl_1, 'k')
pyplot.show()
sys.exit(0)
All this is telling me that I'm doing something very non-optimal in my python implementation. Since I'm not knowledgeable enough about the intricacies of scipy computations I'm kindly asking for suggestions on how to achieve convergence in this calculations.
(I encountered an issue in running the "straight" matlab version of the code because the line D = diff(speye(N), 2); truncates the last two rows of the matrix, creating dimension mismatch later in the function. Following the description of matrix D's appearance I substituted this line by directly creating a tridiagonal matrix using the diags function.)
Guided by the comment #hpaulj made, and suspecting that the loop exit wasn't coded properly, I re-visited the paper and found out that the authors actually implemented an exit condition that was not featured in their matlab script. Changing the while loop condition provides an exit for any set of parameters; my understanding is that algorithm is not guaranteed to converge in all cases, which is why this condition is necessary but was omitted by error. Here's the edited version of my python code:
def baseline_arPLS(y, lam, ratio):
# Estimate baseline with arPLS
N = len(y)
k = [numpy.ones(N), -2*numpy.ones(N-1), numpy.ones(N-2)]
offset = [0, 1, 2]
D = diags(k, offset).toarray()
H = lam * numpy.matmul(D.T, D)
w_ = numpy.ones(N)
i = 0
N_iterations = 100
while i < N_iterations:
W = spdiags(w_, 0, N, N, format='csr')
# Cholesky decomposition
C = cholesky(W + H)
z_ = spsolve(C.T, w_ * y)
z = spsolve(C, z_)
d = y - z
# make d- and get w^t with m and s
dn = d[d<0]
m = numpy.mean(dn)
s = numpy.std(dn)
wt = 1. / (1 + numpy.exp(2 * (d - (2*s-m)) / s))
# check exit condition and backup
norm_wt, norm_w = norm(w_-wt), norm(w_)
if (norm_wt / norm_w) < ratio:
break
w_ = wt
i += 1
return(z)

Where is my code hanging (in an infinite loop)?

I am new to Python and trying to get this script to run, but it seems to be hanging in an infinite loop. When I use ctrl+c to stop it, it is always on line 103.
vs = 20.05 * np.sqrt(Tb + Lb * (y - y0)) # m/s speed of sound as a function of temperature
I am used to MatLab (from school) and the editor it has. I ran into issues earlier with the encoding for this code. Any suggestions on a (free) editor? I am currently using JEdit and/or Notepad.
Here is the full script:
#!/usr/bin/env python
# -*- coding: ANSI -*-
import numpy as np
from math import *
from astropy.table import Table
import matplotlib.pyplot as plt
from hanging_threads import start_monitoring#test for code hanging
start_monitoring(seconds_frozen=10, test_interval=100)
"""Initial Conditions and Inputs"""
d = 154.71/1000 # diameter of bullet (in meters)
m = 46.7 # mass of bullet ( in kg)
K3 = 0.87*0.3735 # drag coefficient at supersonic speed
Cd1 = 0.87*0.108 #drag coefficient at subsonic speed
v0 = 802 # muzzle velocity in m/sec
dt = 0.01 # timestep in seconds
"""coriolis inputs"""
L = 90*np.pi/180 # radians - latitude of firing site
AZ = 90*np.pi/180 # radians - azimuth angle of fire measured clockwise from North
omega = 0.0000727 #rad/s rotation of the earth
"""wind inputs"""
wx = 0 # m/s
wz = 0 # m/s
"""initializing variables"""
vx = 0 #initial x velocity
vy = 0 #initial y velocity
vy0 = 0
y_max = 0 #apogee
v = 0
t = 0
x = 0
"""Variable Atmospheric Pressure"""
rho0 = 1.2041 # density of air at sea-level (kg/m^3)
T = 20 #temperature at sea level in celcius
Tb = T + 273.15 # temperature at sea level in Kelvin
Lb = -2/304.8 # temperature lapse rate in K/m (-2degrees/1000ft)- not valid above 36000ft
y = 0 # current altitude
y0 = 0 # initial altitude
g = 9.81 # acceleration due to gravity in m/s/s
M = 0.0289644 #kg/mol # molar mass of air
R = 8.3144598 # J/molK - universal gas constant
# air density as a function of altitude and temperature
rho = rho0 * ((Tb/(Tb+Lb*(y-y0)))**(1+(g*M/(R*Lb))))
"""Variable Speed of Sound"""
vs = 20.05*np.sqrt(Tb +Lb*(y-y0)) # m/s speed of sound as a function of temperature
Area = pi*(d/2)**2 # computing the reference area
phi_incr = 5 #phi0 increment (degrees)
N = 12 # length of table
"""Range table"""
dtype = [('phi0', 'f8'), ('phi_impact', 'f8'), ('x', 'f8'), ('z', 'f8'),('y', 'f8'), ('vx', 'f8'), ('vz', 'f8'), ('vy', 'f8'), ('v', 'f8'),('M', 'f8'), ('t', 'f8')]
table = Table(data=np.zeros(N, dtype=dtype))
"""Calculates entire trajectory for each specified angle"""
for i in range(N):
phi0 = (i + 1) * phi_incr
"""list of initial variables used in while loop"""
t = 0
y = 0
y_max = y
x = 0
z = 0
vx = v0*np.cos(radians(phi0))
vy = v0*np.sin(radians(phi0))
vx_w = 0
vz_w = 0
vz = 0
v = v0
ay = 0
ax = 0
wx = wx
wz = wz
rho = rho0 * ((Tb / (Tb + Lb * (y - y0))) ** (1 + (g * M / (R * Lb))))
vs = 20.05 * np.sqrt(Tb + Lb * (y - y0)) # m/s speed of sound as a function of temperature
ax_c = -2 * omega * ((vz * sin(L)) + vy * cos(L) * sin(AZ))
ay_c = 2 * omega * ((vz * cos(L) * cos(AZ)) + vx_w * cos(L) * sin(AZ))
az_c = -2 * omega * ((vy * cos(L) * cos(AZ)) - vx_w * sin(L))
Mach = v/vs
""" initializing variables for plots"""
t_list = [t]
x_list = [x]
y_list = [y]
vy_list = [vy]
v_list = [v]
phi0_list = [phi0]
Mach_list = [Mach]
while y >= 0:
phi0 = phi0
"""drag calculation with variable density, Temp and sound speed"""
rho = rho0 * ((Tb / (Tb + Lb * (y - y0))) ** (1 + (g * M / (R *Lb))))
vs = 20.05 * np.sqrt(Tb + Lb * (y - y0)) # m/s speed of sound as a function of temperature
Cd3 = K3 / sqrt(v / vs)
Mach = v/vs
"""Determining drag regime"""
if v > 1.2 * vs: #supersonic
Cd = Cd3
elif v < 0.8 * vs: #subsonic
Cd = Cd1
else: #transonic
Cd = ((Cd3 - Cd1)*(v/vs - 0.8)/(0.4)) + Cd1
"""Acceleration due to Coriolis"""
ax_c = -2*omega*((vz_w*sin(L))+ vy*cos(L)*sin(AZ))
ay_c = 2*omega*((vz_w*cos(L)*cos(AZ))+ vx_w*cos(L)*sin(AZ))
az_c = -2*omega*((vy*cos(L)*cos(AZ))- vx_w*sin(L))
"""Total acceleration calcs"""
if vx > 0:
ax = -0.5*rho*((vx-wx)**2)*Cd*Area/m + ax_c
else:
ax = 0
""" Vy before and after peak"""
if vy > 0:
ay = (-0.5 * rho * (vy ** 2) * Cd * Area / m) - g + ay_c
else:
ay = (0.5 * rho * (vy ** 2) * Cd * Area / m) - g + ay_c
az = az_c
vx = vx + ax*dt # vx without wind
# vx_w = vx with drag and no wind + wind
vx_w = vx + 2*wx*(1-(vx/v0*np.cos(radians(phi0))))
vy = vy + ay*dt
vz = vz + az*dt
vz_w = vz + wz*(1-(vx/v0*np.cos(radians(phi0))))
"""projectile velocity"""
v = sqrt(vx_w**2 + vy**2 + vz**2)
"""new x, y, z positions"""
x = x + vx_w*dt
y = y + vy*dt
z = z + vz_w*dt
if y_max <= y:
y_max = y
phi_impact = degrees(atan(vy/vx)) #impact angle in degrees
""" appends selected data for ability to plot"""
t_list.append(t)
x_list.append(x)
y_list.append(y)
vy_list.append(vy)
v_list.append(v)
phi0_list.append(phi0)
Mach_list.append(Mach)
if y < 0:
break
t += dt
"""Range table output"""
table[i] = ('%.f' % phi0, '%.3f' % phi_impact, '%.1f' % x,'%.2f' % z, '%.1f' % y_max, '%.1f' % vx_w,'%.1f' % vz,'%.1f' % vy,'%.1f' % v,'%.2f' %Mach, '%.1f' % t)
""" Plot"""
plt.plot(x_list, y_list, label='%d°' % phi0)#plt.plot(x_list, y_list, label='%d°' % phi0)
plt.title('Altitude versus Range')
plt.ylabel('Altitude (m)')
plt.xlabel('Range (m)')
plt.axis([0, 30000, 0, 15000])
plt.grid(True)
print(table)
legend = plt.legend(title="Firing Angle",loc=0, fontsize='small', fancybox=True)
plt.show()
Thank you in advance
Which Editor Should I Use?
Personally, I prefer VSCode, but Sublime is also pretty popular. If you really want to go barebones, try Vim. All three are completely free.
Code Errors
After scanning your code snippet, it appears that you are caught in an infinite loop, which you enter with the statement while y >= 0. The reason you always get line 103 when you hit Ctrl+C is likely because that takes the longest, making it more likely to land there at any given time.
Note that currently, you can only escape your while loop through this branch:
if y_max <= y:
y_max= y
phi_impact = degrees(atan(vy/vx)) #impact angle in degrees
""" appends selected data for ability to plot"""
t_list.append(t)
x_list.append(x)
y_list.append(y)
vy_list.append(vy)
v_list.append(v)
phi0_list.append(phi0)
Mach_list.append(Mach)
if y < 0:
break
t += dt
This means that if ymax never drops below y, or y never drops below zero, then you will infinitely loop. Granted, I haven't looked at your code in any great depth, but from the surface it appears that y_max is never decremented (meaning it will always be at least equal to y). Furthermore, y is only updated when you do y = y + vy*dt, which will only ever increase y if vy >= 0 (I assume dt is always positive).
Debugging
As #Giacomo Catenazzi suggested, try printing out y and y_max at the top of the while loop and see how they change as your code runs. I suspect they are not decrementing like you expected.

Simpson's rule 3/8 for n intervals in Python

im trying to write a program that gives the integral approximation of e(x^2) between 0 and 1 based on this integral formula:
Formula
i've done this code so far but it keeps giving the wrong answer (Other methods gives 1.46 as an answer, this one gives 1.006).
I think that maybe there is a problem with the two for cycles that does the Riemman sum, or that there is a problem in the way i've wrote the formula. I also tried to re-write the formula in other ways but i had no success
Any kind of help is appreciated.
import math
import numpy as np
def f(x):
y = np.exp(x**2)
return y
a = float(input("¿Cual es el limite inferior? \n"))
b = float(input("¿Cual es el limite superior? \n"))
n = int(input("¿Cual es el numero de intervalos? "))
x = np.zeros([n+1])
y = np.zeros([n])
z = np.zeros([n])
h = (b-a)/n
print (h)
x[0] = a
x[n] = b
suma1 = 0
suma2 = 0
for i in np.arange(1,n):
x[i] = x[i-1] + h
suma1 = suma1 + f(x[i])
alfa = (x[i]-x[i-1])/3
for i in np.arange(0,n):
y[i] = (x[i-1]+ alfa)
suma2 = suma2 + f(y[i])
z[i] = y[i] + alfa
int3 = ((b-a)/(8*n)) * (f(x[0])+f(x[n]) + (3*(suma2+f(z[i]))) + (2*(suma1)))
print (int3)
I'm not a math major but I remember helping a friend with this rule for something about waterplane area for ships.
Here's an implementation based on Wikipedia's description of the Simpson's 3/8 rule:
# The input parameters
a, b, n = 0, 1, 10
# Divide the interval into 3*n sub-intervals
# and hence 3*n+1 endpoints
x = np.linspace(a,b,3*n+1)
y = f(x)
# The weight for each points
w = [1,3,3,1]
result = 0
for i in range(0, 3*n, 3):
# Calculate the area, 4 points at a time
result += (x[i+3] - x[i]) / 8 * (y[i:i+4] * w).sum()
# result = 1.4626525814387632
You can do it using numpy.vectorize (Based on this wikipedia post):
a, b, n = 0, 1, 10**6
h = (b-a) / n
x = np.linspace(0,n,n+1)*h + a
fv = np.vectorize(f)
(
3*h/8 * (
f(x[0]) +
3 * fv(x[np.mod(np.arange(len(x)), 3) != 0]).sum() + #skip every 3rd index
2 * fv(x[::3]).sum() + #get every 3rd index
f(x[-1])
)
)
#Output: 1.462654874404461
If you use numpy's built-in functions (which I think is always possible), performance will improve considerably:
a, b, n = 0, 1, 10**6
x = np.exp(np.square(np.linspace(0,n,n+1)*h + a))
(
3*h/8 * (
x[0] +
3 * x[np.mod(np.arange(len(x)), 3) != 0].sum()+
2 * x[::3].sum() +
x[-1]
)
)
#Output: 1.462654874404461

Manually implementing approximation functions

I have a dataset from kaggle of 45,253 rows and a single column for temperature in Kelvin for the city of Detroit. It's mean = 282.97, std = 11, min = 243.48, max = 308.05.
This is the result when plotted as a histogram of 100 bins with density=True:
I am expected to write the following two functions and see whichever one approximates the closest to the histogram:
Like this one here using scipy.stats.norm.pdf:
I generated the above image using:
x = np.linspace(dataset.Detroit.min(), dataset.Detroit.max(), 1001)
P_norm = norm.pdf(x, dataset.Detroit.mean(), dataset.Detroit.std())
plot_pdf_single(x, P_norm)
However, whenever I try to implement any of the two approximation functions all of my values for P_norm result in 0s or infs.
This is what I tried:
P_norm = [(1.0/(np.sqrt(2.0*pi*(std*std))))*np.exp(((-x_i-mu)*(-x_i-mu))/(2.0*(std*std))) for x_i in x]
I also broke it down into parts for a single x_i:
part1 = ((-x[0] - mu)*(-x[0] - mu)) / (2.0*(std * std))
part2 = np.exp(part1)
part3 = 1.0 / (np.sqrt(2.0 * pi * (std*std)))
total = part3*part2
I got the following values:
1145.3913234604413
inf
0.036267480036493875
inf
Since both of the equations use the same formula:
def pdf_approximation(x_i, mu, std):
return (1.0 / (np.sqrt(2.0 * pi * (std*std)))) * np.exp((-(x_i-mu)*(x_i-mu)) / (2.0 * (std*std)))
The code for the first approximation is:
mu = 283
std = 11
P_norm = np.array([pdf_approximation(x_i, mu, std) for x_i in x])
plot_pdf_single(x, P_norm)
The code for the second approximation is:
mu1 = 276
std1 = 6
mu2 = 293
std2 = 6.5
P_norm = np.array([(pdf_approximation(x_i, mu1, std1) * 0.5) + (pdf_approximation(x_i, mu2, std2) * 0.5) for x_i in x])
plot_pdf_single(x, P_norm)

Prevent rounding, maintaining certain level of accuracy

I am trying to apply a Runge Kutta method for solving an ODE. The problem is, python somewhere keeps rounding like a madman and I don't understand why or is something syntactically telling python to round everything? I've tried converting everything to float () to no avail. What should I do to have python compute everything satisfying some accuracy demand?
import numpy as np
def fn(x,y):
return x-y
def rk3 (y0,x):
n = len (x)
y = np.array([y0]*n)
for j in range(n-1):
h = x[j+1]-x[j]
k1 = h * fn(x[j],y[j])
k2 = h * fn(x[j] + h / 3.0, y[j] + k1 / 3.0)
k3 = h * fn(x[j] + 2.0*h /3.0, y[j] + 2.0*k2 /3.0)
y[j+1] = y[j] + k1*1.0/4.0 + k3 *3.0/4.0
return y
v = rk3(1, np.linspace(0,5,500))
The mistake is passing an integer in rk3(1,np.linspace(0,5,500)). If one changes to 1.0 all further operations are regarded as float point arithmetic as required.

Resources