Divide by Zero in Mean()? - python-3.x

I'm trying to write some code to compute mean, Variance, Standard Deviation, FWHM, and finally evaluate the Gaussian Integral. I've been running into a division by zero error that I can't get past and I would like to know the solution for this ?
Where it's throwing an error I've tried to throw an exception handler as follows
Average = (sum(yvalues)) / (len(yvalues)) try: return (sum(yvalues) / len(yvalues))
expect ZeroDivisionError:
return 0
xvalues = []
yvalues = []
def generate():
for i in range(0,300):
a = rand.uniform((float("-inf") , float("inf")))
b = rand.uniform((float("-inf") , float("inf")))
xvalues.append(i)
### Defining the variable 'y'
y = a * (b + i)
yvalues.append(y) + 1
def mean():
Average = (sum(yvalues))/(len(yvalues))
print("The average is", Average)
return Average
def varience():
# This calculates the SD and the varience
s = []
for i in yvalues:
z = i - mean()
z = (np.abs(i-z))**2
s.append(y)**2
t = mean()
v = numpy.sqrt(t)
print("Answer for Varience is:", v)
return v
Traceback (most recent call last):
File "Tuesday.py", line 42, in <module>
def make_gauss(sigma=varience(), mu=mean(), x = random.uniform((float("inf"))*-1, float("inf"))):
File "Tuesday.py", line 35, in varience
t = mean()
File "Tuesday.py", line 25, in mean
Average = (sum(yvalues))/(len(yvalues))
ZeroDivisionError: division by zero

There are a few things that are not quite right as people noted above.
import random
import numpy as np
def generate():
xvalues, yvalues = [], []
for i in range(0,300):
a = random.uniform(-1000, 1000)
b = random.uniform(-1000, 1000)
xvalues.append(i)
### Defining the variable 'y'
y = a * (b + i)
yvalues.append(y)
return xvalues, yvalues
def mean(yvalues):
return sum(yvalues)/len(yvalues)
def variance(yvalues):
# This calculates the SD and the varience
s = []
yvalues_mean = mean(yvalues)
for y in yvalues:
z = (y - yvalues_mean)**2
s.append(z)
t = mean(s)
return t
def variance2(yvalues):
yvalues_mean = mean(yvalues)
return sum( (y-yvalues_mean)**2 for y in yvalues) / len(yvalues)
# Generate the xvalues and yvalues
xvalues, yvalues = generate()
# Now do the calculation, based on the passed parameters
mean_yvalues = mean(yvalues)
variance_yvalues = variance(yvalues)
variance_yvalues2 = variance2(yvalues)
print('Mean {} variance {} {}'.format(mean_yvalues, variance_yvalues, variance_yvalues2))
# Using Numpy
np_mean = np.mean(yvalues)
np_var = np.var(yvalues)
print('Numpy: Mean {} variance {}'.format(np_mean, np_var))
The way variance was calculated isn't quite right, but given the comment of "SD and variance" you were probably going to calculate both.
The code above gives 2 (well, 3) ways to do what I understand you were trying to do but I changed a few of the methods to clean them up a bit. generate() returns two lists now. mean() returns the mean, etc. The function variance2() gives an alternative way to calculate the variance but using a list comprehension style.
The last couple of lines are an example using numpy which has all of it built in and, if available, is a great way to go.
The one part that wasn't clear was the random.uniform(float("-inf"), float("inf"))) which seems to be an error (?).

You are calling mean before you call generate.
This is obvious since yvalues.append(y) + 1 (in generate) would have caused another error (TypeError) since .append returns None and you can't add 1 to None.
Change yvalues.append(y) + 1 to yvalues.append(y + 1) and then make sure to call generate before you call mean.
Also notice that you have the same error in varience (which should be called variance, btw). s.append(y)**2 should be s.append(y ** 2).
Another error you have is that the stacktrace shows make_gauss(sigma=varience(), mu=mean(), x = random.uniform((float("inf"))*-1, float("inf"))).
I'm pretty sure you don't actually want to call varience and mean on this line, just reference them. So also change that line to make_gauss(sigma=varience, mu=mean, x = random.uniform((float("inf"))*-1, float("inf")))

Related

scipy solve_ivp with adaptive solution

I am struggling to understand how scipy.solve_ivp() handles errors in a system of ODE. Lets say I have the following, simple code for a single ODEs, and I think I might be doing things wrong in some way. Lets say my rhs looks something like:
from scipy.integrate import solve_ivp
def rhs_func(t, y):
z = 1.0/( x - y + 1j )
return z
Suppose we call solve_ivp with the following signature:
Z_solution = ivp_adaptive.solve_ivp( fun = rhs_func,
t_span = [100,0],
y0 = y0, #some initial value of 0 for example
method ='RK45',
t_eval = None,
args = some_additional_arguments_to_rhs_func,
dense_output = False,
rtol = 1e-8
atol = 1e-10
)
Now, the absolute and relative tolerances are supposed to fix the error of the caculation. The problem I am having has to do with the "t_eval=None" in this case. Apparently, this choice lets the integrator (in this case of type RK45) to choose the time step according to the specified tolerances above being or not being exceeded, i.e., the steps are not fixed, but somehow taking a larger step in t would mean a solution has been found that lies below the tolerances above (atol=1e-10 , rtol=1e-8). This is particularly useful in problems with large variations of the time scale, where a uniform discretization of t is very inefficient.
My big problem has to do with the following piece of code in scipy.integrate._ivp.solve_ivp() around line 575, with the "t_eval == None" case:
while status is None:
message = solver.step()
if solver.status == 'finished':
status = 0
elif solver.status == 'failed':
status = -1
break
t_old = solver.t_old
t = solver.t
y = solver.y
if dense_output:
sol = solver.dense_output()
interpolants.append(sol)
else:
sol = None
if events is not None:
g_new = [event(t, y) for event in events]
active_events = find_active_events(g, g_new, event_dir)
if active_events.size > 0:
if sol is None:
sol = solver.dense_output()
root_indices, roots, terminate = handle_events(
sol, events, active_events, is_terminal, t_old, t)
for e, te in zip(root_indices, roots):
t_events[e].append(te)
y_events[e].append(sol(te))
if terminate:
status = 1
t = roots[-1]
y = sol(t)
g = g_new
# HERE I HAVE MODIFIED THE FILE BY CALLING AN INTERPOLATION FUNCTION FOR THE SOLUTION
if t_eval is None:
ts.append(t)
#ys.append(y)
# this calls to adapt the solution to a new set of values x over which y(x,t) is
# defined
interp_solution(t,y,solver,args)
y = solver.y
ys.append(y)
where I have defined a function:
def interp_solution( t, y, solver, args ):
import numpy as np
from scipy import interpolate
x_old = args.get_old_grid() # this call just returns an array of the style of
# x_new, and is where y is defined
x_new = np.linspace( -t, t, dim ) # the new array where components of y are
# defined
y_interp = interpolate.interp1d( x_old, y )
y_new = y_interp( x_new )
solver.y = y_new # update the solver y
# finally, we change the maximum allowed step of the integrator if t is below
# some threshold value
if ( t < args.get_threshold() ):
solver.max_step = #some number
return y_new
When I look at the results, it seems that this is very sensitive to the tolerances and the way the integration steps are performed, but somehow I fail to see where errors could come from in this approach -- can anyone explain if this approach is somehow affecting the solution and the associated errors ? How can one implement a similar approach in this fashion? Any help is greatly appreaciated.

optimization doesn't respect constraint

I have an optimization problem and I'm solving it with scipy and the minimization module. I uses SLSQP as method, because it is the only one, which fits to my problem. The function to optimize is a cost function with 'x' as a list of percentages. I have some constraints which has to be respected:
At first, the sum of the percentages should be 1 (PercentSum(x)) This constrain is added as 'eg' (equal) as you can see in the code.
The second constraint is about a physical value which must be less then 'proberty1Max '. This constrain is added as 'ineq' (inequal). So if 'proberty1 < proberty1Max ' the function should be bigger than 0. Otherwise the function should be 0. The functions is differentiable.
Below you can see a model of my try. The problem is the 'constrain' function. I get solutions, where the sum of 'prop' is bigger than 'probertyMax'.
import numpy as np
from scipy.optimize import minimize
class objects:
def __init__(self, percentOfInput, min, max, cost, proberty1, proberty2):
self.percentOfInput = percentOfInput
self.min = min
self.max = max
self.cost = cost
self.proberty1 = proberty1
self.proberty2 = proberty2
class data:
def __init__(self):
self.objectList = list()
self.objectList.append(objects(10, 0, 20, 200, 2, 7))
self.objectList.append(objects(20, 5, 30, 230, 4, 2))
self.objectList.append(objects(30, 10, 40, 270, 5, 9))
self.objectList.append(objects(15, 0, 30, 120, 2, 2))
self.objectList.append(objects(25, 10, 40, 160, 3, 5))
self.proberty1Max = 1
self.proberty2Max = 6
D = data()
def optiFunction(x):
for index, obj in enumerate(D.objectList):
obj.percentOfInput = x[1]
costSum = 0
for obj in D.objectList:
costSum += obj.cost * obj.percentOfInput
return costSum
def PercentSum(x):
y = np.sum(x) -100
return y
def constraint(x, val):
for index, obj in enumerate(D.objectList):
obj.percentOfInput = x[1]
prop = 0
if val == 1:
for obj in D.objectList:
prop += obj.proberty1 * obj.percentOfInput
return D.proberty1Max -prop
else:
for obj in D.objectList:
prop += obj.proberty2 * obj.percentOfInput
return D.proberty2Max -prop
def checkConstrainOK(cons, x):
for con in cons:
y = con['fun'](x)
if con['type'] == 'eq' and y != 0:
print("eq constrain not respected y= ", y)
return False
elif con['type'] == 'ineq' and y <0:
print("ineq constrain not respected y= ", y)
return False
return True
initialGuess = []
b = []
for obj in D.objectList:
initialGuess.append(obj.percentOfInput)
b.append((obj.min, obj.max))
bnds = tuple(b)
cons = list()
cons.append({'type': 'eq', 'fun': PercentSum})
cons.append({'type': 'ineq', 'fun': lambda x, val=1 :constraint(x, val) })
cons.append({'type': 'ineq', 'fun': lambda x, val=2 :constraint(x, val) })
solution = minimize(optiFunction,initialGuess,method='SLSQP',\
bounds=bnds,constraints=cons,options={'eps':0.001,'disp':True})
print('status ' + str(solution.status))
print('message ' + str(solution.message))
checkConstrainOK(cons, solution.x)
There is no way to find a solution, but the output is this:
Positive directional derivative for linesearch (Exit mode 8)
Current function value: 4900.000012746761
Iterations: 7
Function evaluations: 21
Gradient evaluations: 3
status 8
message Positive directional derivative for linesearch
Where is my fault? In this case it ends with mode 8, because the example is very small. With bigger data the algorithm ends with mode 0. But I think it should ends with a hint that an constraint couldn't be hold.
It doesn't make a difference, if proberty1Max is set to 4 or to 1. But in the case it is 1, there could not be a valid solution.
PS: I changed a lot in this question... Now the code is executable.
EDIT:
1.Okay, I learned, an inequal constrain is accepted if the output is positiv (>0). In the past I think <0 would also be accepted. Because of this the constrain function is now a little bit shorter.
What about the constrain. In my real solution I add some constrains using a loop. In this case it is nice to feed a function with an index of the loop and in the function this index is used to choose an element of an array. In my example here, the "val" decides if the constrain is for proberty1 oder property2. What the constrain mean is, how much of a property is in the hole mix. So I'm calculating the property multiplied with the percentOfInput. "prop" is the sum of this over all objects.
I think there might be a connection to the issue tux007 mentioned in the comments. link to the issue
I think the optimizer doesn't work correct, if the initial guess is not a valid solution.
Linear programming is not good for overdetermined equations. My problem doesn't have a unique solution, its an approximation.
As mentioned in the comment I think this is the problem:
Misleading output from....
If you have a look at the latest changes, the constrain is not satisfied, but the algorithm says: "Positive directional derivative for linesearch"

simpson integration on python

I am trying to integrate numerically using simpson integration rule for f(x) = 2x from 0 to 1, but keep getting a large error. The desired output is 1 but, the output from python is 1.334. Can someone help me find a solution to this problem?
thank you.
import numpy as np
def f(x):
return 2*x
def simpson(f,a,b,n):
x = np.linspace(a,b,n)
dx = (b-a)/n
for i in np.arange(1,n):
if i % 2 != 0:
y = 4*f(x)
elif i % 2 == 0:
y = 2*f(x)
return (f(a)+sum(y)+f(x)[-1])*dx/3
a = 0
b = 1
n = 1000
ans = simpson(f,a,b,n)
print(ans)
There is everything wrong. x is an array, everytime you call f(x), you are evaluating the function over the whole array. As n is even and n-1 odd, the y in the last loop is 4*f(x) and from its sum something is computed
Then n is the number of segments. The number of points is n+1. A correct implementation is
def simpson(f,a,b,n):
x = np.linspace(a,b,n+1)
y = f(x)
dx = x[1]-x[0]
return (y[0]+4*sum(y[1::2])+2*sum(y[2:-1:2])+y[-1])*dx/3
simpson(lambda x:2*x, 0, 1, 1000)
which then correctly returns 1.000. You might want to add a test if n is even, and increase it by one if that is not the case.
If you really want to keep the loop, you need to actually accumulate the sum inside the loop.
def simpson(f,a,b,n):
dx = (b-a)/n;
res = 0;
for i in range(1,n): res += f(a+i*dx)*(2 if i%2==0 else 4);
return (f(a)+f(b) + res)*dx/3;
simpson(lambda x:2*x, 0, 1, 1000)
But loops are generally slower than vectorized operations, so if you use numpy, use vectorized operations. Or just use directly scipy.integrate.simps.

Weighted moving average in python with different width in different regions

I was trying to take a oscillation avarage of a highly oscillating data. The oscillations are not uniform, it has less oscillations in the initial regions.
x = np.linspace(0, 1000, 1000001)
y = some oscillating data say, sin(x^2)
(The original data file is huge, so I can't upload it)
I want to take a weighted moving avarage of the function and plot it. Initially the period of the function is larger, so I want to take avarage over a large time interval. While I can do with smaller time interval latter.
I have found a possible elegant solution in following post:
Weighted moving average in python
However, I want to have different width in different regions of x. Say when x is between (0,100) I want the width=0.6, while when x is between (101, 300) width=0.2 and so on.
This is what I have tried to implement( with my limited knowledge in programing!)
def weighted_moving_average(x,y,step_size=0.05):#change the width to control average
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
#We're going to weight with a Gaussian function
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
if x.any() < 100:
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=0.6)
bin_avg[index] = np.average(y,weights=weights)
else:
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=0.1)
bin_avg[index] = np.average(y,weights=weights)
return (bin_centers,bin_avg)
It is needless to say that this is not working! I am getting the plot with the first value of sigma. Please help...
The following snippet should do more or less what you tried to do. You have mainly a logical problem in your code, x.any() < 100 will always be True, so you'll never execute the second part.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
y = np.sin(x**2)
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
def weighted_average(x,y,step_size=0.3):
weights = np.zeros_like(x)
bin_centers = np.arange(np.min(x),np.max(x)-.5*step_size,step_size)+.5*step_size
bin_avg = np.zeros_like(bin_centers)
for i, center in enumerate(bin_centers):
# Select the indices that should count to that bin
idx = ((x >= center-.5*step_size) & (x <= center+.5*step_size))
weights = gaussian(x[idx], mean=center, sigma=step_size)
bin_avg[i] = np.average(y[idx], weights=weights)
return (bin_centers,bin_avg)
idx = x <= 4
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.6))
idx = x >= 3
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.1))
plt.plot(x,y)
plt.legend(['0.6', '0.1', 'y'])
plt.show()
However, depending on the usage, you could also implement moving average directly:
x = np.linspace(0, 60, 1000)
y = np.sin(x**2)
z = np.zeros_like(x)
z[0] = x[0]
for i, t in enumerate(x[1:]):
a=.2
z[i+1] = a*y[i+1] + (1-a)*z[i]
plt.plot(x,y)
plt.plot(x,z)
plt.legend(['data', 'moving average'])
plt.show()
Of course you could then change a adaptively, e.g. depending of the local variance. Also note that this has apriori a small bias depending on a and the step size in x.

repeating tests in multiple functions python

I have some function for sound processing/ sound processing. And before it was all a single channel. But know i make it less or more multi channel.
At this point i have the feeling i do part of the scrips over and over again.
In this example it are two functions(my original function is longer) but the same happens also in single scripts.
my Two functions
import numpy as np
# def FFT(x, fs, *args, **kwargs):
def FFT(x, fs, output='complex'):
from scipy.fftpack import fft, fftfreq
N = len(x)
X = fft(x) / N
if output is 'complex':
F = np.linspace(0, N) / (N / fs)
return(F, X, [])
elif output is 'ReIm':
F = np.linspace(0, N) / (N / fs)
RE = np.real(X)
IM = np.imag(X)
return(F, RE, IM)
elif output is 'AmPh0':
F = np.linspace(0, (N-1)/2, N/2)
F = F/(N/fs)
# N should be int becouse of nfft
half_spec = np.int(N / 2)
AMP = abs(X[0:half_spec])
PHI = np.arctan(np.real(X[0:half_spec]) / np.imag(X[0:half_spec]))
return(F, AMP, PHI)
elif output is 'AmPh':
half_spec = np.int(N / 2)
F = np.linspace(1, (N-1)/2, N/2 - 1)
F = F/(N/fs)
AMP = abs(X[1:half_spec])
PHI = np.arctan(np.real(X[1:half_spec])/np.imag(X[1:half_spec]))
return(F, AMP, PHI)
def mFFT(x, fs, spectrum='complex'):
fft_shape = np.shape(x)
if len(fft_shape) == 1:
mF, mX1, mX2 = FFT(x, fs, spectrum)
elif len(fft_shape) == 2:
if fft_shape[0] < fft_shape[1]:
pass
elif fft_shape[0] > fft_shape[1]:
x = x.T
fft_shape = np.shape(x)
mF = mX1 = mX2 = []
for channel in range(fft_shape[0]):
si_mF, si_mX1, si_mX2 = FFT(x[channel], fs, spectrum)
if channel == 0:
mF = np.append(mF, si_mF)
mX1 = np.append(mX1, si_mX1)
mX2 = np.append(mX2, si_mX2)
else:
mF = np.vstack((mF, si_mF))
mX1 = np.vstack((mX1, si_mX1))
if si_mX2 == []:
pass
else:
mX2 = np.vstack((mX2, si_mX2))
elif len(fft_shape) > 2:
raise ValueError("Shape of input can't be greather than 2")
return(mF, mX1, mX2)
The second funcion in this case have the problem.
The reason for this checks is best to understand with an example:
I have recorded a sample of 1 second of audio data with 4 microphones.
so i have an ndim array of 4 x 44100 samples.
The FFT works on every even length array. This means that i get an result in both situations (4 x 44100 and 44100 x 4).
For all function after this function i have also 2 data types. or a complex signal or an tuple of two signals (amplitude and phase)... what's create an extra switch/ check in the script.
check type (tuple or complex data)
check direction (ad change it)
Check size / shape
run function and append/ stack this
Are there some methods to make this less repeative i have this situation in at least 10 functions...
Bert,
The problematic I'm understanding is the repeat of the calls you're making to do checks of all sorts. I'm not understanding all but I'm guessing they are made to format your data in a way you'll be able to execute fft on it.
One of the philosophy about computer programming in Python is "It's easier to ask forgiveness than it is to get permission."[1] . This means, you should probably try first and then ask forgiveness (try, except). It's much faster to do it this way then to do a lots of checks on the value. Also, those who are going to use your program should understand how it works pretty easily; make it easy to read without those check by separating the logic business from the technical logic. Don't worry, it's not evident and the fact you're asking is an indicator you're catching something isn't right :).
Here is what I would propose for your case (and it's not the perfect solution!):
def mFFT(x, fs, spectrum='complex'):
#Assume we're correcty align when receiving the data
#:param: x assume that we're multi-channel in the format [channel X soundtrack ]
#also, don't do this:
#mF = mX1 = si_mX2 = []
# see why : https://stackoverflow.com/questions/2402646/python-initializing-multiple-lists-line
mF = []
mX1 = []
mX2 = []
try:
for channel in range(len(x)):
si_mF, si_mX1, si_mX2 = FFT(x[channel], fs, spectrum)
mF.append(si_mF)
mX1.append(si_mX1)
mX2.append(si_mX2)
return (mF, mX1, mX2)
except:
#this is where you would try to point out why it could have failed. One good you had was the check for the orientation of the data and try again;
if np.shape(x)[0] > np.shape(x)[1]:
result = mFFT(x.T,fs,spectrum)
return result
else :
if np.shape(x)[0] > 2:
raise(ValueError("Shape of input isn't supported for greather than 2"))
I gave an example because I believe you expected one, but I'm not giving the perfect answer away ;). The problematic you have is a design problematic and no, there are no easy solution. What I propose to you is to start by assuming that the order is always in this format [ n-th channel X sample size ] (i.e. [ 4 channel X 44100 sample]). That way, you try it out first like this(as in try/except), then maybe as the inverse order.
Another suggestion (and it really depends on your use case), would be to make a data structure class that would manipulate the FFT data to return the complex or the ReIm or the AmPh0 or the AmPh as getters. (so you treat the input data as to be always time and you just give what the users want).
class FFT(object):
def __init__(self,x, fs):
from scipy.fftpack import fft, fftfreq
self.N = len(x)
self.fs = fs
self.X = fft(x) / N
def get_complex(self):
F = np.linspace(0, self.N) / (self.N / self.fs)
return(F, self.X, [])
def get_ReIm(self):
F = np.linspace(0, self.N) / (self.N / self.fs)
RE,IM = np.real(self.X), np.imag(self.X)
return(F, RE, IM)
def get_AmPh0(self):
F = np.linspace(0, (self.N-1)/2, self.N/2)/(self.N/self.fs)
# N should be int because of nfft
half_spec = np.int(self.N / 2)
AMP = abs(self.X[:half_spec])
PHI = np.arctan(np.real(self.X[:half_spec]) / np.imag(self.X[:half_spec]))
return(F, AMP, PHI)
This can then be used to be called depending on the desired output from another class with an eval to get the desire output (but you require to use the same convention across your code ;) ). 2

Resources