Repeating tests in multiple functions - python-3.x

I have some functions for sound processing. Originally everything was single channel, but now I am making it more or less multi-channel.
At this point I have the feeling I repeat parts of the scripts over and over again.
In this example there are two functions (my original function is longer), but the same happens in single scripts as well.
My two functions:
import numpy as np

# def FFT(x, fs, *args, **kwargs):
def FFT(x, fs, output='complex'):
    from scipy.fftpack import fft, fftfreq
    N = len(x)
    X = fft(x) / N
    if output == 'complex':
        F = np.linspace(0, N - 1, N) / (N / fs)
        return (F, X, [])
    elif output == 'ReIm':
        F = np.linspace(0, N - 1, N) / (N / fs)
        RE = np.real(X)
        IM = np.imag(X)
        return (F, RE, IM)
    elif output == 'AmPh0':
        F = np.linspace(0, (N - 1) / 2, N // 2)
        F = F / (N / fs)
        # N should be int because of nfft
        half_spec = int(N / 2)
        AMP = abs(X[0:half_spec])
        PHI = np.arctan(np.real(X[0:half_spec]) / np.imag(X[0:half_spec]))
        return (F, AMP, PHI)
    elif output == 'AmPh':
        half_spec = int(N / 2)
        F = np.linspace(1, (N - 1) / 2, N // 2 - 1)
        F = F / (N / fs)
        AMP = abs(X[1:half_spec])
        PHI = np.arctan(np.real(X[1:half_spec]) / np.imag(X[1:half_spec]))
        return (F, AMP, PHI)
def mFFT(x, fs, spectrum='complex'):
    fft_shape = np.shape(x)
    if len(fft_shape) == 1:
        mF, mX1, mX2 = FFT(x, fs, spectrum)
    elif len(fft_shape) == 2:
        if fft_shape[0] < fft_shape[1]:
            pass
        elif fft_shape[0] > fft_shape[1]:
            x = x.T
            fft_shape = np.shape(x)
        mF = mX1 = mX2 = []
        for channel in range(fft_shape[0]):
            si_mF, si_mX1, si_mX2 = FFT(x[channel], fs, spectrum)
            if channel == 0:
                mF = np.append(mF, si_mF)
                mX1 = np.append(mX1, si_mX1)
                mX2 = np.append(mX2, si_mX2)
            else:
                mF = np.vstack((mF, si_mF))
                mX1 = np.vstack((mX1, si_mX1))
                if len(si_mX2) == 0:
                    pass
                else:
                    mX2 = np.vstack((mX2, si_mX2))
    elif len(fft_shape) > 2:
        raise ValueError("Shape of input can't be greater than 2")
    return (mF, mX1, mX2)
The second function is where the problem sits in this case.
The reason for these checks is best understood with an example:
I have recorded a 1 second sample of audio data with 4 microphones,
so I have an ndim array of 4 x 44100 samples.
The FFT works on every even-length array, which means I get a result in both situations (4 x 44100 and 44100 x 4).
All functions after this one also have to handle 2 data types: either a complex signal or a tuple of two signals (amplitude and phase), which creates an extra switch/check in the script.
check type (tuple or complex data)
check direction (and change it)
check size/shape
run the function and append/stack the result
Are there methods to make this less repetitive? I have this situation in at least 10 functions...

Bert,
The problem, as I understand it, is the repetition of the checks you're making all over the place. I don't follow all of them, but I'm guessing they are there to format your data so you can run the FFT on it.
One of the philosophies of programming in Python is "It's easier to ask forgiveness than it is to get permission."[1] This means you should probably try first and then ask forgiveness (try/except). It's much faster to do it this way than to do lots of checks on the values. Also, those who are going to use your program should be able to understand how it works easily; make it easy to read without those checks by separating the business logic from the technical logic. Don't worry, it's not obvious, and the fact that you're asking is an indicator you're noticing something isn't right :).
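As a tiny, generic illustration of the EAFP style (toy names, not your code):

import numpy as np

# EAFP: attempt the operation and handle the failure,
# instead of checking every precondition up front
mapping = {"a": 1}  # toy data
try:
    value = mapping["b"]
except KeyError:
    value = 0  # fallback when the key is missing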
Here is what I would propose for your case (and it's not the perfect solution!):
def mFFT(x, fs, spectrum='complex'):
    # Assume the data is correctly aligned when we receive it
    # :param x: assume multi-channel data in the format [channel X soundtrack]
    # Also, don't do this:
    #   mF = mX1 = mX2 = []
    # See why: https://stackoverflow.com/questions/2402646/python-initializing-multiple-lists-line
    mF = []
    mX1 = []
    mX2 = []
    try:
        for channel in range(len(x)):
            si_mF, si_mX1, si_mX2 = FFT(x[channel], fs, spectrum)
            mF.append(si_mF)
            mX1.append(si_mX1)
            mX2.append(si_mX2)
        return (mF, mX1, mX2)
    except Exception:
        # This is where you would point out why it could have failed. One good check
        # you had was the orientation of the data, so try again transposed:
        if np.shape(x)[0] > np.shape(x)[1]:
            result = mFFT(x.T, fs, spectrum)
            return result
        else:
            if len(np.shape(x)) > 2:
                raise ValueError("Shape of input isn't supported for more than 2 dimensions")
I gave an example because I believe you expected one, but I'm not giving the perfect answer away ;). The problem you have is a design problem, and no, there is no easy solution. What I propose is to start by assuming that the order is always in the format [n-th channel X sample size] (i.e. [4 channels X 44100 samples]). That way, you try that order first (as in try/except), and only then the inverse order.
Another suggestion (and it really depends on your use case) would be to make a data structure class that manipulates the FFT data and returns the complex or the ReIm or the AmPh0 or the AmPh representation as getters (so you treat the input data as always being time-domain, and you just give the users what they want).
class FFT(object):
    def __init__(self, x, fs):
        from scipy.fftpack import fft, fftfreq
        self.N = len(x)
        self.fs = fs
        self.X = fft(x) / self.N
    def get_complex(self):
        F = np.linspace(0, self.N - 1, self.N) / (self.N / self.fs)
        return (F, self.X, [])
    def get_ReIm(self):
        F = np.linspace(0, self.N - 1, self.N) / (self.N / self.fs)
        RE, IM = np.real(self.X), np.imag(self.X)
        return (F, RE, IM)
    def get_AmPh0(self):
        F = np.linspace(0, (self.N - 1) / 2, self.N // 2) / (self.N / self.fs)
        # N should be int because of nfft
        half_spec = int(self.N / 2)
        AMP = abs(self.X[:half_spec])
        PHI = np.arctan(np.real(self.X[:half_spec]) / np.imag(self.X[:half_spec]))
        return (F, AMP, PHI)
These getters can then be called from other code depending on the desired output, to get the desired representation (but you need to use the same convention across your code ;) ).
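For instance, a minimal sketch of that dispatch using getattr instead of eval (the toy signal and the spectrum value here are just illustrative):

import numpy as np

# hypothetical usage of the class above: look up the getter by name
x = np.random.randn(1024)   # toy single-channel signal
fs = 44100
fft_data = FFT(x, fs)
spectrum = 'ReIm'           # or 'complex', 'AmPh0'
F, first, second = getattr(fft_data, 'get_' + spectrum)()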

Related

How do I make a for loop understand that I want it to run i for all values inside x, x being a defined array?

My code right now is as follows:
from math import *
import matplotlib.pyplot as plt
import numpy as np
"""
TITLE
"""
def f(x, y):
    for i in range(len(x)):
        y.append(exp(-x[i]) - sin(pi*x[i]/2))
def ddxf(x, y2):
    for i in range(len(x)):
        y2.append(-exp(-x[i]) - (pi/2)*cos(pi*x[i]/2))
y = []
y2 = []
f(x, y)
x = np.linspace(0, 4, 100)
plt.title('Graph of function x')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.plot(x, y, 'g')
plt.grid(True)
plt.show()
x0 = float(input("Insert the approximate value of the first positive root: "))
intMax = 100
i = 0
epsilon = 1.e-7
while abs(f(x0, y)) > epsilon and i < intMax:
    x1 = x0 - (f(x0))/(ddxf(x0))
    x0 = x1
    i += 1
print(i)
print(x1)
I get this error when running the program. It seems that len(x) cannot be used if x isn't a string, which doesn't make sense to me. If the array has a len and isn't infinite, why can't len(x) read its length? Anyway, please help me. I hope I made myself clear.
Regarding the error: You are using x before defining it. In your code, you first use f(x, y) and only after that you actually define what x is (namely x = np.linspace(0, 4, 100)). You probably want to swap these two lines to fix the issue.
Regarding the question in the title: len(x) should be fine to get the length of a list. However, in Python you don't need to go through a list like that. You can instead use for element_name in list_name. This will go through list_name element by element and make each element available to you under the name element_name.
There is also something called list comprehensions in Python - you might want to take a look at those and see whether you can apply it to your code.
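For example, a minimal sketch of the reordered start of the program, with the loop in f replaced by a list comprehension (same formula as in the question):

import numpy as np
from math import exp, sin, pi

# define x before using it, then build y from it with a list comprehension
x = np.linspace(0, 4, 100)
y = [exp(-xi) - sin(pi * xi / 2) for xi in x]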

scipy solve_ivp with adaptive solution

I am struggling to understand how scipy.solve_ivp() handles errors in a system of ODEs. Let's say I have the following simple code for a single ODE, and I think I might be doing things wrong in some way. Let's say my rhs looks something like:
from scipy.integrate import solve_ivp
def rhs_func(t, y):
    z = 1.0 / (x - y + 1j)
    return z
Suppose we call solve_ivp with the following signature:
Z_solution = ivp_adaptive.solve_ivp(fun=rhs_func,
                                    t_span=[100, 0],
                                    y0=y0,  # some initial value, 0 for example
                                    method='RK45',
                                    t_eval=None,
                                    args=some_additional_arguments_to_rhs_func,
                                    dense_output=False,
                                    rtol=1e-8,
                                    atol=1e-10
                                    )
Now, the absolute and relative tolerances are supposed to bound the error of the calculation. The problem I am having has to do with t_eval=None in this case. Apparently, this choice lets the integrator (in this case of type RK45) choose the time step according to whether the specified tolerances are exceeded or not, i.e., the steps are not fixed; taking a larger step in t means a solution has been found whose error lies below the tolerances above (atol=1e-10, rtol=1e-8). This is particularly useful in problems with large variations of the time scale, where a uniform discretization of t is very inefficient.
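For example, with a toy ODE (not my actual rhs), the returned object shows the steps the solver actually accepted, so their spacing reflects the adaptive choices:

import numpy as np
from scipy.integrate import solve_ivp

# toy problem: y' = -y; with t_eval=None, sol.t holds the accepted steps
sol = solve_ivp(lambda t, y: -y, t_span=[0, 10], y0=[1.0],
                method='RK45', rtol=1e-8, atol=1e-10)
print(np.diff(sol.t))  # non-uniform, tolerance-driven step sizes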
My big problem has to do with the following piece of code in scipy.integrate._ivp.solve_ivp() around line 575, in the "t_eval is None" case:
while status is None:
    message = solver.step()
    if solver.status == 'finished':
        status = 0
    elif solver.status == 'failed':
        status = -1
        break
    t_old = solver.t_old
    t = solver.t
    y = solver.y
    if dense_output:
        sol = solver.dense_output()
        interpolants.append(sol)
    else:
        sol = None
    if events is not None:
        g_new = [event(t, y) for event in events]
        active_events = find_active_events(g, g_new, event_dir)
        if active_events.size > 0:
            if sol is None:
                sol = solver.dense_output()
            root_indices, roots, terminate = handle_events(
                sol, events, active_events, is_terminal, t_old, t)
            for e, te in zip(root_indices, roots):
                t_events[e].append(te)
                y_events[e].append(sol(te))
            if terminate:
                status = 1
                t = roots[-1]
                y = sol(t)
        g = g_new
    # HERE I HAVE MODIFIED THE FILE BY CALLING AN INTERPOLATION FUNCTION FOR THE SOLUTION
    if t_eval is None:
        ts.append(t)
        # ys.append(y)
        # this call adapts the solution to a new set of values x over which y(x,t)
        # is defined
        interp_solution(t, y, solver, args)
        y = solver.y
        ys.append(y)
where I have defined a function:
def interp_solution(t, y, solver, args):
    import numpy as np
    from scipy import interpolate
    x_old = args.get_old_grid()  # this call just returns an array of the style of
                                 # x_new, and is where y is defined
    x_new = np.linspace(-t, t, dim)  # the new array where components of y are
                                     # defined
    y_interp = interpolate.interp1d(x_old, y)
    y_new = y_interp(x_new)
    solver.y = y_new  # update the solver y
    # finally, we change the maximum allowed step of the integrator if t is below
    # some threshold value
    if t < args.get_threshold():
        solver.max_step = ...  # some number
    return y_new
When I look at the results, it seems that this is very sensitive to the tolerances and the way the integration steps are performed, but somehow I fail to see where errors could come from in this approach. Can anyone explain whether this approach is somehow affecting the solution and the associated errors? How can one implement a similar approach in this fashion? Any help is greatly appreciated.

Why is getting the first 30 keys of the dictionary in two statements faster than one statement?

I was doing a benchmark for myself when I encountered this interesting thing. I am trying to get the first 30 keys of a dictionary, and I have written three ways to get them:
import time
dic = {str(i): i for i in range(10 ** 6)}
start_time = time.time()
x = list(dic.keys())[0:30]
print(time.time() - start_time)
start_time = time.time()
y = list(dic.keys())
x = y[0:30]
print(time.time() - start_time)
start_time = time.time()
z = dic.keys()
y = list(z)
x = y[0:30]
print(time.time() - start_time)
The results are:
0.015970945358276367
0.010970354080200195
0.01691460609436035
Surprisingly, the second method is much faster! Any thoughts on this?
Use Python's timeit module to measure the various alternatives. I added mine (f4), which doesn't convert the keys to a list:
from timeit import timeit

dic = {str(i): i for i in range(10 ** 6)}

def f1():
    x = list(dic.keys())[0:30]
    return x

def f2():
    y = list(dic.keys())
    x = y[0:30]
    return x

def f3():
    z = dic.keys()
    y = list(z)
    x = y[0:30]
    return x

def f4():
    x = [k for _, k in zip(range(30), dic.keys())]
    return x

t1 = timeit(lambda: f1(), number=10)
t2 = timeit(lambda: f2(), number=10)
t3 = timeit(lambda: f3(), number=10)
t4 = timeit(lambda: f4(), number=10)

print(t1)
print(t2)
print(t3)
print(t4)
Prints:
0.1911074290110264
0.20418328599771485
0.18727918600779958
3.5186996683478355e-05
Maybe this is due to inaccuracies in your measurement of time. You can use timeit for this kind of thing:
import timeit
dic = {str(i): i for i in range(10 ** 6)}
# 27.5125/29.0836/26.8525
timeit.timeit("x = list(dic.keys())[0:30]", number=1000, globals={"dic": dic})
# 28.6648/26.4684/30.9534
timeit.timeit("y = list(dic.keys());x=y[0:30]", number=1000, globals={"dic": dic})
# 31.7345/29.5301/30.7541
timeit.timeit("z=dic.keys();y=list(z);x=y[0:30]", number=1000, globals={"dic": dic})
The comments show the times I got when running the same code 3 different times. As you can see, even by performing a large number of repetitions, it is possible to obtain quite large variations in the measured time. This can be due to several different things:
An item can be in the cache of your processor or not.
Your processor can be occupied doing several other things.
Etc...
As stated by @Andrej Kesely, your bottleneck is the fact that you cast your dictionary keys into a list. By doing so, Python goes through all the dictionary keys, because that's generally how it converts something to a list. Hence, by avoiding that, you can get much better results.
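For completeness, a minimal sketch of the same idea with itertools.islice, which likewise stops after the first 30 keys instead of materializing the whole view:

from itertools import islice

dic = {str(i): i for i in range(10 ** 6)}
# islice consumes only the first 30 keys, never the full 10**6
x = list(islice(dic.keys(), 30))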

Divide by Zero in Mean()?

I'm trying to write some code to compute the mean, variance, standard deviation, and FWHM, and finally evaluate the Gaussian integral. I've been running into a division by zero error that I can't get past, and I would like to know the solution for this.
Where it's throwing an error, I've tried to add an exception handler as follows:
Average = (sum(yvalues)) / (len(yvalues))
try:
    return (sum(yvalues) / len(yvalues))
except ZeroDivisionError:
    return 0
xvalues = []
yvalues = []
def generate():
    for i in range(0, 300):
        a = rand.uniform((float("-inf") , float("inf")))
        b = rand.uniform((float("-inf") , float("inf")))
        xvalues.append(i)
        ### Defining the variable 'y'
        y = a * (b + i)
        yvalues.append(y) + 1
def mean():
    Average = (sum(yvalues))/(len(yvalues))
    print("The average is", Average)
    return Average
def varience():
    # This calculates the SD and the varience
    s = []
    for i in yvalues:
        z = i - mean()
        z = (np.abs(i-z))**2
        s.append(y)**2
    t = mean()
    v = numpy.sqrt(t)
    print("Answer for Varience is:", v)
    return v
Traceback (most recent call last):
  File "Tuesday.py", line 42, in <module>
    def make_gauss(sigma=varience(), mu=mean(), x = random.uniform((float("inf"))*-1, float("inf"))):
  File "Tuesday.py", line 35, in varience
    t = mean()
  File "Tuesday.py", line 25, in mean
    Average = (sum(yvalues))/(len(yvalues))
ZeroDivisionError: division by zero
There are a few things that are not quite right, as people noted above.
import random
import numpy as np

def generate():
    xvalues, yvalues = [], []
    for i in range(0, 300):
        a = random.uniform(-1000, 1000)
        b = random.uniform(-1000, 1000)
        xvalues.append(i)
        ### Defining the variable 'y'
        y = a * (b + i)
        yvalues.append(y)
    return xvalues, yvalues

def mean(yvalues):
    return sum(yvalues)/len(yvalues)

def variance(yvalues):
    # This calculates the variance (the SD is its square root)
    s = []
    yvalues_mean = mean(yvalues)
    for y in yvalues:
        z = (y - yvalues_mean)**2
        s.append(z)
    t = mean(s)
    return t

def variance2(yvalues):
    yvalues_mean = mean(yvalues)
    return sum((y-yvalues_mean)**2 for y in yvalues) / len(yvalues)

# Generate the xvalues and yvalues
xvalues, yvalues = generate()

# Now do the calculation, based on the passed parameters
mean_yvalues = mean(yvalues)
variance_yvalues = variance(yvalues)
variance_yvalues2 = variance2(yvalues)
print('Mean {} variance {} {}'.format(mean_yvalues, variance_yvalues, variance_yvalues2))

# Using Numpy
np_mean = np.mean(yvalues)
np_var = np.var(yvalues)
print('Numpy: Mean {} variance {}'.format(np_mean, np_var))
The way variance was calculated wasn't quite right, but given the comment of "SD and variance" you were probably going to calculate both.
The code above gives two (well, three) ways to do what I understand you were trying to do, but I changed a few of the methods to clean them up a bit. generate() now returns two lists, mean() returns the mean, etc. The function variance2() gives an alternative way to calculate the variance, using a generator-expression style.
The last couple of lines are an example using numpy, which has all of this built in and, if available, is a great way to go.
The one part that wasn't clear was the random.uniform(float("-inf"), float("inf")), which seems to be an error (?).
You are calling mean before you call generate.
This is obvious since yvalues.append(y) + 1 (in generate) would have caused another error (TypeError) since .append returns None and you can't add 1 to None.
Change yvalues.append(y) + 1 to yvalues.append(y + 1) and then make sure to call generate before you call mean.
Also notice that you have the same error in varience (which should be called variance, btw). s.append(y)**2 should be s.append(y ** 2).
Another error you have is that the stack trace shows make_gauss(sigma=varience(), mu=mean(), x = random.uniform((float("inf"))*-1, float("inf"))).
I'm pretty sure you don't actually want to call varience and mean on this line, just reference them. So also change that line to make_gauss(sigma=varience, mu=mean, x = random.uniform((float("inf"))*-1, float("inf")))

Simpson integration in Python

I am trying to integrate numerically using the Simpson integration rule for f(x) = 2x from 0 to 1, but I keep getting a large error. The desired output is 1, but the output from Python is 1.334. Can someone help me find a solution to this problem?
Thank you.
import numpy as np
def f(x):
    return 2*x
def simpson(f, a, b, n):
    x = np.linspace(a, b, n)
    dx = (b-a)/n
    for i in np.arange(1, n):
        if i % 2 != 0:
            y = 4*f(x)
        elif i % 2 == 0:
            y = 2*f(x)
    return (f(a)+sum(y)+f(x)[-1])*dx/3
a = 0
b = 1
n = 1000
ans = simpson(f, a, b, n)
print(ans)
There is everything wrong here. x is an array, so every time you call f(x), you are evaluating the function over the whole array. As n is even and n-1 odd, the y left over from the last loop iteration is 4*f(x), and the sum is computed from that.
Also, n is the number of segments, so the number of points is n+1. A correct implementation is:
def simpson(f, a, b, n):
    x = np.linspace(a, b, n+1)
    y = f(x)
    dx = x[1]-x[0]
    return (y[0]+4*sum(y[1::2])+2*sum(y[2:-1:2])+y[-1])*dx/3

simpson(lambda x: 2*x, 0, 1, 1000)
which then correctly returns 1.000. You might want to add a test for whether n is even, and increase it by one if it is not, as in the sketch below.
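A minimal sketch of that guard, building on the vectorized version above:

import numpy as np

def simpson_even(f, a, b, n):
    # Simpson's rule needs an even number of segments
    if n % 2 == 1:
        n += 1
    x = np.linspace(a, b, n+1)
    y = f(x)
    dx = x[1] - x[0]
    return (y[0] + 4*sum(y[1::2]) + 2*sum(y[2:-1:2]) + y[-1])*dx/3

print(simpson_even(lambda x: 2*x, 0, 1, 999))  # works even for odd n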
If you really want to keep the loop, you need to actually accumulate the sum inside the loop.
def simpson(f, a, b, n):
    dx = (b-a)/n
    res = 0
    for i in range(1, n):
        res += f(a+i*dx)*(2 if i % 2 == 0 else 4)
    return (f(a)+f(b) + res)*dx/3

simpson(lambda x: 2*x, 0, 1, 1000)
But loops are generally slower than vectorized operations, so if you use numpy, use vectorized operations. Or just use scipy.integrate.simps directly.
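A quick sketch of the scipy route (simps takes the sampled y values first, then the sample points; in newer SciPy versions it is called simpson):

import numpy as np
from scipy.integrate import simps

x = np.linspace(0, 1, 1001)  # 1000 segments, i.e. an even number
print(simps(2*x, x))         # ~1.0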
