Python, scipy - How to fit a curve using a piecewise function with a conditional parameter that also needs to be calculated? - python-3.x

As the title suggests, I'm trying to fit a piecewise equation to a large data set. The equations I would like to fit to my data are as follows:
y(x) = b, when x < c
else:
y(x) = b + exp(a(x-c)) - 1, when x >= c
There are multiple answers to how such an issue can be addressed, but as a Python beginner I can't figure out how to apply them to my problem:
Curve fit with a piecewise function?
Conditional curve fit with scipy?
The problem is that all variables (a,b and c) have to be calculated by the fitting algorithm.
Thank you for your help!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Reduced Dataset
y = np.array([23.032, 21.765, 20.525, 21.856, 21.592, 20.754, 20.345, 20.534,
23.502, 21.725, 20.126, 21.381, 20.217, 21.553, 21.176, 20.976,
20.723, 20.401, 22.898, 22.02 , 21.09 , 22.543, 22.584, 22.799,
20.623, 20.529, 20.921, 22.505, 22.793, 20.845, 20.584, 22.026,
20.621, 23.316, 22.748, 20.253, 21.218, 23.422, 23.79 , 21.371,
24.318, 22.484, 24.775, 23.773, 25.623, 23.204, 25.729, 26.861,
27.268, 27.436, 29.471, 31.836, 34.034, 34.057, 35.674, 41.512,
48.249])
x = np.array([3756., 3759., 3762., 3765., 3768., 3771., 3774., 3777., 3780.,
3783., 3786., 3789., 3792., 3795., 3798., 3801., 3804., 3807.,
3810., 3813., 3816., 3819., 3822., 3825., 3828., 3831., 3834.,
3837., 3840., 3843., 3846., 3849., 3852., 3855., 3858., 3861.,
3864., 3867., 3870., 3873., 3876., 3879., 3882., 3885., 3888.,
3891., 3894., 3897., 3900., 3903., 3906., 3909., 3912., 3915.,
3918., 3921., 3924.])
# Simple exponential function without conditions (works so far)
def exponential_fit(x,a,b,c):
return b + np.exp(a*(x-c))
popt, pcov = curve_fit(exponential_fit, x, y, p0 = [0.1, 20,3800])
plt.plot(x, y, 'bo')
plt.plot(x, exponential_fit(x, *popt), 'r-')
plt.show()

You should change your function to something like
def exponential_fit(x, a, b, c):
if x >= c:
return b + np.exp(a*(x-c))-1
else:
return b
Edit: As chaosink pointed out in the comments, this approach no longer works as the the above function assumes that x is a scalar. However, curve_fit evaluates the function for array-like x. Consequently, one should use vectorised operations instead, see here for more details. To do so, one can either use
def exponential_fit(x, a, b, c):
return np.where(x >= c, b + np.exp(a*(x-c))-1, b)
or chaosink's suggestion in the comments:
def exponential_fit(x, a, b, c):
mask = (x >= c)
return mask * (b + np.exp(a*(x-c)) - 1) + ~mask * b
Both give:

Related

Python3, scipy.optimize: Fit model to multiple datas sets

I have a model which is defined as:
m(x,z) = C1*x^2*sin(z)+C2*x^3*cos(z)
I have multiple data sets for different z (z=1, z=2, z=3), in which they give me m(x,z) as a function of x.
The parameters C1 and C2 have to be the same for all z values.
So I have to fit my model to the three data sets simultaneously otherwise I will have different values of C1 and C2 for different values of z.
It this possible to do with scipy.optimize.
I can do it for just one value of z, but can't figure out how to do it for all z's.
For one z I just write this:
def my_function(x,C1,C1):
z=1
return C1*x**2*np.sin(z)+ C2*x**3*np.cos(z)
data = 'some/path/for/data/z=1'
x= data[:,0]
y= data[:,1]
from lmfit import Model
gmodel = Model(my_function)
result = gmodel.fit(y, x=x, C1=1.1)
print(result.fit_report())
How can I do it for multiple set of datas (i.e different z values?)
So what you want to do is fit a multi-dimensional fit (2-D in your case) to your data; that way for the entire data set you get a single set of C parameters that bests describes your data. I think the best way to do this is using scipy.optimize.curve_fit().
So your code would look something like this:
import scipy.optimize as optimize
import numpy as np
def my_function(xz, *par):
""" Here xz is a 2D array, so in the form [x, z] using your variables, and *par is an array of arguments (C1, C2) in your case """
x = xz[:,0]
z = xz[:,1]
return par[0] * x**2 * np.sin(z) + par[1] * x**3 * np.cos(z)
# generate fake data. You will presumable have this already
x = np.linspace(0, 10, 100)
z = np.linspace(0, 3, 100)
xx, zz = np.meshgrid(x, z)
xz = np.array([xx.flatten(), zz.flatten()]).T
fakeDataCoefficients = [4, 6.5]
fakeData = my_function(xz, *fakeDataCoefficients) + np.random.uniform(-0.5, 0.5, xx.size)
# Fit the fake data and return the set of coefficients that jointly fit the x and z
# points (and will hopefully be the same as the fakeDataCoefficients
popt, _ = optimize.curve_fit(my_function, xz, fakeData, p0=fakeDataCoefficients)
# Print the results
print(popt)
When I do this fit I get precisely the fakeDataCoefficients I used to generate the function, so the fit works well.
So the conclusion is that you don't do 3 fits independently, setting the value of z each time, but instead you do a 2D fit which takes the values of x and z simultaneously to find the best coefficients.
Your code is incomplete and has a few syntax errors.
But I think that you want to build a model that concatenates the models for the different data sets, and then fit the concatenated data to that model. Within the context of lmfit (disclosure: author and maintainer), I often find it easier to use minimize() and an objective function for multiple data set fits rather than the Model class. Perhaps start with something like this:
import lmfit
import numpy as np
# define the model function for each dataset
def my_function(x, c1, c2, z=1):
return C1*x**2*np.sin(z)+ C2*x**3*np.cos(z)
# Then write an objective function like this
def f2min(params, x, data2d, zlist):
ndata, npts = data2d.shape
residual = 0.0*data2d[:]
for i in range(ndata):
c1 = params['c1_%d' % (i+1)].value
c2 = params['c2_%d' % (i+1)].value
residual[i,:] = data[i,:] - my_function(x, c1, c2, z=zlist[i])
return residual.flatten()
# now build that `data2d`, `zlist` and build the `Parameters`
data2d = []
zlist = []
x = None
for fname in dataset_names:
d = np.loadtxt(fname) # or however you read / generate data
if x is None: x = d[:, 0]
data2d.append(d[:, 1])
zlist.append(z_for_dataset(fname)) # or however ...
data2d = np.array(data2d) # turn list into nd array
ndata, npts = data2d.shape
params = lmfit.Parameters()
for i in range(ndata):
params.add('c1_%d' % (i+1), value=1.0) # give a better starting value!
params.add('c2_%d' % (i+1), value=1.0) # give a better starting value!
# now you're ready to do the fit and print out the results:
result = lmfit.minimize(f2min, params, args=(x, data2d, zlist))
print(results.fit_report())
That code really a sketch and is all untested, but hopefully will give you a good starting foundation.

Numpy: division by zero error, but mathematically function is apparently defined

I'm testing out some functions to fit with data, and one of them (in 2-D) is
f(x) = (1/(1-x)) / (1 + 1/(1-x))
Which, according to Wolfram and the Google plotters, gives you the result
f(1) = 1
I've tried to get this to work without hard coding the case
if x == 1:
return 1
but I end up with a nan and a RunTimeWarning informing me that I have indeed divided by zero.
import numpy as np
def f(x):
return 1/(1-x) / (1 + 1/(1-x))
x_range = np.linspace(0, 1, 50)
y = f(x_range)
print(y)
Is there a more elegant solution than to simply introduce a hard-coded if?
Is there a reason to keep it in this form, you can simplify it to:
def f(x):
return 1/(2-x)
Wolfram and Google probably to some sort of algebraic simplification too.
Just simplify the equation for f(x) = (1/(1-x)) / (1 + 1/(1-x)). The simplified equation will be (1/(2-x)). Now update the program as:
import numpy as np
def f(x):
return 1/(2-x)
x_range = np.linspace(0, 1, 50)
y = f(x_range)
print(y)
output:
[0.5 0.50515464 0.51041667 0.51578947 0.5212766 0.52688172
0.5326087 0.53846154 0.54444444 0.5505618 0.55681818 0.56321839
0.56976744 0.57647059 0.58333333 0.59036145 0.59756098 0.60493827
0.6125 0.62025316 0.62820513 0.63636364 0.64473684 0.65333333
0.66216216 0.67123288 0.68055556 0.69014085 0.7 0.71014493
0.72058824 0.73134328 0.74242424 0.75384615 0.765625 0.77777778
0.79032258 0.80327869 0.81666667 0.83050847 0.84482759 0.85964912
0.875 0.89090909 0.90740741 0.9245283 0.94230769 0.96078431
0.98 1. ]

Python partial derivative

I am trying to put numbers in a function that has partial derivatives but I can't find a correct way to do it,I have searched all the internet and I always get an error.Here is the code:
from sympy import symbols,diff
import sympy as sp
import numpy as np
from scipy.misc import derivative
a, b, c, d, e, g, h, x= symbols('a b c d e g h x', real=True)
da=0.1
db=0.2
dc=0.05
dd=0
de=0
dg=0
dh=0
f = 4*a*b+a*sp.sin(c)+a**3+c**8*b
x = sp.sqrt(pow(diff(f, a)*da, 2)+pow(diff(f, b)*db, 2)+pow(diff(f, c)*dc, 2))
def F(a, b, c):
return x
print(derivative(F(2 ,3 ,5)))
I get the following error: derivative() missing 1 required positional argument: 'x0'
I am new to python so maybe it's a stupid question but I would feel grateful if someone helped me.
You can find three partial derivatives of function foo by variables a, b and c at the point (2,3,5):
f = 4*a*b+a*sp.sin(c)+a**3+c**8*b
foo = sp.sqrt(pow(diff(f, a)*da, 2)+pow(diff(f, b)*db, 2)+pow(diff(f, c)*dc, 2))
foo_da = diff(foo, a)
foo_db = diff(foo, b)
foo_dc = diff(foo, c)
print(foo_da," = ", float(foo_da.subs({a:2, b:3, c:5})))
print(foo_db," = ", float(foo_db.subs({a:2, b:3, c:5})))
print(foo_dc," = ", float(foo_dc.subs({a:2, b:3, c:5})))
I have used a python package 'sympy' to perform the partial derivative. The point at which the partial derivative is to be evaluated is val. The argument 'val' can be passed as a list or tuple.
# Sympy implementation to return the derivative of a function in x,y
# Enter ginput as a string expression in x and y and val as 1x2 array
def partial_derivative_x_y(ginput,der_var,val):
import sympy as sp
x,y = sp.symbols('x y')
function = lambda x,y: ginput
derivative_x = sp.lambdify((x,y),sp.diff(function(x,y),x))
derivative_y = sp.lambdify((x,y),sp.diff(function(x,y),y))
if der_var == 'x' :
return derivative_x(val[0],val[1])
if der_var == 'y' :
return derivative_y(val[0],val[1])
input1 = 'x*y**2 + 5*log(x*y +x**7) + 99'
partial_derivative_x_y(input1,'y',(3,1))

Lambifying Nested Parametric Integrals

I have a loss function L(u1,...,un) that takes the form
L(u) = Integral( (I**2 + J**2) * h, (t,c,d) ) //h=h(t), I=I(t,u)
where
I = Integral( f, (x,a,b) ) // f=f(x,t,u)
J = Integral( g, (x,a,b) ) // g=g(x,t,u)
I want to numerically minimize L with scipy, hence I need to lambdify the expression.
However at this point in time lambdify does not natively support translating integrals. With some tricks one can get it to work with single parametric integrals, see Lambdify A Parametric Integral. However I don't see how the proposed solution could possibly be extended to this generalised case.
One idea that in principle should work is the following:
Take the computational graph defining L. Recursively, starting from the leaves, replace each symbolic operation with the corresponding numerical operation, expressed as a lambda function. However this would lead to an immense nesting of lambda function, which I suspect has a very bad influence on performance.
Ideally we would want to arrive at the same result as one would by hand crafting:
L = lambda u: quad(lambda t: (quad(lambda x: f,a,b)[0]**2
+ quad(lambda x: g,a,b)[0]**2)*h, c, d)[0]
MWE: (using code from old thread)
from sympy import *
from scipy.integrate import quad
import numpy as np
def integral_as_quad(function, limits):
x, a, b = limits
param = tuple(function.free_symbols - {x})
f = sp.lambdify((x, *param), function, modules=['numpy'])
return quad(f, a, b, args=param)[0]
x,a,b,t = sp.symbols('x a b t')
f = (x/t-a)**2
g = (x/t-b)**2
h = exp(-t)
I = Integral( f**2,(x,0,1))
J = Integral( g**2,(x,0,1))
K = Integral( (I**2 + J**2)*h,(t,1,+oo))
F = lambdify( (a,b), K, modules=['sympy',{"Integral": integral_as_quad}])
L = lambda a,b: quad(lambda t: (quad(lambda x: (x/t-a)**2,0,1)[0]**2
+ quad(lambda x: (x/t-b)**2,0,1)[0]**2)*np.exp(-t), 1, np.inf)[0]
display(F(1,1))
display(type(F(1,1)))
display(L(1,1))

Trapezoidal approximation error plotting in python

Im trying to code a function that plots the error of the composite trapezoidal rule against the step size.
Obviously this doesn't look to good since i'm just starting to learn these things.
Anyhow i managed to get the plot and everything, but i'm supposed to get a plot with slope 2, so i am in need of help to figure out where i did go wrong.
from scipy import *
from pylab import *
from matplotlib import *
def f(x): #define function to integrate
return exp(x)
a=int(input("Integrate from? ")) #input for a
b=int(input("to? ")) #inpput for b
n=1
def ctrapezoidal(f,a,b,n): #define the function ctrapezoidal
h=(b-a)/n #assign h
s=0 #clear sum1-value
for i in range(n): #create the sum of the function
s+=f(a+((i)/n)*(b-a)) #iterate th sum
I=(h/2)*(f(a)+f(b))+h*s #the function + the sum
return (I, h) #returns the approximation of the integral
val=[] #start list
stepsize=[]
error=[]
while len(val)<=2 or abs(val[-1]-val[-2])>1e-2:
I, h=ctrapezoidal(f,a,b,n)
val.append(I)
stepsize.append(h)
n+=1
for i in range(len(val)):
error.append(abs(val[i]-(e**b-e**a)))
error=np.array(error)
stepsize=np.array(stepsize)
plt.loglog(stepsize, error, basex=10, basey=10)
plt.grid(True,which="both",ls="steps")
plt.ylabel('error')
plt.xlabel('h')

Resources