Python: Fitting a piecewise polynomial

I am trying to fit a piecewise polynomial function.
Code:
import numpy as np
import scipy
from scipy.interpolate import UnivariateSpline, splrep
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt

def piecewise_func(x, X, Y):
    """
    cond_l: condition list
    func_l: function list
    """
    spl = UnivariateSpline(X, Y, k=3, s=0.5)
    tck = (spl._data[8], spl._data[9], 3)  # tck = (knots, coefficients, degree)
    p = scipy.interpolate.PPoly.from_spline(tck)
    cond_l = []
    func_l = []
    for idx, i in enumerate(range(3, len(spl.get_knots()) + 3 - 1)):
        cond_l.append([(x >= p.x[i] & x < p.x[i + 1])])
        func_l.append([lambda x: p.c[3, i] + p.c[2, i] * x + p.c[1, i] * x ** 2 + p.c[0, i] * x ** 3])
    return np.piecewise(x, cond_l, func_l)

if __name__ == '__main__':
    xdata = [0.28190937, 0.63429607, 0.91620544, 1.68793236, 2.32350115, 2.95215219, 4.5,
             4.78103382, 7.2, 7.53430054, 8.03627018, 9., 9.86212529, 11.25951191, 11.62658532, 11.65598578, 13.90295926]
    ydata = [0.36273168, 0.81614628, 1.17887796, 1.4475374, 5.52692706, 2.17548169, 3.55313396, 3.80326533, 7.75556311, 8.30176616, 10.72117182, 11.2499386,
             11.72296513, 11.02146624, 14.51260631, 20.59365525, 21.77847853]
    spl = UnivariateSpline(xdata, ydata, k=3, s=1)
    plt.plot(xdata, ydata, '*')
    plt.plot(xdata, spl(xdata))
    plt.show()
    p, e = curve_fit(piecewise_func, xdata, ydata)
    # x_plot = np.linspace(0., 0.15, len(x))
    # plt.plot(x, y, "+")
    # plt.plot(x, (piecewise_func(x_plot, *p)), 'C3-', lw=3)
I tried the UnivariateSpline function to interpolate, and I see the result shown. However, I don't want the polynomial curve to pass through all the data points. I tried varying the smoothing factor, but I am not able to obtain something like the expected output below.
Expected output:
I'm trying curve fitting (using UnivariateSpline to fit the data tightly) to get the expected output, and I have the following issue: piecewise_func in the code posted returns the piecewise polynomial, but passing it to curve_fit(piecewise_func, xdata, ydata) returns an error.
Error:
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
ValueError: diff requires input that is at least one dimensional
I am not sure what is wrong.
Suggestions on how to get the expected fit would be of great help.

I would recommend having a closer look at the parameter s in the UnivariateSpline documentation:
s : float or None, optional
Positive smoothing factor used to choose the number of knots. Number of knots will be increased until the smoothing condition is satisfied:
sum((w[i] * (y[i]-spl(x[i])))**2, axis=0) <= s
If s is None, s = len(w) which should be a good value if 1/w[i] is an estimate of the standard deviation of y[i]. If 0, spline will interpolate through all data points. Default is None.
Since you do not set w, this is just a complicated way of saying that s is the least squares error that you allow, i.e., squared errors summed over all the data points. Your value of 1 does not lead to interpolation but it is quite tight compared to what you want to achieve.
Taking
spl = UnivariateSpline(xdata, ydata, k=3, s=10)
you get the following:
Yet closer to your goal is s=100:
So my recommendation is to play around with s and if that proves insufficient, to ask a new question describing what you need more precisely. I haven't had a proper look at the problem with piecewise_func.
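As a rough illustration (my own sketch, not part of the original answer), you could sweep a few candidate values of s on the question's xdata/ydata and compare the resulting splines in one plot; the values tried below are just guesses to bracket the behaviour.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

# assumes xdata, ydata as defined in the question
x_plot = np.linspace(min(xdata), max(xdata), 500)
plt.plot(xdata, ydata, '*', label='data')
for s in (1, 10, 100):                      # candidate smoothing factors to compare
    spl = UnivariateSpline(xdata, ydata, k=3, s=s)
    plt.plot(x_plot, spl(x_plot), label='s = %d' % s)
plt.legend()
plt.show()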

Related

Find out if point is part of curve (spline, splipy)

I have some coordinates of a 3D point curve through which I lay a spline like so:
from splipy import curve_factory
pts = [...] #3D coordinate points
curve = curve_factory.curve(pts)
I know that I can get a point in 3D along the curve by evaluating it after a certain length:
point_on_curve = curve.evaluate(t)
print(point_on_curve) #outputs coordinates: (x y z)
Is it however somehow possible to do it the other way round? Is there a function/method that can tell me if a certain point is part of the curve? Or if it's almost part of the curve? Something like:
curve.func(point) #output: True
or
curve.func(point) #output: distance to curve 0.0001 --> also part of curve
Thanks!
I've found this script by ventusff that performs an optimization to find the value of the parameter that you call t (in the script it is u) which gives the point on the spline closest to the external point.
I report the code below with some changes to make it clearer for you. I've defined a tolerance equal to 0.001.
The selection of the optimization solver and of its parameter values requires a bit of study. I do not have enough time for that now, but you can try to experiment a little.
In this case SciPy is used for spline generation and evaluation, but you can easily replace it with splipy. The optimization is the interesting part, and it is performed with SciPy.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial.distance import euclidean
from scipy.optimize import fmin_bfgs
points_count = 40
phi = np.linspace(0, 2. * np.pi, points_count)
k = np.linspace(0, 2, points_count)
r = 0.5 + np.cos(phi)
x, y, z = r * np.cos(phi), r * np.sin(phi), k
tck, u = splprep([x, y, z], s=1)
points = splev(u, tck)
idx = np.random.randint(low=0, high=40)
noise = np.random.normal(scale=0.01)
external_point = np.array([points[0][idx], points[1][idx], points[2][idx]]) + noise
def distance_to_point(u_):
    s = splev(u_, tck)
    return euclidean(external_point, [s[0][0], s[1][0], s[2][0]])

closest_u = fmin_bfgs(distance_to_point, x0=np.array([0.0]), gtol=1e-8)
closest_point = splev(closest_u, tck)

tol = 1e-3
if euclidean(external_point, [closest_point[0][0], closest_point[1][0], closest_point[2][0]]) < tol:
    print("The point is very close to the spline.")
ax = plt.figure().add_subplot(projection='3d')
ax.plot(points[0], points[1], points[2], "r-", label="Spline")
ax.plot(external_point[0], external_point[1], external_point[2], "bo", label="External Point")
ax.plot(closest_point[0], closest_point[1], closest_point[2], "go", label="Closest Point")
plt.legend()
plt.show()
The script draws the plot below:
and prints the following output:
Current function value: 0.000941
Iterations: 5
Function evaluations: 75
Gradient evaluations: 32
The point is very close to the spline.
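As a side note on the solver choice: since the spline parameter u lives in [0, 1], a bounded scalar minimizer is a simple alternative to fmin_bfgs. Here is a minimal sketch (my own variation, not from the original script) that reuses tck, external_point, splev, and np from the code above.
from scipy.optimize import minimize_scalar

def distance_to_point_scalar(u_):
    # evaluate the spline at a single parameter value and measure the distance
    sx, sy, sz = splev(u_, tck)
    return np.sqrt((sx - external_point[0])**2 +
                   (sy - external_point[1])**2 +
                   (sz - external_point[2])**2)

res = minimize_scalar(distance_to_point_scalar, bounds=(0.0, 1.0), method='bounded')
print("closest u:", res.x, "distance:", res.fun)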

How to create synthetic data for a decaying curve in order to extrapolate it beyond some point?

In the following curve, I would like to extend the measurements beyond x=1 in order to have a better estimate of the green curve compared to the red line.
Note: I do not have the analytical form of the function, only x, y data sets in the range (0, 1).
Is there any package in Python to extrapolate a curve beyond some value, given that we have the interpolated form of the curve?
Here is my attempt assuming a linear drop:
from scipy.interpolate import interp1d
import numpy as np

CURVE_CUT_INDEX = #the index corresponding to x=1 in the x array

def extrapolator_function(x_vals, y_vals, x_list):
    interpolator = interp1d(x_vals, y_vals, kind='cubic')
    x_1 = x_vals[-1]
    y_1 = interpolator(x_1)
    y_grad, x_grad = (np.gradient(y_vals, np.arange(y_vals.size)),
                      np.gradient(x_vals, np.arange(x_vals.size)))
    slope = np.divide(y_grad, x_grad, out=np.zeros_like(y_grad), where=x_grad != 0)[-1]
    x_out = x_list[CURVE_CUT_INDEX + 1:]
    y_pred = np.array([slope * (x - x_1) + y_1 for x in x_out])
    return x_vals, y_vals, x_out, y_pred

def plotter(ax, x_list, y_list):
    x_vals, y_vals = x_list[0:CURVE_CUT_INDEX + 1], y_list(x_list)[0:CURVE_CUT_INDEX + 1]
    x_vals, y_vals, x_out, y_pred = extrapolator_function(x_vals, y_vals, x_list)
    return ax.plot(x_vals, y_vals, 'g-', x_out, y_pred, 'r-', alpha=1, lw=2)
which will result in the following extrapolation scheme (which is not what I want).
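One common alternative (my own sketch, not from the question) is to fit a simple decaying model on the measured range with curve_fit and then evaluate that model beyond x = 1; the exponential form and the placeholder data below are only assumptions about the shape of the decay.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def decay(x, a, b, c):
    # assumed model: exponential decay towards an offset c
    return a * np.exp(-b * x) + c

# placeholder data standing in for the measured (0, 1) range
x_vals = np.linspace(0, 1, 50)
y_vals = 2.0 * np.exp(-3.0 * x_vals) + 0.1

popt, pcov = curve_fit(decay, x_vals, y_vals, p0=[1.0, 1.0, 0.0])

x_ext = np.linspace(0, 2, 200)                      # extends beyond x = 1
plt.plot(x_vals, y_vals, 'g-', label='measured')
plt.plot(x_ext, decay(x_ext, *popt), 'r--', label='extrapolated model')
plt.legend()
plt.show()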

Python Scipy Curvefit to Linear Quadratic Curve

I'm trying to fit a linear-quadratic model curve to experimental data. The Y axis values decrease from 1 to 10^-5. When I use the following code, the resulting curve often seems not to fit the data at higher X values. I suspect that because the Y values at high X values are so small, the resulting difference between the experimental value and the model value is small. But I would like the model curve to pass as close to the higher X value points as possible (even if it means the low values are not as well fitted). I haven't found anything about weighting in scipy.optimize.curve_fit other than using standard deviations (which I don't have). How can I improve my model fit at high X values?
import numpy as np  # needed for np.exp below
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def lq(x, a, b):
    # y(x) = exp[-(ax+bx²)]
    y = []
    for i in x:
        x2 = i**2
        ax = a*i
        bx2 = b*x2
        y.append(np.exp(-(ax+bx2)))
    return y

# x and y are from experiment
x = [0,1.778,2.921,3.302,6.317,9.524,10.54]
y = [1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041]

(a,b), pcov = curve_fit(lq, x, y, p0=[0.05,0.05])

# make the model curve using a and b
xmodel = list(range(0,20))
ymodel = lq(xmodel, a, b)

fig, ax1 = plt.subplots()
ax1.set_yscale('log')
ax1.plot(x, y, "ro", label="Experiment")
ax1.plot(xmodel, ymodel, "r--", label="Model")
plt.show()
I agree with your assessment that the fit is not very sensitive to small misfits for the small values of y. Since you are plotting the data and fit on a semi-log plot, I think that what you really want is to fit in the log-space as well. That is, you could fit log(y) to a quadratic function. As an aside (but an important one if you're going to be doing numerical work with Python), you should not loop over lists but rather use numpy arrays: this will make everything faster and simpler. With such changes, your script might look like
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def lq(x, a, b):
    return -(a*x + b*x*x)
x = np.array([0,1.778,2.921,3.302,6.317,9.524,10.54])
y = np.array([1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041])
(a,b), pcov = curve_fit(lq, x, np.log(y), p0=[0.05,0.05])
xmodel = np.arange(20) # Note: use numpy!
ymodel = np.exp(lq(xmodel, a, b)) # Note: take exp() as inverse log()
fig, ax1 = plt.subplots()
ax1.set_yscale('log')
ax1.plot(x, y, "ro", label="Experiment")
ax1.plot(xmodel,ymodel, "r--", label="Model")
plt.show()
Note that the model function is changed to just be the ax+bx^2 you wanted to write in the first place and that this is now fitting np.log(y), not y. This will give a much more satisfying fit at the smaller y values.
You might also find lmfit (https://lmfit.github.io/lmfit-py/) helpful for this problem (disclaimer: I am a lead author). With this, your fit script could become
from lmfit import Model
model = Model(lq)
params = model.make_params(a=0.05, b=0.05)
result = model.fit(np.log(y), params, x=x)
print(result.fit_report())
xmodel = np.arange(20)
ymodel = np.exp(result.eval(x=xmodel))
plt.plot(x, y, "ro", label="Experiment")
plt.plot(xmodel, ymodel, "r--", label="Model")
plt.yscale('log')
plt.legend()
plt.show()
This will print out a report including fit statistics and interpretable uncertainties and correlations between variables:
[[Model]]
    Model(lq)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 7
    # data points      = 7
    # variables        = 2
    chi-square         = 0.16149397
    reduced chi-square = 0.03229879
    Akaike info crit   = -22.3843833
    Bayesian info crit = -22.4925630
[[Variables]]
    a: -0.05212688 +/- 0.04406602 (84.54%) (init = 0.05)
    b:  0.05274458 +/- 0.00479056 (9.08%) (init = 0.05)
[[Correlations]] (unreported correlations are < 0.100)
    C(a, b) = -0.968
and give a plot of
Note that lmfit Parameters can be fixed or bounded and that lmfit comes with many built-in models.
Finally, if you were to include a constant term in the quadratic model, you would not really need an iterative method but could use polynomial regression, as with numpy.polyfit.
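For example, a minimal sketch of that polynomial-regression route (my own illustration, reusing x, y, np, and plt from the script above, with an added constant term) might look like:
# fit log(y) with a degree-2 polynomial; coefficients come back highest degree first
coeffs = np.polyfit(x, np.log(y), 2)
xmodel = np.arange(20)
ymodel = np.exp(np.polyval(coeffs, xmodel))   # back-transform to the original scale

plt.plot(x, y, "ro", label="Experiment")
plt.plot(xmodel, ymodel, "r--", label="Polynomial regression")
plt.yscale('log')
plt.legend()
plt.show()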
Here is a graphical Python fitter using your data with a Gompertz type of sigmoidal equation. This code uses scipy's Differential Evolution genetic algorithm module to determine initial parameter estimates for scipy's non-linear curve_fit() routine. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, requiring bounds within which to search. In this example, I made all of the parameter search bounds from -2.0 to 2.0, and that seems to work in this case. Note that it is much easier to provide ranges for the initial parameter estimates than specific values, and those parameter ranges can be generous.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
#x and y are from experiment
x=[0,1.778,2.921,3.302,6.317,9.524,10.54]
y=[1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041]
# alias data to match previous example code
xData = numpy.array(x, dtype=float)
yData = numpy.array(y, dtype=float)
def func(x, a, b, c): # Sigmoidal Gompertz C from zunzun.com
    return a * numpy.exp(b * numpy.exp(c*x))

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([-2.0, 2.0]) # search bounds for a
    parameterBounds.append([-2.0, 2.0]) # search bounds for b
    parameterBounds.append([-2.0, 2.0]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # plot with log Y axis scaling
    plt.yscale('log')

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Error in Scipy curve fit for more than two parameters

I am quite new to Scipy. I have a data file (https://www.dropbox.com/s/mwz8s2kap2mnwo0/data.dat?dl=0) and want to fit the function a*exp(b*x^c). The problem is that when I give the value of c manually (say c = 0.75), the code works perfectly, but if I want to find 'a', 'b' and 'c' from the fit, the code does not work and produces a flat line. Sorry if the problem is too silly. The code reads as:
import numpy as np
from scipy.optimize import curve_fit
import sys
import matplotlib.pyplot as plt
import math as math
filename = sys.argv[1]
data = np.loadtxt(filename)
x = np.array(data[:,0])
y = np.array(data[:,1])
def func(x, a, b, c):
    return a*np.exp(b*x**c)

params = curve_fit(func, x, y)
[a, b, c] = params[0]
perr = np.sqrt(np.diag(params[1]))

x_new = []
y_new = []
for i in np.linspace(1.00003e-05, 0.10303175629999914, num=1000):
    j = func(i, a, b, c)
    x_new.append(i)
    y_new.append(j)
x1 = np.array(x_new)
y1 = np.array(y_new)

print("a = ", a, "error = ", perr[0], "error % = ", (perr[0]/a)*100, '\t',
      "b = ", b, "error = ", perr[1], "error % = ", (perr[1]/b)*100, '\t',
      "c = ", c, "error = ", perr[2], "error % = ", (perr[2]/c)*100)
#np.savetxt('fit.dat', np.c_[x1, y1])
plt.plot(x, y, label='data')
plt.plot(x1, y1, label = 'a*np.exp(b*x**c)')
plt.xlabel('Time(s)')
plt.ylabel('SRO')
plt.legend()
plt.show()
Exponential equations can be quite sensitive to the non-linear solver's initial parameter estimates. By default, many non-linear solvers - including scipy's curve_fit - use initial parameter values of 1.0 if none are supplied, and in this particular case those are not good initial estimates for your combination of data and equation. Scipy does include a genetic algorithm which can be used to determine the initial parameter estimates, and its implementation requires bounds within which to search. Here is an example graphical solver using the scipy differential_evolution genetic algorithm module for this purpose; note the ranges that I have used for the genetic algorithm to search within. It is much easier to give ranges for the parameters in this way rather than explicit values; while this is not always true, it worked here. You will need to change the file path that I used to load the data.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
filename = '/home/zunzun/Downloads/data.dat'
data = numpy.loadtxt(filename)
xData = numpy.array(data[:,0])
yData = numpy.array(data[:,1])
def func(x, a, b, c):
    return a*numpy.exp(b*x**c)

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)
    minData = min(minX, minY)
    maxData = min(maxX, maxY)

    parameterBounds = []
    parameterBounds.append([-maxData * 10.0, maxData * 10.0]) # search bounds for a
    parameterBounds.append([-maxData * 10.0, maxData * 10.0]) # search bounds for b
    parameterBounds.append([-maxData * 10.0, maxData * 10.0]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
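A lighter-weight alternative, if you already have a rough idea of the parameter magnitudes, is simply to pass an explicit initial guess to curve_fit instead of relying on the default of all ones. A minimal sketch (my own addition; the p0 values below are only placeholders, not fitted results):
# p0 is a hand-picked starting guess; replace with values that roughly match your data
fittedParameters, pcov = curve_fit(func, xData, yData, p0=[0.1, -1.0, 0.5])
print('Fitted parameters:', fittedParameters)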

curve_fit with polynomials of variable length

I'm new to python (and programming in general) and want to make a polynomial fit using curve_fit, where the order of the polynomials (or the number of fit parameters) is variable.
I made this code, which works for a fixed number of three parameters a, b, c:
# fit function
def fit_func(x, a, b, c):
    p = np.polyval([a, b, c], x)
    return p

# do the fitting
popt, pcov = curve_fit(fit_func, x_data, y_data)
But now I'd like my fit function to depend only on some number N of parameters instead of a, b, c, ....
I'm guessing that's not a very hard thing to do, but because of my limited knowledge I can't get it to work.
I've already looked at this question, but I wasn't able to apply it to my problem.
You can define the function to be fit to your data like this:
def fit_func(x, *coeffs):
    y = np.polyval(coeffs, x)
    return y
Then, when you call curve_fit, set the argument p0 to the initial guess of the polynomial coefficients. For example, this plot is generated by the script that follows.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# Generate a sample input dataset for the demonstration.
x = np.arange(12)
y = np.cos(0.4*x)
def fit_func(x, *coeffs):
    y = np.polyval(coeffs, x)
    return y

fit_results = []
for n in range(2, 6):
    # The initial guess of the parameters to be found by curve_fit.
    # Warning: in general, an array of ones might not be a good enough
    # guess for `curve_fit`, but in this example, it works.
    p0 = np.ones(n)
    popt, pcov = curve_fit(fit_func, x, y, p0=p0)
    # XXX Should check pcov here, but in this example, curve_fit converges.
    fit_results.append(popt)

plt.plot(x, y, 'k.', label='data')
xx = np.linspace(x.min(), x.max(), 100)
for p in fit_results:
    yy = fit_func(xx, *p)
    plt.plot(xx, yy, alpha=0.6, label='n = %d' % len(p))
plt.legend(framealpha=1, shadow=True)
plt.grid(True)
plt.xlabel('x')
plt.show()
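If it helps, here is a small follow-up sketch (my own addition, reusing x, y, fit_func, and fit_results from the script above) that compares the residual sum of squares for each fitted degree, which is one simple way to judge how many parameters are enough:
for p in fit_results:
    resid = y - fit_func(x, *p)
    rss = np.sum(resid**2)   # residual sum of squares for this polynomial
    print(f"n = {len(p)} parameters: RSS = {rss:.4g}")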
In np.polyval(p, x), p is an array of coefficients ordered from highest to lowest degree, and x is a number or an array of numbers at which to evaluate the polynomial. The documentation says the following:
If p is of length N, this function returns the value:
p[0]*x**(N-1) + p[1]*x**(N-2) + ... + p[N-2]*x + p[N-1]
def fit_func(p, x):
    z = np.polyval(p, x)
    return z
e.g.
t = np.array([3, 4, 5, 3])
y = fit_func(t, 5)   # y = 503
which, if you do the math (3*5**3 + 4*5**2 + 5*5 + 3 = 503), is right.
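As a small usage note (my own addition): once you have a coefficient array in this highest-to-lowest order, such as the popt returned by curve_fit in the first answer, wrapping it in np.poly1d gives a convenient callable and a readable printout of the fitted polynomial.
poly = np.poly1d(popt)   # coefficients with the highest-degree term first
print(poly)              # pretty-prints the polynomial
print(poly(5.0))         # evaluate the fitted polynomial at x = 5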
