Python - Find min, max, and inflection points of normal curve - python-3.x

I am trying to find the locations (i.e., the x-values) of the minimum, start of season, peak growing season, maximum growth, senescence, end of season, and minimum again (i.e., the inflection points) in a vegetation curve. I am using a normal curve here as an example. I did come across a few code examples for finding the change in slope and the 1st/2nd order derivatives, but I was not able to implement them for my case. Please point me to any relevant example; your help is appreciated. Thanks!
## Version 2 code
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# Slice the group in 3
def group_in_threes(slicable):
    for i in range(len(slicable)-2):
        yield slicable[i:i+3]
# Locate the change in slope
def turns(L):
    for index, three in enumerate(group_in_threes(L)):
        if (three[0] > three[1] < three[2]) or (three[0] < three[1] > three[2]):
            yield index + 1
# 1st inflection point estimation
dy = np.diff(y, n=1) # first derivative
idx_max_dy = np.argmax(dy)
ix = list(turns(dy))
print(ix)
# All inflection point estimation
dy2 = np.diff(y, n=2) # second derivative (second difference of y)
idx_max_dy2 = np.argmax(dy2)
ix2 = list(turns(dy2))
print(ix2)
# Graph
plt.plot(x, y)
#plt.plot(x[ix], y[ix], 'or', label='estimated inflection point')
plt.plot(x[ix2], y[ix2], 'or', label='estimated inflection point - 2')
plt.xlabel('x'); plt.ylabel('y'); plt.legend(loc='best');

Here is a very simple and not robust method to find the inflection point of a non-noisy curve:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# 1st inflection point estimation
dy = np.diff(y) # first derivative
idx_max_dy = np.argmax(dy)
# Graph
plt.plot(x, y)
plt.plot(x[idx_max_dy], y[idx_max_dy], 'or', label='estimated inflection point')
plt.xlabel('x'); plt.ylabel('y'); plt.legend();
The actual position of the inflection point is x1 = mean - std for a Gaussian curve.
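For the broader set of features asked about (minimum, peak, and both inflection points), here is a minimal sketch on the same synthetic x and y as above, using np.gradient and sign changes of the second derivative; this is an illustration for a clean curve, not a method for noisy data:
d1 = np.gradient(y, x)                               # first derivative dy/dx (same length as x)
d2 = np.gradient(d1, x)                              # second derivative
peak_idx = np.argmax(y)                              # the curve's maximum (here ~ mean)
infl_idx = np.where(np.diff(np.sign(d2)) != 0)[0]    # indices where the 2nd derivative changes sign
print('peak near x =', x[peak_idx])
print('inflection points near x =', x[infl_idx])     # ~ mean - std and mean + std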
For this to work with real data, the data have to be smoothed before looking for the maximum, for example with a simple moving average, a Gaussian filter, or a Savitzky-Golay filter (which can directly output the second derivative). The choice of the right filter depends on the data.
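For example, here is a minimal sketch with scipy.signal.savgol_filter; the noise level, window length and polynomial order are assumptions to tune against your own data:
from scipy.signal import savgol_filter
# add some noise, smooth, and take the second derivative straight from the filter
noisy = y + np.random.normal(scale=0.005, size=y.shape)
d2_smooth = savgol_filter(noisy, window_length=21, polyorder=3, deriv=2, delta=x[1] - x[0])
# inflection points are where the smoothed second derivative changes sign;
# the flat tails may still produce spurious crossings, so tune the window for your data
infl = np.where(np.diff(np.sign(d2_smooth)) != 0)[0]
print(x[infl])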

Related

Find out if point is part of curve (spline, splipy)

I have some coordinates of a 3D point curve through which I lay a spline like so:
from splipy import curve_factory
pts = [...] #3D coordinate points
curve = curve_factory.curve(pts)
I know that I can get a point in 3D along the curve by evaluating it after a certain length:
point_on_curve = curve.evaluate(t)
print(point_on_curve) #outputs coordinates: (x y z)
Is it, however, somehow possible to do it the other way round? Is there a function/method that can tell me if a certain point is part of the curve? Or if it's almost part of the curve? Something like:
curve.func(point) #output: True
or
curve.func(point) #output: distance to curve 0.0001 --> also part of curve
Thanks!
I've found this script by ventusff that performs an optimization to find the value of the parameter (which you call t; in the script it is u) that gives the point on the spline closest to the external point.
I report below the code with some changes to make it clearer for you. I've defined a tolerance equal to 0.001.
The selection of the optimization solver and of its parameter values requires a bit of study. I do not have enough time to do that now, but you can experiment a little.
In this case SciPy is used for spline generation and evaluation, but you can easily replace it with splipy. The interesting part is the optimization, which is performed with SciPy.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial.distance import euclidean
from scipy.optimize import fmin_bfgs
points_count = 40
phi = np.linspace(0, 2. * np.pi, points_count)
k = np.linspace(0, 2, points_count)
r = 0.5 + np.cos(phi)
x, y, z = r * np.cos(phi), r * np.sin(phi), k
tck, u = splprep([x, y, z], s=1)
points = splev(u, tck)
idx = np.random.randint(low=0, high=40)
noise = np.random.normal(scale=0.01)
external_point = np.array([points[0][idx], points[1][idx], points[2][idx]]) + noise
def distance_to_point(u_):
    s = splev(u_, tck)
    return euclidean(external_point, [s[0][0], s[1][0], s[2][0]])
closest_u = fmin_bfgs(distance_to_point, x0=np.array([0.0]), gtol=1e-8)
closest_point = splev(closest_u, tck)
tol = 1e-3
if euclidean(external_point, [closest_point[0][0], closest_point[1][0], closest_point[2][0]]) < tol:
    print("The point is very close to the spline.")
ax = plt.figure().add_subplot(projection='3d')
ax.plot(points[0], points[1], points[2], "r-", label="Spline")
ax.plot(external_point[0], external_point[1], external_point[2], "bo", label="External Point")
ax.plot(closest_point[0], closest_point[1], closest_point[2], "go", label="Closest Point")
plt.legend()
plt.show()
The script draws a 3D plot of the spline, the external point and the closest point, and prints the following output:
Current function value: 0.000941
Iterations: 5
Function evaluations: 75
Gradient evaluations: 32
The point is very close to the spline.
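If you prefer to stay entirely with splipy, the same idea can be sketched with scipy.optimize.minimize_scalar. This is only a sketch: it assumes curve.evaluate accepts a scalar parameter and that start()/end() expose the parametric range (adjust those accessor names to your splipy version), and the sample points stand in for your pts:
import numpy as np
from scipy.optimize import minimize_scalar
from splipy import curve_factory
s = np.linspace(0, 2*np.pi, 20)
pts = np.column_stack([np.cos(s), np.sin(s), s/np.pi])   # sample 3D points (a short helix)
curve = curve_factory.curve(pts)
external_point = np.array([1.0, 0.0, 0.05])              # hypothetical point to test
t0, t1 = float(curve.start()[0]), float(curve.end()[0])  # assumed parametric-range accessors
def dist(t):
    # Euclidean distance from the external point to the curve at parameter t
    return np.linalg.norm(np.asarray(curve.evaluate(t)).ravel() - external_point)
res = minimize_scalar(dist, bounds=(t0, t1), method='bounded')
tol = 1e-3
print('closest parameter:', res.x, 'distance:', res.fun)
print('on (or very near) the curve:', res.fun < tol)
Note that this is a bounded local search; for a closed or very wiggly curve you may need to try several parameter brackets, or fall back to the BFGS approach above.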

Using the linear_model perceptron module from sklearn to separate points

I am trying to use this sklearn module for a binary classification problem and my data is clearly linearly separable.
What I don't understand is why the green area of my plot does not include the five red circles.
I have tried to vary the iterations parameter (max_iter) from 100 to 10000, but it does not make any difference.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Perceptron
def learn_and_display_Perceptron(datafile):
    # load the data: two feature columns followed by a label column
    data = np.loadtxt(datafile)
    n, d = data.shape
    x = data[:, 0:2]
    y = data[:, 2]
    clf = Perceptron(max_iter=10000)
    clf.fit(x, y)
    sv = np.zeros(n, dtype=bool)   # all-False array
    notsv = np.logical_not(sv)     # all-True array
    # Determine the x1- and x2- limits of the plot
    x1min = min(x[:, 0]) - 1
    x1max = max(x[:, 0]) + 1
    x2min = min(x[:, 1]) - 1
    x2max = max(x[:, 1]) + 1
    plt.xlim(x1min, x1max)
    plt.ylim(x2min, x2max)
    # Plot the data points, enlarging those that are support vectors
    plt.plot(x[(y==1)*notsv, 0], x[(y==1)*notsv, 1], 'ro')
    plt.plot(x[(y==1)*sv, 0], x[(y==1)*sv, 1], 'ro', markersize=10)
    plt.plot(x[(y==-1)*notsv, 0], x[(y==-1)*notsv, 1], 'k^')
    plt.plot(x[(y==-1)*sv, 0], x[(y==-1)*sv, 1], 'k^', markersize=10)
    # Construct a grid of points and evaluate the classifier at each grid point
    grid_spacing = 0.05
    xx1, xx2 = np.meshgrid(np.arange(x1min, x1max, grid_spacing), np.arange(x2min, x2max, grid_spacing))
    grid = np.c_[xx1.ravel(), xx2.ravel()]
    Z = clf.predict(grid)
    # Quantize the values to -1, -0.5, 0, 0.5, 1 for display purposes
    for i in range(len(Z)):
        Z[i] = min(Z[i], 1.0)
        Z[i] = max(Z[i], -1.0)
        if (Z[i] > 0.0) and (Z[i] < 1.0):
            Z[i] = 0.5
        if (Z[i] < 0.0) and (Z[i] > -1.0):
            Z[i] = -0.5
    # Show the boundary and margin using a color plot
    Z = Z.reshape(xx1.shape)
    plt.pcolormesh(xx1, xx2, Z, cmap=plt.cm.PRGn, vmin=-2, vmax=2, shading='auto')
    plt.show()
# e.g. learn_and_display_Perceptron('data_1.txt')
My datafile, data_1.txt, can be found here: https://github.com/bluetail14/MyCourserapractice/tree/main/Edx
What can I change in my code to adjust the green/purple boundary so that it includes the five red circles?
Nice code. You need to change the eta0 value:
clf = Perceptron(max_iter=10000, eta0=0.1)
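For context, eta0 is the constant by which the perceptron's updates are multiplied (its learning rate), and it defaults to 1.0. Here is a small sketch for comparing a few values on the same data, assuming data_1.txt sits in the working directory with the column layout used above:
import numpy as np
from sklearn.linear_model import Perceptron
data = np.loadtxt('data_1.txt')
x, y = data[:, 0:2], data[:, 2]
for eta0 in (1.0, 0.1, 0.01):
    clf = Perceptron(max_iter=10000, eta0=eta0).fit(x, y)
    print(eta0, clf.score(x, y))   # mean accuracy on the training points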

Find all positive-going zero-crossings in a large quasi-periodic array

I need to find zero-crossings in a 1D array of a roughly periodic function. It will be the points where an orbiting satellite crosses the Earth's equator going north.
I've worked out a simple solution based on finding points where one value is zero or negative and the next is positive, then using a quadratic or cubic interpolator with scipy.optimize.brentq to find the nearby zeros.
The interpolator does not go beyond cubic, and before I learn to use a better interpolator I'd first like to check whether a fast method already exists in numpy or scipy.
Question: is there a faster way in numpy or scipy to find all of the zero crossings in a large array (n = 1E+06 to 1E+09) than the way I've done it here?
The plot shows the errors between the interpolated zeros and the actual value of the function; the smaller curve is from the cubic interpolation, the larger from the quadratic.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.optimize import brentq
def f(x):
    return np.sin(x + np.sin(x*e)/e)   # roughly periodic function
halfpi, pi, twopi = [m*np.pi for m in (0.5, 1, 2)]
e = np.exp(1)
x = np.arange(0, 10000, 0.1)
y = np.sin(x + np.sin(x*e)/e)
crossings = np.where((y[1:] > 0) * (y[:-1] <= 0))[0]
Qd = interp1d(x, y, kind='quadratic', assume_sorted=True)
Cu = interp1d(x, y, kind='cubic', assume_sorted=True)
x0sQd = [brentq(Qd, x[i-1], x[i+1]) for i in crossings[1:-1]]
x0sCu = [brentq(Cu, x[i-1], x[i+1]) for i in crossings[1:-1]]
y0sQd = [f(x0) for x0 in x0sQd]
y0sCu = [f(x0) for x0 in x0sCu]
if True:
    plt.figure()
    plt.plot(x0sQd, y0sQd)
    plt.plot(x0sCu, y0sCu)
    plt.show()
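Not part of the original post, but since the question is about speed: the same sign-change brackets can be refined with a fully vectorized linear interpolation, avoiding the per-crossing brentq calls. A sketch, under the assumption that linear interpolation is accurate enough at this sample spacing (reusing x, y, crossings and f from above):
i = crossings                                        # index of the sample just before each crossing
x0s_lin = x[i] - y[i] * (x[i+1] - x[i]) / (y[i+1] - y[i])
print(np.max(np.abs(f(x0s_lin))))                    # worst-case residual of the linear estimate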

Python Scipy Curvefit to Linear Quadratic Curve

I'm trying to fit a linear-quadratic model curve to experiment data. The Y-axis values decrease from 1 to 10^-5. When I use the following code, the resulting curve often seems not to fit the data at higher X values. I suspect that because the Y values at high X values are so small, the resulting difference between the experiment value and the model value is small. But I would like the model curve to pass as close to the higher-X points as possible (even if it means the low values are not fitted as well). I haven't found anything about weighting in scipy.optimize.curve_fit other than using standard deviations (which I don't have). How can I improve my model fit at high X values?
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
def lq(x, a, b):
    # y(x) = exp[-(ax + bx²)]
    y = []
    for i in x:
        x2 = i**2
        ax = a*i
        bx2 = b*x2
        y.append(np.exp(-(ax + bx2)))
    return y
#x and y are from experiment
x=[0,1.778,2.921,3.302,6.317,9.524,10.54]
y=[1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041]
(a,b), pcov = curve_fit(lq, x, y, p0=[0.05,0.05])
#make the model curve using a and b
xmodel = list(range(0,20))
ymodel = lq(xmodel, a, b)
fig, ax1 = plt.subplots()
ax1.set_yscale('log')
ax1.plot(x,y, "ro", label="Experiment")
ax1.plot(xmodel,ymodel, "r--", label="Model")
plt.show()
I agree with your assessment that the fit is not very sensitive to small misfits for the small values of y. Since you are plotting the data and fit on a semi-log plot, I think that what you really want is to fit in the log-space as well. That is, you could fit log(y) to a quadratic function. As an aside (but an important one if you're going to be doing numerical work with Python), you should not loop over lists but rather use numpy arrays: this will make everything faster and simpler. With such changes, your script might look like
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def lq(x, a, b):
    return -(a*x + b*x*x)
x = np.array([0,1.778,2.921,3.302,6.317,9.524,10.54])
y = np.array([1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041])
(a,b), pcov = curve_fit(lq, x, np.log(y), p0=[0.05,0.05])
xmodel = np.arange(20) # Note: use numpy!
ymodel = np.exp(lq(xmodel, a, b)) # Note: take exp() as inverse log()
fig, ax1 = plt.subplots()
ax1.set_yscale('log')
ax1.plot(x, y, "ro", label="Experiment")
ax1.plot(xmodel,ymodel, "r--", label="Model")
plt.show()
Note that the model function is changed to just be the ax+bx^2 you wanted to write in the first place and that this is now fitting np.log(y), not y. This will give a much more satisfying fit at the smaller y values.
You might also find lmfit (https://lmfit.github.io/lmfit-py/) helpful for this problem (disclaimer: I am a lead author). With this, your fit script could become
from lmfit import Model
model = Model(lq)
params = model.make_params(a=0.05, b=0.05)
result = model.fit(np.log(y), params, x=x)
print(result.fit_report())
xmodel = np.arange(20)
ymodel = np.exp(result.eval(x=xmodel))
plt.plot(x, y, "ro", label="Experiment")
plt.plot(xmodel, ymodel, "r--", label="Model")
plt.yscale('log')
plt.legend()
plt.show()
This will print out a report including fit statistics and interpretable uncertainties and correlations between variables:
[[Model]]
    Model(lq)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 7
    # data points      = 7
    # variables        = 2
    chi-square         = 0.16149397
    reduced chi-square = 0.03229879
    Akaike info crit   = -22.3843833
    Bayesian info crit = -22.4925630
[[Variables]]
    a: -0.05212688 +/- 0.04406602 (84.54%) (init = 0.05)
    b:  0.05274458 +/- 0.00479056 (9.08%) (init = 0.05)
[[Correlations]] (unreported correlations are < 0.100)
    C(a, b) = -0.968
and give a plot of the data and the fit on a log scale.
Note that lmfit Parameters can be fixed or bounded and that lmfit comes with many built-in models.
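For example, bounding a parameter before the fit is a one-line change on the Parameters object; a sketch reusing the model and params from the snippet above:
params['b'].set(min=0)                  # constrain b to be non-negative
result = model.fit(np.log(y), params, x=x)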
Finally, if you were to include a constant term in the quadratic model, you would not really need an iterative method but could use polynomial regression, as with numpy.polyfit.
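A minimal sketch of that route, under the assumption that you fit log(y) with a full quadratic (constant term included):
coeffs = np.polyfit(x, np.log(y), 2)              # closed-form quadratic fit of log(y)
ymodel_poly = np.exp(np.polyval(coeffs, xmodel))  # back-transform to the original scale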
Here is a graphical Python fitter using your data with a Gompertz type of sigmoidal equation. This code uses scipy's Differential Evolution genetic algorithm module to determine initial parameter estimates for scipy's non-linear curve_fit() routine. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, requiring bounds within which to search. In this example, I made all of the parameter search bounds from -2.0 to 2.0, and that seems to work in this case. Note that it is much easier to provide ranges for the initial parameter estimates than specific values, and those parameter ranges can be generous.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
#x and y are from experiment
x=[0,1.778,2.921,3.302,6.317,9.524,10.54]
y=[1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041]
# alias data to match previous example code
xData = numpy.array(x, dtype=float)
yData = numpy.array(y, dtype=float)
def func(x, a, b, c): # Sigmoidal Gompertz C from zunzun.com
    return a * numpy.exp(b * numpy.exp(c*x))
# function for the genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by the genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([-2.0, 2.0]) # search bounds for a
    parameterBounds.append([-2.0, 2.0]) # search bounds for b
    parameterBounds.append([-2.0, 2.0]) # search bounds for c
    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution polishes its result with a bounded local minimizer
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best-fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)
    # plot with log Y-axis scaling
    plt.yscale('log')
    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')
    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)
    # now the model as a line plot
    axes.plot(xModel, yModel)
    axes.set_xlabel('X Data') # X-axis data label
    axes.set_ylabel('Y Data') # Y-axis data label
    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Using horizontal line to fit the model

I am writing Python code that uses a horizontal line to investigate under-fitting, using the function sin(2πx) on the range [0, 1].
I first generate N data points by adding random noise drawn from a Gaussian distribution with mu=0 and sigma=1.
import matplotlib.pyplot as plt
import numpy as np
# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)
I need to fit the model using a horizontal line and display it, but I don't know what to do next.
Could you help me figure out this problem? I'd appreciate it.
Assuming that you want to use the least-squares loss function, by definition you are trying to find the value of yhat minimizing np.sum((y-yhat)**2). Differentiating with respect to yhat, you'll find that the minimum is achieved at yhat = np.sum(y)/N, which is of course nothing but y.mean(), as also already pointed out by @ImportanceOfBeingErnest in the comments.
plt.scatter(X, y)
plt.plot(X, np.zeros(N) + np.mean(y))
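As a quick numerical sanity check of that derivation, here is a sketch reusing the y generated in the question:
candidates = np.linspace(y.min(), y.max(), 1001)
losses = [np.sum((y - c)**2) for c in candidates]
print(candidates[np.argmin(losses)], y.mean())   # the brute-force minimizer matches the mean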
From what I understand, you're generating a noisy sine wave and trying to fit a horizontal line?
import numpy as np
import matplotlib.pyplot as plt
# generate N random points
N=60
X= np.linspace(0.0,2*np.pi, num=N)
noise = 0.1 * np.random.randn(N)
y= np.sin(4*X) + noise
numer = sum([xi*yi for xi,yi in zip(X, y)]) - N * np.mean(X) * np.mean(y)
denum = sum([xi**2 for xi in X]) - N * np.mean(X)**2
b = numer / denum
A = np.mean(y) - b * np.mean(X)
y_ = b * X+ A
plt.plot(X,y)
plt.plot(X,y_)
plt.show()
