Find all positive-going zero-crossings in a large quasi-periodic array - python-3.x

I need to find zero-crossings in a 1D array of a roughly periodic function. It will be the points where an orbiting satellite crosses the Earth's equator going north.
I've worked out a simple solution: find points where one value is zero or negative and the next is positive, then use a quadratic or cubic interpolator with scipy.optimize.brentq to locate the nearby zero.
The interpolator does not go beyond cubic, so before I learn to use a better one I'd first like to check whether a faster method already exists.
Question: Is there a faster way in numpy or scipy to find all of the zero crossings in a large array (n = 1E+06 to 1E+09) than the way I've done it here?
The plot shows the errors between the interpolated zeros and the actual value of the function; the smaller curve is the cubic interpolation, the larger is the quadratic.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.optimize import brentq

e = np.exp(1)

def f(x):
    return np.sin(x + np.sin(x*e)/e)  # roughly periodic function

halfpi, pi, twopi = [k*np.pi for k in (0.5, 1, 2)]  # unused below

x = np.arange(0, 10000, 0.1)
y = f(x)

# indices i where y[i] <= 0 and y[i+1] > 0, i.e. positive-going crossings
crossings = np.where((y[1:] > 0) & (y[:-1] <= 0))[0]

Qd = interp1d(x, y, kind='quadratic', assume_sorted=True)
Cu = interp1d(x, y, kind='cubic', assume_sorted=True)

# refine each crossing with brentq on the interpolant; the first and last
# crossings are skipped so the brackets never run past the array ends
x0sQd = [brentq(Qd, x[i-1], x[i+1]) for i in crossings[1:-1]]
x0sCu = [brentq(Cu, x[i-1], x[i+1]) for i in crossings[1:-1]]

# error check: the true function evaluated at the interpolated zeros
y0sQd = [f(x0) for x0 in x0sQd]
y0sCu = [f(x0) for x0 in x0sCu]

plt.figure()
plt.plot(x0sQd, y0sQd, label='quadratic')
plt.plot(x0sCu, y0sCu, label='cubic')
plt.legend()
plt.show()
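For arrays this large, it may be worth noting that a fully vectorized first pass is possible: straight linear interpolation between the two samples bracketing each sign change needs no Python-level loop at all. This is only a sketch of that idea (reusing x, y, and crossings from above), not a replacement for the brentq refinement:
# vectorized linear-interpolation estimate of all crossings at once;
# solves the line through (x[i], y[i]) and (x[i+1], y[i+1]) for y = 0
i = crossings
x0_lin = x[i] - y[i] * (x[i+1] - x[i]) / (y[i+1] - y[i])
The linear estimates are less accurate than the quadratic or cubic ones, but they can serve as a cheap first answer, with brentq applied afterwards only where more precision is needed.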

Related

Find out if point is part of curve (spline, splipy)

I have some coordinates of a 3D point curve through which I lay a spline like so:
from splipy import curve_factory
pts = [...] #3D coordinate points
curve = curve_factory.curve(pts)
I know that I can get a point in 3D along the curve by evaluating it after a certain length:
point_on_curve = curve.evaluate(t)
print(point_on_curve) #outputs coordinates: (x y z)
Is it, however, somehow possible to do it the other way round? Is there a function/method that can tell me whether a certain point is part of the curve, or almost part of it? Something like:
curve.func(point) #output: True
or
curve.func(point) #output: distance to curve 0.0001 --> also part of curve
Thanks!
I've found this script by ventusff that performs an optimization to find the value of the parameter that you call t (in the script it is called u) which gives the point on the spline closest to the external point.
I report the code below with some changes to make it clearer for you. I've defined a tolerance equal to 0.001.
The selection of the optimization solver and of its parameter values requires a little bit of study. I do not have enough time for that now, but you can try to experiment a little bit.
Here SciPy is used for spline generation and evaluation, but you can easily replace it with splipy; the interesting part, the optimization, is done with SciPy either way.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial.distance import euclidean
from scipy.optimize import fmin_bfgs

# build a 3D test curve and fit a spline through it
points_count = 40
phi = np.linspace(0, 2. * np.pi, points_count)
k = np.linspace(0, 2, points_count)
r = 0.5 + np.cos(phi)
x, y, z = r * np.cos(phi), r * np.sin(phi), k
tck, u = splprep([x, y, z], s=1)
points = splev(u, tck)

# take a random point on the curve and perturb it slightly
idx = np.random.randint(low=0, high=40)
noise = np.random.normal(scale=0.01)
external_point = np.array([points[0][idx], points[1][idx], points[2][idx]]) + noise

def distance_to_point(u_):
    s = splev(u_, tck)
    return euclidean(external_point, [s[0][0], s[1][0], s[2][0]])

# minimize the distance over the spline parameter u
closest_u = fmin_bfgs(distance_to_point, x0=np.array([0.0]), gtol=1e-8)
closest_point = splev(closest_u, tck)

tol = 1e-3
if euclidean(external_point, [closest_point[0][0], closest_point[1][0], closest_point[2][0]]) < tol:
    print("The point is very close to the spline.")

ax = plt.figure().add_subplot(projection='3d')
ax.plot(points[0], points[1], points[2], "r-", label="Spline")
ax.plot(external_point[0], external_point[1], external_point[2], "bo", label="External Point")
ax.plot(closest_point[0], closest_point[1], closest_point[2], "go", label="Closest Point")
plt.legend()
plt.show()
The script draws a 3D plot of the spline together with the external point and the closest point found, and prints the following output:
Current function value: 0.000941
Iterations: 5
Function evaluations: 75
Gradient evaluations: 32
The point is very close to the spline.
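If you do experiment with other solvers, a bounded scalar minimizer is a natural candidate, since the spline parameter lives in [0, 1] and fmin_bfgs is free to wander outside it. A minimal sketch under that assumption, reusing tck and external_point from the script above:
from scipy.optimize import minimize_scalar

def dist_scalar(u_):
    # splev with a scalar parameter returns one value per coordinate
    s = splev(u_, tck)
    return euclidean(external_point, [s[0], s[1], s[2]])

res = minimize_scalar(dist_scalar, bounds=(0, 1), method='bounded')
closest_point = splev(res.x, tck)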

Simple line plot in python is rounding values to integers. Why?

I'm trying to add python to my repertoire (R is my program of choice) and am having an issue with a simple line plot.
While the generated array (in this case, y) is of float type (which I want), when I plot a simple line plot using matplotlib, that same y is now truncated to whole integers.
Any help would be appreciated.
Thanks. Here's sample code.
P.S. Any hints as to cleaning up the code would also be more than welcome.
import sys
import numpy as np
from numpy import random
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as matplotlib

plt.style.use('ggplot')

greens = np.array([0,0])
others = np.array(np.arange(1,37))

# no axis provided, array elements will be flattened
roulette = np.append(greens, others)
spins1000 = np.array(random.choice(roulette, size=(1000)))

# Create function for cum mean in python
def cum_mean(arr):
    cum_sum = np.cumsum(arr, axis=0)
    for i in range(cum_sum.shape[0]):
        if i == 0:
            continue
        print(cum_sum[i] / (i + 1))
        cum_sum[i] = cum_sum[i] / (i + 1)
    return cum_sum

y = np.array(cum_mean(spins1000))
x = np.array(np.arange(1,1001))

fig, ax = plt.subplots(figsize=(10, 6))
ax.set(xlim=(0, 1000), ylim=(10.00, 25.00))
line = ax.plot(x, y, color='red', lw=1)[0]
plt.draw()
plt.show()
There are two things happening which in combination cause the strange behavior:
cum_sum = np.cumsum(arr, axis=0) with arr being an array of integers makes cum_sum also an array of integers;
in the loop, writing cum_sum[i] = cum_sum[i] / (i + 1) stores the result (a float) into an integer array, and this store truncates the number to an integer.
A solution would be either to create cum_sum as float (as in cum_sum = np.cumsum(arr, dtype=float)), or to do things "the numpy way" and create a new array in one go: return cum_sum / np.arange(1, cum_sum.shape[0] + 1). Note that numpy's array operations are vectorized, so dividing an array by an array gives the same result as dividing element by element, and runs much faster (similar to what happens in R).
Also, if you wrote cum_sum = cum_sum / np.arange(1, 1001), cum_sum would be a new float array; only assigning element by element keeps the array an array of integers. Note that np.arange() already creates a numpy array, so calling np.array again doesn't change anything.
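A minimal snippet to make the dtype behavior concrete (toy numbers, just for illustration):
import numpy as np
a = np.cumsum([1, 2, 3])  # integer input, so 'a' is an integer array
a[1] = a[1] / 2           # 3 / 2 = 1.5 is truncated to 1 on assignment
print(a.dtype, a)         # e.g. int64 [1 1 6]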
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('ggplot')

greens = np.array([0, 0])
others = np.arange(1, 37)

# no axis provided, array elements will be flattened
roulette = np.append(greens, others)
spins1000 = np.random.choice(roulette, size=1000)

# Create function for cumulative mean
def cum_mean(arr):
    cum_sum = np.cumsum(arr)
    return cum_sum / np.arange(1, cum_sum.shape[0] + 1)

y = cum_mean(spins1000)
x = np.arange(1, 1001)

fig, ax = plt.subplots(figsize=(10, 6))
ax.set(xlim=(0, 1000), ylim=(10.00, 25.00))
line = ax.plot(x, y, color='red', lw=1)[0]
plt.show()

Python - Find min, max, and inflection points of normal curve

I am trying to find the locations (i.e., the x-values) of the minimum, start of season, peak growing season, maximum growth, senescence, end of season, and minimum (i.e., the inflection points) in a vegetation curve. I am using a normal curve here as an example. I did come across a few code examples for finding the change in slope and the 1st/2nd-order derivatives, but was not able to apply them to my case. Please point me to any relevant example; your help is appreciated. Thanks!
## Version 2 code
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)

# Slice the array in groups of 3
def group_in_threes(slicable):
    for i in range(len(slicable)-2):
        yield slicable[i:i+3]

# Locate the change in slope
def turns(L):
    for index, three in enumerate(group_in_threes(L)):
        if (three[0] > three[1] < three[2]) or (three[0] < three[1] > three[2]):
            yield index + 1

# 1st inflection point estimation
dy = np.diff(y, n=1)  # first derivative
idx_max_dy = np.argmax(dy)
ix = list(turns(dy))
print(ix)

# All inflection point estimation
dy2 = np.diff(dy, n=2)  # Second derivative?
idx_max_dy2 = np.argmax(dy2)
ix2 = list(turns(dy2))
print(ix2)

# Graph
plt.plot(x, y)
#plt.plot(x[ix], y[ix], 'or', label='estimated inflection point')
plt.plot(x[ix2], y[ix2], 'or', label='estimated inflection point - 2')
plt.xlabel('x'); plt.ylabel('y'); plt.legend(loc='best');
Here is a very simple, not especially robust, method to find the inflection point of a non-noisy curve:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# 1st inflection point estimation
dy = np.diff(y) # first derivative
idx_max_dy = np.argmax(dy)
# Graph
plt.plot(x, y)
plt.plot(x[idx_max_dy], y[idx_max_dy], 'or', label='estimated inflection point')
plt.xlabel('x'); plt.ylabel('y'); plt.legend();
The actual position of the inflection point is x1 = mean - std for a Gaussian curve.
For this to work with real data, the data has to be smoothed before looking for the max, by applying for example a simple moving average, a Gaussian filter, or a Savitzky-Golay filter (which can directly output the second derivative); the choice of the right filter depends on the data.
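As a sketch of that last suggestion (the window length and polynomial order here are illustrative and would need tuning to the real data), scipy.signal.savgol_filter can smooth and differentiate in one call:
import numpy as np
from scipy.stats import norm
from scipy.signal import savgol_filter

x = np.linspace(0.0, 16.0, 100)
y_noisy = norm.pdf(x, 8, 2) + np.random.normal(scale=0.002, size=x.size)

dx = x[1] - x[0]
# smoothed first derivative straight from the filter
dy_smooth = savgol_filter(y_noisy, window_length=21, polyorder=3, deriv=1, delta=dx)
idx_inflection = np.argmax(dy_smooth)  # steepest rise, i.e. the rising inflection point
print(x[idx_inflection])  # should land near mean - std = 6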

1-D interpolation using python 3.x

I have data that looks like a sigmoidal plot, but flipped relative to the vertical line.
The plot is the result of plotting 1D data rather than some sort of function.
My goal is to find the x value where the y value is at 50%. As you can see, there is no data point where y is exactly at 50%.
Interpolation comes to mind, but I'm not sure it enables me to find the x value for a given y. So my questions are: 1) can you use interpolation to find the x when y is 50%? or 2) do you need to fit the data to some sort of function?
Below is what I currently have in my code
import numpy as np
import matplotlib.pyplot as plt
my_x = [4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66]
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172, 0.98974611323163653, 0.961630837657524, 0.99327633558441175, 0.99338952769251909, 0.99428263292577534, 0.98690514212711611, 0.99111667721533181, 0.99149418924880861, 0.99133773062680464, 0.99143506380003499, 0.99151080464011454, 0.99268261743308517, 0.99289757252812316, 0.99100207861144063, 0.99157171773324027, 0.99112571824824358, 0.99031608691035722, 0.98978104266076905, 0.989782674787969, 0.98897835092187614, 0.98517540405423909, 0.98308943666187076, 0.96081810781994603, 0.85563541881892147, 0.61570811548079107, 0.33076276040577052, 0.14655134838124245, 0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
my_y=my_y_raw/np.max(my_y_raw)
plt.plot(my_x, my_y,color='k', markersize=40)
plt.scatter(my_x,my_y,marker='*',label="myplot", color='k', edgecolor='k', linewidth=1,facecolors='none',s=50)
plt.legend(loc="lower left")
plt.xlim([4,102])
plt.show()
Using SciPy
The most straightforward way to do the interpolation is to use the SciPy interpolate.interp1d function. SciPy is closely related to NumPy and you may already have it installed. The advantage of interp1d is that it can sort the data for you, at the cost of somewhat funky syntax. Many interpolation functions assume that you are interpolating a y value from an x value, and generally need the "x" values to be monotonically increasing; in your case we swap the normal sense of x and y. The y values have an outlier, as @Abhishek Mishra has pointed out, but in the case of your data you are lucky and can get away with leaving the outlier in.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
my_x = [4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,
48,50,52,54,56,58,60,62,64,66]
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172,
0.98974611323163653, 0.961630837657524, 0.99327633558441175,
0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
0.989782674787969, 0.98897835092187614, 0.98517540405423909,
0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
# assume_sorted=False tells scipy to sort the data for you
f = interp1d(my_y_raw, my_x, assume_sorted=False)
xnew = f(0.5)
print('interpolated value is ', xnew)
plt.plot(my_x, my_y_raw,'x-', markersize=10)
plt.plot(xnew, 0.5, 'x', color = 'r', markersize=20)
plt.plot((0, xnew), (0.5,0.5), ':')
plt.grid(True)
plt.show()
which gives
interpolated value is 56.81214249272691
Using NumPy
Numpy also has an interp function, but it doesn't do the sort for you. And if you don't sort, you'll be sorry:
Does not check that the x-coordinate sequence xp is increasing. If xp
is not increasing, the results are nonsense.
The only way I could get np.interp to work was to shove the data into a structured array.
import numpy as np
import matplotlib.pyplot as plt

# note: np.float was removed from recent NumPy versions; plain float is equivalent
my_x = np.array([4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,
                 48,50,52,54,56,58,60,62,64,66], dtype=float)
my_y_raw = np.array([0.99470977497817203, 0.99434995886145172,
                     0.98974611323163653, 0.961630837657524, 0.99327633558441175,
                     0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
                     0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
                     0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
                     0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
                     0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
                     0.989782674787969, 0.98897835092187614, 0.98517540405423909,
                     0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
                     0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
                     0.076853147122142126, 0.035831324928136087, 0.021344669212790181],
                    dtype=float)

# a structured array lets x and y be sorted together, keyed on y
dt = np.dtype([('x', float), ('y', float)])
data = np.zeros(len(my_x), dtype=dt)
data['x'] = my_x
data['y'] = my_y_raw
data.sort(order='y')  # sort data in place by y values

print('numpy interp gives ', np.interp(0.5, data['y'], data['x']))
which gives
numpy interp gives 56.81214249272691
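As a hedged aside, np.argsort can produce the same pairing without a structured array (reusing my_x and my_y_raw from above):
order = np.argsort(my_y_raw)  # indices that put y in ascending order
print(np.interp(0.5, my_y_raw[order], my_x[order]))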
As you said, your data looks like a flipped sigmoidal. Can we assume that your function is strictly decreasing? If so, we can try the following method (a sketch follows the steps below):
Remove all the points where the data is not strictly decreasing. For example, for your data that point will be near 0.
Use binary search to find the location where y = 0.5 should be inserted.
Now you know the two (x, y) pairs between which your desired y = 0.5 lies.
You can use simple linear interpolation if the (x, y) pairs are very close.
Otherwise, you can see what the approximation of the sigmoid looks like near those pairs.
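A minimal sketch of the binary-search idea, assuming my_x and my_y from the question and that the monotonicity clean-up in step 1 has been done (np.searchsorted needs ascending input, so the arrays are reversed first):
import numpy as np
y_rev = my_y[::-1]  # ascending order for searchsorted
x_rev = my_x[::-1]
j = np.searchsorted(y_rev, 0.5)  # binary search for the insertion point
# linear interpolation between the bracketing pair
x_half = x_rev[j-1] + (0.5 - y_rev[j-1]) * (x_rev[j] - x_rev[j-1]) / (y_rev[j] - y_rev[j-1])
print(x_half)  # ~56.81, matching the other answers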
You might not need to fit any function to your data. Simply find the following two elements:
the largest x for which y < 50%, and
the smallest x for which y > 50%.
Then use interpolation to find x*. Below is the code:
import numpy as np
my_x = np.array([4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66])
my_y = np.array([0.99470977497817203, 0.99434995886145172, 0.98974611323163653, 0.961630837657524, 0.99327633558441175, 0.99338952769251909, 0.99428263292577534, 0.98690514212711611, 0.99111667721533181, 0.99149418924880861, 0.99133773062680464, 0.99143506380003499, 0.99151080464011454, 0.99268261743308517, 0.99289757252812316, 0.99100207861144063, 0.99157171773324027, 0.99112571824824358, 0.99031608691035722, 0.98978104266076905, 0.989782674787969, 0.98897835092187614, 0.98517540405423909, 0.98308943666187076, 0.96081810781994603, 0.85563541881892147, 0.61570811548079107, 0.33076276040577052, 0.14655134838124245, 0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
tempInd1 = my_y < .5  # this will only work if the values are monotonic
x1 = my_x[tempInd1][0]
y1 = my_y[tempInd1][0]
x2 = my_x[~tempInd1][-1]
y2 = my_y[~tempInd1][-1]
# np.interp replaces the old scipy.interp alias, which has been removed
print(np.interp(0.5, [y1, y2], [x1, x2]))

Using horizontal line to fit the model

I am writing Python code that uses a horizontal line to investigate under-fitting of the function sin(2*pi*x) on the range [0, 1].
I first generate N data points by adding some random noise using Gaussian distribution with mu=0 and sigma=1.
import matplotlib.pyplot as plt
import numpy as np
# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)
I need to fit the model with a horizontal line and display it, but I don't know what to do next.
Could you help me figure this out? I'd appreciate it.
Assuming that you want to use the least squares loss function, by definition you are trying to find the value of yhat minimizing np.sum((y-yhat)**2). Differentiating with respect to yhat, you'll find that the minimum is achieved at yhat = np.sum(y)/N, which is of course nothing but y.mean(), as already pointed out by @ImportanceOfBeingErnest in the comments.
plt.scatter(X, y)
plt.plot(X, np.zeros(N) + np.mean(y))
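A quick numerical check of that claim (a toy sketch, reusing y from the snippet above): the loss is smallest exactly at the mean.
yhat = y.mean()
for delta in (-0.1, 0.0, 0.1):
    print(delta, np.sum((y - (yhat + delta))**2))  # minimum at delta = 0.0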
From what I understand, you're generating a noisy sine wave and trying to fit a line to it? Note that the code below fits an ordinary least-squares straight line (slope b, intercept A) rather than a horizontal one; for a horizontal line, drop the slope term and use the mean, as in the other answer.
import numpy as np
import matplotlib.pyplot as plt

# generate N noisy points on a sine wave
N = 60
X = np.linspace(0.0, 2*np.pi, num=N)
noise = 0.1 * np.random.randn(N)
y = np.sin(4*X) + noise

# closed-form least-squares estimates for slope and intercept
numer = sum(xi*yi for xi, yi in zip(X, y)) - N * np.mean(X) * np.mean(y)
denum = sum(xi**2 for xi in X) - N * np.mean(X)**2
b = numer / denum
A = np.mean(y) - b * np.mean(X)

y_ = b * X + A

plt.plot(X, y)
plt.plot(X, y_)
plt.show()
