Using horizontal line to fit the model - python-3.x

I am writing a python code using horizontal line for investigating the under-fiting using the function sin(2.pi.x) in range of [0,1].
I first generate N data points by adding some random noise using Gaussian distribution with mu=0 and sigma=1.
import matplotlib.pyplot as plt
import numpy as np
# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)
I need to fit the model using horizontal line and display it. But I don't know how to do next.
Could you help me figure out this problem? I'd appreciate about it.

Assuming that you want to use the least squares loss function, by definition you are trying to find the value of yhat minimizing np.sum((y-yhat)**2). Differentiating by yhat, you'll find that the minimum is achieved at yhat = np.sum(y)/N, which is of course nothing but y.mean(), as also already pointed out by #ImportanceOfBeingErnest in the comments.
plt.scatter(X, y)
plt.plot(X, np.zeros(N) + np.mean(y))

From what I understand you're generating a noisy Sine wave and trying to fit a horizontal line?
import os
import fnmatch
import numpy as np
import matplotlib.pyplot as plt
# generate N random points
N=60
X= np.linspace(0.0,2*np.pi, num=N)
noise = 0.1 * np.random.randn(N)
y= np.sin(4*X) + noise
numer = sum([xi*yi for xi,yi in zip(X, y)]) - N * np.mean(X) * np.mean(y)
denum = sum([xi**2 for xi in X]) - N * np.mean(X)**2
b = numer / denum
A = np.mean(y) - b * np.mean(X)
y_ = b * X+ A
plt.plot(X,y)
plt.plot(X,y_)
plt.show()

Related

Find out if point is part of curve (spline, splipy)

I have some coordinates of a 3D point curve through which I lay a spline like so:
from splipy import curve_factory
pts = [...] #3D coordinate points
curve = curve_factory.curve(pts)
I know that I can get a point in 3D along the curve by evaluating it after a certain length:
point_on_curve = curve.evaluate(t)
print(point_on_curve) #outputs coordinates: (x y z)
Is it however somehow possible to do it the other way round? Is there a function/method that can tell me if a certain point is part of the curve? Or if its almost part of the curve? Something like:
curve.func(point) #output: True
or
curve.func(point) #output: distance to curve 0.0001 --> also part of curve
Thanks!
I've found this script by ventusff that performs an optimization to find the value of the parameter that you call t (in the script is u) which gives the point on the spline closest to the external point.
I report below the code with some changes to make it clearer for you. I've defined a tolerance equal to 0.001.
The selection of the optimization solver and of its parameter values requires a little bit of study. I do not have enough time now for doing that, but you can try to experiment a little bit.
In this case SciPy is used for spline generation and evaluation, but you can easily replace it with splipy. The optimization is the interesting part performed using SciPy.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial.distance import euclidean
from scipy.optimize import fmin_bfgs
points_count = 40
phi = np.linspace(0, 2. * np.pi, points_count)
k = np.linspace(0, 2, points_count)
r = 0.5 + np.cos(phi)
x, y, z = r * np.cos(phi), r * np.sin(phi), k
tck, u = splprep([x, y, z], s=1)
points = splev(u, tck)
idx = np.random.randint(low=0, high=40)
noise = np.random.normal(scale=0.01)
external_point = np.array([points[0][idx], points[1][idx], points[2][idx]]) + noise
def distance_to_point(u_):
s = splev(u_, tck)
return euclidean(external_point, [s[0][0], s[1][0], s[2][0]])
closest_u = fmin_bfgs(distance_to_point, x0=np.array([0.0]), gtol=1e-8)
closest_point = splev(closest_u, tck)
tol = 1e-3
if euclidean(external_point, [closest_point[0][0], closest_point[1][0], closest_point[2][0]]) < tol:
print("The point is very close to the spline.")
ax = plt.figure().add_subplot(projection='3d')
ax.plot(points[0], points[1], points[2], "r-", label="Spline")
ax.plot(external_point[0], external_point[1], external_point[2], "bo", label="External Point")
ax.plot(closest_point[0], closest_point[1], closest_point[2], "go", label="Closest Point")
plt.legend()
plt.show()
The script draws the plot below:
and prints the following output:
Current function value: 0.000941
Iterations: 5
Function evaluations: 75
Gradient evaluations: 32
The point is very close to the spline.

Converting Normal Distribution to Lognormal distribution

I have been following lectures of MIT open course on Application of Mathematics in Finance. I am trying to code out the concepts for better understanding.
According to lectures(from what I understand), if random variable X is normally distributed then exp(X) is log-normally distributed and vice versa (please correct me if I am wrong here).
Here is what I tried:
I have list of integers that are normally distributed:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
X = np.array(l)
mu = np.mean(X)
sigma = np.std(X)
count, bins, ignored = plt.hist(X, 35, density=True)
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2)
),linewidth=2, color='r')
plt.show()
Output:
Normally distributed curve
Now I want to get log-normal distribution from above data, here is what I have tried:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
X = np.array(l)
ln = []
for x in X:
val = np.e**x
ln.append(val)
X_ln = np.array(ln)
X_ln = np.array(X_ln) / np.min(X_ln)
mu = np.mean(X_ln)
sigma = np.std(X_ln)
count, bins, ignored = plt.hist(X_ln, 10, density=True)
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, color='r', linewidth=2)
plt.show()
Output :
Not so clean Output
I know there is a better way to do this, but I can't figure out how. Any suggestions would be highly appreciated.
Here are couple of references:
Log normal distribution in Python
MIT lecture notes(topic-1.1)
In case this is relevant, here is a list of elements I am trying to process:
List of elements
Update 1:
I have normalized X before adding values to ln. This fixed the distribution of histogram, however, I can't seem to fix to get red line to show log-normal distribution. Also the new histogram distribution is not very different from normal distribution. I can't think of any suitable reason for that.
This is the block of code I have added:
def normalize(v):
norm=np.linalg.norm(v, ord=1)
if norm==0:
norm=np.finfo(v.dtype).eps
return v/norm
X = np.array(l)
X = normalize(X)
New Output:
Slightly better result

Is there any equivalent of hyperopts lognormal in Optuna?

I am trying to use Optuna for hyperparameter tuning of my model.
I am stuck in a place where I want to define a search space having lognormal/normal distribution. It is possible in hyperopt using hp.lognormal. Is it possible to define such a space using a combination of the existing suggest_ api of Optuna?
You could perhaps make use of inverse transforms from suggest_float(..., 0, 1) (i.e. U(0, 1)) since Optuna currently doesn't provide suggest_ variants for those two distributions directly. This example might be a starting point https://gist.github.com/hvy/4ef02ee2945fe50718c71953e1d6381d
Please find the code below
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from scipy.special import erfcinv
import optuna
def objective(trial):
# Suggest from U(0, 1) with Optuna.
x = trial.suggest_float("x", 0, 1)
# Inverse transform into normal.
y0 = norm.ppf(x, loc=0, scale=1)
# Inverse transform into lognormal.
y1 = np.exp(-np.sqrt(2) * erfcinv(2 * x))
return y0, y1
if __name__ == "__main__":
n_objectives = 2 # Normal and lognormal.
study = optuna.create_study(
sampler=optuna.samplers.RandomSampler(),
# Could be "maximize". Does not matter for this demonstration.
directions=["minimize"] * n_objectives,
)
study.optimize(objective, n_trials=10000)
fig, axs = plt.subplots(n_objectives)
for i in range(n_objectives):
axs[i].hist(list(t.values[i] for t in study.trials), bins=100)
plt.show()

Find all positive-going zero-crossings in a large quasi-periodic array

I need to find zero-crossings in a 1D array of a roughly periodic function. It will be the points where an orbiting satellite crosses the Earth's equator going north.
I've worked out a simple solution based on finding points where one value is zero or negative and the next is positive, then using a quadratic or cubic interpolator with scipy.optimize.brentq to find the nearby zeros.
The interpolator does not go beyond cubic, and before I learn to use a better interpolator I'd first like to check if there already exists a fast method in numpy or scipy to find all of the zero crossings in a large array (n = 1E+06 to 1E+09).
Question: So I'm asking does there already exist a faster method in numpy or scipy to find all of the zero crossings in a large array (n = 1E+06 to 1E+09) than the way I've done it here?
The plot shows the errors between the interpolated zeros and the actual value of the function, the smaller line is the cubic interpolation, the larger is quadratic.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.optimize import brentq
def f(x):
return np.sin(x + np.sin(x*e)/e) # roughly periodic function
halfpi, pi, twopi = [f*np.pi for f in (0.5, 1, 2)]
e = np.exp(1)
x = np.arange(0, 10000, 0.1)
y = np.sin(x + np.sin(x*e)/e)
crossings = np.where((y[1:] > 0) * (y[:-1] <= 0))[0]
Qd = interp1d(x, y, kind='quadratic', assume_sorted=True)
Cu = interp1d(x, y, kind='cubic', assume_sorted=True)
x0sQd = [brentq(Qd, x[i-1], x[i+1]) for i in crossings[1:-1]]
x0sCu = [brentq(Cu, x[i-1], x[i+1]) for i in crossings[1:-1]]
y0sQd = [f(x0) for x0 in x0sQd]
y0sCu = [f(x0) for x0 in x0sCu]
if True:
plt.figure()
plt.plot(x0sQd, y0sQd)
plt.plot(x0sCu, y0sCu)
plt.show()

Polar plot in Matplotlib by mapping into Cartesian coordinate

I have a variable (P) which is a function of angle (theta):
In this equation the K is a constant, theta_p is equal to zero and I is the modified Bessel function of the first kind (order 0) which is defined as:
Now, I want to plot the P versus theta for different values of constant K. First I calculated the parameter I and then plug it into the first equation to calculate P for different angles theta. I mapped it into a Cartesian coordinate by putting :
x = P*cos(theta)
y = P*sin(theta)
Here is my python implementation using matplotlib and scipy when the constant k=2.0:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
def integrand(x, a, k):
return a*np.exp(k*np.cos(x))
theta = (np.arange(0, 362, 2))
theta_p = 0.0
X = []
Y = []
for i in range(len(theta)):
a = (1 / np.pi)
k = 2.0
Bessel = quad(integrand, 0, np.pi, args=(a, k))
I = list(Bessel)[0]
P = (1 / (np.pi * I)) * np.exp(k * np.cos(2 * (theta[i]*np.pi/180. - theta_p)))
x = P*np.cos(theta[i]*np.pi/180.)
y = P*np.sin(theta[i]*np.pi/180.)
X.append(x)
Y.append(y)
plt.plot(X,Y, linestyle='-', linewidth=3, color='red')
axes = plt.gca()
plt.show()
I should get a set of graphs like the below figure for different K values:
(Note that the distributions were plotted on a circle of unit 1 to ease visualization)
However it seems like the graphs produced by the above code are not similar to the above figure.
Any idea what is the issue with the above implementation?
Thanks in advance for your help.
Here is how it looks like (for k=2):
The reference for these formulas are the equation 5 and 6 that you could find here
You had a mistake in your formula.
Your formula gives the delta of your function above a unit circle. So in your function to get the plot you want, simply add 1 to it.
Here is what you want, with some tidied up python. ...note you can do the whole calculation of the 'P' values as a numpy vector line, you don't need to loop over the indicies. ...also you can just do a polar plot directly in matplotlib - you don't need to transform it into cartesian.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
theta = np.arange(0, 2*np.pi+0.1, 2*np.pi/100)
def integrand(x, a, k):
return a*np.exp(k*np.cos(x))
for k in np.arange(0, 5, 0.5):
a = (1 / np.pi)
Bessel = quad(integrand, 0, np.pi, args=(a, k))
I = Bessel[0]
P = 1 + (1/(np.pi * I)) * np.exp(k * np.cos(2 * theta))
plt.polar(theta, P)
plt.show()

Resources