Is there any equivalent of hyperopt's lognormal in Optuna? - python-3.x

I am trying to use Optuna for hyperparameter tuning of my model.
I am stuck at a point where I want to define a search space with a lognormal/normal distribution. This is possible in hyperopt using hp.lognormal. Is it possible to define such a space using a combination of Optuna's existing suggest_ APIs?

You could perhaps make use of inverse transforms from suggest_float(..., 0, 1) (i.e. U(0, 1)), since Optuna currently doesn't provide suggest_ variants for those two distributions directly. This example might be a starting point: https://gist.github.com/hvy/4ef02ee2945fe50718c71953e1d6381d
Please find the code below.
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import erfcinv
from scipy.stats import norm

import optuna

def objective(trial):
    # Suggest from U(0, 1) with Optuna.
    x = trial.suggest_float("x", 0, 1)
    # Inverse transform into normal.
    y0 = norm.ppf(x, loc=0, scale=1)
    # Inverse transform into lognormal.
    y1 = np.exp(-np.sqrt(2) * erfcinv(2 * x))
    return y0, y1

if __name__ == "__main__":
    n_objectives = 2  # Normal and lognormal.
    study = optuna.create_study(
        sampler=optuna.samplers.RandomSampler(),
        # Could be "maximize". Does not matter for this demonstration.
        directions=["minimize"] * n_objectives,
    )
    study.optimize(objective, n_trials=10000)
    fig, axs = plt.subplots(n_objectives)
    for i in range(n_objectives):
        axs[i].hist([t.values[i] for t in study.trials], bins=100)
    plt.show()
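If you want to reuse this pattern inside an objective, the inverse transform can also be wrapped in a small helper built on scipy's lognormal distribution. This is a sketch of mine, not part of the original answer; the name suggest_lognormal is hypothetical:

import numpy as np
from scipy.stats import lognorm

def suggest_lognormal(trial, name, mu, sigma):
    # Draw u ~ U(0, 1) with Optuna, then map it through the lognormal
    # inverse CDF (percent-point function) to obtain a LogNormal(mu, sigma) value.
    u = trial.suggest_float(name, 0, 1)
    return lognorm.ppf(u, s=sigma, scale=np.exp(mu))

Note that the sampler still models the uniform variable u, not the transformed value; for RandomSampler this makes no difference.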

Related

Converting a normal distribution to a lognormal distribution

I have been following the lectures of an MIT open course on the Application of Mathematics in Finance. I am trying to code out the concepts for better understanding.
According to the lectures (from what I understand), if a random variable X is normally distributed, then exp(X) is log-normally distributed and vice versa (please correct me if I am wrong here).
Here is what I tried:
I have a list of integers that are normally distributed:
import numpy as np
import matplotlib.pyplot as plt

X = np.array(l)  # l is the list of values linked below
mu = np.mean(X)
sigma = np.std(X)

count, bins, ignored = plt.hist(X, 35, density=True)
plt.plot(bins,
         1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)),
         linewidth=2, color='r')
plt.show()
Output: [image: normally distributed curve]
Now I want to get log-normal distribution from above data, here is what I have tried:
import numpy as np
import matplotlib.pyplot as plt

X = np.array(l)
ln = []
for x in X:
    val = np.e**x
    ln.append(val)
X_ln = np.array(ln)
X_ln = X_ln / np.min(X_ln)

mu = np.mean(X_ln)
sigma = np.std(X_ln)

count, bins, ignored = plt.hist(X_ln, 10, density=True)
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
       / (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, color='r', linewidth=2)
plt.show()
Output: [image: not-so-clean result]
I know there is a better way to do this, but I can't figure out how. Any suggestions would be highly appreciated.
Here are a couple of references:
Log normal distribution in Python
MIT lecture notes (topic 1.1)
In case it is relevant, here is the list of elements I am trying to process:
List of elements
Update 1:
I have normalized X before adding values to ln. This fixed the distribution in the histogram; however, I can't seem to get the red line to show the log-normal distribution. Also, the new histogram distribution is not very different from the normal distribution, and I can't think of a suitable reason for that.
This is the block of code I have added:
def normalize(v):
    norm = np.linalg.norm(v, ord=1)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm

X = np.array(l)
X = normalize(X)
New output: [image: slightly better result]
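For reference, a minimal sketch of the usual fix (my addition, not part of the original post): the mu and sigma in the lognormal density are the mean and standard deviation of the logged data, so they should be estimated from np.log(X_ln) rather than from X_ln itself:

log_data = np.log(X_ln)
mu = np.mean(log_data)    # mean of the underlying normal
sigma = np.std(log_data)  # std of the underlying normal

x = np.linspace(np.min(X_ln), np.max(X_ln), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
       / (x * sigma * np.sqrt(2 * np.pi)))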

What is the best way to find a function to closely approximate this data?

I am working with Python and linear regression, but I can't seem to find a way to generate an accurate function. The following graph was generated from a 1000-element list of values.
I have tried scikit-learn, but I can't get it to actually learn and improve the estimate.
Ideally, the function will closely mirror the graph. The graph itself is blatantly sinusoidal, so I imagine this might be straightforward.
Here is an example using RandomForestRegressor. It's based on a tutorial I did to learn, so the intellectual property might belong to somebody else. If anybody knows the proper reference, please comment/edit!
I think this fits your data; however, I'd like to add that this creates/trains a random forest model, not a function in the sense of a physical description of the process that generates the data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

def model(x, sigma=0.3):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))
    return slow_oscillation + fast_oscillation + noise

y = model(x)

forest = RandomForestRegressor(200)
forest.fit(x[:, None], y)

xfit = np.linspace(0, 10, 1000)
yfit = forest.predict(xfit[:, None])
ytrue = model(xfit, sigma=0)

plt.errorbar(x, y, 0.3, fmt='o', alpha=0.6)
plt.plot(xfit, yfit, '-r')
plt.plot(xfit, ytrue, '-k', alpha=0.5)
plt.show()
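Since the question asks for an actual function and the data are sinusoidal, a closed-form fit is another option. The following is a sketch of mine, not part of the original answer; the sinusoid model and the initial guess p0 are assumptions to adapt to your data, and it reuses x, y, and xfit from the block above:

from scipy.optimize import curve_fit

def sinusoid(x, a, w, phi, c):
    # Candidate closed-form model: one sine wave plus a constant offset.
    return a * np.sin(w * x + phi) + c

# p0 is a rough initial guess; curve_fit is sensitive to the starting
# frequency w for oscillatory data.
params, cov = curve_fit(sinusoid, x, y, p0=[1.0, 5.0, 0.0, 0.0])
yfit_closed = sinusoid(xfit, *params)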

How do I calculate the global efficiency of a graph in igraph (python)?

I am trying to calculate the global efficiency of a graph in igraph, but I am not sure if I am using the module correctly. I think there is a solution that might make a bit of sense, but it is in R, and I wasn't able to decipher what it was saying.
I have tried writing the code in a networkx fashion, trying to emulate the way networkx calculates global efficiency, but I have been unsuccessful so far. I am using igraph because I am dealing with large graphs. Any help would be really appreciated :D
This is what I have tried:
import igraph
import pandas as pd
import numpy as np
from itertools import permutations

datasafe = pd.read_csv("b1.csv", index_col=0)
D = datasafe.values
g = igraph.Graph.Adjacency((D > 0).tolist())
g.es['weight'] = D[D.nonzero()]

def efficiency_weighted(g):
    weights = g.es["weight"][:]
    eff = 1.0 / np.array(g.shortest_paths_dijkstra(weights=weights))
    return eff

def global_efficiency_weighted(g):
    n = 180.0
    denom = n * (n - 1)
    g_eff = sum(efficiency_weighted(g) for u, v in permutations(g, 2))
    return g_eff

global_efficiency_weighted(g)

The error message I am getting says: TypeError: 'Graph' object is not iterable
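For context (my note, not part of the question or answer): the TypeError occurs because permutations(g, 2) tries to iterate over the Graph object itself, and igraph Graphs are not iterable. Iterating over vertex indices instead avoids it, for example:

from itertools import permutations
pairs = permutations(range(g.vcount()), 2)  # ordered pairs of vertex indices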
Assuming that you want the nodal efficiency for all nodes, you can do this:
import numpy as np
from igraph import *

np.seterr(divide='ignore')

# Example using a random graph with 20 nodes
g = Graph.Erdos_Renyi(20, 0.5)

# Assign weights to the edges. Here: 1s everywhere
g.es["weight"] = np.ones(g.ecount())

def nodal_eff(g):
    weights = g.es["weight"][:]
    sp = 1.0 / np.array(g.shortest_paths_dijkstra(weights=weights))
    np.fill_diagonal(sp, 0)
    N = sp.shape[0]
    ne = (1.0 / (N - 1)) * np.apply_along_axis(sum, 0, sp)
    return ne

eff = nodal_eff(g)
print(eff)
# [0.68421053 0.81578947 0.73684211 0.76315789 0.76315789 0.71052632
#  0.81578947 0.81578947 0.81578947 0.73684211 0.71052632 0.68421053
#  0.71052632 0.81578947 0.84210526 0.76315789 0.68421053 0.68421053
#  0.78947368 0.76315789]
To get the global efficiency, just take the mean of the nodal efficiencies:

np.mean(eff)
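Equivalently, the global efficiency is the average of 1/d(i, j) over all ordered pairs of distinct nodes, and can be computed in one step. This is a sketch of mine, not part of the original answer, reusing the same inverse-shortest-path matrix:

def global_eff(g):
    # Average of 1 / d(i, j) over all ordered pairs i != j.
    sp = 1.0 / np.array(g.shortest_paths_dijkstra(weights=g.es["weight"]))
    np.fill_diagonal(sp, 0)
    N = sp.shape[0]
    return sp.sum() / (N * (N - 1))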

Solving the Lorenz model using Runge-Kutta 4th order in Python without a package

I wish to solve the Lorenz model in Python without the help of a package, and my code does not behave as I expect. I do not know why I am not getting the expected results and the Lorenz attractor. The main problem, I guess, is related to how to store the successive values of the solutions for x, y, and z. Below is my code for the Runge-Kutta 45 method applied to the Lorenz model, with a 3D plot of the solutions:
import numpy as np
import matplotlib.pyplot as plt
#from scipy.integrate import odeint

#a) Defining the Runge-Kutta45 method
def fx(x, y, z, t):
    dxdt = sigma*(y-z)
    return dxdt

def fy(x, y, z, t):
    dydt = x*(rho-z)-y
    return dydt

def fz(x, y, z, t):
    dzdt = x*y-beta*z
    return dzdt

def RungeKutta45(x, y, z, fx, fy, fz, t, h):
    k1x, k1y, k1z = h*fx(x,y,z,t), h*fy(x,y,z,t), h*fz(x,y,z,t)
    k2x, k2y, k2z = (h*fx(x+k1x/2, y+k1y/2, z+k1z/2, t+h/2),
                     h*fy(x+k1x/2, y+k1y/2, z+k1z/2, t+h/2),
                     h*fz(x+k1x/2, y+k1y/2, z+k1z/2, t+h/2))
    k3x, k3y, k3z = (h*fx(x+k2x/2, y+k2y/2, z+k2z/2, t+h/2),
                     h*fy(x+k2x/2, y+k2y/2, z+k2z/2, t+h/2),
                     h*fz(x+k2x/2, y+k2y/2, z+k2z/2, t+h/2))
    k4x, k4y, k4z = (h*fx(x+k3x, y+k3y, z+k3z, t+h),
                     h*fy(x+k3x, y+k3y, z+k3z, t+h),
                     h*fz(x+k3x, y+k3y, z+k3z, t+h))
    return (x+(k1x+2*k2x+2*k3x+k4x)/6,
            y+(k1y+2*k2y+2*k3y+k4y)/6,
            z+(k1z+2*k2z+2*k3z+k4z)/6)

sigma = 10.
beta = 8./3.
rho = 28.
tIn = 0.
tFin = 10.
h = 0.05
totalSteps = int(np.floor((tFin-tIn)/h))

t = np.zeros(totalSteps)
x = np.zeros(totalSteps)
y = np.zeros(totalSteps)
z = np.zeros(totalSteps)

for i in range(1, totalSteps):
    x[i-1] = 1.  #Initial condition
    y[i-1] = 1.  #Initial condition
    z[i-1] = 1.  #Initial condition
    t[0] = 0.    #Starting value of t
    t[i] = t[i-1] + h
    x, y, z = RungeKutta45(x, y, z, fx, fy, fz, t[i-1], h)

#Plotting solution
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot(x, y, z, 'r', label='Lorenz 3D Solution')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.legend()
I changed the integration step (btw., this is the classical 4th-order Runge-Kutta method, not any adaptive RK45) to use Python's core concept of lists and list operations extensively, to reduce the number of places where the computation is defined. There were no errors there to correct, but I think the algorithm is now more concentrated.
You had an error in the system that turned it into one that rapidly diverges: you had fx = sigma*(y-z), while the Lorenz system has fx = sigma*(y-x).
Next, your main loop has some strange assignments: in every iteration you first set the previous coordinates to the initial conditions and then replace the full arrays with one RK step applied to the full arrays. I replaced that completely; there was no small set of changes that would have made it correct.
import numpy as np
import matplotlib.pyplot as plt
#from scipy.integrate import odeint

def fx(x, y, z, t): return sigma*(y-x)
def fy(x, y, z, t): return x*(rho-z)-y
def fz(x, y, z, t): return x*y-beta*z

#a) Defining the classical Runge-Kutta 4th order method
def RungeKutta4(x, y, z, fx, fy, fz, t, h):
    k1x, k1y, k1z = ( h*f(x, y, z, t) for f in (fx, fy, fz) )
    xs, ys, zs, ts = ( r+0.5*kr for r, kr in zip((x, y, z, t), (k1x, k1y, k1z, h)) )
    k2x, k2y, k2z = ( h*f(xs, ys, zs, ts) for f in (fx, fy, fz) )
    xs, ys, zs, ts = ( r+0.5*kr for r, kr in zip((x, y, z, t), (k2x, k2y, k2z, h)) )
    k3x, k3y, k3z = ( h*f(xs, ys, zs, ts) for f in (fx, fy, fz) )
    xs, ys, zs, ts = ( r+kr for r, kr in zip((x, y, z, t), (k3x, k3y, k3z, h)) )
    k4x, k4y, k4z = ( h*f(xs, ys, zs, ts) for f in (fx, fy, fz) )
    return (r+(k1r+2*k2r+2*k3r+k4r)/6 for r, k1r, k2r, k3r, k4r in
            zip((x, y, z), (k1x, k1y, k1z), (k2x, k2y, k2z), (k3x, k3y, k3z), (k4x, k4y, k4z)))

sigma = 10.
beta = 8./3.
rho = 28.
tIn = 0.
tFin = 10.
h = 0.01
totalSteps = int(np.floor((tFin-tIn)/h))

t = totalSteps * [0.0]
x = totalSteps * [0.0]
y = totalSteps * [0.0]
z = totalSteps * [0.0]

x[0], y[0], z[0], t[0] = 1., 1., 1., 0.  #Initial condition

for i in range(1, totalSteps):
    x[i], y[i], z[i] = RungeKutta4(x[i-1], y[i-1], z[i-1], fx, fy, fz, t[i-1], h)
    t[i] = t[i-1] + h  # keep the time grid consistent (unused by the autonomous system)
Using tFin = 40 and h = 0.01, I get an image looking like the typical picture of the Lorenz attractor.
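As a cross-check (my addition, not part of the original answer), the same trajectory can be reproduced with scipy's solve_ivp, assuming scipy is available:

from scipy.integrate import solve_ivp

def lorenz(t, u, sigma=10., rho=28., beta=8./3.):
    x, y, z = u
    return [sigma*(y - x), x*(rho - z) - y, x*y - beta*z]

sol = solve_ivp(lorenz, (0., 40.), [1., 1., 1.], rtol=1e-8, atol=1e-8)
# sol.y[0], sol.y[1], sol.y[2] hold the x, y, z trajectories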

Using a horizontal line to fit the model

I am writing Python code that uses a horizontal line to investigate under-fitting of the function sin(2*pi*x) on the range [0, 1].
I first generate N data points by adding random noise from a Gaussian distribution with mu=0 and sigma=1.
import matplotlib.pyplot as plt
import numpy as np

# generate N random points
N = 30
X = np.random.rand(N, 1)
y = np.sin(np.pi * 2 * X) + np.random.randn(N, 1)
I need to fit the model using a horizontal line and display it, but I don't know what to do next.
Could you help me figure out this problem? I'd appreciate it.
Assuming that you want to use the least-squares loss function, by definition you are trying to find the value of yhat minimizing np.sum((y - yhat)**2). Differentiating with respect to yhat and setting the derivative -2 * np.sum(y - yhat) to zero, you'll find that the minimum is achieved at yhat = np.sum(y)/N, which is of course nothing but y.mean(), as also already pointed out by @ImportanceOfBeingErnest in the comments.
plt.scatter(X, y)
plt.plot(X, np.zeros(N) + np.mean(y))
plt.show()
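As a quick numerical sanity check (a sketch of mine, not part of the answer), you can confirm that y.mean() minimizes the squared loss over a grid of candidate constants:

yhat_grid = np.linspace(y.min(), y.max(), 1001)
losses = [np.sum((y - c)**2) for c in yhat_grid]
print(yhat_grid[np.argmin(losses)], y.mean())  # the two values should agree closely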
From what I understand, you're generating a noisy sine wave and trying to fit a horizontal line? Note that the code below actually fits a general straight line y_ = b*X + A by least squares; for a horizontal line, b would be 0.
import numpy as np
import matplotlib.pyplot as plt

# generate N random points
N = 60
X = np.linspace(0.0, 2*np.pi, num=N)
noise = 0.1 * np.random.randn(N)
y = np.sin(4*X) + noise

# least-squares slope and intercept for the line y_ = b*X + A
numer = sum([xi*yi for xi, yi in zip(X, y)]) - N * np.mean(X) * np.mean(y)
denum = sum([xi**2 for xi in X]) - N * np.mean(X)**2
b = numer / denum
A = np.mean(y) - b * np.mean(X)
y_ = b * X + A

plt.plot(X, y)
plt.plot(X, y_)
plt.show()
