Converting Normal Distribution to Lognormal distribution - python-3.x

I have been following the lectures of the MIT open course on Application of Mathematics in Finance. I am trying to code out the concepts for better understanding.
According to the lectures (from what I understand), if a random variable X is normally distributed then exp(X) is log-normally distributed, and vice versa (please correct me if I am wrong here).
Here is what I tried:
I have a list of integers that are normally distributed:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
X = np.array(l)
mu = np.mean(X)
sigma = np.std(X)
count, bins, ignored = plt.hist(X, 35, density=True)
plt.plot(bins, 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)),
         linewidth=2, color='r')
plt.show()
Output:
Normally distributed curve
Now I want to get log-normal distribution from above data, here is what I have tried:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
X = np.array(l)
ln = []
for x in X:
    val = np.e**x
    ln.append(val)
X_ln = np.array(ln)
X_ln = np.array(X_ln) / np.min(X_ln)
mu = np.mean(X_ln)
sigma = np.std(X_ln)
count, bins, ignored = plt.hist(X_ln, 10, density=True)
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, color='r', linewidth=2)
plt.show()
Output :
Not so clean Output
I know there is a better way to do this, but I can't figure out how. Any suggestions would be highly appreciated.
Here are couple of references:
Log normal distribution in Python
MIT lecture notes(topic-1.1)
In case this is relevant, here is a list of elements I am trying to process:
List of elements
Update 1:
I have normalized X before adding values to ln. This fixed the shape of the histogram; however, I still can't get the red line to show the log-normal distribution. Also, the new histogram is not very different from the normal distribution, and I can't think of a suitable reason for that.
This is the block of code I have added:
def normalize(v):
    norm = np.linalg.norm(v, ord=1)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm
X = np.array(l)
X = normalize(X)
New Output:
Slightly better result
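One possible reason the red curve does not line up, offered as a sketch rather than a definitive answer: the log-normal density is parameterised by the mean and standard deviation of log(X_ln), not of X_ln itself, so mu and sigma would need to come from the log of the transformed data (which is just X again). The array below is a hypothetical stand-in for the list l, and lognormal_pdf is a helper name introduced here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the question's list `l` (normally distributed samples).
X = rng.normal(loc=0.5, scale=0.4, size=5000)

# exp of a normal variable is log-normal.
X_ln = np.exp(X)

# The log-normal parameters are the mean and std of log(X_ln), i.e. of X.
mu = np.mean(np.log(X_ln))
sigma = np.std(np.log(X_ln))

def lognormal_pdf(x, mu, sigma):
    """Log-normal density with log-space mean mu and log-space std sigma."""
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))

# Evaluate the density over the observed range; this is the curve to draw
# on top of plt.hist(X_ln, bins, density=True).
x = np.linspace(X_ln.min(), X_ln.max(), 10000)
pdf = lognormal_pdf(x, mu, sigma)
```

Dividing X_ln by its minimum, as in the question, only shifts log(X_ln) by a constant, so the same recipe should still apply as long as mu is computed after that division. With large integer inputs, exp can overflow or produce a very long tail, which may be why the raw histogram looked odd.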

Related

Why does matplotlib magnitude_spectrum function seem to show wrong magnitudes?

I created a 1-second-long audio sample consisting of two sine waves and then used matplotlib's magnitude_spectrum function to plot the spectrum, and the results seem to be wrong. The two waves have exactly the same amplitude throughout the one-second audio sample, and yet the magnitudes are vastly different. This seemed weird to me, so I also used NumPy's functions to plot the DFT, and there the magnitudes are exactly the same, as I think they should be. The resulting plots are shown in the image below. Does anyone know why that might be? Did I do anything wrong in my code? Any help will be much appreciated.
Minimal working example:
import matplotlib.pyplot as plt
import numpy as np
sr = 20000
freq1 = 200
freq2 = 100
duration = 1
x = np.linspace(0, duration, sr * duration)
y = np.concatenate([0.5*np.sin(freq1 * 2 * np.pi * x[:10000]) + 0.5*np.sin(freq2 * 2 * np.pi * x[:10000]), np.sin(freq1 * 2 * np.pi * x[10000:15000]), np.sin(freq2 * 2 * np.pi * x[15000:20000])])
fig, ax = plt.subplots(3, 1, figsize=(12, 10))
ax[0].plot(x, y)
ax[0].axis(xmin=0, xmax=1)
ax[0].set_xlabel('Time [s]')
ax[0].set_ylabel('Amplitude [-]')
ax[1].magnitude_spectrum(y, Fs=sr, color='C1')
ax[1].axis(xmin=0, xmax=500)
ax[1].set_xlabel('Frequency [Hz]')
ax[1].set_ylabel('Magnitude [-]')
ax[2].plot(np.fft.rfftfreq(sr, d=1/sr), np.abs(np.fft.rfft(y, norm='ortho'))/100)
ax[2].axis(xmin=0, xmax=500)
ax[2].set_xlabel('Frequency [Hz]')
ax[2].set_ylabel('Magnitude [-]')
plt.tight_layout()
plt.show()
I think it is related to the window used by matplotlib. By default it applies a Hanning window, so change the window type to window_none. The scaling is also done differently in the two cases. With the following changes, you will see that both match.
from matplotlib import mlab
ax[1].magnitude_spectrum(y, Fs=sr, color='C1', window=mlab.window_none)
ax[2].plot(np.fft.rfftfreq(sr, d=1/sr), np.abs(np.fft.rfft(y))/sr)
The result (figure not shown) is two matching spectra.
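To see the window effect without matplotlib, here is a NumPy-only sketch. It assumes a time axis that excludes the endpoint (the question used linspace, which includes it) and a helper called magnitude that divides by the window sum; that normalisation is my assumption about how to mimic matplotlib's scaling, not a quote from its source.

```python
import numpy as np

sr = 20000
# Assumption: exactly sr samples per second with the endpoint excluded,
# so both tones fall on exact FFT bins.
t = np.arange(sr) / sr
f1, f2 = 200, 100

y = np.concatenate([
    0.5 * np.sin(2 * np.pi * f1 * t[:10000]) + 0.5 * np.sin(2 * np.pi * f2 * t[:10000]),
    np.sin(2 * np.pi * f1 * t[10000:15000]),
    np.sin(2 * np.pi * f2 * t[15000:]),
])

def magnitude(signal, window):
    """One-sided magnitude spectrum, normalised by the window sum (assumed scaling)."""
    return np.abs(np.fft.rfft(signal * window)) / window.sum()

freqs = np.fft.rfftfreq(sr, d=1 / sr)      # 1 Hz per bin here
mag_rect = magnitude(y, np.ones(sr))       # no window
mag_hann = magnitude(y, np.hanning(sr))    # Hann window

print("no window:", mag_rect[f2], mag_rect[f1])
print("Hann     :", mag_hann[f2], mag_hann[f1])
```

The Hann window weights the middle of the recording more than its ends. The 200 Hz tone is at full amplitude between 0.5 s and 0.75 s, closer to the centre than the 100 Hz tone at the very end, so the windowed spectrum favours 200 Hz even though both tones carry the same energy overall.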

Polar plot in Matplotlib by mapping into Cartesian coordinate

I have a variable (P) which is a function of angle (theta):
P(theta) = exp(K*cos(2*(theta - theta_p))) / (pi * I)
In this equation K is a constant, theta_p is equal to zero, and I is the modified Bessel function of the first kind (order 0), which is defined as:
I = (1/pi) * integral from 0 to pi of exp(K*cos(x)) dx
Now I want to plot P versus theta for different values of the constant K. I first calculate I and then plug it into the first equation to calculate P for different angles theta. I map the result into Cartesian coordinates by setting:
x = P*cos(theta)
y = P*sin(theta)
Here is my python implementation using matplotlib and scipy when the constant k=2.0:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
def integrand(x, a, k):
    return a*np.exp(k*np.cos(x))
theta = (np.arange(0, 362, 2))
theta_p = 0.0
X = []
Y = []
for i in range(len(theta)):
    a = (1 / np.pi)
    k = 2.0
    Bessel = quad(integrand, 0, np.pi, args=(a, k))
    I = list(Bessel)[0]
    P = (1 / (np.pi * I)) * np.exp(k * np.cos(2 * (theta[i]*np.pi/180. - theta_p)))
    x = P*np.cos(theta[i]*np.pi/180.)
    y = P*np.sin(theta[i]*np.pi/180.)
    X.append(x)
    Y.append(y)
plt.plot(X,Y, linestyle='-', linewidth=3, color='red')
axes = plt.gca()
plt.show()
I should get a set of graphs like the below figure for different K values:
(Note that the distributions were plotted on a circle of unit 1 to ease visualization)
However it seems like the graphs produced by the above code are not similar to the above figure.
Any idea what is the issue with the above implementation?
Thanks in advance for your help.
Here is what it looks like (for k=2):
The reference for these formulas is equations 5 and 6, which you can find here
You had a mistake in your formula.
Your formula gives the offset of your function above the unit circle, so to get the plot you want, simply add 1 to it.
Here is what you want, with some tidied-up Python. Note that you can calculate all the P values in a single vectorized NumPy expression, without looping over the indices. You can also draw a polar plot directly in matplotlib, without transforming to Cartesian coordinates.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
theta = np.arange(0, 2*np.pi+0.1, 2*np.pi/100)
def integrand(x, a, k):
    return a*np.exp(k*np.cos(x))
for k in np.arange(0, 5, 0.5):
    a = (1 / np.pi)
    Bessel = quad(integrand, 0, np.pi, args=(a, k))
    I = Bessel[0]
    P = 1 + (1/(np.pi * I)) * np.exp(k * np.cos(2 * theta))
    plt.polar(theta, P)
plt.show()
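As a possible simplification, the integral does not have to be evaluated with quad at all: NumPy ships np.i0, the modified Bessel function of the first kind of order 0. The sketch below assumes that substitution; radius_on_unit_circle is a hypothetical helper name, and the Cartesian mapping is only included for anyone who prefers plt.plot over plt.polar.

```python
import numpy as np

def radius_on_unit_circle(theta, k, theta_p=0.0):
    """Return 1 + P(theta), i.e. the distribution drawn on top of a unit circle."""
    P = np.exp(k * np.cos(2 * (theta - theta_p))) / (np.pi * np.i0(k))
    return 1 + P

theta = np.linspace(0, 2 * np.pi, 361)   # radians, one point per degree
k = 2.0
r = radius_on_unit_circle(theta, k)

# Optional Cartesian mapping, equivalent to plotting (theta, r) in polar axes.
x = r * np.cos(theta)
y = r * np.sin(theta)
```

Either plt.polar(theta, r) or plt.plot(x, y) with an equal aspect ratio should then give the same lobed shape for a given k.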

What is the best way to find a function to closely approximate this data?

I am working with Python and linear regression, but can't seem to find a way to generate an accurate function. The following graph was generated from a 1000-element list of values.
I have tried scikit-learn, but I can't get it to actually learn and improve the estimate.
Ideally, the function will closely mirror the graph. The graph itself is blatantly sinusoidal, so I imagine that this might be straightforward.
Here is an example using RandomForestRegressor. It is based on a tutorial I followed while learning, so the intellectual property might belong to somebody else. If anybody knows the proper reference, please comment/edit!
I think this fits your data. However, I'd like to add that this trains a random forest model; it does not produce a function in the sense of a physical description of the process that generates the data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
rng = np.random.RandomState(42)
x = 10 * rng.rand(200)
def model(x, sigma=0.3):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))
    return slow_oscillation + fast_oscillation + noise
y = model(x)
forest = RandomForestRegressor(200)
forest.fit(x[:, None], y)
xfit = np.linspace(0, 10, 1000)
yfit = forest.predict(xfit[:, None])
ytrue = model(xfit, sigma=0)
plt.errorbar(x, y, 0.3, fmt='o', alpha=0.6)
plt.plot(xfit, yfit, '-r')
plt.plot(xfit, ytrue, '-k', alpha=0.5)
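If an explicit formula is what you are after, a parametric sine fit may be closer to the goal than a trained model. The following is a minimal sketch on hypothetical synthetic data (the original 1000-element list is not available): it guesses the frequency from the FFT peak and then solves a linear least-squares problem for a*sin + b*cos + c. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for the 1000-element list: one noisy sinusoid plus an offset.
t = np.arange(1000)
y = 2.0 * np.sin(2 * np.pi * 0.01 * t + 0.7) + 0.5 + 0.1 * rng.standard_normal(t.size)

# Step 1: coarse frequency estimate from the largest FFT peak (mean removed first).
spectrum = np.abs(np.fft.rfft(y - y.mean()))
freqs = np.fft.rfftfreq(t.size, d=1.0)   # cycles per sample
f0 = freqs[np.argmax(spectrum)]

# Step 2: with the frequency fixed, amplitude, phase and offset follow from
# ordinary linear least squares on the basis [sin, cos, 1].
A = np.column_stack([
    np.sin(2 * np.pi * f0 * t),
    np.cos(2 * np.pi * f0 * t),
    np.ones(t.size),
])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c = coef

amplitude = np.hypot(a, b)
phase = np.arctan2(b, a)
y_fit = A @ coef   # fitted curve: amplitude * sin(2*pi*f0*t + phase) + c
```

The FFT estimate is only as fine as the bin spacing, so if the data do not contain a whole number of periods, the frequency would probably need a nonlinear refinement step afterwards (for example with scipy.optimize.curve_fit, using f0 as the initial guess).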

Python - Graphing normal distribution line with list of data

I'm working on an electrical engineering project which requires plotting the normal distribution of a list of data.
We measured the resistance of 30 resistors at random and wrote the values down.
X = [14.95, 14.94, 14.92, 14.98, 16.53, 14.96, 16.20, 14.32, 15.32, 14.25, 15.36, 14.95, 15.13, 14.26, 14.94, 15.6,
15.20, 14.94, 15.02, 15, 14.62, 14.94, 14.94, 14.98, 15.12, 15.06, 14.95, 14.96, 15.13, 15.20]
I want to get a graph like this:
But I get a graph like this one:
I need more values in the graph where the data are close to the mean.
This is the code that I'm using currently:
import numpy as np
from matplotlib import pyplot as plt
import math
X = [14.95, 14.94, 14.92, 14.98, 16.53, 14.96, 16.20, 14.32, 15.32, 14.25, 15.36, 14.95, 15.13, 14.26, 14.94, 15.6,
15.20, 14.94, 15.02, 15, 14.62, 14.94, 14.94, 14.98, 15.12, 15.06, 14.95, 14.96, 15.13, 15.20]
X = np.sort(X)
mean = np.mean(X)
sigma = 0
for i in X:
    sigma += np.square(mean - i)
sigma = np.sqrt(sigma / (len(X) - 1))
def func(x):
    return np.exp(np.square(x - mean) / (2 * np.square(sigma))) / np.sqrt(2 * math.pi * sigma)
Y = []
for i in X:
    Y.append(func(i))
plt.plot(X, Y, marker='o', color='b')
plt.show()
Assuming I understood your question properly, I think you are just trying to add more data points to generate a normal distribution curve.
mu = np.mean(X)
sigma = np.std(X) #You manually calculated it but you can also use this built-in function
data = np.random.normal(mu, sigma, SIZE_OF_DATA_YOU_NEED)
However, if you're just trying to draw the normal distribution curve, you can't simply plot each measured value against its probability density.
Try
count, bins, ignored = plt.hist(data, 30, density=True)
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ),linewidth=2, color='r')
plt.show()
You might want to concatenate X with the new data points too.
Hope this helps. I am also attaching a link to the numpy.random.normal() documentation in case it is useful (https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.random.normal.html).
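An alternative that needs no extra random samples is to evaluate the density on a dense grid around the mean instead of only at the 30 measured values. Note also that the func in the question uses a positive exponent and sqrt(2 * pi * sigma); the normal density has a negative exponent and sigma * sqrt(2 * pi) in the denominator. A minimal sketch, with normal_pdf and grid as illustrative names:

```python
import numpy as np

X = np.array([14.95, 14.94, 14.92, 14.98, 16.53, 14.96, 16.20, 14.32, 15.32, 14.25,
              15.36, 14.95, 15.13, 14.26, 14.94, 15.6, 15.20, 14.94, 15.02, 15,
              14.62, 14.94, 14.94, 14.98, 15.12, 15.06, 14.95, 14.96, 15.13, 15.20])

mu = X.mean()
sigma = X.std(ddof=1)   # sample standard deviation, as in the question's manual loop

def normal_pdf(x, mu, sigma):
    """Normal density: note the minus sign and the sigma * sqrt(2*pi) factor."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Dense, evenly spaced grid covering four standard deviations on each side.
grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
pdf = normal_pdf(grid, mu, sigma)
```

Plotting grid against pdf with plt.plot then gives a smooth bell curve, and the measured points can be overlaid with plt.plot(X, normal_pdf(X, mu, sigma), 'o') if they should remain visible.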

Using horizontal line to fit the model

I am writing Python code that uses a horizontal line to investigate under-fitting of the function sin(2*pi*x) in the range [0, 1].
I first generate N data points by adding random noise drawn from a Gaussian distribution with mu=0 and sigma=1.
import matplotlib.pyplot as plt
import numpy as np
# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)
I need to fit the model using a horizontal line and display it, but I don't know what to do next.
Could you help me figure out this problem? I'd appreciate it.
Assuming that you want to use the least squares loss function, by definition you are trying to find the value of yhat minimizing np.sum((y-yhat)**2). Differentiating with respect to yhat, you'll find that the minimum is achieved at yhat = np.sum(y)/N, which is of course nothing but y.mean(), as already pointed out by @ImportanceOfBeingErnest in the comments.
plt.scatter(X, y)
plt.plot(X, np.zeros(N) + np.mean(y))
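The claim that the mean is the best horizontal line can be checked numerically. Setting the derivative of sum((y_i - c)^2) with respect to c to zero gives -2 * sum(y_i - c) = 0, so c = mean(y). The sketch below compares that closed form with a brute-force scan over candidate heights; sse, candidates and the seed are illustrative choices, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
X = rng.random((N, 1))
y = np.sin(2 * np.pi * X) + rng.standard_normal((N, 1))

def sse(c):
    """Sum of squared errors of the horizontal line y = c."""
    return np.sum((y - c) ** 2)

# Closed-form minimiser of the squared error.
c_best = y.mean()

# Brute-force check: scan candidate heights and keep the one with the lowest loss.
candidates = np.linspace(y.min(), y.max(), 1001)
losses = np.array([sse(c) for c in candidates])
c_grid = candidates[np.argmin(losses)]

print(c_best, c_grid)
```

The two values should agree up to the grid spacing, and the large residual error left by the line is exactly the under-fitting the question sets out to illustrate.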
From what I understand you're generating a noisy Sine wave and trying to fit a horizontal line?
import numpy as np
import matplotlib.pyplot as plt
# generate N random points
N=60
X= np.linspace(0.0,2*np.pi, num=N)
noise = 0.1 * np.random.randn(N)
y= np.sin(4*X) + noise
numer = sum([xi*yi for xi,yi in zip(X, y)]) - N * np.mean(X) * np.mean(y)
denum = sum([xi**2 for xi in X]) - N * np.mean(X)**2
b = numer / denum
A = np.mean(y) - b * np.mean(X)
y_ = b * X+ A
plt.plot(X,y)
plt.plot(X,y_)
plt.show()
