How to get statistics from a histogram? - python-3.x

I'm trying to get the four first order histogram statistics (mean,
variance, skewness and kurtosis) from a histogram.
I have this code that calculates the histogram:
import cv2
from matplotlib import pyplot as plt
img1 = 'img.jpg'
gray_img = cv2.imread(img1, cv2.IMREAD_GRAYSCALE)
plt.hist(gray_img.ravel(),256,[0,256])
plt.title('Histogram for gray scale picture')
plt.show()
How can I get that statistics?

Based on my answer here
def mean_h(val, freq):
return np.average(val, weights = freq)
def var_h(val, freq):
dev = freq * (val - mean_h(val, freq)) ** 2
return dev.sum() / freq.sum()
def moment_h(val, freq, n):
n = (freq * (val - mean_h(val, freq)) ** n).sum() / freq.sum()
d = var_h(val, freq) ** (n / 2)
return n / d
skewness and kurtosis are just the 3rd and 4th moments

If the number of bins is reasonable, you should be able to just count the values manually, put in a vector; and calculate all those moments.

Related

Cost function returns a very large value

I'm trying to do a multi linear regression algorithm . My data has two features (size of the house in the first column, number of bedrooms in the second column) and here is the head of my data set (the third column is the price of the house):
[[2.10400e+03 3.00000e+00 3.99900e+05]
[1.60000e+03 3.00000e+00 3.29900e+05]
[2.40000e+03 3.00000e+00 3.69000e+05]
[1.41600e+03 2.00000e+00 2.32000e+05]
[3.00000e+03 4.00000e+00 5.39900e+05]]
I wrote the following algorithm to compute the cost :
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = np.loadtxt('data2.txt', delimiter=',')
y_data = list()
for i in range(len(data)):
y_data.append(data[i][2])
#Convert our list to numpy arrays
y_data = np.asarray(y_data)
x_data = data[:, 0:2]
#applying feature scaling
for i in range(len(x_data)):
#Size of the house
x_data[i][0] = (x_data[i][0] - np.mean(x_data[:, 0])) / np.std(x_data[:, 0])
#Number of rooms
x_data[i][1] = (x_data[i][1] - np.mean(x_data[:, 1])) / np.std(x_data[:, 1])
#Adding a column of ones to our data
x_data = np.c_[np.ones(len(x_data)), x_data]
def cost(x, y, theta):
m = len(x)
predictions = np.arange(97).reshape(97, 1)
for i in range(len(x)):
predictions[i] = (x[i] * theta).sum()
sqrErrors = (np.subtract(predictions, y)) ** 2
return (1 / ( 2 * m)) * sqrErrors.sum()
theta = [[0],
[0],
[0]
]
The cost functions returns a value of 6361101029137.691 and when i run the gradient descent it gets larger. So what's the problem and how can i fix it please? Thanks

Polar plot in Matplotlib by mapping into Cartesian coordinate

I have a variable (P) which is a function of angle (theta):
In this equation the K is a constant, theta_p is equal to zero and I is the modified Bessel function of the first kind (order 0) which is defined as:
Now, I want to plot the P versus theta for different values of constant K. First I calculated the parameter I and then plug it into the first equation to calculate P for different angles theta. I mapped it into a Cartesian coordinate by putting :
x = P*cos(theta)
y = P*sin(theta)
Here is my python implementation using matplotlib and scipy when the constant k=2.0:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
def integrand(x, a, k):
return a*np.exp(k*np.cos(x))
theta = (np.arange(0, 362, 2))
theta_p = 0.0
X = []
Y = []
for i in range(len(theta)):
a = (1 / np.pi)
k = 2.0
Bessel = quad(integrand, 0, np.pi, args=(a, k))
I = list(Bessel)[0]
P = (1 / (np.pi * I)) * np.exp(k * np.cos(2 * (theta[i]*np.pi/180. - theta_p)))
x = P*np.cos(theta[i]*np.pi/180.)
y = P*np.sin(theta[i]*np.pi/180.)
X.append(x)
Y.append(y)
plt.plot(X,Y, linestyle='-', linewidth=3, color='red')
axes = plt.gca()
plt.show()
I should get a set of graphs like the below figure for different K values:
(Note that the distributions were plotted on a circle of unit 1 to ease visualization)
However it seems like the graphs produced by the above code are not similar to the above figure.
Any idea what is the issue with the above implementation?
Thanks in advance for your help.
Here is how it looks like (for k=2):
The reference for these formulas are the equation 5 and 6 that you could find here
You had a mistake in your formula.
Your formula gives the delta of your function above a unit circle. So in your function to get the plot you want, simply add 1 to it.
Here is what you want, with some tidied up python. ...note you can do the whole calculation of the 'P' values as a numpy vector line, you don't need to loop over the indicies. ...also you can just do a polar plot directly in matplotlib - you don't need to transform it into cartesian.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
theta = np.arange(0, 2*np.pi+0.1, 2*np.pi/100)
def integrand(x, a, k):
return a*np.exp(k*np.cos(x))
for k in np.arange(0, 5, 0.5):
a = (1 / np.pi)
Bessel = quad(integrand, 0, np.pi, args=(a, k))
I = Bessel[0]
P = 1 + (1/(np.pi * I)) * np.exp(k * np.cos(2 * theta))
plt.polar(theta, P)
plt.show()

Python - Find min, max, and inflection points of normal curve

I am trying to find the locations (i.e., the x-value) of minimum, start of season, peak growing season, maximum growth, senescence, end of season, minimum (i.e., inflection points) in a vegetation curve. I am using a normal curve here as an example. I did come across few codes to find the change in slope and 1st/2nd order derivative, but not able to implement them for my case. Please direct me if there is any relevant example and your help is appreciated. Thanks!
## Version 2 code
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# Slice the group in 3
def group_in_threes(slicable):
for i in range(len(slicable)-2):
yield slicable[i:i+3]
# Locate the change in slope
def turns(L):
for index, three in enumerate(group_in_threes(L)):
if (three[0] > three[1] < three[2]) or (three[0] < three[1] > three[2]):
yield index + 1
# 1st inflection point estimation
dy = np.diff(y, n=1) # first derivative
idx_max_dy = np.argmax(dy)
ix = list(turns(dy))
print(ix)
# All inflection point estimation
dy2 = np.diff(dy, n=2) # Second derivative?
idx_max_dy2 = np.argmax(dy2)
ix2 = list(turns(dy2))
print(ix2)
# Graph
plt.plot(x, y)
#plt.plot(x[ix], y[ix], 'or', label='estimated inflection point')
plt.plot(x[ix2], y[ix2], 'or', label='estimated inflection point - 2')
plt.xlabel('x'); plt.ylabel('y'); plt.legend(loc='best');
Here is a very simple and not robust method to find the inflection point of a non-noisy curve:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# 1st inflection point estimation
dy = np.diff(y) # first derivative
idx_max_dy = np.argmax(dy)
# Graph
plt.plot(x, y)
plt.plot(x[idx_max_dy], y[idx_max_dy], 'or', label='estimated inflection point')
plt.xlabel('x'); plt.ylabel('y'); plt.legend();
The actual position of the inflection point is x1 = mean - std for a Gaussian curve.
For this to work with real data, they have to be smoothed before looking for the max, by applying for example a simple moving average, a gaussian filter or a Savitzky-Golay filter which can directly output the second derivative... the choice of the right filter depends on the data

Sklearn BIC criterion : differents optimum values of k for clustering

I want to determine the best value of k (number of clusters) for the KMeans algo and a dataset.
I found a ressource in the documentation of Sklearn : The Gaussian Mixture Model Selection using the BIC criterion.
I found an example of code on the site that I adapted to my dataset.
But each run of this code give a different value of optimal value of k . Why ?
Here the code :
import numpy as np
import pandas as pd
import itertools
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn import mixture
print(__doc__)
# Number of samples per component
n_samples = 440
path = 'C:/Users/Lionel/Downloads'
file = 'Wholesale customers data.csv'
data = pd.read_csv(path + '/'+file)
X = np.array(data.iloc[:,2 :])
lowest_bic = np.infty
bic = []
n_components_range = range(1, 12)
cv_types = ['spherical', 'tied', 'diag', 'full']
for cv_type in cv_types:
for n_components in n_components_range:
# Fit a Gaussian mixture with EM
gmm = mixture.GaussianMixture(n_components=n_components,
covariance_type=cv_type)
gmm.fit(X)
bic.append(gmm.bic(X))
if bic[-1] < lowest_bic:
lowest_bic = bic[-1]
best_gmm = gmm
bic = np.array(bic)
color_iter = itertools.cycle(['navy', 'turquoise', 'cornflowerblue',
'darkorange'])
clf = best_gmm
print(clf)
bars = []
# Plot the BIC scores
spl = plt.subplot(2, 1, 1)
#spl = plt.plot()
for i, (cv_type, color) in enumerate(zip(cv_types, color_iter)):
xpos = np.array(n_components_range) + .2 * (i - 2)
bars.append(plt.bar(xpos, bic[i * len(n_components_range):
(i + 1) * len(n_components_range)],
width=.2, color=color))
plt.xticks(n_components_range)
plt.ylim([bic.min() * 1.01 - .01 * bic.max(), bic.max()])
plt.title('BIC score per model')
xpos = np.mod(bic.argmin(), len(n_components_range)) + .65 +\
.2 * np.floor(bic.argmin() / len(n_components_range))
plt.text(xpos, bic.min() * 0.97 + .03 * bic.max(), '*', fontsize=14)
spl.set_xlabel('Number of components')
spl.legend([b[0] for b in bars], cv_types)
# Plot the winner
splot = plt.subplot(2, 1, 2)
Y_ = clf.predict(X)
for i, (mean, cov, color) in enumerate(zip(clf.means_, clf.covariances_,
color_iter)):
v, w = linalg.eigh(cov)
if not np.any(Y_ == i):
continue
plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], .8, color=color)
# Plot an ellipse to show the Gaussian component
angle = np.arctan2(w[0][1], w[0][0])
angle = 180. * angle / np.pi # convert to degrees
v = 2. * np.sqrt(2.) * np.sqrt(v)
ell = mpl.patches.Ellipse(mean, v[0], v[1], 180. + angle, color=color)
ell.set_clip_box(splot.bbox)
ell.set_alpha(.5)
splot.add_artist(ell)
plt.xticks(())
plt.yticks(())
plt.title('Selected GMM: full model, 2 components')
plt.subplots_adjust(hspace=.35, bottom=.02)
plt.show()
Here the link to my dataset :
https://drive.google.com/open?id=1yMw1rMh12ml6Lh3yrL6WDLbEnLM-SmiN
Have you an explanation for this behaviour ?

Trapezoidal wave in Python

How do I generate a trapezoidal wave in Python?
I looked into the modules such as SciPy and NumPy, but in vain. Is there a module such as the scipy.signal.gaussian which returns an array of values representing the Gaussian function wave?
I generated this using the trapezoidal kernel of Astropy,
Trapezoid1DKernel(30,slope=1.0)
. I want to implement this in Python without using Astropy.
While the width and the slope are sufficient to define a triangular signal, you would need a third parameter for a trapezoidal signal: the amplitude.
Using those three parameters, you can easily adjust the scipy.signal.sawtooth function to give you a trapeziodal shape by truncating and offsetting the triangular shaped function.
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np
def trapzoid_signal(t, width=2., slope=1., amp=1., offs=0):
a = slope*width*signal.sawtooth(2*np.pi*t/width, width=0.5)/4.
a[a>amp/2.] = amp/2.
a[a<-amp/2.] = -amp/2.
return a + amp/2. + offs
t = np.linspace(0, 6, 501)
plt.plot(t,trapzoid_signal(t, width=2, slope=2, amp=1.), label="width=2, slope=2, amp=1")
plt.plot(t,trapzoid_signal(t, width=4, slope=1, amp=0.6), label="width=4, slope=1, amp=0.6")
plt.legend( loc=(0.25,1.015))
plt.show()
Note that you may also like to define a phase, depeding on the use case.
In order to define a single pulse, you might want to modify the function a bit and supply an array which ranges over [0,width].
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np
def trapzoid_signal(t, width=2., slope=1., amp=1., offs=0):
a = slope*width*signal.sawtooth(2*np.pi*t/width, width=0.5)/4.
a += slope*width/4.
a[a>amp] = amp
return a + offs
for w,s,a in zip([2,5], [2,1], [1,0.6]):
t = np.linspace(0, w, 501)
l = "width={}, slope={}, amp={}".format(w,s,a)
plt.plot(t,trapzoid_signal(t, width=w, slope=s, amp=a), label=l)
plt.legend( loc="upper right")
plt.show()
From the SciPy website it looks like this isn't included (they currently have sawtooth and square, but not trapezoid). As a generalised version of the C example the following will do what you want,
import numpy as np
import matplotlib.pyplot as plt
def trapezoidalWave(xin, width=1., slope=1.):
x = xin%(4*width)
if (x <= width):
# Ascending line
return x*slope;
elif (x <= 2.*width):
# Top horizontal line
return width*slope
elif (x <= 3.*width):
# Descending line
return 3.*width*slope - x*slope
elif (x <= 4*width):
# Bottom horizontal line
return 0.
x = np.linspace(0.,20,1000)
for i in x:
plt.plot(i, trapezoidalWave(i), 'k.')
plt.plot(i, trapezoidalWave(i, 1.5, 2.), 'r.')
plt.show()
which looks like,
This can be done more elegantly with Heaviside functions which allow you to use NumPy arrays,
import numpy as np
import matplotlib.pyplot as plt
def H(x):
return 0.5 * (np.sign(x) + 1)
def trapWave(xin, width=1., slope=1.):
x = xin%(4*width)
y = ((H(x)-H(x-width))*x*slope +
(H(x-width)-H(x-2.*width))*width*slope +
(H(x-2.*width)-H(x-3.*width))*(3.*width*slope - x*slope))
return y
x = np.linspace(0.,20,1000)
plt.plot(x, trapWave(x))
plt.plot(x, trapWave(x, 1.5, 2.))
plt.show()
For this example, the Heaviside version is about 20 times faster!
The below example shows how to do that to get points and show scope.
Equation based on reply: Equation for trapezoidal wave equation
import math
import numpy as np
import matplotlib.pyplot as plt
def get_wave_point(x, a, m, l, c):
# Equation from: https://stackoverflow.com/questions/11041498/equation-for-trapezoidal-wave-equation
# a/pi(arcsin(sin((pi/m)x+l))+arccos(cos((pi/m)x+l)))-a/2+c
# a is the amplitude
# m is the period
# l is the horizontal transition
# c is the vertical transition
point = a/math.pi*(math.asin(math.sin((math.pi/m)*x+l))+math.acos(math.cos((math.pi/m)*x+l)))-a/2+c
return point
print('Testing wave')
x = np.linspace(0., 10, 1000)
listofpoints = []
for i in x:
plt.plot(i, get_wave_point(i, 5, 2, 50, 20), 'k.')
listofpoints.append(get_wave_point(i, 5, 2, 50, 20))
print('List of points : {} '.format(listofpoints))
plt.show()
The whole credit goes to #ImportanceOfBeingErnest . I am just revising some edits to his code which just made my day.
from scipy import signal
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
def trapzoid_signal(t, width=2., slope=1., amp=1., offs=0):
a = slope*width*signal.sawtooth(2*np.pi*t/width, width=0.5)/4.
a += slope*width/4.
a[a>amp] = amp
return a + offs
for w,s,a in zip([32],[1],[0.0322]):
t = np.linspace(0, w, 34)
plt.plot(t,trapzoid_signal(t, width=w, slope=s, amp=a))
plt.show()
The result:
I'll throw a very late hat into this ring, namely, a function using only numpy that produces a single (symmetric) trapezoid at a desired location, with all the usual parameters. Also posted here
import numpy as np
def trapezoid(x, center=0, slope=1, width=1, height=1, offset=0):
"""
For given array x, returns a (symmetric) trapezoid with plateau at y=h (or -h if
slope is negative), centered at center value of "x".
Note: Negative widths and heights just converted to 0
Parameters
----------
x : array_like
array of x values at which the trapezoid should be evaluated
center : float
x coordinate of the center of the (symmetric) trapezoid
slope : float
slope of the sides of the trapezoid
width : float
width of the plateau of the trapezoid
height : float
(positive) vertical distance between the base and plateau of the trapezoid
offset : array_like
vertical shift (either single value or the same shape as x) to add to y before returning
Returns
-------
y : array_like
y value(s) of trapezoid with above parameters, evaluated at x
"""
# ---------- input checking ----------
if width < 0: width = 0
if height < 0: height = 0
x = np.asarray(x)
slope_negative = slope < 0
slope = np.abs(slope) # Do all calculations with positive slope, invert at end if necessary
# ---------- Calculation ----------
y = np.zeros_like(x)
mask_left = x - center < -width/2.0
mask_right = x - center > width/2.0
y[mask_left] = slope*(x[mask_left] - center + width/2.0)
y[mask_right] = -slope*(x[mask_right] - center - width/2.0)
y += height # Shift plateau up to y=h
y[y < 0] = 0 # cut off below zero (so that trapezoid flattens off at "offset")
if slope_negative: y = -y # invert non-plateau
return y + offset
Which outputs something like
import matplotlib.pyplot as plt
plt.style.use("seaborn-colorblind")
x = np.linspace(-5,5,1000)
for i in range(1,4):
plt.plot(x,trapezoid(x, center=0, slope=1, width=i, height=i, offset = 0), label=f"width = height = {i}\nslope=1")
plt.plot(x,trapezoid(x, center=0, slope=-1, width=2.5, height=1, offset = 0), label=f"width = height = 1.5,\nslope=-1")
plt.ylim((-2.5,3.5))
plt.legend(frameon=False, loc='lower center', ncol=2)
Example output:

Resources