I want to make a few tools to help learn and teach basic statistics. One of them aims to help visualize the z-score probability table:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
import scipy.stats as st
def draw_z_score(x, cond, mean, std, title, color='b'):
    y = st.norm.pdf(x, mean, std)
    z = x[cond]
    plt.plot(x, y)
    plt.ylim(ymin=0)
    plt.xlim(xmin=-4.5, xmax=4.5)
    plt.fill_between(z, 0, st.norm.pdf(z, mean, std), color=color)
    plt.title(title)
    plt.tight_layout()
    plt.show()
def z_table_probabilty(z_score, z_score2=None, area='l'):
    normal = np.arange(-3.9, 3.9, 0.1)
    if area == 'l':
        Pz = round(st.norm.cdf(z_score), 4)
        draw_z_score(normal, normal < z_score, 0, 1, f'z = {z_score} P(z)={Pz}')
    elif area == 'r':
        Pz = round(1 - st.norm.cdf(z_score), 4)
        draw_z_score(normal, normal > z_score, 0, 1, f'Z = {z_score} P(1-z)={Pz}', color='r')
    elif area == 'tt' and z_score2 is not None:
        z2 = max(z_score, z_score2)
        z = min(z_score, z_score2)
        Pz = round(st.norm.cdf(z2) - st.norm.cdf(z), 4)
        draw_z_score(normal, (normal < z2) & (normal > z), 0, 1, f"z = {z} and z' = {z2} P(z'-z)={Pz}", color='y')
Now, when I try:
z_table_probabilty(-0.9)
I get the plot below for z-score = -0.9:
Could someone tell me why the shaded area for z-score = -0.9 ends at -1 on my plot, and why the distance between x = 4 and the end of the right tail differs from the distance between x = -4 and the end of the left tail? The whole plot seems to be slightly shifted.
What have I done wrong?
Thanks
MV
normal = np.arange(-3.9, 3.9, 0.1) creates an array from -3.9 to 3.8. The end point is not included. Hence you see the curve start at -3.9 and end at 3.8.
With normal<z_score you choose all the points in normal which are smaller than z_score. When z_score=-0.9, those points are from -3.9 to -1.0 because -0.9 is not smaller than -0.9.
Overall, I would recommend defining normal a bit more densely; this avoids both problems. E.g.
normal = np.linspace(-3.9, 3.9, 391)
to create points in steps of 0.02 instead of 0.1.
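A quick check in the interpreter makes both effects visible (a minimal sketch, assuming only NumPy):
import numpy as np

normal = np.arange(-3.9, 3.9, 0.1)
print(normal[-1])                 # ~3.8: the stop value 3.9 is excluded, so the right tail ends early
print(normal[normal < -0.9][-1]) # ~-1.0: -0.9 itself fails the strict "<" test, so shading stops at -1.0

dense = np.linspace(-3.9, 3.9, 391)
print(dense[-1])                 # 3.9: linspace includes both endpoints
print(dense[dense < -0.9][-1])   # within one 0.02 step of -0.9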
I have the coordinates of some 3D points through which I lay a spline, like so:
from splipy import curve_factory
pts = [...] #3D coordinate points
curve = curve_factory.curve(pts)
I know that I can get a point in 3D along the curve by evaluating it at a certain parameter value t:
point_on_curve = curve.evaluate(t)
print(point_on_curve) #outputs coordinates: (x y z)
Is it, however, somehow possible to do it the other way round? Is there a function/method that can tell me if a certain point is part of the curve? Or if it's almost part of the curve? Something like:
curve.func(point) #output: True
or
curve.func(point) #output: distance to curve 0.0001 --> also part of curve
Thanks!
I've found this script by ventusff that performs an optimization to find the value of the parameter that you call t (in the script it is u) which gives the point on the spline closest to the external point.
I report below the code with some changes to make it clearer for you. I've defined a tolerance equal to 0.001.
The choice of the optimization solver and of its parameter values requires a bit of study. I do not have enough time for that now, but you can experiment a little.
In this case SciPy is used for spline generation and evaluation, but you can easily replace it with splipy. The optimization, performed with SciPy, is the interesting part.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial.distance import euclidean
from scipy.optimize import fmin_bfgs
# Build a test curve and fit a spline through it.
points_count = 40
phi = np.linspace(0, 2. * np.pi, points_count)
k = np.linspace(0, 2, points_count)
r = 0.5 + np.cos(phi)
x, y, z = r * np.cos(phi), r * np.sin(phi), k
tck, u = splprep([x, y, z], s=1)
points = splev(u, tck)

# Pick a random point on the curve and perturb it slightly.
idx = np.random.randint(low=0, high=40)
noise = np.random.normal(scale=0.01)
external_point = np.array([points[0][idx], points[1][idx], points[2][idx]]) + noise
# Distance from the external point to the spline at parameter u_.
def distance_to_point(u_):
    s = splev(u_, tck)
    return euclidean(external_point, [s[0][0], s[1][0], s[2][0]])
# Minimize the distance over the spline parameter.
closest_u = fmin_bfgs(distance_to_point, x0=np.array([0.0]), gtol=1e-8)
closest_point = splev(closest_u, tck)

tol = 1e-3
if euclidean(external_point, [closest_point[0][0], closest_point[1][0], closest_point[2][0]]) < tol:
    print("The point is very close to the spline.")
ax = plt.figure().add_subplot(projection='3d')
ax.plot(points[0], points[1], points[2], "r-", label="Spline")
ax.plot(external_point[0], external_point[1], external_point[2], "bo", label="External Point")
ax.plot(closest_point[0], closest_point[1], closest_point[2], "go", label="Closest Point")
plt.legend()
plt.show()
The script draws the plot below:
and prints the following output:
Current function value: 0.000941
Iterations: 5
Function evaluations: 75
Gradient evaluations: 32
The point is very close to the spline.
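As a side note, and purely as an untested variant (not part of the original script): the spline parameter u produced by splprep lives in [0, 1], and fmin_bfgs is free to step outside that interval. A bounded scalar solver such as scipy.optimize.minimize_scalar avoids this:
from scipy.optimize import minimize_scalar

# Search only within the valid parameter range of the spline.
res = minimize_scalar(
    lambda u_: euclidean(external_point, np.asarray(splev(u_, tck)).ravel()),
    bounds=(0.0, 1.0), method='bounded')
print(res.x, res.fun)  # closest parameter and the minimal distance, directly comparable to tol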
I'm using the code below to draw the elliptic curve (ECC) y^2 + x^3 + x^2 = 0:
import numpy as np
import matplotlib.pyplot as plt
import math
def main():
    fig = plt.figure()
    ax = fig.add_subplot(111)
    y, x = np.ogrid[-2:2:1000j, -2:2:1000j]
    ax.contour(x.ravel(), y.ravel(), pow(y, 2) + pow(x, 3) + pow(x, 2), [0], colors='red')
    ax.grid()
    plt.show()

if __name__ == '__main__':
    main()
The output is
The expected image, however, is this
As we can see, the isolated point at (0,0) is not drawn. Any suggestions to solve this issue?
As already mentioned in the comments, it seems that a single point is not displayed as a contour. It would be best if the library indicated such points by itself; perhaps it allows this, but I have not found a way, so I show two workarounds here:
Option 1:
The isolated point at (0,0) could be marked explicitly:
ax.plot(0, 0, color="red", marker="o", markersize=2.5, zorder=10)
In the case of multiple known points, a masked array is a good choice here; if the points are not known beforehand, see the sketch below.
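One way to find such points automatically (my own sketch, not a library feature; the 3x3 window and the 1e-4 tolerance are arbitrary choices) is to look for local minima of z that are numerically zero. Regular points of the branch fail this test because z changes sign across the curve:
from scipy.ndimage import minimum_filter

z = pow(y, 2) + pow(x, 3) + pow(x, 2)
# Isolated solutions are local minima of z with value ~0; along the regular
# branch z takes negative values nearby, so those points are excluded.
is_isolated = (z == minimum_filter(z, size=3)) & (np.abs(z) < 1e-4)
ys, xs = np.nonzero(is_isolated)
ax.plot(x.ravel()[xs], y.ravel()[ys], color="red", marker="o",
        markersize=2.5, linestyle="", zorder=10)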
Option 2:
The contour level can be shifted slightly away from z = 0, e.g. to z = 0.0002:
z = pow(y,2) + pow(x, 2) + pow(x, 3)
ax.contour(x.ravel(), y.ravel(), z, [0.0002], colors='red', zorder=10)
This shifts the whole contour slightly. Alternatively, only the area around the isolated point could be treated this way, by adding a second contour call on a small x, y grid around (0, 0); this leaves the rest unchanged. A sketch follows.
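A sketch of that alternative (my rendering of the suggestion; the 0.05 half-width and the 200-point grid are arbitrary):
# Main branch at the exact level 0, as in the question.
z = pow(y, 2) + pow(x, 3) + pow(x, 2)
ax.contour(x.ravel(), y.ravel(), z, [0], colors='red')

# Second contour on a small grid around (0, 0), slightly above 0; it draws
# a tiny ring that stands in for the isolated point, leaving the rest exact.
y2, x2 = np.ogrid[-0.05:0.05:200j, -0.05:0.05:200j]
z2 = pow(y2, 2) + pow(x2, 3) + pow(x2, 2)
ax.contour(x2.ravel(), y2.ravel(), z2, [0.0002], colors='red', zorder=10)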
I would like to create a version of this 2D binned "color map" with smoothed colors.
I am not even sure this is the correct nomenclature for the plot, but, essentially, I want my figure to be color-coded by the median value of a third variable for the points that fall in each bin of my (X, Y) space.
Even though I am able to accomplish that to a certain degree (see example), I would like to find a way to create a version of the same plot with a smoothed color gradient. That would allow me to visualize the overall behavior of my distribution.
I tried ideas described here: Smoothing 2D map in python
and here: Python: binned_statistic_2d mean calculation ignoring NaNs in data
as well as links therein, but could not find a clear solution to the problem.
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import binned_statistic_2d
np.random.seed(999)  # note: seeding the random module would not affect np.random
x = np.random.normal(0, 10, 5000)
y = np.random.normal(0, 10, 5000)
z = np.random.uniform(0, 10, 5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(b=True, lw=0.5, zorder=-1)
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.cm.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
plt.imshow(ret.statistic.T, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
Which produces the following image (from random data):
Therefore, the simple question would be: how to smooth these colors?
Thanks in advance!
PS: sorry for the excessive code, but I believe a clear visualization is crucial for this particular problem.
Thanks to everyone who viewed this issue and tried to help!
I ended up being able to solve my own problem. In the end, it was all about image smoothing with a Gaussian kernel.
This link: Gaussian filtering a image with Nan in Python gave me the insight for the solution.
I basically implemented the exact same code but, at the end, mapped the previously known NaN pixels from the original 2D array back onto the smoothed version. Unlike the solution from the link, my version does NOT fill NaN pixels with values derived from the surrounding pixels. Or rather, it does, but then erases them again.
Here is the final figure produced for the example I provided:
Final code, for reference, for those who might need in the future:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import binned_statistic_2d
import scipy.stats as st
import scipy.ndimage
import scipy as sp
np.random.seed(999)  # note: seeding the random module would not affect np.random
x = np.random.normal(0, 10, 5000)
y = np.random.normal(0, 10, 5000)
z = np.random.uniform(0, 10, 5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(b=True, lw=0.5, zorder=-1)
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.cm.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
sigma = 1       # standard deviation for Gaussian kernel
truncate = 5.0  # truncate filter at this many sigmas
U = ret.statistic.T.copy()

# Normalized convolution: smooth the data with NaNs set to 0, then divide by a
# smoothed mask of valid pixels to compensate for the missing weight.
V = U.copy()
V[np.isnan(U)] = 0
VV = sp.ndimage.gaussian_filter(V, sigma=sigma, truncate=truncate)

W = 0 * U.copy() + 1
W[np.isnan(U)] = 0
WW = sp.ndimage.gaussian_filter(W, sigma=sigma, truncate=truncate)

np.seterr(divide='ignore', invalid='ignore')
Z = VV / WW

# Map the originally empty (NaN) bins back onto the smoothed array.
for i in range(len(Z)):
    for j in range(len(Z[0])):
        if np.isnan(U[i][j]):
            Z[i][j] = np.nan
plt.imshow(Z, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
I am trying to find the locations (i.e., the x-values) of the key points of a vegetation curve: minimum, start of season, peak growing season, maximum growth, senescence, end of season, and minimum again (i.e., the inflection points). I am using a normal curve here as an example. I did come across a few snippets that find the change in slope and the 1st/2nd-order derivatives, but I was not able to apply them to my case. Please direct me to any relevant example; your help is appreciated. Thanks!
## Version 2 code
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# Slice the sequence into overlapping groups of 3
def group_in_threes(slicable):
    for i in range(len(slicable) - 2):
        yield slicable[i:i + 3]

# Locate the changes in slope
def turns(L):
    for index, three in enumerate(group_in_threes(L)):
        if (three[0] > three[1] < three[2]) or (three[0] < three[1] > three[2]):
            yield index + 1
# 1st inflection point estimation
dy = np.diff(y, n=1) # first derivative
idx_max_dy = np.argmax(dy)
ix = list(turns(dy))
print(ix)
# All inflection point estimation
dy2 = np.diff(dy, n=2) # note: this applies diff twice to dy, i.e. the 3rd derivative of y, not the 2nd
idx_max_dy2 = np.argmax(dy2)
ix2 = list(turns(dy2))
print(ix2)
# Graph
plt.plot(x, y)
#plt.plot(x[ix], y[ix], 'or', label='estimated inflection point')
plt.plot(x[ix2], y[ix2], 'or', label='estimated inflection point - 2')
plt.xlabel('x'); plt.ylabel('y'); plt.legend(loc='best');
Here is a very simple, not very robust, method to find the inflection point of a noise-free curve:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x_min = 0.0
x_max = 16.0
mean = 8
std = 2
x = np.linspace(x_min, x_max, 100)
y = norm.pdf(x, mean, std)
# 1st inflection point estimation
dy = np.diff(y) # first derivative
idx_max_dy = np.argmax(dy)
# Graph
plt.plot(x, y)
plt.plot(x[idx_max_dy], y[idx_max_dy], 'or', label='estimated inflection point')
plt.xlabel('x'); plt.ylabel('y'); plt.legend();
The actual position of the inflection point is x1 = mean - std for a Gaussian curve, since the second derivative of the Gaussian pdf vanishes at mean ± std.
For this to work with real data, the data have to be smoothed before looking for the maximum, for example by applying a simple moving average, a Gaussian filter, or a Savitzky-Golay filter, which can directly output the second derivative. The choice of the right filter depends on the data; a sketch of the Savitzky-Golay route follows.
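Here, a hedged sketch (the window length, polynomial order, and noise level are arbitrary values that would have to be tuned to the actual data):
import numpy as np
from scipy.signal import savgol_filter
from scipy.stats import norm

x = np.linspace(0.0, 16.0, 100)
y_noisy = norm.pdf(x, 8, 2) + np.random.normal(scale=0.002, size=x.size)

# Smooth and differentiate in one step: deriv=2 returns the second derivative.
d2y = savgol_filter(y_noisy, window_length=21, polyorder=3, deriv=2,
                    delta=x[1] - x[0])

# Inflection points are the zero crossings of the second derivative.
crossings = np.nonzero(np.diff(np.sign(d2y)) != 0)[0]
print(x[crossings])  # should include values near mean - std = 6 and mean + std = 10;
                     # the nearly flat tails may add spurious crossings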
I am trying to find the x-value of the minimum (local or global does not matter, as there is only one minimum in the first graph) in the first graph. How can I do that? The point of the minimum is marked in red.
The first graph is a smoothed version of the second, to avoid the issue of spurious local minima.
I obtained the graphs using the following steps:
import cv2
from matplotlib import pyplot as plt

green = cv2.imread('5.tiff', 1)
a = cv2.calcHist([green], [0], None, [256], [0, 256])  # histogram of channel 0
blurs = cv2.GaussianBlur(a, (13, 13), 0)               # smoothed histogram
plt.subplot(2, 1, 1)
plt.plot(blurs)
plt.subplot(2, 1, 2)
plt.plot(a)
Basically, you can define a local minimum as a point from which you can go neither left nor right without increasing the value. Let me demonstrate this with the help of a cos() graph:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(1000)
y = np.cos(x * np.pi / 180)
plt.plot(x, y)
The values are stored in the y variable. At every index (except the first and last), just check the two neighboring values; if both are bigger, you are at a local minimum. Here is the code:
local_min = []
for i in range(1, len(y) - 1):
    if y[i-1] >= y[i] and y[i] <= y[i+1]:
        local_min.append(i)
print(local_min)
Output:
[180, 540, 900]
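For what it's worth, the same neighbor test is also available ready-made as scipy.signal.argrelmin (for the histogram in the question you would likely pass blurs.ravel()):
from scipy.signal import argrelmin

print(argrelmin(y)[0])  # [180 540 900], matching the loop above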