I want to fit a monotonically increasing smooth spline function to a dataset.
Code:
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
x = [0., 0.75, 1.8, 2.25, 3.75, 4.5, 6.45, 6.75, 7.5, 8.325, 10.875, 11.25, 12.525, 12.75, 15., 20.85, 21.]
y = [2.83811035, 2.81541896, 3.14311655, 3.22373554, 3.43033456, 3.50433385, 3.66794514, 3.462296, 3.59480959,
3.56250726, 3.6209845, 3.63034523, 3.68238915, 3.69096892, 3.75560395, 3.83545191, 3.90419498]
plt.plot(x, y, '*')
plt.show()
f = interp1d(x, y, kind='cubic')
yinp = f(x)
plt.plot(x, yinp)
plt.show()
The current fit looks like the above. I would like to know how to fit a monotonically increasing spline function.
I found an example in R posted here: How to make monotonic (increasing) smooth spline with smooth.spline() function?. I am not sure what the appropriate function in the scipy library is.
Suggestions will be really helpful.
EDIT:
I'm looking for something like the below (ref.)
EDIT 2:
I'm now using a smoothing spline function scipy.interpolate.UnivariateSpline
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = np.array([0., 0.75, 1.8, 2.25, 3.75, 4.5, 6.45, 6.75,
7.5, 8.325, 10.875, 11.25, 12.525, 12.75, 15.,
20.85, 21.])
y = np.array([2.83811035, 2.81541896, 3.14311655,
3.22373554, 3.43033456, 3.50433385,
3.66794514, 3.462296, 3.59480959,
3.56250726, 3.6209845, 3.63034523,
3.68238915, 3.69096892, 3.75560395,
3.83545191, 3.90419498])
spl = UnivariateSpline(x, y, s=0.05)
xs = np.linspace(x.min(), x.max(), 100)
plt.plot(x, y, 'ro', ms=5)
plt.plot(xs, spl(xs), 'cyan', lw=1)
plt.show()
gives the following result
I could get the coeffs and knots of the spline using
print(spl.get_coeffs())
print(spl.get_knots())
from scipy.interpolate import splrep
k = 3 # cubic spline
tck = splrep(x, y, k=k, s=0.09) # tck = (knots, coefficients, degree)
knots = tck[0]
coeffs = tck[1]
print('knot points=', knots)
print('coefficients=', coeffs)
But I am not sure how to use the coefficients and manually generate the function of the spline curve. Could someone please add a bit more detail to this?
For example, when we have 4 data points
x = [0., 0.75, 1.8, 2.25]
y = [2.83811035, 2.81541896, 3.14311655, 3.22373554]
I would like to print the piecewise polynomial function to understand what the spline function looks like.
Use splrep from scipy.interpolate and manually tweak its s parameter which controls the amount of smoothing.
EDIT: The definition of the spline function that splrep returns, in terms of knots and coefficients, is equivalent to scipy.interpolate.BSpline, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BSpline.html
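To make the link between the knots/coefficients and an explicit piecewise polynomial more concrete, here is a minimal sketch using the four sample points from the question. It assumes s=0 (pure interpolation) purely for illustration, and the names spl, pp and describe_pieces are my own, not from the posts above. PPoly.from_spline converts the B-spline form into per-interval power-basis coefficients, which can then be printed.

```python
import numpy as np
from scipy.interpolate import BSpline, PPoly, splev, splrep

# The four sample points from the question.
x = np.array([0.0, 0.75, 1.8, 2.25])
y = np.array([2.83811035, 2.81541896, 3.14311655, 3.22373554])

# s=0 interpolates the data; a larger s smooths more (a value to tune by hand).
tck = splrep(x, y, k=3, s=0)
knots, coeffs, degree = tck

# The same spline as a BSpline object and as an explicit piecewise polynomial.
spl = BSpline(knots, coeffs, degree)
pp = PPoly.from_spline(tck)


def describe_pieces(pp):
    """Return one text line per non-empty interval of a PPoly.

    On [x_i, x_{i+1}] the polynomial is sum_m c[m, i] * (x - x_i)**(k - m),
    i.e. the coefficients are stored highest power first.
    """
    lines = []
    n_coef = pp.c.shape[0]
    for i in range(len(pp.x) - 1):
        left, right = pp.x[i], pp.x[i + 1]
        if right <= left:
            continue  # repeated boundary knots give zero-length intervals
        powers = range(n_coef - 1, -1, -1)
        terms = " + ".join(
            f"{c:.6g}*(x - {left:g})**{p}" for p, c in zip(powers, pp.c[:, i])
        )
        lines.append(f"[{left:g}, {right:g}]: {terms}")
    return lines


for line in describe_pieces(pp):
    print(line)

# Rough monotonicity check: sign of the first derivative on a dense grid.
xs = np.linspace(x.min(), x.max(), 200)
is_increasing = bool(np.all(spl.derivative()(xs) >= 0))
print("increasing on the grid:", is_increasing)
```

Re-running this with different s values and watching is_increasing is one way to do the manual tweaking mentioned above; it is only a grid check, not a guarantee of monotonicity.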
I am trying to do a linear interpolation
Code:
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
x_ = [2., 2.8, 16.7]
y_ = [0.87678869, 0.49044994, 1.4322675]
y_std = []
plt.plot(x_, y_, marker='o', color='k')
f = interp1d(x_, y_, kind='linear', bounds_error=False)
plt.plot(x_, f(x_), marker='o', color='tab:green')
plt.show()
I expected a single straight line, but the result appears to be a piecewise fit.
Could someone please suggest what's going wrong?
Interpolation only fills in values between the points, so it must pass through every single point in your original dataset.
What you're looking for is to run a linear regression:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
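As a rough sketch of what that could look like with the three points from the question (the names fit and y_line are mine, chosen for illustration):

```python
import numpy as np
from scipy.stats import linregress

# Data from the question.
x_ = np.array([2.0, 2.8, 16.7])
y_ = np.array([0.87678869, 0.49044994, 1.4322675])

# Least-squares straight line through the points.
fit = linregress(x_, y_)

# Evaluate the fitted line at the original x positions.
y_line = fit.intercept + fit.slope * x_

print("slope:", fit.slope)
print("intercept:", fit.intercept)
```

Plotting x_ against y_line gives one straight line that does not, in general, pass through the data points, which is the difference from interp1d.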
Aim of the task: I have a set of X and Y coordinates describing a geometry, and I want to align that geometry with the axes. The coordinates and the corresponding geometry are shown in the picture.
X1_coordinate = [0.0, 0.87, 1.37, 1.87, 2.73, 3.6, 4.46, 4.96, 5.46, 4.6, 3.73, 2.87, 2.0, 1.5, 1.0, 0.5, 2.37, 3.23, 4.1]
Y1_coordinate = [0.0, 0.5, -0.37, -1.23, -0.73, -0.23, 0.27, -0.6, -1.46, -1.96, -2.46, -2.96, -3.46, -2.6, -1.73, -0.87, -2.1, -1.6, -1.1]
Question: Can I apply Principal Component Analysis to a 2D geometry so that its principal axes become parallel to the reference axes (X and Y)?
Expected output: I want my geometry to look like this. This is just an example. I want the principal axis of the geometry to lie on, or be parallel to, a reference axis.
What I tried: I tried the code below to implement PCA and obtain the aligned geometry.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
import pandas as pd
from sklearn.preprocessing import StandardScaler
plt.style.use('ggplot')
# Load the data
# iris = datasets.load_iris()
X = X1_coordinate
y = Y1_coordinate
# Z-score the features
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
# The PCA model
pca = PCA(n_components=2) # estimate only 2 PCs
X_new = pca.fit_transform(X) # project the original data into the PCA space
However, after running the code, I got the error shown below.
Please let me know what I should do to align my geometry. I look forward to your answers.
Basically, you can apply PCA to this task.
import sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
circle_pts = sklearn.datasets.make_circles() #get two circles with labels
circle_pts = circle_pts[0][circle_pts[1]==0] #leave only one circle
ang = 63/180*np.pi #radians of rotation
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
ellipse_pts = circle_pts
ellipse_pts[:,0] *= 4.5
ellipse_rot_pts = ellipse_pts @ R.T # rotate the ellipse by ang
plt.figure()
plt.scatter(ellipse_rot_pts[:,0], ellipse_rot_pts[:,1])
plt.axis("equal")
plt.tight_layout()
plt.show()
scaler = StandardScaler(with_std=False)
scaler.fit(ellipse_rot_pts)
X = scaler.transform(ellipse_rot_pts)
pca = PCA(n_components=2) # estimate only 2 PCs
X_new = pca.fit_transform(X) # project the original data into the PCA space
plt.figure()
plt.scatter(X[:,0],X[:,1])
singular_values = pca.singular_values_
plt.plot([0, singular_values[0]*pca.components_[0,0]], [0, singular_values[0]*pca.components_[0,1]])
plt.plot([0, singular_values[1]*pca.components_[1,0]], [0, singular_values[1]*pca.components_[1,1]])
plt.axis("equal")
plt.show()
plt.figure()
plt.title("Aligned with axis figure")
plt.scatter(X_new[:,0],X_new[:,1])
plt.axis("equal")
plt.show()
But the problem is that not every geometry is suitable for this. An ellipse has two main axes of symmetry; your figure, for example, does not. So the principal components, which are found by maximizing the variance in the data, do not correspond to the axis alignment in your example (expected output).
For example, your set of points corresponds to this alignment of components:
Your geometry
And here is the result for a slightly modified, more symmetric object:
A little more symmetrical figure
Hope this helps.
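For the coordinates in the question, a minimal sketch could look like the following. Note that it swaps sklearn's PCA for a plain NumPy SVD of the centered points, which yields the same principal axes up to sign; the names pts, centered and aligned are mine. The error in the question most likely comes from passing a flat list where a 2D array of shape (n_samples, n_features) is expected, so the sketch first stacks X and Y into one array.

```python
import numpy as np

# Coordinates from the question.
X1_coordinate = [0.0, 0.87, 1.37, 1.87, 2.73, 3.6, 4.46, 4.96, 5.46, 4.6,
                 3.73, 2.87, 2.0, 1.5, 1.0, 0.5, 2.37, 3.23, 4.1]
Y1_coordinate = [0.0, 0.5, -0.37, -1.23, -0.73, -0.23, 0.27, -0.6, -1.46,
                 -1.96, -2.46, -2.96, -3.46, -2.6, -1.73, -0.87, -2.1, -1.6, -1.1]

# One row per point, one column per coordinate: shape (19, 2).
pts = np.column_stack([X1_coordinate, Y1_coordinate])

# Center only (no scaling to unit variance, which would distort the shape).
centered = pts - pts.mean(axis=0)

# Rows of vt are the principal directions, ordered by decreasing variance.
_, _, vt = np.linalg.svd(centered, full_matrices=False)

# Express every point in the principal-axis coordinate system.
aligned = centered @ vt.T

print(aligned.shape)
```

Whether the result matches the expected picture still depends on the shape itself, as explained above: the axes found are those of maximum variance, not necessarily the axes of visual symmetry.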
So, here is my code:
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from fitter import Fitter, get_common_distributions
df = pd.read_csv("project3.csv")
bins = [282.33, 594.33, 906.33, 1281.33, 15030.33, 1842.33, 2154.33, 2466.33, 2778.33, 3090.33, 3402.33]
#declaring
facecolor = '#EAEAEA'
color_bars = '#3475D0'
txt_color1 = '#252525'
txt_color2 = '#004C74'
fig, ax = plt.subplots(1, figsize=(16, 6), facecolor=facecolor)
ax.set_facecolor(facecolor)
n, bins, patches = plt.hist(df.City1, color=color_bars, bins=10)
#grid
minor_locator = AutoMinorLocator(2)
plt.gca().xaxis.set_minor_locator(minor_locator)
plt.grid(which='minor', color=facecolor, lw = 0.5)
xticks = [(bins[idx+1] + value)/2 for idx, value in enumerate(bins[:-1])]
xticks_labels = [ "{:.0f}-{:.0f}".format(value, bins[idx+1]) for idx, value in enumerate(bins[:-1])]
plt.xticks(xticks, labels=xticks_labels, c=txt_color1, fontsize=13)
#beautify
ax.tick_params(axis='x', which='both',length=0)
plt.yticks([])
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
for idx, value in enumerate(n):
    if value > 0:
        plt.text(xticks[idx], value+5, int(value), ha='center', fontsize=16, c=txt_color1)
plt.title('Histogram of rainfall in City1\n', loc = 'right', fontsize = 20, c=txt_color1)
plt.xlabel('\nCentimeters of rainfall', c=txt_color2, fontsize=14)
plt.ylabel('Frequency of occurrence', c=txt_color2, fontsize=14)
plt.tight_layout()
#plt.savefig('City1_Raw.png', facecolor=facecolor)
plt.show()
city1 = df['City1'].values
f = Fitter(city1, distributions=get_common_distributions())
f.fit()
fig = f.plot_pdf(names=None, Nbest=4, lw=1, method='sumsquare_error')
plt.show()
print(f.get_best(method = 'sumsquare_error'))
The issue is with the plots it shows. The first histogram it generates is
Next I get another graph with best fitted distributions which is
Then an output statement
{'chi2': {'df': 10.692966790090342, 'loc': 16.690849400411103, 'scale': 118.71595997157786}}
Process finished with exit code 0
I have a couple of questions. Why is chi2, the best fitted distribution not plotted on the graph?
How do I plot these distributions on top of the histograms and not separately? The hist() function in fitter library can do that but there I don't get to control the bins and so I end up getting like 100 bins with some flat looking data.
How do I solve this issue? I need to plot the best-fit curve on a histogram that looks like image1. Can I use any other module/package to get this done in a similar way? This uses a least-squares fit, but I am OK with maximum likelihood or log-likelihood too.
Simple way of plotting things on top of each other (using some properties of the Fitter class)
import scipy.stats as st
import matplotlib.pyplot as plt
from fitter import Fitter, get_common_distributions
from scipy import stats
numberofpoints=50000
df = stats.norm.rvs( loc=1090, scale=500, size=numberofpoints)
fig, ax = plt.subplots(1, figsize=(16, 6))
n, bins, patches = ax.hist( df, bins=30, density=True)
f = Fitter(df, distributions=get_common_distributions())
f.fit()
errorlist = sorted(
    [
        [f._fitted_errors[dist], dist]
        for dist in get_common_distributions()
    ]
)[:4]
for err, dist in errorlist:
    ax.plot( f.x, f.fitted_pdf[dist] )
plt.show()
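Because the histogram is drawn with density=True, the fitted PDFs can be overlaid directly. For a histogram of raw counts, the PDFs would have to be rescaled (by the number of points times the bin width) to match.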
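A minimal sketch of that rescaling, assuming a normal distribution fitted directly with scipy.stats rather than through Fitter (the data here is synthetic and the variable names are mine):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the rainfall column.
rng = np.random.default_rng(0)
data = rng.normal(loc=1090, scale=500, size=5000)

# Raw-count histogram with a chosen number of bins.
counts, edges = np.histogram(data, bins=10)
bin_width = edges[1] - edges[0]

# Maximum-likelihood fit of a normal distribution.
loc, scale = stats.norm.fit(data)

# A PDF integrates to 1, while a count histogram has total area
# N * bin_width, so multiply the PDF by that factor.
grid = np.linspace(edges[0], edges[-1], 200)
expected_counts = stats.norm.pdf(grid, loc=loc, scale=scale) * data.size * bin_width

# To draw both on one axes:
#   ax.hist(data, bins=edges)
#   ax.plot(grid, expected_counts)
```

The same factor applies to any other fitted distribution, for example stats.chi2 with the parameters that Fitter reports.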
I would like to create a version of this 2D binned "color map" with smoothed colors.
I am not even sure this would be the correct nomenclature for the plot, but, essentially, I want my figure to be color coded by the median values of a third variable for points that reside in each defined bin of my (X, Y) space.
Even though I am able to accomplish that to a certain degree (see example), I would like to find a way to create a version of the same plot with a smoothed color gradient. That would allow me to visualize the overall behavior of my distribution.
I tried ideas described here: Smoothing 2D map in python
and here: Python: binned_statistic_2d mean calculation ignoring NaNs in data
as well as links therein, but could not find a clear solution to the problem.
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import binned_statistic_2d
np.random.seed(999) # seed NumPy's generator, which is the one used below
x = np.random.normal (0,10,5000)
y = np.random.normal (0,10,5000)
z = np.random.uniform(0,10,5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(True, lw=0.5, zorder=-1)
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
plt.imshow(ret.statistic.T, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
Which produces the following image (from random data):
Therefore, the simple question would be: how to smooth these colors?
Thanks in advance!
PS: sorry for excessive coding, but I believe a clear visualization is crucial for this particular problem.
Thanks to everyone who viewed this issue and tried to help!
I ended up being able to solve my own problem. In the end, it was all about image smoothing with Gaussian Kernel.
This link: Gaussian filtering a image with Nan in Python gave me the insight for the solution.
I basically implemented exactly the same code but, at the end, mapped the known NaN pixels from the original 2D array back onto the smoothed result. Unlike the solution from the link, my version does NOT fill NaN pixels with values derived from the surrounding pixels. Or rather, it does, but then I erase them again.
Here is the final figure produced for the example I provided:
Final code, for reference, for those who might need in the future:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import binned_statistic_2d
import scipy.stats as st
import scipy.ndimage
import scipy as sp
np.random.seed(999) # seed NumPy's generator, which is the one used below
x = np.random.normal (0,10,5000)
y = np.random.normal (0,10,5000)
z = np.random.uniform(0,10,5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(True, lw=0.5, zorder=-1)
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
sigma=1 # standard deviation for Gaussian kernel
truncate=5.0 # truncate filter at this many sigmas
U = ret.statistic.T.copy()
V=U.copy()
V[np.isnan(U)]=0
VV=sp.ndimage.gaussian_filter(V,sigma=sigma,truncate=truncate) # smoothed values, NaNs counted as 0
W=0*U.copy()+1
W[np.isnan(U)]=0
WW=sp.ndimage.gaussian_filter(W,sigma=sigma,truncate=truncate) # smoothed weights of the valid pixels
np.seterr(divide='ignore', invalid='ignore')
Z=VV/WW
Z[np.isnan(U)] = np.nan # restore the bins that were empty (NaN) in the original map
plt.imshow(Z, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks OK but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.
You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
spline was deprecated in SciPy 0.19.0 and has since been removed (in SciPy 1.3.0); use the BSpline class instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()
Before:
After:
For this example a spline works well, but if the function is not inherently smooth and you want a smoothed version, you can also try:
from scipy.ndimage import gaussian_filter1d
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
If you increase sigma, you get a smoother function.
Proceed with caution with this one. It modifies the original values and may not be what you want.
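To illustrate that caveat, here is a small sketch with the T/power data from the question. It compares how far each approach moves the curve away from the measured points; the names max_dev_spline and max_dev_filter are mine.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d

# Data from the question.
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])

# Interpolating cubic spline: passes through every data point.
spl = make_interp_spline(T, power, k=3)
max_dev_spline = np.max(np.abs(spl(T) - power))

# Gaussian filter: replaces each value by a weighted average of its neighbours.
filtered = gaussian_filter1d(power, sigma=2)
max_dev_filter = np.max(np.abs(filtered - power))

print("largest change at the data points (spline):", max_dev_spline)
print("largest change at the data points (filter):", max_dev_filter)
```

The spline only adds points between the measurements, whereas the filter changes the measurements themselves, which matters for steep data like this.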
See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.
One of the easiest implementations I found was to use the exponential moving average that TensorBoard uses:
from typing import List
def smooth(scalars: List[float], weight: float) -> List[float]: # Weight between 0 and 1
    last = scalars[0] # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point # Calculate smoothed value
        smoothed.append(smoothed_val) # Save it
        last = smoothed_val # Anchor the last smoothed value
    return smoothed
ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)
I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).
Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
#create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# print
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()
It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.
Another way to go, which slightly modifies the function depending on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15 # size of data (%) for estimation =~ smoothing window
    lowess_it = 0
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.