How to have the best gaussian fit on a histogram plot - python-3.x

I have a histogram and I'm trying to fit a normal (Gaussian) function to it, as shown in the code below. The problem is that the Gaussian fit is not as good as I expected.
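import matplotlib.pyplot as plt
import numpy as np
from astropy.modeling import models, fitting
bins = np.arange(-1, 8, 0.3)
# Reading data
a18 = np.loadtxt('AndXII18I.srt')
arr18 = np.array(a18[:, 11])
# axs was not defined in the original snippet; a 1x1 grid is assumed here
fig, axs = plt.subplots(1, 1, squeeze=False)
# y18[0] holds the bin counts, y18[1] the bin edges
y18 = axs[0, 0].hist(arr18, bins, histtype='step')
axs[0, 0].set_xlim([np.min(arr18), np.max(arr18)])
# left edges of the bins, used as x values for the fit
x = np.linspace(-1, bins[len(bins)-2], len(bins)-1)
x1 = np.linspace(-1, 8, 1000)
# guesses for the parameters:
g_init = models.Gaussian1D(1, 0, 1.)
fit_g = fitting.LevMarLSQFitter()
g18 = fit_g(g_init, x, y18[0])
# evaluate the fitted Gaussian on the fine grid before plotting it
mean18 = g18.mean.value
t18 = g18.amplitude.value*np.exp(-(x1-mean18)**2/(2*g18.stddev.value**2))
axs[0, 0].plot(x1, t18)
# points used in the fit (the original referenced undefined edges18/hist18 here)
axs[0, 0].plot(x, y18[0], 'o')
plt.show()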
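Two things that often hurt a fit like this are using the left bin edges instead of the bin centers as x values, and starting the optimizer from a guess far from the data. The sketch below is not the asker's method: it uses synthetic data in place of the `AndXII18I.srt` column (an assumption), swaps in `scipy.optimize.curve_fit` for the astropy fitter, and the helper name `gaussian` is made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, stddev):
    """Unnormalized Gaussian, same form as the formula in the question."""
    return amplitude * np.exp(-(x - mean) ** 2 / (2 * stddev ** 2))


# Synthetic stand-in for the real data column (assumption, not the asker's data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=0.8, size=5000)

bins = np.arange(-1, 8, 0.3)
counts, edges = np.histogram(sample, bins=bins)

# Fit against the bin centers rather than the left edges.
centers = 0.5 * (edges[:-1] + edges[1:])

# Data-driven starting values: peak height, sample mean, sample spread.
p0 = [counts.max(), sample.mean(), sample.std()]
popt, pcov = curve_fit(gaussian, centers, counts, p0=p0)
amp_fit, mean_fit, std_fit = popt
```

To draw the result, evaluate `gaussian(x1, *popt)` on a fine grid `x1` and plot it on the same axes as the histogram.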

Related

Is there a library that will help me fit data easily? I found fitter and will provide the code, but it shows some errors

So, here is my code:
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from fitter import Fitter, get_common_distributions
df = pd.read_csv("project3.csv")
bins = [282.33, 594.33, 906.33, 1281.33, 15030.33, 1842.33, 2154.33, 2466.33, 2778.33, 3090.33, 3402.33]
#declaring
facecolor = '#EAEAEA'
color_bars = '#3475D0'
txt_color1 = '#252525'
txt_color2 = '#004C74'
fig, ax = plt.subplots(1, figsize=(16, 6), facecolor=facecolor)
ax.set_facecolor(facecolor)
n, bins, patches = plt.hist(df.City1, color=color_bars, bins=10)
#grid
minor_locator = AutoMinorLocator(2)
plt.gca().xaxis.set_minor_locator(minor_locator)
plt.grid(which='minor', color=facecolor, lw = 0.5)
xticks = [(bins[idx+1] + value)/2 for idx, value in enumerate(bins[:-1])]
xticks_labels = [ "{:.0f}-{:.0f}".format(value, bins[idx+1]) for idx, value in enumerate(bins[:-1])]
plt.xticks(xticks, labels=xticks_labels, c=txt_color1, fontsize=13)
#beautify
ax.tick_params(axis='x', which='both',length=0)
plt.yticks([])
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
for idx, value in enumerate(n):
    if value > 0:
        plt.text(xticks[idx], value+5, int(value), ha='center', fontsize=16, c=txt_color1)
plt.title('Histogram of rainfall in City1\n', loc = 'right', fontsize = 20, c=txt_color1)
plt.xlabel('\nCentimeters of rainfall', c=txt_color2, fontsize=14)
plt.ylabel('Frequency of occurrence', c=txt_color2, fontsize=14)
plt.tight_layout()
#plt.savefig('City1_Raw.png', facecolor=facecolor)
plt.show()
city1 = df['City1'].values
f = Fitter(city1, distributions=get_common_distributions())
f.fit()
fig = f.plot_pdf(names=None, Nbest=4, lw=1, method='sumsquare_error')
plt.show()
print(f.get_best(method = 'sumsquare_error'))
The issue is with the plots it shows. It first generates the histogram, then a second graph with the best-fitted distributions, and finally this output:
{'chi2': {'df': 10.692966790090342, 'loc': 16.690849400411103, 'scale': 118.71595997157786}}
Process finished with exit code 0
I have a couple of questions. Why is chi2, the best-fitted distribution, not plotted on the graph?
How do I plot these distributions on top of the histogram rather than separately? The hist() function in the fitter library can do that, but it does not let me control the bins, so I end up with about 100 bins and flat-looking data.
How do I solve this? I need to plot the best-fit curve on the histogram that looks like the first image. Can I use any other module/package to do this in a similar way? This uses a least-squares fit, but I am OK with maximum likelihood or log-likelihood too.
Simple way of plotting things on top of each other (using some properties of the Fitter class)
import scipy.stats as st
import matplotlib.pyplot as plt
from fitter import Fitter, get_common_distributions
from scipy import stats
numberofpoints=50000
df = stats.norm.rvs( loc=1090, scale=500, size=numberofpoints)
fig, ax = plt.subplots(1, figsize=(16, 6))
n, bins, patches = ax.hist( df, bins=30, density=True)
f = Fitter(df, distributions=get_common_distributions())
f.fit()
errorlist = sorted(
    [
        [f._fitted_errors[dist], dist]
        for dist in get_common_distributions()
    ]
)[:4]
for err, dist in errorlist:
    ax.plot( f.x, f.fitted_pdf[dist] )
plt.show()
Note that this relies on the histogram being normalized (density=True). To overlay the curves on a histogram of raw counts, the fitted PDFs would have to be rescaled accordingly.
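As a rough illustration of the normalization remark: for equal-width bins, a PDF can be put on the same scale as raw counts by multiplying it by the number of samples times the bin width. The sketch below uses `scipy.stats.norm` directly instead of fitter, and the synthetic data and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic data similar to the answer's example (assumption).
rng = np.random.default_rng(1)
data = rng.normal(loc=1090, scale=500, size=50000)

# Raw counts with a bin count chosen by hand (not density-normalized).
counts, edges = np.histogram(data, bins=30)
bin_width = edges[1] - edges[0]

# Maximum-likelihood estimate of the normal parameters.
loc, scale = stats.norm.fit(data)

x = np.linspace(edges[0], edges[-1], 400)
pdf = stats.norm.pdf(x, loc=loc, scale=scale)

# Rescale the density so it is comparable to counts per bin.
expected_counts = pdf * data.size * bin_width
```

Plotting `expected_counts` against `x` on the axes that hold `ax.hist(data, bins=30)` should then line up with the bar heights without passing `density=True`.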

Plotting a dot that moves along side a dispersive wave?

How would I go about plotting a dot that moves along a wave packet/superposition? I saw this on https://blog.soton.ac.uk/soundwaves/further-concepts/2-dispersive-waves/ and wanted to try it myself. I know how to animate a superposition of two sine waves, but how would I plot a dot that moves along it? I won't post my entire code, but it looks somewhat like this:
import matplotlib.pyplot as plt
import numpy as np
N = 1000
x = np.linspace(0,100,N)
wave1 = np.sin(2*x)
wave2 = np.sin(3*x)
sWave = wave1+wave2
line, = plt.plot(x,sWave)  # keep the Line2D so its data can be updated
plt.ion()
for t in np.arange(0,400):
    line.set_ydata(sWave)
    plt.draw()
    plt.pause(.1)
plt.ioff()
plt.show()
Note that this is just a quick draft of my original code.
You can add a scatter and update its data in a loop by using .set_offsets().
import matplotlib.pyplot as plt
import numpy as np
N = 1000
x = np.linspace(0, 100, N)
wave1 = np.sin(2*x)
wave2 = np.sin(3*x)
sWave = wave1 + wave2
fig, ax = plt.subplots()
ax.plot(x, sWave)
scatter = ax.scatter([], [], facecolor="red") # Initialize an empty scatter.
for t in range(N):
    scatter.set_offsets((x[t], sWave[t])) # Modify that scatter's data.
    fig.canvas.draw()
    plt.pause(.001)
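The loop above moves the dot along a static curve. If the waves themselves travel, one option is to let the dot ride the envelope of the packet at the group velocity. The following is only a sketch: the dispersion relation `omega(k) = sqrt(k)` is an assumption picked for illustration, and `wave`, `envelope` and `dot_position` are hypothetical helper names.

```python
import numpy as np

# Wavenumbers taken from the question's sin(2*x) and sin(3*x).
k1, k2 = 2.0, 3.0


def omega(k):
    """Assumed dispersion relation (deep-water-like, arbitrary units)."""
    return np.sqrt(k)


w1, w2 = omega(k1), omega(k2)

# The beat envelope of two waves travels at delta(omega) / delta(k).
group_velocity = (w2 - w1) / (k2 - k1)


def wave(x, t):
    """Superposition of the two travelling sine waves."""
    return np.sin(k1 * x - w1 * t) + np.sin(k2 * x - w2 * t)


def envelope(x, t):
    """Slowly varying amplitude of the superposition."""
    return 2.0 * np.abs(np.cos(0.5 * ((k2 - k1) * x - (w2 - w1) * t)))


def dot_position(t, x0=0.0):
    """x coordinate of a dot that starts on an envelope maximum at x0."""
    return x0 + group_velocity * t
```

Inside the animation loop one could keep the `Line2D` returned by `ax.plot`, call its `set_ydata(wave(x, t))`, and pass `(dot_position(t), envelope(dot_position(t), t))` to `scatter.set_offsets`.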

How to plot Ocean Currents with Cartopy

I am trying to plot a netCDF4 file containing ocean currents from a NASA database for a project, but I keep getting errors such as "x and y coordinates are not compatible with the shape of the vector components".
I tried changing the streamplot to a contourf, but it then complained that the input needed to be a 2D array; I tried to fix that but could not get it to work.
import os
import matplotlib.pyplot as plt
from netCDF4 import Dataset as netcdf_dataset
import numpy as np
import cartopy.crs as ccrs
fname = "oscar_vel2019.nc.gz.nc"
data=netcdf_dataset(fname)
v = data.variables['v'][0, :, :, :]
vf = data.variables['vm'][0, :, :, :]
u = data.variables['u'][0, :, :, :]
uf = data.variables['um'][0, :, :, :]
lats = data.variables['latitude'][:]
lons = data.variables['longitude'][:]
ax = plt.axes(projection=ccrs.PlateCarree())
mymap=plt.streamplot(lons, lats, u, v, 60, transform=ccrs.PlateCarree())
ax.coastlines()
plt.show()
I would like it to work such that the ocean currents are visible on the plot and to show the movement of particles in the currents through an animation. I really don't have much knowledge with this which is why I am asking. Here is the link from which I got the file: https://podaac-opendap.jpl.nasa.gov/opendap/hyrax/allData/oscar/preview/L4/oscar_third_deg/oscar_vel2019.nc.gz.html
OK, I downloaded the data. The problem is that u and v are 4-dimensional, so you need to squeeze out the "depth" dimension. Cartopy also doesn't accept longitudes greater than 180, and you probably won't get away with stream plotting the whole thing. Also, density=60 will take forever...
This is ugly, but gives you the idea.
import xarray as xr
import numpy as np
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
with xr.open_dataset('/Users/jklymak/downloads/oscar_vel2019.nc.gz.nc') as ds:
    print(ds)
    ax = plt.axes(projection=ccrs.PlateCarree())
    dec = 10
    lon = ds.longitude.values[::dec]
    lon[lon>180] = lon[lon>180] - 360
    mymap=plt.streamplot(lon, ds.latitude.values[::dec], ds.u.values[0, 0, ::dec, ::dec], ds.v.values[0, 0, ::dec, ::dec], 6, transform=ccrs.PlateCarree())
    ax.coastlines()
    plt.show()
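Subtracting 360 in place may leave the longitude array non-monotonic, and as far as I know `streamplot` expects strictly increasing, evenly spaced coordinates. Below is a small sketch of wrapping the longitudes and reordering the field columns to match. The helper name `wrap_and_sort_longitudes` and the toy arrays are made up for illustration; it assumes longitude is the last axis of the field.

```python
import numpy as np


def wrap_and_sort_longitudes(lon, field):
    """Map longitudes to [-180, 180) and reorder the last axis of field to match.

    Duplicate meridians (e.g. 20 and 380 degrees) are collapsed, keeping
    the first occurrence.
    """
    lon = np.asarray(lon, dtype=float)
    wrapped = (lon + 180.0) % 360.0 - 180.0
    # np.unique returns the sorted unique values and the index of the
    # first occurrence of each of them in the input.
    unique_lon, first_index = np.unique(wrapped, return_index=True)
    return unique_lon, np.asanyarray(field)[..., first_index]


# Toy grid that runs past 360 degrees, like a dataset spanning 20..380.
lon = np.arange(20.0, 420.0, 40.0)
u = np.arange(2 * lon.size).reshape(2, lon.size)

new_lon, new_u = wrap_and_sort_longitudes(lon, u)
```

The same call would be applied to both `u` and `v` (after selecting a single time and depth) before passing them to `streamplot`.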

Smooth curves in Python Plots [duplicate]

I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks OK but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.
You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
Note that spline was deprecated in SciPy 0.19.0 and has been removed in later releases; use make_interp_spline (which returns a BSpline) instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()
For this example a spline works well, but if the function is not inherently smooth and you want a smoothed version, you can also try:
from scipy.ndimage import gaussian_filter1d
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
If you increase sigma, you get a smoother function.
Proceed with caution with this one. It modifies the original values and may not be what you want.
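To make the caution concrete, here is a quick comparison on the data from the question. It is a sketch, not a benchmark: it only checks how far each method moves the curve at the original sample points, and the variable names are chosen for illustration.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d

T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53e3, 5.92e2, 2.04e2, 7.24e1, 2.72e1, 1.10e1, 4.70e0])

# An interpolating spline passes through every original point.
spl = make_interp_spline(T, power, k=3)
max_spline_shift = np.max(np.abs(spl(T) - power))

# A Gaussian filter replaces each point by a weighted average of its
# neighbours, so the values at the original points change.
filtered = gaussian_filter1d(power, sigma=2)
max_filter_shift = np.max(np.abs(filtered - power))
```

Here `max_spline_shift` is essentially zero, while `max_filter_shift` is large because the steep first point is averaged with much smaller neighbours.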
See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.
One of the easiest implementations I found was to use the exponential moving average that TensorBoard uses:
from typing import List
def smooth(scalars: List[float], weight: float) -> List[float]: # Weight between 0 and 1
    last = scalars[0] # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point # Calculate smoothed value
        smoothed.append(smoothed_val) # Save it
        last = smoothed_val # Anchor the last smoothed value
    return smoothed
ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)
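If pandas is already in use, the same recurrence can be written with `Series.ewm`. This is a sketch under the assumption that `weight` is in [0, 1); the wrapper name `smooth_ewm` is made up for illustration.

```python
import pandas as pd


def smooth_ewm(scalars, weight):
    """Exponential moving average equivalent to the loop-based smooth().

    alpha = 1 - weight reproduces last * weight + (1 - weight) * point,
    and adjust=False makes the first output equal to the first input.
    """
    series = pd.Series(scalars, dtype=float)
    return series.ewm(alpha=1.0 - weight, adjust=False).mean().tolist()


example = smooth_ewm([1.0, 2.0, 3.0], 0.5)
```

For the input `[1.0, 2.0, 3.0]` with `weight=0.5` this gives 1.0, 1.5 and 2.25, which shows how the smoothed curve lags behind a rising signal.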
I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).
Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
#create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# print
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()
It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.
Another way to go, which slightly modifies the function depending on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15 # size of data (%) for estimation =~ smoothing window
    lowess_it = 0
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.

How can I add a normal distribution curve to multiple histograms?

With the following code I create four histograms:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.normal((1, 2, 3 , 4), size=(100, 4)))
data.hist(bins=10)
I want the histograms to look like this:
I know how to make it one graph at the time, see here
But how can I do it for multiple histograms without specifying each single one? Ideally I could use 'pd.scatter_matrix'.
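Plot each histogram separately and do the fit for each histogram as in the example you linked, or take a look at the hist API example here. Note that mlab.normpdf and the normed argument have been removed from recent Matplotlib versions, so the version below uses scipy.stats.norm.pdf and density=True, and fills in the data placeholder with the DataFrame from the question. Essentially what should be done is
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
# data from the question
data = pd.DataFrame(np.random.normal((1, 2, 3, 4), size=(100, 4)))
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
for ax, col in zip([ax1, ax2, ax3, ax4], data.columns):
    values = data[col]
    n, bins, patches = ax.hist(values, 50, density=True, facecolor='green', alpha=0.75)
    bincenters = 0.5*(bins[1:]+bins[:-1])
    # normal curve using the sample mean and standard deviation of this column
    y = norm.pdf(bincenters, values.mean(), values.std())
    l = ax.plot(bincenters, y, 'r--', linewidth=1)
plt.show()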

Resources