Plotting trends and predictions data from OLS (statsmodels) - python-3.x

I have this data from 1992 to 2016:
Year month data stdBS index1
1992-05-01 1992 5 302.35 31.69 727319
1992-06-01 1992 6 305.07 27.59 727350
1992-07-01 1992 7 297.12 29.12 727380
1992-08-01 1992 8 304.39 21.41 727411
1992-09-01 1992 9 294.30 32.26 727442
Using this code:
flow2=fmO['data']
fig,ax = plt.subplots(1,1, figsize=(6,4))
res2 = sm.tsa.seasonal_decompose(flow2)
residual = res2.resid
seasonal = res2.seasonal
trend = res2.trend
fig = res2.plot()
plt.show()
I obtained this plot:
Everything is fine, but now I need to plot a linear fit (the predictions) of the trend:
trend2 = trend.reset_index()
X = fm0.index1
y = trend
X = sm.add_constant(X)
model = sm.OLS(y,X, missing='drop')
results = model.fit()
predictions = results.predict(X)
p = results.summary()
With this short code:
fig, ax = plt.subplots(figsize=(8,4))
ax.scatter(df0.index, trend)
ax.plot(df0.index, df0.predic, 'r')
ax.set_ylabel('Data')
I obtained this plot:
But I lost the index of the original trend plot. Is there a simple way to plot the trend data from sm.tsa.seasonal_decompose together with the linear fit predictions against the original time index?

Related

Bar plot and line plot shifts away from each other when using twinx()

I am working with NetCDF4 data, which I open with xarray. I want to plot a line and a bar chart in a single figure, so I used twinx(). The problem is that the xticks of the line plot and the bar plot do not match: the line plot is shifted one unit (one month) to the right relative to the bar plot.
ERA5_land = xr.open_dataset('Era5landmon2001_2021nc.sec')
prec = ERA5_land.tp.groupby('time.month').sum().sel(longitude = 8.4, latitude = 49) #calculate the sum of precipitation at (49, 8.4)
temp = ERA5_land.t2m.groupby('time.month').mean().sel(longitude = 8.4, latitude = 49) #calculate the mean of temperature at (49, 8.4)
f,ax1 = plt.subplots()
prec.to_series().plot.bar(ax = ax1,)
ax2 = ax1.twinx()
temp.plot( ax = ax2)
The data can be downloaded from here: ERA5-Land hourly data from 1950 to present. The variables are '2m temperature' and 'total precipitation', measured from 2001 to 2021. All the 12 months in a year were taken into account.
My result:
Time shift of the line plot
I tried ax2.set_xticks(ticks = ax1.get_xticks()) and ax2.set_xticks(ticks = range(1,13)), but neither worked.
f,ax1 = plt.subplots()
prec.to_series().plot.bar(ax = ax1)
ax2 = ax1.twinx()
ax2.set_xticks(ticks = ax1.get_xticks())
temp.plot( ax = ax2)
I also know that the xticks of the two plots are different, but I don't know how to make them match.
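A likely explanation is that the pandas bar plot places its bars at the positions 0 to 11 and only labels them 1 to 12, while the line is drawn at the actual month values 1 to 12. The sketch below uses synthetic monthly values in place of the ERA5-Land data (an assumption) and draws the line at the same positions as the bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display the figure interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly climatology standing in for prec and temp (assumption).
months = np.arange(1, 13)
prec = pd.Series(np.linspace(40, 80, 12), index=months, name="tp")
temp = pd.Series(10 - 10 * np.cos(2 * np.pi * (months - 1) / 12),
                 index=months, name="t2m")

fig, ax1 = plt.subplots()
prec.plot.bar(ax=ax1)          # bars sit at x = 0..11, labelled 1..12

ax2 = ax1.twinx()
positions = np.arange(len(prec))
# Plot the line at the bar positions rather than at the month values.
(line,) = ax2.plot(positions, temp.to_numpy(), color="r", marker="o")

ax1.set_xlabel("month")
ax1.set_ylabel("precipitation")
ax2.set_ylabel("temperature")
```

With an xarray DataArray, the same idea would be to pass temp.values to ax2.plot instead of calling temp.plot.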

Too long processing time for a 3-d plot

I am comparatively new to Python, so I cannot tell whether there is something wrong with my code or whether the process simply takes this long to complete.
I wrote code to plot a large dataset (3d array) in a 3d plot, but my PC takes forever to finish, if it finishes at all. I have been waiting for nearly an hour for it to complete.
a = pd.DataFrame(np.array([Ensemble_test,df['RF'],y])).transpose()
a # is a dataset with dimensions 335516 rows × 3 columns
### All 3 columns are numeric
Output:
0 1 2
0 172.981614 130.624674 -42.356940
1 189.851754 139.632304 -50.219450
## I tried plotting using following
from mpl_toolkits.mplot3d import Axes3D
df=a.unstack().reset_index()
df.columns=["X","Y","Z"]
df['X']=pd.Categorical(df['X'])
df['X']=df['X'].cat.codes
# Make the plot
fig = plt.figure(figsize = (8,8))
ax = fig.gca(projection='3d')
im = ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap='Spectral', linewidth=0.001, vmax = 30,
vmin = -30, antialiased=True)
ax.view_init(40,20)
#fig.colorbar(im, ax=ax, fraction = 0.023)
ax.set_ylabel('RD')
ax.set_zlabel('Difference')
ax.set_xlabel('Ensemble')
I wanted to have a 3-d plot but the process takes too long. I don't know what the problem is.
Any other alternatives/suggestions for 3-d plotting are also welcome.
[My PC is 8th gen 'i7' with '16 GB' RAM]
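One plausible reason for the long run time is that plot_trisurf has to triangulate and render every point, which becomes very expensive for hundreds of thousands of rows. A cheaper alternative, sketched here on synthetic data with hypothetical column names, is to draw a random subsample as a 3-D scatter plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display the figure interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the 335516 x 3 frame; column names are hypothetical.
rng = np.random.default_rng(0)
n = 335_516
a = pd.DataFrame({"Ensemble": rng.normal(180, 10, n),
                  "RF": rng.normal(135, 10, n)})
a["Difference"] = a["RF"] - a["Ensemble"]

# A few thousand points are usually enough to see the shape of the cloud.
sample = a.sample(n=5000, random_state=0)

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")
sc = ax.scatter(sample["Ensemble"], sample["RF"], sample["Difference"],
                c=sample["Difference"], cmap="Spectral",
                vmin=-30, vmax=30, s=2)
ax.view_init(40, 20)
ax.set_xlabel("Ensemble")
ax.set_ylabel("RF")
ax.set_zlabel("Difference")
fig.colorbar(sc, ax=ax, fraction=0.023)
```

The sample size of 5000 is arbitrary; it trades detail for drawing time.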

Divide subplots onto multiple screens

I want to plot some data from a csv file using dataframe.
My code currently displays 266=14*19 separate subplots. Currently, it is coded to display all 266 subplots on one screen and each is small and difficult to read.
I tried to use a 1x1 plot
#fig, cx=plt.subplots(1,1, sharex=False, sharey=False, figsize=(18,12))
Main code:
fig, cx=plt.subplots(14,19, sharex=False, sharey=False, figsize=(18,12))
#fig, cx=plt.subplots(1,1, sharex=False, sharey=False, figsize=(18,12))
plt.subplots_adjust(hspace=0.5)
cx = cx.ravel()
for i in range(0,len(Bond)):
    cx[i].plot(VelLog[Bond[i]], color='b')
    cx[i].set_xlabel('Time (ms)')
    cx[i].set_ylabel('Velocity (m/s)')
    cx[i].set_ylim(-250,150)
    cx[i].set_title(Bond[i])
plt.savefig('Velocity.png', dpi=120)
plt.show()
##################################
Error message when I un-comment Line 1
cx[i].plot(VelLog[Bond[i]], color='b')
IndexError: index 209 is out of bounds for axis 0 with size 209
How can I only display a few subplots on a screen at a time to increase readability?
For example, ten 5x5 screens plus one 4x4 screen (10*25 + 16 = 266), i.e. 11 different screens.
Is there a way to add a chart filter as an alternative?
Here is my updated code with your suggestions. It now creates 11 figures. I was able to plot all 266 graphs, but each graph looks the same:
Nrows, Ncols = 14, 19
Nplots = Nrows*Ncols
nrows, ncols = 5, 5
naxes = nrows*ncols
nfigures = Nplots//naxes
count = 0
figures = []
for fignum in range(nfigures+1):
    fig, axes = plt.subplots(nrows, ncols, sharex=False, sharey=False, figsize=(18,12))
    plt.subplots_adjust(hspace=0.5)
    #axes = axes.ravel()
    figures.append(fig)
    axes = axes.flat
    for ax in axes:
        #print(count)
        if count<Nplots:
            for i in range(0,len(Bond)):
                cvs_row, cvs_col = divmod(count, Ncols)
                cvs_row, cvs_col = cvs_row+1, cvs_col+1
                ax.plot(VelLog[Bond[i]], color='b', label='%d,%d'%(cvs_row, cvs_col))
                ax.set_xlabel('Time (ms)')
                ax.set_ylabel('Velocity (m/s)')
                ax.set_ylim(-250,150)
                ax.set_title(Bond[i])
        count = count+1
plt.savefig('Velocity.png', dpi=120)
plt.show()
Here is the result for one figure
Here I plot the same function 14*19 times, but you can get an idea
We have two problems, keeping track of what we are plotting and
delaying the actual plotting until we are finished with all of them.
The first issue is solved here using a global counter and the divmod
builtin, the second storing all the figures in a list and calling
plt.show() only at the very end of the plotting phase.
To show what I mean with "keeping track" I have added a label and a
legend to each individual subplot.
To keep things reasonable I don't check whether the last figure is empty,
nor whether some subplots of the last figure are empty,
but remember that you can remove subplots from a figure if you
want a clean last figure.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, np.pi, 31)
y = np.sin(x)
Nrows, Ncols = 14, 19
Nplots = Nrows*Ncols
nrows, ncols = 5, 5
naxes = nrows*ncols
nfigures = Nplots//naxes
count = 0
figures = []
for fignum in range(nfigures+1):
    fig, axes = plt.subplots(nrows, ncols)
    figures.append(fig)
    axes = axes.flat
    for ax in axes:
        print(count)
        if count<Nplots:
            cvs_row, cvs_col = divmod(count, Ncols)
            cvs_row, cvs_col = cvs_row+1, cvs_col+1
            ax.plot(x, y, label='%d,%d'%(cvs_row, cvs_col))
            ax.legend()
        count = count+1
plt.show()
Here I show just the last figure, to show what I mean with "empty
subplots"...
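In the updated code from the question, every subplot appears to look the same because the inner loop over Bond draws all series onto each Axes. A minimal sketch of one way around this is to split the list of names into chunks of 25 and give each name its own Axes. The names and data below are synthetic placeholders (assumptions) for Bond and VelLog.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display the figures interactively
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for Bond (names) and VelLog (one series per name).
rng = np.random.default_rng(0)
names = [f"bond{i}" for i in range(266)]
series = {name: rng.normal(0, 50, 40) for name in names}

nrows, ncols = 5, 5
per_fig = nrows * ncols
figures = []

for start in range(0, len(names), per_fig):
    chunk = names[start:start + per_fig]       # at most 25 names per figure
    fig, axes = plt.subplots(nrows, ncols)
    flat = axes.ravel()
    for ax, name in zip(flat, chunk):          # exactly one series per Axes
        ax.plot(series[name], color="b")
        ax.set_ylim(-250, 150)
        ax.set_title(name, fontsize=6)
    for ax in flat[len(chunk):]:               # drop unused Axes on the last page
        ax.remove()
    figures.append(fig)
```

Each figure can then be saved under its own file name, for example with a page number in the name, so that later pages do not overwrite earlier ones.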

Plotting Multiple Plots on a single figure from within a for loop - Python

I have reviewed the response to this question: How would I iterate over a list of files and plot them as subplots on a single figure?
But am none the wiser on how to achieve my goal. I would like to plot multiple data sets, with differing x axes, onto a single figure in Python. I have included a snippet of my code below, which performs an FFT on a dataset, then calculates 3 Butterworth filter outputs. Ideally I would like to have all plotted on a single figure, which I have attempted to achieve in the code below.
The for loop calculates the 3 Butterworth filter outputs, the code above - the FFT and the code directly below attempts to append the FFT curve and sqrt(0.5) line to the previously generated plots for display.
Any Direction or advice would be appreciated.
"""Performs a Fast Fourier Transform on the data specified at the base of the code"""
def FFT(col):
    x = io2.loc[1:,'Time']
    y = io2.loc[1:,col]
    # Number of samplepoints
    #N = 600
    N = pd.Series.count(x)
    N2 = int(N/2)
    # sample spacing
    #T = 1.0 / 800.0
    T = 1/(io2.loc[2,'Time'] - io2.loc[1,'Time'])
    #x = np.linspace(0.0, N*T, N)
    #y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
    yf = scipy.fftpack.fft(y)
    xf = np.linspace(0.0, 1.0/(2.0*T), N2)
    fig=plt.figure()
    plt.clf()
    i=1
    for order in [3, 6, 9]:
        ax=fig.add_subplot(111, label="order = %d" % order)
        b, a = butter_lowpass(cutoff, fs, order=order)
        w, h = freqz(b, a, worN=2000)
        ax.plot((fs * 0.5 / np.pi) * w, abs(h))
        i=i+1
    ax4=fig.add_subplot(111, label='sqrt(0.5)', frame_on=False)
    ax5=fig.add_subplot(111, label="FFT of "+col, frame_on=False)
    ax4.plot([0, 0.5 * fs], [np.sqrt(0.5), np.sqrt(0.5)], '--')
    ax5.plot(xf, 2.0/N * np.abs(yf[:N2]))
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain')
    plt.grid(True)
    plt.legend(loc='best')
    #fig, ax = plt.subplots()
    #ax.plot(xf, 2.0/N * np.abs(yf[:N2]), label="FFT of "+col)
    plt.axis([0,5000,0,0.1])
    #plt.xlabel('Frequency (Hz)')
    #plt.ylabel('Amplitude (mm)')
    #plt.legend(loc=0)
    plt.show()
    return
Kind Regards,
Here you can find a minimal example of how to plot multiple lines with different x and y datasets. You are recreating the plot every time you type add_subplot(111). Instead, you should call plot multiple times. I have added an example for a single plot with multiple lines, as well as an example for one subplot per line.
import numpy as np
import matplotlib.pyplot as plt
x1 = np.arange(0, 10, 1)
x2 = np.arange(3, 12, 0.1)
x3 = np.arange(2, 8, 0.01)
y1 = np.sin(x1)
y2 = np.cos(x2**0.8)
y3 = np.sin(4.*x3)**3
data = []
data.append((x1, y1, 'label1'))
data.append((x2, y2, 'label2'))
data.append((x3, y3, 'label3'))
# All lines in one plot.
plt.figure()
for n in data:
    plt.plot(n[0], n[1], label=n[2])
plt.legend(loc=0, frameon=False)
# One subplot per data set.
cols = 2
rows = len(data)//2 + len(data)%2
plt.figure()
gs = plt.GridSpec(rows, cols)
for n in range(len(data)):
    i = n%2
    j = n//2
    plt.subplot(gs[j,i])
    plt.plot(data[n][0], data[n][1])
    plt.title(data[n][2])
plt.tight_layout()
plt.show()
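Applied to the filter question, the same idea could look like the sketch below: one Axes is created once, and every curve (the three Butterworth responses, the sqrt(0.5) line and the FFT) is added with its own plot call and label. The sampling rate, cutoff and test signal are made-up values, and scipy.signal.butter is called directly as a stand-in for the butter_lowpass helper, which is not shown in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display the figure interactively
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import butter, freqz

fs = 10_000.0      # sampling rate in Hz (assumed value)
cutoff = 1_000.0   # filter cutoff in Hz (assumed value)

# Synthetic signal standing in for the measured column.
N = 2000
t = np.arange(N) / fs
y = 0.05 * np.sin(2 * np.pi * 500 * t) + 0.02 * np.sin(2 * np.pi * 3000 * t)

fig, ax = plt.subplots()   # a single Axes shared by all curves

for order in (3, 6, 9):
    b, a = butter(order, cutoff, btype="low", fs=fs)
    w, h = freqz(b, a, worN=2000, fs=fs)   # w is returned in Hz because fs is given
    ax.plot(w, np.abs(h), label="order = %d" % order)

ax.axhline(np.sqrt(0.5), linestyle="--", color="k", label="sqrt(0.5)")

# One-sided amplitude spectrum of the signal on the same Axes.
yf = np.fft.rfft(y)
xf = np.fft.rfftfreq(N, d=1 / fs)
ax.plot(xf, 2.0 / N * np.abs(yf), label="FFT of signal")

ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Gain")
ax.grid(True)
ax.legend(loc="best")
```

If the gain and the FFT amplitude differ too much in scale, the FFT could instead go on a secondary axis created with ax.twinx().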

How to plot cdf on histogram in matplotlib

I currently have a script that will plot a histogram of relative frequency, given a pandas series. The code is:
def to_percent3(y, position):
    s = str(100 * y)
    if matplotlib.rcParams['text.usetex'] is True:
        return s + r'$\%$'
    else:
        return s + '%'
df = pd.read_csv('mycsv.csv')
waypointfreq = df['Waypoint Frequency(Secs)']
cumfreq = df['Waypoint Frequency(Secs)']
perctile = np.percentile(waypointfreq, 95) # calculates the 95th percentile
bins = np.arange(0,perctile+1,1) # bin edges in steps of 1, up to just past the 95th percentile
plt.hist(waypointfreq, bins = bins, density=True) # density replaces the removed normed argument
formatter = FuncFormatter(to_percent3) #changes y axis to percent
plt.gca().yaxis.set_major_formatter(formatter)
plt.axis([0, perctile, 0, 0.03]) # limits the axes to the 95th percentile and 3% relative frequency
plt.xlabel('Waypoint Frequency(Secs)')
plt.xticks(np.arange(0, perctile, 15.0))
plt.title('Relative Frequency of Average Waypoint Frequency')
plt.grid(True)
plt.show()
It produces a plot that looks like this:
What I'd like is to overlay this plot with a line showing the cdf, plotted against a secondary axis. I know that I can create the cumulative graph with the command:
waypointfreq = df['Waypoint Frequency(Secs)']
perctile = np.percentile(waypointfreq, 95) # calculates the 95th percentile
bins = np.arange(0,perctile+5,1) # bin edges in steps of 1, up to just past the 95th percentile
plt.hist(waypointfreq, bins = bins, density=True, histtype='stepfilled',cumulative=True)
formatter = FuncFormatter(to_percent3) #changes y axis to percent
plt.gca().yaxis.set_major_formatter(formatter)
plt.axis([0, perctile, 0, 1]) # limits the axes to the 95th percentile and 100% cumulative frequency
plt.xlabel('Waypoint Frequency(Secs)')
plt.xticks(np.arange(0, perctile, 15.0))
plt.title('Cumulative Frequency of Average Waypoint Frequency')
plt.grid(True)
plt.savefig(r'output\4 Cumulative Frequency of Waypoint Frequency.png', bbox_inches='tight')
plt.show()
However, this is plotted on a separate graph, instead of over the previous one. Any help or insight would be appreciated.
Maybe this code snippet helps:
import numpy as np
from scipy.integrate import cumulative_trapezoid  # named cumtrapz in older SciPy releases
from scipy.stats import norm
from matplotlib import pyplot as plt
n = 1000
x = np.linspace(-3,3, n)
data = norm.rvs(size=n)
data = data + abs(min(data))
data = np.sort(data)
cdf = cumulative_trapezoid(x=x, y=data )
cdf = cdf / max(cdf)
fig, ax = plt.subplots(ncols=1)
ax1 = ax.twinx()
ax.hist(data, density=True, histtype='stepfilled', alpha=0.2)
ax1.plot(data[1:],cdf)
If your CDF is not smooth, you could fit a distribution
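As an alternative to integrating, an empirical CDF can be computed directly from the sorted sample and drawn on a secondary axis. This is a minimal sketch on synthetic data (the exponential sample is an assumption standing in for the 'Waypoint Frequency(Secs)' column), not the method used in the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display the figure interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample standing in for the waypoint frequencies (assumption).
rng = np.random.default_rng(0)
data = rng.exponential(scale=30.0, size=1000)

fig, ax = plt.subplots()
ax.hist(data, bins=30, density=True, alpha=0.4)
ax.set_xlabel("Waypoint Frequency(Secs)")
ax.set_ylabel("Relative frequency")

# Empirical CDF: the k-th smallest value has cumulative probability k/n.
xs = np.sort(data)
cdf = np.arange(1, len(xs) + 1) / len(xs)

ax2 = ax.twinx()            # secondary y axis sharing the same x axis
ax2.plot(xs, cdf, color="r")
ax2.set_ylim(0, 1)
ax2.set_ylabel("Cumulative frequency")
```

Because both Axes share the x axis, the CDF line overlays the histogram instead of appearing in a separate figure.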
