Break a pandas line plot at specific date - python-3.x

I have a time-series dataframe with missing data for some time period. I would like to create a line plot and break a line where there is missing data.
data_site1_ave[["samples", "lkt"]].plot(figsize=(15,4), title = "Site 1", xlabel='')
Is it possible to create a gap, let's say from 2018-05-01 to 2018-10-30 in the line plot?

Yes, you can create arbitrary gaps by calling df.plot() several times, once for each slice of the full dataframe that should be drawn. To make everything appear in the same plot, pass the same Axes object to every call through the ax keyword argument of df.plot(). Turn the legend off for all but one call, so that the legend only has the one entry.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create sample time series
N = 365
np.random.seed(42)
x = pd.date_range('2018-01-01', freq='d', periods=N)
y = np.cumsum(np.random.rand(N, 1) - 0.5)
df = pd.DataFrame(y, columns=['y'], index=x)
# plot time series with gap
fig, ax = plt.subplots()
df.loc[:'2018-05-01'].plot(ax=ax, c='blue')
df.loc['2018-10-31':].plot(ax=ax, c='blue', legend=False);
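An alternative that needs only one plot call is to blank out the gap with NaN, since the line is not drawn across missing values. This is a minimal sketch, assuming a synthetic daily series like the one above; the gap bounds come from the question, and the name df_gap is only illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic daily series, similar to the example above
rng = np.random.default_rng(42)
x = pd.date_range("2018-01-01", freq="D", periods=365)
df = pd.DataFrame({"y": np.cumsum(rng.random(365) - 0.5)}, index=x)

# copy the frame and blank out the period that should not be drawn
df_gap = df.copy()
df_gap.loc["2018-05-01":"2018-10-30", "y"] = np.nan

# a single call: the line is interrupted wherever the values are NaN
fig, ax = plt.subplots(figsize=(15, 4))
df_gap.plot(ax=ax, color="blue")
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because there is only one plot call, the legend needs no special handling.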

Related

How to create a line plot in python, by importing data from excel and using it to create a plot that shares a common X-Axis?

I am trying to create a plot using Python in Spyder. I have sample data in Excel which I am able to import into Spyder. I want one column ('Frequency') to be the X axis, and the rest of the columns ('C1', 'C2', 'C3', 'C4') to be plotted on the Y axis. How do I do this? This is the data in Excel and how the plot looks in Excel: https://i.stack.imgur.com/eRug5.png
This is what I have so far. The commands below (also seen in the image) give an empty plot.
data = data.head()
#data.plot(kind='line', x='Frequency', y=['C1','C2','C3','C4'])
df = pd.DataFrame(data, columns=["Frequency","C1", "C2","C3","C4"])
df.plot(x = "Frequency",y=["C1", "C2","C3","C4"])
Here is an example; you can change the column names:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'X_Axis':[1,3,5,7,10,20],
'col_2':[.4,.5,.4,.5,.5,.4],
'col_3':[.7,.8,.9,.4,.2,.3],
'col_4':[.1,.3,.5,.7,.1,.0],
'col_5':[.5,.3,.6,.9,.2,.4]})
dfm = df.melt('X_Axis', var_name='cols', value_name='vals')
g = sns.catplot(x="X_Axis", y="vals", hue='cols', data=dfm, kind='point')
import pandas as pd
import matplotlib.pyplot as plt
path = r"C:\Users\Alisha.Walia\Desktop\Alisha\SAMPLE.xlsx"
data = pd.read_excel(path)
#df = pd.DataFrame.from_dict(data)
#print(df)
#prints out data from Excel in tabular format
dict1 = (data.to_dict()) #print(dict1)
Frequency=data["Frequency "].to_list() #print (Frequency)
C1=data["C1"].to_list() #print(C1)
C2=data["C2"].to_list() #print(C2)
C3=data["C3"].to_list() #print(C3)
C4=data["C4"].to_list() #print(C4)
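plt.style.use('ggplot') # choose the style before plotting so that it applies to the lines
plt.plot(Frequency,C1, label='C1') # labels are needed for plt.legend() below
plt.plot(Frequency,C2, label='C2')
plt.plot(Frequency,C3, label='C3')
plt.plot(Frequency,C4, label='C4')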
plt.title('SAMPLE')
plt.xlabel('Frequency 20Hz-200MHz')
plt.ylabel('Capacitance pF')
plt.xlim(5, 500)
plt.ylim(-20,20)
plt.legend()
plt.show()
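The same figure can probably be drawn by pandas in a single call, provided the column names match exactly. The sketch below assumes a small made-up table in place of the Excel file, with a header that carries a trailing space ("Frequency ") as the code above suggests; the values are invented for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# made-up stand-in for pd.read_excel(path); note the trailing space in "Frequency "
data = pd.DataFrame({
    "Frequency ": [20, 50, 100, 200, 400],
    "C1": [1.0, 2.5, 4.0, 3.0, 1.5],
    "C2": [-2.0, -1.0, 0.5, 2.0, 3.5],
    "C3": [5.0, 4.0, 3.0, 2.0, 1.0],
    "C4": [0.0, 1.0, 0.0, -1.0, 0.0],
})

# normalise the headers so that "Frequency " becomes "Frequency"
data.columns = data.columns.str.strip()

# one call draws every y column against the shared x column, with a legend
ax = data.plot(x="Frequency", y=["C1", "C2", "C3", "C4"], title="SAMPLE")
ax.set_xlabel("Frequency")
ax.set_ylabel("Capacitance pF")
```

A stray space in a header is one possible reason for a KeyError or an empty plot when selecting columns by name, so printing data.columns is a quick first check.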

X and Y label being cut in matplotlib plots

I have this code:
import pandas as pd
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
start = datetime.date(2016,1,1)
end = datetime.date.today()
stock = 'fb'
fig = plt.figure(dpi=1400)
data = web.DataReader(stock, 'yahoo', start, end)
fig, ax = plt.subplots(dpi=720)
data['vol_pct'] = data['Volume'].pct_change()
data.plot(y='vol_pct', ax = plt.gca(), title = 'this is the title \n second line')
ax.set(xlabel="Date")
ax.legend(loc='upper center', bbox_to_anchor=(0.32, -0.22), shadow=True, ncol=2)
plt.savefig('Test')
This is an example from another piece of code, but the problem is the same:
At the bottom of the plot you can see that the legend is being cut off. In another plot, from different code that I am working on, even the ylabel is cut off when I save the plot using plt.savefig('Test'). How can I fix this?
It's a long-standing issue with .savefig() that it doesn't check legend and axis locations before setting bounds. As a rule, I solve this with the bbox_inches argument:
plt.savefig('Test', bbox_inches='tight')
This is similar to calling plt.tight_layout(), but takes all of the relevant artists into account, whereas tight_layout will often pull some objects into frame while cutting off new ones.
I have to tell pyplot to keep it tight more than half the time, so I'm not sure why this isn't the default behavior.
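To see the difference without the stock data, here is a small sketch that saves the same figure twice, once with the default bounds and once with bbox_inches='tight'. The plotted values are made up, and the output goes to in-memory buffers rather than to 'Test' so that nothing is written to disk:

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="series")
ax.set(xlabel="Date", title="this is the title \n second line")
# place the legend below the axes, where the default save region would clip it
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.22), shadow=True, ncol=2)

# save twice into in-memory buffers: default bounds versus tight bounds
default_png = io.BytesIO()
fig.savefig(default_png, format="png")
tight_png = io.BytesIO()
fig.savefig(tight_png, format="png", bbox_inches="tight")
```

If the tight result sits too close to the edge, savefig also accepts a pad_inches argument to add a margin around the computed bounding box.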
plt.subplots_adjust(bottom=0.4 ......)
I think this modification will work for you.
Alternatively, you could relocate the legend with loc="upper left".
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots_adjust.html
Please also check this issue, which was raised 8 years ago:
Moving matplotlib legend outside of the axis makes it cutoff by the figure box

Tick labels only displayed in one subplot

I need to display custom shared x-axis tick labels on both subplots, using two datasets with different dates.
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
#Dataset 1
rng1 = date_range(start='01-01-2015', periods=5, freq='1M')
df1 = DataFrame({'y':np.random.normal(size=len(rng1))}, index=rng1)
y1 = df1['y']
#Dataset 2
rng2 = date_range(start='01-01-2015', periods=5, freq='2M')
df2 = DataFrame({'y':np.random.normal(size=len(rng2))}, index=rng2)
y2 = df2['y']
#Figure
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
y1.plot(ax=ax1)
y2.plot(ax=ax2)
plt.xticks(rotation=30)
ax1.xaxis.set_minor_formatter(plt.NullFormatter())
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%D'))
plt.show()
The code above creates a figure without x-axis labels on the upper subplot.
I expected that adding the code below would display the same x-axis labels on the upper subplot, but they do not show up. What am I doing wrong?
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%D'))
Setting sharex=False does not work because the dates are different for each dataset.
First of all, if you want to use matplotlib.dates locators and formatters on a plot created via pandas you should use the x_compat=True argument in your plot, otherwise pandas may scale the axis rather arbitrarily.
Then '%D' is not a valid format string. Maybe you mean '%b'?
Now there are two options.
Use sharex=False, set your locators and formatters on both axes, and finally set the limits of one plot to the limits of the other. In this case, since the lower plot covers the larger range:
ax1.set_xlim(ax2.get_xlim())
The other option is to use sharex=True and turn the labels visible again.
plt.setp(ax1.get_xticklabels(), visible=True)
Unfortunately this option is broken on the newest matplotlib version. I just opened a bug report about it.
Full code for the first option (since the second one is not working):
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
#Dataset 1
rng1 = date_range(start='2015-01-01', periods=5, freq='1M')
df1 = DataFrame({'y':np.random.normal(size=len(rng1))}, index=rng1)
y1 = df1['y']
#Dataset 2
rng2 = date_range(start='2015-01-01', periods=5, freq='2M')
df2 = DataFrame({'y':np.random.normal(size=len(rng2))}, index=rng2)
y2 = df2['y']
#Figure
fig,(ax1,ax2) = plt.subplots(2,1,sharex=False)
y1.plot(ax=ax1, x_compat=True)
y2.plot(ax=ax2, x_compat=True)
plt.xticks(rotation=30)
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax1.xaxis.set_minor_locator(plt.NullLocator())
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_minor_locator(plt.NullLocator())
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax1.set_xlim(ax2.get_xlim())
plt.show()
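For the second option, more recent Matplotlib releases appear to allow switching the hidden labels back on with tick_params(labelbottom=True). This is only a sketch under some assumptions: the data is plotted with Matplotlib directly (so x_compat is not needed), and month-start frequencies ('MS', '2MS') stand in for the original ones:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng1 = pd.date_range(start="2015-01-01", periods=5, freq="MS")
rng2 = pd.date_range(start="2015-01-01", periods=5, freq="2MS")
y1 = pd.Series(np.random.normal(size=len(rng1)), index=rng1)
y2 = pd.Series(np.random.normal(size=len(rng2)), index=rng2)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(y1.index, y1.values)
ax2.plot(y2.index, y2.values)

# with sharex=True the two x-axes use the same major locator and formatter
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

# sharex=True hides the tick labels of the upper subplot; turn them back on
ax1.tick_params(axis="x", labelbottom=True)
fig.canvas.draw()
```

With a shared axis there is no need to copy the limits from one subplot to the other, which is the main convenience over the sharex=False variant.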
This seemed to work for me, just following the shared_axis_demo
fig = plt.figure()
ax1 = plt.subplot(211)
_ = plt.plot(y1)
plt.setp(ax1.get_xticklabels(), visible=True)
ax2 = plt.subplot(212, sharex=ax1)
_ = plt.plot(y2)
plt.show()

Matplotlib Line Graph with Table from Pandas Pivot Table

Given the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig, ax = plt.subplots(1,1)
t.plot(ax=ax)
I'd like to add a summary table below it as a separate axis.
I would allocate the other axis via subplot2grid, but using subplots(2,1) will work as well as I can just adapt it to my needs.
I'd like the table to look like this:
     2013  2014  2015  2016
A       7     2     6     5
B       9     8     7     4
...with no borders/lines if possible.
Update:
Here's a sample of what I've tried with subplot2grid:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig = plt.figure(figsize=(figsize), dpi=300)
ax1 = plt.subplot2grid((100,100), (0,0), rowspan=70, colspan=100)
ax2 = plt.subplot2grid((100,100), (80,0), rowspan=20, colspan=100)
t.plot(ax=ax1)
ax1.legend_.remove()
ax1.spines['top'].set_visible(False);ax1.spines['right'].set_visible(False);ax1.spines['bottom'].set_visible(False);ax1.spines['left'].set_visible(False)
ax2.spines['top'].set_visible(False);ax2.spines['right'].set_visible(False);ax2.spines['bottom'].set_visible(False);ax2.spines['left'].set_visible(False)
#ax1.xaxis.set_visible(False) #Hide x axis label
ax2.xaxis.set_visible(False)
ax2.yaxis.set_visible(False)
ax1.tick_params(axis='x',which='both',bottom=True,top=False)
ax1.tick_params(axis='y',which='both',left=True,right=False)
ax2.tick_params(axis='x',which='both',bottom=False,top=False)
ax2.tick_params(axis='y',which='both',left=False,right=False)
from matplotlib.colors import ListedColormap
t2=df.pivot_table(df,index=['Group'],columns=['YYYYMM'],aggfunc=np.sum).sort_index(ascending=False)
sns.heatmap(t2,annot=True,fmt='d',linewidths=.5,cbar=False,cmap=ListedColormap(['white']))
plt.show()
...which produces this (notice that the bottom plot is hidden; this is intentional, as I only want to display faint grey lines between rows and columns where absolutely needed, per Stephen Few's "Show Me The Numbers").
But I would like for the years in the table to align with the year index tick labels on the x-axis.
Another update:
Using Seaborn (see last 4 lines of code in update), I tried a heat map, which might get me to where I need to be. I just need to format the numbers, label the groups, and maybe shift the dates.
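One direction that might work for the borderless table is Matplotlib's own Axes.table with edges='open', drawn on a second axes below the line plot. The following is only a sketch: the Year column and the height ratio are choices made for illustration, and it does not by itself align the table columns with the x-axis ticks:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "YYYYMM": [201603, 201503, 201403, 201303, 201603, 201503, 201403, 201303],
    "Count": [5, 6, 2, 7, 4, 7, 8, 9],
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
})
df["Year"] = df["YYYYMM"].astype(str).str[:4]

# years as rows for the line plot, groups as rows for the table
t = df.pivot_table(values="Count", index="Year", columns="Group", aggfunc="sum")
t2 = t.T

fig, (ax1, ax2) = plt.subplots(2, 1, gridspec_kw={"height_ratios": [4, 1]})
t.plot(ax=ax1)

# the lower axes only hosts the table, so hide its frame and ticks
ax2.axis("off")
table = ax2.table(
    cellText=t2.astype(str).values.tolist(),
    rowLabels=list(t2.index),
    colLabels=list(t2.columns),
    loc="center",
    edges="open",  # draw no cell borders
)
```

The transposed pivot t2 has the groups as rows and the years as columns, which matches the layout of the table requested above.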

Pull out chunks of a plot made in python and re-display

I have made a plot in Jupyter with an x-axis spanning about 40 seconds. I want to pull out sections that are milliseconds long and re-display them as separate plots (so that they can be viewed more easily). How would I go about doing this?
You could use some subplots, and slice the original data arrays. For example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,40,1000)
y = np.random.random(1000)
fig, [ax1,ax2,ax3] = plt.subplots(3,1)
ax1.plot(x,y)
ax2.plot(x[100:120],y[100:120])
ax3.plot(x[500:520],y[500:520])
plt.show()
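If the sections are easier to describe by time than by array index, a boolean mask on x can pick them out. This is a sketch with a synthetic signal sampled every millisecond; the window boundaries are hypothetical values chosen for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# synthetic 40 s signal sampled every millisecond
x = np.linspace(0, 40, 40001)
y = np.random.random(x.size)

# hypothetical windows of interest, as (start, stop) in seconds
windows = [(4.000, 4.020), (20.000, 20.050)]

fig, axes = plt.subplots(len(windows) + 1, 1)
axes[0].plot(x, y)  # full signal for reference
for ax, (start, stop) in zip(axes[1:], windows):
    mask = (x >= start) & (x < stop)  # select samples inside the window
    ax.plot(x[mask], y[mask])
    ax.set_title(f"{start:.3f} s to {stop:.3f} s")
```

Adding another entry to windows adds another subplot, so the same loop covers any number of excerpts.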
