Matplotlib Line Graph with Table from Pandas Pivot Table - python-3.x

Given the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig, ax = plt.subplots(1,1)
t.plot(ax=ax)
I'd like to add a summary table below it as a separate axis.
I would normally allocate the second axes via subplot2grid, but subplots(2,1) works as well, since I can adapt it to my needs.
I'd like the table to look like this:
   2013  2014  2015  2016
A     7     2     6     5
B     9     8     7     4
...with no borders/lines if possible.
Update:
Here's a sample of what I've tried with subplot2grid:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig = plt.figure(figsize=(8, 6), dpi=300)  # figsize was undefined in the original; (8, 6) is an assumed example size
ax1 = plt.subplot2grid((100,100), (0,0), rowspan=70, colspan=100)
ax2 = plt.subplot2grid((100,100), (80,0), rowspan=20, colspan=100)
t.plot(ax=ax1)
ax1.legend_.remove()
# hide all four spines on both axes
for ax in (ax1, ax2):
    for side in ('top', 'right', 'bottom', 'left'):
        ax.spines[side].set_visible(False)
#ax1.xaxis.set_visible(False) #Hide x axis label
ax2.xaxis.set_visible(False)
ax2.yaxis.set_visible(False)
# tick_params expects booleans; the 'on'/'off' strings are no longer accepted
ax1.tick_params(axis='x',which='both',bottom=True,top=False)
ax1.tick_params(axis='y',which='both',left=True,right=False)
ax2.tick_params(axis='x',which='both',bottom=False,top=False)
ax2.tick_params(axis='y',which='both',left=False,right=False)
from matplotlib.colors import ListedColormap
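# sortlevel() was removed from pandas; sort_index() is its replacement
t2=df.pivot_table(df,index=['Group'],columns=['YYYYMM'],aggfunc=np.sum).sort_index(ascending=False)
# draw the pivoted table (t2), not the raw dataframe
sns.heatmap(t2,annot=True,fmt='d',linewidths=.5,cbar=False,cmap=ListedColormap(['white']))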
plt.show()
...which produces this. Notice that the frame of the bottom plot is hidden; this is intentional, as I only want to display faint grey lines between rows and columns where absolutely needed (per Stephen Few, "Show Me The Numbers").
However, I would like the years in the table to align with the year tick labels on the x-axis.
Another update:
Using Seaborn (see last 4 lines of code in update), I tried a heat map, which might get me to where I need to be. I just need to format the numbers, label the groups, and maybe shift the dates.
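One possible way to get a borderless table whose columns line up with the x-axis ticks is to skip the table and heat map machinery and place the numbers as plain text in a second axes that copies the x-limits of the line plot. The following is only a sketch under some assumptions: the names summary and Year are made up for illustration, and it relies on pandas plotting a string index at integer positions 0, 1, 2, ...

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'YYYYMM': [201603, 201503, 201403, 201303, 201603, 201503, 201403, 201303],
    'Count': [5, 6, 2, 7, 4, 7, 8, 9],
    'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']})
df['Year'] = df['YYYYMM'].astype(str).str[:4]

# One row per group and one column per year, as in the desired table
summary = df.pivot_table(values='Count', index='Group', columns='Year', aggfunc='sum')

fig, (ax1, ax2) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [4, 1]})
summary.T.plot(ax=ax1)  # one line per group; years sit at x = 0, 1, 2, 3

# The second axes only hosts text: same x-limits as the plot, one y slot per row
ax2.set_xlim(ax1.get_xlim())
ax2.set_ylim(len(summary) - 0.5, -0.5)  # reversed so the first group is on top
ax2.axis('off')

for r, (group, counts) in enumerate(summary.iterrows()):
    # Row label just left of the axes (x in axes coordinates, y in data coordinates)
    ax2.text(-0.02, r, group, transform=ax2.get_yaxis_transform(),
             ha='right', va='center')
    for c, value in enumerate(counts):
        # Each number is centred on the x position of its year
        ax2.text(c, r, str(value), ha='center', va='center')
```

Because the numbers are plain text objects there are no cell borders to remove, and the tick labels of the upper plot act as the table header.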

Related

Break a pandas line plot at specific date

I have a time-series dataframe with missing data for some time period. I would like to create a line plot and break a line where there is missing data.
data_site1_ave[["samples", "lkt"]].plot(figsize=(15,4), title = "Site 1", xlabel='')
Is it possible to create a gap, let's say from 2018-05-01 to 2018-10-30 in the line plot?
Yes. You can create arbitrary gaps by calling df.plot() several times, once for each slice of the dataframe that should be drawn. To draw everything on the same axes, pass the same ax keyword argument to each df.plot() call. Turn the legend off for all but one call so that the legend has only one entry.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create sample time series
N = 365
np.random.seed(42)
x = pd.date_range('2018-01-01', freq='d', periods=N)
y = np.cumsum(np.random.rand(N, 1) - 0.5)
df = pd.DataFrame(y, columns=['y'], index=x)
# plot time series with gap
fig, ax = plt.subplots()
df.loc[:'2018-05-01'].plot(ax=ax, c='blue')
df.loc['2018-10-31':].plot(ax=ax, c='blue', legend=False);
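An alternative that may be simpler for a single gap is to overwrite the values in that period with NaN, since line plots leave gaps at missing values. This is a sketch on made-up data; the name gapped is only illustrative, and the gap dates are the ones from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample daily time series for 2018
x = pd.date_range('2018-01-01', freq='D', periods=365)
rng = np.random.default_rng(42)
df = pd.DataFrame({'y': np.cumsum(rng.random(365) - 0.5)}, index=x)

# Work on a copy so the original data stays intact
gapped = df.copy()
gapped.loc['2018-05-01':'2018-10-30', 'y'] = np.nan  # label slicing includes both ends

fig, ax = plt.subplots()
gapped.plot(ax=ax, color='blue')  # the line is interrupted where y is NaN
```

This keeps a single plot call and a single legend entry, at the cost of modifying (a copy of) the data.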

How to prevent pandas.Dataframe.hist() from creating subplots with superimposed data?

I'm trying to display two dataframe columns with respect to a third one. Here is what my code looks like:
import pandas as pd
import matplotlib.pyplot as plt

col1=[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]
col2=[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2]
col3=[1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75]

myDF = pd.DataFrame({'col1': col1, 'col2': col2, 'col3':col3})

myDF.hist(['col3','col3'], weights=[myDF.col1,myDF.col2])

plt.savefig('myDF.png', format='png')
And here's the output:
I tried imposing a single subplot with
fig, ax = plt.subplots(figsize = (1,1))
myDF.hist(['col3','col3'], weights=[myDF.col1,myDF.col2], ax=ax)
But it's being overridden at runtime.
Any ideas how I can get a single plot?
EDIT: The following worked
x = [list(myDF.col3), list(myDF.col3)]
plt.hist(x, weights=[myDF.col1,myDF.col2])
Seems to be an issue with the pandas hist() function (?)
I couldn't find a clean way to do this with pandas. You can try something like:
import pandas as pd
import matplotlib.pyplot as plt
col1=[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]
col2=[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2]
col3=[1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75]
myDF = pd.DataFrame({'col1': col1, 'col2': col2, 'col3':col3})
ax = myDF.hist(column='col3', weights=[myDF.col1], align='mid', rwidth=0.8)
myDF.hist(column='col3', weights=[myDF.col2], ax=ax, align='mid', rwidth=0.8, alpha=0.8)
You can try adjusting the bar alignment (left, mid or right) and the relative bar width (rwidth).
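For completeness, the approach from the question's EDIT can be written against an explicit axes so that both weighted histograms end up in one plot with a legend. This is a sketch only; the legend labels are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

col1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]
col2 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
col3 = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
myDF = pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3})

fig, ax = plt.subplots()
# Passing a list of datasets to Axes.hist draws them side by side in one axes;
# each dataset gets its own weights array of the same length
data = [myDF['col3'].to_numpy(), myDF['col3'].to_numpy()]
weights = [myDF['col1'].to_numpy(), myDF['col2'].to_numpy()]
counts, edges, patches = ax.hist(data, weights=weights,
                                 label=['weighted by col1', 'weighted by col2'])
ax.legend()
```

Calling Matplotlib directly avoids the subplot grid that DataFrame.hist() creates for each requested column.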

matplotlib x-axis messed up [duplicate]

I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values). But the graph looks like:
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# 36 monthly values, January 2014 through December 2016
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='MS'))
fig,ax1 = plt.subplots()
ax1.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should also check out this built-in Matplotlib method, which rotates and right-aligns the date tick labels:
fig.autofmt_xdate()
See the examples on the Matplotlib website (Custom tick formatter).
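As a rough illustration of how fig.autofmt_xdate() can be combined with a date formatter, here is a small sketch using the four values shown in the question. The choice of MonthLocator (one tick per month) is an assumption that suits this short series; a longer series would probably want a larger interval.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The four monthly values from the question
series = pd.Series([7, 8, 9, 8],
                   index=pd.date_range('2014-01-01', periods=4, freq='MS'))

fig, ax = plt.subplots()
ax.plot(series.index, series.values)

# One tick per month, labelled as yyyy-mm (no hours, minutes or seconds)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

# Rotate and right-align the date labels so they do not overlap
fig.autofmt_xdate()
```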

How to make a scatter plot of two columns and divide the x-axis into 3 columns f1, f2, and f3

I have a dataframe and want to draw a scatter plot divided into two regions: region 1 should show only f_x_f1 vs A_x_f1, and region 2 should show f_x_f2 vs A_x_f2.
If someone can suggest a good solution for this problem, please do.
Here is an example of my dataframe:
df=pd.DataFrame({'f_x_f1':[0.3,0.28,0.34],'A_x_f1':[0.003,0.28,0.034],'f1':[0.4,0.4,0.4],'f_x_f2':[0.91,0.88,0.96],'A_x_f2':[0.003,0.28,0.034],'f2':[1.3,1.3,1.3]})
Here, using matplotlib!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
### making some sample data
df = pd.DataFrame({"f_x_f1": np.random.randint(1,100,100)
, "A_x_f1": np.random.randint(1,100,100)
, "f_x_f2": np.random.randint(1,100,100)
, "A_x_f2": np.random.randint(1,100,100) })
fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].scatter(df.f_x_f1,df.A_x_f1)
ax[0].set_title("f_x_f1 vs A_x_f1")
ax[1].scatter(df.f_x_f2,df.A_x_f2)
ax[1].set_title("f_x_f2 vs A_x_f2")
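If the goal is instead a single axes split into regions along the x-axis, one possible reading of the question is to plot both pairs on the same axes and draw vertical lines at the f1 and f2 values. The sketch below uses the dataframe from the question and assumes that f1 and f2 are constant columns marking the region boundaries; that interpretation is a guess, not something the question states.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'f_x_f1': [0.3, 0.28, 0.34], 'A_x_f1': [0.003, 0.28, 0.034],
                   'f1': [0.4, 0.4, 0.4],
                   'f_x_f2': [0.91, 0.88, 0.96], 'A_x_f2': [0.003, 0.28, 0.034],
                   'f2': [1.3, 1.3, 1.3]})

fig, ax = plt.subplots()
ax.scatter(df['f_x_f1'], df['A_x_f1'], label='f_x_f1 vs A_x_f1')
ax.scatter(df['f_x_f2'], df['A_x_f2'], label='f_x_f2 vs A_x_f2')

# Assumption: f1 and f2 hold one repeated value each, used as region boundaries
boundaries = [df['f1'].iloc[0], df['f2'].iloc[0]]
for b in boundaries:
    ax.axvline(b, color='grey', linestyle='--', linewidth=0.8)

ax.legend()
```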

Set hue using a range of values in Seaborn stripplot

I am trying to set hue based on a range of values rather than unique values in seaborn stripplot. For example, different colors for different value ranges (1940-1950, 1950-1960 etc.).
sns.stripplot(x='Condition', y='IM', data=dd3, jitter=0.3, hue=dd3['Year Built'])
Thanks
It looks like you need to bin the data. Use pd.cut() as in the code that follows, where the years are binned into 20-year ranges. Change the step passed to np.arange() to adjust the ranges.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x = np.random.randint(0,100,size=100)
y = np.random.randint(0,100, size=100)
year = np.random.randint(1918, 2019, size=100)
df = pd.DataFrame({
'x':x,
'y':y,
'year':year
})
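# extend the last edge past max(year) and include the lowest value so that no year falls outside the bins
bins = np.arange(year.min(), year.max() + 20, step=20)
df['year_bin'] = pd.cut(df['year'], bins, include_lowest=True)
sns.lmplot(x='x', y='y', data=df, hue='year_bin')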
plt.show()
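When the ranges should read like the ones in the question (1940-1950, 1950-1960, and so on), pd.cut() also accepts explicit edges and labels. The following sketch uses a handful of made-up years; the names edges, labels and decade are illustrative only.

```python
import pandas as pd

# A few made-up construction years
years = pd.Series([1942, 1950, 1957, 1963, 1979], name='Year Built')

edges = [1940, 1950, 1960, 1970, 1980]
labels = ['1940-1950', '1950-1960', '1960-1970', '1970-1980']

# right=False makes each bin include its left edge and exclude its right edge,
# so 1950 falls into '1950-1960'
decade = pd.cut(years, bins=edges, labels=labels, right=False)
print(decade.tolist())
```

The resulting categorical column can then be stored in the dataframe and passed as hue to the seaborn plot, which gives one colour per range rather than one per year.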
