How to prevent pandas.Dataframe.hist() from creating subplots with superimposed data? - python-3.x

I'm trying to display two dataframe columns with respect to a third one. Here is what my code looks like:
import pandas as pd
import matplotlib.pyplot as plt
col1=[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]
col2=[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2]
col3=[1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75]
myDF = pd.DataFrame({'col1': col1, 'col2': col2, 'col3':col3})
myDF.hist(['col3','col3'], weights=[myDF.col1,myDF.col2])
plt.savefig('myDF.png', format='png')
And here's the output:
I tried imposing a single subplot with
fig, ax = plt.subplots(figsize = (1,1))
myDF.hist(['col3','col3'], weights=[myDF.col1,myDF.col2], ax=ax)
But it's being overridden at runtime.
Any ideas how I can get a single plot?
EDIT: The following worked
x = [list(myDF.col3), list(myDF.col3)]
plt.hist(x, weights=[myDF.col1,myDF.col2])
Seems to be an issue with the pandas hist() function (?)

I don't know of a clean way to do this. You can try something like:
import pandas as pd
import matplotlib.pyplot as plt
col1=[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]
col2=[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2]
col3=[1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75]
myDF = pd.DataFrame({'col1': col1, 'col2': col2, 'col3':col3})
ax = myDF.hist(column='col3', weights=[myDF.col1], align='mid', rwidth=0.8)
myDF.hist(column='col3', weights=[myDF.col2], ax=ax, align='mid', rwidth=0.8, alpha=0.8)
You can try playing around with the bar alignment (left, mid or right) and with the bars' rwidth.
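If the goal is a single plot with both weighted histograms side by side, another option is to skip DataFrame.hist() and call the hist() method of a matplotlib Axes directly, which is essentially what the EDIT in the question does with plt.hist. A minimal sketch, assuming the same sample data as above (the label strings are only illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd

col1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]
col2 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
col3 = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
myDF = pd.DataFrame({"col1": col1, "col2": col2, "col3": col3})

# One figure with exactly one Axes; both datasets are drawn on it.
fig, ax = plt.subplots()
values = myDF["col3"].to_numpy()
counts, bin_edges, _ = ax.hist(
    [values, values],  # the same column twice...
    weights=[myDF["col1"].to_numpy(), myDF["col2"].to_numpy()],  # ...with different weights
    label=["weighted by col1", "weighted by col2"],  # illustrative labels
)
ax.legend()
```

Because the Axes is created explicitly, nothing can replace it with a grid of subplots, and fig.savefig(...) can be called on the same figure afterwards.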

Related

Group Box Plots for different numerical variables in one figure

I have a data frame with several numerical variables, and I would like to create a box plot for each variable and group them in one figure. So each variable should have its own box plot, and all these box plots should be in one figure. How can I do that in Seaborn or Matplotlib?
Thank you very much!
Yes, you can do this with seaborn:
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.rand(100,4), columns=list('ABCD'))
num_col_list = ['A','B','C','D']
sns.boxplot(data=df.melt(value_vars=num_col_list),
            x='variable', y='value')
Output:
Or with just pandas/matplotlib:
df.boxplot(column=num_col_list)
Output:
If you use a pandas data frame you can use the boxplot function:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(10, 4),columns=['Col1', 'Col2', 'Col3', 'Col4'])
df.boxplot(column=['Col1', 'Col2', 'Col3'])
plt.show()
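If the data frame also contains non-numeric columns, the list of columns does not have to be typed by hand. A small sketch, assuming a hypothetical frame with one extra text column named label, that picks the numeric columns with select_dtypes and draws all box plots on one Axes:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 3)), columns=["Col1", "Col2", "Col3"])
df["label"] = list("abcdefghij")  # hypothetical non-numeric column

# Keep only the numeric columns, in their original order.
num_cols = df.select_dtypes(include="number").columns.tolist()

fig, ax = plt.subplots()
df.boxplot(column=num_cols, ax=ax)  # one box per numeric column, all on the same Axes
```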

Pandas and Matplotlib: Adding tooltip to make interactive

I am trying to add a tooltip to the graph, so that hovering over the graph shows the underlying info. How do I add one and make the graph interactive?
import matplotlib.pyplot as plt
import pandas as pd
from numpy import nan
from matplotlib import dates as mpl_dates
df = dataset
df["Date"] = pd.to_datetime(df["Date"]).dt.strftime('%m/%d/%Y')
#df["Date"] = pd.to_datetime(df["Date"]).apply(lambda x: x.strftime('%B-%Y'))
df.loc[df['Actuals'] == 0, ['Actuals']] = nan
df.loc[df['Actuals'] > 0, ['Predicted_Lower']] = nan
df.loc[df['Actuals'] > 0, ['Predicted_Upper']] = nan
# gca stands for 'get current axis'
ax = plt.gca()
y1 = df['Predicted_Lower']
y2 = df['Predicted_Upper']
x = df['Date']
ax.fill_between(x,y1, y2, facecolor="blue", alpha=0.7)
df.plot(kind='line',x='Date',y='Predicted', color='black', ax=ax)
df.plot(kind='line',x='Date',y='Actuals', color='green', ax=ax)
df.plot(kind='line',x='Date',y='Predicted_Lower',color='white',ax=ax)
df.plot(kind='line',x='Date',y='Predicted_Upper',color='white', ax=ax)
date_format = mpl_dates.DateFormatter('%Y-%m-%d')
plt.gca().xaxis.set_major_formatter(date_format)
locs, labels = plt.xticks()
plt.xticks(locs[::3], labels[::3], rotation=45)
plt.show()
plt.xticks(rotation=45)
plt.legend(['Predicted','Actuals'])
plt.xlabel('Date')
df.head(30)
plt.show()
I am using pandas and matplotlib. The data comes from a SQL Server database that is connected to Power BI, and I am writing Python scripts to display the graphs.
It's possible using matplotlib as discussed here.
However, you might want to look into other plotting packages such as plotly, where hover tooltips are built-in, default behavior.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.DataFrame(np.arange(20), columns=['x'])
df['y'] = df['x']**2
px.line(df, x='x', y='y')
In your example, you could try something like
px.line(df, x='Date', y='Predicted', ...)
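For completeness, a hover tooltip in matplotlib itself is usually built from a hidden annotation plus a motion_notify_event callback. The following is only a sketch under assumptions: it uses made-up x/y data rather than the Date/Predicted columns, the names annot and on_hover are illustrative, and it only reacts to the mouse when the figure is shown in an interactive backend.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(20)
y = x ** 2

fig, ax = plt.subplots()
(line,) = ax.plot(x, y, marker="o")

# A single annotation that is moved to the hovered point and hidden otherwise.
annot = ax.annotate(
    "", xy=(0, 0), xytext=(10, 10), textcoords="offset points",
    bbox={"boxstyle": "round", "fc": "w"},
)
annot.set_visible(False)

def on_hover(event):
    """Show the data values of the point under the mouse cursor."""
    if event.inaxes is not ax:
        return
    hit, info = line.contains(event)  # info["ind"] lists the indices that were hit
    if hit:
        i = info["ind"][0]
        annot.xy = (x[i], y[i])
        annot.set_text(f"x={x[i]}, y={y[i]}")
        annot.set_visible(True)
    else:
        annot.set_visible(False)
    fig.canvas.draw_idle()

fig.canvas.mpl_connect("motion_notify_event", on_hover)
# In an interactive session, follow this with plt.show().
```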

matplotlib x-axis messed up [duplicate]

I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values). But the graph looks like:
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='M'))
fig,ax1 = plt.subplots()
plt.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should check out this native function of matplotlib:
fig.autofmt_xdate()
See the examples on the source website, e.g. "Custom tick formatter".
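The pieces from the answers above can be combined: a locator decides where the ticks go, a formatter decides how they are printed, and fig.autofmt_xdate() rotates the labels. A minimal sketch, assuming a monthly series like the one in the question (only its first four values are known, so only those are used):

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

series = pd.Series(
    [7, 8, 9, 8],
    index=pd.date_range("2014-01-01", periods=4, freq="MS"),
)

fig, ax = plt.subplots()
ax.plot(series.index, series.values)
ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))  # year and month only
fig.autofmt_xdate()  # rotate and right-align the date labels

fig.canvas.draw()  # force a draw so the tick label texts are populated
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

Swapping the format string for "%Y %B" gives labels with the month name instead.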

how to make scatter plot of two columns and divide x_axis in 3 column f1,f2,and f3

I have a dataframe and I want to draw a scatter plot divided into two regions: region 1 should plot only f_x_f1 vs A_x_f1, and region 2 should plot f_x_f2 vs A_x_f2.
If someone can provide a better solution for this problem, please do.
Here is an example of my dataframe:
df=pd.DataFrame({'f_x_f1':[0.3,0.28,0.34],'A_x_f1':[0.003,0.28,0.034],'f1':[0.4,0.4,0.4],'f_x_f2':[0.91,0.88,0.96],'A_x_f2':[0.003,0.28,0.034],'f2':[1.3,1.3,1.3]})
Here, using matplotlib!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
### making some sample data
df = pd.DataFrame({"f_x_f1": np.random.randint(1,100,100)
, "A_x_f1": np.random.randint(1,100,100)
, "f_x_f2": np.random.randint(1,100,100)
, "A_x_f2": np.random.randint(1,100,100) })
fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].scatter(df.f_x_f1,df.A_x_f1)
ax[0].set_title("f_x_f1 vs A_x_f1")
ax[1].scatter(df.f_x_f2,df.A_x_f2)
ax[1].set_title("f_x_f2 vs A_x_f2")
OUTPUT:
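If "regions" is meant as parts of one shared x-axis rather than separate subplots, both point sets can go on a single Axes with vertical lines as region boundaries. This sketch uses the dataframe from the question and assumes, without confirmation from the asker, that the constant columns f1 and f2 mark those boundaries:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "f_x_f1": [0.3, 0.28, 0.34], "A_x_f1": [0.003, 0.28, 0.034], "f1": [0.4, 0.4, 0.4],
    "f_x_f2": [0.91, 0.88, 0.96], "A_x_f2": [0.003, 0.28, 0.034], "f2": [1.3, 1.3, 1.3],
})

fig, ax = plt.subplots()
ax.scatter(df["f_x_f1"], df["A_x_f1"], label="f_x_f1 vs A_x_f1")
ax.scatter(df["f_x_f2"], df["A_x_f2"], label="f_x_f2 vs A_x_f2")

# Assumption: f1 and f2 are constant columns that give the region boundaries.
boundaries = [df["f1"].iloc[0], df["f2"].iloc[0]]
for boundary in boundaries:
    ax.axvline(boundary, color="grey", linestyle="--")

ax.set_xlabel("f")
ax.set_ylabel("A")
ax.legend()
```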

Matplotlib Line Graph with Table from Pandas Pivot Table

Given the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig, ax = plt.subplots(1,1)
t.plot(ax=ax)
I'd like to add a summary table below it as a separate axis.
I would allocate the other axis via subplot2grid, but using subplots(2,1) will work as well as I can just adapt it to my needs.
I'd like the table to look like this:
  2013 2014 2015 2016
A    7    2    6    5
B    9    8    7    4
...with no borders/lines if possible.
Update:
Here's a sample of what I've tried with subplot2grid:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig = plt.figure(figsize=(figsize), dpi=300)
ax1 = plt.subplot2grid((100,100), (0,0), rowspan=70, colspan=100)
ax2 = plt.subplot2grid((100,100), (80,0), rowspan=20, colspan=100)
t.plot(ax=ax1)
ax1.legend_.remove()
ax1.spines['top'].set_visible(False);ax1.spines['right'].set_visible(False);ax1.spines['bottom'].set_visible(False);ax1.spines['left'].set_visible(False)
ax2.spines['top'].set_visible(False);ax2.spines['right'].set_visible(False);ax2.spines['bottom'].set_visible(False);ax2.spines['left'].set_visible(False)
#ax1.xaxis.set_visible(False) #Hide x axis label
ax2.xaxis.set_visible(False)
ax2.yaxis.set_visible(False)
# tick_params expects booleans; recent Matplotlib versions no longer interpret the strings 'on'/'off'
ax1.tick_params(axis='x',which='both',bottom=True,top=False)
ax1.tick_params(axis='y',which='both',left=True,right=False)
ax2.tick_params(axis='x',which='both',bottom=False,top=False)
ax2.tick_params(axis='y',which='both',left=False,right=False)
from matplotlib.colors import ListedColormap
t2=df.pivot_table(df,index=['Group'],columns=['YYYYMM'],aggfunc=np.sum).sort_index(ascending=False)  # sortlevel() was removed from pandas; sort_index() replaces it
sns.heatmap(t2,annot=True,fmt='d',linewidths=.5,cbar=False,cmap=ListedColormap(['white']))  # plot the pivoted table t2, not the raw df
plt.show()
...which produces this (notice that the bottom plot is hidden; this is intentional, as I only want to display faint grey lines between rows and columns where absolutely needed, per Stephen Few's "Show Me The Numbers").
But I would like for the years in the table to align with the year index tick labels on the x-axis.
Another update:
Using Seaborn (see last 4 lines of code in update), I tried a heat map, which might get me to where I need to be. I just need to format the numbers, label the groups, and maybe shift the dates.
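One way to get the years in the table lined up with the plot is matplotlib's own Axes.table with loc="bottom", whose header row can stand in for the x tick labels. This is only a sketch under assumptions: it attaches the table to the plot's Axes instead of a separate one, plots against integer positions so that every year gets an equal-width slot, adds a helper column named Year, and uses edges="open" to drop the cell borders.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "YYYYMM": [201603, 201503, 201403, 201303, 201603, 201503, 201403, 201303],
    "Count": [5, 6, 2, 7, 4, 7, 8, 9],
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
})
df["Year"] = df["YYYYMM"].astype(str).str[:4]  # helper column, e.g. "2016"
t = df.pivot_table(values="Count", index="Year", columns="Group", aggfunc="sum")

fig, ax = plt.subplots()
positions = range(len(t.index))
for group in t.columns:
    ax.plot(positions, t[group].to_numpy(), label=group)
ax.set_xlim(-0.5, len(t.index) - 0.5)  # each year occupies one unit-wide slot
ax.set_xticks([])                      # the table header row replaces the tick labels
ax.legend()

table = ax.table(
    cellText=[[str(v) for v in row] for row in t.T.to_numpy()],
    rowLabels=list(t.columns),
    colLabels=list(t.index),
    cellLoc="center",
    loc="bottom",
    edges="open",  # no borders around the cells
)
fig.subplots_adjust(left=0.15, bottom=0.25)  # leave room for the table and row labels
```

Since the table spans the Axes width and has one column per year, each column is centred under the data points for that year.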

Resources