Matplotlib bar plot with table formatting - python-3.x

I have added a table to the bottom of my plot, but there are a number of issues with it:
The right has too much padding.
The left has too little padding.
The bottom has no padding.
The cells are too small for the text within them.
The table is too close to the bottom of the plot.
The cells belonging to the row names are not colored to match those of the bars.
I'm going out of my mind fiddling with this. Can someone help me fix these issues?
Here is the code (Python 3):
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# Set styles
plt.style.use(['seaborn-paper', 'seaborn-whitegrid'])
plt.style.use(['seaborn'])
sns.set(palette='colorblind')
matplotlib.rc("font", family="Times New Roman", size=12)
labels = ['n=1','n=2','n=3','n=4','n=5']
a = [98.8,98.8,98.8,98.8,98.8]
b = [98.6,97.8,97.0,96.2,95.4]
bar_width = 0.20
data = [a,b]
print(data)
colors = plt.cm.BuPu(np.linspace(0, 0.5, len(labels)))
columns = ('n=1', 'n=2', 'n=3', 'n=4', 'n=5')
index = np.arange(len(labels))
plt.bar(index, a, bar_width)
plt.bar(index+bar_width+.02, b, bar_width)
plt.table(cellText=data,
rowLabels=['a', 'b'],
rowColours=colors,
colLabels=columns,
loc='bottom')
plt.subplots_adjust(bottom=0.7)
plt.ylabel('Some y label which effect the bottom padding!')
plt.xticks([])
plt.title('Some title')
plt.show()
This is the output:
Update
This is working now, but in case someone else is having issues: Make sure you are not viewing your plots and the changes you make to them with IntelliJ SciView as it does not represent changes accurately and introduces some formatting issues!

I think you can fix the first problem by setting the bounding box when you make the table using bbox like this:
bbox=[0, 0.225, 1, 0.2]
where the parameters are [left, bottom, width, height].
For the second issue (the coloring), the color array you pass to the table does not correspond to the seaborn palette used for the bars. You can query the seaborn color palette with
sns.color_palette(palette='colorblind')
which returns the list of colors seaborn is using.
Check the modifications below:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# Set styles
plt.style.use(['seaborn-paper', 'seaborn-whitegrid'])
plt.style.use(['seaborn'])
sns.set(palette='colorblind')
matplotlib.rc("font", family="Times New Roman", size=12)
labels = ['n=1','n=2','n=3','n=4','n=5']
a = [98.8,98.8,98.8,98.8,98.8]
b = [98.6,97.8,97.0,96.2,95.4]
bar_width = 0.20
data = [a,b]
colors = sns.color_palette(palette='colorblind')
columns = ('n=1', 'n=2', 'n=3', 'n=4', 'n=5')
index = np.arange(len(labels))
fig = plt.figure(figsize=(12,9))
plt.bar(index, a, bar_width)
plt.bar(index+bar_width+.02, b, bar_width)
plt.table(cellText=data,
rowLabels=[' a ', ' b '],
rowColours=colors,
colLabels=columns,
loc='bottom',
bbox=[0, 0.225, 1, 0.2])
fig.subplots_adjust(bottom=0.1)
plt.ylabel('Some y label which effect the bottom padding!')
plt.xticks([])
plt.title('Some title')
plt.show()
I also changed the subplot adjustment to subplots_adjust(bottom=0.1) because it wasn't coming out right otherwise. Here is the output:
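As a rough sketch of the same idea without seaborn: the bbox values are given in axes coordinates, so a negative bottom value places the table below the plot area, and the row colors can be read back from the bars themselves instead of from a palette. The figure size, the bbox numbers and the subplots_adjust margins here are assumptions that will likely need tuning for your own figure.

```python
import numpy as np
import matplotlib.pyplot as plt

a = [98.8, 98.8, 98.8, 98.8, 98.8]
b = [98.6, 97.8, 97.0, 96.2, 95.4]
columns = [f"n={i}" for i in range(1, 6)]
index = np.arange(len(columns))
bar_width = 0.20

fig, ax = plt.subplots(figsize=(8, 6))
bars_a = ax.bar(index, a, bar_width)
bars_b = ax.bar(index + bar_width + 0.02, b, bar_width)

# Take the row colours from the first bar of each series so they always match.
row_colours = [bars_a[0].get_facecolor(), bars_b[0].get_facecolor()]

# bbox = [left, bottom, width, height] in axes coordinates;
# bottom < 0 puts the table underneath the axes.
table = ax.table(cellText=[a, b],
                 rowLabels=["a", "b"],
                 rowColours=row_colours,
                 colLabels=columns,
                 bbox=[0, -0.3, 1, 0.25])

ax.set_xticks([])
# Leave room on the left for the row labels and at the bottom for the table.
fig.subplots_adjust(left=0.15, bottom=0.3)
```

Call plt.show() or fig.savefig(...) afterwards to view the result.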

Related

Problem filling the colour between two index values

I have time-series data in timeseries.txt. First I select an index value (here 50) and put a red line mark at that index. I then want to highlight the portion of the time series from idx-20 to idx+20 around the red line.
I wrote the code below. I am able to put the red line mark on the time series, but fill_betweenx does not work. I hope the experts can help me overcome this problem. Thanks.
import matplotlib.pyplot as plt
import numpy as np
input_data=np.loadtxt("timeseries.txt")
time=np.arange(len(input_data))
plt.plot(time,input_data)
idx = [50]
mark = [time[i] for i in idx]
plt.plot(idx,[input_data[i] for i in mark], marker="|",color='red',markerfacecolor='none',mew=0.4,ms=30,alpha=2.0)
plt.fill_betweenx(idx-20,idx+20 alpha=0.25,color='lightsteelblue')
plt.show()
If you are looking for just a semi-transparent rectangle, you can use matplotlib.patches.Rectangle to draw one (see the Rectangle documentation). I have updated your code to add a rectangle. See if this meets your requirement. I have used a sine wave as I didn't have your data.
import matplotlib.pyplot as plt
import numpy as np
## Create sine wave
x = np.arange(100)
input_data=np.sin(2*np.pi*3*x/100)
time=np.arange(len(input_data))
plt.plot(time,input_data)
idx = [50]
mark = [time[i] for i in idx]
plt.plot(idx,[input_data[i] for i in mark], marker="|", color='red', markerfacecolor='none', mew=0.4,ms=30,alpha=1.0)
#plt.fill_betweenx(mark,idx-20,0, alpha=0.25,color='lightsteelblue')
# Create a Rectangle patch
import matplotlib.patches as patches
from matplotlib.patches import Rectangle
plt.gca().add_patch(Rectangle((idx[0]-20, -0.15), 40, .3, facecolor = 'lightsteelblue',fill=True,alpha=0.25, lw=0))
plt.show()
EDIT
Please refer to the Rectangle documentation mentioned earlier in the response. You will need to adjust the start coordinates (x, y) and the width and height to get the Rectangle size you need. For example, changing the rectangle code like this...
plt.gca().add_patch(Rectangle((idx[0]-10, -0.40), 20, 0.8, facecolor = 'lightsteelblue',fill=True,alpha=0.25, lw=0))
will give you this plot.
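If the goal is to shade the full height of the plot between idx-20 and idx+20, a different technique from the Rectangle above is Axes.axvspan, which takes the two x limits directly. This is only a sketch on the same sine-wave stand-in data; the names signal and half_width are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data, since timeseries.txt is not available
x = np.arange(100)
signal = np.sin(2 * np.pi * 3 * x / 100)

idx = 50          # index to mark
half_width = 20   # how far to shade on each side of idx

fig, ax = plt.subplots()
ax.plot(x, signal)

# Red vertical line at the selected index
ax.axvline(idx, color="red", linewidth=0.8)

# Shade from idx-20 to idx+20 over the whole vertical extent of the axes
span = ax.axvspan(idx - half_width, idx + half_width,
                  alpha=0.25, color="lightsteelblue")
```

Unlike the Rectangle, the shaded band does not need y coordinates, so it keeps covering the full height if the y-limits change.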

Control marker properties in seaborn pairwise boxplot

I'm trying to plot boxplots for two different datasets on the same plot. The x axis shows the hours in a day, while the y axis goes from 0 to 1 (let's call it Efficiency). I would like to have different markers for the means of each dataset's boxes. I use 'meanprops' in seaborn, but that changes the marker style for both datasets at the same time. I've added 2000 lines of data in the Excel file that can be downloaded here. The values might not coincide with the ones in the picture but should be enough.
Basically I want the red squares to be blue on the orange boxplot, and red on the blue boxplot. Here is what I managed to do so far:
I tried changing the meanprops by using a dictionary with the labels as keys, but it seems to enter a loop (PyCharm says Evaluating...)
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
#make sure you have your path sorted out
group1 = pd.read_excel('group1.xls')
fig, ax = plt.subplots(figsize = (20,10))
#does not work
#ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
# meanprops={"marker":{'7':"s",'8':'s'},"markerfacecolor":{'7':"white",'8':'white'},
#"markeredgecolor":{'7':"blue",'8':'red'})
#works but produces similar markers
ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
meanprops={"marker":"s","markerfacecolor":"white", "markeredgecolor":"blue"})
plt.legend(title='Groups', loc=2, bbox_to_anchor=(1, 1),borderaxespad=0.5)
# Add transparency to colors
for patch in ax.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .4))
ax.set_xlabel("Hours",fontsize=14)
ax.set_ylabel("M1 Efficiency",fontsize=14)
ax.tick_params(labelsize=10)
plt.show()
I also tried the FacetGrid but to no avail (Stops at 'Evaluating...'):
g = sns.FacetGrid(group1, col="M1_eff", hue="labels",hue_kws=dict(marker=["^", "v"]))
g = (g.map(plt.boxplot, "hour", "M1_eff")
.add_legend())
g.show()
Any help is appreciated!
I don't think you can do this using sns.boxplot() directly. I think you'll have to draw the means "by hand":
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
N=100
df = pd.DataFrame({'hour':np.random.randint(0,3,size=(N,)),
'M1_eff': np.random.random(size=(N,)),
'labels':np.random.choice([7,8],size=(N,))})
x_col = 'hour'
y_col = 'M1_eff'
hue_col = 'labels'
width = 0.8
hue_order=[7,8]
marker_colors = ['red','blue']
# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
fig, ax = plt.subplots()
ax = sns.boxplot(data=df, x=x_col, y=y_col, hue=hue_col, hue_order=hue_order, showfliers=False, showmeans=False)
means = df.groupby([hue_col,x_col])[y_col].mean()
for (gr,temp),o,c in zip(means.groupby(level=0),offsets,marker_colors):
    ax.plot(np.arange(temp.values.size)+o, temp.values, 's', c=c)
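To see what the offset calculation above produces, here is the same arithmetic wrapped in a small helper. The function name hue_offsets is hypothetical and not part of seaborn; it only repeats the three lines from the answer.

```python
import numpy as np

def hue_offsets(n_levels, width=0.8):
    """Hypothetical helper: centre offsets of hue-nested boxes around each x tick."""
    each_width = width / n_levels
    offsets = np.linspace(0, width - each_width, n_levels)
    return offsets - offsets.mean()

# Two hue levels: one box centred 0.2 to the left of the tick, one 0.2 to the right
two = hue_offsets(2)

# Three hue levels: the middle box sits exactly on the tick
three = hue_offsets(3)
```

With the default width of 0.8 and two hue levels, the mean markers are therefore drawn at x-0.2 and x+0.2 for each category position x.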

Using "hue" for a Seaborn visual: how to get legend in one graph?

I created a scatter plot in seaborn using seaborn.relplot, but am having trouble putting the legend all in one graph.
When I do this simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is one graph where I can specify the figure dimensions and have the legend at the bottom in two rows, below the x-axis (if that is too difficult, or not supported, then the default legend position to the right on the same graph would work too). I know the problem lies with "ax=ax" and with the way I am specifying the dimensions via a matplotlib figure, but I'd like to know specifically why this causes a problem so I can learn from this.
Thank you for your time.
The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page). With a simple sns.scatterplot (the default type of plot used by sns.relplot), your code works (changed to use reproducible data):
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
plt.show()
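To illustrate why the extra blank figure appears, the sketch below checks that the figure owned by the relplot result is not the one created with plt.subplots, and sizes the relplot figure through its own height and aspect parameters instead. The random data and column names are made up, and the g.figure attribute assumes a reasonably recent seaborn (older releases expose it as g.fig).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data standing in for the original DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "group": np.repeat(["a", "b", "c"], 20),
})

# A figure created ahead of time, as in the question
fig, ax = plt.subplots(figsize=(5, 5))

# relplot is figure-level: it builds its own FacetGrid and figure.
# height is the facet height in inches, aspect is width / height.
g = sns.relplot(data=df, x="x", y="y", hue="group", height=5, aspect=1)

print(g.figure is fig)  # False: relplot built its own figure

plt.close(fig)  # discard the unused, empty figure
```

So either use the axes-level sns.scatterplot with ax=ax, as above, or keep relplot and control the size with height and aspect.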
Further edits to legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which is actually a legend entry, by getting and slicing the handles and labels
Set a new title that is actually a title
Move the location and make use of bbox_to_anchor to move outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], loc=8,
ncol=2, bbox_to_anchor=[0.5,-.3,0,0])
plt.show()

Seaborn barplot with two y-axis

Consider the following pandas DataFrame:
  labels  values_a  values_b  values_x  values_y
0  date1         1         3       150       170
1  date2         2         6       200       180
It is easy to plot this with Seaborn (see example code below). However, due to the big difference between values_a/values_b and values_x/values_y, the bars for values_a and values_b are not easily visible (the dataset given above is just a sample, and in my real dataset the difference is even bigger). Therefore, I would like to use two y-axes, i.e., one y-axis for values_a/values_b and one for values_x/values_y. I tried to use plt.twinx() to get a second axis, but unfortunately the plot shows only two bars, for values_x and values_y, even though there are at least two y-axes with the right scaling. :) Do you have an idea how to fix that and get four bars for each label, where the values_a/values_b bars relate to the left y-axis and the values_x/values_y bars relate to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
/test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
# Ensure ticks occur at the same positions, then modify labels
ax2.set_yticks(ax1.get_yticks())  # fix the tick positions before relabelling
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))
ax2.set_ylabel('A and B')
plt.show()
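A variation on the same scaling idea, sketched with plain matplotlib bars rather than seaborn: instead of relabelling the ticks of a twin axis by hand, Axes.secondary_yaxis can derive the right-hand axis from a pair of conversion functions. The bar layout (width and positions) is an assumption made for the example; the scale factor is computed the same way as above.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["date1", "date2"]
small = {"values_a": [1, 2], "values_b": [3, 6]}
large = {"values_x": [150, 200], "values_y": [170, 180]}

# Same simple scaling rule as above: ratio of the two group means
small_mean = np.mean([v for vals in small.values() for v in vals])
large_mean = np.mean([v for vals in large.values() for v in vals])
scale = int(large_mean / small_mean)

x = np.arange(len(labels))
width = 0.2
fig, ax = plt.subplots()

# Small series are drawn scaled up, large series are drawn as they are
series = [(name, np.array(vals) * scale) for name, vals in small.items()]
series += [(name, np.array(vals)) for name, vals in large.items()]
for i, (name, heights) in enumerate(series):
    ax.bar(x + (i - 1.5) * width, heights, width, label=name)

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("X and Y")

# Right-hand axis shows the unscaled A/B values via forward/inverse functions
secax = ax.secondary_yaxis(
    "right", functions=(lambda v: v / scale, lambda v: v * scale))
secax.set_ylabel("A and B")
ax.legend()
```

The secondary axis stays consistent with the left one if the y-limits are changed later, which the manual tick relabelling does not.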

Issue with drawparallels argument in Basemap

This seems like it should be an easy fix but I can't get it to work. I would like 40°N to display in the attached plot, but setting the labels argument in drawparallels to [1,0,1,1] isn't doing the trick. According to the documentation, that should plot the parallel labels where they intersect the left, top and bottom of the plot. I would also like 0° to show up again in the bottom right corner. Any idea of how I can fix those two issues?
from netCDF4 import Dataset as NetCDFFile
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.basemap import addcyclic
nc = NetCDFFile('C:/myfile.nc')
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time = nc.variables['time'][:]
olr = nc.variables['olr'][:]
olr,lon = addcyclic(olr,lon)
map = Basemap(llcrnrlon=0.,llcrnrlat=-40.,urcrnrlon=360.,urcrnrlat=40.,resolution='l')
lons,lats = np.meshgrid(lon,lat)
x,y = map(lons,lats)
levels = np.arange(-19.5,20.0,0.5)
levels = levels[levels!=0]
ticks = np.arange(-20.0,20.0,4.0)
cs = map.contourf(x,y,olr[0],levels, cmap='bwr')
cbar = plt.colorbar(cs, orientation='horizontal', cmap='bwr', spacing='proportional', ticks=ticks)
cbar.set_label(r'Outgoing Longwave Radiation Anomalies $\mathregular{(W/m^2)}$')
map.drawcoastlines()
map.drawparallels(np.arange(-40,40,20),labels=[1,0,1,1], linewidth=0.5, fontsize=7)
map.drawmeridians(np.arange(0,360,40),labels=[1,1,0,1], linewidth=0.5, fontsize=7)
The first part of the question is easy. In order for the label to show up, you have to actually draw the parallel, but np.arange(-40,40,20) does not include 40. So, if you change that statement to np.arange(-40,41,20) your 40N label will show up.
The second part should in principle be solvable in the same way, but Basemap apparently uses the modulo of the longitudes to compute the position of the labels, so just using np.arange(0,361,40) when drawing the meridians will result in two 0 labels on top of each other. However, we can capture the labels that drawmeridians generates and manually change the position of the second 0 label. The labels are stored in a dictionary, so they are easy to deal with. To compute the x position of the last label, I compute the difference in x-position between the first and the second label, multiply that with the amount of meridians to be drawn (360/40) and add the x-position of the first label.
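Both points can be checked with NumPy alone, without Basemap installed. The modulo line only mimics the wrap-around described above; it is not a claim about how Basemap computes label positions internally.

```python
import numpy as np

# The stop value of np.arange is exclusive, so 40 is never generated here
parallels_short = np.arange(-40, 40, 20)

# Moving the stop just past 40 includes it
parallels_full = np.arange(-40, 41, 20)

# Including 360 in the meridians gives a value that wraps back onto 0
meridians = np.arange(0, 361, 40)
wrapped = meridians % 360

print(parallels_short.tolist())
print(parallels_full.tolist())
print(wrapped.tolist())
```

The first list stops at 20, the second ends at 40, and the wrapped meridians start and end with 0, which is why two 0 labels end up at the same position.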
Here is the complete example:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
map = Basemap(llcrnrlon=0.,llcrnrlat=-40.,urcrnrlon=360.,urcrnrlat=40.,resolution='l')
map.drawcoastlines()
yticks = map.drawparallels(
np.arange(-40,41,20),labels=[1,0,1,1], linewidth=0.5, fontsize=7
)
xticks = map.drawmeridians(
np.arange(0,361,40),labels=[1,1,0,1], linewidth=0.5, fontsize=7
)
first_pos = xticks[0][1][0].get_position()
second_pos = xticks[40][1][0].get_position()
last_x = first_pos[0]+(second_pos[0]-first_pos[0])*360/40
xticks[360][1][0].set_position((last_x,first_pos[1]))
plt.show()
Here is the resulting plot:
Hope this helps.
