I'm building a Bokeh plot using bokeh.plotting. I have two series with a shared index, and I want to plot a vertical bar for each. With a single bar everything works fine, but when I add a second y range and the second bar, it seems to affect the primary y range (it changes the values from 0 to 4), and my second vbar() overlays the first. Any help on why the bars overlap instead of sitting side by side, and why the second series/y-axis seems to affect the first even though they are separate, would be appreciated.
import pandas as pd
import bokeh.plotting as bp
from bokeh.models import NumeralTickFormatter, HoverTool, Range1d, LinearAxis
df_x_series = ['a','b','c']
fig = bp.figure(title='WIP',x_range=df_x_series,plot_width=1200,plot_height=600,toolbar_location='below',toolbar_sticky=False,tools=['reset','save'],active_scroll=None,active_drag=None,active_tap=None)
fig.title.align= 'center'
fig.extra_y_ranges = {'c_count':Range1d(start=0, end=10)}
fig.add_layout(LinearAxis(y_range_name='c_count'), 'right')
fig.vbar(bottom=0, top=[1,2,3], x=['a','b','c'], color='blue', legend='Amt', width=0.3, alpha=0.5)
fig.vbar(bottom=0, top=[5,7,8], x=['a','b','c'], color='green', legend='Ct', width=0.3, alpha=0.8, y_range_name='c_count')
fig.yaxis[0].formatter = NumeralTickFormatter(format='0.0')
bp.output_file('bar.html')
bp.show(fig)
Here's the plot I believe you want:
And here's the code:
import bokeh.plotting as bp
from bokeh.models import NumeralTickFormatter, Range1d, LinearAxis
df_x_series = ['a', 'b', 'c']
fig = bp.figure(
title='WIP',
x_range=df_x_series,
y_range=Range1d(start=0, end=4),
plot_width=1200, plot_height=600,
toolbar_location='below',
toolbar_sticky=False,
tools=['reset', 'save'],
active_scroll=None, active_drag=None, active_tap=None
)
fig.title.align = 'center'
fig.extra_y_ranges = {'c_count': Range1d(start=0, end=10)}
fig.add_layout(LinearAxis(y_range_name='c_count'), 'right')
fig.vbar(bottom=0, top=[1, 2, 3], x=['a:0.35', 'b:0.35', 'c:0.35'], color='blue', legend='Amt', width=0.3, alpha=0.5)
fig.vbar(bottom=0, top=[5, 7, 8], x=['a:0.65', 'b:0.65', 'c:0.65'], color='green', legend='Ct', width=0.3, alpha=0.8, y_range_name='c_count')
fig.yaxis[0].formatter = NumeralTickFormatter(format='0.0')
bp.output_file('bar.html')
bp.show(fig)
A couple of notes:
Categorical axes are currently a bit (ahem) ugly in Bokeh. We hope to address this in the coming months. Each category has a scale of 0 to 1 after a colon, which lets you move things left and right within it. So, starting from the center at 0.5, I moved the first bar to the left by 0.3/2 and the second bar to the right by 0.3/2 (0.3 because that's the width you had used).
The y_range changed because you were relying on the default y_range, which is a DataRange1d. A DataRange1d uses all the data in the plot to pick its bounds and adds some padding, which is why it was starting below 0 and going up to the max of your new data. Manually specifying a range in the figure call gets around this.
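To make the offset arithmetic from the first note concrete, here is a minimal sketch in plain Python that builds the 'category:offset' strings for any number of side-by-side bars. The helper name dodge_positions is made up for illustration and is not part of Bokeh. The 'a:0.35' string form is taken from the code above, so treat it as something that only applies to Bokeh versions that accept that syntax.

```python
def dodge_positions(categories, n_series, width):
    """Return one list of 'category:offset' strings per series.

    Hypothetical helper (not a Bokeh API). Offsets are on the 0-1 scale
    of each category, with the group of bars centred on 0.5.
    """
    positions = []
    for i in range(n_series):
        # Shift each series by one bar width relative to its neighbour,
        # keeping the group as a whole centred on 0.5.
        offset = 0.5 + (i - (n_series - 1) / 2) * width
        positions.append([f"{cat}:{offset:.2f}" for cat in categories])
    return positions


amt_x, ct_x = dodge_positions(['a', 'b', 'c'], n_series=2, width=0.3)
print(amt_x)  # ['a:0.35', 'b:0.35', 'c:0.35']
print(ct_x)   # ['a:0.65', 'b:0.65', 'c:0.65']
```

The two lists can be passed as the x argument of the two vbar() calls, which reproduces the values hard-coded in the answer.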
Thanks for providing a code sample to work from :D
I am trying to create a barchart (overlaid on a line graph with days as the x axis instead of quarters) where the labels are end-of-quarter days. That is all fine, and generates nicely, but I am trying to set the labels so that they are lined up with the right edge of the plot and the corresponding bar's right-side is aligned with the x-tick.
A reproducible example (with just the bar chart, not the line) is:
import matplotlib.pyplot as pyplot
import pandas
import random
random.seed(2020)
dates = pandas.date_range("2016-12-31", "2017-12-31")
bar = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
line = pandas.DataFrame([random.random() for r in range(len(dates))], index = dates, columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
ax2 = fig.add_subplot(111, frame_on = False)
bar.plot(kind = "bar", ax = ax, width = 1)
line.plot(kind = "line", ax = ax2)
ax2.set_xticks([])
ax.yaxis.tick_right()
ax.yaxis.set_label_position("right")
fig.tight_layout()
pyplot.show()
Which yields a plot as:
My goal is to have the right side of the 2017-12-31 column aligned with the right edge of the plot and the 2017-12-31 label at the right side as well. Further, the left side of the 2017-03-31 bar should touch the left side of the plot. For the remaining bars, I would like them evenly spaced with all labels aligned with the right side of each bar, and no space in between bars. Like this example below:
Frankly, I'm at a loss. I've tried adding ha="right" to no avail, and just shifting the graphs, but that leaves me with other problems and doesn't really address the issue. Even with the bars shifted, I'm still fairly constrained in moving the tick labels and haven't found anything online that remotely addresses the problem.
Would it be better to create the bar chart so that it has the same index as the line chart, then set the x tick labels to be the desired dates?
Does anyone have any guidance? I've spent too much time on this problem today and it's driving me nuts.
In order to plot the bar chart tightly, you can use the autoscale function as below.
To move the tick labels, you can modify the transformations to include some offset. Below I used 0.7 but you can select it based on other sizes used in your chart.
import matplotlib.pyplot as pyplot
import pandas
import matplotlib.transforms as tr
df = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
df.plot(kind = "bar", ax = ax, width = 1)
pyplot.autoscale(enable=True, axis='x', tight=True) # tight layout
# for each tick label, shift 0.7 to right
for tick in ax.get_xticklabels():
    tick.set_transform(tick.get_transform()+tr.ScaledTranslation(0.7, 0, fig.dpi_scale_trans))
pyplot.show()
The result looks like this.
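The 0.7 is in inches, because ScaledTranslation is combined with fig.dpi_scale_trans. One way to pick the value instead of guessing is to shift each label by half a bar width. The sketch below is only a rough estimate: it assumes the bars have width=1 and fill the axes edge to edge (as after the tight autoscale), that the axes use Matplotlib's default left/right margins of 0.125 and 0.9, and that tight_layout is not called. The function name is hypothetical.

```python
def label_shift_inches(fig_width_in, n_bars, left=0.125, right=0.9):
    """Half a bar width in inches, for bars that fill the axes edge to edge.

    left/right are the fractions of the figure width where the axes start
    and end; 0.125 and 0.9 are assumed to be the default subplot margins.
    """
    axes_width_in = fig_width_in * (right - left)
    bar_width_in = axes_width_in / n_bars
    return bar_width_in / 2


shift = label_shift_inches(fig_width_in=7, n_bars=4)
print(round(shift, 3))  # 0.678
```

For the 7-inch-wide figure with four bars this gives about 0.68, which is close to the 0.7 used above.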
I created a scatter plot in seaborn using seaborn.relplot, but am having trouble putting the legend all in one graph.
When I do this simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is one graph where I can specify the figure dimensions and have the legend at the bottom in two rows, below the x-axis (if that is too difficult, or not supported, then the default legend position to the right on the same graph would work too). I know the problem lies with "ax=ax", and in the way I am specifying the dimensions as a matplotlib figure, but I'd like to know specifically why this causes a problem so I can learn from this.
Thank you for your time.
The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page). With a simple sns.scatterplot (the default type of plot used by sns.relplot), your code works (changed to use reproducible data):
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
plt.show()
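If you would rather keep sns.relplot, the figure size has to go through its own height and aspect parameters instead of a pre-made Axes, since the FacetGrid creates its own figure. Here is a small sketch of the conversion, assuming a single facet and ignoring the extra room the legend takes. The helper name facet_size_kwargs is hypothetical, not a seaborn function.

```python
def facet_size_kwargs(width_in, height_in, n_cols=1, n_rows=1):
    """Translate a target figure size into height/aspect keyword arguments.

    Hypothetical helper: each facet is `height` inches tall and
    `height * aspect` inches wide, so the grid is roughly
    (n_cols * height * aspect) by (n_rows * height) inches.
    """
    height = height_in / n_rows
    aspect = (width_in / n_cols) / height
    return {"height": height, "aspect": aspect}


kwargs = facet_size_kwargs(10, 10)
print(kwargs)  # {'height': 10.0, 'aspect': 1.0}

# Intended use (not run here, needs seaborn and the question's DataFrame):
# sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome',
#             size='outcome', legend='brief', data=df2, **kwargs)
```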
Further edits to legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which is actually a legend entry, by getting and slicing the handles and labels
Set a new title that is actually a title
Move the location and make use of bbox_to_anchor to move outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], loc=8,
ncol=2, bbox_to_anchor=[0.5,-.3,0,0])
plt.show()
I have added a table to the bottom of my plot, but there are a number of issues with it:
The right has too much padding.
The left has too little padding.
The bottom has no padding.
The cells are too small for the text within them.
The table is too close to the bottom of the plot.
The cells belonging to the row names are not colored to match those of the bars.
I'm going out of my mind fiddling with this. Can someone help me fix these issues?
Here is the code (Python 3):
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# Set styles
plt.style.use(['seaborn-paper', 'seaborn-whitegrid'])
plt.style.use(['seaborn'])
sns.set(palette='colorblind')
matplotlib.rc("font", family="Times New Roman", size=12)
labels = ['n=1','n=2','n=3','n=4','n=5']
a = [98.8,98.8,98.8,98.8,98.8]
b = [98.6,97.8,97.0,96.2,95.4]
bar_width = 0.20
data = [a,b]
print(data)
colors = plt.cm.BuPu(np.linspace(0, 0.5, len(labels)))
columns = ('n=1', 'n=2', 'n=3', 'n=4', 'n=5')
index = np.arange(len(labels))
plt.bar(index, a, bar_width)
plt.bar(index+bar_width+.02, b, bar_width)
plt.table(cellText=data,
rowLabels=['a', 'b'],
rowColours=colors,
colLabels=columns,
loc='bottom')
plt.subplots_adjust(bottom=0.7)
plt.ylabel('Some y label which effect the bottom padding!')
plt.xticks([])
plt.title('Some title')
plt.show()
This is the output:
Update
This is working now, but in case someone else is having issues: Make sure you are not viewing your plots and the changes you make to them with IntelliJ SciView as it does not represent changes accurately and introduces some formatting issues!
I think you can fix the first problem by setting the bounding box when you make the table using bbox like this:
bbox=[0, 0.225, 1, 0.2]
where the parameters are [left, bottom, width, height].
For the second issue (the coloring), the problem is that your color array does not correspond to the colors seaborn is using. You can query the seaborn color palette with
sns.color_palette(palette='colorblind')
this will give you a list of the colors seaborn is using.
Check the modifications below:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# Set styles
plt.style.use(['seaborn-paper', 'seaborn-whitegrid'])
plt.style.use(['seaborn'])
sns.set(palette='colorblind')
matplotlib.rc("font", family="Times New Roman", size=12)
labels = ['n=1','n=2','n=3','n=4','n=5']
a = [98.8,98.8,98.8,98.8,98.8]
b = [98.6,97.8,97.0,96.2,95.4]
bar_width = 0.20
data = [a,b]
colors = sns.color_palette(palette='colorblind')
columns = ('n=1', 'n=2', 'n=3', 'n=4', 'n=5')
index = np.arange(len(labels))
fig = plt.figure(figsize=(12,9))
plt.bar(index, a, bar_width)
plt.bar(index+bar_width+.02, b, bar_width)
plt.table(cellText=data,
rowLabels=[' a ', ' b '],
rowColours=colors,
colLabels=columns,
loc='bottom',
bbox=[0, 0.225, 1, 0.2])
fig.subplots_adjust(bottom=0.1)
plt.ylabel('Some y label which effect the bottom padding!')
plt.xticks([])
plt.title('Some title')
plt.show()
I also changed the subplot adjustment to subplots_adjust(bottom=0.1) because it wasn't coming out right otherwise. Here is the output:
Consider the following pandas DataFrame:
  labels  values_a  values_b  values_x  values_y
0  date1         1         3       150       170
1  date2         2         6       200       180
It is easy to plot this with Seaborn (see example code below). However, due to the big difference between values_a/values_b and values_x/values_y, the bars for values_a and values_b are not easily visible (the dataset above is just a sample; in my real dataset the difference is even bigger). Therefore, I would like to use two y-axes, i.e., one y-axis for values_a/values_b and one for values_x/values_y. I tried to use plt.twinx() to get a second axis, but unfortunately the plot shows only two bars, for values_x and values_y, even though there are at least two y-axes with the right scaling. :) Do you have an idea how to fix that and get four bars for each label, where the values_a/values_b bars relate to the left y-axis and the values_x/values_y bars relate to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
/test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
# Pin the ticks to the same positions as ax1, then modify the labels
ax2.set_yticks(ax1.get_yticks())
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))
ax2.set_ylabel('A and B')
plt.show()
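To see what the scaling does to the sample data, here is the same arithmetic worked through in plain Python, without pandas or plotting. The tick positions are hypothetical stand-ins for whatever Matplotlib picks on the primary axis.

```python
from statistics import mean

small = [1, 2, 3, 6]            # values_a and values_b
large = [150, 200, 170, 180]    # values_x and values_y

# Same rule as above: ratio of the two group means, truncated to an int
scale = int(mean(large) / mean(small))
print(scale)  # 58

# The small values are multiplied up so they are visible next to the large ones
scaled_small = [v * scale for v in small]
print(scaled_small)  # [58, 116, 174, 348]

# Hypothetical tick positions on the primary axis, relabelled for the
# secondary axis by dividing the scale back out
ticks = [0, 50, 100, 150, 200]
labels = [round(t / scale, 1) for t in ticks]
print(labels)  # [0.0, 0.9, 1.7, 2.6, 3.4]
```

So a bar drawn at height 174 reads as 3 on the secondary axis, which is the original values_b entry.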
I am using seaborn (v0.7.1) together with matplotlib (v1.5.1) and pandas (v0.18.1) to plot different clusters of data of different sizes as heat maps within a for loop, as shown in the following code.
My issue is that since each cluster contains a different number of rows, the final figures are of different sizes (i.e. the height and width of each box in the heat map are different across different heat maps) (see figures). Eventually, I would like to have figures of the same size (as explained above).
I have checked some parts of the seaborn and matplotlib documentation as well as Stack Overflow, but since I do not know what the exact keywords are to look for (as evident in the question title itself) I have not been able to find any answer. [EDIT: Now I have updated the title based on a suggestion from @ImportanceOfBeingErnest. Previously the title read: "Enforcing the same width across multiple plots".]
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
clusters = pd.DataFrame([(1,'aaaaaaaaaaaaaaaaa'),(1,'b'), (1,'c'), (1,'d'), (2,'e'), (2,'f')])
clusters.columns = ['c', 'p']
clusters.set_index('c', inplace=True)
g = pd.DataFrame(np.ones((6,4)))
c= pd.DataFrame([(1,'aaaaaaaaaaaaaaaaa'),(2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f')])
c.columns = ['i', 'R']
for i in range(1,3,1):
    ee = clusters[clusters.index==i].p
    inds = []
    for v in ee:
        inds.append(np.where(c.R.values == v)[0][0])
    f, ax = plt.subplots(1, figsize=(13, 15))
    ax = sns.heatmap(g.iloc[inds], square=True, ax=ax, cbar=True, linewidths=2, linecolor='k', cmap="Reds", cbar_kws={"shrink": .5},
                     vmin = math.floor(g.values.min()), vmax =math.ceil(g.values.max()))
    null = ax.set_xticklabels(['a', 'b', 'c', 'd'], fontsize=15)
    null = ax.set_yticklabels(c.R.values[inds][::-1], fontsize=15, rotation=0)
    plt.tight_layout(pad=3)
[EDIT]: Now I have added some code to create a minimal, functional example as suggested by @Brian. I have also noticed that the issue might have been caused by the text!
Under the following conditions
If only the squares in the saved images should have the same size and we don't care about the plot on screen and
We can omit the colorbar
the solution is rather straightforward.
One would define the size that one square should have in the final image (e.g. squaresize = 50 pixels), find out the number of squares to draw in each dimension (n rows, m columns) and adjust the figure size as
figwidth = m*squaresize/float(dpi)
figheight = n*squaresize/float(dpi)
where dpi denotes the pixels per inch.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
dpi=100
squaresize = 50 # pixels
n = 3
m = 4
data = np.random.rand(n,m)
figwidth = m*squaresize/float(dpi)
figheight = n*squaresize/float(dpi)
f, ax = plt.subplots(1, figsize=(figwidth, figheight), dpi=dpi)
f.subplots_adjust(left=0, right=1, bottom=0, top=1)
ax = sns.heatmap(data, square=True, ax=ax, cbar=False)
plt.savefig(__file__+".png", dpi=dpi, bbox_inches="tight")
The bbox_inches="tight" makes sure that the labels etc. are still drawn (i.e. the final figure size will be larger than the one calculated here, depending on how much space the labels need).
To apply this example to your case you'd still need to find out how many rows and columns you have in the heatmap depending on the dataframe, but as I don't have its structure, it's hard to provide a general solution.
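As a rough sketch of that last step, the figure size can be computed per cluster from the number of rows and columns being plotted. The helper name heatmap_figsize is made up for illustration, and the cluster sizes below are read off the minimal example in the question (4 rows in cluster 1, 2 rows in cluster 2, 4 columns each).

```python
def heatmap_figsize(n_rows, n_cols, squaresize=50, dpi=100):
    """Figure size in inches so that each cell is squaresize pixels on a side."""
    figwidth = n_cols * squaresize / float(dpi)
    figheight = n_rows * squaresize / float(dpi)
    return (figwidth, figheight)


# (cluster id, number of rows in that cluster)
sizes = {}
for cluster_id, n_rows in [(1, 4), (2, 2)]:
    sizes[cluster_id] = heatmap_figsize(n_rows, n_cols=4)
    print(cluster_id, sizes[cluster_id])
# 1 (2.0, 2.0)
# 2 (2.0, 1.0)
```

With a DataFrame, the two counts would presumably come from the shape of the slice being drawn, e.g. n_rows, n_cols = g.iloc[inds].shape inside the loop, and the result would be passed as figsize to plt.subplots together with the same dpi.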