The code below creates a bar plot with an inverted y-axis. What I haven't managed yet is to make the bars start at the bottom instead of "hanging from above". In other words, I would like the bars to start at the maximum value of the y-axis (i.e. at the x-axis) and end at the value of df['y']. How can I do that?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data={'x_cat': ['aaaaa',
'bvvvvvv',
'deeeee',
'qqqqqqq',
'rr rrrrrrrr',
'rss sdasr',
'cccccccccccc',
'aarrrrrrrrrrra'
],
'y': [11.91,
35.19,
43.61,
46.12,
75.03,
81.39,
83.28,
89.20]
})
df['rank'] = df['y'].rank(method='dense') - 1
fig = plt.figure()
ax = fig.add_subplot(111)
# increase space below subplot
fig.subplots_adjust(bottom=0.3)
ax.bar(df['rank'],
df['y'],
width=0.8,
)
# invert y axis
ax.invert_yaxis()
# label x axis
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df['x_cat'],
fontdict={'fontsize': 14})
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
You would need to calculate the new bottom. (Note that because the axis is inverted, the "bottom" becomes the visual top of the bars.) The bottom is the value itself; the height is the maximum minus the value.
I changed some other aspects of your plot. For example, if your values are not sorted, calculating the rank and using it for plotting would result in wrong labelling. It is therefore better to sort the dataframe beforehand (and forget about the rank).
Finally, we would need to adjust the "sticky edges" of the bars, because they should sit tight to the bottom of the figure (i.e. the top of the axis).
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'x_cat': ['aaaaa', 'bvvvvvv', 'deeeee', 'qqqqqqq', 'rr rrrrrrrr',
'rss sdasr', 'cccccccccccc', 'aarrrrrrrrrrra'],
'y': [11.91, 35.19, 43.61, 46.12, 75.03, 81.39, 83.28, 89.20]})
df.sort_values("y", inplace=True)
fig = plt.figure()
ax = fig.add_subplot(111)
# increase space below subplot
fig.subplots_adjust(bottom=0.3)
bars = ax.bar(df['x_cat'], df['y'].max()-df['y'], bottom=df['y'], width=0.8, )
# invert y axis
ax.invert_yaxis()
ax.tick_params(axis="x", rotation=90, labelsize=14)
for bar in bars:
    bar.sticky_edges.y[:] = [df['y'].values.max()]
ax.autoscale()
plt.show()
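The bottom/height arithmetic can be checked in isolation. The following is a minimal sketch, assuming a small made-up list of values rather than the dataframe above, and assuming the non-interactive Agg backend so that it runs without a display. It only confirms that each rectangle spans from its value to the shared maximum.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

values = [10.0, 40.0, 90.0]          # hypothetical data, not from the question
top = max(values)                    # shared end point of all bars
heights = [top - v for v in values]  # bar length = maximum minus value

fig, ax = plt.subplots()
bars = ax.bar(range(len(values)), heights, bottom=values)
ax.invert_yaxis()

# (start, end) of each rectangle in data coordinates
spans = [(b.get_y(), b.get_y() + b.get_height()) for b in bars]
```

The bar belonging to the maximum value has zero height, which matches the plot produced by the answer's code.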
I'm plotting the counts of a categorical variable and want to add a second y-axis that shows the percentage of the total number of samples.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="alive", col="embark_town", col_wrap=4,
data=titanic[titanic.deck.notnull()],
kind="count", height=4, aspect=.8)
for i, ax in enumerate(g.axes.flat):
    # Create second y-axis for the percentages on the right
    ax1 = ax.twinx()
    ### Attempt to fix percentages by plotting the bars over
    #g = sns.catplot(x="alive", col="embark_town", col_wrap=4,
    #                data=titanic[titanic.deck.notnull()],
    #                kind="count", height=4, aspect=.8,
    #                ax = ax1)
    # Label by the percentages
    ax1.set_ylim(ax.get_ylim())
    ax1.set_yticklabels(np.round(ax.get_yticks()/titanic[titanic.deck.notnull()].shape[0],1))
    ax1.set_ylabel('Percentage')
    # Rotate x-labels
    labels = ax.get_xticklabels() # get x labels
    ax.set_xticklabels(labels, rotation=90)
# Ensure good spacing
g.fig.tight_layout()
Right now, my issue is that the percentages are being duplicated on the right y-axis, as shown in the image below.
I've tried to correct this by plotting the counts on the new axis, but that adds another row of subplots (see commented out code in the for loop). How can I get the right y-axis labels to not have duplicate values and actually reflect the percentages of the total count?
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="alive", col="embark_town", col_wrap=4,
data=titanic[titanic.deck.notnull()],
kind="count", height=4, aspect=.8)
# calculate number of samples:
total_samples = len(titanic[titanic.deck.notnull()])
for i, ax in enumerate(g.axes.flat):
    # bounds of the left y-axis:
    ymin, ymax = ax.get_ylim()
    # create second y-axis for the percentages on the right
    ax1 = ax.twinx()
    # scale right axis limits to total samples and multiply by 100 for percentages
    ax1.set_ylim(100*ymin/total_samples, 100*ymax/total_samples)
# Ensure good spacing
g.fig.tight_layout()
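A possible alternative, not part of the answer above, is to keep the twin axis on the same limits as the count axis and let matplotlib's PercentFormatter convert the tick values. This is a minimal sketch assuming made-up counts and a made-up total (the names counts and total are hypothetical) and the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

counts = {"no": 30, "yes": 50}   # hypothetical counts per category
total = 200                      # hypothetical total number of samples

fig, ax = plt.subplots()
ax.bar(list(counts), list(counts.values()))

ax_pct = ax.twinx()
ax_pct.set_ylim(ax.get_ylim())   # same limits, so both axes describe the same heights
formatter = PercentFormatter(xmax=total, decimals=0)  # xmax is the value shown as 100%
ax_pct.yaxis.set_major_formatter(formatter)
ax_pct.set_ylabel("Percentage")

# how a count of 50 is labelled on the right-hand axis
label_for_50 = formatter.format_pct(50, total)
```

With this approach the tick positions stay in count units and only the labels change, so no manual rescaling of the limits is needed.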
I am trying to create a barchart (overlaid on a line graph with days as the x axis instead of quarters) where the labels are end-of-quarter days. That is all fine, and generates nicely, but I am trying to set the labels so that they are lined up with the right edge of the plot and the corresponding bar's right-side is aligned with the x-tick.
A reproducible example (with just the bar chart, not the line) is:
import matplotlib.pyplot as pyplot
import pandas
import random
random.seed(2020)
dates = pandas.date_range("2016-12-31", "2017-12-31")
bar = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
line = pandas.DataFrame([random.random() for r in range(len(dates))], index = dates, columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
ax2 = fig.add_subplot(111, frame_on = False)
bar.plot(kind = "bar", ax = ax, width = 1)
line.plot(kind = "line", ax = ax2)
ax2.set_xticks([])
ax.yaxis.tick_right()
ax.yaxis.set_label_position("right")
fig.tight_layout()
pyplot.show()
Which yields a plot as:
My goal is to have the right side of the 2017-12-31 column aligned with the right edge of the plot, with the 2017-12-31 label at the right side as well. Further, the left side of the 2017-03-31 bar should touch the left side of the plot. For the remaining bars, I would like them evenly spaced, with all labels aligned with the right side of each bar and no space between bars. Like this example below:
Frankly, I'm at a loss. I've tried adding ha="right" to no avail, and I've tried just shifting the graphs, but that leaves me with other problems and doesn't really address the issue. Even with the bars shifted, I'm still fairly constrained in moving the tick labels, and I haven't found anything online that remotely addresses the problem.
Would it be better to create the bar chart so that it has the same index as the line chart, then set the x tick labels to be the desired dates?
Does anyone have any guidance? I've spent too much time on this problem today and it's driving me nuts.
In order to plot the bar chart tightly, you can use the autoscale function as below.
To move the tick labels, you can modify the transformations to include some offset. Below I used 0.7 but you can select it based on other sizes used in your chart.
import matplotlib.pyplot as pyplot
import pandas
import matplotlib.transforms as tr
df = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
df.plot(kind = "bar", ax = ax, width = 1)
pyplot.autoscale(enable=True, axis='x', tight=True) # tight layout
# for each tick label, shift 0.7 to right
for tick in ax.get_xticklabels():
    tick.set_transform(tick.get_transform()+tr.ScaledTranslation(0.7, 0, fig.dpi_scale_trans))
pyplot.show()
The result looks like this.
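Another way to get the right edge of each bar onto its tick, offered here only as a sketch and not as part of the answer above, is to draw the bars with plain matplotlib using align="edge" and a negative width. The numeric positions 1 to 4 are an assumption made for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

labels = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"]
heights = [0.02, 0.01, -0.01, 0.05]
positions = [1, 2, 3, 4]   # hypothetical tick positions, one per quarter end

fig, ax = plt.subplots(figsize=(7, 3))
# align="edge" with a negative width places the right edge of each bar at its position
bars = ax.bar(positions, heights, width=-1, align="edge")
ax.set_xticks(positions)
ax.set_xticklabels(labels, ha="right")
ax.margins(x=0)            # no padding before the first or after the last bar

# right-hand edge of every bar in data coordinates
right_edges = [max(b.get_x(), b.get_x() + b.get_width()) for b in bars]
```

Because the ticks sit exactly on the bar edges, the labels do not need a manual offset; ha="right" only controls on which side of the tick the text is anchored.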
I'm currently learning how to import data and work with it in matplotlib, and I'm having trouble even though I have the exact code from the book.
This is what the plot looks like. My question is: how can I remove the white space at the start and the end of the x-axis?
Here is the code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime
# Get dates and high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #for index, column_header in enumerate(header_row):
        #print(index, column_header)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10,6))
plt.plot(dates, highs, c='red')
# Format plot.
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
There is an automatic margin set at the edges, which ensures that the data fit nicely within the axis spines. In this case such a margin is probably desired on the y axis. By default it is set to 0.05, in units of the axis span.
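To make "0.05 in units of axis span" concrete, here is the arithmetic as a small sketch. The helper padded_limits is hypothetical (it is not a matplotlib function), and it assumes a linear axis with no sticky edges involved.

```python
def padded_limits(data_min, data_max, margin=0.05):
    """Return (lower, upper) after padding each side by margin * data span."""
    span = data_max - data_min
    return data_min - margin * span, data_max + margin * span

# data from 0 to 100 with the default margin: 5 units of padding on each side
lower, upper = padded_limits(0.0, 100.0)

# a margin of 0 leaves the limits exactly at the data range
lower0, upper0 = padded_limits(0.0, 100.0, margin=0)
```
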
To set the margin to 0 on the x axis, use
plt.margins(x=0)
or
ax.margins(x=0)
depending on the context. Also see the documentation.
In case you want to get rid of the margin in the whole script, you can use
plt.rcParams['axes.xmargin'] = 0
at the beginning of your script (the same applies to y, of course). If you want to get rid of the margin entirely and permanently, you might want to change the corresponding lines in the matplotlib rc file:
axes.xmargin : 0
axes.ymargin : 0
Example
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
tips.plot(ax=ax1, title='Default Margin')
tips.plot(ax=ax2, title='Margins: x=0')
ax2.margins(x=0)
Alternatively, use plt.xlim(..) or ax.set_xlim(..) to manually set the limits of the axes such that there is no white space left.
If you only want to remove the margin on one side but not the other, e.g. remove the margin from the right but not from the left, you can use set_xlim() on a matplotlib axes object.
import seaborn as sns
import matplotlib.pyplot as plt
import math
max_x_value = 100
x_values = [i for i in range (1, max_x_value + 1)]
y_values = [math.log(i) for i in x_values]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(ax=ax1, x=x_values, y=y_values)
sns.lineplot(ax=ax2, x=x_values, y=y_values)
ax2.set_xlim(-5, max_x_value) # tune the -5 to your needs
Consider the following pandas DataFrame:
  labels  values_a  values_b  values_x  values_y
0  date1         1         3       150       170
1  date2         2         6       200       180
It is easy to plot this with Seaborn (see example code below). However, due to the big difference between values_a/values_b and values_x/values_y, the bars for values_a and values_b are not easily visible (actually, the dataset given above is just a sample and in my real dataset the difference is even bigger). Therefore, I would like to use two y-axis, i.e., one y-axis for values_a/values_b and one for values_x/values_y. I tried to use plt.twinx() to get a second axis but unfortunately, the plot shows only two bars for values_x and values_y, even though there are at least two y-axis with the right scaling. :) Do you have an idea how to fix that and get four bars for each label whereas the values_a/values_b bars relate to the left y-axis and the values_x/values_y bars relate to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
/test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
# Ensure ticks occur at the same positions, then modify labels
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))
ax2.set_ylabel('A and B')
plt.show()
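If rescaling the data by hand feels fragile, matplotlib's secondary_yaxis can express the same relationship as a pair of conversion functions. The sketch below is an assumption-laden illustration, not the answer's method: the factor scale = 50 is made up, only two bars are drawn, and the Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

scale = 50   # hypothetical factor between the large and the small group

fig, ax1 = plt.subplots()
ax1.bar(["date1", "date2"], [150, 200])
ax1.set_ylabel("X and Y")

# right-hand axis shows left-axis values divided by scale;
# the second function is the inverse conversion
ax2 = ax1.secondary_yaxis(
    "right",
    functions=(lambda y: y / scale, lambda y: y * scale),
)
ax2.set_ylabel("A and B")

fig.canvas.draw()   # drawing lets the secondary axis pick up the parent's limits
left_limits = ax1.get_ylim()
right_limits = ax2.get_ylim()
```

The bars for the small group would still have to be plotted multiplied by scale, as in the answer above; the secondary axis only replaces the manual tick relabelling.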
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
d = ['d1','d2','d3','d4','d5','d6']
value = [111111, 222222, 333333, 444444, 555555, 666666]
y_cumsum = np.cumsum(value)
sns.barplot(x=d, y=value)
sns.pointplot(x=d, y=y_cumsum)
plt.show()
I'm trying to make a Pareto diagram with barplot and pointplot, but I can't get percentages onto the right-hand y-ticks. Also, if I manipulate the yticks, the labels overlap.
plt.yticks([1,2,3,4,5])
overlaps like in the image.
Edit: I mean that I also want quarter percentages (0%, 25%, 50%, 75%, 100%) on the right-hand side of the graphic.
From what I understood, you want to show the percentages on the right hand side of your figure. To do that, we can create a second y axis using twinx(). All we need to do then is to set the limits of this second axis appropriately, and set some custom labels:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
d = ['d1','d2','d3','d4','d5','d6']
value = [111111, 222222, 333333, 444444, 555555, 666666]
fig, ax = plt.subplots()
ax2 = ax.twinx() # create a second y axis
y_cumsum = np.cumsum(value)
sns.barplot(x=d, y=value, ax=ax)
sns.pointplot(x=d, y=y_cumsum, ax=ax)
y_max = y_cumsum.max() # maximum of the array
# find the percentages of the max y values.
# This will be where the "0%, 25%" labels will be placed
ticks = [0, 0.25*y_max, 0.5*y_max, 0.75*y_max, y_max]
ax2.set_ylim(ax.get_ylim()) # set second y axis to have the same limits as the first y axis
ax2.set_yticks(ticks)
ax2.set_yticklabels(["0%", "25%","50%","75%","100%"]) # set the labels
ax2.grid(False)
plt.show()
This produces the following figure:
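The tick positions and labels used above can be derived without any plotting. This is a small sketch of that arithmetic using only the standard library; the variable names other than value are chosen here for illustration.

```python
from itertools import accumulate

value = [111111, 222222, 333333, 444444, 555555, 666666]
cumulative = list(accumulate(value))   # running total, same as np.cumsum(value)
total = cumulative[-1]

# positions in data units where 0%, 25%, 50%, 75% and 100% of the total fall
tick_positions = [total * q / 4 for q in range(5)]
tick_labels = [f"{25 * q}%" for q in range(5)]

# cumulative share of each category, in percent
cumulative_pct = [100 * c / total for c in cumulative]
```

Passing tick_positions to set_yticks and tick_labels to set_yticklabels on the twin axis reproduces the labelling from the answer.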