How to get / set the correct (formatted) yticks of a colorbar in matplotlib without whitespace in the colorbar?

How to get the correct yticks of a colorbar in matplotlib without whitespace in the colorbar?
This is my code. Note that the colors of the colorbar are misaligned if I apply .set_ticks() using the (formatted) values I got through get_yticks(). These values (as printed in the output) seem incorrect: the minimum shown is 15, but my minimum input value is 17.15116279.
import geopandas as gpd # version 0.11.0
import matplotlib.pyplot as plt # version 3.5.2
import matplotlib.colors as clr
from matplotlib import colorbar
from matplotlib.colors import Normalize # tbv colorbar
from matplotlib import cm
import matplotlib.ticker as mtick
cmap = clr.LinearSegmentedColormap.from_list('custom blue', ["#fce19c", "#c4ddee"], N=400)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est>0) & (world.name!="Antarctica")]
vals = [22.36958444, 29.21348315, 30.74534161, 37.42331288, 20.,
19.31407942, 26.08695652, 26.36165577, 25.0, 17.79279279,
17.15116279, 19.60784314]
world = world[:len(vals)]
world['gdp_per_cap'] = vals
fig, ax = plt.subplots(1, 1)
ax = world.plot(column='gdp_per_cap', ax=ax, legend=False, cmap=cmap)
from mpl_toolkits.axes_grid1.axes_divider import make_axes_locatable
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
vmin = world['gdp_per_cap'].min()
vmax = world['gdp_per_cap'].max()
norm = Normalize(vmin=vmin, vmax=vmax)
n_cmap = cm.ScalarMappable(norm=norm, cmap=cmap)
n_cmap.set_array([])
cbar = fig.colorbar(n_cmap, cax=cax)
print(cax==cbar.ax) # True
vals = cbar.ax.get_yticks()
print(vals)
cbar.ax.yaxis.set_ticks(vals)
cbar.ax.set_yticklabels(['{:,.0%}'.format(x/100) for x in vals])
plt.show()
Note that the colorbar remains correct if
cbar.ax.yaxis.set_ticks(vals)
is not applied. But in that case I get the warning "UserWarning: FixedFormatter should only be used together with FixedLocator".
Also note: to avoid the issue I could apply a format this way:
cax_format = mtick.PercentFormatter(decimals=2)
cbar = fig.colorbar(n_cmap, cax=cax, format=cax_format)
And if I add the line
fig.draw_without_rendering()
# followed by vals = cbar.ax.get_yticks()
as suggested by Stef in the comments then the values are different (but still incorrect from my point of view) and the colorbar gets a 2nd white area due to this:
This is what it looks like if I do not set the ticks. This is what I am after, but the warning made me set the ticks and realise that something may be wrong.
Based on the 2nd comment by Stef: "note that not necessarily all ticks are within the view limits, i.e. this first and last one may not actually be displayed. Manually setting ticks, on the other hand, expands the view limits to the ticks range given. If these are outside vmin / vmax it will cause the white gap you see."
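The effect described in that comment can be reproduced in isolation. The sketch below is not the original code: it assumes a plain ScalarMappable with the built-in viridis colormap instead of the geopandas plot, and it keeps only the tick values that lie inside [vmin, vmax] before fixing them, so the view limits of the colorbar should not be expanded.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize

vmin, vmax = 17.15116279, 37.42331288  # min and max of the question's values

fig, ax = plt.subplots()
sm = cm.ScalarMappable(norm=Normalize(vmin=vmin, vmax=vmax), cmap="viridis")
cbar = fig.colorbar(sm, ax=ax)
fig.canvas.draw()  # make sure the tick locations have been computed

# Keep only the ticks inside the colour range; ticks outside it would
# stretch the view limits and leave a white gap in the colorbar.
inside = [t for t in cbar.get_ticks() if vmin <= t <= vmax]
cbar.set_ticks(inside)
cbar.set_ticklabels(['{:,.0%}'.format(t / 100) for t in inside])

print(inside)
print(cbar.ax.get_ylim())
```

Because every fixed tick is inside the colour range, the printed limits should still match vmin and vmax.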
Indeed, if I manually adjust the values as follows:
fig.draw_without_rendering()
vals = cbar.ax.get_yticks()
print(vals)
vals = [vmin] + vals[1:-1].tolist() + [vmax]
print(vals)
cbar.ax.yaxis.set_ticks(vals)
vals = ['{:,.0%}'.format(x/100) for x in vals]
vals = [''] + vals[1:-1] + ['']
print(vals)
cbar.ax.set_yticklabels(vals)
plt.show()
Then you get:

By manually setting the ticks and tick labels, you create a fixed locator and a corresponding function formatter. Using a fixed locator is seldom the optimal solution due to the possible pitfalls outlined in the comments.
If you just want to add a % sign and/or change the number of decimals, you can use a string formatter, which is implicitly created when you pass a format string to set_major_formatter:
cax.yaxis.set_major_formatter('{x:g} %')
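As a self-contained illustration of this approach, here is a sketch that assumes a stand-in ScalarMappable and the built-in viridis colormap rather than the question's map. It sets only the formatter and leaves the automatic tick locator alone. PercentFormatter with xmax=100 is shown as an alternative way to get the % sign.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from matplotlib import cm
from matplotlib.colors import Normalize

fig, ax = plt.subplots()
sm = cm.ScalarMappable(norm=Normalize(vmin=17.15, vmax=37.42), cmap="viridis")
cbar = fig.colorbar(sm, ax=ax)

# A format string is turned into a StrMethodFormatter (matplotlib >= 3.3).
cbar.ax.yaxis.set_major_formatter('{x:g} %')
fig.canvas.draw()
str_labels = [t.get_text() for t in cbar.ax.get_yticklabels()]

# Alternative: PercentFormatter, where xmax=100 means the value 100 is "100%".
cbar.ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=100, decimals=0))
fig.canvas.draw()
pct_labels = [t.get_text() for t in cbar.ax.get_yticklabels()]

print(str_labels)
print(pct_labels)
```

Neither variant calls set_ticks, so no FixedLocator is created and the FixedFormatter warning should not appear.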

Related

How to increase the size of the figure by percentage but keep the original aspect ratio?

I have the following code to draw a figure
import pandas as pd
import urllib3
import seaborn as sns
decathlon = pd.read_csv("https://raw.githubusercontent.com/leanhdung1994/Deep-Learning/main/decathlon.txt", sep='\t')
fig = sns.scatterplot(data = decathlon,
x = '100m', y = 'Long.jump',
hue = 'Points', palette = 'viridis')
sns.regplot(data = decathlon,
x = '100m', y = 'Long.jump',
scatter = False)
I read answers to similar questions, and they use the option plt.figure(figsize=(20,10)). I would like to keep the original aspect (the ratio of width to height) but increase the size of the figure by some percentage for a better look.
Could you please elaborate on how to do so?
I forgot to add the line %config InlineBackend.figure_format = 'svg' to the code above. When I add this line, the answer below unfortunately does not work.
First, the object returned by scatterplot() is an Axes, not a figure. scatterplot() uses the current axes to draw the plot. If there is no current axes, then matplotlib automatically creates one in the current figure. If there is no current figure, then matplotlib automatically creates a new figure.
The size of this figure is determined by the value in rcParams['figure.figsize']. Therefore, you should create a figure that has the same aspect ratio as defined in this variable before calling your plots.
For instance, the code below creates a figure that's 2x the size of the default figure.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
fig = plt.figure(figsize= 2 * np.array(plt.rcParams['figure.figsize']))
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
sns.regplot(data=tips, x="total_bill", y="tip", scatter=False, ax=ax)
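The same idea can be expressed as a percentage. The sketch below uses only matplotlib, and the helper name scaled_figsize is made up for illustration. It scales the default size from rcParams and also shows how an already created figure could be resized in place with set_size_inches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt

def scaled_figsize(percent):
    """Return the default figure size scaled to `percent` of its original (hypothetical helper)."""
    return np.array(plt.rcParams['figure.figsize']) * percent / 100.0

default_w, default_h = plt.rcParams['figure.figsize']

# New figure at 150 % of the default size; width and height scale together.
fig = plt.figure(figsize=scaled_figsize(150))

# An existing figure can be resized in place by the same factor.
fig2 = plt.figure()
fig2.set_size_inches(fig2.get_size_inches() * 1.5)

print(fig.get_size_inches(), fig2.get_size_inches())
```

Since both dimensions are multiplied by the same factor, the width-to-height ratio stays the same as the default.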

Control marker properties in seaborn pairwise boxplot

I'm trying to plot a boxplot for two different datasets on the same plot. The x axis shows the hours in a day, while the y axis goes from 0 to 1 (let's call it Efficiency). I would like to have different markers for the means of each dataset's boxes. I use 'meanprops' in seaborn, but that changes the marker style for both datasets at the same time. I've added 2000 lines of data in the Excel file that can be downloaded here. The values might not coincide with the ones in the picture but should be enough.
Basically I want the red squares to be blue on the orange boxplot, and red on the blue boxplot. Here is what I managed to do so far:
I tried changing the meanprops by using a dictionary with the labels as keys, but it seems to be entering a loop (in PyCharm it says Evaluating...)
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
#make sure you have your path sorted out
group1 = pd.read_excel('group1.xls')
fig, ax = plt.subplots(figsize = (20,10))
#does not work
#ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
# meanprops={"marker":{'7':"s",'8':'s'},"markerfacecolor":{'7':"white",'8':'white'},
#"markeredgecolor":{'7':"blue",'8':'red'})
#works but produces similar markers
ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
meanprops={"marker":"s","markerfacecolor":"white", "markeredgecolor":"blue"})
plt.legend(title='Groups', loc=2, bbox_to_anchor=(1, 1),borderaxespad=0.5)
# Add transparency to colors
for patch in ax.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .4))
ax.set_xlabel("Hours",fontsize=14)
ax.set_ylabel("M1 Efficiency",fontsize=14)
ax.tick_params(labelsize=10)
plt.show()
I also tried the FacetGrid but to no avail (Stops at 'Evaluating...'):
g = sns.FacetGrid(group1, col="M1_eff", hue="labels",hue_kws=dict(marker=["^", "v"]))
g = (g.map(plt.boxplot, "hour", "M1_eff")
.add_legend())
g.show()
Any help is appreciated!
I don't think you can do this using sns.boxplot() directly. I think you'll have to draw the means "by hand"
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
N=100
df = pd.DataFrame({'hour':np.random.randint(0,3,size=(N,)),
'M1_eff': np.random.random(size=(N,)),
'labels':np.random.choice([7,8],size=(N,))})
x_col = 'hour'
y_col = 'M1_eff'
hue_col = 'labels'
width = 0.8
hue_order=[7,8]
marker_colors = ['red','blue']
# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
fig, ax = plt.subplots()
ax = sns.boxplot(data=df, x=x_col, y=y_col, hue=hue_col, hue_order=hue_order, showfliers=False, showmeans=False)
means = df.groupby([hue_col,x_col])[y_col].mean()
for (gr,temp),o,c in zip(means.groupby(level=0),offsets,marker_colors):
    ax.plot(np.arange(temp.values.size)+o, temp.values, 's', c=c)
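The offset computation is the part that places each mean marker over the right box, so it may help to look at it separately. The helper below, hue_offsets, is a hypothetical name that wraps the formula used in the answer. It assumes the dodge logic of the seaborn version linked above, which other seaborn releases may not share.

```python
import numpy as np

def hue_offsets(n_levels, width=0.8):
    """Centre offsets of the nested boxes, following the formula in the answer.

    Assumes the hue-nesting logic of the seaborn version the answer links to.
    """
    each_width = width / n_levels
    offsets = np.linspace(0, width - each_width, n_levels)
    return offsets - offsets.mean()

two = hue_offsets(2)    # two hue levels, as in the question
three = hue_offsets(3)  # three hue levels, for comparison
print(two, three)
```

The offsets are symmetric around zero, so adding them to the integer category positions gives the x coordinate of each box centre.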

Issue with drawparallels argument in Basemap

This seems like it should be an easy fix but I can't get it to work. I would like 40°N to display in the attached plot, but setting the labels argument in drawparallels to [1,0,1,1] isn't doing the trick. According to the documentation, that should plot the parallel labels where they intersect the left, top and bottom of the plot. I would also like 0° to show up again in the bottom right corner. Any idea how I can fix those 2 issues?
from netCDF4 import Dataset as NetCDFFile
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.basemap import addcyclic
nc = NetCDFFile('C:/myfile.nc')
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time = nc.variables['time'][:]
olr = nc.variables['olr'][:]
olr,lon = addcyclic(olr,lon)
map = Basemap(llcrnrlon=0.,llcrnrlat=-40.,urcrnrlon=360.,urcrnrlat=40.,resolution='l')
lons,lats = np.meshgrid(lon,lat)
x,y = map(lons,lats)
levels = np.arange(-19.5,20.0,0.5)
levels = levels[levels!=0]
ticks = np.arange(-20.0,20.0,4.0)
cs = map.contourf(x,y,olr[0],levels, cmap='bwr')
cbar = plt.colorbar(cs, orientation='horizontal', cmap='bwr', spacing='proportional', ticks=ticks)
cbar.set_label(r'Outgoing Longwave Radiation Anomalies $\mathregular{(W/m^2)}$')
map.drawcoastlines()
map.drawparallels(np.arange(-40,40,20),labels=[1,0,1,1], linewidth=0.5, fontsize=7)
map.drawmeridians(np.arange(0,360,40),labels=[1,1,0,1], linewidth=0.5, fontsize=7)
The first part of the question is easy. In order for the label to show up, you have to actually draw the parallel, but np.arange(-40,40,20) does not include 40. So, if you change that statement to np.arange(-40,41,20) your 40N label will show up.
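The half-open behaviour of np.arange is easy to check on its own. This small sketch just prints the two ranges discussed above:

```python
import numpy as np

without_40 = np.arange(-40, 40, 20)  # the stop value is excluded
with_40 = np.arange(-40, 41, 20)     # stop moved past 40, so 40 is included

print(without_40.tolist())
print(with_40.tolist())
```

Only the second array contains 40, which is why only the second call gives drawparallels a parallel to label at 40°N.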
The second part should in principle be solvable in the same way, but Basemap apparently uses the modulo of the longitudes to compute the position of the labels, so just using np.arange(0,361,40) when drawing the meridians will result in two 0 labels on top of each other. However, we can capture the labels that drawmeridians generates and manually change the position of the second 0 label. The labels are stored in a dictionary, so they are easy to deal with. To compute the x-position of the last label, I take the difference in x-position between the first and the second label, multiply it by the number of meridian intervals (360/40), and add the x-position of the first label.
Here is the complete example:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
map = Basemap(llcrnrlon=0.,llcrnrlat=-40.,urcrnrlon=360.,urcrnrlat=40.,resolution='l')
map.drawcoastlines()
yticks = map.drawparallels(
np.arange(-40,41,20),labels=[1,0,1,1], linewidth=0.5, fontsize=7
)
xticks = map.drawmeridians(
np.arange(0,361,40),labels=[1,1,0,1], linewidth=0.5, fontsize=7
)
first_pos = xticks[0][1][0].get_position()
second_pos = xticks[40][1][0].get_position()
last_x = first_pos[0]+(second_pos[0]-first_pos[0])*360/40
xticks[360][1][0].set_position((last_x,first_pos[1]))
plt.show()
Here is the resulting plot:
Hope this helps.

Python matplotlib graphing [duplicate]

I need help with setting the limits of y-axis on matplotlib. Here is the code that I tried, unsuccessfully.
import matplotlib.pyplot as plt
plt.figure(1, figsize = (8.5,11))
plt.suptitle('plot title')
ax = []
aPlot = plt.subplot(321, axisbg = 'w', title = "Year 1")
ax.append(aPlot)
plt.plot(paramValues,plotDataPrice[0], color = '#340B8C',
marker = 'o', ms = 5, mfc = '#EB1717')
plt.xticks(paramValues)
plt.ylabel('Average Price')
plt.xlabel('Mark-up')
plt.grid(True)
plt.ylim((25,250))
With the data I have for this plot, I get y-axis limits of 20 and 200. However, I want the limits 20 and 250.
Get current axis via plt.gca(), and then set its limits:
ax = plt.gca()
ax.set_xlim([xmin, xmax])
ax.set_ylim([ymin, ymax])
One thing you can do is to set your axis range by yourself by using matplotlib.pyplot.axis.
from matplotlib import pyplot as plt
plt.axis([0, 10, 0, 20])
0,10 is for x axis range.
0,20 is for y axis range.
or you can also use matplotlib.pyplot.xlim or matplotlib.pyplot.ylim
plt.ylim(-2, 2)
plt.xlim(0,10)
Another workaround is to get the plot's axes and reassign changing only the y-values:
x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,25,250))
You can instantiate an object from matplotlib.pyplot.axes and call the set_ylim() on it. It would be something like this:
import matplotlib.pyplot as plt
axes = plt.axes()
axes.set_ylim([0, 1])
Just for fine tuning: if you want to set only one of the boundaries of the axis and leave the other boundary unchanged, you can use one or more of the following statements
plt.xlim(right=xmax) #xmax is your value
plt.xlim(left=xmin) #xmin is your value
plt.ylim(top=ymax) #ymax is your value
plt.ylim(bottom=ymin) #ymin is your value
Take a look at the documentation for xlim and for ylim
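The keyword form can be verified with the Axes method as well. The sketch below uses made-up data and changes only the upper y limit, leaving the automatically chosen lower limit as it was.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [30, 80, 120, 200])  # made-up data for illustration

bottom_before, top_before = ax.get_ylim()
ax.set_ylim(top=250)  # only the upper limit is changed
bottom_after, top_after = ax.get_ylim()

print((bottom_before, top_before), (bottom_after, top_after))
```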
This worked at least in matplotlib version 2.2.2:
plt.axis([None, None, 0, 100])
Probably this is a nice way to set up for example xmin and ymax only, etc.
To add to @Hima's answer, if you want to modify a current x or y limit you could use the following.
import numpy as np # you probably already do this so no extra overhead
fig, axes = plt.subplots()
axes.plot(data[:,0], data[:,1])
xlim = axes.get_xlim()
# example of how to zoomout by a factor of 0.1
factor = 0.1
new_xlim = (xlim[0] + xlim[1])/2 + np.array((-0.5, 0.5)) * (xlim[1] - xlim[0]) * (1 + factor)
axes.set_xlim(new_xlim)
I find this particularly useful when I want to zoom out or zoom in just a little from the default plot settings.
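The limit arithmetic above can be wrapped in a small function so it works for either axis. The name zoom_limits is made up for this sketch. A positive factor widens the range around its centre, and a negative one narrows it.

```python
def zoom_limits(lim, factor):
    """Scale the interval `lim` about its centre by (1 + factor) (hypothetical helper)."""
    low, high = lim
    center = (low + high) / 2
    half_span = (high - low) * (1 + factor) / 2
    return center - half_span, center + half_span

zoomed_out = zoom_limits((0, 10), 0.1)   # 10 % wider
zoomed_in = zoom_limits((0, 10), -0.1)   # 10 % narrower
print(zoomed_out, zoomed_in)
```

With a matplotlib Axes this could be used as axes.set_xlim(zoom_limits(axes.get_xlim(), 0.1)).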
This should work. Your code works for me, like for Tamás and Manoj Govindan. It looks like you could try to update Matplotlib. If you can't update Matplotlib (for instance if you have insufficient administrative rights), maybe using a different backend with matplotlib.use() could help.

Can't add matplotlib colorbar ticks

I am trying to add ticks and labels to a color bar, but they just don't show up in the output. I have tried two approaches (as shown in the code below). The second approach was to do as shown in another question on Stack Overflow: How to add Matplotlib Colorbar Ticks.
I must be overlooking something very simple here as I am a beginner in Matplotlib and Python.
I have managed to obtain the color bar, but the ticks I want just don't show up. Any help here will be greatly appreciated as I have been stuck at it for hours after trying and searching.
Here is the code I used to generate a heatmap using hexbin over a basemap.
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from matplotlib.colors import LinearSegmentedColormap
from matplotlib import cm
#Loading data from CSV file
DATA_FILE = '....../Population_data.csv'
roc_data = pd.read_csv(DATA_FILE)
roc_data.head()
#Creating figure window
fig = plt.figure(figsize=(14,10))
ax = fig.add_subplot(111)
#Drawing the basemap
m = Basemap(projection='merc', lat_0=43.12, lon_0=-77.626,
resolution = 'i',llcrnrlon=-78.236,
llcrnrlat=42.935,
urcrnrlon=-77.072,
urcrnrlat=43.349)
m.drawcoastlines()
m.drawcounties(zorder=20, color='red')
m.drawcountries()
m.drawmapboundary()
#plotting the heatmap using hexbin
x, y = m(roc_data['Longitude'].values, roc_data['Latitude'].values)
values = roc_data['Total(20-64)']
m.hexbin(x, y, gridsize = 125, bins = 'log', C = values, cmap = cm.Reds)
#Defining minimum, mean and maximum population values
max_p = roc_data['Total(20-64)'].max()
min_p = roc_data['Total(20-64)'].min()
mean_p = roc_data['Total(20-64)'].mean()
#Adding Colorbar
cb = m.colorbar(location = 'bottom', format = '%d', label = 'Population by Census Blocks')
#setting ticks
#cb.set_ticks([48, 107, 1302]) #First approach, didn't work
#cb.set_ticklabels(['Min', 'Mean', 'Max'])
cb.set_ticks([min_p, mean_p, max_p]) #Second approach, assumed ticks and tick labels should be same
cb.set_ticklabels([min_p, mean_p, max_p]) #from the above mentioned stackoverflow question, but didn't work
plt.show()
The output I get by using the first or second approach for colorbar ticks is the same. It is as here:
Heatmap and colorbar with no ticks and labels
I want the minimum, mean and maximum population values (48, 107 and 1302) to be shown on the colorbar with the labels Min, Mean and Max. Thank you for your time.
When plotting the hexbin plot with mode bins = 'log', the colors will be plotted with a logarithmic scaling. This means that if the data minimum, mean and maximum are min, mean and max, their values on the logarithmically scaled colorbar are log10(min), log10(mean), log10(max).
The ticks on the colorbar therefore need to be set with the log values. The tick labels can be set to any value. However, I would think that simply putting something like "mean" on a logarithmic scale may not be too informative.
A particularity is that the minimum of the colorbar is actually log10(min+1). The +1 is due to the log which is negative below 1.
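To make the mapping concrete, the sketch below computes the tick positions for the three values quoted in the question, following the answer's convention of shifting only the minimum by +1. This reflects the behaviour described in the answer. Newer matplotlib releases may implement bins='log' differently (for example through a logarithmic norm), in which case the colorbar may expect ticks in data units, so treat the positions as an assumption to check against the installed version.

```python
import numpy as np

min_p, mean_p, max_p = 48, 107, 1302  # values quoted in the question

# Positions on the log10-scaled colorbar, as described in the answer.
tick_positions = [np.log10(min_p + 1), np.log10(mean_p), np.log10(max_p)]

# The labels are free text; here they also carry the original values.
tick_labels = ['Min\n({:.1f})'.format(min_p),
               'Mean\n({:.1f})'.format(mean_p),
               'Max\n({:.1f})'.format(max_p)]

print(tick_positions)
print(tick_labels)
```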
Here is a complete example.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
from mpl_toolkits.basemap import Basemap
from matplotlib import cm
lon = -78.236+np.random.rand(1000)*(-77.072+78.236)
lat = 42.935 + np.random.rand(1000)*(43.349-42.935)
t = 99+np.random.normal(10,20,1000)
t[:50] = np.linspace(48,1302)
roc_data = pd.DataFrame({'Longitude':lon, 'Latitude':lat, "T":t })
#Creating figure window
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
#Drawing the basemap
m = Basemap(projection='merc', lat_0=43.12, lon_0=-77.626,
resolution = 'i',llcrnrlon=-78.236,
llcrnrlat=42.935,
urcrnrlon=-77.072,
urcrnrlat=43.349)
m.drawcoastlines()
m.drawcounties(zorder=20, color='red')
m.drawcountries()
m.drawmapboundary()
#plotting the heatmap using hexbin
x, y = m(roc_data['Longitude'].values, roc_data['Latitude'].values)
values = roc_data['T']
m.hexbin(x, y, gridsize = 125, bins = 'log', C = values, cmap = cm.Reds) #bins = 'log',
#Defining minimum, mean and maximum population values
max_p = roc_data['T'].max()
min_p = roc_data['T'].min()
mean_p = roc_data['T'].mean()
print([min_p, mean_p, max_p])
print([np.log10(min_p), np.log10(mean_p), np.log10(max_p)])
#Adding Colorbar
cb = m.colorbar(location = 'bottom', format = '%d', label = 'Population by Census Blocks') #format = '%d',
#setting ticks
cb.set_ticks([np.log10(min_p+1), np.log10(mean_p), np.log10(max_p)])
cb.set_ticklabels(['Min\n({:.1f})'.format(min_p), 'Mean\n({:.1f})'.format(mean_p), 'Max\n({:.1f})'.format(max_p)])
plt.tight_layout()
plt.show()
