I want to plot a map using Basemap and I can't find how to plot a UTM grid on the map.
I've seen how to plot the grid using long/lat but not in UTM. In Basemap I use epsg=5520, which is UTM 31N.
m = Basemap(epsg=5520, llcrnrlat=52, llcrnrlon=5, urcrnrlat=53, urcrnrlon=6, resolution='l')
m.arcgisimage(service='World_Imagery', xpixels=3500)
m.drawparallels(np.arange(52, 53, 0.05), labels=[1, 0, 0, 0])
m.drawmeridians(np.arange(5, 6, 0.05), labels=[0, 0, 0, 1])
Any thoughts about how to implement a UTM grid?

With Basemap, plotting UTM grid lines or ticks on a UTM map projection is not easy, because Basemap's data coordinates (converted from long/lat) deviate from the real UTM values. To get the appropriate (x, y) from (long, lat), I use the pyproj package. In the code below, plot() draws all the grid ticks and annotate() places the grid labels outside the map area. The label values must be multiplied by 10000 to get metres.
Here is the working code:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import pyproj
# need pyproj package for coordinate transformation
pp = pyproj.Proj(init='epsg:5520')
# use map's corners (long,lat) to get grid coordinates (x,y)
corners = [[5,52], [5,53], [6.2, 53], [5.95, 51.5]]
for ea in corners:
    x,y = pp(ea[0],ea[1]) #(long,lat) to (x,y)
    lon,lat = pp(x, y, inverse=True)
    print(x, y, "%4.1f"%(lon), "%4.1f"%(lat))
# the output of print() above, give extents in grid coordinates
#x range: 1630000, 1720000 m -> 1630, 1720 km
#y range: 5710000, 5900000 m -> 5710, 5900 km
low_x, hi_x = 1630000, 1720000
low_y, hi_y = 5710000, 5900000
grid_sp = 10000 # 10km grid spacing
# we will plot grid ticks '+' at 10km spacing
# .. inside the plotting area
lon_lat = [] # for positions of grid ticks '+'
for ea in np.arange(low_x, hi_x, grid_sp): # xs
    for eb in np.arange(low_y, hi_y, grid_sp): # ys
        lon,lat = pp(ea, eb, inverse=True)
        #print(ea, eb, lon, lat)
        # lon, lat is good for plotting on basemap
        lon_lat.append([lon, lat])
# for annotation above top edge, every 10km
yt = 5870000 # y at top edge of map
xs_top = [] # for labels' positions of x grid
for xi in np.arange(low_x, hi_x, grid_sp): # xs
    lon,lat = pp(xi, yt, inverse=True)
    #print(xi, yt, lon, lat)
    xs_top.append([lon, lat])
# make anno text for every 10 km along map's top edge
# (integer division and list() so the labels can be indexed in Python 3)
anno_top = list(map(str, range(low_x//grid_sp, hi_x//grid_sp)))
# for annotation to the right, every 10km
xr = 1700000 # x at the right edge of map
ys_rt = [] # for labels' positions of y grid
for yi in np.arange(low_y, hi_y, grid_sp): # ys
    lon,lat = pp(xr, yi, inverse=True)
    #print(xr, yi, lon, lat)
    ys_rt.append([lon, lat])
# make anno text for every 10 km along map's right edge
# (integer division and list() so the labels can be indexed in Python 3)
anno_rt = list(map(str, range(low_y//grid_sp, hi_y//grid_sp)))
# prep fig/axes for Basemap plot
fig, ax = plt.subplots(figsize=(10, 12))
m = Basemap(epsg=5520, llcrnrlat=52, llcrnrlon=5, urcrnrlat=53, urcrnrlon=6, resolution='i')
# option to plot imagery, need internet connection
if True:
    server = ''
    m.arcgisimage(server=server, service='World_Imagery', xpixels=1500)
# plot grid ticks '+' inside map area
m.plot(np.array(lon_lat)[:,0], np.array(lon_lat)[:,1], 'w+', latlon=True, zorder=10)
# option to plot grid labels on top/right edges
if True:
    # grid labels on top edge
    for id,ea in enumerate(xs_top):
        if ea[0]>5.0 and ea[0]<6.0:
            ax.annotate(anno_top[id],
                        m(ea[0], ea[1]),
                        xytext=[-8,50],
                        textcoords='offset points')
    # grid labels on right edge
    for id,ea in enumerate(ys_rt):
        if ea[1]>52.0 and ea[1]<53.0:
            ax.annotate(anno_rt[id],
                        m(ea[0], ea[1]),
                        xytext=[10,-5],
                        textcoords='offset points')
m.drawparallels(np.arange(52, 53.1, 0.1), labels=[1, 0, 0, 0])
m.drawmeridians(np.arange(5, 6, 0.1), labels=[0, 0, 0, 1])
The resulting plot:
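The grid extents in the answer were read off the printed corner coordinates by hand. As a rough sketch, they could also be derived by snapping the projected corner coordinates outward to the grid spacing. The helper name `snap_extent` and the sample corner values are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

def snap_extent(values, spacing):
    """Return (low, high) covering all values, rounded outward to multiples of spacing."""
    values = np.asarray(values, dtype=float)
    low = int(np.floor(values.min() / spacing) * spacing)
    high = int(np.ceil(values.max() / spacing) * spacing)
    return low, high

grid_sp = 10000  # 10 km grid spacing, in metres

# hypothetical projected corner coordinates in metres (made-up sample values)
corner_x = [1637200.0, 1634900.0, 1715300.0, 1703100.0]
corner_y = [5763400.0, 5874700.0, 5876200.0, 5709800.0]

low_x, hi_x = snap_extent(corner_x, grid_sp)
low_y, hi_y = snap_extent(corner_y, grid_sp)

# tick positions along each axis, including the upper bound
xs = np.arange(low_x, hi_x + grid_sp, grid_sp)
ys = np.arange(low_y, hi_y + grid_sp, grid_sp)

# label text in units of 10 km; integer division keeps the labels free of decimals
x_labels = [str(x // grid_sp) for x in xs]
y_labels = [str(y // grid_sp) for y in ys]
```

Each (x, y) pair built from `xs` and `ys` can then be converted back to long/lat with the inverse pyproj call used in the answer before plotting.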


How to draw vertical average lines for overlapping histograms in a loop

I'm trying to use matplotlib to draw a vertical line at the mean of each of two overlapping histograms, using a loop. I have managed to draw the first one, but I don't know how to draw the second. I'm using two variables from a dataset to draw the histograms: one (feat) is categorical (0 or 1) and the other (objective) is numerical. The code is the following:
for chas in df[feat].unique():
    plt.hist(df.loc[df[feat] == chas, objective], bins = 15, alpha = 0.5, density = True, label = chas)
    plt.axvline(df[objective].mean(), linestyle = 'dashed', linewidth = 2)
plt.legend(loc = 'upper right')
I also have to add to the legend the mean and standard deviation values for each histogram.
How can I do it? Thank you in advance.
I recommend using axes to plot your figure. Please see the code below and the artist tutorial here.
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(0) # any fixed seed works
mu1, sigma1 = 100, 8
mu2, sigma2 = 150, 15
x1 = mu1 + sigma1 * np.random.randn(10000)
x2 = mu2 + sigma2 * np.random.randn(10000)
fig, ax = plt.subplots(1, 1, figsize=(7.2, 7.2))
# the histogram of the data
lbs = ['a', 'b']
colors = ['r', 'g']
for i, x in enumerate([x1, x2]):
    n, bins, patches = ax.hist(x, 50, density=True, facecolor=colors[i], alpha=0.75, label=lbs[i])
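The code above stops after drawing the histograms. A minimal sketch of the remaining steps, assuming the goal is one dashed mean line per histogram with the mean and standard deviation shown in the legend, could look like the following. The dictionary names and the seed value are illustrative choices, not taken from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed; the value 0 is arbitrary
samples = {
    'a': 100 + 8 * rng.standard_normal(10000),
    'b': 150 + 15 * rng.standard_normal(10000),
}
colors = {'a': 'r', 'b': 'g'}

fig, ax = plt.subplots(figsize=(7.2, 7.2))
stats = {}
for name, x in samples.items():
    mean, std = x.mean(), x.std()
    stats[name] = (mean, std)
    # put the statistics into the legend label of each histogram
    label = f'{name}: mean={mean:.1f}, std={std:.1f}'
    ax.hist(x, bins=50, density=True, color=colors[name], alpha=0.5, label=label)
    # one dashed vertical line per histogram, in the matching colour
    ax.axvline(mean, color=colors[name], linestyle='dashed', linewidth=2)
ax.legend(loc='upper right')
```

With the DataFrame from the question, each `x` would be `df.loc[df[feat] == chas, objective]`, so the mean is computed per category instead of over the whole column. Call `plt.show()` to display the figure.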

Common X and Y axis label for all subplots in the case of sns.lineplot and axhline? [duplicate]

I have the following plot:
import matplotlib.pyplot as plt
fig2 = plt.figure()
ax3 = fig2.add_subplot(2,1,1)
ax4 = fig2.add_subplot(2,1,2)
ax4.loglog(x1, y1)
ax3.loglog(x2, y2)
I want to be able to create axes labels and titles not just for each of the two subplots, but also common labels that span both subplots. For example, since both plots have identical axes, I only need one set of x and y- axes labels. I do want different titles for each subplot though.
I tried a few things, but none of them worked right.
You can create a big subplot that covers the two subplots and then set the common labels.
import random
import matplotlib.pyplot as plt
x = range(1, 101)
y1 = [random.randint(1, 100) for _ in range(len(x))]
y2 = [random.randint(1, 100) for _ in range(len(x))]
fig = plt.figure()
ax = fig.add_subplot(111) # The big subplot
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
# Turn off axis lines and ticks of the big subplot
ax.tick_params(labelcolor='w', top=False, bottom=False, left=False, right=False)
ax1.loglog(x, y1)
ax2.loglog(x, y2)
# Set common labels
ax.set_xlabel('common xlabel')
ax.set_ylabel('common ylabel')
ax1.set_title('ax1 title')
ax2.set_title('ax2 title')
plt.savefig('common_labels.png', dpi=300)
Another way is using fig.text() to set the locations of the common labels directly.
import random
import matplotlib.pyplot as plt
x = range(1, 101)
y1 = [random.randint(1, 100) for _ in range(len(x))]
y2 = [random.randint(1, 100) for _ in range(len(x))]
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.loglog(x, y1)
ax2.loglog(x, y2)
# Set common labels
fig.text(0.5, 0.04, 'common xlabel', ha='center', va='center')
fig.text(0.06, 0.5, 'common ylabel', ha='center', va='center', rotation='vertical')
ax1.set_title('ax1 title')
ax2.set_title('ax2 title')
plt.savefig('common_labels_text.png', dpi=300)
One simple way using subplots:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3, 4, sharex=True, sharey=True)
# add a big axes, hide frame
fig.add_subplot(111, frameon=False)
# hide tick and tick label of the big axes
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
plt.xlabel("common X")
plt.ylabel("common Y")
New in matplotlib 3.4.0
There are now built-in methods to set common axis labels:
fig.supxlabel('common x label')
fig.supylabel('common y label')
To reproduce OP's loglog plots (common labels but separate titles):
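import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0.01, 10.01, 0.01)
y = 2 ** x
fig, (ax1, ax2) = plt.subplots(2, 1, constrained_layout=True)
ax1.loglog(y, x)
ax2.loglog(x, y)
# separate subplot titles
ax1.set_title('ax1 title')
ax2.set_title('ax2 title')
# common axis labels
fig.supxlabel('common x label')
fig.supylabel('common y label')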
plt.setp() will do the job:
# plot something
fig, axs = plt.subplots(3,3, figsize=(15, 8), sharex=True, sharey=True)
for i, ax in enumerate(axs.flat):
    ax.set_title(f'Title {i}')
# set labels
plt.setp(axs[-1, :], xlabel='x axis label')
plt.setp(axs[:, 0], ylabel='y axis label')
Wen-wei Liao's answer is good if you are not trying to export vector graphics, or if you have set up your matplotlib backend to ignore colorless axes; otherwise the hidden axes will show up in the exported graphic.
My answer, suplabel, is similar to fig.suptitle, which uses the fig.text function. Therefore no axes artist is created and made colorless.
However, if you call it multiple times, the text is added on top of itself (as with fig.suptitle). Wen-wei Liao's answer doesn't have this problem, because fig.add_subplot(111) returns the same Axes object if it has already been created.
My function can also be called after the plots have been created.
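from matplotlib import pylab

def suplabel(axis,label,label_prop=None,
             labelpad=5,
             ha='center',va='center'):
    ''' Add super ylabel or xlabel to the figure
    Similar to matplotlib.suptitle
    axis - string: "x" or "y"
    label - string
    label_prop - keyword dictionary for Text
    labelpad - padding from the axis (default: 5)
    ha - horizontal alignment (default: "center")
    va - vertical alignment (default: "center")
    '''
    fig = pylab.gcf()
    xmin = []
    ymin = []
    # collect the lower-left corner of every axes in the figure
    for ax in fig.axes:
        xmin.append(ax.get_position().xmin)
        ymin.append(ax.get_position().ymin)
    xmin,ymin = min(xmin),min(ymin)
    dpi = fig.dpi
    if axis.lower() == "y":
        rotation = 90.
        x = xmin-float(labelpad)/dpi
        y = 0.5
    elif axis.lower() == 'x':
        rotation = 0.
        x = 0.5
        y = ymin - float(labelpad)/dpi
    else:
        raise Exception("Unexpected axis: x or y")
    if label_prop is None:
        label_prop = dict()
    # fig.text places the label in figure coordinates
    fig.text(x, y, label, rotation=rotation, ha=ha, va=va, **label_prop)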
Here is a solution where you set the ylabel of one of the plots and adjust the position of it so it is centered vertically. This way you avoid problems mentioned by KYC.
import numpy as np
import matplotlib.pyplot as plt
def set_shared_ylabel(a, ylabel, labelpad = 0.01):
    """Set a y label shared by multiple axes
    a: list of axes
    ylabel: string
    labelpad: float
        Sets the padding between ticklabels and axis label"""
    f = a[0].get_figure()
    f.canvas.draw() #sets f.canvas.renderer needed below
    # get the center position for all plots
    top = a[0].get_position().y1
    bottom = a[-1].get_position().y0
    # get the coordinates of the left side of the tick labels
    x0 = 1
    for at in a:
        at.set_ylabel('') # just to make sure we don't end up with multiple labels
        bboxes, _ = at.yaxis.get_ticklabel_extents(f.canvas.renderer)
        # convert from display to figure coordinates
        # (inverse_transformed() was removed from newer matplotlib versions)
        bboxes = bboxes.transformed(f.transFigure.inverted())
        xt = bboxes.x0
        if xt < x0:
            x0 = xt
    tick_label_left = x0
    # set position of label
    a[-1].set_ylabel(ylabel)
    a[-1].yaxis.set_label_coords(tick_label_left - labelpad,(bottom + top)/2, transform=f.transFigure)
length = 100
x = np.linspace(0,100, length)
y1 = np.random.random(length) * 1000
y2 = np.random.random(length)
f,a = plt.subplots(2, sharex=True, gridspec_kw={'hspace':0})
a[0].plot(x, y1)
a[1].plot(x, y2)
set_shared_ylabel(a, 'shared y label (a. u.)')
# list loss and acc are your data
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
ax1.plot(iteration1, loss)
ax2.plot(iteration2, acc)
ax1.set_title('Training Loss')
ax2.set_title('Training Accuracy')
The methods in the other answers will not work properly when the yticks are large. The ylabel will either overlap with ticks, be clipped on the left or completely invisible/outside of the figure.
I've modified Hagne's answer so it works with more than 1 column of subplots, for both xlabel and ylabel, and it shifts the plot to keep the ylabel visible in the figure.
def set_shared_ylabel(a, xlabel, ylabel, labelpad = 0.01, figleftpad=0.05):
    """Set x and y labels shared by a 2D grid of axes
    a: 2D array of axes
    xlabel: string
    ylabel: string
    labelpad: float
        Sets the padding between ticklabels and axis label
    figleftpad: float
        Additional padding on the left of the figure to fit the ylabel"""
    f = a[0,0].get_figure()
    f.canvas.draw() #sets f.canvas.renderer needed below
    # get the center position for all plots
    top = a[0,0].get_position().y1
    bottom = a[-1,-1].get_position().y0
    # get the coordinates of the left side of the tick labels
    x0 = 1
    x1 = 1
    for at_row in a:
        at = at_row[0]
        at.set_ylabel('') # just to make sure we don't end up with multiple labels
        bboxes, _ = at.yaxis.get_ticklabel_extents(f.canvas.renderer)
        bboxes = bboxes.transformed(f.transFigure.inverted())
        xt = bboxes.x0
        if xt < x0:
            x0 = xt
            x1 = bboxes.x1
    tick_label_left = x0
    # shrink plot on left to prevent ylabel clipping
    # (x1 - tick_label_left) is the x coordinate of right end of tick label,
    # basically how much padding is needed to fit tick labels in the figure
    # figleftpad is additional padding to fit the ylabel
    plt.subplots_adjust(left=(x1 - tick_label_left) + figleftpad)
    # set position of label,
    # note that (figleftpad-labelpad) refers to the middle of the ylabel
    a[-1,-1].set_ylabel(ylabel)
    a[-1,-1].yaxis.set_label_coords(figleftpad-labelpad,(bottom + top)/2, transform=f.transFigure)
    # set xlabel
    y0 = 1
    for at in a[-1]:
        at.set_xlabel('') # just to make sure we don't end up with multiple labels
        bboxes, _ = at.xaxis.get_ticklabel_extents(f.canvas.renderer)
        bboxes = bboxes.transformed(f.transFigure.inverted())
        yt = bboxes.y0
        if yt < y0:
            y0 = yt
    tick_label_bottom = y0
    # horizontal extent of the subplot grid, read after subplots_adjust
    left = a[0,0].get_position().x0
    right = a[-1,-1].get_position().x1
    a[-1, -1].set_xlabel(xlabel)
    a[-1, -1].xaxis.set_label_coords((left + right) / 2, tick_label_bottom - labelpad, transform=f.transFigure)
It works for the following example, while Hagne's answer won't draw ylabel (since it's outside of the canvas) and KYC's ylabel overlaps with the tick labels:
import matplotlib.pyplot as plt
import itertools
fig, axes = plt.subplots(3, 4, sharey='row', sharex=True, squeeze=False)
for i, a in enumerate(itertools.chain(*axes)):
    a.plot([0,4**i], [0,4**i])
set_shared_ylabel(axes, 'common X', 'common Y')
Alternatively, if you are fine with a colorless axes, I've modified Julian Chen's solution so that the ylabel won't overlap with the tick labels.
Basically, we just have to set the ylims of the colorless axes to match the largest ylims of the subplots, so that the colorless tick labels set the correct location for the ylabel.
Again, we have to shrink the plot to prevent clipping. Here I've hard-coded the amount to shrink, but you can play around to find a number that works for you, or calculate it as in the method above.
import matplotlib.pyplot as plt
import itertools
fig, axes = plt.subplots(3, 4, sharey='row', sharex=True, squeeze=False)
miny = maxy = 0
for i, a in enumerate(itertools.chain(*axes)):
    a.plot([0,4**i], [0,4**i])
    miny = min(miny, a.get_ylim()[0])
    maxy = max(maxy, a.get_ylim()[1])
# add a big axes, hide frame
# set ylim to match the largest range of any subplot
ax_invis = fig.add_subplot(111, frameon=False)
ax_invis.set_ylim([miny, maxy])
# hide tick and tick label of the big axis
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
plt.xlabel("common X")
plt.ylabel("common Y")
# shrink plot to prevent clipping
You could use "set" in axes as follows:
axes[0].set(xlabel="KartalOl", ylabel="Labeled")

Unable to plot circles on a map projection in basemap using Python

I'm trying to plot circles on a Miller projection map using a center latitude, longitude and radius. I can't get the circles to show up on the map projection. I've tried plotting them using the different techniques shown in these links:
How to plot a circle in basemap or add artiste
How to make smooth circles on basemap projections
Here is my code:
def plot_notams(dict_of_filtered_notams):
    ''' Create a map of the US and plot all NOTAMS from a given time period.'''
    # Create the map
    fig = plt.figure(figsize=(8,6), dpi=200)
    ax = fig.add_subplot(111)
    m = Basemap(projection='mill',llcrnrlat=20, urcrnrlat=55, llcrnrlon=-135, urcrnrlon=-60, resolution='h')
    m.fillcontinents(color='coral', lake_color='aqua')
    m.drawmeridians(np.arange(-130, -65, 10), labels=[1,0,0,1], textcolor='black')
    m.drawparallels(np.arange(20, 60, 5), labels=[1,0,0,1], textcolor='black')
    # Now add the NOTAMS to the map
    notam_data = dict_of_filtered_notams['final_notam_list']
    for line in notam_data:
        notam_lat = float(line.split()[0])
        notam_lon = float(line.split()[1])
        coords = convert_coords(notam_lon, notam_lat)
        notam_lon, notam_lat = coords[0], coords[1]
        FL400_radius = np.radians(float(line.split()[2]))
        x,y = m(notam_lon, notam_lat)
        print("notam_lon = ",notam_lon, "notam_lat = ", notam_lat,"\n")
        print("x,y values = ",'%.3f'%x,",",'%.3f'%y,"\n")
        print("FL400_radius = ",('% 3.2f' % FL400_radius))
        cir = plt.Circle((x,y), FL400_radius, color="white", fill=False)
        ax.add_patch(cir)
(The convert_coords function is simply formatting the notam_lon/notam_lat values into a usable format as shown in the data below.)
Here is what my data looks like (you can see where it's being printed in the code above):
notam_lon = -117.7839 notam_lat = 39.6431
x,y values = 1914342.075 , 2398770.441
FL400_radius = 6.98
Here's an image of what my code above produces:
I also tried using the map.plot() function (specifically, m.plot(x,y, "o")) in place of "ax.add_patch(cir)." That worked but plotted points or "o's," of course. Here's the image produced by replacing "ax.add_patch(cir)" with "m.plot(x,y, "o")."
And as a final note, I'm using basemap 1.2.0-1 and matplotlib 3.0.3. I haven't found any indication that these versions are incompatible. Also, this inability to plot a circle wasn't an issue 2 months ago when I did this last. I'm at a loss here. I appreciate any feedback. Thank you.
To plot circles on a map, you need appropriate locations (x, y) and a radius in map units. Here is working code and the resulting plot.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
# make up 10 data points for location of circles
notam_lon = np.linspace(-117.7839, -100, 10)
notam_lat = np.linspace(39.6431, 52, 10)
# original radius of circle is too small
FL400_radius = 6.98 # what unit?
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
m = Basemap(projection='mill', llcrnrlat=20, urcrnrlat=55, llcrnrlon=-135, urcrnrlon=-60, resolution='l')
# radiusm = (m.ymax-m.ymin)/10. is good for check plot
radiusm = FL400_radius*10000 # meters, you adjust as needed here
for xi,yi in zip(notam_lon, notam_lat):
    # xy=m(xi,yi): conversion (long,lat) to (x,y) on map
    circle1 = plt.Circle(xy=m(xi,yi), radius=radiusm,
                         edgecolor="blue", facecolor="yellow", zorder=10)
    #ax.add_patch(circle1) # would also work
    ax.add_artist(circle1)
m.fillcontinents(color='coral', lake_color='aqua')
# m.drawmapboundary(fill_color='aqua') <-- causes deprecation warnings
# use this instead:
rect = plt.Rectangle((m.xmin,m.ymin), m.xmax-m.xmin, m.ymax-m.ymin, facecolor="aqua", zorder=-10)
ax.add_artist(rect)
m.drawmeridians(np.arange(-130, -65, 10), labels=[1,0,0,1], textcolor='black')
m.drawparallels(np.arange(20, 60, 5), labels=[1,0,0,1], textcolor='black')
The output map:
Hope this is useful.
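A `plt.Circle` radius is given in map projection units, and on a Miller projection the scale changes with latitude, so a fixed map radius does not correspond to a fixed ground distance. As a rough sketch of an alternative, the perimeter of a circle with a given ground radius can be computed in long/lat first and then projected. This assumes a spherical Earth of radius 6371 km; the function names and the 50 km radius are illustrative and not part of the answer above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth (an approximation)

def circle_lonlat(lon0, lat0, radius_km, n_points=72):
    """Return lon/lat arrays (degrees) of points radius_km away from (lon0, lat0)."""
    lat0_r, lon0_r = np.radians(lat0), np.radians(lon0)
    d = radius_km / EARTH_RADIUS_KM  # angular distance in radians
    bearing = np.linspace(0.0, 2.0 * np.pi, n_points)
    # destination point on a sphere for each bearing
    lat = np.arcsin(np.sin(lat0_r) * np.cos(d)
                    + np.cos(lat0_r) * np.sin(d) * np.cos(bearing))
    lon = lon0_r + np.arctan2(np.sin(bearing) * np.sin(d) * np.cos(lat0_r),
                              np.cos(d) - np.sin(lat0_r) * np.sin(lat))
    return np.degrees(lon), np.degrees(lat)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# circle of 50 km ground radius around the sample NOTAM position from the question
lons, lats = circle_lonlat(-117.7839, 39.6431, 50.0)
distances = haversine_km(-117.7839, 39.6431, lons, lats)
```

With a Basemap instance `m`, the perimeter could then be drawn with `x, y = m(lons, lats)` followed by `m.plot(x, y)`, which keeps the outline correct regardless of the projection's local scale.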

Obtaining coordinates in projected map using Cartopy

I'm trying to obtain the coordinates of the features of a map using Cartopy but I would like to obtain the map projected coordinates instead of the data from the original projection.
For instance:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
fig = plt.figure(figsize=(10, 10))
ax = plt.axes(projection=ccrs.epsg(3857))
lines = ax.plot((0, 360), (-85.06, 85.06), transform=ccrs.PlateCarree())
The previous code shows a map with two lines using the map projection but lines (a list with matplotlib.lines.Line2D instances) is just only one object with the coordinates in the original projection of the data (lines[0].get_data() ---> (array([ 0, 360]), array([-85.06, 85.06]))).
On an interactive plot, a Qt5 backend obtained after, I can see coordinates in EPSG:3857 and in PlateCarree when the cursor is over the map so I wonder if there is an easy way to get lines in EPSG:3857 coordinates.
EDIT: The example above is quite simplified. I've tried to do it simple for better understanding but maybe is better to show the real problem.
I have a grid of data with longitudes in the range [0, 360]. I can modify the arrays in order to have inputs in the range [-180, 180] and I'm using Cartopy/Matplotlib to plot contours. From the contours I'm obtaining a matplotlib.contour.QuadContourSet with several matplotlib.collections.LineCollection. From each matplotlib.collections.LineCollection I can obtain the matplotlib.path.Paths and I would like to have the coordinates of each Path in EPSG:3857 instead of in the original PlateCarree so I can use cartopy.mpl.patch.path_to_geos to convert each Path to a shapely geometry object in the EPSG:3857 projection without having to extract vertices from each Path, convert them from PlateCarree to EPSG:3857 and then create a new Path with the converted coordinates to use cartopy.mpl.patch.path_to_geos to obtain geometries in the crs I need.
The question asks for a coordinate transformation using Cartopy's feature, and maybe something else.
Here I provide the code that performs coordinate transformation and computation check.
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy as np
# Test data in geographic lon, lat (degrees)
lons = np.array((0, 360.01)) # any number of longitudes
lats = np.array((-85.06, 85.06)) # .. latitudes
# define all CRS
crs_longlat = ccrs.PlateCarree()
crs_3857 = ccrs.epsg(3857)
# Transformation function
def coordXform(orig_crs, target_crs, x, y):
    """
    Converts array of (y,x) from orig_crs -> target_crs
    y, x: numpy array of float values
    orig_crs: source CRS
    target_crs: target CRS
    """
    # original code is one-liner
    # it leaves an open axes that need to plt.close() later
    # return plt.axes( projection = target_crs ).projection.transform_points( orig_crs, x, y )
    # new improved code follows
    xys = plt.axes( projection = target_crs ).projection.transform_points( orig_crs, x, y )
    # print(plt.gca()) # current axes: GeoAxes: _EPSGProjection(3857)
    plt.close() # Kill GeoAxes
    # print(plt.gca()) # AxesSubplot (new current axes)
    return xys
# Transform geographic (lon-lat) to (x, y) of epsg(3857)
xys = coordXform(crs_longlat, crs_3857, lons, lats)
for ea in xys:
    print("(x, y) meters: " + str(ea[0]) + ', ' + str(ea[1]))
#(x, y) meters: 0.0, -20006332.4374
#(x, y) meters: 1113.19490794, 20006332.4374
# Computation check
# Transform (x, y) of epsg(3857) to geographic (lon-lat), degrees
xs = xys[:,0] # all x's
ys = xys[:,1] # all y's
lls = coordXform(crs_3857, crs_longlat, xs, ys)
for ea in lls:
    print("(lon, lat) degrees: " + str(ea[0]) + ', ' + str(ea[1]))
#(lon, lat) degrees: 0.0, -85.06
#(lon, lat) degrees: 0.01, 85.06
# plt.close() # no need now
Edit 2
According to the constructive comments, the transformation function above can be written as follows:
def coordXform(orig_crs, target_crs, x, y):
    return target_crs.transform_points( orig_crs, x, y )

Recreating decision-boundary plot in python with scikit-learn and matplotlib

I found this wonderful graph in the post Variation on "How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?". In this example K-NN is used to classify data into three classes. I especially like that it shows the probability of class membership as an indication of the "confidence".
R and ggplot seem to do a great job. I wonder whether this can be re-created in Python? My initial thought tends to scikit-learn and matplotlib. Here is the iris example from scikit-learn:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
n_neighbors = 15
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
y = iris.target
h = .02 # step size in the mesh
# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))
plt.show()
This produces a graph in a sense very similar:
I have three questions:
How can I introduce the confidence to the plot?
How can I plot the decision-boundaries with a connected line?
Let's say I have a new observation, how can I introduce it to the plot and plot if it is classified correctly?
I stumbled upon your question about a year ago and loved the plot; I just never got around to answering it until now. Hopefully the code comments below are self-explanatory enough (I also blogged about it, if you want more details). Maybe four years too late, haha.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from matplotlib.lines import Line2D
from matplotlib.ticker import MaxNLocator
from sklearn import neighbors
iris = datasets.load_iris()
x = iris.data[:,0:2]
y = iris.target
# create the x0, x1 feature
x0 = x[:,0]
x1 = x[:,1]
# set main parameters for KNN plot
N_NEIGHBORS = 15 # KNN number of neighbors
H = 0.1 # mesh stepsize
PROB_DOT_SCALE = 40 # modifier to scale the probability dots
PROB_DOT_SCALE_POWER = 3 # exponential used to increase/decrease size of prob dots
TRUE_DOT_SIZE = 50 # size of the true labels
PAD = 1.0 # how much to "pad" around the true labels
clf = neighbors.KNeighborsClassifier(N_NEIGHBORS, weights='uniform')
clf.fit(x, y)
# find the min/max points for both x0 and x1 features
# these min/max values will be used to set the bounds
# for the plot
x0_min, x0_max = np.round(x0.min())-PAD, np.round(x0.max()+PAD)
x1_min, x1_max = np.round(x1.min())-PAD, np.round(x1.max()+PAD)
# create 1D arrays representing the range of probability data points
# on both the x0 and x1 axes.
x0_axis_range = np.arange(x0_min,x0_max, H)
x1_axis_range = np.arange(x1_min,x1_max, H)
# create meshgrid between the two axis ranges
xx0, xx1 = np.meshgrid(x0_axis_range, x1_axis_range)
# put the xx in the same dimensional format as the original x
# because it's easier to work with that way (at least for me)
# * shape will be: [no_dots, no_dimensions]
# where no_dimensions = 2 (x0 and x1 axis)
xx = np.reshape(np.stack((xx0.ravel(),xx1.ravel()),axis=1),(-1,2))
yy_hat = clf.predict(xx) # prediction of all the little dots
yy_prob = clf.predict_proba(xx) # probability of each dot being
# the predicted color
yy_size = np.max(yy_prob, axis=1)
# make figure
plt.style.use('seaborn-whitegrid') # set style because it looks nice
# (matplotlib 3.6 and later name this style 'seaborn-v0_8-whitegrid')
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8,6), dpi=150)
# establish colors and colormap
# * color blind colors, from
redish = '#d73027'
orangeish = '#fc8d59'
yellowish = '#fee090'
blueish = '#4575b4'
colormap = np.array([redish,blueish,orangeish])
# plot all the little dots, position defined by the xx values, color
# defined by the knn predictions (yy_hat), and size defined by the
# probability of that color (yy_prob)
# * because the yy_hat values are either 0, 1, 2, we can use
# these as values to index into the colormap array
# * size of dots (the probability) increases exponentially (^3), so that there is
# a nice difference between different probabilities. I'm sure there is a more
# elegant way to do this though...
# * linewidths=0 so that there are no "edges" around the dots
ax.scatter(xx[:,0], xx[:,1], c=colormap[yy_hat], alpha=0.4,
s=PROB_DOT_SCALE*yy_size**PROB_DOT_SCALE_POWER, linewidths=0,)
# plot the contours
# * we have to reshape the yy_hat to get it into a
# 2D dimensional format, representing both the x0
# and x1 axis
# * the number of levels and color scheme was manually tuned
# to make sense for this data. Would probably change, for
# instance, if there were 4, or 5 (etc.) classes
ax.contour(x0_axis_range, x1_axis_range,
           np.reshape(yy_hat, (xx0.shape[0], -1)),
           levels=3, linewidths=1,
           colors=[redish,blueish, blueish,orangeish,])
# plot the original x values.
# * zorder is 3 so that the dots appear above all the other dots
ax.scatter(x[:,0], x[:,1], c=colormap[y], s=TRUE_DOT_SIZE, zorder=3,
linewidths=0.7, edgecolor='k')
# create legends
x_min, x_max = ax.get_xlim()
y_min, y_max = ax.get_ylim()
# set x-y labels
# create class legend
# Line2D properties:
# about size of scatter plot points:
legend_class = []
for flower_class, color in zip(['c', 's', 'v'], [blueish, redish, orangeish]):
    legend_class.append(Line2D([0], [0], marker='o', label=flower_class,ls='None',
                               markerfacecolor=color, markersize=np.sqrt(TRUE_DOT_SIZE),
                               markeredgecolor='k', markeredgewidth=0.7))
# iterate over each of the probabilities to create prob legend
prob_values = [0.4, 0.6, 0.8, 1.0]
legend_prob = []
for prob in prob_values:
    legend_prob.append(Line2D([0], [0], marker='o', label=prob, ls='None', alpha=0.8,
                              markeredgecolor='k', markeredgewidth=0))
legend1 = ax.legend(handles=legend_class, loc='center',
bbox_to_anchor=(1.05, 0.35),
frameon=False, title='class')
legend2 = ax.legend(handles=legend_prob, loc='center',
bbox_to_anchor=(1.05, 0.65),
frameon=False, title='prob', )
ax.add_artist(legend1) # add legend back after it disappears
ax.set_yticks(np.arange(x1_min,x1_max, 1)) # I don't like the decimals
ax.grid(False) # remove gridlines (inherited from 'seaborn-whitegrid' style)
# only use integers for axis tick labels
# from:
# set the aspect ratio to 1, for looks
# remove first ticks from axis labels, for looks
# from:
ax.set_yticks(np.arange(x1_min,x1_max, 1)[1:])
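To make the "confidence" in the plot more concrete: with uniform weights, the class probability at a mesh point is the fraction of its k nearest training points that belong to each class, and the dot size is a power of the largest such fraction. The sketch below reproduces that idea with plain NumPy on a tiny made-up data set. The function name `knn_class_proba` and the data are assumptions for illustration; it is not scikit-learn's implementation.

```python
import numpy as np

def knn_class_proba(train_x, train_y, query_x, k, n_classes):
    """Fraction of the k nearest training points in each class, per query point."""
    # pairwise squared Euclidean distances, shape (n_query, n_train)
    diff = query_x[:, None, :] - train_x[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    nearest = np.argsort(dist2, axis=1)[:, :k]  # indices of the k nearest neighbours
    votes = train_y[nearest]                    # class labels of those neighbours
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts / k

# tiny made-up training set: two well separated classes
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])

query_x = np.array([[0.2, 0.2],   # deep inside class 0
                    [2.9, 2.9]])  # between the two clusters

proba = knn_class_proba(train_x, train_y, query_x, k=3, n_classes=2)
confidence = proba.max(axis=1)  # plays the role of yy_size in the answer

# same scaling as in the answer: size grows with the cube of the probability
PROB_DOT_SCALE = 40
PROB_DOT_SCALE_POWER = 3
dot_size = PROB_DOT_SCALE * confidence ** PROB_DOT_SCALE_POWER
```

The first query point has all three neighbours in class 0 and gets the full dot size, while the second has a 2-to-1 vote and gets a much smaller dot. A new observation can be handled the same way: predict its class, compare it with the true label, and draw it with `ax.scatter` using a distinct marker.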
