I am currently going through the Kaggle Titanic Machine Learning thing and using http://nbviewer.jupyter.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb to figure it out as I am a relative beginner to Python. I thought I understood what the first few steps were doing and I am trying to recreate an earlier step by making a figure with multiple plots on it. I can't seem to get the plots to actually show up.
Here is my code:
import pandas as pd
import numpy as np
import pylab as plt
train=pd.read_csv("train.csv")
#Set the global default size of matplotlib figures
plt.rc('figure', figsize=(10, 5))
#Size of matplotlib figures that contain subplots
figsize_with_subplots = (10, 10)
# Size of matplotlib histogram bins
bin_size = 10
females_df = train[train['Sex']== 'female']
print("females_df", females_df)
females_xt = pd.crosstab(females_df['Pclass'],train['Survived'])
females_xt_pct = females_xt.div(females_xt.sum(1).astype(float), axis = 0)
males = train[train['Sex'] == 'male']
males_xt = pd.crosstab(males['Pclass'], train['Survived'])
males_xt_pct= males_xt.div(males_xt.sum(1).astype(float), axis = 0)
plt.figure(5)
plt.subplot(221)
females_xt_pct.plot(kind='bar', title='Female Survival Rate by Pclass')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
plt.subplot(222)
males_xt_pct.plot(kind='bar', title= 'Male Survival Rate by Pclass')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
This displays two blank plots (one in the 221 location and one in the 222 location), then opens new figures for the pandas plots, and only the male plot at the end actually shows any bars. What am I doing wrong here?
To draw a pandas plot into a previously created subplot, use the ax argument of the pandas plotting function.
ax=plt.subplot(..)
df.plot(..., ax=ax)
So in this case the code may look like
plt.figure(5)
ax=plt.subplot(221)
females_xt_pct.plot(kind='bar', title='Female Survival Rate by Pclass',ax=ax)
ax2=plt.subplot(222)
males_xt_pct.plot(kind='bar', title= 'Male Survival Rate by Pclass',ax=ax2)
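For what it's worth, the same idea can also be written with plt.subplots, which hands back the figure and both axes up front. This is only a minimal sketch: the two small tables below are invented stand-ins for the crosstabs (the numbers are not taken from train.csv), and the names ax_f and ax_m are made up for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented survival fractions per passenger class (stand-ins for the crosstabs)
females_xt_pct = pd.DataFrame({0: [0.1, 0.2, 0.5], 1: [0.9, 0.8, 0.5]},
                              index=pd.Index([1, 2, 3], name="Pclass"))
males_xt_pct = pd.DataFrame({0: [0.6, 0.8, 0.9], 1: [0.4, 0.2, 0.1]},
                            index=pd.Index([1, 2, 3], name="Pclass"))

# One figure holding one row of two axes
fig, (ax_f, ax_m) = plt.subplots(1, 2, figsize=(10, 5))

# Passing ax= makes pandas draw into the existing axes
# instead of creating a new figure for each plot
females_xt_pct.plot(kind="bar", title="Female Survival Rate by Pclass", ax=ax_f)
males_xt_pct.plot(kind="bar", title="Male Survival Rate by Pclass", ax=ax_m)

for ax in (ax_f, ax_m):
    ax.set_xlabel("Passenger Class")
    ax.set_ylabel("Survival Rate")

fig.tight_layout()
```

Call plt.show() (or fig.savefig with a file name of your choice) afterwards to see the result. Setting the labels through the axes objects rather than through plt.xlabel avoids depending on which axes happens to be "current".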
I am trying to add a legend to my scatter plot with 13 classes, however, with my code below, I am only able to get the first label. Can you assist me in generating the full list to show up in the legend of the scatter plot?
Here is my example code:
from sklearn.datasets import make_blobs
from matplotlib import pyplot as plt
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_blobs(n_samples=1000, centers=13, n_features=2)
classes = [f"class {i}" for i in range(13)]
#fig = plt.figure()
plt.figure(figsize=(15, 12))
scatter = plt.scatter(
x=X[:,0],
y=X[:,1],
s = 20,
c = y,
cmap='Spectral'
#c=[sns.color_palette()[x] for x in y_train_new]
)
plt.gca().set_aspect('equal', 'datalim')
plt.legend(classes)
plt.title('Dataset', fontsize=24)
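plt.legend(classes) only produces one entry because the whole scatter is a single artist, so only the first label gets used. Replace plt.legend(classes) in your code with the line below, which builds one legend handle per class from the scatter itself. I hope this is what you are looking for. I am using matplotlib 3.3.4.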
plt.legend(handles=scatter.legend_elements()[0], labels=classes)
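One caveat, offered tentatively: legend_elements() defaults to num="auto", which for many distinct values may pick a subset of them through a tick locator, so the number of handles is not guaranteed to match the number of classes. Passing num=None asks for one handle per unique value. The sketch below assumes random stand-in data instead of make_blobs; the data and the variable names are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_classes = 13
y = rng.integers(0, n_classes, size=1000)      # invented class labels
y[:n_classes] = np.arange(n_classes)           # make sure every class occurs
X = rng.normal(size=(1000, 2)) + y[:, None]    # invented 2-D points
classes = [f"class {i}" for i in range(n_classes)]

fig, ax = plt.subplots(figsize=(15, 12))
scatter = ax.scatter(X[:, 0], X[:, 1], s=20, c=y, cmap="Spectral")

# num=None: one legend handle per unique value passed to c
handles, _ = scatter.legend_elements(num=None)
ax.legend(handles=handles, labels=classes, title="Classes")
ax.set_title("Dataset", fontsize=24)
```

The handles come back sorted by value, so the order of classes has to match the sorted class ids, as it does here.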
I created a scatter plot in seaborn using seaborn.relplot, but am having trouble putting the legend all in one graph.
When I do this simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is a single graph where I can specify the figure dimensions and have the legend at the bottom in two rows, below the x-axis (if that is too difficult or not supported, the default legend position to the right of the same graph would work too). I know the problem lies with "ax=ax" and with the way I am specifying the dimensions through a matplotlib figure, but I'd like to know specifically why this causes a problem so I can learn from it.
Thank you for your time.
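The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page). It therefore creates its own figure rather than working inside the figure you set up with plt.subplots, which is why you end up with two graphs. With a simple sns.scatterplot (the default type of plot used by sns.relplot), your code works (changed to use reproducible data):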
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
plt.show()
Further edits to legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which is actually a legend entry, by getting and slicing the handles and labels
Set a new title that is actually a title
Move the location and make use of bbox_to_anchor to move outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], loc=8,
ncol=2, bbox_to_anchor=[0.5,-.3,0,0])
plt.show()
Guys, I'm a chemist and I've finished an experiment that gave me the energies of a metal's d orbitals.
It is relatively easy to get the correct proportion of energies in Excel 1 and use a drawing program like Inkscape to draw the diagram for molecular orbitals (like I did with this one below 2), but I'd love to use Python to get a beautiful diagram that takes the energies of my orbitals into account, like we see in the books.
My first attempt using seaborn and swarmplot is obviously too far from the correct approach and maybe (probably!) is not the correct way to get there. I'd be more than happy to achieve something like the right side here in 3.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Energies = [-0.40008, -0.39583, -0.38466, -0.23478, -0.21239]
orbitals = ["dz2", "dxy", "dyz", "dx2y2", "dxz"]
df = pd.DataFrame(Energies)
df["Orbitals"] = pd.DataFrame(orbitals)
sns.swarmplot(y=df[0], size=16)
Thanks for any help.
1 The excel one
2 Drawn by hand using the excel version as the model
3 Extracted from literature
You can draw almost anything you like by combining basic shapes and functions in matplotlib. Here the energy levels can be simple "_" markers, and the labels can be produced with annotate.
import numpy as np
import matplotlib.pyplot as plt
Energies = [-0.40008, -0.39583, -0.38466, -0.23478, -0.21239]
orbitals = ["$d_{z^2}$", "$d_{xy}$", "$d_{yz}$", "$d_{x^2 - y^2}$", "$d_{xz}$"]
x = np.arange(len(Energies))
fig, ax = plt.subplots()
ax.scatter(x, Energies, s=1444, marker="_", linewidth=3, zorder=3)
ax.grid(axis='y')
for xi,yi,tx in zip(x,Energies,orbitals):
ax.annotate(tx, xy=(xi,yi), xytext=(0,-4), size=18,
ha="center", va="top", textcoords="offset points")
ax.margins(0.2)
plt.show()
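A possible variation on the same idea, in case you would rather give the length of each level in data units than as a marker size: draw the levels with ax.hlines. This is only a sketch; half_width is an arbitrary value chosen for the example, and the "Energy" axis label is a placeholder since the units are not stated above.

```python
import matplotlib.pyplot as plt

energies = [-0.40008, -0.39583, -0.38466, -0.23478, -0.21239]
orbitals = ["$d_{z^2}$", "$d_{xy}$", "$d_{yz}$", "$d_{x^2 - y^2}$", "$d_{xz}$"]

half_width = 0.3                 # half-length of each level in x data units (arbitrary)
x = range(len(energies))

fig, ax = plt.subplots()
# One horizontal segment per level, centred on its x position
levels = ax.hlines(energies,
                   [xi - half_width for xi in x],
                   [xi + half_width for xi in x],
                   linewidth=3)

# Put each orbital label just below its level
for xi, yi, label in zip(x, energies, orbitals):
    ax.annotate(label, xy=(xi, yi), xytext=(0, -4), size=18,
                ha="center", va="top", textcoords="offset points")

ax.set_xticks([])                # the x axis carries no physical meaning here
ax.set_ylabel("Energy")          # placeholder label, units not given
ax.margins(0.2)
```

Nearly degenerate levels will still have overlapping labels, exactly as with the marker version; placing such levels at the same x position with different horizontal offsets is one way around that.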
Consider the following pandas DataFrame:
  labels  values_a  values_b  values_x  values_y
0  date1         1         3       150       170
1  date2         2         6       200       180
It is easy to plot this with Seaborn (see example code below). However, because values_a/values_b are so much smaller than values_x/values_y, the bars for values_a and values_b are barely visible (the dataset above is just a sample; in my real dataset the difference is even bigger). Therefore, I would like to use two y-axes: one for values_a/values_b and one for values_x/values_y. I tried plt.twinx() to get a second axis, but unfortunately the plot then shows only the two bars for values_x and values_y, even though there are at least two y-axes with the right scaling. :) Do you have an idea how to fix that and get four bars for each label, with the values_a/values_b bars relating to the left y-axis and the values_x/values_y bars relating to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
/test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
# Put ticks at the same positions as on ax1, then relabel them in A/B units
ax2.set_yticks(ax1.get_yticks())
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))
ax2.set_ylabel('A and B')
plt.show()
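For comparison, the multiple sub-plots route mentioned at the start of this answer could look roughly like the sketch below. It uses pandas' own bar plotting rather than seaborn, which is a choice made for this example; ax_small and ax_large are made-up names.

```python
import pandas as pd
import matplotlib.pyplot as plt

columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records(
    [("date1", 1, 3, 150, 170), ("date2", 2, 6, 200, 180)], columns=columns)

# Two side-by-side axes with the same x categories, each with its own y scale
fig, (ax_small, ax_large) = plt.subplots(1, 2, figsize=(10, 4))

test_data.plot.bar(x="labels", y=["values_a", "values_b"], ax=ax_small, rot=0)
test_data.plot.bar(x="labels", y=["values_x", "values_y"], ax=ax_large, rot=0)

ax_small.set_ylabel("A and B")
ax_large.set_ylabel("X and Y")
fig.tight_layout()
```

No rescaling or tick relabelling is needed this way, at the cost of the four bars per label no longer sitting next to each other in one plot.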
I have the results of a (H,ranges) = numpy.histogram2d() computation and I'm trying to plot it.
Given H I can easily put it into plt.imshow(H) to get the corresponding image. (see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow )
My problem is that the axes of the produced image show the "cell counts" of H and are completely unrelated to the values of ranges.
I know I can use the keyword extent (as pointed out in: Change values on matplotlib imshow() graph axis ). But this solution does not work for me: my values in ranges do not grow linearly (they actually grow exponentially).
My question is: how can I use the values of ranges in plt.imshow()? Or, at least, can I manually set the tick label values of the object that plt.imshow returns?
Editing the extent is not a good solution.
You can just change the tick labels to something more appropriate for your data.
For example, here we'll put a tick at every 5th pixel and label it with the value of an exponential function:
import numpy as np
import matplotlib.pyplot as plt
im = np.random.rand(21,21)
fig,(ax1,ax2) = plt.subplots(1,2)
ax1.imshow(im)
ax2.imshow(im)
# Where we want the ticks, in pixel locations
ticks = np.linspace(0,20,5)
# What those pixel locations correspond to in data coordinates.
# Also set the float format here
ticklabels = ["{:6.2f}".format(i) for i in np.exp(ticks/5)]
ax2.set_xticks(ticks)
ax2.set_xticklabels(ticklabels)
ax2.set_yticks(ticks)
ax2.set_yticklabels(ticklabels)
plt.show()
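Applied to an actual histogram2d result, the same relabelling might look like the sketch below. The samples and the log-spaced edges are invented for the example. With imshow's default extent, pixel k covers the interval from k - 0.5 to k + 0.5, so bin edge k sits at pixel coordinate k - 0.5.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.lognormal(mean=2.0, sigma=1.0, size=2000)   # invented samples
y = rng.lognormal(mean=2.0, sigma=1.0, size=2000)

edges = np.logspace(-1, 3, 11)                      # 10 exponentially growing bins
H, xedges, yedges = np.histogram2d(x, y, bins=[edges, edges])

fig, ax = plt.subplots()
# histogram2d puts x along the first axis, imshow draws rows as y: transpose
ax.imshow(H.T, origin="lower")

# Bin edge k lies at pixel coordinate k - 0.5
edge_pos = np.arange(len(edges)) - 0.5
step = 2                                            # label every 2nd edge
ax.set_xticks(edge_pos[::step])
ax.set_xticklabels([f"{e:.3g}" for e in xedges[::step]])
ax.set_yticks(edge_pos[::step])
ax.set_yticklabels([f"{e:.3g}" for e in yedges[::step]])
```

Only the labels change here; the axes are still linear in pixel units, so anything plotted on top has to be converted to pixel coordinates first.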
Expanding a bit on @thomas's answer:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mi
im = np.random.rand(20, 20)
ticks = np.exp(np.linspace(0, 10, 20))
fig, ax = plt.subplots()
ax.pcolor(ticks, ticks, im, cmap='viridis')
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlim([1, np.exp(10)])
ax.set_ylim([1, np.exp(10)])
By letting mpl take care of the non-linear mapping you can now accurately over-plot other artists. There is a performance hit for this (as pcolor is more expensive to draw than AxesImage), but getting accurate ticks is worth it.
imshow is for displaying images, so it does not support x and y bins.
You could either use pcolor instead,
H,xedges,yedges = np.histogram2d(x, y)  # x, y: your sample arrays
plt.pcolor(xedges,yedges,H.T)  # transpose: histogram2d puts x along the first axis
or use plt.hist2d which directly plots your histogram.
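A short sketch of the hist2d route with non-linear bins, assuming invented log-normal samples and log-spaced edges (neither comes from the question):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.0, sigma=1.0, size=2000)   # invented samples
y = rng.lognormal(mean=2.0, sigma=1.0, size=2000)
edges = np.logspace(-1, 3, 11)                      # 10 exponentially growing bins

fig, ax = plt.subplots()
# hist2d bins the data and draws it in one call, using the edges as coordinates
counts, xedges, yedges, image = ax.hist2d(x, y, bins=[edges, edges])

# Log axes make the exponentially growing bins appear equally wide
ax.set_xscale("log")
ax.set_yscale("log")
fig.colorbar(image, ax=ax, label="count")
```

Because the bin edges are real data coordinates, the axis ticks are correct without any relabelling.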