I have a dataset that represents male and female in binary: males are represented as 0 and females as 1. What I hope to do is change 0 to Male and 1 to Female in the plot legend. I tried to follow this post, but it didn't work out.
It gives me an error message that looks like this:
AttributeError Traceback (most recent call last)
<ipython-input-11-b3c99d4311ab> in <module>
23 # plot the legend
24 plt.legend()
---> 25 legend = g._legend
26 new_labels = ['Female', 'Male']
27 for t, l in zip(legend.texts, new_labels): t.set_text(l)
AttributeError: 'AxesSubplot' object has no attribute '_legend'
This is how my current code looks:
## store them in different variable names
X = salary['years']
y = salary['salary']
g = salary['gender']
# prepare the scatterplot
sns.set()
plt.figure(figsize=(10,10))
g = sns.scatterplot(x=salary.years, y=salary.salary, data=salary, hue='gender')
# equations of the models
model1 = 50 + 2.776962335386217*X
model2 = 60.019802 + 2.214645*X
model3_male = 60.014922 + 2.179305*X + 1.040140*1
model3_female = 60.014922 + 2.179305*X + 1.040140*0
# plot the scatterplots
plt.plot(X, model1, color='r', label='Model 1')
plt.plot(X, model2, color='g', label='Model 2')
plt.plot(X, model3_male, color='b', label='Model 3(Male)')
plt.plot(X, model3_female, color='y', label='Model 3(Female)')
# plot the legend
plt.legend()
legend = g._legend
new_labels = ['Female', 'Male']
for t, l in zip(legend.texts, new_labels): t.set_text(l)
# set the title
plt.title('Scatterplot of salary and model fits')
plt.show()
I don't have your data, so I generated some of my own:
gender salary years
male 40000 1
male 32000 2
male 45000 3
male 54000 4
female 72000 5
female 62000 6
female 92000 7
female 55000 8
female 35000 9
female 48000 10
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
salary = pd.read_csv("1.csv", sep=r"\s+")  # whitespace-separated columns (delim_whitespace is deprecated in recent pandas)
print(salary)
X = salary['years']
y = salary['salary']
g = salary['gender']
# prepare the scatterplot
sns.set()
plt.figure(figsize=(10,10))
g = sns.scatterplot(x=salary.years, y=salary.salary, data=salary, hue='gender')
# equations of the models
model1 = 50 + 2.776962335386217*X
model2 = 60.019802 + 2.214645*X
model3_male = 60.014922 + 2.179305*X + 1.040140*1
model3_female = 60.014922 + 2.179305*X + 1.040140*0
# plot the scatterplots
plt.plot(X, model1, color='r', label='Model 1')
plt.plot(X, model2, color='g', label='Model 2')
plt.plot(X, model3_male, color='b', label='Model 3(Male)')
plt.plot(X, model3_female, color='y', label='Model 3(Female)')
# plot the legend
plt.legend()
# set the title
plt.title('Scatterplot of salary and model fits')
plt.show()
It works fine. So I guess values in your gender column are 0 or 1. In that case, you can do the following before g = salary['gender'] to replace 0 with male and 1 with female:
salary['gender'] = salary['gender'].map({1: 'female', 0: 'male'})
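To see what that mapping does in isolation, here is a minimal sketch on a small made-up frame (the values are hypothetical, only the column names follow the question). Note that any value missing from the dictionary becomes NaN, so both codes should be listed.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: gender coded as 0 (male) / 1 (female)
salary = pd.DataFrame({
    "gender": [0, 1, 1, 0],
    "years": [1, 2, 3, 4],
    "salary": [40, 62, 55, 48],
})

# Replace the numeric codes with readable labels before plotting
salary["gender"] = salary["gender"].map({0: "Male", 1: "Female"})

print(salary["gender"].tolist())  # ['Male', 'Female', 'Female', 'Male']
```

With the column converted like this, hue='gender' should pick up the text labels directly, so the legend does not need to be edited afterwards.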
Back to your error:
---> 25 legend = g._legend
26 new_labels = ['Female', 'Male']
27 for t, l in zip(legend.texts, new_labels): t.set_text(l)
AttributeError: 'AxesSubplot' object has no attribute '_legend'
g returned by sns.scatterplot is a matplotlib.axes.Axes object. To get the legend object from it, you need to use ax.get_legend() or ax.legend() rather than ax._legend. You can follow the official Legend guide documentation.
legend = g.legend()
# assuming the last two legend entries are the hue levels 0 and 1, in that order
new_labels = ['Male', 'Female']
for t, l in zip(legend.texts[-2:], new_labels): t.set_text(l)
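If you would rather not rely on the position of the entries in legend.texts, another option is to rebuild the legend from its handles and labels and rename by label. This is only a sketch with plain matplotlib: the data points and the rename dictionary are made up, and the labels "0" and "1" stand in for what the hue levels would produce.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Hypothetical points; the labels "0" and "1" mimic numeric hue levels
ax.scatter([1, 2], [40, 45], label="0")
ax.scatter([3, 4], [62, 55], label="1")
ax.plot([1, 4], [42, 58], color="r", label="Model 1")

# Rename only the labels found in the dictionary, keep the others as they are
rename = {"0": "Male", "1": "Female"}
handles, labels = ax.get_legend_handles_labels()
labels = [rename.get(label, label) for label in labels]
legend = ax.legend(handles, labels)
```

Because the renaming is keyed on the label text, it keeps working if more model lines are added later. Call plt.show() to display the figure.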
Related
I have a pandas dataframe like this:
Favorite B | Q1
________________
McDonalds | 5
BurgerKing | 6
KFC | 3
Brand4 | 2
I am plotting a histogram from it:
x=pd.Series(df["Q1"])
result = plt.hist(x, bins=7, color='c', edgecolor='k', alpha=0.65)
plt.axvline(x.mean(), color='k', linestyle='dashed', linewidth=1)
min_ylim, max_ylim = plt.ylim()
plt.text(x.mean()*1.1, max_ylim*0.9, 'Mean: {:.2f}'.format(x.mean()))
plt.title(str(i))
I want a different color for each bin.
How can I do it?
This should work (based on this example):
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(0)
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
n, bins, patches = plt.hist(x, bins=len(colors))
# adapt the color of each patch
for c, p in zip(colors, patches):
    p.set_facecolor(c)
plt.show()
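For the 7 bins in the question, the colors could also be sampled from a colormap instead of being listed by hand. A minimal sketch, assuming made-up integer ratings in place of the Q1 column:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ratings standing in for df["Q1"]
rng = np.random.default_rng(0)
x = rng.integers(1, 8, size=200)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(x, bins=7, edgecolor="k", alpha=0.65)

# Sample one color per bar from a colormap and apply it to each patch
colors = plt.cm.viridis(np.linspace(0, 1, len(patches)))
for color, patch in zip(colors, patches):
    patch.set_facecolor(color)
```

This way the number of colors always matches the number of bins. Call plt.show() to display the figure.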
I'm trying to use matplotlib and a loop to draw a vertical line at the mean of each of two overlapping histograms. I have managed to draw the first one, but I don't know how to draw the second one. I'm using two variables from a dataset to draw the histograms. One variable (feat) is categorical (0 - 1), and the other one (objective) is numerical. The code is the following:
for chas in df[feat].unique():
    plt.hist(df.loc[df[feat] == chas, objective], bins = 15, alpha = 0.5, density = True, label = chas)
    plt.axvline(df[objective].mean(), linestyle = 'dashed', linewidth = 2)
plt.title(objective)
plt.legend(loc = 'upper right')
I also have to add to the legend the mean and standard deviation values for each histogram.
How can I do it? Thank you in advance.
I recommend plotting on an Axes object. Please see the code below and the artist tutorial here.
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
mu1, sigma1 = 100, 8
mu2, sigma2 = 150, 15
x1 = mu1 + sigma1 * np.random.randn(10000)
x2 = mu2 + sigma2 * np.random.randn(10000)
fig, ax = plt.subplots(1, 1, figsize=(7.2, 7.2))
# the histogram of the data
lbs = ['a', 'b']
colors = ['r', 'g']
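for i, x in enumerate([x1, x2]):
    n, bins, patches = ax.hist(x, 50, density=True, facecolor=colors[i], alpha=0.75, label=lbs[i])
    # draw the line at the mean of the data (bins.mean() would be the mean of the bin edges)
    ax.axvline(x.mean(), color=colors[i], linestyle='dashed', linewidth=2)
ax.legend()
plt.show()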
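The question also asks for the mean and standard deviation of each histogram in the legend. One way to do that is to build the label text from the statistics of each group. This is a sketch on synthetic data: the groups dictionary and its 0/1 keys are assumptions standing in for df[feat] and df[objective].

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two groups keyed by the categorical value (0 or 1)
rng = np.random.default_rng(19680801)
groups = {0: rng.normal(100, 8, 1000), 1: rng.normal(150, 15, 1000)}

fig, ax = plt.subplots()
for color, (name, values) in zip(["r", "g"], groups.items()):
    mean, std = values.mean(), values.std()
    # Put the statistics of this group into its legend entry
    label = f"{name}: mean={mean:.2f}, std={std:.2f}"
    ax.hist(values, bins=15, alpha=0.5, density=True, color=color, label=label)
    # One dashed line per group, at that group's own mean
    ax.axvline(mean, color=color, linestyle="dashed", linewidth=2)
legend = ax.legend(loc="upper right")
```

The vertical lines get no label of their own, so the legend shows only one entry per histogram. Call plt.show() to display the figure.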
I am reading a CSV file:
Notation Level RFResult PRIResult PDResult Total Result
AAA 1 1.23 0 2 3.23
AAA 1 3.4 1 0 4.4
BBB 2 0.26 1 1.42 2.68
BBB 2 0.73 1 1.3 3.03
CCC 3 0.30 0 2.73 3.03
DDD 4 0.25 1 1.50 2.75
AAA 5 0.25 1 1.50 2.75
FFF 6 0.26 1 1.42 2.68
...
...
Here is the code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('home\NewFiles\Files.csv')
Notation = df['Notation']
Level = df['Level']
RFResult = df['RFResult']
PRIResult = df['PRIResult']
PDResult = df['PDResult']
fig, axes = plt.subplots(nrows=7, ncols=1)
ax1, ax2, ax3, ax4, ax5, ax6, ax7 = axes.flatten()
n_bins = 13
ax1.hist(df['Total'], n_bins, histtype='bar') # Currently this shows all Total Results in one plot
plt.show()
I want to show the Total Result of each Level on its own axes, as follows:
ax1 will show Level 1 Total Result
ax2 will show Level 2 Total Result
ax3 will show Level 3 Total Result
ax4 will show Level 4 Total Result
ax5 will show Level 5 Total Result
ax6 will show Level 6 Total Result
ax7 will show Level 7 Total Result
You can select a filtered part of a dataframe just by indexing: df[df['Level'] == level]['Total']. You can loop through the axes using for ax in axes.flatten(). To also get the index, use for ind, ax in enumerate(axes.flatten()). Note that Python normally starts counting from 0, so adding 1 to the index would be a good choice to indicate the level.
Note that when you have backslashes in a string, you can avoid having to escape them by using a raw string (r-string): r'home\NewFiles\Files.csv'.
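As a small illustration of the raw-string point, the two spellings below produce the same string. As far as I know, the unprefixed form 'home\NewFiles\Files.csv' does not even compile in Python 3, because \N is read as the start of a named Unicode escape.

```python
# The same Windows-style path written two ways
raw_path = r'home\NewFiles\Files.csv'       # raw string: backslashes are kept literally
escaped_path = 'home\\NewFiles\\Files.csv'  # regular string: each backslash is doubled

print(raw_path == escaped_path)  # True
```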
The default ylim is from 0 to the maximum bar height, plus some padding. This can be changed for each ax separately. In the example below a list of ymax values is used to show the principle.
ax.grid(True, axis='both') sets the grid on for that ax. Instead of 'both', 'x' or 'y' can also be used to set the grid only for that axis. A grid line is drawn for each tick value. (The example below tries to use little space, so only a few gridlines are visible.)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
N = 1000
df = pd.DataFrame({'Level': np.random.randint(1, 6, N), 'Total': np.random.uniform(1, 5, N)})
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True)
ymax_per_level = [27, 29, 28, 26, 27]
for ind, (ax, lev_ymax) in enumerate(zip(axes.flatten(), ymax_per_level)):
    level = ind + 1
    n_bins = 13
    ax.hist(df[df['Level'] == level]['Total'], bins=n_bins, histtype='bar')
    ax.set_ylabel(f'TL={level}') # to add the level in the ylabel
    ax.set_ylim(0, lev_ymax)
    ax.grid(True, axis='both')
plt.show()
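The hard-coded list of ymax values above only shows the principle. If you would rather derive the limit from the data, the counts returned by ax.hist can be used. This is a sketch on random data, and the 10% headroom factor is an arbitrary choice of mine:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data with the same column names as the example above
rng = np.random.default_rng(0)
N = 1000
df = pd.DataFrame({"Level": rng.integers(1, 6, N), "Total": rng.uniform(1, 5, N)})

levels = sorted(df["Level"].unique())
fig, axes = plt.subplots(nrows=len(levels), ncols=1, sharex=True)
for ax, level in zip(axes.flatten(), levels):
    # ax.hist returns the bar heights, so the y-limit can follow the tallest bar
    counts, edges, _ = ax.hist(df.loc[df["Level"] == level, "Total"], bins=13)
    ax.set_ylim(0, counts.max() * 1.1)  # 10% headroom above the tallest bar
    ax.set_ylabel(f"TL={level}")
```

Looping over the levels that actually occur also means the number of subplots follows the data. Call plt.show() to display the figure.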
PS: A stacked histogram with custom legend and custom vertical lines could be created as:
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import pandas as pd
import numpy as np
N = 1000
df = pd.DataFrame({'Level': np.random.randint(1, 6, N),
'RFResult': np.random.uniform(1, 5, N),
'PRIResult': np.random.uniform(1, 5, N),
'PDResult': np.random.uniform(1, 5, N)})
df['Total'] = df['RFResult'] + df['PRIResult'] + df['PDResult']
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True)
colors = ['crimson', 'limegreen', 'dodgerblue']
column_names = ['RFResult', 'PRIResult', 'PDResult']
level_vertical_line = [1, 2, 3, 4, 5]
for level, (ax, vertical_line) in enumerate(zip(axes.flatten(), level_vertical_line), start=1):
    n_bins = 13
    level_data = df[df['Level'] == level][column_names].to_numpy()
    # vertical_line = level_data.mean()
    ax.hist(level_data, bins=n_bins,
            histtype='bar', stacked=True, color=colors)
    ax.axvline(vertical_line, color='gold', ls=':', lw=2)
    ax.set_ylabel(f'TL={level}') # to add the level in the ylabel
    ax.margins(x=0.01)
    ax.grid(True, axis='both')
legend_handles = [Patch(color=color) for color in colors]
axes[0].legend(legend_handles, column_names, ncol=len(column_names), loc='lower center', bbox_to_anchor=(0.5, 1.02))
plt.show()
I want to draw a simple choropleth map of NYC showing the binned number of yellow cab rides. My GeoDataFrame looks like this:
bin cnt shape
0 15 1 POLYGON ((-74.25559 40.62194, -74.24448 40.621...
1 16 1 POLYGON ((-74.25559 40.63033, -74.24448 40.630...
2 25 1 POLYGON ((-74.25559 40.70582, -74.24448 40.705...
3 27 1 POLYGON ((-74.25559 40.72260, -74.24448 40.722...
4 32 12 POLYGON ((-74.25559 40.76454, -74.24448 40.764...
where bin is the region number, cnt is the target variable of my plot, and the shape column is a series of shapely rectangles that together cover the whole of New York.
Drawing NYC from shapefile:
usa = gpd.read_file('shapefiles/gadm36_USA_2.shp')[['NAME_1', 'NAME_2', 'geometry']]
nyc = usa[usa.NAME_1 == 'New York']
ax = plt.axes([0, 0, 2, 2], projection=ccrs.PlateCarree())
ax.set_extent([-74.25559, -73.70001, 40.49612, 40.91553], ccrs.Geodetic())
ax.add_geometries(nyc.geometry.values,
ccrs.PlateCarree(),
facecolor='#1A237E');
Drawing choropleth alone works fine:
gdf.plot(column='cnt',
cmap='inferno',
scheme='natural_breaks', k=10,
legend=True)
But if I pass the ax parameter:
gdf.plot(ax=ax, ...)
the output is
<Figure size 432x288 with 0 Axes>
EDIT:
Got it working with the following code:
from matplotlib.colors import ListedColormap
cmap = plt.get_cmap('summer')
my_cmap = cmap(np.arange(cmap.N))
my_cmap[:,-1] = np.full((cmap.N, ), 0.75)
my_cmap = ListedColormap(my_cmap)
gax = gdf.plot(column='cnt',
cmap=my_cmap,
scheme='natural_breaks', k=10,
figsize=(16,10),
legend=True,
legend_kwds=dict(loc='best'))
gax.set_title('# of yellow cab rides in NYC', fontdict={'fontsize': 20}, loc='center');
nyc.plot(ax=gax,
color='#141414',
zorder=0)
gax.set_xlim(-74.25559, -73.70001)
gax.set_ylim(40.49612, 40.91553)
Doing this with only the .plot calls from geopandas seems to work fine. I had to make up some data as I don't have yours. Let me know if this helps. The code example should work as is in IPython.
%matplotlib inline
import geopandas as gpd
import numpy as np
from shapely.geometry import Polygon
from random import random
crs = {'init': 'epsg:4326'}
num_squares = 10
# load natural earth shapes
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# create random choropleth
minx, miny, maxx, maxy = world.geometry.total_bounds
x_coords = np.linspace(minx, maxx, num_squares+1)
y_coords = np.linspace(miny, maxy, num_squares+1)
polygons = [Polygon([[x_coords[i], y_coords[j]],
[x_coords[i+1], y_coords[j]],
[x_coords[i+1], y_coords[j+1]],
[x_coords[i], y_coords[j+1]]]) for i in
range(num_squares) for j in range(num_squares)]
vals = [random() for i in range(num_squares) for j in range(num_squares)]
choro_gdf = gpd.GeoDataFrame({'cnt' : vals, 'geometry' : polygons})
choro_gdf.crs = crs
# now plot both together
ax = choro_gdf.plot(column='cnt',
cmap='inferno',
scheme='natural_breaks', k=10,
#legend=True
)
world.plot(ax=ax)
This should give you something like the following
Edit: if you're worried about setting the correct limits (as you're doing with the boroughs), just paste the following at the end of the code (for example):
ax.set_xlim(0, 50)
ax.set_ylim(0, 25)
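Instead of typing the limits by hand, they could be read from the layer you want to zoom to via total_bounds (the same attribute used above to build the grid). A rough sketch with two made-up rectangles; the geometries, counts and the background layer are all hypothetical:

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical area of interest: two adjacent rectangles with made-up counts
cells = gpd.GeoDataFrame(
    {"cnt": [1, 12]},
    geometry=[box(0, 0, 25, 25), box(25, 0, 50, 25)],
    crs="EPSG:4326",
)
# A larger background layer standing in for the base map
background = gpd.GeoDataFrame(geometry=[box(-100, -50, 100, 50)], crs="EPSG:4326")

ax = cells.plot(column="cnt", cmap="inferno")
background.plot(ax=ax, color="#141414", zorder=0)

# Zoom to the extent of the choropleth layer instead of hard-coding numbers
minx, miny, maxx, maxy = cells.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
```

Setting the limits after both layers are drawn matters here, since plotting the larger background layer would otherwise widen the view again.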
This should then give you:
I have made a violinplot and want to rename the x-labels.
ax = sns.violinplot(x="Week_Number", y="Ammonia", data=Res)
this is the output:
What I want is Week 1 instead of 1, Week 2 instead of 44, and so on up to Week 10 for 52.
Thanks Everyone
You're looking for the set_xticklabels method (doc). To apply it, you need the Axes object. The same exists for the y labels: set_yticklabels.
Here the code is adapted from Seaborn examples:
# Import modules
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure() # Create a new figure for getting axis
ax = fig.add_subplot(111) # Get the axis
# Create a random dataset across several variables
rs = np.random.RandomState(0)
n, p = 40, 8
d = rs.normal(0, 2, (n, p))
d += np.log(np.arange(1, p + 1)) * -5 + 10
# Use cubehelix to get a custom sequential palette
pal = sns.cubehelix_palette(p, rot=-.5, dark=.3)
# Show each distribution with both violins and points
sns.violinplot(data=d, palette=pal, inner="points", ax=ax)
# Create your list of labels, one per violin
week_list = ["Week_" + str(i) for i in range(1, p + 1)]
# ['Week_1', 'Week_2', 'Week_3', 'Week_4', 'Week_5', 'Week_6', 'Week_7', 'Week_8']
# Set the x labels
ax.set_xticklabels(week_list)
# Show figure
plt.show()
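For the case in the question (week numbers 1 and 44 to 52 shown as Week 1 to Week 10), the labels can be generated from the number of categories. A sketch with plain matplotlib violins on random samples; week_numbers and the data are assumptions, not the asker's Res frame:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical week numbers as they appear on the x-axis in the question
week_numbers = [1, 44, 45, 46, 47, 48, 49, 50, 51, 52]

# One made-up sample per week, just to have something to draw
rng = np.random.default_rng(0)
samples = [rng.normal(loc=i, scale=1.0, size=50) for i in range(len(week_numbers))]
positions = list(range(len(week_numbers)))

fig, ax = plt.subplots()
ax.violinplot(samples, positions=positions)

# Fix the tick positions first, then give each one a "Week k" label
week_labels = [f"Week {i}" for i in range(1, len(week_numbers) + 1)]
ax.set_xticks(positions)
ax.set_xticklabels(week_labels)
```

Setting the ticks explicitly before the labels keeps the number of labels and tick positions in step. Call plt.show() to display the figure.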