I used a bar plot to display the following dataframe:
   city   pred  actual
9     j  10.05   12.68
0     a   9.72    9.56
6     g   8.29    9.11
2     c   8.22    8.49
3     d   7.88    7.92
8     i   7.04    7.35
5     f   6.06    6.33
1     b   5.94    6.00
7     h   5.52    5.72
4     e   5.37    5.62
10    k   6.04    5.50
Code to plot:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (10, 7)
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'g']
df = df.sort_values(by=['actual'], ascending=False)
ax = df.plot(x="city", y=["actual", "pred"], kind="bar", color = colors, alpha=0.8)
plt.legend(["actual", "pred"], fontsize=15)
plt.gca().set_xticklabels(df['city'])
plt.suptitle("pred vs actual", fontsize=18)
for p in ax.patches:
    ax.annotate(np.round(p.get_height(), decimals=2),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 10), textcoords='offset points')
plt.tight_layout()
plt.show()
Output:
What I'm trying to do is hide the unwanted city text label from the x axis. My expected output would look like this:
How can I do that? Thank you.
This is the only line of code I have found that works:
ax.xaxis.label.set_visible(False)
If you have other solutions, please share them.
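Another option that may be worth trying is to set the label text to an empty string with ax.set_xlabel(''). The sketch below is only an illustration: the three-row dataframe is a made-up subset of the data above, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Small illustrative subset of the dataframe from the question
df = pd.DataFrame({"city": ["a", "b", "c"],
                   "pred": [9.72, 5.94, 8.22],
                   "actual": [9.56, 6.00, 8.49]})

ax = df.plot(x="city", y=["actual", "pred"], kind="bar")

# Blank out the axis label ("city") while keeping the per-bar tick labels
ax.set_xlabel("")

plt.tight_layout()
```

Unlike ax.xaxis.label.set_visible(False), this replaces the label text itself, so the label stays empty even if it is made visible again later.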
I have a dataset that encodes gender in binary: males are represented as 0 and females as 1. What I hope to do is change 0 to Male and 1 to Female in the plot legend. I tried to follow this post, but it didn't work out.
It gives me an error message that looks like this:
AttributeError Traceback (most recent call last)
<ipython-input-11-b3c99d4311ab> in <module>
23 # plot the legend
24 plt.legend()
---> 25 legend = g._legend
26 new_labels = ['Female', 'Male']
27 for t, l in zip(legend.texts, new_labels): t.set_text(l)
AttributeError: 'AxesSubplot' object has no attribute '_legend'
This is how my current code looks:
## store them in different variable names
X = salary['years']
y = salary['salary']
g = salary['gender']
# prepare the scatterplot
sns.set()
plt.figure(figsize=(10,10))
g = sns.scatterplot(x=salary.years, y=salary.salary, data=salary, hue='gender')
# equations of the models
model1 = 50 + 2.776962335386217*X
model2 = 60.019802 + 2.214645*X
model3_male = 60.014922 + 2.179305*X + 1.040140*1
model3_female = 60.014922 + 2.179305*X + 1.040140*0
# plot the scatterplots
plt.plot(X, model1, color='r', label='Model 1')
plt.plot(X, model2, color='g', label='Model 2')
plt.plot(X, model3_male, color='b', label='Model 3(Male)')
plt.plot(X, model3_female, color='y', label='Model 3(Female)')
# plot the legend
plt.legend()
legend = g._legend
new_labels = ['Female', 'Male']
for t, l in zip(legend.texts, new_labels): t.set_text(l)
# set the title
plt.title('Scatterplot of salary and model fits')
plt.show()
I don't have your data, so I generated some of my own:
gender salary years
male 40000 1
male 32000 2
male 45000 3
male 54000 4
female 72000 5
female 62000 6
female 92000 7
female 55000 8
female 35000 9
female 48000 10
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
salary = pd.read_csv("1.csv", sep=r"\s+")  # whitespace-delimited file
print(salary)
X = salary['years']
y = salary['salary']
g = salary['gender']
# prepare the scatterplot
sns.set()
plt.figure(figsize=(10,10))
g = sns.scatterplot(x=salary.years, y=salary.salary, data=salary, hue='gender')
# equations of the models
model1 = 50 + 2.776962335386217*X
model2 = 60.019802 + 2.214645*X
model3_male = 60.014922 + 2.179305*X + 1.040140*1
model3_female = 60.014922 + 2.179305*X + 1.040140*0
# plot the scatterplots
plt.plot(X, model1, color='r', label='Model 1')
plt.plot(X, model2, color='g', label='Model 2')
plt.plot(X, model3_male, color='b', label='Model 3(Male)')
plt.plot(X, model3_female, color='y', label='Model 3(Female)')
# plot the legend
plt.legend()
# set the title
plt.title('Scatterplot of salary and model fits')
plt.show()
It works fine. So I guess values in your gender column are 0 or 1. In that case, you can do the following before g = salary['gender'] to replace 0 with male and 1 with female:
salary['gender'] = salary['gender'].map({1: 'female', 0: 'male'})
Back to your error:
---> 25 legend = g._legend
26 new_labels = ['Female', 'Male']
27 for t, l in zip(legend.texts, new_labels): t.set_text(l)
AttributeError: 'AxesSubplot' object has no attribute '_legend'
The g returned by sns.scatterplot is a matplotlib.axes.Axes instance. To get the legend object from it, use g.get_legend() (which returns the existing legend) or g.legend() (which creates a new one) rather than g._legend. See the official Legend guide in the matplotlib documentation.
legend = g.legend()
new_labels = ['Female', 'Male']
for t, l in zip(legend.texts[-2:], new_labels): t.set_text(l)
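Because the order of legend entries can differ between library versions, it may be safer to relabel by the current text rather than by position. The following is a minimal sketch of that idea, not the original answer's method: it uses plain matplotlib scatter calls as stand-ins for the seaborn hue groups, and the data points and the names dictionary are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Stand-ins for the seaborn hue groups: one scatter call per gender code
ax.scatter([1, 2, 3], [40, 32, 45], label="0")
ax.scatter([4, 5, 6], [54, 72, 62], label="1")
ax.plot([1, 6], [52.8, 66.7], color="r", label="Model 1")

ax.legend()
legend = ax.get_legend()  # public accessor for the legend that already exists

# Coding taken from the question: 0 is male, 1 is female
names = {"0": "Male", "1": "Female"}
for text in legend.get_texts():
    # Replace only the entries found in the mapping; leave the others unchanged
    text.set_text(names.get(text.get_text(), text.get_text()))

final_labels = [text.get_text() for text in legend.get_texts()]
print(final_labels)
```

Entries such as 'Model 1' are not in the mapping, so they keep their text regardless of where they appear in the legend.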
I am reading CSV file:
Notation Level RFResult PRIResult PDResult Total Result
AAA 1 1.23 0 2 3.23
AAA 1 3.4 1 0 4.4
BBB 2 0.26 1 1.42 2.68
BBB 2 0.73 1 1.3 3.03
CCC 3 0.30 0 2.73 3.03
DDD 4 0.25 1 1.50 2.75
AAA 5 0.25 1 1.50 2.75
FFF 6 0.26 1 1.42 2.68
...
...
Here is the code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('home\NewFiles\Files.csv')
Notation = df['Notation']
Level = df['Level']
RFResult = df['RFResult']
PRIResult = df['PRIResult']
PDResult = df['PDResult']
fig, axes = plt.subplots(nrows=7, ncols=1)
ax1, ax2, ax3, ax4, ax5, ax6, ax7 = axes.flatten()
n_bins = 13
ax1.hist(df['Total'], n_bins, histtype='bar')  # Currently this shows all Total Results in one plot
plt.show()
I want to show the Total Result of each Level on its own axes, as follows:
ax1 will show Level 1 Total Result
ax2 will show Level 2 Total Result
ax3 will show Level 3 Total Result
ax4 will show Level 4 Total Result
ax5 will show Level 5 Total Result
ax6 will show Level 6 Total Result
ax7 will show Level 7 Total Result
You can select a filtered part of a dataframe by boolean indexing: df[df['Level'] == level]['Total']. You can loop through the axes using for ax in axes.flatten(). To also get the index, use for ind, ax in enumerate(axes.flatten()). Note that Python starts counting from 0, so add 1 to the index to get the level.
Note that when you have backslashes in a string, you can prevent them from being treated as escape characters by using a raw string (r-string): r'home\NewFiles\Files.csv'.
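A short sketch of the difference, using a hypothetical path chosen so that it contains \t and \n (it is not the path from the question). In the question's own path, \N starts a named-Unicode escape, so that literal will most likely be rejected by Python 3 unless it is written as a raw string.

```python
# Hypothetical Windows-style path, chosen so that \t and \n appear in it
plain = 'home\tables\new.csv'   # \t becomes a tab, \n becomes a newline
raw = r'home\tables\new.csv'    # raw string: backslashes are kept literally

print(repr(plain))
print(repr(raw))

# The raw string is two characters longer because each backslash survives
print(len(plain), len(raw))
```

Forward slashes, or pathlib.Path, are other common ways to avoid the problem.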
The default ylim is from 0 to the maximum bar height, plus some padding. This can be changed for each ax separately. In the example below a list of ymax values is used to show the principle.
ax.grid(True, axis='both') turns the grid on for that ax. Instead of 'both', you can also use 'x' or 'y' to set the grid for only that axis. A grid line is drawn for each tick value. (The example below tries to use little space, so only a few gridlines are visible.)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
N = 1000
df = pd.DataFrame({'Level': np.random.randint(1, 6, N), 'Total': np.random.uniform(1, 5, N)})
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True)
ymax_per_level = [27, 29, 28, 26, 27]
for ind, (ax, lev_ymax) in enumerate(zip(axes.flatten(), ymax_per_level)):
    level = ind + 1
    n_bins = 13
    ax.hist(df[df['Level'] == level]['Total'], bins=n_bins, histtype='bar')
    ax.set_ylabel(f'TL={level}')  # to add the level in the ylabel
    ax.set_ylim(0, lev_ymax)
    ax.grid(True, axis='both')
plt.show()
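As a variation on the boolean indexing used above, the levels could also be iterated with groupby, which yields each level together with its rows. This is only a sketch under assumptions: the random data, the seed, and the choice of three levels are made up, and the Agg backend is selected so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: levels 1 to 3 with uniformly distributed totals
rng = np.random.default_rng(0)
df = pd.DataFrame({"Level": rng.integers(1, 4, 300),
                   "Total": rng.uniform(1, 5, 300)})

levels = sorted(df["Level"].unique())
fig, axes = plt.subplots(nrows=len(levels), ncols=1, sharex=True, squeeze=False)

# groupby sorts by key by default, so axes and levels line up in ascending order
for ax, (level, group) in zip(axes.flatten(), df.groupby("Level")):
    ax.hist(group["Total"], bins=13, histtype="bar")
    ax.set_ylabel(f"TL={level}")
    ax.grid(True, axis="both")
```

Deriving the number of rows from the data avoids empty axes when a level is missing from the file.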
PS: A stacked histogram with custom legend and custom vertical lines could be created as:
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import pandas as pd
import numpy as np
N = 1000
df = pd.DataFrame({'Level': np.random.randint(1, 6, N),
'RFResult': np.random.uniform(1, 5, N),
'PRIResult': np.random.uniform(1, 5, N),
'PDResult': np.random.uniform(1, 5, N)})
df['Total'] = df['RFResult'] + df['PRIResult'] + df['PDResult']
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True)
colors = ['crimson', 'limegreen', 'dodgerblue']
column_names = ['RFResult', 'PRIResult', 'PDResult']
level_vertical_line = [1, 2, 3, 4, 5]
for level, (ax, vertical_line) in enumerate(zip(axes.flatten(), level_vertical_line), start=1):
    n_bins = 13
    level_data = df[df['Level'] == level][column_names].to_numpy()
    # vertical_line = level_data.mean()
    ax.hist(level_data, bins=n_bins,
            histtype='bar', stacked=True, color=colors)
    ax.axvline(vertical_line, color='gold', ls=':', lw=2)
    ax.set_ylabel(f'TL={level}')  # to add the level in the ylabel
    ax.margins(x=0.01)
    ax.grid(True, axis='both')
legend_handles = [Patch(color=color) for color in colors]
axes[0].legend(legend_handles, column_names, ncol=len(column_names), loc='lower center', bbox_to_anchor=(0.5, 1.02))
plt.show()
I need to highlight a specific point in each boxplot. For example, I want to highlight the point where petal_width is 0.8 in a boxplot chart for petal_length for each species.
Here is the example:
iris = sns.load_dataset('iris')
##Create three points where petal_width is 0.8 for each species
iris_2 = pd.DataFrame({'sepal_length':Series([1,2,3],dtype='float32'), 'sepal_width':Series([1.1,2.1,3.1],dtype='float32'),
'petal_length':Series([1,2,3],dtype='float32'), 'petal_width':Series([0.8,0.8,0.8],dtype='float32'),
'species':Series(['setosa','versicolor','virginica'])})
iris_all = pd.concat([iris, iris_2]).reset_index(drop = True)
sns.boxplot(x='species', y = 'petal_length', data = iris_all)
sns.regplot(x= iris_all['species'][iris_all['petal_width'] == 0.8],
y= iris_all['petal_length'][iris_all['petal_width'] == 0.8], scatter=True, fit_reg=False, marker='o',
scatter_kws={"s": 100})
But the code doesn't work. I wonder how I can correct it. Thanks.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
# Create three points where petal_width is 0.8 for each species
iris_2 = pd.DataFrame(
{'sepal_length': pd.Series([1, 2, 3], dtype='float32'), 'sepal_width': pd.Series([1.1, 2.1, 3.1], dtype='float32'),
'petal_length': pd.Series([1, 2, 3], dtype='float32'), 'petal_width': pd.Series([0.8, 0.8, 0.8], dtype='float32'),
'species': pd.Series(['setosa', 'versicolor', 'virginica'])})
iris_all = pd.concat([iris, iris_2]).reset_index(drop=True)
sns.boxplot(x='species', y='petal_length', data=iris_all)
sns.regplot(x=iris_all['species'][(iris_all['petal_width'] > 0.79) & (iris_all['petal_width'] < 0.81)],
y=iris_all['petal_length'][(iris_all['petal_width'] > 0.79) & (iris_all['petal_width'] < 0.81)],
color='blue',
scatter=True, fit_reg=False,
marker='+',
scatter_kws={"s": 100})
plt.show()
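The code above replaces the exact test petal_width == 0.8 with a narrow range. A likely reason the exact test finds nothing is that the added rows are float32: once they are concatenated with the float64 iris column, the stored value is no longer exactly equal to the float64 literal 0.8. The sketch below illustrates this with two made-up values and shows np.isclose as an alternative to the hand-written range.

```python
import numpy as np
import pandas as pd

# A float64 column (like the iris data) and a float32 column (like iris_2)
col64 = pd.Series([0.2], dtype="float64")
col32 = pd.Series([0.8], dtype="float32")

# Concatenation upcasts to float64, keeping the float32 rounding of 0.8
combined = pd.concat([col64, col32], ignore_index=True)
print(combined.dtype)

# Exact comparison misses the upcast value
exact_mask = combined == 0.8
print(exact_mask.tolist())

# Tolerance-based comparison finds it
close_mask = np.isclose(combined, 0.8)
print(close_mask.tolist())
```

The resulting boolean mask can be used for indexing in the same way as the range condition in the answer.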
I have a csv file which looks like below
date mse
2018-02-11 14.34
2018-02-12 7.24
2018-02-13 4.5
2018-02-14 3.5
2018-02-16 12.67
2018-02-21 45.66
2018-02-22 15.33
2018-02-24 98.44
2018-02-26 23.55
2018-02-27 45.12
2018-02-28 78.44
2018-03-01 34.11
2018-03-05 23.33
2018-03-06 7.45
... ...
Now I want to split the mse values into two clusters so that I know which values belong to which cluster, and the mean of each cluster.
Since I do not have any other set of values apart from mse (and I have to provide X and Y), I would like to use just the mse values for k-means clustering. For now, I pass a range of the same length as the mse values as the second set of values. This is what I did:
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
df = pd.read_csv("generate_csv/all_data_device.csv", parse_dates=["date"])
f1 = df['mse'].values
# generate another list
f2 = list(range(0, len(f1)))
X = np.array(list(zip(f1, f2)))
kmeans = KMeans(n_clusters=2).fit(X)
labels = kmeans.predict(X)
# Centroid values
centroids = kmeans.cluster_centers_
#print(centroids)
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(X[:, 0], X[:, 1], c=labels)
ax.scatter(centroids[:, 0], centroids[:, 1], marker='*', c='#050505', s=1000)
plt.title('K Mean Classification')
plt.show()
How can I use just the mse values to get the k-means clusters? I am aware of the function reshape() but not quite sure how to use it.
Demo:
In [29]: kmeans = KMeans(n_clusters=2)
In [30]: df['label'] = kmeans.fit_predict(df[['mse']])
# NOTE: the double brackets in df[['mse']] select a one-column DataFrame (2-D input), which is what KMeans expects
In [31]: df
Out[31]:
date mse label
0 2018-02-11 14.34 0
1 2018-02-12 7.24 0
2 2018-02-13 4.50 0
3 2018-02-14 3.50 0
4 2018-02-16 12.67 0
5 2018-02-21 45.66 0
6 2018-02-22 15.33 0
7 2018-02-24 98.44 1
8 2018-02-26 23.55 0
9 2018-02-27 45.12 0
10 2018-02-28 78.44 1
11 2018-03-01 34.11 0
12 2018-03-05 23.33 0
13 2018-03-06 7.45 0
Plotting:
In [64]: ax = df[df['label']==0].plot.scatter(x='mse', y='label', s=50, color='white', edgecolor='black')
In [65]: df[df['label']==1].plot.scatter(x='mse', y='label', s=50, color='white', ax=ax, edgecolor='red')
Out[65]: <matplotlib.axes._subplots.AxesSubplot at 0xfa42be0>
In [66]: plt.scatter(kmeans.cluster_centers_.ravel(), [0.5]*len(kmeans.cluster_centers_), s=100, color='green', marker='*')
Out[66]: <matplotlib.collections.PathCollection at 0xfabf208>
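Since the question asks about reshape(), here is a minimal sketch of the same clustering starting from a plain NumPy array. The values are the fourteen mse values listed in the question; n_init=10 and random_state=0 are arbitrary choices made here for reproducibility, not settings from the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# The mse values shown in the question
mse = np.array([14.34, 7.24, 4.5, 3.5, 12.67, 45.66, 15.33,
                98.44, 23.55, 45.12, 78.44, 34.11, 23.33, 7.45])

# KMeans expects shape (n_samples, n_features); -1 lets NumPy infer n_samples
X = mse.reshape(-1, 1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Mean of the mse values in each cluster
cluster_means = pd.Series(mse).groupby(labels).mean()
print(labels)
print(cluster_means)
```

Which cluster receives label 0 and which receives label 1 is arbitrary and can change between runs or library versions.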
I am trying to create a simple scatter plot. For this specific purpose, I would like to draw concentric circles around the origin with different colors (like a bullseye with 3 regions). Is there something similar to axvspan and axhspan, but for concentric shading?
Let me give you an example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 20, 50)
y = np.cos(3*x)
a = 3 # radius 0 to >a
b = 5 # radius a to >b
c = 7 # radius b to c
plt.axvspan(a, b, color='r', alpha = 0.5)
plt.axhspan(a, b, color='y', alpha = 0.5)
plt.scatter(x, y)
plt.show()
Instead of the horizontal and vertical shading, I want concentric green shading with a radius a from the origin, yellow from a to b, and red from b to c. Any ideas?
This is my solution:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = np.linspace(0, 20, 50)
y = np.cos(3*x)
a = 3 # radius 0 to >a
b = 5 # radius a to >b
c = 7 # radius b to c
circle1 = plt.Circle((0, 0), a, color='green', alpha=0.3)
circle2 = plt.Circle((0, 0), b, color='yellow', alpha=0.3)
circle3 = plt.Circle((0, 0), c, color='red', alpha=0.3)
ax.add_artist(circle3)
ax.add_artist(circle2)
ax.add_artist(circle1)
plt.scatter(x, y)
plt.axis([-22, 22, -22, 22])
plt.show()
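One thing to be aware of with stacked translucent circles is that the colors blend where the circles overlap, so the inner regions are tinted by the larger circles beneath them. If separate, non-overlapping bands are preferred, a possible variation is to draw rings with matplotlib.patches.Wedge and its width argument. This is only a sketch: the rings list is a helper introduced here, and the Agg backend is selected so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Wedge

x = np.linspace(0, 20, 50)
y = np.cos(3 * x)
a, b, c = 3, 5, 7  # region boundaries from the question

fig, ax = plt.subplots()

# (outer radius, ring width, color); width=None gives a full disc
rings = [(a, None, 'green'), (b, b - a, 'yellow'), (c, c - b, 'red')]
for outer_radius, width, color in rings:
    ax.add_patch(Wedge((0, 0), outer_radius, 0, 360,
                       width=width, color=color, alpha=0.3))

ax.scatter(x, y)
ax.set_aspect('equal')  # keep the rings circular instead of elliptical
ax.axis([-22, 22, -22, 22])
```

Because each ring covers only its own band, every region is shown in a single color at the chosen alpha.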