I have the following data frame my_df:
   my_1  my_2  my_3
--------------------------------
0     5     7     4
1     3     5    13
2     1     2     8
3    12     9     9
4     6     1     2
I want to make a plot where the x-axis is categorical, with the values my_1, my_2, and my_3, and the y-axis is integer-valued. For each column in my_df, I want to plot all five of its values at x = my_i. What kind of plot should I use in matplotlib? Thanks!
You could make a bar chart:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
df.T.plot(kind='bar')
plt.show()
or a scatter plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
fig, ax = plt.subplots()
cols = np.arange(len(df.columns))
x = np.repeat(cols, len(df))
y = df.values.ravel(order='F')
color = np.tile(np.arange(len(df)), len(df.columns))
scatter = ax.scatter(x, y, s=150, c=color)
ax.set_xticks(cols)
ax.set_xticklabels(df.columns)
cbar = plt.colorbar(scatter)
cbar.set_ticks(np.arange(len(df)))
plt.show()
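To see how the x, y, and color arrays in the scatter example line up, here is a minimal sketch on a tiny 2x2 frame. The frame `small` and its column names are made up for illustration. np.repeat gives each column's position once per row, ravel(order='F') reads the values column by column, and np.tile restarts the row index for every column.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame, used only to show the array layout
small = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

cols = np.arange(len(small.columns))       # one integer position per column
x = np.repeat(cols, len(small))            # column position, repeated once per row
y = small.values.ravel(order='F')          # values read column by column
color = np.tile(np.arange(len(small)), len(small.columns))  # row index, restarted per column

# Each (x, y, color) triple describes one point of the scatter plot
for xi, yi, ci in zip(x, y, color):
    print(xi, yi, ci)
```

Every point therefore carries its column position as x, its cell value as y, and its row number as the color.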
Just for fun, here is how to make the same scatter plot using Pandas' df.plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
columns = df.columns
index = df.index
df = df.stack()
df.index.names = ['color', 'column']
df = df.rename('y').reset_index()
df['x'] = pd.Categorical(df['column']).codes
ax = df.plot(kind='scatter', x='x', y='y', c='color', colorbar=True,
cmap='viridis', s=150)
ax.set_xticks(np.arange(len(columns)))
ax.set_xticklabels(columns)
cbar = ax.collections[-1].colorbar
cbar.set_ticks(index)
plt.show()
Unfortunately, it requires quite a bit of DataFrame manipulation just to call
df.plot, and then some extra matplotlib calls are still needed to set the tick
marks on the scatter plot and colorbar. Since Pandas is not saving effort here,
I would go with the NumPy/matplotlib scatter approach shown above.
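As a possible shortcut, matplotlib can also place points directly on string categories, which avoids building the integer positions by hand. The following is a minimal sketch under the assumption that the installed matplotlib accepts strings as categorical coordinates. The names `long_df`, `column`, and `y` are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6],
                   'my_2': [7, 5, 2, 9, 1],
                   'my_3': [4, 13, 8, 9, 2]})

# Long form: one row per (column name, value) pair
long_df = df.melt(var_name='column', value_name='y')

fig, ax = plt.subplots()
# String x-values are treated as categories, so the ticks are labelled my_1, my_2, my_3
ax.scatter(long_df['column'].to_numpy(), long_df['y'].to_numpy(), s=150)
ax.set_ylabel('value')
# Call plt.show() to display the figure interactively
```

This variant does not color the points by row. For that, the explicit x/y/color arrays shown earlier are still needed.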
My legend now shows,
I want the legend labels to run from 0 to 7, but I don't want to add a for-loop to my code and correct each label one by one. My code looks like this:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
pca_2_spec[:,0],
pca_2_spec[:,1],
s = 7,
marker='o',
c = pred_pca_2_spec,
cmap= 'rainbow')
ax.legend(*points.legend_elements(), title = 'cluster')
plt.show()
Assuming pred_pca_2_spec is a NumPy array with values [0, 5, 10, 15, 20, 25, 30, 35], you can bring these values into the range 0-7 by dividing each element by 5.
Sample Data:
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(54)
pca_2_spec = np.random.randint(-100, 300, (100, 2))
pred_pca_2_spec = np.random.choice([0, 5, 10, 15, 20, 25, 30, 35], 100)
Plotting Code:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
pca_2_spec[:, 0],
pca_2_spec[:, 1],
s=7,
marker='o',
c=pred_pca_2_spec / 5, # Divide By 5
cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
plt.show()
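Dividing by 5 only works when the labels are evenly spaced multiples of 5. If that assumption does not hold, one alternative is to map each distinct label to its rank with np.unique(..., return_inverse=True). This is a sketch with made-up stand-in data. `points_xy` and `raw_labels` are hypothetical substitutes for pca_2_spec and pred_pca_2_spec.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
points_xy = rng.integers(-100, 300, size=(40, 2))        # stand-in for pca_2_spec
raw_labels = np.tile([0, 5, 10, 15, 20, 25, 30, 35], 5)  # stand-in for pred_pca_2_spec

# codes[i] is the position of raw_labels[i] among the sorted unique labels (0, 1, 2, ...)
_, codes = np.unique(raw_labels, return_inverse=True)

fig, ax = plt.subplots()
points = ax.scatter(points_xy[:, 0], points_xy[:, 1], s=7, marker='o',
                    c=codes, cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
# Call plt.show() to display the figure interactively
```

The legend entries then follow the consecutive codes regardless of how the original labels are spaced.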
Say I have an example dataframe below:
Division    Home Corners  Away Corners
Bundesliga  5             3
Bundesliga  5             5
EPL         7             4
EPL         3             2
League 1    10            6
Serie A     3             3
Serie A     8             2
League 1    3             1
I want to create a boxplot of total corners per game grouped by division, but with the home corners and away corners shown separately on the same figure, similar to what the "hue" keyword accomplishes. How do I do that?
Use seaborn.boxplot after reshaping the data to a long form with pandas.DataFrame.stack:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'Division': ['Bundesliga', 'Bundesliga', 'EPL', 'EPL', 'League 1', 'Serie A', 'Serie A', 'League 1'],
'Home Corners': [5, 5, 7, 3, 10, 3, 8, 3],
'Away Corners': [3, 5, 4, 2, 6, 3, 2, 1]}
df = pd.DataFrame(data)
# convert the data to a long format
df.set_index('Division', inplace=True)
dfl = df.stack().reset_index().rename(columns={'level_1': 'corners', 0: 'val'})
# plot
# pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.boxplot(x='corners', y='val', data=dfl, hue='Division')
plt.legend(title='Division', bbox_to_anchor=(1.05, 1), loc='upper left')
You can melt the original data and use sns.boxplot:
sns.boxplot(data=df.melt('Division', var_name='Home/Away', value_name='Corners'),
x='Division', y='Corners',hue='Home/Away')
Output:
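To make the melt step concrete, here is a small sketch that builds the long-form frame on its own and prints it, using the sample data from the question. The variable name `long_df` is chosen here for illustration.

```python
import pandas as pd

data = {'Division': ['Bundesliga', 'Bundesliga', 'EPL', 'EPL',
                     'League 1', 'Serie A', 'Serie A', 'League 1'],
        'Home Corners': [5, 5, 7, 3, 10, 3, 8, 3],
        'Away Corners': [3, 5, 4, 2, 6, 3, 2, 1]}
df = pd.DataFrame(data)

# One row per (Division, Home/Away) observation; 'Division' is kept as an identifier column
long_df = df.melt('Division', var_name='Home/Away', value_name='Corners')
print(long_df.head())
```

Each original row turns into two rows, one for the home value and one for the away value, which is the shape that the hue argument expects.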
I am trying to create a series of graphs that share x and y labels. I can get the graphs to each have a label (explained well here!), but this is not what I am looking for.
I want one label that covers the y axis of both graphs, and same for the x axis.
I've been looking at the matplotlib and pandas documentation, but I was unable to find anything that addresses this issue when using the by argument.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
'B': [1, 7, 2, 4, 1, 4, 8, 3],
'C': [1, 4, 8, 3, 1, 7, 3, 4],
'D': [1, 2, 6, 5, 8, 3, 1, 7]},
index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], sharey=True, sharex=True)
plt.ylabel('ylabel') # I assume the label is created on the 4th graph and then deleted?
plt.xlabel('xlabel') # Creates a label on the 4th graph.
plt.tight_layout()
plt.show()
The output looks like this.
Is there any way that I can create a Y label that goes across the entire left side of the image (not each graph individually), and the same for the X label?
As you can see, the x label only appears on the last graph created, and there is no y label.
Help?
This is one way to do it indirectly, by drawing the x- and y-labels as figure-level text. I am not aware of a direct way using plt.xlabel or plt.ylabel. When passing Axes objects to df.hist, the sharex and sharey arguments have to be given to plt.subplots() instead. With f.text you control the label positions manually. For example, if you think the x-label is too close to the ticks, you can use f.text(0.5, -0.02, 'X-label', ...) to shift it slightly lower.
import matplotlib.pyplot as plt
import pandas as pd
f, ax = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
'B': [1, 7, 2, 4, 1, 4, 8, 3],
'C': [1, 4, 8, 3, 1, 7, 3, 4],
'D': [1, 2, 6, 5, 8, 3, 1, 7]},
index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], ax=ax)
f.text(0, 0.5, 'Y-label', ha='center', va='center', fontsize=20, rotation='vertical')
f.text(0.5, 0, 'X-label', ha='center', va='center', fontsize=20)
plt.tight_layout()
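In matplotlib 3.4 and later, Figure.supxlabel and Figure.supylabel provide figure-level axis labels, so the manual text placement may not be needed. The following sketch assumes such a version is installed and otherwise reuses the data and layout from above.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])

# Axis sharing is configured on the subplots, not on df.hist
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
df.hist(by=df['A'], ax=axes)

# One label for the whole figure along each side (requires matplotlib >= 3.4)
xlab = fig.supxlabel('X-label')
ylab = fig.supylabel('Y-label')
fig.tight_layout()
# Call plt.show() to display the figure interactively
```

tight_layout takes these figure-level labels into account, so they should not overlap the tick labels.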
I fixed the issue with the variable number of sub-plots using something like this:
cols = 3
n = len(set(df['A']))
rows = int(n / cols) + (0 if n % cols == 0 else 1)
fig, axes = plt.subplots(rows, cols)
extra = rows * cols - n
if extra:
    newaxes = []
    count = 0
    for row in range(rows):
        for col in range(cols):
            if count < n:
                newaxes.append(axes[row][col])
            else:
                axes[row][col].axis('off')
            count += 1
else:
    newaxes = axes
hist = df.hist(by=df['A'], ax=newaxes)
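The same idea can be written more compactly by flattening the axes array. This is a sketch, not the original author's code. It assumes the df from the question, and it passes squeeze=False so that axes is always two-dimensional, even when only one row is needed.

```python
import math
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])

cols = 3
n = df['A'].nunique()          # number of groups, hence number of histograms
rows = math.ceil(n / cols)     # enough rows to hold all groups

fig, axes = plt.subplots(rows, cols, squeeze=False)
flat = axes.ravel()            # row-major 1-D view of the grid

# Hide the unused axes at the end of the grid
for unused in flat[n:]:
    unused.axis('off')

hist = df.hist(by=df['A'], ax=flat[:n])
# Call plt.show() to display the figure interactively
```

With four groups and three columns this creates a 2x3 grid and switches off the last two axes.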
How can I make this image stretchable (resizable) using mouse events in matplotlib? Please help.
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as image
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
fig = plt.figure()
X = [1, 2, 3, 4, 5, 6, 7]
Y = [1, 3, 4, 2, 5, 8, 6]
mainaxes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # main axes
img = image.imread('https://upload.wikimedia.org/wikipedia/commons/7/70/Example.png')
z = 0.3 + 0.3
imagebox = OffsetImage(img, zoom=z)
imgbox = AnnotationBbox(imagebox, (0.3,0.5), frameon=True)
mainaxes.add_artist(imgbox)
imgbox.draggable()
plt.show()
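One possible approach, sketched here under assumptions, is to resize the image with the mouse wheel by connecting a scroll_event handler that changes the zoom of the OffsetImage. The handler name `on_scroll`, the 10% step, and the random stand-in image are illustrative choices and not part of the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

fig, ax = plt.subplots()

# Stand-in image; replace with the array returned by your own image loader
img = np.random.rand(50, 50)
imagebox = OffsetImage(img, zoom=0.6)
imgbox = AnnotationBbox(imagebox, (0.3, 0.5), frameon=True)
ax.add_artist(imgbox)
imgbox.draggable()

def on_scroll(event):
    """Grow the image on scroll up, shrink it on scroll down."""
    factor = 1.1 if event.button == 'up' else 1 / 1.1
    imagebox.set_zoom(imagebox.get_zoom() * factor)
    fig.canvas.draw_idle()  # request a redraw with the new zoom

cid = fig.canvas.mpl_connect('scroll_event', on_scroll)
# Call plt.show() to open the interactive window
```

This scales the image uniformly. Stretching width and height independently would need a different mechanism, for example redrawing the image with imshow and an adjustable extent.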
I have time series data which are multi-indexed on (Year, Month) as seen here:
print(df.index)
print(df)
MultiIndex(levels=[[2016, 2017], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0], [2, 3, 4, 5, 6, 7, 8, 9]],
names=['Year', 'Month'])
Value
Year Month
2016 3 65.018150
4 63.130035
5 71.071254
6 72.127967
7 67.357795
8 66.639228
9 64.815232
10 68.387698
I want to do very basic linear regression on these time series data. Because pandas.DataFrame.plot does not do any regression, I intend to use Seaborn to do my plotting.
I attempted to do this by using lmplot:
sns.lmplot(x=("Year", "Month"), y="Value", data=df, fit_reg=True)
but I get an error:
TypeError: '>' not supported between instances of 'str' and 'tuple'
This is particularly interesting to me because all elements in df.index.levels[:] are of type numpy.int64, and all elements in df.index.labels[:] are of type numpy.int8.
Why am I receiving this error? How can I resolve it?
You can use reset_index to turn the dataframe's index levels into columns. Plotting DataFrame columns is then straightforward with seaborn.
I guess the reason to use lmplot is to show different regressions for different years (otherwise a regplot may be better suited), so the "Year" column can be used as hue.
import numpy as np
import pandas as pd
import seaborn as sns  # seaborn.apionly has been removed from current seaborn releases
import matplotlib.pyplot as plt
iterables = [[2016, 2017], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
index = pd.MultiIndex.from_product(iterables, names=['Year', 'Month'])
df = pd.DataFrame({"values":np.random.rand(24)}, index=index)
df2 = df.reset_index() # or, df.reset_index(inplace=True) if df is not required otherwise
g = sns.lmplot(x="Month", y="values", data=df2, hue="Year")
plt.show()
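The reason the tuple x=("Year", "Month") fails is that index levels are not columns, so there is nothing for seaborn to look up under that name. A short sketch with made-up values shows what reset_index changes:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2016, 2017], range(1, 13)],
                                   names=['Year', 'Month'])
df = pd.DataFrame({'values': np.arange(24.0)}, index=index)  # made-up data

print(list(df.columns))    # only the value column; Year and Month live in the index

df2 = df.reset_index()     # move both index levels into ordinary columns
print(list(df2.columns))
```

After the reset, "Year" and "Month" can be passed by name as x, y, or hue.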
Consider the following approach:
df['x'] = df.index.get_level_values(0) + df.index.get_level_values(1)/100
yields:
In [49]: df
Out[49]:
Value x
Year Month
2016 3 65.018150 2016.03
4 63.130035 2016.04
5 71.071254 2016.05
6 72.127967 2016.06
7 67.357795 2016.07
8 66.639228 2016.08
9 64.815232 2016.09
10 68.387698 2016.10
Let's prepare the x-tick labels:
labels = df.index.get_level_values(0).astype(str) + '-' + \
df.index.get_level_values(1).astype(str).str.zfill(2)
sns.lmplot(x='x', y='Value', data=df, fit_reg=True)
ax = plt.gca()
ax.set_xticks(df['x'])  # one tick per data point, so that the labels line up with the data
ax.set_xticklabels(labels)
Result:
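One caveat with Year + Month/100 is that the spacing is uneven across a year boundary (2016.12 is followed by 2017.01), which can distort a fitted slope when the data span more than one year. A possible alternative, sketched here with made-up data, is a running month counter fitted with np.polyfit. The column name `t` and the synthetic values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2016, 2017], range(1, 13)],
                                   names=['Year', 'Month'])
# Made-up data that grows by exactly 2 per month, so the expected fit is known
df = pd.DataFrame({'Value': 2.0 * np.arange(24) + 1.0}, index=index)

years = df.index.get_level_values('Year')
months = df.index.get_level_values('Month')

# Months elapsed since January of the first year: evenly spaced across year boundaries
df['t'] = ((years - years.min()) * 12 + (months - 1)).to_numpy()

# Degree-1 least-squares fit: Value is approximately slope * t + intercept
slope, intercept = np.polyfit(df['t'], df['Value'], 1)
print(slope, intercept)
```

The same `t` column can be passed as x to sns.lmplot or sns.regplot in place of the Year + Month/100 column.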