Highlight a point in grouped boxplot in seaborn

I need to highlight a specific point in each boxplot. For example, I want to highlight the point where petal_width is 0.8 in a boxplot chart for petal_length for each species.
Here is the example:
iris = sns.load_dataset('iris')
##Create three points where petal_width is 0.8 for each species
iris_2 = pd.DataFrame({'sepal_length':Series([1,2,3],dtype='float32'), 'sepal_width':Series([1.1,2.1,3.1],dtype='float32'),
'petal_length':Series([1,2,3],dtype='float32'), 'petal_width':Series([0.8,0.8,0.8],dtype='float32'),
'species':Series(['setosa','versicolor','virginica'])})
iris_all = pd.concat([iris, iris_2]).reset_index(drop = True)
sns.boxplot(x='species', y = 'petal_length', data = iris_all)
sns.regplot(x= iris_all['species'][iris_all['petal_width'] == 0.8],
y= iris_all['petal_length'][iris_all['petal_width'] == 0.8], scatter=True, fit_reg=False, marker='o',
scatter_kws={"s": 100})
But the code doesn't work. I wonder how I can correct it. Thanks.

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
# Create three points where petal_width is 0.8 for each species
iris_2 = pd.DataFrame({
    'sepal_length': pd.Series([1, 2, 3], dtype='float32'),
    'sepal_width': pd.Series([1.1, 2.1, 3.1], dtype='float32'),
    'petal_length': pd.Series([1, 2, 3], dtype='float32'),
    'petal_width': pd.Series([0.8, 0.8, 0.8], dtype='float32'),
    'species': pd.Series(['setosa', 'versicolor', 'virginica'])})
iris_all = pd.concat([iris, iris_2]).reset_index(drop=True)
# The float32 value 0.8 is stored as roughly 0.80000001, so `== 0.8` matches nothing
# after concat upcasts the column to float64; select the rows with a small tolerance instead.
highlight = (iris_all['petal_width'] > 0.79) & (iris_all['petal_width'] < 0.81)
sns.boxplot(x='species', y='petal_length', data=iris_all)
sns.regplot(x=iris_all['species'][highlight],
            y=iris_all['petal_length'][highlight],
            color='blue',
            scatter=True, fit_reg=False,
            marker='+',
            scatter_kws={"s": 100})
plt.show()
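One likely reason the original `== 0.8` filter selects nothing is floating-point precision: the added rows store 0.8 as float32, and `pd.concat` upcasts the column to float64 without restoring the lost digits. The sketch below reproduces this on a small made-up column (the values are illustrative, not the iris data) and uses `np.isclose` as an alternative to the explicit 0.79 to 0.81 window.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris petal_width column (float64).
widths64 = pd.Series([0.2, 1.3, 2.1], dtype='float64')
# The three added rows store 0.8 as float32, as in the question.
widths32 = pd.Series([0.8, 0.8, 0.8], dtype='float32')

# concat upcasts to float64, but the float32 rounding error is kept.
widths = pd.concat([widths64, widths32]).reset_index(drop=True)

exact_mask = widths == 0.8            # exact comparison against float64 0.8
close_mask = np.isclose(widths, 0.8)  # comparison within a small tolerance

print(widths.dtype)           # float64
print(int(exact_mask.sum()))  # 0
print(int(close_mask.sum()))  # 3
```

With the real data, `np.isclose(iris_all['petal_width'], 0.8)` could serve as the row mask in place of the two range comparisons.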

Related

Custom annotation of text in seaborn heatmap

I want to assign different fontsizes for positive and negative values in the following heatmap plotted using seaborn.
import numpy as np
import pandas as pd
import seaborn as sns  # %matplotlib inline
import matplotlib.pyplot as plt
data = np.array([[0.000000, 0.000000], [-0.231049, 0.000000], [0.231049, 0.000000]])
data = {0: [0.000000, 0.000000], 1: [2.31049, 0.000000], 2: [-0.231049, 0.000000]}
df = pd.DataFrame.from_dict(data, orient='index')
sns.heatmap(
    df, cmap='bwr', vmax=10, vmin=0, annot=True, fmt='f',
    linewidths=0.25, annot_kws={"fontsize": 16}, center=0, square=True
)
sns.heatmap(
    df, cmap='bwr', vmax=0, vmin=-10, annot=True, fmt='f',
    linewidths=0.25, annot_kws={"fontsize": 6}, center=0, square=True
)
plt.show()
I tried to specify the min and max and plot in two steps, but the colors and fonts aren't displayed correctly.
Suggestions on how to fix this will be of great help.

To make it easier to keep the properties in sync, the code below uses a for loop. For the positive part, the dataframe is filtered to only contain the positive values. (Internally, pandas fills in NaN for the values that get filtered away, and seaborn leaves those cells blank.)
vmin and vmax are set to the same values for both the negative and the positive pass of the loop. That way, the colorbar shows the full range of values. To avoid drawing the colorbar twice, cbar=False is passed for one of the two passes.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(-10, 11, (12, 12)))
fig, ax = plt.subplots()
for posneg in ['pos', 'neg']:
    sns.heatmap(
        df[df > 0] if posneg == 'pos' else df[df < 0],
        cmap='bwr', vmin=-10, vmax=10, center=0, annot=True, fmt='.0f',
        annot_kws={"fontsize": 16 if posneg == 'pos' else 8},
        cbar=(posneg != 'pos'), cbar_kws={'ticks': range(-10, 11, 2)},
        linewidths=0.25, square=True, ax=ax
    )
plt.show()
PS: The code above uses if/else inside some of the arguments. Such a conditional expression can be handy when only something short is involved, or in a list comprehension.
An alternative would be to use a normal if test together with variables, e.g.:
for posneg in ['pos', 'neg']:
    if posneg == 'pos':
        df_filtered = df[df > 0]
        fontsize = 16
        fontweight = 'bold'
    else:
        df_filtered = df[df < 0]
        fontsize = 12
        fontweight = 'light'
    sns.heatmap(
        df_filtered,
        cmap='bwr', vmin=-10, vmax=10, center=0, annot=True, fmt='.0f',
        annot_kws={"fontweight": fontweight, "fontsize": fontsize},
        cbar=(posneg != 'pos'), cbar_kws={'ticks': range(-10, 11, 2)},
        linewidths=0.25, square=True, ax=ax
    )
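To see what the boolean filtering hands to seaborn, here is a small sketch on a made-up 2x2 frame (the values are illustrative only). It also shows a side effect worth knowing: a cell that is exactly zero is filtered out of both halves, so with the `> 0` / `< 0` split neither pass covers it.

```python
import pandas as pd

df = pd.DataFrame([[3, -2], [0, 5]])  # small made-up example

positive = df[df > 0]  # cells that are not > 0 become NaN
negative = df[df < 0]  # cells that are not < 0 become NaN

print(positive)
print(negative)

# True only where a cell is missing from both halves (the zero at row 1, column 0).
zero_in_neither = positive.isna() & negative.isna()
print(zero_in_neither)
```

If zeros should be annotated as well, using `df >= 0` for one of the two halves would be one way to include them.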

How to visualize a list of strings on a colorbar in matplotlib

I have a dataset like
x = 3,4,6,77,3
y = 8,5,2,5,5
labels = "null","exit","power","smile","null"
Then I use
from matplotlib import pyplot as plt
plt.scatter(x,y)
colorbar = plt.colorbar(labels)
plt.show()
to make a scatter plot, but I cannot get the colorbar to show the labels as its colors.
How can I do this?

I'm not sure it's a good idea to do this for scatter plots in general (the same label applies to several data points, so a legend might be a better fit), but a specific solution to what you have in mind might be the following:
from matplotlib import pyplot as plt
# Data
x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ('null', 'exit', 'power', 'smile', 'null')
# Customize colormap and scatter plot
cm = plt.get_cmap('hsv')  # plt.cm.get_cmap is no longer available in recent Matplotlib releases
sc = plt.scatter(x, y, c=range(5), cmap=cm)
cbar = plt.colorbar(sc, ticks=range(5))
cbar.ax.set_yticklabels(labels)
plt.show()
This will result in such an output:
The code combines this Matplotlib demo and this SO answer.
Hope that helps!
EDIT: Incorporating the comments, I can only think of some kind of label color dictionary, generating a custom colormap from the colors, and before plotting explicitly grabbing the proper color indices from the labels.
Here's the updated code (I added some additional colors and data points to check scalability):
from matplotlib import pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
# Color information; create custom colormap
label_color_dict = {'null': '#FF0000',
                    'exit': '#00FF00',
                    'power': '#0000FF',
                    'smile': '#FF00FF',
                    'addon': '#AAAAAA',
                    'addon2': '#444444'}
all_labels = list(label_color_dict.keys())
all_colors = list(label_color_dict.values())
n_colors = len(all_colors)
cm = LinearSegmentedColormap.from_list('custom_colormap', all_colors, N=n_colors)
# Data
x = [3, 4, 6, 77, 3, 10, 40]
y = [8, 5, 2, 5, 5, 4, 7]
labels = ('null', 'exit', 'power', 'smile', 'null', 'addon', 'addon2')
# Get indices from color list for given labels
color_idx = [all_colors.index(label_color_dict[label]) for label in labels]
# Customize colorbar and plot
sc = plt.scatter(x, y, c=color_idx, cmap=cm)
c_ticks = np.arange(n_colors) * (n_colors / (n_colors + 1)) + (2 / n_colors)
cbar = plt.colorbar(sc, ticks=c_ticks)
cbar.ax.set_yticklabels(all_labels)
plt.show()
And, the new output:
Finding the correct middle point of each color segment is (still) not good, but I'll leave this optimization to you.
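For the tick placement, one possible approach (a sketch, not the original author's fix) assumes that the scatter's color values are the integers 0 to n_colors - 1 and that all of them occur in the data. The colorbar then spans [0, n_colors - 1] and is divided into n_colors equal bands, so the center of band i is (i + 0.5) * (n_colors - 1) / n_colors. The helper name `segment_centers` is hypothetical.

```python
import numpy as np

def segment_centers(n_colors):
    """Centers of n_colors equal bands on a colorbar spanning [0, n_colors - 1]."""
    band_width = (n_colors - 1) / n_colors
    return (np.arange(n_colors) + 0.5) * band_width

c_ticks = segment_centers(6)
print(c_ticks)
```

In the code above, this would replace the `c_ticks` line with `c_ticks = segment_centers(n_colors)`.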

How to plot a line representing a value from a dataframe with two geometry columns?

I have the following data:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt
points = gpd.GeoDataFrame([['A', Point(1.5, 1.75), Point(2, 2), 16],
['B', Point(3.0,2.0), Point(3, 4), 18],
['C', Point(2.5,1.25), Point(1, 1), 19]],
columns=['id', 'geometry', 'geometry b', 'value'],
geometry='geometry')
The first geometry column is the starting point and the second is the ending point of a line, whose value is given in the value column.
I have tried to plot this with:
f, ax = plt.subplots(1, figsize = [12, 12])
points.plot(ax=ax, column = 'value')
However it just plots the first geometry column and colours the points corresponding to their value.
How do I produce a plot that draws the lines, colour coded to their value?

This working code and its output plot demonstrate how you can achieve what you need. See comments in the code for more details.
import geopandas as gpd
from shapely.geometry import Point, LineString
import matplotlib.pyplot as plt
# handle all points and create relating lines
pA1 = Point(1.5, 1.75)
pA2 = Point(2, 2)
line_A = LineString([[pA1.x, pA1.y], [pA2.x, pA2.y]])
pB1 = Point(3.0, 2.0)
pB2 = Point(3, 4)
line_B = LineString([[pB1.x, pB1.y], [pB2.x, pB2.y]])
pC1 = Point(2.5, 1.25)
pC2 = Point(1, 1)
line_C = LineString([[pC1.x, pC1.y], [pC2.x, pC2.y]])
# create a geodataframe,
# assigning the column containing `LineString` as its geometry
pts_and_lines = gpd.GeoDataFrame([['A', pA1, pA2, 16, line_A],
['B', pB1, pB2, 18, line_B],
['C', pC1, pC2, 19, line_C]],
columns=['id', 'beg_pt', 'end_pt', 'value', 'LineString_obj'],
geometry='LineString_obj') # declare LineString (last column) as the `geometry`
# make a plot of the geodataframe obtained
f, ax = plt.subplots(1, figsize = [4, 4])
pts_and_lines.plot(ax=ax, column = 'value');
plt.show()
The output plot:
If you prefer to first build a dataframe containing the from-point and to-point columns and then add a new column of LineString objects created from those points, here is an alternative:
import geopandas as gpd
from shapely.geometry import Point, LineString
import matplotlib.pyplot as plt
# this dataframe `points_df` contains from_point, to_point for creating `lineString`.
points_df = gpd.GeoDataFrame([['A', Point(1.5, 1.75), Point(2, 2), 16],
['B', Point(3.0,2.0), Point(3, 4), 18],
['C', Point(2.5,1.25), Point(1, 1), 19]],
columns=['id', 'geometry_a', 'geometry_b', 'value'])
# add new column, `line` to the dataframe,
# this column contains `LineString` geometry.
points_df['line'] = points_df.apply(lambda x: LineString([x['geometry_a'], x['geometry_b']]), axis=1)
# assign geometry to `points_df` using the column that has `LineString` geometry
# take the result as `target_gdf`
# `target_gdf` is now capable of plotting with matplotlib
target_gdf = gpd.GeoDataFrame(points_df, geometry=points_df['line'])
f, ax = plt.subplots(1, figsize = [4, 4])
target_gdf.plot(ax=ax, column = 'value');
plt.show()
Its output plot is the same as the previous one.

Legend in separate subplot and grid

I have a collection of plots arranged in two grids. The left grid has one plot at the top (full width) and two at the bottom (side by side). The two bottom plots share a legend. I want the legend in the right grid; there are a lot of data series, so I would like it to use the whole height of the figure.
The data series appear through an animation, but I would like the legend not to be animated.
My idea was to draw the data series with a legend in the right grid and then hide the data series. But the only solution I have found is ax.set_visible(False), which removes everything.
This is essentially what the script looks like (simplified version):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as anim
from matplotlib.gridspec import GridSpec
data = np.array([[1],[2],[3],[4]])
sett = np.array([1,2,3,4])
data1 = np.hstack((data,data*2, data*3, data*4))
data2 = np.hstack((3*data, 3*data/2, 3*data/3, 3*data/4))
df1 = pd.DataFrame(data=np.array(data1), index=[1, 2, 3, 4], columns=sett).transpose()
df2 = pd.DataFrame(data=np.array(data2), index=[1, 2, 3, 4], columns=sett).transpose()
gs1 = GridSpec(2,2)
gs1.update(left=0.05, right = 0.80, hspace = 0.05)
gs2 = GridSpec(3,1)
gs2.update(left=0.85, right = 0.98, hspace = 0.05)
figure = plt.figure()
plt.clf()
ax1 = plt.subplot(gs1[0,:])
ax2 = plt.subplot(gs1[1,0])
ax3 = plt.subplot(gs1[1,1], sharey = ax2)
ax4 = plt.subplot(gs2[:,0])
ax1.set_ylim(0,25)
label = ['s0', 's1', 's2', 's3', 's4']
ax4.plot(df1[1], df2[:])
ax4.legend(labels = label)
def make_frame(i):
    ct = sett[i]
    ax2.plot(df1[1], df1[ct])
    ax3.plot(df1[1], df2[ct])
    ax3.legend(labels=label)
ani = anim.FuncAnimation(figure, make_frame, frames=len(sett),
                         interval=500, repeat=False)
plt.show()
How can I remove the data series and keep the legend in gs2/ax4?
Never mind that I plot the first data series twice in ax2 and ax3; that is fine in my original script. However, if someone can explain why it happens, I would appreciate it.

I'm not entirely sure what the desired output should be. Are you trying to put the legend where ax4 is right now, without showing the plot in ax4?
My solution would be to not create ax4 at all. Instead, you can use bbox_to_anchor to position the legend. Here I use the transform from ax1 to specify the location relative to ax1, and I move the legend slightly past the right edge, level with the top of ax1.
See the "legend guide" for more information.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as anim
from matplotlib.gridspec import GridSpec
data = np.array([[1], [2], [3], [4]])
sett = np.array([1, 2, 3, 4])
data1 = np.hstack((data, data * 2, data * 3, data * 4))
data2 = np.hstack((3 * data, 3 * data / 2, 3 * data / 3, 3 * data / 4))
df1 = pd.DataFrame(data=np.array(data1), index=[1, 2, 3, 4], columns=sett).transpose()
df2 = pd.DataFrame(data=np.array(data2), index=[1, 2, 3, 4], columns=sett).transpose()
gs1 = GridSpec(2, 2)
gs1.update(left=0.05, right=0.80, hspace=0.05)
figure = plt.figure()
plt.clf()
ax1 = plt.subplot(gs1[0, :])
ax2 = plt.subplot(gs1[1, 0])
ax3 = plt.subplot(gs1[1, 1], sharey=ax2)
label = ['s0', 's1', 's2', 's3', 's4']
def make_frame(i):
    ct = sett[i]
    ax2.plot(df1[1], df1[ct])
    ax3.plot(df1[1], df2[ct])
    ax3.legend(labels=label, loc='upper left', bbox_to_anchor=(1.05, 1.), bbox_transform=ax1.transAxes)
ani = anim.FuncAnimation(figure, make_frame, frames=len(sett),
                         interval=500, repeat=False)
plt.show()
EDIT: using a proxy artist to create all the legends before the animation starts
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as anim
from matplotlib.gridspec import GridSpec
data = np.array([[1], [2], [3], [4]])
sett = np.array([1, 2, 3, 4])
data1 = np.hstack((data, data * 2, data * 3, data * 4))
data2 = np.hstack((3 * data, 3 * data / 2, 3 * data / 3, 3 * data / 4))
df1 = pd.DataFrame(data=np.array(data1), index=[1, 2, 3, 4], columns=sett).transpose()
df2 = pd.DataFrame(data=np.array(data2), index=[1, 2, 3, 4], columns=sett).transpose()
gs1 = GridSpec(2, 2)
gs1.update(left=0.05, right=0.80, hspace=0.05)
figure = plt.figure()
plt.clf()
ax1 = plt.subplot(gs1[0, :])
ax2 = plt.subplot(gs1[1, 0])
ax3 = plt.subplot(gs1[1, 1], sharey=ax2)
ax1.set_ylim(0, 25)
labels = ['s0', 's1', 's2', 's3', 's4']
colors = ['C0', 'C1', 'C2', 'C3', 'C4']
proxies = [plt.plot([], [], c=c)[0] for c in colors]
ax1.legend(proxies, labels, bbox_to_anchor=(1., 1.), loc="upper left")
def init_frame():
    pass
def make_frame(i):
    ct = sett[i]
    ax2.plot(df1[1], df1[ct], c=colors[i], label=labels[i])
    ax3.plot(df1[1], df2[ct], c=colors[i], label=labels[i])
    ax3.legend()
ani = anim.FuncAnimation(figure, make_frame, init_func=init_frame, frames=len(sett),
                         interval=500, repeat=False)
plt.show()

I would create the line plots prior to animating anything. You can initialize them with empty lists and then set the data one by one.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as anim
from matplotlib.gridspec import GridSpec
data = np.array([1,2,3,4,5])
data1 = np.vstack((data,data*2, data*3, data*4))
data2 = np.vstack((3*data, 3*data/2, 3*data/3, 3*data/4))
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
sett = np.arange(len(df1.columns))
gs1 = GridSpec(2,2)
gs1.update(left=0.05, right = 0.80, hspace = 0.05)
figure = plt.figure()
ax1 = plt.subplot(gs1[0,:])
ax2 = plt.subplot(gs1[1,0])
ax3 = plt.subplot(gs1[1,1], sharey = ax2, sharex= ax2)
ax2.set_ylim(0,25)
lines1 = ax2.plot(*[[] for _ in range(len(sett)*2)])
lines2 = ax3.plot(*[[] for _ in range(len(sett)*2)])
label = ['s0', 's1', 's2', 's3', 's4']
ax1.legend(handles = lines1, labels=label, bbox_to_anchor=(1.05,1), loc="upper left")
def init():
    for line in lines1 + lines2:
        line.set_data([], [])
def make_frame(i):
    ct = sett[i]
    lines1[i].set_data(df1.index, df1[ct])
    lines2[i].set_data(df1.index, df2[ct])
    ax2.relim()
    ax2.autoscale_view()
ani = anim.FuncAnimation(figure, make_frame, init_func=init, frames=len(sett),
                         interval=500, repeat=False)
ani.save("anigif.gif", writer="imagemagick")
plt.show()
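If the legend should live in its own grid cell, as the question describes, another option is to keep the extra axes but switch off its frame and ticks with `axis('off')` and fill it with proxy handles that carry no data. This is a minimal sketch under those assumptions; the GridSpec fractions, labels, and colors are borrowed from the code above, and `legend_ax` is an illustrative name.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from matplotlib.lines import Line2D

labels = ['s0', 's1', 's2', 's3', 's4']
colors = ['C0', 'C1', 'C2', 'C3', 'C4']

fig = plt.figure()
# Left grid for the plots, right grid reserved for the legend.
gs_plots = GridSpec(2, 2, left=0.05, right=0.80, hspace=0.05)
gs_legend = GridSpec(1, 1, left=0.85, right=0.98)

ax1 = fig.add_subplot(gs_plots[0, :])
ax2 = fig.add_subplot(gs_plots[1, 0])
ax3 = fig.add_subplot(gs_plots[1, 1], sharey=ax2)

# Legend-only axes: hide the frame, ticks, and background, keep the legend.
legend_ax = fig.add_subplot(gs_legend[0, 0])
legend_ax.axis('off')

# Proxy handles have no data, so nothing is drawn except the legend entries.
handles = [Line2D([], [], color=c) for c in colors]
legend = legend_ax.legend(handles, labels, loc='upper left', borderaxespad=0)

# plt.show()  # uncomment to display the figure
```

Because the legend is built once from the proxies, the animation callback only needs to update ax2 and ax3 and can use the same colors for the matching series.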

How to plot vertical lines in plotly offline?

How would one plot a vertical line in plotly offline, using python? I want to add lines at x=20, x=40, and x=60, all in the same plot.
def graph_contracts(self):
    trace1 = go.Scatter(
        x=np.array(range(len(all_prices))),
        y=np.array(all_prices), mode='markers', marker=dict(size=10, color='rgba(152, 0, 0, .8)'))
    data = [trace1]
    layout = go.Layout(title='Market Contracts by Period',
                       xaxis=dict(title='Contract #',
                                  titlefont=dict(family='Courier New, monospace', size=18, color='#7f7f7f')),
                       yaxis=dict(title='Prices ($)',
                                  titlefont=dict(family='Courier New, monospace', size=18, color='#7f7f7f')))
    fig = go.Figure(data=data, layout=layout)
    py.offline.plot(fig)

You can add lines via shape in layout, e.g.
import plotly
plotly.offline.init_notebook_mode()
import random
x=[i for i in range(100)]
trace = plotly.graph_objs.Scatter(x=x,
                                  y=[random.random() for _ in x],
                                  mode='markers')
shapes = list()
for i in (20, 40, 60):
    shapes.append({'type': 'line',
                   'xref': 'x',
                   'yref': 'y',
                   'x0': i,
                   'y0': 0,
                   'x1': i,
                   'y1': 1})
layout = plotly.graph_objs.Layout(shapes=shapes)
fig = plotly.graph_objs.Figure(data=[trace],
                               layout=layout)
plotly.offline.plot(fig)
This draws vertical lines at x = 20, 40, and 60 over the scatter plot.
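The shapes in this answer run from y=0 to y=1 in data coordinates, which fits random values between 0 and 1. For data with another range, one option is to set `yref` to `'paper'`, so that y0 and y1 are fractions of the plot height. The helper below only builds the shape dictionaries (its name and defaults are illustrative), so it runs without plotly installed.

```python
def vertical_line_shapes(x_positions, color='black', width=1):
    """Build plotly layout shapes for full-height vertical lines."""
    return [
        {
            'type': 'line',
            'xref': 'x',      # x is given in data coordinates
            'yref': 'paper',  # y is a fraction of the plotting area
            'x0': x, 'x1': x,
            'y0': 0, 'y1': 1,
            'line': {'color': color, 'width': width},
        }
        for x in x_positions
    ]

shapes = vertical_line_shapes((20, 40, 60))
print(len(shapes))      # 3
print(shapes[0]['x0'])  # 20
```

The resulting list would be passed to the layout in the same way as before, for example `plotly.graph_objs.Layout(shapes=shapes)`.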

This is my example. The key instruction is:
fig.add_trace(go.Scatter(x=[12, 12], y=[-300,300], mode="lines", name="SIGNAL"))
The most important attribute is mode="lines".
Strictly speaking, this draws a segment at x=12 from y=-300 to y=300 rather than an unbounded line.
Example:
import pandas as pd
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import plotly.tools as tls
df1 = pd.read_csv('./jnjw_f8.csv')
layout = go.Layout(
    xaxis=go.layout.XAxis(
        tickmode='linear',
        tick0=1,
        dtick=3
    ),
    yaxis=go.layout.YAxis(
        tickmode='linear',
        tick0=-100,
        dtick=3
    ))
fig = go.Figure(layout=layout)
fig.add_trace(go.Scatter(x=df1['x'], y=df1['y1'], name='JNJW_sqrt'))
fig.add_trace(go.Scatter(x=[12, 12], y=[-300, 300],
                         mode="lines", name="SIGNAL"))
fig.show()
See also: how to plot a vertical line with plotly

A feature for vertical and horizontal lines is implemented with Plotly.py 4.12 (released 11/20). It works for plotly express and graph objects. See here: https://community.plotly.com/t/announcing-plotly-py-4-12-horizontal-and-vertical-lines-and-rectangles/46783
Simple example:
import plotly.express as px
df = px.data.stocks(indexed=True)
fig = px.line(df)
fig.add_vline(x='2018-09-24')
fig.show()

fig.add_vline(x=2.5, line_width=3, line_dash="dash", line_color="green")
