How to annotate with multiple columns using mplcursors - python-3.x

Below is code for a scatter plot annotated using mplcursors. It plots two columns and labels the points by a third column.
How can values from two columns of a single dataframe be shown as annotation text in a single text box?
Instead of only "name" in the annotation text box, I would like both "height" and "name" to appear. Using df[['height', 'name']] does not work.
How can this be achieved otherwise?
import matplotlib.pyplot as plt
import mplcursors
import pandas as pd
df = pd.DataFrame(
    [("Alice", 163, 54),
     ("Bob", 174, 67),
     ("Charlie", 177, 73),
     ("Diane", 168, 57)],
    columns=["name", "height", "weight"])
df.plot.scatter("height", "weight")
mplcursors.cursor(multiple = True).connect("add", lambda sel: sel.annotation.set_text((df["name"])[sel.target.index]))
plt.show()

Use df.loc[sel.target.index, ["name", 'height']].to_string(): .loc selects the row and the two columns, and .to_string() turns the result into a string.
Tested in python 3.8, matplotlib 3.4.2, pandas 1.3.1, and jupyterlab 3.1.4
In mplcursors v0.5.1, Selection.target.index is deprecated; use Selection.index instead.
That is, use df.iloc[x.index, :] instead of df.iloc[x.target.index, :]
from mplcursors import cursor
import matplotlib.pyplot as plt
import pandas as pd
# for interactive plots in Jupyter Lab, use the following magic command, otherwise comment it out
%matplotlib qt
df = pd.DataFrame([('Alice', 163, 54), ('Bob', 174, 67), ('Charlie', 177, 73), ('Diane', 168, 57)], columns=["name", "height", "weight"])
ax = df.plot(kind='scatter', x="height", y="weight", c='tab:blue')
cr = cursor(ax, hover=True, multiple=True)
cr.connect("add", lambda sel: sel.annotation.set_text((df.loc[sel.index, ["name", 'height']].to_string())))
plt.show()
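If the default .to_string() layout is not what you want, the annotation text can also be built by hand. The following is only a sketch: annotation_text is a hypothetical helper name, not part of mplcursors or pandas, and it assumes the dataframe keeps its default integer index so that the position reported by the cursor is also a valid row label.

```python
import pandas as pd

df = pd.DataFrame(
    [("Alice", 163, 54), ("Bob", 174, 67), ("Charlie", 177, 73), ("Diane", 168, 57)],
    columns=["name", "height", "weight"],
)


def annotation_text(frame, row, columns):
    """Return one 'column: value' line per requested column for the given row label.

    Hypothetical helper for illustration; assumes `row` is a valid label for frame.loc.
    """
    return "\n".join(f"{col}: {frame.loc[row, col]}" for col in columns)


# Row 1 is Bob in the sample data above.
text = annotation_text(df, 1, ["name", "height"])
print(text)

# With a cursor `cr` created as in the answer, the helper could be wired in like this
# (not executed here, since it needs an interactive figure):
# cr.connect("add", lambda sel: sel.annotation.set_text(
#     annotation_text(df, sel.index, ["name", "height"])))
```

Keeping the string building in a plain function makes it easy to check the text without opening a plot window.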

Related

Pandas Series of dates to vlines kwarg in mplfinance plot

import numpy as np
import pandas as pd
df = pd.DataFrame({'dt': ['2021-2-13', '2022-2-15'],
'w': [5, 7],
'n': [11, 8]})
df.reset_index()
print(list(df.loc[:,'dt'].values))
gives: ['2021-2-13', '2022-2-15']
NEEDED: [('2021-2-13'), ('2022-2-15')]
Important (in reply to a question in the comments): "NEEDED" is the form in which mplfinance accepts the vlines argument for plot (checked). I need to draw vertical lines at the specified dates on the x-axis of the chart.
import mplfinance as mpf
RES['Date'] = RES['Date'].dt.strftime('%Y-%m-%d')
my_vlines=RES.loc[:,'Date'].values # does NOT work
fig, axlist = mpf.plot( ohlc_df, type="candle", vlines= my_vlines, xrotation=30, returnfig=True, figsize=(6,4))
It only works with an explicit my_vlines= [('2022-01-18'), ('2022-02-25')]
SOLVED: Oh, it really is that simple after all:
my_vlines=list(RES.loc[:,'Date'].values)
Your question asks for a list of Numpy arrays but your desired output looks like Tuples. If you need Tuples, note that it's the comma that makes the tuple not the parentheses, so you'd do something like this:
desired_format = [(x,) for x in list(df.loc[:,'dt'].values)]
If you want numpy arrays, you could do this
desired_format = [np.array(x) for x in list(df.loc[:,'dt'].values)]
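To make the point about the comma concrete, here is a small sketch using the dataframe from the question. It shows that ('2021-2-13') is just a parenthesized string, so the "NEEDED" list is the same object as a plain list of strings, while a trailing comma produces real one-element tuples. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"dt": ["2021-2-13", "2022-2-15"], "w": [5, 7], "n": [11, 8]})

# A plain Python list of strings taken from the column.
dates = df["dt"].tolist()

# Parentheses without a comma do not create a tuple.
needed = [("2021-2-13"), ("2022-2-15")]
same_as_needed = dates == needed

# A trailing comma does create a (one-element) tuple.
one_tuples = [(d,) for d in dates]

print(type(needed[0]).__name__, type(one_tuples[0]).__name__)
```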
I think I understand your problem. Please see the example code below and let me know if this resolves your problem. I expanded on your dataframe to meet mplfinance plot criteria.
import pandas as pd
import numpy as np
import mplfinance as mpf
df = pd.DataFrame({'dt': ['2021-2-13', '2022-2-15'],'Open': [5,7],'Close': [11, 8],'High': [21,30],'Low': [7, 3]})
df['dt']=pd.to_datetime(df['dt'])
df.set_index('dt', inplace = True)
mpf.plot(df, vlines = dict(vlines = df.index.tolist()))

Converting string to date in numpy unpack

I'm learning how to extract data from links and then proceeding to graph them.
For this tutorial, I was using the yahoo dataset of a stock.
The code is as follows
import matplotlib.pyplot as plt
import numpy as np
import urllib.request
import matplotlib.dates as mdates
import datetime
def bytespdate2num(fmt, encoding='utf-8'):
    strconverter = mdates.strpdate2num(fmt)
    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter
def graph_data(stock):
    stock_price_url = 'https://pythonprogramming.net/yahoo_finance_replacement'
    source_code = urllib.request.urlopen(stock_price_url).read().decode()
    stock_data = []
    split_source=source_code.split('\n')
    print(len(split_source))
    for line in split_source:
        split_line=line.split(',')
        if (len(split_line)==7):
            stock_data.append(line)
    date,openn,closep,highp,lowp,openp,volume=np.loadtxt(stock_data,delimiter=',',unpack=True,converters={0:bytespdate2num('%Y-%m-%d')})
    plt.plot_date(date,closep)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('Graph')
    plt.show()
graph_data('TSLA')
The whole code is pretty easy to understand except the part that converts the string datatype into date format using the bytespdate2num function.
Is there an easier way to convert strings read from a URL into date format during numpy extraction, or is there another method I can use?
Thank you
With a guess as to the csv format, I can use the numpy 'native' datetime dtype:
In [183]: txt = ['2020-10-23 1 2.3']*3
In [184]: txt
Out[184]: ['2020-10-23 1 2.3', '2020-10-23 1 2.3', '2020-10-23 1 2.3']
If I let genfromtxt do its own dtype conversions:
In [187]: np.genfromtxt(txt, dtype=None, encoding=None)
Out[187]:
array([('2020-10-23', 1, 2.3), ('2020-10-23', 1, 2.3),
('2020-10-23', 1, 2.3)],
dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])
the date column is rendered as a string.
If I specify a datetime64 format:
In [188]: np.array('2020-10-23', dtype='datetime64[D]')
Out[188]: array('2020-10-23', dtype='datetime64[D]')
In [189]: np.genfromtxt(txt, dtype=['datetime64[D]',int,float], encoding=None)
Out[189]:
array([('2020-10-23', 1, 2.3), ('2020-10-23', 1, 2.3),
('2020-10-23', 1, 2.3)],
dtype=[('f0', '<M8[D]'), ('f1', '<i8'), ('f2', '<f8')])
This date column appears to work in plt:
In [190]: plt.plot_date(_['f0'], _['f1'])
I used genfromtxt because I'm more familiar with its ability to handle dtypes.
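As a further alternative, the date strings can be converted after splitting the lines by hand, using the same datetime64[D] dtype shown above. This is a minimal sketch: the three rows are made-up sample data standing in for the downloaded CSV text, and the two-price column layout is an assumption, not the real file format.

```python
import numpy as np

# Hypothetical sample rows: date, open, close (stand-ins for the real CSV lines).
rows = [
    "2020-10-21,10.0,10.5",
    "2020-10-22,10.5,11.0",
    "2020-10-23,11.0,10.8",
]

fields = [row.split(",") for row in rows]

# ISO-formatted date strings convert directly to numpy's native date type.
dates = np.array([f[0] for f in fields], dtype="datetime64[D]")
closes = np.array([float(f[2]) for f in fields])

# Date arithmetic works on the converted column.
span_days = (dates[-1] - dates[0]) / np.timedelta64(1, "D")
print(dates.dtype, span_days)
```

No custom converter function is needed here because the strings are already in ISO year-month-day form.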

Python add two cursors with mplcursors

I want to use mplcursors to show both the x and y values together with label information in the same tooltip.
I actually used two cursors, but their annotation boxes overlap.
Here is my code example:
from matplotlib import pyplot as plt
import mplcursors
from pandas import DataFrame
df = DataFrame(
[("Alice", 163, 54),
("Bob", 174, 67),
("Charlie", 177, 73),
("Diane", 168, 57)],
columns=["name", "height", "weight"])
scatter1=df.plot.scatter("height", "weight")
c1=mplcursors.cursor(scatter1)
mplcursors.cursor().connect(
"add", lambda sel: sel.annotation.set_text(df["name"][sel.target.index]))
plt.show()
You could leave out the first cursor (c1) and add all information to the other cursor. Like so:
from matplotlib import pyplot as plt
import mplcursors
from pandas import DataFrame
df = DataFrame(
[("Alice", 163, 54),
("Bob", 174, 67),
("Charlie", 177, 73),
("Diane", 168, 57)],
columns=["name", "height", "weight"])
scatter1 = df.plot.scatter("height", "weight")
mplcursors.cursor(scatter1, hover=True).connect("add",
    lambda sel: sel.annotation.set_text(
        f'{df["name"][sel.target.index]}\nHeight: {df["height"][sel.target.index] / 100} m\nWeight: {df["weight"][sel.target.index]} kg'))
plt.show()

Float format in matplotlib table

Given a pandas dataframe, I am trying to translate it into a table by using this code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = {"Name": ["John", "Leonardo", "Chris", "Linda"],
"Location" : ["New York", "Florence", "Athens", "London"],
"Age" : [41, 33, 53, 22],
"Km": [1023,2312,1852,1345]}
df = pd.DataFrame(data)
fig, ax = plt.subplots()
ax.axis('off')
ax.set_title("Table", fontsize=16, weight='bold')
table = ax.table(cellText=df.values,
bbox=[0, 0, 1.5, 1],
cellLoc='center',
colLabels=df.columns)
And it works. However, I cannot figure out how to set the number format to {:,.2f}, that is, with commas as thousands separators and two decimals.
Any suggestions?
Insert the following two lines of code after df is created and the rest of your code works as desired.
The Age and Km columns are defined as type int; convert these to float before using your str.format:
df.update(df[['Age', 'Km']].astype(float))
Now use DataFrame.applymap(str.format) on these two columns:
df.update(df[['Age', 'Km']].applymap('{:,.2f}'.format))
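An alternative sketch, which leaves the original dataframe untouched, is to format the numeric columns into a copy and pass that copy's values as cellText. The names formatted and cell_text are illustrative, and the list of numeric columns is written out by hand as an assumption about which columns should be formatted.

```python
import pandas as pd

data = {"Name": ["John", "Leonardo", "Chris", "Linda"],
        "Location": ["New York", "Florence", "Athens", "London"],
        "Age": [41, 33, 53, 22],
        "Km": [1023, 2312, 1852, 1345]}
df = pd.DataFrame(data)

# Work on a copy so df keeps its numeric dtypes.
formatted = df.copy()
for col in ["Age", "Km"]:  # columns assumed to need number formatting
    # Convert to float first, then apply thousands separators and two decimals.
    formatted[col] = df[col].map(lambda v: f"{float(v):,.2f}")

# Nested lists of strings, suitable for ax.table(cellText=cell_text, ...).
cell_text = formatted.values.tolist()
print(cell_text[0])
```

The result can then be passed to the table call from the question as cellText=cell_text, with colLabels=df.columns unchanged.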

Matplotlib Line Graph from Pivot Table: Custom Color of Lines

Given the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig, ax = plt.subplots(1,1)
t.plot(ax=ax)
Are there arguments in t.plot() that would allow me to specify the colors of each line?
Thanks in advance!
You can provide line styles:
t.plot(ax=ax, style=['yellow', 'red'])
You can set the color cycle on the axes. This answer originally used ax.set_color_cycle(['red', 'black']), which has since been removed from Matplotlib; ax.set_prop_cycle is its replacement:
ax.set_prop_cycle(color=['red', 'black'])
Sample:
df = pd.DataFrame(
    {'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
     'Count':[5,6,2,7,4,7,8,9],
     'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2]
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
fig, ax = plt.subplots(1,1)
ax.set_prop_cycle(color=['red', 'black'])
t.plot(ax=ax)
EDIT:
Interestingly, it seems better to use the full color names, as in Mike's answer:
t.plot(ax=ax, style=['yellow', 'red'])
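Another option is the color argument of DataFrame.plot, which takes one color per plotted column. The sketch below makes two assumptions that differ from the question: the pivot passes values='Count' so that the columns are simply the group names, and the Agg backend is selected so the script runs without a display. The palette dict is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"YYYYMM": [201603, 201503, 201403, 201303, 201603, 201503, 201403, 201303],
     "Count": [5, 6, 2, 7, 4, 7, 8, 9],
     "Group": ["A", "A", "A", "A", "B", "B", "B", "B"]})
df["YYYYMM"] = df["YYYYMM"].astype(str).str[:-2]

# One column per group, indexed by year.
t = df.pivot_table(index="YYYYMM", columns="Group", values="Count", aggfunc="sum")

# Explicit group-to-color mapping, turned into a list in column order.
palette = {"A": "red", "B": "black"}
colors = [palette[group] for group in t.columns]

fig, ax = plt.subplots(1, 1)
t.plot(ax=ax, color=colors)

line_colors = [line.get_color() for line in ax.get_lines()]
print(line_colors)
```

Building the color list from a mapping keeps each group tied to its color even if the column order changes.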
