I want to use mplcursors to show both the x and y values together with a label in the same tooltip.
I currently use two cursors, but their annotation boxes overlap.
Here is my code example:
from matplotlib import pyplot as plt
import mplcursors
from pandas import DataFrame
df = DataFrame(
    [("Alice", 163, 54),
     ("Bob", 174, 67),
     ("Charlie", 177, 73),
     ("Diane", 168, 57)],
    columns=["name", "height", "weight"])
scatter1 = df.plot.scatter("height", "weight")
c1 = mplcursors.cursor(scatter1)
mplcursors.cursor().connect(
    "add", lambda sel: sel.annotation.set_text(df["name"][sel.target.index]))
plt.show()
You can drop the first cursor (c1) and put all of the information into a single cursor, like so:
from matplotlib import pyplot as plt
import mplcursors
from pandas import DataFrame
df = DataFrame(
    [("Alice", 163, 54),
     ("Bob", 174, 67),
     ("Charlie", 177, 73),
     ("Diane", 168, 57)],
    columns=["name", "height", "weight"])
scatter1 = df.plot.scatter("height", "weight")
# sel.index replaces the deprecated sel.target.index (mplcursors >= 0.5.1)
mplcursors.cursor(scatter1, hover=True).connect(
    "add",
    lambda sel: sel.annotation.set_text(
        f'{df["name"][sel.index]}\n'
        f'Height: {df["height"][sel.index] / 100} m\n'
        f'Weight: {df["weight"][sel.index]} kg'))
plt.show()
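As a minimal sketch of the same idea, the tooltip text can be built by a small helper, which makes the formatting easy to check without opening a plot window. The name tooltip_text is hypothetical (it is not part of mplcursors), and the two-decimal height format is an assumption made for illustration.

```python
from pandas import DataFrame

df = DataFrame(
    [("Alice", 163, 54),
     ("Bob", 174, 67),
     ("Charlie", 177, 73),
     ("Diane", 168, 57)],
    columns=["name", "height", "weight"])


def tooltip_text(frame, position):
    """Build the multi-line tooltip for the row at an integer position.

    Hypothetical helper: not part of mplcursors.
    """
    row = frame.iloc[position]
    return (f"{row['name']}\n"
            f"Height: {row['height'] / 100:.2f} m\n"
            f"Weight: {row['weight']} kg")


print(tooltip_text(df, 1))
```

The helper could then be passed to the cursor callback, for example by calling sel.annotation.set_text(tooltip_text(df, sel.index)) inside the "add" handler.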
Below is code for a scatter plot annotated with mplcursors; it plots two columns and labels the points by a third column.
How can values from two columns of a single dataframe be selected for the annotation text in a single text box?
Instead of only "name" in the annotation text box, I would like both "height" and "name" to appear. Using df[['height', 'name']] does not work.
How can this be achieved otherwise?
import matplotlib.pyplot as plt
import mplcursors
import pandas as pd

df = pd.DataFrame(
    [("Alice", 163, 54),
     ("Bob", 174, 67),
     ("Charlie", 177, 73),
     ("Diane", 168, 57)],
    columns=["name", "height", "weight"])
df.plot.scatter("height", "weight")
mplcursors.cursor(multiple=True).connect(
    "add", lambda sel: sel.annotation.set_text(df["name"][sel.target.index]))
plt.show()
Use df.loc[sel.target.index, ["name", 'height']].to_string(): select the row and the columns with .loc, then build a string with .to_string().
Tested in python 3.8, matplotlib 3.4.2, pandas 1.3.1, and jupyterlab 3.1.4
As of mplcursors v0.5.1, Selection.target.index is deprecated; use Selection.index instead.
For example, use df.iloc[x.index, :] instead of df.iloc[x.target.index, :].
from mplcursors import cursor
import matplotlib.pyplot as plt
import pandas as pd
# for interactive plots in Jupyter Lab, use the following magic command, otherwise comment it out
%matplotlib qt
df = pd.DataFrame([('Alice', 163, 54), ('Bob', 174, 67), ('Charlie', 177, 73), ('Diane', 168, 57)], columns=["name", "height", "weight"])
ax = df.plot(kind='scatter', x="height", y="weight", c='tab:blue')
cr = cursor(ax, hover=True, multiple=True)
cr.connect("add", lambda sel: sel.annotation.set_text((df.loc[sel.index, ["name", 'height']].to_string())))
plt.show()
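To see what text this produces without opening an interactive window, here is a small sketch that applies the same .loc selection to one row by hand. The value 1 stands in for the sel.index that mplcursors would supply; that substitution is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [("Alice", 163, 54), ("Bob", 174, 67),
     ("Charlie", 177, 73), ("Diane", 168, 57)],
    columns=["name", "height", "weight"])

# Stand-in for sel.index, the integer position of the hovered point.
position = 1

# .loc selects one row and two columns, giving a Series; to_string()
# renders it as one "label value" line per selected column.
text = df.loc[position, ["name", "height"]].to_string()
print(text)
```

Because to_string() keeps the column labels, the annotation shows both the label and the value for each selected column.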
I want to connect two points in a data frame plot with another line and add it to the plot:
import numpy as np
from numpy.random import randn
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
days = [datetime(2016, 1, 1), datetime(2016, 1, 2),datetime(2016, 1, 3),datetime(2016, 1, 4)]
dt_ind = pd.DatetimeIndex(days)
data = np.random.randn(4,2)
cols = ['A','B']
df = pd.DataFrame(data,dt_ind,cols)
df['A'].plot(figsize=(12, 4))
The original post included images of the data frame and of the resulting plot.
How can I do this? For example, how do I add a line from point 2 to point 4 (or between any two points)?
Use matplotlib's plt.subplots() function to get a fig and an ax object; you can then add separate lines to your ax.
import numpy as np
from numpy.random import randn
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
days = [datetime(2016, 1, 1),
datetime(2016, 1, 2),
datetime(2016, 1, 3),
datetime(2016, 1, 4)]
dt_ind = pd.DatetimeIndex(days)
data = np.random.randn(4,2)
cols = ['A','B']
df = pd.DataFrame(data,dt_ind,cols)
fig, ax = plt.subplots()
ax.plot(df['A'], color='red')
# connect the 2nd and 4th points; .iloc selects by position
ax.plot([df.index[1], df.index[3]],
        [df['A'].iloc[1], df['A'].iloc[3]], color='blue')
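The same approach can be wrapped in a small function so that any two points can be connected. This is only a sketch: connect_points is a hypothetical helper name, and fixed values replace the random data so the result is reproducible.

```python
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd

days = pd.DatetimeIndex([datetime(2016, 1, d) for d in range(1, 5)])
# Fixed values instead of random ones so the result is reproducible.
series = pd.Series([0.5, -1.2, 0.3, 1.1], index=days, name="A")


def connect_points(ax, s, i, j, **line_kwargs):
    """Draw a straight segment between positions i and j of series s."""
    x = [s.index[i], s.index[j]]
    y = [s.iloc[i], s.iloc[j]]
    (line,) = ax.plot(x, y, **line_kwargs)
    return line


fig, ax = plt.subplots()
ax.plot(series, color="red")
# Positions are zero-based, so 1 and 3 are the 2nd and 4th points.
segment = connect_points(ax, series, 1, 3, color="blue")
```

Call plt.show() afterwards (or rely on %matplotlib inline in a notebook) to display the figure.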
I am trying to read the LSUN dataset to train a deep learning network, but an error shows up when I run the following code.
import torch
import torchvision
import torchvision.transforms as transforms
import time
from torch import nn, optim
import numpy as np
import matplotlib.pyplot as plt
root='d:/datasets/'
lsun_dataset = torchvision.datasets.LSUN(root=root, classes=['restaurant_train'], transform=transforms.Compose([
    transforms.Resize(96),
    transforms.RandomCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
]))
print(lsun_dataset)
The error message is below.
Traceback (most recent call last):
File "D:\coding\paper\lsgan\dataloadingtest.py", line 13, in <module>
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
File "d:\anaconda\lib\site-packages\torchvision\datasets\lsun.py", line 83, in __init__
transform=transform))
File "d:\anaconda\lib\site-packages\torchvision\datasets\lsun.py", line 25, in __init__
readahead=False, meminit=False)
lmdb.Error: d:/datasets//restaurant_train_lmdb: The system cannot find the path specified.
Can anyone tell me how to fix it? Thanks.
Platform: Windows 10, Python version: 3.7.3
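The traceback shows lmdb failing to open d:/datasets//restaurant_train_lmdb, so one thing worth ruling out is that this directory is missing. The sketch below is a hedged diagnostic, not a confirmed fix: the helper name expected_lmdb_dir is hypothetical, and the "<root>/<class>_lmdb" layout is inferred only from the path in the error message.

```python
import os


def expected_lmdb_dir(root, class_name):
    """Return the directory the traceback suggests LSUN tries to open.

    Hypothetical helper; the '<root>/<class>_lmdb' layout is inferred
    from the path shown in the error message.
    """
    return os.path.join(root, class_name + "_lmdb")


path = expected_lmdb_dir("d:/datasets/", "restaurant_train")
if os.path.isdir(path):
    print("found", path)
else:
    print("missing", path, "- check that the dataset was extracted here")
```

If the directory is reported as missing, the dataset may not have been downloaded or extracted under the given root.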
I have a dataset, Data.csv:
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
I tried to fill the NaN values with sklearn.impute.SimpleImputer using the following code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = 'NaN', strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
But I get an error which says:
File "C:\Users\Krishna Rohith\Machine Learning A-Z\Part 1 - Data Preprocessing\Section 2 ----------- --------- Part 1 - Data Preprocessing --------------------\missing_data.py", line 16, in <module>
imputer = imputer.fit(X[:, 1:3])
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\impute\_base.py", line 268, in fit
X = self._validate_input(X)
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\impute\_base.py", line 242, in _validate_input
raise ve
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\impute\_base.py", line 235, in _validate_input
force_all_finite=force_all_finite, copy=self.copy)
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 562, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I know how to do it with numpy, but can someone please tell me how to do it using sklearn.impute?
Replace the string 'NaN' with NumPy's NaN value, np.nan:
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
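A minimal, self-contained sketch of that fix follows. The small array is illustrative data (an assumption, not the full Data.csv), chosen so the column means are easy to verify by hand.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Small illustrative array (age, salary) with one gap per column.
X = np.array([[44.0, 72000.0],
              [27.0, 48000.0],
              [40.0, np.nan],
              [np.nan, 54000.0]])

# np.nan (not the string 'NaN') marks the entries to fill.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Each gap is filled with the mean of the observed values in its own column, so the missing age becomes 37 and the missing salary becomes 58000.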
Given a pandas dataframe, I am trying to translate it into a table by using this code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = {"Name": ["John", "Leonardo", "Chris", "Linda"],
"Location" : ["New York", "Florence", "Athens", "London"],
"Age" : [41, 33, 53, 22],
"Km": [1023,2312,1852,1345]}
df = pd.DataFrame(data)
fig, ax = plt.subplots()
ax.axis('off')
ax.set_title("Table", fontsize=16, weight='bold')
table = ax.table(cellText=df.values,
bbox=[0, 0, 1.5, 1],
cellLoc='center',
colLabels=df.columns)
And it works. However, I cannot figure out how to set the format for numbers as {:,.2f}, that is, with commas as thousands separators and two decimals.
Any suggestions?
Insert the following two lines of code after df is created and the rest of your code works as desired.
The Age and Km columns are defined as type int; convert these to float before using your str.format:
df.update(df[['Age', 'Km']].astype(float))
Now use DataFrame.applymap(str.format) on these two columns:
df.update(df[['Age', 'Km']].applymap('{:,.2f}'.format))
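As a hedged alternative sketch, the formatted strings can be built in a separate copy so that df keeps its numeric dtypes. Series.map is used here in place of applymap, which newer pandas releases deprecate in favor of DataFrame.map; the name cell_text is chosen only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {"Name": ["John", "Leonardo", "Chris", "Linda"],
        "Location": ["New York", "Florence", "Athens", "London"],
        "Age": [41, 33, 53, 22],
        "Km": [1023, 2312, 1852, 1345]}
df = pd.DataFrame(data)

# Format a copy so the numeric columns in df stay numeric.
cell_text = df.copy()
for col in ["Age", "Km"]:
    cell_text[col] = df[col].astype(float).map("{:,.2f}".format)

fig, ax = plt.subplots()
ax.axis("off")
ax.set_title("Table", fontsize=16, weight="bold")
table = ax.table(cellText=cell_text.values,
                 bbox=[0, 0, 1.5, 1],
                 cellLoc="center",
                 colLabels=cell_text.columns)
```

Keeping df numeric means later calculations on Age and Km still work, while the table displays the formatted strings.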