Matplotlib warning using pandas.DataFrame.plot.scatter() - python-3.x

On Windows 10, with these versions:
Python 3.5.2, pandas 0.23.4, matplotlib 3.0.0, numpy 1.15.2,
the following code gives me a warning that I would like to resolve:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
# a 5x4 random pandas DataFrame
pf = pd.DataFrame(np.random.random((5,4)), columns=['a', 'b', 'c', 'd'])
# colors:
colors = cm.rainbow(np.linspace(0, 1, 4))
fig1 = pf.plot.scatter('a', 'b', color='k')
for i, j in enumerate(['b', 'c', 'd']):
    pf.plot.scatter('a', j, color=colors[i+1], ax=fig1)
And I get a warning:
'c' argument looks like a single numeric RGB or RGBA sequence, which
should be avoided as value-mapping will have precedence in case its
length matches with 'x' & 'y'. Please use a 2-D array with a single
row if you really want to specify the same RGB or RGBA value for all
points.
Could you point me to a way of addressing that warning?

I can't reproduce the warning with matplotlib 3.0 and pandas 0.23.4, but what it says is essentially that you should not use a single RGB tuple to specify a color.
So instead of color=colors[i+1] use
color=[colors[i+1]]
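A minimal sketch of that fix applied to the loop from the question, assuming the non-interactive Agg backend so that it runs without a display. Wrapping each RGBA row in a list gives a 2-D sequence with a single row, which is the form the warning asks for. The names ax and single_row are introduced here for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import matplotlib.cm as cm

pf = pd.DataFrame(np.random.random((5, 4)), columns=['a', 'b', 'c', 'd'])
colors = cm.rainbow(np.linspace(0, 1, 4))  # shape (4, 4): one RGBA row per colour

ax = pf.plot.scatter('a', 'b', color='k')
for i, col in enumerate(['b', 'c', 'd']):
    # One RGBA row wrapped in a list: a 2-D sequence with a single row
    single_row = [colors[i + 1]]
    pf.plot.scatter('a', col, color=single_row, ax=ax)
```

Each call now passes one colour for all points explicitly, so the RGBA values should not be mistaken for four values to be colour-mapped.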

Related

Pandas Series of dates to vlines kwarg in mplfinance plot

import numpy as np
import pandas as pd
df = pd.DataFrame({'dt': ['2021-2-13', '2022-2-15'],
'w': [5, 7],
'n': [11, 8]})
df.reset_index()
print(list(df.loc[:,'dt'].values))
gives: ['2021-2-13', '2022-2-15']
NEEDED: [('2021-2-13'), ('2022-2-15')]
Important (in reply to a comment): "NEEDED" is the format in which mplfinance accepts the vlines argument for plot (checked). I need to draw vertical lines at the specified dates on the x-axis of the chart.
import mplfinance as mpf
RES['Date'] = RES['Date'].dt.strftime('%Y-%m-%d')
my_vlines=RES.loc[:,'Date'].values # does not work
fig, axlist = mpf.plot( ohlc_df, type="candle", vlines= my_vlines, xrotation=30, returnfig=True, figsize=(6,4))
This only works with an explicit list such as my_vlines= [('2022-01-18'), ('2022-02-25')]
SOLVED: Oh, it really appears to be so simple after all
my_vlines=list(RES.loc[:,'Date'].values)
Your question asks for a list of Numpy arrays but your desired output looks like Tuples. If you need Tuples, note that it's the comma that makes the tuple not the parentheses, so you'd do something like this:
desired_format = [(x,) for x in list(df.loc[:,'dt'].values)]
If you want numpy arrays, you could do this
desired_format = [np.array(x) for x in list(df.loc[:,'dt'].values)]
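To make the point about the comma concrete, here is a small sketch in plain Python. The variable names are illustrative; the values are the two date strings from the question.

```python
values = ['2021-2-13', '2022-2-15']

# Parentheses alone only group an expression, so each element stays a str
parenthesized = [(x) for x in values]

# The trailing comma is what builds a 1-tuple
one_tuples = [(x,) for x in values]

print(parenthesized == values)  # True
print(type(one_tuples[0]))      # <class 'tuple'>
```

This is why the "NEEDED" form written in the question, [('2021-2-13'), ('2022-2-15')], is just an ordinary list of strings.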
I think I understand your problem. Please see the example code below and let me know if this resolves your problem. I expanded on your dataframe to meet mplfinance plot criteria.
import pandas as pd
import numpy as np
import mplfinance as mpf
df = pd.DataFrame({'dt': ['2021-2-13', '2022-2-15'],'Open': [5,7],'Close': [11, 8],'High': [21,30],'Low': [7, 3]})
df['dt']=pd.to_datetime(df['dt'])
df.set_index('dt', inplace = True)
mpf.plot(df, vlines = dict(vlines = df.index.tolist()))
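For the conversion step on its own, the following is a sketch that needs only pandas. RES here is a hypothetical stand-in for the frame in the question, with two made-up rows. Series.tolist() returns a plain Python list instead of a NumPy array, which matches the list(...) fix reported in the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's RES frame
RES = pd.DataFrame({'Date': pd.to_datetime(['2022-01-18', '2022-02-25'])})

# Format the dates as strings, then convert the Series to a plain list
my_vlines = RES['Date'].dt.strftime('%Y-%m-%d').tolist()

print(my_vlines)  # ['2022-01-18', '2022-02-25']
```

The resulting list can then be passed as the vlines argument in the same way as the hand-written list in the question.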

Matplotlib: applying cellColours to only certain columns/cells

Got myself in a pickle.
I'm creating a basic table in Matplotlib (via Pandas, but that's not the issue). What I'm trying to accomplish is to create a table where the first column, which will be string values, remains white...but columns 2,3,4,5,6 are floating/integers and will be colored by a custom normalized colormap.
I've started with the basics, and created the 'colored' table via the code below. This only plots the columns with numeric values at this point.
What I ultimately need to do is plot this with an additional column, say before column 'A' or after column 'F', which holds string values, e.g. ['MBIAS', 'RMSE', 'BAGSS', 'MBIAS', 'MBIAS'].
However if I try to apply the cellColours method in the code below to a table that mixes lists of strings and float/integers, it obviously fails.
Is there a method to apply a cellColours scheme to only certain cells, or row/columns? Can I loop through, applying the custom colormap to specific cells?
Any help or tips would be appreciated!
Code:
import numpy as np
import matplotlib
from matplotlib import cm
import matplotlib.pyplot as plt
from pandas import *
#Create sample data in pandas dataframe
idx = Index(np.arange(1,6))
df = DataFrame(abs(2*np.random.randn(5, 5)), index=idx, columns=['A', 'B', 'C', 'D', 'E'])
model = ['conusarw', 'conusarw', 'conusarw', 'nam04', 'emhrrr']
df['Model'] = model
df1 = df[['A','B','C','D','E']]
test = df1.round({'A':2,'B':2,'C':2,'D':2,'E':2})
print(test)
vals = test.values
print(vals)
#Creates normalized list (from 0-1) based a user provided range and center of distribution.
norm = matplotlib.colors.TwoSlopeNorm(vmin=0,vcenter=1,vmax=10)
#Merges colormap to the normalized data based on customized normalization pattern from above.
colours = plt.cm.coolwarm(norm(vals))
#Create figure in Matplotlib in which to plot table.
fig = plt.figure(figsize=(15,8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
#Plot table, using pandas dataframe information and data.
#Customized lists of data and names can also be provided.
# colLabels must match the five numeric columns in vals, so use df1.columns
the_table = plt.table(cellText=vals, rowLabels=model, colLabels=df1.columns,
                      loc='center', cellColours=colours)
plt.savefig('test_table.png')
Instead of the fast vectorized call colours = plt.cm.coolwarm(norm(vals)), you can just use regular Python loops with if-tests. The code below loops through the individual rows, then through the individual elements, and tests whether they are numeric. A similar loop prepares the rounded values. Speed is not really a problem unless you have thousands of elements.
(The code uses import pandas as pd, as from pandas import * isn't recommended.)
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba, TwoSlopeNorm
import pandas as pd
import numpy as np
# Create sample data in pandas dataframe
idx = pd.Index(np.arange(1, 6))
df = pd.DataFrame(abs(2 * np.random.randn(5, 5)), index=idx, columns=['A', 'B', 'C', 'D', 'E'])
df['Model'] = ['conusarw', 'conusarw', 'conusarw', 'nam04', 'emhrrr']
cmap = plt.cm.coolwarm
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)
colours = [['white' if not np.issubdtype(type(val), np.number) else cmap(norm(val)) for val in row]
           for row in df.values]
vals = [[val if not np.issubdtype(type(val), np.number) else np.round(val, 2) for val in row]
        for row in df.values]
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
the_table = plt.table(cellText=vals, rowLabels=df['Model'].to_list(), colLabels=df.columns,
                      loc='center', cellColours=colours)
plt.show()
PS: If speed is a concern, the following code is a bit trickier. It uses:
setting the "bad color" of a colormap
pd.to_numeric(..., errors='coerce') to convert all strings to nans
as pd.to_numeric() only works for 1D arrays, ravel() and reshape() are used
using the same arrays, np.where can do the rounding
cmap = plt.cm.coolwarm.copy()
cmap.set_bad('white')
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)
values = pd.to_numeric(df.values.ravel(), errors='coerce').reshape(df.shape)
colours = cmap(norm(values))
vals = np.where(np.isnan(values), df.values, np.round(values, 2))
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
the_table = plt.table(cellText=vals, rowLabels=df['Model'].to_list(), colLabels=df.columns,
                      loc='center', cellColours=colours)
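The question also asks whether specific cells can be coloured in a loop. One possible sketch is to build the table without cellColours and then set the face colour of selected cells through the table's cell dictionary. The 'Stat' column name, the Agg backend and the helper names below are assumptions made for illustration; the string values are the ones given in the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm
import numpy as np
import pandas as pd

df = pd.DataFrame(abs(2 * np.random.randn(5, 5)), columns=list('ABCDE'))
df.insert(0, 'Stat', ['MBIAS', 'RMSE', 'BAGSS', 'MBIAS', 'MBIAS'])  # hypothetical column name

cmap = plt.cm.coolwarm
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)

fig, ax = plt.subplots(figsize=(15, 8))
ax.axis('off')
# Strings pass through unchanged; numbers are formatted to two decimals
cell_text = [[v if isinstance(v, str) else f'{v:.2f}' for v in row] for row in df.values]
the_table = ax.table(cellText=cell_text, colLabels=list(df.columns), loc='center')

cells = the_table.get_celld()  # dict keyed by (row, column)
numeric_cols = [df.columns.get_loc(name) for name in 'ABCDE']
for r in range(len(df)):
    for c in numeric_cols:
        # Row 0 holds the column labels, so data rows start at 1
        cells[(r + 1, c)].set_facecolor(cmap(norm(df.iat[r, c])))
```

Cells that the loop does not touch, such as the string column, keep the default white background.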

Changing data types after interpolating

from tnorma import tnorma
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df6 = pd.read_csv('Static06_new.csv')
Z =tnorma(df6)
df = pd.DataFrame(Z)
print(df)
The code is simple enough. The interpolation package "tnorma" is available at https://pypi.org/project/tnorma/
The Static06_new.csv file contains positional data with a time column. The time column is not continuous, so I am interpolating it.
The interpolation is successful; however, I am unable to convert the result back into a DataFrame for further analysis.
The error received when running my code is as follows:
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
values = np.array([convert(v) for v in values])
0
0 [[109.00000000000001, 0.009174311926606001, 35...
1 [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, ...
2 [0, 3694]
I'm not sure how to proceed. Ideally, I would like to get the result back into a DataFrame and save it as a .csv file.
Kind regards,
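The printed output suggests that the returned object holds three items of different shapes, which is what triggers the ragged-sequence warning when all of them are passed to pd.DataFrame at once. A possible sketch, assuming the first item is the 2-D block of interpolated rows (the question does not confirm this), is to wrap only that item. Z below is a hypothetical stand-in with made-up values, not real tnorma output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the value returned by tnorma(df6):
# three items of different shapes (all values made up)
Z = (
    [[109.0, 0.0092, 35.0], [110.0, 0.0091, 36.0]],  # assumed 2-D block of rows
    [0.0, 1.0],                                      # a 1-D sequence
    [0, 3694],                                       # a pair of integers
)

# Wrap only the rectangular 2-D part instead of the whole ragged tuple
df = pd.DataFrame(np.asarray(Z[0], dtype=float))

# With no path, to_csv returns the CSV text; pass a file name to write a file
csv_text = df.to_csv(index=False)
print(df.shape)  # (2, 3)
```

Which item actually holds the interpolated data would need to be checked against the tnorma documentation.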

Seaborn and mplcursors

I have some data that I want to plot on a scatter chart, and display the associated label for each point. The data looks like
xlist=[1,2,3,4]
ylist=[2,3,4,5]
labels=['a', 'b', 'c', 'd']
I can plot using Seaborn and tried to use mplcursors, but the displayed labels are the x and y values instead of the labels.
sns.scatterplot(x=xlist, y=ylist)
mplcursors.cursor(hover=True)
How can I make it display the labels, instead of (x, y)?
You will need to read the mplcursors documentation and copy the example on that matter from it to your code. Let me do that for you:
import matplotlib.pyplot as plt
import seaborn as sns
import mplcursors
xlist=[1,2,3,4]
ylist=[2,3,4,5]
labels=['a', 'b', 'c', 'd']
sns.scatterplot(x=xlist, y=ylist)  # keyword arguments; recent seaborn versions reject positional x and y
cursor = mplcursors.cursor(hover=True)
cursor.connect(
"add", lambda sel: sel.annotation.set_text(labels[sel.target.index]))
plt.show()

plt.errorbar for X string value

I have a dataframe as below
import pandas as pd
import matplotlib.pylab as plt
df = pd.DataFrame({'name':['one', 'two', 'three'], 'assess':[100,200,300]})
I want to build errorbar like this
c = 30
plt.errorbar(df['name'], df['assess'], yerr=c, fmt='o')
and of course I get
ValueError: could not convert string to float
I can convert the strings to floats, but then I lose the labels. Is there a more elegant way?
Matplotlib can indeed only work with numerical data. There is an example in the matplotlib collection showing how to handle cases where you have categorical data. The solution is to plot a range of values and set the labels afterwards using plt.xticks(ticks, labels) or a combination of ax.set_xticks(ticks) and ax.set_xticklabels(labels).
In your case the former works fine:
import pandas as pd
import matplotlib.pylab as plt
df = pd.DataFrame({'name':['one', 'two', 'three'], 'assess':[100,200,300]})
c = 30
plt.errorbar(range(len(df['name'])), df['assess'], yerr=c, fmt='o')
plt.xticks(range(len(df['name'])), df['name'])
plt.show()
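The other option mentioned above, ax.set_xticks(ticks) combined with ax.set_xticklabels(labels), could look like the following sketch. It assumes the non-interactive Agg backend, and the name positions is introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'name': ['one', 'two', 'three'], 'assess': [100, 200, 300]})
c = 30
positions = list(range(len(df)))  # numeric x positions 0, 1, 2

fig, ax = plt.subplots()
ax.errorbar(positions, df['assess'], yerr=c, fmt='o')
ax.set_xticks(positions)         # one tick per data point
ax.set_xticklabels(df['name'])   # show the string labels at those ticks
```

This is the object-oriented counterpart of plt.xticks(ticks, labels) and is convenient when a figure has several axes.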
