Matplotlib: applying cellColours to only certain columns/cells - python-3.x

Got myself in a pickle.
I'm creating a basic table in Matplotlib (via Pandas, but that's not the issue). What I'm trying to accomplish is to create a table where the first column, which will be string values, remains white...but columns 2,3,4,5,6 are floating/integers and will be colored by a custom normalized colormap.
I've started with the basics, and created the 'colored' table via the code below. This only plots the columns with integer values at this point, see here:
What I ulimately need to do is plot this with an additional column, say before column 'A' or after column 'F' which holds string values, e.g. ['MBIAS', 'RMSE', 'BAGSS', 'MBIAS', 'MBIAS'].
However if I try to apply the cellColours method in the code below to a table that mixes lists of strings and float/integers, it obviously fails.
Is there a method to apply a cellColours scheme to only certain cells, or row/columns? Can I loop through, applying the custom colormap to specific cells?
Any help or tips would be appreciated!
Code:
import numpy as np
import matplotlib
from matplotlib import cm
import matplotlib.pyplot as plt
from pandas import *
#Create sample data in pandas dataframe
idx = Index(np.arange(1,6))
df = DataFrame(abs(2*np.random.randn(5, 5)), index=idx, columns=['A', 'B', 'C', 'D', 'E'])
model = ['conusarw', 'conusarw', 'conusarw', 'nam04', 'emhrrr']
df['Model'] = model
df1 = df[['A','B','C','D','E']]
test = df1.round({'A':2,'B':2,'C':2,'D':2,'E':2})
print(test)
vals = test.values
print(vals)
#Creates normalized list (from 0-1) based a user provided range and center of distribution.
norm = matplotlib.colors.TwoSlopeNorm(vmin=0,vcenter=1,vmax=10)
#Merges colormap to the normalized data based on customized normalization pattern from above.
colours = plt.cm.coolwarm(norm(vals))
#Create figure in Matplotlib in which to plot table.
fig = plt.figure(figsize=(15,8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
#Plot table, using pandas dataframe information and data.
#Customized lists of data and names can also be provided.
the_table=plt.table(cellText=vals, rowLabels=model, colLabels=df.columns,
loc='center', cellColours=colours)
plt.savefig('test_table.png')

Instead of the fast vectorized call colours = plt.cm.coolwarm(norm(vals)), you can just use regular Python loops with if-tests. The code below loops through the individual rows, then through the individual elements and test whether they are numeric. A similar loop prepares the rounded values. Speed is not really a problem, unless you'd have thousands of elements.
(The code uses import pandas as pd, as import * from pandas isn't recommended.)
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba, TwoSlopeNorm
import pandas as pd
import numpy as np
# Create sample data in pandas dataframe
idx = pd.Index(np.arange(1, 6))
df = pd.DataFrame(abs(2 * np.random.randn(5, 5)), index=idx, columns=['A', 'B', 'C', 'D', 'E'])
df['Model'] = ['conusarw', 'conusarw', 'conusarw', 'nam04', 'emhrrr']
cmap = plt.cm.coolwarm
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)
colours = [['white' if not np.issubdtype(type(val), np.number) else cmap(norm(val)) for val in row]
for row in df.values]
vals = [[val if not np.issubdtype(type(val), np.number) else np.round(val, 2) for val in row]
for row in df.values]
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
the_table = plt.table(cellText=vals, rowLabels=df['Model'].to_list(), colLabels=df.columns,
loc='center', cellColours=colours)
plt.show()
PS: If speed is a concern, the following code is a bit trickier. It uses:
setting the "bad color" of a colormap
pd.to_numeric(..., errors='coerce') to convert all strings to nans
as pd.to_numeric() only works for 1D arrays, ravel() and reshape() are used
using the same arrays, np.where can do the rounding
cmap = plt.cm.coolwarm.copy()
cmap.set_bad('white')
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)
values = pd.to_numeric(df.values.ravel(), errors='coerce').reshape(df.shape)
colours = cmap(norm(values))
vals = np.where(np.isnan(values), df.values, np.round(values, 2))
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
the_table = plt.table(cellText=vals, rowLabels=df['Model'].to_list(), colLabels=df.columns,
loc='center', cellColours=colours)

Related

How to create a line plot in python, by importing data from excel and using it to create a plot that shares a common X-Axis?

Trying to create a plot using Python Spyder. I have sample data in excel which I am able to import into Spyder, I want one column ('Frequency') to be the X axis, and the rest of the columns ('C1,C2,C3,C4') to be plotted on the Y axis. How do I do this? This is the data in excel and how the plot looks in excel (https://i.stack.imgur.com/eRug5.png) , the plot and data
This is what I have so far . These commands below (Also seen in the image) give an empty plot.
data = data.head()
#data.plot(kind='line', x='Frequency', y=['C1','C2','C3','C4'])
df = pd.DataFrame(data, columns=["Frequency","C1", "C2","C3","C4"])
df.plot(x = "Frequency",y=["C1", "C2","C3","C4"])
Here is an example, you can change columns names:
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'X_Axis':[1,3,5,7,10,20],
'col_2':[.4,.5,.4,.5,.5,.4],
'col_3':[.7,.8,.9,.4,.2,.3],
'col_4':[.1,.3,.5,.7,.1,.0],
'col_5':[.5,.3,.6,.9,.2,.4]})
dfm = df.melt('X_Axis', var_name='cols', value_name='vals')
g = sns.catplot(x="X_Axis", y="vals", hue='cols', data=dfm, kind='point')
import pandas as pd
import matplotlib.pyplot as plt
path = r"C:\Users\Alisha.Walia\Desktop\Alisha\SAMPLE.xlsx"
data = pd.read_excel(path)
#df = pd.DataFrame.from_dict(data)
#print(df)
#prints out data from excl in tabular format
dict1 = (data.to_dict()) #print(dict1)
Frequency=data["Frequency "].to_list() #print (Frequency)
C1=data["C1"].to_list() #print(C1)
C2=data["C2"].to_list() #print(C2)
C3=data["C3"].to_list() #print(C3)
C4=data["C4"].to_list() #print(C4)
plt.plot(Frequency,C1)
plt.plot(Frequency,C2)
plt.plot(Frequency,C3)
plt.plot(Frequency,C4)
plt.style.use('ggplot')
plt.title('SAMPLE')
plt.xlabel('Frequency 20Hz-200MHz')
plt.ylabel('Capacitance pF')
plt.xlim(5, 500)
plt.ylim(-20,20)
plt.legend()
plt.show()

Plotting Pandas DF with Numpy Arrays

I have a Pandas df with multiple columns and each cell inside has a various number of elements of a Numpy array. I would like plot all the elements of the array for every cell within column.
I have tried
plt.plot(df['column'])
plt.plot(df['column'][0:])
both gives a ValueErr: setting an array element with a sequence
It is very important that these values get plotted to its corresponding index as the index represents linear time in this dataframe. I would really appreciate it if someone showed me how to do this properly. Perhaps there is a package other than matplotlib.pylot that is better suited for this?
Thank you
plt.plot needs a list of x-coordinates together with an equally long list of y-coordinates. As you seem to want to use the index of the dataframe for the x-coordinate and each cell contents for the y-coordinates, you need to repeat the x-values as many times as the length of the y-coordinates.
Note that this format doesn't suit a line plot, as connecting subsequent points would create some strange vertical lines. plt.plot accepts a marker as its third parameter, for example '.' to draw a simple dot at each position.
A code example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
N = 30
df = pd.DataFrame({f'column{c}':
[np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
for c in range(1, 6)})
legend_handles = []
colors = plt.cm.Set1.colors
desired_columns = df.columns
for column, color in zip(desired_columns, colors):
for ind, cell in df[column].iteritems():
if len(cell) > 0:
plotted, = plt.plot([ind] * len(cell), cell, '.', color=color)
legend_handles.append(plotted)
plt.legend(legend_handles, desired_columns)
plt.show()
Note that pandas really isn't meant to store complete arrays inside cells. The preferred way is to create a dataframe in "long" form, with each value in a separate row (with the "index" repeated). Most functions of pandas and seaborn don't understand about arrays inside cells.
Here's a way to create a long form which can be called using Seaborn:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
N = 30
df = pd.DataFrame({f'column{c}':
[np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
for c in range(1, 6)})
desired_columns = df.columns
df_long_data = []
for column in desired_columns:
for ind, cell in df[column].iteritems():
for val in cell:
dict = {'timestamp': ind, 'column_name': column, 'value': val}
df_long_data.append(dict)
df_long = pd.DataFrame(df_long_data)
sns.scatterplot(x='timestamp', y='value', hue='column_name', data=df_long)
plt.show()
As per your problem, you have numpy arrays in each cell which you wanna plot. To pass your data to plt.plot() method you might need to pass every cell individually as whenever you try to pass it as a whole like you did, it is actually a sequence that you are passing. But the plot() method will accept a numpy array.
This might help:
for column in df.columns:
for cell in df[column]:
plt.plot(cell)
plt.show()

How do i fix "If using all scalar values, you must pass an index" error?

I am manually trying to build a linear regression model for understanding purpose without using the builtin function. I am getting the error while plotting the regression line. Kindly help me fix it.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sb
data = {'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(np.ones(10), columns = ['ones'])
df_new = pd.concat([df2,df], axis = 1)
X = df_new.loc[:, ['ones', 'X']].values
Y = df_new['Y'].values.reshape(-1,1)
theta = np.array([0.5, 0.2]).reshape(-1,1)
Y_pred = X.dot(theta)
sb.lineplot(df['X'].values.reshape(-1,1),Y_pred)
plt.show()
Error message:
If using all scalar values, you must pass an index
You are passing a 2d array, while seaborn's lineplot expects a 1d array (or a pandas column which is basically same). So change it to
sb.lineplot(df['X'].values,Y_pred.reshape(-1))

Set hue using a range of values in Seaborn stripplot

I am trying to set hue based on a range of values rather than unique values in seaborn stripplot. For example, different colors for different value ranges (1940-1950, 1950-1960 etc.).
sns.stripplot('Condition', 'IM', data=dd3, jitter=0.3, hue= dd3['Year Built'])
Output Figure
Thanks
Looks like you need to bin the data. Use .cut() in the below manner. The years are binned into 5 groups. You can arrange your own step in .arrange() to adjust your ranges.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x = np.random.randint(0,100,size=100)
y = np.random.randint(0,100, size=100)
year = np.random.randint(1918, 2019, size=100)
df = pd.DataFrame({
'x':x,
'y':y,
'year':year
})
df['year_bin'] = pd.cut(df['year'], np.arange(min(year), max(year), step=20))
sns.lmplot('x','y', data=df, hue='year_bin')
plt.show()
Output:

How do I map df column values to hex color in one go?

I have a pandas dataframe with two columns. One of the columns values needs to be mapped to colors in hex. Another graphing process takes over from there.
This is what I have tried so far. Part of the toy code is taken from here.
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# Create dataframe
df = pd.DataFrame(np.random.randint(0,21,size=(7, 2)), columns=['some_value', 'another_value'])
# Add a nan to handle realworld
df.iloc[-1] = np.nan
# Try to map values to colors in hex
# # Taken from here
norm = matplotlib.colors.Normalize(vmin=0, vmax=21, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
df['some_value_color'] = df['some_value'].apply(lambda x: mapper.to_rgba(x))
df
Which outputs:
How do I convert 'some_value' df column values to hex in one go?
Ideally using the sns.cubehelix_palette(light=1)
I am not opposed to using something other than matplotlib
Thanks in advance.
You may use matplotlib.colors.to_hex() to convert a color to hexadecimal representation.
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
# Create dataframe
df = pd.DataFrame(np.random.randint(0,21,size=(7, 2)), columns=['some_value', 'another_value'])
# Add a nan to handle realworld
df.iloc[-1] = np.nan
# Try to map values to colors in hex
# # Taken from here
norm = matplotlib.colors.Normalize(vmin=0, vmax=21, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
df['some_value_color'] = df['some_value'].apply(lambda x: mcolors.to_hex(mapper.to_rgba(x)))
df
Efficiency
The above method it easy to use, but may not be very efficient. In the folling let's compare some alternatives.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
def create_df(n=10):
# Create dataframe
df = pd.DataFrame(np.random.randint(0,21,size=(n, 2)),
columns=['some_value', 'another_value'])
# Add a nan to handle realworld
df.iloc[-1] = np.nan
return df
The following is the solution from above. It applies the conversion to the dataframe row by row. This quite inefficient.
def apply1(df):
# map values to colors in hex via
# matplotlib to_hex by pandas apply
norm = mcolors.Normalize(vmin=np.nanmin(df['some_value'].values),
vmax=np.nanmax(df['some_value'].values), clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
df['some_value_color'] = df['some_value'].apply(lambda x: mcolors.to_hex(mapper.to_rgba(x)))
return df
That's why we might choose to calculate the values into a numpy array first and just assign this array as the newly created column.
def apply2(df):
# map values to colors in hex via
# matplotlib to_hex by assigning numpy array as column
norm = mcolors.Normalize(vmin=np.nanmin(df['some_value'].values),
vmax=np.nanmax(df['some_value'].values), clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
a = mapper.to_rgba(df['some_value'])
df['some_value_color'] = np.apply_along_axis(mcolors.to_hex, 1, a)
return df
Finally we may use a look up table (LUT) which is created from the matplotlib colormap, and index the LUT by the normalized data. Because this solution needs to create the LUT first, it is rather ineffienct for dataframes with less entries than the LUT has colors, but will pay off for large dataframes.
def apply3(df):
# map values to colors in hex via
# creating a hex Look up table table and apply the normalized data to it
norm = mcolors.Normalize(vmin=np.nanmin(df['some_value'].values),
vmax=np.nanmax(df['some_value'].values), clip=True)
lut = plt.cm.viridis(np.linspace(0,1,256))
lut = np.apply_along_axis(mcolors.to_hex, 1, lut)
a = (norm(df['some_value'].values)*255).astype(np.int16)
df['some_value_color'] = lut[a]
return df
Compare the timings
Let's take a dataframe with 10000 rows.
df = create_df(10000)
Original solution (apply1)
%timeit apply1(df)
2.66 s per loop
Array solution (apply2)
%timeit apply2(df)
240 ms per loop
LUT solution (apply3)
%timeit apply1(df)
7.64 ms per loop
In this case the LUT solution gives almost a factor 400 of improvement.

Resources