How do you use the matplotlib function fill_between with a pandas DataFrame?

I have a stock price line graph which works; however, I wanted to use the fill_between function. I have tried passing in the values directly from the series, and also creating lists from them, and nothing works. Is this possible?
myDF = pd.read_csv('C:/Workarea/OneDrive/PyProjects/Learning/stocks_sentdex/GOOG-LON_TSCO.csv')
print(myDF)
myDF = myDF.set_index('Date')
myDF['Close'].plot()
plt.fill_between(?, 0, ?, alpha=0.3)
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Check it out')
plt.legend()
plt.subplots_adjust(left=0.09,bottom=0.16, right=0.94,top=0.90, wspace=0.2, hspace=0)
plt.show()
All the examples I have seen use their own data or read from a urllib. All help greatly appreciated.

import pandas as pd
import pandas_datareader.data as pdata
import matplotlib.pyplot as plt
# myDF = pd.read_csv('C:/Workarea/OneDrive/PyProjects/Learning/stocks_sentdex/GOOG-LON_TSCO.csv')
# myDF = myDF.set_index('Date')
myDF = pdata.get_data_google('LON:TSCO', start='2009-01-02', end='2009-12-31')
fig, ax = plt.subplots()
ax.fill_between(myDF.index, 0, myDF['Close'], alpha=0.3, label='LON:TSCO')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.set_title('Check it out')
ax.legend()
fig.subplots_adjust(left=0.09,bottom=0.16, right=0.94,top=0.90, wspace=0.2, hspace=0)
fig.autofmt_xdate()
plt.show()
The error message
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
could occur if either myDF.index or myDF['Close'] is an object array. As a simple example,
In [110]: plt.fill_between(np.array([1,2], dtype='O'), 0, np.array([1,2], dtype='O'))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Chances are that the Date values are mere strings rather than datetime-like objects. To fix this, use pd.to_datetime(myDF['Date']) to convert the date strings into datetime-like objects.
myDF = pd.read_csv('C:/Workarea/OneDrive/PyProjects/Learning/stocks_sentdex/GOOG-LON_TSCO.csv')
myDF['Date'] = pd.to_datetime(myDF['Date'])
myDF = myDF.set_index('Date')
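To see the fix end to end, here is a minimal sketch that swaps the CSV file for a small in-memory sample (the dates and prices are invented for illustration, and only the Date and Close columns are assumed). It plots with ax.plot rather than the pandas plot method, so that the line and the fill share the same date units.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample standing in for the CSV file from the question.
csv_text = """Date,Close
2009-01-02,350.0
2009-01-05,355.5
2009-01-06,352.25
2009-01-07,360.0
"""

myDF = pd.read_csv(io.StringIO(csv_text))
myDF['Date'] = pd.to_datetime(myDF['Date'])  # date strings -> datetime64 values
myDF = myDF.set_index('Date')

fig, ax = plt.subplots()
ax.plot(myDF.index, myDF['Close'], label='Close')
ax.fill_between(myDF.index, 0, myDF['Close'], alpha=0.3)
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.legend()
fig.autofmt_xdate()
```

Call fig.savefig(...) or switch to an interactive backend and call plt.show() to view the result.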

Related

Matplotlib: applying cellColours to only certain columns/cells

Got myself in a pickle.
I'm creating a basic table in Matplotlib (via Pandas, but that's not the issue). What I'm trying to accomplish is to create a table where the first column, which will be string values, remains white...but columns 2,3,4,5,6 are floating/integers and will be colored by a custom normalized colormap.
I've started with the basics, and created the 'colored' table via the code below. This only plots the columns with numeric values at this point.
What I ultimately need to do is plot this with an additional column, say before column 'A' or after column 'E', which holds string values, e.g. ['MBIAS', 'RMSE', 'BAGSS', 'MBIAS', 'MBIAS'].
However if I try to apply the cellColours method in the code below to a table that mixes lists of strings and float/integers, it obviously fails.
Is there a method to apply a cellColours scheme to only certain cells, or row/columns? Can I loop through, applying the custom colormap to specific cells?
Any help or tips would be appreciated!
Code:
import numpy as np
import matplotlib
from matplotlib import cm
import matplotlib.pyplot as plt
from pandas import *
#Create sample data in pandas dataframe
idx = Index(np.arange(1,6))
df = DataFrame(abs(2*np.random.randn(5, 5)), index=idx, columns=['A', 'B', 'C', 'D', 'E'])
model = ['conusarw', 'conusarw', 'conusarw', 'nam04', 'emhrrr']
df['Model'] = model
df1 = df[['A','B','C','D','E']]
test = df1.round({'A':2,'B':2,'C':2,'D':2,'E':2})
print(test)
vals = test.values
print(vals)
#Creates normalized list (from 0-1) based a user provided range and center of distribution.
norm = matplotlib.colors.TwoSlopeNorm(vmin=0,vcenter=1,vmax=10)
#Merges colormap to the normalized data based on customized normalization pattern from above.
colours = plt.cm.coolwarm(norm(vals))
#Create figure in Matplotlib in which to plot table.
fig = plt.figure(figsize=(15,8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
#Plot table, using pandas dataframe information and data.
#Customized lists of data and names can also be provided.
# colLabels must match the five numeric columns in vals (df.columns also contains 'Model').
the_table = plt.table(cellText=vals, rowLabels=model, colLabels=test.columns,
                      loc='center', cellColours=colours)
plt.savefig('test_table.png')
Instead of the fast vectorized call colours = plt.cm.coolwarm(norm(vals)), you can just use regular Python loops with if-tests. The code below loops through the individual rows, then through the individual elements, and tests whether each one is numeric. A similar loop prepares the rounded values. Speed is not really a problem unless you have thousands of elements.
(The code uses import pandas as pd, as from pandas import * isn't recommended.)
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba, TwoSlopeNorm
import pandas as pd
import numpy as np
# Create sample data in pandas dataframe
idx = pd.Index(np.arange(1, 6))
df = pd.DataFrame(abs(2 * np.random.randn(5, 5)), index=idx, columns=['A', 'B', 'C', 'D', 'E'])
df['Model'] = ['conusarw', 'conusarw', 'conusarw', 'nam04', 'emhrrr']
cmap = plt.cm.coolwarm
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)
colours = [['white' if not np.issubdtype(type(val), np.number) else cmap(norm(val)) for val in row]
           for row in df.values]
vals = [[val if not np.issubdtype(type(val), np.number) else np.round(val, 2) for val in row]
        for row in df.values]
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
the_table = plt.table(cellText=vals, rowLabels=df['Model'].to_list(), colLabels=df.columns,
                      loc='center', cellColours=colours)
plt.show()
PS: If speed is a concern, the following code is faster but a bit trickier. It uses:
- setting the "bad color" of a colormap
- pd.to_numeric(..., errors='coerce') to convert all strings to NaNs
- ravel() and reshape(), as pd.to_numeric() only works for 1D arrays
- np.where on the same arrays to do the rounding
cmap = plt.cm.coolwarm.copy()
cmap.set_bad('white')
norm = TwoSlopeNorm(vmin=0, vcenter=1, vmax=10)
values = pd.to_numeric(df.values.ravel(), errors='coerce').reshape(df.shape)
colours = cmap(norm(values))
vals = np.where(np.isnan(values), df.values, np.round(values, 2))
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111, frameon=False, xticks=[], yticks=[])
the_table = plt.table(cellText=vals, rowLabels=df['Model'].to_list(), colLabels=df.columns,
                      loc='center', cellColours=colours)

Break a pandas line plot at specific date

I have a time-series dataframe with missing data for some time period. I would like to create a line plot and break a line where there is missing data.
data_site1_ave[["samples", "lkt"]].plot(figsize=(15,4), title = "Site 1", xlabel='')
Is it possible to create a gap, let's say from 2018-05-01 to 2018-10-30 in the line plot?
Yes, you can create arbitrary gaps by calling df.plot() several times, once for each slice of the full dataframe that you want drawn. To make everything appear in the same plot, pass the same Axes object to each call through the ax keyword argument of df.plot(). Turn the legend off for all but one call, so that the legend has only one entry.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create sample time series
N = 365
np.random.seed(42)
x = pd.date_range('2018-01-01', freq='d', periods=N)
y = np.cumsum(np.random.rand(N, 1) - 0.5)
df = pd.DataFrame(y, columns=['y'], index=x)
# plot time series with gap
fig, ax = plt.subplots()
df.loc[:'2018-05-01'].plot(ax=ax, c='blue')
df.loc['2018-10-31':].plot(ax=ax, c='blue', legend=False);
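Another way to get the same gap, sketched here as an alternative rather than as the method used above, is to set the values inside the gap to NaN. Matplotlib does not draw line segments through NaN points, so a single plot call is enough. The sample data mirrors the example above, and the name gapped is made up for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample daily time series, as in the example above.
N = 365
np.random.seed(42)
x = pd.date_range('2018-01-01', freq='D', periods=N)
y = np.cumsum(np.random.rand(N) - 0.5)
df = pd.DataFrame({'y': y}, index=x)

# Work on a copy so the original data stays intact, then blank out the gap.
gapped = df.copy()
gapped.loc['2018-05-01':'2018-10-30', 'y'] = np.nan

# One call draws both pieces; the line breaks wherever the values are NaN.
fig, ax = plt.subplots()
ax.plot(gapped.index, gapped['y'], color='blue', label='y')
ax.legend()
```

This keeps a single legend entry without having to switch the legend off for the extra calls.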

How do I fix "If using all scalar values, you must pass an index" error?

I am trying to build a linear regression model by hand, without using a built-in function, in order to understand how it works. I get the error below when plotting the regression line. Kindly help me fix it.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sb
data = {'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(np.ones(10), columns = ['ones'])
df_new = pd.concat([df2,df], axis = 1)
X = df_new.loc[:, ['ones', 'X']].values
Y = df_new['Y'].values.reshape(-1,1)
theta = np.array([0.5, 0.2]).reshape(-1,1)
Y_pred = X.dot(theta)
sb.lineplot(df['X'].values.reshape(-1,1),Y_pred)
plt.show()
Error message:
If using all scalar values, you must pass an index
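You are passing 2D arrays, while seaborn's lineplot expects 1D arrays (or pandas columns, which behave the same way). Recent seaborn releases (0.12 and later) also require x and y to be passed as keyword arguments. So change it to
sb.lineplot(x=df['X'].values, y=Y_pred.reshape(-1))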
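The shape difference is easy to check with NumPy alone. The sketch below rebuilds the arrays from the question without pandas or seaborn (y_flat is a name made up here) and shows that the matrix product yields a column of shape (10, 1), which reshape(-1) flattens to the 1D shape that lineplot can use.

```python
import numpy as np

# Design matrix with a column of ones and the X values 0..9, as in the question.
X = np.column_stack([np.ones(10), np.arange(10)])  # shape (10, 2)
theta = np.array([0.5, 0.2]).reshape(-1, 1)        # shape (2, 1)

Y_pred = X.dot(theta)         # 2D column vector
y_flat = Y_pred.reshape(-1)   # 1D array with the same values

print(Y_pred.shape, y_flat.shape)  # (10, 1) (10,)
```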

'PathCollection' object has no attribute 'legend_elements'

I know this exact question has been asked here; however, the current solution does nothing for me. I can't seem to generate a legend that has a different color for each label. I have tried following the current Matplotlib documentation, to no avail. I keep getting the error that my PathCollection object has no attribute legend_elements.
EDIT: Also, I want my legend to show just the unique years in the plot, not every data point mapped to its own legend entry as it is right now.
Here's what I have
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.pyplot import legend
import os
%config InlineBackend.figure_format = 'retina'
path = None
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        path = os.path.join(dirname, filename)
# Indexes to be removed
early_demo_dividend = 13
high_income = 24
lower_middle_income = 40
north_america = 46
members = 50
post_demo = 56
_removals = [early_demo_dividend, high_income, lower_middle_income, north_america, members, post_demo]
#Read in data
df = pd.read_csv(path)
#Get the rows we want
df = df.loc[df['1960'] > 1]
df = df.drop(columns=["Code", "Type", "Indicator Name"])
#Remove the odd rows
for i in _removals:
    df = df.drop(df.index[i])
#Format the dataframe
df = df.melt('Name', var_name='Year', value_name='Budget')
#Plot setup
plt.figure().set_size_inches(16,6)
plt.xticks(rotation=90)
plt.grid(True)
#Plot labels
plt.title('Military Spending of Countries')
plt.xlabel('Countries')
plt.ylabel('Budget in Billions')
#Plot data
new_year = df['Year'].astype(int)
scatter = plt.scatter(df['Name'], df['Budget'], c=(new_year / 10000) , label=new_year)
#Legend setup produce a legend with the unique colors from the scatter
legend1 = plt.legend(*scatter.legend_elements(),
                     loc="lower left", title="Years")
plt.add_artist(legend1)
plt.show()
Here's my plot
I also encountered this problem. legend_elements was added in matplotlib 3.1, so it is missing from older releases such as 3.0.3.
Try upgrading matplotlib with pip3 install --upgrade matplotlib
Uninstalling matplotlib-3.0.3:
Successfully uninstalled matplotlib-3.0.3
Successfully installed matplotlib-3.1.2
It works for me.
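Once a matplotlib version with legend_elements (3.1 or newer) is installed, a legend with one entry per unique year can be sketched as follows. The country names, years and budgets are invented for illustration. The sketch assumes that num=None is wanted, which asks for one entry per unique value in c, and it leaves out the per-point label argument.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Invented data: three countries, each with a budget for three years.
names = ['A', 'B', 'C'] * 3
years = np.repeat([1960, 1961, 1962], 3)
budget = np.array([1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 2.0, 3.0, 4.0])

fig, ax = plt.subplots()
scatter = ax.scatter(names, budget, c=years)  # color encodes the year

# num=None requests one legend entry per unique value in c.
handles, labels = scatter.legend_elements(num=None)
legend1 = ax.legend(handles, labels, loc='lower left', title='Years')
```

The legend returned by ax.legend is already attached to the axes, so no extra add_artist call is needed unless a second legend is created afterwards.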
Although my answer may not be relevant to the current question, I decided to leave it here to describe my case, since it might be useful to someone else:
If you misspell the name of a keyword argument when calling matplotlib functions such as scatter or plot, you can get a similar error.
Example:
x = list(range(10))
y = list(range(10))
plt.scatter(x, y, labels='RESULT')
I get the error:
AttributeError: 'PathCollection' object has no property 'labels'
As the error message says (though it is not obvious to an inattentive developer :) ), the problem is that I used labels instead of label.

"DataFrame" is not callable

It seems to be a recurrent problem on the site, but I was not able to understand any of the similar problems/topics. I'm trying to get a scatter matrix from pandas (pandas.plotting.scatter_matrix), but I get the error DataFrame is not callable.
Sorry to bother you; the error may be obvious, but I'm not able to deal with it.
I'm not very familiar with pandas.
#Data_set is data from load_iris from sklearn.datasets, it is a bunch and it
#has 5 keys : 'features_names','target_names','target','DESCR', 'data'
iris_df = pd.DataFrame(Data_set['data'], columns=Data_set['feature_names'])
iris_df['species'] = Data_set['target']
pd.plotting.scatter_matrix(iris_df, alpha=0.2, figsize=(10, 10))
plt.show()
I just want to print the scatter matrix of my data and I get the error DataFrame is not callable and I'm not able to understand why.
I can get the scatter_matrix without any problems using the following code:
from sklearn import datasets
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
pal = sns.color_palette("cubehelix", 8)
sns.set_palette(pal)
Data_set = datasets.load_iris()
iris_df = pd.DataFrame(Data_set['data'], columns=Data_set['feature_names'])
iris_df['species'] = Data_set['target']
pd.plotting.scatter_matrix(iris_df, alpha=0.2, figsize=(10, 10))
plt.show()
There's a possibility you haven't read in the data set correctly. Check the contents of your Data_set.
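The question does not show the line that raised the error, so the following is only a guess at the kind of mistake involved. This message appears whenever a DataFrame instance is called like a function, for example by using parentheses where square brackets were meant, or by reusing a function's name for a DataFrame earlier in the session. The small frame below is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

try:
    df('a')  # parentheses try to call the DataFrame itself
except TypeError as err:
    message = str(err)
    print(message)

col = df['a']  # square brackets select the column
```

If the traceback points at a line of this form, replacing the parentheses with brackets, or restarting the session to clear a rebound name, is worth trying.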
