"DataFrame" is not callable - python-3.x

This seems to be a recurring problem on the site, but I was not able to understand any of the similar questions. I'm trying to get a scatter matrix from pandas (pandas.plotting.scatter_matrix), but I get the error "DataFrame is not callable".
Sorry to bother you; the error may be obvious, but I'm not able to deal with it.
I'm not very familiar with pandas.
# Data_set is the data from load_iris in sklearn.datasets. It is a Bunch whose
# keys include 'feature_names', 'target_names', 'target', 'DESCR' and 'data'
iris_df = pd.DataFrame(Data_set['data'], columns=Data_set['feature_names'])
iris_df['species'] = Data_set['target']
pd.plotting.scatter_matrix(iris_df, alpha=0.2, figsize=(10, 10))
plt.show()
I just want to plot the scatter matrix of my data, but I get the error "DataFrame is not callable" and I can't understand why.

I can get the scatter_matrix without any problems using the following code:
from sklearn import datasets
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
pal = sns.color_palette("cubehelix", 8)
sns.set_palette(pal)
Data_set = datasets.load_iris()
iris_df = pd.DataFrame(Data_set['data'], columns=Data_set['feature_names'])
iris_df['species'] = Data_set['target']
pd.plotting.scatter_matrix(iris_df, alpha=0.2, figsize=(10, 10))
plt.show()
It's possible that you haven't loaded the data set correctly. Check the contents of Data_set (for example, its type and its keys) before building the DataFrame.
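The message itself comes from Python rather than from scatter_matrix: it is raised whenever a DataFrame object is called like a function. One way this happens in a notebook is when a name that used to hold a function or class gets rebound to a DataFrame. The post does not show which name (if any) was shadowed in the asker's session, so this sketch uses a hypothetical name, make_frame, to reproduce the error and shows a quick callable() check you can run on the names in your own code.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Hypothetical name `make_frame`: at first it refers to the DataFrame class...
make_frame = pd.DataFrame
# ...then it is rebound to a DataFrame *instance* (easy to do in a notebook)
make_frame = make_frame(df)

message = ""
try:
    make_frame(df)  # calling an instance, not the class, raises TypeError
except TypeError as exc:
    message = str(exc)

print(message)  # 'DataFrame' object is not callable

# Quick diagnostics for any name you are about to call
print(callable(pd.DataFrame), type(make_frame).__name__, callable(make_frame))
# True DataFrame False
```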

Related

Plotly python facetted heatmaps

I'm using the example from this SO Q&A to use seaborn for facetted heatmaps in python. The result looks like this:
I'd like to do the same thing with plotly express and have tried with this starter code:
import plotly.express as px
df = px.data.medals_wide(indexed=True)
fig = px.imshow(df)
fig.show()
My data is also in a pd.DataFrame, and it's important that I show the variable the heatmaps are grouped by, as well as the x/y axes of each map.
How do you extend the px.imshow example to create a facetted heatmap by group like the seaborn example above?
The sample data is taken from the referenced answer. Plotly Express can create subplots (facets) from column data, but with px.imshow you cannot use one categorical variable to split the data while other categorical variables form the axes, as this sample data requires. You can draw it as subplots with graph objects instead: extract the rows for each value of the category variable, then create a heatmap from that subset by specifying its x and y columns.
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Generate a set of sample data
np.random.seed(0)
indices = pd.MultiIndex.from_product((range(5), range(5), range(5)), names=('label0', 'label1', 'label2'))
data = pd.DataFrame(np.random.uniform(0, 100, size=len(indices)), index=indices, columns=('value',)).reset_index()

# One subplot per value of label0
groups = data['label0'].unique()
titles = ['label0=' + str(x) for x in groups]
fig = make_subplots(rows=1, cols=len(groups),
                    shared_yaxes=True,
                    subplot_titles=tuple(titles))
for i, group in enumerate(groups):
    df = data[data['label0'] == group]
    # Heatmap of value over (label1, label2) for this group
    fig.add_trace(go.Heatmap(z=df.value, x=df.label1, y=df.label2), row=1, col=i + 1)
    fig.update_xaxes(title_text='label1', row=1, col=i + 1)
fig.update_traces(showscale=False)
fig.update_xaxes(dtick=1)  # one tick per integer label
fig.update_yaxes(title_text='label2', row=1, col=1)
fig.show()
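px.imshow draws a 2-D grid, so long-format data like this first has to be reshaped into one grid per group. A minimal pandas/NumPy sketch of that step, reusing the sample data from the answer: each label0 group is pivoted into a table with label2 as rows and label1 as columns, and the tables are stacked into a 3-D array. Passing that array to px.imshow with facet_col=0 should draw one panel per group, assuming Plotly 5.0 or later (where px.imshow accepts facet_col for 3-D input); that call is only shown as a comment and is not exercised here.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer
np.random.seed(0)
indices = pd.MultiIndex.from_product((range(5), range(5), range(5)),
                                     names=('label0', 'label1', 'label2'))
data = pd.DataFrame(np.random.uniform(0, 100, size=len(indices)),
                    index=indices, columns=('value',)).reset_index()

# Pivot each label0 group into a 2-D grid: rows = label2, columns = label1
grids = {
    group: sub.pivot(index='label2', columns='label1', values='value')
    for group, sub in data.groupby('label0')
}

# Stack the grids into a 3-D array of shape (n_groups, n_label2, n_label1)
stack = np.stack([grids[g].to_numpy() for g in sorted(grids)])
print(stack.shape)  # (5, 5, 5)

# Assumption (Plotly >= 5.0), untested here:
# import plotly.express as px
# fig = px.imshow(stack, facet_col=0)
# fig.show()
```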

How do I fix "If using all scalar values, you must pass an index" error?

I am trying to build a linear regression model manually, without the built-in functions, to understand how it works. I get the error when plotting the regression line. Please help me fix it.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sb
data = {'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(np.ones(10), columns = ['ones'])
df_new = pd.concat([df2,df], axis = 1)
X = df_new.loc[:, ['ones', 'X']].values
Y = df_new['Y'].values.reshape(-1,1)
theta = np.array([0.5, 0.2]).reshape(-1,1)
Y_pred = X.dot(theta)
sb.lineplot(df['X'].values.reshape(-1,1),Y_pred)
plt.show()
Error message:
If using all scalar values, you must pass an index
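You are passing 2-D arrays, while seaborn's lineplot expects 1-D arrays (or pandas columns, which are basically the same). Both df['X'].values.reshape(-1,1) and Y_pred have shape (10, 1), so pass them flattened. Seaborn 0.12 and later also require x and y as keyword arguments:
sb.lineplot(x=df['X'].values, y=Y_pred.reshape(-1))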
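A small NumPy/pandas sketch of the shape problem, reusing the question's data and theta: the prediction comes out as a (10, 1) column vector, and ravel() turns it into the 1-D array lineplot wants. The seaborn call is left as a comment so the shapes can be checked without plotting.

```python
import numpy as np
import pandas as pd

data = {'X': list(np.arange(0, 10, 1)), 'Y': [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]}
df = pd.DataFrame(data)

# Design matrix with a column of ones for the intercept: shape (10, 2)
X = np.column_stack([np.ones(len(df)), df['X'].to_numpy()])
theta = np.array([0.5, 0.2]).reshape(-1, 1)  # shape (2, 1)
Y_pred = X.dot(theta)                        # shape (10, 1): a column vector

# lineplot wants 1-D x and y, so flatten before plotting
x_1d = df['X'].to_numpy()   # shape (10,)
y_1d = Y_pred.ravel()       # shape (10,)
print(Y_pred.shape, y_1d.shape)  # (10, 1) (10,)

# With seaborn imported as sb:
# sb.lineplot(x=x_1d, y=y_1d)
```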

Unable to use "from sklearn.preprocessing import Imputer": it raises the exception "Data must be 1-dimensional"

I have built an artificial neural network (ANN) model. I want to preprocess the data before training the model.
I have tried the code given below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Update-Detaset with hacking1.csv')
y=[]
X = dataset.iloc[:,2:7]
y = dataset.iloc[:,8]
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
Y = np.reshape(y,(-1,1))
imputer = imputer.fit(Y)
Y= imputer.transform(Y)
Exception: Data must be 1-dimensional
Here, Update-Detaset with hacking1.csv is the .csv file. The dataset looks like this:
Please click the link to see the demo of the csv file
It shows the following errors.
How can I solve this?
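This has nothing to do with Imputer. You should have been able to tell this from the line number that threw the exception. The error comes from trying to reshape a pandas Series (dataset.iloc[:,8] selects a single column) into two dimensions; .values gives a plain NumPy array, which can be reshaped. Change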
y = dataset.iloc[:,8]
to
y = dataset.iloc[:,8].values
and it should work.
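A sketch of the fixed preprocessing step, using a small hypothetical DataFrame in place of the CSV (the real file is not available). Note that Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; its replacement is SimpleImputer in sklearn.impute. To keep the sketch runnable without scikit-learn, the mean imputation is done by hand with NumPy, and the SimpleImputer equivalent is shown as a comment.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: 4 rows, 10 columns, one NaN in column 8
dataset = pd.DataFrame(np.arange(40, dtype=float).reshape(4, 10))
dataset.iloc[2, 8] = np.nan

y = dataset.iloc[:, 8]           # a pandas Series (1-D)
Y = y.to_numpy().reshape(-1, 1)  # plain NumPy array, shape (4, 1); .values also works

# Mean imputation by hand: same effect as strategy='mean' on this column
col_mean = np.nanmean(Y, axis=0)
Y_filled = np.where(np.isnan(Y), col_mean, Y)
print(Y_filled.ravel())

# With scikit-learn >= 0.22:
# from sklearn.impute import SimpleImputer
# Y_filled = SimpleImputer(missing_values=np.nan, strategy='mean').fit_transform(Y)
```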

How to loop through items in pandas col and run and plot a scikit model?

I have some interesting user data from races. I know when the respective athletes planned to finish a race and when they actually finished (along with some other data). The goal is to find out when the athletes come in late. I want to run a support vector machine for each athlete and plot the decision boundaries.
Here is what I do:
import numpy as np
import pandas as pd
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
# Create arbitrary dataset for example
df = pd.DataFrame({'User': np.random.random_integers(low=1, high=4, size=50),
'Planned_End': np.random.uniform(low=-5, high=5, size=50),
'Actual_End': np.random.uniform(low=-1, high=1, size=50),
'Late': np.random.random_integers(low=0, high=2, size=50)}
)
# Fit Support Vector Machine Classifier
X = df[['Planned_End', 'Actual_End']]
y = df['Late']
clf = svm.SVC(decision_function_shape='ovo')
for i, y in df['User']:
    clf.fit(X, y)
    ax = plt.subplot()
    fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
    plt.title(lab)
    plt.show()
I get the following error: TypeError: 'numpy.int64' object is not iterable - that is, I somehow can't loop through the column.
I guess it comes down to the numpy data format? How can I solve that?
Try items() (older pandas called this iteritems(); Series.iteritems was removed in pandas 2.0):
for i, y in df['User'].items():
Your User Series contains numpy.int64 values, so iterating over it yields one scalar per row, which cannot be unpacked into i and y. You can only use:
for y in df['User']:
You don't use i anywhere anyway.
As for the rest of the code, the following produces a working result; adapt it as needed:
import numpy as np
import pandas as pd
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
# Create arbitrary dataset for example
# (np.random.random_integers is deprecated; randint excludes the upper bound)
df = pd.DataFrame({'User': np.random.randint(low=1, high=5, size=50),
                   'Planned_End': np.random.uniform(low=-5, high=5, size=50),
                   'Actual_End': np.random.uniform(low=-1, high=1, size=50),
                   'Late': np.random.randint(low=0, high=3, size=50)}
                  )
# Fit Support Vector Machine Classifier
# (DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
X = df[['Planned_End', 'Actual_End']].to_numpy()
clf = svm.SVC(decision_function_shape='ovo')
# Note: the class labels here are the user IDs, not the Late column
y = df['User'].values
clf.fit(X, y)
ax = plt.subplot()
fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
plt.title('lab')
plt.show()
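The question's actual goal is one classifier per athlete. A minimal sketch of that loop, assuming the target is the Late column and each model is trained only on that user's rows: groupby('User') yields (user ID, sub-DataFrame) pairs, which is what the original for loop was trying to iterate over. Users whose rows contain only one Late class are skipped, because SVC cannot fit a single class. The mlxtend plotting call is left as a comment.

```python
import numpy as np
import pandas as pd
from sklearn import svm

rng = np.random.default_rng(0)
# Arbitrary example data, mirroring the question (integers() excludes the upper bound)
df = pd.DataFrame({'User': rng.integers(1, 5, size=50),
                   'Planned_End': rng.uniform(-5, 5, size=50),
                   'Actual_End': rng.uniform(-1, 1, size=50),
                   'Late': rng.integers(0, 3, size=50)})

models = {}
# groupby yields (user_id, sub-DataFrame) pairs, one per distinct user
for user, group in df.groupby('User'):
    X_user = group[['Planned_End', 'Actual_End']].to_numpy()
    y_user = group['Late'].to_numpy()
    if len(np.unique(y_user)) < 2:
        continue  # SVC needs at least two classes to fit
    clf = svm.SVC(decision_function_shape='ovo')
    clf.fit(X_user, y_user)
    models[user] = clf
    # Per-user plot, e.g. with mlxtend:
    # plot_decision_regions(X=X_user, y=y_user, clf=clf, legend=2)
    # plt.title(f'User {user}'); plt.show()

print(sorted(models))
```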

Matplotlib.animation.FuncAnimation using pcolormesh

Python 3.5, Windows 10 Pro.
I'm trying to continuously plot an 8x8 array of pixels (for the sake of the question I'll just use random data, but in the real thing I'm reading from a serial port).
I can do it using a while loop, but I need to switch over to matplotlib.animation.FuncAnimation and I can't get it to work. I've tried looking at the help files and following the examples on matplotlib.org, but I haven't been able to make sense of them.
Can someone help me figure out how to continuously plot an 8x8 array of pixels using FuncAnimation and pcolormesh? Here is what I've got so far:
import scipy as sp
import matplotlib.pyplot as plt
from matplotlib import animation
plt.close('all')
y = sp.rand(64).reshape([8,8])
def do_something():
    y = sp.rand(64).reshape([8,8])
    fig_plot.set_data(y)
    return fig_plot,
fig1 = plt.figure(1,facecolor = 'w')
plt.clf()
fig_plot = plt.pcolormesh(y)
fig_ani = animation.FuncAnimation(fig1,do_something)
plt.show()
If you want to see the while loop code, just so you know exactly what I'm trying to reproduce, see below.
import scipy as sp
import matplotlib.pyplot as plt
plt.figure(1)
plt.clf()
while True:
    y = sp.rand(64).reshape([8,8])
    plt.pcolormesh(y)
    plt.show()
    plt.pause(.000001)
I was able to find a solution using imshow instead of pcolormesh. In case anyone else is struggling with the same issues I had, I've posted the working code below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
# NumPy replaces the sp.rand / sp.zeros aliases, which newer SciPy versions no longer provide
Hz = np.random.rand(8, 8) # initialize with random data
fig = plt.figure(1, facecolor='w')
ax = plt.axes()
im = ax.imshow(Hz)
im.set_data(np.zeros(Hz.shape))
def update_data(n):
    Hz = np.random.rand(8, 8) # more random data
    im.set_data(Hz)
    return im,
# Keep a reference to the animation so it is not garbage-collected
ani = animation.FuncAnimation(fig, update_data, interval=10, blit=False, repeat=False)
plt.show()
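The question asked about pcolormesh specifically. A sketch of the same FuncAnimation pattern with pcolormesh, assuming Matplotlib 3.5 or later (where QuadMesh.set_array accepts a 2-D array of the same shape as the original data): the mesh is created once, and the update function replaces its color values each frame instead of calling pcolormesh again. The random 8x8 array stands in for the serial-port data. Unlike the question's do_something, the update function accepts the frame number that FuncAnimation passes in, and vmin/vmax are set explicitly to pin the color scale to the 0 to 1 range of the random data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

# Assumes Matplotlib >= 3.5 (QuadMesh.set_array accepts 2-D data)
fig, ax = plt.subplots(facecolor='w')
# Pin the color limits so every frame uses the same 0..1 scale
mesh = ax.pcolormesh(np.random.rand(8, 8), vmin=0, vmax=1)

def update(frame):
    """FuncAnimation calls this with the frame number as the first argument."""
    y = np.random.rand(8, 8)  # stand-in for data read from the serial port
    mesh.set_array(y)         # update the colors in place
    return (mesh,)

# Keep a reference to the animation, or it is garbage-collected and stops
ani = animation.FuncAnimation(fig, update, interval=50, blit=False,
                              cache_frame_data=False)
plt.show()
```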
