Plot polynomial regression in Python with Scikit-Learn - python-3.x

I am writing Python code to investigate overfitting, using the function sin(2πx) on the range [0, 1]. I first generate N data points by adding random noise drawn from a Gaussian distribution with mu=0 and sigma=1, then fit the model with an M-th degree polynomial. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)
M=2
poly_features=PolynomialFeatures(degree=M, include_bias=False)
X_poly=poly_features.fit_transform(X) # contains the original X plus its new polynomial features
model=LinearRegression()
model.fit(X_poly,y) # Fit the model
# Plot
X_plot=np.linspace(0,1,100).reshape(-1,1)
X_plot_poly=poly_features.fit_transform(X_plot)
plt.plot(X,y,"b.")
plt.plot(X_plot_poly,model.predict(X_plot_poly),'-r')
plt.show()
[Image: picture of the polynomial regression plot]
I don't know why I get M = 2 polynomial curves. I think there should be only one curve regardless of M. Could you help me figure out this problem?

After the polynomial feature transformation your data has shape (n_samples, 2), so pyplot plots the predicted variable against both columns. Change the plot code to
plt.plot(X_plot_poly[:,i], model.predict(X_plot_poly), '-r')
where i is the column number you want on the x-axis (with include_bias=False, column 0 is the original X).
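Equivalently, a minimal corrected version of the plotting section, reusing the variables from the question's code: plotting the predictions against the 1-D grid X_plot itself yields a single curve.
X_plot = np.linspace(0, 1, 100).reshape(-1, 1)
X_plot_poly = poly_features.transform(X_plot)       # reuse the already-fitted transformer
plt.plot(X, y, "b.")                                # training points
plt.plot(X_plot, model.predict(X_plot_poly), "-r")  # one curve: predictions vs the 1-D grid
plt.show()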

Related

Nyquist Plot using Python with certain parameters

I am trying to draw a Nyquist plot using Python, but I have no clue which of my parameters are needed to plot that curve.
Here is a glimpse of the parameters that I have:
Channel_ID,Step_ID,Cycle_ID,Test_Time,EIS_Test_ID,EIS_Data_Point,Frequency,Zmod,Zphz,Zreal,Zimg,OCV,AC_Amp_RMS
4,7,1,36966.3072,0,0,200015.6,0.4933,70.9969,0.1606,0.4664,3.6231,0.35
4,7,1,36966.3072,0,1,158953.1,0.412,70.8901,0.1349,0.3893,3.6231,0.35
4,7,1,36966.3072,0,2,126234.4,0.3437,70.7115,0.1135,0.3244,3.6231,0.35
4,7,1,36966.3072,0,3,100265.6,0.2869,70.6312,0.0951,0.2706,3.6231,0.35
4,7,1,36966.3072,0,4,79640.63,0.2364,70.2418,0.0799,0.2224,3.6231,0.35
The rows above are sample values for those parameters.
Based on these parameters, namely
Test_Time, Frequency, Zmod, Zphz, Zreal, Zimg, OCV, AC_Amp_RMS, where Zmod is the magnitude of the complex impedance formed by Zreal and Zimg, I need to draw a Nyquist plot. I have no clue how these parameters should be used for the plot.
PS: I tried to plot the curve using the real and imaginary parts, i.e. Zreal and Zimg:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
train_df = pd.read_csv("above_data_with_around_100_rows.csv")
plt.figure()
plt.plot(train_df["Zreal"], train_df["Zimg"], "b")
plt.plot(train_df["Zreal"], -train_df["Zimg"], "r")
plt.show()
Can this be useful for a Nyquist plot?
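For what it's worth, the usual convention for a Nyquist plot of impedance data is Zreal on the x-axis and -Zimg on the y-axis, with equal axis scaling so arcs stay round. A minimal sketch under that assumption, using the column names from the CSV header above:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("above_data_with_around_100_rows.csv")
plt.figure()
plt.plot(df["Zreal"], -df["Zimg"], "b.-")  # convention: -Im(Z) against Re(Z)
plt.xlabel("Zreal")
plt.ylabel("-Zimg")
plt.gca().set_aspect("equal")              # keep semicircular arcs round
plt.show()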

How to save Confusion Matrix plot so that I can call it for future reference?

I was using the latest function, sklearn.metrics.plot_confusion_matrix, to plot my confusion matrix.
cm = plot_confusion_matrix(classifier,X , y_true,cmap=plt.cm.Greens)
When I execute that cell, the confusion matrix plot shows up as expected. My problem is that I want to reuse the plot in another cell later. When I call cm in another cell, it only shows the repr of the object:
>>> cm
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1af790ac6a0>
Calling plt.show() doesn't work either.
To make this work the way you expect, call cm.plot().
Proof
Let's try to do it in a reproducible fashion:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import plot_confusion_matrix
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
np.random.seed(42)
X, y = make_classification(1000, 10, n_classes=2)
clf = RandomForestClassifier()
clf.fit(X, y)
cm = plot_confusion_matrix(clf, X, y, cmap=plt.cm.Greens)
You can plot your cm object later as:
cm.plot(cmap=plt.cm.Greens);
For reference, you can list the methods and attributes available on the cm object with:
[method for method in dir(cm) if not method.startswith("__")]
['ax_',
'confusion_matrix',
'display_labels',
'figure_',
'im_',
'plot',
'text_']
And you can save the figure to disk with:
cm.figure_.savefig('conf_mat.png', dpi=300)
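Note that plot_confusion_matrix was deprecated in scikit-learn 1.0 and removed in 1.2. On recent versions, an equivalent sketch uses ConfusionMatrixDisplay.from_estimator, which returns the same kind of display object (reusing clf, X, y from the snippet above):
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
cm = ConfusionMatrixDisplay.from_estimator(clf, X, y, cmap=plt.cm.Greens)
cm.plot(cmap=plt.cm.Greens)                  # re-plot later, exactly as above
cm.figure_.savefig('conf_mat.png', dpi=300)  # save for future reference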

How to convert scalar array to 2d array?

I am new to machine learning and am facing issues converting a scalar array to the 2D array that scikit-learn expects.
I am trying to implement polynomial regression in Spyder. Here is my code, please help!
# Polynomial Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
poly_reg.fit(X_poly, y)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
# Predicting a new result with Linear Regression
lin_reg.predict(6.5)
# Predicting a new result with Polynomial Regression
lin_reg_2.predict(poly_reg.fit_transform(6.5))
ValueError: Expected 2D array, got scalar array instead: array=6.5.
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
This error comes from newer versions of scikit-learn, which require predict to receive a 2D array; it is not specific to Jupyter or Spyder (older versions merely accepted the scalar with a deprecation warning).
To resolve it, turn the value into a 2D NumPy array:
lin_reg.predict(np.array(6.5).reshape(1, -1))
lin_reg_2.predict(poly_reg.fit_transform(np.array(6.5).reshape(1, -1)))
The issue with your code is lin_reg.predict(6.5).
If you read the error message, it says the model requires a 2D array, but 6.5 is a scalar.
Why? Your training data X is 2D, so anything you pass to the model for prediction must also be 2D.
You can achieve this with .reshape(-1, 1), which creates a column vector (a single feature), or with .reshape(1, -1) if you have a single sample.
The thing to remember is that, to predict, you must prepare the data in the same shape as the original training data. See the small demo below.
If you need any more info, let me know.
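As a quick illustration of the two reshapes (plain NumPy, independent of the question's data):
import numpy as np
a = np.array(6.5)
print(a.shape)                 # () -- a scalar array
print(a.reshape(1, -1).shape)  # (1, 1) -- one sample with one feature
b = np.array([1.0, 2.0, 3.0])
print(b.reshape(-1, 1).shape)  # (3, 1) -- three samples of one feature
print(b.reshape(1, -1).shape)  # (1, 3) -- one sample with three features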
You have to give the input as a 2D array, so try this (note the double brackets; [6.5] alone would still be a 1D array):
lin_reg.predict([[6.5]])
lin_reg_2.predict(poly_reg.fit_transform([[6.5]]))

How to visualize an sklearn GradientBoostingClassifier?

I've trained a gradient boost classifier, and I would like to visualize it using the graphviz_exporter tool shown here.
When I try it I get:
AttributeError: 'GradientBoostingClassifier' object has no attribute 'tree_'
This is because the graphviz_exporter is meant for decision trees, but I guess there's still a way to visualize it, since the gradient boosting classifier is built on an ensemble of decision trees.
Does anybody know how to do that?
The attribute estimators_ contains the underlying decision trees. The following code displays one of the trees of a trained GradientBoostingClassifier. Notice that although the ensemble is a classifier as a whole, each individual tree is a regression tree that computes floating point values.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_graphviz
import numpy as np
# Fictitious data
np.random.seed(0)
X = np.random.normal(0,1,(1000, 3))
y = X[:,0]+X[:,1]*X[:,2] > 0
# Classifier
clf = GradientBoostingClassifier(max_depth=3, random_state=0)
clf.fit(X[:600], y[:600])
# Get the tree number 42
sub_tree_42 = clf.estimators_[42, 0]
# Visualization
# Install graphviz: https://www.graphviz.org/download/
from pydotplus import graph_from_dot_data
from IPython.display import Image
dot_data = export_graphviz(
sub_tree_42,
out_file=None, filled=True, rounded=True,
special_characters=True,
proportion=False, impurity=False, # enable them if you want
)
graph = graph_from_dot_data(dot_data)
Image(graph.create_png())
[Image: tree number 42, rendered with Graphviz]
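If installing Graphviz is inconvenient, recent scikit-learn (0.21+) also ships sklearn.tree.plot_tree, a matplotlib-only renderer. A minimal sketch reusing sub_tree_42 from the code above:
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
plt.figure(figsize=(12, 6))
plot_tree(sub_tree_42, filled=True, rounded=True, impurity=False)  # same options as export_graphviz
plt.show()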

Sklearn.mixture.dpgmm not functioning correctly

I'm having trouble with sklearn.mixture.dpgmm. The main issue is that it does not return correct covariances for synthetic data (two well-separated 2D Gaussians), where it really should have no trouble. In particular, in dpgmm._get_covars() the covariance matrices have diagonal elements that are always exactly 1.0 too large, regardless of the input data distribution. This looks like a bug, since GMM works perfectly when limited to the known exact number of groups.
Another issue is that dpgmm.weights_ makes no sense: the values sum to one but appear meaningless.
Does anyone have a solution to this or see something clearly wrong with my example?
Here is the exact script I'm running:
import itertools
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl
import pdb
from sklearn import mixture
# Generate 2D random sample, two gaussians each with 10000 points
rsamp1 = np.random.multivariate_normal(np.array([5.0,5.0]),np.array([[1.0,-0.2],[-0.2,1.0]]),10000)
rsamp2 = np.random.multivariate_normal(np.array([0.0,0.0]),np.array([[0.2,-0.0],[-0.0,3.0]]),10000)
X = np.concatenate((rsamp1,rsamp2),axis=0)
# Fit a mixture of Gaussians with EM using 2 components
gmm = mixture.GMM(n_components=2, covariance_type='full',n_iter=10000)
gmm.fit(X)
# Fit a Dirichlet process mixture of Gaussians using 10 components
dpgmm = mixture.DPGMM(n_components=10, covariance_type='full',min_covar=0.5,tol=0.00001,n_iter = 1000000)
dpgmm.fit(X)
print("Groups With data in them")
print(np.unique(dpgmm.predict(X)))
## Print the input and output covariances as an example; they should be very similar
correct_c0 = np.array([[1.0,-0.2],[-0.2,1.0]])
print("Input covar")
print(correct_c0)
covars = dpgmm._get_covars()
c0 = np.round(covars[0],decimals=1)
print("Output covar")
print(c0)
print("Output Variances Too Big by 1.0")
According to the DPGMM docs, this class was deprecated in version 0.18 and removed in version 0.20.
You should use the BayesianGaussianMixture class instead, with the parameter weight_concentration_prior_type set to "dirichlet_process".
Hope it helps.
Instead of writing
from sklearn.mixture import GMM
gmm = GMM(2, covariance_type='full', random_state=0)
you should write:
from sklearn.mixture import BayesianGaussianMixture
gmm = BayesianGaussianMixture(n_components=2, covariance_type='full', random_state=0)
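Putting the two suggestions together, a minimal sketch on the question's synthetic data (X as built above; the dirichlet_process prior is what makes BayesianGaussianMixture behave like the old DPGMM):
from sklearn.mixture import BayesianGaussianMixture
dpgmm = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    weight_concentration_prior_type='dirichlet_process',
    max_iter=1000,
    random_state=0,
)
dpgmm.fit(X)
print(np.unique(dpgmm.predict(X)))                  # components that actually received data
print(np.round(dpgmm.covariances_[0], decimals=1))  # public attribute, no _get_covars() here
print(dpgmm.weights_)                               # mixture weights, sum to 1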
