DataConversionWarning on sklearn Logistic Regression - python-3.x

I am trying to perform a logistic regression in sklearn:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
mod_data2 = mod_data.copy()
classifier.fit(mod_data2[['prob1_norm', 'prob2_norm']].values.reshape(-1,2), mod_data2['Success'].values.reshape(-1,1))
But it gives me this warning:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
I have tried adding .ravel() to the end of my input data, but then it tells me I have the wrong dimensions.
Thanks

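df.squeeze() should work. It converts a single-column DataFrame into a Series, and when I used it the conversion warning went away. Only y needs to be 1D: the warning comes from the .reshape(-1,1) on y, so leave X with its two columns and pass y as a Series (mod_data2['Success'] is already one, so dropping .values.reshape(-1,1) has the same effect):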
y = mod_data2['Success'].squeeze()
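A minimal sketch of the shapes involved, using NumPy arrays instead of the DataFrame. X, success and the data values are hypothetical stand-ins for mod_data2. It fits the same LogisticRegression once with a column-vector y (shape (n, 1)) and once with a flattened y (shape (n,)), recording whether a DataConversionWarning is raised each time. X stays 2D in both fits; only y is flattened.

```python
import warnings

import numpy as np
from sklearn.exceptions import DataConversionWarning
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for mod_data2: two normalized features and a 0/1 target
rng = np.random.default_rng(0)
X = rng.random((20, 2))            # shape (n_samples, 2): X stays 2D
success = np.tile([0, 1], 10)      # both classes present

y_column = success.reshape(-1, 1)  # shape (20, 1): the column vector from the question
y_flat = y_column.ravel()          # shape (20,): what fit() expects for y

classifier = LogisticRegression(random_state=0)

def fit_warns(y):
    """Fit the classifier and report whether a DataConversionWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        classifier.fit(X, y)
    return any(issubclass(w.category, DataConversionWarning) for w in caught)

warned_column = fit_warns(y_column)
warned_flat = fit_warns(y_flat)
print(y_column.shape, y_flat.shape, warned_column, warned_flat)
```

With a pandas column the same idea applies: a single-column DataFrame passed through .squeeze(), or a Series selected with mod_data2['Success'], is already 1D.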

Related

Inverse Feature Scaling not working while predicting results

# Importing required libraries
import numpy as np
import pandas as pd
# Importing dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1: -1].values
y = dataset.iloc[:, -1].values
y = y.reshape(len(y), 1)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
scy = StandardScaler()
scX = StandardScaler()
X = scX.fit_transform(X)
y = scy.fit_transform(y)
# Training SVR model
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting results from the SVR model
# this line raises an error
scy.inverse_transform(regressor.predict(scX.transform([[6.5]])))
When I run this code to predict a value from the model, I get this error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.27861589].
Reshape your data either using an array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Complete Stack trace of error:
My instructor uses the same code and it works for him, but not for me. I am new to machine learning; can anybody tell me what I am doing wrong here?
Thanks for your help.
This is the data for reference
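It is because of the shape of your predictions: regressor.predict returns a 1D array, but scy was fitted on a 2D array of shape (n_samples, 1) and expects 2D input. Wrapping the prediction in a list makes it 2D with shape (1, n_predictions), which is enough for your single prediction. Change your last line to this: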
scy.inverse_transform([regressor.predict(scX.transform([[6.5]]))])
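You can also reshape the predictions into one column of shape (n_predictions, 1), the same layout scy was fitted on, which also works when you predict several values at once: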
pred = regressor.predict(scX.transform([[6.5]]))
pred = pred.reshape(-1, 1)
scy.inverse_transform(pred)
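A self-contained sketch of the reshape fix. Position_Salaries.csv is not available here, so X and y below are hypothetical made-up data with one feature. It also flattens the scaled target before SVR.fit, since a column-vector y makes fit raise the same DataConversionWarning as in the first question.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data: one feature (level) and one target (salary)
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = (X.ravel() ** 2) * 1000.0

scX = StandardScaler()
scy = StandardScaler()
X_scaled = scX.fit_transform(X)
# StandardScaler needs a 2D input; SVR wants a 1D target, so flatten afterwards
y_scaled = scy.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf")
regressor.fit(X_scaled, y_scaled)

levels = np.array([[6.5], [7.5]])                          # two samples, one feature each
pred_scaled = regressor.predict(scX.transform(levels))     # 1D, shape (2,)
pred = scy.inverse_transform(pred_scaled.reshape(-1, 1))   # back to salary units, shape (2, 1)
print(pred_scaled.shape, pred.shape)
```

Reshaping with (-1, 1) keeps one prediction per row, so the same line works for one level or for many.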

error while printing the predicted value in multiple linear regression

from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit (x, y)
# The coefficients
print ('Coefficients: ', regr.coef_)
x1 = np.asanyarray(test[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y1 = np.asanyarray(test[['CO2EMISSIONS']])
xy = regr.predict(y1)
print(xy)  # a ValueError is raised here
This worked in simple linear regression, but it does not work in multiple linear regression.
regr.predict expects input with the same shape as x, i.e. the same number of feature columns the model was fitted on.
Furthermore, when you want to predict something, it should be based on some input, not output.
So, xy = regr.predict(y1) is wrong.
You should try xy = regr.predict(x1) instead.
The reason it appears to work in simple regression (although it is not actually correct) is that x1 and y1 are then both single-column arrays with the same shape, so the algorithm cannot "distinguish" between them and does not raise an error. As mentioned, you should call regr.predict(x1) instead of regr.predict(y1), since you are trying to predict y1 from x1.
However, in multiple regression x is a 2D array with several columns (here 3 features: ENGINESIZE, CYLINDERS and FUELCONSUMPTION_COMB). So when you run regr.predict(y1), it raises an error because y1 has only 1 column while the model was fitted on 3.
Just replace regr.predict(y1) with regr.predict(x1) and it will work for both simple and multiple regression.
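A minimal sketch of the shape check, on hypothetical random data standing in for the train and test DataFrames (three feature columns, one target column). predict(x1) succeeds because x1 has the same 3 columns as x; predict(y1) fails because y1 has only 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 3 features (like ENGINESIZE, CYLINDERS, FUELCONSUMPTION_COMB)
rng = np.random.default_rng(42)
weights = np.array([[2.0], [3.0], [4.0]])
x = rng.random((30, 3))
y = x @ weights + 1.0            # shape (30, 1)

regr = LinearRegression()
regr.fit(x, y)

x1 = rng.random((5, 3))          # test inputs: same 3 columns as x
y1 = x1 @ weights + 1.0          # test targets: 1 column

xy = regr.predict(x1)            # predict from the inputs: shape (5, 1)
print(xy.shape)

predict_y1_failed = False
try:
    regr.predict(y1)             # 1 column, but the model was fitted on 3
except ValueError as err:
    predict_y1_failed = True
    print("ValueError:", err)
```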

How to normalize time series data with multiple features by using sklearn?

For data with the shape (num_samples,features), MinMaxScaler from sklearn.preprocessing can be used to normalize it easily.
However, when the same method is applied to time series data with the shape (num_samples, time_steps, features), sklearn raises an error:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Making artificial time series data
x1 = np.linspace(0,3,4).reshape(-1,1)
x2 = np.linspace(10,13,4).reshape(-1,1)
X1 = np.concatenate((x1*0.1,x2*0.1),axis=1)
X2 = np.concatenate((x1,x2),axis=1)
X = np.stack((X1,X2))
# Trying to normalize
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)  # <--- error here
ValueError: Found array with dim 3. MinMaxScaler expected <= 2.
This post suggests something like
(timeseries-timeseries.min())/(timeseries.max()-timeseries.min())
However, it only works for data with a single feature. Since my data has more than one feature, this method doesn't work.
How to normalize time series data with multiple features?
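To normalize a 3D tensor of shape (n_samples, timesteps, n_features) so that each feature is scaled independently, take the minimum and maximum over the sample and time-step axes (0 and 1), keeping the dimensions so the result broadcasts against the original array:
(timeseries-timeseries.min(axis=(0,1), keepdims=True))/(timeseries.max(axis=(0,1), keepdims=True)-timeseries.min(axis=(0,1), keepdims=True))
Reducing over axes 0 and 1 leaves one minimum and one maximum per feature (shape (1, 1, n_features)), so each feature is normalized independently. Reducing over axis=2, the feature axis itself, would instead take the minimum across the features at each time step, mixing the features together.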
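A sketch on the question's own artificial data comparing two ways of scaling each feature independently: plain NumPy with the min and max taken over axes 0 and 1, and MinMaxScaler applied after flattening the array to 2D (n_samples * time_steps, n_features) and reshaping it back. The two should agree.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same artificial data as in the question: shape (2 samples, 4 time steps, 2 features)
x1 = np.linspace(0, 3, 4).reshape(-1, 1)
x2 = np.linspace(10, 13, 4).reshape(-1, 1)
X1 = np.concatenate((x1 * 0.1, x2 * 0.1), axis=1)
X2 = np.concatenate((x1, x2), axis=1)
X = np.stack((X1, X2))

# Option 1: plain NumPy, one min/max per feature taken over samples and time steps
x_min = X.min(axis=(0, 1), keepdims=True)   # shape (1, 1, n_features)
x_max = X.max(axis=(0, 1), keepdims=True)
X_manual = (X - x_min) / (x_max - x_min)

# Option 2: flatten to 2D so MinMaxScaler sees one column per feature, then restore the shape
n_samples, time_steps, n_features = X.shape
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X.reshape(-1, n_features)).reshape(n_samples, time_steps, n_features)

print(np.allclose(X_manual, X_scaled))
```

The fitted scaler can later undo the scaling with scaler.inverse_transform, as long as its input goes through the same reshape to (-1, n_features) first.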

python 3 linear regression problems

I'm doing a simple linear regression using sklearn (on Python 3.7), but the intercept and the predict command give me the same result.
Here is my code:
df = pd.DataFrame({'stud': [1,2,3,4,5,6,4,1,2,1,3],'red':[12000,23000,35000,47000,55000,67000,43000,15000, 25000,15000,31500]})
df
mat = np.matrix(df)
x = mat[:,1]
y = mat[:,0]
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr
lr.fit(x,y)
lr.intercept_
This last command gives me the following result:
array([-0.26514558])
and every value I pass to the predict command gives me the same result:
lr.predict(5)
array([[-0.26467182]])
Could anyone help me?
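Going by the column order of the DataFrame (stud first, then red), x = mat[:,1] is the red column and y = mat[:,0] is stud, so the code predicts stud from red. That slope is tiny (roughly one stud per ten thousand red), so predictions for small inputs such as 5 all land close to the intercept. The sketch below, which uses plain NumPy arrays in place of np.matrix, shows that fit and then a swapped fit, assuming the intent was to predict red from stud. Recent scikit-learn versions also require predict input to be 2D, e.g. [[5]] rather than 5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

stud = np.array([1, 2, 3, 4, 5, 6, 4, 1, 2, 1, 3], dtype=float)
red = np.array([12000, 23000, 35000, 47000, 55000, 67000,
                43000, 15000, 25000, 15000, 31500], dtype=float)

# As written in the question: x is the 'red' column, y is 'stud'
lr_as_written = LinearRegression().fit(red.reshape(-1, 1), stud)
print(lr_as_written.coef_, lr_as_written.intercept_)   # tiny slope

# Assumed intent: predict 'red' from 'stud'
lr = LinearRegression().fit(stud.reshape(-1, 1), red)
preds = lr.predict([[5], [6]])   # 2D input: one row per sample
print(preds)
```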

how to resolve this ValueError: only 2 non-keyword arguments accepted sklearn python

Hello, I am new to sklearn in Python. I am trying to learn it and use this module to predict some numbers based on two features. Here is the error I am getting:
ValueError: only 2 non-keyword arguments accepted
and here is my code:
from sklearn.linear_model import LinearRegression
import numpy as np
trainingData = np.array([[861, 16012018], [860, 12012018], [859, 9012018], [858, 5012018], [857, 2012018], [856, 29122017], [855, 26122017], [854, 22122017], [853, 19122017]])
trainingScores = np.array([11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2])
clf = LinearRegression(fit_intercept=True)
clf.fit(trainingScores,trainingData)
predictionData = np.array([862, 19012018 ])
x=clf.predict(predictionData)
print(x)
I am not sure what you are trying to do here, but change this line:
trainingScores = np.array([11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2])
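to this (notice the extra square brackets around your data; np.array takes all of the rows as one list, because its only positional arguments are the data and an optional dtype):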
trainingScores = np.array([[11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2]])
Then swap the arguments of fit() so that the two features (trainingData) are X and the scores are y:
clf.fit(trainingData,trainingScores)
And finally change prediction data like this (again look at the extra square brackets):
predictionData = np.array([[862, 19012018]])
After that your code will run.
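The three changes above can be put together in one runnable sketch. It also shows that passing the rows as separate arguments to np.array fails; the exact exception type and message depend on the NumPy version (older releases raise the ValueError from the question, newer ones may raise a TypeError), so both are caught.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

trainingData = np.array([[861, 16012018], [860, 12012018], [859, 9012018],
                         [858, 5012018], [857, 2012018], [856, 29122017],
                         [855, 26122017], [854, 22122017], [853, 19122017]])
# All rows inside ONE outer list: np.array takes the data as a single argument
trainingScores = np.array([[11, 18, 23, 33, 34, 6], [10, 19, 21, 33, 34, 1],
                           [14, 18, 22, 23, 31, 6], [16, 22, 29, 31, 33, 10],
                           [21, 24, 27, 30, 31, 6], [1, 14, 15, 20, 27, 7],
                           [1, 9, 10, 11, 15, 8], [2, 9, 27, 31, 35, 1],
                           [7, 13, 18, 22, 33, 2]])

clf = LinearRegression(fit_intercept=True)
clf.fit(trainingData, trainingScores)         # X: (9, 2) features, y: (9, 6) targets

predictionData = np.array([[862, 19012018]])  # 2D: one sample with two features
x = clf.predict(predictionData)
print(x.shape)                                # one row of six predicted values

# The original mistake: rows passed as separate positional arguments
separate_rows_failed = False
try:
    np.array([1, 2], [3, 4], [5, 6])
except (TypeError, ValueError) as err:
    separate_rows_failed = True
    print(type(err).__name__, err)
```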
You are writing linear regression code; try changing this line to this (all rows inside one outer pair of square brackets):
trainingScores = np.array([
[11,18,23,33,34,6],
[10,19,21,33,34,1],
[14,18,22,23,31,6],
[16,22,29,31,33,10],
[21,24,27,30,31,6],
[1,14,15,20,27,7],
[1,9,10,11,15,8],
[2,9,27,31,35,1],
[7,13,18,22,33,2]
])
