import numpy as np
from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(x, y)
# The coefficients
print('Coefficients: ', regr.coef_)
x1 = np.asanyarray(test[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y1 = np.asanyarray(test[['CO2EMISSIONS']])
xy = regr.predict(y1)
print(xy)  # an error is raised when printing this (ValueError)
This worked in simple linear regression, but it does not work here in multiple linear regression.
regr.predict expects input with the same number of feature columns as the x used in regr.fit.
Furthermore, when you want to predict something, it should be based on some input, not output.
So, xy = regr.predict(y1) is wrong.
You should try xy = regr.predict(x1) instead.
The reason it appears to work in simple regression (although it is still not correct) is that x1 and y1 then have the same shape: each is an array with a single column. As mentioned, the call should be regr.predict(x1) instead of regr.predict(y1), since you are trying to predict y1 from x1. The model cannot "distinguish" between x1 and y1 in simple regression because both have one column, so it does not raise an error.
However, in multiple regression you fit the model on an x array with several feature columns (three in this case). So when you run regr.predict(y1), it raises an error because y1 has only one column.
Just replace regr.predict(y1) with regr.predict(x1) and it will work for both simple and multiple regression.
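The feature-count mismatch can be reproduced without the original dataset. The sketch below is only an illustration: it assumes synthetic random data in place of the question's train and test frames, and the names X_train, X_test, y_train and y_test are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the question's data: 3 feature columns, 1 target column.
rng = np.random.default_rng(0)
true_coef = np.array([[1.0], [2.0], [3.0]])
X_train = rng.random((20, 3))
y_train = X_train @ true_coef + 0.5
X_test = rng.random((5, 3))
y_test = X_test @ true_coef + 0.5

regr = LinearRegression()
regr.fit(X_train, y_train)

# Correct: predict from the test features (3 columns, as in fit).
pred = regr.predict(X_test)
print(pred.shape)

# Incorrect: predict from the test targets (1 column) raises ValueError.
try:
    regr.predict(y_test)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

With a single feature, X_test and y_test would both have one column, so the incorrect call would run silently and return meaningless predictions.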
For data with the shape (num_samples,features), MinMaxScaler from sklearn.preprocessing can be used to normalize it easily.
However, when using the same method for time series data with the shape (num_samples, time_steps,features), sklearn will give an error.
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Making artificial time data
x1 = np.linspace(0,3,4).reshape(-1,1)
x2 = np.linspace(10,13,4).reshape(-1,1)
X1 = np.concatenate((x1*0.1,x2*0.1),axis=1)
X2 = np.concatenate((x1,x2),axis=1)
X = np.stack((X1,X2))
#Trying to normalize
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)  # <--- error here
ValueError: Found array with dim 3. MinMaxScaler expected <= 2.
This post suggests something like
(timeseries-timeseries.min())/(timeseries.max()-timeseries.min())
Yet, it only works for data with only 1 feature. Since my data has more than 1 feature, this method doesn't work.
How to normalize time series data with multiple features?
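To normalize a 3D tensor of shape (n_samples, timesteps, n_features), use the following:
(timeseries-timeseries.min(axis=(0,1),keepdims=True))/(timeseries.max(axis=(0,1),keepdims=True)-timeseries.min(axis=(0,1),keepdims=True))
Passing axis=(0,1) reduces over the sample and time-step axes, leaving one minimum and one maximum per feature, and keepdims=True keeps those results broadcastable against the original array. Thus each feature will be normalized independently. Reducing with axis=2 alone would collapse the feature axis instead, and the result would generally not broadcast back against the original array.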
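Per-feature scaling of a 3D array means taking each feature's minimum and maximum over all samples and all time steps. Here is a minimal sketch on the question's artificial data, showing two ways to do that: plain NumPy, and MinMaxScaler applied to a temporary 2D view. The variable names X_norm_np and X_norm_sk are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Rebuild the artificial data from the question: shape (2 samples, 4 steps, 2 features).
x1 = np.linspace(0, 3, 4).reshape(-1, 1)
x2 = np.linspace(10, 13, 4).reshape(-1, 1)
X1 = np.concatenate((x1 * 0.1, x2 * 0.1), axis=1)
X2 = np.concatenate((x1, x2), axis=1)
X = np.stack((X1, X2))

# Option 1: NumPy only. Reduce over samples and time steps, keep the feature axis.
X_min = X.min(axis=(0, 1), keepdims=True)  # shape (1, 1, 2)
X_max = X.max(axis=(0, 1), keepdims=True)  # shape (1, 1, 2)
X_norm_np = (X - X_min) / (X_max - X_min)

# Option 2: flatten to (n_samples * time_steps, n_features), scale, reshape back.
n_features = X.shape[-1]
scaler = MinMaxScaler()
X_norm_sk = scaler.fit_transform(X.reshape(-1, n_features)).reshape(X.shape)

print(np.allclose(X_norm_np, X_norm_sk))
```

The second option keeps a fitted scaler object, which is convenient when the same scaling has to be applied to new data later with scaler.transform.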
I'm doing a simple linear regression using sklearn (on Python 3.7), but the intercept and the fit command give me the same result.
Here is my code:
df = pd.DataFrame({'stud': [1,2,3,4,5,6,4,1,2,1,3],'red':[12000,23000,35000,47000,55000,67000,43000,15000, 25000,15000,31500]})
df
mat = np.matrix(df)
x = mat[:,1]
y = mat[:,0]
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr
lr.fit(x,y)
lr.intercept_
This last command gives me the following result:
array([-0.26514558])
and every value I pass to the predict command gives me the same result:
lr.predict(5)
array([[-0.26467182]])
Could anyone help me?
Hello, I am new to sklearn in Python and am trying to use it to predict some numbers based on two features. Here is the error I am getting:
ValueError: only 2 non-keyword arguments accepted
and here is my code:
from sklearn.linear_model import LinearRegression
import numpy as np
trainingData = np.array([[861, 16012018], [860, 12012018], [859, 9012018], [858, 5012018], [857, 2012018], [856, 29122017], [855, 26122017], [854, 22122017], [853, 19122017]])
trainingScores = np.array([11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2])
clf = LinearRegression(fit_intercept=True)
clf.fit(trainingScores,trainingData)
predictionData = np.array([862, 19012018 ])
x=clf.predict(predictionData)
print(x)
I am not sure what you are trying to do here, but change this line:
trainingScores = np.array([11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2])
to this (notice the extra square brackets around your data):
trainingScores = np.array([[11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2]])
Then change the order of params in fit() like this:
clf.fit(trainingData,trainingScores)
And finally change prediction data like this (again look at the extra square brackets):
predictionData = np.array([[862, 19012018]])
After that your code will run.
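Putting the three changes together gives a script along these lines. This is a sketch that only makes the code run, assuming the intent is to predict the six score columns from the two features. It does not claim the model is meaningful for this data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: one row per sample, two columns. Shape (9, 2).
trainingData = np.array([[861, 16012018], [860, 12012018], [859, 9012018],
                         [858, 5012018], [857, 2012018], [856, 29122017],
                         [855, 26122017], [854, 22122017], [853, 19122017]])

# Targets: the rows are wrapped in one outer list. Shape (9, 6).
trainingScores = np.array([[11, 18, 23, 33, 34, 6], [10, 19, 21, 33, 34, 1],
                           [14, 18, 22, 23, 31, 6], [16, 22, 29, 31, 33, 10],
                           [21, 24, 27, 30, 31, 6], [1, 14, 15, 20, 27, 7],
                           [1, 9, 10, 11, 15, 8], [2, 9, 27, 31, 35, 1],
                           [7, 13, 18, 22, 33, 2]])

clf = LinearRegression(fit_intercept=True)
clf.fit(trainingData, trainingScores)  # features first, targets second

# A single sample still needs to be 2D: shape (1, 2).
predictionData = np.array([[862, 19012018]])
x = clf.predict(predictionData)
print(x.shape)
print(x)
```

The prediction has one row per input sample and one column per target, so a single sample yields an array of shape (1, 6).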
You are fitting a linear regression model, so the target rows have to be passed to np.array as a single nested list. Try changing that line to:
trainingScores = np.array([
    [11,18,23,33,34,6],
    [10,19,21,33,34,1],
    [14,18,22,23,31,6],
    [16,22,29,31,33,10],
    [21,24,27,30,31,6],
    [1,14,15,20,27,7],
    [1,9,10,11,15,8],
    [2,9,27,31,35,1],
    [7,13,18,22,33,2]
])