python 3 linear regression problems - python-3.x

I'm doing a simple linear regression using sklearn (on python3.7), but the intercept and the fit comand give me the same result.
Here is my code:
df = pd.DataFrame({'stud': [1,2,3,4,5,6,4,1,2,1,3],'red':[12000,23000,35000,47000,55000,67000,43000,15000, 25000,15000,31500]})
df
mat = np.matrix(df)
x = mat[:,1]
y = mat[:,0]
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr
lr.fit(x,y)
lr.intercept_
This last comand gives me the following result:
array([-0.26514558])
and every value I put on predict comand gives me the same result
lr.predict(5)
lr.predict(5)
array([[-0.26467182]])
Could anyone help me?

Related

Inverse Feature Scaling not working while predicting results

# Importing required libraries
import numpy as np
import pandas as pd
# Importing dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1: -1].values
y = dataset.iloc[:, -1].values
y = y.reshape(len(y), 1)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
scy = StandardScaler()
scX = StandardScaler()
X = scX.fit_transform(X)
y = scy.fit_transform(y)
# Training SVR model
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting results from SCR model
# this line is generating error
scy.inverse_transform(regressor.predict(scX.transform([[6.5]])))
I am trying to execute this code to predict values from the model but after running it I am getting errors like this:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.27861589].
Reshape your data either using an array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Complete Stack trace of error:
Even my instructor is using the same code but his one is working mine one not I am new to machine learning can anybody tell me what I am doing wrong here.
Thanks for your help.
This is the data for reference
It is because of the shape of your predictions, the scy is expecting an output with (-1, 1) shape. Change your last line to this:
scy.inverse_transform([regressor.predict(scX.transform([[6.5]]))])
You can also use this line to predict:
pred = regressor.predict(scX.transform([[6.5]]))
pred = pred.reshape(-1, 1)
scy.inverse_transform(pred)

how to do a linear fit where my variable X is vector in 3d?

I need to do a linear fit as follows:
Y=a*X+b
I need to find the values ​​of a and b that fit the experimental data
the first thing that occurred to me was to use the polyfit function,
but the problem is that in my data, X is a vector with 3 entries,
this is my code:
p_0=np.array([10,10,10])
p_1=np.array([100,10,10])
p_2=np.array([10,100,10])
p_3=np.array([10,10,100])
# Experimental data:
x=np.array([p_0,p_1,p_2,p_3])
y=np.array([35,60,75,65])
a=np.polyfit(x, y,1)
print(a)
I was expecting a list of lists to print, with the matrix and matrix b ... but I got TypeError("expected 1D vector for x")
Is there any way to do this with numpy or some other library?
sklearn can be used for this:
import numpy as np
from sklearn.linear_model import LinearRegression
model = LinearRegression()
p_0=np.array([10,10,10])
p_1=np.array([100,10,10])
p_2=np.array([10,100,10])
p_3=np.array([10,10,100])
# Experimental data:
x=np.array([p_0,p_1,p_2,p_3])
y=np.array([35,60,75,65])
model.fit(X=x, y=y)
print("coeff: ", *model.coef_)
print("intercept: ", model.intercept_)
output:
coeff: 0.27777777777777785 0.44444444444444464 0.33333333333333337
intercept: 24.444444444444436
A few other nice features of the sklearn package:
model.fit(x,y) # 1.0
model.rank_ # 3
model.predict([[1,2,3]]) # array([26.61111111])
One way to go about this is using numpy.linalg.lstsq:
# Experimental data:
x=np.array([p_0,p_1,p_2,p_3])
y=np.array([35,60,75,65])
A = np.column_stack([x, np.ones(len(x))])
coefs = np.linalg.lstsq(A, y)[0]
print (coefs)
# 0.27777778 0.44444444 0.33333333 24.44444444
Another option is to use LinearRegression from sklearn:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(x, y)
print (reg.coef_, reg.intercept_)
# array([0.27777778, 0.44444444, 0.33333333]), 24.444444444444443

DataConversionWarning on sklearn Logistic Regression

I am trying to perform a logistics regression in sklearn below:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
mod_data2 = mod_data.copy()
classifier.fit(mod_data2[['prob1_norm', 'prob2_norm']].values.reshape(-1,2), mod_data2['Success'].values.reshape(-1,1))
But it is giving me the error message:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
I have tried using .ravel() on the end of my input data but then it tells me I have the wrong dimensions.
Thanks
df.squeeze() should work. It converts a dataframe to a series and when I used it the warning conversion went away
y = mod_data2['Success'].squeeze()

how to resolve this ValueError: only 2 non-keyword arguments accepted sklearn python

hello i am new to sklearn in python and iam trying to learn it and use this module to predict some numbers based on two features here is the error i am getting:
ValueError: only 2 non-keyword arguments accepted
and here is my code:
from sklearn.linear_model import LinearRegression
import numpy as np
trainingData = np.array([[861, 16012018], [860, 12012018], [859, 9012018], [858, 5012018], [857, 2012018], [856, 29122017], [855, 26122017], [854, 22122017], [853, 19122017]])
trainingScores = np.array([11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2])
clf = LinearRegression(fit_intercept=True)
clf.fit(trainingScores,trainingData)
predictionData = np.array([862, 19012018 ])
x=clf.predict(predictionData)
print(x)
I am not sure what you are trying to do here, but change this line:
trainingScores = np.array([11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2])
to this (Notice the extra square brackets around your data):
trainingScores = np.array([[11,18,23,33,34,6],[10,19,21,33,34,1], [14,18,22,23,31,6],[16,22,29,31,33,10],[21,24,27,30,31,6],[1,14,15,20,27,7],[1,9,10,11,15,8],[2,9,27,31,35,1],[7,13,18,22,33,2]])
Then change the order of params in fit() like this:
clf.fit(trainingData,trainingScores)
And finally change prediction data like this (again look at the extra square brackets):
predictionData = np.array([[862, 19012018]])
After that your code will run.
You are doing a linear regression code in ML and try to change this line with
trainingScores = np.array(
[11,18,23,33,34,6],
[10,19,21,33,34,1],
[14,18,22,23,31,6],
[16,22,29,31,33,10],
[21,24,27,30,31,6],
[1,14,15,20,27,7],
[1,9,10,11,15,8],
[2,9,27,31,35,1],
[7,13,18,22,33,2]
)

Outputting coefficients when running linear regression using sklearn

I'm attempting to run a simple linear regression on a data set and retrieve the coefficients. The data which is from a a .csv file looks like:
"","time","LakeHuron"
"1",1875,580.38
"2",1876,581.86
"3",1877,580.97
"4",1878,580.8
...
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
def Main():
location = r"~/Documents/Time Series/LakeHuron.csv"
ts = pd.read_csv(location, sep=",", parse_dates=[0], header=None)
ts.drop(ts.columns[[0]], axis=1, inplace=True)
length = len(ts)
x = ts[1].values
y = ts[2].values
x = x.reshape(length, 1)
y = y.reshape(length, 1)
regr = linear_model.LinearRegression()
regr.fit(x, y)
print(regr.coef_)
if __name__ == "__main__":
Main()
Since this is a simple linear model then $Y_t = a_0 + a_1*t$, which in this case should be $Y_t = 580.202 -0.0242t$. and all that prints out when running the above code is [[-0.02420111]]. Is there anyway to get the second coefficient 580.202?
I've had a look at the documentation on http://scikit-learn.org/stable/modules/linear_model.html and it outputs two variables in the array.
Look like you only have one X and one Y, So output is correct.
Try this:
#coef_ : array, shape (n_features, ) or (n_targets, n_features)
print(regr.coef_)
#intercept_ : array Independent term in the linear model.
print(regr.intercept_)
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression

Resources