Inverse Feature Scaling not working while predicting results - python-3.x

# Importing required libraries
import numpy as np
import pandas as pd
# Importing dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1: -1].values
y = dataset.iloc[:, -1].values
y = y.reshape(len(y), 1)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
scy = StandardScaler()
scX = StandardScaler()
X = scX.fit_transform(X)
y = scy.fit_transform(y)
# Training SVR model
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting results from SVR model
# this line is generating error
scy.inverse_transform(regressor.predict(scX.transform([[6.5]])))
I am trying to run this code to predict a value from the model, but when I run it I get the following error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.27861589].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
My instructor is using the same code and his works, but mine does not. I am new to machine learning. Can anybody tell me what I am doing wrong here?
Thanks for your help.

This happens because of the shape of your predictions: regressor.predict returns a 1D array, while scy.inverse_transform expects a 2D array of shape (n_samples, 1). Change your last line to this:
scy.inverse_transform([regressor.predict(scX.transform([[6.5]]))])
You can also use these lines to predict:
pred = regressor.predict(scX.transform([[6.5]]))
pred = pred.reshape(-1, 1)
scy.inverse_transform(pred)
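For reference, here is a minimal end-to-end sketch of the same workflow. Position_Salaries.csv is not available here, so the arrays below are made-up stand-ins (an assumption, not the real data); only the shapes matter for the fix.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in for Position_Salaries.csv: levels 1..10 and made-up salaries
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float).reshape(-1, 1) * 1000

# Scale features and target with separate scalers (both expect 2D input)
scX = StandardScaler()
scy = StandardScaler()
X_scaled = scX.fit_transform(X)
y_scaled = scy.fit_transform(y)

# SVR expects a 1D target, so flatten the scaled column vector for fitting
regressor = SVR(kernel="rbf")
regressor.fit(X_scaled, y_scaled.ravel())

# predict returns a 1D array of shape (1,)
pred_scaled = regressor.predict(scX.transform([[6.5]]))

# Reshape to (n_samples, 1) before undoing the target scaling
pred = scy.inverse_transform(pred_scaled.reshape(-1, 1))
print(pred_scaled.shape, pred.shape)
```

Reshaping with reshape(-1, 1) also works when predicting several samples at once, whereas wrapping the prediction in a list only gives the right shape for a single sample.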

Related

how to do a linear fit where my variable X is vector in 3d?

I need to do a linear fit of the form:
Y=a*X+b
I need to find the values of a and b that fit the experimental data. The first thing that occurred to me was to use the polyfit function, but the problem is that in my data X is a vector with 3 entries. This is my code:
p_0=np.array([10,10,10])
p_1=np.array([100,10,10])
p_2=np.array([10,100,10])
p_3=np.array([10,10,100])
# Experimental data:
x=np.array([p_0,p_1,p_2,p_3])
y=np.array([35,60,75,65])
a=np.polyfit(x, y,1)
print(a)
I was expecting it to print a list of lists with the matrix a and the matrix b, but I got TypeError("expected 1D vector for x").
Is there any way to do this with numpy or some other library?
sklearn can be used for this:
import numpy as np
from sklearn.linear_model import LinearRegression
model = LinearRegression()
p_0=np.array([10,10,10])
p_1=np.array([100,10,10])
p_2=np.array([10,100,10])
p_3=np.array([10,10,100])
# Experimental data:
x=np.array([p_0,p_1,p_2,p_3])
y=np.array([35,60,75,65])
model.fit(X=x, y=y)
print("coeff: ", *model.coef_)
print("intercept: ", model.intercept_)
output:
coeff: 0.27777777777777785 0.44444444444444464 0.33333333333333337
intercept: 24.444444444444436
A few other nice features of the sklearn package:
model.score(x,y) # 1.0
model.rank_ # 3
model.predict([[1,2,3]]) # array([26.61111111])
One way to go about this is using numpy.linalg.lstsq:
# Experimental data:
x=np.array([p_0,p_1,p_2,p_3])
y=np.array([35,60,75,65])
A = np.column_stack([x, np.ones(len(x))])
coefs = np.linalg.lstsq(A, y, rcond=None)[0]  # rcond=None avoids the FutureWarning
print(coefs)
# [ 0.27777778  0.44444444  0.33333333 24.44444444]
Another option is to use LinearRegression from sklearn:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(x, y)
print (reg.coef_, reg.intercept_)
# array([0.27777778, 0.44444444, 0.33333333]), 24.444444444444443

How to get a specific sample from pytorch DataLoader?

In Pytorch, is there any way of loading a specific single sample using the torch.utils.data.DataLoader class? I'd like to do some testing with it.
The tutorial uses
trainloader = torch.utils.data.DataLoader(...)
images, labels = next(iter(trainloader))
to fetch a random batch of samples. Is there a way, using DataLoader, to get a specific sample?
Cheers
1. Turn off shuffle in the DataLoader.
2. Use batch_size to compute the index of the batch that contains the sample you are looking for.
3. Iterate to that batch.
Code
import torch
import numpy as np
import itertools
X= np.arange(100)
batch_size = 2
dataloader = torch.utils.data.DataLoader(X, batch_size=batch_size, shuffle=False)
sample_at = 5
k = int(np.floor(sample_at/batch_size))
my_sample = next(itertools.islice(dataloader, k, None))
print (my_sample)
Output:
tensor([4, 5])
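Note that the snippet above returns the whole batch that contains the sample. A small sketch with the same setup that also picks the single sample out of that batch; the names batch_index and offset are introduced here for illustration only:

```python
import itertools

import numpy as np
import torch

X = np.arange(100)
batch_size = 2
dataloader = torch.utils.data.DataLoader(X, batch_size=batch_size, shuffle=False)

sample_at = 5
# Batch that holds the sample, and the sample's position inside that batch
batch_index, offset = divmod(sample_at, batch_size)

# Skip ahead to the batch, then index into it
batch = next(itertools.islice(dataloader, batch_index, None))
my_sample = batch[offset]
print(my_sample)
```

This still iterates through every earlier batch, so it is only practical for small indices or small datasets.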
If you want to get a specific single sample from your dataset, you can use the Subset class (https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset).
Something like this:
indices = [0,1,2] # select your indices here as a list
subset = torch.utils.data.Subset(train_set, indices)
trainloader = torch.utils.data.DataLoader(subset, batch_size=16, shuffle=False)  # set shuffle to False
for image, label in trainloader:
    print(image.size(), '\t', label.size())
    print(image[0], '\t', label[0])  # index the specific sample
Here is a useful link if you want to learn more about the PyTorch data loading utilities:
(https://pytorch.org/docs/stable/data.html)
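If the loader wraps a map-style dataset, another option is to skip iteration and index the underlying dataset through the loader's dataset attribute. A minimal sketch, assuming a TensorDataset built from made-up tensors in place of the real training set:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up data standing in for a real training set: 10 samples with 2 features each
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
train_set = TensorDataset(features, labels)

# Shuffling does not matter here because the loader is never iterated
trainloader = DataLoader(train_set, batch_size=4, shuffle=True)

sample_at = 5
# Map-style datasets support direct indexing, independent of batching and shuffling
image, label = trainloader.dataset[sample_at]
print(image, label)
```

This returns the sample as the dataset produces it (including any transforms the dataset applies), without a batch dimension.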

DataConversionWarning on sklearn Logistic Regression

I am trying to perform a logistic regression in sklearn as shown below:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
mod_data2 = mod_data.copy()
classifier.fit(mod_data2[['prob1_norm', 'prob2_norm']].values.reshape(-1,2), mod_data2['Success'].values.reshape(-1,1))
But it is giving me the following warning:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
I have tried using .ravel() on the end of my input data but then it tells me I have the wrong dimensions.
Thanks
df.squeeze() should work. It converts a single-column DataFrame to a Series, and when I used it the conversion warning went away.
y = mod_data2['Success'].squeeze()
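A minimal sketch of the shapes LogisticRegression expects: a 2D X of shape (n_samples, n_features) and a 1D y of shape (n_samples,). The column names come from the question, but the values below are made up for illustration. Note that ravel() (or squeeze()) belongs on y only; applying it to X would flatten the feature matrix, which may be what produced the wrong-dimensions error.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.exceptions import DataConversionWarning
from sklearn.linear_model import LogisticRegression

# Made-up data using the column names from the question
mod_data = pd.DataFrame({
    "prob1_norm": np.linspace(0, 1, 40),
    "prob2_norm": np.tile([0.2, 0.8], 20),
})
mod_data["Success"] = (mod_data["prob1_norm"] + mod_data["prob2_norm"] > 1).astype(int)

X = mod_data[["prob1_norm", "prob2_norm"]].values  # 2D: (40, 2)
y = mod_data["Success"].values                     # 1D: (40,), no reshape(-1, 1)

# Turn the warning into an error to show that it is no longer raised
with warnings.catch_warnings():
    warnings.simplefilter("error", DataConversionWarning)
    classifier = LogisticRegression(random_state=0).fit(X, y)

print(X.shape, y.shape)
```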

How to normalize time series data with multiple features by using sklearn?

For data with the shape (num_samples,features), MinMaxScaler from sklearn.preprocessing can be used to normalize it easily.
However, when using the same method for time series data with the shape (num_samples, time_steps,features), sklearn will give an error.
from sklearn.preprocessing import MinMaxScaler
import numpy as np
#Making artificial time data
x1 = np.linspace(0,3,4).reshape(-1,1)
x2 = np.linspace(10,13,4).reshape(-1,1)
X1 = np.concatenate((x1*0.1,x2*0.1),axis=1)
X2 = np.concatenate((x1,x2),axis=1)
X = np.stack((X1,X2))
#Trying to normalize
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)  # <--- error here
ValueError: Found array with dim 3. MinMaxScaler expected <= 2.
This post suggests something like
(timeseries-timeseries.min())/(timeseries.max()-timeseries.min())
Yet, it only works for data with only 1 feature. Since my data has more than 1 feature, this method doesn't work.
How to normalize time series data with multiple features?
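To normalize a 3D tensor of shape (n_samples, timesteps, n_features) so that each feature is scaled independently, use the following:
(timeseries-timeseries.min(axis=(0,1),keepdims=True))/(timeseries.max(axis=(0,1),keepdims=True)-timeseries.min(axis=(0,1),keepdims=True))
Reducing over axis=(0,1) takes the minimum and maximum across all samples and time steps, leaving one value per feature, and keepdims=True keeps the result at shape (1, 1, n_features) so that it broadcasts against the original tensor. Reducing over axis=2 instead would collapse the feature axis, giving an array of shape (n_samples, timesteps) that mixes the features and does not broadcast against the 3D input.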
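If you would rather keep using MinMaxScaler, one common workaround is to flatten the sample and time axes into one, scale, and reshape back. A minimal sketch on the artificial data from the question, with a NumPy-only cross-check that takes the minimum and maximum over the sample and time axes; the variable names X_norm_np, X_min and X_max are introduced here for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Artificial time data from the question: shape (n_samples=2, time_steps=4, n_features=2)
x1 = np.linspace(0, 3, 4).reshape(-1, 1)
x2 = np.linspace(10, 13, 4).reshape(-1, 1)
X1 = np.concatenate((x1 * 0.1, x2 * 0.1), axis=1)
X2 = np.concatenate((x1, x2), axis=1)
X = np.stack((X1, X2))

n_samples, time_steps, n_features = X.shape

# Flatten to (n_samples * time_steps, n_features), scale per feature, reshape back
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X.reshape(-1, n_features)).reshape(X.shape)

# NumPy-only equivalent: one min/max per feature, kept broadcastable
X_min = X.min(axis=(0, 1), keepdims=True)
X_max = X.max(axis=(0, 1), keepdims=True)
X_norm_np = (X - X_min) / (X_max - X_min)

print(X_norm.shape, np.allclose(X_norm, X_norm_np))
```

Keeping the fitted scaler around has the advantage that the same flatten, transform, reshape steps can be applied to test data, and inverse_transform can undo the scaling later.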

Outputting coefficients when running linear regression using sklearn

I'm attempting to run a simple linear regression on a data set and retrieve the coefficients. The data, which is from a .csv file, looks like:
"","time","LakeHuron"
"1",1875,580.38
"2",1876,581.86
"3",1877,580.97
"4",1878,580.8
...
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
def Main():
    location = r"~/Documents/Time Series/LakeHuron.csv"
    ts = pd.read_csv(location, sep=",", parse_dates=[0], header=None)
    ts.drop(ts.columns[[0]], axis=1, inplace=True)
    length = len(ts)
    x = ts[1].values
    y = ts[2].values
    x = x.reshape(length, 1)
    y = y.reshape(length, 1)
    regr = linear_model.LinearRegression()
    regr.fit(x, y)
    print(regr.coef_)
if __name__ == "__main__":
    Main()
Since this is a simple linear model, $Y_t = a_0 + a_1 t$, which in this case should be $Y_t = 580.202 - 0.0242t$. However, all that prints out when running the above code is [[-0.02420111]]. Is there any way to get the other coefficient, 580.202?
I've had a look at the documentation at http://scikit-learn.org/stable/modules/linear_model.html and the example there outputs two values in the array.
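It looks like you have only one feature (X) and one target (Y), so the output is correct: coef_ holds only the slope, and the intercept is stored separately in intercept_.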
Try this:
#coef_ : array, shape (n_features, ) or (n_targets, n_features)
print(regr.coef_)
#intercept_ : array Independent term in the linear model.
print(regr.intercept_)
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
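A minimal sketch of reading both parameters back from a fitted model. The data below is made up and lies on an exact line (it is not the real LakeHuron series), so the recovered slope and intercept can be compared with the values used to generate it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data on an exact line y = 625.0 - 0.0242 * t (hypothetical values)
t = np.arange(1875, 1973, dtype=float).reshape(-1, 1)  # 2D: (n_samples, 1)
y = 625.0 - 0.0242 * t.ravel()                         # 1D: (n_samples,)

regr = LinearRegression().fit(t, y)

slope = regr.coef_[0]        # a_1, one entry per feature
intercept = regr.intercept_  # a_0, stored separately from coef_
print(slope, intercept)
```

With a 1D y as here, coef_ has shape (n_features,) and intercept_ is a scalar; with a 2D y of shape (n_samples, 1) as in the question, coef_ has shape (1, 1) and intercept_ has shape (1,).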
