Error while fitting the data in KNN in python - python-3.x

This is my code
clf = KNN(n_neighbors = 1)
clf.fit(train_x, train_y)
train_predict = clf.predict(train_x)
k = f1_score(train_predict,train_y)
print("Training F1 Score:",k)
test_predict = clf.predict(test_x)
k = f1_score(test_predict,test_y)
print("Test F1 score:",k)
I am getting the error
Found input variables with inconsistent numbers of samples: [668, 223]
The shape of the data is
train_x=(668, 24),
train_y=(223,24)
Please help me, Thanks in advance

If you look at the documentation of the fit function
Fit the model using X as training data and y as target values
Parameters:
X : {array-like, sparse matrix, BallTree, KDTree}
Training data. If array or matrix, shape [n_samples, n_features], or [n_samples, n_samples] if metric=’precomputed’.
y : {array-like, sparse matrix}
Target values of shape = [n_samples] or [n_samples, n_outputs]
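It is clear that the shapes of train_x and train_y don't match what the fit function requires: X and y must have the same number of samples along the first axis, but here they have 668 and 223. One possibility is that the dimensions of train_x and train_y are flipped in your case, in which case you could try:
clf.fit(train_x.T, train_y.T)
Note, however, that transposing makes the 24 columns act as samples, so first check that train_y really holds the labels. A (223, 24) array with the same 24 columns as train_x looks more like the feature matrix of a held-out split than a target vector.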
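One way to end up with exactly these shapes is unpacking the result of scikit-learn's train_test_split in the wrong order. This is an assumption, since the question does not show how the data was split. The sketch below uses made-up random data (the array sizes and names X, y are hypothetical) to show the unpacking order that gives matching sample counts:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((891, 24))           # hypothetical feature matrix
y = rng.integers(0, 2, size=891)    # hypothetical binary labels

# train_test_split returns X_train, X_test, y_train, y_test, in that order.
# Unpacking it as (train_x, train_y, test_x, test_y) would put the test
# features into train_y and produce the "inconsistent numbers of samples" error.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(train_x.shape, train_y.shape)  # first dimensions must agree
print(test_x.shape, test_y.shape)

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(train_x, train_y)
```

Printing the shapes right after the split is a cheap way to catch this kind of mix-up before calling fit.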

Related

Output of the model depends on the shape of the weights tensor

I want to train the model to sum the three inputs. So it is as simple as possible.
First, the weights are initialized randomly. This gives a poor error (approx. 0.5).
Then I initialize the weights with zeros. There are two options:
the shape of the weights tensor is [1, 3]
the shape of the weights tensor is [3]
When I choose the 1st option, the model still performs badly and can't learn this simple formula.
When I choose the 2nd option, it works perfectly, with an error of about 10e-12.
Why does the result depend on the shape of the weights? Why do I need to initialize the model with zeros to solve this simple problem?
import torch
from torch.nn import Sequential as Seq, Linear as Lin
from torch.optim.lr_scheduler import ReduceLROnPlateau
X = torch.rand((1024, 3))
y = (X[:,0] + X[:,1] + X[:,2])
m = Seq(Lin(3, 1, bias=False))
# 1 option
m[0].weight = torch.nn.parameter.Parameter(torch.tensor([[0, 0, 0]], dtype=torch.float))
# 2 option
#m[0].weight = torch.nn.parameter.Parameter(torch.tensor([0, 0, 0], dtype=torch.float))
optim = torch.optim.SGD(m.parameters(), lr=10e-2)
scheduler = ReduceLROnPlateau(optim, 'min', factor=0.5, patience=20, verbose=True)
mse = torch.nn.MSELoss()
for epoch in range(500):
    optim.zero_grad()
    out = m(X)
    loss = mse(out, y)
    loss.backward()
    optim.step()
    if epoch % 20 == 0:
        print(loss.item())
    scheduler.step(loss)
The first option doesn't learn because of broadcasting: out has shape (1024, 1), while the targets y have shape (1024,). MSELoss computes the mean of (out - y)^2, which in this case broadcasts to shape (1024, 1024), clearly the wrong objective for this task. With the second option, out has shape (1024,), so (out - y)^2 also has shape (1024,) and its mean is the actual MSE. The default approach, without explicitly changing the shape of the weights (as options 1 and 2 do), would work if you give the target the shape (1024, 1), for example with y = y.unsqueeze(-1) right after y is defined.
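The broadcasting effect can be checked directly on the shapes. This is a minimal sketch with a smaller batch (8 instead of 1024, chosen only to keep the output readable); the variable names bad_diff and good_diff are made up for illustration:

```python
import torch

torch.manual_seed(0)
X = torch.rand(8, 3)
y = X.sum(dim=1)                                   # shape (8,)

layer = torch.nn.Linear(3, 1, bias=False)
out = layer(X)                                     # shape (8, 1)

# (8, 1) minus (8,) broadcasts to (8, 8): every prediction is compared
# against every target, not only its own.
bad_diff = out - y

# Giving the target a trailing dimension keeps the comparison element-wise.
good_diff = out - y.unsqueeze(-1)                  # shape (8, 1)

print(bad_diff.shape, good_diff.shape)
```

Recent PyTorch versions also emit a warning from MSELoss when the input and target sizes differ, which is worth treating as an error during development.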

Input dimension for CrossEntropy Loss in PyTorch

For a binary classification problem with batch_size = 1, I have logit and label values using which I need to calculate loss.
logit: tensor([0.1198, 0.1911], device='cuda:0', grad_fn=<AddBackward0>)
label: tensor([1], device='cuda:0')
# calculate loss
loss_criterion = nn.CrossEntropyLoss()
loss_criterion.cuda()
loss = loss_criterion( b_logits, b_labels )
However, this always results in the following error,
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
What input dimensions is the CrossEntropyLoss actually asking for?
You are passing tensors of the wrong shape.
The shapes should be (from the docs):
Input: (N, C) where C = number of classes
Target: (N) where each value is 0 ≤ targets[i] ≤ C−1
So here, b_logits should have shape ([1, 2]) instead of ([2]). To get the right shape, you can use Tensor.view, e.g. b_logits.view(1, -1).
b_labels should have shape ([1]).
Example:
b_logits = torch.tensor([0.1198, 0.1911], requires_grad=True)
b_labels = torch.tensor([1])
loss_criterion = nn.CrossEntropyLoss()
loss = loss_criterion( b_logits.view(1,-1), b_labels )
loss
tensor(0.6581, grad_fn=<NllLossBackward>)
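As a sanity check, the same value can be reproduced by hand from the log-softmax of the batched logits. This is a small CPU-only sketch that reuses the numbers from the question; the names batched and manual are made up for illustration:

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.1198, 0.1911])
label = torch.tensor([1])

# Add the batch dimension: (2,) -> (1, 2), i.e. N=1 samples and C=2 classes.
batched = logits.unsqueeze(0)

loss = nn.CrossEntropyLoss()(batched, label)

# Cross entropy is the negative log-softmax at the index of the true class.
manual = -torch.log_softmax(batched, dim=1)[0, label.item()]

print(batched.shape, loss.item(), manual.item())
```

Here unsqueeze(0) does the same job as view(1, -1); either works for a single sample.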

Linear regression with pytorch

I tried to run linear regression on ForestFires dataset.
Dataset is available on Kaggle and gist of my attempt is here:
https://gist.github.com/Chandrak1907/747b1a6045bb64898d5f9140f4cf9a37
I am facing two problems:
1. The output of the prediction has shape 32x1 and the target data has shape 32.
input and target shapes do not match: input [32 x 1], target [32]
Using view I reshaped the predictions tensor.
y_pred = y_pred.view(inputs.shape[0])
Why is there a mismatch between the shapes of the predicted tensor and the actual tensor?
2. SGD in PyTorch never converges. I tried to compute MSE manually using
print(torch.mean((y_pred - labels)**2))
This value does not match
loss = criterion(y_pred,labels)
Can someone point out where the mistake is in my code?
Thank you.
Problem 1
This is the reference for MSELoss from the PyTorch docs: https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss
Shape:
- Input: (N,∗) where * means, any number of additional dimensions
- Target: (N,∗), same shape as the input
So you need to expand the dims of labels, (32) -> (32, 1), by using torch.unsqueeze(labels, 1) or labels.view(-1, 1).
https://pytorch.org/docs/stable/torch.html#torch.unsqueeze
torch.unsqueeze(input, dim, out=None) → Tensor
Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
Problem 2
After reviewing your code, I realized that you have added size_average param to MSELoss:
criterion = torch.nn.MSELoss(size_average=False)
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
That's why the two computed values did not match: with size_average=False the squared errors are summed instead of averaged. (In current PyTorch, reduction='sum' is the non-deprecated way to request this.) Here is some sample code:
import torch
import torch.nn as nn
loss1 = nn.MSELoss()
loss2 = nn.MSELoss(size_average=False)
inputs = torch.randn(32, 1, requires_grad=True)
targets = torch.randn(32, 1)
output1 = loss1(inputs, targets)
output2 = loss2(inputs, targets)
output3 = torch.mean((inputs - targets) ** 2)
print(output1) # tensor(1.0907)
print(output2) # tensor(34.9021)
print(output3) # tensor(1.0907)
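The same comparison can be written with the reduction argument that replaces the deprecated size_average. This is a sketch with a fixed seed so it is repeatable; the names mean_loss, sum_loss and manual_mean are made up for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
inputs = torch.randn(32, 1)
targets = torch.randn(32, 1)

mean_loss = nn.MSELoss(reduction='mean')(inputs, targets)  # default behaviour
sum_loss = nn.MSELoss(reduction='sum')(inputs, targets)    # like size_average=False
manual_mean = torch.mean((inputs - targets) ** 2)

# The mean reduction matches the manual computation; the sum reduction is
# larger by a factor equal to the number of elements (32 here).
print(torch.isclose(mean_loss, manual_mean))
print(torch.isclose(sum_loss, manual_mean * inputs.numel()))
```

With the sum reduction the gradients also scale with the batch size, so a learning rate tuned for the averaged loss may be too large, which is one possible contributor to the convergence problem.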

Resnet with Custom Data

I am trying to modify Resnet50 with my custom data as follows:
X = [[1.85, 0.460,... -0.606] ... [0.229, 0.543,... 1.342]]
y = [2, 4, 0, ... 4, 2, 2]
X is a feature vector of length 2000 for 784 images. y is an array of size 784 containing the binary representation of labels.
Here is the code:
def __classifyRenet(self, X, y):
image_input = Input(shape=(2000,1))
num_classes = 5
model = ResNet50(weights='imagenet',include_top=False)
model.summary()
last_layer = model.output
# add a global spatial average pooling layer
x = GlobalAveragePooling2D()(last_layer)
# add fully-connected & dropout layers
x = Dense(512, activation='relu',name='fc-1')(x)
x = Dropout(0.5)(x)
x = Dense(256, activation='relu',name='fc-2')(x)
x = Dropout(0.5)(x)
# a softmax layer for 5 classes
out = Dense(num_classes, activation='softmax',name='output_layer')(x)
# this is the model we will train
custom_resnet_model2 = Model(inputs=model.input, outputs=out)
custom_resnet_model2.summary()
for layer in custom_resnet_model2.layers[:-6]:
layer.trainable = False
custom_resnet_model2.layers[-1].trainable
custom_resnet_model2.compile(loss='categorical_crossentropy',
optimizer='adam',metrics=['accuracy'])
clf = custom_resnet_model2.fit(X, y,
batch_size=32, epochs=32, verbose=1,
validation_data=(X, y))
return clf
I am calling to function as:
clf = self.__classifyRenet(X_train, y_train)
It is giving an error:
ValueError: Error when checking input: expected input_24 to have 4 dimensions, but got array with shape (785, 2000)
Please help. Thank you!
1. First, understand the error.
Your input does not match the input that ResNet expects. ResNet's input should have shape (n_sample, 224, 224, 3), but yours has shape (785, 2000). According to your question, you have 784 images, each represented as an array of size 2000, which doesn't align with the original ResNet50 input shape of (224 x 224) no matter how you reshape it. That means you cannot use ResNet50 directly with your data. The only thing your code does is take the last layer of ResNet50 and add your own output layer to match your number of output classes.
2. Then, what you can do.
If you insist on using the ResNet architecture, you will need to change the input layer rather than only the output layer. You will also need to reshape your image data so that the convolution layers can use it. That means you cannot keep it as a (2000,) array; it needs to be something like (height, width, channel), just as ResNet and other architectures expect. Of course, you will also need to change the output layer, as you already did, so that you predict your own classes. Try something like:
model = ResNet50(input_shape=image_input_shape, include_top=False, weights='imagenet')
This way, you can specify a customized input image shape. You can check the GitHub code for more information (https://github.com/keras-team/keras/blob/master/keras/applications/resnet50.py). Here's part of the docstring:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.

1D Convolutional network with keras, error on input size

I'm trying to build a convolutional neural network for my dataset. My training dataset has 1209 examples of 800 features each.
Here's what part of the code looks like :
model = Sequential()
model.add(Conv1D(64, 3, activation='linear', input_shape=(1209, 800)))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation='linear'))
model.compile(loss=loss_type, optimizer=optimizer_type, metrics=[metrics_type])
model.fit(X, Y, validation_data=(X2,Y2),epochs = nb_epochs,
batch_size = batch_size,shuffle=True)
When I compile this code, I get the following error :
Error when checking input: expected conv1d_25_input to have 3 dimensions,
but got array with shape (1209, 800)
So I add a dimension, here's what I do :
X = np.expand_dims(X, axis=0)
X2 = np.expand_dims(X2, axis=0)
And then I get this error :
ValueError: Input arrays should have the same number of samples as target arrays.
Found 1 input samples and 1209 target samples.
My training data has now a shape like this (1, 1209, 800), should it be something else ?
Thanks a lot for reading this.
Instead of expanding the dimensions on X at axis 0, you should expand on axis 2. Thus, rather than X = np.expand_dims(X, axis=0), you need X = np.expand_dims(X, axis=2).
Afterwards, the shape of X should be (1209, 800, 1), and you should then specify input_shape=(800, 1) in your first layer.
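The difference between the two axes is easy to see on a placeholder array. This sketch uses zeros in place of the real data, and the names wrong and right are made up for illustration:

```python
import numpy as np

X = np.zeros((1209, 800))          # 1209 examples, 800 features each

# axis=0 puts the new dimension in front: Keras then sees 1 sample.
wrong = np.expand_dims(X, axis=0)
print(wrong.shape)                 # (1, 1209, 800)

# axis=2 appends a channel dimension: 1209 samples, 800 steps, 1 channel.
right = np.expand_dims(X, axis=2)
print(right.shape)                 # (1209, 800, 1)
```

Keras reads the first axis as the sample axis, which is why the axis=0 version reports 1 input sample against 1209 targets.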
