GridSearchCV is calculating negative score using my custom classifier - scikit-learn

I am creating a custom classifier for a multi-class classification task. To fine-tune its parameters, I used GridSearchCV with my own scoring method, as follows:
def accuracy(self, y, y_hat):
    return np.sum(y == y_hat) / len(y)
It gives me negative scores around -0.82, and I fail to understand why: by simple math this must return a positive number. When I call it without GridSearchCV, it stays around 0.82 to 0.9 on my train and validation set predictions. I am definitely missing some concept.
Here is the output:
Epoch 000 ==> Train acc = 0.7656, Val acc = 0.8092
Epoch 010 ==> Train acc = 0.9219, Val acc = 0.8318
Epoch 020 ==> Train acc = 0.8750, Val acc = 0.8233
Epoch 030 ==> Train acc = 0.8750, Val acc = 0.8497
Epoch 040 ==> Train acc = 0.8750, Val acc = 0.8332
Epoch 050 ==> Train acc = 0.8750, Val acc = 0.8407
[CV 1/2] END batchSize=64, c=10, epochs=50, lr=0.1;, score=-0.842 total time= 38.4s
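One likely explanation, though the scorer-wrapping code is not shown in the question: scikit-learn's make_scorer sign-flips the metric when greater_is_better=False, so that grid search can always maximize. A minimal sketch, assuming the method above is exposed as a plain function accuracy_fn(y, y_hat):

from sklearn.metrics import make_scorer

# greater_is_better=False makes sklearn negate the metric internally,
# which would turn a 0.842 accuracy into the reported -0.842.
bad_scorer = make_scorer(accuracy_fn, greater_is_better=False)

# For a metric where higher is better, keep the default:
good_scorer = make_scorer(accuracy_fn)  # reports +0.842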

Related

Precision, recall, F1 score with Sklearn on Pytorch

I've been looking through samples but am unable to understand how to integrate the precision, recall and F1 metrics for my model. My code is as follows:
for epoch in range(num_epochs):
    # Calculate accuracy (stack tutorial, no n_total)
    n_correct = 0
    n_total = 0
    for i, (words, labels) in enumerate(train_loader):
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)

        # Forward pass
        outputs = model(words)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # feedforward tutorial solution
        _, predicted = torch.max(outputs, 1)
        n_correct += (predicted == labels).sum().item()
        n_total += labels.shape[0]

    accuracy = 100 * n_correct / n_total

    # Push to matplotlib
    train_losses.append(loss.item())
    train_epochs.append(epoch)
    train_acc.append(accuracy)

    # Loss and accuracy
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.2f}, Acc: {accuracy:.2f}')
Since you have the predicted and the labels variables, you can aggregate them during the epoch loop and convert them to numpy arrays to calculate the required metrics.
At the beginning of each epoch, initialize two empty lists: one for predicted labels and one for ground-truth labels.
for epoch in range(num_epochs):
    predicted_labels, ground_truth_labels = [], []
    ...
Then, keep appending the respective entries to each list during the epoch:
    ...
    _, predicted = torch.max(outputs, 1)
    n_correct += (predicted == labels).sum().item()

    # appending
    predicted_labels.append(predicted.cpu().detach().numpy())
    ground_truth_labels.append(labels.cpu().detach().numpy())
    ...
Then, at the epoch end, you could use precision_recall_fscore_support with predicted_labels and ground_truth_labels as inputs.
Notes:
You'll probably have to refer to something like this to flatten the above two lists.
Read about torch.no_grad() and apply it as good practice during the calculation of metrics.
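A minimal sketch of that epoch-end computation (macro averaging is an assumption here; pick the average that fits your task):

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Flatten the per-batch arrays collected during the epoch into 1-D arrays.
y_pred = np.concatenate(predicted_labels).ravel()
y_true = np.concatenate(ground_truth_labels).ravel()

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='macro')
print(f'precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}')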

Cross validation for MNIST dataset with pytorch and sklearn

I am new to PyTorch and am trying to implement a feed-forward neural network to classify the MNIST data set. I have some problems when trying to use cross-validation. My data has the following shapes:
x_train: torch.Size([45000, 784])
y_train: torch.Size([45000])
I tried to use KFold from sklearn.
kfold = KFold(n_splits=10)
Here is the first part of my train method where I'm dividing the data into folds:
for train_index, test_index in kfold.split(x_train, y_train):
    x_train_fold = x_train[train_index]
    x_test_fold = x_test[test_index]
    y_train_fold = y_train[train_index]
    y_test_fold = y_test[test_index]
    print(x_train_fold.shape)
    for epoch in range(epochs):
        ...
The indices for the y_train_fold variable are right; they are simply [ 0 1 2 ... 4497 4498 4499]. But they are not for x_train_fold, which gets [ 4500 4501 4502 ... 44997 44998 44999]. And the same goes for the test folds.
For the first iteration I want the variable x_train_fold to be the first 4500 pictures, in other words to have the shape torch.Size([4500, 784]), but it has the shape torch.Size([40500, 784]).
Any tips on how to get this right?
I think you're confused!
Ignore the second dimension for a while. When you have 45000 points and use 10-fold cross-validation, what's the size of each fold? 45000/10, i.e. 4500.
It means that each of your folds will contain 4500 data points, and one of those folds will be used for testing and the remaining for training, i.e.:
For testing: one fold => 4500 data points => size: 4500
For training: remaining folds => 45000-4500 data points => size: 45000-4500=40500
Thus, for the first iteration, the first 4500 data points (corresponding indices) will be used for testing and the rest for training.
Given your data is x_train: torch.Size([45000, 784]) and y_train: torch.Size([45000]), this is how your code should look:
for train_index, test_index in kfold.split(x_train, y_train):
    print(train_index, test_index)

    x_train_fold = x_train[train_index]
    y_train_fold = y_train[train_index]
    x_test_fold = x_train[test_index]
    y_test_fold = y_train[test_index]

    print(x_train_fold.shape, y_train_fold.shape)
    print(x_test_fold.shape, y_test_fold.shape)
    break
[ 4500 4501 4502 ... 44997 44998 44999] [ 0 1 2 ... 4497 4498 4499]
torch.Size([40500, 784]) torch.Size([40500])
torch.Size([4500, 784]) torch.Size([4500])
So, when you say
I want the variable x_train_fold to be the first 4500 picture... shape torch.Size([4500, 784]).
you're wrong. This size corresponds to x_test_fold. In the first iteration, with 10 folds, x_train_fold will have 40500 points, thus its size is supposed to be torch.Size([40500, 784]).
I think I have it right now, but I feel the code is a bit messy, with 3 nested loops. Is there any simpler way to do it, or is this approach okay?
Here's my code for the training with cross validation:
def train(network, epochs, save_Model=False):
    total_acc = 0
    for fold, (train_index, test_index) in enumerate(kfold.split(x_train, y_train)):
        ### Dividing data into folds
        x_train_fold = x_train[train_index]
        x_test_fold = x_train[test_index]
        y_train_fold = y_train[train_index]
        y_test_fold = y_train[test_index]

        train = torch.utils.data.TensorDataset(x_train_fold, y_train_fold)
        test = torch.utils.data.TensorDataset(x_test_fold, y_test_fold)
        train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=False)
        test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False)

        for epoch in range(epochs):
            print('\nEpoch {} / {} \nFold number {} / {}'.format(epoch + 1, epochs, fold + 1, kfold.get_n_splits()))
            correct = 0
            network.train()
            for batch_index, (x_batch, y_batch) in enumerate(train_loader):
                optimizer.zero_grad()
                out = network(x_batch)
                loss = loss_f(out, y_batch)
                loss.backward()
                optimizer.step()
                pred = torch.max(out.data, dim=1)[1]
                correct += (pred == y_batch).sum()
                if (batch_index + 1) % 32 == 0:
                    print('[{}/{} ({:.0f}%)]\tLoss: {:.6f}\t Accuracy:{:.3f}%'.format(
                        (batch_index + 1) * len(x_batch), len(train_loader.dataset),
                        100. * batch_index / len(train_loader), loss.data,
                        float(correct * 100) / float(batch_size * (batch_index + 1))))
        total_acc += float(correct * 100) / float(batch_size * (batch_index + 1))
    total_acc = (total_acc / kfold.get_n_splits())
    print('\n\nTotal accuracy cross validation: {:.3f}%'.format(total_acc))
You messed up the indices. Having already split the data once,
x_train = x[train_index]
x_test = x[test_index]
y_train = y[train_index]
y_test = y[test_index]
you then index into the split tensors with mismatched indices:
x_fold = x_train[train_index]
y_fold = y_train[test_index]
It should be:
x_fold = x_train[train_index]
y_fold = y_train[train_index]
Though all the above answers provide a good example of how to split the dataset, I am curious about the way to implement the K-fold cross-validation itself. K-fold aims to estimate the skill of a machine learning model on unseen data: it uses a limited sample to estimate how the model is expected to perform in general when making predictions on data not used during training. (See the concept and explanation on Wikipedia: https://en.wikipedia.org/wiki/Cross-validation_(statistics).) Therefore, it is necessary to re-initialize the parameters of the to-be-trained model at the beginning of each fold. Otherwise, your model will have seen every sample in the dataset after K folds, and there is no such thing as validation (all are training samples).
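A minimal sketch of that re-initialization (Net is a hypothetical model class standing in for whatever network you use, and the optimizer choice is likewise an assumption):

for fold, (train_index, test_index) in enumerate(kfold.split(x_train, y_train)):
    # Fresh, untrained parameters for every fold, so this fold's test split
    # really is unseen data for this model instance.
    network = Net()
    optimizer = torch.optim.SGD(network.parameters(), lr=0.01)
    # ... build the fold's loaders from train_index / test_index and train as before ...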

my simple loss function causes NaN

I have written a custom loss myself, but after several steps the loss becomes NaN. My code is:
def my_loss(label_batch, logits_batch, alpha=1.3, beta=0.5):
    softmax_logits_batch = tf.nn.softmax(logits_batch, axis=-1)

    indices_not_0 = tf.where(tf.not_equal(label_batch, 0))  # non-zero indices
    indices_0 = tf.where(tf.equal(label_batch, 0))  # zero indices

    predict_not_0 = tf.gather_nd(softmax_logits_batch, indices_not_0)
    predict_0 = tf.gather_nd(softmax_logits_batch, indices_0)

    avg_p_not_0 = tf.reduce_mean(predict_not_0, axis=0)
    avg_p_0 = tf.reduce_mean(predict_0, axis=0)

    euclidean_distance = tf.sqrt(tf.reduce_sum(tf.square(avg_p_0 - avg_p_not_0)))
    max_value = tf.maximum(alpha - euclidean_distance, 0)
    return max_value
Some basic ideas behind it:
My loss is for semantic segmentation, which has only 2 categories.
The shape of label_batch is (?, H, W); all the values in it are 0 or 1. The shape of logits_batch is (?, H, W, 2); the values of logits_batch are the logits of the FCN (without softmax).
I want to find all the logits values whose label value is 0 or 1 (predict_0 or predict_not_0 respectively), via indices_0 or indices_not_0.
The shape of both predict_not_0 and predict_0 should be (?, 2).
Calculate the average value of predict_not_0 and predict_0 respectively (these represent the central point coordinates in Euclidean space for category 0 and category 1). Their shape should be (2,).
Calculate the Euclidean distance between the two central points; it should be larger than a certain alpha value (alpha = 1.3, for example).
Now, the problem is that after several steps the loss value becomes NaN. The output of the code (I used a very small learning rate) is:
Epoch[0],step[1],train batch loss = 2.87282,train acc = 0.486435.
Epoch[0],step[2],train batch loss = 2.87282,train acc = 0.485756.
Epoch[0],step[3],train batch loss = 2.87281,train acc = 0.485614.
Epoch[0],step[4],train batch loss = 2.87282,train acc = 0.485649.
Epoch[0],step[5],train batch loss = 2.87282,train acc = 0.485185.
Epoch[0],step[6],train batch loss = 2.87279,train acc = 0.485292.
Epoch[0],step[7],train batch loss = 2.87281,train acc = 0.485222.
Epoch[0],step[8],train batch loss = 2.87282,train acc = 0.484989.
Epoch[0],step[9],train batch loss = 2.87282,train acc = 0.48406.
Epoch[0],step[10],train batch loss = 2.8728,train acc = 0.483306.
Epoch[0],step[11],train batch loss = 2.87281,train acc = 0.483426.
Epoch[0],step[12],train batch loss = 2.8728,train acc = 0.482954.
Epoch[0],step[13],train batch loss = 2.87281,train acc = 0.482535.
Epoch[0],step[14],train batch loss = 2.87281,train acc = 0.482225.
Epoch[0],step[15],train batch loss = 2.87279,train acc = 0.482005.
Epoch[0],step[16],train batch loss = 2.87281,train acc = 0.48182.
Epoch[0],step[17],train batch loss = 2.87282,train acc = 0.48169.
Epoch[0],step[18],train batch loss = 2.8728,train acc = 0.481279.
Epoch[0],step[19],train batch loss = 2.87281,train acc = 0.480878.
Epoch[0],step[20],train batch loss = 2.87281,train acc = 0.480607.
Epoch[0],step[21],train batch loss = 2.87278,train acc = 0.480186.
Epoch[0],step[22],train batch loss = 2.87281,train acc = 0.479925.
Epoch[0],step[23],train batch loss = 2.87282,train acc = 0.479617.
Epoch[0],step[24],train batch loss = 2.87282,train acc = 0.479378.
Epoch[0],step[25],train batch loss = 2.87281,train acc = 0.479496.
Epoch[0],step[26],train batch loss = 2.87281,train acc = 0.479354.
Epoch[0],step[27],train batch loss = 2.87282,train acc = 0.479262.
Epoch[0],step[28],train batch loss = 2.87282,train acc = 0.479308.
Epoch[0],step[29],train batch loss = 2.87282,train acc = 0.479182.
Epoch[0],step[30],train batch loss = 2.22282,train acc = 0.478985.
Epoch[0],step[31],train batch loss = nan,train acc = 0.494112.
Epoch[0],step[32],train batch loss = nan,train acc = 0.508811.
Epoch[0],step[33],train batch loss = nan,train acc = 0.523289.
Epoch[0],step[34],train batch loss = nan,train acc = 0.536233.
Epoch[0],step[35],train batch loss = nan,train acc = 0.548851.
Epoch[0],step[36],train batch loss = nan,train acc = 0.561351.
Epoch[0],step[37],train batch loss = nan,train acc = 0.573149.
Epoch[0],step[38],train batch loss = nan,train acc = 0.584382.
Epoch[0],step[39],train batch loss = nan,train acc = 0.595006.
Epoch[0],step[40],train batch loss = nan,train acc = 0.605065.
Epoch[0],step[41],train batch loss = nan,train acc = 0.614475.
Epoch[0],step[42],train batch loss = nan,train acc = 0.623371.
Epoch[0],step[43],train batch loss = nan,train acc = 0.632092.
Epoch[0],step[44],train batch loss = nan,train acc = 0.640199.
Epoch[0],step[45],train batch loss = nan,train acc = 0.647391.
I used exactly the same code before, except that the loss function was tf.nn.sparse_softmax_cross_entropy_with_logits(), and everything worked, so I suppose there is something wrong in my new loss function.
I have a guess: maybe some batches only contain one category's labels (only 0 or only 1), so one of predict_not_0 and predict_0 would have no data, but I don't know how to write the code to check whether predict_not_0 and predict_0 actually contain data.
Can somebody help me find where the problem is, and how I can improve my loss function to avoid the NaN?
This is probably due to the use of tf.sqrt, which has the bad property of having an exploding gradient near 0. Therefore, you are progressively hitting more numerical instabilities as you converge.
The solution is to get rid of tf.sqrt. You could minimize the squared Euclidean distance, for example.
Another potential source of error is tf.reduce_mean, which returns NaN when operated on an empty list. You need to figure out what you want your loss to be when that happens.
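A sketch of both fixes, reusing the names from the question (treating an absent category as contributing a zero mean is one choice among several, and alpha would need re-tuning since the distance is now on a squared scale):

# Guarded mean: sum / max(count, 1), so an empty gather yields 0 instead of 0/0 = NaN.
count_0 = tf.maximum(tf.cast(tf.shape(predict_0)[0], tf.float32), 1.0)
count_not_0 = tf.maximum(tf.cast(tf.shape(predict_not_0)[0], tf.float32), 1.0)
avg_p_0 = tf.reduce_sum(predict_0, axis=0) / count_0
avg_p_not_0 = tf.reduce_sum(predict_not_0, axis=0) / count_not_0

# Squared Euclidean distance: same minimizer as before, but no exploding
# sqrt gradient near 0.
squared_distance = tf.reduce_sum(tf.square(avg_p_0 - avg_p_not_0))
max_value = tf.maximum(alpha - squared_distance, 0.0)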
NaN is caused by 0.0/0.0, log(0.0) or similar computations in many programming languages because of floating-point arithmetic, usually with very big or very small numbers (treated as infinity or zero due to limited precision).
tf.nn.softmax is not safe enough during training; try some other functions instead, like tf.log_softmax, tf.softmax_cross_entropy_with_logits and so on.

How to use SimpleRNN to model a modulus function & predict many to many sequence

I have designed this toy problem to understand the working of SimpleRNN in Keras.
My input sequence is:
[x1,x2,x3,x4,x5]
and the corresponding output is:
[(0+x1)%2,(x1+x2)%2,(x2+x3)%2,(x3+x4)%2,(x4+x5)%2)]
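A concrete case (my own example, not from the original post): x = [3, 6, 8, 2, 5] gives y = [(0+3)%2, (3+6)%2, (6+8)%2, (8+2)%2, (2+5)%2] = [1, 1, 0, 0, 1].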
My code is:
import numpy as np
import random
from scipy.ndimage.interpolation import shift

def generate_sequence():
    max_len = 5
    x = np.random.randint(1, 100, max_len)
    shifted_x = shift(x, 1, cval=0)
    y = (x + shifted_x) % 2
    return x.reshape(max_len, 1), y.reshape(max_len, 1), shifted_x.reshape(max_len, 1)

X_train = np.zeros((100, 5, 1))
y_train = np.zeros((100, 5, 1))
for i in range(100):
    x, y, z = generate_sequence()
    X_train[i] = x
    y_train[i] = y

X_test = np.zeros((100, 5, 1))
y_test = np.zeros((100, 5, 1))
for i in range(100):
    x, y, z = generate_sequence()
    X_test[i] = x
    y_test[i] = y
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN

batch_size = 70

model = Sequential()
model.add(SimpleRNN(3, input_shape=(5, 1), return_sequences=True, name='rnn'))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=200, verbose=0, validation_split=0.3)
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
When I train this SimpleRNN I only get an accuracy of 50%, even though each item in the sequence depends only on the previous item. Why is the RNN struggling to learn this?
100/100 [==============================] - 0s 37us/step
Test score: 0.6975522041320801
Test accuracy: 0.5120000243186951
UPDATE:
It turns out the mod function is very hard to model. When I switched to a simpler data generation strategy, like y[t] = x[t] < x[t-1], the model reached about 80% binary accuracy.
def generate_rnn_sequence():
    max_len = 5
    x = np.random.randint(1, 100, max_len)
    shifted_x = shift(x, 1, cval=0)
    y = (x < shifted_x).astype(float)
    return x.reshape(5, 1), y.reshape(5, 1)
So how do I model a mod function using an RNN?

How to output a prediction in Tensorflow?

I am trying to use a TensorFlow DNN for a Kaggle competition. The data is about 100 columns of categorical data, 29 columns of numerical data, and 1 column for the output. I split it into training and testing sets with X and y using scikit-learn's train_test_split function, where X is a list of each row without the "id" or the value that needs to be predicted, and y is the value that needs to be predicted. I then built the model, shown below:
import tensorflow as tf
import numpy as np
import time
import pickle

with open('pickle.pickle', 'rb') as f:
    trainX, trainy, testX, testy = pickle.load(f)

trainX = np.array(trainX)
trainy = np.array(trainy)
trainy = trainy.reshape(trainy.shape[0], 1)
testX = np.array(testX)
testy = np.array(testy)

print(trainX.shape)
print(trainy.shape)

testX = testX.reshape(testX.shape[0], 130)
testy = testy.reshape(testy.shape[0], 1)

print(testX.shape)
print(testy.shape)

n_nodes_hl1 = 256
n_nodes_hl2 = 256
n_nodes_hl3 = 256

n_classes = 1
batch_size = 100

# Matrix = h X w
X = tf.placeholder('float', [None, len(trainX[0])])
y = tf.placeholder('float')

def model(data):
    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([trainX.shape[1], n_nodes_hl1])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    hidden_3_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl3]))}
    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases': tf.Variable(tf.random_normal([n_classes]))}

    # (input_data * weights) + biases
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.sigmoid(l1)
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.sigmoid(l2)
    l3 = tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.sigmoid(l3)
    output = tf.matmul(l3, output_layer['weights']) + output_layer['biases']
    return output

def train(x):
    pred = model(x)
    # loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
    loss = tf.reduce_mean(tf.square(pred - y))
    optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)

    epochs = 1

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        print('Beginning Training \n')
        for e in range(epochs):
            timeS = time.time()
            epoch_loss = 0
            i = 0
            while i < len(trainX):
                start = i
                end = i + batch_size
                batch_x = np.array(trainX[start:end])
                batch_y = np.array(trainy[start:end])
                _, c = sess.run([optimizer, loss], feed_dict={x: batch_x, y: batch_y})
                epoch_loss += c
                i += batch_size
            done = time.time() - timeS
            print('Epoch', e + 1, 'completed out of', epochs, 'loss:', epoch_loss, "\nTime:", done, 'seconds\n')

        correct = tf.equal(tf.arg_max(pred, 1), tf.arg_max(y, 1))
        acc = tf.reduce_mean(tf.cast(correct, 'float'))
        print("Accuracy:", acc.eval({x: testX, y: testy}))

train(X)
Output for 1 epoch:
Epoch 1 completed out of 1 loss: 1498498282.5
Time: 1.3765859603881836 seconds
Accuracy: 1.0
I do realize that the loss is very high, and I am using 1 epoch just for testing purposes, and yes, I know my code is quite messy. But all I want to do is print out a prediction. How would I do that? I know that I need to feed a list of features for X, but I just don't understand how to do it. I also don't quite understand why my accuracy is at 1.0, so if you have any suggestions for that, or any ways to change my code, I would be more than happy to listen to any ideas.
Thanks in advance
To get a prediction you just have to evaluate pred, which is the operation that defines the output of the model.
How to do it? With pred.eval(). But you need an input to evaluate its prediction, so you have to provide a feed_dict dictionary to eval() with the sample (or samples) you want to process.
The resulting code looks like:
predictions = pred.eval(feed_dict = {x:testX})
Notice how this is very similar to acc.eval({x:testX, y:testy}), because the idea is the same. You have an operation (acc in this case) which needs some input to be evaluated, and you can evaluate it either by calling acc.eval() or sess.run(acc) with the corresponding feed_dict with the necessary inputs.
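Concretely, the two calls below evaluate the same graph node (reusing testX from the question; the first form needs a default session to be active):

predictions = pred.eval(feed_dict={x: testX})
predictions = sess.run(pred, feed_dict={x: testX})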
The simplest way would be to use the existing session while training (between iterations):
print(sess.run(pred, {x: X_example}))
where X_example is some numpy example tensor.
The line below will give you scores for every class; for example, if you have 3 classes, it will give you an array of shape 1x3.
Considering you want the prediction for a single data point X_test, you can do the following:
output = sess.run(pred, {x:X_test})
The position of the maximum number in the above output variable will be your prediction, so we modify the statement:
output = sess.run(tf.argmax(pred, 1), {x:X_test})
print("your prediction for X_test is :", output[0])
Another thing you can do is:
output = sess.run(pred, {x:X_test})
output = np.argmax(output)
print("your prediction for X_test is :", output)