CNN model is non-deterministic on the validation set

We have a regression problem we are trying to solve.
We are using transfer learning with ResNet50 and adding a linear activation layer at the end of it.
Each input image consists of 3 channels of synthetic wavelet images (not RGB).
Since ResNet uses ReLU as an activation function and the wavelet transformation produces negative values, we have shifted all of our image data (a 3D matrix) to be positive.
Our label data is between -5 and 5.
We discovered that when we run our training process several times and use each resulting model to predict the validation set, we get hugely different results.
Here are some examples of training the model on exactly the same training data set with exactly the same hyperparameters:
Train 1 validation set prediction results:
max Predictions Value : [2.605783]
min Predictions Value : [0.71650916]
avg Predictions Value : 1.938421
median Predictions Value : 1.9630035
Train 2 validation set prediction results:
max Predictions Value : [3.7345936]
min Predictions Value : [0.438244]
avg Predictions Value : 1.1411991
median Predictions Value : 1.0634146
Train 3 validation set prediction results:
max Predictions Value : [1.6383451]
min Predictions Value : [0.24169573]
avg Predictions Value : 0.8020503
median Predictions Value : 0.8167548
Train 4 validation set prediction results:
max Predictions Value : [2.3159726]
min Predictions Value : [0.6428349]
avg Predictions Value : 1.0716639
median Predictions Value : 1.0022478
We are using the same network, hyperparameters and data.
In every run the training and validation losses are very similar (within ±5%).
Why does the range of prediction values differ so much between runs?
Why don't we get any negative predictions (the training data set is balanced: 50% positive and 50% negative labels)?
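For context, a minimal sketch of the setup described above. This is not the asker's actual code: the framework (Keras/TensorFlow), input size, ImageNet weights, pooling layer, optimizer and loss are all illustrative assumptions.

import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

# 3-channel synthetic wavelet inputs, shifted to be non-negative (input size is an assumption)
inputs = Input(shape=(224, 224, 3))
features = ResNet50(include_top=False, weights='imagenet')(inputs)
pooled = GlobalAveragePooling2D()(features)
# single linear output for regression; labels lie in [-5, 5]
output = Dense(1, activation='linear')(pooled)
model = Model(inputs, output)
model.compile(optimizer='adam', loss='mse')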

Related

Custom loss for single-label, multi-class problem

I have a single-label, multi-class classification problem, i.e., a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay to not penalise the model that heavily.
For example, the ground truth for one sample is [0,1,1,0,1] over 5 classes, instead of a one-hot vector. This implies that the model predicting any one (not necessarily all) of the above classes (2, 3 or 5) is fine.
For every batch, the predicted output dimension is of the shape bs x n x nc, where bs is the batch size, n is the number of samples per point and nc is the number of classes. The ground truth is also of the same shape as the predicted tensor.
For every batch, I'm expecting my loss function to compare n tensors across nc classes and then average it across n.
Eg: When dimensions are 32 x 8 x 5000. There are 32 batch points in a batch (for bs=32). Each batch point has 8 vector points, and each vector point has 5000 classes. For a given batch point, I wish to compute loss across all (8) vector points, compute their average and do so for the rest of the batch points (32). Final loss would be loss over all losses from each batch point.
How can I approach designing such a loss function? Any help would be deeply appreciated
P.S.: Let me know if the question is ambiguous
One way to go about this would be to use a sigmoid function on the network output, which removes the implicit interdependency between class scores that a softmax function has.
As for the loss function, you can then calculate the loss based on the highest prediction for any of your target classes and ignore all other class predictions. For your example:
import torch

# your model output
y_out = torch.tensor([[0.1, 0.2, 0.95, 0.1, 0.01]], requires_grad=True)
# class labels
y = torch.tensor([[0, 1, 1, 0, 1]])
Since we only care about the highest class probability, we set the scores of all target classes to the maximum score achieved among them:
class_mask = y == 1
max_class_score = torch.max(y_out[class_mask])
y_hat = torch.where(class_mask, max_class_score, y_out)
From this we can use a regular cross-entropy loss function:
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_hat, y.float())
loss.backward()
When inspecting the gradients, we see that this only updates the prediction that achieved the highest score, as well as all predictions outside of any of the target classes.
>>> y_out.grad
tensor([[ 0.3326, 0.0000, -0.6653, 0.3326, 0.0000]])
Predictions for the other target classes do not receive a gradient update. Note that if you have a very large number of possible classes, this might slow down your convergence.
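To extend this to the batched shape from the question (bs x n x nc), one could apply the same masking per row and let the loss average over all bs·n rows. A minimal sketch under those assumptions (the shapes and the random ground truth below are illustrative, and float targets for CrossEntropyLoss require PyTorch 1.10+, as in the example above):

import torch

bs, n, nc = 32, 8, 5000
y_out = torch.randn(bs, n, nc, requires_grad=True)      # model logits
# toy multi-hot ground truth with a few target classes per point
y = torch.zeros(bs, n, nc, dtype=torch.long)
y.scatter_(-1, torch.randint(0, nc, (bs, n, 3)), 1)

class_mask = y == 1
# per (batch, point) row: highest logit among the allowed classes
max_class_score = y_out.masked_fill(~class_mask, float('-inf')).max(dim=-1, keepdim=True).values
y_hat = torch.where(class_mask, max_class_score, y_out)

# flatten the batch and point dimensions, then average the cross-entropy over all rows
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_hat.reshape(bs * n, nc), y.float().reshape(bs * n, nc))
loss.backward()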

Threshold does not work on numpy array for accuracy metric

I am trying to implement logistic regression from scratch using numpy. I wrote a class with the following methods to implement logistic regression for a binary classification problem and to score it based on BCE loss or Accuracy.
def accuracy(self, true_labels, predictions):
    """
    This method implements the accuracy score, where the accuracy is the number
    of correct predictions our model has made.
    args:
        true_labels: vector of shape (1, m) that contains the class labels, where
            m is the number of samples in the batch.
        predictions: vector of shape (1, m) that contains the model predictions.
    """
    counter = 0
    for y_true, y_pred in zip(true_labels, predictions):
        if y_true == y_pred:
            counter += 1
    return counter / len(true_labels)

def train(self, score='loss'):
    """
    This function trains the logistic regression model and updates the
    parameters based on the batch gradient descent algorithm.
    The function prints the training loss and validation loss on every epoch.
    args:
        X: input features with shape (num_features, m) or (num_features) for a
            singular sample, where m is the size of the dataset.
        Y: gold class labels of shape (1, m) or (1) for a singular sample.
    """
    train_scores = []
    dev_scores = []
    for i in range(self.epochs):
        # perform forward and backward propagation & get the training predictions.
        training_predictions = self.propagation(self.X_train, self.Y_train)
        # get the predictions of the validation data
        dev_predictions = self.predict(self.X_dev, self.Y_dev)
        # calculate the scores of the predictions.
        if score == 'loss':
            train_score = self.loss_function(training_predictions, self.Y_train)
            dev_score = self.loss_function(dev_predictions, self.Y_dev)
        elif score == 'accuracy':
            train_score = self.accuracy((training_predictions == +1).squeeze(), self.Y_train)
            dev_score = self.accuracy((dev_predictions == +1).squeeze(), self.Y_dev)
        train_scores.append(train_score)
        dev_scores.append(dev_score)
    plot_training_and_validation(train_scores, dev_scores, self.epochs, score=score)
After testing the code with the following input
model = LogisticRegression(num_features=X_train.shape[0],
                           Learning_rate=0.01,
                           Lambda=0.001,
                           epochs=500,
                           X_train=X_train,
                           Y_train=Y_train,
                           X_dev=X_dev,
                           Y_dev=Y_dev,
                           normalize=False,
                           regularize=False)
model.train(score='loss')
I get the following results.
However, when I swap the scoring metric (measured over time) from loss to accuracy, as follows: model.train(score='accuracy'), I get the following result:
I have removed normalization and regularization to make sure I am using a simple implementation of logistic regression.
Note that I use an external method to visualize the training/validation score over time in the LogisticRegression.train() method.
The trick you are using to create your predictions before passing them into the accuracy method is wrong: you are using (dev_predictions==+1).
Your model is a logistic regression model, which generates a value between 0 and 1. Most of the time, the values will NOT be exactly equal to +1.
So essentially, you are passing a bunch of False (or 0) values to the accuracy function every time. I bet if you check, the share of samples in your datasets having the label False or 0 would be:
exactly 51.7% in the validation dataset
exactly 56.2% in the training dataset.
To fix this, use an in-between threshold like 0.5 to generate your labels, i.e. something like dev_predictions > 0.5.
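A minimal numpy sketch of that fix (the probabilities and labels below are made up for illustration):

import numpy as np

# hypothetical sigmoid outputs and gold labels
dev_predictions = np.array([0.12, 0.58, 0.93, 0.47])
dev_labels = np.array([0, 1, 1, 1])

# threshold the probabilities at 0.5 to obtain hard class labels
dev_classes = (dev_predictions > 0.5).astype(int)   # -> [0, 1, 1, 0]
accuracy = np.mean(dev_classes == dev_labels)       # -> 0.75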

Could someone explain to me what's behind the FaceNet paper? (one-shot learning, siamese networks and triplet loss)

I've been struggling for about 3 weeks with my one-shot learning project. I'm trying to unlock my computer with my face. Unfortunately, I'm far from achieving this task.
First, I wanted to understand the concepts behind one-shot learning, and especially triplet loss, before anything else. So now I'm trying to train a network (in PyTorch) with transfer learning, which I hope will lead me to my goal.
What I understand so far:
One-shot learning
It's a method where a model should be able to minimise the Euclidean distance between the embeddings of two faces of the same person and, on the contrary, maximise the Euclidean distance between two faces of different persons. In other words, the model should map any face into a d-dimensional Euclidean space where faces of the same person are close to each other and faces of different people are far away from each other.
The model does not need to be trained on known identities. In other words, once well trained, anyone could use it to compare a fixed, unchanged photo of their face to another photo of them.
Face verification is the ability to maximise the distances between any face which doesn't belong to (let's say) the authorized person and minimise only the distances between faces belonging to the authorized person (a 1:1 problem).
Face recognition is the ability to maximise the distances between any face which doesn't belong to (let's say) the authorized persons and minimise the distances between faces belonging to a set of authorized persons (a 1:K problem).
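As an illustration of the 1:1 verification case, a minimal sketch might look like the following; the model, images and threshold value here are placeholders, not part of the original question:

import torch
import torch.nn.functional as F

def is_authorized(model, probe_img, reference_img, threshold=1.0):
    # 1:1 verification: compare the probe embedding to a single stored reference embedding
    model.eval()
    with torch.no_grad():
        e_probe = model(probe_img.unsqueeze(0))   # add a batch dimension
        e_ref = model(reference_img.unsqueeze(0))
    return F.pairwise_distance(e_probe, e_ref).item() < threshold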
Triplet mining
To ensure that the model learns information, one needs to feed it with triplets that are well defined and not obvious. For a dataset of faces this leads to:
triplets such that [for all (i, j, k) distinct]: face[i] == face[j]; face[i] != face[k]; and face[j] != face[k].
Those triplets are called "valid triplets" and the faces are defined as the anchor, positive and negative.
triplets whose faces are not already far away from each other in the Euclidean space (to prevent trivial losses that collapse to zero). These are defined as semi-hard and hard triplets.
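For reference, the loss those triplets feed is the standard triplet loss, L = max(d(a, p) - d(a, n) + margin, 0). A minimal PyTorch sketch (the margin value is just an example):

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Euclidean distances between anchor-positive and anchor-negative embeddings
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    # hinge: only triplets that violate the margin contribute to the loss
    return F.relu(d_ap - d_an + margin).mean()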
Starting from these basics, I looked for examples on the internet. I understood that the usual ways to produce triplets are online mining and offline mining. I used the marvelous code at https://omoindrot.github.io/triplet-loss to implement the batch hard and batch all strategies, which are online mining.
My questions :
From this point I'm kind of lost. I've tried different approaches to build my dataset but my loss never converges. The model doesn't seem to learn anything.
Description of my approach (through PyTorch)
Model and dataset
I'm using InceptionResnetV1 from the pytorch_facenet library, pretrained on CASIA-WebFace. I unfreeze the last two layers, the linear layer model.last_linear(1792, 512) and model.last_bn(), which leaves me with 918,528 trainable parameters and an output embedding of dimension (512, 1).
For the dataset, I'm using the HeadPoseImageDatabase, which contains 15 persons with, for each, 2 front pictures and 186 various head-pose pictures. This gives a set of 2797 pictures (one person has 193 pictures) plus 30 front pictures.
My work
I understood that the model should see various identities. So first, I tried
PyTorch's nn.TripletMarginLoss and provided an anchor (one of the two front pictures of each identity), a positive (one of the 183 pictures of the anchor's identity) and a negative (a random face with a different identity).
This was unsuccessful: the loss seems to decrease but the model doesn't generalize to the test set.
I thought maybe I didn't provide enough semi-hard or hard triplets to the loss, so I constructed 15 datasets, one per identity "i": each dataset contains the positive faces of identity "i" and other negative faces, so each dataset contains 2797 images and returns an image with its label (1 if the identity of the face corresponds to the dataset's identity, else 0). I looped over each identity dataset (with a batch loop inside each dataset). I used batch hard this time (https://omoindrot.github.io/triplet-loss, a sketch of which follows below), but again, unsuccessful.
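For context, a minimal sketch of the batch-hard strategy referenced above (not the asker's actual code; it assumes every anchor in the batch has at least one positive and one negative):

import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    # pairwise Euclidean distances between all embeddings in the batch
    dist = torch.cdist(embeddings, embeddings)
    same_identity = labels.unsqueeze(0) == labels.unsqueeze(1)
    # hardest positive: the farthest embedding with the same label
    hardest_pos = (dist * same_identity.float()).max(dim=1).values
    # hardest negative: the closest embedding with a different label
    hardest_neg = dist.masked_fill(same_identity, float('inf')).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()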
Questions
Do I need to create a much simpler model and train it from scratch?
Does my method seem correct: should the anchor pass through the same model as the positive and the negative?
How should I set the margin?
Regarding face verification, are my statements above correct? I expect to train my model without pictures of me and then be able to minimise/maximise the Euclidean distances between any face embeddings. Is that correct?
Is this feasible with a decent accuracy (i.e. something around 95%) as a small project?
Thanks all for your time, I hope my explanations were clear. I've included a piece of code below.
import torch
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='casia-webface')
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
num_epochs = 10
for param in model.parameters():
    param.requires_grad = False
model.last_linear.weight.requires_grad = True
model.last_bn.weight.requires_grad = True
. . .
. . .
. . .
Block8-511 [-1, 1792, 3, 3] 0
AdaptiveAvgPool2d-512 [-1, 1792, 1, 1] 0
Dropout-513 [-1, 1792, 1, 1] 0
Linear-514 [-1, 512] 917,504
BatchNorm1d-515 [-1, 512] 1,024
================================================================
Total params: 23,482,624
Trainable params: 918,528
Non-trainable params: 22,564,096
----------------------------------------------------------------
Input size (MB): 0.29
Forward/backward pass size (MB): 88.63
Params size (MB): 89.58
Estimated Total Size (MB): 178.50
----------------------------------------------------------------
import datetime

def model_loop(model, epochs, trainloader, batch_size, pytorchLoss, optimizer, device):
    ### This function uses the PyTorch TripletMarginLoss with a training set of 4972 images
    ### separated into Anchors / Positives / Negatives
    delta_time = datetime.timedelta(hours=1)
    timezone = datetime.timezone(offset=delta_time)
    model.to(device)
    train_loss_list = []
    size_train = len(trainloader.dataset)
    for epoch in range(epochs):
        t = datetime.datetime.now(tz=timezone)
        str_t = '{:%Y-%m-%d %H:%M:%S}'.format(t)
        print(f"{str_t} : Epoch {epoch+1} on {device} \n---------------------------")
        train_loss = 0.0
        model.train()
        for batch, (imgsA, imgsP, imgsN) in enumerate(trainloader):
            # Transfer data to the GPU if available
            imgsA, imgsP, imgsN = imgsA.to(device), imgsP.to(device), imgsN.to(device)
            # Clear the gradients
            optimizer.zero_grad()
            # Make predictions & compute the mini-batch training loss
            predsA, predsP, predsN = model(imgsA), model(imgsP), model(imgsN)
            loss = pytorchLoss(predsA, predsP, predsN)
            # Compute the gradients
            loss.backward()
            # Update the weights
            optimizer.step()
            # Aggregate mini-batch training losses
            train_loss += loss.item()
            train_loss_list.append(train_loss)
            if batch == 0 or batch == 19:
                loss, current = loss.item(), batch*BATCH_SIZE + len(imgsA)
                print(f"mini-batch loss for training : \
                {loss:>7f} [{current:>5d}/{size_train:>5d}]")
        # Compute the global training loss as the mean of the mini-batch losses
        train_loss /= len(trainloader)
        print(f"--Fin Epoch {epoch+1}/{epochs} \n Training Loss: {train_loss:>7f}")
        print('\n')
    return train_loss_list
from torch import nn

train_loss = model_loop(model=model,
                        epochs=num_epochs,
                        trainloader=train_dataloader,
                        batch_size=256,
                        pytorchLoss=nn.TripletMarginLoss(margin=0.1),
                        optimizer=optimizer,
                        device=device)
2022-02-18 20:26:30 : Epoch 1 on cuda
-------------------------------
mini-batch loss for training : 0.054199 [ 256/ 4972]
mini-batch loss for training : 0.007469 [ 4972/ 4972]
--Fin Epoch 1/10
Training Loss: 0.026363
2022-02-18 20:27:48 : Epoch 5 on cuda
-------------------------------
mini-batch loss for training : 0.005694 [ 256/ 4972]
mini-batch loss for training : 0.011877 [ 4972/ 4972]
--Fin Epoch 5/10
Training Loss: 0.004944
2022-02-18 20:29:24 : Epoch 10 on cuda
-------------------------------
mini-batch loss for training : 0.002713 [ 256/ 4972]
mini-batch loss for training : 0.001007 [ 4972/ 4972]
--Fin Epoch 10/10
Training Loss: 0.003000
Stats through a dataset of 620 images :
TP : 11.25%
TN : 98.87%
FN : 88.75%
FP : 1.13%
The model accuracy is 55.06%

How do I get the maximum-valued label when using a softmax activation function in the output layer of a neural network?

In a model I have trained, I am applying a softmax function in the output layer of the neural network. The output has 41 categories and I want to fetch the label with the maximum value and the value itself. I.e. the softmax predicts probabilities for all 41 diseases for a set of inputs, but I want to print the disease with the maximum probability along with that probability. How do I do it?
You can achieve this by simply using the np.argmax() function.
For example, to get the index of the disease with the maximum probability for your first test example:
import numpy as np

predictions = model.predict(x_test)
print(np.argmax(predictions[0]))  # outputs the index of the disease with max probability, e.g. 0
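To also print the probability itself and a human-readable label, a minimal sketch (disease_labels is a hypothetical list mapping class indices to disease names, not something defined in the question):

import numpy as np

predictions = model.predict(x_test)
idx = np.argmax(predictions[0])    # index of the most probable class
prob = predictions[0][idx]         # the probability itself
# disease_labels: hypothetical list of 41 disease names, indexed by class
print(disease_labels[idx], prob)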

Multi-step Time Series Prediction w/ seq2seq LSTM

I am trying to predict time series data using an encoder/decoder with LSTM layers. So far, I am using 20 points of past data to predict 20 future points. For each sample of 20 past data points, the 1st value in the predicted sequence is very close to the true 1st value in each sequence (plot: predicting 1 step into the future).
However, for the 2nd value in each sequence (2 timesteps into the future), the predicted values look like they are "shifted" (plot: predicting 2 steps into the future).
This "shifted" behaviour holds for all values of the predicted sequences, with the shifts increasing as I go farther into the predicted sequence. Here is the code for my model:
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
# encoder: summarize the 20 past points into a single context vector
model.add(LSTM(input_dim=1, output_dim=128, return_sequences=False))
# repeat the context vector once per output timestep
model.add(RepeatVector(20))
# decoder: produce one hidden state per output timestep
model.add(LSTM(output_dim=128, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
Is it something with RepeatVector? Any help would be appreciated.
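For reference, a minimal sketch of the same encoder-decoder in the current Keras API (assuming 20 past timesteps with a single feature each; the optimizer and loss are illustrative assumptions):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential([
    LSTM(128, input_shape=(20, 1)),     # encoder: summarize the 20 past points
    RepeatVector(20),                   # repeat the context vector for the 20 output steps
    LSTM(128, return_sequences=True),   # decoder: one hidden state per output step
    TimeDistributed(Dense(1)),          # one prediction per output step
])
model.compile(optimizer='adam', loss='mse')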
