Why does torchvision pretrained model perform better than scratch trained? - pytorch

I followed the torchvision tutorial (https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) and trained a segmentation model on my own dataset of aerial images, which only has 2 classes. The results were:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.640
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.700
Decent, but I would like to improve the accuracy of the masks.
So I then trained exactly the same model with pretrained=False and pretrained_backbone=False and was very surprised that the performance was about half that of the pre-trained model.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.328
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.472
I'm trying to understand why this would be the case and where I should be directing my effort to improve the model:
Training parameters like: lr, momentum, dropout, batch size ..
Architecture of the MaskRCNN network: roi, anchors ..
Something about my dataset: adding augmentation .. (see the augmentation sketch after the model code below)
I'm not sure where to look to understand how I can improve the performance. What is a decent way to approach that?
EDIT: My model is:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # get the number of input features for the box classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained box predictor head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    return model
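
For the augmentation idea from the list above, a minimal sketch in the style of the torchvision tutorial (it assumes the helper transforms module, usually vendored from torchvision's references/detection as transforms.py, so that boxes and masks are flipped together with the image):

import transforms as T  # assumed helper module from torchvision references/detection, as in the tutorial

def get_transform(train):
    t = [T.ToTensor()]
    if train:
        # flip the image, and with it the boxes and masks, 50% of the time
        t.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(t)

The dataset would then be built with get_transform(train=True) for training and get_transform(train=False) for evaluation, as in the tutorial.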

Related

Could someone explain what's behind the FaceNet paper? (one-shot learning, Siamese networks and triplet loss)

I've been struggling for about 3 weeks on my one-shot learning project. I'm trying to unlock my computer with my face. Unfortunately, I'm far from achieving that.
First, I wanted to understand the concepts behind one-shot learning, and especially triplet loss, before anything else. So now I am trying to train a network (in PyTorch) with transfer learning, which I hope will lead me to my goal.
What I have understood so far:
One-shot learning
It's a method where a model should minimise the Euclidean distance between the embeddings of two faces of the same person and, conversely, maximise the Euclidean distance between the embeddings of two faces of different people. In other words, the model should map any face into a d-dimensional Euclidean space, where faces of the same person are close to each other and faces of different people are far away from each other.
The model does not necessarily have to be trained on a known identity. In other words, once well trained, anyone could use it to compare a fixed, unchanged photo of their face to another photo of themselves.
Face verification is the ability to maximise the distances between any face that doesn't belong to (let's say) the authorised person and minimise only the distances of faces belonging to the authorised person (a 1:1 problem).
Face recognition is the ability to maximise the distances between any face that doesn't belong to the set of authorised persons and minimise the distances of any faces belonging to that set (a 1:K problem).
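
As a concrete illustration of 1:1 verification, a minimal sketch assuming the facenet-pytorch package and preprocessed 160x160 face crops (the threshold is an assumption to be tuned on a validation set):

import torch
from facenet_pytorch import InceptionResnetV1  # assumed package providing the pretrained embedding model

model = InceptionResnetV1(pretrained='casia-webface').eval()

def same_person(face_a, face_b, threshold=1.0):
    # face_a, face_b: preprocessed tensors of shape (3, 160, 160)
    with torch.no_grad():
        emb = model(torch.stack([face_a, face_b]))   # (2, 512) embeddings
    distance = torch.dist(emb[0], emb[1]).item()     # Euclidean distance between the two embeddings
    return distance < threshold                      # "verified" if the faces are close enough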
Triplet mining
To ensure that the model learns something, one needs to feed it triplets that are well defined and not obvious. For a dataset of faces this leads to:
triplets such that [for all distinct (i, j, k)]: face[i] == face[j], face[i] != face[k] and face[j] != face[k]
Those triplets are called "valid triplets", and the faces are referred to as the Anchor, the Positive and the Negative.
triplets such that the faces are not already far away from each other in the Euclidean space (this prevents trivial losses which collapse to zero). These are called semi-hard and hard triplets.
From those baselines, I looked for examples on the internet. I understood that the usual ways to produce triplets are online mining and offline mining. I used the marvellous code from https://omoindrot.github.io/triplet-loss to implement the batch-hard and batch-all strategies, which are online mining.
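
For reference, a PyTorch transcription of the batch-hard idea described above (a minimal sketch, not the blog's code; it assumes each batch contains several images per identity):

import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    # embeddings: (B, D) tensor of embeddings, labels: (B,) tensor of identity ids
    dist = torch.cdist(embeddings, embeddings, p=2)         # pairwise distances, shape (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)       # same-identity mask, shape (B, B)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    pos_mask = same & ~eye                                  # positives, excluding the anchor itself
    neg_mask = ~same                                        # negatives

    hardest_pos = (dist * pos_mask).max(dim=1).values       # furthest positive per anchor
    hardest_neg = dist.masked_fill(~neg_mask, float('inf')).min(dim=1).values  # closest negative

    # hinge on the hardest positive/negative pair for each anchor
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

Anchors that have no positive in the batch simply contribute a zero hardest-positive distance, so the batch sampler should guarantee several images per identity in every batch.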
My questions :
From this point I'm kind of lost. I've tried different approaches to build my dataset but my loss never converges. The model doesn't seem to learn anything.
Description of my approach (through PyTorch)
Model and dataset
I'm using InceptionResnetV1 from the facenet-pytorch library, pretrained on CASIA-WebFace. I unfreeze the last two layers: the linear layer model.last_linear (1792 → 512) and model.last_bn, which leaves me with 918,528 trainable parameters and an output embedding of dimension 512.
For the dataset, I'm using the Head Pose Image Database, which contains 15 persons with, for each, 2 front pictures and 186 various head-pose pictures. This leads to a set of 2797 pictures (one person has 193 pictures) plus 30 front pictures.
My work
I understood that the model should see various identities. So first, I tried PyTorch's nn.TripletMarginLoss, providing an anchor (one of the two front pictures of each identity), a positive (one of the 183 pictures of the anchor's identity) and a negative (a random face with a different identity).
This was unsuccessful: the loss seems to decrease but the model doesn't generalise to the test set.
I thought maybe I wasn't providing enough semi-hard or hard triplets to the loss, so I constructed 15 datasets, one per identity i: each dataset contains the positive faces of identity i plus negative faces of the other identities. Each dataset therefore contains 2797 images and returns an image with its label (1 if the identity of the face corresponds to the dataset's identity, else 0). I looped over each identity dataset (with a batch loop inside each one). I used batch hard this time (https://omoindrot.github.io/triplet-loss), but again without success.
Questions
Do I need to create a much simpler model and train it from scratch?
Does my method seem correct: should the Anchor pass through the same model as the Positive and the Negative?
How should I set the margin?
Regarding face verification, are my statements above correct? I expect to train my model without pictures of me, and then be able to minimise/maximise the Euclidean distances between any face embeddings. Is that correct?
Is this feasible with decent accuracy (i.e. something around 95%) as a small project?
Thanks all for your time, I hope my explanations were clear. I leave a piece of code below.
import torch
from facenet_pytorch import InceptionResnetV1  # assumed package providing the pretrained model

model = InceptionResnetV1(pretrained='casia-webface')
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
num_epochs = 10

# freeze everything, then unfreeze only the last linear layer and batch norm
for param in model.parameters():
    param.requires_grad = False
model.last_linear.weight.requires_grad = True
model.last_bn.weight.requires_grad = True
. . .
. . .
. . .
Block8-511 [-1, 1792, 3, 3] 0
AdaptiveAvgPool2d-512 [-1, 1792, 1, 1] 0
Dropout-513 [-1, 1792, 1, 1] 0
Linear-514 [-1, 512] 917,504
BatchNorm1d-515 [-1, 512] 1,024
================================================================
Total params: 23,482,624
Trainable params: 918,528
Non-trainable params: 22,564,096
----------------------------------------------------------------
Input size (MB): 0.29
Forward/backward pass size (MB): 88.63
Params size (MB): 89.58
Estimated Total Size (MB): 178.50
----------------------------------------------------------------
import datetime
import torch

def model_loop(model, epochs, trainloader, batch_size, pytorchLoss, optimizer, device):
    ### This loop uses the PyTorch TripletMarginLoss with a training set of 4972 images
    ### split between Anchors / Positives / Negatives
    delta_time = datetime.timedelta(hours=1)
    timezone = datetime.timezone(offset=delta_time)
    model.to(device)
    train_loss_list = []
    size_train = len(trainloader.dataset)
    for epoch in range(epochs):
        t = datetime.datetime.now(tz=timezone)
        str_t = '{:%Y-%m-%d %H:%M:%S}'.format(t)
        print(f"{str_t} : Epoch {epoch+1} on {device} \n---------------------------")
        train_loss = 0.0
        model.train()
        for batch, (imgsA, imgsP, imgsN) in enumerate(trainloader):
            # Transfer data to GPU if available
            imgsA, imgsP, imgsN = imgsA.to(device), imgsP.to(device), imgsN.to(device)
            # Clear the gradients
            optimizer.zero_grad()
            # Make predictions & compute the mini-batch training loss
            predsA, predsP, predsN = model(imgsA), model(imgsP), model(imgsN)
            loss = pytorchLoss(predsA, predsP, predsN)
            # Compute the gradients
            loss.backward()
            # Update weights
            optimizer.step()
            # Aggregate mini-batch training losses
            train_loss += loss.item()
            train_loss_list.append(train_loss)
            if batch == 0 or batch == 19:
                loss, current = loss.item(), batch * batch_size + len(imgsA)
                print(f"mini-batch loss for training : {loss:>7f} [{current:>5d}/{size_train:>5d}]")
        # Compute the epoch training loss as the mean of the mini-batch losses
        train_loss /= len(trainloader)
        print(f"--Fin Epoch {epoch+1}/{epochs} \n Training Loss: {train_loss:>7f}")
        print('\n')
    return train_loss_list
train_loss = model_loop(model=model,
                        epochs=num_epochs,
                        trainloader=train_dataloader,
                        batch_size=256,
                        pytorchLoss=nn.TripletMarginLoss(margin=0.1),
                        optimizer=optimizer,
                        device=device)
2022-02-18 20:26:30 : Epoch 1 on cuda
-------------------------------
mini-batch loss for training : 0.054199 [ 256/ 4972]
mini-batch loss for training : 0.007469 [ 4972/ 4972]
--Fin Epoch 1/10
Training Loss: 0.026363
2022-02-18 20:27:48 : Epoch 5 on cuda
-------------------------------
mini-batch loss for training : 0.005694 [ 256/ 4972]
mini-batch loss for training : 0.011877 [ 4972/ 4972]
--Fin Epoch 5/10
Training Loss: 0.004944
2022-02-18 20:29:24 : Epoch 10 on cuda
-------------------------------
mini-batch loss for training : 0.002713 [ 256/ 4972]
mini-batch loss for training : 0.001007 [ 4972/ 4972]
--Fin Epoch 10/10
Training Loss: 0.003000
Stats over a dataset of 620 images:
TP : 11.25%
TN : 98.87%
FN : 88.75%
FP : 1.13%
The model accuracy is 55.06%

Training the AI model taking so long

I am solving a multi-class classification problem. The data set looks like below:
| feature 1 | feature 3 | feature 4 | feature 2 |
|-----------|-----------|-----------|-----------|
| 1.302     | 102.987   | 1.298     | 99.8      |
| 1.318     | 102.587   | 1.998     | 199.8     |
The 4 features are floats and my target variable classes are either 1, 2, or 3. When I build the following model and train it, it takes very long to converge (24 hours and still running).
I used a Keras model like below:
def create_model(optimizer='adam', init='uniform'):
    # create model
    if verbose: print("**Create model with optimizer: %s; init: %s" % (optimizer, init))
    model = Sequential()
    model.add(Dense(16, input_dim=X.shape[1], kernel_initializer=init, activation='relu'))
    model.add(Dense(8, kernel_initializer=init, activation='relu'))
    model.add(Dense(4, kernel_initializer=init, activation='relu'))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
Fitting the model:
best_epochs = 200
best_batch_size = 5
best_init = 'glorot_uniform'
best_optimizer = 'rmsprop'
verbose=0
model_pred = KerasClassifier(build_fn=create_model, optimizer=best_optimizer, init=best_init, epochs=best_epochs, batch_size=best_batch_size, verbose=verbose)
model_pred.fit(X_train,y_train)
I followed the tutorial here: https://www.kaggle.com/stefanbergstein/keras-deep-learning-on-titanic-data
and also a fastai model like below:
cont_names = [ 'feature1', 'feature2', 'feature3', 'feature4']
procs = [FillMissing, Categorify, Normalize]
test = TabularList.from_df(test,cont_names=cont_names, procs=procs)
data = (TabularList.from_df(train, path='.', cont_names=cont_names, procs=procs)
.random_split_by_pct(valid_pct=0.2, seed=43)
.label_from_df(cols = dep_var)
.add_test(test, label=0)
.databunch())
learn = tabular_learner(data, layers=[1000, 200, 15], metrics=accuracy, emb_drop=0.1, callback_fns=ShowGraph)
I followed the tutorial below
https://medium.com/@nikkisharma536/applying-deep-learning-on-tabular-data-for-regression-and-classification-problems-1e5f80743259
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)
(138507, 4) (138507, 1) (34627, 4) (34627, 1)
Not sure why both models are taking so long to run. Is there an error in my inputs? Any help is appreciated.
With 200 epochs and over 138k training examples (plus almost 35k test examples), you are showing the network a total of 200 × (138,507 + 34,627) = 34,626,800 (~35M) examples. Those are big numbers.
Assuming that you are training on your CPU, this may take several hours, even days, depending on your hardware.
One thing you could do is lower the number of epochs and see whether you already get an acceptable model earlier.
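
In the same spirit, a sketch of a faster setup (it assumes tensorflow.keras and calls the create_model() from the question directly rather than through KerasClassifier): early stopping bounds the number of epochs, and a larger batch size reduces the roughly 27,700 weight updates per epoch that batch_size=5 implies on 138k rows.

from tensorflow.keras.callbacks import EarlyStopping  # assumes tf.keras; adjust the import if using standalone Keras

early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model = create_model(optimizer='rmsprop', init='glorot_uniform')
model.fit(X_train, y_train,
          validation_split=0.2,
          epochs=200,             # upper bound; early stopping normally ends training much sooner
          batch_size=256,         # far fewer weight updates per epoch than batch_size=5
          callbacks=[early_stop],
          verbose=1)

Independently of speed, note that a 3-class target usually calls for a 3-unit softmax output with (sparse) categorical cross-entropy rather than the single sigmoid unit with binary cross-entropy shown above.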

Why is my detection score high despite obvious misclassifications during prediction?

I am working on an intrusion classification problem using the NSL-KDD dataset. I used 10 features (out of 42) for training, selected by recursive feature elimination with a Random Forest classifier as the estimator and the Gini index as the splitting criterion. After training the classifier, I use the same classifier to predict the classes of the test data. My cross-validation scores (accuracy, precision, recall, f-score) from sklearn's cross_val_score are all above 99%. But the confusion matrix shows otherwise, with large false positive and false negative counts. Clearly these do not match the accuracy and the other scores. Where did I go wrong?
# The train set contains X_train (dataframe of features) and Y_train (series of target labels)
# The test set contains X_test and Y_test
import pandas
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Classifier
clf = RandomForestClassifier(n_estimators=10, criterion='gini')

# Training
clf.fit(X_train, Y_train)

# Testing
Y_pred = clf.predict(X_test)
pandas.crosstab(Y_test, Y_pred, rownames=['Actual'], colnames=['Predicted'])

# Scoring
accuracy = cross_val_score(clf, X_test, Y_test, cv=10, scoring='accuracy')
print("Accuracy: %0.5f (+/- %0.5f)" % (accuracy.mean(), accuracy.std() * 2))
precision = cross_val_score(clf, X_test, Y_test, cv=10, scoring='precision_weighted')
print("Precision: %0.5f (+/- %0.5f)" % (precision.mean(), precision.std() * 2))
recall = cross_val_score(clf, X_test, Y_test, cv=10, scoring='recall_weighted')
print("Recall: %0.5f (+/- %0.5f)" % (recall.mean(), recall.std() * 2))
f = cross_val_score(clf, X_test, Y_test, cv=10, scoring='f1_weighted')
print("F-Score: %0.5f (+/- %0.5f)" % (f.mean(), f.std() * 2))
I got accuracy, precision, recall and f-score of
Accuracy 0.99825
Precision 0.99826
Recall 0.99825
F-Score 0.99825
However, the confusion matrix showed otherwise (rows = Actual, columns = Predicted; the class labels were lost in the paste):

    9670      41
    5113    2347
Am I training the whole thing wrong or is it just misclassification problem from poor feature selection?
Your predicted values are stored in Y_pred.
accuracy_score(Y_test, Y_pred)
Just check whether this works...
You are not comparing equivalent results! For the confusion matrix, you train on (X_train, Y_train) and test on (X_test, Y_test).
However, cross_val_score fits the estimator on k-1 folds of (X_test, Y_test) and tests it on the remaining fold of (X_test, Y_test), because cross_val_score does its own cross-validation (with 10 folds here) on the dataset you provide. Check out the cross_val_score documentation for more explanation.
So basically, the two evaluations do not fit and test your algorithm on the same data splits. This might explain the inconsistency in the results.
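
A sketch of a consistent evaluation, reusing the variable names from the question: the hold-out scores and the confusion matrix then describe the same fit, and cross-validation runs on the training data instead of the test set.

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

# hold-out evaluation: these scores describe the same predictions as the confusion matrix
clf.fit(X_train, Y_train)
Y_pred = clf.predict(X_test)
print("Hold-out accuracy:", accuracy_score(Y_test, Y_pred))
print(classification_report(Y_test, Y_pred))
print(confusion_matrix(Y_test, Y_pred))

# if cross-validated estimates are wanted, run them on the training data
cv_acc = cross_val_score(clf, X_train, Y_train, cv=10, scoring='accuracy')
print("CV accuracy (train): %0.5f (+/- %0.5f)" % (cv_acc.mean(), cv_acc.std() * 2))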

Getting perfect ROC-AUC score for Linear SVC

I am evaluating different classifiers for my sentiment analysis model. I am looking at all the available metrics, and whilst most classifiers achieve similar precision, recall, F1 and ROC-AUC scores, Linear SVM appears to get a perfect ROC-AUC score. Look at the chart below:
Abbreviations: MNB=Multinomial Naive Bayes, SGD=Stochastic Gradient Descent, LR=Logistic Regression, LSVC=Linear Support Vector Classification
Here are the rest of the performance metrics for LSVC, which are very similar to the rest of the classifiers:
             precision    recall  f1-score   support

        neg       0.83      0.90      0.87     24979
        pos       0.90      0.82      0.86     25021

avg / total       0.87      0.86      0.86     50000
As you can see the dataset is balanced for pos and neg comments.
Here is the relevant code:
def evaluate(classifier):
    predicted = classifier.predict(testing_text)
    if isinstance(classifier.steps[2][1], LinearSVC):
        probabilities = np.array(classifier.decision_function(testing_text))
        scores = probabilities
    else:
        probabilities = np.array(classifier.predict_proba(testing_text))
        scores = np.max(probabilities, axis=1)
    pos_idx = np.where(predicted == 'pos')
    predicted_true_binary = np.zeros(predicted.shape)
    predicted_true_binary[pos_idx] = 1
    fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
    auc = metrics.roc_auc_score(predicted_true_binary, scores)
    mean_acc = np.mean(predicted == testing_category)
    report = metrics.classification_report(testing_category, predicted)
    confusion_matrix = metrics.confusion_matrix(testing_category, predicted)
    return fpr, tpr, auc, mean_acc, report, confusion_matrix
I am using predict_proba for all classifiers apart from LSVC, which uses decision_function instead (since it does not have a predict_proba method).
What's going on?
EDIT: changes according to @Vivek Kumar's comments:
def evaluate(classifier):
    predicted = classifier.predict(testing_text)
    if isinstance(classifier.steps[2][1], LinearSVC):
        probabilities = np.array(classifier.decision_function(testing_text))
        scores = probabilities
    else:
        probabilities = np.array(classifier.predict_proba(testing_text))
        scores = probabilities[:, 1]  # NEW
    testing_category_array = np.array(testing_category)  # NEW
    pos_idx = np.where(testing_category_array == 'pos')
    predicted_true_binary = np.zeros(testing_category_array.shape)
    predicted_true_binary[pos_idx] = 1
    fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
    auc = metrics.roc_auc_score(predicted_true_binary, scores)
    mean_acc = np.mean(predicted == testing_category)
    report = metrics.classification_report(testing_category, predicted)
    confusion_matrix = metrics.confusion_matrix(testing_category, predicted)
    return fpr, tpr, auc, mean_acc, report, confusion_matrix
This now yields this graph:
I don't think it is valid to compare predict_proba and decision_function like for like. The first sentence in the docs for LinearSVC's decision_function, "Predict confidence scores for samples.", must not be read as "predicting probabilities". The second sentence clarifies it: it is similar to the decision function of the general SVC.
You can get predict_proba for a linear SVM with sklearn; you then need to use the general SVC with the kernel set to 'linear'. However, you are then changing the implementation under the hood (away from LIBLINEAR).
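
Two standard sklearn ways to get probability-like scores from a linear SVM (a sketch, independent of the poster's pipeline):

from sklearn.svm import SVC, LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Option 1: the general SVC with a linear kernel (libsvm backend, not LIBLINEAR)
svc = SVC(kernel='linear', probability=True)            # exposes predict_proba after fitting

# Option 2: keep LinearSVC (LIBLINEAR) and calibrate its decision_function into probabilities
calibrated = CalibratedClassifierCV(LinearSVC(), cv=5)  # also exposes predict_proba after fitting

Either estimator could replace the LinearSVC step in the pipeline, so the predict_proba branch of evaluate() would then apply to it as well.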

Binary classification (logistic regression) predicts wrong label with high accuracy

I have a problem where a binary logistic regression classification (using scikit-learn, Python 2.7) is predicting the wrong/opposite class with high confidence. That is, after fitting the model, the predicted scores and predicted probabilities for each class are very consistent but almost always for the wrong class. I cannot share the data, but some pseudo-code of my approach is:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

X = np.vstack((cond_1, cond_2))  # shape of X = 200*51102
y = np.concatenate([np.zeros(len(cond_1)), np.ones(len(cond_2))])

cv = StratifiedKFold(n_splits=5)  # some cross-validation splitter (assumed; not shown in the original)

scls = []
clfs = []
scores = []
for train, test in cv.split(X, y):
    clf = LogisticRegression(C=1)
    scl = StandardScaler()
    scl.fit(X[train])
    X_train = scl.transform(X[train])
    scls.append(scl)
    X_test = scl.transform(X[test])
    clf.fit(X_train, y[train])
    y_pred = clf.predict(X_test)
    scores.append(roc_auc_score(y[test], y_pred))
The roc_auc scores have a mean of 0.065 and a standard deviation of 0.05, so something odd is clearly going on, but what? I have plotted the features and they look reasonably normally distributed. I have also looked at the probabilities from predict_proba and they are mostly above 80% for the wrong class/label.
Any ideas what is going on and/or how to properly diagnose the problem?
I apologise for not being able to ask a more precise question but I'm lacking the vocabulary.
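
One quick diagnostic, sketched with the names from the pseudo-code above: score with predicted probabilities rather than hard labels, and check which column of predict_proba corresponds to which class.

from sklearn.metrics import roc_auc_score

# inside the cross-validation loop, after clf.fit(X_train, y[train]):
proba = clf.predict_proba(X_test)       # shape (n_test_samples, 2)
print(clf.classes_)                     # column order of predict_proba, e.g. [0. 1.]

# use the probability of class 1 as the score, not the hard predictions from clf.predict()
pos_col = list(clf.classes_).index(1)
scores.append(roc_auc_score(y[test], proba[:, pos_col]))

An AUC far below 0.5 with confident probabilities usually points to inverted labels somewhere upstream rather than to the model itself.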
