Performance evaluation of image segmentation - Keras? - keras

I am currently using a model (e.g. U-Net or SegNet) implemented in Keras to segment high-resolution images.
Below is the code for performance evaluation:
score = model.evaluate(test_data, test_label, verbose=1)
The trained model produced very high scores on my test dataset (loss: 0.4232, acc: 0.9789).
Then I displayed the segmented test images with the following code:
k = 7
output = model.predict_classes(test_data[k: k+ 1])
visualize(np.squeeze(output, axis=0))
I do not understand why the actual outputs were totally different from the expected outputs (i.e. the ground truths), although the accuracy was very high. Here, I have 2 kinds of objects: red denotes object 1 and green denotes object 2.
Any help or suggestions would be greatly appreciated!

Related

Question about Training and Testing in supervised learning

I am a bit confused and hope someone can help me.
I am currently experimenting with supervised learning, and I think I have a basic misunderstanding about the input and output of LSTMs.
Say I have a sequence of 10 observations,
and I split it into train = 1,2,3,4,5,6,7,8
and test = 9,10.
Then I transform it into a supervised problem like:
Xtrain= [(1,2)(2,3)(3,4)(4,5)(5,6)]
Ytrain= [(3,4)(4,5)(5,6)(6,7)(7,8)]
And
Xtest= [(7,8)]
So the model is made to predict the next two observations from the previous two.
prediction <- predict(Xtest)
Is this an invalid train/test split? Am I correct that I can then evaluate the prediction output from Xtest against the actual test set containing [(9,10)]?
Or should I stop training at Xtrain = [(4,5)] and Ytrain = [(6,7)] to leave some space between training and testing, since the last observations from Ytrain in my example are used as input for the prediction?
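The windowing described in the question can be sketched as follows. This is a minimal illustration, not the asker's code: `make_windows`, `n_in`, and `n_out` are hypothetical names, and NumPy is assumed to be available.

```python
import numpy as np

def make_windows(series, n_in=2, n_out=2):
    """Turn a 1-D series into (input window, target window) pairs."""
    series = np.asarray(series)
    X, Y = [], []
    # The last usable start index leaves room for n_in inputs plus n_out targets.
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(Y)

train = np.arange(1, 9)    # observations 1..8
test = np.arange(9, 11)    # observations 9 and 10

X_train, Y_train = make_windows(train)
X_test = train[-2:].reshape(1, -1)   # (7, 8): the last two training observations
Y_test = test.reshape(1, -1)         # (9, 10): held-out targets

print(X_train.tolist())
print(Y_train.tolist())
```

In this construction the values 9 and 10 appear nowhere in `X_train` or `Y_train`; only the test input (7, 8) overlaps with the training targets.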

Keras Nan value when computing the loss

My question is related to this one
I am working to implement the method described in the article https://drive.google.com/file/d/1s-qs-ivo_fJD9BU_tM5RY8Hv-opK4Z-H/view . The final algorithm to use is here (it is on page 6):
d is a unit vector
xhi is a nonzero number
D is the loss function (sparse cross-entropy in my case)
The idea is to do adversarial training: modify the data in the direction in which the network is most sensitive to small changes, then train the network with the modified data but with the same label as the original data.
The loss function used to train the model is here:
l is a loss measure on the labelled data
Rvadv is the value inside the gradient in the picture of algorithm 1
the article chose alpha = 1
The idea is to incorporate the performance of the model on the labelled dataset into the loss.
I am trying to implement this method in Keras with the MNIST dataset and mini-batches of 100 samples. When I run the final gradient descent step to update the weights, NaN values appear after some iterations, and I don't know why. I posted the notebook in a Colab session (I don't know how long it will stay up, so I also posted the code in a gist):
collab session: https://colab.research.google.com/drive/1lowajNWD-xvrJDEcVklKOidVuyksFYU3?usp=sharing
gist : https://gist.github.com/DridriLaBastos/e82ec90bd699641124170d07e5a8ae4c
This is a fairly standard case of NaN during training. I suggest you read this answer about NaN issues with the Adam solver for the cause and solution in the common case.
Basically, I made just the following two changes and the code ran without NaN in the gradients:
Reduce the learning rate of the optimizer in model.compile to optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3)
Replace C = [loss(label,pred) for label, pred in zip(yBatchTrain,dumbModel(dataNoised,training=False))] with C = loss(yBatchTrain,dumbModel(dataNoised,training=False))
If you still get this kind of error, the next few things you could try are:
Clip the loss or the gradients
Switch all tensors from tf.float32 to tf.float64
The next time you face this kind of error, you can use tf.debugging.check_numerics to find the root cause of the NaN.
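To make the "clip the gradient" suggestion concrete, here is a framework-independent NumPy sketch of clipping by global norm, with a finiteness check in the spirit of tf.debugging.check_numerics. It is an illustration only, not TensorFlow code: `clip_by_global_norm` is a hypothetical helper written for this example. In Keras itself, clipping is usually requested through optimizer arguments such as `clipnorm` or `clipvalue`.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Fail early if any gradient already contains NaN or Inf.
    if not np.isfinite(global_norm):
        raise FloatingPointError("non-finite gradient encountered")
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # joint norm is 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(norm_before, norm_after)
```

Clipping bounds the size of each update but keeps its direction, which is why it is a common first remedy when a loss occasionally explodes.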

Multi-class segmentation in Keras

I'm trying to implement a multi-class segmentation in Keras:
input image is grayscale (i.e. 1 channel)
ground truth image has 3 channels, each pixel is a one-hot vector of length 3
prediction is standard U-Net trained with categorical_crossentropy outputting 3 channels (softmax-ed)
What is wrong with this setup? The training loss has some weird behaviour:
in my lucky cases it behaves as expected (decreases)
90 % of the time it's stuck at ~0.9
My implementation can be found here
I don't think there is anything wrong with the code: if my ground truth is 1-channel (i.e. 0s everywhere and 1s somewhere) and I use binary_crossentropy with sigmoid as the final activation, I see no weird behaviour.
I'll answer my own question. The solution is to weight each class, i.e. to use a weighted cross-entropy loss.
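A weighted cross-entropy of the kind mentioned above can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the poster's implementation: the function name, the toy data, and the weight values are all hypothetical, and in practice the weights are often derived from inverse class frequencies.

```python
import numpy as np

def weighted_categorical_crossentropy(y_true, y_pred, class_weights, eps=1e-7):
    """Mean per-pixel cross-entropy with one weight per class.

    y_true: one-hot array of shape (..., n_classes)
    y_pred: softmax probabilities of the same shape
    class_weights: array of shape (n_classes,)
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    per_pixel = -np.sum(class_weights * y_true * np.log(y_pred), axis=-1)
    return float(per_pixel.mean())

# Toy 1x2 "image" with 3 classes: first pixel is class 0, second pixel is class 2.
y_true = np.array([[[1, 0, 0], [0, 0, 1]]], dtype=float)
y_pred = np.array([[[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]])
weights = np.array([0.2, 1.0, 3.0])   # hypothetical: rarer classes get larger weights

unweighted = weighted_categorical_crossentropy(y_true, y_pred, np.ones(3))
weighted = weighted_categorical_crossentropy(y_true, y_pred, weights)
print(unweighted, weighted)
```

With these toy weights, the misclassified class-2 pixel dominates the loss, so the network can no longer reach a low loss by predicting only the frequent class.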

ResUNet Segmentation output is bad although precision and recall values are higher on training and validation

I recently implemented ResUNet for parasite segmentation on blood sample images. The model is described in this paper, https://arxiv.org/pdf/1711.10684.pdf and here is the code https://github.com/DuFanXin/deep_residual_unet/blob/master/res_unet.py. The segmentation output is a binary image. I trained the model with a weighted binary cross-entropy loss, giving more weight to the parasite class since the classes in my images are imbalanced. The last output layer has a sigmoid activation.
I calculate precision, recall, and the Dice coefficient to verify how good the segmentation is during training. On training and validation I got good numerical results:
Training
dice_coeff: .6895, f2: 0.8611, precision: 0.6320, recall: 0.9563
Validation
val_dice_coeff: .6433, val_f2: 0.7752, val_precision: 0.6052, val_recall: 0.8499
However, when I visually inspect the segmentations of the validation set, my algorithm outputs all black. After analyzing the predictions returned by the model, I found that almost all values are close to zero, so it cannot correctly differentiate between background and foreground. The problem is: why do my metrics show good numerical values when the segmentation output does not?
Are the metrics not giving me good information? Why is the recall value high even though the output is all black?
I trained for about 50 epochs, and my training curves show steady learning. Is this because of the vanishing gradient problem?
No, you do not have a vanishing gradient issue.
I am almost 100% sure that the problem is related to the way in which you test.
The numbers in your training/validation do not lie.
Ensure that you apply to your test dataset exactly the same preprocessing that is applied during training.
E.g.: if you use the "rescale = 1/255.0" parameter in Keras ImageDataGenerator(), ensure that when you load a test image, you divide it by 255.0 before predicting on it.
Note that this is just an example; your inconsistency in train/test preprocessing may stem from other causes.
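The effect of such a mismatch can be illustrated with a toy stand-in for a trained network. Everything below is hypothetical and only meant to show the mechanism: `preprocess` mirrors the rescale = 1/255.0 example, and `toy_model` is a single sigmoid that is assumed to have been "trained" on inputs in [0, 1].

```python
import numpy as np

def preprocess(image_uint8):
    """Single preprocessing function shared by training and inference."""
    return image_uint8.astype(np.float32) / 255.0

def toy_model(x):
    """Hypothetical stand-in for a network trained on inputs scaled to [0, 1]."""
    return 1.0 / (1.0 + np.exp(-4.0 * (x - 0.5)))

image = np.array([[10, 200], [30, 250]], dtype=np.uint8)

consistent = toy_model(preprocess(image))             # same scaling as in training
inconsistent = toy_model(image.astype(np.float32))    # raw 0..255 pixels

print(consistent)
print(inconsistent)
```

With raw pixels the toy model saturates and returns nearly the same value everywhere, while the correctly scaled input produces outputs on both sides of 0.5. Routing both training and test images through one shared function is a simple way to rule this class of bug out.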

Model evaluation : model.score Vs. ROC curve (AUC indicator)

I want to evaluate a logistic regression model (binary event) using two measures:
1. model.score and confusion matrix which give me a 81% of classification accuracy
2. ROC Curve (using AUC) which gives back a 50% value
Are these two results in contradiction? Is that possible?
I'm missing something but still can't find it.
y_pred = log_model.predict(X_test)
accuracy_score(y_test , y_pred)
cm = confusion_matrix( y_test,y_pred )
y_test.count()
print (cm)
fpr, tpr, _ = roc_curve( y_test , y_pred, drop_intermediate=False)
roc = roc_auc_score( y_test ,y_pred)
The accuracy score is calculated on the assumption that a class is selected if its predicted probability is more than 50%. This means that you are looking at only one case (one working point) out of many. Let's say you'd like to classify an instance as '0' even if it has a probability greater than 30% (this may happen if one of your classes is more important to you and its a-priori probability is very low). In this case, you will have a very different confusion matrix with a different accuracy ([TP+TN]/[ALL]). The ROC AUC score examines all of these working points and gives you an estimate of your overall model. A score of 50% means that the model is no better than a random selection of classes based on the a-priori probabilities of the classes. You would like the ROC AUC to be much higher before saying that you have a good model.
So in the above case, you can say that your model does not have good predictive strength. As a matter of fact, a better prediction would be to predict everything as "1"; in your case it would lead to an accuracy above 99%.
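The gap between accuracy and AUC on imbalanced data can be reproduced with a small NumPy sketch. The labels below are made up for illustration (990 positives, 10 negatives), and `roc_auc` is a hypothetical helper that computes AUC as the probability that a random positive is scored above a random negative, with ties counted as one half.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Made-up, heavily imbalanced labels: 990 positives and 10 negatives.
y_true = np.array([1] * 990 + [0] * 10)

# A "model" that predicts class 1 for everything.
constant_pred = np.ones_like(y_true)
accuracy = float((constant_pred == y_true).mean())
auc_constant = roc_auc(y_true, constant_pred)

# A scorer that ranks every positive above every negative.
auc_perfect = roc_auc(y_true, np.where(y_true == 1, 0.9, 0.1))

print(accuracy, auc_constant, auc_perfect)
```

The constant predictor reaches high accuracy purely from the class prior, yet its AUC stays at chance level because it cannot rank a positive above a negative.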

Resources