Calculating loss in YOLOv3 for all three scales - conv-neural-network

I'm currently trying to implement YOLOv3 in TensorFlow, using the Estimator API. However, I'm stuck at the loss function. YOLOv3 makes predictions at three scales and I can't figure out how to calculate the loss for all of them. I've already looked at the paper and also tried to find the loss function in the darknet source code, but can't figure it out. I've also looked at the loss function of another YOLOv3 TensorFlow implementation, but this hasn't really helped me understand the calculation of the loss either.
Can someone explain how exactly the loss for training is calculated while taking into account the predictions of all three scales?

Related

What type of optimization to perform on my multi-label text classification LSTM model with Keras?

I'm using a Windows 10 machine. Libraries: Keras with TensorFlow 2.0. Embeddings: GloVe (100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I have tried different types of fine-tuning to achieve better results, but with no luck so far.
I believe the main problem is the imbalance in the class distribution of my dataset, but after a lot of trial and error I couldn't implement stratified-k-split in Keras.
I am also experimenting with dropout layers, batch sizes, number of layers, learning rates, clip values and validation splits, but I get a minimal boost or sometimes worse performance.
For metrics, I use mainly ROC and F1.
I also followed the suggestion from a StackOverflow member who said to delete some of my examples so I can balance my dataset but if I do that I will have a very low number of examples.
What would you suggest to me?
If someone can provide code based on my implementation for stratified-k-split I would be grateful, because I have checked all the online resources but can't implement it.
Any tips, suggestions will be really helpful.
Metrics Plots
Dataset form + Embeddings form + train-test-split form
Dataset's labels distribution
My LSTM implementation

I get different validation loss for almost same type of accuracy

I built two different convolutional neural networks for multi-class classification, and I tested the performance of the two networks using the evaluate_generator function in Keras. Both models give me comparable accuracies: one gives 55.9% and the other 54.8%. However, the model with 55.9% accuracy has a validation loss of 5.37, while the other has 1.24.
How can these test losses be so different when the accuracies are similar? If anything I would expect the loss for the model with 55.9% accuracy to be lower, but it's not.
Isn't loss the total sum of errors the network is making?
Insights would be appreciated.
Isn't loss the total sum of errors the network is making?
Well, not really. A loss function (or cost function) is a function that maps an event, or the values of one or more variables, onto a real number that intuitively represents some "cost" associated with the event.
For example, in regression tasks the loss function can be mean squared error. In classification it is usually binary or categorical crossentropy. These loss functions measure how close your model's understanding of the data is to reality.
Why are both loss and accuracy high?
A high loss doesn't mean your model doesn't know anything. In the basic case you can think of it this way: the smaller the loss, the more confident the model is in its correct choices.
So the model with the higher loss is either not really sure about its right answers or very sure about its wrong ones.
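To make the point above concrete, here is a minimal sketch in plain Python. The labels and probabilities are made up for illustration and the helper names are mine, not Keras functions. Both hypothetical models classify the same two of four samples correctly, yet their categorical crossentropy differs a lot because model A is confidently wrong where model B is only slightly wrong.

```python
import math

def categorical_crossentropy(y_true, probs):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[t]) for t, p in zip(y_true, probs)) / len(y_true)

def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true class."""
    hits = sum(1 for t, p in zip(y_true, probs) if p.index(max(p)) == t)
    return hits / len(y_true)

# Made-up true class indices for four samples and three classes.
y_true = [0, 1, 2, 0]

# Model A: very confident everywhere, including on its two mistakes.
probs_a = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.98, 0.01, 0.01],  # true class 2 only gets 0.01
    [0.01, 0.98, 0.01],  # true class 0 only gets 0.01
]

# Model B: same hits and misses, but with hedged probabilities.
probs_b = [
    [0.50, 0.30, 0.20],
    [0.30, 0.50, 0.20],
    [0.40, 0.25, 0.35],  # true class 2 still gets 0.35
    [0.35, 0.40, 0.25],  # true class 0 still gets 0.35
]

acc_a, acc_b = accuracy(y_true, probs_a), accuracy(y_true, probs_b)
loss_a = categorical_crossentropy(y_true, probs_a)
loss_b = categorical_crossentropy(y_true, probs_b)
print(f"model A: accuracy={acc_a:.2f} loss={loss_a:.3f}")
print(f"model B: accuracy={acc_b:.2f} loss={loss_b:.3f}")
```

The accuracies are identical, but model A's loss is more than twice model B's, because crossentropy penalises a near-zero probability on the true class very heavily.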
You can also read this discussion about high loss and accuracy
Even though the accuracies are similar, loss and accuracy do not have to move together when you compare different models.
Accuracy measures the fraction of correctly classified samples over the total number of samples.
With regard to the loss value, from the Keras documentation:
Return value
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics).
If this doesn't help in your case (I don't have a way to reproduce the issue), please check the following known issues in Keras regarding the evaluate_generator function:
evaluate_generator

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No).
In the dataset there is about 50 input variables, and I have 65,000 entries in the dataset.
I am currently running a RNN with a single hidden layer, with 35 nodes in the hidden layer. I am using PyTorch's NLLLoss as my loss function, and Adaboost for the optimization function. I've tried many different learning rates, and 0.01 seems to be working fairly well.
After running for 150 epochs, I notice that the accuracy on my test data converges to around 0.80. I would like this to be higher, but the model seems to be stuck oscillating around some sort of saddle point or local minimum. (A graph of this is below)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
Not sure why exactly you are using only one hidden layer, or what the shape of your history data is, but here are some things you can try:
Try more than one hidden layer.
Experiment with LSTM and GRU layers, and with combinations of these layers together with the RNN.
Check the shape of your data, i.e. how much history you look at to predict the weather.
Make sure your features are scaled properly since you have about 50 input variables.
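As an illustration of the scaling point, here is a minimal sketch that standardises each feature column to zero mean and unit variance using only the standard library. The three weather columns and their values are hypothetical, and `standardize` is a helper written for this example, not a library function.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Scale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has standard deviation 0; divide by 1.0 instead.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical rows: pressure (hPa), relative humidity (0-1), temperature (C).
rows = [
    [1012.0, 0.40, 18.0],
    [1005.0, 0.85, 12.0],
    [1020.0, 0.30, 25.0],
    [998.0, 0.95, 9.0],
]

scaled = standardize(rows)
for row in scaled:
    print([round(v, 2) for v in row])
```

Without scaling, the pressure column (values around 1000) would dominate the humidity column (values below 1). In a real pipeline the means and standard deviations would normally be computed on the training split only and then reused for the test split.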
Your question is a little ambiguous, as you mention an RNN with a single hidden layer. Also, without knowing the entire neural network architecture, it is tough to say how you can bring in improvements. So, I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function but PyTorch doesn't have any such optimizer. Did you try using SGD or Adam optimizers which are very useful?
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm so that you can compare whether 80% accuracy is good or not?
150 epochs for a binary classification task looks like too much. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.
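On the baseline question, the simplest reference point is a classifier that always predicts the most frequent class. The sketch below uses made-up label counts (the real class balance of the dataset is not given in the question), and `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority_label)
    return majority_label, hits / len(test_labels)

# Assumed class balance for illustration: rain on roughly one day in five.
train_labels = ["No"] * 78 + ["Yes"] * 22
test_labels = ["No"] * 39 + ["Yes"] * 11

label, baseline_acc = majority_baseline_accuracy(train_labels, test_labels)
print(f"always predict {label!r}: accuracy = {baseline_acc:.2f}")
```

If the real data were this imbalanced, an accuracy of 0.80 would be only slightly better than always answering "No", so it is worth computing this number before tuning the network further.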

Getting loss as "nan" while running CNN (using Keras)

Need Suggestion
I am trying to design a model to predict facial keypoints. It's part of a Kaggle competition (https://www.kaggle.com/c/facial-keypoints-detection).
In this solution, I am trying to design a CNN model (using the Keras library) as a multi-output regression model to predict the coordinates of the facial keypoints.
Issue Faced --> I am getting loss as "nan"
Solutions tried --
1. Tried optimizers - Adam, SGD
2. tested with Learning rate 0.01 to 0.00001
3. Tried with various batch sizes
Can anyone suggest what I might be missing? The code is available at the link below:
https://www.kaggle.com/saurabhrathor/facialpoints-practice
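One check that may be worth running before changing optimizers again is whether the regression targets themselves contain missing values. This is an assumption, not something the question confirms. The sketch below uses made-up keypoint rows to show that a single NaN target is enough to turn a mean squared error into NaN, and how such rows can be located. `mse` and `rows_with_missing` are illustrative helpers, not Keras functions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over one row of target coordinates."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rows_with_missing(rows):
    """Indices of rows that contain at least one NaN value."""
    return [i for i, row in enumerate(rows) if any(math.isnan(v) for v in row)]

nan = float("nan")

# Hypothetical targets (x1, y1, x2, y2); the second row lacks one coordinate.
targets = [
    [66.0, 39.0, 30.2, 36.4],
    [64.3, nan, 29.9, 35.0],
    [65.1, 37.3, 31.0, 36.9],
]

bad = rows_with_missing(targets)
clean = [row for i, row in enumerate(targets) if i not in bad]

# Any prediction compared against a NaN target yields a NaN loss.
loss_with_nan = mse(targets[1], [64.0, 35.0, 30.0, 35.0])
print("rows with missing targets:", bad)
print("loss on such a row:", loss_with_nan)
```

If rows like this exist, dropping or imputing them before training is a common first step; if none exist, the cause lies elsewhere (for example unscaled inputs or targets).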

How to write a categorization accuracy loss function for keras (deep learning library)?

Categorization accuracy loss is the percentage of predictions that are wrong, i.e. #wrong/#data points.
Is it possible to write a custom loss function for that?
Thanks.
EDIT
Although Keras allows you to use custom loss function, I am not convinced anymore that using accuracy as loss makes sense. First, the network's last layer will typically be soft-max, so that you obtain a vector of class probabilities rather than the single most likely class. Second, I fear that there will be issues with gradient computation due to lack of smoothness of accuracy.
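The smoothness concern can be seen in a small sketch with made-up numbers. The helper names and the 0.5 threshold are assumptions for illustration. Every predicted probability moves closer to its label, yet the error rate does not change at all, so it gives gradient descent nothing to follow, while binary crossentropy registers the improvement.

```python
import math

def error_rate(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction differs from the label."""
    wrong = sum(1 for t, p in zip(y_true, p_pred) if (p >= threshold) != bool(t))
    return wrong / len(y_true)

def binary_crossentropy(y_true, p_pred):
    """Mean binary crossentropy for labels in {0, 1} and probabilities in (0, 1)."""
    total = sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, p_pred))
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
before = [0.60, 0.40, 0.30, 0.20]
after = [0.70, 0.30, 0.40, 0.10]  # each probability moved toward its label

err_before, err_after = error_rate(y_true, before), error_rate(y_true, after)
bce_before = binary_crossentropy(y_true, before)
bce_after = binary_crossentropy(y_true, after)
print("error rate:  ", err_before, "->", err_after)
print("crossentropy:", round(bce_before, 3), "->", round(bce_after, 3))
```

The error rate is piecewise constant in the predicted probabilities: it only changes when a prediction crosses the threshold, and its gradient is zero everywhere else.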
OLD POST
Keras offers you the possibility to use custom loss functions. To get the accuracy loss, you can take inspiration from the examples that are already implemented. For binary classification, I would suggest the following implementation
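from keras import backend as K
def mean_accuracy_error(y_true, y_pred):
    # Assumes y_pred already holds hard 0/1 predictions; raw probabilities would need rounding first.
    return K.mean(K.abs(K.sign(y_true - y_pred)), axis=-1)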
