I am new at deep learning.
I have a dataset with 1001 values of human pose upper body. The model I trained for that has 4 Conv layers and 2 fully connected layers with ReLu and Dropout. This is the result I got after 200 iterations. Does anyone have any ideas about why the curve of training loss decreases in a sharp way?
I think probably I need more data, as my dataset concludes numerical values what do you think is the best data augmentation method I have to use here?
Related
I'm trying to train a multilabel text classification model using BERT. Each piece of text can belong to 0 or more of a total of 485 classes. My model consists of a dropout layer and a linear layer added on top of the pooled output from the bert-base-uncased model from Hugging Face. The loss function I'm using is the BCEWithLogitsLoss in PyTorch.
I have millions of labeled observations to train on. But the training data are highly unbalanced, with some labels appearing in less than 10 observations and others appearing in more than 100K observations! I'd like to get a "good" recall.
My first attempt at training without adjusting for data imbalance produced a micro recall rate of 70% (good enough) but a macro recall rate of 45% (not good enough). These numbers indicate that the model isn't performing well on underrepresented classes.
How can I effectively adjust for the data imbalance during training to improve the macro recall rate? I see we can provide label weights to BCEWithLogitsLoss loss function. But given the very high imbalance in my data leading to weights in the range of 1 to 1M, can I actually get the model to converge? My initial experiments show that a weighted loss function is going up and down during training.
Alternatively, is there a better approach than using BERT + dropout + linear layer for this type of task?
In your case it might be helpful to balance the labels in the training data. You have a lot of data, so you could afford to loose a part of it by balancing. But before you do this, I recommend to read this answer about balancing classes in traing data.
If you really only care about recall, you could try to tune your model maximizing recall.
Below is a plot of validation loss and training loss for my CNN model.
The validation loss is decreasing with training loss but there is a gap between the two functions.
What does this mean? The model is not overfitting as the validation loss is decreasing, but is something wrong with the model because there is a gap between the two functions?
I'm new to this, so please help.
Validatoin loss
Overfitting is not necessarily accompanied by the flattening of the validation loss curve - the gap between the loss curves simply indicates that the model is learning relationships that do not apply to the validation data. The first thing I would check in such a scenario is the balance of the sets - do both training and validation sets comprise of an equal distribution of classes/values? Was the entire dataset properly shuffled before assigning 'training' and 'validation' tags to them?
I applied batch normalization technique to increase the accuracy of my cnn model.The accuracy of model without batch Normalization was only 46 % but after applying batch normalization it crossed 83% but a here arisen a bif overfitting problem that the model was giving validation Accuracy only 15%. Also please tell me how to decide no of filters strides in convolution layer and no of units in dence layer
Batch normalization has been shown to help in many cases but is not always optimal. I found that it depends where it resides in your model architecture and what you are trying to achieve. I have done a lot with different GAN CNNs and found that often BN is not needed and can even degrade performance. It's purpose is to help the model generalize faster but sometimes it increases training times. If I am trying to replicate images, I skip BN entirely. I don't understand what you mean with regards to the accuracy. Do you mean it achieved 83% accuracy with the training data but dropped to 15% accuracy on the validation data? What was the validation accuracy without the BN? In general, the validation accuracy is the more important metric. If you have a high training accuracy and a low validation accuracy, you are indeed overfitting. If you have several convolution layers, you may want to apply BN after each. If you still over-fit, try increasing your strides and kernel size. If that doesn't work you might need to look at the data again and make sure you have enough and that it is somewhat diverse. Assuming you are working with image data, are you creating samples where you rotate your images, crop them, etc. Consider synthetic data to augment your real data to help combat overfiiting.
I am training a CNN model(made using Keras). Input image data has around 10200 images. There are 120 classes to be classified. Plotting the data frequency, I can see that sample data for every class is more or less uniform in terms of distribution.
Problem I am facing is loss plot for training data goes down with epochs but for validation data it first falls and then goes on increasing. Accuracy plot reflects this. Accuracy for training data finally settles down at .94 but for validation data its around 0.08.
Basically its case of over fitting.
I am using learning rate of 0.005 and dropout of .25.
What measures can I take to get better accuracy for validation? Is it possible that sample size for each class is too small and I may need data augmentation to have more data points?
Hard to say what could be the reason. First you can try classical regularization techniques like reducing the size of your model, adding dropout or l2/l1-regularizers to the layers. But this is more like randomly guessing the models hyperparameters and hoping for the best.
The scientific approach would be to look at the outputs for your model and try to understand why it produces these outputs and obviously checking your pipeline. Did you had a look at the outputs (are they all the same)? Did you preprocess the validation data the same way as the training data? Did you made a stratified train/test-split, i.e. keeping the class distribution the same in both sets? Is the data shuffles when you feed it to your model?
In the end you have about ~85 images per class which is really not a lot, compare CIFAR-10 resp. CIFAR-100 with 6000/600 images per class or ImageNet with 20k classes and 14M images (~500 images per class). So data augmentation could be beneficial as well.
I am training a neural network with 3 convolutional layers and 1 fully connected layer which further connected to a regression layer. My training data set if TID: for image quality estimation task. I am pre-processing images by doing local contrast normalization on a small 6 * 6 window, which results in pixel values ranging from -4 to +4 approximately. I am using RELU activation function in all the layers.
The problem is that the gradient values for different layers produced by my network are very small, somewhere around 2e-6 to 2e-7, which I guess are not ideal for good convergence as it is not producing expected results. I have already tried altering initialization of network weights and biases, changing learning rates, employing L1 and L2 regularization, etc., but nothing seems to be help elevate this problem.
So my first question is that is it actually a problem? or is a common thing to have such small gradient values in convolutional network? If it is a problem then what could an appropriate solution for it?