Image classification models trained on animal classification data such as iNaturalist or iWildCam sometimes develop spurious correlations with the background. How can one measure the performance limitations caused only by such spurious correlations, as opposed to other plausible (non-spurious) causes (e.g. two animals that genuinely look a lot alike)?
Related
I want to compare the performance of ELMo and word2vec as word embeddings, using a CNN model to classify 4,000 tweets into five class labels, but the results show that ELMo performs worse than word2vec.
I used ELMoformanylangs for ELMo, and a word2vec model pretrained on 1 million tweets.
Loss curve of word2vec-CNN
Loss curve of ELMo-CNN
The curves show that both models are overfitting, but why would ELMo be worse than word2vec?
From the elmoformanylangs project you've linked, it looks like your generic ELMo model was trained "on a set of 20-million-words data randomly sampled from the raw text released by the shared task (wikidump + common crawl)".
Given that many tweets are longer than 20 words, your 1-million-tweet training set for word2vec might be more training data than was used for the ELMo model (at 20 words per tweet, 1 million tweets already amounts to 20 million words). And, coming from actual tweets, it may also reflect the words and word senses used in tweets better than generic wikidump/common-crawl text.
Given that, I'm not sure why you'd have expected the ELMo approach to necessarily be better.
But also, as you've noted, the fact that your classifier performs worse with more training is highly indicative of extreme overfitting. You may want to fix that before attempting to reason any further about the relative merits of the different approaches. (When both classifiers are massively broken, exactly why one's brokenness is a bit better than the other's should be a fairly moot point. After they're both fixed to do as well as they can, the remaining difference may be interesting to choose between, or to understand deeply.)
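One common way to rein in this kind of overfitting (my suggestion, not something prescribed above) is patience-based early stopping: stop once the validation loss has failed to improve for a few epochs, and keep the weights from the best epoch. A minimal sketch in plain Python follows; the function name `early_stopping_point` and the loss values are made up for illustration.

```python
def early_stopping_point(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops at the first epoch where the validation loss has not
    improved for `patience` consecutive epochs. If that never happens,
    stop_epoch is simply the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: they improve, then climb as the model overfits.
val_losses = [1.60, 1.42, 1.35, 1.38, 1.47, 1.59, 1.73]
stop_epoch, best_epoch = early_stopping_point(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Comparing the two embeddings at each model's best validation epoch, rather than at the last epoch, should give a fairer picture.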
I am trying to train a CNN model for a regression problem. After that, I bin the predicted labels into 4 classes and check some accuracy metrics. In the confusion matrix, the accuracy for classes 2 and 3 is around 54%, while the accuracy for classes 1 and 4 is above 90%. Labels lie between 0 and 100, and the classes are 1: 0-45, 2: 45-60, 3: 60-70, 4: 70-100. I do not know where the problem comes from. Is it because of the distribution of labels in the training set, and what is the solution? Regards...
I attached the plot in the following link.
Training set target distribution
It's not a good idea to create classes that way. When some classes get a smaller window of values (i.e. you predict class 2 for 15 values and class 1 for 45 values), it is intrinsically more difficult for your model to predict class 2, and the best thing the model will learn during training is to avoid class 2 as much as possible.
You can confirm this by looking at the false negatives for classes 2 and 3; if there are too many, this may be the cause.
The best thing to do would be to split your output space into equal portions and trust your model to learn which classes are less frequent, without trying to force that proportion yourself.
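To make the two suggestions above concrete, here is a minimal sketch that maps regression outputs in 0-100 to four equal-width classes and reports per-class recall (a low recall means many false negatives for that class). The helper names `to_class` and `per_class_recall` and the sample numbers are invented for illustration; they are not from the question.

```python
def to_class(value, n_classes=4, low=0.0, high=100.0):
    """Map a value in [low, high] to a class 1..n_classes using equal-width bins."""
    width = (high - low) / n_classes
    idx = int((value - low) // width)
    # Clamp so that value == high falls into the last class.
    return min(max(idx, 0), n_classes - 1) + 1


def per_class_recall(y_true, y_pred, n_classes=4):
    """Recall per class: correct predictions / number of true samples of that class."""
    recall = {}
    for c in range(1, n_classes + 1):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recall[c] = hits / total if total else None
    return recall


# Made-up regression targets and model predictions on the 0-100 scale.
targets = [10, 30, 40, 55, 62, 68, 80, 95]
preds = [12, 20, 52, 48, 70, 60, 90, 99]

y_true = [to_class(v) for v in targets]
y_pred = [to_class(v) for v in preds]
recall = per_class_recall(y_true, y_pred)
print(recall)
```

With equal-width bins (0-25, 25-50, 50-75, 75-100), a class that still shows a much lower recall than the others points at the model or the data rather than at the binning.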
If you still don't get good results, you will have to improve your model in other ways; using data augmentation to get a more uniform distribution of training samples may help.
If this doesn't sound convincing to you, have a look at this paper:
https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-vehicle-in-a-neural-network.pdf
In end-to-end models for autonomous driving, neural networks have to predict classes indicating the steering angle. The distribution of these values is highly imbalanced as most of the time the car is going straight. Despite this, the best models do not discriminate against some classes to adapt to data distribution.
Good luck!
I am training a CNN model (built with Keras). The input data has around 10,200 images, and there are 120 classes to be classified. Plotting the class frequencies, I can see that the number of samples per class is more or less uniform.
The problem I am facing is that the training loss goes down with epochs, but the validation loss first falls and then keeps increasing. The accuracy plot reflects this: training accuracy finally settles at 0.94, but validation accuracy is around 0.08.
Basically, it's a case of overfitting.
I am using a learning rate of 0.005 and dropout of 0.25.
What measures can I take to get better validation accuracy? Is it possible that the sample size for each class is too small and I need data augmentation to get more data points?
Hard to say what the reason could be. First, you can try classical regularization techniques such as reducing the size of your model or adding dropout or L1/L2 regularizers to the layers. But this is more like randomly guessing the model's hyperparameters and hoping for the best.
The scientific approach would be to look at the outputs of your model, try to understand why it produces them, and of course check your pipeline. Did you have a look at the outputs (are they all the same)? Did you preprocess the validation data the same way as the training data? Did you make a stratified train/test split, i.e. keeping the class distribution the same in both sets? Is the data shuffled when you feed it to your model?
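As an illustration of the stratified, shuffled split mentioned above, here is a minimal sketch in plain Python. The function `stratified_split` is a hypothetical helper, and the labels are synthetic (120 classes with 85 samples each, roughly matching the question); in practice a library routine would normally do this job.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so each class is split at about the same ratio.

    Note: a class with a single sample ends up entirely in the validation set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within the class before cutting
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    # Shuffle again so samples are not fed to the model grouped by class.
    rng.shuffle(train_idx)
    rng.shuffle(val_idx)
    return train_idx, val_idx


# Synthetic labels: 120 classes x 85 images = 10,200 samples.
labels = [c for c in range(120) for _ in range(85)]
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
val_counts = Counter(labels[i] for i in val_idx)
print(len(train_idx), len(val_idx), set(val_counts.values()))
```

Every class contributes the same number of validation images here, so a validation accuracy near chance cannot be blamed on a skewed split.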
In the end you have about 85 images per class, which is really not a lot; compare CIFAR-10 and CIFAR-100 with 6000 and 600 images per class respectively, or ImageNet with 20k classes and 14M images (~500 images per class). So data augmentation could be beneficial as well.
What approach should I take when I want my multi-class CNN to output something like [0.1, 0.1] when an image doesn't belong to any class? Using softmax and categorical_crossentropy for multi-class classification would give me an output that sums to 1, so that's still not what I want.
I'm new to neural networks, so sorry for the silly question, and thanks in advance for any help.
I think you will want to look into Bayesian learning. First, let's talk about uncertainty.
For example, given several pictures of dog breeds as training data—when a user uploads a photo of his dog—the hypothetical website should return a prediction with rather high confidence. But what should happen if a user uploads a photo of a cat and asks the website to decide on a dog breed?
The above is an example of out of distribution test data. The model has been trained on photos of dogs of different breeds, and has (hopefully) learnt to distinguish between them well. But the model has never seen a cat before, and a photo of a cat would lie outside of the data distribution the model was trained on. This illustrative example can be extended to more serious settings, such as MRI scans with structures a diagnostics system has never observed before, or scenes an autonomous car steering system has never been trained on.
A possible desired behaviour of a model in such cases would be to return a prediction (attempting to extrapolate far away from our observed data), but return an answer with the added information that the point lies outside of the data distribution. We want our model to possess some quantity conveying a high level of uncertainty with such inputs (alternatively, conveying low confidence).
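One simple quantity of this kind is the entropy of the averaged class probabilities. The sketch below assumes you can obtain several softmax outputs for the same image (for example from repeated stochastic forward passes); that setup, the helper names, the numbers, and the threshold are all assumptions made for illustration, not necessarily the method of the papers referred to here.

```python
import math


def mean_probs(samples):
    """Average several probability vectors element-wise."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def entropy(probs):
    """Shannon entropy in nats: 0 for a one-hot vector, log(K) for a uniform one."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_reject(samples, threshold):
    """Return (class index or None, entropy) from the averaged probabilities."""
    probs = mean_probs(samples)
    h = entropy(probs)
    if h > threshold:
        return None, h  # too uncertain: treat as "belongs to no known class"
    return probs.index(max(probs)), h


# Hypothetical softmax outputs over 3 dog breeds from three forward passes.
dog_photo = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.92, 0.04, 0.04]]
cat_photo = [[0.70, 0.20, 0.10], [0.10, 0.75, 0.15], [0.20, 0.15, 0.65]]

threshold = 0.8  # illustrative value; it would need tuning on held-out data
print(predict_or_reject(dog_photo, threshold))
print(predict_or_reject(cat_photo, threshold))
```

The passes agree on the dog photo, so the averaged distribution is peaked and the entropy is low; they disagree on the cat photo, so the average is close to uniform and the input is rejected.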
Then, I think you could briefly read this paper, which also applies the idea to classification tasks and generates uncertainty for classes (dog, cat...). From there, you can extend your findings to your application using this paper, and I think you will find what you want.
I am using a Keras Sequential model for binary classification, but my data is unbalanced. I have 2 feature columns and 1 output column (1/0). I have 10,000 rows of data, of which only 20 have output 1; all others are 0. I then extended the data size to 40,000, and still only 20 rows have output 1; all others are 0. Since the data is unbalanced (0 dominates 1), which neural network would be better for correct prediction?
First of all, two features is a really small number. Neural networks are highly non-linear models with a very high number of degrees of freedom, so if you try to train a network with more than just a couple of neurons it will overfit even with balanced classes. You can find models better suited to low dimensionality, such as Support Vector Machines, in the scikit-learn library.
As for unbalanced data, the most common techniques are undersampling and oversampling. Undersampling basically means training your model several times on a fraction of the dataset that contains the non-dominant class plus a random sample of the dominant class, so that the ratio is acceptable, whereas oversampling consists of generating artificial data to balance the classes. In most cases undersampling works better.
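A minimal sketch of both ideas in plain Python, using synthetic rows with 2 features and a 0/1 label (10,000 rows, 20 positives, as in the question). The function names are made up, and the oversampling here simply repeats minority rows at random, which is a simpler stand-in for generating genuinely new artificial points.

```python
import random


def undersample(majority, minority, ratio=1.0, seed=0):
    """Keep every minority row plus a random majority subset of size ratio * len(minority)."""
    rng = random.Random(seed)
    n_keep = min(len(majority), int(ratio * len(minority)))
    return rng.sample(majority, n_keep) + list(minority)


def oversample(majority, minority, seed=0):
    """Repeat minority rows (sampling with replacement) until both classes are equal in size."""
    rng = random.Random(seed)
    n_extra = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(majority) + list(minority) + extra


# Synthetic data: each row is ((feature_1, feature_2), label).
rng = random.Random(42)
majority = [((rng.random(), rng.random()), 0) for _ in range(9980)]
minority = [((rng.random(), rng.random()), 1) for _ in range(20)]

under = undersample(majority, minority, ratio=5.0)  # 100 negatives + 20 positives
over = oversample(majority, minority)               # 9980 negatives + 9980 positives
print(len(under), len(over))
```

Calling `undersample` with different seeds gives a different majority subset each time, which matches the "train several times" idea described above.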
Also, when working with unbalanced data, it's quite important to choose the right metric based on what matters most for the problem (is minimizing false positives more important than minimizing false negatives, etc.).
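To see why the metric matters, here is a small sketch with the class balance from the question (40,000 rows, 20 positives). Both sets of predictions are invented: one model always outputs 0, the other finds 15 of the 20 positives at the cost of 85 false alarms. Precision and recall are reported as 0.0 when they are undefined, which is a convention chosen for this sketch.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1] * 20 + [0] * 39980

# Model A: always predicts the dominant class.
always_zero = [0] * 40000
# Model B: 15 true positives, 5 misses, 85 false alarms.
detector = [1] * 15 + [0] * 5 + [1] * 85 + [0] * 39895

m_zero = metrics(y_true, always_zero)
m_detector = metrics(y_true, detector)
print(m_zero)
print(m_detector)
```

The always-zero model has the higher accuracy (0.9995 against about 0.998) but a recall of 0, so accuracy alone would pick the model that never finds a positive.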