Suppose we have five classes: Dog, Cat, Banana, Apple, and Tree.
If we train a CNN on all of them and then ask it to predict the class of an unknown image, such as an image of a "Car", the model still outputs one of the five classes every time.
How can we make the model say, when the input does not belong to any of the training classes, something like "I did not detect the class"?
Thank you
You can solve this problem either by adding a neutral class or by turning your problem into a multi-label classification and adding a threshold.
For the neutral class, you have to go back to the labeling phase and add some random images with the label "other" (be careful that these random images do not contain objects from the other classes, for example, in your case, a dog, because that would hurt your model).
In the multi-label case, you don't need additional images, but you do need to change the activation function of your classification layer (the last layer of your network) from softmax to sigmoid.
For example, from:
keras.layers.Dense(5, activation='softmax')
to
keras.layers.Dense(5, activation='sigmoid')
The main difference between the sigmoid model and the softmax model is that the softmax outputs are guaranteed to sum to 1, whereas the sigmoid outputs are each independently between 0 and 1.
So in this case your model learns to make an independent prediction for each class. For example, if the output is [0.7, 0.4, 0.4, 0.5, 0.8] and we take 0.6 as the threshold, the model is fairly sure the image contains a dog and a tree. If we instead take 0.9 as the threshold, the image contains none of your 5 classes, so you can say "other".
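A minimal sketch of the thresholding step (the class order and the threshold value here are assumptions):
import numpy as np

# hypothetical class order matching the 5 sigmoid outputs
class_names = ['dog', 'cat', 'banana', 'apple', 'tree']
threshold = 0.6

# probs = model.predict(image_batch)[0]  # one row of 5 independent scores
probs = np.array([0.7, 0.4, 0.4, 0.5, 0.8])  # example values from the text above

detected = [name for name, p in zip(class_names, probs) if p >= threshold]
print(detected if detected else 'other')  # -> ['dog', 'tree']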
Related
I have a multi-label text classification task. The train data labels are categories that might exist as tokens in the training data texts. For instance, some observations look like the following:
Train=[["input": "Dogs are animals. Dogs ove humans.", class: ["dog"]],
["input": "Cats are running in the street.", class: ["cat"]],
["input": "Cats and dogs live with humans.", class: ["cat", "dog"]],
["input": "These animals don't each chocolate.", class: ["dog"]]]
I want to train a classifier by fine-tuning a language model using PyTorch. My question is whether I must ensure that the class labels are masked in the training input texts. If not, will the classifier overfit or lose generalizability?
If I must mask the labels in the inputs, how can I do it using PyTorch?
First, the labels are mapped to integer indices, so the model does not know that the text contains the label word.
Secondly, you are right that the model will likely pay more attention to the word "dog" in the text when predicting the "dog" label, but not because the text contains the label itself; the word is simply a useful feature.
Finally, if you want the model to learn more features that are not tied to the label words, do not mask them; instead, replace them with other label words. That is a good solution at present. Refer to MABEL: Attenuating Gender Bias using Textual Entailment Data.
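As an illustration of that replacement idea, here is a rough sketch (the label vocabulary and the replacement strategy are assumptions; plurals and casing are not preserved):
import random
import re

label_words = ["dog", "cat"]  # hypothetical label vocabulary
pattern = re.compile(r"\b(" + "|".join(label_words) + r")s?\b", re.IGNORECASE)

def replace_label_words(text):
    # replace every label word in the text with a different, randomly chosen label word
    def swap(match):
        word = match.group(1).lower()
        return random.choice([w for w in label_words if w != word])
    return pattern.sub(swap, text)

print(replace_label_words("Cats and dogs live with humans."))  # -> "dog and cat live with humans."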
I looked at the documentation of scikit-learn, but it is not clear to me what sort of classification method is used under the hood of the VotingClassifier. Is it logistic regression, an SVM, or some sort of tree method?
I'm interested in ways to vary the classifier method used under the hood. If scikit-learn does not offer such an option, is there a Python package that integrates easily with scikit-learn and offers such functionality?
EDIT:
I meant the classifier method used for the second level model. I'm perfectly aware that the first level classifiers can be any type of classifier supported by scikit-learn.
The second level classifier uses the predictions of the first level classifiers as inputs. So my question is - what method does this second level classifier use? Is it logistic regression? Or something else? Can I change it?
General
The VotingClassifier is not limited to one specific method/algorithm. You can choose multiple different algorithms and combine them to one VotingClassifier. See example below:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

clf1 = LogisticRegression()        # any hyperparameters you like
clf2 = RandomForestClassifier()
clf3 = SVC()

eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svm', clf3)], voting='hard')
Read more about the usage here: VotingClassifier-Usage.
When it comes down to how the VotingClassifier "votes" you can either specify voting='hard' or voting='soft'. See the paragraph below for more detail.
Voting
Majority Class Labels (Majority/Hard Voting)
In majority voting, the predicted class label for a particular sample is the class label that represents the majority (mode) of the class labels predicted by each individual classifier.
E.g., if the prediction for a given sample is
classifier 1 -> class 1
classifier 2 -> class 1
classifier 3 -> class 2
the VotingClassifier (with voting='hard') would classify the sample as "class 1" based on the majority class label.
Source: scikit-learn-majority-class-labels-majority-hard-voting
Weighted Average Probabilities (Soft Voting)
In contrast to majority voting (hard voting), soft voting returns the class label as the argmax of the sum of predicted probabilities.
Specific weights can be assigned to each classifier via the weights parameter. When weights are provided, the predicted class probabilities for each classifier are collected, multiplied by the classifier weight, and averaged. The final class label is then derived from the class label with the highest average probability.
Source/Read more here: scikit-learn-weighted-average-probabilities-soft-voting
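As a rough numeric illustration of that computation (not scikit-learn's actual code; the probabilities and weights below are made up):
import numpy as np

# predicted class probabilities from 3 classifiers for one sample (2 classes)
probas = np.array([[0.9, 0.1],   # classifier 1
                   [0.4, 0.6],   # classifier 2
                   [0.3, 0.7]])  # classifier 3
weights = np.array([2, 1, 1])

avg = np.average(probas, axis=0, weights=weights)  # weighted average per class
print(avg)           # [0.625 0.375]
print(avg.argmax())  # 0 -> the predicted class label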
The VotingClassifier does not fit any meta model on the first level of classifiers output.
It just aggregates the output of each classifier in the first level by the mode (if voting is hard) or averaging the probabilities (if the voting is soft).
In simple terms, VotingClassifier does not learn anything from the first level of classifiers. It only consolidates the output of individual classifiers.
If you want your meta model to be more intelligent, try models such as AdaBoost or gradient boosting.
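For illustration, a small self-contained sketch based on the iris example above (not scikit-learn's internal code) showing that hard voting is just the per-sample mode of the first-level predictions:
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

clf1 = LogisticRegression(max_iter=1000).fit(X, y)
clf2 = RandomForestClassifier(random_state=0).fit(X, y)
clf3 = SVC(random_state=0).fit(X, y)
eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svm', clf3)], voting='hard').fit(X, y)

# hard voting = the most frequent label among the first-level predictions for each sample
preds = np.array([clf1.predict(X), clf2.predict(X), clf3.predict(X)])
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
print(np.array_equal(majority, eclf.predict(X)))  # should print True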
I have a 10-class dataset with which I got 85% accuracy, and I got the same accuracy with the saved model.
Now I want to add a new class. How do I add a new class to the saved model?
I tried deleting the last layer and retraining, but the model overfits, and in prediction every image shows the same result (the newly added class).
This is what I did
model.pop()
base_model_layers = model.output
pred = Dense(11, activation='softmax')(base_model_layers)
model = Model(inputs=model.input, outputs=pred)
# compile and fit step
I have a model trained on 10 classes. I want to load the model, train it on the class-11 data, and get predictions.
Using the model.pop() method and then the Keras Model() API will lead you to an error. The Model() API does not have a .pop() method, so if you want to re-train your model more than once you will hit this error.
But the error only occurs if you save the model after re-training and use the newly saved model for the next re-training.
Another commonly used but wrong approach is model.layers.pop(). This time the problem is that the call only removes the last layer from the copy of the list it returns, so the model itself still keeps the layer; only the method's return value is missing it.
I recommend the following solution:
Assuming you have your already trained model saved in the model variable, something like:
model = load_my_trained_model_function()
from keras.models import Sequential
from keras.layers import Dense

# creating a new model
model_2 = Sequential()

# getting all the layers except the output one
for layer in model.layers[:-1]:  # just exclude the last layer from copying
    model_2.add(layer)

# prevent the already trained layers from being trained again
# (you can use layers[:-n] to only freeze the model layers up to the nth layer)
for layer in model_2.layers:
    layer.trainable = False

# adding the new output layer; the name parameter is important, otherwise you will
# add a layer named Dense_1, which normally already exists, leading to an error
model_2.add(Dense(num_neurons_you_want, name='new_Dense', activation='softmax'))
Now you should specify the compile and fit methods to train your model and it's done:
model_2.compile(loss='categorical_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])

# model_2.fit trains the model
model_history = model_2.fit(x_train, y_train,
                            batch_size=batch_size,
                            epochs=epochs,
                            verbose=1,
                            validation_split=0.1)
EDIT:
Note that by adding a new output layer we lose the weights and biases that were adjusted for the old output layer in the previous training.
Thereby we lose pretty much everything that layer had learned.
We need to save the weights and biases of the output layer from the previous training, and then carry them over to the new output layer.
We also need to think about whether we should let all the layers train or not, or even whether we should allow training of only some intermediate layers.
To get the weights and biases from the output layer using Keras we can use the following method:
# weights_training[0] = layer weights
# weights_training[1] = layer biases
weights_training = model.layers[-1].get_weights()
Now you should specify the weights for the new output layer. You can use, for example, the mean of the old weights as the weights for the new class. It's up to you.
To set the weights and biases of the new output layer using Keras we can use the following method:
model_2.layers[-1].set_weights(weights_re_training)
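One possible way to build weights_re_training before that call (a sketch, assuming the old output layer has 10 units, the new one has 11, the new column is initialized with the mean of the old ones, and model_2 has already been built):
import numpy as np

# weights_training comes from model.layers[-1].get_weights() above
old_weights, old_biases = weights_training                # shapes (n_features, 10) and (10,)

# keep the 10 trained columns and append an 11th column initialized with their mean
new_column = old_weights.mean(axis=1, keepdims=True)       # shape (n_features, 1)
new_weights = np.concatenate([old_weights, new_column], axis=1)
new_biases = np.append(old_biases, old_biases.mean())

weights_re_training = [new_weights, new_biases]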
model.pop()
base_model_layers = model.output
pred = Dense(11, activation='softmax')(base_model_layers)
model = Model(inputs=model.input, outputs=pred)
Freeze the first layers before training it:
for layer in model.layers[:-2]:
    layer.trainable = False
I am assuming that the problem is single-label multi-class classification, i.e. a sample belongs to exactly 1 of the 11 classes.
This answer is based entirely on implementing the way humans learn into machines. Hence, it will not give you complete code for how to do that, but it will tell you what to do, and you will be able to implement it easily in Keras.
How does a human child learn when you teach him new things? At first, we ask him to forget the old and learn the new. This does not actually mean that the old learning is useless but it means that for the time while he is learning the new, the old knowledge should not interfere as it will confuse the brain. So, the child will only learn the new for sometime.
But the problem here is, things are related. Suppose, the child learned C programming language and then learned compilers. There is a relation between compilers and programming language. The child cannot master computer science if he learns these subjects separately, right? At this point we introduce the term 'intelligence'.
The kid who understands that there is a relation between the things he learned before and the things he learned now is 'intelligent'. And the kid who finds the actual relation between the two things is 'smart'. (Going deep into this is off-topic)
What I am trying to say is:
Make the model learn the new class separately.
And then, make the model find a relation between the previously learned classes and the new class.
To do this, you need to train two different models:
The model which learns to classify the new class: this model will be a binary classifier. It predicts 1 if the sample belongs to class 11 and 0 if it doesn't. You already have training data for samples belonging to class 11, but you might not have data for samples which don't belong to class 11. For this, you can randomly select samples belonging to classes 1 to 10. Note that the ratio of samples belonging to class 11 to those not belonging to class 11 should be 1:1 in order to train the model properly. That means 50% of the samples must belong to class 11.
Now you have two separate models: the one which predicts classes 1-10 and the one which predicts class 11. Concatenate the outputs of the second-to-last layers of these two models, feed them into a newly created Dense layer with 11 nodes, and let the whole model retrain itself, adjusting the weights of the two pretrained models and learning the new weights of the Dense layer. Keep the learning rate low.
The final model is the third model which is a combination of two models (without last Dense layer) + a new Dense layer.
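A rough sketch of that combination step with the Keras functional API (the variable names old_model and binary_model and the choice of tapping the second-to-last layers are assumptions; both models must share the same input shape):
from keras.models import Model
from keras.layers import Input, Dense, concatenate
from keras.optimizers import Adam

# sub-models that output the second-to-last layer activations of the two trained models
features_10 = Model(old_model.input, old_model.layers[-2].output)        # classes 1-10 model
features_11 = Model(binary_model.input, binary_model.layers[-2].output)  # binary class-11 model

inputs = Input(shape=old_model.input_shape[1:])
merged = concatenate([features_10(inputs), features_11(inputs)])
outputs = Dense(11, activation='softmax')(merged)

combined = Model(inputs, outputs)
# low learning rate, as suggested above, so the pretrained weights are only fine-tuned
combined.compile(optimizer=Adam(1e-4), loss='categorical_crossentropy', metrics=['accuracy'])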
Thank you..
I trained a CNN model for just one epoch with very little data. I use Keras 2.05.
Here are the CNN model's (partial) last layers, with number_outputs = 201. The training data outputs are one-hot encoded over 201 classes.
model.add(Dense(200, activation='relu', name='full_2'))
model.add(Dense(40, activation='relu', name='full_3'))
model.add(Dense(number_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
The model is saved to an h5 file. Then the saved model is loaded with the same architecture as above. batch_image is an image file.
prediction = loaded_model.predict(batch_image, batch_size=1)
I get a prediction like this:
ndarray: [[ 0.00498065 0.00497852 0.00498095 0.00496987 0.00497506 0.00496112
0.00497585 0.00496474 0.00496769 0.0049708 0.00497027 0.00496049
0.00496767 0.00498348 0.00497927 0.00497842 0.00497095 0.00496493
0.00498282 0.00497441 0.00497477 0.00498019 0.00497417 0.00497654
0.00498381 0.00497481 0.00497533 0.00497961 0.00498793 0.00496556
0.0049665 0.00498809 0.00498689 0.00497886 0.00498933 0.00498056
Questions:
Shouldn't the prediction array be 1s and 0s? Why do I get output that looks as if the activation were sigmoid and the loss binary_crossentropy? What is wrong? I want to emphasize again that the model is not really trained well on the data; it's almost just initialized with random weights.
If I don't train the network well (it has not converged yet), for instance if the weights are just initialized with random numbers, should the prediction still be 1s and 0s?
If I want to get the probability of the prediction and then decide myself how to interpret it, how do I get the probability output after the CNN is trained?
Your number of outputs is 201, which is why your output has shape (1, 201) rather than being a single 0/1 value. You can easily get which class has the highest value by using np.argmax, and that class is your model's prediction for the given input.
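For example, a minimal sketch (reusing loaded_model and batch_image from the question):
import numpy as np

prediction = loaded_model.predict(batch_image, batch_size=1)  # shape (1, 201)
predicted_class = np.argmax(prediction[0])                    # index of the highest value
confidence = prediction[0][predicted_class]
print(predicted_class, confidence)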
And even though you have trained for only 1 epoch, your model has still learned something. It may be very weak, but it is something, and the model predicts the output based on it.
You have used softmax as the activation in your last layer. It normalizes your output in a non-linear fashion so that the sum of the outputs over all classes equals 1. So the value you get for each class can be interpreted as the probability of that class for the given input. (For more clarity, you can look into how the softmax function works.)
And lastly, each class has a value like 0.0049 or similar because the model is not sure which class your input belongs to. So it calculates a value for each class, and softmax normalizes them. That is why your output values are in the range 0 to 1.
For example, say I have four classes; then one probable output could be [0.223 0.344 0.122 0.311], which we read as a confidence score for each class. Looking at these confidence scores, we can say the predicted class is the second one, as it has the highest confidence score of 0.344.
The output of a softmax layer is not 0 or 1. It is a normalized vector whose entries add up to 1: if you sum all the coefficients, you get 1. To get the prediction, take the class with the highest value. You can interpret the values as probabilities even if, technically, they are not. See https://en.wikipedia.org/wiki/Softmax_function for the definition.
This layer is used in the training process so that the prediction of a categorical classification can be compared with the true label.
It is required for the optimization because optimization is done on differentiable functions (ones that have a gradient), and a hard 0/1 output would not be differentiable (not even continuous). The optimization is then done over all these values.
An interesting example is the following: if your true target is [0 0 1 0] and your prediction is [0.1 0.1 0.6 0.2], then even though the prediction is correct, the model can still learn, because it still gives a non-zero probability to the other classes, on which you can compute a gradient.
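To make this concrete, a quick numpy check with those numbers (assuming standard categorical cross-entropy):
import numpy as np

y_true = np.array([0, 0, 1, 0])
y_pred = np.array([0.1, 0.1, 0.6, 0.2])

# categorical cross-entropy: -sum(y_true * log(y_pred))
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.51: non-zero, so there is still a gradient to learn from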
In order to get the prediction output as a class instead of probabilities, use:
model.predict_classes(x_train, batch_size)
My understanding is that softmax gives the likelihood of the value landing in each bucket out of the 201 buckets. With complete certainty about the first bucket you would get [1, 0, 0, 0, 0, ...]. Since very little training/learning/weight adjustment has occurred, the 201 values are all about 0.00497, and together they sum to 1.
There is a decent description of softmax on developers.google.com here.
The output was specified as 'number_outputs', so you get 201 outputs, each of which tells you the likelihood (as a value between 0 and 1) of your prediction being THAT output.
A silly question: after I train my SVM in scikit-learn, do I have to use the predict function predict(X) to predict which class a sample belongs to? (http://scikit-learn.org/dev/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.predict)
Is the X parameter the image feature vector?
If I give it an image it was not trained on (not trained because the SVM asks for at least 3 samples per class), what does it return?
First remark: "predict() returns image similarities with SVM in scikit learn" is not a question. Please put a question in the header of Stack Overflow entries.
Second remark: the predict method of the SVC class in sklearn does not return "image similarities" but a class assignment prediction. Read the http://scikit-learn.org documentation and tutorials to understand what we mean by classification and prediction in machine learning.
Is the X parameter the image feature vector?
No, X is not "the image feature vector": it is a set of image feature vectors with shape (n_samples, n_features), as explained in the documentation you refer to. In your case a sample is an image, hence the expected shape would be (n_images, n_features). The predict API was designed to compute many predictions at once for efficiency reasons. If you want to compute a single prediction, you will have to wrap your single feature vector in an array with shape (1, n_features).
For instance if you have a single feature vector (1D) called my_single_image_features with shape (n_features,) you can call predict with:
predictions = clf.predict([my_single_image_features])
my_single_prediction = predictions[0]
Please note the [] signs around the my_single_image_features variable to turn it into a 2D array.
my_single_prediction will be an integer whose meaning depends on the integer values provided by you when calling the clf.fit(X_train, y_train) method in the first place.
If I give it an image it was not trained on (not trained because the SVM asks for at least 3 samples per class), what does it return?
An image is not "trained". Only the model is trained. Of course you can pass samples / images that are not part of the training set to the predict method. This is the whole purpose of machine learning: making predictions on new unseen data based on what you learn from the statistical regularities seen in the past training data.