How to manually assign weights to some features in SVM?

I ran a multi-class SVM using LibLinear, but the model gives very high weights to certain typical features for every class.
For example: for Class 1, a particular variable that is 0 for Class 1 and non-zero otherwise has a dominating weight in my hyperplane equation.
I want to ignore these specific features while computing the hyperplanes for the corresponding classes. One way is to assign zero weights to those features. How do I change the code to do this?
For example:
For Class 1, I assign W=0 for Feature_1
For Class 2, I assign W=0 for Feature_2
For Class 3, I assign W=0 for Feature_3
and so on...

You will have to do it by hand. A multi-class linear SVM in LibLinear is just a set of binary SVMs trained independently (one-vs-rest), so you can train each binary problem on your own, each time removing (or zeroing) a different feature.
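A minimal sketch of that approach using scikit-learn's LinearSVC (which uses LibLinear under the hood); the class-to-feature mapping is a hypothetical example. Zeroing a feature column before fitting guarantees its weight cannot influence that class's hyperplane:

import numpy as np
from sklearn.svm import LinearSVC

def train_ovr_masked(X, y, masked_feature_per_class):
    # train one binary (one-vs-rest) SVM per class, zeroing one feature each
    models = {}
    for cls, feat in masked_feature_per_class.items():
        X_masked = X.copy()
        X_masked[:, feat] = 0.0    # this feature's weight becomes irrelevant
        models[cls] = (LinearSVC().fit(X_masked, (y == cls).astype(int)), feat)
    return models

def predict_ovr(models, X):
    # pick the class whose masked binary SVM scores highest
    classes = sorted(models)
    scores = []
    for cls in classes:
        clf, feat = models[cls]
        X_masked = X.copy()
        X_masked[:, feat] = 0.0
        scores.append(clf.decision_function(X_masked))
    return np.asarray(classes)[np.argmax(np.column_stack(scores), axis=1)]

# usage (hypothetical mapping): train_ovr_masked(X, y, {1: 0, 2: 1, 3: 2})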


Predict a class that is not in the training dataset

Suppose we have five classes: Dog, Cat, Banana, Apple, and Tree.
If we train a CNN on all five classes and then ask it to predict the class of an unknown image, say an image of a "Car", the model still outputs one of the five classes every time.
Can you please tell me how we can make the model say "I did not detect the class" (or something like that) when the input is not part of the training distribution?
Thank you
You can solve this problem either by adding a neutral class or by turning your problem into a multi-label classification with thresholding.
For the neutral class, you have to go back to the labeling phase and add some random images with the label "other" (be careful that these random images do not contain any of your five classes, for example a dog, because that would hurt your model).
For the multi-label approach, you don't need additional images, but you need to change the activation function of your classification layer (the last layer of your network) from softmax to sigmoid, for example from:
keras.layers.Dense(5, activation='softmax')
to
keras.layers.Dense(5, activation='sigmoid')
The main difference between the sigmoid model and the softmax model is that the softmax outputs are guaranteed to sum to 1, forming a probability distribution over the classes, whereas the sigmoid outputs are independent values, each between 0 and 1.
So your model in this case learns to make an independent prediction for each class. For example, if the output is [0.7, 0.4, 0.4, 0.5, 0.8] and we take 0.6 as the threshold, the model is fairly sure the image contains a dog and a tree; if we instead take 0.9 as the threshold, the image contains none of your 5 classes, and you can say "other".
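A minimal sketch of that thresholding step (the class names and the threshold value are just the illustrative assumptions from above):

import numpy as np

classes = ["dog", "cat", "banana", "apple", "tree"]
threshold = 0.6                                # tune this on a validation set

probs = np.array([0.7, 0.4, 0.4, 0.5, 0.8])    # sigmoid outputs, one per class
detected = [c for c, p in zip(classes, probs) if p >= threshold]
print(detected if detected else "other")       # -> ['dog', 'tree']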

Keras - understand example

I am currently trying to learn deep learning by focusing on Keras and the book "Deep Learning with Python".
I have an example where I understand the code but not the result, and I need your help. The example analyzes movie reviews from the IMDB dataset that is included in Keras. The code goes as follows:
import numpy as np
from keras import models, layers
from keras.datasets import imdb

# load the reviews as word-index sequences (the 10,000 most frequent words)
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

def vectorize_sequences(sequences, dimension=10000):
    # multi-hot encode each review: position j is 1 if word j occurs in it
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

X_train = vectorize_sequences(train_data)
X_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels)
y_test = np.asarray(test_labels)

model = models.Sequential()
model.add(layers.Dense(16, activation="relu", input_shape=(10000,)))
model.add(layers.Dense(16, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=4, batch_size=512)
In the explanation it is written that "the final layer will use a sigmoid activation so as to output a probability indicating how likely the sample is to have the target “1”".
I know that the sigmoid function ranges over [0, 1]. Suppose the output of my network is 0.6.
Why am I allowed to say that this value gives the probability of the target being "1" rather than the target "0"?
I am kind of stuck and need some help :)
The interpretation of your output depends on the labels you used during training. Here, train_labels and test_labels consist of 0s and 1s.
During training, the network is optimized to yield the correct label for each input sequence. So if the output is exactly 0 or 1, the network is giving a completely confident classification; if the output is e.g. 0.5, the network is entirely unsure which class the input belongs to.
Now assume your input actually belongs to class 1. With an output of 0.6, the predicted class is 1, but only with a confidence of 60 percent. The output describes the probability of being class 1, because an output of exactly 1 would match the label perfectly, while an output of 0 would be the worst possible classification given that the label is 1. So the output ranges from 0 to 1, and the closer it is to 1, the better the classification, which is why it can be read as a probability.
But keep in mind that this reading is tied to class 1: if the input instead belongs to class 0, it has to be turned around (the probability of class 0 is 1 minus the output).
So in the end, you have two options. First, you can take these values as they are and treat them as the probability that an input belongs to class 1. Second, you can introduce a threshold (here it makes sense to set it to 0.5) and say: if the output is larger than the threshold, categorize the input as class 1, else as class 0. The closer the output is to 0.5, the more the network is just guessing.
The choice of the threshold has a direct influence on the performance of your network. This can be evaluated, for example, with a ROC curve (https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
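As a small sketch of both options, reusing the variable names from the question's code (the deeper reason the output can be read as P(target = 1) is the binary_crossentropy loss the model is compiled with, which is minimized in expectation exactly when the sigmoid output equals the true probability of label 1):

import numpy as np

probs = model.predict(X_test).ravel()    # sigmoid outputs in [0, 1]

# option 1: read each output as P(label = 1) for that review
print(probs[:5])

# option 2: threshold at 0.5 to obtain hard class labels
preds = (probs >= 0.5).astype(int)
print("accuracy:", np.mean(preds == y_test))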

Using regression instead of classification for multi class classification

I have a multi-class classification problem and am using a random forest classifier. My boss has asked whether it is also possible to view our problem as regression. I understand that for a classification task it is of course natural to use a classifier, but is it possible to implement a regression model?
My data is as such:
I have a dataset consisting of software requirements, these are rated as either 1, 2, 3, 4 or 5.
I am then creating a feature matrix with 10 features, such as num_words, num_sentences, num_syllables, weak_words, flesh_idx, etc., to use for training the model to make predictions on the class.
My model works quite well with 93% accuracy.
Is there a way I can view this problem using regression? Such that the model would make predictions such as 1.5 for example, where the prediction doesn't fall into the class 1 or 2 but somewhere in the middle? Or maybe 2.2, 3.3 etc as opposed to 1, 2, 3, 4, or 5?
I guess the reason is just to see if we can see the software requirement scores in a continuous way.
Try softmax regression (also called multinomial logistic regression), for example with MXNet or with TensorFlow.
The way you can use regression in classification problems is with logistic regression. You can use it individually to classify 1 vs. not 1, 2 vs. not 2, and so on for each class (don't do this), or use softmax, which, in simple words, weights each class and returns a probability for each given class; you then pick the one with the maximum probability, and that will be your predicted class. A lot of neural networks use softmax when working with multi-class classification.
Here is a great article from scikit-learn's documentation:
https://scikit-learn.org/stable/modules/neural_networks_supervised.html
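As a hedged sketch of both routes with scikit-learn (stand-in data below; in practice X would be the 10-feature matrix and y the 1-5 ratings from the question):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 10)              # stand-in for the 10-feature matrix
y = np.random.randint(1, 6, size=100)    # stand-in for the 1-5 ratings

# softmax (multinomial logistic) regression: still a classifier, but it
# exposes one probability per rating, which can be read as a soft score
softmax = LogisticRegression(max_iter=1000).fit(X, y)
probs = softmax.predict_proba(X)         # shape (n_samples, 5)

# plain regression on the ordinal 1-5 labels: continuous predictions
# such as 1.5 or 2.2, as asked in the question
reg = RandomForestRegressor().fit(X, y)
continuous_scores = reg.predict(X)       # e.g. values between 1.0 and 5.0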

What is the classifier used in scikit-learn's VotingClassifier?

I looked at the documentation of scikit-learn, but it is not clear to me what sort of classification method is used under the hood of the VotingClassifier. Is it logistic regression, an SVM, or some sort of tree method?
I'm interested in ways to vary the classifier method used under the hood. If scikit-learn does not offer such an option, is there a Python package that can be integrated easily with scikit-learn and would offer such functionality?
EDIT:
I meant the classifier method used for the second level model. I'm perfectly aware that the first level classifiers can be any type of classifier supported by scikit-learn.
The second level classifier uses the predictions of the first level classifiers as inputs. So my question is - what method does this second level classifier use? Is it logistic regression? Or something else? Can I change it?
General
The VotingClassifier is not limited to one specific method/algorithm. You can choose multiple different algorithms and combine them into one VotingClassifier. See the example below:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target
clf1 = LogisticRegression(...)
clf2 = RandomForestClassifier(...)
clf3 = SVC(...)
eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svm', clf3)], voting='hard')
Read more about the usage here: VotingClassifier-Usage.
When it comes down to how the VotingClassifier "votes" you can either specify voting='hard' or voting='soft'. See the paragraph below for more detail.
Voting
Majority Class Labels (Majority/Hard Voting)
In majority voting, the predicted class label for a particular sample is the class label that represents the majority (mode) of the class labels predicted by each individual classifier.
E.g., if the prediction for a given sample is
classifier 1 -> class 1
classifier 2 -> class 1
classifier 3 -> class 2
the VotingClassifier (with voting='hard') would classify the sample as "class 1" based on the majority class label.
Source: scikit-learn-majority-class-labels-majority-hard-voting
Weighted Average Probabilities (Soft Voting)
In contrast to majority voting (hard voting), soft voting returns the class label as the argmax of the sum of predicted probabilities.
Specific weights can be assigned to each classifier via the weights parameter. When weights are provided, the predicted class probabilities for each classifier are collected, multiplied by the classifier weight, and averaged. The final class label is then derived from the class label with the highest average probability.
Source/Read more here: scikit-learn-weighted-average-probabilities-soft-voting
The VotingClassifier does not fit any meta-model on the output of the first level of classifiers.
It just aggregates the output of each first-level classifier, taking the mode of the predicted labels (if voting is hard) or averaging the predicted probabilities (if voting is soft), as sketched below.
In simple terms, the VotingClassifier does not learn anything from the first level of classifiers; it only consolidates the outputs of the individual classifiers.
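A toy illustration of the two aggregation rules (the numbers are made up; this is not scikit-learn's internal code):

import numpy as np

# hard voting: mode of the predicted labels
preds = np.array([1, 1, 2])                     # one label per classifier
labels, counts = np.unique(preds, return_counts=True)
print(labels[np.argmax(counts)])                # -> 1

# soft voting: argmax of the averaged class probabilities
probas = np.array([[0.8, 0.2],                  # one row per classifier
                   [0.6, 0.4],
                   [0.3, 0.7]])
print(np.argmax(probas.mean(axis=0)))           # -> 0 (0.57 vs 0.43)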
If you want your meta-model to be more intelligent, try boosting models such as AdaBoost or gradient boosting, or fit a proper second-level model with stacking (see the sketch below).
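If you specifically want a learned second-level model, scikit-learn (0.22+) also provides StackingClassifier, which does fit a final_estimator on the first-level predictions. A minimal sketch:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier()), ('svm', SVC())],
    final_estimator=LogisticRegression(),  # the second-level model; swap it as needed
)
stack.fit(X, y)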

when to use to_categorical in keras

Can someone please explain when keras.utils.np_utils.to_categorical is to be used?
I understand that it converts a class vector into a binary (one-hot) matrix, presumably for use in a deep learning model.
But if we go ahead and use the class vector itself, and then use model.predict_classes, what is the drawback?
A classification model with multiple classes doesn't work well unless the classes are distributed as a binary matrix.
Suppose you have three classes; the vectors go like this:
[1, 0, 0] = class 1
[0, 1, 0] = class 2
[0, 0, 1] = class 3
You use to_categorical to transform your training data before you pass it to your model.
If your training data uses classes as numbers, to_categorical will transform those numbers into proper vectors for use with models. With the usual categorical cross-entropy loss, you can't simply train the model without that.
Unfortunately, predict_classes is not documented, so it's probably better not to use it. But it does roughly the inverse of to_categorical: your model outputs vectors, and predict_classes turns those vectors into human-readable class numbers.
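A minimal sketch of the round trip (assuming the plain keras.utils import path; older versions expose it as keras.utils.np_utils.to_categorical):

import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 1, 2, 1])                  # integer class vector
one_hot = to_categorical(labels, num_classes=3)  # [[1,0,0],[0,1,0],[0,0,1],[0,1,0]]

recovered = np.argmax(one_hot, axis=1)           # the inverse: back to [0, 1, 2, 1]
print(one_hot, recovered)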
I know this is an old thread, but I figured I'd help clarify.
The reason you want to_categorical (even on numeric labels) is how the algorithm interprets the relationship between your labels.
For example, suppose you made a color classifier and mark red as 1, blue as 2, and orange as 3.
Now you feed them into the machine learning algorithm to help decide what your input matches. The math is going to say that orange is higher than red. That obviously isn't your intent (the labels are categories, not quantities), but the network would nevertheless learn that orange is greater than red. One-hot encoding removes this spurious ordering, since every class vector is equally distant from every other.
