Is there a direct implementation of multiclass SVM in R(e1071) - svm

I have five classes and I want to use SVM (the e1071 package) for classification. I can find good examples of binary classification with SVM. For multiclass support, however, some members have suggested training One_vs_Rest or One_vs_One binary classifiers and combining them to get the final prediction. Is there a direct implementation of multiclass SVM available (either approach is fine for me)?

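Yes, I found the solution. svm() in e1071 supports multiclass classification directly: with more than two classes it uses the one-against-one approach and combines the pairwise classifiers by voting. The example below is adapted from the basic R help file; it is short and to the point, with comments.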
library(e1071)    # svm()
library(caTools)  # sample.split()
data(iris)
set.seed(42)      # make the random split reproducible
## ---------- Split the dataset: 70% for training, 30% for testing ----------
index_iris <- sample.split(iris$Species, SplitRatio = 0.7)
trainset_iris <- iris[index_iris == TRUE, ]
testset_iris <- iris[index_iris == FALSE, ]
y <- testset_iris$Species
## ---------- Create an SVM model from the training set ----------
## svm() handles more than two classes itself using one-against-one
model <- svm(Species ~ ., data = trainset_iris)
# print(model)
# summary(model)
## ---------- Predict the test set and tabulate predictions against the truth ----------
pred <- predict(model, testset_iris)
table(pred, y)
## ---------- Compute the pairwise decision values ----------
## (class probabilities would also need probability = TRUE in both svm() and predict())
pred <- predict(model, testset_iris, decision.values = TRUE)
attr(pred, "decision.values")
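To make the one-vs-one idea concrete, here is a minimal sketch in plain Python (not e1071's or libsvm's actual code) of how pairwise decision values can be turned into a single class by majority vote. It assumes the convention that a positive value in a column labelled "a/b" is a vote for a and a negative value is a vote for b. The function name and the numbers are made up for illustration.

```python
from collections import Counter

def one_vs_one_vote(decision_values):
    """Pick a class by majority vote over pairwise decision values.

    decision_values maps "a/b" pair labels to a signed score; by the
    assumed convention, score > 0 votes for a, otherwise for b.
    """
    votes = Counter()
    for pair, score in decision_values.items():
        first, second = pair.split("/")
        votes[first if score > 0 else second] += 1
    # most_common(1) returns [(label, count)] for the top-voted class
    return votes.most_common(1)[0][0]

# Hypothetical decision values for one iris test row (3 classes -> 3 pairs)
row = {
    "setosa/versicolor": -1.2,
    "setosa/virginica": -1.5,
    "versicolor/virginica": 0.8,
}
predicted = one_vs_one_vote(row)
print(predicted)
```

With k classes there are k(k-1)/2 such pairs, so the five-class problem in the question would involve 10 pairwise classifiers.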

How to use different loss functions at different time steps of an LSTM in Keras

I have a set of sentences and their scores, and I would like to train a marking system that predicts the score for a given sentence. One example looks like this:
(X =Tomorrow is a good day, Y = 0.9)
I would like to use an LSTM to build this marking system and to take the sequential relationship between the words in the sentence into account, so the training example above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The overall loss is therefore composed of two different loss functions, and Keras does not seem to provide a direct way to do this. I am also not sure whether my approach to building the marking system is correct.
Keras supports multiple loss functions, one per model output:
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first predicts each token from the previous tokens (in NLP this is usually called a language model) and then computes a score, which I assume is a sentiment score (the same idea applies to other domains).
To do so, you can train your language model with an LSTM and use the last output of the LSTM for your ranking task. You then need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial may be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
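As a rough illustration of what the two-loss setup computes, here is a plain-Python sketch (not Keras code) that evaluates a cross-entropy loss on the token-prediction steps, an MSE loss on the final score, and combines them as a weighted sum in the way loss_weights does. The vocabulary size, the probabilities and the predicted score 0.7 are invented for the example.

```python
import math

def categorical_crossentropy(true_index, probs):
    """Cross-entropy for one step: -log of the probability given to the true token."""
    return -math.log(probs[true_index])

def mse(y_true, y_pred):
    """Squared error for a single scalar target."""
    return (y_true - y_pred) ** 2

# Hypothetical softmax outputs over a 4-word vocabulary for three steps,
# each paired with the index of the word that actually came next.
token_steps = [
    (1, [0.1, 0.7, 0.1, 0.1]),
    (2, [0.2, 0.2, 0.5, 0.1]),
    (3, [0.25, 0.25, 0.25, 0.25]),
]
lm_loss = sum(categorical_crossentropy(t, p) for t, p in token_steps) / len(token_steps)

# Final step: hypothetical predicted score vs. the labelled score 0.9
score_loss = mse(0.9, 0.7)

# Weighted sum of the two losses, mirroring loss_weights=[1., 1.]
loss_weights = [1.0, 1.0]
total_loss = loss_weights[0] * lm_loss + loss_weights[1] * score_loss
print(round(lm_loss, 4), round(score_loss, 4), round(total_loss, 4))
```

Because the two terms can live on very different scales, the weights are the natural knob to turn if one task dominates training.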

Feature selection on a keras model

I am trying to find the features that most influence the output of my regression model. Here is my code:
seed = 7
np.random.seed(seed)
estimators = []
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=3,
                                         batch_size=20)))
pipeline = Pipeline(estimators)
rfe = RFE(estimator=pipeline, n_features_to_select=5)
fit = rfe.fit(X_set, Y_set)
But I get the following runtime error when running.
RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
How can I overcome this issue and select the best features for my model? If that is not possible, can I use an estimator that RFE in scikit-learn supports, such as LogisticRegression(), to find the best features for my dataset?
I assume your Keras model is some kind of neural network. With neural networks it is generally hard to see which input features are relevant and which are not, because each input feature has multiple coefficients linked to it, one for each node of the first hidden layer. Additional hidden layers make it even harder to determine how much impact an input feature has on the final prediction.
For linear models, on the other hand, it is straightforward: each feature x_i has a corresponding weight/coefficient w_i, and its magnitude directly determines how much impact the feature has on the prediction (assuming the features are scaled, of course).
The RFE estimator (recursive feature elimination) assumes that your prediction model has an attribute coef_ (linear models) or feature_importances_ (tree models) with one entry per input feature that represents each feature's relevance (in absolute terms).
My suggestion:
1. Feature selection:
(a) Run RFE on any linear / tree model to reduce the number of features to some desired number n_features_to_select.
(b) Use regularized linear models like lasso / elastic net that enforce sparsity. The problem here is that you cannot directly set the actual number of selected features.
(c) Use any other feature selection technique from here.
2. Neural network: Use only the features from (1) for your neural network.
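The elimination loop that RFE performs can be sketched by hand with NumPy. This is a simplified illustration with a plain least-squares model, not scikit-learn's implementation; the function name rfe_linear and the synthetic data are assumptions made for the example.

```python
import numpy as np

def rfe_linear(X, y, n_features_to_select):
    """Hand-rolled recursive feature elimination with a least-squares model.

    Repeatedly fits y ~ X on the remaining (standardized) columns and
    drops the column whose coefficient has the smallest magnitude.
    Returns the indices of the surviving columns.
    """
    # Standardize so coefficient magnitudes are comparable across features
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        coef, *_ = np.linalg.lstsq(Xs[:, remaining], yc, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))
        del remaining[weakest]
    return remaining

# Synthetic data: only columns 0 and 3 influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

selected = rfe_linear(X, y, n_features_to_select=2)
print(selected)
```

The surviving column indices can then be used to slice the input (for example X[:, selected]) before it is passed to the neural network. In practice, scikit-learn's RFE with a linear or tree estimator does this job for you.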
Suggestion:
Run RFE on a scikit-learn estimator to observe feature importance, then use the most important features to train your Keras model.
To your question: standardization is not strictly required for logistic regression, although it matters when you compare coefficient magnitudes or use regularization.

Scikit SVM gives very poor accuracy for STL-10 dataset

I am using the scikit-learn SVM to train a model on the STL-10 dataset, which contains 5000 training images (10 pre-defined folds). So I have a dataset of size 5000*96*96*3 for training and testing. I used the following code to train on 80% of it and measure the accuracy on the remaining 20%. The final accuracy was 0.323. How can I increase the accuracy of the SVM?
This is STL10 dataset
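from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
def train_and_evaluate(clf, train_x, train_y):
    clf.fit(train_x, train_y)
# fit() only accepts 2D input, so flatten each image into one row
# (images, read_labels and LABEL_PATH come from the STL-10 loading code, which is not shown)
nsamples, nx, ny, nz = images.shape
reshaped_train_dataset = images.reshape((nsamples, nx * ny * nz))
X_train, X_test, Y_train, Y_test = train_test_split(reshaped_train_dataset, read_labels(LABEL_PATH), test_size=0.20, random_state=33)
my_svc = SVC()  # assumption: the original post does not show how my_svc was created
train_and_evaluate(my_svc, X_train, Y_train)
# evaluate the classifier that was actually trained (the original referenced an undefined clf2)
print(metrics.accuracy_score(Y_test, my_svc.predict(X_test)))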
So it seems you are running an SVM directly on the raw pixels. That is usually not a good idea (it is rather bad, actually).
I will describe the classic image-classification pipeline that was popular over the last decades. Keep in mind that the highest-performing approaches right now may use deep neural networks to combine some of these steps (a very different approach, with a lot of research in recent years).
First step:
Preprocessing is needed!
Normalize mean and variance (I would not expect your dataset to be normalized already)
Optional: histogram-equalization
Second step:
Feature-extraction -> you should learn some features from these images. There are a lot of approaches including
(Kernel-)PCA
(Kernel-)LDA
Dictionary-learning
Matrix-factorization
Local binary patterns
... (just test with LDA initially)
Third:
SVM for classification
Again, a normalization step might be needed before this, and as mentioned in the comments by @David Batista, some parameter tuning might be needed (especially for kernel SVM)
It is also not clear whether using color information is wise here. For simpler approaches I expect black-and-white images to be superior (you lose information, but tuning your pipeline is more robust; high-performance approaches will of course use color information).
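If you want to try the grayscale route, a minimal NumPy sketch could look like the following. It assumes the images are stored channel-last as (n, 96, 96, 3), as the shape in the question suggests, and uses the common ITU-R BT.601 luma weights; the helper name to_grayscale and the random stand-in data are made up for illustration.

```python
import numpy as np

def to_grayscale(images):
    """Convert (n, height, width, 3) RGB images to (n, height, width) luminance."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return images @ weights

# Stand-in batch with STL-10's image shape (random pixels, not real data)
rng = np.random.default_rng(0)
images = rng.random(size=(5, 96, 96, 3))

gray = to_grayscale(images)
flat = gray.reshape(len(gray), -1)  # one row per image, ready for fit()
print(gray.shape, flat.shape)
```

This also cuts the number of raw features per image by a factor of three, from 27648 to 9216.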
See here for a tutorial describing a similar problem. While I don't know if it's good work, you can immediately recognize the processing pipeline mentioned above (preprocessing, feature extraction, classifier learning).
Edit:
Why preprocessing? Some algorithms assume centered samples with unit variance, so normalization is needed. This is (at least) very important for PCA, LDA and SVMs.
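The first two steps of that pipeline (normalization, then feature extraction) can be sketched with NumPy alone. This is only an illustration under stated assumptions: plain PCA via SVD stands in for the feature extractor, the helper names are invented, and random numbers stand in for the flattened images.

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps), mean, std

def pca_fit_transform(X, n_components):
    """Project centered data onto its top principal directions via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 20))  # stand-in for flattened images

X_std, mean, std = standardize(X_train)
features, components = pca_fit_transform(X_std, n_components=5)
print(features.shape)
```

The reduced features, rather than the raw pixels, would then be passed to the SVM. Test images should be transformed with the mean, standard deviation and components computed on the training set, not with statistics of their own.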

Negative R2 on training data for linear regression

Using scikit-learn to fit a one-dimensional model, without an intercept:
import sklearn.linear_model
lm = sklearn.linear_model.LinearRegression(fit_intercept=False)
lm.fit(x, y)
When evaluating the score using the training data I get a negative .score().
lm.score(x, y)
-0.00256
Why? Does the R2 score compare the variance of my intercept-less model with a model with an intercept?
(Note that it is the same data that I used to fit the model.)
From the Wikipedia article on R^2:
Important cases where the computational definition of R2 can yield
negative values, depending on the definition used, arise [...] where
linear regression is conducted without including an intercept.
(emphasis mine).
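A small worked example may help show why this happens. The score is computed as R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y, so the baseline being compared against is a constant (intercept-only) model. A model forced through the origin can fit worse than that baseline even on its own training data. The sketch below uses plain Python and made-up numbers; the helper names are hypothetical.

```python
def fit_through_origin(x, y):
    """Least-squares slope for the model y = w * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up data: y sits near 10 and barely depends on x
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [10.1, 9.9, 10.0, 10.2, 9.8]

w = fit_through_origin(x, y)
r2 = r2_score(y, [w * a for a in x])
print(w, r2)
```

Here the intercept-less predictions stay close to 0 while the targets stay close to 10, so SS_res is far larger than SS_tot and the score is negative.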

predict() returns image similarities with SVM in scikit learn

A silly question: after I train my SVM in scikit-learn, do I have to use the predict function, predict(X), to predict which class an image belongs to? (http://scikit-learn.org/dev/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.predict)
X parameter is the image feature vector?
In case i give an image not trained (not trained because SVM ask at least 3 samples for class), what returns?
First remark: "predict() returns image similarities with SVM in scikit learn" is not a question. Please put a question in the header of Stack Overflow entries.
Second remark: the predict method of the SVC class in sklearn does not return "image similarities" but a class assignment prediction. Read the http://scikit-learn.org documentation and tutorials to understand what we mean by classification and prediction in machine learning.
X parameter is the image feature vector?
No, X is not "the image" feature vector: it is a set of image feature vectors with shape (n_samples, n_features) as explained in the documentation you refer to. In your case a sample is an image, hence the expected shape would be (n_images, n_features). The predict API was designed to compute many predictions at once for efficiency reasons. If you want to compute a single prediction, you will have to wrap your single feature vector in an array with shape (1, n_features).
For instance if you have a single feature vector (1D) called my_single_image_features with shape (n_features,) you can call predict with:
predictions = clf.predict([my_single_image_features])
my_single_prediction = predictions[0]
Please note the [] signs around the my_single_image_features variable to turn it into a 2D array.
my_single_prediction will be an integer whose meaning depends on the integer values provided by you when calling the clf.fit(X_train, y_train) method in the first place.
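The shape handling can be tried out without training a real SVM. In the sketch below, ToyClassifier is a made-up nearest-centroid stand-in that only mimics the fit/predict interface (it is not part of scikit-learn), and the feature values are invented. It shows two equivalent ways of turning a 1D feature vector into the 2D input that predict expects.

```python
import numpy as np

class ToyClassifier:
    """Nearest-centroid stand-in for a fitted classifier; predict() expects a 2D array."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("expected a 2D array of shape (n_samples, n_features)")
        # Distance from every sample to every centroid, then pick the closest
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]

X_train = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y_train = [0, 0, 1, 1]
clf = ToyClassifier().fit(X_train, y_train)

my_single_image_features = np.array([4.8, 5.2])     # shape (n_features,)
as_batch = my_single_image_features.reshape(1, -1)  # shape (1, n_features)
my_single_prediction = clf.predict(as_batch)[0]
print(as_batch.shape, my_single_prediction)
```

Wrapping the vector in a list, as in clf.predict([my_single_image_features]), produces the same (1, n_features) input as reshape(1, -1).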
In case i give an image not trained (not trained because SVM ask at least 3 samples for class), what returns?
An image is not "trained". Only the model is trained. Of course you can pass samples / images that are not part of the training set to the predict method. This is the whole purpose of machine learning: making predictions on new unseen data based on what you learn from the statistical regularities seen in the past training data.
