one class SVM libSVM - svm

Let's say my feature vector is (x1, x2, ..., xn).
Could anyone give me code to train a one-class SVM using libSVM?
How should I choose the parameters using cross-validation?

This may help you:
label = ones(size(TrainingData, 1), 1); % One label per training instance; every instance belongs to the single class
model = svmtrain(label, TrainingData, '-s 2 -t 2 -n 0.5'); % -s 2: one-class SVM, -t 2: RBF kernel, -n 0.5: nu. You can change the parameters
[predicted_label, accuracy, decision_values] = svmpredict(TestLabels, TestSet, model); % the third output holds the decision values
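The cross-validation part of the question is not covered by the snippet above. Here is a minimal sketch in Python, assuming scikit-learn's OneClassSVM (which is built on libsvm) instead of the MATLAB interface. With a single class there is no accuracy to maximize, so this uses one possible heuristic that is my assumption, not a libSVM recommendation: for a fixed nu, keep the gamma whose rejection rate on held-out folds is closest to nu. The data and the gamma grid are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM


def held_out_rejection_rate(X, nu, gamma, n_splits=5, seed=0):
    """Average fraction of held-out samples that the model flags as outliers."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rates = []
    for train_idx, test_idx in kf.split(X):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X[train_idx])
        # predict returns +1 for inliers and -1 for outliers
        rates.append(np.mean(model.predict(X[test_idx]) == -1))
    return float(np.mean(rates))


# Synthetic stand-in for the real feature vectors (x1, ..., xn)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

nu = 0.1                          # assumed target fraction of rejected samples
gammas = [0.01, 0.1, 1.0, 10.0]   # hypothetical search grid
rates = {g: held_out_rejection_rate(X, nu, g) for g in gammas}

# Heuristic: keep the gamma whose held-out rejection rate is closest to nu
best_gamma = min(gammas, key=lambda g: abs(rates[g] - nu))
print(rates, best_gamma)
```

If a few labelled outliers are available, scoring each parameter setting on them as well is likely a better criterion than this heuristic.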

Related

SkLearn SVM - How to get multiple predictions ordered by probability?

I am doing some text classification.
Let's say I have 10 categories and 100 "samples", where each sample is a sentence of text. I have split my samples into 80:20 (training, testing) and trained the SVM classifier:
text_clf_svm = Pipeline([('vect', CountVectorizer(stop_words=('english'),ngram_range=(1,2))), ('tfidf', TfidfTransformer()),
('clf-svm', SGDClassifier(loss='hinge', penalty='l2', random_state=42, learning_rate='adaptive', eta0=0.9))])
# Fit training data to SVM classifier, predict with testing data and print accuracy
text_clf_svm = text_clf_svm.fit(training_data, training_sub_categories)
Now when it comes to predicting, I do not want just a single category to be predicted. I want to see, for example, a list of the "top 5" categories for a given unseen sample as well as their associated probabilities:
top_5_category_predictions = text_clf_svm.predict(a_single_unseen_sample)
Since text_clf_svm.predict returns a value which represents the index of the categories available, I want to see something like this as output:
[(4,0.70),(1,0.20),(7,0.04),(9,0.06)]
Anyone know how to achieve this?
This is something I had used a while back for a similar problem:
probs = clf.predict_proba(X_test)
# Sort desc and only extract the top-n
top_n_category_predictions = np.argsort(probs)[:,:-n-1:-1]
This will give you the top n categories for each sample.
If you also want to see the probabilities corresponding to these categories, then you can do:
top_n_probs = np.sort(probs)[:,:-n-1:-1]
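Note: Here X_test is of shape (n_samples, n_features), so make sure you pass your single unseen sample in the same format (for a text pipeline, a list containing one string). The indices returned by np.argsort are column positions in clf.classes_, not the labels themselves, so look the labels up there. Also note that SGDClassifier only provides predict_proba when loss is 'log_loss' (called 'log' in older releases) or 'modified_huber'; with loss='hinge', as in the question, it is not available.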
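To get output in the (category, probability) form the question asks for, the two arrays above can be combined. A minimal NumPy-only sketch follows; the helper name top_n_with_probs and the sample numbers are made up for illustration, and classes stands for clf.classes_ of whatever fitted classifier produced probs.

```python
import numpy as np


def top_n_with_probs(probs, classes, n):
    """Return, per sample, the n most probable (class, probability) pairs."""
    probs = np.asarray(probs)
    classes = np.asarray(classes)
    # Column indices sorted by descending probability, truncated to n
    order = np.argsort(probs, axis=1)[:, ::-1][:, :n]
    top_probs = np.take_along_axis(probs, order, axis=1)
    return [
        list(zip(classes[row].tolist(), p.tolist()))
        for row, p in zip(order, top_probs)
    ]


# Hypothetical predict_proba output for one sample and four classes
probs = np.array([[0.06, 0.20, 0.70, 0.04]])
classes = np.array([9, 1, 4, 7])  # stands in for clf.classes_

result = top_n_with_probs(probs, classes, n=3)
print(result)
```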

How to use cross_val_predict to predict probabilities for a new dataset?

I am using sklearn's cross_val_predict for training like so:
myprobs_train = cross_val_predict(LogisticRegression(),X = x_old, y=y_old, method='predict_proba', cv=10)
I am happy with the returned probabilities, and would like now to score up a brand-new dataset. I tried:
myprobs_test = cross_val_predict(LogisticRegression(), X =x_new, y= None, method='predict_proba',cv=10)
but this did not work; it complains about y having zero shape. Does this mean there is no way to apply the trained and cross-validated model from cross_val_predict to new data? Or am I just using it wrong?
Thank you!
You are looking at the wrong method. cross_val_predict does not return a trained model; it returns out-of-fold predictions, which are useful for evaluating a model (logistic regression in your case) but cannot be applied to new data. Your goal is to fit some data and then generate predictions for new data. The relevant methods are fit and predict (or predict_proba, since you want probabilities) of the LogisticRegression class. Here is the basic structure:
from sklearn import linear_model
logreg = linear_model.LogisticRegression()
logreg.fit(x_old, y_old)
myprobs_test = logreg.predict_proba(x_new)
I have the same concern as @user3490622. If cross_val_predict can only be used on data that has targets, why is None the default value of y (target)? (sklearn page)
To partially achieve the desired result of multiple predicted probabilities, one could repeat the fit-then-predict approach on each training fold to mimic the cross-validation.
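The last suggestion could look roughly like the sketch below: one model is fitted per training fold, each one scores the new data, and the probabilities are averaged. The helper name fold_averaged_proba, the stratified splitting and the synthetic data are my assumptions; x_old, y_old and x_new mirror the names in the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def fold_averaged_proba(x_old, y_old, x_new, n_splits=10, seed=0):
    """Fit one model per training fold and average their predict_proba on x_new."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_probs = []
    for train_idx, _ in skf.split(x_old, y_old):
        model = LogisticRegression().fit(x_old[train_idx], y_old[train_idx])
        fold_probs.append(model.predict_proba(x_new))
    return np.mean(fold_probs, axis=0)


# Synthetic stand-ins for the data in the question
rng = np.random.default_rng(0)
x_old = rng.normal(size=(100, 4))
y_old = (x_old[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)
x_new = rng.normal(size=(5, 4))

myprobs_test = fold_averaged_proba(x_old, y_old, x_new)
print(myprobs_test)
```

In most cases a single model fitted on all of x_old, as in the answer above, is simpler and uses more training data; the averaged version is only worth it if the fold-to-fold spread is of interest.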

Default value in Svm prediction Scikitlearn

I am using scikitlearn for svm classification.
I need a classifier that returns default value when a given test item doesn't match any of the training-set items, i.e. when the distance is very high. Is that possible?
For Example
Let's say my training-set is
X= [[0.5,0.5,2],[4, 4,16],[16, 16,64]]
and labels
y=[0,1,2]
then I run training
from sklearn import svm
clf = svm.SVC()
clf.fit(X, y)
then I run prediction
clf.predict([[-100,-100,-200]])
As we can see, the test item [-100,-100,-200] is too far away from any of the training items. In this case the prediction will yield [2], which is the label of the item [16, 16,64]. Is there any way to make it return something else (not from the training set)?
I think you can create a label for those far-away values and add it to your training set:
X= [[0.5,0.5,2],[4, 4,16],[16, 16,64],[-100,-100,-200]]
Y=[0,1,2,100]
and give it a try.
SVM is supervised learning, which means the 'OUTPUT' has to be specified. If you are not certain about the 'OUTPUT', do some unsupervised clustering (k-means, for example) to get a rough idea of how many possible 'OUTPUT' values to expect.
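An alternative that does not require inventing training points for the "far away" region is to check the distance to the nearest training item before trusting the classifier. This is a sketch of that idea and not something the answer above proposes; predict_with_default, max_distance and the default value -1 are hypothetical names and choices. predict_fn would be clf.predict for the SVC in the question; the demo uses a nearest-neighbour stand-in so that it runs with NumPy alone.

```python
import numpy as np


def predict_with_default(predict_fn, X_train, X_test, max_distance, default=-1):
    """Return predict_fn's labels, replaced by `default` for samples whose
    nearest training item is farther away than max_distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.atleast_2d(np.asarray(X_test, dtype=float))
    # Pairwise Euclidean distances, shape (n_test, n_train)
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    labels = np.asarray(predict_fn(X_test))
    return np.where(nearest > max_distance, default, labels)


X = np.array([[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64]], dtype=float)
y = np.array([0, 1, 2])


def nearest_label(X_test):
    """Stand-in for clf.predict: label of the closest training item."""
    d = np.linalg.norm(X_test[:, None, :] - X[None, :, :], axis=2)
    return y[d.argmin(axis=1)]


out = predict_with_default(
    nearest_label, X, [[-100, -100, -200], [4, 4, 15]], max_distance=10.0
)
print(out.tolist())
```

The threshold has to be chosen for the data at hand, for example from the distribution of nearest-neighbour distances within the training set.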

How to apply random forest properly?

I am new to machine learning and Python. I am trying to apply a random forest to predict the binary outcome of a target. My data has 24 predictors (1000 observations); one of them is categorical (gender) and all the others are numerical. The numerical ones are of two types: volumes of money in euros (very skewed and large in scale) and counts (number of transactions from an ATM). I have transformed the large-scale features and done the imputation. Finally, I checked correlation and collinearity and removed some features based on that (which left me with 24 features). When I fit the RF, it is always perfect on the training set, while the scores from cross-validation are not so good. When I apply it to the test set, it gives very low recall values. How should I remedy this?
import numpy as np
from sklearn import metrics
from sklearn.model_selection import KFold

def classification_model(model, data, predictors, outcome):
    # Fit the model:
    model.fit(data[predictors], data[outcome])
    # Make predictions on training set:
    predictions = model.predict(data[predictors])
    # Print accuracy (measured on the training data, so it is optimistic)
    accuracy = metrics.accuracy_score(data[outcome], predictions)
    print("Accuracy : %s" % "{0:.3%}".format(accuracy))
    # Perform k-fold cross-validation with 5 folds
    # (KFold(n_splits=5) replaces the older KFold(n, n_folds=5) signature)
    kf = KFold(n_splits=5)
    error = []
    for train, test in kf.split(data):
        # Filter training data
        train_predictors = (data[predictors].iloc[train, :])
        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train]
        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)
        # Record the score from each cross-validation run
        error.append(model.score(data[predictors].iloc[test, :], data[outcome].iloc[test]))
    print("Cross-Validation Score : %s" % "{0:.3%}".format(np.mean(error)))
    # Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])
outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20)
predictor_var = train.drop('Sold', axis=1).columns.values
classification_model(model,train,predictor_var,outcome_var)
#Create a series with feature importances:
featimp = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
print(featimp)
outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20, max_depth=20, oob_score = True)
predictor_var = ['fet1','fet2','fet3','fet4']
classification_model(model,train,predictor_var,outcome_var)
A random forest with fully grown trees fits the training set almost perfectly, so perfect training accuracy combined with weaker cross-validation scores means the model is overfitting. To resolve this, you need to search the hyperparameters more rigorously to find the best ones to use. Here is the link on how to do this (from the scikit-learn docs): http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html
The link provides implementations of grid search and randomized search for hyperparameter estimation.
It will also be fun to go through this MIT Artificial Intelligence lecture to get a deep theoretical orientation: https://www.youtube.com/watch?v=UHBmv7qCey4&t=318s.
Hope this helps!
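As a rough illustration of the randomized search mentioned in the answer, here is a small sketch on synthetic data. The parameter ranges, the use of recall as the search metric and class_weight="balanced" are my assumptions, chosen because the question complains about low recall; they are not taken from the linked example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, mildly imbalanced stand-in for the real data
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = (X[:, 0] + X[:, 1] + rng.normal(size=400) > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; shallower trees and larger leaves limit overfitting
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10, 20],
    "max_features": ["sqrt", 0.5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions,
    n_iter=8,
    scoring="recall",  # optimize the metric the question cares about
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train)

test_recall = recall_score(y_test, search.predict(X_test))
print(search.best_params_, search.best_score_, test_recall)
```

Comparing search.best_score_ (cross-validated) with the recall on the untouched test split gives a more honest picture than the training accuracy printed by classification_model.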

How to SVM Train my Edge images using Java code

I have a set of images on which I performed edge detection using OpenCV 3.1. The edges are stored in OpenCV Mat objects. Can someone help me with Java code to train and test an SVM on that set of images?
Following the discussion in the comments, I am providing you with an example project which I built for Android Studio a while back.
This was used to classify images depending on Lab color spaces.
//1.a Assign the parameters for SVM training here
double nu = 0.999D;
double gamma = 0.4D;
double epsilon = 0.01D;
double coef0 = 0;
//kernel types are Linear(0), Poly(1), RBF(2), Sigmoid(3), Chi2(4), Inter(5)
//For Poly(1) set degree and gamma
double degree = 2;
int kernel_type = 4; // 4 = SVM.CHI2
//1.b Create an SVM object
SVM B_channel_svm = SVM.create();
B_channel_svm.setType(104); // 104 = SVM.NU_SVR (nu support vector regression)
B_channel_svm.setNu(nu);
B_channel_svm.setCoef0(coef0);
B_channel_svm.setKernel(kernel_type);
B_channel_svm.setDegree(degree);
B_channel_svm.setGamma(gamma);
B_channel_svm.setTermCriteria(new TermCriteria(2, 10, epsilon)); // type 2 = TermCriteria.EPS
// Repeat Step 1.b for the number of SVMs.
//2. Train the SVM
// Note: training_data - If your image has n rows and m columns, you have to make a matrix of size (n*m, o), where o is the number of labels.
// Note: Label_data is same as above, n rows and m columns, make a matrix of size (n*m, o) where o is the number of labels.
// Note: Very Important - Train the SVM for the entire data as training input and the specific column of the Label_data as the Label. Here, I train the data using B, G and R channels and hence, the name B_channel_SVM. I make 3 different SVM objects separately but you can do this by creating only one object also.
B_channel_svm.train(training_data, Ml.ROW_SAMPLE, Label_data.col(0));
G_channel_svm.train(training_data, Ml.ROW_SAMPLE, Label_data.col(1));
R_channel_svm.train(training_data, Ml.ROW_SAMPLE, Label_data.col(2));
// Now, after training, we "predict" the outcome for a sample from the trained SVM. But first, let's prepare the test data.
// As above for the training data, make a matrix of (n*m, o) and use the columns to predict. So, since I created 3 different SVMs, I will input three separate matrices for the three SVMs of size (n*m, 1).
//3. Predict the testing data outcome using the trained SVM.
B_channel_svm.predict(scene_ml_input, predicted_final_B, StatModel.RAW_OUTPUT);
G_channel_svm.predict(scene_ml_input, predicted_final_G, StatModel.RAW_OUTPUT);
R_channel_svm.predict(scene_ml_input, predicted_final_R, StatModel.RAW_OUTPUT);
//4. Here, the predicted_final_ matrices give you the final value, as in Label (0, 1, 2, etc.), for the input data (edge profile in your case)
Now, I hope you have an idea of how SVM works. You basically need to do these steps:
Step 1: Identify labels - In your case Gestures from edge profile.
Step 2: Assign values to the labels - For example, if you are trying to classify haptic gestures - Open Hand = 1, Closed Hand/Fist = 2, Thumbs up = 3 and so on.
Step 3: Prepare the training data (edge profiles) and Labels (1,2,3) etc. according to the process above.
Step 4: Prepare data for prediction using the transformation calculated using SVM.
Very important for SVM on OpenCV: normalize your data and make sure all your matrices are of the same type (CvType, for example CvType.CV_32F).
Hope it helps. Feel free to ask questions if you have any doubts and post what you have tried. I can solve the problem for you if you send me some images but then you won't learn anything right? ;)
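The data preparation in Step 3 and the normalization advice can be sketched independently of OpenCV. The Python/NumPy snippet below is an illustration under my own assumptions, not a translation of the Java project above: it assumes one gesture label per image (not per pixel as in the Lab colour example), equally sized edge images, and scaling by the global maximum. The helper name build_training_matrix is made up. Labels are stored as float32 to follow the same-type advice above.

```python
import numpy as np


def build_training_matrix(edge_images, labels):
    """Flatten each edge image into one row, scale to [0, 1], use float32 throughout."""
    samples = np.stack(
        [np.asarray(img, dtype=np.float32).ravel() for img in edge_images]
    )
    max_val = samples.max()
    if max_val > 0:
        samples = samples / max_val  # simple global normalization (assumed)
    responses = np.asarray(labels, dtype=np.float32).reshape(-1, 1)
    return samples.astype(np.float32), responses


# Two tiny synthetic "edge images" (0 = background, 255 = edge pixel)
open_hand = np.zeros((4, 4), dtype=np.uint8)
open_hand[0, :] = 255
fist = np.zeros((4, 4), dtype=np.uint8)
fist[:, 0] = 255

# Labels follow Step 2: Open Hand = 1, Closed Hand/Fist = 2
samples, responses = build_training_matrix([open_hand, fist], [1, 2])
print(samples.shape, samples.dtype, responses.ravel())
```

Each row of samples then corresponds to one training sample, which is the layout Ml.ROW_SAMPLE refers to in the Java code.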
