SVM: Why are all my training examples getting categorised as error support vectors? - svm

I am training an online svm using 20 training examples at a time. However, it turns out that the svm classifier is categorising all 20 of them as Error Support Vectors. Can someone explain why this might be happening?
The dimensions of each example is 1024x1.
I am using Dieh'l code for Online SVM learning

Related

What type of optimization to perform on my multi-label text classification LSTM model with Keras?

I'm using Windows 10 machine. Libraries: Keras with Tensorflow 2.0 Embeddings: Glove(100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I am using different types of fine-tuning to achieve better results but with no luck so far.
The main problem I believe is the difference in class distributions of my dataset but after a lot of tries and errors, I couldn't implement stratified-k-split in Keras.
I am also experimenting with dropout layers, batch sizes, # of layers, learning rates, clip values, validation splits but I get a minimum boost or worst performance sometimes.
For metrics, I use mainly ROC and F1.
I also followed the suggestion from a StackOverflow member who said to delete some of my examples so I can balance my dataset but if I do that I will have a very low number of examples.
What would you suggest to me?
If someone can provide code based on my implementation for
stratified-k-split I would be grateful cause I have checked all the
online resources but can't implement it.
Any tips, suggestions will be really helpful.
Metrics Plots
Dataset form+Embedings form+train-test-split form
Dataset's labels distribution
My LSTM implementation

How to properly use BERT in keras for classification

I am having an issue with using BERT for classification of text within my database. Previously, I have used GLoVE and ELMo that work quite ok. Also Random forests give me quite good F1-scores (over 0.85), however, when using BERT, I am stuck around 0.55. I was trying to modify learning rate for Adam optimizer, used anything between 0.001 to 0.000001, but nothing really helps.
This is my code: https://github.com/EuropeanSocialInnovationDatabase/ESID-main/blob/development/TextMining/Classifiers/DatabaseWithKickStarter/NNClassifierTest2.py
If anyone can pin the problem down, I would be really grateful.

Logistic regression overfits even using cross validation in sklearn?

I am implementing a logistic regression model using sklearn, for a text classification competition on Kaggle.
When I use unigram, there are 23,617 features. The best mean_test_score Cross validation search (sklearn's GridSearchCV) gives me is similar to the score I got from Kaggle, using the best model.
There are 1,046,524 features if I use bigram. GridSearchCV gives me a better mean_test_score compared to unigram, but using this new model I got a much much lower score on Kaggle.
I guess the reason might be overfitting, since I have too many features. I have tried to set the GridSearchCV using 5-fold, or even 2-fold, but the scores are still inconsistent.
Does it really indicate my second model is overfitting, even in the validation stage? If so, how can I tune the regularization term for my logistic model using sklearn? Any suggestions are appreciated!
Assuming you are using sklearn. You could try looking into using the tuning parameters max_df, min_df, and max_features. Throwing these into a GridSearch may take a long time but you will likely get some interesting results back. I know these features are implemented in the sklearn.feature_extraction.text.TfidfVectorizer, but I am sure they use them elsewhere as well. Essentially the idea is that including too many grams can lead to overfitting, same thing with having too many grams with low or high document frequencies.

Train and predict using SVM theory

I have implemented character recognition using a library
but I still don't get how SVM theory works in training and prediction process, I just understand SVM is only finding the hyperplane
E.g., suppose I have a training image as follows
image from google, number zero
How do we find hyperplane for each training data like above?
How is the prediction process is done?
How can the SVM classify the data based on those hyperplane?
Thank you very much if you can help me
You can use opencv and python.Opencv has implemented svm and you can use it by function call.
SVM is machine leraning model for data classification.We can use SVM to classify images.the steps are
you must have a training dataset(a dataset of images whose labels are known)
Extract features [features are color,shape,hog,surf,sift etc..] from that images and store that,also store the assosiated labels
then train svm using these datas
Now you can use svm to predict labels of unkonwn images
this link will help you
First, It is a non linear separable problem you have to implement kernel SVM which projects them into higher dimensional space where it becomes linearly separable. You can use sklearn library to achieve the above.

How to achive the importance of each variable for SVM after classification?

I have two classes and several variables. After training the SVM, it gives me a good accuracy on prediction of testing data classes. Does anybody know how can I find out which of my variables are less important in the prediction done by SVM ? I'm nearly new in SVM and I'm just familiar with console interface and matlab interface of SVM. Is there any option to achive the importance of variables for SVM after training or prediction phase ?

Resources