Exception on instance weights when using SVM and Weka from my own Java code

I'm writing my own Java code using Weka and LibSVM.
I'm using Weka 3.8.3, libsvm-3.23 and libsvm-1.0.6.
I get the following error when the buildClassifier method of LibSVM is executed (SVMMy extends LibSVM):
SEVERE: null
weka.core.WekaException: mycode.SVMMy: Some instance weights are not equal to 1 and scheme does not implement the WeightedInstancesHandler interface!
at weka.core.Capabilities.test(Capabilities.java:1307)
at weka.core.Capabilities.test(Capabilities.java:1138)
at weka.core.Capabilities.testWithFail(Capabilities.java:1468)
at weka.classifiers.functions.LibSVM.buildClassifier(LibSVM.java:1652)
Can you explain what the problem is?
The same Instances object was previously classified correctly by a Random Forest.
Thanks a lot.

LibSVM cannot handle instance weights, but J48 can. See http://weka.sourceforge.net/doc.stable/weka/core/WeightedInstancesHandler.html for the list of classifiers that implement the WeightedInstancesHandler interface and can therefore handle instance weights.

Related

Can I build a one class cnn in keras?

Can I build a CNN in Keras with only one class (class 0), so it can predict whether given data belongs to this class?
Thanks in advance
Edit: Thanks for the answers and comments so far. My data is acceleration time series from a healthy structure, but I don't have access to damaged-state acceleration signals, so I only have data for class 0.
I believe what you're describing is an anomaly detection model. Other ML models exist for this purpose, such as the one-class support vector machine (https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html) and the isolation forest (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html). It's possible to implement a neural network, but you will need a customized loss function, since binary cross-entropy doesn't make sense for this application. One example of such a loss function, based on the one-class SVM, is described here: https://arxiv.org/pdf/1802.06360.pdf
I have a Keras implementation of a one-class fully connected network here: https://github.com/danielenricocahall/One-Class-NeuralNetwork, which uses a loss function based on the one described in that paper, if that helps.
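As a minimal sketch of the one-class SVM route (the data here is a random stand-in for healthy-structure features, not real acceleration signals, and the nu/gamma values are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-in for "healthy" data: 200 samples of 16 features
# drawn from a single distribution (class 0 only).
X_healthy = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

# nu upper-bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_healthy)

# predict() returns +1 for inliers ("class 0") and -1 for anomalies.
preds = clf.predict(rng.normal(size=(10, 16)))
print(preds)
```

Anything the model flags as -1 would be your candidate "damaged state" signal.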
Good luck!

XGBoost get classifier object from booster object?

I usually get to feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is <class 'xgboost.sklearn.XGBClassifier'>.
However, I have a pickled XGBoost model which, when unpacked, returns an object of type <class 'xgboost.core.Booster'>. This is the same object as if I had run regr.get_booster().
I have found a few solutions for getting variable importance from a booster object, but is there a way to get to the classifier object from the booster object so I can just apply the same feature_importances_ command? That seems like the most straightforward solution; otherwise it seems like I have to write a function that mimics the output of feature_importances_ in order for it to fit my logged feature importances...
So ideally I'd have something like
xgb_booster = pickle.load(open("xgboost-model", "rb"))
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()
xgb_classifier.feature_importances_
Are there any limitations to what can be done with a booster object in terms of finding the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now...
Also for context, the pickled model is the output from AWS sagemaker, so I'm just unpacking it to do some further evaluation
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
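On that note: since the Booster does expose get_score(), you can approximate feature_importances_ from the booster alone by normalizing its output. A small sketch (the helper name is my own, and note that XGBClassifier's default importance type has varied across versions, so pass the importance_type that matches yours to get_score()):

```python
import numpy as np

def importances_from_score(score, feature_names):
    """Turn a Booster.get_score() dict into a normalized array shaped
    like XGBClassifier.feature_importances_: features the booster never
    used get 0, and the values sum to 1."""
    raw = np.array([score.get(f, 0.0) for f in feature_names], dtype=float)
    total = raw.sum()
    return raw / total if total > 0 else raw

# get_score() omits unused features, e.g. {'f0': 12.0, 'f2': 3.0}.
demo = importances_from_score({'f0': 12.0, 'f2': 3.0}, ['f0', 'f1', 'f2'])
print(demo)  # [0.8 0.  0.2]
```

In your case you would call it as importances_from_score(xgb_booster.get_score(importance_type="gain"), xgb_booster.feature_names).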
Crazy things you can do however:
You can create a classifier object and then over-ride the booster within it:
xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Booster = booster
This is nearly useless unless you fit it; otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly, then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but it is not truly a rehydrated classifier object from the SageMaker output alone.
Recommendation
If you're not stuck using the SageMaker training solution, you can certainly use XGBoost directly for training. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance, so I hope this gets you closer. I had a different use case and was ultimately able to leverage the booster for what I needed.
I was able to get an xgboost.XGBClassifier model virtually identical to an xgboost.Booster model by:
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) implementing these same tuning parameters and then training an XGBClassifier model on the same training dataset used to train the Booster model.
Note: one mistake I made was forgetting to explicitly assign the same seed/random_state in both the Booster and Classifier versions.

Can I extract significance values for Logistic Regression coefficients in PySpark?

Is there a way to get the significance level of each coefficient we receive after we fit a logistic regression model on training data?
I was trying to find a way and could not figure it out myself.
I think I may get the significance level of each feature if I run a chi-squared test, but first of all I'm not sure whether I can run the test on all features together, and secondly my data values are numeric, so whether it would give me the right result remains a question as well.
Right now I am running the modeling part using statsmodels and scikit-learn, but I certainly want to know how I can get these results from PySpark ML or MLlib itself.
If anyone can shed some light, it would be helpful.
I use only MLlib. I think that when you train a model you can use the toPMML method to export your model in PMML format (an XML file), then you can parse the XML file to get the feature weights. Here is an example:
https://spark.apache.org/docs/2.0.2/mllib-pmml-model-export.html
Hope that helps.

Parameter tuning for 1-class classification with LibSVM in Weka

I am doing 1-class classification with the LibSVM wrapper in Weka. The problem is that during TESTING, even if I use the same TRAINING instances, I see most of them classified as outliers (NaN), which is unreasonable (how can this happen?). If this has to do with parameter tuning, which parameters should I try tweaking?
A classifier needs at least two class values to "work". If all you have is labeled data with one label value (your one class value), then you need to get data that is not part of that class so that a classifier can function.
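If you stay with the one-class formulation, the parameters that most affect how many training instances come back as outliers are nu and gamma (roughly corresponding to LibSVM's nu and gamma options). A sketch with scikit-learn's analogous OneClassSVM on random stand-in data, showing how the outlier fraction on the training set tracks nu:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 8))

# nu upper-bounds the fraction of training points flagged as outliers;
# gamma controls how tightly the RBF boundary hugs the data.
fractions = {}
for nu in (0.01, 0.1, 0.5):
    clf = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(X_train)
    fractions[nu] = float(np.mean(clf.predict(X_train) == -1))
print(fractions)
```

If most of your training data comes back as outliers, nu is likely set far too high (or gamma is so large that the boundary collapses around a few points).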

Unary class text classification in Weka?

I have a training dataset (text) for a particular category (say, cancer). I want to train an SVM classifier for this class in Weka. But when I try to do this by creating a folder 'cancer', putting all the training files into that folder, and running the code, I get the following error:
weka.classifiers.functions.SMO: Cannot handle unary class!
What I want to do is: if the classifier finds a document related to 'cancer', it says the class name correctly, and once I feed it a non-cancer document, it should say something like 'unknown'.
What should I do to get this behavior?
The SMO algorithm in Weka only does binary classification between two classes. Sequential Minimal Optimization is a specific algorithm for solving an SVM, and Weka provides a basic implementation of it. If you have some examples that are cancer and some that are not, then that would be binary; perhaps you haven't labeled them correctly.
However, if you are using training data which is all examples of cancer and you want it to tell you whether a future example fits the pattern or not, then you are attempting to do one-class SVM, aka outlier detection.
LibSVM in Weka can handle one-class SVM. Unlike the Weka SMO implementation, LibSVM is a standalone program that has been interfaced into Weka and incorporates many different variants of SVM. This post on the Wekalist explains how to use LibSVM for this in Weka.
