Creating an arff file for Weka - attributes

I am creating an ARFF file and loading it into Weka, and when I try to run a J48 decision tree I get the following error message:
Can't have more folds than instances
Below is the ARFF file I have so far.
@relation correlation
@attribute Exercise {StarJumps, Swimming, Weightlift}
@attribute mean real
@attribute median real
@attribute mode real
@attribute variance real
@data
StarJumps,-35.1736999860234,-38.3397969100000,-78.7680334500000,1640.14992832077
StarJumps,12.8784175778633,11.2917098850000,-14.6784661500000,198.409868585395
StarJumps,-9.46453776621485,-4.66403639400000,-77.4871379300000,914.608427610169
Swimming,-22.0052249449766,-21.9835538100000,-61.8976363600000,184.991150374811
Swimming,27.4404695437188,31.7069603200000,-38.8908675200000,571.188153279291
Swimming,-23.0069047690899,-23.7253122400000,-65.0011242700000,312.565133535617
Weightlift,-29.9352298211914,-29.2759990400000,-77.3315111600000,739.673719008139
Weightlift,-2.73629441549609,-1.20216138950000,-42.4630638500000,208.415460430934
Weightlift,6.59529078057812,15.0662737500000,-77.0678713200000,982.853303465782
Any help would be appreciated!

This is because you are using a cross-validation folds setting higher than the number of instances (9); use 9 or fewer folds. You can also increase the number of instances. I would also recommend putting the class attribute at the end of your data structure. Check:
https://weka.wikispaces.com/ARFF+%28stable+version%29
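For reference, here is a minimal sketch using the Weka Java API that runs J48 with 3-fold cross-validation, which is safe for 9 instances (it assumes the data above is saved as correlation.arff; the file name is just an example):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48CrossValidation {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("correlation.arff");
        data.setClassIndex(0); // Exercise is the first attribute in this file
        Evaluation eval = new Evaluation(data);
        // The third argument is the number of folds; it must not exceed
        // the number of instances (9 here), so 3 is used.
        eval.crossValidateModel(new J48(), data, 3, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}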
Cheers

Related

(could not convert string to float) error while using knn algorithm

So I am trying to implement the KNN classification algorithm, but I am getting a "could not convert string to float" error when I try to fit the model. Please help, I am a beginner.
The numeric columns that have object dtype are the ones causing the error: when I fit the model without them, it works. How do I convert them?
You can't have any non-numeric features in your dataset. You should use encoding for all of your non-numeric features.
Scikit Learn Preprocessing

Where can I get CoNLL-X training data?

I'm trying to train the Stanford Neural Network Dependency Parser to check phrase similarity.
The command I tried is:
java edu.stanford.nlp.parser.nndep.DependencyParser -trainFile trainPath -devFile devPath -embedFile wordEmbeddingFile -embeddingSize wordEmbeddingDimensionality -model modelOutputFile.txt.gz
The error that I got is:
Train File: C:\Users\rohit\Downloads\CoreNLP-master\CoreNLP-master\data\edu\stanford\nlp\parser\trees\en-onetree.txt
Dev File: null
Model File: modelOutputFile.txt.gz
Embedding File: null
Pre-trained Model File: null
################### Train
#Trees: 1
0 tree(s) are illegal (0.00%).
1 tree(s) are legal but have multiple roots (100.00%).
0 tree(s) are legal but not projective (0.00%).
###################
#Word: 3
#POS:3
#Label: 2
###################
#Transitions: 3
#Labels: 1
ROOTLABEL: null
Random generator initialized with seed 1459831358061
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.parser.nndep.Util.scaling(Util.java:49)
at edu.stanford.nlp.parser.nndep.DependencyParser.readEmbedFile(DependencyParser.java:636)
at edu.stanford.nlp.parser.nndep.DependencyParser.setupClassifierForTraining(DependencyParser.java:787)
at edu.stanford.nlp.parser.nndep.DependencyParser.train(DependencyParser.java:676)
at edu.stanford.nlp.parser.nndep.DependencyParser.main(DependencyParser.java:1247)
The help embedded in the code says that the training file should be a "Path to a training treebank in CoNLL-X format".
Does anyone know where I can find some CoNLL-X training data to train?
I supplied a training file but not an embedding file and got this error.
My guess is that if I supply the embedding file it might work.
Please shed some light on which training file & embedding file I should use and where I can find them.
CoNLL-X treebanks
You can get training data for Danish, Dutch, Portuguese, and Swedish for free here. For other languages, you'll unfortunately probably need to license a treebank from the LDC (details for many languages are on that page).
Universal Dependencies are in CoNLL-U format, which can usually be converted to CoNLL-X format with some work.
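As a rough sketch of that conversion (an illustration of the common cases, not a complete converter: it drops comment lines, multiword-token ranges, and empty nodes, maps UPOS/XPOS to CPOSTAG/POSTAG, and leaves PHEAD/PDEPREL unspecified):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ConlluToConllx {
    public static void main(String[] args) throws IOException {
        // args[0]: input .conllu file; args[1]: output CoNLL-X file
        StringBuilder out = new StringBuilder();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (line.startsWith("#")) continue; // drop CoNLL-U comment lines
            if (line.isEmpty()) { out.append('\n'); continue; } // keep sentence breaks
            String[] f = line.split("\t");
            // skip multiword-token ranges (e.g. "3-4") and empty nodes (e.g. "8.1")
            if (f[0].contains("-") || f[0].contains(".")) continue;
            // CoNLL-X columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
            out.append(String.join("\t", f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7], "_", "_"));
            out.append('\n');
        }
        Files.write(Paths.get(args[1]), out.toString().getBytes(StandardCharsets.UTF_8));
    }
}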
Lastly, there's a large list of treebanks and their availability on this page. You should be able to convert many of the dependency treebanks in this list into CoNLL-X format if they're not already in that format.
Training the Stanford Neural Net Dependency parser
From this page: The embedding file is optional, but the treebank is not. The best treebank and embedding files to use depend on which language and type of text you'd like to parse. Ideally, you would train on as much data as possible in the domain/genre that you're trying to parse.
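For example, once you have a CoNLL-X treebank, a training invocation without the optional embedding file might look like this (danish-train.conll and danish-dev.conll are hypothetical file names; substitute your own):
java edu.stanford.nlp.parser.nndep.DependencyParser -trainFile danish-train.conll -devFile danish-dev.conll -model modelOutputFile.txt.gz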

Rapid Miner NeuralNet polynomial

I'm using RapidMiner for the first time. I have a dataset (in .xlsx format) on which I want to run the neural network algorithm. I am getting this error:
The operator NeuralNet does not have sufficient capabilities for the given data set: polynominal attributes not supported
Any help with this, please?
Thanks in advance!
Per the Neural Net operator's Help file...
...This operator cannot handle polynominal attributes.
Your input file has several binominal and polynominal attributes. Therefore, if you wish to use the out-of-the-box Neural Net operator, you need to convert your nominal data to numerical data. One way of doing this within RapidMiner is with the Nominal to Numerical operator.
Always be cognizant of the type of data/attribute you are manipulating: (1) text, (2) numeric, and (3) nominal.

How can I use Weka for terminology extraction?

I need to extract domain-specific terms, such as political terms, from a big training corpus. How can I use Weka and its filters to achieve this?
Can I use the feature vector produced by Weka's StringToWordVector filter to do this or not?
You can, at least partly, as long as you have an appropriate dataset. For instance, let us assume you have a dataset like this one:
@relation test
@attribute text String
@attribute politics {yes,no}
@attribute religion {yes,no}
@data
"this is a text about politics",yes,no
"this text is about religion",no,yes
"this text mixes everything",yes,yes
For instance, for getting terms about politics, you can:
Remove the religion attribute.
Apply the StringToWordVector filter to the text attribute to get terms.
Apply the AttributeSelection filter with Ranker and InfoGainAttributeEval to get the top ranked terms.
This last step will give you a list of the terms that are most predictive of the politics category. Most of them will be terms in the politics domain (although some terms may be predictive precisely because they are not in the politics domain; that is, they provide negative evidence).
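To make the steps concrete, here is a minimal sketch using the Weka Java API (assuming the example dataset above is saved as test.arff; the file name is just illustrative):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class PoliticsTerms {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("test.arff");

        // 1. Remove the religion attribute (the third attribute; indices are 1-based).
        Remove remove = new Remove();
        remove.setAttributeIndices("3");
        remove.setInputFormat(data);
        data = Filter.useFilter(data, remove);

        // 2. Turn the text attribute into one attribute per term.
        StringToWordVector s2wv = new StringToWordVector();
        s2wv.setInputFormat(data);
        data = Filter.useFilter(data, s2wv);
        data.setClassIndex(data.attribute("politics").index());

        // 3. Rank the term attributes by information gain with respect to politics.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);
        for (int i : selector.selectedAttributes()) {
            if (i != data.classIndex()) { // the class index is included in the array
                System.out.println(data.attribute(i).name());
            }
        }
    }
}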
The quality of the terms you get depends on the dataset. The more topics it covers, the better your results; so instead of having two classes (politics, religion, as in my dataset), it is much better to have plenty of them, with many examples of each category.

Unary class text classification in Weka?

I have a training dataset (text) for a particular category (say Cancer). I want to train an SVM classifier for this class in Weka. But when I try to do this by creating a folder 'cancer', putting all the training files into that folder, and running the code, I get the following error:
weka.classifiers.functions.SMO: Cannot handle unary class!
What I want is this: if the classifier finds a document related to 'cancer', it should report the class name correctly, and when I feed it a non-cancer document, it should say something like 'unknown'.
What should I do to get this behavior?
The SMO algorithm in Weka only does binary classification between two classes. Sequential Minimal Optimization is a specific algorithm for solving an SVM, and Weka provides a basic implementation of it. If you have some examples that are cancer and some that are not, then the problem would be binary; perhaps you haven't labeled them correctly.
However, if your training data consists entirely of cancer examples and you want the classifier to tell you whether a future example fits the pattern or not, then you are attempting one-class SVM, also known as outlier detection.
LibSVM in Weka can handle one-class SVM. Unlike the Weka SMO implementation, LibSVM is a standalone program that has been interfaced with Weka and incorporates many different variants of SVM. This post on the Wekalist explains how to use LibSVM for this in Weka.
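As an illustrative sketch of the idea (not the exact recipe from the Wekalist post; it assumes the LibSVM package is installed in Weka and that cancer.arff is a hypothetical ARFF file containing only cancer documents):

import weka.classifiers.functions.LibSVM;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

public class OneClassCancer {
    public static void main(String[] args) throws Exception {
        // cancer.arff: hypothetical training set whose instances are all 'cancer' documents
        Instances train = DataSource.read("cancer.arff");
        train.setClassIndex(train.numAttributes() - 1);

        LibSVM svm = new LibSVM();
        // Select the one-class SVM variant (libsvm's -S 2); other options keep their defaults.
        svm.setSVMType(new SelectedTag(LibSVM.SVMTYPE_ONE_CLASS_SVM, LibSVM.TAGS_SVMTYPE));
        svm.buildClassifier(train);

        // Assumption: the wrapper returns the known class for in-pattern documents and a
        // missing value (NaN) for outliers, which can be mapped to 'unknown'; see the
        // Wekalist post above for the details.
        double pred = svm.classifyInstance(train.instance(0));
        System.out.println(Double.isNaN(pred) ? "unknown" : train.classAttribute().value((int) pred));
    }
}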
