class index differ error in weka - nlp

I want to do text classification with Weka. I have a train file and a test file (Persian language). First I load the train file and then choose the "StringToWordVector" filter in the Preprocess tab. Applying that filter moves the class attribute to the first position. To set the class back to the right attribute (it is at index 2 in the files), I can either go to "Edit", right-click the class column and choose "Attribute as class", or simply choose (Nom) class in the Classify tab (otherwise most of the algorithms are inactive). I run SMO and save the model. The problem is that after opening the test file and clicking "Re-evaluate model on current test set", I get the error "...class index differ: 1!=2". I know this happens because the class column moves to the start again after the test file is opened. For the training part I solved the problem as described above, but how can I solve it for the test part, too?
sample train file:
sample test file:

You should apply the same transformation(s) to your test set before you use it to evaluate a trained model. When using the GUI, you could use the Preprocess view of the Explorer, apply the same transformations by hand and then save the set to a new ARFF file. If you want to conduct a series of experiments, I suggest writing a routine that does the transformation for you.
That would look a little something like this:
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import weka.filters.unsupervised.attribute.Reorder;
import weka.filters.unsupervised.attribute.NumericToNominal;
import java.io.File;
public class DataConverter
{
    // Loads a CSV file, applies the filters below in order, and saves the result as ARFF.
    public static void Convert(String sourcepath, String destpath) throws Exception
    {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File(sourcepath));
        Instances data = loader.getDataSet();

        // remove the first attribute
        Remove remove = new Remove();
        remove.setOptions(weka.core.Utils.splitOptions("-R 1"));
        remove.setInputFormat(data);
        data = Filter.useFilter(data, remove);

        // move attribute 30 to the end
        Reorder reorder = new Reorder();
        reorder.setOptions(weka.core.Utils.splitOptions("-R first-29,31-last,30"));
        reorder.setInputFormat(data);
        data = Filter.useFilter(data, reorder);

        // turn the first and last attributes into nominal ones
        NumericToNominal ntn = new NumericToNominal();
        ntn.setOptions(weka.core.Utils.splitOptions("-R first,last"));
        ntn.setInputFormat(data);
        data = Filter.useFilter(data, ntn);

        // save ARFF
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File(destpath));
        //saver.setDestination(new File(destpath));
        saver.writeBatch();
    }

    // Converts every file in the csv folder to an ARFF file with the same base name.
    public static void main(String args[]) throws Exception
    {
        File folder = new File("..\\..\\data\\output\\learning\\csv\\");
        File[] listOfFiles = folder.listFiles();
        for (int i = 0; i < listOfFiles.length; i++) {
            if (listOfFiles[i].isFile()) {
                String target = listOfFiles[i].getName();
                target = target.substring(0, target.lastIndexOf("."));
                System.out.println("converting file " + (i + 1) + "/" + listOfFiles.length);
                Convert("..\\..\\data\\output\\learning\\csv\\" + listOfFiles[i].getName(), "..\\..\\data\\output\\learning\\arff\\" + target + ".arff");
            }
        }
    }
}
Also: The reorder filter can help you place your target class at the end of the file. It takes a new order of the old indices as arguments. In this case you could apply Reorder -R 2-last,1
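To make the index argument concrete, here is a small plain-Java sketch of what the order 2-last,1 does to the columns. It does not call Weka at all; the class name, method name and attribute names are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReorderSketch {
    // Mimics the column order "2-last,1": every column after the first, then the first column.
    static List<String> moveFirstToEnd(List<String> attributes) {
        List<String> reordered = new ArrayList<>();
        if (attributes.isEmpty()) {
            return reordered;
        }
        reordered.addAll(attributes.subList(1, attributes.size()));
        reordered.add(attributes.get(0));
        return reordered;
    }

    public static void main(String[] args) {
        // Hypothetical attribute layout after a filter has moved the class to the front.
        List<String> afterFilter = Arrays.asList("class", "word_a", "word_b", "word_c");
        System.out.println(moveFirstToEnd(afterFilter)); // [word_a, word_b, word_c, class]
    }
}
```

Applying the same order to both the train and the test set keeps the class at the same index in both files, which is what the "class index differ" check compares.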

First, I converted both files to vectors based on the 1000 most frequent words in the train file and created a numeric ARFF file for each of them. Then, for both files, I chose "(Nom) class" under "Test options" in the Classify tab.

Related

How to process tree that i got from syntaxnet?(conll format)

I guess that I need Semgrex from the edu.stanford.nlp package. For this task I need to construct a Tree (edu.stanford.nlp.trees.Tree) and process that tree like this:
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphFactory;
import edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher;
import edu.stanford.nlp.semgraph.semgrex.SemgrexPattern;
import edu.stanford.nlp.trees.Tree;
public class SemgrexDemo {
    public static void main(String[] args) {
        Tree someHowBuiltTree = null; // placeholder: I don't know how to construct a Tree from CoNLL
        SemanticGraph graph = SemanticGraphFactory.generateUncollapsedDependencies(someHowBuiltTree);
        SemgrexPattern semgrex = SemgrexPattern.compile("{}=A <<nsubj {}=B");
        SemgrexMatcher matcher = semgrex.matcher(graph);
    }
}
Actually, I need some suggestions about how to construct the tree from CoNLL.
You want to load a SemanticGraph from your CoNLL file.
import edu.stanford.nlp.trees.ud.CoNLLUDocumentReader;
...
CoNLLUDocumentReader reader = new CoNLLUDocumentReader();
Iterator<SemanticGraph> it = reader.getIterator(IOUtils.readerFromString(conlluFile));
This will produce an Iterator that will give you a SemanticGraph for each sentence in your file.
It is an open research problem to generate a constituency tree from a dependency parse, so there is no way in Stanford CoreNLP to do that at this time to the best of my knowledge.
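If it helps to see what a dependency graph is built from, here is a rough plain-Java sketch that reads the rows of one CoNLL-U sentence and prints each dependency edge. This is not the CoreNLP reader, just an illustration; the class and method names are made up, and it assumes a well-formed sentence whose token IDs run 1, 2, 3, ... (columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).

```java
import java.util.ArrayList;
import java.util.List;

final class ConlluEdgesSketch {
    // Returns one "relation(head -> dependent)" string per token of a single sentence.
    static List<String> edges(String sentenceBlock) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sentenceBlock.split("\n")) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] cols = line.split("\t");
            // Keep plain integer IDs only; multiword ranges ("1-2") and empty nodes ("1.1") are skipped.
            if (cols.length < 8 || !cols[0].matches("\\d+")) {
                continue;
            }
            rows.add(cols);
        }
        List<String> result = new ArrayList<>();
        for (String[] cols : rows) {
            int head = Integer.parseInt(cols[6]); // column 7: HEAD, where 0 marks the root
            String headForm = head == 0 ? "ROOT" : rows.get(head - 1)[1];
            result.add(cols[7] + "(" + headForm + " -> " + cols[1] + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "# text = Dogs bark\n"
                + "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
                + "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n";
        for (String edge : edges(sentence)) {
            System.out.println(edge);
        }
    }
}
```

Each row only names its head and relation, which is why a dependency graph falls out directly while a constituency tree does not.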

How to convert soap xml response to delimited

I know hardly anything about XML. I have successfully gotten a SOAP response (using tools like SoapUI and Boomerang) from an ASMX web service. It's a large file.
Now I need to get it to regular delimited columns. Is there a simple way to do this?
My file is attached here
It is not clear whether this is a one-time transformation or a job you need to do frequently,
so I am adding an answer with some more details.
Approach #1: Using an online converter
As mentioned in the comments, you can use an online site to convert your XML data into CSV.
Even this requires some pre-processing of the response you have, i.e.:
save the data into a file
remove headers or other unwanted data so that it is usable on the online site
The disadvantages of this approach:
requires some manual work
exposes data publicly, which is only sometimes acceptable
time-consuming
cannot be used in an automated fashion
difficult to repeat
Approach #2: Using a Groovy script
This approach addresses the disadvantages of approach #1.
Here is a Groovy script that reads the response of the previous SOAP request step and writes the data to a CSV file.
In your test case, add a new Groovy Script test step right after the SOAP request step that returns the data, and copy the script below into it, i.e. (Test Case -> Step 1: SOAP Request, where you get the response -> Step 2: Groovy Script, with the script below).
Add a test case custom property, say OUTPUT_FILE_NAME, and set it to the file path where the CSV should be saved. If you do not provide this property, the script saves the file as chargedata.csv in the system temp directory.
You will find comments in-line.
/**
* this script will read the previous step response
* extract the cdata at the given xpath
* read all the records and transform them into a csv file
**/
import com.eviware.soapui.support.XmlHolder
import groovy.xml.*
/**Define the output file name in test case custom property say OUTPUT_FILE_NAME and value as absolute file path
* otherwise, it writes a file chargedata.csv in the system temp directory
**/
def outputFileName = context.testCase.getPropertyValue('OUTPUT_FILE_NAME') ?: System.getProperty("java.io.tmpdir")+ '/chargedata.csv'
//csv field separator - change it if needed
def delimiter = ','
/**
* Below statement will fetch the previous request step response.
*/
def response = context.testCase.testStepList[context.currentStepIndex - 1].testRequest.response.responseContent
//Create the xml holder object to get the xpath value, which is cdata in this case
def responseHolder = new XmlHolder(response)
def xpath = '//*:Charges_FileResponse/*:Charges_FileResult'
//Get the cdata part from above xpath which is a string
def data = responseHolder.getNodeValue(xpath)
//This again parses the xml inside of cdata
def chargeRecords = new XmlParser().parseText(data)
//This is going to hold all the data from ChargeRecords
def chargeRecordsDataStructure = []
//This is to hold all the headers
def headers = [] as Set
/**
* This is to create Charge data
**/
def buildChargeDataStructure = { charge ->
def chargeDataStructure = new Expando()
charge.children().each {
def elementName = it.name()
def elementText = it.value().join()
chargeDataStructure[elementName] = elementText
//Add the field name to the headers if not already added
(elementName in headers) ?: headers << elementName
}
chargeDataStructure
}
/**
* this is to create a csv row in string format
**/
def createRow = { recordDataStructure ->
//Use an empty string for any field this record does not have, so columns stay aligned
headers.collect { recordDataStructure[it] ?: '' }.join(delimiter) + '\n'
}
//Build the whole data structure of Charge Records
chargeRecords.Charge.each { charge ->
chargeRecordsDataStructure << buildChargeDataStructure( charge )
}
//Build the rows
def rows = new StringBuffer()
rows << headers.join(delimiter) +'\n'
chargeRecordsDataStructure.each { rows << createRow (it)}
//Write the rows into file
new File(outputFileName).text = rows
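If you would rather do the XML-to-CSV part outside SoapUI, the same idea can be sketched in plain Java with the JDK's built-in DOM parser. This is only an illustration under assumptions: it starts from the inner XML string (the part extracted from the CDATA), the record element is called Charge as in the script above, the field names Id, Amount and Note are invented, and values containing the delimiter are not quoted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlToCsvSketch {
    // Turns every <recordTag> element into one CSV row; child element names become the header.
    static String toCsv(String xml, String recordTag, String delimiter) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList records = doc.getElementsByTagName(recordTag);
        Set<String> headers = new LinkedHashSet<>();        // keeps first-seen column order
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            NodeList fields = records.item(i).getChildNodes();
            for (int j = 0; j < fields.getLength(); j++) {
                Node field = fields.item(j);
                if (field.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // ignore whitespace text nodes between elements
                }
                headers.add(field.getNodeName());
                row.put(field.getNodeName(), field.getTextContent().trim());
            }
            rows.add(row);
        }
        StringBuilder csv = new StringBuilder(String.join(delimiter, headers)).append('\n');
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String header : headers) {
                cells.add(row.getOrDefault(header, "")); // empty cell for a missing field
            }
            csv.append(String.join(delimiter, cells)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Charges><Charge><Id>1</Id><Amount>10</Amount></Charge>"
                + "<Charge><Id>2</Id><Note>x</Note></Charge></Charges>";
        System.out.print(toCsv(xml, "Charge", ","));
    }
}
```

Collecting the headers across all records first is what keeps the columns aligned when some records lack a field.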

How read all files in the folder and replace the pattern in file using Groovy

import groovy.io.FileType
import java.io.File;
def list = []
def dir = new File("C:\\Users\\Desktop\\CodeTest")
dir.eachFileRecurse (FileType.FILES)
{
file ->list << file
}
list.each
{
println it.path
}
//Replace the pattern in file and write to file sequentially.
def replacePatternInFile(file, Closure replaceText)
{
file.write(replaceText(file.text))
}
def file = new File(file)
def patternToFind1 = ~/</
def patternToFind2 = ~/>/
def patternToReplace1 = '&lt'
def patternToReplace2 = '&gt'
//Call the method
replacePatternInFile(file){
it.replaceAll(patternToFind1,patternToReplace1)
}
replacePatternInFile(file){
it.replaceAll(patternToFind2,patternToReplace2)
}
println file.getText()
I am able to change the pattern for one file, but I want to read all the files in the folder and replace the pattern in each file one by one.
While executing it, I get:
ERROR:An error occurred [Could not find matching constructor for: java.io.File(java.util.ArrayList)], see error log for details
You have many problems with your code...
1) You don't need to import:
import java.io.File;
2) When you call:
def file = new File(file)
There is no variable called file in new File(file) (did you mean files?)
3) If you did mean new File(files) then that is where your error is... You can't make a new file from a list of Strings
4) The entity for > is &gt; NOT &gt... The same for < (it needs a semicolon at the end)
You will need to iterate your list of Strings (files.each { path -> ?) and then work on each one in turn.
Though 2) and 3) make me suspect that the above code isn't your real code, but a pretend copy from memory (or a badly redacted copy), as the above code will not give you the error you say you're getting
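For what it's worth, the "iterate and work on each file in turn" idea could look roughly like the sketch below. It is written in plain Java rather than Groovy, the class and method names are made up, and it assumes the files are UTF-8 text; the demo runs on a throwaway temp directory rather than your real folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class EscapeAngleBrackets {
    // Replaces the literal characters with their entities, including the trailing semicolon.
    static String escape(String text) {
        return text.replace("<", "&lt;").replace(">", "&gt;");
    }

    // Rewrites every regular file under root (recursively) and returns how many were processed.
    static int escapeAllFiles(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.write(file, escape(text).getBytes(StandardCharsets.UTF_8));
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("code-test"); // throwaway directory for the demo
        Path sample = root.resolve("a.txt");
        Files.write(sample, "<b>hi</b>".getBytes(StandardCharsets.UTF_8));
        escapeAllFiles(root);
        System.out.println(new String(Files.readAllBytes(sample), StandardCharsets.UTF_8));
    }
}
```

The point is that each path is turned into its own file handle inside the loop, instead of passing the whole list to a File constructor.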

Create a self updating string based on files in a folder in Processing

Alright, so I was messing around with a simple FFT visualization in Processing and thought it would be fun to have more than just one song playing every time. In the end I added 3 songs manually and, on mouse click, change between the songs randomly using a predefined string array. I wanted to add the rest of my computer's music, but every time I want to add a new song to my sketch I have to copy and paste its name into the array in my sketch. That seems like a lot of unnecessary work.
Is there a way to have Processing scan a folder, recognize how many files are inside, and copy all of the file names into the array? I found a library called sDrop for Processing 1.1 which lets you drag and drop files into the sketch directly. However, that doesn't seem to exist anymore in version 2+ of Processing.
Here is the simple version of my current working code to play the music:
import ddf.minim.spi.*;
import ddf.minim.signals.*;
import ddf.minim.*;
import ddf.minim.analysis.*;
import ddf.minim.ugens.*;
import ddf.minim.effects.*;
AudioPlayer player;
Minim minim;
String [] songs = {
"song1.mp3",
"song2.mp3",
"song3.mp3",
};
int index;
void setup() {
size(100, 100, P3D);
index = int(random(songs.length));
minim = new Minim(this);
player = minim.loadFile(songs[index]);
player.play();
}
void draw() {
}
void mouseClicked() {
index = int(random(songs.length));
player.pause();
player = minim.loadFile(songs[index]);
player.play();
}
If anyone has suggestions or could guide me towards a good tutorial that would be great. Thanks!
Assuming you're using this in Java mode, then you can use the Java API: https://docs.oracle.com/javase/8/docs/api/
The Java API contains a File class that contains several methods for reading the contents of a directory: https://docs.oracle.com/javase/8/docs/api/java/io/File.html
Something like this:
ArrayList<String> songs = new ArrayList<String>();
File directory = new File("path/to/song/directory/");
for (File f : directory.listFiles()) {
    if (!f.isDirectory()) {
        songs.add(f.getAbsolutePath());
    }
}
Googling "java list files of directory" will yield you a ton of results.
Just to add to Kevin Workman's answer:
Try to use File.separator instead of "/" or "\". It does the same thing, but it figures out the right separator for the OS you're using, so you can move your sketch to other computers and still have it working.
Check out Daniel Shiffman's DirectoryList example that comes with Processing in Examples > Topics > File IO > DirectoryList
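Putting the two suggestions together, a rough plain-Java sketch could look like this. The class name, the method name and the "Music" folder under the home directory are assumptions for illustration; the method keeps only .mp3 files and returns them as a String array, so it could replace the hard-coded songs array.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SongListSketch {
    // Returns the absolute paths of all .mp3 files directly inside the given folder.
    static String[] listSongs(File directory) {
        File[] entries = directory.listFiles(); // null if the path is not a readable directory
        List<String> songs = new ArrayList<>();
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile() && f.getName().toLowerCase().endsWith(".mp3")) {
                    songs.add(f.getAbsolutePath());
                }
            }
        }
        String[] result = songs.toArray(new String[0]);
        Arrays.sort(result); // listFiles() gives no ordering guarantee
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical location; File.separator picks the right slash for the current OS.
        File musicFolder = new File(System.getProperty("user.home") + File.separator + "Music");
        String[] songs = listSongs(musicFolder);
        System.out.println("Found " + songs.length + " song(s)");
    }
}
```

In the sketch you would call it once in setup() and keep picking a random index from the returned array as before.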

folder hierarchy in flash cs5.5

I'm trying to create a folder hierarchy in flash, the folders i have are
C:\uk\ac\uwe\webgames\math
in the math folder i have the following file called GameMath.as
package uk.ac.uwe.webgames.math{
public class GameMath {
// ------- Constructor -------
public function GameMath() {
}
// ------- Properties -------
const PI:Number = Math.PI;
// ------- Methods -------
public function areaOfCircle(radius:Number):Number {
var area:Number;
area = PI * radius * radius;
return area;
}
}
}
In the webgames folder i have a file called webgames_driver.fla
import uk.ac.uwe.webgames.math.GameMath;
import flash.text.TextField;
// Create a GameMath instance
var output:TextField = new TextField();
var aGameMathInstance:GameMath = new GameMath();
// you will need to create a dynamic textfield called
// output on the stage to display method return value
output.text=aGameMathInstance.areaOfCircle(5).toString();
addChild(output);
//trace(aGameMathInstance.areaOfCircle(1))
however i am getting the following errors
Scene 1, Layer 'Layer 1', Frame 1, Line 1 1172: Definition
uk.ac.uwe.webgames.math:GameMath could not be found.
Scene 1, Layer 'Layer 1', Frame 1, Line 1 1172: Definition
uk.ac.uwe.webgames.math:GameMath could not be found.
Scene 1, Layer 'Layer 1', Frame 1, Line 5 1046: Type was not found or
was not a compile-time constant: GameMath.
Scene 1, Layer 'Layer 1', Frame 1, Line 5 1180: Call to a possibly
undefined method GameMath.
Could anyone help coz i am just stuck, and i'm really new to flash
I'll put this in as basic and detailed terms as possible, not just for your benefit, but for anyone else reading this who isn't terribly experienced with custom classes. Better to get it all out there now and avoid confusion. (I know I wish some people had given me this level of detail on some of my early questions...)
The import statement is for importing an .as class. As you know, at the top of a class you'd have code something like this (excerpt from my own custom class, Trailcrest).
package trailcrest
{
public class sonus
{
Then, in my .fla or an .as file, I can use
import trailcrest.sonus;
I will mention that your .fla must be in the main directory that contains all the custom classes you want to import. My file layout is something like this (folders in parenthesis):
MyProject.fla
MyDocumentClass.as
(trailcrest)
sonus.as
Note that my package name corresponds with the folder structure, with the folder containing the .fla being assumed as the starting place by the code. If I wanted to use a package name like trailcrest.v1, the folders would have to be like this:
MyProject.fla
MyDocumentClass.as
(trailcrest)
(v1)
sonus.as
Then, I'd refer to my custom class using
import trailcrest.v1.sonus;
Note that MyProject.fla MUST be at the main directory of that folder structure. This is because Flash cannot search backwards through the folders, only forwards. So if I had a structure like...
(project)
MyProject.fla
MyDocumentClass.as
(trailcrest)
sonus.as
...then, the line of code...
import trailcrest.sonus;
...would search for the path "\project\trailcrest\sonus.as", which as you can see, doesn't exist. Flash isn't able to go to the parent folder of "\project\".
Your line of code...
import uk.ac.uwe.webgames.math.GameMath;
...is looking for the path "webgames\uk\ac\uwe\webgames\math\GameMath.as". (Remember, the code assumes the folder containing the .fla as the starting place, so the code is literally trying to go to "C:\uk\ac\uwe\webgames\uk\ac\uwe\webgames\math\GameMath.as".)
To fix this, you'll need to change the package for GameMath.as:
package math{
...and the import statement in your code:
import math.GameMath;
This will point everything to the literal path "C:\uk\ac\uwe\webgames\math\GameMath.as"
I hope this answers your question!
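To spell the lookup rule out, here is a tiny sketch of how the path is put together. It is written in Java rather than ActionScript purely as an illustration, the class and method names are made up, and it only models the rule as described above (start at the folder containing the .fla, then turn each dot of the package name into a folder).

```java
final class PackagePathSketch {
    // Builds the path that gets searched: <folder of the .fla>\<package as folders>\<Class>.as
    static String expectedPath(String flaFolder, String qualifiedClassName) {
        return flaFolder + "\\" + qualifiedClassName.replace('.', '\\') + ".as";
    }

    public static void main(String[] args) {
        String flaFolder = "C:\\uk\\ac\\uwe\\webgames";
        // Original import: the package folders are appended a second time.
        System.out.println(expectedPath(flaFolder, "uk.ac.uwe.webgames.math.GameMath"));
        // Corrected import: resolves to the file that actually exists.
        System.out.println(expectedPath(flaFolder, "math.GameMath"));
    }
}
```

The first line printed is the doubled path from the error case, and the second is the real location of GameMath.as.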

Resources