How to process a tree that I got from SyntaxNet (CoNLL format)? - nlp

I guess that I need Semgrex from the edu.stanford.nlp package. For this task I need to construct a Tree (edu.stanford.nlp.trees.Tree) and process that tree like:
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphFactory;
import edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher;
import edu.stanford.nlp.semgraph.semgrex.SemgrexPattern;
import edu.stanford.nlp.trees.Tree;

public class SemgrexDemo {
    public static void main(String[] args) {
        Tree somehowBuiltTree = null; // don't know how to construct a Tree from CoNLL
        SemanticGraph graph = SemanticGraphFactory.generateUncollapsedDependencies(somehowBuiltTree);
        SemgrexPattern semgrex = SemgrexPattern.compile("{}=A <<nsubj {}=B");
        SemgrexMatcher matcher = semgrex.matcher(graph);
    }
}
Actually I need some suggestions about how to construct a tree from CoNLL.

You want to load a SemanticGraph from your CoNLL file.
import edu.stanford.nlp.trees.ud.CoNLLUDocumentReader;
...
CoNLLUDocumentReader reader = new CoNLLUDocumentReader();
Iterator<SemanticGraph> it = reader.getIterator(IOUtils.readerFromString(conlluFile));
This will produce an Iterator that will give you a SemanticGraph for each sentence in your file.
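Putting it together with the Semgrex code from the question, a minimal sketch might look like this (assuming getIterator returns an Iterator<SemanticGraph> as described above; the class name and printed output are just for illustration):
import java.util.Iterator;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher;
import edu.stanford.nlp.semgraph.semgrex.SemgrexPattern;
import edu.stanford.nlp.trees.ud.CoNLLUDocumentReader;

public class ConllSemgrexDemo {
    public static void main(String[] args) throws Exception {
        String conlluFile = args[0]; // path to your CoNLL-U file
        CoNLLUDocumentReader reader = new CoNLLUDocumentReader();
        Iterator<SemanticGraph> it = reader.getIterator(IOUtils.readerFromString(conlluFile));
        SemgrexPattern semgrex = SemgrexPattern.compile("{}=A <<nsubj {}=B");
        while (it.hasNext()) {
            SemanticGraph graph = it.next(); // one graph per sentence
            SemgrexMatcher matcher = semgrex.matcher(graph);
            while (matcher.find()) {
                System.out.println("A = " + matcher.getNode("A") + ", B = " + matcher.getNode("B"));
            }
        }
    }
}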
It is an open research problem to generate a constituency tree from a dependency parse, so there is no way in Stanford CoreNLP to do that at this time to the best of my knowledge.

Related

class index differ error in weka

I want to do text classification with Weka. I have a train and a test file (Persian language). First I load the train file and then choose "StringToWordVector" in Preprocess, and because of that, the class attribute moves to the start. To move the class back to its index (which is 2 in the files), I can either go to the "Edit" view, right-click on the class column and choose "Attribute as class", or just choose "(Nom) class" in the Classify tab (otherwise most of the algorithms would be inactive). I run SMO and save the model. The problem is that after opening the test file and clicking "Re-evaluate model on current test set", this error occurs: "...class index differ: 1 != 2". I know it is because after opening the test file, the class column again goes to the start. For the train part I solved the problem as described above, but how can I solve it for the test part, too?
sample train file:
sample test file:
You should apply the same transformation(s) to your test set before you use it to evaluate a trained model. When using the GUI, you can use the Preprocess view in the Explorer, apply the same transformations by hand, and then save the set to a new ARFF file. When you want to conduct a series of experiments, I suggest writing a routine that does the transformation for you.
That would look a little something like this:
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import weka.filters.unsupervised.attribute.Reorder;
import weka.filters.unsupervised.attribute.NumericToNominal;
import java.io.File;
public class DataConverter {

    public static void convert(String sourcepath, String destpath) throws Exception {
        // load the CSV source file
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File(sourcepath));
        Instances data = loader.getDataSet();

        // drop the first attribute
        Remove remove = new Remove();
        remove.setOptions(weka.core.Utils.splitOptions("-R 1"));
        remove.setInputFormat(data);
        data = Filter.useFilter(data, remove);

        // move attribute 30 to the end
        Reorder reorder = new Reorder();
        reorder.setOptions(weka.core.Utils.splitOptions("-R first-29,31-last,30"));
        reorder.setInputFormat(data);
        data = Filter.useFilter(data, reorder);

        // convert the first and last attributes from numeric to nominal
        NumericToNominal ntn = new NumericToNominal();
        ntn.setOptions(weka.core.Utils.splitOptions("-R first,last"));
        ntn.setInputFormat(data);
        data = Filter.useFilter(data, ntn);

        // save as ARFF
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File(destpath));
        saver.writeBatch();
    }

    public static void main(String[] args) throws Exception {
        File folder = new File("..\\..\\data\\output\\learning\\csv\\");
        File[] listOfFiles = folder.listFiles();
        for (int i = 0; i < listOfFiles.length; i++) {
            if (listOfFiles[i].isFile()) {
                String target = listOfFiles[i].getName();
                target = target.substring(0, target.lastIndexOf("."));
                System.out.println("converting file " + (i + 1) + "/" + listOfFiles.length);
                convert("..\\..\\data\\output\\learning\\csv\\" + listOfFiles[i].getName(),
                        "..\\..\\data\\output\\learning\\arff\\" + target + ".arff");
            }
        }
    }
}
Also: the Reorder filter can help you place your target class at the end of the file. It takes the new order of the old indices as its argument. In this case you could apply Reorder -R 2-last,1; a minimal sketch follows.
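For example, with the same Filter API as in the routine above (assuming data is your loaded Instances object):
// Move attributes 2..last forward and push attribute 1 (the class) to the end.
Reorder reorder = new Reorder();
reorder.setOptions(weka.core.Utils.splitOptions("-R 2-last,1"));
reorder.setInputFormat(data);
data = Filter.useFilter(data, reorder);
data.setClassIndex(data.numAttributes() - 1); // mark the last attribute as the class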
First, I changed the files to vectors based on the 1000 most frequent words in the train file and made a numeric ARFF file for the train and the test file; then, for both of them, in the "Classify" tab under "Test options" I chose "(Nom) class".

Error: Annotator "sentiment" requires annotator "binarized_trees"

Could anyone help me understand when this error can happen? Any idea is really appreciated. Do I need to add anything, any annotator? Is this an issue with the data, or with the model that I am passing in place of the default model?
I am using Stanford NLP 3.4.1 to do sentiment calculation for social media data. When I run it through a Spark/Scala job I get the following error for some data.
java.lang.IllegalArgumentException: annotator "sentiment" requires annotator "binarized_trees"
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:300)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
at com.pipeline.sentiment.NonTwitterSentimentAndThemeProcessorAction$.create(NonTwitterTextEnrichmentComponent.scala:142)
at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action$lzycompute(NonTwitterTextEnrichmentComponent.scala:52)
at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action(NonTwitterTextEnrichmentComponent.scala:50)
at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action(NonTwitterTextEnrichmentComponent.scala:49)
Here is the code I have in Scala:
def create(features: Seq[String] = Seq("tokenize", "ssplit", "pos", "parse", "sentiment")): TwitterSentimentAndThemeAction = {
  println("comes inside the TwitterSentimentAndThemeProcessorAction create method")
  val props = new Properties()
  props.put("annotators", features.mkString(", "))
  props.put("pos.model", "tagger/gate-EN-twitter.model")
  props.put("parse.model", "tagger/englishSR.ser.gz")
  val pipeline = new StanfordCoreNLP(props)
Any help is really appreciated. Thanks.
Are you sure this is the error you get? With your code, I get an error:
Loading parser from serialized file tagger/englishSR.ser.gz ... edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "tagger/englishSR.ser.gz" as either class path, filename or URL
This makes much more sense. The shift-reduce parser model lives at edu/stanford/nlp/models/srparser/englishSR.ser.gz. If I don't use the shift-reduce model, the code as written works fine for me; likewise, if I include the model path above, it works OK.
The exact code I tried is:
#!/bin/bash
exec scala -J-mx4g "$0" "$@"
!#
import scala.collection.JavaConversions._
import edu.stanford.nlp.pipeline._
import java.util._
val props = new Properties()
props.put("annotators", Seq("tokenize", "ssplit", "pos","parse","sentiment").mkString(", "))
props.put("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
val pipeline = new StanfordCoreNLP(props)

Create a self-updating string based on files in a folder in Processing

Alright, so I was messing around with a simple FFT visualization in Processing and thought it would be fun to have more than just one song playing every time. In the end I added 3 songs manually and, on mouse click, change between the songs randomly using a predefined string array. I wanted to add the rest of my computer's music, but every time I want to add a new song to my sketch I have to copy and paste its name into the array in my sketch. Seems like a lot of unnecessary work.
Is there a way to have Processing scan a folder, recognize how many files are inside, and copy all of the file names into the array? I found a library called sDrop for Processing 1.1 which lets you drag and drop files into the sketch directly. However, that doesn't seem to exist anymore in version 2+ of Processing.
Here is the simple version of my current working code to play the music:
import ddf.minim.spi.*;
import ddf.minim.signals.*;
import ddf.minim.*;
import ddf.minim.analysis.*;
import ddf.minim.ugens.*;
import ddf.minim.effects.*;
AudioPlayer player;
Minim minim;
String[] songs = {
  "song1.mp3",
  "song2.mp3",
  "song3.mp3",
};
int index;

void setup() {
  size(100, 100, P3D);
  index = int(random(songs.length));
  minim = new Minim(this);
  player = minim.loadFile(songs[index]);
  player.play();
}

void draw() {
}

void mouseClicked() {
  index = int(random(songs.length));
  player.pause();
  player = minim.loadFile(songs[index]);
  player.play();
}
If anyone has suggestions or could guide me towards a good tutorial that would be great. Thanks!
Assuming you're using this in Java mode, you can use the Java API: https://docs.oracle.com/javase/8/docs/api/
The Java API contains a File class with several methods for reading the contents of a directory: https://docs.oracle.com/javase/8/docs/api/java/io/File.html
Something like this:
ArrayList<String> songs = new ArrayList<String>();
File directory = new File("path/to/song/directory/");
for (File f : directory.listFiles()) {
  if (!f.isDirectory()) {
    songs.add(f.getAbsolutePath());
  }
}
Googling "java list files of directory" will yield you a ton of results.
Just to add to Kevin Workman's answer:
Try to use File.separator instead of "/" or "\". It does the same thing, but it figures out the right separator for the OS you're using, so you can move your sketch to other computers and still have it working; a small illustration follows below.
Check out Daniel Shiffman's example that comes with Processing in Examples > Topics > File IO > DirectoryList.
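A tiny illustration of the File.separator tip (the path pieces are made up):
// Yields "data/music" on macOS/Linux and "data\music" on Windows.
String songFolder = "data" + File.separator + "music";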

Unexpected results with groovy.util.slurpersupport.NodeChild.appendNode() (Groovy 2.2.1)

I think what I am trying to do is simple: add child nodes dynamically to a node (without even knowing the name of the node to be added; I'm developing some framework) using XmlSlurper.
For ease of explaining, something like this:
def colorsNode = new XmlSlurper().parseText("""
<colors>
<color>red</color>
<color>green</color>
</colors>""")
NodeChild blueNode = new XmlSlurper().parseText("<color>blue</color>") // just for illustration; the actual contents are all dynamic
colorsNode.appendNode(blueNode) // In real life, I need to be able to take in any node and append it to a given node as a child.
I was expecting the resulting node to be the same as slurping the following:
"""
<colors>
<color>red</color>
<color>green</color>
<color>blue</color>
</colors>"""
However the result of appending is:
colorsNode
.node
.children => LinkedList[Node('red') -> Node('green') -> NodeChild(.node='blue')]
In other words, what gets appended to the LinkedList is the NodeChild that wraps the new node, not the node itself.
Not surprising, looking at the source code for NodeChild.java:
protected void appendNode(final Object newValue) {
    this.node.appendNode(newValue, this);
}
Well, I would gladly modify my code into:
colorsNode.appendNode(blueNode.node)
Unfortunately NodeChild.node is private :(, and I don't know why! What would be a decent way of achieving what I am trying to do? I couldn't see any solutions online.
I was able to complete my prototyping work by tweaking the Groovy source and exposing NodeChild.node, but now I need to find a proper solution.
Any help would be appreciated.
Thanks,
Aby Mathew
It would be easier if you use XmlParser:
@Grab('xmlunit:xmlunit:1.1')
import org.custommonkey.xmlunit.Diff
import org.custommonkey.xmlunit.XMLUnit
def xml = """
<colors>
<color>red</color>
<color>green</color>
</colors>
"""
def expectedResult = """
<colors>
<color>red</color>
<color>green</color>
<color>blue</color>
</colors>
"""
def root = new XmlParser().parseText(xml)
root.appendNode("color", "blue")
def writer = new StringWriter()
new XmlNodePrinter(new PrintWriter(writer)).print(root)
def result = writer.toString()
XMLUnit.setIgnoreWhitespace(true)
def xmlDiff = new Diff(result, expectedResult)
assert xmlDiff.identical()
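If you need to append an already-parsed node of unknown name, as in the question, groovy.util.Node also has an append(Node) method that takes the node itself. A minimal sketch, written as a plain Java class (groovy.util.XmlParser and groovy.util.Node are ordinary Java classes; this assumes Groovy 2.x on the classpath, and the class name is mine):
import groovy.util.Node;
import groovy.util.XmlParser;
import groovy.xml.XmlUtil;

public class AppendParsedNodeDemo {
    public static void main(String[] args) throws Exception {
        Node colors = new XmlParser().parseText(
                "<colors><color>red</color><color>green</color></colors>");
        // Parse the new child separately, as in the question, then append
        // the groovy.util.Node itself: no NodeChild wrapper in the way.
        Node blue = new XmlParser().parseText("<color>blue</color>");
        colors.append(blue);
        System.out.println(XmlUtil.serialize(colors));
    }
}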

Exhaustively walking the AST tree in Groovy

This is related to my question on intercepting all accesses to a field in a given class, rather than just those done in a manner consistent with Groovy 'property' style accesses. You can view that here: intercepting LOCAL property access in groovy.
One way I've found that will definitely resolve my issue is to use an AST transformation at compile time to rewrite any non-property accesses as property accesses. For example, if a class looks like this:
class Foo {
    def x = 1
    def getter() {
        x
    }
    def getProperty(String name) {
        this."$name"++
    }
}
foo = new Foo()
assert foo.getter() == 1
assert foo.x == 2
These assert statements work out because the getter method accesses x directly, while foo.x goes through getProperty("x"), which increments x before returning.
After some trial and error I can use an AST transformation to change the behavior of the code such that the expression 'x' in the 'getter' method is actually accessed as a Property rather than as a local field. So far so good!
Now, how do I go about getting to ALL accesses of local fields in a given class? I've been combing the internet looking for an AST tree walker helper of some kind but haven't found one. Do I really need to implement an expression walker for all 38 expression types here http://groovy.codehaus.org/api/org/codehaus/groovy/ast/expr/package-summary.html and all 18 statement types here http://groovy.codehaus.org/api/org/codehaus/groovy/ast/stmt/package-summary.html? That seems like something that someone must have already written (since it would be integral to building an AST tree in the first place) but I can't seem to find it.
Glenn
You are looking for some sort of visitor. Groovy has a few (weakly documented) visitors defined that you could use. I don't have the exact answer for your problem, but I can provide a few directions.
The snippet below shows how to traverse the AST of a class and print all method names:
class TypeSystemUsageVisitor extends ClassCodeVisitorSupport {
    @Override
    public void visitMethod(MethodNode node) {
        super.visitMethod(node)
        println node.name
    }

    @Override
    protected SourceUnit getSourceUnit() {
        // I don't know how I should implement this, but it makes no difference
        return null
    }
}
And this is how I am using the visitor defined above:
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.control.CompilePhase

def visitor = new TypeSystemUsageVisitor()
def sourceFile = new File("path/to/Class.groovy")
def ast = new AstBuilder()
        .buildFromString(CompilePhase.CONVERSION, false, sourceFile.text)
        .find { it.class == ClassNode.class }
ast.visitContents(visitor)
Visitors take care of traversing the tree for you. They have visit* methods that you can override and do whatever you want with them. I believe the appropriate visitor for your problem is CodeVisitorSupport, which has a visitVariableExpression method.
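For instance, a minimal sketch of such a visitor, written as a plain Java class (the Groovy AST classes are ordinary Java classes; the class name and message are mine):
import org.codehaus.groovy.ast.CodeVisitorSupport;
import org.codehaus.groovy.ast.expr.VariableExpression;

public class VariableAccessVisitor extends CodeVisitorSupport {
    @Override
    public void visitVariableExpression(VariableExpression expression) {
        System.out.println("variable access: " + expression.getName());
        // Call super so the default traversal continues as usual.
        super.visitVariableExpression(expression);
    }
}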
I recommend reading the code of the AST Browser that comes with groovyConsole for more examples of how to use Groovy AST visitors. Also, take a look at the API doc for CodeVisitorSupport.
