How to write scripts to keep punctuation in the Stanford dependency parser

In order to get some specific dependency information, I wrote a Java program to parse sentences rather than directly using the ParserDemo.java that Stanford Parser 3.9.1 provides. But I found that punctuation is missing after getting the TypedDependencies. Is there any function to get punctuation in the Stanford Parser?
I had to write the parsing code myself because I need to create a SemanticGraph from a List of TypedDependencies, in order to use the methods in SemanticGraph to get every single token's dependency information (including punctuation).
import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure;

public class ChineseFileTest3 {
    public static void main(String[] args) {
        String modelpath = "edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz";
        LexicalizedParser lp = LexicalizedParser.loadModel(modelpath);
        String textFile = "data/chinese-onesent-unseg-utf8.txt";
        demoDP(lp, textFile);
    }

    public static void demoDP(LexicalizedParser lp, String filename) {
        for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
            Tree t = lp.apply(sentence);
            ChineseGrammaticalStructure gs = new ChineseGrammaticalStructure(t);
            Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();
            System.out.println(tdl);
        }
    }
}

I would suggest not using the parser standalone but instead just running a pipeline. That will maintain the punctuation.
There is comprehensive documentation about using the Java API for pipelines here:
https://stanfordnlp.github.io/CoreNLP/api.html
You need to set the properties for Chinese. A quick way to do that is with this line of code:
Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
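As an illustrative sketch (not from the original answer), the pipeline route might look like the following. The example sentence and the choice of basic dependencies are my own assumptions; the basic dependencies from a pipeline attach punctuation tokens with punct relations, which is exactly what the standalone typedDependenciesCollapsed() call drops.

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

public class ChinesePipelineSketch {
    public static void main(String[] args) {
        // Load the Chinese defaults (segmenter, POS model, parser) from the
        // properties file shipped with the Chinese models jar
        Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("我喜欢这部电影。");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // The basic dependencies keep every token, punctuation included
            SemanticGraph deps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            System.out.println(deps);
        }
    }
}
```

This needs the CoreNLP and Chinese-models jars on the classpath; the SemanticGraph you get back per sentence is the same class the question wanted to build by hand.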

Related

Getting a Fail Trying to Parse a ZonedDateTime from a String Using ZonedDateTimePattern

I've been learning how to use NodaTime, as I think it is a far superior "all things temporal" library to the handful of structs in the BCL. I've been reading the docs and experimenting.
This experiment has me flummoxed. I started out just trying to parse a ZonedDateTime.
The things I was trying were not successful, so I thought I'd try something which should be "bulletproof". The following code represents that attempt:
Instant thisNow = SystemClock.Instance.GetCurrentInstant();
var timezone = DateTimeZoneProviders.Tzdb["Australia/Brisbane"];
var zonedDateTime = thisNow.InZone(timezone);
var zonePattern = ZonedDateTimePattern.GeneralFormatOnlyIso;
var zoneFormatted = zonePattern.Format(zonedDateTime);
var zoneParseResult = zonePattern.Parse(zoneFormatted);
Console.WriteLine(zoneParseResult.Success ? "parse success" : "parse failure");
So, simply trying to parse back that which you just converted to a string.
zoneFormatted has the following value: 2021-09-04T16:59:08 Australia/Brisbane (+10)
Any ideas what I am doing wrong?
Cheers
Any ideas what I am doing wrong?
You're using ZonedDateTimePattern.GeneralFormatOnlyIso, which is (as the name suggests) only for formatting, not for parsing.
To get a pattern which is able to parse time zones, you need to specify an IDateTimeZoneProvider. The easiest way to do that is to start with a format-only pattern, and use WithZoneProvider:
using NodaTime;
using NodaTime.Text;
using System;

class Program
{
    static void Main(string[] args)
    {
        var pattern = ZonedDateTimePattern.GeneralFormatOnlyIso
            .WithZoneProvider(DateTimeZoneProviders.Tzdb);
        var text = "2021-09-04T16:59:08 Australia/Brisbane (+10)";
        var result = pattern.Parse(text);
        Console.WriteLine(result.Success);
        Console.WriteLine(result.Value);
    }
}

Is there a way to suppress single quotes in YamlScalarNode output?

Consider this simple program:
using System;
using YamlDotNet.RepresentationModel;

namespace TestYamlNode
{
    class Program
    {
        static void Main(string[] args)
        {
            var scalarNode = new YamlScalarNode("!yada");
            scalarNode.Style = YamlDotNet.Core.ScalarStyle.Plain;
            var serializer = new YamlDotNet.Serialization.Serializer();
            serializer.Serialize(Console.Out, scalarNode);

            scalarNode = new YamlScalarNode("yada");
            scalarNode.Style = YamlDotNet.Core.ScalarStyle.Plain;
            serializer.Serialize(Console.Out, scalarNode);
        }
    }
}
The output of the program is:
'!yada'
yada
Is there a way to tell YamlDotNet not to include single quotes in the output when the value contains characters like ! or {?
For some context, I'm processing an AWS SAM template that has a property that looks like this:
uri: !Sub arn:${AWS::Partition}:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PetStorePetFunc.Arn}/invocations
OK - so after playing with this a bit, I realized that I asked what might be a dumb question.
In my use case I'm trying to generate an output with the form:
uri: !Sub arn:${AWS::Partition}:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PetStorePetFunc.Arn}/invocations
It looks like the correct way to do this is to use the Tag property on the YamlScalarNode:
example:
var node = new YamlScalarNode("arn:${AWS::Partition}:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PetStorePetFunc.Arn}/invocations");
node.Tag = "!Sub";

Providing explicit POS-tagged input and getting sentiment with Stanford NLP

I am trying the code mentioned in question 11 from the URL.
I want to first give POS-tagged input and then get sentiment analysis. The first part I was able to get done successfully: I can print the tree and it looks fine. However, the second part returns -1 (it should return 4 = very positive).
Please provide inputs/suggestions.
public static String test() {
    try {
        String grammar = "/Users/lenin/jar/stanfordparser-master/stanford-parser/models/englishPCFG.ser.gz";
        // set up grammar and options as appropriate
        LexicalizedParser lp = LexicalizedParser.loadModel(grammar);
        String[] sent3 = { "movie", "was", "very", "good", "." };
        String[] tag3 = { "PRP", "VBD", "RB", "JJ", "." };
        List<TaggedWord> sentence3 = new ArrayList<>();
        for (int i = 0; i < sent3.length; i++) {
            sentence3.add(new TaggedWord(sent3[i], tag3[i]));
        }
        Tree parse = lp.parse(sentence3);
        parse.pennPrint();
        int sentiment_score = RNNCoreAnnotations.getPredictedClass(parse);
        System.out.println("score: " + sentiment_score);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "";
}
You're getting a value of -1 because you haven't run any sentiment analysis. You've only parsed the sentence for grammatical structure.
You can, of course, run the sentiment analyzer via code, but, unfortunately, at the moment there isn't an easy lower-level interface to do so. That would be a good thing to add sometime! You essentially need to duplicate the processing that happens in the class edu.stanford.nlp.pipeline.SentimentAnnotator:
Get a binarized tree from the parser (directly or by binarizing the tree returned)
Collapse unaries
Run the SentimentCostAndGradient class's forwardPropagateTree
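As an alternative to the lower-level route above, here is my own sketch of running sentiment through the full pipeline and reading the predicted class off the sentiment-annotated tree. Note that the pipeline re-parses the text itself, so the hand-supplied POS tags from the question are not used here, and the example sentence is an assumption:

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class SentimentSketch {
    // CoreNLP's sentiment classes run 0 = very negative .. 4 = very positive
    static String label(int score) {
        String[] labels = { "Very negative", "Negative", "Neutral", "Positive", "Very positive" };
        return labels[score];
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("The movie was very good.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // The sentiment annotator stores its own binarized tree; asking the
            // plain parse tree for a predicted class is what yields -1
            Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            int score = RNNCoreAnnotations.getPredictedClass(tree);
            System.out.println("score: " + score + " (" + label(score) + ")");
        }
    }
}
```

This requires the CoreNLP models jar (which bundles the sentiment model); the key point is to read the class from the SentimentAnnotatedTree, not from the syntactic parse.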

How to use Apache OpenNLP in a node.js application

What is the best way to use Apache Open NLP with node.js?
Specifically, I want to use the Named Entity Extraction API. Here is what it says about it - the documentation is terrible (new project, I think):
http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind
From the docs:
To use the Name Finder in a production system it's strongly recommended to embed it directly into the application instead of using the command line interface. First the name finder model must be loaded into memory from disk or another source. In the sample below it's loaded from disk.
TokenNameFinderModel model = null;
InputStream modelIn = new FileInputStream("en-ner-person.bin");
try {
    model = new TokenNameFinderModel(modelIn);
}
catch (IOException e) {
    e.printStackTrace();
}
finally {
    if (modelIn != null) {
        try {
            modelIn.close();
        }
        catch (IOException e) {
        }
    }
}
There are a number of reasons why the model loading can fail:
Issues with the underlying I/O
The version of the model is not compatible with the OpenNLP version
The model is loaded into the wrong component, for example a tokenizer model is loaded with the TokenNameFinderModel class
The model content is not valid for some other reason
After the model is loaded the NameFinderME can be instantiated.
NameFinderME nameFinder = new NameFinderME(model);
The initialization is now finished and the Name Finder can be used.
The NameFinderME class is not thread safe; it must only be called from one thread. To use multiple threads, multiple NameFinderME instances sharing the same model instance can be created. The input text should be segmented into documents, sentences and tokens. To perform entity detection an application calls the find method for every sentence in the document. After every document clearAdaptiveData must be called to clear the adaptive data in the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection rate after a few documents. The following code illustrates that:
for (String[][] document : documents) {
    for (String[] sentence : document) {
        Span[] nameSpans = nameFinder.find(sentence);
        // do something with the names
    }
    nameFinder.clearAdaptiveData();
}
The following snippet shows a call to find:
String[] sentence = new String[]{
    "Pierre",
    "Vinken",
    "is",
    "61",
    "years",
    "old",
    "."
};
Span nameSpans[] = nameFinder.find(sentence);
The nameSpans array now contains exactly one Span which marks the name Pierre Vinken. The elements between the begin and end offsets are the name tokens. In this case the begin offset is 0 and the end offset is 2. The Span object also knows the type of the entity. In this case it's person (defined by the model). It can be retrieved with a call to Span.getType(). In addition to the statistical Name Finder, OpenNLP also offers a dictionary and a regular expression name finder implementation.
Check out this Node.js library:
https://github.com/mbejda/Node-OpenNLP
https://www.npmjs.com/package/opennlp
Just do npm install opennlp and look at the examples on GitHub.
var nameFinder = new openNLP().nameFinder;
nameFinder.find(sentence, function(err, results) {
console.log(results)
});

Xtext/EMF how to do model-to-model transform?

I have a DSL in Xtext, and I would like to reuse the rules, terminals, etc. defined in my .xtext file to generate a configuration file for some other tool involved in the project. The config file uses syntax similar to BNF, so it is very similar to the actual Xtext content and it requires minimal transformations. In theory I could easily write a script that would parse Xtext and spit out my config...
The question is, how do I go about implementing it so that it fits with the whole ecosystem? In other words - how to do a Model to Model transform in Xtext/EMF?
If you have both metamodels (Ecore, XSD, ...), your best shot is to use ATL ( http://www.eclipse.org/atl/ ).
If I understand you correctly, you want to go from an Xtext model to its EMF model. Here is a code example that achieves this; substitute your model specifics where necessary.
public static BeachScript loadScript(String file) throws BeachScriptLoaderException {
    try {
        Injector injector = new BeachStandaloneSetup().createInjectorAndDoEMFRegistration();
        XtextResourceSet resourceSet = injector.getInstance(XtextResourceSet.class);
        resourceSet.addLoadOption(XtextResource.OPTION_RESOLVE_ALL, Boolean.TRUE);
        Resource resource = resourceSet.createResource(URI.createURI("test.beach"));
        InputStream in = new ByteArrayInputStream(file.getBytes());
        resource.load(in, resourceSet.getLoadOptions());
        BeachScript model = (BeachScript) resource.getContents().get(0);
        return model;
    } catch (Exception e) {
        throw new BeachScriptLoaderException("Exception Loading Beach Script " + e.toString(), e);
    }
}
