Natural Logic Inference (Stanford CoreNLP OpenIE)

I've been trying to use the Natural Logic Inference component (Naturalli) packaged with Stanford CoreNLP 3.5.2 to extract relation triples; however, upon creating a new OpenIE instance I get the following exception:
Could not load affinity model at edu/stanford/nlp/naturalli/: Could not find a part of the path '...\edu\stanford\nlp\naturalli\pp.tab.gz'
I tried searching the web for the pp.tab.gz file but couldn't find it. Then I tried to work around the problem by disabling the affinity model:
Properties props = new Properties();
props.setProperty("ignoreaffinity", "true");  // Properties values should be Strings
OpenIE ie = new OpenIE(props);
But then I got the following exception:
Could not load clause splitter model at edu/stanford/nlp/naturalli/clauseSplitterModel.ser.gz: Unable to resolve "edu/stanford/nlp/naturalli/clauseSplitterModel.ser.gz" as either class path, filename or URL
Same issue with this file; I couldn't find it anywhere either.
Any help with these issues is greatly appreciated. Thanks to everyone in advance!

This was recently put up; there are some downloads available here:
http://nlp.stanford.edu/software/openie.shtml
I would recommend using the jars pointed to there instead of the Stanford CoreNLP 3.5.2 release.
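For reference, the usage shown on that page runs OpenIE as a regular pipeline annotator. Here is a minimal sketch along those lines (the sample sentence is my own, and it assumes the jars from that page are on the classpath):

import java.util.Properties;
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class OpenIEDemo {
    public static void main(String[] args) {
        // The openie annotator resolves its models from the classpath.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Obama was born in Hawaii.");
        pipeline.annotate(doc);

        // Each sentence carries its extracted relation triples.
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (RelationTriple triple :
                    sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
                System.out.println(triple.subjectGloss() + "\t"
                        + triple.relationGloss() + "\t" + triple.objectGloss());
            }
        }
    }
}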

Does the official SageMaker tutorial generate an AttributeError, and how do I solve it?

I'm following the AWS SageMaker tutorial, but I think there's an error in step 4a. In particular, at line 3 I'm instructed to type:
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')
and I get the error
----> 3 s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')
AttributeError: module 'sagemaker' has no attribute 's3_input'
Indeed, using dir() shows that sagemaker has no attribute called s3_input. How can I fix this so that I can keep advancing in the tutorial? I tried using session.inputs, but this redirects me to a page saying that session is deprecated and suggesting that I use sagemaker.inputs.TrainingInput instead of sagemaker.s3_input. Is this a good way of going forward?
Thanks everyone for the help and patience!
Using sagemaker.inputs.TrainingInput instead of sagemaker.s3_input got that code cell working. It is an appropriate solution, though there may be other approaches.
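For anyone following along, the step 4a line from the question then becomes roughly this under SDK v2 (bucket_name and prefix as defined earlier in the tutorial):

from sagemaker.inputs import TrainingInput

# SDK v2 replacement for the old sagemaker.s3_input call
s3_input_train = TrainingInput(
    s3_data='s3://{}/{}/train'.format(bucket_name, prefix),
    content_type='csv')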
Step 4b also had code that needed updating:
sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(containers[my_region], role,
                                    train_instance_count=1,
                                    train_instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket_name, prefix),
                                    sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5, eta=0.2, gamma=4, min_child_weight=6,
                        subsample=0.8, silent=0, objective='binary:logistic',
                        num_round=100)
It uses the parameters train_instance_count and train_instance_type, which were renamed to instance_count and instance_type in version 2 of the SageMaker Python SDK (https://sagemaker.readthedocs.io/en/stable/v2.html#parameter-and-class-name-changes).
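With those renames applied, the step 4b code becomes roughly the following sketch (only the two renamed parameters change; everything else is as in the tutorial):

sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(containers[my_region], role,
                                    instance_count=1,              # was train_instance_count
                                    instance_type='ml.m4.xlarge',  # was train_instance_type
                                    output_path='s3://{}/{}/output'.format(bucket_name, prefix),
                                    sagemaker_session=sess)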
Making these changes resolved the errors in the tutorial using the conda_python3 kernel.

Getting started with Spark (Datastax Enterprise)

I'm trying to set up and run my first Spark query following the official example.
On our local machines we have already set up the latest version of the DataStax Enterprise package (currently 4.7).
I did everything exactly according to the documentation and added the latest version of dse.jar to my project, but errors come right from the beginning:
Here is the snippet from their example:
SparkConf conf = DseSparkConfHelper.enrichSparkConf(new SparkConf())
                      .setAppName("My application");
DseSparkContext sc = new DseSparkContext(conf);
Now it appears that the DseSparkContext class has only a default empty constructor.
Right after these lines comes the following:
JavaRDD<String> cassandraRdd = CassandraJavaUtil.javaFunctions(sc)
    .cassandraTable("my_keyspace", "my_table", CassandraJavaUtil.mapColumnTo(String.class))
    .select("my_column");
And here comes the main problem: the CassandraJavaUtil.javaFunctions(sc) method accepts only a SparkContext as input, not a DseSparkContext (SparkContext and DseSparkContext are completely different classes, and one does not inherit from the other).
I assume the documentation is not up to date with the release version. If anyone has met this problem before, please share your experience.
Thank you!
That looks like a bug in the docs. The call should be
DseSparkContext.apply(conf)
DseSparkContext is a Scala object that uses the apply function to create new SparkContexts. In Scala you can just write DseSparkContext(conf), but in Java you must actually call the method. I know you don't have access to this code, so I'll make sure this gets fixed in the documentation and see if we can get better API docs up.
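Putting it together, the corrected example would look roughly like this. This is a sketch only: the DSE import paths below are my assumption and may differ between DSE versions, and mapColumnTo comes from the connector's CassandraJavaUtil.

// Import paths are assumptions and may vary by DSE/connector version.
import com.datastax.bdp.spark.DseSparkConfHelper;
import com.datastax.bdp.spark.DseSparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;

SparkConf conf = DseSparkConfHelper.enrichSparkConf(new SparkConf())
        .setAppName("My application");
// DseSparkContext is a Scala object, so call apply() explicitly from Java.
SparkContext sc = DseSparkContext.apply(conf);

JavaRDD<String> cassandraRdd = javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(String.class))
        .select("my_column");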

Sphinx4 figuring out correct models

I am trying to use the Sphinx4 library for speech recognition, but I cannot seem to figure out the correct combination of acoustic model, dictionary, and language model. I have tried various combinations, and I get a different error every time.
I am trying to follow the tutorial at http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. I do not have a config.xml as I would if I were using ConfigurationManager instead of Configuration, because there is no apparent way to pass the location of the config file to the Configuration itself (ConfigurationManager takes it as a constructor argument); and that might be my problem right there. I just do not know how to point to one, and since the tutorial says "It is possible to configure low-level components of the application through XML file although you should do that ONLY IF you understand what is going on.", I assume having a config.xml file is not compulsory.
Combining the latest dictionary (7b, obtained from SourceForge) with the latest acoustic model (cmusphinx-en-us-5.2.tar.gz, also from SourceForge) and the language model (cmusphinx-5.0-en-us.lm.gz, also from SourceForge) results in a NullPointerException in startRecognition. The issue is similar to the problem here: sphinx-4 NullPointerException at startRecognition, but the link given in the answer no longer works. I obtained 0.7a from SourceForge (since that is the dictionary the link seems to point at), but with that one I get Error loading word: ;;; even earlier in the execution. I tried downloading the latest models and dictionary from the GitHub repo, but that results in java.lang.IndexOutOfBoundsException: Index: 16128, Size: 16128.
Any help is much appreciated!
You need to use the latest code from GitHub:
http://github.com/cmusphinx/sphinx4
as described in the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
The correct models (en-us) are already included, so you should not replace anything. You should not configure any XML files; use the samples provided in the sources.
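With the GitHub version, the high-level API is configured entirely in code, with no config.xml needed. A minimal sketch along the lines of that tutorial (the exact resource paths for the bundled en-us models may differ slightly between versions):

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class TranscriberDemo {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Bundled en-us models, loaded from the classpath.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
        recognizer.startRecognition(true);  // true clears previously cached data
        SpeechResult result = recognizer.getResult();
        recognizer.stopRecognition();
        System.out.println("Hypothesis: " + result.getHypothesis());
    }
}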

Stanford CoreNLP model sentiment.ser.gz missing?

I am new to Stanford CoreNLP and trying to use it. I was able to run the sentiment analysis pipeline and the CoreNLP software, but when I try to execute the Evaluate tool, it asks for the model sentiment.ser.gz.
java edu.stanford.nlp.sentiment.Evaluate edu/stanford/nlp/models/sentiment/sentiment.ser.gz test.txt
I could not find the model in the software that I downloaded from the Stanford site, or anywhere else on the internet.
Can someone please advise whether we can create our own model, or where I can find one on the internet?
I appreciate your help.
The file stanford-corenlp-full-2014-01-04.zip contains another file called stanford-corenlp-3.3.1-models.jar. The latter file is a ZIP archive that contains the model file you are looking for.
CoreNLP is able to load the model file from the classpath if you add stanford-corenlp-3.3.1-models.jar to your Java classpath, so you do not have to extract anything.
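For example, a pipeline that includes the sentiment annotator will pick the model up from the classpath automatically; a minimal sketch:

import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// With stanford-corenlp-3.3.1-models.jar on the classpath, the sentiment
// model is resolved automatically; no explicit path is needed.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);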
It also appears the documentation on running the Evaluate tool is slightly outdated.
The correct call goes like this (tested with CoreNLP 3.3.1 and the test data downloaded from the sentiment homepage):
java -cp "*" edu.stanford.nlp.sentiment.Evaluate -model edu/stanford/nlp/models/sentiment/sentiment.ser.gz -treebank test.txt
The '-cp "*"' adds everything in the current directory to the classpath. Thus, the command above must be executed in the directory to which you extracted CoreNLP, otherwise it will not work.
If you do not add "-model" and "-treebank" to the call, you'll get an error message like this:
Unknown argument test.txt
If you do not supply a treebank and a model, you get another error message
Exception in thread "main" java.lang.NullPointerException
at java.io.File.<init>(File.java:277)

Labelling Neo4j database using Neo4django

This question is related to the github issue of Neo4django. I want to create multiple graphs using Neo4j graph DB from Django web framework. I'm using Django 1.4.5, neo4j 1.9.2 and neo4django 0.1.8.
As of now Neo4django doesn't support labeling, but that is my core purpose: I want to be able to create labels from Neo4django. So I went into the source code and tried to tweak it a little to see if I could make this addition. In my understanding, the file db/models/properties.py has a class BoundProperty(AttrRouter) which calls a Gremlin script through the function save(instance, node, node_is_new). The script is as follows:
script = '''
node=g.v(nodeId);
results = Neo4Django.updateNodeProperties(node, propMap);
'''
The script calls the update function from library.groovy, and the whole function looks intuitive and nice. I'm trying to extend this function to support labeling, but I have no experience with Groovy. Does anyone have any suggestions on how to proceed? Any help would be appreciated. If it works, it would be a big addition to neo4django :)
Thank you
A little background:
The Groovy code you've highlighted is executed using the Neo4j Gremlin plugin. First, it supports the Gremlin graph DSL (e.g. node = g.v(nodeId)), which is implemented atop the Groovy language. Groovy itself is a dynamic superset of Java, so most valid Java code will work in scripts sent via connection.gremlin(...). Each script sent should define a results variable that will be returned to neo4django, even if it's just null.
Anyway, accessing Neo4j this way is handy (though I've heard it will be deprecated :( ) because you can use the full Neo4j embedded Java API. Try something like this to add a label to a node:
from neo4django.db import connection
connection.gremlin("""
    import org.neo4j.graphdb.DynamicLabel
    node = g.v(nodeId)
    label = DynamicLabel.label('Label_Name')
    node.rawVertex.addLabel(label)
    // each script must define `results`, even if it's just null
    results = null
""", nodeId=node_id)
Note the import for DynamicLabel and the trailing results variable; I haven't run this code, so I'm not sure it's complete. Debugging code written this way is a little tough, so make liberal use of the Gremlin tab in the Neo4j admin.
If you come up with a working solution, I'd love to see it (or an explanatory blog post!)- I'm sure it could be helpful to other users.
HTH!
NB: Labels will be properly supported shortly after Neo4j 2.0's release; they'll replace the current in-graph type structure.
