Error: Annotator "sentiment" requires annotator "binarized_trees"

I am using Stanford NLP 3.4.1 to calculate sentiment for social media data. When I run it through a Spark/Scala job, I get the following error for some of the data. Could anyone explain when this error can happen? Do I need to add another annotator, or is this an issue with the data or with the non-default models I am passing?
java.lang.IllegalArgumentException: annotator "sentiment" requires annotator "binarized_trees"
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:300)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
at com.pipeline.sentiment.NonTwitterSentimentAndThemeProcessorAction$.create(NonTwitterTextEnrichmentComponent.scala:142)
at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action$lzycompute(NonTwitterTextEnrichmentComponent.scala:52)
at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action(NonTwitterTextEnrichmentComponent.scala:50)
at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action(NonTwitterTextEnrichmentComponent.scala:49)
Here is the code I have in Scala:
def create(features: Seq[String] = Seq("tokenize", "ssplit", "pos", "parse", "sentiment")): TwitterSentimentAndThemeAction = {
  println("comes inside the TwitterSentimentAndThemeProcessorAction create method")
  val props = new Properties()
  props.put("annotators", features.mkString(", "))
  props.put("pos.model", "tagger/gate-EN-twitter.model")
  props.put("parse.model", "tagger/englishSR.ser.gz")
  val pipeline = new StanfordCoreNLP(props)
Thanks for any help.

Are you sure this is the error you get? With your code, I get this error instead:
Loading parser from serialized file tagger/englishSR.ser.gz ...edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "tagger/englishSR.ser.gz" as either class path, filename or URL
This makes much more sense. The shift-reduce parser model lives at edu/stanford/nlp/models/srparser/englishSR.ser.gz. If I don't use the shift-reduce model, the code as written works fine for me; likewise, if I include the model path above, it works fine.
The exact code I tried is:
#!/bin/bash
exec scala -J-mx4g "$0" "$@"
!#
import scala.collection.JavaConversions._
import edu.stanford.nlp.pipeline._
import java.util._
val props = new Properties()
props.put("annotators", Seq("tokenize", "ssplit", "pos","parse","sentiment").mkString(", "))
props.put("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
val pipeline = new StanfordCoreNLP(props)
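A small sanity check, not part of the original answer, that can help when the same pipeline is built inside a Spark job: assuming the CoreNLP models jar containing the shift-reduce parser is meant to be on the executor classpath, this verifies the resource actually resolves before constructing the pipeline.
// Hedged sketch: fail fast with a clear message if the shift-reduce parser model
// is not visible on the classpath (e.g. the models jar was not shipped to the executors).
val srModel = "edu/stanford/nlp/models/srparser/englishSR.ser.gz"
require(getClass.getClassLoader.getResource(srModel) != null,
  s"$srModel not found on the classpath; make sure the CoreNLP models jar that contains the shift-reduce parser is available")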

Related

while running huggingface gpt2-xl model embedding index getting out of range

I am trying to run the huggingface gpt2-xl model. I ran code from the quickstart page that loads the small gpt2 model and generates text with the following code:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained('gpt2')
generated = tokenizer.encode("The Manhattan bridge")
context = torch.tensor([generated])
past = None
for i in range(100):
    print(i)
    output, past = model(context, past=past)
    token = torch.argmax(output[0, :])
    generated += [token.tolist()]
    context = token.unsqueeze(0)
sequence = tokenizer.decode(generated)
print(sequence)
This runs perfectly. Then I tried to run the gpt2-xl model, changing the tokenizer and model loading code as follows:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained('gpt2-xl')
The tokenizer and model loaded perfectly, but I am getting an error on the following line:
output, past = model(context, past=past)
The error is:
RuntimeError: index out of range: Tried to access index 204483 out of table with 50256 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418
Looking at the error, it seems that the embedding size is not correct, so I wrote the following line to fetch the config of gpt2-xl specifically:
config = GPT2Config.from_pretrained("gpt2-xl")
But here vocab_size is 50257.
So I explicitly changed the value:
config.vocab_size=204483
After printing the config I can see that the previous line took effect, but I still get the same error.
This was actually an issue I reported and they fixed it.
https://github.com/huggingface/transformers/issues/2774
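If you hit this on an older transformers release, upgrading to a version that includes that fix should resolve the error without overriding vocab_size in the config.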

ScriptRunConfig with datastore reference on AML

When trying to run a ScriptRunConfig, using:
src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      arguments=['--input-data-dir', ds.as_mount(),
                                 '--reg', '0.99'],
                      run_config=run_config)
run = experiment.submit(config=src)
It doesn't work and breaks with this when I submit the job:
... lots of things... and then
TypeError: Object of type 'DataReference' is not JSON serializable
However, if I run it with an Estimator, it works. One of the differences is that with a ScriptRunConfig the parameters are passed as a list, whereas the Estimator takes a dictionary.
Thanks for any pointers!
Being able to use a DataReference in ScriptRunConfig is a bit more involved than just calling ds.as_mount(). You will need to convert it into a string in arguments and then update the RunConfiguration's data_references section with a DataReferenceConfiguration created from ds. Please see here for an example notebook on how to do that.
If you are just reading from the input location and not doing any writes to it, please check out Dataset. It allows you to do exactly what you are doing without doing anything extra. Here is an example notebook that shows this in action.
Below is a short version of the notebook
from azureml.core import Dataset
# more imports and code
ds = Datastore(workspace, 'mydatastore')
dataset = Dataset.File.from_files(path=(ds, 'path/to/input-data/within-datastore'))
src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      arguments=['--input-data-dir', dataset.as_named_input('input').as_mount(),
                                 '--reg', '0.99'],
                      run_config=run_config)
run = experiment.submit(config=src)
You can see the how-to-migrate-from-estimators-to-scriptrunconfig page in the official documentation.
The core code for using a DataReference in ScriptRunConfig is:
# if you want to pass a DataReference object, such as the below:
datastore = ws.get_default_datastore()
data_ref = datastore.path('./foo').as_mount()
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      arguments=['--data-folder', str(data_ref)],  # cast the DataReference object to str
                      compute_target=compute_target,
                      environment=pytorch_env)

# set a dict of the DataReference(s) you want on the `data_references` attribute
# of the ScriptRunConfig's underlying RunConfiguration object
src.run_config.data_references = {data_ref.data_reference_name: data_ref.to_config()}
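Roughly speaking, str(data_ref) puts a placeholder for the reference into arguments, and registering the DataReferenceConfiguration on run_config.data_references is what tells the run to mount the datastore and resolve that placeholder to the mounted path at run time.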

Scala dynamic String interpolation - Read from properties file

I am trying to read a log message from message.properties and apply string interpolation to that message, but the message is not interpolated with the original value: the output is exactly the log message as specified in the properties file.
I don't want to hard-code any log message in the Scala file. Instead, I want to read all messages from the properties file and write them to the application log after interpolating the values.
import com.typesafe.config.ConfigFactory
import grizzled.slf4j.Logging

object Test extends Logging {
  def main(args: Array[String]) {
    val subjectArea = "Member"
    val messageProp = ConfigFactory.load("message.properties")
    val log = messageProp.getString("log.subject.area")
    debug(s"$log")
  }
}
message.properties
log.subject.area=The Subject Area : $subjectArea
Console Output: The Subject Area : $subjectArea
I want this output: The Subject Area : Member
Thanks in advance!!!
This is not a string interpolation problem. You want a lightweight templating engine (e.g. http://jtwig.org/documentation/quick-start/application or something else). I feel that most of them would be overkill if your problem is as simple as in the snippet you've provided.
If you want to do something more or less complex, then sure, go with template engines.
Otherwise, I'd just go with string substitution.
String interpolation only works with constants. To do what you want dynamically, you need to write some explicit processing yourself (or use a template engine library). Something like this, perhaps?
val substPattern = """\$\{(.+?)\}""".r
import java.util.regex.Matcher.{ quoteReplacement => qq }

def processSubstitutions(
    input: String,
    vars: Map[String, String]
) = substPattern.replaceAllIn(
  input, { m =>
    val ref = m.group(1)
    qq(vars.getOrElse(ref, ref))
  }
)

val vars = Map("subjectArea" -> "Member")
val messageProp = ConfigFactory.load("message.properties")
val log = processSubstitutions(
  messageProp.getString("log.subject.area"),
  vars
)
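Note that the pattern above matches ${...}-style references, so with this approach the properties entry would need to read log.subject.area=The Subject Area : ${subjectArea}; with the vars map shown, processSubstitutions then produces the desired The Subject Area : Member.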

No suitable ClassLoader found for grab while Instantiating a class

I have created two Groovy scripts as below. One script has a class which is instantiated in the other script. Both are in the default package.
When I try to run ImportGpsData.groovy, I get the following exception:
Caught: java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at ImportGpsData$_run_closure1.doCall(ImportGpsData.groovy:10)
at ImportGpsData.run(ImportGpsData.groovy:6)
Caused by: java.lang.RuntimeException: No suitable ClassLoader found for grab
at DateParser.<clinit>(DateParser.groovy)
... 2 more
ImportGpsData.groovy
def file = new File('fells_loop.gpx')
def slurper = new XmlSlurper()
def gpx = slurper.parse(file)

gpx.rte.rtept.each {
    println it.@lat
    println it.@lon
    def parser = new DateParser()
    println parser.parse(it.time.toString())
}
Dateparser.groovy
@Grapes(
    @Grab(group='joda-time', module='joda-time', version='2.3')
)
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

class DateParser {
    def String parse(time) {
        def printableTime = new DateTime(time)
        def format = DateTimeFormat.forPattern('MM/dd/yyyy - hh:mm aa')
        return printableTime.toString(format)
    }
}
I've found some other Stack Overflow questions that deal with the No suitable ClassLoader found for grab error. One answer suggested using @GrabConfig(systemClassLoader=true) inside the @Grapes statement; however, adding it results in a compilation error: unexpected token @ in line two.
@Grapes([
    @Grab(group='joda-time', module='joda-time', version='2.3')
    @GrabConfig(systemClassLoader=true)
])
The above usage gave unexpected token @ found in line 3...
Adding a comma before @GrabConfig gives the error below:
Multiple markers at this line
- Groovy:Invalid duplicate class definition of class DateParser : The source F:\GroovyEclipses\src\DateParser.groovy contains at least two definitions of the class DateParser.
- General error during conversion: No suitable ClassLoader found for grab java.lang.RuntimeException: No suitable ClassLoader found for grab
After further analysis, I have figured out that I get this error whenever I use @Grapes and @Grab in any of my scripts. However, I have to use them to work with joda-time.
Not sure if you were able to resolve this; if not, try compiling the class file first:
groovyc Dateparser.groovy
and then run:
groovy ImportGpsData.groovy
It should work.

Unpickler for class with tuple

I recently came across this framework and it seems really promising for what I need. I am testing out some simple examples, and I'm curious why I can pickle my object but it can't find an unpickler. Here is my example:
import scala.pickling._
import json._

object JsonTest extends App {
  val simplePickled = new Simple(("test", 3)).pickle
  val unpickled = simplePickled.unpickle[Simple]
}

class Simple(val x: (String, Int)) {}
This fails with: Cannot generate an unpickler for com.ft.Simple
Thanks in advance for any help.
This behavior is actually a regression introduced three days ago. We just resolved it and pushed a fix an hour or two ago.
The code you posted above now works again:
scala> :paste
// Entering paste mode (ctrl-D to finish)

import scala.pickling._
import json._

object JsonTest extends App {
  val simplePickled = new Simple(("test", 3)).pickle
  val unpickled = simplePickled.unpickle[Simple]
}

class Simple(val x: (String, Int)) {}

// Exiting paste mode, now interpreting.

import scala.pickling._
import json._
defined module JsonTest
defined class Simple
I've also added your code snippet here as a test case in our test suite.
If you're using the artifacts we publish on Sonatype, you'll have to wait until the next artifact is published (tomorrow); if you want the fix right away, you can check out and package scala/pickling with sbt and use the jar that sbt builds (sbt should print where it put the jar).
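If you go the local-build route, one simple option is to drop the jar that sbt builds into your project's lib/ directory, which sbt picks up as an unmanaged dependency. Equivalently, in build.sbt (a sketch only; the jar path below is an assumption, so use whatever location sbt prints after packaging):
// Hedged build.sbt sketch: point the project at the locally built scala-pickling jar
// until the next artifact is published on Sonatype.
unmanagedJars in Compile += Attributed.blank(file("/path/to/scala-pickling/target/scala-2.10/scala-pickling_2.10-0.8.0-SNAPSHOT.jar"))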
