NullPointerException with OpenNLP in NameFinderME class

I am using OpenNLP to extract named entities from a given text.
It throws the following error when I run the code on large data; on small data it works fine.
java.lang.NullPointerException
at opennlp.tools.util.Cache.put(Cache.java:134)
at opennlp.tools.util.featuregen.CachedFeatureGenerator.createFeatures(CachedFeatureGenerator.java:71)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:116)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:39)
at opennlp.tools.util.BeamSearch.bestSequences(BeamSearch.java:125)
at opennlp.tools.util.BeamSearch.bestSequence(BeamSearch.java:198)
at opennlp.tools.namefind.NameFinderME.find(NameFinderME.java:214)
at opennlp.tools.namefind.NameFinderME.find(NameFinderME.java:198)
Please help me out with this.

I had this same issue with POSTaggerME, and the cause is almost certainly that you're sharing a NameFinderME instance between threads.
Per the OpenNLP documentation, most of the exposed library classes are not thread-safe, so give each thread its own instance (or synchronize access to a shared one):
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.namefind.recognition.api

Related

Alloy API: Decompile into .als

BLUF: Can I export a .als file corresponding to a model I have created with the Alloy API?
Example: I have a module that I read in using edu.mit.csail.sdg.alloy4compiler.parser.CompUtil. I then add signatures and facts to create a modified model in memory. Can I "de-parse" that and basically invert the lexer (edu.mit.csail.sdg.alloy4compiler.parser.CompLexer) to get a .als file somehow?
It seems like there ought to be a way to decompile the model in memory and save that as code to be later altered, but I'm having trouble identifying a path to that in the Alloy API Javadocs. I'm building a translator from select behavioral aspects of UML/SysML as part of some research, so I'm trying to figure out if there is something extant I can take advantage of or if I need to create it.
It seems a similar question has been asked before: Generating .als files corresponding to model instances with Alloy API
In the linked post, Loïc Gammaitoni (https://stackoverflow.com/users/2270610/lo%c3%afc-gammaitoni) stated that he has written a solution for this in his Lightning application. He said that he may include the source code for completing this task; I'm unsure whether he has uploaded it yet.

Reading a grib2 message into an Iris cube

I am currently exploring the notion of using Iris in a project to read forecast grib2 files with Python.
My aim is to load/convert a grib message into an iris cube based on a grib message key having a specific value.
I have experimented with iris-grib, which uses gribapi. Using iris-grib I have not been able to find the key in the grib2 file, although the key is visible with 'grib_ls -w...' via the CLI.
gribapi does the job, but I am not sure how to interface it with iris (which is what, I assume, iris-grib is for).
I was wondering if anyone knew of a way to get a message into an iris cube based on a grib message key having a specific value. Thank you
You can get at anything that the gribapi understands through the low-level grib interface in iris-grib, which is the iris_grib.GribMessage class.
Typically you would iterate with for msg in GribMessage.messages_from_filename(xxx): and then access keys such as msg.sections[4]['productDefinitionTemplateNumber'] or msg.sections[4]['parameterNumber'], and so on.
You can use this to identify required messages, and then convert to cubes with iris_grib.load_pairs_from_fields().
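For illustration, a minimal sketch of that approach; the file name ('forecast.grib2') and target key value (8) below are placeholders only:

import iris_grib
from iris_grib import GribMessage

def select_messages(path, wanted):
    # Yield only the messages whose productDefinitionTemplateNumber matches.
    for msg in GribMessage.messages_from_filename(path):
        if msg.sections[4]['productDefinitionTemplateNumber'] == wanted:
            yield msg

# load_pairs_from_fields converts the selected messages into
# (cube, message) pairs.
for cube, message in iris_grib.load_pairs_from_fields(
        select_messages('forecast.grib2', 8)):
    print(cube)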
However, iris-grib only knows how to translate specific encodings into cubes: it is quite strict about exactly what it recognises, and will fail on anything else. So if your data uses any unrecognised templates or data encodings, it will definitely fail to load.
I'm just anticipating that you may have something unusual here, so that might be an issue?
You can check your expected message contents against the translation code in iris_grib/_load_convert.py, starting at the convert() routine.
To get an Iris cube out of something not yet supported, you would either:
(a) extend the translation rules (i.e. a GitHub PR), or
(b) modify the message so that it looks like something that can be recognised.
Failing that, you can
(c) simply build an Iris cube yourself from the data found in your GribMessage; that can be a little simpler than using gribapi directly (or possibly not, depending on the detail).
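As a rough sketch of option (c), with everything (data, names, coordinate values) purely illustrative:

import numpy as np
import iris
from iris.coords import DimCoord

data = np.zeros((3, 4))  # stand-in for values decoded from the message
cube = iris.cube.Cube(data, standard_name='air_temperature', units='K')
cube.add_dim_coord(DimCoord([10., 20., 30.],
                            standard_name='latitude', units='degrees'), 0)
cube.add_dim_coord(DimCoord([0., 90., 180., 270.],
                            standard_name='longitude', units='degrees'), 1)
print(cube)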
If you have a problem like that, you should definitely raise it as an issue on the GitHub project (iris-grib issues) and we will try to help.
P.S. As you have registered a Python 3 interest, you may want to be aware that the newer "ecCodes" replacement for gribapi should shortly be available, making Python 3 support for grib data possible at last.
However, the Python 3 version is still in beta and we are presently experiencing some problems with it, now raised with ECMWF, so it is still almost-but-not-quite achievable.

OpenModelica with Python

I created two models in OpenModelica v1.12. One model has the main part and the other has the math-related calculations. The main model uses the other model for the calculations. This works fine in OpenModelica.
When I launch the main model from Python with OMPython, it shows the error below:
Error: Error occurred while flattening model Sim_FS_SingleEx.
How do I include the dependent models (.mo) in python?
Any suggestions to resolve the issue?
Thanks
I guess you can send two expressions to omc:
loadFile("file1.mo");
loadFile("file2.mo");
Better still would be to make your models into a library, then you can just load Library/package.mo which will load all the files from the library automatically.
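For example, with OMPython this could look like the following sketch (the file names are the placeholders from above, and the model name is taken from the question):

from OMPython import OMCSessionZMQ

omc = OMCSessionZMQ()
# Load the dependent model first, then the main model.
print(omc.sendExpression('loadFile("file2.mo")'))
print(omc.sendExpression('loadFile("file1.mo")'))
# With both files loaded, the main model should flatten and simulate:
print(omc.sendExpression('simulate(Sim_FS_SingleEx)'))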
I have fixed the ticket. You can specify .mo files in the third parameter.

Linking multiple name finder entities using OpenNLP

First, a little bit of context: I'm trying to identify street addresses in a corpus of documents, and we decided that the obvious solution would be to use an NLP tool (Apache OpenNLP in this case) to achieve this. So far everything looks great, although we still need to train the model with a lot of documents, but that's not really an issue. We improved the solution by adding an extra step for address validation, using the USAddress parser from Datamade. My biggest issue is that the addresses by themselves are nothing without a location next to them; sometimes the location is specified in the text, and we will assume that this happens quite often.
Here comes my question: Is there some way to use coreference to associate the entities in the text? Or, better yet, is there a way to annotate arbitrary words in the text and identify them as being one entity?
I've been looking at the Apache OpenNLP documentation, but... it's pretty thin and I think it still needs some work.
If you want to use coreference for this problem, you can have a look at this blog.
But a simpler solution would be to use a sentence detector + regex, or a location NER + sentence detector (presuming addresses are on a single line).
I think US addresses can be identified using a regular expression, and once the regex matches, you can use OpenNLP's sentence detector to print the whole address line.
Similarly, you can use the NER model provided by OpenNLP to find locations and print the sentence you want.
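As a rough illustration of the regex part (the pattern and sample text below are made up; real addresses need a much more permissive pattern):

import re

ADDRESS_RE = re.compile(
    r'\b\d{1,5}\s+'                        # house number
    r'(?:[A-Z][a-z]+\s?)+'                 # street name words
    r'(?:St|Ave|Blvd|Rd|Dr|Ln|Ct)\b\.?'    # common street suffixes
)

text = "Send it to 221 Baker St. before Friday."
match = ADDRESS_RE.search(text)
if match:
    print(match.group(0))  # -> 221 Baker St.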
Hope this helps!
Edit: this GitHub repo made it simple for us. Check it out!
OpenNLP does not provide a coreference resolution module. You have to use the Stanford, Illinois, or Berkeley system to accomplish the task. They may not work out of the box; you may have to do some parameter tuning or supervised training to achieve reasonable performance.
Edit: Thanks @Alaye for pointing out that OpenNLP does have a coref module; for more details, see his answer.
Thanks
OK, several months later! It wasn't coref I was after... what I was actually looking for was relation extraction (information extraction). I used MITIE (binary relation detection) and that did the trick; I trained my own model using the Brat annotation tool and got an F1 score of 0.81. Pretty neat...
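For anyone curious, detection with a trained MITIE binary relation model looks roughly like this sketch, assuming MITIE's Python bindings; the model paths, sample text, and entity choices are all placeholders:

from mitie import named_entity_extractor, binary_relation_detector, tokenize

ner = named_entity_extractor('ner_model.dat')           # pretrained NER model
detector = binary_relation_detector('my_relation.svm')  # model trained on Brat-annotated data

tokens = tokenize('John Smith lives at 221 Baker St.')
entities = ner.extract_entities(tokens)  # each entry starts with a token range

# Build a candidate relation between the first two detected entities
# and score it; a positive score indicates a detected relation.
arg1, arg2 = entities[0][0], entities[1][0]
relation = ner.extract_binary_relation(tokens, arg1, arg2)
print(detector(relation) > 0)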

Improving entity naming with custom file/code in NLTK

We've been working with the NLTK library in a recent project where we're mainly interested in the named entities part. In general we're getting good results using the NEChunkParser class. However, we're trying to find a way to provide our own terms to the parser, without success.
For example, we have a test document where my name (Shay) appears in several places. The library finds me as GPE while I'd like it to find me as PERSON...
Is there a way to provide some kind of a custom file/code so the parser will be able to interpret the named entity as I want it to?
Thanks!
The easy solution is to compile a list of entities that you know are misclassified, then filter the NEChunkParser output in a postprocessing module and replace these entities' tags with the tags you want them to have.
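A minimal sketch of that postprocessing step, assuming the usual pos_tag/ne_chunk pipeline (the lookup table and sample sentence are illustrative):

import nltk
from nltk import Tree

# Known misclassifications: surface form -> the tag we want instead.
CORRECTIONS = {'Shay': 'PERSON'}

def relabel(tree):
    # Walk the chunk tree and replace the labels of known entities.
    for i, child in enumerate(tree):
        if isinstance(child, Tree):
            name = ' '.join(token for token, pos in child.leaves())
            if name in CORRECTIONS:
                tree[i] = Tree(CORRECTIONS[name], child.leaves())
    return tree

tagged = nltk.pos_tag(nltk.word_tokenize('Shay visited London.'))
print(relabel(nltk.ne_chunk(tagged)))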
The proper solution is to retrain the NE tagger. If you look at the NLTK source code, you'll see that NEChunkParser is based on a MaxEnt classifier, i.e. a machine learning algorithm. You'll have to compile and annotate a corpus (dataset) that is representative of the kind of data you want to work with, then retrain the NE tagger on this corpus. (This is hard, time-consuming and potentially expensive.)
