RASA: how to use Japanese (tokenization with MeCab) - nlp

RASA is known to be an effective bot framework.
The stack, RASA NLU and RASA Core, is really useful.
I tried it hands-on and found it amazing, especially with English text. I then tried it on Japanese text (alpha support in spaCy). I used it with the tensorflow pipeline and got stuck: I cannot figure out how to use an external tokenizer such as MeCab.
Has anyone done this?

The tensorflow pipeline works with any language that is whitespace-tokenized. As that is not the case with Japanese, you have to build your own tokenizer.
You can do so by extending the classes Tokenizer and Component, e.g.:
class MecabTokenizer(Tokenizer, Component):
    # fill with your code
    pass
You can then use your custom class in your NLU pipeline by specifying the module path in the name (also described in the docs), e.g.:
pipeline:
- name: "path.to.MecabTokenizer"
# other components
Somebody tried something similar here; maybe you can use it or take it as some sort of template.
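The core of such a custom tokenizer can be sketched in isolation. This is a minimal sketch, not the linked implementation: the `Token` class below is a hypothetical stand-in for Rasa NLU's own token type (text plus character offset), and `align_tokens` and `mecab_surfaces` are illustrative names. It assumes the `mecab-python3` package, whose `MeCab.Tagger("-Owakati")` returns the surface forms separated by spaces. How the result is wired into the Tokenizer/Component hooks depends on the Rasa version and is left out.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for Rasa NLU's token type: text plus character offset."""
    text: str
    offset: int


def align_tokens(text: str, surfaces: List[str]) -> List[Token]:
    """Map tokenizer surface forms back to character offsets in the original text."""
    tokens = []
    cursor = 0
    for surface in surfaces:
        start = text.find(surface, cursor)
        if start < 0:
            # Surface form not found after the cursor; skip it rather than guess.
            continue
        tokens.append(Token(surface, start))
        cursor = start + len(surface)
    return tokens


def mecab_surfaces(text: str) -> List[str]:
    """Split Japanese text with MeCab (assumes mecab-python3 and a dictionary are installed)."""
    import MeCab  # imported lazily so the alignment logic works without MeCab

    tagger = MeCab.Tagger("-Owakati")
    return tagger.parse(text).split()
```

Inside the custom component, the tokenize step would then amount to something like `align_tokens(text, mecab_surfaces(text))`, converted to Rasa's own token objects.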

I have added a custom component using the MeCab tokenizer. It works fine for me for Japanese text.
Link: Rasa_Japanese

Related

Visualizing keywords from text using spaCy

I am using the TextRank method to extract keywords from text, and I am able to print individual keywords along with their scores. Now I am trying to output the whole text with the previously extracted keywords highlighted (encircled, etc.).
I'm not sure who your target audience is, but I think the simplest solution might be to programmatically generate hypertext (HTML) in which, for example, the keywords are given a foreground/background color of your choice. In fact, I can see this being quite useful.
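A minimal sketch of that idea, using only the Python standard library. The function name `highlight_keywords` and the default color are illustrative choices, and matching here is plain case-insensitive substring matching, which may or may not be what your keyword extractor expects.

```python
import html
import re
from typing import Iterable


def highlight_keywords(text: str, keywords: Iterable[str], color: str = "#ffe08a") -> str:
    """Return an HTML fragment of `text` with each keyword wrapped in a colored <mark>."""
    # Longest keywords first, so multi-word keywords win over their parts.
    ordered = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not ordered:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then wrap the match itself.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            f'<mark style="background-color: {color}">{html.escape(match.group(0))}</mark>'
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Writing the returned string into a file with a surrounding `<html><body>` wrapper gives a page that can be opened in any browser.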
spaCy has visualization tools, but I believe they are targeted at providing specific NLP visualizations. I don't think they offer what you want, which seems to be a canvas for presenting information.
Oh! If you want to hack a solution, you can try this:
Create a custom entity type in spaCy and have spaCy report your keywords as that new entity type. Then you can use the spaCy entity visualizer to highlight your entities.
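One lightweight variation on this hack, sketched here under assumptions, skips the custom entity type and instead feeds keyword spans to displaCy's manual mode, which accepts a dictionary with `text`, `ents` and `title` keys. The helper name `keywords_to_displacy` and the label `KEYWORD` are made up for illustration, and matching is exact, case-sensitive substring matching.

```python
import re
from typing import Dict, Iterable, List


def keywords_to_displacy(text: str, keywords: Iterable[str], label: str = "KEYWORD") -> Dict:
    """Build the dictionary that displaCy's manual entity mode expects."""
    # Longest keywords first, so multi-word keywords win over their parts.
    ordered = sorted({k for k in keywords if k}, key=len, reverse=True)
    ents: List[Dict] = []
    if ordered:
        pattern = re.compile("|".join(re.escape(k) for k in ordered))
        # finditer yields non-overlapping matches in text order.
        ents = [
            {"start": m.start(), "end": m.end(), "label": label}
            for m in pattern.finditer(text)
        ]
    return {"text": text, "ents": ents, "title": None}
```

With spaCy installed, the result can be passed to `spacy.displacy.render(data, style="ent", manual=True)` to get the highlighted markup.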

Named entity recognition - tagging tools

Does anyone have a recommendation for a tool for tagging NER types in raw text?
The input for the tool should be a library of text files (simple .txt format). There should be a convenient UI for selecting words and setting the tag/annotation for the selection. The output should be a structured representation of the tags (e.g. start index, last index, and tag, in JSON format).
Founder of LightTag here.
We provide a super convenient interface to do span annotations such as named entity recognition, classifications and relationships.
You can work as a single labeler or bring in a team, and LightTag will distribute work between everyone automatically (no more selecting files and remembering what you have already labeled).
You can upload your own suggestions and let labelers use those, or use LightTag's built-in model.
Of course, you can annotate at the character level and highlight subwords or multi-word phrases.
You can try https://github.com/lasigeBioTM/MER (bash)
See the demo at http://labs.fc.ul.pt/mer/
Online tools:
I guess Dataturks' POS tool should work fine for your use case: you can just upload your data and specify the labels. The UI seems convenient enough.
Here is the link:
https://dataturks.com
It's an online tool, so you can work with multiple people to get the tagging done.
The exact output format you are looking for is not supported, but the output can easily be converted to it. The output looks like: word___LABEL word2___LABEL, so a simple 2-line script can convert it to start and end indices.
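Such a conversion could look roughly like the sketch below. It assumes only what is stated above, namely whitespace-separated items of the form word___LABEL; the function name `tagged_to_spans` is illustrative. Items without the separator are treated as untagged words. Offsets index into the text rebuilt by joining the words with single spaces, and `end` is exclusive.

```python
import json
from typing import Dict, List, Tuple


def tagged_to_spans(tagged: str, sep: str = "___") -> Tuple[str, List[Dict]]:
    """Convert 'word___LABEL word2___LABEL' into plain text plus start/end/tag spans."""
    words = []
    spans = []
    pos = 0
    for item in tagged.split():
        if sep in item:
            word, label = item.rsplit(sep, 1)
        else:
            word, label = item, None  # untagged word (assumption)
        start, end = pos, pos + len(word)
        if label is not None:
            spans.append({"start": start, "end": end, "tag": label})
        words.append(word)
        pos = end + 1  # account for the single joining space
    return " ".join(words), spans


if __name__ == "__main__":
    text, spans = tagged_to_spans("John___PER lives in Paris___LOC")
    print(json.dumps({"text": text, "tags": spans}, indent=2))
```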
Offline:
Another tool you can check out is Prodigy. It is downloadable software that does similar things, but you have to be willing to pay for it upfront.
https://prodi.gy

Difference between tag and class in Stanford NER

I'm not sure what the difference between tag and class is.
NERFeatureFactory mentions:
t - tag
c - class
The NER FAQ seems to use the two terms interchangeably as well.
For example, what does the following feature do?
t,c useTags
Many thanks in advance!
As Gabor said, tag is the part-of-speech tag.
However, for c (or class): the features are defined over each class (so John-NAME and John-PLACE are two different features).
A useTags feature would be something like NNP-NAME, RB-PLACE, and so on.
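To make the naming above concrete, here is a purely illustrative sketch of how an observation (a word or a part-of-speech tag) is paired with a class to form a feature string. It mirrors the explanation in this answer only. It is not Stanford NER's actual code, and the function names are made up.

```python
from typing import List


def conjoin(observation: str, label: str) -> str:
    """Pair an observation with a class, e.g. 'John' + 'NAME' -> 'John-NAME'."""
    return f"{observation}-{label}"


def tag_class_features(word: str, pos_tag: str, label: str, use_tags: bool = True) -> List[str]:
    """List the illustrative features for one token under one candidate class."""
    features = [conjoin(word, label)]  # word feature, one per class
    if use_tags:
        features.append(conjoin(pos_tag, label))  # POS-tag feature, one per class
    return features
```

Under this reading, the same word and tag produce different features for each candidate class, which is why John-NAME and John-PLACE are distinct.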
My informed guess is that neither tag nor class actually refers to the NER tag; rather, "tag" is the part-of-speech tag, and class is the word class (e.g., Brown cluster).

I want to use The GATE Predicate-Argument EXtractor Component (PAX)

I want to use the GATE Predicate-Argument EXtractor Component (PAX), but I can't figure out how to load the plugin in GATE Developer (version 7).
Please help me.
How do I load it?
The MultiPaX plugin is a bit of a complex beast. You need to download the package from the homepage, unpack it (use something like 7-zip if you're on Windows) and then build it using Ant, which you will need to install separately - GATE used to bundle a copy of Ant but that is no longer the case since version 7. Once you have the plugin compiled and packaged you should be able to load it through the plugin manager the same as any other plugin.
However, to get anything useful out of the PR you need to build quite a complex pipeline including at least one of SUPPLE, MiniPar or the Stanford Parser PR. Getting these to work is itself a non-trivial task...
The predicate argument extractor is not developed by the core GATE team, so specific questions are probably better posted in the semanticsoftware.info forum (linked from the bottom of http://www.semanticsoftware.info/pax) rather than the usual GATE user list.

Specifying attribute types in Papyrus

I recently installed Papyrus and attempted to follow the tutorial for creating a model and generating Java code from it.
Unfortunately, I'm pretty much stuck right at the beginning. In the class definition, I added a static operation for main, and I'm able to specify its argument as an array:
in args: <Undefined> [*]
Unfortunately, it does not allow me to specify the type to be String. That is, I would like to specify it as follows:
in args: String [*]
No matter how I enter the type (or any type for that matter), the type reverts back to <Undefined>. I suspect something is wrong or missing with a profile it is supposed to use. Unfortunately, the documentation for this tool is rather sparse, and I cannot find an answer or solution to this.
Has anyone else run into this problem?
Have you imported the UML Primitive types package? This should allow you to define the type of the argument to Integer, String or Boolean.
In the model explorer, right-click on your model, then choose 'Import package from registered library'.
In the window that pops up pick 'UMLPrimitiveTypes' (or the Java ones, if you prefer that).
After this, it should work.
In the latest version of Papyrus, you have to go to the model explorer, right-click on your model and then choose Import. After that, select "Import Registered Package". That will bring up a dialog with a list of registered packages, from which you can select either "UMLPrimitiveTypes" or "JavaPrimitiveTypes".
Papyrus considers modeling to be independent of programming languages, so Java types are usually not available. This is fully logical in an MDA approach, but not in real life :-)
There is an Acceleo plugin which is supposed to generate code from a diagram, but it doesn't work with the latest Helios build.
The best approach is to draw your class diagram and then write the corresponding code manually. I am sure your code will be better than what you would get from Acceleo :-)
It's very simple: open the attribute's properties and select the type option. In the menu bar, select Tree, then Primitive Types.
The String class is not visible among the Java primitive types; you need to import a package from the Java core library.
There is a Java profile and library/package in the Papyrus Software Designer extension.
You can install it via the Marketplace.
More details: https://wiki.eclipse.org/Java_Code_Generation
