Is it possible to use DialogFlow to simply parse some text and return the entities within that text?
I'm not interested in a conversation or bot-like behaviour, simply text in and list of entities out.
The entity recognition seems to be better with DialogFlow than Google Natural Language Processing and the ability to train might be useful also.
Cheers.
I've never considered this... but yeah, it should be possible. You would upload the entities with synonyms. Then, remove the "Default Fallback Intent", and make a new intent, called "catchall". Procedurally generate sentences with examples of every entity being mentioned, alone or in combination (in whatever way you expect to need to extract them). In "Settings", change the "ML Settings" so the "ML Classification Threshold" is 0.
In theory, it should now classify every input as "catchall", and return all the entities it finds...
If you play around with tagging things as sys.any, this could be pretty effective...
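For illustration, here is a minimal sketch of the "text in, entities out" call using the current google-cloud-dialogflow Python client; the project ID, session ID, and the "catchall" agent set up as described above are assumptions on my side:

from google.cloud import dialogflow

def extract_entities(project_id, session_id, text):
    # Send raw text to the agent; with the classification threshold at 0,
    # everything should match the "catchall" intent described above.
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)
    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=text, language_code="en")
    )
    response = client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    # Any entities the agent recognised come back as intent parameters.
    return dict(response.query_result.parameters)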
However, you may want to look into something that is built for this. I have made cool stuff with Aylien's NLP API. They have entity extraction, and the free tier gives you 1,000 hits per day.
EDIT: If you can run some code, instead of relying on SaaS, you could check out Rasa NLU for Entity Extraction. With a SpaCy backend it would do well on recognizing pre-trained entities, and with a different backend you can use custom entities.
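To give a rough idea of what the SpaCy backend brings for pre-trained entities, here is a minimal stand-alone sketch (assuming the en_core_web_sm model is installed; the sample sentence is mine):

import spacy

# Load a small pre-trained English pipeline and print the built-in entities
# (PERSON, ORG, GPE, ...) it finds in a sentence.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Google opened a new office in Zurich last March.")
for ent in doc.ents:
    print(ent.text, ent.label_)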
I'm trying to ascertain the "right tool for the job" here, and I believe Cognitive Services can do this but without disappearing down an R&D rabbit-hole I thought I'd make sure I was tunnelling in the right direction first.
So, here is the brief:
I have a collection of known existing phrases which I want to look for, but these might be written in slightly different ways, be that grammar or language.
I want to be able to parse a (potentially large) volume of text to scan and look for those phrases so that I can identify them.
For example, my phrase could be "the event will be in person" but that also needs to identify different uses of language; for example "in-person event", "face to face event", or "on-site event" - as well as the various synonyms and variations you can get with such things.
LUIS initially appeared to be the go-to tool for this kind of thing, and includes the ability to write your own Features (aka Phrase Lists) to augment the model, but it isn't clear whether that would hit the brief - LUIS appears to be much more about "intent" and user interaction (for example building a chat Bot, or understanding intent from emails).
Text Analytics also seems a likely candidate, but again seems more focused on identifying "entities" (such as people / places / organisations) rather than a natural language "phrase" - would this tool work if I were defining my own "Topics", or is that really just barking up the wrong tree?
... or is there actually something else, completely different, that I should be looking at?
At this point - I'm really looking for a "which tool should I spend lots of time learning about".
Thanks all in advance - I appreciate this is a fairly open-ended requirement.
It seems your scenario aligns more with our Text Analytics service. I was going to recommend the Key Phrase Extraction API, which evaluates unstructured text and returns a list of key phrases. However, since you need to use a known (custom) phrase list, it may not be the solution you're looking for. We don't currently support custom key phrase extraction, but it's on our roadmap. If interested, we can connect you with the product team to learn more about your scenario.
Updated:
Please try the custom NER capability.
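For reference, calling the Key Phrase Extraction API from the current azure-ai-textanalytics Python SDK looks roughly like this; the endpoint and key are placeholders, and the SDK version may differ from what was available when this answer was written:

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Placeholders: use your own Language / Text Analytics resource endpoint and key.
client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)
documents = ["The event will be in person at the main campus next Friday."]
for doc in client.extract_key_phrases(documents):
    if not doc.is_error:
        print(doc.key_phrases)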
I am currently developing a chatbot that recommends theater plays to the user.
It is working pretty well, but now I want to enable the user to get recommendations based on the type of theater plays (like funny, dramatic, sad).
However, I do not know exactly how the user will phrase the request, and synonyms might be used (funny: witty, humorous, ...).
What is a good solution to get these types from the user's request in a normalized way?
Typically I would use the List entity, but then I have to insert all synonyms for each possible value myself. Is there a way to define my "normalized" values so that synonyms are automatically matched by LUIS (and improved by further training of the model)?
I am using two different entity extraction methods (https://rasa.com/docs/nlu/entities/) while building my NLP model in the RASA framework for a chatbot.
The bot should handle different questions which have custom entities as well as some general ones like location or organisation.
So I use both components, ner_spacy and ner_crf, to create the model. After that I built a small helper script in Python to evaluate the model's performance. There I noticed that the model struggles to choose the correct entity.
For example, for a word 'X' it chose the pre-defined entity 'ORG' from SpaCy, but it should be recognized as a custom entity which I defined in the training data.
If I just use the ner_crf extractor I face huge problems identifying location entities like capitals. Also, one of my biggest problems is single-answer entities.
Q : "What´s your favourite animal?"
A : Dog
My model is not able to extract the single entity 'animal' from this one-word answer. If I answer the question with two words like 'The Dog', the model has no problem extracting the animal entity with the value 'Dog'.
So my question is: is it sensible to use two different components to extract entities, one for custom entities and the other for pre-defined entities?
If I use two methods, what mechanism in the model decides which extractor is used?
By the way, I'm currently just testing things out, so my training set is not as large as it should be (fewer than 100 examples). Could the problem be solved if I had many more training examples?
You are facing 2 problems here. I am suggesting a few ways that I found helpful.
1. Custom entity recognition:
To solve this you need to add more training sentences with all possible lengths of entities. ner_crf is going to predict better when there are identifiable markers around entities (e.g. prepositions). A sample annotated training example is sketched after this list.
2. Extracting entities from single word answer :
As a workaround, I suggest you do the following manipulations on the client end (a sketch follows after this list).
When you are sending a question like "What's your favourite animal?", append a marker to the question to indicate to the client that a single answer is expected. E.g.
You can send "##SINGLE## What's your favourite animal?" to the client.
The client can remove the ##SINGLE## from the question and show it to the user. But when the client sends the user's response to the server, it doesn't send "Dog"; it sends something like "User responded with single answer as Dog".
You can train your model to extract entities from such an answer.
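On point 1, a single annotated training example might look like this in the (older) Rasa NLU JSON format; the intent name and sentence are just illustrations:

import json

# One annotated example in the rasa_nlu_data JSON format; "start"/"end" are
# character offsets into "text" (end exclusive) marking the entity span.
example = {
    "text": "my favourite animal is the dog",
    "intent": "favourite_animal",
    "entities": [{"start": 27, "end": 30, "value": "dog", "entity": "animal"}],
}
training_data = {"rasa_nlu_data": {"common_examples": [example]}}
print(json.dumps(training_data, indent=2))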
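On point 2, a minimal client-side sketch of the marker handling could look like this; the marker string and wording are the ones suggested above, while the function names are mine:

SINGLE_MARKER = "##SINGLE##"

def question_for_user(question: str) -> str:
    # Strip the marker before displaying the question to the user.
    return question.replace(SINGLE_MARKER, "").strip()

def reply_for_server(question: str, user_reply: str) -> str:
    # For single-answer questions, wrap the bare reply in a full sentence
    # so ner_crf has surrounding context to latch onto.
    if SINGLE_MARKER in question:
        return f"User responded with single answer as {user_reply}"
    return user_reply

# Example: "Dog" becomes "User responded with single answer as Dog".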
Both DialogFlow and Google Cloud NL (Natural Language) are under Google, and to me they are very similar. Does anyone know any specifics on their differences and whether Google will consolidate them into one product? If I am a new developer wanting to use these features, which one should I pick?
I searched around and cannot find any satisfactory answers.
Thanks!
While they are vaguely similar, since they both take text inputs, the results from each are somewhat different.
By default, GCNL doesn't require you to provide any training phrases at all. It takes any sort of textual input and lets you do things such as sentiment analysis, part-of-speech analysis, and sentence structure analysis on the phrase.
If you are expecting very free-form inputs, then GCNL is very appropriate for what you want.
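As a rough illustration of that free-form use, here is a minimal sketch with the google-cloud-language Python client; the sample sentence is mine:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The play at the Royal Theatre was surprisingly funny.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
# Entity analysis: no training phrases needed, GCNL returns what it finds.
for entity in client.analyze_entities(request={"document": document}).entities:
    print(entity.name, language_v1.Entity.Type(entity.type_).name)
# Sentiment analysis on the same text.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print("sentiment score:", sentiment.score)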
On the other hand, Dialogflow requires that you provide training phrases that are associated with each Intent and possible parameters for some of the words in those phrases. It then tries to take the input and determine which Intent matches that input and how the parameters match.
If you have a more narrow set of commands, and just want a way to more flexibly have people issue those commands in a conversation, Dialogflow is more appropriate.
It is unlikely the two will ever be merged. Dialogflow is well tuned to make conversational interfaces easier to develop, while GCNL is more open-ended, and thus more complex.
Is it possible to evaluate how well my model extracts entities (and maps synonym values) in Rasa NLU?
I have tried the rasa_nlu -evaluate mode; however, it seems to only work for intent classification, although my JSON data file contains entity information, and I'd really like to know whether my entity extraction is up to the mark in various scenarios. I've used Tracy to generate the test dataset.
Actually yes - you should get the score for your entities as well.
Are you sure you added some to your training data?
Do you have an NER component in your pipeline that extracts them? Something like this:
pipeline:
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
batch_size: 64
epochs: 1500
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "ner_crf"
ner_crf is a conditional random field component used for named entity recognition.
To make sure you follow the model building correctly have a look at this tutorial:
https://hackernoon.com/build-simple-chatbot-with-rasa-part-1-f4c6d5bb1aea
As the documentation says (https://rasa.com/docs/nlu/0.12.0/evaluation/), if you are using either ner_crf or ner_duckling, the evaluation method automatically takes entity extraction performance into account. If you only use ner_synonyms, the evaluate method won't compute an output table.
Other possible pitfalls could be:
If you parse a single sentence including a desired entity, does your trained model extract it? If not, that could be a clue that your model was not able to learn a pattern for recognizing entities (see the sketch below).
Another problem could be that, after randomly splitting the data into train and test sets, there is no entity left in your test set to extract. Your algorithm could have learned the pattern but is never forced to apply it. Did you check whether your test set contains entities?
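A quick way to run that single-sentence check, assuming the legacy rasa_nlu 0.x package used in this thread (the model path is a placeholder):

from rasa_nlu.model import Interpreter

# Load a previously trained model and parse one sentence that contains
# an entity the model was trained on.
interpreter = Interpreter.load("./models/default/model_20190101-120000")
result = interpreter.parse("My favourite animal is the dog")
print(result["entities"])  # an empty list here hints the pattern was never learned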
If I understand right, perhaps you are interested in something like https://github.com/RasaHQ/rasa_nlu/issues/1472? So, this issue was written because for intents you could get overall score and you could see how each intent was classified, but you could only get the overall score for entities and not how each entity was classified.
So in short, this is still an open issue and not possible in Rasa. However, it was an issue I was asked to look at just yesterday, so I will let you know if I make any progress on it.