I'm trying to use enviroments in my dialogflow agent. And I faced with a restriction that make it imposible to get training phases from intent when using listIntent with enviroment path.
Why do I need training phrases? I'm trying to get a list of intents that I want to use as faqs. Also I need to have names of these faqs on two languages. I found only one way to store these names - in training phrases. When I started using enviroments the INTENT_VIEW_FULL option does not work and I can't form faq list anymore.
May be there is another possible solution to store faq (intent) names in two languages? Or there is a way to get training phrases when using enviroments?
Related
I'm new to NLP. I am looking for recommendations for an Annotation tool to create a labeled NER dataset from raw texts.
In details:
I'm trying to create a labeled data set for specific types of Entities in order to develop my own NER project (rule based at first).
I assumed there will be some friendly frameworks that allows create tagging projects, tag text data, create a labeled dataset, and even share projects so several people could work on the same project, but I'm struggling to find one (I admit "friendly" or "intuitive" are subjective, yet this is my experience).
So far I've tried several Frameworks:
I tried LightTag. It makes the tagging itself fast and easy (i.e. marking the words and giving them labels) but the entire process of creating a useful dataset is not as intuitive as I expected (i.e. uploading the text files, split to different tagging objects, save the tags, etc.)
I've installed and tried LabelStudio and found it less mature then LightTag (don't mean to judge here :))
I've also read about spaCy's Prodigy, which offers a paid annotation tool. I would consider purchasing it, but their website only offers a live demo of the the tagging phase and I can't access if their product is superior to the other two products above.
Even in StackOverflow the latest question I found on that matter is over 5 years ago.
Do you have any recommendation for a tool to create a labeled NER dataset from raw text?
⚠️ Disclaimer
I am the author of Acharya. I would limit my answers to the points raised in the question.
Based on your question, Acharya would help you in creating the project and upload your raw text data and annotate them to create a labeled dataset.
It would allow you to mark records individually for train or test in the dataset and would give data-centric reports to identify and fix annotation/labeling errors.
It allows you to add different algorithms (bring your own algorithm) to the project and train the model regularly. Once trained, it can give annotation suggestions from the trained models on untagged data to make the labeling process faster.
If you want to train in a different setup, it allows you to export the labeled dataset in multiple supported formats.
Currently, it does not support sharing of projects.
Acharya community edition is in alpha release.
github page (https://github.com/astutic/Acharya)
website (https://acharya.astutic.com/)
Doccano is another open-source annotation tool that you can check out https://github.com/doccano/doccano
I have used both DOCCANO (https://github.com/doccano/doccano) and BRAT (https://brat.nlplab.org/).
Find the latter very good and it supports more functions. Both are free to use.
I am using two different entity extraction methods (https://rasa.com/docs/nlu/entities/) while building my NLP model in the RASA framework to build a chatbot.
The bot should handle different questions which have custom entities as well as some general ones like location or organisation.
So I use both components ner_spacy and ner_crf to create the model. After that I build a small helper script in python to evaluate the model performance. There I noticed that the model struggles to choose the correct enity.
For example for a word 'X' it choosed the pre-defined enity 'ORG' from SpaCy, but it should be recogniced as a custom enity which I defined in the training data.
If I just use the ner_crf extractor I face huge problems in identifing location enities like capitals. Also one of my biggest problems are single answer enities.
Q : "What´s your favourite animal?"
A : Dog
My model is not able to extract this single entity 'animal' for this single answer. If I answer this question with two words like 'The Dog', the model has no problems to extract the animal entity with the value 'Dog'.
So my question is, is it clever to use two different components to extract entities? One for custom enities and the other one for pre-defined enities.
If I use two methods, what´s the mechanism in the model which extractor is used?
By the way, currently I´m just testing things out, so my training samples are not that huge it should be (less then 100 examples). Could the problem been solved if I have much more training examples?
You are facing 2 problems here. I am suggesting few ways that i found helpful.
1. Custom entity recognition:
To solve this you need to add more training sentences with all possible lengths of entities. ner_crf is going to predict better when there are identifiable markers around entities (e.g. prepositions)
2. Extracting entities from single word answer :
As a workaround, i suggest you to do below manipulations on client end.
When you are sending question like What´s your favorite animal?, append a marker to question to indicate to client that a single answer is expected. e.g.
You can send ##SINGLE## What´s your favorite animal? to client.
Client can remove the ##SINGLE## from question and show it to user. But when client sends user's response to server, it doesn't send Dog, it send something like User responded with single answer as Dog
You can train your model to extract entities from such an answer.
Both DialogFlow and Google Cloud NL (Natural Language) are under Google, and to me they are very similar. Does anyone know any specific on their differences and whether Google will consolidate into one product? If I am a new developer to use the features, which one I should pick?
I search around and cannot find any satisfactory answers.
Thanks!
While they are vaguely similar, since they both take text inputs, the results from each are somewhat different.
By default, GCNL doesn't require you to provide any training phrases at all. It takes any sorts of textual input and lets you do things such as sentiment analysis, parts of speech analysis, and sentence structure analysis on the phrase.
If you are expecting very free-form inputs, then GCNL is very appropriate for what you want.
On the other hand, Dialogflow requires that you provide training phrases that are associated with each Intent and possible parameters for some of the words in those phrases. It then tries to take the input and determine which Intent matches that input and how the parameters match.
If you have a more narrow set of commands, and just want a way to more flexibly have people issue those commands in a conversation, Dialogflow is more appropriate.
It is unlikely the two will ever be merged. Dialogflow is well tuned to make conversational interfaces easier to develop, while GCNL is more open-ended, and thus more complex.
I am making a chat bot to answer questions on a particular subject(example, physics). How would you structure all the possible questions as intent in dialogflow?
I am considering the following 2 methods,
Methods:
make each question as an unique intent.
group all the questions into one "asking questions" intent and use entity to identify the specific question being asked.
Pros:
Dialogflow can easily match users input to the specific questions using low confidence score threshold, and can give multiple training phrases per question.
Only need one "asking questions" intent, neater and maintaining it is easier.
Cons:
There will be tons of intents, and maintaining it might be a nightmare. Might also reach the max number of intents.
Detecting entity might be more strict and less robust.
I would suggest you to try Knowledge Base feature of DialogFlow.
You can give multiple web-page links from where it can gather all the questions, or you can manually prepare a list and upload it to DialogFlow.
That way you don't need to make it in separate intents, it will try to match it automatically.
Let me know if you have any confusion.
This looks like an FAQ type chatbot. You can develop the chatbot in 2 ways:
Use Prebuilt Agents - Go to prebuilt agent and select and import FAQ and add your intents.
Use Knowledge Base approach - This is in Beta mode right now, but super easy to build.
a. You need to enable Beta Features from the agent settings
b. Go to Knowledge Base on the left menu, create a new document and upload CSV file (Q and A). You can also provide a link for Q/A if you have.
Check out the documentation for more details.
Knowledge Base seems to be the best way, but it only supports English content
Is it possible to use DialogFlow to simply parse some text and return the entities within that text?
I'm not interested in a conversation or bot-like behaviour, simply text in and list of entities out.
The entity recognition seems to be better with DialogFlow than Google Natural Language Processing and the ability to train might be useful also.
Cheers.
I've never considered this... but yeah, it should be possible. You would upload the entities with synonyms. Then, remove the "Default Fallback Intent", and make a new intent, called "catchall". Procedurally generate sentences with examples of every entity being mentioned, alone or in combination (in whatever way you expect to need to extract them). In "Settings", change the "ML Settings" so the "ML Classification Threshold" is 0.
In theory, it should now classify every input as "catchall", and return all the entities it finds...
If you play around with tagging things as sys.any, this could be pretty effective...
However, you may want to look into something that is built for this. I have made cool stuff with Aylien's NLP API. They have entity extraction, and the free tier gives you 1,000 hits per day.
EDIT: If you can run some code, instead of relying on SaaS, you could check out Rasa NLU for Entity Extraction. With a SpaCy backend it would do well on recognizing pre-trained entities, and with a different backend you can use custom entities.