I am developing an app that uses wit.ai as a service. Right now, I am having problems training it. In my app I have 3 intents:
to call
to text
to send picture
Here are my training examples:
Call this number 072839485 and text this number 0623744758 and send picture to this number 0834952849.
Call this number 072839485, 0834952849 and 0623744758
In my first training example I labeled the sentence with all 3 intents, and 072839485 as phone_number with the role to_call_phone_number, 0623744758 as phone_number with the role to_text_phone_number, and 0834952849 as phone_number with the role to_send_pic_phone_number.
In my second training example I labeled all three numbers as phone_number with the to_call_phone_number role.
After many training examples, wit still outputs the wrong labels. Given a sentence like this:
Call this number 072637464, 07263485 and 0273847584
Wit says 072637464 is to_call_phone_number, but 07263485 and 0273847584 are to_send_pic_phone_number.
Am I not training it correctly? Can someone give me some suggestions about best practices for training wit?
There aren't many best practices out there for wit.ai training at the moment, but with this particular example in mind I would recommend the following:
Pay attention to the type of entity in addition to just the value. If you choose free-text or keyword, you'll get different responses from the wit engine. For example: in your training if the number is a keyword, it'll associate the particular number with the intent/role rather than the position. This is probably the reason your training isn't working correctly.
One good practice would be to train your bot with specific examples first, which provide the bot with more information (such as the user providing the keyword 'photograph' along with a number), and then general examples which apply to more cases (such as your second example).
Think about the user's perspective and what would seem natural to them. Work with those training examples first. Generate a list of possible training examples, labelling them from general to specific, and then train intents/roles/entities based on those examples, rather than deciding on intents and roles first.
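If you are inspecting results programmatically, it can also help to look at the confidence wit returns per entity and ignore low-confidence role assignments. A minimal sketch in Python; the response shape below is an assumption modeled on wit.ai's "entity_name:role" keying, so check it against your actual /message output:

```python
# Hypothetical sketch: grouping wit.ai entities by role after a /message call.
# The response structure here is an assumption; verify against your own API output.
sample_response = {
    "text": "Call this number 072637464, 07263485 and 0273847584",
    "entities": {
        "phone_number:to_call_phone_number": [
            {"value": "072637464", "confidence": 0.91},
            {"value": "07263485", "confidence": 0.55},
        ],
    },
}

def numbers_by_role(response, min_confidence=0.6):
    """Return {role: [values]}, keeping only confident matches."""
    out = {}
    for key, matches in response["entities"].items():
        role = key.split(":", 1)[-1]  # "entity_name:role" -> "role"
        for m in matches:
            if m["confidence"] >= min_confidence:
                out.setdefault(role, []).append(m["value"])
    return out
```

Filtering like this won't fix bad training, but it keeps uncertain role guesses from reaching the user.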
Here is the scenario: I want to create a contextual chatbot, which means the bot will answer or reply based on context. As an example:
Input: {"text": "it was really nice", "topic": "movie"}
Output: {"text": "indeed, it was an awesome movie", "topic": "movie"}
When I only have to consider one thing about the input, the sentence itself, I can do it: all I need to do is tokenize the sentences and feed them into the input of the LSTM. But how can I take "topic" into account?
I have already prepared a dataset, in such a format.
I am using Keras to build such a bot.
I am not really sure what you want to build.
The first thing that comes to mind is a normal generative LSTM like this one:
https://keras.rstudio.com/articles/examples/lstm_text_generation.html
which generates text based on Nietzsche's works.
To use such a network you would need your training data in a question/answer format.
And you would need to set your question as the seed.
You do not need to load the topic separately; the idea of a neural net is that it learns on its own to understand the data.
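That said, if you do want the model to condition on the topic explicitly, one common trick is to fold it into the input sequence as a leading pseudo-token before tokenization, so a single-input LSTM still sees it. A minimal sketch (the token format is purely illustrative):

```python
def make_training_pair(record):
    """Fold the topic into the text as a leading pseudo-token so a
    single-input LSTM can condition on it. The "<topic:...>" format
    is an invented convention, not a Keras requirement."""
    return "<topic:%s> %s" % (record["topic"], record["text"])

pair = make_training_pair({"text": "it was really nice", "topic": "movie"})
```

After this transformation, tokenize and feed the strings exactly as in the linked text-generation example; the topic token becomes just another symbol in the vocabulary.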
When we have to focus on only one domain (say, weather) and we use an LSTM model to identify sub-intents within weather using a softmax classifier (which picks the sub-intent with the highest score), what is the way to handle non-weather queries, for which we want to say we don't have an answer? The problem is that there are too many outside domains, and I don't know if it is feasible to generate data for all of them.
There is no really good way to do this.
In practice these are common approaches:
Build a class of examples of stuff you want to ignore. For a chatbot this might be greetings ("hello", "hi!", "how are you") or obscenities.
Create a confidence threshold and give an uncertain reply if all intents are below the threshold.
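The threshold approach can be sketched in a few lines (the 0.7 cutoff and the intent names are illustrative values you would tune on held-out data):

```python
def pick_intent(scores, threshold=0.7, fallback="out_of_scope"):
    """scores: {intent: softmax probability}. Return the best-scoring
    intent, or a fallback when the classifier is not confident enough."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else fallback

# A confident prediction passes through; a flat distribution falls back.
pick_intent({"weather_today": 0.9, "weather_week": 0.1})    # "weather_today"
pick_intent({"weather_today": 0.4, "weather_week": 0.35})   # "out_of_scope"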
I'm investigating various NLP algorithms and tools to solve the following problem. NLP newbie here, so pardon my question if it's too basic.
Let's say I have a messaging app where users can send text messages to one or more people. When the user types a message, I want the app to suggest who the potential recipients of the message are.
If user "A" sends a lot of text messages regarding "cats" to user "B" and some messages to user "C" and sends a lot of messages regarding "politics" to user "D", then next time user types the message about "cats" then the app should suggest "B" and "C" instead of "D".
So I'm doing some research on topic modeling and word embeddings and see that LDA and Word2Vec are the 2 probable algorithms I can use.
Wanted to pick your brain on which one you think is more suitable for this scenario.
One idea I have is to extract topics from the previous messages using LDA and rank the recipients of those messages based on the number of times each topic has been discussed with them (i.e., in messages sent) in the past. If I have this mapping from topic to a list of users you talk to about it (ranked by frequency), then when the user types a message, I can again run topic extraction on it, predict what the message is about, and then look up the mapping to see who the possible recipients are and show them to the user.
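The mapping-and-ranking idea described above can be sketched in plain Python (assuming topic extraction has already produced a topic label per past message; the data is invented for illustration):

```python
from collections import Counter, defaultdict

def build_topic_map(history):
    """history: iterable of (topic, recipient) pairs from past messages.
    Returns {topic: Counter({recipient: count})}."""
    topic_map = defaultdict(Counter)
    for topic, recipient in history:
        topic_map[topic][recipient] += 1
    return topic_map

def suggest_recipients(topic_map, topic, k=3):
    """Recipients ranked by how often the topic was discussed with them."""
    return [user for user, _ in topic_map[topic].most_common(k)]

history = [("cats", "B"), ("cats", "B"), ("cats", "C"),
           ("politics", "D"), ("politics", "D")]
tm = build_topic_map(history)
suggest_recipients(tm, "cats")  # ["B", "C"]
```

In a real system the `topic` label would come from running the trained LDA model on the new message and taking its dominant topic.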
Is this a good approach? Or is Word2Vec (or doc2vec or lda2vec) better suited for this problem, where we can find similar messages using vector representations of words, aka word embeddings? Do we really need to extract topics from the messages to predict the recipients, or is that unnecessary here? Any other algorithms or techniques you think would work best?
What are your thoughts and suggestions?
Thanks for the help.
Since you are purely looking at topic extraction from previous posts, in my opinion LDA would be the better choice. LDA describes the statistical relationships of word occurrences; the semantics of the words are mostly ignored (if you are looking for that, you might want to rethink). I would also suggest having a look at a hybrid approach. I have not tried it myself, but it looks quite interesting:
lda2vec, a new hybrid approach
Also, if you happen to try it out, would love to know your findings.
I think you're looking for recommender systems (Netflix movie suggestions, Amazon purchase recommendations, etc.) or network analysis (Facebook friend recommendations), which can utilize topic modeling as an attribute. I'll try to break them down:
Network Analysis:
FB friends are nodes of a network whose edges are friendship relationships. Network analysis computes measures such as betweenness centrality (how often a node lies on the shortest paths between other nodes) and closeness centrality (based on the sum of shortest-path distances from a node to all others).
Recommender Systems:
Recommends what is popular, looks at similar users, and suggests things the user might be interested in; similarity is often computed with cosine similarity, which measures the angle between vectors (vectors pointing in the same direction are most similar).
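For reference, cosine similarity is just the cosine of the angle between two vectors; a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors:
    dot(a, b) / (||a|| * ||b||). 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # 1.0 (identical direction)
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0 (orthogonal)
```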
LDA:
A topic modeler for text data; it returns topics of interest and might be used as a nested algorithm within the algorithms above.
Word2Vec:
Word2Vec is a neural net that learns dense vector representations (embeddings) of words; you would typically use it as a pre-processing step. Note that the word-to-id mapping and frequency counting often shown alongside it, e.g. for the sentence:
This is a sentence is.
[(1, 1), (2, 2), (3, 1), (4, 1)]
is really the bag-of-words representation that LDA consumes as input, not Word2Vec itself.
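That id/count preprocessing step can be sketched in plain Python (this mirrors what e.g. gensim's Dictionary/doc2bow does; ids starting at 1 are purely for illustration):

```python
def doc_to_bow(tokens, vocab):
    """Map each token to an integer id (assigned on first sight) and
    count occurrences, producing the (id, count) pairs LDA consumes."""
    counts = {}
    for tok in tokens:
        idx = vocab.setdefault(tok, len(vocab) + 1)
        counts[idx] = counts.get(idx, 0) + 1
    return sorted(counts.items())

vocab = {}
doc_to_bow("this is a sentence is".split(), vocab)
# [(1, 1), (2, 2), (3, 1), (4, 1)]  -- "is" got id 2 and appeared twice
```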
I hope this helps :)
I am pretty new to ML, so I am having some difficulty seeing how I could use Spark's machine learning libraries with time-series data that reflects a sequence of events.
I have a table that contains this info:
StepN#, element_id, Session_id
Where StepN# is the position in the sequence at which each element appears, element_id is the element that was clicked, and session_id is the user session in which this happened.
The table consists of multiple sessions, with multiple element sequences per session, i.e. one session will contain multiple rows of elements. Also, each session has the same starting and ending point.
My objective is to train a model that would use the element sequences observed to predict the next element that is most likely to be clicked. Meaning I need to predict the next event given the previous events.
(In other words, I need to average users' click behavior for a specific workflow so that the model will be able to predict the next most relevant click based on that average.)
From the papers and the examples I find online I understand that this makes sense when there is a single sequence of events that is meant to be used as an input for the training model.
In my case, though, I have multiple sessions/instances of events (all starting at the same point), and I would like to train an averaging model. I find it a bit challenging to understand how that could be approached using, for example, an HMM in Spark. Is there any practical example or tutorial that covers this case?
Thank you for spending the time to read my post. Any ideas would be appreciated!
This can also be solved with frequent pattern mining. Check this: https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html
In this situation, you can find items that frequently occur together. In the first step you teach the model what is frequent; then, at prediction time, the model sees some events and can predict the most common events that follow them.
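As a simpler baseline than FP-growth, you can pool all sessions and count, for each element, which element users clicked next; the most frequent follower is the prediction. This is a sketch in plain Python rather than Spark, just to show the idea (session data is invented):

```python
from collections import Counter, defaultdict

def train_next_click(sessions):
    """Count, for each element, which element was clicked next,
    pooled across all sessions (a first-order frequency model)."""
    model = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, element):
    """Most frequent follower of `element`, or None if unseen."""
    followers = model.get(element)
    return followers.most_common(1)[0][0] if followers else None

sessions = [["start", "a", "b", "end"],
            ["start", "a", "c", "end"],
            ["start", "a", "b", "end"]]
model = train_next_click(sessions)
predict_next(model, "a")  # "b" (followed "a" in 2 of 3 sessions)
```

The same counting can be expressed over a Spark DataFrame grouped by session_id and ordered by StepN#; conditioning on longer prefixes instead of a single element moves this toward the sequence models you mention.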
I am working on a text classification project where I am trying to assign topic classifications to speeches from the Congressional Record.
Using topic codes from the Congressional Bills Project (http://congressionalbills.org/), I've tagged speeches that mention a specific bill as belonging to the topic of the bill. I'm using this as my "training set" for the model.
I have a "vanilla" Naive Bayes classifier working well enough, but I keep feeling like I could get better accuracy out of the algorithm by incorporating information about the member of Congress who is making the speech (e.g., certain members are much more likely to talk about foreign policy than others).
One possibility would be to replace the prior in the NB classifier (usually defined as the proportion of documents in the training set that have the given classification) with one based on the speaker's previous speeches.
Is this worth pursuing? Are there existing approaches that have followed this same kind of logic? I'm a little bit familiar with the "author-topic models" that come out of Latent Dirichlet Allocation models, but I like the simplicity of the NB model.
There is no need to modify anything; simply add this information to your Naive Bayes and it will work just fine.
And as previously mentioned in the comments: do not change any priors. The prior probability is P(class), which has nothing to do with the actual features.
Just add another feature to your computations corresponding to the authorship, e.g. "author:AUTHOR", and train Naive Bayes as usual, i.e. compute P(class|author:AUTHOR) for each class and AUTHOR and use it later in your classification process. If your current representation is a bag of words, it is sufficient to add an "artificial" word of the form "author:AUTHOR" to it.
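Here is a minimal pure-Python sketch of the author-token idea (the data, author names, and Laplace smoothing are invented for illustration; in practice you would likely use e.g. scikit-learn's MultinomialNB on the augmented bag of words):

```python
from collections import Counter, defaultdict
import math

def add_author_feature(tokens, author):
    """Append the artificial 'author:NAME' word to a bag of words."""
    return tokens + ["author:" + author]

def train_nb(docs):
    """docs: (tokens, author, label) triples. Returns class counts and
    per-class word counts for a minimal multinomial NB."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for tokens, author, label in docs:
        feats = add_author_feature(tokens, author)
        word_counts[label].update(feats)
        class_counts[label] += 1
        vocab.update(feats)
    return word_counts, class_counts, vocab

def predict(model, tokens, author):
    """Pick the class maximizing log P(class) + sum log P(word|class),
    with add-one (Laplace) smoothing."""
    word_counts, class_counts, vocab = model
    feats = add_author_feature(tokens, author)
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in feats:
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [(["tariffs", "trade"], "smith", "foreign_policy"),
        (["embassy", "treaty"], "smith", "foreign_policy"),
        (["schools", "teachers"], "jones", "education")]
model = train_nb(docs)
predict(model, ["treaty", "trade"], "smith")  # "foreign_policy"
```

The author token simply participates in the likelihood like any other word, which is exactly why no change to the prior is needed.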
One other option would be to train an independent classifier for each AUTHOR, which would capture person-specific speech patterns. For example, one person uses the word "environment" only when talking about "nature", while another simply likes to add it to every speech ("Oh, in our local environment of ..."). Independent NBs would capture these kinds of phenomena.