Is AWS Sage Maker Auto Pilot suitable for NLP?
We currently have a TensorFlow model that does classification on an input sequence of URLs
(we transform the URLs into word vectors and character vectors before feeding them to the model).
Looking at the SageMaker Autopilot documentation, it says that it works on input in tabular form.
I was wondering if we could use it for our use case.

No. SageMaker Autopilot doesn't support deep learning at the moment, only classification and regression problems on tabular data. Technically, I guess you could pass embeddings in CSV format and pray that XGBoost figures them out, but I seriously doubt that this would deliver meaningful results :)
Amazon Comprehend does support fully managed custom classification models https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html. It may be worth taking a look at it.
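If Comprehend fits your case, training a custom classifier is essentially one API call once the labelled examples are in S3 as a two-column CSV (label, document text). A rough boto3 sketch, where the classifier name, IAM role ARN, and bucket paths are placeholders rather than values from the question:

```python
# Hedged sketch: kicking off an Amazon Comprehend custom classifier with boto3.
# The classifier name, IAM role ARN, and S3 paths are placeholders; the training
# file is a two-column CSV of label,document-text.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

response = comprehend.create_document_classifier(
    DocumentClassifierName="url-sequence-classifier",                    # placeholder
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendRole",   # placeholder
    InputDataConfig={"S3Uri": "s3://my-bucket/train/labels.csv"},        # placeholder
    LanguageCode="en",
)
print(response["DocumentClassifierArn"])  # poll describe_document_classifier until TRAINED
```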

Related

How can I configure datasets more flexibly with Keras?

I am a beginner learning deep learning with Keras.
The ImageDataGenerator class in Keras and the flow_from_directory function make it easy to label images.
But as I configure the directory, there seems to be a limitation in how I can configure the dataset.
For example, if I want to train an ensemble by dividing the data into 5 folds, should I make many directories?
Or, to bootstrap the data and use the validation data as out-of-bag data, should I implement that myself?
I'd appreciate it if you could give me a little advice!
As a reference, I uploaded the process of labeling with the flow_from_directory function (an 88-class classification model).
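For what it's worth, the common way to avoid creating one directory per fold is to list the files and labels in a dataframe and drive ImageDataGenerator.flow_from_dataframe with splits from scikit-learn's KFold. A rough sketch under that assumption (the file paths and labels below are placeholders):

```python
# Hedged sketch (not from the question): instead of one directory per fold, list the
# files and labels in a dataframe and let scikit-learn's KFold produce the splits,
# feeding each split to Keras via flow_from_dataframe. Paths and labels are placeholders.
import pandas as pd
from sklearn.model_selection import KFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One row per image: its path and its class label (88 classes in the question).
df = pd.DataFrame({
    "filename": [f"images/img_{i:04d}.png" for i in range(1000)],  # placeholder paths
    "class": [f"class_{i % 88}" for i in range(1000)],             # placeholder labels
})

datagen = ImageDataGenerator(rescale=1.0 / 255)

for fold, (train_idx, val_idx) in enumerate(KFold(n_splits=5, shuffle=True).split(df)):
    train_gen = datagen.flow_from_dataframe(df.iloc[train_idx], x_col="filename",
                                            y_col="class", target_size=(224, 224))
    val_gen = datagen.flow_from_dataframe(df.iloc[val_idx], x_col="filename",
                                          y_col="class", target_size=(224, 224))
    # model.fit(train_gen, validation_data=val_gen)  # train one ensemble member per fold
```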

Speech-to-text training for an impaired voice

I want to train and use an ML-based personal speech-to-text converter for a highly impaired voice, for a small set of 300-400 words. It is meant for people with voice impairment, but it cannot be generic, because each person will have a unique voice input for words, depending on their type of impairment.
I wanted to know if there are any ML engines which allow for such training. If not, what is the best approach to go about it?
Thanks
Most of the speech recognition engines support training (wav2letter, DeepSpeech, ESPnet, Kaldi, etc.); you just need to feed in the data. The only issue is that you need a lot of data to train reliably (thousands of samples for each word). You can check the Google Speech Commands dataset for an example of how to train from scratch.
Since the training dataset will be pretty small in your case and will consist of just a few samples, you can probably start with an existing pretrained model and fine-tune it on your samples to get the best accuracy. You need to look into "few-shot learning" setups.
You can probably look at the wav2vec 2.0 pretrained model; it should be effective for this kind of learning. You can find examples and commands for fine-tuning and inference here.
You can also try fine-tuning Jasper models on Google Speech Commands with NVIDIA NeMo. It might be a little less effective but could still work, and should be easier to set up.
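To get a feel for the pretrained model before fine-tuning, here is a minimal inference sketch with a publicly available wav2vec 2.0 checkpoint through the Hugging Face transformers library (the checkpoint name and audio file are placeholders, and this is only one of several ways to run the model):

```python
# Hedged sketch: transcribing one recording with a pretrained wav2vec 2.0 checkpoint
# via the Hugging Face transformers library. The checkpoint name and audio file are
# placeholders; fine-tuning on your own word list would reuse the same processor/model.
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech, sample_rate = sf.read("sample.wav")  # this checkpoint expects 16 kHz mono audio
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```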
I highly recommend watching episode two of the first season of the YouTube Originals series "The Age of A.I.".
Basically, Google has already done this for people who can't really form normal words because of an impaired voice. It is very interesting and talks a little bit about how they did, and are doing, that with ML technologies.

What is the process to create an FAQ bot using spaCy?

I am a beginner in Machine Learning and NLP. I have to create a bot based on an FAQ dataset; each FAQ Excel file contains 2 columns, "Questions" and "Answers".
E.g., a record from an Excel file (a question and its answer):
Question - What is RASA-NLU?
Answer - Rasa NLU is trained to identify intent and entities. Better the training, better the identification...
We have 3K+ Excel files, each with around 10K to 20K such records.
To implement the bot, I would have followed exactly this FAQ bot approach, which uses RASA-NLU, but RASA and ChatterBot, as well as Microsoft's QnA Maker, are not allowed in my organization.
spaCy does the NER extraction perfectly for me, so I am looking to build the bot using spaCy, but I don't know how to proceed after extracting the entities. (IMHO, I will have to predict the exact question from the dataset, and its answer from the knowledge base, given the user's query to the bot.)
I don't know what NLP algorithm/ML process to use, or whether there is an easier way to create that FAQ bot using the extracted NERs.
One way to build your FAQ bot is to transform the problem into a classification problem. You have questions, and the answers can be the "labels". I suppose that you always have multiple training questions which map to the same answer. You can encode each answer in order to get smaller labels (for instance, you can map the text of each answer to an id).
Then you can use your training data (the questions) and your labels (the encoded answers) to train a classifier. After training, the classifier can predict the label of unseen questions.
Of course, this is a supervised approach, so you will need to extract features from your training sentences (the questions). In this case, you can use bag-of-words representations as features and even include the named entities.
An example of how to do text classification in spaCy is available here: https://www.dataquest.io/blog/tutorial-text-classification-in-python-using-spacy/
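As a minimal illustration of the questions-as-inputs, answer-ids-as-labels idea, here is a sketch using scikit-learn for the classifier (the spaCy textcat component in the linked tutorial follows the same pattern; the toy questions and ids below are made up):

```python
# Hedged sketch of the "questions as inputs, answer ids as labels" idea, using
# scikit-learn for the classifier. The toy questions and ids are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each distinct answer gets an integer id; several question phrasings can share one id.
questions = [
    "What is RASA-NLU?",
    "What does Rasa NLU do?",
    "How do I reset my password?",
    "I forgot my password, what now?",
]
answer_ids = [0, 0, 1, 1]  # 0 -> "Rasa NLU is trained to identify intent and entities...", etc.

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(questions, answer_ids)

predicted_id = model.predict(["what is rasa nlu"])[0]
print(predicted_id)  # look up the answer text for this id in the knowledge base
```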

Is there any way to classify text based on some given keywords using python?

I've been trying to learn a bit of machine learning for a project that I'm working on. At the moment I've managed to classify text using an SVM with scikit-learn and spaCy, with some good results, but I don't want to classify the text with the SVM alone; I also want it to be classified based on a list of keywords that I have. For example: if the sentence contains the word "fast" or "seconds", I would like it to be classified as "performance".
I'm really new to machine learning and I would really appreciate any advice.
I assume that you are already taking a portion of your data, classifying it manually, and then using the result as your training data for the SVM algorithm.
If yes, then you could just append your list of keywords (features) and desired classifications (labels) to your training data. If you are not doing so already, I'd recommend using the SnowballStemmer on your training data features.
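One way to wire this up is to combine the usual TF-IDF features with hand-made keyword indicator features before the SVM. A rough sketch (the keyword list, stemmer language, and toy labels are illustrative, not taken from the question):

```python
# Hedged sketch: combine TF-IDF features with a hand-made keyword indicator before
# the SVM. The keyword list, stemmer language, and toy labels are illustrative only.
import numpy as np
from nltk.stem.snowball import SnowballStemmer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

stemmer = SnowballStemmer("english")
PERFORMANCE_KEYWORDS = {stemmer.stem(w) for w in ["fast", "seconds", "slow", "latency"]}

class KeywordFlags(BaseEstimator, TransformerMixin):
    """Adds one binary column: does the text contain a performance keyword?"""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        flags = [
            any(stemmer.stem(tok) in PERFORMANCE_KEYWORDS for tok in text.lower().split())
            for text in X
        ]
        return np.array(flags, dtype=float).reshape(-1, 1)

features = FeatureUnion([
    ("tfidf", TfidfVectorizer()),
    ("keywords", KeywordFlags()),
])
model = make_pipeline(features, LinearSVC())

texts = ["the app loads in two seconds", "the new icons look great"]
labels = ["performance", "design"]
model.fit(texts, labels)
print(model.predict(["startup is really fast"]))
```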

OpenNLP sample training data for diseases

I'm using OpenNLP for data classification. I could not find a TokenNameFinderModel for diseases here. I know I can create my own model, but I was wondering whether any large sample training dataset is available for diseases.
You can easily create your own training dataset using the modelbuilder addon and follow some rules, as mentioned here, to train a good NER model.
You can find some help on using the modelbuilder addon here.
Basically, you put all the information in one text file and the NER entities in another. The addon searches for a particular entity and replaces it with the required tag, hence producing the tagged data. It should be pretty easy to use this tool!
Hope this helps!
