Conversational Data for building a chat bot - nlp

I am building a chat bot with rasa-nlu. I went through the tutorial and I have built a simple bot. However, I need lots of training data for building a chat bot that is able to book a taxi. So I need data to build a specific bot.
Is there a repository, or corpus, for booking a taxi?
Or is there a way to generate this kind of dataset?

This is a blog post from one of the founders of Rasa, and I think it has some really excellent advice. I think you're going about it the wrong way by asking for a pre-built training set. Start it yourself, then add friends, etc., until you've built a training set that works best for your bot.
Put on your robot costume
Beyond that, the Rasa docs have this under improving model performance:
When the rasa_nlu server is running, it keeps track of all the
predictions it’s made and saves these to a log file. By default log
files are placed in logs/. The files in this directory contain one
json object per line. You can fix any incorrect predictions and add
them to your training set to improve your parser.
I think you'll be surprised how far you can get with just the training set you can come up with yourself.
Good luck on finding the corpus, but either way hope these links and snippets helped.
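As a rough sketch of the log-review loop quoted above, the following reads one JSON object per line and picks out the low-confidence predictions worth correcting by hand. The field names ("text", "intent", "name", "confidence") and the 0.6 threshold are assumptions for illustration, not the documented log schema; check them against your own files in logs/.

```python
import json


def low_confidence_predictions(lines, threshold=0.6):
    """Yield (text, intent, confidence) for predictions below threshold.

    `lines` is any iterable of strings, each holding one JSON object.
    The field names used here are assumptions; adjust them to match
    whatever your log files actually contain.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        intent = record.get("intent") or {}
        confidence = intent.get("confidence", 0.0)
        if confidence < threshold:
            yield record.get("text"), intent.get("name"), confidence


if __name__ == "__main__":
    # Hypothetical log lines standing in for a real file opened from logs/.
    sample = [
        '{"text": "book me a cab", "intent": {"name": "book_taxi", "confidence": 0.91}}',
        '{"text": "need a ride at 5", "intent": {"name": "greet", "confidence": 0.42}}',
    ]
    for text, intent, confidence in low_confidence_predictions(sample):
        print(text, intent, confidence)
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.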

One way of doing this is to head over to LUIS.AI.
Log in using Office 365 and make your own taxi booking app by giving it intents and utterances.
After training and publishing the model, download the corpus.
The downloaded corpus is a JSON file in the LUIS format.
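As a rough sketch of what a LUIS-format corpus contains, the following builds a tiny one in Python. The app name, intents, entity and utterances are hypothetical, and only the core fields are shown; a real export carries additional metadata, so treat this as an illustration rather than the exact file LUIS produces.

```python
import json


def make_utterance(text, intent, entities=()):
    """Build one LUIS-style utterance; entities are (entity_name, substring) pairs."""
    spans = []
    for name, value in entities:
        start = text.index(value)  # raises ValueError if the substring is absent
        spans.append({
            "entity": name,
            "startPos": start,
            "endPos": start + len(value) - 1,  # end position is inclusive
        })
    return {"text": text, "intent": intent, "entities": spans}


# Hypothetical app name, intents and entity, for illustration only.
corpus = {
    "name": "TaxiBooking",
    "intents": [{"name": "BookTaxi"}, {"name": "None"}],
    "entities": [{"name": "Destination"}],
    "utterances": [
        make_utterance("book a taxi to the airport", "BookTaxi",
                       [("Destination", "the airport")]),
        make_utterance("hello there", "None"),
    ],
}

if __name__ == "__main__":
    print(json.dumps(corpus, indent=2))
```

Generating utterances from a helper like this keeps the character offsets consistent with the text, which is the easiest thing to get wrong when editing such a file by hand.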
Install RASA NLU. I have Windows 8.1 on my machine, so the steps are as follows.
These are the steps to configure RASA:
First install:
Anaconda 4.3.0 64-bit Windows for installing Python 3.6 interpreter: https://repo.continuum.io/archive/Anaconda3-4.3.0-Windows-x86_64.exe
&
Python Tools for Visual Studio 2015: https://ptvs.azureedge.net/download/PTVS%202.2.6%20VS%202015.msi
Next, install the following packages in this order in administrative mode in command prompt:
spaCy NLP package: pip install -U spacy
spaCy English language model: python -m spacy download en
scikit-learn package for intent classification: pip install -U scikit-learn
NumPy package for mathematical calculations: pip install -U numpy
SciPy package: pip install -U scipy
sklearn-crfsuite package for CRF-based entity extraction: pip install -U sklearn-crfsuite
Duckling for better entity recognition alongside spaCy: pip install -U duckling
RASA NLU: pip install -U rasa_nlu==0.10.4
After installing all the above packages successfully, make a spaCy configuration file, which will be read by RASA, as follows:
{
  "project": "Travel",
  "pipeline": "spacy_sklearn",
  "language": "en",
  "num_threads": 1,
  "max_training_processes": 1,
  "path": "C:\\Users\\Kunal\\Desktop\\RASA\\models",
  "response_log": "C:\\Users\\Kunal\\Desktop\\RASA\\log",
  "config": "C:\\Users\\Kunal\\Desktop\\RASA\\config_spacy.json",
  "log_level": "INFO",
  "port": 5000,
  "data": "C:\\Users\\Kunal\\Desktop\\RASA\\data\\FlightBotFinal.json",
  "emulate": "luis",
  "spacy_model_name": "en",
  "token": null,
  "cors_origins": ["*"],
  "aws_endpoint_url": null
}
Next, make a directory structure like this:
data -> will contain the LUIS-formatted corpus
models -> will contain all trained models
logs -> will contain active learning logs and RASA framework logs
Now, make batch file scripts for Training and Starting RASA NLU Server.
Make a TrainRASA.bat by Notepad or Visual Studio Code and write this:
python -m rasa_nlu.train -c config_spacy.json
pause
Now make a StartRASA.bat by Notepad or Visual Studio Code and write this:
python -m rasa_nlu.server -c config_spacy.json
pause
Now train and start the RASA server by running the batch file scripts you just made.
Now everything is ready. Open Chrome and issue an HTTP GET request to your endpoint /parse
Like: http://localhost:5000/parse?q=&project=
You will get a JSON response that corresponds to the LUISResult class of Bot Framework C#.
Now handle the business logic you want to perform after doing that.
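If you would rather call the endpoint from a script than from the browser, a minimal sketch using only the Python standard library could look like this. The query text is a made-up example, the project name "Travel" is taken from the configuration above, and the helper names are my own; the request itself only works while the server is running on port 5000.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_parse_url(query, project, base="http://localhost:5000"):
    """Return the /parse URL with the query text and project name URL-encoded."""
    return base + "/parse?" + urlencode({"q": query, "project": project})


def parse(query, project, base="http://localhost:5000"):
    """Send the GET request and decode the JSON reply (needs the server running)."""
    with urlopen(build_parse_url(query, project, base)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL, so it runs without a server.
    print(build_parse_url("book a taxi to the airport", "Travel"))
```

Encoding the parameters with urlencode avoids broken requests when the utterance contains spaces, ampersands or non-ASCII characters.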
Alternatively, you can take a look at RASA Core, which was built mainly for this purpose.
RASA Core, which uses machine learning to build dialogs instead of
simple if-else statements.

The below link contains datasets relevant for commercial chatbot applications ('human-machine' dialogues). It's a fairly comprehensive collection of both human-human and human-machine text dialogue datasets, as well as audio dialogue datasets. https://breakend.github.io/DialogDatasets/

We faced the same problem while trying to build a love relationship coach bot. Long story short, we decided to create a simple tool to collect data from our friends, our colleagues or people on Mechanical Turk: https://chatbotstrap.io.
The idea is to create polls like this one: https://chatbotstrap.io/en/project/q5pimyskbhna2rm?language=en&nb_scenarios=10
and send them to anyone you know. With that solution, we were able to build a dataset of more than 6000 sentences divided into 10 intents in a few days.
The tool is free as long as you agree that the dataset constructed with it can be open-sourced. There are also paid plans if you prefer to be the sole beneficiary of the data you collect.

Related

Logging and Fetching Run Parameters in AzureML

I am able to log and fetch metrics to AzureML using Run.log, however, I need a way to also log run parameters, like Learning Rate, or Momentum. I can't seem to find anything in the AzureML Python SDK documentation to achieve this. However, if I use MLflow's mlflow.log_param, I am able to log parameters, and they even nicely show up on the AzureML Studio Dashboard (bottom right of the image):
Again, I am able to fetch this using MLflow's get_params() function, but I can't find a way to do this using just AzureML's Python SDK. Is there a way to do this directly using azureml?
Retrieving logged run parameters such as learning rate or momentum is not possible with AzureML alone, because that functionality is tied to MLflow together with azureml-core. Without both of them, the logged run parameters cannot be retrieved.
pip install azureml-core mlflow azureml-mlflow
You need to install these three packages to get run parameters.

Building Python OCR using machine learning

There are a ton of questions dealing with OCR and Machine Learning, I am looking for guidance building my own from scratch.
I have an obscene number of photographs that contain text pertaining to the feature in the photo. The text is the latitude, longitude and id of the feature. I am looking for a way to extract this information into a text file to feed into my GIS.
I am sure Tesseract and Pytesseract would do exactly what I want. However, I have a blocker: I cannot load additional software onto the PC I am working on.
My PC is connected to a strictly controlled and secure network. I cannot install new software. I can however “pip install” any Python libraries needed, using a mobile hotspot. I have installed the Pytesseract library in Python. However, if I have understood correctly, there is a dependency requiring a Windows install file to be downloaded and installed before this works.
So I have decided to try (as a side project) to create my own OCR model using Python and whatever libraries I need. The only issue is that there is a ton of information online, and finding a focused and easy-to-follow process is not easy.
I am looking for resources detailing step by step what I need to do to create a training dataset, train a model and feed the images into the trained model to get an output that makes sense.
I have been using OpenCV to process an image (crop, filter, etc.) to get bounding boxes of all the identifiable text in the test image. I am not sure where to go from there.
Are there any recommended tutorials online / resources that might make sense to a complete novice? I am using Python 3.5.

How to specify pytorch as a package requirement on windows?

I have a python package which depends on pytorch and which I’d like windows users to be able to install via pip (the specific package is: https://github.com/mindsdb/lightwood, but I don’t think this is very relevant to my question).
What are the best practices for going about this?
Are there some projects I could use as examples?
It seems like the pypi hosted version of torch & torchvision aren’t windows compatible and the “getting started” section suggests installing from the custom pytorch repository, but beyond that I’m not sure what the ideal solution would be to incorporate this as part of a setup script.
What are the best practices for going about this?
If your project depends on other projects that are not distributed through PyPI then you have to inform the users of your project one way or another. I recommend the following combination:
clearly specify (in your project's documentation pages, or in the project's long description, or in the README, or anything like this) which dependencies are not available through PyPI (and possibly the reason why, with the appropriate links) as well as the possible locations to get them from;
to facilitate the user experience, publish alongside your project a pre-prepared requirements.txt file with the appropriate --find-links options.
The reason why (or main reason, there are others), is that anyone using pip assumes that (by default) everything will be downloaded from PyPI and nowhere else. In other words anyone using pip puts some trust into pypi.org as a source for Python project distributions. If pip were suddenly to download artifacts from other sources, it would breach this trust. It should be the user's decision to download from other sources.
So you could provide in your project's documentation an example of requirements.txt file like the following:
# ...
torch===1.4.0 --find-links https://download.pytorch.org/whl/torch_stable.html
torchvision===0.5.0 --find-links https://download.pytorch.org/whl/torch_stable.html
# ...
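To complement the documentation, a package could also tell users at runtime where to get a missing dependency. This is only a sketch of that idea, not something the projects mentioned here do: the helper name is hypothetical, and the version pin and URL are copied from the requirements.txt example above.

```python
import importlib.util

# URL copied from the requirements.txt example; adjust if the index moves.
FIND_LINKS = "https://download.pytorch.org/whl/torch_stable.html"


def missing_dependency_hint(module_name, requirement, find_links=FIND_LINKS):
    """Return an install hint if `module_name` is not importable, else None."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    return (
        f"{module_name} is not installed. It may not be available from PyPI "
        f"for your platform; try:\n"
        f"    pip install {requirement} --find-links {find_links}"
    )


if __name__ == "__main__":
    hint = missing_dependency_hint("torch", "torch===1.4.0")
    if hint:
        print(hint)
```

This leaves the decision to download from another source with the user, which is the point made above: the package only prints the command, it does not run it.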
Update
The best solution would be to help the maintainers of the projects in question to publish Windows wheels on PyPI directly:
https://github.com/pytorch/pytorch/issues/24310
https://github.com/pytorch/vision/issues/1774
https://pypi.org/help/#file-size-limit

Can't Import Bert_Text after installing it successfully

BERT is a very powerful model for text classification, but implementing it requires much more code than any other model. bert-text is a PyPI package that provides developers with a ready-to-use solution. I have installed it properly. When I try to import it, it throws the error ModuleNotFoundError: No module named 'bert_text'. I have written the name bert_text correctly.
I have tried it in Kaggle, Colab and on my local machine, but the error is the same.
This is a refactor made by Yan Sun, and the issue is already pending. You can go to this link and subscribe for an update when the developers provide a solution: https://github.com/SunYanCN/bert-text/issues/1

Heroku Deployment error-Nltk Unable to download

I deployed my project on Heroku with heroku/python as the buildpack, then with the GitHub link from the "Learn more" section of the image as the buildpack. It is not working with either buildpack.
Please help me out.
It seems you should create an nltk.txt file to download the corpora that you are interested in, as mentioned in the link.
In order to use it, you have to download corpora
and make it available to your application.
This is not required by NLTK, so they are simply letting you know that no corpora will be downloaded.
You can go to http://www.nltk.org/nltk_data/ to see a list of the available corpora, or run this in a Python terminal:
>>> import nltk
>>> nltk.download()
Then simply choose and install what you want.
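As a small sketch of the nltk.txt suggestion, assuming the file simply lists one corpus identifier per line in the root of the repository, you could generate it like this. The two identifiers are placeholders, and the helper name is hypothetical; list whatever your application actually loads.

```python
from pathlib import Path


def write_nltk_txt(corpora, directory="."):
    """Write one corpus identifier per line to nltk.txt and return its path."""
    path = Path(directory) / "nltk.txt"
    path.write_text("\n".join(corpora) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical selection; replace with the corpora your code uses.
    print(write_nltk_txt(["punkt", "stopwords"]).read_text(encoding="utf-8"))
```

Commit the resulting file alongside your requirements so the buildpack can see it at deploy time.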
