Custom entity recognition using Azure Text Analytics API?

Is it possible to define custom/special entities for entity recognition within the Azure Text Analytics API?
NER (Named Entity Recognition) can discover a wide range of entities, but for our purposes we're focusing on some model-specific entities (e.g. brand and product names) which we need to relate to the overall sentiment. General NER might not be enough for us, since we're looking for very specific appreciation/criticism terms during topic generation.
This topic has already come up in different flavors, with no answers so far:
in 2016 it seemed to be an "upcoming" feature: Customizing the Named Entity Recognition model in Azure ML
in 2018 someone was looking for a more specialized version, capable of locating the spatial positions of custom entities within documents: Documentation / Examples for Custom Entity Detection (Azure NLP / Text Analytics)
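For reference, this is roughly what the built-in NER gives us today; a minimal sketch using the azure-ai-textanalytics Python SDK (endpoint and key are placeholders), which returns only the prebuilt categories, not custom ones:

# Minimal sketch: built-in (non-custom) entity recognition with the
# azure-ai-textanalytics SDK. Endpoint and key are placeholders.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

docs = ["I love the Contoso X200 headphones, but the case feels cheap."]
result = client.recognize_entities(docs)[0]
for entity in result.entities:
    # Only prebuilt categories (Person, Location, Product, ...) come back;
    # brand/product-specific classes cannot be defined here.
    print(entity.text, entity.category, entity.confidence_score)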

Related

Azure / Video Indexer AI API / Visual Recognition Classes list

Is it possible to retrieve the full list of default recognized classes of Azure Video Indexer?
Azure Video Analyzer for Media, a.k.a. Video Indexer, supports thousands of class labels for video frame classification, referred to as Labels. Although the full list is not available online, you can easily infer the classes relevant to your data with a few API calls (see the sketch below)... Feel free to reach out to customer support at visupport#microsoft.com for additional assistance.
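As a sketch of those API calls (location, account, video and token are placeholders; the response shape follows the documented Get Video Index endpoint), you can index a representative video and collect the Labels it returns:

import requests

# Placeholders: use your own Video Indexer location, account, video and token.
LOCATION = "trial"
ACCOUNT_ID = "<account-id>"
VIDEO_ID = "<video-id>"
ACCESS_TOKEN = "<access-token>"

# Fetch the index of an already-processed video and collect the frame
# classification Labels that Video Indexer detected.
url = (f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
       f"/Videos/{VIDEO_ID}/Index")
index = requests.get(url, params={"accessToken": ACCESS_TOKEN}).json()

labels = {label["name"]
          for video in index["videos"]
          for label in video["insights"].get("labels", [])}
print(sorted(labels))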

Regarding the Form Recognizer, OCR and label tool containers

We are trying to use the container preview of Form Recognizer, OCR and the label tool, and have the following questions:
Is there any software which can help us classify similar kinds of documents? This would help us categorize documents and create a training dataset.
Is there any way to give a model a user-defined name? The following is output from the model query API; it is difficult to tie it back to the different kinds of models:
{
"modelId": "f136f65b-bb94-493b-a798-a3e8023ea1b5",
"status": "ready",
"createdDateTime": "2020-05-06T21:35:58+00:00",
"lastUpdatedDateTime": "2020-05-06T21:36:06+00:00"
}
I can see model files stored in \output\subscriptions\global\models, where /output is the directory shared with the container in the docker-compose file. Is it possible to import these models into new containers?
Each model has a .json and a .gz file with the same name as the model ID.
I am also attaching the docker-compose file for your reference.
Is there a way to fine-tune or update the same custom model (same model ID) with new training data?
We were also trying the label tool, but it only takes Azure Blob Storage as input. Is it possible to provide input the same way we do for Form Recognizer training?
We are struggling to get this set up, and if it is not resolved we might have to start looking at alternatives.
Following are the answers to your questions:
To classify documents you can use Custom Vision to build a document classifier, or use text classification together with OCR. In addition, you can train Form Recognizer without labels, run it on the training data, and use the cluster option within the model to group similar documents and pages in the training dataset.
A friendly model name is not yet available in Form Recognizer; it is on our roadmap but not available yet (see the workaround sketch below).
Models can't be copied between containers; you can use the same dataset to train a model in a different container. Models can be copied between subscriptions, resources and regions when using the Form Recognizer cloud service.
Each training run creates a new model ID so as not to overwrite the previous model; you can't update existing models.
Form Recognizer v2.0 is not yet available in containers; only Form Recognizer v1.0 is currently available in containers. Form Recognizer v2.0 will also be available in containers shortly. When using the container release, all data remains on-premise, and the labeling tool, once available for the v2.0 container release, will also take a local or mounted disk as input rather than blob storage.
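Since friendly names are not available, a common workaround is to keep your own mapping from model ID to purpose. A minimal sketch that lists the models known to a local container (the host/port and the v1.0-preview path are assumptions; adjust them to your docker-compose setup):

import requests

# Assumed local container endpoint; adjust host/port to your docker-compose setup.
BASE = "http://localhost:5000/formrecognizer/v1.0-preview"

# List all trained models in this container and print the fields shown in the
# question; pair each modelId with a human-readable purpose in your own store.
models = requests.get(f"{BASE}/custom/models").json()
for m in models.get("models", []):
    print(m["modelId"], m["status"], m["createdDateTime"])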
Thanks!
Neta - MSFT

Is there any Azure-based service similar to IBM Watson Knowledge Studio?

I tried to find an Azure service similar to IBM Watson Knowledge Studio, but failed. I'm looking for something I can train to analyze texts and retrieve entities, relations between entities, and entity-related sentiments.
Do you know if there is anything in Azure I could use to do that?
Yes, there is a remote similarity between Azure Language Understanding (LUIS) and IBM Watson Knowledge Studio (WKS), but they are not substitutes.
The notable differences:
LUIS is for building chatbots - conversations with utterance data. WKS is for general unstructured text, usually much larger in volume than utterances. In this respect, LUIS is competing with IBM Conversation service, not with WKS and the IBM Watson services that run the WKS custom models - Watson Natural Language Understanding and Watson Discovery.
Because LUIS is built for processing utterances, it has much lower limits compared to WKS. For example, LUIS limits the input text to 500 characters, while WKS processes text of up to 40,000 characters. LUIS also limits Simple entities to 30, which may be OK for processing targeted utterances, but not for building a high-quality model for processing large documents and complex domains.
LUIS supports only customization of Entity Type mentions (in various forms, like Simple, Hierarchical, Composite, RegEx, List, etc., very similar to the WKS Entity types). WKS (and the runtime services that use the WKS custom models), on the other hand, supports Entity Relations, an important feature that helps you extract insights from a client-specific corpus that you cannot get with Entity mentions alone.
LUIS supports only a fraction of the languages that WKS supports. And the LUIS language support is partial - see https://learn.microsoft.com/en-us/azure/cognitive-services/luis/luis-supported-languages
LUIS, similarly to the IBM Conversation service, is a runtime NLP service that allows customization in its tooling. WKS, on the other hand, is a standalone customization SaaS offering that was specifically designed to organize a team of domain subject matter experts (SMEs) and cognitive solution developers to transfer the SMEs' domain knowledge into a custom model that is then deployed to and used by IBM Watson runtime services, like Natural Language Understanding and Discovery. In other words, while LUIS and IBM Conversation provide tooling for customizing the solution directly, WKS provides a separate environment with a built-in methodology for managing customization projects and for annotation skill building.
LUIS, to my understanding, is offered as a multi-tenant public service. WKS is offered in both multi-tenant and isolated configurations. In that respect, WKS is suitable not only for the general public, but also for projects with sensitive client data.
In conclusion, there's no WKS equivalent (substitute) that I'm aware of. LUIS may be considered as the Azure alternative to IBM Conversation service, if your solution is built in Azure, but LUIS is not a substitute for IBM Watson Knowledge Studio.
So, it's very important to consider your use case (application domain), when choosing on which platform to build your solution.
Hope this helps.
Did you look at the Azure AI gallery? One common approach is to customise the solutions there for your particular requirements. Here's a search of all text-related items, which you can for example refine to be Microsoft content only.
I'm not aware of a single service that maps directly; the Text Analytics API, for example, just does language detection, key phrase extraction and sentiment analysis.
Have a look at Luis.ai... it should do what you need.

How to implement a bot engine like Wit.ai as an on-premise solution?

I want to build a chatbot for a customer service application. I tried SaaS services like Wit.ai, Motion.ai, Api.ai, LUIS.ai etc. These cognitive services find the "intent" and "entities" when trained with typical interactions.
I need to build the chatbot as an on-premise solution, without using any of these SaaS services.
e.g. a typical conversation would be as follows:
Can you book me a ticket?
Is my ticket booked?
What is the status of my booking BK02?
I want to cancel the booking BK02.
Book the tickets
The Stanford NLP toolkit looks promising, but there are licensing constraints. Hence I started experimenting with OpenNLP. I assume there are two OpenNLP tasks involved:
Use 'Document Categorizer' to find out the intent
Use 'Named Entity Recognition' to find out entities
Once the context is identified, I will call my application APIs to build the response.
Is this the right approach?
How good is OpenNLP at parsing text?
Can I use Facebook's fastText library for intent identification? (a sketch follows below)
Is there any other open-source library which could be helpful in building the bot?
Would "SyntaxNet" be useful for my adventure?
I prefer to do this in Java, but I am open to a Node or Python solution too.
PS - I am new to NLP.
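On the fastText question: yes, its supervised mode does exactly this kind of intent classification. A minimal sketch, assuming the official fasttext Python bindings and a hypothetical intents.txt file in the __label__ training format:

import fasttext

# intents.txt (hypothetical) holds one example per line in fastText's
# supervised format, e.g.:
#   __label__book_ticket Can you book me a ticket?
#   __label__booking_status What is the status of my booking BK02?
#   __label__cancel_booking I want to cancel the booking BK02.
model = fasttext.train_supervised(input="intents.txt", epoch=25, lr=0.5)

# predict() returns the best label(s) with their probabilities.
labels, probs = model.predict("Is my ticket booked?")
print(labels[0], probs[0])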
Have a look at Rasa. It describes itself as open-source language understanding for bots and a drop-in replacement for popular NLP tools like wit.ai, api.ai or LUIS:
https://rasa.ai/
Have a look at my other answer for a plan of attack when using Luis.ai:
Creating an API for LUIS.AI or using .JSON files in order to train the bot for non-technical users
In short, use Luis.ai and set up some intents; start with one or two and train it based on your domain. I am using ASP.NET to call the Cognitive Services API as outlined above. Then customize the response via some jQuery... you could search a list of your rules in a JavaScript array when each intent or action is raised by the response from LUIS.
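For illustration, the runtime call itself is a single HTTP GET; a minimal sketch shown here in Python for brevity (region, app ID and key are placeholders; the response shape follows the LUIS v2.0 endpoint):

import requests

# Placeholders: use your own region, LUIS app ID and endpoint key.
APP_ID = "<luis-app-id>"
KEY = "<endpoint-key>"
url = f"https://westus.api.cognitive.microsoft.com/luis/v2.0/apps/{APP_ID}"

resp = requests.get(url, params={
    "subscription-key": KEY,
    "q": "What is the status of my booking BK02?",
}).json()
print(resp["topScoringIntent"]["intent"])  # matched intent
print(resp.get("entities", []))            # extracted entities, e.g. the booking ID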
If your bot is English-based, then I would use OpenNLP's sentence parser to dump the customer input into a database (I do this today). I then use the OpenNLP tokenizer and push the keywords (less the stop words) and parts of speech into a database table for keyword analysis. I have a custom sentiment model built for OpenNLP that tags each sentence with a positive, negative or neutral sentiment... You can then use this to identify negative customer service feedback. To build your own sentiment model, have a look at SentiWordNet and download their domain-agnostic data file to build and train an OpenNLP model, or have a look at this Node version:
https://www.npmjs.com/package/sentiword
Hope that helps.
I'd definitely recommend Rasa: it's great for your use case, works on-premise easily, handles intents and entities for you, and on top of that it has a friendly community too.
Check out my repo for an example of how to build a chatbot with Rasa that interacts with a simple database: https://github.com/nmstoker/lockebot
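As a starting point, here is a minimal training/parsing sketch based on the older rasa_nlu Python API (newer Rasa releases use different entry points; the data and config file paths are hypothetical):

from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# Train an intent + entity model from local examples; everything stays on-premise.
training_data = load_data("data/nlu.md")        # hypothetical training file
trainer = Trainer(config.load("config.yml"))    # e.g. a spaCy-based pipeline
interpreter = trainer.train(training_data)

# Parse an utterance: returns the intent and any entities (e.g. the booking ID).
print(interpreter.parse("I want to cancel the booking BK02"))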
I tried Rasa, but one glitch I found was its inability to answer unmatched/untrained user texts.
Now I'm using ChatterBot, and I'm totally in love with it.
Use ChatterBot, and host it locally using flask-chatterbot-master.
Links:
ChatterBot Installation: https://chatterbot.readthedocs.io/en/stable/setup.html
Host Locally using - flask-chatterbot-master: https://github.com/chamkank/flask-chatterbot
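A minimal sketch of that setup, using ChatterBot's documented trainer API (the training pairs are just placeholders):

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

# ChatterBot stores its data in a local SQLite database by default,
# so this runs fully on-premise.
bot = ChatBot("SupportBot")
trainer = ListTrainer(bot)
trainer.train([
    "Can you book me a ticket?",
    "Sure, which date would you like to travel?",
])

# Unlike a pure intent matcher, ChatterBot picks the closest known response.
print(bot.get_response("Can you book me a ticket?"))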
Cheers,
Ratnakar
With the help of the Rasa and Botkit frameworks we can build an on-premise chatbot and NLP engine for any channel. Please follow this link for end-to-end steps on building one. An awesome blog that helped me create one for my office:
https://creospiders.blogspot.com/2018/03/complete-on-premise-and-fully.html
First of all, any chatbot is going to be a program that runs alongside the NLP; it's the NLP that brings the knowledge to the chatbot, and the NLP in turn rests on machine learning techniques.
There are a few reasons why on-premise chatbots are less common:
We need to build the infrastructure
We need to train the model often
On the other hand, using cloud-based NLP may not provide the data privacy and security you need, and the flexibility to include your own business logic is very limited.
Altogether, going on-premise or to the cloud depends on the needs and use case of your requirements.
However, please refer to these links for end-to-end knowledge on building an easily and fully customisable chatbot on-premise in very few steps:
Complete On-Premise and Fully Customisable Chat Bot - Part 1 - Overview
Complete On-Premise and Fully Customisable Chat Bot - Part 2 - Agent Building Using Botkit
Complete On-Premise and Fully Customisable Chat Bot - Part 3 - Communicating to the Agent that has been built
Complete On-Premise and Fully Customisable Chat Bot - Part 4 - Integrating the Natural Language Processor NLP
Disclaimer: I am the author of this package.
Abodit NLP (https://nlp.abodit.com) can do what you want but it's .NET only at present.
In particular you can easily connect it to databases and can provide custom Tokens that are queries against a database. It's all strongly-typed and adding new rules is as easy as adding a method in C#.
It's also particularly adept at turning date time expressions into queries. For example "next month on a Thursday after 4pm" becomes ((((DatePart(year,[DATEFIELD])=2019) AND (DatePart(month,[DATEFIELD])=7)) AND (DatePart(dw,[DATEFIELD])=4)) AND DatePart(hour,[DATEFIELD])>=16)

DDD and data export system

I'm a beginner in DDD and am facing a little problem with architecture.
Our system must be able to export business data in various formats (Excel, Word, PDF and other more exotic formats).
In your opinion, which layer should be responsible for the overall process of retrieving the source data, exporting it in the target format and preparing the final result for the user? I'm unsure how to split responsibility between the domain and application layers.
And regarding the export subsystem, should the implementations and their common interface contract belong to the infrastructure layer?
Neither the Application nor the Domain layer, nor any other 'layer'. DDD is not a layered architecture; search for onion architecture or the ports-and-adapters pattern for more on this subject.
Now to the core of your problem. The issue you are facing is a separate bounded context and should go into a separate component of your system; let's call it Reporting. And as it's just a presentation problem with no domain logic, DDD is not suitable for it. Just make some SQL views, read them using NHibernate, LINQ to SQL, EF or even plain DataReaders, and build your Word/whatever documents using a Builder pattern (see the sketch below). No Aggregates, Repositories, Services or any other DDD building blocks.
You may want to go a bit further and have all data presentation in your application handled by a separate component. That's CQRS.
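To make the ports-and-adapters idea concrete, a minimal language-agnostic sketch (shown in Python; the names are illustrative): the application depends only on an exporter contract, while each target format lives as an adapter in the infrastructure component:

from abc import ABC, abstractmethod

# Port: the contract the application layer depends on.
class ReportExporter(ABC):
    @abstractmethod
    def export(self, rows: list[dict]) -> bytes: ...

# Adapter: one implementation per target format (Excel, Word, PDF, ...)
# lives in the infrastructure component.
class CsvExporter(ReportExporter):
    def export(self, rows: list[dict]) -> bytes:
        if not rows:
            return b""
        header = ",".join(rows[0].keys())
        lines = [",".join(str(v) for v in row.values()) for row in rows]
        return "\n".join([header, *lines]).encode("utf-8")

# Application-level use case: read from a view/query model, then export.
def export_sales_report(rows: list[dict], exporter: ReportExporter) -> bytes:
    return exporter.export(rows)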
I usually take the simple approach where possible.
The domain code implements the language of your domain - The nouns, verbs etc.
Application code creates poetry using this language.
So in your example the construction of the output is an application concern, while the construction of the bits that the application talks about would be the domain concern.
e.g. if the report consists of sales data, then things like dates, bills and orders would be modeled abstractly in the domain, while the application would be concerned with producing documents using them.

Resources