How to correctly use LUIS ML-features?

I just stumbled over the new "ML-features" in LUIS and I am not sure if I really understand how to use them correctly. The documentation seems very abstract and vague to me:
https://learn.microsoft.com/de-de/azure/cognitive-services/luis/luis-concept-feature
Besides a good general explanation, a solution for the following example would be very welcome:
Example
Intent: OpenABox
Sample utterances: "open the green box", "open the azure box".
Entity: ColorEntity (no prebuilt entity).
The color should understand "green", "blue", "azure" and "olive", where "olive" should be regarded as synonym to "green" and "azure" to "blue".
Solution Proposal
I assume you would have to
Add an intent
Add a list-entity, that lists all colors and assigns their synonyms?
Add a phrase list, that again lists some, but maybe not all, colors, without respect to their meaning?
Make the ML-feature global?
Mark the values as interchangeable?
Add an ML-entity, and assign the list entity as well as the phrase list as features?
Make the list-entity-feature required?
Add sample utterances and mark the entities with the list-entity? Or with the ML-entity? Or both?
Add the ML-Entity as feature to the intent? Or the phrase list? Or the list-entity? Or none at all?
Is it correct that there is no way to confirm the resolution of "olive" to its canonical form "green" using the test panel, so that I have to use the API to test this?
The Model
This model has been created as described above. It seems to do its job. But is this really the optimal way to do it? There seems to be a lot of redundancy in there.
{
"luis_schema_version": "7.0.0",
"intents": [
{
"name": "None",
"features": []
},
{
"name": "OpenABox",
"features": [
{
"modelName": "ColorMLEntity",
"isRequired": false
}
]
}
],
"entities": [
{
"name": "ColorMLEntity",
"children": [],
"roles": [],
"features": [
{
"featureName": "ColorPhraseList",
"isRequired": false
},
{
"modelName": "ColorListEntity",
"isRequired": true
}
]
}
],
"hierarchicals": [],
"composites": [],
"closedLists": [
{
"name": "ColorListEntity",
"subLists": [
{
"canonicalForm": "green",
"list": [
"olive"
]
},
{
"canonicalForm": "blue",
"list": [
"azure"
]
}
],
"roles": []
}
],
"prebuiltEntities": [],
"utterances": [
{
"text": "open the azure box",
"intent": "OpenABox",
"entities": [
{
"entity": "ColorMLEntity",
"startPos": 9,
"endPos": 13,
"children": []
}
]
},
{
"text": "open the green box",
"intent": "OpenABox",
"entities": [
{
"entity": "ColorMLEntity",
"startPos": 9,
"endPos": 13,
"children": []
}
]
}
],
"versionId": "0.1",
"name": "ColorTest",
"desc": "",
"culture": "en-us",
"tokenizerVersion": "1.0.0",
"patternAnyEntities": [],
"regex_entities": [],
"phraselists": [
{
"name": "ColorPhraseList",
"mode": true,
"words": "green,blue,azure,olive",
"activated": true,
"enabledForAllModels": false
}
],
"regex_features": [],
"patterns": [],
"settings": []
}

Features are signals that are relevant to an intent or an entity.
So for this example:
Create an ML entity "ColorEntity".
Label the utterances with it.
Add ColorEntity as a feature for the intent.
Then add a feature to ColorEntity, either a list entity or a phrase list; there is no need for both.
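To see why the list entity on its own already covers the synonym requirement, here is a minimal sketch in plain Python of the exact-match lookup that the "closedLists" part of the model describes. This is not LUIS code and does not call any LUIS API; the function names build_lookup and resolve_color are made up for illustration.

```python
# Illustration only: mimics the synonym lookup a list entity describes.
# It does not use the LUIS SDK; all names here are hypothetical.
SUB_LISTS = [
    {"canonicalForm": "green", "list": ["olive"]},
    {"canonicalForm": "blue", "list": ["azure"]},
]


def build_lookup(sub_lists):
    """Map every canonical form and synonym (lower-cased) to its canonical form."""
    lookup = {}
    for sub in sub_lists:
        canonical = sub["canonicalForm"]
        lookup[canonical.lower()] = canonical
        for synonym in sub["list"]:
            lookup[synonym.lower()] = canonical
    return lookup


def resolve_color(utterance, lookup):
    """Return the canonical color of the first matching word, or None."""
    for word in utterance.lower().split():
        if word in lookup:
            return lookup[word]
    return None


LOOKUP = build_lookup(SUB_LISTS)
print(resolve_color("open the azure box", LOOKUP))
```

A phrase list, by contrast, carries no canonical forms, so if you need "olive" to come back as "green", the list entity is the one to keep.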

Related

Convert data from spreadsheets to nested json

I'm using MongoDB in my project, and I'll eventually import about 20,000 products into the database. I tried to write a script that converts the data from a spreadsheet to JSON and then uploads it to MongoDB, but many fields were missing.
I'm trying to figure out how to lay out the spreadsheet so that it can contain nested data. The only resource I found is this package:
https://www.npmjs.com/package/spread-sheet-to-nested-json
But it has one problem: its output always contains "title" and "children", not the actual names of the fields.
This is my product json:
[
{
"sku": "ADX112",
"name": {
"en": "Multi-Mat Gallery Frames",
"ar": "لوحة بإطار"
},
"brand": "Dummy brand",
"description": {
"en": "Metal frame in a Black powder-coated finish. Tempered glass. 2 removable, acid-free paper mats included with each frame. Can be hung vertically and horizontally. D-rings included. 5x7 and 8x10 frames include easel backs. Sold individually. Made in China.",
"ar": "إطار اسود. صنع في الصين."
},
"tags": [
"art",
"frame",
"لوحة",
"إطار"
],
"colors": [
"#000000"
],
"dimensions": [
"5x7",
"8x10"
],
"units_in_stock": {
"5x7": 5,
"8x10": 7
},
"thumbnail": "https://via.placeholder.com/150",
"images": [
"https://via.placeholder.com/150",
"https://via.placeholder.com/150"
],
"unit_size": {
"en": [
"individual",
"set of 3"
],
"ar": [
"فردي",
"مجموعة من 3"
]
},
"unit_price": 2000,
"discount": 19,
"category_id": "631f3ca65b2310473b978ab5",
"subCategories_ids": [
"631f3ca65b2310473b978ab5",
"631f3ca65b2310473b978ab5"
],
"featured": false
}
]
How can I lay out a spreadsheet so that it can serve as a template for future imports?
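One possible layout, offered only as a sketch: keep one product per row and encode the nesting in the column headers, with dots for nested objects ("name.en") and a "[]" suffix for list columns whose cells hold "|"-separated values ("tags[]"). The header convention, the separator, and the function name nest_row are all assumptions for illustration, not something the package above provides.

```python
def nest_row(row, list_sep="|"):
    """Turn a flat row such as {"name.en": "x", "tags[]": "a|b"} into a nested dict.

    Hypothetical header convention:
      "a.b"  -> nested object {"a": {"b": value}}
      "a[]"  -> list, cell text split on list_sep
    """
    doc = {}
    for header, value in row.items():
        if value is None or value == "":
            continue  # design choice: an empty cell leaves the field out
        is_list = header.endswith("[]")
        path = header[:-2] if is_list else header
        if is_list:
            value = [part.strip() for part in str(value).split(list_sep)]
        *parents, leaf = path.split(".")
        node = doc
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate objects on demand
        node[leaf] = value
    return doc


row = {
    "sku": "ADX112",
    "name.en": "Multi-Mat Gallery Frames",
    "tags[]": "art | frame",
    "units_in_stock.5x7": 5,
    "unit_size.en[]": "individual|set of 3",
    "brand": "",
}
print(nest_row(row))
```

Each spreadsheet row read as a dict (for example by a CSV reader) could be passed through a function like this before inserting it into the database.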

Microsoft Teams Adaptive Card - Dark Mode Color Issue

I am trying to develop a simple messaging extension app for Microsoft Teams. Using Task Modules, I can load a simple Adaptive Card, and that works as designed. The only problem is that my Adaptive Card has a color issue within Microsoft Teams in Dark Mode.
Take a look at the image below. 1 shows a very simple Adaptive Card designed via https://adaptivecards.io/designer/ (preview mode). 2 the very same Adaptive Card but now an actual snippet from Microsoft Teams. As you can see the card below has some color issues which makes the input hard to see.
Here is the code I've used:
public async handleTeamsMessagingExtensionFetchTask(
    context: TurnContext,
    action: any
): Promise<any> {
    const adaptiveCard = CardFactory.adaptiveCard({
        "type": "AdaptiveCard",
        "body": [
            {
                "type": "TextBlock",
                "size": "Medium",
                "weight": "Bolder",
                "text": "${title}"
            },
            {
                "type": "Input.Text",
                "placeholder": "Placeholder text"
            }
        ],
        "actions": [
            {
                "type": "Action.Submit",
                "title": "Submit"
            }
        ],
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.3"
    });
    return {
        task: {
            type: 'continue',
            value: {
                card: adaptiveCard,
                height: 535,
                title: '${title}',
                url: null,
                width: 500
            }
        }
    };
}
There's not much that can be done about the input box itself in this case, but maybe try changing the colour of the label above it, something like this (I've changed the case of some of your property values as well, e.g. "medium" instead of "Medium"):
"size": "medium",
"weight": "bolder",
"text": "${title}",
"color": "good"
Color allows the following values:
"default"
"dark"
"light"
"accent"
"good"
"warning"
"attention"
If you nest your Text input in a Container, you can change the Container's style to provide a coloured background on which the Text input sits.
It's not necessarily right for your use case, but it could be a workaround of interest.
Container Style colour options
{
"type": "Container",
"style": "emphasis",
"bleed": true,
"items": [
{
"type": "TextBlock",
"text": "Request New Ticket",
"wrap": true,
"fontType": "Default",
"style": "heading",
"size": "Large",
"color": "Good",
"weight": "Bolder",
"horizontalAlignment": "Center"
}
]
}
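As a rough sketch of that workaround applied to the card from the question, the snippet below builds the card payload as a plain Python dict and wraps the Input.Text element in a Container with the "emphasis" style. The helper name wrap_in_container is hypothetical, and whether the result looks good in Teams Dark Mode would still need to be checked in the client.

```python
import json


def wrap_in_container(element, style="emphasis"):
    """Return a Container element that holds `element` and carries the given style."""
    return {"type": "Container", "style": style, "items": [element]}


# The input element from the question, unchanged.
text_input = {"type": "Input.Text", "placeholder": "Placeholder text"}

card = {
    "type": "AdaptiveCard",
    "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
    "version": "1.3",
    "body": [
        {"type": "TextBlock", "size": "Medium", "weight": "Bolder", "text": "${title}"},
        wrap_in_container(text_input),
    ],
    "actions": [{"type": "Action.Submit", "title": "Submit"}],
}

print(json.dumps(card, indent=2))
```

The same structure can be pasted into the designer to compare styles before wiring it into the bot code.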

ElasticSearch: Suggestion Completion Multi Search

I am using the suggestion API within ES with completion. My implementation works (code below), but I would like to search for multiple words within a single query. In the example below, if I search for "word", it finds "wordpress" and outputs "Found". What I am trying to accomplish is querying with something like "word blog magazine", which are all tags, and getting an output of "Found". Any help would be appreciated!
Mapping:
curl -XPUT "http://localhost:9200/test_index/" -d'
{
"mappings": {
"product": {
"properties": {
"description": {
"type": "string"
},
"tags": {
"type": "string"
},
"title": {
"type": "string"
},
"tag_suggest": {
"type": "completion",
"index_analyzer": "simple",
"search_analyzer": "simple",
"payloads": false
}
}
}
}
}'
Add document:
curl -XPUT "http://localhost:9200/test_index/product/1" -d'
{
"title": "Product1",
"description": "Product1 Description",
"tags": [
"blog",
"magazine",
"responsive",
"two columns",
"wordpress"
],
"tag_suggest": {
"input": [
"blog",
"magazine",
"responsive",
"two columns",
"wordpress"
],
"output": "Found"
}
}'
_suggest query:
curl -XPOST "http://localhost:9200/test_index/_suggest" -d'
{
"product_suggest":{
"text":"word",
"completion": {
"field" : "tag_suggest"
}
}
}'
The results are as we would expect:
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"product_suggest": [
{
"text": "word",
"offset": 0,
"length": 4,
"options": [
{
"text": "Found",
"score": 1
}
]
}
]
}
If you're willing to switch to using edge ngrams (or full ngrams if you need them), I think it will solve your problem.
I wrote up a pretty detailed explanation of how to do this in this blog post:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
But I'll give you a quick and dirty version here. The trick is to use ngrams together with the _all field and the match AND operator.
So with this mapping:
PUT /test_index
{
"settings": {
"analysis": {
"filter": {
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"ngram_filter"
]
}
}
}
},
"mappings": {
"doc": {
"_all": {
"type": "string",
"analyzer": "ngram_analyzer",
"search_analyzer": "standard"
},
"properties": {
"word": {
"type": "string",
"include_in_all": true
},
"definition": {
"type": "string",
"include_in_all": true
}
}
}
}
}
and some documents:
PUT /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"word":"democracy", "definition":"government by the people; a form of government in which the supreme power is vested in the people and exercised directly by them or by their elected agents under a free electoral system."}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"word":"republic", "definition":"a state in which the supreme power rests in the body of citizens entitled to vote and is exercised by representatives chosen directly or indirectly by them."}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"word":"oligarchy", "definition":"a form of government in which all power is vested in a few persons or in a dominant class or clique; government by the few."}
{"index":{"_index":"test_index","_type":"doc","_id":4}}
{"word":"plutocracy", "definition":"the rule or power of wealth or of the wealthy."}
{"index":{"_index":"test_index","_type":"doc","_id":5}}
{"word":"theocracy", "definition":"a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."}
{"index":{"_index":"test_index","_type":"doc","_id":6}}
{"word":"monarchy", "definition":"a state or nation in which the supreme power is actually or nominally lodged in a monarch."}
{"index":{"_index":"test_index","_type":"doc","_id":7}}
{"word":"capitalism", "definition":"an economic system in which investment in and ownership of the means of production, distribution, and exchange of wealth is made and maintained chiefly by private individuals or corporations, especially as contrasted to cooperatively or state-owned means of wealth."}
{"index":{"_index":"test_index","_type":"doc","_id":8}}
{"word":"socialism", "definition":"a theory or system of social organization that advocates the vesting of the ownership and control of the means of production and distribution, of capital, land, etc., in the community as a whole."}
{"index":{"_index":"test_index","_type":"doc","_id":9}}
{"word":"communism", "definition":"a theory or system of social organization based on the holding of all property in common, actual ownership being ascribed to the community as a whole or to the state."}
{"index":{"_index":"test_index","_type":"doc","_id":10}}
{"word":"feudalism", "definition":"the feudal system, or its principles and practices."}
{"index":{"_index":"test_index","_type":"doc","_id":11}}
{"word":"monopoly", "definition":"exclusive control of a commodity or service in a particular market, or a control that makes possible the manipulation of prices."}
{"index":{"_index":"test_index","_type":"doc","_id":12}}
{"word":"oligopoly", "definition":"the market condition that exists when there are few sellers, as a result of which they can greatly influence price and other market factors."}
I can apply partial matching across both fields (would work with as many fields as you want) like this:
POST /test_index/_search
{
"query": {
"match": {
"_all": {
"query": "theo go",
"operator": "and"
}
}
}
}
which in this case, returns:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.7601639,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "5",
"_score": 0.7601639,
"_source": {
"word": "theocracy",
"definition": "a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."
}
}
]
}
}
Here is the code I used here (there's more in the blog post):
http://sense.qbox.io/gist/e4093c25a8257499f54ced5a09f35b1eb48e4e3c
Hope that helps.
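For anyone who wants to see why this works without spinning up a cluster, here is a small pure-Python sketch of the idea: index-time edge n-grams plus a query that requires every search token to be present. It only approximates the analyzers (a regex stands in for the standard tokenizer), and the function names are mine, not Elasticsearch APIs.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=20):
    """All prefixes of `token` whose length is between min_gram and max_gram."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def tokenize(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    return re.findall(r"\w+", text.lower())


def index_terms(doc):
    """Edge n-grams of every token in every field, like the n-grammed _all field."""
    terms = set()
    for value in doc.values():
        for token in tokenize(value):
            terms |= edge_ngrams(token)
    return terms


def matches_all(query, doc):
    """True when every query token is an indexed term (match with operator 'and')."""
    terms = index_terms(doc)
    return all(token in terms for token in tokenize(query))


theocracy = {
    "word": "theocracy",
    "definition": "a form of government in which God or a deity is recognized "
                  "as the supreme civil ruler",
}
product = {"tags": "blog magazine responsive two columns wordpress"}

print(matches_all("theo go", theocracy))
print(matches_all("word blog magazine", product))
```

The second call corresponds to the original question: "word" matches as a prefix of "wordpress", and "blog" and "magazine" match as whole tags.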

Marklogic 8 Node.js API - How can I scope a search on a property child of root?

[updated 17:15 on 28/09]
I'm manipulating json data of type:
[
{
"id": 1,
"title": "Sun",
"seeAlso": [
{
"id": 2,
"title": "Rain"
},
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 2,
"title": "Rain",
"seeAlso": [
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 3,
"title": "Cloud",
"seeAlso": [
{
"id": 1,
"title": "Sun"
}
]
},
];
After inclusion in the database, a node.js search using
db.documents.query(
    q.where(
        q.collection('test films'),
        q.value('title','Sun')
    ).withOptions({categories: 'none'})
)
.result( function(results) {
    console.log(JSON.stringify(results, null,2));
});
will return both the film titled 'Sun' and the films which have a seeAlso/title property (forgive the xpath syntax) = 'Sun'.
I need to find 1/ films with title = 'Sun' 2/ films with seeAlso/title = 'Sun'.
I tried a container query using q.scope() with no success; I can't find how to scope the root object node (first case), and for the second case,
q.where(q.scope(q.property('seeAlso'), q.value('title','Sun')))
returns as first result an item which matches all text inside the root object node
{
"index": 1,
"uri": "/1.json",
"path": "fn:doc(\"/1.json\")",
"score": 137216,
"confidence": 0.6202662,
"fitness": 0.6701325,
"href": "/v1/documents?uri=%2F1.json&database=Documents",
"mimetype": "application/json",
"format": "json",
"matches": [
{
"path": "fn:doc(\"/1.json\")/object-node()",
"match-text": [
"Sun Rain Cloud"
]
}
]
},
which seems crazy.
Any idea how to do such searches on denormalized JSON data?
Laurent:
XPaths on JSON are supported by MarkLogic.
In particular, you might consider setting up a path range index to match /title at the root:
http://docs.marklogic.com/guide/admin/range_index#id_54948
Scoped property matching requires either filtering or indexed positions to be accurate. An alternative is to set up another path range index on /seeAlso/title.
For the match issue it would be useful to know the MarkLogic version and to see the entire query.
Hoping that helps,
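To make the distinction between the two paths concrete, here is a tiny Python sketch over the sample data from the question. It is not MarkLogic code and uses no MarkLogic API; it only shows which documents a match on /title versus a match on /seeAlso/title is expected to select. The function names are invented for the example.

```python
FILMS = [
    {"id": 1, "title": "Sun",
     "seeAlso": [{"id": 2, "title": "Rain"}, {"id": 3, "title": "Cloud"}]},
    {"id": 2, "title": "Rain",
     "seeAlso": [{"id": 3, "title": "Cloud"}]},
    {"id": 3, "title": "Cloud",
     "seeAlso": [{"id": 1, "title": "Sun"}]},
]


def ids_by_root_title(docs, title):
    """Documents whose top-level title equals `title` (the /title path)."""
    return [doc["id"] for doc in docs if doc.get("title") == title]


def ids_by_see_also_title(docs, title):
    """Documents with a seeAlso entry titled `title` (the /seeAlso/title path)."""
    return [
        doc["id"]
        for doc in docs
        if any(ref.get("title") == title for ref in doc.get("seeAlso", []))
    ]


print(ids_by_root_title(FILMS, "Sun"))
print(ids_by_see_also_title(FILMS, "Sun"))
```

An unscoped value query on 'title' behaves like the union of the two functions, which is why both kinds of document come back in the question.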

Elasticsearch index short words + make indexes applying EdgeNGram

I am using Elasticsearch with an EdgeNGram filter which is set as follows:
"edgeNGram": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 15,
},
The problem is that when I make a query using very short words, they are completely omitted from the search. Let's say I type in "Vitamin C": this gives me results for the first term, "Vitamin", only. Is there any way to tell Elasticsearch not to apply the EdgeNGram filter when indexing words shorter than 3 characters?
Thank you.
EDIT:
These are my settings:
ELASTICSEARCH_INDEX_SETTINGS = {
"settings": {
"analysis": {
"analyzer": {
"sk_hunspell": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"sk_lowercase", "sk_SK", "stopwords_SK",
"edgeNGram", "asciifolding",
"remove_duplicities",
]
},
},
"filter": {
"sk_SK": {
"type": "hunspell",
"locale": "sk_SK",
"dedup": True,
"recursion_level": 0,
"ignore_case": True,
},
"sk_lowercase": {
"type": "lowercase",
},
"stopwords_SK": {
"type": "stop",
"stopwords": STOPWORDS_SK,
},
"remove_duplicities": {
"type": "unique",
"only_on_same_position": True
},
"edgeNGram": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 15,
"token_chars": ["letter", "digit"],
},
},
}
}
}
In the database I store information about vitamins, minerals and medicinal plants (their use, collecting, blooming, health benefits etc.). The information is written in Slovak. (The names of the plants and minerals are also stored in Czech and Latin.)
This idea may be a hack, but you could pad words shorter than 3 characters with a special character before inserting them into the index, so that they have length 3.
When you accept the user's query, you would also have to pad any words shorter than 3 characters with the same special character.
You would need to create a custom tokenizer for this.
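A minimal sketch of the padding idea, done in application code rather than inside Elasticsearch, might look like the following. The pad character "_" and the function names are assumptions for illustration; the edge n-gram function only imitates a filter with min_gram 3 and max_gram 15.

```python
PAD_CHAR = "_"   # assumed pad character; pick one that never occurs in real words
MIN_GRAM = 3
MAX_GRAM = 15


def pad_short(token, min_len=MIN_GRAM, pad=PAD_CHAR):
    """Right-pad tokens shorter than min_len so they survive the n-gram filter."""
    return token.ljust(min_len, pad)


def edge_ngrams(token, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Prefixes of `token` from min_gram to max_gram characters, shortest first."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def analyze(text):
    """Lowercase, split on whitespace, pad short tokens, then emit edge n-grams."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(pad_short(token)))
    return grams


print(edge_ngrams("c"))        # without padding the short token yields nothing
print(analyze("Vitamin C"))    # with padding it yields one gram of its own
```

The same pad_short step has to run on the user's query text, otherwise the padded term in the index will never be matched.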
