How to get confidence score for detected entities? - azure

When I call the LUIS API, I get confidence scores associated with my intents. I also get a list of entities, but I don't get the corresponding confidence scores. How do I get the confidence scores?

This depends somewhat on how you're calling the API (directly, or through some connector/recognizer). My answer assumes you're calling directly via the URL. In that case, whether you get a confidence score or not depends on the type of entity. Regex and List entities aren't going to have a confidence, because they are only identified on a 100% match. Machine Learned entities do come with a confidence score. I'm not sure about any other entity types or features. Here is an example payload from my application: you can see that I have both an orderNumberML and an orderNumber entity, the former being Machine Learned with a confidence value and the latter Regex without one. You have to go into the $instance property, as the top-level json.prediction.entities just gives you the list of values without any additional context. A sketch of how to read the scores follows the payload.
{
  "query": "what is the status of order ABC123 and order DEF456?",
  "prediction": {
    "topIntent": "viewOrder",
    "intents": {
      "viewOrder": { "score": 0.999304056 },
      "cancelChangeQuantity": { "score": 0.0195436124 },
      "escalate": { "score": 0.018896237 },
      "qna": { "score": 0.0164053086 },
      "changeShipMethod": { "score": 0.0147548188 },
      "expediteOrder": { "score": 0.0100477394 },
      "mainMenu": { "score": 0.00383487041 },
      "requestCoc": { "score": 0.00324145844 },
      "orderShortage": { "score": 0.00208944362 },
      "Utilities.Help": { "score": 0.00205096183 },
      "generalSupport": { "score": 0.001971956 },
      "trcSupport": { "score": 0.00169838977 },
      "trcEscalate": { "score": 0.00165500911 },
      "getPricing": { "score": 0.00135509949 },
      "getAvailability": { "score": 0.00125210814 },
      "orderOverage": { "score": 0.000846677169 },
      "srStatus": { "score": 0.0006817043 },
      "shippingProblem": { "score": 0.000577154336 },
      "warrantyClaim": { "score": 0.000458181225 },
      "getTranscript": { "score": 0.000367239147 },
      "None": { "score": 0.000275740429 },
      "manageProfile": { "score": 0.0002755769 },
      "confirmShipDate": { "score": 0.0001726267 },
      "Utilities.Cancel": { "score": 7.628063E-05 }
    },
    "entities": {
      "orderNumberML": [ "ABC123", "DEF456" ],
      "orderNumber": [ "ABC123", "DEF456" ],
      "$instance": {
        "orderNumberML": [
          {
            "type": "orderNumberML",
            "text": "ABC123",
            "startIndex": 28,
            "length": 6,
            "score": 0.916349649,
            "modelTypeId": 1,
            "modelType": "Entity Extractor",
            "recognitionSources": [ "model" ]
          },
          {
            "type": "orderNumberML",
            "text": "DEF456",
            "startIndex": 45,
            "length": 6,
            "score": 0.9027585,
            "modelTypeId": 1,
            "modelType": "Entity Extractor",
            "recognitionSources": [ "model" ]
          }
        ],
        "orderNumber": [
          {
            "type": "orderNumber",
            "text": "ABC123",
            "startIndex": 28,
            "length": 6,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [ "model" ]
          },
          {
            "type": "orderNumber",
            "text": "DEF456",
            "startIndex": 45,
            "length": 6,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [ "model" ]
          }
        ]
      }
    }
  }
}
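For completeness, here is a minimal sketch of how you might pull those per-entity scores out of a response like the one above. It assumes the parsed JSON is in a variable named response (the variable name and the log format are mine, not part of the LUIS API):

const instances = response.prediction.entities.$instance || {};

// Walk the $instance metadata, which carries the per-entity details.
// Machine Learned entities include a "score"; Regex/List entities do not.
for (const [entityName, matches] of Object.entries(instances)) {
  for (const match of matches) {
    console.log(
      `${entityName}: "${match.text}" -> ` +
      (match.score !== undefined ? match.score : 'no score (exact-match entity)')
    );
  }
}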

Related

Couchdb 2 _find query not using index

I'm struggling with something that should be easy, but it's making no sense to me. I have these two documents in a database:
{ "name": "foo", "type": "typeA" },
{ "name": "bar", "type": "typeB" }
And I'm posting this to _find:
{
  "selector": {
    "type": "typeA"
  },
  "sort": ["name"]
}
This works as expected, but I get a warning that there's no matching index, so I've tried posting various combinations of the following to _index, which makes no difference:
{ "index": { "fields": ["type"] } }
{ "index": { "fields": ["name"] } }
{ "index": { "fields": ["name", "type"] } }
If I remove the sort by name and only index the type, it works fine, except the results aren't sorted. Is this a limitation of CouchDB's Mango implementation, or am I missing something?
Using a view and map function works fine, but I'm curious what Mango is/isn't doing here.
With just the type index, I think it will normally be almost as efficient, unless you have many documents of each type (as it has to do the sorting stage in memory). But since index fields are ordered, it would be necessary to use:
{
  "index": {
    "fields": ["type", "name"]
  }
}
to have a contiguous slice of this index for each type that is already ordered by name. But the query planner may not determine that this index applies.
As an example, the current pouchdb-find (which should be similar) needs the more complicated but equivalent query:
{
  selector: { type: 'typeA', name: { $gte: null } },
  sort: ['type', 'name']
}
to choose this index and build a plan that doesn't resort to building in memory for any step.
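Putting it together, a rough end-to-end sketch with the pouchdb-find plugin (the database name 'mydb' is illustrative; the documents and query are from the question; treat this as a sketch, not a definitive recipe):

const PouchDB = require('pouchdb');
PouchDB.plugin(require('pouchdb-find'));

const db = new PouchDB('mydb'); // illustrative database name

async function run() {
  // Index ordered as [type, name], so each type's slice is already sorted by name.
  await db.createIndex({ index: { fields: ['type', 'name'] } });

  const result = await db.find({
    // name: {$gte: null} nudges the planner to pick the [type, name] index.
    selector: { type: 'typeA', name: { $gte: null } },
    sort: ['type', 'name']
  });
  console.log(result.docs); // typeA docs, ordered by name
}

run().catch(console.error);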

How to pull embedded docs from MongoDB query into array?

I have a variable var correctAnswers;
In my MongoDB I have the following document (below). I am trying to write a query that takes all of the "correct" fields from the "quiz" field and puts them into their own array, so I can assign that array to var correctAnswers.
"title" : "Economics questions"
"quiz": "[{
"question": "Which of these involves the analysis of of a business's financial statements, often used in stock valuation?",
"choices": ["Fundamental analysis", "Technical analysis"],
"correct": 0
}, {
"question": "What was the name of the bond purchasing program started by the U.S. Federal Reserve in response to the 2008 financial crisis?",
"choices": ["Stimulus Package", "Mercantilism", "Quantitative Easing"],
"correct": 2
}, {
"question": "Which term describes a debt security issued by a government, company, or other entity?",
"choices": ["Bond", "Stock", "Mutual fund"],
"correct": 0
}, {
"question": "Which of these companies has the largest market capitalization (as of October 2015)?",
"choices": ["Microsoft", "General Electric", "Apple", "Bank of America"],
"correct": 2
}, {
"question": "Which of these is a measure of the size of an economy?",
"choices": ["Unemployment rate", "Purchasing power index", "Gross Domestic Product"],
"correct": 2
}]"
How should I go about that, or can someone point me in the right direction? I have tried projections, but should I do an aggregation? Thank you for any help.
Edit for clarity: the output I am looking for in this example is an array, [0,2,0,2,2]
You can get this result:
[{correct:0},{correct:2},{correct:0},{correct:2},{correct:2}]
but a [0,2,0,2,2]-style result is not possible unless we use distinct.
db.quiz.aggregate([
  // Initial document match (uses an index, if a suitable one is available)
  { $match: {
      "title": "Economics questions"
  }},
  // Convert the embedded array into a stream of documents
  { $unwind: "$quiz" },
  // Note: could add a $group by _id here if multiple matches are expected
  // Final projection: exclude fields with 0, include fields with 1
  { $project: {
      _id: 0,
      score: "$quiz.correct"
  }}
])
db.users.find({}, { "quiz.correct": 1, "_id": 0 })
// The above query returns the following output:
{
  "quiz": [
    { "correct": 0 },
    { "correct": 2 },
    { "correct": 0 },
    { "correct": 2 },
    { "correct": 2 }
  ]
}
Process this output as required in Node.js.
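For instance, a one-line sketch of that post-processing step (assuming the result document above is in a variable named doc, which is mine, not from the question):

// Flatten [{correct: 0}, {correct: 2}, ...] into [0, 2, 0, 2, 2]
const correctAnswers = doc.quiz.map(answer => answer.correct);
console.log(correctAnswers); // [0, 2, 0, 2, 2]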
Try this:
db.getCollection('quize').aggregate([
  { $match: { _id: id } },
  { $unwind: '$quiz' },
  { $group: {
      _id: null,
      score: { $push: "$quiz.correct" }
  }}
])
It will give you the expected output.
One way to achieve this is through aggregation:
db.collectionName.aggregate([
  // Use an indexed key in the $match stage; "title" is used here just as an example
  { $match: { "title": "Economics questions" } },
  { $unwind: "$quiz" },
  { $group: {
      _id: null,
      quiz: { $push: "$quiz.correct" }
  }},
  // This stage is not required; use a projection only if you want to exclude/include fields
  { $project: { _id: 0, quiz: 1 } }
])
The above query will give you the following output:
{
  "quiz": [ 0, 2, 0, 2, 2 ]
}
Then simply process this output as needed.
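If you're running this from Node.js rather than the mongo shell, a sketch with the official driver (3.x-style API; the connection string, database, and collection names are placeholders) might look like:

const { MongoClient } = require('mongodb');

async function getCorrectAnswers() {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // placeholder URI
  try {
    const collection = client.db('test').collection('quizzes'); // placeholder names
    const results = await collection.aggregate([
      { $match: { title: 'Economics questions' } },
      { $unwind: '$quiz' },
      { $group: { _id: null, quiz: { $push: '$quiz.correct' } } },
      { $project: { _id: 0, quiz: 1 } }
    ]).toArray();
    return results.length ? results[0].quiz : []; // e.g. [0, 2, 0, 2, 2]
  } finally {
    await client.close();
  }
}

getCorrectAnswers().then(correctAnswers => console.log(correctAnswers));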

Elasticsearch query_string combined with match_phrase

I think it's best if I describe my intent and try to break it down to code.
I want users to have the ability to use the complex queries that query_string offers, should they choose to: for example 'AND', 'OR', '~', etc.
I want fuzziness in effect, which has made me do things I feel dirty about, like appending "#{query}~" to what is sent to ES. In other words, I am specifying a fuzzy query on the user's behalf, because we offer transliteration and it can be difficult to get the exact spelling.
At times, users search for a number of words that are supposed to be in a phrase. query_string searches for them individually, not as a phrase. For example, 'he who will' should bring me a top match where those three words appear in that order, and then give me whatever else afterward.
Current query:
{
  "indices_boost": {},
  "aggregations": {
    "by_ayah_key": {
      "terms": {
        "field": "ayah.ayah_key",
        "size": 6236,
        "order": {
          "average_score": "desc"
        }
      },
      "aggregations": {
        "match": {
          "top_hits": {
            "highlight": {
              "fields": {
                "text": {
                  "type": "fvh",
                  "matched_fields": [
                    "text.root",
                    "text.stem_clean",
                    "text.lemma_clean",
                    "text.stemmed",
                    "text"
                  ],
                  "number_of_fragments": 0
                }
              },
              "tags_schema": "styled"
            },
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "include": [
                "text",
                "resource.*",
                "language.*"
              ]
            },
            "size": 5
          }
        },
        "average_score": {
          "avg": {
            "script": "_score"
          }
        }
      }
    }
  },
  "from": 0,
  "size": 0,
  "_source": [
    "text",
    "resource.*",
    "language.*"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "inna alatheena",
            "fuzziness": 1,
            "fields": [
              "text^1.6",
              "text.stemmed"
            ],
            "minimum_should_match": "85%"
          }
        }
      ],
      "should": [
        {
          "match": {
            "text": {
              "query": "inna alatheena",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}
Note: searching for alatheena without the ~ will not return anything, although I have allatheena in the index. So I must do a fuzzy search.
Any thoughts?
I see that you're doing ES indexing of Qur'anic verses, +1 ...
Much of your problem domain, if I understood it correctly, can be solved simply by storing lots of transliteration variants (and permutations of their combining) in a separate field on your Aayah documents.
First off, you should make a char filter that replaces all double letters with single letters: [aa] => [a], [ll] => [l].
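As a rough sketch, such a filter could be a pattern_replace char filter in the index settings (the index name and analyzer wiring here are illustrative, not from the question):

PUT /verses
{
  "settings": {
    "analysis": {
      "char_filter": {
        "collapse_doubles": {
          "type": "pattern_replace",
          "pattern": "(\\w)\\1",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "transliteration": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["collapse_doubles"],
          "filter": ["lowercase"]
        }
      }
    }
  }
}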
Maybe also make a separate field containing all of [a, e, i] (because of their "vocative"/transcribal ambiguity) replaced with € or something similar, and do the same while querying in order to get as many matches as possible...
Also, TH in "allatheena" (which as a footnote may really be Dhaal, Thaa, Zhaa, Taa+Haa, Taa+Hhaa, Ttaa+Hhaa transcribed ...) should be replaced by something, or both the Dhaal AND the Thaa should be transcribed multiple times.
Then, because it's Qur'anic script, all Alefs without diacritics, Hamza, Madda, etc. should be treated as Alef (or Hamzat) ul-Wasl, and that should also be considered when indexing / searching, because of Waqf / Wasl in reading Arabic (consider all the Wasl's in the first Aayah of Surat Al-Alaq, for example).
Dunno if this is answering your question in any way, but I hope it's of some assistance in implementing your application nonetheless.
You should use Dis Max Query to achieve that.
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost.
A quick example of how to use it:
POST /_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "match": {
            "text": {
              "query": "inna alatheena",
              "type": "phrase",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "text": {
              "query": "inna alatheena",
              "type": "phrase",
              "fuzziness": "AUTO",
              "boost": 3
            }
          }
        },
        {
          "query_string": {
            "default_field": "text",
            "query": "inna alatheena"
          }
        }
      ]
    }
  }
}
It will run all of your queries, and for each document the highest-scoring query will be taken. So just define your rules using it; you should achieve what you wanted.

Getting joined data from strongloop/loopback

How can I get data from two joined tables?
Suppose there are two models, Category (CategoryId, CategoryName) and Product (ProductId, ProductName, CategoryId). Is there a way to get a result like (ProductId, ProductName, CategoryId, CategoryName)?
There should be a relation between your Category model and your Product model: a Category hasMany Products, and each Product belongsTo a Category. So your model JSON files should be something like:
Category:
{
  "name": "Category",
  "properties": {
    "CategoryId": {
      "type": "Number",
      "id": 1
    },
    "CategoryName": {
      "type": "String"
    }
  },
  "relations": {
    "products": {
      "type": "hasMany",
      "model": "Product",
      "foreignKey": "CategoryId"
    }
  }
}
Product:
{
  "name": "Product",
  "properties": {
    "ProductId": {
      "type": "Number",
      "id": 1
    },
    "ProductName": {
      "type": "String"
    },
    "CategoryId": {
      "type": "Number"
    }
  },
  "relations": {
    "category": {
      "type": "belongsTo",
      "model": "Category",
      "foreignKey": "CategoryId"
    }
  }
}
Of course, these JSON definitions must be completed with the corresponding options and datasource properties, which only you know.
The relation between those two models will add endpoints to the LoopBack explorer, so you can query for the products in a certain category:
GET /api/Categorys/:id_category/products
and for the category to which a product belongs:
GET /api/Products/:id_product/category
Please note that, unless you specify a plural option for Category, its plural will be Categorys. This is not a typo.
Finally, if you want to query for a product and its category, you would use the include filter:
GET /api/Products/:id_product?filter[include]=category
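The same include filter also works from server-side code; a rough sketch using the standard LoopBack model API (productId and the app wiring are placeholders):

// `app` is your LoopBack application object (e.g. required from server/server.js)
var Product = app.models.Product;

Product.findById(productId, { include: 'category' }, function (err, product) {
  if (err) throw err;
  // The included relation is serialized alongside the product's own fields.
  var result = product.toJSON();
  console.log(result.ProductName, result.category.CategoryName);
});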
hope it helps.

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I have in mind a query like select full_name from authors, so I can display the list to the users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly, you need to make sure you're not tokenizing it while indexing; otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it, you can index it in two different ways using a multi-field (a mapping sketch follows below).
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
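For reference, a mapping along those lines might look like this (pre-5.x string syntax, to match the 'not_analyzed' advice above; the index and type names are illustrative):

PUT /authors
{
  "mappings": {
    "author": {
      "properties": {
        "full_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}

The analyzed full_name field stays searchable, while full_name.raw can be used for facets/aggregations.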
For Elasticsearch 1.0 and later, you can leverage the terms aggregation to do this.
Query DSL:
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "",
        "size": 10
      }
    }
  }
}
A real example:
{
  "aggs": {
    "full_name": {
      "terms": {
        "field": "authors",
        "size": 0
      }
    }
  }
}
Then you can get all unique values of the authors field.
size=0 means no limit on the number of terms (this requires ES 1.1.0 or later).
Response:
{
  ...
  "aggregations": {
    "full_name": {
      "buckets": [
        {
          "key": "Ken",
          "doc_count": 10
        },
        {
          "key": "Jim Gray",
          "doc_count": 10
        }
      ]
    }
  }
}
see Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[
  { "author": "Brian Kernighan" },
  { "author": "Charles Dickens" }
]
What did not work: Plain aggregation
{
  "aggs": {
    "full_name": {
      "terms": {
        "field": "author"
      }
    }
  }
}
I got the following error:
{
  "error": {
    "root_cause": [
      {
        "reason": "Fielddata is disabled on text fields by default...",
        "type": "illegal_argument_exception"
      }
    ]
  }
}
What worked like a charm: appending .keyword to the field
{
  "aggs": {
    "full_name": {
      "terms": {
        "field": "author.keyword"
      }
    }
  }
}
And the sample output could be:
{
  "aggregations": {
    "full_name": {
      "buckets": [
        {
          "doc_count": 372,
          "key": "Charles Dickens"
        },
        {
          "doc_count": 283,
          "key": "Brian Kernighan"
        }
      ],
      "doc_count": 1000
    }
  }
}
Bonus tip:
Let us assume the field in question is nested as follows:
[
  {
    "authors": [
      {
        "details": [
          { "name": "Brian Kernighan" }
        ]
      }
    ]
  },
  {
    "authors": [
      {
        "details": [
          { "name": "Charles Dickens" }
        ]
      }
    ]
  }
]
Now the correct query becomes:
{
  "aggregations": {
    "full_name": {
      "nested": {
        "path": "authors.details"
      },
      "aggregations": {
        "author_details": {
          "terms": {
            "field": "authors.details.name"
          }
        }
      }
    }
  },
  "size": 0
}
Working in Elasticsearch 5.2.2:
curl -XGET 'http://localhost:9200/articles/_search?pretty' -d '
{
  "aggs": {
    "whatever": {
      "terms": { "field": "yourfield", "size": 10000 }
    }
  },
  "size": 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means by default you cannot search on the full_name field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
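For illustration, here is a rough search_after sketch with the Elasticsearch JavaScript client (7.x-style response shape; the node URL, index, and sort field are placeholders, and a real implementation should add a unique tiebreaker field to the sort):

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' }); // placeholder node

async function fetchAll() {
  const hits = [];
  let searchAfter; // sort values of the last hit from the previous page

  while (true) {
    const body = {
      size: 1000,
      sort: [{ 'full_name.keyword': 'asc' }], // add a unique tiebreaker in production
      ...(searchAfter ? { search_after: searchAfter } : {})
    };
    const response = await client.search({ index: 'authors', body });
    const page = response.body.hits.hits;
    if (page.length === 0) break;
    hits.push(...page);
    searchAfter = page[page.length - 1].sort; // live cursor for the next page
  }
  return hits;
}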
