Parsing Spacy's output - nlp

I understand spacy is parsing the given sentence and doing a POS tagging for the same.But after the sentence is parsed, i would like to get some sense of the output.
set an alarm for 7 PM tomorrow,
Expected output
{
Intent : set_alarm,
entity : { "time" : 7PM, "date": tomorrow}
}
Output from spacy :
[
{
word: "Set",
lemma: "set",
NE: "",
POS_fine: "JJ",
POS_coarse: "ADJ",
arc: "ROOT",
children: [
{
word: "alarm",
lemma: "alarm",
NE: "",
POS_fine: "NN",
POS_coarse: "NOUN",
arc: "dobj",
children: [ ]
},
{
word: "for",
lemma: "for",
NE: "",
POS_fine: "IN",
POS_coarse: "ADP",
arc: "prep",
children: [
{
word: "9 pm",
lemma: "9 pm",
NE: "TIME",
POS_fine: "NN",
POS_coarse: "NOUN",
arc: "pobj",
children: [ ]
}
]
},
{
word: "today",
lemma: "today",
NE: "",
POS_fine: "NN",
POS_coarse: "NOUN",
arc: "npadvmod",
children: [ ]
}
]
}
]

Your output is a parse tree. You're also given part of speech information (POS) and recognized named entities (NE). What you provide as expected output is called intent detection as far as I remember, please also see this ticket.

Related

Arango DB Filter query for print array of value

Given the following document structure:
{
"name": [
{
"use": "official",
"family": "Chalmers",
"given": [
"Peter",
"James"
]
},
{
"use": "usual",
"given": [
"Jim"
]
},
{
"use": "maiden",
"family": "Windsor",
"given": [
"Peter",
"James"
]
}
]
}
Query:
FOR client IN Patient FILTER client.name[*].use=='official' RETURN client.name[*].given
I have telecom and name array.
I want to query to compare if name[*].use=='official' then print corresponding give array.
Expected result:
"given": [
"Peter",
"James"
]
client.name[*].use is an array, so you need to use an array operator. It can be either of the following:
'string' in doc.attribute
doc.attribute ANY == 'string'
doc.attribute ANY IN ['string']
To return just the given names from the 'official' array, you can use a subquery:
RETURN { given:
FIRST(FOR name IN client.name FILTER name.use == 'official' LIMIT 1 RETURN name.given)
}
Alternatively, you can use an inline expression:
FOR client IN Patient
FILTER 'official' IN client.name[*].use
RETURN { given:
FIRST(client.name[* FILTER CURRENT.use == 'official' LIMIT 1 RETURN CURRENT.given])
}
Result:
[
{
"given": [
"Peter",
"James"
]
}
]
In your original post, the example document and query didn't match, but assuming the following structure:
{
"telecom": [
{
"use": "official",
"value": "+1 (03) 5555 6473 82"
},
{
"use": "mobile",
"value": "+1 (252) 5555 910 920 3"
}
],
"name": [
{
"use": "official",
"family": "Chalmers",
"given": [
"Peter",
"James"
]
},
{
"use": "usual",
"given": [
"Jim"
]
},
{
"use": "maiden",
"family": "Windsor",
"given": [
"Peter",
"James"
]
}
]
}
… here is a possible query:
FOR client IN Patient
FILTER LENGTH(client.telecom[* FILTER
CONTAINS(CURRENT.value, "(03) 5555 6473") AND
CURRENT.use == 'official']
)
RETURN {
given: client.name[* FILTER CURRENT.use == 'official' RETURN CURRENT.given]
}
Note that client.telecom[*].value LIKE "..." causes the array of phone numbers to be cast to a string "[\"+1 (03) 5555 6473 82\",\"+1 (252) 5555 910 920 3\"]" against which the LIKE operation is run - this kind of works, but it's not ideal.
CONTAINS() is also faster than LIKE with % wildcards on both sides.
It would be possible that there are multiple 'official' elements, which might require an extra level of array nesting. Above query produces:
[
{
"given": [
[
"Peter",
"James"
]
]
}
]
If you know that there is only one element or restrict it to one element explicitly then you can get rid of one of the wrapping square brackets with FIRST() or FLATTEN().

python dictionary how can create (structured) unique dictionary list if the key contains list of values of other keys

I have below unstructured dictionary list which contains values of other keys in a list .
I am not sure if the question i ask is strange. this is the actual dictionary payload that we receive from source which not aligned with respective entry
[
{
"dsply_nm": [
"test test",
"test test",
"",
""
],
"start_dt": [
"2021-04-21T00:01:00-04:00",
"2021-04-21T00:01:00-04:00",
"2021-04-21T00:01:00-04:00",
"2021-04-21T00:01:00-04:00"
],
"exp_dt": [
"2022-04-21T00:01:00-04:00",
"2022-04-21T00:01:00-04:00",
"2022-04-21T00:01:00-04:00",
"2022-04-21T00:01:00-04:00"
],
"hrs_pwr": [
"14",
"12",
"13",
"15"
],
"make_nm": "test",
"model_nm": "test",
"my_yr": "1980"
}
]
"the length of list cannot not be expected and it could be more than 4 sometimes or less in some keys"
#Expected:
i need to check if the above dictionary are in proper structure or not and based on that it should return the proper dictionary list associate with each item
for eg:
def get_dict_list(items):
if type(items == not structure)
result = get_associated_dict_items_mapped
return result
else:
return items
#Final result
expected_dict_list=
[{"dsply_nm":"test test","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"14"},
{"dsply_nm":"test test","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"12","make_nm": "test",model_nm": "test","my_yr": "1980"},
{"dsply_nm":"","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"13"},
{"dsply_nm":"","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"15"}
]
in above dictionary payload, below part is associated with the second dictionary items and have to map respectively
"make_nm": "test",
"model_nm": "test",
"my_yr": "1980"
}
Can anyone help on this?
Thanks
Since customer details is a list
dict(zip(customer_details[0], list(customer_details.values[0]())))
this yields:
{'insured_details': ['asset', 'asset', 'asset'],
'id': ['213', '214', '233'],
'dept': ['account', 'sales', 'market'],
'salary': ['12', '13', '14']}
​
I think a couple of list comprehensions will get you going. If you would like me to unwind them into more traditional for loops, just let me know.
import json
def get_dict_list(item):
first_value = list(item.values())[0]
if not isinstance(first_value, list):
return [item]
return [{key: item[key][i] for key in item.keys()} for i in range(len(first_value))]
cutomer_details = [
{
"insured_details": "asset",
"id": "xxx",
"dept": "account",
"salary": "12"
},
{
"insured_details": ["asset", "asset", "asset"],
"id":["213","214","233"],
"dept":["account","sales","market"],
"salary":["12","13","14"]
}
]
cutomer_details_cleaned = []
for detail in cutomer_details:
cutomer_details_cleaned.extend(get_dict_list(detail))
print(json.dumps(cutomer_details_cleaned, indent=4))
That should give you:
[
{
"insured_details": "asset",
"id": "xxx",
"dept": "account",
"salary": "12"
},
{
"insured_details": "asset",
"id": "213",
"dept": "account",
"salary": "12"
},
{
"insured_details": "asset",
"id": "214",
"dept": "sales",
"salary": "13"
},
{
"insured_details": "asset",
"id": "233",
"dept": "market",
"salary": "14"
}
]

How to correctly use LUIS ML-features?

I just stumbled over the new "ML-features" in LUIS and I am not sure if I really understand how to use them correctly. The documentation seems very abstract and vague to me:
https://learn.microsoft.com/de-de/azure/cognitive-services/luis/luis-concept-feature
Besides a good general explanation a solution for the following example would be very welcome:
Example
Intent: OpenABox
Sample utterances: "open the green box", "open the azure box".
Entity: ColorEntity (no prebuilt entity).
The color should understand "green", "blue", "azure" and "olive", where "olive" should be regarded as synonym to "green" and "azure" to "blue".
Solution Proposal
I assume you would have to
Add an intent
Add a list-entity, that lists all colors and assigns their synonyms?
Add a phrase list, that again lists some, but maybe not all, colors, without respect to their meaning?
Make the ML-feature global?
Mark the values as interchangable?
Add a ML-entity, and assign the list entity as well as the phrase list as features?
Make the list-entity-feature required?
Add sample utterances and mark the entities with the list-entity? Or with the ML-entity? Or both?
Add the ML-Entity as feature to the intent? Or the phrase list? Or the list-entity? Or none at all?
Is it correct, that there is no way to confirm the correct resolution of "olive" to its canonical form "green" using the test panel? So I have to use the API to test this?
The Model
This model has been created as described above. It seems to do its job. But is this really the optimal way to do it? There seems to be a lot of redundancy in there.
{
"luis_schema_version": "7.0.0",
"intents": [
{
"name": "None",
"features": []
},
{
"name": "OpenABox",
"features": [
{
"modelName": "ColorMLEntity",
"isRequired": false
}
]
}
],
"entities": [
{
"name": "ColorMLEntity",
"children": [],
"roles": [],
"features": [
{
"featureName": "ColorPhraseList",
"isRequired": false
},
{
"modelName": "ColorListEntity",
"isRequired": true
}
]
}
],
"hierarchicals": [],
"composites": [],
"closedLists": [
{
"name": "ColorListEntity",
"subLists": [
{
"canonicalForm": "green",
"list": [
"olive"
]
},
{
"canonicalForm": "blue",
"list": [
"azure"
]
}
],
"roles": []
}
],
"prebuiltEntities": [],
"utterances": [
{
"text": "open the azure box",
"intent": "OpenABox",
"entities": [
{
"entity": "ColorMLEntity",
"startPos": 9,
"endPos": 13,
"children": []
}
]
},
{
"text": "open the green box",
"intent": "OpenABox",
"entities": [
{
"entity": "ColorMLEntity",
"startPos": 9,
"endPos": 13,
"children": []
}
]
}
],
"versionId": "0.1",
"name": "ColorTest",
"desc": "",
"culture": "en-us",
"tokenizerVersion": "1.0.0",
"patternAnyEntities": [],
"regex_entities": [],
"phraselists": [
{
"name": "ColorPhraseList",
"mode": true,
"words": "green,blue,azure,olive",
"activated": true,
"enabledForAllModels": false
}
],
"regex_features": [],
"patterns": [],
"settings": []
}
Features are supposed to be signals relevant to an intent, or an entity.
So for this example,
Create an ML entity "ColorEntity",
Label the utterances
Add ColorEntity as a feature for the intent
Then you can add a feature to ColorEntity, either a list entity or phrase list, no need for both.

how to query nested array of objects in mongodb?

i am trying to query nested array of objects in mongodb from node js, tried all the solutions but no luck. can anyone please help this on priority?
I have tried following :
{
"name": "Science",
"chapters": [
{
"name": "ScienceChap1",
"tests": [
{
"name": "ScienceChap1Test1",
"id": 1,
"marks": 10,
"duration": 30,
"questions": [
{
"question": "What is the capital city of New Mexico?",
"type": "mcq",
"choice": [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer": [
"Santa Fe",
"Taos"
]
},
{
"question": "Who is the author of beowulf?",
"type": "notmcq",
"choice": [
"Mark Twain",
"Shakespeare",
"Abraham Lincoln",
"Newton"
],
"answer": [
"Shakespeare"
]
}
]
},
{
"name": "ScienceChap1test2",
"id": 2,
"marks": 20,
"duration": 30,
"questions": [
{
"question": "What is the capital city of New Mexico?",
"type": "mcq",
"choice": [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer": [
"Santa Fe",
"Taos"
]
},
{
"question": "Who is the author of beowulf?",
"type": "notmcq",
"choice": [
"Mark Twain",
"Shakespeare",
"Abraham Lincoln",
"Newton"
],
"answer": [
"Shakespeare"
]
}
]
}
]
}
]
}
Here is what I've tried so far but still can't get it to work
db.quiz.find({name:"Science"},{"tests":0,chapters:{$elemMatch:{name:"ScienceCh‌​ap1"}}})
db.quiz.find({ chapters: { $elemMatch: {$elemMatch: { name:"ScienceChap1Test1" } } }})
db.quiz.find({name:"Science"},{chapters:{$elemMatch:{$elemMatch:{name:"Scienc‌​eChap1Test1"}}}}) ({ name:"Science"},{ chapters: { $elemMatch: {$elemMatch: { name:"ScienceChap1Test1" } } }})
Aggregation Framework
You can use the aggregation framework to transform and combine documents in a collection to display to the client. You build a pipeline that processes a stream of documents through several building blocks: filtering, projecting, grouping, sorting, etc.
If you want get the mcq type questions from the test named "ScienceChap1Test1", you would do the following:
db.quiz.aggregate(
//Match the documents by query. Search for science course
{"$match":{"name":"Science"}},
//De-normalize the nested array of chapters.
{"$unwind":"$chapters"},
{"$unwind":"$chapters.tests"},
//Match the document with test name Science Chapter
{"$match":{"chapters.tests.name":"ScienceChap1test2"}},
//Unwind nested questions array
{"$unwind":"$chapters.tests.questions"},
//Match questions of type mcq
{"$match":{"chapters.tests.questions.type":"mcq"}}
).pretty()
The result will be:
{
"_id" : ObjectId("5629eb252e95c020d4a0c5a5"),
"name" : "Science",
"chapters" : {
"name" : "ScienceChap1",
"tests" : {
"name" : "ScienceChap1test2",
"id" : 2,
"marks" : 20,
"duration" : 30,
"questions" : {
"question" : "What is the capital city of New Mexico?",
"type" : "mcq",
"choice" : [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer" : [
"Santa Fe",
"Taos"
]
}
}
}
}
$elemMatch doesn't work for sub documents. You can use the aggregation framework for "array filtering" by using $unwind.
You can delete each line from the bottom of each command in the aggregation pipeline in the above code to observe the pipelines behavior.
You should try the following queries in the mongodb simple javascript shell.
There could be Two Scenarios.
Scenario One
If you simply want to return the documents that contain certain chapter names or test names for example just one argument in find will do.
For the find method the document you want to be returned is specified by the first argument. You could return documents with the name Science by doing this:
db.quiz.find({name:"Science"})
You could specify criteria to match a single embedded document in an array by using $elemMatch. To find a document that has a chapter with the name ScienceChap1. You could do this:
db.quiz.find({"chapters":{"$elemMatch":{"name":"ScienceChap1"}}})
If you wanted your criteria to be a test name then you could use the dot operator like this:
db.quiz.find({"chapters.tests":{"$elemMatch":{"name":"ScienceChap1Test1"}}})
Scenario Two - Specifying Which Keys to Return
If you want to specify which keys to Return you can pass a second argument to find (or findOne) specifying the keys you want. In your case you can search for the document name and then provide which keys to return like so.
db.quiz.find({name:"Science"},{"chapters":1})
//Would return
{
"_id": ObjectId(...),
"chapters": [
"name": "ScienceChap2",
"tests: [..all object content here..]
}
If you only want to return the marks from each test object you can use the dot operator to do so:
db.quiz.find({name:"Science"},{"chapters.tests.marks":1})
//Would return
{
"_id": ObjectId(...),
"chapters": [
"tests: [
{"marks":10},
{"marks":20}
]
}
If you only want to return the questions from each test:
db.quiz.find({name:"Science"},{"chapters.tests.questions":1})
Test these out. I hope these help.

How can I query for “falsey” values in MongoDB?

I want to query a Mongo collection for documents where a specific field is either missing or has a value that would evaluate to false in Python. This includes atomic values null, 0, '' (the empty string), false, []. However, arrays containing such values (such as ['foo', ''] or just ['']) are not falsey and must not be matched. Can I do this with Mongo’s structured queries (without resorting to JavaScript)?
$type doesn’t seem to help:
> db.foo.insert({bar: ['baz', '', 'qux']});
> db.foo.find({$and: [{bar: ''}, {bar: {$type: 2}}]});
{ "_id" : ObjectId("50599937da5254d6fd731816"), "bar" : [ "baz", "", "qux" ] }
This should work
db.test.find({$or:[{a:{$size:0}},{"a.0":{$exists:true}}]})
Just make sure the a field doesn't have an object inside with the 0 key.
e.g.
> db.test.find()
{ "_id": ObjectId("5059ac3ab1cee080a7168fff"), "bar": [ "baz", "", "qux" ] }
{ "_id": ObjectId("5059ac48b1cee080a7169000"), "hello": 1, "bar": false, "world": 34 }
{ "_id": ObjectId("5059ac53b1cee080a7169001"), "hello": 1, "world": 42 }
{ "_id": ObjectId("5059ac60b1cee080a7169002"), "hello": 13, "bar": null, "world": 34 }
{ "_id": ObjectId("5059ac6bb1cee080a7169003"), "hello": 133, "bar": [ ], "world": 334 }
{ "_id": ObjectId("5059b36cb1cee080a7169004"), "hello": 133, "bar": [ "" ], "world": 334 }
{ "_id": ObjectId("5059b3e3b1cee080a7169005"), "hello": 133, "bar": "foo", "world": 334 }
{ "_id": ObjectId("5059b3f8b1cee080a7169006"), "hello": 1333, "bar": "", "world": 334 }
{ "_id": ObjectId("5059b424b1cee080a7169007"), "hello": 1333, "bar": { "0": "foo" }, "world": 334 }
> db.test.find({$or: [{bar: {$size: 0}}, {"bar.0": {$exists: true}}]})
{ "_id": ObjectId("5059ac3ab1cee080a7168fff"), "bar": [ "baz", "", "qux" ] }
{ "_id": ObjectId("5059ac6bb1cee080a7169003"), "hello": 133, "bar": [ ], "world": 334 }
{ "_id": ObjectId("5059b36cb1cee080a7169004"), "hello": 133, "bar": [ "" ], "world": 334 }
{ "_id": ObjectId("5059b424b1cee080a7169007"), "hello": 1333, "bar": { "0": "foo" }, "world": 334 }
I found this: https://jira.mongodb.org/browse/SERVER-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel from back in the day.
I can replicate it on my MongoDB 2.0.1 (I also played around a bit to see if it picked it up as something else):
> db.g.find()
{ "_id" : ObjectId("50599eb65395c82c7a47d124"), "bar" : [ "baz", "", "qux" ] }
{ "_id" : ObjectId("5059a0005395c82c7a47d125"), "a" : 3, "b" : { "a" : 1, "b" : 2 } }
> db.g.find({bar: {$type: 4}});
> db.g.find({a: {$type: 2}});
> db.g.find({a: {$type: 16}});
> db.g.find({bar: {$type: 16}});
> db.g.find({bar: {$type: 2}});
{ "_id" : ObjectId("50599eb65395c82c7a47d124"), "bar" : [ "baz", "", "qux" ] }
> db.g.find({bar: {$type: 3}});
> db.g.find({b: {$type: 3}});
{ "_id" : ObjectId("5059a0005395c82c7a47d125"), "a" : 3, "b" : { "a" : 1, "b" : 2 } }
And when I use $type 4 I cannot get the document with the array in. As you can see $type 3 works fine (Which could be related to: https://jira.mongodb.org/browse/SERVER-1475) but the array cannot seem to be picked up.
It is possible that you are seeing a bug. If you file a JIRA on MongoDBs site (jira.mongodb.org) it could help to solve the problem.
However, though, the $type op might not solve your problem. This might be better done via a client side method like unsetting the field totally if it has no elements that way you query for existance. This standardises your querying patterns and makes it easier to integrate in general.
So my personal recommendation here is to standardise "falsey" values into one single conherrent value.
Edit
I noticed they have marked the original bug as a duplicate (that's why it is closed) however I am not sure how it is a duplicate. These arrays are not being picked up as objects but rather as strings, most likely since $type is acting on every element within that field rather than the field itself (or something like that).
I would still open a JIRA and stress that the array cannot be picked up at all.

Resources