Google Cloud NLP - No Entities Returned

We are having some issues with the Google Cloud NLP service: it is intermittently returning no entities for certain terms. We use the NLP annotate API on free-text survey responses. A recent question related to an image of a kids' TV character in the UK called Zippy. Some example responses are below. Unfortunately we had thousands of responses like this, and none of them detected "zippy" as an entity. Strangely, "elmo", "zippie", and others were detected without any issue; only this specific string ("zippy") came back with no entities. Any ideas why this might be?
{
  "sentences": [{
    "text": {
      "content": "zippy",
      "beginOffset": 0
    },
    "sentiment": {
      "magnitude": 0.1,
      "score": 0.1
    }
  }],
  "tokens": [],
  "entities": [],
  "documentSentiment": {
    "magnitude": 0.1,
    "score": 0.1
  },
  "language": "en",
  "categories": []
}
"rainbow" detected but not "zippy"
{
  "sentences": [{
    "text": {
      "content": "zippy from rainbow",
      "beginOffset": 0
    },
    "sentiment": {
      "magnitude": 0.1,
      "score": 0.1
    }
  }],
  "tokens": [],
  "entities": [{
    "name": "rainbow",
    "type": "OTHER",
    "metadata": [],
    "salience": 1,
    "mentions": [{
      "text": {
        "content": "rainbow",
        "beginOffset": 11
      },
      "type": "COMMON"
    }]
  }],
  "documentSentiment": {
    "magnitude": 0.1,
    "score": 0.1
  },
  "language": "en",
  "categories": []
}
"zippie" detected fine
{
  "sentences": [{
    "text": {
      "content": "zippie",
      "beginOffset": 0
    },
    "sentiment": {
      "magnitude": 0,
      "score": 0
    }
  }],
  "tokens": [],
  "entities": [{
    "name": "zippie",
    "type": "OTHER",
    "metadata": [],
    "salience": 1,
    "mentions": [{
      "text": {
        "content": "zippie",
        "beginOffset": 0
      },
      "type": "PROPER"
    }]
  }],
  "documentSentiment": {
    "magnitude": 0,
    "score": 0
  },
  "language": "en",
  "categories": []
}
"elmo" detected fine
{
  "sentences": [{
    "text": {
      "content": "elmo",
      "beginOffset": 0
    },
    "sentiment": {
      "magnitude": 0.1,
      "score": 0.1
    }
  }],
  "tokens": [],
  "entities": [{
    "name": "elmo",
    "type": "OTHER",
    "metadata": [],
    "salience": 1,
    "mentions": [{
      "text": {
        "content": "elmo",
        "beginOffset": 0
      },
      "type": "COMMON"
    }]
  }],
  "documentSentiment": {
    "magnitude": 0.1,
    "score": 0.1
  },
  "language": "en",
  "categories": []
}

Services like these are trained on a specific corpus of 'entity' values.
The service tokenizes/chunks the text, then uses part-of-speech tagging to identify noun phrases and checks each one against a giant index to see whether that noun phrase is a known entity.
"Zippy" must not be in that corpus. I'm not sure about Google NLP, but Watson NLU comes with a GUI product for easily creating your own 'dictionary' of entity noun phrases.
It's also very possible to create your own using NLTK, or from scratch in Python, but all of these require the effort of manually curating your own 'dictionary', unless you are able to get your hands on and adapt an existing one.
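As a rough, from-scratch illustration of that dictionary approach in Python (the terms and entity labels below are invented for this example, not taken from any real corpus):

```python
import re

# Hypothetical hand-curated gazetteer: surface form -> entity type.
# A real one would be far larger and carefully curated.
GAZETTEER = {
    "zippy": "PERSON",
    "rainbow": "WORK_OF_ART",
}

def find_entities(text):
    """Return gazetteer terms found in `text`, with their offsets,
    mimicking the shape of the NLP API's entity mentions."""
    found = []
    for term, entity_type in GAZETTEER.items():
        for match in re.finditer(r"\b%s\b" % re.escape(term), text.lower()):
            found.append({
                "name": term,
                "type": entity_type,
                "beginOffset": match.start(),
            })
    return sorted(found, key=lambda e: e["beginOffset"])

print(find_entities("zippy from rainbow"))
```

Whatever tool you use, the underlying point is the same: if a term is not in the dictionary (as "zippy" apparently isn't in Google's), it will never come back as an entity.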

Related

How to get the available time slots of a particular duration in a MongoDB collection

I am creating a booking system, which has a booking collection like this:
{
"_id": {
"$oid": "61cb68eed0a209fa3f76335d"
},
"name": "string",
"vendorId": "string",
"serviceDate": "2021-12-29",
"serviceStartTime": {
"$date": "2021-12-28T19:45:00.000Z"
},
"serviceEndTime": {
"$date": "2021-12-28T21:15:00.000Z"
},
"staffId": "string",
"services": [
[
{
"serviceName": "string",
"servicePrice": "string",
"gender": "string",
"description": "string"
}
]
],
"isAccepted": false,
"isActive": true,
"createdAt": {
"$date": "2021-12-28T19:43:42.953Z"
},
"updatedAt": {
"$date": "2021-12-28T19:43:42.953Z"
},
"__v": 0
}
There can be many bookings like this, each with a serviceStartTime and serviceEndTime for a particular serviceDate.
I want to get the available time slots (say, 45 minutes long) between 8 am and 8 pm, for showing availability in a calendar for a particular staffId.
Is this possible in MongoDB? Please help.
Expected output
{
  slot: "11:00 - 12:00"
}
or
{
  slotStartTime: "11:00",
  slotEndTime: "12:00",
}
I don't mind the format; I just need the available time slots, i.e. a start time and an end time.
Edit
[{
"_id": {
"$oid": "61cb68eed0a209fa3f76335d"
},
"name": "string",
"vendorId": "string",
"serviceDate": "2021-12-29",
"serviceStartTime": {
"$date": "2021-12-28T19:45:00Z"
},
"serviceEndTime": {
"$date": "2021-12-28T21:15:00Z"
},
"staffId": "string",
"services": [
[
{
"serviceName": "string",
"servicePrice": "string",
"gender": "string",
"description": "string"
}
]
],
"isAccepted": false,
"isActive": true,
"createdAt": {
"$date": "2021-12-28T19:43:42.953Z"
},
"updatedAt": {
"$date": "2021-12-28T19:43:42.953Z"
},
"__v": 0
},{
"_id": {
"$oid": "61cb7a9f8be10f47d4089e7b"
},
"name": "string",
"vendorId": "string",
"serviceDate": "2021-12-29",
"serviceStartTime": {
"$date": "2021-12-29T07:45:00Z"
},
"serviceEndTime": {
"$date": "2021-12-29T09:15:00Z"
},
"staffId": "string",
"services": [
[
{
"serviceName": "string",
"servicePrice": "string",
"gender": "string",
"description": "string"
}
]
],
"isAccepted": false,
"isActive": true,
"createdAt": {
"$date": "2021-12-28T20:59:11.638Z"
},
"updatedAt": {
"$date": "2021-12-28T20:59:11.638Z"
},
"__v": 0
}]
Query
Match the date "2021-12-29" (put any date).
Clamp both dates so their hours fall inside the >= 8 and <= 19 window.
$map over the range between the two dates, in 45-minute steps (we need this to get all the slots).
$reduce to make them consecutive pairs.
$unwind to make each pair a separate document.
Convert the dates to strings in hour:minute format, e.g. "slot": "15:00 - 15:45".
Instead of 45 you can put whatever number of minutes you want, e.g. 60 or 15.
Test code here
aggregate(
[{"$match":{"$expr":{"$eq":["$serviceDate", "2021-12-29"]}}},
{"$set":
{"serviceStartTime":
{"$dateFromParts":
{"year":{"$year":"$serviceStartTime"},
"day":{"$dayOfYear":"$serviceStartTime"},
"hour":
{"$switch":
{"branches":
[{"case":{"$lt":[{"$hour":"$serviceStartTime"}, 8]}, "then":8},
{"case":{"$gt":[{"$hour":"$serviceStartTime"}, 19]},
"then":19}],
"default":{"$hour":"$serviceStartTime"}}},
"minute":
{"$switch":
{"branches":
[{"case":{"$lt":[{"$hour":"$serviceStartTime"}, 8]}, "then":0},
{"case":{"$gt":[{"$hour":"$serviceStartTime"}, 19]},
"then":0}],
"default":{"$minute":"$serviceStartTime"}}}}}}},
{"$set":
{"serviceEndTime":
{"$dateFromParts":
{"year":{"$year":"$serviceEndTime"},
"day":{"$dayOfYear":"$serviceEndTime"},
"hour":
{"$switch":
{"branches":
[{"case":{"$lt":[{"$hour":"$serviceEndTime"}, 8]}, "then":8},
{"case":{"$gt":[{"$hour":"$serviceEndTime"}, 19]}, "then":
20}],
"default":{"$hour":"$serviceEndTime"}}},
"minute":
{"$switch":
{"branches":
[{"case":{"$lt":[{"$hour":"$serviceEndTime"}, 8]}, "then":0},
{"case":{"$gt":[{"$hour":"$serviceEndTime"}, 19]}, "then":0}],
"default":{"$minute":"$serviceEndTime"}}}}}}},
{"$set":
{"slots":
{"$map":
{"input":
{"$range":
[0,
{"$subtract":
[{"$add":["$serviceEndTime", {"$multiply":[60, 1000]}]},
"$serviceStartTime"]},
{"$multiply":[45, 60, 1000]}]},
"in":{"$add":["$serviceStartTime", "$$this"]}}}}},
{"$set":
{"slots":
{"$filter":
{"input":"$slots",
"cond":
{"$and":
[{"$gte":[{"$hour":"$$this"}, 8]},
{"$lte":[{"$hour":"$$this"}, 19]}]}}}}},
{"$set":
{"slots":
{"$reduce":
{"input":"$slots",
"initialValue":{"data":[], "prv":null},
"in":
{"$cond":
[{"$eq":["$$value.prv", null]},
{"data":"$$value.data", "prv":"$$this"},
{"data":
{"$concatArrays":
["$$value.data", [["$$value.prv", "$$this"]]]},
"prv":"$$this"}]}}}}},
{"$project":{"_id":0, "staffId":1, "slots":"$slots.data"}},
{"$unwind":"$slots"},
{"$project":
{"staffId":1,
"slot":
{"$concat":
[{"$dateToString":
{"date":{"$arrayElemAt":["$slots", 0]}, "format":"%H:%M"}},
" - ",
{"$dateToString":
{"date":{"$arrayElemAt":["$slots", 1]}, "format":"%H:%M"}}]}}}])
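Outside of MongoDB, the same availability computation can be sketched in a few lines of application code; this is only an illustration that assumes the day's bookings for one staffId have already been fetched (the function and variable names are made up):

```python
from datetime import datetime, timedelta

def available_slots(bookings, day, start_hour=8, end_hour=20, step_min=45):
    """Return "HH:MM - HH:MM" strings for every step_min-long slot inside
    working hours that does not overlap any (start, end) booking."""
    slots = []
    cur = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    close = day.replace(hour=end_hour, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=step_min)
    while cur + step <= close:
        nxt = cur + step
        # keep the slot only if it overlaps no booking interval
        if all(nxt <= start or cur >= end for start, end in bookings):
            slots.append(f"{cur:%H:%M} - {nxt:%H:%M}")
        cur = nxt
    return slots

# One booking from 07:45 to 09:15 on the requested date
day = datetime(2021, 12, 29)
bookings = [(datetime(2021, 12, 29, 7, 45), datetime(2021, 12, 29, 9, 15))]
print(available_slots(bookings, day))
```

With that sample booking, the 08:00 and 08:45 slots are dropped and the first free slot starts at 09:30.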

Filter MongoDB database using Mongoose in Node.js

I need to filter some users according to some fixed criteria. I have a user collection and a talent collection. The talent collection holds the reference to a master category collection.
What I need is to filter these users according to the category in the talent collection and some keys from the user collection.
For example, I need to search for a user whose gender is 'male' and education is 'BTech', and who has talents as a programmer and tester.
My user collection is like:
{
"_id": "5f1939239bd35429ac9cd78f",
"isOtpVerified": "false",
"role": "user",
"adminApproved": 1,
"status": 0,
"languages": "Malayalam, Tamil, Telugu, Kannada",
"name": "Test user",
"email": "test@email.com",
"phone": "1234567890",
"otp": "480623",
"uid": 100015,
"bio": "Short description from user",
"dob": "1951-09-07T00:00:00.000Z",
"gender": "Male",
"education": "Btech",
"bodyType": "",
"complexion": "",
"height": "",
"weight": "",
"requests": [],
"location": {
"place": "place",
"state": "state",
"country": "country"
},
"image": {
"avatar": "5f1939239bd35429ac9cd78f_avatar.jpeg",
"fullsize": "5f1939239bd35429ac9cd78f_fullsize.png",
"head_shot": "5f1939239bd35429ac9cd78f_head_shot.jpeg",
"left_profile": "5f1939239bd35429ac9cd78f_left_profile.png",
"right_profile": "5f1939239bd35429ac9cd78f_right_profile.png"
},
"__v": 42,
"createdAt": "2020-07-23T07:15:47.387Z",
"updatedAt": "2020-08-18T18:54:22.272Z",
}
Talent collection
[
{
"_id": "5f38efef179aca47a0089667",
"userId": "5f1939239bd35429ac9cd78f",
"level": "5",
"chars": {
"type": "Fresher",
},
"category": "5f19357b50bcf9158c6be572",
"media": [],
"createdAt": "2020-08-16T08:35:59.692Z",
"updatedAt": "2020-08-16T08:35:59.692Z",
"__v": 0
},
{
"_id": "5f3b7e6f7e322948ace30a2c",
"userId": "5f1939239bd35429ac9cd78f",
"level": "3",
"chars": {
"type": "Fresher",
},
"category": "5f19359250bcf9158c6be573",
"media": [
{
"adminApproved": 0,
"status": 0,
"_id": "5f3c22573065f84a48e04a14",
"file": "id=5f1939239bd35429ac9cd78f&dir=test&img=5f1939239bd35429ac9cd78f_image_undefined.jpeg",
"description": "test",
"fileType": "image",
"caption": "test file"
},
{
"adminApproved": 0,
"status": 0,
"_id": "5f3c2d7a8c7f8336b0bfced2",
"file": "id=5f1939239bd35429ac9cd78f&dir=test&img=5f1939239bd35429ac9cd78f_image_1.jpeg",
"description": "this is a demo poster for testing",
"fileType": "image",
"caption": "A Test Poster"
}
],
"createdAt": "2020-08-18T07:08:31.532Z",
"updatedAt": "2020-08-18T19:35:22.899Z",
"__v": 2
}
]
The category field in the above documents is populated from a separate collection. The category collection is:
[
{
"_id": "5f19359250bcf9158c6be573",
"status": true,
"title": "Testing",
"description": "Application tester",
"code": "test",
"characteristics": [],
"createdAt": "2020-07-23T07:00:34.221Z",
"updatedAt": "2020-07-23T07:00:34.221Z",
"__v": 0
},
{
"status": true,
"_id": "5f29829a705b4e648c28bc88",
"title": "Designer",
"description": "UI UX Designer",
"code": "uiux",
"createdAt": "2020-08-04T15:45:30.125Z",
"updatedAt": "2020-08-04T15:45:30.125Z",
"__v": 0
},
{
"_id": "5f19357b50bcf9158c6be572",
"status": true,
"title": "programming",
"description": "Java programmer",
"code": "program",
"createdAt": "2020-07-23T07:00:11.137Z",
"updatedAt": "2020-07-23T07:00:11.137Z",
"__v": 0
}
]
So my filter terms will be:
{
  categories: ["5f19359250bcf9158c6be573", "5f19357b50bcf9158c6be572"],
  minAge: 18,
  maxAge: 25,
  minHeight: 5,
  maxHeight: 6,
  minWeight: 50,
  maxWeight: 80,
  complexion: "white",
  gender: "male",
}
And the expected result will be a user who has both of the above talents and matches the conditions:
{
  users: { ..User details.. },
  medias: { ...medias from the matching talents.. }
}
If there are two collections you need to join them, either by primary key or by _id against a foreign field, and you can use $lookup with $match to filter down.
Documentation
You need to use $lookup with a pipeline:
$match your condition for the category match
$lookup to join the users collection
$match conditions on the users collection's fields
$match to exclude documents that found no users matching the criteria passed in the conditions
db.talents.aggregate([
  {
    $match: {
      category: { $in: ["5f19359250bcf9158c6be573", "5f19357b50bcf9158c6be572"] }
    }
  },
  {
    $lookup: {
      from: "users",
      as: "users",
      let: { userId: "$userId" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ["$$userId", "$_id"] },
                { $eq: ["$gender", "Male"] },
                { $eq: ["$education", "Btech"] }
                // ... add your other match criteria here
              ]
            }
          }
        }
      ]
    }
  },
  { $match: { users: { $ne: [] } } }
])
Playground

Limit EntityRecognitionSkill to confidence > 0.5

I'm using Microsoft.Skills.Text.EntityRecognitionSkill in my skillset, which outputs "Person", "Location", and "Organization" entities.
However, I want to output only locations that have a confidence level > 0.5.
Is there a way to do that?
Here is a snippet of my code:
{
  "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
  "categories": [
    "Person",
    "Location",
    "Organization"
  ],
  "context": "/document/finalText/pages/*",
  "inputs": [
    {
      "name": "text",
      "source": "/document/finalText/pages/*"
    },
    {
      "name": "languageCode",
      "source": "/document/languageCode"
    }
  ],
  "outputs": [
    {
      "name": "persons",
      "targetName": "people"
    },
    {
      "name": "locations"
    },
    {
      "name": "namedEntities",
      "targetName": "entities"
    }
  ]
},
[Edited based on Mick's comment]
Yes, this should be possible by setting the minimumPrecision parameter of the entity recognition skill to 0.5, which results in only entities whose confidence is >= 0.5 being returned.
The documentation for entity recognition skill is here: https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-entity-recognition
As Mick points out, the documentation says minimumPrecision is unused, however that documentation is out of date and I will fix it soon.
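For reference, a sketch of the skill definition from the question with the parameter added (only the minimumPrecision line is new; the inputs and outputs are elided here, unchanged):

```json
{
  "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
  "categories": [ "Person", "Location", "Organization" ],
  "minimumPrecision": 0.5,
  "context": "/document/finalText/pages/*",
  "inputs": [ "...unchanged..." ],
  "outputs": [ "...unchanged..." ]
}
```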

MongoDB create product summary collection

Say I have a product collection like this:
{
"_id": "5a74784a8145fa1368905373",
"name": "This is my first product",
"description": "This is the description of my first product",
"category": "34/73/80",
"condition": "New",
"images": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstproduct_image1.jpg"
},
...
],
"attributes": [
{
"name": "Material",
"value": "Synthetic"
},
...
],
"variation": {
"attributes": [
{
"name": "Color",
"values": ["Black", "White"]
},
{
"name": "Size",
"values": ["S", "M", "L"]
}
]
}
}
and a variation collection like this:
{
"_id": "5a748766f5eef50e10bc98a8",
"name": "color:black,size:s",
"productID": "5a74784a8145fa1368905373",
"condition": "New",
"price": 1000,
"sale": null,
"image": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstvariation_image1.jpg"
}
],
"attributes": [
{
"name": "Color",
"value": "Black"
},
{
"name": "Size",
"value": "S"
}
]
}
I want to keep the documents separate. For easy browsing, searching, and faceted-search implementation, I want to fetch all the data in a single query, but I don't want to do the join in my application code.
I know this is achievable using a third collection, called summary, that might look like this:
{
"_id": "5a74875fa1368905373",
"name": "This is my first product",
"category": "34/73/80",
"condition": "New",
"price": 1000,
"sale": null,
"description": "This is the description of my first product",
"images": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstproduct_image1.jpg"
},
...
],
"attributes": [
{
"name": "Material",
"value": "Synthetic"
},
...
],
"variations": [
{
"condition": "New",
"price": 1000,
"sale": null,
"image": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstvariation_image.jpg"
}
],
"attributes": [
"color=black",
"size=s"
]
},
...
]
}
The problem is, I don't know how to keep the summary collection in sync with the product and variation collections. I know it can be done using mongo-connector, but I'm not sure how to implement it.
Please help me, I'm still a beginner programmer.
You don't actually need to maintain a summary collection; it's redundant to store the product and variation summary in another collection.
Instead, you can use an aggregation pipeline with $lookup to outer-join product and variation on productID.
Aggregation pipeline:
db.products.aggregate([
  {
    $lookup: {
      from: "variation",
      localField: "_id",
      foreignField: "productID",
      as: "variations"
    }
  }
]).pretty()

Modifying elasticsearch score based on nested field value

I want to modify scoring in ElasticSearch (v2+) based on the weight of a field in a nested object within an array.
For instance, using this data:
PUT index/test/0
{
"name": "red bell pepper",
"words": [
{"text": "pepper", "weight": 20},
{"text": "bell","weight": 10},
{"text": "red","weight": 5}
]
}
PUT index/test/1
{
"name": "hot red pepper",
"words": [
{"text": "pepper", "weight": 15},
{"text": "hot","weight": 11},
{"text": "red","weight": 5}
]
}
I want a query like {"words.text": "red pepper"} which would rank "red bell pepper" above "hot red pepper".
The way I am thinking about this problem is "first match the 'text' field, then modify scoring based on the 'weight' field". Unfortunately I don't know how to achieve this, if it's even possible, or if I have the right approach for something like this.
If proposing an alternative approach, please try to keep the idea generalized, since there are tons of similar cases (e.g. simply boosting the "red bell pepper" document's score is not a suitable alternative).
The approach you have in mind is feasible. It can be achieved with a function_score query inside a nested query.
An example implementation is shown below:
PUT test
PUT test/test/_mapping
{
"properties": {
"name": {
"type": "string"
},
"words": {
"type": "nested",
"properties": {
"text": {
"type": "string"
},
"weight": {
"type": "long"
}
}
}
}
}
PUT test/test/0
{
"name": "red bell pepper",
"words": [
{"text": "pepper", "weight": 20},
{"text": "bell","weight": 10},
{"text": "red","weight": 5}
]
}
PUT test/test/1
{
"name": "hot red pepper",
"words": [
{"text": "pepper", "weight": 15},
{"text": "hot","weight": 11},
{"text": "red","weight": 5}
]
}
POST test/_search
{
"query": {
"bool": {
"disable_coord": true,
"must": [
{
"match": {
"name": "red pepper"
}
}
],
"should": [
{
"nested": {
"path": "words",
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field" : "words.weight",
"missing": 0
}
}
],
"query": {
"match": {
"words.text": "red pepper"
}
},
"score_mode": "sum",
"boost_mode": "replace"
}
},
"score_mode": "total"
}
}
]
}
}
}
Result:
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "0",
"_score": 26.030865,
"_source": {
"name": "red bell pepper",
"words": [
{
"text": "pepper",
"weight": 20
},
{
"text": "bell",
"weight": 10
},
{
"text": "red",
"weight": 5
}
]
}
},
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 21.030865,
"_source": {
"name": "hot red pepper",
"words": [
{
"text": "pepper",
"weight": 15
},
{
"text": "hot",
"weight": 11
},
{
"text": "red",
"weight": 5
}
]
}
}
]
}
In a nutshell, the query scores a document that satisfies the must clause as follows: sum the weights of the matched nested documents, then add the score of the must clause. Here "red pepper" matches pepper and red, so "red bell pepper" contributes 20 + 5 = 25 against 15 + 5 = 20 for "hot red pepper"; notice the two final scores above differ by exactly that 5.
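The arithmetic can be checked with a toy sketch of the nested function_score part (score_mode: "sum" with boost_mode: "replace"); the BM25 score of the must clause, which is the same for both documents, is ignored here:

```python
# Nested docs per product: text -> weight (taken from the indexed documents above)
docs = {
    "red bell pepper": {"pepper": 20, "bell": 10, "red": 5},
    "hot red pepper": {"pepper": 15, "hot": 11, "red": 5},
}
query_terms = {"red", "pepper"}

def nested_score(weights, terms):
    # field_value_factor replaces each matching nested doc's score with its
    # weight; score_mode "sum" then adds the matched weights together.
    return sum(w for text, w in weights.items() if text in terms)

scores = {name: nested_score(w, query_terms) for name, w in docs.items()}
print(scores)  # the 5-point gap matches 26.030865 - 21.030865 above
```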
