node.js+mongoose - How to implement matching (like order matching)? - node.js

I have a node.js+mongoose REST API. I have two schemas that need to be matched whenever a new entry is added to either one, or on a timed basis. The matching compares the whole set of documents against a set of parameters. For example, say I have the two schemas below -
Males: {
age: Number,
location: String,
language: String,
matchedFemales: []
}
Females: {
age: Number,
location: String,
language: String,
matchedMales: []
}
Now, I have to take a collection, scroll through all its documents, and find matches. I have many parameters as matching criteria, but as an example, let's say language and location should be the same and the ages almost equal (+ or - 1 year). Like below -
Males: [{
id: 1001,
age: 20,
location: London,
language: English,
matchedFemales: [2001,2002]
},
{
id: 1002,
age: 30,
location: London,
language: English,
matchedFemales: []
},
{
id: 1003,
age: 20,
location: Madrid,
language: Spanish,
matchedFemales: [2003]
}]
Females: [{
id: 2001,
age: 20,
location: London,
language: English,
matchedMales: [1001]
},
{
id: 2002,
age: 19,
location: London,
language: English,
matchedMales: [1001]
},
{
id: 2003,
age: 20,
location: Madrid,
language: Spanish,
matchedMales: [1003]
}]
How do I perform this matching, and how should I store the matches?
Should I iterate through each document in the Male collection, find matches in the Female collection, and update it? If so, I plan to have a service do this and call it every X minutes. This job will be time- and resource-consuming (it has to go through every document to find matches), but it will run a fixed number of times per day - every 5 minutes means only 12 runs an hour.
Instead of matching every document on the LHS against every document on the RHS, I could find matches just for a document as it is inserted and update it then. This method is less time- and resource-consuming than the previous one, but it runs far more often, i.e., on every insert/update.
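That second, match-on-insert approach could be sketched with a mongoose save hook. This is only a sketch; the model names (`Male`, `Female`) and the hook wiring are assumptions for illustration, not part of the question:

```javascript
// Build the match criteria for one newly inserted document:
// same language and location, age within +/- 1 year.
function buildMatchQuery(doc) {
  return {
    language: doc.language,
    location: doc.location,
    age: { $gte: doc.age - 1, $lte: doc.age + 1 }
  };
}

// Hypothetical wiring inside a mongoose schema definition:
// MaleSchema.post('save', async function () {
//   const matches = await Female.find(buildMatchQuery(this)).select('_id');
//   await Male.updateOne({ _id: this._id },
//     { $set: { matchedFemales: matches.map(f => f._id) } });
// });
```

The symmetric hook on the Female schema would keep `matchedMales` up to date the same way.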
Or is there any other elegant way to do this?
P.S. - If this question seems inappropriate, kindly direct me to the right source for reference or consultation.

Related

How to Perform UPSERT Operation in Arango DB with Different multiple keys (Composite Key)?

The official documentation already shows how to do that. Below is an example that works fine:
Example: 1
LET documents = [
{ name: 'Doc 1', value: 111, description: 'description 111' },
{ name: 'Doc 2', value: 222, description: 'description 2' },
{ name: 'Doc 3', value: 333, description: 'description 3' }
]
FOR doc IN documents
UPSERT { name: doc.name, description: doc.description }
INSERT doc
UPDATE doc
IN MyCollection
But, I want to check different multiple keys for each document on UPSERT, like:
Example: 2
LET documents = [
{ name: 'Doc 1', value: 777, description: 'description 111' },
{ name: 'Doc 2', value: 888, description: 'description 2' },
{ name: 'Doc 3', value: 999, description: 'description 3' }
]
FOR doc IN documents
UPSERT {
{ name: doc.name, description: doc.description },
{ value: doc.value, description: doc.description },
{ name: doc.name, value: doc.value }
}
INSERT doc
UPDATE doc
IN MyCollection
Or any other way (using a filter or something). I tried, but nothing works.
If I understand your problem, you would want to update a document, if there's an existing one with at least 2 fields matching, otherwise insert it as new.
UPSERT won't be able to do that. It can only do one match. So a subquery is necessary. In the solution below, I ran a query to find the key of the first document that matches at least 2 fields. If there's no such document then it will return null.
Then the UPSERT can work by matching the _key to that.
LET documents = [
{ name: 'Doc 1', value: 777, description: 'description 111' },
{ name: 'Doc 2', value: 888, description: 'description 2' },
{ name: 'Doc 3', value: 999, description: 'description 3' }
]
FOR doc IN documents
LET matchKey = FIRST(
FOR rec IN MyCollection
FILTER (rec.name==doc.name) + (rec.value==doc.value) + (rec.description==doc.description) > 1
LIMIT 1
RETURN rec._key
)
UPSERT {_key:matchKey}
INSERT doc
UPDATE doc
IN MyCollection
Note: There's a trick here with adding booleans together: it works because true is converted to 1 and false to 0. You can write it out explicitly like this: (rec.name==doc.name?1:0)
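The same boolean-to-number trick works in plain JavaScript, where `+` coerces `true` to 1 and `false` to 0, so the "at least two fields match" condition can be checked like this:

```javascript
// Count how many of the three fields agree, as in the AQL FILTER above
function matchingFieldCount(rec, doc) {
  return (rec.name === doc.name) +
         (rec.value === doc.value) +
         (rec.description === doc.description);
}

// "At least two fields match" is then:
function isMatch(rec, doc) {
  return matchingFieldCount(rec, doc) > 1;
}
```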
While this will work for you, it's not a very efficient solution. Actually, there is no efficient one in this case, because every existing document has to be scanned to find a match for each document being added or updated. I'm not sure what kind of problem you are trying to solve with this, but it might be better to rethink your design so that the matching condition can be simpler.

Schema for inheriting content from central down to subsidiary

We have a content model where we set content at a central level, but users have the ability to edit the content down to subsidiary level if they want to. If they don't want to change anything from what gets set from central they don't have to.
Let's assume a simple api endpoint that just takes in companyId. This is how it would respond:
GET /offers (central values)
[{
offerId: 1,
title: "Central offer title",
paragraph: "Lorem ipsum central",
price: 100
}, {...more offers}]
Company 123
/offers?companyId=123
[{
offerId: 1,
title: "Company offer title",
paragraph: "Lorem ipsum central", // Inherited from central
price: 125
}, {...more offers}]
Company 456 which is a subsidiary to 123
/offers?companyId=456
[{
offerId: 1,
title: "Company offer title", // Inherited from Company 1
paragraph: "Lorem ipsum subsidiary",
price: 125, // Inherited from Company 1
custom_field: "A completely custom field for subsidiary" // Field only available for subsidiary
}, {...more offers}]
In previous implementations we have done something along the lines of:
{
offerId: 1,
values: [
{
companyId: null,
title: "Central offer title",
paragraph: "Lorem ipsum central",
price: 100
},
{
companyId: 123,
title: "Company offer title",
price: 125
},
{
companyId: 456,
paragraph: "Lorem ipsum subsidiary",
custom_field: "A completely custom field for subsidiary"
}
]
}
And then in the application we have compiled this down so values are specific for subsidiary, but still inheriting data from central or parent company.
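That compile-down step can be sketched as an ordered merge, where more specific levels override less specific ones. The function and parameter names here are illustrative assumptions; the field names follow the example data, and `companyChain` is assumed to be ordered from central (`null`) down to the leaf subsidiary:

```javascript
// Merge offer values from central (first) down to the subsidiary (last);
// fields set at a later (more specific) level override earlier ones.
function compileOffer(offer, companyChain) {
  const result = { offerId: offer.offerId };
  for (const companyId of companyChain) {
    const level = offer.values.find(v => v.companyId === companyId);
    if (!level) continue;
    const { companyId: _ignored, ...fields } = level; // drop the level marker
    Object.assign(result, fields); // later levels win
  }
  return result;
}
```

Calling `compileOffer(offer, [null, 123, 456])` on the stored document above yields exactly the `/offers?companyId=456` response shown earlier.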
Now that we're about to write new applications that should once again allow this type of content inheritance, we're having doubts about this approach and wonder if there's another way to model this.
What other ways exist to model this type of behavior?
Are you using mongoose? If so - use Discriminators:
const event1 = new Event({ time: Date.now() });
const event2 = new ClickedLinkEvent({ time: Date.now(), url: 'google.com' });
const event3 = new SignedUpEvent({ time: Date.now(), user: 'testuser' });
More on that topic in this fine read: https://dev.to/helenasometimes/getting-started-with-mongoose-discriminators-in-expressjs--22m9
If not, then I think you should follow the same approach as mongoose does: have each company as a separate document. In your old setup there are a few problems:
Company resides in Offer document
If multiple offers use the same companies, you'd duplicate company data, right?
There's no easy way to search for a company - you get offers.values and search for a property inside. That's basically traversing your entire database.
If you only search by company (and not by offer ID, as you said), you can reverse them. But it's still awkward. I'd split them into two separate collections and use references between them. This way you could:
Find company by id
Find its parent (repeat until there is no parent; use an aggregation to do it in a single query)
Get a list of offers for that company (document property) and query them
This approach would allow you to do the opposite (offer to company).

how to get the document details according to the field value in mongodb aggregate

I have a collection named users
var UserSchema = new Schema({
name: String,
age: Number,
points: {type: Number, default: 0}
})
All users have some different points, like 5, 10, 20, 50.
I want to count the number of users having 5 points, 10 points, etc., and also show the counted users' details - i.e., which users have 5 points, 10 points, and so on.
How do I write a query for that with aggregate?
You can write a $group stage, count each group with the $sum operator, and push all the values you need using the $push operator:
db.collection.aggregate([
{
"$group": {
"_id": "$points",
"count": { "$sum": 1 },
"details": {
"$push": {
"name": "$name",
"age": "$age"
}
}
}
}
])
In the above example, I've grouped by points; for each group you get the number of users with that score and an array containing the name and age of each of them.
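As a sanity check on what that shape looks like, the same grouping (with a per-group count added) can be mimicked in plain JavaScript:

```javascript
// Group users by points; each group carries a count and the members'
// details, mirroring a $group stage with $push on name and age.
function groupByPoints(users) {
  const groups = new Map();
  for (const { name, age, points } of users) {
    if (!groups.has(points)) {
      groups.set(points, { _id: points, count: 0, details: [] });
    }
    const g = groups.get(points);
    g.count += 1;
    g.details.push({ name, age });
  }
  return [...groups.values()];
}
```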

How does MongoDB $text search works?

I have inserted following values in my events collection
db.events.insert(
[
{ _id: 1, name: "Amusement Ride", description: "Fun" },
{ _id: 2, name: "Walk in Mangroves", description: "Adventure" },
{ _id: 3, name: "Walking in Cypress", description: "Adventure" },
{ _id: 4, name: "Trek at Tikona", description: "Adventure" },
{ _id: 5, name: "Trekking at Tikona", description: "Adventure" }
]
)
I've also created an index in the following way:
db.events.createIndex( { name: "text" } )
Now when I execute the following query (Search - Walk):
db.events.find({
'$text': {
'$search': 'Walk'
},
})
I get these results:
{ _id: 2, name: "Walk in Mangroves", description: "Adventure" },
{ _id: 3, name: "Walking in Cypress", description: "Adventure" }
But when I search Trek:
db.events.find({
'$text': {
'$search': 'Trek'
},
})
I get only one result:
{ _id: 4, name: "Trek at Tikona", description: "Adventure" }
So my question is: why didn't it return
{ _id: 4, name: "Trek at Tikona", description: "Adventure" },
{ _id: 5, name: "Trekking at Tikona", description: "Adventure" }
When I searched for walk, the results included the documents containing both walk and walking. But when I searched for Trek, it returned only the document containing trek, when it should have returned both trek and trekking.
MongoDB text search uses the Snowball stemming library to reduce words to an expected root form (or stem) based on common language rules. Algorithmic stemming provides a quick reduction, but languages have exceptions (such as irregular or contradicting verb conjugation patterns) that can affect accuracy. The Snowball introduction includes a good overview of some of the limitations of algorithmic stemming.
Your example of walking stems to walk and matches as expected.
However, your example of trekking stems to trekk so does not match your search keyword of trek.
You can confirm this by explaining your query and reviewing the parsedTextQuery information which shows the stemmed search terms used:
db.events.find({$text: {$search: 'Trekking'} }).explain().queryPlanner.winningPlan.parsedTextQuery
{
  "terms" : [
    "trekk"
  ],
  "negatedTerms" : [ ],
  "phrases" : [ ],
  "negatedPhrases" : [ ]
}
You can also check expected Snowball stemming using the online Snowball Demo or by finding a Snowball library for your preferred programming language.
To work around exceptions that might commonly affect your use case, you could consider adding another field to your text index with keywords to influence the search results. For this example, you would add trek as a keyword so that the event described as trekking also matches in your search results.
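A sketch of that workaround in the mongo shell (the keywords field name is an assumption; note that a collection can have only one text index, so the existing one has to be replaced):

```javascript
// Replace the single-field text index with one covering both fields
db.events.dropIndex("name_text")
db.events.createIndex({ name: "text", keywords: "text" })

// Tag the trekking event so a search for "Trek" also finds it
db.events.updateOne({ _id: 5 }, { $set: { keywords: ["trek"] } })
```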
There are other approaches for more accurate inflection which are generally referred to as lemmatization. Lemmatization algorithms are more complex and start heading into the domain of natural language processing. There are many open source (and commercial) toolkits that you may be able to leverage if you want to implement more advanced text search in your application, but these are outside the current scope of the MongoDB text search feature.

Is reading whole object from DocumentDb faster and more efficient?

I'm trying to understand if it would actually be more efficient to read the entire document from Azure DocumentDb than it is to read a property that may have multiple objects in it?
Let's use this basketball team object as an example:
{
id: 123,
name: "Los Angeles Lakers",
coach: "Byron Scott",
players: [
{ id: 24, name: "Kobe Bryant" },
{ id: 3, name: "Anthony Brown" },
{ id: 4, name: "Ryan Kelly" },
]
}
If I want to get only a list of players, is it more efficient/faster for me to read the entire team document from which I can extract the players OR is it better to send SQL statement and try to read only the players from the document?
Returning only the players will be more efficient on the network, as you're returning less data. And, you should also be able to look at the Request Units burned for your query.
For example, I put your document into one of my collections and ran two queries in the portal (if you do the same and look at the bottom of the portal, you'll see the resulting Request Unit cost). I slightly modified your document with a unique ID and quotes around everything so I could load it via the portal:
{
"id": "basketball123",
"name": "Los Angeles Lakers",
"coach": "Byron Scott",
"players": [
{ "id": 24, "name": "Kobe Bryant" },
{ "id": 3, "name": "Anthony Brown" },
{ "id": 4, "name": "Ryan Kelly" }
]
}
I first selected just player data:
SELECT c.players FROM c where c.id="basketball123"
with an RU cost of 2.2.
I then asked for the entire document:
SELECT * FROM c where c.id="basketball123"
with an RU cost of 2.24.
Note: Your document size is very small, so there's really not much difference here. But at least you can see that returning a subset costs less than returning the entire document.
