Mongo: index performance and order of query fields - node.js

I am puzzled by MongoDB's indexes and their performance. I am building a node.js app to find dishes within a certain area.
My collection looks like this and has about 3M dishes:
{
"_id" : ObjectId("560efcf76ea0f2293c60bf6a"),
"name" : "Brunch Pizza",
"loc" : [
-77.063166,
38.906866
],
"rs" : NumberInt(4) }
I have several indexes, but here are the relevant ones:
{ "loc" : "2d" }
{ "loc" : "2d", "name" : 1}
Now, when I query it for either field, the response times are very quick (less than 0.2 second). When I query for both together, it's about 1 or 2 seconds. What am I doing wrong?
sort is always: {rs:-1}
{"loc":{"$within":{"$box":[[-78.0,38.0],[-77.0,39.0]]}}}: 0.173s for 186k documents
{name:/pizza/gi}: 0.112s
but
{"loc":{"$within":{"$box":[[-78.0,38.0],[-77.0,39.0]]}}, name:/pizza/gi}: 2s
{name:/pizza/gi, "loc":{"$within":{"$box":[[-78.0,38.0],[-77.0,39.0]]}}}: 2s
These are the numbers from MongoChef, but the times are similar when called from node.

Related

Count occurrence of id in a complex array of objects structure(mongodb)

I need to count the number of occurrences of a specific product id in a highly complex array-of-objects structure.
I tried the aggregation functionality of MongoDB, but I only got the occurrences per document rather than the occurrences across all the documents. The data structure of each document looks like this:
{
"_id" : ObjectId("5c4d67905a07f5dec1fcc763"),
"updatedAt" : ISODate("2019-01-27T08:16:11.706Z"),
"createdAt" : ISODate("2019-01-27T08:10:56.553Z"),
"pickupTime" : ISODate("2019-01-27T08:20:00.000Z"),
"shop" : ObjectId("5c24b007b55ea3d7c599c95b"),
"owner" : ObjectId("5c242e361ee775cdd8b047b6"),
"total" : 350,
"status" : "completed",
"lineItems" : [
{
"product" : {
"options" : [],
"enabled" : false,
"description" : "",
"shop" : ObjectId("5c24b007b55ea3d7c599c95b"),
"price" : 350,
"name" : "Capuccino",
"createdAt" : ISODate("2019-01-01T21:56:31.928Z"),
"updatedAt" : ISODate("2019-01-08T12:14:53.322Z"),
"_id" : ObjectId("5c2be20f8e52115849726fdc")
},
"_id" : ObjectId("5c48efbc9efde32fab7ae47d"),
"status" : "pending",
"quantity" : 1,
"config" : []
}
],
"__v" : 0
}
As you can see, each document is an order. Each order has lineItems, an array of objects that contain the quantity of the desired product, the status of the current product, some configuration and the product itself. The product field is a snapshot of the product at the moment of the order, since the supplier may change it later (cost, name, description, etc.).
So far I have tried the following aggregation stage:
{
$group: {
_id: {
'product': '$lineItems.product._id'
},
count: {
$sum: 1
}
}
}
But it only returns the occurrences of a product in each order, rather than across all the orders. I understood that I need to use $reduce to reformat the data structure, but I couldn't find a way to organize my current data structure to make it match. I was also told that after counting per product I can use $sort to put the highest values on top and then limit the result to the number of records to return.
The desired result is to get the top 5 most sold products based on the orders data.
Thanks.
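One possible way to count across all orders, offered only as a sketch: flatten lineItems with $unwind before grouping, so that items from every order land in the same pool, then apply the $sort and limit mentioned above. The helper names buildTopProductsPipeline and topProductsInMemory are hypothetical, and summing quantity (rather than counting line items with $sum: 1) is an assumption about what "most sold" means here. The second function repeats the same counting in plain JavaScript so the logic can be checked without a database.

```javascript
// Hypothetical helper: builds a pipeline that counts products across ALL orders.
function buildTopProductsPipeline(limit) {
  return [
    // One output document per line item, so items from every order are pooled.
    { $unwind: "$lineItems" },
    // Group the pooled items by the _id of the embedded product snapshot.
    {
      $group: {
        _id: "$lineItems.product._id",
        name: { $first: "$lineItems.product.name" },
        // Assumption: "most sold" means total quantity; use { $sum: 1 } to count line items.
        count: { $sum: "$lineItems.quantity" }
      }
    },
    // Highest totals first, then keep only the requested number of products.
    { $sort: { count: -1 } },
    { $limit: limit }
  ];
}

// Plain-JavaScript version of the same counting, for checking the logic.
function topProductsInMemory(orders, limit) {
  const totals = new Map();
  for (const order of orders) {
    for (const item of order.lineItems) {
      const key = String(item.product._id);
      const entry = totals.get(key) || { _id: key, name: item.product.name, count: 0 };
      entry.count += item.quantity;
      totals.set(key, entry);
    }
  }
  return [...totals.values()].sort((a, b) => b.count - a.count).slice(0, limit);
}
```

With a Mongoose model for the orders (called Order here purely for illustration), the pipeline would be passed as Order.aggregate(buildTopProductsPipeline(5)).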

mongoose, nodejs - add reference of current schema object to the previous schema object

I am using mongoose, nodejs with MVC architecture.
So, I have two collections crops and pesticides. I want a many to many relationship between these two collections.
For example, if I have 2 crops like below:
{
"_id" : ObjectId("5af1d1d54558fae1d0010bb4"),
"nameOfCrop" : "Tomato",
"imageOfCrop" : "tomatoimage",
"soilType" : " almost all soil types except heavy clay",
"waterNeeded" : "water once every two or three days",
"tagCrop" : "Vegetables",
"pesticideForCrop" : [ ]
}
{
"_id" : ObjectId("5af1d1d54558fae1d0010bb5"),
"nameOfCrop" : "Brinjal",
"imageOfCrop" : "brinjalimage",
"soilType" : "all types of soil varying from light sandy to heavy clay",
"waterNeeded" : "Regularly irrigated",
"tagCrop" : "Vegetables",
"pesticideForCrop" : [ ]
}
and two pesticides like below:
{
"_id" : ObjectId("5af7d3e735d4222b78a93838"),
"cropForPesticide" : [ ],
"nameOfPesticide" : "pesticide8",
"imageOfPesticide" : "p8image",
"__v" : 0
}
{
"_id" : ObjectId("5af7d49122b63e0824ed2d3d"),
"cropForPesticide" : [ ],
"nameOfPesticide" : "pesticide9",
"imageOfPesticide" : "p9image",
"__v" : 0
}
What I want is for tomato's pesticideForCrop key to hold the object ids of, say, pesticide8 and pesticide9 (meaning tomato can be treated with pesticide8 and pesticide9), and at the same time I want a reference (_id) to tomato in the cropForPesticide key of both pesticide8 and pesticide9.
I only have a very vague approach in mind: first I save a crop with the pesticideForCrop key empty. Then I save a pesticide, and while saving it I ask the user to select the crops that can be treated with that pesticide. I don't know how to code this. It would be nice if someone could suggest another feasible approach or point me in the right direction on how to code this.
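A minimal sketch of how the two-sided link described above could be written, assuming the references are plain _id values stored in the pesticideForCrop and cropForPesticide arrays. The helper name buildLinkUpdates is hypothetical. It only builds the filter and update documents; $addToSet is used so that linking the same pair twice does not create duplicates.

```javascript
// Hypothetical helper: returns the two updates needed to link one crop to several pesticides.
function buildLinkUpdates(cropId, pesticideIds) {
  return {
    // Add every pesticide id to the crop; $addToSet skips ids that are already present.
    crop: {
      filter: { _id: cropId },
      update: { $addToSet: { pesticideForCrop: { $each: pesticideIds } } }
    },
    // Add the crop id to each of the selected pesticides.
    pesticides: {
      filter: { _id: { $in: pesticideIds } },
      update: { $addToSet: { cropForPesticide: cropId } }
    }
  };
}
```

Assuming Mongoose models named Crop and Pesticide (names chosen for illustration), the result could be applied with Crop.updateOne(u.crop.filter, u.crop.update) followed by Pesticide.updateMany(u.pesticides.filter, u.pesticides.update). These are two separate writes, so they are not atomic as a pair.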

Sort documents by a present field and a calculated value

How would I go about displaying the best reviews and the worst reviews at the top of the page?
I think the users' "useful" and "notUseful" votes should have an effect on the result.
I have reviews, and if people click on the useful and notUseful buttons their Id gets added to the appropriate array (useful or notUseful).
You can tell whether a review is positive or negative by the "overall" score, which runs from 1 through 5, so 1 would be the worst and 5 would be the best.
I guess if someone gave a review with a 5 overall score but only got one useful, and someone else gave a 4 overall with 100 people clicking on "useful", the one with 100 people should be shown as the best positive?
I only want to show 2 reviews at the top of the page: the best and the worst review. If there are ties in the overall scores, the deciding factor should be the usefulness. So if there are 2 reviews with the same overall score, and one of them has 5 usefuls and 10 notUsefuls, that would be -5 usefuls; if the other review has 5 usefuls and 4 notUsefuls, that would be 1 useful, so that one would be shown to break the tie.
I'm hoping to do it with one mongoose query and not aggregation, but I think the answer will be aggregation.
I guess there could be a cut-off, like scores greater than 3 are positive reviews and lower ones are negative reviews.
I use mongoose.
Thanks in advance for your help.
Some sample data:
{
"_id" : ObjectId("5929f89a54aa92274c4e4677"),
"compId" : ObjectId("58d94c441eb9e52454932db6"),
"anonId" : ObjectId("5929f88154aa92274c4e4675"),
"overall" : 3,
"titleReview" : "53",
"reviewText" : "53",
"companyName" : "store1",
"replies" : [],
"version" : 2,
"notUseful" : [ObjectId("58d94c441eb9e52454932db6")],
"useful" : [],
"dateCreated" : ISODate("2017-05-27T22:07:22.207Z"),
"images" : [],
"__v" : 0
}
{
"_id" : ObjectId("5929f8dfa1435135fc5e904b"),
"compId" : ObjectId("58d94c441eb9e52454932db6"),
"anonId" : ObjectId("5929f8bab0bc8834f41e9cf8"),
"overall" : 3,
"titleReview" : "54",
"reviewText" : "54",
"companyName" : "store1",
"replies" : [],
"version" : 1,
"notUseful" : [ObjectId("5929f83bf371672714bb8d44"), ObjectId("5929f853f371672714bb8d46")],
"useful" : [],
"dateCreated" : ISODate("2017-05-27T22:08:31.516Z"),
"images" : [],
"__v" : 0
}
{
"_id" : ObjectId("5929f956a692e82398aaa2f2"),
"compId" : ObjectId("58d94c441eb9e52454932db6"),
"anonId" : ObjectId("5929f93da692e82398aaa2f0"),
"overall" : 3,
"titleReview" : "56",
"reviewText" : "56",
"companyName" : "store1",
"replies" : [],
"version" : 1,
"notUseful" : [],
"useful" : [],
"dateCreated" : ISODate("2017-05-27T22:10:30.608Z"),
"images" : [],
"__v" : 0
}
If I am reading your question correctly, you want a calculated difference of the "useful" and "notUseful" votes to also be taken into account when sorting on the "overall" score of the documents.
The better option here is to include that calculation in your stored documents, but for completeness we will cover both options.
Aggregation
Without changes to your schema and other logic, then aggregation is indeed required to do that calculation. This is best presented as:
Model.aggregate([
{ "$addFields": {
"netUseful": {
"$subtract": [
{ "$size": "$useful" },
{ "$size": "$notUseful" }
]
}
}},
{ "$sort": { "overall": 1, "netUseful": -1 } }
],function(err, result) {
})
So you are basically getting the difference between the sizes of the two arrays, where more "useful" responses boost the ranking and more "notUseful" responses reduce it. Depending on the MongoDB version you have available, you use either $addFields with only the additional field or $project with all the fields you need to return.
The $sort is then performed on the combination of the "overall" score in ascending order as per your rules, and the new field of "netUseful" in descending order ranking "positive" to "negative".
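To make the ordering concrete, here is a plain-JavaScript sketch that mirrors the calculation and sort of the pipeline above on an in-memory array. It is only an illustration of the ranking rule, not a replacement for the aggregation; the function names netUseful and rankReviews are hypothetical, while the field names come from the sample data.

```javascript
// Net usefulness: number of "useful" votes minus number of "notUseful" votes.
function netUseful(review) {
  return review.useful.length - review.notUseful.length;
}

// Same ordering as { overall: 1, netUseful: -1 }: overall ascending, ties broken by netUseful descending.
function rankReviews(reviews) {
  return reviews
    .map((r) => ({ ...r, netUseful: netUseful(r) }))
    .sort((a, b) => a.overall - b.overall || b.netUseful - a.netUseful);
}
```

Applied to the tie from the question (5 usefuls and 10 notUsefuls against 5 usefuls and 4 notUsefuls, same overall score), the second review has a netUseful of 1 against -5 and therefore sorts first.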
Re-Modelling
Foregoing aggregation altogether, you get a faster result from the plain query. But this of course means maintaining that "score" in the document as you add members to the array.
In basic options, you are using the $inc update operator along with $push to change the score.
So for a "useful" entry, you would do something like this:
Model.update(
{ "_id": docId, "useful": { "$ne": userId } },
{
"$push": { "useful": userId },
"$inc": { "netUseful": 1 }
},
function(err, status) {
}
)
And for a "notUseful" you do the opposite by "decrementing" with a negative value to $inc:
Model.update(
// Match only if the user has not already voted "notUseful"
{ "_id": docId, "notUseful": { "$ne": userId } },
{
"$push": { "notUseful": userId },
"$inc": { "netUseful": -1 }
},
function(err, status) {
}
)
To cover all cases, including where a vote is "changed" from "useful" to "notUseful", you would expand on the logic and implement the appropriate reverse actions with $pull. But this should give the general idea.
N.B The reason we do not use the $addToSet operation here is because we want to make sure the user id is not present in the array when "incrementing" or "decrementing". Thus instead the $ne operator is used to test the value does not exist. If it does, then we do not attempt to modify the array or affect the "netUseful" value. The same applies to the reverse case of "removing" the user from those votes.
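As a rough sketch of the "changed vote" case, the helper below builds the filter and update for moving a user from one array to the other. The name buildSwitchVote is hypothetical and this is one possible shape, not necessarily the only correct one. The score moves by 2 because one vote is removed from one side and added to the other.

```javascript
// Hypothetical helper: builds the filter and update for moving a user's vote between arrays.
// `from` and `to` are the field names "useful" and "notUseful" (in either order).
function buildSwitchVote(docId, userId, from, to) {
  // useful -> notUseful removes a +1 and adds a -1 (net -2); the reverse nets +2.
  const delta = to === "useful" ? 2 : -2;
  return {
    // Only match when the user is currently in `from` and not yet in `to`.
    filter: { _id: docId, [from]: userId, [to]: { $ne: userId } },
    update: {
      $pull: { [from]: userId },
      $push: { [to]: userId },
      $inc: { netUseful: delta }
    }
  };
}
```

The two parts would be passed to the same kind of update call as above, for example Model.update(op.filter, op.update, callback).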
Since the calculation is always maintained with each update, you simply perform a query with a standard .sort():
Model.find().sort({ "overall": 1, "netUseful": -1 }).exec(function(err,results) {
})
So by moving the "cost" into the maintenance of the "votes", you remove the overhead of running the aggregation later. For my money, where this is a regular operation and the "sort" does not rely on other run-time parameters which force the calculation to be dynamic, then you use the stored result instead.

Mongodb query slow response time

I'm working on a project that uses flexible schemas. I've setup a local mongodb server and am using mongoose inside node.
I'm having an interesting scaling problem and was wondering if these response times are normal. If a query returns 50 documents, it takes 5-10 seconds for mongo to respond. In the same collection, a query that returns 2 documents takes milliseconds.
It's not a slow connection because it's local, was wondering if anyone had an idea as to what was causing this.
I'm using OS X and mongo 3.0.1
Edit: The documents are nearly empty at the moment, with just one or two properties.
Edit: The total number of documents doesn't really matter, just the returned size. If there are 51 documents, 50 like {_id: "...", _schema:"bar"} and 1 {_id:"...", _schema: "foobar" } then collection.find({_schema:"bar"}) takes several seconds and collection.find({_schema:"foobar"}) takes no time.
Explain output:
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mean-dev.documentmodels",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Sams-MBP.local",
"port" : 27017,
"version" : "3.0.1",
"gitVersion" : "nogitversion"
},
"ok" : 1
No, it should not take that much time.
The issue is probably in the operations in your query (projections, sorting, geosearch, grouping, etc). The usual way to address that is to create an index that supports the query.
To create an index on the _schema field, run this command in the mongo shell (createIndex replaces the older ensureIndex, which is deprecated as of MongoDB 3.0):
db.collection.createIndex({"_schema":1});
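To check whether an index is actually being picked up, the winningPlan in the explain output can be inspected for a COLLSCAN stage, as in the output shown in the question. The sketch below walks the plan through its inputStage links; the function names are hypothetical, and plans that branch through an inputStages array are not handled here, so treat it as a starting point only.

```javascript
// Collects the stage names of a plan by following its chain of inputStage links.
function planStages(plan) {
  const stages = [];
  let node = plan;
  while (node) {
    stages.push(node.stage);
    node = node.inputStage;
  }
  return stages;
}

// True when the winning plan reads the whole collection instead of using an index.
function usesCollectionScan(explainOutput) {
  return planStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}
```

For the explain output above, usesCollectionScan reports a collection scan, since the winning plan's only stage is COLLSCAN.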

Using near with elemMatch in Mongoose

I am searching within a collection of Stores. Stores have an embedded collection of outlets with locations. My goal is to return the set of stores that have outlets near a geolocation, and also only return those Outlets within that location.
I can successfully restrict the query to only return Stores that have an Outlet at a particular location using 'near':
Store
.where('isActive').equals(true)
.where('outlets.location')
.near({ center: [153.027117, -27.468515], maxDistance: 1000 / 6378137, spherical: true })
.where('outlets.isActive').equals(true)
.where('products.productType').equals('53433f1f3e02e39addde1954')
.where('products.isActive').equals(true)
.select('name outlets')
.select({'products': {$elemMatch: {'isActive': true, 'productType': '53433f1f3e02e39addde1954'}}})
.select('name outlets')
.execQ()
.then(function (results) {
console.log(results);
})
.fail(function (err) {
console.log(err);
})
.done();
The problem I have is that the store document returns all the outlets, not just the outlet that matched the geolocation. I've tried using elemMatch within a select like I did with the products:
.select({'outlets': {$elemMatch: {'location': {near:{ center: [153.027117, -27.468515], maxDistance: 10000 / 6378137, spherical: true }}}}})
However, it returns an empty array. Can I use the near operator in an elemMatch clause? Am I doing it incorrectly? Is there a more efficient/fast/better way to achieve the goal?
I see what you are trying to do here, but there seem to be a few flaws in this sort of design. Though it is not exactly your document structure, I see you are trying to do something like this:
{
"_id" : ObjectId("5344badd519563414f23fdf8"),
"store" : "Mine",
"outlets" : [
{
"name" : "somewhere",
"loc" : {
"type" : "Point",
"coordinates" : [
150.975131,
-33.8440366
]
}
},
{
"name" : "else",
"loc" : {
"type" : "Point",
"coordinates" : [
151.3651524,
-33.8389783
]
}
}
]
}
{
"_id" : ObjectId("5344be6f519563414f23fdf9"),
"store" : "Another",
"outlets" : [
{
"name" : "else",
"loc" : {
"type" : "Point",
"coordinates" : [
151.3651524,
-33.8389783
]
}
},
{
"name" : "somewhere",
"loc" : {
"type" : "Point",
"coordinates" : [
150.975131,
-33.8440366
]
}
}
]
}
So basically you appear to be attempting to nest the outlet locations within an array in a top level document.
What I am referring to as a flaw here is that, by design, any type of "near" based query is going to return more than 1 result. That does seem logical when you look at the purpose. You can of course restrict the results with "maxDistance", but generally there will be more than 1.
So the only way is to .limit() the results returned by the cursor to a single "nearest" response. Also note that with some operations those results are not necessarily "sorted" with the "nearest" response first.
Now, as these results are actually contained within an array of the document, remember that .find() itself does not actually "filter" the contents of an array, so of course the document will contain all of the array contents.
If you tried to "project" with the positional $ operator, the problem falls back to the original point: there is no single actual match, so it is not possible to return an "index" value for the matching element. If you did in fact try this, you would always get the default index value of 0, and so just the first element.
If you then thought you could run off to aggregate and try to "de-normalize" the array entries, you would be out of luck, because the geospatial match needs to use the index and therefore has to be the first stage of the aggregation pipeline.
So the summary of this is that embedded entries like this are not suited to this design where you need to do geo-spatial matching on those store locations. The locations are better off in a separate collection:
{
"_id" : ObjectId("5344bec7519563414f23fdfa"),
"store": "Mine",
"name" : "else",
"loc" : {
"type" : "Point",
"coordinates" : [
151.3651524,
-33.8389783
]
}
}
{
"_id" : ObjectId("5344bed5519563414f23fdfb"),
"store": "Mine",
"name" : "somewhere",
"loc" : {
"type" : "Point",
"coordinates" : [
150.975131,
-33.8440366
]
}
}
So that would allow you to "limit" the result to the "nearest" match by setting the limit to 1. You can also include any necessary values such as the "store" to be used in your filtering. If you need to you can include other information aside from what you need to filter or otherwise just put the ObjectId values within the array of the original object, or possibly even duplicate for both collections.
But since these queries are by their very nature intended to return more than 1 match, there is no way you are going to get this to work on embedded documents. So your solution will require some changes in your overall schema.
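With the outlets in their own collection, a query for the nearest outlet of a given store could look roughly like the sketch below. It assumes a 2dsphere index on "loc" and GeoJSON points as in the documents above; the helper name buildNearestOutletFilter and the parameter names are hypothetical. With $geometry, $maxDistance is given in meters.

```javascript
// Hypothetical helper: builds a filter for outlets of one store near a point.
// Assumes a 2dsphere index on "loc"; maxMeters is the search radius in meters.
function buildNearestOutletFilter(store, longitude, latitude, maxMeters) {
  return {
    store: store,
    loc: {
      $near: {
        // GeoJSON order is [longitude, latitude].
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxMeters
      }
    }
  };
}
```

Assuming a model for the new collection (called Outlet here for illustration), the single nearest match would then be Outlet.find(buildNearestOutletFilter("Mine", 150.975131, -33.8440366, 1000)).limit(1).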
