Mongoose query returning repeated results - node.js

The query receives a pair of coordinates, a maximum distance radius, a "skip" integer and a "limit" integer. The function should return the closest and newest locations for the given position. There is no visible error in my code; however, when I call the query again, it returns repeated results. The "skip" variable is updated according to the number of results already returned.
Example:
1) I make the query with skip = 0, limit = 10. I receive 10 locations, none repeated.
2) The query is called again, now with skip = 10, limit = 10. I receive another 10 locations, some repeating results from the first query.
QUERY
Locations.find({ coordinates:
    { $near: [x, y],
      $maxDistance: maxDistance }
})
.sort('date_created')
.skip(skip)
.limit(limit)
.exec(function(err, locations) {
    console.log("[+]Found Locations");
    callback(locations);
});
SCHEMA
var locationSchema = new Schema({
    date_created: { type: Date },
    coordinates: [],
    text: { type: String }
});
I have tried looking everywhere for a solution. My only remaining guess is that it is a version issue? I use Mongoose 4.x.x and MongoDB 2.5.6, I believe. Any ideas?

There are a couple of things to consider here in the sort of results that you want, with the first consideration being that you have a "secondary" sort criterion in "date_created" to deal with.
The basic problem is that the $near operator, like other geospatial operators in MongoDB, does not at present "project" any field indicating the "distance" from the queried location; it simply "default sorts" the data by that distance. So in order to apply that "secondary" sort, a field containing the "distance" needs to be present, and there are other options for achieving this.
The second consideration is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. It is better to select data based on the "range" where it occurs rather than "skip" through all the results you have previously displayed.
The first thing to do here is use a command that can "project" the distance into the document along with the other information. The $geoNear aggregation stage is good for this, especially since we also want to do other sorting:
var seenIds = [],
    lastDistance = null,
    lastDate = null;

Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x, y],
            "maxDistance": maxDistance,
            "distanceField": "dist",
            "limit": 10
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function(err, results) {
        results.forEach(function(result) {
            // Compare by value; Date objects are never equal by reference
            if ( ( result.dist != lastDistance ) ||
                 ( +result.date_created != +lastDate ) ) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
);
That is the first iteration of your results, fetching the first 10. Note the logic inside the loop: each document in the results is inspected for a change in either "date_created" or the projected "dist" field now present in the document, and where either value changes the "seenIds" array is wiped of all current entries. The general action is that all three variables are tested, and possibly updated, on each iteration; where there is no change, the current item is added to the list of "seenIds". For example, if a page returned distances of 2.0, 2.0 and 3.5, the loop would finish with "lastDistance" at 3.5 and "seenIds" holding only the ids from that final group, since the earlier entries were wiped when the distance changed.
All three of those variables need to be stored somewhere awaiting the next request. For web applications the session store is ideal, though approaches vary. You just want those values to be recalled when the next request starts, because on the next and subsequent iterations we alter the query a bit:
Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x, y],
            "maxDistance": maxDistance,
            "minDistance": lastDistance,
            "distanceField": "dist",
            "limit": 10,
            "query": {
                "_id": { "$nin": seenIds },
                "date_created": { "$lt": lastDate }
            }
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function(err, results) {
        results.forEach(function(result) {
            if ( ( result.dist != lastDistance ) ||
                 ( +result.date_created != +lastDate ) ) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
);
So there the "minDistance" parameter is entered as you want to exclude any of the "nearer" results that have already been seen, and the additional checks are placed in the query with the "date_created" needing to be "less than" the "lastDistance" recorded as well since we are in descending order of sort, with the final "sure" filter in excluding any "_id" values that were recorded within the list because the values had not changed.
Now with geospatial data that "seenIds" list is not likely to grow much, as you are generally not going to find many things at exactly the same distance; but it is the general process for paging a sorted list of data like this, so it is worth understanding the concept.
So if you want to use a secondary sort field with geospatial data while also considering the "near" distance, then this is the general approach: project a distance value into the document results, and store the last-seen values before any change that would make them no longer unique.
The general concept is "advancing the minimum distance" so that each page of results gets gradually "further away" from the point of origin used in the query.
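To make the paging flow concrete, here is a minimal sketch of persisting that state between requests, assuming an Express app with express-session already configured; the fetchPage helper and the route are hypothetical names, not part of the original answer:
// Minimal sketch, assuming Express + express-session are configured.
// "fetchPage" wraps the aggregation above and is a hypothetical helper.
app.get('/locations', function(req, res) {
    // Recall the paging state saved by the previous request, if any
    var state = req.session.paging || {
        seenIds: [], lastDistance: null, lastDate: null
    };
    fetchPage(state, function(err, results, newState) {
        if (err) return res.status(500).send(err);
        req.session.paging = newState;   // persist for the next page
        res.json(results);
    });
});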

Related

CouchDB Count Reduce with timestamp filtering

Let's say I have documents like so:
{
    _id: "a98798978s978dd98d",
    type: "signature",
    uid: "u12345",
    category: "cat_1",
    timestamp: UNIX_TIMESTAMP
}
My goal is to be able to count all signatures created by a certain uid while being able to filter by timestamp.
Thanks to Alexis, I've gotten this far with the following map function, paired with the built-in _count reduce:
function (doc) {
    if (doc.type === "signature") {
        emit([doc.uid, doc.timestamp], 1);
    }
}
With the following queries:
start_key=[null,lowerTimestamp]
end_key=[{},higherTimestamp]
reduce=true
group_level=1
Response:
{
    "rows": [
        {
            "key": [ "u11111" ],
            "value": 3
        },
        {
            "key": [ "u12345" ],
            "value": 26
        }
    ]
}
It counts per uid correctly, but the timestamp filter doesn't work properly. At first I thought it might be a CouchDB 2.2 bug, but I tried on Cloudant and got the same response.
Does anyone have any ideas on how I could get this to work while being able to filter by timestamp?
When using compound keys in MapReduce (i.e. the key is an array of things), you cannot query a range of keys with a "leading" array element missing. That is, you can query a range of uids and get the results ordered by timestamp, but your use case is the other way around: you want to query uids by time.
I'd be tempted to put time first in the array, but unix timestamps are not so good for grouping ;). I don't know the ins and outs of your application, but if you were to index a date string instead of a raw timestamp, like so:
function (doc) {
    if (doc.type === "signature") {
        // Note: the Date constructor expects milliseconds; if the stored
        // timestamp is in seconds, multiply by 1000 first
        var date = new Date(doc.timestamp)
        var datestr = date.toISOString().split('T')[0]
        emit([datestr, doc.uid], 1);
    }
}
This would allow you to query a range of dates (to the resolution of a whole day):
?startkey=["2018-01-01"]&endkey=["2018-02-01"]&group_level=2
albeit with your uids grouped by day.
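With group_level=2 the keys retain both elements, so the counts come back per day, per uid; the response would look something like this (values are illustrative):
{
    "rows": [
        { "key": [ "2018-01-14", "u11111" ], "value": 2 },
        { "key": [ "2018-01-15", "u12345" ], "value": 5 }
    ]
}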

mongodb sort by dynamic property [duplicate]

I have a Mongo collection of messages that looks like this:
{
    'recipients': [],
    'unRead': [],
    'content': 'Text'
}
Recipients is an array of user ids, and unRead is an array of all users who have not yet opened the message. That's working as intended, but I need to query the list of all messages so that it returns the first 20 results, prioritizing the unread ones first, something like:
db.messages.find({ recipients: { $elemMatch: userID } })
    .sort({ unRead: { $elemMatch: userID } })
    .limit(20)
But that doesn't work. What's the best way to prioritize results based on whether they fit a certain criteria?
If you want to "weight" results by certain criteria or have any kind of "calculated value" within a "sort", then you need the .aggregate() method instead. This allows "projected" values to be used in the $sort operation, for which only a present field in the document can be used:
db.messages.aggregate([
    { "$match": { "recipients": userId } },
    { "$project": {
        "recipients": 1,
        "unRead": 1,
        "content": 1,
        "readYet": {
            "$setIsSubset": [ [userId], "$unRead" ]
        }
    }},
    { "$sort": { "readYet": -1 } },
    { "$limit": 20 }
])
Here the $setIsSubset operator compares the "unRead" array with a converted array of [userId] to see whether there are any matches. The result is true where the userId exists in the array, or false where it does not.
This can then be passed to $sort, which orders the results with preference to the matches (in a descending sort, true is on top), and finally $limit just returns results up to the amount specified.
So in order to use a calculated term for a "sort", the value needs to be "projected" into the document so it can be sorted upon. The aggregation framework is how you do this.
Also note that $elemMatch is not required just to match a single value within an array; you need only specify the value directly. Its purpose is for cases where "multiple" conditions need to be met by a single array element, which of course does not apply here.
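For illustration, here is a minimal sketch contrasting the two forms; the "scores" field in the second query is hypothetical, purely to show where $elemMatch is actually needed:
// A direct value is enough to match a document whose array contains userId
db.messages.find({ "recipients": userId })

// $elemMatch is for when one and the same array element must satisfy
// multiple conditions ("scores" is a hypothetical field for illustration)
db.messages.find({ "scores": { "$elemMatch": { "$gt": 5, "$lt": 10 } } })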

Mongoose query returns wrong results when using $geoNear aggregate [duplicate]

Take this query:
{ 'location' : { '$near' : [x,y], '$maxDistance' : this.field } }
I want to assign $maxDistance the value of the specified field from the current evaluated document. Is that possible?
Yes, it's possible. You just use $geoNear instead. Beware the catches, and read carefully.
Presuming that your intent is to store a field such as "travelDistance" to indicate on the document that any such searches must be "within" that supplied distance from the queried point to be valid, then we simply query and evaluate the condition with $redact:
db.collection.aggregate([
    { "$geoNear": {
        "near": {
            "type": "Point",
            "coordinates": [x, y]
        },
        "spherical": true,
        "distanceField": "distance"
    }},
    { "$redact": {
        "$cond": {
            "if": { "$lte": [ "$distance", "$travelDistance" ] },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])
The only catch is that $geoNear, just like $near, will only return a certain number of documents "near" the point in the first place. You can tune that with the options, but unlike the general query form, this basically guarantees that the eventual returned results will be fewer than the specified "nearest" number.
As long as you are aware of that, then this is perfectly valid.
It is in fact the general way to deal with qualifying what is "near" within a radius.
Also be aware of the "distance" units, which depend on how you have the coordinates stored. As legacy coordinate pairs, the distances will be in radians, which you will probably need to convert to kilometers or miles.
If using GeoJSON, then the distances are always considered in meters, as the standard format.
All the math notes are in the documentation.
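As a sketch of that conversion, the distanceMultiplier option to $geoNear can scale the returned radians by the Earth's radius (roughly 6378.1 km) so the "distance" field comes back in kilometers:
// Sketch: with legacy coordinate pairs and spherical math the raw
// distances are radians; scaling by the Earth's radius gives kilometers
db.collection.aggregate([
    { "$geoNear": {
        "near": [x, y],
        "spherical": true,
        "distanceField": "distance",
        "distanceMultiplier": 6378.1
    }}
])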
N.B. Read the $geoNear documentation carefully. Options like "spherical" are required for "2dsphere" indexes, such as you should have for real-world coordinates. Also, "limit" may need to be applied to increase past the default result of 100 documents before further trimming.
As the comments mention spring mongo, then here is basically the same thing done for that:
Aggregation aggregation = newAggregation(
    new AggregationOperation() {
        @Override
        public DBObject toDBObject(AggregationOperationContext context) {
            return new BasicDBObject("$geoNear",
                new BasicDBObject(
                    "near", new BasicDBObject("type", "Point")
                        .append("coordinates", Arrays.asList(20, 30))
                )
                .append("spherical", true)
                .append("distanceField", "distance")
            );
        }
    },
    new AggregationOperation() {
        @Override
        public DBObject toDBObject(AggregationOperationContext context) {
            return new BasicDBObject("$redact",
                new BasicDBObject(
                    "$cond", Arrays.asList(
                        new BasicDBObject("$lte", Arrays.asList("$distance", "$travelDistance")),
                        "$$KEEP",
                        "$$PRUNE"
                    )
                )
            );
        }
    }
);

MongoDB update/insert document and Increment the matched array element

I use Node.js and MongoDB with monk.js and I want to do the logging in a minimal way, with one document per hour, like:
final doc:
{
    time: YYYY-MM-DD-HH,
    log: [
        { action: action1, count: 1 },
        { action: action2, count: 27 },
        { action: action3, count: 5 }
    ]
}
The complete document should be created by incrementing one value.
E.g. someone visits a webpage first in this hour, and incrementing action1 should create the following document with one query:
{ time: YYYY-MM-DD-HH, log: [ { action: action1, count: 1 } ] }
Another user visits another webpage in the same hour, and the document should be extended to:
{ time: YYYY-MM-DD-HH, log: [ { action: action1, count: 1 }, { action: action2, count: 1 } ] }
and the values in count should be incremented as the different webpages are visited.
At the moment I create a doc for each action:
tracking.update({
    time: moment().format('YYYY-MM-DD_HH'),
    action: action,
    info: info
}, { $inc: { count: 1 } }, { upsert: true }, function (err) {});
Is this possible with monk.js / mongodb?
EDIT:
Thank you. Your solution looks clean and elegant, but it looks like my server can't handle it, or I am too much of a newbie to make it work.
I wrote an extremely dirty solution with the action name as the key:
tracking.update({ time: time, ts: ts },
    JSON.parse('{ "$inc": {"' + action + '": 1}}'),
    { upsert: true }, function (err) {});
Yes, it is very possible, and it is a well-considered question. The only variation I would make on the approach is to calculate the "time" value as a real Date object (quite useful in MongoDB, and easy to manipulate as well), simply "rounding" the value with basic date math. You could use "moment.js" for the same result, but I find the math simple.
The other main consideration here is that mixing array "push" actions with possible "upsert" document actions can be a real problem, so it is best to handle this with "multiple" update statements, where only the condition you want is going to change anything.
The best way to do that, is with MongoDB Bulk Operations.
Consider that your data comes in something like this:
{ "timestamp": 1439381722531, "action": "action1" }
Where the "timestamp" is an epoch timestamp value acurate to the millisecond. So the handling of this looks like:
// Just adding for the listing, assuming already defined otherwise
var payload = { "timestamp": 1439381722531, "action": "action1" };

// Round down to the hour
var hour = new Date(
    payload.timestamp - ( payload.timestamp % ( 1000 * 60 * 60 ) )
);

// Init transaction
var bulk = db.collection.initializeOrderedBulkOp();

// Try to increment where the array element exists in the document
bulk.find({
    "time": hour,
    "log.action": payload.action
}).updateOne({
    "$inc": { "log.$.count": 1 }
});

// Try to upsert where the document does not exist
bulk.find({ "time": hour }).upsert().updateOne({
    "$setOnInsert": {
        "log": [{ "action": payload.action, "count": 1 }]
    }
});

// Try to "push" where the array element does not exist in the matched document
bulk.find({
    "time": hour,
    "log.action": { "$ne": payload.action }
}).updateOne({
    "$push": { "log": { "action": payload.action, "count": 1 } }
});

bulk.execute();
So if you look through the logic there, you will see that it is only ever possible for "one" of those statements to be true for any given state of the document, existing or not. Technically speaking, the statement with the "upsert" can actually match a document when it exists; however, the $setOnInsert operation used makes sure that no changes are made unless the action actually "inserts" a new document.
Since all operations are fired in "Bulk", the only time the server is contacted is on the .execute() call. So there is only "one" request to the server and only "one" response, despite the multiple operations.
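Since the question uses monk.js, note that the bulk API lives on the underlying native driver collection; a minimal sketch, assuming monk exposes that collection as .col (verify this against your monk version before relying on it):
// Minimal sketch, assuming monk exposes the native driver collection
// as ".col"; check your monk version's API before relying on this
var bulk = tracking.col.initializeOrderedBulkOp();
// ... the same three conditional updates as above ...
bulk.execute(function (err, result) {
    if (err) console.error(err);
});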
In this way the conditions are all met:
Create a new document for the current period where one does not exist and insert initial data to the array.
Add a new item to the array where the current "action" classification does not exist and add an initial count.
Increment the count property of the specified action within the array upon execution of the statement.
All in all: yes, it is possible, and it is also a great idea for storage, as long as the action classifications do not grow too large within a period (500 array elements should be used as a maximum guide). The updating is very efficient and self-contained within a single document for each time sample.
The structure is also nice and well suited to other queries and possible additional aggregation purposes as well.
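As a sketch of that kind of follow-up aggregation, daily totals per action could be derived from the hourly documents like this (the dayStart and dayEnd bounds are hypothetical variables):
// Sketch: total counts per action across a day, assuming the hourly
// documents above; "dayStart"/"dayEnd" are hypothetical Date bounds
db.collection.aggregate([
    { "$match": { "time": { "$gte": dayStart, "$lt": dayEnd } } },
    { "$unwind": "$log" },
    { "$group": {
        "_id": "$log.action",
        "total": { "$sum": "$log.count" }
    }}
])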

mapReduce not calling map nor reduce

I just started working with MongoDB and I am having trouble using the mapReduce function.
For some reason it does not seem to be calling the map and reduce functions.
Here is my code:
@getMonthlyReports: (req, res) ->
  app_id = req.app.id
  start = moment().subtract('years', 1).startOf('month').unix()
  end = moment().endOf('day').unix()
  console.log(start)
  console.log(end)

  map = ->
    geotriggers = 0
    pushes = 0
    console.log("ok")
    date = moment(@timestamp).startOf('month').unix()
    for campaign in @campaigns
      if campaign.geotriggers?
        geotriggers += campaign.geotriggers
      else if campaign.pushes?
        pushes += campaign.pushes
    emit date,
      geotriggers: geotriggers
      pushes: pushes

  reduce = (key, values) ->
    console.log("ok")
    geotriggers = 0
    pushes = 0
    for value in values
      geotriggers += value.geotriggers
      pushes += value.pushes
    geotriggers: geotriggers
    pushes: pushes

  common.db.collection(req.app.id + "_daily_campaign_reports").mapReduce map, reduce,
    query:
      timestamp:
        $gte: start
        $lt: end
    out:
      inline: 1
  , (err, results) ->
    console.log(results)
    ResponseHelper.returnMessage req, res, 200, results
I put in some console.logs, and it seems the map and reduce functions are not being called.
Also, my results are undefined.
Is there something I am missing?
As I have already commented, the reason your mapReduce is failing is that it calls a library function (moment.js) that does not exist on your server. Beyond that, this is not really a good usage of mapReduce.
While mapReduce has its uses, a simple aggregation case like this is better suited to the aggregation framework, which is a native C++ implementation, as opposed to mapReduce, which runs inside a JavaScript interpreter. As a result, the processing is much faster.
All you need are your existing unix timestamp values for start and end, as well as the current day of the month (dayOfMonth), in order to do the date math:
db.collection.aggregate([
    // Match documents using your existing start and end values
    { "$match": {
        "timestamp": { "$gte": start, "$lt": end }
    }},

    // Unwind the campaigns array
    { "$unwind": "$campaigns" },

    // Group on the start-of-month value
    { "$group": {
        "_id": {
            "$subtract": [
                "$timestamp",
                // NOTE: if your timestamps come from moment().unix() they
                // are in seconds, so drop the 1000 factor here
                { "$mod": [ "$timestamp", 1000 * 60 * 60 * 24 * dayOfMonth ] }
            ]
        },
        "geotriggers": {
            "$sum": {
                "$cond": [ "$campaigns.geotriggers", 1, 0 ]
            }
        },
        "pushes": {
            "$sum": {
                "$cond": [ "$campaigns.pushes", 1, 0 ]
            }
        }
    }}
])
If I am reading your code correctly, each of your documents contains an array in "campaigns", so to deal with this in the aggregation framework you use the $unwind pipeline stage to expose each array member as its own document.
The date math is done in the $group stage for the _id key by changing the "timestamp" value to equal the starting date of the month, which is the same thing your code is trying to do. It is debatable that you could just use null here, as your range selection is only going to result in a single date value, but this is just to show that the date math is possible.
With the "unwound" array elements, we process every element just like the "for loop" does, conditionally adding to the counts for "geotriggers" and "pushes" using the $cond operator. Again, this presumes, as your code does, that these fields evaluate to boolean true/false, which is the evaluation part of $cond.
Your query condition is of course just met with the $match stage at the start of the pipeline, using the same range query.
That basically does the same thing without relying on additional libraries for server-side processing, and it does it much faster as well.
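Wired back into the handler from the question, the call is a direct substitution for the mapReduce call; a sketch in plain JavaScript, where "pipeline" stands for the stage array shown above:
// Sketch: the same collection call with aggregate() in place of
// mapReduce(); "pipeline" stands for the stage array shown above
common.db.collection(req.app.id + "_daily_campaign_reports")
    .aggregate(pipeline, function (err, results) {
        console.log(results);
        ResponseHelper.returnMessage(req, res, 200, results);
    });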
See the other Aggregation Framework operators for reference.
