MongoDB update/insert document and Increment the matched array element - node.js

I use Node.js and MongoDB with monk.js, and I want to do logging in a minimal way, with one document per hour, like this:
final doc:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1 }, {action: action2, count: 27 }, {action: action3, count: 5 } ] }
The complete document should be created by incrementing one value.
E.g. when the first visitor this hour opens a webpage, incrementing action1 should create the following document with a single query:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1} ] }
Another user visits a different webpage in the same hour, and the document should be extended to:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1}, {action: action2, count: 1} ] }
The count values should then be incremented as the different webpages are visited.
At the moment I create one document for each action:
tracking.update({
    time: moment().format('YYYY-MM-DD_HH'),
    action: action,
    info: info
}, { $inc: {count: 1} }, { upsert: true }, function (err){});
Is this possible with monk.js / MongoDB?
EDIT:
Thank you. Your solution looks clean and elegant, but it looks like my server can't handle it, or I am too much of a beginner to make it work.
I wrote an extremely dirty solution with the action name as the key:
tracking.update({ time: time, ts: ts }, JSON.parse('{ "$inc": {"'+action+'": 1}}'), { upsert: true }, function (err) {});

Yes, it is very possible, and it is a well-considered question. The only variation I would make on the approach is to calculate the "time" value as a real Date object (quite useful in MongoDB, and easy to manipulate as well) and simply "round" the values with basic date math. You could use "moment.js" for the same result, but I find the math simple.
The other main consideration here is that mixing array "push" actions with possible "upsert" document actions can be a real problem, so it is best to handle this with "multiple" update statements, where only the condition you want is going to change anything.
The best way to do that is with MongoDB Bulk Operations.
Consider that your data comes in something like this:
{ "timestamp": 1439381722531, "action": "action1" }
Where the "timestamp" is an epoch timestamp value accurate to the millisecond. So the handling of this looks like:
// Just adding for the listing, assuming already defined otherwise
var payload = { "timestamp": 1439381722531, "action": "action1" };
// Round down to the start of the hour
var hour = new Date(
    payload.timestamp - ( payload.timestamp % ( 1000 * 60 * 60 ) )
);
// Initialize an ordered bulk operation (a batch of statements, not a transaction)
var bulk = db.collection.initializeOrderedBulkOp();
// Try to increment where array element exists in document
bulk.find({
    "time": hour,
    "log.action": payload.action
}).updateOne({
    "$inc": { "log.$.count": 1 }
});
// Try to upsert where document does not exist
bulk.find({ "time": hour }).upsert().updateOne({
    "$setOnInsert": {
        "log": [{ "action": payload.action, "count": 1 }]
    }
});
// Try to "push" where array element does not exist in matched document
bulk.find({
    "time": hour,
    "log.action": { "$ne": payload.action }
}).updateOne({
    "$push": { "log": { "action": payload.action, "count": 1 } }
});
// Send all three statements to the server in a single request
bulk.execute();
So if you look through the logic there, you will see that only "one" of those statements can ever modify anything for any given state of the document, whether it exists or not. Technically speaking, the statement with the "upsert" can actually match a document when it exists; however, the $setOnInsert operation used makes sure that no changes are made unless the action actually "inserts" a new document.
Since all operations are fired in "Bulk", the only time the server is contacted is on the .execute() call. So there is only "one" request to the server and only "one" response, despite the multiple operations.
In this way the conditions are all met:
Create a new document for the current period where one does not exist and insert initial data to the array.
Add a new item to the array where the current "action" classification does not exist and add an initial count.
Increment the count property of the specified action within the array upon execution of the statement.
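The decision logic of the three statements can be sketched in plain JavaScript, without a database. This is only an in-memory illustration: the names roundToHour and recordAction, and the Map used as a stand-in for the collection, are hypothetical, and the real work would still be done by the bulk operations on the server.

```javascript
const HOUR_MS = 1000 * 60 * 60;

// Round an epoch-millisecond timestamp down to the start of its hour.
function roundToHour(timestampMs) {
  return new Date(timestampMs - (timestampMs % HOUR_MS));
}

// Hypothetical in-memory stand-in for the collection: a Map keyed by hour.
// Applies the same three conditional updates, in the same order.
function recordAction(store, payload) {
  const key = roundToHour(payload.timestamp).toISOString();
  let doc = store.get(key);

  // 1. Increment where the array element already exists in the document.
  const entry = doc && doc.log.find((e) => e.action === payload.action);
  if (entry) {
    entry.count += 1;
    return doc;
  }

  // 2. "Upsert": create the document when none exists for this hour.
  if (!doc) {
    doc = { time: key, log: [{ action: payload.action, count: 1 }] };
    store.set(key, doc);
    return doc;
  }

  // 3. Push a new element when the document exists but the action does not.
  doc.log.push({ action: payload.action, count: 1 });
  return doc;
}
```

Calling recordAction twice with "action1" and once with "action2" for timestamps in the same hour leaves a single document whose log holds action1 with count 2 and action2 with count 1.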
All in all, yes, it is possible, and it is also a great idea for storage as long as the action classifications do not grow too large within a period (treat 500 array elements as a maximum guide). The updating is very efficient and self-contained within a single document for each time sample.
The structure is also nice and well suited to other query and possible additional aggregation purposes as well.

Related

Node.js/MongoDB - querying dates

I'm having a bit of an issue understanding how to query dates; I think the issue might be with how my data is structured. Here is a sample document on my database.
{
"phone_num": 12553,
"facilities": [
"flat-screen",
"parking"
],
"surroundings": [
"ping-pong",
"pool"
],
"rooms": [
{
"room_name": "Standard Suite",
"capacity": 2,
"bed_num": 1,
"price": 50,
"floor": 1,
"reservations": [
{
"checkIn": {
"$date": "2019-01-10T23:23:50.000Z"
},
"checkOut": {
"$date": "2019-01-20T23:23:50.000Z"
}
}
]
}
]
}
I'm trying to query the dates to check whether a specific room is available in a certain date range, but no matter what I do I can't seem to get a proper result: my query either 404s or returns an empty array.
I really tried everything, right now for simplicity I'm just trying to get the query to work with checkIn so I can figure out what I'm doing wrong. I tried 100 variants of the code below but I couldn't get it to work at all.
.find({"rooms": { "reservations": { "checkIn" : {"$gte": { "$date": "2019-01-09T00:00:00.000Z"}}}}})
Am I misunderstanding how the .find method works or is something wrong with how I'm storing my dates? (I keep seeing people mentioning ISODates but not too sure what that is or how to implement).
Thanks in advance.
I think the query you posted is not correct. To match fields of documents nested inside arrays you need dot notation, and the value has to be a real Date object rather than a { "$date": ... } literal. For example, if you want to query for the rooms with checkIn times in a certain range, the query should look like this:
.find({"rooms.reservations.checkIn":{$gte:new Date("2019-01-06T13:11:50+06:00"), $lt:new Date("2019-01-06T14:12:50+06:00")}})
Now you can do the same with the checkOut time to get the proper filtering to find the rooms available within a date range.
A word of advice though, the way you've designed your collection is not sustainable in the long run. For example, the date query you're trying to run will give you the correct documents, but not the rooms inside each document that satisfy your date range. You'll have to do it yourself on the server side (assuming you're not using aggregation). This will block your server from handling other pending requests which is not desirable. I suggest you break the collection down and have rooms and reservations in separate collections for easier querying.
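The per-room filtering mentioned above could look like this on the application side. It is a minimal sketch, assuming the checkIn and checkOut values have already been converted to Date objects; isRoomAvailable and availableRooms are hypothetical helper names, and a room counts as available when no existing reservation overlaps the requested range.

```javascript
// A room is available for [from, to) when every existing reservation
// either ends on or before `from`, or starts on or after `to`.
function isRoomAvailable(room, from, to) {
  return room.reservations.every(
    (r) => r.checkOut <= from || r.checkIn >= to
  );
}

// Return only the rooms of one document that are free in the given range.
function availableRooms(doc, from, to) {
  return doc.rooms.filter((room) => isRoomAvailable(room, from, to));
}
```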
Recently I was working on a date query. First of all, we need to understand how the date is stored in the MongoDB database. Say I have stored the date as a UTC string like 2020-07-21T09:45:06.567Z,
and my json structure is
[
{
"dateOut": "2020-07-21T09:45:06.567Z",
"_id": "5f1416378210c50bddd093b9",
"customer": {
"isGold": true,
"_id": "5f0c1e0d1688c60b95360565",
"name": "pavel_1",
"phone": 123456789
},
"movie": {
"_id": "5f0e15412065a90fac22309a",
"title": "hello world",
"dailyRentalRate": 20
}
}
]
and I want to get all the data for one date only (2020-07-21). How can we do that?
let result = await Rental.find({
    dateOut: {
        $lt: new Date('2020-07-22').toISOString(),
        $gt: new Date('2020-07-21').toISOString()
    }
})
We need the data for the 21st, so the query asks for values greater than the start of the 21st and less than the start of the 22nd: times such as 2020-07-21T00:45:06.567Z and 2020-07-21T01:45:06.567Z are greater than 2020-07-21 but less than 2020-07-22. Because the dates are stored as ISO 8601 strings in UTC, comparing them as strings gives the same order as comparing them as dates.
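The bounds for a single day can be computed with a small helper instead of being typed by hand. This is a sketch under the same assumption as above (dates stored as ISO 8601 UTC strings); dayRange is a hypothetical name, and it uses $gte for the lower bound so that a value at exactly midnight is included.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Build a { $gte, $lt } condition covering one UTC day, as ISO strings.
// `day` is a 'YYYY-MM-DD' string.
function dayRange(day) {
  const start = new Date(`${day}T00:00:00.000Z`);
  if (Number.isNaN(start.getTime())) {
    throw new RangeError(`Invalid day: ${day}`);
  }
  const end = new Date(start.getTime() + DAY_MS);
  return { $gte: start.toISOString(), $lt: end.toISOString() };
}
```

The returned object could then be used as the condition on the date field, for example { dateOut: dayRange('2020-07-21') }.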
var mydate1 = new Date(); // a Date object
var mydate2 = new Date().getTime(); // epoch milliseconds as a number
ObjectId.getTimestamp()
Returns the timestamp portion of the ObjectId() as a Date.
Example
The following example calls the getTimestamp() method on an ObjectId():
ObjectId("507c7f79bcf86cd7994f6c0e").getTimestamp()
This will return the following output:
ISODate("2012-10-15T21:26:17Z")
If you are querying on timestamp data stored as a string,
e.g. "createdAt" : "2021-07-12T16:06:34.949Z"
const start = req.params.id; //2021-07-12
const data = await Model.find({
"createdAt": {
'$gte': `${start}T00:00:00.000Z`,
'$lt': `${start}T23:59:59.999Z`
}
});
console.log(data);
This returns the data for that particular date, in this case 2021-07-12.

Find after aggregate in MongoDB

{
"_id" : ObjectId("5852725660632d916c8b9a38"),
"response_log" : [
{
"campaignId" : "AA",
"created_at" : ISODate("2016-12-20T11:53:55.727Z")
},
{
"campaignId" : "AB",
"created_at" : ISODate("2016-12-20T11:55:55.727Z")
}]
}
I have a document which contains an array. I want to select all documents that do not have a response_log.created_at in the last 2 hours from the current time and whose count of response_log.created_at entries in the last 24 hours is less than 3.
I am unable to figure out how to go about it. Please help.
You can use the aggregation framework to filter the documents. A pipeline with $match and $redact steps will do the filtering.
Consider running the following aggregate operation, where $redact allows you to process the logical condition with the $cond operator and uses the system variable $$KEEP to "keep" the document where the logical condition is true or $$PRUNE to "remove" the document where the condition is false.
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query and then a subsequent $match, except that $redact uses a single pipeline stage which is more efficient:
var moment = require('moment'),
    MongoClient = require('mongodb').MongoClient,
    last2hours = moment().subtract(2, 'hours').toDate(),
    last24hours = moment().subtract(24, 'hours').toDate();
MongoClient.connect(config.database)
    .then(function (db) {
        return db.collection('MyCollection');
    })
    .then(function (collection) {
        return collection.aggregate([
            // Keep documents with no log entry newer than 2 hours ago
            { '$match': { 'response_log.created_at': { '$not': { '$gt': last2hours } } } },
            {
                '$redact': {
                    '$cond': [
                        {
                            '$lt': [
                                {
                                    '$size': {
                                        '$filter': {
                                            'input': '$response_log',
                                            'as': 'res',
                                            // Entries created within the last 24 hours
                                            'cond': {
                                                '$gt': [
                                                    '$$res.created_at',
                                                    last24hours
                                                ]
                                            }
                                        }
                                    }
                                },
                                3
                            ]
                        },
                        '$$KEEP',
                        '$$PRUNE'
                    ]
                }
            }
        ]).toArray();
    })
    .then(function (docs) {
        console.log(docs);
    })
    .catch(function (err) {
        throw err;
    });
Explanations
In the above aggregate operation, if you execute only the first $match pipeline step
collection.aggregate([
    { '$match': { 'response_log.created_at': { '$not': { '$gt': last2hours } } } }
])
the documents returned will be the ones that do not have a "response_log.created_at" in the last 2 hours from the current time, where the variable last2hours is created with the momentjs library using the subtract API.
The $redact stage that follows further filters those documents with the $cond ternary operator. It evaluates this logical expression, which uses $filter to return only the array elements created within the last 24 hours and $size to count them
{
    '$lt': [
        {
            '$size': {
                '$filter': {
                    'input': '$response_log',
                    'as': 'res',
                    'cond': { '$gt': ['$$res.created_at', last24hours] }
                }
            }
        },
        3
    ]
}
to $$KEEP the document if this condition is true or $$PRUNE to "remove" the document where the evaluated condition is false.
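The two conditions from the question can also be written as a plain JavaScript predicate, which is handy for checking a few sample documents by hand. This is only an in-memory sketch, not something the server runs; matchesConditions is a hypothetical name, and the `now` parameter is added so the check is repeatable.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// True when the document has no response_log entry in the last 2 hours
// and fewer than 3 entries in the last 24 hours, relative to `now`.
function matchesConditions(doc, now = new Date()) {
  const last2hours = now.getTime() - 2 * HOUR_MS;
  const last24hours = now.getTime() - 24 * HOUR_MS;
  const times = doc.response_log.map((entry) => entry.created_at.getTime());

  const noneInLast2Hours = times.every((t) => t <= last2hours);
  const countInLast24Hours = times.filter((t) => t > last24hours).length;

  return noneInLast2Hours && countInLast24Hours < 3;
}
```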
I know that this is probably not the answer that you're looking for but this may not be the best use case for Mongo. It's easy to do that in a relational database, it's easy to do that in a database that supports map/reduce but it will not be straightforward in Mongo.
If your data looked different and you kept each log entry as a separate document that references the object (with id 5852725660632d916c8b9a38 in this case) instead of being a part of it, then you could make a simple query for the latest log entry that has that id. This is what I would do in your case if I were to use Mongo for that (which I wouldn't).
What you can also do is keep a separate collection in Mongo, or add a new property to the object that you have here which would store the latest date of campaign added. Then it would be very easy to search for what you need.
When you are working with a database like Mongo, the shape of your data must reflect what you need to do with it, as in this case. Adding a last campaign date and updating it every time a campaign is added would let you search for the campaigns you need very easily.
If you want to be able to make any searches and aggregates possible then you may be better off using a relational database.

Combine multiple query with one single $in query and specify limit for each array field?

I am using mongoose with node.js for MongoDB. Now I need to make 20 parallel find queries against my database, each with a limit of 4 documents, the same as shown below; only brand_id changes for each brand.
areamodel.find({ brand_id: brand_id }, { '_id': 1 }, { limit: 4 }, function(err, docs) {
    if (err) {
        console.log(err);
    } else {
        console.log('fetched');
    }
});
To run all these queries in parallel, I thought about putting all 20 brand_id values in an array of strings and then using a $in query to get the results, but I don't know how to specify the limit of 4 for every array element that is matched.
I wrote the code below with aggregation, but I don't know where to specify the limit for each element of my array.
var brand_ids = ["brandid1", "brandid2", "brandid3", "brandid4", "brandid5", "brandid6", "brandid7", "brandid8", "brandid9", "brandid10", "brandid11", "brandid12", "brandid13", "brandid14", "brandid15", "brandid16", "brandid17", "brandid18", "brandid19", "brandid20"];
areamodel.aggregate(
{ $project: { _id: 1 } },
{ $match : { 'brand_id': { $in: brand_ids } } },
function(err, docs) {
if (err) {
console.error(err);
} else {
}
}
);
Can anyone please tell me how I can solve my problem using only one query?
UPDATE - Why I don't think $group will be helpful for me.
Suppose my brand_ids array contains these strings
brand_ids = ["id1", "id2", "id3", "id4", "id5"]
and my database has the documents below
{
    "brand_id": "id1",
    "name": "Levis",
    "loc": "india"
},
{
    "brand_id": "id1",
    "name": "Levis",
    "loc": "america"
},
{
    "brand_id": "id2",
    "name": "Lee",
    "loc": "india"
},
{
    "brand_id": "id2",
    "name": "Lee",
    "loc": "america"
}
Desired JSON output
{
"name": "Levis"
},
{
"name": "Lee"
}
For the above example, suppose I have 25000 documents with "name" as "Levis" and 25000 documents where "name" is "Lee". If I use $group, then all 50000 documents will be queried and grouped by "name".
But with the solution I want, once the first document for "Levis" and for "Lee" has been found, I don't have to look through the remaining thousands of documents.
Update - I think if any of you can tell me this, then I can probably get to my solution.
Consider a case where I have 1000 documents in total in my MongoDB collection, and suppose 100 of those 1000 pass my match query.
If I apply a limit of 4 to this query, will it take the same time to execute as the query without any limit, or not?
Why I am thinking about this case:
If the query takes the same time, then I don't think $group will increase my time, as all documents will be queried anyway.
But if the query with the limit takes less time than the query without it, then:
If I can apply a limit of 4 to each array element, my question will be solved.
If I cannot apply a limit to each array element, then I don't think $group will be useful, as in that case I have to scan all the documents to get the results.
FINAL UPDATE - As I read in the answer below and also in the MongoDB docs, using $limit does not affect the time taken by the query; it is the network bandwidth that is affected. So I think if any of you can tell me how to apply a limit per array element (using $group or anything else), my problem will be solved.
mongodb: will limit() increase query speed?
Solution
Actually, my thinking about MongoDB was very wrong. I thought that adding a limit to a query decreases the time the query takes, but that is not the case, which is why I struggled for so many days before trying the answer that Gregory NEUT and JohnnyHK gave me. Thanks a lot, both of you; I would have found the solution on day one if I had known this. I really appreciate the help.
I suggest using the $group aggregation stage to group all the data you get from the $match by brand_id, and then limiting each group of data using $slice.
Look at this Stack Overflow post
db.collection.aggregate([
    {
        $sort: {
            created: -1
        }
    }, {
        $group: {
            _id: '$city',
            title: {
                $push: '$title'
            }
        }
    }, {
        $project: {
            _id: 0,
            city: '$_id',
            mostRecentTitle: {
                $slice: ['$title', 0, 2]
            }
        }
    }
])
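What the $group and $slice combination computes can be sketched in plain JavaScript on an in-memory array. This only illustrates the shape of the result, assuming the documents are already sorted; limitPerGroup and its parameter names are hypothetical, and it says nothing about how the server executes the pipeline.

```javascript
// Group documents by `keyField`, collect `valueField` per group (like $push),
// then keep at most `limit` values in each group (like $slice).
function limitPerGroup(docs, keyField, valueField, limit) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(doc[valueField]);
  }
  return Array.from(groups, ([key, values]) => ({
    [keyField]: key,
    [valueField]: values.slice(0, limit),
  }));
}
```

Note that every matching document is still visited before the slice is applied, which mirrors the concern raised in the question about $group reading all documents.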
I propose using distinct, since that will return all different brand names in your collection. (I assume this is what you are trying to achieve?)
db.runCommand ( { distinct: "areamodel", key: "name" } )
MongoDB docs
In mongoose I think it is: areamodel.db.db.command({ distinct: "areamodel", key: "name" }) (Untested)

Mongoose query returning repeated results

The query receives a pair of coordinates, a maximum Distance radius, a "skip" integer and a "limit" integer. The function should return the closest and newest locations according to the position given. There is no visible error in my code, however, when I call the query again, it returns repeated results. "skip" variable is updated according to the results returned.
Example:
1) I make query with skip = 0, limit = 10. I receive 10 non-repeated locations.
2) Query is called again now, skip = 10, limit = 10. I receive another 10 locations with repeated results from the first query.
QUERY
Locations.find({ coordinates :
{ $near : [ x , y ],
$maxDistance: maxDistance }
})
.sort('date_created')
.skip(skip)
.limit(limit)
.exec(function(err, locations) {
console.log("[+]Found Locations");
callback(locations);
});
SCHEMA
var locationSchema = new Schema({
date_created: { type: Date },
coordinates: [],
text: { type: String }
});
I have tried looking everywhere for a solution. Could the cause be the version of MongoDB? I use mongoose 4.x.x and, I believe, MongoDB 2.5.6 or so. Any ideas?
There are a couple of things to consider here in the sort of results that you want, with the first consideration being that you have a "secondary" sort criteria in the "date_created" to deal with.
The basic problem there is that the $near operator and similar operators in MongoDB do not at present "project" any field to indicate the "distance" from the queried location, and simply "default sort" the data. So in order to do that "secondary" sort, a field with the "distance" needs to be present. There are therefore other options for this.
The second case is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. So it's better to select data based on a "range" where it occurs rather than "skip" through all the results you have previously displayed.
The first thing to do here is use a command that can "project" the distance into the document along with the other information. The aggregation command of $geoNear is good for this, and especially since we want to do other sorting:
var seenIds = [],
    lastDistance = null,
    lastDate = null;
Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x,y],
            "maxDistance": maxDistance,
            "distanceField": "dist",
            "limit": 10
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function(err,results) {
        results.forEach(function(result) {
            // Unary + turns a Date into epoch milliseconds, so dates are compared by value
            if ( ( result.dist != lastDistance ) || ( +result.date_created != +lastDate ) ) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
)
That is the first iteration of your results where you fetch the first 10. Noting the logic inside the loop, where each document in the results is inspected for either a change in the "date_created" or the projected "dist" field now present in the document and where this occurs the "seenIds" array is wiped of all current entries. The general action is that all the variables are tested and possibly updated on each iteration and where there is no change then items are added to the list of "seenIds".
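The bookkeeping inside that loop can be pulled out into a small function so that it is easy to test without a database. This is a sketch of the same idea, not a drop-in replacement: updatePagingState and the shape of the state object are hypothetical, and dates are compared with getTime() because two Date objects are never equal by reference.

```javascript
// Given the previous paging state and one page of results (each with
// _id, dist and date_created), return the state to store for the next request.
function updatePagingState(state, results) {
  let seenIds = state.seenIds.slice();
  let lastDistance = state.lastDistance;
  let lastDate = state.lastDate;

  for (const result of results) {
    const sameDate =
      lastDate !== null &&
      result.date_created.getTime() === lastDate.getTime();

    // Any change in distance or date starts a new run of "tied" documents.
    if (result.dist !== lastDistance || !sameDate) {
      seenIds = [];
      lastDistance = result.dist;
      lastDate = result.date_created;
    }
    seenIds.push(result._id);
  }
  return { seenIds, lastDistance, lastDate };
}
```

The initial state would be { seenIds: [], lastDistance: null, lastDate: null }, and the returned object is what gets saved to the session between requests.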
All those three variables being worked on need to be stored somewhere awaiting the next request. For web applications the session store is ideal, but different approaches vary. You just want those values to be recalled when we start the next request, as on the next and subsequent iterations we alter the query a bit:
Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x,y],
            "maxDistance": maxDistance,
            "minDistance": lastDistance,
            "distanceField": "dist",
            "limit": 10,
            "query": {
                "_id": { "$nin": seenIds },
                "date_created": { "$lt": lastDate }
            }
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function(err,results) {
        results.forEach(function(result) {
            // Unary + turns a Date into epoch milliseconds, so dates are compared by value
            if ( ( result.dist != lastDistance ) || ( +result.date_created != +lastDate ) ) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
)
So the "minDistance" parameter is added because you want to exclude any of the "nearer" results that have already been seen. The additional checks are placed in the query: "date_created" needs to be "less than" the "lastDate" recorded, since we are sorting in descending order, and the final "sure" filter excludes any "_id" values that were recorded in the list because the values had not changed.
Now with geospatial data that "seenIds" list is not likely to grow as generally you are not going to find things all at the same distance, but it is a general process of paging a sorted list of data like this, so it is worth understanding the concept.
So if you want to be able to use a secondary field to sort on with geospatial data and also considering the "near" distance then this is the general approach, by projecting a distance value into the document results as well as storing the last seen values before any changes that would not make them unique.
The general concept is "advancing the minimum distance" to enable each page of results to get gradually "further away" from the source point of origin used in the query.

mapReduce not calling map nor reduce

I just started working with mongodb and I am having troubles using mapReduce function.
For some reason it seems to not be calling the map and reduce functions.
Here is my code:
@getMonthlyReports: (req, res) ->
  app_id = req.app.id
  start = moment().subtract('years', 1).startOf('month').unix()
  end = moment().endOf('day').unix()
  console.log(start)
  console.log(end)
  map = ->
    geotriggers = 0
    pushes = 0
    console.log("ok")
    date = moment(@timestamp).startOf('month').unix()
    for campaign in @campaigns
      if campaign.geotriggers?
        geotriggers += campaign.geotriggers
      else if campaign.pushes?
        pushes += campaign.pushes
    emit date,
      geotriggers: geotriggers
      pushes: pushes
  reduce = (key, values) ->
    console.log("ok")
    geotriggers = 0
    pushes = 0
    for value in values
      geotriggers += value.geotriggers
      pushes += value.pushes
    geotriggers: geotriggers
    pushes: pushes
  common.db.collection(req.app.id + "_daily_campaign_reports").mapReduce map, reduce,
    query:
      timestamp:
        $gte: start
        $lt: end
    out:
      inline: 1
  , (err, results) ->
    console.log(results)
    ResponseHelper.returnMessage req, res, 200, results
I put some console.logs and it seems the map and reduce functions are not being called.
Also my results is undefined.
Is there something I am missing?
As I already commented, your mapReduce is failing because it calls a library function (moment.js) that does not exist on your server. Apart from that, this is not really a good usage of mapReduce.
While mapReduce has its uses, a simple aggregation case like this is better suited to the aggregation framework, as it is a native C++ implementation, as opposed to mapReduce, which runs inside a JavaScript interpreter. As a result the processing is much faster.
All you need are your existing unix timestamp values for start and end as well as the current day of the month ( dayOfMonth ) in order to do the date math:
db.collection.aggregate([
// Match documents using your existing start and end values
{ "$match": {
"timestamp": { "$gte": start, "$lt": end }
}},
// Unwind campaigns array
{ "$unwind": "$campaigns" },
// Group on the start of month value
{ "$group": {
"_id": {
"$subtract": [
"$timestamp",
{ "$mod": [ "$timestamp", 1000 * 60 * 60 * 24 * dayOfMonth ] }
]
},
"geotriggers": {
"$sum": {
"$cond": [
"$campaigns.geotriggers",
1,
0
]
}
},
"pushes": {
"$sum": {
"$cond": [
"$campaigns.pushes",
1,
0
]
}
},
}}
])
If I am reading your code correctly, each document contains an array for "campaigns", so to deal with this in the aggregation framework you use the $unwind pipeline stage to expose each array member as its own document.
The date math is done in the $group stage for the _id key by changing the "timestamp" value to be equal to the starting date of the month, which is the same thing that your code is trying to do. It's debatable whether you could just use null here, as your range selection is only going to result in a singular date value, but this is just to show that the date math is possible.
With the "unwound" array elements, we process every element just like the "for loop" does and conditionally add the values for "geotriggers" and "pushes" using the $cond operator. Again, this presumes, as per your code, that these fields evaluate to boolean true/false, which is the evaluation part of $cond.
Your query condition is of course just met with the $match stage at the start of the pipeline, using the same range query.
That basically does the same thing without relying on additional libraries in server side processing and does it much faster as well.
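To check the grouping keys that come back, the start-of-month value discussed above can also be computed on the client without moment.js. This is a minimal sketch under two assumptions that are not confirmed by the question: the stored timestamps are unix seconds (which is what moment's .unix() returns) and months are taken in UTC. The name startOfMonthUnix is hypothetical.

```javascript
// Return the unix timestamp (seconds) of the first instant of the UTC month
// that contains `unixSeconds`.
function startOfMonthUnix(unixSeconds) {
  const date = new Date(unixSeconds * 1000);
  return Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), 1) / 1000;
}
```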
See the other Aggregation Framework operators for reference.

Resources