MongoDB schema schedule publish - node.js

I want to run a routine every 10 minutes that publishes posts built from data obtained via an API.
The cron part of the routine works perfectly; the problem comes when I try to fetch the article whose expiration date is nearest and which has not yet been published, or has been published the fewest times.
I have tried creating an incremental field holding the number of times an article has been published, updating the publication date, and so on, but I can't get the formula right: I always get the same result, the same article.
Schema
"_id": item.ID,
"title": item.title,
"images": [ item.firstImage ],
"url": "none",
"expired": "2017-04-30T22:00:00+03:00",
"lastPublish": "2017-04-30T20:41:02+03:00",
"publish": 0
Code
db.find({}).sort({ 'expired': 1, 'lastPublish': 1 }).limit(1).exec(function (err, doc) {
  // db.find({}).sort({ 'publish': 1, 'expired': 1 }).limit(1).exec(function (err, doc) {
  db.update({ '_id': doc._id }, { '$set': { 'lastPublish': new Date(), '$inc': { 'publish': 1 } } });
});
I need a sort of circular queue: publish the articles that expire earliest (their dates differ by hours or days) and, when the queue is exhausted, start over. But I always publish the same first post, and I'm running out of ideas.
I'd better step away for a few hours and come back to it later, once I've cleared my head.
Thanks!

Solved! The following routine works properly. I don't know why it didn't work yesterday, apart from the missing array index ([0]) to pick the first element of the result.
db.find({}).sort({ 'publish': 1, 'expired': 1 }).limit(1).exec(function (err, doc) {
  db.update({ '_id': doc[0]._id }, { '$inc': { 'publish': 1 }, '$set': { 'lastPublish': new Date() } });
});
Thanks!
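For anyone landing here later, the whole 10-minute routine can be wired up like this; a minimal sketch, assuming node-cron and a NeDB/Mongoose-style datastore called db, with publishArticle standing in for the actual posting logic (both names are hypothetical):

var cron = require('node-cron');

// Every 10 minutes: pick the least-published article, breaking ties by
// nearest expiration, publish it, and bump its counters.
cron.schedule('*/10 * * * *', function () {
  db.find({}).sort({ publish: 1, expired: 1 }).limit(1).exec(function (err, docs) {
    if (err || !docs.length) return;
    var article = docs[0];
    publishArticle(article); // hypothetical: post the article to the external API
    db.update(
      { _id: article._id },
      { $inc: { publish: 1 }, $set: { lastPublish: new Date() } }
    );
  });
});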

Related

Node.js/MongoDB - querying dates

I'm having a bit of an issue understanding how to query dates; I think the issue might be with how my data is structured. Here is a sample document from my database.
{
  "phone_num": 12553,
  "facilities": [
    "flat-screen",
    "parking"
  ],
  "surroundings": [
    "ping-pong",
    "pool"
  ],
  "rooms": [
    {
      "room_name": "Standard Suite",
      "capacity": 2,
      "bed_num": 1,
      "price": 50,
      "floor": 1,
      "reservations": [
        {
          "checkIn": {
            "$date": "2019-01-10T23:23:50.000Z"
          },
          "checkOut": {
            "$date": "2019-01-20T23:23:50.000Z"
          }
        }
      ]
    }
  ]
}
I'm trying to query the dates to check whether a specific room is available in a certain date range, but no matter what I do I can't seem to get a proper result: either my query 404's or it returns an empty array.
I really have tried everything; right now, for simplicity, I'm just trying to get the query to work with checkIn so I can figure out what I'm doing wrong. I tried a hundred variants of the code below, but I couldn't get it to work at all.
.find({"rooms": { "reservations": { "checkIn" : {"$gte": { "$date": "2019-01-09T00:00:00.000Z"}}}}})
Am I misunderstanding how the .find method works, or is something wrong with how I'm storing my dates? (I keep seeing people mention ISODates, but I'm not too sure what that is or how to use it.)
Thanks in advance.
I think the query you posted is not correct. For example, if you want to query for rooms with check-in times in a certain range, the query should look like this:
.find({"rooms.reservations.checkIn": {$gte: new Date("2019-01-06T13:11:50+06:00"), $lt: new Date("2019-01-06T14:12:50+06:00")}})
Now you can do the same with the checkOut time to get proper filtering and find the rooms available within a date range.
A word of advice though: the way you've designed your collection is not sustainable in the long run. For example, the date query you're trying to run will give you the correct documents, but not the rooms inside each document that satisfy your date range. You'll have to filter those yourself on the server side (assuming you're not using aggregation), which blocks your server from handling other pending requests and is not desirable. I suggest you break the collection down and keep rooms and reservations in separate collections for easier querying.
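As a side note, if you do keep the embedded design, $elemMatch is worth knowing: it requires a single reservation element to satisfy all the conditions at once, which plain dot notation does not guarantee. A sketch (untested; the collection name is hypothetical):

// Match documents where at least one reservation's checkIn falls in the range
// (both bounds are applied to the SAME array element).
db.collection('hotels').find({
  "rooms.reservations": {
    $elemMatch: {
      checkIn: {
        $gte: new Date("2019-01-09T00:00:00.000Z"),
        $lt: new Date("2019-01-20T00:00:00.000Z")
      }
    }
  }
})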
Recently I was working on a date query. First of all, we need to understand how dates are stored in MongoDB. Say I have stored a date in UTC ISO format, like 2020-07-21T09:45:06.567Z, and my JSON structure is:
[
  {
    "dateOut": "2020-07-21T09:45:06.567Z",
    "_id": "5f1416378210c50bddd093b9",
    "customer": {
      "isGold": true,
      "_id": "5f0c1e0d1688c60b95360565",
      "name": "pavel_1",
      "phone": 123456789
    },
    "movie": {
      "_id": "5f0e15412065a90fac22309a",
      "title": "hello world",
      "dailyRentalRate": 20
    }
  }
]
Now I want to perform a query that returns all the data for just that date (2020-07-21). How can we do that? First we need to understand the basics.
let result = await Rental.find({
  dateOut: {
    $lt: new Date('2020-07-22').toISOString(),
    $gt: new Date('2020-07-21').toISOString()
  }
})
We need the data for the 21st, so the query is "greater than the 21st and less than the 22nd": times like 2020-07-21T00:45:06.567Z and 2020-07-21T01:45:06.567Z all sort after the start of the 21st but before the start of the 22nd.
var mydate1 = new Date();           // the current time as a Date object
var mydate2 = new Date().getTime(); // the current time as epoch milliseconds
ObjectId.getTimestamp()
Returns the timestamp portion of the ObjectId() as a Date.
Example
The following example calls the getTimestamp() method on an ObjectId():
ObjectId("507c7f79bcf86cd7994f6c0e").getTimestamp()
This will return the following output:
ISODate("2012-10-15T21:26:17Z")
If you're querying using timestamp data, e.g. "createdAt": "2021-07-12T16:06:34.949Z":
const start = req.params.id; // 2021-07-12
const data = await Model.find({
  "createdAt": {
    '$gte': `${start}T00:00:00.000Z`,
    '$lt': `${start}T23:59:59.999Z`
  }
});
console.log(data);
This will return the data for the particular date, i.e. "2021-07-12" in this case.
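On the ISODate confusion in the question: ISODate is just the mongo shell's label for a BSON date. If the field is stored as a real Date rather than a string, the same day-range query is done with Date objects; a minimal Mongoose sketch, with hypothetical schema and field names:

const mongoose = require('mongoose');

// Hypothetical model storing createdAt as a real BSON date
const Rental = mongoose.model('Rental', new mongoose.Schema({
  customer: String,
  createdAt: { type: Date, default: Date.now }
}));

// Everything created on 2021-07-12, compared as dates rather than strings
const data = await Rental.find({
  createdAt: {
    $gte: new Date('2021-07-12T00:00:00Z'),
    $lt: new Date('2021-07-13T00:00:00Z')
  }
});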

mongodb/mongoose aggregate memory usage very big

I have a MongoDB into which multiple sensors dump their data once a day. Each document is, in essence, { sid, date, data }: a sensor id, a date (I only use the date component), and a data array of a couple hundred values.
Now I want to get an overview statistic of how many sensors I have data for on each day. This aggregation works nicely while I have a few dozen documents, but with even a couple hundred documents the query never finishes.
function dailyStatistic(callback) {
  return air
    .aggregate([
      { $match: {} },
      { $group: { _id: { date: '$date' }, myCount: { $sum: 1 } } }
    ])
    .allowDiskUse(true);
}
air is the name of my mongoose collection.
The aggregation should really just return:
[ { date: 2017-08-07, myCount: 10 }, { date: 2017-08-08, myCount: 26 } ]
Now when I watch the machine (via glances) I get CPU_IOWAIT and MEMSWAP errors that ultimately kill the node.js process before it gets the data.
When I check out the collection in Robomongo, I can easily browse the different data points. But in Robomongo too, this script never gets me a result:
db.getCollection('air').find({}).length()
Any ideas?
Thanks Andreas
Probably you do not have an index on date; create one with:
db.getCollection('air').createIndex({ date: 1 })
Also, db.getCollection('air').find({}).length() fetches and walks all the results just to count them; use db.getCollection('air').count({}) instead.
The best way to do this without crashing MongoDB would be to fetch the data for a date range; in your case, for one day.
function dailyStatistic(dateMin, dateMax, callback) {
  return air
    .aggregate([
      { $match: { date: { $gte: dateMin, $lte: dateMax } } },
      {
        $project: {
          sid: 1,
          date: 1,
          data: 1,
          day: { $dayOfMonth: "$date" },
          month: { $month: "$date" },
          year: { $year: "$date" }
        }
      },
      { $group: { _id: { day: "$day", month: "$month", year: "$year" }, myCount: { $sum: 1 } } }
    ])
    .allowDiskUse(true);
}
You can take this further by adding pagination when the number of records per hour/minute is also too large.
And as pagetronic suggested, create the indexes if you haven't.
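If you're on MongoDB 3.0 or newer, the day/month/year projection can also be collapsed into a single $dateToString group key; a sketch of the same statistic, untested:

function dailyStatistic(dateMin, dateMax) {
  return air
    .aggregate([
      { $match: { date: { $gte: dateMin, $lte: dateMax } } },
      // Normalize each date to its calendar day and count documents per day
      { $group: {
          _id: { $dateToString: { format: '%Y-%m-%d', date: '$date' } },
          myCount: { $sum: 1 }
      } }
    ])
    .allowDiskUse(true);
}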

Combine multiple query with one single $in query and specify limit for each array field?

I am using mongoose with node.js for MongoDB. I need to make 20 parallel find queries in my database, each with a limit of 4 documents, as shown below; only brand_id changes from one query to the next.
areamodel.find({ brand_id: brand_id }, { '_id': 1 }, { limit: 4 }, function(err, docs) {
  if (err) {
    console.log(err);
  } else {
    console.log('fetched');
  }
});
Now, to run all these queries in parallel, I thought about putting all 20 brand_ids in an array of strings and then using a single $in query to get the results, but I don't know how to specify the limit of 4 for every matched value of the array.
I wrote the code below with aggregation, but I don't know where to specify a limit for each element of my array.
var brand_ids = ["brandid1", "brandid2", "brandid3", "brandid4", "brandid5", "brandid6", "brandid7", "brandid8", "brandid9", "brandid10", "brandid11", "brandid12", "brandid13", "brandid14", "brandid15", "brandid16", "brandid17", "brandid18", "brandid19", "brandid20"];
areamodel.aggregate(
  { $match: { 'brand_id': { $in: brand_ids } } },
  { $project: { _id: 1 } },
  function(err, docs) {
    if (err) {
      console.error(err);
    } else {
    }
  }
);
Can anyone please tell me how I can solve my problem using only one query?
UPDATE: why I don't think $group will be helpful for me.
Suppose my brand_ids array contains these strings:
brand_ids = ["id1", "id2", "id3", "id4", "id5"]
and my database has the documents below:
{
  "brand_id": "id1",
  "name": "Levis",
  "loc": "india"
},
{
  "brand_id": "id1",
  "name": "Levis",
  "loc": "america"
},
{
  "brand_id": "id2",
  "name": "Lee",
  "loc": "india"
},
{
  "brand_id": "id2",
  "name": "Lee",
  "loc": "america"
}
Desired JSON output:
{
  "name": "Levis"
},
{
  "name": "Lee"
}
For the above example, suppose I have 25,000 documents with "name" set to "Levis" and 25,000 documents where "name" is "Lee"; if I use $group, then all 50,000 documents will be queried and grouped by "name".
But with the solution I want, once the first documents with "Levis" and "Lee" are found, I wouldn't have to look through the remaining thousands of documents.
UPDATE: I think if anyone can answer the following, I can probably get to my solution.
Consider a case where I have 1000 total documents in my MongoDB, and suppose that out of those 1000, 100 will pass my match query.
Now if I apply limit 4 to this query, will it take the same time to execute as the query without any limit, or not?
Why I am thinking about this case:
If the query takes the same time, then I don't think $group will increase my time, as all documents will be queried anyway.
But if the limited query takes less time than the unlimited one, then:
If I can apply limit 4 to each array element, my question is solved.
If I cannot apply a limit to each array element, then I don't think $group will be useful, as in that case I'd have to scan all the documents to get the results.
FINAL UPDATE: As I read in the answer below and in the MongoDB docs, the time taken by a query is not affected by $limit; it is only the network bandwidth usage that goes down. So if anyone can tell me how to apply a limit on array fields (using $group or anything else), my problem will be solved.
mongodb: will limit() increase query speed?
Solution
My thinking about MongoDB was quite wrong: I thought adding a limit to a query decreases the time it takes, but that is not the case, which is why I stumbled for so many days before trying the answer that Gregory NEUT and JohnnyHK gave me. Thanks a lot, both of you; I would have found the solution on day one if I had known this. I really appreciate the help.
I propose you use the $group aggregation stage to group all the data you got from $match by brand_id, and then limit each group of data using $slice.
Look at this Stack Overflow post:
db.collection.aggregate([
  {
    $sort: {
      created: -1
    }
  },
  {
    $group: {
      _id: '$city',
      title: {
        $push: '$title'
      }
    }
  },
  {
    $project: {
      _id: 0,
      city: '$_id',
      mostRecentTitle: {
        $slice: ['$title', 0, 2]
      }
    }
  }
])
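Adapted to the fields in this question (a sketch, untested, and still subject to the caveat that $group reads every matched document before slicing):

areamodel.aggregate([
  { $match: { brand_id: { $in: brand_ids } } },
  { $group: { _id: '$brand_id', ids: { $push: '$_id' } } },
  // Keep only the first 4 _ids collected for each brand
  { $project: { _id: 0, brand_id: '$_id', ids: { $slice: ['$ids', 4] } } }
], function (err, docs) {
  if (err) console.error(err);
  else console.log(docs);
});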
I propose using distinct, since that will return all the different brand names in your collection. (I assume this is what you are trying to achieve?)
db.runCommand ( { distinct: "areamodel", key: "name" } )
MongoDB docs
In Mongoose I think it is: areamodel.db.db.command({ distinct: "areamodel", key: "name" }) (untested).
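Mongoose also exposes this directly on the model, which avoids dropping down to a raw command; a sketch:

// All distinct brand names in the collection
areamodel.distinct('name', function (err, names) {
  if (err) console.error(err);
  else console.log(names); // e.g. ["Levis", "Lee"]
});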

MongoDB update/insert document and Increment the matched array element

I use Node.js and MongoDB with monk.js, and I want to do the logging in a minimal way, with one document per hour, like:
final doc:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1 }, {action: action2, count: 27 }, {action: action3, count: 5 } ] }
The complete document should be created by incrementing one value.
E.g. someone visits a webpage first in this hour, and incrementing action1 should create the following document with a query:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1} ] }
Another user visits another webpage in this hour, and the document should be extended to:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1}, {action: action2, count: 1} ] }
and the values in count should be incremented on visits to the different webpages.
At the moment I create a doc for each action:
tracking.update({
  time: moment().format('YYYY-MM-DD_HH'),
  action: action,
  info: info
}, { $inc: { count: 1 } }, { upsert: true }, function (err) {});
Is this possible with monk.js / mongodb?
EDIT:
Thank you. Your solution looks clean and elegant, but it looks like my server can't handle it, or I am too much of a noob to make it work.
I wrote an extremely dirty solution with the action name as the key:
tracking.update({ time: time, ts: ts }, JSON.parse('{ "$inc": {"' + action + '": 1}}'), { upsert: true }, function (err) {});
Yes, it is very possible, and a well-considered question. The only variation I would make on the approach is to calculate the "time" value as a real Date object (quite useful in MongoDB, and easy to manipulate as well), simply "rounding" the value with basic date math. You could use "moment.js" for the same result, but I find the math simple.
The other main consideration here is that mixing array "push" actions with possible "upsert" document actions can be a real problem, so it is best to handle this with "multiple" update statements, where only the condition you want is going to change anything.
The best way to do that is with MongoDB Bulk Operations.
Consider that your data comes in something like this:
{ "timestamp": 1439381722531, "action": "action1" }
Where the "timestamp" is an epoch timestamp value acurate to the millisecond. So the handling of this looks like:
// Just adding for the listing, assuming already defined otherwise
var payload = { "timestamp": 1439381722531, "action": "action1" };

// Round to hour
var hour = new Date(
  payload.timestamp - ( payload.timestamp % ( 1000 * 60 * 60 ) )
);

// Init transaction
var bulk = db.collection.initializeOrderedBulkOp();

// Try to increment where array element exists in document
bulk.find({
  "time": hour,
  "log.action": payload.action
}).updateOne({
  "$inc": { "log.$.count": 1 }
});

// Try to upsert where document does not exist
bulk.find({ "time": hour }).upsert().updateOne({
  "$setOnInsert": {
    "log": [{ "action": payload.action, "count": 1 }]
  }
});

// Try to "push" where array element does not exist in matched document
bulk.find({
  "time": hour,
  "log.action": { "$ne": payload.action }
}).updateOne({
  "$push": { "log": { "action": payload.action, "count": 1 } }
});

bulk.execute();
So if you look through the logic there, you will see that it is only ever possible for "one" of those statements to be true for any given state of the document, whether it exists or not. Technically speaking, the statement with the "upsert" can actually match a document when it exists; however, the $setOnInsert operation used makes sure that no changes are made unless the action actually "inserts" a new document.
Since all operations are fired in "Bulk", the only time the server is contacted is on the .execute() call. So there is only "one" request to the server and only "one" response, despite the multiple operations.
In this way the conditions are all met:
Create a new document for the current period where one does not exist and insert initial data to the array.
Add a new item to the array where the current "action" classification does not exist and add an initial count.
Increment the count property of the specified action within the array upon execution of the statement.
All in all, yes, possible, and also a great idea for storage, as long as the action classifications do not grow too large within a period (500 array elements should be used as a maximum guide), and the updating is very efficient and self-contained within a single document for each time sample.
The structure is also nice and well suited to other queries and possible additional aggregation purposes as well.
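For reference, on drivers where initializeOrderedBulkOp has given way to bulkWrite, the same three conditional statements translate almost mechanically; a sketch, assuming the standard mongodb Node.js driver, with collection, hour, and payload as placeholders from the snippet above:

collection.bulkWrite([
  // 1. Increment where the array element already exists
  { updateOne: {
      filter: { time: hour, "log.action": payload.action },
      update: { $inc: { "log.$.count": 1 } }
  } },
  // 2. Upsert the hour document if it does not exist yet
  { updateOne: {
      filter: { time: hour },
      update: { $setOnInsert: { log: [{ action: payload.action, count: 1 }] } },
      upsert: true
  } },
  // 3. Push a new array element where the action is not present yet
  { updateOne: {
      filter: { time: hour, "log.action": { $ne: payload.action } },
      update: { $push: { log: { action: payload.action, count: 1 } } }
  } }
], { ordered: true });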

Mongo aggregate query pulling out last 7 days worth of data (node.js)

I have a large collection of data which I'm trying to pull out of Mongo (node.js) in order to render some graphs.
I need to pull the last 7 days' worth of data for a few thousand users. The specific piece of data on each user is formatted like so:
{
  "passedModules": [
    {
      "module": ObjectId("53ea17dcac1d13a66fb6d14e"),
      "date": ISODate("2014-09-17T00:00:00.000Z")
    },
    {
      "module": ObjectId("53ec5c91af2792f1112554e8"),
      "date": ISODate("2014-09-17T00:00:00.000Z")
    },
    {
      "module": ObjectId("53ec5c5baf2792f1112554e6"),
      "date": ISODate("2014-09-17T00:00:00.000Z")
    }
  ]
}
At the moment I have a really messy group of queries which works, but I believe this should be possible to do entirely in Mongo.
Basically, I need to pull out all the entries from 7 days ago until now, in a dynamic fashion.
Is there a tried and tested way of working with dynamic dates like this, more specifically using the aggregation framework in Mongo? The reason for the aggregation framework is that I need to group the results afterwards.
Many thanks
The general pattern for this type of query is:
// Compute the time 7 days ago to use in filtering the data
var d = new Date();
d.setDate(d.getDate()-7);
db.users.aggregate([
// Only include the docs that have at least one passedModules element
// that passes the filter.
{$match: {'passedModules.date': {$gt: d}}},
// Duplicate the docs, one per passedModules element
{$unwind: '$passedModules'},
// Filter again to remove the non-matching elements
{$match: {'passedModules.date': {$gt: d}}}
])
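Since the question mentions grouping afterwards, a $group stage can simply be appended to the same pipeline, e.g. to count passes per module over the window (a sketch):

db.users.aggregate([
  { $match: { 'passedModules.date': { $gt: d } } },
  { $unwind: '$passedModules' },
  { $match: { 'passedModules.date': { $gt: d } } },
  // Follow-up grouping: number of passes per module in the last 7 days
  { $group: { _id: '$passedModules.module', count: { $sum: 1 } } }
])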
#JohnnyHK has a good answer already. But if you are using a querying tool (like Robo 3T or Metabase) and don't have programmatic access to create and change variables, here's another option.
{
  "$match": {
    "$expr": {
      "$gte": [
        "$passedModules.date",
        { $add: [new Date(), -604800000] }
      ]
    }
  }
}
The number 604800000 is just 7 days expressed in milliseconds: 1000*60*60*24*<Number of Days>.
You can make it relative to the week or month as well:
{
  "$match": {
    "$expr": {
      "$eq": [
        { $add: [ { $multiply: [1000, { $year: ["$passedModules.date"] }] }, { $week: ["$passedModules.date"] } ] },
        { $add: [ { $multiply: [1000, { $year: [new Date()] }] }, { $week: [new Date()] } ] }
      ]
    }
  }
}
Hope it can help others in the same situation I was in.
