I am trying to limit the number of records returned in a query:
Property.find(searchParams).nin('_id', prop_ids).limit(5).exec(function(err, properties) {
When the first call comes in, I get 5 records back. Then I make a second call and pass in an array of ids (prop_ids) holding the ids of the records that were returned by the first call... in this case I get no records back. I have a total of 7 records in my database, so the second call should return 2 records. How should I go about doing this?
I think Mongoose might apply the limit before the nin query is applied, so you will always just get those five. If it's a type of pagination you want to perform, where you get 5 objects and then the next 5, you can use the skip option instead:
var SKIP = ... // 0, 5, 10...
Property.find(searchParams, null, {
  skip: SKIP,
  limit: 5,
}, function(err, properties) {
});
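For example, the second call in your scenario would then be (a sketch, assuming a fixed page size of 5):

// Second page: skip the 5 records returned by the first call
Property.find(searchParams, null, {
  skip: 5,
  limit: 5,
}, function(err, properties) {
  // with 7 matching records in total, this returns the remaining 2
});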
This is what I took from your question; maybe you had something else in mind with the nin call?
I am using Mongoose to query a really big list from MongoDB:
const chat_list = await chat_model.find({}).sort({uuid: 1}); // uuid is an index
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1}); // create_time is an index of the message collection, time: t1
// chat_list length is around 2,000, msg_list length is around 90,000
compute(chat_list, msg_list); // time: t2

function compute(chat_list, msg_list) {
  for (let i = 0, len = chat_list.length; i < len; i++) {
    msg_list.filter(msg => msg.uuid === chat_list[i].uuid);
    // consistent handling for every message
  }
}
For the above code, t1 is about 46s and t2 is about 150s. t2 is really too big, which is weird.
Then I cached these lists to local JSON files:
const chat_list = require('./chat-list.json');
const msg_list = require('./msg-list.json');
compute(chat_list, msg_list); // time: t2
This time, t2 is around 10s.
So here comes the question: 150 seconds vs 10 seconds, why? What happened?
I tried using a worker to do the compute step after the Mongo query, but the time was still much longer than 10s.
The MongoDB query returns a FindCursor, which includes array-ish methods like .filter(), but the result is not an Array.
Use .toArray() on the cursor before filtering to process the MongoDB result set like for like. That might not make the overall process any faster, as the result set still needs to be fetched from MongoDB, but the compute step will then take a similar time to your cached test.
const chat_list = await chat_model
  .find({})
  .sort({uuid: 1})
  .toArray();

const msg_list = await message_model
  .find({}, {content: 1, xxx})
  .sort({create_time: 1})
  .toArray();
Matt typed faster than I did, so some of what was suggested aligns with part of this answer.
I think you are measuring and comparing something different from what you are expecting and implying.
Your expectation is that the compute() function takes around 10 seconds once all of the data is loaded by the application. This is (mostly) demonstrated by your second test, apart from the fact that that test includes the time it takes to load the data from the local files. But you're seeing that there is a difference of 104 seconds (150 - 46) between the completion of message_model.find() and compute(), hence the question.
The key thing is that successfully returning from the find against message_model is not the same thing as retrieving all of the results. As @Matt notes, find() will return with a cursor object once the initial batch of results is ready. That is very different from retrieving all of the results. So there is more work (apparently ~94 seconds' worth) left to do from the two find() operations to further iterate the cursors and retrieve the rest of the results. This additional time is getting reported inside of t2.
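To make that concrete, draining the cursor explicitly looks something like this (a sketch using Mongoose's .cursor(); the projection is shortened from the question):

// Iterating the cursor pulls the remaining batches from the server;
// this is the work that was previously being timed inside t2.
const cursor = message_model.find({}, { content: 1 }).sort({ create_time: 1 }).cursor();
const msg_list = [];
for await (const msg of cursor) {
  msg_list.push(msg);
}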
As suggested by @Matt, calling .toArray() should shift that time back into t1, as you are expecting. It also sounds like it may be more correct, due to the ambiguity with the .filter() functions.
There are two other things that catch my attention. The first is: why are you retrieving all of this data client-side to do the filtering there? Perhaps you would like to do this uuid matching inside of the database via $lookup?
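A sketch of what that could look like (the "messages" collection name backing message_model is an assumption; adjust to your schema):

// Join messages to chats by uuid inside MongoDB instead of in application code
const chats = await chat_model.aggregate([
  { $sort: { uuid: 1 } },
  { $lookup: {
      from: "messages",      // assumed collection name
      localField: "uuid",
      foreignField: "uuid",
      as: "messages"
  } }
]);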
Secondly, this comment isn't clear to me:
// create_time is a index of message collection, time: t1
create_time itself is just a field here (whether or not it exists) that you are requesting an ascending sort against.
You are taking data from two collections, then comparing ids with a for loop and the filter function. What happens is that your loop executes 2,000 times, and each time the filter function scans all 90,000 records.
Take the worst-case scenario: if none of the 2,000 uuids are inside msg_list, you still execute the loop 2,000 × 90,000 times even though you get no data back.
It shouldn't take more than 10 to 15 seconds if you use the code below.
// This will generate an array of the uuids present in message_model
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1}).distinct("uuid");

// The query below will match every uuid present in the msg_list array against the chat_list uuids
const chat_list = await chat_model.find({uuid: {$in: msg_list}}).sort({uuid: 1});
The above does the same as your code with the filter function and loop, but it is the proper and fastest way to receive the data you require.
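As an aside, even purely client-side, the 2,000 × 90,000 scan in compute() could be avoided by indexing the messages once with a Map (a sketch, separate from the query above):

// Build a uuid -> messages index once: O(chats + messages) instead of O(chats × messages)
function compute(chat_list, msg_list) {
  const byUuid = new Map();
  for (const msg of msg_list) {
    if (!byUuid.has(msg.uuid)) byUuid.set(msg.uuid, []);
    byUuid.get(msg.uuid).push(msg);
  }
  for (const chat of chat_list) {
    const msgs = byUuid.get(chat.uuid) || [];
    // consistent handling for every message, as in the question
  }
}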
Currently I'm working on a news feed and want to fetch more data when the user scrolls down the list. At first I get a specific amount of data from my server, sorted by date, like this:
var newsFeed = await NewsFeed.find().sort({createdDate: 'desc'}).limit(parseInt(amount));
When the user now reaches the end of the list, I want to load more data by simply increasing the amount variable in my API call. With the current call I also get the first elements that I already have. So is there a solution to get, say, the first 10 documents sorted by date, then documents 11 to 20, and so on?
If your documents are sorted, you can use skip.
For example, if you have 10 objects, like this:
{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, ... { id: n }
You can query the number of documents you want in this way:
var find = await model.find({}).sort({id:1}).limit(amount)
Then, to get the next values, you can do this query:
find = await model.find({}).sort({id:1}).skip(amount).limit(amount)
The first find (assuming amount is, for example, 2) will return the documents with id 1 and 2.
The second find will return id 3 and 4.
Also, check this Stack Overflow question and these docs from Mongo.
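Applied to the news feed from the question, it would look something like this (a sketch; page and pageSize are assumed to come from the client):

// page 0 -> documents 1-10, page 1 -> documents 11-20, and so on
var newsFeed = await NewsFeed.find()
  .sort({createdDate: 'desc'})
  .skip(page * pageSize)
  .limit(pageSize);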
I'm using MongoDB (Mongoose on Node.js). I have a very large db of events; each event has a field seq (sequence), the order of the events.
I want to allow my users to find all the occurrences of a given event.
For example:
The user is searching for the event "ButtonClicked"; I should return all the locations where this event happened, in this example say [239, 1992, 5932].
This is easy: I can just search for the requested event and return the seq field.
Now I want to let the user view 20 events before, and 20 events after a specific seq.
It would have been great if I could do something like this:
db.events.find( { id:"ButtonClicked", seq: 1992 } ).before(20).after(20);
How can I do that?
Please note that the field seq might start with any number, and skip numbers, but it is incremental!
For example: [3,4,5,6,7,12,13,15,56,57...]
Also, note that the solution can ignore seq, I mentioned this field because I think that it can help the solution.
Thanks!
You could use comparison query operators, in particular $gte and $lte, using seq as an offset for the comparison.
Try:
var seqOffset = 1992;
db.events.find( { seq: { $gte: seqOffset - 20, $lte: seqOffset + 20 } } );
You might not get exactly 40 events, since, as you mentioned, seq might skip numbers.
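If you need exactly 20 events on each side despite the gaps, a sketch using two limited queries instead of one range (not part of the original range approach):

// The 20 events strictly before seq 1992, fetched closest-first, then restored to ascending order
var before = db.events.find({ seq: { $lt: 1992 } })
    .sort({ seq: -1 }).limit(20).toArray().reverse();

// The event itself plus the 20 events that follow
var after = db.events.find({ seq: { $gte: 1992 } })
    .sort({ seq: 1 }).limit(21).toArray();

var window = before.concat(after); // up to 41 events centered on seq 1992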
I need to select the 10 most recent messages from my messages collection and return them as an array, ordered from the oldest to the most recent.
Example : m1, m2, m3, ... [m19, m20, m22] <-- Select last 10 but keep the order.
The slice operator is great because it can take a value like -10 and so select the last 10 values from an array. The problem is that I want to sort the whole query response, not a property within the response (the slice operator takes 2 arguments, the first one being the path).
Actually what I do is:
.find({ conversation: conversationId })
.sort('-updatedAt')
.limit(10)
.exec(function(err, messages) {
});
But the array it returns contains the messages from the most recent to the oldest.
What can I do to keep the right order, without reversing the whole array after the query executes (messages.reverse())?
Get 10 latest documents
MySchema.find().sort({ _id: -1 }).limit(10)
Get 10 oldest documents
MySchema.find().sort({ _id: 1 }).limit(10)
You could change the order to ascending (old-to-new) messages and get the last N elements that you need:
.find().skip(db.collection.count() - N)
So you will get the most recent messages in ascending order.
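If you want to avoid both the client-side reverse and the extra count query, an aggregation pipeline can reorder on the server (a sketch; the Message model name is assumed, the fields come from the question):

// Take the 10 newest messages, then re-sort them oldest-to-newest on the server
Message.aggregate([
  { $match: { conversation: conversationId } },
  { $sort: { updatedAt: -1 } },
  { $limit: 10 },
  { $sort: { updatedAt: 1 } }
]).exec(function(err, messages) {
  // messages are now ordered from the oldest to the most recent
});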
I've come across many versions of this same question here on Stack Overflow, none providing a valid, solid solution, so here we go:
I need to pick a random document from around 5 million documents in my MongoDB database in an efficient way.
I've tried getting the .count and using .skip to get the random document, but it takes almost three seconds and is very, very inefficient.
I can't make changes to the documents (like adding a "random") entry to each document or changing their _id's.
I've tried the solution of adding documents with an incremental _id (to pick a random _id and bypass using .skip), but this brought more headaches than it solved when I tried to add many documents in a short amount of time.
Adding data in an incremental way, or picking a random document, should not be this hard. I'm either missing some common knowledge, or doing something wrong, or this is just what it is...
Wanted to bring up the topic and get your responses.
Here is a way using the default ObjectId values for _id and a little math and logic.
// Get the "min" and "max" timestamp values from the _id in the collection and the
// diff between.
// 4-bytes from a hex string is 8 characters
var min = parseInt(db.collection.find()
.sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
max = parseInt(db.collection.find()
.sort({ "_id": -1 })limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
diff = max - min;
// Get a random value from diff and divide/multiply be 1000 for The "_id" precision:
var random = Math.floor(Math.floor(Math.random(diff)*diff)/1000)*1000;
// work out a "random" _id value in the range:
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")
// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
.sort({ "_id": 1 }).limit(1).toArray()[0];
That's the general logic in shell representation, and it is easily adaptable.
So in points:
Find the min and max primary key values in the collection
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.
This uses "padding" from the timestamp value in "hex" to form a valid ObjectId value since that is what we are looking for. Using integers as the _id value is essentially simplier but the same basic idea in the points.