mongoose: get documents from to certain position - node.js

I am currently working on a news feed and want to fetch more data when the user scrolls down the list. At first I get a specific amount of data from my server, sorted by date, like this:
var newsFeed = await NewsFeed.find().sort({createdDate: 'desc'}).limit(parseInt(amount));
When the user reaches the end of the list, I want to load more data by simply increasing the amount variable in my API call. With the current call I also get the first elements that I already have. Is there a way to get the first 10 documents sorted by date, then documents 11 - 20, and so on?

If your documents are sorted you can use skip.
For example, if you have 10 objects, like this:
{ id:1 }, { id:2 }, { id:3 }, { id:4 } , ... { id:n}
You can query the number of documents you want in this way:
var find = await model.find({}).sort({id:1}).limit(amount)
Then, to get the next values, you can do this query:
find = await model.find({}).sort({id:1}).skip(amount).limit(amount)
The first find (assuming amount is, for example, 2) will return the documents with id 1 and 2.
The second find will return id 3 and 4.
Also, check this Stack Overflow question and these docs from Mongo.
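To generalize the two queries above to any page, the skip value can be derived from a page number. The following is a minimal sketch, not part of the original answer: pageWindow and fetchPage are hypothetical helper names, and the createdDate sort is taken from the question.

```javascript
// Convert a 1-based page number into the skip/limit pair for that page.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Hypothetical helper: fetch one page from a Mongoose model, newest first.
// Assumes the documents have a createdDate field, as in the question.
async function fetchPage(model, page, pageSize) {
  const { skip, limit } = pageWindow(page, pageSize);
  return model.find({}).sort({ createdDate: 'desc' }).skip(skip).limit(limit);
}
```

With a page size of 10, page 1 covers documents 1 - 10 and page 2 covers documents 11 - 20, so the client only has to send the page number it wants next.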

Related

Nodejs compute gets slow after query big list from Mongodb

I am using mongoose to query a really big list from Mongodb
const chat_list = await chat_model.find({}).sort({uuid: 1}); // uuid is a index
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1});// create_time is a index of message collection, time: t1
// chat_list length is around 2,000, msg_list length is around 90,000
compute(chat_list, msg_list); // time: t2
function compute(chat_list, msg_list) {
  for (let i = 0, len = chat_list.length; i < len; i++) {
    msg_list.filter(msg => msg.uuid === chat_list[i].uuid)
    // consistent handling for every message
  }
}
For the code above, t1 is about 46s and t2 is about 150s.
t2 is really too big, which is weird.
Then I cached these lists to local JSON files:
const chat_list = require('./chat-list.json');
const msg_list = require('./msg-list.json');
compute(chat_list, msg_list); // time: t2
This time, t2 is around 10s.
So here comes the question: 150 seconds vs 10 seconds, why? What happened?
I tried to use a worker to do the compute step after the Mongo query, but the time is still much bigger than 10s.
The mongodb query returns a FindCursor that includes arrayish methods like .filter() but the result is not an Array.
Use .toArray() on the cursor before filtering to process the mongodb result set like for like. That might not make the overall process any faster, as the result set still needs to be fetched from mongodb, but compute will be similar.
const chat_list = await chat_model
  .find({})
  .sort({uuid: 1})
  .toArray()
const msg_list = await message_model
  .find({}, {content: 1, xxx})
  .sort({create_time: 1})
  .toArray()
Matt typed faster than I did, so some of what was suggested aligns with part of this answer.
I think you are measuring and comparing something different than what you are expecting and implying.
Your expectation is that the compute() function takes around 10 seconds once all of the data is loaded by the application. This is (mostly) demonstrated by your second test, apart from the fact that that test includes the time it takes to load the data from the local files. But you're seeing that there is a difference of 104 seconds (150 - 46) between the completion of message_model.find() and compute() hence leading to the question.
The key thing is that successfully advancing from the find against message_model is not the same thing as retrieving all of the results. As @Matt notes, the find() will return with a cursor object once the initial batch of results is ready. That is very different from retrieving all of the results. So there is more work (apparently ~94 seconds' worth) left to do from the two find() operations to further iterate the cursors and retrieve the rest of the results. This additional time is getting reported inside of t2.
As suggested by @Matt, calling .toArray() should shift that time back into t1 as you are expecting. It also sounds like it may be more correct, given the ambiguity with the .filter() functions.
There are two other things that catch my attention. The first is: why are you retrieving all of this data client-side to do the filtering there? Perhaps you would like to do this uuid matching inside of the database via $lookup?
Secondly, this comment isn't clear to me:
// create_time is a index of message collection, time: t1
create_time itself is a field here, existent or not, that you are requesting an ascending sort against.
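The $lookup idea mentioned above could look roughly like the following pipeline. This is only a sketch: the collection name "messages" and the output field name "messages" are assumptions that do not appear in the question, and the uuid field on both sides is inferred from the question's filter callback.

```javascript
// Hypothetical collection name; replace it with the real name of the message collection.
const MESSAGE_COLLECTION = 'messages';

// Build an aggregation pipeline that attaches each chat's messages by matching uuid.
function buildChatMessagesPipeline(messageCollection) {
  return [
    { $sort: { uuid: 1 } },
    {
      $lookup: {
        from: messageCollection, // collection to join against
        localField: 'uuid',      // field on the chat documents
        foreignField: 'uuid',    // field on the message documents
        as: 'messages',          // assumed name for the joined array
      },
    },
  ];
}

const pipeline = buildChatMessagesPipeline(MESSAGE_COLLECTION);
// Usage with Mongoose (not executed here):
//   const chats = await chat_model.aggregate(pipeline);
```

Each resulting chat document would then carry its own messages array, so no client-side matching is needed. Whether this is faster in practice probably depends on an index on uuid in the message collection.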
You are taking data from two collections and then, in a for loop, comparing IDs using the filter function. The loop executes 2,000 times, and each iteration runs filter over all 90,000 records.
In the worst case, where none of the 2,000 uuids appear in msg_list, you still perform 2000*90000 comparisons even though you get no data back.
It won't take more than 10 to 15 seconds if you use the code below.
// This generates an array of the distinct uuid values present in message_model
// (a projection and sort have no effect on a distinct query, so they are omitted)
const msg_list = await message_model.distinct("uuid");
// The query below matches every uuid in the msg_list array against the chat uuid
const chat_list = await chat_model.find({uuid: {$in: msg_list}}).sort({uuid: 1});
The result above does the same as your code with the filter function and loop, but this is the proper and fastest way to receive the data you require.
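If the per-chat message handling has to stay in application code, the nested scan described above can be avoided by grouping the messages once. The following is a sketch under that assumption, not the original poster's code: groupByUuid is a hypothetical helper, and the message count stands in for whatever handling is really needed.

```javascript
// Group messages by uuid in a single pass over msgList.
function groupByUuid(msgList) {
  const groups = new Map();
  for (const msg of msgList) {
    const bucket = groups.get(msg.uuid);
    if (bucket) {
      bucket.push(msg);
    } else {
      groups.set(msg.uuid, [msg]);
    }
  }
  return groups;
}

// Look up each chat's messages by key instead of filtering the whole list per chat.
function compute(chatList, msgList) {
  const groups = groupByUuid(msgList);
  const result = [];
  for (const chat of chatList) {
    const messages = groups.get(chat.uuid) || [];
    // Placeholder for the real per-message handling.
    result.push({ uuid: chat.uuid, count: messages.length });
  }
  return result;
}
```

This does roughly one pass over each list (about 92,000 steps for the sizes in the question) instead of 2000*90000 comparisons.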

Get multiple documents from collection using nodejs and mongodb

Hi I have two mongodb collections. The first one returns json data (array) and with the output of this, I want to return documents that match.
When I run console.log(req.bidder.myBids) I get the following output:
[{"productId":"3798b537-9c7b-4395-9e41-fd0ba39aa984","price":3010},{"productId":"3798b537-9c7b-4395-9e41-fd0ba39aa984","price":3020},{"productId":"4c4bd71c-6664-4d56-b5d3-6428fe1bed19","price":1040},{"productId":"4c4bd71c-6664-4d56-b5d3-6428fe1bed19","price":1050},{"productId":"4c4bd71c-6664-4d56-b5d3-6428fe1bed19","price":1060},{"productId":"4c4bd71c-6664-4d56-b5d3-6428fe1bed19","price":1070},{"productId":"4c4bd71c-6664-4d56-b5d3-6428fe1bed19","price":1090},{"productId":"4c4bd71c-6664-4d56-b5d3-6428fe1bed19","price":1100}]
The productId values contain duplicates. I want to remove the duplicates and then call a routine that finds all the matching products and outputs them as JSON.
So far I have this code that only outputs one document, but I can't figure out how to add the array of productIds and then fetch all the corresponding products.
var agencyId = req.body.agencyId;
var productId = req.body.productId;
if (!validate.STRING(agencyId)) {
  res.apiError(messages.server.invalid_request);
} else {
  dbProduct.find({
    productId: {$in: ['3798b537-9c7b-4395-9e41-fd0ba39aa984', '4c4bd71c-6664-4d56-b5d3-6428fe1bed19']}
  }).then(dbRes => {
    console.log(dbRes);
  });
}
Update: the code above now works with hard-wired productIds. I am now looking at how to take the array data and use it in place of the hard-wired productIds.
The $in operator is what you want. See the docs here: https://docs.mongodb.com/manual/reference/operator/query/in/
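To replace the hard-wired ids, the duplicates can be removed with a Set before the array is passed to $in. This is a minimal sketch: uniqueProductIds is a hypothetical helper name, and the sample data is a shortened copy of the output shown in the question.

```javascript
// Collect the distinct productId values from an array of bids, keeping first-seen order.
function uniqueProductIds(bids) {
  return [...new Set(bids.map(bid => bid.productId))];
}

// Shortened sample of req.bidder.myBids from the question.
const myBids = [
  { productId: '3798b537-9c7b-4395-9e41-fd0ba39aa984', price: 3010 },
  { productId: '3798b537-9c7b-4395-9e41-fd0ba39aa984', price: 3020 },
  { productId: '4c4bd71c-6664-4d56-b5d3-6428fe1bed19', price: 1040 },
  { productId: '4c4bd71c-6664-4d56-b5d3-6428fe1bed19', price: 1050 },
];

const productIds = uniqueProductIds(myBids);
// Usage in the route (not executed here):
//   dbProduct.find({ productId: { $in: productIds } }).then(dbRes => res.json(dbRes));
```

Note that $in itself tolerates duplicate values, so the Set mainly keeps the query small and readable.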

Get actual count of matches in Azure Search

Azure Search returns a maximum of 1,000 results at a time. For paging on the client, I want the total count of matches in order to be able to display the correct number of paging buttons at the bottom and in order to be able to tell the user how many results there are. However, if there are over a thousand, how do I get the actual count? All I know is that there were at least 1,000 matches.
I need to be able to do this from within the SDK.
To get the total number of matching documents, set IncludeTotalResultCount to true in your search parameters. When you then execute the query, the Count property of the search results holds the total number of documents that match it (for a "*" query, that is every document in the index).
Here's some sample code for that:
var credentials = new SearchCredentials("account-key (query or admin key)");
var indexClient = new SearchIndexClient("account-name", "index-name", credentials);
var searchParameters = new SearchParameters()
{
    QueryType = QueryType.Full,
    IncludeTotalResultCount = true
};
var searchResults = await indexClient.Documents.SearchAsync("*", searchParameters);
Console.WriteLine("Total documents in index (approx) = " + searchResults.Count.GetValueOrDefault());//Prints the total number of documents in the index
Please note that:
This count will be approximate.
Getting the count is an expensive operation so you should only do it with the very first request when implementing pagination.
For REST clients using the POST API, just include "count": true in the payload. You get the count in @odata.count.
Ref: https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
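For the REST route, the paging arithmetic might be sketched as follows in JavaScript. buildSearchBody and pageCount are hypothetical helper names; search, count, top, and skip are request body fields of the Search Documents POST API, and the total is read from the @odata.count property of the response. Following the note above, the count is only requested with the first page.

```javascript
// Build the POST body for one page of results (page is 1-based).
function buildSearchBody(searchText, page, pageSize) {
  return {
    search: searchText,
    count: page === 1,           // ask for the total only on the first request
    top: pageSize,               // number of results to return
    skip: (page - 1) * pageSize, // number of results to skip
  };
}

// Work out how many paging buttons to show from the reported total.
function pageCount(totalMatches, pageSize) {
  return Math.ceil(totalMatches / pageSize);
}

// Example: a first-page response reporting 2,345 matches with 50 results per page.
const firstBody = buildSearchBody('*', 1, 50);
const sampleResponse = { '@odata.count': 2345, value: [] };
const pages = pageCount(sampleResponse['@odata.count'], 50);
```

The client would keep the total from the first response and reuse it when rendering later pages.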

mongodb: another "how to add a random record" thread

I've come across many versions of this same question here on StackOverflow, none providing a solid, valid solution, so here we go:
I need to pick a random document from around 5 million documents in my MongoDB database in an efficient way.
I've tried getting the .count and using .skip to get the random document, but it takes almost three seconds and is very, very inefficient.
I can't make changes to the documents (like adding a "random" entry to each document or changing their _ids).
I've tried the solution of adding documents with an incremental _id (to pick a random _id and bypass .skip), but this brought more headaches than it solved when I tried to add many documents in a short amount of time.
Adding data in an incremental way, or picking a random document, should not be this hard. I'm either missing some common knowledge, or doing something wrong, or this is what it really is.
Wanted to bring up the topic and get your responses.
Here is a way using the default ObjectId values for _id and a little math and logic.
// Get the "min" and "max" timestamp values from the _id in the collection and the
// diff between them.
// 4 bytes as a hex string is 8 characters
var min = parseInt(db.collection.find()
        .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    max = parseInt(db.collection.find()
        .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    diff = max - min;
// Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
var random = Math.floor(Math.floor(Math.random()*diff)/1000)*1000;
// Work out a "random" _id value in the range:
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000");
// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
    .sort({ "_id": 1 }).limit(1).toArray()[0];
That's the general logic in shell representation and easily adaptable.
So in points:
Find the min and max primary key values in the collection
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.
This uses "padding" from the timestamp value in "hex" to form a valid ObjectId value since that is what we are looking for. Using integers as the _id value is essentially simpler but follows the same basic idea as the points above.
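The padding step can be isolated as a small pure function, shown here as a sketch in plain JavaScript. randomObjectIdHex is a hypothetical name, the bounds are given in whole seconds (the first 4 bytes of an ObjectId), and unlike the shell code above the upper bound is treated as inclusive.

```javascript
// Build a 24-character hex string usable as an ObjectId lower bound:
// 8 hex characters of timestamp followed by 16 zero characters of padding.
function randomObjectIdHex(minSeconds, maxSeconds, rand = Math.random) {
  // Pick a whole-second offset in [0, maxSeconds - minSeconds].
  const offset = Math.floor(rand() * (maxSeconds - minSeconds + 1));
  const seconds = minSeconds + offset;
  return seconds.toString(16).padStart(8, '0') + '0'.repeat(16);
}

// Usage (not executed here), with min and max read from the first and last _id:
//   const hex = randomObjectIdHex(minSeconds, maxSeconds);
//   collection.find({ _id: { $gte: new ObjectId(hex) } }).sort({ _id: 1 }).limit(1);
```

Passing rand as a parameter is only there so the function can be checked with fixed values. Note that documents are picked uniformly by time rather than by position, so documents that follow long gaps in insertion time are chosen more often.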

mongoose limit & nin not working properly

I am trying to limit the number of records returned in a query:
Property.find(searchParams).nin('_id', prop_ids).limit(5).exec(function(err, properties) {
When the first call comes in, I get 5 records back. Then I make a second call and pass in an array of ids (prop_ids). This array holds the ids of all the records returned by the first call. In this case I get no records back. I have a total of 7 records in my database, so the second call should return 2 records. How should I go about doing this?
I think mongoose might apply the limit before the nin query is applied so you will always just get those five. If it's a type of pagination you want to perform where you get 5 objects and then get 5 others, you can use the option skip instead:
var SKIP = ... // 0, 5, 10...
Property.find(searchParams, null, {
  skip: SKIP,
  limit: 5,
}, function(err, properties) {
})
This is what I took from your question; maybe you had something else in mind with the nin call?
