I'm wondering if it's a good idea (performance-wise) to store query results in variables and refresh them only every few minutes, since my Node application runs multiple MongoDB queries that don't need to be up to date, and some of them are a bit complex.
I'm thinking about something like this:
var queryResults = [];
myModel.find().exec(function(err, results) {
queryResults = results;
});
Then:
var interval = 10 * 60 * 1000;
setInterval(function() {
myModel.find().exec(function(err, results) {
queryResults = results;
});
}, interval);
And when I need to send the query results to my view engine:
app.get('/', function(req, res) {
res.render('index.ejs', {entries : queryResults});
});
Is this a good way to cache and display the same query results to multiple clients?
You can use this module instead of creating your own cache layer:
https://www.npmjs.com/package/memory-cache
You have to be careful not to put a huge amount of data into memory; if you want to push millions of results in there, it is probably not a good idea.
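For example, here is a minimal sketch of how the question's route could use it (assuming memory-cache's put/get API and the same myModel/app as above; check the module's docs for details):
var cache = require('memory-cache');

app.get('/', function (req, res) {
  var cached = cache.get('entries');
  if (cached) {
    // serve the cached results to every client
    return res.render('index.ejs', {entries: cached});
  }
  myModel.find().exec(function (err, results) {
    // keep the results for 10 minutes; the first request after expiry refetches
    cache.put('entries', results, 10 * 60 * 1000);
    res.render('index.ejs', {entries: results || []});
  });
});
This avoids the setInterval entirely: the cache is refreshed lazily on the first request after it expires.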
I am working on a project that deals with a large amount of data. Fetching all the data from MongoDB at once is not an option, since it results in a bad user experience. I am building an infinite-loading setup: with each scroll I want to fetch a fixed number of documents from MongoDB and concatenate the newly fetched data with the previously fetched data to show the results on my webpage.
How to do pagination in mongodb using nodejs?
The MongoDB Node.js driver lets you paginate through the limit and skip options.
// You can first start by counting the number of entities in your collection
collection.countDocuments()
  .then((count) => {
    var step = 1000;
    var offset = 0;
    var limit = step;
    // then, exploiting your offset and limit variables,
    // you can limit the number of results you get in each query (a page of results)
    while (offset < count) {
      // note: process() is async and is not awaited here, so pages are processed concurrently
      process(offset, limit);
      offset += step;
    }
  })
  .catch((error) => console.error(error));
async function process(offset, limit) {
var entities = collection.find(
{},
{
limit: limit,
skip: offset,
}
);
for await (const entity of entities) {
// do what you want
}
}
You can find more details on the MongoDB documentation page.
https://www.mongodb.com/docs/drivers/node/current/fundamentals/crud/read-operations/limit/
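If you want each scroll on the client to fetch just one page, here is a minimal sketch of a per-page endpoint (assuming an Express app and the same collection object as above; the route name and page size are made up for illustration):
var pageSize = 20;

app.get('/entries', async (req, res) => {
  var page = parseInt(req.query.page, 10) || 0;
  // skip the pages already delivered and return only the next one
  var docs = await collection
    .find({})
    .skip(page * pageSize)
    .limit(pageSize)
    .toArray();
  res.json(docs); // the client concatenates this page onto what it already has
});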
Hello,
I use Node.js to provide an API for storing data on a MongoDB database.
I ran multiple tests on a read method, which takes ids and returns the corresponding documents. The point is that I must return these documents in the specified order. To ensure that, I use the following code:
// Sequentially fetch every element
function read(ids, callback) {
var i = 0;
var results = [];
function next() {
db.findOne(ids[i], function (err, doc) {
results.push(err ? null : doc);
if (ids.length > ++i) {
return next();
}
callback(results);
});
}
next();
}
This way, documents are fetched one-by-one, in the right order. It takes about 11s on my laptop to retrieve 27k documents.
However, I thought that it was possible to improve this method:
// Asynchronously map the whole array
var async = require('async');
function read(ids, callback) {
  async.map(ids, db.findOne.bind(db), callback);
}
After running a single test, I was quite satisfied to see that the 27k documents were retrieved in only 8s using simpler code.
The problem happens when I repeat the same request: the response time keeps growing (proportionally to the number of elements retrieved): 9s, 10s, 11s, 12s... This problem does not happen in the sequential version.
I tried two versions of Node.js, v6.2.0 and v0.10.29. The problem is the same. What causes this latency and how can I get rid of it?
Try async.mapLimit to prevent overload. You will need some tests to tune the limit value for your environment.
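A minimal sketch (assuming db.findOne(id, cb) works as in the question; the limit of 10 is an arbitrary starting point):
var async = require('async');

function read(ids, callback) {
  // at most 10 findOne calls in flight at any moment
  async.mapLimit(ids, 10, db.findOne.bind(db), callback);
}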
But find({_id: {$in: list}}) is always better, because it is a single database request instead of many.
I suggest you try to restore the original order client-side.
Something like this:
function read(ids, cb) {
db.find(
{_id: {$in: ids.map(id => mongoose.Types.ObjectId(id))}},
process
);
function process(err, docs) {
if (err) return cb(err);
return cb(null, docs.sort(ordering))
}
function ordering(a, b) {
  // sort ascending by each document's position in the original ids array
  return ids.indexOf(a._id.toString()) - ids.indexOf(b._id.toString());
}
}
The find query may need adjusting; I can't know which exact MongoDB driver you use.
This code is a first pass; a more manual sort can improve performance a lot, and [].indexOf is heavy too (O(n) per call).
But I'm almost sure that, even as-is, it will work much faster.
Possible ordering replacement:
var idHash = {};
for(var i = 0; i < ids.length; i++)
idHash[ids[i]] = i;
function ordering(a, b) {
  return idHash[a._id.toString()] - idHash[b._id.toString()];
}
A general comparison sort takes O(n log n), but we already know the final position of each found document, so we can restore the original order in O(n):
var idHash = ids.reduce((c, id, i) => (c[id] = i, c), {});
function process(err, docs) {
if (err) return cb(err);
return cb(null,
docs.reduce(
(c, doc) => (c[idHash[doc._id.toString()]] = doc, c),
      ids.map(id => null))) // slots for not-found docs stay null
}
Functional style makes the code more flexible. For example, this code can easily be modified to use async.reduce to block the event loop less.
In Node.js I am trying to get the following behaviour: during the runtime of my Express application I accumulate several IDs of objects that need further processing. For further processing, I need to transmit these IDs to a different service. The other service, however, cannot handle a lot of requests and instead requires batch transmission. Hence I need to accumulate many individual requests into a bigger one while allowing persistence.
tl;dr: Over 15 minutes, several IDs are accumulated in my application; after this 15-minute window, they're all emitted at once. At the same time, the next window is opened.
From my investigation, this is probably the abstract data type of a multiset: The items in my multiset (or bag) can have duplicates (hence the multi-), they're packaged by the time window, but have no index.
My infrastructure is already using redis, but I am not sure whether there's a way to accumulate data into one single job. Or is there? Are there any other sensible ways to achieve this kind of behavior?
I might be misunderstanding some of the subtlety of your particular situation, but here goes.
Here is a simple sketch of some code that processes a batch of 10 items at a time. The way you would do this differs slightly depending on whether the processing step is synchronous or asynchronous. You don't need anything more complicated than an array for this, since arrays have constant-time push and length, and that is all you need. You may also want an option to flush the batch after a given amount of time, even if the size limit hasn't been reached; see the sketch after the asynchronous example below.
Synchronous example:
var batch = [];
var batchLimit = 10;

var sendItem = function (item) {
  batch.push(item);
  if (batch.length >= batchLimit) {
    processBatchSynch(batch);
    batch = []; // start the next batch
  }
};
Asynchronous example:
// note that in this case the job of emptying the batch array
// has to be done inside the callback.
var batch = [];
var batchLimit = 10;

// your callback might look something like function(err, data) { ... }
var sendItem = function (item, cb) {
  batch.push(item);
  if (batch.length >= batchLimit) {
    processBatchAsync(batch, cb); // the callback is responsible for resetting `batch`
  }
};
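And here is a sketch of the time-window variant from the question (processBatchAsync is still a hypothetical function; the 15-minute interval comes from the question):
var batch = [];
var windowMs = 15 * 60 * 1000;

var sendItem = function (item) {
  batch.push(item);
};

setInterval(function () {
  if (batch.length === 0) return;
  var toSend = batch;
  batch = []; // open the next window before processing the previous one
  processBatchAsync(toSend, function (err) {
    if (err) console.error(err);
  });
}, windowMs);
Note that a plain in-memory array is lost if the process crashes, which is where the persistence requirement from the question comes in.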
I came up with an npm module to solve this specific problem, using a MySQL database for persistence:
persistent-bag: it is to bags what Redis is to queues. A bag (or multiset) is filled over time and then processed all at once.
On instantiation of the object, the required table is created in the provided MySQL database if necessary.
var PersistentBag = require('persistent-bag');
var bag = new PersistentBag({
host: 'localhost',
port: '3306',
user: 'root',
password: '',
database: 'test'
});
Then items can be .add()ed during the runtime of any number of applications:
var item = {
title: 'Test item to store to bag'
};
bag.add(item, function (err, itemId) {
console.log('Item id: ' + itemId);
});
Working with the emitted, aggregated items every 15 minutes is done as in kue for Redis, by subscribing to .process():
bag.process(function worker(bag, done) {
// bag.data is now an array of all items
doSomething(bag.data, function () {
done();
});
});
I have a MongoDB Relationships collection that stores the user_id and the followee_id (the person the user is following). If I query against the user_id I can find all the individuals the user is following. Next I need to query the Users collection against all of the returned followee ids to get their personal information. This is where I am confused. How would I accomplish this?
NOTE: I know I can embed the followees in the individual user's document and use an $in operator, but I do not want to go this route. I want to maintain as much flexibility as I can.
You can use an $in query without denormalizing the followees on the user. You just need to do a little bit of data manipulation:
Relationship.find({user_id: user_id}, function(error, relationships) {
var followee_ids = relationships.map(function(relationship) {
return relationship.followee_id;
});
User.find({_id: { $in: followee_ids}}, function(error, users) {
// voila
});
});
If I got your problem right (I think so), you need to query each of the "individuals the user is following".
That means issuing multiple queries against the database, one per followee, and collecting the data.
Because queries in Node.js (I assume you are using Mongoose) are asynchronous, your code needs to handle this task asynchronously as well.
If you are not familiar with the async module in Node.js, it's about time to get to know it; see npm async for docs.
Here is a sample of how your query could look:
/* followee_id_arr is the array of followee_id values from the previous query */
function query(followee_id_arr, callback) {
  var async = require('async');
  var allResults = [];
  async.eachSeries(followee_id_arr, function (f_id, done) {
    db.userCollection.findOne({_id: f_id}, {_id: 1, personalData: 1}, function (err, data) {
      if (err) { return done(err); /* handle error */ }
      allResults.push(data);
      done();
    });
  }, function (err) {
    if (err) { return callback(err); }
    callback(null, allResults);
  });
}
You can even run all the queries in parallel (for better performance) by using async.map.
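A sketch of that parallel variant (same assumptions as above; async.map keeps the results in the same order as the input ids):
function query(followee_id_arr, callback) {
  var async = require('async');
  async.map(followee_id_arr, function (f_id, done) {
    db.userCollection.findOne({_id: f_id}, {_id: 1, personalData: 1}, done);
  }, callback);
}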
The performance of an individual findOne query is abnormally slow (upwards of 60-85 ms). Is there something fundamentally wrong with the design below? What steps should I take to make this operation faster?
Goal (fast count of items within a range, under 10-20ms):
Input max and min time
Query database for document with closest time for max and min
Return the "number" field of both query result
Take the difference of the "number" field to get document count
Setup
MongoDB database
3000 documents, compound ascending index on the time_axis, latency_axis, and number fields (index creation sketch after the sample documents below)
[ { time_axis:1397888153982,latency_axis:5679,number:1},
{ time_axis:1397888156339,latency_axis:89 ,number:2},
...
{ time_axis:1398036817121,latency_axis:122407,number:2999},
{ time_axis:1398036817122,latency_axis:7149560,number:3000} ]
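For reference, a sketch of how that compound index could be created (assuming the same collection object used in the code below):
// compound ascending index on time_axis, latency_axis, number
collection.createIndex({ time_axis: 1, latency_axis: 1, number: 1 });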
NodeJs
exports.getCount = function (uri, collection_name, min, max, callback) {
var low, high;
var start = now();
MongoClient.connect(uri, function(err, db) {
if(err) {
return callback(err, null);
}
var collection = db.collection(collection_name);
async.parallel([
function findLow(callback){
var query = {time_axis : { $gte : min}};
var projection = { _id: 0, number: 1};
collection.findOne( query, projection, function(err, result) {
low = result.number;
console.log("min query time: "+(now()-start));
callback();
});
},
function findHigh(callback){
var query = {time_axis : { $gte : max}};
var projection = { _id: 0, number: 1};
collection.findOne( query, projection, function(err, result) {
high = result.number;
console.log("max query time: "+(now()-start));
callback();
});
}
],
function calculateCount ( err ){
var count = high - low;
db.close();
console.log("total query time: "+(now()-start));
callback(null, count);
});
});
}
Note: Thank you to Adio for the answer. It turns out the MongoDB connection only needs to be initialized once, and the driver handles connection pooling automatically. :)
Looking at your source code, I can see that you create a new connection every time you query MongoDB. Try providing an already created connection and thus reuse the connection you create once. Coming from the Java world, I think you should use some kind of connection pooling.
You can also check this question and its answer.
" You open do MongoClient.connect once when your app boots up and reuse the db object. It's not a singleton connection pool each .connect creates a new connection pool. "
Try using the --prof option in Node.js to generate profiling results so you can find out where the time is spent, e.g. node --prof app.js.
You will get a v8.log file containing the profiling results. The tool for interpreting v8.log is linux-tick-processor, which can be found in the V8 project under v8/tools/linux-tick-processor.
To obtain linux-tick-processor:
git clone https://github.com/v8/v8.git
cd v8
make -j8 ia32.release
export D8_PATH=$PWD/out/ia32.release
export PATH=$PATH:$PWD/tools
linux-tick-processor /path/to/v8.log | vim -