Mongoose js batch find - node.js

I'm using Mongoose 3.8. I need to fetch 100 documents, execute the callback function, then fetch the next 100 documents and do the same thing.
I thought .batchSize() would do this, but I'm getting all the data at once.
Do I have to use limit or offset? If so, can someone give a proper example of how to do it?
If it can be done with batchSize, why is it not working for me?
MySchema.find({}).batchSize(20).exec(function (err, docs) {
  console.log(docs.length);
});
I thought it would print 20 each time, but it's printing the whole count.

This link has the information you need.
You can do this:
var pagesize=100;
MySchema.find().skip(pagesize*(n-1)).limit(pagesize);
where n is the parameter you receive in the request, i.e. the page number the client wants to receive.
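If instead you want to walk through the whole collection in batches of 100 from a single script (fetch 100, process them, fetch the next 100), a rough sketch along the same lines (fetchPage is just an illustrative helper name):
var pagesize = 100;

// Fetch page n (1-based), process it, then move on to the next page
// until the collection is exhausted.
function fetchPage(n, done) {
  MySchema.find({})
    .skip(pagesize * (n - 1))
    .limit(pagesize)
    .exec(function (err, docs) {
      if (err) return done(err);
      if (docs.length === 0) return done();   // no more documents

      console.log(docs.length);               // process this batch of (up to) 100 here
      fetchPage(n + 1, done);                 // then fetch the next batch
    });
}

fetchPage(1, function (err) {
  if (err) return console.error(err);
  console.log('all batches processed');
});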

The docs say:
In most cases, modifying the batch size will not affect the user or the application, as the mongo shell and most drivers return results as if MongoDB returned a single batch.
You may want to take a look at streams and perhaps try to accumulate subresults:
var stream = Dummy.find({}).stream();
stream.on('data', function (dummy) {
  callback(dummy);
});
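If you go the stream route, a rough sketch of accumulating sub-results so your callback still gets (up to) 100 documents at a time (the batchSize variable and callback are illustrative):
var batchSize = 100;
var buffer = [];

var stream = MySchema.find({}).stream();

stream.on('data', function (doc) {
  buffer.push(doc);
  if (buffer.length === batchSize) {
    callback(buffer);   // hand over a full batch
    buffer = [];        // start collecting the next one
  }
});

stream.on('error', function (err) {
  console.error(err);
});

stream.on('close', function () {
  if (buffer.length > 0) callback(buffer);   // flush the final partial batch
});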

Related

How to handle multiple database connections for 2 or 3 SELECT queries in AWS Lambda with nodejs?

The lambda's job is to see if a query returns any results and alert subscribers via an SNS topic. If no rows are returned, all good, no action needed. This has to be done every 10 minutes.
For some reason, I was told that we can't have any triggers added to the database, and no on-prem environment is suitable to host a cron job.
Here comes lambda.
This is what I have in the handler, inside a loop for each database.
sequelize.authenticate()
  .then(() => {
    for (let j = 0; j < database[i].rawQueries.length; j++) {
      sequelize.query(database[i].rawQueries[j]).then(results => {
        if (results[0].length > 0) {
          let message = "Temporary message for testing purposes" // + query results
          publishSns("Auto Query Alert", message)
        }
      }).catch(err => {
        publishSns("Auto Query SQL Error", `The following query could not be executed: ${database[i].rawQueries[j]}\n${err}`)
      })
    }
  })
  .catch(err => {
    publishSns("Auto Query DB Connection Error", `The following database could not be accessed: ${database[i].database}\n${err}`)
  })
  .then(() => sequelize.close())
// sns publisher
function publishSns(subject, message) {
  const params = {
    Message: message,
    Subject: subject,
    TopicArn: process.env.SNStopic
  }
  SNS.publish(params).promise()
}
I have 3 separate database configurations, and for those few SELECT queries, I thought I could just loop through the connection instances inside a single lambda.
The process is asynchronous and it takes 9 to 12 seconds per invocation, which I assume is far from optimal.
The whole thing feels very suboptimal, but that's my current level :)
To make things worse, I now read that lambda and sequelize don't really play well together.
I am using sequelize because that's the only way I could get 3 connections to the database in the same invocation to work without issues. I tried the mssql and tedious packages and wasn't able to with either of them.
It now feels like using an ORM is overkill for the very simple task of a SELECT query, and I would really like to at least have the connections and their queries run asynchronously to save some execution time.
I am looking into different ways to accomplish this, and I went down the rabbit hole and now have more questions than before! Generators? Are they still useful? Observables with RxJS? Could they apply here? Async/await or just Promises? Do I even need sequelize?
Any guidance/opinion/criticism would be much appreciated.
I'm not familiar with sequelize.js but hope I can help. I don't know your level with RxJS and Observables, but it's worth a try.
I think you could definitely use Observables and RxJS.
I would start with an interval() that will run the code on whatever schedule you define.
You can then pipe the interval since it's an Observable, do the auth bit, and use map() to get an array of Observables (one for each .query call; I'm assuming all your calls, authenticate and query, return Promises, so they can be turned into Observables with from()). You can then use something like forkJoin() with that array to get a response once all the calls are done.
In the .subscribe at the end, you would call publishSns().
You can also pipe a catchError() to process errors.
The map() part might not even be necessary; you could build the array beforehand and store it in a variable, since you don't depend on the value returned by authenticate.
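For illustration, here is a rough, untested sketch of that pipeline, assuming RxJS 6+ and reusing sequelize, publishSns, and a rawQueries array from the question:
const { interval, from, forkJoin, of } = require('rxjs');
const { mergeMap, catchError } = require('rxjs/operators');

// Run the whole check every 10 minutes.
interval(10 * 60 * 1000).pipe(
  mergeMap(() =>
    // authenticate() returns a Promise; from() turns it into an Observable
    from(sequelize.authenticate()).pipe(
      // one query Observable per raw query, joined once they all complete
      mergeMap(() => forkJoin(rawQueries.map((q) => from(sequelize.query(q))))),
      catchError((err) => {
        publishSns('Auto Query DB Connection Error', String(err));
        return of([]); // swallow the error so the interval keeps ticking
      })
    )
  )
).subscribe((resultSets) => {
  resultSets.forEach((results) => {
    // results[0] holds the rows, as in the original sequelize.query usage
    if (results[0] && results[0].length > 0) {
      publishSns('Auto Query Alert', 'Temporary message for testing purposes');
    }
  });
});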
I'm certain my solution isn't the only one or the best, but I think it would work.
Hope it helps and let me know if it works!

How to find multiple mongo db objects at once in node js

I am trying to run a search. It is working fine with findOne and receiving data, but when I use find it just returns some headers and prototypes like this:
Response I am getting:
Code:
let data = await client
  .db("Movies")
  .collection("movies")
  .find();
console.log(data);
Please tell me where I am going wrong.
Based on the documentation:
The toArray() method returns an array that contains all the documents from a cursor. The method iterates completely the cursor, loading all the documents into RAM and exhausting the cursor.
So just try:
let data = await client
  .db("Movies")
  .collection("movies")
  .find({})
  .toArray();
console.log(data);
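If the collection is large and you would rather not load every document into RAM at once, you can also iterate the cursor directly instead of calling toArray() (a sketch, assuming a reasonably recent driver that supports async iteration, inside the same async function):
const cursor = client.db("Movies").collection("movies").find({});

// Stream the documents one at a time instead of materialising the whole result set.
for await (const movie of cursor) {
  console.log(movie);
}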

How can I hold on http calls for a while?

I have the following problem and would appreciate it if someone could send me an idea; I have tried some, but they did not work.
Consider the code:
while (this.fastaSample.length > 0) {
  this.datainputService
    .saveToMongoDB(this.fastaSample.slice(0, batch))
    .subscribe();
}
It is supposed to solve the issue that I cannot send my data in a single HTTP call, since it is too big; I was able to send 10% without issue, but more than that does not work!
So I thought I should send smaller batches. I have consulted some Q&As here, and they helped me, but did not solve the problem.
I have tried to use await as I did in Node, but it does not work; it sends all the HTTP calls at once. It would be nice to stop/hold the code until the last HTTP call is complete!
Any suggestion?
I suppose you could make it all nice and RxJS by using from and concatAll.
Untested code:
// first create batches by chunking the array
const batches = Array.from(
  { length: Math.ceil(fastaSample.length / batch) },
  (v, i) => fastaSample.slice(i * batch, i * batch + batch)
);
// second, go over these chunks using `from` and `concatAll`:
from(batches).pipe(
  map((batch) => this.datainputService.saveToMongoDB(batch)),
  concatAll()
).subscribe();
This will make the calls consecutively. If it's possible to do the requests at the same time, you can use mergeAll() instead.
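mergeAll() also accepts a concurrency limit, so (equally untested) you could cap the number of parallel requests:
from(batches).pipe(
  map((batch) => this.datainputService.saveToMongoDB(batch)),
  mergeAll(3) // at most 3 requests in flight at once
).subscribe();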
But as #Mike commented, it seems like the issue should be handled in the MongoDB backend by accepting a multipart request. That way you don't need to chunk anything.

where to specify "noCursorTimeout" option using nodejs-mongodb driver?

It might be obvious, but right now I'm not able to either find it in the docs or google it...
I'm using mongodb with the nodejs-driver and have a potentially long operation (> 10 minutes) pertaining to a cursor which does get a timeout (as specified in http://docs.mongodb.org/manual/core/cursors/#cursor-behaviors).
In the nodejs-driver API Documentation (http://mongodb.github.io/node-mongodb-native/2.0/api/Cursor.html) a method addCursorFlag(flag, value) is mentioned to be called on a Cursor.
However, there's no example on how to do it, and simply calling e.g.
objectCollection.find().limit(objectCount).addCursorFlag('noCursorTimeout', true).toArray(function (err, objects) {
  // ...
});
leads to a TypeError: Object #<Cursor> has no method 'addCursorFlag'.
So how to go about making this Cursor exist longer than those 10 minutes?
Moreover, as required by the mongodb documentation, how do I then manually close the cursor?
Thanks!
The example you've provided:
db.collection.find().addCursorFlag('noCursorTimeout',true)
...works fine for me on driver version 2.14.21. I've had a cursor open for 45 minutes now.
Could it be you were using the 1.x NodeJS driver?
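For reference, a minimal sketch against a 2.x driver that sets the flag and then closes the cursor explicitly once you are done with it (objectCollection and objectCount as in the question):
var cursor = objectCollection.find()
  .limit(objectCount)
  .addCursorFlag('noCursorTimeout', true);

cursor.toArray(function (err, objects) {
  if (err) return console.error(err);

  // ... work with the documents ...

  // Since the server will not time this cursor out, release it explicitly.
  cursor.close(function (closeErr) {
    if (closeErr) console.error(closeErr);
  });
});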
So I've got a partial solution for my problem. It doesn't say so in the API docs, but apparently you have to specify it in the find() options like so:
objectCollection.find({}, {timeout: false}).limit(objectCount).toArray(function (err, objects) {
  // ...
});
However, what about the cleanup? Do those cursors ever get killed? Is a call to db.close() sufficient?

How to get Vertica copy from stdin response in NodeJS?

I'm using Vertica Database 07.01.0100 and node.js v0.10.32. I'm using the vertica nodejs module by vanberger. I want to send a COPY FROM STDIN command, and that is working using this example: https://gist.github.com/soldair/5168249. Here's my code:
var loadStreamQuery = "COPY \"" + input('table-name') + "\" FROM STDIN DELIMITER ',' skip 1 direct;";
var stream = through();

connection.copy(loadStreamQuery, function (transfer, success, fail) {
  stream.on('data', function (data) {
    log.info("loaddata: on data =>", data);
    transfer(data);
  });
  stream.on('end', function (data) {
    log.info("loaddata: on end =>", data);
    if (data) {
      transfer(data);
    }
    success();
    callback(null, {'result': {'status': '200', 'result': "Data was loaded successfully into Vertica"}});
  });
  stream.on('error', function (err) {
    fail();
    log.error("loaddata: on error =>", err);
    connection.disconnect();
  });
  stream.write(new Buffer(file));
  stream.end();
});
But if the data file has more columns than the target table, it doesn't say so. It just happily runs, copies nothing, and then ends. When I look at the table, nothing has been loaded. If I do the same thing in DbVisualizer, it tells me that 0 rows were affected.
I would like to examine the status of the command, but I don't know how. Is there some other event that I need to listen for? Do I need to save the result of copy to a variable and listen there, like I do with query calls? I'm a nodejs noob, so if the answer is obvious, just let me know.
Thanks!
I don't really think it is a node.js thing as much as it is a Vertica thing.
You need to look for rejected rows. You can find some good examples in the docs here.
If you want to actually see the rows that were rejected, you can do this by using a COPY statement clause like REJECTED DATA AS TABLE "loader_rejects". Alternatively, you can send it to a file on the cluster. I'm not aware of a way to get rejected rows to a local file using STDIN.
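Spliced into the COPY string from the question, that clause would look roughly like this (the "loader_rejects" table name is just an example):
var loadStreamQuery = "COPY \"" + input('table-name') + "\" FROM STDIN DELIMITER ',' skip 1 " +
  "REJECTED DATA AS TABLE \"loader_rejects\" direct;";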
If you don't care at all about the actual data, and just want to know how many rows loaded and rejected... you can use GET_NUM_REJECTED_ROWS() and GET_NUM_ACCEPTED_ROWS(). I think COPY will actually also return a result set with just the count of loaded rows, at least that is what I've noticed in the past.
So I guess as an example, if you want to see how many rows were accepted and rejected, you could do:
connection.query("SELECT GET_NUM_REJECTED_ROWS() AS REJECTED_ROWS, GET_NUM_ACCEPTED_ROWS() AS ACCEPTED_ROWS", function (err, resultset) {
  log.info(err, resultset.fields, resultset.rows, resultset.status);
});
