How can I hold on http calls for a while? - node.js

I have the following problem and would appreciate it if someone could send me an idea; I have tried a few, but they did not work.
Consider the code:
while (this.fastaSample.length > 0) {
  this.datainputService
    .saveToMongoDB(this.fastaSample.slice(0, batch))
    .subscribe();
}
It is supposed to work around the fact that I cannot send my data in a single HTTP call because it is too big: I was able to send 10% of it without issue, but anything more than that fails.
So I thought I should send smaller batches. I have consulted some Q&As here, and they helped me, but they did not solve the problem.
I have tried to use await as I would in Node, but it does not work; all the HTTP calls are sent at once. It would be nice to stop/hold the code until the previous HTTP call is complete.
Any suggestion?

I suppose you could make it all nice and RxJS by using from and concatAll.
Untested code:
// first create batches by chunking the array
const batches = Array.from(
  { length: Math.ceil(fastaSample.length / batch) },
  (v, i) => fastaSample.slice(i * batch, i * batch + batch)
);

// second, go over these chunks using `from` and `concatAll`
// (from comes from 'rxjs'; map and concatAll from 'rxjs/operators')
from(batches).pipe(
  map((chunk) => this.datainputService.saveToMongoDB(chunk)),
  concatAll()
).subscribe();
This will make the calls consecutively. If it's possible to do the requests at the same time, you can use mergeAll() instead.
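mergeAll() also takes an optional concurrency argument, so a middle ground is possible. A small sketch under the same assumptions as above (also untested):

// same chunking as above, but allow up to 3 uploads in flight at once
from(batches).pipe(
  map((chunk) => this.datainputService.saveToMongoDB(chunk)),
  mergeAll(3) // at most 3 concurrent requests
).subscribe();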
But like @Mike commented, it seems like the issue should be handled in the MongoDB backend by accepting a multipart request. That way you don't need to chunk anything.

Related

How to handle multiple database connections for 2 or 3 SELECT queries in AWS Lambda with nodejs?

The lambda's job is to see if a query returns any results and alert subscribers via an SNS topic. If no rows are returned, all good, no action needed. This has to be done every 10 minutes.
For various reasons, I was told that we can't have any triggers added on the database, and no on-prem environment is suitable to host a cron job.
Here comes Lambda.
This is what I have in the handler, inside a loop for each database.
sequelize.authenticate()
  .then(() => {
    for (let j = 0; j < databases[i].rawQueries.length; j++) {
      sequelize.query(databases[i].rawQueries[j])
        .then(results => {
          if (results[0].length > 0) {
            let message = "Temporary message for testing purposes" // + query results
            publishSns("Auto Query Alert", message)
          }
        })
        .catch(err => {
          publishSns("Auto Query SQL Error", `The following query could not be executed: ${databases[i].rawQueries[j]}\n${err}`)
        })
    }
  })
  .catch(err => {
    publishSns("Auto Query DB Connection Error", `The following database could not be accessed: ${databases[i].database}\n${err}`)
  })
  .then(() => sequelize.close())
// sns publisher
function publishSns(subject, message) {
  const params = {
    Message: message,
    Subject: subject,
    TopicArn: process.env.SNStopic
  }
  SNS.publish(params).promise()
}
I have 3 separate database configurations, and for those few SELECT queries I thought I could just loop through the connection instances inside a single lambda.
The process is asynchronous and takes 9 to 12 seconds per invocation, which I assume is far from optimal.
The whole thing feels very suboptimal, but that's my current level :)
To make things worse, I have now read that Lambda and Sequelize don't really play well together.
I am using Sequelize because that's the only way I could get 3 connections to the database in the same invocation to work without issues. I tried the mssql and tedious packages and wasn't able to with either of them.
It now feels like using an ORM is overkill for this very simple task of a SELECT query, and I would really like to at least have the connections and their queries run asynchronously to save some execution time.
I am looking into different ways to accomplish this, and I went down the rabbit hole and now have more questions than before! Generators? Are they still useful? Observables with RxJS? Could they apply here? Async/await or just Promises? Do I even need Sequelize?
Any guidance/opinion/criticism would be very appreciated
I'm not familiar with sequelize.js, but I hope I can help. I don't know your level with RxJS and Observables, but it's worth a try.
I think you could definitely use Observables and RxJS.
I would start with an interval() that will run the code at whatever period you define.
You can then pipe the interval since it's an Observable, do the auth bit, and use map() to get an array of Observables, one for each .query call (I am assuming all your calls, authenticate and query, return Promises, so they can be turned into Observables with from()). You can then use something like forkJoin() with that array to get a response after all calls are done.
In the .subscribe at the end, you would call publishSns().
You can also pipe a catchError() and process errors there.
The map() part might not be necessary; you could build the array beforehand and store it in a variable, since it doesn't depend on the value returned by authenticate.
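A rough sketch of that pipeline, assuming authenticate() and query() return Promises and that rawQueries is a placeholder for your array of query strings (the timer value and SNS subjects are made up too):

const { interval, from, of, forkJoin } = require('rxjs');
const { switchMap, catchError } = require('rxjs/operators');

// poll on a fixed timer; each tick authenticates, runs every query, then publishes
interval(10 * 60 * 1000).pipe(
  switchMap(() =>
    from(sequelize.authenticate()).pipe(
      // run all queries in parallel and wait for all of them to finish
      switchMap(() => forkJoin(rawQueries.map(q => from(sequelize.query(q))))),
      catchError(err => {
        publishSns("Auto Query Error", String(err));
        return of([]); // keep the interval alive after a failure
      })
    )
  )
).subscribe(results => {
  results.forEach(([rows]) => {
    if (rows.length > 0) {
      publishSns("Auto Query Alert", JSON.stringify(rows));
    }
  });
});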
I'm certain my solution isn't the only one or the best, but I think it would work.
Hope it helps and let me know if it works!

Rate Limit the Nodejs Module Request

So I'm trying to create a data scraper with Node.js using the Request module. I'd like to limit the concurrency to 1 domain on a 20ms cycle to go through 50,000 URLs.
When I execute the code, I'm DoS-ing the network with the 40Gbps bandwidth my system has access to... This creates local problems and remote problems.
5 concurrent scans on a 120ms cycle for 50k domains (if I calculated correctly) would finish the list in ~20 minutes and would not create any issues, at least remotely.
The code I'm testing with:
var urls = // data from mongodb
urls.forEach(function (url) {
  // pseudo:
  // request the url
  // process
});
The forEach loop executes instantly, "queueing" all URLs, and tries to fetch them all at once. It seems impossible to add a delay on each iteration. All my Google searches seem to show how to rate-limit incoming requests to your server/API. The same thing appears to happen with a for loop: it seems impossible to control how fast the loop executes. I'm probably missing something, or the code logic is wrong. Any suggestions?
To simplify your implementation, use async/await and Promises instead of callbacks.
Use a package like got or axios to make promise-based requests.
Use p-map or a similar approach from promise-fun.
Here is a copy-pasted example:
const pMap = require('p-map');

const urls = [
  'sindresorhus.com',
  'ava.li',
  'github.com',
  …
];

console.log(urls.length);
//=> 100

const mapper = url => {
  return fetchStats(url); //=> Promise
};

pMap(urls, mapper, {concurrency: 5}).then(result => {
  console.log(result);
  //=> [{url: 'sindresorhus.com', stats: {…}}, …]
});
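For the scraping case here, a rough sketch that combines got with p-map (scrapeAll is a made-up name, and depending on your package versions you may need import instead of require):

const pMap = require('p-map');
const got = require('got');

// fetch every URL with at most 5 requests in flight at any moment
async function scrapeAll(urls) {
  const mapper = async url => {
    try {
      const response = await got(url);
      return { url, status: response.statusCode };
    } catch (err) {
      return { url, error: err.message };
    }
  };

  return pMap(urls, mapper, { concurrency: 5 });
}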

Mongoose js batch find

I'm using mongoose 3.8. I need to fetch 100 documents, execute the callback function then fetch next 100 documents and do the same thing.
I thought .batchSize() would do the same thing, but I'm getting all the data at once.
Do I have to use limit or offset? If yes, can someone give a proper example of how to do it?
If it can be done with batchSize, why is it not working for me?
MySchema.find({}).batchSize(20).exec(function (err, docs) {
  console.log(docs.length);
});
I thought it would print 20 each time, but it's printing the whole count.
This link has the information you need.
You can do this:
var pagesize = 100;
MySchema.find().skip(pagesize * (n - 1)).limit(pagesize);
where n is the parameter you receive in the request, which is the page number the client wants to receive.
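If you want to walk through all the pages yourself rather than receive n from the client, a rough sketch (processBatch stands in for whatever you do with each batch):

var pagesize = 100;

function fetchPage(n) {
  MySchema.find({})
    .skip(pagesize * (n - 1))
    .limit(pagesize)
    .exec(function (err, docs) {
      if (err || docs.length === 0) return; // stop on error or when no documents are left
      processBatch(docs); // your per-batch callback
      fetchPage(n + 1);   // fetch the next page only after this one has been handled
    });
}

fetchPage(1);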
The docs say:
In most cases, modifying the batch size will not affect the user or the application, as the mongo shell and most drivers return results as if MongoDB returned a single batch.
You may want to take a look at streams and perhaps try to accumulate subresults:
var stream = Dummy.find({}).stream();

stream.on('data', function (dummy) {
  callback(dummy);
});
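For the "100 documents at a time" requirement, a rough sketch of accumulating stream results into batches (callback stands for whatever processing you need per batch):

var batch = [];
var stream = MySchema.find({}).stream();

stream.on('data', function (doc) {
  batch.push(doc);
  if (batch.length === 100) {
    callback(batch); // process a full batch of 100 documents
    batch = [];
  }
});

stream.on('error', function (err) {
  console.error(err);
});

stream.on('end', function () {
  if (batch.length > 0) {
    callback(batch); // process whatever is left over
  }
});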

Is there any risk to read/write the same file content from different 'sessions' in Node JS?

I'm new to Node.js and I wonder whether the snippets of code below have a multi-session problem.
Consider that I have a Node.js server (Express) and I listen for some POST request:
app.post('/sync/:method', onPostRequest);

function onPostRequest(req, res) {
  // parse request and fetch email list
  var emails = [....]; // pseudocode
  doJob(emails);
  res.status(200).end('OK');
}

function doJob(_emails) {
  try {
    var emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
    if (_.isString(emailsFromFile)) {
      emailsFromFile = JSON.parse(emailsFromFile);
    }
    _emails.forEach(function (_email) {
      if (!emailsFromFile[_email]) {
        emailsFromFile[_email] = 0;
      } else {
        emailsFromFile[_email] += 1;
      }
    });
    // write the updated object back
    fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
  } catch (e) {
    console.error(e);
  }
}
So the doJob method receives the _emails list, and I update (counter + 1) these emails in the emailsFromFile object loaded from the file.
Suppose I get 2 requests at the same time and doJob is triggered twice. I'm afraid that while one request has loaded emailsFromFile from the file, the second request might change the file content.
Can anybody shed some light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then that thread of execution will return back to the event loop at the point you call fs.readFile(), and another request can run while it is reading the file. If that were the case, then you could end up with two requests conflicting over the same file. In that case, you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O, and I can have conflicts with that code from multiple requests. I solved it by setting a flag any time I'm writing to a specific file; any other request that wants to write to that file first checks that flag, and if it is set, the request goes into my own queue and is served when the prior request finishes its write operation. There are many other ways to solve this too. If it happens in a lot of places, then it's probably worth just getting a database that offers features for this type of write contention.
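As an illustration only (not the poster's actual code), a minimal sketch of that idea using a promise chain as the queue; updateEmailFile and the file format are assumptions:

const fs = require('fs').promises;

let writeQueue = Promise.resolve(); // tail of the queue

function updateEmailFile(filePath, emails) {
  // chain every update onto the previous one, so only one
  // read-modify-write cycle touches the file at a time
  writeQueue = writeQueue.then(async () => {
    let counts = {};
    try {
      counts = JSON.parse(await fs.readFile(filePath, 'utf8'));
    } catch (e) {
      // missing or unparsable file: start from an empty object
    }
    for (const email of emails) {
      counts[email] = (counts[email] || 0) + 1;
    }
    await fs.writeFile(filePath, JSON.stringify(counts));
  }).catch(err => {
    console.error('email file update failed', err); // keep the queue usable after a failure
  });
  return writeQueue;
}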

Nodejs: Do additional stuff after res.send

I'm using Node as a web server and I want to log every request to it in a database. I also want the user to receive the response as quickly as possible, so I came up with this code:
// ... putting together the response_data
res.send(response_data);
// ... now log the request into the DB and maybe do additional stuff
It works, and I like the idea of putting some of the (time-)expensive stuff after the send. But as I'm new to Node, I'm asking whether this is a common pattern.
On Stack Overflow I only find people having problems because they try to send additional data after res.send, but I've never heard anybody say "yeah, this is a great feature for your responsiveness", so I'm not sure whether there's a major flaw with this solution that I just don't see yet...
As long as you don't need to send anything back to the user as a result of the "additional" stuff, your approach is fine.
The problem most people come across is trying to send data down the response after the response has already been sent e.g.
res.send(response_data);
// do additional stuff
res.send(additional_data); // KABOOM!
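By contrast, doing non-response work after res.send is fine as long as you never touch res again. A rough sketch, where buildResponse and logRequestToDb are hypothetical helpers:

app.post('/things', async (req, res) => {
  const response_data = buildResponse(req); // hypothetical: assemble the reply
  res.send(response_data);                  // the user gets the response right away

  try {
    await logRequestToDb(req);              // hypothetical: slow DB logging
  } catch (err) {
    console.error('post-response logging failed', err);
    // do NOT touch `res` here; the response has already been sent
  }
});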
