I'm working with an Express app which is deployed in an EC2 container.
This app receives requests from an AWS Lambda function with some data to drive a web scraping service (it is deployed on EC2 because deploying it in AWS Lambda is difficult).
The problem is that I need to implement a queue service in the Express app to avoid opening more than X browsers at the same time, depending on the instance size.
How can I implement a queue service that waits for one web scraper request to finish before launching another one?
Time is not a problem because it is a scheduled task that runs in the early morning.
A simple in-memory queue would not be enough, as you would not want requests to be lost in case there is a crash.
If you are OK with losing requests on an app crash, or think a crash is very unlikely, then Node modules like the one below could be handy.
https://github.com/mcollina/fastq
If reliability is important, then Amazon SQS should be good to go.
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/sqs-examples-send-receive-messages.html
Queue the work in the request handler. Have a simple timeout-based handler that listens to the queue and performs the tasks.
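To illustrate the in-memory option, here is a minimal sketch of a queue that never runs more than a fixed number of tasks at once. It is a hand-rolled stand-in, not the fastq API; `createQueue`, `scrape` and `MAX_BROWSERS` are hypothetical names, and `scrape` only simulates a browser session with a timer. Queued jobs are lost if the process crashes, which is the trade-off mentioned above.

```javascript
// Minimal in-memory queue: runs at most `concurrency` tasks at the same time.
// Hypothetical helper (not fastq); pending jobs are lost if the process crashes.
function createQueue(worker, concurrency) {
  const pending = [];
  let running = 0;

  // Start queued tasks until the concurrency limit is reached.
  function next() {
    while (running < concurrency && pending.length > 0) {
      const { task, resolve, reject } = pending.shift();
      running += 1;
      Promise.resolve()
        .then(() => worker(task))
        .then(resolve, reject)
        .finally(() => {
          running -= 1;
          next();
        });
    }
  }

  return {
    // Returns a promise that settles when the task has been processed.
    push(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject });
        next();
      });
    },
  };
}

// Stand-in for launching a browser and scraping; replace with real code.
let openBrowsers = 0;
let maxOpenBrowsers = 0;
async function scrape(url) {
  openBrowsers += 1;
  maxOpenBrowsers = Math.max(maxOpenBrowsers, openBrowsers);
  await new Promise((r) => setTimeout(r, 20)); // simulated scraping time
  openBrowsers -= 1;
  return `scraped ${url}`;
}

const MAX_BROWSERS = 2; // assumption: pick a value that fits the instance size
const scrapeQueue = createQueue(scrape, MAX_BROWSERS);

// In an Express handler you would call scrapeQueue.push(...) and respond
// immediately (for example with 202) instead of awaiting the result.
const results = Promise.all(
  ['a', 'b', 'c', 'd', 'e'].map((u) => scrapeQueue.push(u))
);
```

With five jobs pushed at once, only two simulated browsers are ever open together; the rest wait their turn in the `pending` array.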
Related
I have deployed a Node.js web application on App Service in Azure. The issue is that my application is occasionally getting killed for an unknown reason. I have done an exhaustive search through all the log files using Kudu.
If I restart the app service, the application starts working again.
Is there any way I can restart my Node application once it has crashed? Kind of run forever no matter what. For example, if any error happens in ASP.NET code deployed in IIS, IIS never crashes; it keeps on serving other incoming requests.
Something like using forever/pm2 in Azure App Service.
node.js in Azure App Services is powered by IISNode, which takes care of everything you described, including monitoring your process for failures and restarting it.
Consider the following POC:
var http = require('http');
http.createServer(function (req, res) {
  if (req.url == '/bad') {
    throw 'bad';
  }
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('bye');
}).listen(process.env.PORT || 1337);
If I host this in a Web App and issue the following sequence of requests:
GET /
GET /bad
GET /
Then the first will yield HTTP 200, the second will throw on the server and yield HTTP 500, and the third will yield HTTP 200 without me having to do anything. IISNode will just detect the crash and restart the process.
So you shouldn't need PM2 or a similar solution, because this is built into App Services. However, if you really want to, they now have App Services Preview on Linux, which is powered by PM2 and lets you configure PM2. More on this here. But again, you get this out of the box already.
Another thing to consider is the Always On setting, which is off by default:
By default, web apps are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the app loaded all the time. If your app runs continuous web jobs, you should enable Always On, or the web jobs may not run reliably.
This is another possible root cause for your issue, and the solution is to enable Always On for your Web App (see the link above).
I really want to thank itaysk for the support on this issue.
The issue was not what I was suspecting. Actually, the Node server was being restarted correctly on failure.
My website was becoming unresponsive for a different reason. Here is what was happening:
We used rethinkdbdash to connect to the RethinkDB database, with its connection pool. There was a coding/design issue: we have around 15 change feeds implemented along with socket.io, and the change feeds were being initialised for every logged-in user. This kept increasing the number of active connections in the pool. rethinkdbdash has a default limit of 1000 connections in the pool, and with so many live connections the pool was exhausted. Requests then waited for a free connection that never became available, so they waited forever and blocked any new requests from being served.
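One possible way to avoid a feed per user (an assumption, the post does not say how it was actually fixed) is to open each change feed once and fan its events out to every connected user. The sketch below uses a plain `EventEmitter` as a stand-in for a real change feed; `openChangeFeed`, `sharedFeed` and `subscribeUser` are hypothetical names and no rethinkdbdash calls are shown.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for opening one change feed on a pooled connection.
// In a real app this would be the only place that touches the database.
let openFeeds = 0;
function openChangeFeed(table) {
  openFeeds += 1;
  const feed = new EventEmitter();
  feed.setMaxListeners(0); // many users may listen to the same feed
  return feed;
}

const feeds = new Map();

// Return the single shared feed for a table, opening it only on first use.
function sharedFeed(table) {
  if (!feeds.has(table)) feeds.set(table, openChangeFeed(table));
  return feeds.get(table);
}

// Called once per logged-in user, e.g. when a socket.io client connects.
// Returns a function to call when that user disconnects.
function subscribeUser(table, onChange) {
  const feed = sharedFeed(table);
  feed.on('change', onChange);
  return () => feed.off('change', onChange);
}

// Usage: three users subscribe to the same table, but only one feed is opened.
const seen = [];
const unsubscribers = ['u1', 'u2', 'u3'].map((user) =>
  subscribeUser('orders', (change) => seen.push(`${user}:${change.id}`))
);
sharedFeed('orders').emit('change', { id: 7 }); // simulated database change
unsubscribers.forEach((off) => off());
```

The number of open feeds then depends on the number of tables being watched, not on the number of logged-in users.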
I have created a Nodejs + Koa application, that contains my Koa website and an API that receives requests from Angular.js that runs on the Koa website.
I will use the AWS SQS service to push messages from the application. These messages will be handled by an AWS Lambda function. When the Lambda function completes the work, it will push a message to another SQS queue. The Nodejs application will be polling that SQS queue for messages and when there is a message, it will send a status report to the user.
I have read the SQS documentation and it says that it is not recommended to use long polling in single-threaded applications, because it will block the thread.
I was wondering if it is a good idea to use short polling at a 5 - 10 second interval (maybe less)? Is there a chance that this will significantly slow the website's performance? Are there best practices for this?
Although I would recommend separating the reporting functionality into a different process (it keeps the concerns separate), I do not think even long polling will adversely affect your application's performance.
What the SQS documentation says about single-threaded applications is true, but it does not apply to an application built on Node.js. When you use the SQS receive message API with long polling, the wait happens on the server and the client API is asynchronous.
Node.js uses an event loop, so other processing can continue while messages are being retrieved. Only when the messages arrive at the client is the callback invoked, and your process is busy only while that callback runs.
Unless your message processing is time-consuming, I don't think overall performance will be adversely impacted.
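A small sketch of why the wait does not block Node.js. `receiveMessages` is a hypothetical stand-in for a long-poll receive call (a timer plays the role of the server-side wait), and the interval simulates the web server doing other work while the poll is pending.

```javascript
// Hypothetical stand-in for an SQS long-poll receive call: the wait happens
// elsewhere (here, a timer) and the promise resolves with the messages.
function receiveMessages(waitMs) {
  return new Promise((resolve) =>
    setTimeout(() => resolve([{ body: 'job finished' }]), waitMs)
  );
}

let ticks = 0;
const received = [];

// Poll loop: every `await` hands control back to the event loop,
// so other callbacks keep running while the poll is outstanding.
async function pollLoop(iterations) {
  for (let i = 0; i < iterations; i += 1) {
    const messages = await receiveMessages(50);
    for (const m of messages) {
      received.push(m.body); // keep this handler short; it does occupy the thread
    }
  }
}

// Simulates the web server handling other requests in the meantime.
const ticker = setInterval(() => { ticks += 1; }, 5);

const done = pollLoop(2).then(() => clearInterval(ticker));
```

The `ticks` counter keeps increasing during both polls, which is the point: only the short message handler competes with request handling, not the wait itself.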
I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails that out to a list of recipients. My initial reason for using Node was because I have an existing front end already built using angular and I wanted to be able to reuse code in order to simplify maintenance. My main idea was to have 4+ individual node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual pdf generation would be somewhat CPU intensive, so this would be a cluster of n apps that would retrieve and run individual reports from the queue.
When done making the pdf, they would pass to a final email app that would send the file out.
My main concerns are communication between apps. At the moment I am setting up the 3 lower levels (ie. all but the scheduler) on separate ports with express, and opening http requests to them when needed. Is there a better way to handle this? Would the basic 'net' work better than the 'http' package? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to specifically listen to a path for put requests and to parse the incoming json. I was led to asking here because in tracking logs so far I see every so often the http request is reset, which doesn't appear to affect the data received on the child process, but I still like to avoid errors in my coding.
I think that this kind of decoupling could leverage some sort of stateful priority queue with features like retry on failure, clustering, ...
I've used Kue.js in the past with great success; it's Redis-backed and has nice documentation and a nice interface: http://automattic.github.io/kue/
I have a four layer application:
HTML UI layer making AJAX calls to a UI Service (Web APIController).
UI Service which is a Web API controller. UI Service calls App Service.
App Service layer which has methods that call database directly through EF as well as make calls to other Domain services (Web APIs).
Domain Service which are Web APIs too.
The first, second and third layers are hosted on one machine. The Domain Service is hosted on a different machine. The SQL Server is hosted on a different server.
My questions are:
How can I differentiate between CPU bound and IO bound calls? Are the calls made from the UI Service to the App Service, CPU bound because they exist in the same app domain?
Are the calls from the App Service to the Domain Service IO Bound because the calls go through network? Is it the case with calls made from the App Service to the DB also?
Should I make all the methods Task-based with async/await to take advantage of the scalability? What I mean by scalability is that the IIS where the HTML UI layer, UI Service and App Service are hosted can process more requests.
What will happen under heavy traffic on the website if I don't have an async APIController? Will some users get a 404 because IIS can't handle that many requests?
What will happen under heavy traffic on the website if I have an async APIController? Will all the users see the UI, although a little bit late, because IIS can handle all the requests but they are all queued?
A call over the network is IO bound to the caller. What kind of boundedness is present at the callee depends on the implementation.
Yes.
"can process more requests": if the number of requests that you process concurrently (at the same point in time) exceeds 100, you might start to see benefits from going async. Before that point the throughput benefit is negative (more CPU load) and the productivity costs are non-trivial.
Requests queue up and more and more threads spawn. This can lead to death. The situations under which you can get into this problem are limited. Chances are you don't have 100 concurrent requests going because that would likely overload your servers by 10x. The prime case for async on the server is slow backend services (like web services or Azure stuff).
Only if the app can handle the load will all responses arrive. That is pretty logical. Async only gets you more throughput if the thread-pool (if properly configured) was not able to process all outstanding work. This is almost never the case.
For a discussion of when it is good to use async, see my previous posts: https://stackoverflow.com/a/25087273/122718 https://stackoverflow.com/a/12796711/122718 Does async calling of web service make sense on server?
I am using Sails.js (a Node.js framework) and running it on Heroku and locally.
The API function reads queries from an external file and performs long computations on them that might take hours.
My concern is that after a few minutes the request returns with a timeout.
I have 2 questions:
How do I control the HTTP request / response timeout (what do I really need to control here?)
Is an HTTP request considered best practice for this purpose, or should I use Socket.IO? (Well, I have no experience with Socket.IO and am not sure if I am not talking bullshit.)
You should use the worker pattern to accomplish any work that would take more than a second or so:
"Web servers should focus on serving users as quickly as possible. Any non-trivial work that could slow down your user’s experience should be done asynchronously outside of the web process."
"The Flow
Web and worker processes connect to the same message queue.
A process adds a job to the queue and gets a url.
A worker process receives and starts the job from the queue.
The client can poll the provided url for updates.
On completion, the worker stores results in a database."
https://devcenter.heroku.com/articles/asynchronous-web-worker-model-using-rabbitmq-in-node
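The flow quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: `jobQueue` and `jobStore` are in-memory stand-ins for a real message queue (such as RabbitMQ) and a database, the `/jobs/<id>` URL is a hypothetical route, and `submitJob`, `getJob` and `workOnce` are made-up names rather than anything from Sails or Heroku.

```javascript
const crypto = require('crypto');

// In-memory stand-ins (assumptions): a real setup would use a message broker
// for `jobQueue` and a database for `jobStore`, shared by web and worker.
const jobQueue = [];
const jobStore = new Map();

// Web process: enqueue the job and return a status URL right away.
function submitJob(payload) {
  const id = crypto.randomUUID();
  jobStore.set(id, { status: 'queued', result: null });
  jobQueue.push({ id, payload });
  return `/jobs/${id}`; // hypothetical route that the client polls
}

// Web process: what the handler behind the status URL would return.
function getJob(url) {
  const id = url.split('/').pop();
  return jobStore.get(id) || null;
}

// Worker process: take one job, run the long computation, store the result.
async function workOnce(compute) {
  const job = jobQueue.shift();
  if (!job) return false; // nothing to do
  jobStore.set(job.id, { status: 'running', result: null });
  const result = await compute(job.payload);
  jobStore.set(job.id, { status: 'done', result });
  return true;
}

// Usage: the request returns immediately with a URL; the work happens later.
const statusUrl = submitJob({ n: 21 });
const statusBefore = getJob(statusUrl).status;
const finished = workOnce(async (payload) => payload.n * 2);
```

The HTTP request that submits the job never waits for the computation, so the request timeout stops being the limiting factor; the client just polls the status URL until it reports `done`.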