Should AWS SQS message polling be used in a Node.js + Koa application?

I have created a Node.js + Koa application that contains my Koa website and an API that receives requests from the Angular.js app running on the Koa website.
I will use AWS SQS to push messages from the application. These messages will be handled by an AWS Lambda function. When the Lambda function completes its work, it will push a message to another SQS queue. The Node.js application will poll that SQS queue for messages, and when there is a message, it will send a status report to the user.
I have read the SQS documentation, and it says that long polling is not recommended in single-threaded applications, because it will block the thread.
I was wondering whether it is a good idea to use short polling at a 5–10 second interval (maybe less)? Is there a chance that this will significantly slow down the website? Are there best practices for this?

I don't think even long polling will adversely affect your application's performance, although I would recommend separating the reporting functionality into a different process (it keeps the concerns separate).
What SQS says about single-threaded applications is true in general, but it does not apply to an application built on Node.js. When you use the SQS receive-message API with long polling, the wait happens on the server, and the client API is async.
Node.js leverages the event-loop mechanism, so other processing can continue while messages are being retrieved. Only when messages are received on the client is your callback invoked, and only then is your process busy.
Unless your message handling itself is time-consuming, I don't think overall performance will be adversely impacted.
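As a hedged sketch of that pattern: the loop below has the SQS calls injected as plain functions, so the polling logic itself is ordinary JavaScript. `runPoller` and its parameter names are my own, and the commented wiring assumes the AWS SDK v3 `@aws-sdk/client-sqs` client.

```javascript
// Assumed real wiring (AWS SDK v3):
//   const { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } =
//     require('@aws-sdk/client-sqs');
//   const sqs = new SQSClient({ region: 'us-east-1' });
//   runPoller({
//     receive: () => sqs.send(new ReceiveMessageCommand({
//       QueueUrl: queueUrl, WaitTimeSeconds: 20, MaxNumberOfMessages: 10,
//     })),
//     remove: (handle) => sqs.send(new DeleteMessageCommand({
//       QueueUrl: queueUrl, ReceiptHandle: handle,
//     })),
//     onMessage: async (body) => sendStatusReport(body), // your reporting logic
//   });

async function runPoller({ receive, remove, onMessage, keepGoing = () => true }) {
  while (keepGoing()) {
    // With WaitTimeSeconds set, the up-to-20s wait happens on the SQS server;
    // this await just parks a promise, leaving the event loop free to serve
    // the Koa site and API in the meantime.
    const { Messages = [] } = await receive();
    for (const msg of Messages) {
      await onMessage(JSON.parse(msg.Body));
      await remove(msg.ReceiptHandle); // delete only after successful handling
    }
  }
}
```

Because the wait is server-side, running this loop alongside the web server does not block other requests; only the `onMessage` work itself occupies the event loop.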

Related

How can I implement a queue in an Express app?

I'm working with an Express app deployed in an EC2 container.
The app gets requests from an AWS Lambda with some data to handle a web-scraping service (it is deployed on EC2 because deploying it on AWS Lambda is difficult).
The problem is that I need to implement a queue in the Express app to avoid opening more than X browsers at the same time, depending on the instance size.
How can I implement a queue that waits for one web-scraper request to finish before launching another?
Time is not a problem, because this is a scheduled task that executes in the early morning.
A simple in-memory queue would not be enough if you don't want requests to be lost in case of a crash.
If you are OK with losing queued work on an app crash, or think the probability is very low, then a node module like fastq could be handy:
https://github.com/mcollina/fastq
If reliability is important, then Amazon SQS is a good fit:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/sqs-examples-send-receive-messages.html
Queue the work in the request handler, and have a simple timer-based handler that listens to the queue and performs the tasks.
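To make the in-memory option concrete, here is a minimal concurrency limiter in plain JavaScript; it is the same idea fastq implements with more features (retries, drain events, etc.). `MAX_BROWSERS` and the worker body are placeholders for your scraper.

```javascript
// Minimal in-memory queue that runs at most `concurrency` tasks at once.
const MAX_BROWSERS = 2; // placeholder: whatever your instance size allows

function createQueue(worker, concurrency) {
  const tasks = [];
  let running = 0;

  function next() {
    // Start queued tasks while we are under the concurrency limit.
    while (running < concurrency && tasks.length > 0) {
      const { task, resolve, reject } = tasks.shift();
      running++;
      Promise.resolve(worker(task))
        .then(resolve, reject)
        .finally(() => { running--; next(); }); // free a slot, pull the next task
    }
  }

  return {
    push(task) {
      // Resolves when this particular task has been processed.
      return new Promise((resolve, reject) => {
        tasks.push({ task, resolve, reject });
        next();
      });
    },
  };
}

// Example worker: pretend each task launches a headless browser and scrapes.
const queue = createQueue(async (url) => {
  await new Promise((r) => setTimeout(r, 10)); // stand-in for the scrape
  return `scraped ${url}`;
}, MAX_BROWSERS);
```

In the request handler you would then call `queue.push(url)` and respond when the returned promise settles. Remember the caveat above: this queue lives in process memory, so anything queued is lost if the app crashes.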

Google Pub-Sub two way communication architecture

I'm trying to understand how to do two-way communication with Google Pub/Sub with the following architecture.
EDIT: I meant to say subscribers instead of consumers
I'm trying to support the following workflow:
The UI sends a request to an API service to kick off an async process.
The API service publishes the request to a topic to begin the process.
A subscriber picks up the message and runs the async process.
Once the async process is done, it publishes to a process-complete topic.
Here is where I want the UI to pick up the process-complete message, and I'm trying to figure out the best approach.
So two questions:
Are multiple topics the preferred approach when you want two-way communication back to the client? Or is there a way to do this with a single topic with multiple subscriptions?
How should the consumer of the process-complete topic get the response back to the UI? Should the UI be the subscriber on that subscription? Or should I send it back to the API service and publish a websocket message? Both approaches seem to have tradeoffs.
Multiple topics are going to be preferred in this situation, one for messages going to the asynchronous processors and then one for the responses that go back. Otherwise, your asynchronous processors are going to needlessly receive the response messages and have to ack them immediately, which is unnecessary extra delivery of messages.
With regard to getting the response back to the UI, the UI should not be the consumer of the subscription. In order to do that, you'd need every running instance of the UI to have its own subscription because otherwise, they would load balance messages across them and you couldn't guarantee that the particular client that sent the request would actually receive the response. The same would be true if you have multiple API servers that need to receive particular responses based on the requests that transmitted through them. Cloud Pub/Sub isn't really designed for topics and subscriptions to be ephemeral in this way; it is best when these are created once and all of the data is transmitted across them.
Additionally, having the UI act as a subscriber means that you'd have to have the credentials in the UI to subscribe, which could be a security issue.
You might also consider not using a topic for the asynchronous response. Instead, you could encode as part of the message the address or socket of the client or API server that expects the response. Then, the asynchronous processor could receive a message, process it, send a response to the address specified in the message, and then ack the message it received. This would ensure responses are routed to where they need to go and minimize the delivery of messages that subscribers just ack that they don't need to process, e.g., messages that were intended for a different API server.
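The reply-address idea in the last paragraph can be sketched as plain message-envelope logic, independent of the Pub/Sub client itself. The names here (`replyTo`, `jobId`, `sendResponse`) are illustrative, not a Pub/Sub convention.

```javascript
// The API server builds the request, embedding its own address so the
// asynchronous processor knows where to send the result.
function buildRequest(jobId, payload, replyTo) {
  return JSON.stringify({ jobId, payload, replyTo });
}

// The asynchronous processor: do the work, then route the response directly
// to the address from the message instead of a shared response topic.
function handleRequest(raw, sendResponse) {
  const { jobId, payload, replyTo } = JSON.parse(raw);
  const result = payload.toUpperCase(); // stand-in for the real async work
  sendResponse(replyTo, { jobId, result }); // only then ack the Pub/Sub message
  return result;
}
```

With this shape, no API server ever receives (and has to ack) a response that was meant for a different instance; the `replyTo` field does the routing.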

How does Node.js process incoming requests?

I have been studying and practicing with Node.js for a few weeks now.
I understand that it is a single-threaded, event-based JavaScript runtime environment.
It uses an event loop to process JavaScript statements and any incoming I/O requests from clients.
But I am having a hard time understanding what happens when a request comes to Node.js from an external client such as a React app or Postman: how it reaches the event loop, and in which phase of the event loop it gets processed. So far I have read many articles that repeat the same things about the various phases, but I am not convinced that my understanding of how a request is handled by Node.js is correct.
So here is my understanding:
Node.js listens on a port for incoming requests.
When a client sends a request to that port, the request is picked up by Node.js.
It then goes to the event queue and waits there until it gets picked up by the event loop in the poll phase.
I also have a couple of doubts about my understanding:
I know that we set up various routes in our Node app that trigger certain logic when a request comes in, but at what point/phase does Node.js do this route matching for incoming requests?
What happens after the event loop picks up a request from the event queue? How does it get processed by the V8 engine?

Node.js response to API request after data processing is done

In my current project, my Node.js/Express app receives an HTTP request through a route.
Once received, Node uses NightmareJS to perform web scraping and subsequently executes a Python script that further processes the data.
Lastly, it updates this data in MongoDB.
Everything takes about 5 minutes.
What I am trying to achieve is for my front-end to receive an acknowledgement that the request was accepted, but also to receive an update when the above process is completed and the database is updated.
I have looked into using long polling or socket.io, but I don't know which one I should use or how. Or should I use RabbitMQ instead, putting a completion message into a queue while my front-end constantly queries that queue?
1. Long polling and socket.io are similar; socket.io has a long-polling fallback if WebSockets are not supported.
2. RabbitMQ is quite different: you cannot speak the RabbitMQ protocol from a browser, so you would need a client app, not a web page.
3. socket.io is excellent and goes well with Express. Still, there are other options: SSE (server-sent events), Firebase. You need to try them before you choose one; they are not that hard if you follow their official guides.
4. Some of my open-source projects might help:
https://github.com/postor/sse-notify-suite
https://github.com/postor/node-realtime-db
Benefits of each solution:
Ajax + server cache: simple
Long polling: low latency
SSE: low latency, event-based
socket.io: low latency, event-based, high throughput, bidirectional, long-polling fallback

node js on heroku - request timeout issue

I am using Sails.js (a Node.js framework) and running it on Heroku and locally.
The API function reads queries from an external file and performs long computations on them that might take hours.
My concern is that after a few minutes the request returns with a timeout.
I have 2 questions:
How do I control the HTTP request/response timeout (what do I actually need to control here)?
Is an HTTP request considered best practice for this use case, or should I use Socket.IO? (I have no experience with Socket.IO and am not sure whether I'm talking nonsense.)
You should use the worker pattern to accomplish any work that would take more than a second or so:
"Web servers should focus on serving users as quickly as possible. Any non-trivial work that could slow down your user’s experience should be done asynchronously outside of the web process."
"The Flow
Web and worker processes connect to the same message queue.
A process adds a job to the queue and gets a url.
A worker process receives and starts the job from the queue.
The client can poll the provided url for updates.
On completion, the worker stores results in a database."
https://devcenter.heroku.com/articles/asynchronous-web-worker-model-using-rabbitmq-in-node
