Send an SMS queue through an API using Node.js, Express, Mongoose

I have 500,000 records in a single collection (Node.js, Express, Mongoose).
We need to send an SMS to each one of them; we have a direct API URL.
Now we need to send these one by one in the background and update the status in the DB as sent.
There is one constraint: we can send only 60 SMS per minute.
How do I schedule this automatically in the background?
How do I send this? Is it possible via crontab? Any references?

The most basic solution is to use a timer function like setTimeout or setInterval: run it every minute, fetch a batch of unsent messages from the database, and send them while marking them as sent. There are a few caveats:
Make sure the sending function finishes before being executed again, in case sending the batch of messages takes more than 1 minute. Calling setTimeout at the end of the function is a safer alternative to setInterval.
An active timer will prevent your application from exiting, so make sure to run clearTimeout in an exit handler.
A periodic task can block your event loop, so it may be better to either run message sending in a separate worker thread or as a completely separate process (i.e. a script executed separately with Node).
A possibly more robust option is to use a job scheduler like bree or a task queue like bull (requires Redis).
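
A minimal sketch of the setTimeout approach; the Sms model, its status field, and the SMS_API_URL environment variable are assumptions to adapt to your schema and provider:

    const axios = require('axios');
    const Sms = require('./models/sms'); // hypothetical Mongoose model

    const BATCH_SIZE = 60; // the 60-SMS-per-minute limit

    async function sendBatch() {
      const batch = await Sms.find({ status: 'unsent' }).limit(BATCH_SIZE);
      for (const msg of batch) {
        try {
          await axios.post(process.env.SMS_API_URL, { to: msg.phone, text: msg.text });
          msg.status = 'sent';
          await msg.save();
        } catch (err) {
          console.error('failed to send to', msg.phone, err.message);
        }
      }
    }

    let timer;
    async function loop() {
      await sendBatch();                   // finish the batch first...
      timer = setTimeout(loop, 60 * 1000); // ...then schedule the next run
    }
    loop();

    process.on('SIGINT', () => {
      clearTimeout(timer); // clear the active timer so the process can exit
      process.exit(0);
    });

Because the next run is scheduled only after the batch completes, a slow batch can never overlap with the next one.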

Related

Node.js API that runs a script executing continuously in the background

I need to build a Node.js API that, for each different user that calls it, starts running some piece of code (a simple script that sets up a Telegram client, listens for new messages, and performs a couple of tasks) that would then continuously run in the background.
My ideas so far have been a) launching a new child process for each API call and b) automatically deploying the script to the cloud for each call.
I assume the first idea wouldn't be scalable; as for the second, I have no experience in the matter.
I've searched a dozen keywords and haven't found anything relevant so far. Is there any handy way to implement this? In which direction can I search?
I look forward to any hints.
I'm not a Node dev, but as a programmer you can do something like:
When the user is active, it calls a function;
this function must count the seconds that have passed until they match 24h (86400 seconds == 24 hours) and do the tasks;
when the time matches, the program stops.
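
A tiny sketch of that idea (doTasks stands in for whatever work the user needs):

    // Run the user's tasks periodically, then stop after 24 hours.
    function onUserActive(doTasks) {
      const interval = setInterval(doTasks, 60 * 1000);        // do the tasks every minute
      setTimeout(() => clearInterval(interval), 86400 * 1000); // 86400 s == 24 h, then stop
    }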
Node.js is nothing more than an event loop (libuv) whose execution stack runs on V8 (JavaScript). The process will keep running until the event loop has no more work pending.
Keep in mind that there is only one thread executing your code (the event loop) and everything will happen as callbacks.
As long as you set up your Telegram client with some listeners, Node.js will wait for new messages and execute the related listener.
Just instantiate a new client on each API call and listen to it; no need to spawn a new process.
Anyway, you'll eventually run out of memory if you don't limit the number of parallel clients or if you don't close them after some time (e.g. using setTimeout()).
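
A rough sketch of that, assuming Express; createTelegramClient and client.close() are hypothetical stand-ins for whatever Telegram library you use:

    const express = require('express');
    const app = express();

    const MAX_CLIENTS = 100;                // cap parallel clients to avoid running out of memory
    const CLIENT_TTL = 24 * 60 * 60 * 1000; // close each client after 24 h
    const clients = new Map();              // userId -> client

    app.post('/listen/:userId', async (req, res) => {
      const { userId } = req.params;
      if (clients.has(userId)) return res.send('already listening');
      if (clients.size >= MAX_CLIENTS) return res.status(503).send('too many clients');

      const client = await createTelegramClient(userId); // hypothetical factory
      client.on('message', (msg) => {
        // perform the per-user tasks here
      });
      clients.set(userId, client);

      // close the client after a while so memory is eventually reclaimed
      setTimeout(() => {
        client.close(); // hypothetical
        clients.delete(userId);
      }, CLIENT_TTL);

      res.status(202).send('listening');
    });

    app.listen(3000);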

How to poll another server from Node.js?

I'm currently developing a Shopify app with Node/Express and a Postgres database. When a user registers an account and connects their Shopify store, I'll need to download all of their store's orders. They could have 100,000s of orders, so I'd like to use a Shopify GraphQL Bulk Operation. While Shopify is handling this, my Node server will need to poll the Shopify server to check on the progress, and when the operation is complete, Shopify will send me a link where I can download all of the data. Once the data is processed and stored in my database, I'll send the user an email to say that their account is now set up.
How should I handle polling the Shopify server? The process could take anywhere from a few minutes to hours. Using setInterval() would be a bad idea, right? Because if the server restarts for whatever reason, it will lose the interval? So, should I use some sort of background task? And would I need to store anything in my database? I've researched cron jobs, child processes, worker threads, the bull package -- and it's left me a little confused.
(I also know that I could use a webhook, but Shopify offers no guarantees that my app will receive the webhook.)
Upon installation, launch a background job labeled "GetCustomerOrders". As you know, background jobs are a mature technology and handle problems nicely. For example, they can retry themselves if something goes wrong.
The background job itself just sets up the bulk download and then settles into polling. Polling is no big deal and just happens. As you said, it could take minutes, it could take hours. Nevertheless, a poll gets the status of a bulk download, and that can even be hot-rodded: you poll with an ID, so you poll until that ID completes, regardless of restarts.
At the end of that rather simple setup, you get a URL to download and parse JSON. Spawn another job even for that. Endless fun. Why sweat it? Background jobs are the way to go.
The webhook idea is OK, but as the documentation says, webhooks are not 100% reliable. Cron is bush-league in that it misses out on the mature development of jobs in queues and is more like a simple trigger. Relying on cron to start something is fine, but it gives you zero management over what it starts.
I am guessing Node.js has a decent background job system by this time. When you look at Sidekiq for Ruby, you realize what awesome looks like. Surely you can find a copycat in Node that comes close.
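
For example, a sketch of such a job using bull (mentioned in the question); checkBulkOperation and startBulkOperation are hypothetical wrappers around Shopify's bulk-operation GraphQL calls:

    const Queue = require('bull');

    const ordersQueue = new Queue('GetCustomerOrders', 'redis://127.0.0.1:6379');
    const parseQueue = new Queue('ParseOrders', 'redis://127.0.0.1:6379');

    ordersQueue.process(async (job) => {
      const { bulkOperationId } = job.data;
      // Poll until the bulk operation with this ID completes. If the worker
      // restarts, bull re-runs the stalled job and polling simply resumes.
      for (;;) {
        const op = await checkBulkOperation(bulkOperationId); // hypothetical GraphQL call
        if (op.status === 'COMPLETED') {
          return parseQueue.add({ url: op.url }); // hand the download URL to another job
        }
        if (op.status === 'FAILED') throw new Error('bulk operation failed');
        await new Promise((resolve) => setTimeout(resolve, 60 * 1000)); // wait a minute
      }
    });

    // On installation: start the bulk operation, then enqueue the polling job
    // with retries in case something goes wrong.
    async function onInstall(shop) {
      const op = await startBulkOperation(shop); // hypothetical
      await ordersQueue.add(
        { bulkOperationId: op.id },
        { attempts: 5, backoff: 60 * 1000 }
      );
    }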

How to perform long event processing in Node JS with a message queue?

I am building an email processing pipeline in Node.js with Google Pub/Sub as a message queue. The message queue has a limitation where it needs an acknowledgment for a sent message within 10 minutes. However, the jobs it's sending to the Node.js server might take an hour to complete, so the same job might run multiple times until one of them finishes. I'm worried that this will block the Node.js event loop and slow down the server too.
Find an architecture diagram attached. My questions are:
Should I be using a message queue to start this long-running job given that the message queue expects a response in 10 minutes, or is there some other architecture I should consider?
If multiple such jobs start, should I be worried about the Node.js event loop being blocked? Each job is basically iterating through a MongoDB cursor, creating hundreds of thousands of emails.
Well, it sounds like you either should not be using that queue (with the timeout you can't change) or you should break up your jobs into pieces that easily finish long before the timeout. It sounds like a case where you just need to match the tool to the requirements of the job. If that queue doesn't match your requirements, you probably need a different mechanism. I don't fully understand what you need from Google's Pub/Sub, but creating a queue of your own or finding a generic queue on NPM is generally fairly easy if you just want to serialize access to a bunch of jobs.
I rather doubt you have Node.js event loop blockage issues as long as all your I/O uses asynchronous methods. Nothing you're doing sounds CPU-heavy, and that's what blocks the event loop (long-running CPU-heavy operations). Your whole project is probably limited by both MongoDB and whatever you're using to send the emails, so you should probably make sure you're not overwhelming either one of those to the point where they become sluggish and lose throughput.
To answer the original question:
Should I be using a message queue to start this long-running job given that the message queue expects a response in 10 minutes, or is there some other architecture I should consider?
Yes, a message queue works well for dealing with these kinds of events. The important thing is to make sure the final action is idempotent, so that even if you process duplicate events by accident, the final result is applied once. This guide from Google Cloud is a helpful resource on making your subscriber idempotent.
To get around the 10 min limit of Pub/Sub, I ended up creating an in-memory table that tracked active jobs. If a job was actively being processed and Pub/Sub sent the message again, it would do nothing. If the server restarts and loses the job, the in-memory table also disappears, so the job can be processed once again if it was incomplete.
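
A sketch of what that in-memory table can look like; runJob and the message schema are assumptions, and runJob should check the database first so the final action stays idempotent:

    const { PubSub } = require('@google-cloud/pubsub');

    const pubsub = new PubSub();
    const activeJobs = new Set(); // the in-memory table; lost on restart by design

    pubsub.subscription('email-jobs').on('message', async (message) => {
      const { jobId } = JSON.parse(message.data.toString()); // message schema assumed

      if (activeJobs.has(jobId)) {
        // Redelivery of a job that is already running: do nothing extra.
        message.ack();
        return;
      }

      activeJobs.add(jobId);
      try {
        await runJob(jobId); // hypothetical: exits early if the job already completed
        message.ack();       // acknowledge only once the work is done
      } catch (err) {
        console.error('job failed, will be redelivered', err);
        // no ack: Pub/Sub redelivers and the job is retried
      } finally {
        activeJobs.delete(jobId);
      }
    });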
If multiple such jobs start, should I be worried about the Node.js event loop being blocked? Each job is basically iterating through a MongoDB cursor, creating hundreds of thousands of emails.
I have ignored this for now as per the comment left by jfriend00. You can also rate-limit the number of jobs being processed.

Background jobs that run on every request on Heroku and node.js

I have an app that needs to run a very long process (it takes 30-60 seconds for each request). After the processing, the result is returned to the request as a response. This works fine locally, but it crashes my Heroku instance.
What I'd like to happen instead is:
User comes on site, request sent to backend
Backend returns immediately, and kicks off another process/task/job that does the processing
When the processing ends, the response is returned to the correct user.
I am not sure what I need for this. Based on an hour of research, it seems like I can use Redis as a queue and a worker can poll it every x minutes. But what I can't understand is how to figure out which request to send the response to after processing ends.
Is there a sample Express/Node.js implementation of this? Any pointers are helpful.
As you found in your research, setting up a worker queue using Redis is a good approach for long-running processes. A nice library for this is kue (https://github.com/learnboost/kue).
When it comes to responding to a request with the results of the job, having an outstanding request hang while waiting for a response is not a good way to go about it (and may not work; Heroku kills requests that have been idle for a certain period of time).
What you could do instead is start the background job when the request is made and respond right away with a job ID. The client can then poll the server for the status of the job; when the job is complete, it can fetch the result.
Kue (from mattetre's answer) is not maintained anymore. Kue's GitHub page suggests Bull as a good alternative. It is a fast and reliable Redis-based queue for Node.js.
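
A minimal sketch of the job-ID-plus-polling pattern using Bull; longRunningProcess and the route paths are assumptions:

    const express = require('express');
    const Queue = require('bull');

    const app = express();
    app.use(express.json());

    const workQueue = new Queue('long-tasks', 'redis://127.0.0.1:6379');

    // The worker: runs the 30-60 second task off the request cycle.
    workQueue.process(async (job) => longRunningProcess(job.data)); // hypothetical task

    // Kick off the job and return its ID to the client immediately.
    app.post('/tasks', async (req, res) => {
      const job = await workQueue.add(req.body);
      res.status(202).json({ jobId: job.id });
    });

    // The client polls this endpoint until the job is complete.
    app.get('/tasks/:id', async (req, res) => {
      const job = await workQueue.getJob(req.params.id);
      if (!job) return res.status(404).end();
      const state = await job.getState(); // 'waiting' | 'active' | 'completed' | 'failed' ...
      res.json({ state, result: state === 'completed' ? job.returnvalue : null });
    });

    app.listen(process.env.PORT || 3000);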

API with Work Queue Design Pattern

I am building an API that is connected to a work queue, and I'm having trouble with the structure. What I'm looking for is a design pattern for a worker queue that is interfaced via an API.
Details:
I'm using a Node.js server and Express to create an API that takes a request and returns JSON. These requests can take a long time to process (they are very data-intensive), which is why we use a queuing system (RabbitMQ).
So, for example, let's say I send a request to the API that will take 15 minutes to process. The Express API formats the request and puts it in a RabbitMQ (AMQP) queue. The next available worker takes the request off the queue and starts to process it. After it's done (in this case 15 minutes), it saves the data into MongoDB. ... now what ...
My issue is, how do I get the finished data back to the caller of the API? The caller is a completely separate program that contacts the API via something like an Ajax request.
The worker will save the processed data into a database but I have no way to push back to the original calling program.
Does anyone have any resources on APIs with work queues?
Please and thank you.
On the initiating call, you should return to the client a task identifier that will persist with the data all the way to MongoDB.
You can then provide an additional API method for the client to check the task's status. This method should take a single parameter, the task identifier, and check whether a document with that identifier has made it into your collection in MongoDB. Return false if it doesn't exist yet, true when it does.
The client will have to poll the task-status API method repeatedly (maybe at a 1-minute interval) until it returns true.
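
A minimal sketch of that pattern with Express, amqplib, and Mongoose; the queue name, Result model, and routes are assumptions:

    const express = require('express');
    const amqp = require('amqplib');
    const { randomUUID } = require('crypto');
    const Result = require('./models/result'); // hypothetical model the worker writes to

    const app = express();
    app.use(express.json());

    let channel;
    amqp.connect('amqp://localhost').then(async (conn) => {
      channel = await conn.createChannel();
      await channel.assertQueue('work');
    });

    // Enqueue the work and hand the caller a task identifier right away.
    app.post('/tasks', (req, res) => {
      const taskId = randomUUID();
      channel.sendToQueue('work', Buffer.from(JSON.stringify({ taskId, ...req.body })));
      res.status(202).json({ taskId });
    });

    // The caller polls this until done is true; the worker saves its output
    // to the Result collection under the same taskId when it finishes.
    app.get('/tasks/:taskId/status', async (req, res) => {
      const doc = await Result.findOne({ taskId: req.params.taskId });
      res.json({ done: !!doc, result: doc });
    });

    app.listen(3000);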
