Node.js API that runs a script executing continuously in the background

I need to build a Node.js API that, for each different user that calls it, starts running some piece of code (a simple script that sets up a Telegram client, listens for new messages, and performs a couple of tasks) that would then run continuously in the background.
My ideas so far have been (a) launching a new child process for each API call and (b) automatically deploying the script to the cloud for each call.
I assume the first idea wouldn't be scalable; as for the second, I have no experience in the matter.
I have searched a dozen keywords and haven't found anything relevant so far. Is there any handy way to implement this? In which direction should I search?
I look forward to any hints.

I'm not a Node developer, but as a programmer you can do something like this:
When a user becomes active, call a function.
This function counts the seconds that have passed until it reaches 24 hours (86400 seconds == 24 hours), performing the tasks in the meantime.
When the time is reached, the program stops.
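A minimal sketch of that idea in Node.js; the `doTasks` function, the per-user entry point, and the duration parameters are placeholders for illustration, not from the original question:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000; // 86400 seconds == 24 hours

let taskRuns = 0;
function doTasks(userId) {
  // placeholder for the real per-user work
  taskRuns++;
}

// Durations are parameters so the 24h lifetime is easy to adjust or test.
function startUserTask(userId, intervalMs, lifetimeMs) {
  const interval = setInterval(() => doTasks(userId), intervalMs);
  // stop the repeating task once the lifetime has elapsed
  setTimeout(() => clearInterval(interval), lifetimeMs);
}

// e.g. startUserTask('alice', 60 * 1000, DAY_MS);
```

Note that a plain timer like this only stops the repetition; any cleanup (closing clients, flushing state) would go in the `setTimeout` callback as well.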

Node.js is nothing more than an event loop (libuv) whose execution stack runs on V8 (JavaScript). The process will keep running until the stack is empty and no pending callbacks remain.
Keep in mind that there is only one thread executing your code (the event loop) and everything happens as a callback.
As long as you set up your Telegram client with some listeners, Node.js will wait for new messages and execute the related listener.
Just instantiate a new client on each API call and listen to it; there is no need to spawn a new process.
That said, you'll eventually run out of memory if you don't limit the number of parallel clients or if you don't close them after some time (e.g. using setTimeout()).

Related

Node js how to schedule jobs (non-blocking)

I want to schedule jobs to run every x seconds; how can this be done?
An example is pinging a server every 10 seconds, but all of this should be async so that none of the other functionality stops.
It depends upon what you mean by "none of the other functionality stops". Pinging a server is already non-blocking and asynchronous, as all networking in Node.js is.
So, code like this:
setInterval(() => {
    // put code to do the ping here
}, 10 * 1000); // every 10 seconds
This will not block the rest of your server, because the code to do a ping is already non-blocking.
However, the rest of your server could block the setInterval() from firing on time. If you were executing some long running blocking code in your server, then the setInterval() timer callback would not happen until that code was done and it might be delayed from exactly when it was scheduled to run. Since nodejs is single threaded and event-driven, it can't process the next timer event until the previous event (whatever that was) is done executing the blocking portion of its code.
If you wanted to make absolutely sure that your timer would always run very close to when it should run, you either have to make sure the rest of your program never blocks for very long or you will need to move the timer out into another Javascript thread or process and communicate back to the main thread via messaging.
You could use a WorkerThread within your existing nodejs process or you can use the child_process module to run a separate child program that does your pinging for you. Nodejs has built-in messages that will work from either the WorkerThread or the child_process back to or from your main program.
If the work itself is CPU-bound, you would need to use multithreading to keep it from blocking. I would suggest using the built-in Node.js multithreading module named worker_threads.
https://nodejs.org/api/worker_threads.html

How to perform long event processing in Node JS with a message queue?

I am building an email processing pipeline in Node JS with Google Pub/Sub as a message queue. The message queue has a limitation where it needs an acknowledgment for a sent message within 10 minutes. However, the jobs it's sending to the Node JS server might take an hour to complete. So the same job might run multiple times till one of them finishes. I'm worried that this will block the Node JS event loop and slow down the server too.
Find an architecture diagram attached. My questions are:
Should I be using a message queue to start this long-running job given that the message queue expects a response in 10 mins or is there some other architecture I should consider?
If multiple such jobs start, should I be worried about the Node JS event loop being blocked. Each job is basically iterating through a MongoDB cursor creating hundreds of thousands of emails.
Well, it sounds like you either should not be using that queue (with the timeout you can't change) or you should break up your jobs into something that easily finishes long before the timeouts. It sounds like a case of you just need to match the tool with the requirements of the job. If that queue doesn't match your requirements, you probably need a different mechanism. I don't fully understand what you need from Google's pub/sub, but creating a queue of your own or finding a generic queue on NPM is generally fairly easy if you just want to serialize access to a bunch of jobs.
I rather doubt you have nodejs event loop blockage issues as long as all your I/O is using asynchronous methods. Nothing you're doing sounds CPU-heavy and that's what blocks the event loop (long running CPU-heavy operations). Your whole project is probably limited by both MongoDB and whatever you're using to send the emails so you should probably make sure you're not overwhelming either one of those to the point where they become sluggish and lose throughput.
To answer the original question:
Should I be using a message queue to start this long-running job given that the message queue expects a response in 10 mins or is there some other architecture I should consider?
Yes, a message queue works well for dealing with these kinds of events. The important thing is to make sure the final action is idempotent, so that even if you process duplicate events by accident, the final result is applied once. This guide from Google Cloud is a helpful resource on making your subscriber idempotent.
To get around the 10 min limit of Pub/Sub, I ended up creating an in-memory table that tracked active jobs. If a job was actively being processed and Pub/Sub sent the message again, it would do nothing. If the server restarts and loses the job, the in-memory table also disappears, so the job can be processed once again if it was incomplete.
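A minimal sketch of that in-memory deduplication table; the message shape (`{ jobId }`) and the `processJob` helper are assumptions for illustration:

```javascript
const activeJobs = new Map(); // jobId -> start timestamp

// Stand-in for the real hour-long job; shortened for illustration.
function processJob(jobId) {
  return new Promise((resolve) => setTimeout(resolve, 50));
}

async function handleMessage(message) {
  const { jobId } = message;
  if (activeJobs.has(jobId)) {
    return 'duplicate'; // redelivery of a job already in progress
  }
  activeJobs.set(jobId, Date.now());
  try {
    await processJob(jobId);
    return 'processed';
  } finally {
    // forget the job so a genuinely new delivery can run it again
    activeJobs.delete(jobId);
  }
}
```

Because the table lives only in memory, a server restart clears it, which matches the behavior described above: an incomplete job can be picked up again on redelivery.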
If multiple such jobs start, should I be worried about the Node JS event loop being blocked. Each job is basically iterating through a MongoDB cursor creating hundreds of thousands of emails.
I have ignored this for now as per the comment left by jfriend00. You can also rate-limit the number of jobs being processed.

NodeJs App - Repeated job - Single or multiple child processes?

I am currently developing a Node.js app with a REST API that exposes data from a MongoDB database.
The application needs to update some data every 5 minutes by calling an external service (it could take more than one minute to get the new data).
I decided to isolate this task into a child_process, but I am not sure what I should put in this child process:
Only the function to be executed, with the schedule managed by the main process.
An independent process that auto-refreshes the data every 5 minutes and sends a message to the main process every time a refresh is done.
I don't really know if there is a big cost to starting a new child process every 5 minutes, or if I should use only one long-running child process, or if I am overthinking the problem ^^
EDIT - Information about the update task
The update task can take more than one minute, but it consists of many smaller tasks (gathering information from many external providers) that run asynchronously, so maybe I don't even need a child process?
Thanks !
Node.js has an event-driven architecture capable of handling asynchronous calls, so it is unlike your typical C++ program where you would go with a multi-threaded/multi-process architecture.
For your use case, you can make use of setInterval to repeatedly perform an operation, within which you can chain smaller async calls using some sort of promise framework like Bluebird.
For more information see:
setInterval: https://developer.mozilla.org/en-US/docs/Web/API/WindowTimers/setInterval
setInterval()
Repeatedly calls a function or executes a code snippet, with a fixed time delay between each call. Returns an intervalID.
Sample code:
const FIVE_MINUTES_MS = 5 * 60 * 1000;
setInterval(function() {
    console.log("I was executed");
}, FIVE_MINUTES_MS);
Promises:
http://bluebirdjs.com/docs/features.html
Sample code:
new Promise(function(resolve, reject) {
    updateExternalService(data)
        .then(function(response) {
            return parseExtResp(response);
        })
        .then(function(parsedResp) {
            return refreshData(parsedResp);
        })
        .then(function(returnCode) {
            console.log("yay, updated external data source and refreshed");
            return resolve();
        })
        .catch(function(error) {
            // Handle error
            console.log("oops, something went wrong -> " + error.message);
            return reject(error);
        });
});
The total clock time it takes to get data from an external service does not matter, as long as you are using asynchronous requests. What matters is how much CPU you are using in doing so. If the majority of the time is spent waiting for the external service to respond or to send the data, then your node.js server is just sitting idle most of the time and you probably do not need a child process.
Because node.js is asynchronous, it can happily have many open requests that are "in flight" that it is waiting for responses to and that takes very little system resources.
Because node.js is single threaded, it is CPU usage that typically drives the need for a child process. If it takes 5 minutes to get a response from an external service, but only 50ms of actual CPU time to process that request and do something with it, then you probably don't need a child process.
If it were me, I would separate out the code for communicating with the external service into a module of its own, but I would not add the complexity of a child process until you actually have some data that such a change is needed.
I don't really know if there is a big cost to start a new child process every 5 minutes or if I should use only one long time running child process or if I am overthinking the problem
There is definitely some cost to starting up a new child process. It's not huge, but if you're going to be doing it every 5 minutes and it doesn't take a huge amount of memory, then it's probably better to just start up the child process once, have it manage the scheduling of communicating with the external service entirely on its own, and then have it communicate back results to your other node.js process as needed. This makes the second Node process much more self-contained, and the only point of interaction between the two processes is to communicate an update. This separation of function and responsibility is generally considered a good thing. In a multi-developer project, you could more easily have different developers working on each app.
It depends on how much cohesion there is between your app and the auto-refresh task.
If the auto-refresh task can run standalone, without interaction with your app, then it is better to start the task as a separate process. Using child_process directly is not a good idea, since spawning/monitoring/respawning child processes is tricky; you can use crontab or pm2 to manage it.
If the auto-refresh task depends on your app, you can use child_process directly and send messages to it for scheduling. But first try to break this dependency; this will simplify your app and make it easier to deploy and maintain separately. Whether the child process should be long-running or one-shot is not a question worth worrying about until you have hundreds of such tasks running on one machine.
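For the standalone case, a pm2 ecosystem file is one way to keep the task managed and respawned; the app name and script path below are assumptions for illustration:

```javascript
// ecosystem.config.js - a sketch of a pm2-managed refresh task
module.exports = {
  apps: [
    {
      name: 'auto-refresh',        // how the process appears in `pm2 list`
      script: './refresh-task.js', // the standalone refresh script
      autorestart: true,           // pm2 respawns it if it crashes
    },
  ],
};
```

Started with `pm2 start ecosystem.config.js`, pm2 then handles monitoring and respawning so your app does not have to.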

Node/Express: running specific CPU-instensive tasks in the background

I have a site that makes the standard data-bound calls, but it also has a few CPU-intensive tasks which are run a few times per day, mainly by the admin.
These tasks include grabbing data from the db, running a few time-consuming algorithms, then re-uploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server so web workers wouldn't work here. Would a child process be enough here? Or should I have a separate thread running in the background handling all /api/admin calls?
The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
It is an accessible API for:
easily launching worker Node.js instances on the same machine (each instance will have its own event loop)
keeping a live communication channel for short messages between instances
This way, any work done in a child instance will not block your master event loop.

General question about parallel threading in C++

I haven't used threading in my programs before, but there is a problem I am having with a 3rd-party application.
It is an offsite backup solution with a server and many clients. We have an admin console to manage all the clients, and that is where the problem lies.
If one of the client-side applications gets stuck, or is running in a broken condition, the admin console waits forever for a response and does not display anything.
for (client = client1; client < last_client; client++) {
    if (getOServConnection(client, &socHandler) != NULL) { .. }
}
I want two things here. First, I want to know if there is any way I can set a timeout for the function getOServConnection, so that I get a response within X seconds.
Second, I want to know how to call this function in parallel for all clients, so that I get the responses from all clients within X seconds.
getOServConnection contains a WSAConnect call, and I don't want to set any options on the socket, since it is used by other modules and doing so would affect the application severely.
First: if you move the call that hangs into a separate thread, you can use the main thread for starting a timer and waiting for the timeout. If you are using Visual C++ on Win32, you can use the (rather old) MFC-based timer. Once this timer expires it will launch a function called OnTimer. This timer does not affect your application's main thread, as it runs in a separate system-managed thread.
Second: if you need to start any number of threads with that connection, you should start thinking about a design pattern to use. You could use a fixed number of threads, in which case you may want to use an object pool. Or, if the number of threads is (relatively) limitless, you may want to use a factory method.
