How to scale a node cron job to multiple machines?

I have a node cron job inside an Express server that runs every minute, picks up campaigns from MongoDB, and then does some processing for the active ones. This server only does processing. Because the number of campaigns keeps growing, I now want to scale this job processing across multiple machines. My idea is to assign a key to each campaign when it is created, and during the job have each machine pick up only the campaigns with a particular key. With two machines, half the campaigns would get key1 and the other half key2, and an environment variable on each machine would specify which key it processes. I would have to maintain that variable manually for every machine: machine 1 processes only key1, machine 2 only key2.
This looks like a lot of manual work. Is there any Node package, platform, or approach that handles this more gracefully, without the manual overhead?
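One common alternative to hand-maintained keys, sketched here under assumptions, is to run the identical job on every machine and have each worker atomically claim a campaign, with a lease that expires, before processing it. In MongoDB the atomic step would typically be a `findOneAndUpdate` whose filter matches only active campaigns whose lease has expired; the sketch below simulates that with an in-memory store and a lock. The names `Campaign`, `CampaignStore`, `claim_next`, `run_tick`, `locked_by`, `locked_until`, and the 55-second lease are all hypothetical.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Campaign:
    id: str
    active: bool = True
    locked_by: Optional[str] = None  # hypothetical field: worker holding the lease
    locked_until: float = 0.0        # hypothetical field: lease expiry (epoch seconds)


class CampaignStore:
    """In-memory stand-in for the campaigns collection (hypothetical).

    claim_next() plays the role of one atomic conditional update, e.g. a
    MongoDB findOneAndUpdate whose filter matches only active campaigns
    whose lease has expired, so two workers never win the same campaign.
    """

    def __init__(self, campaigns):
        self._campaigns = {c.id: c for c in campaigns}
        self._lock = threading.Lock()  # models the database's atomicity

    def claim_next(self, worker_id: str, lease_seconds: float,
                   now: float) -> Optional[Campaign]:
        with self._lock:
            for c in self._campaigns.values():
                if c.active and c.locked_until <= now:
                    c.locked_by = worker_id
                    c.locked_until = now + lease_seconds
                    return c
        return None


def run_tick(store: CampaignStore, worker_id: str,
             process: Callable[[Campaign], None],
             lease_seconds: float = 55.0,
             now: Optional[float] = None) -> List[str]:
    """One cron tick on one machine: claim and process until nothing is left.

    The lease is computed from the tick's start time and is shorter than the
    1-minute schedule, so each campaign is processed once per tick and is
    claimable again on the next tick (assumes a tick finishes within the minute).
    """
    now = time.time() if now is None else now
    handled = []
    while (campaign := store.claim_next(worker_id, lease_seconds, now)) is not None:
        process(campaign)
        handled.append(campaign.id)
    return handled
```

Adding a machine then needs no configuration: it simply competes for unclaimed campaigns on the next tick. The linear scan in `claim_next` is only for illustration; a real query would use an index on the lease field.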

Related

How do you handle time-based events in a cluster?

I have a Node.js application that runs in a cluster, so many instances of the app run simultaneously and accept requests from a load balancer.
Suppose my app has a notion of a "subscription", and each subscription is stored in the central database with dateStart and dateEnd fields. For each subscription I need to send notifications reminding clients that it is about to expire (e.g. 14, 7 and 3 days before expiration). I also need to mark a subscription as expired and perform some additional logic when the time comes.
What are the best practices to handle such time-based events for multi-instance applications?
I could have my application run an expiration routine, e.g. every five minutes, but then I would have to deal with concurrency issues, because every instance would try to do so, and we don't want notifications to be sent twice.
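The reminder schedule in the question (14, 7 and 3 days before `dateEnd`, then expiry) can be expressed as a pure function that the periodic routine evaluates for each subscription. Recording which actions were already taken makes re-running it harmless. A minimal sketch; `due_action` and the `sent` set (e.g. a field stored alongside the subscription) are hypothetical, and the choice to skip stale reminders when the routine first runs late is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

REMINDER_DAYS = (14, 7, 3)  # days before dateEnd, as in the question


def due_action(date_end: datetime, now: datetime, sent: Set[str]) -> Optional[str]:
    """Return the single action due for one subscription, or None.

    `sent` holds actions already recorded for this subscription, so calling
    this again after the action was recorded returns None. If the routine
    runs late, only the tightest crossed threshold is returned (older
    reminders are skipped rather than sent out of date).
    """
    if now >= date_end:
        return None if "expired" in sent else "expired"
    crossed = [d for d in REMINDER_DAYS if now >= date_end - timedelta(days=d)]
    if not crossed:
        return None
    key = f"remind-{min(crossed)}d"
    return None if key in sent else key
```

This removes duplicates from sequential re-runs, but two instances that both read before either writes can still act twice; that race still needs a conditional update or a single primary instance.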

A few years ago, when we clustered one of our systems, I refactored its scheduled jobs to solve a problem similar to the one you describe.
I created a cluster-aware scheduled job monitor and used the DB to ensure that only one was operating at any given time. Each instance generated its own unique GUID at startup and used it as its ID. At startup, they all check the DB to see whether a primary is running, based on a table recording ID, start time and last run. A primary counts as running if its recorded last run is within a specified time. If a primary is running, the rest stay up as backups and check at a given interval so they can take over if the primary dies. When the primary dies, the instance that takes over marks the record with its own ID, updates the times, and then looks for jobs in other tables (similar to your subscriptions). The primary keeps looking for jobs at a configurable interval until it dies or is restarted.
During testing, I was able to spin up 50+ instances of the monitor, all constantly attempting to become primary. Only one ever took over. I would then manually kill the primary and watch the others vie for the role, but only one would prevail. This approach relies on the DB allowing only one instance to update the record, using qualified updates based on the prior contents of the record.
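The "qualified update" described above is a compare-and-set: an instance may overwrite the primary record only if the record still holds the version it just read, and it only tries when the recorded last run is older than a timeout. A minimal sketch of that idea, with an in-memory record and lock standing in for the DB row and the database's atomic conditional `UPDATE ... WHERE`; `LeaderRecord`, `LeaderTable`, `try_become_primary`, the `version` column, and `timeout` are hypothetical names.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Optional


def new_instance_id() -> str:
    """Each instance generates its own GUID at startup, as in the answer."""
    return str(uuid.uuid4())


@dataclass(frozen=True)
class LeaderRecord:
    owner_id: Optional[str] = None  # ID of the current primary
    last_run: float = 0.0           # heartbeat written by the primary
    version: int = 0                # bumped on every successful write


class LeaderTable:
    """In-memory stand-in for the single-row table (hypothetical)."""

    def __init__(self):
        self._row = LeaderRecord()
        self._lock = threading.Lock()  # models the atomicity of one UPDATE

    def read(self) -> LeaderRecord:
        with self._lock:
            return self._row  # frozen dataclass, safe to hand out

    def compare_and_set(self, expected_version: int, owner_id: str, now: float) -> bool:
        # Equivalent of a qualified update such as:
        #   UPDATE leader SET owner_id = ?, last_run = ?, version = version + 1
        #   WHERE version = ?
        with self._lock:
            if self._row.version != expected_version:
                return False  # someone else wrote first; our read is stale
            self._row = LeaderRecord(owner_id, now, expected_version + 1)
            return True


def try_become_primary(table: LeaderTable, my_id: str, now: float, timeout: float) -> bool:
    """Return True if `my_id` is the primary after this check."""
    current = table.read()
    if current.owner_id == my_id:
        return table.compare_and_set(current.version, my_id, now)  # heartbeat
    if current.owner_id is not None and now - current.last_run < timeout:
        return False  # a live primary exists: stay a backup
    # No primary, or its heartbeat is stale: try to take over.
    return table.compare_and_set(current.version, my_id, now)
```

Every instance calls `try_become_primary` on its check interval and only processes jobs when it returns True. A primary that stalls for longer than `timeout` can briefly overlap with its successor, so the jobs themselves should still tolerate a duplicate run.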

Is it possible to run 2 different agents at the same time?

In IBM Notes I have 2 agents on 1 server: one on database A and one on database B. The agent on database A copies forms to Server 2, and the agent on database B copies forms with the same form name, also to Server 2. So on Server 2 I have 1 view where I can access all these documents.
One agent takes a long time to copy all its documents to Server 2. My question is: is it possible to run the 2 agents on Server 1 at the same time?

Domino's Agent Manager can be configured to allow many agents to run at the same time. I believe the default setting is just one at a time during the day and two at a time at night, but server administrators routinely change this. At all times, it allows only one scheduled or new/edited-documents agent per database to run at once, but there are various other ways to trigger agents so that many can run even though they are in the same database. Web agents, for example, aren't run under the control of the Agent Manager, and neither are mail-triggered agents, so they are not subject to this restriction.

Amazon EC2 boot time

Our web app performs a varying number of tasks for a user-initiated action. We have built a small system where a master server calculates the number of worker servers needed to complete the task, and that many EC2 instances are "Turned On"; they pick up the tasks and perform them.
"Turned On" because the time taken to spawn an instance from an AMI is extremely high, so the idea is to have a pool of worker instances and start and stop them as required.
Also consider how Amazon charges when you start an instance (you are billed for 1 hour every time you turn on an instance). So once spawned, the workers stay active for an hour and accept other tasks during that period.
We have this architecture up and running, but the boot time still bothers us: it fluctuates between 40 and 80 seconds. Is there some way we can reduce it?
Below is the stack running on each worker instance:
Ubuntu AMI
Node JS (using forever-service for auto startup on boot)
Docker (the tasks are performed inside individual docker containers)

Have you taken a look at AWS Lambda (https://aws.amazon.com/lambda)?
Lambda supports node.js and will automatically manage the scaling of required worker infrastructure, depending on the number of requests. This will avoid your "one hour bill" problem. You only pay for used processing time.

How to parallelize crontab executions to increase user base for web app based on mongodb and mysql?

I have a Symfony-based web application that runs on a MongoDB and MySQL backend. The application works like this: for each user there is a Python script that runs 4-12 times a day via cron jobs and populates the MySQL and MongoDB databases. The script takes between 1.5 and 2 minutes to execute. At the moment the cron jobs run sequentially: the script executes one job and waits for it to end before executing the next. Whenever my web application gets a new user, cron jobs are created automatically for a period of time. With only 24 hours in a day, I can run a limited number of cron jobs and therefore support a limited number of users (around 250-300).
What would I need to do to host 1,000 to a million users on my web application? Can I run my script in multiple threads, i.e. launch hundreds of jobs at the same time instead of waiting for each one to finish? That way I could grow my user base dramatically.
But what concurrency can MongoDB and MySQL sustain? How many jobs can I execute in parallel? What system factors do I need to consider to grow my user base? Do I need to add more machines to my application?
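A rough way to reason about the limit: one sequential worker has 1,440 minutes a day, and each user costs (runs per day) × (minutes per run), so the user ceiling grows with the number of jobs that run in parallel. The sketch below computes that estimate and runs jobs through a bounded pool (`concurrent.futures.ThreadPoolExecutor`), so concurrency is capped by `max_workers` rather than unbounded. `max_users`, `run_user_job`, and `run_batch` are hypothetical names; `run_user_job` stands in for the per-user script (in practice it might launch it with `subprocess.run`), and the right cap depends on how much load MongoDB and MySQL can take, which has to be measured.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed

MINUTES_PER_DAY = 24 * 60


def max_users(runs_per_day, minutes_per_run, parallel_jobs=1):
    """Upper bound on users if every job slot is kept busy all day."""
    minutes_per_user = runs_per_day * minutes_per_run
    return math.floor(parallel_jobs * MINUTES_PER_DAY / minutes_per_user)


def run_user_job(user_id):
    # Hypothetical placeholder for the per-user script (fetch data, write to
    # MySQL/MongoDB). It must be safe to run concurrently with other users' jobs.
    return user_id


def run_batch(user_ids, max_workers):
    """Run one job per user, with at most `max_workers` running at once."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_user_job, u): u for u in user_ids}
        for fut in as_completed(futures):
            user = futures[fut]
            try:
                done.append(fut.result())
            except Exception:
                failed.append(user)  # record and retry later instead of aborting
    return done, failed
```

With 4 runs of 1.5 minutes each, `max_users(4, 1.5)` gives 240 for one sequential worker, while 20 parallel slots raise the bound to 4,800, provided the databases keep up. Threads suit I/O-bound jobs; for CPU-heavy work, `ProcessPoolExecutor` offers the same interface, and once one machine is saturated the same batch can be split across machines.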

Lotus Notes Agent

Where can I find a good online reference on Lotus Notes agents? I am currently having problems with running agents simultaneously and with understanding agents: how they work, best practices, etc. Thanks in advance!

I currently having problems with having simultaneous agents
Based on this comment I take it you are running a scheduled agent?
Scheduled agents work such that only one agent from a particular database can run at a time, even if you have multiple Agent Manager (AMGR) threads. Also, agents cannot run more often than every 5 minutes. The UI will let you enter a lower number, but it will change it.
Another factor to take into account is how long your agent runs. If it runs longer than the interval you set up, the runs will back up. The server can also be configured to kill agents that run over a certain time, so you need to make sure the agent finishes within that timeframe.
To bypass all this, you can execute an agent from the Domino console as follows:
tell amgr run "database.nsf" 'agentName'
This runs in its own thread outside of the scheduler. Because of this, you can create a program document to execute an agent at intervals of less than 5 minutes, and to run multiple agents within the same database.
Doing this is dangerous, however, as you have to be aware of a number of issues.
As the agent is outside the control of the scheduler, you can't kill it as you would in the scheduler.
Running multiple threads can tie up more processes. So while the scheduler will just backlog everything if the agent runs longer than the schedule, a program document in this situation will crash the server.
You need to be aware of what the agent is doing in the database so that it won't interfere with any other agents in the same database, and can cope if it is run twice in parallel.
For more reading material on this:
Improving Agent Manager performance
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/topic/com.ibm.help.domino.admin.doc/DOC/H_AGENT_MANAGER_NOTES_INI_VARIABLES.html
Agent Manager troubleshooting
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/topic/com.ibm.help.domino.admin.doc/DOC/H_ABOUT_TROUBLESHOOTING_AGENTS.html
Troubleshooting Agents (Old material but still relevant)
http://www.ibm.com/developerworks/lotus/library/ls-Troubleshooting_agents/index.html
... and related tech notes:
Title: How to run two agents concurrently in the same database using a wrapper agent
http://www.ibm.com/support/docview.wss?uid=swg21279847
Title: How to run multiple agents in the same database using a Program document
http://www.ibm.com/support/docview.wss?uid=swg21279832
