Architecture for "Sending Batch Email every midnight on AWS" - node.js

We have a Node.js server running on an AWS EC2 instance with MongoDB as the backend. We need to send email notifications to our users depending on certain time-specific criteria.
The simplest way is to set up another instance and run a cron job on it every midnight: go through all users one by one and dispatch emails to those who fit the criteria.
This seems to be a pretty non-optimal solution as we have thousands of users. Is there any other way of doing this?

Related

How to poll another server from Node.js?

I'm currently developing a Shopify app with Node/Express and a Postgres database. When a user registers an account and connects their Shopify store, I'll need to download all of their store's orders. They could have 100,000s of orders, so I'd like to use a Shopify GraphQL Bulk Operation. While Shopify is handling this, my Node server will need to poll the Shopify server to check on the progress, and when the operation is complete, Shopify will send me a link where I can download all of the data. Once the data is processed and stored in my database, I'll send the user an email to say that their account is now set up.
How should I handle polling the Shopify server? The process could take anywhere from a few minutes to hours. Using setInterval() would be a bad idea, right? If the server restarts for whatever reason, it will lose the interval. So, should I use some sort of background task? And would I need to store anything in my database? I've researched cron jobs, child processes, worker threads, and the bull package, and it's left me a little confused.
(I also know that I could use a webhook, but Shopify offers no guarantees that my app will receive the webhook.)
Upon installation, launch a background job labeled "GetCustomerOrders". As you know, background jobs are mature, and nicely handle problems. For example, they can retry themselves if something goes wrong.
The Background job itself just sets up the Bulk Download and then settles into Poll. Polling is no big deal and just happens. As you said, could be minutes, could take hours. Nevertheless, a poll gets status on a bulk download, and that can even be hot-rodded. For example, you poll with an ID. So you poll till that ID completes. Regardless of restarts.
At the end of that rather simple setup, you get a URL from which to download the JSON and parse it. Spawn another job even for that. Endless fun. Why sweat it? Background jobs are the way to go.
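The "poll with an ID, regardless of restarts" idea can be sketched roughly as follows. This is a minimal illustration, not a Shopify client: `store` (a persistent key-value lookup backed by your database), `fetchStatus` (the call that asks the remote API about the operation), and the status strings are all assumptions made for the example.

```javascript
// Sketch: poll a long-running bulk operation by ID until it finishes.
// `store` and `fetchStatus` are hypothetical stand-ins for your database
// and for the request that checks the operation's status.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilComplete({ jobId, store, fetchStatus, intervalMs = 30000, maxAttempts = Infinity }) {
  // The operation ID was persisted when the bulk operation was started,
  // so a restarted worker can resume polling for the same ID.
  const job = await store.get(jobId);
  if (job.status === 'COMPLETED') return job; // nothing left to do

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await fetchStatus(job.operationId);
    if (result.status === 'COMPLETED') {
      const finished = { ...job, status: 'COMPLETED', url: result.url };
      await store.set(jobId, finished); // persist before handing off
      return finished;
    }
    if (result.status === 'FAILED') {
      await store.set(jobId, { ...job, status: 'FAILED' });
      throw new Error(`Bulk operation ${job.operationId} failed`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Gave up polling ${job.operationId} after ${maxAttempts} attempts`);
}
```

Run inside a background job, a thrown error would let the queue's own retry logic take over, and the download step can be enqueued as a separate job once the URL is stored.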
The Webhook idea is OK but as the documentation says, they are not 100% and CRON is bush-league in that it misses out on the mature development of jobs in queues and is more like a simple trigger. Relying on CRON to start something is fine, but gives you zero management over what it starts.
I am guessing NodeJS has a decent background job system by this time. When you look at Sidekiq for Ruby you realize what awesome is. Surely you can find a copycat in Node that comes close anyway.

Algorithm to trigger bulk events by schedule

I'd like to create a web app that allows users to do email outreach, but I'm having trouble with a good solution.
I'd like each user to be able to send 100 emails per day, within time windows they can configure, e.g. 6 am to 10 am. I'm able to determine a delivery schedule per user (based on the times they configure), but because users can change their email schedules at any time, I'd have to reconfigure the order of processing.
Is there a queue type in Redis (for instance) that triggers by time?
Or a way to trigger events on a schedule in nodejs that's scalable?
There is a Redis feature, keyspace notifications, which allows clients to subscribe to Pub/Sub channels in order to receive Redis events (an example is the "key expiry" event).
Documentation: http://redis.io/topics/notifications
For your use case, you can maybe use the "key expiry" event.
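As a rough sketch of the "key expiry" idea: each scheduled send becomes a key whose TTL runs out when the send is due. The code assumes a node-redis v4 style client (passed in by the caller), a server whose `notify-keyspace-events` setting includes `Ex`, and database 0. The `send-at:` prefix and both function names are made up for the example.

```javascript
// Sketch: use expiring Redis keys as timers.
// Assumes node-redis v4 style clients and notify-keyspace-events "Ex".

const PREFIX = 'send-at:';                        // hypothetical key prefix
const EXPIRED_CHANNEL = '__keyevent@0__:expired'; // expiry events for database 0

// Create (or overwrite) a key that expires when the user's send is due.
// `dueAt` and `now` are millisecond timestamps.
async function scheduleSend(client, userId, dueAt, now = Date.now()) {
  const ttlMs = Math.max(1, dueAt - now);
  await client.set(PREFIX + userId, '1', { PX: ttlMs });
}

// Call onDue(userId) whenever one of our timer keys expires.
// `subscriber` must be a dedicated connection, e.g. client.duplicate().
async function listenForDue(subscriber, onDue) {
  await subscriber.subscribe(EXPIRED_CHANNEL, (expiredKey) => {
    if (expiredKey.startsWith(PREFIX)) {
      onDue(expiredKey.slice(PREFIX.length));
    }
  });
}
```

Because SET replaces both the value and the TTL, calling `scheduleSend` again when a user changes their schedule simply moves the timer. Two caveats from the Redis documentation apply: expired events fire when Redis actually deletes the key, which can lag behind the TTL, and Pub/Sub delivery is fire-and-forget, so a disconnected subscriber misses events.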

How to synchronize background tasks across multiple FastAPI processes?

I have a FastAPI application which sends emails when the users register on the website.
The app is implemented in such a way that there is a cron task (scheduled every minute) which checks the database and if a flag is set, it will try to send an email.
Deployment: Two instances of the FastAPI application are running, connected to a locally hosted MySQL database.
Problem: Since there are two instances of the application running, each one triggers a cron job every minute, and this results in the email being sent twice.
How to synchronise between multiple processes? Please help me with this issue. Thanks

Cron job on NodeJS server runs multiple times simultaneously due to load balancers

I have cron job services on my nodeJS server (part of a React app) that I deploy using Convox to AWS, which has 4 load balancer servers. This means my cron job runs 4 times simultaneously on each server, when I only want it to run once. How can I stop this from happening and have my cron jobs run only once? As far as I know, there is no reliable way to lock my cron to a specific instance, since instances are volatile and may be deleted/recreated as needed.
The cron job services conduct tasks such as querying and updating our database, sending out emails and texts to users, and conducting external API calls. The services are run using the cron npm package, upon the server starting (after server.listen).
Can you expose these tasks via url? That way you can have an external cron service that requests each job via url against the ELB.
See https://cron-job.org/en/
Another advantage of this approach is you get error reports if a url does not return a 200 status. This could simplify error tracking across all jobs.
This also provides better redundancy and load balancing, as opposed to having a single instance where you run all jobs.
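A minimal sketch of exposing jobs via URL, using only Node's built-in `http` module. The `/jobs/<name>` path, the `x-job-secret` header, and the `createJobHandler` helper are assumptions for illustration. The shared-secret check is an addition that only works if the external scheduler can send custom headers, and some form of protection is advisable since the endpoint is reachable through the load balancer.

```javascript
const http = require('http');

// Sketch: turn a registry of job functions into an HTTP request handler.
// `jobs` maps a job name to an async function; `secret` guards the endpoint.
function createJobHandler(jobs, secret) {
  return async function handle(req, res) {
    const name = (req.url || '').replace(/^\/jobs\//, '');
    const job = Object.prototype.hasOwnProperty.call(jobs, name) ? jobs[name] : undefined;
    if (req.method !== 'POST' || !job) {
      res.statusCode = 404;
      return res.end('unknown job');
    }
    if (req.headers['x-job-secret'] !== secret) {
      res.statusCode = 403;
      return res.end('forbidden');
    }
    try {
      await job();          // the load balancer routed this to one instance only
      res.statusCode = 200;
      res.end('ok');
    } catch (err) {
      res.statusCode = 500; // a non-200 status lets the scheduler report the failure
      res.end('job failed');
    }
  };
}

// Example wiring (nightlyEmails is a hypothetical job function):
// http.createServer(createJobHandler({ nightlyEmails }, process.env.JOB_SECRET)).listen(3000);
```

Since each request reaches exactly one instance behind the ELB, the job runs once per trigger regardless of how many instances are up.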
I had the same issue; see my solution at the link below. Two emails were sent because of two instances on AWS. I lock each send with a unique random number.
My example is based on MongoDB.
https://forums.meteor.com/t/help-email-sends-emails-twice/50624
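One possible reading of "lock each send with a unique random number" is sketched below; it is not taken from the linked post. It assumes `emails` is a MongoDB driver collection whose documents have a `lockedBy` field that stays null until an instance claims the document, and `claimAndSend` is a hypothetical helper name.

```javascript
const crypto = require('crypto');

// Sketch: let exactly one instance claim an email before sending it.
// The conditional update on a single document is atomic in MongoDB, so
// when two instances race, only one of them sees modifiedCount === 1.
async function claimAndSend(emails, emailId, send) {
  const token = crypto.randomBytes(16).toString('hex'); // unique per attempt
  const result = await emails.updateOne(
    { _id: emailId, lockedBy: null },                  // only if still unclaimed
    { $set: { lockedBy: token, lockedAt: new Date() } }
  );
  if (result.modifiedCount !== 1) {
    return false; // another instance claimed it first
  }
  await send(emailId);
  return true;
}
```

Storing `lockedAt` alongside the token leaves room for releasing stale locks if an instance dies after claiming but before sending.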

nodejs - run a function at a specific time

I'm building a website where users enter some input, and after a specific amount of time an algorithm has to run that takes the users' input stored in the database, creates results for them, and stores those results in the database as well. The problem is that in Node.js I can't figure out where and how I should implement this algorithm so that it runs after a specific amount of time and only once (every few minutes or seconds).
The app is built with Node.js and Express.
For example, let's say that I start the application, and after 3 minutes the algorithm should run, take some data from the database, and, once it has created some output, store it in the database again.
What are the typical solutions for that (at least one is enough)? Thank you!
Let's say you have a user request that saves a URL to crawl in order to list its products.
One of the simplest ways would be:
On each user request, create a record in a "tasks" table in the DB:
userId | urlToCrawl | dateAdded | isProcessing | ....
Then, in the main Node site, run something like setInterval(findAndProcessNewTasks, 60000), so that every minute (or whatever interval you need) it picks up all tasks that are not currently being worked on (where isProcessing is false).
findAndProcessNewTasks queries the DB and runs your algorithm for every record that has not been processed yet. It also sets isProcessing to true.
Once the algorithm has finished, it removes the record from tasks (or sets another field, such as "finished", to true).
Depending on the load and the number of tasks, it may make sense to run your algorithm in another Node app.
Typically you would have a message bus (Kafka, RabbitMQ, etc.), with the main app just sending events and worker Node.js apps doing the actual work and inserting products into the DB.
This keeps the main app lightweight and allows the worker apps to be scaled.
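The tasks-table approach can be sketched like this. To keep the example self-contained, an in-memory array stands in for the "tasks" table, and `addTask` and the `runAlgorithm` callback are hypothetical names; in a real app each step would be a database query.

```javascript
// Sketch: periodic task processing with an isProcessing flag.
// `tasks` is an in-memory stand-in for the "tasks" table.
const tasks = [];

function addTask(userId, urlToCrawl) {
  tasks.push({ userId, urlToCrawl, dateAdded: new Date(), isProcessing: false, finished: false });
}

async function findAndProcessNewTasks(runAlgorithm) {
  // Select tasks that are neither in progress nor done.
  const pending = tasks.filter((t) => !t.isProcessing && !t.finished);

  // Flag them all first, so an overlapping run of this function
  // (the next interval tick) does not pick up the same tasks.
  for (const task of pending) task.isProcessing = true;

  for (const task of pending) {
    try {
      await runAlgorithm(task);
      task.finished = true;      // or delete the record instead
    } catch (err) {
      // Leave finished as false so the task is retried on a later tick.
    } finally {
      task.isProcessing = false;
    }
  }
  return pending.length;
}

// Example wiring (crawl is a hypothetical algorithm function):
// setInterval(() => findAndProcessNewTasks(crawl), 60000);
```

With a real database and more than one app instance, the "select and flag" step would need to be a single atomic update; otherwise two instances can pick up the same task.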
From your question it's not clear whether you want to run the algorithm on the web server (perhaps processing input from multiple users) or on the client (processing the input from a particular user).
If the former, then use setTimeout(), or something similar, in your main javascript file that creates the web server listener. Your server can then be handling inputs from users (via the app listener) and in parallel running algorithms that look at the database.
If the latter, then use setTimeout(), or something similar, in the javascript code that is being loaded into the user's browser.
You may actually need some combination of the above: code running on the server to periodically do some processing on a central database, and code running in each user's browser to periodically refresh the user's display with new data pulled down from the server.
You might also want to implement a websocket and json rpc interface between the client and the server. Then, rather than having the client "poll" the server for the results of your algorithm, you can have the client listen for events arriving on the websocket.
Hope that helps!
If I understand you correctly - I would just send the data to the client-side while rendering the page and store it into some hidden tag (like input type="hidden"). Then I would run a script on the server-side with setTimeout to display the data to the client.
