I'm relatively new to Node.js and was wondering what the best approach would be for setting up an automated job that runs between set hours (let's say every hour from 8am to 10pm, depending on the customer's timezone), performs a task, and then, based on the results, sends users emails?
The task you describe is not as simple as it first looks. You need the following pieces:
A queue implementation. If you want something simple, you can use Kue.js, which uses Redis to store the jobs. For a solution that scales further, consider ZMQ or RabbitMQ.
The job processor. This is the part that talks to the database and produces the list of emails. Depending on the size of your email list, it is better to split it into chunks of, say, 200 emails and create a "child job" for each chunk.
Email sending. There are different providers and possible solutions: your own SMTP server, Amazon SES, Mailchimp and other similar services. For the sending library, there is already a great one: Nodemailer.
This is not the full design, but it should give you the idea. I am sorry if this is too abstract, but the question is not specific either.
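As a rough illustration of the parts that need no external service, the sketch below checks whether the current hour falls inside the sending window in a customer's timezone, and splits an email list into chunks for child jobs. The function names, the inclusive 8 to 22 window, and the default chunk size of 200 are assumptions for illustration. Enqueueing the chunks into Kue or another queue is left out.

```javascript
// Hypothetical helper: local hour (0-23) in an IANA timezone such as "Europe/Berlin".
function localHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: 'numeric',
    hourCycle: 'h23', // 0-23, avoids "24" at midnight
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === 'hour').value);
}

// Assumption: the window is inclusive on both ends (runs at 8:00 ... 22:00).
function isWithinSendWindow(date, timeZone, startHour = 8, endHour = 22) {
  const hour = localHour(date, timeZone);
  return hour >= startHour && hour <= endHour;
}

// Split a list into chunks of at most `size` items, one chunk per child job.
function chunk(items, size = 200) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

An hourly job could call isWithinSendWindow(new Date(), customer.timeZone) for each customer (customer.timeZone being a hypothetical stored field) and create one child job per chunk only for customers inside their window.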
Related
Background
I have a monolith Node.js + PostgreSQL app that, besides other things, needs to provide real-time in-app notifications to end users.
It is currently implemented in the following way:
there's a db table notifications which has state (pending/sent), userid (id of the notification receiver), isRead (did a user read the notification), type and body - notification data.
once specific resources get created or specific events occur, a varying number of users should receive in-app notifications. When a notification is created, it gets persisted to the db and sent to the user using WebSockets. Notifications can also get created by a cron job.
when a user receives N number of notifications of the same type, they get collapsed into one single notification. This is done via db trigger by deleting repeated notifications and inserting a new one.
usually it works fine. But when the number of receivers exceeds several thousand, the app lags, other requests get blocked, or not all notifications get sent via WebSockets.
Examples of notifications
Article published
A user is awarded with points
A user logged in multiple times but didn't perform some action
One user sends a friend request to another
One user sent a message to another
if a user receives 3+ Article published notifications, they get collapsed into the N articles published notification (N gets updated if new same notifications get received).
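To make the collapsing rule concrete, here is a minimal in-memory sketch of it. It is only an illustration of the rule as described, not the db trigger used in the app. The field names userid and type follow the table described above; the count field, the threshold constant, and the collapsed body text are assumptions.

```javascript
const COLLAPSE_THRESHOLD = 3; // "3+" notifications of the same type

// Collapse notifications per (userid, type) once the threshold is reached.
function collapseNotifications(notifications) {
  const groups = new Map();
  for (const n of notifications) {
    const key = `${n.userid}:${n.type}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(n);
  }

  const result = [];
  for (const group of groups.values()) {
    if (group.length >= COLLAPSE_THRESHOLD) {
      const { userid, type } = group[0];
      // One summary notification replaces the whole group.
      result.push({
        userid,
        type,
        count: group.length,
        body: `${group.length} notifications of type "${type}"`,
      });
    } else {
      result.push(...group);
    }
  }
  return result;
}
```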
What I currently have doesn't seem to work very well. For example, for the Article created event, the API endpoint that handles the creation also handles the notification send-outs (which is maybe not a good approach: it creates ~5-6k notifications and sends them to users via WebSockets).
Question
How to correctly design such functionality?
Should I stay with a node.js + db approach or add a queuing service? Redis Pub/Sub? RabbitMQ?
We deploy to the k8s cluster, so adding another service is not a problem. More important question - is it really needed in my case?
I would love some general advice or resources to read on this topic.
I've read several articles on messaging/queuing/notifications system design but still don't quite get if this fits my case.
Should the queue store the notifications or should they be in the db? What's the correct way to notify thousands of users in real-time (websockets? SSE?)?
Also, the more I read about queues and message brokers, the more it feels like I'm overcomplicating things and getting more confused.
Consider using the Temporal open source project. It would allow you to model each user's lifecycle as a separate program. Temporal makes the code fully fault tolerant and preserves its full state (including local variables and blocking await calls) across process restarts.
I want to show the user exactly to the second when he can have access to a given page; otherwise it will be blocked. Let's say that I receive a specific date and time from the server.
I guess I could use the setTimeout function, but I'm sure it's a bad idea.
I can use a scheduler like node-cron in the backend, but I'd need to send a message to the frontend somehow after the given time has passed.
Are WebSockets an option? Or is there an easier way?
I want to show the user exactly to the second when he can have access to a given page
For that kind of accuracy, WebSocket communication is indeed the way to go. The protocol is widely used on the web for push notifications in email and social services such as Gmail and Facebook.
Regarding the backend, I would suggest a more scalable approach. You could use Bull to create a scheduling service. Bull uses Redis as a store and can operate with multiple processors (Node processes), ensuring that each task is processed by only one processor. In short, it abstracts away the complexities that arise in distributed systems.
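Whichever transport delivers the unlock time, the frontend still has to turn it into a per-second display. A minimal sketch of that part follows, assuming the server sends the unlock time as an ISO timestamp. The function names are hypothetical, and the server should still enforce the access check itself, since the client clock cannot be trusted.

```javascript
// Whole seconds left until `unlockAt`, never negative.
function remainingSeconds(unlockAt, now = new Date()) {
  const diffMs = unlockAt.getTime() - now.getTime();
  return Math.max(0, Math.ceil(diffMs / 1000));
}

// Format a number of seconds as HH:MM:SS for display.
function formatCountdown(totalSeconds) {
  const pad = (n) => String(n).padStart(2, '0');
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}

// Re-compute from the clock on every tick instead of counting ticks,
// so a throttled or delayed timer cannot make the display drift.
function startCountdown(unlockAt, onTick, onUnlock) {
  const timer = setInterval(() => {
    const left = remainingSeconds(unlockAt);
    onTick(formatCountdown(left));
    if (left === 0) {
      clearInterval(timer);
      onUnlock();
    }
  }, 1000);
  return timer;
}
```

Usage could look like startCountdown(new Date(unlockAtFromServer), render, showPage), where unlockAtFromServer, render and showPage are placeholders for your own code.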
I am developing an email sending service, probably for sending bulk emails using the SendGrid Web API, but I am not able to figure out the best practice for a scalable system. I want to keep a record of all emails that failed to deliver and retry those after all emails have been sent. I am using Node.js, so I wanted to know if there is any way to speed up the process (something like sending multiple emails at the same time).
There are multiple ways to handle this. I will suggest two that seem obvious to me.
(Recommended - Easy) Use the Async module's control flow function called queue (see the Async documentation). You can feed in all the requests as an array of request objects and set the concurrency to, let's say, 100, so that up to 100 workers run at a time. To handle errors, record the failures in a separate structure and deal with them once all the values have been run through.
Spawn multiple worker processes using Node.js's built-in modules (for example cluster or child_process).
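The first option can be sketched without any dependency. The code below is a stand-in for the idea behind the Async module's queue, not that library's API: it runs a bounded number of sends at once, records failures, and retries them once after the first pass. sendOne is a hypothetical async function that sends a single email and rejects on failure.

```javascript
// Run `sendOne` over all items with at most `concurrency` in flight.
// Resolves with the list of failures instead of rejecting.
async function sendAll(items, sendOne, concurrency) {
  const failed = [];
  let next = 0;

  async function worker() {
    // Each worker pulls the next unclaimed item until none are left.
    while (next < items.length) {
      const item = items[next++];
      try {
        await sendOne(item);
      } catch (error) {
        failed.push({ item, error });
      }
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return failed;
}

// First pass over everything, then a single retry pass over the failures.
async function sendWithRetry(items, sendOne, concurrency = 100) {
  const firstPass = await sendAll(items, sendOne, concurrency);
  const retryItems = firstPass.map((f) => f.item);
  return sendAll(retryItems, sendOne, concurrency);
}
```

sendWithRetry resolves with the entries that still failed after the retry, which is the list you would persist for later inspection.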
Sendgrid offers an npm package for node.js integration, so you don't have to reinvent the wheel. It accepts messages at a high rate, so you shouldn't have problems delivering yours to sendgrid. You just dump your messages into sendgrid.
Email, being a store-and-forward system, is inherently asynchronous. That means it operates far from real time. Some messages are delivered in a few seconds, and others take hours (when they get soft--"retry later"--rejections from destination servers, for example).
Sendgrid handles this issue with a "bounces" API. (And with "bounces" features in their web back end application). Many bounces are "hard" bounces, meaning you must avoid trying to send messages to that address again. You can use the bounces API to retrieve a list of bounced messages. You should remove those addresses from your email list, and not try to send them again. (Sendgrid bans users who repeatedly send mailings with a high undeliverable rate.)
They also have an "invalid emails" API. This works like "bounces" and returns lists of addresses that are ill-formed or, if sendgrid can tell, not present on the destination server. Again, you should remove these addresses from your email list. If they're invalid now, they will be invalid tomorrow.
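Once you have retrieved the bounced and invalid addresses, pruning your own list is a simple set difference. A minimal sketch, assuming both lists have already been fetched into plain arrays of address strings (the fetching itself is not shown, and the function names are hypothetical). Comparing addresses case-insensitively is also an assumption.

```javascript
// Normalize an address for comparison (assumption: case-insensitive match).
function normalizeAddress(address) {
  return address.trim().toLowerCase();
}

// Return the recipients that are not in the suppressed list.
function removeSuppressed(recipients, suppressed) {
  const blocked = new Set(suppressed.map(normalizeAddress));
  return recipients.filter((address) => !blocked.has(normalizeAddress(address)));
}
```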
Sendgrid offers all sorts of tutorials on this subject.
I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails that out to a list of recipients. My initial reason for using Node was because I have an existing front end already built using angular and I wanted to be able to reuse code in order to simplify maintenance. My main idea was to have 4+ individual node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual pdf generation would be somewhat CPU intensive, so this would be a cluster of n apps that would retrieve and run individual reports from the queue.
When done making the pdf, they would pass to a final email app that would send the file out.
My main concerns are communication between apps. At the moment I am setting up the 3 lower levels (ie. all but the scheduler) on separate ports with express, and opening http requests to them when needed. Is there a better way to handle this? Would the basic 'net' work better than the 'http' package? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to specifically listen to a path for put requests and to parse the incoming json. I was led to asking here because in tracking logs so far I see every so often the http request is reset, which doesn't appear to affect the data received on the child process, but I still like to avoid errors in my coding.
I think this kind of decoupling could leverage some sort of stateful priority queue with features like retry on failure, clustering, and so on.
I've used Kue.js in the past with great success. It's Redis-backed and has nice documentation and a nice interface: http://automattic.github.io/kue/
I am developing a Node.js app. In my app, I need to send blast emails and SMS to users satisfying particular criteria. I use Gmail SMTP for emails and a third-party vendor for SMS. I'm assuming that firing the APIs for the email and SMS services in a loop is dangerous. What's the right way to do it?
The time spent is obviously proportional to N, where N is the size of your set: the more users you have, the longer it takes. Keep in mind that requests over the network are non-blocking in any case.
Anyway, unless N risks being in the thousands or millions, you can do it in a loop and attach a proper callback to handle responses and errors.
Otherwise, you can send an email/SMS and schedule the same operation for the next element using setImmediate (see the Node.js documentation for further details). In current Node.js versions, process.nextTick callbacks run before the event loop continues, so they would not let pending I/O run between items.
This way you'll spread all the activities over several iterations of the event loop.
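A minimal sketch of spreading the work over event-loop iterations: items are handled in small batches, and the next batch is scheduled with setImmediate so that pending I/O callbacks get a turn in between. The function name and parameters are hypothetical; handle stands for whatever starts the email or SMS request for one user.

```javascript
// Process `items` in batches of `batchSize`, yielding to the event loop
// between batches. `done` is called after the last item has been handled.
function processInBatches(items, handle, batchSize, done) {
  let index = 0;

  function runBatch() {
    const end = Math.min(index + batchSize, items.length);
    for (; index < end; index++) {
      handle(items[index]);
    }
    if (index < items.length) {
      setImmediate(runBatch); // let I/O callbacks run before the next batch
    } else {
      done();
    }
  }

  runBatch();
}
```

The first batch runs synchronously and every later batch runs in a subsequent iteration of the event loop.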