Scheduled push notifications - inside Node.js function or CRON job?

I have to implement scheduled push notifications in the backend (Node.js), which are triggered only at a certain time. So I need to query the DB (PostgreSQL) every 1-2 minutes and find only those notifications which need to be triggered.
What is the better solution?
Use an internal setTimeout query function
or
an external CRON script which will trigger the querying function in Node.js?
Thank you

If the second option is just a cron that makes an HTTP request to your service, it's pretty equivalent. If, instead, the solution is packaged as a script and the cron drives that script directly, it has a couple of tradeoffs, mainly around operations:
Use an internal setTimeout query function, or
This means you have to launch a long-running service and keep it running. Things like memory leaks may become an issue.
An external CRON script which will trigger the querying function in Node.js?
This is the strategy that Google Cloud Platform uses for its cron offering. If the function just pings a web URL, the two solutions are pretty equivalent.
IMO, the biggest issue with both of these is being careful about coupling a background (async) workload with an online workload. If your service is serving real-time HTTP requests but is also running these background workloads, that takes resources away from servicing the synchronous HTTP requests. If they are two fundamentally different workloads, then it also makes sense to separate them for scaling purposes.
I've been in a situation where monitoring actually informed this decision. The company used Prometheus and didn't have Pushgateway installed, so a cron-based solution had zero metric visibility, whereas the service version was trivial to add metrics and alerting to.
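For reference, here's a minimal sketch of the internal-timer option, assuming the node-postgres (pg) client and a hypothetical notifications table with send_at and sent columns:

```js
// Minimal sketch of the in-process polling option (assumptions: `pg` client,
// a `notifications` table with `send_at` timestamp and `sent` boolean).
const { Pool } = require('pg');

const pool = new Pool(); // connection settings come from PG* env vars

// Stand-in for your actual push delivery (FCM, APNs, etc.).
async function sendPush(payload) {
  console.log('sending push:', payload);
}

async function pollDueNotifications() {
  // Atomically claim due, unsent rows so a retry can't double-send them.
  const { rows } = await pool.query(
    `UPDATE notifications
        SET sent = true
      WHERE sent = false AND send_at <= now()
      RETURNING id, payload`
  );
  for (const row of rows) {
    await sendPush(row.payload);
  }
}

// Re-arm the timer only after the previous run finishes, so a slow query
// never overlaps the next tick (hence setTimeout, not setInterval).
async function loop() {
  try {
    await pollDueNotifications();
  } catch (err) {
    console.error('poll failed:', err);
  }
  setTimeout(loop, 60 * 1000); // 1-2 minute interval, per the question
}

loop();
```

A cron-driven script would run roughly the same query once per invocation and then exit, which sidesteps the long-running-process concerns above at the cost of the monitoring tradeoff just described.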

Related

AWS and NodeJS architecture for a scheduled/cron task in a multi-server setup

I am using AWS services to deploy my application, which currently has the production site set up on an application load balancer running 2 instances of my NodeJS server.
My current concern is that if I just set up node-cron to trigger a task at 5:00am, it will do this on each server I spin up.
I need to implement an email delivery system where, at 5:00am, it will query a database table I made in order to generate customized emails (I need to iterate over each individual's record, which has a unique array that helps build a list of items for each user). I then fire the object off to AWS SES.
What are some ways you have done this?
Currently, based off my readings, I am looking at two options:
Set up a node-cron child process within one cluster (but if I have auto-scaling, wouldn't this create duplicate node-cron tasks?); this would probably require Redis and tracking the process across servers
OR
Set up an EventBridge rule which fires api.mybackendserver.com/send-email-event, where I then carry out my logic. (This seems like the simpler approach; the drawback would be potential CPU/RAM spikes, which would be fine as I'm regionally based and would do this in off-peak hours.)
EventBridge is definitely the way to go for CRON-style scheduling. If you're worried about usage spikes, you could use the CRON rule to invoke a Lambda function that pushes an event to SQS for each job; those would then be polled by your EC2 instances.
Another way would be to schedule a task to increase the number of instances before the cron event occurs.
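A rough sketch of that Lambda-to-SQS fan-out, assuming the AWS SDK v3 SQS client; the queue URL and the loadUsersDueForEmail query are hypothetical stand-ins:

```js
// Hypothetical EventBridge-scheduled Lambda: fan the 5:00am job out to SQS,
// one message per user, so exactly one place owns the schedule and the
// EC2 fleet can drain the queue at its own pace.
const { SQSClient, SendMessageBatchCommand } = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({});
const QUEUE_URL = process.env.EMAIL_QUEUE_URL; // hypothetical env var

// Stand-in for the DB query that builds each user's item list.
async function loadUsersDueForEmail() {
  return []; // e.g. [{ id: 1, items: [...] }, ...]
}

exports.handler = async () => {
  const users = await loadUsersDueForEmail();
  // SQS batch sends are capped at 10 entries per call.
  for (let i = 0; i < users.length; i += 10) {
    const entries = users.slice(i, i + 10).map((user, j) => ({
      Id: String(i + j), // must be unique within the batch
      MessageBody: JSON.stringify({ userId: user.id, items: user.items }),
    }));
    await sqs.send(
      new SendMessageBatchCommand({ QueueUrl: QUEUE_URL, Entries: entries })
    );
  }
};
```

This also sidesteps the duplicate-cron problem: the schedule lives in EventBridge, not on any of the auto-scaled servers.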

Alternative to Cloud Tasks / Cron / Task Queue on GCP in Python 3 that doesn't have a 10-minute timeout

I've recently started using App Engine on Google Cloud Platform and have set up some cron jobs to get some scheduled work done. However, one of my tasks recently took more than 10 minutes and it timed out... obviously I could break this work into batches or find another way around the problem, but I'm keen not to always have to be mindful of how long a job might take, and I want future jobs to run until completed or failed.
I've looked into various services that Google offers, but with no success; Task Queue is Python 2.x only, and Cloud Tasks has the same 10-minute limit unless you manually manage scaling (which I would prefer to stay automatic, as that's the point of App Engine for me).
Am I missing something? This 10-minute limit seems like a big, unnecessary blocker and I have no idea where to look.
https://cloud.google.com/tasks/docs/creating-appengine-handlers
Thanks for your time.
Google services such as App Engine are designed around a web server HTTP request/response model. You are trying to use them as task-execution engines.
Use the correct service if you require long execution times, which usually means requests that take more than a few minutes to complete: use Cloud Tasks and Compute Engine. Otherwise, you will need to architect your application to fit App Engine's requirements and limitations.
Cloud Tasks for Asynchronous task execution
If you want to use App Engine, you need to use either basic scaling or manual scaling. I understand that manual scaling isn't your favorite; I don't like that mode either. But basic scaling is acceptable.
In addition, basic scaling is designed to perform background tasks, which is exactly what you're trying to achieve.
If you accept this change, you can use Cloud Tasks. You get up to 24 hours of timeout if your App Engine service uses basic (or manual) scaling.
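As a rough sketch (project, location, queue, service, and handler names are all hypothetical; the Python google-cloud-tasks client is analogous), enqueueing a task that targets a basic-scaling service looks something like this:

```js
// Rough sketch: enqueue a Cloud Task aimed at an App Engine handler running
// on a basic-scaling service, using @google-cloud/tasks.
const { CloudTasksClient } = require('@google-cloud/tasks');

async function enqueueLongJob(jobId) {
  const client = new CloudTasksClient();
  const parent = client.queuePath('my-project', 'us-central1', 'long-jobs');
  await client.createTask({
    parent,
    task: {
      appEngineHttpRequest: {
        httpMethod: 'POST',
        relativeUri: '/tasks/long-job',          // handler in your app
        appEngineRouting: { service: 'worker' }, // the basic-scaling service
        body: Buffer.from(JSON.stringify({ jobId })).toString('base64'),
      },
    },
  });
}
```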
You'll find this same information in the scaling description in the App Engine documentation.
When you use basic scaling, your instance class needs to be updated to a B-class instance (BXXX).
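For reference, a basic-scaling configuration in app.yaml looks roughly like this (service name, runtime, instance class, and limits are illustrative):

```yaml
# Illustrative app.yaml for a basic-scaling App Engine service.
service: worker
runtime: python39     # example runtime
instance_class: B2    # basic/manual scaling requires a B-class instance
basic_scaling:
  max_instances: 5
  idle_timeout: 10m
```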

Running Locust in distributed mode on Azure Functions

I am building a small utility that packages Locust - the performance testing tool (https://locust.io/) - and deploys it on Azure Functions. Just a fun side project to get some hands-on experience with the serverless craze.
Here's the git repo: https://github.com/amanvirmundra/locust-serverless.
Now I am thinking it would be great to run the Locust test in distributed mode on a serverless architecture (Azure Functions consumption plan). Locust supports distributed mode, but it needs the slaves to communicate with the master using its IP. That's the problem!
I can provision multiple functions, but I am not quite sure how I can make them talk to each other on the fly (without manual intervention).
Thinking out loud:
Somehow get the IP of the master function and pass it on to the slave functions. Not sure if that's possible in Azure Functions, but some people have figured out a way to get the IP of an Azure Function using .NET libraries. Mine is a Python version, but I am sure that if it can be done using .NET, then there would be a Python way as well.
Create some sort of VPN and map a function to a private IP. Not sure if this sort of mapping is possible in Azure.
Someone has done this using AWS Lambdas (https://github.com/FutureSharks/invokust). Ask that person or try to understand the code.
Need advice figuring out what's possible while keeping things serverless. Open to ideas and/or code contributions :)
Update
This is the current setup:
The performance test session is triggered by an HTTP request, which takes in the number of requests to make, the base URL, and the number of concurrent users to simulate.
The locustfile defines the test setup and orchestration.
Run.py triggers the tests.
What I want to do now is to have a master/slave setup (cluster) for a massive-scale perf test.
I would imagine that the master function is triggered by an HTTP request, with a similar payload.
The master will in turn trigger slaves.
When the slaves join the cluster, the performance session would start.
What you describe doesn't sound like a good use case for Azure Functions.
Functions are supposed to be:
Triggered by an event
Short running (max 10 minutes)
Stateless and ephemeral
That said, Functions are indeed good for load testing, but the setup should be different:
You define a trigger for your Function (e.g. HTTP, or Event Hub)
Each function execution makes a given number of requests, in parallel or sequentially, and then quits
There is an orchestrator somewhere (e.g. just a console app), which sends "commands" (an HTTP call or an Event) to trigger the Function
So, the Functions "multiply" the load on the schedule defined by the orchestrator. You rely on Consumption Plan scalability to make sure that enough executions are provisioned at any given time.
The biggest difference is that function executions don't talk to each other, so they don't need IPs.
I think the mentioned AWS Lambda-based example is just calling Lambdas too; it does not set up master/slave Lambdas talking to each other.
I guess my point is that you might not need the Locust framework at all, and could instead leverage the built-in capabilities of autoscaled FaaS.
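A minimal sketch of one such "load multiplier" execution, as an HTTP-triggered Node.js Function (the query parameter names are made up):

```js
// Hypothetical HTTP-triggered Azure Function: each execution fires a fixed
// batch of requests at the target and exits. The orchestrator triggers many
// executions; the Consumption Plan supplies the parallelism.
module.exports = async function (context, req) {
  const target = req.query.url;                      // base URL to hit
  const count = parseInt(req.query.count, 10) || 10; // requests per execution

  // Global fetch is available on Node 18+ runtimes.
  const statuses = await Promise.all(
    Array.from({ length: count }, () =>
      fetch(target).then((res) => res.status).catch(() => 'error')
    )
  );

  context.res = { body: { target, issued: count, statuses } };
};
```

Note that executions never need to discover each other's IPs, because nothing is coordinated beyond the orchestrator's triggering schedule.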

Trigger multiple concurrent Service Bus-triggered Azure Functions without time degradation

I have a Service Bus-triggered function that, when receiving a message from the queue, will do a simple DB call and then send out emails/SMS. Can I put >1000 messages in my Service Bus queue to trigger the function to run simultaneously without the run time being affected?
My concern is that I queue up 1000+ messages to trigger my function all at the same time, say 5:00PM, to send out the emails/SMS. If they end up running later because there are so many running threads, the users receiving the emails/SMS won't get them until an hour after the designated time!
Is this a concern and if so is there a remedy?
FYI - I know I can make the function run asynchronously; would that make any difference in this scenario?
1000 messages is not a big number. If your email/SMS service can handle them fast, the whole batch will be gone relatively quickly. A few things to know, though:
Functions won't scale to 1000 parallel executions in this case. They will start with 1 instance doing ~16 parallel calls at the same time, observe how fast the processing goes, then maybe add a second instance, wait again, etc.
The exact scaling behavior is not publicly described and can change over time. Thus, YMMV, and you need to test against your specific scenario.
Yes, make the functions async whenever you can. I don't expect a huge boost in processing speed just because of that, but it certainly won't hurt.
Bottom line: your scenario doesn't sound like a problem for Functions, but if you need very short latency, you'll have to run a test before relying on it.
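To illustrate the "make it async" point, here's a sketch of a Service Bus-triggered Function in Node.js where every I/O call is awaited (lookupUser, sendEmail, and sendSms are stand-ins for your real dependencies):

```js
// Hypothetical Service Bus-triggered Azure Function: one execution per queued
// message. All I/O is awaited, so an instance's ~16 concurrent calls aren't
// blocked behind a single slow dependency.

// Stand-ins for your real DB / email / SMS clients.
async function lookupUser(id) { return { email: 'user@example.com', phone: null }; }
async function sendEmail(to, body) { /* call your email provider */ }
async function sendSms(to, body) { /* call your SMS provider */ }

module.exports = async function (context, message) {
  const user = await lookupUser(message.userId); // the "simple db call"
  await Promise.all([
    sendEmail(user.email, message.body),
    user.phone ? sendSms(user.phone, message.body) : Promise.resolve(),
  ]);
  context.log(`notified user ${message.userId}`);
};
```

If per-instance parallelism matters for your latency target, the Service Bus extension also exposes a maxConcurrentCalls setting in host.json (its exact location varies by extension version) that you can tune during that load test.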
I'm assuming you are talking about an Azure Service Bus binding to an Azure Function. There should be no issue with >1000 Azure Functions firing at the same time. It is a serverless runtime and should be able to scale greatly if you are running under a consumption plan. If you are running the functions in an App Service plan, you may be limited by that plan.
In your scenario you are more likely to overwhelm the downstream dependencies (the database and the SMS sending system) before you overwhelm the Azure Functions infrastructure.
The best thing to do is some load testing, and to monitor the exceptions coming out of the connections to the database and SMS systems.

Migrating Task Queues to Cloud Functions

We're using Google App Engine Standard Environment for our application. The runtime we are using is Python 2.7. We have a single service which uses multiple versions to deploy the app.
Most of our long-running tasks are done via Task Queues. Most of those tasks do a lot of Cloud Datastore CRUD operations. Whenever we have to send the results back to the front end, we use Firebase Cloud Messaging for that.
I wanted to try out Cloud Functions for those tasks, mostly to take advantage of the serverless architecture.
So my question is: what sort of benefits can I expect if I migrate the tasks from Task Queues to Cloud Functions? Is there any guideline that tells when to use which option? Or should we stay with Task Queues?
PS: I know that migrating code written in Python to Node.js will be troublesome, but I am ignoring that for the time being.
Apart from the advantage of being serverless, Cloud Functions respond to specific events, "gluing" elements of your architecture together in a logical way. They are elastic and scale automatically, spinning up and down depending on the current demand (and therefore incur costs only when they are actually used). On the other hand, Task Queues are a better choice if managing execution concurrency is important to you:
Push queues dispatch requests at a reliable, steady rate. They guarantee reliable task execution. Because you can control the rate at which tasks are sent from the queue, you can control the workers' scaling behavior and hence your costs.
This is not possible with Cloud Functions, which each handle only one request at a time and run in parallel. Another thing for which Task Queues would be a better choice is handling retry logic for operations that didn't succeed.
Something you can also do with Cloud Functions, together with App Engine cron jobs, is run a function on a time interval rather than an event trigger.
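For example, a cron.yaml entry that pings an App Engine handler, which in turn invokes the (HTTP-triggered) function, might look like this (the path and schedule are illustrative):

```yaml
# Illustrative cron.yaml: App Engine Cron hits an app handler on a schedule;
# the handler then calls the HTTP-triggered Cloud Function.
cron:
- description: periodic trigger for the background function
  url: /tasks/trigger-function
  schedule: every 10 minutes
```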
Just as a side note, Google is also working on Python support for Cloud Functions. It is not known when that will be ready, but it will surely be announced on the Google Cloud Platform Blog.