I have a backend written in Adonis v4.1 (node.js) that I'm deploying on AWS ECS.
Inside the Task Definition of my API I have Adonis and Redis, linked by a bridge network.
I'm using adonis-scheduler to run a cronjob every 30 minutes. This cronjob makes a db query and for each row it creates a job (with the adonis-queue-pro library). The job concurrency is 1 (so if I have 30 records they will be processed one job at a time). Each job makes an external API call and updates the record in the db.
On localhost everything works fine, but on the staging environment it sometimes happens that my API Task has more than one instance, and more than one cronjob starts at the exact same time. So instead of having one job for each record, I have multiple jobs for each record running simultaneously. This is a big problem.
Is there a way to handle this situation and have the cronjob of only one instance execute?
Or could I maybe have a Task Definition with only the scheduler part, always as a single instance, and the API and Redis with many instances?
The problem is that the code I wrote is deeply linked to my backend, so I cannot use something like lambda functions or other external services.
With ECS, you can have REPLICA or DAEMON service types. DAEMON ensures that exactly one copy of the task is placed on each container instance in the cluster. In your case, that would mean running exactly one container instance behind the cluster. If this constraint is acceptable, give it a go and see if the problem persists.
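If you'd rather keep multiple API task instances, another option (not specific to Adonis) is to let every instance's scheduler fire, but take a short-lived lock in the Redis you already run before enqueueing anything, so only one instance does the work per cycle. A minimal sketch, assuming the ioredis client; the key name, TTL and task name are hypothetical:

```js
// cron-lock.js -- hypothetical helper, assuming the ioredis client
const Redis = require('ioredis')
const redis = new Redis(process.env.REDIS_URL)

// SET ... NX EX is atomic: only one caller gets 'OK' per TTL window,
// and the key expires on its own if the winning instance crashes.
async function acquireCronLock (name, ttlSeconds) {
  const result = await redis.set(`cron-lock:${name}`, String(process.pid), 'EX', ttlSeconds, 'NX')
  return result === 'OK'
}

module.exports = { acquireCronLock }

// In the scheduled task (which still runs on every instance every 30 minutes):
//   if (!(await acquireCronLock('process-records', 25 * 60))) return // another instance won
//   ...query the db and enqueue one job per record as before...
```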
Related
I am using AWS services to deploy my application, which currently has the production site set up behind an application load balancer running 2 instances of my NodeJS server.
My current concern is that if I just set up node-cron to trigger a task at 5:00am, it will do this on each server I spin up.
I need to implement an email delivery system where, at 5:00am, it will query the database table I made in order to generate customized emails (I need to iterate over each individual's record, which has a unique array that helps build the list of items for each user). I then fire the object off to AWS SES.
What are some ways you have done this?
Currently, based on my reading, I am looking at two options:
Set up a node-cron child process within one cluster (but if I have auto-scaling, wouldn't this create duplicate node-cron tasks?); this would probably require Redis and tracking the process across servers
OR
Set up an EventBridge rule which fires api.mybackendserver.com/send-email-event, where I then carry out my logic (this seems like the simpler approach, and the drawback would be potential CPU/RAM spikes, which would be fine as I'm regionally based and would do this in off-peak hours).
EventBridge is definitely the way to go for CRON-style scheduling. If you're worried about usage spikes, you could have the CRON rule invoke a Lambda function that pushes an event to SQS for each job; those would then be polled by your EC2 instances.
Another way would be to schedule a task that increases the number of instances before the cron event occurs.
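For the Lambda-to-SQS variant, a rough sketch of the handler in Node.js, assuming the AWS SDK v3 (`@aws-sdk/client-sqs`); the queue URL and the `fetchUsersDueForEmail` query are placeholders for your own setup:

```js
// Hypothetical EventBridge-triggered Lambda: fans out one SQS message per user.
const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs')

const sqs = new SQSClient({})

// Placeholder: your 5:00am query against the custom table described above.
async function fetchUsersDueForEmail () {
  return [] // e.g. [{ id: 1, items: [...] }, ...]
}

exports.handler = async () => {
  const users = await fetchUsersDueForEmail()
  for (const user of users) {
    await sqs.send(new SendMessageCommand({
      QueueUrl: process.env.EMAIL_QUEUE_URL, // placeholder queue URL
      MessageBody: JSON.stringify({ userId: user.id, items: user.items })
    }))
  }
  return { queued: users.length } // workers on EC2 poll the queue and call SES
}
```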
I need to create a node.js service that is supposed to run a large number (potentially hundreds) of scheduled jobs simultaneously. This service should also expose a REST interface to allow end users to perform CRUD on these jobs.
At first I thought of going with agenda.js and, since we use k8s, launching a few instances so we could deal with this number of jobs.
However, I also had another idea and wanted to see if somebody has already done it: since we use k8s, I thought of harnessing the power of k8s Jobs and creating a service that communicates with the k8s API to manage the jobs.
Is it feasible? What things do I have to take into consideration if I'm going in this direction?
What you want is basically the definition of a Kubernetes operator, and yes, it is possible to do what you want.
In your case, you can use the Kubernetes client for Node.js.
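A rough sketch of what creating such a Job could look like with the official `@kubernetes/client-node` package (the image, namespace and naming are made up, and the exact method signature varies between client versions):

```js
// Hypothetical example: the CRUD service creates a k8s Job per scheduled task.
const k8s = require('@kubernetes/client-node')

const kc = new k8s.KubeConfig()
kc.loadFromDefault() // in-cluster config or local kubeconfig
const batchApi = kc.makeApiClient(k8s.BatchV1Api)

async function createJob (jobId, args) {
  const manifest = {
    apiVersion: 'batch/v1',
    kind: 'Job',
    metadata: { name: `scheduled-job-${jobId}` },
    spec: {
      backoffLimit: 2,
      template: {
        spec: {
          restartPolicy: 'Never',
          containers: [{ name: 'runner', image: 'my-org/job-runner:latest', args }] // placeholder image
        }
      }
    }
  }
  // Older client versions: createNamespacedJob(namespace, body);
  // newer ones take a single { namespace, body } options object instead.
  return batchApi.createNamespacedJob('default', manifest)
}
```

For recurring work you would create a CronJob resource instead, so the schedule lives in the manifest rather than in your service.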
I have to implement scheduled push notifications in the backend (Node.js), which are triggered only at a certain time. So I need to query the DB (PostgreSQL) every 1-2 minutes and find only those notifications which need to be triggered.
What is the better solution?
Use internal setTimeout query function
or
External CRON script which will trigger querying function in Node.js?
Thank you
If the second option is just a cron that makes an HTTP request to your service, it's pretty equivalent. If, instead, the solution is packaged as a script and the cron drives that script directly, it has a couple of tradeoffs, mainly around operations:
Use internal setTimeout query function
This means you have to launch a long-running service and keep it running. Things like memory leaks may become an issue.
External CRON script which will trigger querying function in Node.js?
This is the strategy that Google Cloud (GCP) uses for its cron offering. If the cron just pings a web URL, the solutions are pretty equivalent.
IMO, the biggest issue with both of these is being careful about coupling a background (async) workload with an online workload. If your service is serving real-time HTTP requests but is also running these background workloads, the background work takes resources away from servicing synchronous HTTP requests. If they are two fundamentally different workloads, then it also makes sense to separate them for scaling purposes.
I've been in a situation where monitoring actually informed this decision. The company used Prometheus and didn't have Pushgateway installed, so a cron-based solution had zero metric visibility, whereas adding metrics/alerting to the service version was trivial.
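For scale, the internal-timer variant is only a handful of lines; a minimal sketch, assuming the `pg` client and a hypothetical `notifications` table:

```js
// Hypothetical in-process poller (option 1); table and column names are made up.
const { Pool } = require('pg')
const pool = new Pool() // reads the standard PG* environment variables

// Placeholder for the actual push-provider call (FCM/APNs/etc.).
async function sendPush (payload) {
  console.log('sending push', payload)
}

async function pollAndSend () {
  const { rows } = await pool.query(
    'SELECT id, payload FROM notifications WHERE send_at <= now() AND sent = false'
  )
  for (const row of rows) {
    await sendPush(row.payload)
    await pool.query('UPDATE notifications SET sent = true WHERE id = $1', [row.id])
  }
}

// Re-arm the timer only after the previous run finishes, so a slow query
// can never overlap with the next tick.
async function loop () {
  try { await pollAndSend() } catch (err) { console.error(err) }
  setTimeout(loop, 60 * 1000)
}
loop()
```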
I am building a small utility that packages Locust, a performance testing tool (https://locust.io/), and deploys it on Azure Functions. Just a fun side project to get some hands-on experience with the serverless craze.
Here's the git repo: https://github.com/amanvirmundra/locust-serverless.
Now I am thinking that it would be great to run Locust tests in distributed mode on a serverless architecture (Azure Functions consumption plan). Locust supports distributed mode, but it needs the slaves to communicate with the master using its IP. That's the problem!
I can provision multiple functions, but I am not quite sure how I can make them talk to each other on the fly (without manual intervention).
Thinking out loud:
Somehow get the IP of the master function and pass it on to the slave functions. I'm not sure if that's possible in Azure Functions, but some people have figured out a way to get the IP of an Azure Function using .NET libraries. Mine is a Python version, but I'm sure that if it can be done using .NET then there would be a Python way as well.
Create some sort of a VPN and map a function to a private IP. I'm not sure if this sort of mapping is possible in Azure.
Someone has done this using AWS Lambdas (https://github.com/FutureSharks/invokust). Ask that person or try to understand the code.
I need advice in figuring out what's possible while keeping things serverless. Open to ideas and/or code contributions :)
Update
This is the current setup:
The performance test session is triggered by an HTTP request, which takes in the number of requests to make, the base URL, and the number of concurrent users to simulate.
The Locustfile defines the test setup and orchestration.
Run.py triggers the tests.
What I want to do now is to have a master/slave setup (cluster) for a massive-scale perf test.
I would imagine that the master function is triggered by an http request, with a similar payload.
The master will in turn trigger slaves.
When the slaves join the cluster, the performance session would start.
What you describe doesn't sound like a good use case for Azure Functions.
Functions are supposed to be:
Triggered by an event
Short running (max 10 minutes)
Stateless and ephemeral
That said, Functions are indeed good for load testing, but the setup should be different:
You define a trigger for your Function (e.g. HTTP, or Event Hub)
Each function execution makes a given number of requests, in parallel or sequentially, and then quits
There is an orchestrator somewhere (e.g. just a console app) which sends "commands" (an HTTP call or an Event) to trigger the Function
So, Functions are "multiplying" the load as per the schedule defined by the orchestrator. You rely on Consumption Plan scalability to make sure that enough executions are provisioned at any given time.
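As an illustration of that shape (not of the Locust setup above), a single execution could look roughly like this in JavaScript, assuming the classic Node.js programming model for Functions and a runtime with a global fetch; the payload fields are made up:

```js
// loadworker/index.js -- hypothetical HTTP-triggered function: fire a batch
// of requests and quit. The orchestrator decides how many executions to
// trigger and on what schedule.
module.exports = async function (context, req) {
  const { targetUrl, requests = 100, concurrency = 10 } = req.body || {}

  let sent = 0
  async function worker () {
    // The check and increment happen in the same synchronous turn,
    // so the workers never overshoot the requested total.
    while (sent < requests) {
      sent++
      await fetch(targetUrl)
    }
  }

  // `concurrency` sequential loops running in parallel, then exit.
  await Promise.all(Array.from({ length: concurrency }, worker))
  context.res = { body: { requestsSent: sent } }
}
```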
The biggest difference is that function executions don't talk to each other, so they don't need IPs.
I think the mentioned example based on AWS Lambda just calls Lambdas too; it does not set up master/client Lambdas talking to each other.
I guess my point is that you might not need that Locust framework at all, and instead leverage the built-in capabilities of autoscaled FaaS.
I am trying to port an application to the Azure platform. I want to run an existing application multiple times. My initial idea is as follows: I have a master_process and many slave_processes. Each process is a worker role in Azure. Each slave_process will run an instance of the application independently. I want the master_process to start many slave_processes and provide them with the input arguments. At the end, the master_process will collect the results. Currently, I have a working setup for calling the whole application from a C# wrapper. So, to succeed, I need two things: first, I have to find a way to start slave workers inside of a master worker (just like threads); second, I need to find a way to store the results of the slave workers and reach these result files from the master worker. Can anyone help me?
I think I would try and solve the problem differently. Deploying a whole new instance can take 15 to 30 minutes. Adding extra instances to an already running worker role is a little quicker, but not by much. I'm going to presume that you want results faster than that and that this process is something that is run frequently.
I would have just one worker role type that runs your existing logic, and as many instances of that worker role as you determine you'll need. Whatever your client is will decide that it needs to break the work up into a certain number of pieces, let's say 10 for the sake of argument. It will give each piece of work an ID (e.g. a GUID) and then put 10 messages that contain the parameters and the ID into a queue. Your worker role instances take messages off the queue, do their work and write their results to storage somewhere (either SQL Azure, Azure Table Storage or maybe even blob storage, depending on what the results are). The client polls that storage, waits for all of the results to be complete and then carries on.
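A minimal sketch of the client-side fan-out, written in Node.js for illustration (the question itself is C#/worker roles) and assuming the modern `@azure/storage-queue` package; the queue name and message shape are placeholders:

```js
// Hypothetical fan-out: one message per piece of work, each tagged with an ID
// that the client later uses to poll the results storage.
const { QueueClient } = require('@azure/storage-queue')
const { randomUUID } = require('crypto')

async function fanOut (pieces) {
  const queue = new QueueClient(process.env.AZURE_STORAGE_CONNECTION_STRING, 'work-items')
  const ids = []
  for (const params of pieces) {
    const id = randomUUID()
    ids.push(id)
    await queue.sendMessage(JSON.stringify({ id, params }))
  }
  return ids // poll your results table/blobs for these IDs until all are done
}
```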
If this is a process that is only run infrequently, then rather than having the worker roles deployed all of the time, you could use the same method I've described, but in addition have the client code deploy the worker roles when it starts and then delete them through the management API when it's finished. There are samples on MSDN on how to do this.
I have a similar situation you might find useful:
I have a large sequential batch process I run in Azure which requires pre- and post-processing. The technique I used was to run instances of a single multifunctional worker role, but to use a "quorum" to nominate a head node, which then controls the workflow.
The way I do it is by using an Azure page blob as the quorum (basically a kind of global mutex/lock), because once a node grabs it for writing, it's locked. For resilience, in case there's an issue with the head node, all nodes occasionally try to recapture the quorum.
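The same head-node election can be reproduced today with a blob lease, which is effectively the lock described above. A rough sketch, assuming the `@azure/storage-blob` package; the container and blob names are made up, and the blob itself has to exist before a lease can be taken:

```js
// Hypothetical quorum via a blob lease, assuming @azure/storage-blob.
const { BlobServiceClient } = require('@azure/storage-blob')

async function tryBecomeHeadNode () {
  const service = BlobServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING)
  // The 'quorum' blob must already exist (e.g. upload an empty blob at deploy time).
  const blob = service.getContainerClient('coordination').getBlockBlobClient('quorum')
  try {
    // Leases last 15-60 seconds; the head node must renew before expiry,
    // which is what lets the other nodes recapture the quorum if it dies.
    const lease = blob.getBlobLeaseClient()
    await lease.acquireLease(60)
    return lease // caller keeps calling lease.renewLease() while it is the head
  } catch (err) {
    return null // another node holds the lease; stay a worker
  }
}
```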