How to fail over a node.js timer behind an Amazon load balancer? - node.js

I have set up 2 instances behind an AWS load balancer. I have deployed node.js web services + MongoDB on both instances. The load balancer works fine with the web services.
But the problem is that I have one timer service (a node.js service only). This timer updates my MongoDB based on some calculation.
I need to run this timer service (timer.js) on only one AWS instance (out of the 2) at a time, and I expect that if that instance goes down, the timer service on the other instance will take over.
I know ELB does not provide this kind of facility. Can anyone please help me make it work?
Condition: only one timer service may run at a time behind the Amazon load balancer.
Thanks.

You would have to implement this yourself with a locking algorithm built on a shared data store that supports atomic operations.
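A minimal sketch of that idea, assuming MongoDB (which you already run) as the shared store and the official mongodb driver; the database name, collection name, and interval are placeholders. Each instance attempts an atomic upsert on a lock document before doing the timer work, so only one instance wins per interval:

    const { MongoClient } = require('mongodb');

    const LOCK_TTL_MS = 60 * 1000; // one run per minute across all instances

    async function runIfLockAcquired(db, job) {
      const now = new Date();
      try {
        // Atomic: either refresh a stale lock, insert a fresh one, or fail
        // with a duplicate-key error on _id if another instance holds it.
        await db.collection('locks').updateOne(
          { _id: 'timer-job', lockedAt: { $lt: new Date(now.getTime() - LOCK_TTL_MS) } },
          { $set: { lockedAt: now } },
          { upsert: true }
        );
        await job(); // this instance won the lock for this interval
      } catch (err) {
        if (err.code === 11000) return; // another instance already holds the lock
        throw err;
      }
    }

    async function main() {
      const client = await MongoClient.connect(process.env.MONGO_URL);
      const db = client.db('myapp'); // placeholder database name
      setInterval(() => {
        runIfLockAcquired(db, async () => {
          // the original timer work: update MongoDB based on some calculation
          console.log('timer job ran on this instance');
        }).catch(console.error);
      }, LOCK_TTL_MS);
    }

    main().catch(console.error);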
Alternatively, consider starting a "timer" server in an Auto Scaling Group with Min: 1, Max: 1 so Amazon keeps it running. This instance can be a t2.micro, which is very cheap. It can either run the job itself, or just make an http request to your load balancer to run the job at the desired interval. If you do that, only one of your servers will run each job.

Wouldn't it make more sense to handle this like any other "service" that needs to keep running?
upstart service
running node.js server using upstart causes 'terminated with status 127' on 'ubuntu 10.04'
This guy had a bad path in his file but his upstart script looks okay
monit
Node.js (sudo) and monit

Related

Heroku - restart on failed health check

Heroku does not support health checks on its own. It will restart services that have crashed, but there is nothing like health checks.
It sometimes happens that a service becomes unresponsive while the process is still running. In most modern cloud solutions, you can provide a health endpoint which is periodically called by the hosting service; if that endpoint returns an error or does not respond at all, the service is shut down and a new one is started.
That seems like an industry standard these days, but I am unable to find any solution for this on Heroku. I could use an external service with the Heroku CLI, but just calling some endpoint is not sufficient - if there are multiple instances, they all share the same URL and the load balancer routes to one of them at random, so it is possible to never hit the failed instance at all. Even when I do hit it, health checks usually work like "after 3 failed checks in a row, restart that instance", which is highly improbable to trigger if there are 10 instances and only one of them becomes unhealthy.
Do you have any solution to this?
You are right that this is an industry standard, and it's a shame that it's not provided out of the box.
I can think of 2 solutions (both involve running some extra code that does all of this):
a) use the Heroku API, which allows you to get the IP of individual dynos, and then you can call each dyno however you want
b) from each dyno instance you can send a request to a web server, e.g. https://iamaalive.com/?dyno=${process.env.HEROKU_DYNO_ID}
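A minimal sketch of option (b), assuming the monitoring endpoint (https://iamaalive.com here is just the placeholder from above) records the last time each dyno checked in and alerts when one goes quiet:

    // heartbeat.js - each dyno pings the external monitor once a minute
    const https = require('https');

    const PING_INTERVAL_MS = 60 * 1000;
    const dynoId = process.env.HEROKU_DYNO_ID || process.env.DYNO || 'unknown';

    setInterval(() => {
      https
        .get(`https://iamaalive.com/?dyno=${encodeURIComponent(dynoId)}`, (res) => {
          res.resume(); // drain the response so the socket is released
        })
        .on('error', (err) => console.error('heartbeat failed:', err.message));
    }, PING_INTERVAL_MS);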

AWS EBS runs into "504 Gateway Time-out"

I'm new to using AWS EBS and ECS, so please bear with me if I ask questions that might be obvious to others. On to the issue:
I've got a single-container Node/Express application that runs on EBS. The local docker container works as expected. On EBS, I can access one endpoint of the API and get the expected output. For the second endpoint, which runs longer (around 10-15 seconds), I get no response and after 60 seconds run into a timeout: "504 Gateway Time-out".
I wonder how I should approach debugging this, as I can't connect to the container directly. There currently isn't any debugging functionality included in the code either, as I'm not sure what the best Node approach for an EBS container is - any recommendations are highly appreciated.
Thank you in advance!
You can see the EC2 instances running under EBS in your AWS console, and you can choose to give them IP addresses in your EBS options. That will let you SSH directly into them if you need to.
Otherwise, check the keepAliveTimeout field on your server (the value returned by app.listen() if you're using Express).
I got a decent number of 504s when my Node server timeout was less than my load balancer timeout.
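A minimal sketch of that fix, assuming Express and a load balancer idle timeout of 60 seconds (the values are placeholders; the point is that the Node timeouts must exceed the balancer's):

    const express = require('express');
    const app = express();

    app.get('/slow', async (req, res) => {
      // ... long-running work (10-15 seconds) ...
      res.json({ ok: true });
    });

    const server = app.listen(process.env.PORT || 3000);
    server.keepAliveTimeout = 65 * 1000; // longer than the balancer's 60s idle timeout
    server.headersTimeout = 66 * 1000;   // must be greater than keepAliveTimeout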
Your application takes longer than expected (> 60 seconds) to respond, so either nginx or the Load Balancer terminates your request.
See my answer here

AWS EC2 boots via scheduled Lambda, how to alert of errors?

My EC2 instance boots daily for 5 minutes before shutting down.
On bootup, a NodeJS script is executed. Usually this script will complete long before the 5 minutes are up, but I'd like to be notified (SMS/email) whenever it doesn't.
What is the correct approach? I can try to send a notification within my NodeJS code after 5 minutes if execution wasn't finished, but Lambda could shut down the instance before this occurs.
I'm quite new to AWS so I apologize if this is rather basic, I haven't had luck on Google with this issue.
Can you check whether whatever the Node script does when the EC2 instance is up could be replicated with one or more Lambda functions.
Think about serverless and microservices architecture. In theory, any workflow that needs servers could be achieved via AWS Lambda functions and various triggers. In your case I can think of the following:
SES to send out email messages (see the sketch after this list)
API Gateway to expose your Lambda function as an HTTP trigger
CloudWatch Events to trigger a Lambda function like a cron job.
I would be surprised to learn that serverless won't work here. Please do share the case so that I can brainstorm more and share a solution.
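For the SES piece, a minimal sketch of a Lambda handler that emails an alert, assuming the Node AWS SDK v2 and placeholder addresses (the sender must be verified in SES); a CloudWatch Events schedule or an API Gateway call could trigger it:

    const AWS = require('aws-sdk');
    const ses = new AWS.SES({ region: 'us-east-1' }); // placeholder region

    exports.handler = async (event) => {
      await ses.sendEmail({
        Source: 'alerts@example.com',                  // must be verified in SES
        Destination: { ToAddresses: ['me@example.com'] },
        Message: {
          Subject: { Data: 'Nightly script did not finish in time' },
          Body: { Text: { Data: `Details: ${JSON.stringify(event)}` } },
        },
      }).promise();
      return { status: 'sent' };
    };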

Cron job on NodeJS server runs multiple times simultaneously due to load balancers

I have cron job services on my nodeJS server (part of a React app) that I deploy using Convox to AWS, which has 4 load-balanced servers. This means my cron job runs simultaneously on all 4 servers, when I only want it to run once. How can I stop this from happening and have my cron jobs run only once? As far as I know, there is no reliable way to lock my cron to a specific instance, since instances are volatile and may be deleted/recreated as needed.
The cron job services conduct tasks such as querying and updating our database, sending out emails and texts to users, and conducting external API calls. The services are run using the cron npm package, upon the server starting (after server.listen).
Can you expose these tasks via URL? That way you can have an external cron service that requests each job via URL against the ELB.
See https://cron-job.org/en/
Another advantage of this approach is that you get error reports if a URL does not return a 200 status. This could simplify error tracking across all jobs.
Also, this provides better redundancy and load balancing, as opposed to having a single instance where you run all jobs.
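A minimal sketch of that approach, assuming Express; the route name, token, and task body are placeholders. The external cron service calls the route through the ELB, so only the single instance that receives the request runs the job:

    const express = require('express');
    const app = express();

    // Hypothetical task body: query the DB, send emails, call external APIs, etc.
    async function runDailyDigest() {
      console.log('daily digest ran at', new Date().toISOString());
    }

    app.post('/jobs/daily-digest', async (req, res) => {
      // Only the external cron service knows the shared secret.
      if (req.get('x-job-token') !== process.env.JOB_TOKEN) {
        return res.sendStatus(403);
      }
      try {
        await runDailyDigest();
        res.sendStatus(200); // 200 tells the cron service the run succeeded
      } catch (err) {
        res.status(500).send(err.message); // non-200 shows up in its error reports
      }
    });

    app.listen(process.env.PORT || 3000);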
I had the same issue. See my solution here. Two emails were sent because there were two instances on AWS, so I lock each send with a unique random number.
My example is based on MongoDB.
https://forums.meteor.com/t/help-email-sends-emails-twice/50624

Load test on Azure

I am running a load test using JMeter on my Azure web services.
I scale my services on S2 with 4 instances and run 4 JMeter instances with 500 threads on each.
It starts perfectly fine, but after a while calls start failing with a timeout error (HTTP status: 500).
I have checked the HTTP request queue on Azure and found that it is very high on the 2nd instance and very low on two other instances.
Please help me make my load test succeed.
I assume you are using Azure App Service. If you check the settings of your App, you will notice ARR's Instance Affinity is enabled by default. A brief explanation:
ARR cleverly keeps track of connecting users by giving them a special cookie (known as an affinity cookie), which allows it to know, upon subsequent requests, which server instance they were talking to. This way, we can be sure that once a client establishes a session with a specific server instance, it will keep talking to the same server as long as its session is active.
This is an important feature for session-sensitive applications, but if it's not your case then you can safely disable it to improve the load balance between your instances and avoid situations like the one you've described.
Disabling ARR’s Instance Affinity in Windows Azure Web Sites
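You can switch it off in the portal, or disable it from the application itself; a minimal sketch assuming Express, using the response header that App Service's ARR front end honors:

    const express = require('express');
    const app = express();

    // ARR stops issuing the affinity cookie when it sees this header,
    // so requests spread across instances instead of sticking to one.
    app.use((req, res, next) => {
      res.set('Arr-Disable-Session-Affinity', 'true');
      next();
    });

    app.listen(process.env.PORT || 3000);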
It might be due to caching of network name resolution at the JVM or OS level, so all your requests are hitting only one server. If that is the case, add a DNS Cache Manager to your Test Plan and it should resolve your issue.
See The DNS Cache Manager: The Right Way To Test Load Balanced Apps article for more detailed explanation and configuration instructions.
