RabbitMQ cluster fails when one node is not reachable - node.js

I created a RabbitMQ cluster via Docker and Docker Cloud. I am running two RabbitMQ containers on two separate nodes (both hosted on AWS).
The output of rabbitmqctl cluster_status is:
Cluster status of node 'rabbit@rabbitmq-cluster-2' ...
[{nodes,[{disc,['rabbit@rabbitmq-cluster-1','rabbit@rabbitmq-cluster-2']}]},
{running_nodes,['rabbit@rabbitmq-cluster-1','rabbit@rabbitmq-cluster-2']},
{cluster_name,<<"rabbit@rabbitmq-cluster-1">>},
{partitions,[]}]
However, when I stop one container/node, my messages can no longer be delivered and end up queued in .dlx.
I am using senecajs with NodeJS.
Has anybody had the same problem and can point me in the right direction?

To answer my own question:
The problem was that Docker caches the DNS lookup after starting and is not able to connect to a new host. So if one cluster node fails, Docker still tries to connect to that same node instead of trying a different one.
The solution was to write my own function for connecting to RabbitMQ. I first check with net.createConnection whether the host is online. If it is, I connect to it; if not, I try a different one.
Every time a RabbitMQ node goes down, my service fails, restarts and calls the "try this host" function.
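A minimal sketch of such a "try this host" function might look like the following. The function names, the timeout and the host list are hypothetical; only net.createConnection comes from the description above, and 5672 is assumed as the standard AMQP port.

```javascript
const net = require('net');

// Resolve true if a TCP connection to host:port opens within timeoutMs.
function isReachable(host, port, timeoutMs) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}

// Return the first host in the list that accepts a connection, or null.
async function firstReachableHost(hosts, port = 5672, timeoutMs = 2000) {
  for (const host of hosts) {
    if (await isReachable(host, port, timeoutMs)) return host;
  }
  return null;
}

// Example (hypothetical host names):
// const host = await firstReachableHost(['rabbitmq-cluster-1', 'rabbitmq-cluster-2']);
```

The returned host name can then be handed to whatever AMQP client the service uses. A null result means no node answered, in which case letting the service fail and restart, as described above, is one option.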

Related

Splitting read & write to redis with nodejs

I have set up Redis on three separate instances and configured them so that one instance is the master and two are replicas of the master. I use sentinels to keep the setup highly available. I have a Node.js application that needs to use Redis. How do I split reads and writes in my application, given that when my Redis master goes down one of my read replicas becomes the master and the writes need to go to it?
As far as I know, ioredis is the only Node Redis client that supports sentinels.
"ioredis guarantees that the node you connected to is always a master even after a failover. When a failover happens, instead of trying to reconnect to the failed node (which will be demoted to slave when it's available again), ioredis will ask sentinels for the new master node and connect to it. All commands sent during the failover are queued and will be executed when the new connection is established so that none of the commands will be lost."
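As a rough sketch of how the split could be wired up with ioredis: one client asks the sentinels for the master and is used for writes, and a second client uses the same sentinel configuration with role set to 'slave' and is used for reads. The helper name createRedisClients, the sentinel host names and the master group name 'mymaster' are assumptions for illustration. The ioredis constructor is passed in as a parameter so the sketch itself has no hard dependency.

```javascript
// Build one client for writes (master) and one for reads (replica).
// `Redis` is the ioredis constructor, passed in by the caller.
function createRedisClients(Redis, sentinels, masterName) {
  // Without a role option, ioredis resolves the current master via the sentinels.
  const writer = new Redis({ sentinels, name: masterName });
  // role: 'slave' asks the sentinels for a replica instead.
  const reader = new Redis({ sentinels, name: masterName, role: 'slave' });
  return { writer, reader };
}

// Example wiring (hypothetical hosts and master group name):
// const Redis = require('ioredis');
// const { writer, reader } = createRedisClients(
//   Redis,
//   [{ host: 'sentinel-1', port: 26379 }, { host: 'sentinel-2', port: 26379 }],
//   'mymaster'
// );
// await writer.set('key', 'value');
// const value = await reader.get('key');
```

Because Redis replication is asynchronous, a read from the replica client immediately after a write may still return the old value, so reads that must see the latest write are safer on the writer client.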

AWS EBS runs into "504 Gateway Time-out"

I'm new to using AWS EBS and ECS, so please bear with me if I ask questions that might be obvious for others. To the issue:
I've got a single-container Node/Express application that runs on EBS. The local docker container works as expected. On EBS, I can access one endpoint of the API and get the expected output. For the second endpoint, which runs longer (around 10-15 seconds), I get no response and run into a timeout after 60 seconds: "504 Gateway Time-out".
I wonder how I would approach debugging this, as I can't connect to the container directly. Currently the code doesn't include any debugging functionality either, as I'm not sure what the best Node approach for an EBS container is. Any recommendations are highly appreciated.
Thank you in advance!
You can see the EC2 instances running on EBS in your AWS console, and you can choose to give them IP addresses in your EBS options. That will let you SSH directly into them if you need to.
Otherwise check the keepAliveTimeout field on your server (the value returned by app.listen() if you're using Express).
I got a decent number of 504s when my Node server timeout was less than my load balancer timeout.
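A small sketch of that adjustment, using Node's built-in http module. The helper name configureTimeouts and the 5-second margin are arbitrary choices, and the 60-second load balancer idle timeout is an assumption based on the 60 seconds mentioned in the question. With Express, the object returned by app.listen() is the same kind of http.Server and can be passed in the same way.

```javascript
const http = require('http');

// Keep idle sockets open slightly longer than the load balancer does, so the
// balancer never reuses a connection that Node has already closed.
function configureTimeouts(server, lbIdleTimeoutMs) {
  server.keepAliveTimeout = lbIdleTimeoutMs + 5000;
  // Keep headersTimeout above keepAliveTimeout as well.
  server.headersTimeout = server.keepAliveTimeout + 1000;
  return server;
}

const server = configureTimeouts(
  http.createServer((req, res) => res.end('ok')),
  60000 // assumed load balancer idle timeout in milliseconds
);
// server.listen(3000);
```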
Your application takes longer than expected (> 60 seconds) to respond, so either nginx or the Load Balancer terminates your request.
See my answer here

Keep connection alive using --sysctl with Docker run

I currently have a container that is running Node services. The code creates a subscription to the Salesforce Change Data Capture event bus using CometD. This has been working well, but after some time the service stops receiving events from Salesforce.
I think this is happening because the Alpine Linux container could be marking the connection as broken when no data is received for a while. I have verified that the CometD libraries are creating the connection with keep-alive set to true.
Right now I am trying to increase the keep-alive time by running the container with the command:
docker run --sysctl net.ipv4.tcp_keepalive_time=10800 --sysctl net.ipv4.tcp_keepalive_intvl=60 --sysctl net.ipv4.tcp_keepalive_probes=20 -p 80:3000 <imageid>
My thinking behind this is:
net.ipv4.tcp_keepalive_time=10800
This means that the keepalive routines wait for three hours (10800 secs)
before sending the first keepalive probe
net.ipv4.tcp_keepalive_intvl=60
Resend the probe every 60 seconds
net.ipv4.tcp_keepalive_probes=20
If no ACK response is received for 20 consecutive probes, the connection is
marked as broken.
I guess what I am asking is if this is the correct way to go about running the docker container so that sysctl will run with the settings I have passed in.
I am new to docker, so, I'm sure I did something that doesn't make sense. Thank you for any suggestions.
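To make the arithmetic behind the three settings explicit, here is a small worked calculation. The function name keepaliveTimeline is made up for illustration. It only adds up the values from the command above and does not touch any socket.

```javascript
// Seconds of idle time before the kernel sends the first probe, and before it
// gives up on an unresponsive peer, for a given set of keepalive settings.
function keepaliveTimeline({ time, intvl, probes }) {
  return {
    firstProbeAfterSec: time,
    brokenAfterSec: time + intvl * probes,
  };
}

const timeline = keepaliveTimeline({ time: 10800, intvl: 60, probes: 20 });
console.log(timeline); // { firstProbeAfterSec: 10800, brokenAfterSec: 12000 }
```

With these values an idle connection sees no keepalive traffic at all for the first three hours, and an unresponsive peer is only declared dead about 20 minutes after that.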

Kubernetes drops HTTP connection initialized by node.js / postgres

I have a very simple piece of code written in Node.js which runs on Kubernetes and AWS. The app just makes POST/GET requests to create and get data from other services: service1 -> service2 -> service3.
Service1 gets a POST request and calls service2. Service2 calls a Postgres DB (using Sequelize), creates a new row and then calls service3. Service3 gets data from the DB and returns the response to service2, and service2 returns the response to service1.
Most of the time it works, but roughly once in 4-5 attempts under concurrency a request is dropped and I get a timeout. The problem is that service1 receives the response back (according to the logs and network traces), but it seems that the connection was dropped somewhere between the services, and I get a timeout (ESOCKETTIMEDOUT).
I've tried replacing request.js with node-fetch.
I've tried using NewRelic/Elastic APM.
I've tried using node --prof and analyzing the output with node --prof-process, with no conclusions.
Is it possible that Kubernetes drops my connection?
Hard to tell without debugging, but since some connections are getting dropped when you add more load and concurrency, it's likely that you need more replicas on your Kubernetes deployments and possibly need to adjust the resources in your container pod specs.
If this turns out to be the case, you can also configure an HPA (Horizontal Pod Autoscaler) to handle your load.

How to fail over node.js timer on amazon load balancer?

I have set up two instances under an AWS load balancer. I have deployed Node.js web services and MongoDB on both instances. The load balancer works fine with the web services.
But the problem is that I have one timer service (a Node.js service only). This timer updates my MongoDB based on some calculation.
The timer service (timer.js) must run on only one of the two AWS instances at any time, and if that instance goes down, the timer service on the other instance should come up.
I know ELB does not provide this kind of facility. Can anyone please help me get this done?
Condition: only one timer service may run at a time behind the Amazon load balancer.
Thanks.
You would have to implement this yourself with a locking algorithm on top of a shared data store that supports atomic operations.
Alternatively, consider starting a "timer" server in an Auto Scaling group of Min: 1, Max: 1 so Amazon keeps it running. This instance can be a t2.micro, which is very cheap. It can either run the job itself, or just make an HTTP request to your load balancer to run the job at the desired interval. If you do that, only one of your servers will run each job.
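The locking approach from the first suggestion can be sketched as a lease with an expiry time. Everything here is hypothetical: InMemoryLeaseStore is only a stand-in for a shared data store (in practice its tryAcquire would have to be a single atomic operation in that store), and makeTimerTick is a made-up helper name.

```javascript
// Stand-in for a shared store that offers an atomic "take the lease" operation.
class InMemoryLeaseStore {
  constructor() {
    this.lease = null;
  }

  // Take the lease if it is free, expired, or already held by ownerId.
  tryAcquire(ownerId, ttlMs, now) {
    const lease = this.lease;
    if (lease === null || lease.expiresAt <= now || lease.owner === ownerId) {
      this.lease = { owner: ownerId, expiresAt: now + ttlMs };
      return true;
    }
    return false;
  }
}

// Returns a tick function. Every instance calls it on every timer interval,
// but the job only runs on the instance that currently holds the lease.
function makeTimerTick(store, ownerId, ttlMs, job) {
  return function tick(now = Date.now()) {
    if (!store.tryAcquire(ownerId, ttlMs, now)) return false;
    job();
    return true;
  };
}

// Example: both instances run this with their own ownerId.
// const tick = makeTimerTick(sharedStore, 'instance-a', 30000, runCalculation);
// setInterval(tick, 10000);
```

If the instance holding the lease goes down, it stops renewing it, and the other instance takes over on its first tick after the lease expires. The lease duration should be longer than the tick interval so the current holder can renew in time.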
Wouldn't it make more sense to handle this like any other "service" that needs to keep running?
upstart service
running node.js server using upstart causes 'terminated with status 127' on 'ubuntu 10.04'
This guy had a bad path in his file but his upstart script looks okay
monit
Node.js (sudo) and monit

Resources