I am using ActiveMQ for message publish/subscribe, where it accepts connections from the publisher. But sometimes, all of a sudden, it stops accepting any connections from the publisher client even though the service is still running. Telnet to the ActiveMQ port also works fine at that time, yet I had to restart the ActiveMQ service before the publisher client could connect and publish messages again.
During that time I did the basic checks like CPU, memory usage, load average and IO wait, and all were fine. Any idea how we can troubleshoot to find out what is causing the connectivity issue?
I have created a simple GraphQL subscription using Nest.js/Apollo GraphQL over Node.js. My client application, which is a React.js/Apollo client, works fine with the server. The client subscribes to the server via GraphQL similar to:
subscription {
  studentAdded {
    id
  }
}
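On the server side, the resolver looks roughly like the following sketch (simplified; it assumes @nestjs/graphql with the graphql-subscriptions PubSub, and the addStudent mutation shown here is just a placeholder for whatever triggers the event):

import { Args, Mutation, Resolver, Subscription } from '@nestjs/graphql';
import { PubSub } from 'graphql-subscriptions';

const pubSub = new PubSub();

@Resolver('Student')
export class StudentResolver {
  // Placeholder for whatever publishes the event in the real application.
  @Mutation()
  async addStudent(@Args('id') id: string) {
    const student = { id };
    await pubSub.publish('studentAdded', { studentAdded: student });
    return student;
  }

  // The client subscription above is resolved through this async iterator.
  @Subscription()
  studentAdded() {
    return pubSub.asyncIterator('studentAdded');
  }
}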
My problem is that it works only locally. When I deploy my server back end to a hosted Docker container over the internet, the client won't receive data anymore.
I have traced the client: it sends a GET request to ws://api.example.com:8010/graphql and receives the successful HTTP/1.1 101 Switching Protocols response. However, nothing is received from the server, unlike when the server was on my local machine. Checking the remote server log showed me that the client successfully connects to the server; there, I can see onConnect log messages.
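For reference, the client-side link is wired roughly like this (a simplified sketch assuming @apollo/client's WebSocketLink over subscriptions-transport-ws; the actual app setup may differ slightly):

import { ApolloClient, InMemoryCache } from '@apollo/client';
import { WebSocketLink } from '@apollo/client/link/ws';

// All subscription traffic goes over a single WebSocket to the GraphQL endpoint.
const wsLink = new WebSocketLink({
  uri: 'ws://api.example.com:8010/graphql',
  options: { reconnect: true },
});

const client = new ApolloClient({
  link: wsLink,
  cache: new InMemoryCache(),
});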
Now I need some guidance to solve the problem.
I checked several things myself. Firstly, I thought the WebSocket address might be blocked on the network, but then realized that it uses the same port as normal HTTP. Secondly, I supposed that WebSocket messages/frames are transmitted over UDP, but I was wrong; they go over TCP, so there is no need to worry about network settings there.
Additionally, I have read several GitHub threads and Stack Overflow questions but did not find any clue. I am not using Node.js/WebSocket directly; instead, I am using Nest.js/GraphQL subscriptions, which has made my search tougher.
Your help is highly appreciated.
I am working on a Node.js app with Socket.io. I did a test in a single process using PM2 and there were no errors. Then I moved to our production environment (we use a Google Cloud Compute instance).
I run 3 app processes, and an iOS client connects to the server.
By the way, the iOS client doesn't keep the socket connection. It doesn't send a disconnect to the server, but it gets disconnected and reconnects to the server, and this happens continuously.
I am not sure why the server disconnects the client.
If you have any hint or answer for this, I would appreciate it.
That's probably because requests end up on a different machine rather than the one they originated from.
Straight from Socket.io Docs: Using Multiple Nodes:
If you plan to distribute the load of connections among different processes or machines, you have to make sure that requests associated with a particular session id connect to the process that originated them.
What you need to do:
Enable session affinity, a.k.a sticky sessions.
If you want to work with rooms/namespaces, you also need to use a centralised memory store to keep track of namespace information, such as Redis with the Redis adapter (see the sketch below).
But I'd advise you to read the documentation piece I posted; things might have changed a bit since the last time I implemented something like this.
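As a rough illustration, wiring the Redis adapter looks something like the sketch below (this assumes the current @socket.io/redis-adapter and node-redis v4 packages; when this answer was written the equivalent package was socket.io-redis):

import { createServer } from 'http';
import { Server } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';

const httpServer = createServer();
const io = new Server(httpServer);

// Each process publishes room/namespace events through Redis so every other
// process sees them, regardless of which process a client is connected to.
const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));
  httpServer.listen(3000);
});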
By default, the socket.io client "tests out" the connection to its server with a couple of HTTP requests. If you have multiple servers and those initial HTTP requests don't go to the exact same server each time, then the socket.io connection will never get established properly, will not switch over to webSocket, and will keep attempting to use HTTP polling.
There are two ways to fix this.
You can configure your clients to just assume the webSocket protocol will work. This will initiate the connection with one and only one HTTP request, which will then be immediately upgraded to the webSocket protocol (with socket.io running on top of that). In socket.io, this is a transport option specified with the initial connection.
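With current socket.io clients that looks roughly like this (a sketch assuming the socket.io-client v3/v4 import style; the URL is a placeholder):

import { io } from 'socket.io-client';

// Connect with WebSocket only, skipping the HTTP long-polling handshake, so the
// connection never depends on successive polling requests hitting the same server.
const socket = io('https://example.com', {
  transports: ['websocket'],
});

socket.on('connect', () => {
  console.log('connected', socket.id);
});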
You can configure your server infrastructure to be sticky so that a request from a given client always goes back to the exact same server. There are lots of ways to do this depending upon your server architecture and how the load balancing is done between your servers.
If your servers keep any client state local to the server (and not in a shared database that all servers access), then you will need even a dropped-and-reconnected connection to go back to the same server, and sticky connections will be your only solution. You can read more about sticky sessions on the socket.io website here.
Thanks for your replies.
I finally figured out the issue. It was caused by the TTL (timeout) of the backend service in the Google Cloud Load Balancer. The default was 30 seconds, and it made each socket connection disconnect and reconnect.
So I updated the value to 3600s and then I could keep the connection.
We have a weird networking issue.
We have a Hyperledger Fabric client application written in Node.js running in Kubernetes which communicates with an external Hyperledger Fabric Network.
We randomly get timeout errors on this communication. When the pod is restarted, all goes well for a while, then timeout errors start; sometimes it randomly fixes itself and then goes bad again.
This is on Azure AKS; we also set up a quick Kubernetes cluster in AWS with Rancher, deployed the app there, and the same timeout error happened there too.
We ran scripts in the same container all night long, hitting the external Hyperledger endpoint with both cURL and a small Node.js script every minute, and we didn't get even a single error.
We ran the application in another VM as plain Docker containers and there was no issue there.
We inspected the network traffic inside the container. When this issue happens, netstat shows the connection as established, but tcpdump shows no traffic; no packets are even attempted to be sent.
Checking the Hyperledger Fabric SDK code, it uses gRPC (with protocol buffers) behind the scenes.
So any clues maybe?
This turned out not to be a Kubernetes problem but a dropped-connection issue.
gRPC keeps the connection open, and after some period of inactivity an intermediary component drops the connection. In the Azure AKS case this is the load balancer, as every outbound connection goes through a load balancer. There is a non-configurable idle timeout period of 4 minutes, after which the load balancer drops the connection.
The fix is configuring gRPC to send keep-alive messages.
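For illustration, with the Node gRPC client this amounts to setting channel options along these lines (a sketch using @grpc/grpc-js directly; the peer address is a placeholder, and in the Fabric SDK these keys typically go under grpcOptions in the connection profile):

import * as grpc from '@grpc/grpc-js';

// A bare gRPC client with keep-alive pings enabled; the peer address is hypothetical.
const client = new grpc.Client(
  'peer0.example.com:7051',
  grpc.credentials.createInsecure(),
  {
    // Ping every 2 minutes, well under the load balancer's 4-minute idle timeout.
    'grpc.keepalive_time_ms': 120000,
    // Consider the connection dead if the ping is not acknowledged within 20 seconds.
    'grpc.keepalive_timeout_ms': 20000,
    // Send pings even when there are no active calls on the connection.
    'grpc.keepalive_permit_without_calls': 1,
  }
);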
Scripts in the container worked without a problem, as they open a new connection every time they run.
The application running as plain Docker containers didn't have this issue because we were hitting the endpoints every minute, so we never reached the idle timeout threshold. When we hit the endpoints every 10 minutes, the timeout issue started there too.
I was working fine with CloudAMQP until all of a sudden wascally/rabbot stopped being able to connect to my endpoint. I have installed RabbitMQ locally and my system works fine. I have since tried to set up a RabbitMQ instance on Heroku via Bigwig, to no avail. The endpoints I'm using should be fine, and I also installed amqp.node and node-amqp to test whether it was a problem with rabbot. However, none of these can connect either.
Any idea what the problem can be?
The most common cause is a connection timeout. With all my wascally code hosted on CloudAMQP (with Heroku, DigitalOcean or otherwise), I have to set a connection timeout much higher than the default for it to work.
This can be done with the connection_timeout parameter on the connection string URL (https://www.rabbitmq.com/uri-query-parameters.html).
For example:
var conn = "amqp://myuser:mypassword@server.cloudamqp.com/my-vhost?connection_timeout=30"
This will set a connection timeout of 30 seconds.
I have CouchDB running on a Linux Ubuntu 14.04 VM and a .NET web application running under Azure Web Apps. In our ELMAH logging for the web application I keep getting intermittent errors:
System.Net.Sockets.SocketException
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond [ipaddress]:5984
I've checked the CouchDB logs and there isn't a record of those requests, so I don't believe they are reaching the CouchDB server; I can confirm this by looking at the web server logs on Azure and seeing the Error 500 response. I've also tried a tcpdump, with little success (a separate issue: writing the tcpdump output to another disk keeps failing with access denied).
We previously ran CouchDB on a Windows VM with no issues, so I wonder whether the issue relates to the OS TCP connection settings and timeouts.
Does anyone have any suggestions as to where to look, or anything that immediately jumps to mind?