I have two microservices that communicate with each other through gRPC. A is the RPC client and B is the RPC server; both are written in Node.js using the grpc NPM module.
Everything works fine until, at some point, A unexpectedly stops being able to send requests to B. The calls fail after the timeout (5 s) and throw this error:
Error: Deadline Exceeded
Both microservices are Docker containers that run on AWS ECS and communicate through an AWS ELB (not an ALB, because it does not support HTTP/2 and has some other problems).
I ran telnet from A to B's ELB, both from the EC2 instance and from inside the running ECS task (the Docker container itself), and it connected fine. Still, the Node.js application in A cannot reach the Node.js application in B over the gRPC connection.
The only way to fix it is to stop and start the ECS tasks, after which A connects to B again (until the same scenario unexpectedly recurs), but of course that is not a real solution.
Has anyone faced this kind of issue?
Do you use the unary or the streaming API? Do you set a deadline?
A gRPC deadline is per-stream, so with streaming, when you set a deadline of X milliseconds, you get DEADLINE_EXCEEDED X milliseconds after you opened the stream (not after you last sent or received a message!). You will then keep getting it forever on that stream; the only way to get rid of it is to reopen the stream.
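Because the deadline is an absolute point in time fixed when the call or stream starts, one way to avoid carrying a stale one around is to compute it fresh for every call. The following is only a minimal sketch: `deadlineIn` and `isExpired` are hypothetical helper names, and the commented usage assumes a generated client with an illustrative `someMethod`.

```javascript
// Hypothetical helper: absolute deadline `ms` milliseconds from now.
function deadlineIn(ms) {
  return new Date(Date.now() + ms);
}

// Hypothetical helper: true once the deadline has passed, meaning a call or
// stream still bound to it can only fail with DEADLINE_EXCEEDED.
function isExpired(deadline, now = Date.now()) {
  return now >= deadline.getTime();
}

// Assumed usage with the grpc module (someMethod is illustrative only):
//   client.someMethod(request, { deadline: deadlineIn(5000) }, callback);
```

For a stream, the same idea means opening a new stream with a new deadline rather than reusing one whose deadline has already passed.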
I have found that, after some errors, I need not only to create a new stub but also to re-create the connection in order to get it to reconnect. (I am also running in ECS.)
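A rough sketch of that workaround, under stated assumptions: `createResilientClient` is a hypothetical wrapper, `createClient` is a factory you supply that builds a new stub (and with it a new channel), and the threshold of three consecutive failures is arbitrary.

```javascript
// Hypothetical wrapper: rebuilds the client after `maxFailures` consecutive
// failed calls. `createClient` is a caller-supplied factory (assumption).
function createResilientClient(createClient, maxFailures = 3) {
  let client = createClient();
  let failures = 0;

  return {
    // `invoke` receives the current client and a Node-style callback.
    call(invoke, callback) {
      invoke(client, (err, response) => {
        if (err) {
          failures += 1;
          if (failures >= maxFailures) {
            // Release the old connection if the client exposes close().
            if (typeof client.close === 'function') client.close();
            client = createClient();
            failures = 0;
          }
        } else {
          failures = 0; // any success resets the counter
        }
        callback(err, response);
      });
    },
  };
}
```

A call would then look like `resilient.call((c, cb) => c.someMethod(request, cb), handleResult)`, where `someMethod` and `handleResult` are placeholders.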
Related
I am using ioredis to create a Redis client in Node for an App Service that is deployed in Azure (the code was posted as an image).
I am creating one connection per instance, which lives forever unless there is a problem, in which case it tries to reconnect. I am trying to understand whether ioredis has any idle-timeout configuration that will close these connections. According to the Microsoft documentation, some client libraries send a ping to keep the connection open; I am not sure whether ioredis is one of them.
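Whether or not ioredis pings on its own, two knobs can be sketched: the `keepAlive` and `retryStrategy` connection options, and an explicit application-level PING on a timer. This is only an illustration: the host, port, and numeric values are placeholders, and `startHeartbeat` is a hypothetical helper that works with any client exposing a promise-returning `ping()`.

```javascript
// Options intended for `new Redis(redisOptions)` from ioredis.
// Host, port and numbers are placeholders, not recommendations.
const redisOptions = {
  host: 'my-cache.example.com', // hypothetical host
  port: 6379,
  keepAlive: 30000, // enable TCP keepalive with a 30 s initial delay
  retryStrategy(times) {
    // Delay before the next reconnect attempt, capped at 2 s.
    return Math.min(times * 50, 2000);
  },
};

// Hypothetical helper: sends PING every `intervalMs` so the connection is
// never idle from the server's point of view. Returns a stop function.
function startHeartbeat(client, intervalMs = 60000) {
  const timer = setInterval(() => {
    // Failures are ignored here; reconnection is left to the client itself.
    client.ping().catch(() => {});
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the heartbeat
  return () => clearInterval(timer);
}
```

With a real client this would be used as `const stop = startHeartbeat(redis)`, calling `stop()` on shutdown.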
I am currently working on a microservice architecture in which one particular microservice follows a specific mechanism:
Receive a request saying that some data is needed
Send status 202 (Accepted) to the client
Generate the data and save it to a Redis instance
Receive a request asking whether the data is ready
If the data is not ready in the Redis instance: send status 102 to the client
If the data is ready in the Redis instance: send it back
The first point works fine with this kind of code :
res.sendStatus(202)
processData(req)
But for the second point I see different behavior locally and when hosted on Cloud Run.
Locally, the second request is not handled until the processing of the first one has ended, which I presumed was normal from a threading perspective.
Is there something that can be used to make Express still handle other requests once the response to the first one has been sent to the client but its processing has not ended?
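If `processData` is synchronous and CPU-bound (an assumption, since its body is not shown), it blocks Node's single event loop and no other request can be served until it returns. One option is to split the work into slices and yield between them with `setImmediate`; a worker thread would be another. A minimal sketch, where `processInChunks` is a hypothetical helper:

```javascript
// Hypothetical helper: processes `items` in slices of `chunkSize`, yielding
// to the event loop between slices so pending requests can be handled.
function processInChunks(items, handleItem, chunkSize, done) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index += 1) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      setImmediate(runChunk); // let queued I/O callbacks run first
    } else if (done) {
      done();
    }
  }

  // Schedule the first slice as well, so the caller returns immediately.
  setImmediate(runChunk);
}
```

In the handler this would replace the direct call: send the 202 first, then start `processInChunks(...)` so the polling request can be answered between slices.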
But since Google Cloud Run is based on instances and auto-scaling, I thought: the first instance is busy because its processing has not ended? No problem! A new one will come up and handle the other request, which will then check the status of the key in the Redis instance.
It seems I was wrong! When I make the call to check the status of the data and the data is not ready yet, Cloud Run sends me back this error (502 Bad Gateway):
upstream connect error or disconnect/reset before headers. reset reason: protocol error
However, I never set a 502 status on any response, so it seems that either Cloud Run or Express sends it on its own.
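One thing that may be worth ruling out: 102 (Processing) belongs to the 1xx class, which HTTP defines as interim responses that are followed by a final response, so a proxy in front of the container might treat a 102 sent as the final answer as a protocol error. That is an assumption about the cause, not something confirmed here. A sketch of the polling response using only final status codes; `pollResponse`, the route path, and the `redis` client in the comment are all illustrative:

```javascript
// Hypothetical helper: maps the stored value (null/undefined when the key
// is absent) to a final HTTP status and body for the polling endpoint.
function pollResponse(storedValue) {
  if (storedValue === null || storedValue === undefined) {
    // Still being generated: 202 is a final status, unlike 102.
    return { status: 202, body: { state: 'pending' } };
  }
  return { status: 200, body: { state: 'ready', data: storedValue } };
}

// Assumed Express usage (route and redis client are illustrative only):
//   app.get('/data/:id', async (req, res) => {
//     const { status, body } = pollResponse(await redis.get(req.params.id));
//     res.status(status).json(body);
//   });
```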
My only other option would be to split my Cloud Run service into a Cloud Function plus a Cloud Run service, with Cloud Run triggering the processing in the Cloud Function. I'm pretty short on time, so I will do that if I have no other choice, but I would prefer to manage without introducing a new Cloud Function.
Do you have any explanation for why it doesn't work either locally or on Cloud Run?
My own hypotheses do not convince me, and I can't find a definitive answer:
Maybe a client can't make two requests at the same time, which does not seem logical.
Maybe Express can't handle several requests at the same time, which does not seem logical to me either.
Are there any more plausible explanations?
I'm currently deploying my Socket.IO server with Node.js/Express on Google Cloud Platform using Cloud Build + Run, and it works pretty well.
The issue I'm having is that GCP automatically times out all Socket.IO connections after 1 hour, and it's really annoying. The application I'm running has to stay running in the background for hours on end, with multiple people in each socket room interacting with it a bit every 30 minutes to 1 hour.
That's why I have 2 questions:
How can I gracefully handle these timeouts? I have a reconnection process set up on my client that checks every 5 seconds whether the socket is connected, but for some reason it can't detect when these timeouts happen, and I'm not sure why.
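Instead of polling, the client can listen for the socket's `disconnect` event, which carries a reason string. When the reason is `io server disconnect`, Socket.IO does not reconnect automatically and `socket.connect()` has to be called by hand. Which reason a Cloud Run timeout actually produces is not established here, so the sketch below just logs every reason. `watchDisconnects` is a hypothetical helper, and `createFakeSocket` is a stand-in so the example runs without the library.

```javascript
// Hypothetical helper: reacts to disconnects instead of polling.
function watchDisconnects(socket, onDrop) {
  socket.on('disconnect', (reason) => {
    onDrop(reason);
    // A server-initiated disconnect is not retried automatically.
    if (reason === 'io server disconnect') {
      socket.connect();
    }
  });
}

// Minimal stand-in for a socket.io-client socket (illustration only).
function createFakeSocket() {
  const handlers = {};
  return {
    connectCalls: 0,
    on(event, fn) { (handlers[event] = handlers[event] || []).push(fn); },
    emit(event, arg) { (handlers[event] || []).forEach((fn) => fn(arg)); },
    connect() { this.connectCalls += 1; },
  };
}

const fakeSocket = createFakeSocket();
const reasons = [];
watchDisconnects(fakeSocket, (reason) => reasons.push(reason));
fakeSocket.emit('disconnect', 'io server disconnect'); // triggers connect()
fakeSocket.emit('disconnect', 'transport close');      // left to auto-reconnect
```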
Is there a better platform I can deploy my Socket.IO server on? I don't like the timeouts that GCP sets. Would a platform like DigitalOcean or Azure be better?
Cloud Run has a maximum request timeout of 3600 s, whatever the protocol (HTTP, HTTP/2, streaming or not). If you need to keep the connection open longer, Cloud Run isn't the right platform for this.
I recommend having a look at App Engine Flex or at Autopilot. Both offer longer timeouts and the ability to run jobs in the background, and both accept containers.
Background
We have a server running socket.io 2.0.4. This server receives requests from a stress script that simulates clients using socket.io-client 2.0.4.
The script simulates the creation of clients (each client with its own socket) that send one request and die immediately afterwards, using socket.disconnect();
Problem
During the first few seconds all goes well. But every test reaches a point at which the script starts spitting out the following error:
connect_error: Error: websocket error
This means that the clients my script creates are failing to connect to the server.
The script creates 7 clients per second (spaced evenly throughout the second); each client makes 1 request and then dies.
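The launch pattern described above can be sketched as follows. This is not the original stress script: `spacedOffsets` and `launchOneSecond` are hypothetical names, and `createClient` stands for whatever function connects, sends one request, and calls `socket.disconnect()`.

```javascript
// Hypothetical helper: start offsets (in ms) for `perSecond` clients spread
// evenly across one second.
function spacedOffsets(perSecond) {
  const gap = 1000 / perSecond;
  return Array.from({ length: perSecond }, (_, i) => Math.round(i * gap));
}

// Hypothetical driver: schedules one second's worth of clients.
// `createClient` is caller-supplied (assumption).
function launchOneSecond(perSecond, createClient) {
  return spacedOffsets(perSecond).map((offset) => setTimeout(createClient, offset));
}
```

Calling `launchOneSecond(7, createClient)` from a one-second `setInterval` would reproduce the rate of 7 clients per second.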
Research
At first I thought there was an issue with file descriptors and the limits imposed by UNIX, since the server is on a Debian machine:
https://github.com/socketio/socket.io/issues/1393
However, after I followed these suggestions, the issue remained.
Then I thought maybe my test script was not connecting correctly, so I changed the connection options as described in this discussion:
https://github.com/socketio/socket.io-client/issues/1097
Still, to no avail.
What could be wrong?
I see that the machine's CPUs are constantly at 100%, so I guess I am pounding the server with requests.
But if I am not mistaken, the server should simply accept more requests and process them when possible.
Questions
Is there a limit to the number of connections a socket.io server can handle?
When running such stress tests, one needs to be aware of protections and gatekeepers.
In our case, our stack was deployed in AWS. So first, the AWS load balancers started blocking us because they thought the system was being DDoSed.
Then the Debian system was getting flooded and started refusing connections with SYN_FLOOD.
But after fixing that we were still getting the error. It turned out we had to increase the TCP connection buffers and change how TCP connections were handled in the kernel.
Now it accepts all connections, but I wouldn't wish on anyone the suffering we went through to find this out...
I am working on a WebRTC application in which a P2P connection is established between a customer and a free agent. The agents are fetched by an AJAX call in the application. I want to scale the application so that, whichever Node server the agents are running on, they have a communication mechanism and the agent status (available, busy, unavailable) can be updated.
My problem is that the application runs on port 8040 and the agents service runs on port 8088; the application makes AJAX calls to it to fetch the data. What is the best way to scale the agents, or does anyone have an idea how to scale the application?
I followed https://github.com/rajaraodv/redispubsub, which uses Redis pub/sub, but my problem is not resolved, because the agents are updated and fetched on another node through AJAX calls.
You didn't give enough info, but to scale your Node.js app you need a central place that holds all the required information and that can itself scale. Redis scales easily; you can also try socket.io, etc.
Once you have your Redis cluster, for example, make all your Node.js servers communicate with Redis. That way all your Node servers have access to the same information, and it is then up to you to send the right information to the right clients.
Message Bus approach:
An AJAX call is sent to one of the Node.js servers. If the message doesn't find its destination on that server, it is sent to the next one, and so on. So the signaling server must distribute the received message to all the other nodes in the cluster by establishing a message bus.
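The message bus idea can be illustrated with a small in-process sketch. Everything here is hypothetical: `createBus` is a stand-in for a shared bus such as Redis pub/sub, and `createNode` models one Node.js server that keeps a local view of agent statuses fed by the bus.

```javascript
// In-process stand-in for a shared bus such as Redis pub/sub (assumption).
function createBus() {
  const subscribers = [];
  return {
    subscribe(fn) { subscribers.push(fn); },
    publish(message) { subscribers.forEach((fn) => fn(message)); },
  };
}

// Hypothetical node server: its local agent map is updated only through
// messages received from the bus, so every node converges on the same view.
function createNode(name, bus) {
  const agents = new Map();
  bus.subscribe(({ agentId, status }) => agents.set(agentId, status));
  return {
    name,
    setStatus(agentId, status) { bus.publish({ agentId, status }); },
    getStatus(agentId) { return agents.get(agentId); },
  };
}

const bus = createBus();
const nodeA = createNode('A', bus);
const nodeB = createNode('B', bus);
nodeA.setStatus('agent-1', 'busy'); // update arrives on node A...
// ...and node B sees it too, without a direct call between the two nodes.
```

With a real bus, `publish` and `subscribe` would go over the network, so updates would arrive asynchronously rather than immediately as in this stand-in.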