Unexplained Node.js 504s - node.js

We're running Node (v0.10.38) with Express (4.0.0), proxied through nginx (1.2.1), which usually works great. Recently, however, we switched to a new server setup. Now, roughly 30 minutes after starting the server, it begins returning 504s (Gateway Timeout). Accessing Node directly from the server (bypassing nginx) also times out.

Every so often we got a series of ETIMEDOUT errors from redis, yet connecting to the redis server from the command line on the same machine works fine. Furthermore, the 504s started appearing even before the redis errors showed up. After updating our redis middleware (connect-redis) to the newest version, those errors stopped, but the 504s still occurred.

However, after disabling the connection to redis in our code for 10 hours, no 504s occurred, so the problem is likely tied to redis in some way. We've tried sending a redis ping periodically to prevent the error, believing that to be the cause, but the 504s continue. Anything else we can try?
Sorry there's not much to work with; we don't have much to go on either, and we're eager to solve this issue as soon as possible. If any more specifics are needed, I can update the question. Thank you.

Still don't know the root cause, but we ended up fixing this by pinging Redis every minute so that the connection wouldn't get killed.
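For reference, a minimal sketch of that keep-alive workaround might look like the following, assuming the node_redis client; the question doesn't say which Redis client is in use, so the client setup here is an assumption.

    // Hypothetical keep-alive sketch using node_redis (assumed client).
    var redis = require('redis');
    var client = redis.createClient(6379, '127.0.0.1');

    // Send a PING every 60 seconds so an idle connection is not dropped
    // by a firewall, load balancer, or the Redis server's timeout setting.
    setInterval(function () {
      client.ping(function (err) {
        if (err) {
          console.error('Redis keep-alive ping failed:', err);
        }
      });
    }, 60 * 1000);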

Related

Nodejs app disconnects from mongodb randomly

I have a Node.js/Express/MongoDB/Mongoose app hosted on AWS Elastic Beanstalk.
The problem is that the Elastic Beanstalk health degrades randomly (at no specific times). This happens because any request that requires database interaction results in the following in the logs:
*1360931 upstream timed out (110: Connection timed out) while reading response header from upstream
This happens no matter how much data I try to load; it happens even with the least amount of data. It can last from a minute up to 20 minutes and then works again on its own. It is completely random.
I can force it to work immediately by restarting the environment (I connect to MongoDB using a connection string on app startup).
Meanwhile, other requests that don't require database interaction work 100% of the time.
The thing is, while database queries aren't working on the server, I can connect to the same database from localhost and the requests work like a charm; they are even really fast.
What is even stranger is that I have 4 other identical apps with the same setup, and this situation doesn't occur with any of them; only this app faces this problem!
What is the problem here?
The above error usually means that your server closed the connection due to a short timeout, but your application is not aware of it. You may need to check your connection string and adjust the timeouts, for example:
MONGO_URI=mongodb://user:password@127.0.0.1:27017/dbname?keepAlive=true&poolSize=30&autoReconnect=true&socketTimeoutMS=360000&connectTimeoutMS=360000
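If it helps, a minimal sketch of applying those options at connection time with Mongoose might look like this; the credentials, host, and timeout values simply mirror the string above and are illustrative, not a recommendation.

    // Hypothetical Mongoose connection sketch; credentials, host, and
    // timeout values are placeholders mirroring the string above.
    var mongoose = require('mongoose');

    var uri = 'mongodb://user:password@127.0.0.1:27017/dbname' +
      '?keepAlive=true&poolSize=30&autoReconnect=true' +
      '&socketTimeoutMS=360000&connectTimeoutMS=360000';

    mongoose.connect(uri);

    mongoose.connection.on('error', function (err) {
      console.error('MongoDB connection error:', err);
    });
    mongoose.connection.on('disconnected', function () {
      console.warn('MongoDB disconnected; the driver will try to reconnect');
    });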

Is there a hard limit on socket.io connections?

Background
We have a server running socket.io 2.0.4. This server receives requests from a stress script that simulates clients using socket.io-client 2.0.4.
The script simulates the creation of clients (each with its own socket) that send one request each and immediately die afterwards, using socket.disconnect();
Problem
During the first few seconds all goes well, but every test reaches a point at which the script starts spitting out the following error:
connect_error: Error: websocket error
This means that the clients my script is creating are unable to connect to the server.
The script creates 7 clients per second (spaced evenly throughout the second); each client makes 1 request and then dies.
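For context, a stress script of the kind described above might look roughly like this sketch, assuming socket.io-client 2.x; the server URL, event name, and payload are placeholders, not details from the question.

    // Hypothetical stress-script sketch using socket.io-client 2.x.
    var io = require('socket.io-client');

    setInterval(function () {
      var socket = io('http://localhost:3000', {
        transports: ['websocket'],
        forceNew: true               // one fresh connection per simulated client
      });

      socket.on('connect', function () {
        socket.emit('request', { ts: Date.now() });  // the single request
        socket.disconnect();                         // client "dies" right after
      });

      socket.on('connect_error', function (err) {
        console.error('connect_error:', err.message);
      });
    }, Math.round(1000 / 7));        // roughly 7 clients per second, evenly spaced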
Research
At first I thought there was an issue with file descriptors and the limits imposed by UNIX, since the server is a Debian machine:
https://github.com/socketio/socket.io/issues/1393
After following those suggestions, however, the issue remained.
Then I thought maybe my test script was not connecting correctly, so I changed the connection options as suggested in this discussion:
https://github.com/socketio/socket.io-client/issues/1097
Still, to no avail.
What could be wrong?
I see the machine's CPUs are constantly at 100%, so I guess I am pounding the server with requests.
But if I am not mistaken, the server should simply accept more requests and process them when possible.
Questions
Is there a limit to the amount of connections a socket.io server can handle?
When running stress tests like this, one needs to be aware of protections and gatekeepers along the way.
In our case, our stack was deployed on AWS. So first, the AWS load balancers started blocking us because they thought the system was being DDoSed.
Then, the Debian system was getting flooded and started refusing connections due to SYN flood protection.
But after fixing that we were still getting the error. It turned out we had to increase the kernel's TCP connection buffers and change how TCP connections were handled.
Now it accepts all connections, but I wish on no one the suffering we went through to figure that out...

Heroku auto restart dyno on H12 Request timeout errors

We have a node dyno processing small API requests, ~10/second. All requests complete in under 0.5s.
Once every few days, the dyno starts giving H12 Request Timeout errors on all requests. We couldn't discover the cause. Restarting fixes it.
How can we make Heroku automatically restart the dyno when H12 Request Timeout errors cross a threshold, e.g. more than 5/second?
As ryan said, an H12 Request Timeout means that Heroku's load balancers are sending a request to your app but not getting a response in time (Heroku has a maximum response time of 30 seconds). Sometimes a request is just expensive to compute, or an inefficient DB query delays the response.
Yet the root of the problem is not necessarily an application error on your side.
In our case we have multiple web dynos handling requests in parallel. Now and then one of those dynos produces H12 timeouts while all the others run flawlessly, so we can completely rule out application problems. Restarting the affected dyno usually helps, because your application most likely lands on a different physical server whenever it is restarted.
So Heroku has "bad servers" in its rotation, and now and then your code will land on one of those bad servers. I cannot say whether it is a "noisy neighbor" problem. I also asked Heroku how to prevent that, and the only response I got was to pay for dedicated performance dynos, which is quite dissatisfying...
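If you do decide to script such a restart yourself, one possible approach (an assumption on my part, not something described in this thread) is to call the Heroku Platform API's dyno-restart endpoint, for instance via the heroku-client npm package:

    // Hypothetical sketch: restart one dyno via the Heroku Platform API.
    // The heroku-client package, app name, and dyno name are assumptions.
    var Heroku = require('heroku-client');
    var heroku = new Heroku({ token: process.env.HEROKU_API_TOKEN });

    // DELETE /apps/{app}/dynos/{dyno} asks Heroku to restart that dyno.
    heroku.delete('/apps/my-app/dynos/web.1')
      .then(function () {
        console.log('Restart requested for web.1');
      })
      .catch(function (err) {
        console.error('Restart request failed:', err);
      });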
H12 Request timeout means that Heroku's load balancers are sending a request to your app but not getting a response.
This can happen for lots of reasons. Since the app is already working, you can likely rule out configuration issues, so now you are looking at the application code and will have to inspect the logs to understand what's happening. I'd suggest using one of their logging add-ons, like Papertrail, so you have a history of the logs when this happens.
Some things it could be, but not limited to:
Application crashing and not restarting
Application generating an error, but no response being sent
Application getting stuck in the event loop, preventing new requests
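For that last case, a rough way to spot a blocked event loop is a simple lag check like the sketch below; the 1-second interval and 200 ms threshold are arbitrary assumptions.

    // Hypothetical event-loop lag monitor using plain Node timers.
    var last = Date.now();

    setInterval(function () {
      var now = Date.now();
      var lag = now - last - 1000;   // how late the 1-second timer fired
      if (lag > 200) {
        console.warn('Event loop lag of ' + lag + ' ms detected');
      }
      last = now;
    }, 1000);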
Heroku provides some documentation around the issue that might help in debugging your situation:
https://devcenter.heroku.com/articles/request-timeout
https://help.heroku.com/AXOSFIXN/why-am-i-getting-h12-request-timeout-errors-in-nodejs

Meteor's Remote Database Connection Timeout and Reconnect

Does Meteor have a setting to time out and retry if its MongoDB does not give a response within x seconds? I'm wondering if anyone has tried this.
I am interested in running a MongoDB database remote from the Meteor production app. The Meteor-to-Mongo connection will be quick, just 3-9 milliseconds away, but I also want to understand how Meteor (and Node.js) would react to a brief network outage. Would the app hang while waiting for a long timeout period? How can I force a 1-second timeout/retry to avoid a hang?
You can specify timeouts in the Mongo URL:
MONGO_URL=mongodb://host:port/db?connectTimeoutMS=60000&socketTimeoutMS=60000
But let's say you have a network outage: what does a short timeout give you?
Your app will hang anyway...
To get high availability, look into replica sets.
https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
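For illustration, a connection string for a hypothetical three-member replica set might look like this (the host names and replica set name are placeholders):
MONGO_URL=mongodb://host1:27017,host2:27017,host3:27017/db?replicaSet=rs0&connectTimeoutMS=60000&socketTimeoutMS=60000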

Nodejitsu and Twitter Streaming API - multiple disconnects

Hi, I've been struggling with this issue for a few days now. I have a simple node.js app that connects to Twitter's streaming API and tracks a few terms. When a term is found, the client side gets a websocket notification. I've made sure that my OAuth credentials are only used by this app and that the connection to the streaming API occurs only on app startup.

What keeps happening is that I get a 200 OK response but the stream then disconnects. I have it set to reconnect in 30 seconds, but it's becoming ridiculous. It seems to be fine for a few minutes after restarting the app and then goes back to repeatedly disconnecting. The error is {"disconnect":{"code":7,"stream_name":"XXXXX-statuses158325","reason":"admin logout"}}.

I have run the same app locally with multiple client connections and not had a problem. I looked into other hosting services, but I can't find one that supports websockets without having to fall back to a slow long-polling option on socket.io (which won't work for my app's purposes).
Any ideas for why this keeps happening?
That error means that you're connecting again with the same credentials (https://dev.twitter.com/discussions/11251).
One cause might be running more than 1 drone.
If this doesn't help, join us on http://webchat.jit.su and we'll do our best to help you :D
-yawnt
