Node.js and Socket.io become unresponsive - node.js

I have a relatively simple chat-type application running on Node.js and Socket.io. The node server streams chat data from a Minecraft server and then streams this to any clients connected on the website using Socket.io. A working demo of the system can be found here: standardsurvival.com/chat.
It works fine for the most part, but every once in a while the node server stops responding and active connections die shortly thereafter. The process starts consuming 100% CPU during this time, but memory always stays relatively constant, so I doubt any sort of memory leak is involved.
It's been very frustrating, as I haven't been able to reproduce the issue consistently enough to figure out what the problem is, and I don't know where to look. I've been setting up loops and commenting out various parts of the pipeline between the node server and the website to try to pinpoint what may be causing it. No luck so far.
The code behind this system can be found here and here.
Any ideas?

Well, I ended up figuring out what the problem was. A library I'm using was opening net.Sockets for standard HTTP requests to the Minecraft server, but was never actually closing them. Apparently the "end" event was never emitted when the request finished. So eventually all of the process's available file handles were used up, causing new requests to fail outright, which made the server appear to stop responding. I would have found this out sooner if I had logged that error. Lesson learned.
I added a timeout to all sockets to fix this, at least temporarily. The server has now been running for days without a single issue :)
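For anyone hitting the same thing, the workaround looks roughly like this; a minimal sketch assuming a plain net.Socket per request (the host and port are placeholders):

var net = require('net');

var socket = net.connect({ host: 'mc.example.com', port: 25565 });

// Destroy the socket if it sits idle so leaked connections
// can't pile up and exhaust the process's file handles.
socket.setTimeout(30000);
socket.on('timeout', function () {
  socket.destroy();
});

// Log socket errors instead of swallowing them - this is what
// would have exposed the exhausted-file-handle failures earlier.
socket.on('error', function (err) {
  console.error('socket error:', err);
});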

Related

EADDRINUSE: address already in use :::5000, happens every time I run my node/react apps

Pretty much every time I run my node/react apps I get this error on the port where I'm running the node server. I know how to solve it temporarily by finding the PID with lsof -i:5000 and then using the kill command to shut it down.
However, I'm really tired of having to do this every single time I run one of my node/react apps locally, since if I ignore the error, my apps won't work properly. Why does this keep happening? What can I do to stop the error from coming back all the time?
I'm posting this as a new question since I haven't found an answer that deals with how to stop the error from constantly coming back, rather than just clearing it once. Big thanks in advance!
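There is no accepted answer quoted here, but one common cause is that a previous run of the server (or a crashed dev-tool child process) is still holding the port. A rough sketch of releasing the port cleanly on exit, assuming a plain http server on port 5000:

const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.listen(5000);

// Close the listening socket on Ctrl+C / termination so the next
// run can bind to port 5000 without hitting EADDRINUSE.
function shutdown() {
  server.close(() => process.exit(0));
}
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);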

Garbage collection causes lag on connected sockets (NodeJS Server)

I am hosting a game website on Heroku that runs on a NodeJS Server. Clients are connected via sockets using the package socket.io.
Once in a while, when the garbage collection cycle is triggered, connected clients experience severe lag and, often, disconnections. Clients see this as delayed incoming chat and delayed inputs to the game.
When I look into the logs, I find error messages relating to garbage collection. Please see the attached logs below. When these GC events happen, they sometimes cause massive memory spikes to the point where the app exceeds its allotted 0.5GB of RAM and gets killed by Heroku. Lately, however, the memory spikes don't occur as often, but the severe lag on the client side still happens once or twice a day.
One aspect of the lag shows up in the chat. When a user types a message in "All Chat" (or any chat channel), the server currently console.log()s it to standard out. I happened to be watching the logs live during one spike event and noticed that chat was being written to the terminal in real time with no delay, yet clients (I was also on the website myself as a client) received the same messages in a very delayed fashion.
I found a NodeJS bug online (which I believe has since been fixed) that caused severe lag when too much was written via console.log, so I ran a stress test by sending 1000 messages per second from the client for a minute. I could not reproduce the spike.
I have read many guides on finding memory leaks, inspecting the stack, etc., but I'm very unsure how to run these tests on a live Heroku server. I suspect that my game objects are not being cleared out immediately when they close and are instead all being cleared at once, causing the memory spikes, but I am not confident. I don't know how best to debug this. It is also difficult to catch this happening live, as it only occurs when 30+ people are logged in (which doesn't happen often, as this is still a fairly small site).
The error messages include references to the circular-json module I use, and I also suspect that it may somehow be causing infinite callbacks on itself and not clearing out correctly, but I am not sure.
For reference, here is a copy of the source code: LINK
Here is a snippet of the memory when a spike happens:
Memory spike
Crash log 1: HERE
Crash log 2: HERE
Is there a way I can simulate sockets or simulate the live server's environment (i.e. connected clients) locally?
Any advice on how to approach or debug this problem would be greatly appreciated. Thank you.
Something to consider is that console.log will increase memory usage. If you are logging verbosely with large amounts of data, this can accumulate. Looking quickly at the log, it seems you are running out of memory? That would mean the app starts swapping to disk, which is slower, and garbage collection runs more aggressively, spiking the CPU.
This could mean a memory leak due to resources not being killed/closed and simply accumulating. Debugging this can be a PITA.
Node reserves about 1.5GB for keeping long-lived objects around. It seems like you're on a 500MB container, so it's best to configure the web app to start like:
web: node --optimize_for_size --max_old_space_size=460 server.js
While you need to get to the bottom of the leak, you can also increase availability by running more than one worker and more than one node instance, and use socket.io-redis to keep the instances in sync. I highly recommend this route.
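For example, wiring up the socket.io-redis adapter is roughly this (the Redis host/port and event name are assumptions):

const io = require('socket.io')(3000);
const redisAdapter = require('socket.io-redis');

// Broadcasts go through Redis, so an event emitted on one node
// instance reaches clients connected to any of the others.
io.adapter(redisAdapter({ host: '127.0.0.1', port: 6379 }));

io.on('connection', (socket) => {
  socket.on('chat', (msg) => io.emit('chat', msg));
});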
Some helpful content on Nodejs memory on Heroku.
You can also spin up multiple connections from a node script to interact with your local dev server using socket.io-client, monitor the memory locally, and add logging to make sure connections are being closed correctly, etc.
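A minimal sketch of that kind of local test with socket.io-client; the URL, client count, and event name are placeholders:

const io = require('socket.io-client');

// Open a batch of connections against the local dev server.
const clients = [];
for (let i = 0; i < 100; i++) {
  const socket = io('http://localhost:3000');
  socket.on('connect', () => socket.emit('chat', 'hello from client ' + i));
  socket.on('disconnect', () => console.log('client ' + i + ' disconnected'));
  clients.push(socket);
}

// Periodically report how many are still open; watch the server's
// memory (e.g. process.memoryUsage()) on the other side.
setInterval(() => {
  console.log('connected clients:', clients.filter((s) => s.connected).length);
}, 5000);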
I ended up managing to track down my "memory leak". It turns out I was saving the games (as JSONified strings) to the database too frequently, and the server/database couldn't keep up. I've reduced the frequency of game saves and haven't had any issues since.
The tips provided by Samuel were very helpful too.
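If it's useful to anyone, the fix boils down to throttling the writes; a rough sketch, where game, saveGameToDb, and the 30-second interval are hypothetical stand-ins for the real objects, database call, and save frequency:

// Hypothetical stand-ins for the real game state and DB write.
const game = { state: {} };
function saveGameToDb(json) { /* write json to the database */ }

let dirty = false;

// Mark the game as changed instead of saving on every update.
function markDirty() {
  dirty = true;
}

// Flush a JSONified copy to the database at most once every 30
// seconds, rather than on every change.
setInterval(() => {
  if (!dirty) return;
  dirty = false;
  saveGameToDb(JSON.stringify(game));
}, 30000);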

Node.js on Openshift: Keep getting timeouts even with low load and HAProxy

I have a node app on OpenShift and am now constantly suffering from timeouts, even though the logs show a lot of idling around. I have even started a second gear and am using HAProxy. Any ideas what could be done? Or how to even debug the problem?
It just suddenly started to happen, without any changes from me to the system. No additional traffic either and the database functions fine and is definitely not the bottleneck.

How to debug what keeps NodeJS process alive

I have a large number of asynchronous operations running, but the NodeJS process just won't exit when they have all supposedly finished. Can I somehow find out what is keeping it running? Can I inspect the heap or stack of the running process somehow? Or can you give me tips on the most common causes of this kind of idling?
I don't have any kind of server running there, but I am using async.nextTick quite extensively, which basically uses setImmediate. I am not sure if this can somehow get stuck. There are also no connections to any kind of database or remote server. It's just a process that does some work on the file system.
Maybe there is some recursive loop, but I have tried using node-inspector and pausing execution after it got stuck, and it didn't show me any point in the code where it would be hanging.
Take a look at process._getActiveHandles() and process._getActiveRequests()
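For example (these are undocumented internals, so treat the output as a debugging aid rather than a stable API):

// Something that keeps the event loop alive.
const timer = setInterval(() => {}, 1000);

// Dump whatever is still keeping the process running:
// open timers, sockets, file handles, child processes, etc.
console.log(process._getActiveHandles());

// In-flight async requests (fs operations, dns lookups, ...).
console.log(process._getActiveRequests());

clearInterval(timer);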

socket.io becoming slow and unresponsive after node server is left running for a couple days

I've been developing publishing features for my website using socket.io on a node server. Over the past month or so I've been having issues with the socket connections becoming painfully slow or altogether unresponsive after only a couple of days of running. The server is not out of memory. I'm not very familiar with debugging this kind of issue.
The socket.io logs were not telling me much beyond "websocket connection invalid" or "client not handshaken 'client should reconnect'"
I googled around and eventually saw a thread recommending running netstat from the command line; it showed a large number of connections in FIN_WAIT2 and CLOSE_WAIT, and I figured that was the cause of my issue. Some threads on the socket.io GitHub related to this recommended upgrading to 0.9.14 (I had been running 0.9.13 at the time).
I have since done so and am still having periods of 'downtime' when the server has only been running for a few days straight. My site does not get anywhere near the amount of traffic where this should be an issue.
A new error has started popping up in my logs (websocket parser error: no handler for opcode 10), but my googling has turned up squat on the issue. I am not sure where to turn to resolve this, or whether I am simply chasing a red herring and the real issue is something else that one of you may be able to shed some light on.
I am running node.js v0.10.10 and socket.io v0.9.14. A hard reboot of the Linux server resolves the issue 100% of the time, whereas a restart of the node service does not, which is what has led me to believe it is an issue related to open sockets on the server.
You are probably experiencing a known bug in node.js that was recently fixed - see issue #5504.
Is the problem still there after upgrading to node v0.10.11?