Node.JS WebSocket High Memory Usage - node.js

We currently have a production node.js application that has been underperforming for a while. The application is a live bidding platform that also runs timed auctions. The part of the system that runs live sales works as required. The problems appear while running our timed sales, where items in a sale have timers and finish incrementally, and a bid placed within the last set time extends the timer by X seconds.
The issue occurs during the period when a timed sale is finishing (which can go on for hours), with 60 seconds between lots and an extension whenever a user bids in the last 10 seconds. We were able to connect via the devtools and I have taken heap memory exports to see what is going on, but everything points to stream writable and the buffers. So my question is: what am I doing wrong? See below a screenshot of a heap memory export:
As you can see from the above, a lot of memory is being used; in this case the process was using 1473MB of physical RAM. We saw this rise very quickly (within 30 mins), and each increment seemed to be larger than the last. When it hit 3.5GB it was growing at around 120MB each second, and at around 5GB it was growing at 500MB per second. It got to around 6GB and then the worker crashed (it has a max heap size of 8GB), leaving us a process down.
So let me tell you about the platform. As I said earlier, it is a bidding platform. It uses Node (v11.3.0) and is clustered using the built-in cluster library. It spawns 4 workers alongside the main process (so 5 processes altogether). The system accepts bids, checks other bids, calculates who is winning, and pushes updates via Redis PUB/SUB; each worker then broadcasts them to its connected users.
All data is stored in Redis, and MySQL is used to refresh data into Redis, as Redis has performed 10x faster than MySQL was able to.
The way this works is that on connection a small session is created against the connection. This is then used to authenticate the user (via a message sent from the client). All message events are sent to a handler, which routes each one to the correct command; these commands are all async functions.
This has no issue at small scale, but we had over 250 connections when we saw the above behaviour, and we are unsure where to find a fix. When opening the top object, we noticed it was connected to buffer.js and stream_writable.js. I can also see that all references are connected to system / JSArrayBufferData and refer back to these; there are lots of these objects, and we are unable to fix the issue.
We think one of the following:
We log a lot of information both to the console and to a file, using fs.writeFile in append mode. We did some research and saw that writing to the console can cause this kind of behaviour.
It is the get lots function, which outputs all the lots for that page (currently set to 50) every time an item finishes. So when a timer ends, it asks for a full page load of all the items on that page instead of adding only the new lots.
There is something else happening here that we are unaware of, maybe the external library we are using that may not be removing a reference.
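On the first suspicion: in Node, a Writable stream's write() returns false once its internal buffer passes highWaterMark, and anything written after that keeps accumulating in memory until the stream drains. Below is a minimal sketch of guarding against that, assuming it is acceptable to drop messages for a slow destination. BackpressureGuard, send, blocked and dropped are hypothetical names, not part of any library listed here.

```javascript
// Hypothetical helper: wraps any Writable (a log file stream, a socket, ...)
// and refuses to queue more data while the stream reports backpressure.
class BackpressureGuard {
  constructor(stream) {
    this.stream = stream;
    this.blocked = false; // true while the stream's buffer is over its highWaterMark
    this.dropped = 0;     // number of messages skipped while blocked
    stream.on('drain', () => {
      this.blocked = false; // the buffer has emptied, writing may resume
    });
  }

  // Returns true if the message was handed to the stream, false if it was dropped.
  send(message) {
    if (this.blocked) {
      this.dropped += 1;
      return false;
    }
    // write() returns false once the buffered data reaches highWaterMark.
    if (!this.stream.write(message)) {
      this.blocked = true;
    }
    return true;
  }
}
```

The same wrapper could sit around a single fs.createWriteStream(path, { flags: 'a' }) used for the log file. Whether dropping is acceptable, or messages should instead be queued up to a cap, depends on the application.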
I have listed the libraries of interest that we require here:
"bluebird": "^3.5.1", (For promisifying the redis library)
"colors": "^1.2.5", (Used on every console.log; we log everything that happens, which can be around 50 calls every few seconds)
"nodejs-websocket": "^1.7.1", (Our websocket library)
"redis": "^2.8.0", (Our redis client)
Anyway, if there is anything painfully obvious I would love to hear it, as nothing I have followed online or in other Stack Overflow questions relates closely enough to the issue we are facing.

Related

Handling many connections in node.js

I am making a live app with the use of websockets (express-ws npm package) in node.js.
The users send a message via ws every 10 seconds. Each of such requests takes about 1-1.5 milliseconds to handle (I have made some .time benchmarks). Everything works perfectly while there are less than ~9000 connections. However, if it grows above that, those 9000 requests every 10 seconds take 9000*1.5=13500ms > 10s and some users do not get their requests handled (as node.js is single-threaded). This is my first live app that gets so many online users at the same time so I do not know what to do. How to handle that many connections correctly?
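The arithmetic in the paragraph above can be written out as a quick capacity estimate. This is a rough sketch that assumes every request costs a fixed amount of event-loop time and that nothing else competes for the thread. maxClients and busyTimeMs are illustrative names.

```javascript
// How many clients can one event loop serve if each client sends one message
// per interval and each message costs a fixed amount of processing time?
function maxClients(intervalMs, perRequestMs) {
  return Math.floor(intervalMs / perRequestMs);
}

// Total processing time needed per interval for a given number of clients.
function busyTimeMs(clients, perRequestMs) {
  return clients * perRequestMs;
}

console.log(maxClients(10000, 1.5)); // 6666
console.log(maxClients(10000, 1.0)); // 10000
console.log(busyTimeMs(9000, 1.5));  // 13500
```

With 1 to 1.5 ms per request, the estimate puts the ceiling somewhere between roughly 6,600 and 10,000 clients, which is consistent with trouble starting near 9,000.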
I have read some articles about that and I have found some solutions which do not seem to work for me (at least I do not understand how to make them working).
Use the cluster module. The problem is that the requests have to share variables. I have an array of data which is updated or read during every request, and cluster workers, as I have read and tested, are separate processes which cannot share memory.
The same applies to worker_threads. They can kind of share memory, but I have to set up the communication between all threads, and it still comes down to handling 9000 requests in 10 seconds, which is not significantly faster than before (each of those 9000 requests is simply a database search and an update, with a few validations of whether the user is registered and has provided valid data). Probably, if I move the validation to a worker thread, the connection limit will grow to about 13000, but that is still insufficient.
I thought of creating a separate server on another port (probably even in C++) and sending all the requests that have passed validation there (via a websocket between the servers). That seems like the best solution for now, but it still comes down to handling 9000 requests in one thread, which will not make it much better.
So, how do I handle that many requests that need to share a variable efficiently? How do game servers which need to update the states of thousands of players multiple times per second do that?

Find the source of waves of latency from Redis using Node.js

I am investigating some latency issues on my server, and I've narrowed it down but not enough to solve it. I'm hoping someone with more experience with Redis or Node.js can help.
Within a function that is called a few thousand times per minute, scaling up and down with web traffic, I send a GET request to my redis client to check if a process is complete. I've noticed increased latency for my web requests, and it appears as though the redis GET command is taking up the bulk of my server time. Which surprised me, as I always thought redis was wicked fast all the time. And if I look at Redis's "time spent" info, it says everything is under 700 microseconds.
That didn't jibe with what I was seeing from my transaction monitoring setup, so I added some logging to my code:
const start = Date.now();
client.get(`poll:${submittedId}`, (err, res) => {
  console.log(`${Date.now() - start}`);
  // other stuff
});
Now my logs print the number of milliseconds spent on each redis GET. I watch that for a while, and see a surprising pattern.
Most of the time, there are lots of 1s and an occasional number in the 10s or sometimes 100s. Then, periodically, all the gets across the server slow down, reaching up to several seconds for each get to complete. Then after a while the numbers curve back down and things are running smoothly again.
What could be causing this sort of behaviour?
Things I've tried to investigate:
Like I mentioned, I've combed through redis's performance data, as presented on Heroku's redis dashboard, and it doesn't have any complaints or latency spikes.
I confirmed that all these requests are coming from a small number of connections, because I read that opening and closing too many can cause issues.
I looked into connection pooling, thinking maybe the transactions are being queued and causing a backlog, but the internet seems to say this isn't necessary for Redis and Node.
Really appreciate anyone taking the time to advise on this!
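It seems like Redis is blocking during bgsave. Things to check:
Check whether your Redis used memory is large.
Use the LASTSAVE command to confirm when the bgsave ran and whether it lines up with the slow periods.
Avoid the AOF "always" fsync policy.
Use SLOWLOG to check whether other commands are blocking.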
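Separately from the Redis-side checks, note that the Date.now() timing in the question also includes any time the reply spends waiting for the Node.js event loop, so a busy process can look like slow Redis. Below is a minimal sketch of a timer-drift monitor whose output could be logged alongside the GET timings. startLagMonitor and computeLag are hypothetical helpers, and the 1000 ms interval and 100 ms threshold are arbitrary choices.

```javascript
// How late a timer fired, given when it was expected and when it actually ran.
function computeLag(expectedMs, nowMs) {
  return Math.max(0, nowMs - expectedMs);
}

// Calls onLag(lagMs) every intervalMs with the delay of the timer itself.
// Large values mean the event loop was busy and callbacks were queued.
function startLagMonitor(intervalMs, onLag) {
  let expected = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    onLag(computeLag(expected, now));
    expected = now + intervalMs;
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive
  return () => clearInterval(timer);
}

const stopMonitor = startLagMonitor(1000, (lagMs) => {
  if (lagMs > 100) console.log(`event loop lag: ${lagMs}ms`);
});
```

If the lag spikes coincide with the slow GETs, the bottleneck is more likely in the Node process than in Redis.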

Calling external API only when new data is available

I am serving my users with data fetched from an external API. I don't know when this API will have new data, so what would be the best approach to handle that using Node, for example?
I have tried setInterval's and node-schedule to do that and got it working, but isn't it expensive for the CPU? For example, over a day I would hit this endpoint to check for new data every minute, but it could have new data every five minutes or more.
The thing is, this external API isn't run by me. Would the only way to check for updates be hitting it every minute? Is there any module that can do that in Node or any approach that fits better?
Use case 1 : Call a weather API for every city of the country and just save data to my db when it is going to rain in a given city.
Use case 2 : Send notification to the user when a given Philips Hue lamp is turned on at the time it is turned on without having to hit the endpoint to check if it is on or not.
I appreciate the time to discuss this.
If this external API has no means of notifying you when there's new data, then the only thing you can do is to "poll" it to check for new data.
You will have to decide what an "efficient design" for polling is in your specific application and given the type of data and the needs of the client (what is an acceptable latency for new data).
You also need to be sure that your service is not violating any terms of service with your polling scheme or running afoul of rate limiting that may deny you access to the server if you use it "too much".
Would the only way to check for updates be hitting it every minute?
Unless the API offers some notification feature, there is no other scheme other than polling at some interval. Polling every minute is fairly quick. Do your clients really need information that is less than a minute old? Or would it really make no difference if the information was as much as 5 minutes old.
For example, in your example of weather, a client wouldn't really need temperature updates more often than probably every 10-15 minutes.
Is there any module that can do that in Node or any approach that fits better?
No. Not really. You'll probably just use some sort of timer (either repeated setTimeout() or setInterval()) in a node.js app to repeatedly carry out your API operations.
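A minimal sketch of the repeated setTimeout() approach, assuming the API call is wrapped in a caller-supplied async function and that comparing JSON-serialized responses is a good enough change check. startPolling, fetchFn and onChange are hypothetical names.

```javascript
// Calls fetchFn repeatedly, invoking onChange only when the result differs
// from the previous one. Returns a function that stops the polling.
function startPolling(fetchFn, onChange, intervalMs) {
  let lastSerialized; // JSON of the most recent result, undefined before the first poll
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      const data = await fetchFn();
      const serialized = JSON.stringify(data);
      if (serialized !== lastSerialized) {
        lastSerialized = serialized;
        onChange(data);
      }
    } catch (err) {
      console.error('poll failed:', err);
    }
    // Schedule the next poll only after this one has finished.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Scheduling the next setTimeout() only after the previous call settles means slow responses never overlap, which setInterval() does not guarantee.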
Use case: Call a weather API for every city of the country and just save data to my db when it is going to rain in a given city.
Trying to pre-save every possible piece of data from an external API is probably a losing proposition. You're essentially trying to "scrape" all the data from the external API. That is likely against the terms of service and will likely also run afoul of rate limits. And, it's just not very practical.
Instead, you will probably want to fetch data upon demand (when a client requests data for Phoenix, then, and only then, do you start collecting data for Phoenix) and then once a demand for a certain type of data (temperatures in a particular city) is established, then you might want to pre-cache that data more regularly so you can notify clients of changes. If, after awhile, no clients are asking for data from Phoenix, you stop requesting updates for Phoenix any more until a client establishes demand again.
I have tried setInterval's and node-schedule to do that and got it working, but isn't it expensive for the CPU? For example, over a day I would hit this endpoint to check for new data every minute, but it could have new data every five minutes or more.
Making a remote network request is not a CPU intensive operation, even if you're doing it every minute. node.js uses non-blocking networking so most of the time during a network request, node.js isn't doing anything and isn't using the CPU at all. The only time the CPU would be briefly used is when you first send the API request and then when you receive back the result from the API call and need to process it.
Whether you really need to "poll" every minute depends upon the data and the needs of the client. I'd ask yourself if your app will work just fine if you check for new data every 5 minutes.
The method I would use to update would be contained outside of the code in a scheduled batch/powershell/bash file. In windows you can schedule tasks based upon time of day or duration since last run, so what you could do is run a simple command that will kill your application for five minutes, run npm update, and then restart your application before closing the shell.
That way you're staying out of your API and keeping code to a minimum, and if your code is inside that Node package in the update, it'll be there and ready once you make serious application changes or you need to take the server down for maintenance and updates to the low-level code.
This is a light-weight solution for you and it's a method I've used once or twice at my workplace. There are lots of options out there, and if this isn't what you're looking for I can keep looking out for you.

Nodejs application memory usage tracking and clean up on exit

"A Node application is an instance of a Node Process Object" (link).
Is there a way in which local memory on the server can be cleared every time the node application exits?
[By application exit I mean when each individual user of the website closes the tab in the browser.]
node.js is a single process that serves all your users. There is no specific memory associated with a given user other than any state that you yourself in your own node.js code might be storing locally in your node.js server on behalf of a given user. If you have some memory like that, then the typical ways to know when to clear out that state are as follows:
Offer a specific logout option in the web page and when the user logs out, you clear their state from memory. This doesn't catch all the ways the user might disappear, so this would typically be done in conjunction with other options.
Have a recurring timer (say every 10 minutes) that automatically clears any state from a user who has not made a web request within the last hour (or however long you want the time set to). This also requires you to keep a timestamp for each user each time they access something on the site, which is easy to do in a middleware function.
Have all your client pages keep a webSocket connection to the server and when that webSocket connection has been closed and not re-established for a few minutes, then you can assume that the user no longer has any page open to your site and you can clear their state from memory.
Don't store user state in memory. Instead, use a persistent database with good caching. Then, when the user is no longer using your site, their state info will just age out of the database cache gracefully.
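The recurring-timer option above can be sketched as follows, assuming state lives in a plain Map keyed by user ID. sessions, touch and sweep are hypothetical names, and the 10-minute and one-hour figures are the ones from the list.

```javascript
// Hypothetical in-memory store: userId -> { state, lastSeen }.
const sessions = new Map();

// Record activity for a user; call this from a middleware on every request.
function touch(userId, now = Date.now()) {
  const entry = sessions.get(userId) || { state: {}, lastSeen: now };
  entry.lastSeen = now;
  sessions.set(userId, entry);
  return entry;
}

// Remove every session idle for longer than maxIdleMs.
// Returns the number of sessions removed.
function sweep(maxIdleMs, now = Date.now()) {
  let removed = 0;
  for (const [userId, entry] of sessions) {
    if (now - entry.lastSeen > maxIdleMs) {
      sessions.delete(userId);
      removed += 1;
    }
  }
  return removed;
}

// Sweep every 10 minutes, clearing users idle for more than one hour.
const sweeper = setInterval(() => sweep(60 * 60 * 1000), 10 * 60 * 1000);
sweeper.unref(); // let the process exit even if the timer is still scheduled
```

The logout and webSocket-close options would call sessions.delete(userId) directly, with the sweep acting as a safety net for users who simply vanish.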
Note: Tracking overall memory usage in node.js is not a trivial task, so it's important that you know exactly what you are measuring if you're tracking this. Overall process memory usage is a combination of memory that is actually being used and memory that was previously used, is currently available for reuse, but has not been given back to the OS. You need to be able to track memory that is actually in use by node.js, not just memory that has been allocated to the process. A heap snapshot is one of the typical ways to track what is actually being used, not just what is allocated from the OS.
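As a small illustration of that distinction, process.memoryUsage() reports both the resident set size (what the OS has given the process) and the V8 heap figures. Below is a sketch that prints them in megabytes. memorySnapshotMb is a hypothetical helper.

```javascript
// Returns the main process.memoryUsage() figures converted to megabytes.
// rss:       total memory the OS has allocated to the process
// heapTotal: memory V8 has reserved for the JavaScript heap
// heapUsed:  the part of that heap currently occupied by live or uncollected objects
// external:  memory held by C++ objects bound to JavaScript objects (e.g. Buffers)
function memorySnapshotMb() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rss: toMb(rss),
    heapTotal: toMb(heapTotal),
    heapUsed: toMb(heapUsed),
    external: toMb(external),
  };
}

console.log(memorySnapshotMb());
```

A large gap between rss and heapUsed is often memory that is reserved but not currently in use, which is why rss alone can overstate what the application holds.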

Node js avoid pyramid of doom and memory increases at the same time

I am writing a socket.io based server and I'm trying to avoid the pyramid of doom and to keep the memory low.
I wrote this client - http://jsfiddle.net/QUDXU/1/ which I run with node client-cluster 1000. So there are 1000 connections making continuous requests.
For the server side I tried 3 different solutions, which I tested. The results in terms of RAM used by the server, after letting everything run for an hour, are:
Simple callbacks - http://jsfiddle.net/DcWmJ/ - 112MB
Q module - http://jsfiddle.net/hhsja/1/ - 850MB and increasing
Async module - http://jsfiddle.net/SgemT/ - 1.2GB and increasing
The server and clients are on different machines. (Softlayer cloud instances). Node 0.10.12 and Socket.io 0.9.16
Why is this happening? How can I keep the memory low and still use some kind of library that keeps the code readable?
Option 1. You can use the cluster module and gracefully kill your workers from time to time (make sure you disconnect() first). You can check process.memoryUsage().rss > 130000000 in the master and kill the workers when they exceed 130MB, for example :)
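A rough sketch of Option 1. Because process.memoryUsage() only describes the process that calls it, this version assumes each worker reports its own RSS to the master over the built-in IPC channel. overLimit, spawnWorker, runMaster, runWorker, the message shape and the 5-second reporting interval are all assumptions made for illustration.

```javascript
const cluster = require('cluster');

const RSS_LIMIT_BYTES = 130 * 1000 * 1000; // 130 MB, the example figure above
const REPORT_EVERY_MS = 5000;              // assumed reporting interval

// True when a reported RSS value exceeds the limit.
function overLimit(rssBytes, limitBytes = RSS_LIMIT_BYTES) {
  return rssBytes > limitBytes;
}

// Master side: fork a worker and retire it once it reports too much memory.
function spawnWorker() {
  const worker = cluster.fork();
  let retiring = false;
  worker.on('message', (msg) => {
    if (!retiring && msg && msg.type === 'rss' && overLimit(msg.rss)) {
      retiring = true;
      worker.disconnect(); // ask the worker to close its servers and IPC channel
    }
  });
  return worker;
}

// Master side: start the pool and replace every worker that exits.
function runMaster(workerCount) {
  cluster.on('exit', () => spawnWorker());
  for (let i = 0; i < workerCount; i += 1) spawnWorker();
}

// Worker side: report RSS periodically while the IPC channel is open.
function runWorker() {
  const timer = setInterval(() => {
    if (process.connected) {
      process.send({ type: 'rss', rss: process.memoryUsage().rss });
    }
  }, REPORT_EVERY_MS);
  timer.unref(); // the timer alone should not keep a retiring worker alive
}

// Entry point, left commented out so that loading this file forks nothing:
// if (cluster.isMaster) runMaster(4); else runWorker();
```

A disconnected worker only exits once its remaining connections close, so a real deployment would probably also want a timeout after which the master kills it. Newer Node releases expose cluster.isPrimary as the preferred name for cluster.isMaster.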
Option 2. NodeJS has the habit of using memory and rarely doing rigorous cleanups. As V8 approaches the maximum memory limit, GC calls become more aggressive. So you could lower the maximum heap a node process can take up by running node --max-old-space-size=<amount in MB>. I do this when running node on embedded devices (often with less than 64 MB of RAM available).
Option 3. If you really want to keep the memory low, use weak references where possible (anywhere except in long-running calls): https://github.com/TooTallNate/node-weak . This way, the objects will get garbage collected sooner. Extensive tests to make sure everything works are needed, though. Good luck if you use this one :)
It seems like the problem was in the client script, not the server one. I ran 1000 processes, each of them emitting messages to the server every second. I think the server was getting very busy resolving all of those requests and was therefore using all of that memory. I rewrote the client side to spawn a number of processes proportional to the number of processors, each of them connecting multiple times like this:
client = io.connect(selectedEnvironment, { 'force new connection': true, 'reconnect': false });
Notice the 'force new connection' flag, which allows multiple clients to connect using the same instance of socket.io-client.
The part that solved my problem was actually how the requests were made: each client now makes its next request one second after receiving the acknowledgement of the previous request, rather than every second regardless.
Connecting 1000 clients makes my server use ~100MB RSS. I also used async in the server script, which seems very elegant and easier to understand than Q.
The bad part is that I've been running the server for about 2-3 days and the memory has risen to 250MB RSS. I don't know why.
