how to use all cpu with nodejs - node.js

We have a production chat app built in socketio/nodejs.
We use express.
Nodejs is a bit old : 10.21.0
SocketIO in 3.1.1
Our computer is a VM with 4vCPU and 16 GB RAM.
We use pm2 to manage starting node app with env variables.
We are facing an issue when there are about 500 users in chat and when they write. Bandwidth usage is around 250 Mbps in upload (but we have 10G so no issue). Issue begins here, we can see in our logs full of connection/disconnection and pm2 restart app.
In checking in more details, in launching "pm2 monit" we can see that only one processor is used and it is higher than 100% most of the time.
We read few documentation about clustering (cluster + fork). It seems to be interesting but in our case when we tested it, it's like we had few chat apps so for the same "chat room", users are in different workers so it's not OK.
Do you have an idea how we can fix that and use all processor/core ?
We are already thinking of starting with upgrading nodejs?
Thanks
Niko

Since Node.js is always single-threaded (aside from worker threads), upgrading Node won't get you much anywhere (aside from newer Nodes shipping newer V8 engines, which might be faster).
it's like we had few chat apps so for the same "chat room", users are in different workers so it's not OK.
This sounds like you've architected your app to use global variables or in-process state like that for these shared rooms. If you want to use cluster or PM2's multiple process mode, that state will need to live somewhere else, maybe a second Node application or, say, a Redis server.

Related

Node cluster versus Dynos scaling

After reading a handful of articles on scaling node apps I have not yet made up my mind about when should I use node builtin cluster or simply adding more dynos.
Let me tell you I have already read the following threads on StackOverflow:
How to properly scale nodejs app on heroku using clusters
Running Node.js App with cluster module is meaningless in Heroku?
As far as I understood it, if I make use of node cluster functionality I will end up with the total memory available divided by the number of forked processes.
On the other hand, if I add one more dyno I will double the memory available.
So, what is the point of using node clusters?
It's not really an either-or situation. You can make use of multiple node cluster instances on multiple dynos. Memory isn't really what you want to look at, though, since that would be a shared resource. CPU / core usage is more relevant to clustering in node, since each node process can only make use of one CPU core at a time.
It's really going to depend on which dynos you are using, too.
Have you seen these suggestions on the official heroku docs yet?

NodeJS Performance Issue

I'm running an API server using NodeJS 6.10.3 LTS on Ubuntu 14.04 (trusty). I've noticed that my API server tops out at ~600 reqs/min running on a c4.large EC2 instance. By tops out I mean, I see the CPU go uptil 100% Note, I know that I'm not fully utilizing the instance by using the cluster module, but that's ok for now.
I took a .cpuprofile dump of my API server for 10 seconds, and noticed that every second, for ~300ms, the profiler shows my NodeJS code is sitting (idle).
Does anyone know what that (idle) implies? Is it a GC issue? Or is it a internal (to V8) lock that I'm triggering? Any help or pointers to tools to help debug this would be nice. I'm working on anonymizing some of stack traces in the cpuprofile so I can share.
The packages I'm using are ExpressJS 4, Couchbase NodeJS SDK, Socket.IO mainly. The codepaths are mainly reading requests, and pushing to Couchbase. And finally querying couchbase via Views API, and pushing some aggregated data on a Socket.IO channel. So all pretty I/O async friendly stuff. I've made sure that I'm not calling any synchronous functions. There are no patterns of function calls before the (idle) in the cpu profile.
It could also just be I/O wait, meaning none of the sockets have data ready to read yet and so the time is spent idle. If you are using a load testing library you should check that the requests are evenly distributed within a second.
Take a look at https://www.npmjs.com/package/gc-stats to check GC data. There are flags to increase heap space, and to change when GC runs, if the problem turns out to be GC related.

How to prioritize express requests/responds over other intensive server related tasks

My node application currently has two main modules:
a scraper module
an express server
The former is very server intensive task which indefinately runs in a loop. It scrapes information from over more than 100 urls, crunches the data and puts it into a mongodb database (using mongoose). This process runs over and over and over. :P
The latter part, my express server, responds to http/socket get requests and returns the crunched data which was written to the db by the scraper to the requesting client.
I'd like to optimize the performance of my server so that the express requests and responds get prioritized over the server intensive task(s). A client should be able to get the requested data asap, without having the scraper eat up all of my server resources.
I though about putting the server intensive task or the express server into its own thread, but then I stumbled upon cluster, and child processes; and now I'm totally confused which approach would be the right one for my situation.
One of the benefits I'm having is that there is a clear seperation between the writing part of my application and the reading part. The scraper writes stuff to the db, express reads from the db (no post/put/delete/...) calls are exposed. So, I -guess- I won't run into threading problems with different threads trying to write to the same db.
Any good suggestions? Thanks in advance!
Resources like cpu and memory required by processes are managed by the operative system. You should not waste your time writing that logic within your source code.
I think you should look at the problem from outside your source code files. Once they ran they are processes. Processes are managed, as I said, by the OS.
Firstly I would split that on two separate commands.
One being the scraper module (eg npm run scraper, that runs something like node scraper.js).
The other one being your express server (eg npm start, that runs something like node server.js).
This approach will let you configure that within your OS or your cluster.
A rapid approach for that will be to use docker.
With two docker containers running your projects with cpu usage limitations. This is fairly easy to do and does not require for you to lift a new server... and at the same time it provides the
isolation level you need to scale it to many servers in the future.
Steps to do this:
Learn a little about docker and docker compose and install them in your server
Build a docker image for your application (you can upload it to a free private image that docker hub gives you for free)
Build a docker compose for your two services using that image, with the cpu configuration you need (you can set both cpu and memory limits easily)
An alternative to that would be running the two commands (npm run scraper and npm start) with some tool like cpulimit, nice/niceness and ionice, or something else like namespaces and cgroups manually (but docker does that for you).
PD: Also, I would recommend to rethink your backend process. Maybe it's better to run it every 12 hours or something like that, instead of all the time, and you may run it from within cron instead of a loop.

Which Node.js Concurrent Web Server is best on Heroku?

I have just learned about Heroku and was pretty much excited to test it out. Ive quickly assembled their demo's with Node.js Language and stumbled across a problem. When running the application locally, apache benchmark prints roughly about 3500 request/s but when its on the cloud that drops to 10 request/s and does not increase or lower based on network latency. I cannot believe that this is the performance they are asking 5 cents/hour for and highly suspect my application to be not multi-threaded.
This is my code.js: http://pastebin.com/hyM47Ue7
What configuration do I need to apply in order to get it 'running' (faster) on Heroku ? Or what other web servers for node.js could I use ?
I am thankful for every answer on this topic.
Your little example is not multi-threaded. (Not even on your own machine.) But you don't need to pay for more dyno's immediately, as you can make use of multiple cores on a dyno, see this answer Running Node.js App with cluster module is meaningless in Heroku?
To repeat that answer: a node solution to using multiple processes that should increase your throughput is to use the (built-in) cluster module.
I would guess that you can easily get more than 10 req/s from a heroku dyno without a problem, see this benchmark, for example:
http://openhood.com/ruby/node/heroku/sinatra/mongo_mapper/unicorn/express/mongoose/cluster/2011/06/14/benchmark-ruby-versus-node-js/
What do you use to benchmark?
You're right, the web server is not multi-threaded, until you pay for more web dynos. I've found Heroku is handy for prototyping; depending on the monetary value of your time, you may or may not want to use it to set up a scalable server instead of using EC2 directly.

Scaling Node.JS across multiple cores / servers

Ok so I have an idea I want to peruse but before I do I need to understand a few things fully.
Firstly the way I think im going to go ahead with this system is to have 3 Server which are described below:
The First Server will be my web Front End, this is the server that will be listening for connection and responding to clients, this server will have 8 cores and 16GB Ram.
The Second Server will be the Database Server, pretty self explanatory really, connect to the host and set / get data.
The Third Server will be my storage server, this will be where downloadable files are stored.
My first questions is:
On my front end server, I have 8 cores, what's the best way to scale node so that the load is distributed across the cores?
My second question is:
Is there a system out there I can drop into my application framework that will allow me to talk to the other cores and pass messages around to save I/O.
and final question:
Is there any system I can use to help move the content from my storage server to the request on the front-end server with as little overhead as possible, speed is a concern here as we would have 500+ clients downloading and uploading concurrently at peak times.
I have finally convinced my employer that node.js is extremely fast and its the latest in programming technology, and we should invest in a platform for our Intranet system, but he has requested detailed documentation on how this could be scaled across the current hardware we have available.
On my front end server, I have 8
cores, what's the best way to scale
node so that the load is distributed
across the cores?
Try to look at node.js cluster module which is a multi-core server manager.
Firstly, I wouldn't describe the setup you propose as 'scaling', it's more like 'spreading'. You only have one app server serving the requests. If you add more app servers in the future, then you will have a scaling problem then.
I understand that node.js is single-threaded, which implies that it can only use a single core. Not my area of expertise on how to/if you can scale it, will leave that part to someone else.
I would suggest NFS mounting a directory on the storage server to the app server. NFS has relatively low overhead. Then you can access the files as if they were local.
Concerning your first question: use cluster (we already use it in a production system, works like a charm).
When it comes to worker messaging, i cannot really help you out. But your best bet is cluster too. Maybe there will be some functionality that provides "inter-core" messaging accross all cluster workers in the future (don't know the roadmap of cluster, but it seems like an idea).
For your third requirement, i'd use a low-overhead protocol like NFS or (if you can go really crazy when it comes to infrastructure) a high-speed SAN backend.
Another advice: use MongoDB as your database backend. You can start with low-end hardware and scale up your database instance with ease using MongoDB's sharding/replication set features (if that is some kind of requirement).

Resources