I'm working on building a simple app that I can use to test out microservice scaling. I have a simple node.js webserver running with a few routes, but now I need to find something to run that will use some CPU when hitting the routes.
Right now I have a listener up that starts a child process and calculates prime numbers. In my testing it may take a decent while to generate primes up to 100,000,000.... but even with 10+ child process running I am not seeing any real CPU load. So guessing this single threaded type of math equation isn't a great use case.
Can anyone point me to some simple to run things in JS that will burn some CPU?
This page was exactly what I was looking for and trying to do. After changing over to worker threads instead of child processes, I was able to simulate 100 concurrent requests and finally started hammering my CPU.
Related
We use clustering with our express apps on multi cpu boxes. Works well, we get the maximum use out of AWS linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion, to take incoming requests. But the process that acts on those requests can run for several minutes, so it was build as a seperate background process, node calling python and maya.
Originally the two were tightly coupled, with the python script called by the request to upload the data. But this of course was suboptimal, as it would leave the client waiting for a response for the time it took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads, and processing them sequentially.
So my question is this: if we have this separate node process running in the background, and we run clusters which starts up a process for each CPU, how is that going to work? Are we not going to get two node processes competing for the same CPU. We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages, (god I love node), so it's bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used. But I wonder if it will be problematic, and I also wonder about someone getting their web session swapped out for several minutes while the longer running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So, we're stuck with this architecture for now.
Any thoughts now possible problems and how to avoid them would be appreciated.
From an overall architecture prospective, spawning 1 nodejs per core is a great way to go. You have a lot of interdependencies though, the nodejs processes are calling maya which may use mulitple threads (keep that in mind).
The part that is concerning to me is your random crashes and your "process that runs in a loop". If that process is just checking the file system you probably have a race condition where the nodejs processes are competing to work on the same input/output files.
In theory, 1 nodejs process per core will work great and should help to utilize all your CPU usage. Linux always swaps the processes in and out so that is not an issue. You could start multiple nodejs per core and still not have an issue.
One last note, be sure to keep an eye on your memory usage, several linux distributions on EC2 do not have a swap file enabled by default, running out of memory can be another silent app killer, best to add a swap file in case you run into memory issues.
I have an app in NodeJS.
Recently we have been getting a lot more traffic (this is a new experience for me) and so I have been running into the "EMFILE: too many open files" error that is caused when a single process tries to open more files than the filesystem allows.
I have increased this limit, so we are good for now. However I'm not sure how long this solution will last...
I am wondering: What are other commonly used options for scaling a Node Application that is getting increasing amounts of traffic? (specifically with a mind to the open files limit problem.)
The PM2 process manager which allows clustering catches my eye (am I correct in understanding that every instance of the application requires it's own core -- ie you can't run 4 instances on a single core?). Are there any other techniques that are regularly used?
Thanks (in advance)
PM2 is a simple solution when you want to run more than one instance of Node, another common alternative is the cluster module http://nodejs.org/api/cluster.html Keep in mind, that you will need to configure another http server such as Nginx to reverse proxy your user requests to your Node processes.
You can run any number of Node processes, regardless of the amount of cores. But since each node process is a single thread, and each core can execute a single thread a time, the optimal configuration is when the number of cores match the number of Node processes. If the number of Node processes is greater than the number of cores, under load, you will experience reduced performance due to redundant context switches your processor will have to perform.
I have a simple nodejs webserver running, it:
Accepts requests
Spawns separate thread to perform background processing
Background thread returns results
App responds to client
Using Apache benchmark "ab -r -n 100 -c 10", performing 100 requests with 10 at a time.
Average response time of 5.6 seconds.
My logic for using nodejs is that is typically quite resource efficient, especially when the bulk of the work is being done by another process. Seems like the most lightweight webserver option for this scenario.
The Problem
With 10 concurrent requests my CPU was maxed out, which is no surprise since there is CPU intensive work going on the background.
Scaling horizontally is an easy thing to, although I want to make the most out of each server for obvious reasons.
So how with nodejs, either raw or some framework, how can one keep that under control as to not go overkill on the CPU.
Potential Approach?
Could accepting the request storing it in a db or some persistent storage and having a separate process that uses an async library to process x at a time?
In your potential approach, you're basically describing a queue. You can store incoming messages (jobs) there and have each process get one job at the time, only getting the next one when processing the previous job has finished. You could spawn a number of processes working in parallel, like an amount equal to the number of cores in your system. Spawning more won't help performance, because multiple processes sharing a core will just run slower. Keeping one core free might be preferred to keep the system responsive for administrative tasks.
Many different queues exist. A node-based one using redis for persistence that seems to be well supported is Kue (I have no personal experience using it). I found a tutorial for building an implementation with Kue here. Depending on the software your environment is running in though, another choice might make more sense.
Good luck and have fun!
So Im learning node.js and making a random player vs player cardgame. I will be using sockets and sessions.
Theoreticall ill have lots of games running at the same time.
Whats the best practice here, do I spawn a child process of my node server for every game, or simply just keep data of ongoing games and 'boards' in the main server and just grab a given game every time there is an action?
Presuming there's no intensive CPU processing going on, you should do it all in a single process. Node is very good at doing a high volume of very small operations very quickly, all on a single thread, and it sounds like that fits your application.
If you were going to be doing CPU-intensive calculations you might want to hand off to worker processes, but you still wouldn't want a separate process per game.
More than likely, you'll want to load balance however many servers you need. If you're using socket.io, then you should look into socket.io-redis (briefly covered here) to see how you can have connections running across multiple servers still talk to each other. With an approach like this, you'll need to consider the fact that you can't store some info in memory, and instead will need to persist to redis or a similarly accessible medium for all servers to access.
The NodeJS website says the following. Emphasis is mine.
Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.
Even though I love NodeJS I dont see why it is better for scalable applications compared to the existing technologies such as Python, Java or even PHP.
As I understand the JavaScript run-time always runs as a single thread in the CPU. The IO however probably uses underlying kernel methods which might rely on the thread pools provided by the kernel.
So the real questions that need to be answered are:
Because all JS code will run in a single thread NodeJS is unsuitable for applications where there is less IO and lots of computation ?
If I am writing a web application using nodejs and there are 100 open connections each performing a pure computation requiring 100ms, at least one of them will take 10s to finish ?
If your machine has 10 cores but if you are running just one nodeJS instance your other 9 CPUs are sitting ducks ?
I would appreciate if you also post how other technologies perform viz a viz NodeJS in these cases.
I haven't done a ton of node, but I have some opinions on this. Please correct if I am mistaken, SO.
Because all JS code will run in a single thread NodeJS is unsuitable for applications where there is less IO and lots of computation ?
Yeah. Single threaded means if you are crunching lots of data hard in your JS code, you are blocking everything else. And that sucks. But this isn't typical for most web applications.
If I am writing a web application using nodejs and there are 100 open connections each performing a pure computation requiring 100ms, at least one of them will take 10s to finish?
Yep. 10 seconds of CPU time.
If your machine has 10 cores but if you are running just one nodeJS instance your other 9 CPUs are sitting ducks?
That I'm not sure about. The V8 engine might have some optimizations in it that take advantage of multiple cores, transparent to the programmer. But I doubt it.
The thing is, most of the time a web application isn't calculating. If your app is engineered well, a single request can be responded to very quickly. And if you have to fetch things to do that (db, files, remote services) you shouldn't have to wait for that fetch to return before processing the next request.
So you may have many requests in various stages at the same time in various stages of completion, due to when I/O callbacks happen. Even though only one request is running JS code at a time, that code should do what it needs to do very quickly, exit the run loop, and await the next event callback.
If your JS can't run quickly, then this model does pose a problem. As you note, things will get hung as the CPU churns. So don't build a node web application that does lots of intense calculation on the fly.
However, you can refactor things to be asynchronous instead. Maybe you have a standalone node script that can do the calculation for you, with a callback when it's done. Your web application can then boot up that script as a child process, tell it do stuff, and provide a callback to run when it's done. You now have sort of faked threads, in a round about way.
In virtually all web application technologies, you do not want to be doing complex and intense calculation on the fly. Even with proper threading, it's a losing battle. Instead you have to strategize. Do the calculations in the background, or at regular intervals on a cron job, outside of the main web application process itself.
The things you point out are flaws in theory, but in practice it really only becomes an issue if you aren't doing it right.
Node.js is single threaded. This means anything that would block the main thread needs to be done outside the main thread.
In practice this just means using callbacks for heavy computations the same way you use callbacks for I/O.
For instace here's the API for node bcrypt
var bcrypt = require('bcrypt');
bcrypt.genSalt(10, function(err, salt) {
bcrypt.hash("B4c0/\/", salt, function(err, hash) {
// Store hash in your password DB.
});
});
Which Mozilla Persona uses in production. See their code here.